What Desktop Search Engine For a Shared Volume?

Posted by timothy
from the which-side-of-the-firewall dept.
kriston writes 'Searching data on a shared volume is tedious. If I try to use a Windows desktop search engine on a volume with hundreds of gigabytes, the indexing process takes days and the search results are slow and unsatisfying. I'm thinking of an agent that runs on the server, indexes regularly, and talks to the desktop machines running the search interface. How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred-gigabyte volume on its own?'
This discussion has been archived. No new comments can be posted.

  • Everything (Score:2, Informative)

    by OttoErotic (934909) on Monday October 19, 2009 @06:46PM (#29801333)
    How about Everything [voidtools.com] (assuming the server is Windows & NTFS)? Works well for me (quickest desktop search I've found yet), and can either run locally or connect to an ETP server. The site seems to be down right now, but here's the original Lifehacker article [lifehacker.com] where I found it. Incidentally, I had never heard of ETP until I started using it. Anyone know if it's an Everything-specific protocol?
  • A couple of options (Score:4, Informative)

    by Unhandled (1660063) on Monday October 19, 2009 @06:48PM (#29801353)
    Here are a few options you might want to consider:
    1) Use Office SharePoint Server 2007 to index the share.
    2) Upgrade to Windows Server 2008 (or above) and Windows Vista (or above) and use the Federated Search feature: http://trycatch.be/blogs/roggenk/archive/2007/11/05/windows-vista-amp-windows-server-2008-federated-search.aspx [trycatch.be]
  • by dxdisaster (1121229) on Monday October 19, 2009 @06:49PM (#29801371)
    I guess it could work, although you can't index the files directly. You have to run a local copy and one on the server as an ETP server. See www.voidtools.com, although it seems to be down at the moment, so here's a link to the FAQ in Google's cache: http://74.125.113.132/search?q=cache:fcYHcEJKH3UJ:www.voidtools.com/faq.php [74.125.113.132]
  • Federated Search (Score:5, Informative)

    by Anonymous Coward on Monday October 19, 2009 @06:49PM (#29801375)

    MS does have a solution: it's called Windows Federated Search. Windows 7 with 2008 R2 has it, and there might be a way to do it with Windows Desktop Search 4.0. Here's some info on it - http://geekswithblogs.net/sdorman/archive/2009/05/14/windows-7-federated-search.aspx

  • by RicRoc (41406) on Monday October 19, 2009 @06:54PM (#29801439) Homepage

    Yes, Google's Search Appliance (GSA) could be used, I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were dimensioned for light loads.

    I would suggest using Alfresco http://www.alfresco.com/ [alfresco.com] as a CIFS (Common Internet File System) or WebDAV store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags. Each folder (or any item, in fact) could have the correct access control, which would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/ [apache.org]

    Alfresco comes in both Enterprise and Community Edition, it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!

  • by thedbp (443047) on Monday October 19, 2009 @07:02PM (#29801527)

    *ducks*

  • Re:wow! (Score:2, Informative)

    by beelsebob (529313) on Monday October 19, 2009 @07:20PM (#29801715)

    I was wondering if "spotlight on OS X does that just fine" was too trollish... But, too late, I said it, and it's true, can't be trollish if it's true can it?

  • by Anonymous Coward on Monday October 19, 2009 @07:21PM (#29801733)

    Don't you mean,

    grep damnfunctionname -R . --include='*.php*'

    I guess if you're skipping perfectly cromulent indexing servers, you might as well needlessly break out the pipes, too.
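
For readers who have not used the one-liner above, here is a minimal sketch of what `--include` does, run against a throwaway directory. The file names and the temporary directory are made up for the demonstration, and it assumes a grep that supports `-R` and `--include` (GNU and BSD grep both do).

```shell
# Throwaway tree: one PHP file and one text file that both mention the name.
demo=$(mktemp -d)
mkdir -p "$demo/lib"
printf 'function damnfunctionname() {}\n' > "$demo/lib/util.php"
printf 'damnfunctionname is mentioned here too\n' > "$demo/README.txt"

# -R recurses from "."; --include restricts the search to matching file names,
# so no find or pipe is needed. -l prints only the names of files that match.
hits=$(cd "$demo" && grep -R -l --include='*.php*' damnfunctionname .)
printf '%s\n' "$hits"

rm -rf "$demo"
```

Only the PHP file is reported; README.txt is skipped even though it contains the string.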

  • by Henriok (6762) on Monday October 19, 2009 @07:25PM (#29801775)

    Yeah, my thought exactly? I wasn't aware that it was a problem searching hundreds of gigabytes on shared volumes. We have a couple of terabytes shared by our Mac servers and I don't think I've had search times longer than ten seconds over a couple of million files.. MS Office files, PDFs, movies, audio, pictures, photographs, text, HTML, source code.. all indexed with metadata and contents.

    Even in the days before Spotlight, using AppleShare IP Servers in the 90s, finding stuff on the servers was never an issue. It has always been so fast that I have never even reflected on the fact that it was fast. Maybe I should use some other operating system once in a while to experience what the majority experiences. Or not.. I'd rather stay carefree and productive.

    Don't call me when you figure this out.

  • NO! Try Alfresco (Score:5, Informative)

    by thule (9041) on Monday October 19, 2009 @07:28PM (#29801809) Homepage
    SharePoint is $$$$. Try Alfresco. Alfresco can look like a file share (it supports SMB, DAV, FTP, etc). The indexing is built in and does not require a separate SQL Server license.
  • Re:NO! Try Alfresco (Score:5, Informative)

    by Orion Blastar (457579) <[orionblastar] [at] [gmail.com]> on Monday October 19, 2009 @07:39PM (#29801923) Homepage Journal

    You mean the Document Management Alfresco [alfresco.com] and not the CMS software. The Community Edition is free but unsupported, and the Enterprise edition has a free 30 day trial. It looks like it won a government award for document management which is rare for open source document management software.

  • Re:SSH and locate. (Score:1, Informative)

    by Anonymous Coward on Monday October 19, 2009 @08:20PM (#29802315)

    Well, if you are mounting a networked filesystem (e.g. use sshfs), then there's no reason to mirror it locally. But even if you did, GNU locate and slocate understand $LOCATE_PATH as a list of databases, so you can use both..

    Of course, there's the whole mount-point issue, but either a hacked updatedb or a mount --bind lets you build the index with a fictitious prefix. e.g.:
    mntpt=/ssh-$HOSTNAME
    mkdir "$mntpt"
    mount --rbind /home/music "$mntpt"
    # for slocate:
    updatedb -U "$mntpt" -o "$mntpt/slocate.db"
    # not 100% sure on GNU updatedb syntax, but you get the point...
    umount "$mntpt"

    Now you can:
    scp music@host.net:slocate.db .musicdb
    export LOCATE_PATH=$LOCATE_PATH:$HOME/.musicdb
    locate mytune

    and receive results of the form
    . . .
    /usr/share/some/path/armytune.wav
    /ssh-host.net/myalbum/mytune.ogg
    . . .

    -- that is, local and remote results together. Or just make an alias for slocate -l0 -d $HOME/.musicdb, if you want searches only on the remote volume.

    p.s.

    WRT GP's music server -> X forwarding, I've come to the conclusion there's no one right way to deal with music indexing/databasing/serving/streaming/playing, but for my needs, I've always found mpd the right solution. Just thought I'd throw it out there...
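
The core idea in this comment (build the index where the files live, then copy the small index to each client and search it locally) can be tried without root or a locate installation. The sketch below swaps in a plain list of paths built with find and searched with grep; it is not the slocate database format. The temporary directories stand in for the server and the client, `cp` stands in for `scp`, and the file names `paths.idx` and `.musicidx` are hypothetical. The prefix is borrowed from the example output above.

```shell
# Stand-ins for the two machines (hypothetical layout under one temp dir).
work=$(mktemp -d)
share="$work/share"     # plays the role of /home/music on the server
client="$work/client"   # plays the role of $HOME on the desktop
mkdir -p "$share/myalbum" "$client"
: > "$share/myalbum/mytune.ogg"
: > "$share/myalbum/other.ogg"

# "Server" side: walk the volume once and write one path per line,
# rewriting the leading "." to a fictitious prefix so hits look remote.
prefix=/ssh-host.net
( cd "$share" && find . -type f | sed "s|^\.|$prefix|" ) > "$work/paths.idx"

# "Client" side: fetch the index (cp here, scp in practice) and search it.
cp "$work/paths.idx" "$client/.musicidx"
results=$(grep -F mytune "$client/.musicidx")
printf '%s\n' "$results"

rm -rf "$work"
```

A line-per-path index breaks on file names that contain newlines, and it only covers names, not file contents; a real locate database or a content indexer handles both better.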

  • by mrnutz (108477) on Monday October 19, 2009 @08:33PM (#29802439)

    (Disclaimer: I work for Extensis)

    Portfolio Server can continuously index files on SMB/CIFS (and AFP) volumes using a feature called "AutoSync". Web and Desktop (Windows/Mac) clients then search by folder name, file name, document text, or other metadata. Indexing and thumbnail creation take place on the server, so clients are relieved of any cataloging workload and metadata is centralized.

    http://www.extensis.com/en/products/portfolioserver9/overview.jsp

  • by peter (3389) on Monday October 19, 2009 @10:31PM (#29803399) Homepage

    FYI, GNU find has xargs built in these days:

    find -name '*.php*' -exec grep func {} +

    The + instead of ';' makes it collect up multiple arguments to grep like xargs, instead of the traditional find -exec behaviour, which is like xargs -n1. I use -exec {} + all the time, because it's less typing, and safe with filenames with punctuation or whitespace, so you don't have to type -print0 | xargs -0 either. (BTW, if you have a list of filenames that you process with something line oriented, you can use xargs -d'\n')
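
A small sketch comparing the two forms on file names that contain spaces. The directory layout and file contents are invented for the demonstration, and the second command assumes a find and xargs that support `-print0` and `-0` (GNU and BSD versions do).

```shell
# Throwaway tree with awkward names: a directory and a file containing spaces.
demo=$(mktemp -d)
mkdir -p "$demo/src dir"
printf 'function func() {}\n' > "$demo/src dir/a b.php"
printf 'func();\n' > "$demo/c.php5"
printf 'func in a text file\n' > "$demo/notes.txt"

# Batched form: find hands many file names to a single grep, like xargs would.
batched=$(find "$demo" -name '*.php*' -exec grep -l func {} + | sort)

# Explicit NUL-delimited pipe: same matches, more typing.
piped=$(find "$demo" -name '*.php*' -print0 | xargs -0 grep -l func | sort)

printf '%s\n' "$batched"
rm -rf "$demo"
```

Both variables end up holding the same two paths, and notes.txt is excluded by the -name pattern in each case.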

  • Re:Call the NSA (Score:2, Informative)

    by jo42 (227475) on Monday October 19, 2009 @11:06PM (#29803641) Homepage

    Or Google. They already wire-tap your voice mails, index them and post the results for public consumption [slashdot.org].

  • by bhpaddock (830350) on Tuesday October 20, 2009 @12:49AM (#29804205) Homepage

    For indexing files, you're better off using Windows Search 4, a free download for Windows Server 2003. The old content indexing service is deprecated and a much older technology. It's useful in some particular scenarios but for a smaller (100,000 - 250,000 items*) corpus of file content, WS4 will work much better. And for larger repositories, SharePoint and Microsoft Search Server are almost always better options.

    * = Server 2008 R2 / Win7 has a newer version of the Windows Search indexer that scales better to even larger corpuses.

  • Re:NO! Try Alfresco (Score:3, Informative)

    by thule (9041) on Tuesday October 20, 2009 @02:37AM (#29804623) Homepage
    My understanding is that you only get the full document text search when the data is backed by a real SQL Server license. The person was looking for a full search solution. This is built into Alfresco.
    SQL Server is licensed per CAL even though the app is a web app.
  • by blowdart (31458) on Tuesday October 20, 2009 @03:43AM (#29804931) Homepage
    Oh well, if we're recommending MS solutions on slashdot (ah karma suicide) then good old Windows Desktop Search works just as well. Since V4.0 came out you can have WDS on other machines, indexing away, and it's the remote index that is queried - so no need for local machines to index remote shares. Plus, like SharePoint (spit) indexing, and Index Server before that, it uses IFilters, so format-aware indexing is available for most of the common formats a business uses.
  • Re:NO! Try Alfresco (Score:3, Informative)

    by jasonwea (598696) * on Wednesday October 21, 2009 @12:10AM (#29819249) Homepage
    Full text search in Alfresco uses Lucene. Or at least it did when I deployed it on Debian with PostgreSQL.
