Slashdot is powered by your submissions, so send in your scoop

What Desktop Search Engine For a Shared Volume? 232

Posted by timothy on Monday October 19, 2009 @06:45PM from the which-side-of-the-firewall dept.

kriston writes 'Searching data on a shared volume is tedious. If I try to use a Windows desktop search engine on a volume with hundreds of gigabytes the indexing process takes days and the search results are slow and unsatisfying. I'm thinking of an agent that runs on the server that regularly indexes and talks to the desktop machines running the search interface. How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred gigabyte volume on its own?'

This discussion has been archived. No new comments can be posted.

What Desktop Search Engine For a Shared Volume?

Load All Comments

Search 232 Comments Log In/Create an Account

Comments Filter:

Call the NSA (Score:5, Funny)

by Anonymous Coward writes: on Monday October 19, 2009 @05:52PM (#29800715)

They already have it indexed for you.

Share
twitter facebook
- Re: (Score:2, Informative)
  
  by jo42 ( 227475 ) writes:
  
  Or Google. They already wire-tap your voice mails, index them and post the results for public consumption [slashdot.org].
wow! (Score:5, Funny)

by tivoKlr ( 659818 ) writes: on Monday October 19, 2009 @05:53PM (#29800733) Journal

It's been an hour since this story was posted.
You've stumped Slashdot. Bravo!

Share
twitter facebook
- Re: (Score:2, Informative)
  
  by beelsebob ( 529313 ) writes:
  
  I was wondering if "spotlight on OS X does that just fine" was too trollish... But, too late, I said it, and it's true, can't be trollish if it's true can it?
  - Re: (Score:2)
    
    by cyber-dragon.net ( 899244 ) writes:
    
    That's what we use... server version indexes servers, SAN volumes etc and makes them searchable from each desktop ;)
    - Re: (Score:2)
      
      by v1 ( 525388 ) writes:
      
      and it's faaaaast
      It searches by content as well as by filename too for those that didn't know. Incredibly useful.
  - Spotlight (Score:2)
    
    by Herve5 ( 879674 ) writes:
    
    ... seconded
  - - Re: (Score:2)
      
      by willy_me ( 212994 ) writes:
      
      I believe that the newer versions of OSX server will perform indexing on the volume so the clients do not have to. So searches are fast and the problems you described are gone.
      Ok, just went over to apple.com to find the info page. It is here [apple.com].
Google Enterprise Search (Score:5, Interesting)

by HeavyD14 ( 898751 ) writes: on Monday October 19, 2009 @05:54PM (#29800757) Homepage

Not that I've ever used it before, but it sounds like it does what you want: http://www.google.com/enterprise/search/gsa.html [google.com]

Share
twitter facebook
- Everything (Score:2, Informative)
  
  by OttoErotic ( 934909 ) writes:
  
  How about Everything [voidtools.com] (assuming the server is Windows & NTFS)? Works well for me (quickest desktop search I've found yet), and can either run locally or connect to an ETP server. The site seems to be down right now, but here's the original Lifehacker article [lifehacker.com] where I found it. Incidentally, I never heard of ETP til I started using it. Anyone know if it's an Everything-specific protocol?
  - Re: (Score:2)
    
    by The MAZZTer ( 911996 ) writes:
    
    Well it sounds like it is. I should note that Everything only indexes filenames, so if you want to index file CONTENTS you're out of luck (that sort of thing is GOING to take a long time anyway, since you have to read every file on disk that the indexer knows how to parse, so "quicker" could well translate to "less complete search index").
    But if you don't care about indexing contents then Everything should work fine for you.
  - Re: (Score:2)
    
    by stg ( 43177 ) writes:
    
    Everything only searches for file and folder names. While that is useful too, usually Search Engine presumes searching inside the files too.
- Re: (Score:2)
  
  by xoundmind ( 932373 ) writes:
  
  It's exactly whats he wants. In a shared Windows environment, it beats the native Outlook search speed handily and covers my organization's shared drive. Actually, the search speed has saved my a few times in being able to reconstruct a problem and react accordingly.
  - Re: (Score:2)
    
    by lorenlal ( 164133 ) writes:
    
    ...it beats the native Outlook search speed handily...
    Honestly? I beat native Outlook search on a fairly regular basis. I cried tears of happiness and joy when Lookout hit the scene. I choked up a bit when Microsoft bought them out, but I recovered when it looked like they integrated the engine into the engine Microsoft uses to do Desktop search on XP.
- Seriously, ask slashdot? (Score:2)
  
  by GeckoAddict ( 1154537 ) writes:
  
  There are a number of decent options, which one to pick depends on specific requirements not included in the original question. Did the OP even search?
  
  http://en.wikipedia.org/wiki/List_of_enterprise_search_vendors [wikipedia.org]
  - Re: (Score:3, Insightful)
    
    by cbiltcliffe ( 186293 ) writes:
    
    Maybe search, but didn't know what to search for.
    If this is a guy who's used to doing home and small business support, with a handful of machines at best, and kind of got thrown into the deep end by management because he's "good with them thar cmpooterz" then he may not be thinking "enterprise" search.
    After all, why would anybody need a search engine to find a starship?
    Probably searched for "shared drive search engine" or something like that.
solution to hundreds of terabytes of docs (Score:3, Interesting)

by Anonymous Coward writes: on Monday October 19, 2009 @05:55PM (#29800771)

how about using a program like Documentum? We generate several thousand technical documents and drawing a month, and use it for all our document management needs.

Share
twitter facebook
- Re: (Score:2)
  
  by mu51c10rd ( 187182 ) writes:
  
  I assume you mean the Verity part. Documentum can be a bit pricy for the whole suite of DiskXtender, AppXtender, and Verity, but it does work great for this purpose.
A couple of options (Score:4, Informative)

by Unhandled ( 1660063 ) writes: on Monday October 19, 2009 @06:48PM (#29801353)

Here's a few options you might want to consider: 1) Use Office SharePoint Server 2007 to index the share 2) Upgrade to Windows Server 2008 (or above) and Windows Vista (or above) and use the Federated search feature: http://trycatch.be/blogs/roggenk/archive/2007/11/05/windows-vista-amp-windows-server-2008-federated-search.aspx [trycatch.be]

Share
twitter facebook
- NO! Try Alfresco (Score:5, Informative)
  
  by thule ( 9041 ) writes: on Monday October 19, 2009 @07:28PM (#29801809) Homepage
  
  SharePoint is $$$$. Try Alfresco. Alfresco can look like a file share (support SMB, DAV, FTP, etc). The indexing is built is and does not require a separate SQL Server license.
  
  Parent Share
  twitter facebook
  - Re:NO! Try Alfresco (Score:5, Informative)
    
    by Orion Blastar ( 457579 ) writes: <orionblastarNO@SPAMgmail.com> on Monday October 19, 2009 @07:39PM (#29801923) Homepage Journal
    
    You mean the Document Management Alfresco [alfresco.com] and not the CMS software. The Community Edition is free but unsupported, and the Enterprise edition has a free 30 day trial. It looks like it won a government award for document management which is rare for open source document management software.
    
    Parent Share
    twitter facebook
  - Re: (Score:2)
    
    by spyrochaete ( 707033 ) writes:
    
    You could use Microsoft Enterprise Search Server Express [microsoft.com] which is free (if you have a Windows Server license laying around). It's the same search engine as MOSS without the CMS functionality and it can crawl just about everything either natively or with connectors. You can use MSSQL Express as the database engine which is also free.
    Or you could go completely open source with Apache SOLR [apache.org], though I hear it's so featureful that it's very difficult to install and configure.
  - - Re: (Score:3, Informative)
      
      by thule ( 9041 ) writes:
      
      My understanding is that you only get the full document text search when the data is backed by a real SQLServer license. The person was looking for a full search solution. This is built into Alfresco.
      SQLServer is per CAL even though the app is a web app.
      - Re: (Score:3, Informative)
        
        by jasonwea ( 598696 ) * writes:
        
        Full text search in Alfresco uses Lucene. Or at least it did when I deployed it on Debian with PostgreSQL.
- Re: (Score:3, Insightful)
  
  by shutdown -p now ( 807394 ) writes:
  
  Here's a few options you might want to consider: 1) Use Office SharePoint Server 2007 to index the share
  First, MOSS isn't free.
  Second, have you ever actually tried using SharePoint 2007 text search feature? I dunno what it indexes, but finding anything in that afterwards is about as convenient as searching for a needle in a haystack.
  There have been claims of some huge improvements in the upcoming SP2010, which is not surprising in light of Bing, but that's not released yet.
- Re:A couple of options (Score:5, Funny)
  
  by cyber-dragon.net ( 899244 ) writes: on Monday October 19, 2009 @07:37PM (#29801887)
  
  You are on /. and actually recommending an upgrade to Vista?
  Brave man.
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Interesting)
    
    by popeyecu ( 1660147 ) writes:
    
    I got your back. I love Federated Search, and so do my clients. It's way easier than any other solution, because it's in Windows and it "Just Works." Try it before you bash it, /.
- Re: (Score:3, Informative)
  
  by blowdart ( 31458 ) writes:
  
  Oh well, if we're recommending MS solutions on slashdot (ah karma suicide) then good old Windows Desktop Search works just as well. Since V4.0 came out you can have WDS on other machines, indexing away and it's the remote index that is queried - so no need for local machines to index remote shares. Plus, like sharepoint (spit) indexing, and Index Server before that it uses iFilters, so format aware indexing is available for most of the common formats a business uses.
- Re: (Score:2)
  
  by MrNemesis ( 587188 ) writes:
  
  Coming from a company that made heavy use of sharepoint (2003 and 2007 - to be fair 2007 is aeons ahead of 2003 but inherited out projects' terrible design) I can heartily say that the index tool sucks, and the relevance engine is piss poor; no one could ever, ever find anything on the company intranet, and one of the helldesk's most idiotic tasks was keeping a bookmark of all the important sub-sites and documents on the intranet... and all because devs fell for the idiocy of the "don't bother organising it
Everything (Search Engine) (Score:2, Informative)

by dxdisaster ( 1121229 ) writes:

I guess it could work, although you can't index the files directly. You have to run a local copy and one on the server as an EPT Server. www.voidtools.com, although it seems to be down at the moment, so here's a link to the FAQ on Google's Cache: http://74.125.113.132/search?q=cache:fcYHcEJKH3UJ:www.voidtools.com/faq.php [74.125.113.132]
Federated Search (Score:5, Informative)

by Anonymous Coward writes: on Monday October 19, 2009 @06:49PM (#29801375)

MS does have a solution, it's called Windows Federated Search. Windows 7 with 2008R2 has it .. there might be a way to do with Windows Desktop Search 4.0. Here's some info on it - http://geekswithblogs.net/sdorman/archive/2009/05/14/windows-7-federated-search.aspx

Share
twitter facebook
- - Re: (Score:2)
    
    by icepick72 ( 834363 ) writes:
    
    But of course those search plugins in Explorer provide an answer to the question. If the built-in ones don't work, customized providers can be built because it's extensible. Where you think GP doesn't know what is being talked about, rather he does and your comment is short-sighted especially considering the link you gave provides this answer.
  - Re: (Score:2)
    
    by bhpaddock ( 830350 ) writes:
    
    Actually it's quite relevant. Windows 7 can federate queries to a SharePoint or Search Server index using OpenSearch.
    Also, Windows Vista and Win7 (and even XP with WS4 to some extent) can query remote Windows Search indexes. I use this functionality along with my Windows Home Server (running WS4) for my personal needs.
Sharepoint Services (Score:2)

by anexkahn ( 935249 ) writes:

If you have a windows server, you can tell Share point to index the file share. See: http://dotnetmafia.sys-con.com/node/1046930 [sys-con.com]
Enterprise Content Management with Alfresco (Score:5, Informative)

by RicRoc ( 41406 ) writes: on Monday October 19, 2009 @06:54PM (#29801439) Homepage

Yes, Google's Search Appliance (GSA) could be used, I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were dimensioned for light loads.
I would suggest using Alfresco http://www.alfresco.com/ [alfresco.com] as a CIFS (Common Internet File System) or WebDav store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags, etc. Each folder (or any item, in fact) could have the correct access control that would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/ [apache.org]
Alfresco comes in both Enterprise and Community Edition, it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!

Share
twitter facebook
- Re: (Score:2)
  
  by Tim C ( 15259 ) writes:
  
  There is some stuff you can do with GSA to try to implement document security - you can set up separate collections/indexes (I forget the GSA term) for different parts of your document repository, then restrict the search results to specific indexes based on the logged-in user's credentials. (That's assuming you roll your own interface)
  Note that my one use of GSA was a couple of years ago, and we had an extremely simple security model with only 2 user types - one got access to everything, one got access to
- Re: (Score:2)
  
  by spyrochaete ( 707033 ) writes:
  
  The GSA works perfectly well with many security standards (AD, NTLM, Kerberos, SAML, forms, cookies) and it obscures secure search results from users who do not have read permissions to those documents. It's probably easier to configure the GSA's security settings than with any other enterprise search platform.
  Furthermore, the GSA has a self-throttling feature called Host Load Scheduling which allows you to limit the number of connections opened per second. If that's not sufficient you can throttle the VL
Mirror it. (Score:5, Funny)

by palegray.net ( 1195047 ) writes: <philip.paradis@pa l e g r a y .net> on Monday October 19, 2009 @06:57PM (#29801479) Homepage Journal

You could just rsync the shared volume to a local drive as frequently as needed and run the search engine on the local copy.

Share
twitter facebook
- Re: (Score:2)
  
  by codepunk ( 167897 ) writes:
  
  So you would rsync hundreds of gig to a local disk to index it? I think I would rethink that strategy.
  - - Re: (Score:2)
      
      by Trahloc ( 842734 ) writes:
      
      So I should rsync my 20TB server to my 320gb drive? What was the point of the 20TB file server then?
      - Re: (Score:2)
        
        by Trahloc ( 842734 ) writes:
        
        Bingo, I guess Palegray has a much larger budget than most of us when it comes to storage. Afterall why have a single copy when you can have fifty.
        
        Re: (Score:2)
        
        by cbiltcliffe ( 186293 ) writes:
        
        Bingo, I guess Palegray has a much larger budget than most of us when it comes to storage. Afterall why have a single copy when you can have fifty.
        Well, at least that way you don't have to screw with tape drives......
    - Re:Mirror it. (Score:5, Insightful)
      
      by Makoss ( 660100 ) writes: on Monday October 19, 2009 @08:33PM (#29802441) Homepage
      
      Have you ever actually used rsyng on a decent sized file set? Determining the changed file set requires significant disk activity.
      It's a certain win when compared to just blindly transferring everything. But if you think that rsyncing 20 changed files in a 100 file working set is the same as rsyncing 20 changed files out of a 2,000,000 file working set you are very very wrong.
      Completely aside from the absolute insanity of suggesting that you replicate the full contents of the fileserver to every desktop, which has been covered by others.
      
      Parent Share
      twitter facebook
      - Re: (Score:2, Funny)
        
        by Saba ( 308071 ) writes:
        
        File modification date.
        
        Re: (Score:2)
        
        by L4t3r4lu5 ( 1216702 ) writes:
        
        Or checksum database.
    - Re: (Score:2)
      
      by jabuzz ( 182671 ) writes:
      
      Yeah, but what if your file system is 70TB with 20 million files?
- Re: (Score:2)
  
  by quanticle ( 843097 ) writes:
  
  It seems that the parent wants to merge a remote index with his desktop search so that he doesn't have to do this. Also, wouldn't giving each desktop its own copy of the data defeat the purpose of having a shared server?
  - Re: (Score:2)
    
    by palegray.net ( 1195047 ) writes:
    
    Whether it defeats the purpose or not depends entirely on the organization's needs. If querying data every few hours in a local app is the objective, that can be met quite effectively with mirroring. Disk space is cheap.
    - Re: (Score:2)
      
      by quanticle ( 843097 ) writes:
      
      Disk space is cheap when you're outfitting a single server. Outfitting even ten workstations with the same amount of disk can become quite expensive.
      - Re: (Score:2)
        
        by palegray.net ( 1195047 ) writes:
        
        Okay, so how much do you figure it will cost to outfit a server to be capable of supporting an arbitrary number of users, running extremely IO intestive reports on a shared volume whenever they feel like it, including the network infrastructure required to support this? Oh, and don't forget redundancy for the server.
        
        Trust me, I've learned from experience that the local disk space works out to be much cheaper for this sort of thing.
- Re: (Score:2)
  
  by adamchou ( 993073 ) writes:
  
  LOL, you've been modded funny because your idea seems so ridiculous yet after reading your comments, it becomes apparent you're being serious.
How about Spotlight? That works on shared volumes. (Score:4, Informative)

by thedbp ( 443047 ) writes: on Monday October 19, 2009 @07:02PM (#29801527)

*ducks*

Share
twitter facebook
- Re:How about Spotlight? That works on shared volum (Score:4, Informative)
  
  by Henriok ( 6762 ) writes: on Monday October 19, 2009 @07:25PM (#29801775)
  
  Yeah, my thought exactly? I wasn't aware that it was a problem searching hundreds of gigabytes on shared volumes. We have a couple of terabytes shared by our Mac servers and I don't think I've had search times longer than ten seconds over a couple of million files.. MS Office files, PDFs, movies, audio, pictures, photographs, text, HTML, source code.. all indexed with metadata and contents.
  Even the days before Spotlight, using AppleShare IP Servers in the 90s, finding stuff on the servers was never an issue. It has always been so fast that I have never even reflected over that it was fast. Maybe I should use some other operating system once in a while to experience what the majority experiences. Or not.. I'd rather stay care free and productive.
  Don't call me when you figure this out.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Insightful)
    
    by gd23ka ( 324741 ) writes:
    
    Spotlight is the obvious answer if you have OS X. Not everybody in the world is lucky enough to be in that
    position, most are stuck on one of the inferior platforms. Your rubbing it in, is not helping it just
    alienates people who already have been through enough and have it tough.
- Re: (Score:3, Funny)
  
  by RancidPickle ( 160946 ) writes:
  
  I've tried ducks, but they tend to nibble the occasional one or zero, and they leave an awful mess on the platters when they poop. Try Spotlight instead -- not as cute, but easier on the data, hardware, and the nose.
- Re: (Score:3, Interesting)
  
  by kuzb ( 724081 ) writes:
  
  Except then you have another terrible search solution which isn't meant for the amount of data you'd find on a large server. Worse, you have an operating system that is terrible as a server solution.
  On the other hand, you could just use a unix/linux distro of your choice, and beagle (http://beagle-project.org) - which is meant for indexing large amounts of data and has many clients some of which can remotely access it.
Use Microsoft Indexing Service (Score:2)

by Icono ( 238214 ) writes:

One way is to set up Microsoft Indexing Service on the server with the shared drive. The MSC console app provides a search capability and one can also use the Indexing Service SDK for client apps.
Hmm (Score:2)

by ShooterNeo ( 555040 ) writes:

Basically, you need your desktop search application to look at the index file on the remote file server generated by an instance of the application running on the file server. Technically, incredibly simple but I don't know which application currently available is divided into front and back ends like that. Maybe open source...
- Re: (Score:2)
  
  by spyrochaete ( 707033 ) writes:
  
  That's not really a "desktop search engine" anymore, but I agree with your suggestion. Desktop search is not the right tool to index shared content, lest you get a department full of PCs constantly indexing the same repository over slow ethernet wires. Microsoft Enterprise Search Server lets you separate the indexing and serving servers, but unfortunately the free Express version doesn't support this.
Use MSS 2008 Express, SharePoint, FAST (Score:3, Interesting)

by VTBlue ( 600055 ) writes: on Monday October 19, 2009 @07:09PM (#29801603)

Use Microsoft Search Server 2008 Express...its free, all you need is a free server box. Also Check out SharePoint Search and FAST enterprise search.
http://www.microsoft.com/enterprisesearch [microsoft.com]

Share
twitter facebook
Two words... (Score:2)

by roc97007 ( 608802 ) writes:

Google appliance...
Lucene is a great foundation for this (Score:2)

by sribe ( 304414 ) writes:

So I think I'd start by looking here [apache.org].
- Re: (Score:2)
  
  by Thundersnatch ( 671481 ) writes:
  
  Lucene has a horrifying tendecy to corrupt it's indexes frequently. And it doesn't do what the OP wants in any case, as it doesn't understand file formats or respect native OS permissions. Even the Lucene- based Nutch is not a good fit.
Cross-Platform (Score:2)

by pinkocommie ( 696223 ) writes:

Wondering if there's anything cross-platform. I'm in the process of setting up an OpenSolaris fileserver (primarily to use ZFS/Raid-Z) and have both linux and windows boxes. It would be great to be able to have an index on each that could be read by a client app or a unified index perhaps.
- Re: (Score:2)
  
  by spyrochaete ( 707033 ) writes:
  
  You want an enterprise search service. Microsoft and Google both make web-driven search solutions which work on any end user platform including mobile. There are many other companies that have web-based search services as well but those are the two goliaths.
Windows 7? (Score:2)

by craenor ( 623901 ) writes:

I am probably missing something obvious here or misunderstanding the question, however, I am very happy with the search integrated in Windows 7. I have about a terabyte of data across two different volumes, and when I use the regular Windows 7 search I get instant, detailed results.
PC Docs Docs Open (Score:2)

by Orion Blastar ( 457579 ) writes:

Docs Open is a commercial document management system [pcdocs.com] but right now their web page doesn't seem to be working. We used it at a law firm I worked at. IIRC it was able to search through the billions of documents that the 300+ lawyers used in their cases.
You don't (Score:3, Interesting)

by saleenS281 ( 859657 ) writes: on Monday October 19, 2009 @07:44PM (#29801965) Homepage

You don't allow every client to index. There's been several suggestions already, but most enterprises intentionally DISABLE desktop search. It absolutely slaughters the share. It's not a big deal when one user is doing it... but when 5,000 are, the I/O load becomes unsustainable.

Share
twitter facebook
Your problem is obvious (Score:2)

by Locke2005 ( 849178 ) writes:

If I try to use a Windows desktop search engine
Try Earth (Score:2, Interesting)

by theritz ( 1116521 ) writes:

"Earth allows you to find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes file systems in real time and reports all the changes back to a central database. This can then be queried through a simple, yet powerful, web interface. Think of it like Spotlight or Beagle but operating system independent with a central database for multiple machines with a web application that allows novel ways of exploring your data." http://open.rsp.com.au/projects [rsp.com.au]
- Re: (Score:2)
  
  by drinkypoo ( 153816 ) writes:
  
  Earth requires a single database of search results. This is stupid because it creates an unnecessary single point of failure. When I execute a search, it should query all the remote machines for their own databases. This creates a few more ACKs but otherwise puts little more load on the system. It would be nice to have the OPTION to store the catalogs on a repository, but being forced to means that I need literally another server on my network; I don't have anything running 24/7 right now that is capable of
Desktop search is not the way to go (Score:2)

by vmxeo ( 173325 ) writes:

Seriously. You're probably going to want a separate server(s) for this job. You didn't specifiy what you're indexing, how often, or where, however I'll make some assumptions and point you towards an enterprise search appliance or product. Many will probably point you to Google Enterprise Search. I've worked with the search functionality withing Microsoft Sharepoint 2007, and it's (ostensibly) free spin-off, Microsoft Search Server. Again, you'll probably need to dedicate some hardware to this. In addition t
Share searching. (Score:2)

by Timmmm ( 636430 ) writes:

I wrote a web site/spider to do this for the whole network at uni. It was beautiful C++ all the way. After I left some silly CS people rewrote it in Python/PHP (ugh) here: http://code.google.com/p/trufflepig/ [google.com]
Portfolio Server (Digital Asset Management) (Score:2, Informative)

by mrnutz ( 108477 ) writes:

(Disclaimer: I work for Extensis)
Portfolio Server can continuously index files on SMB/CIFS (and AFP) volumes using a feature called "AutoSync". Web and Desktop (Windows/Mac) clients then search by folder name, file name, document text, or other metadata. Indexing and thumbnail creation takes place on the server, so clients are relieved of any cataloging workload and metadata is centralized.
http://www.extensis.com/en/products/portfolioserver9/overview.jsp
Use Windows Indexing Service (Score:2)

by a.koepke ( 688359 ) writes:

I am just embarking on a project to do exactly what the OP is asking for. Windows Server 2003 has an indexing service you can setup. http://www.windowsnetworking.com/articles_tutorials/Working-With-Windows-Server-2003-Indexing-Service.html [windowsnetworking.com] It is limited in its own form but provides the back-end tools you need.

Combine that with the next article from that site and you have a solution: http://www.windowsnetworking.com/articles_tutorials/Making-Windows-Server-2003-Indexing-Service-Useful.html [windowsnetworking.com]

This article s
- Re:Use Windows Indexing Service (Score:4, Informative)
  
  by bhpaddock ( 830350 ) writes: on Tuesday October 20, 2009 @12:49AM (#29804205) Homepage
  
  For indexing files, you're better off using Windows Search 4, a free download for Windows Server 2003. The old content indexing service is deprecated and a much older technology. It's useful in some particular scenarios but for a smaller (100,000 - 250,000 items*) corpus of file content, WS4 will work much better. And for larger repositories, SharePoint and Microsoft Search Server are almost always better options.
  * = Server 2008 R2 / Win7 has a newer version of the Windows Search indexer that scales better to even larger corpuses.
  
  Parent Share
  twitter facebook
Try Xapian Omega (Score:2)

by Sarusa ( 104047 ) writes:

I had this same problem not too long ago - we have a shared documentation tree with tens of thousands of documents that I wanted to index. I tried dozens of search engines in my spare time, most of which were just horrible (Beagle), were a nightmare to install for someone like me who's not a full or even part time admin (Apache SOLR), wouldn't allow cross platform access (lots of Windows ones, obviously), store a complete separate copy of every document (Alfresco, which didn't seem to have an option to ) an
- Re: (Score:2)
  
  by socsoc ( 1116769 ) writes:
  
  If you weren't the IT guy, why were you doing this?
  - Re: (Score:2)
    
    by Sarusa ( 104047 ) writes:
    
    Good question. I'm doing it for my project team, being the lead software guy. The IT dept. is a small team and kept busy with other stuff - but they gave us hardware for a team Linux server when I asked for it and let me admin it, so I'm just grateful that they're helpful when so many would be obstructionist.
    Rambling: I've also installed a wiki, which is getting rave reviews from the people using it; hoping to make that popular enough that IT will /want/ to take it over.
There's also an alternative to GSA and Sharepoint (Score:2)

by binaryspiral ( 784263 ) writes:

From IBM and Yahoo called OmniFind. It runs on a desktop or server and can index multiple shares... and the basic version is free but offers a lot of functionality.
Although if your business is booming, a GSA is freakin' sweet.
Depends on the server, doesn't it? (Score:2)

by GNU(slash)Nickname ( 761984 ) writes:

Novell's QuickFinder (http://www.novell.com/products/openenterpriseserver/quickfinder.html) works well in a Netware or OES2 environment. It even respects file permissions when displaying results.
Microsoft / Windows Search options (Score:2)

by bhpaddock ( 830350 ) writes:

Microsoft has a few solutions you can consider depending on your specific needs.
With Windows XP/2003, Vista/2008, or Windows 7 - you can install Windows Search 4 (not necessary on Win7, but recommended for Vista) on the server side to index the content, and then if you have WS4 (or Win7) on the client, it will automatically query the remote index when you perform searches against that file share.
Alternatively, if you run the free Microsoft Search Server (the Express version is free) which is based on ShareP
Pfff... (Score:2)

by ls671 ( 1122017 ) * writes:

Slackware is at version 13 which makes it much more advanced than a version 7.
Read at Slackware got to version 13 so quickly at this link:
http://en.wikipedia.org/wiki/Slackware [wikipedia.org] ;-)))
Access Rights? (Score:2)

by An anonymous Frank ( 559486 ) writes:

I had a look at some solutions last year, and ran into one hell of a road block; most solutions I had a look at presume that all the information you're indexing should be searchable and/or available to anyone that can reach your search tool's client.
Has anyone had experience with something that will search the indexes for items based on your credentials? (Meaning that if you're not in accounting you can't get results for that data set)
Looking at the wrong problem? (Score:2)

by thogard ( 43403 ) writes:

Ever consider hiring a librarian? I've worked at 3 small companies that had one and they were far more profitable than most of their competition because there was someone in charge of organizing the data.
- Re: (Score:2)
  
  by Jesus_666 ( 702802 ) writes:
  
  In fact, shouldn't it be possible to mirror the locate database to the local file system so that local calls to locate will show the proper results on the share? Granted, you lose the ability to index the local file system but depending on the setup that might not actually be a loss.
- Re: (Score:2)
  
  by ls671 ( 1122017 ) * writes:
  
  Hmmm... locate doesn't allow you to search within files. What about using rgrep or grep -r ?
  find is great too (but slower on the first run before results get cached by the kernel, if you have enough spare memory) when you need to know which files have been modified in a given period of time, which files take more room on the disk, etc..
  I usually disable locate for security reasons, at least use slocate ! ;-)
  So I'd say I use find and rgrep ;-)
- Re: (Score:2)
  
  by hey ( 83763 ) writes:
  
  You could make a web interface to locate.
  (Only searches files names.)
- Re: (Score:2)
  
  by quanticle ( 843097 ) writes:
  
  Well, that runs into the problem the OP has discussed. If the data is present as a network share, it'd take slocate forever to index the data on the remote server. Basically, he or she wants a way to run slocate once on the server and have that index file be merged with all of the individual desktops. That way, each desktop wouldn't have to go through the effort of duplicating work.
  - Re: (Score:2)
    
    by h4rr4r ( 612664 ) writes:
    
    So use ssh and run slocate on the server, or share out the slocate.db file.
    - Re: (Score:2)
      
      by quanticle ( 843097 ) writes:
      
      Yeah, that could work, but I don't think it'd be as seamless as the OP wants. The user would still have to select which db file to use. Still, its a solution.
      - Re: (Score:2)
        
        by h4rr4r ( 612664 ) writes:
        
        or here is a real smart idea:
        In each users path place a locate_on_server.sh script that just runs "locate -d $PATHTODBONSERVER".
        -d can take multiple database filename arguments, so you could have one locate_on_server that searches all your fileservers.
        
        Re: (Score:2)
        
        by zippthorne ( 748122 ) writes:
        
        Heck, you could skip the script and just alias locate
        But you still run into the problem that it runs from the command line and the database is byte-order dependent.
  - Re: (Score:2)
    
    by Trahloc ( 842734 ) writes:
    
    Well a problem with slocate is that it doesn't track changes live. Its basically a prettier version of a find dump into txt file then grep it. Something that tracks files live on the server end which can be searched remotely. Heck even a web interface or ssh would fit my needs, it doesn't need a pretty popup window thing.
- Re: (Score:2)
  
  by beelsebob ( 529313 ) writes:
  
  And then you sit and wait for ages for find to finish, and then you realise that it only searches in the file names, and not the contents of the file. Of course, what I do, is ssh in, and then use mdfind, but yeh, find doesn't cut it on multi-terrabyte volumes, and especially not when you want to search on more than just the name.
  - Re: (Score:2)
    
    by kabloom ( 755503 ) writes:
    
    Wrong! The "grep" part of that command searches the contents of the files.
    But if you think you can get away with just grep on large amounts of data, you really ought to learn something about how indexing works and how much faster it can make your searches.
- Re: (Score:3, Informative)
  
  by peter ( 3389 ) writes:
  
  FYI, GNU find has xargs built in these days:
  find -name '*.php*' -exec grep func {} +
  the + instead of ';' makes it collect up multiple arguments to grep
  like xargs instead of the traditional find -exec behaviour which is like xargs -n1. I use -exec {} + all the time, because it's less typing, and safe with
  filenames with punctuation or whitespace, so you don't have to type -print0 | xargs -0 either. (BTW, if you have a list of filenames that you processes with something line oriented, you can use xargs -d'\n
- Re: (Score:2)
  
  by rrohbeck ( 944847 ) writes:
  
  ssh user@fileserver 'cd /shared/myproject;wcfind . -name _\* -prune -o -type f -! -name \*.o -a -! -name \*.a -a -! -name \*.so\* -a -! -name \*.d -print | while read a; do ( file "$a" | fgrep -q 'text' ) &>/dev/null && fgrep -H "" "$a"; done | gzip -c9' >~/index/myproject.gz
  - Re: (Score:2)
    
    by rrohbeck ( 944847 ) writes:
    
    Oops, forgot to add
    ... and then
    zfgrep goddamnfunctionname ~/index/myproject.gz|less
    That's actually quite quick, for a couple MLoC.
- Re: (Score:3, Funny)
  
  by ubrgeek ( 679399 ) writes:
  
  I see you are trying to write a funny post. Would you like me to help with that?

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Call the NSA (Score:5, Funny)

Re: (Score:2, Informative)

wow! (Score:5, Funny)

Re: (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Spotlight (Score:2)

Re: (Score:2)

Google Enterprise Search (Score:5, Interesting)

Everything (Score:2, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Seriously, ask slashdot? (Score:2)

Re: (Score:3, Insightful)

solution to hundreds of terabytes of docs (Score:3, Interesting)

Re: (Score:2)

A couple of options (Score:4, Informative)

NO! Try Alfresco (Score:5, Informative)

Re:NO! Try Alfresco (Score:5, Informative)

Re: (Score:2)

Re: (Score:3, Informative)

Re: (Score:3, Informative)

Re: (Score:3, Insightful)

Re:A couple of options (Score:5, Funny)

Re: (Score:3, Interesting)

Re: (Score:3, Informative)

Re: (Score:2)

Everything (Search Engine) (Score:2, Informative)

Federated Search (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Sharepoint Services (Score:2)

Enterprise Content Management with Alfresco (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Mirror it. (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Mirror it. (Score:5, Insightful)

Re: (Score:2, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

How about Spotlight? That works on shared volumes. (Score:4, Informative)

Re:How about Spotlight? That works on shared volum (Score:4, Informative)

Re: (Score:2, Insightful)

Re: (Score:3, Funny)

Re: (Score:3, Interesting)

Use Microsoft Indexing Service (Score:2)

Hmm (Score:2)

Re: (Score:2)

Use MSS 2008 Express, SharePoint, FAST (Score:3, Interesting)

Two words... (Score:2)

Lucene is a great foundation for this (Score:2)

Re: (Score:2)

Cross-Platform (Score:2)

Re: (Score:2)

Windows 7? (Score:2)

PC Docs Docs Open (Score:2)

You don't (Score:3, Interesting)

Your problem is obvious (Score:2)

Try Earth (Score:2, Interesting)

Re: (Score:2)

Desktop search is not the way to go (Score:2)

Share searching. (Score:2)

Portfolio Server (Digital Asset Management) (Score:2, Informative)

Use Windows Indexing Service (Score:2)

Re:Use Windows Indexing Service (Score:4, Informative)

Try Xapian Omega (Score:2)

Re: (Score:2)

Re: (Score:2)

There's also an alternative to GSA and Sharepoint (Score:2)