Desktop Search, or just where did I leave that knowledge? · 12/13/2004 02:27 PM, Tech Search

(If you are here from a search, thanks! This article is part of a series on desktop search. There have been updates, changes, all sorts of new posts, and the best way to see them all is with the “Browse your favorite category” drop down to the right. Choose “Search” and have fun, or click here.

Feel free to start with this one, it’s the first, but then read the rest for updates.)

(Yet another update. Yahoo announces a desktop search based on X1. Good idea, except X1 phones home (more details below), and some people report very heavy CPU usage during indexing. Will phoning home be part of the (presumably free) version of YDS? And will X1 continue to sell their own version with extra features, or retrench to a corporate market?

Ask Jeeves released their Desktop Search here

I’ve installed Copernic. More below… but I found a major flaw that is a dealbreaker for me. Uninstalled, reluctantly.

Yes, Google Desktop Search has been released. I talk about it a bit but I don’t love it. Blinkx has a new version which corrects many deficiencies, but still just feels middling; in addition, their smart founders have moved on, so unclear where this will end up.

More links to free/open-source options added, for those who prefer to see the details of what’s running on their machines.)

Desktop search is the new hotspot. While some folks say that consumers don’t care, I disagree… everyone I’ve talked to can’t wait to find the one that works. The need for “web integration” or “index all my media as well as my info” is less important, but if someone does it well, then maybe it will take off.

While we all expect the OS to help with the search problem, Windows Search has gotten more and more deficient with each version. Even with all my hacks, I still can’t get WinXP search to search all the files I want it to. Searching for an old email in Outlook is a nightmare. Apple’s Sherlock is pretty good, and has lots of plugins, but isn’t as fast or as flexible as it could be, according to some critics. Now that Google and Yahoo have upped their mail limits, expect searching of hosted mail to be a big deal as well (and more on these players down below).

How do these work and differ?
There are two ways to search a large volume of text and data:

  1. Make an index of terms and search that
  2. Open each file and search it

Both have their advantages and disadvantages. For example, building the index means one has to, well, build and store the index. This takes time, CPU, and drive space. If the index is not well designed, there are limits to the types of searches which can be done. In addition, no searches can be done until the index is created, and if it’s not kept updated, then searches return erroneous results.
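To make the index approach concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration: the `documents` dictionary is a hypothetical stand-in for files on disk, and none of the products discussed here is claimed to work exactly this way.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, term):
    """Look up one term in the index; no file is opened at query time."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical in-memory "files" standing in for documents on disk.
documents = {
    "invoice.txt": "Invoice for desktop search software",
    "notes.txt": "Notes on search engines and indexing",
    "todo.txt": "Buy milk",
}
index = build_index(documents)
print(search(index, "search"))
```

All of the cost is paid up front in `build_index`; the lookup itself is a single dictionary access, which is why an out-of-date index quietly returns out-of-date answers.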

On the other hand, opening each file becomes problematic as you get more and more files. While the index can leverage all sorts of speed tricks, there are not many ways to speed up “loop over directories recursively, open each file, search either line by line or byte by byte for pattern, print if found, repeat”.
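The quoted loop translates almost word for word into code. The sketch below is a bare-bones version in Python; the function name `grep_tree` and the throwaway demo directory are made up for illustration, and real grep-style tools add plenty of refinements on top.

```python
import os
import re
import tempfile


def grep_tree(root, pattern):
    """Walk root recursively and yield (path, line_number, line) for each match."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # errors="replace" keeps undecodable bytes from raising.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, line_number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, locks): skip it and move on.
                continue


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "mail"))
    with open(os.path.join(root, "mail", "old.txt"), "w", encoding="utf-8") as handle:
        handle.write("hello\nthe invoice you asked about\n")
    with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("nothing relevant here\n")
    hits = [(os.path.relpath(path, root), number, line)
            for path, number, line in grep_tree(root, r"invoice")]
print(hits)
```

Every query re-reads every file, so the work grows with the size of the tree rather than with the number of matches.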

Also, each tool searches a different set of your files. Some also search Outlook emails, some search Outlook Express emails, some search MP3 and image tags. What data each reads is up for grabs: Some can only handle text, some can read Office formats, and some can handle PDF. Some read email attachments and also index them, some don’t. Some will index network drives/folders, while others only read local drives until you pay to unlock additional capability.

The tools also differ in how sophisticated a query they allow. Some allow wildcards, stemming, phrase search, boolean and/or, or even full regex. Some allow “find where word near word” and some have “relevance” searches. So, depending on your need, you may need to shell out some bucks, or accept that some clever and useful search techniques will not be available to you.
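As a rough illustration of what boolean and “word near word” searches involve, here is a small Python sketch built on a positional index (one that remembers where each term occurs). The function names and sample documents are hypothetical, and this is one simple way to do it, not a description of any product above.

```python
import re
from collections import defaultdict


def build_positional_index(documents):
    """Map term -> {document name -> [word positions of that term]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, term in enumerate(words):
            index[term][name].append(position)
    return index


def docs_with(index, term):
    """Set of documents containing the term (empty if the term is unknown)."""
    return set(index[term]) if term in index else set()


def search_and(index, a, b):
    return docs_with(index, a) & docs_with(index, b)


def search_or(index, a, b):
    return docs_with(index, a) | docs_with(index, b)


def search_near(index, a, b, distance):
    """Documents where a and b occur within `distance` words of each other."""
    hits = set()
    for name in search_and(index, a, b):
        if any(abs(pa - pb) <= distance
               for pa in index[a][name] for pb in index[b][name]):
            hits.add(name)
    return hits


documents = {
    "a.txt": "desktop search is the new hotspot",
    "b.txt": "search the web and then the desktop",
    "c.txt": "windows has a desktop",
}
index = build_positional_index(documents)
print(sorted(search_near(index, "desktop", "search", 2)))
```

Boolean and/or falls out of plain set operations; proximity needs the extra position lists, which is one reason a simpler index design limits the kinds of searches on offer.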

So, what tools are out there to search your drives?

Index-based Searchers

Blinkx is gradually moving up the curve from hidden to hot. They are trying to mix web and local search, and have some clever visualization. I don’t love the “web and your stuff” mix in searching; I trust my stuff but the web data needs verification (this is even more annoying with Google Desktop Search: I find it really frustrating for it to ping out to the wire when I am trying to find one of my emails). (BTW, if you want to see great visualizations, play with Kartoo, requires Flash.) Blinkx appears to handle email and (from their FAQ) .txt files, Adobe PDFs, PowerPoints, Excel spreadsheets, emails and Word documents. Their new 2.0 includes “Smart Folders” (more details here) as a saved search/view. This has gotten lots of buzz, but no one has said to me “this is the one to get”, so I haven’t tried it yet. (BTW, Blinkx has nothing to do with BlinkPro, the bookmark tool I recommend elsewhere.)

dtSearch has been doing this stuff for a while, and was one of the first to offer a desktop search. They create an index and, depending on your spend, can even create a shared index for a shared network search, like a private company search engine. This stuff is not cheap, with the entry level at $200, but it’s very powerful, designed for large collections of data (say, all of your invoices for the last 10 years, or whatever). This is probably the most sophisticated tool out there, and so is probably overkill for most users.

X1. You know, I really, really want to like X1. They do lots of things well. Their unique feature is the “search while you type” approach. X1 indexes the usual stuff, and then as you type, the list of hits shrinks to include only the matches (if you remember Lotus Magellan from the 80s, this will be a welcome return of an old friend; the Magellan gang helped create this tool). It’s a nice graphic tool, with separate tabs for each of the searches you may want (email vs. files vs. contacts, etc.). Other tools can mix all “types” of files in search results, but X1 keeps them on separate tabs. In addition, it can’t handle wildcards, but forum posts imply that this is coming.

So, what bugs me? It’s really expensive, starting at $100 (though sometimes discounts knock it down to $75 or so). No support for wildcards: if you are looking for words ending in “raro”, for example, there is no "*raro" or "?raro"... all search terms must start a word. Also, it phones home at each use, primarily to check if it’s a pirated version or not. As I’ve said elsewhere, I don’t pay money to be watched, and if they have to stop pirates, they should do it without having my machine send info about me back to a company. It’s in their privacy policy, and so they aren’t doing anything illegal… but it still sucks. So, for this price, and with a phone home “feature”, it’s cool, but not cool enough to put up with its problems.
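One plausible reason word-start matching is cheap while "*raro" is not (this is my assumption about index design in general, not anything X1 has said about its internals): a sorted term list can be binary-searched for a prefix, but a leading wildcard gives the sort order nothing to work with. A small Python sketch, with a made-up term list:

```python
import bisect
import fnmatch

# Hypothetical list of indexed terms, kept sorted.
terms = sorted(["carraro", "rare", "rarity", "raro", "search", "ferraro"])


def prefix_matches(sorted_terms, prefix):
    """Binary-search to the first term >= prefix, then read forward."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def wildcard_matches(sorted_terms, pattern):
    """A leading wildcard defeats the sort order, so every term is tested."""
    return [term for term in sorted_terms if fnmatch.fnmatchcase(term, pattern)]


print(prefix_matches(terms, "rar"))
print(wildcard_matches(terms, "*raro"))
```

The prefix lookup touches only the matching neighbourhood of the list, which suits search-as-you-type; the wildcard version has to visit every term unless the tool keeps a second, reversed index.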

Enfish has been in this game for a while as well; I beta tested a couple of early versions. The product was not only a search, but a powerful integrator of the various Outlook data types, way before MS built some of that into Office. So, for example, you could search for a name, and results would include emails to that person, from that person, their contact info (if you have it), maps, all sorts of data. It also searched the web, etc. It builds an index, and searches the usual suspects, with a file viewer built in.

It looks like they have now separated the search piece into a standalone tool, Find, for $50 and the full “integrator” for $200. When I tested it, I liked it, but it was somewhat slow to index and, because of all the graphics and integration, somewhat slower “feeling” than the other tools. The file preview feature was pretty nice, and if I recall, that same feature allowed it to search a much wider variety of file types than most of the tools I looked at. But I haven’t really used it in over a year, so try the trial yourself. If you have lots of older file formats and still want to search and view them, this may be your best choice.

Copernic Desktop has been in the “search aggregation” space for a while, with a client-side tool “Agent” which would aggregate search results from multiple engines. While I never found much use for it, some friends of mine swore by it. Well, they now have a desktop search, which works like their previous products: A free entry level, and a for-pay advanced version. The entry level looks pretty good, with an attempt to be “light” on resource use, and searches a wide variety of media types (Video, MP3, Favorites, History, etc.) These are worth looking at if you don’t already use a tool to index these; it feels rather bolted on otherwise. And yes, it indexes mail and the usual suspects in documents as well.

Its biggest flaw in my use is that it doesn’t index the entire file tree. That is, I expected it to log the location of every file, no matter what type it is, and then index contents of files it knows how to read. Instead, it only reads files that it can index… meaning text and various office formats, as well as (by default) .mht and .zip. Go to the Advanced Options page and you see that, under “Additional file types to index (name and properties only)”, you have to manually add the extension of each type of file you want listed. This is rather silly. There is no way to add *.* or other “list em all!” options. I will not be hand typing in the hundreds of extensions that files on my drive use; if anything, I would rather have a “do not list” extension field for the few file types that I know I will not care about. This makes it a dead end for my use; if I am trying to find a file, I shouldn’t have to have had the prescience to add it to an index list.
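For what it’s worth, the “list everything, minus a short do-not-list” behavior I’m asking for is not hard to sketch. The Python below is only an illustration of the idea; `list_files` and its `skip_extensions` parameter are hypothetical names, not a Copernic option.

```python
import os
import tempfile


def list_files(root, skip_extensions=frozenset()):
    """Record every file under root, except those with a skipped extension."""
    skip = {extension.lower() for extension in skip_extensions}
    listed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Files with no extension yield "" and are listed like any other.
            extension = os.path.splitext(filename)[1].lower()
            if extension in skip:
                continue
            listed.append(os.path.join(dirpath, filename))
    return sorted(listed)


# Demo on a throwaway directory: skip only .tmp, list everything else.
with tempfile.TemporaryDirectory() as root:
    for name in ("report.xyz", "README", "cache.tmp"):
        with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
            handle.write("x")
    names = [os.path.basename(path) for path in list_files(root, {".tmp"})]
print(names)
```

The point of the exclude list is that unknown extensions, and files with no extension at all, are findable by default.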

Also annoying is that, by default, if it doesn’t have focus, it acts like it’s minimized and stops indexing until 30 seconds of non-use. Yes, this can be turned off, but it takes some hunting in the options.

Look and feel wise, it’s pretty nice with a very “windowsy” look and feel. (And it’s based near me in Newton, Mass, but don’t let that sway you.) It does let you index network drives if they are mapped as a drive letter, but not by the \\fileserver approach (this was also mentioned below by some commenters on a previous version of this entry). As this is something many of the other free tools do not do, that’s a plus. But given the “only list files with certain extensions” problem, I don’t know how useful this will be.

So, in summary, for indexing a variety of “media types” in an attractive tool, Copernic has done a nice job. But by restricting the “file location” indexing only to files that a) have an extension and b) have that extension manually typed into a one line box, this tool is a huge letdown by not allowing me to locate files that I know are currently on my drive. If I wanted that “functionality”, I would just use Windows Search, which also won’t locate files unless they have a certain extension. If someone can tell me how to get Copernic Desktop to list every file (even if it doesn’t index the contents), I’ll gladly update this entry. But until then, I won’t be using this tool. And it’s too bad; they did so many other things quite right. (BTW, Copernic just got acquired by Canadian company Mamma.com, aka “the Mother of all search engines”).

DiskMeta Lite is the free version of the DiskMeta indexer. The “free” one is for non-commercial use only, and only indexes .txt, .doc, and .html. They have personal (around $50) and pro (around $100) editions. Looking at the Pro edition: They do appear to mention some extensions not often seen (like the .CHM requested by one of the commenters) as well as “Morphological support of the English language”. Also, no bones about it, they clearly support “local newtwork shared folders and on network mapped drives”. No, I haven’t tried this one yet either.

AskSam plays in the dtSearch space. A professional searching tool / freeform database, you basically “import” information not into an index, but into a database system. Then, you add additional info as you find it or create it, and it’s all searchable. This becomes a bit different than “Where did I put that file” and more into “if I am going to dump all my info somewhere, where should I put it…” Starts at $150. Similar “organizer” tools include Infoselect for $250, Zoot for $100, InfoRecall for the affordable $40, and NoteLens for the even more affordable $20.

Lookout is my current tool of choice. A beta product which went free when the two-programmer shop (2 ex-Netscape guys, btw) was acquired by Microsoft, this tool integrates into Outlook as a toolbar. It has a fast search, pretty fast indexing, and a combined output of the various things it searches. It also has flaws: the output window has no right click menu to move, delete, whatever the results (while X1 does offer this). Instead, you have to open each item to make changes. In addition, this requires the .Net Framework 1.1, which doesn’t hurt anything, but is yet another thing to install. While one guy took the money and moved on, another has stayed with MS and is still responding to user issues on the forum. A pretty strong query language, but overall a “no-frills” product. But for the price (Free!), it’s very fast, and has become my go-to tool again and again. Recommended to try, if you are an Outlook user.

File Searchers
If you are not into indexes, there are also GREP type tools, mostly text based, but some with GUIs. These will each be “open each file” tools, so keep in mind that they can search from the moment they are installed, but speed will vary with how much junk you make them search.

Wingrep is a $30 shareware tool which searches via Regex (and soundex, cool!) and even inside Zip files.
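If soundex is new to you: it reduces a word to a letter plus three digits so that similar-sounding spellings collide. Below is a sketch of the classic American Soundex in Python; whether Wingrep uses exactly this variant is not something I know, so treat it as an illustration of the technique only.

```python
# Digit codes for consonants; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character American Soundex code for a word."""
    letters = [char for char in word.lower() if char.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "hw":
            # h and w are ignored and do not separate equal codes.
            continue
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        # A vowel resets `previous`, so a repeated code after it counts again.
        previous = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```

A soundex search then compares codes instead of spellings, so a query for one spelling of a name can find the others.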

Astrogrep is an open-source project which works pretty similarly. You can search via regex or simple wildcards, and it can read most text files, but not any binary or zip files. Of course, it is free open source, and if you want to add additional features… the author welcomes it. It lacks some of the niceties of sorting output, etc. but for free, it’s pretty nice.

Agent Ransack is the free (or “lite”) version of another search tool, FileLocator Pro. While the feature list seems pretty basic, it is more powerful than it looks, it is free, and even the “pro” version is only $13, so if you like it, it’s easy to buy (as compared to the overpriced X1).

A free indexer that I completely forgot about until I got an email reminder is Wilbur, formerly commercial, now free and GPL, for Windows only at this time. It’s an indexer, can index in zip files, and can handle PDF files. It appears to have been updated last around April 30, 2004, so it is still under active development.

Ones I haven’t tried but should include:
Avafind shareware
AppRocket shareware
Locate32
Filehand is .net and used to be shareware but now is free.

Summary, Other Options

The portals are also playing in this space. Terra Lycos’ HotBot has had a desktop search tool for a while, but I’ve never tried it. MSN/MS owns Lookout, but they also have hinted that the MSN Toolbar will include desktop search in a near-term release, though some have hinted that it will only be for MSN paying customers. Ask Jeeves bought Tukaroo before they had a chance to do much more than show their product to insiders. Google has also dropped hints that their toolbar may incorporate desktop search at some point soon (though how one will calculate pagerank for my sql query text file is beyond me).

Obviously, Google Desktop Search is now out, and everyone has written how they love it or hate it. I think it’s silly, but there are enough positive things about it that you should give it a try. Yahoo will be releasing a repackaged X1.

Most of these will be indexers, not filesearchers, so look forward to a period of indexing before searching, and of course, the need to update your index so your searches are not out of date.
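Keeping an index current mostly comes down to noticing which files were added, changed, or removed since the last pass. A minimal sketch in Python, assuming modification times are a good enough change signal (the function names are made up, and real tools may hook file-system notifications instead):

```python
import os


def snapshot(root):
    """Record path -> last-modified time (ns) for every file under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            state[path] = os.stat(path).st_mtime_ns
    return state


def diff_snapshots(old, new):
    """Return (added, changed, removed) paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(path for path in set(old) & set(new)
                     if old[path] != new[path])
    return added, changed, removed


# Hypothetical snapshots: "b" was edited, "c" deleted, "d" created.
old = {"a": 1, "b": 2, "c": 3}
new = {"a": 1, "b": 5, "d": 4}
print(diff_snapshots(old, new))
```

Only the added and changed files need re-reading, and the removed ones need dropping from the index; everything else can be left alone, which is what makes an incremental update much cheaper than a full rebuild.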

And, yes, of course, you can pull down numerous open source projects like Lucene and make your own search engine… but that’s really a pain. And, just to point it out, that’s basically what Lookout did, so save yourself the effort and leverage what others have created.

There are a couple of desktop searchers already put together with Lucene if you really do want the fully open source approach. Via Jamie’s Weblog, you can look at Docco. Others have mentioned Lucene Desktop and the “command line interface” to Lucene (more of a testing tool, but you get the idea) found here: Lucli. And if you really, really want to play with the edge of technology, Beagle uses Lucene.net and the nascent Mono project (duplicating the Microsoft .net structure in open source). Still open source, akin to Google Desktop, Popsearch.net lets you host and run your own personal search engine for your mails and docs… and you can access it from anywhere if you set it up correctly.
X-Friend is another Lucene based engine, all in Java; currently free.
Baagle is an open source attempt to duplicate Google’s desktop search.
SWISH-E has also been suggested for unix/linux folks.

CollectiveCortex has a free trial but is ultimately commercial; one slashdotter described it as “like X-Friend on steroids”.

So, right now, Lookout is my choice. I have tried and removed Copernic (but will put it back on if they index the entire file tree) and should try Blinkx at some point. I liked X1, but until they lower the price and remove the phone-home, it’s not an option for me and I can’t recommend it. Enfish and some of the others are nice if you have special needs, but for your average analyst, I suspect you won’t go wrong with either Lookout, Copernic (with reservations), or Blinkx.

I don’t know if any of these are what I’m looking for. If nothing else, they all store their info locally which makes them great for speed, but bad for “distribution”. If I’m not at my machine, then I don’t have my knowledge. That is, if I’m searching for something on my machine, I’m already sitting at it. But if I need to look up info, I would rather have an online knowledge-base of some kind, ala a wiki.

BTW, if this stuff is interesting, you may want to check out Amit’s Blog for a different pov. In addition, the CNET gang give their 2 cents here.

PS: Windows search still sucks, but someone on Slashdot suggested a way to search by content: 1. Start the indexing service and wait for it to index your drives. 2. Search (Win-F), and prefix your search string with “!”
I have no idea if this really does anything. More info here and here. A whole site dedicated to this topic, I learned new stuff on almost every page: http://www.xpsearch.info

(Impressive matrix comparing searchers… Not sure how often it will be updated. http://www.goebelgroup.com/desktopmatrix.htm)

* * *

 

  1. That's one of the most comprehensive and useful entries about the desktop search space that I've seen. You hit upon a number of programs I had forgotten about, and some I've never even heard of :)

    I've linked to you from the Google Desktop Search entry on my blog at http://www.bladam.com/archives/0410141345.htm
    Adam Lasnik    Oct 14, 05:34 PM    #


  2. Don't Forget HotBot Desktop Search Bar and ISYS:desktop

    HotBot Desktop Search Bar has been available since April(?). It indexes local files including Microsoft office documents, Outlook or Outlook Express email (including attachments), and Internet Explorer favorites and history.

    Other features include returning query results to the the left pane and pages to the right pane; an RSS reader; a popup blocker; etc.

    ISYS also provides a variety of search tools of which ISYS:hindsite (free) and ISYS:desktop ($500+) are most comparable to Google Desktop.

    "ISYS:hindsite offers Netscape Navigator and Internet Explorer users the unique ability to perform full text searches on the contents of previously accessed web pages."

    " ISYS:hindsite automatically detects and indexes all web pages visited using Netscape Navigator or Microsoft Internet Explorer."

    "ISYS:hindsite does not duplicate or store the pages

    "ISYS:hindsite is provided with ISYS:desktop, or is available for download as freeware...."

    "ISYS:desktop can [index and] search over 125 data formats, including all MS Office products, e-mail, attachments, PDF, HTML, ZIP files, databases and native format spreadsheets."

    ISYS Query Syntax includes "Boolean and proximity operators, date searching and range searching." ISYS also provides Tense Conflation for automatically finding "different tense forms of your search terms" (e.g., "run" expands to "runs", "running", "ran", etc.); stemming (e.g., "run" includes "rune", "rung", and "running"; sounds like (phonetic match: "there", "their"); and explicit synonym rings (e.g., "coffee" includes "cappacino" and "expresso"); and many other features.

    "Information is displayed in the ISYS:desktop window with configurable highlighting and links between hits.... Importantly, users don’t need to have any other software installed to be able to view search results...."

    "ISYS:desktop can support searches in more than 30 languages and can access information stored locally on a user’s PC, on a network or intranet, or on an external website."

    Excerpted from http://www.isysusa.com/products/desktop/index.html

    http://www.isysusa.com/products/desktop/features.html

    http://www.isysusa.com/products/hindsite/index.html


    BillR    Oct 17, 03:34 AM    #


  3. Hotbot was mentioned above; it's actually been available for years. But I couldn't find anyone who had used it for more than a short time and had much to say about it, so neither did I. However, it has had some improvements recently (as you point out), and might be worth a re-look.

    But good find on ISYS. I hadn't heard of them, and will do an entry on them soon...
    Michael Wexler    Oct 17, 08:52 AM    #


  4. Do you know of a search program that i can use to index my huge collection of CHM and MHT files.

    http://labnol.blogspot.com/2004/10/what-is-missing-in-desktop-search.html
    Amit Agarwal    Oct 29, 01:07 AM    #


  5. Amit: No, I haven't heard of many folks trying to index MS Help files. That being said, there appear to be programs to convert them back to HTML and those, of course, would be indexed.

    BTW, good post on search wish list items at your blog...
    Michael Wexler    Oct 29, 08:34 AM    #


  6. diskmeta.com is also a great thing for desktop, fast, exact and really smart (relevance, wildcards, stemming, morphology, filtering options, etc.).

    It's index-based, very good on large (up to 100 GB) volumes.

    Both for local and network shared drives; supports most file formats (incl. PDF, ZIP, RAR, CHM), user-defined file extensions can be added.

    The most close analoque is dtsearch, unfortunately (and not fair) diskmeta is not so widely known.

    Freeware or shareware ($48 or $98 for professional version).

    http://diskmeta.com


    M.Parker    Oct 29, 09:00 AM    #


  7. It is polite, when commenting on a blog about a product you represent, to mention your affiliation. Since M. Parker chose not to, I'll do it for him or her. The above comment about Diskmeta by M. Parker was posted by someone at Diskmeta, so take that into account when deciding for yourself if it's "very good". It may indeed be... but judge for yourself.
    Michael Wexler    Oct 29, 09:28 AM    #


  8. diskmeta does index chm files but that is only in the professional version.
    Amit Agarwal    Nov 5, 02:33 AM    #


  9. Thanks very much. This is quite useful. Put me on your list. Thanks!!

    wdw
    winton woods    Nov 8, 11:25 AM    #


  10. Yes Copernic searches network drives!
    Jonathan Aquino    Nov 17, 03:35 AM    #


  11. I’ve been using Copernic desktop search for several months and am quite happy with it. As noted above, it searches network drives (you can leave it running overnight for indexing purposes). I like it better than Google because it’s a standalone product. I didn’t want my desktop search integrated with Google for privacy reasons.

    Copernic hits the mark for me.
    T. Hoffman    Nov 25, 01:11 PM    #


  12. X1 5.0 beta is out and they have several new feature over the previous release.

    But no .chm indexing :(

    Amit @ Desktop Search Tools Blog

    http://labnol.blogspot.com


    amit    Nov 26, 02:02 AM    #


  13. I’ve been using CollectiveCortex for a few weeks now, works quite well. Not too bad for a beta. The context reranking search bit works better than anything else out there.

    Gave up on copernic, didnt support some on the non MS file types very well and would lockup my pc fairly regurarly.

    I prefer an application rather the limited “google and its wanna be’s” that run through your browser.
    S Bear    Dec 15, 07:22 PM    #


  14. i am in deparate need for a high quality desktop search engine and i tried all. i have about 100GB of files, mostly scientific papers asnd books, no movies. i am sad to say that copernic had a bug, repetedly it crashed after about 2 days of indexing. google is not up to par, hotbot never finished indexing and finally microsoft desktop with ifilter did the job. a am slightly biased against ms but….no choice. it is the best.

    regards, good indexing


    cagatay buyukkoc    Feb 15, 11:57 PM    #


  15. Go to citeknet.com for IFILters to index the contents of .CHM .HLP .CAB .ZIP .MHT files. Works with the Index Server of Windows 2K/XP: restart the Indexing service after installing these IFilters, then go to each directory in the Catalog list and “Initiate Full Rescan”, which should start adding files to the indexing queue.

    I also added (useing regedt32.exe) all the new IFilter DLLs to the MULTI_SZ key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\DLLsToRegister

    Maybe this DLL registration does not need to be done after the installer did it; so far no ill effects, anyway.


    I.    Feb 19, 04:38 AM    #


  16. I need to index the content of websites I visit. I do a lot of research and just indexing the page title or meta data is not good enough. I use Hindsite, but the indexing is manual and time consuming. I also tried Copernic, but it does not index content, only page titles and maybe meta data, but not content.

    Any help is appreciated. Thanks.
    Terry    Apr 20, 12:39 PM    #


  17. The problem with microsoft desktop - and a few others - and Ifilter for CHM and .HLP files is they index up to the first 1MB of the file. Any CHM file over 10000kb and you got problems… If I’m correct.

    I’m also still looking for a desktop search that that can do say 50000-60000 KB and helpfiles and CHM and Csharp and PDF… I still haven’t found what I’m looking for… oh yeah and free.


    jmw    May 24, 08:06 AM    #


  18. your link to lookout links nowadays to guess what Bill Gates himself. http://www.microsoft.com/windows/products/winfamily/desktopsearch/default.mspx


    Juup Coenen    Jan 28, 12:07 PM    #


  19. Right, this post is from 2004. Over the past few years, lots of other things have happened. Click here to see all the search updates. For example, Lookout got acquired and is now known as Windows Desktop Search. It is built into Vista, for example.


    Michael Wexler    Jan 28, 12:48 PM    #


  20. lot of thanks


    salman    Aug 7, 08:06 AM    #

