HOW DID YOU GET HERE?
(If you are here from a search, thanks! This article is part of a series on desktop search. There have been updates, changes, and all sorts of new posts, and the best way to see them all is with the “Browse your favorite category” drop-down to the right. Choose “Search” and have fun, or click here.
Feel free to start with this one, since it’s the first, but then read the rest for updates.)
(Yet another update: Yahoo announces a desktop search based on X1. Good idea, except that X1 phones home (more details below), and some people report very heavy CPU usage during indexing. Will phoning home be part of the (presumably free) version of YDS? And will X1 continue to sell their own version with extra features, or retrench to a corporate market?
Ask Jeeves has released its own desktop search.
I’ve installed Copernic. More below… but I found a major flaw that is a dealbreaker for me. Uninstalled, reluctantly.
Yes, Google Desktop Search has been released. I talk about it a bit, but I don’t love it. Blinkx has a new version which corrects many deficiencies, but it still feels middling; in addition, their smart founders have moved on, so it’s unclear where this will end up.
More links to free/open-source options added, for those who prefer to see the details of what’s running on their machines.)
Desktop search is the new hotspot. While some folks say that consumers don’t care, I disagree… everyone I’ve talked to can’t wait to find the one that works. The need for “web integration” or “index all my media as well as my info” is less important, but if someone does it well, then maybe it will take off.
While we all expect the OS to help with the search problem, Windows Search has gotten more and more deficient with each version. Even with all my hacks, I still can’t get WinXP search to search all the files I want it to. Searching for an old email in Outlook is a nightmare. Apple’s Sherlock is pretty good, and has lots of plugins, but isn’t as fast or as flexible as it could be, according to some critics. Now that Google and Yahoo have upped their mail limits, expect searching of hosted mail to be a big deal as well (and more on these players down below).
How do these work and differ?
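There are two ways to search a large volume of text and data:
1. Build an index of the files ahead of time, then search the index.
2. Open each file at search time and scan it for the pattern.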
Both have their advantages and disadvantages. For example, building the index means one has to, well, build and store the index. This takes time, CPU, and drive space. If the index is not well designed, there are limits to the types of searches which can be done. Also, no searches can be done until the index is created, and if it’s not kept up to date, searches return stale or wrong results.
On the other hand, opening each file becomes problematic as you get more and more files. While an index can leverage all sorts of speed tricks, there are not many ways to speed up “loop over directories recursively, open each file, search either line by line or byte by byte for the pattern, print if found, repeat”.
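To make the trade-off concrete, here is a minimal Python sketch of both approaches. It is not how any of the tools below work internally; it uses an in-memory dict of `path -> text` as a stand-in for files on disk, and the file names are made up. Note the last few lines: once a document changes, the index is stale until it is rebuilt, while the scan always sees the current data.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens; deliberately simple (no stemming, no stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Approach 1: pay up front by mapping each word to the set of documents containing it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in tokenize(text):
            index[word].add(path)
    return index

def search_index(index, word):
    """A query is one dictionary lookup, however many documents there are."""
    return sorted(index.get(word.lower(), set()))

def search_scan(docs, word):
    """Approach 2: no index; read every document and check it at query time."""
    word = word.lower()
    return sorted(path for path, text in docs.items() if word in tokenize(text))

docs = {
    "notes/budget.txt": "Q3 budget draft for the marketing team",
    "mail/0412.eml": "Re: budget numbers attached",
    "notes/todo.txt": "call Greg about the recommendation engine",
}
index = build_index(docs)
print(search_index(index, "budget"))  # ['mail/0412.eml', 'notes/budget.txt']
print(search_scan(docs, "budget"))    # same list, found by reading every document

# A document changes after indexing: the scan sees it, the stale index does not.
docs["notes/todo.txt"] += " and the budget"
print(search_index(index, "budget"))  # still two hits until the index is rebuilt
print(search_scan(docs, "budget"))    # three hits
```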
Also, each tool searches a different set of your files. Some also search Outlook email, some search Outlook Express email, some search MP3 and image tags. What data each one reads is up for grabs: some can only handle text, some can read Office formats, and some can handle PDF. Some read email attachments and index them too; some don’t. Some will index network drives/folders, while others only read local drives until you pay to unlock additional capability.
Some allow sophisticated search syntax: wildcards, stemming, phrase search, boolean AND/OR, or even full regex. Some allow “find where word is near word”, and some have “relevance” searches. So, depending on your needs, you may need to shell out some bucks, or accept that some clever and useful search techniques will not be available to you.
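Most of those query features come down to what the index records. Here is a sketch, assuming a simple positional index (word, then document, then a list of word positions); the helper names are hypothetical, and stemming, wildcards, and regex are left out. Boolean AND/OR are plain set operations, while NEAR and phrase search need the positions.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map word -> {doc: [positions]} so queries can use word order and distance."""
    index = defaultdict(lambda: defaultdict(list))
    for doc, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc].append(pos)
    return index

def docs_with(index, word):
    """Set of documents containing the word (lookup without creating entries)."""
    return set(index.get(word.lower(), {}))

def query_and(index, a, b):
    return docs_with(index, a) & docs_with(index, b)

def query_or(index, a, b):
    return docs_with(index, a) | docs_with(index, b)

def query_near(index, a, b, distance=3):
    """Docs where a and b appear within `distance` words of each other, in either order."""
    hits = set()
    for doc in query_and(index, a, b):
        pa, pb = index[a.lower()][doc], index[b.lower()][doc]
        if any(abs(i - j) <= distance for i in pa for j in pb):
            hits.add(doc)
    return hits

def query_phrase(index, a, b):
    """Docs where b immediately follows a (a two-word phrase search)."""
    hits = set()
    for doc in query_and(index, a, b):
        following = {i + 1 for i in index[a.lower()][doc]}
        if following & set(index[b.lower()][doc]):
            hits.add(doc)
    return hits

docs = {
    "a.txt": "quarterly sales report for the east region",
    "b.txt": "sales were flat, see the attached report from March",
    "c.txt": "expense report template",
}
idx = build_positional_index(docs)
print(sorted(query_and(idx, "sales", "report")))       # ['a.txt', 'b.txt']
print(sorted(query_or(idx, "sales", "expense")))       # ['a.txt', 'b.txt', 'c.txt']
print(sorted(query_near(idx, "sales", "report")))      # ['a.txt'] (6 words apart in b.txt)
print(sorted(query_phrase(idx, "expense", "report")))  # ['c.txt']
```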
So, what tools are out there to search your drives?
Blinkx is gradually moving up the curve from hidden to hot. They are trying to mix web and local search, and have some clever visualization. I don’t love the “web and your stuff” mix in searching; I trust my stuff, but the web data needs verification (this is even more annoying with Google Desktop Search: I find it really frustrating for it to ping out to the wire when I am trying to find one of my emails). (BTW, if you want to see great visualizations, play with Kartoo; it requires Flash.) It appears to handle email and (per their FAQ) .txt files, Adobe PDFs, PowerPoint files, Excel spreadsheets, and Word documents. Their new 2.0 includes “Smart Folders” (more details here), which act as a saved search/view. This has gotten lots of buzz, but no one has said to me “this is the one to get”, so I haven’t tried it yet. (BTW, Blinkx has nothing to do with BlinkPro, the bookmark tool I recommend elsewhere.)
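I don’t know how Blinkx implements Smart Folders, but the idea of a saved search is simple enough to sketch. Here the folder stores the query rather than the results, so new matching documents show up without re-saving it. `SmartFolder` and its matching rule (every term must appear) are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SmartFolder:
    """A saved search: it stores the query, not the results, so it is re-evaluated on demand."""
    name: str
    required_terms: list = field(default_factory=list)

    def contents(self, docs):
        """Every document containing all required terms (case-insensitive substring match)."""
        terms = [t.lower() for t in self.required_terms]
        return sorted(p for p, text in docs.items() if all(t in text.lower() for t in terms))

docs = {"inv_0412.doc": "Acme Corp invoice, April", "memo.txt": "Lunch with the Acme team"}
acme_invoices = SmartFolder("Acme invoices", ["acme", "invoice"])
print(acme_invoices.contents(docs))  # ['inv_0412.doc']

docs["inv_0507.doc"] = "Invoice #88 for ACME Corp"
print(acme_invoices.contents(docs))  # ['inv_0412.doc', 'inv_0507.doc'], no need to re-save the folder
```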
dtSearch has been doing this stuff for a while, and was one of the first to offer a desktop search. They create an index and, depending on your spend, can even create a shared index for network-wide searching, like a private company search engine. This stuff is not cheap, with the entry level at $200, but it’s very powerful and designed for large collections of data (say, all of your invoices for the last 10 years). This is probably the most sophisticated tool out there, and so is probably overkill for most users.
X1. You know, I really, really want to like X1. They do lots of things well. Their unique feature is the “search while you type” approach: X1 indexes the usual stuff, and then as you type, the list of hits shrinks to include only the matches (if you remember Lotus Magellan from the 80s, this will be a welcome return of an old friend; the Magellan gang helped create this tool). It’s a nice graphical tool, with separate tabs for each kind of search you may want (Email vs. Files vs. Contacts, etc.). Other tools mix all “types” of files in their search results, but X1 keeps them on separate tabs. On the downside, it can’t handle wildcards, though forum posts imply that this is coming.
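Search-as-you-type works because each added keystroke can only narrow the previous hit list, so the tool never has to start over while you are typing forward. Here is a rough sketch of that idea plus X1-style tabs, using made-up items; this is not X1’s code, and a real tool would filter against its index rather than a Python list.

```python
from collections import defaultdict

def search_as_you_type(items, keystrokes):
    """Yield (typed_so_far, hits) after each keystroke.

    Each step filters only the previous hits: any title containing "budg" also
    contains "bud", so appending a character can only shrink the list.
    (A backspace would break that assumption and need a fresh search.)
    """
    hits = list(items)
    typed = ""
    for ch in keystrokes:
        typed += ch
        hits = [(kind, title) for kind, title in hits if typed.lower() in title.lower()]
        yield typed, hits

def group_by_tab(hits):
    """X1-style presentation: one tab per item type instead of one mixed list."""
    tabs = defaultdict(list)
    for kind, title in hits:
        tabs[kind].append(title)
    return dict(tabs)

items = [("Email", "Budget review"), ("Files", "budget_2004.xls"),
         ("Contacts", "Bud Smith"), ("Email", "Lunch Friday")]
for typed, hits in search_as_you_type(items, "budg"):
    print(typed, len(hits))  # prints: b 3 / bu 3 / bud 3 / budg 2
print(group_by_tab(hits))    # {'Email': ['Budget review'], 'Files': ['budget_2004.xls']}
```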
Enfish has been in this game for a while as well; I beta tested a couple of early versions. The product was not only a search, but a powerful integrator of the various Outlook data types, way before MS built some of that into Office. So, for example, you could search for a name, and results would include emails to that person, from that person, their contact info (if you have it), maps, all sorts of data. It also searched the web, etc. It builds an index, and searches the usual suspects, with a file viewer built in.
It looks like they have now separated the search piece into a standalone tool, Find, for $50 and the full “integrator” for $200. When I tested it, I liked it, but it was somewhat slow to index and, because of all the graphics and integration, somewhat slower “feeling” than the other tools. The file preview feature was pretty nice, and if I recall, that same feature allowed it to search a much wider variety of file types than most of the tools I looked at. But I haven’t really used it in over a year, so try the trial yourself. If you have lots of older file formats and still want to search and view them, this may be your best choice.
Copernic has been in the “search aggregation” space for a while, with a client-side tool, “Agent”, which aggregates search results from multiple engines. While I never found much use for it, some friends of mine swore by it. Well, they now have a desktop search, which works like their previous products: a free entry level and a for-pay advanced version. The entry level looks pretty good, with an attempt to be “light” on resource use, and it searches a wide variety of media types (video, MP3, Favorites, History, etc.). These are worth looking at if you don’t already use a tool to index them; otherwise it feels rather bolted on. And yes, it indexes mail and the usual suspects in documents as well.
Its biggest flaw, for my use, is that it doesn’t index the entire file tree. That is, I expected it to log the location of every file, no matter what type it is, and then index the contents of the files it knows how to read. Instead, it only reads files that it can index… meaning text and various Office formats, as well as (by default) .mht and .zip. Go to the Advanced Options page and you see that, under “Additional file types to index (name and properties only)”, you have to manually add the extension of each type of file you want listed. This is rather silly. There is no way to add *.* or other “list ’em all!” options. I will not be hand-typing the hundreds of extensions that files on my drive use; if anything, I would rather have a “do not list” extension field for the few file types that I know I will not care about. This makes it a dead end for my use; if I am trying to find a file, I shouldn’t have to have had the prescience to add it to an index list.
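What I wanted from Copernic is easy to state in code: list every file, read contents only for the types the indexer understands, and let the user exclude the few types they don’t care about. A minimal sketch; the extension sets are hypothetical placeholders, and a real indexer would plug in parsers for Office formats, PDF, and so on.

```python
import os

TEXT_EXTENSIONS = {".txt", ".md", ".csv"}  # hypothetical: types whose contents we can read
EXCLUDED_EXTENSIONS = {".tmp"}             # the "do not list" field I asked for

def catalog(root, text_exts=TEXT_EXTENSIONS, excluded=EXCLUDED_EXTENSIONS):
    """Record every file's location; read contents only for extensions we understand."""
    listed, contents = [], {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()  # "" for files with no extension
            if ext in excluded:
                continue
            path = os.path.join(dirpath, name)
            listed.append(path)  # every file is listed, even unknown or missing extensions
            if ext in text_exts:
                with open(path, encoding="utf-8", errors="replace") as f:
                    contents[path] = f.read()
    return listed, contents
```

For example, `listed, contents = catalog(os.path.expanduser("~"))` would list everything under your home folder but only read the contents of the plain-text types. The default is “list it”, and exclusion is the exception, which is the opposite of Copernic’s approach.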
Also annoying: by default, if it doesn’t have focus, it acts as if it’s minimized and stops indexing until there have been 30 seconds of inactivity. Yes, this can be turned off, but it takes some hunting in the options.
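The idle behavior amounts to a simple policy. The sketch below is an assumption about the general approach, not Copernic’s implementation; the 30-second threshold comes from the description above, and timestamps are passed in explicitly (rather than read from the clock) so the logic is easy to follow and test.

```python
from collections import deque

class IdleIndexer:
    """Index queued files only after the user has been idle for `idle_threshold` seconds."""

    def __init__(self, pending, idle_threshold=30.0):
        self.pending = deque(pending)  # files waiting to be indexed
        self.idle_threshold = idle_threshold
        self.last_activity = 0.0       # timestamp of the last keyboard/mouse event
        self.indexed = []

    def on_user_activity(self, now):
        """Called on any user input; pushes the next indexing step back."""
        self.last_activity = now

    def tick(self, now):
        """Called periodically; indexes one file if the user has been idle long enough."""
        if self.pending and now - self.last_activity >= self.idle_threshold:
            self.indexed.append(self.pending.popleft())  # stand-in for real indexing work
            return True
        return False

indexer = IdleIndexer(["report.doc", "budget.xls"])
indexer.on_user_activity(100.0)
print(indexer.tick(110.0))  # False: only 10 seconds idle
print(indexer.tick(130.0))  # True: 30 seconds idle, one file indexed
print(indexer.indexed)      # ['report.doc']
```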
Look-and-feel-wise, it’s pretty nice, with a very “Windowsy” look. (And it’s based near me in Newton, Mass., but don’t let that sway you.) It does let you index network drives if they are mapped to a drive letter, but not via the \\fileserver approach (this was also mentioned below by some commenters on a previous version of this entry). As this is something many of the other free tools do not do, that’s a plus. But given the “only list files with certain extensions” problem, I don’t know how useful this will be.
So, in summary, for indexing a variety of “media types” in an attractive tool, Copernic has done a nice job. But by restricting “file location” indexing to files that a) have an extension and b) have that extension manually typed into a one-line box, this tool is a huge letdown: it won’t let me locate files that I know are currently on my drive. If I wanted that “functionality”, I would just use Windows Search, which also won’t locate files unless they have a certain extension. If someone can tell me how to get Copernic Desktop to list every file (even if it doesn’t index the contents), I’ll gladly update this entry. But until then, I won’t be using this tool. And it’s too bad; they did so many other things quite right. (BTW, Copernic was just acquired by Canadian company Mamma.com, aka “the Mother of all search engines”.)
DiskMeta Lite is the free version of the DiskMeta indexer. The free one is for non-commercial use only, and only indexes .txt, .doc, and .html files. They have Personal (around $50) and Pro (around $100) editions. Looking at the Pro edition, they do appear to support some extensions not often seen (like the .CHM requested by one of the commenters), as well as “Morphological support of the English language”. Also, no bones about it, they clearly support “local network shared folders and on network mapped drives”. No, I haven’t tried this one yet either.
AskSam plays in the dtSearch space. It’s a professional searching tool/freeform database: you basically “import” information not into an index, but into a database system. Then you add additional info as you find or create it, and it’s all searchable. This becomes less about “Where did I put that file?” and more about “If I am going to dump all my info somewhere, where should I put it?” It starts at $150. Similar “organizer” tools include Infoselect for $250, Zoot for $100, InfoRecall for an affordable $40, and NoteLens for an even more affordable $20.
Lookout is my current tool of choice. A beta product that went free when its two-programmer shop (two ex-Netscape guys, BTW) was acquired by Microsoft, this tool integrates into Outlook as a toolbar. It has fast search, pretty fast indexing, and combined output of the various things it searches. It also has flaws: the output window has no right-click menu to move, delete, or otherwise act on the results (X1 does offer this). Instead, you have to open each item to make changes. In addition, it requires the .NET Framework 1.1, which doesn’t hurt anything, but is yet another thing to install. While one guy took the money and moved on, the other has stayed with MS and is still responding to user issues on the forum. It has a pretty strong query language, but overall it’s a “no-frills” product. But for the price (free!), it’s very fast, and it has become my go-to tool again and again. Recommended, if you are an Outlook user.
If you are not into indexes, there are also grep-type tools, mostly text-based, but some with GUIs. These are all “open each file” tools, so keep in mind that they can search from the moment they are installed, but their speed will vary with how much junk you make them search.
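Under the hood, an “open each file” search is about this much code. Here is a minimal Python sketch of a recursive regex search; it treats every file as UTF-8 text (binary files will just produce noise or no matches), and it skips files it can’t open.

```python
import os
import re

def grep_tree(root, pattern, flags=re.IGNORECASE):
    """Walk root, open each file, and yield (path, line_number, line) for every matching line."""
    regex = re.compile(pattern, flags)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file (permissions, locked, vanished): skip it
```

For example, `for path, n, line in grep_tree(os.path.expanduser("~"), r"invoice\s+#?\d+"): print(path, n, line)` would print every line mentioning an invoice number. Its run time grows with the total bytes under the starting folder, which is exactly the “how much junk you make them search” problem.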
Wingrep is a $30 shareware tool which searches via Regex (and soundex, cool!) and even inside Zip files.
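Soundex is a classic phonetic code: names that sound alike, such as Smith and Smyth, get the same four-character key. I don’t know Wingrep’s exact matching rules, so the sketch below is just the standard American Soundex algorithm plus a hypothetical `sounds_like` helper that scans text for words sharing a key.

```python
import re

# Letter -> digit groups of American Soundex; vowels, y, h, and w have no digit.
CODES = {c: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6")) for c in letters}

def soundex(word):
    """Return the 4-character Soundex key, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's digit still blocks a duplicate
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if c not in "hw":  # h/w do not separate equal digits; vowels do
            prev = code
    return result.ljust(4, "0")

def sounds_like(word, text):
    """Words in text whose Soundex key matches the key of `word`."""
    key = soundex(word)
    return [w for w in re.findall(r"[A-Za-z]+", text) if soundex(w) == key]

print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
print(sounds_like("Smith", "Memo from Smyth to Schmidt and Smith"))
# ['Smyth', 'Schmidt', 'Smith']
```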
Astrogrep is an open-source project which works pretty similarly. You can search via regex or simple wildcards, and it can read most text files, but not binary or zip files. Since it is open source, if you want to add features… the author welcomes it. It lacks some of the niceties, such as sorting output, but for free, it’s pretty nice.
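“Simple wildcards” are a restricted form of regex. Python’s standard `fnmatch.translate` turns a shell-style pattern into an equivalent regex, which shows the relationship; this illustrates the concept and is not how Astrogrep does it.

```python
import fnmatch
import re

def wildcard_to_regex(pattern):
    """Translate a shell-style wildcard (* and ?) into an equivalent compiled regex."""
    return re.compile(fnmatch.translate(pattern), re.IGNORECASE)

names = ["report1.txt", "report12.txt", "Report7.TXT", "summary.doc"]

one_char = wildcard_to_regex("report?.txt")  # ? means exactly one character
print([n for n in names if one_char.match(n)])    # ['report1.txt', 'Report7.TXT']

# A regex can say things a wildcard cannot, such as "one or more digits":
digits = re.compile(r"report\d+\.txt", re.IGNORECASE)
print([n for n in names if digits.fullmatch(n)])  # ['report1.txt', 'report12.txt', 'Report7.TXT']
```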
Agent Ransack is the free (or “lite”) version of another search tool, FileLocator Pro. While the feature list seems pretty basic, it is more powerful than it looks, it is free, and even the “pro” version is only $13, so if you like it, it’s easy to buy (compared to the overpriced X1).
A free indexer that I completely forgot about until I got an email reminder is Wilbur, formerly commercial, now free and GPL, for Windows only at this time. It’s an indexer, can index inside zip files, and can handle PDF files. It appears to have last been updated around April 30, 2004, so it seems to still be under active development.
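Searching inside zip files doesn’t require unpacking them to disk. Here is a minimal sketch using Python’s standard `zipfile` module; it assumes the archive members are text in a single encoding (UTF-8 by default), and PDF text extraction, which needs a third-party library, is not shown.

```python
import zipfile

def search_zip(zip_path, needle, encoding="utf-8"):
    """Return the names of archive members whose text contains `needle` (case-insensitive)."""
    hits = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            text = zf.read(info).decode(encoding, errors="replace")  # read in memory, no extraction
            if needle.lower() in text.lower():
                hits.append(info.filename)
    return hits
```

An indexer like Wilbur would feed each member’s text into its index instead of doing a one-off substring test, but the “open the archive and read each member” step is the same idea.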
Summary, Other Options
The portals are also playing in this space. Terra Lycos’ HotBot has had a desktop search tool for a while, but I’ve never tried it. MSN/MS owns Lookout, but they have also hinted that the MSN Toolbar will include desktop search in a near-term release, though some suspect it will only be for paying MSN customers. Ask Jeeves bought Tukaroo before they had a chance to do much more than show their product to insiders. Google has also dropped hints that their toolbar may incorporate desktop search at some point soon (though how one will calculate PageRank for my SQL query text file is beyond me).
Obviously, Google Desktop Search is now out, and everyone has written about how they love it or hate it. I think it’s silly, but there are enough positive things about it that you should give it a try. Yahoo will be releasing a repackaged X1.
Most of these will be indexers, not file searchers, so expect a period of indexing before you can search, and, of course, the need to update your index so your searches are not out of date.
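Keeping an index current usually doesn’t mean re-reading everything. One common approach, sketched here as an assumption rather than how any particular product does it, is to remember each file’s modification time, re-read only files that are new or changed, and drop entries for deleted files. The `index` here is just a dict of path to contents, standing in for a real index.

```python
import os

def read_text(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()

def refresh_index(root, index, mtimes, reader=read_text):
    """Bring `index` up to date: re-read new or modified files, forget deleted ones.

    `index` maps path -> contents and `mtimes` maps path -> last-seen modification
    time; both are plain dicts standing in for a real on-disk index.
    Returns the number of files that had to be (re)read.
    """
    seen, reread = set(), 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
                if mtimes.get(path) != mtime:  # new file, or changed since the last pass
                    index[path] = reader(path)
                    mtimes[path] = mtime
                    reread += 1
            except OSError:
                continue  # vanished or unreadable since the directory listing
            seen.add(path)
    for path in list(index):  # anything not seen this pass was deleted
        if path not in seen:
            del index[path]
            mtimes.pop(path, None)
    return reread
```

Run it once to build the index, then periodically (or on file-change notifications) to keep it fresh; the second and later passes only pay for the files that actually changed, plus the cost of walking the directory tree.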
And, yes, of course, you can pull down numerous open source projects like Lucene and make your own search engine… but that’s really a pain. And, just to point it out, that’s basically what Lookout did, so save yourself the effort and leverage what others have created.
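For a taste of what “make your own search engine” involves, here is relevance ranking in miniature: a textbook TF-IDF score, where terms that are frequent in a document but rare across the collection count the most. This is a simplified illustration, not Lucene’s actual scoring formula, which adds further factors.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query):
    """Return document ids ordered by summed TF-IDF of the query terms, best first."""
    tokenized = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    scores = {}
    for term in tokenize(query):
        df = sum(1 for counts in tokenized.values() if term in counts)  # document frequency
        if df == 0:
            continue
        idf = math.log(n / df) + 1.0  # +1 so a term found in every document still counts a little
        for d, counts in tokenized.items():
            if counts[term]:
                tf = counts[term] / sum(counts.values())  # term frequency, length-normalized
                scores[d] = scores.get(d, 0.0) + tf * idf
    return sorted(scores, key=scores.get, reverse=True)

docs = {"a": "lucene lucene index", "b": "lucene search engine notes", "c": "grocery list"}
print(rank(docs, "lucene index"))  # ['a', 'b']; 'c' matches nothing and is left out
```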
There are a couple of desktop searchers already built on Lucene if you really do want the fully open source approach. Via Jamie’s Weblog, you can look at Docco. Others have mentioned Lucene Desktop and the command-line interface to Lucene (more of a testing tool, but you get the idea) found here: Lucli. And if you really, really want to play at the edge of technology, Beagle uses Lucene.net and the nascent Mono project (an open source reimplementation of Microsoft’s .NET). Also open source, and akin to Google Desktop, Popsearch.net lets you host and run your own personal search engine for your mail and docs… and you can access it from anywhere if you set it up correctly.
X-Friend is another Lucene based engine, all in Java; currently free.
Baagle is an open source attempt to duplicate Google’s desktop search.
SWISH-E has also been suggested for Unix/Linux folks.
CollectiveCortex has a free trial but is ultimately commercial; “X-Friend on steroids” is how one Slashdotter described it.
So, right now, Lookout is my choice. I have tried and removed Copernic (but will put it back on if they index the entire file tree), and I should try Blinkx at some point. I liked X1, but until they lower the price and remove the phone-home, it’s not an option for me and I can’t recommend it. Enfish and some of the others are nice if you have special needs, but for your average analyst, I suspect you won’t go wrong with Lookout, Copernic (with reservations), or Blinkx.
I don’t know if any of these are what I’m really looking for. For one thing, they all store their info locally, which makes them great for speed but bad for “distribution”. If I’m not at my machine, then I don’t have my knowledge. That is, if I’m searching for something on my machine, I’m already sitting at it. But if I need to look up info, I would rather have an online knowledge base of some kind, such as a wiki.
PS: Windows search still sucks, but someone on Slashdot suggested this for searching by content:
1. Start the Indexing Service, and wait for it to index your drives.
2. Search (Win-F), and prefix your search string with “!”.
I have no idea if this really does anything. More info here and here. A whole site dedicated to this topic, I learned new stuff on almost every page: http://www.xpsearch.info
(An impressive matrix comparing searchers… not sure how often it will be updated: http://www.goebelgroup.com/desktopmatrix.htm)
* * *