The Net Takeaway: I still hate tagging...


I still hate tagging... · 04/13/2005 03:30 PM, MetaBlog

Update: Lots of traffic to this link from some very nice blogs and users… so, note that it rests in context with my complete series (so far) on why I dislike tagging, including:

I Hate Tags
I still hate tagging….
I continue to despise tagging…
In conference
Tag-Hater at Yahoo, home of tagging?

Now, back to your regularly scheduled reading…


I originally wrote about how much I hate tagging a few weeks ago, in the article I Hate Tags. But I keep reading all these articles about “tagging”, the most recent being Steven Levy’s article in Newsweek, and I still find it insane that all these smart people can’t see the obvious. Tagging is not designed to share; it’s designed to create walled gardens, a term originally defined in the wiki world. Here is one person’s summary from that page:

...a WalledGarden (at least as the originator originally imagined it) is a large set of pages that are suitable for a wiki, but bring in their own organizational or conceptual baggage, and hence integrate poorly with the rest of that wiki. The content is appropriate, but the form prevents integration.

This means that if you bring your own organizational structure to something, it won’t fit with the rest, walling it away.

Look, if I am looking for something specific, then I type those terms in. Say I use a search engine. If I am looking for a phrase, I use quotes and type in all the words (up to 10 for most engines) and I get hits with that phrase.

But usually, I want stuff “like” or “similar” to my words. Search engines know variations on words to try to give my search more breadth. Or, I don’t know what terms are appropriate, so I look for other terms in the content, read those findings, and learn the proper vocabulary as used by experts in that field.

But that’s not how tagging systems work. Instead, you have to know the terms up front to find anything. Using tagging systems to find stuff means typing in every possible variation of the terms you can think of. This is fun for browsing, but silly for research or answering questions. Note that every popular “tagging” system, to date, has been for consumer fun stuff (flickr, etc.) and not for real knowledge management.

Let’s try it together. del.icio.us is currently the hot social bookmarking site, so let’s find things on analyzing categorical data. I am sure someone out there is working on this, so let’s find it. The basic trick is that you type the term as part of the URL; the site has more docs on this.

So, let’s try it and see that it returns nothing. For comparison, Google returns 179k for the same phrase. Ok, now I have to guess at terms that might make more sense. How about chi-square? Or logistic regression? Of course, I am an expert and know these phrases; too bad for the average person who is screwed at this point. Logistic regression returns 3 links.

Look at the tags they are coded under:

“Bayesian Logistic Regression Software”
“generative discriminative naive bayes logistic regression statistics machine learning tom”
“logistic regression statistics”

So, basically, I have to know some pretty techie (stats techie) terms to get my answer. I have to use terms that are known only to experts. This is insane.

How does this “organization” help an outsider research? It doesn’t. You have to stumble onto a phrase that returns some relevant hits, and then try to understand why a link has these other terms to decide if they help or hinder your search. The examples in the Levy article only epitomize this. He speaks of the tag “GTD” as shorthand for things relevant to a book, “Getting Things Done”. Ok, but if I don’t know that, it’s a useless tag for sharing. Oh, it shares with those who already know, but shouldn’t social activities online be about more than setting up cliques and private languages like we did in 8th grade? (US schools go from Kindergarten to 12th grade, so 8th grade means 14 year olds.)

Hell, I know all the terms I would use to categorize my links: They are very esoteric and detailed, and I use them all over my private bookmarks (a la Blinkpro and my new favorite, Link-a-go-go). But as DMOZ has shown over and over again, if you are trying to organize knowledge for others instead of just yourself, you have to think a little more openly. Tagging is insular, not expansive.

So, I know this will evolve into something useful. We don’t have to stick with the Dewey or any other tree-based taxonomy if we don’t want to. My suggestion? Recommend “Standard Terms” and “Free Terms” as 2 separate fields. Let’s try to do our best to share some phrases which are key to the concept of an article or link or page or functionality or whatever, and use those in the Standard field. It can be a big list, and it will change, but the community would choose, and it would override (yes, change) the old terms where necessary. But “Free Terms” stay as whatever the user typed in there, as wacky and wild and useful and useless as they want to be.

I think we will find that using “Standard Terms” and then “Free Terms” together will allow folks to find useful things and expand their knowledge at the same time, instead of being forced to either find stuff they already know about, or ask to be let into the club.
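
Here is a rough sketch of what I mean, in Python. Every name in it (Bookmark, standard_terms, free_terms, rename_standard_term, search) is made up for illustration and is not any real site's API; it only assumes that the community can rename a Standard Term everywhere while Free Terms are never touched.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One shared link with two separate tag fields (hypothetical structure)."""
    url: str
    standard_terms: set[str] = field(default_factory=set)  # community-chosen vocabulary
    free_terms: set[str] = field(default_factory=set)      # whatever the user typed


def rename_standard_term(bookmarks, old, new):
    """The community picked a better term: override the old one everywhere.

    Free terms are deliberately left alone.
    """
    for b in bookmarks:
        if old in b.standard_terms:
            b.standard_terms.discard(old)
            b.standard_terms.add(new)


def search(bookmarks, term):
    """Return bookmarks matching the term in either field, standard matches first."""
    standard_hits = [b for b in bookmarks if term in b.standard_terms]
    free_hits = [b for b in bookmarks
                 if term in b.free_terms and term not in b.standard_terms]
    return standard_hits + free_hits
```

An outsider can find a link through the shared Standard Terms, then pick up the insider vocabulary ("gtd" and friends) from the Free Terms sitting next to them.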

* * *


  1. One feature of del.icio.us that you are overlooking is the related tags area. So if you begin your search by looking up a broad tag like just data, then you will see a list of related tags along with the results. These tags can often lead you down an interesting path which often does lead to a useful resource.

    Another point is that a comparison of del.icio.us to Google is hardly fair, as Google indexes some 11 billion documents whereas I’m guessing del.icio.us wouldn’t even have 1% of that (wild guess).

    greg    Aug 12, 01:20 AM    #

  2. Hmm… The related tags feature is relatively new, and coincided with the advent of “tag clouds”, which are a clever UI tool and can actually ameliorate some of the pains I mention in my various posts.

    But more importantly, the issue you raise is sort of my point: When you want to understand something completely new and novel to you, a maze of twisty passages is not really the best solution. After you get the basics, then the features, additions, and subtleties are valuable. But del.icio.us does it in the wrong order.

    In addition, a comparison of Google vs. Del is completely warranted: Google grabs any site on the planet, while Del gets only sites someone thinks are important or interesting enough to add. Shouldn’t that mean that it’s a better research experience? I don’t mean that Google’s 179k is impressive, I mean that Del’s 0 count is abhorrent.

    Again, once I’ve figured out some terms and what they mean, then digging around social tagging networks can add that extra spice. But everyone seems to be forgetting what it’s like to learn new topics, and that type of behavior is endemic to walled gardens. See my other tagging posts for more info about this…
    Michael Wexler    Aug 13, 07:54 PM    #

  3. To illustrate your idea that tags are lousy for information discovery you give an example where the seeker doesn’t know the “in” words and/or the topic can’t be pegged to a distinct label.

    That’s a reasonable gripe on the surface but misses some of the fine points:

    First, you suggest that search engines are better at a kind of “reverse dictionary” exploration where you need to go from concept to concrete information. Mostly they are, but even your example shows that it helps if you are hip to the “private language” of the three keywords you searched for. If you don’t know the buzzwords and your search terms are common words (Ex: Try searching for “analyzing tag categories” instead), you’ll end up with a high noise-to-signal ratio in your results.

    In reality, web searches (when you don’t know the “in” words) require an iterative strategy we all use subconsciously: 1) search for the most specific terms you know, 2) sift through noisy results to find relevant results (if any), 3) search again using better keywords (gleaned from previous hits or just thought up), 4) repeat.

    Likewise, if you use a tag system for information discovery you need to adopt the right strategy, only people haven’t had 10 years to figure that out like they have with search engines.

    In general, folksonomies lend themselves to the “pivot search”, which works like this: 1) search for the most specific tag you can think of, 2) if that yields too many or too few results, keep trying other words or combinations of words, 3) once you find a relevant result, look at how other people tagged it (you now know the “in” words), 4) search for those tags, 5) repeat.
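
    The pivot search steps above can be sketched roughly in Python. This is only an illustration under stated assumptions: the bookmark store is a plain dict mapping each URL to the set of tags people gave it, and the function name pivot_search, the rounds parameter, and the example URLs are all hypothetical, not part of any real tagging site.

```python
def pivot_search(bookmarks, start_tag, rounds=2):
    """Pivot from a known tag to the tags other people used on the same links.

    bookmarks: dict mapping url -> set of tags (hypothetical data shape).
    Returns (urls found, tags learned along the way).
    """
    seen_tags = {start_tag}
    frontier = {start_tag}          # tags to search for in this round
    found = set()
    for _ in range(rounds):
        # search: every link carrying at least one frontier tag
        hits = {url for url, tags in bookmarks.items() if tags & frontier}
        new_hits = hits - found
        found |= hits
        # look at how other people tagged the new results
        co_tags = set().union(*(bookmarks[url] for url in new_hits))
        frontier = co_tags - seen_tags
        if not frontier:            # nothing new to pivot on
            break
        seen_tags |= frontier
    return found, seen_tags - {start_tag}


# Made-up example data
bookmarks = {
    "http://example.com/google-suggest": {"suggest", "ajax"},
    "http://example.com/ajax-intro": {"ajax", "javascript"},
    "http://example.com/cooking": {"recipes"},
}
found, learned = pivot_search(bookmarks, "suggest")
```

    Starting from only "suggest", the second round reaches the link tagged "ajax", which is the kind of hop the steps describe.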

    For example, I may not know the current in-word for the nifty auto-suggest feature of Google Suggest, but if I look for only the tag “suggest” on del.icio.us, I learn pretty quickly that “ajax” might be a significant term.

    That’s pretty powerful: I’m not just finding resources related to a concept but finding labels important to the community. Moreover those labels aren’t just incidental/accidental associations with the topic (as found by search engines) but intentional, conscious labels given to it by people.

    Finally, you might fear that tags create walled gardens which hide rather than share information but, in practice, that doesn’t happen because:

    1) As more people tag a resource, the “gene pool” of associated tags becomes more diverse. So, if you say “potayto” and I say “potawto”, the community will link those resources eventually.

    2) Folksonomy systems are already getting smarter about capturing fuzzy associations. You can already browse related tag clouds. Future improvements would use simple linguistic analysis to group related tags (e.g. “blog, blogs, blogging”) and allow more consistent “phrase tags” so that “social_software”, “” and “social-software” would be equivalent.
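
    As a rough idea of what such grouping could look like, here is a minimal Python sketch. It is an assumption-laden toy, not how any tagging site actually works: normalize_tag and group_tags are hypothetical names, and the suffix stripping is deliberately crude (a real system would use a proper stemmer or lemmatizer).

```python
import re
from collections import defaultdict


def normalize_tag(tag):
    """Map tag variants to one key: lowercase, drop separators, crude suffix stripping."""
    key = re.sub(r"[\s_\-.]+", "", tag.lower())   # social_software == social-software
    if key.endswith("ing") and len(key) > 5:
        key = key[:-3]                            # "blogging" -> "blogg"
        if key[-1] == key[-2] and key[-1] not in "aeiou":
            key = key[:-1]                        # "blogg" -> "blog"
    elif key.endswith("s") and not key.endswith("ss") and len(key) > 3:
        key = key[:-1]                            # "blogs" -> "blog"
    return key


def group_tags(tags):
    """Group raw tags under their normalized key."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)
```

    Rules this simple will mangle some words (anything that merely happens to end in "ing" or "s"), which is why the grouping would need real linguistic tools behind it.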

    In short, search engines are great at fuzzy searches, but return results that are sometimes too fuzzy. Tag systems seem more rigid at the outset but are gaining ground with new “fuzzy” features all the time.

    Finally, consider this: Google only sees web pages. Right now that’s an advantage since the ratio of web pages to shared bookmarks is high. What happens when social bookmarking goes mainstream? Sooner than you think, there will be more social bookmarks than there are “interesting” web pages. Do the math – people can (and do) tag far more pages than they write.

    When social bookmarks reach this tipping point (or even close), tags will be the definitive insight into how people think about and classify information.
    S. Jones    Sep 28, 04:14 PM    #

  4. I wish I’d caught this stuff earlier; if it weren’t for tagging, I’d never have found it at all.

    If you’re thinking tagging systems serve the same purpose as search engines, to support research, for instance (and certainly if you mean in the same way), you’ve missed the value of tagging systems entirely. Put another way, if you evaluate tagging systems in light of their ability to deliver on the same value propositions as search engines, you’re missing the point.

    While it is true that we could, and will, do more interesting things associating tags, the fact that they are insular, as you put it, is extremely useful just as it is. If we don’t speak the same language, we’re not useful to one another—at least not at the present time.

    I use del.icio.us to discover smart people on various subjects. I do this by exploring the tags others have used for sites I find useful in a given area. I track from tags back to users. Once I hit upon someone that’s smart in a given area (defined as lots more engaging material in a given area than I’ve been able to collect), I plug his tag of interest into my RSS reader. At the present time I’m “intellectually drafting” off the discoveries of several very smart folks in several different areas. I’ve also discovered that a number of people are drafting off of me in the same fashion, sometimes on the same tags.

    With regard to that latter point, you might think they’re at a loss given that I’m looking to someone else that I consider smarter than me, while they’re stuck with me. But the truth is often different than that. I don’t choose to tag everything my experts find for me. I tag only things that are relevant for me. And that is always different than what is important for anyone else. Now sometimes, my interests are more closely aligned with someone else’s than theirs would be to my expert’s. In a sense, I’m a human filter for them. Again, that’s based upon our “insular” use of tags.

    Search is valuable, and certainly has its place. But to the extent it’s shaped our thinking on information acquisition and learning, it’s as much anchor as engine.

    As I’ve posted elsewhere, the value of tagging is less in search, and more in serendipitous discovery. And once the latter is experienced, it seems the greater good.
    bob    Sep 29, 09:53 PM    #

  5. Enjoyed the article. I use tagging sites like delicious in a different way than you present here: for backward links. I find the ‘bookmarks from others’ for a specific page helps me find other interesting users and interesting sources of information a lot more quickly than a search engine’s backward links feature. I concur with bob, above, that the value of tagging is less in search and more in discovery.

    I also agree with S. Jones above that we will start to see more linguistic tools such as stemming and lemmatisation pulling tags together.
    Robin    Oct 1, 04:41 PM    #

  6. Thanks for a really good post, and thanks to all the folks who’ve commented above me. You’ve really helped drive some design decisions. We’ve referenced your post and this discussion in our post at Tyner Blain, Software testing series: Organizing a test suite with tags part one.
    Scott Sehlhorst    Feb 6, 11:19 PM    #
