Jay, JD, and I have been having a running conversation about tags – not HTML, RFID, or pet ID tags, but user-generated categories for things like blog posts, web pages, and photos. I think that some kind of folksonomy will replace, or at least enhance, the way we currently google for information. Obviously there are a lot of issues to overcome before then, but even with my cursory understanding of them I can think of a couple of approaches that the computer science types I work with haven’t been able to completely shoot down.
We have a project in the shop that has to be able to tag pages so that you can do a lookup and find other related pages by comparing the tags. As simple as this sounds, it turns out that at any scale at all this can take quite a bit of computing horsepower.
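To make the lookup concrete, here is a minimal sketch of one way to find related pages by comparing tags. It is not our project's actual code; the page names, the tags, and the functions `build_tag_index` and `related_pages` are made up for illustration. It assumes "related" simply means "shares the most tags" and uses an inverted index (tag to pages) so that a lookup only touches pages sharing at least one tag.

```python
from collections import Counter, defaultdict


def build_tag_index(page_tags):
    """Map each tag to the set of pages that carry it (an inverted index)."""
    index = defaultdict(set)
    for page, tags in page_tags.items():
        for tag in tags:
            index[tag].add(page)
    return index


def related_pages(page, page_tags, index, limit=5):
    """Rank other pages by how many tags they share with `page`."""
    shared = Counter()
    for tag in page_tags[page]:
        for other in index[tag]:
            if other != page:
                shared[other] += 1
    return shared.most_common(limit)


# Hypothetical sample data, for illustration only.
page_tags = {
    "a.html": {"folksonomy", "tags", "search"},
    "b.html": {"tags", "photos"},
    "c.html": {"folksonomy", "tags", "blogs"},
    "d.html": {"cooking"},
}
index = build_tag_index(page_tags)
print(related_pages("a.html", page_tags, index))
# prints [('c.html', 2), ('b.html', 1)]
```

Even in this toy version you can see where the horsepower goes: a popular tag points at a huge set of pages, and every lookup has to walk all of them.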
So of course that got me thinking: how does Technorati keep up with a million blogs, a million different tags, and 14 million tagged blog posts? With all of the downtime they have been having, they may be struggling with the issue too.
I’ve been interested in tags, and more importantly, folksonomy, since it first appeared on my radar sometime last year. Wikipedia defines it thusly:
Folksonomy is a neologism for a practice of collaborative categorization using freely chosen keywords. More colloquially, this refers to a group of people cooperating spontaneously to organize information into categories, noted because it is almost completely unlike traditional formal methods of faceted classification.
This phenomenon typically only arises in non-hierarchical communities, such as public websites, as opposed to multi-level teams. Since the organizers of the information are usually its primary users, folksonomy produces results that reflect more accurately the population’s conceptual model of the information.
With Technorati, tagging is mostly a matter of users tagging their own blog posts, but the idea gets much more interesting with services like Furl and Flickr. With these services a user creates something (a web page or digital photo) and can tag it. But I can tag it, too, as can you. With an entire community tagging things, each item has a much better chance of being described correctly than if one person or entity has to define it.
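As a rough illustration of why many taggers beat one, here is a small sketch that pools the tags several people gave the same photo and keeps only those that enough of them agree on. The user names, tags, the `consensus_tags` function, and the `min_votes` threshold are all assumptions of mine; this is not how Flickr or Furl actually work.

```python
from collections import Counter


def consensus_tags(user_tags, min_votes=2):
    """Return tags applied by at least `min_votes` users, most popular first."""
    votes = Counter()
    for tags in user_tags.values():
        # Normalize case and whitespace, and count each user once per tag.
        votes.update({tag.strip().lower() for tag in tags})
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, count in ranked if count >= min_votes]


# Hypothetical tags from three users for one photo.
user_tags = {
    "alice": ["Sunset", "beach"],
    "bob": ["sunset", "ocean", "beach "],
    "carol": ["sunset", "vacation"],
}
print(consensus_tags(user_tags))
# prints ['sunset', 'beach']
```

One-off tags like "ocean" drop out, while the tags the group agrees on rise to the top.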
Distributing the tagging to the users themselves makes the labor very efficient and takes advantage of a huge pool of specialized knowledge. Companies like Getty Images have employed tagging for years (tagging all of their photos), and it works well on the front end, but it is very labor intensive. Paying dedicated taggers is not a sustainable business model when you are trying to tag everything.
While Furl and del.icio.us are “bookmark managers”, Flickr is for tagging photos, Technorati is for blogs, and so on, what we need is an open, standard architecture that enables all of us to tag everything digital and provides an open framework so that everyone can take this information and develop uses for it that no one has even thought of yet.