Thursday, December 3, 2009

The art (literally) of data mining

Thanks to Amanda Jarman's great "Fundraising Nerd" blog, I stumbled onto a fantastic way to procrastinate on my latest paper.  At this stage in the semester, any brief diversion is a breath of fresh air.

The Personas Project, as described by their website, "shows you how the Internet sees you...Enter your name, and Personas scours the web for information and attempts to characterize the person - to fit them to a predetermined set of categories that an algorithmic process created from a massive corpus of data. The computational process is visualized with each stage of the analysis, finally resulting in the presentation of a seemingly authoritative personal profile."

"Seemingly authoritative" would have been a great name for this blog, come to think of it.  Anyway, for kicks I ran both my full name and nickname.  Now, keep in mind that this isn't accurate at all for most people.  Using myself as an example:


I can say that the hits for politics, music, books, family, and education are, I think, fairly "me".  But sports?  Ha.  And what in the world is that "illegal" category?  I'm not that interesting.  Apparently, Google seems to think that I'm likely to enjoy driving my well-educated family to sporting events in a car covered with Obama bumper stickers and, while there, scalping tickets.  Good times.

So obviously, there are things like misidentifications, limited search results, algorithm limitations, and on and on.  But that's the point.  We may hear a lot of promises from the analytics field, and there are very tangible ROIs to be had from wisely using the data you've collected, but there are also limitations.  The Personas Project is a great way to show that (perhaps more often than we'd like to admit) data/text analytics is more of an art than a science.
