Monday, February 8, 2010

Basic Text Mining for Fundraisers

Kevin MacDonell has another great post up on his blog, and this time it's about his experiment with applying text mining to nonprofit fundraising.  What he did was basically this: dig into the free text fields (those ubiquitous "Comments/Notes?" blocks that are used as catch-alls for data that doesn't fit anywhere else, and for observations that were significant enough to note), and see which recurring phrases jumped out at him.
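
To make that concrete, here's a minimal sketch in Python of that sort of phrase counting.  To be clear, this is not Kevin's actual method, just my guess at the simplest version of the idea: split each note into words, collect the two-word phrases, and count how many notes each phrase shows up in.  The function name, the sample notes, and the cutoff of two notes are all made up for illustration.

```python
import re
from collections import Counter

def recurring_phrases(notes, n=2, min_count=2):
    """Return (phrase, count) pairs for n-word phrases found in at least min_count notes.

    Hypothetical helper for illustration only; not taken from Kevin's post.
    """
    counts = Counter()
    for note in notes:
        # Lowercase the note and keep only runs of letters and apostrophes as words
        words = re.findall(r"[a-z']+", note.lower())
        # Use a set so a phrase is counted once per note, however often it repeats there
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(phrases)
    return [(phrase, count) for phrase, count in counts.most_common() if count >= min_count]

# Made-up sample notes, standing in for a real "Comments/Notes" field
notes = [
    "Attended homecoming; season tickets holder.",
    "Has season tickets, prefers email.",
    "Prefers email. Met at reunion.",
]

for phrase, count in recurring_phrases(notes):
    print(f"{phrase}: {count}")
```

On real data you'd want to throw out filler words and eyeball the results before trusting them, which is exactly where the time goes.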

He outlines the process he came up with, and from the looks of it, these are the very steps that SPSS automates quite well.  That makes me wonder: is the real value of text mining software in the included libraries?  Or is there honestly some wondrous proprietary algorithm lurking in there?  My experience with it so far has led me to suspect that you can get the basic software from anyone (provided that it's fairly user friendly), and from there its value depends mainly on how you train it.  It's rather like hiring an adorable infant...it's cute, but not really productive until you teach it what it needs to know, and the entire process can be time consuming, frustrating, and expensive.  It's easy for a project like this to take on a life of its own, and before you know it, terms like "boondoggle" get whispered.

So, back to Kevin.  His experiment was basic, but that's one of the best things about it.  He didn't sink an ungodly amount of time and resources into it.  It was just a good old fashioned "Hmmm...I wonder what would happen if..." and off he went.  I fear that too many people will read about text mining somewhere, jump into the deep end, and come out overwhelmed and discouraged.  Maybe in the future when things have caught up, that'd be fine, but the entire field is one big beta version right now.

My recommendation would be to keep collecting that unstructured data, absolutely.  But don't worry just yet about when you'll get around to doing something with it.

Friday, February 5, 2010

Sentiment Analysis at the Huffington Post

I stumbled onto Stephen Baker's blog, The Numerati, and found a fascinating article on the analysis methods of The Huffington Post.  Of particular interest to me was how they're putting sentiment analysis into action by analyzing the comments made on their site and using that data to adjust the featured stories.  Wouldn't you just love to take a field trip into their offices and see how it's all being put together?