Thursday, May 27, 2010

The new location for this blog

I've FINALLY gotten my new webpage and blog to a point where I'm reasonably comfortable going live.  Please update your bookmarks and links to http://www.olomon.com/wp.  Thanks so much for reading!

Monday, March 1, 2010

Gogo Inflight Wifi is using unsecured clear text passwords?

In the process of researching a paper on the legality of wiresniffing (CliffsNotes version: it's not legal, so stop trying to pretend it is), I found out a really horrifying fact:

Gogo, the inflight wireless company that I researched last semester (here's the paper I wrote about inflight wifi), transmits its logon passwords as clear text.

No, seriously.

To even use their service, you have to log onto their page, accept their terms, pay for your service, and then mosey around the net.  My guess is that they use a nice secure page for their credit card intake form (can't check on that now since I'm not on a plane), but when you log into an existing account, your username and password go out in the clear to everyone who's listening in.  Anyone can listen in using free software like Wireshark, and it's not hard to do.  I wish I knew if it was a switched network or not...anyone know?  I can't find any info on their site, and I'll probably fire off an inquiry to their customer service, though I don't expect a response.
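
For anyone who wants to check a login page without sniffing anything, here's a rough sketch of the idea: look at where the page's forms submit, and flag any target that isn't https, since anything posted there travels as readable text.  The portal address and HTML below are made up for illustration (I have no idea what Gogo's actual markup looks like), and the function name is just something I picked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FormActionFinder(HTMLParser):
    """Collects the action attribute of every <form> tag on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action means the form posts back to the page itself.
            self.actions.append(dict(attrs).get("action") or "")


def insecure_form_targets(page_url, html):
    """Return the form targets on the page that are not submitted over https."""
    finder = FormActionFinder()
    finder.feed(html)
    targets = [urljoin(page_url, action) for action in finder.actions]
    return [t for t in targets if urlparse(t).scheme != "https"]


# Hypothetical portal page, not real Gogo markup.
page_url = "http://portal.example/login"
html = '<form action="/session" method="post"><input name="password"></form>'
print(insecure_form_targets(page_url, html))
```

This only looks at the form target.  It says nothing about the credit card page or about what the network does with the traffic afterward.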

Well, I guess that discount coupon I had for service this month will go unused, because there's no way on earth I'm logging on now.

Monday, February 8, 2010

Basic Text Mining for Fundraisers

Kevin MacDonell has another great post up on his blog, and this time it's about his experiment with applying text mining to nonprofit fundraising.  What he did was basically this: dig into the free text fields (those ubiquitous "Comments/Notes?" blocks that are used as catch-alls for data that doesn't fit anywhere else, and for observations that were significant enough to note), and see what recurring phrases jumped out at him.
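
To give a feel for what "see what phrases recur" can look like in practice, here's a minimal sketch.  This is not Kevin's actual process, just my assumption of the simplest version: split each comment into words, build two-word phrases, and count how many comments each phrase shows up in.  The sample notes are invented.

```python
import re
from collections import Counter


def recurring_phrases(comments, n=2, min_count=2):
    """Return (phrase, count) pairs for n-word phrases found in at least min_count comments."""
    counts = Counter()
    for text in comments:
        words = re.findall(r"[a-z']+", text.lower())
        # A set, so a phrase repeated inside one comment only counts once.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(phrases)
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


# Invented free-text notes, purely for illustration.
notes = [
    "Attended homecoming weekend, interested in planned giving",
    "Asked about planned giving options",
    "Spouse attended homecoming weekend last year",
]
for phrase, count in recurring_phrases(notes):
    print(count, phrase)
```

A real notes field would need more cleanup than this (misspellings, abbreviations, staff initials), which is where most of the time tends to go.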

He outlines the process he came up with, which, from the looks of it, covers the things that SPSS automates quite well.  That makes me wonder: is the real value of text mining software in the included libraries?  Or is there honestly some wondrous proprietary algorithm lurking in there?  My experience with it so far has led me to suspect that you can get the basic software from anyone (provided that it's fairly user friendly), and from there its value depends mainly on how you train it.  It's rather like hiring an adorable infant...it's cute, but not really productive until you teach it what it needs to know, and the entire process can be time consuming, frustrating, and expensive.  It's easy for a project like this to take on a life of its own, and before you know it, terms like "boondoggle" get whispered.

So, back to Kevin.  His experiment was basic, but that's one of the best things about it.  He didn't sink an ungodly amount of time and resources into it.  It was just a good old fashioned "Hmmm...I wonder what would happen if..." and off he went.  I fear that too many people will read about text mining somewhere and want to jump into the deep end, but come out overwhelmed and discouraged.  Maybe in the future when things have caught up, that'd be fine, but the entire field is one big beta version right now.

My recommendation would be to keep collecting that unstructured data, absolutely.  But don't worry just yet about when you'll get around to doing something with it. 

Friday, February 5, 2010

Sentiment Analysis at the Huffington Post

I stumbled onto Stephen Baker's blog, The Numerati, and found a fascinating article on the analysis methods of The Huffington Post.  Of particular interest to me was how they're putting sentiment analysis into action, by analyzing the comments made on their site, and using that data to adjust the featured stories.  Wouldn't you just love to take a field trip into their offices and see how it's all being put together?

Saturday, January 23, 2010

Looking for good text analytics packages to review

Project 1 for my Text Analytics class is to either find some related article and give a 10-15 minute presentation on it for the class, or to do a quick demo of some software package other than what we're using for class (which is SPSS Text Analytics for Surveys).  So far, I've downloaded GATE 5.1 and RapidMiner 5.  I can't really do too much puttering around with them yet; I need to learn what I'm doing before I can even figure out how to properly use them.  Luckily, I was able to sign up for a presentation date late in February, which gives me plenty of time to play around with them and have something constructive to show the class.

This also helps out my overall interest in nonprofit analytics, especially for smaller organizations.  If we're gathering all this unstructured data, what if there's a wealth of information there that we're not yet putting together?  And what if the excuse for that is, we can't afford the pricey packages out there now?  For that reason, I wanted to make sure that what I demo'd is free, or at least exceedingly inexpensive (well...also because I'm a broke grad student.  Either it's free--and preferably open source--or I keep on looking).

Anyone have experience with any of these packages I mentioned, or have a suggestion to add to the mix?

Also, I found a couple of articles about text analytics in general:
  • Taco Bell Takes Heat Over 'Drive-Thru Diet' Menu - This quote from it particularly made me laugh:  "Prior to launch, posts were 73% positive, putting it ahead of beloved chains like Subway, Wendy's and Domino's. Words associated with the brand online were 'love,' 'delicious,' and 'favorite.' Postings are now 67% positive, putting Taco Bell behind White Castle, Blimpie and Arby's, which rank among the category's lower tier. Now three of the words most closely associated with Taco Bell and its campaign have been 'fat,' 'stop,' and 'joke.'"
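
Out of curiosity, here's a toy sketch of how a "percent positive" number like the ones in that quote could be produced.  This is my own guess at the simplest possible approach, not how the firm behind those figures does it: score each post against two small word lists (I borrowed the six words from the quote), skip posts with no opinion either way, and report the share of the rest that lean positive.  The sample posts are invented.

```python
import re

# Tiny hand-made word lists borrowed from the quote; a real lexicon would be far larger.
POSITIVE = {"love", "delicious", "favorite"}
NEGATIVE = {"fat", "stop", "joke"}


def post_score(text):
    """Positive word count minus negative word count for one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def percent_positive(posts):
    """Share of opinionated posts (nonzero score) that lean positive, or None if there are none."""
    rated = [s for s in (post_score(p) for p in posts) if s != 0]
    if not rated:
        return None
    return 100 * sum(s > 0 for s in rated) / len(rated)


# Invented posts, purely for illustration.
posts = [
    "I love the new menu, delicious",
    "This diet campaign is a joke",
    "My favorite drive-thru",
    "Went there on Tuesday",
]
print(percent_positive(posts))
```

Word counting like this misses sarcasm and negation completely ("not my favorite" scores as positive), which is a big part of why the training effort matters so much.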

Tuesday, January 19, 2010

The new semester is going to have me pretty busy

Between my grad assistantship work for 2 different profs, my duties as MSIS rep for the Graduate Business Association, and my classes, I'll be pretty darned busy until May.  My courses are:

Information Systems Analysis and Design
Cyber Security Tech Factors
Java Development
Text Analytics

I'm most interested in the text analytics class, since I'll finally get hands-on experience with SPSS, and the whole topic is a favorite of mine.  Should be a fun few months.