I recently read an interesting blog post about how Rackspace is using Hadoop to process email logs. They are able to answer complex questions that would otherwise be impossible to answer. I was wondering if anyone on the OMC knows of any good Hadoop stories.
TIA
johnmwillis.com
Our team at my last place was using it to build a large number of very large BDB indexes simultaneously. Log processing still seems to be the best candidate, though one of the neat pieces of Hadoop is DFS. DFS is fantastic for storing a large file across several nodes and distributing parts, and even copies of parts, to other systems.
Here's a series of Hadoop articles on Ceteri.org intended as an intro to using it. Those blog articles are based on an open source project which I started as a place to collect Hadoop and MapReduce coding examples. I found that Hadoop application examples were few and far between. The first example uses data from the jyte.com cred graph to calculate a simplified PageRank metric.
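For anyone who has not seen how a PageRank-style metric fits the map/reduce shape, here is a minimal sketch in plain Python. It is not the code from the project mentioned above and it does not use Hadoop's API; the function names, the damping value of 0.85, and the tiny example graph are all assumptions made up for illustration. It also ignores nodes with no outgoing links, which is one way the metric can be "simplified".

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each node splits its current rank evenly over its out-links."""
    for node, targets in graph.items():
        for target in targets:
            yield target, ranks[node] / len(targets)


def reduce_phase(pairs, nodes, damping=0.85):
    """Reduce step: sum the shares each node received, then apply damping."""
    sums = defaultdict(float)
    for target, share in pairs:
        sums[target] += share
    n = len(nodes)
    return {node: (1 - damping) / n + damping * sums[node] for node in nodes}


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds over an adjacency-list graph."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), nodes, damping)
    return ranks


if __name__ == "__main__":
    # Hypothetical "who gave cred to whom" graph, not real jyte.com data.
    cred = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
    for user, rank in sorted(pagerank(cred).items()):
        print(user, round(rank, 4))
```

On a real cluster the map and reduce steps would run as separate tasks over partitions of the graph, with one job per iteration; the loop in `pagerank` stands in for that job chaining.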
We were using Hadoop at my previous firm to handle parts of our natural language processing and text analytics. I have a hunch that we'll be using Hadoop or something much like it at my next firm, wherever that might be.
Paco NATHAN