Data Mining Strategy
In the business world, the buzzword “Big Data” carries a sense of opportunity. The term often comes up in business conversations as a new-age way to find insights about your company by leveraging the large pools of data you have stored in Excel files somewhere. In reality, the goal is the same as it has always been: discover insights from the data available. Now there is just more data (and more noise). With the rise of “Big Data,” many new companies have popped up claiming to be “Big Data experts” and offering a simple turn-key approach, which is very enticing to companies with only a novice understanding of data mining. While we all love the new neighbor offering us a free ice cream cone on a hot day, it may not be the best way to quench our thirst (and it may fool our stomach into feeling full without giving it anything of value).
Why you should be wary of “turn-key” Big Data solutions
For two BIG reasons:
- Science is important, not just finding correlations
- Data without context is worthless
Science is important, not just finding correlations
Google Flu Trends is a good example of correlation falling flat on its face. Google Flu Trends used algorithms to predict flu outbreaks by analyzing search query behavior. For example, if a certain region showed a spike in the search phrase “How do I know if I have the flu?”, the model predicted that the region was infected, or about to be infected, with that year’s strain of the flu. But a spike in searches can have causes other than illness, such as news coverage of the flu. Google Flu Trends based its intel on correlation rather than causation, and the foundation eventually collapsed.
As these “new” Big Data companies pop up, it’s important to understand their statistical background. “Data scientists” have learned many statistical lessons over the years, and those lessons should be built upon and refined rather than ignored. Data in “smaller” quantities (before the Big Data era) had many problems that needed to be addressed. Just because there is more data now doesn’t mean those “small data” problems have vanished; they have most likely grown along with the data.
Basing your business decisions on correlations alone is like asking your mechanic to fix your car with a blindfold on. If he can’t see what caused the car to break down in the first place, how can he fix it?
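To make the difference between correlation and causation concrete, here is a minimal Python sketch. It is a toy simulation, not a model of Google Flu Trends: the variable names (`hidden_driver`, `signal`, `outcome`) and all the numbers are assumptions chosen for illustration. A hidden factor drives both the signal you can see and the outcome you care about, so the two look tightly linked. Once the signal is set independently of the hidden factor, the link disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Return (observed_r, intervened_r) for a toy confounded system."""
    rng = random.Random(seed)

    # Hypothetical hidden factor that we never get to measure directly.
    hidden_driver = [rng.gauss(0, 1) for _ in range(n)]

    # Both the visible signal and the outcome depend on the hidden factor,
    # but neither one causes the other.
    signal = [h + rng.gauss(0, 0.5) for h in hidden_driver]
    outcome = [h + rng.gauss(0, 0.5) for h in hidden_driver]
    observed_r = pearson(signal, outcome)

    # "Intervention": set the signal ourselves, ignoring the hidden factor.
    # The outcome does not move, because the signal never caused it.
    forced_signal = [rng.gauss(0, 1) for _ in range(n)]
    intervened_r = pearson(forced_signal, outcome)

    return observed_r, intervened_r


observed_r, intervened_r = simulate()
print(f"correlation when only observing:  {observed_r:.2f}")
print(f"correlation after intervening:    {intervened_r:.2f}")
```

A decision that amounts to “push the signal up and the outcome will follow” only works if the first number survives the intervention. In this sketch it does not.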
Data without context is worthless
There is a common assumption that “Big Data,” or data in large quantities, reduces the percentage of error (or unknown variables), which turns the data mining exercise into a simple matter of mining for gold. That is really two assumptions: first, that more data leaves less unknown, and second, that the area you’re mining sits on top of a gold mine.
Let’s address the first assumption. You don’t have all the data available. No matter how much you think you do, you don’t. For example, some companies crunch their Twitter data hoping to gain insights into the sentiment around their brand, and eventually use these insights to shape their messaging strategy. The problem is that this assumes Twitter is the whole conversation. It’s not. As Text Analytics leader Tom Anderson points out, “Twitter and blogs represent 8% of the conversation.” There must always be a question about what is missing, especially with a big pile of raw data.
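A rough way to see how much the missing part matters is to ask how far the overall picture could move given only the slice you observed. The sketch below uses the 8% figure quoted above; everything else is an assumption for illustration: sentiment is scored from -1 (all negative) to +1 (all positive), the observed score of 0.6 is made up, and the function name `overall_sentiment_bounds` is hypothetical.

```python
def overall_sentiment_bounds(observed_share, observed_score,
                             score_min=-1.0, score_max=1.0):
    """Worst-case and best-case overall sentiment.

    observed_share: fraction of the conversation we actually measured.
    observed_score: average sentiment within that fraction.
    The unobserved remainder could sit anywhere in [score_min, score_max].
    """
    if not 0.0 <= observed_share <= 1.0:
        raise ValueError("observed_share must be between 0 and 1")
    unseen_share = 1.0 - observed_share
    seen_part = observed_share * observed_score
    lowest = seen_part + unseen_share * score_min
    highest = seen_part + unseen_share * score_max
    return lowest, highest


# 8% of the conversation observed, with an assumed (made-up) score of 0.6.
low, high = overall_sentiment_bounds(observed_share=0.08, observed_score=0.6)
print(f"overall sentiment could be anywhere from {low:.3f} to {high:.3f}")
```

Under these assumptions, a clearly positive reading on the observed slice is compatible with almost any overall sentiment. Narrowing that range takes context about the conversation you did not capture, not more of the same data.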
Let’s address the second assumption. Believing that you’re sifting for gold on top of a gold mine assumes that all your data is relevant and meaningful. In other words, it assumes that whatever insights (or correlations) you pull from your data are meaningful. This is where “context” comes into play, and it is also why it’s important to stay away from companies that offer a “turn-key” approach.
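The risk of treating every correlation as meaningful can be shown with data that contains no signal at all. The following Python sketch is an illustration under stated assumptions: the target and every column are pure random noise, the sizes (50 rows, up to 2,000 columns) are arbitrary, and the function name `strongest_noise_correlation` is hypothetical. It reports the strongest correlation found as more and more noise columns are searched.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_noise_correlation(column_counts, n_rows=50, seed=1):
    """Map each column count to the largest |r| found among that many
    random columns, none of which has any real link to the target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]

    best_so_far = 0.0
    results = {}
    for k in range(1, max(column_counts) + 1):
        # Every new column is independent noise.
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best_so_far = max(best_so_far, abs(pearson(column, target)))
        if k in column_counts:
            results[k] = best_so_far
    return results


results = strongest_noise_correlation([10, 100, 2000])
for k, r in results.items():
    print(f"{k:>5} noise columns searched -> strongest |r| = {r:.2f}")
```

Because each larger search includes the smaller one, the strongest correlation can only stay the same or grow as columns are added. None of these “findings” means anything, which is why someone who knows the business has to judge whether a correlation makes sense.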
A data mining project should involve a back-and-forth conversation between the client and the research company. The client should know their business well enough to spot insights that make no sense for it. The research company’s experience should serve as a guide for avoiding common pitfalls and for identifying which data is needed.
This back-and-forth conversation breathes context into the project and makes for much more meaningful insights.
So what do I do with my big pile of data?
- Find a research company that can identify the causal variables that impact your business rather than simply finding correlations. Tread carefully, as the quickest and easiest solution is often not the best option.
- Take the steps to prepare your data for a data mining project.
“Big Data” does provide an opportunity to gain insights into your business, but it faces the same challenge that “small data” faces: finding the signal in the noise. “Big Data” just makes the signal even harder to find.