Skills of a Data Miner

October 09, 2009

The sexy job in the next ten years will be statisticians

This post was inspired by an interesting, if unusual, quote in the McKinsey Quarterly:

I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complimentary scarce factor is the ability to understand that data and extract value from it.

— Hal Varian, Google’s Chief Economist

Hal Varian is referring to what is now considered a discipline in its own right: data mining. Data mining is not only statistics, even if statistics is its most recognised academic component. It also includes data cleaning, machine learning and data visualization.

To put everything together, you need a good dose of programming skills. A modern statistician, that is, a data miner, should therefore be able to perform most, if not all, of these activities.

These techniques are used together to study data and find previously hidden trends or patterns within it. Data mining is gaining acceptance in areas of science and business that need to analyse large amounts of data to discover trends they could not otherwise find.
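As a rough illustration of how these skills combine, here is a minimal Python sketch that cleans a few records, fits a straight-line trend by least squares, and draws a crude text chart. The data and the function names (`clean`, `fit_trend`, `text_chart`) are invented for this example and use only the standard library; real work would more likely rely on dedicated tools.

```python
from statistics import mean

def clean(rows):
    """Data cleaning: drop records whose values are missing or not numeric."""
    cleaned = []
    for x, y in rows:
        try:
            cleaned.append((float(x), float(y)))
        except (TypeError, ValueError):
            continue  # skip None, "n/a" and similar junk
    return cleaned

def fit_trend(points):
    """Statistics: ordinary least-squares slope and intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def text_chart(points, width=40):
    """Visualization: one bar per point, scaled to the largest value.

    Assumes at least one positive y value.
    """
    top = max(y for _, y in points)
    return "\n".join(
        f"{x:>4.0f} | " + "#" * round(width * y / top) for x, y in points
    )

# Made-up raw data with two unusable records.
raw = [(1, "2.1"), (2, "3.9"), (3, None), (4, "8.2"), (5, "n/a"), (6, "11.8")]

points = clean(raw)
slope, intercept = fit_trend(points)
print(text_chart(points))
print(f"trend: y = {slope:.2f} * x + {intercept:.2f}")
```

Each function stands in for one of the skills mentioned above, and the few lines of glue at the bottom are the programming that ties them together.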

We are being swamped with data we mostly cannot use for any business advantage. The amount of data will double in the next three years. Raw data is useless. Those geeks or sexy statisticians who can model, clean, and visually communicate data are going to have fun.