Software Is Not Eating Data Science

I am growing weary of what I will call data science exceptionalism: assuming that anything associated with data science, analytics, and big data is completely new or different. (Data science exceptionalism is perhaps a form of what Evgeny Morozov calls “solutionism”.)

Our example for today is “Software is Eating Data Science”. A couple of representative quotes:

“Automation is sending the data scientist the same way as the switchboard operator.”


“Over the next five years, someone will say ‘why am I spending $500,000 on people to do this work, when I can do it with software?’ So in the same way you see software starting to do what people are doing,” Weiss told hosts John Furrier and Dave Vellante. “Increasingly, companies like HP are figuring out how to automate that, but we’re still at the very early stages and there’s so much exciting work to do.”

We must be generous in our interpretations of these quotes; it would be unfair to assume that the author believes that data science was conducted using pen and paper, or perhaps an abacus, to this point. The argument is that automated processes, powered by software, will eventually make data scientists obsolete. Manual data science processes, the argument goes, can be automated because “tools and techniques are similar”. This automation then makes data scientists obsolete because software solutions can be reused and deployed on clusters of computers on premises or in the cloud.

The flaw in this reasoning comes at the end: contrary to the article, automation will result in a greater need for data scientists. Data science is, has always been, and will for the foreseeable future be a collaboration between man and machine. The best chess player in the world is a grandmaster with a computer, and the best analytics in the world will come from trained data scientists with computers. Automation will allow data scientists to work more productively, making data science more valuable and increasing both the benefit of and the demand for data science applications, which in turn generates more demand for data scientists.

The trends described in the article are no different than those in web development at the beginning of the (first) dot-com era: a combination of established and nascent technologies and processes is embraced by a larger audience, leading to progressively higher levels of abstraction and productivity. Simply put: what it meant to be a web developer in 2004 was different from what it meant in 1994, and from what it means today. So it is with data science: you won’t see PhDs writing bespoke Python to mine web comments for sentiment ten, or even five, years from now. For many organizations, the frontier of data science will move up the value chain from descriptive and predictive to prescriptive analytics (i.e. decisions), and will move from low-level data munging and model building to more componentized, automated processes. This will not eliminate the need for a data scientist role – it will affect how they spend their time.

Software is one of the fundamental tools of analytics, along with mathematical and domain expertise. Better tools may change how practitioners do their work, but they usually do not obviate the need for those skilled in the art. John Deere’s plow did not “eat farming”, so let’s stop the silly talk.


Author: natebrix

Follow me on twitter at @natebrix.
