I received several insightful replies to my weekend post where I blindly speculated about the number of big data applications. I thought about it over some red wine and I am dead certain I have figured it all out.
First, some clarification about where I am coming from. A big data analytics application is simply an analytics application where the required data does not fit on a single machine and needs to be considered in full to produce a result. In my thought experiment I am considering applications and not licenses. A single application may require a number of licenses of a particular software package in order to run in a cluster. Therefore, to JF Puget’s point, I’m only considering distinct “use cases”.
I was probably too pessimistic in my estimates. If I think about my own team alone, we arguably have 3 optimization applications and 3 predictive analytics applications in production right now. Broadening out to Nielsen as a company, there are several more. My updated estimates are:
- 5,000 big data applications,
- 50,000 optimization applications,
- 500,000 predictive analytics applications.
My reasoning is crude because I don’t know where to find hard data. But here’s where I am coming from.
Let’s begin with optimization, which I know best. Optimization is “old” by software standards – operations research came of age along with the advent of the digital computer. It is also the most difficult of the three application types to bring into production because the math is so sophisticated and the modeling so specialized. However, the scope of application is incredibly broad. I have no special knowledge but there must be at least 10 major optimization applications that fit my criteria at UPS alone. And 10 more at Amazon, 3 in my office, and so on. Further, the “tail” is long; from past experience I know there are many “one-off” mission-critical optimization applications in fields such as scheduling, supply chain, forestry management, budget optimization, etc.
Predictive analytics is the fuzziest of the three to estimate. My gut tells me a factor of 10 more than optimization, but depending on how loose we are in our definitions, I could easily see a factor of anything between 5 and 20. Anything that forecasts is a predictive analytics application, and forecasting is everywhere. Think of the financial world alone! I think it boils down to how many of these applications are actually profit-generating, and how loose we want to be in our definition of “predictive analytics”. A “TOP 10” SQL query is a predictive analytics application, I suppose, but not really in the spirit of what I was envisioning. 500K is a big number, but probably not outlandish.
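For anyone who wants to see the arithmetic behind that range, here is a rough sketch. It just multiplies my 50,000 optimization guess by the factors of 5, 10, and 20 mentioned above. The function and variable names are made up for illustration, and the inputs are my gut estimates, not measured data.

```python
# Back-of-the-envelope arithmetic only; every input is a guess from this post.
OPTIMIZATION_APPS = 50_000  # updated estimate for optimization applications


def predictive_estimate(optimization_apps: int, factor: int) -> int:
    """Scale the optimization estimate by a guessed multiplier."""
    return optimization_apps * factor


low = predictive_estimate(OPTIMIZATION_APPS, 5)    # tight definition
mid = predictive_estimate(OPTIMIZATION_APPS, 10)   # gut feeling
high = predictive_estimate(OPTIMIZATION_APPS, 20)  # loose definition

print(f"{low:,} to {high:,}, central guess {mid:,}")
# prints: 250,000 to 1,000,000, central guess 500,000
```

So the 500K figure sits in the middle of a range that spans a factor of four, which is about as much precision as this kind of guesswork deserves.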
This leaves us with big data. It is true that there is much that is labeled big data that is not really big data according to my definition, but that’s not my problem. I throw that stuff out. There are also innumerable samples (congratulations on computing pi!) but those are mainly to generate blog hits or sell something. Here I suspect that a relatively small number of players (Amazon, Google, IBM come to mind) account for a significant percentage of total big data applications. My figure of 5,000 is a concession that I might have been too pessimistic in my previous post, but it can’t be much more than that. In any case, the “tail” is currently much shorter than for other analytics applications, and my prediction is that this will always be true.
Big data applications will never be as broadly deployed as predictive and prescriptive analytics applications. Big data application user bases (e.g. Netflix) may grow to be just as large, but the sheer number and variety of applications will always trail predictive analytics and optimization. This moment will be seen as a transformative era in computing, not because of big data but because of the advent of cloud analytics. Cloud analytics can be built by anyone with basic programming knowledge and the ability to create models, and can provide insight to anyone with an internet-capable device. That’s the future, and one worth hyping.