My sanity was recently called into question, not for the first time:
— Paul A. Rubin (@parubin) October 1, 2015
Friend Of The Blog Erwin Kalvelagen asserts that it makes no sense to expression optimization models in SQL. (I wrote about this idea here.) Erwin makes a great point: different environments offer different advantages when carrying out different kinds of data manipulations. Optimization can be viewed as a very complex data manipulation, and SQL is often not the best way to express such manipulations.
That said, it’s a leap to say it never makes sense. I’m not asserting that everyone should do it all the time – just that it’s reasonable. If I wanted to write all of my models in SQL, then I would have just written a system to let me do that (which you can more-or-less do right now by reading this). I have no plans to do so, but it’s still a reasonable idea because:
- Lots of people know SQL (*),
- very few people know about optimization or operations research as a discipline, even within the “analytics community”, (**)
- conceptual knowledge of how SQL queries are constructed can easily be leveraged to define full optimization models, and
- in certain cases it’s a pain (or not possible (***)) to pop out of one computing environment to express a model in another.(****)
The parts of the machine learning community that think about optimization think of it as fancy search. SQL is a querying language.
Further, part of our future (or at least mine) are scenarios where you need to solve a very large number of relatively simple optimization models with similar structure but different data, as part of a larger data transformation pipeline. Such models will likely be solved in computational environments like Spark, Hadoop, or Flink where SQL syntax will be resident. It’s totally not crazy to imagine expressing such models in this way. Will such cases actually use SQL, as opposed to Python, Scala, R, or whatever(*****)? I don’t know. I do think that somebody will use SQL at the very least as inspiration in enabling concise expression of optimization models.
* Here is the proof. Not on this list: AMPL, GAMS, AIMMS. They never will be. Ever.
** Paco Nathan’s Just Enough Math and John Foreman’s Data Smart are refreshing exceptions.
*** A reality of working in a team environment is that you do not always have full control over your computing environment.
**** Which is why, for example, we may choose to express PageRank in GAMS….
***** Again, there is zero – ZERO – chance that AMPL, GAMS, AIMMS, or your favorite optimization modeling language will be used in these scenarios. I don’t mean to offend proponents of what are great systems, just stating a wider industry reality that may not be fully appreciated by those outside industry.