Optimization Using SQL: Not Crazy

My sanity was recently called into question, not for the first time:

Friend Of The Blog Erwin Kalvelagen asserts that it makes no sense to expression optimization models in SQL. (I wrote about this idea here.) Erwin makes a great point: different environments offer different advantages when carrying out different kinds of data manipulations. Optimization can be viewed as a very complex data manipulation, and SQL is often not the best way to express such manipulations.

That said, it’s a leap to say it never makes sense. I’m not asserting that everyone should do it all the time – just that it’s reasonable. If I wanted to write all of my models in SQL, then I would have just written a system to let me do that (which you can more-or-less do right now by reading this). I have no plans to do so, but it’s still a reasonable idea because:

  • Lots of people know SQL (*),
  • very few people know about optimization or operations research as a discipline, even within the “analytics community”, (**)
  • conceptual knowledge of how SQL queries are constructed can easily be leveraged to define full optimization models, and
  • in certain cases it’s a pain (or not possible (***)) to pop out of one computing environment to express a model in another.(****)

The parts of the machine learning community that think about optimization think of it as fancy search. SQL is a querying language.

Further, part of our future (or at least mine) are scenarios where you need to solve a very large number of relatively simple optimization models with similar structure but different data, as part of a larger data transformation pipeline. Such models will likely be solved in computational environments like Spark, Hadoop, or Flink where SQL syntax will be resident. It’s totally not crazy to imagine expressing such models in this way. Will such cases actually use SQL, as opposed to Python, Scala, R, or whatever(*****)? I don’t know. I do think that somebody will use SQL at the very least as inspiration in enabling concise expression of optimization models.

* Here is the proof. Not on this list: AMPL, GAMS, AIMMS. They never will be. Ever.
** Paco Nathan’s Just Enough Math and John Foreman’s Data Smart are refreshing exceptions.
*** A reality of working in a team environment is that you do not always have full control over your computing environment.
**** Which is why, for example, we may choose to express PageRank in GAMS….
***** Again, there is zero – ZERO – chance that AMPL, GAMS, AIMMS, or your favorite optimization modeling language will be used in these scenarios. I don’t mean to offend proponents of what are great systems, just stating a wider industry reality that may not be fully appreciated by those outside industry.

Author: natebrix

Follow me on twitter at @natebrix.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s