Though Swamy’s article focuses on mixed integer programming (MIP), a specific category of optimization problems for which there is robust, efficient software, its points apply to optimization generally. Optimization is goal seeking: searching for the values of variables that lead to the best outcomes. Optimizers solve for the best variable values.

Swamy describes two relationships between optimization and machine learning:

- Optimization as a means for doing machine learning,
- Machine learning as a means for doing optimization.

I want to put forward a third, but we’ll get to that in a moment.

Relationship 1: you can always describe **predicting** in terms of **solving**. A typical flow for prediction in ML is

- Get historical data for:
  - The thing you want to predict (the outcome).
  - Things that you believe may influence the predicted variable (“features” or “predictors”).
- Train a model using the past data.
- Use the trained model to predict future values of the outcome.

Training a model often means “find model parameters that minimize prediction error on the training set”. Training is solving. Here is a visual representation:

Relationship 2: you can also use ML to optimize. Swamy gives several examples of steps in optimization algorithms that can be described using the verbs “predict” or “classify”, so I won’t belabor the point. If the steps in our optimization algorithm are numbered 1, 2, 3, the relationship is like this:

In these two relationships, one verb is used as a subroutine for the other: solving as part of predicting, or predicting as part of solving.
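To make Relationship 1 concrete, here is a minimal sketch of "training is solving". The data, the linear model form, and the choice of plain gradient descent are all assumptions made for illustration; real ML libraries use more sophisticated solvers, but the shape is the same: define an error function on the training data, then search for the parameters that minimize it.

```python
# Toy training data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def mean_squared_error(w, b):
    """Objective: average squared prediction error on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(learning_rate=0.05, steps=2000):
    """Solve for (w, b) by gradient descent on the training error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# "Training" is the solve step; "predicting" just evaluates the solved model.
w, b = train()

def predict(x):
    return w * x + b
```

The optimizer is a subroutine of the prediction workflow here: `train` solves, `predict` uses the solution.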

There is a **third** way in which optimization and ML relate: **using the results of machine learning as input data for an optimization model.** In other words, ML and optimization are independent operations but **chained together sequentially**, like this:

My favorite example involves sales forecasting. Sales forecasting is a machine learning problem: predict sales given a set of features (weather, price, coupons, competition, etc.). Businesses typically want to go further than this. They want to take actions that will increase future sales. This leads to the following chain of reasoning:

- If I can reliably predict future sales…
- and I can characterize the relationship between changes in feature values and changes in sales (‘elasticities’)…
- then I can find the set of feature values that will increase sales as much as possible.

The last step is an optimization problem.
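The chain can be sketched in a few lines. Everything specific below is an assumption for illustration: the price history is invented, the "ML" step is a one-feature least-squares fit, and the objective is revenue (price times predicted units) rather than raw sales, since maximizing units alone would simply push price to its lower bound. The optimizer only ever sees the fitted coefficients, not the training procedure.

```python
# Hypothetical history of (price, units sold). The numbers are made up.
history = [(8.0, 120.0), (9.0, 110.0), (10.0, 100.0), (11.0, 90.0), (12.0, 80.0)]

def fit_demand(data):
    """ML step: ordinary least squares for units = intercept + slope * price."""
    n = len(data)
    mean_p = sum(p for p, _ in data) / n
    mean_u = sum(u for _, u in data) / n
    cov = sum((p - mean_p) * (u - mean_u) for p, u in data)
    var = sum((p - mean_p) ** 2 for p, _ in data)
    slope = cov / var  # the price "elasticity" in this toy linear model
    intercept = mean_u - slope * mean_p
    return intercept, slope

def best_price(intercept, slope, low, high, step=0.01):
    """Optimization step: grid search for the revenue-maximizing price."""
    count = int(round((high - low) / step)) + 1
    candidates = [low + i * step for i in range(count)]
    return max(candidates, key=lambda p: p * (intercept + slope * p))

# Chain the two: the output of the ML step is input data for the optimizer.
intercept, slope = fit_demand(history)
# Search only within the observed price range to avoid extrapolating the model.
price = best_price(intercept, slope, low=8.0, high=12.0)
```

A grid search stands in for a real solver here; with more decision variables or constraints, the same coefficients would feed an LP or MIP model instead.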

But why are we breaking this apart? Why not just stick the machine learning (prediction) step inside the optimization? Why separate them? A few reasons:

- If the ML and optimization steps are separate, I can improve or change one without disturbing the other.
- I do not have to do the ML at the same time as I do the optimization.
- I can simplify or approximate the results of the ML model to produce a simpler optimization model, so it can run faster and/or at scale. Put a different way, I want the structure of the ML and optimization models to differ for practical reasons.
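The last point can be illustrated with a piecewise-linear approximation, one common way to hand a nonlinear prediction to a solver that prefers linear structure. This is only a sketch: `trained_model` is a hypothetical stand-in for a real predictor, and the price range and number of segments are arbitrary choices.

```python
import math

def trained_model(price):
    """Stand-in for a nonlinear ML predictor of units sold (hypothetical)."""
    return 200.0 * math.exp(-0.1 * price)

def piecewise_linear(model, low, high, segments):
    """Sample the model at evenly spaced breakpoints."""
    width = (high - low) / segments
    xs = [low + i * width for i in range(segments + 1)]
    return xs, [model(x) for x in xs]

def interpolate(xs, ys, x):
    """Evaluate the piecewise-linear surrogate at x, for xs[0] <= x <= xs[-1]."""
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    return ys[-1]

# Six breakpoints replace the full model inside the optimization.
xs, ys = piecewise_linear(trained_model, low=5.0, high=15.0, segments=5)

# Measure how much accuracy the simplification costs over the price range.
worst_gap = max(
    abs(interpolate(xs, ys, 5.0 + k * 0.1) - trained_model(5.0 + k * 0.1))
    for k in range(101)
)
```

The breakpoints `xs` and `ys` are what the optimization model would consume. Checking `worst_gap` against the scale of the predictions is a quick way to judge whether the simpler structure is accurate enough for the decision at hand.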

In the machine learning world it is common to refer to data pipelines. But ML pipelines can involve models feeding models, too! Chaining ML and optimization like this is often useful, so keep it in mind.

There are other areas where pairing optimization with ML models might be useful:

1. Active learning.

2. Reinforcement learning.

3. Surrogate ML models for optimization and uncertainty quantification.