INFORMS Conference on Analytics 2011

Wayyyyy back in April I attended the INFORMS Conference on Business Analytics and Operations Research in Chicago. The conference was renamed from the "INFORMS Practice Conference" to broaden its appeal: operations research is essentially synonymous with prescriptive analytics. Apparently the rebranding worked: the conference had its highest attendance ever (729 attendees: 75% industry-based, 20% academic, 5% military/government).

There are basically four things I like to do at INFORMS: software workshops, exhibits, the Edelman awards, and presentations. Here are my notes.

Please note: this post is for informational purposes only and does not endorse any particular product. If this post is interpreted that way then I will have to take it down!

Workshops

My first workshop was by Palisade, who build the @Risk Excel add-in for Monte Carlo simulation. The format was a walkthrough of @Risk’s features. @Risk provides a ribbon interface for defining, running, and reporting on simulations; custom Excel formulas for Monte Carlo simulation; and custom UI for visualizing results. Simulations are created from standard Excel spreadsheets. An input cell can be associated with a probability distribution by clicking the appropriate button in the ribbon. Output cells can be turned into simulation outputs (forecasts) in a similar fashion. These actions modify the cells by wrapping the original formulas inside custom Palisade formulas.

Running a simulation amounts to recalculating the spreadsheet multiple times. The results are collected by @Risk and surfaced to the user in various ways. If you’re familiar with other simulation tools such as Crystal Ball or Risk Solver, none of this is new. Some notes about @Risk:

  • It’s easy to use. For example, the standard output is a histogram chart with summary statistics alongside. The histogram has two sliders that divide it into three vertical slices, and the percentage of scenarios that fall in each slice is shown at the top. So if I want to see what percentage of the time my portfolio is underwater, I just drag a slider to 0. If I want to compare the results of multiple simulations, I click a button and see the histograms overlaid.
  • Choosing the right number of trials for a simulation can be tricky. @Risk has an "automatic" option that simply tests certain summary statistics every 100 iterations and stops when they stop bouncing around. Not foolproof, but simple (a toy sketch of this idea appears after these notes).
  • Simulations run through the Excel calc chain, so calculation inconsistencies are avoided.
  • @Risk provides very simple, but useful reporting capabilities. Click on a button and the relevant histograms, scatter plots, and summary statistics are written to a worksheet, and the print area is formatted so it will fit on a single page.

As the presenter himself said, the promise is not "deep statistics" but ease of use and flexibility.
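
The "stop when the statistics settle down" idea is easy to picture in code. Below is a minimal, hypothetical C# sketch (my own illustration, not Palisade's actual algorithm): run trials in batches of 100 and stop once a summary statistic (here, a 95% confidence interval on the mean) is tight enough.

using System;

class AutoStopDemo {
  static void Main() {
    var rng = new Random(42);
    // Stand-in for one spreadsheet recalculation; a real model would be far more involved.
    Func<double> trial = () => rng.NextDouble() * 200 - 100;

    const int batch = 100;        // check summary statistics every 100 iterations
    const double target = 1.0;    // stop when the 95% CI half-width on the mean drops below this
    const int maxTrials = 1000000;

    double sum = 0, sumSq = 0;
    int n = 0;
    double halfWidth;
    do {
      for (int i = 0; i < batch; i++) { double x = trial(); sum += x; sumSq += x * x; n++; }
      double mean = sum / n;
      double variance = (sumSq - n * mean * mean) / (n - 1);
      halfWidth = 1.96 * Math.Sqrt(variance / n);
    } while (halfWidth > target && n < maxTrials);

    Console.WriteLine("Stopped after {0} trials; mean = {1:F2} +/- {2:F2}", n, sum / n, halfWidth);
  }
}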

My second workshop was by Gurobi Optimization.  Since MIP is the most important category in the world of solvers, I always like to keep up-to-date on what’s going on with Gurobi. (Gurobi supplies the default MIP solver for Solver Foundation.)

Gurobi reported that over 6,600 free academic licenses are in use: 60% outside the US, and 50% outside the OR community.

Gurobi Cloud has been around for a couple of years. They discovered that they have two types of cloud customers (quoting):

  • “People dedicated to cloud computing
    • People who want access to powerful computers at a moment’s notice
  • People who wanted a cheaper way to get access to Gurobi.
    • Especially when they only need it occasionally"

So they’re introducing "pay-by-the-day" licenses: you pay only for the days you need Gurobi, at $200 per day. The installation procedure is the same as for the non-cloud version: purchase a license key and drop it in a folder.

My third workshop was by Ziena Optimization, who build the Knitro suite of solvers for nonlinear (differentiable) optimization. In brief:

  • It was a practical, practitioner-oriented session that showed all of the ins and outs of working with Knitro.
  • It was great to see the presentation on the Solver Foundation connector for Knitro.
  • Attendees want second derivative (Hessian) support in MSF. The MSF team received the feedback!
  • Another attendee would like Solver Foundation to have better presolve (linear and nonlinear). I agree this is a shortcoming.

Booths

I visited nearly all of the vendor booths and I have some marketing and technical materials for many of them in my office. Here I will only make note of a few:

IBM had a huge, three-panel booth. IBM has a broad-based pitch aimed squarely at businesses that require analytics.

NAG is a UK-based company that builds numerical libraries of various kinds, including optimization solvers. 

Frontline Systems (makers of the Excel Solver as well as Risk Solver Platform) had a large booth and a strong presence.

Presentations

  • 18 talks on supply chain management.
  • The "Analytics Process" track was the most popular.
  • There was a great Walmart talk that described how they use analytics in all phases of the business (including HR). The most interesting part was a side discussion about R. The Walmart people hate R – they use SAS instead. They flatly refuse to use R because they don’t have full trust in the results, and they don’t know where to turn for support. IMHO they don’t get the value of R, but on the other hand the R community doesn’t always understand the requirements of business: support, accountability, and so on.

Edelman Awards

There were six finalists for the Edelman Award, presented on Monday night. The organizers deem the Edelmans the "Super Bowl of Operations Research".  In each case, the organization implementing the solution was assisted by another company who supplied modeling and software expertise.

Each of these entries is an outstanding example of the importance and impact of operations research. The dollar amounts (in terms of realized revenue and cost savings) run into the many millions.

I refer you to https://live.blueskybroadcast.com/bsb/client/CL_DEFAULT.asp?Client=569807&PCAT=3074&CAT=3075 for descriptions and video.

2011 INFORMS Conference on Analytics

Recently I have been "heads down" on some new stuff we’re working on for Technical Computing. Fortunately I am getting the chance to fly out to Chicago to attend the 2011 INFORMS Conference on Analytics. The annual meeting is a lot of fun – especially because of the sheer volume of great talks – but this conference is perhaps even more relevant to my day-to-day work with Solver Foundation and Technical Computing. Here are the four things I am looking forward to the most:

The technology workshops on Sunday. We’re in the middle of a development cycle, so I don’t have a workshop this time, which means I will have the chance to sit back and watch! It looks like there is going to be some great stuff from Palisade, Gurobi, Ziena, and many more. The workshops are a great opportunity to learn about the latest tools, straight from those who build them. I try to go to as many of these as I can.

The Edelman Awards. For somebody trying to build tools that help people solve real-world problems, this is what it’s all about. It’s inspiring, educational, and impressive. I see that Mike Trick is once again a judge for the competition, and I do not envy his job. It’s going to be tough: I am not sure how I would weigh the novelty of a submission against its business impact. I have found in several cases that simple, blunt instruments can be surprisingly effective in practice, even if they make for totally uninteresting papers. We’ve got some outstanding entries this year.

The plenary keynotes, in particular Chuck Holland from UPS. The title of his talk is "Operations Research: Getting the Attention of Senior Executives". This conference was renamed from the "INFORMS Practice Conference" to put a greater emphasis on the field of analytics: using computing to gain insight from data. In this context, what has been traditionally called "operations research" provides many – but not all – of the tools used to make business critical decisions. What’s intriguing about Chuck’s abstract is that it confronts a key problem head-on: bridging the gap between those who supply insight and possible decisions (who are practitioners of operations research and analytics whether they know it or not), and those who actually make these decisions.

The last thing, and perhaps the most important to me, is the chance to talk with other attendees. The INFORMS Analytics conference is cool because it brings together the research community, practitioners, and software vendors. I learn so much just by hearing everyone’s stories, and I always make connections that make an impact on what I do the rest of the year. To top it all off, it’s in Chicago, one of my favorite cities in the whole world.

Solving traveling salesman problems using Solver Foundation

Update: see the comments below for some helpful hints. If you are unable to run this with your version of Solver Foundation and Gurobi, consider installing the lp_solve plugin for MSF. More details on this thread.

Here’s an example that I walked through during yesterday’s INFORMS session.  Erwin has two blog postings about Solver Foundation and the traveling salesman problem, but I want to throw in my two cents because I want to emphasize a couple of points:

  1. By combining C# and Solver Foundation Services it is possible to express complex models clearly and succinctly.
  2. It is very easy to build powerful, reusable model libraries using C# and Solver Foundation Services.
  3. Solver Foundation Services code can be used in many different application environments (ASP.NET, Silverlight, database apps, command line apps, WPF, …) with minimal changes.

The traveling salesman problem is a classical problem in computer science, and you should bow your head in shame if you don’t know about it (and turn in your conference badge if you happen to be in Phoenix). William Cook’s book on the traveling salesman problem is a wonderful read. A salesperson needs to make a tour of a number of cities: she must visit each city once and only once, and she wants to minimize the total distance travelled. This is perhaps the definitive example of an NP-hard problem.

TSP can be solved using mixed integer programming: optimizing a linear goal subject to linear constraints, where some of the decision variables are required to take integer values. In this first post I will show how to formulate and solve a TSP model using Solver Foundation Services. In my second post I will show how to use the Gurobi MIP solver through SFS.

There are many different ways to model the TSP – here is a nice introduction. My goal is to provide a clear, complete example – not a "production level" TSP model – so I am going to choose a formulation that dates back to 1960.

First, I need to establish a couple of building blocks that will help me construct the data for the model. We need to know the distances between each pair of cities, but typically we are given only the coordinates of the cities and need to derive the distances. So I will introduce a Coordinate class that contains properties for the (x, y) coordinates, properties to convert them to latitude and longitude, and a method that computes the distance between two points.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SolverFoundation.Services;

namespace Microsoft.SolverFoundation.Samples {
  class TravelingSalesman {
    // TSP coordinate.
    public class Coordinate {
      public int Name { get; set; }

      // X-coordinate (from TSPLIB)
      public double X { get; set; }

      // Y-coordinate (from TSPLIB)
      public double Y { get; set; }

      public Coordinate(int name, double x, double y) {
        Name = name;
        X = x;
        Y = y;
      }

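      // The X and Y values appear to follow TSPLIB's GEO convention: the integer part is
      // degrees and the fractional part is minutes, hence the 5/3 (= 100/60) factor below.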
      // Latitude in radians.
      public double Latitude {
        get { return Math.PI * (Math.Truncate(X) + 5 * (X - Math.Truncate(X)) / 3) / 180; }
      }

      // Longitude in radians.
      public double Longitude {
        get { return Math.PI * (Math.Truncate(Y) + 5 * (Y - Math.Truncate(Y)) / 3) / 180; }
      }

      // Geographic distance between two points (as an integer).
      public int Distance(Coordinate p) {
        double q1 = Math.Cos(Longitude - p.Longitude);
        double q2 = Math.Cos(Latitude - p.Latitude);
        double q3 = Math.Cos(Latitude + p.Latitude);
        // There may be rounding difficulties here if the points are close together...just sayin'.
        return (int)(6378.388 * Math.Acos(0.5 * ((1 + q1) * q2 - (1 - q1) * q3)) + 1);
      }
    }

    // TSP city-city arc.
    public class Arc {
      public int City1 { get; set; }
      public int City2 { get; set; }
      public double Distance { get; set; }
    }

    // Burma14 from TSPLIB. Optimal tour = 3323.
    private static Coordinate[] data = new Coordinate[] {
      new Coordinate(0, 16.47, 96.10),
      new Coordinate(1, 16.47, 94.44),
      new Coordinate(2, 20.09, 92.54),
      new Coordinate(3, 22.39, 93.37),
      new Coordinate(4, 25.23, 97.24),
      new Coordinate(5, 22.00, 96.05),
      new Coordinate(6, 20.47, 97.02),
      new Coordinate(7, 17.20, 96.29),
      new Coordinate(8, 16.30, 97.38),
      new Coordinate(9, 14.05, 98.12),
      new Coordinate(10, 16.53, 97.38),
      new Coordinate(11, 21.52, 95.59),
      new Coordinate(12, 19.41, 97.13),
      new Coordinate(13, 20.09, 94.55)
    };
    // ... continued below: the Run method completes the TravelingSalesman class.

(The data for this 14-city problem comes from the TSPLIB library). If you’ve been following my blog you know that the building blocks of a Solver Foundation model are: sets, parameters, decisions, goals, and constraints. I am going to implement a simple formulation that is centered around the following (indexed) decisions:

  • Assign[i,j]: this is equal to 1 if the optimal tour contains a trip (or arc) from city i to city j.
  • Rank[i]: this tracks the order in which city i is visited; it increases along the tour and is used to eliminate subtours.

We have one parameter in our model:

  • Distance[i, j]: the distance from city i to city j.

With that in mind, here’s the model.  Explanations of the goal and constraints follow.

    
    public static void Run() {
      SolverContext context = SolverContext.GetContext();
      context.ClearModel();
      Model model = context.CreateModel();

      // ------------
      // Parameters
      Set city = new Set(Domain.IntegerNonnegative, "city");
      Parameter dist = new Parameter(Domain.Real, "dist", city, city);
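      // Cross join of the city list with itself: one Arc (with its distance) for every ordered pair of cities.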
      var arcs = from p1 in data
                 from p2 in data
                 select new Arc { City1 = p1.Name, City2 = p2.Name, Distance = p1.Distance(p2) };
      dist.SetBinding(arcs, "Distance", "City1", "City2");
      model.AddParameters(dist);

      // ------------
      // Decisions
      Decision assign = new Decision(Domain.IntegerRange(0, 1), "assign", city, city);
      Decision rank = new Decision(Domain.RealNonnegative, "rank", city);
      model.AddDecisions(assign, rank);

      // ------------
      // Goal: minimize the length of the tour.
      Goal goal = model.AddGoal("TourLength", GoalKind.Minimize,
        Model.Sum(Model.ForEach(city, i => Model.ForEachWhere(city, j => dist[i, j] * assign[i, j], j => i != j))));

      // ------------
      // Enter and leave each city only once.
      int N = data.Length;
      model.AddConstraint("assign1",
        Model.ForEach(city, i => Model.Sum(Model.ForEachWhere(city, j => assign[i, j],
          j => i != j)) == 1));
      model.AddConstraint("assign2",
        Model.ForEach(city, j => Model.Sum(Model.ForEachWhere(city, i => assign[i, j], i => i != j)) == 1));

      model.AddConstraint("A1", Model.ForEach(city, i => Model.Sum(Model.ForEachWhere(city, j => assign[i, j], j => i != j)) == 1));
      model.AddConstraint("A2", Model.ForEach(city, j => Model.Sum(Model.ForEachWhere(city, i => assign[i, j], i => i != j)) == 1));

      // Forbid subtours (Miller, Tucker, Zemlin - 1960...)
      model.AddConstraint("nosubtours",
        Model.ForEach(city,
          i => Model.ForEachWhere(city,
            j => rank[i] + 1 <= rank[j] + N * (1 - assign[i, j]),
            j => Model.And(i != j, i >= 1, j >= 1)
          )
        )
      );

      Solution solution = context.Solve();

      // Retrieve solution information.
      Console.WriteLine("Cost = {0}", goal.ToDouble());
      Console.WriteLine("Tour:");
      var tour = from p in assign.GetValues() where (double)p[0] > 0.9 select p[2];
      foreach (var i in tour.ToArray()) {
        Console.Write(i + " -> ");
      }
      Console.WriteLine();
    }
  }
}

In my humble opinion, the binding of the dist parameter is an awesome example of the power of LINQ data binding in Solver Foundation.  We generate the 2D matrix of distances using a single LINQ query and one SetBinding call. It would be incredibly easy to change the code to retrieve the coordinate data from a database (perhaps using a LINQ expression once again), a file, or even a user application.
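
To make that concrete, here is a hedged sketch of the file-based variant. The file name and CSV layout are made up for illustration, the snippet reuses the Coordinate, Arc, and dist definitions above, and it needs using System.IO, System.Globalization, and System.Linq:

      // Hypothetical: read "name,x,y" rows from a CSV file instead of the hard-coded array.
      // Only the data source changes; the SetBinding call stays exactly the same.
      Coordinate[] fileData =
        File.ReadLines("burma14.csv")
            .Select(line => line.Split(','))
            .Select(f => new Coordinate(int.Parse(f[0]),
                                        double.Parse(f[1], CultureInfo.InvariantCulture),
                                        double.Parse(f[2], CultureInfo.InvariantCulture)))
            .ToArray();

      var arcs = from p1 in fileData
                 from p2 in fileData
                 select new Arc { City1 = p1.Name, City2 = p2.Name, Distance = p1.Distance(p2) };
      dist.SetBinding(arcs, "Distance", "City1", "City2");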

The goal is straightforward: minimize the distance traveled. This is the inner product of the assignment decisions and the distance matrix: the sum of dist[i, j] * assign[i, j] over all arcs.  We have two types of constraints:

  • Assignment constraints: these ensure that we enter and leave each city only once.
  • Subtour constraints: these ensure that we do not have any subtours. In a four city problem {A, B, C, D}, for example, we cannot have two cycles (A, B), (C, D). We need to have one tour that contains all the cities.

The assignment constraints are easy using the ForEach and ForEachWhere operations.  I use ForEachWhere because I want to disallow arcs that enter and leave the same city – that doesn’t make sense.  The subtour constraint is a little more complicated. It relates the "assign" and "rank" decisions. The key fact is that if the tour contains the arc from city i to city j, then rank[j] must be at least rank[i] + 1, so rank increases along any chain of arcs that avoids city 0, which makes a subtour that skips city 0 impossible. If the (i, j) arc is not part of the tour, the N * (1 - assign[i, j]) term relaxes the constraint: with N = 14 it reads rank[i] + 1 <= rank[j] + 14, which is easy to satisfy. Last note: notice that I can mix parameters, decisions, and C# variables in my expressions.

Getting the cost is very easy using goal.ToDouble().  We can get the tour using either Assign or Rank.  I have chosen to use Assign because it gives me another opportunity to use LINQ.  When you call GetValues() on a decision, you get arrays that contain the value along with the indexes for each decision.  In this case, the last entry in the array is the one we are interested in. There are other ways to conveniently query decision results; I’ll save that for another time.
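
If you want the cities printed in tour order rather than arc by arc, here is a small follow-up sketch (my own addition, meant to go inside Run after context.Solve()): build a "next city" lookup from the selected arcs and walk it starting from city 0.

      // Each GetValues() entry is { value, from-city, to-city }; keep the arcs that are selected.
      var next = new Dictionary<int, int>();
      foreach (object[] p in assign.GetValues()) {
        if ((double)p[0] > 0.9)
          next[Convert.ToInt32(p[1])] = Convert.ToInt32(p[2]);
      }

      // Walk the cycle: after next.Count steps we are back at the starting city.
      int current = 0;
      for (int step = 0; step < next.Count; step++) {
        Console.Write(current + " -> ");
        current = next[current];
      }
      Console.WriteLine(current);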

The next post will show how we can use Solver Foundation’s plug-in model to tune the behavior of the Gurobi MIP solver.