Optimizing 19th Century Typewriters

The long title for this post is: “Optimizing 19th Century Typewriters using 20th Century Code in the 21st Century”.

Patrick Honner recently shared Hardmath123’s wonderful article “Tuning a Typewriter”. In it, Hardmath123 explores the best way to order the letters A-Z on an old and peculiar typewriter. Instead of a key for each letter, as on a modern keyboard, the letters are laid out on a horizontal strip. You shift the strip left or right to find the letter you want, then press a key to enter it.

What’s the best way to arrange the letters on the strip? You probably want to do it in such a way that you have to shift left and right as little as possible. If consecutive letters in the words you’re typing are close together on the strip, you will minimize shifting and type faster.

The author’s approach is to:

  • Come up with an initial ordering at random,
  • Compute the cost of the arrangement by counting how many shifts it takes to type out three well-known books,
  • Try to find two letters whose swap results in a lower cost,
  • Swap them and repeat until you can no longer find an improving swap.
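The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Hardmath123’s code: it assumes the cost of an ordering is the sum of strip distances between consecutive letters, and it uses a hypothetical `pairs` list of (letter, next letter) tuples in place of the three books.

```python
import random
import string

def cost(order, pairs):
    """Total shifts: sum of |pos(a) - pos(b)| over consecutive letter pairs (a, b)."""
    pos = {letter: k for k, letter in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in pairs)

def two_swap_local_search(pairs, seed=0):
    rng = random.Random(seed)
    order = list(string.ascii_uppercase)
    rng.shuffle(order)                       # step 1: random initial ordering
    best = cost(order, pairs)                # step 2: cost of that arrangement
    improved = True
    while improved:                          # step 4: repeat until no swap helps
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]      # step 3: try swapping two letters
                c = cost(order, pairs)
                if c < best:
                    best, improved = c, True                 # keep the improving swap
                else:
                    order[i], order[j] = order[j], order[i]  # undo it
    return "".join(order), best

# Hypothetical tiny "corpus" standing in for the three books.
words = ["THE", "QUICK", "BROWN", "FOX", "JUMPS"]
pairs = [(w[k], w[k + 1]) for w in words for k in range(len(w) - 1)]
strip, shifts = two_swap_local_search(pairs)
print(strip, shifts)
```

Every accepted swap strictly lowers an integer cost, so the loop terminates, and when it does no single swap can improve the ordering: a 2-swap local optimum, not necessarily a global one.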

This is a strong approach that leads to the same locally optimal arrangements, even when you start from very different initial orderings. It turns out that this is an instance of a more general optimization problem with an interesting history: quadratic assignment problems. I will explain what those are in a moment.

Each time I want to type a letter, I have to know how far to shift the letter strip. That depends on two factors:

  1. The letter that I want to type in next, e.g. if I am trying to type THE and I am on “T”, “H” comes next.
  2. The location of the next letter relative to the current one (T). For example, if H is immediately to the left of T, then it is one shift away.
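To make the two factors concrete, here is a small Python sketch that counts shifts directly for one word. It assumes the strip is given as a string whose index is the position and that typing starts with the strip already on the word’s first letter; the alphabetical strip used here is just an example, not an optimized layout.

```python
def shifts_for_word(word, strip):
    """Count strip shifts needed to type `word`, starting on its first letter."""
    pos = {letter: k for k, letter in enumerate(strip)}
    total = 0
    for current, nxt in zip(word, word[1:]):
        # Factor 1: which letter comes next. Factor 2: where it sits relative to the current one.
        total += abs(pos[nxt] - pos[current])
    return total

alphabetical = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(shifts_for_word("THE", alphabetical))  # T->H is 12 shifts, H->E is 3: prints 15
```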

If I type in a bunch of letters, the total number of shifts can be computed from two matrices:

  • A frequency matrix F. The entry in row R and column C is a count of how often letter R precedes letter C. If I encounter the word “THE” in my test set, then I will add 1 to F(“T”, “H”) and 1 to F(“H”, “E”).
  • A distance matrix D. The entry in row X and column Y is the number of shifts between positions X and Y on the letter strip. For example, D(X, X+1) = 1 since position X is next to position X+1.
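Both matrices are easy to build. A minimal Python sketch, assuming letters are indexed A = 0 through Z = 25 and that pairs are counted only inside words (anything that is not a letter ends a word); the function names are ours, not from the post’s repository.

```python
import re

N = 26  # letters A..Z, and positions 0..25 on the strip

def frequency_matrix(text):
    """F[r][c] = number of times letter r is immediately followed by letter c inside a word."""
    F = [[0] * N for _ in range(N)]
    for word in re.findall(r"[A-Z]+", text.upper()):
        for a, b in zip(word, word[1:]):
            F[ord(a) - ord("A")][ord(b) - ord("A")] += 1
    return F

def distance_matrix(n=N):
    """D[x][y] = number of shifts between strip positions x and y."""
    return [[abs(x - y) for y in range(n)] for x in range(n)]

F = frequency_matrix("The")        # adds 1 to F(T, H) and 1 to F(H, E)
D = distance_matrix()
print(F[19][7], F[7][4], D[3][4])  # 1 1 1
```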

Since my problem is to assign letters to positions, if I permute the rows and columns of D according to the assignment, multiply the result entry by entry with F, and add up the products, I get the total number of shifts required. We can easily compute F and D for the typewriter problem:

  • To obtain F, we can just count how often one letter follows another and record entries in the 26 x 26 matrix. Here is a heatmap for the matrix using the full Project Gutenberg files for the three test books:

[Figure: heatmap of the 26 x 26 letter-pair frequency matrix F for the three test books]

  • The distance matrix D is simple: if position 0 is the extreme left of the strip and 25 the extreme right, $d_{ij} = |i - j|$.

The total number of shifts is $\sum_{i}\sum_{j} f_{ij}\, d_{p(i)p(j)}$, where letter $i$ is assigned to location $p(i)$.
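The sum can be checked against the direct shift count. In this Python sketch (ours, not from the repository), `p[i]` is the strip position of letter i, and the tiny F built from the single word “THE” is illustrative only.

```python
def qap_cost(F, D, p):
    """Sum of F[i][j] * D[p[i]][p[j]] over all letter pairs (i, j)."""
    n = len(F)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))

n = 26
D = [[abs(x - y) for y in range(n)] for x in range(n)]
F = [[0] * n for _ in range(n)]
word = "THE"
for a, b in zip(word, word[1:]):
    F[ord(a) - ord("A")][ord(b) - ord("A")] += 1

identity = list(range(n))        # letter i sits at position i: an alphabetical strip
print(qap_cost(F, D, identity))  # 15, the same as counting T->H (12) plus H->E (3)
```

Reversing the strip gives the same cost, since D only depends on how far apart two positions are.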

Our problem boils down to finding a permutation that minimizes this sum. Since every term multiplies together two assignment decisions (where letter i goes and where letter j goes), this is referred to as a Quadratic Assignment Problem (QAP). In fact, problems very similar to this one are part of the standard test suite of problems for QAP researchers, called “QAPLIB”. The so-called “bur” problems have similar flow matrices (the QAP term for F) but different distance matrices.
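For readers who want the formal statement, here is the same objective in the standard Koopmans–Beckmann form (the notation is ours): $S_n$ is the set of permutations of $n$ items, $\Pi_n$ the set of $n \times n$ permutation matrices, and $x_{ik} = 1$ exactly when $p(i) = k$.

```latex
\min_{p \in S_n} \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{p(i)p(j)}
  \;=\; \min_{X \in \Pi_n} \sum_{i,j=1}^{n} \sum_{k,l=1}^{n} f_{ij}\, d_{kl}\, x_{ik}\, x_{jl}
  \;=\; \min_{X \in \Pi_n} \operatorname{tr}\!\left(F^{\mathsf{T}} X D X^{\mathsf{T}}\right)
```

The middle expression makes the name visible: each term is a product of two assignment variables $x_{ik} x_{jl}$. For the typewriter, $n = 26$.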

We can use any QAP solution approach we like to try to solve the typewriter problem. Which one should we use? There are two types of approaches:

  • Those that lead to provably globally optimal solutions,
  • Heuristic techniques that often provide good results, but no guarantees on “best”.

QAP is NP-hard, so finding provably optimal solutions is challenging. One approach for finding optimal solutions, called “branch and bound”, boils down to dividing and conquering by making partial assignments, solving less challenging versions of these problems, and pruning away assignments that cannot possibly lead to better solutions. I have written about this topic before. If you like allegories, try this post. If you prefer more details, try my PhD thesis.
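To make the branch-and-bound idea concrete, here is a deliberately tiny Python version. It is an illustration only: serious QAP codes use much stronger lower bounds (the Gilmore–Lawler bound is a classic example), while this sketch simply uses the cost of the pairs already assigned, which is a valid bound only because all entries of F and D are assumed nonnegative.

```python
import itertools
import math

def qap_branch_and_bound(F, D):
    """Exact QAP by depth-first branch and bound (toy version for small n)."""
    n = len(F)
    best = [math.inf, None]
    p = [-1] * n          # p[i] = location of facility i, -1 while unassigned
    used = [False] * n    # used[k] = location k already taken

    def added_cost(i, k):
        # Extra cost of placing facility i at location k, given facilities 0..i-1.
        return F[i][i] * D[k][k] + sum(F[i][j] * D[k][p[j]] + F[j][i] * D[p[j]][k] for j in range(i))

    def branch(i, cost_so_far):
        if cost_so_far >= best[0]:
            return                                   # prune: remaining terms are >= 0
        if i == n:
            best[0], best[1] = cost_so_far, p.copy()  # complete assignment, new incumbent
            return
        for k in range(n):
            if not used[k]:
                used[k], p[i] = True, k
                branch(i + 1, cost_so_far + added_cost(i, k))
                used[k], p[i] = False, -1

    branch(0, 0)
    return best[0], best[1]

# Sanity check on a tiny instance: compare with brute force over all permutations.
F_small = [[0, 3, 1, 2], [3, 0, 4, 0], [1, 4, 0, 5], [2, 0, 5, 0]]
D_small = [[abs(a - b) for b in range(4)] for a in range(4)]
bb_cost, bb_perm = qap_branch_and_bound(F_small, D_small)
brute = min(sum(F_small[i][j] * D_small[q[i]][q[j]] for i in range(4) for j in range(4))
            for q in itertools.permutations(range(4)))
print(bb_cost == brute)  # True
```

With a bound this weak the search is barely better than enumeration, which is why size 26 is “big”: the quality of the bound decides how much of the tree can be pruned.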

The typewriter problem is size 26, which counts as “big” in the world of QAP. Around 20 years ago I wrote a very capable QAP solver, so I recompiled it and ran it on this problem – but didn’t let it finish. I am pretty sure it would take at least a day of CPU time to solve, and perhaps more. It would be interesting to see if someone could find a provably optimal solution!

In the meantime, this still leaves us with heuristic approaches. Here are a few possibilities:

  • Local optimization (Hardmath123’s approach finds an arrangement that is locally optimal with respect to 2-swaps)
  • Simulated annealing
  • Evolutionary algorithms
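As one example, simulated annealing can be sketched on the same F/D representation. This is a generic version written for illustration: the temperature schedule (`t0`, `alpha`) and iteration count are arbitrary assumptions, and it recomputes the full cost after every swap instead of using the O(n) swap delta a serious implementation would use.

```python
import math
import random

def qap_cost(F, D, p):
    """Sum of F[i][j] * D[p[i]][p[j]]; p[i] is the location of item i."""
    n = len(F)
    return sum(F[i][j] * D[p[i]][p[j]] for i in range(n) for j in range(n))

def simulated_annealing(F, D, iters=20000, t0=10.0, alpha=0.9995, seed=0):
    """Swap-based simulated annealing for QAP (illustrative parameters, not tuned)."""
    rng = random.Random(seed)
    n = len(F)
    p = list(range(n))
    rng.shuffle(p)
    cur = qap_cost(F, D, p)
    best, best_p = cur, p.copy()
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        p[i], p[j] = p[j], p[i]
        new = qap_cost(F, D, p)
        # Always accept improvements; accept a worse swap with probability exp(-(new - cur) / t).
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best:
                best, best_p = cur, p.copy()
        else:
            p[i], p[j] = p[j], p[i]  # rejected: undo the swap
        t *= alpha                   # geometric cooling
    return best, best_p

F_demo = [[0, 3, 1, 2, 0], [3, 0, 4, 0, 1], [1, 4, 0, 5, 2], [2, 0, 5, 0, 3], [0, 1, 2, 3, 0]]
D_demo = [[abs(a - b) for b in range(5)] for a in range(5)]
print(simulated_annealing(F_demo, D_demo, iters=2000))
```

Unlike pure local search, the occasional uphill move lets the search escape a 2-swap local optimum early on; as the temperature drops it behaves more and more like local search.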

I ran a heuristic written by Éric Taillard called FANT (Fast ant system). I re-ran his 1998 code on my laptop, and within seconds it found the same permutation as Hardmath123. By the way, the zero-based permutation is [9, 21, 5, 6, 12, 19, 3, 10, 8, 24, 1, 16, 18, 7, 15, 22, 25, 14, 13, 11, 17, 2, 4, 23, 20, 0] (updated 12/7/2018: a previous version of this post gave the wrong permutation. Thanks Paul Rubin for spotting the error!)
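To turn the permutation into a strip, one has to decide what each entry means. The Python sketch below assumes `p[i]` is the strip position of letter i (A = 0, ..., Z = 25), matching the “letter i is assigned to location p(i)” convention in the cost formula; under that reading the most common letters end up clustered in the middle of the strip.

```python
import string

# The zero-based permutation quoted above.
perm = [9, 21, 5, 6, 12, 19, 3, 10, 8, 24, 1, 16, 18, 7, 15, 22, 25, 14, 13, 11, 17, 2, 4, 23, 20, 0]

def strip_from_positions(p):
    """Assumes p[i] is the strip position of letter i (A = 0, ..., Z = 25)."""
    strip = [""] * len(p)
    for letter_index, position in enumerate(p):
        strip[position] = string.ascii_uppercase[letter_index]
    return "".join(strip)

print(strip_from_positions(perm))  # ZKVGWCDNIAHTESROLUMFYBPXJQ
```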

You can get the data for this problem, as well as a bit of Python code to experiment with, in this git repository.

It’s easy to think up variants to this problem. For example, what about mobile phones? Other languages? Adding punctuation? Gesture-based entry? With QAPs, anything is possible, even if optimality is not practical.


Solving traveling salesman problems using Solver Foundation

Update: see the comments below for some helpful hints. If you are unable to run this with your version of Solver Foundation and Gurobi, consider installing the lp_solve plugin for MSF. More details on this thread.

Here’s an example that I walked through during yesterday’s INFORMS session. Erwin has two blog postings about Solver Foundation and the traveling salesman problem, but I want to throw in my two cents because I want to emphasize a few points:

  1. By combining C# and Solver Foundation Services it is possible to express complex models clearly and succinctly.
  2. It is very easy to build powerful, reusable model libraries using C# and Solver Foundation Services.
  3. Solver Foundation Services code can be used in many different application environments (ASP.NET, Silverlight, DB, command line apps, WPF, …) with minimal changes.

The traveling salesman problem is a classical problem in computer science, and you should bow your head in shame if you don’t know about it (and turn in your conference badge if you happen to be in Phoenix). William Cook’s book on the traveling salesman problem is a wonderful read. A salesperson needs to make a tour of a number of cities.  The restrictions are that she wants to visit each city once and only once, and she wants to minimize the distance travelled.  This is perhaps the definitive example of an NP-hard problem.
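The definition translates directly into a brute-force search, which is handy for checking a model on toy data. This Python sketch (ours, for illustration) fixes city 0 as the start and tries all (n - 1)! orderings of the remaining cities, which is exactly why brute force stops being an option beyond a dozen or so cities.

```python
import itertools
import math

def brute_force_tsp(dist):
    """Exact TSP by trying every tour that starts at city 0 (only for a handful of cities)."""
    n = len(dist)
    best_len, best_tour = math.inf, None
    for rest in itertools.permutations(range(1, n)):
        tour = (0,) + rest
        # Length of the closed tour, including the final leg back to city 0.
        length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# Four cities on the corners of a unit square: the best tour walks the perimeter.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
print(brute_force_tsp(dist))  # (4.0, (0, 1, 2, 3))
```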

TSP can be solved using mixed integer programming: optimizing a linear goal with linear constraints, where some of the decision variables are integer. In this first post I will show how to formulate and solve a TSP model using Solver Foundation Services. In my second post I will show how to use the Gurobi MIP solver using SFS. There are many different ways to model the TSP; here is a nice introduction. My goal is to provide a clear, complete example, not to build a “production level” TSP model, so I am going to choose a model formulation that dates back to 1960!

First, I need to establish a couple of building blocks that will help me construct the data for the model. We need to know the distances between each pair of cities. Typically we are provided the coordinates of the cities and need to derive the distances. So I will introduce a Coordinate class that contains properties for the (x, y) coordinates, properties to convert them to latitude and longitude, and a method that computes the distance between points.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SolverFoundation.Services;

namespace Microsoft.SolverFoundation.Samples {
  class TravelingSalesman {
    // TSP coordinate.
    public class Coordinate {
      public int Name { get; set; }

      // X-coordinate (from TSPLIB)
      public double X { get; set; }

      // Y-coordinate (from TSPLIB)
      public double Y { get; set; }

      public Coordinate(int name, double x, double y) {
        Name = name;
        X = x;
        Y = y;
      }

      // Latitude in radians (TSPLIB GEO format: X is DDD.MM, degrees and minutes).
      public double Latitude {
        get { return Math.PI * (Math.Truncate(X) + 5 * (X - Math.Truncate(X)) / 3) / 180; }
      }

      // Longitude in radians (Y is DDD.MM, degrees and minutes).
      public double Longitude {
        get { return Math.PI * (Math.Truncate(Y) + 5 * (Y - Math.Truncate(Y)) / 3) / 180; }
      }

      // Geographic distance between two points (as an integer), using the TSPLIB GEO formula.
      public int Distance(Coordinate p) {
        double q1 = Math.Cos(Longitude - p.Longitude);
        double q2 = Math.Cos(Latitude - p.Latitude);
        double q3 = Math.Cos(Latitude + p.Latitude);
        // There may be rounding difficulties here if the points are close together...just sayin'.
        return (int)(6378.388 * Math.Acos(0.5 * ((1 + q1) * q2 - (1 - q1) * q3)) + 1);
      }
    }

    // TSP city-city arc.
    public class Arc {
      public int City1 { get; set; }
      public int City2 { get; set; }
      public double Distance { get; set; }
    }

    // Burma14 from TSPLIB. Optimal tour = 3323.
    private static Coordinate[] data = new Coordinate[] {
      new Coordinate(0, 16.47, 96.10),
      new Coordinate(1, 16.47, 94.44),
      new Coordinate(2, 20.09, 92.54),
      new Coordinate(3, 22.39, 93.37),
      new Coordinate(4, 25.23, 97.24),
      new Coordinate(5, 22.00, 96.05),
      new Coordinate(6, 20.47, 97.02),
      new Coordinate(7, 17.20, 96.29),
      new Coordinate(8, 16.30, 97.38),
      new Coordinate(9, 14.05, 98.12),
      new Coordinate(10, 16.53, 97.38),
      new Coordinate(11, 21.52, 95.59),
      new Coordinate(12, 19.41, 97.13),
      new Coordinate(13, 20.09, 94.55)
    };

    // The TravelingSalesman class continues with the Run() method shown below.
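The latitude and longitude properties follow TSPLIB’s GEO convention: a coordinate such as 16.47 means 16 degrees and 47 minutes, so the fractional part is scaled by 100/60 = 5/3 before converting to radians. Here is a Python sketch of the same computation (the function names are ours); unlike the C# version it clamps the acos argument to [-1, 1], one way to sidestep the rounding issue mentioned in the comment.

```python
import math

EARTH_RADIUS = 6378.388  # the constant used in Coordinate.Distance

def geo_to_radians(value):
    """TSPLIB GEO: DDD.MM is DDD degrees and MM minutes, and MM/60 == 5 * 0.MM / 3."""
    degrees = math.trunc(value)
    return math.pi * (degrees + 5.0 * (value - degrees) / 3.0) / 180.0

def geo_distance(a, b):
    """Integer distance between (x, y) points, following the formula in Coordinate.Distance."""
    lat_a, lon_a = geo_to_radians(a[0]), geo_to_radians(a[1])
    lat_b, lon_b = geo_to_radians(b[0]), geo_to_radians(b[1])
    q1 = math.cos(lon_a - lon_b)
    q2 = math.cos(lat_a - lat_b)
    q3 = math.cos(lat_a + lat_b)
    arg = 0.5 * ((1.0 + q1) * q2 - (1.0 - q1) * q3)
    # Clamp so rounding for nearly identical points cannot push acos out of its domain.
    arg = max(-1.0, min(1.0, arg))
    return int(EARTH_RADIUS * math.acos(arg) + 1.0)

city0, city1 = (16.47, 96.10), (16.47, 94.44)  # first two burma14 points from the data above
print(geo_distance(city0, city1), geo_distance(city0, city0))
```

Note that the formula adds 1 before truncating, so even a city’s distance to itself comes out as 1; the model never uses those diagonal entries because it excludes i == j arcs.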

The data for this 14-city problem comes from the TSPLIB library. If you’ve been following my blog, you know that the building blocks of a Solver Foundation model are: sets, parameters, decisions, goals, and constraints. I am going to implement a simple formulation that is centered around the following (indexed) decisions:

  • Assign[i,j]: this is equal to 1 if the optimal tour contains a trip (or arc) from city i to city j.
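  • Rank[i]: a position label for city i that must increase by at least 1 along each arc of the tour (apart from arcs that touch city 0), so it reflects the order in which the cities are visited.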

We have one parameter in our model:

  • Distance[i,j]: the distance from city i to city j.

With that in mind, here’s the model. An explanation of the goal and constraints follows.

    public static void Run() {
      SolverContext context = SolverContext.GetContext();
      Model model = context.CreateModel();

      // ------------
      // Parameters
      Set city = new Set(Domain.IntegerNonnegative, "city");
      Parameter dist = new Parameter(Domain.Real, "dist", city, city);
      var arcs = from p1 in data
                 from p2 in data
                 select new Arc { City1 = p1.Name, City2 = p2.Name, Distance = p1.Distance(p2) };
      dist.SetBinding(arcs, "Distance", "City1", "City2");
      model.AddParameters(dist);

      // ------------
      // Decisions
      Decision assign = new Decision(Domain.IntegerRange(0, 1), "assign", city, city);
      Decision rank = new Decision(Domain.RealNonnegative, "rank", city);
      model.AddDecisions(assign, rank);

      // ------------
      // Goal: minimize the length of the tour.
      Goal goal = model.AddGoal("TourLength", GoalKind.Minimize,
        Model.Sum(Model.ForEach(city, i => Model.ForEachWhere(city, j => dist[i, j] * assign[i, j], j => i != j))));

      // ------------
      // Enter and leave each city only once.
      int N = data.Length;
      model.AddConstraint("A1", Model.ForEach(city, i => Model.Sum(Model.ForEachWhere(city, j => assign[i, j], j => i != j)) == 1));
      model.AddConstraint("A2", Model.ForEach(city, j => Model.Sum(Model.ForEachWhere(city, i => assign[i, j], i => i != j)) == 1));

      // Forbid subtours (Miller, Tucker, Zemlin - 1960...).
      // City 0 is left out so that the single tour can close back at its start.
      model.AddConstraint("MTZ", Model.ForEach(city,
          i => Model.ForEachWhere(city,
            j => rank[i] + 1 <= rank[j] + N * (1 - assign[i, j]),
            j => Model.And(i != j, i >= 1, j >= 1))));

      Solution solution = context.Solve();

      // Retrieve solution information.
      Console.WriteLine("Cost = {0}", goal.ToDouble());
      // Each row from GetValues() holds the value followed by the indexes i and j;
      // keep the destination j of every selected arc.
      var tour = from p in assign.GetValues() where (double)p[0] > 0.9 select p[2];
      foreach (var i in tour.ToArray()) {
        Console.Write(i + " -> ");
      }
      Console.WriteLine();
    }
  }
}

In my humble opinion, the lines that build the arcs and bind them to the dist parameter are an awesome example of the power of LINQ data binding in Solver Foundation. We generate the 2D matrix of distances using a single LINQ expression. It would be incredibly easy to change the code to retrieve the coordinate data from a database (perhaps using a LINQ expression once again), a file, or even a user application.

The goal is straightforward: minimize the distance traveled. The tour length is the sum of dist[i, j] * assign[i, j] over all pairs of distinct cities, so only the selected arcs contribute. We have two types of constraints:

  • Assignment constraints: these ensure that we enter and leave each city only once.
  • Subtour constraints: these ensure that we do not have any subtours. In a four-city problem {A, B, C, D}, for example, we cannot have two cycles (A, B) and (C, D). We need to have one tour that contains all the cities.
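Here is a small Python sketch of why the subtour constraints are needed. It splits a successor map, in which each city has exactly one outgoing and one incoming arc (what the assignment constraints guarantee), into cycles. The four-city example from the list satisfies the assignment constraints but still produces two cycles. The function name is ours.

```python
def cycles(successor):
    """Split a successor map {city: next_city} (one arc in and one arc out per city) into cycles."""
    seen, result = set(), []
    for start in successor:
        if start in seen:
            continue
        cycle, city = [], start
        while city not in seen:   # follow arcs until we return to a visited city
            seen.add(city)
            cycle.append(city)
            city = successor[city]
        result.append(cycle)
    return result

# Satisfies "enter and leave each city once" but is two subtours, not one tour.
print(cycles({"A": "B", "B": "A", "C": "D", "D": "C"}))  # [['A', 'B'], ['C', 'D']]
print(cycles({"A": "B", "B": "C", "C": "D", "D": "A"}))  # [['A', 'B', 'C', 'D']]
```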

The assignment constraints are easy using the ForEach and ForEachWhere operations. I use ForEachWhere because I want to disallow arcs that enter and leave the same city; that doesn’t make sense. The subtour constraint is a little more complicated. It relates the “assign” and “rank” decisions. The key fact is that if there is an arc from city i to city j, then rank[i] + 1 <= rank[j]. Of course, if the (i, j) arc is not part of the optimal tour, the N * (1 - assign[i, j]) term makes the constraint slack and all bets are off. City 0 is excluded from the constraint because the one tour has to return to its starting city. Last note: notice that I can mix parameters, decisions, and C# variables in my expressions.
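Why does the rank constraint rule out subtours? Writing $r_i$ for rank[i] and $x_{ij}$ for assign[i, j], here is the constraint from the code followed by the standard Miller–Tucker–Zemlin argument, in our notation:

```latex
\begin{align*}
& r_i + 1 \le r_j + N\,(1 - x_{ij}) && \forall\, i \ne j,\ i, j \ge 1 \\
& x_{ij} = 1 \;\Rightarrow\; r_j \ge r_i + 1, \qquad
  x_{ij} = 0 \;\Rightarrow\; r_i + 1 \le r_j + N \ \text{(slack when } 0 \le r \le N - 1\text{)} \\
& \text{subtour } c_1 \to c_2 \to \dots \to c_m \to c_1 \text{ avoiding city } 0:\quad
  \sum_{k=1}^{m} \left( r_{c_k} + 1 \right) \le \sum_{k=1}^{m} r_{c_{k+1}} = \sum_{k=1}^{m} r_{c_k}
  \;\Rightarrow\; m \le 0
\end{align*}
```

Here $c_{m+1} = c_1$. The last line is a contradiction, so every cycle must pass through city 0, and since cycles are disjoint there can be only one. Conversely, a single tour is feasible: set $r_i$ to the position of city i in the tour (1, 2, ..., N - 1), which makes each tour arc tight and leaves every other arc slack.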

Getting the cost is very easy using goal.ToDouble(). We can get the tour using either assign or rank. I have chosen to use assign because it gives me another opportunity to use LINQ. When you call GetValues() on a decision, you get one array per decision element, containing the value followed by its indexes. In this case, the last entry in the array (the destination city j) is the one we are interested in. There are other ways to conveniently query decision results; I’ll save those for another time.
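The query above lists the destination of each selected arc, which is not necessarily the order in which the cities are visited. Here is a Python sketch of walking the selected arcs into tour order. It assumes rows shaped like the ones described above ([value, i, j]) and reuses the 0.9 threshold from the C# code; the sample rows are made up for illustration and are not Solver Foundation output.

```python
def tour_from_rows(rows, start=0, threshold=0.9):
    """rows: iterable of [value, i, j]; the arc i -> j is selected when value > threshold."""
    successor = {int(i): int(j) for value, i, j in rows if float(value) > threshold}
    tour, city = [start], successor[start]
    while city != start:   # follow successors until the tour closes
        tour.append(city)
        city = successor[city]
    return tour

# Hypothetical values for a 4-city tour 0 -> 2 -> 1 -> 3 -> 0 (plus two unselected arcs).
rows = [[1.0, 0, 2], [1.0, 2, 1], [1.0, 1, 3], [1.0, 3, 0], [0.0, 0, 1], [0.0, 1, 0]]
print(" -> ".join(map(str, tour_from_rows(rows))))  # 0 -> 2 -> 1 -> 3
```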

The next post will show how we can use Solver Foundation’s plug-in model to tune the behavior of the Gurobi MIP solver.