Quadratic Assignment Problems: branch and bound

In my last post I introduced quadratic assignment problems.  As I explained, the problem is to assign an equal number of facilities to locations, minimizing the transportation cost.  A compact way to write the problem is to store the “flows” in a matrix A, the “distances” in matrix B, and represent assignments as permutation matrices.  If you throw in a linear term C, then the problem can be written as:

min tr AXBX' + CX'

Branch-and-bound algorithms are the most common technique for finding optimal solutions for QAP.  The next few posts will focus on describing how branch-and-bound works and will result in a complete implementation of a QAP solver in C#.  Branch-and-bound is a “divide and conquer” solution technique for combinatorial problems. Branch-and-bound repeatedly divides a problem into simpler subproblems of the same type, which are then solved or further subdivided. The search space of a problem is partitioned into search spaces for its subproblems. The algorithm generates a search tree with the root corresponding to the original problem to be solved.  At each node in the search tree, a relaxed version of the subproblem is solved, typically by loosening the constraints of the subproblem. The solution to the relaxation is a lower bound to the original subproblem. If a lower bound at a given node exceeds the value of a previously known solution, then continued searching on this path cannot lead to an improved solution, and there’s no need to continue searching on that node’s subtree (this is called “fathoming”).

A little less formally, we take the search space and divide it into subregions.  Then for each piece, we solve a “relaxed” version of the problem to see if we can rule out the solution lying in that subregion.  If we can’t rule it out, we divide the subregion into pieces and repeat.  The “relaxation” part is bounding, the dividing is called branching.  The effectiveness of a branch-and-bound algorithm depends on 1) the strength of the bound, 2) how fast bounds can be computed, and 3) the branching decisions.  For example, if the bound is terrible we’ll never fathom subproblems and we’ll end up searching the entire tree.

So how do we apply branch-and-bound to QAP?  The solution to a QAP is an assignment of facilities to locations.  Therefore the subproblems are obtained by assigning some of the facilities to locations.  To branch, I can pick a facility and try assigning it to each of the available locations (or vice-versa).  As we will see, the logic for selecting which facility/location to branch on is extremely important.

Let’s start putting together some of the pieces of our branch-and-bound solver for QAP.  First, let’s define a QAP, which as I said before consists of the flow and distance matrices A and B, and a linear term C.

  // A quadratic assignment problem.
  public class Qap {
    // A (flow matrix).
    public double[][] A { get; set; }
    // B (distance matrix).
    public double[][] B { get; set; }
    // C (linear costs).
    public double[][] C { get; set; }
    // Constant term.
    public double Shift { get; set; }

    // Size of (A, B, C) matrices.
    public int Size {
      get { return A == null ? 0 : A.Length; }
    }

    // Compute tr AXBX' + CX', where the permutation matrix X is determined by p.
    public double Evaluate(int[] p) {
      double obj = this.Shift;
      for (int i = 0; i < A.Length; i++) {
        obj += this.C[i][p[i]];
        for (int j = 0; j < A[i].Length; j++) {
          obj += this.A[i][j] * this.B[p[i]][p[j]];
        }
      }
      return obj;
    }
}

To make the code shorter I am using C# 3.0 automatic properties. In addition to the matrices I have also added a constant term “Shift”, whose purpose will become clear in a moment.  I’ve also provided a method called Evaluate() that computes the objective given a permutation.  Let’s think about how branching works.  If we go back to the mathematical formulation of QAP at the start of this post, assigning a facility to a location means setting X_ij = 1.  If we plug X_ij into the formula above we see that we end up with a QAP of size n – 1 with:

A′ = A(ii)
B′ = B(jj)
C′ = C(ij) + 2 a_i b_j ,
Shift’ = A[i][i] B[j][j] + C[i][j] + Shift

Where  A(ij) denotes matrix A with row i and column j removed, a_i are the elements in row i of A, excluding A[i][i], and let b_j are the elements in column j of B, excluding b[j][j]. Here’s the code:

    private static Qap Reduce(Qap qap, int i, int j) {
      Qap r = new Qap();

      r.A = Reduce(qap.A, i, i);
      r.B = Reduce(qap.B, j, j);
      r.C = Reduce(qap.C, i, j);

      // Tack on: 2 a'_i b'_j (where a'_i is row i of A, with element i removed)
      for (int ii = 0; ii < r.Size; ii++) {
        for (int jj = 0; jj < r.Size; jj++) {
          r.C[ii][jj] += 2 * qap.A[i][ii < i ? ii : ii + 1] * qap.B[jj < j ? jj : jj + 1][j];
        }
      }

      r.Shift = qap.Shift + qap.A[i][i] * qap.B[j][j] + qap.C[i][j];

      return r;
    }

    private static double[][] Reduce(double[][] M, int i, int j) {
      double[][] R = LinearAlgebra.NewMatrix(M.Length - 1);
      for (int ii = 0; ii < R.Length; ii++) {
        for (int jj = 0; jj < R[ii].Length; jj++) {
          R[ii][jj] = M[ii < i ? ii : ii + 1][jj < j ? jj : jj + 1];
        }
      }
      return R;
    }

LinearAlgebra.NewMatrix() is a helper function that initializes the matrix. So now we have a data structure for QAP, and we now how to obtain subproblems.  Next time we’ll look at how to compute lower bounds.

Update 6/9/2010: Here is an additional method to read a problem in QAPLIB format.  It does not attempt to read the “C” matrix, and it is not totally robust. But it should work.

    public static Qap Read(StreamReader reader) {
      string content = reader.ReadToEnd();
      var tokens = content.Split(new char[] { ' ', '\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
      int k = 0;
      int size;
      if (Int32.TryParse(tokens[k++], NumberStyles.Integer, NumberFormatInfo.InvariantInfo, out size)) {
        Qap qap = new Qap();
        qap.A = ReadMatrix(size, size, tokens, ref k);
        qap.B = ReadMatrix(size, size, tokens, ref k);
        return qap;
      }
      throw new InvalidDataException("Invalid QAP size.");
    }

    private static double[][] ReadMatrix(int m, int n, string[] tokens, ref int k) {
      if (k + m * n > tokens.Length) {
        throw new InvalidDataException("Not enough matrix data.");
      }
      double[][] matrix = LinearAlgebra.NewMatrix(m, n);
      for (int i = 0; i < matrix.Length; i++) {
        for (int j = 0; j < matrix[i].Length; j++) {
          if (!Double.TryParse(tokens[k++], NumberStyles.Float, NumberFormatInfo.InvariantInfo, out matrix[i][j])) {
            throw new InvalidDataException("Invalid matrix entry.");
          }
        }
      }
      return matrix;
    }
Advertisements

Quadratic Assignment Problems, Part I

In a previous series I talked a bit about linear assignment problems (LAPs) and provided code for a simple LAP solver.  Now I want to spend a few posts on LAP’s big brother, the Quadratic Assignment Problem.  In an LAP we are trying to assign objects in one set to an equal number of objects in another set, where we are given the cost of assigning any pair of objects.  In QAP we still have two sets of objects – and I will stick with the original formulation of the problem (from the 50’s) and call the first step of objects “facilities” and the second set “locations”.  Imagine that we are going to build a bunch of computer labs in different rooms of a building, for example.  Suppose that each lab has a different purpose, and the rooms are wired up and we’re trying to figure out where to put each lab.  The quadratic assignment problem is the problem of finding an assignment
of an equal number of facilities to locations that minimizes the transportation cost between facilities.  In my example, the transportation cost is a product of how far the rooms are from each other and how often one travels from one lab to the other.

QAPs are interesting problems in part because they are so difficult to solve.  Finding the optimal assignment is an NP-hard problem.  In upcoming posts I will write down the problem more formally, and explore some heuristics for QAP, as well as some tools for finding exact solutions for QAP.

How to fix bugs (and influence people)

I work on a large team with a lot of dependencies and fairly tight deadlines.  I wanted to share some of a few techniques for fixing bugs quickly.  Maybe this is all common knowledge but I’ll share it anyway.

  1. Trade bugs. It’s not very efficient if you and another dev to be modifying the same code at the same time.  Swap bugs with your teammates to define clear ownership.  Sooner is better than later.
  2. Fix areas, not bugs. Don’t just fix one string on a page, fix all the strings on the page.  If a validation check was missing in one API, it is probably missing in other APIs too.  Proactively look for related bugs assigned to other people and take them on.
  3. Kick off a build every night before you leave. I have known people who scheduled jobs to do this automatically.
  4. Pipeline your work. There is a lot of idle time spent waiting for builds to finish, deploying, etc.  Fill that time with something useful by switching over to another bug.  My pattern is usually to repro the bug (stepping through the code), code the solution, then do final testing.  So I like to work on three bugs (actually three sets of bugs, see above) at once:
    debug  code  test
          debug  code  test
                 debug code  test

    Sometimes there is a fourth step – “think” – for more difficult bugs.

  5. Write tools. Sometimes it is a big pain in the ass to work through the product to repro and fix bugs.  Sometimes building simple tools can really help your productivity, for example a simple winforms app that mimicks a more complicated client application.  I have written command line apps that talked to webservices in the past.  Once you build the tool, let your teammates know.
  6. Unit test and refactor. Try to think about how you can build a strong foundation of unit tests that are easy to add to.  Unit tests prevent regressions.  Also, once you have confidence in your unit tests you can fearlessly refactor code if you notice lots of bugs in a certain area.
  7. Schedule uninterrupted time for yourself. It’s hard to work effectively in 15 minute chunks.
  8. Don’t break the f***ing build. That slows down the whole team.  If you notice a blocking issue, jump on it immediately even if it is not your area.  You will learn something in the process.

Linear assignment problems, part III

Here’s the code. See the previous posting for background.

using System;

namespace NathanBrixius.AssignmentLib
{
    /// <summary>Linear assignment problem solver.<summary>
    /// <remarks>
    /// This code is a port of Fortran code by R.E. Burkard and U. Derigs. 
    /// Assignment and matching problems: Solution methods with fortran programs, 
    /// volume 184 of Lecture Notes in Economics and Mathematical Systems. 
    /// Springer, Berlin, 1980.  The original code is available on the QAPLIB site.
    /// </remarks>
    public class LinearAssignment
    {
        public static void Main(params string[] args)
        {
            double[][] L = new double[5][];
            for (int i = 0; i < L.Length; i++)
            {
                L[i] = new double[5];
                for (int j = 0; j < L[i].Length; j++)
                {
                    L[i][j] = i + j;
                }
            }
            int[] p = new int[L.Length];
            int[] ip = new int[L.Length];
            double[] ys = new double[L.Length];
            double[] yt = new double[L.Length];
            Solve(L, p, ip, ys, yt);
        }

        /// <summary>Evaluate a permutation against the given LAP.<summary>
        /// <param name="L">LAP cost matrix.</param>
        /// <param name="p">Permutation.</param>
        /// <returns>The cost of the permutation.</returns>
        public static double Evaluate(double[][] L, int[] p)
        {
            double cost = 0.0;
            for (int i = 0; i < p.Length; i++)
            {
                cost += L[i][p[i]];
            }
            return cost;
        }

        /// <summary>Calculate the reduced costs matrix for the given LAP and solution.<summary>
        /// <param name="L">LAP cost matrix.</param>
        /// <param name="U">Reduced cost matrix (filled in by the method)</param>
        /// <param name="ys">Dual variables.</param>
        /// <param name="yt">Dual variables.</param>
        public static void ReducedCosts(
            double[][] L,
            double[][] U,
            double[] ys,
            double[] yt)
        {
            for (int i = 0; i < L.Length; i++)
            {
                for (int j = 0; j < L[i].Length; j++)
                {
                    U[i][j] = L[i][j] - (yt[j] + ys[i]);

                }
            }
        }

        /// <summary>Solve the specified linear assignment problem.<summary>
        /// <param name="L">LAP cost matrix.</param>
        /// <param name="p">The solution permutation.</param>
        /// <param name="ip">The inverse of the solution permutation.</param>
        /// <param name="ys">Dual variables.</param>
        /// <param name="yt">Dual variables.</param>
        /// <returns></returns>
        public static double Solve(
            double[][] L,
            int[] p,
            int[] ip,
            double[] ys,
            double[] yt
            )
        {
            if (L.Length == 0)
            {
                return 0.0;
            }
            return SolveCore(L, ip, p, ys, yt, 1.0e-7);
        }

        /// <summary>Linear sum assignment problem with double precision costs.<summary>
        /// <param name="L">Cost matrix.</param>
        /// <param name="ip">Inverse of optimal assignment.</param>
        /// <param name="p">Optimal assignment.</param>
        /// <param name="ys">Optimal dual (row) variables.</param>
        /// <param name="yt">Optimal dual (column) variables.</param>
        /// <param name="tolerance">Machine accuracy.</param>
        /// <returns>The optimal cost.</returns>
        private static double SolveCore(
            double[][] L,
            int[] ip,
            int[] p,
            double[] ys,
            double[] yt,
            double tolerance
            )
        {
            const int BadIndex = -1;
            int[] vor = new int[L.Length];
            bool[] label = new bool[L.Length];
            double[] dMinus = new double[L.Length];
            double[] dPlus = new double[L.Length];

            // Start: Construction of an initial (partial) assignment.
            // (partial --> p[i]/ip[j] may be BadIndex for some i,j)
            double maxValue = 1.0e15;
            for (int i = 0; i < p.Length; i++)
            {
                ip[i] = BadIndex;
                p[i] = BadIndex;
                vor[i] = BadIndex;
                ys[i] = 0;
                yt[i] = 0;
            }

            double ui = 0.0;
            int jOpt = -1; // optimal j index
            for (int i = 0; i < L.Length; i++)
            {
                for (int j = 0; j < L[i].Length; j++)
                {
                    if ((j == 0) || (L[i][j] - ui < tolerance))
                    {
                        ui = L[i][j];
                        jOpt = j;
                    }
                }

                // guarantee: jOpt >= 0
                ys[i] = ui;
                if (ip[jOpt] == BadIndex)
                {
                    ip[jOpt] = i;
                    p[i] = jOpt;
                }
            }

            for (int j = 0; j < ip.Length; j++)
            {
                if (ip[j] == BadIndex)
                {
                    yt[j] = maxValue;
                }
            }

            for (int i = 0; i < ys.Length; i++)
            {
                ui = ys[i];
                for (int j = 0; j < yt.Length; j++)
                {
                    double vj = yt[j];
                    if (vj > tolerance)
                    {
                        double cc = L[i][j] - ui;
                        if (cc + tolerance < vj)
                        {
                            yt[j] = cc;
                            vor[j] = i;
                        }
                    }
                }
            }

            for (int j = 0; j < vor.Length; j++)
            {
                int i = vor[j];
                if ((i != BadIndex) && (p[i] == BadIndex))
                {
                    p[i] = j;
                    ip[j] = i;
                }
            }

            for (int i = 0; i < p.Length; i++)
            {
                if (p[i] == BadIndex)
                {
                    ui = ys[i];
                    for (int j = 0; j < ip.Length; j++)
                    {
                        if (ip[j] == BadIndex)
                        {
                            double lij = L[i][j];
                            if ((lij - ui - yt[j] + tolerance) <= 0)
                            {
                                p[i] = j;
                                ip[j] = i;
                            }
                        }
                    }
                }
            }

            // Construction of the optimal assignment
            double d = 0.0;
            int w = BadIndex;
            int index = BadIndex;
            for (int u = 0; u < p.Length; u++)
            {
                if (p[u] == BadIndex)
                {
                    // Shortest path computation
                    for (int i = 0; i < vor.Length; i++)
                    {
                        vor[i] = u;
                        label[i] = false;
                        dPlus[i] = maxValue;
                        dMinus[i] = L[u][i] - ys[u] - yt[i];
                    }
                    dPlus[u] = 0;

                    bool done = false;
                    while (!done)
                    {
                        d = maxValue;
                        for (int i = 0; i < dMinus.Length; i++)
                        {
                            if (!label[i] && (dMinus[i] + tolerance < d))
                            {
                                d = dMinus[i];
                                index = i;
                            }
                        }

                        // note: index >= 0
                        if (ip[index] != BadIndex)
                        {
                            label[index] = true;
                            w = ip[index];
                            dPlus[w] = d;
                            for (int i = 0; i < label.Length; i++)
                            {
                                if (!label[i])
                                {
                                    double vgl = d + L[w][i] - ys[w] - yt[i];
                                    if (dMinus[i] > vgl + tolerance)
                                    {
                                        dMinus[i] = vgl;
                                        vor[i] = w;
                                    }
                                }
                            }
                        }
                        else
                        {
                            done = true;
                        }
                    }

                    // Augmentation
                    done = false;
                    while (!done)
                    {
                        w = vor[index];
                        ip[index] = w;
                        int tempIndex = p[w];
                        p[w] = index;
                        if (w == u)
                        {
                            done = true;
                        }
                        else
                        {
                            index = tempIndex;
                        }
                    }

                    // Transformation
                    for (int i = 0; i < dPlus.Length; i++)
                    {
                        if (dPlus[i] != maxValue)
                        {
                            ys[i] += d - dPlus[i];
                        }

                        if (dMinus[i] + tolerance < d)
                        {
                            yt[i] += dMinus[i] - d;
                        }
                    }
                }
            }

            // Computation of the optimal value
            return Evaluate(L, p);
        }
    }
}

Linear assignment problems, part II

In a previous post I talked a bit about the linear assignment problem:  finding the minimum cost matching in a bipartite graph.  In my next post I will give a simple self-contained C# program for solving LAP based on some Fortran code from my colleague Rainer Burkard.  First a few notes and disclaimers!

The input to the code is a matrix which represents the adjacency matrix of the graph.  In my application (solving quadratic assignment problems) I am concerned with input matrices that are relatively small and dense.   The code is a port of Fortran code by Burkard and Ulrich Derigs:  Assignment and matching problems: Solution methods with Fortran programs, volume 184 of Lecture Notes in Economics and Mathematical Systems.  Springer, Berlin, 1980.  The original code is available on the QAPLIB site.  Rainer has given me permission to post this code, although if you look at the terms on the original you will see that it is freely distributable.  This code is offered under the same conditions and I make no claims about its correctness, value, etc.  But it works okay for me.

My port is relatively “conservative” in that I have tried to retain the original structure and flow as much as possible because I don’t want to introduce any bugs.  I want to keep the implementation clean, so like many numerical optimization codes there isn’t much input validation.  I have chosen to use arrays rather than trying to encapsulate the input and output into matrix and vector classes.  I kept it simple so you can adapt it to your needs.

Lastly, I wanted to mention that this is not the most efficient LAP solver available.  At the time I wrote my thesis the fastest code over the domain I was working on was the code of Jonker and Volgenant, which is available on this site.  I notice that the code is used by others as well, see here and here.  This code is really quite slick and I had a great experience working with it all those years ago.  It’s not hard to port it over to C#, so go for it!

What makes a good software developer

A colleague asked me to jot down some thoughts on what makes a good developer at Microsoft.  This is by no means complete, but three statements come to mind when I think about strong developers:

  • We are engineers, not hackers.
  • A strong dev should be able to fill any engineering role in the company.
  • Senior devs improve their teams by action and by example.
     
  • We are engineers, not hackers.  My dev manager in Project (my previous team) said this years ago and it has stuck with me ever since.  Engineering is a craft whereas hacking is a hobby, and I expect a Microsoft dev to seek to become a master of the craft.  Engineers understand common patterns underlying the problems they solve and seek to make their code predictable and reliable.  (This doesn’t suck the creativity or joy out of the process, rather it should liberate us.)  For an engineer, process is not an end to itself but a tool for arriving at solid designs and trustworthy code.  Engineers take great pride in their creations and stand behind them.
     
    Engineering practices apply to all stages of software development.  In the design phase I expect a structured approach that leads to a design that clearly meets the needs described in the spec.  The deliverable is a design doc that describes how the feature is implemented.  A good design doc is an aid to the developer, his/her peers, as well as Test/PM.  I don’t want to get into details about what makes a good design doc, but it should describe the primary design and implementation challenges and how they are addressed, and why other reasonable approaches were rejected.  On a services team it’s particularly important to think of the impact the feature has on setup, deployment, and monitoring, and account for these early.
     
    Everyone has things to say about coding guidelines, so I won’t dwell on them much here.  I assembled a set of team-wide guidelines and I expect those to be followed.  I always look for simplicity and consistency.  As a services team I particularly value unit tests and tracing. 
     
    Finally, I expect a positive handoff to test.  By this I mean a detailed TRD (test release document) that describes what the feature does, what works, what doesn’t work, and suggestions for testing the feature.  I expect code to be structured so that it is easy to test as possible.  I don’t particularly care for metrics based on the number of bugs a dev fixes.  I value developers who prevent bugs from happening in the first place, and it’s easiest to do this in the earliest stages of the project before any code is written.  This begs the question of how do we measure the quality of a feature relative to the time spent on it.  I don’t have easy answers except to say that if leads and peers are involved in the entire development process, it is easier to come to a collective understanding of how challenging a feature is and how well the developer has done at implementing it.  In other words, it comes down to judgment.
     
    A strong dev should be able to fill any engineering role in the company.   The statement applies in two ways.  First, a dev should be able to wear their "PM hat" or "test hat" and operate effectively within that role.  Perhaps this is statement is too "old school Microsoft", and I certainly don’t mean to detract from the roles of Test and PM.  After all, Test and PM are also engineering disciplines.   I simply mean that devs have a deep understanding of customer requirements and are able to make their own assessments of the relative priority of features and work items, and can make suggestions as to how to improve the experience.  Devs should be able to be ruthless testers of their own (and their teammates) features and provide actionable feedback that results in improvements.  They write unit tests, which guarantee a certain level of quality.  Having the ability to operate in another role also prevents a dev from becoming blocked, and provides broader perspectives which assist devs in interacting with their Test and PM counterparts.  Everyone understands where everyone else is coming from, and that makes for a more enjoyable working environment.
     
    Secondly the statement means that a dev is not dependent on domain-specific knowledge to be effective.  A deep understanding of specific technologies, code bases, and design patterns is essential for good engineers.  But a strong dev is not a "one trick pony" that can only work in one area (though they may prefer to do so, and may be asked to do so from time to time).  A dev has technical depth, but also technical breadth, which allows them to see how their code functions to serve a larger purpose.  I’d like to think that I could be dropped into the middle of Windows, and though I would probably hate it, I would be effective.
     
    Senior devs improve their teams by action and by example.  There are only so many hours in a day, so at a certain point it becomes more difficult to provide more value by simply coding more or working longer.  Actions that make others better have a compounding benefit that can have a huge impact over time.  Most of us can think of old timers that seem to have an awesome level of productivity, to the extent that it’s hard to understand how they can be that effective.  I suggest that in most cases, these old timers have mastered the art of making others better.  First, they typically do not worry themselves about things like "scope" or "area".  Making Microsoft better is their "area", and that extends to fixing bugs in other dev’s code, fixing build problems, improving processes, giving feedback to other teams.  Master devs share knowledge, through conversations, whiteboard sessions, documents, blogs.  They cringe when they hear people talk about "their code"; it’s "our code".  Their movements are not hurried, but always with purpose.   They’ve got time for other people, and still get their work done.  They’re able to do this because they identify patterns in their work and automate them, either by defining a design pattern, or by architecting the system appropriately, or by writing tools that do their work for them.  Then they share this knowledge with the rest of the team and build a culture where others do the same.  Lastly, they continually seek new opportunities to make themselves and their team better.  These opportunities come up all the time in a product cycle; a master dev spots them first and steps in to fill a need, for example by forwarding an email to someone who knows the answer; seeing that a new platform component could help implement features in two different projects; adding a new unit test that enforces a coding or stylistic convention.
     
    I love it when devs are passionate about delivering something that works and is useful!

Linear assignment problems, part I

I’m going to devote a few posts to my thesis topic, the quadratic assignment problem.  Quadratic assignment problems are a type of matching problem that are NP hard and have some interesting applications.  I’ll start out by describing linear assignment problems, which are simpler but still fun.

As the name suggests, in a linear assignment problem objects in one set are to be assigned to an equal number of objects in a second set. The cost of assigning objects in the first set to objects in the second set are given, and the objective is to find the assignment that minimizes the total assignment cost.  For example, the problem may be to assign workers to jobs so as to minimize the total amount of working time.  A more formal description of the problem can be found on this page (and I recommend the book, BTW).

LAP can also be posed as the problem of finding the minimum cost matching on a bipartite graph.  For this reason, in algorithms texts this problem is also sometimes referred to as “minimum cost weighted bipartite matching”.  LAPs are linear programs and so they can be solved in polynomial time.  As the wikipedia article on the problem states, there are algorithms that take advantage of the special structure of LAP to solve them more efficiently than general purpose LP solvers.

Next time I’ll go through one possible solution approach for LAP.