Checkpointing and Reusing TensorFlow Models

In my last two posts I introduced TensorFlow and wrote a very simple predictive model. In doing so I introduced many of the key concepts of TensorFlow:

  • The Session, the core of the TensorFlow object model,
  • Computational graphs and some of their elements: placeholders, variables, and Tensors,
  • Training models by iteratively calling Session.run on Optimization objects.

In this post I want to show you can save and re-use the results of your TensorFlow models. As we discussed last time, training a model means finding variable values that suit a particular purpose, for example finding a slope and intercept that defines a line that best fits a series of points. Training a model can be computationally expensive because we have to search for the best variable values through optimization. Suppose we want to use the results of this trained model over and over again, but without re-training the model each time. You can do this in TensorFlow using the Saver object.

A Saver object can save and restore the values of TensorFlow Variables. A typical scenario has three steps:

  1. Creating a Saver and telling the Saver which variables you want to save,
  2. Save the variables to a file,
  3. Restore the variables from a file when they are needed.

A Saver deals only with Variables. It does not work with placeholders, sessions, expressions, or any other kind of TensorFlow object. Here is a simple example that saves and restores two variables:

def save(checkpoint_file=’hello.chk’):
    with tf.Session() as session:
        x = tf.Variable([42.0, 42.1, 42.3], name=’x’)
        y = tf.Variable([[1.0, 2.0], [3.0, 4.0]], name=’y’)
        not_saved = tf.Variable([-1, -2], name=’not_saved’)
        session.run(tf.initialize_all_variables())

        print(session.run(tf.all_variables()))
        saver = tf.train.Saver([x, y])
        saver.save(session, checkpoint_file)

def restore(checkpoint_file=’hello.chk’):
    x = tf.Variable(-1.0, validate_shape=False, name=’x’)
    y = tf.Variable(-1.0, validate_shape=False, name=’y’)
    with tf.Session() as session:
        saver = tf.train.Saver()
        saver.restore(session, checkpoint_file)
        print(session.run(tf.all_variables()))

def reset():
    tf.reset_default_graph()

Try calling save(), reset() and then restore(), and compare the outputs to verify everything worked out. When you create a Saver, you should specify a list (or dictionary) of Variable objects you wish to save. (If you don’t, TensorFlow will assume you are interested in all the variables in your current session.) The shapes and values of these values will be stored in binary format when you call the save() method, and retrieved on restore(). Notice in my last function, when I create x and y, I give dummy values and say validate_shape=False. This is because I want the saver to determine the values and shapes when the variables are restored. If you’re wondering why the reset() function is there, remember that computational graphs are associated with Sessions. I want to “clear out” the state of the Session so I don’t have multiple x and y objects floating around as we call save and restore().

When you use Saver in real models, you should keep a couple of facts in mind:

  1. If you want to do anything useful with the Variables you restore, you may need to recreate the rest of the computational graph.
  2. The computational graph that you use with restored Variables need not be the same as the one that you used when saving. That can be useful!
  3. Saver has additional methods that can be helpful if your computation spans machines, or if you want to avoid overwriting old checkpoints on successive calls to save().

At the end of this post I have include a modification of my line fitting example to optionally save and restore model results. I’ve highlighted the interesting parts. You can call it like this:

fit_line(5, checkpoint_file=’vars.chk’)
reset()
fit_line(5, checkpoint_file=’vars.chk’, restore=True)

With this version, I could easily “score” new data points x using my trained model.

def fit_line(n=1, log_progress=False, iter_scale=200,
             restore=False, checkpoint_file=None):
    with tf.Session() as session:
        x = tf.placeholder(tf.float32, [n], name=’x’)
        y = tf.placeholder(tf.float32, [n], name=’y’)
        m = tf.Variable([1.0], name=’m’)
        b = tf.Variable([1.0], name=’b’)
        y = tf.add(tf.mul(m, x), b) # fit y_i = m * x_i + b
        y_act = tf.placeholder(tf.float32, [n], name=’y_’)

        # minimize sum of squared error between trained and actual.
        error = tf.sqrt((y – y_act) * (y – y_act))
        train_step = tf.train.AdamOptimizer(0.05).minimize(error)

        x_in, y_star = make_data(n)

        saver = tf.train.Saver()
        feed_dict = {x: x_in, y_act: y_star}
        if restore:
            print(“Loading variables from ‘%s’.” % checkpoint_file)
            saver.restore(session, checkpoint_file)
            y_i, m_i, b_i = session.run([y, m, b], feed_dict)
        else:
            init = tf.initialize_all_variables()
            session.run(init)
            for i in range(iter_scale * n):
                y_i, m_i, b_i, _ = session.run([y, m, b, train_step],
                                               feed_dict)
                err = np.linalg.norm(y_i – y_star, 2)
                if log_progress:
                    print(“%3d | %.4f %.4f %.4e” % (i, m_i, b_i, err))

            print(“Done training! m = %f, b = %f, err = %e, iter = %d”
                  % (m_i, b_i, err, i))
            if checkpoint_file is not None:
                print(“Saving variables to ‘%s’.” % checkpoint_file)
                saver.save(session, checkpoint_file)

        print(”      x: %s” % x_in)
        print(“Trained: %s” % y_i)
        print(” Actual: %s” % y_star)

A Simple Predictive Model in TensorFlow

In my previous post I provided a simple introduction to TensorFlow. In this post I’d like to take the next step and build a predictive model so I can highlight some key TensorFlow concepts.

This model will fit a line y = m * x + b to a series of points (x_i, y_i). This code is not the best way fit a line – it’s just an example. In our code, we’ll generate points with small random deviations from a line with known slope and intercept. Our test will be to see if we can recover these known values using TensorFlow. Here is a picture of our training data:

TensorFlowData

My last post explained that there are often four phases to TensorFlow programs: creating a model, getting the input data, running the model, and processing the output. In our model we want to find a slope m and intercept b that best fits our input data. What do we mean by “best fit”? We mean values m, b that give the smallest sum of squared error between the predicted and actual y_i. The way we do this in TensorFlow is create this expression, and then repeatedly run a Session that adjusts the values of m and b to make the error smaller using an optimizer.

There are two functions below: one to generate test data, and another to create and run the TensorFlow model:

def make_data(n):
    np.random.seed(42) # To ensure same data for multiple runs
    x = 2.0 * np.array(range(n))
    y = 1.0 + 3.0 * (np.array(range(n)) + 0.1 * (np.random.rand(n) – 0.5))
    return x, y

def fit_line(n=1, log_progress=False):
    with tf.Session() as session:
        x = tf.placeholder(tf.float32, [n], name=’x’)
        y = tf.placeholder(tf.float32, [n], name=’y’)
        m = tf.Variable([1.0], trainable=True) # training variable: slope
        b = tf.Variable([1.0], trainable=True) # training variable: intercept
        y = tf.add(tf.mul(m, x), b) # fit y_i = m * x_i + b

        # actual values (for training)
        y_act = tf.placeholder(tf.float32, [n], name=’y_’)

        # minimize sum of squared error between trained and actual.
        error = tf.sqrt((y – y_act) * (y – y_act))
        # train_step = tf.train.GradientDescentOptimizer(0.01).minimize(error)
        train_step = tf.train.AdamOptimizer(0.05).minimize(error)

        # generate input and output data with a little random noise.
        x_in, y_star = make_data(n)

        init = tf.initialize_all_variables()
        session.run(init)
        feed_dict = {x: x_in, y_act: y_star}
        for i in range(30 * n):
            y_i, m_i, b_i, _ = session.run([y, m, b, train_step], feed_dict)
            err = np.linalg.norm(y_i – y_star, 2)
            if log_progress:
                print(“%3d | %.4f %.4f %.4e” % (i, m_i, b_i, err))

        print(“Done! m = %f, b = %f, err = %e, iterations = %d”
              % (m_i, b_i, err, i))
        print(”      x: %s” % x_in)
        print(“Trained: %s” % y_i)
        print(” Actual: %s” % y_star)

Hopefully make_data is fairly clear. The function fit_line takes two input arguments:

  • n: the number of points to generate
  • log_progress: whether to display TensorFlow’s progress in finding the right slope m and intercept b.

After we create a TensorFlow session, our next two steps are to create placeholders for our input x and output y, similar to our first example. These are both Tensors of size n since that’s how many data points we have. The next line creates a TensorFlow variable to represent the slope m. A variable is a value that is retained between calls to Session.run(). If the value is an input or an output from the model, we don’t want a variable – we want a placeholder. If the value remains constant during our computation, we don’t want a variable – we want a tf.constant. We want variables when we want TensorFlow to train the value based on some criteria in our model. Notice when we create the Variable objects we supply initial values for the variable, and a “trainable” flag. Providing TensorFlow with initial values for a variable informs TensorFlow of the dimensionality and type – in our case m and b are single dimensional Tensors of size 1, but they could just as easily be multidimensional and/or integer.

The next expression assigns y the value m * x. We want to do this on an elementwise basis: we have a series of points (x_i, y_i) that we want to train against scalar values m and b. The TensorFlow functions add and mul operate on their arguments on an elementwise basis with broadcasting: using + and * would not have the intended effect.

Now that we have a model for our predicted values y, we want to compute the sum of squared error. This is accomplished using Tensor arithmetic and tf.sqrt. Here is a picture of our computational graph to this point:

TensorFlowGraph

Here comes the next new concept: optimization. We have specified our model, and the error in the model, but now we want TensorFlow to find the best possible values for m and b given the error expression. Optimization is carried out in TensorFlow by repeatedly calling Session.run() with an Optimization object “fed” as input. An Optimization carries out logic that adjusts the variables in a way that will hopefully improve the value of the error expression. In our case we will use an AdamOptimizer object. The parameter to AdamOptimizer controls how much the optimizer adjusts the variables on each call – larger is more aggressive. All Optimizer objects have a minimize() method that lets you pass in the expression you want to optimize. You can see that the train_step, the value returned by the AdamOptimizer, is passed into the Session.run() call.

Let’s explain briefly how the optimization works. A single call to the Optimizer does not adjust variables all the way to their optimal values; a call represents a single step towards an optimum. If you want to learn more about the specific logic that AdamOptimizer uses during a step, look at the TensorFlow documentation, or if you are ambitious, read the paper. The key ingredient is the gradient of the variables that you are trying to optimize. TensorFlow computes gradients by creating computational graph elements for the gradient expressions and evaluating them – have a look at this stackoverflow response for details. Again, TensorFlow can do this because it has a symbolic representation of the expressions you’re trying to compute (it’s in the picture above). Since a call to an optimizer is a single step, Session.run() must be called repeatedly in a loop to get suitable values. In the picture below I have plotted the values of the error (MSE) and m (Slope) expressions for the first 50 steps.

TensorFlowConvergence

If you have past experience with optimization you may wonder why I am running the optimizer for a fixed number of steps rather than having a more sensible stopping criterion. The answer is I am keeping it simple – feel free to extend the example if you like. You may also observe that this code is not very efficient or accurate in fitting points in a line. That’s not TensorFlow’s fault – it’s my fault for writing such a contrived example. In many real world examples the actual computational graph represents a complicated neural network.

Much of the remaining code to create the input and output arrays and call session.run should be familiar to you if you worked through my first post. When we complete our loop of Session.run() calls we print out our final slope and intercept, as well as the trained and actual y values.

With luck, I will be able to continue this series to use TensorFlow to build and run a neural network to solve a problem that is closer to a real-world scenario.

An Introduction To TensorFlow

This post walks through a simple Google TensorFlow example.

Getting Started

TensorFlow is an open source library for analytics. It’s particularly useful for building deep learning systems for predictive models involving natural language processing, audio, and images.

The TensorFlow site provides instructions for downloading and installing the package. Loosely speaking, here’s what you need to do to get started on a Windows machine:

  • Get comfortable with Python.
  • Install docker.
  • Run the “development” image for TensorFlow. The development images contains all of the samples on the TensorFlow site. The command I used was
    docker run -i -t gcr.io/tensorflow/tensorflow:latest-devel /bin/bash

Running the development image “latest-devel” will provide you with code for all of the examples on the TensorFlow site. You don’t strictly speaking have to use docker to get started with TensorFlow, but that’s what worked for me.

A Simple TensorFlow Program

I think the TensorFlow tutorials are too complicated for a beginner, so I’m going to present a simple TensorFlow example that takes input x, adds one to it, and stores it in an output array y. Many TensorFlow programs, including this one, have four distinct phases:

  1. Create TensorFlow objects that model the calculation you want to carry out,
  2. Get the input data for the model,
  3. Run the model using the input data,
  4. Do something with the output.

I have marked these phases in the code below.

import numpy as np
import tensorflow as tf
import math

def add_one():
with tf.Session() as session:
# (1)
x = tf.placeholder(tf.float32, [1], name=’x’) # fed as input below
y = tf.placeholder(tf.float32, [1], name=’y’) # fetched as output below
b = tf.constant(1.0)
y = x + b # here is our ‘model’: add one to the input.

        x_in = [2] # (2)
y_final = session.run([y], {x: x_in}) # (3)
print(y_final) # (4)

The first line in add_one creates a TensorFlow Session object. Sessions contain “computational graphs” that represent calculations to be carried out. In our example, we want to create a computational graph that represents adding the constant 1.0 to an input array x. Here is a picture:

TensorFlow1

The next two lines create “placeholders” x and y. A placeholder is an interface between a computational graph element and your data. Placeholders can represent input or output, and in my case  x represents the value to send in, and y represents the result. The second argument of the placeholder function is the shape of the placeholder, which is a single dimensional Tensor with one entry. You can also provide a name, which is useful for debugging purposes.

The next line creates the constant b using tf.constant. As we will see in future examples, there are other TensorFlow functions for addition, multiplication, and so on. Using these helper functions you can assemble a very wide range of functions that involve inputs, outputs, and other intermediate values. In this example, we’re keeping it very simple.

The next line, y = x + b, is the computational model we want TensorFlow to calculate. This line does not actually compute anything, even though it looks like it should. It simply creates data structures (called “graph elements”) that represent the addition of x and b, and the assignment of the result to the placeholder y. Each of the items in my picture above is a graph element. These graph elements are processed by the TensorFlow engine when Session.run() is called. Part of the magic of TensorFlow is to efficiently carry out graph element evaluation, even for very large and complicated graphs.

Now that the model is created, we turn to assembling the input and running the model. Our model has one input x, so we create a list x_in that will be associated with the placeholder x. If you think of a TensorFlow model as a function in your favorite programming language, the placeholders are the arguments. Here we want to “pass” x_in as the value for the “parameter” x. This is what happens in the session.run() call. The first argument is a list of graph elements that you would like TensorFlow to evaluate. In this case, we’re interested in evaluating the output placeholder y, so that’s what we pass in. Session.run will return an output value for each graph element that you pass in as the first argument, and the value will correspond to the evaluated value for that element. In English this means that y_final is going to be an array that has the result: x + 1. The second argument to run is a dictionary that specifies the values for input placeholders. This is where we associate the input array x_in with the placeholder x.

When Session.run() is called, TensorFlow will determine which elements of the computational graph need to be evaluated based on what you’ve passed in. It will then carry out the computations and then bind result values accordingly. The final line prints out the resulting array.

This example is one of the simplest ones I could think of that includes all four key phases. It’s missing many of the core features of TensorFlow! In particular, machine learning models usually train certain values to predict or classify something, but we’re not doing that here. In my next post I will walk through another example shows how to train parameters in a simple predictive model.

2015 NFL Statistics by Player and Team

I have downloaded stats for the recently completed 2015 NFL regular season from yahoo.com, cleaned the data, and saved the data in CSV format. The files are located here. The column headers should be self-explanatory.

You will find seven CSV files, which you can open in Excel or Google Sheets:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. I have broken out attempted and made field goals by distance into separate columns for convenience.
  • DEF: defensive stats by team.
  • ST: special teams stats by team.

Enjoy!

2016 NCAA Tournament Picks

Every year since 2010 I have used analytics to make my NCAA picks. Here is a link to the picks made by my model [PDF]: the projected Final Four is Villanova, Duke, North Carolina, and Virginia with Villanova defeating North Carolina in the final. (I think my model likes Virginia too much, by the way.)

Here’s how these selections were made. First, the ground rules I set for myself:

  • The picks should not be embarrassingly bad.
  • I shall spend no more than on this activity (and 30 minutes for this post).
  • I will share my code and raw data.

Okay: the model. The model combines two concepts:

  1. A “win probability” model developed by Joel Sokol in 2010 as described on Net Prophet.
  2. An eigenvalue centrality model based on this post on BioPhysEngr Blog.

The win probability model accounts for margin of victory and serves as preprocessing for step 2. I added a couple of other features to make the model more accurate:

  • Home-court advantage is considered: 2.5 points which was a rough estimate I made a few years ago and presumably is still reasonable.
  • The win probability is scaled by an adjustment factor which has been selected for best results (see below).
  • Recency is considered: more recent victories are weighted more strongly.

The eigenvalue centrality model requires game-by-game results. I pulled four years of game results for all divisions from masseyratings.com (holla!) and saved them as CSV. You can get all the data here. It sounds complicated, but it’s not (otherwise I wouldn’t do it) – the model requires less than 200 lines of Python, also available here. (The code is poor quality.)

How do I know these picks aren’t crap? I don’t. The future is uncertain. But, I did a little bit of backtesting. I trained the model using different “win probability” and “recency” parameters on the 2013-2015 seasons, selecting the combination of parameters that correctly predicted the highest percentage of NCAA tournament games during those seasons, getting approximately 68% of those games right. I don’t know if that’s good, but it seems to be better than applying either the eigenvalue centrality model or the win probability model separately.

In general, picks produced by my models rank in the upper quartile in pools that I enter. I hope that’s the case this year too.

Blogs, Research Papers, and Operations Research

There’s an interesting thread on twitter this morning about making Operations Research accessible:

I think everyone is right! I have three types of readers for my analytics posts: 

  • Active researchers or experts
  • Technically oriented readers who aren’t experts in an analytics-related discipline, e.g. software engineers
  • Everyone else

These groups roughly correspond to “shipbuilders”, “sailors”, and “passengers” using the analogy in this post. A single blog post may not satisfy all these parties, even if well written! Experts may well prefer a research paper. Developers may well prefer a link to github. General interest readers may prefer a one paragraph overview, an interactive visual, or simply “the answer”. All of these things are good, and I have found that all three groups can sometimes benefit from content intended for only one.

You can supplement a blog post with any or all of these additional materials, or break a post into two: one that explains the problem and the answer, and another that describes the solution methodology. (Here is an example from a few years ago: problem and methodology.) Consider writing blog posts for your research papers or projects before the project is complete. This will give you practice explaining the topic to an audience, and provides the opportunity for early feedback.

//platform.twitter.com/widgets.js

Collaborating via Data Fusion and Analytics

This article was originally published in Chain Store Age. Click here to read it.

Baseball’s great accidental philosopher Yogi Berra once said, “If you don’t know where you are going, you might wind up someplace else.” In the retail business, knowing where you’re going means understanding sales, inventory, promotions, pricing, and assortment. How do these considerations change by product? By store? On Black Friday? It’s hard enough for a bodega shopkeeper to keep track of all of this information accurately and efficiently, let alone a regional, national, or global retail business. Retailers and suppliers alike need a complete view of the forces that shape their businesses, so they can harness those they control and manage the ones they do not.

Many firms have adopted an inside-out approach to leveraging data and analytics, often starting with organizing their own information through data warehousing. Big data technologies may be employed for transactional or shopper data. Summarizing and reporting on this data often yields many interesting insights, but don’t stop there! Having a diverse set of relevant data, coming from both inside and outside an organization, often matters more than sheer volume. Start with your partners. Critical business decisions, such as supply chain and promotional considerations, are often made collaboratively. After all, suppliers and retailers need each other. Fusing supplier and retailer information together, for example for budgeting and planning purposes, can be an effective way of discovering and executing high-impact changes through collaborative effort.

This leaves rest of the world: the economy, the weather, social trends, the competition, and the billions of people that are currently not your customers. An integrated, shared view of the broader retail environment will help you and your business partners to make sound strategic decisions. A complete off the shelf solution for this kind of 360 degree view does not really exist, for every business is different. Don’t despair, however: you don’t have to start from scratch. Many useful data sources, such as census and macroeconomic information, are available for free and easy for analysts to use. Fused social media data, including engagement and sentiment information, are also available. Finally, a number of data analytics companies offer custom solutions for retail. The recent partnership between Target, Ideo, and MIT Media Lab is a fascinating recent example of this kind of collaborative analysis.

Having worked from the inside out, the final step is to turn data into competitive advantage through sound decision making. Focus on key business problems, which may involve coordinated action with partners, and leverage your data by combining analytics with human wisdom. Many retail businesses are not seeking automated systems in all cases; rather computer assisted processes. A store manager can use suggested orders from an analytics system as a guide and then adjust based on local conditions that the big brain in the sky has no knowledge of: the traffic jam, the high school football game, the store appearance, prom season.

When you and your partners have a shared view of the competitive environment, you can focus on the issues that matter. Analytics on a diverse set data produces the trends, insights, and forecasts that enable better collaboration, better strategic decisions, and better operations. It doesn’t have to be complicated: start with the business decisions that matter and work from the inside out. A broad set of data unified by analytics, is a winning combination.