A Simple Predictive Model in TensorFlow

In my previous post I provided a simple introduction to TensorFlow. In this post I’d like to take the next step and build a predictive model so I can highlight some key TensorFlow concepts.

This model will fit a line y = m * x + b to a series of points (x_i, y_i). This code is not the best way to fit a line; it's just an example. In our code, we'll generate points with small random deviations from a line with known slope and intercept. Our test will be to see if we can recover these known values using TensorFlow. Here is a picture of our training data:

[Figure: scatter plot of the generated training data]

My last post explained that there are often four phases to TensorFlow programs: creating a model, getting the input data, running the model, and processing the output. In our model we want to find a slope m and intercept b that best fit our input data. What do we mean by “best fit”? We mean the values of m and b that give the smallest sum of squared error between the predicted and actual y_i. The way we do this in TensorFlow is to create an expression for this error, and then repeatedly run a Session in which an optimizer adjusts the values of m and b to make the error smaller.
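To make the objective concrete, here is the quantity we are asking TensorFlow to minimize, written as a plain numpy sketch (for illustration only; in the model below this computation lives inside the TensorFlow graph):

import numpy as np

def sum_squared_error(m, b, x, y_actual):
    # predicted y values for the current guesses of m and b
    y_pred = m * x + b
    # the quantity the optimizer tries to make as small as possible
    return np.sum((y_pred - y_actual) ** 2)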

There are two functions below: one to generate test data, and another to create and run the TensorFlow model:

import numpy as np
import tensorflow as tf

def make_data(n):
    np.random.seed(42)  # to ensure the same data for multiple runs
    x = 2.0 * np.array(range(n))
    # since x = 2*i, these points scatter around the line y = 1.0 + 1.5 * x
    y = 1.0 + 3.0 * (np.array(range(n)) + 0.1 * (np.random.rand(n) - 0.5))
    return x, y

def fit_line(n=1, log_progress=False):
    with tf.Session() as session:
        x = tf.placeholder(tf.float32, [n], name='x')  # input data
        m = tf.Variable([1.0], trainable=True)  # training variable: slope
        b = tf.Variable([1.0], trainable=True)  # training variable: intercept
        y = tf.add(tf.mul(m, x), b)  # predicted y_i = m * x_i + b
        # (tf.mul was renamed tf.multiply in TensorFlow 1.0)

        # actual values (for training)
        y_act = tf.placeholder(tf.float32, [n], name='y_')

        # minimize the (root of the) sum of squared error between predicted and actual.
        error = tf.sqrt(tf.reduce_sum(tf.square(y - y_act)))
        # train_step = tf.train.GradientDescentOptimizer(0.01).minimize(error)
        train_step = tf.train.AdamOptimizer(0.05).minimize(error)

        # generate input and output data with a little random noise.
        x_in, y_star = make_data(n)

        init = tf.initialize_all_variables()
        session.run(init)
        feed_dict = {x: x_in, y_act: y_star}
        for i in range(30 * n):
            y_i, m_i, b_i, _ = session.run([y, m, b, train_step], feed_dict)
            err = np.linalg.norm(y_i - y_star, 2)
            if log_progress:
                print("%3d | %.4f %.4f %.4e" % (i, m_i[0], b_i[0], err))

        print("Done! m = %f, b = %f, err = %e, iterations = %d"
              % (m_i[0], b_i[0], err, i))
        print("      x: %s" % x_in)
        print("Trained: %s" % y_i)
        print(" Actual: %s" % y_star)

Hopefully make_data is fairly clear. The function fit_line takes two input arguments:

  • n: the number of points to generate
  • log_progress: whether to display TensorFlow’s progress in finding the right slope m and intercept b.
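For example, once the functions above are defined, a call like this fits ten points and logs every optimization step:

fit_line(n=10, log_progress=True)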

After we create a TensorFlow session, our next step is to create a placeholder for our input x, similar to our first example (the actual output values get their own placeholder, y_act, a few lines later). These are Tensors of size n, since that's how many data points we have. The next two lines create TensorFlow variables to represent the slope m and intercept b. A variable is a value that is retained between calls to Session.run(). If the value is an input or an output of the model, we don't want a variable; we want a placeholder. If the value remains constant during our computation, we don't want a variable either; we want a tf.constant. We want variables when we want TensorFlow to train the value based on some criterion in our model. Notice that when we create the Variable objects we supply an initial value and a “trainable” flag. Providing TensorFlow with an initial value informs TensorFlow of the dimensionality and type of the variable; in our case m and b are one-dimensional Tensors of size 1, but they could just as easily be multidimensional and/or integer.
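As a quick side-by-side illustration of these three kinds of values (a standalone sketch, not part of the model above):

x_in = tf.placeholder(tf.float32, [3], name='x_in')  # supplied at run time through feed_dict
w = tf.Variable([1.0], trainable=True)               # retained and adjusted across Session.run() calls
k = tf.constant(2.0)                                 # fixed for the life of the graph
scaled = k * w * x_in                                # an expression combining all three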

The next expression assigns y the value m * x + b. We want to do this on an elementwise basis: we have a series of points (x_i, y_i) that we want to train against the scalar-sized values m and b. The TensorFlow functions add and mul operate on their arguments elementwise with broadcasting, so the size-1 tensors m and b are applied across all n elements of x. (The overloaded + and * operators behave the same way on Tensors; the named functions simply make the elementwise intent explicit.)
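Broadcasting is what lets the size-1 tensors m and b combine with the size-n tensor x. The same rule is easy to see in plain numpy (an illustrative sketch):

m = np.array([2.0])            # shape (1,), like our slope variable
x = np.array([0.0, 1.0, 2.0])  # shape (3,), like our input placeholder
b = np.array([1.0])            # shape (1,), like our intercept
print(m * x + b)               # the size-1 arrays stretch across x: [ 1.  3.  5.]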

Now that we have a model for our predicted values y, we want to compute the sum of squared error. This is accomplished using tf.square, tf.reduce_sum, and tf.sqrt; taking the square root does not move the minimum, so minimizing this root-sum-of-squares yields the same m and b as minimizing the sum of squared error itself. Here is a picture of our computational graph to this point:

[Figure: the computational graph for the model and error expression]

Here comes the next new concept: optimization. We have specified our model, and the error in the model, but now we want TensorFlow to find the best possible values for m and b given the error expression. Optimization is carried out in TensorFlow by constructing an Optimizer and repeatedly running the operation it produces with Session.run(). An Optimizer carries out logic that adjusts the variables in a way that will hopefully improve the value of the error expression. In our case we will use an AdamOptimizer object. The learning-rate parameter to AdamOptimizer controls how much the optimizer adjusts the variables on each call; larger is more aggressive. All Optimizer objects have a minimize() method that takes the expression you want to minimize. You can see that train_step, the operation returned by minimize(), is passed into the Session.run() call.
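The pattern is the same for any of the built-in optimizers; for example, swapping in plain gradient descent (the commented-out line in the listing above) looks like this:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)  # 0.01 is the learning rate
train_step = optimizer.minimize(error)  # an operation; run it once per optimization step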

Let’s explain briefly how the optimization works. A single call to the optimizer does not adjust variables all the way to their optimal values; a call represents a single step towards an optimum. If you want to learn more about the specific logic that AdamOptimizer uses during a step, look at the TensorFlow documentation, or if you are ambitious, read the paper. The key ingredient is the gradient of the error with respect to the variables that you are trying to optimize. TensorFlow computes gradients by creating computational graph elements for the gradient expressions and evaluating them; have a look at this stackoverflow response for details. Again, TensorFlow can do this because it has a symbolic representation of the expressions you're trying to compute (it's in the picture above). Since a call to an optimizer is a single step, Session.run() must be called repeatedly in a loop to get suitable values. In the picture below I have plotted the values of the error (MSE) and m (Slope) expressions for the first 50 steps.

[Figure: convergence of the error (MSE) and slope (m) over the first 50 optimization steps]
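To build intuition for what one of these steps does, here is a hand-written version of a single step for our line-fitting error (a simplified sketch: plain gradient descent rather than Adam, with the gradients of the sum of squared errors written out by hand instead of derived from the graph):

def gradient_step(m, b, x, y_actual, learning_rate=0.01):
    residual = (m * x + b) - y_actual
    # partial derivatives of sum((m*x + b - y)^2) with respect to m and b
    grad_m = np.sum(2.0 * residual * x)
    grad_b = np.sum(2.0 * residual)
    # move each value a small distance against its gradient
    return m - learning_rate * grad_m, b - learning_rate * grad_b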

If you have past experience with optimization you may wonder why I am running the optimizer for a fixed number of steps rather than using a more sensible stopping criterion. The answer is I am keeping it simple; feel free to extend the example if you like. You may also observe that this code is not a very efficient or accurate way to fit a line to points. That's not TensorFlow's fault; it's my fault for writing such a contrived example. In many real-world applications the computational graph represents a complicated neural network.

Much of the remaining code to create the input and output arrays and call Session.run() should be familiar to you if you worked through my first post. When we complete our loop of Session.run() calls, we print out our final slope and intercept, as well as the trained and actual y values.

With luck, I will be able to continue this series to use TensorFlow to build and run a neural network to solve a problem that is closer to a real-world scenario.

Author: natebrix

Follow me on twitter at @natebrix.
