Creating Equation Haikus using Constraint Programming

Bob Bosch pointed out (in a response to Shea Serrano, wonderfully…) that 77 + 123 = 200 is a haiku in English:

How many such haikus are there? Well, we can use constraint satisfaction (CSP) to give us an idea (Python code here for the impatient):

A traditional haiku consists of three lines: the first line has five syllables, the second seven, and the last five, for a total of seventeen syllables.

We can write equations that describe our problem. First, let’s define the function s[n] as the number of syllables in the English words for n. For example, s[874] = 7; try saying “eight hundred seventy four” out loud. It’s not that hard in English to compute s[n]; at least, I don’t think I messed it up.

Let’s call A the number in the first line of the haiku, B the second, and C the third. If we want a haiku with the right number of syllables and for the equation it describes to hold, all we need is:

s[A] = 5
1 + s[B] = 7 (since “plus” is one syllable)
2 + s[C] = 5 (since “equals” is two syllables)
A + B = C

If you let A=77, B=123, C=200 as in Bob’s example, you will see the above equations hold.

Using the constraint package in Python, it’s easy to create this model and solve it. Here is the code. A great thing about CSP solvers is that they will not just give you one of the solutions to the model, but all the solutions, at least for cases where A, B, C <= 9999. It turns out there are 279 such haikus, including

eight thousand nineteen
plus nine hundred eighty one
equals nine thousand

and the equally evocative

one hundred fourteen
plus one hundred eighty six
equals three hundred

Of course, this is not the only possible form for an equation haiku. For example, why not:

+ B + C
= D

I invite you to modify the code and find other types of equation haikus!

Screen Shot 2019-09-22 at 2.35.14 PM


Forecasting iPad Sales using Facebook’s Prophet

The past couple of days I’ve been playing around with Facebook’s Prophet, a time series forecasting package.

I used Prophet to forecast quarterly sales of the Apple iPad, all in about 30 lines of Python. The repository for my code is here, and here’s a Jupyter notebook that walks through how it works.

It’s a lot of fun, and you get nice little visualizations like this one:


Check it out!


Checkpointing and Reusing TensorFlow Models

In my last two posts I introduced TensorFlow and wrote a very simple predictive model. In doing so I introduced many of the key concepts of TensorFlow:

  • The Session, the core of the TensorFlow object model,
  • Computational graphs and some of their elements: placeholders, variables, and Tensors,
  • Training models by iteratively calling on Optimization objects.

In this post I want to show you can save and re-use the results of your TensorFlow models. As we discussed last time, training a model means finding variable values that suit a particular purpose, for example finding a slope and intercept that defines a line that best fits a series of points. Training a model can be computationally expensive because we have to search for the best variable values through optimization. Suppose we want to use the results of this trained model over and over again, but without re-training the model each time. You can do this in TensorFlow using the Saver object.

A Saver object can save and restore the values of TensorFlow Variables. A typical scenario has three steps:

  1. Creating a Saver and telling the Saver which variables you want to save,
  2. Save the variables to a file,
  3. Restore the variables from a file when they are needed.

A Saver deals only with Variables. It does not work with placeholders, sessions, expressions, or any other kind of TensorFlow object. Here is a simple example that saves and restores two variables:

def save(checkpoint_file=’hello.chk’):
    with tf.Session() as session:
        x = tf.Variable([42.0, 42.1, 42.3], name=’x’)
        y = tf.Variable([[1.0, 2.0], [3.0, 4.0]], name=’y’)
        not_saved = tf.Variable([-1, -2], name=’not_saved’)

        saver = tf.train.Saver([x, y]), checkpoint_file)

def restore(checkpoint_file=’hello.chk’):
    x = tf.Variable(-1.0, validate_shape=False, name=’x’)
    y = tf.Variable(-1.0, validate_shape=False, name=’y’)
    with tf.Session() as session:
        saver = tf.train.Saver()
        saver.restore(session, checkpoint_file)

def reset():

Try calling save(), reset() and then restore(), and compare the outputs to verify everything worked out. When you create a Saver, you should specify a list (or dictionary) of Variable objects you wish to save. (If you don’t, TensorFlow will assume you are interested in all the variables in your current session.) The shapes and values of these values will be stored in binary format when you call the save() method, and retrieved on restore(). Notice in my last function, when I create x and y, I give dummy values and say validate_shape=False. This is because I want the saver to determine the values and shapes when the variables are restored. If you’re wondering why the reset() function is there, remember that computational graphs are associated with Sessions. I want to “clear out” the state of the Session so I don’t have multiple x and y objects floating around as we call save and restore().

When you use Saver in real models, you should keep a couple of facts in mind:

  1. If you want to do anything useful with the Variables you restore, you may need to recreate the rest of the computational graph.
  2. The computational graph that you use with restored Variables need not be the same as the one that you used when saving. That can be useful!
  3. Saver has additional methods that can be helpful if your computation spans machines, or if you want to avoid overwriting old checkpoints on successive calls to save().

At the end of this post I have include a modification of my line fitting example to optionally save and restore model results. I’ve highlighted the interesting parts. You can call it like this:

fit_line(5, checkpoint_file=’vars.chk’)
fit_line(5, checkpoint_file=’vars.chk’, restore=True)

With this version, I could easily “score” new data points x using my trained model.

def fit_line(n=1, log_progress=False, iter_scale=200,
             restore=False, checkpoint_file=None):
    with tf.Session() as session:
        x = tf.placeholder(tf.float32, [n], name=’x’)
        y = tf.placeholder(tf.float32, [n], name=’y’)
        m = tf.Variable([1.0], name=’m’)
        b = tf.Variable([1.0], name=’b’)
        y = tf.add(tf.mul(m, x), b) # fit y_i = m * x_i + b
        y_act = tf.placeholder(tf.float32, [n], name=’y_’)

        # minimize sum of squared error between trained and actual.
        error = tf.sqrt((y – y_act) * (y – y_act))
        train_step = tf.train.AdamOptimizer(0.05).minimize(error)

        x_in, y_star = make_data(n)

        saver = tf.train.Saver()
        feed_dict = {x: x_in, y_act: y_star}
        if restore:
            print(“Loading variables from ‘%s’.” % checkpoint_file)
            saver.restore(session, checkpoint_file)
            y_i, m_i, b_i =[y, m, b], feed_dict)
            init = tf.initialize_all_variables()
            for i in range(iter_scale * n):
                y_i, m_i, b_i, _ =[y, m, b, train_step],
                err = np.linalg.norm(y_i – y_star, 2)
                if log_progress:
                    print(“%3d | %.4f %.4f %.4e” % (i, m_i, b_i, err))

            print(“Done training! m = %f, b = %f, err = %e, iter = %d”
                  % (m_i, b_i, err, i))
            if checkpoint_file is not None:
                print(“Saving variables to ‘%s’.” % checkpoint_file)
      , checkpoint_file)

        print(”      x: %s” % x_in)
        print(“Trained: %s” % y_i)
        print(” Actual: %s” % y_star)

A Simple Predictive Model in TensorFlow

In my previous post I provided a simple introduction to TensorFlow. In this post I’d like to take the next step and build a predictive model so I can highlight some key TensorFlow concepts.

This model will fit a line y = m * x + b to a series of points (x_i, y_i). This code is not the best way fit a line – it’s just an example. In our code, we’ll generate points with small random deviations from a line with known slope and intercept. Our test will be to see if we can recover these known values using TensorFlow. Here is a picture of our training data:


My last post explained that there are often four phases to TensorFlow programs: creating a model, getting the input data, running the model, and processing the output. In our model we want to find a slope m and intercept b that best fits our input data. What do we mean by “best fit”? We mean values m, b that give the smallest sum of squared error between the predicted and actual y_i. The way we do this in TensorFlow is create this expression, and then repeatedly run a Session that adjusts the values of m and b to make the error smaller using an optimizer.

There are two functions below: one to generate test data, and another to create and run the TensorFlow model:

def make_data(n):
    np.random.seed(42) # To ensure same data for multiple runs
    x = 2.0 * np.array(range(n))
    y = 1.0 + 3.0 * (np.array(range(n)) + 0.1 * (np.random.rand(n) – 0.5))
    return x, y

def fit_line(n=1, log_progress=False):
    with tf.Session() as session:
        x = tf.placeholder(tf.float32, [n], name=’x’)
        y = tf.placeholder(tf.float32, [n], name=’y’)
        m = tf.Variable([1.0], trainable=True) # training variable: slope
        b = tf.Variable([1.0], trainable=True) # training variable: intercept
        y = tf.add(tf.mul(m, x), b) # fit y_i = m * x_i + b

        # actual values (for training)
        y_act = tf.placeholder(tf.float32, [n], name=’y_’)

        # minimize sum of squared error between trained and actual.
        error = tf.sqrt((y – y_act) * (y – y_act))
        # train_step = tf.train.GradientDescentOptimizer(0.01).minimize(error)
        train_step = tf.train.AdamOptimizer(0.05).minimize(error)

        # generate input and output data with a little random noise.
        x_in, y_star = make_data(n)

        init = tf.initialize_all_variables()
        feed_dict = {x: x_in, y_act: y_star}
        for i in range(30 * n):
            y_i, m_i, b_i, _ =[y, m, b, train_step], feed_dict)
            err = np.linalg.norm(y_i – y_star, 2)
            if log_progress:
                print(“%3d | %.4f %.4f %.4e” % (i, m_i, b_i, err))

        print(“Done! m = %f, b = %f, err = %e, iterations = %d”
              % (m_i, b_i, err, i))
        print(”      x: %s” % x_in)
        print(“Trained: %s” % y_i)
        print(” Actual: %s” % y_star)

Hopefully make_data is fairly clear. The function fit_line takes two input arguments:

  • n: the number of points to generate
  • log_progress: whether to display TensorFlow’s progress in finding the right slope m and intercept b.

After we create a TensorFlow session, our next two steps are to create placeholders for our input x and output y, similar to our first example. These are both Tensors of size n since that’s how many data points we have. The next line creates a TensorFlow variable to represent the slope m. A variable is a value that is retained between calls to If the value is an input or an output from the model, we don’t want a variable – we want a placeholder. If the value remains constant during our computation, we don’t want a variable – we want a tf.constant. We want variables when we want TensorFlow to train the value based on some criteria in our model. Notice when we create the Variable objects we supply initial values for the variable, and a “trainable” flag. Providing TensorFlow with initial values for a variable informs TensorFlow of the dimensionality and type – in our case m and b are single dimensional Tensors of size 1, but they could just as easily be multidimensional and/or integer.

The next expression assigns y the value m * x. We want to do this on an elementwise basis: we have a series of points (x_i, y_i) that we want to train against scalar values m and b. The TensorFlow functions add and mul operate on their arguments on an elementwise basis with broadcasting: using + and * would not have the intended effect.

Now that we have a model for our predicted values y, we want to compute the sum of squared error. This is accomplished using Tensor arithmetic and tf.sqrt. Here is a picture of our computational graph to this point:


Here comes the next new concept: optimization. We have specified our model, and the error in the model, but now we want TensorFlow to find the best possible values for m and b given the error expression. Optimization is carried out in TensorFlow by repeatedly calling with an Optimization object “fed” as input. An Optimization carries out logic that adjusts the variables in a way that will hopefully improve the value of the error expression. In our case we will use an AdamOptimizer object. The parameter to AdamOptimizer controls how much the optimizer adjusts the variables on each call – larger is more aggressive. All Optimizer objects have a minimize() method that lets you pass in the expression you want to optimize. You can see that the train_step, the value returned by the AdamOptimizer, is passed into the call.

Let’s explain briefly how the optimization works. A single call to the Optimizer does not adjust variables all the way to their optimal values; a call represents a single step towards an optimum. If you want to learn more about the specific logic that AdamOptimizer uses during a step, look at the TensorFlow documentation, or if you are ambitious, read the paper. The key ingredient is the gradient of the variables that you are trying to optimize. TensorFlow computes gradients by creating computational graph elements for the gradient expressions and evaluating them – have a look at this stackoverflow response for details. Again, TensorFlow can do this because it has a symbolic representation of the expressions you’re trying to compute (it’s in the picture above). Since a call to an optimizer is a single step, must be called repeatedly in a loop to get suitable values. In the picture below I have plotted the values of the error (MSE) and m (Slope) expressions for the first 50 steps.


If you have past experience with optimization you may wonder why I am running the optimizer for a fixed number of steps rather than having a more sensible stopping criterion. The answer is I am keeping it simple – feel free to extend the example if you like. You may also observe that this code is not very efficient or accurate in fitting points in a line. That’s not TensorFlow’s fault – it’s my fault for writing such a contrived example. In many real world examples the actual computational graph represents a complicated neural network.

Much of the remaining code to create the input and output arrays and call should be familiar to you if you worked through my first post. When we complete our loop of calls we print out our final slope and intercept, as well as the trained and actual y values.

With luck, I will be able to continue this series to use TensorFlow to build and run a neural network to solve a problem that is closer to a real-world scenario.

An Introduction To TensorFlow

This post walks through a simple Google TensorFlow example.

Getting Started

TensorFlow is an open source library for analytics. It’s particularly useful for building deep learning systems for predictive models involving natural language processing, audio, and images.

The TensorFlow site provides instructions for downloading and installing the package. Loosely speaking, here’s what you need to do to get started on a Windows machine:

  • Get comfortable with Python.
  • Install docker.
  • Run the “development” image for TensorFlow. The development images contains all of the samples on the TensorFlow site. The command I used was
    docker run -i -t /bin/bash

Running the development image “latest-devel” will provide you with code for all of the examples on the TensorFlow site. You don’t strictly speaking have to use docker to get started with TensorFlow, but that’s what worked for me.

A Simple TensorFlow Program

I think the TensorFlow tutorials are too complicated for a beginner, so I’m going to present a simple TensorFlow example that takes input x, adds one to it, and stores it in an output array y. Many TensorFlow programs, including this one, have four distinct phases:

  1. Create TensorFlow objects that model the calculation you want to carry out,
  2. Get the input data for the model,
  3. Run the model using the input data,
  4. Do something with the output.

I have marked these phases in the code below.

import numpy as np
import tensorflow as tf
import math

def add_one():
with tf.Session() as session:
# (1)
x = tf.placeholder(tf.float32, [1], name=’x’) # fed as input below
y = tf.placeholder(tf.float32, [1], name=’y’) # fetched as output below
b = tf.constant(1.0)
y = x + b # here is our ‘model’: add one to the input.

        x_in = [2] # (2)
y_final =[y], {x: x_in}) # (3)
print(y_final) # (4)

The first line in add_one creates a TensorFlow Session object. Sessions contain “computational graphs” that represent calculations to be carried out. In our example, we want to create a computational graph that represents adding the constant 1.0 to an input array x. Here is a picture:


The next two lines create “placeholders” x and y. A placeholder is an interface between a computational graph element and your data. Placeholders can represent input or output, and in my case  x represents the value to send in, and y represents the result. The second argument of the placeholder function is the shape of the placeholder, which is a single dimensional Tensor with one entry. You can also provide a name, which is useful for debugging purposes.

The next line creates the constant b using tf.constant. As we will see in future examples, there are other TensorFlow functions for addition, multiplication, and so on. Using these helper functions you can assemble a very wide range of functions that involve inputs, outputs, and other intermediate values. In this example, we’re keeping it very simple.

The next line, y = x + b, is the computational model we want TensorFlow to calculate. This line does not actually compute anything, even though it looks like it should. It simply creates data structures (called “graph elements”) that represent the addition of x and b, and the assignment of the result to the placeholder y. Each of the items in my picture above is a graph element. These graph elements are processed by the TensorFlow engine when is called. Part of the magic of TensorFlow is to efficiently carry out graph element evaluation, even for very large and complicated graphs.

Now that the model is created, we turn to assembling the input and running the model. Our model has one input x, so we create a list x_in that will be associated with the placeholder x. If you think of a TensorFlow model as a function in your favorite programming language, the placeholders are the arguments. Here we want to “pass” x_in as the value for the “parameter” x. This is what happens in the call. The first argument is a list of graph elements that you would like TensorFlow to evaluate. In this case, we’re interested in evaluating the output placeholder y, so that’s what we pass in. will return an output value for each graph element that you pass in as the first argument, and the value will correspond to the evaluated value for that element. In English this means that y_final is going to be an array that has the result: x + 1. The second argument to run is a dictionary that specifies the values for input placeholders. This is where we associate the input array x_in with the placeholder x.

When is called, TensorFlow will determine which elements of the computational graph need to be evaluated based on what you’ve passed in. It will then carry out the computations and then bind result values accordingly. The final line prints out the resulting array.

This example is one of the simplest ones I could think of that includes all four key phases. It’s missing many of the core features of TensorFlow! In particular, machine learning models usually train certain values to predict or classify something, but we’re not doing that here. In my next post I will walk through another example shows how to train parameters in a simple predictive model.