Forecasting iPad Sales using Facebook’s Prophet

The past couple of days I’ve been playing around with Facebook’s Prophet, a time series forecasting package.

I used Prophet to forecast quarterly sales of the Apple iPad, all in about 30 lines of Python. The repository for my code is here, and here’s a Jupyter notebook that walks through how it works.

It’s a lot of fun, and you get nice little visualizations like this one:

iPad

Check it out!

 

Advertisements

Installing TopoJSON and GDAL on Windows

Here are instructions for installing topojson and GDAL binaries for Windows 8. These tools are helpful if you want to mess around with map visualizations in d3.js. Parts of these instructions are taken from this helpful stackoverflow thread.

  1. Install GDAL and MapServer binaries from this location at gisinternals.com. This was the latest stable version as of 8/8/2015.
  2. On my machine, gdal was installed to <location of unzipped archive>\bin\gdal\apps. The batch file in the root of the install should set paths correctly.
  3. Install nodejs from nodejs.org.
  4. Install npm using the directions on npmjs.org.
  5. From the command prompt run npm install –g topojson
  6. On my machine, this installed topojson to C:\Users\Nathan\AppData\Roaming\npm\node_modules\topojson\bin

Interesting shapefile data can be found at naturalearthdata.com, as explained in this excellent tutorial.

Beautiful Box Plots in Excel 2013

Box plots are widely used among data scientists and statisticians. They’re useful because they show variation both between and within data series. R, Python’s matplotlib, and many other charting libraries support box plots right out of the…box, but Excel does not. In Excel 2013, with a little bit of imagination you can create nice looking box plots without writing any code. (If you are looking for a more comprehensive reference for charts and graphs in Excel, I recommend this book.)

Read this post to find out how to create box plots that look like this:

image

Here is a workbook that has the finished product if you don’t want to follow along.

You’ll need to start with a table containing the data you want to plot. I am using the data from the Michelson-Morley experiment:

image

A box plot shows the median of each data series as a line, with a “box” whose top edge is the third quartile and whose bottom edge is the first quartile. Often we draw “whiskers” at the top and bottom representing the extreme values of each series.

If we create an auxiliary data containing this data and follow my advice from my Error Bars in Excel post, we can create a nice looking box plot.

Step 1: Calculate Quartiles and Extremes.

Create another table with the following rows for each series: min, q1, q2, q3, max. These will be the primary data in your box plot. Min and max are easy – use the =MIN() and =MAX() formulas on each data series (represented as columns A – E in my example). To compute Q1-Q3 use the QUARTILE.INC() function. (INC means “inclusive”. QUARTILE.EXC() would work fine if that’s what you want.) Enter the formulas for the first series and then “fill right”:

image

Step 2: Calculate box and whisker edges

We are going to create a stacked column chart with error bars, and “hide” the bottommost column in the stack to make the chart look like a box plot. Therefore we have to calculate the tops and bottoms of our boxes and whiskers:

  • The bottom of each box is Q1.
  • The ‘middle’ of each box is Q2 (the median). Since this is a stacked column chart, we actually want to compute Q2 – Q1.
  • The top of each box is Q3. Since we want to represent this as a “slice” in the stacked column chart, we want Q3 – Q2.
  • The error going “down” in the chart is Q2 – min, since the whiskers start at the median.
  • The error going “up” is max – Q2.
    Compute these five quantities as rows and you’ll have this:

image

Step 3: Create a stacked column chart.

Go to the INSERT tab and select a stacked column chart:

image

Now right click on the blank chart, choose Select Data Range and select the “box lo, box mid, mix hi range” as your data:

image

Step 4: Make the chart look like a Box Plot.

This is simple: the bottom bar (the blue ones in my example) need to go away. So right click on a blue bar and change both the outline and fill to nothing.

image

Step 5: Add Whiskers.

Follow the steps in my celebrated “Add Error Bars” post. Click on the “+” next to the chart, select Error Bars. Choose Series 2 (which corresponds to the median). Click on “More options” in the Error Bars flyout menu next to the “+”. In the task pane on the right, for Error Amount choose Custom and then click the Specify Value button:

image

For “Positive Error Value” select the “err up” row and for “Negative Error Value” select “err down”. Both rows contain positive values, and that is totally fine. Here’s what mine looks like:

image

That’s it! You can of course customize the other bars as desired.

image

Again, here’s the complete workbook in case you got lost.

Adding Error Bars to Charts in Excel 2013

Excel 2013 makes it easier than ever to create error bars. Hooray! This post shows you how to add them to your charts. The spreadsheet with the chart and backing formulas can be downloaded here [link]. If you are looking for a more detailed reference, I recommend Excel Charts by John Walkenbach. I also recommend this excellent post by Peltier Tech.

Let’s get started. I created a simple bar chart from a table representing promotional response to advertising on three products in twenty major US markets. (Actually, it’s just a 20×3 table with =RAND()*RAND().) Here’s the chart:

image

When you create a chart in Excel 2013, three buttons appear just above the upper-right hand corner. Click on the “plus” button to add new chart elements – check “Error Bars” and error bars will magically appear on your chart.

image

A task pane opens on the right side of the screen. This pane lets you customize the range and formatting of the error bars. Click on one of the error bars, and then on the “bars” icon in the task pane to see range options:

image

Usually I want to supply my own values for the top and bottom based on formulas. Suppose I want to display 95% confidence intervals using the series data.

First I need to compute standard deviations for each series using STDEV.P:

=STDEV.P(A2:A21)

Then compute the 95% confidence value using CONFIDENCE.NORM:

=CONFIDENCE.NORM(0.05,F3,COUNT(A2:A21))

Row 4 has the 95% confidence values for each of the three series. I’d like to base my error bars on these values. Click on Custom. Clicking on the “Specify Value” button brings up a dialog box:

image

The “Positive Error Value” range selector lets me enter in a constant value or more interestingly, cell range that defines the top of the error bars, as an offset from the bar. Similarly, “Negative Error Value” defines the bottom of the error bars. Again, these are offsets, not absolute values. Therefore I want to use F4:H4 for both. Select those ranges, click OK and voila: a nice looking chart with error bars.

image

If I change the values in columns F-H, the error bars change too. The other Error Amount choices in the Error Bar task pane are simpler. For example, Standard Deviation means that the top and bottom will be one standard deviation from the mean across all series:

image

“Fixed value” and “Percentage” are obvious. “Standard Error” computes the top and bottom using the standard error of the corresponding series. The documentation describes the formulas used by Excel.

This feature was not widely advertised in the Excel 2013 release, but it’s really useful. Charts just look better, too. All in all, some nice improvements!