Predicting NBA Rookie Performance By Draft Position

Nate Silver (and others) have tracked how NBA draft position relates to total career performance, see for example this article. But what about first-year performance?

I pulled two sets of data from basketball-reference.com to answer this question:

I then merged them using Power Query and then created a pivot table to calculate the average number of rookie season “win shares” by draft position. You can download my Excel workbook here. Here is what I found:

AverageWinSharesByDraftPosition

The first pick in the draft averages nearly five Win Shares in his rookie season, and while the pattern is irregular, win shares decrease as we get deeper into the draft (duh). (The blip at the end is due to Isaiah Thomas, drafted by the Kings who promptly screwed up by letting him go.) I have drawn a logarithmic trendline which fits the data not-to-shabbily: R^2 of 0.7397. Obviously we could do much better if we considered additional factors related to the player (such as their college performance) and team (the strength of teammates playing the same position, who will compete with the rookie for playing time). Here are the averages for the first 30 draft positions:

Draft POSITION Win Shares
1 4.96
2 2.69
3 2.96
4 4.14
5 2.23
6 1.84
7 3.36
8 1.68
9 2.59
10 1.52
11 0.84
12 1.51
13 1.48
14 1.36
15 1.64
16 1.19
17 2.37
18 1.02
19 0.71
20 1.09
21 1.74
22 2.14
23 1.54
24 2.29
25 0.98
26 1.23
27 1.08
28 0.40
29 0.54
30 0.94
31 0.79

NBA Game Results: 2013-2014

The NBA preseason is in full swing! For those of you who like to fool around with data, I have prepared a CSV file with game-by-game results for the 2013-2014 season. The data was downloaded from basketball-reference.com using Power Query and cleaned up (see below).

The format is simple:

  • Date = When the game was played
  • Visitor = three letter abbreviation of the visiting team
  • VisitorPts = visiting team score
  • VisitorSeasonWins = number of wins by the visiting team for the entire season
  • Home = TLA of home team
  • HomePts = home team score
  • HomeSeasonWins = number of wins by the home team for the entire season
  • WinMargin = HomeSeasonWins – VisitorSeasonWins
  • Margin = HomePts – VistorPts

I include the number of wins for each team in the files because I wanted to see how often good teams beat bad teams. The diagram below plots the difference in total wins for teams against the margin of victory. I have used the trendline feature in Excel to verify that while (by definition) good teams beat bad ones frequently, the variability is quite high. Notice the R^2 value.

NBAWinsVersusMargin

The intercept for the trendline is 2.5967, which represents the home court advantage in points. In a future post I hope to use this data to make some predictions about the upcoming NBA season.

Enjoy!

401k Simulation Using Analytic Solver Platform

You can build a pretty decent 401k simulation in a few minutes in Excel using Analytic Solver Platform:

image

Let’s give it a shot! You can download the completed workbook here.

First, let’s build a worksheet that calculates 401k balances for 10 years. At the top of the worksheet let’s enter a yearly contribution rate:

image

Let’s compute 401k balances for the next 10 years, based on this contribution. A simple calculation for the balance for a given year involves five factors:

  1. The 401k balance for the previous year.
  2. The rate of return for the 401k.
  3. The previous year’s salary.
  4. The rate of increase in the salary (your raise).
  5. The rate of contribution (entered above).

In row 6 we will enter in the starting values for return, salary increase, balance, and salary in columns B, C, D, E respectively. For now let’s assume:

  • Return = 0.05
  • Salary Increase = 0.05
  • Balance = 5,000
  • Salary = 100,000

With a couple of small assumptions, the new balance is old balance * return + contribution * (salary * (1 + salary increase)). In the next row we will compute Year 1, using this formula:

  • Salary = D6 * (1 + C6). This simply means that this year’s salary is last year’s adjusted by raise. (Obviously salary could be modeled differently depending on when the raise kicks in.)
  • Balance = E6*(1 + B6)+D6*$B$3. There are two terms. The first is the old balance times the portfolio return. The second is the current salary times the contribution rate.

We can fill these values down, giving us the 401k balance for the entire period:

image

Here’s the thing: we don’t actually know what our portfolio return and salary increases will be in future years. They’re uncertain. We can use Analytic Solver Platform to turn the wild guesses in columns B and C into probability distributions. Using simulation we can then determine the most likely range for future 401k balances.

For portfolio return, a reasonable thing to do is to go back and look at past performance. Rates of return for the S&P 500 (and other financial instruments) are given on this page. Using the “From Web” feature of Power Query (or by simply copy-pasting) you can bring this data into another Excel worksheet with no sweat:

image

Now let’s turn this historical data into a probability distribution we can use in our model. Select the S&P 500 historical return data and select Distibutions –> Distribution Wizard in the Analytic Solver Platform tab:

image

Fill in the first page of the wizard:

image

Select “continuous values” in the next step, “Fit the data” in the next, and then pick an empty cell for “Location” in the final step. In the cell that you selected, you will see a formula something like this:

=PsiWeibull(3.55593208704872,0.692234009779183, PsiShift(-0.509633992648591))

This is a Weibull distribution that fits the historical data. If you hit “F9” to recalculate the spreadsheet you will see that the value for this cell changes as a result of sampling from this distribution. Each sample is a different plausible yearly return. Let’s copy this formula in place of the 0.05 values we entered in column B of our original spreadsheet. If we click on the “Model” button in the Analytic Solver Platform ribbon, we will see that these cells have been labeled as “Uncertain Variables” in the Simulation section.

For Salary Increase we will do something simpler. Let’s just assume that the increase will be between 2% and 7% each year. Enter =PsiUniform(0.02, 0.07) in cell C6, and fill down.

The last thing we need to do is to define an “output” for the simulation, called an Uncertain Function. When we define Uncertain Functions, we get nice charts and stats for these cells when we run a simulation. Click on the Balance entry for Year 10, then click on arrow next to the “+” in the Model Pane, and then Add Uncertain Function. Your Model Pane will look something like this:

image

And your spreadsheet will look something like this:

image

Now all we need to do is click Simulate in the ribbon. Analytic Solver Platform draws samples for the uncertain variables (and evaluates everything in parallel for fast performance) and then shows you a chart showing the different possible 401k balances. As you can see, the possible balances vary widely but are concentrated around $100,000:

 

image

Here’s the great thing: you can now build out this spreadsheet to your heart’s content to build simulations that incorporate more factors. If you want to get really fancy, you can correlate yearly returns. Check out the extensive help on solver.com for more.

Time Series Forecasting using Analytic Solver Platform, Windows Azure Marketplace, and Power Query

In this post, I’ll show you how to use Analytic Solver Platform in Excel 2013 with Power BI to build a time series analysis model using data from the cloud. Together, Analytic Solver Platform and Power BI provide a powerful, easy-to-use predictive analytics platform at a fraction of the cost of alternatives. A new release of Analytic Solver Platform is coming later this week! Visit solver.com to download a free trial or purchase.

My goal is to forecast Gross Domestic Product (GDP) in the United States. Historical GDP is available for free on Windows Azure Marketplace, so our first step is to subscribe to the data feed.

  1. Go to http://datamarket.azure.com and sign in using your Windows ID.
  2. Go to the Real GDP Per Capita By State data set and subscribe to it. The subscription is free.

If you look around you will find a number of interesting data sets. 

Now we’d like to bring this data into Excel so we can work with it. Power Query, a free Excel add-on provided by Microsoft, makes this extremely easy. If you use Excel and do not know about Power Query, it is definitely worth your time!

  1. Download and install Microsoft Power Query for Excel here.
  2. Start Excel and click on the Power Query tab.
  3. Select “From Other Sources” and then “From Windows Azure Marketplace”:From Windows Azure Marketplace
  4. The Navigator task pane will open on the right hand side. Your Windows Azure Marketplace subscriptions will be listed, including Real GDP Per Capita By State:Selecting Data Source
  5. Expand the Real GDP Per Capity By State entry and click on its only sub-item, called RealGDPByStatePerCapital [sic].
  6. The Query Editor window is displayed. You should see GDP information by state as well as rolled up to the US:
    GDP by State in Power Query 
  7. Let’s focus on national GDP. Click on the Area column header, uncheck “Select All” to clear all entries and scroll down to United States (screenshot below). Click OK.
  8. Click Apply and Close. A new worksheet with national GDP data will be created by Power Query.

These steps only scratch the surface of Power Query. For one thing, Power Query supports a number of interesting data sources, as you may have noticed in the menu in step 3. For another, Power Query lets you easily join and clean data, loading it into worksheets, or PowerPivot data models.

Now that we’ve got the data into Excel, we can use XLMiner to run a time series forecast.

  1. Click on the XLMiner ribbon tab.
  2. Under Time Series, select ARIMA -> ARIMA Model.
  3. Enter the following options (and check out the screenshot below):
    1. Choose RealGDPPerCapital as the “Selected Variable”. This is what we want to forecast.
    2. Select Year as the Time Variable.
    3. Set Autoregressive (p) = 4, Difference (d) = 0, Moving average (q) = 1.
      ARIMA Options 
  4. Click Advanced and:
    1. Check “Fitted values and residuals”. This will show the error in the time series analysis in the report.
    2. Check “Produce forecasts” and change “Number of forecasts” to 5.
    3. Click OK.
  5. Click OK. XLMiner produces an output worksheet, a residuals worksheet, and a stored model worksheet [which can be used for scoring].
  6. We want to look at how well the ARIMA model performed. Click on the ARIMA_Residuals worksheet and compare the Actual Value and Fitted Value columns. Not too shabby!

If I explore the tables and charts in the output worksheets, I can explore other aspects of the model results, including the forecasted GDP values.

Once you get going, you’ll find that it’s really easy to build cool predictive models using Power Query and Analytic Solver Platform. In the coming days I will share some other interesting scenarios.