Over the weekend I created a model in Excel to predict the 2014-2015 NBA season. The model simulates the full 82-game schedule, using player Win Shares from the 2013-2014 season to estimate the strength of each team, accounting for roster changes. This model is not perfect, or even particularly sophisticated, but it is interesting. Here are the model’s predictions as of 10/15/2014, with projected playoff teams in bold.
You can download my full spreadsheet here. It’s complicated but not impossible to follow. It does not exactly match the results presented above because I have a messier version that accounts for recent injuries, e.g. Kevin Durant.
The Cavs, Spurs, and Clips are the favorites to win the title in this model (a previous version of this model also had the Thunder in this class, but Kevin Durant is now injured). Comparing these estimates to over-unders in Vegas, the biggest differences are Brooklyn (lower), Indiana (higher), Memphis (lower), Minnesota (higher), New Orleans (lower), Phoenix (higher). If you take the time to read through the methodology at the end of this post, you may be able to see why some of these differences exist. Some are probably reasonable, others may not be.
How It Works
Many of the ingredients for this model were presented in my previous three posts, where I compiled game-by-game results for the 2013-2014 season, built a simple model to predict rookie performance, and tracked roster changes. Now the task is pretty simple: estimate the strength of each team, figure out how unpredictable games are, and then simulate the season using the strengths, accounting for uncertainty. At the end I discuss weaknesses of this model, which if you are a glass-half-full type of person also suggest areas for improvement.
Step 1: Estimate the strength of each team. Team strengths are estimated by adding up the 2013-2014 Win Shares for the top twelve players on each NBA team. In my last post I gave a spreadsheet with Win Shares for all 2013-2014 NBA players based on data from basketball-reference.com. I made three adjustments to this data for the purposes of this analysis:
- Added rookies. I estimated projected 2014-2015 Win Shares for rookies using the logarithmic curve given in this post.
- Accounted for injuries to good players. Kobe Bryant, Derrick Rose, Rajon Rondo, and a couple of other high profile players were injured in 2013-2014. I replaced their Win Share total with the average of the past three seasons, including the season they were injured. Is this reasonable? I don’t know.
- Trimmed to 12. I manually trimmed rosters so that only the 12 players with the highest Win Shares remained.
Adding Win Shares gives an overall “strength rating” for each team.
Step 2: Estimate the unpredictability of game results. Most of the time, a good team will beat a bad team. Most of the time. Can we quantify this more precisely? Sure. From a previous post, I determined that home court advantage is approximately 2.6 points per game. We also found that although the difference in total season wins is a predictor of who will win in a matchup between two teams, it is a rather weak predictor. in other words, bad teams beat good teams quite often, especially at home. For our prediction model we make another simple assumption: every team’s performance over the course of the season varies according to a normal distribution, with the mean of this distribution corresponding to their overall team strength.
Normal distributions are defined by two parameters: mean and standard deviation. If I know what the normal distribution looks like then I can estimate the probability of the home team winning in a matchup: take the difference of their team strengths, then calculate the cdf of the distribution at –2.6 (the home court advantage). But what is the standard deviation? I can estimate it by “replaying” all of the games in the previous season. If I guess a value for the standard deviation, I can calculate win probabilities for all games. If I add up the win probabilities, for say, Boston, then this should sum to Boston’s win total for the season (sadly, 25). So if I want to estimate the standard deviation, all I have to do is minimize the sum of deviations from estimated and actual 2013-2014 win totals. I can do this using Excel’s Solver: it’s a nonlinear minimization problem involving only one variable (the standard deviation).
It turns out that the resulting estimate does a very good job of matching 2013-2014 results:
- The estimated win totals for all teams were within 2 wins of their actual values.
- The estimated home winning percentage matches the actual value quite closely!
Step 3: Determine win totals for the 2014-2015 season. I obtained the 2014-2015 schedule for $5 from nbastuffer.com. Using this schedule, I calculated win probabilities for each game using the team strengths in Step 1 and the standard deviation in Step 2. If I add up the totals for each team, I get their estimated win totals. Voila! Since the prediction is created by looking at each game on the schedule, we also get home and away records, in conference records, and so on. It’s also easy to update the estimate during the season as games are played, players are traded or injured, and so on.
Why this model stinks. The biggest virtue of this model is that it was easy to build. I can think of at least ten potential shortcomings with this model:
- Win Shares are probably not the best metric for individual and team strength.
- It assumes that individual performance for 2014-2015 will be the same as 2013-2014. Paul Pierce isn’t getting any younger.
- It does not account for predictable changes in playing time from season-to-season.
- 2014-2015 win shares are not normalized to account for players leaving and entering the league.
- 2014-2015 win shares do not account for positive and negative synergies between players.
- There is no reason to believe that the standard deviation calculated in Step 2 should be the same for all teams.
- I have not given any justification for using a normal distribution at all!
- The vagaries of the NBA schedule are not accounted for. For example, teams play worse in their second consecutive road game.
- Several teams, including the Philadelphia 76ers, will tank games.
- Injuries were handled in an arbitrary and inconsistent fashion.
It will be interesting to see how this model performs, in spite of its shortcomings.