Predicting NBA Rookie Performance By Draft Position

Nate Silver (and others) have tracked how NBA draft position relates to total career performance; see, for example, this article. But what about first-year performance?

I pulled two sets of data to answer this question.

I merged them using Power Query, then built a pivot table to calculate the average number of rookie-season “win shares” by draft position. You can download my Excel workbook here. Here is what I found:


The first pick in the draft averages nearly five Win Shares in his rookie season, and while the pattern is irregular, win shares decrease as we get deeper into the draft (duh). (The blip at the end is due to Isaiah Thomas, drafted by the Kings, who promptly screwed up by letting him go.) I have drawn a logarithmic trendline which fits the data not too shabbily: R^2 of 0.7397. Obviously we could do much better if we considered additional factors related to the player (such as his college performance) and the team (the strength of teammates playing the same position, who will compete with the rookie for playing time). Here are the averages for the first 31 draft positions:

Draft Position   Win Shares
1 4.96
2 2.69
3 2.96
4 4.14
5 2.23
6 1.84
7 3.36
8 1.68
9 2.59
10 1.52
11 0.84
12 1.51
13 1.48
14 1.36
15 1.64
16 1.19
17 2.37
18 1.02
19 0.71
20 1.09
21 1.74
22 2.14
23 1.54
24 2.29
25 0.98
26 1.23
27 1.08
28 0.40
29 0.54
30 0.94
31 0.79
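The logarithmic trendline is easy to reproduce from the table above. Here is a quick sketch using numpy; note that the exact R^2 depends on whether you fit the rounded averages in the table or the chart's underlying data, so expect a ballpark rather than an exact match.

```python
import numpy as np

# Average rookie-season win shares by draft position, from the table above
win_shares = np.array([
    4.96, 2.69, 2.96, 4.14, 2.23, 1.84, 3.36, 1.68, 2.59, 1.52,
    0.84, 1.51, 1.48, 1.36, 1.64, 1.19, 2.37, 1.02, 0.71, 1.09,
    1.74, 2.14, 1.54, 2.29, 0.98, 1.23, 1.08, 0.40, 0.54, 0.94, 0.79,
])
picks = np.arange(1, len(win_shares) + 1)

# A logarithmic trendline y = a*ln(x) + b is just a linear fit in ln(x)
a, b = np.polyfit(np.log(picks), win_shares, 1)

# Coefficient of determination for the fit
pred = a * np.log(picks) + b
ss_res = ((win_shares - pred) ** 2).sum()
ss_tot = ((win_shares - win_shares.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

The slope comes out negative, confirming the downward drift in rookie production as the draft goes on.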

NBA Game Results: 2013-2014

The NBA preseason is in full swing! For those of you who like to fool around with data, I have prepared a CSV file with game-by-game results for the 2013-2014 season. The data was downloaded using Power Query and cleaned up (see below).

The format is simple:

  • Date = When the game was played
  • Visitor = three letter abbreviation of the visiting team
  • VisitorPts = visiting team score
  • VisitorSeasonWins = number of wins by the visiting team for the entire season
  • Home = TLA of home team
  • HomePts = home team score
  • HomeSeasonWins = number of wins by the home team for the entire season
  • WinMargin = HomeSeasonWins – VisitorSeasonWins
  • Margin = HomePts – VisitorPts

I include the number of wins for each team in the file because I wanted to see how often good teams beat bad teams. The diagram below plots the difference in total wins between teams against the margin of victory. I have used the trendline feature in Excel to show that while (by definition) good teams beat bad ones frequently, the variability is quite high. Notice the R^2 value.


The intercept for the trendline is 2.5967, which represents the home court advantage in points. In a future post I hope to use this data to make some predictions about the upcoming NBA season.
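The trendline can also be recomputed outside Excel. Here is a minimal sketch with numpy, using a few hypothetical rows in the CSV's (WinMargin, Margin) format; the values below are made up for illustration, not taken from the real file.

```python
import numpy as np

# Hypothetical games in the file's format:
# WinMargin = HomeSeasonWins - VisitorSeasonWins, Margin = HomePts - VisitorPts
games = np.array([(20, 12), (-5, 1), (10, 8), (-15, -6), (0, 4), (25, 15)], dtype=float)
x, y = games[:, 0], games[:, 1]

# Least-squares line: Margin ~ slope * WinMargin + intercept.
# On the real data, the intercept estimates home-court advantage in points:
# the expected margin when the two teams have equal season win totals.
slope, intercept = np.polyfit(x, y, 1)
```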


Fantasy Football Ratings 2014

I have prepared fantasy football ratings for the 2014 NFL season based on the data from last year’s season. I hope you will find them useful! You can download the ratings here.

These ratings are reasonable but flawed. The strengths of the ratings are:

  • They are based on player performance from the 2013 season, using a somewhat standard fantasy scoring system. (6 points for touchdowns, -2 for turnovers, 1 point per 25 passing yards, 1 point per 10 rushing or receiving yards, and reasonable scoring for kickers.)
  • The ratings are comparable across positions because each rating is the expected number of fantasy points a player will score above a “replacement level” player at that position. I call this “Fantasy Points Over Replacement”: FPOR.
  • Touchdowns are a key contributor to fantasy performance, but they are fickle: they often vary dramatically between players of the same overall skill level, and even between seasons for the same player. In a previous post I showed that passing and rushing touchdowns are lognormally distributed against yards per attempt. I have accounted for this phenomenon in the rankings. Loosely speaking, this means that a player who scored an unexpectedly high number of touchdowns in 2013 will be projected to score fewer in 2014.
  • The ratings do a rough correction for minor injuries. Players that play in 10 or more games in a season are rated according to the number of points they score per game. Therefore a player who missed, say, two games in 2013 due to injury is not disadvantaged compared to one that did not.

There are several weaknesses:

  • I have data for several previous seasons but do not use it. This would stabilize the ratings, and we could probably account for maturation / aging of players from season-to-season.
  • Rookies are not rated.
  • We do not account for team changes. This factor is often very important as a backup for one team may end up as a starter for another, dramatically affecting fantasy performance. (I actually have a pretty good heuristic for accounting for this, but I have not implemented it in Python…only SAS and I no longer have access to a SAS license.)
  • Players who missed a large portion of the 2013 season are essentially penalized for 2014, even if they are expected to return fully.
  • I have not rated defense/special teams.

You may want to adjust the rankings accordingly. Here are the top 25 rated players (again, the full ratings are here):

Name Position RawPts AdjPts FPOR
LeSean McCoy RB 278.6 199.3125 85.96875
Jamaal Charles RB 308 194 80.65625
Josh Gordon WR 218.6 176.3571429 71.57142857
Matt Forte RB 261.3 177.46875 64.125
Calvin Johnson WR 219.2 157.7142857 52.92857143
DeMarco Murray RB 205.1 155.4642857 42.12053571
Reggie Bush RB 185.2 153.4285714 40.08482143
Jimmy Graham TE 217.5 113.90625 35.90625
Antonio Brown WR 197.9 140.53125 35.74553571
Knowshon Moreno RB 236.6 148.6875 35.34375
Adrian Peterson RB 203.7 147.5357143 34.19196429
Marshawn Lynch RB 239.3 145.59375 32.25
Le’Veon Bell RB 171.9 142.9615385 29.61778846
Demaryius Thomas WR 227 134.0625 29.27678571
A.J. Green WR 208.6 133.6875 28.90178571
Eddie Lacy RB 207.5 141.5 28.15625
Andre Johnson WR 170.7 131.90625 27.12053571
Alshon Jeffery WR 182.1 131.34375 26.55803571
Peyton Manning QB 519.98 172.48125 22.18125
Stephen Gostkowski K 176 176 22
Drew Brees QB 435.68 172.2 21.9
Ryan Mathews RB 184.4 133.5 20.15625
DeSean Jackson WR 187.2 124.875 20.08928571
Pierre Garcon WR 162.6 124.3125 19.52678571
Jordy Nelson WR 179.4 123.1875 18.40178571

FPOR is the adjusted, cross-position score described earlier. RawPts is simply 2013 fantasy points. AdjPts are the points once touchdowns have been “corrected” and injuries accounted for.
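To make the pipeline concrete, here is a hedged sketch of the per-game injury correction and the replacement-level subtraction. The four-player pool, names, and point totals are hypothetical, and I simply treat the weakest player in the tiny pool as replacement level; the real ratings use a proper definition of replacement level per position.

```python
# Hypothetical mini-pool of running backs: (name, adjusted 2013 points, games played)
players = [("RB A", 199.3, 16), ("RB B", 148.7, 16), ("RB C", 113.4, 14), ("RB D", 121.1, 16)]

def season_pace(pts, games):
    # Players with 10+ games are rated per game, scaled to a 16-game season,
    # so a short injury absence does not penalize them
    return pts / games * 16 if games >= 10 else pts

paced = {name: season_pace(pts, games) for name, pts, games in players}

# For illustration, take the weakest player in this tiny pool as "replacement level"
replacement = min(paced.values())
fpor = {name: pts - replacement for name, pts in paced.items()}
```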

We will see how the ratings work out! If I have time I will post a retrospective once the season is done.

A Case Study in Leadership: Kevin Durant

If you have a minute or two this morning, have a look at basketball star Kevin Durant’s acceptance speech for winning the 2014 NBA Most Valuable Player award. The MVP award is the pinnacle of individual achievement in his profession, yet Durant chooses to use this occasion to talk about something of greater importance. It is a wonderful case study in leadership.

He calls out each of his teammates by name, describing not only how each has pushed Durant to be the best he can be, but also the depth and significance of their relationships as teammates. This is something that is very hard to fake. It would be very hard to stand up and look each of your colleagues in the eye in this way without having first laid the groundwork. These moments are built day by day, out of the sight of cameras. It is because these late night texts, conversations, notes in lockers were unprompted and unscripted that they have meaning. These men are not merely his “supporting cast” or resources to be allocated.

Kevin acknowledges that there are bigger, team-oriented goals which overshadow his own honors. The ultimate goal of any sports team is to win a championship. However, Kevin leavens this focus on team goals with genuine joy at individual success. Some leaders are scared, or have been taught to be scared, to admit that individual recognition feels good. You can pick out such leaders by their robotic monotone in interviews or speeches, or their insincere-sounding platitudes that amount to “there is no I in TEAM”. Unfortunately there is a large population of young coaches and leaders, often coming from privileged backgrounds where they themselves received frequent affirmation, who can only speak of sacrifice to the cause, and who will label anyone who desires individual success or acknowledgement as selfish. This sounds particularly insincere in a world of billionaire team owners and company CEOs, million-dollar coaches, and unpaid amateur athletes. While nobody likes working with someone selfish, going too far the other way is also counterproductive. We all have a need to feel valued individually, not just as a cog in a wheel. There is a joy that comes from knowing that you’ve done your best work, and while it’s best to assess success strictly according to one’s own potential, as John Wooden would advise, it is an undeniable fact that we often look outside ourselves as a barometer.

Kevin is emotional because he is passionate about being the best, and cares about his teammates, coaches, and trainers as people. He truthfully acknowledges difficulties and dust-ups he’s had with different players: discomfort at dealing with a reserved teammate and angry clashes with fellow stars. He is not insistent on the team looking a certain way to the outside world.

It’s telling that Kevin Durant chose for himself the (admittedly silly) nickname of “the Servant”. He has chosen a different path of leadership than the individually brilliant but imperiously bellicose Steve Jobs, Michael Jordan, or Kobe Bryant. Nor has he chosen to be some guy you’ve never heard of. It will be interesting to see how Kevin and his team fare as the years go by.

Predicting the NCAA Tournament Using Monte Carlo Simulation

I have created a simulation model in Microsoft Excel using Frontline Systems’ Analytic Solver Platform to predict the 2014 NCAA Tournament using the technique I described in my previous post.

Click here to download the spreadsheet.

To try it out, download a free trial of Analytic Solver Platform from Frontline Systems by clicking on Products –> Analytic Solver Platform:


Once you’ve installed the trial, open the spreadsheet. You’ll see a filled-out bracket in the “Bracket” worksheet:


Winners are determined by comparing the ratings of each team, using Excel formulas. Basically…a bunch of IF statements:


The magic of simulation is that it accounts for uncertainty in the assumptions we make. In this case, the uncertainty is my crazy rating system: it might be wrong. So instead of a single number that represents the strength of, say, Florida, we actually have a range of possible ratings based on a probability distribution. I have entered these probability distributions for the ratings for each team in column F. Double click on cell F9 (Florida’s rating), and you can see the range of ratings that the simulation considers:


The peak of the bell curve (normal) distribution is at 0.1245, the rating calculated in my previous post. Analytic Solver Platform samples different values from this distribution (and the other 63 teams), producing slightly different ratings, over and over again. As the ratings jiggle around for different trials, different teams win games and there are different champions for these simulated tournaments. In fact, if you hit F9 (or the “Calculate Now” button in the ribbon), you can see that all of the ratings change and the NCAA champion in cell Y14 sometimes changes from Virginia to Florida to Duke and so on.

Click the “play” button on the right hand side to simulate the NCAA tournament 10,000 times:


Now move over to the Results worksheet. In columns A and B you see the number of times each team won the simulated tournament (the sum of column B adds up to 10,000):


There is a pivot table in columns E and F that summarizes the results. Right-click to refresh it, and you get the nifty chart below:



We see that even though Virginia is predicted to be the most likely winner, Florida and Duke are also frequent winners.

What’s nice about the spreadsheet is that you can change it to do your own simulations. Change the values in columns D and E in the Bracket worksheet to incorporate your own rating system and see who your model predicts will win. The simulation only scratches the surface of what Analytic Solver Platform can do. Go crazy with correlated distributions (perhaps by conference?) or even simulation-optimization models to tune your model. Have fun.
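If you would rather experiment outside Excel, the same Monte Carlo idea fits in a few lines of Python. This is a sketch with a hypothetical four-team mini-bracket and an assumed rating noise of 0.01; the real workbook simulates all 64 teams using the distributions in column F. The semifinal pairings mirror the projected Final Four.

```python
import random
from collections import Counter

# Mean ratings from the model (higher is stronger)
teams = {"Virginia": 0.1310, "Florida": 0.1285, "Duke": 0.1266, "Arizona": 0.1212}
SIGMA = 0.01  # assumed standard deviation of each team's rating

def play(a, b, ratings):
    # The team with the higher sampled rating advances
    return a if ratings[a] > ratings[b] else b

random.seed(0)
wins = Counter()
for _ in range(10_000):
    # Sample a rating for every team, then play out the mini-bracket
    r = {t: random.gauss(mu, SIGMA) for t, mu in teams.items()}
    champ = play(play("Virginia", "Florida", r), play("Arizona", "Duke", r), r)
    wins[champ] += 1

print(wins.most_common())
```

Virginia wins the most simulated titles, but far from all of them, which is the whole point of simulating instead of just comparing the mean ratings.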

NCAA Tournament Analytics Model 2014: Methodology

I revealed my analytics model’s 2014 NCAA Tournament picks in yesterday’s post. Today, I want to describe how the ratings were determined. (Fair warning: this post will be quite a bit more technical and geeky.)

Click here to download the Python model source code.

My NCAA prediction model computes a numerical rating for each team in the field. Picks are generated by comparing team ratings: the team with the higher rating is predicted to advance. As I outlined in my preview, the initial model combines two ideas:

  1. A “win probability” model developed by Joel Sokol in 2010 as described on Net Prophet.
  2. An eigenvalue centrality model based on this post on BioPhysEngr Blog.

The eigenvalue centrality model creates a big network (also called a graph) that links all NCAA teams. The arrows in the network represent games between teams. Eigenvalue centrality analyzes the network to determine which network nodes (that is, which teams) are strongest. The model I described in my preview was pretty decent, but it failed to address two important issues:

  • Recently played games should count more than games at the beginning of the season.
  • Edge weights should reflect the probability one team is stronger than another, rather than probability one will beat another on a neutral floor.
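Before getting to those fixes, the centrality core can be sketched with a toy three-team network. The team names and edge weights here are made up; the real model builds this matrix from every game played.

```python
import numpy as np

teams = ["Aardvarks", "Badgers", "Chipmunks"]
# A[i, j] is the weight of the edge from team i to team j, e.g. a
# win-probability-style weight for games between the two teams
A = np.array([
    [0.0, 0.6, 0.7],
    [0.4, 0.0, 0.55],
    [0.3, 0.45, 0.0],
])

# Eigenvalue centrality via power iteration: repeatedly multiply a rating
# vector by the adjacency matrix and renormalize; it converges to the
# principal eigenvector, whose entries are the team strengths
r = np.ones(len(teams)) / len(teams)
for _ in range(100):
    r = A @ r
    r /= r.sum()

ranking = sorted(zip(teams, r), key=lambda pair: -pair[1])
```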

The first issue is easy to explain. In my initial model, game-by-game results were analyzed to produce edge weights in a giant network linking teams. The weight was simply the formula given by Joel Sokol in his 2010 paper. However, it seems reasonable that more recently played games are more important, from a predictive perspective, than early season games. To account for this factor, I scale the final margin of victory for more recently played games by a “recency” factor R. If one team beats another by K points at the start of the season, we apply the Sokol formula with K. However, if one team beats another by K points at the end of the season, we apply the formula with R*K. If R=2, that means a 10 point victory at the start of the season is worth the same as a 5 point victory at the end. If the game was in the middle of the season, we’d apply half of the adjustment: 7.5 points.
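My reading of that adjustment, as a sketch: the linear ramp across the season is my assumption here, so check the source code for the exact rule.

```python
def recency_scaled_margin(margin, game_frac, R=1.5):
    """Scale a margin of victory by recency.

    game_frac is 0.0 for the season opener and 1.0 for the season finale;
    the multiplier ramps linearly from 1.0 up to R.
    """
    return margin * (1 + (R - 1) * game_frac)
```

With R=2, a 10-point season-opening win and a 5-point season-ending win scale to the same value.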

The second issue – regarding edge weights and team strength – is more subtle. As you saw in the “Top 25” from my preview post, there were some strange results. For example, Canisius was rated #24. The reason is that the Sokol formula is not very sensitive to small margins of victory.

Let’s look at an example. Here is the Sokol formula, where phi is the standard normal cumulative distribution function and x is the margin of victory: phi(0.0189 * x – 0.0756)

If you try the values x = 1..6 you get the probabilities [0.477, 0.485, 0.492, 0.5, 0.508, 0.515]. This means that the difference between a 1-point home win and a 6-point home win is only 0.515 – 0.477 ≈ 0.038, or about 4 percentage points. As a result, most of the nonzero values in the big adjacency matrix that we create are around 0.5, and consequently our centrality method is identifying teams that are influential in the network, rather than teams that are dominant. One way to find teams that are dominant is to scale the margin of victory so that a 6-point victory is worth much more than a 1-point victory. So the hack here is to substitute S*x for x in the formula, where S is a “sensitivity” scaling factor.
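Here are the formula and the sensitivity hack in code, with phi built from math.erf:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(z / sqrt(2)))

def sokol(margin, S=1.0):
    # Sokol's win-probability formula; S > 1 stretches the margin so
    # that larger victories count for more
    return phi(0.0189 * S * margin - 0.0756)

print([round(sokol(x), 3) for x in range(1, 7)])
# → [0.477, 0.485, 0.492, 0.5, 0.508, 0.515]
```

With S = 2, a 6-point win maps to phi(0.0189 * 12 – 0.0756) ≈ 0.56, pulling the matrix entries further away from the uninformative 0.5.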

One last tiny adjustment I made was to pretend that Joel Embiid did not play this year, so that Kansas’s rating reflects their strength without him. Long story short, I subtracted 1.68 points for all games that Joel Embiid appeared in. This post has the details.

My Python code implements everything I described in this post and the preview. I generated the picks by choosing the recency parameter R = 1.5 and strength parameter S = 2. Here is a sample call and output:

scoreNcaa(25, 20, 2, 1.5, 0)
['Virginia', 0.13098760857436742]
['Florida', 0.12852960094006807]
['Duke', 0.12656196253849666]
['Kansas', 0.12443601960952431]
['Michigan St', 0.12290861109638007]
['Arizona', 0.12115701603335856]
['Wisconsin', 0.11603580613955565]
['Pittsburgh', 0.11492421298144373]
['Michigan', 0.11437543620057213]
['Iowa St', 0.1128795675290855]

If you’ve made it this far, and have the source code, you can figure out what most of the other parameters mean. (Or you can ask in the comments!)

The question “why did Virginia come out first?” is difficult to answer succinctly. Basically:

  • Virginia, Florida, and Duke are all pretty close.
  • Virginia had a consistently strong schedule.
  • Their losses were generally speaking close games to strong opponents.
  • They had several convincing, recent victories over other very strong teams.

In a future post, I will provide an Excel spreadsheet that will allow you to build and simulate your own NCAA tournament models!

NCAA Tournament Analytics Model 2014: Picks

Here are my picks for the 2014 NCAA Tournament, based on the analytics model I described in this post. This post contains the picks and my next post will contain the code and methodology for the geeks among us. I use analytics for my NCAA picks for my own education and enjoyment, and to absolve responsibility for them. No guarantees!

Here is a link to picks for all rounds in PDF format.

Here is a spreadsheet with all picks and ratings.

This year’s model examined every college basketball game played in Division I, II, III, and Canada, based on data from Prof. Peter Wolfe and from another source. The ratings implicitly account for strength of opposition, and explicitly account for neutral-site games, recency, and Joel Embiid’s back (it turned out not to matter). I officially deem these picks “not crappy”.

The last four rounds are given at the end – the values next to each team are the scores generated by the model.

The model predicts Virginia, recent winner of the ACC tournament, will win it all in 2014 in a rematch with Duke. Arizona was rated the sixth-best team in the field but is projected to make the Final Four because it plays in the weakest region (the West). Florida, the second-strongest team in the field (juuust behind Virginia), joins them. Wichita State was rated surprisingly low (25th) even though it is currently undefeated, basically due to margin of victory against relatively weaker competition (although the Missouri Valley has been an underrated conference over the past several years). Wichita State was placed in the Midwest region, clearly the toughest region in the bracket, and is projected to lose to underseeded Kentucky in the second round. Here are the average and median strengths of the four regions. The last column is the 75th percentile, which is an assessment of the strength of the elite teams in each bracket:

Region Avg Med Top Q
South 0.0824 0.0855 0.1101
East 0.0816 0.0876 0.1064
West 0.0752 0.0831 0.1008
Midwest 0.0841 0.0890 0.1036
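If you want to recompute that summary for your own ratings, it is one-liner territory. The 16 ratings below are hypothetical stand-ins for one region's teams.

```python
import numpy as np

# Hypothetical ratings for the 16 teams in one region
region = np.array([0.124, 0.108, 0.113, 0.089, 0.095, 0.102, 0.081, 0.078,
                   0.074, 0.069, 0.066, 0.071, 0.064, 0.060, 0.058, 0.055])

avg = region.mean()
med = np.median(region)
top_q = np.percentile(region, 75)  # the "Top Q" column: strength of the elite teams
```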

The model predicts a few upsets (though not too many). The winners of the “play-in games” are projected to knock off higher seeded Saint Louis and UMass. Kentucky is also projected to beat Louisville, both of whom probably should have been seeded higher. Baylor is projected to knock off Creighton, busting Warren Buffett’s billion dollar bracket in Round 2.

Sweet 16     Elite 8  
Florida 0.1285   Florida 0.1285
VA Commonwealth 0.1097      
Syracuse 0.1111   Kansas 0.1281
Kansas 0.1244      
Virginia 0.1310   Virginia 0.1281
Michigan St 0.1229      
Iowa St 0.1129   Iowa St 0.1129
Villanova 0.1060      
Arizona 0.1212   Arizona 0.1212
Oklahoma 0.1001      
Baylor 0.1013   Wisconsin 0.1160
Wisconsin 0.1160      
Kentucky 0.1081   Kentucky 0.1081
Louisville 0.1065      
Duke 0.1266   Duke 0.1266
Michigan 0.1144      


Final Four
Florida 0.1285
Virginia 0.1310
Arizona 0.1212
Duke 0.1266

Championship
Virginia 0.1310
Duke 0.1266