## Predicting NBA Rookie Performance By Draft Position

Nate Silver (and others) have tracked how NBA draft position relates to total career performance; see, for example, this article. But what about first-year performance?

I pulled two sets of data from basketball-reference.com to answer this question.

I merged them using Power Query and created a pivot table to calculate the average number of rookie-season “win shares” by draft position. You can download my Excel workbook here. Here is what I found:

The first pick in the draft averages nearly five win shares in his rookie season, and while the pattern is irregular, win shares decrease as we get deeper into the draft (duh). (The blip at the end is due to Isaiah Thomas, drafted by the Kings, who promptly screwed up by letting him go.) I have drawn a logarithmic trendline which fits the data not too shabbily: R^2 of 0.7397. Obviously we could do much better if we considered additional factors related to the player (such as college performance) and team (such as the strength of teammates playing the same position, who compete with the rookie for playing time). Here are the averages for the first 31 draft positions:

| Draft Position | Win Shares |
|---|---|
| 1 | 4.96 |
| 2 | 2.69 |
| 3 | 2.96 |
| 4 | 4.14 |
| 5 | 2.23 |
| 6 | 1.84 |
| 7 | 3.36 |
| 8 | 1.68 |
| 9 | 2.59 |
| 10 | 1.52 |
| 11 | 0.84 |
| 12 | 1.51 |
| 13 | 1.48 |
| 14 | 1.36 |
| 15 | 1.64 |
| 16 | 1.19 |
| 17 | 2.37 |
| 18 | 1.02 |
| 19 | 0.71 |
| 20 | 1.09 |
| 21 | 1.74 |
| 22 | 2.14 |
| 23 | 1.54 |
| 24 | 2.29 |
| 25 | 0.98 |
| 26 | 1.23 |
| 27 | 1.08 |
| 28 | 0.40 |
| 29 | 0.54 |
| 30 | 0.94 |
| 31 | 0.79 |
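If you prefer code to Excel, the logarithmic trendline can be reproduced with a least-squares fit of win shares against ln(pick). Here is a sketch using NumPy, with the win-share averages copied from the table above (the R² computation is the same one Excel reports for a chart trendline):

```python
import numpy as np

# Average rookie-season win shares by draft position (from the table above)
win_shares = np.array([4.96, 2.69, 2.96, 4.14, 2.23, 1.84, 3.36, 1.68, 2.59,
                       1.52, 0.84, 1.51, 1.48, 1.36, 1.64, 1.19, 2.37, 1.02,
                       0.71, 1.09, 1.74, 2.14, 1.54, 2.29, 0.98, 1.23, 1.08,
                       0.40, 0.54, 0.94, 0.79])
picks = np.arange(1, len(win_shares) + 1)

# Fit win_shares ~ a * ln(pick) + b by least squares
a, b = np.polyfit(np.log(picks), win_shares, 1)

# Coefficient of determination (R^2) of the fitted trendline
pred = a * np.log(picks) + b
ss_res = np.sum((win_shares - pred) ** 2)
ss_tot = np.sum((win_shares - np.mean(win_shares)) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"WS = {a:.4f} * ln(pick) + {b:.4f}, R^2 = {r2:.4f}")
```

The slope `a` comes out negative, as expected: later picks earn fewer rookie win shares.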

## NBA Game Results: 2013-2014

The NBA preseason is in full swing! For those of you who like to fool around with data, I have prepared a CSV file with game-by-game results for the 2013-2014 season. The data was downloaded from basketball-reference.com using Power Query and cleaned up (see below).

The format is simple:

• Date = When the game was played
• Visitor = three letter abbreviation of the visiting team
• VisitorPts = visiting team score
• VisitorSeasonWins = number of wins by the visiting team for the entire season
• Home = three-letter abbreviation of the home team
• HomePts = home team score
• HomeSeasonWins = number of wins by the home team for the entire season
• WinMargin = HomeSeasonWins – VisitorSeasonWins
• Margin = HomePts – VisitorPts

I include the number of wins for each team in the file because I wanted to see how often good teams beat bad teams. The diagram below plots the difference in total wins against the margin of victory. I used Excel’s trendline feature to verify that while (by definition) good teams beat bad ones frequently, the variability is quite high. Notice the R^2 value.

The intercept for the trendline is 2.5967, which represents the home court advantage in points. In a future post I hope to use this data to make some predictions about the upcoming NBA season.
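The same trendline can be computed in code. Here is a sketch; the `home_court_fit` helper is mine, and the sample rows are made-up illustrations in the CSV's WinMargin/Margin format (in practice you would read those two columns from the file):

```python
import numpy as np

def home_court_fit(win_margin, point_margin):
    """Least-squares line point_margin ~ slope * win_margin + intercept.
    The intercept estimates home-court advantage in points, since it is the
    expected home scoring margin between two equally good teams."""
    slope, intercept = np.polyfit(win_margin, point_margin, 1)
    return slope, intercept

# Illustrative only: hypothetical (WinMargin, Margin) pairs, not real games
win_margin = np.array([20, -5, 0, 33, -12, 7])
point_margin = np.array([8, 2, 3, 12, -1, 5])

slope, intercept = home_court_fit(win_margin, point_margin)
print(f"slope = {slope:.3f}, home-court intercept = {intercept:.2f} points")
```

With the full 2013-2014 data, the intercept should land near the 2.5967 points quoted above.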

Enjoy!

## Fantasy Football Ratings 2014

I have prepared fantasy football ratings for the 2014 NFL season based on the data from last year’s season. I hope you will find them useful! You can download the ratings here.

These ratings are reasonable but flawed. The strengths of the ratings are:

• They are based on player performance from the 2013 season, using a fairly standard fantasy scoring system: 6 points for touchdowns, -2 for turnovers, 1 point per 25 passing yards, 1 point per 10 rushing or receiving yards, and reasonable scoring for kickers.
• The ratings are comparable across positions because each rating represents the number of fantasy points a player is expected to score above a “replacement level” player at the same position. I call this “Fantasy Points Over Replacement”: FPOR.
• Touchdowns are a key contributor to fantasy performance, but they are fickle: they often vary dramatically between players of the same overall skill level, and even between seasons for the same player. In a previous post I showed that passing and rushing touchdowns are lognormally distributed against yards per attempt. I have accounted for this phenomenon in the rankings. Loosely speaking, a player who scored an unexpectedly high number of touchdowns in 2013 will be projected to score fewer in 2014.
• The ratings do a rough correction for minor injuries. Players who played in 10 or more games in a season are rated according to the number of points they scored per game. Therefore a player who missed, say, two games in 2013 due to injury is not disadvantaged compared to one who did not.

There are several weaknesses:

• I have data for several previous seasons but do not use it. This would stabilize the ratings, and we could probably account for maturation / aging of players from season-to-season.
• Rookies are not rated.
• We do not account for team changes. This factor is often very important, as a backup for one team may end up as a starter for another, dramatically affecting fantasy performance. (I actually have a pretty good heuristic for accounting for this, but I have only implemented it in SAS, not Python, and I no longer have access to a SAS license.)
• Players who missed a large portion of the 2013 season are essentially penalized for 2014, even if they are expected to return fully.
• I have not rated defense/special teams.

You may want to adjust the rankings accordingly. Here are the top 25 rated players (again, the full ratings are here):

| Name | Position | RawPts | AdjPts | FPOR |
|---|---|---|---|---|
| LeSean McCoy | RB | 278.6 | 199.3125 | 85.96875 |
| Jamaal Charles | RB | 308 | 194 | 80.65625 |
| Josh Gordon | WR | 218.6 | 176.3571429 | 71.57142857 |
| Matt Forte | RB | 261.3 | 177.46875 | 64.125 |
| Calvin Johnson | WR | 219.2 | 157.7142857 | 52.92857143 |
| DeMarco Murray | RB | 205.1 | 155.4642857 | 42.12053571 |
| Reggie Bush | RB | 185.2 | 153.4285714 | 40.08482143 |
| Jimmy Graham | TE | 217.5 | 113.90625 | 35.90625 |
| Antonio Brown | WR | 197.9 | 140.53125 | 35.74553571 |
| Knowshon Moreno | RB | 236.6 | 148.6875 | 35.34375 |
| Adrian Peterson | RB | 203.7 | 147.5357143 | 34.19196429 |
| Marshawn Lynch | RB | 239.3 | 145.59375 | 32.25 |
| Le’Veon Bell | RB | 171.9 | 142.9615385 | 29.61778846 |
| Demaryius Thomas | WR | 227 | 134.0625 | 29.27678571 |
| A.J. Green | WR | 208.6 | 133.6875 | 28.90178571 |
| Eddie Lacy | RB | 207.5 | 141.5 | 28.15625 |
| Andre Johnson | WR | 170.7 | 131.90625 | 27.12053571 |
| Alshon Jeffery | WR | 182.1 | 131.34375 | 26.55803571 |
| Peyton Manning | QB | 519.98 | 172.48125 | 22.18125 |
| Stephen Gostkowski | K | 176 | 176 | 22 |
| Drew Brees | QB | 435.68 | 172.2 | 21.9 |
| Ryan Mathews | RB | 184.4 | 133.5 | 20.15625 |
| DeSean Jackson | WR | 187.2 | 124.875 | 20.08928571 |
| Pierre Garcon | WR | 162.6 | 124.3125 | 19.52678571 |
| Jordy Nelson | WR | 179.4 | 123.1875 | 18.40178571 |

FPOR is the adjusted, cross-position score described earlier. RawPts is simply 2013 fantasy points. AdjPts are the points once touchdowns have been “corrected” and injuries accounted for.
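To make the mechanics concrete, here is a rough sketch of the injury correction and the replacement-level subtraction in Python. The replacement levels, the 16-game pro-rating, and the function names are hypothetical illustrations of the idea, not the exact values used in my ratings (and the touchdown correction is omitted):

```python
def adjusted_points(total_points, games_played, season_games=16):
    """Injury correction: players with 10+ games are rated on a per-game basis,
    pro-rated here to a full season for comparability (an assumption of mine)."""
    if games_played >= 10:
        return total_points / games_played * season_games
    return total_points  # players with long absences get no correction

def fpor(adj_points, position, replacement_level):
    """Fantasy Points Over Replacement: the cross-position score, i.e. points
    above a waiver-wire caliber player at the same position."""
    return adj_points - replacement_level[position]

# Hypothetical replacement levels by position (illustrative numbers only)
replacement = {"QB": 150.0, "RB": 113.0, "WR": 105.0, "TE": 78.0, "K": 154.0}

pts = adjusted_points(278.6, 16)  # a back who played all 16 games
print(fpor(pts, "RB", replacement))
```

A player who missed two games gets his per-game average pro-rated, so the absence itself does not drag his rating down.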

We will see how the ratings work out! If I have time I will post a retrospective once the season is done.

## A Case Study in Leadership: Kevin Durant

If you have a minute or two this morning, have a look at basketball star Kevin Durant’s acceptance speech for winning the 2014 NBA Most Valuable Player award. The MVP award is the pinnacle of individual achievement in his profession, yet Durant chooses to use this occasion to talk about something of greater importance. It is a wonderful case study in leadership.

He calls out each of his teammates by name, describing not only how each has pushed Durant to be the best he can be, but also the depth and significance of their relationships as teammates. This is very hard to fake. It would be nearly impossible to stand up and look each of your colleagues in the eye this way without having first laid the groundwork. These moments are built day by day, out of sight of the cameras. The late-night texts, conversations, and notes left in lockers have meaning precisely because they were unprompted and unscripted. These men are not merely his “supporting cast” or resources to be allocated.

Kevin acknowledges that there are bigger, team-oriented goals that overshadow his own honors; the ultimate goal of any sports team is to win a championship. However, Kevin leavens this focus on team goals with genuine joy at individual success. Some leaders are scared, or have been taught to be scared, to admit that individual recognition feels good. You can pick out such leaders by their robotic monotone in interviews and speeches, or by insincere-sounding platitudes that amount to “there is no I in TEAM”. Unfortunately, there is a large population of young coaches and leaders, often from privileged backgrounds where they themselves received frequent affirmation, who can only speak of sacrifice to the cause, and who label anyone who desires individual success or acknowledgement as selfish. This sounds particularly insincere in a world of billionaire team owners and CEOs, million-dollar coaches, and unpaid amateur athletes. While nobody likes working with someone selfish, going too far the other way is also counterproductive. We all need to feel valued individually, not just as cogs in a wheel. There is a joy that comes from knowing you’ve done your best work, and while it is best to assess success strictly against one’s own potential, as John Wooden would advise, it is undeniable that we often look outside ourselves for a barometer.

Kevin is emotional because he is passionate about being the best, and cares about his teammates, coaches, and trainers as people. He truthfully acknowledges difficulties and dust-ups he’s had with different players: discomfort at dealing with a reserved teammate and angry clashes with fellow stars. He is not insistent on the team looking a certain way to the outside world.

It’s telling that Kevin Durant chose for himself the (admittedly silly) nickname of “the Servant”. He has chosen a different path of leadership than the individually brilliant but imperiously bellicose Steve Jobs, Michael Jordan, or Kobe Bryant. Nor has he chosen to be some guy you’ve never heard of. It will be interesting to see how Kevin and his team fare as the years go by.

## Predicting the NCAA Tournament Using Monte Carlo Simulation

I have created a simulation model in Microsoft Excel using Frontline Systems’ Analytic Solver Platform to predict the 2014 NCAA Tournament using the technique I described in my previous post.

To try it out, go to solver.com and download a free trial of Analytic Solver Platform by clicking on Products –> Analytic Solver Platform:

Once you’ve installed the trial, open the spreadsheet. You’ll see a filled-out bracket in the “Bracket” worksheet:

Winners are determined by comparing the ratings of each team, using Excel formulas. Basically…a bunch of IF statements:

The magic of simulation is that it accounts for uncertainty in the assumptions we make. In this case, the uncertainty comes from my crazy rating system: it might be wrong. So instead of a single number that represents the strength of, say, Florida, we actually have a range of possible ratings based on a probability distribution. I have entered these probability distributions for the ratings for each team in column F. Double click on cell F9 (Florida’s rating), and you can see the range of ratings that the simulation considers:

The peak of the bell curve (normal) distribution is at 0.1245, the rating calculated in my previous post. Analytic Solver Platform samples different values from this distribution (and the other 63 teams), producing slightly different ratings, over and over again. As the ratings jiggle around for different trials, different teams win games and there are different champions for these simulated tournaments. In fact, if you hit F9 (or the “Calculate Now” button in the ribbon), you can see that all of the ratings change and the NCAA champion in cell Y14 sometimes changes from Virginia to Florida to Duke and so on.
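The same sampling loop can be sketched outside Excel. Everything below is illustrative, not the spreadsheet's actual setup: a four-team field instead of 64, and an assumed standard deviation of 0.01 on each rating:

```python
import random

# Point-estimate ratings (from the model); sigma is an assumed uncertainty
ratings = {"Virginia": 0.1310, "Florida": 0.1285, "Duke": 0.1266, "Kansas": 0.1244}
SIGMA = 0.01

def simulate_bracket(bracket):
    """Sample a rating for each team from a normal distribution, then play a
    single-elimination bracket where the higher sampled rating advances."""
    sampled = {t: random.gauss(ratings[t], SIGMA) for t in bracket}
    while len(bracket) > 1:
        # Pair adjacent teams and keep the one with the higher sampled rating
        bracket = [max(pair, key=sampled.get)
                   for pair in zip(bracket[::2], bracket[1::2])]
    return bracket[0]

random.seed(0)
wins = {}
for _ in range(10_000):
    champ = simulate_bracket(["Virginia", "Kansas", "Florida", "Duke"])
    wins[champ] = wins.get(champ, 0) + 1
print(wins)  # Virginia should lead, but every team wins some trials
```

Just as in the spreadsheet, the top-rated team wins most often but far from always, because the sampled ratings jiggle around on every trial.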

Click the “play” button on the right hand side to simulate the NCAA tournament 10,000 times:

Now move over to the Results worksheet. In columns A and B you see the number of times each team won the simulated tournament (the sum of column B adds up to 10,000):

There is a pivot table in columns E and F that summarizes the results. Right click to Refresh it, and see the nifty chart below:

We see that even though Virginia is predicted to be the most likely winner, Florida and Duke are also frequent winners.

What’s nice about the spreadsheet is that you can change it to do your own simulations. Change the values in columns D and E in the Bracket worksheet to incorporate your own rating system and see who your model predicts will win. The simulation only scratches the surface of what Analytic Solver Platform can do. Go crazy with correlated distributions (perhaps by conference?) or even simulation-optimization models to tune your model. Have fun.

## NCAA Tournament Analytics Model 2014: Methodology

I revealed my analytics model’s 2014 NCAA Tournament picks in yesterday’s post. Today, I want to describe how the ratings were determined. (Fair warning: this post will be quite a bit more technical and geeky.)

My NCAA prediction model computes a numerical rating for each team in the field. Picks are generated by comparing team ratings: the team with the higher rating is predicted to advance. As I outlined in my preview, the initial model combines two ideas:

1. A “win probability” model developed by Joel Sokol in 2010 as described on Net Prophet.
2. An eigenvalue centrality model based on this post on BioPhysEngr Blog.

The eigenvalue centrality model creates a big network (also called a graph) that links all NCAA teams. The arrows in the network represent games between teams. Eigenvalue centrality analyzes the network to determine which network nodes (teams) are strongest. The model I described in my preview was pretty decent, but it failed to address two important issues:

• Recently played games should count more than games at the beginning of the season.
• Edge weights should reflect the probability one team is stronger than another, rather than probability one will beat another on a neutral floor.

The first issue is easy to explain. In my initial model, game-by-game results were analyzed to produce edge weights in a giant network linking teams. The weight was simply the formula given by Joel Sokol in his 2010 paper. However, it seems reasonable that more recently played games are more important, from a predictive perspective, than early season games. To account for this factor, I scale the final margin of victory of more recently played games by a “recency” factor R. If one team beats another by K points at the start of the season, we apply the Sokol formula with K; if the victory comes at the end of the season, we apply the formula with R*K. If R=2, that means a 10-point victory at the start of the season is worth the same as a 5-point victory at the end. For a game in the middle of the season, we’d apply half of the adjustment: the 5-point victory would count as 7.5 points.
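As a sketch, my reading of the recency adjustment is a linear interpolation between 1 at the season opener and R at the final game (the function name and the `game_fraction` parameterization are mine):

```python
def recency_scaled_margin(margin, game_fraction, R=1.5):
    """Scale a margin of victory by a recency factor that grows linearly
    from 1.0 (season opener, game_fraction=0) to R (final game, game_fraction=1)."""
    return margin * (1.0 + (R - 1.0) * game_fraction)

# With R=2: a 5-point win at season's end counts like a 10-point win at the
# start, and the same 5-point win at midseason counts as 7.5 points.
print(recency_scaled_margin(5, 1.0, R=2.0))   # 10.0
print(recency_scaled_margin(5, 0.5, R=2.0))   # 7.5
print(recency_scaled_margin(10, 0.0, R=2.0))  # 10.0
```

The scaled margin is then fed to the Sokol formula in place of the raw margin.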

The second issue – regarding edge weights and team strength – is more subtle. As you saw in the “Top 25” from my preview post, there were some strange results. For example, Canisius was rated #24. The reason is that the Sokol formula is not very sensitive to small margins of victory.

Let’s look at an example. Here is the Sokol formula, where Φ is the standard normal CDF and x is the home team’s margin of victory: Φ(0.0189x − 0.0756)

If you try the margins x = 1 through 6, you get the probabilities [0.477, 0.485, 0.492, 0.5, 0.508, 0.515]. The difference between a 1-point home win and a 6-point home win is only 0.515 − 0.477 ≈ 0.038, less than four percentage points. As a result, most of the nonzero values in the big adjacency matrix we create are around 0.5, and consequently our centrality method finds teams that are influential in the network rather than teams that are dominant. One way to find dominant teams is to scale the margin of victory so that a 6-point victory is worth much more than a 1-point victory. So the hack here is to substitute S*x for x in the formula, where S is a “sensitivity” scaling factor.
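Here are the formula and the sensitivity hack in code (`sokol_weight` is my name for the helper, not from the paper):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sokol_weight(margin, sensitivity=1.0):
    """Sokol edge weight for a home win by `margin` points;
    `sensitivity` is the S scaling factor described above."""
    return phi(0.0189 * sensitivity * margin - 0.0756)

# With S=1, a 1-point and a 6-point home win are nearly indistinguishable:
print([round(sokol_weight(x), 3) for x in range(1, 7)])
# [0.477, 0.485, 0.492, 0.5, 0.508, 0.515]

# With S=2, the spread roughly doubles, so dominant wins matter more:
print([round(sokol_weight(x, sensitivity=2.0), 3) for x in range(1, 7)])
```

Larger S pushes blowout wins further from 0.5, so the centrality computation rewards dominance rather than mere connectedness.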

One last tiny adjustment I made was to pretend that Joel Embiid did not play this year, so that Kansas’s rating reflects their strength without him. Long story short, I subtracted 1.68 points from Kansas’s margin in all games in which Joel Embiid appeared. This post has the details.

My Python code implements everything I described in this post and the preview. I generated the picks by choosing the recency parameter R = 1.5 and strength parameter S = 2. Here is a sample call and output:

```
scoreNcaa(25, 20, 2, 1.5, 0)
['Virginia', 0.13098760857436742]
['Florida', 0.12852960094006807]
['Duke', 0.12656196253849666]
['Kansas', 0.12443601960952431]
['Michigan St', 0.12290861109638007]
['Arizona', 0.12115701603335856]
['Wisconsin', 0.11603580613955565]
['Pittsburgh', 0.11492421298144373]
['Michigan', 0.11437543620057213]
['Iowa St', 0.1128795675290855]
```
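The eigenvalue-centrality step at the heart of the model can be sketched as a power iteration on the weighted adjacency matrix. The 4-team matrix below is made up for illustration; the real code accumulates (recency-scaled, sensitivity-scaled) Sokol weights over every game:

```python
import numpy as np

def eigen_centrality(A, iters=1000, tol=1e-12):
    """Principal eigenvector of a non-negative weighted adjacency matrix A,
    found by power iteration. Entry A[i, j] aggregates the edge weights
    crediting team i for its results against team j."""
    v = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(iters):
        nxt = A @ v  # each team's score is a weight-sum of its opponents' scores
        nxt /= nxt.sum()  # normalize so the ratings sum to 1
        if np.abs(nxt - v).max() < tol:
            break
        v = nxt
    return v

# Tiny made-up example: team 0 beat everyone convincingly, team 3 beat no one
A = np.array([[0.0, 0.6, 0.7, 0.8],
              [0.4, 0.0, 0.6, 0.7],
              [0.3, 0.4, 0.0, 0.6],
              [0.2, 0.3, 0.4, 0.0]])
print(eigen_centrality(A))  # team 0 gets the highest rating
```

The converged vector plays the role of the team ratings printed by `scoreNcaa` above: beating strong (high-centrality) teams raises your own rating more than beating weak ones.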

If you’ve made it this far, and have the source code, you can figure out what most of the other parameters mean. (Or you can ask in the comments!)

The question, “why did Virginia come out first?”, is difficult to answer succinctly. Basically:

• Virginia, Florida, and Duke are all pretty close.
• Virginia had a consistently strong schedule.
• Their losses were, generally speaking, close games against strong opponents.
• They had several convincing, recent victories over other very strong teams.

In a future post, I will provide an Excel spreadsheet that will allow you to build and simulate your own NCAA tournament models!

## NCAA Tournament Analytics Model 2014: Picks

Here are my picks for the 2014 NCAA Tournament, based on the analytics model I described in this post. This post contains the picks; my next post will contain the code and methodology for the geeks among us. I use analytics for my NCAA picks for my own education and enjoyment, and to absolve myself of responsibility for them. No guarantees!

Here is a link to picks for all rounds in PDF format.

Here is a spreadsheet with all picks and ratings.

This year’s model examined every college basketball game played in Division I, II, III, and Canada, based on data from Prof. Peter Wolfe and from MasseyRatings.com. The ratings implicitly account for strength of opposition, and explicitly account for neutral-site games, recency, and Joel Embiid’s back (it turned out not to matter). I officially deem these picks “not crappy”.

The last four rounds are given at the end – the values next to each team are the scores generated by the model.

The model predicts Virginia, recent winners of the ACC tournament, will win it all in 2014 in a rematch with Duke. Arizona was rated the sixth best team in the field but is projected to make the Final Four because it plays in the weakest region (the West). Florida, the second strongest team in the field (juuust behind Virginia), joins them. Wichita State was rated surprisingly low (25th) even though it is currently undefeated, basically due to its margins of victory against relatively weaker competition (although the Missouri Valley has been an underrated conference over the past several years). Wichita State was placed in the Midwest region, clearly the toughest region in the bracket, and is projected to lose to underseeded Kentucky in the second round. Here are the average and median strengths of the four regions. The last column is the 75th percentile, an assessment of the strength of each region’s elite teams:

| Region | Avg | Med | Top Q |
|---|---|---|---|
| South | 0.0824 | 0.0855 | 0.1101 |
| East | 0.0816 | 0.0876 | 0.1064 |
| West | 0.0752 | 0.0831 | 0.1008 |
| Midwest | 0.0841 | 0.0890 | 0.1036 |

The model predicts a few upsets (though not too many). The winners of the “play-in games” are projected to knock off higher-seeded Saint Louis and UMass. Kentucky is also projected to beat Louisville; both probably should have been seeded higher. Baylor is projected to knock off Creighton, busting Warren Buffett’s billion-dollar bracket in Round 2.

| Sweet 16 | Rating | Elite 8 | Rating |
|---|---|---|---|
| Florida | 0.1285 | Florida | 0.1285 |
| VA Commonwealth | 0.1097 | Kansas | 0.1281 |
| Syracuse | 0.1111 | Virginia | 0.1281 |
| Kansas | 0.1244 | Iowa St | 0.1129 |
| Virginia | 0.1310 | Arizona | 0.1212 |
| Michigan St | 0.1229 | Wisconsin | 0.1160 |
| Iowa St | 0.1129 | Kentucky | 0.1081 |
| Villanova | 0.1060 | Duke | 0.1266 |
| Arizona | 0.1212 | | |
| Oklahoma | 0.1001 | | |
| Baylor | 0.1013 | | |
| Wisconsin | 0.1160 | | |
| Kentucky | 0.1081 | | |
| Louisville | 0.1065 | | |
| Duke | 0.1266 | | |
| Michigan | 0.1144 | | |

| Final Four | Rating | Championship | Rating |
|---|---|---|---|
| Florida | 0.1285 | Virginia | 0.1310 |
| Virginia | 0.1310 | Duke | 0.1266 |
| Arizona | 0.1212 | | |
| Duke | 0.1266 | | |