2015 NFL Statistics by Player and Team

I have downloaded stats for the recently completed 2015 NFL regular season from yahoo.com, cleaned the data, and saved the data in CSV format. The files are located here. If you prefer a github repository, check here. The column headers should be self-explanatory.

You will find seven CSV files, which you can open in Excel or Google Sheets:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. I have broken out attempted and made field goals by distance into separate columns for convenience.
  • DEF: defensive stats by team.
  • ST: special teams stats by team.

Enjoy!

2014 NFL Statistics by Player and Team in Excel

I have downloaded stats for the recently completed 2014 NFL regular season from yahoo.com, cleaned the data, and saved in Excel and CSV formats. The files are located here. The column headers should be self-explanatory.

The Excel workbook has seven worksheets:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. I have broken out attempted and made field goals by distance into separate columns for convenience.
  • DEF: defensive stats by team.
  • ST: special teams stats by team.

The same folder also has separate CSV files for each position, which may be more helpful if you are a coder.

Fantasy Football Ratings 2014

I have prepared fantasy football ratings for the 2014 NFL season based on the data from last year’s season. I hope you will find them useful! You can download the ratings here.

These ratings are reasonable but flawed. The strengths of the ratings are:

  • They are based on player performance from the 2013 season, using a somewhat standard fantasy scoring system. (6 points for touchdowns, -2 for turnovers, 1 point per 25 passing yards, 1 point per 10 passing yards, and reasonable scoring for kickers.)
  • The ratings are comparable across positions because the rating means the expected number of fantasy points that a player will score compared to a “replacement level” player for that position. I call this “Fantasy Points Over Replacement”: FPOR.
  • Touchdowns are a key contributor fantasy performance, but they are fickle: they often vary dramatically between players of the same overall skill level, and even between seasons for the same player. In a previous post I showed that passing and rushing touchdowns are lognormally distributed against the yards per attempt. I have accounted for this phenomenon in the rankings. Loosely speaking, this means that a player that scored an unexpectedly high number of touchdowns on 2013 will be projected to score fewer in 2014.
  • The ratings do a rough correction for minor injuries. Players that play in 10 or more games in a season are rated according to the number of points they score per game. Therefore a player who missed, say, two games in 2013 due to injury is not disadvantaged compared to one that did not.

There are several weaknesses:

  • I have data for several previous seasons but do not use it. This would stabilize the ratings, and we could probably account for maturation / aging of players from season-to-season.
  • Rookies are not rated.
  • We do not account for team changes. This factor is often very important as a backup for one team may end up as a starter for another, dramatically affecting fantasy performance. (I actually have a pretty good heuristic for accounting for this, but I have not implemented it in Python…only SAS and I no longer have access to a SAS license.)
  • Players who missed a large portion of the 2013 season are essentially penalized for 2014, even if they are expected to return fully.
  • I have not rated defense/special teams.

You may want to adjust the rankings accordingly. Here are the top 25 rated players (again, the full ratings are here):

Name Position RawPts AdjPts FPOR
LeSean McCoy RB 278.6 199.3125 85.96875
Jamaal Charles RB 308 194 80.65625
Josh Gordon WR 218.6 176.3571429 71.57142857
Matt Forte RB 261.3 177.46875 64.125
Calvin Johnson WR 219.2 157.7142857 52.92857143
DeMarco Murray RB 205.1 155.4642857 42.12053571
Reggie Bush RB 185.2 153.4285714 40.08482143
Jimmy Graham TE 217.5 113.90625 35.90625
Antonio Brown WR 197.9 140.53125 35.74553571
Knowshon Moreno RB 236.6 148.6875 35.34375
Adrian Peterson RB 203.7 147.5357143 34.19196429
Marshawn Lynch RB 239.3 145.59375 32.25
Le’Veon Bell RB 171.9 142.9615385 29.61778846
Demaryius Thomas WR 227 134.0625 29.27678571
A.J. Green WR 208.6 133.6875 28.90178571
Eddie Lacy RB 207.5 141.5 28.15625
Andre Johnson WR 170.7 131.90625 27.12053571
Alshon Jeffery WR 182.1 131.34375 26.55803571
Peyton Manning QB 519.98 172.48125 22.18125
Stephen Gostkowski K 176 176 22
Drew Brees QB 435.68 172.2 21.9
Ryan Mathews RB 184.4 133.5 20.15625
DeSean Jackson WR 187.2 124.875 20.08928571
Pierre Garcon WR 162.6 124.3125 19.52678571
Jordy Nelson WR 179.4 123.1875 18.40178571

FPOR is the adjusted, cross-position score described earlier. RawPts is simply 2013 fantasy points. AdjPts are the points once touchdowns have been “corrected” and injuries accounted for.

We will see how the ratings work out! If I have time I will post a retrospective once the season is done.

2013 NFL Statistics by Player and Team in Excel

I have downloaded stats for the recently completed 2013 NFL regular season from yahoo.com, cleaned the data, and saved in Excel format. The files are located here. The column headers should be self-explanatory.

There are seven worksheets:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. For the yardage columns, the portion before the dash indicates FG made, the portion after attempted. So 4-5 means 4 made out of 5 from the distance range.
  • DEF: defensive stats by team.
  • ST: special teams stats by team.

The same folder also contains CSV files broken out for each position.

2012 Fantasy Football Prediction Model: Retrospective

Several months ago I laid out a simple model for forecasting fantasy football performance. My post included a table ranking players by value for draft purposes. Appropriately, in our league at Nielsen I used my own model to draft (selecting Ray Rice in the first round, followed by Victor Cruz and Wes Welker). I screwed up in a later round and accidentally selected Jason Whiten instead of Matt Ryan – but other than that I strictly followed the advice of my blog: always choose the highest ranked player from positions that needed to be filled.

So how did I do? Well, I finished third out of twelve – not bad. Let’s take a retrospective look at my model to see how accurately I predicted the 2012 fantasy football season.

My previous post provides actual 2012 statistics for all positions. I scored each player and compared to my projections. Note that my projections did not include players who did not participate in the 2011 season (e.g. Jamaal Charles) or rookies (e.g. RG III), so they are not included in the analysis. If I regard the difference between actual and projected fantasy points as “error”, then I can compute R^2 to summarize fit. I computed R^2 for two projections:

  • Project 2012 points as equal to 2011 points (“2011” in the table below)
  • Project 2012 points by adjusting 2011 points as described in my post (“Model”)

The results are as follows:

Pos N 2011 Model
DEF 29 0.899414 0.899414
K 26 0.961511 0.961511
QB 31 0.909683 0.887465
RB 101 0.774406 0.797133
TE 78 0.81725 0.824893
WR 124 0.806109 0.810808

The overall R^2 for the model is around 0.852. I didn’t really dig into the details too much, but I think we can reasonably conclude that:

  • Kickers and Defense are pretty darn stable (probably because they largely depend on the strength of the team and opposition, which does not generally massively change between seasons).
  • RB performance was the hardest to predict.
  • The model worked in the sense that most positions ended up with better R^2 than if I had simply used 2011 numbers.
  • The exception is for the QB position, which perhaps implies that my assumptions about touchdown production do not apply to the QB position.

In the table below I compare the points per game projected by the model (Projected) with the actual number of fantasy points in 2012 (Actual) and the relative error. I highlight a row green when the relative error is less than 10%. If it is less than 25% it is yellow, and red otherwise.

Name

Pos

Projected

Actual

Rel. Err.

Drew Brees

QB

20.65682

21.59875

0.043611

Aaron Rodgers

QB

22.32247

21.35625

0.045243

Tom Brady

QB

20.06681

21.2675

0.056456

Cam Newton

QB

19.92028

20.17875

0.012809

Adrian Peterson

RB

13.7407

19.0875

0.280121

Matt Ryan

QB

15.70105

19.05375

0.17596

Tony Romo

QB

15.42652

17.18875

0.102522

Matthew Stafford

QB

18.70318

17.08

0.095034

Ben Roethlisberger

QB

14.58558

17.06154

0.145119

Arian Foster

RB

18.69446

16.38125

0.141211

Andy Dalton

QB

12.47327

15.6725

0.20413

Marshawn Lynch

RB

13.48645

15.4125

0.124967

Josh Freeman

QB

14.20246

15.40625

0.078137

Michael Vick

QB

21.37533

15.168

0.409239

Carson Palmer

QB

20.68852

14.688

0.408532

Eli Manning

QB

16.32711

14.5575

0.12156

Joe Flacco

QB

12.35432

14.555

0.151198

Kevin Kolb

QB

11.33343

14.12667

0.197728

Matt Schaub

QB

22.60787

13.96375

0.61904

Sam Bradford

QB

16.83711

13.92375

0.209236

Ray Rice

RB

17.53886

13.88125

0.263493

Calvin Johnson

WR

15.14139

13.775

0.099193

Brandon Marshall

WR

9.919735

13.55

0.267916

Ryan Fitzpatrick

QB

13.05769

13.35625

0.022354

C.J. Spiller

RB

7.19662

13.26875

0.457626

Philip Rivers

QB

14.7622

13.015

0.134245

Dez Bryant

WR

9.147546

13.0125

0.297019

Rob Gronkowski

TE

13.32919

13

0.025322

Stevan Ridley

RB

3.349849

12.4625

0.731206

A.J. Green

WR

9.735794

12.4375

0.217223

Demaryius Thomas

WR

6.994023

12.3375

0.433109

Jay Cutler

QB

20.61836

12.308

0.6752

Frank Gore

RB

10.8703

12.3

0.116236

Alex Smith

QB

12.57711

12.268

0.025197

LeSean McCoy

RB

16.10583

12.10833

0.330144

Christian Ponder

QB

14.64461

12.04375

0.215951

Jake Locker

QB

9.065129

12.01273

0.245373

Chicago Bears

DE

8.4375

11.875

0.289474

Matt Forte

RB

14.61648

11.82667

0.235892

2012 NFL Statistics by player and Team in CSV format

I have downloaded stats for the recently completed 2012 NFL regular season from yahoo.com, cleaned the data, and saved in CSV format. The files are located here. The column headers should be self-explanatory.

There are seven files:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. For the yardage columns, the first entry indicates FG made, the second attempted. So 4-5 means 4 made out of 5 from the distance range.
  • Def: defensive stats by team.
  • ST: special teams stats by team.

I have also provided a spreadsheet that combines all of the above.

Touchdowns are lognormally distributed

…well, not exactly. But it’s snappier if I put it that way.

What I really mean is: the number of pass attempts (or receptions, or carries) per touchdown is lognormally distributed, and that fact can be used to produce more stable fantasy football forecasts.

Click here to download the SAS source [estimate2.sas]

In my last two posts, I laid out simple fantasy football forecasting engines in SAS and R. An important component of a fantasy football score is the number of touchdowns scored by each player. Touchdowns can vary considerably among players with otherwise similar performance. For example, let’s look at the top three running backs from my previous post:

Name Rush Rush_Yds Rush_Avg Rush_TD FFPts
Ray Rice 291 1364 4.7 12 292.8
LeSean McCoy 273 1309 4.8 17 280.4
Maurice Jones-Drew 343 1606 4.7 8 262

LeSean McCoy scored more than twice as many touchdowns as Maurice Jones-Drew. He scored several more than Ray Rice, but otherwise have very similar stats. The gut instinct that drives this post is that I don’t think LeSean McCoy is not going to score that many touchdowns this year!

How can I analyze touchdowns? I could simply draw a histogram of touchdowns per player, but that wouldn’t be very insightful. Players who get the ball more are more likely to score more touchdowns. So let’s control for that by dividing by the number of rushing attempts each player makes: let’s chart the touchdown rate. The histogram of rushing attempts per touchdown for the top 60 running backs in my 2011 dataset is interesting:

image

To my eye, it looks lognormally distributed. It’s not perfect, but it looks like a very reasonable approximation. A lognormal distribution makes sense – we expect that the distribution would be “heavy tailed” because going towards the left (1 touchdown per rush) is much harder than going to the right. Nobody scores every time they get the ball. Here is the SAS code that produces the histogram and the best fitting lognormal distribution. (I’m not doing this in R because I don’t know how to fit distributions in that environment. I am sure it is easy to do.)

** Plot a histogram, and save the lognormal distribution parameters. **;
proc univariate data=rb(obs=60) noprint;
  var Rush_Per_TD;
  histogram / lognormal nendpoints=15 cfill=blue outhistogram=rb_hist;
  ods output ParameterEstimates=rb_fit;
run;

The options for the “histogram” statement specify the distribution type, chart style, and an output dataset for the bins (which I then copied over to the free Excel 2013 preview to make a less-crappy looking chart). The “ods output” statement is a fancy way to save the lognormal parameters into a dataset for later use.

I can understand why there is a wide variation of values. Off the top of my head:

  • Skill of the RB.
  • Skill of the offensive line that blocks for the RB.
  • How often the player gets carries near the goalline.
  • Some teams call more red zone rush plays than others.
  • Quality of opposition.
  • Luck.
  • Stuff like this. (This moment still burns…)

With these reasons in mind, I certainly don’t expect that all RBs will end up with the same rush/TD ratio in the long run. However, I think that it is likely that players on the ends of the distribution (either way) in 2011 are likely to be closer to the middle in 2012. Here’s what we can do: compute the conditional distribution function (cdf) for the fitted lognormal distribution for each player’s rush/TD ratio. This is a number between 0 and 1 that indicates “how extreme” the player is – 0 means all the way on the left. For example, LeSean McCoy is 0.0553 and is Maurice Jones- Drew is 0.5208. This means that LeSean McCoy is an outlier (close to 0), and MJD is not (close to 1/2).

To project next year’s ratio, I take a weighted average of the player’s binomial CDF and the middle of the distribution (0.5). I somewhat arbitrarily chose to take 2/3 times the CDF and add 1/3 times 0.5. This means that while I believe that players will regress to the mean somewhat, that I do believe that there are significant structural differences between players that will persevere from one season to the next.

Once I have the projected rush/TD figures, I can multiply by rushes and get a projected 2012 TD figure that I can use in fantasy scoring. If I take the rather large leap that touchdowns for all positions behave in this way, I can write a generic “normalizing” function that I can use for touchdowns at all positions.

** Recalibrate a variable with the assumption that it is lognormally distributed.   **;
** -- position: a dataset with player information. It should have a variable called **;
**              CalibrateVar.                                                       **;
** -- obscount: the number of observations to use for analysis.                     **;
** -- CalibrateVar: the variable under analysis.                                    **;
** The macro will create a new variable ending in _1 with the calibrated values.    **;
%macro Recalibrate(position, obscount, CalibrateVar);
  ** Sort the data by the initial score computed in my first post. **;
  proc sort data=&position;
    by descending FFPts0;
  run;

  ** Plot a histogram, and save the lognormal distribution parameters. **;
  proc univariate data=&position(obs=&obscount) noprint;
    var &CalibrateVar;
    histogram / lognormal nendpoints=15 cfill=blue outhistogram=&position._hist;
    ods output ParameterEstimates=&position._fit;
  run;

  ** Get the lognormal parameters into macro variables so I can use them for computation. **;
  data _null_;
    set &position._fit;
    if Parameter = 'Scale' then call symput('Scale', Estimate);
    if Parameter = 'Shape' then call symput('Shape', Estimate);
  run;

  ** Compute the projected values for each player using the distribution. **;
  data &position;
    set &position;
    LogNormCdf = cdf('LOGNORMAL', &CalibrateVar, &Scale, &Shape);
    &CalibrateVar._1 = quantile('LOGNORMAL', 0.67 * LogNormCdf + 0.33 * 0.5, &Scale, &Shape);
  run;

%mend;

A call to this macro looks like this:

%Recalibrate(rb, 60, Rush_Per_TD);

After this call I will have a variable called Rush_Per_TD1 in my rb dataset.

I have modified the forecasting engine to recalibrate touchdowns for all positions – see estimate2.sas. You can see below how the rankings change when I recalibrate: here are the top 20 running backs. Players in green moved up in the ratings after recalibration; players in red moved down. Unsurprisingly, LeSean McCoy moved down.

Pos Name Team G Rush Rush_Yds Rush_YG Rush_Avg Rush_TD Rec Rec_Yds Rec_YG Rec_Avg Rec_Lng YAC Rec_1stD Rec_TD Fum FumL Rush_Per_TD Rec_Per_TD FFPts0 LogNormCdf Rec_Per_TD_1 Rush_Per_TD_1 Rush_TD_1 Rec_TD_1 FFPts FFPtsN Rank New Rank Old
RB Ray Rice BAL 16 291 1364 85.3 4.7 12 76 704 44 9.3 52 9.2 30 3 2 2 24.25 25.33333 292.8 0.183094 23.80672 29.76091 9.777928 3.192375 280.62182 158.7998 1 1
RB Maurice Jones-Drew JAC 16 343 1606 100.4 4.7 8 43 374 23.4 8.7 48 9.8 18 3 6 1 42.88 14.33333 262 0.520781 16.43331 42.43739 8.082496 2.616637 260.1947976 138.3728 2 3
RB Arian Foster HOU 13 278 1224 94.2 4.4 10 53 617 47.5 11.6 78 12.1 19 2 5 3 27.80 26.5 250.1 0.249994 24.51103 32.10521 8.65903 2.162292 243.0279329 121.2059 3 4
RB LeSean McCoy PHI 15 273 1309 87.3 4.8 17 48 315 21 6.6 26 8.8 18 3 1 1 16.06 16 280.4 0.05537 17.57991 25.27579 10.80085 2.730389 241.587431 119.7654 4 2
RB Michael Turner ATL 16 301 1340 83.8 4.5 11 17 168 10.5 9.9 32 8.8 8 0 3 2 27.36 212.8 0.241639 31.81053 9.462275 0 203.5736476 81.75161 5 6
RB Marshawn Lynch SEA 15 285 1204 80.3 4.2 12 28 212 14.1 7.6 26 8.1 8 1 3 2 23.75 28 215.6 0.173974 25.38375 29.44301 9.679716 1.103068 202.2967045 80.47466 6 5
RB Steven Jackson STL 15 260 1145 76.3 4.4 5 42 333 22.2 7.9 50 7.6 17 1 2 1 52.00 42 181.8 0.646438 31.61723 48.20039 5.394148 1.32839 186.1352231 64.31318 7 11
RB Ryan Mathews SDG 14 222 1091 77.9 4.9 6 50 455 32.5 9.1 42 9.3 18 0 5 2 37.00 186.6 0.422678 38.45808 5.77252 0 185.2351183 63.41308 8 8
RB Michael Bush OAK 16 256 977 61.1 3.8 7 37 418 26.1 11.3 55 9.4 14 1 1 1 36.57 37 185.5 0.415045 29.7855 38.16249 6.708157 1.242215 185.2022349 63.38019 9 9
RB Darren Sproles NOR 16 87 603 37.7 6.9 2 86 710 44.4 8.3 39 8.4 35 7 0 0 43.50 12.28571 185.3 0.530443 15.07624 42.85039 2.03032 5.70434 177.7079564 55.88591 10 10
RB Reggie Bush MIA 15 216 1086 72.4 5 6 43 296 19.7 6.9 34 7.6 12 1 4 2 36.00 43 176.2 0.404778 31.93509 37.76765 5.719181 1.346481 176.5939721 54.77193 11 13
RB Matt Forte CHI 12 203 997 83.1 4.9 3 52 490 40.8 9.4 56 8.8 19 1 2 2 67.67 52 168.7 0.793148 34.17389 56.47246 3.594673 1.521629 175.397812 53.57577 12 15
RB Frank Gore SFO 16 282 1211 75.7 4.3 8 17 114 7.1 6.7 13 6.1 5 0 2 2 35.25 176.5 0.391156 37.24835 7.570805 0 173.92483 52.10279 13 12
RB Chris Johnson TEN 16 262 1047 65.4 4 4 57 418 26.1 7.3 34 6.8 13 0 3 1 65.50 168.5 0.777213 55.46099 4.724041 0 172.8442456 51.0222 14 16
RB Fred Jackson BUF 10 170 934 93.4 5.5 6 39 442 44.2 11.3 49 12.8 13 0 2 2 28.33 169.6 0.26023 32.4672 5.236054 0 165.0163236 43.19428 15 14
RB Adrian Peterson MIN 12 208 970 80.8 4.7 12 18 139 11.6 7.7 22 7 5 1 1 0 17.33 18 188.9 0.071217 18.96808 25.84139 8.049102 0.948963 164.8883862 43.06634 16 7
RB Shonn Greene NYJ 16 253 1054 65.9 4.2 6 30 211 13.2 7 36 7.2 6 0 1 0 42.17 162.5 0.509643 41.96655 6.02861 0 162.6716623 40.84962 17 18
RB Beanie Wells ARI 14 245 1047 74.8 4.3 10 10 52 3.7 5.2 10 2.2 1 0 4 2 24.50 165.9 0.187692 29.92123 8.188167 0 155.0290026 33.20696 18 17
RB Willis McGahee DEN 15 249 1199 79.9 4.8 4 12 51 3.4 4.3 12 3.9 2 1 4 3 62.25 12 149 0.750944 14.89466 53.86313 4.622828 0.805658 151.5709151 29.74887 19 22
RB Rashard Mendenhall PIT 15 228 928 61.9 4.1 9 18 154 10.3 8.6 35 9.3 5 0 1 1 25.33 160.2 0.203174 30.46166 7.48482 0 151.1089178 29.28688 20 19

I actually used this as draft guidance (I selected Ray Rice with my first pick in a recent draft). Let’s see if it holds water!

Fantasy Football Player Forecasting in R

I rewrote my previous post in R, mainly because Paul Rubin mentioned the fact that SAS costs too much on Twitter:

Click here to download the R source [estimate.r]

The R code is a bit shorter (120 lines instead of 157), but they are basically the same complexity. I am following exactly the same logic as last time so I won’t go into the football part of it. I will simply highlight a couple of things about the code.

I rely on read.csv to read each input file into a data frame. Once I have a data frame, the primary operation is to “score” the position. Here is the code to score the running back position. The “within” statement makes it easy.

score.rb <- function(rb) {
  rb <- within(rb, {
    FFPts <- (Rush.TD + Rec.TD) * PtsTD + FumL * PtsFum + Rush.Yds / RushYdsPt + Rec.Yds / RecYdsPt
  })
  return(rb)
}

Remember from the last post that the next thing I do is to “normalize” the scores by subtracting the score of the projected worst starter on each team. This means sorting the players by score and picking one in a particular slot. Here’s how that works:

normalize <- function(p, count) {
  FFPtsMin <- as.numeric(p[with(p, order(-as.numeric(FFPts))), ][count,"FFPts"])
  p$FFPtsN <- apply(p,1,function(row) (as.numeric(row["FFPts"]) - FFPtsMin))
  return(p)
}

So by calling score.rb and normalize in sequence, I get final scores for the RB position. Once I score all of the positions, I simply need to join the data frames for each position together into a final data frame (called “players”). I write this out to CSV. I have saved the complete rankings so you can see the results.

Click here to download player ratings [NFL 2011 Ratings.csv]

It was a lot easier for me to write this code in SAS because it’s much more familiar to me. Even taking my prejudice into account, for this task I prefer SAS to R because the use of DATA steps leads to very simple, easy to understand code. R has great libraries, better extensibility, and is free, but the language itself feels a bit clunky.

Fantasy Football Player Forecasting in less than 200 lines of SAS

In my last post I provided data for NFL players and teams for the 2011 season. In this post I develop a simple, pretty darn decent forecasting engine in less than 200 lines of SAS.

Click here to download the SAS source [estimate.sas].

For the uninitiated: fantasy football involves a bunch of 30-something males selecting players from real NFL teams and competing against each other for increasingly high stakes. The score for a fantasy team is computed by applying a set of scoring rules to the real-life performance of each player during each week of NFL season. For example, if touchdowns are valued at 6 points, and throwing an interception is penalized 2 points, if Drew Brees throws 4 TDs and 2 INTs his score for the week is 4 * 6 – 2 * 2 = 20. There are typically additional scoring rules that involve the number of yards gained by players, as well as the performances of kickers and defensive units based on more esoteric considerations. A fantasy football participant drafts a set of players (and defensive units) and selects a portion of them to “play” on his team each week. Typically you can play only a certain number of players of each position per week: for example 1 quarterback, 2 running backs, etc. Fantasy teams are matched against each other each week – the team with the highest combined team score wins.

So a smart fantasy football player tries to draft a combination of players that will result in the highest projected points per week. The forecasting engine described in this post computes a rating for each player that can be used to prioritize draft selection. The basic assumption behind the forecasting engine is that a player (or team’s) performance for the 2012 season will be exactly the same as 2011. This is obviously incorrect:

  • Players improve or decline in ability over time.
  • Players suffer injuries.
  • Rookies have no performance in 2011 since they didn’t play.
  • and so on.

All of these things can be accounted for, but I won’t here. That makes things simpler: all we really want to do is apply the rules of the league to compute the number of fantasy points for each player. Let’s take running backs as an example. In my league, running backs accumulate points as follows:

  • 1 point for every 10 rushing yards.
  • 1 point for every 10 receiving yards.
  • 6 points per touchdown.
  • 2 points deducted per fumble.
    So the first step is to read the running back data into a SAS dataset. Here’s a macro to do that:
** Read a CSV file into a SAS dataset.       **;
%macro ReadCSV(position);
  proc import datafile="C:\data\Football\NFL 2011 &position..csv" dbms=csv
    out=&position replace;
    getnames=yes;
  run;
%mend;

The next step is to score each player. That’s easily done using a SAS data step:

** Compute RB ratings. **;
%macro ScoreRB;
  %ReadCsv(RB);
  data rb;
    set rb;
    FFPts = (Rush_TD + Rec_TD) * &PtsTD + FumL * &PtsFum + Rush_Yds / &RushYdsPt + Rec_Yds / &RecYdsPt;
  run;
%mend;

Now the SAS table RB will have an additional column called FFPts that has the forecasted fantasy points for each player over the course of the season. I have introduced macro variables to represent, e.g. the number of points per touchdown. As you will see in the full code, you can customize those according to the rules for your league.

It’s pretty easy to write similar macros for quarterbacks, kickers, and so on. If you combined all of the resulting datasets and sorted them by FFPts, you’d have a “draft board” that could be used to select players. But this would stink. Why?

The reason is that simply sorting players by expected number of points does not take into account that when drafting players we also care about the variance between players of the same position. Here’s what I mean. By virtue of the scoring rules, quarterbacks usually score more fantasy points than tight ends on average. Consider a league where the average quarterback scores 400 points per year. Now suppose that tight ends score 200 points on average, but the best tight end in the league scores 280 (call him John Doe). Given the choice, it is smarter to draft John Doe over a quarterback that scores 400 because John will outscore his competition at that position by 80 points. 400 point QBs are easy to come by, but 280 point TEs are not.

Therefore I “center” the scores for each position by finding the score for the “worst starter” for each position. In other words, if my league has 12 teams then I find the score of the 12th best quarterback. Then I subtract that value from the scores of all quarterbacks. I know have a “position invariant” metric that I can use to compare players across positions. Computing centered scored is very easy using PROC MEANS:

** Create cross-position value estimates by subtracting the value of the projected  **;
** worst starter at that position. The number of league-wide starters for the       **;
** position are given by obscount. This value will depend on your league.           **;
%macro Normalize(position, obscount);
  proc sort data=&position;
    by descending FFPts;
  run;

  proc means data=&position.(obs=&obscount) min noprint;
    var FFPts;
    output out=&position._summ;
  run;

  data _null_;
    set &position._summ;
    if _STAT_='MIN';
    call symput('FFPtsMin', FFPts);
  run;

  data &position;
    length Pos $ 8;
    set &position;
    Pos = upcase("&Position");
    FFPtsN = FFPts - &FFPtsMin;
  run;
%mend;

We just need to call Normalize after we do the initial scoring. Again, here is the link to the full source.

Once this is done then we can combine all of the results and sort. What we get is a perfectly plausible draft board! Here are the first 25 players with both “raw” and “centered” points. Run the code to get ratings for all 640 players and teams. Poor Billy Volek is a the bottom, through no fault of his own.

Pos Name FFPts FFPtsN
QB Aaron Rodgers 487.42 216.2388
QB Drew Brees 449.6625 178.4813
RB Ray Rice 292.8 173.8
RB LeSean McCoy 280.4 161.4
WR Calvin Johnson 262.1 146.5
QB Tom Brady 416.5313 145.35
TE Rob Gronkowski 240.9 145.3
RB Maurice Jones-Drew 262 143
RB Arian Foster 250.1 131.1
QB Matthew Stafford 394.9875 123.8063
QB Cam Newton 379.35 108.1688
WR Jordy Nelson 216.3 100.7
TE Jimmy Graham 195 99.4
RB Marshawn Lynch 215.6 96.6
WR Wes Welker 210.9 95.3
RB Michael Turner 212.8 93.8
WR Victor Cruz 205.6 90
WR Larry Fitzgerald 189.1 73.5
RB Adrian Peterson 188.9 69.9
RB Ryan Mathews 186.6 67.6
RB Michael Bush 185.5 66.5
RB Darren Sproles 185.3 66.3
RB Steven Jackson 181.8 62.8
WR Roddy White 177.6 62
WR Steve Smith 177.4 61.8

2011 NFL Statistics by player and team in CSV format

More data, this time for fantasy football junkies! I have downloaded stats for the 2011 NFL season from yahoo.com, cleaned the data, and saved in CSV format. The files are located here. The column headers should be self-explanatory.

There are seven files:

  • QB: quarterback data.
  • RB: running backs.
  • WR: wide receivers.
  • TE: tight ends.
  • K: kickers. For the yardage columns, the first entry indicates FG made, the second attempted. So 4-5 means 4 made out of 5 from the distance range.
  • Def: defensive stats by team.
  • ST: special teams stats by team.

Enjoy!