…well, not exactly. But it’s snappier if I put it that way.
What I really mean is: the number of pass attempts (or receptions, or carries) per touchdown is lognormally distributed, and that fact can be used to produce more stable fantasy football forecasts.
In my last two posts, I laid out simple fantasy football forecasting engines in SAS and R. An important component of a fantasy football score is the number of touchdowns scored by each player. Touchdowns can vary considerably among players with otherwise similar performance. For example, let’s look at the top three running backs from my previous post:
LeSean McCoy scored more than twice as many touchdowns as Maurice Jones-Drew. He scored several more than Ray Rice, yet the three have otherwise very similar stats. The gut instinct that drives this post: I don’t think LeSean McCoy is going to score that many touchdowns this year!
How can I analyze touchdowns? I could simply draw a histogram of touchdowns per player, but that wouldn’t be very insightful. Players who get the ball more are more likely to score more touchdowns. So let’s control for that by charting the touchdown rate: the number of rushing attempts per touchdown. The histogram of rushing attempts per touchdown for the top 60 running backs in my 2011 dataset is interesting:
To my eye, it looks lognormally distributed. It’s not perfect, but it looks like a very reasonable approximation. A lognormal distribution makes sense – we expect that the distribution would be “heavy tailed” because going towards the left (1 touchdown per rush) is much harder than going to the right. Nobody scores every time they get the ball. Here is the SAS code that produces the histogram and the best fitting lognormal distribution. (I’m not doing this in R because I don’t know how to fit distributions in that environment. I am sure it is easy to do.)
** Plot a histogram, and save the lognormal distribution parameters. **;
proc univariate data=rb(obs=60) noprint;
  var Rush_Per_TD;
  histogram / lognormal nendpoints=15 cfill=blue outhistogram=rb_hist;
  ods output ParameterEstimates=rb_fit;
run;
The options for the “histogram” statement specify the distribution type, chart style, and an output dataset for the bins (which I then copied over to the free Excel 2013 preview to make a less-crappy looking chart). The “ods output” statement is a fancy way to save the lognormal parameters into a dataset for later use.
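As an aside, the equivalent fit is only a couple of lines in Python with scipy. Here is a sketch on synthetic data standing in for the real rush-per-TD figures (the parameters 3.3 and 0.6 are made up for illustration; note that scipy reports the shape directly, while SAS’s “Scale” corresponds to the log of scipy’s scale):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic rush-per-TD figures for 60 backs, standing in for the
# real 2011 dataset from the previous post.
rush_per_td = rng.lognormal(mean=3.3, sigma=0.6, size=60)

# Fit a lognormal with the location fixed at zero, matching the
# two-parameter (Scale/Shape) fit that PROC UNIVARIATE reports.
shape, loc, scale = stats.lognorm.fit(rush_per_td, floc=0)

# SAS's "Scale" is np.log(scale) here; "Shape" corresponds to shape.
print(np.log(scale), shape)
```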
I can understand why there is a wide variation of values. Off the top of my head:
- Skill of the RB.
- Skill of the offensive line that blocks for the RB.
- How often the player gets carries near the goalline.
- Some teams call more red zone rush plays than others.
- Quality of opposition.
- Stuff like this. (This moment still burns…)
With these reasons in mind, I certainly don’t expect that all RBs will end up with the same rush/TD ratio in the long run. However, I think it is likely that players on either end of the distribution in 2011 will be closer to the middle in 2012. Here’s what we can do: compute the cumulative distribution function (CDF) of the fitted lognormal distribution at each player’s rush/TD ratio. This is a number between 0 and 1 that indicates “how extreme” the player is – 0 means all the way on the left. For example, LeSean McCoy is 0.0553 and Maurice Jones-Drew is 0.5208. This means that LeSean McCoy is an outlier (close to 0), and MJD is not (close to 1/2).
To project next year’s ratio, I take a weighted average of the player’s lognormal CDF and the middle of the distribution (0.5). I somewhat arbitrarily chose to take 2/3 times the CDF plus 1/3 times 0.5. This means that while I believe players will regress to the mean somewhat, I also believe there are significant structural differences between players that will persist from one season to the next.
Once I have the projected rush/TD figures, I can multiply by rushes and get a projected 2012 TD figure that I can use in fantasy scoring. If I take the rather large leap that touchdowns for all positions behave in this way, I can write a generic “normalizing” function that I can use for touchdowns at all positions.
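The recalibration arithmetic is easy to check outside of SAS. Here is a Python sketch with scipy; the fitted parameters (mu = 3.3, sigma = 0.6), the 16.0 rush-per-TD ratio, and the 273 carries are all made-up inputs for illustration:

```python
from math import exp
from scipy.stats import lognorm

# Hypothetical lognormal fit: mu is SAS's "Scale", sigma is SAS's "Shape".
mu, sigma = 3.3, 0.6

def recalibrate(ratio, rushes):
    """Shrink a rush-per-TD ratio toward the median, then reproject TDs."""
    p = lognorm.cdf(ratio, s=sigma, scale=exp(mu))
    # 2/3 weight on the player's own percentile, 1/3 on the median.
    p_new = (2 / 3) * p + (1 / 3) * 0.5
    new_ratio = lognorm.ppf(p_new, s=sigma, scale=exp(mu))
    return new_ratio, rushes / new_ratio

# An extreme player (unusually low ratio) moves toward the middle, so his
# projected touchdowns drop relative to a naive repeat of last season.
new_ratio, proj_td = recalibrate(ratio=16.0, rushes=273)
print(new_ratio, proj_td)
```

A player sitting exactly at the median is left unchanged, which is a quick sanity check on the formula.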
** Recalibrate a variable with the assumption that it is lognormally distributed. **;
** -- position: a dataset with player information. It should have a variable called **;
**    CalibrateVar. **;
** -- obscount: the number of observations to use for analysis. **;
** -- CalibrateVar: the variable under analysis. **;
** The macro will create a new variable ending in _1 with the calibrated values. **;
%macro Recalibrate(position, obscount, CalibrateVar);
  ** Sort the data by the initial score computed in my first post. **;
  proc sort data=&position;
    by descending FFPts0;
  run;

  ** Plot a histogram, and save the lognormal distribution parameters. **;
  proc univariate data=&position(obs=&obscount) noprint;
    var &CalibrateVar;
    histogram / lognormal nendpoints=15 cfill=blue outhistogram=&position._hist;
    ods output ParameterEstimates=&position._fit;
  run;

  ** Get the lognormal parameters into macro variables so I can use them for computation. **;
  data _null_;
    set &position._fit;
    if Parameter = 'Scale' then call symput('Scale', Estimate);
    if Parameter = 'Shape' then call symput('Shape', Estimate);
  run;

  ** Compute the projected values for each player using the distribution. **;
  data &position;
    set &position;
    LogNormCdf = cdf('LOGNORMAL', &CalibrateVar, &Scale, &Shape);
    &CalibrateVar._1 = quantile('LOGNORMAL', 0.67 * LogNormCdf + 0.33 * 0.5, &Scale, &Shape);
  run;
%mend;
A call to this macro looks like this:
%Recalibrate(rb, 60, Rush_Per_TD);
After this call I will have a variable called Rush_Per_TD_1 in my rb dataset.
I have modified the forecasting engine to recalibrate touchdowns for all positions – see estimate2.sas. You can see below how the rankings change when I recalibrate: here are the top 20 running backs. Players in green moved up in the ratings after recalibration; players in red moved down. Unsurprisingly, LeSean McCoy moved down.
[Table: top 20 running backs, with columns Pos, Name, Team, G, Rush, Rush_Yds, Rush_YG, Rush_Avg, Rush_TD, Rec, Rec_Yds, Rec_YG, Rec_Avg, Rec_Lng, YAC, Rec_1stD, Rec_TD, Fum, FumL, Rush_Per_TD, Rec_Per_TD, FFPts0, LogNormCdf, Rec_Per_TD_1, Rush_Per_TD_1, Rush_TD_1, Rec_TD_1, FFPts, FFPtsN, Rank New, Rank Old]
I actually used this as draft guidance (I selected Ray Rice with my first pick in a recent draft). Let’s see if it holds water!
In my last post I provided data for NFL players and teams for the 2011 season. In this post I develop a simple, pretty darn decent forecasting engine in less than 200 lines of SAS.
For the uninitiated: fantasy football involves a bunch of 30-something males selecting players from real NFL teams and competing against each other for increasingly high stakes. The score for a fantasy team is computed by applying a set of scoring rules to the real-life performance of each player during each week of the NFL season. For example, if touchdowns are valued at 6 points and throwing an interception is penalized 2 points, then if Drew Brees throws 4 TDs and 2 INTs, his score for the week is 4 * 6 – 2 * 2 = 20. There are typically additional scoring rules that involve the number of yards gained by players, as well as the performances of kickers and defensive units based on more esoteric considerations. A fantasy football participant drafts a set of players (and defensive units) and selects a portion of them to “play” on his team each week. Typically you can play only a certain number of players of each position per week: for example 1 quarterback, 2 running backs, etc. Fantasy teams are matched against each other each week – the team with the highest combined team score wins.
So a smart fantasy football player tries to draft a combination of players that will result in the highest projected points per week. The forecasting engine described in this post computes a rating for each player that can be used to prioritize draft selection. The basic assumption behind the forecasting engine is that a player’s (or team’s) performance for the 2012 season will be exactly the same as in 2011. This is obviously incorrect:
- Players improve or decline in ability over time.
- Players suffer injuries.
- Rookies have no performance in 2011 since they didn’t play.
- and so on.
All of these things can be accounted for, but I won’t here. That makes things simpler: all we really want to do is apply the rules of the league to compute the number of fantasy points for each player. Let’s take running backs as an example. In my league, running backs accumulate points as follows:
- 1 point for every 10 rushing yards.
- 1 point for every 10 receiving yards.
- 6 points per touchdown.
- 2 points deducted per fumble.
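These rules boil down to one line of arithmetic per player. Here’s a quick Python sketch with a made-up stat line (the real computation happens in the SAS data step later in the post):

```python
def rb_points(rush_yds, rec_yds, tds, fumbles_lost):
    """Score a running back's season under the league rules above."""
    return rush_yds / 10 + rec_yds / 10 + 6 * tds - 2 * fumbles_lost

# Made-up stat line: 1200 rushing yards, 300 receiving yards,
# 12 total touchdowns, 2 fumbles lost.
print(rb_points(1200, 300, 12, 2))  # → 218.0
```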
So the first step is to read the running back data into a SAS dataset. Here’s a macro to do that:
** Read a CSV file into a SAS dataset. **;
%macro ReadCSV(position);
  proc import datafile="C:\data\Football\NFL 2011 &position..csv"
    dbms=csv out=&position replace;
    getnames=yes;
  run;
%mend;
The next step is to score each player. That’s easily done using a SAS data step:
** Compute RB ratings. **;
%macro ScoreRB;
  %ReadCsv(RB);
  data rb;
    set rb;
    FFPts = (Rush_TD + Rec_TD) * &PtsTD + FumL * &PtsFum
          + Rush_Yds / &RushYdsPt + Rec_Yds / &RecYdsPt;
  run;
%mend;
Now the SAS table RB will have an additional column called FFPts that has the forecasted fantasy points for each player over the course of the season. I have introduced macro variables to represent, e.g. the number of points per touchdown. As you will see in the full code, you can customize those according to the rules for your league.
It’s pretty easy to write similar macros for quarterbacks, kickers, and so on. If you combined all of the resulting datasets and sorted them by FFPts, you’d have a “draft board” that could be used to select players. But this would stink. Why?
The reason is that simply sorting players by expected number of points does not take into account that when drafting players we also care about the variance between players of the same position. Here’s what I mean. By virtue of the scoring rules, quarterbacks usually score more fantasy points than tight ends on average. Consider a league where the average quarterback scores 400 points per year. Now suppose that tight ends score 200 points on average, but the best tight end in the league scores 280 (call him John Doe). Given the choice, it is smarter to draft John Doe over a quarterback that scores 400 because John will outscore his competition at that position by 80 points. 400 point QBs are easy to come by, but 280 point TEs are not.
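This “value over the worst starter” idea is easy to sketch in Python, using the made-up point totals from the paragraph above:

```python
def centered(scores, n_starters):
    """Subtract the worst starter's score from every score in the list."""
    worst_starter = sorted(scores, reverse=True)[n_starters - 1]
    return [s - worst_starter for s in scores]

# Toy 2-team league: two QBs and two TEs start league-wide.
qbs = [405, 400]        # 400-point QBs are easy to come by...
tes = [280, 200]        # ...but John Doe (280) towers over the other TEs.

print(centered(qbs, 2))  # → [5, 0]
print(centered(tes, 2))  # → [80, 0]
```

The 400-point quarterback is worth almost nothing over his replacement, while John Doe is worth 80 points, which is exactly why he should go first.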
Therefore I “center” the scores for each position by finding the score of the “worst starter” at each position. In other words, if my league has 12 teams, I find the score of the 12th-best quarterback. Then I subtract that value from the scores of all quarterbacks. I now have a “position-invariant” metric that I can use to compare players across positions. Computing centered scores is very easy using PROC MEANS:
** Create cross-position value estimates by subtracting the value of the projected **;
** worst starter at that position. The number of league-wide starters for the **;
** position are given by obscount. This value will depend on your league. **;
%macro Normalize(position, obscount);
  proc sort data=&position;
    by descending FFPts;
  run;

  proc means data=&position.(obs=&obscount) min noprint;
    var FFPts;
    output out=&position._summ;
  run;

  data _null_;
    set &position._summ;
    if _STAT_='MIN';
    call symput('FFPtsMin', FFPts);
  run;

  data &position;
    length Pos $ 8;
    set &position;
    Pos = upcase("&Position");
    FFPtsN = FFPts - &FFPtsMin;
  run;
%mend;
We just need to call Normalize after we do the initial scoring. Again, here is the link to the full source.
Once this is done we can combine all of the results and sort. What we get is a perfectly plausible draft board! Here are the first 25 players with both “raw” and “centered” points. Run the code to get ratings for all 640 players and teams. Poor Billy Volek is at the bottom, through no fault of his own.
Andy Katz from ESPN.com reports that the Big Ten divisions will be:
|Division A||Division B|
|Michigan State||Ohio State|
Yesterday I posted several Solver Foundation models that attempted to find a realignment that is “as fair as possible”. If you take a characterization of a program’s historical strength to be its Sagarin rating over the past twelve years, and you are looking to build two evenly matched divisions then this is an extremely fair proposal. The average Sagarin rating is almost identical:
|Division A||Division B|
|Michigan State||75.82||Ohio State||87.67|
|Average A||77.25||Average B||77.27|
In fact, this is the fairest possible realignment that follows these rules:
- Six teams per division.
- Preserve the Michigan, Ohio, and Indiana in-state rivalries. (But not Illinois.)
- “Fairness” is measured by Sagarin rating.
No artificial rules about splitting Michigan and Ohio State are required – that happens naturally as a result of trying to find a fair split. Division A has 427 total conference wins since ‘93, Division B has 412. Division A has 724 total wins versus Division B’s 708. Note however that Nebraska is in Division A and that it had a run of near perfection in the early 90’s. The average attendance for Division A schools is 69,128 versus 74,035 in Division B; much of the difference is due to Northwestern.
I left a small cliffhanger in my last post. After a long week I finally had a chance to read through the Adams paper about estimating the value of “going for it” on 4th down. I admit I was a little bit let down. As a reminder – the question is what action a football team should take on fourth down. Failure to gain the necessary yards means the ball is turned over to the opposing side, kicking turns over the ball but with better field position, and making the first down allows the drive to continue, potentially leading to more points. The conclusion of the Romer paper was that coaches are too conservative and kick the ball away in situations where they should go for it instead.
Adams hits the nail on the head by asserting that the results of the Romer paper just do not pass the “smell test”. It’s nuts to suggest that it’s a good idea to go for it on 4th and 4 on your own 25 yard line. But that leaves us only with more questions – is the conclusion of the Romer paper still valid, even if overstated? Can we identify a flaw in the reasoning? Is there a better way to model the problem?
Adams’ first suggestion for improving the model is to include more historical data. Adams and Romer both claim it’s hard to come up with a good model for the “going for it” problem because teams seldom go for it on fourth down in practice – data is hard to come by. Both use game data from the 1998–2000 seasons, but Adams uses data from the entire game, not just the first quarter. But why not include more recent data? [The Adams paper was written in '06, so he could have doubled the data set. We have a couple more seasons’ worth of data now.] So I’m not sure I even buy the premise that data is lacking.
Adams’ second approach is to use Madden ’07 to simulate 4th down situations. I initially thought this was a really cool idea, and it kind of is, and then I remembered something I once read. Madden himself asked the designers at EA to make 4th downs more difficult to convert! You cannot find a better example of Galbraith’s notion of “conventional wisdom” in action. So as far as I am concerned, you have to throw out the middle section of the paper. Madden is not a simulation: it is pretending to be a simulation. It wants to make you feel like you are experiencing real NFL football. But the problem is that we as players do not make decisions the way that GMs, coaches, and players do. Our motivations are completely different, and there are no real consequences for our actions (other than bragging rights over your roommate). My GM will not fire me if I go for it on 4th and 5 on my own 25. Thus the game must be tuned to correct for this, otherwise you will get Tecmo-like gameplay.
The last section proposes a game-theoretic approach. Adams introduces a zero-sum game with the offense and defense as opponents. The offense and defense both have the choice of choosing a pass- or run-oriented strategy. The payoffs depend on their choices. Adams points out that this is a “simplified version of reality.” (It’s very close to the original Tecmo Bowl - two choices instead of four.) He uses this approach primarily to make the point that it is not a good idea (as Romer proposes) to use third down data to model fourth down choices, because the payoffs change enough to matter. It is an interesting line of argument for the claim that Romer’s conclusions are overstated, but it does not provide insight into how to better model the problem.
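To make the game-theoretic idea concrete, here is a Python sketch of a 2x2 zero-sum run/pass game. The payoffs (expected points for the offense) are numbers I made up; Adams estimates his from data. With no saddle point, the offense’s equilibrium mix follows from the standard closed-form solution for 2x2 games:

```python
# Offense's expected points on a hypothetical 4th-and-short play.
# Rows: offense calls run / pass.  Columns: defense expects run / pass.
A = [[0.2, 0.9],   # run vs run-defense, run vs pass-defense
     [0.8, 0.3]]   # pass vs run-defense, pass vs pass-defense

def solve_2x2_zero_sum(A):
    """Offense's equilibrium probability of playing the first row, and the
    game value, assuming an interior mixed equilibrium (no saddle point)."""
    (a, b), (c, d) = A
    denom = a - b - c + d
    p = (d - c) / denom                 # P(offense runs)
    value = (a * d - b * c) / denom     # expected points at equilibrium
    return p, value

p_run, value = solve_2x2_zero_sum(A)
print(p_run, value)
```

With these invented payoffs the offense runs about 42% of the time and the play is worth 0.55 expected points; the key point is that these numbers would look different on 3rd down than on 4th down, which is Adams’ objection to Romer’s substitution.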
Anyway, in the course of poking around the web I came across the ZEUS Football simulation engine. It is frequently referenced in the NYTimes “5th down” blog. For example, here is an interesting discussion about taking an intentional safety late in the game. (I won’t bother to explain what that means, because if you have made it this far, you clearly already know what I am talking about.)
All the questions I raised at the beginning of this post are probably best answered by a simulation engine. Which reminds me – did I mention that Solver Foundation is adding stochastic capabilities for our version 2?
Last weekend marked the first big college football Saturday of the year. The only game I really cared about was the Northern Iowa – Iowa game: I went to Iowa, and I grew up in Cedar Falls (home of the UNI campus). Iowa came from behind and won 17-16 after blocking two field goal attempts in the final seconds. My brother and I were talking on the phone during the 4th quarter. My brother, a UNI fan, was bothered by the conservative coaching that he feels let Iowa back into the game, and that spun into a more general conversation about risk-averse coaching. I don’t know anything about sabermetrics, but I did read Moneyball, and I love sports.
Our conversation reminded me of a paper by economist David Romer that I had always intended to read: “It’s Fourth Down and What Does the Bellman Equation Say?” (I actually recommend this updated 2005 version.) So I read it. It received some attention from the sports world when it came out, because Romer’s claim is that the conventional wisdom is wrong: it’s often a much better idea to try to convert on 4th down than to kick the ball away to the other team. It depends on field position and yards-to-go, of course. He sets up a not-too-complicated dynamic programming model that places values on particular game situations, and then compares the difference between kicking and “going for it”. It’s interesting, but I have some problems with it – in particular the use of 3rd down outcomes rather than 4th down outcomes. The justification is that because teams don’t go for it on 4th down very often, there is not enough data, so 3rd down data is a reasonable substitute. I have issues with this because, for one thing, playcalling is very different on 3rd down, especially when a team is approaching field goal range. Players are also taught to handle 3rd down differently – throw the ball away and avoid taking a sack. Romer does address these sorts of issues, but it still bothers me.
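The flavor of Romer’s calculation can be shown with a drastically simplified expected-value comparison. Every number below is invented for illustration; Romer estimates the situation values and conversion rates from play-by-play data:

```python
# Toy 4th-and-4 decision at your own 25-yard line, all values invented.
# Each value is the expected points of the resulting game situation.
p_convert = 0.55      # chance of making the first down
v_success = 0.8       # fresh set of downs near your own 30
v_fail    = -1.8      # opponent takes over at your 25
v_punt    = -0.6      # opponent takes over after a typical punt

go_for_it = p_convert * v_success + (1 - p_convert) * v_fail
print(go_for_it, v_punt)
```

With these particular numbers going for it (-0.37) edges out punting (-0.6); Romer’s claim is that the real estimates often come out this way too, which is exactly what strikes people as counterintuitive.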
To overcome the lack of 4th down “going for it” data, Christopher Adams from the FTC (!) uses Madden NFL 07 (!!) for simulation purposes and constructs a game theoretic model for 4th down attempts in this paper. He comes to a different conclusion: the conventional wisdom may not be so bad after all. I am looking forward to reading the Adams paper in detail – it’s in my backpack. I hope to do some experimentation in this area once I get a grip on the concepts. I would like to write a paper about the prevent defense that Gregg Easterbrook and Bill Simmons despise so much! But right now, I’ve got to get back to work.