My NCAA Tournament Prediction Model posts have traditionally been pretty popular, so I thought I would put in a bit more effort this year. In this post I want to share some “raw materials” that you might find helpful, and describe the methodology behind this year’s model.
Here are some resources that you might find helpful if you want to build your own computer-based model for NCAA picks:
- The Net Prophet blog has useful descriptions of a number of approaches, including Microsoft Research’s TrueSkill ratings.
- Joel Sokol’s LRMC rankings are here.
- This page has the results of every men’s college basketball game played this season.
- Jeff Sagarin’s (the godfather!) college basketball ratings are located here.
- My previous posts are located here. As you can see I did a pretty bad job last year.
This year I am going to combine two ideas to build my model. The first is a “win probability” model developed by Joel Sokol, which is described on Net Prophet. As the blog post says, this model estimates the probability that Team A will beat Team B on a neutral site, given that Team A beat Team B at home by a certain number of points. So, for example, if A loses to B by 40 at home, this probability is close to zero. You can hijack this model to assign a “strength of victory” rating: a blowout win is a greater show of team strength than a one-point thriller.
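To make the “strength of victory” idea concrete, here is a minimal sketch of a logistic curve that maps a home point margin to a neutral-site win probability. The function name and both coefficients are placeholders I made up for illustration; they are not the fitted values from Sokol’s model, which you would need to take from the Net Prophet description or fit yourself.

```python
import math

# Placeholder coefficients for illustration only.
# They are NOT the fitted values from Sokol's model.
SLOPE = 0.1
INTERCEPT = -0.5


def neutral_site_win_probability(home_margin, slope=SLOPE, intercept=INTERCEPT):
    """Map Team A's home point margin over Team B to a probability in (0, 1).

    A positive margin means A won at home; a negative margin means A lost.
    The negative intercept discounts the win a little for home-court advantage.
    """
    return 1.0 / (1.0 + math.exp(-(slope * home_margin + intercept)))


if __name__ == "__main__":
    for margin in (-40, -1, 1, 20):
        print(margin, round(neutral_site_win_probability(margin), 3))
```

With these placeholder numbers a 40-point home loss maps to a probability near zero and a blowout win maps to a value near one, which is the behavior the paragraph above describes.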
The second idea is a graph theoretical approach stolen from this excellent post on BioPhysEngr Blog. The idea here is to create a giant network based on the results of individual games. So, for example, if Iowa beats Ohio State, then there is a directed edge (an arrow) between the Iowa and Ohio State nodes. The weight on the edge represents the strength of the victory (or loss). Given this network we can apply an eigenvector centrality approach. In English, this means determining the importance of every node in the network, which in my application means the overall strength of each team. I like this approach because it is easy for me to code: computing the dominant eigenvector (the one that goes with the largest eigenvalue) using the power method is simple enough for even Wikipedia to describe succinctly. (And shockingly enough, according to the inscription on my Numerical Analysis text written by the great Ken Atkinson, I learned it twenty years ago!)
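Here is a rough sketch of the power method applied to a small game network. It is not the code behind this post. The function name, the tiny three-team example, and the small `damping` term (which I added so the iteration converges even when some team never wins) are all my own assumptions.

```python
def power_method_ratings(games, iterations=200, damping=0.01):
    """Rate teams by the dominant eigenvector of a weighted game matrix.

    games: iterable of (credited_team, opponent, weight) tuples.
    Returns a dict mapping team name to a rating; ratings sum to 1.
    """
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)

    # W[i][j] holds the total credit team i earned against team j.
    W = [[0.0] * n for _ in range(n)]
    for credited, opponent, weight in games:
        W[index[credited]][index[opponent]] += weight

    ratings = [1.0 / n] * n
    for _ in range(iterations):
        # A team is strong if it earned credit against strong teams.
        # The damping term is an assumption that keeps every rating positive.
        updated = [
            sum(W[i][j] * ratings[j] for j in range(n)) + damping / n
            for i in range(n)
        ]
        total = sum(updated)
        ratings = [value / total for value in updated]
    return dict(zip(teams, ratings))


if __name__ == "__main__":
    # Made-up results: A beat B and C, B beat C.
    toy_games = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0)]
    for team, rating in sorted(power_method_ratings(toy_games).items()):
        print(team, round(rating, 3))
```

Each pass multiplies the current ratings by the game matrix and rescales them, so after enough passes the ratings settle on the dominant eigenvector.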
The difference between my approach and the BioPhysEngr approach is that I am using Sokol’s win probability logic to calculate the edge weights. As you’ll see when I post the code, the whole thing is about 150 lines of Python, including all the bits to read in the game data.
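One way the two ideas might fit together is sketched below: each game contributes a win-probability-sized share of credit to the home team and the remainder to the visitor. This split is my guess at a reasonable weighting, not necessarily what the final code does, and `build_edge_weights`, `placeholder_curve`, and the sample scores are invented for illustration.

```python
import math
from collections import defaultdict


def build_edge_weights(games, win_prob):
    """Turn game results into weighted edges.

    games: iterable of (home, away, home_score, away_score) tuples.
    win_prob: function mapping the home margin to a probability in (0, 1).
    Returns {(credited_team, opponent): total_weight}.
    """
    edges = defaultdict(float)
    for home, away, home_score, away_score in games:
        p = win_prob(home_score - away_score)
        edges[(home, away)] += p        # credit earned by the home team
        edges[(away, home)] += 1.0 - p  # remaining credit goes to the visitor
    return dict(edges)


def placeholder_curve(margin):
    # Stand-in logistic curve with a made-up slope, not Sokol's fitted model.
    return 1.0 / (1.0 + math.exp(-0.1 * margin))


if __name__ == "__main__":
    # Made-up scores: A wins by 10 at home, then loses by 1 on the road.
    sample_games = [("A", "B", 80, 70), ("B", "A", 66, 65)]
    for edge, weight in sorted(build_edge_weights(sample_games, placeholder_curve).items()):
        print(edge, round(weight, 3))
```

The resulting `(credited_team, opponent, weight)` triples are the kind of input an eigenvector centrality calculation needs.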
I ran a preliminary version of my code against all college basketball games played through March 9, and my model’s Top 25 is given below. It is mostly reasonable, with a few odd results (Manhattan? Canisius? Iona?). I will make a few tweaks and post my bracket after the selection show on Sunday.
| Rank | Team |
| --- | --- |
| 1 | Wichita St |
| 2 | Louisville |
| 3 | Villanova |
| 4 | Duke |
| 5 | Kansas |
| 6 | Florida |
| 7 | Arizona |
| 8 | Virginia |
| 9 | Michigan St |
| 10 | North Carolina |
| 11 | Ohio State |
| 12 | Wisconsin |
| 13 | Manhattan |
| 14 | Syracuse |
| 15 | Iowa |
| 16 | Kentucky |
| 17 | Iona |
| 18 | Pittsburgh |
| 19 | Creighton |
| 20 | VA Commonwealth |
| 21 | Tennessee |
| 22 | Oklahoma St |
| 23 | Michigan |
| 24 | Canisius |
| 25 | Connecticut |