NCAA Tournament Analytics Model 2014 Preview

My NCAA Tournament Prediction Model posts have traditionally been pretty popular, so I thought I would put in a bit more effort this year. In this post I want to share some “raw materials” that you might find helpful, and describe the methodology behind this year’s model.

Here are some resources that you might find helpful if you want to build your own computer-based model for NCAA picks:

This year I am going to combine two ideas to build my model. The first is a “win probability” model developed by Joel Sokol, which is described on Net Prophet. As the blog post says, this model estimates the probability that Team A will beat Team B on a neutral site, given that Team A beat Team B at home by a given number of points. So for example, if A loses to B by 40 at home, this probability is close to zero. You can hijack this model to assign a “strength of victory” rating: a blowout win is a greater show of team strength than a one-point thriller.
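To make the idea concrete, here is a minimal sketch of a margin-to-probability function. It assumes a logistic curve, and the function name `neutral_win_probability` and the `SLOPE` and `INTERCEPT` values are placeholders I made up for illustration; they are not the fitted coefficients from Sokol's model.

```python
import math

# Hypothetical placeholder coefficients, NOT the fitted values from Sokol's model.
SLOPE = 0.1
INTERCEPT = -0.3  # negative, so a narrow home win counts as less than a coin flip


def neutral_win_probability(home_margin, slope=SLOPE, intercept=INTERCEPT):
    """Estimate P(A beats B on a neutral site) from A's home margin against B.

    home_margin > 0 means A won at home; home_margin < 0 means A lost at home.
    """
    return 1.0 / (1.0 + math.exp(-(slope * home_margin + intercept)))


# A 40-point home loss should map to a probability near zero,
# and a blowout win should rate higher than a one-point thriller.
blowout_loss = neutral_win_probability(-40)
one_point_win = neutral_win_probability(1)
twenty_point_win = neutral_win_probability(20)
```

The negative intercept is one way to discount home-court advantage; whether that matches the published model is something to check against the original description.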

The second idea is a graph-theoretic approach stolen from this excellent post on BioPhysEngr Blog. The idea here is to create a giant network based on the results of individual games. So for example, if Iowa beats Ohio State then there is an arrow (a directed edge) between the Iowa and Ohio State nodes. The weight on the edge is a representation of the strength of the victory (or loss). Given this network we can apply an eigenvector centrality approach. In English, this means determining the importance of all of the nodes in the network, which in my application means the overall strength of each team. I like this approach because it is easy for me to code: computing the dominant eigenvector using the power method is simple enough for even Wikipedia to describe succinctly. (And shockingly enough, according to the inscription on my Numerical Analysis text written by the great Ken Atkinson, I learned it twenty years ago!)
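A rough sketch of the power method on a tiny made-up network looks like the following. The function name `power_method_ratings`, the sample games, and their weights are all hypothetical. The small `baseline` added to every entry is my own assumption to guarantee the iteration converges (it keeps the matrix strictly positive); it is not something taken from the BioPhysEngr post.

```python
def power_method_ratings(games, iterations=200, tol=1e-10, baseline=0.01):
    """Rate teams by eigenvector centrality using the power method.

    games: list of (winner, loser, weight) tuples, with weight > 0.
    Returns a dict mapping team name to a rating; ratings sum to 1.
    """
    teams = sorted({team for winner, loser, _ in games for team in (winner, loser)})
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)

    # credit[i][j] is the total weight team i earned by beating team j.
    # The baseline (an assumption) keeps every entry positive.
    credit = [[baseline] * n for _ in range(n)]
    for winner, loser, weight in games:
        credit[index[winner]][index[loser]] += weight

    ratings = [1.0 / n] * n
    for _ in range(iterations):
        # Beating a highly rated team is worth more than beating a weak one.
        updated = [sum(credit[i][j] * ratings[j] for j in range(n)) for i in range(n)]
        total = sum(updated)
        updated = [value / total for value in updated]
        delta = max(abs(a - b) for a, b in zip(updated, ratings))
        ratings = updated
        if delta < tol:
            break
    return dict(zip(teams, ratings))


# Made-up results, purely for illustration.
sample_games = [
    ("Iowa", "Ohio State", 0.9),
    ("Ohio State", "Michigan", 0.7),
    ("Michigan", "Iowa", 0.6),
    ("Iowa", "Michigan", 0.8),
]
sample_ratings = power_method_ratings(sample_games)
best_team = max(sample_ratings, key=sample_ratings.get)
```

Sorting `sample_ratings` by value gives the ranking. A dense list-of-lists matrix is fine for a few hundred teams; a sparse representation would be the natural next step for anything larger.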

The difference between my approach and the BioPhysEngr approach is that I am using Sokol’s win probability logic to calculate the edge weights. As you’ll see when I post the code, it’s about 150 lines of Python, including all the bits to read in the game data.
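One possible way to wire the two ideas together is sketched below: each game record becomes a weighted edge, with the weight coming from whatever win probability function you plug in. The record layout `(home, away, home_margin)`, the name `build_edges`, and the toy probability function are assumptions for illustration, not the exact scheme in my code.

```python
def build_edges(games, win_probability):
    """Turn (home, away, home_margin) records into (winner, loser, weight) edges.

    win_probability maps the home team's margin to the estimated probability
    that the home team would beat the same opponent on a neutral site.
    """
    edges = []
    for home, away, home_margin in games:
        p_home = win_probability(home_margin)
        if home_margin > 0:
            # Home team won: credit it with its neutral-site win probability.
            edges.append((home, away, p_home))
        else:
            # Away team won: credit it with the complementary probability.
            edges.append((away, home, 1.0 - p_home))
    return edges


def toy_probability(home_margin):
    """Stand-in for a real win probability model: linear in margin, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + home_margin / 80.0))


# Made-up game records, purely for illustration.
records = [("Iowa", "Ohio State", 12), ("Michigan", "Iowa", -40)]
edges = build_edges(records, toy_probability)
```

The resulting `(winner, loser, weight)` tuples are the kind of input an eigenvector centrality routine can consume directly.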

I ran a preliminary version of my code against all college basketball games up until March 9, and my model’s Top 25 is given below. The results are mostly reasonable, with a few odd ones (Manhattan? Canisius? Iona?). I will make a few tweaks and post my bracket after the selection show on Sunday.

 

1. Wichita St
2. Louisville
3. Villanova
4. Duke
5. Kansas
6. Florida
7. Arizona
8. Virginia
9. Michigan St
10. North Carolina
11. Ohio State
12. Wisconsin
13. Manhattan
14. Syracuse
15. Iowa
16. Kentucky
17. Iona
18. Pittsburgh
19. Creighton
20. VA Commonwealth
21. Tennessee
22. Oklahoma St
23. Michigan
24. Canisius
25. Connecticut

Author: natebrix

Follow me on twitter at @natebrix.
