2018 NCAA Tournament Picks

Every year since 2010 I have used data science to predict the results of the NCAA Men’s Basketball Tournament. In this post I will describe the methodology that I used to create my picks (full bracket here). The model has Virginia, Michigan, Villanova, and Michigan State in the Final Four with Virginia defeating Villanova in the championship game:

Screen Shot 2018-03-13 at 9.20.05 PM

Here are my ground rules:

  • The picks should not be embarrassingly bad.
  • I shall spend no more than one work day on this activity (and 30 minutes for this post). This year I spent two hours cleaning up and running my code from last year.
  • I will share my code and raw data. (The data is available on Kaggle. The code is not cleaned up but here it is anyway.)

I used a combination of game-by-game results and team metrics from 2003-2017 to build the features in my model. Here is a summary:

I also performed some post-processing:

  • I transformed team ranks to continuous variables given a heuristic created by Jeff Sonos.
  • Standard normalization.
  • One hot encoding of categorical features.
  • Upset generation. I found the results to be not interesting enough, so I added a post-processing function that looks for games where the win probability for the underdog (a significantly lower seed) is quite close to 0.5. In those cases the model picks the underdog instead.

The model predicts the probability one team defeats another, for all pairs of teams in the tournament. The model is implemented in Python and uses logistic regression. The model usually performs well. Let’s see how it does this year!


Author: natebrix

Follow me on twitter at @natebrix.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s