A friend of mine asked me if I wanted to participate in an NCAA tournament pool. The twist: you have to write a program to predict the results. Here are the rules I was given:

The algorithm for your [Java] code goes below. As you can see in the method header, you are passed two objects: Team A and Team B. Spend some time thinking about this and writing your algorithm out, as once you submit your code you won’t get any feedback until your bracket is sent to you. The team class is a class defined by us, here’s what it looks like:

    public class Team {
        public String name;
        public int seed, RPI_rank;
        public double points_scored, points_against, wins, losses, RPI;

        public Team(String n, int s, double rpi, int rpi_rank, int w, int l, double ps, double pa) {
            name = n;
            seed = s;
            RPI = rpi;
            RPI_rank = rpi_rank;
            wins = w;
            losses = l;
            points_scored = ps;
            points_against = pa;
        }
    }

I need to fill this in:

    public Team Game(Team A, Team B, int round) {
        // Fill me in
    }

I gave myself one hour to come up with something. For better or worse, here’s what I did. I decided that I wouldn’t "cheat", i.e., use any information other than what I had been given. Otherwise a reasonable approach would be to have a giant switch statement based on name, and look up the Sagarin ratings for each team! I also noticed right away that I don’t have any information about past games between teams. So it seems clear that I need a "scoring" based approach, where I compute a metric for Team A and Team B and return the team with the higher score.

My first idea was to try to come up with a metric based on points_scored and points_against. I remembered from Mathletics that there is a "pythagorean expectation" formula for predicting win percentage, and I quickly learned that there is a variant for basketball. But I have a big problem: I don’t have the "defensive and offensive efficiencies", only average points for and against. A simple hack is to scale these values so that each team’s combined total (points for plus points against) matches the average combined points per game this year, which appears to be 137.3137725. (I downloaded stats from ncaa.org as CSV and threw them in an Excel spreadsheet.) Once you have the "normalized" points per game on offense and defense, you can apply the formula on the Wikipedia page with 11.5 as the exponent. Here are the first few records:

    Name       OPP PTS  OPP PPG  PPG   TotalPPG  NormPPG   NormOpp   Pythag
    Kansas     2169     63.8     81.8  145.6     77.1446   60.16908  0.945732
    Murr. St.  2056     60.5     77.5  138       77.11461  60.19915  0.945204
    BYU        2216     65.2     83    148.2     76.90312  60.41064  0.941358
    Duke       2100     61.8     78    139.8     76.61283  60.70093  0.935671
    Cst. Car.  2038     59.9     74.6  134.5     76.16065  61.15312  0.925796
    Utah St.   2028     59.6     73.7  133.3     75.91916  61.39460  0.919973
    Syracuse   2140     66.9     81.5  148.4     75.41153  61.90223  0.906370
    Kentucky   2219     65.3     79.2  144.5     75.26125  62.05252  0.901971
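To make the arithmetic behind the Pythag column concrete, here is a minimal Java sketch of the calculation for a single team. The class name `PythagSketch` and the helpers `normalize` and `pythag` are made up for illustration; the constants 137.3137725 and 11.5 are the ones quoted in the text.

```java
// Sketch only: class and method names are illustrative, not part of the pool's code.
class PythagSketch {
    // Average combined points per game (both teams), as quoted in the text.
    static final double AVG_TOTAL_POINTS = 137.3137725;
    // Exponent used for the basketball variant, as quoted in the text.
    static final double EXPONENT = 11.5;

    /** Pythagorean expectation from points scored and points allowed per game. */
    static double pythag(double scored, double against) {
        double s = Math.pow(scored, EXPONENT);
        double a = Math.pow(against, EXPONENT);
        return s / (s + a);
    }

    /** Rescales a per-game value so the team's combined total equals the average. */
    static double normalize(double value, double scored, double against) {
        return AVG_TOTAL_POINTS / (scored + against) * value;
    }

    public static void main(String[] args) {
        double ppg = 81.8, oppPpg = 63.8; // Kansas, taken from the table
        double normPpg = normalize(ppg, ppg, oppPpg);
        double normOpp = normalize(oppPpg, ppg, oppPpg);
        System.out.printf("NormPPG=%.5f NormOpp=%.5f Pythag=%.6f%n",
                normPpg, normOpp, pythag(normPpg, normOpp));
        // The same formula applied to the unscaled values, for comparison.
        System.out.printf("Raw Pythag=%.6f%n", pythag(ppg, oppPpg));
    }
}
```

Run with the Kansas figures (81.8 scored, 63.8 allowed), this should reproduce that row of the table to within rounding. It also makes one thing visible: the same scale factor multiplies both points scored and points allowed, so it cancels out of the ratio, and the normalized and raw inputs give the same Pythag value.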

You can see the problem: this metric doesn’t account for quality of opposition. Teams that beat up on bad teams will be unjustly rewarded. Murray State has had a good year, but they are not the second-best team in the country! So I decided to weight this factor equally with RPI, simply adding the two together. RPI attempts to take strength of schedule into account, and it is one of the factors the tournament selection committee considers. Let’s look at this same list of teams once I incorporate the RPI:

    Name       OPP PPG  PTS   PPG   TotalPPG  NormPPG   NormOpp   Pythag    RPI    Score
    Kansas     63.8     2780  81.8  145.6     77.14468  60.16908  0.945732  0.688  1.633732
    Duke       61.8     2653  78    139.8     76.61283  60.70093  0.935671  0.664  1.599671
    Kentucky   65.3     2694  79.2  144.5     75.26125  62.05252  0.901971  0.666  1.567971
    Syracuse   66.9     2607  81.5  148.4     75.41153  61.90223  0.906374  0.651  1.557374
    BYU        65.2     2821  83    148.2     76.90312  60.41064  0.941358  0.61   1.551358
    Utah St.   59.6     2506  73.7  133.3     75.91916  61.39460  0.919973  0.602  1.521973
    Murr. St.  60.5     2635  77.5  138       77.11461  60.19915  0.945204  0.575  1.520204
    Cst. Car.  59.9     2535  74.6  134.5     76.16065  61.15312  0.925796  0.519  1.444796
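The Score column is just Pythag plus RPI, and the table is sorted by it. A small Java sketch of that step is below, using the Pythag and RPI values straight from the table rather than recomputing them. `ScoreRanking`, `Entry`, `rank` and `sample` are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: ScoreRanking and Entry are illustrative names, not part of the pool's code.
class ScoreRanking {
    static final class Entry {
        final String name;
        final double pythag; // Pythag column from the table
        final double rpi;    // RPI column from the table

        Entry(String name, double pythag, double rpi) {
            this.name = name;
            this.pythag = pythag;
            this.rpi = rpi;
        }

        // Equal weighting: the two values are simply added.
        double score() {
            return pythag + rpi;
        }
    }

    /** Returns a copy of the entries ordered from highest to lowest combined score. */
    static List<Entry> rank(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        return sorted;
    }

    /** The eight teams from the table, listed in Pythag-only order. */
    static List<Entry> sample() {
        return Arrays.asList(
                new Entry("Kansas", 0.945732, 0.688),
                new Entry("Murr. St.", 0.945204, 0.575),
                new Entry("BYU", 0.941358, 0.61),
                new Entry("Duke", 0.935671, 0.664),
                new Entry("Cst. Car.", 0.925796, 0.519),
                new Entry("Utah St.", 0.919973, 0.602),
                new Entry("Syracuse", 0.906374, 0.651),
                new Entry("Kentucky", 0.901971, 0.666));
    }

    public static void main(String[] args) {
        for (Entry e : rank(sample())) {
            System.out.printf("%-10s %.6f%n", e.name, e.score());
        }
    }
}
```

Because Pythag lies between 0 and 1 and the RPI values here sit roughly between 0.5 and 0.7, adding them gives both a comparable say in the ordering, which is enough to move Murray State from second place down to seventh.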

That’s not totally crazy! My hour was pretty much up at that point, so I went with it. Here’s the code (in C#):

    public static Team Game(Team A, Team B, int round)
    {
        // Fake the 'pythagorean calculation' and weight it equally with RPI.
        double avg_points_total = 137.31377245509;

        double a_points_total = A.points_scored + A.points_against;
        double a_adj_points_scored = (avg_points_total / a_points_total) * A.points_scored;
        double a_adj_points_against = (avg_points_total / a_points_total) * A.points_against;
        double a_pythag = Math.Pow(a_adj_points_scored, 11.5)
            / (Math.Pow(a_adj_points_scored, 11.5) + Math.Pow(a_adj_points_against, 11.5));
        double a_score = a_pythag + A.RPI;

        double b_points_total = B.points_scored + B.points_against;
        double b_adj_points_scored = (avg_points_total / b_points_total) * B.points_scored;
        double b_adj_points_against = (avg_points_total / b_points_total) * B.points_against;
        double b_pythag = Math.Pow(b_adj_points_scored, 11.5)
            / (Math.Pow(b_adj_points_scored, 11.5) + Math.Pow(b_adj_points_against, 11.5));
        double b_score = b_pythag + B.RPI;

        return a_score >= b_score ? A : B;
    }
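Since the pool’s rules ask for Java, here is a rough sketch of how the same logic might look as a Java port. It is not necessarily what was submitted: the wrapper class `BracketSketch` and the helpers `pythag` and `score` are assumptions added to avoid repeating the calculation for each team, and `Team` is nested only so the example compiles on its own.

```java
// Sketch only: a possible Java port. BracketSketch, pythag and score are illustrative names.
class BracketSketch {
    // Team as defined in the pool's rules, nested so the sketch is self-contained.
    static class Team {
        public String name;
        public int seed, RPI_rank;
        public double points_scored, points_against, wins, losses, RPI;

        public Team(String n, int s, double rpi, int rpi_rank, int w, int l, double ps, double pa) {
            name = n; seed = s; RPI = rpi; RPI_rank = rpi_rank;
            wins = w; losses = l; points_scored = ps; points_against = pa;
        }
    }

    static final double AVG_POINTS_TOTAL = 137.31377245509;
    static final double EXPONENT = 11.5;

    /** The faked 'pythagorean calculation' on points scaled to the average total. */
    static double pythag(Team t) {
        double scale = AVG_POINTS_TOTAL / (t.points_scored + t.points_against);
        double scored = Math.pow(scale * t.points_scored, EXPONENT);
        double against = Math.pow(scale * t.points_against, EXPONENT);
        return scored / (scored + against);
    }

    /** Pythagorean value weighted equally with RPI. */
    static double score(Team t) {
        return pythag(t) + t.RPI;
    }

    /** Returns the team with the higher score; a tie goes to A, as in the C# version. */
    public static Team Game(Team A, Team B, int round) {
        return score(A) >= score(B) ? A : B;
    }

    public static void main(String[] args) {
        // Seed, RPI rank, wins and losses are placeholder zeros; the score ignores them.
        Team kansas = new Team("Kansas", 0, 0.688, 0, 0, 0, 81.8, 63.8);
        Team murray = new Team("Murr. St.", 0, 0.575, 0, 0, 0, 77.5, 60.5);
        System.out.println(Game(kansas, murray, 1).name);
    }
}
```

The `round` parameter is unused here, just as in the C# version, so the pick for a given pair of teams is the same no matter when they meet.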

And here is the resulting bracket. The Final Four is Kansas, Syracuse, Kentucky, and Duke.

Not crazy, uses only the information I have been given, and more fun than just using the RPI directly! Mission accomplished. Here is a link to my picks on ESPN.com.
