Using math for March Madness bracket picks

The countdown to fill out your March Madness brackets is on! Who are you picking to win it all?

Today, we hear from Liana Valentino, a student at the College of Charleston who works with PUP authors Amy Langville and Tim Chartier. Liana discusses how math can be applied to bracket selection.


What are the chances your team makes it to the next round?

The madness has begun! Now that the top 64 teams have been announced, brackets are being filled out all over the country. As an avid college basketball fan my entire life, I have always considered this my favorite time of the year. This year, I am taking a new approach to filling out brackets, one that draws on more than my basketball knowledge: I am using math as well.

To learn more about how the math is used to make predictions, visit Dr. Tim Chartier’s March Mathness website, where you can create your own bracket using math as well!

My bracket choices are decided using the Colley and Massey ranking methods; Colley only uses wins and losses, while Massey integrates the scores of the games. Within these methods, there are several different weighting options that will change the ratings produced. My strategy is to generate multiple sets of rankings, then determine the probability that each particular team will make it to a specific round. Using this approach, I am able to combine the results of multiple methods instead of having to decide on one to use for the entire bracket.
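For readers who want to see the two methods in action, here is a minimal sketch of the standard textbook formulations of Colley and Massey, not the March Mathness site's own code. The function names, the game tuple layout, and the way the optional per-game `weights` enter the linear systems are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def colley(teams, games, weights=None):
    """Colley ratings: uses only wins and losses.

    games: list of (winner, loser, winner_pts, loser_pts).
    weights: optional per-game weights (assumed to scale each game's
    contribution; uniform weighting if omitted).
    """
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for (winner, loser, _, _), wt in zip(games, weights or [1.0] * len(games)):
        w, l = idx[winner], idx[loser]
        C[w][w] += wt
        C[l][l] += wt
        C[w][l] -= wt
        C[l][w] -= wt
        b[w] += wt / 2
        b[l] -= wt / 2
    return dict(zip(teams, solve(C, b)))


def massey(teams, games, weights=None):
    """Massey ratings: least squares on point differentials."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for (winner, loser, w_pts, l_pts), wt in zip(games, weights or [1.0] * len(games)):
        w, l = idx[winner], idx[loser]
        diff = w_pts - l_pts
        M[w][w] += wt
        M[l][l] += wt
        M[w][l] -= wt
        M[l][w] -= wt
        p[w] += wt * diff
        p[l] -= wt * diff
    # The Massey matrix is singular, so the last row is replaced by the
    # constraint that all ratings sum to zero.
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


# Made-up three-team season, for illustration only.
teams = ["A", "B", "C"]
games = [("A", "B", 70, 60), ("B", "C", 65, 60), ("A", "C", 80, 60)]
print(colley(teams, games))
print(massey(teams, games))
```

Both methods rank the teams A, B, C here, but Massey's ratings are in points while Colley's are centered on 0.5, which is one reason the two can disagree once scores and weights come into play.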

Choosing which weighting options to use is a personal decision. I will list the ones I’ve used and the reasoning behind them, which draws on my basketball awareness.

(1)

Winning games on the road should be rewarded more than winning games at home. Because of that, I use constant weights of 0.6 for a win at home, 1.6 for a win away, and 1 for a win at a neutral location; these are the numbers used by the NCAA when determining RPI. I incorporate home and away weightings when performing other weighting methods as well.
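As a small illustration, the location weighting amounts to a lookup keyed on where the winner played. The names `LOCATION_WEIGHT` and `location_weight` are hypothetical; only the three numbers come from the text above.

```python
# Weights quoted above; the location is from the winner's point of view.
LOCATION_WEIGHT = {"home": 0.6, "away": 1.6, "neutral": 1.0}


def location_weight(winner_location):
    """Return the weight of a game given where the winning team played."""
    return LOCATION_WEIGHT[winner_location]


# Made-up games: (winner, loser, where the winner played).
games = [("A", "B", "home"), ("C", "A", "away"), ("B", "C", "neutral")]
weights = [location_weight(where) for _, _, where in games]
print(weights)
```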

(2)

Margin of victory is another factor, but a “blowout” game is defined differently depending on the person. With that in mind, I ran methods capping the margin of victory at both 15 and 20. This means that if the cap is 15, then games with a point differential of 15 or higher are all weighted the same. These numbers come mainly from personal experience. If a team wins by 20, I would consider that a blowout, meaning the matchup was simply unfair. If a team loses by 15, which in terms of the game is five possessions, the game wasn’t necessarily a blowout, but the winning team was clearly better than the opposition.

In addition to this, I chose to weight games differently if they were close. I defined a close game as a game within one possession, that is, three points. My reasoning was that if a team is blowing out every opponent, those games are obviously against mismatched opponents, so they do not say very much about the team. On the other hand, a team that consistently wins close games shows character. Also, when it comes to tournament time, there aren’t going to be many blowout games, so teams that can handle close-game situations well will excel compared to those that fold under pressure. Because of this, I weighted close games (within three points) at 1.5, “blowout” games (greater than 20 points) at 0.5, and any point differential in between at 1.
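The two margin-of-victory ideas above can be sketched as a pair of small functions. The function names and the exact treatment of the boundaries (a 3-point game counts as close, a 20-point game does not count as a blowout) are assumptions read off the wording, not a confirmed implementation.

```python
def capped_margin(margin, cap=15):
    """Treat every margin at or above the cap as equal to the cap."""
    return min(abs(margin), cap)


def closeness_weight(margin, close=3, blowout=20):
    """Weight close games up and blowouts down."""
    m = abs(margin)
    if m <= close:
        return 1.5   # within one possession
    if m > blowout:
        return 0.5   # blowout: says little about either team
    return 1.0       # everything in between


for margin in (2, 12, 27):
    print(margin, capped_margin(margin), capped_margin(margin, cap=20),
          closeness_weight(margin))
```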

(3)

Games played at different points in the season are also weighted differently. Would you say a team is the same in the first game as the last? There are three different methods to weight time, as provided by Dr. Chartier on his March Mathness site: linear, logarithmic, and interval weighting. Linear and logarithmic weights are similar in that both increase the weight of a game as the season progresses. These methods can be used if you believe that games towards the end of the season are more important than games at the beginning.

Interval weighting consists of breaking the season into equal-sized intervals and choosing a specific weight for each. In one instance, I weighted the games by splitting the season in half, downweighting the first half using 0.5 and upweighting the second half using 1.5 and 2. These decisions were made because during the first half of the season, teams are still getting to know themselves, while during the second half of the season, there are fewer excuses to make. Also, the second half of the season is when conference games are played, which are generally considered more important than non-conference games. For the people who argue that non-conference play is more important because it is usually more difficult than conference play, I also created one bracket where I upweighted the first half of the season and downweighted the second half.
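To make the three time weightings concrete, here is a rough sketch. The exact formulas used on the March Mathness site are not given in this post, so the linear and logarithmic forms below (both scaled to reach 1 on the last day) are assumptions, as are the function names and the convention of numbering days from 1.

```python
import math


def linear_weight(day, season_length):
    """Weight grows in proportion to how far into the season the game is."""
    return day / season_length


def log_weight(day, season_length):
    """Weight grows quickly early in the season and flattens out later."""
    return math.log(1 + day) / math.log(1 + season_length)


def interval_weight(day, season_length, interval_weights):
    """Split the season into equal intervals, one fixed weight per interval."""
    k = len(interval_weights)
    i = min((day - 1) * k // season_length, k - 1)
    return interval_weights[i]


season = 100  # hypothetical season length in days
for day in (10, 50, 51, 100):
    print(day,
          linear_weight(day, season),
          round(log_weight(day, season), 3),
          interval_weight(day, season, [0.5, 1.5]))
```

Reversing the list passed as `interval_weights` gives the alternative bracket that favors the first half of the season.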

(4)

The last weighting method I used accounts for whether a team was on a winning streak. In this case, we weight a game higher if one team breaks its opponent’s winning streak. Personally, I defined a winning streak as having won four or more games in a row.
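One way to sketch the streak idea is to walk through the games in chronological order while tracking each team's current run of wins. The function name and the `bonus` value of 2.0 are hypothetical, since the post does not say how much extra weight a streak-breaking win receives; only the four-game threshold comes from the text.

```python
def streak_break_weights(games, min_streak=4, bonus=2.0):
    """Return one weight per game, in order.

    games: chronological list of (winner, loser).
    A game gets the bonus weight when the loser entered it on a winning
    streak of at least min_streak games.
    """
    streak = {}
    weights = []
    for winner, loser in games:
        broke_streak = streak.get(loser, 0) >= min_streak
        weights.append(bonus if broke_streak else 1.0)
        streak[winner] = streak.get(winner, 0) + 1
        streak[loser] = 0
    return weights


# Made-up schedule: A wins four straight, then C ends the streak.
games = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "B"), ("C", "A"), ("A", "C")]
print(streak_break_weights(games))
```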

I used several combinations of these various methods and created 36 different brackets that I have used to obtain the following information. Surprisingly, Kentucky only wins the tournament 75% of the time; Arizona wins about 20%, and the remaining 5% is split between Wisconsin and Villanova. Interestingly enough, the only round Kentucky ever loses in is the Final Four, so each time they do make it to the championship, they win. Duke is the only number 1 seed never predicted to win a championship.
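Percentages like these come from counting how often each team reaches each round across all of the brackets. A minimal sketch of that counting follows, using four made-up brackets with placeholder team names; the data layout and function names are assumptions for illustration.

```python
from collections import Counter

ROUNDS = ["final_four", "championship", "champion"]


def round_probabilities(brackets):
    """Fraction of brackets in which each team reaches each round.

    brackets: list of dicts mapping a round name to the teams reaching it.
    """
    counts = {r: Counter() for r in ROUNDS}
    for bracket in brackets:
        for r in ROUNDS:
            counts[r].update(bracket[r])
    n = len(brackets)
    return {r: {team: c / n for team, c in counts[r].items()} for r in ROUNDS}


def conditional(probs, team, later, earlier):
    """Chance of reaching `later`, given that the team reached `earlier`."""
    return probs[later].get(team, 0.0) / probs[earlier][team]


# Four made-up brackets, for illustration only.
brackets = [
    {"final_four": ["A", "B", "C", "D"], "championship": ["A", "C"], "champion": ["A"]},
    {"final_four": ["A", "B", "C", "D"], "championship": ["A", "C"], "champion": ["A"]},
    {"final_four": ["A", "B", "C", "D"], "championship": ["B", "C"], "champion": ["B"]},
    {"final_four": ["A", "B", "C", "E"], "championship": ["A", "C"], "champion": ["C"]},
]
probs = round_probabilities(brackets)
print(probs["champion"])
print(conditional(probs, "A", "champion", "championship"))
```

The `conditional` helper is the kind of calculation behind a statement such as "if a team makes it to the championship game, they win it X% of the time."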

Villanova makes it to the championship game 70% of the time, and the main team that prevents them from doing so is Duke, which makes it 25% of the time. The remaining teams from that side of the bracket that make it are Stephen F. Austin and Virginia, both with a 2.5% chance. Kentucky makes it to the championship game 75% of the time, while Arizona makes it 22%, and Wisconsin makes it 3%. However, if Arizona makes it to the championship game, they win it 88% of the time. Furthermore, Wisconsin is predicted to play in the championship game once, which they win.

The two teams Kentucky loses to in the Final Four are Arizona and Wisconsin. In the Final Four, Kentucky has Arizona as an opponent 39% of the time, and Arizona wins 50% of those matchups. Kentucky’s only other opponent in the Final Four is Wisconsin, which wins that game only 5% of the time. On the other side, Villanova makes it to the Final Four 97% of the time; the one instance in which they did not was a loss to Virginia. Villanova’s opponent in the Final Four is Duke 72% of the time, Gonzaga 19%, Stephen F. Austin 6%, and Utah 3%. The only seeds that appear in the Final Four are 1, 2, and one 12 seed, Stephen F. Austin, one time.

During the Elite 8, Duke is the only number 1 seed that does not make it 100% of the time, with Utah upsetting them in 17% of their matchups. The other Elite 8 member is Gonzaga 97% of the time. Kentucky’s opponent in this round is Notre Dame 47% and Kansas 53% of the time.

In the Sweet 16, there are eight teams that make it every time: Kentucky, Wisconsin, Villanova, Duke, Arizona, Virginia, Gonzaga, and Notre Dame. Kansas is the only number 2 seed not on the list as Wichita State is predicted to beat them in 8% of their matchups. Kentucky’s opponent in the Sweet 16 is Maryland 39%, West Virginia 36%, Valparaiso 14%, and Buffalo 11%. Valparaiso is the only 13 seed predicted to make it to the Sweet 16. Villanova’s opponent is either Northern Iowa 61% or Louisville 38%. Duke appears to be facing either Utah 67%, Stephen F. Austin 19%, or Georgetown 14%.

Now, for the teams that make it into the third round. I’m not sure how many people consider a 9 seed beating an 8 seed an upset, but the number 9 seeds that are expected to progress are Purdue, Oklahoma State, and St. John’s. As for the 10 seeds, Davidson is the most likely to continue, with a 47% chance to move past Iowa, which is the highest percentage for an upset not including the 8-9 seed matchups. Following them is 11 seed Texas, which has a 42% chance of defeating Butler. For the 12 seeds, Buffalo is the most likely to continue, with a 36% chance of beating West Virginia. The 13 seed with the best chance of progressing is Valparaiso, with 19% over Maryland. Lastly, the only 14 seeds that move on are Georgia State and Albany, which happens a mere 8% of the time.

In general, Arizona seems to win the championship when using Massey and linear or interval weighting without home and away. This could be because most of their losses happen during the beginning of the season, while they win important games towards the end. Using the Colley method is when most of the upsets are predicted. For example, Stephen F. Austin making it to the championship game happens using the Colley logarithmic weighting. Davidson beating Iowa in the second round is also found many times using different Colley methods.

Overall, there are various methods that include various factors, but there are still qualitative variables that we don’t include. On the other hand, math can do a lot more than people expect. Considering Kentucky is undefeated, I presumed the math would never show them losing, but there is a lot more in the numbers than you think. Combining the various methods on 36 different brackets, I computed the probabilities of teams making it to specific rounds and decided to make a bracket using the combined data. This way, I don’t have to settle on a single weighting that determines my bracket; instead, I use the results from several methods. Unfortunately, there is always one factor we cannot account for: luck! That is why we can only make estimates and never be certain. From my results, I predict a Final Four of Kentucky, Arizona, Villanova, and Duke; a championship game of Kentucky versus Villanova; and Kentucky as the 2015 national champion.


Cinderella stories? A College of Charleston student examines March Madness upsets through math

Drew Passarello, a student at the College of Charleston, takes a closer look at how math relates to upsets and predictability in March Madness.


The Madness is coming. In a way, it is here! With the first round of the March Madness tournament announced, the craziness of filling out the tournament brackets is upon us! Can math help us get a better handle on where we might see upsets in March Madness? In this post, I will detail how math helps us get a handle on what level of madness we expect in the tournament. Said another way, how many upsets do we expect? Will there be a lot? We call that a bad year as that leads to brackets having lower accuracy in their predictions. By the end of the article, you will see how math can earmark teams that might be on the cusp of upsets in the games that will capture national attention.

Where am I learning this math? I am taking a sports analytics class at the College of Charleston under the supervision of Dr. Tim Chartier and Dr. Amy Langville. Part of our work has been researching new results and insights in bracketology. My research uses the Massey and Colley ranking methods. Part of my research deals with the following question: What are good years and bad years in terms of March Madness? In other words, before the tournament begins, what can we infer about how predictable the tournament will be?

One way of answering this question is to see how accurate one is at predicting the winners of the tournament games, coupled with how high one’s ESPN score is. However, I also wanted to account for the variability of the level of competition going into the tournament, which is why I also looked at the standard deviation of the ratings of those in March Madness. A higher standard deviation implies that the level of play is more spread out. Ultimately, a good year will have a high tournament accuracy, high ESPN score, and a high standard deviation of ratings for those competing in March Madness. Similarly, a bad year will have low tournament accuracy, low ESPN score, and a low standard deviation of the ratings. This assessment is relative to the ranking method itself and defines good years and bad years solely in terms of past March Madness data.

I focused on ratings from uniformly weighted Massey and Colley ranking methods, as the weighting might add some bias. However, my simple assessment can be applied to other variations of weighting Massey and Colley. I found the mean accuracy, mean ESPN score, and mean standard deviation of ratings of the teams in March Madness for years 2001 – 2014, and I then looked at the years that fell below or above these corresponding means. Years that fell above all three means were deemed good, years that fell below all three were deemed bad, and the remaining years were labeled neutral. The good years for Massey were 2001, 2004, 2008, and 2009, and the bad years were 2006, 2010 – 2014. Neutral years were 2002, 2003, and 2007. Also, for Colley, the good years were 2005, 2007 – 2009; bad years were 2001, 2006, and 2010 – 2014; neutral years were 2002 – 2004. A very interesting trend I noticed from both Massey and Colley was that the standard deviation of the ratings of those in March Madness from 2010 to 2014 was significantly lower than in the years before. This leads me to believe that basketball has recently become more competitive in terms of March Madness, which would also partially explain why 2010 – 2014 were bad years for both methods. However, this does not necessarily imply 2015 will be a bad year.
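The good, bad, and neutral labels can be sketched as a comparison of each year against the three means. The numbers and the year labels `Y1` to `Y4` below are made up purely to show the mechanics; they are not the 2001 – 2014 data, and the function name is hypothetical.

```python
from statistics import mean


def classify_years(records):
    """Label each year good, bad, or neutral.

    records: {year: (accuracy, espn_score, rating_std)}.
    Good means above the mean on all three measures, bad means below the
    mean on all three, and anything else is neutral.
    """
    means = [mean(vals[i] for vals in records.values()) for i in range(3)]
    labels = {}
    for year, vals in records.items():
        if all(v > m for v, m in zip(vals, means)):
            labels[year] = "good"
        elif all(v < m for v, m in zip(vals, means)):
            labels[year] = "bad"
        else:
            labels[year] = "neutral"
    return labels


# Made-up values, for illustration only.
records = {
    "Y1": (0.70, 900, 9.0),
    "Y2": (0.60, 600, 7.0),
    "Y3": (0.68, 650, 8.5),
    "Y4": (0.62, 850, 7.5),
}
print(classify_years(records))
```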

In order to get a feel for how accurate the ranking methods will be this year, I fit a regression line based on years 2001 – 2014 that had tournament accuracy as the dependent variable and the standard deviation of the ratings of those in March Madness as the independent variable. Massey is predicted to have 65.81% accuracy for predicting winners this year, whereas Colley is predicted to have 64.19% accuracy. The standard deviation of the ratings for those expected to be in the tournament was 8.0451 for Massey and 0.1528 for Colley, and these most closely resemble the standard deviations of the ratings of the March Madness teams in 2002 and 2007.
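A regression line of this kind can be fit with ordinary least squares. The sketch below uses made-up (standard deviation, accuracy) pairs, so its prediction is not meant to reproduce the figures above; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Made-up past seasons: rating standard deviation vs. tournament accuracy.
rating_std = [7.0, 8.0, 9.0, 10.0]
accuracy = [0.60, 0.64, 0.68, 0.72]

slope, intercept = fit_line(rating_std, accuracy)
print(predict(slope, intercept, 8.5))
```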

After this assessment, I wanted to figure out what defines an upset relative to the ratings. To answer this, I looked at season data and focused on uniform Massey. Specifically for this year, I used the first-half-of-the-season ratings to predict the first week of the second half of the season and then updated the ratings. After this, I used the updated ratings to predict the next week, updated the ratings again, and so on until now. For games incorrectly predicted, the median difference in ratings was 2.2727, and the mean was 3.0284. I defined an upset for this year to be an incorrectly predicted game in which the absolute difference in the ratings is greater than or equal to three. This definition of an upset is relative to this particular year. I then kept track of the upsets for those teams expected to be in the tournament. I looked at the number of upsets each team pulled off and the number of times each team got upset, along with the score differential and rating differences for these games. From comparing these trends, I determined the following teams to be upset teams to look for in the tournament: Indiana, NC State, Notre Dame, and Georgetown. These teams had a higher ratio of upsets to getting upset when compared to the other teams. Also, these teams had games in which the score differences and rating differences were larger than those from the other teams in March Madness.
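The upset bookkeeping described above can be sketched as follows. To keep the example short, it uses one fixed set of made-up ratings instead of ratings updated week by week, and the function name is hypothetical; only the threshold of three comes from the text.

```python
from collections import Counter


def upset_tallies(games, ratings, threshold=3.0):
    """Count upsets pulled off and upsets suffered by each team.

    games: list of (winner, loser).
    A game counts as an upset when the loser was rated at least
    `threshold` points above the winner.
    """
    pulled, suffered = Counter(), Counter()
    for winner, loser in games:
        if ratings[loser] - ratings[winner] >= threshold:
            pulled[winner] += 1
            suffered[loser] += 1
    return pulled, suffered


# Made-up ratings and results, for illustration only.
ratings = {"A": 10.0, "B": 6.0, "C": 8.0}
games = [("B", "A"), ("A", "C"), ("C", "A"), ("B", "C")]

pulled, suffered = upset_tallies(games, ratings)
print(dict(pulled), dict(suffered))
```

Comparing `pulled[team]` with `suffered[team]` gives the kind of ratio used to flag potential upset teams.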

I am still working on ways to weight these upset games from the second half of the season, and one of the approaches relies on the score differential of the game. Essentially, teams that upset opponents by a lot of points should benefit more in the ratings. Similarly, teams that get upset by a lot of points should be penalized more in the ratings. For a fun and easy bracket, I am going to weight upset games heavily in the week before conference tournament play and the first week of conference tournament play. These two weeks gave the best correlation coefficient between the accuracy from these weeks and the accuracy from March Madness for both uniform Massey and Colley. Let the madness begin!