March Mathness Wrap-up

On March 24 we posted some bold predictions on who would win the NCAA basketball championship game. The tournament is over, and we send our congratulations to Kelly Davis, the winner of the bracket pool in Tim Chartier’s math modeling class at Davidson College.

Vickie Kearn: My bold pick was Duke, based on a little math, past performance, and the luck factor. The one thing missing from my equation was the upset factor, and I will be sure to add that next year.

The winner of the ESPN bracket challenge is Joe Pearlman, who filled out his bracket in 10 minutes and based his picks on a hunch. Out of 5.9 million entries, he is one of only two people who picked the Final Four, and he will be taking home the $10,000 prize. Does this mean we should throw out all of our math models and go solely on hunches, or throw darts at a bracket next year? Absolutely not! As you will see, Tim Chartier and his students did very well with their brackets.

Tim Chartier: Any prediction method is, at some level, working on the odds of long-term success. Our methods produced brackets in the 90th percentile in three of the past four years. However, this year was, indeed, quite different. Still, Kelly Davis, a senior math major at Davidson College, produced a bracket that beat many celebrity sports analysts’ brackets. We used only the results of games, when in the season each game was played, and whether it was played at home, away, or on neutral ground.

Caption: Kelly Davis with her prizes of Ben and Jerry’s and Princeton University Press books.

Vickie Kearn: What method did you use when preparing your bracket?

Kelly Davis: Like most students in my Math Modeling class, I used a linear weighted Colley Ranking method that we learned about in class, which uses a system of linear equations. Variations of the Colley Ranking Method are often used in sports rankings, including in the Bowl Championship Series. Each student then modified this method to emphasize or add the factors that he or she felt were important. Not knowing much about college basketball, I had to draw on my somewhat limited knowledge of sports to decide which factors from the regular season would help predict the tournament outcomes. The three major factors that I built into my code were the point difference between the winner and loser, when in the season the game was played, and whether the game was home, away, or on neutral ground.
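For readers curious about the basic Colley system, here is a minimal sketch of the uniform version, in which every game counts equally. Each team’s rating r_i solves C r = b, where C_ii = 2 + (games played by team i), C_ij = -(games between teams i and j), and b_i = 1 + (wins_i - losses_i)/2. This is not the class’s actual code: the function names `solve` and `colley_ratings` and the three-team example are hypothetical, and a small hand-written solver stands in for a linear algebra library.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented copy [matrix | rhs] so the caller's lists are not modified.
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def colley_ratings(teams, games):
    """Uniform Colley ratings; `games` is a list of (winner, loser) pairs."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    # Start from 2 on the diagonal; add games played, subtract head-to-head counts.
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # b_i = 1 + (wins_i - losses_i) / 2, built up one game at a time.
    b = [1.0] * n
    for winner, loser in games:
        w, l = index[winner], index[loser]
        C[w][w] += 1
        C[l][l] += 1
        C[w][l] -= 1
        C[l][w] -= 1
        b[w] += 0.5
        b[l] -= 0.5
    return dict(zip(teams, solve(C, b)))


ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team} {rating:.3f}")
# A 0.700
# B 0.500
# C 0.300
```

A useful sanity check: Colley ratings always average 1/2, so the three ratings here sum to 1.5.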

Let me give a few more details on this. Factoring in point difference helps to indicate the strength of the win: winning by a lot is a stronger win than winning by only a little. It also helps to account for very close games that ultimately come down to a bunch of fouls being called. Considering when in the season the game was played allowed me to put heavy emphasis on the end of the season. If a team is playing poorly at the end (for example, because a major player is injured), then it will probably not do well. All teams are playing intensely at the end of the season in their conference tournaments, which I consider a good predictor of tournament play. Finally, I folded in a weight for location. Teams that typically do poorly in away games will have a hard time in March Madness, where no one plays a home game.
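One common way to fold factors like these into the Colley system is to give each game a weight: a game of weight w adds w to each team’s games played and w/2 to the winner’s (and subtracts w/2 from the loser’s) right-hand side. The sketch below assumes that scheme. `weighted_colley` and `linear_time_weight` are hypothetical names, and the linear time weight is only an illustration of late-season emphasis, not the exact weighting used in class. Rules for point differential or location would need their own weight functions, which the interview does not spell out.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def weighted_colley(teams, games):
    """Weighted Colley ratings; `games` holds (winner, loser, weight) triples, weight > 0."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser, weight in games:
        w, l = index[winner], index[loser]
        # A game of weight 0.5 counts as half a game played, half a win and half a loss.
        C[w][w] += weight
        C[l][l] += weight
        C[w][l] -= weight
        C[l][w] -= weight
        b[w] += weight / 2
        b[l] -= weight / 2
    return dict(zip(teams, solve(C, b)))


def linear_time_weight(day, season_days):
    """Hypothetical linear weighting: a last-day game counts fully, a mid-season game half."""
    return day / season_days


# A beat B at mid-season; B beat A on the last day of a 100-day season.
games = [("A", "B", linear_time_weight(50, 100)),
         ("B", "A", linear_time_weight(100, 100))]
ratings = weighted_colley(["A", "B"], games)
print({team: round(r, 3) for team, r in ratings.items()})
# {'A': 0.45, 'B': 0.55}
```

With uniform weights these two teams would tie at 0.5. Here the late-season win lifts B above A, which is the "momentum" effect the weighting is meant to capture.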

Dr. Chartier included the brackets generated by the linear and the uniform Colley methods in our ESPN group so that we could see how our brackets compared to these simple, conservative, unmodified versions. Although it ranked lower than the majority of the class last year, the linear Colley method ironically ended up being the next-highest bracket after mine this year, placing in the 64.4 percentile. Last year it placed in the 82 percentile. Perhaps sometimes the safest approach is the best approach!

Vickie: Did you submit more than one bracket? If so, which performed the best?

Kelly: Each student in our class was allowed to submit up to three brackets, and I ended up submitting two. My most successful bracket was my initial one, which I had to complete for a homework assignment. In this bracket, except for the very first portion of the season, I divided the season into 10 segments and weighted them in increasing 10% steps. I then weighted the last two segments a bit more heavily because most teams are playing conference championship games during this time. In this bracket, I also subtracted a 3-point home-court advantage from the score of the home team. My second bracket placed in the 64.4 percentile, the same percentile as the linear Colley method. For this one, I mostly shifted more weight to the end of the season, so that winning at the end of the season counted more than winning at the beginning. I also penalized teams that tended to lose more often in away games.
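The two adjustments in Kelly’s first bracket can be sketched as small helper functions. This is an illustration under stated assumptions, not her code: "increasing 10%" is read as segment weights 0.1, 0.2, ..., 1.0; the `LATE_BOOST` factor of 1.2 for "a bit higher" is hypothetical; games before `season_start` are assumed to be dropped; days are integer day numbers; and treating an adjusted tie as no result is a hypothetical choice. The team names are placeholders.

```python
HOME_EDGE = 3       # points taken off the home team's score (from the interview)
LATE_BOOST = 1.2    # hypothetical value for the "a bit higher" late-season weight


def segment_weight(day, season_start, season_end, segments=10, late_boost=LATE_BOOST):
    """Weight of a game played on `day` (integer day numbers assumed).

    Games before `season_start` (the skipped opening stretch) get weight 0.
    The rest of the season is cut into `segments` equal slices; slice k
    (1-based) gets k / segments, i.e. 0.1, 0.2, ..., 1.0, and the last two
    slices are multiplied by `late_boost`.
    """
    if day < season_start:
        return 0.0
    span = season_end - season_start
    k = min((day - season_start) * segments // span + 1, segments)
    weight = k / segments
    if k > segments - 2:
        weight *= late_boost
    return weight


def adjusted_result(home, away, home_score, away_score, home_edge=HOME_EDGE):
    """(winner, loser) after removing the home-court edge, or None if tied.

    Pass home_edge=0 for neutral-site games.
    """
    adjusted_home = home_score - home_edge
    if adjusted_home > away_score:
        return home, away
    if adjusted_home < away_score:
        return away, home
    return None  # hypothetical choice: an adjusted tie counts as no result


print(segment_weight(5, 10, 110))   # 0.0 (before the segmented season)
print(segment_weight(55, 10, 110))  # 0.5 (fifth segment)
print(adjusted_result("Home U", "Road St", 70, 68))  # ('Road St', 'Home U')
```

Note the effect of the home-court adjustment: a 2-point home win becomes a 1-point road win once the 3-point edge is removed.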

Vickie: Were there any surprises this year that you did not count on and that affected your bracket in a big way?

Kelly: With only 4.7% of the more than 5.9 million brackets submitted to the Tournament Challenge correctly predicting Connecticut to win, and only two people in the entire country correctly picking the Final Four, I think it is safe to say there were many surprises that most people did not count on! Early in the tournament, my model actually did very well at predicting the outcomes of the first two rounds. I finished the second round in the 91.0 percentile, above many experts on the subject such as Mike Greenberg, Dick Vitale, and Matthew Berry, who ended up in the 21.3, 21.3, and 11.9 percentiles, respectively. Then again, the 5-year-old son of Matt Hasselbeck (quarterback for the Seattle Seahawks) finished in the 93.4 percentile.

As the tournament progressed and more upsets occurred, my bracket, along with many others such as President Obama’s, became less successful at predicting these surprises. Many of the surprises my bracket missed were unexpected for most people, such as Kentucky’s win over Ohio State, the team that over a quarter of the brackets, including mine, had predicted would win. Another was Butler’s surprising series of wins: only 11,326 of the 5.2 million brackets had correctly predicted Butler reaching the finals.

Vickie: Each round of the competition awards a certain number of points for each correct pick. For example, you get 10 points for each winner you pick in the second round of play, 80 points for each winner you pick in the Elite 8 round, and 320 points for picking the champion. The most points you can get is 1920. What was your ESPN score? You didn’t win the $10,000, but what was your prize?
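The 1920-point maximum follows from the scoring pattern. Assuming the points double each round (10, 20, 40, 80, 160, 320; the 20, 40, and 160 values are inferred from that pattern, not stated above), every round is worth 320 points in total, and six rounds give 6 × 320 = 1920. In 2011 the round of 64 was officially the "second round," since the First Four was the first. The helper names and the sample pick counts below are hypothetical.

```python
# (round name, games in the round, points per correct pick); the doubling
# pattern is an assumption consistent with the 10 / 80 / 320 values and 1920 total.
ROUNDS = [
    ("Round of 64", 32, 10),
    ("Round of 32", 16, 20),
    ("Sweet 16", 8, 40),
    ("Elite 8", 4, 80),
    ("Final Four", 2, 160),
    ("Championship", 1, 320),
]


def max_score(rounds=ROUNDS):
    """Total points available if every pick is correct."""
    return sum(games * points for _, games, points in rounds)


def bracket_score(correct_picks, rounds=ROUNDS):
    """Score a bracket from the number of correct winners picked in each round."""
    total = 0
    for (name, games, points), correct in zip(rounds, correct_picks):
        if not 0 <= correct <= games:
            raise ValueError(f"{name}: {correct} correct picks out of {games} games")
        total += correct * points
    return total


print(max_score())                        # 1920
print(bracket_score([20, 8, 2, 1, 0, 0])) # 520 (a made-up bracket)
```

Because each round carries the same 320-point total, one correct champion pick is worth as much as a perfect round of 64.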

Kelly: As Dr. Chartier mentioned earlier, for the first time in the past four years, my class’s mathematical models were not very successful at predicting this year’s surprises. My ESPN score ended up being 560 points, placing me in the 68.1 percentile, the same percentile as Colin Cowherd, an American sports radio personality.

Despite not doing as well as students in previous years, I was a bit more successful within my modeling class and ended up winning our class pool. As the winner, I received $100 worth of books from Princeton University Press, a T-shirt from the Davidson College Athletics Department, and several free cones at Ben & Jerry’s. The picture above shows me sitting in our local Ben & Jerry’s with some books on ranking published by Princeton University Press while I enjoy one of my victory cones.

I think the largest prize of all, however, was the opportunity to show my friends and fellow college students an exciting and cool application of math to a topic most people would never associate with math. Some of my friends hated seeing their carefully thought out brackets lose to a bracket generated by a “math nerd” who knows very little about college basketball, which made my ice cream victory taste even sweeter!

Vickie: What would you do differently next year?

Kelly: Having had a lot of success running my code on the past few seasons, fairly consistently predicting a large portion of each year’s Elite Eight, I would in some ways be tempted to change very little. As with most mathematical models, mine has many limitations and flaws, and consequently there will be years like this one when it is less successful at predicting real-world outcomes, but then again, so were many experts. I think one of the coolest things about using math modeling to predict tournament outcomes is that you can use the same code to predict outcomes each year without having to spend the entire regular season keeping track of scores and top teams.

One thing I would be interested in exploring is using a team’s pattern of wins and losses as an indicator of how to weight wins at different points in the season. After seeing how successful Butler was for the second year in a row, I also think it would be interesting to consider teams’ success in previous March Madness tournaments.

Vickie: In the earlier post, Lucy McMurry was doing well. How did her bracket do in the end?

Tim Chartier: Lucy was, indeed, doing very well. However, many of her picks did not earn points as the tournament progressed, and she ended up in the 50.9 percentile. So she was better than just over half the brackets, but it was indeed a difficult year! We look forward to next year, and maybe this year will give us new ideas and even new statistics to fold into our methods. Nevertheless, there will always be upsets and a certain amount of madness in March as the tournament unfolds.

March Mathness explained by Vickie Kearn and Tim Chartier

Just in time for the first Sweet Sixteen games, math editor Vickie Kearn speaks with Tim Chartier about how math and Google are used to predict bracket winners. Enjoy this exclusive dialogue below.


Vickie Kearn: When I was at the University of Richmond, I went to only one football game in four years. That program did not get a lot of attention then. However, the Spiders had a great basketball team, as they do now. Because of their great success, I have been watching as many games as possible and following the brackets with a huge amount of energy. A few years ago, we published a book by Amy Langville and Carl Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings, and part of the “Beyond” is how you use math to rank sports teams. It is really quite fascinating.

Amy and her students at the College of Charleston and Tim Chartier and his students at Davidson College use mathematical algorithms to rank teams, and they are doing fantastically well on their brackets this year. I asked Tim and his student Lucy McMurry, a sophomore who plans to declare a major in math and a minor in Spanish, how they used their math background to make sense of a bracket with 68 teams. How do you pick who will be #1?

Tim Chartier: There are a variety of techniques that can be used to rank items. Sports teams are often ranked by winning percentage. Elections are often won by the person who gains the highest number of votes. Search engines often use techniques from linear algebra. Amy and Carl’s book discusses the aspects of Google’s PageRank algorithm that make it suitable for ranking webpages and allow it to scale to billions of items. For our brackets, we also rank sports teams with linear algebra. PageRank uses a stochastic matrix built from underlying probabilities, based on a model of a surfer randomly traversing the web. Our method builds a linear system, Ax = b, which has different properties than a stochastic system like PageRank’s.
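To make the contrast with the Ax = b approach concrete, here is one simple way PageRank’s random-surfer idea could be adapted to games. This is a swapped-in illustration, not the method the Davidson students used: each loss is treated as a "link" from the loser to the winner, the damping factor 0.85 is the conventional choice from the web setting, and unbeaten teams (which have no outgoing links) spread their weight evenly so the matrix stays stochastic. The function name and example are hypothetical.

```python
def pagerank_ratings(teams, games, damping=0.85, iterations=100):
    """PageRank-style ratings on a 'loser links to winner' graph.

    games: list of (winner, loser) pairs. The walk drifts toward teams
    that beat many opponents, and especially strong ones.
    """
    n = len(teams)
    index = {team: i for i, team in enumerate(teams)}
    out_links = [[] for _ in range(n)]
    for winner, loser in games:
        out_links[index[loser]].append(index[winner])
    rank = [1.0 / n] * n
    # Power iteration: repeatedly apply the stochastic (Google-style) matrix.
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n   # random "teleport" share
        for i in range(n):
            if out_links[i]:
                share = damping * rank[i] / len(out_links[i])
                for j in out_links[i]:
                    new[j] += share
            else:
                # Unbeaten team: no outgoing links, so spread its weight evenly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
        rank = new
    return dict(zip(teams, rank))


ratings = pagerank_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
print(sorted(ratings, key=ratings.get, reverse=True))  # ['A', 'B', 'C']
```

Unlike Colley ratings, which average 1/2, these values form a probability vector that sums to 1, reflecting the stochastic structure of the model.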

Lucy McMurry: To begin, we acquire all of the data about the 68 participating teams, including when each game was played, the score of each game, and whether each team was home or away. From this point, it is up to each student how he or she wants to implement the code. For example, an away win could be worth more than a home win, and a game played later in the season could be worth more than one played at the beginning. Once the code is set, it produces a ranking from all of the data. From here, we assume that the higher-ranked team will win each game. Thus, the top-ranked team will win the tournament.
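The last step Lucy describes, turning a ranking into a bracket, is mechanical. Below is a minimal sketch under a few assumptions: the field has already been cut to a power of two (for example, 64 teams once the First Four games are settled), teams are listed in bracket order so neighbors meet in the first round, and a tie in rating goes to the first-listed team. The function name and ratings are hypothetical.

```python
def predict_bracket(seeded_teams, ratings):
    """Advance the higher-rated team in every matchup until one champion remains.

    seeded_teams: teams in bracket order (length a power of two), so that
    positions 0-1, 2-3, ... meet in the first round.
    Returns the list of winners for each round; the last list is the champion.
    """
    if len(seeded_teams) & (len(seeded_teams) - 1):
        raise ValueError("bracket size must be a power of two")
    rounds = []
    field = list(seeded_teams)
    while len(field) > 1:
        # Pair neighbors and keep the higher-rated team (ties go to the first listed).
        field = [a if ratings[a] >= ratings[b] else b
                 for a, b in zip(field[0::2], field[1::2])]
        rounds.append(field)
    return rounds


ratings = {"A": 0.7, "B": 0.5, "C": 0.3, "D": 0.6}
print(predict_bracket(["A", "C", "B", "D"], ratings))  # [['A', 'D'], ['A']]
```

As the interview says, this rule always crowns the top-ranked team, so all of the modeling effort goes into producing the ratings.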

VK: When you found out on March 13 who would be playing in the tournament, how did you go about setting up your brackets and selecting who you predict will win on April 4?

LM: I personally don’t know very much about basketball and hadn’t followed the season. Therefore, seeing who would be playing on March 13th didn’t really affect how I wanted to structure my code. I based my code on what seemed reasonable to me from an outsider’s perspective, which led me to write code that weights games by when in the season they were won.

TC: This is quite true. Fundamentally, the students think mathematically about their models, which allows all the students to participate. I’ve seen some students fiddle with their models when there is a particular team they want to see perform well, or even poorly. In many cases, this doesn’t help. While there are a number of different ways one could measure success, we submit our brackets to the ESPN Tournament Challenge, which allows us to compete against each other and millions of other brackets.

VK: Is there more than one way to rank the teams?

TC: Yes. Some people pick the winning team based on which team’s mascot they prefer! And yes, there are also multiple ways to do this mathematically. In college football, the Bowl Championship Series (BCS) uses several methods to rank the teams. One is the Colley method, a linear system based on wins and losses; another is the Massey method, which can integrate the scores of games. One can also adapt Google’s PageRank algorithm.
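Because the Massey method uses scores rather than just wins and losses, a short sketch helps show the difference. In its standard form, M_ii is the number of games team i played, M_ij is minus the number of games between i and j, and p_i is team i’s total point differential. M is singular (each row sums to zero), so one equation is replaced by the constraint that the ratings sum to zero. This assumes every team is connected to every other through some chain of games. The function names and scores are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def massey_ratings(teams, games):
    """games: list of (team_a, team_b, score_a, score_b) tuples."""
    index = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for a, b, score_a, score_b in games:
        i, j = index[a], index[b]
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += score_a - score_b   # cumulative point differential
        p[j] += score_b - score_a
    # M's rows sum to zero, so swap the last equation for "ratings sum to 0".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))


games = [("A", "B", 80, 70), ("B", "C", 75, 70), ("A", "C", 90, 75)]
ratings = massey_ratings(["A", "B", "C"], games)
print({team: round(r, 3) for team, r in ratings.items()})
# {'A': 8.333, 'B': -1.667, 'C': -6.667}
```

Here the rating gaps (A minus B is 10, B minus C is 5, A minus C is 15) reproduce the three margins exactly; with real, inconsistent results the method returns the least-squares best fit instead.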

LM: In fact, like many of my fellow students, I’m using the Colley method. However, we use recent research by Drs. Chartier and Langville that allows us to model momentum. By using these different models and ideas, we are all able to come up with original code that can produce very different results.

VK: Suppose you love math but don’t know anything about basketball. Will your rankings still predict a reasonable winner?

LM: I love math, but as I mentioned earlier, I do not know very much about basketball. However, I am tied for first place in our class pool along with three other students and am currently in the 91st percentile nationally! Therefore, I think I can safely say that my code predicted a very reasonable ranking of the teams.

TC: Lucy and the three other students who are currently in the lead are performing better than over 5 million other brackets! Kelly Davis, who you will see in the video, is one of the students leading the class. Daniel Martin, who is also interviewed in the video, is in a pool outside the class and is currently in the 96th percentile. Interestingly, some students in the class pool tried very novel approaches to modeling momentum, and a few of these methods are performing quite poorly, with one ranking in the 1.5 percentile!

VK: Amy Langville’s students received quite a bit of publicity in the past because their predictions were so good. Have you experienced a bit of fame from your predictions?

TC: Yes. The media has covered our brackets for the past three years as we’ve done well each year.

LM: Just last week, Derek James of Fox News Charlotte came to our class to film Dr. Chartier talking to us about our brackets. Many of the students in the class were able to email their parents to watch Fox News and get a glimpse of us in class. You’ll see me in the news segment but I definitely have a good portion of any eventual 15 minutes of fame left! The interview concentrated on Dr. Chartier as well as a few students from my class discussing their brackets and the theories behind their code.

TC: In fact, I helped Derek create a bracket using our methods. I asked him to break the season into as many intervals as he wanted. For instance, suppose he chose 3. Then he would weight the games during each interval. Suppose he chose weights of 1/2, 3/4, and 1. Then all the games in the first, second, and last third of the season would be worth 1/2, 3/4, and 1 game, respectively. He also gave weights to home and away games. In the end, he had a personalized bracket that is tied with Lucy’s! The winner of the class pool gets a prize from Ben and Jerry’s in Davidson. Derek isn’t eligible, though, since coming to class for 15 minutes and talking during the lecture doesn’t qualify! We have great fun watching the brackets unfold and seeing how our modeling performs.
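Derek’s example translates directly into a weight per game: find which interval the game falls in, take that interval’s weight, and multiply by a location weight. The sketch below uses exact fractions so the 1/2, 3/4, 1 weights stay exact. The location weights (counting a road win as 5/4 of a game) are purely hypothetical, since the interview does not say what Derek chose, and the function names are illustrative.

```python
from fractions import Fraction


def interval_weight(day, season_days, weights):
    """Weight of a game on `day` (0 <= day <= season_days) when the season
    is split into len(weights) equal intervals."""
    if not 0 <= day <= season_days:
        raise ValueError("day must fall within the season")
    k = min(day * len(weights) // season_days, len(weights) - 1)
    return weights[k]


DEREK_WEIGHTS = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]  # from the interview
# Hypothetical location weights, seen from the winner's side.
LOCATION_WEIGHTS = {"home": Fraction(1), "neutral": Fraction(1), "away": Fraction(5, 4)}


def game_weight(day, season_days, location,
                weights=DEREK_WEIGHTS, location_weights=LOCATION_WEIGHTS):
    """Combined weight: time-of-season weight times location weight."""
    return interval_weight(day, season_days, weights) * location_weights[location]


print(game_weight(10, 120, "home"))   # 1/2 (first third, home win)
print(game_weight(100, 120, "away"))  # 5/4 (last third, road win)
```

Weights like these are what a weighted Colley system consumes: each game then counts as that fraction of a game played, won, and lost.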

Vickie: To learn more about what Derek James learned in class, watch his interview below.

Derek James, Reporter FOX Charlotte-WCCB. Used with permission.

VK: In 2009 we published Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football by Wayne Winston, another great source for you veteran bracketologists. Also, if you go to waynewinston.com, you can see all of Wayne’s calculations and the odds of each team winning in a particular round. For example, for the Sweet Sixteen, he gives the University of Richmond a 0.84% chance of winning the championship and Ohio State a 29.6% chance of taking home the trophy. Although my heart is still with Richmond, I am going to go with Duke for a repeat. All the rankings and calculations I have done give them less of a chance of winning than other teams, so they are a mathematical long shot. However, I added a little luck factor into my calculations. Wayne is going with Ohio State. Check back in a few weeks and we will let you know how we did.
Good luck to all!