Basketball season may be over for the Michigan State Spartans, but the NCAA Tournament will be continuing with the Sweet 16 this coming Saturday. After the bracket was released, I presented my detailed analysis of the bracket and made some math-based predictions about how the first weekend and entire tournament might play out.
While it would be more fun to write about a potential MSU-Alabama matchup in the Sweet 16 (which might have actually come to fruition had the Spartans simply boxed out properly on a rebound in the final seconds of the First Four contest against UCLA), it is still fun to reflect on the results of the first weekend and to take another math-based look at the remaining tournament field. If nothing else, in the great words of Coach Mark Dantonio, it is time to “complete this circle.”
Let’s start with a review of the wild action of the first two rounds.
Results of Rounds One and Two
In my analysis of the bracket, I presented data showing that the average number of upsets to expect in the first round of the NCAA Tournament is eight, and in the second round, that number is five. When I looked at the projected odds for each of the first round and projected second round games, I identified a few matchups with better than average odds for an upset.
Based on this analysis, I made a historically average number of upset picks and then carried this analysis through to the Final Four and eventual champion. In the real tournament, there was an above average number of seed upsets in both rounds (10 in the first and six in the second, to be exact). Table 1 below summarizes my upset picks and the actual upsets through two rounds.
Table 1: Summary of NCAA Tournament upset picks and upset results through two rounds
Of the 13 total upset predictions that I made this year, six were correct, two earn partial credit (marked with a yellow “O”), and five were wrong. There were eight additional upsets that I did not pick. On balance I think that my method did OK.
The biggest success was that I correctly predicted one of the biggest upsets of the weekend: No. 14 Abilene Christian’s upset win over No. 3 Texas. I also picked No. 13 Ohio to upset No. 4 Virginia. I am also giving myself credit for taking UCLA/MSU to beat BYU, even though I was clearly thinking that it would be the Spartans and not the Bruins to win both of those games. Most office pools take the First Four winner as either/or, so it still counts in my book. Hey, there has to be some benefit of the First Four, right?
The other partial credit comes from the fact that I correctly bounced No. 3 West Virginia and No. 4 Oklahoma State in the second round; I just had the triumphant opponent wrong. From an office pool point of view, this also has some value.
As for a more visual view of the upsets, I repeat below one of the main figures that I used to make picks last week, with the actual upsets highlighted in bold, red text.
Figure 1: 2021 odds for the first round games compared to the average historical odds for each seed pair
From a certain point of view, in retrospect, my analysis perhaps did a better job than I originally thought. Of the 32 total first round games, only eight contests clearly fell below the average line, which denotes a more likely upset. Five of those games ended in an upset, and Liberty and Colgate were both very competitive in their games. It was only the LSU versus Saint Bonaventure game that bucked this trend, and I didn’t even make that pick.
I probably should have more seriously considered the possible Purdue upset by North Texas, but I decided to ignore the warnings of my own analysis. As for the other five upsets, three of them (Maryland, Syracuse, and UCLA) all lie close to the average line.
Only two of the 10 first round upsets were truly surprising: Oregon State and Oral Roberts. As for Oregon State, the Beavers’ upset of Tennessee perhaps could have been predicted had I simply remembered that head coach Rick Barnes was on the bench for the Volunteers, and he is absolutely notorious for losing to lower seeds. As for Ohio State’s loss to Oral Roberts, my math suggests that upsets on the No. 1 and No. 2 lines are simply random bad luck. It’s happened to the best of us...
Figure 2 gives a similar retrospective analysis of the second round games.
Figure 2: 2021 odds for second round games compared to the average historical odds for each seed pair
As for the predictability of the six second round upsets, the results are less clear. In total, seven of the 16 games had above average upset odds and only three of those games ended in upsets. In this case, I did correctly pick USC’s upset of Kansas and West Virginia’s loss, but Texas Tech and Maryland (actually UCONN) let me down on the upset front.
It is also clear that I let my belief in the strength of the Big Ten cloud my analysis a bit. The data did suggest that Wisconsin had a shot to beat Baylor, and that was the pick that I made. But the data suggested that Loyola-Chicago beating Illinois was actually more likely. If I couple that with the in-state rivalry aspect (similar to my analysis of Texas and Abilene Christian), then perhaps I should have seen that one coming. My faith in the Big Ten also caused two of my Final Four picks, Illinois and Ohio State, to be knocked out very early.
As for the other three upsets, the Oregon State/Oklahoma State game was right on the average line, but Iowa and Florida both had better than expected odds to avoid an upset. Once again, you win some and you lose some.
How Mad Was It?
Based on a few different measures, such as the number of double-digit upsets, the 2021 NCAA Tournament looks to be one of the most chaotic tournaments on record. That said, measures like just counting double-digit seed underdogs are not very mathematically precise. Fortunately, there is a better way to compare the relative madness of different months of March.
In order to quantify the relative likelihood of a specific set of first and second round outcomes in any given year, one just needs to know the odds of each individual game outcome. You can then multiply those probabilities together to get the overall odds.
Fortunately, I happen to have just these odds, as derived from Kenpom efficiency margin data. In fact, these are exactly the numbers that I use to run my Monte Carlo simulations. I also happen to have performed the same calculation on each NCAA Tournament back to the beginning of the Kenpom era (2002).
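For readers who want to see the arithmetic, here is a minimal Python sketch of that multiplication. The function name and the four-game slate are made up for illustration, the probabilities are not real Kenpom-derived numbers, and the calculation assumes that the games are independent of each other.

```python
import math

def outcome_odds(win_probs):
    """Return N, where the exact set of results had a '1 in N' chance.

    win_probs holds, for each game, the pre-game probability that was
    assigned to the team that actually won (hypothetical inputs here).
    """
    # Summing logs instead of multiplying directly avoids underflow
    # when dozens of small probabilities are combined.
    log_p = sum(math.log(p) for p in win_probs)
    return 1.0 / math.exp(log_p)

# Hypothetical slate: three favorites win, and one 20 percent underdog wins.
example = [0.90, 0.75, 0.60, 0.20]
one_in_n = outcome_odds(example)
print(f"1 in {one_in_n:.1f}")
```

Even with only four games, a single upset pushes the combined result to roughly 1 in 12. Stretch that over 32 first round games and the numbers climb into the millions quickly.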
The results tell me that the odds for the specific first round outcome in 2021 were:
1 in 81.5 million.
That is on the high side. The first round odds in 2013, 2016, and 2018 were similar in magnitude, but a little lower. The geometric average since 2002 is around one in five million. However, one other year, 2012, still holds the record for the least likely first round outcome at:
1 in 800 million.
This was the year when both Duke and Missouri were upset as No. 2 seeds by No. 15 seeds Lehigh and Norfolk State, respectively. The year 2012 also had 10 total first round upsets, including a No. 4 seed and two No. 5 seeds. However, the second round in 2012 recorded only two additional upsets, and the odds of the specific outcome after two rounds were “only”:
1 in 1.3 trillion.
This is actually slightly lower than the odds after two rounds in 2018 when No. 1 Virginia was upset in the first round by No. 16 UMBC, and then the second round saw the upset of a second No. 1 seed (Xavier) and half of the No. 2 seeds (Cincinnati and North Carolina). The odds of seeing the exact scenario in 2018 were:
1 in 1.8 trillion.
But, that pales in comparison to the tally from 2021. The qualitative estimates are, in fact, correct. The odds that I calculate for the current tournament results after two rounds are:
1 in 6.5 trillion.
These are the longest odds of the Kenpom era by more than a factor of three, and much longer than the geometric average of one in 30 billion going back to 2002.
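A geometric average is the natural choice here because these odds span several orders of magnitude, and a plain arithmetic mean would be dominated by the one or two wildest years. The sketch below shows the calculation using only the three two-round figures quoted in this section (2012, 2018, and 2021). It is an illustration of the method, not a reproduction of the one in 30 billion figure, which covers every year back to 2002.

```python
import math

def geometric_mean(values):
    """Geometric mean: the exponential of the average of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# "1 in N" values after two rounds, as quoted in the text above.
two_round_odds = {2012: 1.3e12, 2018: 1.8e12, 2021: 6.5e12}

gm = geometric_mean(two_round_odds.values())
am = sum(two_round_odds.values()) / len(two_round_odds)

print(f"geometric mean:  1 in {gm:.3g}")
print(f"arithmetic mean: 1 in {am:.3g}")
```

For these three years the geometric mean comes out lower than the arithmetic mean, as it always does for unequal positive numbers, since the 2021 value pulls the arithmetic mean up more strongly.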
Analyzing the Sweet 16
So, what’s next? With 16 teams remaining, it is time to wipe the slate clean and try to make some new predictions about how the rest of the tournament will play out. I will start with the results of a new Monte Carlo simulation of the remainder of the tournament.
Table 2: Monte Carlo Simulation results starting from the Sweet 16
I decided to keep the pre-tournament Kenpom efficiency values in this case, so I don’t want to get too hung up on the details. What this tells me is that Gonzaga is still a heavy favorite to win it all (43 percent) and that the Zags have about a 75 percent chance to reach the Final Four.
Then, there are three teams next in line with similar odds to cut down the nets: Michigan, Houston, and Baylor (around 13 percent odds each). Each of those teams is 50-50 to advance to the Final Four. Then, there is a group of dark horse teams (Loyola-Chicago, Alabama, Arkansas, USC, and Villanova) with between two and five percent odds to win the title.
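To give a feel for how a Monte Carlo simulation produces numbers like these, here is a minimal Python sketch for a single four-team region. Everything in it is an assumption made for illustration: the team names and efficiency margins are invented, the margin difference is treated directly as an expected point spread (ignoring tempo), and the 11-point standard deviation is a commonly used rule of thumb, not necessarily the value behind Table 2.

```python
import math
import random

SIGMA = 11.0  # assumed standard deviation of the final scoring margin

def win_prob(margin_a, margin_b):
    """Chance that team A beats team B, using a normal model of the margin."""
    diff = margin_a - margin_b
    return 0.5 * (1.0 + math.erf(diff / (SIGMA * math.sqrt(2.0))))

def simulate_region(teams, rng):
    """Play one single-elimination bracket; teams are listed in bracket order."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        # Pair off neighbors: (0 vs 1), (2 vs 3), and so on.
        for a, b in zip(alive[0::2], alive[1::2]):
            p = win_prob(a[1], b[1])
            next_round.append(a if rng.random() < p else b)
        alive = next_round
    return alive[0][0]

def region_odds(teams, n_sims=20000, seed=1):
    """Fraction of simulated brackets won by each team."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _ in teams}
    for _ in range(n_sims):
        counts[simulate_region(teams, rng)] += 1
    return {name: c / n_sims for name, c in counts.items()}

# Hypothetical (name, efficiency margin) pairs, not real Kenpom values.
region = [("Team A", 30.0), ("Team B", 15.0), ("Team C", 22.0), ("Team D", 20.0)]
odds = region_odds(region)
for name, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.1%}")
```

Extending the same loop to all 16 remaining teams and two more rounds gives Final Four and title odds in the same way.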
I also included a column in this table labeled “normalized final four odds.” This is my attempt to estimate the relative ease or difficulty of each team’s path to the Final Four. The calculation involves estimating the odds of each team to advance to the Final Four if they were only as good as a benchmark team with an efficiency margin of +19.00 (an average high-major team).
Higher percentages mean an easier path, which is the case for Arkansas and Houston, as they both will face double-digit seeds in Oral Roberts and Syracuse, respectively, in round three. On the opposite end of the spectrum is Creighton (who will face Gonzaga). The Blue Jays grade out to have the most difficult remaining path.
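The idea behind the normalized odds can be sketched in a few lines of Python. Only the +19.00 benchmark comes from the text above. The opponent margins, the function names, and the normal model with an 11-point standard deviation are assumptions for illustration, so the outputs are not the values in Table 2.

```python
import math

SIGMA = 11.0      # assumed standard deviation of the scoring margin
BENCHMARK = 19.0  # benchmark efficiency margin quoted in the text

def win_prob(margin_a, margin_b):
    """Chance that team A beats team B, using a normal model of the margin."""
    diff = margin_a - margin_b
    return 0.5 * (1.0 + math.erf(diff / (SIGMA * math.sqrt(2.0))))

def normalized_final_four_odds(sweet16_opp, elite8_opps):
    """Odds that a benchmark-strength team wins its next two games.

    sweet16_opp: margin of the Sweet 16 opponent.
    elite8_opps: margins of the two teams in the other Sweet 16 game
                 of the same region.
    """
    p_first = win_prob(BENCHMARK, sweet16_opp)
    x, y = elite8_opps
    p_x_advances = win_prob(x, y)
    # Weight each possible regional final by how likely that opponent is.
    p_second = (p_x_advances * win_prob(BENCHMARK, x)
                + (1.0 - p_x_advances) * win_prob(BENCHMARK, y))
    return p_first * p_second

# Hypothetical paths: a double-digit seed first versus a title favorite first.
easy_path = normalized_final_four_odds(8.0, (18.0, 12.0))
hard_path = normalized_final_four_odds(35.0, (22.0, 20.0))
print(f"easy path: {easy_path:.1%}, hard path: {hard_path:.1%}")
```

Because the team’s own strength is held fixed at the benchmark, any difference between the two results reflects only the quality of the opponents in the way.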
As for potential upsets to look out for in the next few rounds, Figure 3 below compares the odds in each contest relative to the historical average for each given seed combination. This is essentially the same analysis shown above in Figures 1 and 2.
Figure 3: 2021 odds for the Sweet 16 games (left) and potential regional final games (right) compared to the average historical odds for each seed pair
In this case, for the Sweet 16 games, I am using the odds from the actual opening Vegas lines, as opposed to the Kenpom projected odds. For the regional final round (Figure 3, right) I revert to the odds from Kenpom.
Based on both the original simulation results and the expected value calculations, two upsets are expected in the Sweet 16 round. Based on the left panel of Figure 3, the most likely upsets are for No. 1 Michigan to lose to Florida State and No. 6 USC to lose to No. 7 Oregon.
That said, USC actually has better odds than an average No. 6 seed in a No. 6 versus No. 7 matchup, which makes me balk at that pick a little. The next most likely upset would be for No. 11 Syracuse to beat No. 2 Houston, which just feels annoyingly correct. If this were to come to pass, Jim Boeheim would surpass Tom Izzo for the most upset wins in NCAA Tournament history with 16. Dislike.
As for the regional final round, the odds suggest one out of the four games will end in an upset. On the right panel of Figure 3, I compare the teams under the assumption that the higher seeds all advance. In this scenario, the most likely upset is for No. 8 Loyola to beat No. 2 Houston (if the Cougars can solve the Syracuse zone). After the beat-down that the Ramblers gave to the Illini last weekend, I would totally buy that.
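An expected number of upsets for a round can be computed by simply adding up each underdog’s win probability, which works even if the games are not independent. The Python sketch below shows that sum with invented probabilities for eight Sweet 16 games. The game labels and values are hypothetical and are not the Vegas or Kenpom odds plotted in Figure 3.

```python
def expected_upsets(upset_probs):
    """Expected upset count: the sum of each underdog's win probability."""
    return sum(upset_probs.values())

# Hypothetical underdog win probabilities for eight Sweet 16 games.
sweet16 = {
    "Game 1": 0.10, "Game 2": 0.45, "Game 3": 0.30, "Game 4": 0.35,
    "Game 5": 0.20, "Game 6": 0.15, "Game 7": 0.25, "Game 8": 0.20,
}

expected = expected_upsets(sweet16)
# Rank games from most to least likely upset by raw probability.
ranked = sorted(sweet16, key=sweet16.get, reverse=True)

print(f"Expected upsets: {expected:.2f}")
print("Most likely upsets:", ranked[:round(expected)])
```

Ranking by raw probability is only one way to choose which upsets to pick. Comparing each game to the historical average for its seed pairing, as in Figure 3, can point to different games.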
If I were to start again from the Sweet 16 round, I believe that I would take Florida State and Syracuse to win, and then just the top seeds in the next round, which would give me a Final Four of:
- No. 1 Gonzaga
- No. 1 Baylor
- No. 8 Loyola-Chicago
- No. 2 Alabama
This Final Four is a reasonable distribution of seeds, and I think that it is totally reasonable based on the eyeball test from last weekend. I would take Gonzaga over Alabama, and then I will take a flyer on Loyola to upset Baylor before succumbing to the machine that is Gonzaga.
That is all for today. Enjoy what is left of March Madness and as always, Go Green.