Welcome!
Welcome to the 2023 edition of my NCAA tournament guide. Every year, Kaggle hosts a data science competition to see who can build the best model for predicting the tournament. The goal of the competition is to produce a win probability for every team in the tournament against every other team. For the matchups that actually occur, each submission is scored on how close its probability was to the actual outcome.
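The text does not name the scoring rule, so the following is only a sketch of how "closeness" between a probability and a win/loss outcome is commonly measured. It shows two standard proper scoring rules, the Brier score and log loss. Which one (if either) the competition uses is an assumption to check against the competition page, and the three games below are made-up inputs.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood (lower is better); probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical games: predicted probability that the first-listed team wins,
# and whether it actually won (1) or lost (0).
probs = [0.95, 0.71, 0.40]
outcomes = [1, 0, 1]
print(round(brier_score(probs, outcomes), 4))
print(round(log_loss(probs, outcomes), 4))
```

Both rules punish confident misses much more than hedged ones, which is why a 71% pick that loses costs far more than a 95% pick that wins earns back.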
The analysis I have done is intended for that competition. This website, however, is meant to help you fill out a bracket. Win probabilities cannot fill out a bracket on their own, so I have created tools to help you apply my research to that task.
The bracket tool shows the entire bracket (be sure to scroll or swipe around, as it is very large). Each node in the graph, represented by a team name with its seed and a team icon, is clickable. Clicking a team records a win in that round and advances the team to the next round. When a valid matchup is present, the win probability for each team is shown next to its name. A toggle in the upper left corner switches the display from win probabilities to point spreads.
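The site does not say how the toggle converts win probabilities into point spreads. One common modeling assumption, sketched below, is that the final scoring margin is normally distributed around the predicted spread. The standard deviation `MARGIN_SD` (11 points here) and the function names are hypothetical, chosen for illustration rather than taken from the site.

```python
from statistics import NormalDist

# Hypothetical assumption: the final margin is normal around the predicted spread
# with this standard deviation (in points). The site does not state this value.
MARGIN_SD = 11.0
_std_normal = NormalDist()  # mean 0, standard deviation 1

def spread_to_prob(spread, sd=MARGIN_SD):
    """Win probability for a team favored by `spread` points (positive = favored)."""
    return _std_normal.cdf(spread / sd)

def prob_to_spread(prob, sd=MARGIN_SD):
    """Point spread implied by a win probability; the inverse of spread_to_prob."""
    if not 0.0 < prob < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return sd * _std_normal.inv_cdf(prob)

print(round(prob_to_spread(0.71), 1))  # spread implied by a 71% win probability
print(round(spread_to_prob(0.0), 2))   # an even matchup
```

Note the sign convention: here a favorite has a positive spread, whereas sportsbooks usually list favorites with a negative number.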
If you click a team in a way that invalidates matchups later in the bracket, every downstream pick that depended on the old result is cleared. Refreshing the page always resets the bracket to its original state. As the tournament progresses, I will update the bracket to reflect the actual results, and these will become the default state. You will still be able to manipulate the entire bracket, however.
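The clearing behavior described above can be sketched with a bracket stored as a list of rounds, where a replaced pick is removed from every later slot it had reached. The `Bracket` class and its `advance` method are hypothetical names for illustration, not the site's actual code.

```python
from typing import List, Optional

class Bracket:
    """Hypothetical single-elimination bracket stored round by round.

    rounds[0] holds the starting field in slot order; the winner of the game
    between rounds[r][2*i] and rounds[r][2*i + 1] goes into rounds[r + 1][i].
    """

    def __init__(self, teams: List[str]):
        n = len(teams)
        if n < 2 or n & (n - 1):
            raise ValueError("field size must be a power of two")
        self.rounds: List[List[Optional[str]]] = [list(teams)]
        while len(self.rounds[-1]) > 1:
            self.rounds.append([None] * (len(self.rounds[-1]) // 2))

    def advance(self, round_idx: int, slot: int) -> None:
        """Pick the team in rounds[round_idx][slot] to win its game."""
        team = self.rounds[round_idx][slot]
        if team is None or round_idx + 1 >= len(self.rounds):
            return  # empty slot, or the champion: nothing to advance
        next_round, next_slot = round_idx + 1, slot // 2
        old = self.rounds[next_round][next_slot]
        self.rounds[next_round][next_slot] = team
        if old is not None and old != team:
            # The previous pick may have been advanced further; clear its path.
            self._clear_downstream(old, next_round + 1, next_slot // 2)

    def _clear_downstream(self, old: str, r: int, slot: int) -> None:
        """Remove the replaced team from every later slot on its path."""
        while r < len(self.rounds) and self.rounds[r][slot] == old:
            self.rounds[r][slot] = None
            r, slot = r + 1, slot // 2

# Usage: a four-team bracket.
b = Bracket(["A", "B", "C", "D"])
b.advance(0, 0)  # A beats B
b.advance(0, 3)  # D beats C
b.advance(1, 0)  # A beats D and becomes champion
b.advance(0, 1)  # change the pick: B beats A, so A's title pick is cleared
print(b.rounds)
```

Only slots holding the replaced team are cleared, so picks in unrelated parts of the bracket survive a change.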
Seed A | Seed B | Expected Seed A Win Probability |
---|---|---|
1 | 16 | 95% |
2 | 15 | 89% |
3 | 14 | 83% |
4 | 13 | 77% |
5 | 12 | 71% |
6 | 11 | 65% |
7 | 10 | 59% |
8 | 9 | 53% |
9 | 8 | 47% |
10 | 7 | 41% |
11 | 6 | 35% |
12 | 5 | 29% |
13 | 4 | 23% |
14 | 3 | 17% |
15 | 2 | 11% |
16 | 1 | 5% |
The bracket provides the probability that team A will beat team B. There are two important things to note here. The first is that a team with a 90% win probability is not guaranteed to win; on average, it will lose about 1 in every 10 matchups with that opponent.
The second note concerns bracket strategy. If you were simply trying to give yourself the best chance of getting every game right, you would pick the team with the higher win probability in every game. However, my model is not going to get every game right, and neither will anyone else's. Your true goal is to get more games right than anyone else in your pool. To accomplish this, you will need to pick some upsets, so look for matchups where the worse-seeded team has a higher win probability than you would expect given its seed. These are the upsets where you will gain an advantage over the rest of your pool.
For example, take everyone's favorite matchup: according to the seed table, a 12 seed should beat a 5 seed 29% of the time, so if you see a 12 seed with a 40% win probability, that is a good pick for gaining an advantage in your pool even though the team is not favored. The table gives all the first-round seed pairings, and you can generate seed-based expected probabilities for other pairings with the formula 50 + 3 × (Seed B - Seed A), in percent.
My last word on this: you should also take into account the size of your pool. Bigger pools require more upset picks, and smaller pools fewer.
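The seed baseline is simple enough to compute directly. The sketch below reads "seed difference" as Seed B minus Seed A, the reading that reproduces the table, and measures a pick's edge as the model's win probability minus that baseline. The function names are hypothetical, and the edge measure is just one simple way to rank upset candidates.

```python
def expected_win_prob(seed_a: int, seed_b: int) -> int:
    """Seed-based baseline (percent) that seed_a beats seed_b: 50 + 3 * (seed_b - seed_a)."""
    return 50 + 3 * (seed_b - seed_a)

def upset_edge(seed_a: int, seed_b: int, model_prob: float) -> float:
    """Model win probability (percent) minus the seed baseline.

    A positive value means the model rates team A higher than its seed suggests.
    """
    return model_prob - expected_win_prob(seed_a, seed_b)

# Reproduce the first-round table (seeds in each first-round game sum to 17).
for seed_a in range(1, 17):
    seed_b = 17 - seed_a
    print(seed_a, seed_b, f"{expected_win_prob(seed_a, seed_b)}%")

# The 12-vs-5 example from the text: a 40% model probability against a 29% baseline.
print(upset_edge(12, 5, 40))
```

Because seeds run from 1 to 16, the baseline always stays between 5% and 95%.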
Scott Kellert | |
---|---|
GitHub: | github.com/skellert |