Season Simulations with Cordelia

September 28, 2016, Micah Blake McCurdy, @IneffectiveMath

My single-game prediction model for 2016-2017 is called Cordelia. Estimating the probability of a team winning a game is fun, but the real fun comes from simulating entire seasons to measure some derived probabilities:

To answer these questions we need a simulation harness.

Cordelia is trained on team data, but teams are purely fictitious entities. Players and coaches, however, are somewhat less fictitious, so we would like to have a way to ground our estimates of team ability in measurements of personal abilities.

Probabilities for Individual Games

For every game in a season, we first estimate a lineup for both teams. This is done probabilistically, based on the amount of icetime each available player has seen over the last little while. Regulars with lots of accumulated icetime will be almost certain to dress, whereas occasional callups with a handful of minutes will have a small but nonzero chance to appear in any given future game. After lineups are chosen, estimated icetimes are distributed among the players, this time based on past icetime per game when dressed instead of total icetime. With estimated icetimes in hand, I form the sum of the relevant individual results, weighted by projected icetime, to estimate the team value to feed into Cordelia. Individual results are measured over the past two calendar years and are regressed linearly towards NHL average values when individuals have very small NHL icetimes. Thus players with no NHL experience are all assumed to be NHL average quality, with only three exceptions for 2016-2017: Auston Matthews, Patrik Laine, and Jesse Puljujärvi are all assumed to:

Goaltenders are assumed to dress with a fixed probability that is entirely formed from the dense regions of my brain, with input (not always solicited) from people more familiar with the penchants of various coaches around the league.
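To make the skater part of this procedure concrete, here is a minimal Python sketch under stated assumptions. It is not the actual harness. The field names (recent_minutes, minutes_per_game, nhl_minutes, result), the 1000-minute threshold at which regression stops, the 18-skater lineup size, and the choice to normalise the weighted sum by total projected icetime are all my own illustrative guesses. Drawing players one at a time, weighted by recent icetime, is just one simple way to sample a lineup probabilistically.

```python
import random

def regress_to_average(observed, nhl_minutes, league_average,
                       full_weight_minutes=1000.0):
    """Shrink an observed result linearly towards league average.

    A player with zero NHL minutes gets exactly the league average; the
    1000-minute threshold is an assumed value, not taken from the text.
    """
    weight = min(nhl_minutes / full_weight_minutes, 1.0)
    return weight * observed + (1.0 - weight) * league_average

def sample_lineup(players, rng, lineup_size=18):
    """Draw skaters without replacement, weighted by recent total icetime.

    Every player must have a positive recent_minutes value.
    """
    pool = list(players)
    lineup = []
    while pool and len(lineup) < lineup_size:
        weights = [p["recent_minutes"] for p in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        lineup.append(pool.pop(index))
    return lineup

def team_value(lineup, league_average):
    """Combine regressed individual results, weighted by projected icetime."""
    total_minutes = sum(p["minutes_per_game"] for p in lineup)
    return sum(
        p["minutes_per_game"] / total_minutes
        * regress_to_average(p["result"], p["nhl_minutes"], league_average)
        for p in lineup
    )
```

Calling sample_lineup with a seeded random.Random and passing the result to team_value yields one candidate team value for one game. Regulars with large recent_minutes are drawn early and almost always, while a callup with a handful of minutes appears only occasionally.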

For a given game we can repeat the above process many times, obtaining a variety of different possible lineups for both teams. I use a hundred different lineups, and cache the results in a convenient database. Whenever a team plays another game, or makes a trade, or a player is injured or returns from injury, the probabilities for all of their future games are re-calculated.
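The caching and re-calculation described here might look roughly like the following sketch. An in-memory dictionary stands in for the database, and sample_probability is a hypothetical callable that draws one lineup pair for a game and returns the resulting win probability. A game is assumed to be a (home, away, date) tuple. Entries are recomputed lazily on the next lookup, which is my simplification and not necessarily how the real harness schedules its updates.

```python
class ProbabilityCache:
    """Cache of sampled win probabilities, keyed by game."""

    def __init__(self, sample_probability, draws=100):
        self._sample = sample_probability  # hypothetical: game -> probability
        self._draws = draws                # a hundred lineups per game
        self._store = {}

    def get(self, game):
        """Return the cached probabilities, computing them if missing."""
        if game not in self._store:
            self._store[game] = [self._sample(game) for _ in range(self._draws)]
        return self._store[game]

    def invalidate_team(self, team):
        """Forget every cached game involving a team, e.g. after a trade,
        an injury, or a newly played game."""
        stale = [g for g in self._store if team in (g[0], g[1])]
        for game in stale:
            del self._store[game]
```

After invalidate_team is called, the next get for any of that team's games triggers a fresh set of a hundred draws, while games between other teams keep their cached values.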

Monte Carlo Simulations

Once all of the probabilities are computed, the season itself can be simulated with a very slightly modified Monte Carlo algorithm. For each game, one of the pre-computed probabilities is randomly chosen (uniformly among the hundred), a winner of the game is also randomly chosen, weighted according to the chosen probability, a win type (regulation 75%, overtime 10%, or shootout 15%) is chosen, and the overall result is recorded. This process is repeated as many times as desired; I find "one million times" sufficient to stabilize estimates for most quantities of interest, and it also sounds suitably impressive.
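A single pass through the season could be sketched as below. The win-type weights come from the paragraph above; everything else is an assumption made for illustration. In particular, I assume each schedule entry is a (home, away, probabilities) tuple whose probabilities are chances of a home win, and that results are recorded as standings points under the usual NHL scheme (two for any win, one for an overtime or shootout loss).

```python
import random

WIN_TYPES = ["regulation", "overtime", "shootout"]
WIN_TYPE_WEIGHTS = [0.75, 0.10, 0.15]  # percentages quoted in the text

def simulate_game(home, away, home_win_probs, rng):
    """Simulate one game from its list of pre-computed probabilities."""
    p_home = rng.choice(home_win_probs)  # uniform among the cached values
    if rng.random() < p_home:
        winner, loser = home, away
    else:
        winner, loser = away, home
    win_type = rng.choices(WIN_TYPES, weights=WIN_TYPE_WEIGHTS, k=1)[0]
    return winner, loser, win_type

def simulate_season(schedule, rng):
    """Play every scheduled game once and return standings points per team."""
    points = {}
    for home, away, home_win_probs in schedule:
        winner, loser, win_type = simulate_game(home, away, home_win_probs, rng)
        points[winner] = points.get(winner, 0) + 2
        loser_point = 0 if win_type == "regulation" else 1
        points[loser] = points.get(loser, 0) + loser_point
    return points
```

Repeating simulate_season with the same schedule and a shared random.Random instance gives as many independent simulated seasons as desired.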

With a million simulation results ready to hand, one can measure the probability of all of the things I mentioned in the introduction, simply by counting what fraction of the simulations exhibit the thing that interests us. For instance, if a given team makes the playoffs in four hundred thousand simulations and does not in six hundred thousand, then their chance of making the playoffs is 40%.
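The counting step, together with a rough check on why a million simulations is plenty, can be sketched as follows. The standard error here is the ordinary binomial one, which assumes the simulated seasons are independent of one another; the function name is my own.

```python
import math

def estimate_probability(successes, simulations):
    """Fraction of simulations showing the event, with its standard error."""
    p = successes / simulations
    std_error = math.sqrt(p * (1.0 - p) / simulations)
    return p, std_error

# The playoff example from the text: 400,000 of 1,000,000 simulations.
p, se = estimate_probability(400_000, 1_000_000)
print(f"{p:.1%} +/- {se:.2%}")  # prints "40.0% +/- 0.05%"
```

With a million simulations the sampling noise on a 40% estimate is about five hundredths of a percentage point, far smaller than any uncertainty in the underlying game probabilities.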

Timescales

Careful readers of this article and the description of Cordelia itself will doubtless have detected that they operate on two different time scales---Cordelia is trained on covariates of "past twenty-five games" but the simulation harness uses "last two calendar years" for individual players. I have observed that using a longer window for Cordelia harms predictive ability, and I suspect that the main reason for this is that lineups for a given game no longer sufficiently resemble lineups from a few months prior. Since this simulation harness accounts for trades and injuries---that is, most of the things that make a lineup not what it was in past games---I feel justified in looking at a much longer window to gauge individual players' results.