My single-game prediction model for 2023-2024 is called Magnus 7. As the name suggests, it is closely related to its predecessor, Magnus 6. It maintains the same structure, estimating the patterns of shots and penalties that will be taken in a game and then simulating the result of those shots. It is so similar, in fact, that much of this explanation is copied from the Magnus 6 explanation, where appropriate. For those who are familiar with my past work, the key improvements over last
I have made a variety of models to estimate the ability of hockey players and coaches to influence hockey games. They are interesting in their own right, but I also like to use these estimates to predict the outcome of future hockey games. This article describes that simulation process, which takes as inputs a set of players for the home team and another for the away team, together with an expected distribution of icetimes and a pair of head coaches. Many things can be measured from a simulation, but my primary interest is the probability of a given team winning, estimated as the proportion of wins across many simulated games.
Individual player impacts on 5v5 shot rates are estimated using a model described here. This submodel is the most important contributor to the final probabilities; it isolates each player's impact from their past teammates, opponents, score deployment, zone deployment, rest, and coaching. Since the simulation takes a distribution of icetimes as input, the player estimates can be combined in an icetime-weighted sum to obtain "baseline" shot rate maps for the portion of the game played at even strength.
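As a rough illustration of that weighted sum, here is a minimal Python sketch. It assumes an additive model in which each player's isolate is a flat list of per-cell shot-rate adjustments (shots per hour) added to a league-average map, and that `icetime_share` gives the fraction of the team's 5v5 time each skater is expected to play. The function and argument names are hypothetical, not taken from Magnus itself.

```python
from typing import Dict, List


def baseline_map(
    league_average: List[float],
    player_maps: Dict[str, List[float]],
    icetime_share: Dict[str, float],
) -> List[float]:
    """Combine per-player 5v5 isolates into a single team shot-rate map.

    Assumes an additive model: each cell of the team map is the league
    average plus the icetime-weighted sum of every skater's isolated impact.
    icetime_share[p] is the fraction of the team's 5v5 time that player p is
    expected to be on the ice, so the shares of all skaters sum to about 5.
    """
    team = list(league_average)  # start from league-average shot rates
    for player, impact in player_maps.items():
        share = icetime_share.get(player, 0.0)  # players with no icetime add nothing
        for cell, delta in enumerate(impact):
            team[cell] += share * delta
    return team


# Two-cell toy maps (shots per hour per cell), purely illustrative.
print(baseline_map([1.0, 2.0], {"A": [0.5, -0.5], "B": [0.2, 0.0]}, {"A": 1.0, "B": 0.5}))
```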
For special teams, the 5v5 method above is repeated with some simplifications; specifically, rest is neglected but all of the other terms are kept. For each passage of play where one team has five skaters and the other four, a record is made as above, with the five skaters listed as attackers and the four as defenders. This gives estimates of each player's marginal impact on power-play offence and penalty-kill defence. For power-play defence and penalty-kill offence I assume league-average play from all players.
The individual shot isolates are combined to form team estimates for the game being simulated, with each skater's isolate weighted by their expected icetime in the relevant situation. In all, three team estimates are formed: one for even strength, one for the power-play, and one for the penalty-kill. Player fatigue from recently played games is taken into account at this stage, although specific fatigue-related roster decisions are not estimated. For the game at hand, the shots taken by the home team and the shots allowed by the road team are averaged to obtain the home team's shot rate estimate for the simulation to come. For instance, the "home team power-play for this game" map is the average of the home team's power-play map and the road team's penalty-kill map, and likewise for all of the other maps.
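A minimal sketch of the per-game averaging step, assuming each team's icetime-weighted maps are stored in a dictionary keyed by hypothetical situation labels `"ev"`, `"pp"` and `"pk"`, with "for" maps describing shots generated and "against" maps describing shots allowed. Fatigue adjustments are left out, and the names are illustrative only.

```python
from typing import Dict, List

# Which of the opponent's "against" maps each of our "for" maps faces:
# our power-play offence meets their penalty-kill defence, and vice versa.
OPPONENT_SITUATION = {"ev": "ev", "pp": "pk", "pk": "pp"}


def average_maps(a: List[float], b: List[float]) -> List[float]:
    """Cell-by-cell mean of two shot-rate maps of the same shape."""
    if len(a) != len(b):
        raise ValueError("maps must have the same number of cells")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def matchup_maps(
    team_for: Dict[str, List[float]],
    opponent_against: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Shot-rate maps for one team in this game, keyed by situation.

    team_for[s] holds the shots the team is estimated to generate in
    situation s; opponent_against[s] holds the shots the opponent is
    estimated to allow in s (both already icetime-weighted).
    """
    return {
        s: average_maps(m, opponent_against[OPPONENT_SITUATION[s]])
        for s, m in team_for.items()
    }
```

Calling `matchup_maps(home_for, away_against)` gives the home team's maps for the game; calling it with `away_for` and `home_against` gives the road team's.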
For each skater, I estimate their impact on their team's rates of drawing and taking penalties using a regression model similar to the shot rate models. These individual impacts are averaged, weighted by expected all-situations icetime, to obtain team penalty rates, drawn and taken. The overall penalty rate for the home team is the average of the "home team taken" rate and the "away team drawn" rate, and vice versa. Major penalties, misconducts of any kind, offsetting minors, penalties taken or drawn by goaltenders, and bench minors are not considered.
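The same two-stage averaging for penalties can be sketched as follows, assuming per-player rates are expressed in penalties per hour and icetimes in minutes. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def weighted_mean(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Mean of per-player values weighted by expected all-situations icetime."""
    total = sum(weights[p] for p in values)
    return sum(values[p] * weights[p] for p in values) / total


def game_penalty_rates(
    home: Tuple[float, float], away: Tuple[float, float]
) -> Tuple[float, float]:
    """Penalties taken per hour by (home, away) in this game.

    Each argument is a team's (drawn, taken) rate. The home team's rate is the
    mean of what it tends to take and what the away team tends to draw, and
    vice versa.
    """
    home_drawn, home_taken = home
    away_drawn, away_taken = away
    return (home_taken + away_drawn) / 2.0, (away_taken + home_drawn) / 2.0
```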
With the team shot and penalty rate estimates in hand, I can simulate the game. First, for each second of the game, I randomly choose whether that second will contain a penalty, using the estimated team penalty rates. If it does, the number of skaters on the ice for the penalized team is decreased by one. After 120 seconds, regardless of whether either team scores in the meantime, that team's number of skaters is increased by one again. The shot rate estimates are adjusted to account for the new number of skaters, but not to account for specific players being unavailable (because they are in the penalty box). This very simplistic treatment of penalty-taking nevertheless produces patterns of special-teams play similar to what we see in games. Situations where both teams have the same number of skaters are "even strength"; when one team has more skaters than the other, it is a "power-play" for the team with more and a "penalty-kill" for the team with fewer. No distinction is drawn between 5v5 and 4v4 or 3v3 play (except for one minor detail noted later in the shooting section), nor is 5v4 distinguished from 5v3 or 4v3 play.
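A minimal sketch of this per-second penalty process is below. It assumes penalty rates in penalties per hour, a single weighted draw per second (so at most one penalty per second), and a floor of three skaters when penalties stack; that floor and all of the names are my assumptions rather than details given above. The shot-rate adjustments for each skater count are not shown.

```python
import random
from typing import List, Optional, Tuple

PENALTY_SECONDS = 120  # every minor is served in full; goals do not end it
FULL_STRENGTH = 5
MIN_SKATERS = 3  # assumed floor, so stacked penalties bottom out at 3 skaters


def simulate_penalties(
    home_rate: float,
    away_rate: float,
    seconds: int = 3600,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """(home_skaters, away_skaters) for each second of regulation.

    Rates are penalties taken per hour. Each second contains at most one
    penalty, chosen by a single weighted draw.
    """
    if rng is None:
        rng = random.Random()
    p_home, p_away = home_rate / 3600.0, away_rate / 3600.0
    home_box: List[int] = []  # seconds at which active home penalties expire
    away_box: List[int] = []
    states = []
    for t in range(seconds):
        # Release penalties whose 120 seconds have been served.
        home_box = [end for end in home_box if end > t]
        away_box = [end for end in away_box if end > t]
        u = rng.random()
        if u < p_home:
            home_box.append(t + PENALTY_SECONDS)
        elif u < p_home + p_away:
            away_box.append(t + PENALTY_SECONDS)
        states.append((max(MIN_SKATERS, FULL_STRENGTH - len(home_box)),
                       max(MIN_SKATERS, FULL_STRENGTH - len(away_box))))
    return states


def situation(own: int, opp: int) -> str:
    """Even strength, power-play or penalty-kill, ignoring 5v5 vs 4v4 etc."""
    if own == opp:
        return "ev"
    return "pp" if own > opp else "pk"
```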
Score effects are modelled with the leading and trailing terms from the 5v5 shot rate model. Every team is assumed to respond to the score in a league-average way, and the tendency of coaches to change tactics in the third period when tied or leading by one or two goals is also included, with modifications peculiar to each coach.
For shots, I use a naive approach: for each second, I randomly choose whether that second will contain no shot (like most seconds), a home-team shot, or an away-team shot. The weights for this random choice are taken from the team shot estimates; for instance, if a team is estimated to take 42 shots per hour, the chance of them taking a shot in a given second is 42/3600 ≈ 0.0117, a little over 1%. The "structural" terms from my shot rate models are applied depending on the second at hand, so home teams generate more shots, and both teams generate more shots, and more dangerous ones, during the second period.
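The per-second draw can be sketched like this, assuming the hourly shot rates passed in already include the structural adjustments for the second at hand. The function names are hypothetical; the first helper reproduces the 42/3600 arithmetic.

```python
import random
from typing import Optional


def per_second_probability(shots_per_hour: float) -> float:
    """Chance of a shot in any one second for a given hourly rate."""
    return shots_per_hour / 3600.0


def choose_shot(home_rate: float, away_rate: float, rng: random.Random) -> Optional[str]:
    """Return 'home', 'away' or None (no shot) for one second.

    home_rate and away_rate are shots per hour for the current second, assumed
    to already include any structural adjustments (home ice, period, score).
    """
    p_home = per_second_probability(home_rate)
    p_away = per_second_probability(away_rate)
    u = rng.random()
    if u < p_home:
        return "home"
    if u < p_home + p_away:
        return "away"
    return None
```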
Once a given team takes a shot, I estimate which player on the team is likely to take the shot. For each skater, I compute a "shot propensity", that is, the fraction of their team's on-ice shots which they have taken in the past, given the skater-strength situation (EV, PP, or PK). A shoot-first player like Ovechkin, for instance, has a shot propensity of 34% at even strength and 37% on the power-play; the Capitals' offence runs through him. On the other hand, a playmaker like Joe Thornton has a shot propensity of 10% in both situations. A player with a low shot propensity at even strength might have a different role, and thus a different shot propensity, on the power-play; most players are close to the 20-30% one might expect from a balanced offence. These shot propensities are weighted by expected icetime in the given skater-strength situation, and the resulting weights are used to choose a player to designate as the shooter.
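A minimal sketch of the shooter choice, assuming propensities are fractions (0.34 for 34%) and icetimes are expected minutes in the situation at hand; the names are hypothetical. Python's `random.Random.choices` accepts unnormalised weights, so the products need not sum to one.

```python
import random
from typing import Dict


def choose_shooter(
    propensity: Dict[str, float],
    icetime: Dict[str, float],
    rng: random.Random,
) -> str:
    """Pick the shooter for a shot in a given situation (EV, PP or PK).

    Each skater's weight is their shot propensity for that situation times
    their expected icetime in it; random.choices normalises the weights.
    """
    players = list(propensity)
    weights = [propensity[p] * icetime.get(p, 0.0) for p in players]
    return rng.choices(players, weights=weights, k=1)[0]
```

With equal icetime, a 34% shooter paired with a 10% playmaker would be picked about 0.34 / 0.44, or roughly 77%, of the time.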
After the shooter is chosen, I choose a setter from the shooter's teammates, with a 20% chance of there being no setter. Rather than try to estimate a "setting propensity", I instead use a simple position-based estimate, where forwards are twice as likely as defenders to set for other forwards, and three times as likely as defenders to set for defenders. These probabilities are chosen to roughly match what is observed in Corey Sznajder's hand-tracked passing data.
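The setter rule translates directly into a small weight table. This is a sketch assuming positions are labelled `"F"` and `"D"`; the 20% no-setter chance and the 2:1 and 3:1 weights come from the description above, while the names are hypothetical.

```python
import random
from typing import Dict, Optional

NO_SETTER_PROB = 0.2
# Relative chance that a teammate sets up the shot, keyed by
# (shooter position, setter position): forwards are 2x as likely as
# defenders to set forwards and 3x as likely to set defenders.
SETTER_WEIGHT = {
    ("F", "F"): 2.0, ("F", "D"): 1.0,
    ("D", "F"): 3.0, ("D", "D"): 1.0,
}


def choose_setter(shooter: str, on_ice: Dict[str, str], rng: random.Random) -> Optional[str]:
    """Return a teammate who set up the shot, or None for an unassisted shot.

    on_ice maps each skater on the shooting team to 'F' or 'D'.
    """
    if rng.random() < NO_SETTER_PROB:
        return None
    teammates = [p for p in on_ice if p != shooter]
    weights = [SETTER_WEIGHT[(on_ice[shooter], on_ice[p])] for p in teammates]
    return rng.choices(teammates, weights=weights, k=1)[0]
```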
Independently of the choice of shooter and setter, the team shot rate estimates are used to randomly choose a location for the shot. Thus, teams which consistently shoot from high-danger locations will have better results than teams which shoot at the same rate from lower-danger locations.
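Location sampling can be sketched as a weighted draw over the cells of the team's shot map, assuming the map is a flat list of shot rates per cell. The second helper, with made-up per-cell goal probabilities, illustrates why shot location matters: two teams with the same total shot rate can differ in expected goals. All names here are hypothetical.

```python
import random
from typing import List


def choose_location(shot_map: List[float], rng: random.Random) -> int:
    """Pick a map cell index with probability proportional to its shot rate."""
    cells = list(range(len(shot_map)))
    return rng.choices(cells, weights=shot_map, k=1)[0]


def expected_goals_per_hour(shot_map: List[float], goal_prob: List[float]) -> float:
    """Shot rate in each cell times a (hypothetical) per-cell goal probability."""
    return sum(rate * p for rate, p in zip(shot_map, goal_prob))
```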
Once a shot is being taken by a given player from a certain spot against a specific goaltender, I compute the probability that it will be a goal. This probability comes from its own separate model.
Not all of the above model features are used in the game simulation.
In order to model the final minutes of close games, I have resorted to a very crude sub-model, replacing all of the sophistications of the above with hard-coded goal probabilities depending only on the minute and the score. Specifically, each second is associated with fixed goal probabilities as follows:
These clumsy percentages nevertheless do a decent job of mimicking, on average, the evolving distribution of scores observed in recent NHL seasons. I won't pretend to be proud of this sort of thing, but it's an unmistakable improvement over ignoring empty-net behaviour, as I did previously.
After simulating 3600 seconds in the above manner, if one team has more goals than the other, I record a regulation win with the indicated score and the simulation stops. If the score is tied, a further 300 seconds are simulated using the same methods. No distinction is drawn between 5v5 and 3v3 shot maps, but the goal likelihood of shots is higher in 3v3 play. If a team scores, the simulation ends and I record an overtime win for that team. If no overtime goals are scored, an unweighted (that is, 50/50) coin is flipped to decide which team wins, and I record a shootout win for that team. I find that approximately ten thousand simulations are sufficient for quantities of interest (game win probability, expected number of goals for each team, and so on) to stabilize.
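To show how the regulation, overtime and shootout stages fit together, here is a deliberately stripped-down sketch. It replaces the shot, penalty, shooter and shooting steps with constant per-second goal probabilities derived from hypothetical goals-per-hour rates, omits the end-of-game empty-net sub-model, and uses a hypothetical `ot_boost` multiplier to stand in for the higher 3v3 goal likelihood. It is not the Magnus implementation.

```python
import random
from collections import Counter
from typing import Tuple


def _goal_in_second(p_home: float, p_away: float, rng: random.Random) -> str:
    """'home', 'away' or '' (no goal) for one simulated second."""
    u = rng.random()
    if u < p_home:
        return "home"
    if u < p_home + p_away:
        return "away"
    return ""


def simulate_game(home_rate: float, away_rate: float, rng: random.Random,
                  ot_boost: float = 1.0) -> Tuple[str, str]:
    """One game as (winner, decided_in). Rates are goals per hour, held constant
    as a stand-in for the shot, penalty and shooting steps; ot_boost scales
    overtime scoring to mimic the higher goal likelihood of 3v3 shots."""
    p_home, p_away = home_rate / 3600.0, away_rate / 3600.0
    home = away = 0
    for _ in range(3600):  # regulation
        scorer = _goal_in_second(p_home, p_away, rng)
        if scorer == "home":
            home += 1
        elif scorer == "away":
            away += 1
    if home != away:
        return ("home" if home > away else "away"), "regulation"
    for _ in range(300):  # sudden-death overtime
        scorer = _goal_in_second(p_home * ot_boost, p_away * ot_boost, rng)
        if scorer:
            return scorer, "overtime"
    return rng.choice(["home", "away"]), "shootout"  # unweighted coin flip


def win_probability(home_rate: float, away_rate: float, n: int = 10_000,
                    seed: int = 0) -> Tuple[float, Counter]:
    """Fraction of n simulated games won by the home team, plus outcome counts."""
    rng = random.Random(seed)
    outcomes = Counter(simulate_game(home_rate, away_rate, rng) for _ in range(n))
    home_wins = sum(c for (winner, _), c in outcomes.items() if winner == "home")
    return home_wins / n, outcomes
```

With the default of ten thousand games this loop makes about 36 million random draws, which takes a while in pure Python; the structure, rather than the speed, is the point here.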
Many details of how hockey is actually played are modelled in simplified ways or elided completely. Some of these do not concern me, like how icings, faceoffs, timeouts, and bench minors, among other things, are not modelled explicitly. Some of them are worth some attention, like treating overtime shot rates as if 5v5, and treating shootout attempts as 50/50 propositions for all teams.
In addition to these general weaknesses, there are some specific weaknesses that I hope to improve in the future:
There is a fine but, I think, crucial point: the game simulation here is not an essentially statistical model. Some of its inputs are estimated using statistical techniques, especially regression, but they are estimates of intrinsic "hockey" things. As a matter of type, you should think of this model as you would any other simulation of real-world phenomena, like a simulation of a particle undergoing ballistic motion, or reaction-diffusion, or gravity, or fluid flow. (As a matter of quality, my models are not remotely close to the state of the art for any of those things; those things being, as they are, much more important than hockey, and having, as they do, much, much more capable batteries of workers than me.)
Crucially, in a statistical model, the relative importance of the inputs is determined by optimizing a suitable function over some training data; Magnus contains no such step and cannot be described as "trained" in this sense. The relative importance of the various inputs (shooting ability, shot generation and suppression, and so forth) is determined by the natural parameters of the hockey game being simulated: how long it is, how many players are on the ice at a given time, and so on. (These inputs themselves, that is, the shooting talent, the ability to generate or suppress shots, and so on, are generated by statistical models, but the overarching simulation by which I obtain game-winning probabilities is, as much as I can make it, a "first principles" simulation with few "magic numbers".)
This (to me) fundamental difference, moving from statistical models to scientific ones, represents an enormous improvement in interpretability: questions like "why is it more important for a hockey team to excel at thing A than to excel at thing B" can be given better answers than simply "it has been observed so in the past". My desire to understand (and explain) hockey drives me to value interpretability over accuracy (though obviously both are desirable), which is why I have constructed Magnus (and Edgar before it) as I have.
In very broad terms, then, Magnus is the most recent incarnation of the type of model I have been meaning to build since I started working in hockey nearly nine years ago, in which the important features of what happens from moment to moment on the ice are replicated in silicon; the kind of thing that you can turn over in your hands and learn from.