My single-game prediction model for 2019-2020 is called Magnus 2. As the name suggests, it is closely related to last year's model, Magnus. It maintains the same structure: estimating the patterns of shots and penalties that will be taken in a game and then simulating the result of those shots. It is so similar, in fact, that much of this explanation is copied from the explanation there, where appropriate. For those who are familiar with my past work, the key improvements over last
The business of simulating an entire season has a number of moving parts, some of them sophisticated models in their own right. Roughly they are:
The exposition here focusses on the estimation of a single game probability.
One common feature that Magnus shares with its predecessors Edgar and Cordelia is the estimation of the lineup for a given game: every player under contract is given a probability of being in the lineup which is proportional to their total icetime (for any team) in the past two seasons. Prospects who have never played before thus are never chosen, except for a shortlist of players who have very little regular season experience (or none) yet are very likely to play substantial minutes this season. In 2019-2020, this list is:
Players who are expected to be hurt on the day of the game (given their current injuries) are not included.
Goaltender icetimes are also "hand-tuned" slightly: I guess what fraction of a given team's future games is likely to be played by each of its under-contract goalies. Typically this means around 70% for most starters, most of the remainder for a backup, and a few percent for third- and fourth-stringers who will be called up in case of injury.
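A minimal sketch of the two weighting steps just described, assuming plain dictionaries of icetime totals and goalie shares. All names and numbers are hypothetical, and how the weights are then turned into a full roster is not specified above, so the sketch stops at the weights themselves.

```python
import random

def lineup_weights(icetime_by_player):
    """Scale two-season icetime totals into lineup weights that sum to 1."""
    total = sum(icetime_by_player.values())
    return {name: minutes / total for name, minutes in icetime_by_player.items()}

def pick_goalie(goalie_shares, rng):
    """Draw one goalie according to hand-tuned shares of future games."""
    names = list(goalie_shares)
    return rng.choices(names, weights=[goalie_shares[n] for n in names], k=1)[0]

# Hypothetical numbers, for illustration only.
skater_minutes = {"Skater A": 3000.0, "Skater B": 1500.0, "Prospect C": 0.0}
goalie_shares = {"Starter": 0.70, "Backup": 0.26, "Third": 0.03, "Fourth": 0.01}

weights = lineup_weights(skater_minutes)
rng = random.Random(2019)
starts = [pick_goalie(goalie_shares, rng) for _ in range(10_000)]
starter_fraction = starts.count("Starter") / len(starts)
```

A prospect with no icetime gets weight zero here, which is why the shortlist of likely newcomers has to be handled by hand.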
Individual player impacts on 5v5 shot rates are estimated using a model described here. This submodel is the most important part of the overall probabilities obtained; it isolates each player's impact from that of their past teammates, opponents, score deployment, zone deployment, rest, and coaching.
For special teams, the 5v5 method above is repeated with some simplifications; specifically, I neglect rest and coaching but keep all of the other terms. For each passage of play where one team has five skaters and the other four, a record is made as above, with the five skaters listed as attackers and the four as defenders. Thus I obtain estimates for player marginal impacts on power-play offence and penalty-kill defence. For power-play defence and penalty-kill offence I assume league-average play from all players. (As an aside, this assumption is becoming more unreasonable every season, as recent work of Meghan Hall shows.)
The individual shot isolates are combined to form a team estimate for the game being simulated, where each skater's isolate is weighted according to their expected icetime in a given situation. In all, three team estimates are formed: one for even strength, one for the power-play, and one for the penalty-kill. Player fatigue from recently-played games is taken into account at this stage, although specific fatigue-related roster decisions are not estimated. For the game at hand, the shots taken by the home team and the shots allowed by the road team are averaged to obtain the home-team shot rate estimate for the simulation to come.
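The combination step can be sketched as follows. Reading "weighted" as an icetime-weighted average is an assumption, as is treating each isolate as a single shots-per-hour number; the function names and figures are hypothetical.

```python
def team_rate(isolates, icetime):
    """Icetime-weighted average of player isolates (assumed here to be shots per hour)."""
    total_time = sum(icetime.values())
    return sum(isolates[p] * icetime[p] for p in icetime) / total_time

def matchup_rate(shots_taken_by_team, shots_allowed_by_opponent):
    """Average one team's shot generation with the other team's shot suppression."""
    return 0.5 * (shots_taken_by_team + shots_allowed_by_opponent)

# Hypothetical two-skater example: the player with more icetime counts for more.
home_taken = team_rate({"a": 40.0, "b": 50.0}, {"a": 3.0, "b": 1.0})
home_shot_rate = matchup_rate(home_taken, 40.0)
```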
For each skater, I measure the rate at which they drew and took minor penalties over the past two seasons. These individual rates are averaged, weighted by expected all-situations icetime, to obtain team penalty rates, drawn and taken. The overall penalty rate for the home team is the average of the "home team taken" rate and the "away team drawn" rate, and vice versa. Major penalties, misconducts of any kind, offsetting minors, penalties taken or drawn by goaltenders, and bench minors are not considered.
With the team shot and penalty rate estimates in hand, I can perform a simulation of the game. First, for each second of the game, I randomly choose if that second will contain a penalty or not, using the estimated team penalty rates. If it does, the number of skaters on the ice from the penalized team is decreased by one. After 120 seconds, whether or not any goals are scored by either team, the number of skaters for that team is increased by one again. The shot rate estimates are adjusted to account for the new number of skaters, but not adjusted to account for specific players being unavailable (because they are in the penalty box). This very simplistic treatment of penalty-taking produces patterns of special-teams play that are nevertheless similar to what we see in games. Situations where both teams have the same number of skaters are "even strength"; when one team has more skaters than the other, that is a "power-play" for the team that has more and a "penalty-kill" for the team that has fewer. No distinction is drawn between 5v5 and 4v4 or 3v3 play, nor is 5v4 distinguished from 5v3 or 4v3 play. Empty-net situations are not modelled, neither for delayed penalties nor at game-end. Score effects, however, are modelled, using the leading and trailing terms from the 5v5 shot rate model, where every team is assumed to respond to scores in a league-average way.
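The penalty clock described above can be sketched as a second-by-second loop. The rates are taken to be penalties per hour, and the floor of three skaters per team is an assumption of this sketch rather than something stated in the text; all names are hypothetical.

```python
import random

PENALTY_SECONDS = 120  # a minor runs its full length, whether or not goals are scored
MAX_IN_BOX = 2         # assumption of this sketch: a team never drops below 3 skaters

def simulate_strength(home_pen_per_hour, away_pen_per_hour, rng, game_seconds=3600):
    """Return (home_skaters, away_skaters) for every second of the game."""
    p_home = home_pen_per_hour / 3600
    p_away = away_pen_per_hour / 3600
    home_box, away_box = [], []  # seconds at which penalized skaters return
    states = []
    for t in range(game_seconds):
        # Skaters whose 120 seconds are up come back on the ice.
        home_box = [end for end in home_box if end > t]
        away_box = [end for end in away_box if end > t]
        # Decide whether this second contains a penalty, and against whom.
        u = rng.random()
        if u < p_home:
            penalized = home_box
        elif u < p_home + p_away:
            penalized = away_box
        else:
            penalized = None
        if penalized is not None and len(penalized) < MAX_IN_BOX:
            penalized.append(t + PENALTY_SECONDS)
        states.append((5 - len(home_box), 5 - len(away_box)))
    return states

def situation(home_skaters, away_skaters):
    """Collapse skater counts into the three situations the simulation distinguishes."""
    if home_skaters == away_skaters:
        return "EV"
    return "home PP" if home_skaters > away_skaters else "away PP"

# Hypothetical rates of 3.5 minors per hour for each team.
states = simulate_strength(3.5, 3.5, random.Random(7))
ev_seconds = sum(1 for h, a in states if situation(h, a) == "EV")
```

Note that 4v4 counts as "EV" and 5v3 as a power-play, matching the lack of distinction drawn in the text.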
For shots, I use a naive approach, that is, for each second, I randomly choose if that second will contain no shots (like most seconds), a home team shot, or an away team shot. The weights for this random choice are taken from the team shot estimates: for instance, if a team is estimated to take 42 shots per hour, the chance of them taking a shot in a given second will be 42/3600 ≈ 0.0117, a little over 1%.
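As a sketch of that three-way choice, assuming shot rates in shots per hour and Python's standard `random.choices` for the weighted draw (the function name and the 36-shot road rate are hypothetical):

```python
import random

def shot_in_second(home_shots_per_hour, away_shots_per_hour, rng):
    """Choose what one second contains: no shot, a home shot, or an away shot."""
    p_home = home_shots_per_hour / 3600
    p_away = away_shots_per_hour / 3600
    return rng.choices(
        ["none", "home", "away"],
        weights=[1.0 - p_home - p_away, p_home, p_away],
        k=1,
    )[0]

# One simulated hour with the 42 shots-per-hour example from the text.
rng = random.Random(42)
events = [shot_in_second(42, 36, rng) for _ in range(3600)]
home_shots = events.count("home")
```

Because every second is drawn independently, the hourly shot total varies from one simulated game to the next around its expected value of 42.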
Once a given team takes a shot, I estimate which player on the team is likely to take the shot. For each skater, I compute a "shot propensity", that is, the fraction of their team's on-ice shots which they have taken in the past, given the skater-strength situation (EV, PP, or PK). A shoot-first player like Ovechkin, for instance, has a shot propensity of 34% at even strength and 37% on the power-play; the Capitals' offence runs through him. On the other hand, a playmaker like Joe Thornton has a shot propensity of 10% in both situations. A player with a low shot propensity at even strength might have a different role, and thus a different shot propensity, on the power-play; most players are close to the 20-30% one might expect from a balanced offence. These shot propensities are weighted by expected icetime in the given skater-strength situation and the resulting weights are used to choose a player to designate as the shooter of the shot.
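A minimal sketch of the shooter choice, assuming the weighting is a simple product of propensity and expected icetime, renormalized to sum to one. That product is my reading of "weighted by", and the players and minutes below are hypothetical.

```python
import random

def shooter_weights(propensity, icetime):
    """Multiply each skater's shot propensity by expected icetime, then normalize."""
    raw = {p: propensity[p] * icetime[p] for p in propensity}
    total = sum(raw.values())
    return {p: w / total for p, w in raw.items()}

def choose_shooter(weights, rng):
    """Draw one shooter according to the normalized weights."""
    players = list(weights)
    return rng.choices(players, weights=[weights[p] for p in players], k=1)[0]

# Hypothetical even-strength line: one shoot-first winger, one playmaker, one defender.
propensity = {"Shooter": 0.34, "Playmaker": 0.10, "Defender": 0.15}
icetime = {"Shooter": 20.0, "Playmaker": 20.0, "Defender": 22.0}
weights = shooter_weights(propensity, icetime)
shooter = choose_shooter(weights, random.Random(8))
```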
Independently of the choice of shooter, the team shot rate estimates are used to randomly choose a location for the shot. Thus, teams which consistently shoot from high-danger locations will have better results than teams which shoot at the same rate from lower-danger locations.
Once a shot is being taken by a given player from a certain spot against a specific goaltender, I compute the probability that such a shot will be a goal. This probability comes from a separate shooter-goaltender model.
Not all of the above model features are used in the game simulation. The offence-defence model, together with the shot propensity model, generates a shot from a given shooter on a given goalie from a given location. A shot type is not generated, nor do I designate certain shots as rushes or rebounds. These columns are used in the shooter-goaltender model to improve the estimates of the features which are used, namely the shooter, the goaltender, the shot location (via distance and visible net), and the intercept. In the fullness of time the offence-defence model will include these other features also, and the two models will interweave more smoothly in the simulation harness.
After simulating 3600 seconds in the above manner, if one team has more goals than the other, I record this as a regulation win with the indicated score and the simulation stops. If the score is tied, then a further 300 seconds are simulated, using the same methods. No distinction is drawn between 5v5 and 3v3 play. If a team scores, the simulation is ended and I record an overtime win for that team. If no overtime goals are scored, then I record a shootout win and an unweighted coin is flipped to decide which team wins. I find that approximately ten thousand simulations are required for quantities of interest (game win probability, expected number of goals for each team, and so on) to stabilize.
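The end-of-game logic can be sketched as below. To keep the example short, the whole shot, shooter, location and goaltender chain is collapsed into a fixed per-second goal probability for each team; that shortcut is an assumption of this sketch and not how Magnus generates goals. The function names are hypothetical.

```python
import random

def simulate_game(p_home_goal, p_away_goal, rng):
    """Simulate one game; return (winner, how, home_goals, away_goals)."""
    def play(seconds, sudden_death):
        h = a = 0
        for _ in range(seconds):
            u = rng.random()
            if u < p_home_goal:
                h += 1
            elif u < p_home_goal + p_away_goal:
                a += 1
            if sudden_death and (h or a):
                break  # the first overtime goal ends the game
        return h, a

    h, a = play(3600, sudden_death=False)
    if h != a:
        return ("home" if h > a else "away", "regulation", h, a)
    oh, oa = play(300, sudden_death=True)
    if oh or oa:
        return ("home" if oh else "away", "overtime", h + oh, a + oa)
    # No overtime goal: an unweighted coin flip decides the shootout.
    return (rng.choice(["home", "away"]), "shootout", h, a)

def home_win_probability(p_home_goal, p_away_goal, n_sims=10_000, seed=0):
    """Fraction of simulated games won by the home team."""
    rng = random.Random(seed)
    wins = sum(
        simulate_game(p_home_goal, p_away_goal, rng)[0] == "home"
        for _ in range(n_sims)
    )
    return wins / n_sims

# Small run for illustration; the text suggests about ten thousand for stable numbers.
estimate = home_win_probability(3.1 / 3600, 2.8 / 3600, n_sims=200)
```

In pure Python a ten-thousand-game run of this loop takes a while, since every game is stepped through one second at a time.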
The primary weaknesses of my various previous models—namely, failing to account properly for teammates and competition—have been mitigated in Magnus. However, there are still various weaknesses, namely:
Certain effects are simply omitted entirely and are not implicitly present. The most salient of these is aging, which includes both the (typical, but not universal) improvement shown by "young" players (that is, younger than around 23 or 24) and the inevitable hockey-dotage of "old" players (that is, older than 27 or 28). Before aging can be included in Magnus, as it inevitably will be, I will have to root out precisely which aspects of player behaviour change with age and in what ways, including teasing apart the effects of hours lived on this green earth from the effects of minutes spent on white ice, both of which surely contribute to "aging effects" but which do not accrue in at all the same ways. I have made some progress on this front but not enough to feel confident in integrating age into my predictions yet.
Some details of how hockey is actually played are modelled in simplified ways or elided completely. Some of these do not concern me, like how icings, faceoffs, timeouts, and bench minors, among other things, are not modelled explicitly. Some of them are worth some attention, like treating overtime play as 5v5 (instead of as 3v3 which it actually is at the moment) and treating shootout attempts as 50/50 propositions for all teams.
There is a fine but I think crucial point that Magnus is not a statistical model. While its inputs are estimated using statistical techniques (taking of moments, especially zeroth and first; regression; curve-fittings of various types, especially kernel densities), those inputs are used to obtain measurements by simulating a simplified version of the thing that interests us instead of, say, a general linear model. Crucially, in a statistical model, the relative importance of the inputs is determined by optimizing a suitable function over some training data; Magnus contains no such step and cannot be described as "trained" in this sense. The relative importance of the various inputs (shooting ability, shot generation and suppression, and so forth) is determined by the natural parameters of the hockey game being simulated; that is, how long it is, how many players are on the ice at a given time, and so on. (These inputs themselves, namely the shooting talent, the ability to generate or suppress shots, and so on, are generated by statistical models, but the overarching simulation by which I obtain game-winning probabilities is, as much as I can make it, a "first principles" simulation with no "magic numbers".)
This (to me) fundamental difference—moving from statistical models to scientific ones—represents an enormous improvement in interpretability, where questions like "why is it more important for a hockey team to excel at thing A than to excel at thing B" can be given better answers than simply "it has been observed so in the past". My desire to understand (and explain) hockey drives me to value interpretability over accuracy (though obviously both are desirable) which is why I have constructed Magnus (and Edgar before it) as I have done.
In very broad terms, then, Magnus 2 is the third incarnation of the kind of model that I have been meaning to build since I started working in hockey nearly eight years ago, where the important features of what happens from moment to moment on the ice are replicated in silicon; the kind of thing that you can turn over in your hands and learn from.