Prediction Model Edgar

September 19, 2017, Micah Blake McCurdy, @IneffectiveMath

Introduction

My single-game prediction model for 2017-2018 is called Edgar. It replaces last year's model, called Cordelia. It works by estimating the patterns of shots that will be taken in a game and then simulating the result of those shots. This simulation approach makes it completely different from my past models, which are all essentially statistical. Edgar contains no parameters that are obtained from any statistical algorithm; that is, it is not "trained" by optimization of any function. You should think of Edgar as a "model" of hockey like you would think of a wind-tunnel carving as a model of a race car or of the theory of gravitation as a model of planetary movement: that is, as a simplified version of something which interests us.

Game Preparation

Edgar takes as primary input a single game to be played between two NHL teams on a specified date. Simulation is done by the following process:

Line-up Estimation

The only common feature that Edgar shares with its predecessor Cordelia is the estimation of the lineup for a given game: every player under contract is given a probability of being in the lineup which is proportional to their total icetime (for any team) in the past two seasons. Prospects who have never played before thus are never chosen, except for a shortlist of players who have very little regular season experience (or none) yet are very likely to play substantial minutes this season. In 2017-2018, this list is:

With the sole exception of Shipachyov, estimates for all of these prospects were provided by Hannah Stuart, who is very well-informed about prospects.

Players who are expected to be hurt on the day of the game (given their current injuries) are not included.
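The lineup draw described above can be sketched in a few lines of Python. This is only an illustration, not Edgar's actual code: the text says only that a player's chance of inclusion is proportional to icetime, so the sequential weighted draw without replacement, the function name `sample_lineup`, and the default of 18 skaters are all assumptions.

```python
import random

def sample_lineup(icetime_by_player, injured, n_skaters=18, rng=None):
    """Draw a lineup, favouring players with more icetime (hypothetical helper)."""
    rng = rng or random.Random()
    # Injured players and players with no icetime can never be chosen.
    pool = {p: t for p, t in icetime_by_player.items()
            if p not in injured and t > 0}
    lineup = []
    while pool and len(lineup) < n_skaters:
        players = list(pool)
        weights = [pool[p] for p in players]
        # Chance of being picked is proportional to past icetime.
        pick = rng.choices(players, weights=weights, k=1)[0]
        lineup.append(pick)
        del pool[pick]  # sample without replacement
    return lineup
```

The shortlisted prospects would enter such a draw with a hand-supplied icetime estimate in place of a measured one.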

Goaltenders are also "hand-tuned" slightly; I guess what fraction of a given team's future games are likely to be played by each of their under-contract goalies; typically this means around 70% for most starters, most of the remainder to a backup, and a few percent to third- and fourth-stringers who will be called up in case of injury.

Shot Rate Estimation

For Individual Skaters

For each player in the lineup, I measure the on-ice shot rates of that player over the past two regular seasons, that is, for all of 2015-2017. Here, as throughout this article, "shot" means "unblocked shot", that is, a shot that is recorded by the NHL as either a goal, a save, or a miss (this latter category includes shots that hit the post or crossbar). I would prefer to include blocked shots also but cannot since the NHL does not record shot locations for blocked shots. For each player I form three such "player isolates", one for even-strength play, one for power-play, and one for the penalty-kill. Estimates for players with few minutes in a given situation are regressed towards league average results. For special teams, I consider 50 minutes of icetime in the past two seasons sufficient to use raw results; for even-strength I require 200 minutes. Players with less than the threshold number of minutes are "topped up" with imaginary league-average results until they reach the threshold.
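The "topping up" step amounts to padding a small sample with league-average minutes. A minimal sketch follows, assuming a single scalar shot rate per player; the real isolates are shot maps over the ice, and the function name `topped_up_rate` is hypothetical.

```python
def topped_up_rate(shots, minutes, league_rate_per_min, threshold_minutes):
    """Shots per minute after padding with league-average minutes up to the threshold."""
    if minutes >= threshold_minutes:
        return shots / minutes  # enough data: use the raw result
    missing = threshold_minutes - minutes
    # Imaginary league-average play fills the gap up to the threshold.
    padded_shots = shots + league_rate_per_min * missing
    return padded_shots / threshold_minutes
```

With the thresholds given above, `threshold_minutes` would be 200 at even-strength and 50 on special teams. A player with no minutes at all comes out exactly league-average.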

Even-strength shot isolates are score-adjusted, that is, artificially altered to account for the fact that shots are easier to obtain when trailing than when leading. A rough description of how score-adjustment is done for shot maps can be found here. No other adjustments are done at this time but a sophisticated model might adjust for teammate quality, competitor quality, zone deployment, coaching, and age. Special-teams isolates are not adjusted in any way.

For Teams

The individual shot isolates are combined to form a team estimate for the game being simulated, where each skater's isolate is weighted according to their expected icetime in a given situation. In all, three team estimates are formed: one for even-strength, one for the power-play, and one for the penalty-kill. For the game at hand, the shots taken by the home team and the shots allowed by the road team are averaged to obtain the home-team shot rate estimate for the simulation to come.
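As a rough illustration of the two averaging steps, here is a sketch that again treats each isolate as a single number rather than a shot map. The names `team_rate` and `matchup_rate` are hypothetical.

```python
def team_rate(isolates, icetime):
    """Icetime-weighted average of skater isolates (both dicts keyed by player)."""
    total = sum(icetime[p] for p in isolates)
    return sum(isolates[p] * icetime[p] for p in isolates) / total

def matchup_rate(home_for, away_against):
    """Home-team shot rate: mean of shots the home team takes and the road team allows."""
    return 0.5 * (home_for + away_against)
```

The road-team estimate would be formed the same way with the roles swapped.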

Finally, the team shot rates are adjusted to account for home-ice advantage and for rest. For home-ice advantage, the home team shot map is adjusted up (especially in front of goal) by a fixed amount matching the average observed home-ice advantage in the past season; the away team map is similarly adjusted in the opposite direction. For rest, the shot rates for both teams are adjusted according to the measured rest impact over the past five seasons, as described here.

Penalty Rate Estimation

For each skater, I measure the rate at which they drew and took minor penalties over the past two seasons. These individual rates are averaged, weighted by expected all-situations icetime, to obtain team penalty rates, drawn and taken. The overall penalty rate for the home team is the average of the "home team taken" rate and the "away team drawn" rate, and vice versa. Major penalties, misconducts of any kind, offsetting minors, penalties taken or drawn by goaltenders, and bench minors are not considered.

Game simulation

With the team shot and penalty rate estimates in hand, we can perform a simulation of the game. First, for each second of the game, we randomly choose if that second will contain a penalty or not, using the estimated team penalty rates. If it does, the number of skaters on the ice from the penalized team is decreased by one. After 120 seconds, whether or not any number of goals are scored by either team, the number of skaters for that team is increased by one again. The shot rate estimates are not adjusted to account for specific players being unavailable. This very simplistic treatment of penalty-taking produces patterns of special-teams play that are nevertheless similar to what we see in games. Situations where both teams have the same number of skaters are "even strength"; when one team has more skaters than the other that is a "power-play" for the team that has more and "penalty-kill" for the team that has fewer. No distinction is drawn between 5v5 and 4v4 or 3v3 play, nor is 5v4 distinguished from 5v3 or 4v3 play. Empty-net situations are not modelled either for delayed penalties or at game-end.
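The penalty bookkeeping above can be sketched as follows. This is a guess at one reasonable implementation: drawing penalties independently for each team, placing no cap on how many skaters a team can be down, and the name `simulate_strength` are all assumptions.

```python
import random

def simulate_strength(home_pen_per_sec, away_pen_per_sec, seconds=3600,
                      penalty_length=120, rng=None):
    """Return a per-second list of 'EV', 'HOME_PP', or 'AWAY_PP'."""
    rng = rng or random.Random()
    # Expiry times (in seconds) of penalties currently being served.
    expiries = {"home": [], "away": []}
    rates = {"home": home_pen_per_sec, "away": away_pen_per_sec}
    states = []
    for t in range(seconds):
        for side in ("home", "away"):
            # Penalties run their full length, regardless of goals scored.
            expiries[side] = [e for e in expiries[side] if e > t]
            if rng.random() < rates[side]:
                expiries[side].append(t + penalty_length)
        home_down, away_down = len(expiries["home"]), len(expiries["away"])
        if home_down == away_down:
            states.append("EV")       # equal skaters, whatever the count
        elif home_down > away_down:
            states.append("AWAY_PP")  # home has fewer skaters
        else:
            states.append("HOME_PP")
    return states
```

Only the comparison of skater counts matters here, which mirrors the choice not to distinguish 5v4 from 5v3 or 4v3.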

For shots, we use a naive approach, that is, for each second, we randomly choose if that second will contain no shots (like most seconds), a home team shot, or an away team shot. The weights for this random choice are taken from the team shot estimates: for instance, if a team is estimated to take 42 shots per hour, the chance of them taking a shot in a given second will be 42/3600 ≈ 0.0117, a little over 1%.
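A sketch of that per-second choice, assuming the two teams' shot chances are carved out of a single uniform draw so that at most one shot occurs per second. The name `shot_event` is hypothetical.

```python
import random

def shot_event(home_per_hour, away_per_hour, rng):
    """Return 'home', 'away', or None for a single simulated second."""
    p_home = home_per_hour / 3600.0  # e.g. 42 per hour is about 0.0117
    p_away = away_per_hour / 3600.0
    u = rng.random()
    if u < p_home:
        return "home"
    if u < p_home + p_away:
        return "away"
    return None  # most seconds contain no shot
```

Over a simulated hour, the expected number of home shots is then just the hourly rate that was fed in.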

Once a given team takes a shot, we estimate which player on the team is likely to take the shot. For each skater, we compute a "shot propensity", that is, the fraction of their team's on-ice shots which they have taken in the past, given the skater-strength situation (EV, PP, or PK). A shoot-first player like Ovechkin might have a shot propensity of almost 35%, while a playmaker like Joe Thornton might have a shot propensity of less than 10%. Similarly, a player with a low shot propensity at even-strength might have a different role and thus a different shot propensity on the power-play. These shot propensities are weighted by expected icetime in the given skater-strength situation and the resulting weights are used to randomly choose a player to designate as the shooter of the shot.
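The shooter draw is a weighted random choice. A minimal sketch, with the hypothetical name `choose_shooter` and made-up numbers in the usage note:

```python
import random

def choose_shooter(propensity, expected_icetime, rng):
    """Pick a shooter with weight = shot propensity x expected icetime."""
    players = list(propensity)
    weights = [propensity[p] * expected_icetime[p] for p in players]
    return rng.choices(players, weights=weights, k=1)[0]
```

For example, with two players expected to play equal minutes and propensities of 0.35 and 0.10, the first would be chosen for about 78% of the shots that go to one of the two (0.35 / 0.45).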

Independently of the choice of shooter, the team shot rate estimates are used to randomly choose a location for the shot. Thus, teams which consistently shoot from high-danger locations will have better results than teams which shoot at the same rate from lower-danger locations.

Once a shot is being taken by a given player from a certain spot against a specific goaltender, we compute the probability that such a shot will be a goal. We begin from league-average measurements from the past several years linking shot locations to goal probabilities, given the skater-strength situation. From this starting goal probability, we make three adjustments: one for the shooter, one for the goaltender, and one for home ice. For the shooter, we compute their historical shooting percentage (using unblocked shots, as throughout) compared to league average; for above-average shooters the goal probability will increase and for below-average shooters it will decrease. Symmetrically, we compute the save percentage of the goaltender, again relative to league average, and adjust the goal probability again. Finally, we adjust the goal probability to account for historical home-ice advantage; the chance of a home team shot being a goal is increased by 0.0019, for a road team decreased by the same amount. Once the goal probability is fixed, we randomly choose if the shot is a goal or not.
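The three adjustments can be sketched as below. The home-ice figure of 0.0019 comes from the text; treating the shooter and goaltender adjustments as additive shifts as well is an assumption, as are the clamp and the names `goal_probability` and `is_goal`.

```python
import random

HOME_ICE_BUMP = 0.0019  # from the text above

def goal_probability(base, shooter_delta, goalie_delta, is_home):
    """Adjust a location-based goal probability (additive deltas: an assumption)."""
    # shooter_delta: shooting percentage minus league average
    # goalie_delta: save percentage minus league average (positive = good goalie)
    p = base + shooter_delta - goalie_delta
    p += HOME_ICE_BUMP if is_home else -HOME_ICE_BUMP
    return min(1.0, max(0.0, p))  # clamp to a valid probability

def is_goal(p, rng):
    """Randomly decide whether a shot with goal probability p goes in."""
    return rng.random() < p
```

For instance, a home shot with a base probability of 0.06, a shooter 0.01 above average, and a goaltender 0.0032 above average comes out at 0.06 + 0.01 - 0.0032 + 0.0019 = 0.0687.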

Game Result

After simulating 3600 seconds in the above manner, if one team has more goals than the other, we record this as a regulation win with the indicated score and simulation stops. If the score is tied, then a further 300 seconds are simulated, using the same methods. No distinction is drawn between 5v5 and 3v3 play. If a team scores, the simulation is ended and we record an overtime win for that team. If no overtime goals are scored, then we record a shootout win and an unweighted coin is flipped to decide which team wins. I find that approximately ten thousand simulations are required for quantities of interest (game win probability, expected number of goals for each team, and so on) to stabilize.
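The end-of-game logic and the repeated simulation can be sketched as follows. To keep the example short it collapses shots, shooters, and goal probabilities into a single per-second goal chance for each team, which is a simplification not made by the model itself; `simulate_game` and `home_win_probability` are hypothetical names.

```python
import random

def simulate_game(home_goal_per_sec, away_goal_per_sec, rng):
    """Return (winner, how) with how in {'REG', 'OT', 'SO'}."""
    home = away = 0
    for _ in range(3600):  # regulation
        u = rng.random()
        if u < home_goal_per_sec:
            home += 1
        elif u < home_goal_per_sec + away_goal_per_sec:
            away += 1
    if home != away:
        return ("home" if home > away else "away"), "REG"
    for _ in range(300):  # sudden-death overtime, same rates as regulation
        u = rng.random()
        if u < home_goal_per_sec:
            return "home", "OT"
        if u < home_goal_per_sec + away_goal_per_sec:
            return "away", "OT"
    return rng.choice(["home", "away"]), "SO"  # unweighted coin flip

def home_win_probability(home_goal_per_sec, away_goal_per_sec, n=10_000, seed=0):
    """Fraction of n simulated games won by the home team."""
    rng = random.Random(seed)
    wins = sum(simulate_game(home_goal_per_sec, away_goal_per_sec, rng)[0] == "home"
               for _ in range(n))
    return wins / n
```

In pure Python, ten thousand games of 3600 seconds each means tens of millions of random draws, so a run at the default `n` takes a noticeable amount of time.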

Weaknesses

The primary weaknesses of Cordelia (and its predecessors Oscar and Pip) are still present in Edgar. Roughly speaking, in descending order of importance, there are weaknesses of insufficient explicitness, weaknesses of omission, and weaknesses of simplification.

Insufficient Explicitness

Certain effects which are known to affect player and team results are not explicitly modelled by Edgar; most obviously, coaching, quality of teammates, and quality of competition. All of these things are implicitly present in the player estimates, however, and so this weakness is only as big as the change in these things. For teams with new coaches, for players whose roles are changed, for teams with new power-play or penalty-kill strategies, there will be a lag as Edgar learns about the change, implicitly. The central trouble with modelling such things, important as they are, is that after you take them out to isolate individual player contributions, you have to put them back in when you model the results obtained by various players in various roles. These weaknesses collectively are my most active area of research, especially quality of teammates-and-competition (which I intuitively insist on treating simultaneously, to my mathematical pain), and I have only my own sluggish abilities to blame for their not being adequately treated at this date.

Omissions

Certain effects are simply omitted entirely and are not implicitly present. The most salient of these is aging, which includes both the (typical, but not universal) improvement shown by "young" players (that is, younger than around 23 or 24) and the inevitable hockey-dotage of "old" players (that is, older than 27 or 28). For aging to be included in Edgar, as it inevitably will be eventually, I will have to root out precisely which aspects of player behaviour change with age and in what ways, including teasing apart the effects of hours lived on this green earth from the effects of minutes spent on NHL ice, both of which surely contribute to "aging effects" but which do not accrue in at all the same ways.

Simplifications

Some details of how hockey is actually played are modelled in simplified ways or elided completely. Some of these do not concern me, like how icings, faceoffs, timeouts, and bench minors, among other things, are not modelled explicitly. Some of them are worth some attention, like treating overtime play as 5v5 (instead of as 3v3, which it actually is at the moment) and treating shootout attempts as 50/50 propositions for all teams. That said, these weaknesses are likely to remain for a long time while I grapple with the more serious problems mentioned above.

Strengths

There is a fine but I think crucial point that Edgar is not a statistical model. While its inputs are estimated using statistical techniques (taking of moments (especially zeroth and first), regression, curve-fittings of various types (especially kernel densities)), those inputs are used to obtain measurements using simulation of a simplified version of the thing that interests us instead of, say, a general linear model. Crucially, in a statistical model, the relative importance of the inputs is determined by optimizing a suitable function over some training data; Edgar contains no such step and cannot be described as "trained" in any way. The relative importance of the various inputs to Edgar (shooting ability, shot generation and suppression, and so forth) is determined by the natural parameters of the hockey game being simulated; that is, how long it is, how many players are on the ice at a given time, and so on.

This (to me) fundamental difference---moving from statistical models to scientific ones---represents an enormous improvement in interpretability, where questions like "why is it more important for a hockey team to excel at thing A than to excel at thing B" can be given better answers than simply "it has been observed so in the past". My desire to understand (and explain) hockey drives me to value interpretability over accuracy (though obviously both are desirable) which is why I have constructed Edgar as I have done. For instance, some specific weaknesses can be addressed in this kind of framework. One example that I have in mind is score effects: every team plays differently according to the score, but not all teams respond equally to the score; there are observable differences of quality and of magnitude. With a simulation framework like Edgar, I will in time be able to model such differences, suitably weighted by how often the various score-situations occur in simulation. Modelling such an effect using a statistical model strikes me as extremely tricky, though here I may be showing my training (in science, not in statistics).

In very broad terms, then, Edgar is the first incarnation of the kind of model that I have been meaning to build since I started working in hockey nearly six years ago, where the important features of what happens from moment to moment on the ice are replicated in silicon; the kind of thing that you can turn over in your hands and learn from.

Appendix: How to read player and team isolates

To help people understand how Edgar makes its predictions, I've made summary graphics for each player and team. The player isolates can be found on each player's career page; for instance, this one for Erik Karlsson:

The red regions indicate where the Senators take and allow more shots than average for the given situation when Karlsson is on the ice; at even-strength (top-left) his own shots from the right point are prominent. On the power-play (top-right), the other side of the ice features much more strongly (he plays largely at the centre point and especially enjoys passing to the left side). The percentage listed (24%) is the fraction of his team's shots he takes in those situations; 24% is slightly high for a defender. While short-handed with Karlsson, the Senators allow somewhat fewer than league-average shots (hence the blue colour), especially from immediately in front of the net. However, Karlsson himself only takes 12% of the Senators' shots when they are short-handed.

The text at middle-right lists three things. The lower two are easy to understand; Karlsson draws penalties 30% less often than the average player, and also takes them 27% less often. The shooting line expresses how much more likely an unblocked shot is to be a goal, given that Karlsson is the one who takes it. This is computed by comparing the player's goal rate to the league-average rate for shots taken from the same locations as the player of interest took them. Thus a player who consistently takes shots from great locations will score often enough but this will have no impact on their listed shooting skill here.

Similarly, by averaging over many plausible lineups, we can form team isolates, which describe how we expect teams to perform. These are on the team pages for each team, like this one for Dallas:

The overall structure is similar to the individual isolates, but team isolates also include a goaltending term, which indicates the expected effect of that team's goaltending. Here, positive numbers indicate above-average goaltending, so +0.32% indicates that Dallas' goaltending is expected to make every unblocked shot one-third-of-one-percent less likely to be a goal, independent of where the shot is taken from. The various goaltenders on Dallas' roster are weighted according to what fraction of the season they are each expected to play.