Estimating Individual Impact on NHL 5v5 Shot Rates

(This model, "Magnus 2", is an updated version of last year's model, Magnus. In the interests of providing a self-contained documentation here, I've copied the explanation there with changes as appropriate.

I would like to be able to isolate the individual impact of a given skater on shot rates from the impact of their teammates, their opponents, the scores at which they play, the zones in which they begin their shifts, the instructions of their head coach, how tired the players are, and home-ice advantage. I have fit a regression model which provides such estimates. The most important feature of this model is that I use shot rate maps as the units of observation and thus also for the estimates themselves, permitting me to see not only what portion of a team's performance can be attributed to individual players but also detect patterns of ice usage.

Here, as throughout this article, "shot" means "unblocked shot", that is, a shot that is recorded by the NHL as either a goal, a save, or a miss (this latter category includes shots that hit the post or crossbar). I would prefer to include blocked shots also but cannot since the NHL does not record shot locations for blocked shots.

The most sophisticated element of Magnus is the method for estimating the marginal effect of a given player on shot rates; that is, that portion of what happens when they are on the ice that can be attributed to their individual play and not the context in which they are placed. We know that players are affected by their teammates, by their opponents, by the zones their coaches deploy them in, and by the prevailing score while they play. Thus, I try to isolate the impact of a given player on the shots which are taken and where they are taken from.

Although regression is mathematically more sophisticated than some other measures, it is in no way a "black box". As we shall see, every measurement can be broken down into its constituent pieces and scrutinized. If you are uneasy with the mathematical details but interested in the results, you should skip the "Method" section and just think of the method as like a souped-up "relative to team/teammate statistics", done properly.

Method

I use a simple linear model of the form $ Y = WX\beta $ where $X$ is the design matrix, $Y$ is a vector of observations, $W$ is a weighting matrix, and $\beta$ is a vector of marginals, that is, the impacts associated to each thing independent of each other thing.

The columns of $X$ correspond to all of the different features that I include in the model. There are five different types of columns:

Player performance estimates, two columns for each skater: one for their offensive impact (that is, on their own team's shot rates), and one for their defensive impact (that is, on their opponent's shot rates);
Score impacts: six columns for various different scores: trailing by three or more, trailing by two, trailing by one, leading by one, leading by two, and leading by three or more. There is no column for tied, which is taken as the reference state;
Zone impacts: six columns for the zones in which players start their shifts. The four "shift start types" that I use are:
- Offensive Zone,
- Neutral Zone,
- Defensive Zone, and
There are no columns for on-the-fly starts, which are taken as the reference state. On-the-fly shifts are chosen as the reference state not only because they are the most common, but also for theoretical reasons. A game could, in principle, consist of almost nothing but on-the-fly shifts, if the players and coaches so wished, even in a game with many whistles. Throughout the remainder of this article I'll use "zone" to mean "one of the four above ways to start a shift", understanding that on-the-fly shifts happen at all parts of the ice and nevertheless cannot be easily broken into sub-pieces according to location. For each zone I model the impact on attacking and defending shot rates separately, making six columns.
Coach impacts: two columns for the head coach of each team, one for attacking and one for defending, meant to serve as an umbrella for the effect the coach's instructions have on how the players on the ice choose to play.
Rest impacts: eight columns for "played last night", "played two nights ago", "played three nights ago", and "played four nights ago", separated into offence and defence for each.
Intercept: one column to indicate home-ice advantage.

Using such a small list of covariates ensures that there is no formal linear dependence among the columns (as there would have been had I included an overall intercept, or a column for tied score, for instance). The first iteration of Magnus included on-the-fly zone start columns, this created a near-dependence relation, which I am happy to remove in this iteration.

The entries in $Y$, the "responses" of the regression, are functions which encode the rate at which unblocked shots are generated from various parts of the ice. An unblocked shots with NHL-recorded location of $(x,y)$ is encoded as a two-dimensional gaussian centred at that point with width (standard deviation) of ten feet; this arbitrary figure is chosen because it is large enough to dominate the measurement error typically observed by comparing video observations with NHL-recorded locations and also produces suitable smooth estimates.

One detailed example (with no-longer current players for either team, but no matter) should make the structure of the model clear:

Suppose that the Senators are the home team and the Jets are the away team, and at a given moment of open play two Senators players (say, Karlsson and Phaneuf) jump on the ice, replacing two other Senators. The score is 2-1 for Ottawa, and the other players (say, Pageau, Hoffman, and Stone for Ottawa, against Laine, Ehlers, Wheeler, Byfuglien, and Enstrom) began their shift some time previously on a faceoff in Ottawa's zone. Play continues for 50 seconds, during which time Ottawa takes two unblocked shots from locations (0,80) and (-10,50) and Winnipeg takes no shots. This shift ends with the period and the players leave the ice. The game is the season-opener for Winnipeg, but Ottawa played their first game yesterday, using all of the named players except Phaneuf, who was healthy-scratched.

These fifty seconds are turned into two rows of $X$ and two entries in $Y$. First, the Ottawa players are considered the attackers, and the attacking columns for Pageau, Hoffman, Stone, Karlsson, and Phaneuf are all marked as "1". The Jets are considered the defenders and the defending columns for Laine, Ehlers, Wheeler, Byfuglien, and Enstrom are marked as "1". All of the other player columns, attacking or defending, are marked as "0". Because the Senators are winning 2-1, the score column for "leading by one" is marked with a "1" and the other score columns are marked as "0". Three of the Senators players are playing on-the-fly shifts, while the other two Senators skaters (that is, 40% of the skaters) are still playing the shift they began in their own zone, so the "home defensive-zone attacking" zone column is marked with "0.4". All of the jets skaters began their shift in the Ottawa zone, so the "away offensive-zone defending" zone column is marked with a "1.0", and all the other zone columns are marked as "0". The Senators are the home team, so the "home intercept" column is marked with a 1. The column for "Guy Boucher, attacking coach" is marked with a one, while the column for "Paul Maurice, defending coach" is marked with a one, and all the other coaching columns are marked with zeros. Since all of Winnipeg players are well-rested, no rest columns are marked for defence. Phaneuf is well-rested for Ottawa, but the other four skaters played last night, so the column for "played last night, attackers", is marked with 0.8, since 80% of the attacking skaters did so. All the other other rest columns are marked with zeros. Corresponding to this row of $X$, an entry of $Y$ is constructed as follows: Two gaussians of ten-foot width and unit volume are placed at (0,80) and (-10,50) and the two gaussians are added to one another. This function is divided by fifty (since the shift is fifty seconds long); resulting in a continuous function that approximates the observed shot rates in the shift. Finally, I subtract the league average shot rate from this. This function, which associates to every point in the offensive half-rink a rate of shots produced in excess of league average from that location, is the "observation" I use in my model.
Second, the same fifty seconds are made into another observation where the Jets players are considered the attackers, and the Senators are the defenders. The attacking player columns for the Jets are set to 1, the defending columns for the Senators players are set to 1. Since the Jets are losing, the score column of "trailing by one" is set to one, and the non-zero zone columns are:

"away offensive zone attacking": 1.0

"home defensive zone defending": 0.6

The column for "Guy Boucher, defending coach" is marked with a one, while the column for "Paul Maurice, attacking coach" is marked with a one, and all the other coaching columns are marked with zeros. Since all of Winnipeg players are well-rested, no rest columns are marked for offence. Phaneuf is well-rested for Ottawa, but the other four skaters played last night, so the column for "played last night, defenders", is marked with 0.8, since 80% of the defending skaters did so. All the other other rest columns are marked with zeros. The two rows have no non-zero columns in common. Since the Jets didn't generate any shots, the associated function is the zero function; I subtract league average shot rate from this and the result is placed in the observation vector $Y$.

The weighting matrix $W$, which is diagonal, is filled with the length of the shift in question. Thus, the above two rows will each have a weight of fifty.

By controlling for score, zone, teammates, and opponents in this way, I obtain estimates of each players individual isolated impact on shot generation and shot suppression.

Fitting

To fit a simple model such as $Y = WX\beta $ using ordinary least squares fitting is to find the $\beta$ which minimizes the total error $$ (Y - X\beta)^TW(Y - X\beta) $$ When the entries of $Y$ are numbers, this expression is a one-by-one matrix which I naturally identify with the single number it contains, and I can find the $\beta$ which minimizes it by the usual methods of matrix differentiation. To extend this framework to our situation, where the elements of $Y$ are functions from the half-rink to the reals, I must define an inner product on this function space, which I do by setting $ \left< f,g \right> = \int f(x,y) g(x,y) dA $ where the integral is taken over all coordinates $(x,y)$ in the half rink. Since I only use functions which are finite sums of gaussians, this integral always exists and is easily checked to define an inner product. Hence, I can use the well-known formula for the $\beta$ which minimizes this error, namely $$ \beta = (X^TX)^{-1}X^TWY $$ which makes it clear that the units of $\beta$ are the same as those of $Y$; that is, if I put shot rate maps in, I will get shot rate maps out.

However, I choose not to fit this model with ordinary least-squares, preferring instead to use generalized ridge regression; that is, I choose to minimize $$ (Y - X\beta)^TW(Y - X\beta) + \beta^T \Lambda \beta $$ where $\Lambda$ is a positive definite matrix which encodes our prior knowledge about the items encoded by the columns; that is, the players, the coaches, the effect of rest, and so on. This is a kind of zero-biased regression---the first error term is as before, but the second term, weighted by $\Lambda$, interprets deviation from zero (that is, league average) as intrinsically mistaken. The $\beta$ which minimizes this combined expression is the one which simultaneously agrees with the observations while doing as little violence as possible to the idea that the players in question are all, broadly, of NHL quality.

The usual methods (that is, differentiating with respect to $\beta$ to find the unique minimum of the error expression) gives a closed form for $\beta$ as: $$ \beta = (X^TWX + \Lambda)^{-1}X^TWY $$ Any positive-definite $\Lambda$ will suffice but I choose to use a (very-nearly) diagonal one, with entries $\Lambda_{ii} = \lambda_i$ depending on the column $i$ as follows:

For the score columns, the zone columns, the rest columns, and the home-ice intercept, I choose $\lambda = 0$. These terms are not theoretically constrained, and the fitting in this section of the model is like ordinary-least-squares fitting.
For all other terms (for players, zones, and scores) I use $\lambda = 10,000$. This value is obtained by examining the estimates for various $\lambda$ and choosing a sufficiently high value for stable, slowly varying estimates. More involved theoretical estimates of which $\lambda$ ought to be best (such as computing the generalized cross-validation error, following Brian MacDonald (Section 5.3)) suggest much smaller $\lambda$ values which give wildly varying year-to-year estimates of player ability.
For coaching terms, I use $\lambda = 50,000$, that is, five times as biased towards NHL-average as for skaters. This ratio has no theoretical justification, and is motivated by my judgment that coaches' influence, while diffuse and omni-present, is necessarily limited by the fact that they are not actually on the ice and can only influence results indirectly, through the imperfect conduits of their skaters.
Rookies or other players with low icetimes are not treated differently in any way, unlike in the first version of Magnus.

Players that we expect before we see their on-ice results or circumstances to be of similar ability can be fused, that is, penalty terms can be introduced to encode our prior belief that they are similar. I have chosen to fuse the Sedins in this way, with a penalty term of weight 10,000, because they are twins. I don't consider any more-distant relation than twins as legitimate grounds for this kind of prior.

The model can be fit to any length of time; in this article I'll be describing the results of fitting it iteratively on each season from 2007-2008 through 2018-2019 in turn. For 2007-2008, we use "nhl average"; for later seasons I use the estimate from the previous season as the prior for each column, with nhl average for new players or coaches. In symbols, this amounts to computing $$ \beta = (X^TWX + \Delta)^{-1}(X^TW Y + \Delta \beta_0) $$ instead of $$ \beta = (X^TWX + \Delta)^{-1}X^TW Y $$ where $\beta_0$ is the ability estimates from the previous season. In effect, instead of assuming every season that the league is full of new players about whom we know nothing, we use all of the information from the last dozen years, implicitly weighted as seasons pass to prefer newer information without ever discarding old information entirely.

Results

Home-Ice

Every column in the regression corresponds to a map of shot rates over a half-rink. The simplest is the home-ice column, which is positive as expected:

For the purposes of summarizing these maps, I use a simple "expected goals"-style weighting which I call threat. That is, for a given location, I can calculate the historical chance of a shot (known to be unblocked, as assumed throughout this article) of becoming a goal, without paying attention to any other features of the shot. To compute the associated threat of a shot map, multiply each point in a shot map with the corresponding probability of the shot becoming a goal as displayed above, and then sum. This gives a generic shooter-and-goalie-independent measure of how dangerous a given shot map is, in units of goals per hour. For instance, in 2017-2018, league average threat was around 2.3 goals per hour. For the maps in this article, I've quoted the threat of each map as a percentage of the threat of the league average map; so the +7.8% in the home intercept map above means "a pattern of shots that average shooters taking on average goalies would produce goals at a rate 7.8% higher than league average".

Score

The six score terms are as follows:

As we know, trailing teams dominate play, even though they still usually lose. To my surprise, the effect of trailing by one goal, by two goals, or by three or more goals appears to be largely similar. On the other hand, leading teams generate fewer shots, with the effect becoming more and more pronounced as the lead grows.

Fatigue

The eight fatigue terms are as follows:

These terms are additive, so a given player might have played last night and also three days ago, so the impact of fatigue on that player can be obtained by adding both of those maps. Predictably, the effects of fatigue are negative, and the strongest effect is from having played the previous night, but having played two days ago is also non-trivial. The "played four days ago" terms seem very unimportant and in the future I might well delete them.

Zone

The six zone terms are as follows:

The on-the-fly zone impacts are blank by definition, since we considered on-the-fly starts as the reference. The other terms are not nearly so easy to understand immediately, since zone start terms are not necessarily balanced between the two teams—that is, since defensive-zone starts for one team may not necessarily correspond to offensive-zone starts for the other team, some or all of whom may instead by continuing shifts that were started elsewhere. So, to try to make the zone usage easier to understand, I've computed what the sums of typical combinations of zones, with the home-ice term added in:

The home-team zone is listed on the left, the four columns show the away-team zone. So, to get our bearings, start with the fifth and sixth rows in the third column, when all five skaters from both teams are starting their shifts on-the-fly. The home shots are just the home-ice term, the away shots are zero, by definition—we've chosen this as our reference state. The small "20.5%" between the home and road maps indicates that 20.5% of all hockey in 2018-2019 was played in this state; this is the most common of the sixteen permutations.

These are scaled for display so that, for instance, if all five skaters on one team were starting their shifts in the defensive zone, the impact on shot rates for that team would be as shown.

In the top right we see the lop-sided patterns we expect for when the home team starts in the offensive zone and the road team starts in their own zone. The matching pattern for when the home team starts in the offensive zone and the home team starts in their own zone is shown at the bottom left. Interestingly, to me, these states are actually relatively uncommon—only 5.2% and 5.7% of hockey, respectively. Five-skater changes for both teams in one zone aren't that common; coaches routinely prefer to let some or all of their skaters continue their existing shifts which almost always started elsewhere. Furthermore, teams changing on-the-fly against teams that start shifts in NZ or DZ are just as productive as OZ starts against the same; these situations often correspond to situations where a team is "trapped" in their own zone, late in their own shifts, while attacking players have just changed.

Notice also the second column in the third and fourth rows, showing where NZ-vs-NZ shifts begin. Such shifts start every period and frequently follow offside whistles. Predictably from a standing start and far from both nets, shot rates are depressed for both teams, as expected. Such shifts account for 17% of hockey in 2018-2019, second only to OTF-vs-OTF shifts. More than a third of all 5v5 hockey is played by ten skaters, none of whom are not starting their shift in either attacking zone.

Distributions

One obvious reason to construct such a model is to suss out the abilities of players, which is intrinsically interesting. However, we can also make comparisons about how much each of the various factors in our models affect play.

"Raw" on-ice results

Before I describe the model outputs, I turn first to the raw observed results from the combined 2016-2018 regular seasons.

This graph is constructed as follows: for every skater who played any 5v5 minutes in these two seasons, compute the offensive and defensive threat observed while they were on the ice; this produces a point $x,y$. Then, form the density map of all such points, where each point is weighted by the corresponding number of minutes played by the player in question. For ease of interpretation, I've scaled the threat values by league average threat, so that a value of $5,5$ on the graph means "threating to score 5% more than league average, and also threatening to be scored on 5% more than league average". As is my entrenched habit, the defensive axis is inverted, so results that are favourable to the skater in question appear in the top right (marked "GOOD" so that there can be no doubt). The contours are chosen so that ten percent of the point mass is in the darkest region, another ten percent in the next region, and so on. The final ten percent of the point mass is in the white region surrounding the shaded part of the graph.

Player Marginals

Repeating the above process with the individual player marginal estimates gives the following graph in green. Unsurprisingly, since I used a model with an explicit zero-bias, the overall distribution is roughly normal around zero. Intuitively, we expect this means that weak players will be slightly over-estimated and that strong players will be under-estimated; I hope in this way to obtain the clearest possible view of players whose true ability is close to league average; these players interest me a great deal more than the other players. The choice of zero-point for the regression is mathematically arbitrary, so its choice is determined by ease of interpretability and by which sorts of players are of interest; akin to moving a magnifying glass over a large map, knowing that the fine details in the centre will be easier to see but the areas under the edges of the glass distorted.

Notice also that there is no correlation between offensive ability and defensive ability, which confirms my previous work. There are certainly many players who are offensively strong and defensively weak, but it does not appear to govern play like the tired cliche suggests.

Teammate Impact

Once I have individual estimates for players in hand, I can make many interesting secondary computations to show the distribution of various effects. Most obviously, for a given player, I can form the sum of the player estimates of all the given player's teammates, weighted by their shared icetime, and then multiply it by four (since every player has four teammates at 5v5). These estimates of teammate quality can then be graphed as above:

Here we see a definite skew towards good players; that is, most players are playing with better than average players. This fits our intuition, since better players play mostly with one another and also play considerably more minutes than weaker players when coaches are making rational decisions, which is most of the time.

Opponent Impact

The same computation can be done for any given players 5v5 opponents: form the sum of all of their isolates, weighted by head-to-head icetime, and then multiply by five, since every player has five opponents at 5v5. This is graphed below in red:

First, notice that the scale on the axes is smaller than for teammates; more discussion of that will follow later. More interestingly, this distribution is both sharply skewed and non-symmetrical. Opponent distributions are skewed broadly towards good players (as for teammates, for similar reasons) but the skew is much more pronounced in the offensive direction; that is, matchups against good offensive players are more common. Furthermore, the variation in competition quality is much more pronounced offensively, with some players playing against skaters of near-average offensive ability but others playing against very strong offensive players. By contrast, the range of defensive ability faced by players is much smaller.

The fact that both the teammates and competition distributions are skewed off-zero only makes sense if they are correlated, as we know intuitively they are. If we compute the net "skater impact" on each player, that is, if we add together the effect of each player's teammates and the effect of their competition, we obtain a distribution that is roughly normal around zero, as expected.

Score Impact

The distribution of scores is sharply linear and almost entirely team driven. Players on weak teams play more while trailing, and players on strong teams play more when leading, with smaller variations within individual teams according to coaching choices. Nevertheless, coaches choosing to play certain players instead of others is still the driving force behind score effects, as the observed changes in play as scores change are much larger than the score terms here, as you can tell from the scale of this distribution as well as the values of the score terms themselves.

Zone Impact

The net effect of zone starts for shifts is small, like score, despite these terms being unbiased, and is roughly gaussian around zero. The reader may recall that unpleasant technicalities had to be employed in last year's version of Magnus to compensate for skew in the score and zone distributions far from zero; I strongly suspect that this was an artifact of having a near-linear-dependence relation in the columns of my design matrix because of including on-the-fly terms, taking them instead as the reference state has cleared this up and I feel much better about it.

Coach Impact

Since coaches change and players change teams, not every player is affected by head coaching the same amount. Computing the distribution of the effects shows another, very roughly, gaussian shape, with some curiosities for players who played exclusively for bad coaches, bulging the distribution towards the bottom-left somewhat. Also, the effect of coaches broadly is to improve defences, as we sadly suspected; in our zero-biased context another way to say this is that coaches whose tendency is to increase the number of shots for their opponents (whether as a net gain to their team or not) are not permitted to accumulate the kind of minutes that would convince a skeptical algorithm like mine that their tendencies are truly theirs, while the defensive coaches are.

The blob in the top-right corresponds to players who have played many minutes with Jon Cooper, who has coached a very large number of extremely strong shot-rate teams in the past dozen seasons.

"Residuals"

Strictly speaking, residuals for a regression refer to the differences between the individual observations (that is, each microshift, for 2018-2019 about half a million or so) and the predicted shot rates for each microshift. I find it prohibitive to compute such residuals, so instead I settle for a suitable aggregate: for each player, I can compute the difference between their raw (observed) 5v5 on-ice results and what I would expect from computing the model outputs associated with the observed players and the zones and scores and home-ice advantage they player under. This is shown below:

Here the overall gaussian shape and zero-ish centre are encouraging. Residuals in this sense need not be so, since we are introducing bias by employing penalty terms; however, when linear independence relations among the columns are few we still expect something like this.

Relative Scale

In the above, I have shown each distribution on its own scale, so that I could discuss the shape of each one in turn. However, to understand their relative importance, they should all be placed on the same scale, which I have done below (except the residuals):

Here there are many interesting insights to be had:

The teammate distribution is much bigger than the competition distribution, in agreement with many previous studies showing that variation in competition (which is only loosely controlled by coaches) is much smaller than the variation in teammates, which coaches control almost completely. However, there are certain players for whom the net impact of competition is much larger than that of teammates, even over the scale of full seasons. In general, a careful analysis should always begin with teammates but also include competition and contextual effects.
Controlling for teammates, competition, and zone effects, as we do, the aggregate size of score effects is surprisingly small---almost negligible. This contradicts my previous work on this subject. (The link is to slides for a talk I gave some years ago in Washington; in retrospect the work is shoddy and circularly argued. It is perhaps little wonder that I never wrote it up.)
Zone effects are also surprisingly small; their total extent is perhaps three or four times smaller than competition effects which are themselves about as much smaller than teammate effects. By focussing on only offensive zone faceoffs and defensive zone faceoffs, as some analyses do, differences between contexts which are in fact quite small can appear heavily inflated.
Coaching effects, over and above coaching choices like score and zone usage, have a non-trivial effect but still pale in comparison to teammates and competition.
The fatigue distribution is incredibly tiny, invisible in comparison to the others. (In fact it is so small I removed it entirely so that it did not clutter the legend.)

Previous Work and Acknowledgements

Using zero-biased (also known as "regularized") regression in sports has a long history; the first application to hockey that I know of is the work of Brian MacDonald in 2012. His paper notes many earlier applications in basketball, for those who are curious about history; also I am very grateful for many useful conversations with Brian during the preparation of the first version of Magnus. Shortly after MacDonald's article followed a somewhat more ambitious effort from Michael Schuckers and James Curro, using a similar approach. Persons interested in the older history of such models will be delighted to read the extensive references in both of those papers.

More recently, regularized regression has been the foundation of WAR stats from both Emmanuel Perry and Josh and Luke Younggren, who publish their results at Corsica and Evolving-Hockey, respectively.

Finally, I am very thankful to Luke Peristy for many helpful discussions, and to the generalized ridge regression lecture notes of Wessel N. van Wieringen which were both of immense value to me.

As far as I can tell, the extension of generalized ridge regression to arbitrary inner product spaces (that is, shot maps instead of the usual reals) is new, at least in this context.

Player and Coach Results

While my research has always been free to the public and will remain so, I restrict access to the specific regression outputs for players and coaches to website subscribers. To gain access, please subscribe at the $5 tier (or higher, of course); then after logging in you should be able to find coaching terms, and individual player terms are on the career pages for each player.

The 5v5 Magnus Prediction Model, v2

September 12, 2019, Micah Blake McCurdy, @IneffectiveMath