Standardized Goals Against

May 31, 2017, Micah Blake McCurdy, @IneffectiveMath
This work was presented first at the 2017 Ottawa Hockey Analytics Conference, for which you may view slides and video.

I propose a new framework for evaluating goaltending performance, taking into account the difficulty of shots faced as well as the quality of skaters playing for both teams. This is the first non-trivial fragment of what I imagine will be a long sequence of articles.

Aim

Any method for evaluating goalies should be:

The gentle reader will decide for themself how successful I have been so far in the pursuit of these goals but I feel I have grasped a foothold. In this first article I mainly describe the broad framework but I mention two related applications: adjusting for quality of skaters, and adjusting for quality of shots faced.

Model

The key technical tool in my framework is a simple model of play in the defensive zone, from the goalie's perspective. I shoehorn all play into seven states: Shot, Freeze, Contested, Goal, Attackers, Defenders, and Safe.

The gory details and most of the modelling judgment come in with how I tabulate transitions between these states given the play-by-play information to which we have access. These details are sufficiently gruesome and lengthy that I've included them in an appendix at the end of this article.

The results for the league in 2016-2017 as a whole are:

From \ To   Shot       Freeze     Contested  Goal       Attackers  Defenders  Safe
Shot        21.1%      25.7%      47.7%      5.5%
Freeze                                                  49%        49%        2%
Contested                                               41%        59%
Goal                                         100%
Attackers   46%                   54%
Defenders                         35%                                         65%
Safe                                                                          100%

The blank entries indicate zero transitions of that type were recorded, mostly because I imputed other transitions around them. The aggregate effect is to only permit certain transitions: for instance, all transitions from "Contested" are either to "Attackers" or "Defenders". Notice how defenders consistently win the bulk of such non-static puck battles. On the other hand, when goalies freeze pucks after shots, the result is usually a faceoff, for which the attacking team has a slight edge, but in a small fraction (3%) of cases the puck is immediately taken out of the zone to "Safety" by the linesfolk, because of scrumming attacking skaters or penalties.

Adjusting for Skater Quality

The primary benefit of shaping information about defensive zone play into such a matrix as the above is that we can gain insights from simple computations using the matrix. For instance, notice that there are two "terminal" or "absorbing" states, that is, "Goal" and "Safe". These are "terminal" in the sense that I consider any play after them as totally distinct from the previous play. (We know that this is not quite true---after all, any goal scored changes the score, which we know strongly affects some aspects of play, and we know that some clearances of the puck to "safety" are actually very bad plays which give up control of the puck with minimal benefit. These concerns will have to wait for another day.) By taking a very high power of our transition matrix, we can compute the "eventual goal probability" starting from any state, that is, the chance that, starting from a given state, the puck will wind up in the net before the defenders manage to clear it. For instance, the eventual goal probability for "Shot" is 10.2%, around double the immediate probability of scoring on any given shot.
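The "very high power" computation described above can be sketched in a few lines of Python. This is a minimal illustration, not my production code. The entries are the league-wide 2016-2017 figures quoted in this article, but the column placement of the "Attackers" and "Defenders" rows (Attackers to Shot or Contested, Defenders to Contested or Safe) is an assumption, chosen because it reproduces the 10.2% figure. The helper names are made up for the example.

```python
STATES = ["Shot", "Freeze", "Contested", "Goal", "Attackers", "Defenders", "Safe"]

# Rows are the state the play is in, columns the state it moves to.
# ASSUMPTION: the column placement of the Attackers and Defenders rows is
# inferred (it reproduces the quoted 10.2%), not read directly off the table.
LEAGUE = [
    [0.211, 0.257, 0.477, 0.055, 0.00, 0.00, 0.00],  # Shot
    [0.000, 0.000, 0.000, 0.000, 0.49, 0.49, 0.02],  # Freeze
    [0.000, 0.000, 0.000, 0.000, 0.41, 0.59, 0.00],  # Contested
    [0.000, 0.000, 0.000, 1.000, 0.00, 0.00, 0.00],  # Goal (absorbing)
    [0.460, 0.000, 0.540, 0.000, 0.00, 0.00, 0.00],  # Attackers
    [0.000, 0.000, 0.350, 0.000, 0.00, 0.00, 0.65],  # Defenders
    [0.000, 0.000, 0.000, 0.000, 0.00, 0.00, 1.00],  # Safe (absorbing)
]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def long_run(matrix, squarings=10):
    """Raise the matrix to the power 2**squarings by repeated squaring."""
    result = matrix
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def eventual_goal_probability(matrix, start="Shot"):
    """Chance that play starting in `start` ends in "Goal" rather than "Safe"."""
    limit = long_run(matrix)
    return limit[STATES.index(start)][STATES.index("Goal")]


if __name__ == "__main__":
    print(round(eventual_goal_probability(LEAGUE), 3))
```

Under these assumptions the result for "Shot" comes out at roughly 0.102, in line with the 10.2% quoted above. Ten squarings (the 1024th power) is far more than needed, since almost all of the probability has drained into the two absorbing states after a few dozen steps.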

My broad opinion is that the goalie ought to be considered mostly responsible for all of the entries in the "Shot" row and not in any way responsible for the entries in any of the other rows. However, looking at transition matrixes for individual goaltenders, even for full seasons, shows significant differences in the "skater" rows. For instance, Philipp Grubauer played twenty-four games for Washington in 2016-2017; his matrix is below:

From \ To   Shot       Freeze     Contested  Goal       Attackers  Defenders  Safe
Shot        19.2%      25.4%      51.1%      4.3%
Freeze                                                  51%        46%        3%
Contested                                               38%        62%
Goal                                         100%
Attackers   43%                   57%
Defenders                         32%                                         68%
Safe                                                                          100%

In virtually every skater entry the results in front of him are more favourable than league average. Since the Capitals won the Presidents' Trophy in 2016-2017, it should hardly be surprising that his team won more puck battles, cleared more rebounds, and broke out of the zone better than the league average.

By comparison, consider Mike Smith, who played fifty-five games for Arizona in 2016-2017. His matrix is below:

From \ To   Shot       Freeze     Contested  Goal       Attackers  Defenders  Safe
Shot        22.1%      28.8%      44.1%      5.0%
Freeze                                                  49%        48%        3%
Contested                                               43%        57%
Goal                                         100%
Attackers   52%                   48%
Defenders                         36%                                         64%
Safe                                                                          100%

In almost every skater entry we see results that are below league average. This again is not surprising, as the Coyotes finished third-last in 2016-2017. They are consistently more susceptible to opponents' forechecking, win fewer puck battles, and break out in transition less.

The key idea for equalizing results across different skater contexts is to form the transition matrixes for the goalies we want to compare, and then replace all of the entries in the "skater" rows (that is, every row except the "Shot" row) with league-average values. This produces a transition matrix which I imagine as representing what would transpire if the goalie in question were provided with league-average skater context instead of the teammates and opponents they actually faced. Then, by computing the long-run probability of a shot being converted into a goal, we can compare two goaltenders more fairly. For the given pair of goaltenders, the immediate goal-per-shot figures favour Grubauer, 4.3% to Smith's 5.0%. Moving to eventual goal probabilities, Grubauer's figure is 7.4% and Smith's 10.2%, where Smith's weak teammate support becomes very clear. After replacing their skater contexts with league-average ones, Grubauer's "skater-independent eventual goal probability" is 7.9%, whereas Smith's is 9.5%. By this measure, Grubauer's performance was still stronger than Smith's, even after accounting for the differences in skater quality.
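The row-replacement step can be sketched as follows. As before, this is an illustration under assumptions rather than my actual code. The nonzero entries are the ones quoted in the three matrixes in this article, the placement of the "Attackers" and "Defenders" entries (Attackers to Shot or Contested, Defenders to Contested or Safe) is inferred, and `build_matrix` and `with_league_skaters` are hypothetical helper names.

```python
SHOT, FREEZE, CONTESTED, GOAL, ATTACKERS, DEFENDERS, SAFE = range(7)


def build_matrix(shot, freeze, contested, attackers, defenders):
    """Assemble a 7x7 transition matrix from the nonzero entries of each row.

    ASSUMPTION: which states each row can reach is inferred, not stated
    explicitly in the article.
    """
    P = [[0.0] * 7 for _ in range(7)]
    P[SHOT][SHOT], P[SHOT][FREEZE], P[SHOT][CONTESTED], P[SHOT][GOAL] = shot
    P[FREEZE][ATTACKERS], P[FREEZE][DEFENDERS], P[FREEZE][SAFE] = freeze
    P[CONTESTED][ATTACKERS], P[CONTESTED][DEFENDERS] = contested
    P[ATTACKERS][SHOT], P[ATTACKERS][CONTESTED] = attackers
    P[DEFENDERS][CONTESTED], P[DEFENDERS][SAFE] = defenders
    P[GOAL][GOAL] = 1.0  # absorbing
    P[SAFE][SAFE] = 1.0  # absorbing
    return P


LEAGUE = build_matrix([0.211, 0.257, 0.477, 0.055], [0.49, 0.49, 0.02],
                      [0.41, 0.59], [0.46, 0.54], [0.35, 0.65])
GRUBAUER = build_matrix([0.192, 0.254, 0.511, 0.043], [0.51, 0.46, 0.03],
                        [0.38, 0.62], [0.43, 0.57], [0.32, 0.68])
SMITH = build_matrix([0.221, 0.288, 0.441, 0.050], [0.49, 0.48, 0.03],
                     [0.43, 0.57], [0.52, 0.48], [0.36, 0.64])


def eventual_goal_probability(P, squarings=10):
    """Shot-to-Goal entry of P raised to the power 2**squarings."""
    M = P
    for _ in range(squarings):
        M = [[sum(M[i][k] * M[k][j] for k in range(7)) for j in range(7)]
             for i in range(7)]
    return M[SHOT][GOAL]


def with_league_skaters(goalie, league=LEAGUE):
    """Keep the goalie's own "Shot" row; take every other row from the league."""
    adjusted = [row[:] for row in league]
    adjusted[SHOT] = goalie[SHOT][:]
    return adjusted


if __name__ == "__main__":
    for name, matrix in (("Grubauer", GRUBAUER), ("Smith", SMITH)):
        raw = eventual_goal_probability(matrix)
        adjusted = eventual_goal_probability(with_league_skaters(matrix))
        print(f"{name}: raw {raw:.3f}, skater-independent {adjusted:.3f}")
```

With these inputs the skater-independent figures come out near 0.079 for Grubauer and 0.095 for Smith, matching the values quoted above. The raw figures land within about a tenth of a percentage point of the quoted ones, which is what one would expect from working with rounded table entries.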

Adjusting for Shot Quality

So far I've treated all of the transitions from the "Shot" state to be the responsibility of the goaltender. However, not all shots are equally easy to handle. This difficulty might be smoothed over if every goaltender faced a similar distribution of shots in each year but this is not what we observe in the NHL. For instance, Devan Dubnyk played sixty-five regular-season games for Minnesota in 2016-2017, facing the following pattern of shots:
Blue regions indicate fewer shots (than league average) per hour of 5v5 play and red regions show areas from which he saw more shots per hour. On the other hand, Mike Smith in Arizona (55 games played) saw the following distribution of shots:
This contrast strongly suggests that some accounting should be made for the difficulty of shots faced.

To accomplish this, I replace the single "Shot" state with a family of states, one for every recorded shot location. I divide the defensive zone into a 100 by 100 grid, roughly corresponding to the recorded precision of the NHL's real-time stats. This changes the "Shot" state into ten thousand shot states, and our seven-by-seven transition matrixes become 10,006-by-10,006 matrixes, which makes them harder to look at but not appreciably harder to compute with. Then, we can compute a Standardized Shot Profile, that is, the relative likelihood of facing shots from given locations for a league-average goalie. Graphically, it looks like this:

The colour units indicate relative frequency. By encoding this standard shot profile as a matrix we can pre-multiply our observed transition matrixes by it and obtain what I call Standardized Goals Against or sGA, that is, the number of goals that a given goalie would allow per hundred shots if they faced a typical distribution of shots, calculated from how they performed on the shot distribution they did face. Similarly, we can derive "Standardized Freeze Rates", "Standardized Shot-to-contested Rates", and so on, though these quantities seem less interesting to me.
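To make the pre-multiplication concrete, here is a toy sketch with three shot locations instead of ten thousand. Every number in it is invented purely for illustration, and the names (`standardize`, `sga`, `TOY_PROFILE` and so on) are hypothetical. It treats sGA as the profile-weighted immediate goal rate per hundred shots, which is one reading of the description above and not a statement of the exact computation.

```python
SHOT_OUTCOMES = ["Shot", "Freeze", "Contested", "Goal"]

# Made-up example: three shot locations (say slot, point, wide).
# Each row is one goalie's observed outcome rates for shots from that location.
TOY_SHOT_ROWS = [
    [0.20, 0.20, 0.50, 0.10],  # slot
    [0.20, 0.30, 0.47, 0.03],  # point
    [0.25, 0.30, 0.43, 0.02],  # wide
]
TOY_SHOTS_FACED = [100, 600, 300]  # shots this goalie actually saw per location
TOY_PROFILE = [0.5, 0.3, 0.2]      # made-up "standard" share of shots per location


def standardize(profile, shot_rows):
    """Weight each location's outcome rates by the standard shot profile."""
    total = sum(profile)
    weights = [p / total for p in profile]  # normalise so the weights sum to one
    n_outcomes = len(shot_rows[0])
    return [sum(w * row[j] for w, row in zip(weights, shot_rows))
            for j in range(n_outcomes)]


def sga(profile, shot_rows):
    """Goals allowed per hundred shots under the standard profile."""
    return 100.0 * standardize(profile, shot_rows)[SHOT_OUTCOMES.index("Goal")]


def raw_goals_per_hundred(shots_faced, shot_rows):
    """Goals per hundred shots on the distribution the goalie actually faced."""
    return sga(shots_faced, shot_rows)  # same weighting, using observed counts


if __name__ == "__main__":
    print(raw_goals_per_hundred(TOY_SHOTS_FACED, TOY_SHOT_ROWS))
    print(sga(TOY_PROFILE, TOY_SHOT_ROWS))
```

In this toy case the goalie faced mostly low-danger shots, so the raw rate (about 3.4 per hundred) understates what the same per-location performance would yield against the standard profile (about 6.3 per hundred). The other entries of `standardize` correspond to the standardized freeze and shot-to-contested rates.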

In our example above, we were comparing Dubnyk and Smith; their immediate goal probabilities (adjusting nothing at all) were 4.8% and 4.2%, respectively. However, Dubnyk's sGA for this season is 6.7, and Smith's is 4.5. Unsurprisingly, Dubnyk's expected performance versus league-average shot quality is worse than observed. Somewhat less expectedly, Smith's expected performance in percentage terms also drops slightly, but the relative distribution of the shots he faces is not so different from league average; he simply sees lots more from every location.

The two adjustments described here (replacing skater terms with league averages and shot standardization) can be combined to obtain a stat that I call "sGA*".

Repeatability

If we want to imagine that our statistics are useful measures of skill then we hope that they will be repeatable, that is, future values should be related to past values. I computed correlations for several stats, including the two (sGA and sGA*) introduced here, using "career to date" as the past value and "following twenty-five games" as the future value. I tried to imagine a plausible scenario as it might appear to a decision-maker at a hockey team: which goaltender shall I (primarily) play for the next twenty-five games? Many fewer than twenty-five games risks being lost in noise completely; many more games risks disconnecting from practice. Computing Pearson correlations in this way for all goaltenders over the past decade gives:

Stat                    Correlation
sGA                     0.185
sGA*                    0.135
xGA-GA per shot         0.211
All-situations save %   0.211
5v5 save %              0.115

There are a number of surprises. Most surprisingly, the least repeatable measure of goaltending talent is 5v5 save percentage, which is one of the most popular measures in the analyticky circles in which I travel. The third entry is based on Emmanuel Perry's expected goals model, which assigns to every shot a goal probability based on its type and location. Forming the difference between expected goals allowed and actual goals allowed and then dividing by the number of shots puts this notion on the same arithmetic footing as the other ones, allowing for comparisons. It is the most sophisticated existing model for goaltending evaluation to date so it's not surprising to see it perform well here; what is much more surprising is the equally strong repeatability from all-situations save percentage, which indiscriminately buckets together shots from all different contexts. I suspect that there may be a sort of survivor bias influencing results here; since all-situations save percentage appears to be the most common evaluating tool among NHL decision-makers over the past decade (with "consistent" goaltenders especially prized), perhaps there is artificially less variance in this measure.
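For readers who want to try a similar repeatability check on their own data, the past-versus-future comparison can be sketched like this. The function names and the per-game input layout are assumptions made for the example, and the details of how split points are chosen for each goalie are left out.

```python
import math


def rate_pair(goals, shots, split, window=25):
    """Goals per shot before game `split`, and over the next `window` games.

    `goals` and `shots` are per-game counts for one goalie, in career order.
    """
    past = sum(goals[:split]) / sum(shots[:split])
    future = sum(goals[split:split + window]) / sum(shots[split:split + window])
    return past, future


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Collecting one (past, future) pair per goalie and split point, then passing the two columns to `pearson`, gives a single number comparable to the entries in the table. Any per-shot stat can be substituted for the plain goals-per-shot rate used here.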

I am sufficiently heartened by the repeatability of sGA to publish this article and to push forward with further work in this vein; however, the weakness of the repeatability of sGA* makes me think this latter stat might not be worth applying immediately.

Future Work

A project this size will take, I expect, several years and there is much left to do. The most obvious next steps to me are:

In any event, I am sufficiently happy with sGA that I will be computing it for past and future goaltending performances and quoting it on the site. The future work I mention here will take a long time and I welcome the assistance of those who are interested in accelerating that progress.

2016-2017 Results

The graph below shows the raw goals per hundred unblocked shots and the sGA for all goalies who played in at least fifteen regular season games in 2016-2017.

Goalies who appear above the red line would have posted better results had they faced a league-average shot profile instead of the profile they did face; and those below would have posted worse. The players are coloured by their teams; goalies who played for multiple teams are shown in white.

Appendix: Transition Imputation

These are the details of how I took the NHL play-by-play and coerced what is written there into transitions for my model. First of all: no transitions were considered when the goalie under consideration was not in the net, no matter what the play-by-play events. That said, there are two stages of model design. The first is the encoding of states:

Once all of the states are encoded in this way some additional states are imputed, with accompanying transitions: