Blueline Traversals

May 1, 2024, Micah Blake McCurdy, @IneffectiveMath

Introduction

In addition to trying to score during their shifts, hockey players can help their teams in another way: by leaving the puck farther up the ice at the end of their shift than it was when they came on the ice. Conversely, players can hurt their teams immediately by conceding goals but can also simply lose ground, making life harder for their teammates and coach. Here I provide a framework for measuring this secondary effect on ice position by examining how players cause (or fail to cause) the puck to move across the two blue-lines.

This research was first presented at the Ottawa Hockey Analytics Conference in 2021, for which both slides and video are available; some small improvements have been made since then. In particular, the method for taking the outputs of this model and computing the impact of individual players on the shot rates of the following shifts is new as of May 2024.

Overview

Every second that the puck is in play, it is contested. All five players from both teams usually work together to move the puck from wherever it is to a place closer to their opponent's goal—exceptions to this generality are usually dramatic enough to be memorable.

The offside rule (where players without the puck must cross the blueline after the puck) divides the rink into three zones. I am interested in measuring the impact of each skater on the movement of the puck across both bluelines. Since every transition is desired by one team and undesired by the other, there are four impacts to measure:

Method

Strictly speaking I measure these impacts with two different but heavily overlapping models.

Furthermore, each team's head coach is included in each of the four senses, to proxy for how they instruct (either directly or indirectly) their players to play.

Both models also share the following additional terms:

I am interested in transitions for three different reasons. Most importantly, I want to obtain the coefficients for the non-player terms in order to understand the sport better. Second, I want to obtain the estimates for the player terms in order to understand how players achieve their more important on-ice impacts on the game (such as those on shots and goals). Finally, I want to measure the off-ice impact that players can have on the shifts which follow theirs, by leaving the puck in more (or less) promising locations than when they begin their shifts.

Fitting

This model is fitted as a logistic regression, with three kinds of ridge penalties, similar to my shot rate model. One penalty (of strength 100) is applied to every non-constant term's deviation from zero, to encode our prior intuition that no one player or structural term can dramatically change on-ice results by themselves. A second penalty (also of strength 100) is applied to every term's deviation from its value in the previous season, encoding our prior belief that the players and the sport itself change slowly over time. Finally, a third penalty (of strength 100 million) is used to pool the score terms to enforce our knowledge that all the score effects taken together must average to zero.
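To make the three penalties concrete, here is a minimal sketch of a penalized objective of the kind described above. It is not the fitting code itself: the squared form of each penalty, the convention that the first coefficient is the constant, and the names `penalized_nll`, `beta_prev`, and `score_idx` are all assumptions made for illustration.

```python
import math

def penalized_nll(beta, rows, outcomes, beta_prev, score_idx,
                  lam_zero=100.0, lam_prev=100.0, lam_pool=1e8):
    """Logistic negative log-likelihood plus three ridge-style penalties.

    beta[0] is assumed to be the constant term; score_idx lists the
    positions of the score-state terms. Both conventions are hypothetical.
    """
    nll = 0.0
    for x, y in zip(rows, outcomes):
        z = sum(b * v for b, v in zip(beta, x))
        # log(1 + exp(z)) computed in a numerically stable way, minus y * z
        nll += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    # Penalty 1: every non-constant term is pulled towards zero.
    pen_zero = lam_zero * sum(b * b for b in beta[1:])
    # Penalty 2: every term is pulled towards its value in the previous season.
    pen_prev = lam_prev * sum((b - bp) ** 2 for b, bp in zip(beta, beta_prev))
    # Penalty 3: the score terms are pooled so that they sum (and so average) to zero.
    pen_pool = lam_pool * sum(beta[i] for i in score_idx) ** 2
    return nll + pen_zero + pen_prev + pen_pool
```

Minimizing a function like this over `beta` with any standard optimizer would give the fitted terms. The very large pooling strength acts almost like a hard constraint, while the two penalties of strength 100 only shrink the estimates.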

Structural Results

As always, the coefficients from a logistic regression are a bit of a pain to interpret. Positive values are associated with players or structural effects that make transitions more likely, and negative values with ones that make transitions less likely. In order to make them a little easier to understand, I've decided to quote them after transforming them somewhat. Each transition (entry or exit) has an associated constant term; converting this to a probability gives the chance of a transition in a given second assuming all other factors have no effect. This probability can be inverted to give an average time until the transition occurs. Repeating this process with the constant and a given term added lets us compare the two, to quote the effect of that term in seconds. One player might delay the mean time until their team exits their zone by five seconds, while a structural term might increase that time by four seconds, on average, say. These changes, strictly speaking, cannot be added together, but seconds are a natural unit in which to quote them.
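The conversion described above can be sketched in a few lines. This assumes that each second is an independent trial, so that the mean waiting time is the reciprocal of the per-second probability; the function names and the numeric values in the usage note are invented for illustration and are not fitted values.

```python
import math

def mean_seconds_until_transition(logit):
    """Convert a logit to a per-second probability, then invert it."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 / p

def effect_in_seconds(constant, term):
    """Change in mean time until transition when `term` is added to the constant."""
    return (mean_seconds_until_transition(constant + term)
            - mean_seconds_until_transition(constant))
```

With a made-up constant of -2.0, a term of +0.3 shortens the mean time, so `effect_in_seconds(-2.0, 0.3)` is negative, and a term of -0.3 lengthens it. Because the conversion is non-linear, `effect_in_seconds(-2.0, 0.6)` is not twice `effect_in_seconds(-2.0, 0.3)`, which is why effects quoted in seconds cannot simply be added together.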

We'll return to the player and coach terms in time, but let's start with the structural terms. First, exits in 23-24:

We know that trailing teams usually dominate in shots and we see the same here for transitions. Trailing is associated with shaving around two seconds off the time it takes to get the puck out of your own zone and leading by one or two adds about a second. Teams that are up a lot are downright leisurely getting the puck out. Similarly, zone exits come quicker early in the game and slower in the third; the interaction terms between leading and the third period are also very strong; this is the environment where score effects are strongest. Road teams have a roughly two-second harder time exiting their zone than home teams.

For entries in 23-24:

When the puck is in the neutral zone, the impact of the score is quite different: tied games are the ones where entries are a little easier to come by, and both leading and trailing are associated with longer time until entry. Leading teams gaining the neutral zone and then dumping the puck and changing is familiar enough behaviour. Trailing teams having an easier time getting out of their own zone and also a harder time getting into the offensive zone suggests that leading teams systematically drop back, preferring to defend their own blue line better than their opponent's.

The period and home/road terms are very small, and the third-period interaction effects are also quite small, except for the "when tied" term. Games which are tied-in-the-third specifically have clogged neutral zones. The year-to-year variation in the structural terms is small.

Skater Results

07-08 08-09 09-10 10-11 11-12 12-13 13-14 14-15 15-16 16-17 17-18 18-19 19-20 20-21 21-22 22-23 23-24
Coaches
ANA
ARI
BOS
BUF
CAR
CBJ
CGY
CHI
COL
DAL
DET
EDM
FLA
L.A
MIN
MTL
N.J
NSH
NYI
NYR
OTT
PHI
PIT
S.J
SEA
STL
T.B
TOR
VAN
VGK
WPG
WSH

Off-ice Impacts

The structural terms tell us something about the sport itself; the coaching and player terms give us insight into player evaluation and how players achieve their on-ice shot rate impacts. However, we can use the same information to measure another aspect of player performance: which players are gaining ice position and which are losing ground? Every time a coach makes a line change, they must give out shift starts to the new players based on whatever the previous players have done; some players create zone start "currency" for their teammates and others spend it. Using the coefficients above for each player, we can estimate this impact also.

In order to measure this impact, I simulate a single player's shift with a simple Markov chain: at a given moment, the puck is understood to be in one of the three zones, while play is either ongoing or else a whistle has stopped play. At every moment, depending on these two factors, the puck may be moved into an adjacent zone, or else the player's shift may be ended by their coach.

I compute the zone-to-zone transition probability using only the constant term in the relevant transition model and the player's term in that model. This amounts to assuming that the player of interest has been provided with four league average teammates, and five league average opponents. One of the benefits of centering groups of terms that we know have no aggregate impact is that it permits us to use zero as "neutral circumstances" in this way.
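A minimal sketch of a shift simulator in this spirit follows. It is a simplification under stated assumptions, not the simulator used here: the per-second shift-ending probabilities are constant placeholders rather than league measurements, a shift is assumed to end either at a whistle in the puck's current zone or on the fly, and the key names (`exit_for`, `entry_for`, `entry_against`, `exit_against`) and all numeric values are hypothetical.

```python
import math

def per_second_prob(constant, player_term=0.0):
    """Per-second transition probability from a constant plus one player's term."""
    return 1.0 / (1.0 + math.exp(-(constant + player_term)))

def simulate_shift(start_zone, p, p_end_whistle=0.01, p_end_fly=0.02, tol=1e-12):
    """Probabilities that a shift ends at a whistle in DZ, NZ, OZ, or on the fly (OTF)."""
    live = {"DZ": 0.0, "NZ": 0.0, "OZ": 0.0}
    live[start_zone] = 1.0
    done = {"DZ": 0.0, "NZ": 0.0, "OZ": 0.0, "OTF": 0.0}
    cont = 1.0 - p_end_whistle - p_end_fly  # chance per second that the shift goes on
    while sum(live.values()) > tol:
        # Some of the remaining probability mass ends the shift this second.
        for zone, mass in live.items():
            done[zone] += mass * p_end_whistle
            done["OTF"] += mass * p_end_fly
        dz, nz, oz = (cont * live[z] for z in ("DZ", "NZ", "OZ"))
        # The rest may move across a blue line into an adjacent zone.
        live = {
            "DZ": dz * (1 - p["exit_for"]) + nz * p["entry_against"],
            "NZ": (dz * p["exit_for"] + oz * p["exit_against"]
                   + nz * (1 - p["entry_for"] - p["entry_against"])),
            "OZ": nz * p["entry_for"] + oz * (1 - p["exit_against"]),
        }
    return done

# Hypothetical constant of -2.0 for all four transitions; player term of +0.2 on exits.
average = {k: per_second_prob(-2.0) for k in
           ("exit_for", "entry_for", "entry_against", "exit_against")}
good_exits = dict(average, exit_for=per_second_prob(-2.0, 0.2))
```

Comparing `simulate_shift("DZ", good_exits)` with `simulate_shift("DZ", average)` shows the idea: the player with the positive exit term ends fewer shifts at a whistle in their own zone.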

For each moment I can also estimate if there will be a stoppage using historical stoppage probabilities for the league as a whole. For 2018-2024 those probabilities can be measured:

Using league-wide probabilities here is a simplification, since we observe that certain players cause stoppages more quickly than others; that aspect of player impact will go unmeasured for now.

Finally, at each moment of a shift I model the probability that the shift will be ended also using 2018-2024 league averages:

With this little simulator in hand, we can measure the average resulting puck location caused by a player relative to any particular starting puck locations. In particular, the simulator itself can be understood as a function \(S\) which takes a quadruple of starting puck location probabilities \((DZ_0, NZ_0, OZ_0, OTF_0)\) and outputs a quadruple of final puck location probabilities \((DZ_1, NZ_1, OZ_1, OTF_1)\), that is, as a map \(S\) from \(\mathbb R^4\) to itself. Since this map is continuous and sends the set of probability distributions (a compact, convex subset of \(\mathbb R^4\)) to itself, it has a fixed point, which we can find by iteration. In this case, the fixed point is \(\alpha_0 = (11.9\%, 12.5\%, 13.0\%, 62.5\%)\). This distribution is fairly close to league-wide shift-start types, which is an encouraging validation of our admittedly very simple transition model.

For a given player \(p\), we can write \(S_p\) for the simulator using their terms, and then form \(S_p(\alpha_0) - \alpha_0\), the net increase or decrease of the puck in a given state caused by the player. For instance, a player with a positive value in the first entry of \(S_p(\alpha_0) - \alpha_0\) is ending their shifts at stoppages in their own zone more than they start shifts there, and a positive fourth component is associated with a player finishing their shifts on the fly more than they start them in that fashion.

This net difference \(S_p(\alpha_0) - \alpha_0\) can be weighted according to the zone start impacts from my shot rate model to give the impact of a player on the shot rates of the shifts that follow their own shifts, both offensively and defensively. This can be scaled to any number of minutes in such following shifts; choosing a thousand gives the following distribution of off-ice impacts in the league as of the end of 2023-2024:

There is a definite negative correlation between the two impacts \((r = -0.40 )\), as we would expect, since every transition across either blue line is good both defensively and offensively. However, these benefits are not matched, because of the intrinsic asymmetry of the offside rules, not to mention varying player choices; these effects cause a fair bit of spread away from perfect correlation. The very best players are an interesting mix of players known to be exceptionally strong (Brady Tkachuk, Quintin Hughes, Adam Fox, Nathan MacKinnon) but also some specialist players not particularly famed by traditional measures (Marcus Foligno, perennial Selke-contender among analytics types, is one good example).

In total magnitude, however, these off-ice impacts are much smaller than on-ice impacts, by a factor of roughly ten. This is obviously minor by comparison, but still non-trivial.
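To make the fixed-point construction behind these off-ice impacts concrete, here is a small numerical sketch. It assumes that the simulator acts linearly on starting distributions, so that it can be written as a 4x4 matrix whose rows sum to one; this holds for a Markov chain but is stated here as an assumption. The two matrices are invented for illustration and are not outputs of the model.

```python
def apply(alpha, M):
    """One pass of the simulator: starting distribution in, final distribution out."""
    return [sum(alpha[i] * M[i][j] for i in range(4)) for j in range(4)]

def fixed_point(M, tol=1e-12, max_iter=10_000):
    """Iterate from a uniform start until the distribution stops changing."""
    alpha = [0.25] * 4
    for _ in range(max_iter):
        nxt = apply(alpha, M)
        if max(abs(a - b) for a, b in zip(alpha, nxt)) < tol:
            return nxt
        alpha = nxt
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical matrices; states are ordered (DZ, NZ, OZ, OTF).
S_league = [
    [0.20, 0.10, 0.08, 0.62],
    [0.12, 0.14, 0.12, 0.62],
    [0.08, 0.10, 0.20, 0.62],
    [0.11, 0.13, 0.13, 0.63],
]
S_player = [
    [0.18, 0.10, 0.10, 0.62],
    [0.10, 0.14, 0.14, 0.62],
    [0.07, 0.10, 0.21, 0.62],
    [0.10, 0.13, 0.14, 0.63],
]

alpha_0 = fixed_point(S_league)
# Net change in each ending state caused by the player: S_p(alpha_0) - alpha_0.
delta = [a - b for a, b in zip(apply(alpha_0, S_player), alpha_0)]
```

For this made-up player, the first entry of `delta` is negative and the third is positive: fewer shifts end at a stoppage in the defensive zone and more end at one in the offensive zone. Weighting `delta` by zone start coefficients from a shot rate model, which are not reproduced here, would turn it into an impact on the shot rates of the following shifts.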

For somewhat easier-to-lookup charts, the above distribution can be broken out by team:

Pacific Central Metro Atlantic
ANA ARI CAR BOS
CGY CHI CBJ BUF
EDM COL N.J DET
L.A DAL NYI FLA
S.J MIN NYR MTL
SEA NSH PHI OTT
VAN STL PIT T.B
VGK WPG WSH TOR

Weaknesses

I have deliberately chosen to work with the NHL's public play-by-play data, from which the puck location at many (but not all) times can be imputed. However, there are times when the puck moves from a defensive zone into the neutral zone and then back into that same defensive zone without an event being recorded; my approach here will treat this as a continuous stretch of play in the defensive zone. The effect of these omissions will tend to make players and coaches look worse at zone exits and better at entry defence. Similarly, there are some number of zone entries which are followed by a return of the puck to the neutral zone without any record; these omissions will tend to make coaches and players look stronger at entry defence and weaker at zone exits than they actually are. I don't know how to estimate how common these two omissions are in order to even guess at a comparison between these two effects.