A Better Way to Compute Score-Adjusted Fenwick

(This article originally appeared in October 2014 on Emmanuel Elk's sadly-now-defunct site, senstats.ca.) Micah Blake McCurdy, @IneffectiveMath

Everybody who watches more than the occasional game of hockey knows all about score effects, even if they don’t know the term — teams that are losing get more of the puck. (For those who would like an introduction, check out the great piece over at Puck Daddy which Jen Lute Costella (@RegressedPDO) wrote just the other day). Whatever its causes, it’s certainly one of the things that makes hockey fun to watch, but it’s troublesome for stat-minded folk who want to tease out the underlying possession skills.

There are two common approaches to accounting for score effects in raw possession numbers: score-close and score-adjusted. The score-close method counts events as having value “one” if they occur when the score is tied, or within a goal in the first or second periods; all other events have value “zero” — they simply aren’t counted. Score-adjusted includes all events, but weights them according to score-situation; the established formula is due to Eric Tulsky (@BSH_EricT) in a 2012 article at Broad Street Hockey. In this article, I introduce another formula for computing score-adjusted fenwick, which has greater predictivity at all sample sizes, especially smaller sample sizes, is easier to compute, and is perhaps conceptually easier to understand.

As a technical preliminary, all events here are 5v5 Fenwick events; that is, all unblocked shots directed at the net. Note that 5v5 events do not include empty net situations.

For reference, the formula in Tulsky’s article is:

Score-Adjusted Fenwick =
[  3.75 * (Fen_up_2   - 44.0%)
+  8.46 * (Fen_up_1   - 46.1%)
+ 17.94 * (Fen_tied   - 50.0%)
+  8.46 * (Fen_down_1 - 53.9%)
+  3.75 * (Fen_down_2 - 56.0%)] / 42.36 + 50%

The various numbers require a little explanation: 42.36 is the average number of minutes per game (in the data Tulsky examined) that teams were 5v5; 3.75 the average number of minutes that teams were up-two/down-two (or more than two), 8.46 the average number of minutes that teams were up-one/down-one, and 17.94 the average number of minutes teams were tied. The obvious issue (which Tulsky points out) is that the formula can only hope to be useful when applied to a sample large enough that the team's actual time spent in each score situation approximates these league averages. For instance, the 2014-2015 Buffalo Sabres might not be able to find a single set of ten games which approximates this distribution, let alone a randomly chosen sample. If, in a given sample, a team has spent no time leading by two, then the only way to get sensible results is to assume, in the absence of any evidence, that the team's up-2 Fenwick percentage is 44%, the same as the league average. More vexing still is the arithmetic with percentages; ideally one would avoid taking averages of averages.
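For concreteness, here is a minimal sketch of Tulsky's formula exactly as written above; the function name and argument names are my own, and the Fenwick percentages are passed as fractions rather than percentages.

```python
def tulsky_score_adjusted_fenwick(fen_up_2, fen_up_1, fen_tied, fen_down_1, fen_down_2):
    """Tulsky's score-adjusted Fenwick; each argument is the team's
    Fenwick-for fraction (e.g. 0.52) in the named score state.  The
    weights are league-average 5v5 minutes per game spent in each state."""
    weighted = (3.75  * (fen_up_2   - 0.440)
              + 8.46  * (fen_up_1   - 0.461)
              + 17.94 * (fen_tied   - 0.500)
              + 8.46  * (fen_down_1 - 0.539)
              + 3.75  * (fen_down_2 - 0.560))
    return weighted / 42.36 + 0.50

# A team that is exactly league average in every state comes out at exactly 50%:
print(tulsky_score_adjusted_fenwick(0.440, 0.461, 0.500, 0.539, 0.560))  # 0.5
```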

A conceptual criticism is that the numbers don't differentiate between home and away teams, even though score effects do not apply to them evenly, as we'll see numerically later. Finally, I was intuitively thrown by the treatment of "two or more goals" as a bin unto itself. In the end, I chose to use "three or more" as my catchall bins, since there are small but non-trivial differences between up-2 and up-3 situations.

I have heard people complain that Tulsky’s formula is hard to understand and hard to compute. That is as may be; but in any event I like to tell myself that my method is easier and addresses all of the above objections.

Method

As Tulsky did, we first examine a good chunk of recent hockey; here, the 2007-2014 seasons. Over that span, there were the following shot attempts:

Home Lead       Home Fenwicks    Away Fenwicks
-3 or worse             9,361            6,556
-2                     19,420           14,848
-1                     51,921           43,075
 0                    116,607          109,675
+1                     52,382           55,137
+2                     19,727           22,648
+3 or better            9,403           11,890

For instance, with the home team leading by a goal, the home team generated 52,382 Fenwick events, and the away team generated 55,137 Fenwick events. The score effects are clearly visible: events are easier to come by for the trailing away team. We want to count each such away event as slightly less than one event, since they are relatively easy to generate; similarly, we want to count each home event as slightly more than one event, since they are relatively hard to come by. We choose to count away events as worth 0.975 adjusted events, and home events as worth 1.026 adjusted events; this means that there are, in aggregate, 0.975 * 55,137 + 1.026 * 52,382 ≈ 107,519 = 55,137 + 52,382 events (exactly so before rounding the coefficients), that is, the total weight of "home team up one" events remains unchanged.

In general, the adjustment coefficient for a given team (home or away) in a given score situation is the one which satisfies:
(Coefficient for given team) * (Events for given team) = (Home events + Away events) / 2

Carrying out the same computation for all the other score situations, we obtain the following set of score-adjustment coefficients:

Home Lead       Home Event Weight    Away Event Weight
-3 or worse                 0.850                1.214
-2                          0.882                1.154
-1                          0.915                1.103
 0                          0.970                1.032
+1                          1.026                0.975
+2                          1.074                0.936
+3 or better                1.132                0.895
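These coefficients can be reproduced directly from the raw counts; the sketch below does exactly that (the dictionary is transcribed from the 2007-2014 table above, keyed by home lead, with -3 and +3 standing in for the catchall bins).

```python
# Raw 5v5 Fenwick events, 2007-2014, keyed by home lead: (home events, away events)
raw_counts = {
    -3: (9_361, 6_556),      # -3 or worse
    -2: (19_420, 14_848),
    -1: (51_921, 43_075),
     0: (116_607, 109_675),
     1: (52_382, 55_137),
     2: (19_727, 22_648),
     3: (9_403, 11_890),     # +3 or better
}

# coefficient * (events for that team) = average events for the two teams
coefficients = {}
for lead, (home, away) in raw_counts.items():
    average = (home + away) / 2
    coefficients[lead] = (average / home, average / away)  # (home weight, away weight)

for lead in sorted(coefficients):
    home_w, away_w = coefficients[lead]
    print(f"home lead {lead:+d}: home weight {home_w:.3f}, away weight {away_w:.3f}")
```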

Notice that there is no need for any measurement of times. This makes it possible to compute the score-adjusted Fenwick of any set of games, no matter which score situations happen to occur in it and for how long. Notice also that the adjustment coefficients for -3/+3 are substantially different from those for -2/+2, validating our earlier concern about lumping all score differences of two or more into one bin. Even with seven years' data, however, there are hardly any events at a score difference of 4, so I decided to stop at 3.
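Applying the method to a sample is then a single weighted sum per team. A minimal sketch, assuming you have already tallied a team's Fenwick events by home lead (the function and variable names here are hypothetical):

```python
def adjusted_total(counts_by_home_lead, shooter_is_home, coefficients):
    """Sum of raw 5v5 Fenwick events, each weighted by its score-adjustment
    coefficient.  counts_by_home_lead maps home lead (-3..+3) to raw counts;
    shooter_is_home says whether the shooting team was the home team."""
    column = 0 if shooter_is_home else 1
    return sum(coefficients[lead][column] * n
               for lead, n in counts_by_home_lead.items())

# For a team's score-adjusted Fenwick percentage over any sample: sum the
# adjusted events for and against over all of its games (home and away), then
#     adjusted_fenwick_pct = adj_for / (adj_for + adj_against)
```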

Of course, if one prefers not to adjust for home/away, suitable score-only adjustment coefficients can also be computed from the above table.
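One plausible way to do this (my reading; the pooling is not spelled out above) is to group events by the shooting team's own lead, ignoring venue, and apply the same ratio, continuing from the raw_counts dictionary above:

```python
# Events taken by a team leading by k, pooling home and away situations:
# home teams contribute their events at home lead k, away teams at home lead -k.
events_at_own_lead = {}
for home_lead, (home_events, away_events) in raw_counts.items():
    events_at_own_lead[home_lead] = events_at_own_lead.get(home_lead, 0) + home_events
    events_at_own_lead[-home_lead] = events_at_own_lead.get(-home_lead, 0) + away_events

venue_neutral = {}
for lead, own in events_at_own_lead.items():
    opposite = events_at_own_lead[-lead]          # events taken by the opponents
    venue_neutral[lead] = ((own + opposite) / 2) / own

for lead in sorted(venue_neutral):
    print(f"team lead {lead:+d}: weight {venue_neutral[lead]:.3f}")
```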

Results

Every ‘new’ stat must be justified compared to the existing stats which purport to accomplish the same purpose. In this case, the comparison is obvious; namely, we must compare the method here for computing score-adjusted Fenwick with Tulsky’s method. For good measure, I have also included Fenwick-close, which I hold in very low regard, for the express purpose of showing the world just how bad it is.

We'll look at two different sorts of tests: first, a self-consistency test of repeatability, and, second, a prediction test to see how well possession predicts goal percentage and win percentage.

First, let’s consider an easy test: within a given season, choose forty games at random, divide them into two groups of twenty, and compare the possession metrics in the two groups. Repeated a thousand times per season, this gives:

Possession Metric                        Auto-determination (R^2)
Fenwick Close                            0.428
Score-adjusted Fenwick (Tulsky)          0.488
Score-adjusted Fenwick (This article)    0.530

All of these possession metrics correlate with themselves reasonably well, with the adjusted measures being more repeatable.
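For those who want to replicate the repeatability test, a sketch of the resampling loop might look like the following; the per-team games structure is an assumption of mine (forty games sampled per team, each game reduced to its adjusted Fenwick for and against totals), not taken from the original analysis.

```python
import random
import numpy as np

def split_half_r2(games_by_team, n_games=40, split=20, trials=1000, seed=0):
    """Auto-determination (R^2) of a possession metric with itself: sample
    n_games per team, split them into two groups, compute the metric in each
    group, and correlate the two columns across teams.

    games_by_team maps a team to a list of (adj_for, adj_against) per game."""
    rng = random.Random(seed)
    r_squared = []
    for _ in range(trials):
        xs, ys = [], []
        for team, games in games_by_team.items():
            sample = rng.sample(games, n_games)
            first, second = sample[:split], sample[split:]
            xs.append(sum(f for f, a in first) / sum(f + a for f, a in first))
            ys.append(sum(f for f, a in second) / sum(f + a for f, a in second))
        r = np.corrcoef(xs, ys)[0, 1]
        r_squared.append(r * r)
    return float(np.mean(r_squared))
```

The harder test described later in the article is the same loop with split=5, so that five games are used to predict the remaining thirty-five.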

More interestingly, how well do these possession metrics predict goal percentage? Again, we consider only 5v5 goals.

Possession Metric                        Determination of Goal Percentage (R^2)
Fenwick Close                            0.0857
Score-adjusted Fenwick (Tulsky)          0.108
Score-adjusted Fenwick (This article)    0.113

This is obviously harder, as we see with the reduced coefficients. The formula introduced here is slightly better than Tulsky’s formula, which is noticeably better than score-close.

Harder still, we see how possession in one set of twenty games predicts winning percentage in the other twenty, where we treat shootouts as ties:

Possession Metric                        Determination of Win Percentage (R^2)
Fenwick Close                            0.0388
Score-adjusted Fenwick (Tulsky)          0.0483
Score-adjusted Fenwick (This article)    0.0559

This is obviously hardest of all, but we see the same pattern of results.

Second, let's consider a much more difficult test: we again take samples of forty games from a given season, but this time we try to predict the results in thirty-five of the games using the other five. Intuitively we expect the correlations to be much lower across the board.

First, we examine the auto-correlations:

Possession Metric                        Auto-determination (R^2)
Fenwick Close                            0.242
Score-adjusted Fenwick (Tulsky)          0.249
Score-adjusted Fenwick (This article)    0.330

Now, the formula introduced in this article is noticeably better, while the existing formula is scarcely better than Fenwick close.

As for predicting goal percentage:

Possession Metric                        Determination of Goal Percentage (R^2)
Fenwick Close                            0.0605
Score-adjusted Fenwick (Tulsky)          0.0691
Score-adjusted Fenwick (This article)    0.0907

Again the new formula pulls away from the other two. Finally, predicting wins:

Possession Metric                        Determination of Winning Percentage (R^2)
Fenwick Close                            0.0320
Score-adjusted Fenwick (Tulsky)          0.0366
Score-adjusted Fenwick (This article)    0.0526

While none of these results are especially impressive, I include them here to show that, suitably read, the results of five randomly chosen games can tell you a lot more than nothing. Just to prove the point with a very silly example, let's try using the score-adjusted Fenwick from five randomly chosen games to predict the score-adjusted Fenwick for the other 77 games in an 82-game season: this gives an R^2 of 0.157 with a p-value less than 1E-250.