Stephen Burtch is a hockey analytics writer for Sportsnet.ca and Pension Plan Puppets. Follow him on Twitter @SteveBurtch.
In the push to obtain meaningful data that allows analysts to compare NHL teams early in the season, a great deal of effort has gone into identifying the ideal game situations to use for prediction. Past practice largely focused on comparing possession metrics, such as Fenwick or Corsi, at 5v5, to strip out the obvious effects of power plays and shorthanded situations on a team’s shot attempt metrics.
As score effects were identified, various analyses posted in 2011 by the late Tore Purdy (aka JLikens of ObjectiveNHL) showed that the predictive power of Corsi and Fenwick in tied and/or close situations exceeded that of standard 5v5 Corsi and Fenwick, which include all score situations. A regular concern raised during this analysis was the small sample available at 5v5 tied, particularly for in-season projections. It thus became de rigueur to consider 5v5 “close” situations instead. Score Close was defined as any time the score was within one goal in the first two periods or tied in the third period. The idea was to limit the time-related nature of score effects: trailing teams tend to push play offensively and leading teams tend to sit back defensively, particularly in the third period, as the desperation to achieve a tie or protect a lead increases.
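The Score Close definition above can be sketched as a simple filter. This is a minimal illustration, not code from any of the cited analyses: the function name, the `(period, score_diff)` record layout, and the sample records are all hypothetical, and treating any period after the second as "tied only" is an assumption.

```python
def is_score_close(period: int, score_diff: int) -> bool:
    """Return True if a game state counts as "score close".

    period:     1, 2 or 3 (anything after the second period is treated
                like the third here, which is an assumption)
    score_diff: goals for minus goals against, from either team's view
    """
    if period <= 2:
        # First two periods: within one goal either way.
        return abs(score_diff) <= 1
    # Third period: tied only.
    return score_diff == 0


# Hypothetical shot-attempt records: (period, score differential).
attempts = [(1, 0), (2, -1), (2, 2), (3, 1), (3, 0)]

# Keep only the attempts taken in score-close situations.
close_attempts = [a for a in attempts if is_score_close(*a)]
```

Note that the one-goal third-period record `(3, 1)` is dropped by this filter, even though it would have counted as close in either of the first two periods.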
The logic behind ignoring score effects is similar to that which led to the removal of power play and shorthanded time on ice for the sake of comparison: it is an attempt to remove skewed information from the data set. Because the logic seemed self-explanatory, the practice caught on quickly in analytical circles and was rarely questioned or debated, despite the fact that Purdy attached caveats to his work along the following lines:
“The differences between the values here are small, and we only have three seasons of data. It may very well be that all three variables correlate equally well with goal ratio over the long run. This subject may require further study in the future when more data is available.”
– JLikens, ObjectiveNHL, Feb 16th 2011
If the goal is to compare teams and individual skaters in the most “controlled” and least variable context possible, the restriction to 5v5 close statistics makes a modicum of sense, although if that is the objective, using 5v5 tied data frankly makes more sense. Typically, though, the main objective of analyzing underlying statistics is to predict and project future outcomes. The perception that 5v5 score close analysis identifies the true underlying skill of a team may well be accurate, but the information lost in the process has value and meaning. We are restricting our sample of data far more than necessary.
Another method of accounting for score effects was first proposed in April of 2010 by Gabe Desjardins and then analyzed in more detail by Eric Tulsky in January of 2012. Rather than ignoring situations in which teams trail or lead by more than one goal, the proposal was to adjust the Fenwick percentages in each score state to account for so-called “score effects”, which would retain the full sample of 5v5 data.
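One simple way to picture such an adjustment is to compare a team’s Fenwick share in each score state against the league-average share for that state, then weight the differences by how many shot attempts occurred in each state. The sketch below takes that approach; it is not necessarily the exact formula used by Desjardins or Tulsky. The function name, the input layout, and especially the league-average values are hypothetical placeholders invented for illustration, not measured data.

```python
# Hypothetical league-average Fenwick share by score state. These numbers
# are invented for illustration only. Keys are the team's goal
# differential, with -2 meaning "trailing by 2 or more" and +2 meaning
# "leading by 2 or more".
LEAGUE_AVG_FF_PCT = {-2: 0.56, -1: 0.54, 0: 0.50, 1: 0.46, 2: 0.44}


def score_adjusted_fenwick(counts: dict) -> float:
    """Return a score-adjusted Fenwick share between 0 and 1.

    counts maps a score state (-2..+2) to a tuple of
    (unblocked shot attempts for, unblocked shot attempts against).
    """
    total_attempts = 0
    weighted_diff = 0.0
    for state, (ff, fa) in counts.items():
        n = ff + fa
        if n == 0:
            continue  # no attempts in this state, nothing to weight
        # How far above or below the league norm the team was in this
        # state, weighted by the number of attempts seen in that state.
        weighted_diff += n * (ff / n - LEAGUE_AVG_FF_PCT[state])
        total_attempts += n
    if total_attempts == 0:
        raise ValueError("no shot attempts in any score state")
    # Re-centre on 50% so the result reads like an ordinary Fenwick share.
    return 0.5 + weighted_diff / total_attempts


# A hypothetical team that spent all of its time leading by one goal
# and split unblocked attempts evenly.
leading_team = score_adjusted_fenwick({1: (50, 50)})
```

Under these placeholder averages, a team that breaks even on attempts while protecting a one-goal lead is credited with more than a 50% share, because the typical leading team is assumed to be outshot. Every 5v5 attempt contributes to the result, which is the point of adjusting rather than filtering.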
Recently, a more comprehensive dissection of score adjustments, venue adjustments, and the impact of game time on score effects was performed by Micah Blake McCurdy at SensStats.com and Hockey-Graphs.com. It was driven by recent work by Fangda Li suggesting that score effects have very little impact on observed shot differentials prior to the 3rd period. It should be noted that Desjardins did similar work back in April of 2010, though he came to a different conclusion.
In the end, it appears that score and venue adjustments at 5v5 capture enough important information that they drastically improve on raw Corsi or Fenwick when it comes to predicting future goals and wins (by approximately 5% around the 20 game mark). Time impact adjustments, on the other hand, do not seem to add any predictive power, most likely because score effects are largely collinear with time effects, which washes out any added value.
With the added data from the past 7 years of play, McCurdy showed that 5v5 close is actually WORSE than raw 5v5 Corsi or Fenwick at predicting future outcomes in terms of goals or winning percentage. This is the kind of finding Purdy anticipated when he noted that further study would be worthwhile once more data became available. It appears that 5v5 close is not necessarily the valuable asset we have long thought it to be.
While the time effects don’t seem to add value for prediction, they do a good job of indicating the issue with 5v5 close, which is summarized succinctly by McCurdy here:
“One of the central flaws in ‘close’ measures is that they ignore the -1/+1 states in the third period, and playing well or playing poorly in these states makes an enormous difference in one’s ability to win. There are many other (smaller) conceptual flaws, but the evidence is clear: ‘close’ possession measures are misguided and must be done away with.”
– Micah Blake McCurdy, Hockey-Graphs.com, Nov. 13th 2014
It is unlikely that 5v5 close will be wiped from the hockey analytics lexicon anytime soon. Multiple sites and end users reference it with regularity, and future research may again shift thought in favour of its use. Whichever way it goes, though, the development of an improved array of adjusted measures that account for various contextual factors in our models, rather than methods that subtract useful data, will likely improve our holistic understanding of the game.