Burtch: Examining scoring chance data

Stephen Burtch is a Hockey Analytics writer for Sportsnet.ca and Pension Plan Puppets. Follow him on Twitter @SteveBurtch 

Over the past few weeks the statistical website War-On-Ice has been rolling out a new Scoring Chance Metric.  Culled from the NHL’s play-by-play data sets, the counts are based upon a standardized definition that is typically lacking when Scoring Chances are discussed in league circles.

The definition is best left to the originators of the measure so I quote them here:

[Figure: War-On-Ice danger zones]

“So based on these measures, the average probability of a goal given the type and locations, and the consideration of team defense, we have these conditions for a “scoring chance”:

  • In the low danger zone, unblocked rebounds¥ and rush shotsƚ only
  • In the medium danger zone, all unblocked shots.
  • In the high danger zone, all shot attempts (since blocked shots taken here may be more representative of more “wide-open nets”, though we don’t know this for sure.)

¥Rebounds are defined as attempts within 3 seconds of a missed, blocked, or saved shot

ƚRush shots are defined as attempts within 4 seconds of any event in the shooting team’s defensive or neutral zones.”
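The quoted conditions can be written out as a small classifier. This is only a sketch of the rules as I read them, not War-On-Ice's code: the `Attempt` record and its field names are hypothetical, the time windows are treated as inclusive, and "unblocked" is assumed to apply to both rebounds and rush shots in the low danger zone.

```python
from dataclasses import dataclass
from typing import Optional

REBOUND_WINDOW_S = 3  # seconds, from the quoted rebound definition
RUSH_WINDOW_S = 4     # seconds, from the quoted rush-shot definition


@dataclass
class Attempt:
    """One shot attempt; field names are hypothetical, for illustration only."""
    danger_zone: str  # "low", "medium" or "high"
    blocked: bool
    # Seconds since the previous missed, blocked or saved shot (None if there was none).
    secs_since_prior_shot: Optional[float] = None
    # Seconds since the last event in the shooting team's defensive or neutral zone.
    secs_since_dz_nz_event: Optional[float] = None


def is_rebound(a: Attempt) -> bool:
    # Assumption: "within 3 seconds" is inclusive.
    return a.secs_since_prior_shot is not None and a.secs_since_prior_shot <= REBOUND_WINDOW_S


def is_rush(a: Attempt) -> bool:
    # Assumption: "within 4 seconds" is inclusive.
    return a.secs_since_dz_nz_event is not None and a.secs_since_dz_nz_event <= RUSH_WINDOW_S


def is_scoring_chance(a: Attempt) -> bool:
    if a.danger_zone == "high":
        return True  # all shot attempts count, blocked or not
    if a.danger_zone == "medium":
        return not a.blocked  # all unblocked shots
    if a.danger_zone == "low":
        # Assumption: the attempt must be unblocked AND a rebound or rush shot.
        return not a.blocked and (is_rebound(a) or is_rush(a))
    raise ValueError(f"unknown danger zone: {a.danger_zone!r}")
```

For instance, `is_scoring_chance(Attempt("low", blocked=False, secs_since_prior_shot=2))` counts as a chance under these assumptions, while the same attempt with no rebound or rush timing does not.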

In essence, the measure attempts to identify higher-percentage attempts: those with an increased likelihood of resulting in a goal.  We can therefore surmise that attempts meeting these conditions are of greater value, and this has been borne out both in year-to-year relationships for individual skaters and in the description of past results.

All-Situation Scoring Chance For Percentage (SCF%) correlates more strongly with past Goals For percentage (GF%) at the team level (r² = 0.3805) than Unadjusted Fenwick For percentage (FF%) does (r² = 0.3658), but less strongly than any of the Score Adjusted metrics available.  SCF% also has lower repeatability at the team level than CF% and FF%, likely due to the relative rarity of the events being recorded: smaller sample sizes lead to increased variance.  There is, however, a slight improvement on the repeatability obtained from regular SF% alone.

Seasonal Team Statistic   Situation   Correlation to Season GF% (r²)   Repeatability Year to Year (Autocorrelation r²)
Score Adj. CF%            All         44.38%                           28.21%
Score Adj. FF%            All         44.08%                           26.32%
Score Adj. SF%            All         42.54%                           25.20%
SCF%                      All         38.05%                           23.30%
FF%                       All         36.58%                           23.42%
CF%                       All         35.55%                           25.48%
SF%                       All         34.77%                           23.02%
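For readers who want to reproduce figures of this kind, the sketch below shows one way the two columns could be computed: the squared Pearson correlation between a team-level percentage and GF% in the same season, and the same statistic between a metric in one season and that metric in the next. The helper names `share` and `r_squared` are my own, and the numbers in the usage note are toy values, not the data behind the table.

```python
def share(events_for: float, events_against: float) -> float:
    """A 'For percentage' such as SCF%, CF% or GF%: for / (for + against) * 100."""
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return 100.0 * events_for / total


def r_squared(xs, ys) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence with zero variance has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

With one SCF% and one GF% value per team-season, `r_squared(scf_pct, gf_pct)` gives the descriptive column. Pairing each team's SCF% in season N with its SCF% in season N+1 and calling `r_squared` on those pairs gives the repeatability column. As a toy check, `share(55, 45)` is 55.0.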

 

Sam Ventura of War-On-Ice has explored year-over-year correlations, finding that for forwards SCF% predicts future year GF% better than Score Adjusted CF% and FF% do.  For defenders, Score Adjusted FF% predicts future year GF% better than Score Adjusted CF% or SCF%.  This is likely due to the direct impact forwards seem to have on scoring attempts, while defenders tend to be put in a more passive position as far as offense is concerned.

One of the main objectives of all this is, of course, an analysis of in-season predictivity: we wish to know which of these statistics is best in-season at informing us about future outcomes.  Recall that my previous post on the predictivity of various shot attempt metrics found that Score Adjustment of Corsi and Fenwick substantially improved in-season predictive power over Score Close and Raw measures.

In a fashion similar to that undertaken by Micah-Blake McCurdy, I explored the in-season predictivity of 5v5 SCF% vs GF% and All Situation SCF% vs GF%.  The data I used runs from 2007-08 onward, excluding the lockout-shortened 2012-13 season and the current 2014-15 season.  Unfortunately, to this point the measures we have of Scoring Chances are NOT Score Adjusted.  Interestingly, the results are less powerful than we might hope based on the other information gleaned to date.

[Figure: in-season predictivity of SCF% for GF%, 5v5 and All Situations]

I focused my assessment on the first 41 games of the NHL season, because predictivity continues to decline in the latter half as the sample being predicted shrinks in size and displays increased variance.  It appears that the In-Season Predictivity of SCF% for GF% at both 5v5 and in All-Situations maxes out around the 20-25 game mark.  All-Situation SCF% reaches 19.2%, while at 5v5 it peaks at about 18.4%.  Neither value is particularly impressive in comparison to the predictive power we currently have in the form of Score Adjusted 5v5 CF% or FF%.
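A rough sketch of how such an in-season predictivity curve can be built is given below. It assumes predictivity at game n means the r² across teams between SCF% through the first n games and GF% over the remaining games; that is my reading of the approach, and the input layout and function names are hypothetical.

```python
def share(events_for, events_against):
    """For-percentage: for / (for + against) * 100."""
    return 100.0 * events_for / (events_for + events_against)


def r_squared(xs, ys):
    """Squared Pearson correlation; returns None if either side has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def in_season_predictivity(games_by_team, max_split=41):
    """Map each split point n to r^2 between past SCF% and future GF%.

    games_by_team: {team_season: [(scf, sca, gf, ga), ...]} with games in
    chronological order (a hypothetical layout chosen for this sketch).
    """
    curve = {}
    for n in range(1, max_split + 1):
        past_scf_pct, future_gf_pct = [], []
        for games in games_by_team.values():
            past, future = games[:n], games[n:]
            scf = sum(g[0] for g in past)
            sca = sum(g[1] for g in past)
            gf = sum(g[2] for g in future)
            ga = sum(g[3] for g in future)
            if scf + sca == 0 or gf + ga == 0:
                continue  # nothing to measure on one side of the split
            past_scf_pct.append(share(scf, sca))
            future_gf_pct.append(share(gf, ga))
        if len(past_scf_pct) >= 3:
            value = r_squared(past_scf_pct, future_gf_pct)
            if value is not None:
                curve[n] = value
    return curve
```

The peak of the returned curve is the "maxes out" point discussed above. Pooling several seasons just means using one key per team-season.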

As you can see in the following graph, McCurdy found that the predictivity of 5v5 Score Adjusted CF% reached a maximum of approximately 30% around the 20-25 game mark.  The substantially higher value is likely due to the fact that Corsi and Fenwick events accrue at a much faster rate than Scoring Chances, so we have that much more information about the teams we are assessing at an earlier stage.  In essence we are back to the sample size problem.

[Figure: in-season predictivity of 5v5 Score Adjusted CF% (McCurdy)]

So where does that leave things with respect to Scoring Chances as measured from NHL Play-By-Play data?  Their value in describing goal outcomes is clear in comparison to raw shot attempt counts.  That being said, we aren't accumulating data quickly enough to improve predictivity of future outcomes.  We also know that Unadjusted SCF% doesn't outperform Score Adjusted CF% or FF% at the moment in terms of describing past results either.

Given these results, it looks like next steps amount to exploring a weighted Shot Attempt model that includes some Score and Venue Adjustments.  Weighting attempts based upon their location of origin on the ice, and the time dynamics of the event (i.e. rebounds or rush attempts) should improve the detail of the information contained in each event.  By Score Adjusting and still including all events, we should accrue data at a high enough rate to theoretically improve upon the results we are seeing from Score Adjusted CF% or FF%.
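To make the idea concrete, here is a minimal sketch of what such a weighted shot attempt share might look like. Every number in it is a hypothetical placeholder rather than a fitted value, and the event fields and function names are invented for illustration. A real model would estimate the weights from data.

```python
# All numeric weights below are HYPOTHETICAL placeholders, not fitted values.
ZONE_WEIGHT = {"low": 0.5, "medium": 1.0, "high": 2.0}
REBOUND_MULTIPLIER = 1.5
RUSH_MULTIPLIER = 1.25


def score_venue_weight(score_diff, is_home):
    """Placeholder score and venue adjustment for the shooting team.

    score_diff is the shooting team's lead (negative when trailing).
    Trailing teams tend to generate more attempts, so their events are
    down-weighted here; home teams get a small discount as well.
    """
    clamped = max(-1, min(1, score_diff))
    weight = 1.0 + 0.1 * clamped
    if is_home:
        weight *= 0.97
    return weight


def event_weight(event):
    """Weight one attempt by location, time dynamics, score state and venue."""
    weight = ZONE_WEIGHT[event["zone"]]
    if event["rebound"]:
        weight *= REBOUND_MULTIPLIER
    if event["rush"]:
        weight *= RUSH_MULTIPLIER
    return weight * score_venue_weight(event["score_diff"], event["home"])


def weighted_share(events, team):
    """Weighted shot attempt For percentage for `team` over all events."""
    weighted_for = sum(event_weight(e) for e in events if e["team"] == team)
    weighted_against = sum(event_weight(e) for e in events if e["team"] != team)
    total = weighted_for + weighted_against
    if total == 0:
        raise ValueError("no events to weight")
    return 100.0 * weighted_for / total
```

Because every attempt contributes something, the sample grows as quickly as Corsi does, while the weights carry the extra detail that the scoring chance definition tries to capture.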

Progress is being made, but much work is left to be done.  It will be interesting to see where it takes us.