Matt Cane is a contributor to Hockey Prospectus. Follow him on Twitter at @Cane_Matt.
This post expands on work originally presented at the Ottawa Hockey Analytics Conference. If you’re interested in seeing the original slides for this or any of the other talks, they are available here.
Across the hockey world, criticisms of Corsi are not difficult to find. “It makes good players on bad teams look worse than they actually are”, some have argued, in spite of the fact that we’ve had Corsi Rel for ages now. “It doesn’t take into account which players are playing the tough minutes” others have said, although both Usage Adjusted Corsi and dCorsi exist to address exactly these points. “It treats all shot attempts the same” a further group of voices yelled, and it’s on this point that the stats world has long lacked a good rebuttal. While Fenwick was created to address the fact that blocked shots have been shown to be a repeatable skill, the way Fenwick adjusted for blocked shots (by removing them entirely) was perhaps too blunt. Most traditional possession based analysis has not been concerned, for better or for worse, with the information that goal scoring tells us. Shot attempts have often been viewed in an all-or-nothing sense: you either include a given type or not, and there was no discussion of what each shot attempt type was worth relative to the others.
Weighted Shots, first proposed by Tom Tango, attempted to tackle this problem by changing the value given to each Corsi component (goals, saved shots, misses and blocks). Tango’s weights, which he determined using a multiple linear regression that predicted a half-season’s Goal Differential using the shot attempt data from the other half, weighted goals 5 times higher than non-goal shot attempts (goals = 1, saved shots/misses/blocks = 0.2). Tango’s work showed that weighting each event equally didn’t maximize Corsi’s predictive power, and that by changing the weights assigned to each shot attempt we’re able to make a better estimate of a team’s underlying talent. And while Tango’s original analysis looked at pure/unadjusted Corsi and Weighted Shots, when we apply the score and venue adjustment methodology outlined by Micah Blake McCurdy, we see that Score Adjusted Weighted Shots (or SAwSH) are also a better predictor of future GF% than Score Adjusted Corsi at the team level.
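To make the weighting concrete, here is a minimal Python sketch of Tango's team-level Weighted Shots next to plain Corsi. The weights are the ones quoted above. The function names and the team totals are made up for illustration, and no score or venue adjustment is applied.

```python
# Tango's team-level weights as quoted above: goals count 1.0,
# every non-goal shot attempt (saved, missed, blocked) counts 0.2.
TANGO_WEIGHTS = {"goals": 1.0, "saved": 0.2, "missed": 0.2, "blocked": 0.2}


def weighted_shots(counts, weights=TANGO_WEIGHTS):
    """Sum shot-attempt counts after multiplying each type by its weight."""
    return sum(weights[event] * counts.get(event, 0) for event in weights)


def corsi(counts):
    """Unweighted Corsi: every shot attempt, goals included, counts once."""
    return sum(counts.get(event, 0) for event in TANGO_WEIGHTS)


# Hypothetical team totals, for illustration only.
team_for = {"goals": 3, "saved": 27, "missed": 12, "blocked": 14}
team_against = {"goals": 2, "saved": 30, "missed": 15, "blocked": 16}

wsh_for = weighted_shots(team_for)
wsh_against = weighted_shots(team_against)

# Share of weighted shots and share of raw shot attempts for the same team.
wsh_pct = wsh_for / (wsh_for + wsh_against)
corsi_pct = corsi(team_for) / (corsi(team_for) + corsi(team_against))
print(f"wSH%: {wsh_pct:.3f}  Corsi%: {corsi_pct:.3f}")
```

In this made-up example the team is outshot, but its extra goal pulls its weighted share closer to even than its Corsi share, which is the kind of information unweighted Corsi leaves on the table.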
The natural extension to all this is applying the idea of weighted shots to individual players. But we don’t want to simply apply the same weights that we did at the team level to individual results – at the team level we’re likely to see less variance in shooting/finishing ability than at the individual level, meaning the predictive value of a single goal is likely to be different for a single player than it is for a team. On top of that, we also have more granular information available at the individual level: we know who took each shot attempt for the offensive team, and for goals we know which attacking players were the last to touch the puck before the goal scorer did. All this is to say that we have a lot more information to work with at the individual level, and this data should allow us to better identify persistent and valuable individual abilities.
There are a few other ways that we’ll differ from Tango’s original approach in our derivation of what I’ll call Individual SAwSH. First, we’re going to split up offensive and defensive play: this is partly out of necessity, since we can’t assign goals against to an individual player (or at least an individual player not named Andrew MacDonald), but also because we want to be able to highlight which end of the rink a player is making most of his contributions in. Second, we’ll look at forwards and defencemen separately, as the factors that are predictive of future success for a center or winger are likely to be different than those that are most important for a blueliner.
We’ll generate our individual level weights by using a linear regression to predict Goals For or Against Per 60 in Odd/Even numbered games using our observed variables in the Even/Odd numbered games.
Odd/Even GF60 ~ Even/Odd iGF60 + Even/Odd TMGF60 + Even/Odd iSAFS60 + … + Even/Odd A260
Odd/Even GA60 ~ Even/Odd GA60 + Even/Odd SAAS60 + Even/Odd SAAM60 + Even/Odd SAAB60
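Here is a minimal sketch of what one direction of this split-half regression might look like in Python with NumPy. Everything in it is an assumption for illustration: the data are synthetic, only three predictors are included, the "true" weights are invented purely to generate a fake target, and the intercept is our own choice, since nothing above says whether one was fitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players = 500

# Synthetic stand-ins for even-game predictor rates (per 60 minutes).
# Only three illustrative columns, not the full variable list.
columns = ["iGF60", "iSAFS60", "A160"]
X_even = rng.normal(
    loc=[0.8, 6.0, 0.7], scale=[0.3, 1.5, 0.3], size=(n_players, 3)
)

# Invented weights, used only to generate a fake odd-game target.
true_w = np.array([0.35, 0.02, 0.30])
gf60_odd = 1.0 + X_even @ true_w + rng.normal(scale=0.05, size=n_players)

# Ordinary least squares: odd-game GF60 regressed on even-game predictors.
# The leading column of ones fits an intercept (an assumption of this sketch).
design = np.column_stack([np.ones(n_players), X_even])
coef, *_ = np.linalg.lstsq(design, gf60_odd, rcond=None)

weights = dict(zip(columns, coef[1:]))
for name, value in weights.items():
    print(f"{name}: {value:.3f}")
```

The other direction (even-game GF60 predicted from odd-game predictors) would be fitted the same way with the halves swapped.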
Each variable preceded by i is an individual rate, while each variable preceded by TM is the rate for the player’s teammates while he was on the ice. SAFS and SAAS are Shot Attempts For Saved and Shot Attempts Against Saved, respectively (i.e. shots on goal that didn’t go in, similarly SAFM are misses and SAFB are blocks). A full list of variables used in the regression is given in the results sections that follow. All of our predictor variables are adjusted for both Score Differential and Game Location (Home/Away) using the weights in the table below.
Each of the weights that we’ll generate can be used either on a player’s rate stats or their raw stats, and should be multiplied by the appropriate score/location factor from the table above. So which factors matter most when we run our regression? Let’s take a look.
Forwards – Offence
|Shot Attempts For Saved||0.022||0.036|
|Shot Attempts For Missed||0.052||0.037|
|Shot Attempts For Blocked||0.036||0.029|
*Teammate goals exclude goals which the player assisted on.
**All weights are before score adjustment.
SAwSHF60 = 0.346 * iGF60 + 0.022 * iSAFS60 + … + 0.337 * A160 + 0.260 * A260 + 0.100 * TMGF60 + …
As we’d expect, a forward’s individual metrics appear to be a better predictor of future success than those his teammates rack up while he’s on the ice. It’s comforting to see that goals are again much more important than non-goal shot attempts, with an individual goal being worth anywhere from 6 to 15 times more than other shot attempts. Saved shots, misses and blocks are all in the same ballpark, although it’s interesting to note that misses receive a higher weight than either saved shots or blocks (this may be a bit of recording bias seeping in, where all saved shots are captured but only dangerous misses are written down by the scorer).
What’s also notable is that assists, both primary and secondary, are key predictors of future offensive success. In particular, first assists are worth nearly as much as an individual goal for a forward, indicating that playmaking and scoring ability are both valuable talents for forwards.
Defencemen – Offence
|Shot Attempts For Saved||0.054||0.042|
|Shot Attempts For Missed||0.022||0.029|
|Shot Attempts For Blocked||0.022||0.041|
*Teammate goals exclude goals which the player assisted on.
**All weights are before score adjustment.
SAwSHF60 = 0.054 * iGF60 + 0.054 * iSAFS60 + … + 0.320 * A160 + 0.103 * A260 + 0.114 * TMGF60 + …
For defencemen, we see a different story as individual goal scoring is less critical than general shot generation (this is to be expected, as defencemen have less control over their own or their teammates’ shooting percentages). Once again, we see that primary assists are an important indicator, while secondary assists seem to show a lot more variance for defencemen, and don’t receive nearly the same weight they do for forwards.
|Event||Forwards||Defencemen|
|Goals Against||0.128||0.085|
|Shot Attempts Against Saved||0.043||0.045|
|Shot Attempts Against Missed||0.038||0.048|
|Shot Attempts Against Blocked||0.036||0.025|
*All weights are before score adjustment
SAwSHA60 (F) = 0.128 * GA60 + 0.043 * SAAS60 + 0.038 * SAAM60 + 0.036 * SAAB60
SAwSHA60 (D) = 0.085 * GA60 + 0.045 * SAAS60 + 0.048 * SAAM60 + 0.025 * SAAB60
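The two defensive formulas above are simple enough to apply directly. The sketch below copies the weights from them and runs both on the same made-up set of on-ice rates. The function name and the sample rates are hypothetical, and the rates are assumed to be score and venue adjusted already.

```python
# Defensive weights copied from the two formulas above (before score adjustment).
SAWSH_AGAINST_WEIGHTS = {
    "F": {"GA60": 0.128, "SAAS60": 0.043, "SAAM60": 0.038, "SAAB60": 0.036},
    "D": {"GA60": 0.085, "SAAS60": 0.045, "SAAM60": 0.048, "SAAB60": 0.025},
}


def sawsh_against_60(rates, position):
    """Weighted sum of on-ice rates against for a forward ("F") or defenceman ("D")."""
    weights = SAWSH_AGAINST_WEIGHTS[position]
    return sum(weights[name] * rates[name] for name in weights)


# Hypothetical on-ice rates against per 60 minutes, for illustration only.
rates = {"GA60": 2.0, "SAAS60": 25.0, "SAAM60": 10.0, "SAAB60": 12.0}

as_forward = sawsh_against_60(rates, "F")
as_defenceman = sawsh_against_60(rates, "D")
print(f"SAwSHA60 (F): {as_forward:.3f}")
print(f"SAwSHA60 (D): {as_defenceman:.3f}")
```

With identical inputs the forward comes out with the higher (worse) number, mostly because of the larger weight on goals against.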
What’s most interesting about the defensive weights is that goals against are much more predictive of future goals against for forwards than for defencemen. While this might seem counterintuitive at first (since we’d expect defencemen to have the largest impact in their own zone), it’s easier to understand if you think about a forward who completely abandons his defensive responsibility. A forward who has no interest in playing defence whatsoever will likely see more rush shots against when he’s on the ice (which have been shown to be more dangerous) or a greater number of odd-man situations in his own end. Defencemen, on the other hand, don’t really have the luxury of selecting whether to backcheck or not, and are dependent on the 3 players up front to ensure that they’re not stuck defending down a man. It’s this dependence that’s likely resulting in the weights we observe – defencemen just don’t have the same level of control over results that forwards do.
What’s also good to see in the defensive weights is that blocked shots are weighted less than all other shot attempts (and in particular, significantly less for defencemen). This difference allows us to effectively address the “Why does a blocked shot count as a negative” argument that’s often made about Corsi. While Corsi treats a shot attempt the same whether the defenceman is able to block it or not (providing little incentive to block the shot if a player were only concerned about their stats), SAwSH shows the clear benefit of blocking that shot. It’s still a negative event, as it’s indicative of the other team having the puck and being able to shoot towards your net, but it’s a less negative event than allowing the puck to go through for a miss/save/goal.
We know that SAwSH is a good predictor of in-season results because it’s designed to be – the weights we’ve chosen are those that maximise our ability to predict half a season of goal data using the metrics we gather from the other half. The question we need to ask ourselves then is whether our metric is still the best predictor year-over-year. After all, it may be that within season stats are more “sticky” and that the weights we’ve chosen don’t produce talent estimates that are stable between years.
To test this we can look at two primary outcomes: 1) how well our metric predicts itself in the future (the repeatability or alternatively, how sure we are that we’ve identified an individual talent); and 2) how well our metric predicts goals for/against in the future (the predictive ability). These two measures are related in that repeatability is required (although not sufficient) to have predictability – we need to be measuring something that’s an actual talent in order for it to be predictive, but the fact that we’re measuring a talent doesn’t necessarily mean that it’s a useful one.
To measure repeatability we’ll look at how well players’ SAwSH correlates with future SAwSH, using Score Adjusted Corsi as our baseline measure to benchmark against. We’ll also look several years out (i.e. predicting year Y+3 using year Y), as intra-team effects can persist over seasons, and looking solely at year Y to year Y+1 can produce deceiving results.
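Both tests boil down to a lagged correlation, which can be sketched as below. This is a plain Pearson correlation over every player-season that has a matching season `lag` years later. The data layout (a dict keyed by player and season), the function names and the sample values are assumptions for illustration, not the data behind the results that follow.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(metric, target, lag):
    """Correlate `metric` in season Y with `target` in season Y + lag.

    Both arguments map (player, season) -> value. Passing the same dict twice
    measures repeatability; passing future GF% as `target` measures
    predictive ability.
    """
    xs, ys = [], []
    for (player, season), value in metric.items():
        future = target.get((player, season + lag))
        if future is not None:
            xs.append(value)
            ys.append(future)
    return pearson(xs, ys)


# Hypothetical SAwSH differentials per 60, for illustration only.
sawsh = {
    ("A", 2009): 0.30, ("A", 2010): 0.25,
    ("B", 2009): -0.10, ("B", 2010): -0.05,
    ("C", 2009): 0.05, ("C", 2010): 0.10,
}
repeatability = lagged_correlation(sawsh, sawsh, lag=1)
print(f"Y to Y+1 repeatability: {repeatability:.3f}")
```

Running the same function with `lag=2` or `lag=3` gives the longer horizons, and swapping Score Adjusted Corsi in as `metric` gives the baseline to compare against.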
When we run the correlation data we see a few interesting things. First, SAwSH and SAC show at least the same level of repeatability in year-to-year measurements for both forwards and defencemen. This is important, as it means that SAwSH is at least as much of a talent as SAC is. Second, for forwards, SAwSH is a better predictor of itself when we look at timespans beyond 1 year. And lastly, for defencemen, SAwSH is an equally good predictor as SAC – we see no difference in our repeatability metrics at any predictive interval.
These are encouraging findings for SAwSH – we know that when we measure SAwSH at the single season level it allows us to forecast SAwSH in future years just as well as (or for forwards better than) SAC. The next question that we need to answer then is whether it remains the best predictor of future goals over longer timespans. To check that, we’ll perform the same exercise as we did above, except instead of checking each metric’s correlation with itself, we’ll see how well SAwSH correlates with future GF%, again using SAC as our baseline.
Two things stand out when we look at the results. First, for forwards, SAwSH is consistently a better predictor of future GF% than SAC. Second, for defencemen, SAwSH and SAC are equally good predictors. In a way though, this is what we’d expect for defencemen – while we did separate out individual and teammate shot attempts in our regression, in most cases defencemen take a very low percentage of the total shot attempts that occur when they’re on the ice. So while we have the ability to isolate an individual defenceman’s efforts, they’re often washed out by the play of his teammates. And when we throw in the fact that defencemen have little control over their own or their teammates’ shooting percentages, it’s easy to see why the results align so closely with Score Adjusted Corsi.
In addition to being a good predictor of future goal differential, one of the coolest things about SAwSH is that it’s naturally measured in goals. A player’s SAwSHF60 can be thought of as his expected on-ice goals for per 60, and if we take a player’s SAwSH differential per 60 and multiply it by his time-on-ice (in units of 60 minutes) we can calculate how many goals above or below average a player was worth when he was on the ice. It’s not quite a full-fledged WAR model, but it does present results in a way that can be interpreted easily even by those who don’t know the full details behind the calculation. A player with a +0.5 SAwSH/60 would be worth half a goal if he played the whole game, which is easier for most people to understand than saying a player was worth 3 extra shot attempts.
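The conversion from a per-60 differential to goals is just a rescaling by ice time, as in this short sketch. The function name and the 1,200-minute season are hypothetical; the +0.5 per 60 figure is the example from the paragraph above.

```python
def goals_above_average(sawsh_diff_per_60, toi_minutes):
    """Scale a SAwSH differential per 60 minutes by actual time on ice."""
    return sawsh_diff_per_60 * toi_minutes / 60.0


# The example from the text: +0.5 per 60 over a full 60-minute game.
full_game = goals_above_average(0.5, 60)

# The same rate over a hypothetical 1,200 minutes of ice time.
season = goals_above_average(0.5, 1200)

print(full_game, season)
```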
While SAwSH appears to be a better individual metric than SAC, it’s still not perfect, and it admittedly suffers from some of the same flaws that Corsi does. Although it’s more representative of individual talent, a player’s teammates still have a strong influence on his results, particularly on the defensive end. Fortunately, we can take a cue from the work that’s been done to refine Corsi in looking at how to address these issues – in the same way that we have Corsi Rel, we can compute a player’s SAwSH Rel (which may be particularly useful for defensive measurements), and we can also use SAwSH to calculate Quality of Teammate and Quality of Competition metrics.
Corsi is not a bad statistic by any means – it’s still a better predictor of future success than simple goal differential, and it allows us to make better player evaluations than if we simply focused on shooting percentages and intangibles as our primary talent indicators. It does, however, leave itself open to criticism because it treats every shot attempt the same. And that’s where a major benefit in using SAwSH lies – weighting shots makes sense intuitively, as we know that players can influence both their own and their teammates’ shooting percentages, and that we should be giving credit to players who block shots rather than let them go through. SAwSH allows us to do these things, without introducing the sample size issues that we see by filtering out events we know to be important. Using SAwSH isn’t an indictment of Corsi as a metric, but rather a vindication of the effectiveness of possession based approaches, and proof that they can form the basis for the next generation of hockey statistics.
Individual SAwSH data for 2008/2009-2013/2014 is available here.