Corsi is the buzzword of the advanced hockey stats community, and for good reason. Corsi is a more descriptive statistic in smaller sample sizes, so it can give us more accurate information on performance before more traditional stats normalize. If you’re familiar with the current state of hockey analytics, feel free to scroll down a little bit.

In other words, we use corsi because goals (what we’re really after) are relatively infrequent events. A shot attempt is a positive outcome, and as long as what we are measuring is an innately positive outcome (something the team is trying to do and something the opposition is trying to prevent), we know there will be some value in it.

“A corsi attempt is an innately positive outcome.”

This is true, but it is only a *representative* positive outcome. A corsi attempt, in isolation, isn’t really valuable. This might be the point where you think, “well, corsi shows that you were in possession, and *that* is innately positive.” Well, it’s not possession, it *shows* possession. To be clear, I’m not saying corsi is not representative of possession (having access to actual possession data, I can say that it’s not too bad). What I’m saying is that corsi, in itself, is not possession.

Corsi lets us infer goals, and we use it instead of goals because it has more predictive power in smaller sample sizes. Since these ‘smaller sample sizes’ can be entire seasons, it has value above what goals-for% can offer us, because so much can change from season to season in auxiliary variables (quality of teammates, ravages of age, etc.).

The barrier between shot-based and goal-based statistics is shooting or save percentage. The problem with goals is that we don’t really get enough of them in a season to give us a confident sample size, and, since shooting and save percentage require goals, they have all the same problems.

There’s a solution to this problem. By using data that is available for every shot, we can circumvent this sample size issue, while helping to fix the central problem with corsi (every shot attempt is valued exactly the same). I used all of the data that is available for every shot in the RTSS era, which goes back to the 2007 NHL season, to try to predict the chances of that particular shot going in. By predicting the chances of any one shot going in, we don’t need to rely on the small sample size ‘true’ shooting percentage gives us. This is not an unfamiliar practice, and I myself have had some attempts at it in the past.

The goal of this analysis is to see how good of a model we can create to accurately reflect the ‘true’ chances of a shot being scored, using only the data the NHL provides.

Here are the variables that went into my model.

**Rebounds**

A rebound is defined as any shot that follows another shot by 3 seconds or less. Shots classified as rebounds go in 29.2% of the time.
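As a rough illustration of that definition, here is a minimal sketch of a rebound flag. It assumes the shot times for a single game are already in chronological order and expressed as elapsed game seconds; the function name is hypothetical, and it ignores which team took the earlier shot and any period breaks, since the definition above doesn’t say how those are handled.

```python
REBOUND_WINDOW_SECONDS = 3  # "3 seconds or less", per the definition above


def flag_rebounds(shot_times, window=REBOUND_WINDOW_SECONDS):
    """Return one boolean per shot: True if it came within `window`
    seconds of the previous shot in the same game.

    shot_times: elapsed game seconds for each shot, in chronological order
    (an assumed input format, not the NHL's actual file layout).
    """
    flags = []
    previous_time = None
    for time in shot_times:
        is_rebound = previous_time is not None and time - previous_time <= window
        flags.append(is_rebound)
        previous_time = time
    return flags


# Invented times: the shots at 12s and 43s follow another shot within 3 seconds.
example_flags = flag_rebounds([10, 12, 40, 43, 47])
```

The first shot of a game can never be a rebound here, because there is no earlier shot to compare against.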

**Distance of shot**

The record keeping of the distance from net is much derided by the analytical community, and for good reason, but there’s no question it has *some* predictive power.

**Post-faceoff**

Shots immediately following a faceoff were about 1.5% less likely to go in.

**Shot type**

Another source of poor data from the NHL, but hey, if it has predictive value, it gets put in the model.

**Win Probability**

Analysts often use the term ‘score effects’ to describe the phenomenon of leading teams taking fewer shots with more shot/goal success while trailing teams take more shots with less success, but it’s actually a little more complex than that: the score affects shooting% in different ways based on the time remaining in the game. To account for this, time remaining is used in tandem with the score state in the model.

**Strength**

Strength can seriously affect shooting%, especially when killing a 5-on-3 penalty (?)

I put all of the data into a logistic regression, which estimates the probability of a binary outcome occurring (whether or not the shot resulted in a goal), based on 487,663 shots in the RTSS era. With this we get an ‘expected shooting%’.
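For readers who haven’t seen a logistic regression before, here is a minimal sketch of the idea in plain Python. It is not the actual model: the two features, the toy data, the function names and the gradient-descent fitting are all assumptions made for illustration, and a real fit on 487,663 shots would use a statistics package instead.

```python
import math


def predict(weights, features):
    """Probability of a goal: sigmoid of the bias plus the weighted features."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, goals, learning_rate=0.1, epochs=2000):
    """Fit [bias, w1, w2, ...] by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, goal in zip(rows, goals):
            error = predict(weights, features) - goal
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


def expected_shooting_pct(weights, shots):
    """Average predicted goal probability over a set of shots."""
    return sum(predict(weights, s) for s in shots) / len(shots)


# Invented toy data: [distance in tens of feet, is_rebound], then goal (1) or not (0).
toy_shots = [[0.8, 1], [1.0, 0], [1.2, 0], [1.5, 1], [2.0, 0],
             [2.5, 0], [3.0, 0], [3.5, 1], [4.0, 0], [5.0, 0]]
toy_goals = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

weights = fit_logistic(toy_shots, toy_goals)
close_rebound = predict(weights, [1.0, 1])  # 10 feet out, off a rebound
long_shot = predict(weights, [4.5, 0])      # 45 feet out, not a rebound
toy_expected = expected_shooting_pct(weights, toy_shots)
```

Averaging the per-shot probabilities over all of a player’s shots is one natural way to arrive at an ‘expected shooting%’ for that player.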

Here’s what our results look like. This shows every player with over 500 shots taken, and their ‘expected sh%’ based on the model, and their actual shooting%.

That line gives us an r-squared of .973, with a standard error of .017, which means that on average, a shooter’s actual shooting% is about 1.7 percentage points away from his expected shooting%.
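For anyone who wants to run the same kind of check on their own numbers, here is a minimal sketch of how an r-squared and a standard error can be computed for a straight-line fit of actual against expected shooting%. The function name and the sample values are invented, and treating ‘standard error’ as the residual standard error of the fit is my assumption about what a typical regression output reports.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, standard_error), where
    standard_error is the residual standard error with n - 2 degrees
    of freedom.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - residual_ss / total_ss
    standard_error = math.sqrt(residual_ss / (n - 2))
    return slope, intercept, r_squared, standard_error


# Invented per-player values: expected sh% (x) and actual sh% (y).
expected = [0.06, 0.08, 0.09, 0.11, 0.13]
actual = [0.05, 0.09, 0.08, 0.12, 0.14]
slope, intercept, r_squared, standard_error = fit_line(expected, actual)
```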

**Conclusion**

This is probably about as good of a model as can be constructed using what we have to work with in the RTSS data, so I think we can draw some conclusions about the NHL shot data from it.

1. **A player’s shooting skill, above the variables described in the model, is worth about 1.6% per shot.** The best shooters (Tanguay, Marchand, STEVEN STAMKOS) can add 3-5% in shooting percentage above what the model inputs predict.

To describe this better, I charted the difference between real shooting% and model-predicted (expected) shooting%:

2. Among players who have played more than 5 seasons since 2007, the standard deviation of their season-to-season shooting% is 3.1%. For expected shooting%, the standard deviation is 0.9%. **Expected shooting% is a repeatable statistic at the season level, and it has more predictive power than raw shooting% season to season.** In 56.7% of cases, a player’s previous-season expected shooting% was closer to his current-season shooting% than his previous-season actual shooting% was. Pretty impressive when you consider that a player’s previous performance is completely removed from the predictive model.

3. This model does not perfectly estimate each individual shot’s chance of going in, but it’s close except for the most extreme of players. This model can be improved upon by adding inputs on the shooter’s and goaltender’s previous performance, but that would be cheating now, wouldn’t it?
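The season-to-season comparison in point 2 can be sketched as follows. The function name, the tuple layout and the sample values are invented for illustration, and counting ties as ‘not closer’ is my assumption.

```python
def share_expected_closer(player_seasons):
    """Fraction of cases where last season's expected sh% landed closer to
    this season's actual sh% than last season's actual sh% did.

    player_seasons: list of (prev_expected, prev_actual, current_actual)
    tuples, one per player-season pair (an assumed layout).
    """
    closer = 0
    for prev_expected, prev_actual, current_actual in player_seasons:
        expected_miss = abs(prev_expected - current_actual)
        actual_miss = abs(prev_actual - current_actual)
        if expected_miss < actual_miss:  # ties count as "not closer"
            closer += 1
    return closer / len(player_seasons)


# Invented sample: expected sh% is the better predictor in 2 of 4 cases.
sample = [
    (0.10, 0.15, 0.11),
    (0.09, 0.08, 0.07),
    (0.12, 0.05, 0.10),
    (0.08, 0.10, 0.10),
]
share = share_expected_closer(sample)
```

A share above one half means the expected number is the better predictor more often than not, which is the sense in which the 56.7% figure should be read.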

**So what does this all mean?**

I plan on releasing this statistic shortly as part of some statistic renovating going on here at Hockey Prospectus, and I think it will have a lot of value when interpreting shot quality quandaries like team PDO and the varying quality of shots faced by goaltenders. **With this number, we can adjust shot to goal ratios and attribute the rest to luck/shooting skill. I basically just hope it scrapes some of the luck off the top of shooting and save% statistics.**
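One simple way to picture that adjustment: add up the model’s probability for every shot to get an expected goal total, then compare it with the goals actually scored. This is only a sketch with a hypothetical function name and invented probabilities, not the statistic as it will be released.

```python
def goals_above_expected(shot_probabilities, goals_scored):
    """Return (expected_goals, residual).

    expected_goals is the sum of the per-shot goal probabilities; the
    residual is what is left over to attribute to luck or shooting skill.
    """
    expected_goals = sum(shot_probabilities)
    return expected_goals, goals_scored - expected_goals


# Invented example: four shots worth half a goal in total, one goal scored.
expected_goals, residual = goals_above_expected([0.05, 0.10, 0.30, 0.05], 1)
```

The same sum taken over the shots a goaltender faces gives an expected goals-against figure to set beside his actual goals against.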

To get back to corsi and what I started this article off with, there’s nothing innately special about ‘corsi’, or shot differential, that needs to be preserved for the sake of it. Speaking from a strictly analytical sense, I found 7 different variables that we *know* affect, in some way, the likelihood of a shot going in. Why ignore these things that would obviously improve the value of shot differential (by weighting them appropriately) if we don’t have to?
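To make the weighting idea concrete, here is a minimal sketch that compares a raw shot share (every attempt counts the same, as in corsi) with a share where each attempt is weighted by its modeled chance of going in. The function names and the probabilities are invented for illustration.

```python
def raw_shot_share(probabilities_for, probabilities_against):
    """Share of shot attempts taken by the team, every attempt counted as 1."""
    shots_for = len(probabilities_for)
    shots_against = len(probabilities_against)
    return shots_for / (shots_for + shots_against)


def weighted_shot_share(probabilities_for, probabilities_against):
    """Share of shot attempts, each weighted by its goal probability."""
    weighted_for = sum(probabilities_for)
    weighted_against = sum(probabilities_against)
    return weighted_for / (weighted_for + weighted_against)


# Invented example: four low-quality attempts for, two dangerous ones against.
attempts_for = [0.05, 0.05, 0.05, 0.05]
attempts_against = [0.30, 0.10]

raw = raw_shot_share(attempts_for, attempts_against)
weighted = weighted_shot_share(attempts_for, attempts_against)
```

In this made-up case the team wins the raw shot count two to one but loses the weighted version by the same margin, which is exactly the kind of difference an unweighted shot differential can’t see.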

Thanks, I enjoyed the article and I’m looking forward to the additions.

It was very interesting to me to see the things many coaches are preaching vindicated by the numbers. Specifically things related to the cliches like “traffic to the net” (tip ins, deflections) and shooting for rebounds.

I’m inclined to believe that the 3v5 sh% is mostly due to the low number of occurrences and the “shot quality”, because generally these chances would be breakaways or odd-man rushes when the offensive team is collapsing towards the net and a pass is intercepted.

Matt – I really liked the article. Was looking at Zach Bogosian and his actual shot % for 2013-14 was listed as 5% in your database (or rather, on his chart) but I am seeing it as 2.2% (3 goals in 134 shots) in other data, so you might want to check that.

Drew Stafford’s actual shot %s don’t look right either.

Great work. As others have already stated, a lot of your actual sh% are off. One other thing to consider is that maybe you should not include power play shots, or at least separate them from even strength shots, on the basis that the quality and quantity of power play shots is much more dependent on the opportunities you are being given than on any skill. An obvious example of this would be Wayne Simmonds. In his three seasons in LA he hardly played on the PP and his median model predicted sh% was 10%. In his three seasons in PHI he’s played a ton on the PP standing directly in front of the net and his median model predicted sh% is 14%. I seriously doubt that jump was about him becoming so much better at creating scoring chances; more likely it was about his newfound role as the net-front presence on one of the better PPs in the league. The quality of shots you attempt at even strength is much more skill related than the quality/quantity of shots you take on the PP, which are more role/opportunity related. I bet your model predicted sh% would be even more consistent year to year if you looked only at even strength shots.
