As noted by others, the 2016-2017 NHL season is currently at the point where we have around 25 games of data for each team. This is about the time when shot attempt metrics such as Corsi carry the most predictive power, so it’s worth the time to take a peek and see what these metrics are telling us.
For fun, I decided to compare the predictive power of Corsi to that of scoring chances and of Don’t Tell Me About Heart’s expected goals model. He was kind enough to pull the data I requested, and I thank him immensely for that.
Using R, I ran a basic correlation function to construct a correlation matrix for Corsi, scoring chances, expected goals, and goals, splitting the data into two bins: “First 25 games” and “Rest of the season”.
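For anyone who wants to see the shape of that calculation, here is a minimal sketch in Python rather than R. The column names and every number in `team_data` are made up for illustration; the real analysis used one row per team per season, and `pearson` and `correlation_matrix` are hypothetical helper names, not anything from the original workflow.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {column name: list of values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up percentages for five hypothetical teams (NOT real data).
team_data = {
    "cf_first25": [52.1, 48.3, 50.7, 46.9, 53.4],
    "xgf_first25": [51.0, 49.2, 50.1, 47.5, 52.8],
    "gf_first25": [55.0, 47.0, 49.5, 50.2, 51.1],
    "gf_rest": [52.5, 48.8, 50.9, 47.7, 52.0],
}

matrix = correlation_matrix(team_data)
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:6.3f}" for value in row.values())
    print(f"{row_name:12s} {cells}")
```

The row to read for predictive power is `gf_rest`: it shows how strongly each first-25-game metric tracks goals for percentage over the rest of the season.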
Interestingly enough, future goals for percentage correlates the most with Corsi for percentage, with expected goals coming in a very close second. In-sample, Corsi and expected goals were essentially equally correlated with goals (Corsi had a higher correlation in the first 25 games, expected goals was higher for the rest of the season). The full table can be found here.
This basic analysis just kind of confirms what we already know, which is that Corsi and expected goals are the way to go when it comes to predicting the future; goals and scoring chances just don’t carry the predictive power that those two do. Also, we can see that Corsi stays fairly stable from the first 25 games to the rest of the season; the R^2 for the correlation between CF% for the first 25 games and CF% for the remaining games is 0.597, as compared to 0.385 for expected goals and 0.239 for scoring chances.
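Since the table reports correlations while the stability numbers above are R^2 values, it can help to put them on the same scale. For a simple two-variable comparison, R^2 is just the square of the correlation coefficient, so the quick sketch below takes square roots of the three figures quoted above. It assumes the underlying correlations are positive, which seems safe here but is my assumption.

```python
from math import sqrt

# R^2 between first-25-game and rest-of-season values, as quoted in the text.
reported_r2 = {"CF%": 0.597, "xGF%": 0.385, "SCF%": 0.239}

# Assumption: each underlying correlation is positive, so r = +sqrt(R^2).
implied_r = {metric: sqrt(r2) for metric, r2 in reported_r2.items()}

for metric, r in implied_r.items():
    print(f"{metric}: R^2 = {reported_r2[metric]:.3f}, implied r = {r:.3f}")
```

That works out to correlations of roughly 0.77, 0.62, and 0.49, so the gap between Corsi and the other two metrics looks smaller on the correlation scale than on the R^2 scale, though the ordering is the same.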
So not only is CF% the best predictive variable, it’s also the most consistent. Shot attempts are the best current statistic we have for analysis, mainly due to the rate at which they accumulate. No NHL team this season has scored more than 100 goals, while the team with the fewest shot attempts (the Colorado Avalanche) has attempted 935 shots at 5v5 alone. With special teams included, they’re well over 1000.
This goes a long way toward easing small-sample concerns, and does a decent enough job of reducing randomness in our measurements. Obviously Corsi isn’t a perfect metric, and the best predictions for future goals for percentage come from regressing Corsi 70 percent to the mean, but it’s the best we currently have.
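To make "regressing 70 percent to the mean" concrete, here is a small sketch. It assumes the usual reading of that phrase: keep 30 percent of a team's deviation from the league average and discard the rest. The league mean of 50 percent and the function name `regress_to_mean` are my assumptions for illustration.

```python
def regress_to_mean(observed, mean=50.0, regression=0.70):
    """Shrink an observed percentage toward the league mean.

    regression=0.70 keeps only 30% of the deviation from the mean.
    """
    return mean + (1.0 - regression) * (observed - mean)


# Example inputs are hypothetical CF% values, not real teams.
for cf_pct in (55.0, 50.0, 45.0):
    print(f"observed CF% {cf_pct:.1f} -> regressed {regress_to_mean(cf_pct):.2f}")
```

Under that reading, a team running a 55 percent Corsi would be projected at about 51.5 percent, which is one reason predictions built this way look conservative.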
With that in mind, I decided to construct a simple regression model to predict goals for percentage for the rest of the 2016-2017 season. I first constructed three models: one that used Goals For percentage, one that used Corsi For percentage, and one that used Corsi For percentage as well as PDO.
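As a refresher on the inputs, the sketch below computes the three quantities from raw counts, using the standard definitions: a "for percentage" is the team's share of the total, and PDO is shooting percentage plus save percentage. All counts are made up, the function names are hypothetical, and PDO is expressed here so that league average sits near 100 (some sites scale it to 1.000 or 1000 instead).

```python
def percentage_for(for_count, against_count):
    """Share of events (goals or shot attempts) belonging to the team, in percent."""
    return 100.0 * for_count / (for_count + against_count)


def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage plus save percentage, using shots on goal."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


# Hypothetical 5v5 counts for one team (NOT real data).
cf_pct = percentage_for(1200, 1100)  # shot attempts for / against
gf_pct = percentage_for(50, 45)      # goals for / against
team_pdo = pdo(goals_for=50, shots_for=600, goals_against=45, shots_against=580)

print(f"CF% = {cf_pct:.2f}, GF% = {gf_pct:.2f}, PDO = {team_pdo:.2f}")
```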
I then removed one season from the data set, fit the model on the remaining seasons, used it to predict Goals For percentage for the remainder of the held-out season, and compared the predictions to the results. I did this for every season from 2007 to the present.
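That hold-one-season-out procedure can be sketched as follows. This is a simplified illustration, not the actual model: it uses a single predictor (first-25-game CF%) fit by ordinary least squares, scores each held-out season by mean absolute error (my choice of error metric), and runs on made-up numbers with placeholder season labels.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_season_out(seasons):
    """Return {season: mean absolute error} when that season is held out.

    seasons maps a label to a list of (cf_first25, gf_rest) pairs, one per team.
    """
    errors = {}
    for held_out in seasons:
        # Train on every team-season except the held-out season.
        train = [row for s, rows in seasons.items() if s != held_out for row in rows]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        abs_errors = [abs(intercept + slope * x - y) for x, y in seasons[held_out]]
        errors[held_out] = sum(abs_errors) / len(abs_errors)
    return errors


# Made-up (CF% first 25 games, GF% rest of season) pairs; labels are placeholders.
seasons = {
    "season_a": [(52.0, 51.5), (47.5, 48.0), (50.5, 50.0)],
    "season_b": [(53.0, 52.0), (46.0, 47.5), (49.5, 50.5)],
    "season_c": [(51.0, 50.5), (48.0, 49.0), (54.0, 51.5)],
}

for season, mae in leave_one_season_out(seasons).items():
    print(f"{season}: mean absolute error = {mae:.3f} percentage points")
```

Comparing candidate models then comes down to running the same loop for each one and seeing which has the lowest out-of-sample error across seasons.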
The most accurate model was the third, which used Corsi For percentage as well as PDO. To guard against over-fitting, I tested each variable for significance, and both returned p-values below .01. I concluded that both were statistically significant, and kept both in the model.
Here are the predicted GF% values for each NHL team for the remainder of their season, sorted from highest to lowest (data up to 12/4).
The model seems to be very conservative, but for the most part, these results make sense. Some interesting observations:
- Why hello there, Nashville. Good to see you’ve recovered from your early season struggles.
- Are the Blue Jackets for real? I’ve been pretty skeptical of their run recently without really checking out their numbers, but it seems like Tortorella has them playing some pretty good hockey right now.
- Edmonton in the top-ten would confirm what I wrote about them earlier in the season, where I said I was pretty sure they’re a playoff team. McDavid really is McJesus.
- There are some interesting teams over 50%, like Carolina and Minnesota. I doubt Carolina’s shooting percentage will ever be high enough for them to have a GF% over 50%, while I think the Bruce Boudreau-led Minnesota Wild will exceed expectations. Boudreau tends to have that effect on teams.
- Pittsburgh at 13th is… interesting. They’re one team to keep an eye on, as their expected goals and scoring chance numbers are much better than their shot attempt numbers, and they’re loaded with offensive talent. They might be able to out-perform their Corsi numbers.
- Florida and Dallas will also be interesting to watch. Both have suffered from injuries early (Florida is just missing Huberdeau, but Dallas has been decimated), and as players get healthy, they could really improve and exceed their predicted results by a noticeable margin. This applies to Dallas more so than Florida, but don’t ignore Huberdeau’s return to health. It’s going to be a huge boost for the Panthers whenever the French-Canadian winger makes his way back into the lineup.
- The Rangers have really cooled off recently, and the rest of the year might be even more difficult for them.
- Tampa Bay – not a playoff team?
- This is probably the year Detroit’s playoff streak comes to an end.
- Look for the Senators to fall off soon. They can’t stay second in the Atlantic Division for much longer, not with all the warning signs surrounding their team (negative goal differential, poor shot attempt numbers, etc.).
- The rest of that bottom group looks reasonable. The Islanders have struggled, the Canucks are just bad, Colorado continues to lose the shot attempt battle, and the Coyotes are probably tanking again.
(Data taken from corsica.hockey; all figures are at 5v5 unless otherwise mentioned. Corsi and scoring chances are score and venue adjusted, while goals are not.)