After an 82-game regular season, the NHL postseason has finally arrived. Fourteen teams have been sent home, their 2015-2016 campaigns finished. The other 16 teams are preparing to play some extra hockey, with the first round kicking off later today.
Over the past decade or so, finishing near the top of the league in puck possession metrics has been a must for teams that go on to win the Stanley Cup. As I discussed in the article above, of the 18 teams that have competed in the Stanley Cup Final over the past nine years, 15 finished the regular season in the top ten for score-adjusted Corsi For percentage.
With this in mind, I set out to create a regression model that would be capable of predicting the winner of a playoff series. I used the statistical software R to calculate a logistic regression model fit to the data from playoff series in the years 2008-2015, with the response variable being whether or not the team won their series (1 = yes and 0 = no). In a logistic regression model, a number of explanatory variables are used to predict the probability of success for a binary outcome; in our instance, the binary outcome is whether or not a team won the playoff series.
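To make the idea concrete, here is a minimal Python sketch of how a logistic model turns explanatory variables into a win probability. It is not the fitted model from this article (that was built in R): the function name `win_probability`, the coefficient values, and the team inputs are all hypothetical, chosen only to show the mechanics.

```python
import math

def win_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients and inputs, for illustration only.
# GF is the team's Goals For percentage, OGF is the opponent's.
coefficients = {"GF": 0.10, "OGF": -0.10}
team_a = {"GF": 54.0, "OGF": 50.0}
team_b = {"GF": 50.0, "OGF": 54.0}

p_a = win_probability(0.0, coefficients, team_a)
p_b = win_probability(0.0, coefficients, team_b)
print(round(p_a, 3), round(p_b, 3))
```

In this toy setup the coefficients on the team and opponent terms are mirror images and the intercept is zero, so the two probabilities for a series sum to one. A fitted model is not guaranteed to have that property.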
I tested a vast number of explanatory variables in the model, starting with full regular season statistics. Both the team’s statistics and their opponent’s statistics were tested.
- 5v5 Goals For percentage (GF%)
- 5v5 score-adjusted Corsi For percentage (CF%)
- 5v5 score-adjusted Fenwick For percentage (FF%)
- 5v5 score-adjusted Scoring Chances For percentage (SCF%)
- 5v5 shooting percentage (SH%)
- 5v5 save percentage (SV%)
- Power play goal differential per 60 minutes
- Penalty kill goal differential per 60 minutes
- Penalties drawn per 60 minutes
- Penalties taken per 60 minutes
Many of these were included simply to test, and did not make it into the final model. Although the amount of actual talent reflected in shooting percentage is likely fairly small, I tested both save and shooting percentages in order to assess the value of strong goaltending or shooting.
5-on-5 play makes up the vast majority of NHL game time, but power play and penalty kill units can’t be ignored. The special teams statistics were combined into terms that incorporate both the efficiency and the frequency of each situation for a given team. For power plays, I multiplied the team’s penalties drawn per 60 by the opponent’s penalties taken per 60, and then multiplied by the team’s power play goal differential per 60. For penalty kills, the team’s rate of penalties taken was multiplied by the opponent’s rate of penalties drawn, and then by the team’s penalty kill goal differential per 60. In other words, each term combines the number of power play or penalty kill opportunities a team can expect with its expected goal differential in those situations. This gives us numbers that represent a team’s hypothetical power play strength and penalty kill strength, including not just its goal differential on special teams but also its ability to draw and avoid penalties.
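The two special teams terms described above can be sketched in a few lines of Python. The function names and the input numbers are hypothetical (they are not taken from the data set); only the multiplication itself follows the description in the text.

```python
def power_play_term(drawn_per60, opp_taken_per60, pp_goal_diff_per60):
    """Team's penalties drawn x opponent's penalties taken x team's PP goal differential."""
    return drawn_per60 * opp_taken_per60 * pp_goal_diff_per60

def penalty_kill_term(taken_per60, opp_drawn_per60, pk_goal_diff_per60):
    """Team's penalties taken x opponent's penalties drawn x team's PK goal differential."""
    return taken_per60 * opp_drawn_per60 * pk_goal_diff_per60

# Made-up rates for one team and its opponent, for illustration only.
pp = power_play_term(drawn_per60=1.0, opp_taken_per60=1.25, pp_goal_diff_per60=6.0)
pk = penalty_kill_term(taken_per60=0.75, opp_drawn_per60=1.0, pk_goal_diff_per60=-4.0)
print(pp, pk)
```

Because the opponent's rates enter each term, the same team gets a different PP and PK value in every matchup, which is how the terms capture a good power play meeting a poor penalty kill.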
In addition to the full regular season statistics, I also made an effort to concentrate on post-trade deadline statistics. Most teams play around 20 games after the trade deadline, and score-adjusted metrics often reach their peak predictive value for future 5v5 Goals For percentage around the 20-game mark; after that, there is a decline. Because contending teams typically trade for players at the deadline who boost their roster, the true talent level of the roster is often reflected in post-trade deadline statistics, not full season statistics.
The post-trade deadline statistics tested are as follows. As with the full season statistics, both team and opponent metrics were tested in the model.
- Post-trade deadline score-adjusted Goals For percentage (GF%)
- Post-trade deadline score-adjusted Corsi For percentage (CF%)
- Post-trade deadline score-adjusted Fenwick For percentage (FF%)
- Post-trade deadline score-adjusted Scoring Chances For percentage (SCF%)
The data was collected from war-on-ice.com, and covered a total of 120 playoff series. Because there were two teams in each series, there were 240 total observations.
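As a rough illustration of how 120 series become 240 observations, the Python sketch below expands one series into two rows, one from each team's perspective. The record layout, the field names, and the example values are assumptions made for this sketch; the actual spreadsheet may be organized differently.

```python
def series_to_observations(series):
    """Expand one series record into two rows, one per team."""
    a, b = series["team_a"], series["team_b"]
    a_won = series["winner"] == a["name"]
    rows = []
    for team, opp, won in ((a, b, a_won), (b, a, not a_won)):
        rows.append({
            "team": team["name"],
            "GF": team["GF"],    # the team's own statistic
            "OGF": opp["GF"],    # the same statistic for its opponent
            "won": int(won),     # response variable: 1 = won the series
        })
    return rows

# Made-up series between two placeholder teams.
example = {
    "team_a": {"name": "Team A", "GF": 54.0},
    "team_b": {"name": "Team B", "GF": 49.0},
    "winner": "Team A",
}
rows = series_to_observations(example)
for row in rows:
    print(row)
```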
In R, logistic models were fitted using the variables listed above, and then tested using leave-one-out cross-validation (LOOCV). Cross-validation is a way to check how well a model predicts data it was not fit to. The LOOCV method is aptly named: one observation is left out of the data set, and the model is fit to the rest of the data. The prediction error on the held-out observation is then calculated, and after repeating this for each observation, the errors are averaged over the entire data set. The lower the error, the better the model.
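The LOOCV procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the R code used for the article: the data is a made-up toy set with a single explanatory variable, the model is fit with a simple gradient descent routine written for this sketch, and the error measure is assumed to be the average squared difference between the outcome and the predicted probability (which is the default cost in `cv.glm` from R's `boot` package, if that is what produced the delta values).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Fit p = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def loocv_error(xs, ys):
    """Average squared error of held-out predictions, one observation at a time."""
    total = 0.0
    for i in range(len(xs)):
        # Leave observation i out and fit to everything else.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_logistic(train_x, train_y)
        # Score the model on the single observation it never saw.
        p = sigmoid(b0 + b1 * xs[i])
        total += (ys[i] - p) ** 2
    return total / len(xs)

# Toy data (made up): x is GF% minus 50, y is 1 if the series was won.
xs = [-4.0, -3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
delta = loocv_error(xs, ys)
print(round(delta, 3))
```

Under this error measure, a model that always predicts 50% scores exactly 0.25, which gives a useful reference point when comparing candidate models.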
LOOCV was used in this case in order to use the full data set to construct the model. With model construction, over-fitting can be a problem when there is no data set to test the model on. Over-fitting describes when a model becomes too complex, tailored to the idiosyncrasies of the sample used to create the model rather than reflecting the overall population.
The key here is that the dataset that was used to calculate the model cannot be used to test the model, as the predictive power of the model will seem much higher than it actually is. And as Michael Lopez writes, “Predictions are useful not in how they perform in-sample, but in how they do out-of-sample – that is, when they are applied to a data set other than the one in which they were generated.”
LOOCV helps prevent over-fitting by deriving a more accurate measure of model performance, while still allowing for the full data set to be used in construction of the model.
In the end, the model with the lowest LOOCV error was chosen as the logistic model. While most models had a delta (error) around .22, the following model had a delta of .208, so this is the model I chose to use. It is described below.
With 8 terms, the model is not over-fit (that is, too specific to the dataset that constructed it), and it contains variables that make sense from an intuitive standpoint.
- Goals For percentage (GF, OGF) is essentially a combination of shot attempt metrics, as well as shooting percentage and save percentage. If a team has a skilled net minder or skilled shooters, then that will be captured within this variable.
- Post-trade deadline Corsi For percentage (TDCF, OTDCF) is the best predictor of future 5v5 goals for percentage among available statistics, and captures any effect of a team receiving a boost from trade deadline acquisitions.
- The power play (PP, OPP) and penalty kill (PK, OPK) terms capture any effect that special teams may have. Teams with poor penalty kills will suffer against teams with good power plays, and this is captured in the model.
I then applied the model to this year’s playoffs by inputting each team’s collected statistics for the 2015-16 season. Here are the predictions for how far each team will go in the postseason; the percentage listed is the probability, based on the model, that the team wins that particular round.
One thing that intrigued me about the model was the inclusion of GF%, as it outperformed any combination of shot attempt metrics and save percentage. When I looked back at the data, I saw some interesting trends; the top Corsi teams of the past several seasons have also been insanely dominant when it comes to Goals For. Take Detroit in 2008 (58.6% GF%), Chicago in 2010 (54.8% GF%) and even L.A. in 2014 (56% GF%). The best teams outshoot their opponents and aren’t held back by poor goaltending or rough shooting luck; they often outscore their opponents too. It will be interesting to see if this trend continues in the future.
In the future, I would like to include post-trade deadline statistics for the power play and penalty kill, as well as even strength numbers. Unfortunately, I simply ran out of time this year (and collecting data is hard). Thankfully, I have the spreadsheet saved, and I’ll be updating the model next season.
This is a brief introduction to and discussion of the model. For a more detailed explanation, please reach out to me on Twitter or by email (firstname.lastname@example.org).
Also, a big thank you to Megan Richardson, who helped with editing and checked to see if my methods were sound. Follow her on Twitter @butyoucarlotta.