Of all the positions in hockey, goaltending should, in theory, be the easiest to analyze. After all, while a player's contribution is extremely difficult to extract from that of his teammates on the ice, a goaltender's job is his and his alone: stop the puck. Despite this well-defined role, for decades goaltenders were judged mainly on goals-against average, shutouts and especially wins, which explains why Terry Sawchuk and Grant Fuhr were considered the greatest goaltenders of their eras. Today, most hockey minds have converged on what the Contrarian Goaltender called "the one stat argument": that save percentage alone is the best way to judge goaltenders. With the arrival of more detailed data from the NHL, we can now also restrict ourselves to even-strength save percentage, which creates a more level playing field.
Goaltending today is a much different trade than it was in the days of Sawchuk and Fuhr, however. The influence of Patrick Roy, the spread of the goalie coach and the influx of European players to the NHL have created such a surplus of quality goaltenders that there is no longer any consensus best goaltender, and some people think the difference in skill between NHL goaltenders is negligible. I will attempt to answer some of the most fundamental questions about goaltending, namely:
- Is there a significant spread in goaltending skill in the NHL today?
- What is the relative importance of luck to a goaltender's results?
- Is even-strength save percentage truly the best yardstick for measuring goaltender performance?
Methodology
For my raw data, I used the last 3 seasons of goaltender data, because I wanted to correctly split out 5-on-5, 4-on-5 and other situations, which is easier with the new format the NHL has been producing since 2007-08. This gave me 123 goaltenders, roughly 90 of whom played in each season. I then compiled each goaltender's goals against and shots against for 5-on-5 situations, 4-on-5 situations and "other", which I didn't end up using. I also calculated each goaltender's expected goals against, weighting each of their shots against with my shot quality metric. For those interested, the data used for this study can be found on the PuckProspectus stats repository here, under Worksheets.
There are a number of things I did not do with this data. One is that I did not restrict myself to road data, and the other is that I did not restrict myself to data when the score was tied. The reason for both is the same: sample size. My entire objective is to find a signal within the noise, so the more I restrict my data set the less likely I am to find what I'm looking for. To compensate, the shot quality measurement normalizes all team shot counts and shot distances to road totals, in order to remove arena scoring bias, and it also corrects for game score effects.
Most people looking at goaltender data use save percentage, but for an analysis like this I prefer Goals Versus Average (GVA). When using shots, GVA is simply (my save % - average save %) * Shots Against. When using expected goals, it is Expected Goals Against - Goals Against. The reason is simple: goaltenders who play fewer games have more noise in their save percentage. If you include them all, they add so much noise to the data that the results are always inconclusive. The alternative is to use save percentage only for goaltenders who have played a minimum number of games in the season, say 40. This eliminates a decent portion of our data, and it still produces variables that are neither normally nor binomially distributed, which is what most models assume. There is only one disadvantage to using Goals Versus Average: it is close to 0 for goaltenders who play few games, but their save percentage in those situations is meaningless anyhow.
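The shot-based definition can be rewritten as Shots Against × (1 - average save %) - Goals Against. That is the number of goals an exactly average goaltender would have allowed on the same shots, minus the goals actually allowed. In this form it parallels the expected-goals version, which swaps the league-average baseline for a shot-quality-weighted one. The sketch below shows both definitions in Python. The function names are hypothetical and do not come from any published tool.

```python
def gva_from_shots(goals_against: int, shots_against: int, league_sv_pct: float) -> float:
    """GVA from raw shots: (my save % - average save %) * shots against.

    Algebraically equal to shots_against * (1 - league_sv_pct) - goals_against,
    i.e. goals an average goaltender would allow on these shots minus goals allowed.
    """
    if shots_against == 0:
        return 0.0  # no shots faced, so no evidence either way
    my_sv_pct = 1.0 - goals_against / shots_against
    return (my_sv_pct - league_sv_pct) * shots_against


def gva_from_expected_goals(expected_goals_against: float, goals_against: int) -> float:
    """GVA from shot quality: expected goals against - goals against."""
    return expected_goals_against - goals_against


print(gva_from_shots(140, 2000, 0.920))
print(gva_from_expected_goals(150.5, 140))
```

For example, a goaltender who allows 140 goals on 2,000 shots against a league average of .920 comes out at about +20, since an average goaltender would have allowed 160.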
The Results
The first thing I wanted to look for was whether the variance in goaltending results was anything more than what would be expected from luck. To do this, I calculated 3 variances. The first is the expected Variance of Luck (VL): I modeled goals allowed as a binomial variable and calculated the variance this implies. To be clear, I don't actually know how much of the results are due to luck; I'm estimating it using well-established statistical models. The second is the Variance of results due to Shots (VA), which is simply the variance of Goals Versus Average based on shots against. The third is the Variance of results due to Shot Quality (VQ), which is the variance of Goals Versus Average based on Expected Goals Against. The share of each component is then:
Skill = (VQ – VL) / VA
Shot Qual = (VA – VQ) / VA
Luck = VL / VA
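Here is a Python sketch of the decomposition above. The luck term rests on an assumption about how the binomial estimate might be computed. Each goaltender's goals against is treated as Binomial(shots, p), with p = 1 - league save %, so the luck variance of his GVA is shots × p × (1 - p). VL is then the average of that over all goaltenders. Population variances, the hypothetical `variance_shares` function and the toy inputs are choices made for this example, not details taken from the original study.

```python
from statistics import pvariance


def variance_shares(gva_shots, gva_xg, shots_against, league_sv_pct):
    """Split the variance of shot-based GVA into skill, shot-quality and luck shares.

    gva_shots: per-goaltender GVA based on shots against (its variance is VA)
    gva_xg: per-goaltender GVA based on expected goals against (its variance is VQ)
    shots_against: per-goaltender shots faced, used for the binomial luck term (VL)
    """
    va = pvariance(gva_shots)
    vq = pvariance(gva_xg)
    # Assumption: goals against ~ Binomial(n, p) with p = 1 - league save %,
    # so luck variance of a goaltender's GVA is n * p * (1 - p); VL averages it.
    p_goal = 1.0 - league_sv_pct
    vl = sum(n * p_goal * (1.0 - p_goal) for n in shots_against) / len(shots_against)
    return {
        "skill": (vq - vl) / va,
        "shot_quality": (va - vq) / va,
        "luck": vl / va,
    }


# Toy numbers for illustration only, not the study's data.
shares = variance_shares(
    gva_shots=[25.0, -20.0, 5.0, -30.0, 20.0],
    gva_xg=[22.0, -18.0, 4.0, -27.0, 19.0],
    shots_against=[1800, 1500, 900, 2100, 1200],
    league_sv_pct=0.920,
)
print({name: round(value, 3) for name, value in shares.items()})
```

Because the three numerators are (VQ - VL), (VA - VQ) and VL, they telescope to VA, which is why the shares always sum to 1.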
You can check for yourself that these values should sum to 1. Here are the results:
| Single season | Skill | Shot quality | Luck |
|---|---|---|---|
| 5-on-5 save % | 26.5% | 1.7% | 71.8% |
| 4-on-5 save % | 18.9% | 2.9% | 78.2% |
| Both (5-on-5 and 4-on-5) | 33.0% | 1.0% | 66.1% |
As you can see, the theory that there is very little skill variation among NHL goaltenders is not completely crazy. Even-strength save percentage was almost three-quarters luck, with the team (shot quality) contributing a minuscule 2% and the goaltender himself only 26%. Given the smaller sample size, it's normal that the 4-on-5 results were even more luck-driven. The third line is more interesting: even though the 4-on-5 results were very noisy, the combined results were less noisy than 5-on-5 GVA alone, with the skill share rising from 26.5% to 33.0%. In other words, 5-on-5 goaltending and 4-on-5 goaltending are correlated skills.
Given that the single-season results were pretty noisy, I summed each goaltender's results over the 3 seasons and repeated the analysis. Even though many of the goaltenders didn't play a full load in all 3 seasons, surely with 3 seasons' worth of data instead of 1 we would expect the goaltenders' true skill to shine, right?
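GVA is measured in goals, so pooling seasons is just a per-goaltender sum. Each season is automatically weighted by the number of shots faced in it, which a plain average of save percentages would not do. Here is a minimal Python sketch. The `(goalie, season, gva)` row format and the names are hypothetical.

```python
from collections import defaultdict


def pool_gva(season_rows):
    """Sum each goaltender's single-season GVA over all seasons supplied.

    season_rows: iterable of (goalie, season, gva) tuples (hypothetical format).
    Returns a dict mapping goalie -> total GVA.
    """
    totals = defaultdict(float)
    for goalie, _season, gva in season_rows:
        totals[goalie] += gva
    return dict(totals)


rows = [
    ("Goalie A", "2007-08", 8.0),
    ("Goalie A", "2008-09", -3.0),
    ("Goalie A", "2009-10", 12.0),
    ("Goalie B", "2009-10", 4.5),  # played only one of the three seasons
]
print(pool_gva(rows))  # {'Goalie A': 17.0, 'Goalie B': 4.5}
```

Summing per-season GVA, rather than recomputing it from pooled shot totals, lets each season keep its own league-average baseline.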
| 3 seasons combined | Skill | Shot quality | Luck |
|---|---|---|---|
| 5-on-5 save % | 36.6% | 5.4% | 58.0% |
| 4-on-5 save % | 9.7% | 19.9% | 70.4% |
| Both (5-on-5 and 4-on-5) | 40.7% | 12.6% | 46.7% |
These results are clearly better, although it's striking that even if we look at every minute played by every NHL goaltender in the regular season over the last 36 months, almost 50% of the variance is still due to luck. With three seasons pooled, shot quality takes on a larger role, although it remains much smaller than the other two. The 4-on-5 results are interesting: even over the longer period, it's very hard to discern any skill whatsoever in 4-on-5 goaltending by itself. However, it again correlates well with 5-on-5 goaltending, as the combined skill share (40.7%) is higher than 5-on-5 alone (36.6%).
Just to be clear, what I have observed in this analysis is the variance in skill of NHL goaltenders given their playing time over the last 3 seasons. Over this period, 36 goaltenders have faced at least 2,000 shots at 5-on-5, and between them they've carried 68% of the NHL workload. All of them also faced at least 452 4-on-5 shots (except Pekka Rinne, who only saw 362). It is the skill differential between these goaltenders that my analysis is truly looking for, with the others mostly representing noise. In case you're curious, the highest-performing goaltender among the full-timers is Jonas Hiller, although the one we're most convinced is top-notch is Tomas Vokoun, given the level of play he has sustained over more than 5,500 shots.
What Is Most Sustainable?
While these results are interesting, the more important question is whether they are sustainable. Say I have a goaltender, Ryan Miller, who is coming off the season of his life. Can I expect the same from him next year? And if not, what can I expect?
To figure this out, I calculated the correlation coefficient between Year 1 GVA and Year 2 GVA. This will understate the actual sustainability of goaltender skill, since the amount of action a goaltender sees varies from year to year: some players get injured, some enter the league and others leave. Nevertheless, it gives a good idea of how sustainable goaltending skill is, and of what best predicts next year's results. I correlated 5-on-5 to 5-on-5, 4-on-5 to 4-on-5, both to both, both to 5-on-5, and 5-on-5 to both. A popular theory holds that even-strength save percentage, not overall save percentage, is the best way to judge a goaltender, and I wanted to find out if this is true. Below are the correlation coefficients, using GVA based on shots and GVA based on expected goals:
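Here is a minimal Python sketch of this calculation. It rests on two assumptions the text does not spell out. First, pairs from consecutive seasons (2007-08 to 2008-09 and 2008-09 to 2009-10) are pooled into one sample. Second, a goaltender contributes a pair only if he appears in both seasons. Passing different dictionaries as predictor and target covers the cross rows, such as Year 1 Both / Year 2 5-on-5. The function names and data layout are hypothetical.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def year_over_year_r(predictor, target, seasons):
    """Correlate Year 1 GVA (predictor) with Year 2 GVA (target).

    predictor, target: dicts of season -> {goalie: GVA}; pass the same dict for
    like-to-like rows (e.g. 5-on-5 / 5-on-5) or different ones for cross rows.
    seasons: chronological season labels; consecutive pairs are pooled.
    """
    xs, ys = [], []
    for year1, year2 in zip(seasons, seasons[1:]):
        # Only goaltenders with data in both seasons contribute a pair.
        for goalie in sorted(predictor[year1].keys() & target[year2].keys()):
            xs.append(predictor[year1][goalie])
            ys.append(target[year2][goalie])
    return pearson_r(xs, ys)
```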
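| Year 1 / Year 2 | GVA from shots against (SA) | GVA from expected goals against (EGA) |
|---|---|---|
| Year 1 5-on-5 / Year 2 5-on-5 | 0.187 | 0.166 |
| Year 1 4-on-5 / Year 2 4-on-5 | 0.100 | 0.014 |
| Year 1 Both / Year 2 Both | 0.279 | 0.199 |
| Year 1 Both / Year 2 5-on-5 | 0.251 | 0.195 |
| Year 1 5-on-5 / Year 2 Both | 0.221 | 0.183 |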
A few things jump out. GVAs based on shot counts are more sustainable than GVAs based on expected goals. This means that arena scoring bias and shot quality create the illusion of sustainable goaltender skill where there is none. Over the last 3 seasons, the goaltenders who have benefited most from this are Evgeni Nabokov, Ryan Miller and Niklas Backstrom. The goaltenders who have suffered most (i.e., who are better than their raw numbers suggest) are Cam Ward, Vesa Toskala and Manny Legace, although even after adjusting for shot quality, Toskala is still poor.
There is still some sustainability in even-strength save percentage, almost none in 4-on-5 save percentage, and more still when the two are combined. Also, overall save percentage predicts next year's overall save percentage slightly better than even-strength save percentage does. The results for 4-on-5 save percentage are consistent with similar work performed by JLikens.
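One way to produce this kind of list is to subtract each goaltender's expected-goals-based GVA from his shot-based GVA. The two share the same goals-against term, so the difference reduces to Shots Against × (1 - average save %) - Expected Goals Against. A positive value means his raw numbers flatter him. A negative one means he is better than they suggest. The Python sketch below does this; the function name and toy inputs are hypothetical.

```python
def shot_quality_effect(gva_shots, gva_xg):
    """Rank goaltenders by (shot-based GVA - expected-goals-based GVA), largest first.

    Positive: raw save % flatters the goaltender; negative: he is better than his raw numbers.
    """
    effect = {g: gva_shots[g] - gva_xg[g] for g in gva_shots if g in gva_xg}
    return sorted(effect.items(), key=lambda item: item[1], reverse=True)


ranking = shot_quality_effect(
    {"Goalie A": 15.0, "Goalie B": -4.0, "Goalie C": 2.0},
    {"Goalie A": 5.0, "Goalie B": 3.0, "Goalie C": 2.5},
)
print(ranking)  # [('Goalie A', 10.0), ('Goalie C', -0.5), ('Goalie B', -7.0)]
```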
Conclusions
My analysis, if it is correct, leads to a few interesting conclusions concerning goaltending statistics.
- As of this year (2010), not all NHL goaltenders perform equally. However, the year-to-year variance due to luck is greater than the spread in skill among NHL goaltenders. Refinements to our understanding of shot quality could further reduce the observed amount of skill.
- The best goaltenders are worth about 20 to 25 goals above average, assuming 2,000 shots against per season.
- Penalty-killing save percentage is a skill, correlated to even-strength save percentage, but one that can hardly be observed. Most of the observed "skill" at stopping shots on the penalty kill comes from the quality of the shots.
- Much of the apparent sustainability of goaltending skill is due to scorer bias and shot quality. This is partly why two different Boston goaltenders have led the league in save percentage the last two seasons.
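The 20-to-25-goal figure can be translated back into save percentage through the GVA definition. Here is a worked check; the .920 league average used for illustration is an assumption, not a figure from the study:

$$
\text{GVA} = (\text{Sv\%} - \overline{\text{Sv\%}}) \times \text{SA}
\quad\Rightarrow\quad
\text{Sv\%} - \overline{\text{Sv\%}} = \frac{20 \text{ to } 25}{2000} = 0.0100 \text{ to } 0.0125
$$

In other words, against a .920 league average, the best goaltenders would be stopping roughly .930 to .9325 of the shots they face.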
I can also make one anecdotal prediction, based on all this data: if the Sharks do well this spring, Evgeni Nabokov will sign a much bigger contract than he should be entitled to. Meanwhile, several unknowns will emerge next season and outperform him. Such is goaltending.
Tom Awad is an author of Hockey Prospectus.
