Using binomial distributions to project future performance: Part one, shooting percentage
How does a player’s performance impact our predictions of the future? This is, boiled down, the real reason most of us care about analytical thinking at all. Shooting percentage and shot quality are the grayest areas of hockey analysis to date. We rely on shot differential statistics because they are more consistently indicative of on-ice success, but it is also clear that some players can sustain shooting percentages significantly above or below the mean.
We know that shooting success is mostly random. But sometimes it is not. Hockey could use a better sense of what we can really tell from shooting percentage. Too often we pick either goal-based or shot-based analysis and throw out the role shooting percentage plays altogether. As analysts, we can do better than this. We know shooting success means something, sometimes. How we can use results to date to inform our predictions is therefore a pertinent question.
We know somebody like Phil Kessel is probably a better pure shooter of the puck than Colton Orr, but by how much? Because each player’s shot-to-goal success carries its own variance, we can’t simply say that Kessel is better by the difference in their statistics: we can’t be sure that either player’s shooting percentage to date is his true talent.
Using a binomial distribution, we can combine a player’s past performance with the size of the sample behind it to give us a range of potential outcomes in the future. We know Phil Kessel is a better shooter than Colton Orr, but we can investigate what their success to date really tells us about their true talent.
You have a coin. You flip it two times, and both times you get heads. This coin, you say, must be rigged. If this coin wasn’t rigged, you’d expect it to land heads and tails equally, and it has not done so. Obviously this is dumb. So far it’s landed heads 100% of the time, but how much does this really tell us about how future coin flips will go? Here’s what our binomial distribution shows us.
After we flip the coin once and get heads, it tells us nothing. There’s a 50% chance it will land heads or tails the next time. With every successive heads, however, the distribution gets less and less confident the “true probability” is actually 50%. Looking at this graph, if you’ve gotten the same result 20 times in a row, the chances you are dealing with a fair coin come out to 9.53674E-07 (0.5 multiplied by itself 20 times), or roughly 0.0001 percent. For all intents and purposes, zero.
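The streak arithmetic is easy to check by hand or in a few lines of code. This is a minimal sketch that assumes a fair coin and independent flips; the function name `streak_probability` is invented here for illustration.

```python
def streak_probability(n_flips, p_heads=0.5):
    """Chance that n_flips independent flips all land heads."""
    return p_heads ** n_flips


# Show the raw probability next to the same value expressed as a percentage.
for n in (1, 2, 5, 10, 20):
    p = streak_probability(n)
    print(f"{n:2d} heads in a row: {p:.6e} as a probability, {p * 100:.6f}%")
```

Printing both forms side by side keeps the raw probability and the percentage from being confused, since they differ by a factor of 100.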
Now let’s turn our attention to hockey. Our coin flips are now shots on net. Landing heads will now be called scoring. Since we’re talking about hockey players, finding their true success won’t be as easy as working with a coin, because unlike the coin we do not actually know what their true ratio of goals to shots should be. We will reverse engineer the coin flip example by looking at a variety of different shooting percentage possibilities, and the relative likelihood of each shooting percentage being true.
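One way to picture the reverse engineering is to score a list of candidate shooting percentages against a player’s record. The sketch below is an assumption about how that could be done, not necessarily the calculation behind the graphs: it treats every shot as an independent trial with the same chance of scoring, and the names `binomial_likelihood` and `relative_likelihoods` are made up for this example. The input is the 4 goals on 24 shots quoted for Tom Sestito elsewhere in this article.

```python
from math import comb


def binomial_likelihood(goals, shots, pct):
    """Chance of exactly `goals` on `shots` if the true scoring rate is `pct`."""
    return comb(shots, goals) * pct**goals * (1 - pct) ** (shots - goals)


def relative_likelihoods(goals, shots, candidates):
    """Likelihood of each candidate rate, rescaled so the values sum to 1."""
    raw = [binomial_likelihood(goals, shots, c) for c in candidates]
    total = sum(raw)
    return [r / total for r in raw]


# Candidate true shooting percentages from 1% to 40%, in 1-point steps.
candidates = [i / 100 for i in range(1, 41)]
weights = relative_likelihoods(4, 24, candidates)  # 4 goals on 24 shots

best = candidates[weights.index(max(weights))]
print(f"most likely candidate: {best:.2f}")
for points in (5, 10, 17, 30):
    # candidates[points - 1] is the rate equal to `points` percent
    print(f"{points:2d}%: relative likelihood {weights[points - 1]:.4f}")
```

The rescaling step only makes the candidates comparable with each other; it does not say anything about rates outside the 1% to 40% range chosen here.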
Here are our five players:
- Phil Kessel
- Tom Sestito
- Colton Orr
- Scott Gomez
- Alex Tanguay
This group gives us a wide cast of characters. Some of them have held percentages above or below the mean for a long time, and some have only recently broken in with an abnormal shooting percentage, and assumptions have already started to form about them.
In this graph, a player’s line shows us the possible range of his true shooting percentage level, based on his shooting success since the 2007 season. I chose 2007 because it seems like a reasonable place to capture the true peak years of our older players, so as not to affect the distribution with unrelated effects (age). I also chose it because 2007 was the last time Peyton Manning won a Super Bowl. A player’s “true talent” even strength shooting percentage is somewhere along his line, and the closer to the middle you get, the more likely his shooting% is around there.
The straighter the line, the closer the floor and ceiling are for his true even strength shooting percentage. As you would expect, the younger players of our group have longer lines, indicating the possible range of their real talent is very large. For Tom Sestito, current league-wide leader in shooting%, there is not much here for him to hang his hat on. We have his realistic floor at around 7%, and his ceiling around 20-30%, depending on how we want to define “realistic”. Colton Orr has had a very poor start in terms of his offensive success at the NHL level, and without having any additional knowledge of Orr’s goonishness, the distribution has already put position-average shooting success as a realistic ceiling for Orr.
If we take a look at the upper level of Orr’s possible true talent shooting percentage, we see his maximum possibility is actually higher than Kessel’s. This tells us a little bit about binomial distributions, and how we should think about probability in general. Phil Kessel has been a better shooter so far in his career, but with a larger sample size we can put a closer limit on his future performance compared to Orr’s. If we look at the graph above, we see that the possibility that Colton Orr is a better shooter than Phil Kessel is about 8%.
Given what you know from watching these two, you might consider this to be rather large odds on Orr, but when only using previous performance to build this system, I think that is about what we could realistically hope for. We know Colton Orr is not the shooter that Phil Kessel is, but when we put numbers on the odds we’d be willing to give based on just the numbers we have now, you should start to see why treating Orr and Kessel equally in the eyes of corsi would be unfairly taking something away from Kessel’s on-ice contributions.
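The “chance one player is truly the better shooter” idea can be sketched by comparing two such distributions point by point. Everything here is an assumption for illustration: the goal and shot counts are invented (they are not Orr’s or Kessel’s real numbers), each candidate rate is weighted by its binomial likelihood with no prior knowledge mixed in, the two players are treated as independent, and the names `GRID`, `posterior` and `prob_first_is_better` are hypothetical.

```python
from math import exp, log

# Candidate true scoring rates from 0.1% to 99.9%.
GRID = [i / 1000 for i in range(1, 1000)]


def posterior(goals, shots):
    """Weight on each GRID rate, proportional to its binomial likelihood."""
    logs = [goals * log(p) + (shots - goals) * log(1 - p) for p in GRID]
    peak = max(logs)  # subtract the peak so exp() never overflows
    raw = [exp(v - peak) for v in logs]
    total = sum(raw)
    return [r / total for r in raw]


def prob_first_is_better(goals_a, shots_a, goals_b, shots_b):
    """Chance A's true rate exceeds B's, treating the two as independent."""
    post_a = posterior(goals_a, shots_a)
    post_b = posterior(goals_b, shots_b)
    prob = 0.0
    below_b = 0.0  # B's weight on rates strictly below the current one
    for wa, wb in zip(post_a, post_b):
        prob += wa * (below_b + 0.5 * wb)  # ties on a grid point split evenly
        below_b += wb
    return prob


# Invented records: a low-volume shooter against a high-volume one.
chance = prob_first_is_better(5, 100, 120, 1000)
print(f"chance the low-volume shooter is truly better: {chance:.3f}")
```

Because the small-sample player’s distribution is wide, some of its weight sits above the large-sample player’s narrow range even when his record is clearly worse, which is the effect described above.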
Alex Tanguay is one player we can point out as clearly able to sustain an above average level of scoring success. At this point in his career, the system has enough of a sample size to be sure there is something substantial (non-luck based) to Tanguay’s success. Think back to our 20 consecutive coin flips resulting in heads. The chances he’s a true talent 10.5% shooter and his success above that (16.6%) has been luck is 8.50034E-05 percent, a similarly meaningless number.
Looking at that graph, not every point on a line is equally relevant. Sestito is not as likely to be a 38% shooter as he is to be an 11% shooter. The Y axis shows the percentile of probability at which a particular shooting percentage falls. To look at what we could call a ‘reasonable’ possibility going forward, we will use one standard deviation from the mean to see just how closely a binomial distribution can home in on a player’s true talent level.
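A ‘reasonable’ band of this kind can be approximated by taking the middle 68% of the distribution, which is the share that one standard deviation either side of the mean covers on a normal curve. That substitution is an assumption made for this sketch, as are the flat weighting of candidate rates and the names `GRID`, `posterior` and `central_band`. The example input is the 4 goals on 24 shots quoted for Sestito in this article.

```python
from math import exp, log

# Candidate true scoring rates from 0.1% to 99.9%.
GRID = [i / 1000 for i in range(1, 1000)]


def posterior(goals, shots):
    """Weight on each GRID rate, proportional to its binomial likelihood."""
    logs = [goals * log(p) + (shots - goals) * log(1 - p) for p in GRID]
    peak = max(logs)  # subtract the peak so exp() never overflows
    raw = [exp(v - peak) for v in logs]
    total = sum(raw)
    return [r / total for r in raw]


def central_band(goals, shots, mass=0.68):
    """Lowest and highest GRID rates bracketing the middle `mass` of weight."""
    tail = (1 - mass) / 2
    low = high = None
    running = 0.0
    for rate, weight in zip(GRID, posterior(goals, shots)):
        running += weight
        if low is None and running >= tail:
            low = rate  # first rate where the lower tail is used up
        if running >= 1 - tail:
            high = rate  # first rate where only the upper tail remains
            break
    return low, high


low, high = central_band(4, 24)
print(f"4 goals on 24 shots: roughly {low:.1%} to {high:.1%}")
```

Running the same function on a record ten times the size at the same scoring rate gives a much narrower band, which is the sample size effect the graph illustrates.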
Look at Scott Gomez! We can pin Gomez’s true talent shooting percentage to within half a percentage point (0.04-0.045). That’s the same chance the Denver Broncos had to win the Super Bowl in the 7th minute of the 4th quarter. I don’t really know why Scott Gomez has such a low shooting percentage, but we know that it’s real. Since we know it’s real, any shot differential analysis is basically useless for Gomez, as his capacity to create shots will always be hampered by his inability to turn them into goals at a regular level.
Returning to Sestito, his even strength shooting percentage might tell you just a little bit more than you think, though still not a lot. Here’s his graph isolated:
The possibility Sestito’s true talent ES Sh% is at or below 5.5% is about the same as the chance that the team wearing white would win 9 of the last 10 Super Bowls (a probability of about 0.009, or just under 1%). Sestito’s 4 goals in 24 shots might not strike you as a ton to go on, but the binomial distribution doesn’t throw it completely out the window.
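A tail probability like this one can be computed exactly under one simple assumption: that before seeing any shots, every true shooting percentage was equally plausible. In that case the distribution after `goals` on `shots` is a Beta(goals + 1, shots - goals + 1), and its cumulative probability equals an upper binomial tail on shots + 1 trials. The flat starting point and the function name `prob_true_rate_at_or_below` are assumptions of this sketch, so the result need not match the graph exactly.

```python
from math import comb


def prob_true_rate_at_or_below(goals, shots, ceiling):
    """P(true rate <= ceiling) after `goals` on `shots`, flat prior assumed.

    Uses the identity between the Beta(goals + 1, shots - goals + 1) CDF
    and the chance of at least goals + 1 successes in shots + 1 trials
    when each trial succeeds with probability `ceiling`.
    """
    n = shots + 1
    return sum(
        comb(n, k) * ceiling**k * (1 - ceiling) ** (n - k)
        for k in range(goals + 1, n + 1)
    )


p = prob_true_rate_at_or_below(4, 24, 0.055)
print(f"{p:.4f} as a probability, {p * 100:.2f}% as a percentage")
```

With 4 goals on 24 shots and a 5.5% ceiling this comes out at roughly 1 percent, small but not negligible.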
With the five players selected as test cases, we have a wide range of examples of just how much we should rely on a player’s shooting percentage to be sustained. For players like Sestito and Orr, there’s little we can draw from their results to date, and until a greater sample size comes up that significantly draws them away from the mean, shot analysis (corsi, fenwick) is the only logical place to consider them statistically. For players like Alex Tanguay, Phil Kessel and Scott Gomez, however, the error bars on their possible true talent shooting percentage are very small, and any shot analysis that does not account for how they’ve proven to create goals at unique rates would be incomplete.
What this research shows us is that a healthy balance between goal and shot differentials is needed when analyzing a player, similar to the healthy balance of musical genres the Super Bowl Half Time Show afforded to viewers this year. There will always be some error in a player’s possible on-ice outcomes, but using this kind of analysis can help us find out just how far away we can tiptoe from corsi and fenwick.
I’d like to think of a way to make this sort of information available to the hockey analytics community at large. Shooting luck has often been thrown out completely due to its craziness. A more nuanced approach, using something like the method here, could show whether there’s anything at all sustainable in a player’s numbers going forward. With 200-300 shots, we can put a player in a bin of 2-4 points of shooting percentage. With 500+ we can usually be sure that a player’s shooting percentage in that sample is within a percentage point or so of his true talent. Go Seahawks.
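The sample size rules of thumb above can be sanity checked with the standard deviation of a binomial proportion, sqrt(p(1 - p) / n). This is a rough sketch, not the calculation behind the graphs: it assumes a true shooting percentage of 10.5% (the average figure used in the Tanguay example) and independent shots, and `one_sd_spread` is a name invented here.

```python
from math import sqrt


def one_sd_spread(true_pct, shots):
    """One standard deviation of observed shooting percentage over `shots`."""
    return sqrt(true_pct * (1 - true_pct) / shots)


for shots in (25, 100, 250, 500, 1000):
    sd = one_sd_spread(0.105, shots)
    print(f"{shots:5d} shots: +/- {sd * 100:.1f} points of shooting percentage")
```

At 250 shots the spread is about 1.9 points either side, a band just under 4 points wide, and at 500 shots it narrows to about 1.4 points either side.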
Matt Pfeffer is a statistical analyst for the Ottawa 67s, and a contributor to TheHockeyNews.com and Hockey Prospectus.
Follow Matt on Twitter at @MattyPfeffer.