‘Close’ statistics are often used when working with shot differential stats because the state of the game affects shooting percentages, and thus the real ‘goal value’ of shot events. ‘Close’ is defined as a game within one goal during the first two periods of play, and tied in the third. This arbitrary parameter can wipe out a lot of score bias in Corsi attempts, but it’s not exactly scientific. There’s nothing special about those specific circumstances, and the effect of the game state on shooting percentages is much more complex than that.
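For reference, the conventional ‘close’ rule described above can be written down in a few lines. This is only a sketch: the function name and its arguments (`period`, `score_diff`) are made up for illustration, and treating overtime like the third period is an assumption, not something the definition spells out.

```python
def is_close(period: int, score_diff: int) -> bool:
    """Return True if a game state counts as 'close' under the usual rule.

    period     -- 1, 2, 3 (anything above 3 is treated like the third; assumption)
    score_diff -- goals for minus goals against at the time of the shot
    """
    if period <= 2:
        # First two periods: within one goal either way.
        return abs(score_diff) <= 1
    # Third period (and later): tied only.
    return score_diff == 0
```

Note how blunt the rule is: a one-goal lead late in the second period and a tie in the opening minute both come back as simply ‘close’.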
By using win probability, we can more accurately represent, to the percentage point, the actual distance between the two teams in terms of winning the game. You can find the win probability state for every shot recorded in our raw data section. Not to harp on the definition of close too much, but within those parameters you can still find game situations on both ends of the spectrum in terms of the closeness of the game. Here is every shot taken since 2007 that falls within the close definition and the actual win probability for that team.
If there is something to the game state affecting shooting percentages, and there is, using within one goal in the first two periods and tied in the third isn’t doing a great job of capturing it.
The problem isn’t just that the ‘close’ definition doesn’t accurately reflect the win probability for both teams, it’s that within these definitions there is no point at which we can capture an equalized shooting percentage. Teams don’t just say ‘hey, we’re down 2 in the third, let’s shoot this way’. Shooting percentage organically rises with win probability, and there isn’t really a place you can find a stable baseline for it.
Looking at that graph, you see some things that run counter to some fancy stat reasoning. For one, shooting percentage increases much more steadily as win probability rises than it does in the other direction. If you’re winning, it matters much more by how much (in terms of win probability) than it does by how badly you’re losing.
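If you want to rebuild that kind of curve yourself, a rough sketch follows. It assumes each shot is available as a pair of (win probability in percent for the shooting team, whether it was a goal); that record layout and the function name are hypothetical and not the format of the raw data.

```python
from collections import defaultdict


def shooting_pct_by_bucket(shots, width=10):
    """Group shots into win-probability buckets and return goals / shots per bucket.

    shots -- iterable of (win_prob_percent, is_goal) pairs (hypothetical layout)
    width -- bucket width in percentage points
    """
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [goals, shots]
    for win_prob, is_goal in shots:
        # Clamp so that exactly 100% lands in the top bucket.
        bucket = min(int(win_prob // width) * width, 100 - width)
        totals[bucket][0] += int(is_goal)
        totals[bucket][1] += 1
    return {b: goals / n for b, (goals, n) in sorted(totals.items())}
```

With real data you would want far more shots per bucket than a toy example can show, since shooting percentage is noisy in small samples.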
Here’s the money graph. Between 30 and 70 percent win probability (meaning the teams are within 30% of each other), the GF% and CF% differentials line up perfectly with each other. Beyond that, teams with a 40% chance of winning or less are outshooting their goal differential, while teams with a 70% or more chance of winning undershoot their goal differential.
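The comparison behind that graph can be sketched as follows, taking CF% as shot attempts for divided by all shot attempts and GF% as goals for divided by all goals, both measured within a win-probability bucket. The event layout here, (win probability of the team in question, whether the attempt was by that team, whether it was a goal), is an assumption made for illustration.

```python
from collections import defaultdict


def cf_gf_by_bucket(events, width=10):
    """Return {bucket: (CF%, GF%)} as fractions, from one team's perspective.

    events -- iterable of (win_prob_percent, is_for, is_goal) (hypothetical layout)
    GF% is None for a bucket in which no goals were scored either way.
    """
    counts = defaultdict(lambda: {"cf": 0, "ca": 0, "gf": 0, "ga": 0})
    for win_prob, is_for, is_goal in events:
        bucket = min(int(win_prob // width) * width, 100 - width)
        c = counts[bucket]
        c["cf" if is_for else "ca"] += 1
        if is_goal:
            c["gf" if is_for else "ga"] += 1

    result = {}
    for bucket, c in sorted(counts.items()):
        attempts = c["cf"] + c["ca"]
        goals = c["gf"] + c["ga"]
        cf_pct = c["cf"] / attempts
        gf_pct = c["gf"] / goals if goals else None
        result[bucket] = (cf_pct, gf_pct)
    return result
```

A bucket where CF% sits above GF% is one where the team is outshooting its goal differential, and the reverse means it is undershooting it.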
This is the best model for adjusting for score effects, as the ‘close’ definition is just an arbitrary attempt at controlling for something that can be objectively quantified with win probability. 13% of ‘close’ shot attempts fall outside of our 30-70 scale. Close shot attempts account for 62.6% of all 5v5 shot attempts, while only 54.6% fall within the 30-70 range.
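Shares like these are straightforward to tally once each attempt carries both a ‘close’ flag and a win probability. The sketch below assumes each attempt is a dict with hypothetical keys `"close"` and `"win_prob"`, and it treats the 30-70 range as inclusive at both ends, which is also an assumption.

```python
def score_state_shares(attempts, low=30.0, high=70.0):
    """Compare the 'close' filter with a win-probability band.

    attempts -- list of dicts with hypothetical keys "close" (bool)
                and "win_prob" (percent, shooting team's perspective)
    Returns fractions, not percentages.
    """
    total = len(attempts)
    if total == 0:
        raise ValueError("no shot attempts supplied")

    def in_band(a):
        return low <= a["win_prob"] <= high

    close = [a for a in attempts if a["close"]]
    close_outside = [a for a in close if not in_band(a)]
    return {
        # Share of all attempts flagged 'close'.
        "close_share": len(close) / total,
        # Share of all attempts inside the win-probability band.
        "band_share": sum(in_band(a) for a in attempts) / total,
        # Share of 'close' attempts that fall outside the band.
        "close_outside_band": len(close_outside) / len(close) if close else 0.0,
    }
```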
I’m pretty excited to see what Corsi can tell us with a more accurate adjustment for score effects, which win probability data can now give us. As with adjusting for zone starts, these are incremental steps towards improving Corsi, but with every step we give ourselves more consistency, more reliability, and less noise in our statistic. More to come!