Monday 22 September 2014

Year to Year Repeatability of New Goalie Stats

For as long as people have been analyzing hockey, goalies in particular have been difficult to analyze. Based on the information publicly available up until now, it has been extremely difficult to get an accurate read on a goalie's abilities until they have seen a large amount of action. Metrics for evaluating goalie performance to date have ranged from awful (wins and GAA) to informative but unpredictable (SV%). However, a new hockey statistics site called War-On-Ice has released data that further breaks down a goalie's SV% into more detailed categories, using the shot location data provided by the NHL. Their four new stats (and one more classic stat) are as follows:
  • Low Save Percentage
  • Medium Save Percentage
  • High Save Percentage 
  • Adjusted Save Percentage
  • UnAdjusted Save Percentage (technically not new since it's just basic SV%)
While these new metrics are definitely very descriptive and informative, I really wanted to test their predictive value in goalie projections. That is, if a goalie posted a .900 SV% in a certain category last year, what should we expect them to post next year?
Save Percentage Zones via war-on-ice.com
The picture above shows the zone breakdown for each of these stats:
  • Blue = high-percentage shots (SvPctHigh)
  • Red = medium-percentage shots (SvPctMedium)
  • Yellow = low-percentage shots (SvPctLow)
Here is the breakdown of how these stats matched up versus one another from one season to the next. (Regression line in red)


Well, that wasn't very informative. We see very little correlation from one season to the next, regardless of whether you treat every shot as equal (Unadjusted SV%) or break each shot down by its general degree of difficulty (Low, Med, High SV%). I had high hopes for Adjusted Save% too (especially after I first ran these numbers and found some interesting results before realizing I had completely screwed up my data, idiot).

Adjusted Save% is still quite interesting in my eyes. As explained in the War-On-Ice Glossary:
AdjustedSvPct: The weighted average of Low, Medium and High Save Percentages, as weighted by the league average frequency of each shot type. Compare to statistical benchmarking -- correcting a simple random sample for known stratification issues.
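To make the glossary definition concrete, here is a minimal Python sketch of that weighted average. The function name, the goalie's zone save percentages and the league shot shares below are all made up for illustration; they are not War-On-Ice's actual code or numbers.

```python
def adjusted_sv_pct(zone_sv_pct, league_shot_share):
    """Weighted average of zone save percentages.

    zone_sv_pct: dict of zone name -> goalie's save percentage in that zone.
    league_shot_share: dict of zone name -> league-average share of shots
        coming from that zone (hypothetical values in the example below).
    """
    total_share = sum(league_shot_share.values())
    weighted = sum(zone_sv_pct[zone] * share
                   for zone, share in league_shot_share.items())
    # Dividing by the total lets the shares be given as counts or fractions.
    return weighted / total_share


# Made-up numbers, purely for illustration.
goalie = {"low": 0.975, "medium": 0.920, "high": 0.830}
league_share = {"low": 0.50, "medium": 0.30, "high": 0.20}

print(round(adjusted_sv_pct(goalie, league_share), 4))  # 0.9295
```

The point of weighting by the league-average mix rather than the goalie's own mix is that two goalies with identical zone save percentages get the same Adjusted Save%, no matter how many slot shots their teams allow.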
And from one of the creators, A.C. Thomas:
Here's a table with the results (we want RSQ to be close to 100% and p-value to be less than 0.05):


Type        Count  Shots Faced  AdjustedRSQ  RSQ    p-value
UnAdjusted  99     >1500        0.70%        1.72%  0.196
Adjusted    99     >750         0.11%        1.14%  0.294
Low         77     >500         2.02%        3.31%  0.1134
Med         80     >300         -0.14%       1.13%  0.3482
High        103    >200         7.81%        8.71%  0.00247
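
For anyone who wants to see how the RSQ, AdjustedRSQ and p-value columns relate to each other, here is a rough Python sketch using only the standard library. The helper names are invented for illustration, the p-value is approximated by numerically integrating the t distribution, and this is not necessarily how the numbers in the table were produced.

```python
import math


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def adjusted_r_squared(r2, n, predictors=1):
    """Penalize R^2 for the number of predictors given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def slope_p_value(r2, n, steps=2000):
    """Two-sided p-value for the slope, from R^2 and n (one predictor)."""
    if r2 >= 1:
        return 0.0
    df = n - 2
    t = math.sqrt(r2 * df / (1 - r2))
    # Simpson's rule for the area under the t density between 0 and t
    # (steps must be even).
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)


# The High row of the table: RSQ = 8.71% over 103 season pairs.
r2, n = 0.0871, 103
print(round(adjusted_r_squared(r2, n), 4))  # 0.0781
print(slope_p_value(r2, n))
```

Plugging in the High row reproduces its AdjustedRSQ of 7.81%, and the approximate p-value should land close to the 0.00247 reported in the table.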

**The Count column is simply the number of observations for each type of Save%. Shots Faced is the arbitrary cutoff I chose. For example, for a goalie's LowSvPct to be counted, they would have to have faced more than 500 shots from the designated Low area of the ice in back-to-back years. This happened 77 times between 2008 and 2014, or an average of ~15 goalies per pair of years. Also, these stats are even-strength (5v5) only.
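
The back-to-back filtering described above could be sketched in Python roughly like this. The record layout (goalie, season start year, shots, saves), the function name and the sample rows are all hypothetical, not real War-On-Ice data.

```python
from collections import defaultdict


def season_pairs(records, min_shots):
    """Pair each goalie's consecutive seasons that both clear the cutoff.

    records: iterable of (goalie, season_start_year, shots, saves) tuples
        for a single shot zone (hypothetical layout).
    Returns a list of (goalie, first_year, sv_pct_year1, sv_pct_year2).
    """
    by_goalie = defaultdict(dict)
    for goalie, year, shots, saves in records:
        by_goalie[goalie][year] = (shots, saves)

    pairs = []
    for goalie, seasons in by_goalie.items():
        for year in sorted(seasons):
            following = seasons.get(year + 1)
            if following is None:
                continue  # no next season on record
            shots_a, saves_a = seasons[year]
            shots_b, saves_b = following
            # Both seasons must have MORE than min_shots shots faced.
            if shots_a > min_shots and shots_b > min_shots:
                pairs.append((goalie, year,
                              saves_a / shots_a, saves_b / shots_b))
    return pairs


# Made-up rows, purely for illustration.
sample = [
    ("Goalie A", 2011, 620, 601),
    ("Goalie A", 2012, 540, 520),
    ("Goalie A", 2013, 410, 399),
    ("Goalie B", 2012, 700, 676),
    ("Goalie B", 2013, 655, 630),
]
pairs = season_pairs(sample, min_shots=500)
print(len(pairs))  # 2
```

Each resulting pair is one observation in the Count column: the first save percentage goes on the x-axis and the following season's on the y-axis. Note that a goalie with three qualifying seasons in a row contributes two pairs.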

The best results here were clearly for SvPctHigh, which covers shots from directly in front of the net (in the slot). While the correlation is still small, it looks better than most results we see when it comes to goalie metrics. So maybe there is a little something to a goalie's ability to defend a likely scoring chance.

Obviously, this method introduces some survivorship bias, since goalies who see lots of action in back-to-back years will tend to be of a higher caliber to begin with. Maybe I will revisit this later and try weighting goalies by their shots faced instead of simply ignoring those below an arbitrary threshold. Until then, it seems like goalies will remain one of hockey's greatest enigmas.
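
As a rough idea of what the shots-faced weighting could look like, here is a Python sketch of a weighted R-squared. This is only one possible approach and not something tested in this post; the function name is invented, and how to turn two seasons of shot counts into a single weight (for example, taking the smaller of the two) is an assumption left to the caller.

```python
def weighted_r_squared(x, y, weights):
    """Squared weighted correlation between x and y.

    x, y: save percentages in year one and year two for each season pair.
    weights: one non-negative weight per pair, e.g. derived from shots
        faced (the exact choice of weight is an assumption).
    """
    total = sum(weights)
    mx = sum(w * a for w, a in zip(weights, x)) / total
    my = sum(w * b for w, b in zip(weights, y)) / total
    sxy = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - mx) ** 2 for w, a in zip(weights, x))
    syy = sum(w * (b - my) ** 2 for w, b in zip(weights, y))
    return sxy ** 2 / (sxx * syy)
```

With equal weights this reduces to the ordinary R-squared, and a pair with a weight of zero drops out entirely, so a hard cutoff is just the special case where every weight is either 0 or 1.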