Thursday, 27 November 2014

Corsi Against Doesn't Correlate with Save Percentage


How does a goalie's workload affect their ability to perform? This question always seems to be bouncing around, and it has recently come up again with regard to whether a goalie's workload (the number of Corsi events they face) has a tangible impact on their save percentage.


Previous Literature 


The first analysis was done by Brodeur Is a Fraud and found little to no evidence of a correlation between the two variables. Another look was done over at Hockey-Graphs and found similar results with a different method:
For the forty active goaltenders to play at least one hundred NHL games over the past four seasons, there is no substantial relationship in them playing better -in terms of save percentage- when facing more or less shots against.
Chris Boyle, in his own study at SportsNet, did seem to find quite a strong relationship, yet I have some serious doubts about the validity of his methodology. Essentially, by looking at the raw shot counts and save percentages posted in individual games while removing goalies who didn't play the full game, you introduce a very serious issue of survivorship bias. Why do goalies in this study who see a large number of shots against only post high save percentages? Most likely it is because a goalie who faces a large number of shots and doesn't post a high save percentage will allow a large number of goals, which leads to them being pulled from the game, and therefore they are removed from the study. This removal doesn't happen for goalies who face a low number of shots while posting a low save percentage, because they can still allow only a low number of goals against, giving their coach no incentive to pull them. For example, a goalie faces 20 shots against and lets 2 in. That's a .900 save percentage, which in the big picture isn't good, but in an individual game allowing only two goals against is just fine. Therein lies my issue with this study.
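The survivorship argument above can be checked with a quick simulation. This is only a sketch under assumptions that are mine, not Boyle's: every goalie has the same true save percentage (.915) no matter how many shots he sees, shot totals are spread evenly between 15 and 45, and a goalie counts as "pulled" once he has allowed five goals. For simplicity the pulled game keeps its full shot total and is just dropped from the filtered sample.

```python
import random

def simulate_games(n_games, true_sv=0.915, pull_after=5, seed=0):
    """Simulate games where save percentage does NOT depend on shot volume.

    Returns a list of (shots, goals, finished) tuples. `finished` is False
    when the goalie allowed `pull_after` goals (hypothetical pull rule).
    """
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        shots = rng.randint(15, 45)
        # Each shot becomes a goal with probability 1 - true_sv.
        goals = sum(rng.random() > true_sv for _ in range(shots))
        games.append((shots, goals, goals < pull_after))
    return games

def mean_sv(games, lo, hi, finished_only):
    """Average per-game save percentage for games with lo <= shots <= hi."""
    svs = [1 - goals / shots
           for shots, goals, finished in games
           if lo <= shots <= hi and (finished or not finished_only)]
    return sum(svs) / len(svs)

games = simulate_games(20000)

# Gap between high-shot (35-45) and low-shot (15-25) games.
gap_all = mean_sv(games, 35, 45, False) - mean_sv(games, 15, 25, False)
gap_finished = mean_sv(games, 35, 45, True) - mean_sv(games, 15, 25, True)

print(f"high-minus-low SV% gap, all games:       {gap_all:+.4f}")
print(f"high-minus-low SV% gap, full games only: {gap_finished:+.4f}")
```

With every game included the gap should sit near zero, since none was built in. Once the pulled goalies are dropped, the high-shot games look better than the low-shot games even though shot volume has no effect on the simulated goalies.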

Finally, we arrive at the most recent post by David Johnson at Hockey Analysis who can summarize his own methods best:
In my opinion, the proper way to answer the question of whether shot volume leads to higher save percentages is to look at how individual goalies save percentages have varied from year to year in relation to how their CA60 has varied from year to year. To do this I looked at the past 7 seasons of data and selected all goalie seasons where the goalie played at least 1500 minutes of 5v5 ice time. I then selected all goalies who have had at least 5 such seasons. There were 23 such goalies. I then took their 5-7 years worth of CA60 and save % stats and calculated a correlation between them. 
Basically, he found the individual correlations for each goaltender and then averaged these individual correlations. I noticed a few issues, starting with the fact that correlation coefficients aren't additive. You need to first convert them to Fisher z-values, which are additive, average those, and then convert the average back to a correlation. This issue is minor: when I ran his test again, the results didn't change much with this adjustment.
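For anyone who wants to try the adjustment, here is a minimal sketch of averaging correlations through Fisher z-values. The transform is z = atanh(r) and the inverse is r = tanh(z). The function name and the example correlations are made up for illustration and are not the Hockey Analysis numbers.

```python
import math

def fisher_average(correlations):
    """Average correlations via Fisher z: r -> atanh(r), mean, then tanh back."""
    z_values = [math.atanh(r) for r in correlations]
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)

# Hypothetical per-goalie correlations, for illustration only.
example = [0.9, 0.1, -0.3]

plain_average = sum(example) / len(example)
print(round(plain_average, 3))            # simple arithmetic mean
print(round(fisher_average(example), 3))  # Fisher-averaged correlation
```

The two averages differ most when some of the individual correlations are far from zero, because atanh stretches values near +1 and -1.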

The second issue I take is with the claims made based on this study, starting with the use of the word "boost" in the title. It implies not only that there is causation here, which I am not convinced of (we simply see a correlation via his methodology), but also that there is only a positive correlation, meaning that an increase in CA/60 results in an increase in SV%. Examine the data more closely and you find that 8 of the 23 goalies saw the inverse effect (more shot attempts against lowered their SV%), while another two saw essentially zero change in SV% in relation to the shot attempts they faced. This leaves us with only 13 goalies who show a positive correlation. This leads to my issue with the author making a general assumption that CA/60 boosts save percentage as a uniform result that can be applied across the board to all goalies, when he is really only talking about a specific subgroup. Later on in this post I will explain my doubts with regard to his methods and why I believe he simply found a false positive for a relationship that doesn't exist.

My Findings


I tweeted this graph out earlier when this question was first raised on Twitter. It is a very basic graph that took me a few minutes to put together, but you can see that a team allowing more shot attempts against has essentially zero noticeable impact on its save percentage.


These next few charts look at the individual goalie level. I set different cutoffs in each graph just to see if we could weed out some goalie talent, since better goalies tend to play more minutes (unless your team is located in Winnipeg), and we still aren't able to find any strong evidence (the correlation does actually increase as we narrow the sample, jumping from about 0 to 3%).


This graph below is the same as the ones above but only using the data included in the Hockey Analysis study.



Since none of the graphs I managed to produce were able to find any correlation I decided to try my own blind recreation of the method used at Hockey Analysis. Below are two graphs very similar to the graphs first produced at Hockey Analysis that seemed to demonstrate the correlation between CA/60 and SV%. I have removed the titles of these two to add an element of surprise. Take a quick look at both before finding their titles below. 



***

***

***

Surprised? This is my basic way of suggesting that the results shown in Hockey Analysis' study could be the result of simple random variation. Pekka Rinne's chart shows how one of these samples can be pretty much out of whack on the individual level, while the Niemi vs. Howard chart shows that even when picking two variables that we know for a fact should have zero correlation with each other, with samples this small (in this case only 5 seasons, or data points) it can be pretty easy to discover a relationship that doesn't actually exist.
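To get a feel for how often five data points fool you, here is a small simulation sketch. It draws two completely unrelated series of random numbers, computes their correlation, and counts how often the result looks "strong". The normal distribution, the 0.5 threshold and the function names are my assumptions for illustration; no real goalie data is used.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def share_of_large_correlations(n_points, threshold=0.5, trials=10000, seed=1):
    """Fraction of unrelated sample pairs whose |r| exceeds `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]  # independent of xs
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials

print(share_of_large_correlations(5))   # five "seasons" per goalie
print(share_of_large_correlations(50))  # a much larger sample
```

With only five points a large share of unrelated pairs clear the 0.5 bar, while with fifty points almost none do.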

The chart below shows the data on the correlations found by Hockey Analysis. I took the liberty of converting them to Fisher z-values, averaging those, and then taking the inverse, which is the real correlation that he was looking for. So in actuality his correlation was higher than he first reported. To make things simpler I have starred* the important column here with the true correlation.

Average Correlation    Average Fisher    Average Fisher Inverse*
0.183                  0.215             0.212



The issue, as you may have seen above in the Niemi vs. Howard chart, is that it is very easy with this data set and this method to find correlations that we know for a fact shouldn't exist. Below I calculated 23 correlations and their corresponding Fisher values in my blind test. I simply put the goalies in alphabetical order and compared the CA/60 for goalie A with the SV% of goalie B.



Correlation Fisher
-0.292 -0.301
0.098 0.098
-0.098 -0.098
0.730 0.930
0.631 0.743
-0.407 -0.432
-0.726 -0.919
0.116 0.117
0.536 0.599
0.117 0.118
0.126 0.127
-0.230 -0.234
-0.131 -0.132
0.338 0.351
0.586 0.671
0.468 0.507
-0.631 -0.744
-0.383 -0.403
-0.616 -0.718
-0.708 -0.882
-0.213 -0.217
-0.095 -0.095
Average Correlation    Average Fisher    Average Fisher Inverse*
-0.036                 -0.042            -0.042
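The blind pairing described above can be sketched in a few lines. The details are assumptions on my part: each goalie is paired with the next one alphabetically (no wrap-around), series of unequal length are trimmed to the shorter one, and the three goalies and all of their numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def blind_pairings(seasons):
    """Correlate each goalie's CA/60 with the NEXT goalie's SV% (alphabetical)."""
    names = sorted(seasons)
    results = {}
    for name, other in zip(names, names[1:]):
        ca60 = seasons[name]["ca60"]
        sv = seasons[other]["sv"]
        n = min(len(ca60), len(sv))  # trim to the shorter series
        results[(name, other)] = pearson(ca60[:n], sv[:n])
    return results

# Made-up season lines for three hypothetical goalies.
seasons = {
    "Goalie A": {"ca60": [52.1, 55.4, 49.8, 57.0, 53.3],
                 "sv": [0.921, 0.915, 0.927, 0.918, 0.924]},
    "Goalie B": {"ca60": [58.2, 54.0, 56.7, 51.9, 55.5],
                 "sv": [0.912, 0.926, 0.919, 0.930, 0.917]},
    "Goalie C": {"ca60": [50.3, 53.8, 52.2, 56.1, 54.9],
                 "sv": [0.925, 0.913, 0.920, 0.916, 0.928]},
}

for (ca_goalie, sv_goalie), r in blind_pairings(seasons).items():
    print(f"{ca_goalie} CA/60 vs {sv_goalie} SV%: r = {r:+.3f}")
```

Any correlation this produces is spurious by construction, since one goalie's workload cannot drive another goalie's save percentage.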

We know from common sense and logic that the number of shot attempts faced by Evgeni Nabokov will have no effect on Henrik Lundqvist's save percentage, but the numbers actually show a correlation (.73). This is obviously a false positive, showing a correlation that doesn't truly exist. Simply stated, correlation doesn't always prove causation. Based on what I have found here and the earlier research done on the subject, I feel confident in stating there is still little to no evidence relating a goaltender's Corsi Against to their save percentage.



You can reach me via email here: DTMAboutHeart@gmail.com or via Twitter here: @DTMAboutHeart







Tuesday, 11 November 2014

NHL Draft Pick Value Chart


Drafts have always been a mystery in the sporting world. The number of teams relying on the draft to build their rosters continues to rise in an era of delicate salary caps and bigger, stronger, faster athletes. Evaluating and projecting young athletes is far from an exact science, to say the least. Look back at the 2007 NHL Entry Draft, when the Pittsburgh Penguins selected Angelo Esposito 20th overall while the Dallas Stars were able to pick up future captain Jamie Benn with the 129th pick in the 5th round. In hindsight the mistakes seem obvious, but this is hardly the standard: as you can see in the graph below, earlier picks tend to yield much higher success rates than later selections.


Goalies, as has been well documented in the past, are slightly less predictable, to say the least...


What is each draft slot worth, however? Nailing down the value of a draft slot in the NHL has been attempted many, many, many, many, many, many times. I decided that it was time to reevaluate the idea with a slightly different approach than most.

In order to come up with my values, I gathered each draft pick going back to 1970 (when the draft really started to resemble what it is today) and looked at each player's Point Shares only during their first seven seasons in the NHL. I fully recognize that catch-all statistics are not perfect evaluations of a player, but they are probably the best available statistics for judging large numbers of players throughout history. I chose Point Shares over GVT mainly because Point Shares cannot be negative. GVT, on the other hand, can be negative, which causes difficulties when comparing certain players. For example, how do you value a player who makes the NHL and records a negative GVT against a player who never played an NHL game and therefore has a zero GVT? Should that player be counted for less even though many would argue they were probably a better hockey player? It is a tough question, but thankfully Point Shares doesn't share this issue.

Looking at only a player's first seven years rather than a player's full career reflects the assumption that when a team selects a player in the draft, it is only guaranteed at most 7 years of that player's services before they hit free agency (3 years from their rookie contract and then 4 years of their RFA rights). I then fitted this data with a logarithmic curve to smooth it and to show the sharp drop-off in value after the first picks, followed by a more gradual drop for the later picks.
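For the curious, the smoothing step can be sketched as an ordinary least-squares fit of value = a + b * ln(pick). This is a minimal sketch and not necessarily the exact fit behind the chart: the function name is mine, and the pick numbers and average seven-season Point Shares below are invented purely to show the shape of the calculation.

```python
import math

def fit_log_curve(picks, values):
    """Least-squares fit of value = a + b * ln(pick); returns (a, b)."""
    xs = [math.log(p) for p in picks]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(values) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical average seven-season Point Shares at selected draft slots.
picks = [1, 2, 5, 10, 30, 60, 120, 210]
avg_point_shares = [38.0, 30.5, 22.0, 15.5, 8.0, 4.5, 2.0, 1.0]

a, b = fit_log_curve(picks, avg_point_shares)

# Smoothed value for each slot from the fitted curve.
for pick in picks:
    print(pick, round(a + b * math.log(pick), 1))
```

A negative b gives the steep early drop and the flatter tail, because each doubling of the pick number removes the same amount of value.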

In the future I hope to replicate and build on Eric T.'s work found here regarding the market value of a draft pick. Whereas my values were based on draft results, Eric based his on the market rate as determined by team trades. Comparing the two methods could provide some insight into what spots in the draft might be over- or undervalued by teams relative to their actual expected value.

Below is the grid for comparing the individual value of each pick. A reminder that these values are arbitrary numbers and should only be used to compare draft slots, not any players involved in a potential trade. This once again is an approximation from many years of data and in no way a hard rule of how every pick should be valued. Enjoy!




Wednesday, 5 November 2014

Normalized Career Player Stats



Who is the greatest goal-scorer of all time? What about playmaker? Hockey, like all sports, has evolved so much over the years that it is extremely hard to compare individuals from different eras. With the help of Rob Vollman's database and Hockey-Reference's Normalized Data, I have compared the careers of every player over the past 47 years (since the 1967 expansion) to help shed some more light on these debates.

The Normalized Data is presented just like any player stats, except all the stats are scaled to reflect certain changes throughout the league's history. The most common adjustments account for different schedule lengths, the number of players carried on each roster, and the era's scoring level, meaning the number of goals regularly being scored in those games (e.g. it was easier to score a goal in 1981 than it is in 2014).
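To make the idea concrete, here is a simplified sketch of two of those adjustments, schedule length and scoring era. It is not Hockey-Reference's exact formula: the baselines (an 82-game schedule and 6.0 league goals per game) are assumptions of mine, the roster-size adjustment is left out for brevity, and the season in the example is invented.

```python
def adjust_goals(goals, season_games, league_goals_per_game,
                 base_games=82, base_goals_per_game=6.0):
    """Scale a raw goal total to an assumed common schedule and scoring level."""
    schedule_factor = base_games / season_games             # longer or shorter season
    era_factor = base_goals_per_game / league_goals_per_game  # higher or lower scoring
    return goals * schedule_factor * era_factor

# Hypothetical season: 50 goals in an 80-game schedule during a
# high-scoring era averaging 8.0 goals per game league-wide.
print(round(adjust_goals(50, season_games=80, league_goals_per_game=8.0), 1))
```

A season played under the baseline conditions comes back unchanged, and the same raw total is worth less when it comes from a higher-scoring era.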

You can filter and sort this table at your own discretion and pleasure, enjoy!


*Players needed at least 300 Games Played by the end of the 2013-2014 season to qualify
**Even if a player's career began before 1967 this chart will only reflect their stats since 1967

Observations

  • The data is obviously skewed towards players whose careers have yet to end. It is extremely hard to maintain high levels of play throughout your entire career which is why active players still in or near their primes will see their stats slightly inflated. 
  • Bobby Orr was amazing. He absolutely dominated the game from an offensive standpoint in a way that we will probably never see again.
  • Sidney Crosby is the greatest player alive and one of the best ever.
  • Ovechkin is probably one of the greatest goal scorers to ever lace up the skates. It still amazes me how much garbage is thrown Ovechkin's way by people who have a seriously flawed understanding of the game of hockey or are simply trying to grab a headline. Ovechkin is one of the most lethal goal scorers of the league's past half-century, if not the most lethal, and we should all just appreciate the opportunity to bear witness.
  • Kovalchuk's stats will forever be skewed by the fact that he played out his best years in the NHL before bolting to the KHL, which essentially ensures that his career rate stats will never suffer as he ages. He did have a great run, though, while it lasted.
  • Lemieux and Gretzky come down to the wire here. Lemieux has the better era-adjusted PTS/Game due to his big years occurring in the 90s, as opposed to Gretzky, who succeeded in the high-flying 80s. Gretzky, however, played about 500 more games, which has to count in his favour when comparing the two.
  • Jagr is ageless. He keeps on clicking at a ridiculous rate despite taking 3 years off to play in Europe, only to come back and put up unheard-of numbers for a player older than 40.
  • Cam Janssen just nudges out Colton Orr for the worst PTS/Game of any regular forward in the last half-century. Likewise, Wade Belak takes home the title of least offensive defenceman of the modern era.