Don't Tell Me About Heart: 2014

Sunday, 14 December 2014

How Long Does It Take For A Forward's Shooting To Stabilize?

If a player scores one goal on five shots, does that mean they are suddenly a 20% shooter? What if it is 10 goals on 50 shots? How about 20 goals on 100 shots? This is a classic sample-size problem: separating the signal (talent) from the noise (randomness). In other words, how big does a sample need to be before it stops being small? The question has been tackled before in other sports, see baseball here and basketball here, and my analysis will mirror a lot of the methodology laid out in those pieces.

Relating the problem to a player's shooting talent, how many shots does a player need to take before we can separate the talent from the randomness?

Now if you don't care about the math then please skip to the *** for the answer and analysis.

Otherwise, let's dive in!

The most common method for testing this is split-half reliability testing. For example, if we wanted to know how stable a player's shooting percentage is after 100 shots, we would label each shot from 1 to 100, randomly split those 100 shots into two 50-shot samples, and then compare the player's shooting percentage between the two samples. This method is fine, but in our case it can be improved upon by using the Kuder-Richardson Formula 21 (KR-21).

This formula tells us the reliability of a test with binary outcomes (only two possible results), which suits this problem well, since every shot a player takes ends in one of two ways: a save or a goal. KR-21 lets us perform a split-half reliability test, but instead of comparing only one particular split, it compares every possible split of those outcomes. For example, a basic split-half test on a total sample of 10 shots (labelled 1 through 10) might compare the even-numbered shots with the odd-numbered shots. KR-21 goes further and accounts for every possible split (e.g. evens vs. odds, 1-5 vs. 6-10, 1 2 3 9 10 vs. 4 5 6 7 8, etc.). The result of this 10-shot KR-21 test will be a much better estimate of how reliably a stat reflects a player's true talent level over a 5-shot sample (10 divided by 2 = 5).
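For anyone who wants to try this at home, KR-21 only needs three ingredients: the number of binary items k (shots), and the mean and variance of each player's total score (goals). Below is a minimal Python sketch of the standard formula, r = k/(k-1) x (1 - M(k - M)/(k x variance)). It assumes every player is measured over exactly the same k shots and uses the population variance of the goal totals; the function name and the sample numbers are mine, made up for illustration, not the data behind this post.

```python
from statistics import mean, pvariance

def kr21(goal_totals, k):
    """Kuder-Richardson Formula 21 reliability for a k-item binary test.

    goal_totals: one entry per player, the goals scored on exactly k shots
    (an assumption of this sketch: every player is measured over the same k shots).
    """
    if k < 2:
        raise ValueError("need at least two shots per player")
    m = mean(goal_totals)           # mean total score (goals)
    var = pvariance(goal_totals)    # variance of total scores across players
    if var == 0:
        raise ValueError("goal totals have no spread; reliability is undefined")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))

# Hypothetical goal totals for five players over their first 4 shots each.
print(kr21([0, 1, 2, 3, 4], k=4))
```

With real data, each entry of `goal_totals` would be one player's goals over his first k shots, recomputed for each cutoff k in the table.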

Our goal is to reach a reliability of 0.707, the point at which the signal (skill/talent) begins to overtake the noise (randomness/luck) in our sample (0.707 x 0.707 = 50%). Below I have charted shots against reliability to show how the reliability of a sample changes as the sample's cutoff point increases. The blue line is a logarithmic curve fitted to the reliability data (with an R-squared of 0.99626), which I used instead of simply connecting the data points. I went with the log curve because, as you might notice in the table below, I got a tad lazy and stopped running the numbers as frequently for bigger samples, and the fitted curve shows the relationship just as well. The red line marks the 0.707 cutoff where talent begins to overtake randomness. Above the red line = good, below the red line = not good.

***

I found that after about 223 shots the reliability will cross the 0.707 threshold.

Shots	Reliability	Signal (Talent)	Noise (Luck)
25	0.169	2.8%	97.2%
50	0.317	10.0%	90.0%
75	0.410	16.8%	83.2%
100	0.493	24.3%	75.7%
125	0.560	31.4%	68.6%
150	0.604	36.5%	63.5%
175	0.656	43.0%	57.0%
200	0.677	45.9%	54.1%
212.5	0.693	48.0%	52.0%
217.5	0.704	49.5%	50.5%
222.5	0.707	50.0%	50.0%
225	0.712	50.6%	49.4%
250	0.732	53.6%	46.4%
300	0.765	58.5%	41.5%
375	0.805	64.9%	35.1%
500	0.891	79.5%	20.5%
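The Signal and Noise columns come straight from the reliability: signal is the reliability squared and noise is whatever is left over. A tiny Python helper reproduces a few rows of the table to within rounding (the helper's name is mine):

```python
def split_signal_noise(reliability):
    """Signal share = reliability squared; noise share = 1 - signal."""
    signal = reliability ** 2
    return signal, 1 - signal

for shots, r in [(100, 0.493), (222.5, 0.707), (500, 0.891)]:
    signal, noise = split_signal_noise(r)
    print(f"{shots:>6} shots: {signal:.1%} signal, {noise:.1%} noise")
```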

We now know that at 223 shots a player's shooting percentage is about 50% skill and 50% luck, which is still a lot of noise. We need about 400 shots before a player's talent really begins to shine through. This once again demonstrates how easy it is to be fooled by small sample sizes. While 223 may seem like a reasonable number, only 40 players last season (2013-2014), or just over 6% of the entire league, recorded more than 223 shots. Alexander Ovechkin led the league with 386 shots (along with a 13.6% shooting percentage), and even that only gives us a signal strength of about 65%.

This isn't necessarily meant to be predictive. That is to say, just because John shot 9% over 223 shots doesn't mean we should expect John to shoot 9% over his next 223 shots. If John shoots 17% over his next 50 games, did he suddenly become a better shooter? Probably not. However, if John shoots 12% over his next 223 shots, a case can actually be made that he may have improved his true shooting talent.

This all goes to show that it takes quite a while for a player's shooting percentage to stabilize. Many are quick to draw conclusions about a player's actual ability from a single season, which rarely makes sense: as we can see here, the vast majority of the league takes so few shots in a season that separating the signal from the noise is incredibly difficult. There is definitely talent at the heart of a player's ability to score goals; it just takes some time for that talent to become evident.

Thursday, 27 November 2014

Corsi Against Doesn't Correlate with Save Percentage

How does a goalie's workload affect their ability to perform? This question always seems to be bouncing around, and it has recently come up again with regard to whether a goalie's workload (the number of Corsi events they face) has a tangible impact on their save percentage.

Previous Literature

The first analysis was done by Brodeur Is a Fraud and found little to no evidence of a correlation between the two variables. Another look was done over at Hockey-Graphs and found similar results with a different method:

For the forty active goaltenders to play at least one hundred NHL games over the past four seasons, there is no substantial relationship in them playing better -in terms of save percentage- when facing more or less shots against.

Chris Boyle, in his own study at SportsNet, did seem to find quite a strong relationship, yet I have some serious doubts about the validity of his methodology. By looking at the raw shot counts and save percentages posted in individual games while removing goalies who didn't play the full game, you end up with a serious survivorship bias. Why do the goalies in this study who face a large number of shots post only high save percentages? Most likely because a goalie who faces a large number of shots without posting a high save percentage will allow a large number of goals, which gets him pulled from the game and therefore removed from the study. This removal doesn't happen to goalies who face a low number of shots while posting a low save percentage, because they can still allow only a few goals, giving their coach no incentive to pull them. For example, a goalie faces 20 shots and lets 2 in. That's a .900 save percentage, which in the big picture isn't good, but in an individual game allowing only two goals is just fine. Therein lies my issue with this study.
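To show how strong this survivorship effect can be, here is a toy Python model of my own (not Boyle's data or method). Every goalie has the same true .910 save percentage no matter how many shots he faces, shot totals from 20 to 40 are equally likely, and a goalie is assumed to be pulled, and his game dropped, once he allows 4 goals. Instead of a random simulation it enumerates every possible (shots, goals) game weighted by its binomial probability, so the result is exact for those assumptions.

```python
from math import comb, sqrt

def weighted_corr(xs, ys, ws):
    """Pearson correlation where each (x, y) point carries weight w."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    cov = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws)) / total
    vx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws)) / total
    vy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws)) / total
    return cov / sqrt(vx * vy)

def shots_vs_save_pct(true_sv=0.910, shot_counts=range(20, 41), pulled_at=None):
    """Correlation between shots faced and game SV% when workload has no real effect.

    Every possible (shots, goals) game is weighted by its binomial probability, with
    each shot count equally likely. If pulled_at is set, games in which the goalie
    allowed pulled_at or more goals are dropped, mimicking a pulled goalie being
    removed from the study.
    """
    p_goal = 1 - true_sv
    xs, ys, ws = [], [], []
    for n in shot_counts:
        for g in range(n + 1):
            if pulled_at is not None and g >= pulled_at:
                continue
            xs.append(n)
            ys.append(1 - g / n)
            ws.append(comb(n, g) * p_goal**g * true_sv**(n - g))
    return weighted_corr(xs, ys, ws)

print(shots_vs_save_pct())             # every game kept
print(shots_vs_save_pct(pulled_at=4))  # games with a pulled goalie removed
```

With every game kept, the correlation between shots faced and save percentage is essentially zero, as it should be. Once the pulled games are removed, a positive correlation appears even though workload has no effect at all by construction.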

Finally, we arrive at the most recent post by David Johnson at Hockey Analysis who can summarize his own methods best:

In my opinion, the proper way to answer the question of whether shot volume leads to higher save percentages is to look at how individual goalies save percentages have varied from year to year in relation to how their CA60 has varied from year to year. To do this I looked at the past 7 seasons of data and selected all goalie seasons where the goalie played at least 1500 minutes of 5v5 ice time. I then selected all goalies who have had at least 5 such seasons. There were 23 such goalies. I then took their 5-7 years worth of CA60 and save % stats and calculated a correlation between them.

Basically, he found the individual correlation for each goaltender and then averaged these individual correlations. I noticed a few issues, starting with the fact that correlation coefficients aren't additive. You first need to convert them to Fisher z values, which are additive. This issue is minor: when I reran his test with this adjustment, the results didn't change much.
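Here is what that conversion looks like in a minimal Python sketch: transform each r with z = atanh(r), average the z values, then transform back with tanh. The example numbers are arbitrary.

```python
from math import atanh, tanh

def average_correlation(rs):
    """Average Pearson correlations via Fisher's z-transform (each |r| must be < 1)."""
    zs = [atanh(r) for r in rs]      # r -> z, where averaging is legitimate
    return tanh(sum(zs) / len(zs))   # mean z -> back to the correlation scale

rs = [0.9, 0.1]
print(sum(rs) / len(rs))        # naive average of the raw coefficients
print(average_correlation(rs))  # Fisher-z average
```

Averaging 0.9 and 0.1 naively gives 0.5, while the Fisher-z average comes out around 0.66, because atanh stretches strong correlations more than weak ones.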

My second issue is with the claims made based on this study, starting with the use of the word "boost" in the title. It implies not only causation, which I am not convinced of (his methodology only shows a correlation), but also that the correlation is uniformly positive, meaning that an increase in CA/60 results in an increase in SV%. Examining the data more closely, you find that 8 of the 23 goalies saw the inverse effect (more shot attempts against lowered their SV%), while another two saw essentially zero change in SV% in relation to the shot attempts they faced. That leaves only 13 goalies with a positive correlation. So my issue is with the author presenting CA/60 boosting save percentage as a uniform result that applies across the board to all goalies, when he is really only talking about a specific subgroup. Later in this post I will explain my doubts about his methods and why I believe he simply found a false positive for a relationship that doesn't exist.

My Findings

I tweeted this graph out earlier when this question was first raised on Twitter. It is a very basic graph that took me a few minutes to put together, but you can see that a team allowing more shot attempts against has essentially zero noticeable impact on its save percentage.

These next few charts look at the individual goalie level. I set different cutoffs in each graph to see whether we could weed out some of the effect of goalie talent, since better goalies tend to play more minutes (unless your team is located in Winnipeg), and we still aren't able to find any strong evidence (the correlation does increase slightly as we narrow the sample, jumping from about 0 to 3%).

This graph below is the same as the ones above but only using the data included in the Hockey Analysis study.

Since none of the graphs I managed to produce were able to find any correlation I decided to try my own blind recreation of the method used at Hockey Analysis. Below are two graphs very similar to the graphs first produced at Hockey Analysis that seemed to demonstrate the correlation between CA/60 and SV%. I have removed the titles of these two to add an element of surprise. Take a quick look at both before finding their titles below.

***

Surprised? This is my basic way of suggesting that the results shown in the Hockey Analysis study could simply be the product of random variation. Pekka Rinne's chart shows how one of these samples can be pretty much out of whack at the individual level, while the Niemi vs. Howard chart shows that even when picking two variables we know for a fact should have zero correlation with each other, it can be pretty easy to discover a relationship that doesn't actually exist when dealing with samples this small (in this case only 5 seasons, or data points).

The chart below shows the data on the correlations found by Hockey Analysis. I took the liberty of converting them to Fisher z values, averaging those, and then taking the inverse of that average, which gives the actual average correlation he was looking for. So his correlation was in fact slightly higher than he first reported. To make things simpler, I have starred* the column with the true correlation.

Average Correlation	Average Fisher	Average Fisher Inverse*
0.183	0.215	0.212

As you may have seen in the Niemi vs. Howard chart above, the issue is that with this data set and this method it is very easy to find correlations that we know for a fact shouldn't exist. Below I calculated 23 correlations and their corresponding Fisher values in my blind test. I simply put the goalies in alphabetical order and compared the CA/60 of goalie A with the SV% of goalie B.

Correlation	Fisher
-0.292	-0.301
0.098	0.098
-0.098	-0.098
0.730	0.930
0.631	0.743
-0.407	-0.432
-0.726	-0.919
0.116	0.117
0.536	0.599
0.117	0.118
0.126	0.127
-0.230	-0.234
-0.131	-0.132
0.338	0.351
0.586	0.671
0.468	0.507
-0.631	-0.744
-0.383	-0.403
-0.616	-0.718
-0.708	-0.882
-0.213	-0.217
-0.095	-0.095
Average	Average Fisher	Fisher Inverse*
-0.784	-0.916	-0.724

We know from common sense and logic that the number of shot attempts faced by Evgeni Nabokov has no effect on Henrik Lundqvist's save percentage, yet the numbers actually show a correlation (.73). This is obviously a false positive, showing a correlation that doesn't truly exist. Simply stated, correlation does not prove causation. Based on what I have found here and the earlier research on the subject, I feel confident stating that there is still little to no evidence of a relationship between a goaltender's Corsi Against and their save percentage.

You can reach me via email here: DTMAboutHeart@gmail.com or via Twitter here: @DTMAboutHeart

Tuesday, 11 November 2014

NHL Draft Pick Value Chart

Drafts have always been a mystery in the sporting world. The number of teams relying on the draft to build their rosters continues to rise in an era of delicate salary caps and bigger, stronger, faster athletes. Evaluating and projecting young athletes is far from an exact science, to say the least. Look back at the 2007 NHL Entry Draft, when the Pittsburgh Penguins selected Angelo Esposito 20th overall while the Dallas Stars were able to pick up future captain Jamie Benn with the 129th pick in the 5th round. In hindsight the mistakes seem obvious, but this is hardly the norm: as you can see in the graph below, earlier picks tend to yield much higher success rates than later selections.

Goalies, as has been well documented in the past, are slightly less predictable...

What is each draft slot worth however? Attempting to nail down the value of a draft slot in the NHL has been attempted many, many, many, many, many, many times. I decided that it was time to reevaluate the idea from a slightly different approach than most.

In order to come up with my values, I gathered every draft pick going back to 1970 (when the draft really started to resemble what it is today) and looked at each player's Point Shares during only their first seven seasons in the NHL. I fully recognize that catch-all statistics are not perfect evaluations of a player, but they are probably the best available statistics for judging large numbers of players throughout history. I chose Point Shares over GVT mainly because Point Shares cannot be negative, whereas GVT can, which causes difficulties when comparing certain players. For example, how do you value a player who makes the NHL and records a negative GVT against a player who never played an NHL game and therefore has a GVT of zero? Should the first player be counted as less valuable even though many would argue he was probably the better hockey player? It is a tough question, but thankfully Point Shares doesn't have this issue.

Looking at only a player's first seven years rather than his full career reflects the assumption that when a team selects a player in the draft, it is guaranteed at most 7 years of that player's services before he hits free agency (3 years from his rookie contract and then 4 years of RFA rights). I then fitted this data with a logarithmic curve to smooth it, which shows the sharp drop-off in value after the first picks followed by a more gradual decline for the later picks.

In the future I hope to replicate and build on Eric T.'s work found here regarding the market value of a draft pick. Whereas my values are based on draft results, Eric based his on the market rate as determined by team trades. Comparing the two methods could provide some insight into which spots in the draft might be over- or undervalued by teams relative to their actual expected value.

Below is the grid for comparing the individual value of each pick. Keep in mind that these values are arbitrary numbers and should only be used to compare draft slots, not any players involved in a potential trade. Once again, this is an approximation based on many years of data and in no way a hard rule for how every pick should be valued. Enjoy!


Wednesday, 5 November 2014

Normalized Career Player Stats

Who is the greatest goal scorer of all time? What about playmaker? Hockey, like all sports, has evolved so much over the years that it is extremely hard to compare players from different eras. With the help of Rob Vollman's database and Hockey-Reference's Normalized Data, I have compared the careers of every player over the past 47 years (since the 1967 expansion) to help shed some more light on these debates.

The Normalized Data is presented just like any other player stats, except all the stats are scaled to reflect certain changes throughout the league's history. The most common adjustments account for different schedule lengths, the number of players carried on each roster, and era, that is, how many goals were typically being scored in those games (e.g. it was easier to score a goal in 1981 than it is in 2014).

You can filter and sort this table at your own discretion and pleasure, enjoy!

*Players needed at least 300 Games Played by the end of the 2013-2014 season to qualify
**Even if a player's career began before 1967 this chart will only reflect their stats since 1967
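As a rough illustration of the two simplest adjustments described above, here is a Python sketch that rescales a counting stat to a common scoring environment and schedule length. The 6.0 goals-per-game and 82-game baselines are assumptions chosen for the example, roster-size adjustments are left out, and this is not claimed to be Hockey-Reference's exact formula.

```python
def era_adjust(raw, season_gpg, season_games, baseline_gpg=6.0, baseline_games=82):
    """Rescale a season counting stat to a common scoring level and schedule length.

    season_gpg: league-wide goals per game (both teams combined) in that season.
    baseline_gpg, baseline_games: the common reference point (illustrative defaults).
    Roster-size adjustments are deliberately left out of this sketch.
    """
    scoring_factor = baseline_gpg / season_gpg       # deflate high-scoring eras, inflate low ones
    schedule_factor = baseline_games / season_games  # put every season on the same schedule length
    return raw * scoring_factor * schedule_factor

# Hypothetical: 50 goals on an 80-game schedule in a league averaging 8 goals per game.
print(round(era_adjust(50, season_gpg=8.0, season_games=80), 1))
```

In that hypothetical, the 50-goal season works out to about 38 adjusted goals.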

Observations

The data is obviously skewed towards players whose careers have yet to end. It is extremely hard to maintain high levels of play throughout your entire career which is why active players still in or near their primes will see their stats slightly inflated.
Bobby Orr was amazing. He dominated the game offensively to a degree we will probably never see again.
Sidney Crosby is the greatest player alive and one of the best ever.
Ovechkin is probably one of the greatest goal scorers to ever lace up the skates. It still amazes me how much garbage is thrown Ovechkin's way by people who either have a seriously flawed understanding of the game of hockey or are simply trying to grab a headline. Ovechkin is one of the most lethal goal scorers of the league's past half-century, if not the most lethal, and we should all just appreciate the opportunity to bear witness.
Kovalchuk's stats will forever be skewed by the fact that he played out his best years in the NHL before bolting to the KHL, which ensures that his career rate stats will never suffer as he ages. He did have a great run while it lasted, though.
Lemieux and Gretzky come down to the wire here. Lemieux has the better era-adjusted PTS/Game because his big years came in the 90s, whereas Gretzky thrived in the high-flying 80s. Gretzky, however, played about 500 more games, which has to count in his favour when comparing the two.
Jagr is ageless. He keeps on clicking at a ridiculous rate despite taking 3 years off to play in Europe only to come back and put up unheard of numbers for a player older than 40.
Cam Janssen just nudges out Colton Orr for the worst PTS/Game of any regular forward in the last half century. Likewise, Wade Belak takes home the title of least offensive defenceman of the modern era.

Friday, 3 October 2014

Maggie Projections 2014-2015

The start of the new NHL season is right around the corner, and with that I present my first instalment of the Maggie Projections for the 2014-2015 season. Essentially, these are projections for the upcoming NHL season based on the system developed by Tom Tango for baseball about a decade ago.

It is the most basic forecasting system you can have, that uses as little intelligence as possible. So, that's the allusion to the monkey. It uses 3 years of MLB data, with the most recent data weighted heavier. It regresses towards the mean. And it has an age factor.

Tango named his system Marcel after Marcel the monkey, the idea being that the projections are so basic a monkey could do them. To avoid any potential confusion and add a little hockey flavour, I have affectionately named these projections after Maggie the Monkey. A quick history lesson for those who may not know: Maggie the Monkey was a recurring guest on TSN during the playoffs, spinning a giant wheel to predict playoff rounds. I think it was a brilliant display of the randomness of hockey and the unpredictability of the ~~small sample tournament~~ Stanley Cup Playoffs. Her record was pretty impressive all things considered (I must remind you, it's a monkey spinning a giant wheel): she was 50% for her career and 53.33% before her tough last season.
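A bare-bones Marcel-style projection can be sketched in Python as below. The 5/4/3 season weights and the age rule (peak at 29, +0.6% per year younger, -0.3% per year older) are Tango's baseball settings as commonly described; the points-per-game framing, the 30 games of league-average regression, and all the sample numbers are hypothetical placeholders, not necessarily the settings used for these projections.

```python
def marcel_rate(seasons, league_rate, age, regress_games=30.0, weights=(5, 4, 3)):
    """Marcel-style points-per-game projection.

    seasons: [(points, games_played), ...], most recent season first, up to three.
    league_rate: league-average points per game to regress toward (assumed input).
    regress_games: games of league-average production blended in (hypothetical value).
    """
    weighted_pts = sum(w * pts for w, (pts, gp) in zip(weights, seasons))
    weighted_gp = sum(w * gp for w, (pts, gp) in zip(weights, seasons))
    # Regression to the mean: blend in regress_games worth of league-average production.
    rate = (weighted_pts + league_rate * regress_games) / (weighted_gp + regress_games)
    # Age factor from Tango's baseball Marcel: peak at 29.
    age_factor = (29 - age) * (0.006 if age < 29 else 0.003)
    return rate * (1 + age_factor)

# Hypothetical 31-year-old forward, most recent season first.
ppg = marcel_rate([(60, 82), (55, 80), (48, 75)], league_rate=0.45, age=31)
print(f"{ppg:.3f} points per game, about {ppg * 82:.0f} points over 82 games")
```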

I may look to try and improve on these projections at a later date by adding on new stats and adjusting the projections with some tweaks here and there. Reminder, I do not stand behind these forecasts as this is essentially one big formula that I have taken and applied to hockey with no subjective input from me at all. (Credit to Rob Vollman's player spreadsheets for my data)

To save people some time, please use the following format for all complaints:

<player> is clearly ranked <too high/too low> because <reason unrelated to Maggie Projection system>. <subjective ranking system> is way better than this. <unrelated player-supporting or -denigrating comment, preferably with poor spelling and/or chat-acceptable spelling>

So without further ado, here are the Maggie Projections for the 2014-2015 season.


Monday, 22 September 2014

Year to Year Repeatability of New Goalie Stats

For as long as people have been analyzing hockey, goalies in particular have been difficult to analyze. Based on the information available to the masses until now, it has been extremely difficult to get an accurate read on a goalie's abilities until he sees a large amount of action. Metrics for evaluating goalie performance have so far ranged from awful (wins and GAA) to informative but unpredictable (SV%). However, a new hockey statistics site called War-On-Ice has released data that further breaks down a goalie's SV% into more detailed categories based on the shot location data provided by the NHL. Their four new stats (plus one classic stat) are as follows:

Low Save Percentage
Medium Save Percentage
High Save Percentage
Adjusted Save Percentage
UnAdjusted Save Percentage (technically not new since it's just basic SV%)

While these new metrics are definitely very descriptive and informative, I really wanted to test their predictive value for goalie projections. That is, if a goalie posted a .900 SV% in a certain category last year, what should we expect from him next year?

Save Percentage Zones via war-on-ice.com

The picture above shows the zone breakdown for each of these stats:

Blue = high percentage shots (SvPctHigh)
Red = medium percentage shots (SvPctMedium)
Yellow = low-percentage shots (SvPctLow)

Here is the breakdown of how these stats matched up versus one another from one season to the next. (Regression line in red)

Well, that wasn't very informative. We see very little correlation from one season to the next, regardless of whether you treat every shot as equal (Unadjusted SV%) or break each shot down by its general degree of difficulty (Low, Med, High SV%). I had high hopes for Adjusted Save% too (especially after I first ran these numbers and found some interesting results, before realizing I had completely screwed up my data, idiot).

Adjusted Save% is still quite interesting in my eyes as explained in the War-On-Ice Glossary:

AdjustedSvPct: The weighted average of Low, Medium and High Save Percentages, as weighted by the league average frequency of each shot type. Compare to statistical benchmarking -- correcting a simple random sample for known stratification issues.

And from one of the creators, A.C. Thomas:

@DTMAboutHeart So if a goalie gets more tough shots, they won't count as much in the final tally and drag down their total
— A.C. Thomas (@acthomasca) September 18, 2014
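The weighting described in the glossary can be sketched in a few lines of Python. The zone save percentages, shot counts and league-wide shot shares below are hypothetical numbers chosen to show the effect: a goalie who faces an unusually high share of tough shots is no longer dragged down by his shot mix.

```python
def unadjusted_sv_pct(sv_by_zone, shots_by_zone):
    """Plain SV%: zone save percentages weighted by the goalie's own shot mix."""
    total = sum(shots_by_zone.values())
    return sum(sv_by_zone[z] * shots_by_zone[z] for z in shots_by_zone) / total

def adjusted_sv_pct(sv_by_zone, league_shot_share):
    """Zone save percentages weighted by league-average shot frequency instead."""
    total = sum(league_shot_share.values())
    return sum(sv_by_zone[z] * league_shot_share[z] for z in league_shot_share) / total

# Hypothetical goalie who faces an unusually large share of high-danger shots.
goalie_sv = {"low": 0.975, "medium": 0.930, "high": 0.850}
goalie_shots = {"low": 300, "medium": 300, "high": 400}
league_share = {"low": 0.50, "medium": 0.30, "high": 0.20}  # assumed league frequencies

print(f"unadjusted: {unadjusted_sv_pct(goalie_sv, goalie_shots):.4f}")
print(f"adjusted:   {adjusted_sv_pct(goalie_sv, league_share):.4f}")
```

Here the unadjusted figure is .9115 while the adjusted one is .9365, because the goalie's 40% share of high-danger shots is replaced by the assumed league share of 20%.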

Here's a table with the results (we want RSQ to be close to 100% and p-value to be less than 0.05):

Type	Count	Shots Faced	Adjusted RSQ	RSQ	p-value
UnAdjusted	99	>1500	0.70%	1.72%	0.196
Adjusted	99	>750	0.11%	1.14%	0.294
Low	77	>500	2.02%	3.31%	0.1134
Med	80	>300	-0.14%	1.13%	0.3482
High	103	>200	7.81%	8.71%	0.00247

**The Count column is simply the number of observations for each type of Save%. Shots Faced is the arbitrary cutoff I set for the number of shots faced. E.g. for a goalie's LowSvPct to be counted, he would have to have faced more than 500 shots from the designated Low area of the ice in back-to-back years. This happened 77 times between 2008 and 2014, or an average of ~15 goalies per pair of years. Also, these stats are just even-strength (5v5).
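For reference, the Adjusted RSQ column follows from RSQ and the count with the usual formula for one predictor, 1 - (1 - R^2)(n - 1)/(n - 2), and the p-values are consistent with the standard t test on the regression slope. A small Python check (turning t into a p-value needs a t distribution, for example from SciPy, so only the t statistic is shown):

```python
from math import sqrt

def adjusted_r2(r2, n, predictors=1):
    """Adjusted R-squared for a regression with `predictors` explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - predictors - 1)

def slope_t_stat(r2, n):
    """t statistic for the slope of a simple regression (n - 2 degrees of freedom)."""
    return sqrt(r2 * (n - 2) / (1 - r2))

# RSQ and Count from the Low and High rows of the table above.
for label, r2, n in [("Low", 0.0331, 77), ("High", 0.0871, 103)]:
    print(f"{label}: adjusted RSQ = {adjusted_r2(r2, n):.2%}, t = {slope_t_stat(r2, n):.2f}")
```

Plugging in the Low and High rows gives adjusted values of about 2.02% and 7.81%, matching the table.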

The best results here were clearly for SvPctHigh, which covers shots from directly in front of the net (in the slot). While the correlation is still small, it is better than most results we see when it comes to goalie metrics. So maybe there is a little something to a goalie's ability to stop likely scoring chances.

Obviously, this method introduces some survivorship bias, since goalies who see lots of action in back-to-back years will tend to be of a higher caliber to begin with. Maybe I will revisit this later and weight goalies by their shots faced instead of simply ignoring those below an arbitrary threshold. Until then, it seems goalies will remain one of hockey's greatest enigmas.

Thursday, 21 August 2014

Drafting Strategies - Reaches vs. Fallers

Yes, I am aware this isn't the timeliest of articles, but please bear with me. If you have ever watched a televised professional sports draft, you have probably noticed that there are always players who seem to be "reaches" or players who seem to "fall" on draft day. What makes a player a "reach"? A reach would be a player like Derrick Pouliot in the 2012 NHL Entry Draft. Pouliot was ranked 17th in TSN's rankings, but the Pittsburgh Penguins stepped to the podium and selected him 8th overall. That's a fairly substantial jump. Opposite that, we have the fallers. What is a faller? Look no further than Teuvo Teravainen from that very same draft. Ranked 7th heading into the draft, Teuvo had to wait until the 18th pick before he was snatched up by the Chicago Blackhawks (all TSN rankings used in this article can be found here).

What I am attempting to do here is find out whether one strategy is really more effective than the other. Should you trust your gut and draft the guy you know everyone else has ranked lower than this spot? Should you be tempted to grab the player everyone seems to be passing on? At the end of the day I expect each team to stick to its guns and draft the highest player on its own list. But maybe it wouldn't hurt for teams to look at what other people think and ask themselves, "Are we way off on this?"

Let's pretend for a second that it's the 2005 NHL Entry Draft and you're the San Jose Sharks with the 8th overall pick. You're really high on this winger from the WHL named Devin Setoguchi which is great and all except he is ranked 26th overall (in hindsight you can do a lot worse with a first round pick than Setoguchi but this example still works). But you take a second to look at some other rankings and notice TSN has this Slovenian centre by the name of Anze Kopitar ranked 5th overall, interesting. In the end the Sharks stuck to their gut and drafted Setoguchi, leaving Kopitar to be snatched by the Kings at 11th overall and well the rest is history. This exercise was to help dive into that decision making process and see if history can teach us anything:

The x-axis of this histogram ranges from reaches to fallers (left to right) and gives us a good idea of team drafting patterns. Teams aren't afraid to take slight reaches (as seen by the tallest column just left of 0) but aren't as eager to pick up players who seem to be falling, as shown by the more spread-out pattern to the right of 0. If you aren't really up to speed on histograms, learn more here.

This looks at the total games played by each player picked in the 1st round between 2004 and 2009, excluding goaltenders. Nothing really new to see here, other than of course how many "sure things" never end up cracking the NHL.

Multiple R-squared: 0.0663 - 6.63%
Adjusted R-squared: 0.06057 - 6.06%
p-value: 0.0008414

So the correlation isn't strong here (shown by the R^2 value), which I would have guessed just from looking at the scatter plot. Interesting, though, is that the p-value is well below 0.05 (you can brush up on p-values here). Essentially, what we can infer from these results is that there isn't a strong correlation (shown by the low R^2); e.g. just because you fall X spots doesn't mean you'll play Y NHL games. There does, however, appear to be a statistically significant relationship between how far a player falls or is reached for and the number of games he goes on to play.

Of course, using GP alone as a measure of a successful draft pick is hardly fair when you consider that a player drafted in 2004 will by default have had more opportunity to play games than a player drafted in 2009. So another way I tackled this issue was to divide the players into separate bins based on the difference between where they were ranked and where they were selected (explained below the table).

	Bins	Count	Average GP	% Success	Avg Diff
Fallers	1-20%	33	88.64	27.27%	17.42
	21-40%	33	250.73	66.67%	4.76
	41-60%	33	315.30	81.82%	0.79
	61-80%	33	286.82	78.79%	-0.70
Reaches	81-100%	33	173.94	57.58%	-2.00
Fallers	> 5	46	117.91	45.45%	14.39
	> 0 and < 5	41	131.83	96.97%	2.34
	0	27	348.00	69.70%	0.00
	> -5 and < 0	39	222.59	78.79%	-2.38
Reaches	< -5	12	179.00	21.21%	-9.92

Bins - I grouped the players based on how high or low they were taken relative to their TSN Ranking

1-20% bin holds the 33 biggest fallers, while the 81%-100% bin holds the 33 biggest reaches
The 2nd set of bins holds players based on an arbitrary cutoff point I came up with, ex. >5 contains the 46 players who were taken at least 6 spots below their TSN Ranking

Count - How many players qualify for each bin
Average GP - Average NHL Games Played by the players in that bin
% Success - # of players to play at least 100 NHL Games in that bin / Total players in that bin
Avg Diff - Average difference between where the player was ranked and where they were drafted; + means faller while - means a reach.
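The first set of bins can be sketched in Python as below: sort every player from biggest faller to biggest reach and cut the list into five equal groups. Each player is assumed to be a (diff, games_played) pair with diff = draft position minus TSN ranking, so positive means a faller, matching the table's sign convention; the sample players are made up.

```python
from statistics import mean

def percentile_bins(players, n_bins=5, success_gp=100):
    """Split players into equal-size bins from biggest faller to biggest reach.

    players: list of (diff, games_played), diff = draft position - TSN ranking
             (positive = faller, negative = reach). Needs at least n_bins players;
             any remainder goes into the last bin.
    """
    ordered = sorted(players, key=lambda p: p[0], reverse=True)  # biggest fallers first
    size = len(ordered) // n_bins
    summary = []
    for i in range(n_bins):
        end = (i + 1) * size if i < n_bins - 1 else len(ordered)
        chunk = ordered[i * size:end]
        gps = [gp for _, gp in chunk]
        summary.append({
            "count": len(chunk),
            "avg_gp": mean(gps),
            "success_rate": sum(gp >= success_gp for gp in gps) / len(chunk),
            "avg_diff": mean(d for d, _ in chunk),
        })
    return summary

# Ten made-up players: (diff, NHL games played).
sample = [(10, 0), (8, 150), (5, 200), (2, 50), (1, 300),
          (0, 400), (-1, 120), (-2, 90), (-3, 10), (-6, 500)]
for b in percentile_bins(sample):
    print(b)
```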

Some quick observations on the chart: the high % Success of slight fallers in the "> 0 and < 5" bin is probably due to players like Seth Jones, who was ranked 2nd but taken 4th; while he technically fell in the draft, everyone is pretty sure a player of that caliber will become a successful NHL player. Reaches of 5 spots or more seem to have very suspect returns, with only about 1 in 5 turning into successful NHL players, similar to the 33 largest fallers, who had roughly a 1 in 4 success rate.

I think, if anything, what this exercise can help show is that maybe consensus rules above all. Thinking of being bold and grabbing the dark horse no one else is even considering? Maybe there is a legitimate reason most aren't considering him. See that hotshot prospect seemingly passed on by every other squad? Maybe there is a reason for that too. All that said, it doesn't mean you should be afraid to trust your gut. There are guys who fell 6 spots and turned out pretty great (Kopitar, Zajac). Similarly, there are guys who would be considered reaches who have turned out pretty well themselves (Karlsson, Eberle, Couture). There is no golden rule when it comes to drafting, but each piece of the puzzle can only help make the picture that much clearer.

Reach me on Twitter @DTMAboutHeart or by email at DTMAboutHeart@gmail.com

Monday, 26 May 2014

NHL Entry Draft - Drafting Goalies in the Late Rounds

With the NHL Entry Draft right around the corner, it's always great to speculate and ponder the upcoming prospects. I've recently stumbled upon two draft-related posts that got me thinking about how we evaluate these up-and-comers (check out this awesome post that pits the Vancouver Canucks scouting department against a potato, and this post that looks at the high success of offensive defencemen).

I am slowly growing my database of NHL Draft data (if someone reading this has their own database of prospect stats and wouldn't mind sharing, please contact me). I decided to look at goalie selections in the NHL draft, mostly because I haven't seen much done on the subject.

Below is a chart I created of goalies drafted between 1997 and 2006 (I chose this time frame in hopes of keeping the sample relatively modern while still giving the goalies ample time to crack an NHL roster).

I define success as a goalie having played at least 100 NHL games as of the end of the 2014 season. The Tier column sorts the goalies by the round in which they were drafted:

Top - Rounds 1-3
Middle - Rounds 4-6 (pre-lockout) Rounds 4-5 (post-lockout)
Bottom - Rounds 7-9 (pre-lockout) Rounds 6-7 (post-lockout)
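Because the tier cutoffs depend on whether a draft had 9 rounds or 7, a small lookup makes the grouping explicit. This Python sketch assumes "pre-lockout" means the 9-round drafts up to 2004 and "post-lockout" means the 7-round 2005 and 2006 drafts. The function name `draft_tier` is made up for illustration.

```python
def draft_tier(draft_round: int, draft_year: int) -> str:
    """Map a goalie's draft round to the Top/Middle/Bottom tiers used in this post.

    Assumption: drafts from 2005 on are 'post-lockout' (7 rounds);
    earlier drafts in the 1997-2006 window are 'pre-lockout' (9 rounds).
    """
    post_lockout = draft_year >= 2005
    last_round = 7 if post_lockout else 9
    if not 1 <= draft_round <= last_round:
        raise ValueError(f"round {draft_round} is not valid for the {draft_year} draft")
    if draft_round <= 3:
        return "Top"
    middle_end = 5 if post_lockout else 6  # Middle = 4-6 pre-lockout, 4-5 post-lockout
    return "Middle" if draft_round <= middle_end else "Bottom"


for rnd, year in [(2, 1999), (6, 2002), (6, 2005), (9, 2003)]:
    print(year, rnd, draft_tier(rnd, year))
```

Note that round 6 lands in different tiers depending on the draft year. This is why the lockout split matters.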

I highlighted the most striking observation in the chart: the Bottom tier CHL row. (Sorry if the formatting is confusing.)

Tier	League	Players	Success	Busts	Success Rate
Top	Total	85	27	58	31.76%
	CHL	44	14	30	31.82%
	Not CHL	41	13	28	31.71%
Middle	Total	101	10	91	9.90%
	CHL	46	5	41	10.87%
	Not CHL	55	5	50	9.09%
Bottom	Total	95	9	86	9.47%
	CHL	34	0	34	0.00%
	Not CHL	61	9	52	14.75%
Total		281	46	235	16.37%
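As a quick arithmetic check, the Success Rate column is simply successes divided by goalies drafted. Each tier's Total row is the sum of its CHL and Not CHL rows. This minimal Python sketch recomputes the table from the counts above. The dictionary layout and function names are illustrative only.

```python
# (goalies drafted, successes) per tier and league, copied from the table above
TABLE = {
    "Top": {"CHL": (44, 14), "Not CHL": (41, 13)},
    "Middle": {"CHL": (46, 5), "Not CHL": (55, 5)},
    "Bottom": {"CHL": (34, 0), "Not CHL": (61, 9)},
}


def success_rate(players: int, successes: int) -> float:
    """Success Rate = goalies with 100+ NHL games / goalies drafted, in percent."""
    return 100 * successes / players


def tier_total(tier: str) -> tuple[int, int]:
    """Sum the CHL and Not CHL rows into the tier's Total row."""
    rows = TABLE[tier].values()
    return sum(n for n, _ in rows), sum(s for _, s in rows)


overall_players = sum(tier_total(t)[0] for t in TABLE)
overall_successes = sum(tier_total(t)[1] for t in TABLE)

for tier, leagues in TABLE.items():
    for league, (n, s) in leagues.items():
        print(f"{tier:<7}{league:<8}{s:>3}/{n:<4}{success_rate(n, s):6.2f}%  busts={n - s}")
print(f"Overall {overall_successes}/{overall_players} = "
      f"{success_rate(overall_players, overall_successes):.2f}%")
```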

No goalie who played his draft-eligible season in the CHL (OHL, WHL, QMJHL) and was drafted in the later rounds (the Bottom tier) between 1997 and 2006 had managed to play 100 NHL games as of the end of the 2014 season.

Of the 9 successful late-round non-CHL goalies, the 2 North Americans were Brian Elliott (Ajax, OPJHL) and Scott Clemmensen (Des Moines).

The other 7 were all Europeans: Henrik Lundqvist, Pasi Nurminen, Cristobal Huet, Martin Gerber, Fredrik Norrena, Jaroslav Halak, Pekka Rinne.

Why is this the case? I really don't know. Maybe CHL prospects are so highly scouted and scrutinized that it's harder to steal a future contributor in the later rounds. If I were running a draft table, however, I would definitely lean towards taking a European goalie with those late-round picks.

If you have any theories let me know, I will be looking into more NHL draft material in the upcoming weeks.

Wednesday, 7 May 2014

Lucky vs. Good Matrix

In any NHL season, a considerable amount of luck plays out on the ice and, in turn, shows up in the standings. The concept of luck in sports is a tricky one for most casual fans to fully grasp. Most people accept that luck plays a real role in the outcomes of games, but the forms that luck takes and how prevalent it is can be widely debated. Hockey is a game of so many dynamic elements occurring simultaneously that, when Brendan Shanahan puts it so eloquently, it isn't too hard to imagine why some aspects are out of an individual's direct control:

But we need to put all this together while moving at high speeds on a cold and slippery surface while 5 other guys use clubs to try and kill us, oh yeah did I mention that this whole time we're standing on blades 1/8 of an inch thick? Is ice hockey hard?

Sometimes the bounces will cancel each other out in the long run; sometimes they don't. Yes, playing an 82 game season helps even out the ups and downs, but not always. Too many fans still get suckered into staring at the standings and thinking that a team's point total is a good indicator of how strong that team is. With the matrix I have created, hopefully you can get an idea of which teams were for real this past season and which were mere illusions.

***

If you aren't interested in the methodology just skip to the next set of asterisks.

The skill portion of this matrix is fairly straightforward. I chose 5on5 Close FF% as a proxy for possession, mainly because of how closely it tracks actual possession numbers. For those of you who may not be aware, controlling possession has been shown to be both a very repeatable and consistent skill and a very good predictor of future success.
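For readers new to the stat: Fenwick counts unblocked shot attempts (goals, saves and misses), and FF% is a team's share of them. Below is a minimal Python sketch of a 5on5 Close FF%. It assumes the commonly used "close" definition: within one goal in the first two periods, and tied in the third period or overtime. The `Attempt` record and its fields are hypothetical and do not follow any particular data source's format.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """Hypothetical play-by-play row for one shot attempt."""
    by_team: bool       # True if the team being rated took the attempt
    blocked: bool       # blocked attempts are not Fenwick events
    five_on_five: bool
    period: int         # 1-3 regulation, 4 = overtime
    score_diff: int     # rated team's goals minus opponent's, before the attempt


def is_close(period: int, score_diff: int) -> bool:
    # Within one goal in periods 1-2; tied from the 3rd period on
    return abs(score_diff) <= 1 if period <= 2 else score_diff == 0


def close_ff_pct(attempts: list[Attempt]) -> float:
    """5on5 Close Fenwick For % = FF / (FF + FA) * 100."""
    ff = fa = 0
    for a in attempts:
        if a.blocked or not a.five_on_five or not is_close(a.period, a.score_diff):
            continue  # not a 5on5 Close Fenwick event
        if a.by_team:
            ff += 1
        else:
            fa += 1
    if ff + fa == 0:
        raise ValueError("no qualifying attempts")
    return 100 * ff / (ff + fa)


sample = [
    Attempt(True, False, True, 1, 0),   # counts: for
    Attempt(True, True, True, 1, 0),    # blocked: ignored
    Attempt(False, False, True, 2, 1),  # counts: against
    Attempt(True, False, True, 3, 1),   # 3rd period, not tied: ignored
    Attempt(True, False, False, 2, 0),  # not 5on5: ignored
    Attempt(True, False, True, 3, 0),   # counts: for
]
print(close_ff_pct(sample))  # about 66.7 (2 for, 1 against)
```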

The luck aspect of this matrix is much more open to variation and discussion. Using Rob Vollman's Team Luck calculator, I came up with a value based on five different luck attributes. Each attribute could be weighted on a scale of 1-5 (1 meaning irrelevant and 5 meaning extremely important). The luck score each team received is essentially an arbitrary number, but it is useful for comparing teams on the same scale. Here are the attributes and how I weighted each of them:

PDO 4/5 - PDO is the sum of a team's SV% and SH%. We know that PDO tends to regress towards 100. We can therefore conclude that a team with a PDO > 100 is more likely to be riding an unsustainable hot streak, while a team with a PDO < 100 should start to see the puck bounce their way soon enough. This is probably the most widely accepted form of luck in the current NHL and was weighted highly accordingly.

Special Teams Index (STI) 2/5 - STI, like PDO, is a combination stat: the sum of a team's PP% and PK%. The same logic of regressing towards 100 applies as with PDO. I weighted this lower because, while I believe there are some hot and cold streaks on special teams, some teams can maintain outlier percentages through either coaching systems or player availability.
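Both PDO and STI are sums of two percentages, scaled so that 100 is the baseline that teams tend to regress towards. The minimal Python sketch below assumes PDO is built from raw goals and shots, and STI from power-play goals and opportunities. The function names and sample numbers are illustrative only.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """PDO = (SH% + SV%) * 100, where SV% = 1 - GA / SA."""
    sh_pct = goals_for / shots_for
    sv_pct = 1 - goals_against / shots_against
    return 100 * (sh_pct + sv_pct)


def sti(pp_goals: int, pp_opps: int, pp_goals_against: int, times_shorthanded: int) -> float:
    """STI = (PP% + PK%) * 100, where PK% = 1 - PP goals allowed / times shorthanded."""
    pp_pct = pp_goals / pp_opps
    pk_pct = 1 - pp_goals_against / times_shorthanded
    return 100 * (pp_pct + pk_pct)


# Made-up season totals
team_pdo = pdo(goals_for=200, shots_for=2400, goals_against=180, shots_against=2500)
team_sti = sti(pp_goals=50, pp_opps=250, pp_goals_against=40, times_shorthanded=250)
# Values above 100 suggest the team has been riding good bounces
print(f"PDO {team_pdo:.1f}, STI {team_sti:.1f}")
```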

Cap Hit of Injured Player (CHIP) 3/5 - CHIP is a metric provided by Springing Malik that helps quantify how much a team has been hurt by injuries during the season. It accounts for how many games each player missed and uses that player's cap hit to estimate his value to the team. It isn't a perfect measure of how detrimental each injury is to a team, but it's still pretty good, so it gets a middle-of-the-road weighting.

Post-Regulation Record 4/5 - The shootout and the 5 minutes of OT have been shown to be little more than a coin flip. Certain skaters and goalies do bring some skill to these situations, but the eventual outcome involves a ton of luck.

Record in One-Goal Games 3/5 - Winning games by only one goal has been shown not to be a very good indicator of a team's ability. Teams that have come out on top in these tight games did so mainly because the bounces went their way.
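I don't know exactly how Rob Vollman's calculator combines the five attributes. What follows is only a hypothetical Python sketch of one reasonable approach. It converts each attribute to a league-wide z-score, flips the sign where a higher raw value means worse luck (more cap hit lost to injury), and takes a weighted sum using the weights listed above. The attribute keys, the z-score step and the sample numbers are all assumptions.

```python
from statistics import mean, pstdev

# Weights from the list above (1 = irrelevant, 5 = extremely important)
WEIGHTS = {"pdo": 4, "sti": 2, "chip": 3, "post_reg": 4, "one_goal": 3}
# +1 if a higher raw value means more luck, -1 if it means less (more cap hit lost to injury)
DIRECTION = {"pdo": 1, "sti": 1, "chip": -1, "post_reg": 1, "one_goal": 1}


def luck_scores(teams: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of league-wide z-scores; higher = luckier (hypothetical scheme)."""
    scores = {name: 0.0 for name in teams}
    for attr, weight in WEIGHTS.items():
        values = [stats[attr] for stats in teams.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, stats in teams.items():
            z = (stats[attr] - mu) / sigma if sigma else 0.0
            scores[name] += weight * DIRECTION[attr] * z
    return scores


# Made-up inputs: PDO, STI, CHIP (injured cap hit, $M), OT/SO win %, one-goal game win %
teams = {
    "Team A": {"pdo": 101.0, "sti": 102.0, "chip": 5.0, "post_reg": 0.60, "one_goal": 0.60},
    "Team B": {"pdo": 99.0, "sti": 98.0, "chip": 15.0, "post_reg": 0.40, "one_goal": 0.40},
    "Team C": {"pdo": 100.0, "sti": 100.0, "chip": 10.0, "post_reg": 0.50, "one_goal": 0.50},
}
for name, score in sorted(luck_scores(teams).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:+.2f}")
```

As with the post's own luck score, the resulting number is arbitrary in scale. It is only useful for ranking teams against each other.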

***

Legend

Quadrant	Made Playoffs / Teams
Good and Lucky	9/10
Good and Unlucky	3/7
Bad and Lucky	4/5
Bad and Unlucky	0/8

X-Axis: 5on5 Close FF% (Fenwick for percentage) - Farther to the right = Better team

Y-Axis: Calculated Luck Score - Closer to the top = Lucky
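The quadrant labels amount to two threshold checks. The post doesn't spell out where the axes cross. This Python sketch therefore assumes a 50% FF% split for Good/Bad and a luck score of 0 for Lucky/Unlucky, which would be league average if the score is centred. Both cutoffs are parameters that can be changed. The tally at the end simply re-adds the Legend's playoff counts.

```python
def quadrant(ff_pct: float, luck_score: float,
             ff_split: float = 50.0, luck_split: float = 0.0) -> str:
    """Place a team in one of the four Lucky vs. Good quadrants (cutoffs are assumptions)."""
    skill = "Good" if ff_pct >= ff_split else "Bad"
    fortune = "Lucky" if luck_score >= luck_split else "Unlucky"
    return f"{skill} and {fortune}"


# Playoff teams / teams in each quadrant, from the Legend above
LEGEND = {
    "Good and Lucky": (9, 10),
    "Good and Unlucky": (3, 7),
    "Bad and Lucky": (4, 5),
    "Bad and Unlucky": (0, 8),
}
made = sum(m for m, _ in LEGEND.values())
total = sum(n for _, n in LEGEND.values())
print(quadrant(52.3, -1.4))                          # Good and Unlucky
print(f"{made} of {total} teams made the playoffs")  # 16 of 30
```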

Some thoughts on the results:

Good and Lucky - The only team from this section not to make the playoffs was the Yotes, who in reality were barely a positive possession team with very low levels of luck anyway, so it isn't crazy to see them miss the playoffs.
Good and Unlucky - The Devils are probably the most interesting team in this subsection. Despite sterling possession numbers, they were unable to overcome some of the worst bounces in the league this season. Another observation: the Red Wings snuck into the playoffs while the Canucks didn't. This happened despite equal unluckiness and the Canucks' better position on the matrix, and it might be due more to Vancouver playing in a much more difficult conference.
Bad and Lucky - I was definitely surprised to see 80% of the teams in this category sneak into the playoffs. Colorado had an absurdly lucky season, which made up for their defects. The Wild, Canadiens and Flyers were not bad possession teams, so their playoff berths aren't much of a stretch. The Leafs were just awful and missed despite having the 5th best luck overall.
Bad and Unlucky - Bad teams with no bounces going their way, tough to have much success with a formula like that.

Saturday, 19 April 2014

NHL Goaltending - Best Friend or Worst Enemy

“Goaltending is 75 percent of your hockey team, unless you don't have it. Then it's 100 percent.” – Harry Neale

Goaltending can make or break a hockey team. Every season there will be a handful of men between the pipes dragging their teams into the playoffs, and typically just as many who will cost their teams an invite to the big dance.

For those of you who are unaware, goalies can be wildly inconsistent. A goalie's SV% tends to vary widely from year to year. Even better evaluators of true talent, such as ESSV% or RoadSV%, will rise and fall from year to year.

The Ottawa Senators are possibly the most recent poster boys for the reality of goaltending inconsistency. The Senators have seen their goalies' SV% plummet from an absurd .935 to a below-average .908. How could this happen? Did Anderson and Lehner suddenly forget how to play goalie? Probably not. What probably did happen, however, was a brutal form of regression: Anderson and Lehner's stats came hurtling back to earth. So how good are they really? Are they .935 goalies? Probably not. Are they .908 goalies? Probably not that either. The answer most likely lies somewhere in the middle.

In this analysis, I hope to show what the league would have looked like this year had some goalies not played as well as they did (ex. Varlamov) or as poorly as they did (ex. Dubnyk).

The method for my research came from the idea posted by Phil Birnbaum:

Roughly speaking, that means you can expect 25% of a goalie's difference from the mean to be repeated next year. Put another way, you have to regress the goalie 75% towards the mean.

Yes, that's not as much as you'd expect. By that calculation, if the average save percentage is .904, and goalie X comes in one season at .924, you'd expect next year he'd be at .909 -- one quarter of the distance between .904 and .924.
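Birnbaum's rule of thumb is a one-line formula: keep 25% of the goalie's distance from the league mean and discard the other 75%. Here is a minimal Python sketch, with a function name of my own choosing, checked against his .904/.924 example.

```python
def regress_to_mean(observed_sv: float, league_sv: float, regression: float = 0.75) -> float:
    """Pull an observed SV% `regression` of the way back toward the league average."""
    return league_sv + (1 - regression) * (observed_sv - league_sv)


print(f"{regress_to_mean(0.924, 0.904):.3f}")  # 0.909
print(f"{regress_to_mean(0.884, 0.904):.3f}")  # 0.899, below-average goalies move up
```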

Basically, I looked at every NHL goalie who had faced at least 900 shots (about 35 games played) and regressed their save percentage 75% towards the mean. I chose this rather than simply setting everyone at league average because I believe it better reflects what a team should realistically expect from its individual goalie.

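I then calculated how many more goals you would expect each goalie to have surrendered or saved over the season. Finally, I adjusted each team's point total by 1 point for every 3 goals of difference: +1 for every 3 extra goals the regressed goalie would save, and -1 for every 3 extra goals he would allow.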
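Turning a regressed SV% into standings points takes two steps. First, multiply the gap between the actual and regressed SV% by shots faced to get goals. Then apply the 1-point-per-3-goals conversion. This minimal Python sketch uses a made-up goalie (1,800 shots, .927 actual, .91725 regressed). The 900-shot minimum comes from the notes below, and rounding to whole points is my assumption about how the table was produced.

```python
MIN_SHOTS = 900        # minimum shots faced to be included (from the notes)
GOALS_PER_POINT = 3.0  # 1 standings point per 3 goals


def goals_above_regressed(shots_against: int, actual_sv: float, regressed_sv: float) -> float:
    """Goals saved beyond (positive) or short of (negative) the regressed SV%."""
    if shots_against < MIN_SHOTS:
        raise ValueError("below the 900-shot minimum")
    return shots_against * (actual_sv - regressed_sv)


def regressed_points(official_points: int, goals_above: float) -> int:
    """Remove 1 point per 3 goals of over-performance; add them back for under-performance.

    Rounds to the nearest whole point (Python's round() sends exact .5 ties to the even number).
    """
    return round(official_points - goals_above / GOALS_PER_POINT)


extra = goals_above_regressed(1800, 0.927, 0.91725)  # roughly 17.55 goals
print(regressed_points(100, extra))                  # 94
```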

Here are my results...

Team	Official Points	Goalie Regression Points	+/- PTS	Original League Standings	Regressed League Standings	+/- Standings
WINNIPEG	84	89	5	22	17	5
EDMONTON	67	71	4	28	28	0
OTTAWA	88	90	2	19	15	4
NY ISLANDERS	79	81	2	26	24	2
FLORIDA	66	68	2	29	29	0
DETROIT	93	94	1	14	12	2
NEW JERSEY	88	89	1	19	17	2
NASHVILLE	88	89	1	19	17	2
ANAHEIM	116	117	1	2	1	1
SAN JOSE	111	112	1	4	3	1
CALGARY	77	78	1	27	27	0
NY RANGERS	96	96	0	12	10	2
WASHINGTON	90	90	0	17	15	2
LOS ANGELES	100	100	0	9	8	1
PHOENIX	89	89	0	18	17	1
BOSTON	117	117	0	1	1	0
MINNESOTA	98	97	-1	11	9	2
PITTSBURGH	109	108	-1	6	5	1
CHICAGO	107	106	-1	7	6	1
VANCOUVER	83	82	-1	24	23	1
ST LOUIS	111	110	-1	4	4	0
PHILADELPHIA	94	93	-1	13	14	-1
DALLAS	91	89	-2	16	17	-1
BUFFALO	52	49	-3	30	30	0
TORONTO	84	81	-3	22	24	-2
CAROLINA	83	80	-3	24	26	-2
COLUMBUS	93	89	-4	14	17	-3
TAMPA BAY	101	96	-5	8	10	-2
MONTREAL	100	94	-6	9	12	-3
COLORADO	112	105	-7	3	7	-4

As you can see, Varlamov and Price both had huge seasons for Colorado and Montreal respectively. These new league standings have Montreal sliding down to a wildcard spot and Colorado sliding down to 3rd in the Central. Meanwhile, Winnipeg once again suffered a severe case of Pavelectricity, which kept them out of a potential 4-way tie for the final wildcard spot in the West.

The results show what I think many would agree with: some teams benefited greatly from their goalies while others really suffered. This more or less neutralizes the effects of a particularly strong or weak season by a given goalie, but it doesn't totally level the playing field, keeping individuality alive and well.

***Notes***

I only included goalies who met my minimum of 900 shots (about 35 games played), to eliminate as many small samples as possible
I didn't run this regression for Henrik Lundqvist and Tuukka Rask, because these two have yet to post non-elite numbers; I felt it would be unfair to penalize either of them when we shouldn't expect much regression at all
Minnesota didn't have any goalies who met my minimum requirement, so I simply ran the regression on their team average