Thursday, 21 May 2015

NHL Expected Goals Model


Did anyone ever consider shot quality? 

UPDATE: This model has since been improved upon and shown here. This post still provides good background on the basics of the model.

Shot quality and possession metrics have always been something of a point of contention. Expected Goals (ExpG) combines these two facets in the hope of providing better information about the game. Expected Goals are not a novel concept: Brian Macdonald has previously presented a version for hockey, and the original motivation for my study was Michael Caley's soccer version. I hope to lay out my ExpG model in a way that makes hockey sense, so that everyone can understand why each factor was added to the model. The model works by assigning a value to each shot taken over the course of a season, equal to the model's predicted probability of that shot resulting in a goal. To calculate a team's final ExpG, all you have to do is sum up all of these probabilities. First I will break down the methodology that goes into this model. If you don't care and just want to see the results, skip down to the Expected Goals section or check out the Expected Goals tab above.
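To make the summing step concrete, here is a minimal Python sketch. The shot records and the probabilities in them are made-up placeholders, not output from my model, and the function name team_expected_goals is only for illustration.

```python
from collections import defaultdict


def team_expected_goals(shots):
    """Sum per-shot goal probabilities into one ExpG total per team.

    `shots` is an iterable of (team, goal_probability) pairs.
    """
    totals = defaultdict(float)
    for team, prob in shots:
        totals[team] += prob
    return dict(totals)


# Made-up example shots (placeholder probabilities, not real model output).
example_shots = [
    ("TOR", 0.25),
    ("TOR", 0.5),
    ("MTL", 0.125),
    ("TOR", 0.125),
    ("MTL", 0.25),
]

totals = team_expected_goals(example_shots)
for team, exp_goals in sorted(totals.items()):
    print(team, exp_goals)
```

Over a full season the same loop would simply run over every 5-on-5 shot a team took (for offence) or allowed (for defence).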

Methodology


My model uses a logistic regression to arrive at each goal probability. Basically, it uses a set of independent variables to estimate the probability of a binary outcome: yes, a goal was scored, or no, a goal wasn't scored. So far my model only accounts for 5-on-5 situations. I reran the logistic regression for each season instead of using one big logistic regression. This helps to account for minor changes in the style of league play, although the regression coefficients didn't actually change much year-to-year. Here are the factors taken into account by the model:
  • Adjusted Distance
    • The farther out a shot is taken, the lower the likelihood it results in a goal
  • Type of Shot
    • Snap/Slap/Backhand/Wraparound/etc...
  • Rebound - Yes/No?
    • A rebound is defined as a shot taking place less than 4
  • Score Situation
    • Up a goal/down a goal/tied/etc…
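
For readers who want to see how a logistic regression turns the factors above into a goal probability, here is a rough Python sketch. Every coefficient value below is a hypothetical placeholder chosen for illustration, not a fitted value from my model. The distance unit (feet), the use of wrist shots as the baseline shot type, and the three score-state labels are also assumptions. The only thing taken from the list above is that probability falls as distance increases.

```python
import math

# HYPOTHETICAL coefficients for illustration only. Real values would come
# from fitting the logistic regression to a season of 5-on-5 shot data.
INTERCEPT = -1.5
DISTANCE_COEF = -0.04  # per foot of adjusted distance (unit is an assumption)
SHOT_TYPE_COEF = {     # wrist shot as the baseline is an assumption
    "wrist": 0.0,
    "snap": 0.1,
    "slap": -0.1,
    "backhand": -0.2,
    "wraparound": -0.6,
}
REBOUND_COEF = 0.7     # assumed positive: rebounds treated as more dangerous
SCORE_STATE_COEF = {"down": 0.05, "tied": 0.0, "up": 0.05}


def goal_probability(distance, shot_type, is_rebound, score_state):
    """Return the predicted probability that a shot becomes a goal."""
    log_odds = (
        INTERCEPT
        + DISTANCE_COEF * distance
        + SHOT_TYPE_COEF[shot_type]
        + (REBOUND_COEF if is_rebound else 0.0)
        + SCORE_STATE_COEF[score_state]
    )
    # The logistic function maps log-odds onto a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


close_shot = goal_probability(10, "wrist", False, "tied")
far_shot = goal_probability(50, "wrist", False, "tied")
close_rebound = goal_probability(10, "wrist", True, "tied")
print(close_shot, far_shot, close_rebound)
```

With these placeholder numbers the close shot is rated above the far shot, and the rebound above the non-rebound from the same spot. The actual sizes of those effects depend entirely on the fitted coefficients.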


Results


In the two graphs below you can see how well ExpG, both offensively and defensively, correlates with actual results. Each point represents one team in one season; the 2012-2013 season was excluded because of the lockout.



There will always be some outliers in a given season, but I think the model does a relatively good job. The table below shows that ExpG comes out on top when compared to Corsi and Scoring Chances in terms of correlation to real goals for and against in a given season.


                   Goals For   Goals Against
ExpG               0.58        0.6
Corsi              0.493       0.57
Scoring Chances    0.53        0.562
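
A comparison like this can be reproduced with a plain correlation between each metric and actual goals across team-seasons. The sketch below assumes a Pearson correlation coefficient, which is an assumption on my part since the exact statistic isn't spelled out here, and the numbers in it are placeholders rather than real team data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder team-season totals (not real data): one entry per team-season.
expected_goals_for = [150.0, 162.5, 141.0, 170.0, 155.5]
actual_goals_for = [148, 171, 139, 165, 160]

r = pearson_r(expected_goals_for, actual_goals_for)
print(round(r, 3))
```

Running the same function with Corsi or Scoring Chances in place of ExpG gives the other rows of the comparison.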

Future Work


Over the coming weeks I will be focusing my efforts on two different aspects of this model. Firstly, I will investigate how well it predicts future goals, both from one season to the next and in a way similar to what Micah Blake McCurdy did with score-adjusted Corsi. Secondly, I will be looking at other factors to add to the model. I plan on adding rush shots as a factor, though the current state of my data will require some tweaking before I can do that. I also plan on exploring the effects of incorporating shooter talent and goaltender talent. Finally, I plan on releasing ExpG at the player level and using aspects of this model to improve xSV%.

Expected Goals


I just wanted to thank War-On-Ice and Sam Ventura for the data used in this project. Finally, here are the results below. Note that dGF, dGA, and dGF% are calculated as actual minus expected. I will give this spreadsheet its own tab at the top of this site too. Please let me know any questions or feedback you might have. Enjoy!
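
For anyone who wants to recompute the differential columns, here is a small Python sketch of the "actual minus expected" arithmetic. It assumes GF% means goals for divided by total goals (for plus against), expressed as a percentage, and the team totals used are placeholders rather than real results.

```python
def goal_differentials(gf, ga, exp_gf, exp_ga):
    """Return dGF, dGA and dGF% as actual minus expected.

    GF% is assumed to be 100 * GF / (GF + GA).
    """
    gf_pct = 100.0 * gf / (gf + ga)
    exp_gf_pct = 100.0 * exp_gf / (exp_gf + exp_ga)
    return {
        "dGF": gf - exp_gf,
        "dGA": ga - exp_ga,
        "dGF%": gf_pct - exp_gf_pct,
    }


# Placeholder season totals (not real data).
diffs = goal_differentials(gf=160, ga=140, exp_gf=150.0, exp_ga=150.0)
print(diffs)
```

A positive dGF means a team scored more than the model expected, and a negative dGA means it allowed fewer goals than expected.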



2 comments:

  1. I will be focusing my efforts on two different aspects of this model. Firstly, I will investigate how well it predicts future goals, from one season to the next as well as infographics services give you a big advanyage.

  2. What is the formula for expected goals? Is it calculated using linear weights,
    or is it a secret proprietary calculation?

    I know there are various models out there, but I can't seem to find an actual formula.
