Did anyone ever consider shot quality?
UPDATE: This model has since been improved upon and shown here. This post still provides good background on the basics of the model.
Shot quality and possession metrics have always been something of a point of contention. Expected Goals (ExpG) combines these two facets in hopes of providing better information about the game. Expected Goals is not a novel concept: a version has been presented previously by Brian Macdonald for hockey, and Michael Caley's soccer version was the original motivation for my study. I hope to lay out my ExpG model in a way that makes hockey sense, so that everyone can understand why each factor was added to the model.
The model assigns a value to each shot taken over the course of a season, equal to the model's predicted probability of that shot resulting in a goal. To calculate a team's final ExpG, all you have to do is sum up these probabilities. First I will break down the methodology that goes into this model. If you don't care and just want to see the results, skip down to the Expected Goals section or check out the Expected Goals tab above.
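The summing step is simple enough to sketch in a few lines of Python. This is only an illustration: the team labels and the per-shot probabilities are made-up values, and `team_expected_goals` is a hypothetical helper name, not something from my actual code.

```python
from collections import defaultdict

# Hypothetical shot records: (team, predicted goal probability).
# The probabilities are invented for illustration, not model output.
shots = [
    ("TOR", 0.08),
    ("TOR", 0.25),
    ("MTL", 0.03),
    ("MTL", 0.12),
    ("TOR", 0.05),
]

def team_expected_goals(shots):
    """Sum each team's per-shot goal probabilities into an ExpG total."""
    totals = defaultdict(float)
    for team, p_goal in shots:
        totals[team] += p_goal
    return dict(totals)

expg = team_expected_goals(shots)
```

Over a full season the list would hold every 5-on-5 shot, but the arithmetic stays the same.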
My model uses a logistic regression to arrive at each goal probability. Basically, it uses a set of independent variables to estimate the probability of a binary outcome: yes, a goal was scored, or no, a goal wasn't scored. So far my model only accounts for 5-on-5 situations. I reran the logistic regression for each season instead of using one big regression across all seasons. This helps to account for minor changes in the style of league play, although the regression coefficients didn't actually change much year-to-year. Here are the factors taken into account by the model:
- Adjusted Distance
  - The farther out a shot is taken, the lower the likelihood it results in a goal
- Type of Shot
- Rebound - Yes/No?
  - A rebound is defined as a shot taking place less than 4
- Score Situation
  - Up a goal/down a goal/tied/etc…
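To make the mechanics concrete, here is a rough sketch of fitting a logistic regression to shot data in plain Python. It is not my actual model: the training data is invented, only distance and the rebound flag are used as features (shot type and score situation would be added as extra indicator columns), the helper names are hypothetical, and the fit uses simple gradient descent rather than whatever a stats package would do internally.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Goal probability for one shot; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, goal in zip(X, y):
            err = predict(weights, features) - goal
            grad[0] += err
            for j, x in enumerate(features):
                grad[j + 1] += err * x
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

# Made-up training data: goals out of 20 shots at each distance
# (in tens of feet), split by whether the shot was a rebound.
goals_per_20 = {
    0: [6, 4, 2, 1, 1, 0],   # non-rebound shots at distance 1..6
    1: [10, 7, 4, 3, 2, 1],  # rebound shots at distance 1..6
}
X, y = [], []
for rebound, counts in goals_per_20.items():
    for distance, n_goals in enumerate(counts, start=1):
        for i in range(20):
            X.append([float(distance), float(rebound)])
            y.append(1 if i < n_goals else 0)

weights = fit_logistic(X, y)
close_shot = predict(weights, [1.0, 0.0])  # non-rebound from 10 feet
far_shot = predict(weights, [5.0, 0.0])    # non-rebound from 50 feet
```

Refitting per season, as described above, would just mean calling `fit_logistic` once on each season's shots and comparing the resulting weights.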
In the two graphs below you can see how well ExpG, both offensively and defensively, correlates with actual results. Each point represents one team in one season; 2012-2013 was removed due to the lockout.
There will always be some outliers in a given season, but I think the model does a relatively good job. The chart below shows that ExpG comes out on top when compared to Corsi and Scoring Chances in terms of correlation with actual goals for and against in a given season.
|Goals For||Goals Against|
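If you want to run this kind of comparison yourself, the correlation for a single metric can be computed as in the sketch below. Two caveats: the team-season numbers are invented, and using Pearson's r is an assumption on my part for the example, so treat it as a starting point rather than a reproduction of the chart.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented team-season totals: expected goals for vs. actual goals for.
expected_gf = [148.2, 161.5, 139.9, 155.0, 170.3]
actual_gf = [152, 158, 135, 160, 175]

r_expg = pearson_r(expected_gf, actual_gf)
```

Running the same function with Corsi or Scoring Chances in place of `expected_gf` gives the numbers to compare against.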
In the coming weeks I will be focusing my efforts on two different aspects of this model. First, I will investigate how well it predicts future goals, both from one season to the next and in something similar to what Micah Blake McCurdy did with score-adjusted Corsi. Second, I will be looking at other factors to add to the model. I plan on adding rush shots as a factor, though the current state of my data will require some tweaking before I can do that. I also plan on exploring the effects of incorporating shooter talent and goaltender talent. Finally, I plan on releasing ExpG at the player level and using aspects of this model to improve xSV%.
I want to thank War-On-Ice and Sam Ventura for the data used in this project. Finally, here are the results below. Note that dGF, dGA, and dGF% are calculated as actual minus expected. I will give this spreadsheet its own tab at the top of this site too. Please let me know of any questions or feedback you might have. Enjoy!
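For anyone checking the differential columns by hand, here is a small sketch of the arithmetic. The function name and the sample totals are hypothetical, and I am assuming GF% means goals for as a percentage of all goals for and against.

```python
def goal_differentials(gf, ga, exp_gf, exp_ga):
    """Return dGF, dGA, and dGF%, each computed as actual minus expected."""
    # Assumption: GF% = 100 * GF / (GF + GA), likewise for the expected version.
    gf_pct = 100.0 * gf / (gf + ga)
    exp_gf_pct = 100.0 * exp_gf / (exp_gf + exp_ga)
    return {
        "dGF": gf - exp_gf,
        "dGA": ga - exp_ga,
        "dGF%": gf_pct - exp_gf_pct,
    }

# Invented season totals for one team.
diffs = goal_differentials(gf=160, ga=140, exp_gf=150.0, exp_ga=150.0)
```

A positive dGF means the team scored more than the model expected, and a negative dGA means it allowed fewer.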