Paint By Numbers

A statistical analysis of the advanced metrics used in the hockey analytics movement.

Jamie Sabau

Once again the conversation has returned to "Advanced Stats" and how effective they are at evaluating players and teams. Yesterday a visitor from the New York Islanders site Lighthouse Hockey created a fanpost about Hockey Analytics, and then Natasha's Pens Points article linked to a piece discussing Rob Scuderi and general Corsi numbers.

So this led me back to an article idea I have been working on for some time and have explored numerous times in comments and fanposts: a look at the various metrics used to gauge team success in the NHL. I opted to use 5-on-5 Zone Start Adjusted Close data. I prefer metrics that account for differences in zone starts, as there is a distinct difference in play style when a team starts in its own end compared to when it starts in its opponent's end. Recently I have also come to accept that the Close data helps account for the differences in the way a team plays when it is trailing or defending a lead. I have also embraced the use of percentages rather than a strict numerical difference when comparing teams (or players, but that's another show).

I looked at the usual suspects: Goals For per 60 minutes, Goals Against per 60 minutes, Goals For Percent, Fenwick For Percent, and Corsi For Percent. I also decided to take a look at Fenwick Shot Percentage. I prefer that one because it accounts for Shots on Goal as well as Missed Shots; as far as I am concerned, a player should be held accountable for completely missing the net just as much as, if not more than, for being unable to beat the goalie. Then I used Corsi Save Percentage, because it accounts for the defense's ability to pitch in with Blocked Shots. Of course this means my PDO statistic is completely different from the traditional PDO statistic (which is calculated using standard Shots on Goal). I compared all of these numbers to Points Percentage, as this allows us to compare similar metrics (percentage vs. percentage) while negating the difference in total Points available in the shortened 48-game 2013 season.
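For anyone who wants to follow along at home, here is a minimal Python sketch of how the percentage metrics described above could be computed from raw team counts. The field names are hypothetical, and the formulas are my reading of the descriptions (Fenwick Shot Percentage as goals per unblocked attempt, Corsi Save Percentage as the share of all opposing attempts that do not go in, PDO as their sum), so treat it as an assumption rather than the exact spreadsheet behind the numbers.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Raw 5-on-5 counts for one team-season (field names are hypothetical)."""
    goals_for: int
    goals_against: int
    shots_on_goal_for: int        # includes goals
    missed_shots_for: int
    shots_on_goal_against: int    # includes goals against
    missed_shots_against: int
    blocked_shots_against: int    # opponent attempts this team blocked
    points: int
    games_played: int


def goals_for_pct(t: TeamSeason) -> float:
    # Share of all goals in the team's games that the team scored.
    return 100.0 * t.goals_for / (t.goals_for + t.goals_against)


def fenwick_shot_pct(t: TeamSeason) -> float:
    # Assumed definition: goals per unblocked attempt (on goal + missed).
    return 100.0 * t.goals_for / (t.shots_on_goal_for + t.missed_shots_for)


def corsi_save_pct(t: TeamSeason) -> float:
    # Assumed definition: share of ALL opposing attempts (on goal, missed,
    # blocked) that did not end up in the net.
    attempts = (t.shots_on_goal_against + t.missed_shots_against
                + t.blocked_shots_against)
    return 100.0 * (1.0 - t.goals_against / attempts)


def pdo(t: TeamSeason) -> float:
    # Assumed to be the sum of the two percentages above.
    return fenwick_shot_pct(t) + corsi_save_pct(t)


def points_pct(t: TeamSeason) -> float:
    # Two points are available per game, so this works for 48 or 82 games.
    return 100.0 * t.points / (2 * t.games_played)
```

Under these assumed definitions the two halves of PDO use different denominators (unblocked attempts for, all attempts against), so the league-wide total does not have to center on 100 the way the traditional Shots on Goal version does.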

I ran the numbers individually for each of the 2007-08, 2008-09, 2009-10, 2010-11, 2011-12, and 2012-13 seasons as well as a 6-season total from 2007-13. I only looked at how the stats compare to the Point totals of their respective seasons, since my main focus was to determine how effective the different metrics are at evaluating performance, not to predict the future. I am very skeptical of any metric's ability to be used as a predictive measure of future team (or player) success, but I still plan to compare the numbers to the following season(s) at some point to see how well they anticipated each team's later performance.

A Picture is Worth a Thousand Words

The first thing I had to do was determine the league averages for the various metrics. I did this by using the team data from all 6 seasons and then finding the Median Value from all 30 NHL teams. I then decided to take it a step further by calculating the Standard Deviation and determining the values in each of the 67% (1 Standard Deviation; strictly closer to 68% for normally distributed data), 95% (2 Standard Deviations), and 99.7% (3 Standard Deviations) populations. So any value between 1 Standard Deviation below and 1 Standard Deviation above the Median can be seen as an average performance. A team that performs above that average would be good (1-2 Standard Deviations), great (2-3 Standard Deviations), or excellent (more than 3 Standard Deviations). A team that performs below that average would be poor (1-2 Standard Deviations), bad (2-3 Standard Deviations), or awful (more than 3 Standard Deviations). And of course the whole purpose was to determine the Correlation Coefficients, which I did for each of the 8 metrics.

  • GF60 - Median 2.40; 67% 2.21-2.58; 95% 2.03-2.77; 99.7% 1.84-2.96; Correlation .2685.
  • GA60 - Median 2.44; 67% 2.23-2.65; 95% 2.03-2.86; 99.7% 1.82-3.06; Correlation -.5453.
  • GF% - Median 50.3; 67% 47.5-53.0; 95% 44.7-55.8; 99.7% 41.9-58.6; Correlation .8190.
  • FF% - Median 49.5; 67% 47.2-51.7; 95% 45.0-53.9; 99.7% 42.8-56.2; Correlation .5854.
  • CF% - Median 49.4; 67% 47.1-51.7; 95% 44.8-54.0; 99.7% 42.5-56.3; Correlation .6055.
  • FSh% - Median 5.7; 67% 5.4-6.1; 95% 5.1-6.4; 99.7% 4.7-6.8; Correlation .0158.
  • CSv% - Median 95.8; 67% 95.5-96.1; 95% 95.2-96.4; 99.7% 94.9-96.6; Correlation .1225.
  • PDO - Median 101.3; 67% 100.9-101.7; 95% 100.6-102.1; 99.7% 100.2-102.4; Correlation .1496.
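The banding scheme behind the list above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and I am assuming a population Standard Deviation centered on the Median, which may not match the exact spreadsheet settings used for the numbers above.

```python
from statistics import median, pstdev


def bands(values):
    """Return (median, standard deviation, {k: (low, high)}) for k = 1, 2, 3."""
    center = median(values)
    sd = pstdev(values)  # assumption: population SD; a sample SD is also plausible
    ranges = {k: (center - k * sd, center + k * sd) for k in (1, 2, 3)}
    return center, sd, ranges


def classify(value, center, sd, higher_is_better=True):
    """Label a value by how many standard deviations it sits from the median."""
    z = (value - center) / sd
    if not higher_is_better:  # e.g. GA60, where a lower number is the good one
        z = -z
    if abs(z) <= 1:
        return "average"
    labels = ("good", "great", "excellent") if z > 0 else ("poor", "bad", "awful")
    if abs(z) <= 2:
        return labels[0]
    if abs(z) <= 3:
        return labels[1]
    return labels[2]
```

As a usage example, take the league GF% Median of 50.3 and a Standard Deviation of about 2.75 (inferred from the 67% band of 47.5-53.0). A GF% of 54.3 sits roughly 1.45 Standard Deviations above the Median, so `classify(54.3, 50.3, 2.75)` lands in "good".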

And since I am primarily concerned with the Pittsburgh Penguins, I also determined their individual team Median values (and their corresponding Deviations) and the Correlation Coefficients for the individual team values. I wanted the individual team values because you often hear people talk about a team "regressing to the mean" as an explanation for an impending fall back to Earth or a rise from the depths. So in order to determine what mean the Pens should be expected to regress to, I figured I should see what their average actually is (which can later be used to evaluate individual players). And while I was at it, I figured I would also see how it translates to playoff success (Win/Loss Percent).

  • GF60 - Median 2.84; 67% 2.60-3.09; 95% 2.35-3.33; 99.7% 2.11-3.58; P% .1487; W-L% -.0436.
  • GA60 - Median 2.39; 67% 2.11-2.68; 95% 1.82-2.96; 99.7% 1.54-3.25; P% -.3134; W-L% -.0611.
  • GF% - Median 54.3; 67% 50.2-58.5; 95% 46.0-62.6; 99.7% 41.9-66.8; P% .3946; W-L% .0013.
  • FF% - Median 51.6; 67% 48.7-54.6; 95% 45.7-57.6; 99.7% 42.7-60.5; P% .0000; W-L% -.7982.
  • CF% - Median 51.1; 67% 47.8-54.4; 95% 44.6-57.6; 99.7% 41.3-60.9; P% .0001; W-L% -.8411.
  • FSh% - Median 6.3; 67% 5.7-6.8; 95% 5.2-7.4; 99.7% 4.6-7.9; P% .0656; W-L% .2980.
  • CSv% - Median 95.8; 67% 95.2-96.5; 95% 94.5-97.1; 99.7% 93.8-97.8; P% .2194; W-L% .2157.
  • PDO - Median 102.1; 67% 101.0-103.2; 95% 100.0-104.3; 99.7% 98.9-105.3; P% .1740; W-L% .3154.
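The P% and W-L% figures in the list above are correlation coefficients. A minimal sketch of how such a coefficient can be computed is below; I am assuming the standard Pearson formula here, which is what most spreadsheet correlation functions return, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up illustration (not real team data): four seasons of a metric
# against four seasons of Points Percentage.
metric = [1, 2, 3, 4]
points_pct = [1, 3, 2, 4]
print(round(pearson_r(metric, points_pct), 4))  # prints 0.8
```

Keep in mind that the single-team coefficients are each built from only 6 seasons, so one unusual year can swing the value a long way in either direction.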

Paint With All the Colors of the Wind

What these numbers tell us is that there is no "Gold Standard" of hockey analytics. The closest thing we have is Goals For Percentage, which is the only metric that shows a Strong Correlation (greater than .700) during the regular season. But that is just common sense: the team that is best able to outscore its opponents is going to be the team that wins most often. Every single metric has its place, so to really evaluate teams (or individual players) we need to consider all the different factors rather than relying on just one.

What is rather interesting to see is just how much more important the Goals Against statistic (Moderate Correlation) is than the Goals For statistic (Weak Correlation). Being able to score is obviously necessary to win, but defense and goaltending are what separate the Champions from the rest of the pack. It is also interesting to note that the addition of a 6th season's worth of data shifts the balance of power so that Corsi has once again overtaken Fenwick as the more reliable metric. However, they are both Moderate Correlations and are still quite close, so it is reasonable to assume they can be used fairly interchangeably, although Corsi is still the preferred method with limited amounts of data. Shot % has practically no impact, and while Save % is much higher it is still a rather Weak Correlation. And then of course there is PDO, which indicates that luck is a factor, although not a very big one (Weak Correlation).

However, the Penguins have a completely different take on things. I like to think this is due to the differences in play styles between the East and West. Whereas the West has traditionally been known for its possession-based teams, here in the East the teams are better known for their rough-and-tumble, gritty play. So with the Penguins we see the same trend of Goals Against (Moderate Correlation) being more important than Goals For (Weak Correlation), with overall differential being the more important factor (although now down to a Moderate Correlation). But in Pittsburgh we seem to notice that neither Fenwick nor Corsi has any impact whatsoever on team success. And while Shot % is still fairly meaningless, it is significantly higher than the league average, while Save % and PDO also see an increase in importance, although they remain Weak Correlations.

But then we look at the playoffs and we see extremely different results. Goals For/Against are now meaningless, while Shot % and PDO leap into the realm of Moderate Correlation (Save % is ever so slightly lower than it is in the regular season). This leads me to believe that in the playoffs, offense is more important than defense. That is not to say that defense is unimportant; you still need to be able to stop the puck, but as long as you walk away with more goals at the end of the night it doesn't matter how many you let in. One of the biggest explanations for this is the increasingly important role luck plays in the outcome of playoff games. This may not hold true for all teams, and one could argue that sample size is an issue (although any time sample size is small you see luck becoming a much bigger factor, and playoffs are all about small sample size).

The most unusual result, however, is the importance of possession in the playoffs. It would appear that for the Penguins, the most successful seasons that led to deep playoff runs are those in which their possession numbers were low. And there is an extremely Strong Correlation supporting this supposition. Of course it is important to remember that "Correlation does not imply causation," but it is quite intriguing to note that the Penguins' best seasons over the past 6 years were those in which their possession numbers were below 50%, while their worst absolute train wreck in 2012 came in the year they had their best (greater than 1 Standard Deviation) possession numbers.

Paint It Black

6-years 2007-13

Points % - Good: San Jose Sharks, Detroit Red Wings, Pittsburgh Penguins, Chicago Blackhawks, Washington Capitals, Vancouver Canucks, and Boston Bruins.

GF60 - Great: Pittsburgh Penguins.

GF60 - Good: Chicago Blackhawks, Philadelphia Flyers, Detroit Red Wings, Toronto Maple Leafs, and Washington Capitals.

GA60 - Great: New Jersey Devils.

GA60 - Good: San Jose Sharks, Boston Bruins, New York Rangers, Vancouver Canucks, and Anaheim Ducks.

GF% - Good: Pittsburgh Penguins, Boston Bruins, Vancouver Canucks, Chicago Blackhawks, Detroit Red Wings, San Jose Sharks, and Washington Capitals.

FF% - Great: Detroit Red Wings and Chicago Blackhawks.

FF% - Good: San Jose Sharks, New Jersey Devils, and St. Louis Blues.

CF% - Great: Detroit Red Wings and Chicago Blackhawks.

CF% - Good: San Jose Sharks, New Jersey Devils, Boston Bruins, and Washington Capitals.

FSh% - Good: Pittsburgh Penguins, Philadelphia Flyers, and Montreal Canadiens.

CSv% - Good: Boston Bruins, New York Rangers, and San Jose Sharks.

PDO - Great: Pittsburgh Penguins.

PDO - Good: Montreal Canadiens, Anaheim Ducks, Vancouver Canucks, Philadelphia Flyers, Toronto Maple Leafs, Boston Bruins, and Washington Capitals.

Paint the Town Red

6-years 2007-13

Points % - Poor: Toronto Maple Leafs, Florida Panthers, Colorado Avalanche, Columbus Blue Jackets, Winnipeg Jets (Atlanta Thrashers), Tampa Bay Lightning, New York Islanders, and Edmonton Oilers.

GF60 - Poor: Florida Panthers, New Jersey Devils, and Minnesota Wild.

GA60 - Poor: Toronto Maple Leafs, Tampa Bay Lightning, Colorado Avalanche, Edmonton Oilers, and Winnipeg Jets (Atlanta Thrashers).

GF% - Poor: Ottawa Senators, Tampa Bay Lightning, Florida Panthers, New York Islanders, Colorado Avalanche, Winnipeg Jets (Atlanta Thrashers), and Minnesota Wild.

GF% - Bad: Edmonton Oilers.

FF% - Poor: Minnesota Wild and Edmonton Oilers.

CF% - Poor: Minnesota Wild and Edmonton Oilers.

FSh% - Poor: New York Islanders, Carolina Hurricanes, Los Angeles Kings, Ottawa Senators, Phoenix Coyotes, San Jose Sharks, New York Rangers, Florida Panthers, and New Jersey Devils.

CSv% - Poor: Edmonton Oilers, Ottawa Senators, Columbus Blue Jackets, Colorado Avalanche, Winnipeg Jets (Atlanta Thrashers), Detroit Red Wings, Tampa Bay Lightning, Calgary Flames, and Chicago Blackhawks.

PDO - Poor: Detroit Red Wings and Ottawa Senators.