There have been numerous attempts to find the best way to predict the future, and of course there is no perfect method. There are just too many unpredictable factors at work. But that doesn't stop people from trying, so let's see how effective team-level advanced stats can be.
To evaluate the different metrics, I pulled together the 5-on-5 Close data for each season from 2007-08 through 2012-13 and calculated Goals For Percent (GF%), Fenwick For Percent (FF%), and Corsi For Percent (CF%). I also took the Points totals for every team in each season to calculate Points Percent (P%). Points totals are a fairly accurate proxy for which teams make the playoffs: with only a small handful of exceptions, the 16 teams with the most points are the ones that qualify in any given season. I also looked at the teams that did make the playoffs to see how effective they were once they got there, dividing Playoff Wins by 16 (the number of wins needed to win the Stanley Cup) to create a Playoff Success Percent (PO%).
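For anyone who wants to reproduce the percentages, here is a minimal sketch in Python. The `TeamSeason` layout and the sample numbers are hypothetical, and it assumes P% means points earned divided by the two points available per game played; only the Playoff Wins divided by 16 step comes straight from the description above.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """One team's 5-on-5 Close totals for one season (hypothetical layout)."""
    goals_for: int
    goals_against: int
    fenwick_for: int       # unblocked shot attempts for
    fenwick_against: int
    corsi_for: int         # all shot attempts for
    corsi_against: int
    points: int
    games_played: int
    playoff_wins: int      # 0 if the team missed the playoffs


def share(events_for, events_against):
    """Fraction of all events that belonged to this team."""
    total = events_for + events_against
    return events_for / total if total else 0.0


def metrics(ts):
    """Return the five percentages used in this article for one team-season."""
    return {
        "GF%": share(ts.goals_for, ts.goals_against),
        "FF%": share(ts.fenwick_for, ts.fenwick_against),
        "CF%": share(ts.corsi_for, ts.corsi_against),
        # Assumption: P% = points earned / (2 points available per game).
        "P%": ts.points / (2 * ts.games_played),
        # From the text: playoff wins out of the 16 needed to win the Cup.
        "PO%": ts.playoff_wins / 16,
    }


# Made-up numbers, purely for illustration.
example = TeamSeason(
    goals_for=55, goals_against=45,
    fenwick_for=520, fenwick_against=480,
    corsi_for=700, corsi_against=700,
    points=96, games_played=82, playoff_wins=4,
)
result = metrics(example)
```

With these made-up inputs, a team that won four playoff games (one full round) lands at a PO% of .250.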
Averages and Mean
To see which metrics are more effective, I decided to look at the data two different ways. First we have all 180 separate data points from 6 seasons' worth of 5-on-5 Close numbers (the "Total" data). Then we have a combined 6-year total for each of the 30 teams (the "Combined" data).
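The difference between the two views can be sketched as follows. This assumes the combined figure is built by summing each team's raw events over the seasons and taking the percentage once, which is one reading of "combined 6-year total"; averaging the six season percentages would be a different choice. The team codes and counts are made up.

```python
from collections import defaultdict

# (team, season, goals_for, goals_against); values are invented for illustration.
rows = [
    ("AAA", "2007-08", 60, 40),
    ("AAA", "2008-09", 45, 55),
    ("BBB", "2007-08", 40, 60),
    ("BBB", "2008-09", 55, 45),
]

# "Total" view: one GF% per team-season, so every row is its own data point.
total_view = {
    (team, season): gf / (gf + ga) for team, season, gf, ga in rows
}

# "Combined" view: sum the raw events per team, then take the percentage once.
sums = defaultdict(lambda: [0, 0])
for team, _season, gf, ga in rows:
    sums[team][0] += gf
    sums[team][1] += ga

combined_view = {team: gf / (gf + ga) for team, (gf, ga) in sums.items()}
```

Here team AAA swings between .600 and .450 season to season but sits at .525 once the seasons are pooled, which is the smoothing effect the combined data is meant to capture.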
The mean value remains the same whether we are looking at the 6-year combined results or the 180 separate data points. The mean GF%, FF%, and CF% are all an even .500, which is exactly what we would expect to see because each event for one team is an event against another team. However, due to the "Loser Point" system the NHL employs, the mean P% is actually .559 rather than a nice even .500. The .500 mark is usually the number people base their evaluations on, suggesting that being above .500 is good while falling below it is bad.
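The arithmetic behind the inflated mean can be sketched like this. It assumes a game hands out 2 points when it ends in regulation and 3 when it goes to overtime or a shootout, against 4 possible points (2 per team). The share of games going past regulation is not given in the data above; the helper below only backs out the share that a .559 mean would imply.

```python
def mean_points_pct(overtime_share):
    """League-wide mean P% when `overtime_share` of games award a third point."""
    points_awarded = 2 * (1 - overtime_share) + 3 * overtime_share
    points_available = 4  # 2 points per team per game
    return points_awarded / points_available


def implied_overtime_share(mean_pct):
    """Invert mean_points_pct: (2 + f) / 4 = mean_pct, so f = 4 * mean_pct - 2."""
    return 4 * mean_pct - 2


no_loser_point = mean_points_pct(0.0)           # every game ends in regulation
implied_share = implied_overtime_share(0.559)   # share implied by the .559 mean
```

Under these assumptions a league with no extra points averages exactly .500, and a .559 mean corresponds to a bit under a quarter of games awarding a third point.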
However, that fails to take into account the fact that average is a range, not a fixed point. An actual average value would be any result that falls within 1 standard deviation of the mean, so I calculated the average range for each metric as the mean plus or minus one standard deviation. The results differ depending on which data set we use: the full 180 data point sample has much more variability, since anything can happen in a single season, while the combined 6-year data has had time to "regress to the mean" as it were.
That gives us an average GF% between .454 and .546 based on the total data or between .473 and .528 when looking at the combined data. Likewise, we find the average FF% ranges from .468 to .532 based on the total data or from .478 to .522 when looking at the combined data. Our CF% falls between .467 and .533 based on the total data or between .478 and .523 when looking at the combined data. That would indicate that some of the studies that "prove" their findings by comparing teams around a fixed-point mean of .500 are perhaps a bit inaccurate.
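The "average range" calculation is just the mean plus or minus one standard deviation. Here is a minimal sketch; whether the sample or the population standard deviation was used for the numbers above is not stated, so the function takes that as a flag, and the input values are made up.

```python
import statistics


def average_range(values, sample=True):
    """Return (low, high) bounds one standard deviation either side of the mean.

    sample=True uses the n-1 (sample) standard deviation; sample=False uses
    the population version. Which one produced the article's ranges is an
    assumption either way.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return mean - sd, mean + sd


# Invented GF% values for three team-seasons.
low, high = average_range([0.45, 0.50, 0.55])
```

Any team-season whose value falls between `low` and `high` would count as "average" in the sense used here.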
Interestingly, the average range of P% is a much wider spread, but considering the post-Lockout parity in the NHL that shouldn't come as too much of a surprise. The skill level is quite evenly balanced throughout the league and anybody can make the playoffs in any given year. The average P% is between .479 and .639 when looking at the total data and between .503 and .614 when we look at the combined data. So this is in fact one situation in which we can argue that being above .500 is a generally positive result while being below is generally poor.
Our PO%, on the other hand, has very little consistency. Of course that goes back to the parity in the league which means anybody has a chance to win it all once the ball gets rolling. In the old days the dynasty teams could roll through easily and upsets were few and far between, but in the modern NHL even the #8 seed stands a chance. When looking at all 180 separate data points the Mean value is .002, i.e. teams that make the playoffs and get swept in the first round, while the average range can be anything from not making the playoffs to .016, meaning most teams never make it out of the first round.
We have slightly more workable data when looking at the 6-year combined data though. The mean is .122 and the average range extends from .000 to .244. In wins, that is a mean of roughly 2 with a range of 0 to just shy of 4, which means average teams don't make it past the first round. If this is true, then any time a team makes it into the 2nd round it can be seen as a positive result. If we extend that to 3 standard deviations above the mean, we see that anything at 8 Wins and above, i.e. making it into the Conference Finals, is an exceptional result that only the top 0.1% of teams are able to accomplish.
Of course, no team has come close to being that consistent every single year; the age of the dynasty teams is long behind us. However, there are a few teams that have been exceptionally effective over the past 6 seasons. The Detroit Red Wings have the most playoff success at .531, followed by the Pittsburgh Penguins and Boston Bruins tied at .521, all 3 of which fall into the truly elite +3 standard deviation range. Nearly as impressive are the Chicago Blackhawks, whose .479 puts them in the +2 standard deviation range, the top 2.2% of the league. Only two teams, the Winnipeg Jets/Atlanta Thrashers and the Edmonton Oilers, fall into the below-average range, having failed to make the playoffs at all.
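The conversion from standard deviations to playoff wins, and the "top X%" figures, can be sketched as below. The standard deviation of .122 is inferred from the quoted range of .000 to .244, and the tail percentages assume PO% is normally distributed. That is a strong assumption for a stat that cannot go below zero, so treat the percentages as rough.

```python
import math

MEAN_PO = 0.122  # combined 6-year mean PO% quoted above
SD_PO = 0.122    # inferred from the quoted average range of .000 to .244


def po_at(k):
    """PO% at k standard deviations above the mean."""
    return MEAN_PO + k * SD_PO


def wins(po_pct):
    """Convert a PO% back into playoff wins (out of 16)."""
    return po_pct * 16


def upper_tail(k):
    """Share of a normal distribution lying more than k SDs above its mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


plus_two_wins = wins(po_at(2))    # a little under 6 wins
plus_three_wins = wins(po_at(3))  # a little under 8 wins
```

Under the normal assumption, `upper_tail(2)` is about 2.3% and `upper_tail(3)` is about 0.13%, which lines up with the rounded figures used in the text.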
Correlation
Now that we have the technical details out of the way, we can throw it all together and calculate the coefficient of determination (R²) through a simple linear regression.
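For reference, here is a minimal sketch of how an R² from a one-variable least-squares fit can be computed by hand. It is not the spreadsheet calculation used for the numbers in this article, just the same statistic written out, and the function and variable names are my own.

```python
def r_squared(xs, ys):
    """R² of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Fit y = intercept + slope * x by ordinary least squares.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R² = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / syy


perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])  # points exactly on a line
noisy = r_squared([1, 2, 3], [1, 3, 2])          # made-up scattered points
```

With a single predictor, R² is the square of the ordinary correlation coefficient r, so an R² of .6541 corresponds to an r of roughly .81.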
The R² when comparing P% to
- Total GF% - .6541
- Combined GF% - .8247
- Total FF% - .3372
- Combined FF% - .6052
- Total CF% - .3350
- Combined CF% - .6098
That seems to suggest that the teams most likely to finish the season with the highest Point totals, and as such most likely to make the playoffs, are those that are best at outscoring their opponents, while possession is far less important. However, both are generally moderate correlations; the only strong correlation we get is when looking at GF% in the combined 6-year data. This isn't really a surprise though, as scoring more than your opponent results in a W, which in turn results in more Points. So it doesn't tell us anything we don't already know, except that scoring more goals than your opponent is more important than getting more shots than your opponent. However, the argument then is that outshooting your opponent usually results in outscoring your opponent, so let's see how that one stacks up.
The R² when comparing GF% to
- Total FF% - .3075
- Combined FF% - .6372
- Total CF% - .3128
- Combined CF% - .6249
Oddly enough, that is quite similar to what we see for P%. There is a moderate correlation between the possession numbers and the ability to outscore your opponent, but when looking at a single season by itself it is on the low end of moderate. This suggests that being able to outshoot your opponent does have some influence on the results, and as such on the ability to make the playoffs, but it is just one factor and there are other aspects that are equally or possibly even more important. So trying to base your predictions solely on possession is not a particularly effective method, although basing them solely on GF% is also not a foolproof plan.
But that just gives us a relative estimate of which teams are going to make the playoffs, not how they are going to do once they get there. To test that, we compare the regular season 5-on-5 Close data we gathered to the playoff results.
EDIT - Error detected in Excel's attempt to graph the results. Updated with corrected correlations.
The R² when comparing PO% to
- Total P% - .0957
- Combined P% - .6966
- Total GF% - .0472
- Combined GF% - .6403
- Total FF% - .0727
- Combined FF% - .5452
- Total CF% - .0558
- Combined CF% - .5100
Well, we still see that from one season to the next there is just no rhyme or reason to who succeeds in the playoffs. The parity in the league makes it so that practically anybody has a chance to win it all, and there are too many other factors that could affect the outcome. So you can't look at a single regular season's results and make any kind of reasonable prediction as to what will happen in the playoffs. However, over the full 6 years, the teams that have traditionally done well in the regular season also tend to do quite well during the playoffs. But with roster turnover and staff changes, some teams may be better or worse than their 6-year numbers suggest, so there is still a lot of uncertainty, which explains why there is still just a moderate correlation, even if it is on the high end of moderate.
Having seen those results, I then wanted to see what the numbers would look like if we measured how far each team made it rather than how many games it won. For those correlations I assigned a value of 0 to teams that did not make the playoffs, 1 to teams that did not make it past the 1st round, 2 to teams that did not make it past the 2nd round, 3 to teams that did not make it past the 3rd round, 4 to teams that lost in the Finals, and 5 to teams that won the Stanley Cup.
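That scoring can be sketched as a small function. It assumes every round is a best-of-seven, so each completed round adds exactly 4 wins, and it needs a separate flag for making the playoffs because a swept team and a non-playoff team both have 0 wins. The function name is hypothetical.

```python
def round_value(made_playoffs, playoff_wins):
    """Map a season's playoff result onto the 0-5 scale described above.

    0 = missed the playoffs, 1 = out in the 1st round, 2 = out in the 2nd,
    3 = out in the 3rd, 4 = lost in the Finals, 5 = won the Stanley Cup.
    Assumes best-of-seven rounds (4 wins per round won).
    """
    if not made_playoffs:
        return 0
    if playoff_wins == 16:
        return 5
    # Each full block of 4 wins is one round won; the team lost the next one.
    return playoff_wins // 4 + 1


missed = round_value(False, 0)   # did not qualify
swept = round_value(True, 0)     # qualified but won no games
finalist = round_value(True, 14) # lost in the Finals
champion = round_value(True, 16) # won the Cup
```

Note that this scale treats a first-round sweep and a seven-game first-round loss the same, which is exactly the difference between it and PO%.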
The R² when comparing Round to
- Total P% (Playoff teams) - .1004
- Combined P% (Playoff teams) - .5581
- Total GF% (Playoff teams) - .0527
- Combined GF% (Playoff teams) - .5409
- Total FF% (Playoff teams) - .0707
- Combined FF% (Playoff teams) - .4681
- Total CF% (Playoff teams) - .0503
- Combined CF% (Playoff teams) - .4431
- Total P% (All teams) - .4569
- Combined P% (All teams) - .6269
- Total GF% (All teams) - .2906
- Combined GF% (All teams) - .6259
- Total FF% (All teams) - .2573
- Combined FF% (All teams) - .5505
- Total CF% (All teams) - .2324
- Combined CF% (All teams) - .5309
I'm somewhat surprised to see that when you toss in all 180 results you wind up with a better correlation than when you focus on just the 96 teams that made the playoffs. But I suppose once again we can chalk that up to the playoffs being a different animal: anything can happen and sometimes the better team loses. When we look at just the playoff teams, the single-season correlation between playoff success and GF%, FF%, and CF% is low enough to be practically non-existent. The P% is higher, albeit still a weak correlation. If I had to hazard a guess, I would say that a team that does that well in the regular season has the right combination of traits: offense and defense, possession and scoring, as well as all the little unquantifiable factors that are so often chalked up to luck. And a complete team that does that well in the regular season is usually built for a deep playoff run. When we toss in all teams, P% becomes a moderate correlation while GF%, FF%, and CF% are weak.