# Projecting 2015-16 NHL Standings by Identifying the Best Predictive Statistics

“If you can look into the seeds of time and say which grain will grow and which will not, speak then to me, who neither beg nor fear your favors nor your hate.” ~ Lord Banquo from Macbeth

Those who follow the NHL are fanatical and devoted. As every new season approaches, each and every fan gathers with a buddy over a beverage and discusses the outlook for their franchise of choice. Each of us shares one thing in common with Banquo, Macbeth’s brave and ambitious right-hand man in Shakespeare’s play. You see, Banquo desperately wants to know what the future holds and urges some otherworldly prognosticators to shed light on who will become king, or should I say, sip from the Cup.

I never considered myself otherworldly, or worldly for that matter, but I do appear in front of you now with a forecast of events to come. Last season, I investigated which fancy stat possession statistic would be the best predictor of future results. This season, I have taken my prognostication to the next level. I have used both simple (i.e., one variable) and multiple linear regression analysis to determine which combination of statistics most enhances the clarity of the crystal ball.

Macbeth and Banquo were as eager as you are to have the future foretold. But Banquo was skeptical of the forecast. Macbeth, on the other hand, seemed to find the outlook quite reasonable. I have presented my analysis and summarized results below. Are you like Macbeth and ready to trust the numbers? Or are you another Banquo?

IDENTIFYING INDEPENDENT VARIABLES

The plan is this: I want to find one, or several, team statistics from the prior season (i.e., the independent variables) which act as strong predictors and can be used alone, or in combination, to reduce the expected variability in my forecast of each team’s current season point total (i.e., the dependent variable).

I started with the following groups of team statistics:

• Goals for/against
• Hits
• Special teams
• Corsi/Fenwick
• Save percentage
• Penalties
• Shooting accuracy
• Offensive zone starts
• Faceoffs

Each statistic can be evaluated under different game situations. For example, assessing a player’s puck possession while he’s on the powerplay or shorthanded isn’t necessarily relevant. A team with the man advantage would naturally possess the puck more, you would hope, in that situation. Consequently, I started my analysis by researching 5-on-5 situations only. This could be at any point in the game; however, I also narrowed my data review to critical game situations such as “Within 1” and “Close”. A “Close” situation is when the game is within one goal in the first two periods or tied in the final period.
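The “Close” definition above can be written as a small predicate. This is only a sketch: the function name and arguments are made up for illustration, and treating anything after the second period (including overtime) like the final period is an assumption, since the definition does not mention overtime.

```python
def is_close(period: int, goal_margin: int) -> bool:
    """Return True if the score state counts as "Close" as defined above.

    period: 1, 2 or 3. Anything past the second period is treated like the
            final period here, which is an assumption.
    goal_margin: score difference between the two teams (sign is ignored).
    """
    margin = abs(goal_margin)
    if period <= 2:
        return margin <= 1  # within one goal in the first two periods
    return margin == 0      # tied in the final period


print(is_close(1, 1))  # one-goal game in the first period
print(is_close(3, 1))  # one-goal game in the third period
```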

The question now becomes: Which of these metrics can best predict future outcomes? Before fine-tuning the model, I evaluated each of these statistics on its own to determine which had the strongest correlation to a team’s point total the following season.

SIMPLE REGRESSION

I used simple one-variable linear regression to assess which statistic is most directly linked (i.e., most highly correlated) to a team’s points. The goal of linear regression is to formulate a line that best approximates the observed data. The idea is that the more closely the data points follow this line, the stronger the forecast one can develop. The fit is measured using R2 (the coefficient of determination), which, in this case, ranges from 0 to 1. Correlation is strongest as R2 approaches 1 and weakest as it approaches 0.
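For readers who want to see the mechanics, here is a minimal sketch of a one-variable least-squares fit and its R2 in plain Python. The numbers are made up purely for illustration and are not real NHL data; the function and variable names are likewise hypothetical.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (unexplained variation / total variation)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


# MADE-UP illustrative values, not real NHL data
prior_season_fenwick = [46.0, 48.0, 50.0, 52.0, 54.0]
next_season_points = [80.0, 95.0, 88.0, 92.0, 105.0]

slope, intercept, r2 = simple_regression(prior_season_fenwick, next_season_points)
print(f"points = {intercept:.2f} + {slope:.2f} x Fenwick, R2 = {r2:.4f}")
```

The closer the toy points sit to the fitted line, the closer R2 gets to 1; scattering them at random drives it toward 0.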

The following graph shows the correlation between regular season point totals and the team’s 5-on-5 Fenwick from the prior season. (Fenwick is the sum of shots on goal and shots missed, but excludes shots blocked.) The formula at the bottom of the graph is the equation of the regression line that best fits the historical results. This formula could, in effect, be used as a predictor for future results.

Performing regression analysis on Fenwick results in an R2 of 0.2523, which doesn’t seem all that positive on a scale of 0 to 1. However, the data points do have a visible slope in comparison to the same analysis of another statistic, such as penalty differential.

Penalty differential is a measure of the number of penalties drawn less the penalties your team has taken. As you can see above, this metric has an R2 approaching zero and doesn’t pass the eye test. The data is scattered about randomly; there is no visual evidence that better penalty differential results in more team points.

I performed this type of simple regression analysis on the 5-on-5 independent variables introduced earlier and have presented a sample of the results in the table below. You begin to notice how the related statistics interact with each other. For example, goals for and goals against each have a low R2 on their own; however, goal differential (i.e., goals for less goals against) is far more predictive.

• Goals for: R2 = 0.0230
• Goals against: R2 = 0.0442
• Goal differential: R2 = 0.2450
• Hit differential %: R2 = 0.0295
• Powerplay goals for: R2 = 0.0020
• Powerplay goals against: R2 = 0.0353
• Special teams goal differential: R2 = 0.0880
• Corsi for: R2 = 0.1989
• Fenwick for: R2 = 0.2523
• Save %: R2 = 0.0132
• PDO (shooting + save %): R2 = 0.0516
• Penalty differential: R2 = 0.0026
• Shooting %: R2 = 0.0395
• Offensive zone starts: R2 = 0.0916
• Faceoff %: R2 = 0.1122

Of the variables analyzed, goal differential and either of the puck possession metrics, Corsi or Fenwick, are the most correlated on their own with the following season’s point totals. But can the forecast be improved by combining these statistics in a regression analysis? The next step is to test different combinations of variables in an effort to increase R2. And, based on the one-variable results above, I think I’ll start with Fenwick and goal differential and build from there.

MULTIPLE REGRESSION

The concepts behind simple and multiple regression are the same; however, multiple regression is a means of testing the fit of a multi-variable forecast model. In other words, anywhere from two to several statistics can be used to test a hypothesis that the arrangement of those statistics can predict an outcome with as low variability as possible.

The statistical analysis differs slightly under multiple regression. There are two outputs to review. The first is adjusted R2, which has essentially the same meaning as R2 under simple regression but penalizes the model for each variable added. The second is the p-value of each independent variable. A model’s success in predicting outcomes requires explainable variability (as high an adjusted R2 as possible) and statistically significant independent variables (a p-value below 0.05, i.e., at least 95% significance, is suggested, and 99% significance is better). And you can’t rely on only a few of your variables being statistically significant; the forecast will not be appropriately predictive unless they all are.
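The two checks described above can be sketched in a few lines. The adjusted R2 formula is the standard one; the sample size, R2 and p-values in the example are hypothetical placeholders, not figures from my dataset, and the helper names are invented for illustration.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R2: discounts R2 for each predictor in the model."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / dof


def all_significant(p_values, alpha: float = 0.05) -> bool:
    """True only if every variable clears the significance threshold."""
    return all(p < alpha for p in p_values)


# HYPOTHETICAL inputs: 180 team-seasons, two predictors, raw R2 of 0.31
two_var = adjusted_r2(0.31, n_obs=180, n_predictors=2)
# Same raw R2 with a third predictor: the adjustment grows
three_var = adjusted_r2(0.31, n_obs=180, n_predictors=3)
print(round(two_var, 4), round(three_var, 4))

# HYPOTHETICAL p-values: one weak variable spoils the whole model
print(all_significant([0.004, 0.009], alpha=0.01))
print(all_significant([0.004, 0.150], alpha=0.01))
```

The second helper mirrors the rule that every variable, not just some of them, has to be significant at the chosen level.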

I revisited the independent variables introduced earlier and tested several pairs of them in combination. The most statistically significant combinations of variables always included Fenwick or goal differential with another variable. Some statistics such as hits and penalties produced unsatisfactory results and were discarded from further evaluation. Other variables (e.g., faceoffs, zone starts and special teams) were significant within a range of 80-90%. These statistics weren’t adequate to be used alone with Fenwick or goal differential, but they did have enough of a predictive nature to reconsider in other combinations later in my analysis.

The most statistically significant combinations of statistics are listed below with the results of the multiple regression. At this point, the two most predictive metrics are Fenwick and goal differential.

• Fenwick & goal differential: Adjusted R2 = 0.3023, 99% significant
• Fenwick & PDO: Adjusted R2 = 0.3021, 99% significant
• Goal differential & PDO: Adjusted R2 = 0.2772, 99% significant

As you see, despite having weak correlation on its own, PDO (the sum of offensive shooting and goaltender save percentage) contributes to the forecast when used with another variable. However, I tried running a three-variable regression with Fenwick, goal differential and PDO and the combination of variables was no longer predictive. In fact, no combination of three variables could reproduce the levels of reduced variability and statistical significance summarized above.

INTERACTION OF VARIABLES

Before finalizing the forecasting model, it’s crucial to understand how variables interact with each other. Variables such as goal differential and PDO may not be strong predictors as separate independent variables because they are dependent on each other. A team’s goal differential is directly related to its shooting accuracy and ability to stop pucks. Consequently, you can test these statistics together in a regression analysis, but as a single independent variable. How do you turn two variables into one? In this case, I multiplied goal differential by PDO.

I tested several of the statistics in this manner to determine which interactions improved the model. Could zone starts be multiplied by Fenwick? What if I analyzed the relationship between goal differential and special teams? In the end, I observed that winning faceoffs and puck possession were naturally interactive variables in addition to goal differential and PDO.
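To make the idea of an interaction term concrete, the sketch below multiplies the paired statistics into single variables for three teams, using their 2014-15 figures from the standings table later in this article. Expressing the percentages as fractions (52.6% as 0.526, 100.6% as 1.006) is an assumption on scale, and the dictionary keys and function name are invented for illustration.

```python
# 2014-15 figures read from the standings table in this article,
# with percentages written as fractions (an assumption about scale).
teams = {
    "Blues":   {"fenwick": 0.526, "faceoffs": 0.539, "goal_diff": 42,   "pdo": 1.006},
    "Rangers": {"fenwick": 0.498, "faceoffs": 0.460, "goal_diff": 61,   "pdo": 1.019},
    "Sabres":  {"fenwick": 0.384, "faceoffs": 0.451, "goal_diff": -116, "pdo": 0.985},
}


def add_interactions(stats):
    """Collapse each pair of related statistics into one product term."""
    return {
        "fenwick_x_faceoffs": stats["fenwick"] * stats["faceoffs"],
        "goal_diff_x_pdo": stats["goal_diff"] * stats["pdo"],
    }


features = {name: add_interactions(stats) for name, stats in teams.items()}
for name, f in features.items():
    print(name, round(f["fenwick_x_faceoffs"], 4), round(f["goal_diff_x_pdo"], 2))
```

Each team now has two regressors instead of four, which is what lets related statistics enter the model together without competing against each other.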

After considering the interaction of variables, the most statistically significant combination of statistics was the following:

• Fenwick x faceoffs & goal differential x PDO: Adjusted R2 = 0.3034, 99% significant

I then determined which scoring situation provided the best results. The results above are based on all 5-on-5 scenarios. Would they be the best predictors? Or would a 5-on-5 Close or 5-on-5 Within 1 scenario improve the model? It turned out that developing the most statistically significant model meant basing puck possession on a Within 1 scenario and using goal differential and PDO data from all 5-on-5 situations. The final regression results were as follows:

• 5-on-5 Within 1 Fenwick x faceoffs & 5-on-5 goal differential x PDO: Adjusted R2 = 0.3083, 99% significant

Satisfied that my model could not be significantly improved with further analysis, I developed the formula below to predict the number of points each NHL team will finish the 2015-16 NHL season with.

 2015-16 Team Points = 37.77 + 216.30 x (2014-15 5-on-5 Within 1 Fenwick x faceoffs) + 0.13 x (2014-15 5-on-5 goal differential x PDO)
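As a quick sanity check, the formula can be sketched as a function. The coefficients are the rounded ones printed above; the function name is made up, and entering percentages as fractions (52.6% as 0.526) is an assumption that happens to reproduce the published projections to within a few tenths of a point. The small gap is presumably due to rounding in the printed coefficients.

```python
INTERCEPT = 37.77
COEF_POSSESSION = 216.30  # applies to Fenwick x faceoffs
COEF_GOALS = 0.13         # applies to goal differential x PDO


def project_points(fenwick: float, faceoffs: float, goal_diff: float, pdo: float) -> float:
    """Projected 2015-16 points from 2014-15 inputs.

    fenwick, faceoffs and pdo are fractions (e.g. 0.526, 0.539, 1.006);
    that scale is an assumption, not something the formula states.
    """
    return (
        INTERCEPT
        + COEF_POSSESSION * (fenwick * faceoffs)
        + COEF_GOALS * (goal_diff * pdo)
    )


# Blues' 2014-15 inputs from the standings table in this article
blues = project_points(fenwick=0.526, faceoffs=0.539, goal_diff=42, pdo=1.006)
print(round(blues, 1))  # compare with the 104.7 shown in the table
```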

You can visualize this formula graphically by comparing its results to the average point totals dating back to the 2008-09 NHL season. For this purpose, I excluded the lockout-shortened 2012-13 campaign. In the graph below, Season-end Rank refers to the order in which each NHL team would place if each team’s point totals were ranked from one through 30.

The forecasting model developed with the regression formula is fairly representative of season-ending point totals without consideration for the specific teams involved. The model does start to produce gaps at each tail of the graph. These gaps arise because regression analysis develops results that regress toward the mean.

The projected point value developed by the formula is an average of an acceptable range of results that maintains statistical significance. Therefore, the model is sound but the results would not appear similar to what you expect to see in the season-end standings. In reality, some teams perform better than expected and wind up in the upper percentiles of their range of results, while other franchises underachieve and fall below what is expected on average.

To develop a prediction that is more representative of reality, I converted the blue average point total line shown in the graph above into five linear segments as shown in the graph below. I then used these five segments of the actual point total data to linearly interpolate the results of the regression analysis.
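The adjustment described above can be sketched as a piecewise-linear interpolation. The knot values below are hypothetical placeholders, not the ones read off my graph, and mapping a team’s projected rank onto the segmented historical curve is only one plausible reading of the adjustment. Six knots define the five segments.

```python
from bisect import bisect_right


def piecewise_linear(x, knots):
    """Interpolate y at x from (x, y) knots sorted by x.

    Values outside the knot range are clamped to the end points.
    """
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # segment with xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# HYPOTHETICAL (season-end rank, average points) knots: five segments
HYPOTHETICAL_KNOTS = [
    (1, 115.0), (5, 105.0), (12, 97.0), (20, 90.0), (27, 75.0), (30, 62.0),
]

for rank in (1, 3, 16, 30):
    print(rank, piecewise_linear(rank, HYPOTHETICAL_KNOTS))
```

Because the end segments are steeper than the regression line, teams at either tail get pulled away from the mean, which is the gap the adjustment is meant to close.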

The results of this adjustment to the regression analysis output are shown in the chart below. As you can see, my prediction of season-end point totals is far more consistent with actual results observed over the past several NHL seasons.

But that’s just the math behind it. Let’s get to your favorite hockey teams.

2015-16 PROJECTED NHL STANDINGS

All the statistics used in the regression formula I developed are based on the 2014-15 NHL regular season and are summarized in the table below. I have included the projected 2015-16 point totals based on the regression formula itself and the final projection which was adjusted to better represent recent historical results. As a reminder, the final regression formula was:

 2015-16 Team Points = 37.77 + 216.30 x (2014-15 5-on-5 Within 1 Fenwick x faceoffs) + 0.13 x (2014-15 5-on-5 goal differential x PDO)

2015-16 Projected NHL Standings Based on Multiple Linear Regression Analysis

| Team | 14-15 5-on-5 Within 1 Fenwick | 14-15 5-on-5 Within 1 Faceoffs | 14-15 5-on-5 Goal Differential | 14-15 5-on-5 PDO | Projected 15-16 Points Based on Regression Results | Projected 15-16 Points Based on Adjustment to Regression Results |
| --- | --- | --- | --- | --- | --- | --- |
| Blues | 52.6% | 53.9% | +42 | 100.6% | 104.7 | 116.0 |
| Blackhawks | 52.9% | 51.6% | +34 | 100.4% | 101.3 | 107.1 |
| Lightning | 53.3% | 48.9% | +53 | 101.7% | 101.3 | 106.4 |
| Capitals | 51.5% | 52.3% | +38 | 101.4% | 101.1 | 105.7 |
| Kings | 54.2% | 51.2% | +21 | 99.7% | 100.6 | 104.8 |
| Islanders | 54.1% | 50.4% | +21 | 99.2% | 99.5 | 104.5 |
| Red Wings | 52.3% | 51.8% | +20 | 100.4% | 99.0 | 102.7 |
| Canadiens | 50.1% | 51.4% | +30 | 101.7% | 97.5 | 98.3 |
| Wild | 51.7% | 49.6% | +29 | 100.3% | 97.1 | 97.4 |
| Predators | 53.3% | 48.6% | +24 | 99.9% | 97.0 | 97.1 |
| Stars | 52.5% | 52.1% | 0 | 99.6% | 96.9 | 96.9 |
| Bruins | 50.6% | 52.8% | +8 | 100.0% | 96.6 | 96.7 |
| Jets | 53.1% | 48.9% | +19 | 100.5% | 96.5 | 95.3 |
| Penguins | 53.8% | 48.7% | +13 | 99.9% | 96.2 | 94.1 |
| Ducks | 51.2% | 51.4% | +7 | 99.9% | 95.6 | 93.0 |
| Rangers | 49.8% | 46.0% | +61 | 101.9% | 95.6 | 92.4 |
| Sharks | 51.2% | 51.9% | -2 | 99.4% | 95.0 | 91.4 |
| Hurricanes | 52.1% | 52.2% | -36 | 97.5% | 91.9 | 90.8 |
| Senators | 48.2% | 48.8% | +24 | 101.2% | 91.9 | 89.5 |
| Canucks | 50.5% | 47.0% | +16 | 100.6% | 91.2 | 87.7 |
| Flyers | 48.5% | 50.6% | -11 | 99.8% | 89.4 | 84.9 |
| Flames | 46.2% | 47.9% | +24 | 101.6% | 88.9 | 83.4 |
| Panthers | 50.5% | 48.3% | -15 | 99.1% | 88.6 | 82.7 |
| Blue Jackets | 46.5% | 49.8% | -21 | 100.5% | 85.1 | 81.7 |
| Avalanche | 44.0% | 50.0% | -14 | 100.9% | 83.5 | 80.0 |
| Devils | 46.9% | 46.8% | -33 | 100.5% | 80.8 | 75.0 |
| Maple Leafs | 46.0% | 49.6% | -51 | 99.2% | 80.4 | 73.2 |
| Coyotes | 46.7% | 52.4% | -102 | 97.1% | 77.6 | 67.9 |
| Oilers | 47.4% | 49.0% | -83 | 97.1% | 77.3 | 65.2 |
| Sabres | 38.4% | 45.1% | -116 | 98.5% | 60.1 | 59.3 |

St. Louis is projected to capture the 2015-16 Presidents’ Trophy. The Blues have similar puck possession and PDO metrics to their Central Division rivals from Chicago, but it’s winning faceoffs and goal differential that set them apart. Of course, this is only based on regular season data and we all know what happens in the Gateway to the West come playoff time.

Teams such as the New York Rangers struggle with puck possession but have Henrik Lundqvist to keep goals against to a minimum and boost PDO. On the opposite end of the spectrum are the Dallas Stars. Dallas put together some decent Fenwick and faceoff numbers last season, but are hurt by moderate goal differential and a sub-100% PDO. Other teams like the Buffalo Sabres require no further explanation.

These results can be displayed graphically so you can better visualize projected playoff implications. I have charted each division separately starting with the Eastern Conference.

Last season’s Cup finalists from Tampa Bay lead the way, indicating that last season’s performance before and during the postseason was no fluke. The Eastern teams are well balanced between the Atlantic and Metropolitan, with neither division jumping out as being more dominant than the other.

The problem with basing a forecast on prior season data is that the projection can look at times like the past. In this case, the only projected change to the eventual playoff lineup is Boston replacing Ottawa – which, in fact, was the likely scenario for most of last season before the Bruins imploded and Andrew Hammond emerged in the Senators net. That being said, the purpose of the regression analysis was to identify certain statistics from the prior season that would be correlated to the following season’s performance despite the fact they were observed last year.

I also produced a similar chart for the Western Conference in which there appears to be more movement in comparison to the 2014-15 standings than what is observed with the Eastern Conference results.

Two of the top three positions are occupied by Chicago and Los Angeles, who between them have won the Stanley Cup in five of the past six seasons. The Kings are projected to make a big leap from their sub-par 2014-15 campaign all the way to the Pacific Division title. Dallas and San Jose are also expected to push into playoff positions, and Winnipeg falls out in a numbers game since the Central Division is expected to be extremely competitive.

The emergence of Los Angeles and the departure of Calgary are examples of how both teams defied the logic of modern hockey analytics last season. The Kings had one of the league’s best Fenwick percentages in 2014-15 and still finished out of the playoffs whereas the Flames qualified for the playoffs (and won a series) in unprecedented fashion despite bottom quartile puck possession.

The Edmonton Oilers are projected to bring up the rear in the Western Conference yet again. However, now that I think of it, I forgot to include Connor McDavid as his own independent variable in all the regression testing I performed.

BELIEVER OR SKEPTIC?

So, there you have it. I took a new multi-variable approach to prognosticating the final 2015-16 NHL standings. I will never be able to forecast with such accuracy as the witches in Macbeth. I just hope to explain as much of this season’s variability as one could using statistics from last year.

But what does the future hold for your team? Are you like Macbeth and figure these results seem reasonable to you? Or are you a Banquo and skeptical that the crystal ball is merely a hoax that will shatter once the first puck is dropped?

Let’s all find out.

Bob Sullivan writes periodically for SportingCharts.com and can be followed on Twitter at @mrbobsullivan.

NOTES

The puck possession data has been obtained from war-on-ice.com
