Photo by Braden Barwich on Unsplash

What would a hockey 2-point line look like?

Thanks to the NHL stats API, we can find out!

Blake Atkinson
May 23

The Idea

The idea of a 2-point line in hockey isn’t exactly new.
Nor is there anything unique about hockey that makes a 2-point line more necessary than in, say, soccer.
I chose to examine hockey because the NHL has the most accessible data.
In fact, it only took a few lines of code and 30 minutes to download all ~685,000 shots and shot locations from the past 5 years.
Per usual, I’ve posted the Python code I used on GitHub.
Also, it seems important to mention the 538 NBA article that originally inspired this post.
As NBA fans are aware, the game has changed drastically over the past few years in large part due to mapping shots.
Every NBA team (hopefully) is now very aware that 3-point shots are more or less the second most valuable shot in the game next to a lay-up or dunk.
While that seems like a basic concept, it wasn’t until stat nerds started mapping shots that it became empirically obvious.
Let’s do the same for hockey!

Step 1: Download Over 685,000 Shots

I love APIs.
I don’t have much experience working with them, but the few times I have it’s been wonderful.
To my knowledge, most sports data has to be scraped from a website like sports-reference.
More advanced sports data usually requires building more advanced crawlers.
If you’re unlucky, advanced stats will be behind a paywall (like PGA Tour historical strokes gained) or worse… virtually everything is behind a paywall (horse racing).
That’s why it was so refreshing to find the NHL stats API.
I first generated all the regular season API game urls.
Here’s the main snippet of that code:

    for gm in range(gms):
        game_num = "0000" + str(gm + 1)
        game_num = game_num[-4:]  # zero-pad the game number to four digits
        game_id = str(year) + "02" + game_num
        url = "https://statsapi.[...]com/api/v1/game/" + game_id + "/feed/live"
        urls.append(url)

Easy stuff! The API endpoints then offer a treasure trove of data.
Almost too much data.
With the help of the Python 3 requests and json libraries, I then just had to select the play-by-play data:

    # Note: tqdm is a progress bar library.
    import json

    import requests
    from tqdm import tqdm

    for ep in tqdm(urls):
        response = requests.get(ep)
        game_json = json.loads(response.text)
        all_plays = game_json['liveData']['plays']['allPlays']

Lastly, I filtered the play data to only non-empty-net, regulation-time goals.
I excluded overtime because of 3v3 weirdness.
Within the JSON data, there are coordinates!
My final output was just a 3-column table: Shot Type, X-coordinate, and Y-coordinate.
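To make the filtering step concrete, here is a minimal sketch of how the play list could be reduced to that 3-column table. The field names (`eventTypeId`, `periodType`, `emptyNet`, `coordinates`) and the event labels are assumptions about the feed's layout, not something shown in this post, and `extract_shots` is a hypothetical helper. The sample plays are made up so the block runs on its own.

```python
# Assumed event labels for shot-like plays; the real feed may use different names.
SHOT_EVENTS = {"SHOT", "GOAL", "MISSED_SHOT", "BLOCKED_SHOT"}


def extract_shots(all_plays):
    """Return (shot_type, x, y) rows for regulation-time, non-empty-net shot events."""
    rows = []
    for play in all_plays:
        result = play.get("result", {})
        about = play.get("about", {})
        coords = play.get("coordinates", {})
        event = result.get("eventTypeId")
        if event not in SHOT_EVENTS:
            continue  # faceoffs, hits, penalties, ...
        if about.get("periodType") != "REGULAR":
            continue  # skip overtime (3v3 weirdness)
        if result.get("emptyNet"):
            continue  # skip empty-net goals
        if "x" not in coords or "y" not in coords:
            continue  # no location recorded
        rows.append((event, coords["x"], coords["y"]))
    return rows


# Made-up plays, only to exercise each filter.
sample_plays = [
    {"result": {"eventTypeId": "FACEOFF"},
     "about": {"periodType": "REGULAR"}, "coordinates": {"x": 0, "y": 0}},
    {"result": {"eventTypeId": "SHOT"},
     "about": {"periodType": "REGULAR"}, "coordinates": {"x": 70, "y": -12}},
    {"result": {"eventTypeId": "GOAL", "emptyNet": True},
     "about": {"periodType": "REGULAR"}, "coordinates": {"x": 60, "y": 5}},
    {"result": {"eventTypeId": "GOAL", "emptyNet": False},
     "about": {"periodType": "OVERTIME"}, "coordinates": {"x": 80, "y": 2}},
    {"result": {"eventTypeId": "GOAL", "emptyNet": False},
     "about": {"periodType": "REGULAR"}, "coordinates": {"x": 82, "y": 3}},
    {"result": {"eventTypeId": "MISSED_SHOT"},
     "about": {"periodType": "REGULAR"}, "coordinates": {}},
]

shots = extract_shots(sample_plays)
print(shots)
```

In the real loop, `extract_shots(all_plays)` would be called once per game and the rows appended to one big list.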
Step 2: Plot them!

This is where we can be creative.
Matplotlib and Numpy offer many ways to represent the data.
Below, on the left, I have a scatter plot of all the shots.
They’re colored by distance to goal.
On the right, I plotted all the goal locations.
They’re colored differently, based on frequency.
We’re just scratching the surface!

I’m aware that a handful of shots come from outside the rink.
Rink dimensions I plotted are from Wikipedia so they have to be 100% accurate.
Another possibility is that rinks are not uniform in size or that there is systematic error in plotting shot locations.
Also, I found that hex bins work well and are visually attractive:

It’s almost as if being close to the net helps.

What do we want in a 2-point line?

Back to the question at hand.
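For anyone who wants to try the two plot styles, here is a rough sketch using Matplotlib's `scatter` and `hexbin`. The shot locations are random placeholders, not NHL data, and the goal position of (89, 0) in feet is my assumption about the coordinate system (a 200 ft rink centered at the origin with the goal line 11 ft from the end boards).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

GOAL_XY = (89.0, 0.0)  # assumed goal location in feet

# Placeholder shot locations in one offensive zone (random, not real data).
rng = np.random.default_rng(42)
x = rng.uniform(25, 89, 5000)
y = rng.uniform(-42.5, 42.5, 5000)

# Straight-line distance from each shot to the goal, used to color the scatter.
dist = np.hypot(x - GOAL_XY[0], y - GOAL_XY[1])

fig, (ax_scatter, ax_hex) = plt.subplots(1, 2, figsize=(10, 5))

sc = ax_scatter.scatter(x, y, c=dist, s=2, cmap="viridis")
fig.colorbar(sc, ax=ax_scatter, label="Distance to goal (ft)")
ax_scatter.set_title("Shots colored by distance")

# mincnt=1 leaves hexagons with no shots blank instead of coloring them as zero.
hb = ax_hex.hexbin(x, y, gridsize=30, cmap="inferno", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="Shots per hexagon")
ax_hex.set_title("Shot density (hex bins)")

for ax in (ax_scatter, ax_hex):
    ax.set_aspect("equal")
```

Calling `fig.savefig(...)` afterwards writes the figure to disk. Swapping the placeholder arrays for the real X and Y columns is the only change needed to plot actual shots.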
We’ve got our shot data.
We’ve got our goal data.
Now we need to determine the shooting percentages from various points on the ice.
Using basketball as a guide, it’s obvious that the ideal shot behind a two-point line wouldn’t be rewarded more than a high-danger one-point opportunity.
That gives us a bound on how close the line can be.
If the highest-danger one-point shot goes in 25% of the time, then you don’t want the two-point shot to go in at a rate higher than 12.5% of the time.
If it does, then there is no incentive to shoot one-point shots.
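The arithmetic behind that bound is just expected points per shot: shooting percentage times the value of the goal. Here is a tiny sketch of it; the function names are mine and only for illustration.

```python
def expected_points(shooting_pct, goal_value):
    """Average points earned per shot attempt."""
    return shooting_pct * goal_value


def max_two_point_pct(best_one_point_pct, two_point_value=2.0, one_point_value=1.0):
    """Highest two-point shooting percentage that does not beat the best one-point shot."""
    return best_one_point_pct * one_point_value / two_point_value


best_close_range = 0.25  # the 25% figure from the example above
cap = max_two_point_pct(best_close_range)

print(cap)                                    # break-even two-point percentage
print(expected_points(best_close_range, 1))   # value of the best one-point shot
print(expected_points(cap, 2))                # value of a two-point shot at the cap
```

At the cap the two shots are worth the same on average, so any two-point location that converts more often than that would make the close-range shot pointless.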
The other bound, how far out the two-point line should be, is completely open to debate.
Should it be so far away that it’s only used as a Hail Mary, a last-ditch effort to overcome a late deficit?
Or should it be so efficient that a shot on the line is preferred to many one-point shot locations (similar to basketball)?
I will probably strive for my best guess at a happy medium.
Mapping Efficiency

All of this speculation doesn’t matter until we actually map the shooting percentages on the ice.
How am I doing this?
I used a NumPy mask to require at least 8 shots from a point on the ice, which keeps small sample sizes from skewing the map.
There were still one or two high outlier values.
By trial and error, I capped the efficiency at 24%, although some points in front of the net approached 30%.
Then I used a Gaussian filter to smooth the data and drew contours via Matplotlib.
I’m also drawing the lines via Matplotlib.
Lastly, I’m using a NumPy mask to double the value of any goals made outside the two-point line.
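Here is a rough sketch of that pipeline on synthetic shots, not the code from my repo. The 8-shot minimum and 24% cap are the values described above. Everything else is an assumption made for illustration: the grid size, the choice of `scipy.ndimage.gaussian_filter` with `sigma=1.0`, filling masked cells with zero before smoothing, the goal at (89, 0), and a simple circular two-point line of radius 35 ft.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

MIN_SHOTS = 8    # minimum shots per cell (from the post)
MAX_PCT = 0.24   # cap on shooting percentage (from the post)


def shooting_pct_grid(x, y, is_goal, bins, extent):
    """Shooting percentage per grid cell, masked where a cell has too few shots."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_goal = np.asarray(is_goal, dtype=bool)
    shots, xedges, yedges = np.histogram2d(x, y, bins=bins, range=extent)
    goals, _, _ = np.histogram2d(x[is_goal], y[is_goal], bins=bins, range=extent)
    raw = goals / np.maximum(shots, 1)   # avoid dividing by zero in empty cells
    capped = np.minimum(raw, MAX_PCT)    # clamp high outliers
    pct = np.ma.masked_where(shots < MIN_SHOTS, capped)
    return pct, xedges, yedges


def expected_points_grid(pct, xedges, yedges, goal_xy=(89.0, 0.0), line_radius=35.0):
    """Double the value of cells whose centers lie outside a circular two-point line."""
    xc = (xedges[:-1] + xedges[1:]) / 2
    yc = (yedges[:-1] + yedges[1:]) / 2
    dist = np.hypot(xc[:, None] - goal_xy[0], yc[None, :] - goal_xy[1])
    value = np.where(dist > line_radius, 2.0, 1.0)
    return pct * value


# Synthetic shots: goal probability decays with distance (made-up numbers).
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(25, 89, n)
y = rng.uniform(-42.5, 42.5, n)
p_goal = 0.3 * np.exp(-np.hypot(x - 89, y) / 20)
is_goal = rng.random(n) < p_goal

extent = [[25, 89], [-42.5, 42.5]]
pct, xedges, yedges = shooting_pct_grid(x, y, is_goal, bins=(32, 34), extent=extent)

# Masked cells are filled with zero before smoothing; that is a simplification.
smoothed = gaussian_filter(pct.filled(0.0), sigma=1.0)
expected = expected_points_grid(pct, xedges, yedges)
```

Note that `np.histogram2d` puts x on the first axis, so the grids need a transpose (for example `smoothed.T`) before being handed to Matplotlib's `contourf` or `imshow`.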
NHL shooting percentages are on the left.
The result of doubling the value of points outside the two-point line is shown on the right.
As expected, efficiency dramatically increases near the goal.
I was shocked by how dramatically it increased.
In fact, there is a tiny zone in front of the goal that has 30% efficiency.
I think some of this close-net efficiency comes from fortunate rebounds that are basically shots into an open net.
The other interesting feature is that there are lines that extend about 30 degrees out from the goal where efficiency goes up.
I think this is a sweet zone where cross-ice passes allow one-timers to beat a goalie, but they’re not at such a sharp angle that it’s a tough shot to hit.
Again, I’ve never played hockey, and so there are plenty of people that could interpret it better than I could.
Head-on is another efficient place to shoot from, which makes intuitive sense.
My 2-point Line

At first, I was trying to have the exact same efficiency ratios as basketball, and the line seemed too close.
I like where I plotted it for a few reasons.
One, there’s no doubt that it is still more advantageous to shoot in a high danger area close to the net.
Two, it’s spread out!
Whereas only maybe 25% (my rough guess) of the offensive zone is threatening under the current rules, that number close to doubles with a two-point line.
It would be interesting to see how puck movement and strategy would change.
Three, I like that the red faceoff dots are just outside of it.
Coaches would have an easy time telling players where the most effective shooting locations are.
Basketball has similar sweet spots (corner threes and head-on shots).
Of course, this is just an opinion.
It’s fun to experiment with different options.
I encourage you to play with the code.
If anything, it’s a great lesson in Matplotlib.
Here are some other examples… yes, I’ve had too much fun with this…

Is it a good idea?

Is a 2-point line in hockey a good idea?
That’s a question that I am unqualified to answer, and it’s not the purpose of this post.
I’ve never played hockey.
Most of my hockey knowledge comes from watching the Nashville Predators late in the season and playing the masterpiece known as NHL Hitz 2002.
I can’t rule out that there’s an obvious reason it’s a dumb idea.
I don’t think a 2-point line is preposterous though.
The basketball 3-point line seems to have been a success.
It would open up the ice to different play styles and different puck movements.
It would also create heightened late-game drama.
However, there are plenty of cons.
A 2-point line would mess with basically every stat.
Gretzky records would be harder to interpret.
Would you count goalie saves vs. a 2-point shot the same as a 1-point shot?
Wouldn’t goalie save percentage go up because of more low-probability shots?
How would it affect the common strategy of trying to tip goals?
It’s easy to get carried away with speculation.
Blocked Shots

One potential problem in this analysis is blocked shots.
Blocked shots account for 25% of the shots recorded.
It’s my understanding that the NHL API gives the location where the shot was blocked rather than the location the offensive player shot from.
I mapped the data without blocks, and it didn’t change much:

There isn’t a lot of difference in the shot heat maps after you remove blocked shots.
My best guess is that a majority of blocked shots originate from long range, and the coordinates given are mid-range.
I didn’t think it negatively affected my analysis to essentially treat blocked shots as missed shots that were slightly closer than they actually were.
This creates a small bias that makes it seem like mid-range shots are less effective than they actually are, and long-range shots are more effective than they are.
A more thorough analysis would account for this.
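As a starting point for that kind of check, here is a small sketch that computes shooting percentage with and without blocked shots from the 3-column table. The `"BLOCKED_SHOT"` and `"GOAL"` labels are assumed shot-type names, `shooting_pct` is a hypothetical helper, and the rows are made up.

```python
def shooting_pct(rows, include_blocked=True):
    """Goals divided by shot attempts, optionally ignoring blocked shots."""
    attempts = [r for r in rows if include_blocked or r[0] != "BLOCKED_SHOT"]
    if not attempts:
        return None  # nothing to measure
    goals = sum(1 for r in attempts if r[0] == "GOAL")
    return goals / len(attempts)


# Made-up (shot_type, x, y) rows, only to show the calculation.
rows = [
    ("GOAL", 82, 3),
    ("SHOT", 70, -12),
    ("SHOT", 55, 20),
    ("BLOCKED_SHOT", 64, 8),
]

with_blocks = shooting_pct(rows, include_blocked=True)
without_blocks = shooting_pct(rows, include_blocked=False)
print(with_blocks, without_blocks)
```

Running the same comparison per distance band, rather than over the whole table, would show where dropping blocks moves the percentages the most.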