Punts, fourth down passes, and onside kicks tend to be called when a team has less than a 50% chance of winning…and usually with a far lower pre-snap probability.
Call me crazy, but teams should do everything in their power to retain possession of the ball when they are more likely to lose than win, and yet the average punt occurs when the possessing team has nearly a 55% probability of losing the game.
So the data tells us that coaches have some odd tendencies when it comes to play-calling, at least in the context of pre-snap win probability.
But maybe there are other situational factors that impact what kinds of plays they call, and that these plays made the most sense at the time.
To test whether this is true, I built a classification model that incorporates only a few variables to see how “predictable” play-calling actually is in the NFL.
From there, I examined how being more or less predictable impacts the likelihood of a victory, as well as the average and overall per play impact of being less predictable.
I’m not going to get into the details of how I built the classification model (or much of the details for the other techniques — you can always review the code or reach out to me on Twitter) but I wanted to make the model as “naive” as possible — meaning I didn’t expose the model to as much information as I could have.
Namely, I didn’t let the model know who was playing whom nor the week of the season.
These are variables that certainly could increase accuracy, but I didn’t want individual teams and/or season dynamics to overwhelm the situational aspects of play-calling.
That said, I trained my model on a sampling of play-by-play data from 2009–2018 and tested it on a hold-out sample drawn randomly from the same period.
Features included in the model described the pre-snap situation of the play — down, distance, position on the field, time remaining, score differential, pre-snap win probability, and a variety of expected points outcomes (i.e., the likelihood of the next scoring play).
I also controlled for season.
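To make the setup concrete, here is a minimal sketch of that kind of classifier — not the author’s actual pipeline. The feature names, the synthetic play generator, and the run-vs-pass simplification are all my assumptions; the real model also predicted punts, kicks, and so on.

```python
# Hypothetical sketch: train a play-call classifier on situational
# pre-snap features (down, distance, field position, time, score,
# win probability). All data here are synthetic.
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

random.seed(0)

def fake_play():
    """One synthetic play with the situational features described above."""
    down = random.randint(1, 4)
    distance = random.randint(1, 15)
    yardline = random.randint(1, 99)      # yards from the opponent end zone
    secs_left = random.randint(0, 3600)
    score_diff = random.randint(-21, 21)
    pre_snap_wp = random.random()
    # Crude situational tendency: longer distance / trailing -> more passing
    pass_lean = 0.3 + 0.04 * distance - 0.02 * score_diff
    label = "pass" if random.random() < pass_lean else "run"
    return [down, distance, yardline, secs_left, score_diff, pre_snap_wp], label

plays = [fake_play() for _ in range(2000)]
X = [features for features, _ in plays]
y = [label for _, label in plays]

# Random train/test split drawn from the same pool, as in the article
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

Swapping in real play-by-play data (and adding season as a control) is mostly a matter of replacing the synthetic generator with the actual feature columns.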
The model suggests that play-calling in the NFL is surprisingly predictable — the model “called” about 77% of all plays correctly within the training and testing sets (see code for additional metrics such as precision, kappa, etc).
As you can see in the confusion matrix below, the model didn’t have much trouble predicting what play would be called, only getting slightly confused on run vs pass.
This would likely be cleared up had I given the model information on the teams involved or some other historical feature on broader play-calling style.
Now that we have a broad sense of “predictability,” we can examine which teams are more or less predictable than “average.” It’s not too much of a surprise to see that New England is one of the least predictable teams overall during the time frame analyzed.

This chart shows that the model correctly called nearly 80% of all plays the Chargers ran from 2009–2018, which doesn’t say a ton about Clarence Shelmon, Hal Hunter, Ken Whisenhunt, or Frank Reich.
And though it’s not perfectly correlated, teams that have been generally successful during this time frame cluster on the less predictable side of the chart.
That correlation is interesting, partly because it’s really hard to stay good in the NFL for more than a few years and because the model is looking at each and every play — this chart is only examining “predictability” in aggregate.
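The aggregation behind a chart like that is simple: per-team “predictability” is just the share of plays the generic model called correctly, grouped by team. A minimal sketch, with invented teams and hit/miss records:

```python
# Hypothetical sketch: aggregate per-play model results into per-team
# predictability. The (team, correct) records below are made up.
from collections import defaultdict

calls = [  # (team, model_was_correct)
    ("NE", 0), ("NE", 1), ("NE", 0), ("NE", 1),
    ("LAC", 1), ("LAC", 1), ("LAC", 1), ("LAC", 0),
]

totals = defaultdict(lambda: [0, 0])      # team -> [correct, total]
for team, correct in calls:
    totals[team][0] += correct
    totals[team][1] += 1

predictability = {t: c / n for t, (c, n) in totals.items()}
for team, rate in sorted(predictability.items(), key=lambda kv: kv[1]):
    print(f"{team}: {rate:.0%} of plays called correctly")
```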
So how does per-game predictability impact wins and losses? The first step is to see how spread out per-game predictability is.
On the left, I’ve plotted the accuracy of the generalized classification model by game, separating out losses and wins (grey = 0 = loss, blue = 1 = win).
The model was solid across the board, predicting more than 60% of play-calls in almost every game, and getting near 95% right on a few.
But as you can see by the similar shapes and heights of the histograms above, predictability alone doesn’t do a great job of separating wins from losses.
This suggests there is some latent variable that would do a much better job of separating wins from losses, with team quality being the most obvious one.
With that in mind, I wanted to model out the influence of per game predictability on a win versus a loss while controlling for team strength.
By fitting a logit model on each game’s outcome using predictability and team, we can do just that.
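The shape of that model looks roughly like the sketch below — again not the author’s code, and with a simulated dataset in which team strength and the predictability effect are both invented. Team dummies stand in for the team-strength control.

```python
# Hypothetical sketch of the game-level logit:
#   win ~ per-game predictability + team dummies (team strength control)
# All games are simulated; the true predictability effect is set negative.
import math
import random
from sklearn.linear_model import LogisticRegression

random.seed(1)
teams = ["NE", "PIT", "CLE", "LAC"]
strength = {"NE": 1.0, "PIT": 0.3, "CLE": -0.8, "LAC": 0.1}  # made up

rows, wins = [], []
for _ in range(1500):
    team = random.choice(teams)
    pred = random.uniform(0.6, 0.95)          # per-game predictability
    # Simulate: stronger and less predictable teams win more often
    logit = strength[team] - 4.0 * (pred - 0.77)
    p_win = 1 / (1 + math.exp(-logit))
    dummies = [1.0 if t == team else 0.0 for t in teams]
    rows.append([pred] + dummies)
    wins.append(1 if random.random() < p_win else 0)

model = LogisticRegression(max_iter=1000).fit(rows, wins)
# A negative coefficient on predictability means less predictable -> more wins
print("predictability coefficient:", round(model.coef_[0][0], 2))
```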
And the results are interesting! The first thing to note is that using just team, predictability, and a simple model, we can pick winners from losers with a fairly good degree of accuracy.
The chart on the left is a table that shows the calibration statistics of the model (basically, if we assign 10 teams a 70% chance of winning, 7 should win over the long run — this is similar to how you would measure the accuracy of a weather forecast).
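The calibration idea in miniature: bucket games by forecast win probability and compare each bucket’s average forecast to its actual win rate. The forecasts and outcomes below are invented for illustration.

```python
# Hypothetical sketch of a calibration check: group predicted win
# probabilities into 10%-wide buckets and compare forecast vs. reality.
from collections import defaultdict

games = [  # (predicted_win_prob, won)
    (0.72, 1), (0.68, 1), (0.71, 0), (0.65, 1),
    (0.31, 0), (0.28, 0), (0.35, 1), (0.33, 0),
]

buckets = defaultdict(list)
for p, won in games:
    buckets[int(p * 10) / 10].append((p, won))   # floor to the nearest 10%

for b in sorted(buckets):
    pairs = buckets[b]
    forecast = sum(p for p, _ in pairs) / len(pairs)
    actual = sum(w for _, w in pairs) / len(pairs)
    print(f"bucket {b:.1f}: forecast {forecast:.0%}, actual wins {actual:.0%}")
```

A well-calibrated model shows forecast and actual columns that track each other across buckets (with far more games per bucket than this toy example).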
And on the right, we can see how predictability influences the likelihood of a win, once we control for overall team strength.
Basically, the less generically predictable your play-calling, the higher your likelihood of winning the game (someone, please send this article to Randy Fichtner).
So the verdict is in…be less predictable in your play-calling, and you win games.
Right? Well, not quite.
Everything in this analysis so far has largely focused on whether play-calling in the NFL is generally predictable.
It has not examined whether the plays that are being called are good or bad.
After all, being predictable is probably fine if coaches are already optimizing their play selections.
And this analysis is not getting into what plays are actually called, what personnel is on the field, and how well a team executed.
But we can broadly estimate whether zigging when other coaches zag carries any value.
To do this, I used another metric I love called win probability added.
Basically, you measure the likelihood a team is to win prior to the snap, then measure that probability again following the completion of the play.
The difference in those two probabilities is how successful or unsuccessful the play was in terms of contributing to a win (win probability added can be negative — it’s just that you “added” a negative value to your starting position).
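The metric itself is just a difference of two probabilities. A minimal sketch, with illustrative numbers:

```python
# Win probability added (WPA) for one play: the offense's post-snap win
# probability minus its pre-snap win probability. Values are illustrative.
def wpa(wp_before: float, wp_after: float) -> float:
    return wp_after - wp_before

# A conversion on third-and-long might swing win probability upward...
print(round(wpa(0.45, 0.52), 3))   # -> 0.07
# ...while a turnover "adds" a negative value to the starting position.
print(round(wpa(0.45, 0.28), 3))
```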
We can use observed WPA to create a model of eWPA, or expected WPA by play-type.
We then measure the actual WPA against the eWPA for plays that the model “called” wrong, since these are the plays where coaches deviated from what their average peer would do.
We can then see how effective teams are when they deviate and how much it’s helped their team throughout a season.
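One simple way to sketch that comparison — assuming, as an illustration, that eWPA is just the mean observed WPA for each play type — is below. The plays, teams, and WPA values are all invented, and the real eWPA model was almost certainly richer than a per-type average.

```python
# Hypothetical sketch: estimate eWPA as the mean WPA by play type, then
# score only the "surprise" plays (where the classifier's call differed
# from the actual call). All play records here are invented.
from collections import defaultdict

plays = [  # (team, predicted_call, actual_call, observed_wpa)
    ("NE",  "run",  "pass",  0.031),
    ("NE",  "pass", "pass",  0.010),
    ("CLE", "pass", "run",  -0.004),
    ("CLE", "run",  "run",   0.002),
]

# Step 1: eWPA by play type = average WPA over all plays of that type
by_type = defaultdict(list)
for _, _, actual, w in plays:
    by_type[actual].append(w)
ewpa = {ptype: sum(ws) / len(ws) for ptype, ws in by_type.items()}

# Step 2: on surprise plays only, credit the team with WPA above eWPA
surprise_value = defaultdict(float)
for team, predicted, actual, w in plays:
    if predicted != actual:
        surprise_value[team] += w - ewpa[actual]

for team, total in surprise_value.items():
    print(f"{team}: {total:+.4f} WPA above expectation on unexpected calls")
```

Summing those per-play differences over a season is what produces the team-level totals discussed next.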
As you can see from the charts above, there is some evidence that deviating from the expected play-call increases a team’s odds of winning.
On a per-play basis, an unexpected play was worth about 0.07% in additional WPA across team seasons.
That may not sound like a lot, but sum that across an entire season, and the average team adds a total of 21% of win probability.
Only 5 teams generated less WPA than expected when they zigged instead of zagged.
Is this definitive proof that teams should throw when most teams run? Not necessarily.
But there is some statistical evidence that play-calling in the NFL is sub-optimal.
Not that I’m the first analyst (or fan) to draw that conclusion.
In either case, I hope you enjoyed this deep dive into play-calling tendencies in the NFL and how applying advanced analytical techniques to play-by-play data can lead to some interesting insights.
Now, all we need to do is get coaches to stop punting when they’re expected to lose…