To answer this question, I set out to do four things:
1. Set the rules for a challenge
2. Create a benchmark prediction for the challenge
3. Test how well the best fantasy tennis player performs for the challenge
4. Build a machine learning model and test it on the challenge
The term "tennis player" could be a bit confusing in this context, since it can refer both to the fantasy tennis player and to the tennis players actually playing the match.
For clarity I will use the FTP abbreviation for the rest of the blog post whenever I refer to the fantasy tennis player.
Set the Rules for a Challenge
The challenge will consist of predicting which player will win a tennis match, under the following rules:
- I will use 1758 matches from all 66 ATP singles tournaments in 2018, excluding the ATP Finals. The year 2018 was chosen because it is the last complete season available at the time of writing. Why not all matches of the season were chosen is explained in section 3.
- Only data available before a match started can be used to predict the outcome of the match.
- The contestants in the challenge will be the benchmark prediction, the FTP and the machine learning model, all of which will predict the outcome of the same matches.
- Accuracy will be used to measure how good the contestants' predictions were, which means that the contestant who correctly predicts the most matches wins. Accuracy is the number of correct predictions divided by the total number of predictions. For example, if 50 out of 100 matches are predicted correctly, the accuracy is 50%.
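The accuracy metric is simple enough to write down directly. Below is a minimal Python sketch, assuming predictions and actual results are given as lists of winner names in the same match order; the function and argument names are illustrative and not taken from the author's code.

```python
def accuracy(predicted_winners, actual_winners):
    """Fraction of matches whose winner was predicted correctly."""
    if len(predicted_winners) != len(actual_winners):
        raise ValueError("need exactly one prediction per match")
    if not actual_winners:
        raise ValueError("no matches to score")
    # Count the matches where the predicted winner equals the actual winner.
    correct = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return correct / len(actual_winners)


# The example from the rules: 50 of 100 matches predicted correctly.
actual = ["Player A"] * 100
predicted = ["Player A"] * 50 + ["Player B"] * 50
print(f"{accuracy(predicted, actual):.0%}")  # 50%
```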
Create a Benchmark Prediction for the Challenge
To have something to compare the predictions of the FTP and the machine learning model with, I first created a benchmark prediction.
The strategy was simple: always pick the better-ranked player as the winner of a match.
The ATP ranking indicates a player's performance over the last 52 weeks.
With this strategy, 1101 out of 1758 matches were correctly predicted.
This equals an accuracy of about 62.6%.
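The benchmark rule can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name and the tuple layout of the match records are hypothetical, the player names and ranks are made up, and treating an unranked player (rank `None`) as ranked below everyone is an assumption the post does not discuss.

```python
import math


def benchmark_pick(player_a, rank_a, player_b, rank_b):
    """Pick the better-ranked player (lower ATP rank number) as the winner.

    Assumption: an unranked player (rank None) counts as ranked below everyone.
    If both ranks are equal (or both missing), player_a is picked.
    """
    ra = math.inf if rank_a is None else rank_a
    rb = math.inf if rank_b is None else rank_b
    return player_a if ra <= rb else player_b


# Hypothetical records: (player_1, rank_1, player_2, rank_2, actual_winner)
matches = [
    ("Player A", 5, "Player B", 40, "Player A"),
    ("Player C", 12, "Player D", 8, "Player C"),      # upset: lower-ranked player wins
    ("Player E", None, "Player F", 150, "Player F"),  # unranked vs ranked
]
picks = [benchmark_pick(p1, r1, p2, r2) for p1, r1, p2, r2, _ in matches]
correct = sum(pick == m[4] for pick, m in zip(picks, matches))
print(f"{correct}/{len(matches)} correct")  # 2/3 correct
```

The second record shows why the benchmark tops out well below 100%: every upset is a guaranteed miss.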
Test How Well the Best Fantasy Tennis Player Performs for the Challenge
So who is the best FTP in the world? As there is no official ranking for fantasy tennis, the answer is not straightforward and could be somewhat subjective.
There are many different fantasy tennis sites around, but www.[…].com is one that supports all ATP tournaments during the year.
It is also one of the sites with the most active players, with over 1000 players taking part in the Grand Slam tournaments.
For this challenge, I chose the FTP who correctly predicted the most matches during 2018:
[Figure: Best fantasy tennis player of 2018]
So the best FTP in the world correctly predicted 57% of all matches.
But wait, just 57%? That sounds quite low, considering that flipping a coin would get you 50%.
The reason is the bracket format, where all matches have to be predicted before the tournament starts.
So when the FTP makes the predictions, it is only for the first-round matches that he knows for certain which players will meet.
As the tournament progresses through to the later rounds, the predicted winner might not even play the match.
Naturally, the number of correct predictions will be lower than if each match could be predicted after the tournament had started, just before that match was played.
To make it a bit fairer to the FTP, I have only taken into account the matches where the prediction was made with the correct players in the match.
Let me explain with some examples from a section of the Australian Open 2018 draw:
[Figure: section of the Australian Open 2018 draw. Grey = no previous match, Green = won previous match, Red = lost previous match]
Example 1: All first-round matches are taken into account, since both players in each match were known at the time of the prediction.
In this section of the draw, that means all 8 matches marked by box 1.
Example 2: In this match, both players were the ones the FTP predicted, so the match is taken into account.
The FTP thought that Nadal was going to beat Mayer, which turned out to be true.
Example 3: This match would not count since one of the players was not known.
The FTP thought that the match was going to be Schwartzman vs Halys but instead it turned out to be Schwartzman vs Ruud.
Since the prediction was made based on the wrong players in the match, it will not be considered.
After adjusting to include only matches where both players were known, the FTP correctly predicted 1168 out of 1758 matches, which yields an accuracy of about 66.4%.
This is the result that will be tested against the machine learning model.
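The filtering rule from the examples can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `BracketPick` and `PlayedMatch` classes and the `scorable` function are hypothetical, the picks and played matches are assumed to be aligned by draw slot, and the players are placeholders mirroring Example 2 (correct pairing) and Example 3 (wrong opponent).

```python
from dataclasses import dataclass


@dataclass
class BracketPick:
    """One slot of a submitted bracket (hypothetical structure)."""
    predicted_players: tuple  # the two players the FTP expected in this match
    predicted_winner: str


@dataclass
class PlayedMatch:
    """The match that was actually played in the same draw slot."""
    players: tuple
    winner: str


def scorable(picks, played):
    """Keep only matches where the FTP predicted the right pair of players."""
    if len(picks) != len(played):
        raise ValueError("picks and played matches must be aligned by draw slot")
    # Compare as sets, so the order in which the two players are listed is irrelevant.
    return [(pick, match) for pick, match in zip(picks, played)
            if set(pick.predicted_players) == set(match.players)]


picks = [
    BracketPick(("Player A", "Player B"), "Player A"),  # like Example 2: pairing correct
    BracketPick(("Player C", "Player D"), "Player C"),  # like Example 3: wrong opponent
]
played = [
    PlayedMatch(("Player B", "Player A"), "Player A"),
    PlayedMatch(("Player C", "Player E"), "Player C"),
]
kept = scorable(picks, played)
correct = sum(pick.predicted_winner == match.winner for pick, match in kept)
print(len(kept), correct)  # 1 1
```

The second slot is dropped even though the FTP's predicted winner did win, because the prediction was made against the wrong opponent.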
Build a Machine Learning Model and Test it on the Challenge
As with any machine learning model, we need data before we can predict anything.
For this challenge I used the ATP data provided by Jeff Sackmann here.
Jeff has shared free, high-quality tennis data for years, so a big thank you to him.
From the ATP data, I created a set of input features for the model to be trained on.
These included each player's ranking, age, height, head-to-head record against the opponent, winning percentage and more.
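Because of the rule that only data available before a match can be used, features such as head-to-head record and winning percentage have to be computed from earlier matches only. A minimal Python sketch of that idea follows; the `(player_1, player_2, winner)` tuple layout, the function name and the 0.5 default for players without history are assumptions for illustration, not the author's implementation.

```python
from collections import defaultdict


def pre_match_features(matches):
    """Head-to-head and win-percentage features computed from earlier matches only.

    `matches` must be in chronological order; each entry is a hypothetical
    (player_1, player_2, winner) tuple rather than the real ATP data layout.
    """
    wins = defaultdict(int)    # matches won so far, per player
    played = defaultdict(int)  # matches played so far, per player
    h2h = defaultdict(int)     # h2h[(a, b)] = times a has beaten b so far

    def win_pct(player):
        # 0.5 for players with no history yet; this prior is an assumption.
        return wins[player] / played[player] if played[player] else 0.5

    rows = []
    for p1, p2, winner in matches:
        # Record the features BEFORE updating the totals,
        # so a match never "sees" its own result.
        rows.append({
            "h2h_p1": h2h[(p1, p2)],
            "h2h_p2": h2h[(p2, p1)],
            "win_pct_p1": win_pct(p1),
            "win_pct_p2": win_pct(p2),
        })
        loser = p2 if winner == p1 else p1
        wins[winner] += 1
        played[p1] += 1
        played[p2] += 1
        h2h[(winner, loser)] += 1
    return rows


history = [
    ("Player A", "Player B", "Player A"),
    ("Player A", "Player C", "Player C"),
    ("Player A", "Player B", "Player B"),
    ("Player A", "Player B", "Player A"),
]
rows = pre_match_features(history)
```

Updating the running totals only after a match's features are stored is what keeps the result of that match out of its own input.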
For the model itself, several different machine learning models were tested.
The one that performed the best was XGBoost.
All the data preparation and model building details can be found in my GitHub repository here.
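One preparation step that any winner-prediction model needs is worth illustrating: match results are recorded from the winner's and loser's side, so if the winner were always placed as "player 1", every label would be 1. The Python sketch below randomly assigns the sides instead. It assumes `winner_*`/`loser_*` style column names and uses made-up records; the function name is hypothetical, and this is not necessarily how the author's repository handles it.

```python
import random


def to_training_rows(matches, features=("rank", "age", "ht"), seed=0):
    """Turn winner/loser match records into player_1/player_2 rows with a label.

    Which side becomes "player 1" is chosen at random, because placing the
    winner first every time would make every label 1.
    label = 1 means player 1 won the match.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    rows = []
    for match in matches:
        swap = rng.random() < 0.5
        first, second = ("loser", "winner") if swap else ("winner", "loser")
        row = {}
        for name in features:
            row[f"p1_{name}"] = match[f"{first}_{name}"]
            row[f"p2_{name}"] = match[f"{second}_{name}"]
        row["label"] = 0 if swap else 1
        rows.append(row)
    return rows


# Hypothetical records using winner_*/loser_* style column names.
matches = [
    {"winner_rank": 3, "loser_rank": 20, "winner_age": 25.0,
     "loser_age": 30.1, "winner_ht": 185, "loser_ht": 190},
    {"winner_rank": 48, "loser_rank": 7, "winner_age": 22.4,
     "loser_age": 27.9, "winner_ht": 196, "loser_ht": 178},
]
rows = to_training_rows(matches)
```

The resulting rows, with roughly balanced 0/1 labels, are the kind of input a binary classifier such as XGBoost can be trained on.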
When the machine learning model predicted the same 1758 matches, it got 1375 correct, which equals an accuracy of about 78.2%.
Conclusion
In this post, I tried to answer the question of whether a machine could beat the best FTP in the world.
I found the best FTP in the world and saw how well he had predicted the winners of tennis matches during the 2018 ATP singles season.
Then I built a machine learning model using an XGBoost classifier and tested it on the same tennis matches.
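The results were:
- Benchmark prediction: 1101 of 1758 matches correct (about 62.6%)
- Best FTP: 1168 of 1758 matches correct (about 66.4%)
- Machine learning model: 1375 of 1758 matches correct (about 78.2%)
The results speak for themselves.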
Not only would the machine learning model beat the best FTP, it would do so by a huge margin of almost 12 percentage points.
So does this mean that every FTP should start letting machines make their decisions for them? Not necessarily.
Usually, there is no prize money in the world of fantasy tennis, so money is not the main motivation.
Most players participate for the honour and the fun of the game.
Letting a machine make all the decisions would remove some of the fun.
But most FTPs could probably benefit, and learn more about the game, by analysing how machine learning models predict tennis matches.
It seems there is still a lot of potential for improvement in the world of fantasy tennis.