How Do We Survive in PUBG?

What kinds of strategies can help players survive until the end? This project is also my first attempt to go through the feature engineering process in machine learning, including error analysis, feature space evaluation, ensembles, and parameter tuning.

I used Weka and LightSide to implement my experiments.

This post isn't a coding tutorial, but it's a comprehensive case study of the machine learning process, including data cleaning, dataset splitting, cross-validation, and feature design; the short code sketches along the way are illustrative only.

Data Collection

The dataset, found on Kaggle, contains over 4.45 million instances and 28 features (see data source in References).

There is a match ID representing each game, a group ID representing each group (group sizes vary from 1 to 4), and a player ID representing each player.

The data is formatted so that each instance represents one player's post-game statistics.

The features include in-game performance, such as the number of times reviving group members, number of kills, walking distance, etc. (see Appendix).

There are also external ranking features that indicate the player's performance outside the game.

The win placement percentile of each game, on a scale of 0 to 1 (1 is first place and 0 is last place), is restructured to be the final prediction class.

I chose the game mode "squad," in which players can form a group of 1 to 4 against other groups.

The reason is that a lot of features in the dataset are related to group performance.

I converted each instance to represent a group, taking the average (and, for some features, the standard deviation) across group members.

I randomly kept two groups in each game to compare which group ranked higher.

In both the training and testing sets, I converted the win placement percentile into a "winner prediction" label whose value indicates the higher-ranking group, turning the task into a binary prediction.

I also added features that made sense to compare, taking the difference between the two groups as the value: group size difference, kill rank difference, walk distance difference, and difference in the number of weapons acquired.

As for data cleaning, I removed unreasonable records such as duplicate players in the same game, negative rankings, and group sizes larger than 4.
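To make the restructuring concrete, below is a minimal pandas sketch of the cleaning, group aggregation, pairing, and difference features. I did this work in Weka and LightSide rather than in code, and the column names (matchId, groupId, Id, walkDistance, winPlacePerc, etc.) are assumptions based on the Kaggle dataset.

```python
import pandas as pd

# Load the raw Kaggle data (column names assumed from the PUBG dataset).
df = pd.read_csv("pubg.csv")

# Cleaning: drop duplicate players, invalid rankings, and oversized groups.
df = df.drop_duplicates(subset=["matchId", "Id"])
df = df[df["winPlacePerc"].between(0, 1)]
sizes = df.groupby(["matchId", "groupId"])["Id"].transform("count")
df = df[sizes <= 4]

# Aggregate players into groups: means, plus a standard deviation for some features.
groups = df.groupby(["matchId", "groupId"]).agg(
    kills=("kills", "mean"),
    walkDistance=("walkDistance", "mean"),
    rideDistance=("rideDistance", "mean"),
    winPoints_std=("winPoints", "std"),
    groupSize=("Id", "count"),
    winPlacePerc=("winPlacePerc", "first"),  # shared by all group members
).reset_index()

# Randomly keep two groups per match and lay them out side by side.
eligible = groups[groups.groupby("matchId")["groupId"].transform("count") >= 2]
pair = eligible.groupby("matchId").sample(n=2, random_state=1)
g1 = pair.groupby("matchId").head(1).set_index("matchId").add_prefix("g1_")
g2 = pair.groupby("matchId").tail(1).set_index("matchId").add_prefix("g2_")
paired = g1.join(g2)

# Difference features and the binary winner label.
paired["walkDistanceDifference"] = paired["g2_walkDistance"] - paired["g1_walkDistance"]
paired["groupSizeDifference"] = paired["g2_groupSize"] - paired["g1_groupSize"]
paired["winnerPrediction"] = (
    paired["g2_winPlacePerc"] > paired["g1_winPlacePerc"]
).map({True: "Group Two", False: "Group One"})
```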

I split the dataset in random order with the following proportions: cross-validation set 70%, development set 20%, test set 10%.
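A sketch of that 70/20/10 random split, continuing from the paired DataFrame above (the actual split was done in LightSide):

```python
import numpy as np

# Shuffle the paired instances, then cut at 70% and 90%.
rng = np.random.default_rng(seed=1)
order = rng.permutation(len(paired))
cv_end, dev_end = int(0.7 * len(paired)), int(0.9 * len(paired))

cross_validation_set = paired.iloc[order[:cv_end]]
development_set = paired.iloc[order[cv_end:dev_end]]
test_set = paired.iloc[order[dev_end:]]
```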

What I wanted to predict is which group would be the winner out of the two randomly picked groups in each game, so the classification target is winnerPrediction.

After the data cleaning process, there are 6576 instances and 65 features.

"Group Two" and "Group One" each account for roughly 50% of the winner prediction labels in each split dataset.

Data Exploration

I first did an exploratory data analysis on the development set to gain an intuitive sense of the data.

Below are some interesting findings.

Do people prefer to play solo or in a group?

The distribution of group size was very similar across Group Two and Group One.

It seems that most people preferred playing the game alone or with only one partner.

Are we more likely to win when we play aggressively?

The distribution of the kill place difference, calculated as the kill rank of Group Two minus that of Group One, appeared to be a normal distribution.

Therefore, on the right side of the graph, where the difference on the X-axis is positive, Group Two ranked lower, and on the left side, where it is negative, Group Two ranked higher.

As shown below, the red section marks games where Group Two won and the blue section games where Group One won.

This graph demonstrates that, in most cases, the group with the better kill rank is more likely to win.

Which one is a better game strategy, moving or hiding?

The distribution of the walk distance difference, calculated as the walk distance of Group Two minus that of Group One, also appeared to be a normal distribution.

Therefore, on the right side of the graph, where the difference on the X-axis is positive, Group Two walked more than Group One, and on the left side, where it is negative, Group Two walked less.

As shown below, the red section marks games where Group Two won and the blue section games where Group One won.

This graph illustrates that, in most cases, the group that traveled the longer distance was more likely to win.

Error Analysis

I chose logistic regression to start with because all the features are numeric and the prediction is binary, which suits a weight-based model.

Tree-based models would be time-consuming with 65 features.

The baseline performance was as follows: accuracy rate 0.8905, Kappa 0.7809.

1. Horizontal Difference Analysis

I first inspected the instances where the predicted winner was Group Two while the actual winner was Group One.

I sorted the horizontal differences from largest to smallest and looked at the features with relatively large feature weights.

I found that the walkDistanceDifference had a large horizontal difference and relatively important feature weight.

The walkDistanceDifference is calculated by the walking distance of Group Two minus that of Group One in the same game, so a negative number means that Group Two walked less than Group One in the game while a positive number means that Group Two walked more.

It implies that whichever group walked more was more likely to win (which makes sense in the game setting since it probably means they survived longer).

However, there could be exceptions: one group might play more aggressively, venture out more, and thus lose quicker, while another group plays it safe, hides in one place most of the time, and survives longer.

To further address this problem, I also downloaded the predicted labels as a CSV file, looked at the instances predicted as Group Two winning while Group One was the actual winner, and sorted walkDistanceDifference from largest to smallest so that I could look at the exceptions where Group Two walked more but lost the game.
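In pandas, that inspection step might look like the sketch below; the file name and the predicted/actual column names are assumptions about the exported CSV.

```python
import pandas as pd

preds = pd.read_csv("dev_predictions.csv")  # hypothetical export of predicted labels

# Instances predicted "Group Two" whose actual winner was "Group One",
# sorted so the cases where Group Two walked the most (yet lost) come first.
errors = preds[(preds["predicted"] == "Group Two") & (preds["actual"] == "Group One")]
errors = errors.sort_values("walkDistanceDifference", ascending=False)
print(errors.head(10))
```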

One thing I found was that sometimes Group Two walked more than Group One, but Group One drove for a longer distance (in the game they can choose to ride a car if they find one).

As the highlighted instances show, there were many groups that didn’t walk as much as the other group but drove much more than the other group.

So walk distance or ride distance alone may not indicate the total travel distance very well.

Also, the rideDistance had the second largest horizontal difference.

Thus these two features appeared to be problematic and needed a better representation.

Therefore, I proposed 3 new features by combining the walk distance and the ride distance: the total travel distance of Group Two, the total travel distance of Group One, and the difference between the two groups.
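Constructing the three features is a one-liner each, continuing the earlier pandas sketch:

```python
# Total travel distance per group, plus the between-group difference.
paired["g1_travelDistance"] = paired["g1_walkDistance"] + paired["g1_rideDistance"]
paired["g2_travelDistance"] = paired["g2_walkDistance"] + paired["g2_rideDistance"]
paired["travelDistanceDifference"] = (
    paired["g2_travelDistance"] - paired["g1_travelDistance"]
)
```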

I tested the new feature space in the development set and had an insignificant improvement.

Although the improvement was insignificant, the number of instances predicted as Group Two but actually Group One decreased by 5; those instances were corrected to Group One.

However, when I applied it to the cross-validation set, it actually reduced the performance.

The most probable reason is that the improvement on the development set was due to overfitting and didn't generalize to a new dataset.

2. Vertical Difference Analysis

I then did another error analysis by inspecting the vertical absolute differences.

Since the new feature space introduced more errors among the instances predicted Group One but actually Group Two, my goal was to find out how these two sets of instances looked similar to each other.

I found that the 1-KillPlace feature had a small vertical difference and large feature weight.

This is the kill-based ranking of Group One alone.

The average ranking of Group One was around 34 when it won and around 43 when it lost.

The exception here was that sometimes Group One lost even when its ranking was as high as 34.

As I mentioned earlier, sometimes a group played more aggressively and killed more people, so its kill-based ranking was high, but it also ran a higher risk of losing the game.

A takeaway for me here is that logistic regression is good at global analysis but can be biased by extreme exceptions.

I would need an algorithm that can ignore extreme exceptions and look at a smaller set of data at a time.

A decision tree could be a good one, but since I have 68 numeric features, it could take a massive amount of time to build the model.

What if I could combine the advantages of logistic regression and decision trees? A Logistic Model Tree (LMT) would be a good option to start with, since it can capture nonlinear patterns and higher variance.

So I tried LMT and compared the results of the two algorithms; I had a significant improvement on the development set.

I then applied the model to the cross-validation set and again had a significant improvement.
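I ran LMT through the Weka GUI, but the same evaluation can be scripted; here is a minimal sketch using the python-weka-wrapper3 package, with the ARFF file name as an assumption:

```python
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.core.classes import Random
from weka.classifiers import Classifier, Evaluation

jvm.start()
data = Loader(classname="weka.core.converters.ArffLoader").load_file("cv_set.arff")
data.class_is_last()  # winnerPrediction is the last attribute

# Logistic Model Tree with default settings, evaluated by cross-validation.
lmt = Classifier(classname="weka.classifiers.trees.LMT")
evl = Evaluation(data)
evl.crossvalidate_model(lmt, data, 10, Random(1))
print("accuracy:", evl.percent_correct, "kappa:", evl.kappa)
jvm.stop()
```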

3. Ensembles

Since boosting specifically focuses each iteration on the instances the previous model classified wrong, I thought it would be a good method to improve the accuracy in my case.

So I tried AdaBoost with LMT as the base classifier on the development set; however, the performance dropped.
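In Weka terms this is AdaBoostM1 with LMT named as the base classifier via -W; a sketch, assuming the JVM is running as in the earlier snippet:

```python
from weka.classifiers import Classifier

# AdaBoostM1 boosting an LMT base classifier.
boosted = Classifier(
    classname="weka.classifiers.meta.AdaBoostM1",
    options=["-W", "weka.classifiers.trees.LMT"],
)
```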

Since my feature space is relatively complex, my next exploration was to remove features that may not be good indicators.

I tried the AttributeSelectedClassifier and used principal components as the evaluator.

The reason for using principal components was to reduce the dimensionality of the feature space while keeping as much information as possible.

In the end, the performance was again reduced.

I also tried CfsSubsetEval, since many of my features are correlated with each other, such as killPlace and killPoints.

This evaluator selectively keeps features that correlate with the class without duplicating the information in other features.

It turned out that this performance was also no better than my baseline.
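Both configurations can be expressed with Weka's AttributeSelectedClassifier, where -E names the evaluator and -S the search method; a sketch under the same wrapper assumptions:

```python
from weka.classifiers import Classifier

# Principal components as the evaluator, with a Ranker search.
pca_wrapped = Classifier(
    classname="weka.classifiers.meta.AttributeSelectedClassifier",
    options=[
        "-E", "weka.attributeSelection.PrincipalComponents",
        "-S", "weka.attributeSelection.Ranker",
        "-W", "weka.classifiers.trees.LMT",
    ],
)

# CFS subset evaluation with a BestFirst search.
cfs_wrapped = Classifier(
    classname="weka.classifiers.meta.AttributeSelectedClassifier",
    options=[
        "-E", "weka.attributeSelection.CfsSubsetEval",
        "-S", "weka.attributeSelection.BestFirst",
        "-W", "weka.classifiers.trees.LMT",
    ],
)
```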

Another evaluator I would have liked to try is SVMAttributeEval, since it is a backward elimination method that would suit my large feature space; however, Weka and LightSide don't provide this option.

4. Feature Space Evaluation

In addition to the wrapper methods I had already tried, I wanted to explore whether a filter method could improve the performance, since it would allow me to select features independently of the algorithm.

I used attribute selection with principal components as the evaluator.

The three settings were the original feature space (68 features), 40 features, and 20 features.

I ran an experiment to test the three feature spaces, but both of the reduced feature spaces actually hurt the performance.

I also tested other evaluators and got the same outcome, so I decided to keep the feature space as it was.
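For reference, the filter setup looks like this in Weka terms: the selection is applied to the data itself, independent of the classifier. A sketch of keeping the top 40 ranked components with Weka's AttributeSelection filter (-N sets the number kept; data as loaded in the earlier snippet):

```python
from weka.filters import Filter

# Standalone attribute selection filter: PCA evaluator with a Ranker search.
select40 = Filter(
    classname="weka.filters.supervised.attribute.AttributeSelection",
    options=[
        "-E", "weka.attributeSelection.PrincipalComponents",
        "-S", "weka.attributeSelection.Ranker -N 40",
    ],
)
select40.inputformat(data)
reduced = select40.filter(data)
print(reduced.num_attributes)
```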

Tuning Process

There were two parameters of the LMT algorithm I wanted to tune.

With minNumInstances, the default is 15 and I wanted to test 50.

I wanted to know whether raising the number of instances at which a node is considered for splitting would improve the accuracy of each node and thus the overall performance.

With numBoostingIterations, the default is -1, which means the number of LogitBoost iterations is chosen by cross-validation rather than fixed.

I wanted to test a fixed value of 3 and see if it improved the classification accuracy.

Therefore, the four settings I tested are as follows: (1) minimum number of instances at which a node is considered for splitting 15, boosting iterations -1; (2) instances 50, iterations -1; (3) instances 15, iterations 3; (4) instances 50, iterations 3. Note that setting (1) is the default setting.

I used the accuracy rate as the measure of performance.
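These two parameters correspond to LMT's -M and -I options, so the four settings can be swept in a loop (same wrapper assumptions; data as loaded earlier):

```python
from weka.core.classes import Random
from weka.classifiers import Classifier, Evaluation

# (minNumInstances, numBoostingIterations) for settings (1) through (4).
for m, i in [(15, -1), (50, -1), (15, 3), (50, 3)]:
    lmt = Classifier(
        classname="weka.classifiers.trees.LMT",
        options=["-M", str(m), "-I", str(i)],
    )
    evl = Evaluation(data)
    evl.crossvalidate_model(lmt, data, 5, Random(1))
    print(f"M={m}, I={i}: accuracy={evl.percent_correct:.2f}")
```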

Stage 1: (1) 90.81, (2) 90.81, (3) 91.05, (4) 91.05.

Setting (3) is the optimal setting, since it has the highest accuracy rate and is simpler than (4).

Stage 3: According to stage 1, setting (3) was the optimal setting.

During stage 3, I also picked setting (3) as the optimal setting in each fold.

In this case, I didn't run a significance test, so I had no evidence to conclude that the optimization was worth it.

It seems that the minimum number of instances required to split a node didn't affect the model performance much.

However, fixing the number of boosting iterations might help to improve the accuracy.

On a whole new dataset, I estimated the performance to be around 91.66 with setting (3), since that is the average of the test performance across the 5 folds.

Final Evaluation

Finally, I trained a model on the cross-validation set using LMT with setting (3): number of instances 15, boosting iterations 3.

I kept the feature space that included the 3 new features from the error analysis.
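The final step, sketched with the same wrapper: train the tuned LMT on the cross-validation set and score it once on the held-out test set (ARFF file names assumed):

```python
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.classifiers import Classifier, Evaluation

jvm.start()
loader = Loader(classname="weka.core.converters.ArffLoader")
train = loader.load_file("cv_set.arff")
test = loader.load_file("test_set.arff")
train.class_is_last()
test.class_is_last()

# Setting (3): minNumInstances 15, numBoostingIterations 3.
final_lmt = Classifier(
    classname="weka.classifiers.trees.LMT",
    options=["-M", "15", "-I", "3"],
)
final_lmt.build_classifier(train)

evl = Evaluation(train)
evl.test_model(final_lmt, test)
print("accuracy:", evl.percent_correct, "kappa:", evl.kappa)
jvm.stop()
```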

The final performance on the held-out testing set was an accuracy rate of 0.9179, which is close to what I estimated during the tuning process, and a Kappa of 0.8359.

Tree View

Reflection

Observing the tree, the model starts from the walk distance difference and splits on ride distance and travel distance difference at several points, which confirmed the importance of the features I added during error analysis.

Although some of the nodes split on features related to only one of the groups, such as Group 2-killStreaks, many other nodes used the difference between the two groups, and some even used the standard deviation of winPoints.

This also showed the usefulness of keeping the raw features alongside the added combination features.

According to the features the main tree picked, travel distance, killing ability, and the variance in group members' survival ability were the important factors in winning the game.

Typically, the more aggressively we play, that is, moving frequently and killing more people, the more likely we are to win.

During this project, I went through error analysis, including horizontal difference analysis and vertical difference analysis, as well as ensemble algorithms, feature space evaluation, and a tuning process.

The most helpful step was to observe the feature space and perform feature engineering manually.

Picking the right algorithm that fits the data was also important.

In my case, I combined the benefits of a regression model, which is good at fitting linear relationships to numeric data, and a tree model, which provides more variance and compensates for the regression model's linear limitations.

I was able to improve the model performance significantly on the cross-validation set and reach a classification accuracy of 91.79% on the final testing set.

There were some limitations to my project analysis.

First, I only examined the data under one of the game modes, and the results may not apply to all kinds of game modes in the real world.

Second, instead of predicting the rankings, I chose to turn the project scope to binary classification.

I randomly picked two groups out of one game and tried to predict who would win the game.

In this case, by removing the other groups' performance, I removed some of the variance and factors involved, which may also affect the prediction of the game outcome in the real world.

To sum up, I was able to reach my goal of predicting the winner of a PUBG game and found some of the important behaviors that can affect the game result.

For further evaluation, data from other game modes may be introduced and other types of prediction may be tested.
