# Exploring FIFA

Verdict: Market value does affect a player's wage, to an extent.

What is the preferred foot among the players, and how does it affect their positioning?

We found that fewer than 25% of the players in our dataset are left-footed (shown in the bar chart below).

To check whether a player's preferred foot has any impact on their position, we grouped the preferred-foot counts by position and converted them to proportions.

    import matplotlib.pyplot as plt

    # Get the required details from the dataframe
    dfo = dfa[['Position', 'Preferred Foot']].groupby('Position')['Preferred Foot'].value_counts().unstack()

    # Top 5 left-foot positions
    print("Top 5 Left Foot Positions:")
    print(dfo['Left'].sort_values(ascending=False).head(5))

    # Top 5 right-foot positions
    print("Top 5 Right Foot Positions:")
    print(dfo['Right'].sort_values(ascending=False).head(5))

    # Convert counts to proportions within each foot
    # (the 'Left' line is assumed by symmetry with the 'Right' line)
    dfo['Left'] = dfo['Left'] / dfo['Left'].sum()
    dfo['Right'] = dfo['Right'] / dfo['Right'].sum()

    # Plot both proportions on the same axes
    fig, ax = plt.subplots(figsize=(15, 7))
    dfo['Right'].plot(ax=ax)
    dfo['Left'].plot(ax=ax)
    plt.legend(['Right', 'Left'])
    plt.title("Position vs Foot")

The plot shows that the proportions are nearly the same for left- and right-footed players, with a few exceptions.

This means it hardly matters whether a player is left- or right-footed: the distribution across positions, and hence the demand for one position over another, is roughly the same.
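As a rough illustration of the per-foot proportions discussed here, the sketch below uses `pd.crosstab` with `normalize="columns"` on a small invented table. The toy rows are made up and are not from the FIFA dataset, and `crosstab` is a swapped-in alternative to the `groupby`/`value_counts` approach, not necessarily what the analysis used.

```python
import pandas as pd

# Toy data, invented purely for illustration (not from the FIFA dataset).
toy = pd.DataFrame({
    "Position": ["ST", "ST", "CB", "CB", "GK", "GK", "LB", "ST"],
    "Preferred Foot": ["Right", "Left", "Right", "Left",
                       "Right", "Right", "Left", "Right"],
})

# normalize="columns" divides each count by its column total, so each
# foot's column sums to 1 and holds the share of that foot's players
# found in each position.
share = pd.crosstab(toy["Position"], toy["Preferred Foot"], normalize="columns")
print(share.round(2))
```

One difference worth noting: `crosstab` fills position/foot combinations that never occur with 0, whereas `value_counts().unstack()` leaves them as NaN.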

Exploring the top 5 positions for each foot (see below), we found that CB (center back) is the third most common position and ST (striker) is in the top 5 for both feet.

There are some striking differences, though: goalkeepers, for example, are mostly right-footed.

Verdict: Yes, preferred foot does have an impact, but only a small one.

Furthermore, striker, goalkeeper, and center back are the top three positions by number of players (see the screenshot below).

Can we predict the value of a player based on their attributes (such as accuracy, shot power, reactions, and dribbling)?

    # Features chosen
    dfv = dfa[['Preferred Foot', 'Position', 'Crossing', 'Finishing',
               'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
               'Curve', 'FKAccuracy', 'LongPassing', 'BallControl',
               'Acceleration', 'SprintSpeed', 'Agility', 'Reactions',
               'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
               'LongShots', 'Aggression', 'Interceptions', 'Positioning',
               'Vision', 'Penalties', 'Composure', 'Marking',
               'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling',
               'GKKicking', 'GKPositioning', 'GKReflexes', 'Value_kEuro']]

The Position and Preferred Foot columns are one-hot encoded. After formatting the data and removing NaNs, we split the data and predicted the value using RandomForestRegressor, with GridSearchCV for hyper-parameter tuning.

We achieved an R-squared score of 0.42.
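The one-hot encoding step mentioned above could look something like the following minimal sketch. It assumes `pd.get_dummies` was used, which the text does not confirm, and the three-row table is invented for illustration.

```python
import pandas as pd

# Toy rows, invented for illustration; column names mirror the article's.
toy = pd.DataFrame({
    "Preferred Foot": ["Right", "Left", "Right"],
    "Position": ["ST", "CB", "GK"],
    "Finishing": [85, 40, 12],
})

# Each categorical column is replaced by one indicator column per category;
# numeric columns such as Finishing pass through unchanged.
encoded = pd.get_dummies(toy, columns=["Preferred Foot", "Position"])
print(encoded.columns.tolist())
```

Tree-based models such as random forests need numeric inputs, which is why the two text columns have to be encoded before fitting.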

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Predict the "value" based on the chosen attributes
    y = dfv['Value_kEuro']
    X = dfv.drop(['Value_kEuro'], axis=1)

    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    # Create the parameter grid based on the results of random search
    param_grid = {
        'bootstrap': [True],
        'max_depth': [80, 90, 100, 110, 150, 200],
        'max_features': [2, 3, 4],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [100, 200, 300, 1000]
    }

    # Create a base model
    rf = RandomForestRegressor()

    # Instantiate the grid search model
    grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)

    # Fit the grid search to the data
    grid_search.fit(X_train, y_train)
    grid_search.best_params_

    # Evaluate on the held-out test set
    y_pred = grid_search.predict(X_test)
    print("Rsquared: ", r2_score(y_test, y_pred))

Note: The model can be improved further by using different algorithms and/or feature engineering.
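To make the reported score easier to interpret, here is a small sketch of how R-squared is defined: one minus the ratio of the residual sum of squares to the total sum of squares. The helper `r_squared` and the numbers are hypothetical, written only for illustration; `r2_score` computes the same quantity.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up values (thousands of euros), for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [150.0, 150.0, 350.0, 350.0]

score = r_squared(y_true, y_pred)
print(score)
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean. On that scale, 0.42 indicates the chosen attributes explain a meaningful part of the variation in value, but far from all of it.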

Also, using mutual information regression, we found the following to be the top 5 most important features in deciding the value of a player: Reactions, BallControl, Composure, Dribbling, and ShortPassing.
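A feature ranking of this kind might be produced along the following lines. This is a sketch on synthetic data, assuming scikit-learn's `mutual_info_regression` is the function meant here; the column names `informative` and `noise` are invented.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic data: the target depends (non-linearly) on one feature only.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "informative": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = X["informative"] ** 2 + 0.1 * rng.normal(size=n)

# Estimate the mutual information between each feature and the target,
# then rank features from most to least informative.
mi = mutual_info_regression(X, y, random_state=0)
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking)
```

Unlike a plain correlation coefficient, mutual information can pick up non-linear relationships, which is why the squared dependence in this toy example is still detected.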

Verdict: Not all features are equally useful, and the market value can be predicted given enough data and player attributes.

Beyond this, we can also ask fairly straightforward questions of the data (provided we have enough of it).

Below are two such questions we attempted, followed by the Conclusion.

Clubs with the highest median wages (top 11)?

    dfa[['Wage_kEuro', 'Club']].groupby(['Club'])['Wage_kEuro'].median().sort_values(ascending=False).head(11)

Players with the largest release clause (top 11)?

    dfa[['ReleaseClaus_kEuro', 'Name']].sort_values(by='ReleaseClaus_kEuro', ascending=False)['Name'].