Feature Selection Using Regularisation: Lasso comes to the rescue
Akash Dubey, Feb 4

Introduction

Regularisation consists of adding a penalty on the parameters of a machine learning model to reduce the freedom of the model, in other words, to avoid overfitting.

In linear model regularisation, the penalty is applied over the coefficients that multiply each of the predictors.

Among the different types of regularisation, Lasso (L1) has the property that it can shrink some of the coefficients to exactly zero.

A feature whose coefficient is zero can therefore be removed from the model.
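As a rough sketch of what the penalty looks like for a linear model, the two most common variants can be written as below. The notation is assumed here for illustration and does not come from the dataset or the code in this post: a per-sample loss \ell, coefficients w_j, and a penalty strength \lambda.

```latex
\begin{aligned}
\text{Lasso (L1):} \quad & \min_{w} \; \sum_{i=1}^{n} \ell\big(y_i,\, x_i^{\top} w\big) + \lambda \sum_{j=1}^{p} |w_j| \\
\text{Ridge (L2):} \quad & \min_{w} \; \sum_{i=1}^{n} \ell\big(y_i,\, x_i^{\top} w\big) + \lambda \sum_{j=1}^{p} w_j^{2}
\end{aligned}
```

The absolute-value penalty is what allows some w_j to land exactly on zero, whereas the squared penalty only makes them small.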

In this post I will demonstrate how to select features using Lasso regularisation on a classification problem.

For classification I will use the Paribas claims dataset from Kaggle.

1. Importing important libraries

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import Lasso, LogisticRegression
    from sklearn.feature_selection import SelectFromModel
    from sklearn.preprocessing import StandardScaler

2. Loading the dataset

    data = pd.read_csv('paribas.csv', nrows=50000)
    data.shape
    data.head()

3. Selecting Numerical Columns

In practice, feature selection should be done after data pre-processing, so ideally all the categorical variables are encoded into numbers, and then we can assess how predictive they are of the target. Here, for simplicity, I will use only the numerical variables:

    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    numerical_vars = list(data.select_dtypes(include=numerics).columns)
    data = data[numerical_vars]
    data.shape

4. Separating the data into training and test sets

    X_train, X_test, y_train, y_test = train_test_split(
        data.drop(labels=['target', 'ID'], axis=1),
        data['target'],
        test_size=0.3,
        random_state=0)
    X_train.shape, X_test.shape

5. Scaling the data, as linear models benefit from feature scaling

    scaler = StandardScaler()
    scaler.fit(X_train.fillna(0))

6. Selecting features using Lasso regularisation with SelectFromModel

Here I will do the model fitting and feature selection together in one line of code.

First I specify the Logistic Regression model, and I make sure I select the Lasso (L1) penalty.

Then I use the SelectFromModel object from sklearn, which will in theory select the features whose coefficients are non-zero.

    # liblinear is one of the solvers that supports the L1 penalty
    sel_ = SelectFromModel(LogisticRegression(C=1, penalty='l1', solver='liblinear'))
    sel_.fit(scaler.transform(X_train.fillna(0)), y_train)

7. Visualising features that were kept by the Lasso regularisation

    sel_.get_support()

The output is a boolean array with one entry per feature, in column order. True marks the features that Lasso considered important (non-zero coefficients), while False marks the features whose weights were shrunk to zero and are not important according to Lasso.
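To see what this boolean mask looks like without downloading the Kaggle file, here is a minimal sketch on synthetic data. The dataset from make_classification, the column names f0 to f19 and the value C=0.05 are all assumptions chosen for illustration; they are not part of the Paribas example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (an assumption for illustration):
# 20 features, of which only 4 carry signal.
X_arr, y = make_classification(
    n_samples=500, n_features=20, n_informative=4, n_redundant=0,
    random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(20)])

X_scaled = StandardScaler().fit_transform(X)

# L1-penalised logistic regression wrapped in SelectFromModel
selector = SelectFromModel(
    LogisticRegression(C=0.05, penalty="l1", solver="liblinear"))
selector.fit(X_scaled, y)

mask = selector.get_support()              # boolean array, one entry per column
coefs = selector.estimator_.coef_.ravel()  # fitted coefficients, same order

# Map the mask back to column names
kept = X.columns[mask].tolist()
dropped = X.columns[~mask].tolist()
print("kept:", kept)
print("dropped:", dropped)
```

Indexing the column index with the mask (and with its negation) is the same trick the following steps use to list the selected and removed features.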

8. Making a list of the selected features

    selected_feat = X_train.columns[sel_.get_support()]
    print('total features: {}'.format(X_train.shape[1]))
    print('selected features: {}'.format(len(selected_feat)))
    print('features with coefficients shrunk to zero: {}'.format(
        np.sum(sel_.estimator_.coef_ == 0)))

Number of features whose coefficient was shrunk to zero:

    np.sum(sel_.estimator_.coef_ == 0)

9. Identifying the removed features

    removed_feats = X_train.columns[(sel_.estimator_.coef_ == 0).ravel().tolist()]
    removed_feats

10. Removing the features from the training and test sets

    X_train_selected = sel_.transform(X_train.fillna(0))
    X_test_selected = sel_.transform(X_test.fillna(0))
    X_train_selected.shape, X_test_selected.shape

Note: L2 regularisation does not shrink coefficients to zero

    # Separating the data into train and test set
    X_train, X_test, y_train, y_test = train_test_split(
        data.drop(labels=['target', 'ID'], axis=1),
        data['target'],
        test_size=0.3,
        random_state=0)
    X_train.shape, X_test.shape

For comparison, I will fit a logistic regression with Ridge (L2) regularisation and evaluate the coefficients:

    l2_logit = LogisticRegression(C=1, penalty='l2')
    l2_logit.fit(scaler.transform(X_train.fillna(0)), y_train)

Now, let's count the number of coefficients with zero values:

    np.sum(l2_logit.coef_ == 0)

The number of coefficients equal to zero is now zero.

So it is clear that Ridge regularisation (L2 regularisation) does not shrink the coefficients to zero.
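The same contrast can be reproduced on data that needs no download. The sketch below is only an illustration under assumed settings: a synthetic dataset from make_classification, C=0.05 for both models, and the liblinear solver, none of which come from the Paribas example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 30 features, 5 informative
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, n_redundant=0,
    random_state=0)
X = StandardScaler().fit_transform(X)

# Same data, same C, only the penalty differs
l1_model = LogisticRegression(C=0.05, penalty="l1", solver="liblinear").fit(X, y)
l2_model = LogisticRegression(C=0.05, penalty="l2", solver="liblinear").fit(X, y)

# Count coefficients that are exactly zero under each penalty
zeros_l1 = int(np.sum(l1_model.coef_ == 0))
zeros_l2 = int(np.sum(l2_model.coef_ == 0))
print("zero coefficients with L1:", zeros_l1)
print("zero coefficients with L2:", zeros_l2)
```

Keeping everything but the penalty fixed makes it easier to attribute the difference in zero coefficients to the choice of L1 versus L2.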

Conclusion

As we have seen, we used logistic regression with Lasso regularisation to remove non-important features from the dataset.

Keep in mind that increasing the penalisation will increase the number of features removed. In scikit-learn, C is the inverse of the regularisation strength, so a smaller C means a stronger penalty.

Therefore, we need to make sure that we don't set the penalty so high that it removes even important features, or so low that it fails to remove non-important ones.
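One way to keep an eye on this trade-off is to fit the selector for several values of C and count how many features survive each time. In scikit-learn's LogisticRegression, C is the inverse of the regularisation strength, so smaller values penalise more. The sketch below uses synthetic data and an arbitrary grid of C values, both assumed purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 30 features, 5 informative
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, n_redundant=0,
    random_state=0)
X = StandardScaler().fit_transform(X)

# Smaller C means a stronger L1 penalty
n_selected = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    selector = SelectFromModel(
        LogisticRegression(C=C, penalty="l1", solver="liblinear"))
    selector.fit(X, y)
    n_selected[C] = int(selector.get_support().sum())
    print(f"C={C:<5} selected features: {n_selected[C]}")
```

In practice the count would usually be compared against a validation score for each C, so that the chosen value removes noise features without hurting predictive performance.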

For feature selection using Random forest, see "Feature Selection Using Random forest" on towardsdatascience.com.