It then ranks the features based on the order of their elimination.

```python
# Recursive Feature Elimination
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# create the base estimator used to evaluate each subset of features
model = LinearRegression()

# iowa is assumed to be the DataFrame loaded earlier, with the target in its last column
X, y = iowa.iloc[:, :-1], iowa.iloc[:, -1]

# create the RFE model and select 10 features
# (newer scikit-learn versions require n_features_to_select as a keyword argument)
rfe = RFE(model, n_features_to_select=10)
rfe = rfe.fit(X, y)

# summarize the selection of the features
print(rfe.support_)
print(rfe.ranking_)
```

Output:

```
[False False True True False False False False False False False False False False False True True True False True True True True False True False False False False False False False False False]
[16 24 1 1 4 9 18 13 14 15 11 6 7 12 10 1 1 1 2 1 1 1 1 5 1 23 17 20 22 19 8 21 25 3]
```

→ Here is what is happening in the above example:
i. ‘rfe.support_’ is a boolean mask over the features, in column order. True marks the features that were selected, given the chosen model and the requested number of features.
ii. ‘rfe.ranking_’ assigns a rank to every feature. The selected features get rank 1, and a higher rank means the feature was eliminated earlier.
This is useful when you need more features than you passed to ‘n_features_to_select’ (10 in the example above). You can set a rank threshold and keep every feature ranked at or below it, e.g. ‘X.columns[rfe.ranking_ <= 3]’.

4. Sequential Feature Selector: Sequential feature selection algorithms are a family of greedy search algorithms used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace (where k < d).
→ Step forward feature selection starts by evaluating each individual feature and selects the one that gives the best-performing model for the chosen algorithm.
→ Step backward feature selection is closely related. As you may have guessed, it starts with the entire set of features and works backwards, removing features one at a time to find the best subset of a predefined size.
→ What’s the “best”? That depends entirely on the evaluation criterion you define (AUC, prediction accuracy, RMSE, etc.).
In step forward selection, once the first feature has been chosen, all possible combinations of
that selected feature and a subsequent feature are evaluated, and a second feature is selected, and so on until the required predefined number of features is selected.
→ In a nutshell, sequential feature selection algorithms (SFAs) add or remove one feature at a time, based on model performance, until a feature subset of the desired size k is reached.
Note: I suggest visiting the official docs, Sequential Feature Selector (mlxtend, rasbt.github.io), to understand it in more detail with examples.

Embedded Method:
(Figure: Blueprint of Embedded Methods)
→ Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection methods.
→ So these are not a special kind of feature selection or extraction technique. The selection happens while the model is being trained, and they also help in avoiding overfitting.
Examples:
→ Lasso regularization in linear regression
→ Selecting the k best features by Random Forest feature importance
→ Gradient boosting machine (GBM)
(Figure: Difference between Filter and Wrapper method)

Part 3: Dimension Reduction
So, again starting with the same question: what is dimension reduction?
In simple terms, it means reducing an initial d-dimensional feature space to a k-dimensional feature subspace (where k < d).
But feature selection and extraction do that too, so why do we need this? In a way, yes, they do (but only in layman’s terms). To understand the difference, we have to dive deeper.
In machine learning, dimensionality simply refers to the number of features (i.e.
input variables) in your dataset. When the number of features is very large relative to the number of observations, certain algorithms struggle to train effective models. This is called the “Curse of Dimensionality,” and it is especially relevant for clustering algorithms that rely on distance calculations. (A Quora user has provided an excellent analogy for the Curse of Dimensionality; have a look.)
So when you have, say, hundreds or even thousands of features, dimension reduction is often the only practical choice. Let us discuss two robust and popular techniques.
1. Linear Discriminant Analysis (LDA): Yes, besides being used in the filter method (as discussed above), LDA is also used as a dimension reduction technique.
→ LDA is used in supervised learning, i.e. when the data is labelled. It needs the class label of each observation.
→ Please read up on LDA and understand it (if you haven’t already).
2. Principal Component Analysis (PCA): The main purposes of PCA are to analyse data to identify patterns, and to use those patterns to reduce the dimensions of the dataset with minimal loss of information.
→ PCA tries to reduce dimensionality by exploring how one feature of the data can be expressed in terms of the other features (linear dependency). More details
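The Curse of Dimensionality mentioned above hurts distance-based methods such as clustering for a concrete reason. As the number of dimensions grows, the nearest and the farthest neighbours of a point end up at almost the same distance. The minimal sketch below shows this on uniformly random points. The data is purely synthetic, and `relative_contrast` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Hypothetical helper: (max - min) / min of the Euclidean distances
    from one query point to all other points, with every point drawn
    uniformly at random from the unit hypercube [0, 1]^n_dims."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query, others = points[0], points[1:]
    dists = np.linalg.norm(others - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


# The contrast shrinks as dimensions grow, so "near" and "far"
# become hard to tell apart for a distance-based algorithm.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(500, d), 3))
```

The exact numbers depend on the random seed. The trend is what matters: the contrast falls steadily as `d` increases.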
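Both techniques above can be tried in a few lines with scikit-learn. The sketch below uses the built-in Iris dataset, which is an assumption chosen for convenience and is not the dataset used in this article. It projects the same 4 features down to 2 dimensions, once with PCA and once with LDA. Note the key difference: PCA is unsupervised and never looks at the labels, while LDA needs them. LDA can also keep at most (number of classes - 1) components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Iris: 150 samples, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)

# Standardise so that no single feature dominates the variance PCA looks at
X_std = StandardScaler().fit_transform(X)

# PCA (unsupervised): ignores y and keeps the directions of largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))

# LDA (supervised): uses y to find directions that best separate the classes;
# with 3 classes it can keep at most 3 - 1 = 2 components
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)

print("original shape:", X.shape)
print("PCA shape:", X_pca.shape)
print("LDA shape:", X_lda.shape)
```

`explained_variance_ratio_` is a quick way to decide how many principal components to keep. A common rule of thumb is to keep enough components to cover most of the variance.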
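The last point about PCA, expressing one feature in terms of the others (linear dependency), can be made concrete. The minimal sketch below builds a synthetic dataset in which the third feature is an exact linear combination of the first two. The coefficients 2 and -3 are arbitrary choices made for illustration. PCA then finds that almost all of the variance lives in two components, so the third dimension carries essentially no extra information.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)

# Third feature is an exact linear combination of the first two (synthetic data)
c = 2 * a - 3 * b
X_dep = np.column_stack([a, b, c])

# Fit PCA with all 3 components to inspect how the variance is distributed
pca_dep = PCA().fit(X_dep)
print(pca_dep.explained_variance_ratio_.round(4))
```

On real data the dependency is rarely exact, so the trailing components are small rather than zero. That is where the "minimal loss of information" trade-off comes in.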