Basic Ensemble Learning (Random Forest, AdaBoost, Gradient Boosting) - Step by Step Explained

Remember, a boosting model's key is learning from the previous mistakes. Gradient Boosting learns from the mistakes, the residual errors, directly, rather than updating the weights of the data points.

Let's illustrate how Gradient Boosting learns.

Step 1: Train a decision tree
Step 2: Apply the decision tree just trained to predict
Step 3: Calculate the residual of this decision tree, and save the residual errors as the new y
Step 4: Repeat Steps 1 to 3 (until the number of trees we set to train is reached)
Step 5: Make the final prediction

Gradient Boosting makes the final prediction by simply adding up the predictions of all the trees.

Implementation in Python Sklearn

Here is a simple implementation of the three methods explained above in Python Sklearn.

```python
# Load libraries
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

# Step 1: Create the data set
X, y = make_moons(n_samples=10000, noise=.5, random_state=0)

# Step 2: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Fit a Decision Tree model as a comparison
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
# OUTPUT: 0.756

# Step 4: Fit a Random Forest model; compared to the Decision Tree model, accuracy goes up by ~5%
# ("sqrt" was called "auto" in older scikit-learn versions)
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
# OUTPUT: 0.797

# Step 5: Fit an AdaBoost model; compared to the Decision Tree model, accuracy goes up by ~10%
clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
# OUTPUT: 0.833

# Step 6: Fit a Gradient Boosting model; compared to the Decision Tree model, accuracy goes up by ~10%
clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
# OUTPUT: 0.834
```

Note: the parameter n_estimators stands for how many trees we want to grow.

Overall, ensemble learning is very powerful and can be used not only for classification problems but for regression as well. In this blog I only applied decision trees as the individual model within those ensemble methods, but other individual models (linear models, SVMs, etc.) can also be applied within bagging or boosting ensembles, leading to better performance. The code for this blog can also be found in my GitHub. Please feel free to leave any comment, question, or suggestion below. Thank you!
