GOSS (Gradient-based One-Side Sampling) is a novel sampling method that downsamples the instances on the basis of their gradients.

As we know, instances with small gradients are well trained (small training error), while those with large gradients are undertrained.

A naive approach to downsampling would be to discard instances with small gradients and focus solely on instances with large gradients, but this would alter the data distribution.

In a nutshell, GOSS retains instances with large gradients while performing random sampling on instances with small gradients.
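The idea can be sketched in a few lines of NumPy. This is an illustrative simplification rather than LightGBM's internal implementation; `a` (fraction of large-gradient instances kept), `b` (fraction of small-gradient instances sampled) and the compensating weight `(1 - a) / b` follow the GOSS description in the LightGBM paper:

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Illustrative GOSS: keep the top-a fraction by |gradient|,
    randomly sample a b fraction of the rest, and up-weight the
    sampled small-gradient instances by (1 - a) / b so the data
    distribution is approximately preserved."""
    rng = np.random.default_rng(rng)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))    # descending by |gradient|
    top_k = int(a * n)
    large = order[:top_k]                     # always retained
    rest = order[top_k:]
    sampled = rng.choice(rest, size=int(b * n), replace=False)
    idx = np.concatenate([large, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1 - a) / b             # compensate for downsampling
    return idx, weights

grads = np.array([0.9, -0.8, 0.05, 0.02, -0.01, 0.7, 0.03, -0.04, 0.06, 0.1])
idx, w = goss_sample(grads, a=0.2, b=0.2, rng=0)
# keeps the 2 largest-gradient instances plus 2 random small-gradient ones
```

Note how the reweighting step is what separates GOSS from naive downsampling: the surviving small-gradient instances stand in for the discarded ones.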

Important Parameters of LightGBM:

num_leaves: the number of leaf nodes to use.

Having a large number of leaves will improve accuracy, but will also lead to overfitting.

min_child_samples: the minimum number of samples (data) to group into a leaf.

The parameter can greatly assist with overfitting: larger sample sizes per leaf will reduce overfitting (but may lead to under-fitting).

max_depth: controls the depth of the tree explicitly.

Shallower trees reduce overfitting.

Tuning for imbalanced data

The simplest way to account for imbalanced or skewed data is to add weight to the positive-class examples:

scale_pos_weight: the weight can be calculated from the number of negative and positive examples: scale_pos_weight = number of negative samples / number of positive samples.
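As a tiny worked example of that formula (the label counts here are made up for illustration):

```python
# Illustrative calculation of scale_pos_weight from label counts.
labels = [0] * 900 + [1] * 100          # 900 negatives, 100 positives

n_neg = labels.count(0)
n_pos = labels.count(1)
scale_pos_weight = n_neg / n_pos        # 900 / 100 = 9.0

# Passed to LightGBM via params, e.g. {"scale_pos_weight": scale_pos_weight}
```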

Tuning for overfitting

In addition to the parameters mentioned above, the following parameters can be used to control overfitting:

max_bin: the maximum number of bins that feature values are bucketed into.

A smaller max_bin reduces overfitting.

min_child_weight: the minimum sum of Hessians for a leaf.

In conjunction with min_child_samples, larger values reduce overfitting.

bagging_fraction and bagging_freq: enables bagging (subsampling) of the training data.

Both values need to be set for bagging to be used.

The frequency controls how often (iteration) bagging is used.

Smaller fractions and frequencies reduce overfitting.

feature_fraction: controls the subsampling of features used for training (as opposed to subsampling the actual training data in the case of bagging).

Smaller fractions reduce overfitting.

lambda_l1 and lambda_l2: control L1 and L2 regularization respectively.

Tuning for accuracy

Accuracy may be improved by tuning the following parameters:

max_bin: a larger max_bin increases accuracy.

learning_rate: using a smaller learning rate and increasing the number of iterations may improve accuracy.

num_leaves: increasing the number of leaves increases accuracy with a high risk of overfitting.
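The LightGBM parameters discussed so far can be collected into one configuration dictionary. The values below are generic starting points of my own choosing, not tuned settings from this post, and would normally be refined with cross-validation:

```python
# A hypothetical starting configuration covering the parameters
# discussed above; every value here is a guess to be tuned with CV.
lgb_params = {
    "objective": "binary",
    "num_leaves": 31,          # more leaves: higher accuracy, more overfitting
    "min_child_samples": 20,   # larger: less overfitting
    "max_depth": 7,            # shallower trees reduce overfitting
    "max_bin": 255,            # smaller: less overfitting; larger: more accuracy
    "bagging_fraction": 0.8,   # both bagging_* must be set for bagging
    "bagging_freq": 5,         # perform bagging every 5 iterations
    "feature_fraction": 0.8,   # subsample features, not rows
    "lambda_l1": 0.0,          # L1 regularization
    "lambda_l2": 0.0,          # L2 regularization
    "learning_rate": 0.05,     # smaller rate + more iterations may help accuracy
}
# Typical use: lgb.train(lgb_params, train_set, num_boost_round=1000)
```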

Important Parameters of XGBoost:

The XGBoost authors divide the overall parameters into 3 categories:

General Parameters: guide the overall functioning.
Booster Parameters: guide the individual booster (tree/regression) at each step.
Learning Task Parameters: guide the optimization performed.

General Parameters

These define the overall functionality of XGBoost.

1. booster [default=gbtree]: Select the type of model to run at each iteration. It has 2 options:
gbtree: tree-based models
gblinear: linear models

2. silent [default=0]: Silent mode is activated if it is set to 1, i.e. no running messages will be printed. It's generally good to keep it 0, as the messages might help in understanding the model.

3. nthread [defaults to the maximum number of threads available if not set]: Used for parallel processing; enter the number of cores in the system. If you wish to run on all cores, leave the value unset and the algorithm will detect it automatically.

Booster Parameters

Though there are 2 types of boosters, I'll consider only the tree booster here, because it generally outperforms the linear booster and thus the latter is rarely used.

1. eta [default=0.3]: Analogous to the learning rate in GBM. Makes the model more robust by shrinking the weights at each step. Typical final values: 0.01–0.2.

2. min_child_weight [default=1]: Defines the minimum sum of weights of all observations required in a child.

This is similar to min_samples_leaf in GBM, but not exactly: it refers to the minimum "sum of weights" of observations, while GBM uses the minimum "number of observations".

Used to control over-fitting.

Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.

Too high a value can lead to under-fitting; hence, it should be tuned using CV.

3. max_depth [default=6]: The maximum depth of a tree, same as in GBM.

Used to control over-fitting as higher depth will allow model to learn relations very specific to a particular sample.

Should be tuned using CV.

Typical values: 3–10.

4. max_leaf_nodes: The maximum number of terminal nodes (leaves) in a tree.

Can be defined in place of max_depth.

Since binary trees are created, a depth of ’n’ would produce a maximum of 2^n leaves.

If this is defined, XGBoost will ignore max_depth.
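A quick check of the 2^n relationship between depth and the maximum number of leaves:

```python
# A complete binary tree of depth n has at most 2 ** n leaves,
# which is why max_leaf_nodes can stand in for max_depth.
max_leaves = {n: 2 ** n for n in (3, 6, 10)}
# e.g. the default max_depth of 6 allows at most 64 leaves
```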

5. gamma [default=0]: A node is split only when the resulting split gives a positive reduction in the loss function.

Gamma specifies the minimum loss reduction required to make a split.

Makes the algorithm conservative.

The values can vary depending on the loss function and should be tuned.

6. max_delta_step [default=0]: The maximum delta step we allow each tree's weight estimation to be.

If the value is set to 0, it means there is no constraint.

If it is set to a positive value, it can help to make the update step more conservative.

Usually, this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.

This is generally not used but you can explore further if you wish.

7. subsample [default=1]: Same as subsample in GBM. Denotes the fraction of observations to be randomly sampled for each tree.

Lower values make the algorithm more conservative and prevent overfitting but too small values might lead to under-fitting.

Typical values: 0.5–1.

8. lambda [default=1]: L2 regularization term on weights (analogous to Ridge regression). This is used to handle the regularization part of XGBoost.

Though many data scientists don’t use it often, it should be explored to reduce overfitting.

9. alpha [default=0]: L1 regularization term on weights (analogous to Lasso regression). Can be used in case of very high dimensionality so that the algorithm runs faster.

10. scale_pos_weight [default=1]: A value greater than 1 should be used in case of high class imbalance, as it helps with faster convergence.

Learning Task Parameters

These parameters define the optimization objective and the metric to be calculated at each step.

1. objective [default=reg:linear]: This defines the loss function to be minimized.

The most commonly used values are:
binary:logistic — logistic regression for binary classification; returns predicted probability (not class)
multi:softmax — multiclass classification using the softmax objective; returns predicted class (not probabilities)
multi:softprob — same as softmax, but returns the predicted probability of each data point belonging to each class

2. eval_metric [default according to objective]: The metric to be used for validation data. The default values are rmse for regression and error for classification. Typical values are:
rmse — root mean squared error
mae — mean absolute error
logloss — negative log-likelihood
error — binary classification error rate (0.5 threshold)
merror — multiclass classification error rate
mlogloss — multiclass logloss
auc — area under the curve
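Putting the booster and learning-task parameters together, a hypothetical XGBoost configuration might look like the following. The values are illustrative starting points of my own, not the tuned settings from this post:

```python
# Illustrative XGBoost configuration using the parameters above;
# every value is a starting guess to be tuned with CV.
xgb_params = {
    "booster": "gbtree",
    "eta": 0.1,                 # learning rate; typical final values 0.01-0.2
    "min_child_weight": 1,      # larger: less overfitting
    "max_depth": 6,             # typical values 3-10
    "gamma": 0,                 # minimum loss reduction required to split
    "subsample": 0.8,           # typical values 0.5-1
    "lambda": 1,                # L2 regularization (Ridge-like)
    "alpha": 0,                 # L1 regularization (Lasso-like)
    "scale_pos_weight": 1,      # raise for imbalanced classes
    "objective": "binary:logistic",
    "eval_metric": "auc",
}
# Typical use: xgb.train(xgb_params, dtrain, num_boost_round=500)
```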

Similarity in Hyperparameters

Implementation on a Dataset

So now let's compare LightGBM with XGBoost by applying both algorithms to a census income dataset and then comparing their performance.

Dataset Information

Loading the data

After running both LightGBM and XGBoost on the above dataset, the results are:

Evaluation metrics: accuracy, AUC score & execution time (Model 1)
Evaluation metrics: accuracy, RMSE score & execution time (Model 2)

There was only a slight increase in accuracy and AUC score, and a slight decrease in RMSE score, from applying XGBoost over LightGBM, but there is a significant difference in the execution time for the training procedure.

LightGBM is very fast compared to XGBoost and is a much better approach when dealing with large datasets.

This turns out to be a huge advantage when you are working on large datasets in limited time competitions.
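The execution-time comparison can be reproduced with a small timing helper. The lgb.train / xgb.train calls in the comment are placeholders for the actual training calls, and the sum(range(...)) call merely stands in for a training run here:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# In the actual comparison one would time the two training calls, e.g.:
#   _, lgb_seconds = timed(lgb.train, lgb_params, train_set)
#   _, xgb_seconds = timed(xgb.train, xgb_params, dtrain)
result, elapsed = timed(sum, range(1_000_000))  # stand-in workload
```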

Parameter Tuning

For XGBoost

These are the parameters set for model 1:
[Parameters for XGBoost — Model 1]
Accuracy for model 1:
[image]
These are the parameters for model 2, after tuning:
[Parameters Tuning for XGBoost — Model 2]
Accuracy for model 2:
[image]
As we can see, tuning the parameters gave a small increase in the accuracy of our model.

For LightGBM

These are the parameters set for model 1:
[Parameters for LightGBM — Model 1]
Accuracy for model 1:
[image]
These are the parameters for model 2, after tuning:
[Parameters for LightGBM — Model 2]
Accuracy for model 2:
[image]
As we can see, tuning the parameters gave a small increase in the accuracy of our model.

End Notes

In this post, I've tried to compare the performance of LightGBM vs XGBoost.

One of the disadvantages of LightGBM is its narrow user base, but that is changing fast.

Despite being faster than XGBoost and competitive in accuracy, this algorithm has been limited in usage due to the smaller amount of documentation available.

However, this algorithm has shown far better results and has outperformed existing boosting algorithms.

You can find the whole code in my GitHub repository: Nikhileshorg/LightGBM-vs-XGBoost on github.com.

Do you have any questions about LightGBM, XGBoost or this post? Leave a comment and ask your question, and I will do my best to answer it.

Thanks for reading! ❤