As we shift towards more complex models boasting greater accuracy, we find it increasingly difficult to explain how these models generate their predictions.

This is a precarious dilemma for several reasons:

Model biases are a significant issue because they may end up unfairly impacting decisions.

Some data carry built-in biases, especially racial and gender biases, which negatively influence the model’s predictions.

With no good way to explain how the model made its decision, it is difficult to identify these inherent biases.

Model improvement can also be difficult if you don’t know what to improve.

Sure, you can tweak the hyperparameters until you get the best possible score, but the type of data you have is even more important.

Understanding the value of different features in a model provides valuable feedback for data collection, informing us what types of data are most important.

User trust is important for adoption of artificial intelligence systems.

In a recent report by the Center for the Governance of AI, the authors write: ‘There are more Americans who think that high-level machine intelligence will be harmful than those who think it will be beneficial to humanity.’

Interpretability of complex models facilitates greater understanding, builds intuition for how the model makes a decision, and ultimately engenders user trust.

Additive feature attribution methods

To explain more complex models, we need a simpler explanation model that approximates the original model.

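Suppose we have a complex model f(x). The explanation model g(z’) is chosen so that g(z’) ≈ f(x), where z’ is a simplified, binary version of the input x that records which features are present.

Breaking down g(z’) a little more, we attribute some effect ϕᵢ to each feature z’ᵢ.

By summing the effects of all the features that are present, together with a baseline effect ϕ₀, we can approximate the output of the original model.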

This is defined as an additive feature attribution method.
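Written out, the explanation model in the SHAP paper takes the form g(z’) = ϕ₀ + Σᵢ ϕᵢ z’ᵢ, where each z’ᵢ is 1 if feature i is present and 0 if it is absent. Here is a minimal sketch of that sum in Python. The attribution values are made up for illustration and do not come from any real model.

```python
def explanation_model(phi0, phi, z_prime):
    """Evaluate g(z') = phi0 + sum_i phi_i * z'_i for a binary vector z'."""
    return phi0 + sum(p * z for p, z in zip(phi, z_prime))


# Hypothetical values chosen for illustration only.
phi0 = 3.0             # baseline effect when no feature is present
phi = [4.0, 2.5, 4.5]  # effect attributed to each of three features

all_present = explanation_model(phi0, phi, [1, 1, 1])
second_missing = explanation_model(phi0, phi, [1, 0, 1])
print(all_present, second_missing)
```

Setting a feature’s entry in z’ to 0 simply removes its effect from the sum.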

As it turns out, several existing explanation methods, such as LIME and DeepLIFT, share this same form, allowing them all to be unified into a single framework.

Properties of additive feature attribution methods

There are three properties we want for this class of methods.

Local accuracy: the explanation model should match the output of the original model for the input being explained.

Missingness: if a feature is missing, it is assigned zero effect, or ϕᵢ = 0.

Consistency: if the model changes such that a feature’s contribution increases or stays the same, that feature’s attribution, ϕᵢ, should not decrease.
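For reference, these properties can be stated more formally. The notation below follows the referenced SHAP paper rather than this post: x’ is the simplified input corresponding to x, M is the number of simplified features, f_x(z’) is the original model evaluated on the input that z’ maps to, f’ is a second model being compared with f, and z’ \ i means z’ with z’ᵢ set to 0.

```latex
\begin{align*}
\text{Local accuracy:}\quad & f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i \\
\text{Missingness:}\quad & x'_i = 0 \;\Longrightarrow\; \phi_i = 0 \\
\text{Consistency:}\quad & f'_x(z') - f'_x(z' \setminus i) \ge f_x(z') - f_x(z' \setminus i)
  \quad \forall z' \in \{0,1\}^M \\
& \Longrightarrow\; \phi_i(f', x) \ge \phi_i(f, x)
\end{align*}
```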

This leads us to SHAP values, which unify previous methods and exhibit the above properties.

SHAP values

SHAP values (ϕᵢ) are used to describe a feature’s importance.

Consider the following diagram:

f(x) is the output predicted by the model, and E[f(z)] is the base value that would be predicted if we did not know any of the feature values.

In other words, E[f(z)] is simply the average model output.

When we include a feature x₁, ϕ₁ explains how we went from the base value to the new predicted value, which is now given by E[f(z) | z₁ = x₁].

Repeating this process for the remaining variables x₂ and x₃ gives ϕ₂ and ϕ₃, so that ϕ₁, ϕ₂, and ϕ₃ together show how the model ultimately arrives at the predicted output, f(x). Because the order in which features are added can matter, SHAP values average these contributions over all possible orderings.
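The walk from the base value to f(x) can be sketched in a few lines of Python. This is a brute-force illustration, not how the shap library computes its values. The toy `model`, the `background` rows, and the helper names are all hypothetical. E[f(z) | z_S = x_S] is approximated by fixing the known features to their values in x and averaging the others over the background rows, which assumes the features are independent. Each feature’s contribution is averaged over every ordering of the features.

```python
from itertools import permutations


def model(x):
    # Hypothetical toy model with an interaction between features 1 and 2.
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical background data, used to average over features we "do not know".
background = [
    (0.0, 0.0, 0.0),
    (1.0, 2.0, 0.0),
    (2.0, 0.0, 2.0),
    (1.0, 2.0, 2.0),
]


def expected_output(x, known):
    """Approximate E[f(z) | z_S = x_S] for the feature indices S in `known`."""
    total = 0.0
    for row in background:
        z = [x[i] if i in known else row[i] for i in range(len(x))]
        total += model(z)
    return total / len(background)


def shap_values(x):
    """Average each feature's marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        known = set()
        previous = expected_output(x, known)  # start at the base value
        for i in order:
            known.add(i)
            current = expected_output(x, known)
            phi[i] += current - previous      # change caused by revealing feature i
            previous = current
    return [p / len(orderings) for p in phi]


x = (3.0, 2.0, 4.0)
base_value = expected_output(x, set())
phi = shap_values(x)

print("base value:", base_value)
print("SHAP values:", phi)
print("base + sum(phi):", base_value + sum(phi), "model output:", model(x))
```

With three features there are only 3! = 6 orderings. The count grows factorially with the number of features, which is why practical implementations rely on approximations or model-specific shortcuts.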

Different flavors of SHAP

There are multiple implementations of SHAP, each adapted to a specific model type, which allows for faster approximation.

TreeExplainer

TreeExplainer was developed especially for tree ensemble methods such as XGBoost, LightGBM, or CatBoost (Tree SHAP paper).

DeepExplainer

DeepExplainer was developed for use with deep learning models and supports TensorFlow/Keras.

GradientExplainer

GradientExplainer was also developed for approximating SHAP values in deep learning models, but it is slower than DeepExplainer and makes different assumptions.

This method is based on the Integrated Gradients attribution method and supports TensorFlow/Keras/PyTorch.

KernelExplainer

KernelExplainer approximates SHAP values for any type of model, using weighted linear regression.

It is much faster and more efficient to use a model-specific algorithm (TreeExplainer, DeepExplainer) than the general KernelExplainer.
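To make “weighted linear regression” concrete, here is a small NumPy sketch of the idea behind Kernel SHAP. It is an illustration under stated assumptions, not the shap library’s code. The toy `model`, the `background` rows, and the function names are hypothetical. Missing features are filled in from the background rows, and every coalition of features is enumerated, which is only feasible for a handful of features. Each coalition is weighted by the Shapley kernel from the SHAP paper. The two coalitions with infinite weight (no features and all features) are enforced as exact constraints instead.

```python
from itertools import product
from math import comb

import numpy as np


def model(x):
    # Hypothetical toy model with an interaction between features 1 and 2.
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical background data used to fill in "missing" features.
background = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0],
    [2.0, 0.0, 2.0],
    [1.0, 2.0, 2.0],
])


def coalition_value(x, mask):
    """Average model output with features in `mask` fixed to x, the rest from background."""
    rows = np.where(mask, x, background)
    return float(np.mean([model(r) for r in rows]))


def kernel_shap(x):
    m = len(x)
    base = coalition_value(x, np.zeros(m, dtype=bool))  # no features known
    full = coalition_value(x, np.ones(m, dtype=bool))   # all features known
    rows, targets, weights = [], [], []
    for bits in product([0, 1], repeat=m):
        size = sum(bits)
        if size == 0 or size == m:
            continue  # infinite kernel weight: handled through base and full
        z = np.array(bits, dtype=float)
        # Shapley kernel weight for a coalition of this size.
        weights.append((m - 1) / (comb(m, size) * size * (m - size)))
        # Use sum(phi) = full - base to eliminate the last attribution.
        rows.append(z[:-1] - z[-1])
        y = coalition_value(x, z.astype(bool))
        targets.append(y - base - z[-1] * (full - base))
    sqrt_w = np.sqrt(np.array(weights))
    design = np.array(rows) * sqrt_w[:, None]
    target = np.array(targets) * sqrt_w
    head, *_ = np.linalg.lstsq(design, target, rcond=None)
    last = (full - base) - head.sum()
    return base, np.append(head, last)


x = np.array([3.0, 2.0, 4.0])
base, phi = kernel_shap(x)
print("base value:", base)
print("SHAP values:", phi)
```

Because every coalition is included here, the regression recovers the exact Shapley values for this toy model. In practice only a sample of coalitions is evaluated, which makes the result an approximation.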

Conclusion

The authors of the SHAP paper found a stronger alignment between human explanations and SHAP explanations than with any other method, which suggests just how powerful and intuitive SHAP is.

The calculated SHAP values are easily visualized in beautiful, simple plots that explain how features affect a specific prediction.

This makes SHAP a compelling tool for confidently interpreting and explaining any model.

For a tutorial on how to implement SHAP, check out my notebook to see how we can interpret the predicted results of a gradient boosted tree.

The SHAP GitHub repository also has excellent resources, with more examples of how to use DeepExplainer, KernelExplainer, and other helpful features.

Reference

Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017.

Thanks for reading! Stay tuned for more as I continue on my path to become a data scientist! ✌️