# Breaking Down the Black Box

It’s the marginal effect of X on Y, or the partial derivative of the response variable Y with respect to the independent variable X.

If we average that marginal effect over every observation (that is, over all the values the other features take), we obtain a set of predictions as one feature variable changes.

When we do this for complex, nonlinear models, we can get some incredibly useful visualizations.

However, the largest pitfall of PDPs is the assumption of independence between variables (think “beta is the change in Y given a change in X, holding all other variables constant”), so if X happens to be correlated with another variable, our PDP may be misleading.
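
To make the averaging step concrete, here is a minimal sketch in plain Python. The model (`model_predict`), the tiny dataset, and the grid are all hypothetical stand-ins chosen for illustration, not anything from a real library; in practice you would pass in your own fitted model's prediction function.

```python
from statistics import mean


def model_predict(row):
    """Hypothetical fitted black-box model: nonlinear in x0, linear in x1."""
    x0, x1 = row
    return x0 ** 2 + 2.0 * x1


def partial_dependence(predict, data, feature, grid):
    """Average prediction over the dataset with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in data:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


# Toy data in which x0 and x1 rise together (they are correlated)
data = [(-1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
grid = [-1.0, 0.0, 1.0, 2.0]

pdp_x0 = partial_dependence(model_predict, data, feature=0, grid=grid)
print(pdp_x0)
```

Note that the loop happily evaluates combinations such as x0 = -1 with x1 = 3, which never occur in this toy dataset. That is the independence pitfall in miniature: the averaged curve includes predictions for points the model never saw.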

Partial Dependence Plots of 3 features in a regression problem, which can also be applied towards probabilities in classification.

The density plot at the bottom can be aligned to show the underlying distribution of the data (Source).

Individual Conditional Expectation (ICE): Whereas the partial dependence plots averaged the marginal effect at each value of each feature across all instances, ICE plots show individual lines, each representing just one instance in the dataset and how that observation’s prediction changes as a function of X.

Again, these suffer in the same way as PDPs: multicollinearity in the model may lead to misleading results.
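
As a rough sketch of the difference, the snippet below computes one ICE curve per observation and then averages them to recover the PDP. The model and data are hypothetical and deliberately include an interaction, so the individual lines disagree with one another.

```python
from statistics import mean


def model_predict(row):
    """Hypothetical black-box model with an interaction between x0 and x1."""
    x0, x1 = row
    return x0 * x1


def ice_curves(predict, data, feature, grid):
    """One curve per observation: its prediction as `feature` sweeps the grid."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # vary one feature, keep the rest of the row
            curve.append(predict(modified))
        curves.append(curve)
    return curves


data = [(0.5, -1.0), (1.5, 0.0), (2.5, 1.0)]
grid = [1.0, 2.0, 3.0]

curves = ice_curves(model_predict, data, feature=0, grid=grid)
# The PDP is the pointwise average of the ICE curves
pdp = [mean(column) for column in zip(*curves)]

for row, curve in zip(data, curves):
    print(row, curve)
print("PDP:", pdp)
```

In this toy case the averaged PDP is flat, even though one observation's prediction falls as x0 grows and another's rises. The ICE lines expose that heterogeneity, while the single averaged line hides it.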

ICE Plots of 3 features in a regression problem.

Contrast with PD plots, which contain only one line per feature (Source).

## Locally Interpretable Algorithms

There are a few algorithms, which have gained considerable popularity in the past few years, that help break down the contribution of each feature in the context of a single prediction.

Each of these follows an additive linear form: we begin with a baseline value phi_0, then add the contribution of each variable in z’, multiplied by a weight phi_i, to generate g(z’), the overall explanation for each observation.

The main differences between them lie in how the feature variables X are transformed into the function input z’, and in the method by which the phi_i values are calculated.
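
Written out, the additive form described above looks like the following. The symbol M (the number of simplified input features) is not named in the prose here and is borrowed from the usual presentation of additive feature attribution methods, where each z’_i is 1 if feature i is "present" and 0 otherwise.

```latex
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i, \qquad z' \in \{0, 1\}^M
```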

Source: Lundberg & Lee (2017)

LIME: Local Interpretable Model-agnostic Explanations, otherwise known as LIME, generates explanations that are, as the name suggests, local and model-agnostic.

Developed by the University of Washington team we mentioned earlier, LIME is a great post-hoc tool for examining questions along the lines of “Why did the deep learning model predict a probability of mortality of 0.90 for this patient?”

While I won’t go into the details of how it works here (you can take a look at the original paper for that), LIME can be applied to image classification problems and tabular data alike.

Check out this article for more on implementation of the LIME package in Python.

(Left) Feature Importances for predicting credit card default.

Blue columns indicate positive associations, and greater lengths indicate stronger magnitudes of association (Source).

(Right) Masked superpixels explaining the probability of class Labrador in the original image, generated with LIME values (Source).

SHAP: SHapley Additive exPlanations, otherwise known as SHAP, are generated using Shapley values, developed for economic game theory by Lloyd Shapley in 1953.

In 2017, Lundberg & Lee, also from the University of Washington, devised a method to apply these values to machine learning using the same additive linear form.

As with LIME, these can be used to construct explanations for supervised learning problems with both image classification and tabular data.
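
For intuition, here is a minimal sketch of exact Shapley values computed by brute-force enumeration of feature coalitions. It is not how the SHAP package works internally (that relies on faster approximations and model-specific algorithms), and it makes a loud simplifying assumption: a feature that is "absent" is replaced by a fixed baseline value rather than averaged over background data. The model, instance, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def model_predict(x):
    """Hypothetical black-box model: one additive term plus an interaction."""
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Assumption: a feature that is "absent" from a coalition is replaced by
    its baseline value, a simplification of averaging over background data.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                with_i = without_i | {i}
                contribution += weight * (value(with_i) - value(without_i))
        phi.append(contribution)
    return value(set()), phi


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi_0, phi = shapley_values(model_predict, instance, baseline)

# Additivity check: baseline plus contributions recovers the prediction
reconstructed = phi_0 + sum(phi)
print(phi_0, phi, reconstructed, model_predict(instance))
```

For this toy model the contributions should come out to roughly 2 for the first feature and 3 each for the other two, since the interaction term of 6 is split evenly between the features that create it. The enumeration grows exponentially with the number of features, which is why approximations are used in practice.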

This GitHub repo is another great resource for getting started with SHAP in Python (and for generating some incredible visualizations).

(Top) Force plot explaining the direction of feature weightage for each variable, where phi_i represents the magnitude for variable Xi.

Blue arrows indicate an increase in probability, and greater lengths indicate stronger magnitudes (Source).

(Right) Pixels from the MNIST digit classification dataset explaining the probability of class 8 or class 3 for the input image, generated with SHAP values (Source).

## Final Comments

Interpretable machine learning is a burgeoning field, and I’d be surprised if there weren’t more seminal papers coming out over the next year or two.

Being able to explain a model is incredibly important, not just for internal validity, but to gain trust with all those who’ll interact with the end results.

Hopefully, this post has given you a general survey of several topics in IML, as well as a quick overview of some key vocabulary you’ll definitely encounter in your own endeavors.

## Further Reading & Implementation

Because this isn’t a comprehensive guide to all things IML, I’ve provided a general reading list, as well as some resources for applying these concepts to your own work.

The images below are just some of the many visualizations you can easily generate with the Python APIs.

SHAP Image Classification Explanations from SHAP Package.

ICE Plot with Color Map (Source).

General IML:

Interpretable Machine Learning Book by Christoph Molnar (Great in-depth explanations and visualizations): https://christophm.github.io/interpretable-ml-book/index.html

Overview of IML by H2O.ai (Cool visualizations and non-technical descriptions): https://www.