Explainable AI (XAI)

As ML models make consequential decisions — approving loans, flagging fraud, recommending medical treatments — the question “why did the model decide this?” becomes as important as “how accurate is it?” Explainable AI provides tools to answer that question at the model level, the prediction level, and in terms regulators and business stakeholders can understand.

Two Levels of Explanation

Global explanations: How does the model behave in general? Which features matter most across all predictions?

Local explanations: Why did the model make this specific prediction for this specific input?

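SHAP can provide both levels. Other tools focus on one: LIME and counterfactuals are local, while partial dependence plots are global. Both levels are necessary in practice.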
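One way to see how the two levels relate is a linear scoring model, where each feature's local contribution can be written down directly and the global importance is the average size of those contributions. The sketch below is illustrative only: the weights, bias, feature names, and rows are hypothetical, and the inputs are assumed to be standardized already.

```python
from statistics import mean

# Hypothetical linear scoring model: score = bias + sum(weights[f] * row[f])
weights = {"income": 0.8, "age": 0.1, "debt": -0.5}
bias = 0.2

# Hypothetical, already-standardized inputs (one dict per applicant)
rows = [
    {"income": 1.0, "age": -0.5, "debt": 0.2},
    {"income": -1.0, "age": 0.5, "debt": 1.0},
    {"income": 0.0, "age": 0.0, "debt": -1.2},
]

# Baseline: the average value of each feature over the dataset
baseline = {f: mean(r[f] for r in rows) for f in weights}

def score(row):
    """Model output for one row."""
    return bias + sum(weights[f] * row[f] for f in weights)

def local_explanation(row):
    """Local: contribution of each feature to this row's score, relative to the baseline."""
    return {f: weights[f] * (row[f] - baseline[f]) for f in weights}

def global_importance(all_rows):
    """Global: mean absolute local contribution per feature across all rows."""
    local = [local_explanation(r) for r in all_rows]
    return {f: mean(abs(contrib[f]) for contrib in local) for f in weights}

local_0 = local_explanation(rows[0])
importance = global_importance(rows)
print("local (row 0):", local_0)
print("global:", importance)
```

The local contributions for a row sum to the gap between that row's score and the baseline score, which is the same additive bookkeeping that SHAP generalizes to non-linear models.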

SHAP (SHapley Additive exPlanations)

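SHAP is one of the most widely used approaches to ML interpretability. Based on game-theoretic Shapley values, it assigns each feature a contribution to each prediction such that the contributions sum to the difference between that prediction and the model's average (baseline) output. The same additive framework applies to any model type, although exact values are only cheap to compute for some model classes (such as tree ensembles); for other models SHAP relies on approximations.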
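To make the Shapley idea concrete, the values can be computed exactly for a tiny model by enumerating every coalition of features. This is a minimal sketch, not how the shap library computes them: `toy_model`, the input, and the all-zero baseline are hypothetical, and "removing" a feature is assumed to mean replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input by enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model with an interaction between features 0 and 1
def toy_model(v):
    return 2.0 * v[0] + v[0] * v[1] + 0.5 * v[2]

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)                                          # per-feature contributions
print(sum(phi), toy_model(x) - toy_model(baseline))  # the two totals match
```

In this example the interaction term `v[0] * v[1]` contributes 2.0 to the output and is split evenly between features 0 and 1, which is the sense in which Shapley attribution is "fair".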

import shap
import xgboost as xgb
import matplotlib.pyplot as plt

model = xgb.XGBClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Create explainer (TreeExplainer is fast for tree models)
explainer = shap.TreeExplainer(model)
# For XGBoost classifiers these values are in log-odds (margin) units by default
shap_values = explainer.shap_values(X_test)

# Global: Bar plot of mean |SHAP| per feature
shap.summary_plot(shap_values, X_test, feature_names=feature_names, plot_type='bar')

# Global: Beeswarm plot showing feature impact distribution
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Local: Waterfall plot for a single prediction
# (row indexing assumes X_test is a NumPy array; use X_test.iloc[42] for a DataFrame)
shap.plots.waterfall(shap.Explanation(
    values=shap_values[42],
    base_values=explainer.expected_value,
    data=X_test[42],
    feature_names=feature_names
))

# Local: Force plot (interactive; call shap.initjs() first in a notebook,
# or pass matplotlib=True to render a static figure)
shap.force_plot(explainer.expected_value, shap_values[42], X_test[42], feature_names=feature_names)

SHAP Interaction Analysis

# Dependence plot: how one feature's SHAP value changes across its range
# Colored by a second feature to show interactions
shap.dependence_plot('income', shap_values, X_test,
                      feature_names=feature_names,
                      interaction_index='age')  # Color by age

# Compute interaction values (expensive, but shows pairwise feature interactions)
shap_interaction = explainer.shap_interaction_values(X_test[:100])
shap.summary_plot(shap_interaction, X_test[:100], feature_names=feature_names)

LIME (Local Interpretable Model-Agnostic Explanations)

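LIME explains an individual prediction by perturbing the input, querying the model on the perturbed samples, and fitting a simple interpretable surrogate (by default a weighted linear model) to those samples, with weights that favor points close to the original input: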

import lime
import lime.lime_tabular

# Create LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=['Normal', 'Fraud'],
    mode='classification',
    discretize_continuous=True
)

# Explain a single prediction
idx = 42
explanation = explainer.explain_instance(
    data_row=X_test[idx],
    predict_fn=model.predict_proba,
    num_features=10
)

# Show explanation
explanation.show_in_notebook(show_table=True)
print(explanation.as_list())  # [(feature_name, contribution), ...]
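The local-surrogate idea behind LIME can also be sketched from scratch, which helps when reasoning about what the coefficients mean. The code below is a simplified illustration under stated assumptions, not the lime library's implementation: it perturbs with Gaussian noise, uses a plain exponential kernel and unregularized weighted least squares, and `black_box`, `scale`, and `kernel_width` are hypothetical choices.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def local_surrogate(predict, x, n_samples=2000, scale=0.3, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around x; returns [intercept, coef_0, ...]."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]                 # perturb the instance
        dist = math.dist(z, x)
        weights.append(math.exp(-(dist ** 2) / kernel_width ** 2))  # closer samples weigh more
        rows.append([1.0] + [zi - xi for zi, xi in zip(z, x)])       # centered design row
        targets.append(predict(z))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = len(x) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical black-box model: a nonlinear score in two features
def black_box(v):
    return 1.0 / (1.0 + math.exp(-(2.0 * v[0] - 1.0 * v[1])))

beta = local_surrogate(black_box, [0.5, 0.2])
print("intercept:", beta[0], "coefficients:", beta[1:])
```

The coefficients approximate the model's local slope around the chosen point, so they are only meaningful near that input; a different `kernel_width` or perturbation scale can change the explanation.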

Partial Dependence Plots (PDP)

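A partial dependence plot shows how the average model prediction changes as one or two features vary, averaging over the observed values of all other features. Because each feature is varied independently of the rest, the curve can be misleading when features are strongly correlated: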

from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

# Single feature PDP
fig, ax = plt.subplots(figsize=(8, 5))
PartialDependenceDisplay.from_estimator(
    model, X_train, features=['income'], feature_names=feature_names, ax=ax
)
plt.title("Partial Dependence: Income")

# Two-feature interaction PDP (heatmap)
PartialDependenceDisplay.from_estimator(
    model, X_train,
    features=[('income', 'credit_score')],  # Tuple for 2D PDP
    feature_names=feature_names
)
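The computation behind a one-feature partial dependence curve is short enough to write by hand: force the feature to each grid value in every row, predict, and average. The following sketch uses a hypothetical `toy_model`, dataset, and grid purely for illustration.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in every row."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve

# Hypothetical model: linear in income, with a step at credit_score >= 700
def toy_model(row):
    return 0.00001 * row["income"] + (0.2 if row["credit_score"] >= 700 else 0.0)

# Hypothetical dataset
rows = [
    {"income": 30000, "credit_score": 640},
    {"income": 55000, "credit_score": 710},
    {"income": 80000, "credit_score": 760},
    {"income": 42000, "credit_score": 690},
]

grid = [20000, 40000, 60000, 80000]
pdp_income = partial_dependence(toy_model, rows, "income", grid)
for value, avg_prediction in zip(grid, pdp_income):
    print(value, round(avg_prediction, 3))
```

Each point on the curve averages over the other features' observed values, which is why the credit score step shows up only as a constant offset in the income curve.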

Counterfactual Explanations

Answers: “What would have to change for the model to flip its decision?”

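import dice_ml
import pandas as pd
from dice_ml import Dice

# Define data and model (df_train must contain the feature columns and the outcome column)
data = dice_ml.Data(dataframe=df_train, continuous_features=numeric_cols, outcome_name='target')
m = dice_ml.Model(model=model, backend="sklearn")

# Generate counterfactuals
exp = Dice(data, m)
# DiCE expects the query as a DataFrame of feature columns, without the outcome
query_instance = pd.DataFrame(X_test[42:43], columns=feature_names)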
cf = exp.generate_counterfactuals(
    query_instance, total_CFs=3,
    desired_class="opposite",
    proximity_weight=0.2,
    diversity_weight=1.0
)
cf.visualize_as_dataframe()
# The table lists the original row followed by the counterfactual rows. It might be read as, for example:
# "If income had been $62,000 instead of $38,000, the model would have approved."
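The question a counterfactual answers can be illustrated with a brute-force search over single-feature changes. This is a deliberately simple sketch and not DiCE's method, which searches for several counterfactuals while trading off proximity and diversity; the `approve` rule, the applicant, and the step sizes are all hypothetical.

```python
def find_counterfactual(predict, row, steps, max_steps=50):
    """Search single-feature changes and return the one needing the fewest steps to flip.

    `steps` maps each mutable feature to the increment tried per step.
    Returns (feature, new_value, n_steps), or None if no change flips the decision.
    """
    original = predict(row)
    best = None
    for feature, step in steps.items():
        for k in range(1, max_steps + 1):
            candidate = {**row, feature: row[feature] + k * step}
            if predict(candidate) != original:
                if best is None or k < best[2]:
                    best = (feature, candidate[feature], k)
                break  # smallest flip for this feature found
    return best

# Hypothetical approval rule standing in for a trained classifier
def approve(row):
    score = (0.00002 * row["income"]
             + 0.005 * (row["credit_score"] - 600)
             - 0.00001 * row["debt"])
    return score >= 1.5

applicant = {"income": 38000, "credit_score": 650, "debt": 12000}
steps = {"income": 2000, "credit_score": 10}  # only features the applicant could change

cf = find_counterfactual(approve, applicant, steps)
print(approve(applicant), cf)
```

Counting steps is a crude distance measure; a real system would normalize feature scales and restrict the search to changes that are actually feasible for the person affected.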

Model-Specific Interpretability

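# Decision tree: plot the top levels (max_depth=3 truncates deeper branches)
from sklearn.tree import plot_tree, export_text
plot_tree(dt_model, feature_names=feature_names, class_names=['No', 'Yes'],
          filled=True, rounded=True, max_depth=3)
# Text version of the full rule set, useful for logs and documentation
print(export_text(dt_model, feature_names=list(feature_names)))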

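# Linear models: plot coefficients
# (magnitudes are only comparable across features if the inputs were standardized)
import matplotlib.pyplot as plt
import pandas as pd
coef = pd.Series(lr_model.coef_[0], index=feature_names).sort_values()
coef.plot(kind='barh', figsize=(10, 8))
plt.title('Logistic Regression Coefficients')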
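Logistic regression coefficients are in log-odds units, which stakeholders often find hard to read. Exponentiating a coefficient gives an odds ratio: the multiplicative change in the odds of the positive class per one-unit increase in that feature. The coefficient values and feature names below are hypothetical.

```python
import math

# Hypothetical fitted coefficients (log-odds per one-unit increase in each feature)
coefficients = {"income_std": 0.9, "num_late_payments": -0.7, "age_std": 0.05}

def odds_ratios(coefs):
    """exp(coefficient) = multiplicative change in the odds per one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items()}

ratios = odds_ratios(coefficients)

# Report the strongest effects first, in either direction
for name, ratio in sorted(ratios.items(), key=lambda kv: abs(math.log(kv[1])), reverse=True):
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds of the positive class)")
```

An odds ratio above 1 raises the odds and one below 1 lowers them, so a reader can compare effects without thinking in log-odds.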

Regulatory Context (2026)

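Several regulations and supervisory frameworks now require or strongly encourage AI explainability:

EU AI Act (in force since August 2024, obligations phasing in from 2025): High-risk AI systems require transparency and human oversight
GDPR Articles 13-15 and 22: Right to meaningful information about the logic of solely automated decisions with legal or similarly significant effects, and to human intervention; whether this amounts to a full "right to explanation" is debated
US Executive Order 14110 on AI (2023): Set transparency expectations for government AI; it was rescinded in January 2025, so check current federal guidance
Financial Services: Model risk management guidance (SR 11-7) expects credit models to be documented, validated, and understood, which in practice calls for explainability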

In practice, this means:

Document which features drive decisions
Be able to explain individual decisions in plain language
Monitor for feature drift that changes model behavior
Detect and document bias across demographic groups
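As one illustration of explaining an individual decision in plain language, per-feature attributions can be mapped to short reason statements. This is a sketch under assumptions: the attribution values, the sign convention (negative means the feature pushed toward the adverse outcome), and the wording templates are all hypothetical, and real reason codes would need legal and compliance review.

```python
def reason_codes(attributions, templates, top_n=2):
    """Turn per-feature attributions into short plain-language reasons.

    Only features that pushed the score toward the adverse outcome
    (negative attribution here) are reported, strongest first.
    """
    adverse = sorted((value, feature) for feature, value in attributions.items() if value < 0)
    return [templates.get(feature, f"{feature} lowered the score")
            for value, feature in adverse[:top_n]]

# Hypothetical attributions for one declined application (for example, SHAP values)
attributions = {"income": -0.42, "credit_score": -0.15, "age": 0.03, "debt": -0.31}

# Hypothetical wording for each feature
templates = {
    "income": "Income is below the level typical of approved applications",
    "debt": "Existing debt is high relative to approved applications",
    "credit_score": "Credit score is lower than typical for approved applications",
}

reasons = reason_codes(attributions, templates)
for reason in reasons:
    print("-", reason)
```

Keeping the templates separate from the attribution logic lets the wording be reviewed and versioned independently of the model.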

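SHAP values are widely used to support these requirements: they are additive, come with consistency guarantees, and can be automated into production monitoring pipelines. They describe how the model behaves, not whether that behavior is appropriate, so pair them with bias testing and human review.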
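A monitoring job along these lines might compare mean absolute attributions between a reference window and a recent window and flag features whose influence has shifted. The sketch below assumes attributions have already been computed and stored per prediction; the window contents and the 50% relative-change threshold are hypothetical.

```python
from statistics import mean

def mean_abs_attribution(window):
    """Mean absolute attribution per feature over a window of predictions."""
    features = window[0].keys()
    return {f: mean(abs(row[f]) for row in window) for f in features}

def attribution_drift(reference, current, threshold=0.5):
    """Return features whose mean |attribution| changed by more than `threshold` (relative)."""
    ref = mean_abs_attribution(reference)
    cur = mean_abs_attribution(current)
    flagged = {}
    for f in ref:
        if ref[f] == 0:
            change = 0.0 if cur[f] == 0 else float("inf")
        else:
            change = (cur[f] - ref[f]) / ref[f]
        if abs(change) > threshold:
            flagged[f] = change
    return flagged

# Hypothetical stored attributions (one dict per prediction)
reference = [{"income": 0.40, "age": 0.10}, {"income": -0.36, "age": 0.12}]
current = [{"income": 0.38, "age": 0.30}, {"income": -0.42, "age": -0.26}]

flagged = attribution_drift(reference, current)
print(flagged)
```

A flagged feature is a prompt for investigation rather than proof of a problem, since a shift in attributions can come from changes in the input data, in the model, or in both.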