Demystifying Interpretability and Explainability in Machine Learning

Image by Danni Liu adapted from Jaouad.K/ Canva

Have you heard of the practice "safe to try"? It's a heuristic that can accelerate a team's decision-making process, especially when you've got some tricky choices that need everyone's input. It helps keep things moving and makes sure no one's just sitting there keeping their thoughts to themselves.

I ran a safe-to-try session on a data science project with my immediate team not long ago. It's a customer segmentation project in which I'm teaming up with a colleague from another department in Singapore. I took my team through the whole shebang – from gathering data to deploying the model. I also showed them the features we're considering for the machine learning model. If you're not sure what features are, just think of them like ingredients, and the model is your recipe. During the session, one of my teammates was like, "Whoa, that's a lot of input. How do we know they all matter, and can we actually explain what the model's spitting out?"

That question got me curious, so I dug a little deeper into the whole interpretability and explainability thing in machine learning. And now, I'm gonna share what I found in this blog post. We'll cover:

  • What are interpretability and explainability?
  • Why are interpretability and explainability important?
  • Interpretable vs Black Box Models
  • Methods and techniques to improve interpretability and explainability

What are Interpretability and Explainability?

Interpretability and explainability might sound like fancy tech talk, but they're pretty simple ideas, really. They're all about how easy it is to understand and explain decisions made by machine learning algorithms.

Interpretability is all about how well a person can figure out the reason behind a decision or predict the model's outcome. This means knowing which input features matter most for the model's predictions and understanding how those features connect to the model's output – basically, getting how the machine learning method works.

Explainability, on the other hand, is all about giving a reason or justification for the model's predictions or choices. It can mean giving extra context or info to help people get why the model's making a certain prediction. Think of explainability as being able to answer "why" questions like, "Why didn't the treatment work on the patient?" or "Why was my loan turned down?"

While these two ideas are pretty close, they're not the same thing. Interpretability is about figuring out how the model works, while explainability is about justifying the model's decisions. Sometimes, a model might be easy to understand (interpretable) without being easy to justify (explainable), or the other way around.

Why are Interpretability and Explainability Important?

Interpretability and explainability are super important for a bunch of reasons:

  1. Trust: If we get how a machine learning model makes decisions, we're more likely to believe in its results.
  2. Accountability: When AI systems mess up or act biased, understanding their decision-making helps us find and fix the problems.
  3. Compliance: Explaining decisions to regulators and customers is crucial in some fields, like finance and healthcare.
  4. Ethics: Making sure AI systems are fair, unbiased, and respect privacy means we need to be transparent about how they make decisions.

Interpretable vs Black Box Models

Machine learning models can be split into two main groups: interpretable (easy-to-understand) ones and mysterious black boxes. Interpretable models let you see the connections between input features and output predictions, making them a lot easier to understand. On the other hand, black box models are pretty hard to understand because their decision-making process is somewhat opaque.

Loads of advanced machine learning models are "black boxes" because they're super complicated and not very interpretable. Models can be placed on a spectrum, ranging from highly interpretable to highly complex and difficult to understand.
Let’s check out some models that are interpretable and some that aren't.

Interpretable Models

Some models are inherently interpretable, meaning their decision-making process can be easily understood. Examples of such models include:

  1. Linear Regression: In linear regression, the model tries to find the best-fitting line to represent the relationship between the input features and the output. With a small number of input features, it's relatively easy to understand the model's decision-making process. Think about a model that predicts house prices from a few features like property size and the number of rooms: each coefficient tells you how much the predicted price changes when that feature goes up by one unit (see the sketch after this list).
  2. Decision Trees: Decision trees split the input data into subsets based on specific feature values. The tree's structure and branching conditions are easily visualized and understood, making them interpretable models. For example, imagine you are deciding whether to buy a car: the tree might first check whether the price fits your budget, then branch on fuel efficiency, and so on until it reaches a yes-or-no answer.
  3. Logistic Regression: Logistic regression models predict the probability of an event occurring based on input features. Like linear regression, they use a relatively simple mathematical function that can be readily understood and interpreted.
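
To make this concrete, here's a minimal sketch on toy data of how you can read an interpretable model directly. The feature names (`size_sqm`, `num_rooms`, `budget_fit`, `fuel_efficiency`) and the numbers are made up for illustration: a linear regression's coefficients tell you how each feature moves the prediction, and a small decision tree's rules can be printed as plain text.

```python
# A minimal sketch of reading interpretable models directly (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)

# --- Linear regression: the coefficients ARE the explanation ---
size_sqm = rng.uniform(50, 200, 100)           # property size (made-up data)
num_rooms = rng.integers(1, 6, 100)            # number of rooms
price = 3000 * size_sqm + 15000 * num_rooms + rng.normal(0, 10000, 100)

X = np.column_stack([size_sqm, num_rooms])
lin = LinearRegression().fit(X, price)
for name, coef in zip(["size_sqm", "num_rooms"], lin.coef_):
    print(f"{name}: predicted price changes by ~{coef:,.0f} per unit increase")

# --- Decision tree: the learned rules can be printed as plain text ---
X_car = rng.uniform(0, 1, (100, 2))            # e.g. budget fit, fuel efficiency
y_car = (X_car.sum(axis=1) > 1).astype(int)    # toy "buy / don't buy" label
tree = DecisionTreeClassifier(max_depth=2).fit(X_car, y_car)
print(export_text(tree, feature_names=["budget_fit", "fuel_efficiency"]))
```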

Black Box Models

As mentioned earlier, more complex models like deep learning algorithms and ensemble methods are often called "black boxes." Before diving into specific examples, let me break down what deep learning and ensemble methods are.

Deep learning is a subfield of machine learning that focuses on algorithms inspired by the structure and function of the brain, called neural networks. Deep learning excels in tasks like image recognition, natural language processing, and speech recognition, where traditional machine learning methods might struggle.

Ensemble methods, on the other hand, combine multiple simple models to enhance overall performance and accuracy. The idea behind ensemble methods is that a group of models working together can achieve better results than a single model alone.

Now that we've got that covered, let's check out some examples of black box models:

  1. Neural Networks: Neural networks are challenging to interpret because they’re super complex and full of non-linear relationships between input features and output.
  2. Random Forests: Random forests are an ensemble method made up of a bunch of decision trees. Individual decision trees are easy to understand, but when you mix hundreds of them together, it gets way more complex and harder to wrap your head around (see the sketch below).
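
For a sense of scale, here's a quick sketch on toy data showing why a random forest is harder to read than a single tree: the fitted model is literally hundreds of trees voting together. The dataset and settings here are placeholders, just for illustration.

```python
# A quick sketch (toy data) of why a random forest is hard to read:
# it is hundreds of separate trees voting together.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

print(f"Number of trees: {len(forest.estimators_)}")                          # 200 trees
print(f"Nodes in the first tree alone: {forest.estimators_[0].tree_.node_count}")
# Each tree is individually inspectable, but no single tree tells you
# how the full ensemble arrives at a prediction.
```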

You might not realize it, but there's a trade-off between how easy a model is to understand and how well it performs. Simple models are usually a piece of cake to interpret, but they're not as powerful. The more powerful models tend to perform better but are harder to understand because of all the complexity. Finding the right balance depends on what you need the model for.

Methods and Techniques to Improve Interpretability and Explainability

I came across several methods and techniques to help demystify black box models and make them more interpretable and explainable:

  1. Feature importance: This method ranks the input features based on their impact on the model's predictions. Knowing which features are essential helps you understand how the model makes decisions (there's a small code sketch after this list).
  2. Partial Dependence Plots (PDP): PDPs help visualize the relationship between a feature and the model's predictions while keeping other features constant. It's like observing how adding more salt affects the taste of a dish while keeping all other ingredients the same. PDPs can provide insights into how changes in specific features affect the model's decisions.
  3. Local Interpretable Model-agnostic Explanations (LIME): LIME is a method that tries to explain a black box model's predictions for a specific instance by creating a simpler, interpretable model (like linear regression) around that instance. It's like trying to figure out a chef's fancy dish by recreating it with a simpler recipe that tastes pretty close. LIME helps us get a human-understandable approximation of the complex model's decision for a particular case.
  4. Counterfactual Explanations: This approach provides insights into a model's decisions by generating counterfactual instances. Counterfactuals are instances that are similar to the original instance but have different outcomes. Like, imagine you didn't get a loan from a credit-scoring model. A counterfactual explanation might show you a similar profile that did get approved, pointing out the differences between the two, like a better credit score or lower debt-to-income ratio. This helps you see what parts of your profile made a difference in the decision.
  5. Shapley Additive Explanations (SHAP): SHAP is another cool technique. Taking inspiration from cooperative game theory, SHAP values help us understand how much each feature contributes to a specific prediction. To get a feel for SHAP, think about a team working on a project. Each team member has different skills and brings something different to the table for the project's success. The Shapley value, which was first made to share gains fairly among team members, can be used to measure each member's contribution to the project's success. In the same way, SHAP values measure the contribution of each feature to the model's prediction for a specific example.
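
To ground a couple of these techniques, here's a minimal sketch using scikit-learn's built-in tools on toy data: permutation importance as one flavour of feature importance, plus a partial dependence plot. The model and dataset are placeholders, not the segmentation project I mentioned.

```python
# A minimal sketch of feature importance and partial dependence (toy data).
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# 1. Permutation importance: shuffle one feature at a time and see how much
#    the model's score drops -- a bigger drop means a more important feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {imp:.3f}")

# 2. Partial dependence: how the prediction changes as feature 0 varies,
#    averaging over the other features ("keeping the rest of the recipe fixed").
PartialDependenceDisplay.from_estimator(model, X, features=[0])
plt.show()
```

Libraries like LIME and SHAP offer similar instance-level views for black box models; the sketch above just shows the two techniques you can try with scikit-learn alone.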

So, there you have it, friends. We've just scratched the surface of this enormous topic. As machine learning keeps weaving its way into our daily routines, it's super important to grasp how these algorithms make decisions. Interpretability and explainability play a big role in building trust, ensuring accountability, and fostering ethical AI systems.

I hope this post has helped shed some light on this fascinating area within machine learning and deepened your understanding a bit. I know it's been an eye-opener for me while writing it. It just goes to show there's always more to learn.