Explainable AI: Why visualizing neural networks is important

Ben Dickson
April 9, 2019 AI & Machine Learning


Last week, researchers from OpenAI and Google introduced Activation Atlases, a tool that helps make sense of the inner workings of neural networks by visualizing how they see and classify different objects.

At first glance, Activation Atlases is an amusing tool that helps you see the world through the eyes of AI models. But it is also one of many important efforts to explain the decisions made by neural networks, one of the greatest challenges of the AI industry and an important hurdle to trusting AI with critical tasks.


Activation Atlases, a neural net visualization tool developed by researchers at Google and OpenAI, helps visualize how neural networks see the world and classify images (image credit: OpenAI)

AI’s trust problem is real

Artificial intelligence, or more precisely its popular subset deep learning, is far from the only kind of software we use. We’ve been relying on software in different fields for decades.

However, because of the way it works, deep learning has made tremendous advances in areas where software has historically struggled, such as speech recognition, computer vision, and natural language processing.

Thanks to these advances in deep learning, AI algorithms have made inroads into critical domains such as self-driving cars, video and image analysis, healthcare, and job application processing. The common denominator of all these fields is that when AI algorithms make mistakes, those mistakes can have a grave impact on the safety, health, and lives of the people subject to their decisions.

All software goes through debugging and testing to make sure its users can trust it. The dramatic impact that mistakes in deep learning algorithms can have makes it even more critical to ensure we can trust them.

For instance, if a self-driving car fails to detect an obstacle or another car, it can cause a fatal crash. In healthcare, a faulty AI algorithm might make a wrong diagnosis or treatment recommendation, harming the patient. An AI-powered recruitment system might wrongly turn down a job application, unjustly depriving a qualified candidate of a career opportunity.

The engineers of AI algorithms must be able to make sure their models can be trusted with the critical tasks delegated to them. Likewise, the users of these systems must be able to verify how much they can trust the decisions the AI makes.

Deep learning’s transparency problem is also real


The problem with current AI technology is that, contrary to classical software, where engineers manually design and define behavior rules, deep learning algorithms involve very little top-down, human-imposed design. Neural networks, the fundamental structure underlying deep learning algorithms, derive their behavior from thousands or millions of training examples.

By analyzing and comparing those examples, neural networks create complex mathematical functions with thousands of parameters, which can make statistical predictions and classify new data. For instance, after analyzing thousands of cat pictures, a neural network can “look” at a picture it hasn’t seen before and say how confident it is that it contains a cat.
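As a concrete, if simplified, illustration of that last point, here is a minimal sketch of how a trained image classifier turns its parameters into a confidence score. It assumes PyTorch with a recent torchvision and its pretrained ResNet, plus a hypothetical local image file cat.jpg; it is not tied to any of the tools discussed in this article.

```python
# Minimal sketch: how a trained image classifier reports a confidence score.
# Assumes PyTorch, a recent torchvision, and a local file "cat.jpg" (hypothetical).
import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing for the pretrained model.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(image)                     # raw scores for 1,000 ImageNet classes
    probs = torch.softmax(logits, dim=1)      # turn scores into probabilities
    confidence, class_idx = probs.max(dim=1)  # top guess and the model's confidence in it

print(f"predicted class index: {class_idx.item()}, confidence: {confidence.item():.2%}")
```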

Well-trained neural networks can produce very accurate results, sometimes even better than humans. The problem is that we don’t know how they work. Even the engineers who build deep learning models often can’t make sense of the logic behind the thousands or millions of parameters that constitute their neural networks.

For instance, a neural network might have learned to classify shark images not by the physique of the sharks themselves but by the blue background of the water that surrounds them in photos. But it’s very hard to deduce that from an opaque neural network. Most of the time, you can only find the shortcomings of a trained deep learning model through trial and error, testing it with different samples and observing its output.

If you don’t know how a piece of software works, you can’t know for certain how it fails, and we’ve already mentioned cases where failures in deep learning can have fatal consequences.

But the lack of transparency and explainability of AI algorithms also poses a serious problem to their end users. For instance, if a doctor is going to follow a treatment recommendation made by an AI algorithm, they should be able to verify the logic and reasoning behind that recommendation. After all, for the moment there will always be a human who will be held accountable, regardless of whether a decision involved AI algorithms or not.

OpenAI’s neural network visualizer

In this context, OpenAI and Google’s Activation Atlases solves a very important problem: it helps visualize how neural networks classify their data. To oversimplify, the tool transforms the thousands of numbers that compose a neural network into images, making them much easier for humans to interpret. Fittingly, the team that developed the tool included Chris Olah, one of the developers of DeepDream, a tool that used deep learning models to generate hallucinogenic images.
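To make that idea a bit more concrete, the sketch below shows the first step behind an atlas-style visualization: collecting a layer’s activations for many inputs and projecting them into a 2D layout a human can browse. This is not the OpenAI/Google implementation (which also renders a feature-visualization image for each cell of the layout); the layer choice, the random stand-in data, and the umap-learn dependency are assumptions made for illustration.

```python
# Sketch of the idea behind an activation atlas: collect a network's internal
# activations for many inputs and lay them out in 2D for human inspection.
# NOT the OpenAI/Google implementation; layer choice and data are stand-ins.
import torch
from torchvision import models
import umap  # pip install umap-learn (assumed dependency)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

activations = []

def hook(module, inputs, output):
    # Average over spatial positions -> one activation vector per image.
    activations.append(output.mean(dim=(2, 3)).detach())

model.layer3.register_forward_hook(hook)

# Stand-in for a real dataset: a small batch of random "images".
images = torch.rand(64, 3, 224, 224)
with torch.no_grad():
    model(images)

vectors = torch.cat(activations).numpy()            # (64, channels)
coords = umap.UMAP(n_components=2).fit_transform(vectors)
print(coords.shape)  # (64, 2): each image gets a point in the 2D atlas layout
```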

Activation Atlases isn’t the first tool that helps visualize the inner workings of neural networks, but what makes it significant is that it gives a top-down view of the organization of neurons.


Activation Atlases shows how a neural network divides examples between different classes (image credit: OpenAI)

For instance, in the image above, we can see how the neural network categorizes images between “wok” and “frying pan.” This zoomed-out view can help find problematic correlations that can result in mistakes.

A tool like Activation Atlases can help discover and fix potential vulnerabilities in neural networks. In the following example, a deep learning model was fooled into classifying a grey whale as a white shark by adding a baseball to the image.


Image credit: Google

Called “adversarial perturbations,” these modifications can become dangerous in areas such as self-driving cars, where misclassifications can cause safety risks. Well-designed adversarial attacks make modifications to input data that are invisible to the human eye but force neural networks to change their output values.
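For readers curious how such a perturbation is computed, below is a minimal sketch of the classic fast gradient sign method (FGSM), a generic white-box technique that nudges every pixel slightly in the direction that increases the classifier’s loss. It reuses a pretrained torchvision ResNet with a random stand-in image, and it is not the specific baseball/shark attack described above.

```python
# Minimal FGSM-style sketch: a tiny, nearly invisible change to the input
# can flip a classifier's output. Generic textbook example, not the attack above.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a real photo
original_class = model(image).argmax(dim=1)

# Compute the gradient of the loss with respect to the input pixels...
loss = F.cross_entropy(model(image), original_class)
loss.backward()

# ...and nudge every pixel a tiny step in the direction that increases the loss.
epsilon = 0.01  # small enough to be barely visible
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

new_class = model(adversarial).argmax(dim=1)
print("class changed:", bool(original_class.item() != new_class.item()))
```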

With Activation Atlases, developers of artificial intelligence models can investigate their neural networks and find potential pain points that can be exploited for adversarial attacks. An investigation with Activation Atlases reveals that the previous error was due to similarities between the stitches on a baseball and the teeth of a shark, which can fool the neural network.


Image credit: Google

With more tools at our disposal for investigating and explaining neural networks, we are better equipped to protect our AI models against mistakes and intentional manipulation.

Building trust in artificial intelligence

Activation Atlases is one of several efforts to create explainable AI. As artificial intelligence finds its way into more and more fields, a growing number of organizations are working on developing methods to investigate decisions made by AI models, or AI models that are open to investigation. This includes an expansive initiative by DARPA, the research arm of the U.S. Department of Defense.

In the past year, we’ve had the chance to review some of these projects, each of which tackles the challenge of explaining AI decisions from a different angle. RISE, a method developed by researchers at Boston University, produces saliency maps that show how much each pixel in an input image contributes to the output classes of a neural network. The advantage of RISE is that it is model-agnostic: it does not depend on the type and structure of the AI model and works only by comparing inputs and outputs. Explainable AI methods of this kind are also known as “black box” methods.
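To give a flavor of how a model-agnostic saliency method like RISE can work, here is a heavily simplified sketch: probe the model with many randomly masked copies of the input and credit each pixel in proportion to the scores of the masked copies that kept it visible. The actual RISE implementation generates small low-resolution masks and upsamples them with random shifts, which is omitted here; the `model` and `image` variables are assumed to be a classifier and a (1, 3, H, W) tensor as in the earlier snippets.

```python
# Simplified RISE-style, model-agnostic saliency: only inputs and outputs are used.
import torch

def rise_saliency(model, image, target_class, num_masks=500, keep_prob=0.5):
    _, _, h, w = image.shape
    saliency = torch.zeros(h, w)
    with torch.no_grad():
        for _ in range(num_masks):
            mask = (torch.rand(1, 1, h, w) < keep_prob).float()  # random occlusion
            score = torch.softmax(model(image * mask), dim=1)[0, target_class]
            saliency += score * mask[0, 0]  # pixels kept in high-scoring runs matter more
    return saliency / (num_masks * keep_prob)  # normalize by expected mask coverage

# Usage (hypothetical): saliency = rise_saliency(model, image, target_class=281)
```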


Examples of saliency maps produced by RISE

Another interesting black-box AI investigation tool is AutoZOOM, a method recently developed by researchers at IBM. AutoZOOM is specially designed to help improve the robustness of AI models by finding adversarial vulnerabilities with as few examples as possible. The point of AutoZOOM is to establish trust in AI models by better understanding their strengths and weaknesses.
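As a rough illustration of the kind of black-box probing involved, the sketch below shows zeroth-order gradient estimation, the basic ingredient behind query-efficient attacks such as AutoZOOM: the attacker only sees input/output scores, never the model’s internals. AutoZOOM itself adds an autoencoder to shrink the search space and sharply reduce the number of queries, which is omitted here; `score_fn` is a hypothetical black-box scoring function.

```python
# Sketch of zeroth-order (black-box) gradient estimation: approximate how the
# model's score changes using only input/output queries, no internal access.
import numpy as np

def estimate_gradient(score_fn, x, num_queries=20, step=1e-3):
    """Average random-direction finite differences of score_fn around x."""
    grad = np.zeros_like(x)
    base = score_fn(x)
    for _ in range(num_queries):
        direction = np.random.randn(*x.shape)
        direction /= np.linalg.norm(direction)
        grad += (score_fn(x + step * direction) - base) / step * direction
    return grad / num_queries

# Toy example: the "model" is just a scoring function the attacker can query.
def toy_score(x):
    return float(np.sum(x ** 2))

print(estimate_gradient(toy_score, np.ones(4)))  # roughly 2 * x = [2, 2, 2, 2]
```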

GAN Dissection, another IBM tool for visualizing the inner workings of neural networks, is designed to help understand generative adversarial networks (GANs), a popular AI technique used to create unique, creative content. GAN Dissection and its associated tool GANPaint are a step toward better cooperation between human operators and AI models.


GANpaint is a visual tool that lets you manipulate the decisions made by AI models

Every one of these tools, and the many others being developed, is key to establishing trust in AI models by revealing their strengths and weaknesses.
