Machine Un-Learning: Why Forgetting Might Be the Key to AI

Natalie Fratto
February 18, 2019 · AI & Machine Learning

Let’s face it: forgetting things sucks. It’s frustrating not to remember where you left your keys, or to stumble over your words because you can’t recall the name of the colleague you just ran into at the grocery store. However, forgetfulness is core to the human condition, and in fact we’re lucky that we’re able to forget.

For humans, forgetting is more than just a failure to remember; it’s an active process that helps the brain take in new information and make decisions more effectively.

Now, data scientists are applying neuroscience principles to improve machine learning, convinced that human brains may hold the key to unlocking strong, human-level artificial intelligence.


According to a recent paper in Neuron, our brains are meant to act as information filters: take in a big pile of messy data, filter for the useful bits, then clear out any irrelevant details in order to tell a story or make a decision. The unused pieces are deleted to make space for new data, like running a disk cleanup on a computer. In neurobiological terms, forgetting happens when synaptic connections between neurons weaken or are eliminated over time; as new neurons develop, they rewire the circuits of the hippocampus, overwriting existing memories (New Atlas).

For humans, forgetting has two benefits:

  1. It enhances flexibility by reducing the influence of outdated information on our decision-making
  2. It prevents overfitting to specific past events, promoting generalization (Neuron)

In order to adapt effectively, humans need to be able to strategically forget.

But what about computers?

Herein lies one of the big challenges for artificial intelligence — computers forget differently than humans. Deep neural networks are the most successful technique for a range of machine learning tasks, but they don’t forget like we do.

Let’s take a simplified example: if you teach Spanish to a child who already speaks English, the child will apply relevant clues from learning English (nouns, verb tenses, sentence building) and simultaneously forget the parts that aren’t pertinent (accents, mumbling, intonation). The child can incrementally learn and build while strategically forgetting.

In contrast, if a neural network is trained to learn English, its parameters are adapted to solve for English. If you then teach it Spanish, the new adaptations for Spanish will overwrite the knowledge the network previously acquired for English, effectively deleting everything and starting anew. This is called ‘catastrophic forgetting’, and “it’s one of the fundamental limitations of neural networks” (DeepMind).
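You can watch catastrophic forgetting happen in a few lines of code. The sketch below is purely illustrative (it is not DeepMind’s experimental setup): a small scikit-learn network first learns digits 0-4 (“English”), then trains only on digits 5-9 (“Spanish”), and its accuracy on the first task collapses.

```python
# Minimal catastrophic-forgetting demo (illustrative setup, not from
# the papers cited in this article).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

task_a = y_train < 5   # "English": digits 0-4
task_b = ~task_a       # "Spanish": digits 5-9
test_a = y_test < 5

net = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)

# Phase 1: train on task A only.
for _ in range(30):
    net.partial_fit(X_train[task_a], y_train[task_a], classes=np.arange(10))
print("Task A accuracy after learning A:",
      net.score(X_test[test_a], y_test[test_a]))

# Phase 2: train on task B only -- the same weights get overwritten.
for _ in range(30):
    net.partial_fit(X_train[task_b], y_train[task_b])
print("Task A accuracy after learning B:",
      net.score(X_test[test_a], y_test[test_a]))
```

The first score should come out near perfect while the second drops sharply; nothing in the plain training procedure marks the task-A weights as worth protecting.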

While it’s still new territory, scientists have recently made strides in exploring a few potential ways to overcome this limitation.

Teaching AI to Strategically Forget: Three Approaches

#1. Long Short-Term Memory (LSTM) Networks

LSTMs are a type of recurrent neural network that uses specific learning mechanisms to “decide which pieces of information to remember, which to update, and which to pay attention to” (Edwin Chen) at any given point.

It’s easiest to explain how LSTMs work by using a movie analogy: imagine that a computer is trying to predict what will happen next in a movie by analyzing previous scenes. In one scene, a woman holds a knife: does the computer guess she’s a chef or a murderer? In another, the woman and a man are eating sushi under a golden archway: are they in Japan or at a McDonald’s? Maybe it’s actually St. Louis?

Pretty difficult to predict.

LSTMs aid in this process by helping a neural network 1) forget/remember, 2) save, and 3) focus (a short code sketch after this list shows how these map onto the standard LSTM gates):

  1. Forget/Remember: “If a scene ends, for example, the model should forget the current scene location, the time of day, and reset any scene-specific information; however, if a character dies in the scene, it should continue remembering that he’s no longer alive. Thus, we want the model to learn a separate forgetting/remembering mechanism: when new inputs come in, it needs to know which beliefs to keep or throw away.” (Edwin Chen)
  2. Save: When the model sees a new image, it needs to learn whether any information about the image is worth using and saving. If the woman walks past a billboard in a certain scene — will it be important to remember the billboard or is it simply noise?
  3. Focus: We need to remember that the woman in the movie is a mother because we will see her children later on, but that fact is perhaps not important in a scene she isn’t in, so we don’t need to focus on it then. In the same way, not everything stored in the neural network’s long-term memory is immediately relevant, so the LSTM helps determine which parts to focus on at any given time while keeping everything safely stored for later.
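Here is a minimal numpy sketch of a single LSTM cell step. The equations are the standard LSTM update; the dictionary-of-weights layout and the dimensions are illustrative choices, and the comments map each gate onto the three ideas above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. c_prev is the long-term memory (cell state);
    h_prev is the short-term output from the previous step."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # 1) forget/remember: ~0 drops a memory, ~1 keeps it
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # 2) save: is the new input worth writing down?
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate information to save
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # 3) focus: which stored memories matter right now
    c = f * c_prev + i * g    # updated long-term memory
    h = o * np.tanh(c)        # the part the network focuses on this step
    return h, c

# Tiny usage example with random weights (4-dim input, 3-dim state):
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4)) for k in "figo"}
U = {k: rng.normal(size=(3, 3)) for k in "figo"}
b = {k: np.zeros(3) for k in "figo"}
h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), W, U, b)
```

Because the forget gate f multiplies the old cell state element-wise, the network can erase one memory while preserving another, which is exactly the selective forgetting the movie example describes.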

#2. Elastic Weight Consolidation (EWC)

EWC is an algorithm created in March 2017 by researchers at Google’s DeepMind that mimics a neuroscience process called synaptic consolidation. During synaptic consolidation, our brains assess a task and compute the importance of the many neurons used to perform it, weighting some neurons as more critical to performing the task correctly. These critical neurons are coded as important and are less likely to be overwritten by subsequent tasks. Similarly, a neural network uses many connections (analogous to the synapses between neurons) to perform a task. EWC codes some connections as critical and thus protects them from being overwritten and forgotten.
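Concretely, EWC adds a quadratic penalty when training on a new task: each weight is anchored to its old value in proportion to how important it was for the previous task. Below is a minimal PyTorch sketch of that penalty; the importance weights (fisher, the diagonal Fisher information) and the task-A parameter snapshot (params_a) are assumed to be precomputed, and the names here are illustrative.

```python
import torch

def ewc_penalty(model, fisher, params_a, lam=1000.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_A_i)^2.
    fisher and params_a are dicts keyed by parameter name, captured
    right after training on task A."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - params_a[name]) ** 2).sum()
    return (lam / 2.0) * penalty

# During task B training, important task-A weights resist being overwritten:
#   loss = task_b_loss + ewc_penalty(model, fisher, params_a)
```

Weights with a large F_i are effectively frozen, while unimportant weights stay free to adapt: the machine analogue of protecting critical synapses.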

In the chart below, you can see what happened when the researchers applied EWC to Atari games: the blue line is a standard deep learning process, and the red and brown lines are aided by EWC.

[Chart: blue line = standard deep learning; red and brown lines = improvements with the help of EWC]

#3. The Information Bottleneck Theory

In the fall of 2017, the AI community was humming over a talk by Naftali Tishby, a computer scientist and neuroscientist from the Hebrew University of Jerusalem, presenting evidence for what he calls the information bottleneck theory. “The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts” (Quanta).

As Tishby explains it, neural networks go through two phases while learning: fitting and compressing. During fitting, the network learns to label its training data; during compression, a much longer process, it “sheds information about the data, keeping track of only the strongest features” (Quanta), the ones most relevant to helping it generalize. In this way, compression is a form of strategic forgetting, and manipulating this bottleneck could be a tool AI researchers use to construct new objectives and architectures for stronger neural networks in the future.
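For readers who want the formal statement, the information bottleneck objective is usually written as follows (this is the standard formulation from the literature, not a transcription of the talk): find a compressed representation T of the input X that stays predictive of the label Y,

```latex
% Information bottleneck: compress X into T while keeping T informative
% about Y; beta trades off compression against prediction.
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
```

where I denotes mutual information and beta controls how aggressively the network is pushed to forget.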

As Tishby says, “the most important part of learning is actually forgetting.”

It’s possible that our brains and distinctly human processes, like forgetting, hold the map to creating strong artificial intelligence, but scientists are collectively still figuring out how to read the directions.
