GDPR Challenge: Finding the Data That Needs to be Forgotten

Amnon Drori
November 28, 2018 Big Data, Cloud & DevOps

Whether they’re ready or not, companies around the world have a new data challenge, one they must meet if they want to avoid heavy fines and penalties. Among its many rules, the GDPR, Europe’s new data privacy and security regime, requires that companies delete personal information on European residents without undue delay, and generally within one month of being asked to do so, providing Europeans with the “right to be forgotten.” Failure to comply can draw fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher.

The question for many organizations isn’t just “what system do we have in place to remove user data;” it’s “where do we find the data we need to remove?” Over the years, with the implementation of new databases, new data recording regimes, new administrator policies, and new marketing programs, personal data on users is likely stored in many locations – on servers, in backups, on social media channels, and more.

Even worse, the metadata describing the same data may vary, depending on the way the data was stored and structured in a database or backup. A simple search isn’t going to find everything – and if companies can’t prove they can find everything, they may find themselves penalized. Considering that a typical organization likely has billions of pieces of data, finding and isolating a specific piece of data is going to be a major challenge – far too big and complicated for even a whole IT team to handle manually. The only way to do this accurately is to automate the process, implementing a smart system that can quickly parse even vast amounts of data and track down the specific data required.
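To make the point about varying metadata concrete, here is a minimal Python sketch of a search that ignores field names and looks at the values themselves. The record layout, the field names, and the `find_email` helper are all hypothetical, invented for illustration; a real system would also need to cover non-text storage, encodings, and far larger volumes.

```python
import re

# Simple pattern for e-mail addresses; an assumption for this sketch,
# not a complete validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def normalize_email(value):
    """Lowercase and trim so that variants of one address compare equal."""
    return value.strip().lower()

def find_email(records, target):
    """Return (record_index, field_name) pairs whose text value contains
    the target address, regardless of what the field is called."""
    wanted = normalize_email(target)
    hits = []
    for i, record in enumerate(records):
        for field, value in record.items():
            if not isinstance(value, str):
                continue
            found = (normalize_email(m) for m in EMAIL_RE.findall(value))
            if wanted in found:
                hits.append((i, field))
    return hits

# Hypothetical records: the same address under three different field names.
records = [
    {"email": "Ana@Example.com", "name": "Ana"},
    {"contact_addr": " ana@example.com ", "plan": "pro"},
    {"notes": "Customer wrote from ANA@EXAMPLE.COM about billing"},
    {"email": "bob@example.com"},
]

hits = find_email(records, "ana@example.com")
print(hits)
```

A lookup on a column named `email` alone would find only the first of these three records, which is the gap the paragraph above describes.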

The GDPR went into effect in May 2018, but as of just a few weeks ago, some 80% of large companies surveyed worldwide – and nearly 90% of those surveyed in the U.S. – were not yet GDPR compliant. While the EU seems to be taking a tolerant approach to companies that are not yet compliant, given that the transition has proven to be a challenge for many firms, eventually full compliance will be expected, and full enforcement will be imposed.

Among the important goals of the GDPR is to give European Union residents control over “their” data – the personal information companies have collected about them. The EU guidelines on what is expected from firms are clear; in order to comply, companies need to be able to track down all the data they have on EU residents and have it readily available for processing.

Seems simple enough, but actually it isn’t. The challenge of tracking down data, determining its provenance, and ensuring that it has been eliminated throughout the data chain, is proving to be a very difficult task for many companies. Personally identifiable information (PII), which GDPR requires be deleted on demand, can be found in databases, backups, removable media that is in storage, employee devices, etc. Some of the data could be duplicated or propagated down the line, and be stored in dozens of places.

Locating data in an organization generally falls to the business intelligence (BI) team, which typically maps out the structure of data in an organization, and traces it through the various BI systems in place. In order to track down specific PII, the BI team must find an occurrence of the data (for example, an individual’s e-mail address) and trace its flow through the organization’s data storage areas.
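Tracing a data element from its first occurrence through every place it flows to can be pictured as a walk over a lineage graph. The sketch below assumes the organization already has a map of which store feeds which; the store names and the `LINEAGE` dictionary are hypothetical, and in practice building that map is the hard part.

```python
from collections import deque

# Hypothetical lineage map: each store lists the stores it feeds.
LINEAGE = {
    "crm.contacts": ["warehouse.customers", "backup.crm_nightly"],
    "warehouse.customers": ["bi.marketing_report", "backup.warehouse_weekly"],
    "bi.marketing_report": [],
    "backup.crm_nightly": [],
    "backup.warehouse_weekly": [],
}

def downstream_stores(lineage, start):
    """Breadth-first walk returning the start store and every store
    that receives data from it, directly or indirectly."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in lineage.get(current, []):
            if nxt not in seen:  # guard against cycles and duplicates
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

# Every store an e-mail address first captured in the CRM may have reached.
downstream = downstream_stores(LINEAGE, "crm.contacts")
print(downstream)
```

Each store in the resulting list is a place the erasure request has to reach, including the backups.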

In order to get GDPR-ready, companies have been (or should have been) performing this activity for all data elements GDPR would require organizations to track. This is a monumental task, and likely an important reason why companies report that they are not GDPR-compliant.

Given the stakes and the danger, organizations really can’t take a chance that their BI team will be able to process all the data in time. Instead, what they need is an automated system that will find the data for them. A smart automated BI detection system will parse through all the data in an organization’s system and determine the location of data, and find where it was propagated to. The smart automated system categorizes data according to its type, indexing its location so that it can easily be found, and determining the dependencies and relationships of that data so that all other data associated with it can be deleted quickly and accurately.
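As a rough illustration of categorizing data by type and indexing its location, the Python sketch below scans a few in-memory "stores" and records where each detected value lives. The patterns, store names, and the `build_pii_index` helper are assumptions made for this example; it is not a description of any particular product, and real detection would need far more than two regular expressions.

```python
import re
from collections import defaultdict

# Illustrative patterns only; real PII detection needs many more types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def build_pii_index(stores):
    """Map (pii_type, normalized_value) to the set of (store, field)
    locations where that value was detected."""
    index = defaultdict(set)
    for store, rows in stores.items():
        for row in rows:
            for field, value in row.items():
                if not isinstance(value, str):
                    continue
                for pii_type, pattern in PATTERNS.items():
                    for match in pattern.findall(value):
                        key = (pii_type, match.strip().lower())
                        index[key].add((store, field))
    return index

# Hypothetical stores holding the same person under different field names.
stores = {
    "crm.contacts": [{"email": "Ana@Example.com", "phone": "+44 20 7946 0958"}],
    "warehouse.customers": [{"contact": "ana@example.com", "plan": "pro"}],
}

index = build_pii_index(stores)
locations = sorted(index[("email", "ana@example.com")])
print(locations)
```

With an index like this built ahead of time, answering an erasure request becomes a lookup rather than a fresh search across every system.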

Thus when a request comes in from an EU resident that their e-mail or other PII be erased from an organization’s system – and EU enforcers inquire months later on whether that request was fulfilled – organizations will be able to claim that they carried out their obligations under the GDPR, and prove their compliance.

For many organizations, GDPR may be the biggest data challenge they have ever faced – but it also provides organizations with an opportunity to truly own their data. By implementing a smart system that will ensure that they are able to find the data in their systems at will, organizations will ensure that they are GDPR-compliant – and have the opportunity to utilize all their data to help their organizations run more efficiently and profitably.
