• Big Data & Technology
  • Amnon Drori
  • NOV 28, 2018

GDPR Challenge: Finding the Data That Needs to be Forgotten

Whether they’re ready or not, companies around the world face a new data challenge, one they must meet if they want to avoid heavy fines and penalties. Among its many rules, the GDPR, Europe’s new data privacy and security regime, gives Europeans the “right to be forgotten”: companies must delete personal information on European residents when asked to do so, without undue delay and generally within one month of the request. Failure to comply can bring fines of up to €20 million or 4% of global annual turnover, whichever is higher.

The question for many organizations isn’t just “what system do we have in place to remove user data?”; it’s “where do we find the data we need to remove?” Over the years, with the implementation of new databases, new data recording regimes, new administrator policies, and new marketing programs, personal data on users has likely ended up stored in many locations: on servers, in backups, on social media channels, and more.

Even worse, the metadata describing the same data may vary from one location to another, depending on how the data was stored and structured in each database or backup. A simple search isn’t going to find everything, and if companies can’t prove they can find everything, they may find themselves penalized. Considering that a typical organization likely has billions of pieces of data, finding and isolating a specific piece of data is going to be a major challenge, far too big and complicated for even a whole IT team to handle manually. The only way to do this accurately is to automate the process, implementing a smart system that can quickly parse even vast amounts of data and track down the specific data required.
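To make the metadata problem concrete, here is a minimal Python sketch of why a naive search misses records. It assumes each data store can be read as a list of dictionaries; the alias set, the store names, and the function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical aliases under which an e-mail address might be stored.
EMAIL_FIELD_ALIASES = {"email", "emailaddress", "mail", "contactemail", "usermail"}


def normalize_field(name):
    """Lower-case a field name and drop separators: 'E-Mail_Address' -> 'emailaddress'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def normalize_email(value):
    """Trim whitespace and lower-case an e-mail address before comparing."""
    return value.strip().lower()


def find_matches(stores, target_email):
    """Return (store, record index, field) for every field holding the target e-mail."""
    target = normalize_email(target_email)
    hits = []
    for store_name, records in stores.items():
        for i, record in enumerate(records):
            for field, value in record.items():
                if (normalize_field(field) in EMAIL_FIELD_ALIASES
                        and isinstance(value, str)
                        and normalize_email(value) == target):
                    hits.append((store_name, i, field))
    return hits


# Made-up sample data: the same person stored under different field names.
stores = {
    "crm": [{"E-Mail_Address": "Ana@Example.com "}, {"EMAIL": "bob@example.com"}],
    "backup_2016": [{"contact email": "ana@example.com", "name": "Ana"}],
}
print(find_matches(stores, "ana@example.com"))
```

An exact-match search on a field called `email` would find neither of Ana's records here; normalizing both field names and values finds both. A real system would need a much richer set of aliases and formats than this sketch assumes.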

The GDPR went into effect in May 2018, but as of just a few weeks ago, some 80% of large companies surveyed worldwide, and nearly 90% of those surveyed in the U.S., were not yet GDPR-compliant. While the EU seems to be taking a tolerant approach to companies that are not yet compliant, given that the transition has proven to be a challenge for many firms, eventually full compliance will be expected, and full enforcement will be imposed.

Among the important goals of the GDPR is to give European Union residents control over “their” data, the personal information companies have collected about them. The EU guidelines on what is expected from firms are clear: in order to comply, companies need to be able to track down all the data they have on EU residents and have it readily available for processing.

That seems simple enough, but it isn’t. Tracking down data, determining its provenance, and ensuring that it has been eliminated throughout the data chain is proving to be a very difficult task for many companies. Personally identifiable information (PII), which the GDPR requires to be deleted on demand, can be found in databases, backups, removable media in storage, employee devices, and more. Some of the data may have been duplicated or propagated down the line, and may now be stored in dozens of places.

Locating data in an organization generally falls to the business intelligence (BI) team, which typically maps out the structure of data in an organization, and traces it through the various BI systems in place. In order to track down specific PII, the BI team must find an occurrence of the data (for example, an individual’s e-mail address) and trace its flow through the organization’s data storage areas.
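Tracing that flow can be pictured as walking a lineage graph. The sketch below is only an illustration under stated assumptions: it supposes the BI team has already recorded, for each data element, the places it is copied to. The dictionary layout, the element names, and `trace_downstream` are all hypothetical.

```python
from collections import deque


def trace_downstream(lineage, source):
    """Breadth-first walk returning every element that `source` flows into.

    `lineage` maps an element to the list of elements it is copied to.
    """
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against loops and repeated copies
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Made-up lineage for one e-mail field.
lineage = {
    "signup_form.email": ["crm.contacts.email"],
    "crm.contacts.email": [
        "warehouse.dim_customer.email",
        "marketing.mailing_list.address",
    ],
    "warehouse.dim_customer.email": [
        "reports.churn.customer_email",
        "backup.2018_q3.dim_customer.email",
    ],
    "marketing.mailing_list.address": ["warehouse.dim_customer.email"],
}

for location in trace_downstream(lineage, "signup_form.email"):
    print(location)
```

Starting from a single occurrence, the walk lists five downstream locations that would also have to be checked. In practice the hard part is building the lineage map in the first place, which this sketch takes as given.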

In order to get GDPR-ready, companies have been (or should have been) performing this activity for every data element the GDPR requires them to track. This is a monumental task, and likely an important reason why so many companies report that they are not yet GDPR-compliant.

Given the stakes, organizations can’t count on their BI team being able to process all the data in time. Instead, what they need is an automated system that will find the data for them. A smart automated BI detection system parses all the data in an organization’s systems, determines where each piece of data is located, and finds where it has been propagated. It categorizes data according to its type, indexes its location so that it can easily be found, and determines the dependencies and relationships of that data so that all other data associated with it can be deleted quickly and accurately.
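As a rough illustration of the "categorize and index" idea, the following Python sketch scans string values for two kinds of PII and records where each value was seen. It is not a description of any particular product: the regular expressions are deliberately simplified, and the store layout and function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Simplified patterns for illustration only; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def build_pii_index(stores):
    """Map (pii_type, lower-cased value) -> list of (store, row, column) locations."""
    index = defaultdict(list)
    for store, rows in stores.items():
        for row_no, row in enumerate(rows):
            for column, value in row.items():
                if not isinstance(value, str):
                    continue
                for pii_type, pattern in PII_PATTERNS.items():
                    for match in pattern.findall(value):
                        index[(pii_type, match.lower())].append((store, row_no, column))
    return index


def locations_for(index, pii_type, value):
    """Look up every recorded location of one PII value."""
    return index.get((pii_type, value.lower()), [])


# Made-up sample data, including PII buried in free text.
stores = {
    "crm": [{"email": "Ana@Example.com", "note": "call +44 20 7946 0958"}],
    "tickets": [{"body": "Please reply to ana@example.com today"}],
}
index = build_pii_index(stores)
print(locations_for(index, "email", "ana@example.com"))
```

Once such an index exists, an erasure request becomes a lookup rather than a fresh search, and the returned locations can be handed to whatever deletion process each store supports.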

Thus, when a request comes in from an EU resident that their e-mail or other PII be erased from an organization’s systems, and EU enforcers inquire months later whether that request was fulfilled, the organization will be able to show that it carried out its obligations under the GDPR, and prove its compliance.

For many organizations, GDPR may be the biggest data challenge they have ever faced, but it also gives them an opportunity to truly own their data. By implementing a smart system that lets them find the data in their systems at will, they can ensure that they are GDPR-compliant, and they gain the opportunity to use all their data to run more efficiently and profitably.
