The road to high-quality analytics starts with high-quality data

Michele Vaccaro
November 30, 2018 Big Data, Cloud & DevOps


The vast majority of enterprises are convinced that big data analytics is an important element of their organisation’s data management and business intelligence programs, yet only 30 percent have adopted a system and put it into production; the rest are still evaluating the opportunity. This raises the question: why is adoption moving so slowly?

There are multiple reasons for this, but one is the amount of behind-the-scenes work required to clean and select the most appropriate data before a company can reap any real benefit. This work increases cost and manual effort, and lengthens the time-to-market of big data analytics projects.

In fact, according to a recent article in The New York Times, the work of cleaning up and wrangling data into a usable form is a critical barrier to obtaining insights. It highlighted that data scientists spend 50 to 80 percent of their time mired in the mundane labour of collecting and preparing unruly digital data before it can be explored. The Wikibon Big Data Analytics Adoption Survey 2014-2015 supports this view, finding that the difficulty of transforming data into a form suitable for analysis is the biggest technology-related barrier to realising the full value of big data analytics. Further barriers stem from the difficulty of integrating big data with existing infrastructure and of merging disparate data sources.
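To make the scale of this preparation work concrete, here is a minimal sketch of the kind of wrangling involved, using Python and pandas. The dataset, column names and cleaning rules are illustrative assumptions, not taken from the article or the surveys it cites.

```python
import io
import pandas as pd

# A messy extract standing in for data pulled from an operational system
# (entirely invented for illustration).
raw_csv = io.StringIO(
    "Meter ID, Reading kWh , Read At\n"
    "M1,12.4,2018-11-30T04:00:00\n"
    "M1,12.4,2018-11-30T04:00:00\n"           # duplicate row
    "M2,not-a-number,2018-11-30T05:00:00\n"   # unparseable reading
)
raw = pd.read_csv(raw_csv, skipinitialspace=True)

# Normalise column names and types before any analysis can happen.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
raw["read_at"] = pd.to_datetime(raw["read_at"], errors="coerce")
raw["reading_kwh"] = pd.to_numeric(raw["reading_kwh"], errors="coerce")

# Drop duplicates and rows that could not be parsed.
clean = (
    raw.drop_duplicates(subset=["meter_id", "read_at"])
       .dropna(subset=["meter_id", "read_at", "reading_kwh"])
)
print(f"{len(raw) - len(clean)} of {len(raw)} rows discarded during cleaning")
```

Even this toy example needs several normalisation and filtering steps before the data is usable, which is exactly the effort the figures above describe.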

So, when deploying a data lake (a solution able to gather all these different data streams), the key question should be: “Are my data streams healthy?”

Considering that information from inside an enterprise is the second most common source of data when building a data lake, businesses must take proper care of the information they already possess. By managing their internal processes and organising both structured and unstructured information, organisations would be well on their way to building clean and readily usable data streams for their data lakes. Real-time, relevant information that is integrated with their current systems becomes far more compliant, secure and easily accessible. An additional benefit is that organisations would also run their day-to-day business much more efficiently.

Keeping with the data lake analogy, the concept is simple: if the lake is filled with polluted water coming from its different streams, the focus should be on preventing the streams from becoming polluted in the first place, rather than on continuously cleaning the polluted lake.
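In practice, keeping a stream clean means checking data at the point of ingestion rather than repairing it once it has reached the lake. The sketch below shows one way this could look; the schema, field names and validation rules are assumptions made purely for illustration.

```python
from datetime import datetime

# Fields every record must carry before it may enter the lake
# (an assumed schema, not one prescribed by the article).
REQUIRED_FIELDS = {"meter_id", "space_id", "read_at", "reading_kwh"}

def is_healthy(record: dict) -> bool:
    """Return True only for records that are safe to write to the lake."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    try:
        datetime.fromisoformat(record["read_at"])
        value = float(record["reading_kwh"])
    except (TypeError, ValueError):
        return False
    return value >= 0  # a negative consumption value indicates a faulty reading

def ingest(records, write, reject):
    """Route healthy records to the lake and quarantine the rest."""
    for record in records:
        (write if is_healthy(record) else reject)(record)

# Example usage with simple in-memory sinks.
lake, quarantine = [], []
ingest(
    [
        {"meter_id": "M1", "space_id": "S7",
         "read_at": "2018-11-30T04:00:00", "reading_kwh": "12.4"},
        {"meter_id": "M2", "reading_kwh": "oops"},
    ],
    lake.append,
    quarantine.append,
)
print(len(lake), "record accepted,", len(quarantine), "record quarantined")
```

Rejected records are quarantined rather than silently dropped, so the upstream process can be fixed at the source instead of the lake being cleaned downstream.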

A proper Information Management system – including healthy Enterprise Content Management (ECM) and modern archiving strategies – will help to manage and integrate existing data and content across the enterprise, as well as limit or overcome the pollution of such information, by ensuring chain of custody and compliant governance throughout the data’s entire lifecycle.

As an example, the utilities department of a Greater London airport has recently implemented smart meters to collect real-time data on water, gas and electricity consumption. But smart meters are assigned to airport rental spaces, not to the tenants who rent those spaces for their shops or kiosks. The utilities manager therefore doesn’t know to whom to send the usage bills, as only the leasing manager can say which spaces are rented by whom. This is a typical example of data and information maintained in separate silos and managed with inappropriate software tools (typically standard office tools). Such situations force manual operations like data cleansing and realignment in order to produce a valuable output, exposing the process to inconsistencies, data loss or even data leaks.

In this instance, one “solution” would be for the utilities manager and the leasing manager to meet once a month at the pub and reconcile the data sitting in their Excel files over a beer. To make their lives easier, however, a proper system connecting their two separate worlds would help them manage all the leasing processes, contracts and documents on a daily basis. Integrating it with other systems would not only make their daily operations more efficient and secure, but would also produce a usable stream of information for the data lake and the analytics engine that consumes it.
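As a rough sketch of what connecting the two silos could look like, the example below joins hypothetical meter readings (recorded per rental space) with hypothetical leasing records (which map spaces to tenants) to produce a billable view. All identifiers and figures are invented for illustration.

```python
import pandas as pd

# Utilities silo: consumption recorded per rental space (invented data).
readings = pd.DataFrame({
    "space_id": ["S7", "S7", "S9"],
    "utility": ["electricity", "water", "electricity"],
    "usage": [120.5, 3.2, 88.0],
})

# Leasing silo: which tenant currently rents which space (invented data).
leases = pd.DataFrame({
    "space_id": ["S7", "S9"],
    "tenant": ["Coffee Kiosk Ltd", "Bookshop Ltd"],
})

# One join replaces the monthly manual reconciliation and yields a
# billable view that can also feed the data lake as a clean stream.
bills = (
    readings.merge(leases, on="space_id", how="left")
            .groupby(["tenant", "utility"], as_index=False)["usage"]
            .sum()
)
print(bills)
```

Once the two systems share a common key (here, the rental space), the bill can be produced automatically, and the same joined view becomes a healthy stream for the data lake.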

Modern ECM and archiving systems provide the compliant, enterprise-grade functionality needed to meet these requirements, and they enable organisations to start their journey into the new world of big data with the right approach and with clean data streams to fill their data lakes.
