{"id":22599,"date":"2021-02-03T09:58:22","date_gmt":"2021-02-03T09:58:22","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/top-5-data-trends-for-cdos-watch-out-2021\/"},"modified":"2023-09-05T11:23:07","modified_gmt":"2023-09-05T11:23:07","slug":"top-5-data-trends-for-cdos-watch-out-2021","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/top-5-data-trends-for-cdos-watch-out-2021\/","title":{"rendered":"The Top 5 Data Trends for CDOs to Watch Out for in 2021"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22599\" class=\"elementor elementor-22599\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-441b884 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"441b884\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a2762e4\" data-id=\"a2762e4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b8e1671 elementor-widget elementor-widget-text-editor\" data-id=\"b8e1671\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5f74\">2020 upended the data world, just as it did every other realm. When COVID shut down businesses and sent employees to work from home, companies had to quickly adapt to the \u201cnew normal\u201d.<\/p>\n\n<p id=\"6c32\">Cloud became an absolute necessity as organizations moved to remote work. 
Data governance and security became a big priority, with everyone accessing data from different locations and systems. Smarter AI became appealing, now that historical models were meaningless. In short, organizations realized they needed to make changes fast. Data investments went up and organizations sought to upgrade their systems and create the perfect data stack.<\/p>\n\n<p id=\"4e1a\">With\u00a02020 in the rearview mirror, we\u2019re now looking ahead to a new and hopefully better year. What will 2021 bring to the data world? How will data infrastructure evolve to keep up with all the latest innovations and changes?<\/p>\n\n<p id=\"cf71\">This year, we\u2019ll see several new data trends: the emergence of new data roles and data quality frameworks, the rise of the modern data stack and modern metadata solutions, and the convergence of data lakes and warehouses.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-752b799 elementor-widget elementor-widget-heading\" data-id=\"752b799\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">1. 
Data lakes and warehouses are converging<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e4e7de7 elementor-widget elementor-widget-heading\" data-id=\"e4e7de7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">For the past decade, data architects have designed data operations around two key units:<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d46df26 elementor-widget elementor-widget-text-editor\" data-id=\"d46df26\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><strong>Data lakes<\/strong>: Cheap storage for vast amounts of raw or even unstructured data. The data lake architecture is typically well suited to ad-hoc exploration and data science use cases.<\/li>\n<li><strong>Data warehouses<\/strong>: Traditionally, data warehouses have been optimized for compute and processing speed. 
This is helpful for reporting and business intelligence, making warehouses the system of choice for analytics teams.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e6431b0 elementor-widget elementor-widget-text-editor\" data-id=\"e6431b0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote class=\"wp-block-quote\">\n<p>Today, many companies still use both systems \u2014 a data lake for all their data, plus specialized data warehouses for analytics and reporting use cases.<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a0ca989 elementor-widget elementor-widget-text-editor\" data-id=\"a0ca989\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"da85\">While we\u2019re not there yet, we\u2019re starting to see the two ecosystems converge as data lakes and warehouses both add more capabilities.<\/p>\n\n<p id=\"bf68\">Data warehouses like Snowflake already separate costs for storage and compute, drastically reducing the costs associated with storing all your data on data warehouses. 
Taking this one step further, some data warehouse players have started\u00a0<a href=\"https:\/\/www.theregister.com\/2020\/11\/17\/snowflake_releases\/\" target=\"_blank\" rel=\"noreferrer noopener\">adding support for semi-structured data<\/a>.<\/p>\n\n<p id=\"62de\">On the other hand, data lake players like Databricks have started moving towards the concept of a \u201cdata lakehouse\u201d, and they recently announced support for\u00a0SQL analytics\u00a0and\u00a0<a href=\"https:\/\/databricks.com\/blog\/2020\/11\/23\/acid-transactions-on-data-lakes.html\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">ACID transactions<\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-11ba8fe elementor-widget elementor-widget-heading\" data-id=\"11ba8fe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><em>Learn more:<\/em><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-80704c6 elementor-widget elementor-widget-text-editor\" data-id=\"80704c6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><a href=\"https:\/\/databricks.com\/blog\/2020\/01\/30\/what-is-a-data-lakehouse.html\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><em>Data Lakehouses<\/em><\/a>: An emerging system design that combines the data structures and management features from a data warehouse with the low-cost storage of a data lake.<\/li>\n<li><a href=\"https:\/\/a16z.com\/2020\/11\/12\/a16z-podcast-the-great-data-debate\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>The Great Data Debate<\/em><\/a>: A cool episode of the a16z podcast with thought-provoking notes about 
different technologies and architectures emerging in the data stack.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-58f5a3c elementor-widget elementor-widget-heading\" data-id=\"58f5a3c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">2. The \u201cmodern data stack\u201d goes mainstream<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-82d78ad elementor-widget elementor-widget-text-editor\" data-id=\"82d78ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"a434\">Starting in 2020, the term \u201c<strong>modern data stack<\/strong>\u201d was everywhere you looked in the data world. It refers to the new, best-of-breed modern data architecture for dealing with massive amounts of data.<\/p>\n\n<p id=\"653f\">One of the key pillars of the modern data stack is a powerful cloud platform. 
Originally centered around cloud data warehouses, it\u2019s also beginning to include cloud data lakes and associated data lake engines.<\/p>\n\n<p id=\"8292\">Today, the modern data stack refers to a suite of tools for every part of the data workflow:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-904d06b elementor-widget elementor-widget-text-editor\" data-id=\"904d06b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><strong>Data ingestion<\/strong>: e.g.\u00a0<a href=\"https:\/\/fivetran.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Fivetran<\/a>,\u00a0<a href=\"https:\/\/www.stitchdata.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Stitch<\/a>,\u00a0<a href=\"https:\/\/hevodata.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hevodata<\/a><\/li>\n<li><strong>Data warehousing<\/strong>: e.g.\u00a0<a href=\"https:\/\/www.snowflake.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Snowflake<\/a>,\u00a0<a href=\"https:\/\/cloud.google.com\/bigquery\/\" target=\"_blank\" rel=\"noreferrer noopener\">BigQuery<\/a><\/li>\n<li><strong>Data lakes<\/strong>: e.g.\u00a0<a href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noreferrer noopener\">Amazon S3<\/a><\/li>\n<li><strong>Data lake processing<\/strong>: e.g.\u00a0<a href=\"https:\/\/prestodb.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Presto<\/a>,\u00a0<a href=\"https:\/\/www.dremio.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Dremio<\/a>,\u00a0<a href=\"https:\/\/databricks.com\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">Databricks<\/a>,\u00a0<a href=\"https:\/\/www.starburst.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Starburst<\/a><\/li>\n<li><strong>Data transformation<\/strong>: e.g.\u00a0<a href=\"https:\/\/www.getdbt.com\/\" 
target=\"_blank\" rel=\"noreferrer noopener\">dbt<\/a>,\u00a0<a href=\"https:\/\/www.matillion.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Matillion<\/a><\/li>\n<li><strong>Metadata management:\u00a0<\/strong>e.g.\u00a0<a href=\"https:\/\/atlan.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Atlan<\/a><\/li>\n<li><strong>BI tools<\/strong>: e.g.\u00a0<a href=\"https:\/\/looker.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Looker<\/a><\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b6b543 elementor-widget elementor-widget-heading\" data-id=\"1b6b543\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><em>Learn more:<\/em><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3ad0f44 elementor-widget elementor-widget-text-editor\" data-id=\"3ad0f44\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><a href=\"https:\/\/a16z.com\/2020\/10\/15\/the-emerging-architectures-for-modern-data-infrastructure\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Emerging Architectures for Modern Data Infrastructure<\/em><\/a>: A great, in-depth read about which technologies are winning in the modern data stack, based on interviews with 20+ practitioners.<\/li>\n<li><a href=\"https:\/\/resources.fivetran.com\/mdsconference\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Modern Data Stack Conference 2020<\/em><\/a>: Resources from Fivetran\u2019s first modern data stack conference on the latest innovations, tools, and best practices.<\/li>\n<li><a href=\"https:\/\/moderndatastack.substack.com\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>The Modern Data 
Stack Newsletter<\/em><\/a>: A biweekly newsletter with blogs, guides, and podcasts on the modern data stack.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d0ae593 elementor-widget elementor-widget-heading\" data-id=\"d0ae593\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">3. Metadata 3.0: metadata management is reborn<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cd3dabc elementor-widget elementor-widget-text-editor\" data-id=\"cd3dabc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"878a\">As the modern data stack matures, companies have undertaken ambitious projects to upgrade their data infrastructure and sort out basic data needs (e.g. ingesting data, wrapping up cloud migration projects, and setting up new BI tools).\u00a0<strong>While these projects have unlocked a lot of potential, they\u2019ve also created chaos<\/strong>.<\/p>\n\n<p id=\"71b4\">Context questions like \u201cWhat does this column name actually mean?\u201d and \u201cWhy are the sales numbers on the dashboard wrong again?\u201d kill the agility of teams that are otherwise moving at breakneck speed.<\/p>\n\n<p id=\"77bb\">While these aren\u2019t new questions, we\u2019re on the cusp of new disruptive solutions. 
As modern data platforms are converging around five main players (AWS, Azure, Google Cloud Platform, Snowflake, and Databricks) and metadata itself is becoming big data, there\u2019s significant potential for bringing intelligence and automation to the metadata space.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-85611a2 elementor-widget elementor-widget-text-editor\" data-id=\"85611a2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote class=\"wp-block-quote\">\n<p>In the next 24 to 36 months, we\u2019ll see the rise of one or more modern metadata management platforms built for the modern data stack, which solve for data discovery, data cataloging, data lineage, and observability.<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-04704f6 elementor-widget elementor-widget-heading\" data-id=\"04704f6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><em>Learn more:<\/em><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49adaa0 elementor-widget elementor-widget-text-editor\" data-id=\"49adaa0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><a href=\"https:\/\/towardsdatascience.com\/data-catalog-3-0-modern-metadata-for-the-modern-data-stack-ec621f593dcf\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Data Catalog 3.0<\/em><\/a>: My article on the past and future of metadata solutions, and why we\u2019re about to make a huge leap forward in creating modern metadata for the modern 
data stack.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d25041e elementor-widget elementor-widget-heading\" data-id=\"d25041e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">4. New roles emerge: Analytics Engineer and Data Platform Leader<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-52147c0 elementor-widget elementor-widget-text-editor\" data-id=\"52147c0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c058\">2020 saw the rise of two roles that have become more mainstream than ever before.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1221a22 elementor-widget elementor-widget-heading\" data-id=\"1221a22\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">1. Data Platform Leader<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0c6b00a elementor-widget elementor-widget-text-editor\" data-id=\"0c6b00a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote class=\"wp-block-quote\">\n<p>Organizations are increasingly realizing that there needs to be a central team responsible for developing data platforms that help the rest of the organization do their work better. 
And naturally, this team needs a leader.<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98c0abb elementor-widget elementor-widget-text-editor\" data-id=\"98c0abb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3c93\">In the past, this was handled by more traditional roles like Data Warehousing Specialists or Data Architects. Now it\u2019s become common to have a data leader, who leads the data initiative across the organization. These people go by a range of titles, such as \u201cHead of Data Platform\u201d or \u201cDirector of Data Platforms\u201d.<\/p>\n\n<p id=\"07aa\">Data platform leaders typically oversee the modernization (or set-up from scratch, for startups) of a company\u2019s data stack. This includes setting up a cloud data lake and warehouse, implementing a data governance framework, choosing a BI tool, and more.<\/p>\n\n<p id=\"ba99\"><strong>This new role comes with an important new KPI<\/strong>:\u00a0<strong>end user adoption<\/strong>. This refers to the leader\u2019s ability to get people and teams within the organization to adopt data (and data platforms) in their daily workflows. This is a welcome change, as it aligns the incentives of those deciding what data products to invest in with those who ultimately use the products.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2399e4 elementor-widget elementor-widget-heading\" data-id=\"f2399e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">2. 
Analytics Engineer<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7a2d868 elementor-widget elementor-widget-text-editor\" data-id=\"7a2d868\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5871\">Every analyst I\u2019ve spoken to in the past decade has had one major frustration: depending on data engineers to productionize their work and set up data pipelines.<\/p>\n\n<p id=\"e729\">The rise of powerful SQL-based pipeline-building tools like\u00a0<a href=\"https:\/\/www.getdbt.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">dbt<\/a>\u00a0and\u00a0<a href=\"https:\/\/dataform.co\/\" target=\"_blank\" rel=\"noreferrer noopener\">Dataform<\/a>\u00a0has changed this for the better. These tools give analysts superpowers by putting the entire data transformation process in their hands.<\/p>\n\n<p id=\"d1cc\">The result is the rise of the term \u201cAnalytics Engineer\u201d, which describes former analysts who now own the entire data stack, from ingestion and transformation through to delivering usable data sets to the rest of the business.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a650bef elementor-widget elementor-widget-heading\" data-id=\"a650bef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><em>Learn more<\/em>:<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eea947c elementor-widget elementor-widget-text-editor\" data-id=\"eea947c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><a href=\"https:\/\/blog.getdbt.com\/what-is-an-analytics-engineer\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><em>What Is an Analytics Engineer?<\/em><\/a>\u00a0An article from\u00a0Claire Carroll\u00a0at dbt about the why and how behind the new analytics engineering role.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f9c6482 elementor-widget elementor-widget-heading\" data-id=\"f9c6482\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">5. Data quality frameworks are on the rise<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-86ee585 elementor-widget elementor-widget-text-editor\" data-id=\"86ee585\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"e421\">Data quality is a space that hasn\u2019t seen much innovation in the last two decades. 
However, it\u2019s recently made significant strides, and different aspects of data quality are being incorporated throughout the data stack.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-717d518 elementor-widget elementor-widget-heading\" data-id=\"717d518\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Data quality profiling<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-19c6470 elementor-widget elementor-widget-text-editor\" data-id=\"19c6470\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d80e\">Data profiling is the process of reviewing data to understand its content and structure, check its quality, and identify how it can be used in the future.<\/p>\n\n<p id=\"a429\">Profiling can happen several times through a data asset\u2019s lifecycle, ranging from shallow to in-depth assessments. 
It includes calculating the\u00a0<strong>missing values<\/strong>,\u00a0<strong>minimums and maximums<\/strong>,\u00a0<strong>median and mode<\/strong>,\u00a0<strong>frequency distribution<\/strong>, and other key statistical indicators that help users understand the underlying data quality.<\/p>\n\n<p id=\"5f05\">While data quality profiling was typically a stand-alone product in the data stack, companies are increasingly incorporating it as a capability in\u00a0<a href=\"https:\/\/atlan.com\/\" rel=\"noopener\">modern data catalogs<\/a>, enabling end users to understand and trust their data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5d876e7 elementor-widget elementor-widget-heading\" data-id=\"5d876e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Business-driven data quality rules<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a50357a elementor-widget elementor-widget-text-editor\" data-id=\"a50357a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d5c7\">Data quality isn\u2019t just about the statistical understanding of data. It\u2019s also about whether the data is trustworthy, based on business context.<\/p>\n\n<p id=\"49ef\">For example, your sales numbers typically shouldn\u2019t increase by more than 10% per week. 
A 100% spike in sales should alert the right team member and stop the data pipeline run, rather than making its way to a dashboard the CEO uses!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b77d7aa elementor-widget elementor-widget-text-editor\" data-id=\"b77d7aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote class=\"wp-block-quote\">\n<p>This need for intelligent alerts has led organizations to bring business teams into the process of writing data quality checks.<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-94b2a5c elementor-widget elementor-widget-text-editor\" data-id=\"94b2a5c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8eae\">There still isn\u2019t a great way for data teams to collaborate with business counterparts on data quality checks, but I expect this space will see a lot of innovation in the years to come. 
In the future, we\u2019ll see smarter solutions that auto-generate business-driven data quality rules based on trends in the data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fefbe45 elementor-widget elementor-widget-heading\" data-id=\"fefbe45\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Data quality tests in data pipelines<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c517d1f elementor-widget elementor-widget-text-editor\" data-id=\"c517d1f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"624b\">The third way data quality is entering the data stack is through tests written into the data pipeline itself. This borrows principles from \u201cunit tests\u201d in the software engineering world.<\/p>\n\n<p id=\"9796\">Software engineering has included unit testing frameworks for years. 
These automatically test each individual unit of code to make sure it\u2019s ready to use.\u00a0<mark><strong>Data quality tests within the pipeline mimic unit testing frameworks\u00a0<\/strong><\/mark><mark>to bring the same confidence and speed to data engineering.<\/mark><\/p>\n\n<p id=\"b296\">This helps teams catch data quality issues caused by upstream data changes before they affect the organization\u2019s workflows and reports.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6cc57b5 elementor-widget elementor-widget-heading\" data-id=\"6cc57b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><em>Learn more<\/em>:<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7051002 elementor-widget elementor-widget-text-editor\" data-id=\"7051002\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/test-data-quality-at-scale-with-deequ\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Amazon Deequ<\/em><\/a>: Built internally at Amazon, Deequ is a promising open-source framework for data quality profiling.<\/li>\n<li><a href=\"https:\/\/greatexpectations.io\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Great Expectations<\/em><\/a>: This is emerging as a popular open-source community for data quality testing within data pipelines.<\/li>\n<li><em>Netflix\u2019s Presentation on Scaling Data Quality<\/em>: This is an interesting read for any data leader getting started on their data quality journey.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5414c23 
elementor-widget elementor-widget-text-editor\" data-id=\"5414c23\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8ad4\">Do you agree or disagree with these trends? Spotted something that we\u2019ve missed? Drop a comment with your insights!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>What will 2021 bring to the data world? How will data infrastructure evolve to keep up with all the latest innovations and changes? This year, we\u2019ll see several new data trends.<\/p>\n","protected":false},"author":1039,"featured_media":18598,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[611,977,1304,1305],"ppma_author":[3884],"class_list":["post-22599","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-data-engineering","tag-data-management","tag-data-platforms","tag-data-trends"],"authors":[{"term_id":3884,"user_id":1039,"is_guest":0,"slug":"prukalpa-sankar","display_name":"Prukalpa Sankar","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Prukalpa-Sankar-1-150x150.jpeg","user_url":"https:\/\/atlan.com\/","last_name":"Sankar","first_name":"Prukalpa","job_title":"","description":"Prukalpa Sankar is Co-Founder at Atlan that is home for Data Teams. 
She was awarded Economic Times Emerging Entrepreneur for the Year, She became Forbes 30u30,  Fortune 40u40,  Top 10 CNBC Young Business Women 2016, and TED Speaker."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22599","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1039"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22599"}],"version-history":[{"count":8,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22599\/revisions"}],"predecessor-version":[{"id":32318,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22599\/revisions\/32318"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18598"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22599"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}