{"id":679,"date":"2018-05-11T03:27:12","date_gmt":"2018-05-11T03:27:12","guid":{"rendered":"http:\/\/kusuaks7\/?p=284"},"modified":"2021-05-17T18:17:56","modified_gmt":"2021-05-17T18:17:56","slug":"modern-data-architecture","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/modern-data-architecture\/","title":{"rendered":"Modern Data Architecture"},"content":{"rendered":"<p id=\"eb71\" name=\"eb71\" style=\"text-align: center;\"><canvas height=\"40\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*Op5b_NUAPVC4ZHkf.jpg\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*Op5b_NUAPVC4ZHkf.jpg\" style=\"width: 650px; height: 349px;\"><\/p>\n<figure id=\"f87b\" name=\"f87b\"><figcaption><em>Image Courtesy:&nbsp;<\/em><a data-href=\"http:\/\/starworldpacknmove.com\/Design-And-Architecture.html\" href=\"http:\/\/starworldpacknmove.com\/Design-And-Architecture.html\" rel=\"noopener nofollow noreferrer\" target=\"_blank\"><em>http:\/\/starworldpacknmove.com\/Design-And-Architecture.html<\/em><\/a><\/p>\n<\/figcaption><\/figure>\n<p id=\"2829\" name=\"2829\">Existing data architectures are at the breaking point with the large amount of data, velocity of data ingestion, and variety of data they need to process and store. 
Industry analysts predict that up to 80% of new data will be semi-structured or unstructured (video, pictures, audio, documents, emails, and so on), coming from clickstream, sentiment\/social media, machine sensors, server logs, RFID, and GPS (geographic).<\/p>\n<p id=\"ad7e\" name=\"ad7e\">Modern Data Architecture addresses the business demands for speed and agility by enabling organizations to quickly find and unify their data across hybrid data storage technologies. The Modern Data Architecture stores data as is; it does not require pre-modeling. It handles the volume, velocity, and variety of big data.<\/p>\n<p id=\"dc65\" name=\"dc65\">Before going deep into Modern Data Architecture, let\u2019s start from the basics. I will cover these aspects of data architecture in this post:<\/p>\n<p id=\"c757\" name=\"c757\"><em>a. What is Data Architecture<\/em><\/p>\n<p id=\"55be\" name=\"55be\"><em>b. History of Data Architecture<\/em><\/p>\n<p id=\"e934\" name=\"e934\"><em>c. Traditional Data Architecture<\/em><\/p>\n<p id=\"8595\" name=\"8595\"><em>d. Advanced Data Architecture<\/em><\/p>\n<p id=\"a6ee\" name=\"a6ee\"><em>e. Modern Data Architecture<\/em><\/p>\n<p id=\"a12d\" name=\"a12d\"><em>f. Six Principles of Modern Data Architecture<\/em><\/p>\n<h4 id=\"e20e\" name=\"e20e\"><strong>What is Data Architecture?<\/strong><\/h4>\n<p id=\"6dfe\" name=\"6dfe\">Data Architecture is a set of rules, policies, and models that determine what kind of data gets collected, and how it gets used, processed, and stored within a database system. Data integration, for example, depends on Data Architecture for instructions on the integration process. 
Without the shift from a programming paradigm to a Data Architecture paradigm, modern computers would be much clumsier and much slower.<\/p>\n<h4 id=\"dd89\" name=\"dd89\"><strong>History of Data Architecture<\/strong><\/h4>\n<p id=\"f47e\" name=\"f47e\">In the early days, as you can see in the diagram below, there were no distributed systems. There were fewer users and fewer transactions, so the whole data architecture could be handled on a single server.<\/p>\n<figure id=\"5142\" name=\"5142\"><canvas height=\"70\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*DUbYTmSeFQLnlBMm.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*DUbYTmSeFQLnlBMm.png\" style=\"width: 568px; height: 541px;\"><\/figure>\n<h4 id=\"4678\" name=\"4678\"><strong>Traditional Data Architecture<\/strong><\/h4>\n<p name=\"0e82\">Over time, as more users got workstations &amp; automated systems started generating data, Front End &amp; Back End systems were split, which in turn increased the load on the databases. To tackle that, databases were split along functional divisions. But this created data integrity problems for BI, as different databases reported different figures, and BI queries added further load on the databases. 
This led to the inception of data warehouses and separate data marts to cater to different reporting needs.<\/p>\n<figure id=\"47db\" name=\"47db\"><canvas height=\"52\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*ec94HNZR2y1kjzvs.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*ec94HNZR2y1kjzvs.png\" style=\"width: 650px; height: 464px;\"><\/figure>\n<h4 id=\"b958\" name=\"b958\"><strong>Advanced Data Architecture<\/strong><\/h4>\n<p id=\"4ca4\" name=\"4ca4\">To provide disaster recovery, a secondary site came into the picture, and different replication &amp; backup\/restore options were chosen for databases &amp; data warehouses based on their sizes, but latency on the BI\/Reporting side was still a problem. ELT (extract, load, transform) &amp; CDC (change data capture) options were considered, each with its own pros &amp; cons. ELT allows raw data to be loaded directly into the target and transformed there. CDC is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data.<\/p>\n<figure id=\"08e5\" name=\"08e5\"><canvas height=\"37\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*4GykKMfi4wZ4Es5l.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*4GykKMfi4wZ4Es5l.png\" style=\"width: 650px; height: 330px;\"><\/figure>\n<h4 id=\"40db\" name=\"40db\"><strong>Modern Data Architecture<\/strong><\/h4>\n<p id=\"ee37\" name=\"ee37\">Advanced data architecture still left the following problems unsolved:<\/p>\n<p id=\"96da\" name=\"96da\">1. Time to action takes up to 7 days<\/p>\n<p id=\"f411\" name=\"f411\">2. Amount of data is still growing<\/p>\n<p id=\"f79d\" name=\"f79d\">3. 
DWH MPP storage is expensive<\/p>\n<p id=\"d96a\" name=\"d96a\"><em>Data Lakes &amp; Lambda Architecture<\/em><\/p>\n<p id=\"abde\" name=\"abde\">The above problems can be solved using Data Lakes (the 2nd &amp; 3rd) &amp; Lambda Architecture (the 1st). A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.<\/p>\n<figure id=\"adad\" name=\"adad\"><canvas height=\"35\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*MWJ1ZUWO8KaCnzXo.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*MWJ1ZUWO8KaCnzXo.png\" style=\"width: 650px; height: 324px;\"><\/figure>\n<figure id=\"32e2\" name=\"32e2\"><canvas height=\"32\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*PyYmwTyq-jaqy_Sm.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*PyYmwTyq-jaqy_Sm.png\" style=\"width: 650px; height: 289px;\"><\/figure>\n<p id=\"c6a9\" name=\"c6a9\">Still, a few problems needed to be addressed:<\/p>\n<p id=\"8f0b\" name=\"8f0b\">1. Too many standby systems<\/p>\n<p id=\"2a96\" name=\"2a96\">2. How to replicate a Hadoop cluster?<\/p>\n<p id=\"cc88\" name=\"cc88\">3. How to sync data in real-time systems?<\/p>\n<p id=\"13bc\" name=\"13bc\">4. How to better sync the DWH?<\/p>\n<p id=\"34f6\" name=\"34f6\">All of the above problems can be solved using pipelining. A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. 
As you can see in the next diagram, replication queues take some time but solve the higher-latency problems.<\/p>\n<figure id=\"5708\" name=\"5708\"><canvas height=\"35\" width=\"75\"><\/canvas><img decoding=\"async\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*p_X7AbQitIRzJ0z_.png\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*p_X7AbQitIRzJ0z_.png\" style=\"width: 650px; height: 313px;\"><\/figure>\n<h4 id=\"1e77\" name=\"1e77\"><strong>Six Principles of Modern Data Architecture<\/strong><\/h4>\n<p id=\"dc87\" name=\"dc87\">1. Data is a Shared Asset<\/p>\n<p id=\"8f34\" name=\"8f34\">2. Provide the Right Interfaces for Consumption<\/p>\n<p id=\"9635\" name=\"9635\">3. Ensure Security and Access Controls<\/p>\n<p id=\"9485\" name=\"9485\">4. Ensure a Common Vocabulary<\/p>\n<p id=\"b88f\" name=\"b88f\">5. Information Through Data Stewardship<\/p>\n<p id=\"7f8d\" name=\"7f8d\">6. Eliminate Data Copies &amp; Movement<\/p>\n<p id=\"387a\" name=\"387a\"><strong>References<\/strong><\/p>\n<ol>\n<li id=\"4bc9\" name=\"4bc9\"><a data-href=\"http:\/\/www.dataversity.net\/brief-history-data-architecture-shifting-paradigms\/\" href=\"http:\/\/www.dataversity.net\/brief-history-data-architecture-shifting-paradigms\/\" rel=\"noopener nofollow noreferrer\" target=\"_blank\"><em>Brief History of Data Architecture<\/em><\/a><\/li>\n<li id=\"de54\" name=\"de54\"><a data-href=\"http:\/\/www.pearsonitcertification.com\/articles\/article.aspx?p=2427073\" href=\"http:\/\/www.pearsonitcertification.com\/articles\/article.aspx?p=2427073\" rel=\"noopener nofollow noreferrer\" target=\"_blank\"><em>Understanding the Big Data World<\/em><\/a><\/li>\n<li id=\"83a6\" name=\"83a6\"><a data-href=\"https:\/\/www.slideshare.net\/AGrishchenko\/modern-data-architecture-54850697\" href=\"https:\/\/www.slideshare.net\/AGrishchenko\/modern-data-architecture-54850697\" rel=\"noopener nofollow noreferrer\" target=\"_blank\"><em>Modern Data Architecture<\/em><\/a><\/li>\n<li 
id=\"be22\" name=\"be22\"><a data-href=\"https:\/\/vision.cloudera.com\/the-six-principles-of-modern-data-architecture\/\" href=\"https:\/\/vision.cloudera.com\/the-six-principles-of-modern-data-architecture\/\" rel=\"noopener nofollow noreferrer\" target=\"_blank\"><em>Six Principles of Modern Data Architecture<\/em><\/a><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Existing data architectures are at the breaking point with a large amount of data, velocity of data ingestion, and variety of data they need to process and store. Industry analysts are predicting that up to 80% of the new data will be semi-structured and unstructured. Modern Data Architecture addresses the business demands for speed and agility by enabling organizations to quickly find and unify their data across hybrid data storage technologies. The Modern Data Architecture stores data as is; it does not require pre-modeling. It handles the volume, velocity, and variety of big data.<\/p>\n","protected":false},"author":280,"featured_media":21960,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[1811],"class_list":["post-679","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":1811,"user_id":280,"is_guest":0,"slug":"ankit-rathi","display_name":"Ankit Rathi","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Rathi","first_name":"Ankit","job_title":"","description":"Ankit Rathi is Lead Architect at SITA, the leading &amp; innovative IT organization in ATI, delivering end-to-end analytics platforms using Data Science, Big Data &amp; Cloud. 
He is a Data Science Architect with extensive experience in designing &amp; developing data-intensive technology solutions including data architecture, data science, big data &amp; cloud."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/679","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/280"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=679"}],"version-history":[{"count":2,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/679\/revisions"}],"predecessor-version":[{"id":21962,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/679\/revisions\/21962"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/21960"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=679"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=679"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=679"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=679"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}