{"id":10002,"date":"2020-10-02T09:48:37","date_gmt":"2020-10-02T09:48:37","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=10002"},"modified":"2023-10-25T11:47:08","modified_gmt":"2023-10-25T11:47:08","slug":"fundamentals-of-data-architecture-understand-architectural-diagrams","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/fundamentals-of-data-architecture-understand-architectural-diagrams\/","title":{"rendered":"Fundamentals of Data Architecture to Help Data Scientists Understand Architectural Diagrams Better"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"10002\" class=\"elementor elementor-10002\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-13f40267 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"13f40267\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7c41ed91\" data-id=\"7c41ed91\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7ade1413 elementor-widget elementor-widget-text-editor\" data-id=\"7ade1413\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"has-medium-font-size\"><em>Before pretending you understand the diagram your smart colleague shows to you.<\/em><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0345f93 elementor-widget elementor-widget-heading\" data-id=\"0345f93\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Introduction<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1609a4f elementor-widget elementor-widget-text-editor\" data-id=\"1609a4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"48bf\">Within a company using data to derive business value, although you may not be appreciated with your data science skills all the time, you always are when you manage the data infrastructure well. Everyone wants the data stored in an accessible location, cleaned up well, and updated regularly.<\/p>\n\n\n\n<p id=\"b383\">See\u00a0<a href=\"https:\/\/cloud.google.com\/scheduler\/docs\/tut-pub-sub\" target=\"_blank\" rel=\"noreferrer noopener\">this official instruction<\/a>\u00a0for further details, and here are screenshots from my set-up.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23b5633 elementor-widget elementor-widget-image\" data-id=\"23b5633\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"895\" height=\"303\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1.png\" class=\"attachment-large size-large wp-image-33741\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1.png 895w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-300x102.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-768x260.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-610x207.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-750x254.png 750w\" sizes=\"(max-width: 895px) 100vw, 895px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98c8687 elementor-widget elementor-widget-image\" data-id=\"98c8687\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"592\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1024x592.png\" class=\"attachment-large size-large wp-image-33742\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1024x592.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-300x174.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-768x444.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-610x353.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-750x434.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1140x660.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ.png 1184w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6bc6f6d elementor-widget elementor-widget-text-editor\" data-id=\"6bc6f6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"edaf\">The code to run has to be enclosed in a function named whatever you like (\u201cnytaxi_pubsub\u201d in my case.) The code content consists of two parts: part 1 to run a query on BigQuery to reduce the original BigQuery table to KPIs and save it as another data table in BigQuery, as well as make it a Pandas data frame, and part 2 to push the data frame to Sheets.<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b530d05 elementor-widget elementor-widget-text-editor\" data-id=\"b530d05\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"e8ca\">Here\u2019re the codes I actually used. Importantly, the authentication to BigQuery is automatic as long as it resides within the same GCP project as Cloud Function (see\u00a0<a href=\"https:\/\/cloud.google.com\/docs\/authentication\/production\" rel=\"noopener\">this page<\/a>\u00a0for explanation.) Yet, this is not the case about the Google Sheets, which needs at least a procedure to share the target sheet through Service Account. See\u00a0<a href=\"https:\/\/gspread.readthedocs.io\/en\/latest\/oauth2.html\" target=\"_blank\" rel=\"noreferrer noopener\">the description in gspread library<\/a>\u00a0for more details.https:\/\/towardsdatascience.com\/media\/080a1ff551fc1ac1f575063b31624087main.py (coded by author)https:\/\/towardsdatascience.com\/media\/afc6bd20ab3b518e641cb0e24baafd0frequirements.txt (coded by author)<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fe70e41 elementor-widget elementor-widget-text-editor\" data-id=\"fe70e41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div class='gist '><\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-765ecc5 elementor-widget elementor-widget-text-editor\" data-id=\"765ecc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"has-text-align-center has-small-font-size\">main.py (coded by author)<\/p>\n\n\n<div class='gist '><\/div>\n\n<p class=\"has-text-align-center has-small-font-size\">requirements.txt (coded by author)<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2b3bef elementor-widget elementor-widget-heading\" data-id=\"f2b3bef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Final data mart on Google Sheets<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fbbfd4f elementor-widget elementor-widget-text-editor\" data-id=\"fbbfd4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"152c\">Finally, I got the aggregated data in Google Sheets like this:<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ab1904a elementor-widget elementor-widget-image\" data-id=\"ab1904a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"546\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1024x546.png\" class=\"attachment-large size-large wp-image-33743\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1024x546.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-300x160.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-768x409.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-610x325.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-750x400.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1140x608.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA.png 1328w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c30cff1 elementor-widget elementor-widget-text-editor\" data-id=\"c30cff1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"5b10\">This sheet is automatically updated every morning, and as the data warehouse is receiving new data through ETL from the data lake, we can easily keep track of the NY taxi KPIs the first thing every morning.<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f82404a elementor-widget elementor-widget-heading\" data-id=\"f82404a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Ending Note<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f815158 elementor-widget elementor-widget-text-editor\" data-id=\"f815158\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"6636\">In a large company who hires data engineers and\/or data architects along with data scientists, a primary role of data scientists is not necessarily to prepare the <a href=\"https:\/\/www.experfy.com\/blog\/challenges-big-data-infrastructure\/\" target=\"_blank\" rel=\"noreferrer noopener\">data infrastructure<\/a> and put it in place, but knowing at least getting the gist of data architecture will benefit well to understand where we stand in the daily works.<\/p>\n\n\n\n<p id=\"3682\">Data Lake -&gt; Data Warehouse -&gt; Data Mart is a typical platform framework to process the data from the origin to the use case. Separating the process into three system components has many benefits for maintenance and purposefulness.<\/p>\n\n\n\n<p id=\"2586\">There are many options in the choice of tools. They are to be wisely selected against the data environment (size, type, and etc.) and the goal of the business.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aafbd95 elementor-widget elementor-widget-text-editor\" data-id=\"aafbd95\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"3b6e\">Finally in this post, I discussed a case study where we prepared a small size data mart on Google Sheets, pulling out data from BigQuery as a data warehouse. With the use of Cloud Scheduler and Pub\/Sub, the update was made to be automatic.<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-24c830b elementor-widget elementor-widget-heading\" data-id=\"24c830b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Reference<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9d2485d elementor-widget elementor-widget-text-editor\" data-id=\"9d2485d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<ul class=\"wp-block-list\">\n<li>\u201cData Lake vs Data Warehouse vs Data Mart\u201d,\u00a0<a href=\"https:\/\/www.holistics.io\/blog\/author\/jatin\/\" target=\"_blank\" rel=\"noreferrer noopener\">Jatin Raisinghani<\/a>, Holistic Blog (<a href=\"https:\/\/www.holistics.io\/blog\/data-lake-vs-data-warehouse-vs-data-mart\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.holistics.io\/blog\/data-lake-vs-data-warehouse-vs-data-mart\/<\/a>)<\/li>\n<li>A slide \u201cData Platform Guide\u201d (in Japanese), @yuzutas0 (twitter),\u00a0<a href=\"https:\/\/speakerdeck.com\/yuzutas0\/20200715\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/speakerdeck.com\/yuzutas0\/20200715<\/a><\/li>\n<li>\u201cConnected Sheets: Analyze Big Data In Google Sheets\u201d, BenCollins,\u00a0<a href=\"https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/<\/a><\/li>\n<\/ul>\n\n\n<p>&nbsp;<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\n\n\n<p id=\"67d9\">Backed up by these unobtrusive but steady demands, the salary of a data architect is equally high or even higher than that of a data scientist. In fact, based on the salary research conducted by PayScale (<a href=\"https:\/\/www.payscale.com\/research\/US\/Country=United_States\/Salary\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.payscale.com\/research\/US\/Country=United_States\/Salary<\/a>) shows the US average salary of Data Architect is\u00a0<em>$121,816<\/em>, while that of Data Scientist is\u00a0<em>$96,089<\/em>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2593d66 elementor-widget elementor-widget-text-editor\" data-id=\"2593d66\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"87d3\">Not to say all data scientists should change their job, there would be a lot of benefits for us to learn at least the fundamentals of data architecture. Actually, there is one simple (but meaningful) framework that will help you understand any kinds of real-world data architectures.<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1a0ea05 elementor-widget elementor-widget-heading\" data-id=\"1a0ea05\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Table of Contents<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9731c95 elementor-widget elementor-widget-text-editor\" data-id=\"9731c95\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<ul class=\"wp-block-list\">\n<li><strong>Three Components in Data Architecture: Data Lake -&gt; Data Warehouse -&gt; Data Mart<\/strong><\/li>\n<li><strong>Tools Used in Each Component<\/strong><\/li>\n<li><strong>Case Study \u2014 Building Scheduled &amp; Automatic Data Feed from BigQuery (Data Warehouse) to Google Sheets (Data Mart)<\/strong><\/li>\n<li><strong>Ending Note<\/strong><\/li>\n<\/ul>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1242d33 elementor-widget elementor-widget-heading\" data-id=\"1242d33\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Three Components in Data Architecture: Data Lake - Data Warehouse - Data Mart <\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9a8b6a3 elementor-widget elementor-widget-text-editor\" data-id=\"9a8b6a3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"9f12\">\u201cData Lake\u201d, \u201cData Warehouse\u201d, and \u201cData Mart\u201d are typical components in the architecture of data platform. In this order, data produced in the business is processed and set to create another data implication.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d5be8eb elementor-widget elementor-widget-image\" data-id=\"d5be8eb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"498\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-1024x498.png\" class=\"attachment-large size-large wp-image-33728\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-1024x498.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-300x146.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-768x374.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-610x297.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-750x365.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw-1140x555.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1__5T13K9iecBAtJ73m0ImVw.png 1400w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-67c05d3 elementor-widget elementor-widget-text-editor\" data-id=\"67c05d3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"4490\">Three components take responsibility for three different functionalities as such:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data Lake<\/strong>: holds an original copy of data produced in the business. Data processing from the original should be minimal if any; otherwise in case some data processing turned out to be wrong in the end, it will not be possible to fix the error retrospectively.<\/li>\n<li><mark><strong>Data Warehouse<\/strong><\/mark><mark>: holds data processed and structured by a managed data model, reflecting the global (not specific) direction of the final use of the data. In many cases, the data is in tabular format.<\/mark><\/li>\n<li><strong>Data Mart<\/strong>: holds a subpart and\/or aggregated data set for the use of a particular business function, e.g. specific business unit or specific geographical area. A typical example is when we prepare the summary of KPIs for a specific business line followed by visualization in BI tool. Especially, preparing this kind of separate and independent component after the warehouse is worthwhile when the user wants the data mart regularly and frequently updated. On contrary, this portion can be skipped in cases the user only wants some set of data for ad hoc analysis done only once.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-48cdba2 elementor-widget elementor-widget-image\" data-id=\"48cdba2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"830\" height=\"311\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw.png\" class=\"attachment-large size-large wp-image-33729\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw.png 830w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw-300x112.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw-768x288.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw-610x229.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_XL3CmWH4kAS7syBD7qdEzw-750x281.png 750w\" sizes=\"(max-width: 830px) 100vw, 830px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aca9628 elementor-widget elementor-widget-text-editor\" data-id=\"aca9628\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"a47d\">For more real-world examples beyond this bare-bone-only description, enjoy googling \u201cdata architecture\u201d to find a lot of data architecture diagrams.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49b2ee4 elementor-widget elementor-widget-image\" data-id=\"49b2ee4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"607\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-1024x607.png\" class=\"attachment-large size-large wp-image-33730\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-1024x607.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-300x178.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-768x455.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-610x361.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-750x444.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ-1140x675.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_JkiB-5TLiA-LvjUYpC7xsQ.png 1261w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-649df8a elementor-widget elementor-widget-heading\" data-id=\"649df8a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Why do we need to split into these three components?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9139682 elementor-widget elementor-widget-text-editor\" data-id=\"9139682\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"887d\">Because different stages within the process have different requirements.<\/p>\n\n\n\n<p id=\"7aa0\">In the data lake stage, we want the data is close to the original, while the data warehouse is meant to keep the data sets more structured, manageable with a clear maintenance plan, and having clear ownership. In the data warehouse, we also like the database type to be analytic-oriented rather than transaction-oriented. On the other hand, data mart should have easy access to non-tech people who are likely to use the final outputs of data journeys.<\/p>\n\n\n\n<p id=\"6bd5\">Differently-purposed system components tend to have re-design at separate times. Then, configuring the components loosely-connected has the advantage in future maintenance and scale-up.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7306ea elementor-widget elementor-widget-heading\" data-id=\"a7306ea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How do data engineers and data scientists work to these three components?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aa933fe elementor-widget elementor-widget-text-editor\" data-id=\"aa933fe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"b293\">Roughly speaking, data engineers cover from data extraction produced in business to the data lake and data model building in data warehouse as well as establishing ETL pipeline; while data scientists cover from data extraction out of data warehouse, building data mart, and to lead to further business application and value creation.<\/p>\n\n\n\n<p id=\"7ff1\">Of course, this role assignment between data engineers and data scientists is somewhat ideal and many companies do not hire both just to fit this definition. Actually, their job descriptions tend to overlap.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-358801a elementor-widget elementor-widget-heading\" data-id=\"358801a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">New trend beyond the three-component approach<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e565d1e elementor-widget elementor-widget-text-editor\" data-id=\"e565d1e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"9abb\">Last but not the least, it should be worth noting that this three-component approach is conventional one present for longer than two decades, and new technology arrives all the time. For example,\u00a0<strong>\u201c<\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_virtualization#:~:text=Data%20virtualization%20is%20an%20approach,of%20any%20other%20entity)%20of\" rel=\"noopener\"><strong>Data <\/strong><\/a><strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_virtualization#:~:text=Data%20virtualization%20is%20an%20approach,of%20any%20other%20entity)%20of\" target=\"_blank\" rel=\"noreferrer noopener\">Virtualization<\/a>\u201d<\/strong>\u00a0is an idea to allow one-stop data management and manipulation interface against data sources, regardless of their formats and physical locations.<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b455aaf elementor-widget elementor-widget-heading\" data-id=\"b455aaf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Tools Used in Each Component<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dd70f71 elementor-widget elementor-widget-text-editor\" data-id=\"dd70f71\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"02b0\">Now, we understood the concept of three data platform components. Then, what tools do people use? Based on\u00a0<a href=\"https:\/\/speakerdeck.com\/yuzutas0\/20200715\" target=\"_blank\" rel=\"noreferrer noopener\">this \u201cData Platform Guide\u201d<\/a>\u00a0(in Japanese) , here\u2019re some ideas:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4cacb64 elementor-widget elementor-widget-heading\" data-id=\"4cacb64\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Data lake\/warehouse<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2dd1cf7 elementor-widget elementor-widget-text-editor\" data-id=\"2dd1cf7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are the following options for data lake and data warehouse.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-675f480 elementor-widget elementor-widget-image\" data-id=\"675f480\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"587\" height=\"185\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA.png\" class=\"attachment-large size-large wp-image-33731\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA.png 587w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA-300x95.png 300w\" sizes=\"(max-width: 587px) 100vw, 587px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e9ad06 elementor-widget elementor-widget-heading\" data-id=\"7e9ad06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">ETL tools<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0a299a8 elementor-widget elementor-widget-text-editor\" data-id=\"0a299a8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"c272\">ETL happens where data comes to the data lake and to be processed to fit the data warehouse. Data arrives in real-time, and thus ETL prefers event-driven messaging tools.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-94bc78f elementor-widget elementor-widget-image\" data-id=\"94bc78f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"587\" height=\"185\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA-1.png\" class=\"attachment-large size-large wp-image-33732\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA-1.png 587w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_6rtVhY8hh1FnBQ7-buNhDA-1-300x95.png 300w\" sizes=\"(max-width: 587px) 100vw, 587px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4505a94 elementor-widget elementor-widget-heading\" data-id=\"4505a94\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Workflow engine<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-77f39d5 elementor-widget elementor-widget-text-editor\" data-id=\"77f39d5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"bafc\">A workflow engine is used to manage the overall pipelining of the data, for example, visualization of where the process is in progress by a flow chart, triggering automatic retry in case of error, etc.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a9d59a elementor-widget elementor-widget-image\" data-id=\"4a9d59a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"318\" height=\"65\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_7C8UcLpcMj5ompw8ZLW9lg.png\" class=\"attachment-large size-large wp-image-33733\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_7C8UcLpcMj5ompw8ZLW9lg.png 318w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_7C8UcLpcMj5ompw8ZLW9lg-300x61.png 300w\" sizes=\"(max-width: 318px) 100vw, 318px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f727e43 elementor-widget elementor-widget-heading\" data-id=\"f727e43\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Data mart\/BI tools<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c32ab30 elementor-widget elementor-widget-text-editor\" data-id=\"c32ab30\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"023b\">The following tools can be used as data mart and\/or BI solutions. The choice will be dependent on the business context, what tools your company is familiar with (e.g. are you Tableau person or Power BI person?), the size of aggregated data (e.g. if the data size is small, why doesn\u2019t the basic solution like Excel or Google Sheets meet the goal?), what data warehouse solution do you use (e.g. if your data warehouse is on BigQuery, Google DataStudio can be an easy solution because it has natural linkage within the Google circle), and etc.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b0a3946 elementor-widget elementor-widget-image\" data-id=\"b0a3946\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"518\" height=\"124\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Z7HjZvGmhisQHR2uKe-yNA.png\" class=\"attachment-large size-large wp-image-33734\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Z7HjZvGmhisQHR2uKe-yNA.png 518w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Z7HjZvGmhisQHR2uKe-yNA-300x72.png 300w\" sizes=\"(max-width: 518px) 100vw, 518px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1036386 elementor-widget elementor-widget-text-editor\" data-id=\"1036386\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-367a042 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"367a042\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-08d030d\" data-id=\"08d030d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9dd2fed elementor-widget elementor-widget-heading\" data-id=\"9dd2fed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Case Study <\/strong>\u2014 Building Scheduled &amp; Automatic Data Feed from BigQuery (Data Warehouse) to Google Sheets (Data Mart)<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aa60758 elementor-widget elementor-widget-text-editor\" data-id=\"aa60758\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"0dd4\">When the data size stays around or less than tens of megabytes and there is no dependency on other large data set, it is fine to stick to spreadsheet-based tools to store, process, and visualize the data because it is less-costly and everyone can use it.<\/p>\n\n\n\n<p id=\"697f\">Once the data gets larger and starts having data dependency with other data tables, it is beneficial to start from cloud storage as a one-stop data warehouse. (When the data gets even larger to dozens of terabytes, it can make sense to use on-premise solutions for cost-efficiency and manageability.)<\/p>\n\n\n\n<p id=\"be55\">In this chapter, I will demonstrate a case when the\u00a0<strong>data is stored in Google BigQuery as a data warehouse.\u00a0<\/strong>BigQuery data is processed and stored in real-time or in a short frequency. The end-user still wants to see daily KPIs on a spreadsheet on a highly aggregated basis. This means\u00a0<strong>data mart can be small and fits even the spreadsheet solution<\/strong>. Instead of Excel, let\u2019s use\u00a0<strong>Google Sheets<\/strong>\u00a0here because it can be in the same environment as the data source in BigQuery. Oh, by the way, do not think about running the query manually every day.\u00a0<strong>Try to find a solution to make everything running automatically without any action from your side.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-78adfa4 elementor-widget elementor-widget-image\" data-id=\"78adfa4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"177\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-1024x177.png\" class=\"attachment-large size-large wp-image-33735\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-1024x177.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-300x52.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-768x133.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-610x106.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-750x130.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA-1140x197.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_wCAmUWdRpDYIXbz7yAbXKA.png 1426w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c5a6063 elementor-widget elementor-widget-heading\" data-id=\"c5a6063\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Data to be used in this case study<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-158e585 elementor-widget elementor-widget-text-editor\" data-id=\"158e585\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"7297\">In this case study, I am going to use a sample table data which has records of NY taxi passengers per ride, including the following data fields:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Car ID<\/li>\n<li>Driver ID<\/li>\n<li>Date of ride<\/li>\n<li>Number of passengers<\/li>\n<li>Amount of fare<\/li>\n<li>etc.<\/li>\n<\/ul>\n\n\n\n<p id=\"6f30\">The sample data is stored in the BigQuery as a data warehouse.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3a7977d elementor-widget elementor-widget-heading\" data-id=\"3a7977d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Can Google Sheets pull data from BigQuery tables?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-93c54d7 elementor-widget elementor-widget-text-editor\" data-id=\"93c54d7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"870c\">Technically yes, but at the moment this is only available through\u00a0<strong>Connected Sheets and you need an account of G Suite Enterprise, Enterprise for Education, or G Suite Enterprise Essentials account<\/strong>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-32307b8 elementor-widget elementor-widget-image\" data-id=\"32307b8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"688\" height=\"269\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Jdu2Zp0MEw-3zX_QosSjOw.png\" class=\"attachment-large size-large wp-image-33736\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Jdu2Zp0MEw-3zX_QosSjOw.png 688w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Jdu2Zp0MEw-3zX_QosSjOw-300x117.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_Jdu2Zp0MEw-3zX_QosSjOw-610x239.png 610w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c8e51b0 elementor-widget elementor-widget-text-editor\" data-id=\"c8e51b0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"2975\">Connected Sheets allows the user to manipulate BigQuery table data almost as if they play it on spreadsheet. See the GIF demonstration\u00a0<a href=\"https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/\" target=\"_blank\" rel=\"noreferrer noopener\">in this page on \u201cBenCollins\u201d blog post.<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e867cc1 elementor-widget elementor-widget-image\" data-id=\"e867cc1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"677\" height=\"285\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_gPAzyoy2X-QJxZ0vfuvp8Q.png\" class=\"attachment-large size-large wp-image-33737\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_gPAzyoy2X-QJxZ0vfuvp8Q.png 677w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_gPAzyoy2X-QJxZ0vfuvp8Q-300x126.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_gPAzyoy2X-QJxZ0vfuvp8Q-610x257.png 610w\" sizes=\"(max-width: 677px) 100vw, 677px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9391c0b elementor-widget elementor-widget-text-editor\" data-id=\"9391c0b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"6ec2\"><strong>Connected Sheets also allows automatic scheduling and refresh of the sheets<\/strong>, which is a natural demand as a data mart.<\/p>\n\n\n\n<p id=\"4ac7\">Although it demonstrates itself as a great option, one possible issue is that owing G Suite account is not very common.<\/p>\n\n\n\n<p id=\"4f6c\">For more details about the setups, see\u00a0<a href=\"https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/\" target=\"_blank\" rel=\"noreferrer noopener\">this blog post from \u201cBenCollins\u201d<\/a>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f58a1c elementor-widget elementor-widget-heading\" data-id=\"0f58a1c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">What can we do to push data from BigQuery to Google Sheets?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c47e3cb elementor-widget elementor-widget-text-editor\" data-id=\"c47e3cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"d69e\">To extract data from BigQuery and push it to Google Sheets, BigQuery alone is not enough, and we need a help of server functionality to call the API to post a query to BigQuery, receive the data, and pass it to Google Sheets.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-22ab29d elementor-widget elementor-widget-image\" data-id=\"22ab29d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"672\" height=\"271\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_BB1YX6NeaAkgbE5PCZpb6g.png\" class=\"attachment-large size-large wp-image-33738\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_BB1YX6NeaAkgbE5PCZpb6g.png 672w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_BB1YX6NeaAkgbE5PCZpb6g-300x121.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_BB1YX6NeaAkgbE5PCZpb6g-610x246.png 610w\" sizes=\"(max-width: 672px) 100vw, 672px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-60ed373 elementor-widget elementor-widget-text-editor\" data-id=\"60ed373\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"7500\">The server functionality can be on a server machine, external or internal of GCP (e.g. \u2018Compute Engine\u2019 instance on GCP; or \u2018EC2\u2019 instance on AWS). The code run can be scheduled using\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cron\" target=\"_blank\" rel=\"noreferrer noopener\">unix-cron job<\/a>. But one downside here is that it takes maintenance work and cost on the instance and is too much for a small program to run.<\/p>\n\n\n\n<p id=\"0a16\">\u2018Google Cloud Functions\u2019 is a so-called \u201cserverless\u201d solution to run code without the launch of a server machine. Putting code in Cloud Functions and setting a trigger event (e.g. scheduled timing in this case study, but also can be HTML request from some internet users), GCP automatically manages the run of the code.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b0f3283 elementor-widget elementor-widget-heading\" data-id=\"b0f3283\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Set-ups in my case study<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3249fbc elementor-widget elementor-widget-text-editor\" data-id=\"3249fbc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"cfcc\">There are two steps in the configuration of my case study using NY taxi data.<\/p>\n\n\n\n<p id=\"fb11\"><em><strong>Step 1: <\/strong>Set up scheduling \u2014 set Cloud Scheduler and Pub\/Sub to trigger a Cloud Function.<\/em><\/p>\n\n\n\n<p id=\"4d26\">Here, \u201cPub\/Sub\u201d is a messaging service to be subscribed by Cloud Functions and to trigger its run every day at a certain time. \u201cCloud Scheduler\u201d is functionality to kick off something with user-defined frequency based on\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cron\" target=\"_blank\" rel=\"noreferrer noopener\">unix-cron format<\/a>. Combining these two, we can create regular messages to be subscribed by Cloud Function. See\u00a0<a href=\"https:\/\/cloud.google.com\/scheduler\/docs\/tut-pub-sub\" target=\"_blank\" rel=\"noreferrer noopener\">this official instruction on how to do<\/a>\u00a0it. Here are screenshots from my GCP set-up.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f68ef27 elementor-widget elementor-widget-image\" data-id=\"f68ef27\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"987\" height=\"171\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A.png\" class=\"attachment-large size-large wp-image-33739\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A.png 987w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A-300x52.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A-768x133.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A-610x106.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_0GZnSSfGaUDBKZBOuTQG7A-750x130.png 750w\" sizes=\"(max-width: 987px) 100vw, 987px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2818e16 elementor-widget elementor-widget-image\" data-id=\"2818e16\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"895\" height=\"303\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw.png\" class=\"attachment-large size-large wp-image-33740\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw.png 895w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-300x102.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-768x260.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-610x207.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-750x254.png 750w\" sizes=\"(max-width: 895px) 100vw, 895px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b18ec1b elementor-widget elementor-widget-text-editor\" data-id=\"b18ec1b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"a008\"><em><strong>Step 2: <\/strong>Set up code \u2014 prepare code on Cloud Functions to query BigQuery table and push it to Google Sheets.<\/em><\/p>\n\n\n\n<p id=\"943a\">The next step is to set up Cloud Functions. In Cloud Functions, you define 1) what is the trigger (in this case study, \u201ccron-topic\u201d sent from Pub\/Sub, linked to Cloud Scheduler which pulls the trigger every 6 am in the morning) and 2) the code you want to run when the trigger is detected.<\/p>\n\n\n<!-- wp:paragraph -->\n<p id=\"b383\">See\u00a0<a href=\"https:\/\/cloud.google.com\/scheduler\/docs\/tut-pub-sub\" target=\"_blank\" rel=\"noreferrer noopener\">this official instruction<\/a>\u00a0for further details, and here are screenshots from my set-up.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23b5633 elementor-widget elementor-widget-image\" data-id=\"23b5633\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"895\" height=\"303\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1.png\" class=\"attachment-large size-large wp-image-33741\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1.png 895w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-300x102.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-768x260.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-610x207.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_uWinPDmnDopXEHAqEupGYw-1-750x254.png 750w\" sizes=\"(max-width: 895px) 100vw, 895px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98c8687 elementor-widget elementor-widget-image\" data-id=\"98c8687\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"592\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1024x592.png\" class=\"attachment-large size-large wp-image-33742\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1024x592.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-300x174.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-768x444.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-610x353.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-750x434.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ-1140x660.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_ta5DNd5FAJTwiPSkAklRTQ.png 1184w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6bc6f6d elementor-widget elementor-widget-text-editor\" data-id=\"6bc6f6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"edaf\">The code to run has to be enclosed in a function named whatever you like (\u201cnytaxi_pubsub\u201d in my case.) The code content consists of two parts: part 1 to run a query on BigQuery to reduce the original BigQuery table to KPIs and save it as another data table in BigQuery, as well as make it a Pandas data frame, and part 2 to push the data frame to Sheets.<\/p>\n<!-- \/wp:paragraph -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b530d05 elementor-widget elementor-widget-text-editor\" data-id=\"b530d05\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"e8ca\">Here\u2019re the codes I actually used. Importantly, the authentication to BigQuery is automatic as long as it resides within the same GCP project as Cloud Function (see\u00a0<a href=\"https:\/\/cloud.google.com\/docs\/authentication\/production\" rel=\"noopener\">this page<\/a>\u00a0for explanation.) Yet, this is not the case about the Google Sheets, which needs at least a procedure to share the target sheet through Service Account. See\u00a0<a href=\"https:\/\/gspread.readthedocs.io\/en\/latest\/oauth2.html\" target=\"_blank\" rel=\"noreferrer noopener\">the description in gspread library<\/a>\u00a0for more details.https:\/\/towardsdatascience.com\/media\/080a1ff551fc1ac1f575063b31624087main.py (coded by author)https:\/\/towardsdatascience.com\/media\/afc6bd20ab3b518e641cb0e24baafd0frequirements.txt (coded by author)<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fe70e41 elementor-widget elementor-widget-text-editor\" data-id=\"fe70e41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:frisfruitig\/block-gist-embed {\"url\":\"https:\/\/gist.github.com\/daydreamersjp\/c6c505b0fee51ea9e39f0380d1116ed9#file-cloud_functions_nytaxi-py\",\"jsonFiles\":\"[{u0022nameu0022:u0022cloud_functions_nytaxi.pyu0022,u0022checkedu0022:1}]\",\"filesFetched\":1,\"userHasInteracted\":1} \/-->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-765ecc5 elementor-widget elementor-widget-text-editor\" data-id=\"765ecc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph {\"align\":\"center\",\"fontSize\":\"small\"} -->\n<p class=\"has-text-align-center has-small-font-size\">main.py (coded by author)<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:frisfruitig\/block-gist-embed {\"url\":\"https:\/\/gist.github.com\/daydreamersjp\/dfa8c24230cdbbd42601096503537f83#file-cloud_functions_nytaxi_requirements-txt\",\"jsonFiles\":\"[{u0022nameu0022:u0022cloud_functions_nytaxi_requirements.txtu0022,u0022checkedu0022:1}]\",\"filesFetched\":1,\"userHasInteracted\":1} \/-->\n<!-- wp:paragraph {\"align\":\"center\",\"fontSize\":\"small\"} -->\n<p class=\"has-text-align-center has-small-font-size\">requirements.txt (coded by author)<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2b3bef elementor-widget elementor-widget-heading\" data-id=\"f2b3bef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Final data mart on Google Sheets<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fbbfd4f elementor-widget elementor-widget-text-editor\" data-id=\"fbbfd4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"152c\">Finally, I got the aggregated data in Google Sheets like this:<\/p>\n<!-- \/wp:paragraph -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ab1904a elementor-widget elementor-widget-image\" data-id=\"ab1904a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"546\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1024x546.png\" class=\"attachment-large size-large wp-image-33743\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1024x546.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-300x160.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-768x409.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-610x325.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-750x400.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA-1140x608.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/1_QcrBBIohqjF6GXNLvfYDXA.png 1328w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c30cff1 elementor-widget elementor-widget-text-editor\" data-id=\"c30cff1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"5b10\">This sheet is automatically updated every morning, and as the data warehouse is receiving new data through ETL from the data lake, we can easily keep track of the NY taxi KPIs the first thing every morning.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f82404a elementor-widget elementor-widget-heading\" data-id=\"f82404a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Ending Note<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f815158 elementor-widget elementor-widget-text-editor\" data-id=\"f815158\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"6636\">In a large company who hires data engineers and\/or data architects along with data scientists, a primary role of data scientists is not necessarily to prepare the <a href=\"https:\/\/www.experfy.com\/blog\/challenges-big-data-infrastructure\/\" target=\"_blank\" rel=\"noreferrer noopener\">data infrastructure<\/a> and put it in place, but knowing at least getting the gist of data architecture will benefit well to understand where we stand in the daily works.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"3682\">Data Lake -&gt; Data Warehouse -&gt; Data Mart is a typical platform framework to process the data from the origin to the use case. Separating the process into three system components has many benefits for maintenance and purposefulness.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p id=\"2586\">There are many options in the choice of tools. They are to be wisely selected against the data environment (size, type, and etc.) and the goal of the business.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aafbd95 elementor-widget elementor-widget-text-editor\" data-id=\"aafbd95\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"3b6e\">Finally in this post, I discussed a case study where we prepared a small size data mart on Google Sheets, pulling out data from BigQuery as a data warehouse. With the use of Cloud Scheduler and Pub\/Sub, the update was made to be automatic.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-24c830b elementor-widget elementor-widget-heading\" data-id=\"24c830b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Reference<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9d2485d elementor-widget elementor-widget-text-editor\" data-id=\"9d2485d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:list -->\n<ul>\n<li>\u201cData Lake vs Data Warehouse vs Data Mart\u201d,\u00a0<a href=\"https:\/\/www.holistics.io\/blog\/author\/jatin\/\" target=\"_blank\" rel=\"noreferrer noopener\">Jatin Raisinghani<\/a>, Holistic Blog (<a href=\"https:\/\/www.holistics.io\/blog\/data-lake-vs-data-warehouse-vs-data-mart\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.holistics.io\/blog\/data-lake-vs-data-warehouse-vs-data-mart\/<\/a>)<\/li>\n<li>A slide \u201cData Platform Guide\u201d (in Japanese), @yuzutas0 (twitter),\u00a0<a href=\"https:\/\/speakerdeck.com\/yuzutas0\/20200715\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/speakerdeck.com\/yuzutas0\/20200715<\/a><\/li>\n<li>\u201cConnected Sheets: Analyze Big Data In Google Sheets\u201d, BenCollins,\u00a0<a href=\"https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.benlcollins.com\/spreadsheets\/connected-sheets\/<\/a><\/li>\n<\/ul>\n<!-- \/wp:list -->\n<!-- wp:paragraph -->\n<p>&nbsp;<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>A primary role of data scientists is not necessarily to prepare the data infrastructure and put it in place, but knowing at least getting the gist of data architecture will benefit well to understand the daily works.<\/p>\n","protected":false},"author":931,"featured_media":10003,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[687,689,688],"ppma_author":[3711],"class_list":["post-10002","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-architecture","tag-data-infrastructure","tag-data-scientists"],"authors":[{"term_id":3711,"user_id":931,"is_guest":0,"slug":"motoharu-dei","display_name":"Motoharu DEI","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Motoharu-DEI.jpg","user_url":"http:\/\/www.hilti.group","last_name":"DEI","first_name":"Motoharu","job_title":"","description":"Motoharu DEI is a Data Scientist and Actuary at Hilti Group, a global leader in providing technology-leading products, systems and services."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10002","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/931"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=10002"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10002\/revisions"}],"predecessor-version":[{"id":33746,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10002\/revisions\/33746"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/10003"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=10002"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=10002"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=10002"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=10002"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}