{"id":24778,"date":"2021-06-09T11:06:50","date_gmt":"2021-06-09T11:06:50","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=24778"},"modified":"2023-08-19T12:21:38","modified_gmt":"2023-08-19T12:21:38","slug":"introduction-to-data-libraries-for-small-data-science-teams-2","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/introduction-to-data-libraries-for-small-data-science-teams-2\/","title":{"rendered":"Introduction To Data Libraries For Small Data Science Teams"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"24778\" class=\"elementor elementor-24778\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-afd24b8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"afd24b8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-46a890a\" data-id=\"46a890a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-78bd1fb elementor-widget elementor-widget-text-editor\" data-id=\"78bd1fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>At smaller companies, <a href=\"https:\/\/www.clearrisk.com\/risk-management-blog\/challenges-of-data-analytics\" target=\"_blank\" rel=\"noreferrer noopener\">access to and control of data is one of the biggest challenges<\/a> faced by data analysts and data scientists. 
The same is true at larger companies when an analytics team is forced to navigate bureaucracy, cybersecurity, and overtaxed IT, rather than benefit from a team of <a href=\"http:\/\/www.experfy.com\/blog\/experfy-insights\/introduction-to-data-libraries-for-small-data-science-teams\/\" target=\"_blank\" rel=\"noreferrer noopener\">data engineers<\/a> dedicated to collecting good data and making it available.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-510a589 elementor-widget elementor-widget-text-editor\" data-id=\"510a589\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Creative, persistent analysts find ways to access at least some of this data. Through a combination of daily processes to save email attachments, run database queries, and copy and paste from internal web pages, one might build up a mighty collection of data sets on a personal computer, on a team shared drive, or even in a database.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8ff0bdb elementor-widget elementor-widget-text-editor\" data-id=\"8ff0bdb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>But this solution does not scale well, and it is <a href=\"https:\/\/medium.com\/management-matters\/who-holds-the-institutional-knowledge-in-todays-work-environment-70c4e5fdfa3\" target=\"_blank\" rel=\"noreferrer noopener\">rarely documented or understood well enough for someone else to take it over if a particular analyst moves on to a different role or company<\/a>. It is also a nightmare to maintain. 
One may spend a significant part of each day running these processes and troubleshooting failures, leaving little time to actually <em>use<\/em> the data!<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-22ae630 elementor-widget elementor-widget-text-editor\" data-id=\"22ae630\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>I lived this for years at several companies. We found ways to be effective, but data management took up far too much of our time and energy, and we often did not have the data we needed to answer a question. Learning from the ingenuity of others and from my own trial and error eventually led me to the theoretical framework I will present in this blog series: building a self-managed <em>data library<\/em>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b4e7810 elementor-widget elementor-widget-text-editor\" data-id=\"b4e7810\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A data library is <em>not<\/em> a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_warehouse\" target=\"_blank\" rel=\"noreferrer noopener\">data warehouse<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_lake\" target=\"_blank\" rel=\"noreferrer noopener\">data lake<\/a>, or any other formal BI architecture. It does not require any particular technology or skill set (coding is not required, but it greatly increases how quickly you can build and how much you can automate). 
So what is a data library and how can a small data analytics team use it to overcome the challenges I\u2019ve described?<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6f83c59 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6f83c59\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a32bcd7\" data-id=\"a32bcd7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a243cf4 elementor-widget elementor-widget-heading\" data-id=\"a243cf4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">What is a data library?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-270ba28 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"270ba28\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-40573e7\" data-id=\"40573e7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5649a97 elementor-widget elementor-widget-text-editor\" data-id=\"5649a97\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>* A set of principles for data management, not a technology stack.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2b211c elementor-widget elementor-widget-text-editor\" data-id=\"f2b211c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>* An informal, loosely but adequately connected data architecture consisting of data ponds, analytics datasets, and reporting datasets.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-81df9c6 elementor-widget elementor-widget-text-editor\" data-id=\"81df9c6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>* A balance of speed of development, agility, usability, and cost.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c26f9f elementor-widget elementor-widget-text-editor\" data-id=\"3c26f9f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>* A way to prioritize which data sources to include, weighing each source\u2019s potential business value, difficulty, and data privacy concerns.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8f871f9 elementor-widget elementor-widget-text-editor\" data-id=\"8f871f9\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The data library approach is useful for the most common types of data that businesses create and use, but not for everything. It will not accommodate unstructured data (unless that data is stored in a structured form, such as a database table). Even structured data should be added only if its business value exceeds the cost of setup, storage, and administration.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-276ec3e elementor-widget elementor-widget-text-editor\" data-id=\"276ec3e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In this series, I will write an article on each of these four points, explaining the theory and providing practical examples of how it can be implemented.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-32db72d elementor-widget elementor-widget-text-editor\" data-id=\"32db72d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Using this framework, I have front-loaded much of the data acquisition and data cleaning time that used to be part of every new project. 
Bringing new data sources into our data library is a continuous priority, and once the data is there, it can fuel new analyses, models, and reports with little need for data munging.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5ac2904 elementor-widget elementor-widget-text-editor\" data-id=\"5ac2904\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>On a daily basis, our team spends very little time collecting data. On most days, we review a few basic data health metrics, find no problems, and go about our business.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b5dd100 elementor-widget elementor-widget-text-editor\" data-id=\"b5dd100\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Our data library was built with a focus on data privacy. We know exactly what kind of personal information is stored and where, which helps us comply with privacy regulations such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/General_Data_Protection_Regulation\" target=\"_blank\" rel=\"noreferrer noopener\">GDPR<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/California_Consumer_Privacy_Act\" target=\"_blank\" rel=\"noreferrer noopener\">CCPA<\/a>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d417a66 elementor-widget elementor-widget-text-editor\" data-id=\"d417a66\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We regularly check in with the stakeholders of the data we collect. 
We have good, though not great, documentation on each process and data element.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0bbe822 elementor-widget elementor-widget-text-editor\" data-id=\"0bbe822\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Yes, this has taken time. But in less than a year, two people spending less than half their time on it have done most of the technical work, in close consultation with IT, other data analysts, and our internal customers.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4cebd61 elementor-widget elementor-widget-text-editor\" data-id=\"4cebd61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>You may be a small data analytics team (or you may even be a <a href=\"https:\/\/snuzz1.bandcamp.com\/album\/the-one-piece-band\" target=\"_blank\" rel=\"noreferrer noopener\">one-piece band<\/a>), but that doesn\u2019t mean you have to settle for inefficient and incomplete data management.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>At smaller companies, access to and control of data is one of the biggest challenges faced by data analysts and data scientists. 
The same is true at larger companies when an analytics team is forced to navigate bureaucracy, cybersecurity and over-taxed IT, rather than benefit from a team of data engineers dedicated to collecting and<\/p>\n","protected":false},"author":1135,"featured_media":23727,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[],"ppma_author":[3185],"class_list":["post-24778","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud"],"authors":[{"term_id":3185,"user_id":1135,"is_guest":0,"slug":"chris-umphlett","display_name":"Chris Umphlett","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Chris-Umphlett-150x150.jpg","user_url":"","last_name":"Umphlett","first_name":"Chris","job_title":"","description":"Chris Umphlett is the Manager of Data Analysis and Data Privacy at TechSmith, the makers of great software like Snagit and Camtasia. Before that he worked on analytics teams in the consumer packaged goods, life insurance, and utility industries. 
He lives in East Lansing, Michigan with his wife and young children."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/24778","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1135"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=24778"}],"version-history":[{"count":8,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/24778\/revisions"}],"predecessor-version":[{"id":30658,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/24778\/revisions\/30658"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/23727"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=24778"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=24778"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=24778"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=24778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}