{"id":22981,"date":"2021-04-26T16:56:35","date_gmt":"2021-04-26T16:56:35","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/a-tech-agnostic-principled-approach-to-grassroots-data-management\/"},"modified":"2023-09-12T12:20:03","modified_gmt":"2023-09-12T12:20:03","slug":"a-tech-agnostic-principled-approach-to-grassroots-data-management","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/a-tech-agnostic-principled-approach-to-grassroots-data-management\/","title":{"rendered":"A Tech-Agnostic, Principled-Approach To Grassroots Data Management"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22981\" class=\"elementor elementor-22981\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6fc2967 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"19176\" data-id=\"6fc2967\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6b5658b\" data-eae-slider=\"31367\" data-id=\"6b5658b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e7c90e1 elementor-widget elementor-widget-text-editor\" data-id=\"e7c90e1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In the introduction to this series, I explained what a data library is and how it can help a small data analytics team that lacks formal business intelligence support create a solid foundation for data management. 
This article will explain the universal principles that should guide the development of a data library.<\/p>\n<p><strong>Let\u2019s Look At The Principles That Will Guide Us In The Development Of A Data Library:<\/strong><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4840b74 elementor-widget elementor-widget-heading\" data-id=\"4840b74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Forward-Looking\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3123c8e elementor-widget elementor-widget-text-editor\" data-id=\"3123c8e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAll new analyses, processes, analytics models, and reports should derive from the library whenever the required data has been cataloged. In some cases, the data can be cataloged as part of the project that creates the new report. Re-creating old reports that still work well enough for the business&#8217;s purposes should not be a high priority. Nor should a team spend disproportionate time or effort trying to recover historical data that is not readily available. 
Onboard data sources quickly and let the historical record build up from that point forward.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cc5181c elementor-widget elementor-widget-heading\" data-id=\"cc5181c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Automation<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1568b15 elementor-widget elementor-widget-text-editor\" data-id=\"1568b15\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Data should be collected and stored in the data library automatically, without daily manual intervention. If that is not possible, the collection frequency should be reduced (for example, monthly instead of daily) and a plan for automation created. 
If the business insists that the data is important enough to need a daily refresh, that importance should justify investing in the technology, IT resources, or whatever else would enable automation.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dd0ee61 elementor-widget elementor-widget-heading\" data-id=\"dd0ee61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Basic Software Development Principles\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2c84085 elementor-widget elementor-widget-text-editor\" data-id=\"2c84085\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tProcesses should support the use of a test environment (which might take many forms) whenever development could put production data or reports at risk. Version control should be used to further mitigate mistakes. Commonly used code and data should be stored in a single source rather than hard-coded or saved in many places. 
There are many other practices you might consider, but these are must-haves.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e339ee9 elementor-widget elementor-widget-heading\" data-id=\"e339ee9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Location\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c89a634 elementor-widget elementor-widget-text-editor\" data-id=\"c89a634\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBuilding the data library in a database is preferable but not required; cloud storage or a local shared drive can also be used. Regardless, the location must support targeted, narrow permissions, so that, for instance, someone can be given access to only one data source. 
The location must also be practically accessible to everyone who needs it, and the analytics and BI tools that use the data must be able to connect to it directly.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3854a98 elementor-widget elementor-widget-heading\" data-id=\"3854a98\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Data Architecture\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a722aeb elementor-widget elementor-widget-text-editor\" data-id=\"a722aeb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Some ways of structuring data are simply wrong, such as leaving blank rows between records in a spreadsheet. Beyond those, there are many acceptable choices that depend on preference or circumstance. I believe one particular architectural choice is always possible and avoids many potential problems: <em>organize tables in a \u201clong\u201d structure with as few columns as possible, and keep metadata and metrics separate.\u00a0<\/em><\/p>\n<p>Consider a source table with data on students: their name, gender, grade, and eight test scores. With the architecture I recommend, the first table stores metadata on each student in three columns: student ID, metadata field, and metadata value. This three-column structure can hold any number of categorical attributes, and new ones can be added <em>by appending rows to the table rather than by adding a new column.<\/em> This is incredibly flexible.\u00a0<\/p>\n<p>Alongside the three-column metadata table is a three-column metrics table. 
Like the metadata table, it has an ID column, followed by the test identifier (metric name) and the test score (metric value). Any number of metrics can be stored, and, as with metadata, new metrics are added by appending rows rather than by adding columns.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38646a0 elementor-widget elementor-widget-text-editor\" data-id=\"38646a0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/SY7XMRl2WrLpzRCze0fK4_NRCpiA-u71Q88Bjj10cmHkAlsObV6MA5NTpaITsqc0fVjvT0vy6v4rwwffc-W_5BdzRwCx71KHrYEb-uP9q79odhPSEIJX0bN8UbjGSqrEu-IaM8V2.png\" alt=\"A Tech-Agnostic, Principled-Approach To Grassroots Data Management\"\/><\/figure>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b4343c elementor-widget elementor-widget-text-editor\" data-id=\"4b4343c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Besides its architectural advantages, this structure also aids analysis. 
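The student example can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the sample students, the two test columns (trimmed from the eight in the text), and the names METADATA_FIELDS and to_long are all hypothetical. It splits wide rows into a three-column metadata table and a three-column metrics table, appends a new test as a row, and then averages each student's scores with a simple group-by on the ID.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide source rows (the text's example has eight tests; two shown).
wide_rows = [
    {'student_id': 1, 'name': 'Ana', 'gender': 'F', 'grade': 5,
     'test_1': 88, 'test_2': 92},
    {'student_id': 2, 'name': 'Ben', 'gender': 'M', 'grade': 5,
     'test_1': 75, 'test_2': 81},
]

# Hypothetical choice of which source columns count as metadata.
METADATA_FIELDS = {'name', 'gender', 'grade'}


def to_long(rows, metadata_fields, id_field='student_id'):
    # Split each wide row into (id, field, value) triples for two tables.
    metadata, metrics = [], []
    for row in rows:
        row_id = row[id_field]
        for field, value in row.items():
            if field == id_field:
                continue
            target = metadata if field in metadata_fields else metrics
            target.append((row_id, field, value))
    return metadata, metrics


metadata, metrics = to_long(wide_rows, METADATA_FIELDS)

# A new test is a row append, not a new column.
metrics.append((1, 'test_3', 95))

# Average score per student: group the metric values by ID.
scores_by_student = defaultdict(list)
for student_id, _metric_name, value in metrics:
    scores_by_student[student_id].append(value)
averages = {sid: mean(vals) for sid, vals in scores_by_student.items()}

for row in metadata:
    print(row)
print(averages)
```

Because both tables keep the same three columns, adding a ninth test or a new student attribute leaves the schema unchanged. In a real database the metadata value column would typically be a text type, since it mixes names, categories, and numbers.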
In the example table, most languages and analysis tools make it far easier to compute an average by group than row-wise across many columns.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b49034 elementor-widget elementor-widget-heading\" data-id=\"7b49034\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Readiness<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0d471d4 elementor-widget elementor-widget-text-editor\" data-id=\"0d471d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Data from the data library is &#8220;ready&#8221; to be used for reporting, analytics, or any other use case. As a result, once the source data has been cataloged, preparing it for new reports and analyses should usually take little effort. 
If the raw data is not suitable for reporting, there should be either additional tables holding the restructured data or scripts\/processes that can be applied uniformly, on the fly, to any subset of the data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-015df88 elementor-widget elementor-widget-heading\" data-id=\"015df88\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Documentation<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-58292db elementor-widget elementor-widget-text-editor\" data-id=\"58292db\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Documentation should usually answer each of these questions in enough detail for someone completely unfamiliar with the data to understand it:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9360566 elementor-widget elementor-widget-text-editor\" data-id=\"9360566\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>What does each column represent?<\/li><li>How is the data generated, collected, and (if applicable) cleaned?<\/li><li>What are the current uses of the data?<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2bd4b06 elementor-widget elementor-widget-text-editor\" data-id=\"2bd4b06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>This documentation can live in something as simple as text files, or in a <a href=\"https:\/\/www.talend.com\/resources\/what-is-data-catalog\/\" target=\"_blank\" rel=\"noreferrer noopener\">data catalog product<\/a>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8d3961d elementor-widget elementor-widget-heading\" data-id=\"8d3961d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Standards<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-04b15fb elementor-widget elementor-widget-text-editor\" data-id=\"04b15fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Standards should be developed and used for things such as column names, visualization colors, and file names. They can take the form of a <a href=\"https:\/\/style.tidyverse.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">robust style guide<\/a> or of simple rules added as needed. 
Some standards, such as column-name syntax, should be in place from the start; others can be added in response to confusion from your customers when something has been done in different ways.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4e1f91d elementor-widget elementor-widget-heading\" data-id=\"4e1f91d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Monitor Data Health<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-747fe4c elementor-widget elementor-widget-text-editor\" data-id=\"747fe4c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The final principle helps ensure that all of this effort does not go for naught. Things will go wrong, for expected and unexpected reasons, and it is better to find out proactively than to be informed by a business partner. Whether you use standard checks or customize them for each data source, make sure that, at a minimum, you will know when automation has failed or data is missing.<\/p>\n<p><strong>Choosing Your Data Library Tech Stack<\/strong><\/p>\n<p>In addition to using tools that support these principles, an analytics team should choose its tech stack in light of the particular resources available in its company and team:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-66827f9 elementor-widget elementor-widget-text-editor\" data-id=\"66827f9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>If possible, use tools that are already available at your company. 
This may be less expensive, avoids bureaucracy, and may let your IT department support you better.<\/li><li>Do not try to figure out the perfect long-term solution up front. What you learn in the first 6 to 24 months will help you with that, and the value you deliver in the meantime will help justify the investment later.<\/li><li>Take advantage of the skill set your team already has; some things will require everyone to use the same method, but others can accommodate individual preferences and\/or strengths.<\/li><li>Be privacy-first in your design. You should be able to document, find, and delete personal information (PI) systematically and on demand.<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9778720 elementor-widget elementor-widget-text-editor\" data-id=\"9778720\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Doing all of this is hard work, but it is achievable, and it can be done quickly. In the next article, I will give specific examples of how these principles can be implemented.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In the introduction to this series, I explained what a data library is and how it can help a small data analytics team that lacks formal business intelligence support create a solid foundation for data management. This article will explain the universal principles that should guide the development of a data library. 
Let\u2019s Look At<\/p>\n","protected":false},"author":1135,"featured_media":22983,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187],"tags":[116,687,977],"ppma_author":[3185],"class_list":["post-22981","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-automation","tag-data-architecture","tag-data-management"],"authors":[{"term_id":3185,"user_id":1135,"is_guest":0,"slug":"chris-umphlett","display_name":"Chris Umphlett","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Chris-Umphlett-150x150.jpg","author_category":"","user_url":"","last_name":"Umphlett","first_name":"Chris","job_title":"","description":"Chris Umphlett is the Manager of Data Analysis and Data Privacy at TechSmith, the makers of great software like Snagit and Camtasia. Before that he worked on analytics teams in the consumer packaged goods, life insurance, and utility industries. 
He lives in East Lansing, Michigan with his wife and young children."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22981","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1135"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22981"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22981\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/22983"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22981"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22981"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22981"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22981"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}