{"id":22571,"date":"2021-01-19T09:54:33","date_gmt":"2021-01-19T09:54:33","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/dont-need-data-scientists-need-engineers\/"},"modified":"2023-09-06T06:54:40","modified_gmt":"2023-09-06T06:54:40","slug":"dont-need-data-scientists-need-engineers","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/dont-need-data-scientists-need-engineers\/","title":{"rendered":"We Don&#8217;t Need Data Scientists, We Need Data Engineers"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22571\" class=\"elementor elementor-22571\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-3315932c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3315932c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-482ec927\" data-id=\"482ec927\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-34265aaf elementor-widget elementor-widget-text-editor\" data-id=\"34265aaf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Data. It\u2019s everywhere and we\u2019re\u00a0<a href=\"https:\/\/techjury.net\/blog\/how-much-data-is-created-every-day\/#gref\" target=\"_blank\" rel=\"noreferrer noopener\">only getting more of it<\/a>. 
For the last 5-10 years,\u00a0<em>data science<\/em>\u00a0has attracted newcomers from near and far trying to get a taste of that forbidden fruit.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f34a41 elementor-widget elementor-widget-heading\" data-id=\"0f34a41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">But what does the state of&nbsp;<em>data science<\/em>&nbsp;hiring look like today?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0fb7cc9 elementor-widget elementor-widget-heading\" data-id=\"0fb7cc9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Here\u2019s the gist of the article in two sentences for the busy reader.<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7847174 elementor-widget elementor-widget-text-editor\" data-id=\"7847174\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><strong>TLDR<\/strong>: There are&nbsp;<strong>70% more open roles<\/strong>&nbsp;at companies in&nbsp;<em>data engineering<\/em>&nbsp;as compared to&nbsp;<em>data science<\/em>. 
As we train the next generation of data and machine learning practitioners, let\u2019s place more emphasis on engineering skills.<\/p>\n<hr class=\"wp-block-separator\"\/>\n<p>As part of my work developing an&nbsp;<a href=\"https:\/\/www.confetti.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">educational platform<\/a>&nbsp;for data professionals, I think a lot about how the market for data-driven (machine learning and data science) roles is evolving.<\/p>\n\n<p>In talking to dozens of prospective entrants to data fields including students at top institutions around the world, I\u2019ve seen a tremendous amount of confusion around what skills are most important to help candidates stand out in the crowd and prepare for their careers.<\/p>\n\n<p>When you think about it, a&nbsp;<em>data scientist<\/em>&nbsp;can be responsible for any subset of the following: machine learning modelling, visualization, data cleaning and processing (i.e. SQL wrangling), engineering, and production deployment.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7d089b elementor-widget elementor-widget-heading\" data-id=\"a7d089b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">How do you even begin to recommend a study curriculum for newcomers?<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4320e4e elementor-widget elementor-widget-text-editor\" data-id=\"4320e4e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Data speaks louder than words. 
So I decided to do an analysis of the data roles being hired for at every company coming out of&nbsp;<a href=\"https:\/\/www.ycombinator.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Y-Combinator<\/a>&nbsp;since 2012. The questions that guided my research:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ff8682b elementor-widget elementor-widget-text-editor\" data-id=\"ff8682b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>What data roles are companies most frequently hiring for?<\/li><li>How in-demand is the conventional&nbsp;<em>data scientist<\/em>&nbsp;that we talk about so much?<\/li><li>Are the same skills that started the data revolution relevant today?<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd51cbb elementor-widget elementor-widget-text-editor\" data-id=\"fd51cbb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If you want the full details and analysis, read on.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2b6c5b elementor-widget elementor-widget-heading\" data-id=\"f2b6c5b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Methodology<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3614523 elementor-widget elementor-widget-text-editor\" data-id=\"3614523\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>I chose to do an analysis of YC portfolio companies that claim to make some sort of data work part of their value proposition.<\/p>\n\n<p>Why focus on YC? Well, for starters, they do a good job of providing an easily searchable (and scrapable)&nbsp;<a href=\"https:\/\/www.ycombinator.com\/companies\/\" target=\"_blank\" rel=\"noreferrer noopener\">directory of their companies<\/a>.<\/p>\n\n<p>In addition, as a particularly forward-thinking incubator that has funded companies from around the world across domains for over a decade, I felt they provided a representative sample of the market with which to conduct my analyses. That being said, take what I say with a grain of salt, as I didn\u2019t analyze super-large tech companies.<\/p>\n\n<p>I scraped the homepage URLs of every YC company since 2012, producing an initial pool of ~1400 companies.<\/p>\n\n<p>Why stop at 2012? Well, 2012 was the year that&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/AlexNet\" target=\"_blank\" rel=\"noreferrer noopener\">AlexNet<\/a>&nbsp;won the ImageNet competition, effectively kickstarting the machine learning and data-modelling wave we are now living through. It\u2019s fair to say that this birthed some of the earliest generations of data-first companies.<\/p>\n\n<p>From this initial pool, I performed keyword filtering to reduce the number of companies I would have to look through. In particular, I only considered companies whose websites included at least one of the following terms: AI, CV, NLP, natural language processing, computer vision, artificial intelligence, machine, ML, data. I also disregarded companies whose website links were broken.<\/p>\n\n<p>Did this generate a ton of false positives? Absolutely! 
But here I was trying to prioritize high recall as much as possible, recognizing that I would do a more fine-grained manual inspection of the individual websites for relevant roles.<\/p>\n\n<p>With this reduced pool, I went through every site, found where they were advertising jobs (typically a&nbsp;<em>Careers<\/em>,&nbsp;<em>Jobs<\/em>, or&nbsp;<em>We\u2019re Hiring<\/em>&nbsp;page), and took note of every role that included data, machine learning, NLP, or CV in the title. This gave me a pool of about 70 distinct companies hiring for data roles.<\/p>\n\n<p>One note here: it\u2019s conceivable that I missed some companies as there were certain websites with very little information (typically those in stealth) that might actually be hiring. In addition, there were companies that didn\u2019t have a formal&nbsp;<em>Careers<\/em>&nbsp;page but asked that prospective candidates reach out directly via email.<\/p>\n\n<p>I disregarded both of these types of companies rather than reach out to them, so they are not part of this analysis.<\/p>\n\n<p>Another thing: the bulk of this research was done towards the final weeks of 2020. Open roles may have changed as companies update their pages periodically. 
However, I don\u2019t believe this will drastically impact the conclusions drawn.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f1625f8 elementor-widget elementor-widget-heading\" data-id=\"f1625f8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">What Are Data Practitioners Responsible For?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-167c3ba elementor-widget elementor-widget-text-editor\" data-id=\"167c3ba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Before diving into the results, it\u2019s worth spending some time clarifying what each data role is typically responsible for. Here are the four roles we will spend our time looking at with a short description of what they do:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a1e96a4 elementor-widget elementor-widget-text-editor\" data-id=\"a1e96a4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><em>Data scientist<\/em>: Uses various techniques in statistics and machine learning to process and analyse data. Often responsible for building models to probe what can be learned from some data source, though often at a prototype rather than production level.<\/li><li><em>Data engineer<\/em>: Develops a robust and scalable set of data processing tools\/platforms. 
Must be comfortable with SQL\/NoSQL database wrangling and building\/maintaining ETL pipelines.<\/li><li><em>Machine Learning (ML) Engineer<\/em>: Often responsible for both training models and productionizing them. Requires familiarity with some high-level ML framework and also must be comfortable building scalable training, inference, and deployment pipelines for models.<\/li><li><em>Machine Learning (ML) Scientist<\/em>: Works on cutting-edge research. Typically responsible for exploring new ideas that can be published at academic conferences. Often only needs to prototype new state-of-the-art models before handing off to <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/what-an-ml-engineer-needs-to-know\/\" target=\"_blank\" rel=\"noreferrer noopener\">ML engineers<\/a> for productionization.<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e765953 elementor-widget elementor-widget-heading\" data-id=\"e765953\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How Many Data Roles Are There?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e36d711 elementor-widget elementor-widget-text-editor\" data-id=\"e36d711\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>So what happens when we plot the frequency of each data role that companies are hiring for? 
The plot looks like this:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-57651e3 elementor-widget elementor-widget-image\" data-id=\"57651e3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"960\" height=\"512\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e.png\" class=\"attachment-large size-large wp-image-32465\" alt=\"all machine learning, data science, data engineering roles at Y-Combinator companies\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e.png 960w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e-300x160.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e-768x410.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e-610x325.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/01\/all_roles-76753bdb67cdaac40a0ea69ffbe76267-3d61e-750x400.png 750w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9b55023 elementor-widget elementor-widget-text-editor\" data-id=\"9b55023\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>What immediately stands out is how many more open&nbsp;<em>data engineer<\/em>&nbsp;roles there are compared to traditional&nbsp;<em>data scientists<\/em>. 
In this case, the raw counts correspond to companies hiring for&nbsp;<strong>roughly 55% more<\/strong>&nbsp;data engineers than data scientists, and roughly the same number of machine learning engineers as data scientists.<\/p>\n\n<p><strong>But we can do more. If you look at the titles of the various roles, there seems to be some repetition.<\/strong><\/p>\n\n<p>Let\u2019s only provide coarse-grained categorization through role consolidation. In other words, I took roles whose descriptions were roughly equivalent and consolidated them under a single title.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-69d24b5 elementor-widget elementor-widget-heading\" data-id=\"69d24b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">That included the following set of equivalence relations:<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb5ef85 elementor-widget elementor-widget-text-editor\" data-id=\"eb5ef85\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><em>NLP engineer<\/em>&nbsp;\u2248&nbsp;<em>CV engineer<\/em>&nbsp;\u2248&nbsp;<em>ML engineer<\/em>&nbsp;\u2248&nbsp;<em>Deep Learning engineer<\/em>&nbsp;(while the domains might be different, the responsibilities are roughly the same)<\/li><li><em>ML scientist<\/em>&nbsp;\u2248&nbsp;<em>Deep Learning researcher<\/em>&nbsp;\u2248&nbsp;<em>ML intern<\/em>&nbsp;(the internship description very much seemed research-focused)<\/li><li><em>Data engineer<\/em>&nbsp;\u2248&nbsp;<em>Data architect<\/em>&nbsp;\u2248&nbsp;<em>Head of data<\/em>&nbsp;\u2248&nbsp;<em>Data platform 
engineer<\/em><\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cc2517b elementor-widget elementor-widget-image\" data-id=\"cc2517b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"960\" height=\"512\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e.png\" class=\"attachment-large size-large wp-image-18473\" alt=\"all machine learning, data science, data engineering roles at Y-Combinator companies consolidated into coarse categories\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e.png 960w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e-300x160.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e-768x410.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e-610x325.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/consolidated_roles-d0609bd70c768ce428b0873ea5ff1bd3-3d61e-750x400.png 750w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-992c283 elementor-widget elementor-widget-text-editor\" data-id=\"992c283\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If we don\u2019t like dealing with raw counts, here are some percentages to put us at ease:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-ff94e22 elementor-widget elementor-widget-image\" data-id=\"ff94e22\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"960\" height=\"512\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e.png\" class=\"attachment-large size-large wp-image-18474\" alt=\"all machine learning, data science, data engineering roles at Y-Combinator companies normalized frequencies\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e.png 960w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e-300x160.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e-768x410.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e-610x325.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/normalized_consolidated_roles-48138e6d849e501e2823381e49ba06e1-3d61e-750x400.png 750w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7248a2f elementor-widget elementor-widget-text-editor\" data-id=\"7248a2f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>I probably could have lumped&nbsp;<em>ML research engineer<\/em>&nbsp;into one of the&nbsp;<em>ML scientist<\/em>&nbsp;or&nbsp;<em>ML engineer<\/em>&nbsp;bins, but given that it was a bit 
of a hybrid role, I left it as is.<\/p>\n\n<p>Overall, the consolidation made the differences even more pronounced! There are&nbsp;<strong>~70%<\/strong>&nbsp;more open&nbsp;<em>data engineer<\/em>&nbsp;than&nbsp;<em>data scientist<\/em>&nbsp;positions. In addition, there are&nbsp;<strong>~40%<\/strong>&nbsp;more open&nbsp;<em>ML engineer<\/em>&nbsp;than&nbsp;<em>data scientist<\/em>&nbsp;positions. There are also only&nbsp;<strong>~30%<\/strong>&nbsp;as many&nbsp;<em>ML scientist<\/em>&nbsp;as&nbsp;<em>data scientist<\/em>&nbsp;positions.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8d0a080 elementor-widget elementor-widget-heading\" data-id=\"8d0a080\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Takeaways<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ef8e026 elementor-widget elementor-widget-text-editor\" data-id=\"ef8e026\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><em>Data engineers<\/em>&nbsp;are in increasingly high demand compared to other data-driven professions. In a sense, this represents an evolution for the broader field.<\/p>\n\n<p>When machine learning became hot 5-8 years ago, companies decided they needed people who could build classifiers on data. 
But then frameworks like&nbsp;<a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">TensorFlow<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/pytorch.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch<\/a>&nbsp;became really good, democratizing the ability to get started with deep learning and machine learning.<\/p>\n\n<p>This commoditized the data modelling skillset.<\/p>\n\n<p>Today, the bottleneck in helping companies get machine learning and modelling insights to production centers on data problems.<\/p>\n\n<p>How do you annotate data? How do you process and clean data? How do you move it from A to B? How do you do this every day as quickly as possible?<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0063e06 elementor-widget elementor-widget-image\" data-id=\"0063e06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"684\" height=\"1024\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/patrick-951fdc9920aa2a6cc7b75e0959379665-321c9-684x1024.png\" class=\"attachment-large size-large wp-image-18475\" alt=\"patrick star moving data\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/patrick-951fdc9920aa2a6cc7b75e0959379665-321c9-684x1024.png 684w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/patrick-951fdc9920aa2a6cc7b75e0959379665-321c9-200x300.png 200w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/patrick-951fdc9920aa2a6cc7b75e0959379665-321c9-610x913.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/patrick-951fdc9920aa2a6cc7b75e0959379665-321c9.png 720w\" sizes=\"(max-width: 684px) 100vw, 684px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-963e859 elementor-widget elementor-widget-text-editor\" data-id=\"963e859\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>All that amounts to having good engineering skills.<\/p>\n\n<p>This may sound boring and unsexy, but old-school software engineering with a bent toward data may be what we really need right now.<\/p>\n\n<p>For years, we\u2019ve become enamored with the idea of data professionals that breathe life into raw data thanks to cool demos and media hype. After all, when was the last time you saw a&nbsp;<a href=\"https:\/\/techcrunch.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">TechCrunch<\/a>&nbsp;article about an ETL pipeline?<\/p>\n\n<p>If nothing else, I believe solid engineering is something we don\u2019t emphasize enough in data science job training or educational programs. In addition to learning how to use&nbsp;<em>linear_regression.fit()<\/em>, learn how to write a unit test too!<\/p>\n\n<p>So does that mean you shouldn\u2019t study data science? No.<\/p>\n\n<p>What it means is that competition is going to be tougher. There are going to be fewer positions available for what is looking to be an abundance of newcomers to the market trained to do data science.<\/p>\n\n<p>There will always be a need for people that can effectively analyze and extract actionable insights from data. But they have to be good.<\/p>\n\n<p>Downloading a pretrained model off the TensorFlow website on the&nbsp;<a href=\"https:\/\/scikit-learn.org\/stable\/auto_examples\/datasets\/plot_iris_dataset.html\" target=\"_blank\" rel=\"noreferrer noopener\">Iris dataset<\/a>&nbsp;is probably no longer enough to get that data science job.<\/p>\n\n<p>It\u2019s clear, however, with the large number of&nbsp;<em>ML engineer<\/em>&nbsp;openings that companies often want a hybrid data practitioner: someone that can build and deploy models. 
Or said more succinctly, someone that can use TensorFlow but can also build it from source.<\/p>\n\n<p>Another takeaway here is that there just aren\u2019t that many ML research positions.<\/p>\n\n<p>Machine learning research tends to get its fair share of hype because that\u2019s where all the cutting-edge stuff happens, all the&nbsp;<a href=\"https:\/\/deepmind.com\/research\/case-studies\/alphago-the-story-so-far\" target=\"_blank\" rel=\"noreferrer noopener\">AlphaGo<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/openai.com\/blog\/openai-api\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">GPT-3<\/a>&nbsp;and what-not.<\/p>\n\n<p>But for many companies, especially early-stage ones, the bleeding-edge state-of-the-art may not be what\u2019s needed anymore. Getting a model that\u2019s 90% of the way there but can scale to 1000+ users is often more valuable to them.<\/p>\n\n<p>That\u2019s not to say that there isn\u2019t an important place for machine learning research. Absolutely not.<\/p>\n\n<p>But you\u2019ll probably find more of those kinds of roles at industry research labs that can afford to take capital-intensive bets for long stretches of time rather than at a seed-stage startup trying to demonstrate product-market fit to investors as it raises a Series A.<\/p>\n\n<p>If nothing else, I believe it\u2019s important to make the expectations of newcomers to data fields reasonable and calibrated. We must acknowledge that\u00a0<a href=\"https:\/\/veekaybee.github.io\/2019\/02\/13\/data-science-is-different\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">data science is different now<\/a>. I hope this post was able to shed some light on the state of the field today. 
It\u2019s only when we know where we are that we know where we need to go.<\/p>\n\n<p>Cross-posted from <a href=\"https:\/\/www.mihaileric.com\/posts\/we-need-data-engineers-not-data-scientists\/\" target=\"_blank\" rel=\"noreferrer noopener\">mihaileric.com<\/a> with permission of author.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let\u2019s place more emphasis on engineering skills.<\/p>\n","protected":false},"author":1027,"featured_media":18476,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97,1260,94,394,92],"ppma_author":[3875],"class_list":["post-22571","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-data-engineer","tag-data-science","tag-data-scientist","tag-machine-learning"],"authors":[{"term_id":3875,"user_id":1027,"is_guest":0,"slug":"mihail-eric","display_name":"Mihail Eric","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/mihail_eric-150x150.jpeg","user_url":"https:\/\/www.mihaileric.com\/","last_name":"Eric","first_name":"Mihail","job_title":"","description":"Mihail Eric, a researcher, engineer, and educator, is Machine Learning Scientist at Alexa AI, 
Amazon."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1027"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22571"}],"version-history":[{"count":13,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22571\/revisions"}],"predecessor-version":[{"id":32471,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22571\/revisions\/32471"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18476"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22571"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}