{"id":10011,"date":"2020-10-02T09:56:21","date_gmt":"2020-10-02T09:56:21","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=10011"},"modified":"2023-10-25T10:05:31","modified_gmt":"2023-10-25T10:05:31","slug":"will-automl-software-replace-data-scientists","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/will-automl-software-replace-data-scientists\/","title":{"rendered":"Will AutoML Software Replace Data Scientists?"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"10011\" class=\"elementor elementor-10011\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-257b3fb9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"257b3fb9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1ff6dcde\" data-id=\"1ff6dcde\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6d47bc40 elementor-widget elementor-widget-text-editor\" data-id=\"6d47bc40\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"has-medium-font-size\"><em>AutoML is not a threat to Data Scientists<\/em><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ce88429 elementor-widget elementor-widget-text-editor\" data-id=\"ce88429\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"02c2\">In recent years, many automated machine learning tools have been introduced. They can automate some of the tasks that a Data Scientist usually has to perform manually, and they have reached a remarkable level of sophistication and effectiveness. Are they a threat to the Data Scientist\u2019s job, or are they an opportunity?<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c3df59f elementor-widget elementor-widget-heading\" data-id=\"c3df59f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">What is AutoML?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d659b9b elementor-widget elementor-widget-text-editor\" data-id=\"d659b9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"eee8\">AutoML is a generic term for software that performs Machine Learning tasks automatically. Such tools usually automate the whole modeling pipeline: data cleaning, encoding, feature and model selection, and hyperparameter tuning. 
Such tools can be Python libraries like Auto-Sklearn or software platforms like DataRobot.<\/p>\n\n\n\n<p id=\"f2a4\">AutoML tools take over the boring steps that consume the most time in a Data Scientist\u2019s work. They try many combinations of the settings of a pipeline (e.g. 
the values used to fill missing data, the scaling algorithm, the model type, the model hyperparameters) and select the combination that optimizes a performance metric (such as RMSE or the area under the ROC curve) in k-fold cross-validation, using a search algorithm (such as Grid Search or Random Search).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-193d541 elementor-widget elementor-widget-text-editor\" data-id=\"193d541\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"2572\">They can greatly simplify the life of anyone who has to build a model from scratch, and sometimes they explore combinations and scenarios that a Data Scientist may not have thought of.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9830bca elementor-widget elementor-widget-heading\" data-id=\"9830bca\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Does it replace a Data Scientist\u2019s work?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ab35b25 elementor-widget elementor-widget-text-editor\" data-id=\"ab35b25\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"79db\">Some people think that AutoML replaces a Data Scientist\u2019s work and will make the job obsolete in the future. Nothing could be further from the truth. 
Let\u2019s see why.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9781594 elementor-widget elementor-widget-heading\" data-id=\"9781594\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Data Science is not (only) Machine Learning<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-696c12a elementor-widget elementor-widget-text-editor\" data-id=\"696c12a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"af66\">A Data Scientist is more than a person who uses Machine Learning models. A Data Scientist analyzes the information hidden inside data, extracts useful correlations, helps prepare the right data to feed an ML pipeline, and gives useful insights about the business that generated the data. These activities are the most important part of Data Science and cannot be fully automated. They rely on a deep knowledge of the business and on fluent use of the language that people in the business speak, above all its managers.<\/p>\n\n\n\n<p id=\"86c1\">All these things make the Data Scientist\u2019s job more complicated and more interesting than running Machine Learning models, and they are outside the scope of AutoML.<\/p>\n\n\n\n<p id=\"7f8d\">AutoML software automates Machine Learning tasks, not the whole Data Science process. Machine Learning is just a small part of a Data Scientist\u2019s job, and perhaps neither the most important nor the most challenging one. 
Understanding data, information, and business context is the real challenge for a Data Scientist and, if these tasks are not fully accomplished, Machine Learning will never be the magic wand that solves all the problems.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b50afc9 elementor-widget elementor-widget-heading\" data-id=\"b50afc9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">AutoML doesn\u2019t work alone<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3c838f elementor-widget elementor-widget-text-editor\" data-id=\"a3c838f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"8282\">AutoML is software, so it always needs somebody with the right skills to use it. In fact, AutoML results must be validated by a professional Data Scientist to make sure they are correct and make sense in the business environment in which they were produced. It\u2019s not unusual to produce a model that seems perfect on paper but, in reality, doesn\u2019t produce any useful business insights or, in the worst case, makes trivial predictions. 
That\u2019s why a Data Scientist must always be there to make sure that the model is telling us something new and not just rehashing something old.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b1070b elementor-widget elementor-widget-heading\" data-id=\"4b1070b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Is AutoML useful to Data Scientists?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-19f37be elementor-widget elementor-widget-text-editor\" data-id=\"19f37be\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"c174\">Yes, I think that it\u2019s very useful because it automates the boring tasks that usually require a lot of code and carry a high risk of mistakes. Without AutoML, a Data Scientist must build each ML pipeline from scratch. Every ML model has its own requirements (e.g. feature scaling for neural networks), so the complete set of pipelines to test may become quite complex and time-consuming. Using an AutoML tool lets a Data Scientist build a good ML model without worrying too much about the code. 
Remember: a Data Scientist is not a software engineer, so they should write as little code as possible in order to focus on data and information.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-19653e9 elementor-widget elementor-widget-heading\" data-id=\"19653e9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusions<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-df2c26f elementor-widget elementor-widget-text-editor\" data-id=\"df2c26f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p id=\"b36f\">I think that Data Scientists must keep up with change and innovation, so AutoML can become a very useful ally if they start using it properly. If they automate boring tasks, they will likely have more time to spend analyzing information, which is the real goal of a Data Scientist.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Data Scientists must keep up with change and innovation, so AutoML can become a very useful ally if they start using it properly. 
If they automate boring tasks, they will likely have more time to spend analyzing information, which is the real goal of a Data Scientist.<\/p>\n","protected":false},"author":618,"featured_media":10012,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[690,394,92],"ppma_author":[3328],"class_list":["post-10011","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-automl-software","tag-data-scientist","tag-machine-learning"],"authors":[{"term_id":3328,"user_id":618,"is_guest":0,"slug":"gianluca-malato","display_name":"Gianluca Malato","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_918623b2-8f36-4110-8343-6fc9228595dd-150x150.jpg","user_url":"http:\/\/www.gianlucamalato.it\/","last_name":"Malato","first_name":"Gianluca","job_title":"","description":"Gianluca Malato is a Data Scientist at Poste Italiane SPA.\u00a0 He is also a fiction author and software developer, Editor of\u00a0<a href=\"https:\/\/medium.com\/data-science-journal?source=follow_footer--------------------------follow_footer-\">Data Science Journal<\/a>,\u00a0<a href=\"https:\/\/medium.com\/the-trading-scientist?source=follow_footer--------------------------follow_footer-\">The Trading Scientist<\/a>, and\u00a0<a href=\"https:\/\/medium.com\/the-writers-notebook?source=follow_footer--------------------------follow_footer-\">The Writer\u2019s Notebook<\/a>. 
His books are available on <a href=\"https:\/\/www.amazon.com\/Gianluca-Malato\/e\/B076CHTG3W?ref=dbs_a_mng_rwt_scns_share\">Amazon<\/a>."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10011","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/618"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=10011"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10011\/revisions"}],"predecessor-version":[{"id":33726,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10011\/revisions\/33726"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/10012"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=10011"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=10011"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=10011"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=10011"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}