{"id":22494,"date":"2020-12-11T10:01:29","date_gmt":"2020-12-11T10:01:29","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/automated-machine-learning-is-coming-wont-matter\/"},"modified":"2023-09-21T17:52:16","modified_gmt":"2023-09-21T17:52:16","slug":"automated-machine-learning-is-coming-wont-matter","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/automated-machine-learning-is-coming-wont-matter\/","title":{"rendered":"Automated Machine Learning Is Coming&#8230; And It Won&#8217;t Matter"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22494\" class=\"elementor elementor-22494\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1579256 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1579256\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b41c4a1\" data-id=\"b41c4a1\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-956d94b elementor-widget elementor-widget-text-editor\" data-id=\"956d94b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fa61\">Recently, I\u2019ve been seeing a lot of services and products advertising automation of machine learning.\u00a0<a href=\"https:\/\/www.datarobot.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Robot<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.h2o.ai\/\" target=\"_blank\" rel=\"noreferrer 
noopener\">H2O.ai<\/a>\u00a0offer platforms that let you build machine learning models through point-and-click interfaces. They\u2019ll even do the feature engineering for you! This functionality, or something like it, is slowly being built into various tools and programs. They promise to automate the whole machine learning pipeline, from feature transformations and hyperparameter tuning to model selection. There are open-source tools that do much the same thing (like\u00a0<a href=\"https:\/\/github.com\/EpistasisLab\/tpot\" target=\"_blank\" rel=\"noreferrer noopener\">TPOT<\/a>, a cool module I love the idea of but can never actually get to work on a data set that isn\u2019t trivially small).<\/p>\n\n<p id=\"50e2\">Right now, these tools mostly aren\u2019t great, are absurdly expensive, or both (for the cost of a subscription to Data Robot, you can employ a full-time data scientist). But I have no doubt that tools will soon exist that completely take care of the model\/hyperparameter\/feature-transformation process.<\/p>\n\n<p id=\"8ac5\">I\u2019ve had people ask me if I\u2019m worried about my job security as a data scientist. No, I am not. I can\u2019t wait until these tools are mature and open source, so I can just type \u201cimport machinelearn\u201d, let it do the stupid hyperparameter optimization, and get on with the hard part of the job.<\/p>\n\n<p id=\"3380\">When I get data to the point where it could conceivably be ingested by one of these tools, the problem is basically done. At that point, all I need is a bit of code to run a grid search, find a reasonably decent model, and tune the hyperparameters. Hell, if I just ran XGBoost with the default parameters at this point, it would usually be almost as good as I am ever going to get it anyway. 
Tuning things a bit further is only worth the extra work because it\u2019s relatively easy, and you hit diminishing returns very quickly (unless you\u2019re in a Kaggle competition, where even a tiny gain might take you from 10th place to 1st, so you milk every incremental increase in accuracy you can).<\/p>\n\n<p id=\"9aad\">Once your data is in a format where you could make a Kaggle competition out of it, you\u2019ve done the hard part. I would love it if, at that point, I could just run a single function that did a well-optimized search, far more thorough than my typical grid searches, and also explored some different feature transformations. Maybe my models would do marginally better, and I would save myself a few minutes of writing code. It would be nice. But if a tool like that would put you out of a job, maybe you should seriously think about what skills you bring to the table.<\/p>\n\n<p id=\"5d80\">In most data science positions I\u2019ve heard of, the hard part isn\u2019t building a model once the problem has been framed, the data collected, the samples chosen, and everything put into a neat one-row-per-sample format. The hard part is getting to that point. While I expect some of these steps to get simpler as tools evolve, I can\u2019t see the whole process being easily automated any time soon. Translating a business problem into a prediction problem is hard and requires a lot of business knowledge coupled with abstract, quantitative thinking. Figuring out what data to use and how to get it is hard: businesses evolve and their data infrastructure isn\u2019t always clean, so there are no ready-made solutions here. 
Choosing an unbiased set of training samples can be extremely difficult, and there isn\u2019t a cookie-cutter solution. Most often, you need to impose some structure on the data based on knowledge of the problem\u2019s particulars.<\/p>\n\n<p id=\"96b8\">I have no doubt that in the next few years, we\u2019ll have some nice tools for automating the building of a machine learning pipeline. Hopefully, once that problem is rendered trivial, fewer aspiring data scientists will try to prove their skills by showing off how accurate their model is on the Iris data set. I don\u2019t see much impact on the field beyond that.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In the next few years, we will have some nice tools for automating the building of a machine learning pipeline.<\/p>\n","protected":false},"author":997,"featured_media":18150,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97,653,116,92],"ppma_author":[3893],"class_list":["post-22494","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-automated-machine-learning","tag-automation","tag-machine-learning"],"authors":[{"term_id":3893,"user_id":997,"is_guest":0,"slug":"tommy-blanchard","display_name":"Tommy Blanchard","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/b5fca41dcee51fce507eba09b993d21fa80d8078ea83d132cb77a82be2e52876?s=96&d=mm&r=g","user_url":"https:\/\/medium.com\/@blanchard.tommy","last_name":"Blanchard","first_name":"Tommy","job_title":"","description":"<a href=\"http:\/\/tommyblanchard.com\/\" target=\"_blank\" rel=\"noopener\">Tommy Blanchard<\/a> is Lead\/Manager, Data Science at Klaviyo. 
He earned his PhD in Brain and Cognitive Sciences at the University of Rochester and did a postdoc at Harvard in the Computational Cognitive Neuroscience lab."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22494","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/997"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22494"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22494\/revisions"}],"predecessor-version":[{"id":33121,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22494\/revisions\/33121"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18150"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22494"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22494"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22494"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22494"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}