{"id":9766,"date":"2020-09-22T07:45:41","date_gmt":"2020-09-22T07:45:41","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=9766"},"modified":"2023-11-01T16:30:09","modified_gmt":"2023-11-01T16:30:09","slug":"untold-truths-of-being-a-machine-learning-engineer","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/untold-truths-of-being-a-machine-learning-engineer\/","title":{"rendered":"Untold Truths of being a Machine Learning Engineer"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9766\" class=\"elementor elementor-9766\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6b873fc4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"84598\" data-id=\"6b873fc4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-57db1344\" data-eae-slider=\"89715\" data-id=\"57db1344\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-38b6c169 elementor-widget elementor-widget-text-editor\" data-id=\"38b6c169\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"5b88\">I recently took part in an interesting Reddit discussion, and a few of my answers were highly upvoted. 
The main topic was <strong>the untold truths of being a machine learning engineer.<\/strong>\u00a0As one of the more active participants, I am sharing a curated set of the key takeaways.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c269f2b elementor-widget elementor-widget-image\" data-id=\"c269f2b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1449\/1*fpOSYLBxLgRmKVdSyidN8g.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f6d7354 elementor-widget elementor-widget-heading\" data-id=\"f6d7354\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"8eae\">1. Using Deep Learning<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9db407c elementor-widget elementor-widget-text-editor\" data-id=\"9db407c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"c071\">Many Machine Learning enthusiasts think that they will play with fancy Deep Learning models and tune Neural Network architectures and hyperparameters. Don\u2019t get me wrong, some do, but not many.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"48ea\">The truth is that ML engineers spend most of their time working on \u201c<strong>how to properly extract the training set that will resemble real-world problem distribution<\/strong>\u201d. 
Once you have that, you can in most cases train a classical Machine Learning model and it will work well enough.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-09bff0c elementor-widget elementor-widget-heading\" data-id=\"09bff0c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">\n<h2 class=\"wp-block-heading\" id=\"d9af\">Just out of curiosity, which is the hardest problem being solved by any of these algorithms? And which one is being used to solve it?<\/h2>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1927a5e elementor-widget elementor-widget-image\" data-id=\"1927a5e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2938\/0*jBPGZz7ReY1m9pQ3\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d9b0fd1 elementor-widget elementor-widget-text-editor\" data-id=\"d9b0fd1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Do you have a feeling that deep learning on graphs is a bunch of heuristics that work sometimes and nobody has a clue why? 
In this post, I discuss the graph isomorphism problem, the Weisfeiler-Lehman heuristic for graph isomorphism testing, and how it can be used to analyse the expressive power of graph neural networks.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-dc71aa1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"653\" data-id=\"dc71aa1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f63e353\" data-eae-slider=\"88180\" data-id=\"f63e353\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-935048d elementor-widget elementor-widget-heading\" data-id=\"935048d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">\n<h2 class=\"wp-block-heading\" id=\"d8f7\">Are Deep Learning models difficult to explain in comparison to classic ML models?<\/h2>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bcce22d elementor-widget elementor-widget-text-editor\" data-id=\"bcce22d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>OP said it nicely:<\/p>\n<p>Can\u2019t see how explaining a 
Convolutional Neural Net would be any harder than explaining a whole classification framework based on SVMs, Random Forests or Gradient Boosting.<\/p>\n<p>I feel like this statement has become less and less true over the years as NNs have seen more research into interpretability.<\/p>\n<p>It clearly still holds when comparing NNs to good old traditional statistics like GLMs or Naive Bayes. But as soon as you move to CART based methods or anything using the kernel trick this fabled interpretability goes out the window.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/medium.com\/@romanorac\/autonomous-systems-dcf6af4f88c5\" target=\"_blank\" rel=\"noreferrer noopener\">Autonomous Systems: The field of autonomous vehicles is set to grow by 42% within the next four years, with salaries for top engineers\u2026 (medium.com)<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3733763 elementor-widget elementor-widget-heading\" data-id=\"3733763\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">2. 
Learning Machine Learning<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2bbd92d elementor-widget elementor-widget-image\" data-id=\"2bbd92d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4591\/0*ZkwseE3uPxdRveL9\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5bb2e46 elementor-widget elementor-widget-text-editor\" data-id=\"5bb2e46\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>When learning, you tend to go through a lot of papers on arxiv-sanity with some really cool algorithms. Then you enter the industry and all you see is relatively basic stuff like logistic regression, feedforward NNs, random forests (decision trees), bag-of-words instead of embeddings, and you feel like these models could be implemented by the average undergrad or even a smart high schooler. Maybe if you\u2019re lucky you\u2019ll see an SVM.<\/p>\n<p><mark>Infrastructure and data pipelines are where all the real engineering work happens<\/mark>.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"1eab\">I felt the same way as the OP above at the beginning of my career. But why would you use a more complicated tool to solve a task when there\u2019s no need for it? Many real-world problems don\u2019t require a state-of-the-art NN architecture to be solved. 
Sometimes a simple logistic regression gets the job done.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"008d\">The second part of the comment is true for smaller startups, where you usually have to take care of data pipelines yourself. In bigger companies, there are designated departments that deal with infrastructure. But there are no shortcuts: Data Scientists still need to be well informed about how data infrastructure works.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-65194af elementor-widget elementor-widget-heading\" data-id=\"65194af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">3. Learning Theory<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-796606e elementor-widget elementor-widget-image\" data-id=\"796606e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4512\/0*Mj9lc4s3wDZV5tac\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b4dd225 elementor-widget elementor-widget-text-editor\" data-id=\"b4dd225\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Learn as much fancy theory as you want, but at the end of the day, your job is going to be 99% data cleaning and infrastructure work.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a652\">99% is a bit of an exaggeration. 
To rephrase the OP: Machine Learning Engineers don\u2019t just play with fancy models. Sometimes they need to get their hands dirty by cleaning and labeling the data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f05a066 elementor-widget elementor-widget-heading\" data-id=\"f05a066\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Why don\u2019t you use software and services to label data?<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b5f801 elementor-widget elementor-widget-text-editor\" data-id=\"4b5f801\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This is very true. So much so that I thought I was alone. I work mostly in NLP and 99% of my job is labelling data and making some infrastructure in Java.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"4540\">Data labeling services are usually too expensive for the big datasets that are used in practice. Some datasets are not trivial to label. 
I once worked on invoice classification, where you would need professional accountants to label the data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c21fb0 elementor-widget elementor-widget-heading\" data-id=\"3c21fb0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How does Machine Learning look in the real world?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2f67e3d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"51133\" data-id=\"2f67e3d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-be84699\" data-eae-slider=\"17329\" data-id=\"be84699\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c9a3b7f elementor-widget elementor-widget-text-editor\" data-id=\"c9a3b7f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p class=\"wp-block-paragraph\" id=\"91f7\">I increasingly notice a gap in understanding of what Data Scientists actually do. Many aspiring Data Scientists are then disappointed when expectations don\u2019t meet reality. 
<a href=\"https:\/\/www.experfy.com\/blog\/machine-learning-engineer-vs-data-scientist-is-data-science-over\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Science<\/a> is not just about tweaking the parameters of your favorite model and climbing the Kaggle leaderboard.\u00a0<strong>What if I told you there is no leaderboard in the real world?!?<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"01ea\">That is why I wrote the Ebook\u00a0<a href=\"https:\/\/datascienceisfun.net\/\" target=\"_blank\" rel=\"noreferrer noopener\">Your First Machine Learning Model in the Cloud<\/a>: to show what working on an actual Data Science project looks like from start to finish. This Ebook is aimed at Data Science enthusiasts and Software Engineers who are thinking of pursuing a career in Data Science.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>ML engineers spend most of the time working on \u201chow to properly extract the training set that will resemble real-world problem distribution\u201d. 
Once you have that, you can in most cases train a classical Machine Learning model and it will work well enough.<\/p>\n","protected":false},"author":784,"featured_media":9767,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[97,92,437],"ppma_author":[3778],"class_list":["post-9766","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-machine-learning","tag-machine-learning-engineer"],"authors":[{"term_id":3778,"user_id":784,"is_guest":0,"slug":"roman-orac","display_name":"Roman Orac","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_b7d17fbf-b990-4540-aa64-0ff5333f3943-150x150.jpg","author_category":"","user_url":"https:\/\/www.sportradar.com\/","last_name":"Orac","first_name":"Roman","job_title":"","description":"Roman Orac is Senior Data Scientist at <a href=\"http:\/\/www.sportradar.com\/\">Sportradar<\/a>, a global leader in understanding and leveraging the power of sports data and digital content for its clients around the 
world."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9766","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/784"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=9766"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9766\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/9767"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=9766"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=9766"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=9766"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9766"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}