{"id":22548,"date":"2021-01-07T09:55:42","date_gmt":"2021-01-07T09:55:42","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/machine-learning-going-real-time\/"},"modified":"2023-09-13T11:30:10","modified_gmt":"2023-09-13T11:30:10","slug":"machine-learning-going-real-time","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/machine-learning-going-real-time\/","title":{"rendered":"Machine Learning Is Going Real-Time"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22548\" class=\"elementor elementor-22548\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-992b1b1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"992b1b1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e274367\" data-id=\"e274367\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-87f963f elementor-widget elementor-widget-text-editor\" data-id=\"87f963f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>After talking to machine learning and infrastructure engineers at major Internet companies across the US, Europe, and China, I noticed two groups of companies. One group has made significant investments (hundreds of millions of dollars) into infrastructure to allow real-time machine learning and has already seen returns on their investments. Another group still wonders if there\u2019s value in real-time ML.<\/p>\n\n<p>There seems to be little consensus on what real-time ML means, and there hasn\u2019t been a lot of in-depth discussion on how it\u2019s done in the industry. 
In this post, I want to share what I've learned after talking to about a dozen companies that are doing it.

There are two levels of real-time machine learning that I'll go over in this post:

- Level 1: Your ML system makes predictions in real-time (online predictions).
- Level 2: Your system can incorporate new data and update your model in real-time (online learning).

I use "model" to refer to the machine learning model and "system" to refer to the infrastructure around it, including data pipelines and monitoring systems.

**Table of contents**

- Level 1: Online predictions – your system can make predictions in real-time
  - Use cases
    - Problems with batch predictions
  - Solutions
    - Fast inference
    - Real-time pipeline
      - Stream processing vs. batch processing
      - Event-driven vs. request-driven
  - Challenges
- Level 2: Online learning – your system can incorporate new data and update in real-time
  - Defining "online learning"
  - Use cases
  - Solutions
  - Challenges
    - Theoretical
    - Practical
- The MLOps race between the US and China
- Conclusion

## Level 1: Online predictions – your system can make predictions in real-time

*__Real-time__ here is defined to be on the order of milliseconds to seconds.*

### Use cases

Latency matters, especially for user-facing applications.
In 2009, Google's experiments demonstrated that [increasing web search latency by 100 to 400 ms reduced the daily number of searches per user by 0.2% to 0.6%](https://services.google.com/fh/files/blogs/google_delayexp.pdf). In 2019, [Booking.com found that an increase of about 30% in latency cost about 0.5% in conversion rates — "a relevant cost for our business"](https://blog.acolyer.org/2019/10/07/150-successful-machine-learning-models/).

No matter how great your ML models are, if they take just milliseconds too long to make predictions, users are going to click on something else.

#### Problems with batch predictions

One non-solution is to avoid making predictions online. You can generate predictions in batch offline, store them (e.g. in SQL tables), and pull out the pre-computed predictions when needed.

This can work when the input space is finite – you know exactly how many possible inputs you'll need predictions for. One example is generating movie recommendations for your users – you know exactly how many users there are. So you predict a set of recommendations for each user periodically, such as every few hours.
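To make the precompute-and-look-up pattern concrete, here's a minimal sketch. The table name, the `recommend` model interface, and the SQLite backend are all hypothetical stand-ins for whatever batch job and store a real system would use:

```python
import json
import sqlite3

def precompute_recommendations(model, user_ids, db_path="predictions.db"):
    """Batch job, run e.g. every few hours: score every known user offline."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recs (user_id TEXT PRIMARY KEY, items TEXT)"
    )
    for user_id in user_ids:  # finite input space: we can enumerate all users
        items = model.recommend(user_id)  # hypothetical model interface
        conn.execute(
            "INSERT OR REPLACE INTO recs VALUES (?, ?)", (user_id, json.dumps(items))
        )
    conn.commit()
    conn.close()

def get_recommendations(user_id, db_path="predictions.db"):
    """Serving path: no model inference at request time, just a lookup."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT items FROM recs WHERE user_id = ?", (user_id,)
    ).fetchone()
    conn.close()
    return json.loads(row[0]) if row else []  # fall back to generic recs
```

The serving path is just a key-value lookup, which is why it's fast, and also why it can't react to anything the user did after the last batch job ran.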
To make their input space finite, many apps make users choose from categories instead of entering free-form queries. For example, on TripAdvisor you first have to pick a predefined metropolitan area instead of being able to enter just any location.

This approach has many limitations. TripAdvisor results are okay within predefined categories, such as **"Restaurants"** in **"San Francisco"**, but are pretty bad when you enter a free-form query like **"high rating Thai restaurants in Hayes Valley"**.

![TripAdvisor's category-based search](https://www.experfy.com/blog/wp-content/uploads/2021/05/1_tripadvisor.png)

Limitations caused by batch predictions exist even in more technologically progressive companies like Netflix. Say you've been watching a lot of horror lately, so when you first log into Netflix, horror movies dominate your recommendations. But you're feeling bright today, so you search "comedy" and start browsing the comedy category. Netflix should learn and show you more comedy in your list of recommendations, right? But it can't update the list until the next batch of recommendations is generated.

In the two examples above, batch predictions degrade the user experience (which is tightly coupled with user engagement/retention) but don't cause catastrophic failures. Other examples are ad ranking, Twitter's trending hashtag ranking, Facebook's newsfeed ranking, estimating time of arrival, etc.

There are also many applications that, without online predictions, would lead to catastrophic failures or just wouldn't work: high-frequency trading, autonomous vehicles, voice assistants, unlocking your phone with your face or fingerprint, fall detection for elderly care, fraud detection, etc.
Being able to detect a fraudulent transaction that happened 3 hours ago is still better than not detecting it at all, but being able to detect it in real-time can prevent it from going through.

Switching from batch predictions to real-time predictions also allows you to use dynamic features to make more relevant predictions. Static features are information that changes slowly or rarely – age, gender, job, neighborhood, etc. Dynamic features are based on what's happening right now – what you're watching, what you've just liked, etc. Knowing a user's interests right now allows your system to make recommendations that are much more relevant to them.

![](https://www.experfy.com/blog/wp-content/uploads/2021/05/2_google.png)

### Solutions

For your system to be able to make online predictions, it has to have two components:

1. Fast inference: a model that can make predictions on the order of milliseconds.
2. Real-time pipeline: a pipeline that can process data, feed it into the model, and return a prediction in real-time.
#### Fast inference

When a model is too big and takes too long to make predictions, there are three approaches.

#### 1. Make models faster (inference optimization)

E.g. fusing operations, distributing computations, optimizing memory footprint, and writing high-performance kernels targeting specific hardware.

#### 2. Make models smaller (model compression)

Originally, this family of techniques was developed to make models small enough to fit on edge devices, but making models smaller often also makes them run faster. The most common, general technique for model compression is quantization, e.g. using 16-bit floats (half precision) or 8-bit integers (fixed-point) instead of 32-bit floats (full precision) to represent model weights. In the extreme case, some have attempted 1-bit representations (binary weight neural networks), e.g. [BinaryConnect](https://arxiv.org/abs/1511.00363) and [Xnor-Net](https://arxiv.org/abs/1603.05279).
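As an illustration, here's what post-training dynamic quantization looks like in PyTorch – a minimal sketch on a toy stand-in model; it's one of several quantization modes PyTorch offers, not the only way to quantize:

```python
import torch

# A stand-in for a trained model; in practice this would be your own network.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Post-training dynamic quantization: weights of the listed layer types are
# stored as 8-bit integers and dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller and usually faster on CPU
```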
The authors of Xnor-Net spun off Xnor.ai, a startup focused on model compression, which was [acquired by Apple for a reported $200M](https://www.geekwire.com/2020/exclusive-apple-acquires-xnor-ai-edge-ai-spin-paul-allens-ai2-price-200m-range/).

Another popular technique is [knowledge distillation](https://arxiv.org/abs/1503.02531) – a small model (the student) is trained to mimic a larger model or an ensemble of models (the teacher). Even though the student is usually trained with a pre-trained teacher, both can also be trained at the same time. One example of a distilled network used in production is [**DistilBERT**](https://arxiv.org/abs/1910.01108), which reduces the size of a BERT model by 40% while retaining 97% of its language understanding capabilities and being 60% faster.
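The core of distillation is a loss term that pushes the student's softened output distribution toward the teacher's. A minimal sketch of the standard formulation (the temperature `T` and weighting `alpha` here are hypothetical choices, tuned per task in practice):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine the usual hard-label loss with a soft-label loss that matches
    the student to the teacher's temperature-softened output distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    return alpha * hard + (1 - alpha) * soft
```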
Other techniques include pruning (finding the parameters least useful to predictions and setting them to 0) and low-rank factorization (replacing over-parametrized convolution filters with compact blocks to both reduce the number of parameters and increase speed). See [A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282) (Cheng et al., 2017) for a detailed analysis.

The number of research papers on model compression is growing, and off-the-shelf utilities are proliferating. Awesome Open Source has a list of [The Top 40 Model Compression Open Source Projects](https://awesomeopensource.com/projects/model-compression).

#### 3. Make hardware faster

This is another research area that is booming. Big companies and startups alike are racing to develop hardware that allows large ML models to do inference, and even training, faster, both in the cloud and especially on devices. IDC forecast that by 2020, the combination of edge and mobile devices doing inference would [total 3.7 billion units, with a further 116 million units doing training](https://www.arm.com/-/media/global/solutions/artificial-intelligence/ai-ml-on-cpu-whitepaper.pdf?revision=17a2b30b-0f5a-4a42-8681-3d9f3f94e513).

#### Real-time pipeline

Suppose you run a ride-sharing app and want to detect fraudulent transactions, e.g. payments made with stolen credit cards. When the true card owner discovers unauthorized payments, they'll dispute them with their bank and you'll have to refund the charges. To maximize profit, fraudsters might call multiple rides either in succession or from multiple accounts. In 2019, merchants estimated that fraudulent transactions cost an average of [27% of their annual online sales](https://network.americanexpress.com/globalnetwork/dam/jcr:09c34553-b4a2-43ca-bf3e-47cbc911ea51/American%20Express%202019%20Digital%20Payments%20Survey_Insights%20Paper.pdf). The longer it takes you to detect the stolen credit card, the more money you'll lose.

To detect whether a transaction is fraudulent, looking at that transaction alone isn't enough. You need to at least look at the recent history of the user involved: their recent trips and in-app activities, the credit card's recent transactions, and other transactions happening around the same time.

To access these kinds of information quickly, you want to keep as much of it in memory as possible. Every time an event you care about happens – a user choosing a location, booking a trip, contacting a driver, canceling a trip, adding a credit card, removing a credit card, etc. – information about that event goes into your in-memory storage. It stays there for as long as it's useful (usually on the order of days), then either goes into permanent storage (e.g. S3) or is discarded. The most common tool for this is [Apache Kafka](https://github.com/apache/kafka), with alternatives such as Amazon Kinesis. Kafka is stream storage: it stores data as it streams.

Streaming data is different from static data – data that already exists somewhere in its entirety, such as CSV files. When reading from CSV files, you know when the job is finished. Streams of data never finish.
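For a sense of what "events going into stream storage" looks like in practice, here's a minimal sketch using the `kafka-python` client; the topic name, broker address, and event schema are hypothetical:

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def emit(event_type, user_id, **payload):
    """Broadcast an in-app event to the 'user-events' topic as it happens."""
    event = {"type": event_type, "user_id": user_id, "ts": time.time(), **payload}
    producer.send("user-events", value=event)

emit("trip_booked", user_id="u123", trip_id="t456", price=23.5)
emit("card_added", user_id="u123", card_fingerprint="c789")
producer.flush()  # make sure buffered events actually reach the broker
```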
Once you have a way to manage streaming data, you want to extract features to feed into your ML models. On top of features from streaming data, you might also need features from static data (when was this account created, what's the user's rating, etc.). You need a tool that can process streaming data as well as static data and join them together from various data sources.

**Stream processing vs. batch processing**

People generally use "batch processing" to refer to processing static data, because the data can be processed in batches. This is opposed to "stream processing", which processes each event as it arrives. Batch processing is **efficient** – you can leverage tools like MapReduce to process large amounts of data. Stream processing is **fast** – you can process each piece of data as soon as it arrives. Robert Metzger, a PMC member of Apache Flink, disputed this dichotomy, arguing that stream processing can be just as efficient as batch processing because [batch is a special case of streaming](https://www.ververica.com/blog/batch-is-a-special-case-of-streaming).

Processing streaming data is harder because the amount of data is unbounded and the data comes in at variable rates and speeds. It's easier to make a stream processor do batch processing than to make a batch processor do stream processing.

Apache Kafka has some stream processing capacity, and some companies use it on top of their Kafka stream storage, but Kafka's stream processing is limited in its ability to deal with various data sources. There have been efforts to extend SQL, the popular query language intended for static data tables, to handle data streams [[1](http://cs.brown.edu/~ugur/streamsql.pdf), [2](https://en.wikipedia.org/wiki/StreamSQL)]. However, the most popular tool for stream processing is [Apache Flink](https://github.com/apache/flink), which also has native support for batch processing.

In the early days of machine learning production, many companies built their ML systems on top of their existing MapReduce/Spark/Hadoop data pipelines. When these companies want to do real-time inference, they need to build a separate pipeline for streaming data.

Having two different pipelines to process your data is a common cause of bugs in ML production: when changes in one pipeline aren't correctly replicated in the other, the two pipelines extract two different sets of features. This is especially common when the two pipelines are maintained by two different teams, e.g. the development team maintains the batch pipeline for training while the deployment team maintains the stream pipeline for inference. Companies including [Uber](https://www.infoq.com/presentations/sql-streaming-apache-flink/) and [Weibo](https://www.youtube.com/watch?v=WQ520rWgd9A&ab_channel=FlinkForward) have made major infrastructure overhauls to unify their batch and stream processing pipelines with Flink.
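One low-tech defense against this kind of feature skew, short of a full Flink migration, is to define feature logic exactly once and import it from both pipelines. A minimal sketch, with hypothetical feature and field names:

```python
# features.py: the single source of truth for feature logic,
# imported by BOTH the batch (training) and streaming (serving) pipelines.

def transaction_features(txn: dict, recent_txns: list[dict]) -> dict:
    """Compute model inputs from one transaction plus the user's recent history."""
    amounts = [t["amount"] for t in recent_txns]
    return {
        "amount": txn["amount"],
        "num_recent_txns": len(recent_txns),
        "mean_recent_amount": sum(amounts) / len(amounts) if amounts else 0.0,
        "new_card": txn["card_age_days"] < 1,
    }

# batch_pipeline.py maps transaction_features over a historical table;
# stream_pipeline.py calls it per incoming event. Any change lands in both.
```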
**Event-driven vs. request-driven**

The software world has gone the way of microservices in the last decade. The idea is to break your business logic into small components – each a self-contained service – that can be maintained independently. The owner of each component can update and test that component quickly without having to consult the rest of the system.

Microservices often go hand-in-hand with REST, a set of methods that let these microservices communicate. REST APIs are request-driven: a client (service) sends requests to tell its server exactly what to do via methods such as POST and GET, and the server responds with the results. A server has to be listening for a request for the request to register.

Because in a request-driven world data is handled via requests to different services, no one has an overview of how data flows through the entire system. Consider a simple system with 3 services:

- A manages driver availability
- B manages ride demand
- C predicts the best possible price to show customers each time they request a ride

Because prices depend on availability and demand, service C's output depends on the outputs of services A and B. First, this system requires inter-service communication: C needs to ping A and B for predictions, and A needs to ping B to know whether to mobilize more drivers and ping C to know what price incentives to give them. Second, there's no easy way to monitor how changes in A's or B's logic affect the performance of service C, or to map the data flow for debugging if service C's performance suddenly drops.

With only 3 services, things are already getting complicated. Imagine having hundreds, if not thousands, of services like major Internet companies have. Inter-service communication would blow up. Sending data as JSON blobs over HTTP – the way REST requests are commonly made – is also slow. Inter-service data transfer can become a bottleneck, slowing down the entire system.

Instead of having 20 services ping service A for data, what if, whenever an event happens within service A, that event is broadcast to a stream, and whichever service wants data from A can subscribe to that stream and pick out what it needs? What if there were a stream that all services could broadcast their events to and subscribe to? This model is called pub/sub: publish and subscribe. It's what solutions like Kafka allow you to do. Since all data flows through a stream, you can set up a dashboard to monitor your data and its transformations across your system.
Because it's based on events broadcast by services, this architecture is event-driven.

![Request-driven vs. event-driven architecture](https://www.experfy.com/blog/wp-content/uploads/2021/05/3_request_event.png)

[Beyond Microservices: Streams, State and Scalability](https://www.infoq.com/presentations/microservices-streams-state-scalability/) (Gwen Shapira, QCon 2019)

Request-driven architecture works well for systems that rely more on logic than on data. Event-driven architecture works better for systems that are data-heavy.
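To make the contrast concrete, here's what the subscribing side of pub/sub looks like with the `kafka-python` client – a minimal sketch, reusing the hypothetical "user-events" topic from the producer example above:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# The pricing service doesn't ping A and B; it subscribes to their events.
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    group_id="pricing-service",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:  # blocks, yielding events as they are broadcast
    event = message.value
    if event["type"] in ("trip_booked", "trip_canceled"):
        # Update this service's local view of demand; no request to B needed.
        print("demand signal:", event)
```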
### Challenges

Many companies are switching from batch processing to stream processing, and from request-driven architecture to event-driven architecture. My impression from talking to major Internet companies in the US and China is that this change is still slow in the US, but much faster in China. The adoption of streaming architecture is tied to the popularity of Kafka and Flink. Robert Metzger told me that he has observed more machine learning workloads with Flink in Asia than in the US. Google Trends for "Apache Flink" is consistent with this observation.

![Google Trends for "Apache Flink" by region](https://www.experfy.com/blog/wp-content/uploads/2021/05/4_mlops_china.png)

There are many reasons why streaming isn't more popular.

1. **Companies don't see the benefits of streaming**
   - Their system isn't at a scale where inter-service communication is a bottleneck.
   - They don't have applications that benefit from online predictions.
   - They have applications that could benefit from online predictions, but they don't know that yet because they've never done online predictions before.
2. **High initial investment in infrastructure**
   Infrastructure updates are expensive and can jeopardize existing applications. Managers might not be willing to invest in upgrading their infrastructure to allow online predictions.
3. **Mental shift**
   Switching from batch processing to stream processing requires a mental shift. With batch processing, you know when a job is done; with stream processing, it's never done. You can make rules such as "take the average of all data points in the last 2 minutes", but what about an event that happened 2 minutes ago, got delayed, and hasn't entered the stream yet? With batch processing you can have well-defined tables and join them, but in streaming there are no tables, so what does it mean to join two streams? (See the sketch after this list for a flavor of windowed computation over a stream.)
4. **Python incompatibility**
   Python is the lingua franca of machine learning, whereas Kafka and Flink run on Java and Scala. Introducing streaming might create language incompatibilities in the workflow. Apache Beam provides a Python interface on top of Flink for communicating with streams, but you'd still need people who can work with Java/Scala.
5. **Higher processing cost**
   Batch processing means you can use your computing resources more efficiently. If your hardware is capable of processing 1000 data points at a time, it's wasteful to use it to process only 1 data point at a time.
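To illustrate the mental shift, here's a minimal hand-rolled sketch of the "average over the last 2 minutes" rule on an unbounded stream; the caveat at the bottom is exactly the late-event problem described above:

```python
import time
from collections import deque

WINDOW_SECONDS = 120  # "the average of all data points in the last 2 minutes"
window = deque()      # (event_time, value) pairs, assumed to arrive in time order

def observe(event_time: float, value: float) -> float:
    """Ingest one event from the never-ending stream; return the window average."""
    window.append((event_time, value))
    cutoff = time.time() - WINDOW_SECONDS
    while window and window[0][0] < cutoff:  # evict events that fell out of the window
        window.popleft()
    return sum(v for _, v in window) / len(window)

# Caveat: this assumes events arrive in timestamp order. A delayed event lands
# at the back of the deque and breaks the eviction logic, which is why stream
# processors like Flink track event time with watermarks instead.
```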
## Level 2: Online learning – your system can incorporate new data and update in real-time

*__Real-time__ here is defined to be on the order of minutes.*

### Defining "online learning"

I used "online learning" instead of "online training" because the latter term is contentious. By definition, online training means learning from each incoming data point. Very, very few companies actually do this, because:

- This method suffers from catastrophic forgetting – neural networks abruptly forget previously learned information upon learning new information.
- It can be more expensive to run a learning step on a single data point than on a batch (though this can be mitigated by having hardware just powerful enough to process exactly one data point).

Even if a model learns from each incoming data point, it doesn't mean the new weights are deployed after each data point. With our current limited understanding of how ML algorithms learn, an updated model needs to be evaluated first to see how well it does.

For most companies that do so-called online training, their models learn in micro-batches and are evaluated after a certain period of time. Only after its performance is evaluated to be satisfactory is a model deployed more widely. Weibo's iteration cycle from [learning to deploying model](https://www.experfy.com/blog/ai-ml/overview-of-different-approaches-to-deploying-machine-learning-models-in-production/) updates is 10 minutes.

![Weibo's real-time ML iteration cycle](https://www.experfy.com/blog/wp-content/uploads/2021/05/5_weibo.png)

[Machine learning with Flink in Weibo](https://www.youtube.com/watch?v=WQ520rWgd9A) (Qian Yu, Flink Forward 2020)
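Here's a minimal sketch of that learn-evaluate-deploy loop using scikit-learn's `partial_fit`, which updates a model incrementally on micro-batches. The evaluation gate and threshold are hypothetical stand-ins for the real online evaluation described later in this post:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

model = SGDClassifier(loss="log_loss")
classes = [0, 1]  # all classes must be declared up front for partial_fit
deployed = None

def on_micro_batch(X, y, X_eval, y_eval, threshold=0.8):
    """Update the model on one micro-batch; promote it only if it evaluates well."""
    global deployed
    model.partial_fit(X, y, classes=classes)  # one incremental learning step
    score = accuracy_score(y_eval, model.predict(X_eval))  # gate on fresh data
    if score >= threshold:
        deployed = model  # in reality: ship the new weights to the serving fleet
    return score
```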
### Use cases

TikTok is incredibly addictive. Its secret lies in its [recommendation system](https://newsroom.tiktok.com/en-us/how-tiktok-recommends-videos-for-you), which learns your preferences quickly and suggests videos you're likely to watch next, giving users an incredible scrolling experience. This is possible because ByteDance, the company behind TikTok, has set up a mature infrastructure that allows its recommendation systems to learn user preferences ("user profiles" in their lingo) in real-time.

Recommendation systems are perfect candidates for online learning. They have natural labels – if a user clicks on a recommendation, it's a correct prediction. Not all recommendation systems need online learning, though. User preferences for items like houses, cars, flights, and hotels are unlikely to change from one minute to the next, so it would make little sense for those systems to learn continually. However, user preferences for online content – videos, articles, news, tweets, posts, memes – can change very quickly ("I just read that octopi sometimes punch fish for no reason and now I want to see a video of it"). And since preferences for online content change in real-time, ad systems also need to be updated in real-time to show relevant ads.

Online learning is crucial for systems to adapt to rare events. Consider online shopping on Black Friday. Because Black Friday happens only once a year, there's no way Amazon or other e-commerce sites can get enough historical data to learn how users are going to behave that day, so their systems need to learn continually on that day to adapt.

Or consider Twitter search when someone famous tweets something stupid. For example, as soon as the news about "Four Seasons Total Landscaping" went live, many people searched for "total landscaping". If your system doesn't immediately learn that "total landscaping" here refers to the press conference, your users are going to get a lot of gardening recommendations.

Online learning can also help with the cold-start problem: a user just joined your app and you have no information on them yet.
If you don't have the capacity for any form of online learning, you'll have to serve generic recommendations until the next time your model is trained offline.

### Solutions

Since online learning is still fairly new and most companies doing it aren't yet talking about it publicly in detail, there's no standard solution.

Online learning doesn't mean "no batch training". The companies that have used online learning most successfully also train their models offline in parallel, then combine the online version with the offline version.

### Challenges

There are many challenges facing online learning, both theoretical and practical.

#### Theoretical

Online learning flips a lot of what we've learned about machine learning on its head. In introductory machine learning classes, students are taught some version of "train your model for a sufficient number of epochs until convergence." In online learning there are no epochs – your model sees each data point only once. There's no such thing as convergence either: your underlying data distribution keeps shifting, so there's nothing stationary to converge to.

Another theoretical challenge for online learning is model evaluation.
In traditional batch training, you evaluate your models on stationary held-out test sets. If a new model performs better than the existing model on the same test set, we say the new model is better. However, the goal of online learning is to adapt your model to constantly changing data. If your updated model is trained to adapt to data now, and we know that data now is different from data in the past, it wouldn't make sense to test your updated model on old data.

How, then, do we know that the model trained on data from the last 10 minutes is better than the model trained on data from 20 minutes ago? We have to compare the two models on current data. Online training demands online evaluation, but serving an untested model to users sounds like a recipe for disaster.

Many companies do it anyway. New models are first subjected to offline tests to make sure they aren't disastrous, then evaluated online in parallel with the existing models via a complex A/B testing system. Only when a model is shown to be better than the existing model on some metrics the company cares about is it deployed more widely. (Don't get me started on choosing a metric for online evaluation.)
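A stripped-down version of that online comparison might look like the following – a sketch, not any particular company's A/B system; the split ratio, routing, and logging are all hypothetical:

```python
import random

def route_request(features, champion, challenger, log, challenger_share=0.05):
    """Send a small slice of live traffic to the new model and log outcomes,
    so both models are judged on the same current data distribution."""
    arm = "challenger" if random.random() < challenger_share else "champion"
    model = challenger if arm == "challenger" else champion
    prediction = model.predict(features)
    log.append({"arm": arm, "prediction": prediction})  # join with outcomes later
    return prediction
```

The challenger only gets promoted once the logged outcomes, joined with whatever metric the business cares about, show it beating the champion on current data.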
#### Practical

There are no standard infrastructures for online training yet. Some companies have converged on streaming architectures with [parameter servers](https://web.eecs.umich.edu/~mosharaf/Readings/Parameter-Server.pdf), but beyond that, the companies I've talked to that do online training have had to build much of their infrastructure in-house. I'm reluctant to discuss this in detail since some of these companies asked me to keep the information confidential: the solutions they're building are their competitive advantage.

## The MLOps race between the US and China

I've read a lot about the AI race between the US and China, but most comparisons seem to focus on the number of [research papers, patents, citations, and the amount of funding](https://datainnovation.org/2019/08/who-is-winning-the-ai-race-china-the-eu-or-the-united-states/). It was only after I started talking to both American and Chinese companies about real-time machine learning that I noticed a staggering difference in their MLOps infrastructures.

Few American Internet companies have attempted online learning, and even among those that have, online learning is used only for simple models such as logistic regression. My impression, both from talking directly to Chinese companies and from talking with people who have worked with companies in both countries, is that online learning is more common in China, and Chinese engineers are more eager to make the jump.
<h2>The MLOps race between the US and China</h2>

<p>I’ve read a lot about the AI race between the US and China, but most comparisons focus on the number of <a href="https://datainnovation.org/2019/08/who-is-winning-the-ai-race-china-the-eu-or-the-united-states/" target="_blank" rel="noreferrer noopener">research papers, patents, and citations, and the amount of funding</a>. Only after I started talking to both American and Chinese companies about real-time machine learning did I notice a staggering difference in their MLOps infrastructure.</p>

<p>Few American Internet companies have attempted online learning, and even among those that have, online learning is used only for simple models such as logistic regression. My impression, from talking directly to Chinese companies and to people who have worked with companies in both countries, is that online learning is more common in China, and Chinese engineers are more eager to make the jump. You can see some of the conversations <a href="https://twitter.com/chipro/status/1337077324936663040" target="_blank" rel="noreferrer noopener">here</a> and <a href="https://www.linkedin.com/posts/chiphuyen_mlops-machinelearning-activity-6742844916705177600-taRd" target="_blank" rel="noreferrer noopener">here</a>.</p>

<figure><img src="https://www.experfy.com/blog/wp-content/uploads/2021/05/6_mlops_china_us.png" alt="MLOps over time" /></figure>

<h2>Conclusion</h2>

<p>Machine learning is going real-time, whether you’re ready or not. While the majority of companies are still debating whether there’s value in online inference and online learning, some of those doing it correctly have already seen returns on their investments, and their real-time algorithms may be a major factor keeping them ahead of their competitors.</p>
<p>I have a lot more thoughts on real-time machine learning, but this post is already long. If you’re interested in chatting about this, shoot me an email.</p>

<h4>Acknowledgments</h4>

<p>This post is a synthesis of many conversations with the following wonderful engineers and academics. I’d like to thank Robert Metzger, Neil Lawrence, Savin Goyal, Zhenzhong Xu, Ville Tuulos, Dat Tran, Han Xiao, Hien Luu, Ledio Ago, Peter Skomoroch, Piero Molino, Daniel Yao, Jason Sleight, Becket Qin, Tien Le, Abraham Starosta, Will Deaderick, Caleb Kaiser, and Miguel Ramos.</p>

<p>Several more people have chosen to stay anonymous. Without them, this post would be incomplete.</p>

<p>Thanks <a href="https://twitter.com/Luke_Metz" rel="noopener">Luke Metz</a> for being an amazing first reader!</p>