{"id":1637,"date":"2019-04-15T03:22:17","date_gmt":"2019-04-15T03:22:17","guid":{"rendered":"http:\/\/kusuaks7\/?p=1242"},"modified":"2023-06-29T12:35:54","modified_gmt":"2023-06-29T12:35:54","slug":"generalizable-deep-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/generalizable-deep-reinforcement-learning\/","title":{"rendered":"Everything you need to know about Google\u2019s new PlaNet reinforcement learning network"},"content":{"rendered":"<section>\n<h3 style=\"color: #aaa; font-style: italic;\">What Google AI\u2019s PlaNet AI means for reinforcement learning research and how transfer learning plays a key\u00a0role.<\/h3>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 600px; height: 400px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2560\/1*PSIWp3AA8eoxMuHxZKXn9g.jpeg\" \/><\/p>\n<p style=\"text-align: center;\">Learning to walk before we can\u00a0run<\/p>\n<p id=\"0385\">Transfer learning is all the rage in the machine learning community these days.<\/p>\n<p id=\"4af8\">Transfer learning serves as the basis for many of the managed AutoML services that\u00a0<a href=\"https:\/\/cloud.google.com\/automl\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/cloud.google.com\/automl\/\" data->Google<\/a>,\u00a0<a href=\"https:\/\/engineering.salesforce.com\/open-sourcing-transmogrifai-4e5d0e098da2\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/engineering.salesforce.com\/open-sourcing-transmogrifai-4e5d0e098da2\" data->Salesforce<\/a>,\u00a0<a href=\"https:\/\/www.ibm.com\/watson\/services\/visual-recognition\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.ibm.com\/watson\/services\/visual-recognition\/\" data->IBM<\/a>, and\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/announcing-automated-ml-capability-in-azure-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\" 
data-href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/announcing-automated-ml-capability-in-azure-machine-learning\/\" data->Azure<\/a>\u00a0provide. It now figures prominently in the latest NLP research\u200a\u2014\u200aappearing in Google\u2019s Bidirectional Encoder Representations from Transformers (<a href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\" data->BERT<\/a>) model and in Sebastian Ruder and Jeremy Howard\u2019s Universal Language Model Fine-tuning for Text Classification (<a href=\"https:\/\/arxiv.org\/abs\/1801.06146\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/abs\/1801.06146\" data->ULMFIT<\/a>).<\/p>\n<p id=\"c877\">As Sebastian writes in his blog post, \u2018<a href=\"http:\/\/ruder.io\/nlp-imagenet\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/ruder.io\/nlp-imagenet\/\" data->NLP\u2019s ImageNet moment has arrived<\/a>\u2019:<\/p>\n<blockquote id=\"a28c\"><p>These works\u00a0<a href=\"https:\/\/blog.openai.com\/language-unsupervised\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/blog.openai.com\/language-unsupervised\/\" data->made<\/a>\u00a0<a href=\"https:\/\/techcrunch.com\/2018\/06\/15\/machines-learn-language-better-by-using-a-deep-understanding-of-words\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/techcrunch.com\/2018\/06\/15\/machines-learn-language-better-by-using-a-deep-understanding-of-words\/\" data->headlines<\/a>\u00a0by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks. 
Such methods herald a watershed moment: they may have the same wide-ranging impact on NLP as pretrained ImageNet models had on computer vision.<\/p><\/blockquote>\n<p id=\"d85f\">We\u2019re also starting to see examples of neural networks that can handle multiple tasks using transfer learning\u00a0<em>across domains<\/em>.\u00a0<a href=\"https:\/\/medium.com\/@paraschopra\" target=\"_blank\" rel=\"noopener noreferrer\" data-action=\"show-user-card\" data-action-type=\"hover\" data-action-value=\"ce4d7f282c52\" data-anchor-type=\"2\" data-href=\"https:\/\/medium.com\/@paraschopra\" data-user-id=\"ce4d7f282c52\" data->Paras Chopra<\/a>\u00a0has an excellent tutorial for one PyTorch network that can conduct an image search based on a textual description, search for similar images and words, and write captions for images (link to his post below).<\/p>\n<p><a title=\"https:\/\/towardsdatascience.com\/one-neural-network-many-uses-image-captioning-image-search-similar-image-and-words-in-one-model-1e22080ce73d\" href=\"https:\/\/towardsdatascience.com\/one-neural-network-many-uses-image-captioning-image-search-similar-image-and-words-in-one-model-1e22080ce73d\" data-href=\"https:\/\/towardsdatascience.com\/one-neural-network-many-uses-image-captioning-image-search-similar-image-and-words-in-one-model-1e22080ce73d\" data- rel=\"noopener\"><strong>One neural network, many uses<\/strong><br \/>\n<em>Build image search, image captioning, similar words and similar images using a single model<\/em>towardsdatascience.com<\/a><\/p>\n<p id=\"5f85\"><strong>The main question at hand is:<\/strong>\u00a0<strong>could transfer learning have applications within reinforcement learning?<\/strong><\/p>\n<p id=\"a26f\">Compared to other machine learning methods, deep reinforcement learning has a reputation for being data hungry, subject to instability in its learning process (see Deepmind\u2019s\u00a0<a 
href=\"https:\/\/www.nature.com\/articles\/nature14236\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.nature.com\/articles\/nature14236\">paper on RL with neural networks<\/a>), and a laggard in terms of performance. There\u2019s a reason why the main areas and use cases where we\u2019ve seen reinforcement learning applied are games and robotics\u200a\u2014\u200a<em>namely, scenarios that can generate significant amounts of simulated data.<\/em><\/p>\n<p id=\"35cc\">At the same time, many believe that reinforcement learning is still the most viable approach for achieving Artificial General Intelligence (AGI). Yet reinforcement learning continually bumps up against the limits of its ability to\u00a0<em>generalize to many tasks in diverse settings\u200a\u2014\u200a<\/em>a key attribute of intelligence.<\/p>\n<p id=\"bc86\">After all, learning is not an easy task. These reinforcement learning agents must process and derive efficient representations of their environment when these environments have high-dimensional sensory inputs and either no notion of, or an extremely delayed notion of, progress, reward, or success. On top of that, they have to use this information to generalize past experiences to new situations.<\/p>\n<\/section>\n<section>\n<hr \/>\n<blockquote id=\"dda2\"><p><strong>Up to this point, reinforcement learning techniques and research have primarily focused on mastery of individual tasks. 
I was interested to see if transfer learning could help reinforcement learning research achieve generality\u200a\u2014\u200aso I was very excited when the Google AI team released the\u00a0<\/strong><a href=\"https:\/\/arxiv.org\/pdf\/1811.04551.pdf\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/pdf\/1811.04551.pdf\" data-><strong>Deep Planning Network (PlaNet) agent<\/strong><\/a><strong>\u00a0earlier this\u00a0year.<\/strong><\/p><\/blockquote>\n<h3 id=\"795a\">Behind PlaNet<\/h3>\n<p id=\"3677\">For the project, the PlaNet agent was tasked with \u2018planning\u2019 a sequence of actions to achieve a goal like pole balancing, teaching a virtual entity (human or cheetah) to walk, or keeping a box rotating by hitting it in a specific location.<\/p>\n<figure id=\"d5d1\"><img decoding=\"async\" style=\"width: 700px; height: 385px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*S36WkYW62J7oOgbx1rWRvA.gif\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*S36WkYW62J7oOgbx1rWRvA.gif\" \/><\/figure>\n<p style=\"text-align: center;\">Overview of the six tasks that the Deep Planning Network (PlaNet) agent had to perform.\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=tZk1eof_VNA&amp;feature=youtu.be\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=tZk1eof_VNA&amp;feature=youtu.be\" data->See the longer\u00a0video<\/a><\/p>\n<p id=\"7aaf\">From\u00a0<a href=\"https:\/\/ai.googleblog.com\/2019\/02\/introducing-planet-deep-planning.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/ai.googleblog.com\/2019\/02\/introducing-planet-deep-planning.html\" data->the original Google AI blog post introducing PlaNet<\/a>, here are the six tasks (plus the challenges associated with each):<\/p>\n<ul>\n<li id=\"3671\"><strong>Cartpole Balance:\u00a0<\/strong>starting from a balancing position, the agent must quickly 
learn to keep the pole upright<\/li>\n<li id=\"e7a0\"><strong>Cartpole Swingup:<\/strong>\u00a0the camera is fixed, so the cart can move out of sight. The agent thus must absorb and remember information over multiple frames.<\/li>\n<li id=\"df68\"><strong>Finger Spin:<\/strong>\u00a0requires predicting two separate objects, as well as the interactions between them.<\/li>\n<li id=\"73f9\"><strong>Cheetah Run<\/strong>: includes contacts with the ground that are difficult to predict precisely, calling for a model that can predict multiple possible futures.<\/li>\n<li id=\"4a0c\"><strong>Cup Catch:<\/strong>\u00a0only provides a sparse reward signal once a ball is caught. This demands accurate predictions far into the future to plan a precise sequence of actions.<\/li>\n<li id=\"b421\"><strong>Walker Walk:<\/strong>\u00a0a simulated robot starts off lying on the ground and must first learn to stand up and then walk.<\/li>\n<\/ul>\n<p id=\"627b\">There are a few goals common to these tasks that PlaNet needed to achieve:<\/p>\n<ol>\n<li id=\"45c4\">The agent needs to predict a variety of possible futures (for robust planning)<\/li>\n<li id=\"7bda\">The agent needs to update the plan based on the outcomes\/rewards of a recent action<\/li>\n<li id=\"bc8f\">The agent needs to retain information over many time steps<\/li>\n<\/ol>\n<p id=\"43fd\">So how did the Google AI team achieve these goals?<\/p>\n<h3 id=\"8cce\">PlaNet AI\u2026and the\u00a0rest?<\/h3>\n<p id=\"1e74\">PlaNet AI marked a departure from traditional reinforcement learning in three distinct ways:<\/p>\n<ol>\n<li id=\"a683\"><strong>Learning with a latent dynamics model\u200a<\/strong>\u2014\u200aPlaNet learns from a series of hidden or latent states\u00a0<em>instead of images<\/em>\u00a0to predict the latent state moving forward.<\/li>\n<li id=\"7754\"><strong>Model-based planning<\/strong>\u200a\u2014\u200aPlaNet works without a policy network and instead makes decisions based on 
continuous planning.<\/li>\n<li id=\"2ed1\"><strong>Transfer learning<\/strong>\u200a\u2014\u200aThe Google AI team trained a single PlaNet agent to solve all six different tasks.<\/li>\n<\/ol>\n<p id=\"f232\">Let\u2019s dig into each one of these differentiators and see how they impact model performance.<\/p>\n<h4 id=\"1d15\">#1 Latent Dynamics\u00a0Model<\/h4>\n<p id=\"28b8\">The authors\u2019 main decision here was whether to use compact latent states or original sensory inputs from the environment.<\/p>\n<p id=\"c002\">There are a few trade-offs here. Using a compact latent space means an extra difficulty bump because now the agent not only has to learn to defeat the game but also has to build an understanding of the visual concepts within the game\u200a\u2014\u200athis encoding and decoding of images requires significant computation.<\/p>\n<p id=\"2748\">The key benefit of using a compact latent state space is that it allows the agent to learn more abstract representations, like objects\u2019 positions and velocities, and avoids having to generate images. 
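<\/p>
<p>To make the latent-dynamics idea concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative stand-in rather than PlaNet\u2019s actual architecture: an encoder compresses an observation into a tiny latent state, a transition function predicts the next latent, and a reward head scores latents directly, so imagined rollouts never touch pixels.<\/p>
<pre>
```python
# Toy latent dynamics model -- all names and numbers are illustrative stand-ins.

def encode(observation):
    # stand-in for a convolutional encoder: compress pixels to (mean, spread)
    mean = sum(observation) / len(observation)
    spread = max(observation) - min(observation)
    return (mean, spread)

def transition(latent, action):
    # stand-in learned dynamics: predict the next latent from latent and action
    mean, spread = latent
    return (mean + 0.5 * action, 0.9 * spread)

def reward_head(latent):
    # stand-in reward model: score the latent directly, no image decoding
    mean, spread = latent
    return -abs(mean) - spread

observation = [0.2, 0.4, 0.6, 0.8]  # a fake 4-pixel frame
z = encode(observation)
z_next = transition(z, 1.0)  # imagine one step ahead, purely in latent space
print(reward_head(z_next))
```
<\/pre>
<p>Because the rollout happens entirely in the two-number latent, the decoder is only ever needed at training time, not while planning.<\/p>
<p>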
This means that the actual planning is much faster because the agent only needs to predict future rewards and not images or the scenario.<\/p>\n<p id=\"f7ae\">Latent dynamics models are being more commonly used now since researchers argue that \u201c<a href=\"https:\/\/deepdrive.berkeley.edu\/node\/209\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/deepdrive.berkeley.edu\/node\/209\" data->the simultaneous training of a latent dynamics model in conjunction with a provided reward will create a latent embedding sensitive to factors of variation relevant the reward signal and insensitive to extraneous factors of the simulated environment used during training<\/a>.\u201d<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/0*FNEcEgpVgc1EblVk.png\" \/><\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/ai.googleblog.com\/2019\/02\/introducing-planet-deep-planning.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/ai.googleblog.com\/2019\/02\/introducing-planet-deep-planning.html\" data->Learned Latent Dynamics Model\u200a<\/a>\u2014\u200aInstead of using the input images directly, the encoder networks (gray trapezoids) compress the images\u2019 information into hidden states (green circles). These hidden states are then used to predict future images (blue trapezoids) and rewards (blue rectangle).<\/p>\n<blockquote id=\"5723\"><p>Check out this excellent paper \u2018<a href=\"https:\/\/arxiv.org\/abs\/1903.10404\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/abs\/1903.10404\" data->On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning<\/a>\u2019, where they state:<\/p><\/blockquote>\n<blockquote id=\"c7fb\"><p>In autonomous embedded systems, it is often vital to reduce the amount of actions taken in the real world and energy required to learn a policy. 
Training reinforcement learning agents from high dimensional image representations can be very expensive and time consuming. Autoencoders are deep neural network used to compress high dimensional data such as pixelated images into small latent representations.<\/p><\/blockquote>\n<h4 id=\"4597\">#2 Model-based Planning vs. Model-free<\/h4>\n<figure id=\"3f89\"><img decoding=\"async\" style=\"width: 700px; height: 118px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*eyQbfEfcMONf341r14ZmEw.png\" data-image-id=\"1*eyQbfEfcMONf341r14ZmEw.png\" \/><\/figure>\n<p style=\"text-align: center;\"><a href=\"https:\/\/medium.com\/@jonathan_hui\/rl-model-based-reinforcement-learning-3c2b6f0aa323\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/medium.com\/@jonathan_hui\/rl-model-based-reinforcement-learning-3c2b6f0aa323\" data->Great diagram<\/a>\u00a0from\u00a0<a href=\"https:\/\/medium.com\/@jonathan_hui\" target=\"_blank\" rel=\"noopener noreferrer\" data-action=\"show-user-card\" data-action-type=\"hover\" data-action-value=\"bd51f1a63813\" data-anchor-type=\"2\" data-href=\"https:\/\/medium.com\/@jonathan_hui\" data-user-id=\"bd51f1a63813\" data->Jonathan Hui<\/a>\u00a0showing the spectrum of reinforcement learning approaches<\/p>\n<p id=\"253e\">Model-based reinforcement learning attempts to have agents learn how the world behaves in general. Instead of directly mapping observations to actions, this allows an agent to explicitly\u00a0<em>plan ahead,\u00a0<\/em>to more carefully select actions by \u201cimagining\u201d their long-term outcomes. 
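<\/p>
<p>A bare-bones version of this planning loop can be sketched in plain Python. This is a random-shooting planner over a made-up one-dimensional model (PlaNet itself uses the cross-entropy method over latent states): sample candidate action sequences, roll each through the model, score them by predicted total reward, and execute only the first action of the best sequence.<\/p>
<pre>
```python
import random

def model_step(state, action):
    # hypothetical learned dynamics: the state drifts toward the chosen action
    return state + 0.5 * (action - state)

def predicted_reward(state):
    # hypothetical learned reward: highest when the state reaches the target 1.0
    return -abs(state - 1.0)

def plan(state, horizon=5, candidates=100, seed=0):
    # score many imagined action sequences and keep the best opening move
    rng = random.Random(seed)
    best_score, best_first = None, None
    for _ in range(candidates):
        actions = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        s, score = state, 0.0
        for a in actions:  # imagined rollout, no real environment steps
            s = model_step(s, a)
            score += predicted_reward(s)
        if best_score is None or score > best_score:
            best_score, best_first = score, actions[0]
    return best_first  # only the first action gets executed

print(plan(0.0))
```
<\/pre>
<p>After executing that single action and observing the new state, the agent simply replans, which is how this style of control keeps updating its plan as outcomes arrive.<\/p>
<p>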
The benefit of taking a model-based approach is that it\u2019s much more sample efficient\u200a\u2014\u200ameaning that it doesn\u2019t learn each new task from scratch.<\/p>\n<p id=\"7459\">One way to look at the difference between model-free and model-based reinforcement learning is to see whether we\u2019re optimizing for maximum rewards or least cost (model-free = max rewards while model-based = least cost).<\/p>\n<p id=\"5f73\">Model-free reinforcement learning techniques like Policy Gradients can be\u00a0<em>brute force<\/em>\u00a0solutions, where the correct actions are eventually discovered and internalized into a policy. Policy Gradients have to actually experience a positive reward, and experience it very often, in order to eventually and slowly shift the policy parameters towards repeating moves that give high rewards.<\/p>\n<blockquote id=\"d8e3\"><p>One interesting note is how the type of task affects which approach you might choose to take. In Andrej Karpathy\u2019s awesome post \u2018<a href=\"http:\/\/karpathy.github.io\/2016\/05\/31\/rl\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/karpathy.github.io\/2016\/05\/31\/rl\/\" data->Deep Reinforcement Learning: Pong from Pixels<\/a>\u2019, he describes games\/tasks where Policy Gradients can beat humans:<\/p><\/blockquote>\n<blockquote id=\"4dd4\"><p>\u201cThere are many games where Policy Gradients would quite easily defeat a human. In particular, anything with frequent reward signals that requires precise play, fast reflexes, and not too much long-term planning would be ideal, as these short-term correlations between rewards and actions can be easily \u201cnoticed\u201d by the approach, and the execution meticulously perfected by the policy. You can see hints of this already happening in our Pong agent: it develops a strategy where it waits for the ball and then rapidly dashes to catch it just at the edge, which launches it quickly and with high vertical velocity. 
The agent scores several points in a row repeating this strategy. There are many ATARI games where Deep Q Learning destroys human baseline performance in this fashion\u200a\u2014\u200ae.g. Pinball, Breakout, etc.\u201d<\/p><\/blockquote>\n<h4 id=\"fddc\">#3 Transfer\u00a0Learning<\/h4>\n<p id=\"c172\">After the first game, the PlaNet agent already had a rudimentary understanding of gravity and dynamics and was able to re-use that knowledge in subsequent games. As a result, PlaNet was often 50 times more efficient than previous techniques that learned from scratch. This meant that the agent only needed to look at five frames of an animation (literally one-fifth of a second of footage) to be able to predict how the sequence would continue with remarkably high accuracy. Implementation-wise, it meant that the team did not have to train six separate models to achieve solid performance on the tasks.<\/p>\n<blockquote id=\"e7c2\"><p>From the paper: \u201cPlaNet solves a variety of image-based control tasks, competing with advanced model-free agents in terms of final performance while being 5000% more data efficient on average\u2026These learned dynamics can be independent of any specific task and thus have the potential to transfer well to other tasks in the environment\u201d<\/p><\/blockquote>\n<p id=\"1ad7\">Check out the stunning data efficiency gain that PlaNet had over D4PG with only 2,000 episodes:<\/p>\n<figure id=\"10a7\"><img decoding=\"async\" style=\"width: 700px; height: 227px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*cZxTxn5dc0zPPjTVaG5j5w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*cZxTxn5dc0zPPjTVaG5j5w.png\" \/><\/figure>\n<p style=\"text-align: center;\">From\u00a0<a href=\"https:\/\/planetrl.github.io\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/planetrl.github.io\/\" data->the paper<\/a>: PlaNet clearly outperforms A3C on all tasks and reaches final performance close 
to D4PG while using 5000% less interaction with the environment on\u00a0average.<\/p>\n<p id=\"297b\">Also take a look at these plots of test performance against the number of collected episodes (PlaNet is in blue):<\/p>\n<figure id=\"3410\"><img decoding=\"async\" style=\"width: 700px; height: 416px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ObCKMHCFLCMgMyFe5ghqag.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*ObCKMHCFLCMgMyFe5ghqag.png\" \/><\/figure>\n<p style=\"text-align: center;\">Figure 4 from\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1811.04551.pdf\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/pdf\/1811.04551.pdf\" data->the PlaNet paper<\/a>\u00a0comparing PlaNet against model-free algorithms.<\/p>\n<p id=\"bbaf\">These are incredibly exciting results that signal a new era of data-efficient and generalizable reinforcement learning. Keep your eye on this space!<\/p>\n<p id=\"d409\"><strong>Want to learn more? 
Here are some other great resources on reinforcement learning:<\/strong><\/p>\n<ul>\n<li id=\"9755\"><a href=\"https:\/\/www.topbots.com\/most-important-ai-reinforcement-learning-research\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.topbots.com\/most-important-ai-reinforcement-learning-research\/\" data->TOPBOTS\u2019 Most Important AI Reinforcement Learning Research<\/a><\/li>\n<li id=\"7034\"><a href=\"https:\/\/www.youtube.com\/watch?v=fdY7dt3ijgY\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=fdY7dt3ijgY\" data->OpenAI\u2019s Spinning Up in Deep RL tutorial<\/a><\/li>\n<li id=\"342e\"><a href=\"https:\/\/www.youtube.com\/watch?v=2pWv7GOvuf0&amp;list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=2pWv7GOvuf0&amp;list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT\" data->DeepMind\u2019s David Silver\u2019s RL Course (Lectures 1\u201310)<\/a><\/li>\n<li id=\"1f56\"><a href=\"https:\/\/skymind.ai\/wiki\/deep-reinforcement-learning\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/skymind.ai\/wiki\/deep-reinforcement-learning\" data->Skymind.ai\u2019s Deep Reinforcement Learning<\/a><\/li>\n<li id=\"ff0d\"><a href=\"http:\/\/karpathy.github.io\/2016\/05\/31\/rl\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/karpathy.github.io\/2016\/05\/31\/rl\/\" data->Andrej Karpathy\u2019s Deep Reinforcement Learning: Pong from Pixels<\/a><\/li>\n<li id=\"ab8a\">[Plus a fun Transfer Learning resource]\u00a0Dipanjan (DJ) Sarkar\u2019s\u00a0<a href=\"https:\/\/towardsdatascience.com\/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a\" target=\"_blank\" rel=\"noopener noreferrer\" 
data-href=\"https:\/\/towardsdatascience.com\/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a\" data->Transfer Learning Guide<\/a><\/li>\n<\/ul>\n<p>Originally published at\u00a0<a href=\"https:\/\/towardsdatascience.com\/everything-you-need-to-know-about-googles-new-planet-reinforcement-learning-network-144c2ca3f284\" rel=\"noopener\">Towards Data Science<\/a>.<\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Transfer learning is all the rage in the machine learning community these days. It serves as the basis for many of the managed AutoML services and now figures prominently in the latest NLP research. We&rsquo;re also starting to see examples of neural networks that can handle multiple tasks using transfer learning&nbsp;across domains. The main question at hand is:&nbsp;could transfer learning have applications within reinforcement learning? Compared to other machine learning methods, deep reinforcement learning has a reputation for being data hungry, subject to instability in its learning process.<\/p>\n","protected":false},"author":534,"featured_media":2488,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[92],"ppma_author":[3190],"class_list":["post-1637","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-machine-learning"],"authors":[{"term_id":3190,"user_id":534,"is_guest":0,"slug":"cecelia-shao","display_name":"Cecelia Shao","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Shao","first_name":"Cecelia","job_title":"","description":"Cecelia Shao&nbsp;is looking after Product Growth at <a href=\"http:\/\/www.comet.ml\/\">Comet<\/a> that is doing for AI what GitHub did for 
software."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1637","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/534"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1637"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1637\/revisions"}],"predecessor-version":[{"id":28970,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1637\/revisions\/28970"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2488"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1637"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}