{"id":22772,"date":"2021-04-29T07:44:00","date_gmt":"2021-04-29T07:44:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/multi-task-robotic-reinforcement-learning-scale\/"},"modified":"2023-08-23T06:15:42","modified_gmt":"2023-08-23T06:15:42","slug":"multi-task-robotic-reinforcement-learning-scale","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/multi-task-robotic-reinforcement-learning-scale\/","title":{"rendered":"Multi-Task Robotic Reinforcement Learning At Scale"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22772\" class=\"elementor elementor-22772\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-d75b761 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d75b761\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b0e446e\" data-id=\"b0e446e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1634a0a elementor-widget elementor-widget-text-editor\" data-id=\"1634a0a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>For general-purpose robots to be most useful, they would need to be able to perform a range of tasks, such as cleaning, maintenance and delivery. 
But training even a single task (e.g., grasping) using&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2020\/08\/tackling-open-challenges-in-offline.html\" rel=\"noopener\">offline reinforcement learning<\/a>&nbsp;(RL), a trial-and-error learning method in which the agent trains on previously collected data, can take&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2018\/06\/scalable-deep-reinforcement-learning.html\" rel=\"noopener\">thousands of robot-hours<\/a>, in addition to the significant engineering needed to enable autonomous operation of a large-scale robotic system. Thus, the computational costs of building general-purpose&nbsp;<a href=\"https:\/\/x.company\/projects\/everyday-robots\/\">everyday robots<\/a>&nbsp;using current robot learning methods become prohibitive as the number of tasks grows.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-93fd9fc elementor-widget elementor-widget-text-editor\" data-id=\"93fd9fc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-table alignfull is-style-regular\"><table><tbody><tr><td><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/image1.gif\"><a href=\"https:\/\/1.bp.blogspot.com\/-3zCzaQm-_Fo\/YHn5zp0Iv_I\/AAAAAAAAHZs\/Fyg3TSZX28wlvIEuv5t1h1CiLH_YDIa6wCLcBGAsYHQ\/s512\/image1.gif\" target=\"_blank\" rel=\"noreferrer noopener nofollow\"><\/a><\/td><\/tr><tr><td>Multi-task data collection across multiple robots where different robots collect data for different tasks.<\/td><\/tr><\/tbody><\/table><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0a93154 elementor-widget elementor-widget-text-editor\" data-id=\"0a93154\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In other large-scale machine learning domains, such as\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Natural_language_processing\" rel=\"noopener\">natural language processing<\/a>\u00a0and\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Computer_vision\" rel=\"noopener\">computer vision<\/a>, a number of strategies have been applied to amortize the effort of learning over multiple skills. For example,\u00a0<a href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\" rel=\"noopener\">pre-training<\/a>\u00a0on large natural language datasets can enable few- or zero-shot learning of multiple tasks, such as\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Question_answering\" rel=\"noopener\">question answering<\/a>\u00a0and\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis\" rel=\"noopener\">sentiment analysis<\/a>. However, because robots collect their own data, robotic skill learning presents a unique set of opportunities and challenges. Automating this process is a large engineering endeavour, and effectively reusing\u00a0<a href=\"https:\/\/bair.berkeley.edu\/blog\/2019\/11\/26\/robo-net\/\" rel=\"noopener\">past robotic data collected by different robots<\/a>\u00a0remains an open problem.<\/p>\n<p>Today we present two new advances for robotic RL at scale,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2104.08212\" rel=\"noopener\">MT-Opt<\/a>, a new multi-task RL system for automated data collection and multi-task RL training, and\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2104.07749\" rel=\"noopener\">Actionable Models<\/a>, which leverages the acquired data for\u00a0<a href=\"https:\/\/bair.berkeley.edu\/blog\/2018\/09\/06\/rig\/\" rel=\"noopener\">goal-conditioned<\/a>\u00a0RL. 
MT-Opt introduces a scalable data-collection mechanism that is used to collect over 800,000 episodes of various tasks on real robots and demonstrates a successful application of multi-task RL that yields ~3x average improvement over baseline. Additionally, it enables robots to master new tasks quickly through use of its extensive multi-task dataset (new task fine-tuning in 1 day of data collection). Actionable Models enables learning in the absence of specific tasks and rewards by training an implicit model of the world that is also an actionable robotic policy. This drastically increases the number of tasks the robot can perform (via visual goal specification) and enables more efficient learning of downstream tasks.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f0f4ae9 elementor-widget elementor-widget-heading\" data-id=\"f0f4ae9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Large-Scale Multi-Task Data Collection System<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-13f866f elementor-widget elementor-widget-text-editor\" data-id=\"13f866f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The cornerstone for both MT-Opt and Actionable Models is the volume and quality of training data. To collect diverse, multi-task data at scale, users need a way to specify tasks, decide for which tasks to collect the data, and finally, manage and balance the resulting dataset. To that end, we create a scalable and intuitive multi-task success detector using data from all of the chosen tasks. 
The multi-task success detector is trained using supervised learning to detect the outcome of a given task, which allows users to quickly define new tasks and their rewards. While the success detector is applied to collect data, it is periodically updated to accommodate distribution shifts caused by various real-world factors, such as varying lighting conditions, changing background surroundings, and novel states that the robots discover.<\/p>\n\n<p>Second, we simultaneously collect data for multiple distinct tasks across multiple robots by using solutions to easier tasks to effectively bootstrap learning of more complex tasks. This allows training of a policy for the harder tasks and improves the data collected for them. As such, the amount of per-task data and the number of successful episodes for each task grow over time. To further improve performance, we focus data collection on underperforming tasks, rather than collecting data uniformly across tasks.<\/p>\n\n<p>This system collected 9600 robot hours of data (from 57 continuous data collection days on seven robots). 
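The "focus collection on underperforming tasks" idea can be sketched as a weighted task sampler. This is an illustrative Python sketch, not the system's actual scheduler; the task names, the failure-rate weighting, and the `temperature` knob are all assumptions:

```python
import random

def pick_collection_task(success_rates, temperature=1.0):
    """Sample the next task to collect data for, favoring tasks the
    current policy handles poorly (hypothetical sketch; the weighting
    scheme is illustrative, not taken from the paper)."""
    tasks = list(success_rates)
    # A task the policy already solves (success rate 1.0) gets zero
    # weight; a task it always fails gets the highest weight.
    weights = [(1.0 - success_rates[t]) ** (1.0 / temperature) for t in tasks]
    return random.choices(tasks, weights=weights, k=1)[0]

# With one task already mastered, collection always targets the other:
next_task = pick_collection_task({"grasp": 1.0, "cover_with_towel": 0.2})
```

Lowering `temperature` would concentrate collection even more sharply on the weakest tasks.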
However, while this data collection strategy was effective at collecting data for a large number of tasks, the success rate and data volume were imbalanced between tasks.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3420575 elementor-widget elementor-widget-heading\" data-id=\"3420575\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning with MT-Opt<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec385d7 elementor-widget elementor-widget-text-editor\" data-id=\"ec385d7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We address the data collection imbalance by transferring data across tasks and re-balancing the per-task data. The robots generate episodes that are labelled as success or failure for each task and are then copied and shared across other tasks. 
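The episode-sharing and re-balancing steps can be sketched as follows. The dictionary-based episode format and the per-task `success_detectors` callables are hypothetical stand-ins for the system's learned, image-based success detectors:

```python
import random

def share_across_tasks(episode, success_detectors):
    """Relabel one collected episode for every known task.
    `success_detectors` maps a task name to a function that judges
    the episode's final state (hypothetical structure)."""
    shared = []
    for task, detector in success_detectors.items():
        labeled = dict(episode)
        labeled["task"] = task
        labeled["success"] = detector(episode["final_state"])
        shared.append(labeled)
    return shared

def balanced_batch(episodes_by_task, per_task, rng=random):
    """Draw up to `per_task` episodes from each task so that no
    single task dominates a training batch."""
    batch = []
    for eps in episodes_by_task.values():
        batch.extend(rng.sample(eps, min(per_task, len(eps))))
    return batch
```

In this toy form, every episode becomes a training example for every task: a successful grasp is also a (usually failed) example for the placing task, which is what lets data for one task bootstrap others.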
The balanced batch of episodes is then sent to our <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/applications-of-reinforcement-learning-in-real-world\/\" target=\"_blank\" rel=\"noreferrer noopener\">multi-task RL<\/a> training pipeline to train the MT-Opt policy.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-10c8acc elementor-widget elementor-widget-text-editor\" data-id=\"10c8acc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-table alignfull\"><table><tbody><tr><td><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/image2.gif\"><a href=\"https:\/\/1.bp.blogspot.com\/-ORHRr1O1TYc\/YHn5-FZgWrI\/AAAAAAAAHZw\/_S5xTB7lVJUapExE3iix5l4NnM3SxxmvACLcBGAsYHQ\/s816\/image2.gif\" target=\"_blank\" rel=\"noreferrer noopener nofollow\"><\/a><\/td><\/tr><tr><td>Data sharing and task re-balancing strategy used by MT-Opt. The robots generate episodes which then get labelled as success or failure for the current task and are then shared across other tasks.<\/td><\/tr><\/tbody><\/table><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ad2cb40 elementor-widget elementor-widget-text-editor\" data-id=\"ad2cb40\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>MT-Opt uses&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Q-learning\" rel=\"noopener\">Q-learning<\/a>, a popular RL method that learns a function that estimates the future sum of rewards, called the Q-function. The learned policy then picks the action that maximizes this learned Q-function. 
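As a refresher, the Q-learning backup and the greedy policy just described can be shown in a tiny tabular sketch. The real system learns a deep Q-function from camera images; the states, actions, and hyperparameters here are made up for illustration:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning backup: nudge Q(s, a) toward the target
    r + gamma * max_a' Q(s_next, a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def greedy_policy(Q, s, actions):
    """The learned policy picks the action that maximizes the Q-function."""
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # tabular stand-in for the deep Q-network
q_update(Q, s="start", a="grasp", r=1.0, s_next="done",
         actions=["grasp", "wait"])
```

After one rewarded backup, `Q[("start", "grasp")]` rises above the other actions, so the greedy policy prefers `"grasp"` from `"start"`.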
For multi-task policy training, we specify the task as an extra input to a large Q-learning network (inspired by our previous work on large-scale single-task learning with&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2018\/06\/scalable-deep-reinforcement-learning.html\" rel=\"noopener\">QT-Opt<\/a>) and then train all of the tasks simultaneously with&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2020\/08\/tackling-open-challenges-in-offline.html\" rel=\"noopener\">offline RL<\/a>&nbsp;using the entire multi-task dataset. In this way, MT-Opt is able to train on a wide variety of skills that include picking specific objects, placing them into various fixtures, aligning items on a rack, rearranging and covering objects with towels, etc.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0387775 elementor-widget elementor-widget-text-editor\" data-id=\"0387775\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-video\"><video controls src=\"https:\/\/karolhausman.github.io\/mt-opt\/img\/mt-opt-grid.mp4\"><\/video><figcaption>Example tasks that MT-Opt is able to learn, such as instance and indiscriminate grasping, chasing, placing, aligning and rearranging.<\/figcaption><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9769ce9 elementor-widget elementor-widget-text-editor\" data-id=\"9769ce9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-table alignfull\">\n<table>\n<tbody>\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/image3.gif\" \/><\/td>\n<\/tr>\n<tr>\n<td>Towel-covering task that was not present in the original dataset. 
We fine-tune MT-Opt on this novel task in 1 day to achieve a high (90%) success rate.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-afad989 elementor-widget elementor-widget-heading\" data-id=\"afad989\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Learning with Actionable Models<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2d6721a elementor-widget elementor-widget-text-editor\" data-id=\"2d6721a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>While supplying a rigid definition of tasks facilitates autonomous data collection for MT-Opt, it limits the number of learnable behaviors to a fixed set. To enable learning a wider range of tasks from the same data, we use\u00a0<a href=\"https:\/\/bair.berkeley.edu\/blog\/2018\/09\/06\/rig\/\" rel=\"noopener\">goal-conditioned<\/a>\u00a0learning, i.e., learning to reach given goal configurations of a scene in front of the robot, which we specify with goal images. 
In contrast to explicit\u00a0<a href=\"https:\/\/bair.berkeley.edu\/blog\/2019\/12\/12\/mbpo\/\" rel=\"noopener\">model-based methods<\/a>\u00a0that learn predictive models of future world observations, or approaches that employ\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1707.01495\" rel=\"noopener\">online data collection<\/a>, this approach learns goal-conditioned policies via offline model-free RL.<\/p>\n\n<p>To learn to reach any goal state, we perform&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/1707.01495\" rel=\"noopener\">hindsight relabeling<\/a>&nbsp;of all trajectories and sub-sequences in our collected dataset and train a goal-conditioned Q-function in a&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2020\/08\/tackling-open-challenges-in-offline.html\" rel=\"noopener\">fully offline<\/a>&nbsp;manner (in contrast to learning online using a fixed set of success examples as in&nbsp;<a href=\"https:\/\/ai.googleblog.com\/2021\/03\/recursive-classification-replacing.html\" rel=\"noopener\">recursive classification<\/a>). One challenge in this setting is the distributional shift caused by learning only from \u201cpositive\u201d hindsight relabeled examples. We address this by employing a&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2006.04779\" rel=\"noopener\">conservative strategy<\/a>&nbsp;to minimize Q-values of unseen actions using artificial negative actions. 
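Hindsight relabeling of sub-sequences can be sketched in a few lines. This toy version only generates the relabeled "positive" examples; it omits the conservative artificial-negative-action regularization described above, and the flat state/goal representation is an assumption (the papers use images):

```python
def hindsight_relabel(states):
    """Turn one trajectory of states into goal-conditioned training
    examples: every later state becomes the goal of the sub-sequence
    that reaches it, so every example is a 'positive' by construction."""
    examples = []
    for i in range(len(states) - 1):
        for j in range(i + 1, len(states)):
            examples.append({"state": states[i], "goal": states[j]})
    return examples
```

A trajectory of n states thus yields n*(n-1)/2 goal-conditioned examples, which is how a fixed dataset supports a much larger space of goal-reaching tasks.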
Furthermore, to enable reaching temporally extended goals, we introduce a technique for chaining goals across multiple episodes.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-53386fb elementor-widget elementor-widget-text-editor\" data-id=\"53386fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-table alignfull\"><table><tbody><tr><td><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/image4.gif\"><a href=\"https:\/\/1.bp.blogspot.com\/-kpV2GPoLsso\/YHn6ZuK2grI\/AAAAAAAAHaA\/2JkUk3LSRfE0KhB451FagVis-W_wSOoNwCLcBGAsYHQ\/s800\/image4.gif\" target=\"_blank\" rel=\"noreferrer noopener nofollow\"><\/a><\/td><\/tr><tr><td>Actionable Models relabel sub-sequences with all intermediate goals and regularize Q-values with artificial negative actions.<\/td><\/tr><\/tbody><\/table><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d9b7e53 elementor-widget elementor-widget-text-editor\" data-id=\"d9b7e53\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Training with Actionable Models allows the system to learn a large repertoire of visually indicated skills, such as object grasping, container placing and object rearrangement. The model is also able to generalize to novel objects and visual objectives not seen in the training data, which demonstrates its ability to learn general functional knowledge about the world. 
We also show that downstream reinforcement learning tasks can be learned more efficiently either by fine-tuning a pre-trained goal-conditioned model or by adding a goal-reaching auxiliary objective during training.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7782fb2 elementor-widget elementor-widget-text-editor\" data-id=\"7782fb2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-table aligncenter\"><table><tbody><tr><td><img decoding=\"async\" src=\"http:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/actionable_models_grid.gif\"><a href=\"https:\/\/1.bp.blogspot.com\/-HBzWWQ1ScEc\/YHoPmj1aUgI\/AAAAAAAAHaM\/j814Lw21PTkhSR4O-aCo-ApM5DMLLqFDwCLcBGAsYHQ\/s1100\/actionable_models_grid.gif\" target=\"_blank\" rel=\"noreferrer noopener nofollow\"><\/a><\/td><\/tr><tr><td>Example tasks (specified by goal-images) that our Actionable Model is able to learn.<\/td><\/tr><\/tbody><\/table><\/figure>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-60f0586 elementor-widget elementor-widget-heading\" data-id=\"60f0586\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dfe5482 elementor-widget elementor-widget-text-editor\" data-id=\"dfe5482\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The results of both MT-Opt and Actionable Models indicate that it is possible to collect and then learn many distinct tasks from large diverse real-robot 
datasets within a single model, effectively amortizing the cost of learning across many skills. We see this as an important step towards general robot learning systems that can be further scaled up to perform many useful services and serve as a starting point for learning downstream tasks.<\/p>\n\n<p>This post is based on two papers, &#8220;<a href=\"https:\/\/arxiv.org\/abs\/2104.08212\" rel=\"noopener\">MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale<\/a>&#8221; and &#8220;<a href=\"https:\/\/arxiv.org\/abs\/2104.07749\" rel=\"noopener\">Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills<\/a>,&#8221; with additional information and videos on the project websites for&nbsp;<a href=\"https:\/\/karolhausman.github.io\/mt-opt\/\" rel=\"noopener\">MT-Opt<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/actionable-models.github.io\/\" rel=\"noopener\">Actionable Models<\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ddce439 elementor-widget elementor-widget-heading\" data-id=\"ddce439\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Acknowledgements<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-adf5540 elementor-widget elementor-widget-text-editor\" data-id=\"adf5540\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><em>This research was conducted by Dmitry Kalashnikov, Jake Varley, Karol Hausman, Yevgen Chebotar, Ben Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, Yao Lu, Alex Irpan, Ben Eysenbach, Ryan Julian and Ted Xiao. 
We\u2019d like to give special thanks to Josh Weaver, Noah Brown, Khem Holden, Linda Luu and Brandon Kinman for their robot operation support; Anthony Brohan for help with distributed learning and testing infrastructure; Tom Small for help with videos and project media; Julian Ibarz, Kanishka Rao, Vikas Sindhwani and Vincent Vanhoucke for their support; Tuna Toksoz and Garrett Peake for improving the bin reset mechanisms; Satoshi Kataoka, Michael Ahn, and Ken Oslund for help with the underlying control stack, and the rest of the Robotics at Google team for their overall support and encouragement. All the above contributions were incredibly enabling for this research.<\/em><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>For general-purpose robots to be most useful, they would need to be able to perform a range of tasks, such as cleaning, maintenance and delivery.<\/p>\n","protected":false},"author":1121,"featured_media":19262,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[695,411,214],"ppma_author":[3680],"class_list":["post-22772","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-reinforcement-learning","tag-robots","tag-rpa"],"authors":[{"term_id":3680,"user_id":1121,"is_guest":0,"slug":"karol-hausman","display_name":"Karol Hausman","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Karol-Hausman-150x150.jpeg","user_url":"https:\/\/research.google\/teams\/brain\/robotics\/","last_name":"Hausman","first_name":"Karol","job_title":"","description":"Karol Hausman is Senior Research Scientist and Robot Manipulation Lead at <a href=\"https:\/\/research.google\/teams\/brain\/robotics\/\">Google 
Brain<\/a>."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22772","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1121"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22772"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22772\/revisions"}],"predecessor-version":[{"id":31165,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22772\/revisions\/31165"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/19262"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22772"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22772"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22772"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22772"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}