{"id":22792,"date":"2021-05-07T05:09:00","date_gmt":"2021-05-07T05:09:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/getting-started-with-reinforcement-learning\/"},"modified":"2023-08-21T10:16:21","modified_gmt":"2023-08-21T10:16:21","slug":"getting-started-with-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/getting-started-with-reinforcement-learning\/","title":{"rendered":"Getting Started With Reinforcement Learning"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22792\" class=\"elementor elementor-22792\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-8115db3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"47745\" data-id=\"8115db3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9209490\" data-eae-slider=\"43010\" data-id=\"9209490\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ef0099d elementor-widget elementor-widget-text-editor\" data-id=\"ef0099d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\">Demystifying some of the main concepts and terminologies associated with Reinforcement Learning and their association with other fields of AI<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ee9ac6f elementor-widget 
elementor-widget-heading\" data-id=\"ee9ac6f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Introduction<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8d9c257 elementor-widget elementor-widget-text-editor\" data-id=\"8d9c257\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9259\">In recent years, Artificial Intelligence (AI) has undergone impressive advancements. AI can be subdivided into three different levels according to the ability of machines to perform intellectual tasks logically and independently:<\/p>\n\n<ul><li><strong><em>Narrow AI<\/em><\/strong>: machines are more efficient than humans in performing very specific tasks (but do not attempt other types of tasks).<\/li><li><strong><em>General AI<\/em><\/strong>: machines are as intelligent as human beings.<\/li><li><strong><em>Strong AI<\/em><\/strong>: machines perform better than humans across different domains (in tasks that we might or might not be able to perform at all).<\/li><\/ul>\n\n<p id=\"16fe\">Right now, thanks to Machine Learning, we have been able to achieve good competency at the Narrow AI level. 
There are three main types of machine learning algorithms used:<\/p>\n\n<ul><li><strong><em>Supervised Learning:<\/em><\/strong>&nbsp;using a labelled training set to train a model, which then makes predictions on unlabelled data.<\/li><li><strong><em>Unsupervised Learning:&nbsp;<\/em><\/strong>giving a model an unlabelled data set, in which the model then has to find patterns in order to make predictions.<\/li><li><strong><em>Reinforcement Learning:<\/em><\/strong>&nbsp;training a model through a reward mechanism that encourages positive behaviours when it performs well (particularly used in agent-based simulations, gaming and robotics).<\/li><\/ul>\n\n<p id=\"8c17\">Reinforcement Learning is now considered to be one of the most promising techniques for moving to the next level in the AI paradigm (Figure 1).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7f8e285 elementor-widget elementor-widget-image\" data-id=\"7f8e285\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"462\" height=\"260\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-yhJ9Ma_fhxIBlacV1dP6A-1.gif\" class=\"attachment-large size-large wp-image-30898\" alt=\"Getting Started With Reinforcement Learning\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ea19a1f elementor-widget elementor-widget-heading\" data-id=\"ea19a1f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Reinforcement Learning (RL)<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-432a32d elementor-widget 
elementor-widget-text-editor\" data-id=\"432a32d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2103\">One of the reasons&nbsp;why Reinforcement Learning has gained so much interest today is its interdisciplinarity. The core concepts of this area in fact follow basic&nbsp;<a href=\"https:\/\/towardsdatascience.com\/game-theory-in-artificial-intelligence-57a7937e1b88\" target=\"_blank\" rel=\"noreferrer noopener\">game theory<\/a>,&nbsp;<a href=\"https:\/\/towardsdatascience.com\/introduction-to-evolutionary-algorithms-1278f335ead6\" target=\"_blank\" rel=\"noreferrer noopener\">evolutionary<\/a>&nbsp;and neuroscience principles.<\/p>\n\n<p id=\"4cb4\">Compared to the other forms of Machine Learning, RL can be considered the closest approximation of how humans and animals learn over time.<\/p>\n\n<p id=\"ca38\">Reinforcement Learning builds on the idea that humans most commonly learn by using their senses and interacting with an environment (that is, not necessarily with external guidance, as in supervised learning, but through a process of trial and error).<\/p>\n\n<p id=\"15c7\">On a daily basis, we try to accomplish new tasks and, depending on the results of our attempts, we affect the environment around us. By assessing our attempts, we can then learn through experience which actions gave us greater benefits (and are therefore worth repeating) and which ones are best avoided. This iterative process is summarized in Figure 2 and represents the main workflow of most Reinforcement Learning based algorithms.<\/p>\n\n<blockquote class=\"wp-block-quote\"><p>An agent (e.g. software bot, robot) is placed in an environment and, by interacting with it, can learn, receive new stimuli and create new states (e.g. 
unlock new scenarios or modify the structure of the existing ones). Every action of our agent is then associated with a reward value assessing its efficacy towards achieving a predefined goal.<\/p><\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-432c45e elementor-widget elementor-widget-image\" data-id=\"432c45e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"908\" height=\"350\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1.png\" class=\"attachment-large size-large wp-image-30899\" alt=\"Reinforcement Learning Workflow\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1.png 908w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1-300x116.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1-768x296.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1-610x235.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12rmKGjZOv5pGkLLVt-EuMA-1-750x289.png 750w\" sizes=\"(max-width: 908px) 100vw, 908px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3e5c6f5 elementor-widget elementor-widget-heading\" data-id=\"3e5c6f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Two main challenges which characterize Reinforcement Learning systems are:<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-81a963b elementor-widget 
elementor-widget-text-editor\" data-id=\"81a963b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><strong><em>The exploration-exploitation dilemma<\/em><\/strong>: if an agent finds an action that gives it a moderately high reward, it might be tempted not to try any other available action for fear that it would be less successful. At the same time, if the agent never attempts a different action, it might never find out that better rewards were achievable.<\/li><li><strong><em>Processing of delayed rewards<\/em><\/strong>: agents are not told what actions to try, but should instead come up with different solutions, test them and finally evaluate them based on the received reward. Agents should not evaluate their actions on their immediate rewards alone. Some types of actions might, in fact, provide greater rewards not immediately but in the long run.<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6c13781 elementor-widget elementor-widget-heading\" data-id=\"6c13781\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Core Components<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a9370cb elementor-widget elementor-widget-text-editor\" data-id=\"a9370cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ec28\">According to Richard S. Sutton et al. 
[3], Reinforcement Learning algorithms are formed by four main components: Policy, Reward, Value Function and Environment Model.<\/p>\n\n<ul><li><strong><em>Policy:&nbsp;<\/em><\/strong>defines the agent\u2019s behaviour (it maps states to actions). Policies are often&nbsp;<a href=\"https:\/\/towardsdatascience.com\/stochastic-processes-analysis-f0a116999e4\" target=\"_blank\" rel=\"noreferrer noopener\">stochastic<\/a>,&nbsp;with each action associated with a probability of being selected.<\/li><li><strong><em>Reward:&nbsp;<\/em><\/strong>is a signal that tells the agent how best to modify its policy in order to achieve the defined objectives (in the short term). The agent receives a reward from the environment each time it performs an action.<\/li><li><strong><em>Value Function:&nbsp;<\/em><\/strong>is used to estimate which actions can bring a greater return in the long run. It works by assigning values to the different states to assess how much reward an agent should expect when starting from any given state.<\/li><li><strong><em>Environment Model:&nbsp;<\/em><\/strong>simulates the dynamics of the environment the agent is placed in and how the environment should respond to the different actions taken by the agent. Depending on the <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/applications-of-reinforcement-learning-in-real-world\/\" target=\"_blank\" rel=\"noreferrer noopener\">application<\/a>, some RL algorithms do not necessarily require an environment model (model-free approach), since they can rely on trial and error instead. 
Model-based approaches, however, can enable RL algorithms to tackle more complicated tasks that require planning.<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-570aa5e elementor-widget elementor-widget-heading\" data-id=\"570aa5e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-20d00b3 elementor-widget elementor-widget-text-editor\" data-id=\"20d00b3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"28db\">If you are interested in finding out more about Reinforcement Learning,&nbsp;<a href=\"https:\/\/web.stanford.edu\/class\/psych209\/Readings\/SuttonBartoIPRLBook2ndEd.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cReinforcement Learning: An Introduction\u201d by Richard S. Sutton and Andrew G. Barto<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/gym.openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI Gym<\/a>&nbsp;(as discussed in my next article!) 
are two great places to start.<\/p>\n\n<p id=\"c9af\"><em>I hope you enjoyed this article, thank you for reading.<\/em><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Demystifying some of the main concepts and terminologies associated with Reinforcement Learning and their association with other fields of AI<\/p>\n","protected":false},"author":952,"featured_media":19342,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[97,92,695],"ppma_author":[3676],"class_list":["post-22792","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-machine-learning","tag-reinforcement-learning"],"authors":[{"term_id":3676,"user_id":952,"is_guest":0,"slug":"pier-paolo-ippolito","display_name":"Pier Paolo Ippolito","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Pier-Paolo-Ippolito-150x150.jpg","author_category":"","user_url":"https:\/\/pierpaolo28.github.io\/","last_name":"Paolo Ippolito","first_name":"Pier","job_title":"","description":"Pier Paolo Ippolito is a Data Scientist and MSc in Artificial Intelligence graduate with an interest in research areas such as Data Science, Machine Learning, and Cloud Development. 
Aside from his work activities, he is a freelancer and technical writer."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22792","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/952"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22792"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22792\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/19342"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22792"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22792"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22792"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22792"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}