Welcome to Reinforcement Learning. After watching this video, you will be able to: define reinforcement learning, explain how the reward system works in reinforcement learning, list the three approaches to reinforcement learning, list two types of reinforcement learning, and list some applications of reinforcement learning.

For those of us who have dogs, a trip to the park often means a game of fetch. Whenever the dog catches the stick or ball, it is rewarded with a treat. Reinforcement learning works like that: it is a machine learning method based on some form of reward, where the machine receives a reward when it is right and a penalty when it is wrong.

Reinforcement learning uses several concepts. First, we have the agent. An agent is an algorithm that interacts with the environment and uses information from it, such as the state and reward, to decide what actions to take. The agent's goal is to learn how to maximize the reward by selecting the optimal actions in response to the current state. We can think of the agent as the dog, and playing fetch as its action. An action is one of the possible moves the agent can make. The dog's owner plays the role of the environment.

The environment is the surroundings or conditions that the agent operates in and responds to, and it contains the states. The environment takes the actions provided by the agent and returns the next state and the reward earned by performing that action. A state is a concrete and immediate situation in which the agent finds itself.

The reward is the value or signal returned by the environment as a result of an action taken by the agent. To make an agent perform a particular task or achieve a particular goal, you must provide a reward structure for it to maximize. A typical structure has positive, negative, and neutral rewards. The reward is your way of communicating to the agent what you want it to achieve.
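To make the loop between agent and environment concrete, here is a minimal Python sketch of the fetch example. It is only an illustration under stated assumptions: the names `FetchEnvironment`, `RandomAgent`, and `run_episode`, the "wander" action, the 70% catch rate, and the reward values are all hypothetical and not part of any library. The agent here picks actions at random and does not learn yet; the sketch only shows the environment taking an action and returning the next state and a reward.

```python
import random

# Hypothetical reward structure with positive, neutral, and negative rewards.
REWARDS = {"caught": 1.0, "missed": 0.0, "ran_off": -1.0}


class FetchEnvironment:
    """Toy environment: takes an action and returns the next state and reward."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = "waiting"

    def step(self, action):
        if action == "chase":
            # Assumed for illustration: the dog catches the ball 70% of the time.
            outcome = "caught" if self.rng.random() < 0.7 else "missed"
        else:
            # Any other action (here "wander") earns the negative reward.
            outcome = "ran_off"
        self.state = outcome
        return self.state, REWARDS[outcome]


class RandomAgent:
    """Placeholder agent: ignores the state and picks an action at random."""

    ACTIONS = ("chase", "wander")

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(self.ACTIONS)


def run_episode(env, agent, steps=10):
    """Run the agent-environment loop and return the total reward collected."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        action = agent.act(state)         # agent chooses an action from the state
        state, reward = env.step(action)  # environment returns next state and reward
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(FetchEnvironment(seed=1), RandomAgent(seed=2)))
```

A learning agent would replace `RandomAgent.act` with a rule that uses past rewards to prefer "chase" over "wander".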
In the dog and fetch example, a positive reward is getting a treat for catching the ball, and a neutral reward is getting nothing for missing it.

Finally, we have the policy. A policy defines how the agent will act when in a specific state: it maps each state in the environment to a single action or to a probability for each action. A policy can be represented as a lookup table, a function, an algorithm, or a probability distribution over actions given a state.

There are three approaches to reinforcement learning. First is the model-based method. Here, you create a virtual model of the environment to help the agent learn in that specific environment. Second is the policy-based method. You develop a strategy that helps to gain maximum rewards in the future through the possible actions performed in each state. And third is the value-based method. The agent’s goal here is to maximize a defined value function. Value functions are estimates for actions and states resulting from interacting with the environment. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. The agent uses this value function to make decisions. This is where the learning happens: as the agent plays, it learns which states are good and which are bad.

There are two types of reinforcement learning: positive and negative reinforcement. Positive reinforcement increases the strength and frequency of a behavior and positively impacts the action taken by the agent. It helps you to maximize performance and sustain change for a more extended period. However, too much reinforcement can lead to over-optimization of the state, which can affect the results. Negative reinforcement is when a behavior is strengthened because a negative condition is stopped or avoided. It indicates the minimum acceptable standard of performance by setting what is considered a “good enough” standard.
A challenge is that it can be easy to meet only the minimum standard, so the behavior is not properly optimized. There is also neutral reinforcement. Here, the behavior is neither rewarded nor stopped. For example, a dog eats its owner’s shoes and the owner does not respond to that behavior. The challenge is that the lack of feedback can be misinterpreted as positive reinforcement in situations where the behavior should be stopped or avoided.

Reinforcement learning is applied in a variety of industries. Let’s look at some of them and how reinforcement learning is applied.

Reinforcement learning is useful for managing large fleets of vehicles that move around to generate income. Examples include ride-sharing companies, mail companies, shipping companies, delivery services, and car rental companies. The states would be the vehicles' locations, the number of vehicles at each location, and the available jobs at each location. The actions would be where to send the vehicles. And the rewards would be the income generated from completing jobs.

Reinforcement learning also excels at learning how to play games. Chess is a popular game to play with reinforcement learning. The states would be the positions of all the pieces on the board, the actions would be which piece to move where, and the rewards would be given at the end of the game based on the outcome.

You can also train a robotic arm to perform a task like picking up an object and moving it to the correct spot. Robotic arms can be used in assembly lines for manufacturing. The states would be the arm's position, the speed of the moving parts, and the claw position. The actions would be to change the claw’s grasp or move a joint a certain way. Moving the object to the correct spot would earn a positive reward, and all other states would give no reward.

In this video, you learned that: Reinforcement learning is a machine learning method. Reinforcement learning works like a reward system.
The most common concepts in reinforcement learning are the agent, environment, reward, action, state, and policy. The two types of reinforcement learning are positive and negative reinforcement. And finally, reinforcement learning is used in various industries like games, fleet management, and manufacturing.
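To see how the value-based method and a lookup-table policy fit together, here is a small Python sketch. It uses tabular Q-learning, which is one common value-based algorithm; the video does not name a specific algorithm, so treat that choice as an assumption. The five-position corridor, the reward of 1.0 at the goal, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all hypothetical and chosen only for illustration.

```python
import random

N_STATES = 5        # hypothetical corridor with positions 0..4; position 4 is the goal
ACTIONS = (-1, 1)   # move one step left or one step right
ALPHA = 0.5         # learning rate (illustrative, not tuned)
GAMMA = 0.9         # discount applied to future rewards (illustrative)
EPSILON = 0.2       # chance of trying a random action (illustrative)


def step(state, action):
    """Environment: return (next_state, reward, done) for an action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # positive reward at the goal, neutral elsewhere
    return next_state, reward, done


def train(episodes=200, seed=0):
    """Learn a table of value estimates for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the current estimates, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * future
            # Move the estimate part of the way toward the observed target.
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Lookup-table policy: map each non-goal state to its best-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The table `q` plays the role of the value function, and `greedy_policy` turns it into a policy by picking the highest-valued action in each state.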