Introduction

In Markov Decision Processes you have:

* Agent: The decision maker / learner. The agent sends an action to the environment.
* Environment: Everything that is not the agent. The environment sends a reward back to the agent.
* Reward: The signal that the agent tries to maximize.
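The exchange between agent and environment can be sketched as a simple loop. This is a minimal Python sketch, not a definitive implementation. The names `Environment`, `RandomAgent` and `run_episode` are hypothetical, and so is the toy reward rule: a reward of 1 when the action matches the parity of a step counter. As in a standard MDP, the environment also hands back its next state along with the reward.

```python
import random


class Environment:
    """Hypothetical toy environment: the state is a step counter, and the
    reward is 1 when the agent's action equals the state's parity (0 or 1)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment receives the agent's action and sends back
        # a reward together with its next state.
        reward = 1.0 if action == self.state % 2 else 0.0
        self.state += 1
        return self.state, reward


class RandomAgent:
    """Hypothetical agent that ignores the state and acts at random.
    A learning agent would use the rewards to improve its choices."""

    def act(self, state):
        return random.choice([0, 1])


def run_episode(agent, env, steps=10):
    """Run the agent/environment loop and return the total reward."""
    state, total_reward = env.state, 0.0
    for _ in range(steps):
        action = agent.act(state)          # agent -> environment: action
        state, reward = env.step(action)   # environment -> agent: reward (and state)
        total_reward += reward             # the quantity the agent tries to maximize
    return total_reward


random.seed(0)
print(run_episode(RandomAgent(), Environment()))
```

Replacing `RandomAgent` with an agent that changes its behaviour based on the rewards it receives is what turns this loop into a learning problem.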
Example GridWorld

Let's say we have a 5x5 grid. There are four possible actions: left, right, up, and down. If you reach the point (1,2) and move in any direction, you receive a reward of 10 and are moved to the point (5,2).
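Here is a minimal Python sketch of a single GridWorld move. It assumes 1-indexed (row, column) coordinates with row 1 at the top. The text above does not cover two cases, so this sketch makes assumptions for both: every move other than the one out of (1,2) gives a reward of 0, and a move off the edge leaves the agent where it is. The names `step` and `MOVES` are hypothetical.

```python
SIZE = 5                 # 5x5 grid
SPECIAL_STATE = (1, 2)   # from the text: any move from here pays 10...
TELEPORT_TO = (5, 2)     # ...and sends the agent here
SPECIAL_REWARD = 10.0

# (row change, column change) for each action; row 1 is the top row.
MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def step(state, action):
    """Return (next_state, reward) for a 1-indexed (row, col) state."""
    if action not in MOVES:
        raise ValueError(f"unknown action: {action!r}")
    if state == SPECIAL_STATE:
        # Every action from (1,2) earns 10 and jumps to (5,2).
        return TELEPORT_TO, SPECIAL_REWARD
    dr, dc = MOVES[action]
    row, col = state[0] + dr, state[1] + dc
    if not (1 <= row <= SIZE and 1 <= col <= SIZE):
        # Assumption: a move off the grid leaves the agent in place, reward 0.
        return state, 0.0
    # Assumption: all other moves give reward 0.
    return (row, col), 0.0


print(step((1, 2), "left"))   # ((5, 2), 10.0)
print(step((1, 1), "up"))     # ((1, 1), 0.0)
```

The special state is checked before any movement, so the direction chosen at (1,2) does not matter. This matches "move in any direction" in the description.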
Introduction

This is going to be part of a series where I illustrate examples and questions from the brilliant book by Sutton and Barto (1998). You can download the PDF of the newly updated edition online; just google it. I am planning on going through each chapter and illustrating one or two examples from each.
Multi-Armed Bandits

What on earth is a multi-armed bandit? It might be easier to think of it as choosing which pokie (slot machine) to play down at the local.
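This minimal Python sketch makes the pokie picture concrete. Each machine (an "arm") pays a noisy reward around a hidden mean, and the player only learns which machine is best by playing. Several details are assumptions for illustration, not a prescribed method: the Gaussian payouts with unit variance, the `Pokies` class, and the use of an epsilon-greedy player with sample-average estimates.

```python
import random


class Pokies:
    """A row of hypothetical slot machines. Pulling arm a pays a reward
    drawn from a normal distribution with hidden mean true_means[a]."""

    def __init__(self, true_means, seed=None):
        self.true_means = list(true_means)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return self.rng.gauss(self.true_means[arm], 1.0)


def epsilon_greedy(machine, pulls=1000, epsilon=0.1, seed=None):
    """Play `pulls` times. With probability epsilon, try a random machine;
    otherwise play the machine with the best estimated payout so far."""
    rng = random.Random(seed)
    k = len(machine.true_means)
    counts = [0] * k         # how often each machine was played
    estimates = [0.0] * k    # sample-average payout of each machine
    total = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                       # explore
        else:
            arm = max(range(k), key=estimates.__getitem__)  # exploit
        reward = machine.pull(arm)
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


machine = Pokies([0.0, 0.5, 2.0], seed=1)
estimates, counts, total = epsilon_greedy(machine, pulls=2000, epsilon=0.1, seed=2)
print(counts)
```

With `epsilon=0`, the player never explores and can get stuck on whichever machine happened to look good first. A small `epsilon` spends a few pulls on other machines in exchange for a chance to find the better pokie.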