Introduction In Markov Decision Processes you have: * Agent: The decision maker / learner. The agent sends an action to the environment. * Environment: Everything that is not the agent. The environment sends a reward back to the agent. * Reward: The signal that agent tries to maximize.
Example GridWorld Lets say we have a 5x5 grid. There are four possible actions: left, right, up, and down. If you reach the point (1,2) and move in any direction you recieve the reward of 10 and are moved to the point (5,2).