Being able to learn from mistakes is a powerful ability that humans (being mistake-prone) take advantage of all the time. Even if we screw up something we’re trying to do, we probably got parts of it at least a little bit right, and we can build on those parts to do better next time.
Eventually, we succeed. Robots can use similar trial-and-error techniques to learn new tasks.
With reinforcement learning, a robot tries different ways of doing a thing, and gets rewarded whenever an attempt helps it to get closer to the goal. Based on the reinforcement provided by that reward, the robot tries more of those same sorts of things until it succeeds.
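That trial-and-error loop can be sketched in a few lines of code. This is a toy illustration, not any robot’s actual learning system: the made-up actions, their success probabilities, and the epsilon exploration rate are all assumptions for the sake of the example. The agent mostly repeats whatever has been rewarded so far, occasionally tries something else, and gradually settles on the action that works.

```python
import random

def learn(success_prob, num_trials=500, epsilon=0.1):
    """Toy trial-and-error learner (illustrative, not a real robot API).

    success_prob maps each way of attempting the task to its chance
    of success; the agent does not know these numbers in advance.
    """
    value = {a: 0.0 for a in success_prob}   # estimated payoff per action
    counts = {a: 0 for a in success_prob}
    for _ in range(num_trials):
        # Mostly repeat what has been rewarded, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(list(success_prob))
        else:
            action = max(value, key=value.get)
        # Reward of 1 for getting closer to the goal, 0 otherwise.
        reward = 1 if random.random() < success_prob[action] else 0
        counts[action] += 1
        # Incremental average of observed rewards for this action.
        value[action] += (reward - value[action]) / counts[action]
    return max(value, key=value.get)

# Hypothetical example: three ways to move an object, one far more reliable.
best = learn({"push": 0.1, "slide": 0.2, "grasp": 0.8})
```

After enough trials, the returned action is almost always the one with the highest success rate, which is the "tries more of those same sorts of things until it succeeds" behavior described above.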
Where humans differ is in how we’re able to learn from our failures as well as our successes. It’s not just that we learn what doesn’t work relative to our original goal; we also collect information about how we fail, and we may later be able to apply that information to a goal that’s slightly different. That makes us much more effective at generalizing what we learn than robots tend to be.
Today, San Francisco-based AI research company OpenAI is releasing an open source algorithm called Hindsight Experience Replay, or HER, which reframes failures as successes in order to help robots learn more like humans. The other nice thing about HER is that it uses what researchers call “sparse rewards” to guide learning: a signal that arrives only when the goal is fully achieved, which is much easier to specify than a carefully shaped reward. Read more from spectrum.ieee.org…
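The core trick in HER can be sketched compactly. The idea is that after a failed attempt, you pretend the state the robot actually reached was the goal all along, so the episode becomes a valid example of achieving *some* goal. The function names, episode format, and the choice of `k` below are illustrative assumptions, not OpenAI’s actual implementation; the released code is what to consult for the real thing.

```python
import random

def sparse_reward(achieved, goal):
    # Sparse reward: success (0) only when the goal is reached, else -1.
    return 0 if achieved == goal else -1

def her_relabel(episode, k=4):
    """Hindsight relabeling sketch (hypothetical data format).

    episode is a list of (state, action, achieved_state, goal) steps.
    Alongside each original transition, store copies whose goal is
    replaced by a state the agent actually reached later on, so a
    failed episode still yields transitions with positive outcomes.
    """
    relabeled = []
    for t, (state, action, achieved, goal) in enumerate(episode):
        # Keep the original transition, scored against the real goal.
        relabeled.append((state, action, goal,
                          sparse_reward(achieved, goal)))
        # Sample up to k future achieved states as substitute goals.
        future = [step[2] for step in episode[t:]]
        for new_goal in random.sample(future, min(k, len(future))):
            relabeled.append((state, action, new_goal,
                              sparse_reward(achieved, new_goal)))
    return relabeled
```

Because some relabeled transitions now carry the success reward, an off-policy learner gets a useful training signal from episodes that never reached the original goal, which is exactly what makes sparse rewards workable.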
thumbnail courtesy of ieee.org