Skip to content

Latest commit

 

History

History
34 lines (27 loc) · 2.11 KB

File metadata and controls

34 lines (27 loc) · 2.11 KB
tags python,numpy,neural-network,reinforcement learning
mathjax true

Monte Carlo Tree Search Algorithms

  • an algorithm based on node selection, node expansion, Monte Carlo rollouts and reward backpropagation
  • table based algorithm that limits its usability to more or less small problems with limited and discrete state space and moderate branching factor

{:.caption .img} A Tree Search Algorithms Survey Roy van Rijn - A Tree Search Algorithms Survey (2017)

{:.caption .img} Tree Search, Amplification and Distillation Robert Miles - Tree Search, Amplification and Distillation (2019)

The Alpha Zero Algorithm

  • uses generalizing value and policy neural networks to target fully observable deterministic problems with very large state space and large branching factor
    • a value network predicts the winner of the game
    • a policy network generates an action probability vector
    • value and policy function may be implemented using a combined (shared weights) neural network with two heads generating the scalar state value and the action probability vector
    • both value and policy function in conjunction represent a model that can be used for planning
  • the algorithm incorporates a Monte Carlo Tree Search like element to generate simulated experience, used to optimize the value and policy network
  • self-play against randomly chosen previous versions of itself, is the one and only mechanism used to improve the playing strength of the algorithm

{:.caption .img} AlphaGo, AlphaZero, and Deep Reinforcement Learning David Silver - AlphaGo, AlphaZero, and Deep Reinforcement Learning

{:.caption .img} How AlphaGo Zero works Xander Steenbrugge - How Google DeepMind's AlphaGo Zero works