By Maya Indira Ganesh
In preparing for this workshop, I didn’t start out following the Field Guide structure, though I eventually returned to it as a ‘container’ for my notes. I had set aside the question of whether the Field Guide is an appropriate organising principle and presentation format; I was curious about it as a method of writing about algorithms: would it work or not, and why? The structure of the Field Guide was appealing because it rejected established conventions of how algorithms are (expected to be) discussed. This wasn’t the language of mathematics or computer science; instead it offered metaphor and wit as tools with which to examine the subject.
The British sci-fi writer Arthur C. Clarke was approached by a publisher to contribute to a new publishing scheme: short sci-fi stories that would fit on the back of a postcard, which readers would want to post to each other. The scheme never took off, but Clarke contributed a 180-word story anyway:
Earth’s flaming debris still filled half the sky when the question filtered up to Central from the Curiosity Generator. “Why was it necessary? Even though they were organic, they had reached Third Order Intelligence.”
“We had no choice: five earlier units became hopelessly infected, when they made contact.”
The microseconds dragged slowly by, while Central tracked down the few fading memories that had leaked past the Censor Gate, when the heavily-buffered Reconnaissance Circuits had been ordered to self-destruct.
“They encountered a – problem – that could not be fully analyzed within the lifetime of the Universe. Though it involved only six operators, they became totally obsessed by it.”
“How is that possible?”
“We do not know: we must never know. But if those six operators are ever re-discovered, all rational computing will end.”
“How can they be recognized?”
“That also we do not know; only the names leaked through before the Censor Gate closed. Of course, they mean nothing.”
“Nevertheless, I must have them.”
The Censor voltage started to rise; but it did not trigger the Gate.
“Here they are: King, Queen, Bishop, Knight, Rook, Pawn.”
Building a program that could control these six operators was, we now know, a significant ambition amongst computer scientists. Building Deep Blue was a key moment in the discursive shaping of what AI is thought to be. The idea that a computer that could beat a human at Chess was, in fact, intelligent possibly says more about what computer scientists thought intelligence was than about whether the computer was actually intelligent. Building game software to challenge humans continues as a tradition in computing and AI because games are a relatively low-risk context in which to develop smart, fast, and flexible algorithms. The development of such software takes on particular significance now for its connection to advances in machine learning.
In March 2016, DeepMind, a London-based start-up recently acquired by Google, presented its software AlphaGo at a match against Lee Sedol, the world’s fourth-best Go player. Go is an ancient Chinese board game, a perfect information game (one where all the information, i.e. moves, options, points, etc., is visible and accessible to both players), played on a 19×19 board with black and white stones that must be ‘captured’, much as in Chess. Like Chess, Go is as much about the style and elegance of game-playing as about winning stones. Go is notoriously complex because of the vast number of possible moves: despite its simple rules there are, famously, more possible positions than there are atoms in the universe. There are around 10^170 possible positions in Go, and for this reason the game presents a considerable challenge to software developers.
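To get a rough sense of that scale, a quick back-of-the-envelope calculation helps (the figures of roughly 250 legal moves per turn and about 150 moves per game are commonly cited approximations, not exact values):

```python
import math

# Commonly cited rough figures for Go (approximations, not exact):
branching_factor = 250   # average legal moves available per turn
game_length = 150        # typical number of moves in a game

# Distinct game sequences ~ branching_factor ** game_length.
log10_games = game_length * math.log10(branching_factor)
print(f"~10^{log10_games:.0f} possible games")   # prints ~10^360

# For comparison, the observable universe is usually estimated
# to contain around 10^80 atoms.
```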
The challenge for software development is being able to compute the possible moves and to learn how each move works: how successful it is and when it can be used. As any Chess, Checkers or Go player knows, playing a move is contingent on how it has been evaluated in terms of its outcomes; the game is all about being able to represent what the other player will do, and staying a few moves ahead of the current one. With Deep Blue, positions and their outcomes were coded in, and the software searched for the right move based on the positions on the board; for this reason it is often described as using ‘brute force’ algorithms, combined with selective search and evaluation techniques that assessed the strength of a move. At the scale of Go, it is not possible to code in all possible moves and options, and evaluations of them. So there had to be some other way to address the problem of building software that could beat a human.
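Deep Blue’s actual machinery was far more elaborate, but the core of this ‘look ahead and evaluate’ idea is the classic minimax recursion. Here is a minimal, runnable illustration, using the toy game Nim (players alternately take 1, 2 or 3 sticks; whoever takes the last stick wins) rather than Chess; this is a sketch of the general technique, not IBM’s code:

```python
# Minimax on the toy game Nim: look ahead to the end of the game,
# assuming the opponent always makes their best reply.

def minimax(sticks, maximizing):
    if sticks == 0:
        # The previous player took the last stick and won, so the
        # player whose turn it is now has lost.
        return -1 if maximizing else +1
    moves = [m for m in (1, 2, 3) if m <= sticks]
    scores = [minimax(sticks - m, not maximizing) for m in moves]
    return max(scores) if maximizing else min(scores)

# Score each opening move from a pile of 9 sticks: taking 1 (leaving
# a multiple of 4) is the only winning move for the first player.
for m in (1, 2, 3):
    print(m, minimax(9 - m, maximizing=False))
```

Deep Blue combined this kind of lookahead with hand-crafted evaluation functions, since Chess cannot be searched to the end of the game; the principle of representing the opponent’s best replies is the same.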
AlphaGo eventually beat Lee Sedol 4-1. The AlphaGo software runs on the Monte Carlo Tree Search algorithm together with value networks and policy networks, new features that give it the power to make decisions, evaluate them, and remember how they work. AlphaGo was developed in the following way, as told on the Google DeepMind blog:
We trained the neural networks on 30 million moves from games played by human experts, until it could predict the human move 57 percent of the time (the previous record before AlphaGo was 44 percent). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and adjusting the connections using a trial-and-error process known as reinforcement learning. Of course, all of this requires a huge amount of computing power, so we made extensive use of Google Cloud Platform.
This field guide entry, then, attempts to investigate the Monte Carlo Tree Search.
The Field Guide to the Monte Carlo Tree Search Algorithm
The Monte Carlo Tree Search algorithm (MCTS) comprises branches, nodes, and leaves (the latter also sometimes referred to as ‘child’ nodes).
The ‘Monte Carlo’ part of MCTS is borrowed from the casino: it refers to methods that investigate the different possible moves and their outcomes in games of probability through random sampling. MCTS works to come up with strategies to achieve a desired goal. It does not use ‘if-then’ rules but works by ‘looking ahead’: MCTS is a tool to anticipate or simulate outcomes, make decisions, and evaluate them.
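The flavour of ‘Monte Carlo’ reasoning, estimating something by random sampling instead of exhaustive calculation, shows up in the textbook example of approximating π; a generic illustration, not part of MCTS itself:

```python
import random

# Estimate pi by random sampling: scatter points in the unit square
# and count how many fall inside the quarter-circle of radius 1.
def estimate_pi(samples=1_000_000):
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

print(estimate_pi())  # roughly 3.14; more samples, better estimate
```

MCTS applies the same logic to game positions: instead of sampling points, it samples random playouts of the game and averages their results.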
Borrowing its name from the morphology and physiology of trees, MCTS works to address the complexity, or depth, of a problem by building a tree of actions and outcomes. It does so by randomly simulating actions, and then evaluating those actions. The tree structure helps discover strategies and paths in a complex and unknown landscape.
MCTS functions in four phases: selection, expansion, simulation, and backpropagation, all well documented in Browne et al.’s survey of MCTS methods (see references). A minimal sketch of the four phases follows.
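Here is a runnable sketch of the four phases in Python, reusing the toy game Nim from the earlier minimax example; it shows the general shape of MCTS, not AlphaGo’s implementation:

```python
import math
import random

def legal_moves(sticks):
    # Nim again: take 1, 2 or 3 sticks; taking the last stick wins.
    return [m for m in (1, 2, 3) if m <= sticks]

class Node:
    def __init__(self, sticks, parent=None, move=None):
        self.sticks = sticks               # game state at this node
        self.parent = parent
        self.move = move                   # the move that led here
        self.children = []
        self.untried = legal_moves(sticks)
        self.visits = 0
        self.wins = 0.0                    # wins for the player who just moved

    def ucb1(self, c=1.4):
        # Exploitation (win rate) plus an exploration bonus that
        # shrinks the more often this node has been visited.
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_sticks, iterations=2000):
    root = Node(root_sticks)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb1())
        # 2. Expansion: add one child for a not-yet-tried move.
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.sticks - m, parent=node, move=m))
            node = node.children[-1]
        # 3. Simulation: play random moves to the end of the game.
        sticks, mover_won = node.sticks, True
        while sticks > 0:
            sticks -= random.choice(legal_moves(sticks))
            mover_won = not mover_won      # the other player just moved
        # 4. Backpropagation: push the result up the path, flipping
        #    the winning player's perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # The best move is the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(10))  # taking 2 (leaving 8 sticks) is the winning move
```

Note that nothing about any particular game appears in the four phases themselves; only `legal_moves` and the win test know that Nim is being played. This is the ‘contextual freedom’ discussed below.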
Nodes are selected using techniques based on ‘upper confidence bounds’ and the ‘multi-armed bandit’ problem.
“Tree policy” establishes the ways in which the algorithm will proceed. It balances exploration (investigating new areas) with exploitation (delving further into the options that look the most promising). A toy sketch of this balance follows.
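A toy multi-armed bandit shows the balance in miniature. The UCB1 rule below is the standard one; the payout probabilities are invented for illustration:

```python
import math
import random

arms = [0.3, 0.5, 0.6]       # hidden win probability of each slot machine
pulls = [0] * len(arms)
wins = [0.0] * len(arms)

def ucb1(i, t):
    if pulls[i] == 0:
        return float("inf")  # try every arm at least once
    # Average payout so far (exploitation) plus a bonus that grows
    # with total plays t but shrinks as arm i is tried (exploration).
    return wins[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])

for t in range(1, 10_001):
    i = max(range(len(arms)), key=lambda a: ucb1(a, t))
    pulls[i] += 1
    wins[i] += 1.0 if random.random() < arms[i] else 0.0

print(pulls)  # pulls concentrate on the best arm (the last one)
```

Every arm keeps a nonzero chance of being revisited, but the promising ones are played far more often; this is exactly how MCTS decides which branches of the tree deserve further simulation.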
MCTS builds maps of unknown territories. Unlike its predecessors, MCTS is freed from needing prior contextual information about a system. For earlier predecessors like Minimax and Alpha Beta, which were used in the development of the Grandmaster-defeating software Deep Blue, a set of rules had to be specifically defined; MCTS has “contextual freedom”. It is also asymmetric in its growth: it will revisit the more ‘interesting’ nodes if they generate successful outcomes and results, and “move quality” gets better with time. So not all options are equally followed, particularly in cases like Go and with reinforcement learning. This is the big difference from Chess and brute-force approaches, where all possible options are examined. That is simply no longer an option, so there needs to be a mechanism by which the algorithm can privilege one, more successful, option over another; in AlphaGo, the ‘value networks’ and ‘policy networks’ are the aspects of the reinforcement learning program that allow it to do so. This also results in ‘asymmetric’ tree growth.
In the context of more powerful machine learning techniques, MCTS is enhanced by reinforcement learning: each successful step is evaluated for how it is associated with the final outcome. In Chess, for example, it is not just the final, significant result, i.e. capturing the King or compromising the Queen, that matters: all the steps that led there are equally important. Go is a difficult challenge for MCTS because it has a high ‘branching factor’, so the search cannot move fast enough to reach all the possible outcomes. In more recent advances in Go, MCTS, developed in its modern form by Rémi Coulom in 2006, has had value networks and policy networks added on to address the speed factor. MCTS is not the fastest rabbit, but it is a very thorough one; balancing thoroughness with speed is a challenge.
Anywhere decisions have to be made and complex, unknown scenarios have to be mapped out for results.
It has applications in looking at how scenarios unfold and in managing situations that are ‘uncertain’, from virus replication to maximising profit in financial models, and in casinos.
Decision making, scenario mapping, games, logical scenarios.
The MCTS ‘eats’ random simulations to generate outcomes. It eats (eats into) complexity and unknown territories in order to grow.
“Versions of MCTS that are used in AlphaGo actually eat history too, and are trained by being fed 30 million moves from games played by human experts until they can predict the human move 57% of the time.”
It also eats parallel processing power. The more powerful the engine, the more smoothly and powerfully the algorithm runs. The version of MCTS in AlphaGo can evaluate 60 billion moves in three minutes.
It excretes decisions, outcomes, and pathways that don’t work: it discards them. In more sophisticated reinforcement learning MCTS, however, nothing is really excreted; everything is recycled back into the system as ‘learning’.
Green and glittering (Trees and casinos)
“The beauty of optimisation”
Like a slot machine: games of probability resulting in options that do or do not work.
You worry about the outcomes: how do you know which decision is going to result in ‘success’? But there is actually no need to worry, because all possible scenarios are being simulated and can be known.
Unmapped places based on logic.
Needs discrete generational structures
Chance, probability, dice, slot machines, gambling, simulations.
Life History: The evolution of the MCTS
Minimax: everything is fully determined, all nodes, rules and moves; but this can take a lot of time to map out, especially in games with high branching factors. So trees are ‘pruned’ to discard not-valuable moves, and evaluation techniques help identify the most successful ones.
Alpha Beta builds on Minimax and is a way of identifying the most successful nodes while cutting off branches that cannot affect the outcome.
Reinforcement learning is not part of this evolution exactly, but it is an important step that has affected all of computing. It makes more effective MCTS possible because the system is evaluated and encoded with what results in success. As in operant conditioning, it rewards the moves that result in success, causing a connection (i.e. an association, a memory) to be established.
For large trees, to more accurately perceive the value of child nodes, MCTS is combined with deep neural nets that guide the search. These neural nets are the “policy networks” and “value networks”: the first selects the move to play, and the latter assesses its success and its value in the system. A rough sketch of how the two combine during selection follows.
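Very roughly, and with all names below hypothetical, the selection rule described in the AlphaGo paper (often called PUCT) combines the two networks like this:

```python
import math
from dataclasses import dataclass

# A sketch of PUCT-style selection: the policy network supplies a
# prior probability for each move; the value network's evaluations
# accumulate in value_sum through backpropagation. Illustrative only.

@dataclass
class MoveStats:
    prior: float            # policy network's probability for this move
    visits: int = 0
    value_sum: float = 0.0  # backed-up value-network evaluations

def puct_score(move, parent_visits, c_puct=1.0):
    # Exploitation: the average backed-up value of the move so far.
    q = move.value_sum / move.visits if move.visits else 0.0
    # Exploration: large for moves the policy network likes but the
    # search has not yet visited often.
    u = c_puct * move.prior * math.sqrt(parent_visits) / (1 + move.visits)
    return q + u

# An unvisited move with a strong prior outranks a mediocre,
# already-explored one:
a = MoveStats(prior=0.6)
b = MoveStats(prior=0.1, visits=10, value_sum=4.0)
print(puct_score(a, 10))  # ~1.90
print(puct_score(b, 10))  # ~0.43
```

The effect is that the policy network narrows Go’s enormous branching factor down to a handful of plausible moves, while the value network replaces many of the slow random playouts, which is how the speed factor mentioned above is addressed.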
MCTS presents a saga of promiscuous connection, resulting in a dense grid of memory and history and in extensive generational sagas. Reproduction is based on getting lucky and seeing what works, i.e. making any and all random connections to see what results in a successful outcome. Hermaphroditic, combining the possibilities of complete reproduction within itself like an earthworm, MCTS’s reproductive behaviour seeks to reduce the time, and raise the efficiency, with which new, successful children can be made. With time, and in every generation, there is more effective, successful coupling, resulting in faster ways to achieve successful children.
Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton (2012). A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, Vol. 4, No. 1, March 2012. http://www.cameronius.com/cv/mcts-survey-master.pdf
Isaac Asimov’s Science Fiction Magazine, First Issue, Vol. 1, No. 1, Spring 1977. https://www.research.ibm.com/deepblue/learn/html/e.8.2.html
Google (2016). AlphaGo: Using machine learning to master the ancient game of Go. https://googleblog.blogspot.de/2016/01/alphago-machine-learning-game-go.html Retrieved April 10, 2016.