This New Technique Helps Build Autonomous, Self-Learning AI Agents that Passed the Pommerman Challenge

Jesus Rodriguez · Mar 18

The emergence of trends such as self-driving cars and drones has helped popularize an area of artificial intelligence (AI) research known as autonomous agents.
Conceptually, autonomous agents are AI systems that build knowledge in real time based on the characteristics of their surrounding environment as well as of other agents.
If we use the example of self-driving vehicles, the autonomous agent needs to quickly adapt to the information processed by the car's LIDAR sensors in order to avoid collisions and drive safely.
The increasing importance of autonomous agents has attracted the attention of major corporate AI labs and research institutions.
Recently, a team from IBM's AI research lab in Tokyo published a paper proposing a tree-search technique based on pessimistic scenarios that improves the implementation of autonomous agents. The proposed method went on to win first and third places in the X-Games of autonomous agents: the Pommerman Challenge.
What makes the implementation of autonomous agents so challenging is not only the self-learning, real-time nature of the knowledge building process but the fact that those agents operate in multi-agent, partially observable environments.
In an autonomous agent scenario, the AI model not only needs to process environmental information in real time but must also interact with other autonomous agents and learn from their behavior. Additionally, complete information about the environment is not known upfront, which requires the agents to leverage memory techniques to build an incremental representation of it. Again, think about a self-driving car navigating (real time) a road it has never driven before (partially observable) during a traffic jam (multi-agent).
The Pommerman Challenge

Autonomous AI agents are not only incredibly difficult to build but also expensive to test. It's not as if we can test new models in self-driving cars or drones every day.
To streamline the testing and validation of autonomous AI systems and advance the research in the space, the AI community created the Pommerman Challenge, a multi-agent playground to test new autonomous AI systems.
In Pommerman, a team of two agents competes against another team of two agents on an 11 x 11 grid board.
Each agent can observe only a limited area of the board, and the agents cannot communicate with each other.
The goal of a team is to knock down all of the opponents.
Towards this goal, the agents place bombs to destroy wooden walls and collect power-up items that might appear from those wooden walls, while avoiding flames and attacking opponents.
Real-time decision making is one of the characteristics that make Pommerman so difficult. In a typical game, an agent needs to make a decision in about 100 milliseconds, which limits the applicability of computationally expensive techniques such as Monte Carlo Tree Search.
In Pommerman, the branching factor at each step can be as large as 6^4 = 1296, because four agents take actions simultaneously in each step, and there are six possible actions for each agent.
The agents should plan ahead and choose actions by taking into account the explosion of bombs, whose lifetime is 10 steps.
This combination is challenging for tree-search techniques: a search with fewer than 10 levels of depth would miss bomb explosions entirely, while one deep enough to capture them becomes unviable given the large branching factor.
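The arithmetic behind this tension can be sketched in a few lines, using only the numbers stated above (four agents, six actions each, a 10-step bomb lifetime):

```python
# Back-of-the-envelope arithmetic for the Pommerman search space.
agents = 4
actions_per_agent = 6

# All four agents act simultaneously, so one step branches into 6^4 joint actions.
branching_factor = actions_per_agent ** agents
print(branching_factor)  # 1296

# To see a bomb explode, a search must look 10 steps ahead; the full tree
# at that depth is astronomically larger than any ~100 ms decision budget allows.
bomb_lifetime = 10
nodes_at_bomb_depth = branching_factor ** bomb_lifetime
print(f"{nodes_at_bomb_depth:.1e}")  # on the order of 1e31
```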
While the AI community has been steadily making progress in the Pommerman Challenge, the results remain well below those achieved in games such as Atari, Go, or even Poker. The key to a successful autonomous agent in the Pommerman Challenge is the ability to infer critical events far ahead in the future. To address that challenge, IBM decided to rely on a method that combines real-time tree search with a deterministic evaluation of the environment.
Real Time Tree Search with Pessimistic Scenarios

As discussed in the previous section, Pommerman wouldn't be such a challenging scenario for autonomous agents if it weren't for its real-time constraints. Techniques such as Monte Carlo Tree Search (MCTS) would be perfectly suitable for solving the Pommerman Challenge except that they typically take too long to find a solution.
However, in many scenarios, MCTS type techniques are still a viable solution.
Consider the situation in which an agent can survive only by following a particular route.
MCTS is likely to outperform alternatives given the reduced scope of the search.
What the previous example teaches us is that a potential solution to the Pommerman Challenge could use traditional search techniques up to a certain depth and then combine them with inference over scenarios.
This is precisely the approach followed by the IBM team.
In their quest to solve the Pommerman Challenge, IBM leveraged a method that performs a tree search only with a limited depth, but the leaves of the search tree are evaluated on the basis of a deterministic and pessimistic scenario.
The new approach keeps the search tree small, because it branches only up to a limited depth. At the same time, it can take into account critical events that might occur far ahead in the future, because the leaves are evaluated with a deterministic scenario that can extend much further than branching would allow.
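A minimal sketch of this idea might look as follows. All names here and the toy usage below are hypothetical; the paper's actual state representation, action model, and evaluation are far richer:

```python
# Hypothetical sketch: depth-limited tree search whose leaves are scored by a
# deterministic, pessimistic rollout instead of further branching.

def pessimistic_rollout(state, horizon, step, evaluate):
    """Advance `state` deterministically for `horizon` steps, always assuming
    the worst plausible behavior of uncontrolled objects, then score it."""
    for _ in range(horizon):
        state = step(state, pessimistic=True)  # a single successor: no branching
    return evaluate(state)

def search(state, depth, actions, apply_action, step, evaluate, horizon=10):
    """Return (best_score, best_action) for a tree of limited `depth`.
    Leaves are evaluated under the pessimistic scenario rather than expanded."""
    if depth == 0:
        return pessimistic_rollout(state, horizon, step, evaluate), None
    best_score, best_action = float("-inf"), None
    for action in actions(state):
        child = apply_action(state, action)
        score, _ = search(child, depth - 1, actions, apply_action,
                          step, evaluate, horizon)
        if score > best_score:
            best_score, best_action = score, action
    return best_score, best_action
```

The horizon of the rollout (here defaulting to 10, matching the bomb lifetime) can be much longer than the branching depth, which is what lets the agent anticipate distant explosions cheaply.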
The idea of relying on pessimistic scenarios is grounded in the observation that good actions are often those that perform well under pessimistic scenarios, particularly in cases where safety is a primary concern.
One of the key aspects of IBM’s tree-search strategy is the generation and evaluation of the pessimistic scenario.
The generation process takes place for each of the leaves in the search tree.
The IBM model assumes that the state of the environment can be represented by the positions of objects.
Some of those objects change their positions randomly or depending on the actions of the agents, which is what forces a tree search to branch.
If one can tell the worst sequence of the positions of an object among all of the possibilities, one can place and move that object accordingly in the pessimistic scenario.
After generating the different pessimistic scenarios, the IBM agent evaluates them using a score that quantifies the survivability of the agent: an indication of the number of positions in which the agent can stay safely across the sequence of board states.
Intuitively, an agent is considered to have high survivability if there are many positions that the agent can reach without contacting the other agents.
In that sense, the IBM autonomous agent chooses actions that maximize its level of survivability.
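A simplified sketch of such a survivability score might count, over a sequence of board states, the positions an agent can occupy at the end while staying on safe cells throughout. The function and its board encoding are assumptions for illustration, not the paper's actual metric:

```python
# Hypothetical survivability sketch. Each board in `boards` is a 2D grid of
# booleans (True = safe cell at that time step). We count the positions the
# agent can hold at the final step while only ever standing on safe cells.

MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # stay put or move 4-ways

def survivability(boards, start):
    rows, cols = len(boards[0]), len(boards[0][0])
    reachable = {start} if boards[0][start[0]][start[1]] else set()
    for board in boards[1:]:
        nxt = set()
        for (r, c) in reachable:
            for dr, dc in MOVES:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc]:
                    nxt.add((nr, nc))
        reachable = nxt  # positions still alive after this step
    return len(reachable)
```

Under this reading, an agent hemmed in by flames and opponents has few reachable safe cells (low survivability), so maximizing the score naturally steers it toward open, defensible positions.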
IBM evaluated the new model against state-of-the-art agents and the results were remarkable.
For starters, the new agents captured first and third place in the Pommerman competition held at the Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018) in Montreal.
One of the most impressive discoveries was to see how the effectiveness of the agent's play improved proportionally to the level of pessimism, as illustrated in the following figure.
Autonomous agents are going to be one of the next frontiers in the evolution of AI.
The work that companies like IBM are doing to evolve this space can transition the field of autonomous agents from very specialized applications in self-driving cars or drones to more mainstream scenarios.