Two years ago, back in pre-Covid times, a college friend introduced me to a board game called Hanabi. Unlike chess and other competitive games where players face off against each other, Hanabi is a collaborative game. Players can see other players' cards but not their own, and together they attempt to play a series of cards in a specific order to set off simulated fireworks. Everyone wins or loses together (trust me, it is a really fun game to play with friends and family). Since AI research has reached superhuman performance in competitive games like Go, chess, and some variants of poker, I wondered whether there was any research on purely collaborative gameplay. To my surprise, I found existing research papers on the Hanabi challenge.
“…imperfect information entangles how an agent should behave across multiple observed states. In Hanabi, we observe this when thinking of the policy as a communication protocol between players, where the efficacy of any given protocol depends on the entire scheme rather than how players communicate in a particular observed situation… Due to this entanglement, the type of single-action exploration techniques common in reinforcement learning (e.g., ε-greedy, entropy regularization) can incorrectly evaluate the utility of such exploration steps as they ignore their holistic impact.”
This area is quite challenging. Hanabi is neither a two-player game nor a zero-sum one. In two-player zero-sum games, agents typically compute an equilibrium policy such that no single player can improve their utility by deviating from it; as a result, finding any equilibrium policy gives an agent a meaningful worst-case performance guarantee. In Hanabi, by contrast, the value of an agent’s policy depends on the policies of its teammates. Even if all players play according to the same equilibrium, there can be multiple locally optimal equilibria, some clearly inferior to others. For algorithms that iteratively train independent agents, such as those commonly used in the multi-agent reinforcement learning literature, these inferior equilibria can be particularly difficult to escape, so even learning a good policy for all players is challenging.
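To make that last point concrete, here is a minimal, hypothetical sketch (my own illustration, not from the paper): two independent ε-greedy Q-learners repeatedly play a toy coordination game with one good equilibrium and one inferior one. The payoff matrix, learning rate, and every other parameter are made up for illustration.

```python
import random

# Hypothetical payoff matrix: matching on action 0 pays 10 (good
# equilibrium), matching on action 1 pays 5 (inferior equilibrium),
# and a mismatch pays 0 to both players.
PAYOFF = {(0, 0): 10.0, (1, 1): 5.0, (0, 1): 0.0, (1, 0): 0.0}

def greedy(q, rng):
    # Greedy action with random tie-breaking.
    best = max(q)
    return rng.choice([a for a, v in enumerate(q) if v == best])

def train(seed, episodes=5000, alpha=0.1, epsilon=0.1):
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # each player's per-action value estimates
    for _ in range(episodes):
        # Independent epsilon-greedy exploration for each player.
        a1 = rng.randrange(2) if rng.random() < epsilon else greedy(q1, rng)
        a2 = rng.randrange(2) if rng.random() < epsilon else greedy(q2, rng)
        r = PAYOFF[(a1, a2)]
        # Each player updates only the value of its own chosen action,
        # treating its partner as part of the environment.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (r - q2[a2])
    return greedy(q1, rng), greedy(q2, rng)

outcomes = [train(seed) for seed in range(100)]
print("runs ending at the good equilibrium (0, 0):    ", outcomes.count((0, 0)))
print("runs ending at the inferior equilibrium (1, 1):", outcomes.count((1, 1)))
```

Across many random seeds, a noticeable fraction of runs lock into the inferior (1, 1) outcome: once both learners settle there, a unilateral switch to action 0 earns nothing, so neither agent has any incentive to deviate alone. That is exactly why independently trained agents struggle to escape such equilibria.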
Learning a good policy for all players? Doesn’t that sound like what we face every day? At work, we balance the interests of everyone at the table; in teamwork, we try to keep everyone happy while they contribute efficiently. Everyone has different backgrounds, mindsets, and philosophies, and selfish genes push people to pursue their own interests. Many techniques go into decision making: observation, communication, team-bonding activities… Just like lighting the fireworks in Hanabi, none of it comes easily.
Here’s a quote I took away from a recent company meeting: “Problems are different from dilemmas. Problems have solutions, but dilemmas only have optimizations.” What we face in our lives are mostly dilemmas: we make trade-offs between options to reach outcomes we can live with. Making such optimizations is hard for humans, and I believe developing novel techniques for multi-agent reinforcement learning is crucial for broader collaborative efforts, especially with human partners. Even a tiny bit of improvement could make a tremendous impact on the social world, and I am looking forward to that day.
Resources: