Teaching Artificial Intelligence to Lie at Card Games

Applying Probabilistic Modelling to One Night Ultimate Werewolf

Matt Eland on Dec. 4, 2022

Cover image created by Matt Eland using MidJourney

I taught an AI to lie at my favorite card game.

In this article, I’ll explore how I did that and what considerations I had to make while designing an artificial intelligence to play a social deduction-based card game. I’ll also discuss where the project is headed and the approaches you might consider when building game systems as an AI developer.

The game I chose to model is One Night Ultimate Werewolf by Bezier Games. This is a social deduction game modelled on the popular party games of Werewolf and Mafia. I’ll give you a quick overview of the rules in the next section, for those unfamiliar with it.

One Night Ultimate Werewolf is copyrighted by Bezier Games. This article and its associated code are intended for educational and entertainment purposes only.

This content is also available in video form on YouTube

I chose to implement this system as a class library using C# and .NET. I also built a desktop application using Windows Presentation Foundation (WPF) to show the status of the game, though this is mostly for demo purposes and to catch issues that my unit tests don’t spot.

Application Showing a Game of One Night Ultimate Werewolf

The code for this project is freely available in its GitHub Repository.

Let’s talk about what this game is, the decisions I made in building an early AI for it, how probability factors into it, and where the project is going from here.

A Brief Overview of One Night Ultimate Werewolf

Feel free to skip this section if you’re already familiar with ONUW and its rules.

Unlike Werewolf / Mafia, One Night Ultimate Werewolf (ONUW) is designed to be played quickly in less than 15 minutes.

The game can be played with 3 or more players, and you always play with 3 more cards than you have players.

Game Structure

The game itself is broken down into several key phases:

Setup - each player is dealt a card face down and the remaining 3 cards are put in the center as shown below:

Game Setup

Night - each player closes their eyes, then an announcer (provided through the official Android or iOS apps) directs players to wake up and take their turn based on their initial role.
Day - all players open their eyes and discuss what they think happened in the night, who they claim to be, and who might be the werewolf.
Voting - after enough time has elapsed, the announcer prompts the players to point to the player they want to vote for.
Endgame - The player or players with the highest number of votes are considered dead. Their cards are flipped over. If at least one was a werewolf, the good team wins. Otherwise, the evil team wins.

Note: In the event that there are no Werewolves assigned to players, all players can still win by ensuring no player gets more than 1 vote.
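The voting and win rules above are simple enough to sketch in a few lines. This is a hypothetical Python illustration, not code from the project (which is written in C#). The function name and role strings are illustrative assumptions. So is the rule that a player needs at least 2 votes to die, which is inferred from the note above.

```python
from collections import Counter

def resolve_votes(votes, final_roles):
    """Return (dead_players, winning_team) under the rules summarized above.

    votes maps each voter to the player they point at; final_roles maps each
    player to the card they hold once the night is over.
    """
    tally = Counter(votes.values())
    top = max(tally.values())
    # Assumption: nobody dies unless someone receives at least 2 votes.
    dead = sorted(p for p, n in tally.items() if n == top) if top > 1 else []
    if any(final_roles[p] == "Werewolf" for p in dead):
        return dead, "village"
    if not dead and "Werewolf" not in final_roles.values():
        # No Werewolf among the players and nobody died: the village wins.
        return dead, "village"
    # Everything else counts as a win for the evil team, as described above.
    return dead, "werewolves"

votes = {"Santa": "Comet", "Rudolf": "Comet", "Comet": "Rudolf"}
roles = {"Santa": "Mason", "Rudolf": "Werewolf", "Comet": "Villager"}
print(resolve_votes(votes, roles))  # Comet dies, so the werewolves win
```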

Roles

Each card is a role in the game that governs which team you are on and what action you might get to perform during the night phase.

Some sample roles include:

Werewolf - one of the evil roles. You’re trying not to get voted out at the end of the game
Seer - a good role that gets to look at 2 of the cards in the center or 1 player’s card
Mason - a pair of good players (always include 2 masons) who get to see if another player is also a Mason
Troublemaker - a good player who gets to swap 2 other players cards without looking at them
Robber - a good player that gets to swap their card with another player’s in the night phase, looking at their new card in the process
Villager - a good player who gets no special ability

There are far more roles across the different ONUW expansions than I can list here, but these convey some of the game’s intricacies.

One critical factor that new players miss is that you’re not just trying to figure out what roles the other players started the game as. You’re also trying to figure out whether you are still the role you started as, since you may have been swapped onto a different team during the night.
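Here is a short, hypothetical Python illustration of that point. The team you’re on comes from the card you hold at the end of the night, so a single Robber swap puts both players involved on different teams. The `TEAM` table covers only the roles listed above, and the player names are made up.

```python
# Team for each role listed above; the Werewolf is the only evil role here.
TEAM = {
    "Werewolf": "werewolves",
    "Seer": "village",
    "Mason": "village",
    "Troublemaker": "village",
    "Robber": "village",
    "Villager": "village",
}

def robber_swap(cards, robber, target):
    """Swap the Robber's card with the target's and return the Robber's new role."""
    cards[robber], cards[target] = cards[target], cards[robber]
    return cards[robber]

cards = {"Alice": "Robber", "Bob": "Werewolf", "Cara": "Seer"}
new_role = robber_swap(cards, "Alice", "Bob")
print("Alice:", new_role, TEAM[new_role])        # Alice now holds the Werewolf card
print("Bob:", cards["Bob"], TEAM[cards["Bob"]])   # Bob now holds the Robber card
```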

If you’d like to see a sample game of One Night Ultimate Werewolf, check out a video of a sample game from the inventor’s YouTube channel:

As you can see, there’s a lot of role swapping, false claims, and accusations in a real version of Werewolf.

If you want to see more playthroughs, I strongly recommend the exceptional Bouncin Mouncin YouTube channel, which produced over 200 high-quality game playthrough videos over the years. We miss you.

Project Goals

My goal with this AI project was to see if I could create a set of AIs that play against each other, emulating human strategies (and adapting their own) to play the game in a compelling and human-like manner.

But this is a challenging project because of its complexity. In a game of Werewolf, each player must keep track of:

What role they started as
What actions they took in the night
What role they think they are now
What role they are claiming to be to others
What actions they claim they took in the night
How much they trust each other player
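As a rough, hypothetical sketch, that per-player bookkeeping might look like the following Python record. The class and field names are illustrative, and the trust scale from 0.0 to 1.0 is an assumption. None of this is taken from the project’s C# code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PlayerKnowledge:
    """The six things each player has to keep track of during a game."""
    starting_role: str
    night_actions: List[str] = field(default_factory=list)    # what they actually did
    believed_role: Optional[str] = None                        # what they think they are now
    claimed_role: Optional[str] = None                         # what they tell everyone
    claimed_actions: List[str] = field(default_factory=list)  # what they say they did
    trust: Dict[str, float] = field(default_factory=dict)     # other player -> 0.0 (liar) to 1.0 (honest)

    def __post_init__(self):
        # Until something suggests otherwise, a player assumes they kept their card.
        if self.believed_role is None:
            self.believed_role = self.starting_role

rudolf = PlayerKnowledge(
    starting_role="Werewolf",
    night_actions=["Looked at a center card: Seer"],
    claimed_role="Seer",
)
rudolf.trust["Santa"] = 0.5
```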

Limiting the Scope

This was too much to take on at once, so I chose to limit my scope significantly.

First, I decided not to add roles like the Troublemaker that can move cards during the night, at least initially. This simplifies the first version of the probability engine powering the AI.

Second, I decided to restrict the way the AIs interact with each other. Each AI claims a role, claims the actions it took, and then expresses suspicions about others and says who it plans to vote for. After that, the voting phase kicks in.
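That structured conversation could be sketched as a fixed sequence of statement kinds. This is a hypothetical Python outline with illustrative names. Running the discussion in rounds, where every player claims a role before anyone describes their actions, is an assumption, since the ordering isn’t pinned down here.

```python
from dataclasses import dataclass
from enum import Enum

class StatementKind(Enum):
    CLAIM_ROLE = 1     # "I am the Seer"
    CLAIM_ACTION = 2   # "I looked at two center cards"
    SUSPICION = 3      # "I think Comet is a Werewolf"
    VOTE_INTENT = 4    # "I'm voting for Comet"

@dataclass
class Statement:
    speaker: str
    kind: StatementKind
    text: str

def run_day_phase(players, decide):
    """Collect one statement of each kind from every player, round by round.

    decide(player, kind, transcript) is a hypothetical callback where each AI
    chooses what to say; it can read everything said so far in transcript.
    """
    transcript = []
    for kind in StatementKind:  # Enum members iterate in definition order
        for player in players:
            transcript.append(Statement(player, kind, decide(player, kind, transcript)))
    return transcript  # the voting phase starts after this

log = run_day_phase(["Santa", "Rudolf", "Comet"],
                    lambda player, kind, transcript: f"{player}: {kind.name}")
```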

Limiting the social interactions to a more formalized process kept the task manageable, though it did introduce more order and structure than is typically present in an ONUW game.

Third, in the initial version, AI players on the village team always tell the truth. This does make it easier for the evil team to claim roles that aren’t in play, but it also makes the initial task of determining whether a player is trustworthy significantly easier. In a real game of ONUW, players on the village team will frequently lie to try to trap evil players in bad lies.

All of these strategic decisions made it easier to build an initial ONUW simulation that an AI could play, but they moved the simulation further from the real game. As time goes on, I want to reverse some of these decisions, but limiting scope is usually a very healthy decision in any game project, especially early on.

Finally, at the time of writing this article, I’m not yet done with phase 1 of the AI. Right now the AIs claim roles and tell somewhat convincing lies, but they don’t coordinate their votes and don’t yet accuse other players.

Candidate Approaches

When it comes to designing an AI, there are many candidate approaches. AI is a very broad field, and there is so much variation within game AI alone that no one solution is going to work for every project.

For this AI project I considered a number of different approaches.

Traditional tree-based approaches such as minimax trees are often used for games like tic-tac-toe, chess, checkers, and Connect Four. These trees tend to perform well at finding optimal moves, but they don’t suit every scenario. In particular, they assume every player can see the full game state, so they are a poor fit for games with hidden information, which is crucial to ONUW.

Instead, because my goal is to be surprised by AI behavior, I thought that reinforcement learning would be a good fit. In reinforcement learning you model the game world and how to interact with it, then let the agent try random things until it sees what works best and what works worst. In short, it learns from throwing science at the wall and seeing what sticks.

The main problem with this is that each role in ONUW behaves completely differently from the others. Either it would be difficult for the AI to get meaningful feedback on its performance in each role, or I’d need to train separate AIs per role. Both options looked likely to run into problems with scale and with a player’s role changing during the night.

I also considered a brute-force approach where all possible states of the game world are kept in memory and the player would compare what they had observed to determine which of those states are possible. Based on that, the player could then vote in such a way as to maximize the percentage of games where their vote would be a good vote.
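A brute-force version is easy to sketch for a tiny game. The hypothetical Python below uses the three-player deck from the example later in this article. It enumerates every distinct deal, keeps only those consistent with one player’s observations, and counts. The slot names and lambda-based observations are illustrative. The number of deals grows factorially with the number of cards, which is where the memory concern comes from.

```python
from itertools import permutations

SLOTS = ["Santa", "Rudolf", "Comet", "Center 1", "Center 2", "Center 3"]
DECK = ["Villager", "Mason", "Mason", "Werewolf", "Werewolf", "Seer"]

def possible_worlds(observations):
    """Every distinct deal (slot -> role) consistent with all observations."""
    # set() drops deals that differ only by swapping identical cards; under a
    # fair shuffle, each remaining distinct deal is equally likely.
    for deal in sorted(set(permutations(DECK))):
        world = dict(zip(SLOTS, deal))
        if all(fits(world) for fits in observations):
            yield world

def probability(worlds, slot, role):
    """Fraction of the possible worlds in which slot holds role."""
    return sum(w[slot] == role for w in worlds) / len(worlds)

# Santa's observations: his card is a Mason, and nobody else woke up as a Mason.
santa_observations = [
    lambda w: w["Santa"] == "Mason",
    lambda w: w["Rudolf"] != "Mason" and w["Comet"] != "Mason",
]
worlds = list(possible_worlds(santa_observations))
# 36 worlds; Rudolf is a Werewolf in half of them; Center 1 holds the Mason in a third
print(len(worlds), probability(worlds, "Rudolf", "Werewolf"), probability(worlds, "Center 1", "Mason"))
```

Unlike a model that updates each card on its own, this enumeration also concludes that the other Mason must be one of the three center cards.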

This is a valid approach, and it would perform particularly well once card-moving roles are introduced. However, it requires storing a lot of possible game states in memory at once, and it’s harder to explain, so I decided against it as an initial approach.

Initial Approach: Probabilistic Models

Instead, I decided to create a probabilistic model of the game board for every player.

In a probabilistic model, every card in play (each player’s card and each center card) has a probability attached for each role it could be.

For example, imagine a sample game with 3 players (Santa, Rudolf, and Comet, given the time of year).

All players know that the game includes the following cards:

1x Villager
2x Masons
2x Werewolves
1x Seer

Now imagine that Santa is dealt a Mason.

Santa’s Starting Knowledge

Santa knows that there is one other Mason in play. To estimate the probability that Rudolf is that Mason, he takes the number of Masons still unaccounted for and divides it by the number of unknown cards.

In this case, that’s 1 remaining Mason divided by 5 unknown cards, resulting in a 20% chance that any given unknown card is the other Mason.

If you apply that to every possible role and every unknown card, you get a probability model that looks something like this:

Santa’s Starting Probability Model

Note: In the picture above, each percentage on a card is the chance that the card is the role in question, not the chance that the role in question is located at that specific card.

This also highlights an interesting fact: Santa currently thinks each of the other two players has a 60% chance of being on the village team and a 40% chance of being a Werewolf.
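The same arithmetic fits in a small function. This hypothetical Python sketch counts the roles a player hasn’t accounted for and divides by the number of unknown cards. `Fraction` keeps the results exact so they match the percentages above. The names `DECK`, `VILLAGE_ROLES`, and `card_probabilities` are illustrative.

```python
from collections import Counter
from fractions import Fraction

DECK = Counter({"Villager": 1, "Mason": 2, "Werewolf": 2, "Seer": 1})
VILLAGE_ROLES = {"Villager", "Mason", "Seer"}

def card_probabilities(deck, known_cards):
    """Chance that any single unknown card is each role.

    known_cards lists roles the player has seen for certain, such as their own card.
    """
    remaining = deck - Counter(known_cards)   # roles not yet accounted for
    unknown = sum(remaining.values())         # cards this player hasn't seen
    return {role: Fraction(count, unknown) for role, count in remaining.items()}

santa = card_probabilities(DECK, ["Mason"])
village_chance = sum(p for role, p in santa.items() if role in VILLAGE_ROLES)
print(santa["Mason"], santa["Werewolf"], village_chance)  # 1/5 2/5 3/5
```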

Let’s fast-forward a bit in the game. The night phase occurs and all Masons in the game wake up. In this case, Santa is the only Mason among the players, so he wakes up alone.

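He now knows definitively that Rudolf and Comet cannot be Masons. In this model, that only changes his probabilities for those two cards; the probabilities for the 3 center cards stay the same, even though, strictly speaking, the remaining Mason must now be somewhere in the center.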

Santa’s Probability Model after observing no fellow Masons

Now it’s a 50 / 50 toss-up for each player as to whether they’re on his team or on the evil team. Worse still, Santa is not going to get any new information aside from learning what the other two players claim.
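Seen from a single card’s point of view, ruling out the Mason just renormalizes Santa’s starting probabilities for Rudolf (or Comet) over the roles that remain:

```latex
\begin{aligned}
P(\text{Werewolf} \mid \neg\text{Mason}) &= \frac{P(\text{Werewolf})}{1 - P(\text{Mason})} = \frac{2/5}{4/5} = \frac{1}{2} \\
P(\text{village team} \mid \neg\text{Mason}) &= \frac{P(\text{Villager}) + P(\text{Seer})}{1 - P(\text{Mason})} = \frac{1/5 + 1/5}{4/5} = \frac{1}{2}
\end{aligned}
```

Both come out to one half, which is the 50 / 50 split for each player.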

Probability and Claims

The night phase now ends, and all players wake up, make their claims, and share what they learned in the night.

Rudolf claims to be the Seer, saying he looked at two cards in the center and saw the Villager and the Mason. This is compatible with Santa’s probability model, since both of those cards could be in the center; however, Santa doesn’t automatically adjust his probability model of the game world to reflect that claim.

Santa’s Probability Model after Claims

Next, Comet makes their claim and states that they are the Villager.

Uh oh. While Rudolf and Comet have each given information that doesn’t conflict with Santa’s mental model, their claims contradict each other: there is only one Villager card, so it cannot be both Comet’s card and a card in the center.
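One simple way to spot this kind of contradiction is to pool every claimed fact of the form “this slot holds this role”. The sketch then flags any role that the claims place in more slots than the deck has copies of. This is a hypothetical Python sketch rather than the project’s logic, and which center slots Rudolf named is made up for the example.

```python
from collections import defaultdict

COPIES = {"Villager": 1, "Mason": 2, "Werewolf": 2, "Seer": 1}

def find_conflicts(claims):
    """Roles that the pooled claims place in more slots than the deck has copies of.

    claims maps each speaker to the {slot: role} facts they asserted or observed.
    """
    slots_for_role = defaultdict(set)
    for facts in claims.values():
        for slot, role in facts.items():
            slots_for_role[role].add(slot)
    return sorted(role for role, slots in slots_for_role.items() if len(slots) > COPIES[role])

claims = {
    "Santa": {"Santa": "Mason"},  # what Santa knows about himself
    "Rudolf": {"Rudolf": "Seer", "Center 1": "Villager", "Center 2": "Mason"},
    "Comet": {"Comet": "Villager"},
}
print(find_conflicts(claims))  # ['Villager']: it can't be Comet's card and a center card
```

This check only counts copies. A fuller version would also flag a single slot that two claims assign different roles.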

Generally, villager players will vote for the player that they think has the highest chance of being on the werewolf team. In the event that two or more players have the same highest probability level, a random one will be selected.

However, Rudolf provided more information than Comet, and that information was compatible with what Santa knew (the other Mason being in the center). That is used as a tie-breaker, so Santa votes with Rudolf to execute Comet.
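The voting rule and tie-breaker might look something like this hypothetical Python sketch. The `compatible_info` score counts how many of a player’s claimed facts fit what the voter already knows. It is an assumed stand-in for “provided more compatible information”, and the project may measure this differently.

```python
import random

def choose_vote(evil_chance, compatible_info, rng=random):
    """Vote for the player most likely to be on the werewolf team.

    evil_chance maps each candidate to the chance they're on the werewolf team.
    compatible_info maps each candidate to a hypothetical score: how many of
    their claimed facts fit what the voter already knows.
    """
    top = max(evil_chance.values())
    suspects = [p for p, chance in evil_chance.items() if chance == top]
    # Tie-break: distrust whoever backed up their claim with the least compatible information.
    fewest = min(compatible_info.get(p, 0) for p in suspects)
    suspects = [p for p in suspects if compatible_info.get(p, 0) == fewest]
    return rng.choice(suspects)  # any remaining tie is broken at random

vote = choose_vote(
    evil_chance={"Rudolf": 0.5, "Comet": 0.5},
    compatible_info={"Rudolf": 2, "Comet": 1},  # Rudolf reported two center cards
)
print(vote)  # Comet
```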

Unfortunately for Santa, when the roles are revealed, he realizes that Rudolf should have been on his naughty list instead.

Rudolf as the Werewolf and Comet as the Villager

Probability for Werewolves

Let’s take a look at how the same probability engine helped Rudolf trick Santa by claiming a safe role.

From Rudolf’s perspective, he drew Werewolf at the game start and woke up as the only Werewolf.

Because Rudolf was the only Werewolf among the players, he was allowed to look at a card from the middle.

In this case, Rudolf sees the Seer in the middle, so he knows neither of the other two players can have that role.

As a result, Rudolf now knows it is safe to claim the Seer role.
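Finding a safe claim can be framed as looking for village roles that, in the liar’s own probability model, no other player can be holding. This hypothetical Python sketch assumes a single shared per-card model for the other players and plain role strings. The function names are illustrative.

```python
from collections import Counter

DECK = Counter({"Villager": 1, "Mason": 2, "Werewolf": 2, "Seer": 1})
EVIL_ROLES = {"Werewolf"}

def other_player_odds(deck, seen, ruled_out):
    """Per-card chance that another player holds each role in the deck.

    seen: roles this player has observed directly (their own card, peeked cards).
    ruled_out: roles the player knows no other player can hold.
    """
    remaining = deck - Counter(seen)
    allowed = {role: n for role, n in remaining.items() if role not in ruled_out}
    total = sum(allowed.values())
    return {role: allowed.get(role, 0) / total for role in deck}

def safe_claims(deck, seen, ruled_out):
    """Village roles that, as far as this player knows, no other player can hold."""
    odds = other_player_odds(deck, seen, ruled_out)
    return sorted(role for role, p in odds.items() if p == 0 and role not in EVIL_ROLES)

# Rudolf holds a Werewolf, woke up alone, and peeked at the Seer in the center.
print(safe_claims(DECK, seen=["Werewolf", "Seer"], ruled_out={"Werewolf"}))  # ['Seer']
```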

However, Rudolf still needs to describe what happened when he was the Seer, and he needs to do so in a way that is as persuasive and believable as possible.

To accomplish this, I have custom code for every role that a player can have. When a player tries to lie and claim that role, this custom code runs to generate believable claims from what that player knows.

In this case, Rudolf claims to have seen a Werewolf card (his own actual card) and a Villager card. The Villager card here is a random pick from the most probable available roles at the time the lie is generated.
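The idea of custom claim code per role can be sketched as a lookup table of generator functions. In this hypothetical Python version, a fake Seer report picks two center cards. For each card, it picks a random role among those tied for the highest probability. The center-card probabilities in the example are invented and don’t reproduce the exact game above.

```python
import random

def most_probable(chances, rng):
    """Random pick among the roles tied for the highest probability."""
    best = max(chances.values())
    return rng.choice(sorted(role for role, p in chances.items() if p == best))

def seer_claim(center_chances, rng):
    """Fake Seer report: two random center cards and a plausible role for each."""
    claim = {}
    for slot in rng.sample(sorted(center_chances), 2):
        # Simplification: never report the same role twice, so the lie can't
        # claim more copies of a role than the deck contains.
        options = {r: p for r, p in center_chances[slot].items() if r not in claim.values()}
        claim[slot] = most_probable(options, rng)
    return claim

# Hypothetical table with one claim generator per role a player might fake.
CLAIM_GENERATORS = {"Seer": seer_claim}

def generate_claim(role, center_chances, rng=random):
    return {"role": role, "saw": CLAIM_GENERATORS[role](center_chances, rng)}

odds = {"Villager": 0.4, "Mason": 0.4, "Werewolf": 0.2}  # invented center-card odds
lie = generate_claim("Seer", {"Center 1": odds, "Center 2": odds, "Center 3": odds})
print(lie)
```

Adding a new role to lie about means adding one entry to `CLAIM_GENERATORS`, which keeps each role’s quirks isolated.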

Unfortunately, Rudolf is not omniscient and doesn’t know that Comet is the actual Villager, so this lie wound up being imperfect.

However, lies in real One Night Ultimate Werewolf games are often made with imperfect information and fall apart, and that’s part of what makes the game compelling and fun.

A World Full of Events

At a technical level, the simulation is focused heavily on card slots and game events.

Card slots are places that hold a card: either actual players or spots in the middle of the game board. They are simple objects that track the initial and current role, as well as the name of the player or slot they’re associated with.
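A card slot might be modelled roughly like this. It’s a hypothetical Python sketch; the field names and the `deal_slots` helper are illustrative, not the project’s C# types. The example deal matches the Santa, Rudolf, and Comet game above.

```python
from dataclasses import dataclass

@dataclass
class CardSlot:
    """A player, or one of the center positions, holding exactly one card."""
    name: str           # the player's name, or e.g. "Center 1"
    initial_role: str   # the card dealt here during setup
    current_role: str   # the card here now, after any night-time moves
    is_player: bool = True

    @property
    def role_changed(self):
        """True if this slot now holds a different role than it was dealt."""
        return self.current_role != self.initial_role

def deal_slots(players, cards):
    """One slot per player, followed by the 3 center slots, in deal order."""
    names = list(players) + ["Center 1", "Center 2", "Center 3"]
    if len(cards) != len(names):
        raise ValueError("ONUW uses exactly 3 more cards than players")
    return [CardSlot(name, card, card, is_player=i < len(players))
            for i, (name, card) in enumerate(zip(names, cards))]

slots = deal_slots(["Santa", "Rudolf", "Comet"],
                   ["Mason", "Werewolf", "Villager", "Mason", "Seer", "Werewolf"])
```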

Events, on the other hand, represent anything that can happen in the game world. Some sample events include:

A player waking up
The seer looking at a card
A player claiming to be the Villager
A Mason seeing another Mason when they woke up
A Werewolf seeing they were the only Werewolf
A player voting

Some events are known to everyone such as voting or claiming roles. Other events are only visible to specific players.

The probability engine looks at all events visible to a specific player as it generates the probabilities for the board.

Some events will tell the probability system that a role is certain for a card. These events include looking at your own card at the beginning of the game or seeing another Mason during the night phase.

Other events tell the probability engine that a card cannot be a specific role. For example, when a Werewolf player wakes up at night and nobody else does, they know all other players cannot be Werewolves. This doesn’t tell them what those players actually are, but it does adjust the probabilities of various cards.
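Putting those pieces together, here’s a hypothetical Python sketch of an event-driven probability engine. Each event records who saw it and whether it says a slot is, or cannot be, a role. The engine then uses only the events visible to one player. Like the model described earlier, it updates each card independently, so ruling out Masons for Rudolf and Comet leaves the center cards unchanged. The `Event` fields and slot names are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

DECK = Counter({"Villager": 1, "Mason": 2, "Werewolf": 2, "Seer": 1})
SLOTS = ["Santa", "Rudolf", "Comet", "Center 1", "Center 2", "Center 3"]

@dataclass(frozen=True)
class Event:
    visible_to: frozenset  # players who observed the event; empty means everyone
    slot: str              # the card slot the event tells us about
    role: str
    certain: bool          # True: the slot IS this role; False: it CANNOT be this role

def visible_events(events, player):
    """Only the events this player actually observed, plus public ones."""
    return [e for e in events if not e.visible_to or player in e.visible_to]

def board_probabilities(events, player):
    """Per-slot role probabilities built from one player's visible events."""
    seen = visible_events(events, player)
    known = {e.slot: e.role for e in seen if e.certain}
    remaining = DECK - Counter(known.values())  # roles not yet pinned to a slot
    board = {}
    for slot in SLOTS:
        if slot in known:
            board[slot] = {known[slot]: 1.0}
            continue
        ruled_out = {e.role for e in seen if not e.certain and e.slot == slot}
        allowed = {r: n for r, n in remaining.items() if r not in ruled_out}
        total = sum(allowed.values())
        board[slot] = {r: n / total for r, n in allowed.items()}
    return board

events = [
    Event(frozenset({"Santa"}), "Santa", "Mason", True),     # Santa looks at his own card
    Event(frozenset({"Santa"}), "Rudolf", "Mason", False),   # no other Mason woke up...
    Event(frozenset({"Santa"}), "Comet", "Mason", False),    # ...so neither is a Mason
    Event(frozenset({"Rudolf"}), "Center 2", "Seer", True),  # Rudolf's peek, hidden from Santa
]
santa_board = board_probabilities(events, "Santa")
print(santa_board["Rudolf"])  # Rudolf: 50% Werewolf, 25% Villager, 25% Seer
```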

The result is a system that gives the AI a baseline heuristic for determining what cards are in play and if a given neighbor is likely to be naughty or nice.

The Result and What Lies Ahead

The simulation, in its current form, is still very limited, particularly in how the AIs vote. Players do not yet coordinate votes or try to persuade other players to vote for players they know to be evil. The result feels like an AI that’s almost there but makes some silly mistakes.

Real human players tend to play much more aggressively and chaotically, and they will push on obvious tactical errors and inconsistencies. Human players also lie, and lie with reckless abandon, even in village roles.

At some point I will take the “training wheels” off the villager AIs and allow them to lie at random early on to trick Werewolves into outing themselves.

Right now the system relies too much on randomness for strategy and role bluffing and I don’t like that. I want to build a neural network or genetic algorithm that can evolve an optimal set of preferences for when to claim what role, and how aggressive to be.

This is a project that has a lot of interest for me, particularly the aspects that involve reinforcement learning and potentially genetic algorithms. That being said, this is intended to be a research project and not a playable game, and it should in no way be considered a substitute for the amazing card game by Bezier Games. Instead, this is a project built to explore the systems of that game and how well an AI can grasp and execute on them.

If you’re curious about the various phases I will be pushing this simulation through, or when I intend to cover various capabilities, check out the project’s Readme file for more information.

I’m excited to see where the project goes once I reach the point where I can train the AI in an automated manner and have it discover new strategies that go beyond raw probability.

Although it’s rare, I have already been fooled by this early AI. I can’t wait to see what creative things the AI will be able to do with what’s next.