Danny LoebChallenges in
|
This document is organized into an Introductory section, which is followed by a mathematical evaluation of negotiation in such games. Finally, the application of this theory and research in a programming project called the Bordeaux diplomat is detailed, and the conclusions which can be drawn from this article are presented. A bibliography containing the references cited in this article is provided as an appendix.
This article is also available for downloading in DVI and LaTeX format from the web site:
Whereas a great deal of work has been done on two-player games, multi-player games are only beginning to become understood both on a theoretical and on a practical level. In so far as one studies games in an attempt to model the real world, this is a terrible loss, for large number of participants are involved in most real world conflicts.
The game of Diplomacy was invented in 1959 by Allan Calhamer to simulate one such conflict: the First World War. This game was chosen as a test bed for our multi-player game playing theories.
As in any multi-player game, positions are extremely difficult to analyze. The mathematical machinery developed to analyze two-player games crumbles, and must be rebuilt from scratch. In the final analysis, the most important factor in evaluating a position of a multi-player game is not located on the game board, but rather in the minds of the players. For this reason, in any multi-player game negotiation becomes necessary in order to determine, and perhaps modify these attitudes.
Diplomacy is an extremely rich example of a multi-player game. The number of players allows a large number of free-wheeling alliances without degenerating into total chaos. Many heuristics exist by which to measure a player's progress to victory, and there are a large number of points which need to be negotiated during the formation and continued operation of an alliance.
Furthermore, just as real life, Diplomacy is not a sequential game. Moves must be submitted simultaneously for all players for all of their units. Thus, normal search techniques either no longer apply or get bogged down in the tremendous size of the game tree, for it is impossible to complete even a one-ply search.
Finally, Diplomacy is an extremely suitable testbed for different approaches to multi-player games, since it is sufficiently popular to encourage unrelated groups of researchers to work on the game.
According to legend, the first play-by-mail Diplomacy game (1963A) included a diplomat (diplomacy automata) by Dave McDaniel which was eliminated in 1903. (See bibliography entry "MB".) In 1982, the Avalon Hill Game Corporation marketed "Computer Diplomacy." The program is designed primarily as a game master, but has very limited playing capabilities as well.
At the 1989 Conference on Computer Games, Kraus, Ephrati, and Lehmann introduced a diplomat they developed over three years at Hebrew University (see KLE1,KLE2). Their program is very good, and negotiates like a human. It constitutes the first serious attempt at a computer Diplomacy program.
During the 1991 Conference on Computer Games, Michael Hall and the author elaborated several ideas by which a computer could play the game Diplomacy. Whereas the Hebrew University diplomat is based primarily on the negotiation, our diplomat is based primarily on the strategy of the game. See bibliography entry HL for details. The present article outlines the Diplomacy Programming Project's progress since last year and connects our ideas with a new mathematical theory developed for multi-player games. (See also bibliography entry DL.)
In order to make the article as accessible as possible, we have preferred to omit all references to the detailed rules of the game. (For these, see the bibliography entry Rules.) In fact, most of the ideas discussed below can be be applied to any multi-person conflict.
I would like to express my appreciation to Constantin Staykoff for his dedication and hard work in implementing the strategic module of the Bordeaux Diplomat, and Ken Lowe for the use of his Diplomacy Adjudicator. I would further like to thank all the members of the Diplomacy Programming Project for their encouragement and help.
In theory, the minimax algorithm can be used to play
any two-player, deterministic, sequential-move game perfectly.
However, the above analysis guarantees that there exists a
perfect strategy to which any approximation thereof can be compared.
However, what happens when a third player is added?
For simplicity, assume that one player will win, and the other
two players lose. However, similar remarks hold in the absence of
such assumptions.
A game tree can still be formed as above with values assigned to the leaves.
However, problems are encountered when minimaxing the values back
up the tree.
Let n be an A-node in the tree whose children have all
been evaluated. v(n) must now be calculated.
Obviously, if n has a child whose value is equal to an A-win, then
this child is a good strategy for A, and the value of n will be a
win for A. Similarly, if all of the children of n are wins
for say B, then n is itself a win for B. However, what
happens if some of the children of n are wins for B and some are
wins for C? It is then impossible to evaluate n without some
psychological information regarding A's attitude to such situations.
His choice of move is indifferent, yet greatly affects the
strategy of the other players.
Jim Propp calls such positions "queer" (see bibliography entry 3PIG).
Are queer positions very common? If not, then we can hope to frequently
apply the traditional techniques mentioned above.
Unfortunately, given any impartial game (one in which to answer the question
"What moves are possible?", you need not pose the question "Whose turn is it?"
Thus, Chess is not an impartial game, since only White can move White's
pieces), G, the game 2+2+G is queer. Thus, impartial games large
enough to
contain a 2+2 component are queer.
A simple probabilistic argument shows that virtually all games are
queer. Let a,b,c,q represent the probabilities that a position is a
win for any given player or that it is queer respectively. Without
loss of generality, we consider games in which all internal nodes have
two children. By
symmetry, a=b=c. Thus, the probability to be queer q=1-3a is equal
to the probability that neither choice is a win for the
current player,
yet the two choices are wins for different opponents
(1-a)^{2} - 2a^{2}. The only solution that represents a valid
probability is a=0. Thus, q=1.
A traditional technique in analyzing games has been to break down the
game into a sum of simpler games. That is, the game G+H is the game
in which players may play in either G or H but not both. The game
is over when play is no longer possible in either summand. The last
player to have played is the winner.
Typically, the most powerful results have been found for impartial
games. The two possible results of a two-player game are N (win for
the next player), and P (win for the previous player).
For three-player impartial games, the possible sums are quite
complex (see the table below). The sums are calculated on the
premise that play rotates NFPNFPNFP.... (see 3PIG).
In the above classification, queer games are all considered to have
the same value. Indeed, it has been believed (see JVN) that all
three-player games are strictly-determined or symmetric
non-strictly determined. However, are queer games always symmetric?
For example, suppose N could choose between two
queer positions: one where F would name N or P as
winner, and one where N would name F or P as winner.
Perhaps, N would prefer the first choice even though both are queer
games, since he would then retain some chance of winning.
The two positions above must then be given different values: f1
and n1 respectively. Obviously, p1 can be defined in a
similar manner. Now, denote forced wins by f0, p0, and
n0 respectively. N has the following preferences
n0 > f1 , p1 > f0 , p0.
He is indifferent regarding f0 and p0. A position leading to
such a choice can not be simplified by his opponents, and must be
labeled simply n1.
Is N necessarily indifferent between f1 and p1?
Unfortunately, we can not answer this question until we clarify how a
player makes a choice between equally interesting possibilities.
One assumption is that he makes a random move. In that case, f1
may be preferred to p1 or visa versa. However, consider a game
with additional moves equivalent to certain other moves. For example,
in Chess allowing the Rooks to be moved by either
hand and all other pieces to be moved only by the left hand. Is it
then more likely the player will move his rook?
Although plausible, this is not a very pleasing model since it does
not allow us to prune redundant branches from our game tree. Instead,
I propose the model by which the players establish equivalence
classes of positions. (Two positions are said to be equivalent if the
equivalence classes of their non-dominated options are in one-to-one
correspondence via equivalence.)
Moves are made by randomly choosing a non-dominated equivalence class,
and then selecting a move within this class.
In this way, f1 and p1 are equivalence classes of equal
interest to N. He makes his choice randomly between the two. This
choice between F and P to select the eventual winner can be
represented as n2.
Similarly, n3 is the choice by N of who will choose the person
who will choose the person who
selects the winner. In fact, it can be shown that any position in a
three-player finite sequential perfect-knowlege game is equivalent to exactly
one of the equivalence
classes ni,pi,fi where i is an integer. ni+1 for
example is defined to be a choice for N between pi and fi.
Thus, we have the counter-intuitive result that no game is
symmetric under the hypotheses above.
Calculations with these refined classes become even more complicated.
For example, whereas "Nim" with two piles is completely
trivial with two players (the next player wins unless the two piles are of
equal size), there is no apparent rhyme or reason in the three-player two-pile
"Nim" table (see the table below).
Given a game G and a partition P of its players, we
define the quotient game G/ to be a new game with one player for
each block
(or alliance) of P. The player B in P in the game
G/P must make a move whenever it is the turn of some p\in
B. Thus, in G/P there might not be simple rotation
of play even if in G the players take turns playing in simple rotation.
The moves available to B in the game G/P are the same as those
available to P in the game G. B wins the game G/P
if any P in B wins.
A is said to be a winning (resp. losing) alliance of players in the game G
if the quotient game G/{A,P-A} is a win (resp. loss) for the
alliance A.
The set of winning games is a voting scheme, since adding players to
an alliance can only strengthen it, and given a pair of opposing
coalitions exactly one can win while the other must lose. (If
the game includes results other the simple
wins and losses, then the notion of a voting scheme must be
generalized. A generalized voting scheme is a order preserving map
from the subsets of V into the set of results {-n,-n+2,...,n-2,n}
such that f(A)=-f(V-A))
The play of a game can be viewed as the selection of a winning
coalition (see Sh).
The difference between the coarse classification and the fine
classification of game is that when a winning alliance is formed in
fact the alliance itself does not win, but rather only one member of
the alliance. Thus, in the end, most alliances are unstable. It is
this unstability which the fine types fi,ni,pi attempt to
measure in three-player games.
In conclusion, just as two-dimensional projections are often
insufficient to recreate a three-dimensional object, these coarse
classifications based on
two-player projections of a multi-player game do not contain the
subtle nuances which give the game its interest. However, it is
difficult to develop a single complete and rigorous theory as was the
case for the theory of two-player games.
Three-player games are much more complicated than
two-player games. Whereas, in a two-player game, there is only one
adversarial relationship, in a three-player game, any of the three
players can be opposed by an alliance of the other two players.
The exact strategy derived depends on many "psychological"
assumptions on the behavior of the other players when confronted with
choices of equal interest. For example, when a loss seems certain,
many players will choose to share the defeat with the player most
responsible for their loss. While maximizing your position against
such a psychology, one must be careful not to give the impression that
you are minimizing the position of any other player.
Whereas, when a victory seems possible, players chose their winning
coalition according to their estimated chances of having the victory
accorded to them. Thus, all critical members of an alliance must be
assured of the likelihood of victory for them.
In any two-player game, players can only be pleasantly surprised when
their opponent makes a bad move. In that way, the player will often
find a previously unavailable winning strategy.
However, this is not the case in three-player games.
Suppose for example that, N chooses f0 (a forced
win for F) when he could have chosen f1 (by which F would
have decided whether P or N would win). This is a bad move for
N
since he no longer has any chance to win. Nevertheless, P also loses
from this error.
Thus, in any three-player game, negotiation is necessary for several
reasons. First, as we saw above, each player will attempt to shift the
blame for the other player's misfortunes on the third player.
Here we discuss matters of loyalty and treason, and try to place an
appropriate "spin" on the moves that we have made.
Secondly, negotiation can be used to avoid the sort of error that N
made in the example above. If P had convinced N of the danger that
he was in, then perhaps N would have made the move that was in their
mutual interest.
On the other hand, if the other player no longer has any trust in you,
then he will be less receptive to any advice you might give. In an
extreme case, he might ignore his own normal preferences, and simply
hand the victory to the third player.
In a game with a very complicated search tree, communication does not
merely serve for negotiation itself, but rather these suggestions
allow a pair of players temporarily working together to exchange
information, and thus examine the search tree in parallel.
In order to better study the relationship between negotiation and
multi-player games, we chose the game Diplomacy.
Obviously, many other multi-player games exist. However, certain of these
prohibit negotiation among players. For example, in the game of Poker,
perhaps some information could usefully be exchanged among the players.
However, this is normally forbidden.
Some games such as three-player "Nim" are so simple that there
is little to discuss. Moreover, in any impartial game, there is a
great symmetry which
overshadows the negotiation. In a position n1, what can the
player F say to N different from what P might say in order to
persuade N to attribute the victory to him.
On the other hand, although there is no inherent symmetry in the game
Diplomacy,
statistics show that each country has an approximately equal chance to win
the game (see PG). In fact, according to Alan Calhamer, creator of
the game, a perfect game of Diplomacy is one in which no country is
eliminated, but rather everyone allies against the current leader until he
is weakened and a new leader appears. All of this coalition building and
rebuilding generally involves much detailed negotiation. Furthermore, an
effective alliance will require coordinated movements. Through these
negotiations the participants in any alliance will determine which of the
units will make attacks, and which will support those attacks.
Moreover, the strategy of the game Diplomacy is very rich. Thus, in
proposing alliances with Turkey, Russia and Austria-Hungary will each
be able to offer a number of advantages and disadvantages to Turkey.
Depending on the temperament of the Turkish player, he can accept one
of the offers, hold off while waiting further developments, or
even pretend to accept both offers while intending to soon
betray the trust of one or both of the other players.
When testing game programs, we have learned much more from
machine-versus-machine competitions than from machine-versus-human
competitions.
The humans are very unpredictable. Thus, a victory one day might not
mean that you have improved your program, but rather that the human is
tired, or has gotten overconfident, etc.
On the other hand, computer tests are repeatable. One variable can be
changed at a time, while hundreds of test games are played
automatically. Using such techniques, we are able to
automatically fine-tune the position evaluation heuristics of a
computer game program. Some programs use automated learning
techniques; in this case, it is vital that the program be able to play
thousands of games automatically.
For a game like Diplomacy, it is even more important to emphasize
automated testing.
Since there are seven players, an "average" program should win about
one seventh of the time. Thus, as compared with a two-player game,
approximately three or four times as many test games should be run in
order to statistically prove a similar percentage increase in the
number of wins.
Furthermore, the heuristic position evaluation used by the Bordeaux
Diplomat contains a large number of parameters whose values should
eventually be optimized through repeated tests.
When testing a Diplomacy program with human players, we have seen that
each player has many preconceived notions about the computer.
These preconceptions lead to unwarranted prejudices for or against
the computer, which in turn
influence the alliance formation process much more than any purely
game-related factor.
Thus, the real capacity of the program is clouded by a large number of
extraneous issues.
To avoid this, test games against humans have been organized using
Gunboat rules. By these rules, the identity of each player is kept
secret, and no communication is allowed other than through the
movements themselves. However, by eliminating communication, we have
eliminated the negotiation which was our main object of study.
Indeed, the ideal solution is to test our program against computer
opponents.
The first opponents one might think of are copies of our program, or
alternate versions of our program. However, this is an extremely
inadequate solution. Obviously, programs could recognize copies of
themselves by sending special messages at the beginning of the game.
The program would then ally with copies of itself, eliminate
any remaining players, and finally declare a draw.
Here, we assume that the programmer is trying to recognize his own
program. However, similar effects can occur without such intentions on
the part of the programmer.
For example, it is reasonable, all things being
equal, to follow advice from an intelligent player -- yet how
should a program judge the intelligence of others? The most reasonable
criterion is that an intelligent player would suggest moves that you consider
very strong. However, if the other program is a copy of yourself, then
it will evaluate moves the same way you do, and then will probably
suggest to you the very moves you would have thought of making
yourself. Under such conditions, your program would inevitably
ally with itself.
Once again the subject at hand is being clouded by a host of
irrelevant details. To avoid these problems, any Diplomacy program
should be tested against a set of programs written by different groups
of programmers using completely different techniques.
But how can seven diplomats programmed by
different people carry out any sort of meaningful dialog?
How should messages be transmitted between programs? And how can these
messages be understood?
In many computer game competitions, moves are transmitted by hand. The
computers sit side-by-side while operators enter moves. This
guarantees
compatibility, but in a game of Diplomacy, where for each
turn up to 34 movements must be entered, and may be preceded by arbitrarily
many diplomatic messages, the game would inevitably slow down to a
standstill. Indeed, the test at the 1992 Computer Olympiad (see DL) was
terminated after 12 movement phases due to lack of time, whereas it is
important to run a large number of complete game in order to
accurately measure the potential of a Diplomacy program.
To run the necessary number of games in a reasonable amount of time, the
computers communicate directly. The Diplomat
Interface has been developed by the Diplomacy
Programming Project to facilitate such communication.
This interface is written in LCS, is a variant of SML developed by
Bernard Berthomeiu and the Outil et Logiciel pour le
Communication group at the Laboratoire d'Automatique et de
l'Analyse de Systeme in Toulouse. LCS differs from SML in that it
allows the management of parallel processing. This implementation choice
notwithstanding, the interface is capable of
launching programs written in any language within the UNIX
operating system. The standard output of each player-diplomat is
rerouted via the use of an LCS instream to the LCS agent responsible for the
communication with that player-diplomat. This agent then
can output in the corresponding LCS outstream where it is
available on a FIFO basis as standard input for the diplomat.
To simulate the play of humans against machines, the Diplomacy
Interface is capable of opening X or suntools windows for
specified countries. Input and output for these windows are handled in
a manner similar to that described above. Actually, it is impossible
for a diplomat to determine who is playing the other countries: a
human or a machine. Thus, no prejudices against humans can be
built into the diplomats, and humans must perform a sort of Turing test
if they wish to distinguish the machines from other humans.
Each player communicates only with the Diplomat Interface. The
Diplomat Interface acts as a game master. That is to say, players can
submit their moves with the SUB command. When all
the moves are
in and all players are ready to go, or when the deadline arrive,
results are computed and distributed with the ORD
message. The
current position is announced via the NOW message,
the missing
orders for next turn are requested with the MIS message.
It is the Diplomat Interface that declares the game over either
because a player has reached the victory condition of 18 supply
centers, or because the players have agreed on a draw with the
DRW command.
However, the Diplomat Interface is also a real interface; that is to
say, it can pass messages from one player to another.
It was decided that this communication should not be made in a natural
language. The study of natural languages is an entirely separate and
difficult field of research; but we wish to
concentrate on the problems directly related to negotiation and game
playing. Thus, a simplified language called the DPP Protocol was
devised by which the players would be required to communicate
(see DPPP,DPPEx).
Tests using actual negotiation transcripts show that this simple
language allows the expression of nearly all of the concepts usually
employed when negotiating in Diplomacy. It is impossible to say that it
is sunny outside, but it is easy to threaten a player with war if he
takes one of your cities. For example,
In order to make the language as rich as possible, several dozen
predicates have been defined. In addition to their grammar,
definitions of their suggested meaning and examples of their use are
given. However, there are no guarantees that other programs will
interpret the messages accordingly. One can always write a
psychopathic program which attacks only those who propose peace.
However, one could expect that, as in the real world, psychopaths would
have a terribly short life expectancy in Diplomacy. In consequence,
most reasonable diplomats will start all relationships with the
tentative assumption that their partners will react "normally" to any
signals made.
Certain programs might not understand all possible
constructions in the protocol. However, for the most part, they should
understand the messages we indicate in the dictionary as the most
basic. This includes the message TRY. Its purpose is
that upon
receipt of an unknown message, we respond with TRY
and a list of
constructions understood by our program. In this way, some sort of
communication can be held between programs of widely differing levels
of sophistication.
The Diplomat proposed in bibliography entry HL is based on a
strategic core.
This lower brain of the diplomat does all the calculations and thus
forms the basis for the decisions taken by the upper brain, or
negotiator part of the diplomat.
The negotiator does not know the details of the map, nor the rules of
the game. On the other hand, it must be able to communicate with the
other players, distinguish between friends and enemies, form
alliances, and so on. To make decisions, the negotiators poses
concrete questions to the strategic core, and following the answers to
these questions, it chooses the appropriate moves to submit, or
messages to send out.
The strategic core does not know which country the diplomat is playing.
It is simply a disinterested, objective observer who calculates the
best moves for any country given a certain number of constraints
along with the value of the resulting position of all countries.
This strategic core can be tested separately from the rest of the
program (as was done at the 1992 Computer Olympiad). However, its
failure to communicate and to grasp the notions of balance of power,
greatly limit its potential.
On the other hand, the program's division between the strategic core
and the negotiator is completely modular. Thus, it would be possible
to redesign the Bordeaux diplomat simply by replacing one of the two
modules with a new program. Given a collection of several negotiators
and several strategic cores based on different algorithms, we can easily
construct an enormous number of full diplomats.
The strategic core is given the current position including the supply
center distribution, and a set of "concerned" countries, which has been
pre-divided into alliances by the negotiator. Each alliance can
optionally have a list of "mortal enemies" designated. Next, the
length and depth of the search is specified. The core can be told that
certain countries are diplomatically bound not
to enter certain "demilitarized zones," or to make or not make certain
moves.
Finally, a "seed" can be specified for the search.
In return, the strategic core must return for each concerned
country the move that maximizes the total interest of their alliance
(while minimizing the interest of their enemies) subject to the
constraints given along with the values themselves.
Obviously the strategic core is used to select the moves to be used
subject to the constraints of all of the treaties which we have agreed
to, and decided to follow.
However, the core has many other applications:
Many difficulties were encountered in writing the strategic core of our
program. It was first outlined in bibliography entry HL, but several
difficulties
were noted in the search routine. In the end, a new algorithm was
implemented by Constantin Staykoff.
As we have seen above, perfect-knowledge position evaluation
is much more difficult to define in a multi-player
environment -- especially in a game such as Diplomacy with as many as
seven players.
Already in two-player games such as Chess, the strategy is
sufficiently rich to prevent programs from playing perfectly.
Hence, a search is made through the most "important" parts of the
game tree, and value of the leave positions are then "calculated."
The selection of tree paths and the evaluation of the terminal
positions are governed by various heuristics.
The game tree of Diplomacy is much more complex. All units must move
simultanously. Thus, the game is not a sequential game, but rather a
seven-dimensional matricial game. For example, the number of possible
non-equivalent openings is 4,430,690,040,914,420 (over four
quadrillion, see bibliography entry Green) as
compared to 20 in chess. In chess, the average number of moves mounts
to about three dozen during the mid-game. However, in Diplomacy, the
number of available moves increases by an additional factor of several
billion during the movement phases as the number of units and possible
supports rises.
An efficient simplex algorithm has been invented by Lemke and Howson
(see LH) to find Nash equilibria in two-dimension matricial games.
However, even if this method could be generalized to multi-player
games, there would not be sufficient time to input the entire game
matrix to the algorithm. Hence, whereas it is possible to search
several turns ahead in chess, it is impossible to conduct a complete
one-ply search during a movement phase of the game Diplomacy. Instead,
we must carefully converge on the Nash equilibria in the enormous
matrix without looking at all the possible moves.
It is senseless to consider looking at two movement phases at the same
time. All predictional capacity must then be built into our position
evaluation heuristics.
Fortunately, during the building and retreat phases, the number of
possible moves is never quite so astronomical. It is thus possible to
do a complete one-ply analysis of these phases of the game.
Optionally, the search of a build or retreat phase could be less
complete, but anticipate the following phase via a recursive call to
the strategic module of the program.
A complete description of the heuristics of our position evaluator can
be found in bibliography entry HL, so we will concentrate on the search
techniques itself. What moves should be considered? In what combinations? And
in what order?
The search technique proposed in EBBFS (Even Better
Best-First Search) involved an iterative search for moves for all of
the concerned countries in parallel. Moves were considered one at a
time by each alliance following a set agenda and adding orders for one
unit a time (Best-First). The results of a given move were calculated
in relation to the "current best" moves for each of the other
alliances. The position is kept as current best move or is rejected
according to the output the position evaluator gives to the resulting
position.
Suppose a move MA was rejected for the alliance A because
the alliance B had considered the move MB to be its
best move at the time A evaluated MA, but later B rejects
MB for a better move M'B. According to the above algorithm,
the alliance A would not be given a chance to reconsider the
interest of the move MA in light of this development.
Given the drawbacks of EBBFS, we propose a new search technique called
RES (Refined Evolutionary Search). The name is derived from the fact
that this search strategy is similar to that adopted by nature to play
the game "Biology."
In the game biology, each player must chose DNA sequences
families. These sequences then define a certain number of living
organisms. If the descendants of these living organisms are numerous
after several billion years in a certain prechosen environment, then
the player has won.
To play the game of biology, a player must in theory test every DNA
sequences family in competition against every possible set of rivals.
However, such a research is impossible. Thus, nature has adopted the
evolutionary search techniques. By this method, animals are mutated
one at a time by randomly changing a certain number of genomes. The
successful mutations replace the original species,
and the unsuccessful mutations are rejected.
Similarly, in Diplomacy, a large alliance cannot possibly consider
all of its movements. Thus, the research is started by considering a
certain "seed" move. This move being considered is then mutated by
selected a certain number of units, and randomly selecting new moves
for these units. Mutations are made for one alliance after another
until the maximum search length has been reached.
The value of any given mutation with respect to the current moves of
all other alliances is compared to the value of the original move with
respect to the current moves of all other alliances. In other
words, when an alliance adopts a new strategy, this may change the
value of the strategies already being considered by the other
alliances.
A certain number of consistency rules cut the loss of time due to the
consideration of uninteresting mutations. In all cases where allied
forces are ordered to attack each other, the moves are changed into
mutual support. Furthermore, as a time saving measure, support is only
considered among units belonging to the same alliance.
If the search passes through Nash equilibrium, then no alliance will
have any interest in changing its moves. Thus, no Nash equilibria will
be lost.
Nevertheless, the program only considers pure strategies whereas pure
Nash equilibria do not exist for all games. Suppose Italy with his
fleet in the Tyrrhenian Sea is trying to
counter Turkey's incursion into the Ionian Sea. If Italy defends
Naples, then Turkey should attack Tunis. However, if Italy defends
Tunis, then Turkey should attack Naples.
Assuming that Italy and Turkey are the only countries concerned by
this situation, the RES search will eventually enter a
cyclic pattern of length four as described above.
Eventually, we would like our program to be able to detect such
cycles. The strategies involved in these cycles could then be
extracted and subjected to a detailed matricial game analysis. A mixed
Nash equilibrium would then be found by which Italy moves to Naples
and Tunis with certain probabilities. At that point, a random choice
would be made, and the move evaluated according the expected return.
Currently our program does not do such detailed analysis but merely
halts at a random point (the search limit) and returns the strategies
currently being considered and their actual return.
The research is designated as "refined," since that is how the
mutation rate is set. At the beginning of the research, all units are
reassigned new orders. There is no reason to believe that
the old orders are good, so we attempt to examine as many completely
different possibilities as we can. Later on in the research, we are
more confident in the moves being considered, and therefore restrict
ourselves to smaller and smaller modifications. As the search limit
approaches, we only consider changing the order of one unit at a time,
or permute the supports used.
The following procedure can be used to calculate best moves and their values
for each given alliance with respect to a certain position (units and supply
centers) in the game Diplomacy and under the following constraints:
demilitarized zones, required moves, forbidden moves, search depth, and search
width.
In biology, Mammals have retained their deep-diving instinct to some
measure despite having left the oceans millions of years ago.
Similarly, in Diplomacy, a player may attempt to defend a space which
comes under attack early in the research. In response, the attacking
alliance may add enough support that defense becomes hopeless. However,
the defender will persist in his futile support unless there is
positive gain by making some other move. (When there is a
positive gain available from another move, a cyclic situation as
described above often results.)
Such behavior in biology is useful; for example, when people fall into
icy waters. Similarly, such a behavior is useful in Diplomacy in order
to make life as difficult as possible for the attacking player.
Mathematically, we seem to lack a good language in which to describe
such problems. Underlying psychological hypotheses are necessary, which
open the game up to the fuzzy world of negotiation.
Our view of negotiation is that it must be based on a solid strategic
core. Otherwise, the diplomat cannot decide what to ask for, nor
evaluate any sort of offer. Early versions of this strategic
core show good survival skills, but lack the savvy to pull off a win.
Nevertheless, in blind tests, players are unable to distinguish
the strategic core from human players.
In combination with an automatic negotiator which will
temper the aggressivity of its strategic alter-ego, we should be able to
compete at an equal level with humans in the game of Diplomacy.
Simultaneously, we hope that existing diplomats will be made available
in conformity with a universal language of communication among
Diplomat which we have developed, and that many new programs are
introduced.
Thus, we will be able to begin the repeated testing necessary to
compare various theories, and to optimize our strategies.
By attempting to solve these difficult problems resulting
from the interactions of intelligent agents, we hope to better understand
what sort of capabilities such an agent should have, when it cannot assume
that everyone in the world is its friend. Creating programs that
can play complex multi-player games is a step towards creating programs
that can deal with the complexities of the real world and the craftiness
of humans.
If you wish to e-mail feedback on this article to the author, and clicking
on the mail address above does not work for you, feel free to use the "Dear
DP..." mail interface, which is located
here....
Three-Player Position Evaluation
Traditional Techniques
However,
due to time constraints, it is seldom possible to analyze the entire
game tree. Therefore, game programs often use various search
techniques (such as alpha-beta from bibliography entry AB)
and approximative
evaluation functions which analyze the position using different
heuristics, and thus need not search to the end of the game.
Queer Games
0+1--0+0
/ n p
/
0+2--0+0
/ n p
/
1+2--1+1--0+1--0+0
q \ f \ n p
\ \
1+0 1+0--0+0
n \ n p
\
0+0
p
Figure 1: Three player nim
For example, consider the game of "Nim" played with
three players. Players take turns removing stones from two piles. Any
number of stones (at least one) can be removed from a single pile. The player
to make the last move wins. In such a game, 0+0 is a win for the
previous player.
Thus, 1+0, 0+1, 2+0 and 0+2 are wins for the next player. Hence, 1+1
is a win for the following player. Therefore, the position 1+2 is
queer, as it yields
a choice for the next player between wins for the previous
player and the following player.
Sums of Games
+ | N P
---+-------------
N | N or P N
P | N P
The table above gives the sums of all two-player impartial games except
N+N. In fact, the sum of two N's
depends on the particular games, yet a theorem of Grundy allows
us without loss of generality to consider only sums of games of "Nim" with one
pile. In these trivial games, the P positions are equivalent to empty
piles, and N positions are equivalent to non-empty piles. The sum of
two N positions is N itself if and only if the sizes of their
associated "Nim" piles are distinct (see bibliography entry
WinningWays).
+ | P N F Q
---+-----------------------
P | PQ NQ FQ Q
N | NQ NFQ PNQ NQ
F | FQ PNQ NQ NFQ
Q | Q NQ NFQ FQ
Only in one particular case (P+Q) is the sum predetermined.
To determine the value of the sum, there are two further problems. First,
there is no generalized theorem of Grundy. Grundy's proof supposes that
all "reversible" moves can be ignored. However, in three-player
impartial games even if P's move is reversed by N,
he may have gained Zugzwang, and placed F in a difficult position.
Second, as can be seen below, even the game of "Nim" is no longer
trivial once there are three players.
Classifying Queer Games
i \ j | 1 2 3 4 5 6 7 8 9 10
---------+---------------------------------------------------------
0 p0 | n0 n0 n0 n0 n0 n0 n0 n0 n0 n0
1 n0 | n1 f1 n2 n2 n2 n2 n2 n2 n2 n2
2 n0 | n1 f1 n2 n2 n2 n2 n2 n2 n2 n2
3 n0 | f1 n2 p1 p1 p1 p1 p1 p1 p1 p1
4 n0 | n2 n2 p1 f2 n3 f3 n4 n4 n4 n4
5 n0 | n2 n2 p1 n3 f3 n4 n4 n4 n4 n4
6 n0 | n2 n2 p1 f3 n4 p3 p3 p3 p3 p3
7 n0 | n2 n2 p1 n4 n4 p3 f4 n5 f5 n6
8 n0 | n2 n2 p1 n4 n4 p3 n5 f5 n6 n6
9 n0 | n2 n2 p1 n4 n4 p3 f5 n6 p5 p5
10 n0 | n2 n2 p1 n4 n4 p3 n6 n6 p5 f6
More players
Until now, we have only discussed games with three players.
When a fourth player is added, the refined classification
(ni,fi,pi) breaks down completely. However, the crude
classification scheme N,F,P,Q can be extended. Instead of
four types of games, now twelve are needed.
Under this view, game types are in one-to-one correspondence with
voting schemes (see bibliography entry Isbell).
Number of Voting Schemes
n 0 1 2 3 4 5 6 7
---------------------------------------
v(n) 0 1 2 4 12 81 2646 1422564
A voting scheme consists of a set of voters
V, and a set of winning coalitions W subject to the
condition that if V is a superset of B and B a superset of A in coalition
set W, then B is in W, and if A=V-B, then exactly one of
A and B can be a
member of W. In other words, a voting scheme is an order
preserving map f from the subsets of V to \{-1,1\}
such that
f(A)=-f(V-A).
Negotiation
The Theoretical Basis
By giving a player useful suggestions, one can hope that his trust for
you will be raised. On the other hand,
you can suggest a move to a gullible player which is more in
your interest than his. If you have promoted his trust in you up to
now, then perhaps he will accept your advice (against his better
judgement).
Diplomacy
Diplomacy Programming Project
France: "SND ENG (IFF DMZ(PAR BRE MAR) THN PCE ELS NOT(PCE))"
This language was developed independently of the diplomat we are
developing. Thus, one can hope that other programmers will find the
language as convenient as we are.
The Bordeaux Diplomat
Division of Control
Further discussion of the diplomatic aspects of our program are
postponed to a later article as continued work progresses on the
negotiation module of our program by Per Westling
(Linkoping, Sweden).
The Strategic Core
Search Techniques with Simultaneous Movement
NOTE: The value of a position for an alliance is the sum of the values
for each member of the alliance minus the value for any indicated
mortal enemy.
Calculate the position P resulting from BestM.
For each alliance k, let VBestMk be the value of P for
alliance k. (see note below)
For each unit u, create a list of legal moves LM(u) given
the map, demilitarized zones, forbidden moves, and required moves.
Abort if LM(u) is empty.
Set i=1.
Select a set SU of n units belonging to alliance p.
For all units u in
SU, set RM(u) to a random move or hold from LM(u).
Set M to RM.
Replace in M all orders of the form A MovesTo B and B
Holds with appropriate supports.
Let NewVj be the value of P for the
alliance j. (see note below)
If i is less than width, then continue with step 2.
Otherwise, output BestMj and VBestMj for each alliance, and exit.
Conclusions
The theory of multi-player games and the game Diplomacy in particular
is too deep and rich to expect a complete analysis.
However, in view of the applications of multi-player games, we must
persist in our efforts to model such games.
Bibliography
Danny Loeb
Universite de Bordeaux I, France
(loeb@delanet.com)