The ARMADA Diplomacy club has decided to use EIDRaS as its player ranking system. A well-known platitude in the publishing business is that every equation in a document cuts the readership in half. The original EIDRaS article was rather equation-heavy, so I wrote this explanation to familiarize ARMADA members with the system. The end of the article discusses how the ARMADA ranking system will differ from the originally published EIDRaS system.
The EIDRaS system is based on an assumption and some math. The assumption is that if you knew the exact skill of each player in a game, represented by that player's rating, you should be able to calculate the expected result of that game. This is useful for rating players because players who do better than expected might be rated too low, and players who do worse than expected might be rated too high. So the ratings can be adjusted to be more accurate after every game. If a player's rating is correct, they will sometimes do better than expected and sometimes worse, so their rating should stabilize around the correct value.
The first job is to mathematically score the result of a game.
Score = number of players / number of winners (for players that won)
Score = 0 (for players that lost)
In traditional Diplomacy, there are seven points available in each
game, but the equations are generalized for variants with any number of
players. The points are shared evenly among the winners. The
losers get no points. I'll be using the example from the original
EIDRaS paper to illustrate. Here it is again:
[Table: the example players and ratings from the original EIDRaS paper]
And here are the scores achieved by each player in each game:
Player   ABC draw   D solo   ABCD draw
A        2.33       0        1.75
B        2.33       0        1.75
C        2.33       0        1.75
D        0          7        1.75
E        0          0        0
F        0          0        0
G        0          0        0
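For readers who prefer code to equations, here is a minimal Python sketch of the scoring rule. The function name `game_scores` and the use of exact fractions are my own choices for illustration, not part of EIDRaS.

```python
from fractions import Fraction  # exact arithmetic, so 7/3 is not rounded


def game_scores(players, winners):
    """Return {player: score}.

    The winners split one point per player evenly; the losers get 0.
    """
    winners = set(winners)
    if not winners:
        raise ValueError("a game needs at least one winner")
    share = Fraction(len(players), len(winners))
    return {p: (share if p in winners else Fraction(0)) for p in players}


players = list("ABCDEFG")
for label, winners in [("ABC draw", "ABC"), ("D solo", "D"), ("ABCD draw", "ABCD")]:
    scores = game_scores(players, winners)
    # Print rounded values for readability; the stored scores stay exact.
    print(label, {p: round(float(s), 2) for p, s in scores.items()})
```

In a seven-player game a three-way draw is worth 7/3 points to each winner, a solo is worth 7, and a four-way draw is worth 7/4. The scores in a game always add up to the number of players.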
The scores by themselves don't mean very much. The important thing
is the comparison of score with expected score. The expected score
formula is probably the most complicated in the paper, so I'll start with
something simpler. Each player gets a number which is related to
their rating. We add up the numbers of all the players to find a
total number for the game. The fraction of the 7 available points
that we expect each player to get is equal to that player's number
divided by the total number.
Expected Score = number of points available * (player's number / sum of all players' numbers)
If seven players all have the same rating, each of them expects 1 point, which corresponds to a seven-way draw. In that case anyone who does better than a seven-way draw will gain rating, and anyone who does worse will lose rating. The equation looks complicated because the player's number is calculated by this formula:
Player's number = e^(0.002 * player's rating)
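The two formulas above can be sketched in a few lines of Python. The function name `expected_scores` is my own, and the ratings in the second call are made up purely for illustration; they are not the ratings from the EIDRaS example.

```python
import math


def expected_scores(ratings):
    """Map {player: rating} to {player: expected score}.

    Each player's number is e^(0.002 * rating); the expected score is that
    player's share of the total, times the number of points available.
    """
    numbers = {p: math.exp(0.002 * r) for p, r in ratings.items()}
    total = sum(numbers.values())
    points_available = len(ratings)  # one point per player
    return {p: points_available * n / total for p, n in numbers.items()}


# Seven equally rated players: everyone expects (about) 1 point.
equal = expected_scores({p: 1000 for p in "ABCDEFG"})
print({p: round(e, 3) for p, e in equal.items()})

# Hypothetical ratings, for illustration only.
mixed = expected_scores(
    {"A": 1300, "B": 1000, "C": 1000, "D": 1000, "E": 1000, "F": 1000, "G": 700}
)
print({p: round(e, 3) for p, e in mixed.items()})
```

Because of the 0.002 in the exponent, a gap of 500 rating points multiplies a player's number by e, roughly 2.72. The expected scores always add up to the number of points available.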
An exponential function is chosen so that if a very highly rated player
wins a low-rated game, or a low-rated player loses a highly rated game,
there will be very little effect on their rating. Getting back to
the example, we can now calculate the expected score for each player in
the first game.
[Table: expected score of each player in the first game]
Only players A, D, and G are expected to get better than a 7-way draw.
The other players won't lose much rating if they lose the game. After
the first game, in which A, B, and C draw, we calculate the difference between
each player's score and expected score.
[Table: score, expected score, and difference for each player in the first game (totals show rounding error)]
From the difference column we can make some qualitative judgements about
changes in rating. C should gain about twice as much rating as A
gains (+37 vs. +19). D should lose a little more rating than B gains
(-34 vs. +32). These rating changes are the differences multiplied by a factor
of 20. How do we decide this rating change factor? It is more
or less arbitrary. A large factor lets players move
quickly to their correct rating, but a small factor makes the ratings
more stable once players are rated correctly. The answer is to make
the factor large at first and decrease it as a player plays more games. Even
experienced players need rating mobility, so there is a base factor, which
is multiplied up for a player's first games.
rating change factor = max(50 * base factor / (games played + 5), base factor)
A new player in his first rated game will change his rating by ten times the base factor. After twenty games it will be twice the base factor, and after forty-five games it will be the base factor itself, where it stays from then on.
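Here is a small Python sketch of the rating change factor and of how it is applied to the difference between score and expected score. The function names are my own. I am assuming that "games played" counts the rated games a player has completed before the current one, which is what makes a first game come out at ten times the base factor.

```python
def rating_change_factor(base_factor, games_played):
    """Factor for one player in one game.

    games_played is assumed to count rated games completed before this one.
    """
    return max(50 * base_factor / (games_played + 5), base_factor)


def rating_change(difference, factor):
    """Rating change = factor * (score - expected score)."""
    return factor * difference


base = 20
for games in (0, 20, 45, 100):
    print(games, rating_change_factor(base, games) / base)

# A difference of 0.95 with a factor of 20 is a gain of about 19 points.
print(round(rating_change(0.95, 20), 2))
```

The multiplier falls from 10 in the first game to 2 after twenty games and reaches 1 after forty-five games. Beyond that the `max` keeps the factor at the base value.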
The base factor is determined by the press setting and by how many opponents
are provisionally rated. Games with full press are more accurate demonstrations
of skill and should create larger changes in rating. Press is given these
values: partial 20, broadcast 15, no-press 10. A player's rating is considered
provisional for his or her first seven games. Provisional players'
ratings are probably not accurate, so their opponents' ratings should
change by less.
base factor = max(press value * fraction of non-provisional opponents, press value * 1/3)
For example, if you are playing against six opponents and one of them is provisionally rated, then your base factor is 5/6 of the press value. It can never go lower than 1/3 of the press value, so games made up entirely of newbies will still affect their ratings.
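The base factor rule can be sketched as follows. The function name `base_factor`, the `PRESS_VALUES` table layout, and the choice to describe opponents as a list of true/false flags are mine; the numbers are the ones given above.

```python
from fractions import Fraction  # exact arithmetic, so 5/6 stays 5/6

# Press values as listed in the article.
PRESS_VALUES = {"partial": 20, "broadcast": 15, "no-press": 10}


def base_factor(press_value, opponents_provisional):
    """Base factor for one player.

    opponents_provisional holds one bool per opponent: True if that
    opponent is provisionally rated.
    """
    established = sum(1 for prov in opponents_provisional if not prov)
    fraction = Fraction(established, len(opponents_provisional))
    # The floor of 1/3 keeps games full of new players from counting for nothing.
    return max(press_value * fraction, press_value * Fraction(1, 3))


one_newbie = [True] + [False] * 5
print(base_factor(PRESS_VALUES["partial"], one_newbie))   # 5/6 of 20
print(base_factor(PRESS_VALUES["partial"], [True] * 6))   # floor: 1/3 of 20
```

With four or five provisional opponents out of six, the fraction would be 1/3 or less, so the floor takes over and the base factor stays at one third of the press value.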
The first decision we made with the ARMADA rating system is that different variants should have completely different rating ladders as skill with one variant does not necessarily imply skill with another. We felt that this applies to different press settings too. So the main ARMADA rating ladder will only count games played with the original rules, and different rating change factors for different press settings are removed. A factor of 20 will be used for all games.
The ARMADA rating system will still use the EIDRaS rules to modify the base factor for provisionally rated players. Any non-ARMADA rated player participating in an ARMADA sanctioned game will be deemed to have a provisional rating of 1000.
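Putting the pieces together, one ARMADA rating update might look like the sketch below. This is my own reading, not an official implementation. In particular I am assuming that the flat value of 20 takes the place of the press value, that the provisional-opponent rule and the games-played multiplier both still apply, and that an unrated player is marked with `None` and has zero games played. All names are hypothetical.

```python
import math

FLAT_PRESS_VALUE = 20    # one value for all games on the ARMADA ladder
UNRATED_DEFAULT = 1000   # provisional rating given to unrated players
PROVISIONAL_GAMES = 7    # a rating is provisional for the first seven games


def update_ratings(ratings, games_played, winners):
    """Return {player: new rating} after one game.

    ratings: {player: rating, or None if unrated}
    games_played: {player: rated games completed before this one}
    winners: the players sharing the win
    """
    winners = set(winners)
    current = {p: UNRATED_DEFAULT if r is None else r for p, r in ratings.items()}
    numbers = {p: math.exp(0.002 * r) for p, r in current.items()}
    total = sum(numbers.values())
    n = len(current)
    new_ratings = {}
    for p in current:
        score = n / len(winners) if p in winners else 0.0
        expected = n * numbers[p] / total
        opponents = [q for q in current if q != p]
        established = sum(
            1 for q in opponents if games_played.get(q, 0) >= PROVISIONAL_GAMES
        )
        base = max(FLAT_PRESS_VALUE * established / len(opponents),
                   FLAT_PRESS_VALUE / 3)
        factor = max(50 * base / (games_played.get(p, 0) + 5), base)
        new_ratings[p] = current[p] + factor * (score - expected)
    return new_ratings


veterans = {p: 1000 for p in "ABCDEFG"}
history = {p: 50 for p in "ABCDEFG"}
print({p: round(r, 1) for p, r in update_ratings(veterans, history, "ABC").items()})
```

With seven equally rated veterans, everyone expects 1 point, so each member of a three-way draw gains 20 * (7/3 - 1) points and each loser drops 20.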
The original EIDRaS article also gave rules for games where a replacement
player is required. First, the country is deemed to have been played
by a single player whose rating is the average of the ratings of the players who
played that country, weighted by the number of turns each played. Then the
rating change is divided between those players in the
ratio of the number of turns played, except that, to discourage abandonments,
a player who abandons a position cannot gain rating. For
the ARMADA ladder we hope that we will not have too many abandonments,
and that those we do have will be in good faith, because a player really
cannot continue playing, for example when going on vacation. So we have decided
to use the turn-weighted average system, but to allow abandoning players to share
in positive rating changes.
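The replacement rules can be sketched like this. The function names, the tuple layout, and the two players in the example are hypothetical. The article does not say what happens to a gain that an abandoning player forfeits under the original EIDRaS rule, so here I simply assume it is dropped rather than handed to the other players.

```python
def country_rating(stints):
    """Turn-weighted average rating for one country.

    stints: list of (player, rating, turns_played, abandoned) tuples.
    """
    total_turns = sum(turns for _, _, turns, _ in stints)
    return sum(rating * turns for _, rating, turns, _ in stints) / total_turns


def split_rating_change(change, stints, abandoners_share_gains=True):
    """Divide a country's rating change in proportion to turns played.

    abandoners_share_gains=True is the ARMADA choice; False follows the
    original EIDRaS rule that an abandoning player cannot gain rating.
    """
    total_turns = sum(turns for _, _, turns, _ in stints)
    shares = {}
    for player, _, turns, abandoned in stints:
        share = change * turns / total_turns
        if abandoned and share > 0 and not abandoners_share_gains:
            share = 0.0  # assumption: the forfeited gain is dropped
        shares[player] = share
    return shares


# Hypothetical example: X plays 6 turns and abandons, Y plays the last 18.
stints = [("X", 1200, 6, True), ("Y", 900, 18, False)]
print(country_rating(stints))
print(split_rating_change(24, stints))                                # ARMADA
print(split_rating_change(24, stints, abandoners_share_gains=False))  # EIDRaS
```

In this example the country counts as rated 975, and a gain of 24 points is split 6 to X and 18 to Y under the ARMADA rule. Losses are always shared in proportion to turns played under either rule.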
Look for the ARMADA Rating ladder at http://www.armada-dip.com, and we will keep the pouch updated as to the successes and troubles we encounter using this system.
Robert Steinke
(steinker@cs.colorado.edu)