The observant amongst you may have noticed that the Pouch's YARS rating system is currently down. This is partly for technical reasons, because the way it was coded placed great strain on the diplom.org machine, and partly because certain people questioned the meaningfulness of the YARS ratings. This article introduces a new Diplomacy rating system, based on the ELO chess methodology, that should be up and running at the Pouch by the time the next issue of the Zine hits your Web browser.
In many ways, both YARS and the Hall of Fame work well in their intent to judge the best and most prolific players. However, for us mere mortals the ratings are not comparable. For instance, in YARS a slightly negative rating may indicate either frequent above-average play or dreadful but infrequent play. What we need is a rating system that judges only ability, not the number of games played.
EIDRaS is an abbreviation of ELO Inspired Diplomacy Rating System and, as you may have guessed, it shares the following advantageous properties with ELO:
Each player has a rating R, and after each game that rating changes by an amount ΔR that depends on the result, S, and on the expected result, X, which itself depends on how your rating compares with those of your opponents. Your rating should be considered an approximate measure of your ability, one that becomes more accurate as results accumulate. The K factor reflects this by reducing the size of the rating change as the number of games you have played increases. The formula is simply
$$\Delta R = K(S - X)$$
$S = Wn/N$ is the game score, where n is the number of players, N the number of winners (1 for a solo, more for a draw), and W is 1 if you won or were part of the draw and 0 otherwise.
X is your expected result, which depends on how your rating compares with your opponents':

$$X = \frac{n\,e^{cR}}{\sum_j e^{cR_j}}$$

where $c = 0.002$ and the $R_j$ are the ratings of every player in the game, including yourself. If more than one player played a nation, the time-weighted average is used for $R_j$. For the mathematically inclined, this reduces to the ELO formula when n = 2. [1]
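As a minimal sketch, the game score and expected result can be computed as below in Python. The function names are illustrative, not taken from the Pouch's code. Because the expected results of all n players always sum to n, as do their game scores, the rating changes within a game cancel out whenever every player has the same K.

```python
import math

C = 0.002  # the constant c from the text


def game_score(won_or_drew: bool, n_players: int, n_winners: int) -> float:
    """S = W * n / N: the winners share n points equally; everyone else scores 0."""
    w = 1 if won_or_drew else 0
    return w * n_players / n_winners


def expected_scores(ratings: list[float], c: float = C) -> list[float]:
    """X_i = n * e^(c R_i) / sum_j e^(c R_j) for every player i in the game."""
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    return [n * w / total for w in weights]


# Two-player case: X = 2 / (1 + e^(c (R_opp - R))), the ELO form on a 0..2 scale.
print(expected_scores([1100, 1000]))
```

With two players the expression simplifies to $2/(1 + e^{c(R_{opp} - R)})$, which is the familiar logistic ELO curve with scores of 2, 1 and 0 for a win, draw and loss.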
K is a rating change factor. It measures how much more accurate your rating has become as a result of this game's information, so it depends on the number of games you have played before, the press settings and how many of your opponents were provisionally rated. K is calculated as

$$K = \max\left(\frac{50s}{g+5},\ s\right)$$

where g is the number of games played and $s = \max(f/3,\ pf)$, with p the fraction of provisionally rated opponents and f a press factor taking the value 20 for partial press, 15 for broadcast only and 10 for no-press. For real-time (RT) judge games, f is reduced by 4.
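A sketch of the K factor in Python follows. The press-setting labels are hypothetical, and treating a player as provisional while they have completed fewer than seven games is an assumption drawn from the description of newcomers' provisional ratings. p is computed exactly as defined above, as the fraction of provisionally rated opponents.

```python
# Hypothetical labels for the three press settings and their press factors f.
PRESS_FACTOR = {"partial": 20, "broadcast": 15, "none": 10}
PROVISIONAL_GAMES = 7  # ratings are provisional for a player's first seven games


def is_provisional(games_played: int) -> bool:
    # Assumption: "first seven games" means fewer than 7 completed games so far.
    return games_played < PROVISIONAL_GAMES


def k_factor(games_played: int, press: str, opponent_games: list[int],
             real_time: bool = False) -> float:
    """K = max(50 s / (g + 5), s) with s = max(f / 3, p f)."""
    f = PRESS_FACTOR[press]
    if real_time:
        f -= 4  # real-time (RT) judge games
    # p: fraction of opponents whose ratings are still provisional
    p = sum(is_provisional(g) for g in opponent_games) / len(opponent_games)
    s = max(f / 3, p * f)
    return max(50 * s / (games_played + 5), s)
```

Note that K shrinks towards its floor s as g grows, reaching it once g + 5 ≥ 50.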
The system will be seeded by an iterative method. All players are initially estimated to be worth 1000 rating points, and the Hall of Fame (HOF) results are put through the above formulae to generate ratings. The output ratings are then used as new estimates and the results fed through the system again. This is repeated until the variation in the output ratings is small.
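One reading of the seeding procedure is sketched below in Python: each pass replays every game in order, starting from the previous pass's output, until no rating moves by more than a tolerance. The input format (each game as a list of `(player, won_or_drew)` pairs), the fixed K of 20, the tolerance and the pass limit are all assumptions for illustration; the real seeding would use the full K formula and the Hall of Fame data.

```python
import math


def expected_scores(ratings, c=0.002):
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    return [n * w / total for w in weights]


def run_pass(games, start, k=20.0):
    """Replay every game in order, starting from the given rating estimates."""
    ratings = dict(start)
    for game in games:
        players = [p for p, _ in game]
        n = len(players)
        winners = sum(1 for _, won in game if won)  # assumes at least one winner
        xs = expected_scores([ratings[p] for p in players])
        # Compute all changes first so every player sees the same pre-game ratings.
        deltas = [k * ((n / winners if won else 0.0) - x)
                  for (_, won), x in zip(game, xs)]
        for p, d in zip(players, deltas):
            ratings[p] += d
    return ratings


def seed_ratings(games, initial=1000.0, tol=0.5, max_passes=100):
    players = {p for game in games for p, _ in game}
    estimates = {p: initial for p in players}
    for _ in range(max_passes):
        new = run_pass(games, estimates)
        if max(abs(new[p] - estimates[p]) for p in players) < tol:
            return new
        estimates = new  # outputs become the next pass's estimates
    return estimates
```

The pass limit is there because a fixed point is not guaranteed for every set of results; for example, a player who wins every recorded game keeps drifting upwards.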
Newbies start with a 1000-point rating, which will vary as per the above formulae. For their first seven games their ratings will be considered provisional and, via the K factor, will have less effect on changes to fellow players' ratings.
If more than one player plays a nation, the nation's rating is assumed to be the time-weighted average of the players concerned (time being measured by the number of movement seasons each was at the helm).
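A minimal sketch of the time-weighted nation rating, assuming each stint is given as a `(rating, movement_seasons)` pair; the function name is hypothetical.

```python
def nation_rating(stints):
    """Time-weighted average rating of everyone who held a nation.

    stints: list of (rating, movement_seasons) pairs, one per player.
    """
    total_seasons = sum(seasons for _, seasons in stints)
    return sum(rating * seasons for rating, seasons in stints) / total_seasons


# A 1200-rated player for six movement seasons, then a 900-rated player for two.
print(nation_rating([(1200, 6), (900, 2)]))  # 1125.0
```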
The ratings of players who abandon a position will change by

$$\Delta R_{\text{abandon}} = \min\left(0,\ \frac{t\,\Delta R}{t+T}\right)$$

where ΔR is the change given by the formula above, t is the number of seasons you played and T the number you missed. It is not the place of an ability rating system to hurt the undedicated, however annoying they are, but I don't think they should benefit either. It has proven very difficult to find a fair formula for replacements, so their ratings are unaffected by such games. Like old age, this is not ideal, just better than the alternative.
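A sketch of the abandonment rule in Python, where `delta_r` is taken to be the change the main formula would have produced for the abandoning player; the names are hypothetical.

```python
def abandonment_change(delta_r: float, seasons_played: int, seasons_missed: int) -> float:
    """Scale the formula's change by the share of seasons played; never let it be positive."""
    t, T = seasons_played, seasons_missed
    return min(0.0, t * delta_r / (t + T))


print(abandonment_change(-30, 4, 8))  # -10.0: a third of the loss
print(abandonment_change(25, 4, 8))   # 0.0: a would-be gain is discarded
```

Replacements need no code at all: under the rule above their ratings are simply left unchanged by the game.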
A group of seven established dippers play three games, one right after the other. Note that the players take into each game the rating they hold as a result of the last game. Here are the results:
| Name | Initial Rating | After ABC draw | After D solo | After ABCD draw |
|---|---|---|---|---|
| Another Stabber | 1300 | 1319 | 1290 | 1299 |
| Bobby Bull | 1000 | 1032 | 1015 | 1035 |
| Cannon Fodder | 800 | 837 | 826 | 850 |
| Dave Decent | 1400 | 1366 | 1475 | 1471 |
| Elaine Egotist | 900 | 888 | 875 | 864 |
| Fluent Liar | 1100 | 1082 | 1064 | 1047 |
| Gil Gullible | 1200 | 1177 | 1156 | 1135 |
Note how the ratings of A, B and C, who all achieved the same results, converge, and similarly for E, F and G. Against this opposition, D really needs to win, which still has a healthy effect on his rating, but the four-way draw actually leads to a rating decline for D because he should have been able to do better.
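The example can be recomputed with a short Python sketch. It assumes K = 20 for every player in every game; that value is not stated in the text but is what the figures imply for these established players. Ratings are carried between games unrounded and rounded only for display, so individual entries may differ from the table by a point.

```python
import math

C = 0.002  # constant c from the text
K = 20.0   # assumed constant K for these established players


def play(ratings, winners):
    """Apply dR = K(S - X) to every player in one game; returns the new ratings."""
    n = len(ratings)
    weights = {p: math.exp(C * r) for p, r in ratings.items()}
    total = sum(weights.values())
    new = {}
    for p, r in ratings.items():
        s = n / len(winners) if p in winners else 0.0  # S = W n / N
        x = n * weights[p] / total                     # expected result X
        new[p] = r + K * (s - x)
    return new


ratings = {"A": 1300, "B": 1000, "C": 800, "D": 1400, "E": 900, "F": 1100, "G": 1200}
history = [ratings]
for winners in ({"A", "B", "C"}, {"D"}, {"A", "B", "C", "D"}):
    ratings = play(ratings, winners)  # each game starts from the previous result
    history.append(ratings)

for name in sorted(ratings):
    print(name, [round(h[name]) for h in history])
```

Because every player has the same K here, the seven ratings always sum to 7700.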
EIDRaS has been developed by George Heintzelman and myself. Thanks to Brahm Dorst for initiating the rec.games.diplomacy newsgroup thread that buoyed us into action and to all the r.g.d contributors for helping to shape the system. Thanks to Manus for agreeing to host EIDRaS at the Pouch and offering to help code it up (ready, Manus?).
Tony Nichols (anthony.nichols@virgin.net)