DP S1998R: Diplomacy's New Rating System

A Player Rating System for Diplomacy

--or--
See YARS Later, Welcome EIDRaS

by Tony Nichols

Introduction

The observant amongst you may have noticed that the Pouch's YARS rating system is currently down. The reasons were partly technical, since the way it was coded placed a great strain on the diplom.org machine, and partly that certain people questioned the meaningfulness of the YARS ratings. This article introduces a new Diplomacy rating system, based on the ELO chess methodology, that should be up and running at the Pouch by the time the next issue of the Zine hits your Web browser.

Why Not YARS or The HoF?

In many ways, both YARS and the Hall of Fame work well in their intent to judge the best and most prolific players. However, for us mere mortals the ratings are not comparable. For instance, in YARS a slightly negative rating may indicate frequent above-average play or dreadful but infrequent play. What we need is a rating system that judges only ability, not the number of games played.

EIDRaS Properties

EIDRaS stands for ELO Inspired Diplomacy Rating System. As you may have guessed, it therefore has the following advantageous ELO properties:

Players of similar ability have similar ratings, allowing GMs to designate games for, say, players rated 2000+ or between 1000 and 1400. Players joining such games will be guaranteed a more even, and thus hopefully better-quality, game.
The difference in rating between two players is indicative of how their performances should differ if paired in the same game. For instance, if you were rated 200 above France, you'd be expected to score about half a point more than her, and this would be the case whether you were rated 3000 and France 2800 or you 800 to her 600.
The abilities of opponents affect your rating. Beating up on a bunch of newbies will do your rating a lot less good than soloing against quality opposition. Conversely, losing to poor players will inflict greater damage on your rating.
Recent results affect your rating more than old ones, hence the ability to improve, or regress, is recognised and with time those awful early results will stop dragging your rating down.
Players will tend toward their true rating from 1000 over time. Hence the top of the ratings will not be filled with players who have played one or two games but got a solo, nor will the system reward playing extra games once a relatively accurate rating is established.
The rating system includes map variants on an equal basis.
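
The second property can be checked numerically against the expected-result formula given in The Math section. The sketch below is only an illustration: the two seven-player fields are hypothetical, and `expected_scores` is a name chosen here, not part of any Pouch code. It shows that the ratio of two players' expected results depends only on their rating difference, while the absolute gap also depends on the rest of the field.

```python
import math

C = 0.002  # the constant c quoted in The Math section


def expected_scores(ratings, c=C):
    """Expected result X for each player: n * e^(c*R) / sum_j e^(c*R_j)."""
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical fields: index 0 is "you", index 1 is France, the rest are filler.
strong_field = [3000, 2800, 2900, 2900, 2900, 2900, 2900]
weak_field = [800, 600, 1000, 1200, 900, 1100, 700]

x_strong = expected_scores(strong_field)
x_weak = expected_scores(weak_field)

# The ratio X_you / X_France is e^(c * 200) in both games.
ratio_strong = x_strong[0] / x_strong[1]
ratio_weak = x_weak[0] / x_weak[1]

# The absolute gap X_you - X_France varies with the rest of the field.
gap_strong = x_strong[0] - x_strong[1]
gap_weak = x_weak[0] - x_weak[1]
print(ratio_strong, ratio_weak, gap_strong, gap_weak)
```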

The Math

Each player has a rating R, and each game that rating changes by an amount $\Delta R$ (delta R) depending on the result, S, and the expected result, X, which itself depends on how you compare with the opposition ratings-wise. Your rating should be considered an approximate measure of your ability, which becomes more accurate as results are added. The K factor represents this by reducing the degree of rating change as the number of games you have played increases. The formula is simply

$$\Delta R = K(S - X)$$

$S = W \cdot n / N$ is the game score, where n is the number of players, N the number of winners (the players sharing the win or draw), and W is 1 if you won or shared in the draw and 0 if you didn't.

X is your expected result, which depends on your rating compared to those of your opponents:

$$X = \frac{n\, e^{cR}}{\sum_j e^{cR_j}}$$

where c = 0.002 and the $R_j$ are the ratings of every player in the game, including yourself. If more than one player played a nation, the time weighted average is used for $R_j$. For the mathematically inclined, this becomes the ELO formula when n=2. [1]
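
As a minimal sketch of how S, X and the rating change fit together, the Python below follows the definitions above. The function names, the use of player indices for the winners, and the flat K of 10 in the usage lines are assumptions made for illustration only.

```python
import math


def expected_scores(ratings, c=0.002):
    """X for each player: n * e^(c*R) / sum_j e^(c*R_j)."""
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    return [n * w / total for w in weights]


def game_scores(n_players, winners):
    """S = W*n/N: each of the N winners gets n/N, everyone else gets 0."""
    share = n_players / len(winners)
    return [share if i in winners else 0.0 for i in range(n_players)]


def rating_changes(ratings, winners, ks):
    """Delta R = K * (S - X) per player; ks holds one K value per player."""
    xs = expected_scores(ratings)
    ss = game_scores(len(ratings), winners)
    return [k * (s - x) for k, s, x in zip(ks, ss, xs)]


# Seven equally rated players, player 0 solos, everyone on an assumed K of 10.
changes = rating_changes([1000] * 7, winners={0}, ks=[10] * 7)
print(changes)
```

With equal ratings every X is 1, so the soloist scores S - X = 6 and each loser scores -1. When every player has the same K, the changes in a game sum to zero, because both the S values and the X values add up to n.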

K is a rating change factor. It is a measure of how much more accurate your rating has become as a result of this game's information. Hence it depends on the number of games you have played before, the press settings, and how many of your opponents were provisionally rated. K is calculated by the formula $K = \max(50s/(g+5),\ s)$, where g is the number of games played and $s = \max(f/3,\ p \cdot f)$. Here p is the fraction of provisionally rated opponents and f is a press factor taking the value 20 for partial press, 15 for broadcast only, and 10 for no-press. For real time (RT) judge games, f is reduced by 4.
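
The K calculation is small enough to sketch directly. The function name, argument names and the press labels used as dictionary keys below are hypothetical; the numbers come from the paragraph above.

```python
import math

# Press factor f from the text; the key names are labels invented for this sketch.
PRESS_FACTOR = {"partial": 20, "broadcast": 15, "none": 10}


def k_factor(games_played, press, provisional_fraction, real_time=False):
    """K = max(50*s/(g+5), s) with s = max(f/3, p*f)."""
    f = PRESS_FACTOR[press]
    if real_time:
        f -= 4  # real time judge games use f - 4
    s = max(f / 3, provisional_fraction * f)
    return max(50 * s / (games_played + 5), s)


k_new = k_factor(0, "partial", 0.0)          # first game, no provisional opponents
k_veteran = k_factor(45, "partial", 0.0)     # at g = 45 the two terms are equal
k_rt = k_factor(100, "none", 0.5, real_time=True)
print(k_new, k_veteran, k_rt)
```

Under this reading, K starts at ten times s for a first game and falls to s once 45 games have been played, after which it stays at s.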

Seeding the System

The system will be seeded by an iterative method. All players are estimated to be worth 1000 rating points and HOF results are put through the above formulae to generate ratings. The output ratings are then used as new estimates and the results are fed through the system again. This is repeated until the variation in output ratings is small.
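
One possible reading of this seeding loop is sketched below. It is not the actual seeding code: the flat K of 20, the tolerance, the cap on the number of passes, and the three-player toy results are all assumptions made so the example is short and self-contained.

```python
import math


def replay(games, start, k=20.0, c=0.002):
    """Feed every result through Delta R = K(S - X), starting from `start`."""
    ratings = dict(start)
    for players, winners in games:
        n = len(players)
        weights = {p: math.exp(c * ratings[p]) for p in players}
        total = sum(weights.values())
        share = n / len(winners)
        # Compute all changes from the pre-game ratings, then apply them.
        deltas = {}
        for p in players:
            x = n * weights[p] / total
            s = share if p in winners else 0.0
            deltas[p] = k * (s - x)
        for p, d in deltas.items():
            ratings[p] += d
    return ratings


def seed(games, k=20.0, tol=1.0, max_passes=50):
    """Start everyone at 1000 and re-run the results until ratings settle."""
    names = {p for players, _ in games for p in players}
    estimate = {p: 1000.0 for p in names}
    for _ in range(max_passes):
        output = replay(games, estimate, k)
        drift = max(abs(output[p] - estimate[p]) for p in names)
        estimate = output
        if drift < tol:
            break
    return estimate


# Toy history (hypothetical): A solos, B solos, then A and B draw.
toy_games = [
    (("A", "B", "C"), {"A"}),
    (("A", "B", "C"), {"B"}),
    (("A", "B", "C"), {"A", "B"}),
]
seeded = seed(toy_games)
print(seeded)
```

The pass cap matters in this sketch: a player who never shares a result keeps drifting downward on every pass, so the loop may stop on the cap rather than on the tolerance.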

Newbies

Newbies start with a 1000 point rating, which will vary as per the above formulae. For the first seven games their ratings will be considered provisional and have less effect on the changes to fellow players' ratings via the K factor.

Abandonments and Replacements

If more than one player plays a nation, the nation's rating is assumed to be the time weighted average of the players concerned. (Time being measured by the number of movement seasons each was at the helm.)
An abandoning player's rating will change by $\min(0,\ t\,\Delta R/(t+T))$, where $\Delta R$ is the change given by the formula above, t is the number of seasons you played, and T the number you missed. It is not the place of an ability rating system to hurt the undedicated, however annoying they are, but I don't think they should benefit either. It has proven very difficult to find a fair formula for replacements, so their rating is unaffected by such games. Like old age, this is not ideal, just better than the alternative.
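
Both rules in this section reduce to a few lines of arithmetic. The sketch below assumes that the quantity scaled by t/(t+T) is the rating change the standard formula would have produced for the nation; the function names and the sample numbers are hypothetical.

```python
def nation_rating(stints):
    """Time weighted average rating; stints is a list of (rating, movement_seasons)."""
    total_seasons = sum(seasons for _, seasons in stints)
    return sum(rating * seasons for rating, seasons in stints) / total_seasons


def abandonment_change(delta_r, seasons_played, seasons_missed):
    """min(0, t * delta_r / (t + T)): an abandoner can lose points but never gain."""
    scaled = seasons_played * delta_r / (seasons_played + seasons_missed)
    return min(0.0, scaled)


# A 1200-rated player holds a nation for 6 seasons, a 1000-rated one for 2.
shared = nation_rating([(1200, 6), (1000, 2)])

# The same 6-of-8 split applied to a would-be gain and a would-be loss of 20.
gain_case = abandonment_change(20.0, 6, 2)
loss_case = abandonment_change(-20.0, 6, 2)
print(shared, gain_case, loss_case)
```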

A Simple Example

A group of seven established dippers play three games, one right after the other. Note that the players take into each game the rating they hold as a result of the last game. Here are the results:

Name	Initial Rating	After ABC draw	After D solo	After ABCD draw
Another Stabber	1300	1319	1290	1299
Bobby Bull	1000	1032	1015	1035
Cannon Fodder	800	837	826	850
Dave Decent	1400	1366	1475	1471
Elaine Egotist	900	888	875	864
Fluent Liar	1100	1082	1064	1047
Gil Gullible	1200	1177	1156	1135

Note how the ratings of A, B, and C, who all achieved the same results, converge, and similarly for E, F and G. Against this opposition, D really needs to win, which still has a healthy effect on his rating, but the four-way draw actually leads to a rating decline for D because he should have been able to achieve better.
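
The first column of results can be reproduced from the formulae. The article does not state which K was used for the example; the sketch below assumes a flat K of 20 for every player, and under that assumption the rounded ratings after the ABC draw match the table. The `play` helper is a name chosen for this illustration.

```python
import math

C = 0.002  # constant c from The Math section
K = 20     # assumed flat K for all seven players; not stated in the article

names = ["Another Stabber", "Bobby Bull", "Cannon Fodder", "Dave Decent",
         "Elaine Egotist", "Fluent Liar", "Gil Gullible"]
initial = [1300, 1000, 800, 1400, 900, 1100, 1200]


def play(ratings, winners, k=K, c=C):
    """Apply Delta R = K(S - X) once; winners is a set of player indices."""
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    share = n / len(winners)  # S for each winner is n / N
    updated = []
    for i, (r, w) in enumerate(zip(ratings, weights)):
        s = share if i in winners else 0.0
        x = n * w / total
        updated.append(r + k * (s - x))
    return updated


# Game 1: A, B and C share a three-way draw.
after_abc = [round(r) for r in play(initial, winners={0, 1, 2})]
for name, rating in zip(names, after_abc):
    print(f"{name}: {rating}")
```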

Acknowledgements

EIDRaS has been developed by George Heintzelman and myself. Thanks to Brahm Dorst for initiating the rec.games.diplomacy newsgroup thread that buoyed us into action and to all the r.g.d contributors for helping to shape the system. Thanks to Manus for agreeing to host EIDRaS at the Pouch and offering to help code it up (ready, Manus?)

[1] Actually, the constant is different: we use the natural logarithm, and X averages 1 rather than the 0.5 of chess, so you cannot use this to compare your chess vs. Diplomacy ability. The form, however, is essentially identical.
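
For readers who want to see the reduction, here is a short worked version, setting n = 2 in the expected-result formula and comparing it with the Elo expectation as it is commonly written for chess (base 10, scale 400). The subscripts 1 and 2 are labels added here for the two players.

```latex
X_1 = \frac{2\,e^{cR_1}}{e^{cR_1} + e^{cR_2}}
    = \frac{2}{1 + e^{-c(R_1 - R_2)}},
\qquad
E_1^{\text{chess}} = \frac{1}{1 + 10^{-(R_1 - R_2)/400}}
    = \frac{1}{1 + e^{-\frac{\ln 10}{400}(R_1 - R_2)}}
```

Both are the same logistic curve in the rating difference. They differ by the factor of 2 (X averages 1 rather than 0.5) and by the constant: c = 0.002 here, against ln(10)/400, roughly 0.00576, in chess.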

Tony Nichols
(anthony.nichols@virgin.net)
