Exact Ratings for Everyone on Lichess

Quality post, and immense effort to extract, analyze, and present this information in a well-structured way with insights and commentary. Very cool, although some of that went over my head ahaha
Is there any way to access the results of all "recent" rated games, say the last 30-60 days? (A quick search of the Lichess API didn't give me the info I wanted; maybe I just missed it.)
I'd like to try out how SpringRank does on the rating task. According to the 2017 paper, it was also supposed to be better than Elo.
@a100kpm_matrice said in #4:
> Is there any way to access the results of all "recent" rated games, say the last 30-60 days? (A quick search of the Lichess API didn't give me the info I wanted; maybe I just missed it.)
> I'd like to try out how SpringRank does on the rating task. According to the 2017 paper, it was also supposed to be better than Elo.

You wanna head over to database.lichess.org/
I should've mentioned in the blog post that that's where the data comes from. Sadly, there is no way to get more up-to-date data than the monthly database release, which is why I wish Lichess did weekly releases. There is the API endpoint /api/games/user/{username}, which lets you download 30 games per second for a given user. Not very feasible to get every game that way, though... SpringRank looks cool, I will look into that!
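For anyone who wants to try the per-user endpoint anyway, here is a minimal sketch using only the Python standard library. The helper names (`games_url`, `parse_ndjson`, `fetch_games`) are made up for illustration. It assumes the endpoint's `since`, `until`, `max` and `rated` query parameters and the `application/x-ndjson` response format, so check the current API docs before relying on it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://lichess.org/api/games/user/"


def games_url(username, since_ms=None, until_ms=None, max_games=None, rated=True):
    """Build the export URL; timestamps are milliseconds since the Unix epoch."""
    params = {}
    if since_ms is not None:
        params["since"] = since_ms
    if until_ms is not None:
        params["until"] = until_ms
    if max_games is not None:
        params["max"] = max_games
    if rated:
        params["rated"] = "true"
    query = "?" + urlencode(params) if params else ""
    return BASE_URL + quote(username) + query


def parse_ndjson(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:
            yield json.loads(line)


def fetch_games(username, **kwargs):
    """Stream one user's games over the network (nothing is fetched on import)."""
    request = Request(games_url(username, **kwargs),
                      headers={"Accept": "application/x-ndjson"})
    with urlopen(request) as response:
        yield from parse_ndjson(response)
```

For the last 30 days you would pass something like `since_ms=int((time.time() - 30 * 86400) * 1000)`. This still works one user at a time, so it does not replace the monthly dump for whole-site experiments.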
Never heard of Ordo. Remi Coulom's Bayes-Elo seems about the same, without going into detail. Any maximum-likelihood system over a set of games instead of one game will beat Glicko-2 for sure.
I believe the author had good intentions when he wrote this article.

"An annoying aspect of online play, especially at very high levels, is when players care too much about ratings. "

I know a much more effective and simpler method that allows you to not worry too much about rankings. You need to go to your profile settings and select “do not show ratings.” Simple and effective.

"It instead uses Glicko-2 which produces more accurate ratings."

The system may be better in some respects, but using it on Lichess doesn't seem to produce such good results. Sometimes Lichess ratings look like a joke. For example, I know two players: an IM with a stable Lichess rating of about 2500 and a CM with a stable Lichess rating of about 2750. The first of them plays not just stronger, but significantly stronger. Their FIDE ratings reflect this; their Lichess ratings do not. I suspect that the desire to make Glicko-2 ratings converge quickly inevitably makes the system more vulnerable to farming (though the validity of this claim needs to be explored).

"If Lichess was using something as archaic and unreliable as FIDE's Elo version, I doubt the predictions would be accurate more than 53% of the time, pretty much random."

Statistics don't work that way. If you were too lazy to check your statement, you shouldn't have included it here. I can just as easily say that I believe that using up-to-date FIDE ratings for classical chess should give a significantly better prediction than using lichess ratings for blitz or rapid. Also, what is this wonderful new “percentage of correct results” metric you invented? Both Elo and Glicko-2 are unable to predict any game outcome. They do not return the result, but the mathematical expectation of the number of points scored. If I play with an opponent who has a rating 200 points lower, the mathematical expectation will be about 0.76 points, which I will not score in any way in one game (this is against the rules), so the result predicted in this way will in any case be erroneous and will never coincide with the real one (except for the situation when opponents with equal ratings are playing - in this case the mathematical expectation is 0.5 points and such a result is indeed possible). It is possible to measure the difference between the mathematical expectation and the actual score in each specific case, but this difference cannot in any way be expressed as a percentage.
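To make the 0.76 figure concrete, here is a small sketch of the standard Elo expected-score formula (logistic curve, 400-point scale). Glicko-2 uses a related formula that also accounts for rating deviation, which is left out here. The function names are chosen just for this example.

```python
def expected_score(rating, opponent_rating):
    """Expected points per game under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def score_error(rating, opponent_rating, score):
    """Actual score (1, 0.5 or 0) minus the expected score."""
    return score - expected_score(rating, opponent_rating)


print(round(expected_score(1700, 1500), 2))    # 0.76
print(round(score_error(1700, 1500, 0.5), 2))  # -0.26 for a draw
```

The error is a number of points per game, not a percentage, which is the distinction made above.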

The big problem with both Elo and Glicko-2 (and, as I understand from your description, Ordo is no better) is that neither of these systems takes into account the simple fact that a chess game has three possible outcomes, not two. A draw is a natural outcome of the game, and the higher the level of the opponents, the higher the probability of a draw. That is why players with high FIDE ratings are forced to avoid relatively low-rated opponents. That is why it is almost impossible to reach a super grandmaster rating (2700+) without playing in elite super tournaments (I do not consider the possibility of doing this by cheating). A GM rated 2500 can draw twice with an FM rated 2300 who was desperately trying to dry up the position from the very first moves, and will be punished for this as if he had lost one game and won the other, even though he may not have been in a worse position at any point in either game and, perhaps, in a hypothetical 10-game match his opponent would not have been able to achieve a single victory. It is pointless. In theory, this should work, but in practice, the curve of the average result of a game as a function of the rating difference between opponents differs significantly from the exponential formula at the heart of the rating system.
None of the systems have any problems with draws. Mathematically, they do not need any special treatment. Problems with these systems have been related to the large number of new players who get stronger very quickly. Elo is a dated system for estimating strength, but that is merely because it takes a relatively small number of games into account in a mathematically oversimplified way. BUT it does converge to correct values given enough time and a stable player pool, which is obviously not the case on Lichess. Glicko-1 and Glicko-2 are marginally better.

I did not find any references to Ordo in the academic literature, so it's pretty hard to say how good it is. Nor has it been tested against other, better-known systems like Bayes-Elo (www.remi-coulom.fr/Bayesian-Elo/) or Sonas's Chessmetrics (en.wikipedia.org/wiki/Chessmetrics), which seems a bit awkward in actual use.

Just about every Go site has used some decaying-memory maximum-likelihood estimate. It's just chess that likes to cling to the past. Also, rating junkies do like the kick of seeing won points immediately.

As for making Ordo fast: just take a window of recent games, somehow seeded with the pre-period ratings.

But good systems use more CPU than bad ones. There is no way around it.
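A rough sketch of what "decaying memory" plus "seeded with pre-period rating" could look like. This is not Ordo's algorithm or any Go server's. It is a plain logistic (Elo-curve) likelihood in which each game is weighted by an exponential decay on its age, with a quadratic pull toward each player's seed rating. The names `fit_ratings`, `half_life_days` and `prior_weight` are invented for illustration, and draws are simply scored as 0.5.

```python
import math

SCALE = 400 / math.log(10)  # Elo points per unit of log-odds


def fit_ratings(games, seed, now_day, half_life_days=30.0,
                prior_weight=1.0, steps=2000):
    """games: (player_a, player_b, score_a, day) tuples, score_a in {1, 0.5, 0}.
    seed: pre-period rating for every player appearing in games.
    prior_weight must be positive; it controls the pull toward the seed."""
    theta0 = {p: r / SCALE for p, r in seed.items()}
    theta = dict(theta0)
    # Older games count less: the weight halves every half_life_days.
    weighted = [(a, b, s, 0.5 ** ((now_day - day) / half_life_days))
                for a, b, s, day in games]
    total_w = {p: 0.0 for p in theta}
    for a, b, _, w in weighted:
        total_w[a] += w
        total_w[b] += w
    for _ in range(steps):
        # Gradient of the weighted log-likelihood plus the seed penalty.
        grad = {p: -prior_weight * (theta[p] - theta0[p]) for p in theta}
        for a, b, s, w in weighted:
            expected = 1.0 / (1.0 + math.exp(theta[b] - theta[a]))
            grad[a] += w * (s - expected)
            grad[b] -= w * (s - expected)
        # Per-player step size, scaled by how much evidence that player has.
        for p in theta:
            theta[p] += grad[p] / (total_w[p] + prior_weight)
    return {p: t * SCALE for p, t in theta.items()}


recent = [("A", "B", 1, 99), ("A", "B", 1, 98), ("A", "B", 0, 97), ("A", "B", 1, 96)]
print(fit_ratings(recent, {"A": 1500, "B": 1500}, now_day=100))
```

The same four results played a year earlier move the ratings much less, because their weights have decayed and the seed dominates.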
@MyPoorRook said in #7:
> I believe the author had good intentions when he wrote this article.
Haha I hope so too

> You need to go to your profile settings and select “do not show ratings.” Simple and effective.
Already have. This article is more about a solution that keeps the ratings, and makes them more useful for everyone.

> Sometimes Lichess ratings look like a joke.
Actually, they correlate quite well with FIDE ratings, specifically Lichess blitz ratings against FIDE standard ratings. Check out lichess.org/forum/general-chess-discussion/yet-another-lichess-vs-fide-rating-comparison#2

> If you were too lazy to check your statement, you shouldn't have included it here.
Yep you're right. It was very hand-wavy of me. Just for you, I'll track FIDE style Elo when I recompute all historical ratings, and see how it fares.

> Also, what is this wonderful new “percentage of correct results” metric you invented?
I've seen this used before; it just counts how often the higher-rated player wins. It was a very quick comparison to perform. It is a scale from 0.5 (random) to 1.0 (perfect ratings). But you're right, it's not perfect, and the difference between assigned and expected rating for every pairing (weighted by number of games in that pairing) is more appropriate. I've actually come to the same conclusion, I was just too lazy to compute it for this big dataset. I'll throw it in with the December ratings.
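For reference, a sketch of the two kinds of metric being discussed. How the quick metric treats draws and equal ratings isn't stated here, so this version skips them (an assumption). The second function is one possible reading of the "difference" approach: a mean squared error between actual and Elo-expected score, which may not be exactly what ends up in the December ratings. Both function names are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Expected points per game under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def higher_rated_win_rate(games):
    """games: (white_rating, black_rating, white_score) tuples.
    Assumption: draws and equal-rating games are skipped."""
    hits = total = 0
    for white, black, score in games:
        if white == black or score == 0.5:
            continue
        total += 1
        if (white > black) == (score == 1):
            hits += 1
    return hits / total if total else float("nan")


def mean_squared_error(games):
    """Average squared gap between actual and expected white score."""
    games = list(games)
    errors = [(score - expected_score(white, black)) ** 2
              for white, black, score in games]
    return sum(errors) / len(errors) if errors else float("nan")


sample = [(1600, 1500, 1), (1500, 1600, 1), (1700, 1500, 0.5), (1500, 1500, 0)]
print(higher_rated_win_rate(sample))  # 0.5
print(mean_squared_error(sample))
```

Unlike the win-rate count, the squared error uses every game, draws included, and rewards ratings whose expected scores are well calibrated.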
@petri999 said in #8:
> I did not find any references to Ordo in the academic literature, so it's pretty hard to say how good it is.
I picked Ordo because it is standard in computer chess, and I assumed those guys know what they're doing. Check out www.talkchess.com/forum3/viewtopic.php?f=2&t=44180

> Just about every Go site has used some decaying-memory maximum-likelihood estimate.
Cool! I'll look into this

> As for making Ordo fast: just take a window of recent games, somehow seeded with the pre-period ratings.
The key factor in Ordo being slow is that it's doing gradient descent on a single CPU thread. If I use my brain, maybe I'll manage to put it on the GPU with Torch or something and get a 1000x speedup. As for the period size, I'll look into it when I compute the whole backlog of ratings. Maybe 3 months is perfect, maybe 3 weeks, we will see. Keep an eye out for my next rating release when the December database drops!