lichess.org

Evaluating Sharpness using LC0's WDL

It might be because past high-level draws were not only about on-board chess competition, but also about the social structure of the competition and other practical considerations, like saving energy when a win was not necessary for some other tournament purpose. (I don't know, but the master opening explorer has lots of mysterious draws, which of course might only seem so because of my patzer fog... one can always assume that some expertise is beyond comprehension unless one has reached that level.)

So maybe LC0 training, being more controllable in its initial prior policy and playing every game out to its pure outcome at termination (no resignations and no convenience draws), might make for less noise in the data analysis?

But that is a good question. There is also the other question: sharp for whom? In that sense, maybe I would extend your question to a full model that depends on the pair of player ratings, and the global optimization problem could use the whole lichess database and therefore take more information into account, even trends that could be fitted from knowing more of the rating range as input to that probability model. Not just best chess. I think Maia did some preliminary work that way, but they were on their way to making an engine, perhaps with a model that had to fit with the existing LC0 engines.

I am not sure that they actually used the whole data set, with all its information, in the bottom-level model of human chess behavior. It was something of an error-based model w.r.t. some best chess; I forgot what that best chess was, but I am not sure that it was LC0-RL itself. And the binning was not very fine-grained. I don't recall why. And I think the figures did not show the variability of the fitted curves (neither do many such curves).

There may be reasons intrinsic to the characteristics of that data set that would also explain why the binning did not use the full fine-grained range that, it seems to me, the lichess data was providing.

The full spectrum of human play comes with at least 2 ratings for each game (even if those are 1D, there would still be more information for the given model to use, in my naive, not-having-enough-information point of view).

All I remember is that even if the (not shown) cloud of data points was wildly spread, there seemed to be a fine-grained progression of the curves in their first figure 11. It was a conversion curve of outcome odds given some posited measure of position difficulty (based on the SF evaluation of the positions; not that this necessarily matters, I just like to give all the dependencies I could understand).

I am not answering the previous question, just expanding on it, as well as on that of the OP. Does anyone have a better memory or awareness of Maia's pre-engine data analysis work with the full range of lichess data?
@ButterflyofDeath said in #20:
> Why do you think that this method is more valuable than looking at a grandmaster database and determining which openings are 'sharp' based on drawing percentage? isn't it the case (at least practically) that games with high draw percentage are less sharp?

The main drawback of using a database is that there need to be enough games in the database to draw any conclusions. LC0 and the sharpness score can be used for any position, so they are helpful when deciding between different options in openings that are less explored. The score can also be used as an indicator of the sharpness of middlegame positions.
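As a rough illustration of the sample-size issue, here is a minimal Python sketch (not part of the original method) that puts an approximate 95% confidence interval around a database draw rate, using the normal approximation to a binomial proportion. The function name and the game counts are hypothetical.

```python
import math

def draw_rate_interval(draws, games, z=1.96):
    """Approximate 95% confidence interval for a draw rate (normal approximation)."""
    p = draws / games
    se = math.sqrt(p * (1 - p) / games)  # standard error of a binomial proportion
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical counts: the same 50% draw rate observed in 20 games and in 2000 games.
for draws, games in [(10, 20), (1000, 2000)]:
    lo, hi = draw_rate_interval(draws, games)
    print(f"{games:5d} games: draw rate between {lo:.2f} and {hi:.2f}")
```

With the same 50% draw rate, 20 games leave the true draw rate anywhere between roughly 0.28 and 0.72, while 2000 games narrow it to roughly 0.48 to 0.52; the uncertainty only shrinks with the square root of the number of games.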

In the post I focused on common openings since I think it's important to get a feeling if and how such values work before using them on positions where there isn't much common understanding about the sharpness of them. Well-known openings make the comparisons easier and are a way to ensure that the sharpness score works at least somewhat as intended.
Well, in your blog you remarked that you were surprised to find that the Grunfeld Defense was given a low sharpness score by your equation (which may be grounds to look for a different, more fitting equation). However the GM database shows that the Grunfeld is quite draw-ish (which may be grounds to keep your current equation).
Hello, interesting concept. I’ve set up a Nibbler with Lc0 but when I try to use your formula and use the numbers for the WDL the numbers for the log are negative. For example the W is 526 then 1/526-1 would be a negative number which I can’t use for the log. Am I doing it wrong?
@ButterflyofDeath said in #23:
> Well, in your blog you remarked that you were surprised to find that the Grunfeld Defense was given a low sharpness score by your equation (which may be grounds to look for a different, more fitting equation). However the GM database shows that the Grunfeld is quite draw-ish (which may be grounds to keep your current equation).

In general, there is no clear definition of sharpness, so it's always hard to determine how good a given quantification is.
I decided that the formula I used is good enough, and rather than trying many different formulas, which differ only in a relatively small number of cases, I wanted to explore different things using the formula. The good thing is that it can always be changed later if something clearly better is found.
@Fianchiero said in #24:
> Hello, interesting concept. I’ve set up a Nibbler with Lc0 but when I try to use your formula and use the numbers for the WDL the numbers for the log are negative. For example the W is 526 then 1/526-1 would be a negative number which I can’t use for the log. Am I doing it wrong?

The W and L in the formula are rescaled such that W+D+L=1, so you need to divide all the WDL values from Nibbler by 1000.
So instead of using 526, you use 0.526
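A minimal Python sketch of the rescaling step, assuming Nibbler reports WDL in per-mille (summing to 1000) and assuming the sharpness formula has the form (2 / (ln(1/W − 1) + ln(1/L − 1)))²; the function names are hypothetical. It also shows why a single logarithm can come out negative without breaking anything: only the sum of the two logarithms has to be positive.

```python
import math

def rescale_permille(w, d, l):
    """Convert Nibbler-style per-mille WDL values (summing to 1000) to probabilities."""
    return w / 1000, d / 1000, l / 1000

def sharpness(w, l):
    """Assumed sharpness formula (natural logarithm); w and l must be probabilities."""
    # Each log term can be negative on its own, e.g. when W > 0.5 ...
    denom = math.log(1 / w - 1) + math.log(1 / l - 1)
    # ... but their sum equals ln(1 + D / (W * L)), which is positive whenever D > 0.
    if denom <= 0:
        raise ValueError("undefined without draw probability (W + L >= 1)")
    return (2 / denom) ** 2

# W = 526 from the question; D and L are hypothetical values chosen to sum to 1000.
w, d, l = rescale_permille(526, 400, 74)
print(math.log(1 / w - 1))  # negative, and that is fine
print(sharpness(w, l))      # positive, roughly 0.68
```

Passing the raw value 526 instead of 0.526 makes 1/W − 1 negative, and `math.log` then raises a domain error, which matches the problem described in the question.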
@jk_182 said in #26:
> The W and L in the formula are rescaled such that W+D+L=1, so you need to divide all the WDL values from Nibbler by 1000.
> So instead of using 526, you use 0.526

Thanks for clarifying. But I think I still have a mistake in my calculations. I tried the formula with W=490 and L=488 and my solution is 2737. Do I have to do something with the solution? In your graphics your scale goes from 0-1; I think this could be the problem?
@Fianchiero said in #27:
> Thanks for clarifying. But I think I still have a mistake in my calculations. I tried the formula with W=490 and L=488 and my solution is 2737. Do I have to do something with the solution? In your graphics your scale goes from 0-1 I think this could be the problem?

My scale only goes from 0 to 1 since I looked at opening positions, which aren't too extreme in terms of sharpness. W=490 and L=488 describe a very sharp position, and since the value has no upper bound, the sharpness value explodes in these more extreme situations.
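A short derivation of why the value is unbounded, assuming the sharpness formula from the blog post has the form below (the thread does not spell it out, but with base-10 logarithms this form reproduces the 2737 reported for W=0.490, L=0.488). Using W+D+L=1, so that (1−W)(1−L) = 1 − W − L + WL = D + WL, the two logarithms combine into a single term that depends on the draw probability D:

$$
S = \left(\frac{2}{\ln\left(\frac{1}{W}-1\right)+\ln\left(\frac{1}{L}-1\right)}\right)^{2},
\qquad
\ln\left(\frac{1}{W}-1\right)+\ln\left(\frac{1}{L}-1\right)
= \ln\frac{(1-W)(1-L)}{WL}
= \ln\frac{D+WL}{WL}
= \ln\left(1+\frac{D}{WL}\right)
$$

$$
S = \left(\frac{2}{\ln\left(1+\frac{D}{WL}\right)}\right)^{2} \longrightarrow \infty \quad \text{as } D \to 0^{+}
$$

So S is finite for any D > 0 but grows without bound as the draw probability vanishes. This also explains why a single logarithm may be negative (for example when W > 0.5) while their sum stays positive.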
I also used the natural logarithm and I think that you used the logarithm with base 10. But that doesn't make a huge difference, I just wanted to point it out to avoid further confusion.
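Since the reported 2737 comes from the base-10 logarithm, here is a small Python sketch comparing the two bases, again assuming the sharpness formula has the form (2 / (log(1/W − 1) + log(1/L − 1)))²; the function name is hypothetical. Switching the base multiplies every score by the same constant, (ln 10)² ≈ 5.30, so the ordering of positions is unchanged even though the absolute values differ.

```python
import math

def sharpness(w, l, log=math.log):
    """Assumed sharpness formula; `log` selects the logarithm (natural by default)."""
    denom = log(1 / w - 1) + log(1 / l - 1)
    return (2 / denom) ** 2

w, l = 490 / 1000, 488 / 1000          # values from the question, rescaled to probabilities
s_ln = sharpness(w, l)                 # natural logarithm
s_log10 = sharpness(w, l, math.log10)  # base-10 logarithm
print(round(s_ln, 1), round(s_log10, 1))
print(s_log10 / s_ln, math.log(10) ** 2)  # the two numbers agree
```

With natural logarithms this position scores about 516; with base-10 logarithms it scores about 2738, matching the 2737 from the question up to rounding.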