It might be because past high-level draws between strong players were not only about the on-board competitive factor, but also about the surrounding social and competitive structure, plus other practical considerations, like conserving energy when a win was not needed for some tournament purpose. (I don't know; the masters opening explorer has lots of mysterious draws, which of course might just be my patzer fog. One can always assume that some expertise is beyond comprehension until one reaches that level.)
So maybe LC0 training, being more controllable in its initial prior policy and playing every game to termination (no resignations, no convenience draws), makes for less noise in the data analysis?
But that is a good question. There is also the other question: sharp for whom? In that sense, I would extend your question to a full model taking the pair of ratings as input variables, so that the global optimization problem could use the whole lichess database and thus consider more information, including trends that could be fitted from the extra rating range fed into that probability model. Not just best chess. I think Maia did some preliminary work in that direction, but they were on their way to making an engine, perhaps with a model that had to fit with the existing LC0 engines.
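To make the "pair of ratings" idea concrete, here is a minimal sketch, with entirely made-up data (this is not Maia's actual setup): fit an outcome-probability model that takes both players' ratings as separate inputs, rather than collapsing them to a single rating or a rating difference. If only the difference mattered, the two rating coefficients should come out roughly equal and opposite; any asymmetry the fit finds would be extra information.

```python
# Hypothetical sketch, NOT Maia's method: a logistic outcome model with
# both players' ratings as separate features. All data here is synthetic.
import math
import random

random.seed(0)

# Fake lichess-like sample: (white_rating/400, black_rating/400, white_won).
n = 2000
data = []
for _ in range(n):
    white = random.uniform(800, 2800)
    black = random.uniform(800, 2800)
    # Generate outcomes from an Elo-style curve, just to have data to fit.
    p = 1.0 / (1.0 + 10.0 ** (-(white - black) / 400.0))
    data.append((white / 400.0, black / 400.0, 1.0 if random.random() < p else 0.0))

# Logistic regression by plain gradient ascent: an intercept plus one
# coefficient per rating, so the fit could in principle pick up effects
# beyond the rating difference alone.
w = [0.0, 0.0, 0.0]
for _ in range(200):
    grad = [0.0, 0.0, 0.0]
    for xw, xb, y in data:
        p = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * xw + w[2] * xb)))
        for j, xj in enumerate((1.0, xw, xb)):
            grad[j] += (y - p) * xj
    for j in range(3):
        w[j] += 0.1 * grad[j] / n

# If only the difference mattered, w[1] and w[2] should be near-opposites.
print(w)
```

On the synthetic data above, the two rating coefficients do come out with opposite signs, as the Elo-style generator dictates; on real lichess data any systematic departure from that symmetry would be exactly the kind of extra structure a pair-of-ratings model could capture.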
I am not sure that they actually used the whole data, and all its information, in the bottom-level model of human chess behavior. It was something like an error-based model with respect to some reference "best chess"; I forget what that reference was, but I am not sure it was LC0-RL itself. And the binning was not very fine-grained; I don't recall why. Also, I think the figures did not show the variability of the fitted curves (but then, neither do many such figures).
There may be reasons intrinsic to the characteristics of that data set that would also explain why the binning did not use the full fine-grained range that, it seems to me, the lichess data was providing.
The full spectrum of human play comes with at least two ratings per game; even if those are one-dimensional, there would still be more information there for such a model to use (in my naive, under-informed point of view).
All I remember is that even though the (not shown) cloud spread of the data was wild, there seemed to be a fine-grained progression of the curves in their Figure 11. It was a conversion curve: outcome odds given some posited measure of position difficulty (based on SF scores, not that this necessarily matters; I just like to give the full dependencies as I understood them).
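The kind of binned conversion curve I have in mind can be sketched in a few lines. Everything below is invented for illustration: the logistic generator stands in for real (engine score, outcome) pairs, and the point is just that with enough games per bin, a fine-grained binning traces the underlying curve smoothly even when the per-game scatter is wild.

```python
# Illustrative sketch, not the actual Maia analysis: empirical win rate as a
# function of a stand-in engine score, with fine-grained binning.
import math
import random

random.seed(1)

n = 20000
bins = 50
counts = [0] * bins
wins = [0] * bins
for _ in range(n):
    score = random.uniform(-5.0, 5.0)             # stand-in SF eval, in pawns
    p_win = 1.0 / (1.0 + math.exp(-1.2 * score))  # assumed true conversion curve
    b = min(int((score + 5.0) / 10.0 * bins), bins - 1)
    counts[b] += 1
    wins[b] += 1 if random.random() < p_win else 0

# Empirical win rate per bin; ~400 games per bin keeps the noise modest.
rate = [wins[b] / counts[b] for b in range(bins)]
print(rate[0], rate[-1])
```

With 50 bins over the score range, the binned rates climb from near 0 to near 1, which is the fine-grained progression I thought I saw in their figure, noisy scatter notwithstanding.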
I am not answering the previous question, just expanding it, as well as the OP's. Does anyone have a better memory of, or awareness of, Maia's pre-engine data-analysis work with the full-range lichess data?