
Cheaters are Out of Control Today. Is the Lichess anti-cheating team on vacation today?

Doing this "work" for free is not an excuse if you do it under a bad premise. Thinking that cheat detection is handled purely via ML is such a premise. The guy was offered the chance to stay in touch with the moderation team, but I guess he was never interested in that.

Also, if it helps, about 20-25% of marks are due to rating manipulation instead of cheating. (source: again, 2023 recap)
@delorenflie said in #70:
> That is the issue.

Exactly.

> if we assume that the 1% missing is because tosViolation was flagged for something else than cheating, then this person basically managed to reverse engineer the exact parameters that lichess uses.

That is a very wrong assumption, by an order of magnitude. Note that even if the assumption were indeed correct, a lot of tosViolations due to engine abuse are issued without any ML input – just for example, when cheating extensions are detected in the browser, which happens a lot.
So if the assumption were correct, they would have done a lot more than just reverse engineer the exact parameters that lichess uses. Quite a lot more.

So, I am asking again: How did they manage to achieve a 99% overlap?
@Cedur216 said in #71:
> Doing this "work" for free is not an excuse if you do it under a bad premise. Thinking that cheat detection is handled purely via ML is such a premise. The guy was offered the chance to stay in touch with the moderation team, but I guess he was never interested in that.

I think there is something wrong with your phrasing, as I did not fully understand it.

The "guy" pointed out a very important and fundamental problem with the cheat detection system: it is trained with labels that are largely based on its own decisions from the past. This is a terrible practice. Not to mention that I still can't understand how the use of CDNs can be helpful in cheat detection (as the author of the post also points out).

> Also, if it helps, about 20-25% of marks are due to rating manipulation instead of cheating. (source: again, 2023 recap)

Until these sources fully and transparently disclose what a tosViolation means exactly for each banned player, I have a hard time believing them. The results of the analysis I linked before, and of my own analysis, indicate that the actual decisions do not match these sources.
@anonmod said in #72:
> That is a very wrong assumption, by an order of magnitude. Note that even if the assumption were indeed correct, a lot of tosViolations due to engine abuse are issued without any ML input – just for example, when cheating extensions are detected in the browser, which happens a lot.
> So if the assumption were correct, they would have done a lot more than just reverse engineer the exact parameters that lichess uses. Quite a lot more.
>
> So, I am asking again: How did they manage to achieve a 99% overlap?

Do you have any background in machine learning or anomaly detection? The lichess model is trained on the pre-existing labels of players that were banned. And this is done in an almost online fashion, as the model is continuously re-trained. Therefore, if the fitting of the model worked at all, it doesn't matter that many of the decisions are not made with the model; it should still be able to predict those labels. The fundamental issue pointed out in the link I posted earlier is that not only is there little validation that those previous labels were correct, but the model might also be strongly over-fitting to them, as it is constantly being re-fed the labels it generates.
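To make concrete what worries me about such a loop, here is a minimal sketch with entirely synthetic data and made-up features (nothing from the actual lichess pipeline, purely illustrative): a detector repeatedly re-trained on its own verdicts becomes almost perfectly self-consistent, and that self-consistency says nothing about whether the verdicts are correct.

```python
# Minimal sketch of a label feedback loop: a detector re-trained on its own
# past decisions. Synthetic data and hypothetical features throughout; this
# is an illustration, not lichess's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                        # made-up per-player features
truth = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)  # who "really" cheats

# Round 0: imperfect initial labels, e.g. manual reports with 15% noise
flip = rng.random(n) < 0.15
labels = np.where(flip, 1 - truth, truth)
model = LogisticRegression().fit(X, labels)

# Later rounds: yesterday's verdicts become today's training labels
for r in range(5):
    labels = model.predict(X)
    model = LogisticRegression().fit(X, labels)
    self_agree = (model.predict(X) == labels).mean()
    truth_agree = (model.predict(X) == truth).mean()
    print(f"round {r}: self-agreement {self_agree:.3f}, vs ground truth {truth_agree:.3f}")
```

Self-agreement snaps to ~100% almost immediately, while agreement with the ground truth stays wherever the noisy first fit left it. That is exactly why high agreement with one's own past labels proves nothing about correctness.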
All I know is that a lot of 1000-rated players are suddenly able to play 15-20 moves of opening theory. It's fascinating.
@delorenflie said in #74:
> Do you have any background in machine learning or anomaly detection?

I have indeed.

> The lichess model is trained on the pre-existing labels of players that were banned.

Partially correct.

> And this is done in an almost online fashion, as the model is continuously re-trained.

Incorrect. This is not done and was never done.

> Therefore, if the fitting of the model worked at all, it doesn't matter that many of the decisions are not made with the model; it should still be able to predict those labels.

Even if it were continuously re-trained in an online fashion, this would not happen. Neither Irwin nor Kaladin ever receives data that would enable it to detect browser plugins.

> The fundamental issue pointed out in the link I posted earlier is that not only is there little validation that those previous labels were correct, but the model might also be strongly over-fitting to them, as it is constantly being re-fed the labels it generates.

The fundamental issue for me is that while you are calling for a technical discussion, you keep making incorrect assumptions and presenting them as facts. A technical discussion based on wrong assumptions will lead nowhere.

Edit: By the way, have you figured out yet how that 99% overlap was achieved?
@delorenflie said in #73:
> Until these sources fully and transparently disclose what a tosViolation means exactly for each banned player, I have a hard time believing them. The results of the analysis I linked before, and of my own analysis, indicate that the actual decisions do not match these sources.

There are only two possible ToS mark reasons: cheating or boosting/sandbagging.

In 2023, Lichess flagged 72k accounts for cheating and 20k accounts for rating manipulation. So 20/92 ≈ 21.7% of ToS marks were not issued for cheating.
@anonmod said in #76:
> > Therefore, if the fitting of the model worked at all, it doesn't matter that many of the decisions are not made with the model; it should still be able to predict those labels.

> Even if it were continuously re-trained in an online fashion, this would not happen. Neither Irwin nor Kaladin ever receives data that would enable it to detect browser plugins.

So are you saying that neither Irwin nor Kaladin has ever trained on labels generated by methods other than their own predictions? Because you are assuming that there are absolutely no correlations between this other data that you say is used to automatically flag some players and the data that is fed to Kaladin and Irwin. That is likely a WRONG assumption, not only in my experience, but given my own research. If y ~ f(data fed to the system, data not fed to the system), where y is the likelihood that someone is cheating, then as long as Cov(data fed to the system, data not fed to the system) != 0, the system should still be able to learn a pattern that would lead to producing the labels. Of course, that depends on how much redundancy (e.g., the amount of covariance between the different features) there is in the data.
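As a minimal sketch of that covariance argument (again entirely synthetic, with hypothetical feature names, not lichess's actual data): the labels below are generated purely from a signal the model never sees, yet one correlated observed feature already predicts them far better than chance.

```python
# Sketch of the covariance argument: labels are produced from data NOT fed to
# the system, yet a correlated feature that IS fed to it still predicts them.
# Entirely synthetic; the features are hypothetical, not lichess's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 10_000
hidden = rng.normal(size=n)                         # data not fed to the system
observed = 0.8 * hidden + 0.6 * rng.normal(size=n)  # data fed to it, Cov != 0
y = (hidden > 1).astype(int)                        # labels driven only by `hidden`

X_tr, X_te, y_tr, y_te = train_test_split(observed.reshape(-1, 1), y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"ROC AUC from the correlated feature alone: {auc:.2f}")  # ~0.9 vs 0.5 chance
```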

> Edit: By the way, have you figured out yet how that 99% overlap was achieved?

Yes. As I explained before, even if not all the data used by lichess is available to train Kaladin and Irwin, as long as the missing data is correlated (linearly or non-linearly) with the data fed into the system, it should be possible to recover a function that generates the likelihood of the labels "cheat" / "not cheat". The function does not have to be the same one that lichess uses, since most likely the function space that maps the set of features I previously described to the likelihood of cheating has a compact-open topology.
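And to put a number on the overlap itself, a hedged sketch under the same synthetic assumptions: a "marker" decides using two correlated features, a student model only ever sees one of them, and we measure how often the two sets of decisions coincide.

```python
# Sketch: overlap between a full-feature "marker" and a model trained on a
# correlated subset of the features. Synthetic data, hypothetical features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000
a = rng.normal(size=n)                                   # feature the student sees
b = 0.9 * a + np.sqrt(1 - 0.9**2) * rng.normal(size=n)   # withheld feature, corr(a, b) = 0.9
marks = (a + b > 2).astype(int)                          # the marker uses both

student = LogisticRegression().fit(a.reshape(-1, 1), marks)
overlap = (student.predict(a.reshape(-1, 1)) == marks).mean()
print(f"decision overlap using only one feature: {overlap:.1%}")
# Around 95% with this correlation; as corr(a, b) approaches 1, the
# achievable overlap approaches 100%.
```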

> The fundamental issue for me is that while you are calling for a technical discussion, you keep making incorrect assumptions and presenting them as facts. A technical discussion based on wrong assumptions will lead nowhere.

I do not think I am making incorrect assumptions, but I do have the feeling that this discussion is pointless. I am not sure you are fully able to get everything you want to say across to me; from my perspective, I have the feeling that you don't fully understand ML, although you say you have a background in it. Likely you do understand, but because we are not having a live discussion, it is quite hard to fully explain yourself. I believe I might be having the same problem here, and what I want to say is not coming across either. On top of all this, there are sometimes stupid messages coming in the middle, like the one from @Jade-1. Therefore, what I think we need is a forum specifically for these things, where only people with a technical background, and people with a strong chess background and an interest in helping improve cheat detection, are allowed.
@Cedur216 said in #77:
> There are only two possible ToS mark reasons: cheating or boosting/sandbagging.
>
> In 2023, Lichess flagged 72k accounts for cheating and 20k accounts for rating manipulation. So 20/92 ≈ 21.7% of ToS marks were not issued for cheating.

Yes, but since we don't know which accounts were banned for exactly which reason, I can't check what percentage of the people marked for sandbagging have also cheated. What I mean is that the group of people who have cheated and the group of people who have sandbagged are not disjoint.
@delorenflie said in #79:
> Yes, but since we don't know which accounts were banned for exactly which reason, I can't check what percentage of the people marked for sandbagging have also cheated. What I mean is that the group of people who have cheated and the group of people who have sandbagged are not disjoint.

The only ones who may know the real reason for the ban are the moderators and account owners.
