
The highly frustrating side of chess engine development.

@likeawizard I'm really not an expert so take everything I say as genuine questions instead of criticism.
But I've looked at the code on github and I'm a bit confused about your move ordering:
Afaik you'd normally order all captures before all quiet moves, but you seem to order all captures after all quiet moves because the MVV/LVA scores are so low. Is this intentional? The same goes for killers vs history: I don't think there's a fixed order between them in your code (usually you'd order killers first, I think). I don't really know Go all that well, so maybe I just don't understand what happens in `GetMoveSelector`.
Also, what's the difference between the pv move and the hash move? Shouldn't the tt already contain the pv move?
@TheKnightIsDark you're absolutely right on the move ordering. I will double-check everything tomorrow, but it seems that while refactoring, the capScore constant got lost in the MVV-LVA part. That's a big yikes if true.
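For readers following along: the bug being discussed is that without a large capture bonus added to every MVV-LVA score, captures can end up sorted below quiet moves. A minimal sketch of the idea, with illustrative piece values and hypothetical names (`capScore`, `orderMoves`) that are not the engine's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative piece values (centipawns); not the engine's real table.
var pieceValue = map[byte]int{'P': 100, 'N': 300, 'B': 300, 'R': 500, 'Q': 900}

// Move is a toy move: the moving piece and the captured piece (0 if quiet).
type Move struct {
	Piece    byte
	Captured byte
}

// capScore is the large offset that makes every capture outrank every
// quiet move; losing this constant is exactly the bug discussed above.
const capScore = 1_000_000

// score implements MVV-LVA: most valuable victim first, least valuable
// attacker as the tie-break.
func score(m Move) int {
	if m.Captured == 0 {
		return 0 // quiet moves would be ordered by killers/history instead
	}
	return capScore + 10*pieceValue[m.Captured] - pieceValue[m.Piece]
}

// orderMoves sorts moves by descending score, so all captures come first.
func orderMoves(moves []Move) {
	sort.Slice(moves, func(i, j int) bool { return score(moves[i]) > score(moves[j]) })
}

func main() {
	moves := []Move{{'Q', 0}, {'P', 'Q'}, {'Q', 'P'}} // quiet, PxQ, QxP
	orderMoves(moves)
	fmt.Printf("%c takes %c is searched first\n", moves[0].Piece, moves[0].Captured)
}
```

Without `capScore`, the quiet move's score of 0 can beat a low MVV-LVA value such as QxP, which is the inverted ordering @TheKnightIsDark spotted.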

On the hash move vs the PV: they would be the same most of the time, but I always had issues with my transposition table. Things can get overwritten in the TT, so I thought adding the PV line was a good safeguard. Especially since I am planning to add lazy SMP: more threads writing to the TT means less predictability.
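The safeguard described here can be sketched in a few lines: a tiny always-replace table whose entries can be clobbered by hash collisions (or by other threads under lazy SMP), plus a separately stored PV move used as a fallback. All names (`TT`, `bestGuess`) are hypothetical, not the engine's real API:

```go
package main

import "fmt"

// Entry is one transposition-table slot: the full key plus the best move.
type Entry struct {
	Key  uint64
	Move string
}

// TT is a toy always-replace transposition table.
type TT struct{ slots []Entry }

func NewTT(size int) *TT { return &TT{slots: make([]Entry, size)} }

func (t *TT) Store(key uint64, move string) {
	t.slots[key%uint64(len(t.slots))] = Entry{key, move}
}

// Probe returns the stored move only if the full key still matches;
// otherwise the slot was overwritten by a different position.
func (t *TT) Probe(key uint64) (string, bool) {
	e := t.slots[key%uint64(len(t.slots))]
	if e.Key == key {
		return e.Move, true
	}
	return "", false
}

// bestGuess prefers the hash move but falls back to the separately kept
// PV move when the TT entry has been overwritten.
func bestGuess(t *TT, key uint64, pvMove string) string {
	if m, ok := t.Probe(key); ok {
		return m
	}
	return pvMove
}

func main() {
	tt := NewTT(4)
	tt.Store(1, "e2e4")
	tt.Store(5, "d2d4") // 5 % 4 == 1: clobbers the entry for key 1
	fmt.Println(bestGuess(tt, 1, "e2e4"))
}
```

This is why the PV move and the hash move usually agree but are not redundant: the PV line survives even when its TT entries do not.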
Can you explain where you lose predictability, and what that means, maybe for a user audience, or somewhere between devs and a general chess audience? And for myself. I have heard "instability" somewhere else too, and "reliable", in connection with PVs.

What is your take on PV depth and those notions? And what is your intent regarding the ability to use your code for those of us who would like to use it to analyse a position, rather than only for play, or to watch it do self-play? That is: we would like the analysis the engine produces, but other engines were built for gaming first and forget it as they go. Yours is evolving: have you adopted the same imperative of forgetfulness, or are you maintaining ways to recover the PVs that were used to generate the score one sees when using your engine to analyse a position? Then we, as analytical users of chess positions, could get the full programmed picture, and keep our grain-of-salt autonomy in thinking about the score and the chosen best move.

Not only using the tool as an oracle, with questions never even dared to be considered about possible programmed biases; biases that could be understood, or learned, from having the score, the move, and the full PV together, especially the part of the PV where the score value comes from (possibly normalized nowadays in SF, for example).

What are your thoughts on that branch of engine development, toward engines as analytical helpers? Are there ways to stimulate developer creativity in that direction?

Do you agree that it might conflict with the current set of objectives (Elo in some kind of near-mutant self-play, maybe; we don't really know how different those engine-pool players are from each other)?

Any idea on how to steer development toward calibrating engines as analytical tools, so that their response surfaces might be studied? With documentation to that effect, users would then have a confidence level to associate with the output. They might then want to keep thinking as humans on top of that more objective tool, since we would have quantified its behavior and shared it with the users.

Some of us are curious about the chess impact of engine analysis never having been analysed itself (or, if such a thing exists, not shared outside dev circles, or not with proper data-analysis precautions, so it might not be shareable outside).
www.talkchess.com/forum3/viewtopic.php?t=40931
Repetition detection; found while I was visiting for something else.

Edit: this might need the OP's talent for explaining in less "shop-talk" parlance, so we can all think in concepts more than code. (While such concepts have been honed to be amenable to automatic behavior, so that coding makes sense, they can also help us think about chess, and about the code's effect on chess, at a level where coding fluency is not a barrier.)

That is why I was applauding this blog author's efforts, which I found unique: it is on lichess, with a general chess audience (at least potentially), and written with the journalling humility of a learner sharing that with us.

There might have been an acceleration in some other direction, toward the Elo grail (OLE grail?), but I know the talent is still there.
@dboing what I mean by predictability is that an engine can search millions of nodes to come to a conclusion on a move to play. That means that when you are debugging your code, there is often no easy way to see what is happening inside. You can at times check whether individual components work or not. You can also check the playing strength over a large number of games to see how it performs. But it requires a very solid test suite to make sure nothing is broken: that it, for example, still plays rook endgames properly and also handles passed pawns with the attention they require. An update might improve some aspects while breaking others. It can even have a net positive result while some things get broken and go unnoticed.

I want to work on a test suite where I can answer some of these questions and measure performance in more controlled environments.
I approve of a controlled-parameters approach, and of making tests. Thank you for your answer. Would gathering statistics within single searches, by keeping the whole explored sub-tree for many positions (not necessarily all from the same game, or following only one game), allow an ensemble view? A tracing of the recursions as they executed, in a slower mode that still does what the faster mode does: a diagnostic mode.
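The "diagnostic mode" idea can be made concrete with a tiny sketch: the same recursive search runs, but every node visit appends a labelled entry to a trace, so the explored sub-tree survives for later analysis. Everything here (`Tracer`, the labels, the stand-in `search`) is hypothetical, not any real engine's code:

```go
package main

import "fmt"

// TraceEntry records one node visit: its depth and a label naming the
// decision taken there (e.g. "leaf-eval", "beta-cutoff", "full-search").
type TraceEntry struct {
	Depth int
	Label string
}

// Tracer accumulates the trace; a nil Tracer means normal fast mode.
type Tracer struct{ Entries []TraceEntry }

func (t *Tracer) Log(depth int, label string) {
	if t != nil { // nil tracer: the fast mode pays almost nothing
		t.Entries = append(t.Entries, TraceEntry{depth, label})
	}
}

// search stands in for the engine's recursive search; only the tracing
// hook matters for this sketch.
func search(depth int, t *Tracer) int {
	if depth == 0 {
		t.Log(depth, "leaf-eval")
		return 0
	}
	t.Log(depth, "full-search")
	return search(depth-1, t)
}

func main() {
	tr := &Tracer{}
	search(2, tr)
	fmt.Println(len(tr.Entries)) // every visited node was recorded
}
```

The point of the design: the traced mode and the fast mode share the same search code, so the recorded tree really is the tree the engine explored, just without the forgetting.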

If the Elo time-shaving imperative is always there, or memory size (which might interact with it), one might be forgetful to make more room, faster, for deeper searches; often also by not looking as wide as a strict AB search would on the basis of a high-quality leaf evaluation, with no quiescence search (the quality of the leaf eval is about its forecasting: its ability to extract all the information that is in a position, be it in the middle of a saloon bar fight with Yosemite Sam involved).

The initial Type A program was to use computer speed to leverage a poor leaf evaluation by sheer exhaustive search over many nodes, deep enough. So I think it might be wise to have that in mind when talking about predictability. Such engines may evolve toward improving their leaf evaluation parameters or coverage, if provided enough covering of chess position space (where there is currently no way to make sure, and nobody asking, I think; not that I have seen, but I gave up looking not so long ago). But the pressure is not high, given the partial exploration and the selection of positions within the searches. All the claims I seem to be making could be addressed by accumulating the many explored sub-trees that such engines recursively explore and try to forget as soon as possible, keeping only minimal information, often to reduce the future width of exploration in exchange for a deeper (faster) one. One needs to be able to look at sub-tree shapes (combinatorially, and in some parallel universe possibly also with some metric from chess position space).

I suspect that long ago, such things might have been done internally, just enough to convince developers that their statistics, however not thorough, would justify trying a heuristic and talking about it in some dev forum. And such results would propagate through the dev pool building any engine of that category, which was the only one, until A0, in engine competitions. I have seen examples of such internal preliminary probing where the data analysis was difficult to interpret, because the goal was not analysis, but just enough conviction to try it and let fishtest sort it out. No sense of cloud dispersion. There are so many node types (as many as there are recursive conditions) that one would really need a full dump nowadays, if not for engine-tournament optimization, then for non-dev assessment of how confident such an engine's output is when used as a chess analytical tool. I find that a certified full dump in some mode, tracking the forgotten instead of forgetting it, might be the simplest way to build such a full diagnostic mode in a controlled way.

(By the way, endgames: if you want something controlled and rationalized from many angles, you would find a solid way to mesh that subset of chess space.)
Dear @likeawizard, I am hereby writing to maybe help us (some of us), and maybe you as a side effect.

I start from your whole series of blogs journalling your chess engine development from scratch (well, besides the general CS notions you might have been familiar with, but not specifically about chess and its algorithmic models, or the high-level mathematical models they might be implementing, or were implementing in the past, before the code started inventing things that now often need high-level extraction in equally rigorous terms, abstracting machine language into human language; so a bit of math might be needed to bridge things). That is my outlook. I am writing to you because I know you have had that concern of explaining, to some idea of a general audience, what you were experiencing, learning and doing, including your trials and errors.

It has been my understanding that even your code has been constructed with clarity of purpose as part of your global optimization criteria (for the whole project, not just the arena behavior).

So I would like to complement my current understanding of chess engines of the family you have been following in your design, which includes our mighty, beloved and revered-without-much-question Stockfish tree of versions (and its fishtest pool of other instances, of which we probably only get to see the code post-optimization, at release time).

I would like to avoid chessprogramming wiki terminology at the point where post-AB pruning heuristics start being considered, to further reduce the diversity of candidate branches explored in the tree toward scoring a single position, usually the user's current position.

I will make a list of what I think I know, and would ask you to try to complete it, regrouping things abstracted from the chessprogramming wiki's various terminologies, which might have been fortuitous choices that seemed good at innovation time but do not help in figuring out what the code might be doing, at the level your blogs were aiming at, or that I am trying to understand.

I don't fear math; I fear words more than math, and the fuzziness between words and math across math sub-fields. So I suggest we try to be parsimonious, if you agree, and use the best natural language we can.

A list of things, which I ask you to please complete, even partially, with what your engine has so far tried, rejected, and included:

Exhaustive legal tree exploration
Root position scoring
AB pruning
Iterative deepening
Quiescence search
Leaf node evaluation
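For readers who want the listed core notions in one place: a toy negamax with alpha-beta pruning and a static leaf evaluation can be sketched in a few lines. This is a generic textbook illustration on a made-up game tree, not the blog author's engine, and quiescence search and iterative deepening are deliberately left out to keep it minimal:

```go
package main

import "fmt"

// Node is a toy game-tree node; Score is used only at leaves
// (the "leaf node evaluation" from the list above).
type Node struct {
	Score    int
	Children []*Node
}

// negamax explores the tree exhaustively but applies alpha-beta
// pruning: branches that cannot change the result are cut off.
func negamax(n *Node, depth, alpha, beta int) int {
	if depth == 0 || len(n.Children) == 0 {
		return n.Score // leaf evaluation
	}
	best := -1 << 30
	for _, c := range n.Children {
		v := -negamax(c, depth-1, -beta, -alpha)
		if v > best {
			best = v
		}
		if best > alpha {
			alpha = best
		}
		if alpha >= beta {
			break // beta cutoff: the opponent already has a better option
		}
	}
	return best
}

func main() {
	leaf := func(s int) *Node { return &Node{Score: s} }
	root := &Node{Children: []*Node{
		{Children: []*Node{leaf(3), leaf(5)}},
		{Children: []*Node{leaf(2), leaf(9)}},
	}}
	// The root score: note leaf(9) is never visited (it gets pruned).
	fmt.Println(negamax(root, 2, -1<<30, 1<<30))
}
```

Iterative deepening would simply call this with depth 1, 2, 3, ... and reuse earlier results for move ordering; quiescence search would replace the bare leaf return with a captures-only continuation.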

Now, those could be the basic notions (besides move encoding and position-information encoding) that a chess user of engine analysis might want to become familiar with, and they are all explained by your efforts in your blogs. I suggest avoiding implementation terms like "transposition table" (it is not only about transpositions anymore; it has not been for a long time in SF).

Let me set the tabula rasa level. You have mentioned history, a good choice of term, to which I would add memory of history.
This ties into the notion of making statistics internally, during construction of the explored tree, about the various things that all the incrementally composed new heuristics you are trying add on top of the core model above, for a chess audience.
The killer move list is an example of how such engine designs keep having to develop a bush of recursive heuristics, possibly synergizing and repairing each other's weaknesses, but also possibly messing things up; perhaps the source of your spleen. (I may not have read or understood the blog carefully; let me know if I should revisit.)

I have questions for you that might help answer the more general, possibly impossible-to-answer questions (it depends on how fast you went for the Elo whale; I assume you have been consistent, though).

So I have a basic model I already shared with you, to which you responded that it was basically the skeleton of interactions between the search module and the evaluation module (at the time). They have become increasingly tangled as your engine evolved, and I would let you speak about how move versus position gets dissected and filtered in that interaction, but the point is always to reduce computing cost and memory usage (which are probably interacting costs in the same direction). But that is not necessary for the chess model we want, is it? Can we still talk in SAN notation for ply-moves, and in FEN for positions, even if those are not internally preserved?

I will. A SAN move is the individual mobile-unit notation that determines, if applied (that "applied" part is important for a chess audience to understand, do you agree?), what the next position will be (or the candidate next position, if the move is not yet applied). The point is that a SAN move can be processed in the search module in many recursively coded ways (the whole core and all heuristics are usually coded like that, very compact writing), with regard to the position it would be applied to, or the position resulting. A lot happens in search before a candidate move is dispatched to the evaluation module (an explicit function of the position information, or of what is left of it in the particular engine design), with input modelled as FEN.

My question to you, as coder and open-source repository developer: within the generic model I might have suggested, could I assume that all your heuristic experiments beyond the core model above have a connected set of lines in the code, read vertically, and possibly even a name for the conditional test (and its bottom case)? To the point where I could ask a subsequent question:

How much trouble would it be to have the full set of all nodes, with all their heuristic dispatching as node labels (not the chessprogramming topology or node type, or other terms confusing to me at least, unless you were consistent with only one and did not need more node types)? So: node and heuristic code-block labelling; modifying the code, in a single root search, to keep dumping into some persistent memory all the node processing and all the dispatching calls.

If the leaf evaluation further makes types out of the nodes (the moves) or the position, I let you sort out whether it still makes sense to talk about piece-move versus position, in case the processing has spilled over into the evaluation modules, for more cost efficiency I would imagine; I am not a developer myself.

Can you help me understand further, and point out which of your blogs might already cover that, or the parts that can?

I hope I have just shown you, and possibly other chess engine developers, and most importantly chess lovers and users of engines as analytical tools, what I meant in my previous rambling about an internal debug mode.

We can start with the ad-hoc tree-topology nomenclature, using terminology from your code of your choosing (you are good at such things), or just function or block numbers where some node processing is not yet sub-modularized.

I suppose such a full single-root-search explored-tree tracing, with all the heuristic-treatment labels from the code (not the wiki), is still of the finite-set category; each heuristic, or testing-and-dispatching step, is still human-mind sized, or a reasonable number of words.

Then the data analysis can start becoming conceivable, and we would have the real model that corresponds to your code, one that some non-dev chess users could still include in their own thinking, analysing chess positions, and in discussion within the community.

I have already explained why I am eager that, one day, such memory-effort dumping becomes a norm for engines being used in their analytical role in chess.

I hope that I made sense to the OP, and to at least someone else, or more. I welcome any mode of discussion seeking some truth; this is not a competition.