
Chess Engines- How they work, play, and compete

Chess, Chess engine, Software Development
I wrote a piece a year or so ago about computer chess, and figured I should bring it up to date and post it here. I might also post something about how to use engines effectively- if you want that, just lmk. It probably still has mistakes, so let me know about those too.

TCEC- An explanation

First off, I'll explain how TCEC works. TCEC is basically a computer chess tournament with a division system, but all you need to know is that there is a Premier Division, and the top two engines in that division (after a bunch of games, I believe 224) go on to a Superfinal (or "Sufi"), a 100-game match whose winner is the TCEC champion.
Note that the openings for every game (in the Premier Division, the Superfinal, and the other divisions as well) are handpicked by humans. The reason is that otherwise the draw rate would be very high, and the games would all go the same way and be extremely boring- imagine a tournament where the Berlin was played almost every game and you have an idea of how it might be. To make it fair, each opening is played twice, once with the colors reversed. For instance, Komodo played Stockfish and won as White in a Sicilian line, despite being clearly the weaker engine, because the opening position was basically just busted at this level. To balance this out, Stockfish (often abbreviated as SF) played a game in that line as White vs Komodo (and it won). (These were games 187 and 215 in Premier Division 19.)
Generally, imbalanced openings are picked in order to force a lower draw rate. Draw rates in these events are usually very high- 70-80% is not unusual at all, and 90% is not unknown. Examples of such imbalanced openings from the last Superfinal are 1. e4 a6 2. d4 e6 3. c4 b5 and the Czech Benoni. These are honestly tamer than usual, and they still got drawn.
Generally, the goal is for the openings to be lopsided but not straight-out winning for one side.

History

So, now I'll discuss the history of computer chess. Computer chess is basically as old as computer science- the first proper chess program was written by Alan Turing (a really big deal in CS, one of the founding fathers of the field) in 1948. There had been occasional automatons that claimed to play chess, but they were either extremely limited (such as only being able to play out a king and rook vs king checkmate) or outright fakes with a human hidden inside. Turing was unable to even run his program on a computer, and had to execute it by hand, but it worked. It was completely awful, naturally.

The first big landmark moment was Claude Shannon (another big deal in CS, another founding father) and his paper Programming a Computer for Playing Chess, which basically outlined the fundamental ideas of traditional search using minimax. These basic ideas are still used in the traditional engines of today (Stockfish, Komodo, etc).

The first computer to participate in human competition was MAC HACK in the spring of 1967. It earned a rating in the low 1500s. Like most computer chess programs, it had an opening book, which is basically just a collation of lines it would play- much like how human players memorize opening lines. This book was notably developed by the chess master Larry Kaufman, who is still involved with computer chess nowadays and currently works on Komodo.
Over time, computers increased in strength, and chess programs were improved. In October 1983, the engine Belle reached master strength, and was given the title of U.S. Master by the USCF. Interestingly, one of the main developers of Belle was Ken Thompson, who (among many other important feats) was one of the people who made Unix, an important operating system whose influence you can still see today. At one point his password was "p/q2-q4!a", which in algebraic notation is d4 (moving the pawn in front of the queen two squares).
In 1996, Deep Blue played a match vs Garry Kasparov (the world champion at the time), and won game 1. This was the first win by a computer against a current world champion at normal tournament time controls. It would go on to lose the match 4-2. In 1997, it came back and beat Kasparov 3.5-2.5.
After this, computers just ran away and became much, much better than humans. The last win by a human versus a computer in normal tournament conditions was in 2005, when Ruslan Ponomariov beat Fritz. Interestingly, the citation for this fact on Wikipedia is an XKCD comic.
Nowadays, the only way humans can compete with engines is if they are given odds. Komodo played a rapid match vs GM David Smerdon at knight odds, and Smerdon managed to win 5-1 (5 wins, 1 loss). GM Jorden Van Foreest played a match vs Stockfish at two-pawn odds, and was able to win 1 game and draw 2 with no losses. Sadly, the match ended as soon as Van Foreest won a game, so we never got to see all 10 games. I'm frankly quite pissed at this. In both cases, people heavily favored the computers, with many betting that Van Foreest would not even be able to win one game and would lose many. Nobody really understood how these matches would go. Generally, the humans gave back material to initiate simplification.
If you want more reading on computer chess history, Monty Newborn's book Kasparov versus Deep Blue: Computer Chess Comes of Age is a great book that spurred my long-lasting interest in this subject.

Evaluations

A key part of pretty much any engine is some way to give an approximate evaluation of a position without any sort of searching. Many engines, such as Stockfish and Deep Blue, basically use a huge, human-made formula based on many factors, such as material, piece activity, pawn structure, etc. This is the "traditional" approach. There's a newfangled approach, using neural networks with no human input, trained via self-play to evaluate positions. They are basically black boxes: you feed them a position and they spit out an evaluation. This is used by Leela and was used by AlphaZero. There is a third method, named NNUE, which is very new and will be discussed later.
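To make the "traditional" approach concrete, here's a toy sketch of a hand-made evaluation in Python. It only counts material, read straight off a FEN string- real classical evaluations (Stockfish's, Deep Blue's) combine hundreds of hand-tuned terms for activity, pawn structure, king safety, and so on, but the shape of the idea is the same.

```python
# Toy "traditional" evaluation: pure material count from a FEN string.
# Standard piece values in centipawns; the king has no material value.
PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}

def evaluate(fen: str) -> int:
    """Return a centipawn score from White's point of view."""
    board = fen.split()[0]  # the first FEN field is the piece placement
    score = 0
    for ch in board:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            # Uppercase letters are White's pieces, lowercase are Black's.
            score += value if ch.isupper() else -value
    return score

# The starting position is materially equal, so it scores 0.
print(evaluate("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))  # 0
```

A real formula would add terms on top of this (and taper them between middlegame and endgame), but every term works the same way: a number computed from the position, weighted and summed.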

Minimax and Alpha-beta

These are tricky concepts, and I'm just going to explain them at a high level, due to multiple factors (the fact that most of you probably don't have a CS background, the fact that the Sufi starts in 10 hours and I'll probably spend most of that asleep, and the fact that I'm lazy being predominant).
If you want a more in-depth explanation, the Wikipedia pages for each of these concepts are quite good.
Minimax basically builds a tree of possible continuations to some depth. In its most basic form, it considers every move in every position. It then figures out the best move, assuming that its opponent will always choose the (seemingly, based on the tree) best possible reply.
However, in this basic form it is quite slow due to the enormous number of moves. Thus, pruning is used. One of the most important pruning mechanisms is alpha-beta pruning, which basically doesn't bother to investigate moves if they cannot change the result. It can also be described as not bothering to look at the remaining replies to a move once one reply has already proven it worse than an alternative we have. There are many more, very smart, pruning mechanisms, which are found all over most minimax engines.
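The idea above can be sketched in a few lines of Python. To keep it self-contained, this runs on a hand-built game tree rather than real chess positions- a real engine would generate legal moves and call its evaluation function at the leaves, but the pruning logic is the same.

```python
# Minimal minimax with alpha-beta pruning on an explicit game tree.
# A node is either a number (a leaf evaluation) or a list of child nodes.
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`.

    `alpha` is the best score the maximizing side is already guaranteed,
    `beta` the best the minimizing side is guaranteed. Once alpha >= beta,
    the remaining children cannot affect the result and are skipped.
    """
    if isinstance(node, (int, float)):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # the opponent would never allow this line anyway
            break
    return best

# A classic textbook tree: the side to move can guarantee a score of 3,
# and several leaves are never even looked at thanks to the pruning.
tree = [[3, 5], [2, [7, 4]], [1, 8]]
print(alphabeta(tree, maximizing=True))  # 3
```

In this tree, once the second branch reveals the leaf 2, the subtree [7, 4] is pruned: the maximizer already has 3 guaranteed, so the minimizer's option of 2 settles that branch without further work.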
Engines that use this are Stockfish, Komodo, Deep Blue, Ethereal, and many others. This is the "traditional" search approach, usually combined with a "traditional" evaluation.

Monte Carlo Tree Search (MCTS)

Like with minimax and alpha-beta, this introduction will be very high level and non-technical. I am ashamed to admit that this is also about the limit of my understanding, however.
Basically, it plays out many possible continuations, with reasonable play for both sides, for a while; assesses the resulting positions (using a neural network, a traditional evaluation, NNUE, or some other method); and then aggregates statistics over those playouts into a final evaluation. I don't know any good resources for this, as the Wikipedia page is confusing and seems to discuss a somewhat different variant of the algorithm.
Ones that use this are Leela, AlphaZero, KomodoMCTS, and some others. Generally, it seems to be paired with a neural network evaluation.
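Here's a bare-bones sketch of the playout-and-statistics part of the idea, on the simple game of Nim (take 1-3 stones per turn, taking the last stone wins) instead of chess. Full MCTS also grows a search tree and balances exploration against exploitation (e.g. with the UCT formula), and engines like Leela replace the random playouts with neural network evaluations- this only shows the "play out continuations and gather statistics" core described above.

```python
# Monte Carlo move selection for Nim via random playouts.
import random

def random_playout(stones, my_turn):
    """Play random moves to the end; return True if 'we' take the last stone."""
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return my_turn       # whoever just moved took the last stone
        my_turn = not my_turn
    return not my_turn           # pile was already empty: previous mover won

def best_move(stones, playouts=2000):
    """Pick the move whose random playouts win most often for us."""
    scores = {}
    for take in range(1, min(3, stones) + 1):
        wins = sum(
            # After we take, it's the opponent's turn in the playout.
            random_playout(stones - take, my_turn=False)
            for _ in range(playouts)
        )
        scores[take] = wins / playouts
    return max(scores, key=scores.get)

random.seed(0)
# With 3 stones left, taking all 3 wins on the spot, and the playout
# statistics (win rate 1.0 for that move) reflect it.
print(best_move(3))  # 3
```

The weakness of purely random playouts is visible even in Nim: with "reasonable play for both sides" (which real MCTS engines approximate with a policy) the statistics become far more trustworthy.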

Leela/AlphaZero

AlphaZero and Leela use the same methodology. AlphaZero came first, and Leela was created shortly thereafter as an open source project based on AlphaZero's architecture. Both basically use a neural network-based evaluation in conjunction with MCTS. The neural network is trained through extensive self-play.
It's worth noting that Leela runs its network on the GPU, so if you wish to use it you will want a fairly powerful GPU.

NNUE

NNUE (efficiently updatable neural network) is a new technique that is used in Stockfish 12. It was taken from the shogi community. Basically, it's a small neural network trained on a huge number of positions, attempting to mimic the evaluation that some traditional engine gives at low depth (depth 12 was usually used, sometimes 13 or 14 apparently, although I'd expect this depth to increase as time goes on). As it turns out, this significantly improves the performance of the engine, and led to rapid gains in strength for Stockfish.
However, NNUE isn't *always* better than the classical evaluation- positions with large material or piece activity imbalances in particular (among others) are cases where Stockfish's classical eval is better, and in those cases it is the one used. The exact rules for when the classical eval is used and when the newfangled NNUE eval is used are complicated and constantly changing, and I'm sure all sorts of changes will be made between the time I finish writing this and the time I publish it.
The reason why it's possible/feasible to use NNUE with the minimax search of SF is the relatively small size of the neural network. If a net the size of Leela's were used, evaluation would be far too slow for the combination to be any good (and this is why Leela uses MCTS instead).
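The training side of the idea- fit a small model to mimic a "teacher" evaluation- can be illustrated in miniature. In this sketch the teacher is a hidden hand-crafted formula (standing in for a traditional engine searched to low depth), and a plain linear model trained by stochastic gradient descent stands in for the network. Real NNUE nets are of course actual multi-layer networks over sparse board features, updated incrementally during search; only the mimicry idea is shown here.

```python
# Toy "mimic the teacher" training loop, NNUE-style in spirit only.
import random

random.seed(42)

# The teacher: a hidden hand-crafted material formula we want to reproduce.
def teacher_eval(features):
    weights = [100, 320, 330, 500, 900]  # pawn, knight, bishop, rook, queen
    return sum(w * f for w, f in zip(weights, features))

# Training data: random material imbalances, labelled by the teacher.
data = []
for _ in range(500):
    x = [random.randint(-2, 2) for _ in range(5)]
    data.append((x, teacher_eval(x)))

# Fit the "student" weights with plain stochastic gradient descent.
student = [0.0] * 5
lr = 0.01
for _ in range(50):  # epochs
    for x, y in data:
        pred = sum(w * f for w, f in zip(student, x))
        err = pred - y
        student = [w - lr * err * f for w, f in zip(student, x)]

# The student recovers the teacher's weights almost exactly.
print([round(w) for w in student])
```

Scale the features up to full board descriptions, the linear model up to a small network, and the teacher up to Stockfish at depth 12-ish, and you have the rough shape of how NNUE nets were originally produced.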

Comparison of different types

Generally speaking, traditional ab engines struggle in closed positions, and are better at tactics and open tactical positions than the neural network engines. The neural network engines also have a more refined positional sense. Neural network engines usually struggle in openings such as the Open Sicilian and the King's Gambit, while traditional ab engines struggle in openings such as the Benoni, the French, the King's Indian, and the Stonewall Dutch.
Engines using NNUE such as Stockfish 12 have not been in the spotlight long enough to have a clear set of weaknesses and strengths, but it seems as if they basically have the same strengths as traditional ab engines, with better performance in the sorts of positions those struggle in. Every engine has its own playstyle, however, and exact strengths and weaknesses differ. For instance, historically Komodo has handled the King's Indian better than Stockfish.

Endgame Tablebases

Over time, all endgame positions with 7 or fewer pieces total (including kings) have been solved. Engines typically have access to a tablebase where they can look up whether a position is a win, draw, or loss, which replaces their evaluation.
On occasion, adding a tablebase covering positions with more pieces can, unintuitively, reduce performance. The problem is that tablebases are really big. Even if you're only storing whether a position is a win, loss, or draw (all you'd need when you run into it in search), a 7-man tablebase takes about 9.35 TB, which is sort of ridiculous. Reading it from a hard disk is too slow and leads to reduced performance, so you'd have to store it on SSDs, which is simply impractical- the cost would be quite ridiculous.

Acknowledgements

Thanks to my friends, for putting up with my constant rambling about this stuff, and my shit questions. chessunable was particularly helpful.