Both chess4j and Prophet now have hashing. Maybe I should say that chess4j has hashing again, because it did have hashing before as I wrote about here. As stated in a previous post, I basically “tore down” the search in chess4j in order to build it back up alongside P4, so I could test equivalence as I go.
The replacement strategy is very simple and certainly an area for future work. For every single node in the full width search where moves are expanded and searched (so these wouldn’t include quiescence nodes, leaf nodes, or nodes where a draw caused an early exit), the result is stored in the hash table. For every node of the full width search except the root, the hash table is probed. The hash entry includes not just the score for the position, but the depth of the search that produced it, and the type of node (PV, ALL, CUT) as well so we know if the score represents a bound or an exact value. In the case of PV and CUT nodes, the “best move” is also stored to aid in move ordering. The hash table is not “bucketed” – an entry consists of just a single slot, so anytime something is hashed any previous values are overwritten. This is referred to in the literature as the “always replace” strategy. It’s very simple, but very effective. In the future I intend to experiment with probing from the qsearch, different replacement strategies (such as “depth preferred”) and expanding the number of values that can be mapped to a single key.
When a hash probe is done and a value found, if the depth associated with the entry is at least as big as the depth we intend to search, we may be able to immediately exit the search without expanding any moves. If not, we at least have a move to try. If we do have to search and there is a suggested move from the hash table, it is tried first, before any moves are generated.
The results of two runs of chess4j with hash vs. without:
… and Prophet4 vs Prophet3:
A very solid jump in strength; I’ll take it. 🙂
I’ve been putting some thought into my testing regime. In the past I’ve tested changes by running gauntlets of 10+0.5, which can take quite a long time on a single processor if you really want to get the error bars down. I have an older laptop that I’ve dedicated to testing. It has two cores, so it would be nice to utilize both of them. So, I did some self play tests with both of my programs, running two games concurrently, and happily the results came out pretty close to 50%. Additionally, I also experimented with reducing game times in self play matches from 10+0.5 down to 5+0.25. The results were still around 50%, and without any timeouts. So, at the moment my testing regime is to run a self play match, and then to run a P4 vs P3 match as verification. I think this will work until I’ve reached parity with P3, at which point I intend to go back to gauntlet testing. It would be good to introduce some statistical methods to measure the confidence a change is good (and to use for early termination), particularly as the elo become harder to come by and more and more games are needed to measure the change. I imagine at some point I’ll need more cores for testing as well.