Hashing is an area I’ve meant to revisit in chess4j since it was originally implemented, for a couple of reasons: (1) the hash entries consume too much space, limiting the number of entries the table can store, and (2) to prepare for multi-processing capabilities. I’ll address the first issue in this post and follow up on the second at a later date.

If you don’t know what a hash table is used for, I recommend reading this article: https://chessprogramming.wikispaces.com/Hash+Table .

Up until recently the table entries in chess4j had the following structure:

TranspositionTableEntry
long zobristKey (64 bits)
TranspositionTableEntryType type (32 bit reference to an Enum)
int score (32 bits)
int depth (32 bits)
Move move (32 bit reference to an Object)

So how much space does a single hash entry take? It’s not straightforward to figure out. In C you can use the ‘sizeof’ operator, but things get more complicated with object oriented languages. Still, we can make some educated guesses and do some hand waving to get estimates.

The zobristKey, score, and depth members are all primitives: longs are 64 bits, and ints are 32 bits each. The ‘type’ member is a reference to an enum. I’ll assume references are 32 bits, which holds as long as the heap is under 32 GB and the JVM is using compressed object pointers — though that is JVM specific. By the same logic, ‘move’ is also a 32 bit reference. Therefore, the TranspositionTableEntry itself is 64 + 32 + 32 + 32 + 32 bits == 192 bits == 24 bytes. (That ignores some object oriented overhead, such as the object header.)

That is already pretty bad, but it’s not even the complete picture. Note ‘move’ is a reference to an instance of a Move object. The Move object being referenced has a cost associated with it too. That object can’t be deallocated from the heap as long as we’re referencing it. So how big is a Move?

Move
Square from
Square to
Piece piece
Piece captured (possibly null)
Piece promotion (possibly null)
boolean castle
boolean epCapture

Again, it’s difficult to come up with an exact number, but we have between three and five references (2 Square and 1-3 Piece) to singleton objects, plus two booleans at 8 bits each. So, somewhere between 14 and 22 bytes. Note that not every hash entry will have a Move associated with it (upper-bound entries, for example), but most will.
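Since a square fits in 6 bits and a piece code in a few more, all of a Move’s fields could in principle be packed into a single 32 bit int, with no heap object at all. Here is a hypothetical sketch of such a packing; the field widths and positions are my own assumptions for illustration, not chess4j’s actual layout:

```java
// Illustrative packing of a move into one int. Assumed field widths:
// 6 bits per square (0-63), 4 bits per piece code, 1 bit per flag.
public class PackedMove {
    static int encode(int from, int to, int piece, int captured,
                      int promotion, boolean castle, boolean epCapture) {
        return (from & 0x3F)
             | ((to & 0x3F) << 6)
             | ((piece & 0xF) << 12)
             | ((captured & 0xF) << 16)
             | ((promotion & 0xF) << 20)
             | ((castle ? 1 : 0) << 24)
             | ((epCapture ? 1 : 0) << 25);
    }
    static int from(int m)        { return m & 0x3F; }
    static int to(int m)          { return (m >>> 6) & 0x3F; }
    static int piece(int m)       { return (m >>> 12) & 0xF; }
    static boolean castle(int m)  { return ((m >>> 24) & 1) != 0; }
}
```

This uses only 26 of the 32 bits, leaving room for more flags if needed.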

All in all, that makes the cost of a single hash entry somewhere between 24 and 46 bytes! Doing a little more hand waving, let’s assume an average of 38 bytes (which is probably low). That means a fully loaded hash table would have the following maximums:

RAM    | Entries (millions)
64 mb  | 1.7
128 mb | 3.5
256 mb | 7
512 mb | 14.1
1 gb   | 28.3

So how can this be improved? It turns out that with a little bit-twiddling it’s possible to encode the type, score, depth and move members into a single 64 bit value!

TranspositionTableEntry
long zobristKey (64 bits)
long val (64 bits)

Not only does that reduce the overhead of an entry from 192 bits to 128 bits (24 bytes to 16 bytes), but it completely eliminates the need to keep those Move objects on the heap! Now a fully loaded hash table has the following maximums:
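Here is what such an encoding might look like. The field widths and bit positions below are my assumptions for illustration, not chess4j’s actual layout:

```java
// Illustrative packing of a hash entry's payload into one long.
// Assumed layout (low to high bits): move (32), score (16, signed),
// depth (8), type (2). Not chess4j's actual encoding.
public class PackedEntry {
    static long encode(int type, int score, int depth, int move) {
        return (move & 0xFFFFFFFFL)
             | ((score & 0xFFFFL) << 32)
             | ((depth & 0xFFL) << 48)
             | ((type & 0x3L) << 56);
    }
    static int move(long val)  { return (int) val; }
    static int score(long val) { return (short) (val >>> 32); } // sign-extends
    static int depth(long val) { return (int) ((val >>> 48) & 0xFF); }
    static int type(long val)  { return (int) ((val >>> 56) & 0x3); }
}
```

Unpacking is just a handful of shifts and masks, which is the computational cost being traded for space.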

RAM    | Entries (millions)
64 mb  | 4.2
128 mb | 8.4
256 mb | 16.8
512 mb | 33.6
1 gb   | 67.1
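The figures in both capacity tables are just the RAM budget divided by the per-entry cost, which is easy to sanity check:

```java
// Sanity check of the capacity tables: RAM budget divided by per-entry cost.
// 38 bytes is the hand-wavy average for the old schema, 16 bytes for the new.
public class HashCapacity {
    static double millions(long ramBytes, int bytesPerEntry) {
        return ramBytes / (double) bytesPerEntry / 1_000_000.0;
    }
    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.printf("old: %.1f million entries%n", millions(oneGb, 38)); // ~28.3
        System.out.printf("new: %.1f million entries%n", millions(oneGb, 16)); // ~67.1
    }
}
```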

That’s a fantastic improvement in terms of storage capacity. Essentially we’re trading the computational cost of doing the transformations for space. The theory is that the extra space, which allows us to store far more entries, will more than offset that computational cost. So does it work?

chess4j has a command line parameter “-hash=NNN” that can be used to set the number of hash entries. NNN must be a power of 2. I ran several tests with the old and new hash schemas. Using the Silent but Deadly test suite at 30 seconds per problem, I measured average depth achieved at various combinations of max heap space (the Java -Xmx VM argument) and max # hash entries.
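The power-of-two requirement is most likely so that the table index can be computed from the Zobrist key with a bitwise AND rather than a modulo (which in Java can also go negative for negative keys). A sketch of the usual approach, not necessarily chess4j’s exact code:

```java
// Mapping a 64-bit Zobrist key to a table slot. With a power-of-two table
// size, (key & (size - 1)) keeps the low bits and is always non-negative,
// unlike key % size, which is negative for negative keys in Java.
public class TableIndex {
    static int index(long zobristKey, int numEntries) {
        assert Integer.bitCount(numEntries) == 1 : "numEntries must be a power of two";
        return (int) (zobristKey & (numEntries - 1));
    }
}
```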

Results for the old hash schema:

Heap size (-Xmx) | NNN=1m | NNN=2m | NNN=4m | NNN=8m | NNN=16m | NNN=32m | NNN=64m | NNN=128m
64m              | 11.31  | 11.34  | 11.03  | OOM    |         |         |         |
128m             | 12.42  | 12.22  | 12.12  | 11.93  | 11.42   | OOM     |         |
256m             | 12.59  | 12.93  | 13.04  | 12.99  | 12.68   | 12      | OOM     |
512m             | 12.64  | 12.99  | 13.31  | 13.4   | 13.49   | 13.43   | 12.64   | OOM
1g               | 12.64  | 13.02  | 13.21  | 13.42  | 13.48   | 13.61   | 13.54   | 13.03
OOM = “Out of Memory”. The best result for each heap size occurs at an intermediate number of hash entries: as you can see, performance can actually decline when the number of hash entries is too large for the given heap space, even before the JVM throws an OutOfMemoryError.

Results for the new hash schema:

Heap size (-Xmx) | NNN=1m | NNN=2m | NNN=4m | NNN=8m | NNN=16m | NNN=32m | NNN=64m | NNN=128m
64m              | 12.01  | 11.83  | 11.61  | 10.84  | OOM     |         |         |
128m             | 12.58  | 12.84  | 12.84  | 12.46  | 11.69   | OOM     |         |
256m             | 12.68  | 12.93  | 13.26  | 13.43  | 13.31   | 12.36   | OOM     |
512m             | 12.72  | 12.97  | 13.26  | 13.45  | 13.62   | 13.46   | 12.89   | OOM
1g               | 12.70  | 13.06  | 13.30  | 13.50  | 13.58   | 13.63   | 13.62   | 13.37

So, it appears the new hash schema indeed performs better, particularly with smaller heap sizes.

It occurred to me during testing that perhaps the JVM should have been initialized with the ‘-Xms’ argument equal to ‘-Xmx’, to avoid “penalizing” the program for the overhead of growing the heap, since it will surely reach capacity. I reasoned that a large initial -Xms value would likely make the larger heap sizes look even better relative to the smaller ones, but that it doesn’t really matter when the point of the test is to compare one schema to the other, as long as all settings are consistent.

Nodes per second, hash hit rate, and hash collision rate were also measured, but were less interesting: nodes per second and collision rate predictably decreased with successively larger tables, while hit rate increased.

It’s worth pointing out that my experience here is similar to Jonatan Pettersson’s with Mediocre Chess. See http://mediocrechess.sourceforge.net/guides/memoryinhashtables.html

These changes will be part of the 3.4 release. For the impatient, the changes to the main hash table have already been merged into master on the project’s GitHub repo.