Gabe and I got VLife pretty optimized back in the day. We took the tile route to a even bigger extreme and gave tiles sub-tiles as well. The key is to do as little processing as you can get away with; any loop that you don’t have to loop over is less processing.

Of course you can go too far. If you make the sub tiles too small you wind up with too much overhead. If you makes the tiles themselves too small you wind up with overhead on the memory allocation/freeing side of the world. The key is to do benchmarks to get a sense of progress.

In some relatively recent conversations with a former coworker we came up with an even better way of representing the cells to same memory and potentially speed things up even more: denser cell packing!

In the previous description we were spending five bits to store each cell. One bit for the “liveness” of the cell and four for the neighbors since you can have up to eight live neighbors.

This is good, but we can get better. Storing five bits is problematic because it’s not a power of two; it’s hard to pack things like that. I suppose you could take a 16 bit word and get three cells, but it’s more of a stretch. That’s not even getting to the additional overhead of the bit shifting to process things.

We need a better way.

Rob came up with the better way through an elegant hack!

Let’s go down the path of storing a cell in only four bits and take one bit for the liveness. You can’t get away from that. That leaves three bits for neighbors. How can you store the number of neighbors in three bits? You can only count up to seven like that.

My initial thought was to use seven (all bits set) as a semaphore to trigger the exception case of “count the neighbors the old fashioned way” since seven or eight neighbors happens so infrequently. But Rob got me one better than that.

Implicit in storing the cell in four bits is that you store two cells per byte. The sibling cell’s liveness bit is in the same byte! The count only has to store the number of live cells other than your sibling. This gets the max down to seven. Problem solved!

Of course the next question is “how do you process this efficiently?”

Easy. A lookup table!

With only one byte to represent two cells the lookup table is only 256 (2^8) rows long. Given the previous state of the cell-pair, all of the information to do the processing is contained in that one byte: the liveness of both cells and enough information to determine the total neighbor count of each!

I’ve not yet implemented this, but it’s cool to advance the state of the art like this.