Hashing on Two Inputs

Ender314

Hashing on Two Inputs

Building a One-Way Compression Function

One-way compression functions are a common cryptographic primitive. A one way-compression function takes two equal-length inputs and generates an output of the same length. The output must appear random relative to the inputs, and it must be hard to find two inputs that generate a given output (the one-way criterion). Many hash functions are built from one-way compression functions, for instance using the Merkle-Damgard construction.

We begin the search for a good function using genetic programming. The function we are looking for takes two integers as input, and produces one integer as output. The criteria are similar to general randomness criteria for a hash function. Specifically, the sequence f(1,0), f(2,0), f(3,0), … must pass strong tests of randomness. Also, flipping a single bit of either input must change the output in a completely random way (the differential test).

After running for quite some time and exploring over 200,000 different functions, the genetic algorithm finds the solution below (in assembly and C#).

This function is remarkably simple, and has been tested out to a billion evaluations without any detectable pattern. The state variable y is changed at every iteration by exclusive-or with a constant. Since exclusive-or is it’s own inverse, this means the value of y alternates between the initial value and the exclusive or of the initial value with the given constant. This value is exclusive-or’d with x at each iteration, and x is scrambled with the rotate-multiply operation, which is known to be a very strong randomizer.

The first line inside the loop serves several purposes. First, it ensures that Hash(0,0) is not zero (a desirable property fort hashes). Second, it ensures that the second line in the loop actually changes the value of x significantly, even if the initial value of y doesn’t have a lot of 1’s (in the binary representation). For instance, if y was just 256, there would only be one non-zero bit in the binary representation, and without the first line, the second line “x ^= y;” would just flip one bit of x at each loop iteration. The first line ensures that, whatever the initial value of y, on multiple iterations it will have many non-zero bits, causing a lot of mixing with x.

As with many hashes, the particular value of the constants likely doesn’t matter a whole lot. The first constant could likely be any number with about half 1’s in the binary representation, the multiplier could be any odd number with roughly half 1’s in the binary representation, and the rotation constant could likely be any number not too close to 1 or 63. This function is a good randomizer, but in order to make it a good compression function, one would have to wrap it as below.

Public ulong Compress(ulong x, ulong y)
{ return x ^ y ^ Hash(x,y); }

By mixing the inputs with the output, it becomes more difficult to invert.