Variable-length Integers

Wow. So it definitely took me long enough to get this blog going. A lot has happened since I got this site up and running, so hopefully there will be no shortage of thoughts to write about.

This semester I’ve finally qualified to take the CS Senior Design Project, which, this semester, is to design a Chord client. At the heart of the system is the identifier, which is essentially a glorified hash. We decided to go with SHA-256, partially because the instructor mentioned it was in the class of his favorite hashes. It didn’t really matter to me, since C# supports it just as easily as anything else.

Identifiers have two essential purposes, comparison and addition, from which every method can be derived. I wasn’t particularly intrigued by either of these topics, since almost without exception, you can rely on existing structures in the FCL to do whatever you need to do, and it’s always better to do so. Now, I said “almost without exception,” namely because I stumbled upon the fact that these identifiers unequivocally are exceptions. The thing about using a SHA-256 hash – or any hash for that matter – is that it’s an enormous value – 256 bits to be precise. There aren’t any real 32 or even 64 bit hashes available (I doubt they’d be useful), so relying on the good ol’ FCL goes out the window.

Other students, stuck in the abyss of Java, were having difficulty getting their socket implementations going, so mucking around in massive integer arithmetic was right out. I believe the professor advised them to take the 32 MSB and just rely on an unsigned integer. (Side note: does Java even have unsigned integers?) I considered a few similar options: taking the 64 LSB or separating the hash into several equal groups of 64-bits each and getting the xor result of the segments. Arrogantly I proceeded on, demanding that whatever I wrote work not only for SHA-256 but also SHA-512 or anything else using the full bit-length.

Comparison wasn’t difficult. Essentially, you start at the MSB in your array (which happens to be at Length - 1) and compare until you find a comparison != 0:

public int CompareTo(Identifier other)
{
	for (int i = _hash.Length - 1; i >= 0; i--)
	{
		int compare = _hash[i].CompareTo(other._hash[i]);

		if (compare != 0)
			return compare;
	}

	return 0;
}

Now comes the fun part. First, I wrote a little method called Pad(byte[] list, int size) which pads the MSB of an array with 0’s until it’s the proper length. After several iterations, I finally ended up with this:

public static byte[] Add(byte[] a, byte[] b)
{
	if (a.Length > b.Length)
		b = Pad(b, a.Length);
	else if (a.Length < b.Length)
		a = Pad(a, b.Length);

	byte[] result = new byte[a.Length];

	int carry = 0;
	for (int i = 0; i < a.Length; i++)
	{
		int value = carry + a[i] + b[i];

		// get the bottom part of the bits
		result[i] = (byte)value;

		// knock off the bottom bits that we already got and
		// put the rest into carry.
		carry = (value - result[i]) / byte.MaxValue;
	}

	return result;
}

The data holder for carry has to be large enough for the worst case scenario 0xFF + 0xFF + carry 1, which is why I chose an int rather than a byte (although a short could have worked just as well, I suppose).

Subtraction was more difficult. Consider the way you subtract, with carrying, etc. Carrying on its own is a recursive process, and I can’t imagine writing it out. It was at that time that I remembered how the ALU does subtraction- by taking the two’s compliment of the second operand and doing addition. So I followed suit:

public static byte[] Subtract(byte[] a, byte[] b)
{
    if (a.Length > b.Length)
        b = Pad(b, a.Length);
    else if (a.Length < b.Length)
        a = Pad(a, b.Length);

	byte[] twoscomp = new byte[b.Length];

	// get the 1's compliment
	for (int i = 0; i < b.Length; i++)
		twoscomp[i] = (byte)(byte.MaxValue - b[i]);

	// add 1
	twoscomp = Add(twoscomp, new byte[] { 1 });

	// NOTE: twoscomp is now the two's compliment of b.

	return Add(a, twoscomp);
}

I also needed multiplication, but I’m still working on the solution. I’ve got a working draft, but I’m not entirely sure I’m happy with it just yet. I’ll keep you notified.