Wednesday, May 16, 2012

How to shoot yourself in the foot with HashCodeBuilder

The setup

HashCodeBuilder is a nice tool provided in apache commons to help building infamously difficult hashCode and equals methods in java classes that work properly. While I appreciate the tool, versions prior to 2.5 had a subtle api design flaw that is likely to bite you if you aren't careful. To illustrate how subtle the design flaw is, take a look at the following code:

package blog.mainguy;

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class LookupEntity {
    private String name;

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (other != null && other instanceof LookupEntity) {
            return new EqualsBuilder().append(this.getName(), ((LookupEntity) other).getName()).isEquals();
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17,31).append(this.getName()).hashCode();
    }
}

The first surprise

It's probably impossible to see the bug, but I assure you with older versions, the following test will (often) fail.

import blog.mainguy.LookupEntity;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    @Test
    public void lookupTest() {
        Set mySet = new HashSet();
        LookupEntity one = new LookupEntity();
        one.setName("mike");
        mySet.add(one);
        assertTrue(mySet.contains(one));

    }

}

The flaw is that the HashCodeBuilder didn't use the hashCode() method to return the hashcode, it used a method called toHashCode(). We effectively were getting the hash code for the instance of the builder instead of the hashCode for the "built" entity. This was fixed in 2.5 so that hashCode() now calls toHashCode(), but illustrates how an simple decision (such as how to name a method or naming a method in a manner too similar to another) can cause subtle bugs.

Surprise Number Two

To show how interesting this problem is the following tests will (often) pass:

import blog.mainguy.LookupEntity;
import org.junit.Before;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static junit.framework.Assert.assertEquals;
import static junit.framework.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    Set mySet = new HashSet();
    LookupEntity one = new LookupEntity();

    @Before
    public void setup() {
        one.setName("mike");
        mySet.add(one);
    }

    @Test
    public void lookupTest() {
        assertFalse(mySet.contains(one));

    }

    @Test
    public void countTest() {
        mySet.add(one);
        assertEquals(2, mySet.size());

    }

}

What! it appears that the "Set" is broken! The fact is, however, this is exactly why hashcode and equals not agreeing is a big problem. The method that java uses when storing things in Hashes is to divide the internal storage into buckets and use the hash to determine which bucket to put something into... it then adds things to an array in the bucket to store it. Then, when it's time to see if it contains something, it looks up the hash of the thing, goes to that bucket, and traverses the list to see if it is there. In our "broken" examples, we are returning the hashCode of a new instance of HashCodeBuilder... every time. This means that it will stand a strong chance of returning a hashCode that will not match the next time we call it.

Yes, one more surprise

For example, the following test will also (often) pass.

import blog.mainguy.LookupEntity;
import org.junit.Before;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static junit.framework.Assert.assertEquals;
import static junit.framework.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    Set mySet = new HashSet();
    LookupEntity one = new LookupEntity();

    @Before
    public void setup() {
        one.setName("mike");
        mySet.add(one);
    }

    @Test
    public void lookupTest() {
        assertFalse(mySet.contains(one));

    }

    @Test
    public void countTest() {
        mySet.add(one);
        assertEquals(2, mySet.size());


    }

    @Test
    public void hashCodeTest() {
        assertFalse(one.hashCode() == one.hashCode());
    }


}

Now we (hopefully) can understand why we're going to have a problem. If I call hashCode twice on the same object and get different results, when the java runtime attempts to see if the object is in the hash backend, it will be looking in the wrong bucket. More insidiously, it will keep adding things to random buckets and the count will continue to grow even though you cannot find things you just put in.

So to recap

First: Designing easy to use APIs that are logical and correct is difficult. Part of the problem with this design was with the ease with which someone could use it incorrectly. Never forget #AdamsRule When trying to design something completely foolproof, don't underestimate the ingenuity of complete fools. The fix in the later versions of the commons api overcomes this problem very well I think.
Second: if you're using commons-lang, upgrade to (at least) 2.5 and avoid this particular problem.
Third: If you're NOT using commons-lang and rolling your own hashcode/equals methods, be careful and make sure you understand what you're getting yourself into.

3 comments:

Sanketi Mehta said...

I know this post is from last year but I just came across. Thanks for writing this. I've spent a day using 2.4 commons-lang jar and trying to figure out why HashMap returns weird results each time on the same Object! This helped me.
Sanketi M.

Sanketi Mehta said...

I know this post is from last year but thanks for writing this. I've spent a day using 2.4 commons-lang jar and trying to figure out why HashMap returns weird results each time on the same Object! This helped me.
Sanketi M.

Sanketi Mehta said...

I know this post is from last year but thanks for writing this. I've spent a day using 2.4 commons-lang jar and trying to figure out why HashMap returns weird results each time on the same Object! This helped me.
Sanketi M.