Thursday, May 31, 2012

How to juice your java performance

Warning: This is a preoptimization

In my previous post about equals and hashcode I thought I'd point out how to redesign the class in question to make better use of performance. If you have a situation where you create groups of objects that are immutable, but you need to pass them around in hashsets and/or look things up, a way to increase performance is to change from something like the following:

package blog.mainguy;

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class LookupEntity {
    private String name;

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (other != null && other instanceof LookupEntity) {
            return new EqualsBuilder().append(this.getName(), ((LookupEntity) other).getName()).isEquals();
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17,31).append(this.getName()).hashCode();
    }
}

to something like
package blog.mainguy;

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class LookupEntity {
    private String name;
    private LookupEntity() {
    }
    public LookupEntity(String name) {
        
        this.name = name == null?"":name;
        hashCode = name.hashCode();
    }
    public String getName() {
        return name;
    }
    private int hashCode;

    @Override
    public boolean equals(Object other) {
       if (other != null && other instanceof LookupEntity) {
            return this.name.equals(((LookupEntity)other).getName())
        }
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}

Note, I have not profiled this, it is based on my perception and understanding of how java works. It is also (as I noted) a pre-optimization that I personally wouldn't necessarily start with (though in many cases I might).

Use computers for repeatable tasks, use humans for creative tasks

Pundits often refer to the notion that the most effective developers are 10 or more times as efficient as the least effective. I think this is probably an understatement, but I think the reason is often oversimplified. Many of the "more effective" developers seem like to pat themselves on the back and chalk it up to being smarter or some other nonsense. While arguably this is true, I think it is more that the most effective developers HATE doing the same thing twice.

I think this is an important detail because computers are REALLY good at doing the EXACT same thing millions of times per second. People are not very good at this... in fact it's arguable that it is impossible to do the same thing exactly the same twice in a row.

The truly effective developers are the ones who use computers to do the repeatable tasks and use their unique human curiosity and creativity to find new things for the computers to do. More importantly, the best software development shops realize this and learn how to foster behaviors and environments that reenforce this division. In GREAT software shops, spending 1 hour per week filling out manual reports and copy/pasting info from one spreadsheet to another will NOT happen and will sound like a really stupid idea.

Wednesday, May 16, 2012

How to shoot yourself in the foot with HashCodeBuilder

The setup

HashCodeBuilder is a nice tool provided in apache commons to help building infamously difficult hashCode and equals methods in java classes that work properly. While I appreciate the tool, versions prior to 2.5 had a subtle api design flaw that is likely to bite you if you aren't careful. To illustrate how subtle the design flaw is, take a look at the following code:

package blog.mainguy;

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

public class LookupEntity {
    private String name;

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (other != null && other instanceof LookupEntity) {
            return new EqualsBuilder().append(this.getName(), ((LookupEntity) other).getName()).isEquals();
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17,31).append(this.getName()).hashCode();
    }
}

The first surprise

It's probably impossible to see the bug, but I assure you with older versions, the following test will (often) fail.

import blog.mainguy.LookupEntity;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    @Test
    public void lookupTest() {
        Set mySet = new HashSet();
        LookupEntity one = new LookupEntity();
        one.setName("mike");
        mySet.add(one);
        assertTrue(mySet.contains(one));

    }

}

The flaw is that the HashCodeBuilder didn't use the hashCode() method to return the hashcode, it used a method called toHashCode(). We effectively were getting the hash code for the instance of the builder instead of the hashCode for the "built" entity. This was fixed in 2.5 so that hashCode() now calls toHashCode(), but illustrates how an simple decision (such as how to name a method or naming a method in a manner too similar to another) can cause subtle bugs.

Surprise Number Two

To show how interesting this problem is the following tests will (often) pass:

import blog.mainguy.LookupEntity;
import org.junit.Before;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static junit.framework.Assert.assertEquals;
import static junit.framework.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    Set mySet = new HashSet();
    LookupEntity one = new LookupEntity();

    @Before
    public void setup() {
        one.setName("mike");
        mySet.add(one);
    }

    @Test
    public void lookupTest() {
        assertFalse(mySet.contains(one));

    }

    @Test
    public void countTest() {
        mySet.add(one);
        assertEquals(2, mySet.size());

    }

}

What! it appears that the "Set" is broken! The fact is, however, this is exactly why hashcode and equals not agreeing is a big problem. The method that java uses when storing things in Hashes is to divide the internal storage into buckets and use the hash to determine which bucket to put something into... it then adds things to an array in the bucket to store it. Then, when it's time to see if it contains something, it looks up the hash of the thing, goes to that bucket, and traverses the list to see if it is there. In our "broken" examples, we are returning the hashCode of a new instance of HashCodeBuilder... every time. This means that it will stand a strong chance of returning a hashCode that will not match the next time we call it.

Yes, one more surprise

For example, the following test will also (often) pass.

import blog.mainguy.LookupEntity;
import org.junit.Before;
import org.junit.Test;

import java.util.HashSet;
import java.util.Set;

import static junit.framework.Assert.assertEquals;
import static junit.framework.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class LookupEntityTest {
    Set mySet = new HashSet();
    LookupEntity one = new LookupEntity();

    @Before
    public void setup() {
        one.setName("mike");
        mySet.add(one);
    }

    @Test
    public void lookupTest() {
        assertFalse(mySet.contains(one));

    }

    @Test
    public void countTest() {
        mySet.add(one);
        assertEquals(2, mySet.size());


    }

    @Test
    public void hashCodeTest() {
        assertFalse(one.hashCode() == one.hashCode());
    }


}

Now we (hopefully) can understand why we're going to have a problem. If I call hashCode twice on the same object and get different results, when the java runtime attempts to see if the object is in the hash backend, it will be looking in the wrong bucket. More insidiously, it will keep adding things to random buckets and the count will continue to grow even though you cannot find things you just put in.

So to recap

First: Designing easy to use APIs that are logical and correct is difficult. Part of the problem with this design was with the ease with which someone could use it incorrectly. Never forget #AdamsRule When trying to design something completely foolproof, don't underestimate the ingenuity of complete fools. The fix in the later versions of the commons api overcomes this problem very well I think.
Second: if you're using commons-lang, upgrade to (at least) 2.5 and avoid this particular problem.
Third: If you're NOT using commons-lang and rolling your own hashcode/equals methods, be careful and make sure you understand what you're getting yourself into.

Tuesday, May 8, 2012

Clean code is an important quality, but not the MOST important

Clean code is a book by Uncle Bob Martin that speaks to principles in software development that lead to high quality, legible software. This is a great book and every developer should read it once (or twice). The book outlines important tools and techniques, but these are not the end, they are a means. I've noticed a lot of folks treat clean code as the "secret sauce" that makes great software. Unfortunately, they take it a little too far and by being too focused on their secret sauce, they forget that it's worthless if the customer starves to death waiting for the secret sauce to be "perfect".

At one point early in the book, Uncle Bob uses the metaphor of physicians and hand washing. In particular, he makes a point that it would be unrofessional and possibly illegal for physician to not wash his hands before surgery, even if the patient (the customer) asked for it. I like this metaphor and I think he makes a good point, but I feel he narrows the perspective too much. I respect his position, and it makes complete sense when talking about elective surgery, but doesn't apply in all situations. In my experience, most software development isn't like elective surgery with a controlled environment and a sterile operating room, but more like treating wounds in a combat zone under direct fire. In this sort of an environment, hand washing (clean code) is important, but not always as important as stopping the bleeding and/or saving the patient's life. It's often more important to get the patient out of harms way and stop the bleeding than it is to make sure you don't introduce the risk of infection at some future point. Sometimes, you might need to even sacrifice not only hand washing, but even a leg or two in order to save the patient's life and you've done the right thing.

The more important factor beyond keeping things clean is understanding and delivering value to the customer in a manner (i.e. cleanliness) that is appropriate for the environment. I've been on a few projects where there is 100% test coverage, the code is a beautiful work of art, but the business customer ultimately cancels the project because they cannot afford to operate without the business value the code was supposed to deliver. In our combat zone metaphor, some practitioners are so busy washing their hands, they've forgotten that the goal is to save lives, not prevent infection. Clean code is an important goal, just like preventing infection is an important goal, but not THE most important goal.

Developers, clean code is important, it is downright essential in the long run, but it is not the primary goal of software development. Learn the principles and techniques to keep your code clean, but don't forget that your customer needs the software to do something for them and no matter how clean your code might be, if it doesn't provide real business value in a timely manner, the customer isn't getting what they need. In other words, don't let your patients die from blood loss while you're busy washing your hands.