Wednesday, April 25, 2012

Java programmers: Code to the interface, even if the interface is a class

After spending a considerable amount of time trying to figure out how to refactor some particularly hairly (hairy + gnarly) data access code, I thought I'd share some insight into a popular misconception about what coding to the interface actually means.

Let's say we're writing a data access layer and we have something called the UserDao. A simple implementation might be something like:

public class User {
    public int id;
    public String name;
}
public class UserDao {
    public boolean save(User toBeSaved) {
        // persistence logic goes here; report whether the save succeeded
        return true;
    }
}

I'm going to dodge the issue of the User class not having getters and setters (and thus not following the JavaBeans spec) and talk about the UserDao interface. Yes, you heard me: the UserDao class effectively is an interface in this example. Sit down and just think about that for a minute. Once you've done that, move to the next paragraph.

A great (GREAT) many Java developers might not get to this paragraph because they'll immediately start reacting to the idea that the UserDao isn't a Java interface, it's a Java class. This is an implementation detail, and I'm here to tell you that if you reacted that way you have already started down a path of increased complexity. Why? Because most Java developers will, instead of just using the above two classes, add another layer of indirection:

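public interface UserDao {
    public boolean save(User toBeSaved);
}
and rename the original class to UserDaoImpl so that it implements this interface:
public class UserDaoImpl implements UserDao {
    public boolean save(User toBeSaved) {
        // persistence logic goes here; report whether the save succeeded
        return true;
    }
}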

Which in my experience is of no value in a large percentage of use cases (let's call it 90% of the time). This is a mistake! I know that "best practices" from just about every source you'll find say this is a good idea, but I'm here to tell you that you are taking on debt and you should CAREFULLY weigh the cumulative cost of that debt. The biggest problem is that in non-trivial systems, this adds unnecessary complexity to the design and makes things more difficult to decipher. There are other problems, but what bothers me most is not just the added complexity, but the knee-jerk non-thought that goes into adding the complexity for no good reason. Imagine you have 90 DAOs and 90 interfaces, and every change to a method signature requires a change in two places.

But Mike! people will say, what if my current implementation uses Hibernate and I want to switch to iBATIS? Fine, I'd answer, change the implementation of the save and get methods in your simple UserDao to use the other library. One way to do this is to use composition in the DAO to hook into the particular implementation you need (for example, using Spring autowired beans).

public class UserDao {
    @Autowired
    private HibernateSession hibernateSession;
    public boolean save(User toBeSaved) {
        return hibernateSession.save(toBeSaved);
    }
   
}
and when we decide to use iBATIS:
public class UserDao {
    @Autowired
    private IbatisSession ibatisSession;
    public boolean save(User toBeSaved) {
        return ibatisSession.save(toBeSaved);
    }
}

I realize it's not really that simple (I don't know iBATIS well enough, sorry), but my point is that the class in this example is GOOD ENOUGH as the interface. I reject the "automatically use a Java interface" habit not because interfaces are bad: there are good reasons to USE an interface, but this example is NOT one of them.
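To make "the class is the interface" concrete, here is a minimal sketch of a caller that depends only on the public save method of the concrete UserDao. The RegistrationService class and the in-memory map are my own assumptions for illustration (the map stands in for whichever persistence library is actually wired in), and I've left Spring out so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

class User {
    int id;
    String name;

    User(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// A concrete class. Its public methods are the interface callers code to.
class UserDao {
    // Hypothetical stand-in for the persistence library in use.
    // Replacing it changes this class only, not the save() signature.
    private final Map<Integer, User> store = new HashMap<Integer, User>();

    public boolean save(User toBeSaved) {
        store.put(toBeSaved.id, toBeSaved);
        return true;
    }
}

// Hypothetical caller: it knows nothing about how UserDao persists users.
class RegistrationService {
    private final UserDao userDao;

    RegistrationService(UserDao userDao) {
        this.userDao = userDao;
    }

    boolean register(int id, String name) {
        return userDao.save(new User(id, name));
    }
}
```

If the body of save is later rewritten against a different library, RegistrationService does not need to change, and there is no second file to keep in sync.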

So when is a good time to use a Java interface? The time to use interfaces is when you have multiple things that need a shared interface (set of operations), but they don't necessarily have the same concrete class backing them. This design detail is Java's substitute for multiple inheritance. In the context of most J2EE apps, DAOs are not a good use of the concept. A better example would be something like getting audit information for multiple entities:

public interface Auditable {
    public String getAuditString();
}
public class User implements Auditable {
    public int id;
    public String name;
    public String getAuditString() {
        return "User " + id + " with name " + name;
    }
}

public class Account implements Auditable {
    public int id;
    public String accountNumber;
    public String getAuditString() {
        return "Account " + id + " with account number " + accountNumber;
    }
}



public class AuditDao {
    public void audit(Auditable toBeAudited) {
        System.out.println("performing operation on:  " + toBeAudited.getAuditString());
    }
}
public class UserDao {
    @Autowired
    private HibernateSession hibernateSession;
    @Autowired
    private AuditDao auditor;
    public boolean save(User toBeSaved) {
        auditor.audit(toBeSaved);
        return hibernateSession.save(toBeSaved);
    }
}


public class AccountDao {
    @Autowired
    private HibernateSession hibernateSession;
    @Autowired
    private AuditDao auditor;
    public boolean save(Account toBeSaved) {
        auditor.audit(toBeSaved);
        return hibernateSession.save(toBeSaved);
    }
}

I realize there are better ways to implement this particular variation, but my point is that the Auditable interface needs to be implemented by completely different classes that are in use side by side at runtime. Hiding things behind interfaces should only be done when necessary, that is, when it provides realistic, known value in the present or the foreseeable future. Switching implementations can often be done in other ways when you spend time thinking about your design. Java interfaces are for enabling multiple concrete classes to have the same interface, NOT necessarily for simply defining the interface of a concrete class. With good design, a class will hide its inner details and the interface is just extra complexity.

Wednesday, April 18, 2012

ya ain't gonna need it until ya need it

Yesterday I posted a somewhat snarky comment about how "you don't need layers until you need them", which may have seemed like a nonsensical thing to say. Today I started to write an example of how to refactor an anemic data model with lots'a layers into a lean and mean persistence machine... but stumbled into a perfect example of what I was trying to say. In essence, I was trying to repeat the idea that "Ya Ain't Gonna Need It", but with emphasis on the fact that... Yes, you may KNOW you're going to eventually need it, but building infrastructure before you need it accumulates overhead that you must pay for, even if you don't get the benefit.

My Example (Snippet of pom file)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <properties>
        <scala.version>2.7.7</scala.version>
        <spring.version>3.1.1.RELEASE</spring.version>
    </properties>
    <groupId>tstMaven</groupId>
    <artifactId>tstMaven</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>rome</groupId>
            <artifactId>rome</artifactId>
            <version>0.9</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}</version>
            <scope>compile</scope>
        </dependency>

        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-core</artifactId>
            <version>4.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>3.1.1.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>3.1.1.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>3.1.1.RELEASE</version>
        </dependency>
    </dependencies>
</project>

What's wrong?

First, take a look at the Scala version number. I've prematurely assumed I'm going to have multiple things that will depend on the Scala version and moved it to a property. Don't do this. Why? Because you don't need it. :) More importantly, if everyone follows this standard, they'll end up doing more work: one step to put the version property at the top of the file and another step to put the placeholder down in the dependencies section. Even more importantly, when every version is a property, a person reading the file has no immediate cue as to which dependencies intentionally share a version number and which just happen to have their own property. The important theme is to try and communicate INTENT to subsequent developers.
Next, you'll note my Spring dependencies have the exact opposite problem: I've got three dependencies that all SHOULD move in lockstep, and the version number is defined independently on each one.
A lot of tech folks will immediately say "Just make everything use a property, that way it's all done the same way". I would agree that there is some value in standardizing on "how to define the version", but I think there is more value being lost in adopting this lowest common denominator mentality. In short, by only externalizing the version number when it's necessary, you send a clear signal to the next person looking at the project that multiple dependencies depend on that version.

The refactored version

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <properties>
        <spring.version>3.1.1.RELEASE</spring.version>
    </properties>
    <groupId>tstMaven</groupId>
    <artifactId>tstMaven</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>rome</groupId>
            <artifactId>rome</artifactId>
            <version>0.9</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>2.7.7</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-core</artifactId>
            <version>4.1.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>${spring.version}</version>
        </dependency>
    </dependencies>
</project>
This version eliminates the problem of knowing which things must travel in lockstep and which things can be versioned independently. While this requires more thinking when building the pom file, and it's certainly a trivial example of a larger problem, I think it illustrates what YAGNI really means. Unnecessary baggage should be thought of as equipment you're putting in your backpack for a 1000 mile hike... sure it might be nice to carry 20 lbs of first aid equipment, "Just in case", but remember you've got to carry all that stuff every step for the next 1000 miles.

Tuesday, April 17, 2012

All Java architects read this

I have a couple of quick notes for any aspiring java architects. Please read them carefully and think about them.

Adding layers is BAD

In general, you don't need extra layers until you need them. At that point, add a new layer (but only in the necessary places). Creating "standard" layers just adds complexity, makes maintenance more expensive, and ultimately fosters copy/paste coding and discourages developers from thinking about what they're doing. An example of a good time to add a layer is when you need to hide complicated operations behind a facade because low level database transaction management is being done in the same place as the code that determines which screen should be displayed next. Too many developers heard "layers add flexibility/scalability/whatever" and started adding layers to every situation that has an arbitrary division of responsibility... I've worked on systems where adding a table column to be displayed in a CRUD application required changing upwards of 10 different classes... this is a maintenance nightmare.
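As a rough sketch of the "good time to add a layer" case, here is what pulling transaction handling out of screen-flow code might look like. Everything here is hypothetical: FakeTransaction is a stub standing in for a real transaction API, and OrderFacade and OrderScreenFlow are names I made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stub for a low level transaction API; it only records calls.
class FakeTransaction {
    final List<String> log = new ArrayList<String>();

    void begin() { log.add("begin"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

// The facade: the one place that knows about begin/commit/rollback.
class OrderFacade {
    private final FakeTransaction tx;

    OrderFacade(FakeTransaction tx) {
        this.tx = tx;
    }

    boolean placeOrder(String item, int quantity) {
        tx.begin();
        try {
            if (quantity <= 0) {
                throw new IllegalArgumentException("quantity must be positive");
            }
            tx.log.add("insert " + quantity + " x " + item);
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();
            return false;
        }
    }
}

// Screen-flow code: it only decides which screen comes next.
class OrderScreenFlow {
    private final OrderFacade orders;

    OrderScreenFlow(OrderFacade orders) {
        this.orders = orders;
    }

    String nextScreen(String item, int quantity) {
        return orders.placeOrder(item, quantity) ? "confirmation" : "orderForm";
    }
}
```

The layer exists here because two unrelated concerns were tangled together, not because a standard said every screen needs a facade.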

Interfaces have a special purpose, you don't need them for everything

Not every class needs an interface... They should be reserved for situations where an interface is useful and not just another unnecessary ceremony that developers will mindlessly follow "because that's the way we do it here". A good example of an interface would be something like "Nameable" or "Labelable". These are often contexts that systems need when rendering information ('cause toString() often won't cut it). The key point is that there will be many classes (at least more than one) that will implement the same interface in the system at the same time. If you try to hide an implementation behind an interface with the idea that the implementation might change in the future... Just use a concrete class and change the doggone implementation when you need to change it. Don't force every update every developer makes for the next 6 years to be TWICE as much effort...
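A minimal sketch of the "Nameable" idea, assuming a single getDisplayName method. The method name and the Customer and Invoice classes are my own inventions for illustration. The point is that two unrelated classes implement the interface at the same time, and the rendering code only cares about the interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: anything that can supply a label for display.
interface Nameable {
    String getDisplayName();
}

class Customer implements Nameable {
    private final String firstName;
    private final String lastName;

    Customer(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public String getDisplayName() {
        return lastName + ", " + firstName;
    }
}

class Invoice implements Nameable {
    private final int number;

    Invoice(int number) {
        this.number = number;
    }

    @Override
    public String getDisplayName() {
        return "Invoice #" + number;
    }
}

class NameableDemo {
    // Rendering depends only on Nameable, not on the concrete classes.
    static String renderList(List<? extends Nameable> items) {
        StringBuilder out = new StringBuilder();
        for (Nameable item : items) {
            out.append("* ").append(item.getDisplayName()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Nameable> items = Arrays.<Nameable>asList(
                new Customer("Ada", "Lovelace"), new Invoice(42));
        System.out.print(renderList(items));
    }
}
```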

Beware of one-size-fits-all solutions

Don't build the Titanic when all you need is a rowboat. I've seen monster frameworks grow because of one tiny edge case. Instead of treating the single edge case as an outlier and walling off that portion of code from everything else, many architects make the mistake of trying to accommodate the edge case in EVERY part of the system. In addition to extra layers and extra interfaces, I've seen systems that generate JavaScript web service code, SOAP stubs, extra Java classes to handle serialization, and any number of other overcomplicated plumbing... just because one or two calls in the system needed to be remote.

The short version is, don't overcomplicate your solutions and don't start adding code you don't need ahead of time. You'll be carrying that code on your back for every step you take and need to make sure you don't burn out hauling a bunch of unnecessary baggage.

Tuesday, April 10, 2012

The Notoriously Tricky "Step 0"

I recently posted about "Git for Dummies" and noticed that someone commented that they had followed my instructions and had a strange permission issue. Wanting to verify everything was correct, I did a quick check and, sure enough, I was having the same problem. Upon investigation, I discovered that I had posted a URL which required a public/private key pair, and folks who hadn't already set this up on their machines would get an error. The post was supposed to have been written for a new user, and they typically wouldn't have done this. As a long time GitHub user, I did this one time setup ages ago and had totally forgotten about it.

This is a common occurrence, so common that I gave it a name... I call it "Step 0". This is a variation of the curse of (prior) knowledge and can be frustrating both for people trying to learn something new and for people who are trying to explain how to do that something to a newbie. In essence, "Step 0" is prior knowledge that is necessary for performing a task but so fundamental that it is routinely forgotten until a novice attempts the task for the first time.

As with many things, there is no surefire solution or way to avoid this, but being aware of the situation can certainly reduce the amount of frustration when it happens. I was about to amend my previous blog post to add the necessary detail, but decided to change the URL to the "read only" version. Instead of editing the post to add the 5 steps necessary to set up and use the ssh version, I changed it to use the anonymous version that most newbies should start with.

I chose the simpler solution over trying to explain "setting up a github account" and "setting up ssh public/private key pairs" because I think those topics are different, and putting in too much detail can be just as detrimental as not having enough. If someone follows the instructions and they don't work, they can ask a question.

Monday, April 2, 2012

Exploiting the Cloud for Personal Productivity

I'm currently doing an evaluation of Drools to illustrate the differences and similarities between it and JRules. Due to time constraints, most of this is being done on a train commuting between Chicago and my home. This means I'm doing most of my work at 60 mph over a 3G connection. This also means that my network is constantly dropping, slowing down, and otherwise misbehaving.

So to evaluate Drools, I needed to download the 300ish MB zip file and set it up. Originally I started downloading to my laptop but realized this was just going to take longer than I really wanted. So, I fired up an Ubuntu EC2 AMI, typed "wget http://download.jboss.org/drools/release/5.3.0.Final/guvnor-distribution-5.3.0.Final.zip" and was ready to roll in 5 minutes.

An additional benefit to this is that I can point people to the IP address of the EC2 instance and they can actually start to use the product without requiring me or my laptop to be present. An even bigger benefit is that I can tune this image, get it really good, then snapshot it and resell the image (or the ability to create the image) as a starting point to multiple clients for their own rule based projects.