Thursday, May 16, 2013

When to refactor code

As a die hard refactorer, but also pragmatic programmer, I often have a tough time articulating to other developers when a refactor is important and when it is gratuitous. I can imagine many people look at decisions I've made about when it is and isn't appropriate and think it's simply a whim or "when I feel like it". To clarify this for both myself and any future victims/co workers involved with refactoring decisions I may make, I submit this 10 item checklist.

  1. Is the cyclomatic complexity of the function below 5?
  2. Did you completely understand what cyclomatic complexity was without following that link?
  3. Do you have an automated test or documented test case for every path through the function?
  4. Do all of the existing test cases pass?
  5. Can you explain the function and all it's edge cases to me in less than 1 minute?
  6. If not, how about 5 minutes?
  7. 10 minutes?
  8. Are there fewer than 100 lines of code in the function (including comments)?
  9. Can you find two other developers who agree this code is error free just by visual inspection?
  10. Is this function used in only one place?
  11. Does the function meet it's performance goals (Time/Memory/CPU)?

Scoring

Add up the "no" answers above:
  • 0-1 - Why would you even think about refactoring this? You probably just want to rename variables because you don't like the previous developer's naming convention
  • 2-5 - This might need a little tweaking, but I wouldn't hold up a production release for something in this range
  • 6-8 - OK, we probably need to fix this... It's likely we're going to keep revisiting this and/or we don't actually know what it's doing. Still on the fence, but it's highly suspect
  • 9+ - This is a prime candidate for refactoring. (Note, writing test cases is a form of refactoring)

While I know this is a rough guideline, I think it hits on the key factors that are important about the overall quality of source code and helps avoid overspending effort fixing things that aren't necessarily broken.

Monday, May 13, 2013

Apologetic Agile Development

Having lived through numerous attempts to build software embracing the concepts behind the agile manifesto, I feel there are three large categories folks fall into when talking about agile principles.
  1. The curmudgen - these folks have been writing code since punchcards where the state of the art, OR they have been brainwashed by large consulting organizations into thinking that a large heavyweight process is the only way to succeed. Note, a subset of these folks believe that "no process" is actually OK and are quite happy to cowboy-code their way through life.
  2. The fanboy - these folks think "everything agile all the time" and will rename status meetings to "scrums". These are folks who are used to working solo on projects that they can do in their heads... or they are simply not clued into the implications of actually having a repeatable process or delivering working software.
  3. The apologetic - these folks understand the principles and the value they provide, but also understand that these principles are the important thing and know that the current state of the art of software development is still very problematic. These folks often complain or quip that they are not doing "real agile", but accept that using some of the tool and principles coupled with more traditional principles, tools, and processes has much more value in most cases
I'm squarely in the apologetic camp (my ego transposes apologetic for pragmatic BTW), and while I feel I have a good understanding of where and how agile can deliver value, I also understand that many times agile gets sold as a magic bullet that never delivers completely on it's promises. I think this is a mistake: No process, methodology, or tool is perfect, folks who complain that "agile" causes problems in their projects or doesn't solve problems that they have are completely missing the point. No process, principle, or methodology should completely dominate your software development philosophy and enlightened developers should stop apologizing.

Sunday, February 17, 2013

java static fields

A great many people starting out with java development have only a vague understanding of the difference between a "public static String", "public String", and the difference between a class and an object. As this was confusing to me at first, I thought I would give a quick overview.

A class defines a template for what data and operations are available when you tell the JVM to create an object.

So, for example:

class BlogPost {
    public BlogPost(String inString) {
       text = inString;
       BlogPost.latest = this;
    }
    public String text = "";
    public static BlogPost latest;
   
}

When you do the following

BlogPost myPost = new BlogPost("Hello");

You're telling the JVM to allocate some memory on the heap to store a reference to a memory location and from now on, when I refer to myPost, it means that memory location. BlogPost is a class, myPost is an Object that is a reference to a memory location that is an instance of the BlogPost Class. When I'm trying to create a new instance of a BlogPost, the JVM searches it's classpath for a compiled Class definition of Type BlogPost with a constructor that takes a single argument which is a String object.

The important detail some people miss at first is that "static" fields and methods work on the Class definition, not on the Object instance. This means that if you create an instance of a BlogPost, and read the "latest" field, you will not get 5 different values, you'll only going to get the reference to the last one. This has implications for thread safety and other situations where multiple objects may read or write the same static field.

Further complicating the issue is that java can have multiple classloaders which means if "BlogPost" is found in different places (or even in different threads potentially) there might actually be multiple instances of the Class definition. This can make for interesting debugging situations where a static field is updated twice, but only one of them is visible from a particular perspective at a given time. As an illustration, suppose the following two snippets of code appear in different parts of a system:

BlogPost myPost1 = new BlogPost("Hello1");
BlogPost myPost2 = new BlogPost("Hello2");

If myPost1 happens first and myPost2 happens second, most people would expect BlogPost.latest to always refer to myPost2. This is not always the case though, in a web container, if myPost1 was created in a different classloader than myPost2, it's quite possible that certain parts of the system will ALWAYS see BlogPost.latest as myPost1 and other parts will ALWAYS see MyPost2, no matter what order these calls were made. Worse yet, it's possible that, if the class was garbage collected, you might see neither, even though most people would never expect that to happen.

Examples of how this might happen are when you deploy a jar file to a web container in it's parent classpath (like xml processing libraries) and also deploy them inside the web application (or in two different web applications in the same container). Depending on how the container handles the situation, you may very well get different results (even from what I described).

For a more complete set of examples and a better explaination, particularly in regard to implementing Singletons in java, see this