Wednesday, July 25, 2012

Effectively communicating software requirements - PART 1

The most important factor that contributes to a successful software project is ensuring that the development staff has a good understanding of the requirements. Incorrect requirements almost guarantee incorrect software behavior. Most often, however, the requirements aren't clearly incorrect, but rather they are vague, ambiguous, or make incorrect assumptions about the best way to apply technology to solve the problem.

The original title of this post was "Writing Good Software Requirements", but I realized after writing the previous paragraph (irony noted) that writing good requirements is actually part of the problem. Having written software my entire adult life and much of my childhood, I've been most effective writing software for myself. This software has always done exactly what it was supposed to do (technical bugs notwithstanding), and the reason is that I had a visceral and complete understanding of the problem I was trying to solve and what factors would directly influence my impression of success.

It follows, then, that the number one goal is to effectively communicate the problem that the software is meant to solve. By far, the most effective way to do this is to make the developer have the problem you want solved. Need a better solution for authoring blog posts offline? Tell a developer that this is the only way they can do it and I guarantee you'll get a decent solution. Yes, it will likely need some visual tweaks and you'll need to make some changes to accommodate users who are NOT developers, but the essence of the problem will be solved very quickly.

This is obviously not a way to do things in the real world, but thinking about the problem leads us to some basic rules to follow when gathering and communicating requirements:

Step 1 - State the problem clearly

A common misstep, especially when the person assembling the requirements is technical, is to skip outlining exactly what the problem actually is. Folks will jump immediately to a solution and assume everybody understands why we need it. For example, let's suppose we are creating a system to enable users to write blog posts. For the author, one problem with online editors (like blogspot) is that it's not possible to author posts offline. One version of the requirement is to say something like "enable users to run a copy of the blogging platform on their local machine".

There are probably a thousand business analysts who looked at that statement curiously and thought "that's PERFECT, what could possibly be wrong with that as a requirement?". Well, there are a number of problems, but first and foremost, it makes a rather large assumption about the mechanism that should be used to do the offline editing. This limits the developer's ability to use the technological assets at their disposal to solve the problem.

The better way to communicate the requirement is to start with a statement of the problem: "Users cannot author blog posts when their computer is not connected to the internet". This initial statement gets the developer thinking about the REAL problem, and not imagining other possible reasons that someone might want to run a copy of the blogging platform locally. More importantly, it also enables the developer to start creatively thinking about the problem in terms of the end user instead of as merely a technical problem to be implemented. This has the added benefit of forcing (one would hope) the developer to start taking ownership of both the problem and its solution, and helps avoid a situation where one implements a really crappy solution to the problem "because that's what they asked for".

Step 2 - Clearly express non-functional requirements

Once the problem has been clearly articulated, the next step is to clearly communicate what other factors will influence the success of a solution. These factors might be "users should be able to author offline posts on their iPhone" and "users shouldn't need to download any additional software to use the solution". These requirements are what architects would refer to as "non-functional requirements" or NFRs. The problem with these statements is that, again, the developer is left to their own devices as to WHY these requirements even exist, which leads us to:

Step 3 - Clearly express the reason for the non-functional requirements

A super way to communicate these is to explain (from a business or user perspective) WHY it is important for these requirements to be met. The iPhone requirement might state something like "20% of our customers have iPhones and we believe we can capture this market and increase our market share by this amount if we have this capability". For the second NFR, it could be something like "90% of our users are non-technical and we believe they would not be willing to install additional software for this capability".

If you are a person responsible for communicating requirements to developers, starting with these three steps will guarantee that you're starting on a solid base and will enable you to supercharge your development team. Realistically, using these simple steps as the cornerstones of your requirements gathering process will yield immediate positive results. If your developers are complaining about "bad requirements" or you feel that your development staff is unable to produce software that delivers the desired results, take a look at what you're communicating to them and make sure these things are being accomplished.

In a subsequent post, I'll outline some more specific pitfalls when expressing requirements (especially in written form) and give some helpful tips on how to avoid them.

Friday, July 13, 2012

equals and hashCode for dummies (again)

In java, writing equals and hashCode methods is a perennial problem. Newbies and experts alike are confounded when things go haywire, and troubleshooting can be extremely difficult. A poor hashCode/equals implementation commonly causes intermittent problems and/or mysterious out of memory errors. To help clear things up, I'll give a 10,000-foot overview of how the core java libraries use the hashCode/equals methods.

Essentially, the hashcode/equals methods are used when storing things in Hashes (HashMap, Hashtable, HashSet). Things often go wrong when folks start using custom objects as keys for these classes. For example, let's say we have a person class that looks something like this:

            public class Person {
                public Person(String orgName) {
                    this.name = orgName;
                }
                private String name;
                public String getName() {
                    return name;
                }
                public void setName(String newName) {
                    this.name = newName;
                }
            }
        

This seems pretty straightforward, no surprises, can't get any simpler than this, right?

Now let's say we add some people to a HashSet that represents a club that people can belong to:

            HashSet<Person> myClub = new HashSet<Person>();
            Person jsmith = new Person("Joe Smith");
            Person sjones = new Person("Sarah Jones");
            myClub.add(jsmith);
            myClub.add(sjones);

        

The contract of a set says that it will guarantee uniqueness... to test this, we can try and add a duplicate Sarah Jones:

            Person sjones2 = new Person("Sarah Jones");
            assertTrue(myClub.add(sjones2));

        

This tells us that we can add two Person objects with the same name. In our strange use case, we don't want that; we want only 1 "Sarah Jones" person in the club. To accomplish this, we need to write an equals method in order to let the system know that our people are unique by name. So we override the equals method in our class by adding the following to the Person class:

            @Override
            public boolean equals(Object o) {
                // Only another Person can be equal to this one
                if (o == null || o.getClass() != this.getClass()) return false;
                Person other = (Person)o;
                return other.getName().equals(this.getName());
            }
        

Now, if we try to add our second Sarah Jones, it should fail, right? Well, it turns out it only "might" fail, but it also "might" work (sometimes), and this is where things get wonky (especially for newbies). Consider the following:

            HashSet<Person> myClub = new HashSet<Person>();
            Person sjones = new Person("Sarah Jones");
            Person sjones2 = new Person("Sarah Jones");
            assertEquals(sjones, sjones2);

            // Now for the guts
            myClub.add(sjones);
            assertTrue(myClub.contains(sjones));

            // Random failures after here
            assertFalse(myClub.add(sjones2));
            assertTrue(myClub.contains(sjones2));


        

The above code will randomly fail the third and fourth assertions for no apparent reason. Why? It has to do with how the Hash* implementations work in java. For a really nice, more in-depth write-up, go here, but I'll just give a quick overview.

The Hash* java implementations store things in buckets, typically with linked lists (or arrays) in each bucket. For example, a trivial implementation takes 10 buckets and, when storing something, takes the hashcode of the object, does a mod-10 operation on it, and then adds it to the linked list of that bucket. The problem with our implementation is that we didn't override the hashCode function, and the default implementation in java returns a per-object value that has nothing to do with the name (you can think of it as the object's memory location). So, for instance, our example above would work if the following happens:

  • a new HashSet is created
  • sjones gets created and stored at memory location 4
  • sjones2 gets created and stored at memory location 14
  • java compares sjones and sjones2, sees they are equal and returns true
  • java adds sjones to myClub by grabbing the mod-10 of the sjones hashcode (which would be 4) and putting it in the list on bucket 4 of myClub
  • java verifies that sjones is in myClub by grabbing the hashCode, seeing sjones should be in bucket 4, then walking down the list calling the equals method on each object. There's only one and sjones.getName() in fact is equal to sjones.getName() so it can be found
  • java then tries to add sjones2 by doing a mod-10 of the sjones2 hashcode, seeing it should ALSO go in bucket 4, then walking the list to see if sjones is already there. It sees sjones is already there (because sjones.getName() is equal to sjones2.getName()). Due to the nature of the HashSet contract, java will return false to indicate that sjones2 was NOT added because it was already there.
  • Next, java will again look up the hashcode of sjones2, mod-10, look in the bucket, and verify that sjones2 is, in fact, in the set... even though it previously said it was not added (which is totally correct and makes sense if you understand how things are supposed to work)
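
To make the walk-through above concrete, here's a minimal sketch of the trivial 10-bucket scheme. ToyHashSet and FakePerson are names I made up for illustration. FakePerson deliberately returns a hand-picked number from hashCode to stand in for the "memory location", and none of this is how java.util.HashSet is actually written.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for Person: equal by name, but with a hash code we
// pick by hand to play the role of the "memory location" in the walk-through.
class FakePerson {
    final String name;
    final int fakeAddress; // assumption: stands in for the default hashCode

    FakePerson(String name, int fakeAddress) {
        this.name = name;
        this.fakeAddress = fakeAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (o == null || o.getClass() != getClass()) return false;
        return ((FakePerson) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return fakeAddress; // intentionally NOT derived from name
    }
}

// Toy set with 10 buckets, each holding a linked list.
class ToyHashSet {
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ToyHashSet() {
        for (int i = 0; i < 10; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<Object> bucketFor(Object o) {
        // floorMod keeps the index in 0..9 even for negative hash codes
        return buckets.get(Math.floorMod(o.hashCode(), 10));
    }

    boolean contains(Object o) {
        // Only the one bucket picked by the hash code is ever searched
        for (Object existing : bucketFor(o)) {
            if (existing.equals(o)) return true;
        }
        return false;
    }

    boolean add(Object o) {
        if (contains(o)) return false; // an equal object is already in this bucket
        bucketFor(o).add(o);
        return true;
    }
}

class ToyHashSetDemo {
    public static void main(String[] args) {
        ToyHashSet myClub = new ToyHashSet();
        FakePerson sjones = new FakePerson("Sarah Jones", 4);
        FakePerson sjones2 = new FakePerson("Sarah Jones", 14);

        System.out.println(myClub.add(sjones));       // true: bucket 4 was empty
        System.out.println(myClub.add(sjones2));      // false: 14 mod 10 is also 4 and equals() matches
        System.out.println(myClub.contains(sjones2)); // true: found via sjones in bucket 4
    }
}
```

One caveat: the real java.util.HashMap (which backs HashSet) also compares the stored hash codes before it ever calls equals, so the toy model is a bit more forgiving than the real thing. With the real classes, two equal objects with different hash codes are treated as different entries even if they happen to share a bucket.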

If the above results are surprising, you'll want to take a break, but for those of you who are following along, now it gets more interesting. Let's run through the above scenario, but instead of sjones2 getting created at memory location 14, let's say it gets created at memory location 17. This would mean that NOW, sjones2 will get added to the set and you'll end up with duplicates. Worse yet, if you call it enough, you could end up with 10 copies of Person objects that are all equal in the Set. How will this happen?

Well, we see that in order to determine which "bucket" to look in for an Object, java will use the hashcode. If two Objects that are "equal" have different hashCodes, java will look in the wrong bucket and NOT find it. Because the default implementation uses memory locations as the hashCode, it's often the case that things will "sometimes maybe" work, but other times completely fail.
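
Here's a small sketch of that failure against the real java.util.HashSet. Since the default hash code can't be controlled from a test, AddressedPerson (a hypothetical class, not the Person from above) returns a hand-picked number from hashCode to play the role of the memory location, while equals still compares names. The helper clubWithCopies is likewise just something I made up for the demo.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of Person whose hash code is chosen by the caller.
class AddressedPerson {
    private final String name;
    private final int fakeAddress; // assumption: stands in for the default hashCode

    AddressedPerson(String name, int fakeAddress) {
        this.name = name;
        this.fakeAddress = fakeAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (o == null || o.getClass() != getClass()) return false;
        return ((AddressedPerson) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return fakeAddress; // breaks the rule: equal objects, different hash codes
    }
}

class MismatchedHashDemo {
    // Adds `copies` equal people, each with a different hash code.
    static Set<AddressedPerson> clubWithCopies(int copies) {
        Set<AddressedPerson> club = new HashSet<>();
        for (int address = 0; address < copies; address++) {
            club.add(new AddressedPerson("Sarah Jones", address));
        }
        return club;
    }

    public static void main(String[] args) {
        Set<AddressedPerson> myClub = new HashSet<>();
        AddressedPerson sjones = new AddressedPerson("Sarah Jones", 4);
        AddressedPerson sjones2 = new AddressedPerson("Sarah Jones", 17);

        myClub.add(sjones);
        System.out.println(sjones.equals(sjones2));    // true: same name
        System.out.println(myClub.contains(sjones2));  // false: the lookup uses a different hash code
        System.out.println(myClub.add(sjones2));       // true: the "duplicate" is accepted
        System.out.println(clubWithCopies(10).size()); // 10 equal objects in one set
    }
}
```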

In short, if you're having wonky behaviour with complex types in Sets or the keys of Maps, carefully verify that all things that can be "equal" will "always" have the same hashCode. Note, things with the same hashCode DON'T need to be equal, but that's a completely different discussion.
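
For completeness, here's roughly what a consistent equals/hashCode pair might look like. This is a sketch, not the only way to do it: NamedPerson is a made-up name, and I've made the name field final on the assumption that a key shouldn't change its hash code while it sits in a set. The last two lines illustrate the closing note with two strings that happen to share a hash code without being equal.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Person variant where equals and hashCode use the same field.
class NamedPerson {
    private final String name;

    NamedPerson(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (o == null || o.getClass() != getClass()) return false;
        return Objects.equals(name, ((NamedPerson) o).name);
    }

    @Override
    public int hashCode() {
        // Derived from the same field equals() compares, so equal objects
        // always produce the same hash code.
        return Objects.hashCode(name);
    }
}

class ConsistentHashDemo {
    public static void main(String[] args) {
        Set<NamedPerson> myClub = new HashSet<>();
        NamedPerson sjones = new NamedPerson("Sarah Jones");
        NamedPerson sjones2 = new NamedPerson("Sarah Jones");

        System.out.println(myClub.add(sjones));       // true
        System.out.println(myClub.add(sjones2));      // false: recognized as a duplicate
        System.out.println(myClub.contains(sjones2)); // true
        System.out.println(myClub.size());            // 1

        // Same hash code does not imply equality:
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
    }
}
```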