Friday, December 16, 2011

Java classes, objects, and instances demystified (hopefully)

A great many people are competent Java developers but have only a vague understanding of the difference between a "public static method" and a "public method", or between a class and an object. As this was confusing to me at first, I thought I would give a quick overview.

A class defines a template for what data and operations are available when you tell the JVM to create an object.

So, for example:

public class BlogPost {
    public String text = "";
    public static BlogPost latest;
    public static BlogPost create(String input) {
        latest = new BlogPost();
        latest.text = input;
        return latest;
    }
    public int getTextSize() {
       return text.length();
    }
}

When you compile this class, the compiler creates a file that the Java runtime can later use to load objects into memory. Each of those objects has a single attribute called "text", which is a reference to another object of class String. Additionally, the class itself has an attribute called "latest", which is a reference to an object of class BlogPost, and a method called "create" that accepts a reference to a String, creates a new BlogPost, stores it in the "latest" class variable, sets the text attribute on that object, and then returns the reference stored in "latest". Finally, there is an instance method called "getTextSize()" that returns the result of calling the instance method "length()" on the object's "text" instance variable.

So far, if you've done any amount of Java programming, this shouldn't be too shocking or eye-opening. However, there are some subtle and not-so-subtle nuances at play here. First and most commonly confused is that static methods cannot directly access instance variables; they can only get at them through an explicit reference to an instance.

Why?

Let's talk through this... When Java is running, class definitions are broken into two pieces: the data piece and the method piece. The data piece is independent for every new instance; the method piece is identical for every instance created and unique to the particular class. The methods you can call on a particular class are unique to the CLASS, not the instance of the class. In addition, static variables on the class are also unique to the CLASS, not the instance. So, for example, the method area for our "BlogPost" has two method references: one for "create", which expects to receive a String reference and will return a BlogPost reference, and one for "getTextSize", which expects to receive a reference to a BlogPost instance and will return the length, as an int, of the String that the instance's "text" field refers to.
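To make the per-class versus per-instance split concrete, here is a minimal sketch. BlogPost is the class from above (dropped to package-private so everything fits in one file), and StaticVsInstanceDemo is a made-up class name that exists only for this illustration.

```java
// BlogPost as defined earlier in this post, trimmed to fit in one file.
class BlogPost {
    public String text = "";          // one copy per instance
    public static BlogPost latest;    // one copy for the whole class

    public static BlogPost create(String input) {
        latest = new BlogPost();
        latest.text = input;
        return latest;
    }

    public int getTextSize() {
        return text.length();
    }
}

// Hypothetical demo class, not part of the original example.
class StaticVsInstanceDemo {
    public static void main(String[] args) {
        BlogPost first = BlogPost.create("hello");
        BlogPost second = BlogPost.create("goodbye!");

        // Each instance carries its own "text" data.
        System.out.println(first.getTextSize());   // prints 5
        System.out.println(second.getTextSize());  // prints 8

        // There is only one "latest", and it belongs to the class.
        System.out.println(BlogPost.latest == second); // prints true
        System.out.println(BlogPost.latest == first);  // prints false
    }
}
```

Creating the second post did not disturb the first post's text, but it did overwrite the single class-level "latest" reference.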

When we are in the create method, there is no BlogPost instance available to look at until we create one ourselves. Since "getTextSize" and the "text" instance variable both need a BlogPost object loaded into memory, there's no way to reach them without a reference to one, which is why create has to go through "latest" to set the text.

Put another way, when you define an instance method, even though you don't tell the JVM explicitly, it automatically knows that it MUST have a reference to an instance of its defining class (BlogPost) already loaded into memory in order to perform its operations.

This gets to the heart of what I think a lot of people don't get about Java (not ALL languages/VMs do it this way, but Java DOES). Another way to code the getTextSize() method above would be to do this:

    public static int getTextSize(BlogPost myPost) {
        return myPost.text.length();
    }

Some people would think that this method is more efficient because you don't "waste" memory having all kinds of copies of the function loaded into memory. The fact is, Java is not that naive: ALL methods are effectively singletons, and there will only ever be one copy of the method implementation in memory. When you define an instance method, you're simply telling Java that the method ALWAYS REQUIRES an instance of the class as an implicit parameter. In addition, the language has a nice way of expressing this where you don't have to explicitly pass the instance as a parameter. There really is no difference at runtime between the memory consumption of the two implementations.
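One way to see the "implicit parameter" idea for yourself is with method references, which need Java 8 or later (newer than this post). This is only a sketch: getTextSizeStatic is a name I made up so the static variant can sit next to the instance method in the same class, and ImplicitThisDemo is likewise hypothetical.

```java
import java.util.function.ToIntFunction;

class BlogPost {
    public String text = "";

    // Instance version: the BlogPost arrives implicitly as "this".
    public int getTextSize() {
        return this.text.length();
    }

    // Static version (hypothetical name): the BlogPost is passed explicitly.
    public static int getTextSizeStatic(BlogPost myPost) {
        // Writing "return text.length();" here would not compile,
        // because a static method has no "this" to read "text" from.
        return myPost.text.length();
    }
}

// Hypothetical demo class, not part of the original example.
class ImplicitThisDemo {
    public static void main(String[] args) {
        BlogPost post = new BlogPost();
        post.text = "demystified";

        System.out.println(post.getTextSize());               // prints 11
        System.out.println(BlogPost.getTextSizeStatic(post)); // prints 11

        // Both methods fit the same "takes a BlogPost, returns an int" shape.
        ToIntFunction<BlogPost> viaInstance = BlogPost::getTextSize;
        ToIntFunction<BlogPost> viaStatic = BlogPost::getTextSizeStatic;
        System.out.println(viaInstance.applyAsInt(post));     // prints 11
        System.out.println(viaStatic.applyAsInt(post));       // prints 11
    }
}
```

The fact that both references can be stored in the same function type is a decent hint that, from the caller's side, the instance method really is just a function that takes a BlogPost.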

For a little more discussion, take a look at Stack Overflow. Hopefully this will help clarify things for some folks.

Friday, December 9, 2011

The Java collections framework for newbies

I don't consider myself a Java expert by any measure, but there's a disturbing thing I've noticed. There are a LOT of people who claim to be "Java developers", but they have zero clue what the "Java collections framework" is. This post is designed for folks who keep getting stumped on interview questions or are mystified when someone starts talking about the difference between a Set and a List (for example).
If you google "java collections framework for dummies" you'll find this link which has a more complete, if fairly dense, explanation. I'm going to do you one better and give a rule of thumb that you can use without thinking about it.
At the root of things, a collection is something you can store other things inside. Just like in real life, a collection of marbles is just a "bunch" of marbles. The big difference in the collections framework is that the different implementations DO different things with the marbles, and you need to understand what those things are.
For example, let's consider the ArrayList... everybody and their brother should know this... if not, you are not a Java developer, go read a book. Some special things about an ArrayList: it stores the entries in the order they were added, you can select an element by its index, and it can contain duplicates of the same element. From a performance perspective, it is VERY fast to look things up by index and, on average, to add things to the end of an ArrayList; it is slow to see if a particular object is present because you must iterate over the elements of the list to see if it's there.
Next, let's talk about HashSet... I realize that this might sound vaguely drug related to the uninitiated, but a HashSet has some interestingly different characteristics from a list. First off, a HashSet has no concept of order or index: you can add things to it and you can iterate over it, but you cannot look things up by index, nor are there any guarantees about what order things will be presented to you when you loop over its members. Another interesting characteristic is that it cannot contain duplicates. If you try to add the same object twice, it will NOT fail; the second add will just return false and you can happily move on.
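Here is a quick sketch of that "returns false and moves on" behavior, sticking with the marbles. HashSetAddDemo is just a name I picked for the illustration, and the diamond syntax assumes Java 7 or later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class for illustration only.
class HashSetAddDemo {
    public static void main(String[] args) {
        Set<String> marbles = new HashSet<>();

        boolean firstAdd = marbles.add("red");   // set changed
        boolean secondAdd = marbles.add("red");  // duplicate, set unchanged

        System.out.println(firstAdd);        // prints true
        System.out.println(secondAdd);       // prints false
        System.out.println(marbles.size());  // prints 1
    }
}
```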
Last but not least, there is the Hashtable (or his slightly more dangerous cousin, the HashMap). This is used to store key/value pairs. Instead of keying things by an index (like an ArrayList), you can key them by just about anything you want. You can do things like myMap.put("foo","bar") and then myMap.get("foo") will return "bar"...
There is a LOT more to this, but with this quick reference you can at least begin to do useful things in Java.
Examples of using a List
ArrayList<String> myList = new ArrayList<String>();
myList.add("Second Thing");
myList.add("Second Thing");
myList.add("First Thing");

System.out.println(myList.get(0));
will output
Second Thing
An interesting thing to note is that the size of this is 3
System.out.println(myList.size());
will output
3
The following:
for (String thing: myList) {
  System.out.println(thing);
}
will always output:
Second Thing
Second Thing
First Thing
Next let's look at a set:
HashSet<String> mySet = new HashSet<String>();
mySet.add("Second Thing");
mySet.add("Second Thing");
mySet.add("First Thing");
The first difference we can see is that
System.out.println(mySet.size());
returns
2
Which makes complete sense if you understand that sets cannot contain duplicates (and you understand how the equals method of String works... ;). Another interesting thing is that the following:
for (String thing: mySet) {
  System.out.println(thing);
}
might output:
Second Thing
First Thing
or it might output:
First Thing
Second Thing
It so happens that it returns the second version on my machine, but it's really JVM/runtime specific (it depends on how the HashSet is implemented and how hashCode is implemented and a bunch of other variables I don't even fully understand).
More importantly, the following will likely be much faster for LARGE collections than the equivalent call on a list:
System.out.println(mySet.contains("Third Thing"));
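If you want to poke at that claim yourself, something like the following rough sketch will do. ContainsDemo and makeThings are names I invented for the illustration, the collection size is arbitrary, and a single nanoTime measurement is a very crude benchmark, so treat the printed timings as a hint rather than proof.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class for illustration only.
class ContainsDemo {
    // Hypothetical helper: builds "Thing 0" through "Thing (count - 1)".
    static List<String> makeThings(int count) {
        List<String> things = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            things.add("Thing " + i);
        }
        return things;
    }

    public static void main(String[] args) {
        List<String> bigList = makeThings(200_000);
        Set<String> bigSet = new HashSet<>(bigList);

        // The list has to walk its elements one by one.
        long start = System.nanoTime();
        boolean inList = bigList.contains("Third Thing");
        long listNanos = System.nanoTime() - start;

        // The set jumps to a bucket based on the hash code.
        start = System.nanoTime();
        boolean inSet = bigSet.contains("Third Thing");
        long setNanos = System.nanoTime() - start;

        // Both answers are false; only the time taken should differ.
        System.out.println("list: " + inList + " in " + listNanos + " ns");
        System.out.println("set:  " + inSet + " in " + setNanos + " ns");
    }
}
```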
Finally, the granddaddy of the entire framework, Hashtable.
 Hashtable<String, String> myMap = new Hashtable<String, String>();
 myMap.put("a", "Second Thing");
 myMap.put("b", "Second Thing");
 myMap.put("c", "First Thing");

 System.out.println(myMap.get("a"));

Will output
Second Thing
and the following:
for (Map.Entry<String, String> entry: myMap.entrySet()) {
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
will output something like this (as with the set, the order is not guaranteed):
b=Second Thing
a=Second Thing
c=First Thing
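Since I mentioned HashMap as Hashtable's cousin, here is a small sketch of one difference between them that I'm sure of: Hashtable rejects null keys while HashMap accepts them. (The other well-known difference is that Hashtable's methods are synchronized and HashMap's are not.) MapCousinsDemo and acceptsNullKey are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class for illustration only.
class MapCousinsDemo {
    // Hypothetical helper: reports whether the map tolerated a null key.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "no key");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new Hashtable<>();
        Map<String, String> map = new HashMap<>();

        // For everyday use, the two behave the same way.
        table.put("foo", "bar");
        map.put("foo", "bar");
        System.out.println(table.get("foo")); // prints bar
        System.out.println(map.get("foo"));   // prints bar

        System.out.println(acceptsNullKey(table)); // prints false
        System.out.println(acceptsNullKey(map));   // prints true
    }
}
```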
Hopefully with these examples, you can get an idea of the capabilities of the collections framework. There is much, much more to it, and I encourage ANYONE doing Java development to spend time playing around and learning the different characteristics of the various components, as I've only lightly skimmed the surface.

Tuesday, December 6, 2011

Is your team a cross country team or a soccer team?

While touring a college campus with my daughter, one of her prospective cross country teammates said something that gave me pause. In effect, her statement was that she really liked cross country because everybody on the team was always pushing for you to do your best. Also, she continued, it's nice to know that you either succeeded or failed because of your own effort and training, not because of anyone else. I've thought about this for quite some time and I realize there is a VERY big distinction between "individual" sports like wrestling, swimming, or cross country... and "team" sports like soccer, football, or basketball. These differences are important not just on the playing field, but in any situation that requires teamwork.

On "team" sports, you very often will have competition within the team that actually works against the team's objective. Additionally, individual team members may have to forgo performing at the best of their ability because of a particular game situation. For example, there are many good reasons for a soccer player to NOT dribble the ball when they have possession: They could be covered by a good defender, they might not actually be a very good dribbler, or they might have a teammate in a much better position to do something productive. This means at any given moment, they need to not only take into account their own situation, but the situation of 21 other people plus a ball and a referee or three.

With an individual sport, it is the performance of an individual that is paramount. In cross country, I suppose there are situations where it might be good to lose a few places individually in order to help pace a teammate, but there is not really as complex an interaction with the other players on the field. Put another way, the SPORT of cross country doesn't require as much social intelligence as something like soccer... it is much more purely sport for sport's sake and a measure of an individual's ability to perform.

At the end of the day, both styles of team are important and beneficial, but from my observation, there are some interesting implications. First, on a "team" sport, there are often social conflicts due to the complex interplay of individuals and game situations. On an "individual" sport, I think these conflicts are less common (or severe). It seems like this is because even if two cross country teammates are in fierce competition with each other... they're only helping the team and each other out and making both stronger. In contrast, if two soccer teammates are in fierce competition with each other, nothing good will happen and it will likely destroy the team.

When working with teams in general, it's important to understand if the situation is an "individual" situation or a "team" situation. If you're trying to motivate a team and it's more important that each individual do their best and the individual's contribution ONLY has positive effects, foster and reward the individual regardless of the performance of everyone else. If, on the other hand, your team has more complex interactions, it becomes more important to let players know when they might need to behave in a more altruistic manner for the good of the team.