Friday, November 22, 2013

Web application framework popularity over time

I thought I'd do a quick comparison of web frameworks to see how popularity is trending nowadays per Google Trends. I realize this has some serious drawbacks, but as a person who has been in the J2EE space for years, I find the trend very interesting.

The trend is clear: if you're an ace J2EE super guru, you are in about the same position as a yii developer. While your experience might be transferable, the days of java containers ruling the universe appear to be numbered. For some other interesting comparisons, let's look at java specifically.

This roughly supports my observation that Struts is dying off, spring is holding steady, and gwt, while making a big splash a few years ago, seems to be tapering off. These numbers are a little deceptive because back in 2005 a lot of the "up and coming" frameworks, languages, and tools didn't even exist. So let's take a look at just the upstarts.

I've included rails because it's relatively new and didn't suffer such a large downturn and slow slide into oblivion as struts, even though they started in a relatively similar timeframe.

For an up to date view of current programming language trends (not just frameworks), take a look at the Tiobe Index. It's interesting that C has been the top language for 40 years and shows no sign of faltering, whereas something like COBOL peaked 20 years ago and has long been in decline.

Use this information as you will, but some comments I will make:

  • Popularity is important, frameworks and languages need a vibrant community
  • Search trends can lie... something with high search requests could just suck and therefore more people need to look things up
  • The days of the software monoglot being in high demand (think J2EE developers in 2004) are rapidly receding. On one hand, the market is growing so the aggregate job demand might be the same. But from a solution perspective, there are many many more options now than there were even 10 years ago.

Running off the jetway or how to make decisions under pressure

The hazards of the unknown unknowns or how to avoid Running off the jetway

This scene illustrates an all too common problem in any field, and is one that I've encountered over and over again. We often entrust major decisions to folks who don't have enough information or are working with major assumptions about the situation that are incorrect. In this example, Jim Carrey's character is running furiously to get a briefcase back to its owner, and when stopped (because the airplane has left), he assumes that his status as a limo driver entitles him to board the plane even when no one else is allowed. The critical piece of information he was missing was that it wasn't simply his lack of status that was the reason he couldn't board the plane, but the fact that the plane had already left.

This is a lot like many computer 'experts' who claim that because of their status, they can ignore warnings or requests from others because 'They know better' about this sort of thing. They have the opinion that they are knowledgeable enough to make broad assumptions about 'how things are supposed to be', but often are missing important and dangerous details. These blind spots were made famous by Donald Rumsfeld as unknown unknowns and cause many problems that we end up spending a lot of time extracting ourselves from.

To avoid the pain of running off the jetway, there are two important things to remember:

  1. Don't Panic
  2. Be a good listener

Don't Panic: because once your brain goes into fight or flight mode, you can make bad decisions. This means you will revert to "Type 1" or automatic thinking. Your decisions will be short sighted and even irrational because your brain is busy trying to save you from whatever the emergency situation happens to be.

Be a good listener: since often the details of the situation can illuminate important facts that make things much less urgent than they might seem on the surface. It takes careful and active listening skill to tease these details out, and they will be buried if everyone is busy broadcasting their opinion. Calm yourself, listen to what is going on, and most importantly, start to detect what you aren't actively looking for.

Wednesday, November 20, 2013

Running and software development

I have a long love/hate relationship with running and I think that it's a great metaphor to help explain the subtle differences between agile practices and traditional development. My kids have been bitten by the running bug and they devote a lot of time to cross country and track. More importantly, they have trained for running longer distances and therefore better understand the importance of preparation, pace, and form.

As a person who spends a lot of time playing soccer, the idea of NOT running as fast as possible still requires a LOT of mental energy. On the rare occasion that I still get out running longer distances with my kids, they routinely tell me to slow down for the first mile... then they scratch their heads because I burned myself out blazing through the first mile in 7-8 minutes.

In software development, agile practices are the equivalent of the type of running done in soccer... you are actively changing direction, reacting to things on the field and using strong, explosive, but short bursts of energy. Scrum actually borrows metaphors directly from rugby to help explain the activities and practices (Rugby being a form of European football closely related to Association football from which Soccer derives its name).

More traditional linear development processes and practices are more like running distances, where pacing yourself and conserving enough reserve energy to make the entire distance is more important than getting 40 yards ahead of the competition as fast as possible. Iterative or Agile processes are more like Football (soccer, rugby, American, Australian rules) where explosive movement and short timings are more important than endurance and predictable splits.

There is a place for both approaches, but too often folks want to try and do BOTH at the same time and it just doesn't work. It's a bit like trying to sprint a marathon while chasing a ball... it's just not going to work.

Monday, November 18, 2013

Using Olark online chat component

So I thought I'd investigate SaaS online chat components, consulted the almighty google, and selected my first contender. boldchat seemed like a good option until I discovered it seemed to need an exe to function. As I'm a mac/linux snob, this immediately removed it from the running. My next contender, selected from a post on Stack Overflow, was olark. I'll have to say that from the get-go, this seems like a very fully featured and easy to use tool. I was up and running within 5 minutes. After toying with this for a bit, I'm really quite impressed. If you need a quick and easy online chat on your web property, olark is a great way to get moving quickly.

Sunday, November 17, 2013

Fun with character encoding and when to use ISO8859 instead of UTF-8

What character encoding are you using? Most folks nowadays settle on UTF-8 for web centric type applications, but things can get squirrelly if you use this encoding and start working with non-unicode systems. Recently, we had a situation where we took the string representation of data that started out as an array of 17 0xff bytes. In a unicode aware system, using UTF-8, this will translate into a character sequence of 17 0xfffd values.

What happened? How did an array of 8 bit values get magically translated into an array of 16 bit values?

It requires a bit of digging, but the short version is that if your source system is using 8 bit characters (something like iso 8859-n) and you translate to unicode, you will fail on certain byte values because they are invalid in UTF-8 and other character encodings. The only thing the decoder can then do is substitute the "replacement character", which is 0xfffd. For reference, in current UTF-8 (RFC 3629) the byte values 192, 193, and everything from 245 through 255 can never appear and will be translated into 0xfffd.

So what's the solution? If you MUST do this because you are interacting with an embedded device or something else that is not unicode aware, the simplest would be to use an ISO 8859 charset to support byte values of 0x80 (128) and above. This can be quite a challenge if you actually need to get the correct glyph because there are a number of charsets in this space. Note, in Java at least, all characters are 16 bit values, so there is often some magical transformation necessary to switch between bytes and chars.
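As a rough sketch of that "magical transformation" in Java: a byte is signed, so a plain cast to char sign-extends 0xff into 0xffff, while masking with 0xff (or decoding through ISO-8859-1) keeps the value at 255. The class and method names here are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ByteCharRoundTrip {
    // Mask before widening so the signed byte -1 becomes the char 0x00ff, not 0xffff.
    static char toChar(byte b) {
        return (char) (b & 0xff);
    }

    // ISO-8859-1 maps every byte value 0..255 to the code point with the same number,
    // so decoding and re-encoding returns the original bytes unchanged.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xff;
        System.out.println((int) (char) b);   // 65535: the plain cast sign-extends
        System.out.println((int) toChar(b));  // 255: the masked cast keeps the byte value
    }
}
```

The same round trip through UTF-8 would not return the original bytes, because each invalid 0xff byte has already been replaced during decoding.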

Examples for folks who'd like to see the problem in code:

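// 17 values of 0xff, as they arrive from the non-unicode system
char[] ba = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
             0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
             0xff, 0xff, 0xff, 0xff, 0xff};

// narrow each value to a raw byte
byte[] ba2 = new byte[ba.length];
for (int i = 0; i < ba.length; i++) {
    ba2[i] = (byte) ba[i];
}

try {
    String ascii = new String(ba2, "US-ASCII");
    String iso8859 = new String(ba2, "ISO-8859-1");
    String utf8 = new String(ba2, "UTF-8");

    // this will be 65533 (0xfffd): 0xff is not a valid US-ASCII byte
    char asciichar = ascii.charAt(0);
    // this will be 255 (0xff): ISO-8859-1 keeps the byte value
    char isochar = iso8859.charAt(0);
    // this will also be 65533 (0xfffd): 0xff never appears in valid UTF-8
    char utfchar = utf8.charAt(0);
} catch (java.io.UnsupportedEncodingException e) {
    // should not happen: every JVM is required to support these three charsets
    throw new AssertionError(e);
}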

Note, there are still challenges as a char is 16 bits and you'll need to be careful when traversing systems to let everyone know your output is in an ISO-8859 character set. For example, if you generate an xml file and set the encoding to UTF-8, the file will contain �����������������, but if you use ISO-8859 it will contain ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ.

A more entertaining aspect of this is if you subsequently try to insert this string value into a database that is unicode aware and assume your characters will fit into 17 bytes. It will overflow the width, because the ����������������� string of 17 chars is actually 34 bytes in UTF-16 (and 51 bytes in UTF-8), whereas the ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ string can be represented in only 17 bytes as long as it stays in ISO-8859-1.
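To put numbers on the overflow, here is a small sketch that measures how many bytes each of the two 17 character strings needs in a few encodings. The class and method names are hypothetical, and which encoding a given database column actually uses is an assumption you would need to check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodedWidths {
    // Number of bytes needed to store the string in the given charset.
    static int width(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[17];
        Arrays.fill(raw, (byte) 0xff);

        // Decoding as UTF-8 turns each invalid byte into one U+FFFD character.
        String replaced = new String(raw, StandardCharsets.UTF_8);
        // Decoding as ISO-8859-1 turns each byte into one U+00FF character.
        String latin = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(width(replaced, StandardCharsets.UTF_16BE));   // 34
        System.out.println(width(replaced, StandardCharsets.UTF_8));      // 51
        System.out.println(width(latin, StandardCharsets.ISO_8859_1));    // 17
        System.out.println(width(latin, StandardCharsets.UTF_8));         // 34
    }
}
```

UTF_16BE is used rather than UTF_16 so that the byte order mark does not inflate the count.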

In short, the space of ascii between 128 and 255 is treacherous territory and using UTF-8 only helps if everybody uses UTF-8, otherwise transliterating between the code pages can be quite an adventure.

Friday, September 20, 2013

MQTT over websockets with javascript apache and active mq

Reading up a bit about MQTT, I decided to set up a test bed to see how it works and if it lives up to its potential. The use case was simple: I wanted to build a multi-user chat system that would use MQTT over websockets connected directly to an apache activemq server.

First off, I fired up an ec2 instance with ubuntu-13.04 and tried to apt-get activemq. It turns out, however, to use mqtt over websockets, you need version 5.9 of activemq. So I pulled a snapshot version of activemq. Enabling mqtt over websockets is a snap, you simply add the following configuration to your activemq.xml

<transportConnector name="mqtt+ws" uri="ws://0.0.0.0:1884?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>

Next, download the eclipse paho mqtt client. And create a web page to send traffic back and forth.

One last thing I did was install apache on the same server as the activemq instance. I did this to avoid any potential problems with same origin policies.

At this point I create a page that uses the mqtt client and I can publish and subscribe to messages that will get pushed real time to the browser. My example follows (note, the aws instance in the connect string is no longer valid, you need to use your own server :)

<html>
    <head>
        <title>MQTT over websockets</title>
        <style type="text/css">
            #status {
                padding: 5px;
                display: inline-block;
            }
            label, .label {
                display: inline-block;
                width: 100px;
            }
            li {
                list-style: none;

                background: #fff;
                margin: 2px;
            }
            ul {
                background: #eef;
            }
            .disconnected {
                background-color: #f88;
            }
            input {
                width: 400px;
            }
            .connected {
                background-color: #8f8;
            }

            #messagelist {
                width: 600px;
                height: 200px;
                overflow-y: scroll;
            }
        </style>
    </head>
    <body>
        <h1>Super simple chat</h1>
        <span class='label'>Status</span> <div id="status" class="disconnected">Pending</div>
        <form id='mainform' action="#">
            <label for="name">Name</label>
            <input id="name" name="name" type="text" width="40" value="anonymous"/> <br/>
            <label for="message">Message</label>
            <input id="message" name="message" type="text" width="200"/>
            <input id="submit" type="submit" value="go"/>
        </form>

        <div id="messages"><ul id="messagelist">

        </ul></div>

    <script src="./mqttws31.js"></script>
    <script src="./jquery-1.10.2.js"></script>
    <script>
    $(document).ready(function() {
        function doSubscribe() {

        }

        $('#mainform').submit(function(){
            var messageinput = $('#message');
            var message = new Messaging.Message(messageinput.val());
            message.destinationName = "/can/"+$('#name').val();
            messageinput.val('');
            messageinput.focus();
            client.send(message);
            return false;
        });


        function doDisconnect() {
            client.disconnect();
        }

        // Web Messaging API callbacks
        var onSuccess = function(value) {
            $('#status').toggleClass('connected',true);
            $('#status').text('Success');
        }

        var onConnect = function(frame) {
            $('#status').toggleClass('connected',true);
            $('#status').text('Connected');
            client.subscribe("/can/#");
            //var form = document.getElementById("example");
            //form.connected.checked= true;
        }
        var onFailure = function(error) {
            $('#status').toggleClass('connected',false);
            $('#status').text("Failure");
        }

        function onConnectionLost(responseObject) {
            //var form = document.getElementById("example");
            //form.connected.checked= false;
            //if (responseObject.errorCode !== 0)
            alert(client.clientId+"\n"+responseObject.errorCode);
        }

        function onMessageArrived(message) {
            // build the <li> with text() so names and messages from other users are not parsed as HTML
            $('<li>').text(message.destinationName + '->' + message.payloadString).prependTo('#messagelist');
            //var form = document.getElementById("example");
            //form.receiveMsg.value = message.payloadString;
        }

        var client;
        var r = Math.round(Math.random()*Math.pow(10,5));
        var d = new Date().getTime();
        var cid = r.toString() + "-" + d.toString();

        client = new Messaging.Client("ec2-54-242-185-231.compute-1.amazonaws.com", 1884, cid );
        client.onConnect = onConnect;
        client.onMessageArrived = onMessageArrived;
        client.onConnectionLost = onConnectionLost;
        client.connect({onSuccess: onConnect, onFailure: onFailure});

    });

    </script>
    </body>
</html>
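
The page above subscribes to the filter "/can/#" and publishes each message to "/can/" plus the sender's name, which is why every browser sees every message. As a rough illustration of how that wildcard behaves, here is a simplified sketch of MQTT topic filter matching. The class and method names are made up, and it ignores details of the real specification such as the special treatment of topics that start with "$".

```java
class TopicFilter {
    // Returns true when the topic matches the filter.
    // '+' matches exactly one level, '#' matches the rest of the topic
    // (including zero further levels) and is only valid as the last level.
    static boolean matches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        for (int i = 0; i < f.length; i++) {
            if (f[i].equals("#")) {
                return i == f.length - 1;
            }
            if (i >= t.length) {
                return false;
            }
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false;
            }
        }
        return f.length == t.length;
    }

    public static void main(String[] args) {
        System.out.println(matches("/can/#", "/can/anonymous"));  // true
        System.out.println(matches("/can/#", "/other/anonymous")); // false
    }
}
```

One consequence for the chat page: a name containing "/" simply adds more topic levels, and "/can/#" will still match it.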

Friday, August 23, 2013

An unflattering commentary on Rackspace cloud server security

I recently needed a new server instance for some testing. Normally I would go back to AWS as I've had problems with Rackspace in the past. Being open minded and assuming things have changed in the last couple years I thought I'd go back and try out Rackspace cloud for my testing (for reasons I will not name here).

My first and most shocking revelation is that they have NOT fixed a key security problem. I'm going to outline this right now and hopefully somebody can fix it.

Problem #1: Login as root via ssh

Guys...guys...guys(or gals)... It is baffling to me that you still allow this. Yes I get that you have a wonderful "Blacklist my server ip when something goes wrong" and "then disable access to my console to fix" routine going on to protect your network if MY machine gets compromised due to your silly lackadaisical security. Wait, that's actually a negative thing too :) please stop, I'm not going to use you as a provider until you fix this. In the interest of fairness I'll say, you DO generate a nice, secure, random looking password... but that isn't really good enough in my book. At a minimum, generate a random password for a random userid (or hell, even let me name a user), disable remote root access, and I MIGHT consider using your service, except for the next problem.

Problem #2: No firewall protecting the machine by default

So let's ignore the root access problem... well, ok we won't... Now we have an aggravating problem... BEFORE I even have an opportunity to do ANY hardening of the server, it's spun up and connected to the internet listening on ssh. While I get that in your book this isn't probably the end of the world, I'm quite "not thrilled" by this. I suppose this problem is mitigated by the fact that I need to install all my services manually, but I'm still not happy. Why wouldn't I get access to firewall rules (Like I do in AWS) to limit the attack profile on my server (like to only allow ssh from my network)?

Rackspace, come on guys, I just can't believe you're still doing this. It's been a couple years now, you should have learned by now! I can't imagine this is an expensive proposition; hell, problem #1 was already fixed by the ubuntu team by default, you actually had to do work to defeat their efforts.

If your philosophical stance is that "This is an acceptable risk for my customers" well then, good luck to you, glad you made that decision for me, I'll be moving on to other providers that care about my business.

Tuesday, June 11, 2013

Java's problem is that Jidigava idigis gididibidigeridigish

I posted a while back about how ruby's syntax is better for designing software than java because it removes extra language cruft and enables developers to write more succinct and direct code. A common response to this from detractors is that "it's only three extra characters" and "my IDE can automatically generate that code for me". Sitting at the train station today, I read a comment from someone that said something similar to "geeze, if you've got thousands of lines of code, why do you care about a couple of letters and a parenthesis or curly brace here and there?" I started thinking about why I care and discovered the reason: with those little three letters here and there, your code literally becomes a type of Gibberish.

In gibberish, you use simple rules to add extra characters here and there (sounds familiar) to create a quite confusing language that is a 1 for 1 direct translation to/from english. While gibberish, pig latin, and other language games are entertaining pastimes to kill summer afternoons and baffle outsiders when you converse using them, MOST folks wouldn't subscribe to The New York Times translated into gibberish. More importantly, almost nobody would agree that writing a blog post in gibberish is worth the effort. So java programmers, instead of arguing about how "it doesn't matter", lower your defenses and look around at alternatives available to you (there are a lot).

As a graphic illustration (and a bit extreme, I admit), the first sentence of this blog post translated to my best attempt at idig/adig gibberish:

Idigi pidigostiged idiga whidigile badagack adigabagout hadigow ridiguby's sidigyntadigax idigis bidigettidigetter fidigor didigesidigignigiging sadigoftwidigare thatigan jadigavadaga bidicadigause idigit ridigemidigoves idigextridiga ladigangidiguage crididguft adigand idigenadigables didigevidigevladigopidigigers tidigo wridigite madigore sadiguccidiginct adigand didigiridigect cadigode.

I'm pretty sure the "plain english" version is probably better to communicate an idea if the intent is to communicate clearly. The only thing I did to render that above sentence was add "idig" or "adig" after the initial consonant of each syllable and put it at the front of the vowel for syllables that start with a vowel. This is very simple and we could probably create a word plug-in to make it super easy to translate normal english to gibberish with the click of a button.
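To show how mechanical the translation is, here is a minimal sketch in Java. It is cruder than my hand translation above: it assumes every run of vowels is its own syllable and always inserts "idig" in front of it, so silent vowels get the treatment too. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class Gibberish {
    // A run of one or more vowels is treated as the nucleus of a syllable.
    private static final Pattern VOWEL_RUN = Pattern.compile("[aeiouAEIOU]+");

    // Insert "idig" in front of every vowel run; everything else is left alone.
    static String translate(String english) {
        return VOWEL_RUN.matcher(english).replaceAll("idig$0");
    }

    public static void main(String[] args) {
        // prints "jidigavidiga idigis gidigibbidigeridigish"
        System.out.println(translate("java is gibberish"));
    }
}
```

One line of real logic, and the output is already much harder to read than the input.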

That having been said, if there was a subculture that wrote and spoke exclusively in gibberish, wouldn't normal english speakers/writers also question their choice of languages? After all, it's adding extra syllables and letters that don't provide any direct value. In fact, the whole value of gibberish (other than entertainment) is that it's MORE difficult to understand, so why would we ever argue that this is a good thing or "Not a problem (tm)"?

Thursday, June 6, 2013

Software is design, how ruby is better for that job than java

As a long time java developer and ... well ... at this point also a long time ruby developer, I've found some things about the ruby language are much less bothersome than in java. Groovy takes an interesting middle road, but only gets about halfway in my mind. I'll leave the runtime differences and the dynamic/static compiled/interpreted debates for other forums and just focus on this one irksome quirk.

Property Accessors are too verbose

Java Definition
class Car {
    private Color color;
    public Color getColor() {
        return color;
    }
    public void setColor(Color color) {
        this.color = color;
    }
}
and to use it:
Car car = new Car();
Color blue = car.getColor();
car.setColor(blue);

The whole getter setter thing is a pain to me. The bean pattern used by java is just overly verbose. For all the OO purists, I get it, we need to hide the private variables and put them behind methods to abstract away the inner structure, but I'm so weary of 3x lines of generated code to accomplish this.

Groovy (1/2 way there)
class Car {
    Color color;
}
or (if you need to override a behavior):
class Car {
    Color color;
    public void setColor(Color color) {
        println color
        this.color = color
    }
}
and to use the class (either approach) the syntax is much better:
car = new Car();
Color blue = car.color
car.color = blue

In the first groovy example above, the actual class implementation is identical to the java example. It is much cleaner and obviously easier to read because it's missing all the cognitive overhead of explicitly defined methods, parentheses, and other java doodads.

In the second example, we've added some code to print the color to the console in the setter and still written fewer lines of code. This both reduces our typing/typo load, plus reduces our "100 lines of getters and setters" overhead. So, good marks to Groovy for striking a middle ground and allowing developers to do the easy stuff the easy way and only imposing the java crazy syntax tax :) when it's necessary.

Ruby
class Car
    attr_accessor :color
end
and to override the set color method:
class Car
    attr_accessor :color

    def color= color
        puts color
        @color = color
    end
end
and to use it
car = Car.new
blue = car.color
car.color= blue

While perhaps a little implicit (one might think that there weren't any methods at all), it looks like you're just setting and getting values. The advantage of the ruby approach is that when you change the underlying implementation, you can still use the simple syntax, much like looking at java properties, but when you NEED the complexity of hiding things behind a method, you can have it, and you still have method declarations that are symmetric with how you call them.

One thing I hear over and over in this regard from die hard java folks: "But mike I can just use a tool that will automatically generate all that boilerplate and even automatically refactor all the names, so this isn't an issue". I disagree. What they're really saying is that there is a workaround they feel "isn't a big deal" to minimize the impact of this wonkiness, but I feel this is a broken window that is a foundational problem with java based solutions.

For detractors, let me start by stating that it is my opinion that all programming is design. If you disagree, then we probably won't be able to come to an agreement on the following statement, which is: "The most important part of a programming language is its ability to convey ideas clearly, succinctly, and unambiguously to other developers". The problem with java (and many other languages) is that it isn't designed to convey ideas to developers; its design instead seems more intent on conveying instructions to the computer than on conveying ideas to other developers. The ruby approach, in this regard, is much better.

Wednesday, May 29, 2013

Fixing Perverse Incentives in Software Development

I read with interest an article about picking the right metric to incentivize desired behavior and thought I would add a little of my own personal insight. One problem with many (maybe most) software development organizations is that they inadvertently create perverse incentives, rewarding undesirable behavior and creating confusing and chaotic environments that, despite best efforts of all involved, seem to produce the desired result only on a hit or miss basis. Just as important, often the rewards are implicit and it isn't obvious that developers are actually being rewarded for the errant behavior.

Some short examples of widely used, but poor metrics I've observed as well as some simple and arguably better alternatives follow. For example:

  • Rewarding developers for "count of bugs fixed". Without accountability for who created the bug, this simply incents developers to release buggy half finished software that they will then subsequently be rewarded for "fixing". It's probably better to reward developers for "bug-free features delivered" or "lowest bug count".
  • Rewarding developers for "hours worked". Do we really want to incent someone to take 20 hours to do something they could actually accomplish in 20 minutes? It would be better to measure features delivered per unit time than simply the raw number of hours in chair.
  • Rewarding folks for being the "expert" in something. Inevitably this leads to situations where folks WANT to be the sole expert in something... hiding or obfuscating things or creating proprietary boondoggles that no outsider ever has a hope to figure out. Better to reward someone for creating something that is transparent and usable by all ... or better yet reward someone for teaching others how to do something than be the "expert".

Some examples of inadvertent rewards that crop up over and over again and reinforce the negative behavior:

  • We're going to pay you to fix your own mess. Yep, consultants love this one.. pay to build a mess AND fix it, you'll be gainfully employed for life! Maybe it would be better if we let you fix your bugs for free! on your own time! I wonder what would happen to the bug count in this case?
  • Look at the superhero. Wow! load balancer crashed and we had to call you on vacation because you're the only one who knows the root password/knows where the server is. What an ego boost! Perhaps instead we should celebrate the sysadmin who has NEVER had a server outage instead of the one who can't take a vacation because they're the only one who knows how to reboot the load balancer.
  • You're fired! This is a tricky one, and actually not a reward, but if there is absolutely no reward in pointing out problems and ONLY risk, who's going to stand up and risk the axe for pointing out an honest error? It's much better to reward honest and constructive criticism of problems with appreciation for the effort and concern than creating a hide and go seek culture of "it's not my fault". I realize this is actually a risk, but I've put myself on a tight timeline so you'll have to live with it :)

While not comprehensive, hopefully we can use this list as a lens to view our software development teams and look beyond the superficial problems. In doing this, I believe, we can start constructing better metrics and rewards. This should lead us to build better software, of a higher quality, delivered more reliably and quickly.

Thursday, May 16, 2013

When to refactor code

As a die hard refactorer, but also pragmatic programmer, I often have a tough time articulating to other developers when a refactor is important and when it is gratuitous. I can imagine many people look at decisions I've made about when it is and isn't appropriate and think it's simply a whim or "when I feel like it". To clarify this for both myself and any future victims/co workers involved with refactoring decisions I may make, I submit this 11 item checklist.

  1. Is the cyclomatic complexity of the function below 5?
  2. Did you completely understand what cyclomatic complexity was without following that link?
  3. Do you have an automated test or documented test case for every path through the function?
  4. Do all of the existing test cases pass?
  5. Can you explain the function and all its edge cases to me in less than 1 minute?
  6. If not, how about 5 minutes?
  7. 10 minutes?
  8. Are there fewer than 100 lines of code in the function (including comments)?
  9. Can you find two other developers who agree this code is error free just by visual inspection?
  10. Is this function used in only one place?
  11. Does the function meet its performance goals (Time/Memory/CPU)?

Scoring

Add up the "no" answers above:
  • 0-1 - Why would you even think about refactoring this? You probably just want to rename variables because you don't like the previous developer's naming convention
  • 2-5 - This might need a little tweaking, but I wouldn't hold up a production release for something in this range
  • 6-8 - OK, we probably need to fix this... It's likely we're going to keep revisiting this and/or we don't actually know what it's doing. Still on the fence, but it's highly suspect
  • 9+ - This is a prime candidate for refactoring. (Note, writing test cases is a form of refactoring)
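
For anyone who wants to automate the tally, the scoring above can be sketched as a few lines of Java. The class name, method name, and the short verdict strings are made up for illustration; the thresholds are the ones from the list.

```java
class RefactorScore {
    // answers[i] is true when checklist question i+1 was answered "yes".
    static String verdict(boolean[] answers) {
        int noCount = 0;
        for (boolean yes : answers) {
            if (!yes) {
                noCount++;
            }
        }
        if (noCount <= 1) {
            return "leave it alone";
        } else if (noCount <= 5) {
            return "minor tweaking, don't hold the release";
        } else if (noCount <= 8) {
            return "probably needs fixing";
        }
        return "prime candidate for refactoring";
    }

    public static void main(String[] args) {
        // 11 questions, 3 of them answered "no"
        boolean[] answers = {true, false, true, true, false, true, true, true, false, true, true};
        System.out.println(verdict(answers)); // minor tweaking, don't hold the release
    }
}
```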

While I know this is a rough guideline, I think it hits on the key factors that are important about the overall quality of source code and helps avoid overspending effort fixing things that aren't necessarily broken.

Monday, May 13, 2013

Apologetic Agile Development

Having lived through numerous attempts to build software embracing the concepts behind the agile manifesto, I feel there are three large categories folks fall into when talking about agile principles.
  1. The curmudgen - these folks have been writing code since punchcards where the state of the art, OR they have been brainwashed by large consulting organizations into thinking that a large heavyweight process is the only way to succeed. Note, a subset of these folks believe that "no process" is actually OK and are quite happy to cowboy-code their way through life.
  2. The fanboy - these folks think "everything agile all the time" and will rename status meetings to "scrums". These are folks who are used to working solo on projects that they can do in their heads... or they are simply not clued into the implications of actually having a repeatable process or delivering working software.
  3. The apologetic - these folks understand the principles and the value they provide, but also understand that these principles are the important thing and know that the current state of the art of software development is still very problematic. These folks often complain or quip that they are not doing "real agile", but accept that using some of the tool and principles coupled with more traditional principles, tools, and processes has much more value in most cases
I'm squarely in the apologetic camp (my ego substitutes "pragmatic" for "apologetic", BTW), and while I feel I have a good understanding of where and how agile can deliver value, I also understand that agile is often sold as a magic bullet that never completely delivers on its promises. I think this is a mistake. No process, methodology, or tool is perfect, and folks who complain that "agile" causes problems in their projects or doesn't solve the problems they have are completely missing the point. No process, principle, or methodology should completely dominate your software development philosophy, and enlightened developers should stop apologizing.

Sunday, February 17, 2013

java static fields

A great many people starting out with java development have only a vague understanding of the difference between a "public static String", "public String", and the difference between a class and an object. As this was confusing to me at first, I thought I would give a quick overview.

A class defines a template for what data and operations are available when you tell the JVM to create an object.

So, for example:

class BlogPost {
    public BlogPost(String inString) {
       text = inString;
       BlogPost.latest = this;
    }
    public String text = "";
    public static BlogPost latest;
   
}

When you do the following

BlogPost myPost = new BlogPost("Hello");

You're telling the JVM to allocate memory on the heap for a new BlogPost object and to store a reference to it in the variable myPost; from then on, when I refer to myPost, I mean that object. BlogPost is a class; myPost is a reference to an object that is an instance of the BlogPost class. When I create a new instance of BlogPost, the JVM searches its classpath for a compiled class definition of type BlogPost and invokes the constructor that takes a single String argument.

The important detail some people miss at first is that "static" fields and methods belong to the Class definition, not to any Object instance. This means that if you create 5 instances of BlogPost and then read the "latest" field, you will not get 5 different values; you're only going to get a reference to the last one created. This has implications for thread safety and other situations where multiple objects may read or write the same static field.
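As a rough illustration of the point above, here is a minimal sketch that reuses the BlogPost class from earlier. The StaticFieldDemo class and its method name are made up for this example; the only thing it tries to show is that every instance keeps its own "text" while all of them share a single "latest".

```java
// BlogPost as defined above: "text" is per-instance, "latest" is per-class.
class BlogPost {
    public BlogPost(String inString) {
        text = inString;
        BlogPost.latest = this; // every constructor call overwrites the one shared field
    }
    public String text = "";
    public static BlogPost latest;
}

// Hypothetical driver class, not part of the original example.
class StaticFieldDemo {
    /** Creates five posts and returns the text of whatever BlogPost.latest points to. */
    static String latestTextAfterFivePosts() {
        BlogPost[] posts = new BlogPost[5];
        for (int i = 0; i < posts.length; i++) {
            posts[i] = new BlogPost("Hello" + (i + 1));
        }
        // posts[0].text is still "Hello1", but there is only one "latest"
        // and it refers to the post that was constructed last.
        return BlogPost.latest.text;
    }

    public static void main(String[] args) {
        System.out.println(latestTextAfterFivePosts()); // prints Hello5
    }
}
```

Reading the field through an instance (for example posts[0].latest) compiles too, but it resolves to the same class-level field, which is part of why this trips people up.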

Further complicating the issue is that java can have multiple classloaders, which means that if "BlogPost" is found in different places (or, potentially, loaded from threads that use different context classloaders) there might actually be multiple instances of the Class definition. This can make for interesting debugging situations where a static field is updated twice, but only one of the updates is visible from a particular perspective at a given time. As an illustration, suppose the following two snippets of code appear in different parts of a system:

BlogPost myPost1 = new BlogPost("Hello1");
BlogPost myPost2 = new BlogPost("Hello2");

If myPost1 happens first and myPost2 happens second, most people would expect BlogPost.latest to always refer to myPost2. This is not always the case, though. In a web container, if myPost1 was created in a different classloader than myPost2, it's quite possible that certain parts of the system will ALWAYS see BlogPost.latest as myPost1 and other parts will ALWAYS see myPost2, no matter what order these calls were made in. Worse yet, if the class was unloaded (garbage collected along with its classloader) and later reloaded, you might see neither, even though most people would never expect that to happen.

Examples of how this might happen are when you deploy a jar file to a web container in its parent classpath (like xml processing libraries) and also deploy it inside the web application (or in two different web applications in the same container). Depending on how the container handles the situation, you may very well get different results (even from what I described).
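The classloader effect can be reproduced outside of a web container. The sketch below is only an approximation of what a container does: it compiles a tiny stand-in class at runtime (so it assumes you are running on a JDK, not a bare JRE), then loads it through two sibling URLClassLoaders. Holder, ClassLoaderStaticsDemo, and the method names are all hypothetical names picked for this example; Holder plays the role of BlogPost with just the static field.

```java
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo class; a real container's loader hierarchy is more involved.
class ClassLoaderStaticsDemo {

    /** Compiles a one-line class with a single static field into a fresh temp directory. */
    static Path compileHolder() throws Exception {
        Path dir = Files.createTempDirectory("holder-demo");
        Path src = dir.resolve("Holder.java");
        String code = "public class Holder { public static String latest; }";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
        if (compiler == null) {
            throw new IllegalStateException("this sketch needs a JDK");
        }
        if (compiler.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    /** Writes "Hello1" via one loader and "Hello2" via another, then reads both back. */
    static String[] latestSeenByEachLoader() {
        try {
            URL[] classpath = { compileHolder().toUri().toURL() };
            // A null parent means neither loader can delegate to the other (or to the app classpath).
            try (URLClassLoader loaderA = new URLClassLoader(classpath, null);
                 URLClassLoader loaderB = new URLClassLoader(classpath, null)) {
                Field latestA = loaderA.loadClass("Holder").getField("latest");
                Field latestB = loaderB.loadClass("Holder").getField("latest");
                latestA.set(null, "Hello1"); // happens first
                latestB.set(null, "Hello2"); // happens second, but touches a different Class
                return new String[] { (String) latestA.get(null), (String) latestB.get(null) };
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] seen = latestSeenByEachLoader();
        System.out.println(seen[0] + " / " + seen[1]); // prints Hello1 / Hello2
    }
}
```

Even though the second write happens last, code that got Holder from the first loader keeps seeing "Hello1", because each loader defined its own copy of the class and therefore its own copy of the static field.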

For a more complete set of examples and a better explanation, particularly in regard to implementing Singletons in java, see this

Wednesday, February 13, 2013

Database pagination on mySql and Oracle

Having studiously avoided Oracle for over 20 years, I'm now working in a shop that uses it almost exclusively. Aside from the general overall expense of the product, I'm routinely amazed at how many features of the other DBMSs I've used (DB2, MSSQL, MySQL, PostgreSQL) are either missing or syntactically difficult to understand.

The most recent example is server side pagination… or more specifically, having the DBMS limit the results returned to a specific subset of rows. In Oracle, to do this, one must run a query something like this:


select * from (select name, rownum rn from 
        (select name
          from users order by name)
      where rownum <= 10) where rn > 5;

I realize that this is a legacy syntax, but I personally find the new way just as obtuse. The new way (I guess) is supposed to be:

select * from (select name,
        row_number() over
        (order by name) rn
  from users) where rn between 6 and 10 order by rn;

Compare this with the syntax for mySql (also now I guess technically part of the Oracle corporation):

select * from users order by name limit 5,5;

I find mySql's syntax to be more concise and don't really understand why Oracle's syntax is so convoluted other than perhaps some dogmatic insistence on following some sort of standard or an internal engineering group who was all hopped up on set theory drugs of some sort ;)
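For what it's worth, the row arithmetic is the same in both dialects; only the wrapping differs. Here is a small sketch of how application code might turn a page number and page size into the two query shapes shown above. PageWindow and its method names are made up for this example, the "users" table and "name" column come from the queries above, and a real application would normally use bind parameters rather than string concatenation (the values here are plain ints, which is the only reason it is tolerable in a sketch).

```java
// Hypothetical helper: maps (pageNumber, pageSize) to row bounds for each dialect.
class PageWindow {
    final int offset; // number of rows to skip
    final int size;   // number of rows to return

    PageWindow(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        this.offset = (pageNumber - 1) * pageSize;
        this.size = pageSize;
    }

    /** mySql form: "limit <rows to skip>,<rows to return>". */
    String mySql() {
        return "select name from users order by name limit " + offset + "," + size;
    }

    /** Legacy Oracle form: cap rownum at the last wanted row, then drop the skipped rows. */
    String oracleRownum() {
        return "select name from (select name, rownum rn from "
             + "(select name from users order by name) "
             + "where rownum <= " + (offset + size) + ") where rn > " + offset;
    }

    public static void main(String[] args) {
        PageWindow page = new PageWindow(2, 5); // second page of five rows, i.e. rows 6 through 10
        System.out.println(page.mySql());
        System.out.println(page.oracleRownum());
    }
}
```

With page 2 and a page size of 5, this produces "limit 5,5" for mySql and "rownum <= 10 ... rn > 5" for Oracle, which are the same bounds used in the hand-written queries.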