Saturday, April 23, 2011

Computus in scala and java

In the spirit of the holiday, I figured I would try my hand at writing computus in scala and java. For those who aren't aware, computus is the algorithm used to compute which day Easter falls on.

First, the scala version:
object Computus {
  def main(args: Array[String]) {
    val start = System.currentTimeMillis()
    for (year <- 2000.until(1000000)) {
      println(pretty_computus(year))
    }
    println(System.currentTimeMillis()-start)
  }

  def golden(year:Long):Long = {
    year % 19 + 1
  }

  def century(year:Long):Long = {
    (year / 100)  +1
  }

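  // Solar correction: floor(3C/4) - 12. Multiply before dividing;
  // 3 * (C / 4) truncates too early and shifts Easter for years from 2100 on.
  def  solar(year:Long):Long = {
     (3 * century(year) / 4) - 12
  }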

  def  lunar(year:Long):Long = {
     ((8 * century(year) +5) / 25) - 5
  }

  def  letter(year:Long):Long = {
    5 * year / 4 - solar(year) - 10
  }

  def  epact(year:Long):Long = {
    (11 * golden(year) + 20 +lunar(year) - solar(year)) %30
  }

  def  correct_9006(year:Long):Long = {
    val epact_val = epact(year)
    if (epact_val < 0) {
      epact_val + 30
    } else {
      epact_val
    }
  }

  def  correct_epact(year:Long):Long = {
    val epact_val = correct_9006(year)
    if (((epact_val ==25) && (golden(year) > 11)) || (epact_val ==24)) {
      epact_val + 1
    } else {
      epact_val
    }
  }

  def  n_whatever(year:Long):Long = {
    44- correct_epact(year)
  }

  def  fix_n(year:Long):Long = {
    val n=n_whatever(year)
    if (n<21){
      n + 30
    } else {
      n
    }
  }

  def  computus(year:Long):Long = {
    fix_n(year) + 7 -((letter(year)+fix_n(year))%7)
  }

  def pretty_computus(year:Long):String = {
    val day_of_march = computus(year)
    if (day_of_march>31) {
      "April " + (day_of_march -31).toString()
    } else {
      "March " + day_of_march.toString()
    }
  }
}

And now the java version:

public class JavaComputus {
    
    
  public static void main(String[] args) {
    Long start = System.currentTimeMillis();
    
    for (long i = 2000; i < 1000000; i++) {
       System.out.println(pretty_computus(i));
        
    }
    System.out.println(System.currentTimeMillis()-start);
  }

  public static Long golden(Long year) {
    return year % 19 + 1;
  }

  public static Long century(Long year) {
    return (year / 100)  +1;
  }

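  // Solar correction: floor(3C/4) - 12. Multiply before dividing;
  // 3 * (C / 4) truncates too early and shifts Easter for years from 2100 on.
  public static Long  solar(Long year) {
     return (3 * century(year) / 4) - 12;
  }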

  public static Long  lunar(Long year) {
     return ((8 * century(year) +5) / 25) - 5;
  }

  public static Long  letter(Long year) {
    return 5 * year / 4 - solar(year) - 10;
  }

  public static Long  epact(Long year) {
    return (11 * golden(year) + 20 +lunar(year) - solar(year)) %30;
  }

  public static Long  correct_9006(Long year) {
    Long  epact_val = epact(year);
    if (epact_val < 0) {
      return epact_val + 30;
    } else {
      return epact_val;
    }
  }

  public static Long  correct_epact(Long year) {
    Long epact_val = correct_9006(year);
    if (((epact_val ==25) && (golden(year) > 11)) || (epact_val ==24)) {
      return epact_val + 1;
    } else {
      return epact_val;
    }
  }

  public static Long  n_whatever(Long year) {
    return 44- correct_epact(year);
  }

  public static Long  fix_n(Long year) {
    Long n=n_whatever(year);
    if (n<21){
      return n + 30;
    } else {
      return n;
    }
  }

  public static Long  computus(Long year) {
    return fix_n(year) + 7 -((letter(year)+fix_n(year))%7);
  }

  public static String pretty_computus(Long year) {
    Long day_of_march = computus(year);
    if (day_of_march>31) {
      return "April " + ((Long)(day_of_march -31)).toString();
    } else {
      return "March " + day_of_march.toString();
    }
  }

}

I realize the scala implementation is basically "java in scala", but an interesting data point is that the scala version is about 20% slower. From my perspective, it seems like the scala version should have almost identical performance characteristics to the java version.

More importantly, I must certainly be doing something wrong. Iterating and building the list in this manner cannot be the "right" way to do this, but I'm drawing a blank on how to improve things. Aside from the performance characteristics, I'm really interested in trying to improve the scala version to make it more idiomatic. I'm pretty sure I'm hampered by my lack of experience with functional languages.
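One variable worth removing before comparing the two runtimes is console output: both programs call println for every year, so I/O may account for much of the measured time. Here is a minimal sketch (not a rigorous benchmark) that times only the arithmetic, using primitive long values. The names ComputusTiming, easterDayOfMarch, and pretty are made up for illustration, and the solar term is written as 3 * century / 4, the usual floor(3C/4) form.

```java
// Sketch only: computus on primitive longs, timed without console output.
final class ComputusTiming {

    /** Returns Easter as a day of March (32 means April 1). */
    static long easterDayOfMarch(long year) {
        long golden = year % 19 + 1;
        long century = year / 100 + 1;
        long solar = 3 * century / 4 - 12;        // dropped leap years
        long lunar = (8 * century + 5) / 25 - 5;  // lunar orbit correction
        long letter = 5 * year / 4 - solar - 10;  // used to locate Sunday
        // floorMod keeps the epact in 0..29 even when the sum is negative.
        long epact = Math.floorMod(11 * golden + 20 + lunar - solar, 30L);
        if ((epact == 25 && golden > 11) || epact == 24) {
            epact++;
        }
        long n = 44 - epact;
        if (n < 21) {
            n += 30;
        }
        return n + 7 - (letter + n) % 7;
    }

    static String pretty(long year) {
        long day = easterDayOfMarch(year);
        return day > 31 ? "April " + (day - 31) : "March " + day;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long checksum = 0;
        for (long year = 2000; year < 1_000_000; year++) {
            // Accumulate the result so the JIT cannot discard the loop body.
            checksum += easterDayOfMarch(year);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(pretty(2011));
        System.out.println("checksum=" + checksum + " elapsedMs=" + elapsedMs);
    }
}
```

A fair comparison would also need JIT warm-up and repeated runs (or a harness such as JMH); a single timed loop like this only gives a rough indication.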

Friday, April 22, 2011

Amazon EC2 Collapse and designing for cloud computing

As I'm sure most tech geeky folks now know, Amazon EC2 had a massive outage yesterday. This affected numerous online applications and web sites totally unrelated to amazon.com. My favorite new word is currently cloudpocalypse.

Many folks have decided that "cloud computing" is the next golden hammer that will solve any and all computing problems. I hate to rain on their parade (pun intended), but at least from my perspective, "cloud computing" primarily provides the ability to acquire cheap and fast computing infrastructure, have it online quickly, and scale it massively. EC2 is GREAT for that. As a matter of fact, there probably isn't anything nearly as powerful and complete on the market right now.

Note that I didn't include the word "reliable" anywhere in my value proposition. Don't trust a cloud provider to be reliable. To quote Amazon's own CTO: "Everything fails all the time". The important thing to consider when designing software for the cloud is how you deal with failure.

Many folks with traditional datacenters try to deal with failure by flogging sysadmins, developers, and vendors every time something goes wrong and generally spending a lot of time pointing fingers. This is pretty unhealthy and actually creates more problems than it solves, but when using amazon as a platform it's just not going to be an option.

What does this mean?

It means when designing for a cloud provider, you might need a plan "B" or a plan "C" when something goes wrong. Many traditional shops have a "disaster recovery" site which is physically separated from the main site. How do you do this in the cloud? Likely it means having alternate providers or real life physical servers that you have control over as a disaster recovery option.

Moreover, it means that your applications should be designed in such a way that they can still function when they are suddenly running on a different machine, in a different location, or with a different set of components. If you rely on physical files (outside your app) being present for your application to function... you'll fail. If you rely on a database having certain up to date information... you'll fail. If you rely on "your own" IP address (within your application code)... you'll fail.
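As a rough illustration of having a plan "B" and a plan "C", here is a minimal sketch that tries a list of providers in order and returns the first one that answers. Failover, firstAvailable, and the three sample sources are hypothetical names invented for this example; real failover would also need timeouts, health checks, and data replication, all of which this ignores.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Sketch only: ordered fallback across hypothetical providers.
final class Failover {

    /** Tries each source in order and returns the first result that does not throw. */
    static <T> Optional<T> firstAvailable(List<Supplier<T>> sources) {
        for (Supplier<T> source : sources) {
            try {
                return Optional.of(source.get());
            } catch (RuntimeException e) {
                // This provider failed; log it and fall through to the next plan.
                System.err.println("source failed: " + e.getMessage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical plans A, B, and C; the first two simulate outages.
        List<Supplier<String>> plans = List.of(
            () -> { throw new IllegalStateException("primary cloud region down"); },
            () -> { throw new IllegalStateException("alternate provider down"); },
            () -> "served from physical disaster recovery site");
        System.out.println(firstAvailable(plans).orElse("all plans failed"));
    }
}
```

The point of the design is that the caller never names a specific machine or provider; it only sees whichever plan succeeded, or an empty result it has to handle.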

In short, designing for cloud computing means designing for failure. Arguably ALL software design should be done this way, but I think cloud computing requires elevating the importance of this quality.

Thursday, April 21, 2011

How important is the devops movement

My short answer is: "very".

My long answer follows:

Devops is a reaction to a common problem in software development. It's referred to here as a "wall of confusion". I've blogged on this before, and anybody who has ever worked in a shop with two different teams (ops vs dev) will likely understand the problem very well. Inherently, ops wants stability and dev wants change.

I believe the recent push for "agile" in IT organizations has aggravated this problem to an extreme in some places. This aggravation has been because the focus of agile has been on the DEVELOPMENT of software, but not the OPERATION of software. To clarify how these responsibilities are typically divided:

DEVELOPMENT Perspective
Make new features for customers
Get it done quickly
Get it done cheaply
Move on to next thing
Don't worry too much about the future or past… Live in the NOW

OPERATIONS Perspective
Fix the messes created by development
Prevent development from making more messes
Support all previous and future messes for eternity

Devops seeks to reconcile these differences and develop new processes, attitudes, and tools to align the goals of these two perspectives. While the movement is largely in its infancy and not without its detractors, I think the idea of fostering this change within technology organizations is critical to the future of successful software development.

Wednesday, April 20, 2011

Parsing RSS in java and Jruby

I've been playing around with some rss feed parsing and decided to contrast the ruby way with the java way.

First, the java way:
import com.sun.syndication.feed.synd.*;
import com.sun.syndication.io.*;
import java.net.URL;
import java.util.List;

public class RssJava {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int total = 0;
        int entries = 0;

        try {
            SyndFeedInput input = new SyndFeedInput();
            SyndFeed synFeed = input.build(new XmlReader(new URL("http://mikemainguy.blogspot.com/feeds/posts/default")));
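            // Older Rome releases return raw Lists, so the element types
            // are restored here with unchecked casts.
            for (SyndEntry entry : (List<SyndEntry>) synFeed.getEntries()) {
                entries++;
                List<SyndContent> contents = entry.getContents();
                for (SyndContent content : contents) {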
                    total += content.getValue().split(" ").length;
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(total + "/" + entries + "=" + total/entries);
        System.out.println((float)(System.currentTimeMillis()-start) / 1000.0);
    }
}

Next, the ruby way:
start = Time.new

require 'rubygems'
require 'simple-rss'
require 'open-uri'

total_count = 0
entry_count = 0

rss = SimpleRSS.parse open("http://mikemainguy.blogspot.com/feeds/posts/default").read
rss.items.each do |item|
  entry_count+=1
  total_count+= item.content.split(" ").count
end
puts "#{total_count}/#{entry_count}=#{total_count/entry_count}"
puts "#{Time.new - start}"

and the output of these two:
Java:
10732/25=429
1.096

Ruby:
10244/25=409
3.668

Now for some interesting tidbits:
The java version took about 45 minutes to build. After trolling maven central, I settled on using Rome as it seemed the simplest and most straightforward.

The ruby version took about the same amount of time. My development time in ruby was slowed primarily because I tried to initially use the built-in rss parser in ruby, but it was failing on my blogspot feed. I never did figure out why, I just instead used the simple-rss gem and was off and running.

More interesting to me was that the java version was more than 3x faster. I realized I was using jruby 1.5.3 and I wasn't sure if that was a source of the performance problem, so I upgraded to jruby-1.5.6-p249 and the speed was roughly equivalent. I ended up splitting it apart and seeing that the parse was taking about 800ms and the rest was building the data. It seems like ruby io may have some issues that I may need to look at.

Surprisingly, the split functions seem to do something different as the actual numbers being output are different. I'll have to take a look at that someday and try to understand what the heck is going on.
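One plausible contributor (an assumption, not verified against the feed): Java's String.split(" ") splits on every single space and keeps the empty strings between consecutive spaces, while Ruby's split(" ") treats any run of whitespace as one separator and ignores leading whitespace. The two libraries may also hand back slightly different content for each entry. A small sketch of the difference on the Java side, with hypothetical helper names:

```java
// Sketch only: two ways of counting "words" that give different totals.
final class SplitCount {

    /** Counts tokens the way split(" ") does: one split per single space. */
    static int singleSpaceCount(String text) {
        return text.split(" ").length;
    }

    /** Counts words separated by any run of whitespace, ignoring the ends. */
    static int whitespaceCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String doubleSpace = "one  two three"; // two spaces after "one"
        String newline = "one\ntwo";           // newline, no space at all
        System.out.println(singleSpaceCount(doubleSpace) + " vs " + whitespaceCount(doubleSpace));
        System.out.println(singleSpaceCount(newline) + " vs " + whitespaceCount(newline));
    }
}
```

Double spaces push the single-space count up and newlines or tabs push it down, so the two totals can drift apart in either direction depending on the markup in each post.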

Not so surprisingly, the ruby code is about 2/3 the length of the java and just seems more straightforward to me. Obviously the "straightforwardness" of the code is subjective, but I think java should seriously look at the type inference used in scala and put it on the roadmap.

Sunday, April 17, 2011

If it's stupid but it works, it's still stupid, it just also happens to work

Two phrases are really irksome to me: "If it's stupid, but it works, it ain't stupid" and "If it ain't broke, don't fix it".

They bother me not because I disagree with the intent; I wholeheartedly understand and agree with the sentiment. My problem is that they are often used to justify cutting corners and generally producing inferior products in order to avoid fixing problems that customers might not detect.

First off, I'm not foolish; I understand nothing is perfect. I understand that software will have bugs, cars will have problems, and sometimes judicious use of duct tape is the best solution.

BUT

When you don't acknowledge "quick fixes" as such, this inevitably lowers what people think the baseline level of quality should be. In my experience this has a cumulative effect of causing the next "quick fix" to be of even lower quality because it's just a "little worse" than the previous stupid solution. The particularly bad thing about this attitude is that two things happen: #1, people who KNOW they're doing a bad job begin to be ashamed of their work and try to hide things or make up excuses for why they're doing the wrong thing; #2, people who don't know any better think that these "quick fixes" are the right way of doing things. Both of these factors lead to a situation where team members don't feel any obligation to do the right thing.

My mantra is the Boy Scout Rule, "Always leave things a little better than they were when you started". If you see something out of place, improper, or otherwise incorrect, you have an obligation to fix it. This is not to say you need to boil the ocean and fix every problem at once, but you have an obligation to fix at least one little thing. It doesn't matter if it's your mess or someone else's mess; it matters that the mess gets cleaned up. Adopting this attitude within your team has the effect of raising the quality as well as helping to ingrain a sense of ownership and responsibility.

Friday, April 15, 2011

Simple software estimation guidelines

As software developers, a common problem we face is trying to estimate how long it might take to build some idea still locked in a customer's head. Many legacy, high ceremony processes try to use highly precise measurement systems (man-hours etc) to measure this. A problem with this approach is that these systems very often fail or give deceptive results because of their inability to account for a confidence factor and changing events. Put another way, these methods are precise, but inaccurate.

Many agile frameworks attempt to fix this by decoupling the estimation process from reality. They do things like use "points" or "high medium low" and then use historical data to adjust velocities based on actual performance. While this approach has the advantage of acknowledging the uncertainty and inherent inaccuracy that estimation often contains, it has a major flaw in that advance resource planning and scheduling can become a nightmare.

My approach is a blend of both: estimate things using hours, days, weeks, and "I don't know". This has the advantage of being fairly vague, so it can be used like the arbitrary numbers that the agile processes espouse, but it also has the advantage of immediately letting stakeholders perform resource planning and start to build a schedule.
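To make the planning side concrete, here is a minimal sketch of rolling such estimates up into a rough range. Everything in it is an assumption for illustration: the RoughEstimate and Bucket names, the rollUp helper, and especially the working-hour ranges attached to each bucket, which a real team would pick for itself.

```java
import java.util.List;

// Sketch only: coarse estimate buckets rolled up into a low/high range.
final class RoughEstimate {

    /** Coarse buckets; the hour ranges are illustrative assumptions only. */
    enum Bucket {
        HOURS(1, 8), DAYS(8, 40), WEEKS(40, 160), UNKNOWN(0, 0);

        final int lowHours;
        final int highHours;

        Bucket(int lowHours, int highHours) {
            this.lowHours = lowHours;
            this.highHours = highHours;
        }
    }

    /** Sums the low and high ends; returns {low, high, number of unknowns}. */
    static int[] rollUp(List<Bucket> estimates) {
        int low = 0;
        int high = 0;
        int unknowns = 0;
        for (Bucket b : estimates) {
            if (b == Bucket.UNKNOWN) {
                unknowns++; // cannot be scheduled until someone investigates
            } else {
                low += b.lowHours;
                high += b.highHours;
            }
        }
        return new int[] {low, high, unknowns};
    }

    public static void main(String[] args) {
        int[] r = rollUp(List.of(Bucket.HOURS, Bucket.DAYS, Bucket.DAYS,
                                 Bucket.WEEKS, Bucket.UNKNOWN));
        System.out.println(r[0] + "-" + r[1] + " hours, plus "
                + r[2] + " item(s) still unknown");
    }
}
```

Keeping "I don't know" as its own bucket, instead of forcing a number onto it, is what lets the range stay honest while still giving stakeholders something to schedule against.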