Wednesday, April 20, 2011

Parsing RSS in Java and JRuby

I've been playing around with some RSS feed parsing and decided to contrast the Ruby way with the Java way.

First, the Java way:
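import com.sun.syndication.feed.synd.*;
import com.sun.syndication.io.*;
import java.net.URL;

public class RssJava {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int total = 0;   // words across all entries
        int entries = 0; // entries processed

        try {
            SyndFeedInput input = new SyndFeedInput();
            // Fetch the feed and parse it into a SyndFeed (feed URL not shown)
            SyndFeed synFeed = input.build(new XmlReader(new URL("")));
            // Rome 1.0 returns raw Lists, so each element is cast explicitly
            for (Object entryObj : synFeed.getEntries()) {
                SyndEntry entry = (SyndEntry) entryObj;
                entries++;
                for (Object contentObj : entry.getContents()) {
                    SyndContent content = (SyndContent) contentObj;
                    // Count words by splitting the entry body on single spaces
                    total += content.getValue().split(" ").length;
                }
            }
            System.out.println(total + "/" + entries + "=" + total / entries);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Elapsed wall-clock time in seconds
        System.out.println((float)(System.currentTimeMillis() - start) / 1000.0);
    }
}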

Next, the Ruby way:
start = Time.now

require 'rubygems'
require 'simple-rss'
require 'open-uri'

total_count = 0 # words across all items
entry_count = 0 # items processed

# Fetch the feed and parse it (feed URL not shown)
rss = SimpleRSS.parse open("").read
rss.items.each do |item|
  entry_count += 1
  total_count += item.content.split(" ").count
end

puts "#{total_count}/#{entry_count}=#{total_count/entry_count}"
puts "#{Time.now - start}"

And the output of these two:


Now for some interesting tidbits:
The Java version took about 45 minutes to write. After trawling Maven Central, I settled on using Rome, as it seemed the simplest and most straightforward library.

The Ruby version took about the same amount of time. My development in Ruby was slowed mainly because I initially tried to use Ruby's built-in RSS parser, but it kept failing on my Blogspot feed. I never did figure out why; instead I just switched to the simple-rss gem and was off and running.

More interesting to me was that the Java version was almost 3x faster. I realized I was using JRuby 1.5.3 and wasn't sure whether that was the source of the performance problem, so I upgraded to jruby-1.5.6-p249, but the speed stayed roughly the same. I ended up timing the steps separately and found that the parse took about 800 ms, with the rest of the time spent building the data. It looks like Ruby I/O may have some issues that I'll need to look into.
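Splitting the timing apart like this just means taking a timestamp between the parse phase and the processing phase. The sketch below is a hypothetical, dependency-free illustration: it swaps in the JDK's DOM parser (`javax.xml.parsers`) for Rome and uses a small in-memory feed instead of a network fetch, and the class `PhaseTiming` with its `parse` and `countWords` methods are names made up for this example. It uses `System.nanoTime()`, which is intended for measuring elapsed time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical example: times the parse phase and the word-counting phase separately.
class PhaseTiming {

    // Phase 1: turn raw RSS text into a DOM tree (JDK parser used here instead of Rome).
    static Document parse(String rss) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }

    // Phase 2: count words in every <description> element, splitting on single spaces.
    static int countWords(Document doc) {
        NodeList descriptions = doc.getElementsByTagName("description");
        int total = 0;
        for (int i = 0; i < descriptions.getLength(); i++) {
            total += descriptions.item(i).getTextContent().split(" ").length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory feed standing in for a network fetch.
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><description>one two three</description></item>"
                + "<item><description>four five</description></item>"
                + "</channel></rss>";

        long t0 = System.nanoTime();
        Document doc = parse(rss);
        long t1 = System.nanoTime();
        int words = countWords(doc);
        long t2 = System.nanoTime();

        System.out.println("words: " + words);
        System.out.println("parse ms: " + (t1 - t0) / 1_000_000.0);
        System.out.println("count ms: " + (t2 - t1) / 1_000_000.0);
    }
}
```

This prints `words: 5` followed by the two timings, which vary from run to run. A single cold run on the JVM also pays for class loading and JIT warm-up, which can weigh heavily on a short script under either Java or JRuby, so repeating the measurement a few times in one process may give a steadier picture.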

Surprisingly, the split functions seem to behave differently, because the word counts the two programs print are different. I'll have to take a look at that someday and figure out what the heck is going on.
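One plausible cause, not verified against the actual feeds: Ruby's `String#split` treats a single-space pattern specially and splits awk-style, ignoring leading whitespace and treating any run of whitespace (spaces, tabs, newlines) as one separator. Java's `String.split(" ")` is a plain regex split on each individual space, so consecutive spaces produce empty strings and newlines are not separators at all. The sketch compares the two on the same input; `SplitComparison`, `javaStyleCount`, and `rubyStyleCount` are hypothetical names, and `rubyStyleCount` only approximates Ruby's behavior.

```java
import java.util.Arrays;

// Hypothetical comparison of Java's split(" ") with an approximation of Ruby's split(" ").
class SplitComparison {

    // Java: splits at every single space; leading empty strings are kept,
    // trailing empty strings are dropped.
    static int javaStyleCount(String text) {
        return text.split(" ").length;
    }

    // Approximation of Ruby's awk-style split: ignore surrounding whitespace and
    // treat each run of whitespace as a single separator.
    static int rubyStyleCount(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // Ruby returns an empty array for a blank string
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String sample = "  hello   world ";
        System.out.println(Arrays.toString(sample.split(" ")));
        System.out.println("Java-style count: " + javaStyleCount(sample));
        System.out.println("Ruby-style count: " + rubyStyleCount(sample));
    }
}
```

For the sample string, the Java-style count is 6 (four of the pieces are empty strings) and the Ruby-style count is 2. Text containing newlines diverges the other way: `"one\ntwo"` counts as 1 word with Java's `split(" ")` and 2 with the Ruby-style split.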

Not so surprisingly, the Ruby code is about 2/3 the length of the Java code and just seems more straightforward to me. Obviously the "straightforwardness" of code is subjective, but I think Java should seriously look at the type inference used in Scala and put it on the roadmap.
