Should I use MongoDB, CouchDB, or Redis?

In the current NoSQL fervor, one important distinction gets missed repeatedly. There are two (OK, three) really important factors that set these tools apart from one another, and many people completely miss the point.

The first factor is durability -- does the data actually get saved to a disk somewhere and, if so, how often, and how much might I lose if something "goes wrong"? Redis and MongoDB users might be somewhat surprised to learn that, by default, they can lose your data should the process crash or shut down. While you can configure them to work around this issue, doing so is going to slow things down substantially, and you lose the big advantage they were designed to provide. In short, Redis is a great alternative to something like memcached, but is not really an alternative to something like CouchDB.
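To make the trade-off concrete, here is a minimal sketch in Python. It is not how Redis or MongoDB are implemented; `TinyStore`, `sync_every_write`, and `survives_crash` are hypothetical names invented for illustration. The store keeps everything in memory and, only when asked, also appends each write to a log file and waits for the operating system to confirm it reached disk.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical in-memory key-value store with an optional append-only log."""

    def __init__(self, log_path, sync_every_write):
        self.log_path = log_path
        self.sync_every_write = sync_every_write
        self.data = {}
        # Recover whatever made it to disk before the last shutdown or crash.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def set(self, key, value):
        self.data[key] = value  # fast path: memory only
        if self.sync_every_write:
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps([key, value]) + "\n")
                log.flush()
                os.fsync(log.fileno())  # block until the OS reports it is on disk


def survives_crash(sync_every_write):
    """Write one key, drop the store without a clean shutdown, then reload."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.log")
        store = TinyStore(path, sync_every_write)
        store.set("status", "online")
        del store  # simulated crash: nothing else gets written
        return "status" in TinyStore(path, sync_every_write).data


print(survives_crash(False))  # False: the write lived only in memory
print(survives_crash(True))   # True: the write was logged before set() returned
```

The durable path pays for a file open, a write, and an fsync on every single `set`, which is the kind of cost I'm talking about. Real stores soften it by batching those syncs, at the price of a small window of writes that can still be lost.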

That brings me to the second factor, which is searchability (I couldn't think of a better term). Key-value stores are typically designed to fetch a value by a particular key really quickly, not to be easy to search. Document stores are designed to enable more dynamic searching, often at the expense of some other attribute like speed, memory, or disk space.
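The difference is easy to see with a plain Python dict standing in for a key-value store. This is only a sketch under assumed data: the `users` records and the helper names are made up, and real document stores build and maintain their indexes in their own ways.

```python
from collections import defaultdict

# Hypothetical sample data, keyed by user id.
users = {
    "u1": {"name": "Ann", "city": "Chicago"},
    "u2": {"name": "Bob", "city": "Denver"},
    "u3": {"name": "Cy", "city": "Chicago"},
}


def get_by_key(store, key):
    """Key-value access: a single hash lookup, whatever the store size."""
    return store.get(key)


def scan_by_field(store, field, value):
    """Searching a plain key-value store: every value has to be inspected."""
    return sorted(k for k, doc in store.items() if doc.get(field) == value)


def build_index(store, field):
    """Document-store style: spend extra memory on a secondary index."""
    index = defaultdict(set)
    for key, doc in store.items():
        if field in doc:
            index[doc[field]].add(key)
    return index


city_index = build_index(users, "city")

print(get_by_key(users, "u2"))                  # {'name': 'Bob', 'city': 'Denver'}
print(scan_by_field(users, "city", "Chicago"))  # ['u1', 'u3']
print(sorted(city_index["Chicago"]))            # ['u1', 'u3']
```

The index answers the city question without touching every record, but it takes extra memory and has to be kept up to date on every write. That is the "expense of some other attribute" mentioned above.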

Lastly, there's speed -- CouchDB can be fast, but for real-time updates it's not really going to compare to MongoDB or Redis. If real-time is your most important factor, CouchDB is probably not your best solution (actually, it certainly isn't your best solution).

So, from the current crop of contenders, here are my winners for certain use cases (in no particular order):

  • A fast disposable cache based on discrete keys: Redis. It's fast, it's widely known, it's easy to set up and use, and it's more flexible than memcached (although memcached is also a good choice).
  • A durable and searchable document store that slowly accumulates more data and needs some concept of versioning (maybe like Wikipedia or a blog engine): CouchDB
  • A quasi-durable searchable document store with quickly changing values (like a real-time status reporting application... maybe Facebook or Twitter): MongoDB

As for the other 9,999 choices that currently exist, I'd say don't dig around too much or agonize over your choice unless your application has a specific and very important problem that is difficult or complicated to solve with these three. Should you get into that situation (like maybe needing to find directions the way Google Maps does), then you'll need to expand your horizons and look into other solutions. My recommendation is to start with one of these three and only move to a different solution when necessary. You could spend six months researching all of the possibilities and at the end have nothing but outdated research. Pick something and run with it; only then will you understand the problem and be able to make a better, more informed decision for your scenario.

More importantly, you'll probably notice that for many real-world solutions, it might make sense to use all three of these (or more). I think part of what causes problems in "fair" comparisons of technology is that folks think they can pick the "single best solution for all problems", and that's just not a realistic perspective.

Comments

antirez said…
Hello, Redis author here. If you enable AOF in Redis, with the default fsync every second, there is no significant change in performance (including in write-heavy workloads) compared to RDB persistence.

So in short: you can have durability and speed in Redis, and your article should be reviewed for correctness.

Salvatore

Mike Mainguy said…
Thanks for the comment, you've built a great tool and I appreciate your feedback.

As far as accuracy is concerned, I called out at the beginning of the post that redis and mongo can both be configured to be durable, but that "out of the box" they are not.

I used this as the basis for comparison because I've met a number of people who assumed that these two are going to persist things somewhere and were actually surprised to learn that this was not the case.
