Start With The Problem

Don't create solutions until you understand the problem.
  • It’s not NoSQL, it’s NoJoin.

    As I’ve been studying and playing with non-relational databases (aka NoSQL), I’ve realized that NoSQL is a poor descriptor for them. Yes, I know, I’m not the only one who’s noticed this. But however you interpret NoSQL, whether as No SQL or Not-Only SQL, it’s still wrong: it’s not that these databases don’t do SQL (in fact, some have very SQL-like query languages), it’s that they don’t do joins.

    Joins are the underpinnings of relational databases, but they’re also the hardest things to make perform well. A big part of the reason databases such as MongoDB, Cassandra, CouchDB, Redis, etc., are all so blazingly fast is that they don’t do joins. For example, if you have a table of Employees and another table of Addresses for those employees, and you want a list of the employees with their addresses, that’s a join. In a relational database, you can easily combine the two tables to get a nice list. With NoJoin databases, you’re on your own if you want to join them. But the whole point of NoJoin databases is that you don’t put those two types of information in separate places: you put them in one big table (hey, where have I heard that name before?), and when you want an employee with their address, you just grab that document (or value) and you’re done.
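
    To make the Employees and Addresses example concrete, here is a rough sketch in Python. The relational half uses the standard library's sqlite3 module; the NoJoin half uses a plain dict as a stand-in for a document store, which is my simplification and not any particular database's API. The table names, columns, sample rows, and function names are all made up for illustration.

```python
import sqlite3

# Relational layout: employees and addresses live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (employee_id INTEGER, city TEXT);
    INSERT INTO employees VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO addresses VALUES (1, 'London'), (2, 'Arlington');
""")


def employees_with_addresses_sql(connection):
    """Combine the two tables at query time. This is the join."""
    rows = connection.execute("""
        SELECT e.name, a.city
        FROM employees AS e
        JOIN addresses AS a ON a.employee_id = e.id
        ORDER BY e.id
    """)
    return rows.fetchall()


# NoJoin layout: the address is stored inside the employee record.
# A dict keyed by employee id stands in for the document store here.
documents = {
    1: {"name": "Ada", "address": {"city": "London"}},
    2: {"name": "Grace", "address": {"city": "Arlington"}},
}


def employee_with_address_doc(store, employee_id):
    """One lookup by key returns the employee and the address together."""
    return store[employee_id]


print(employees_with_addresses_sql(conn))
print(employee_with_address_doc(documents, 1))
```

    The first function has to match rows across two tables every time it runs. The second just fetches one record by key, because the combining was done once, when the record was written.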

    Daniel Lemire wrote about this a couple of years ago, so from now on I’m only going to refer to such databases as NoJoin databases, because that’s the kind of pedantry I can get behind. Won’t you join me? (ha. ha.)

    • October 9, 2012 (1:15 pm)
© 2012–2024 Start With The Problem