Monday, August 10, 2015

Please use ANSI-92 SQL Join Syntax

Way back in 1992, a standard was published for SQL. In it, the syntax for specifying joins in SQL Database was finally standardized. This was a good thing because (for folks who've been doing this a while), trying to decode the vagaries of how various Databases handled outer joins could drive you bonkers yielding different results depending on the parse order of things in the from and where clauses. Unfortunately, some DBMS vendors took a while to adopt this standard (Oracle didn't support it until 2001), but at this point if you're using a major RDBMS and NOT using it, you're probably mired in a tradition or habit that bears changing.

As an example of what I'm describing, here are some examples of what I mean.

Old School (no standard syntax + implementation and evaluation of predicates varied by vendor [and query]):

select * from customer, order
where customer.customer = order.customerid

ANSI way (agreed to standard, predicate evaluation has some rules that are followed):

select * from customer
  inner join order on customer.customerid = order.customerid

Now for the crusty folks who are going to argue "I've always done it this way and there's no quantifiable reason to switch" I give you a photo to illustrate my perspective

For folks who don't understand why it matters, I'll give a quick conceptual overview (that is slightly inaccurate, but serves to illustrate my point). In the above trivial example it IS a little arbitrary, but lets take a more involved example:

select * from customer, order, address, product
  where customer.customerid = order.customerid and customer.customerid = address.customerid and order.status = 'SHIPPED' and product.size = 'XL'

Can you spot the defect (this is still pretty trivial, but hopefully you've been burned by this a few times and will catch on. Compare this to:

select * from order
  inner join customer on order.customerid = customer.customerid
  inner join address on address.customerid = customer
  inner join product on <--wait what are we joining on here...cartesian product time...what are we actually joining on?
where order.status = 'SHIPPED' and product.color = 'WHITE' and product.size = 'XL'

While this is a quantitative evaluation and the readability of the second form is pretty subjective. If one makes the conceptual leap to say that "join" clause describes how to link the tables together and "where" clause describes how to filter the joined rows, the second form begins to make sense. The first form with 16 tables and a few out joins thrown in becomes an exercise in cognitive overload, where the second form (in more cases) is easier to mentally parse and reason about.

No comments: