When is a JOIN not a JOIN and what is a black box?

You probably heard this one as a kid right? No? Well the answer to the riddle is this: when one is a LEFT JOIN, one is an INNER JOIN and you are using a certain RDBMS named Sybase.

So I recently heard of a situation with Sybase where a LEFT JOIN on a particular key between two tables would return records but the INNER JOIN wouldn’t. I’m talking this sort of thing if I remember properly:

SELECT *
FROM Customer c LEFT JOIN Order o
  ON c.CustomerId = o.CustomerId
WHERE c.State = 'MN'

SELECT *
FROM Customer c INNER JOIN Order o
  ON c.CustomerId = o.CustomerId
WHERE c.State = 'MN'

Now on first glance you are probably thinking well duh the Minnesota Customer didn’t have any Order yet, so of course a LEFT JOIN works differently than an INNER JOIN… go read your SQL book again. Not the case in fact. Literally the database would refuse to return records where say CustomerID = 1 in both tables for an INNER JOIN, but worked as expected for a LEFT JOIN. Furthermore the database would willing assert the values were equal when you limited the record set to 1 from each table and compared them directly.

(Diagram below)

Obviously something fishy is going on here. There was some investigation and talk of indexes that may be corrupted or something else but in the end solution was never discovered due to time constraints. The developer literally ended up having to extract portions of the tables into memory of a .NET app to do the join because he couldn’t trust the database to do it properly.

Black Box?

Now I wouldn’t consider a RDBMS a black box fully, more like a gray box or something like this:

.rdbms
{
opacity:0.7;
filter:alpha(opacity=70); /* for < IE8 */
}

This is because there are many things a developer (or DBA) can do to increase the performance of a database when it comes to database structure, data types, indexes, page sizes, looking at the query plan, or writing performant queries. We aren’t talking about that though, we are talking about this:

1 != 1

Perhaps the indexes were corrupted or something crazy like that but at what point does that become an implementation detail? I can specify indexes as a developer to increase performance of joins or queries but do I really have control over, or should I reasonably have to be concerned about if they have been become corrupted? If they do shouldn’t the RDBMS be capable of realizing this corruption, blowing up or correcting it rather than literally being wrong?

The Point

At some point I have to be able to reasonably assume that in C# for example (1 == 1) == true. To me C# is also a black box when it comes to the native code that is executed – at some point I have little idea what it looks like and I shouldn’t need to care.

I should not need to know exactly how the depths of some platform is working or look at the actual machine code to be productive. To me one of the main reasons why C# and Java are successful and productive programming platforms is that you are able to focus on delivering functionality and not every last implementation detail of how the code works at the lowest abstracted levels. If I cannot make the assumption that things are going to just work or I am constantly second guessing the platform my productivity is going to be severely hindered and I may end up becoming extremely paranoid.

We will just put it this way: If C# pulled a (1 != 1) == true on me I might just switch to doing JavaScript full time…

Black Box?

The Point

Leave a Reply Cancel reply