If you never really understood how joins work, or if you're somewhat familiar with .joins but hardly grasp what .left_outer_joins does, this is the blog post you need.

I voluntarily won't talk about the other join types because this article mainly targets Ruby on Rails developers. In this context, most of us will probably never need more than INNER JOIN and LEFT OUTER JOIN. My goal is to help you understand how these two work. After reading this post, it'll become easy for you to understand the other ones on your own.

The secret to understanding SQL joins

Here it is: when you perform an SQL join, your database engine creates some kind of virtual table composed of both tables. Bear with me, I'll explain this in a bit.

Context

Let's say we have the following setup:

class Book < ApplicationRecord
  belongs_to :category, optional: true
end

class Category < ApplicationRecord
  has_many :books
end

3.times { Category.create }
Book.create(category: Category.first)
Book.create(category: Category.first)
Book.create(category: Category.second)
Book.create

Our database would look like this:

Database content

Building a temporary table

When you perform a join like Category.joins(:books) for instance, your database builds a temporary table that looks like this:

Virtual table

Here, what we call the left side is the categories table, and what we call the right side is the books table.

If we did Book.joins(:categories), the temporary table would have been built the other way around: books would've been left and categories would've been right.

How is the temporary table filled?

The goal of the database is to represent, in this temp table, all the records of both categories and books tables. This means that:

  • a category that has no book associated will appear once, the book-side will be filled with NULL
  • a book that has no category associated will appear once, the category-side will be filled with NULL
  • a category that has only one book associated will result in a single row (representing both the category and its associated book)
  • a category that has 2 books associated will result in 2 rows: this category will appear twice, each time with a different book associated

The left-side table may thus have duplicate records.

Now that the rows are built, let's talk about how rows are selected.

#joins

.joins performs an INNER JOIN. It selects only rows that have both a left and a right side present.

Inner join

irb(main):001:0> Category.joins(:books).all
   (0.4ms)  SELECT sqlite_version(*)
  Category Load (0.1ms)  SELECT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 
[#<Category:0x0000563b67314748 id: 1, ...>,
 #<Category:0x00007f070442fbc8 id: 1, ...>,
 #<Category:0x00007f070442fb00 id: 2, ...>]

#left_outer_joins

.left_outer_joins (or its alias .left_join) performs a LEFT OUTER JOIN. It selects all rows that have a left side present, no matter whether they have a right side present or not.

Left outer join

irb(main):002:0> Category.left_outer_joins(:books).all
  Category Load (0.7ms)  SELECT "categories".* FROM "categories" LEFT OUTER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 
[#<Category:0x00007f0704821998 id: 1, ...>,
 #<Category:0x00007f07048218d0 id: 1, ...>,
 #<Category:0x00007f07048217e0 id: 2, ...>,
 #<Category:0x00007f07048216f0 id: 3, ...>]

#distinct

Now that you understand how rows are built and selected, you understand why there might be duplicates on the left side. To get rid of them, you can call .distinct:

irb(main):001:0> Category.joins(:books)
   (0.5ms)  SELECT sqlite_version(*)
  Category Load (0.1ms)  SELECT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 
[#<Category:0x00007f736480bf40 id: 1, ...>,
 #<Category:0x000055a59cc93000 id: 1, ...>,
 #<Category:0x000055a59cc92e70 id: 2, ...>]

irb(main):002:0> Category.joins(:books).count
  Category Count (0.2ms)  SELECT COUNT(*) FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 3

irb(main):003:0> Category.joins(:books).distinct
  Category Load (0.7ms)  SELECT DISTINCT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 
[#<Category:0x000055a59c7d4910 id: 1, ...>,
 #<Category:0x000055a59c7d4780 id: 2, ...>]

irb(main):004:0> Category.joins(:books).distinct.count
  Category Count (0.7ms)  SELECT COUNT(DISTINCT "categories"."id") FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 2

Adding conditions to your JOIN clause

You can add as many conditions as you wish at this point. To illustrate how, I'll give just two examples.

1. Select categories associated with books that have a price > 100

Assuming the books table has a price column:

irb(main):010:0> Category.joins(:books).where("books.price > 100").distinct
  Category Load (0.7ms)  SELECT DISTINCT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id" WHERE (books.price > 100)

Remember that each row contains both the category side and the book side. Thus, I can use table_name.column_name within my WHERE clause to refer to any column.

2. Select categories that don't have any book associated

irb(main):011:0> Category.left_outer_joins(:books).where(books: { id: nil })
  Category Load (0.8ms)  SELECT "categories".* FROM "categories" LEFT OUTER JOIN "books" ON "books"."category_id" = "categories"."id" WHERE "books"."id" IS NULL

Why this works:

  1. When the left side doesn't have associated records, its right side is filled with NULL. Therefore if I select books.id = NULL, I'll find categories that don't have any book associated.
  2. I don't need .distinct because there are duplicates only when there are multiple associated records, which is not the case of the records I'm selecting with my WHERE clause.

How an SQL query looks like

SELECT /* Give me rows */
DISTINCT /* Do not return duplicates */
  "categories".* /* Columns to return for each row */
FROM "categories" /* Left-side of the temp table */
LEFT OUTER JOIN /* Type of join to perform */
  "books" /* Right-side of the temp table */
ON "books"."category_id" = "categories"."id" /* Relation that links a left-side record to a right-side one (most of the time: right.foreign_key = left.primary_key) */
WHERE "books"."id" IS NULL /* Conditions for selecting records within the temp table */

Final note

Your database engine does not really build a temporary table because it would be way too inefficient. It was just a mental picture I wanted you to have to help you understand how joins work. Databases nowadays know how to perform a join without actually copying all records into a new temp table.

Thank you for reading!
Younes SERRAJ