If you never really understood how joins work, or if you're somewhat familiar with .joins
but hardly grasp what .left_outer_joins
does, this is the blog post you need.
I voluntarily won't talk about the other join types because this article mainly targets Ruby on Rails developers. In this context, most of us will probably never need more than INNER JOIN
and LEFT OUTER JOIN
. My goal is to help you understand how these two work. After reading this post, it'll become easy for you to understand the other ones on your own.
The secret to understanding SQL joins
Here it is: when you perform an SQL join, your database engine creates some kind of virtual table composed of both tables. Bear with me, I'll explain this in a bit.
Context
Let's say we have the following setup:
class Book < ApplicationRecord
belongs_to :category, optional: true
end
class Category < ApplicationRecord
has_many :books
end
3.times { Category.create }
Book.create(category: Category.first)
Book.create(category: Category.first)
Book.create(category: Category.second)
Book.create
Our database would look like this:
Building a temporary table
When you perform a join like Category.joins(:books)
for instance, your database builds a temporary table that looks like this:
Here, what we call the left side is the categories
table, and what we call the right side is the books
table.
If we did Book.joins(:categories)
, the temporary table would have been built the other way around: books would've been left and categories would've been right.
How is the temporary table filled?
The goal of the database is to represent, in this temp table, all the records of both categories
and books
tables. This means that:
- a category that has no book associated will appear once, the book-side will be filled with
NULL
- a book that has no category associated will appear once, the category-side will be filled with
NULL
- a category that has only one book associated will result in a single row (representing both the category and its associated book)
- a category that has 2 books associated will result in 2 rows: this category will appear twice, each time with a different book associated
The left-side table may thus have duplicate records.
Now that the rows are built, let's talk about how rows are selected.
#joins
.joins
performs an INNER JOIN
. It selects only rows that have both a left and a right side present.
irb(main):001:0> Category.joins(:books).all
(0.4ms) SELECT sqlite_version(*)
Category Load (0.1ms) SELECT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=>
[#<Category:0x0000563b67314748 id: 1, ...>,
#<Category:0x00007f070442fbc8 id: 1, ...>,
#<Category:0x00007f070442fb00 id: 2, ...>]
#left_outer_joins
.left_outer_joins
(or its alias .left_join
) performs a LEFT OUTER JOIN
. It selects all rows that have a left side present, no matter whether they have a right side present or not.
irb(main):002:0> Category.left_outer_joins(:books).all
Category Load (0.7ms) SELECT "categories".* FROM "categories" LEFT OUTER JOIN "books" ON "books"."category_id" = "categories"."id"
=>
[#<Category:0x00007f0704821998 id: 1, ...>,
#<Category:0x00007f07048218d0 id: 1, ...>,
#<Category:0x00007f07048217e0 id: 2, ...>,
#<Category:0x00007f07048216f0 id: 3, ...>]
#distinct
Now that you understand how rows are built and selected, you understand why there might be duplicates on the left side. To get rid of them, you can call .distinct
:
irb(main):001:0> Category.joins(:books)
(0.5ms) SELECT sqlite_version(*)
Category Load (0.1ms) SELECT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=>
[#<Category:0x00007f736480bf40 id: 1, ...>,
#<Category:0x000055a59cc93000 id: 1, ...>,
#<Category:0x000055a59cc92e70 id: 2, ...>]
irb(main):002:0> Category.joins(:books).count
Category Count (0.2ms) SELECT COUNT(*) FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 3
irb(main):003:0> Category.joins(:books).distinct
Category Load (0.7ms) SELECT DISTINCT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=>
[#<Category:0x000055a59c7d4910 id: 1, ...>,
#<Category:0x000055a59c7d4780 id: 2, ...>]
irb(main):004:0> Category.joins(:books).distinct.count
Category Count (0.7ms) SELECT COUNT(DISTINCT "categories"."id") FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id"
=> 2
Adding conditions to your JOIN clause
You can add as many conditions as you wish at this point. To illustrate how, I'll give just two examples.
1. Select categories associated with books that have a price > 100
Assuming the books
table has a price
column:
irb(main):010:0> Category.joins(:books).where("books.price > 100").distinct
Category Load (0.7ms) SELECT DISTINCT "categories".* FROM "categories" INNER JOIN "books" ON "books"."category_id" = "categories"."id" WHERE (books.price > 100)
Remember that each row contains both the category side and the book side. Thus, I can use table_name.column_name
within my WHERE clause to refer to any column.
2. Select categories that don't have any book associated
irb(main):011:0> Category.left_outer_joins(:books).where(books: { id: nil })
Category Load (0.8ms) SELECT "categories".* FROM "categories" LEFT OUTER JOIN "books" ON "books"."category_id" = "categories"."id" WHERE "books"."id" IS NULL
Why this works:
- When the left side doesn't have associated records, its right side is filled with NULL. Therefore if I select
books.id = NULL
, I'll find categories that don't have any book associated. - I don't need
.distinct
because there are duplicates only when there are multiple associated records, which is not the case of the records I'm selecting with my WHERE clause.
How an SQL query looks like
SELECT /* Give me rows */
DISTINCT /* Do not return duplicates */
"categories".* /* Columns to return for each row */
FROM "categories" /* Left-side of the temp table */
LEFT OUTER JOIN /* Type of join to perform */
"books" /* Right-side of the temp table */
ON "books"."category_id" = "categories"."id" /* Relation that links a left-side record to a right-side one (most of the time: right.foreign_key = left.primary_key) */
WHERE "books"."id" IS NULL /* Conditions for selecting records within the temp table */
Final note
Your database engine does not really build a temporary table because it would be way too inefficient. It was just a mental picture I wanted you to have to help you understand how joins work. Databases nowadays know how to perform a join without actually copying all records into a new temp table.
Thank you for reading!
Younes SERRAJ