Let's see how to write a recursive SQL query with PostgreSQL's CTE (Common Table Expressions) and ActiveRecord
What is a recursive SQL query
A recursive SQL query is a SELECT
statement that'll run on a unidirectional, non-looping graph data structure. It builds a temporary table where each record can use data from it's parent record to serve whatever logic the application needs.
When does one need them
Here are two very common examples:
- Computing some kind of position or depth for each record
- You store a linked list and need the position of each node
- You store a filesystem and need to know the depth of each file in it
- Concatenating strings
- You concatenate the parent record's name with the current record's name to build a breadcrumb
Prerequisite
I'll go with the basis that you know what recursion is in general. If you're not comfortable with how the below #depth
method works, I advise you to learn the concept of recursion first before applying it to SQL statements.
class Node
attr_accessor :left, :right
def initialize(left: nil, right: nil)
@left = left
@right = right
end
def depth
1 + ([left&.depth, right&.depth].max || 0)
end
end
root = Node.new(
left: Node.new(
left: Node.new,
right: Node.new(
right: Node.new
)
),
right: Node.new
)
puts root.depth
How to write a recursive SQL query with PostgreSQL's CTE
Let's write a simple one.
Suppose we have the following categories
table:
create_table :categories do |t|
t.integer :parent_id
t.string :name
t.timestamps null: false
end
class Category < ApplicationRecord
belongs_to :parent, class_name 'Category', foreign_key: 'parent_id', inverse_of: :children, optional: true
has_many :children, class_name 'Category', foreign_key: 'parent_id', inverse_of: :parent
end
Let's say we want to add an extra-column in the SQL result that'd be called fullname
. Read the following SQL statement then we'll break it down to understand how it works.
WITH RECURSIVE temp_table (parent_id, id, name, fullname)
AS (
SELECT parent_id, id, name, name
FROM categories
WHERE parent_id IS NULL
UNION
SELECT
current_row.parent_id,
current_row.id,
current_row.name,
CONCAT_WS(' > ', previous_row.name, current_row.name)
FROM categories current_row
INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id
)
SELECT *
FROM temp_table
CONCAT_WS
: Concatenate with separator
CONCAT_WS(' > ', 'a', 'b')
=> 'a > b'
CONCAT_WS(' > ', NULL, 'a')
=> 'a'
Okay, now let's break it down.
1. First, tell PG the structure of the recursive temp table
WITH RECURSIVE temp_table (parent_id, id, name, fullname)
AS (
# ...
)
We want to build:
- a temporary table:
temp_table
- that'll be composed of the following columns:
parent_id, id, name, fullname
These columns can be named anything you want. In this case, I tried to keep the naming close to the naming of categories
's columns but it's not mandatory. Obviously, the same goes for the temp table's name.
Now that the new table is declared, we're going to tell PG how to fill it in two steps: first the initial parent/root records, then their descendants.
2. Next, tell PG how to build the root records
SELECT parent_id, id, name, name
FROM categories
WHERE parent_id IS NULL
Remember we declared a series of columns? Now is the time to fill them. The values given to SELECT will be assigned to the root records' columns in the order in which the temp table was declared:
-
temp_table.parent_id
=categories.parent_id
-
temp_table.id
=categories.id
-
temp_table.name
=categories.name
-
temp_table.fullname
=categories.name
Why
WHERE parent_id IS NULL
?
We want to traverse the records from roots to leafs. Root categories don't have a parent category, therefore we can select them with the parent_id IS NULL
condition.
3. Then, tell PG how to build the children records
UNION
SELECT
current_row.parent_id,
current_row.id,
current_row.name,
CONCAT_WS(' > ', previous_row.name, current_row.name)
FROM categories current_row
INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id
We join parents and children records in the following manner:
- We perform an INNER JOIN (since we're at the children level, the parent always exists)
- We alias
categories
ascurrent_row
- We alias
temp_table
asprevious_row
- We select values from either
current_row
orparent_row
depending on the logic we want to achieve. Here we collectparent_id, id, name
from the current row and we craft a new value forfullname
.
I chose the names current_row
and parent_row
because they make the logic easier to understand IMO but you can name them whatever you want.
4. Finally, perform a regular SELECT statement on that temp table
SELECT *
FROM temp_table
Nothing fancy here, we select all columns of all records within the temp table.
Annnd we're done :)
Other example: an ordered linked list
Now that you understand how it works, here's another very common usage:
WITH RECURSIVE ordered_nodes (previous_id, id, value, position)
AS (
SELECT previous_id, id, value, 0
FROM nodes
WHERE previous_id IS NULL
UNION
SELECT
current_row.previous_id,
current_row.id,
current_row.value,
previous_row.position + 1
FROM nodes current_row
INNER JOIN ordered_nodes previous_row ON current_row.previous_id = previous_row.id
)
SELECT *
FROM ordered_nodes
ORDER BY position ASC
Use the result with ActiveRecord
ActiveRecord offers a very handy method called #find_by_sql to convert the array returned by PG into objects of your model's type (as if you did Category.all
for instance).
You can simply do the following:
sql_query = "WITH RECURSIVE ... SELECT * FROM temp_table"
records = Category.find_by_sql(sql_query)
I hope this clarifies what recursive SQL statements are and how to craft them. It's a really useful tool and it's often more efficient to delegate this kind of task to your database engine.
One last thing: without going into the details of it, just know that you can also create recursive SQL views.
Thank you for reading!
Younes SERRAJ