Let's see how to write a recursive SQL query with PostgreSQL's CTE (Common Table Expressions) and ActiveRecord
A recursive SQL query is a
SELECT statement that'll run on a unidirectional, non-looping graph data structure. It builds a temporary table where each record can use data from it's parent record to serve whatever logic the application needs.
Here are two very common examples:
- Computing some kind of position or depth for each record
- You store a linked list and need the position of each node
- You store a filesystem and need to know the depth of each file in it
- Concatenating strings
- You concatenate the parent record's name with the current record's name to build a breadcrumb
I'll go with the basis that you know what recursion is in general. If you're not comfortable with how the below
#depth method works, I advise you to learn the concept of recursion first before applying it to SQL statements.
class Node attr_accessor :left, :right def initialize(left: nil, right: nil) @left = left @right = right end def depth 1 + ([left&.depth, right&.depth].max || 0) end end root = Node.new( left: Node.new( left: Node.new, right: Node.new( right: Node.new ) ), right: Node.new ) puts root.depth
Let's write a simple one.
Suppose we have the following
create_table :categories do |t| t.integer :parent_id t.string :name t.timestamps null: false end
class Category < ApplicationRecord belongs_to :parent, class_name 'Category', foreign_key: 'parent_id', inverse_of: :children, optional: true has_many :children, class_name 'Category', foreign_key: 'parent_id', inverse_of: :parent end
Let's say we want to add an extra-column in the SQL result that'd be called
fullname. Read the following SQL statement then we'll break it down to understand how it works.
WITH RECURSIVE temp_table (parent_id, id, name, fullname) AS ( SELECT parent_id, id, name, name FROM categories WHERE parent_id IS NULL UNION SELECT current_row.parent_id, current_row.id, current_row.name, CONCAT_WS(' > ', previous_row.name, current_row.name) FROM categories current_row INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id ) SELECT * FROM temp_table
CONCAT_WS: Concatenate with separator
CONCAT_WS(' > ', 'a', 'b')=> 'a > b'
CONCAT_WS(' > ', NULL, 'a')=> 'a'
Okay, now let's break it down.
WITH RECURSIVE temp_table (parent_id, id, name, fullname) AS ( # ... )
We want to build:
- a temporary table:
- that'll be composed of the following columns:
parent_id, id, name, fullname
These columns can be named anything you want. In this case, I tried to keep the naming close to the naming of
categories's columns but it's not mandatory. Obviously, the same goes for the temp table's name.
Now that the new table is declared, we're going to tell PG how to fill it in two steps: first the initial parent/root records, then their descendants.
SELECT parent_id, id, name, name FROM categories WHERE parent_id IS NULL
Remember we declared a series of columns? Now is the time to fill them. The values given to SELECT will be assigned to the root records' columns in the order in which the temp table was declared:
WHERE parent_id IS NULL?
We want to traverse the records from roots to leafs. Root categories don't have a parent category, therefore we can select them with the
parent_id IS NULL condition.
UNION SELECT current_row.parent_id, current_row.id, current_row.name, CONCAT_WS(' > ', previous_row.name, current_row.name) FROM categories current_row INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id
We join parents and children records in the following manner:
- We perform an INNER JOIN (since we're at the children level, the parent always exists)
- We alias
- We alias
- We select values from either
parent_rowdepending on the logic we want to achieve. Here we collect
parent_id, id, namefrom the current row and we craft a new value for
I chose the names
parent_row because they make the logic easier to understand IMO but you can name them whatever you want.
SELECT * FROM temp_table
Nothing fancy here, we select all columns of all records within the temp table.
Annnd we're done :)
Now that you understand how it works, here's another very common usage:
WITH RECURSIVE ordered_nodes (previous_id, id, value, position) AS ( SELECT previous_id, id, value, 0 FROM nodes WHERE previous_id IS NULL UNION SELECT current_row.previous_id, current_row.id, current_row.value, previous_row.position + 1 FROM nodes current_row INNER JOIN ordered_nodes previous_row ON current_row.previous_id = previous_row.id ) SELECT * FROM ordered_nodes ORDER BY position ASC
ActiveRecord offers a very handy method called #find_by_sql to convert the array returned by PG into objects of your model's type (as if you did
Category.all for instance).
You can simply do the following:
sql_query = "WITH RECURSIVE ... SELECT * FROM temp_table" records = Category.find_by_sql(sql_query)
I hope this clarifies what recursive SQL statements are and how to craft them. It's a really useful tool and it's often more efficient to delegate this kind of task to your database engine.
One last thing: without going into the details of it, just know that you can also create recursive SQL views.
Thank you for reading!