Let's see how to write a recursive SQL query with PostgreSQL's CTE (Common Table Expressions) and ActiveRecord

What is a recursive SQL query

A recursive SQL query is a SELECT statement that'll run on a unidirectional, non-looping graph data structure. It builds a temporary table where each record can use data from it's parent record to serve whatever logic the application needs.

When does one need them

Here are two very common examples:

  • Computing some kind of position or depth for each record
    • You store a linked list and need the position of each node
    • You store a filesystem and need to know the depth of each file in it
  • Concatenating strings
    • You concatenate the parent record's name with the current record's name to build a breadcrumb

Prerequisite

I'll go with the basis that you know what recursion is in general. If you're not comfortable with how the below #depth method works, I advise you to learn the concept of recursion first before applying it to SQL statements.

class Node
  attr_accessor :left, :right

  def initialize(left: nil, right: nil)
    @left = left
    @right = right
  end

  def depth
    1 + ([left&.depth, right&.depth].max || 0)
  end
end

root = Node.new(
  left: Node.new(
    left: Node.new,
    right: Node.new(
      right: Node.new
    )
  ),
  right: Node.new
)

puts root.depth

How to write a recursive SQL query with PostgreSQL's CTE

Let's write a simple one.

Suppose we have the following categories table:

create_table :categories do |t|
  t.integer :parent_id
  t.string :name

  t.timestamps null: false
end
class Category < ApplicationRecord
  belongs_to :parent, class_name 'Category', foreign_key: 'parent_id', inverse_of: :children, optional: true
  has_many :children, class_name 'Category', foreign_key: 'parent_id', inverse_of: :parent
end

Let's say we want to add an extra-column in the SQL result that'd be called fullname. Read the following SQL statement then we'll break it down to understand how it works.

WITH RECURSIVE temp_table (parent_id, id, name, fullname)
AS (
   SELECT parent_id, id, name, name
   FROM categories
   WHERE parent_id IS NULL
   UNION
   SELECT
   current_row.parent_id,
   current_row.id,
   current_row.name,
   CONCAT_WS(' > ', previous_row.name, current_row.name)
   FROM categories current_row
   INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id
)
SELECT *
FROM temp_table

CONCAT_WS: Concatenate with separator
CONCAT_WS(' > ', 'a', 'b') => 'a > b'
CONCAT_WS(' > ', NULL, 'a') => 'a'

Okay, now let's break it down.

1. First, tell PG the structure of the recursive temp table

WITH RECURSIVE temp_table (parent_id, id, name, fullname)
AS (
  # ...
)

We want to build:

  • a temporary table: temp_table
  • that'll be composed of the following columns: parent_id, id, name, fullname

These columns can be named anything you want. In this case, I tried to keep the naming close to the naming of categories's columns but it's not mandatory. Obviously, the same goes for the temp table's name.

Now that the new table is declared, we're going to tell PG how to fill it in two steps: first the initial parent/root records, then their descendants.

2. Next, tell PG how to build the root records

SELECT parent_id, id, name, name
FROM categories
WHERE parent_id IS NULL

Remember we declared a series of columns? Now is the time to fill them. The values given to SELECT will be assigned to the root records' columns in the order in which the temp table was declared:

  • temp_table.parent_id = categories.parent_id
  • temp_table.id = categories.id
  • temp_table.name = categories.name
  • temp_table.fullname = categories.name

Why WHERE parent_id IS NULL?

We want to traverse the records from roots to leafs. Root categories don't have a parent category, therefore we can select them with the parent_id IS NULL condition.

3. Then, tell PG how to build the children records

UNION
SELECT
current_row.parent_id,
current_row.id,
current_row.name,
CONCAT_WS(' > ', previous_row.name, current_row.name)
FROM categories current_row
INNER JOIN temp_table previous_row ON current_row.parent_id = previous_row.id

We join parents and children records in the following manner:

  • We perform an INNER JOIN (since we're at the children level, the parent always exists)
  • We alias categories as current_row
  • We alias temp_table as previous_row
  • We select values from either current_row or parent_row depending on the logic we want to achieve. Here we collect parent_id, id, name from the current row and we craft a new value for fullname.

I chose the names current_row and parent_row because they make the logic easier to understand IMO but you can name them whatever you want.

4. Finally, perform a regular SELECT statement on that temp table

SELECT *
FROM temp_table

Nothing fancy here, we select all columns of all records within the temp table.

Annnd we're done :)

Other example: an ordered linked list

Now that you understand how it works, here's another very common usage:

WITH RECURSIVE ordered_nodes (previous_id, id, value, position)
AS (
  SELECT previous_id, id, value, 0
  FROM nodes
  WHERE previous_id IS NULL
  UNION
  SELECT
    current_row.previous_id,
    current_row.id,
    current_row.value,
    previous_row.position + 1
  FROM nodes current_row
  INNER JOIN ordered_nodes previous_row ON current_row.previous_id = previous_row.id
)
SELECT *
FROM ordered_nodes
ORDER BY position ASC

Use the result with ActiveRecord

ActiveRecord offers a very handy method called #find_by_sql to convert the array returned by PG into objects of your model's type (as if you did Category.all for instance).

You can simply do the following:

sql_query = "WITH RECURSIVE ... SELECT * FROM temp_table"
records = Category.find_by_sql(sql_query)

I hope this clarifies what recursive SQL statements are and how to craft them. It's a really useful tool and it's often more efficient to delegate this kind of task to your database engine.

One last thing: without going into the details of it, just know that you can also create recursive SQL views.

Thank you for reading!
Younes SERRAJ