Ryan Bigg

⟵ Posts

Scoping an Active Record join

09 Dec 2024

Active Record is well known for its footguns, like N+1 queries and letting you dump all the business logic for your applications in models. (For an alternative, read Maintainable Rails.)

A lesser-known footgun is this seemingly innocuous use of joins in a tenanted Rails application. By “tenanted” I mean that most records have something like a tenant_id on them that declares ownership. In our case, it’s merchant_id. Here’s the query:

FraudCheck.where(merchant: merchant).joins(:purchase)

Fraud checks belong to a merchant, and they also belong to a purchase. Purchases have just the one fraud check. Merchants have many fraud checks and purchases.

The query this executes is:

SELECT "fraud_checks".* FROM "fraud_checks"
INNER JOIN "purchases" ON "purchases"."id" = "fraud_checks"."purchase_id"
WHERE "fraud_checks"."merchant_id" = 1

This seems like a relatively good query and it’ll run “fast enough” on small data sets. However, as your dataset grows and becomes measured in multiple terabytes, such a query will get slower and slower.

This query runs slow because it’s querying two tables, one very quickly because it has a small dataset to query through, and one very slowly because it has a much larger dataset to trawl through. The first table it queries is fraud_checks, and it finds all of those where the merchant_id=1, which is a smaller dataset than “all fraud checks ever”. The second table it queries is “purchases”, which it attempts to find all purchases from all time matching the purchase_id values returned by the fraud checks query.

We can shorten this query’s execution time by scoping the purchases to just those from the merchant by using merge:

FraudCheck
  .where(merchant: merchant)
  .joins(:purchase)
  .merge(
    Purchase.where(merchant: merchant)
  )

This now executes this query:

SELECT "fraud_checks".* FROM "fraud_checks"
INNER JOIN "purchases" ON "purchases"."id" = "fraud_checks"."transaction_id"
WHERE "fraud_checks"."merchant_id" = 2736
AND "purchases"."merchant_id" = 2736

The query is now limited to just fraud checks and purchases that match that merchant_id, resulting in a smaller table scan for purchases that match the selected fraud checks.

We further limit this query by applying a date range scope on the purchases too:

FraudCheck
  .where(merchant: merchant)
  .joins(:purchase)
  .merge(
    Purchase.where(merchant: merchant, created_at: start_date..end_date)
  )

This results in a super fast query compared to what we started with, as we’ve now drastically reduced the scope of purchases that can match our query.