Ryan Bigg

⟵ Posts

Scoping an Active Record join

09 Dec 2024

Active Record is well known for its footguns, like N+1 queries and letting you dump all the business logic for your applications in models. (For an alternative, read Maintainable Rails.)

A lesser-known footgun is this seemingly innocuous use of joins in a tenanted Rails application. By “tenanted” I mean that most records have something like a tenant_id on them that declares ownership. In our case, it’s merchant_id. Here’s the query:

FraudCheck.where(merchant: merchant).joins(:purchase)

Fraud checks belong to a merchant, and they also belong to a purchase. Purchases have just the one fraud check. Merchants have many fraud checks and purchases.

The query this executes is:

SELECT "fraud_checks".* FROM "fraud_checks"
INNER JOIN "purchases" ON "purchases"."id" = "fraud_checks"."purchase_id"
WHERE "fraud_checks"."merchant_id" = 1

This seems like a relatively good query and it’ll run “fast enough” on small data sets. However, as your dataset grows and becomes measured in multiple terabytes, such a query will get slower and slower.

This query runs slow because it’s querying two tables, one very quickly because it has a small dataset to query through, and one very slowly because it has a much larger dataset to trawl through. The first table it queries is fraud_checks, and it finds all of those where the merchant_id=1, which is a smaller dataset than “all fraud checks ever”. The second table it queries is “purchases”, which it attempts to find all purchases from all time matching the purchase_id values returned by the fraud checks query.

We can shorten this query’s execution time by scoping the purchases to just those from the merchant by using merge:

FraudCheck
  .where(merchant: merchant)
  .joins(:purchase)
  .merge(
    Purchase.where(merchant: merchant)
  )

This now executes this query:

SELECT "fraud_checks".* FROM "fraud_checks"
INNER JOIN "purchases" ON "purchases"."id" = "fraud_checks"."transaction_id"
WHERE "fraud_checks"."merchant_id" = 2736
AND "purchases"."merchant_id" = 2736

The query is now limited to just fraud checks and purchases that match that merchant_id, resulting in a smaller table scan for purchases that match the selected fraud checks.

We further limit this query by applying a date range scope on the purchases too:

FraudCheck
  .where(merchant: merchant)
  .joins(:purchase)
  .merge(
    Purchase.where(merchant: merchant, created_at: start_date..end_date)
  )

This results in a super fast query compared to what we started with, as we’ve now drastically reduced the scope of purchases that can match our query.

In case you missed it...

The Last 100 Posts