Source Freshness¶
Source freshness checks answer a simple question:
“How old is the latest data in this source, and is that acceptable?”
They complement table-level DQ tests by validating recency of inputs (seeds, raw tables, landing zones) before you build marts.
- Configuration lives alongside your
sources.ymlmetadata. - Evaluation is done via the
fft source-freshnessCLI command. - Output is CI-friendly (non-zero exit when critical freshness rules fail).
When to use source freshness¶
Use source freshness when:
- you rely on upstream ingestion jobs (ETL, CDC, streaming) and need a guard-rail like
“
crm.ordersmust be < 60 minutes old”; - you have critical feeds (payments, auth logs, PII) where stale data is dangerous;
- you want a cheap pre-flight check in CI before running a heavier
fft run+fft test.
It is not a replacement for table-level freshness tests on marts – they work nicely together.
Configuration¶
Freshness rules are attached to source tables in your metadata (conceptually alongside sources.yml).
A minimal example:
version: 2
sources:
- name: crm
schema: raw
tables:
- name: orders
identifier: seed_orders
freshness:
loaded_at_field: "_ff_loaded_at"
max_delay_minutes: 1440 # 1 day
warn_after_minutes: 720 # optional: warning threshold
error_after_minutes: 1440 # optional: hard error threshold
tags: ["example:dq_demo", "critical_source"]
````
Key fields:
* `loaded_at_field`: timestamp column used to compute the **max** loaded time. When seeds are
materialized via `fft seed`, every table automatically includes `_ff_loaded_at` (UTC timestamp
captured during the seed run). Pointing freshness rules at this metadata column keeps demo seeds
“fresh” even if the CSV contains static business timestamps.
* `max_delay_minutes` / `warn_after_minutes` / `error_after_minutes`:
* if only `max_delay_minutes` is set, it is treated as an error threshold;
* `warn_after_minutes` and `error_after_minutes` allow a 3-state result:
* ✅ **on-time** (age ≤ `warn_after_minutes`)
* ❕ **late (warning)** (`warn_after_minutes` < age ≤ `error_after_minutes`)
* ❌ **stale (error)** (age > `error_after_minutes`)
The exact field names should mirror whatever you wired into `run_source_freshness`; adjust the snippet if your structure differs.
---
## Running checks
Basic usage:
```bash
fft source freshness <project> --env <env>
Examples:
# Check all sources in the DQ demo (DuckDB)
fft source freshness examples/dq_demo --env dev_duckdb
# Only check sources tagged "critical_source"
fft source freshness . \
--env dev \
--select tag:critical_source
# Combine with other selectors (depends on your implementation)
fft source freshness . \
--env dev \
--select source:crm --exclude tag:experimental
The command:
- connects using the selected profile (
--env); - loads source + freshness metadata;
- executes a
max(loaded_at_column)query per configured source; -
compares the result to your thresholds and produces:
-
per-source rows (age, thresholds, status),
- an overall exit status (
0if all within thresholds, non-zero on error).
CI / automation¶
Typical pattern in CI:
# 1) Check source recency
fft source freshness . --env ci
# 2) Only if sources are fresh, run the pipeline and DQ tests
fft run . --env ci
fft test . --env ci --select tag:ci
Because fft source freshness exits non-zero on stale inputs, you can simply let the
CI job fail early rather than running a full DAG on obviously outdated data.
Troubleshooting¶
“No freshness rules found”
- You called
fft source-freshnessbut nothing was evaluated. -
Check that:
-
at least one source table has a
freshness:block; - your
--select/--excludepatterns aren’t filtering everything out.
“Column not found”
- The
loaded_at_columndoesn’t exist in the physical source. - Verify the column name and that your
identifier/ schema overrides for that source are correct.
Unexpectedly large ages
- Make sure your warehouse and timestamps are in the expected timezone.
- Confirm that the ingestion job actually updates
loaded_at_column(and not some other field).
Relationship to table-level freshness tests¶
Table-level freshness tests in project.yml:
- operate on models (e.g.
mart_orders_agg.last_order_ts); - run via
fft test.
Source freshness:
- operates on sources (e.g.
crm.orders.order_ts); - runs via
fft source freshness.
Using both lets you catch:
- Stale upstream ingestion (source is old),
- And downstream pipeline lag or bugs (mart not refreshed even though source is fresh).