Snapshot Demo Project¶

The examples/snapshot_demo project shows how to build history-aware tables with FastFlowTransform snapshots. It reuses the small users pipeline from the basic demo and adds a users_clean_snapshot model that captures row-versioned history over time.

Why it exists¶

Show snapshot semantics – demonstrate materialized='snapshot' with strategy='timestamp' on a simple dataset.
Separate runs – illustrate why snapshots are executed via fft snapshot run instead of the regular fft run.
Engine parity – keep the snapshot demo portable across DuckDB, Postgres, Databricks Spark (parquet / Delta Lake / Iceberg), and BigQuery (once engines are implemented).
Understand the shape of a snapshot table – see how FFT adds validity columns on top of your source columns.

Use it as a sandbox before adding snapshots to your own marts and dimensions.

Project layout¶

The snapshot demo is intentionally tiny and mirrors the basic demo structure:

Path	Purpose
`seeds/seed_users.csv`	Sample CRM-style user data. `fft seed` materializes it as a physical `seed_users` table.
`models/staging/users_clean.ff.sql`	Same as in the basic demo: cleans emails, casts types, derives `email_domain`.
`models/marts/mart_users_by_domain.ff.sql`	Same as in the basic demo: aggregates users per email domain.
`models/snapshots/users_clean_snapshot.ff.sql`	New: snapshot model that captures slowly changing history of `users_clean.ff`.
`profiles.yml`	Reused from the basic demo: defines `dev_duckdb`, `dev_postgres`, `dev_databricks_parquet`, `dev_databricks_delta`, `dev_databricks_iceberg`, `dev_bigquery`.
`.env.dev_*`	Engine-specific environment files (`.env.dev_duckdb`, `.env.dev_postgres`, `.env.dev_databricks_parquet`, `.env.dev_databricks_delta`, `.env.dev_databricks_iceberg`).
`Makefile`	Adds snapshot-aware targets on top of the usual `seed` / `run` / `test` / `dag`.

The snapshot model¶

The core of the demo is models/snapshots/users_clean_snapshot.ff.sql:

{{ config(
    materialized='snapshot',
    snapshot={
        'strategy': 'timestamp',   -- or 'check' (not used in this demo)
    },
    unique_key='user_id',
    updated_at='signup_date',
    tags=[
        'example:snapshot_demo',
        'scope:snapshot',
        'engine:duckdb',
        'engine:postgres',
        'engine:databricks_spark',
        'engine:bigquery'
    ],
) }}

select
    user_id,
    email,
    email_domain,
    signup_date
from {{ ref('users_clean.ff') }};

Key points:

materialized='snapshot' marks this as a snapshot model.
snapshot.strategy='timestamp' means:
FFT uses updated_at='signup_date' to detect changed rows.
When a row changes, the old version is closed and a new version is opened.
unique_key='user_id' defines the business key used to match records between runs.
The body is a normal SELECT from the cleaned staging model; FFT takes care of the history logic.

On physical storage, FFT keeps:

All columns from the select (user_id, email, email_domain, signup_date)
Plus engine-agnostic snapshot metadata columns (names depending on your implementation), typically:
a valid-from timestamp
a valid-to timestamp (nullable/open ended)
an is_current flag

So a given user_id may appear multiple times with different validity ranges.

Running the snapshot demo¶

Assuming you’ve already wired examples/snapshot_demo/Makefile similarly to the basic demo (with snapshot / snapshot_demo targets):

Change into the project:

cd examples/snapshot_demo

Choose an engine and export the environment (example: DuckDB):

# DuckDB
set -a; source .env.dev_duckdb; set +a

# Or Postgres
# set -a; source .env.dev_postgres; set +a

# Or Databricks
# Parquet: set -a; source .env.dev_databricks_parquet; set +a
# Delta:   set -a; source .env.dev_databricks_delta; set +a
# Iceberg: set -a; source .env.dev_databricks_iceberg; set +a
# (optionally export FF_DBR_TABLE_FORMAT=delta|iceberg to override the table format)

# Or BigQuery (requires GCP setup)
# set -a; source .env.dev_bigquery_pandas; set +a
# set -a; source .env.dev_bigquery_bigframes; set +a

Run the full snapshot demo for the selected engine:

# One-shot: clean → seed → run (pipeline) → snapshot → dag → test
make snapshot_demo ENGINE=duckdb
# make snapshot_demo ENGINE=postgres
# make snapshot_demo ENGINE=databricks_spark DBR_TABLE_FORMAT=delta
# make snapshot_demo ENGINE=databricks_spark DBR_TABLE_FORMAT=iceberg
# make snapshot_demo ENGINE=bigquery BQ_FRAME=bigframes

Under the hood this will typically do:

fft seed – materialize seed_users
fft run – build staging/mart views/tables (excluding snapshot models)
fft snapshot run – apply snapshot logic to users_clean_snapshot
fft dag – generate the DAG/site
fft test – run any configured DQ tests

Databricks table formats (parquet / Delta / Iceberg)¶

Just like the incremental demo, the snapshot project lets you flip Spark table formats without editing models. Pass DBR_TABLE_FORMAT=parquet|delta|iceberg to make snapshot_demo or export FF_DBR_TABLE_FORMAT when invoking fft directly. dev_databricks_parquet, dev_databricks_delta, and dev_databricks_iceberg each point to their own managed database / warehouse (snapshot_demo_parquet, snapshot_demo_delta, snapshot_demo_iceberg), so switching formats never reuses stale Hive metadata. The Iceberg profile wires in the catalog via spark.sql.catalog.iceberg.*; Delta still requires the delta-spark package.

Manual CLI examples:

# Parquet snapshots
FF_DBR_TABLE_FORMAT=parquet \
  FFT_ACTIVE_ENV=dev_databricks_parquet FF_ENGINE=databricks_spark \
  fft snapshot run . --select tag:example:snapshot_demo --select tag:engine:databricks_spark

# Delta Lake snapshots
FF_DBR_TABLE_FORMAT=delta \
  FFT_ACTIVE_ENV=dev_databricks_delta FF_ENGINE=databricks_spark \
  fft snapshot run . --select tag:example:snapshot_demo --select tag:engine:databricks_spark

# Iceberg snapshots
FF_DBR_TABLE_FORMAT=iceberg \
  FFT_ACTIVE_ENV=dev_databricks_iceberg FF_ENGINE=databricks_spark \
  fft snapshot run . --select tag:example:snapshot_demo --select tag:engine:databricks_spark

Or run only the snapshot step (after a normal fft run):

# DuckDB example
make run ENGINE=duckdb              # builds users_clean etc.
make snapshot ENGINE=duckdb         # runs only snapshot models

Or directly with fft:

# Only snapshot models (tagged example:snapshot_demo)
fft snapshot run . \
  --env dev_duckdb \
  --select tag:example:snapshot_demo --select tag:engine:duckdb

If your selection includes non-snapshot models, FFT will ignore them for the snapshot run.

Inspecting the snapshot table¶

After a couple of runs with changed data, use your engine to inspect users_clean_snapshot:

DuckDB (from the project root):

select *
from users_clean_snapshot
order by user_id, _ff_valid_from;  -- adjust column names to what you implement

Postgres / BigQuery / Databricks: the table name is the same; the schema/database/dataset follows the profile.

Typical patterns to explore:

Current records only (one row per user_id):

select *
from users_clean_snapshot
where _ff_is_current = true;

History of a single user:

select *
from users_clean_snapshot
where user_id = 42
order by _ff_valid_from;

This makes it easy to answer questions like “what did we know about this user on date X?”.

Snapshot CLI & retention¶

The snapshot demo uses the dedicated entry point:

fft snapshot run . --env dev_duckdb --select tag:example:snapshot_demo

In addition, the CLI supports retention and pruning flags (once implemented in your code base):

--prune – enables pruning of old snapshot rows.
--keep-last N – when used with --prune, keeps only the last N versions per key.
--dry-run – shows which rows would be pruned without actually deleting anything.

Example:

# Keep only the last 3 versions per user_id; just show the plan
fft snapshot run . \
  --env dev_duckdb \
  --select tag:example:snapshot_demo \
  --prune --keep-last 3 --dry-run

# Apply the pruning for real
fft snapshot run . \
  --env dev_duckdb \
  --select tag:example:snapshot_demo \
  --prune --keep-last 3

This is especially useful when snapshot tables grow large and you only care about a bounded history window for most use cases.

Interaction with regular runs¶

Two important rules:

Snapshot models are not part of fft run They’re intentionally excluded to keep regular pipeline runs stateless and predictable. If a snapshot model is accidentally selected in fft run, FFT surfaces a clear error:

Snapshot models cannot be executed via 'fft run'. Use 'fft snapshot run' instead.

Snapshots depend on upstream models In the demo, users_clean_snapshot depends on users_clean.ff. The typical flow is:

fft run . --env dev_duckdb --select tag:example:basic_demo
fft snapshot run . --env dev_duckdb --select tag:example:snapshot_demo

fft run ensures users_clean is fresh.
fft snapshot run compares the new users_clean rows with the existing snapshot table and writes history changes.