Cache & Parallelism
π docs/Cache_and_Parallelism.md¶
# Parallelism & Cache (FastFlowTransform v0.3)
FastFlowTransform 0.3 introduces a level-wise parallel scheduler and a build cache driven by stable fingerprints. This document explains **how parallel execution works**, **when nodes are skipped**, the exact **fingerprint formula**, and the **meta table** written after successful builds.
---
## Table of Contents
- [Parallel Scheduler](#parallel-scheduler)
- [Cache Policy](#cache-policy)
- [Fingerprint Formula](#fingerprint-formula)
- [Meta Table Schema](#meta-table-schema)
- [CLI Recipes](#cli-recipes)
- [Troubleshooting & FAQ](#troubleshooting--faq)
- [Example: simple_duckdb](#example-simple_duckdb)
- [Appendix: Environment Inputs](#appendix-environment-inputs)
---
## Parallel Scheduler
FastFlowTransform splits the DAG into **levels** (all nodes that can run together without violating dependencies). Within a level, up to `--jobs` nodes execute in **parallel**.
- Dependencies are **never** violated.
- `--keep-going`: tasks already started in a level finish; **subsequent levels wonβt start** if any task in the current level fails.
- Logs are serialized through an internal queue to keep lines readable and per-node timing visible.
**Quick start**
```bash
# Run with 4 workers per level
fft run . --env dev --jobs 4
# Keep tasks in the same level running even if one fails
fft run . --env dev --jobs 4 --keep-going
Cache Policy¶
The cache decides whether a node can be skipped when nothing relevant changed. Modes:
--cache=off  # always build
--cache=rw   # default; skip on match; write cache after build
--cache=ro   # skip on match; on miss build but don't write cache
--cache=wo   # always build and write cache
--rebuild <glob>  # ignore cache for matching nodes
--no-cache       # alias for --cache=off
Skip condition¶
A node is skipped iff:
- The current fingerprint matches the on-disk cache value, and
 - The physical relation exists on the target engine.
 
If the relation was dropped externally, FastFlowTransform will rebuild even if the fingerprint matches.
Fingerprint Formula¶
Fingerprints are stable hashes that change on any relevant input:
- 
SQL models:
fingerprint_sql(node, rendered_sql, env_ctx, dep_fps) - 
Uses rendered SQL (after Jinja), not the raw template.
 - 
Python models:
fingerprint_py(node, func_src, env_ctx, dep_fps) - 
Uses
inspect.getsource(func)with a file-content fallback if needed. 
env_ctx includes:
engine(e.g.,duckdb,postgres,bigquery)profile_name(CLI--env)- Selected environment entries: all 
FF_*keys (key + value) - A normalized portion of 
sources.yml(sorted keys/dump) 
dep_fps are upstream fingerprints; any upstream change invalidates downstream fingerprints.
Properties
- Same inputs β same hash.
 - Minimal change in SQL/function β different hash.
 - Dependency changes propagate downstream.
 
Meta Table Schema¶
After a successful build, FastFlowTransform writes a per-node audit row:
_ff_meta (
  node_name   TEXT/STRING,   -- logical name, e.g. "users.ff"
  relation    TEXT/STRING,   -- physical table/view, e.g. "users"
  fingerprint TEXT/STRING,
  engine      TEXT/STRING,
  built_at    TIMESTAMP
)
Backends:
- DuckDB: table 
_ff_metainmain. - Postgres: table 
_ff_metain the active schema. - BigQuery: table 
<dataset>._ff_meta. 
Note: Skip logic uses the file-backed fingerprint cache and a direct relation existence check; the meta table is for auditing and tooling.
CLI Recipes¶
# First run β builds everything, writes cache and meta
fft run . --env dev --cache=rw
# No-op run β should skip all nodes (if nothing changed)
fft run . --env dev --cache=rw
# Force rebuild of a single model (ignores cache for it)
fft run . --env dev --cache=rw --rebuild marts_daily.ff
# Read-only cache (skip on match, build on miss, no writes)
fft run . --env dev --cache=ro
# Always build and write cache
fft run . --env dev --cache=wo
# Disable cache entirely
fft run . --env dev --no-cache
With parallelism:
fft run . --env dev --jobs 4
fft run . --env dev --jobs 4 --keep-going
Troubleshooting & FAQ¶
βWhy did it skip?β A skip requires a fingerprint match and an existing relation. Fingerprints include:
- rendered SQL / Python function source,
 sources.yml(normalized),- engine/profile,
 - all 
FF_*environment variables, - upstream fingerprints.
 
Any change in the above triggers a rebuild downstream.
βRelation missing but cache says skip?β We also check relation existence. If the table/view was dropped externally, FastFlowTransform will rebuild.
βMy logs interleave under parallelism.β
Logs are serialized via a queue; use -v / -vv for richer but still stable output. Each node prints start/end and duration; levels summarize.
βUtest cache?β
fft utest --cache {off|ro|rw} defaults to off for deterministic runs. With rw, expensive unit cases can be accelerated. Unit tests do not rely on the meta table by default.
Example: simple_duckdb¶
The demo contains two independent staging nodes (users.ff.sql, orders.ff.sql). They run in parallel within the same level.
Makefile targets:
run_parallel:
    FF_ENGINE=duckdb FF_DUCKDB_PATH="$(DB)" fft run "$(PROJECT)" --env dev --jobs 4
cache_rw_first:
    FF_ENGINE=duckdb FF_DUCKDB_PATH="$(DB)" fft run "$(PROJECT)" --env dev --cache=rw
cache_rw_second:
    FF_ENGINE=duckdb FF_DUCKDB_PATH="$(DB)" fft run "$(PROJECT)" --env dev --cache=rw
cache_invalidate_env:
    FF_ENGINE=duckdb FF_DUCKDB_PATH="$(DB)" FF_DEMO_TOGGLE=1 fft run "$(PROJECT)" --env dev --cache=rw
Appendix: Environment Inputs¶
Only environment variables with the FF_ prefix affect fingerprints (keys and values). If you change one (e.g., FF_RUN_DATE, FF_REGION), fingerprints change and downstream nodes rebuild.
# Will invalidate fingerprints and rebuild affected nodes
FF_RUN_DATE=2025-01-01 fft run . --env dev --cache=rw
---
### π `docs/index.md` β Link zum neuen Kapitel
```diff
--- a/docs/index.md
+++ b/docs/index.md
@@ -10,6 +10,7 @@
 - [User Guide β Operational](./Technical_Overview.md#part-i--operational-guide)
 - [Modeling Reference](./Config_and_Macros.md)
 - [Parallelism & Cache (v0.3)](./Cache_and_Parallelism.md)
 - [Developer Guide β Architecture & Internals](./Technical_Overview.md#part-ii--architecture--internals)