# Contracts
FastFlowTransform supports data contracts: declarative expectations about your
tables and columns. Contracts are stored in YAML files and are compiled into
normal `fft test` checks.
You get:
- A place to describe the intended schema (types, nullability, enums, etc.)
- Automatic data-quality tests derived from those contracts
- Optional checks for the physical DB data type (per engine)
Contracts live in two places:
- Per-table: `models/**/<table>.contracts.yml`
- Project-level defaults: `contracts.yml` at the project root
## Per-table contracts (`*.contracts.yml`)

For each logical table you can create a `*.contracts.yml` file under `models/`.

Convention:
- File name: ends with `.contracts.yml`
- Location: anywhere under `models/`
- Each file describes exactly one table
Example:

```yaml
# models/staging/customers.contracts.yml
version: 1
table: customers
columns:
  customer_id:
    type: integer
    physical:
      duckdb: BIGINT
      postgres: integer
      bigquery: INT64
      snowflake_snowpark: NUMBER
      databricks_spark: BIGINT
    nullable: false
    unique: true
  name:
    type: string
    nullable: false
  status:
    type: string
    nullable: false
    enum:
      - active
      - inactive
  created_at:
    type: timestamp
    nullable: false
```
The `table` name should match the logical relation name you use in your models
(e.g. `relation_for("customers")`).
---
## Column attributes
Each entry under `columns:` is a **column contract**.
Supported attributes:
```yaml
columns:
  some_column:
    type: string                 # optional semantic type
    physical:                    # optional physical DB type(s)
      duckdb: VARCHAR
      postgres: text
    nullable: false              # nullability contract
    unique: true                 # uniqueness contract
    enum: [a, b, c]              # allowed values
    regex: "^[A-Z]{2}[0-9]{4}$"  # regex pattern
    min: 0                       # numeric min (inclusive)
    max: 100                     # numeric max (inclusive)
    description: "Human note"    # free-form description
```
### `type` (semantic type)

Free-form semantic type hint, such as:
- `integer`
- `string`
- `timestamp`
- `boolean`
- …
Right now this is documentation / intent only; it does not generate tests by itself. Use it to communicate intent and align with your physical types.
### `physical` (engine-specific physical DB type)

`physical` describes the actual DB type of the column, per engine.

There are two forms:

1) Shorthand string

```yaml
physical: BIGINT
```

This is interpreted as:

```yaml
physical:
  default: BIGINT
```

2) Per-engine mapping

```yaml
physical:
  default: BIGINT        # fallback if no engine-specific key is set
  duckdb: BIGINT
  postgres: integer
  bigquery: INT64
  snowflake_snowpark: NUMBER
  databricks_spark: BIGINT
```
Supported keys:

| Key | Engine / executor |
|---|---|
| `default` | Fallback for all engines |
| `duckdb` | DuckDB executor |
| `postgres` | Postgres executor |
| `bigquery` | BigQuery executors |
| `snowflake_snowpark` | Snowflake Snowpark executor |
| `databricks_spark` | Databricks / Spark executor |
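The shorthand expansion and the `default` fallback can be illustrated with a minimal Python sketch. The helper names `normalize_physical` and `resolve_physical` are hypothetical and not part of FFT's API; the sketch only mirrors the lookup rules described above (engine key first, then `default`, otherwise no expected type).

```python
from typing import Mapping, Optional, Union

# Hypothetical type alias: a `physical` value is a shorthand string,
# a per-engine mapping, or absent.
PhysicalSpec = Union[str, Mapping[str, str], None]


def normalize_physical(physical: PhysicalSpec) -> dict:
    """Expand the shorthand string form into a mapping with a `default` key."""
    if physical is None:
        return {}
    if isinstance(physical, str):
        return {"default": physical}
    return dict(physical)


def resolve_physical(physical: PhysicalSpec, engine: str) -> Optional[str]:
    """Return the expected type for `engine`, falling back to `default`."""
    mapping = normalize_physical(physical)
    return mapping.get(engine, mapping.get("default"))


print(resolve_physical("BIGINT", "postgres"))  # BIGINT (shorthand acts as default)
print(resolve_physical({"default": "BIGINT", "bigquery": "INT64"}, "bigquery"))  # INT64
print(resolve_physical({"duckdb": "BIGINT"}, "postgres"))  # None: no type for this engine
```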
**Important:** The value here must match what your warehouse reports in its catalog / information schema for that column (e.g. `INT64` in BigQuery, `NUMBER` in Snowflake, etc.).

Each `physical` contract is turned into a `column_physical_type` test.
If the engine does not yet support physical type introspection, the test will
fail with a clear “engine not yet supported” message instead of silently
passing.
### Engine-canonical type names

Physical type comparisons use the canonical type strings reported by the engine. That means:

- Some engines report a longer canonical name in their catalogs for a type that is usually written with a short alias.
- Example (Postgres):
  - `timestamp` is an alias for `timestamp without time zone`
  - `timestamptz` is an alias for `timestamp with time zone`
- FFT compares types after engine-specific canonicalization, so contracts can use common names like `timestamp` / `timestamptz` while still matching what Postgres reports.
If you see a mismatch like:

expected `timestamp`, got `timestamp without time zone`

it means your Postgres executor/runtime is not canonicalizing types yet (or you’re using raw `information_schema.data_type`). In that case, update Postgres type introspection to use `pg_catalog.format_type(...)` so comparisons are consistent.
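The canonicalization idea can be sketched as a small alias table applied to both sides of the comparison. This is a hypothetical illustration, not FFT's implementation: `PG_ALIASES`, `canonicalize_pg_type` and `pg_types_match` are made-up names, and the table covers only a few common Postgres aliases, mapped to the names that `pg_catalog.format_type` reports for them.

```python
# Hypothetical alias table: short names users write -> names Postgres reports.
PG_ALIASES = {
    "timestamp": "timestamp without time zone",
    "timestamptz": "timestamp with time zone",
    "int": "integer",
    "int2": "smallint",
    "int4": "integer",
    "int8": "bigint",
    "float4": "real",
    "float8": "double precision",
    "bool": "boolean",
    "varchar": "character varying",
}


def canonicalize_pg_type(type_name: str) -> str:
    """Lower-case, collapse whitespace, then map known aliases to canonical names."""
    normalized = " ".join(type_name.lower().split())
    return PG_ALIASES.get(normalized, normalized)


def pg_types_match(expected: str, actual: str) -> bool:
    """Compare a contract type with a catalog type after canonicalizing both."""
    return canonicalize_pg_type(expected) == canonicalize_pg_type(actual)


print(pg_types_match("timestamp", "timestamp without time zone"))  # True
print(pg_types_match("timestamp", "timestamp with time zone"))     # False
```

Canonicalizing both sides, rather than only the contract value, keeps the comparison symmetric, so it does not matter which spelling appears in the contract or the catalog.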
### `nullable`

```yaml
nullable: false
```

- `nullable: false` → generates a `not_null` test for this column.
- `nullable: true` or omitted → no nullability test.
### `unique`

```yaml
unique: true
```

- `unique: true` → generates a `unique` test for this column.
- `unique: false` or omitted → no uniqueness test.
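To make the semantics of the generated `not_null` and `unique` tests concrete, here is a minimal sketch that runs them over an in-memory list of column values instead of a warehouse table. The function names are hypothetical. The sketch also assumes that NULLs are ignored by the uniqueness check (as SQL `UNIQUE` constraints do); FFT's actual handling of NULLs in `unique` is not documented here.

```python
from collections import Counter
from typing import Any, List, Sequence


def not_null_failures(values: Sequence[Any]) -> int:
    """Count NULLs (None); a not_null test passes when this is 0."""
    return sum(1 for v in values if v is None)


def unique_failures(values: Sequence[Any]) -> List[Any]:
    """Return values that occur more than once; a unique test passes when empty.

    Assumption: NULLs are skipped, so several NULLs are not duplicates.
    """
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


print(not_null_failures([1, None, 3]))     # 1
print(unique_failures([1, 2, 2, None, None]))  # [2]
```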
### `enum`

```yaml
enum:
  - active
  - inactive
  - pending
```

`enum` defines a finite set of allowed values and generates an
`accepted_values` test.

You can also use a single scalar:

```yaml
enum: active
```

which is treated as `["active"]`.
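The scalar-to-list rule and the `accepted_values` check can be sketched in a few lines of Python. `normalize_enum` and `accepted_values_failures` are hypothetical names. The sketch skips NULLs and leaves them to the `nullable` contract; that is an assumption about FFT's behavior, not something stated above.

```python
from typing import Any, Iterable, List


def normalize_enum(enum: Any) -> List[Any]:
    """A scalar enum such as `active` is treated as a one-element list."""
    if isinstance(enum, (list, tuple)):
        return list(enum)
    return [enum]


def accepted_values_failures(values: Iterable[Any], enum: Any) -> List[Any]:
    """Return non-null values outside the allowed set (NULLs are skipped)."""
    allowed = set(normalize_enum(enum))
    return [v for v in values if v is not None and v not in allowed]


print(normalize_enum("active"))  # ['active']
print(accepted_values_failures(["active", "pending", None], ["active", "inactive"]))  # ['pending']
```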
### `regex`

```yaml
regex: "^[^@]+@[^@]+$"
```

`regex` defines a pattern that all non-null values must match. It generates a
`regex_match` test.
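A `regex_match` check with Python's `re` module might look like the following sketch. `regex_failures` is a hypothetical name. The sketch uses `re.search`, which is an assumption: FFT could use `re.match` or `re.fullmatch` instead. Anchoring patterns with `^` and `$`, as the examples on this page do, makes the choice largely irrelevant.

```python
import re
from typing import Iterable, List, Optional


def regex_failures(values: Iterable[Optional[str]], pattern: str) -> List[str]:
    """Return non-null values that do not match `pattern` (NULLs are skipped)."""
    compiled = re.compile(pattern)
    return [v for v in values if v is not None and compiled.search(v) is None]


print(regex_failures(["a@b.com", "no-at-sign", None], r"^[^@]+@[^@]+$"))  # ['no-at-sign']
```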
### `min` / `max`

```yaml
min: 0
max: 100
```

`min` and `max` define an inclusive numeric range and generate a `between` test.

You can specify just one side:

```yaml
min: 0      # only lower bound
# or
max: 100    # only upper bound
```
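The inclusive range and the "only one side" form map naturally onto optional bounds. The sketch below uses a hypothetical `between_failures` helper that treats a missing bound as unchecked and skips NULLs (the NULL handling is an assumption).

```python
from typing import Iterable, List, Optional


def between_failures(
    values: Iterable[Optional[float]],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> List[float]:
    """Return non-null values outside the inclusive [min_value, max_value] range.

    A bound left as None is not checked, which matches a contract with only
    `min` or only `max`.
    """
    failures = []
    for v in values:
        if v is None:
            continue
        if min_value is not None and v < min_value:
            failures.append(v)
        elif max_value is not None and v > max_value:
            failures.append(v)
    return failures


print(between_failures([0, 50, 100, 101, -1, None], 0, 100))  # [101, -1]
```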
### `description`

```yaml
description: "Customer signup timestamp in UTC"
```

Free-form description field. This does not generate tests; it’s for docs / tooling.
## Project-level contracts (`contracts.yml`)

You can define project-wide defaults in a single `contracts.yml` file at
the project root.

This file does not describe concrete tables; it holds defaults (and, optionally, the enforcement configuration).

Example:

```yaml
# contracts.yml
version: 1
defaults:
  columns:
    # All *_id columns are non-null integers with engine-specific types
    - match:
        name: ".*_id$"
      type: integer
      nullable: false
      physical:
        duckdb: BIGINT
        postgres: integer
        bigquery: INT64
    # created_at should always be a non-null timestamp
    - match:
        name: "^created_at$"
      type: timestamp
      nullable: false
```
### `contracts.yml` enforcement configuration

Example:

```yaml
version: 1
defaults:
  columns:
    - match:
        name: ".*_id$"
      type: integer
      nullable: false
enforcement:
  # Modes: off | verify | cast
  default_mode: off
  # If true, contract enforcement only cares about declared columns.
  # Extra columns produced by the model are allowed.
  allow_extra_columns: true
  # Optional per-table overrides (by logical relation name)
  tables:
    mart_users_by_domain:
      mode: verify
      allow_extra_columns: true
    mart_latest_signup:
      mode: cast
      allow_extra_columns: true
```
Rules:
- `enforcement.default_mode` applies to all tables unless overridden.
- `enforcement.tables.<table>.mode` overrides the default for a single table.
- `allow_extra_columns` controls whether the model output may contain columns not listed in the contract:
  - `true`: extra columns are ignored by enforcement (but still exist in the table)
  - `false`: extra columns fail enforcement
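The override rules can be sketched as a lookup over the parsed `enforcement` mapping. `resolve_enforcement` is a hypothetical helper, and two points are assumptions: that a per-table `allow_extra_columns` overrides the project-level value, and the fallbacks used when keys are missing (`off`, `True`). The sketch also accepts `False` as `off`, because YAML 1.1 loaders such as PyYAML parse a bare `off` (as in `default_mode: off`) as the boolean `false`.

```python
from typing import Any, Dict, Mapping

VALID_MODES = {"off", "verify", "cast"}


def _normalize_mode(value: Any) -> str:
    """Map a raw YAML value to one of the three documented modes."""
    # YAML 1.1 loaders such as PyYAML parse a bare `off` as the boolean False.
    if value is False:
        return "off"
    if not isinstance(value, str) or value not in VALID_MODES:
        raise ValueError(f"unknown enforcement mode: {value!r}")
    return value


def resolve_enforcement(enforcement: Mapping[str, Any], table: str) -> Dict[str, Any]:
    """Return the effective mode and allow_extra_columns for one table.

    Hypothetical sketch; fallbacks for missing keys ("off", True) are assumptions.
    """
    tables = enforcement.get("tables") or {}
    override = tables.get(table) or {}
    mode = override.get("mode", enforcement.get("default_mode", "off"))
    allow_extra = override.get(
        "allow_extra_columns", enforcement.get("allow_extra_columns", True)
    )
    return {"mode": _normalize_mode(mode), "allow_extra_columns": bool(allow_extra)}


# Mirrors the example config as PyYAML would load it (`off` -> False).
config = {
    "default_mode": False,
    "allow_extra_columns": True,
    "tables": {"mart_latest_signup": {"mode": "cast", "allow_extra_columns": True}},
}
print(resolve_enforcement(config, "mart_latest_signup"))  # mode 'cast'
print(resolve_enforcement(config, "stg_orders"))          # mode 'off'
```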
### Column match rules

Each entry under `defaults.columns` is a column default rule:

```yaml
- match:
    name: "regex on column name"   # required
    table: "regex on table name"   # optional
  type: ...
  physical: ...
  nullable: ...
  unique: ...
  enum: ...
  regex: ...
  min: ...
  max: ...
  description: ...
```

- `match.name`: required regex applied to the column name.
- `match.table`: optional regex applied to the table name.

All the other fields are the same as in `*.contracts.yml`. They act as
defaults.
### How defaults are applied

For each column contract from a per-table file:
- All `defaults.columns` rules are evaluated in file order.
- A rule applies if both:
  - `match.name` matches the column name, and
  - `match.table` is empty or matches the table name.
- For every applicable rule:
  - Fields that are currently `null` / unset on the column are filled from the rule.
  - Fields that are already set on the column are not overridden.

Per-table contracts always win. Defaults only fill in missing values.
Example:

```yaml
# contracts.yml
defaults:
  columns:
    - match:
        name: ".*_id$"
      nullable: false
      physical: BIGINT
```

```yaml
# models/orders.contracts.yml
version: 1
table: orders
columns:
  customer_id:
    # nullable unspecified → inherited as false from defaults
    physical:
      duckdb: BIGINT
      postgres: integer   # overrides default
```

Effective contract for `orders.customer_id`:

```yaml
type: null
nullable: false          # from defaults
physical:
  duckdb: BIGINT         # from per-table
  postgres: integer      # from per-table
  default: BIGINT        # from defaults.physical (other engines)
unique: null
...
```
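The merge procedure can be sketched in Python. `apply_defaults` is a hypothetical helper, not FFT's API. Two details are assumptions: rule patterns are applied with `re.search`, and `physical` is merged per engine key (a rule only fills engine keys the column lacks), which is inferred from the worked example where `default: BIGINT` appears next to the per-table `duckdb` and `postgres` entries.

```python
import re
from typing import Any, Dict, List, Mapping

# Scalar fields that defaults may fill in; `physical` is merged per engine key.
FIELDS = ("type", "nullable", "unique", "enum", "regex", "min", "max", "description")


def _physical_dict(physical: Any) -> Dict[str, str]:
    """Treat a shorthand string as {"default": ...}."""
    if physical is None:
        return {}
    if isinstance(physical, str):
        return {"default": physical}
    return dict(physical)


def apply_defaults(table: str, column: str, contract: Mapping[str, Any],
                   rules: List[Mapping[str, Any]]) -> Dict[str, Any]:
    """Fill unset fields of one column contract from matching default rules."""
    effective = {f: contract.get(f) for f in FIELDS}
    physical = _physical_dict(contract.get("physical"))
    for rule in rules:  # rules are evaluated in file order
        match = rule.get("match") or {}
        if not re.search(match["name"], column):
            continue
        table_pattern = match.get("table")
        if table_pattern and not re.search(table_pattern, table):
            continue
        for f in FIELDS:
            # Only fill fields that are still unset; values already set win.
            if effective[f] is None and rule.get(f) is not None:
                effective[f] = rule[f]
        # Rule physical keys only fill engine keys the column does not define.
        for engine, type_name in _physical_dict(rule.get("physical")).items():
            physical.setdefault(engine, type_name)
    effective["physical"] = physical or None
    return effective


rules = [{"match": {"name": ".*_id$"}, "nullable": False, "physical": "BIGINT"}]
column = {"physical": {"duckdb": "BIGINT", "postgres": "integer"}}
print(apply_defaults("orders", "customer_id", column, rules))
```

Because earlier rules fill fields first and later rules never overwrite them, the first matching rule in file order wins for any given field.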
## How contracts become tests

Contracts are turned into regular `TestSpec` entries used by `fft test`.

For each column:

| Contract field | Generated test type | Notes |
|---|---|---|
| `physical` | `column_physical_type` | Uses engine-specific mapping |
| `nullable: false` | `not_null` | |
| `unique: true` | `unique` | |
| `enum` | `accepted_values` | |
| `min` / `max` | `between` | Inclusive range |
| `regex` | `regex_match` | Python regex |
All contract-derived tests:
- Use severity `error` by default (currently)
- Receive the tag `contract` (so you can filter on them later)
Example for `customers`:

```yaml
# models/staging/customers.contracts.yml
version: 1
table: customers
columns:
  customer_id:
    nullable: false
    unique: true
    physical:
      duckdb: BIGINT
  status:
    enum: [active, inactive]
```

This yields tests roughly equivalent to:

```text
customers.customer_id not_null (tags: contract)
customers.customer_id unique (tags: contract)
customers.customer_id column_physical_type (tags: contract)
customers.status accepted_values (tags: contract)
```
You don’t need to write those tests yourself; they’re derived automatically from the contract files.
## Runtime enforcement (optional)

In addition to turning contracts into `fft test` checks, FastFlowTransform can enforce contracts at runtime while building models.

Runtime enforcement means:
- FFT can verify that the materialized table matches the contract schema, and fail the run if not.
- FFT can cast the model output into the declared physical types before creating the table.

This is configured in the project-level `contracts.yml` under `enforcement`.
### Enforcement modes

Contract enforcement supports three modes:
- `off`: Do not enforce at build time. (Contracts may still generate tests.)
- `verify`: Build the table normally, then verify that the physical schema matches the contract.
- `cast`: Build the table by selecting from your model and casting contract columns into their declared physical types, then verify.

`cast` is useful when your warehouse would infer “close but not exact” types (e.g. `COUNT(*)` becoming a sized numeric type) and you want stable physical types across engines.
### Failure messages

If enforcement fails, FFT raises an error that describes the problem, for example:
- Missing/extra columns
- Type mismatch (expected vs actual physical type)
- Non-null/unique contract failures (if those are enforced at runtime in your setup)

The error includes the table name and a list of mismatches.
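A `verify`-style comparison between the contract and the introspected schema can be sketched as follows. `verify_schema`, `raise_if_violated` and `ContractViolation` are hypothetical names. Comparing types case-insensitively is a simplification: real comparisons use engine-specific canonicalization. A contract column without a physical type for the current engine (`None`) is only checked for presence.

```python
from typing import List, Mapping, Optional


class ContractViolation(Exception):
    """Hypothetical error type used only in this sketch."""


def verify_schema(expected: Mapping[str, Optional[str]], actual: Mapping[str, str],
                  allow_extra_columns: bool) -> List[str]:
    """Compare contract columns/types with introspected ones and list mismatches."""
    problems: List[str] = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif expected_type is not None and actual[column].lower() != expected_type.lower():
            problems.append(
                f"type mismatch on {column}: expected {expected_type}, got {actual[column]}"
            )
    if not allow_extra_columns:
        problems.extend(f"extra column: {c}" for c in actual if c not in expected)
    return problems


def raise_if_violated(table: str, problems: List[str]) -> None:
    """Raise one error that names the table and lists every mismatch."""
    if problems:
        details = "\n".join(f"  - {p}" for p in problems)
        raise ContractViolation(f"contract enforcement failed for {table}:\n{details}")


expected = {"customer_id": "BIGINT", "status": "VARCHAR"}
actual = {"customer_id": "INTEGER", "status": "varchar", "note": "VARCHAR"}
print(verify_schema(expected, actual, allow_extra_columns=False))
```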
### Enforcement with incremental models

When a model is materialized as incremental, FFT applies enforcement to the incremental write path, not only to full refreshes.

Typical behavior:
- On the first run, the model creates the target relation (full refresh behavior) and enforcement is applied.
- On subsequent runs, FFT computes a delta dataset and writes it using the engine’s incremental strategy (insert/merge/delete+insert, etc.).
- Enforcement is applied so the target table remains compatible with the contract.

Practical recommendations:
- If the incremental model relies on `unique_key`, make sure your source change simulation does not introduce duplicated keys in the delta.
- For “update simulation” in demos, prefer a second full seed file that represents the entire source after the update (not just appended rows), then rerun incremental. This produces a realistic “source changed” scenario without creating duplicates.
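A quick way to follow the first recommendation is to check a delta for repeated keys before rerunning the incremental model. This is a hypothetical pre-check, not an FFT feature. It assumes the delta is available as a list of row dictionaries and that `unique_key` is a list of column names, so composite keys work through tuples.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping, Sequence, Tuple


def duplicate_keys(rows: Iterable[Mapping[str, Any]],
                   unique_key: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Return key tuples that appear more than once in an incremental delta."""
    counts = Counter(tuple(row[k] for k in unique_key) for row in rows)
    return [key for key, n in counts.items() if n > 1]


delta = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
print(duplicate_keys(delta, ["id"]))  # [(1,)]
```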
### Tests vs runtime enforcement

Contracts can be used in two independent ways:
- Tests (`fft test`): contracts generate test specs like `not_null`, `unique`, `accepted_values`, `regex_match`, and `column_physical_type`.
- Runtime enforcement (`fft run`): enforcement runs during model materialization and can fail the run early.

You can use either one alone, or both together.
### Enforcement for SQL models

When enforcing contracts for a SQL model:

- `verify` mode:
  1. FFT creates the table/view normally from the model SQL.
  2. FFT introspects the created object and compares the physical schema to the contract.
- `cast` mode:
  1. FFT wraps the model SQL in a projection that casts the declared columns:

     ```sql
     select
       cast(col_a as <physical-type>) as col_a,
       cast(col_b as <physical-type>) as col_b,
       ...
       -- optionally include extra columns if allow_extra_columns=true
     from (<model select>) as src
     ```

  2. FFT creates the table from that wrapped `SELECT`.
  3. FFT verifies the resulting physical schema.

Notes:
- Enforcement is best-effort: if a contract has no physical types for the current engine, `cast` mode cannot enforce and will fail with a clear error.
- `allow_extra_columns=true` means non-contracted columns are carried through unchanged.
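The `cast` wrapping step can be sketched as a string builder. `build_cast_select` and `CastContractError` are hypothetical names. The sketch assumes the contract types were already resolved for the current engine (for example, with the `default` fallback applied) and that the model's output column names are known. It does not quote identifiers, which a real implementation would need to do in an engine-specific way.

```python
from typing import List, Mapping, Optional, Sequence


class CastContractError(Exception):
    """Hypothetical error raised when cast mode lacks a physical type."""


def build_cast_select(model_sql: str, contract_types: Mapping[str, Optional[str]],
                      model_columns: Sequence[str], allow_extra_columns: bool) -> str:
    """Wrap a model SELECT so contract columns are cast to their physical types."""
    missing = [col for col, typ in contract_types.items() if not typ]
    if missing:
        # cast mode cannot guess a target type, so fail loudly instead.
        raise CastContractError(
            f"no physical type for the current engine: {', '.join(missing)}"
        )
    items: List[str] = [f"cast({col} as {typ}) as {col}" for col, typ in contract_types.items()]
    if allow_extra_columns:
        # Non-contracted columns are carried through unchanged.
        items += [col for col in model_columns if col not in contract_types]
    return "select\n  " + ",\n  ".join(items) + f"\nfrom ({model_sql}) as src"


print(build_cast_select(
    "select id, n, note from t",
    {"id": "BIGINT", "n": "INTEGER"},
    ["id", "n", "note"],
    allow_extra_columns=True,
))
```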
### Enforcement for Python models

For Python models (pandas / Spark / Snowpark / BigFrames):
- FFT first materializes the DataFrame result according to the executor.
- If enforcement is enabled, the runtime contracts layer may:
  - Stage the DataFrame into a temporary table (engine-specific)
  - Re-create the target table using casts (`cast` mode)
  - Or only verify the schema (`verify` mode)

This allows a consistent enforcement mechanism even when the model result is not expressed as SQL.
## Using contracts with `fft test`

The high-level flow:
- You define `*.contracts.yml` files under `models/` and, optionally, a root `contracts.yml` with defaults.
- `fft test` loads:
  - all per-table contracts
  - project-level defaults
- Contracts are expanded into test specs.
- Tests are executed like any other `fft test` checks.

If a contract file is malformed (YAML, duplicate keys, or schema), FFT raises a
friendly `ContractsConfigError` with a hint. The test run will fail until the
file is fixed, rather than silently skipping it.
## Current limitations

A few things contracts do not do yet:
- Contracts do not change DDL: unless `cast` enforcement is enabled, tables are still created with the types inferred by the warehouse from your `SELECT`. `type` (semantic type) is never used to alter the schema; it is for intent / documentation.
- Physical type checks require engine support:
  - Currently, only engines that can introspect their `INFORMATION_SCHEMA` and expose that to FFT can fully enforce `column_physical_type`.
  - Other engines may reject such tests with a clear “engine not supported” message.
- Enforcement behavior can differ by engine depending on what the executor can introspect and how it stages/casts data.
- `cast` mode requires explicit `physical` types for the current engine.
- Some warehouses expose “decorated” physical types (e.g. `VARCHAR(16777216)`, `NUMBER(18,0)`) rather than a short base type name. Contracts should match the canonical/normalized representation used by the engine implementation.