# Sources Configuration

`sources.yml` declares external tables (seeds, raw inputs, lakehouse paths) that models can reference via `{{ source('group', 'table') }}`. This document covers the schema, engine overrides, file paths, and best practices.
## File Location

Place `sources.yml` at your project root (same level as `models/`). Example:

```text
project/
├── models/
├── sources.yml
└── seeds/
```
## YAML Schema (Version 2)

FastFlowTransform expects a dbt-style structure:

```yaml
version: 2
sources:
  - name: raw
    schema: staging                # default schema for this source group
    overrides:
      postgres:
        schema: raw_main           # engine-specific default override
    tables:
      - name: seed_users
        identifier: seed_users     # optional physical name
        overrides:
          duckdb:
            schema: main
          databricks_spark:
            format: delta
            location: "/mnt/delta/raw/seed_users"
```
## Fields

| Level | Field | Description |
|---|---|---|
| source | `name` | Logical group identifier referenced by `source('name', ...)`. |
| source | `schema` | Default target schema/database for the group. |
| source | `database` / `catalog` | Optional qualifiers per engine (BigQuery, Snowflake). |
| source | `overrides` | Map of engine → config snippet (schema overrides, formats, locations). |
| table | `name` | Logical table name (second argument in `source()`). |
| table | `identifier` | Physical name; defaults to `name` if omitted. |
| table | `location` | File/path location (used with `format`). |
| table | `format` | Ingestion format for engines supporting path-based sources (delta, parquet, …). |
| table | `options` | Dict of format options (Spark/Databricks). |
| table | `overrides` | Additional engine-specific settings merged with source-level overrides. |
Engine-specific overrides follow this merge order:

1. Source defaults (`schema`, `database`, …)
2. Source-level `overrides[engine]`
3. Table-level `overrides[engine]`
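The layered merge above can be illustrated with a small sketch. `resolve_config` is a hypothetical helper written for this page, not FastFlowTransform's actual API; it only demonstrates the "later layer wins" semantics:

```python
from copy import deepcopy

def resolve_config(source_defaults, source_overrides, table_overrides, engine):
    """Merge the three config layers; later layers win on key conflicts.

    Illustrative helper -- not part of FastFlowTransform itself.
    """
    config = deepcopy(source_defaults)
    config.update(source_overrides.get(engine, {}))  # source-level overrides[engine]
    config.update(table_overrides.get(engine, {}))   # table-level overrides[engine]
    return config

# Mirrors the YAML example: 'staging' default, a postgres override at the
# source level, and a duckdb override at the table level.
source_defaults = {"schema": "staging"}
source_overrides = {"postgres": {"schema": "raw_main"}}
table_overrides = {"duckdb": {"schema": "main"}}

print(resolve_config(source_defaults, source_overrides, table_overrides, "postgres")["schema"])  # raw_main
print(resolve_config(source_defaults, source_overrides, table_overrides, "duckdb")["schema"])    # main
```

For an engine with no overrides (e.g. `snowflake` here), the source defaults pass through unchanged.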
## Engine Behavior

- DuckDB / Postgres / BigQuery / Snowflake: expect `identifier` (plus `schema`/`database` where relevant). Path-based sources raise errors.
- Databricks Spark: supports `format` + `location`. The executor registers a temp view with optional `options` (e.g. `compression`).
## Path-Based Sources Example

```yaml
  - name: raw_events
    tables:
      - name: landing
        overrides:
          databricks_spark:
            format: json
            location: "abfss://landing@storage.dfs.core.windows.net/events/*.json"
            options:
              multiline: true
```
## Referencing Sources in Models

```sql
select id, email
from {{ source('raw', 'seed_users') }}
```

After rendering, the executor resolves the fully qualified relation or path, depending on the active engine.
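For SQL engines, resolution boils down to qualifying the identifier with the merged schema/database. The sketch below uses a hypothetical `qualify` helper (invented for this page; real engines add quoting and catalog-specific rules):

```python
def qualify(table_cfg, source_cfg, table_name):
    """Build a dotted relation name from merged source/table config.

    Illustrative only; not the executor's actual resolution code.
    """
    identifier = table_cfg.get("identifier", table_name)          # defaults to name
    schema = table_cfg.get("schema") or source_cfg.get("schema")  # table override wins
    database = table_cfg.get("database") or source_cfg.get("database")
    return ".".join(part for part in (database, schema, identifier) if part)

print(qualify({"schema": "main"}, {"schema": "staging"}, "seed_users"))
# main.seed_users
print(qualify({}, {"schema": "staging", "database": "analytics"}, "seed_users"))
# analytics.staging.seed_users
```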
## Seed Integration

When combined with `seeds/schema.yml`, you can map CSV/Parquet seeds into schemas per engine:

```yaml
targets:
  raw/users:
    schema: raw
    schema_by_engine:
      duckdb: main
      postgres: staging
```
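The lookup semantics of `schema_by_engine` can be sketched as a simple fallback (the `seed_schema` function is hypothetical, for illustration only): the engine-specific entry wins, and any engine not listed falls back to the generic `schema` key.

```python
def seed_schema(target, engine):
    """Pick the per-engine schema, falling back to the generic 'schema' key.

    Illustrative helper, not FastFlowTransform's API.
    """
    return target.get("schema_by_engine", {}).get(engine, target["schema"])

target = {"schema": "raw", "schema_by_engine": {"duckdb": "main", "postgres": "staging"}}
print(seed_schema(target, "duckdb"))    # main
print(seed_schema(target, "bigquery"))  # raw (not listed, falls back)
```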
## Validation & Errors

- Missing `identifier` and `location` produce `KeyError` during rendering.
- Unknown source/table names raise `KeyError` with suggestions.
- Unsupported path-based sources on an engine (`location` provided but no `format`) raise a descriptive `NotImplementedError`.
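A rough sketch of these checks, under the assumption (from the engine behavior section) that only Databricks Spark accepts path-based sources; `validate_table` and `PATH_ENGINES` are invented names for illustration, not the validator FastFlowTransform actually ships:

```python
PATH_ENGINES = {"databricks_spark"}  # assumption: only Spark supports path-based sources

def validate_table(table, engine):
    """Illustrative checks mirroring the documented error cases."""
    if not any(key in table for key in ("identifier", "name", "location")):
        raise KeyError("table needs an 'identifier' (or 'name') or a 'location'")
    if "location" in table:
        if engine not in PATH_ENGINES:
            raise NotImplementedError(f"{engine} does not support path-based sources")
        if "format" not in table:
            raise NotImplementedError("path-based source requires a 'format'")

# OK: path-based source with a format on a supporting engine.
validate_table({"name": "landing", "location": "/mnt/x", "format": "json"}, "databricks_spark")
```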
Keep `sources.yml` declarative, use engine overrides for schema differences, and lean on `.env` files where credentials or URIs vary per environment.