
API Demo Project

The examples/api_demo scenario demonstrates how FastFlowTransform blends local data, external APIs, and multiple execution engines. It highlights:

  • Hybrid data model: joins a local seed (crm.users) with live user data from JSONPlaceholder.
  • Multiple environments: switch between DuckDB, Postgres, Databricks Spark, BigQuery (pandas or BigFrames client), and Snowflake (Snowpark) using profiles.yml + .env.*.
  • HTTP integration: compare the built-in FastFlowTransform HTTP client (api_users_http) with a plain requests implementation (api_users_requests).
  • Offline caching & telemetry: inspect HTTP snapshots via run_results.json.
  • Engine-aware registration: scope Python models via engine_model and SQL models via config(engines=[...]) so only the active engine’s nodes load.

Data Model

  1. Seed staging – models/common/users.ff.sql

    {{ config(
        materialized='table',
        tags=[
            'example:api_demo',
            'scope:common',
            'kind:seed-consumer',
            'engine:duckdb',
            'engine:postgres',
            'engine:databricks_spark',
            'engine:bigquery',
            'engine:snowflake_snowpark'
        ]
    ) }}
    select id, email
    from {{ source('crm', 'users') }};
    
    Consumes sources.yml → crm.users (seeded from seeds/seed_users.csv).
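
    A dbt-style sketch of the matching sources.yml entry (the exact keys are an assumption; consult the project file for the authoritative layout):

    sources:
      - name: crm        # namespace used by source('crm', 'users')
        tables:
          - name: users  # seeded from seeds/seed_users.csv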

  2. API enrichment – engine-specific Python implementations under models/engines/<engine>/ (see the sketch below):

     • api_users_http.ff.py uses the built-in HTTP wrapper (fastflowtransform.api.http.get_df) with cache/offline support.
     • api_users_requests.ff.py uses raw requests for maximum flexibility.
     • Engine-specific callables are scoped with engine_model(only=...) (DuckDB/Postgres/Spark/Snowflake) or env_match={"FF_ENGINE": "bigquery", "FF_ENGINE_VARIANT": ...} (BigQuery pandas/BigFrames) to stay isolated per engine.
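
    A condensed sketch of the DuckDB variant of api_users_http.ff.py (the import paths and return contract are assumptions based on the names above, not verbatim project code):

    import pandas as pd

    from fastflowtransform import engine_model      # assumed import path
    from fastflowtransform.api.http import get_df   # built-in HTTP wrapper

    @engine_model(only=["duckdb"])                  # register for DuckDB only
    def api_users_http() -> pd.DataFrame:
        # get_df handles the HTTP call plus caching/offline snapshots.
        return get_df("https://jsonplaceholder.typicode.com/users")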

  3. Mart join – models/common/mart_users_join.ff.sql

    {{ config(engines=['duckdb','postgres','databricks_spark','bigquery','snowflake_snowpark']) }}
    {% set api_users_model = var('api_users_model', 'api_users_http') %}
    {% set api_users_refs = {
        'api_users_http': ref('api_users_http'),
        'api_users_requests': ref('api_users_requests')
    } %}
    {% set api_users_relation = api_users_refs.get(api_users_model, api_users_refs['api_users_http']) %}
    with a as (
      select u.id as user_id, u.email from {{ ref('users.ff') }} u
    ),
    b as (
      select * from {{ api_users_relation }}
    )
    select ...
    
    Ties everything together and exposes the var('api_users_model') hook to choose the HTTP implementation while still keeping literal ref('…') calls in the template (required for DAG detection). config(engines=[...]) keeps the SQL node registered only for the engines you list, preventing duplicate names across engine-specific folders.

Warning: The DAG builder only detects dependencies from literal ref('model_name') strings. A pure ref(api_users_model) (without the mapping shown above) compiles, but the graph would miss the edge to api_users_http/api_users_requests.

Profiles & Secrets

profiles.yml defines per-engine profiles that reference environment variables:

dev_duckdb:
  engine: duckdb
  duckdb:
    path: "{{ env('FF_DUCKDB_PATH', '.local/api_demo.duckdb') }}"

dev_postgres:
  engine: postgres
  postgres:
    dsn: "{{ env('FF_PG_DSN') }}"
    db_schema: "{{ env('FF_PG_SCHEMA', 'public') }}"

dev_bigquery_bigframes:
  engine: bigquery
  bigquery:
    project: "{{ env('FF_BQ_PROJECT') }}"
    dataset: "{{ env('FF_BQ_DATASET', 'api_demo') }}"
    location: "{{ env('FF_BQ_LOCATION', 'EU') }}"
    use_bigframes: true

dev_snowflake:
  engine: snowflake_snowpark
  snowflake_snowpark:
    account: "{{ env('FF_SF_ACCOUNT') }}"
    user: "{{ env('FF_SF_USER') }}"
    password: "{{ env('FF_SF_PASSWORD') }}"
    warehouse: "{{ env('FF_SF_WAREHOUSE', 'COMPUTE_WH') }}"
    database: "{{ env('FF_SF_DATABASE', 'API_DEMO') }}"
    schema: "{{ env('FF_SF_SCHEMA', 'API_DEMO') }}"
    role: "{{ env('FF_SF_ROLE', '') }}"
    allow_create_schema: true

.env.dev_* files supply the actual values (including .env.dev_snowflake for the Snowflake credentials). _load_dotenv_layered() loads them in priority order: repo .env → project .env → project .env.<env> → shell overrides (highest priority). Secrets stay out of version control.
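
For example, a .env.dev_duckdb could look like this (values are illustrative; FF_DUCKDB_PATH and the FF_HTTP_* names appear elsewhere in this document):

# .env.dev_duckdb (illustrative, kept out of version control)
FF_DUCKDB_PATH=.local/api_demo.duckdb
FF_HTTP_ALLOWED_DOMAINS=jsonplaceholder.typicode.com
FF_HTTP_OFFLINE=0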

BigQuery specifics

  • Set ENGINE=bigquery in the Makefile targets and choose a client via BQ_FRAME=pandas or BQ_FRAME=bigframes (default).
  • Required env vars: FF_BQ_PROJECT, FF_BQ_DATASET (defaults to api_demo), and optionally FF_BQ_LOCATION. Uncomment allow_create_dataset in profiles.yml for first-run convenience.
  • BigFrames variants ingest the HTTP payload into a pandas DataFrame, then wrap it as a BigFrames DataFrame (FFT’s get_df(..., output="bigframes") is not implemented yet).
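
A minimal sketch of that pandas-to-BigFrames hand-off (the HTTP step is a hypothetical stand-in for the model's fetch; bigframes.pandas.read_pandas is the standard BigFrames entry point):

import bigframes.pandas as bpd
import pandas as pd
import requests

# Hypothetical stand-in for the model's HTTP call.
resp = requests.get("https://jsonplaceholder.typicode.com/users", timeout=10)
resp.raise_for_status()

pdf = pd.DataFrame.from_records(resp.json())  # ingest payload into pandas
bdf = bpd.read_pandas(pdf)                    # wrap as a BigFrames DataFrame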

Makefile Workflow

The Makefile chooses the profile via ENGINE (duckdb/postgres/databricks_spark/bigquery/snowflake_snowpark) and wraps the main commands. For BigQuery, additionally set BQ_FRAME=pandas|bigframes:

ENGINE ?= duckdb

ifeq ($(ENGINE),duckdb)
  PROFILE_ENV = dev_duckdb
endif
...
ifeq ($(ENGINE),bigquery)
  ENGINE_TAG = engine:bigquery
  ifeq ($(BQ_FRAME),pandas)
    PROFILE_ENV = dev_bigquery_pandas
  else
    PROFILE_ENV = dev_bigquery_bigframes
  endif
endif
ifeq ($(ENGINE),snowflake_snowpark)
  PROFILE_ENV = dev_snowflake
endif

seed:
    uv run fft seed "$(PROJECT)" --env $(PROFILE_ENV)
run:
    env FFT_ACTIVE_ENV=$(PROFILE_ENV) ... uv run fft run ...

Common targets:

Target                                        Description
make ENGINE=duckdb seed                       Materialize seeds into DuckDB.
make ENGINE=postgres run                      Execute the full pipeline against Postgres.
make ENGINE=bigquery run BQ_FRAME=bigframes   Run against BigQuery (BigFrames client by default; set BQ_FRAME=pandas to switch).
make ENGINE=snowflake_snowpark run            Execute the API demo on Snowflake via Snowpark (install fastflowtransform[snowflake]).
make dag                                      Render documentation (site/dag/).
make api-run                                  Run only API models (uses the HTTP cache).
make api-offline                              Force offline mode (FF_HTTP_OFFLINE=1).
make api-show-http                            Display HTTP snapshot metrics via jq.

HTTP tuning parameters (FF_HTTP_ALLOWED_DOMAINS, cache dir, timeouts) live in .env and are appended via HTTP_ENV when running commands.
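
A hedged sketch of that wiring (FF_HTTP_TIMEOUT is an assumed variable name; only the allowed-domains and offline variables are confirmed elsewhere in this document):

# Illustrative only: forward HTTP tuning from .env into each command.
HTTP_ENV = FF_HTTP_ALLOWED_DOMAINS=$(FF_HTTP_ALLOWED_DOMAINS) FF_HTTP_TIMEOUT=$(FF_HTTP_TIMEOUT)

api-offline:
    env $(HTTP_ENV) FF_HTTP_OFFLINE=1 uv run fft run "$(PROJECT)" --env $(PROFILE_ENV)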

End-to-End Demo

  1. Select an engine: ENGINE=duckdb is the default. Set ENGINE=postgres, ENGINE=databricks_spark, ENGINE=bigquery BQ_FRAME=<pandas|bigframes>, or ENGINE=snowflake_snowpark to switch.
  2. Seed data: make seed
  3. Run pipeline: make run
  4. Explore docs: make dag → open examples/api_demo/site/dag/index.html
  5. Inspect HTTP usage: make api-show-http

This example demonstrates multi-engine configuration, environment-driven secrets, and API enrichment within FastFlowTransform.