# API Demo Project
The `examples/api_demo` scenario demonstrates how FastFlowTransform blends local data, external APIs, and multiple execution engines. It highlights:

- **Hybrid data model**: joins a local seed (`crm.users`) with live user data from JSONPlaceholder.
- **Multiple environments**: switch between DuckDB, Postgres, Databricks Spark, BigQuery (pandas or BigFrames client), and Snowflake (Snowpark) using `profiles.yml` + `.env.*`.
- **HTTP integration**: compare the built-in FastFlowTransform HTTP client (`api_users_http`) with a plain `requests` implementation (`api_users_requests`).
- **Offline caching & telemetry**: inspect HTTP snapshots via `run_results.json`.
- **Engine-aware registration**: scope Python models via `engine_model` and SQL models via `config(engines=[...])` so only the active engine's nodes load.
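For orientation, here is an abridged project layout assembled from the paths referenced on this page (the real tree may contain more files):

```text
examples/api_demo/
├── Makefile
├── profiles.yml
├── sources.yml
├── seeds/seed_users.csv
└── models/
    ├── common/
    │   ├── users.ff.sql
    │   └── mart_users_join.ff.sql
    └── engines/<engine>/
        ├── api_users_http.ff.py
        └── api_users_requests.ff.py
```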
## Data Model
- **Seed staging** – `models/common/users.ff.sql` consumes `sources.yml → crm.users` (seeded from `seeds/seed_users.csv`):

  ```sql
  {{ config(
      materialized='table',
      tags=[
          'example:api_demo',
          'scope:common',
          'kind:seed-consumer',
          'engine:duckdb',
          'engine:postgres',
          'engine:databricks_spark',
          'engine:bigquery',
          'engine:snowflake_snowpark'
      ]
  ) }}
  select id, email
  from {{ source('crm', 'users') }};
  ```

- **API enrichment** – engine-specific Python implementations under `models/engines/<engine>/`:

  - `api_users_http.ff.py` uses the built-in HTTP wrapper (`fastflowtransform.api.http.get_df`) with cache/offline support.
  - `api_users_requests.ff.py` uses raw `requests` for maximum flexibility.
  - Engine-specific callables are scoped with `engine_model(only=...)` (DuckDB/Postgres/Spark/Snowflake) or `env_match={"FF_ENGINE": "bigquery", "FF_ENGINE_VARIANT": ...}` (BigQuery pandas/BigFrames) to stay isolated per engine; see the sketch after the warning below.

- **Mart join** – `models/common/mart_users_join.ff.sql` ties everything together and exposes the `var('api_users_model')` hook to choose the HTTP implementation while still keeping literal `ref('…')` calls in the template (required for DAG detection). `config(engines=[...])` keeps the SQL node registered only for the engines you list, preventing duplicate names across engine-specific folders:

  ```sql
  {{ config(engines=['duckdb','postgres','databricks_spark','bigquery','snowflake_snowpark']) }}

  {% set api_users_model = var('api_users_model', 'api_users_http') %}
  {% set api_users_refs = {
      'api_users_http': ref('api_users_http'),
      'api_users_requests': ref('api_users_requests')
  } %}
  {% set api_users_relation = api_users_refs.get(api_users_model, api_users_refs['api_users_http']) %}

  with a as (
      select u.id as user_id, u.email
      from {{ ref('users.ff') }} u
  ),
  b as (
      select * from {{ api_users_relation }}
  )
  select ...
  ```
> **Warning:** The DAG builder only detects dependencies from literal `ref('model_name')` strings. A pure `ref(api_users_model)` (without the mapping shown above) compiles, but the graph would miss the edge to `api_users_http` / `api_users_requests`.
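The engine scoping in the API enrichment item above might look like the following minimal sketch. Only the `engine_model(only=...)` / `env_match=...` keywords and the `fastflowtransform.api.http.get_df` helper come from this page; the import path, decorator usage, and endpoint URL are assumptions, not the project's actual code:

```python
# Hypothetical sketch of an engine-scoped model such as
# models/engines/duckdb/api_users_http.ff.py — illustrative only.
from fastflowtransform import engine_model       # assumed import path
from fastflowtransform.api.http import get_df    # built-in HTTP wrapper

@engine_model(only=["duckdb"])                   # register this node for DuckDB only
def api_users_http():
    # get_df (signature assumed) fetches the endpoint into a DataFrame and
    # honors the HTTP snapshot cache / offline mode described below.
    return get_df("https://jsonplaceholder.typicode.com/users")
```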
## Profiles & Secrets
`profiles.yml` defines per-engine profiles that reference environment variables:
```yaml
dev_duckdb:
  engine: duckdb
  duckdb:
    path: "{{ env('FF_DUCKDB_PATH', '.local/api_demo.duckdb') }}"

dev_postgres:
  engine: postgres
  postgres:
    dsn: "{{ env('FF_PG_DSN') }}"
    db_schema: "{{ env('FF_PG_SCHEMA', 'public') }}"

dev_bigquery_bigframes:
  engine: bigquery
  bigquery:
    project: "{{ env('FF_BQ_PROJECT') }}"
    dataset: "{{ env('FF_BQ_DATASET', 'api_demo') }}"
    location: "{{ env('FF_BQ_LOCATION', 'EU') }}"
    use_bigframes: true

dev_snowflake:
  engine: snowflake_snowpark
  snowflake_snowpark:
    account: "{{ env('FF_SF_ACCOUNT') }}"
    user: "{{ env('FF_SF_USER') }}"
    password: "{{ env('FF_SF_PASSWORD') }}"
    warehouse: "{{ env('FF_SF_WAREHOUSE', 'COMPUTE_WH') }}"
    database: "{{ env('FF_SF_DATABASE', 'API_DEMO') }}"
    schema: "{{ env('FF_SF_SCHEMA', 'API_DEMO') }}"
    role: "{{ env('FF_SF_ROLE', '') }}"
    allow_create_schema: true
```
`.env.dev_*` files supply the actual values (including `.env.dev_snowflake` for Snowflake credentials). `_load_dotenv_layered()` loads them in priority order: repo `.env` → project `.env` → `.env.<env>` → shell overrides (highest priority). Secrets stay out of version control.
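For example, a `.env.dev_snowflake` along these lines (placeholder values, not real credentials) would satisfy the Snowflake profile above:

```bash
# .env.dev_snowflake — placeholder values only; keep this file out of git
FF_SF_ACCOUNT=myorg-myaccount
FF_SF_USER=API_DEMO_USER
FF_SF_PASSWORD=change-me
FF_SF_WAREHOUSE=COMPUTE_WH
FF_SF_DATABASE=API_DEMO
FF_SF_SCHEMA=API_DEMO
FF_SF_ROLE=API_DEMO_ROLE
```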
### BigQuery specifics
- Set `ENGINE=bigquery` in the Makefile targets and choose a client via `BQ_FRAME=pandas` or `BQ_FRAME=bigframes` (default).
- Required env vars: `FF_BQ_PROJECT`, `FF_BQ_DATASET` (defaults to `api_demo`), and optionally `FF_BQ_LOCATION`. Uncomment `allow_create_dataset` in `profiles.yml` for first-run convenience.
- BigFrames variants ingest the HTTP payload into a pandas DataFrame, then wrap it as a BigFrames DataFrame (FFT's `get_df(..., output="bigframes")` is not implemented yet); a sketch of this hand-off follows below.
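A minimal sketch of that pandas → BigFrames hand-off, assuming a plain `requests` fetch of the JSONPlaceholder endpoint (the real model code may differ):

```python
import bigframes.pandas as bpd
import pandas as pd
import requests

# Fetch the payload locally, flatten it into pandas, then upload it
# to the BigQuery session as a BigFrames DataFrame.
resp = requests.get("https://jsonplaceholder.typicode.com/users", timeout=30)
resp.raise_for_status()
pdf = pd.json_normalize(resp.json())   # local pandas DataFrame
bdf = bpd.read_pandas(pdf)             # wrapped as a BigFrames DataFrame
```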
## Makefile Workflow
The `Makefile` chooses the profile via `ENGINE` (`duckdb`/`postgres`/`databricks_spark`/`bigquery`/`snowflake_snowpark`) and wraps the main commands. For BigQuery, set `BQ_FRAME=pandas|bigframes`:
```make
ENGINE ?= duckdb

ifeq ($(ENGINE),duckdb)
PROFILE_ENV = dev_duckdb
endif
...
ifeq ($(ENGINE),bigquery)
ENGINE_TAG = engine:bigquery
ifeq ($(BQ_FRAME),pandas)
PROFILE_ENV = dev_bigquery_pandas
else
PROFILE_ENV = dev_bigquery_bigframes
endif
endif
ifeq ($(ENGINE),snowflake_snowpark)
PROFILE_ENV = dev_snowflake
endif

seed:
	uv run fft seed "$(PROJECT)" --env $(PROFILE_ENV)

run:
	env FFT_ACTIVE_ENV=$(PROFILE_ENV) ... uv run fft run ...
```
Common targets:
| Target | Description |
| --- | --- |
| `make ENGINE=duckdb seed` | Materialize seeds into DuckDB. |
| `make ENGINE=postgres run` | Execute the full pipeline against Postgres. |
| `make ENGINE=bigquery run BQ_FRAME=bigframes` | Run against BigQuery (default BigFrames client; set `BQ_FRAME=pandas` to switch). |
| `make ENGINE=snowflake_snowpark run` | Execute the API demo on Snowflake via Snowpark (install `fastflowtransform[snowflake]`). |
| `make dag` | Render documentation (`site/dag/`). |
| `make api-run` | Run only API models (uses HTTP cache). |
| `make api-offline` | Force offline mode (`FF_HTTP_OFFLINE=1`). |
| `make api-show-http` | Display HTTP snapshot metrics via `jq`. |
HTTP tuning parameters (`FF_HTTP_ALLOWED_DOMAINS`, cache dir, timeouts) live in `.env` and are appended via `HTTP_ENV` when running commands.
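For instance, a `.env` fragment using only the variables named on this page (value formats are assumptions; the cache-dir and timeout keys are omitted rather than guessed):

```bash
# .env — HTTP tuning for the API models
FF_HTTP_ALLOWED_DOMAINS=jsonplaceholder.typicode.com
FF_HTTP_OFFLINE=0   # set to 1 (or run `make api-offline`) to force cached snapshots
```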
## End-to-End Demo
1. **Select engine**: `make ENGINE=duckdb` (default). Set `ENGINE=postgres`, `ENGINE=databricks_spark`, `ENGINE=bigquery BQ_FRAME=<pandas|bigframes>`, or `ENGINE=snowflake_snowpark` to switch.
2. **Seed data**: `make seed`
3. **Run pipeline**: `make run`
4. **Explore docs**: `make dag` → open `examples/api_demo/site/dag/index.html`
5. **Inspect HTTP usage**: `make api-show-http`
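Put together, one possible pass on the default DuckDB engine (targets exactly as listed in the table above):

```bash
make ENGINE=duckdb seed    # materialize seeds into DuckDB
make ENGINE=duckdb run     # execute the full pipeline
make dag                   # render documentation into site/dag/
make api-show-http         # inspect HTTP snapshot metrics
```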
This example demonstrates multi-engine configuration, environment-driven secrets, and API enrichment within FastFlowTransform.