Local Engine Setup
DuckDB

- Copy `.env.dev_duckdb` and adjust `FF_DUCKDB_PATH` if you want a different location (default: `.local/api_demo.duckdb`). Optionally set `FF_DUCKDB_SCHEMA` (default schema for models/seeds) and `FF_DUCKDB_CATALOG` (catalog alias) if you need to isolate namespaces.
- Create the target directory once: `mkdir -p examples/api_demo/.local`.
- Run `make ENGINE=duckdb seed run` to build the seeds and models inside the DuckDB file.
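
For orientation, a minimal `.env.dev_duckdb` might look like the sketch below. The path restates the documented default; the schema/catalog values and the `FFT_ACTIVE_ENV` line are illustrative assumptions (check the shipped example file for the real defaults):

```bash
# .env.dev_duckdb — minimal sketch
# Documented default location of the DuckDB file
FF_DUCKDB_PATH=.local/api_demo.duckdb
# Assumption: isolate models/seeds in their own schema/catalog
FF_DUCKDB_SCHEMA=api_demo
FF_DUCKDB_CATALOG=api_demo
# Assumption: active environment name mirrors the BigQuery/Snowflake examples
FFT_ACTIVE_ENV=dev_duckdb
```

If the DuckDB CLI is installed, `duckdb examples/api_demo/.local/api_demo.duckdb "SHOW TABLES;"` is a quick way to confirm that seeds and models landed in the file.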
Postgres

- Start a local database, e.g. via Docker: `docker run --name fft-postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:15`.
- Set `FF_PG_DSN` in `.env.dev_postgres` (for example `postgresql+psycopg://postgres:postgres@localhost:5432/fft`) and optionally override `FF_PG_SCHEMA` (defaults to `api_demo`). The executor ensures the schema exists via `CREATE SCHEMA IF NOT EXISTS` on first connect.
- Execute `make ENGINE=postgres seed run` to materialize seeds and models in Postgres.
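
Note that the Docker image above only creates the default `postgres` database, so the `fft` database from the example DSN has to be created once. A sketch of that step plus a matching env file (the `FFT_ACTIVE_ENV` line is an assumption mirroring the other engine examples):

```bash
# One-time: create the "fft" database the example DSN points at
docker exec -it fft-postgres createdb -U postgres fft
```

```bash
# .env.dev_postgres — minimal sketch
# SQLAlchemy-style DSN using the psycopg driver, as in the documented example
FF_PG_DSN=postgresql+psycopg://postgres:postgres@localhost:5432/fft
FF_PG_SCHEMA=api_demo
# Assumption: active environment name mirrors the BigQuery/Snowflake examples
FFT_ACTIVE_ENV=dev_postgres
```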
Databricks Spark (local)

- Install Java (JDK ≥ 17) and declare `JAVA_HOME`, for example:

  ```bash
  brew install openjdk@17
  echo 'JAVA_HOME=/opt/homebrew/opt/openjdk@17' >> examples/api_demo/.env.dev_databricks
  ```

- Optionally tweak `FF_SPARK_MASTER` / `FF_SPARK_APP_NAME` in `.env.dev_databricks` (default: `local[*]`).
- To persist tables across separate `seed`/`run` sessions, enable the bundled Hive metastore defaults: `FF_DBR_ENABLE_HIVE=1`, `FF_DBR_WAREHOUSE_DIR=examples/api_demo/spark-warehouse`, `FF_DBR_DATABASE=api_demo`.
- Switch the physical format by setting `FF_DBR_TABLE_FORMAT` (e.g. `delta`, requires the Delta Lake runtime); extra writer options can be supplied via `profiles.yml → databricks_spark.table_options`.
- Ensure your shell loads `.env.dev_databricks` (via `make`, `direnv`, or manual export) and run `make ENGINE=databricks_spark seed run`.
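
Pulled together, a `.env.dev_databricks` that persists tables across sessions might look like this; all variable names come from the steps above, and the values simply restate the documented defaults (commented-out lines are optional or assumptions):

```bash
# .env.dev_databricks — minimal sketch using the documented defaults
JAVA_HOME=/opt/homebrew/opt/openjdk@17
FF_SPARK_MASTER=local[*]
# FF_SPARK_APP_NAME=api_demo          # assumption: any app name works here
FF_DBR_ENABLE_HIVE=1
FF_DBR_WAREHOUSE_DIR=examples/api_demo/spark-warehouse
FF_DBR_DATABASE=api_demo
# Optional: switch the physical table format (requires the Delta Lake runtime)
# FF_DBR_TABLE_FORMAT=delta
```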
BigQuery

1. One-time setup in Google Cloud

You only need to do this once per project / environment. (A CLI sketch covering these console steps follows at the end of this list.)

1. Create (or pick) a GCP project
   - Go to the Google Cloud Console → IAM & Admin → Create project.
   - Give it a name, e.g. FFT Basic Demo, and note the Project ID, e.g. `fft-basic-demo`.
   - All further steps refer to this project id.

2. Enable the BigQuery API
   - In the console, go to APIs & Services → Library.
   - Search for "BigQuery API" and click Enable.
   - (Optional but recommended) Also enable the BigQuery Storage API for faster reads.

3. Create a BigQuery dataset
   - Go to BigQuery in the console (left sidebar).
   - Make sure your project `fft-basic-demo` is selected.
   - Click "+ Create dataset":
     - Dataset ID: e.g. `basic_demo`
     - Location type: choose a multi-region, e.g. `EU` or `US`
     - Click Create dataset.

   ⚠️ Important: The dataset location must match the location you use in your env (`FF_BQ_LOCATION`).
   - If your dataset is in EU (multi-region), then `FF_BQ_LOCATION=EU`.
   - If the dataset is in a single region like `europe-west3`, use that exact region name.

4. Create a service account (for CI / non-interactive use)

   For local dev you can use your own user credentials (see below), but for CI/CD or shared environments a service account is better.
   - Go to IAM & Admin → Service Accounts → Create service account.
   - Name it e.g. `fft-runner`.
   - On the Roles step, add roles with BigQuery write access, for example: BigQuery Job User, BigQuery Data Editor.
   - (Optionally) Restrict to dataset level later if you want stricter permissions.

   Then create a key:
   - Click your service account → Keys → Add key → Create new key.
   - Select JSON, download the file, and store it somewhere safe (e.g. `~/.config/gcloud/fft-sa.json`).

5. Authentication options

   You have two ways to authenticate locally:

   A) Application Default Credentials via gcloud (easy for dev)

   ```bash
   gcloud auth application-default login
   ```

   This opens a browser, you log in, and Google stores your ADC in `~/.config/gcloud/application_default_credentials.json`. The BigQuery client in fastflowtransform will pick this up automatically as long as `FF_BQ_PROJECT` points to a project you have access to.

   B) Service account key (good for CI)

   - Put the downloaded JSON key (from step 4) somewhere on disk.
   - Set the environment variable before running `fft`: `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/fft-sa.json`
   - Make sure the service account has at least BigQuery Job User and BigQuery Data Editor; optionally grant BigQuery Data Viewer if you're only reading some tables.
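
The console steps above can also be scripted. A rough CLI equivalent, assuming the `gcloud` and `bq` tools are installed, the project `fft-basic-demo` already exists, and your account is allowed to grant project-level IAM roles:

```bash
# Point gcloud at the demo project and enable the APIs (steps 1–2)
gcloud config set project fft-basic-demo
gcloud services enable bigquery.googleapis.com bigquerystorage.googleapis.com

# Create the dataset in the location you will later put into FF_BQ_LOCATION (step 3)
bq --location=EU mk --dataset fft-basic-demo:basic_demo

# Service account plus the two BigQuery roles used above (step 4)
gcloud iam service-accounts create fft-runner --display-name="fft runner"
gcloud projects add-iam-policy-binding fft-basic-demo \
  --member="serviceAccount:fft-runner@fft-basic-demo.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding fft-basic-demo \
  --member="serviceAccount:fft-runner@fft-basic-demo.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# JSON key for GOOGLE_APPLICATION_CREDENTIALS (step 4, key creation)
gcloud iam service-accounts keys create ~/.config/gcloud/fft-sa.json \
  --iam-account=fft-runner@fft-basic-demo.iam.gserviceaccount.com
```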
2. Local configuration (env + profiles)

- Environment file (`.env.dev_bigquery`):

  ```bash
  # BigQuery connection
  FF_BQ_PROJECT=fft-basic-demo   # your GCP project id
  FF_BQ_DATASET=basic_demo       # dataset from step 3
  FF_BQ_LOCATION=EU              # or europe-west3, US, etc. MUST match dataset location

  # Active fft environment name (must match profiles.yml)
  FFT_ACTIVE_ENV=dev_bigquery
  ```

  Load this via direnv, make, or manual export (a sourcing snippet follows after this list).

- profiles.yml:

  ```yaml
  dev_bigquery:
    engine: bigquery
    bigquery:
      project: ${FF_BQ_PROJECT}
      dataset: ${FF_BQ_DATASET}
      location: ${FF_BQ_LOCATION}
      use_bigframes: true  # Python models use BigQuery DataFrames (BigFrames)
  ```
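
If you go the manual-export route, sourcing the env file with auto-export enabled is enough to load it; adjust the path to wherever the file actually lives in your checkout:

```bash
# Export every variable defined in the env file into the current shell
set -a
source .env.dev_bigquery
set +a
```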
3. Running seeds, models, and tests

- Seed BigQuery from `seeds/`:

  ```bash
  make ENGINE=bigquery seed
  ```

  This writes all `seeds/*.csv|parquet` to tables under `${FF_BQ_PROJECT}.${FF_BQ_DATASET}.*`.

- Build models:

  ```bash
  make ENGINE=bigquery run
  ```

  - SQL models are executed as BigQuery queries.
  - Python models with `only="bigquery"` run via `BigQueryBFExecutor` (BigQuery DataFrames) and are written back into the same dataset.

- Run data-quality tests:

  ```bash
  make ENGINE=bigquery test
  ```

  `fft test` uses BigQuery to run checks like `not_null`, `unique`, `row_count_between`, `greater_equal`, etc. against `${FF_BQ_PROJECT}.${FF_BQ_DATASET}.<table>`.
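
If the `bq` CLI is installed, listing the dataset contents is a quick sanity check after a seed or run:

```bash
# List the tables fft created in the demo dataset
bq ls fft-basic-demo:basic_demo
```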
4. Common BigQuery gotchas

- Location mismatch
  - Error like `Location basic_demo does not support this operation` or `Not found: Dataset ...`:
    - Check the dataset location in the BigQuery UI (a CLI check follows after this list).
    - Make sure `FF_BQ_LOCATION` is exactly that value (`EU`, `US`, `europe-west3`, …).
    - Ensure the executor is initialized with the same location (via `profiles.yml` → `location`).
- Permission issues
  - If you see `accessDenied` or `Permission denied`:
    - Confirm you authenticated (ADC or service account).
    - Ensure your user / service account has at least BigQuery Job User and BigQuery Data Editor on the project or dataset.
- Dataset not found
  - Error `Not found: Dataset fft-basic-demo:basic_demo`:
    - Check that the dataset id matches exactly: project `fft-basic-demo`, dataset `basic_demo`.
    - Verify it exists and is in the same project you set in `FF_BQ_PROJECT`.
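
For the location and permission checks, the CLI can confirm both the dataset's actual location and which credentials are active, assuming `bq` and `gcloud` are installed:

```bash
# Shows the dataset metadata, including its "location" field
bq show --format=prettyjson fft-basic-demo:basic_demo

# Shows which account the client libraries will authenticate as
gcloud auth list
```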
Snowflake Snowpark

1. One-time setup in Snowflake

You need a Snowflake account with a warehouse and database you can write to.

1. Log in to the Snowflake UI (Web Console)

   Use your regular Snowflake login. You should see the Worksheets / Data / Compute sections.

2. Create (or pick) a warehouse

   If you don't have one yet:

   ```sql
   CREATE WAREHOUSE COMPUTE_WH
     WAREHOUSE_SIZE = XSMALL
     AUTO_SUSPEND = 60
     AUTO_RESUME = TRUE;
   ```

   You can of course use any existing warehouse; just make sure the user you configure below can USE and OPERATE it.

3. Create a database and base schema

   FFT will auto-create the schema (if `allow_create_schema=true`), but not the database. So create the DB once:

   ```sql
   CREATE DATABASE EXAMPLE_DEMO;
   CREATE SCHEMA EXAMPLE_DEMO.BASIC_DEMO; -- optional, FFT can create this if allowed
   ```

   Adjust names if you prefer something else; just keep database + schema consistent with `.env` and `profiles.yml`.

4. User / role permissions

   Make sure the role you'll use for FFT has the required grants:

   ```sql
   USE ROLE ACCOUNTADMIN; -- or a less powerful custom role with the needed grants
   GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE ACCOUNTADMIN;
   GRANT USAGE, CREATE SCHEMA ON DATABASE EXAMPLE_DEMO TO ROLE ACCOUNTADMIN;
   GRANT CREATE TABLE, CREATE VIEW ON SCHEMA EXAMPLE_DEMO.BASIC_DEMO TO ROLE ACCOUNTADMIN;
   ```

   (In the examples we stick with ACCOUNTADMIN to keep the setup simple; in real environments you'd use a dedicated, restricted role.)
2. Local configuration (env + profiles)

- Environment file (`examples/api_demo/.env.dev_snowflake`):

  ```bash
  # Snowflake connection
  FF_SF_ACCOUNT=your_account_name   # e.g. xy12345.eu-central-1
  FF_SF_USER=YOUR_USERNAME
  FF_SF_PASSWORD=YOUR_PASSWORD
  FF_SF_WAREHOUSE=COMPUTE_WH
  FF_SF_DATABASE=EXAMPLE_DEMO
  FF_SF_SCHEMA=BASIC_DEMO
  FF_SF_ROLE=ACCOUNTADMIN           # or another role with required grants

  # Active fft environment name (must match profiles.yml)
  FFT_ACTIVE_ENV=dev_snowflake
  ```

  Notes:
  - `FF_SF_ACCOUNT` is the Snowflake account identifier, not the full URL (e.g. `xy12345.eu-central-1`, not `https://xy12345.eu-central-1.snowflakecomputing.com`).
  - `FF_SF_DATABASE` must already exist (see step 1).
  - `FF_SF_SCHEMA` will be auto-created by FFT if `allow_create_schema=true` in the profile.

- profiles.yml — example profile matching the env above:

  ```yaml
  dev_snowflake:
    engine: snowflake_snowpark
    snowflake_snowpark:
      account: "{{ env('FF_SF_ACCOUNT') }}"
      user: "{{ env('FF_SF_USER') }}"
      password: "{{ env('FF_SF_PASSWORD') }}"
      warehouse: "{{ env('FF_SF_WAREHOUSE', 'COMPUTE_WH') }}"
      database: "{{ env('FF_SF_DATABASE', 'EXAMPLE_DEMO') }}"
      db_schema: "{{ env('FF_SF_SCHEMA', 'BASIC_DEMO') }}"
      role: "{{ env('FF_SF_ROLE', 'ACCOUNTADMIN') }}"
      allow_create_schema: true
  ```

  `allow_create_schema: true` tells the executor to run `CREATE SCHEMA IF NOT EXISTS "EXAMPLE_DEMO"."BASIC_DEMO";` on first connect (best-effort). If you prefer to manage schemas manually, set this to `false`.
3. Running seeds and models

Once the env file and profile are in place:

- Seed Snowflake from `seeds/`:

  ```bash
  make ENGINE=snowflake_snowpark seed
  ```

  This will:
  - Connect via Snowpark
  - Create the schema (if allowed and it doesn't exist)
  - Upload CSV seeds via `write_pandas` into `EXAMPLE_DEMO.BASIC_DEMO.*`

- Build models:

  ```bash
  make ENGINE=snowflake_snowpark run
  ```

  - SQL models are rendered to Snowflake SQL and executed as `CREATE OR REPLACE TABLE/VIEW`.
  - Snowpark Python models (`only="snowflake_snowpark"`) receive Snowpark `DataFrame` inputs and write back using `save_as_table`.

- Run tests (if you have them):

  ```bash
  make ENGINE=snowflake_snowpark test
  ```

  This executes the standard FFT test suite (e.g. `not_null`, `unique`, etc.) against tables in `EXAMPLE_DEMO.BASIC_DEMO`.
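
If the SnowSQL CLI happens to be installed, a quick way to confirm what was materialized (it prompts for the password or reads `SNOWSQL_PWD`):

```bash
# Assumes the FF_SF_* variables from .env.dev_snowflake are exported in the shell
snowsql -a "$FF_SF_ACCOUNT" -u "$FF_SF_USER" -r "$FF_SF_ROLE" -w "$FF_SF_WAREHOUSE" \
  -d "$FF_SF_DATABASE" -s "$FF_SF_SCHEMA" \
  -q "SHOW TABLES;"
```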
4. Cleanup / reset for re-runs

Snowflake is wired into scripts/cleanup.py, so you can reset the demo schema with:

```bash
python scripts/cleanup.py --engine snowflake_snowpark --project examples/basic_demo
```

The Snowflake cleanup step typically:

- Drops and recreates the schema (not the database), e.g. `EXAMPLE_DEMO.BASIC_DEMO`.
- Removes local FFT artifacts (manifest, run_results, etc.) unless `--skip-artifacts` is set.

Then you can re-seed and re-run from a clean slate:

```bash
make ENGINE=snowflake_snowpark seed run
```
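
As a manual alternative (for example when the cleanup script isn't at hand), dropping and recreating the schema by hand gives the same reset; sketched here via SnowSQL, which is an assumption about your tooling:

```bash
# Destructive: removes every table/view in the demo schema, then recreates it empty
snowsql -a "$FF_SF_ACCOUNT" -u "$FF_SF_USER" -r "$FF_SF_ROLE" \
  -q "DROP SCHEMA IF EXISTS EXAMPLE_DEMO.BASIC_DEMO CASCADE; CREATE SCHEMA EXAMPLE_DEMO.BASIC_DEMO;"
```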
5. Common Snowflake gotchas

- Database vs schema creation
  - FFT's Snowflake executor only auto-creates the schema (when `allow_create_schema=true`).
  - The database must exist (e.g. `EXAMPLE_DEMO`) or you'll get `Schema 'EXAMPLE_DEMO.BASIC_DEMO' does not exist or not authorized`.
- Case sensitivity / quoting
  - FFT creates tables unquoted, e.g. `CREATE TABLE EXAMPLE_DEMO.BASIC_DEMO.SEED_USERS`, so Snowflake stores them as uppercase.
  - Your SQL models can safely use lowercase identifiers (`select id, email from {{ ref('seed_users') }}`); Snowflake normalizes them.
  - The executor takes care of quoting database/schema/table names when building fully qualified identifiers.
- Permissions
  - Errors like `Object 'EXAMPLE_DEMO.BASIC_DEMO.*' does not exist or not authorized` usually mean:
    - the DB/schema/table really doesn't exist, or
    - the role in `FF_SF_ROLE` doesn't have `USAGE` + `CREATE TABLE/VIEW` on that DB/schema.
  - Double-check role grants with `SHOW GRANTS TO ROLE ACCOUNTADMIN;`.