Usage

To start, create a directory with the following structure, where manifest.json is the manifest file generated by dbt:

.
├── config
│   ├── base
│   │   ├── airflow.yml
│   │   ├── dbt.yml
│   │   └── k8s.yml
│   └── dev
│       └── dbt.yml
├── dag.py
└── manifest.json
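
For illustration, a hypothetical config/base/airflow.yml could look like the sketch below. The top-level default_args and dag keys match how the factory's config is consumed in the snippets that follow; the remaining field names and values are assumptions, so consult the example configs in the tests directory for the real schema:

# Hypothetical sketch of config/base/airflow.yml; field names and values are illustrative.
dag:
  dag_id: dbt-example-dag
  schedule_interval: "@daily"
default_args:
  owner: analytics-team
  retries: 2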

Then, put the following code into dag.py:

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory
from airflow.models import Variable
from os import path

dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env")).create()
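
Here, the factory reads the target environment from an Airflow Variable named env, which is expected to match one of the subdirectories under config (dev in the layout above). A minimal sketch of setting it programmatically, although in practice the Airflow UI or CLI is the usual route:

from airflow.models import Variable

# Select the config/dev overrides; the name "dev" is taken from the directory layout above.
Variable.set("env", "dev")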

For older versions of Airflow (before 2.0), the dag.py file needs to be slightly longer:

from os import path

from airflow import DAG
from airflow.models import Variable

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

dag_factory = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env"))
config = dag_factory.read_config()
with DAG(default_args=config["default_args"], **config["dag"]) as dag:
    dag_factory.create_tasks(config)

When uploaded to the Airflow DAGs directory, the file will get picked up by Airflow, which will parse manifest.json and prepare the DAG to run.

Configuration files

It is best to look at the example configuration files in the tests directory to get a glimpse of correct configs.
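
As a hedged illustration of how the base and environment directories layer, config/base/dbt.yml could define a default target that config/dev/dbt.yml overrides for the dev environment; the target field appears in the templating example below, while the values here are made up:

# config/base/dbt.yml (hypothetical default)
target: prod

# config/dev/dbt.yml (hypothetical override for the dev environment)
target: dev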

You can use Airflow template variables in your dbt.yml and k8s.yml files, as long as they are inside quotation marks:

target: "{{ var.value.env }}"
some_other_field: "{{ ds_nodash }}"
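
For instance, with the Airflow Variable env set to dev and an execution date of 2021-01-01, the fields above would render at runtime as:

target: "dev"
some_other_field: "20210101"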

Analogously, you can use "{{ var.value.VARIABLE_NAME }}" in airflow.yml, but only this Airflow variable getter; any other Airflow template variables will not work there.
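
For example, a fragment of airflow.yml using only the supported variable getter; default_args comes from the DAG configuration shown earlier, and the variable name is hypothetical:

default_args:
  owner: "{{ var.value.dag_owner }}"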

Creating the directory with data-pipelines-cli

DBT Airflow Factory works best in tandem with the data-pipelines-cli tool. dp not only prepares the directory for the library to digest, but also automates Docker image building and pushes the generated directory to the cloud storage of your choice.
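
A hedged sketch of that workflow, assuming the standard dp commands; consult the data-pipelines-cli documentation for the exact invocations and options:

# Compile the project: renders configs and prepares the directory for the factory to digest.
dp compile

# Deploy: build and push the Docker image and sync the compiled directory to cloud storage.
dp deploy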