YAML Projects

DJ entities can be managed through YAML definitions. Keeping these definitions in source control provides a more structured approach to development: changes can be reviewed, audited, and tested more holistically before they are deployed to production.

You can anchor your project to a specific namespace in DJ, and use YAML files to define all nodes in that namespace. While not required, this approach promotes a cleaner and more organized setup.

Setup Guide

From Existing

If you’ve already started developing DJ entities through the UI or a different client, you can export the existing entities to a YAML project to get started quickly. Note that this process only supports exporting a single namespace at a time.

Export: Use the DJ CLI (installed when you install the DJ Python client) to export your DJ entities to a YAML project. This snippet will export the default namespace to a YAML project in the ./example_project directory:

dj pull default ./example_project

Validation: Once you’ve made changes to the YAML files, you can validate those changes with:

dj deploy ./example_project --dryrun

Deployment: When satisfied, you can deploy the changes like this:

dj deploy ./example_project

From Scratch

You can also create a YAML project from scratch.

Create a dj.yaml file with project metadata. See additional details on project metadata fields below. Example:

name: Roads Project
description: Roads example project
prefix: projects.roads
mode: published
tags:
  - name: deprecated
    description: This node is deprecated
    tag_type: Maintenance
build:
  priority:
    - roads.date
    - roads.repair_order_details
    - roads.contractors
    - roads.hard_hats

Create a YAML file for each node. Use file infixes to define node types (like foo.dimension.yaml or foo.transform.yaml). Here is an example for the roads.date_dim node, in file `./roads/

display_name: Date
description: Date dimension
query: |
  SELECT
    dateint,
    month,
    year,
    day
  FROM ${prefix}roads.date
primary_key:
  - dateint

Project Metadata Fields

Field	Required?	Description
`name`	Yes	Name of the YAML project
`description`	Yes	Description of the YAML project
`prefix`	Yes	This is set to a DJ namespace. Node names are derived from the directory and file structure. For example, for the prefix `projects.roads` and file `./baz/boom.source.yaml`, the node name becomes `projects.roads.baz.boom`.
`mode`	No	Whether the project is published or draft (defaults to published)
`tags`	No	Used to define any tags that are used by nodes in the project
`build.priority`	No	Used to control the ordering of node deployment

Node YAML Fields Overview

The node name is derived from the directory structure and file name. For example, for the prefix projects.roads and file ./baz/boom.source.yaml, the node name becomes projects.roads.baz.boom.
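
To make the naming rule concrete, the sketch below annotates a minimal dj.yaml with the node names that two files would map to. The first mapping is the one stated above. The second file is hypothetical, and its node name is inferred by applying the same rule to a nested directory, so treat it as an assumption rather than confirmed behavior.

```yaml
# dj.yaml (project root)
name: Roads Project
description: Roads example project
prefix: projects.roads

# File path                      -> Node name
# ./baz/boom.source.yaml         -> projects.roads.baz.boom
# ./baz/qux/zap.transform.yaml   -> projects.roads.baz.qux.zap  (hypothetical file; name inferred from the rule)
```

Note that the infix (source, transform, and so on) selects the node type and does not appear in the node name.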

Source Node YAML

Field	Required?	Description
`table`	Yes	The physical table of this source node
`columns`	No	The columns of this source node. If not provided, they are derived from the table.
`display_name`	No	The display name of the source node
`description`	No	Description of the node
`tags`	No	A list of tags for this node
`primary_key`	No	A list of columns that make up the primary key of this node
`dimension_links`	No	A list of dimension links, if any. See Dimension Link YAML.
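
As an illustration of the fields above, a source node file might look like the following sketch. The file name, table name, column names, and linked dimension node are hypothetical, and the three-part form of `table` is an assumption. `columns` is omitted so that the columns are derived from the table.

```yaml
# ./roads/repair_orders.source.yaml (hypothetical file)
display_name: Repair Orders
description: All repair orders
# Assumed to be a fully qualified physical table name
table: default.roads.repair_orders
primary_key:
  - repair_order_id
dimension_links:
  # Join link using only the fields shown in the dimension link example
  - type: join
    node_column: hard_hat_id
    dimension_node: ${prefix}roads.hard_hat
```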

Transform / Dimension Node YAML

Field	Required?	Description
`display_name`	No	The display name of the node
`description`	No	Description of the node
`query`	Yes	The SQL query for the node
`columns`	No	Optional column-level settings (like `attributes` or `partition`)
`tags`	No	A list of tags for this node
`primary_key`	No	A list of columns that make up the primary key of this node
`dimension_links`	No	A list of dimension links, if any. See Dimension Link YAML.
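
A transform node combines a query with optional column-level settings. The sketch below is a hypothetical example: the file name, the node it selects from, and the column names are all assumptions made for illustration, and the `partition` block simply reuses the values from the Partitions example in this document.

```yaml
# ./roads/repair_orders_daily.transform.yaml (hypothetical file)
display_name: Daily Repair Orders
description: Repair orders with a daily partition column
query: |
  SELECT
    repair_order_id,
    hard_hat_id,
    order_date AS utc_date
  FROM ${prefix}roads.repair_orders
primary_key:
  - repair_order_id
columns:
  # Column-level settings only need to list the columns they configure
  - name: utc_date
    partition:
      format: yyyyMMdd
      granularity: day
      type_: temporal
```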

Metric Node YAML

Field	Required?	Description
`display_name`	No	The display name of the node
`description`	No	Description of the node
`query`	Yes	The SQL query for the node
`columns`	No	Optional column-level settings (like `attributes` or `partition`)
`tags`	No	A list of tags for this node
`required_dimensions`	No	A list of required dimensions for this metric
`direction`	No	Direction of this metric (one of `higher_is_better`, `lower_is_better`, or `neutral`)
`unit`	No	The unit of this metric
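
A metric node file could then look like the sketch below. The file name, the upstream node, and the column are hypothetical. `unit` is left out because its accepted values are not listed here, and `direction` uses one of the three values from the table.

```yaml
# ./roads/num_repair_orders.metric.yaml (hypothetical file)
display_name: Number of Repair Orders
description: Count of repair orders
query: |
  SELECT COUNT(repair_order_id)
  FROM ${prefix}roads.repair_orders
direction: higher_is_better
```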

Cube Node YAML

Field	Required?	Description
`display_name`	No	The display name of the node
`description`	No	Description of the node
`metrics`	Yes	The metrics in the cube
`dimensions`	Yes	The dimensions in the cube
`columns`	No	Optional column-level settings (like `attributes` or `partition`)
`tags`	No	A list of tags for this node
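
A cube node lists the metrics and dimensions it brings together. In the sketch below, the file name and all node names are hypothetical. It assumes that metrics are referenced by node name and dimensions by node name plus attribute, mirroring the `dimension` field of a reference link; verify the exact reference format against your DJ version.

```yaml
# ./roads/repairs_cube.cube.yaml (hypothetical file)
display_name: Repairs Cube
description: Repair order metrics by hard hat state
metrics:
  # Assumed format: the full name of a metric node
  - ${prefix}roads.num_repair_orders
dimensions:
  # Assumed format: <dimension node>.<attribute>
  - ${prefix}roads.hard_hat.state
```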

Dimension Link YAML

There are two types of dimension links: join links and reference links. The fields available for each differ slightly. Here’s an example of a node with both:

description: Hard hat dimension
display_name: Local Hard Hats
query: ...
primary_key: ...
dimension_links:
  - type: join
    node_column: state_id
    dimension_node: ${prefix}roads.us_state
  - type: reference
    node_column: birth_date
    dimension: ${prefix}roads.date_dim.dateint
    role: birth_date

Join Link Fields	Required?	Description
`type`	Yes	Must be `join`
`dimension_node`	Yes	The dimension node being linked to
`node_column`	No	The column on this node that is being linked from
`join_on`	No	A custom SQL `ON` clause for the join
`join_type`	No	The type of join (one of `left`, `right`, `inner`, `full`, `cross`). Defaults to `left`.
`role`	No	The role this dimension represents

Reference Link Fields	Required?	Description
`type`	Yes	Must be `reference`
`node_column`	Yes	The column on this node that is being linked from
`dimension`	Yes	The dimension attribute being linked to
`role`	No	The role this dimension represents
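
The example above uses `node_column` for its join link. For cases that need a custom condition, a join link with `join_on` and `join_type` might look like the sketch below. The node names and the role are hypothetical, and the exact syntax of the `join_on` expression (here, fully qualified column names compared with `=`) is an assumption, not something this document confirms.

```yaml
dimension_links:
  - type: join
    dimension_node: ${prefix}roads.us_state
    # One of left, right, inner, full, cross; left is the default
    join_type: inner
    # Assumed syntax: a SQL boolean expression over fully qualified columns
    join_on: ${prefix}roads.hard_hat.state_id = ${prefix}roads.us_state.state_id
    role: work_state
```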

Columns YAML

The columns section can be included if additional column-level settings are needed on the node.

Attributes

Column-level attributes like dimension can be configured like this:

columns:
- name: is_clicked
  attributes:
  - dimension

Display Name

A column can be given a custom display name like this:

columns:
- name: is_clicked
  display_name: Clicked?

Partitions

Partition columns can be configured like this:

columns:
- name: utc_date
  partition:
    format: yyyyMMdd
    granularity: day
    type_: temporal
