YAML Projects

DJ entities can be managed through YAML definitions. Keeping these definitions in source control gives development more structure: changes can be reviewed and audited, and tested more holistically, before they are deployed to production.

Setup Guide

From Existing

If you’ve already started developing DJ entities through the UI or a different client, you can export the existing entities to a YAML project to get started quickly. Note that this process only supports exporting a single namespace at a time.

  1. Export: Use the DJ CLI (installed with the DJ Python client) to export your DJ entities to a YAML project. This command exports the default namespace to a YAML project in the ./example_project directory:
dj pull default ./example_project
  2. Validation: Once you’ve made changes to the YAML files, validate them with a dry run:
dj deploy ./example_project --dryrun
  3. Deployment: When satisfied, deploy the changes:
dj deploy ./example_project
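
The validate-then-deploy steps above can be scripted. The following is a minimal sketch, assuming the dj CLI is on the PATH and that a failed dry run exits with a non-zero status (an assumption, not something this page states). The helper names pull_command, deploy_command, and validate_and_deploy are hypothetical; only the commands shown above are used.

```python
import subprocess
from typing import List


def pull_command(namespace: str, project_dir: str) -> List[str]:
    """Build the export command: dj pull <namespace> <project_dir>."""
    return ["dj", "pull", namespace, project_dir]


def deploy_command(project_dir: str, dry_run: bool = False) -> List[str]:
    """Build the deploy command, adding --dryrun for validation only."""
    command = ["dj", "deploy", project_dir]
    if dry_run:
        command.append("--dryrun")
    return command


def validate_and_deploy(project_dir: str) -> None:
    """Run a dry run first and deploy only if it succeeds."""
    # check=True raises CalledProcessError on a non-zero exit status.
    # Assumption: the dj CLI reports validation failures that way.
    subprocess.run(deploy_command(project_dir, dry_run=True), check=True)
    subprocess.run(deploy_command(project_dir), check=True)
```

Calling validate_and_deploy("./example_project") after editing the exported files would run both commands in order, so that a failing dry run stops the real deployment.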

From Scratch

You can also create a YAML project from scratch.

  1. Create a dj.yaml file with project metadata. See additional details on project metadata fields below. Example:
name: Roads Project
description: Roads example project
prefix: projects.roads
mode: published
tags:
  - name: deprecated
    description: This node is deprecated
    tag_type: Maintenance
build:
  priority:
    - roads.date
    - roads.repair_order_details
    - roads.contractors
    - roads.hard_hats
  2. Create YAML files that represent each node. Use file infixes to define node types, like foo.dimension.yaml or foo.transform.yaml. Here is an example for the roads.date_dim node, in a file under ./roads/:
display_name: Date
description: Date dimension
query: |
  SELECT
    dateint,
    month,
    year,
    day
  FROM ${prefix}roads.date
primary_key:
  - dateint
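
To make the file infix convention concrete, here is a rough sketch of how a node type could be read from a file name. The function node_type_from_filename is hypothetical and is not part of the DJ client, and the set of accepted infixes is an assumption based on the node types documented on this page.

```python
from pathlib import Path

# Assumed set of infixes, taken from the node types this page documents.
NODE_TYPES = {"source", "transform", "dimension", "metric", "cube"}


def node_type_from_filename(path: str) -> str:
    """Return the node type encoded as the infix of a node file name."""
    suffixes = Path(path).suffixes  # e.g. [".dimension", ".yaml"]
    if len(suffixes) < 2 or suffixes[-1] != ".yaml":
        raise ValueError(f"expected <name>.<node_type>.yaml, got {path!r}")
    node_type = suffixes[-2].lstrip(".")
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type {node_type!r} in {path!r}")
    return node_type


print(node_type_from_filename("foo.dimension.yaml"))  # prints "dimension"
print(node_type_from_filename("foo.transform.yaml"))  # prints "transform"
```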

Project Metadata Fields

| Field | Required? | Description |
| --- | --- | --- |
| name | Yes | Name of the YAML project |
| description | Yes | Description of the YAML project |
| prefix | Yes | The DJ namespace for the project. Node names are derived from the directory and file structure. For example, with the prefix projects.roads and the file ./baz/boom.source.yaml, the node name becomes projects.roads.baz.boom. |
| mode | No | Whether the project is published or draft (defaults to published) |
| tags | No | Defines any tags that are used by nodes in the project |
| build.priority | No | Controls the ordering of node deployment |
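
The naming rule for prefix can be illustrated with a short sketch. The function node_name below is hypothetical and only mirrors the example given for the prefix field; the DJ client may handle edge cases differently.

```python
from pathlib import PurePosixPath


def node_name(prefix: str, file_path: str) -> str:
    """Derive a node name from the project prefix and a node file path
    given relative to the project root."""
    path = PurePosixPath(file_path)
    # Drop the ".<node_type>.yaml" part: "boom.source.yaml" -> "boom".
    stem = path.name.split(".")[0]
    # Directory components become namespace segments; "." entries are ignored.
    parts = [part for part in path.parent.parts if part not in (".", "")]
    return ".".join([prefix, *parts, stem])


print(node_name("projects.roads", "./baz/boom.source.yaml"))
# prints "projects.roads.baz.boom"
```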

Node YAML Fields Overview

Source Node YAML

| Field | Required? | Description |
| --- | --- | --- |
| table | Yes | The physical table of this source node |
| columns | No | The columns of this source node; derived from the table if not provided |
| display_name | No | The display name of the source node |
| description | No | Description of the node |
| tags | No | A list of tags for this node |
| primary_key | No | A list of columns that make up the primary key of this node |
| dimension_links | No | A list of dimension links, if any. See the dimension link fields below. |

Transform / Dimension Node YAML

| Field | Required? | Description |
| --- | --- | --- |
| display_name | No | The display name of the node |
| description | No | Description of the node |
| query | Yes | The SQL query for the node |
| columns | No | Optional column-level settings (like attributes or partition) |
| tags | No | A list of tags for this node |
| primary_key | No | A list of columns that make up the primary key of this node |
| dimension_links | No | A list of dimension links, if any. See the dimension link fields below. |

Metric Node YAML

| Field | Required? | Description |
| --- | --- | --- |
| display_name | No | The display name of the node |
| description | No | Description of the node |
| query | Yes | The SQL query for the node |
| columns | No | Optional column-level settings (like attributes or partition) |
| tags | No | A list of tags for this node |
| required_dimensions | No | A list of required dimensions for this metric |
| direction | No | Direction of this metric (one of higher_is_better, lower_is_better, or neutral) |
| unit | No | The unit of this metric |

Cube Node YAML

| Field | Required? | Description |
| --- | --- | --- |
| display_name | No | The display name of the node |
| description | No | Description of the node |
| metrics | Yes | The metrics in the cube |
| dimensions | Yes | The dimensions in the cube |
| columns | No | Optional column-level settings (like attributes or partition) |
| tags | No | A list of tags for this node |

There are two types of dimension links: join links and reference links. Each type supports a slightly different set of fields. Here’s an example of a node with both:

description: Hard hat dimension
display_name: Local Hard Hats
query: ...
primary_key: ...
dimension_links:
  - type: join
    node_column: state_id
    dimension_node: ${prefix}roads.us_state
  - type: reference
    node_column: birth_date
    dimension: ${prefix}roads.date_dim.dateint
    role: birth_date

| Join Link Fields | Required? | Description |
| --- | --- | --- |
| type | Yes | Must be join |
| dimension_node | Yes | The dimension node being linked to |
| node_column | No | The column on this node that is being linked from |
| join_on | No | A custom join on SQL clause |
| join_type | No | The type of join (one of left, right, inner, full, cross). Defaults to left. |
| role | No | The role this dimension represents |

| Reference Link Fields | Required? | Description |
| --- | --- | --- |
| type | Yes | Must be reference |
| node_column | Yes | The column on this node that is being linked from |
| dimension | Yes | The dimension attribute being linked to |
| role | No | The role this dimension represents |
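
The required and optional fields for the two link types can be checked before deployment. The sketch below is a hypothetical pre-check, not part of the DJ client: validate_dimension_link and its error messages are invented for illustration, and the field sets are copied from the two field lists above.

```python
REQUIRED_FIELDS = {
    "join": {"type", "dimension_node"},
    "reference": {"type", "node_column", "dimension"},
}
OPTIONAL_FIELDS = {
    "join": {"node_column", "join_on", "join_type", "role"},
    "reference": {"role"},
}
JOIN_TYPES = {"left", "right", "inner", "full", "cross"}


def validate_dimension_link(link: dict) -> list:
    """Return a list of problems found in one dimension link (empty if none)."""
    link_type = link.get("type")
    if link_type not in REQUIRED_FIELDS:
        return [f"type must be 'join' or 'reference', got {link_type!r}"]
    errors = []
    present = set(link)
    allowed = REQUIRED_FIELDS[link_type] | OPTIONAL_FIELDS[link_type]
    for field in sorted(REQUIRED_FIELDS[link_type] - present):
        errors.append(f"missing required field: {field}")
    for field in sorted(present - allowed):
        errors.append(f"unknown field for {link_type} link: {field}")
    # join_type is optional; when omitted, the documented default is left.
    if link_type == "join" and link.get("join_type", "left") not in JOIN_TYPES:
        errors.append(f"invalid join_type: {link['join_type']!r}")
    return errors
```

Applied to the two links in the example node above, this check returns an empty list for each. A reference link without a dimension field would yield one "missing required field" message.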

Columns YAML

The columns section can be included if additional column-level settings are needed on the node.

Attributes

Column-level attributes like dimension can be configured like this:

columns:
- name: is_clicked
  attributes:
  - dimension

Display Name

A column can be given a custom display name like this:

columns:
- name: is_clicked
  display_name: Clicked?

Partitions

Partition columns can be configured like this:

columns:
- name: utc_date
  partition:
    format: yyyyMMdd
    granularity: day
    type_: temporal