Cubes

Cubes are used to represent a set of metrics with dimensions and filters.

Attribute      Description                                                Type
name           Unique name used by other nodes to select from this node   string
display_name   A human readable name for the node                         string
description    A human readable description of the node                   string
mode           published or draft (see Node Mode)                         string
metrics        A set of metric node names                                 list[string]
dimensions     A set of dimension attribute names                         list[string]
filters        A set of filters                                           list[string]
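
As a rough illustration of how these attributes fit together, the sketch below collects them in a small local data class and serializes them to JSON. `CubeSpec`, `ALLOWED_MODES`, and `to_payload` are hypothetical helper names introduced here for illustration; they are not part of the DJ client. The only check performed is that `mode` is one of the two values listed in the table.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Modes named in the attribute table above
ALLOWED_MODES = {"published", "draft"}


@dataclass
class CubeSpec:
    """Hypothetical local container for the cube attributes (not a DJ class)."""

    name: str
    display_name: str
    description: str
    mode: str
    metrics: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)

    def to_payload(self) -> str:
        """Serialize to a JSON string after checking the mode value."""
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
        return json.dumps(asdict(self), indent=4)


spec = CubeSpec(
    name="default.repairs_cube",
    display_name="Repairs for each company",
    description="Cube of the number of repair orders grouped by dispatcher companies",
    mode="published",
    metrics=["default.num_repair_orders"],
    dimensions=["default.all_dispatchers.company_name"],
    filters=["default.all_dispatchers.company_name IS NOT NULL"],
)

# Round-trip through json to confirm the serialized body is valid JSON
payload = json.loads(spec.to_payload())
```

Serializing through `json.dumps` rather than writing the body by hand avoids syntax slips such as trailing commas, which strict JSON parsers reject.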

Creating Cube Nodes

curl -X POST http://localhost:8000/nodes/cube/ \
-H 'Content-Type: application/json' \
-d '{
    "name": "default.repairs_cube",
    "mode": "published",
    "display_name": "Repairs for each company",
    "description": "Cube of the number of repair orders grouped by dispatcher companies",
    "metrics": [
        "default.num_repair_orders"
    ],
    "dimensions": [
        "default.all_dispatchers.company_name"
    ],
    "filters": ["default.all_dispatchers.company_name IS NOT NULL"]
}'

from datajunction import DJBuilder, NodeMode

# Base URL of the DJ server; matches the curl example above
DJ_URL = "http://localhost:8000"
dj = DJBuilder(DJ_URL)

repairs_cube = dj.create_cube(
    name="repairs_cube",
    display_name="Repairs Cube",
    description="Cube of various metrics related to repairs",
    mode=NodeMode.PUBLISHED,  # for draft nodes, use `mode=NodeMode.DRAFT`
    metrics=[
        "num_repair_orders",
        "avg_repair_price",
        "total_repair_cost"
    ],
    dimensions=[
        "hard_hat.country",
        "hard_hat.postal_code",
        "hard_hat.city",
        "hard_hat.state",
        "dispatcher.company_name",
        "municipality_dim.local_region"
    ],
    filters=["hard_hat.state='AZ'"]
)

// JavaScript client; assumes `dj` is an already-initialized DJ client instance
dj.cubes.create(
    {
        name: "default.repairs_cube",
        mode: "published",
        display_name: "Repairs for each company",
        description: "Cube of the number of repair orders grouped by dispatcher companies",
        metrics: [
            "default.num_repair_orders"
        ],
        dimensions: [
            "default.all_dispatchers.company_name"
        ],
        filters: ["default.all_dispatchers.company_name IS NOT NULL"]
    }
).then(data => console.log(data))

Adding Materialization Config

Any non-source node in DJ can have user-configurable materialization settings. For cube nodes, DJ additionally seeds the node with a set of generic cube materialization settings that downstream materialization engines can use. As with other non-source nodes, users can then set engine-specific materialization config, which is layered on top of the generic cube settings.
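
To make the idea of layering concrete, here is a minimal sketch that overlays an engine-specific config on top of a generic one with a recursive dictionary merge. This is an assumption about what "layered on top" could look like, not DJ's actual implementation; `layer_config` and the contents of `generic_cube_config` are hypothetical and chosen only for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


def layer_config(generic: Dict[str, Any], engine_specific: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where engine-specific values override generic ones.

    Nested dicts are merged key by key; any other value is replaced outright.
    Neither input is modified.
    """
    merged = deepcopy(generic)
    for key, value in engine_specific.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical generic settings seeded for a cube (placeholder keys, not DJ's schema)
generic_cube_config = {
    "metrics": ["default.num_repair_orders"],
    "dimensions": ["default.all_dispatchers.company_name"],
    "spark": {"spark.driver.memory": "2g"},
}

# Engine-specific settings supplied by the user
druid_config = {
    "spark": {"spark.driver.memory": "4g", "spark.executor.memory": "6g"},
    "druid": {"timestamp_column": "dateint", "granularity": "DAY"},
}

effective_config = layer_config(generic_cube_config, druid_config)
```

In this sketch the generic keys survive unless the engine-specific config names the same key, in which case the engine-specific value wins.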

DJ currently supports materialization of cubes into Druid.

A Druid materialization can be added with the following request, assuming the Druid engine is already configured in your DJ setup:

curl -X POST \
http://localhost:8000/nodes/default.repairs_cube/materialization/ \
-H 'Content-Type: application/json' \
-d '{
  "engine": {
    "name": "DRUID",
    "version": ""
  },
  "schedule": "0 * * * *",
  "config": {
    "spark": {
      "spark.driver.memory": "4g",
      "spark.executor.memory": "6g"
    },
    "druid": {
      "timestamp_column": "dateint",
      "intervals": ["2023-01-01/2023-03-31"],
      "granularity": "DAY"
    }
  }
}'

from datajunction import MaterializationConfig, Engine

config = MaterializationConfig(
    engine=Engine(
        name="DRUID",
        version="",
    ),
    schedule="0 * * * *",
    config={
        "spark": {
            "spark.driver.memory": "4g",
            "spark.executor.memory": "6g",
            "spark.executor.cores": "2",
            "spark.memory.fraction": "0.3",
        },
        "druid": {
            "timestamp_column": "dateint",
            "intervals": ["2023-01-01/2023-03-31"],
            "granularity": "DAY",
        },
    },
)
# Attach the config to the cube node created earlier (`repairs_cube` above)
repairs_cube.add_materialization(config)
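
Before sending a materialization config, it can help to sanity-check the `schedule` and `intervals` values locally. The sketch below is one way to do that with the standard library only; `check_schedule` and `check_intervals` are hypothetical helpers, not part of the DJ client, and they only verify the shape of the values (a five-field cron expression and `start/end` ISO dates), not whether DJ or Druid will accept them.

```python
from datetime import date
from typing import List, Tuple


def check_schedule(schedule: str) -> None:
    """Raise ValueError unless the schedule has the five fields of a cron expression."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {schedule!r}")


def check_intervals(intervals: List[str]) -> List[Tuple[date, date]]:
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' intervals and ensure each start is not after its end."""
    parsed = []
    for interval in intervals:
        parts = interval.split("/")
        if len(parts) != 2:
            raise ValueError(f"expected 'start/end', got {interval!r}")
        start, end = date.fromisoformat(parts[0]), date.fromisoformat(parts[1])
        if start > end:
            raise ValueError(f"interval starts after it ends: {interval!r}")
        parsed.append((start, end))
    return parsed


# Values taken from the materialization examples above
check_schedule("0 * * * *")  # minute 0 of every hour
parsed_intervals = check_intervals(["2023-01-01/2023-03-31"])
```

Both helpers raise `ValueError` on malformed input, so a bad value is caught before the request is made.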