Materialization

Cube Nodes

When we attach a materialization config to a cube node (instructions here), we are asking DJ to materialize the cube’s underlying data into an OLAP database (such as Druid). This enables low-latency metric queries across all of the dimensions defined on the cube.

However, many such databases only support simple aggregations, so prior to materialization DJ breaks each metric expression down into its constituent simple aggregation measures. These measures are ingested into the OLAP database as separate columns and are combined back into the original metrics when users request metric data.

For a detailed explanation of how this decomposition works, including supported aggregation types and dialect translation, see Metric Decomposition.

A few examples include:

Metric definition:

  SELECT AVG(price) AS avg_repair_price
  FROM repair_order_details

Decomposed measures:

  • price_count (count): count(price)
  • price_sum (sum): sum(price)

Metric definition:

  SELECT SUM(price) AS total_repair_price
  FROM repair_order_details

Decomposed measures:

  • price_sum (sum): sum(price)

Metric definition:

  SELECT
    CAST(SUM(IF(discount > 0.0, 1, 0)) AS DOUBLE) / COUNT(*)
      AS discounted_orders_rate
  FROM repair_order_details

Decomposed measures:

  • discount_sum (sum): sum(if(discount > 0.0, 1, 0))
  • count_star (count): count(*)

Metric definition:

  SELECT SUM(price1) + SUM(price2) AS total_cost
  FROM costs

Decomposed measures:

  • price1_sum (sum): sum(price1)
  • price2_sum (sum): sum(price2)

Metric definition:

  SELECT APPROX_COUNT_DISTINCT(user_id) AS unique_users
  FROM events

Decomposed measures:

  • user_id_hll (hll_sketch_agg): hll_sketch_agg(user_id), stored as a binary HLL sketch
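To see why this decomposition is lossless, take the first example: AVG(price) can be recovered exactly from the price_sum and price_count measures even after they have been pre-aggregated per dimension group. A minimal sketch using SQLite (the cube table name and the city dimension are illustrative, not part of DJ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw fact table, analogous to repair_order_details.
cur.execute("CREATE TABLE repair_order_details (city TEXT, price REAL)")
cur.executemany(
    "INSERT INTO repair_order_details VALUES (?, ?)",
    [("NYC", 100.0), ("NYC", 300.0), ("SF", 50.0), ("SF", 150.0), ("SF", 250.0)],
)

# Step 1: materialize the simple aggregation measures per dimension group,
# as DJ would ingest them into the OLAP store.
cur.execute("""
    CREATE TABLE cube AS
    SELECT city,
           SUM(price)   AS price_sum,
           COUNT(price) AS price_count
    FROM repair_order_details
    GROUP BY city
""")

# Step 2: recombine the measures into the original metric at query time.
recombined = cur.execute("""
    SELECT SUM(price_sum) / SUM(price_count) AS avg_repair_price
    FROM cube
""").fetchone()[0]

# Direct computation over the raw rows gives the same answer.
direct = cur.execute(
    "SELECT AVG(price) FROM repair_order_details"
).fetchone()[0]

assert recombined == direct  # both are 170.0
```

Note that AVG itself cannot be materialized per group and then re-averaged (an average of averages weights groups incorrectly), which is exactly why DJ stores the sum and the count as separate measure columns.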

The combiner expression for APPROX_COUNT_DISTINCT uses HyperLogLog functions:

hll_sketch_estimate(hll_union(user_id_hll))

This is automatically translated to the appropriate dialect when querying:

  • Druid: APPROX_COUNT_DISTINCT_DS_HLL(DS_HLL(user_id_hll))
  • Trino: cardinality(merge(user_id_hll))