Materialization
Cube Nodes
When we attach a materialization config to a cube node (instructions here), we ask DJ to prepare to materialize the cube's underlying data into an OLAP database (such as Druid). This enables low-latency metric queries across all of the dimensions defined in the cube.
However, many such databases only support simple aggregations, so DJ breaks each metric expression down into its constituent simple aggregation measures prior to materialization. These measures are ingested into the OLAP database as separate columns and are combined back into the original metrics when users request metric data.
For a detailed explanation of how this decomposition works, including supported aggregation types and dialect translation, see Metric Decomposition.
A few examples include:

```sql
SELECT
  AVG(price) AS avg_repair_price
FROM repair_order_details
```

Decomposed measures:
- `price_count` (`count`): `count(price)`
- `price_sum` (`sum`): `sum(price)`
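To make the recombination step concrete, here is a minimal Python sketch (the data and variable names are illustrative, not DJ's actual code): `AVG` is not directly re-aggregable, so the `sum` and `count` measures are stored separately and divided at query time.

```python
# Illustrative rows from repair_order_details
rows = [{"price": 100.0}, {"price": 50.0}, {"price": 75.0}]

# Measures ingested into the OLAP store as separate columns:
price_sum = sum(r["price"] for r in rows)                      # sum(price)
price_count = sum(1 for r in rows if r["price"] is not None)   # count(price)

# Combiner expression reconstructs the original metric at query time:
avg_repair_price = price_sum / price_count
print(avg_repair_price)  # 75.0
```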
```sql
SELECT
  SUM(price) AS total_repair_price
FROM repair_order_details
```

Decomposed measures:
- `price_sum` (`sum`): `sum(price)`
```sql
SELECT
  CAST(
    SUM(IF(discount > 0.0, 1, 0)) AS DOUBLE
  ) / COUNT(*) AS discounted_orders_rate
FROM repair_order_details
```

Decomposed measures:
- `discount_sum` (`sum`): `sum(if(discount > 0.0, 1, 0))`
- `count_star` (`count`): `count(*)`
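A ratio metric like this one recombines two measures. A small sketch of the combiner arithmetic, with hypothetical data:

```python
# Illustrative rows from repair_order_details
rows = [{"discount": 0.2}, {"discount": 0.0}, {"discount": 0.5}, {"discount": 0.0}]

# The two stored measures:
discount_sum = sum(1 if r["discount"] > 0.0 else 0 for r in rows)  # sum(if(discount > 0.0, 1, 0))
count_star = len(rows)                                             # count(*)

# Combiner: cast to double, then divide (2 discounted orders out of 4)
discounted_orders_rate = float(discount_sum) / count_star
print(discounted_orders_rate)  # 0.5
```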
```sql
SELECT
  SUM(price1) + SUM(price2) AS total_cost
FROM costs
```

Decomposed measures:
- `price1_sum` (`sum`): `sum(price1)`
- `price2_sum` (`sum`): `sum(price2)`
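A useful property of these `sum` measures is that they are additive: pre-aggregated partitions can be rolled up again without losing correctness. A sketch under assumed data (the partition layout is illustrative):

```python
# Hypothetical pre-aggregated partitions, e.g. one row of measures per day:
partitions = [
    {"price1_sum": 10.0, "price2_sum": 5.0},
    {"price1_sum": 20.0, "price2_sum": 15.0},
]

# Sums roll up across partitions, then the combiner adds them together:
total_cost = sum(p["price1_sum"] + p["price2_sum"] for p in partitions)
print(total_cost)  # 50.0
```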
```sql
SELECT
  APPROX_COUNT_DISTINCT(user_id) AS unique_users
FROM events
```

Decomposed measures:
- `user_id_hll` (`hll_sketch_agg`): `hll_sketch_agg(user_id)`, stored as a binary HLL sketch
The combiner expression for APPROX_COUNT_DISTINCT uses HyperLogLog functions:

```sql
hll_sketch_estimate(hll_union(user_id_hll))
```

This is automatically translated to the appropriate dialect when querying:
- Druid: `APPROX_COUNT_DISTINCT_DS_HLL(DS_HLL(user_id_hll))`
- Trino: `cardinality(merge(user_id_hll))`
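One way to picture this dialect translation step is as template substitution over the canonical combiner. The function names below come from the examples above; the template mechanism itself is a sketch, not DJ's actual implementation.

```python
# Hypothetical dialect templates for the HLL combiner expression
DIALECT_TEMPLATES = {
    "druid": "APPROX_COUNT_DISTINCT_DS_HLL(DS_HLL({col}))",
    "trino": "cardinality(merge({col}))",
}

def render_hll_combiner(dialect: str, column: str) -> str:
    """Render the canonical HLL combiner for a target SQL dialect."""
    return DIALECT_TEMPLATES[dialect].format(col=column)

print(render_hll_combiner("druid", "user_id_hll"))
# APPROX_COUNT_DISTINCT_DS_HLL(DS_HLL(user_id_hll))
```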