Transforms

Transform nodes let you run arbitrary SQL operations on sources, dimensions, and even other transform nodes. With a perfect data model, you may not need to define any transform nodes. In practice, though, they are a convenient way to clean up your external data within DJ by joining, aggregating, casting types, or applying any other SQL operation that your query engine supports.

| Attribute    | Description                                              | Type   |
|--------------|----------------------------------------------------------|--------|
| name         | Unique name used by other nodes to select from this node | string |
| display_name | A human-readable name for the node                       | string |
| description  | A human-readable description of the node                 | string |
| mode         | published or draft (see Node Mode)                       | string |
| query        | A SQL query that selects from other nodes                | string |
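
The attributes above map directly onto the JSON body sent when a transform node is created. As a minimal sketch, assuming only the fields and the two mode values listed in the table, a small helper can check a definition before it is submitted. The helper name `build_transform_payload` and its defaults are hypothetical and are not part of any DJ client.

```python
import json

# Mode values taken from the attribute table above.
ALLOWED_MODES = {"published", "draft"}


def build_transform_payload(name, query, description="", display_name=None, mode="published"):
    """Return a dict holding the transform node attributes (hypothetical helper)."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    payload = {
        "name": name,
        "description": description,
        "mode": mode,
        # Collapse runs of whitespace so a multi-line query serializes as one line.
        # Note: this would also alter whitespace inside SQL string literals.
        "query": " ".join(query.split()),
    }
    # display_name is only included when the caller supplies one.
    if display_name is not None:
        payload["display_name"] = display_name
    return payload


payload = build_transform_payload(
    name="default.repair_orders_w_dispatchers",
    description="Repair orders that have a dispatcher",
    query="""
        SELECT repair_order_id, municipality_id, hard_hat_id, dispatcher_id
        FROM default.repair_orders
        WHERE dispatcher_id IS NOT NULL
    """,
)
print(json.dumps(payload, indent=4))
```

The printed JSON has the same shape as the request body used in the curl example under Creating Transform Nodes. Passing a mode other than `published` or `draft` raises a `ValueError` before anything is sent.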

Creating Transform Nodes

curl -X POST http://localhost:8000/nodes/transform/ \
-H 'Content-Type: application/json' \
-d '{
    "name": "default.repair_orders_w_dispatchers",
    "description": "Repair orders that have a dispatcher",
    "mode": "published",
    "query": "SELECT repair_order_id, municipality_id, hard_hat_id, dispatcher_id FROM default.repair_orders WHERE dispatcher_id IS NOT NULL"
}'
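# Python client
from datajunction import DJBuilder, NodeMode

# Base URL of the DJ server (the same host used in the curl example)
DJ_URL = "http://localhost:8000"
dj = DJBuilder(DJ_URL)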

transform_node = dj.create_transform(
    name="default.repair_orders_w_dispatchers",
    description="Repair orders that have a dispatcher",
    query="""
        SELECT
        repair_order_id,
        municipality_id,
        hard_hat_id,
        dispatcher_id
        FROM default.repair_orders
        WHERE dispatcher_id IS NOT NULL
    """,
    mode=NodeMode.PUBLISHED,  # for draft nodes, use `mode=NodeMode.DRAFT`
)
// JavaScript client; `dj` is assumed to be an already initialized DJ client instance
dj.transforms.create(
    {
        name: "default.repair_orders_w_dispatchers",
        mode: "published",
        description: "Repair orders that have a dispatcher",
        query: `
            SELECT
            repair_order_id,
            municipality_id,
            hard_hat_id,
            dispatcher_id
            FROM default.repair_orders
            WHERE dispatcher_id IS NOT NULL
        `
    }
).then(data => console.log(data))
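
To see which rows the query in these examples selects, the sketch below runs the same filter against an in-memory SQLite table. This is only an illustration: SQLite stands in for whatever query engine DJ is configured with, the table is named `repair_orders` without the `default.` namespace, and the column types and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the default.repair_orders source; column types are assumed.
conn.execute(
    "CREATE TABLE repair_orders ("
    "repair_order_id INTEGER, municipality_id TEXT, "
    "hard_hat_id INTEGER, dispatcher_id INTEGER)"
)

# Hypothetical sample rows; the second order has no dispatcher.
conn.executemany(
    "INSERT INTO repair_orders VALUES (?, ?, ?, ?)",
    [
        (10001, "New York", 1, 3),
        (10002, "New York", 3, None),
        (10003, "Boston", 5, 2),
    ],
)

# Same projection and filter as the transform node's query.
rows = conn.execute(
    "SELECT repair_order_id, municipality_id, hard_hat_id, dispatcher_id "
    "FROM repair_orders "
    "WHERE dispatcher_id IS NOT NULL "
    "ORDER BY repair_order_id"
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Only orders 10001 and 10003 are returned, because order 10002 has a NULL `dispatcher_id`. Nodes that select from the transform node would see this filtered set rather than the full source table.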