Concepts

UTL-X Mapping

UTL-X (Universal Transformation Language Extended) is Open-M's native mapping language: an open-source, format-agnostic functional transformation language that is pure, stateless, and deterministic. It runs either inline on the connection arrow, with no extra component hop, or as a full mapping component for complex scenarios. It is licensed under AGPL v3 and dual-licensed for commercial use.

What is UTL-X?

UTL-X is a format-agnostic functional transformation language inspired by MuleSoft DataWeave, XSLT, and functional programming principles. It abstracts away the source format — the same transformation logic works whether the input is XML, JSON, CSV, YAML, Protobuf, or Avro. The language is pure: every expression returns a value, there are no side effects, and transformations are always deterministic.

In Open-M pipelines, UTL-X serves as the glue between components with incompatible schemas. It is deliberately lightweight for simple field mapping, and scales to N-input fan-in scenarios in which the Open-M wrapper supplies contextual history from the MPPM step window as additional named inputs.

📦

UTL-X is a standalone open-source project at github.com/grauwen/utl-x and can be used independently of Open-M. It is embedded in the Open-M runtime as the native mapping engine. All 465 tests in its conformance suite pass.

The three mapping modes

Open-M does not force every field mapping through a full component. Instead, mapping is placed where it belongs — either invisible (Mode 1), lightweight on the arrow (Mode 2), or as an explicit component (Mode 3).

Mode 1: Plain connection (no transform). Source and target schemas match, so no transform is needed. Pulsar carries the payload as-is.

Mode 2: UTL-X on the arrow (inline · ref). Stateless field mapping inline on the connection. It executes inside the receiving component's wrapper after dequeue and before business logic, with no extra hop.

Mode 3: Mapping component (explicit node). A full, explicit component, required for external calls, aggregation or splitting, joins across independent correlation chains, non-UTL-X engines, or MPPM step traceability.

In the IDE, Mode 2 renders as a decorated arrow with a ◇ diamond icon at the midpoint. Mode 3 renders as an explicit rectangular node on the canvas. Mode 1 is a plain arrow with no decoration.

Mode 1 — Plain connection

When the source component's output schema and the target component's input schema are identical, no transform is needed. The connection carries the MPPM envelope payload unchanged. This is the most efficient path — zero processing overhead beyond the Pulsar topic hop.

yaml — Mode 1 connection
connections:
  - id: kafka-to-processor
    from:
      component: kafka-in
      port: output
    to:
      component: order-processor
      port: input
    # No transform stanza: schemas match, plain connection

Mode 2 — UTL-X inline on the arrow

The most common mapping pattern. A UTL-X transform is declared directly on the connection in the pipeline YAML. It executes inside the receiving component's wrapper — in-process, after dequeue, before business logic — with no extra Pulsar topic or component pod.

Mode 2 has two sub-variants: inline (the mapping body is embedded in the YAML, suitable for small unique mappings under ~20 field assignments) and ref (references a versioned mapping stored in the Mapping Registry, suitable for complex or reused mappings).
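To make the execution order concrete, here is a minimal Python sketch of a receiving wrapper that applies a Mode 2 transform in-process, after dequeue and before business logic, and resolves the script either from the inline body or from a registry by ref. Everything in it is hypothetical: the Wrapper class, the registry dict standing in for the Mapping Registry, the compile_utlx callable standing in for the UTL-X engine, and the single input alias "input" are illustrative assumptions, not Open-M APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

# A compiled mapping takes named inputs and returns the output payload.
Mapping = Callable[[Dict[str, Any]], Any]


@dataclass
class Wrapper:
    """Hypothetical receiving-component wrapper, for illustration only."""
    business_logic: Callable[[Any], Any]
    compile_utlx: Callable[[str], Mapping]                   # stand-in for the UTL-X engine
    registry: Dict[str, str] = field(default_factory=dict)   # stand-in for the Mapping Registry
    transform: Optional[Dict[str, str]] = None              # the connection's transform stanza

    def _script(self) -> str:
        # inline: the body is embedded in the YAML; ref: look it up by versioned name
        if self.transform["mode"] == "inline":
            return self.transform["mapping"]
        return self.registry[self.transform["ref"]]

    def on_dequeue(self, payload: Any) -> Any:
        if self.transform is None:                # Mode 1: plain connection
            return self.business_logic(payload)
        mapping = self.compile_utlx(self._script())
        mapped = mapping({"input": payload})      # same process: no extra topic or pod
        return self.business_logic(mapped)        # business logic sees the target schema
```

A real wrapper would presumably compile each script once rather than per message; the sketch recompiles on every call only to stay short.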

ℹ️

Inline mappings and the step window: An inline Mode 2 mapping sits on a single arrow and can only access the step history of that pipe. The wrapper extracts one or more steps from the same MPPM envelope and passes them as named inputs — e.g. input: current json, previous json. Nothing from any other pipe is reachable from an inline mapping. To combine payloads from multiple pipes, use Mode 2 ref with N-input, or Mode 3.

Mode 2 inline

yaml — Mode 2 inline transform
connections:
  - id: sap-to-salesforce
    from:
      component: sap-idoc-in
      port: output
    to:
      component: salesforce-opportunity
      port: input
    route:
      topic: logistics.order-pipeline.sap-idoc-in.out
      source_schema_ref: logistics.schemas.sap-order-idoc:1.0.0
      target_schema_ref: logistics.schemas.sf-opportunity:2.1.0
    transform:
      type: utlx
      mode: inline
      mapping: |
                %utlx 1.0
                input: current json, previous json   // two steps from this pipe only
                output json
                ---
                {
                  Name:        $current.ORDERS05.E1EDK01.BSTNK,
                  AccountId:   $current.ORDERS05.E1EDKA1[0].PARTN,
                  Amount:      $current.ORDERS05.E1EDP01.NETWR |> toNumber(),
                  CloseDate:   $current.ORDERS05.E1EDK03.DATUM |> toDate("yyyyMMdd"),
                  StageName:   "Prospecting",
                  PrevOrderRef: $previous.ORDERS05.E1EDK01.BSTNK  // step before on same pipe
                }
      target_schema_ref: logistics.schemas.sf-opportunity:2.1.0

Mode 2 ref

For complex or reused mappings, store the UTL-X script in the Mapping Registry and reference it by version. This allows independent versioning, testing, and reuse across pipelines.

yaml — Mode 2 ref transform
transform:
  type: utlx
  mode: ref
  ref: logistics.mappings.sap-order-to-sf-opportunity:2.0.0
  target_schema_ref: logistics.schemas.sf-opportunity:2.1.0
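The ref values on this page share a dotted-name-plus-version shape, for example logistics.mappings.sap-order-to-sf-opportunity:2.0.0. The Python sketch below shows one way a client could split such a reference and resolve it against a registry. It assumes the part after the last colon is a MAJOR.MINOR.PATCH version and models the Mapping Registry as an in-memory dict; parse_ref, lookup and MappingRef are hypothetical names, not the registry's API.

```python
from typing import Dict, NamedTuple, Tuple


class MappingRef(NamedTuple):
    name: str                      # e.g. "logistics.mappings.sap-order-to-sf-opportunity"
    version: Tuple[int, int, int]  # e.g. (2, 0, 0)


def parse_ref(ref: str) -> MappingRef:
    """Split 'dotted.name:MAJOR.MINOR.PATCH' (assumed format) into its parts."""
    name, sep, version = ref.rpartition(":")
    if not sep or not name:
        raise ValueError(f"missing ':version' in {ref!r}")
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return MappingRef(name, (major, minor, patch))


def lookup(registry: Dict[MappingRef, str], ref: str) -> str:
    """Exact-version lookup: a pipeline pinned to 2.0.0 never silently gets 2.1.0."""
    key = parse_ref(ref)
    if key not in registry:
        raise KeyError(f"mapping {ref!r} not found")
    return registry[key]
```

Exact-version matching is a deliberate choice in this sketch: a pipeline pinned to 2.0.0 keeps running 2.0.0 after 2.1.0 is published, which fits the independent-versioning goal described above.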
💡

When to use inline vs ref: inline for pipeline-unique mappings under ~20 field assignments. ref when the mapping is reused across pipelines, requires independent version control, or exceeds ~20 assignments. ref mappings are searchable in the Mapping Registry and visible in the IDE's mapping browser.

Mode 3 — Full mapping component

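An explicit component node on the pipeline canvas. It is required when any of the following apply: the mapping makes external API or database calls; it aggregates or splits messages (N→1 or 1→N); it joins envelopes from independent correlation chains, which needs a stateful wait; it uses a non-UTL-X engine such as XSLT, JSONata, or jq; or compliance requires the mapping to appear as its own MPPM step for traceability.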

yaml — Mode 3 mapping component
components:
  - id: enrich-with-customer-data
    ref: open-m.connectors.utlx-mapper:1.0.0
    config:
      mapping_ref: crm.mappings.order-enrichment:3.1.0
      engine: utlx
    ports:
      input:
        schema_ref: crm.schemas.raw-order:1.0.0
      output:
        schema_ref: crm.schemas.enriched-order:2.0.0

Language syntax

Every UTL-X transformation follows the same three-part structure: a header declaring version and formats, a separator (---), and a functional expression body that produces the output.

utlx — basic structure
%utlx 1.0
// Header: version and format declarations
input  auto    // auto-detect: XML, JSON, CSV, YAML
output json    // target format
---
// Body: pure functional expression
{
  invoice: {
    id:       $input.Order.@id,
    customer: $input.Order.Customer.Name,
    total:    $input.Order.Items
                |> map(item => item.Price * item.Qty)
                |> sum()
  }
}

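UTL-X uses C-style comments: // for single-line and /* */ for multi-line comments. Hash (#) is not a comment character and causes a parse error in the body. Inside a YAML mapping: | block scalar the script is literal text, so a # there is passed to UTL-X rather than treated as a YAML comment.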
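Because the header is line-oriented and ends at the --- separator, a tool can read a script's declared inputs without evaluating the body. The Python sketch below assumes exactly the two header shapes shown on this page (input auto for a single input, and input: name fmt, name fmt for named inputs) and strips // comments from header lines. It is a hypothetical helper for illustration, not the UTL-X parser.

```python
import re
from typing import Dict, List, Optional, Tuple

_COMMENT = re.compile(r"//.*$")   # header lines only; no string literals expected there


def split_script(script: str) -> Tuple[List[str], str]:
    """Split at the first '---' line into (non-empty header lines, body text)."""
    lines = script.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "---":
            header = [_COMMENT.sub("", ln).strip() for ln in lines[:i]]
            return [h for h in header if h], "\n".join(lines[i + 1:])
    raise ValueError("no '---' separator between header and body")


def parse_header(header: List[str]) -> Dict[str, object]:
    """Read %utlx version, input declarations (alias -> format) and output format."""
    version: Optional[str] = None
    output: Optional[str] = None
    inputs: Dict[str, str] = {}
    for line in header:
        if line.startswith("%utlx"):
            version = line.split()[1]
        elif line.startswith("input:"):           # named inputs: 'input: a json, b xml'
            for decl in line[len("input:"):].split(","):
                alias, fmt = decl.split()
                inputs[alias] = fmt
        elif line.startswith("input"):           # single input, addressed as $input
            inputs["input"] = line.split()[1]
        elif line.startswith("output"):
            output = line.split()[1]
    return {"version": version, "inputs": inputs, "output": output}
```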

utlx — key operators
// Pipe operator — chains transformations
$input.items |> filter(i => i.active) |> map(i => i.name)

// Lambda arrow
items |> map(item => { name: item.label, qty: item.count })

// Safe navigation — returns null instead of error
$input.Order?.Customer?.Address

// XML attribute access
$input.Order.@id

// Index access
$input.Order.Items[0]

// Filter using pipe
$input.Order.Items |> filter(i => i.Price > 100)

Format-agnostic selectors

The same path syntax works regardless of whether the input is XML, JSON, CSV, or YAML. UTL-X translates the path to the appropriate native access at runtime via its Universal Data Model (UDM).

utlx — selectors
$input.Order.Customer.Name                             // Simple path
$input.Order.Items[0]                                  // Index access
$input.Order.@id                                       // XML attribute / JSON property
$input.Order.Items |> map(i => i.Name)                 // Extract from all elements
$input.Order.Items |> filter(i => i.Total > 1000)      // Filter by condition

// Multi-input — named with $ prefix (see N-input section)
$order.header.id
$pricing.lines[0].unitPrice
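The idea behind the UDM can be illustrated by normalising two formats into one tree and then walking the same path over either. In this Python sketch, XML elements become dict keys, attributes become keys prefixed with @, repeated elements become lists, and @id falls back to a plain JSON property; these conventions are assumptions chosen to mirror the selectors above, not UTL-X's actual UDM design.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def xml_to_tree(elem: ET.Element) -> Any:
    """XML element -> nested dicts; attributes become '@name' keys.

    Simplifications: repeated child tags become lists, a single child stays a
    plain value, and text is kept only on leaf elements without attributes.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node: dict = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in children:
        value = xml_to_tree(child)
        if child.tag not in node:
            node[child.tag] = value
        else:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
    return node


def load(text: str, fmt: str) -> Any:
    """Parse XML or JSON into the same dict/list shape."""
    if fmt == "xml":
        root = ET.fromstring(text)
        return {root.tag: xml_to_tree(root)}
    return json.loads(text)


def select(tree: Any, path: str) -> Any:
    """Walk a dotted path like 'Order.Items[1].Name' over either format."""
    for step in path.split("."):
        name, _, index = step.partition("[")
        if name.startswith("@") and name not in tree:
            name = name[1:]            # '@id' falls back to a plain JSON property 'id'
        tree = tree[name]
        if index:
            tree = tree[int(index.rstrip("]"))]
    return tree
```

With an XML order and the equivalent JSON order loaded this way, select(tree, "Order.Customer.Name") returns the same value for both, which is the property the selector table above describes.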

Standard library

UTL-X ships with 635 standard library functions covering strings, arrays, dates, math, type conversion, and schema operations. All functions work on the Universal Data Model and are format-independent.

utlx — common stdlib functions
// String functions
upper($input.name)
trim($input.description)
replace(value, ({"\\n": "", "\\t": ""}))   // object literal needs ( )
contains($input.code, "SAP")

// Array / collection functions
map(items, item => item.name)
filter(items, item => item.active)
reduce(items, (acc, item) => acc + item.amount, 0)
sum(prices)
groupBy(items, item => item.category)
flatten(nestedList)
distinct(values)

// Date / time functions
toDate($input.dateStr, "yyyyMMdd")
formatDate(date, "yyyy-MM-dd")
now()

// Type conversion
toNumber($input.amount)
toString(42)
toBoolean("true")

// Null handling
default($input.optional?.field, "N/A")
isNull($input.value)
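The examples on this page use both call forms for the same functions, for instance toDate($input.dateStr, "yyyyMMdd") here and ... |> toDate("yyyyMMdd") in the Mode 2 inline mapping. That suggests x |> f(args) behaves like f(x, args). The Python sketch below models that reading with a hypothetical Pipe class and stand-in functions map_, filter_ and sum_; it is an analogy for how the chains compose, not how the UTL-X engine is implemented.

```python
from typing import Any


class Pipe:
    """Hypothetical model of '|>': Pipe(x) | (f, *args) evaluates f(x, *args)."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def __or__(self, step: tuple) -> "Pipe":
        func, *args = step
        return Pipe(func(self.value, *args))


# Python stand-ins for three stdlib functions; the piped value is the first argument.
def map_(items, fn):
    return [fn(i) for i in items]


def filter_(items, fn):
    return [i for i in items if fn(i)]


def sum_(items):
    return sum(items)


items = [
    {"active": True, "Price": 10, "Qty": 2},
    {"active": False, "Price": 5, "Qty": 1},
    {"active": True, "Price": 3, "Qty": 3},
]

# items |> filter(i => i.active) |> map(i => i.Price * i.Qty) |> sum()
piped = (Pipe(items) | (filter_, lambda i: i["active"])
         | (map_, lambda i: i["Price"] * i["Qty"]) | (sum_,)).value

# The same computation in nested call form
nested = sum_(map_(filter_(items, lambda i: i["active"]), lambda i: i["Price"] * i["Qty"]))
```

Both forms evaluate to the same total; the pipe form simply reads left to right in the order the data flows.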

The step window

Every MPPM envelope carries a step window: an ordered history of the payloads produced at the last N steps the message passed through. This gives the Open-M wrapper access to earlier states of the data, which it can hand to a UTL-X mapping, without any stateful store or external lookup.

Step window, depth 3 (default):
step[0]   current payload ($input)
step[-1]  payload after component A
step[-2]  payload after component B
step[-3]  payload after component C
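A step window of depth N behaves like a bounded queue: each step appends its output, and the oldest entry falls off once N previous steps are held. The sketch below models that with collections.deque. The StepWindow class, its record and step methods, and the rule that depth counts previous steps (so depth 3 holds four payloads, as in the listing above) are assumptions for illustration, not the MPPM envelope format.

```python
from collections import deque
from typing import Any, Deque


class StepWindow:
    """Bounded step history: step(0) is current, step(-1) the one before, and so on."""

    def __init__(self, depth: int = 3) -> None:
        # current payload plus `depth` previous ones
        self._steps: Deque[Any] = deque(maxlen=depth + 1)

    def record(self, payload: Any) -> None:
        self._steps.append(payload)          # deque drops the oldest entry when full

    def step(self, offset: int = 0) -> Any:
        if offset > 0 or -offset >= len(self._steps):
            raise IndexError(f"step[{offset}] is outside the window")
        return self._steps[len(self._steps) - 1 + offset]
```

After five payloads are recorded into a depth-3 window, step(0) is the fifth, step(-3) is the second, and the first is no longer reachable.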

The step window is an Open-M MPPM envelope concept — it is not a UTL-X language feature. UTL-X itself has no awareness of message history. If a previous step's payload is needed inside a UTL-X mapping, the Open-M wrapper extracts the relevant envelope step and passes it as a named input declared in the UTL-X header. Previous steps are just inputs like any other — they must be named.

utlx — accessing a previous step as a named input
%utlx 1.0
input: current json, previous json
output json
---
{
  // Current payload — named input "$current"
  orderId:       $current.id,
  finalAmount:   $current.pricing.finalAmount,

  // Previous step payload — named input "$previous"
  originalPrice: $previous.pricing.baseAmount,
  sourceSystem:  $previous.metadata.origin,

  // Computed across both
  priceChange:   $current.pricing.finalAmount - $previous.pricing.baseAmount
}

The Open-M pipeline YAML specifies which envelope steps to extract and under which input alias to pass them to the UTL-X script. The mapping itself stays pure — it only sees named inputs, with no knowledge of the MPPM envelope or step indices.
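The wrapper-side binding described here can be pictured as a lookup from alias to step offset. In this Python sketch the bindings dict stands in for what the pipeline YAML would declare, and the window is a plain list ordered oldest to newest; the function name and that data shape are hypothetical. The mapping only ever receives the resulting dict of named inputs, never the envelope.

```python
from typing import Any, Dict, List


def bind_named_inputs(window: List[Any], bindings: Dict[str, int]) -> Dict[str, Any]:
    """Map UTL-X input aliases to envelope steps.

    window   -- payloads ordered oldest -> newest, so window[-1] is the current step
    bindings -- alias -> step offset (0 = current, -1 = previous, ...)
    """
    named: Dict[str, Any] = {}
    for alias, offset in bindings.items():
        if offset > 0 or -offset >= len(window):
            raise ValueError(f"input {alias!r}: step {offset} not in window")
        named[alias] = window[len(window) - 1 + offset]
    return named
```

For the current/previous mapping above, the bindings would be {"current": 0, "previous": -1}, and the script sees $current and $previous without knowing which envelope steps they came from.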

💡

Multiple MPPM envelopes as inputs: A UTL-X mapping can receive payloads from N independent MPPM envelopes — each from a different correlation chain — as long as each is passed as a distinctly named input by the Open-M wrapper. Mode 3 is only required when the join needs stateful waiting — i.e. the envelopes don't arrive at the same time and need to be held until all are present. If the wrapper can assemble them synchronously, Mode 2 is valid.

Input naming convention

Since previous step payloads and multi-envelope inputs are all just named inputs, a clear naming convention makes mappings self-documenting. The recommended pattern is {source}{Stage}, naming each input after what the data represents rather than its technical position:

utlx — recommended input naming patterns
// Current + previous step from the same pipeline
input: orderCurrent json, orderPrevious json

// Multiple independent sources
input: orderValidated json, pricingResult json, customerProfile xml

// Named after the upstream microservice/component
input: sapIdocIn xml, workdayEmployee json, sfOpportunity json

// Step history from a named pipe — explicit depth in name
input: pipe1Step0 json, pipe1Step1 json, pipe2Step0 xml

// Mixed formats — each declared with its own format
input: orderHeader xml, orderLines csv, pricing json

The transform.inputs stanza in the Open-M pipeline YAML binds each arrow or envelope step to an alias, and each alias must exactly match a name declared in the UTL-X header. The mapping itself stays pure: it only ever sees named inputs.
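Since a mismatch between the transform.inputs aliases and the UTL-X header would otherwise surface only at runtime, a deploy-time check is cheap. This Python sketch assumes the named-input header form shown on this page and takes the YAML aliases as a plain list; the function names and messages are hypothetical.

```python
import re
from typing import List, Set

# Matches 'input: a json, b xml' with an optional trailing // comment.
_INPUT_LINE = re.compile(r"^\s*input:\s*(.+?)\s*(//.*)?$")


def declared_aliases(script: str) -> Set[str]:
    """Collect aliases from the named-input header line (before '---')."""
    for line in script.splitlines():
        if line.strip() == "---":
            break
        m = _INPUT_LINE.match(line)
        if m:
            return {decl.split()[0] for decl in m.group(1).split(",")}
    return set()


def check_bindings(script: str, yaml_aliases: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the names line up."""
    declared = declared_aliases(script)
    bound = set(yaml_aliases)
    problems = [f"alias {a!r} bound in YAML but not declared in header" for a in sorted(bound - declared)]
    problems += [f"input {a!r} declared in header but never bound" for a in sorted(declared - bound)]
    if len(yaml_aliases) != len(bound):
        problems.append("duplicate alias in transform.inputs")
    return problems
```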

N-input fan-in (Mode 2)

Open-M allows multiple input arrows to converge on a single UTL-X transform, each bound to a named input. Each input carries its own step window. The transform is declared on the trigger arrow (the one whose arrival fires the mapping) and references the context arrows by alias.

This is valid as Mode 2 only when all inputs share the same correlation_id, meaning they are parts of the same message chain. The wrapper assembles the context synchronously from a short-term buffer (default: 1,000 envelopes, 5-second TTL).
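The short-term buffer can be sketched as a store keyed by (correlation_id, alias) that evicts the oldest entry when full and drops entries older than the TTL. The defaults below mirror the figures quoted here (1,000 envelopes, 5-second TTL). The CorrelationBuffer class, its put and assemble methods, the evict-oldest policy, and passing the clock in as now are assumptions of this sketch, not Open-M's documented implementation.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable, Optional, Tuple


class CorrelationBuffer:
    """Hypothetical same-chain context buffer: bounded size, per-entry TTL."""

    def __init__(self, capacity: int = 1000, ttl_seconds: float = 5.0) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        # (correlation_id, alias) -> (arrival_time, payload), oldest first
        self._entries: "OrderedDict[Tuple[str, str], Tuple[float, Any]]" = OrderedDict()

    def _expire(self, now: float) -> None:
        # Entries are kept in arrival order, so expired ones are always at the front.
        while self._entries:
            key, (arrived, _) = next(iter(self._entries.items()))
            if now - arrived < self.ttl:
                break
            del self._entries[key]

    def put(self, correlation_id: str, alias: str, payload: Any, now: float) -> None:
        self._expire(now)
        key = (correlation_id, alias)
        self._entries.pop(key, None)            # re-insert so order tracks arrival
        self._entries[key] = (now, payload)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the oldest envelope

    def assemble(self, correlation_id: str, aliases: Iterable[str],
                 now: float) -> Optional[Dict[str, Any]]:
        """Return {alias: payload} if every alias is present for this chain, else None."""
        self._expire(now)
        found: Dict[str, Any] = {}
        for alias in aliases:
            entry = self._entries.get((correlation_id, alias))
            if entry is None:
                return None
            found[alias] = entry[1]
        return found
```

When the trigger arrow's payload arrives, the wrapper would call assemble for the context aliases; a None result is exactly the case where same-chain assumptions fail and a Mode 3 stateful join is the better fit.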

yaml — N-input Mode 2 transform
transform:
  type: utlx
  mode: ref
  ref: logistics.mappings.compose-dispatch:1.0.0
  trigger: true        # this arrow owns the transform
  inputs:
    - alias: order
      arrow: conn-order-to-dispatch-mapper
      source_schema_ref: logistics.schemas.order-validated:1.0.0
      window_depth: 3
    - alias: pricing
      arrow: conn-pricing-to-dispatch-mapper
      source_schema_ref: logistics.schemas.order-pricing:1.0.0
      window_depth: 2
  correlation_mode: same_chain
  target_schema_ref: logistics.schemas.fulfillment-dispatch:2.0.0

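Inside the UTL-X mapping, each named input is accessible via its alias with a $ prefix. Each alias is backed by its own step window (window_depth above), but earlier steps from that window reach the script only when the wrapper passes them in as additional named inputs: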

utlx — N-input mapping body
%utlx 1.0
input: order json, pricing json
output json
---
{
  // Access named inputs by alias
  dispatchId:   $order.header.orderId,
  customerId:   $order.customer.id,
  totalAmount:  $pricing.summary.grossTotal,
  currency:     $pricing.summary.currency,

  lines: $order.lines |> map(line => {
    sku:      line.productCode,
    qty:      line.quantity,
    price:    $pricing.lines[line.lineId].unitPrice   // assumes lineId is the position of the matching pricing line
  })
}
ℹ️

Step window propagation rule: Only the trigger arrow's history propagates forward into the downstream envelope. If a previous step payload is needed inside the mapping, it must be passed as an additional named input declared in the UTL-X header — the wrapper extracts the relevant envelope step and passes it by name. If relevant context data is needed further downstream, explicitly include it in the mapping output.
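The propagation rule can be expressed as a small function: the downstream window is built from the trigger arrow's window plus the mapping output, and context inputs contribute only what the mapping copies into that output. In this sketch windows are lists ordered oldest to newest; the function name and the depth handling are assumptions.

```python
from typing import Any, Dict, List


def next_window(trigger_window: List[Any], context: Dict[str, List[Any]],
                output: Any, depth: int = 3) -> List[Any]:
    """Build the downstream step window after an N-input mapping.

    Only trigger_window is carried forward. The context windows are accepted
    but deliberately not used, so any context the next hop needs must already
    be inside `output`.
    """
    window = trigger_window + [output]
    return window[-(depth + 1):]   # keep current plus `depth` previous steps
```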

Which mode to use?

| Situation | Mode | Reason |
|---|---|---|
| Schemas match exactly | Mode 1 | No transform needed. Zero overhead. |
| Simple field rename or type conversion, unique to this pipeline | Mode 2 inline | Pure, stateless, <20 assignments. Lives in the YAML. |
| Complex mapping reused across pipelines | Mode 2 ref | Independently versioned in the Mapping Registry. |
| Fan-in from the same correlation chain (N inputs, same correlation_id) | Mode 2 ref (N-input) | Same-chain context assembled synchronously by the wrapper. |
| Fan-in from independent correlation chains (different correlation_ids) | Mode 3 | Requires a stateful wait. Use the stateful-join component. |
| Mapping makes external API or database calls | Mode 3 | Side effects disqualify Mode 2 (must be pure). |
| Aggregation or splitting (N→1 or 1→N) | Mode 3 | Cardinality changes require an explicit component. |
| Non-UTL-X engine (XSLT, JSONata, jq) | Mode 3 | External engines always run as components. |
| Compliance requires MPPM step traceability | Mode 3 | Mode 3 adds an MPPM step entry visible in the ops dashboard. |
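The table reads as a decision procedure in which the Mode 3 conditions take precedence, because any one of them rules out Mode 2. This Python sketch encodes that reading; the Situation fields are hypothetical names for the table's conditions, and treating "20 or more assignments, or reused" as the ref test follows the rule of thumb given earlier rather than any hard limit.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    schemas_match: bool = False
    external_calls: bool = False          # API or database calls (side effects)
    changes_cardinality: bool = False     # aggregation (N->1) or splitting (1->N)
    non_utlx_engine: bool = False         # XSLT, JSONata, jq, ...
    needs_step_traceability: bool = False
    fan_in_inputs: int = 1
    same_correlation_chain: bool = True
    assignments: int = 0
    reused: bool = False


def choose_mode(s: Situation) -> str:
    # Any Mode 3 condition wins: Mode 2 must stay pure, stateless and 1:1.
    if (s.external_calls or s.changes_cardinality or s.non_utlx_engine
            or s.needs_step_traceability
            or (s.fan_in_inputs > 1 and not s.same_correlation_chain)):
        return "Mode 3"
    if s.schemas_match and s.fan_in_inputs == 1:
        return "Mode 1"
    if s.fan_in_inputs > 1:
        return "Mode 2 ref (N-input)"
    if s.reused or s.assignments >= 20:
        return "Mode 2 ref"
    return "Mode 2 inline"
```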

UTL-X vs DataWeave and alternatives

| Feature | UTL-X | DataWeave | XSLT | JSONata / JQ |
|---|---|---|---|---|
| Format agnostic | XML, JSON, CSV, YAML, Avro, Protobuf | | XML only | JSON only |
| Licence | AGPL v3 / Commercial | Proprietary (Salesforce) | W3C open standard | MIT / MIT |
| Inline on message arrow | Mode 2 | Always a component | | |
| Step window history | Built into MPPM | | | |
| N-input fan-in | Same-chain, Mode 2 | ~ Via DataWeave scripts | | |
| Functional / pure | | | | |
| Standard library | 635 functions | Rich | ~ XPath/EXSLT | ~ Limited |
| LSP / IDE support | JSON-RPC 2.0 daemon | MuleSoft IDE only | ~ | |
| Use without middleware platform | Standalone CLI | Requires MuleSoft | | |