Pipeline YAML

Pipeline YAML Overview

The pipeline YAML descriptor is the canonical artefact in Open-M. It is human-readable, Git-diffable, and drives everything: the IDE canvas renders it, the control plane provisions topics and subscriptions from it, and CI/CD validates and deploys it. You never have to write XML, click through a wizard, or hand-edit a binary file.

Why YAML?

Enterprise middleware platforms like TIBCO BusinessWorks and MuleSoft use XML as their canonical flow format. XML produces merge conflicts that are practically unresolvable, is unreadable without the designer tool, and forces every change through a single IDE. Open-M uses YAML because:

- it diffs cleanly in Git, so pull requests show exactly which component, connection, or config value changed;
- it is readable and editable in any text editor, with no designer tool required;
- it can be validated and deployed by standard CI/CD tooling, with no locked binary files and no single IDE chokepoint.

Top-level structure

yaml — skeleton
apiVersion: open-m/v1
kind:       Pipeline

metadata:
  name:        order-fulfillment
  namespace:   logistics.orders
  version:     2.5.0
  description: "Receives orders, enriches, validates inventory, dispatches."

spec:
  defaults:    # pipeline-wide defaults, all overridable per component/connection
    ...
  schemas:     # all schema refs used in this pipeline
    ...
  components:  # nodes on the canvas
    ...
  connections: # arrows on the canvas — each is a topic + subscription + route
    ...
  error_handling:
    ...
  scaling:     # kept separate from logical flow
    ...

The two mandatory top-level fields are apiVersion: open-m/v1 and kind: Pipeline. These allow the control plane and IDE to version-gate the parser and provide forward-compatible parsing as the descriptor format evolves.

Stanzas at a glance

metadata
Identity & version
name, namespace, version (semver), description. The namespace determines the Pulsar tenant and topic prefix for all auto-generated topics in this pipeline.
spec.defaults
Pipeline-wide defaults
delivery_guarantee, subscription_type, retention_policy, retry strategy, logging level. Every component and connection inherits these unless overridden.
spec.schemas
Schema declarations
All schema refs used in this pipeline, declared once. Full schema content lives in the Schema Registry. Enables control-plane compatibility validation at deploy time.
spec.components
Components (canvas nodes)
id, ref (namespace.category.name:version), config, ports (with schema_ref), placement (x/y for canvas), logging topic. One entry per component node.
spec.connections
Connections (canvas arrows)
Each arrow = one output topic + one named durable subscription + one route. Carries schema_ref, delivery_guarantee, retry policy, and error_routing.
spec.connections[].transform
UTL-X on the arrow
Mode 1 (plain), Mode 2 (UTL-X inline or ref on the connection), Mode 3 (explicit mapping component). Declared inside the connection stanza.
spec.error_handling
DLQ & retry
Pipeline-level DLQ strategy, replay flag, alert channel. Per-connection retry (max_attempts, strategy, delays) and error_routing topic/subscription live in the connection stanza.
spec.scaling
Multi-cluster placement
Kept separate from the logical flow. Declares which cluster each component runs on and how many replicas. The logical pipeline topology is unchanged regardless of how scaling is configured.
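Inheritance from spec.defaults works per field: a connection (or component) restates only the fields it wants to change. A sketch of a single-connection override, assuming the field names from the full example below; the component and connection ids here are hypothetical, and Failover is assumed to be an accepted subscription_type value (it is a standard Pulsar subscription type):

```yaml
spec:
  defaults:
    delivery_guarantee: at_least_once
    subscription_type:  Shared

  connections:
    - id:   conn-audit                      # hypothetical connection
      from: { component: validator, port: output }
      to:   { component: audit-log, port: input  }
      # Overrides the pipeline-wide default for this arrow only;
      # every other connection still inherits Shared.
      subscription_type: Failover           # assumed valid value
```

Only subscription_type changes for this arrow; delivery_guarantee and the retry strategy still come from spec.defaults.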

Full minimal example

A two-component pipeline — HTTP inbound to SAP RFC — with a UTL-X field mapping on the arrow, a single error handler, and single-cluster scaling:

yaml — complete minimal pipeline
apiVersion: open-m/v1
kind:       Pipeline

metadata:
  name:      order-to-sap
  namespace: logistics.orders
  version:   1.0.0

spec:
  defaults:
    delivery_guarantee: at_least_once
    subscription_type:  Shared
    retry:
      max_attempts:    3
      strategy:        exponential_backoff
      initial_delay_ms: 500
    logging:
      level:           INFO

  schemas:
    - { ref: logistics.schemas.http-order:1.0.0,   type: JSCH }
    - { ref: logistics.schemas.sap-order-idoc:1.0.0, type: XSC  }

  components:
    - id:  http-inbound
      ref: open-m.adapters.http-inbound:1.4.0
      config: { path: /orders, method: POST }
      ports:
        output: { schema_ref: logistics.schemas.http-order:1.0.0, format: JSON }
      placement: { x: 100, y: 300 }
      logging:
        topic: logistics.orders.order-to-sap.http-inbound.log

    - id:  sap-rfc-out
      ref: open-m.connectors.sap-rfc:2.1.0
      config: { bapi: BAPI_SALESORDER_CREATEFROMDAT2 }
      ports:
        input: { schema_ref: logistics.schemas.sap-order-idoc:1.0.0, format: XML }
      placement: { x: 600, y: 300 }
      logging:
        topic: logistics.orders.order-to-sap.sap-rfc-out.log

  connections:
    - id:   conn-http-to-sap                                   # auto
      from: { component: http-inbound, port: output }
      to:   { component: sap-rfc-out,  port: input  }
      route:
        topic:             logistics.orders.order-to-sap.http-inbound.out  # auto
        subscription:      logistics.orders.order-to-sap.sap-rfc-out.sub   # auto
        source_schema_ref: logistics.schemas.http-order:1.0.0
        target_schema_ref: logistics.schemas.sap-order-idoc:1.0.0
      transform:
        type: utlx
        mode: ref
        ref:  logistics.mappings.http-order-to-idoc:1.0.0
      error_routing:
        topic:        logistics.orders.order-to-sap.http-inbound.err      # auto
        subscription: logistics.orders.order-to-sap.error-handler.err-sub  # auto

  error_handling:
    dlq_strategy:   per_component
    replay_enabled: true
    alert_on_dlq:   true

  scaling:
    clusters:
      - cluster-ref: k8s-prod-eu-west
        components:
          - { id: http-inbound, replicas: 2 }
          - { id: sap-rfc-out,  replicas: 1 }

Auto-generated fields

When you draw an arrow in the IDE canvas, the following fields (marked # auto in the example above) are auto-populated from the naming convention. You can override any of them in the text editor:

- the connection id (conn-http-to-sap in the example)
- route.topic, the output topic (logistics.orders.order-to-sap.http-inbound.out)
- route.subscription, the durable subscription (logistics.orders.order-to-sap.sap-rfc-out.sub)
- error_routing.topic and error_routing.subscription (logistics.orders.order-to-sap.http-inbound.err and logistics.orders.order-to-sap.error-handler.err-sub)

⚠️

Subscription names are stable identifiers. In Pulsar, a subscription name is durable — if it changes between deployments, the consumer restarts from the latest offset and silently drops unprocessed messages. Never rename a subscription on a production pipeline without a deliberate migration plan. Auto-generated subscription names are stable as long as component IDs and pipeline names don't change.
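One way to guard against an accidental rename is to pin the subscription name explicitly in the YAML: once written out, a later change to the component id no longer regenerates it. A sketch using the connection from the example above; the pinned value is simply today's auto-generated name, frozen in place:

```yaml
connections:
  - id:   conn-http-to-sap
    from: { component: http-inbound, port: output }
    to:   { component: sap-rfc-out,  port: input  }
    route:
      # Pinned explicitly: renaming the sap-rfc-out component will no
      # longer change this name, so the durable subscription (and its
      # unacknowledged backlog) survives redeployment.
      subscription: logistics.orders.order-to-sap.sap-rfc-out.sub
      topic:        logistics.orders.order-to-sap.http-inbound.out
```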

YAML as the canonical artefact

The pipeline YAML is always the source of truth. The IDE canvas is a visual rendering of the YAML — not the other way around. Every canvas action (drag a component, draw an arrow, change a config value) writes back to the YAML file immediately. The canvas and the YAML are always in sync.

This means your pipelines live in Git. Pull requests show exactly what changed: which component was added, which connection was rewired, what config value changed. CI/CD validates the YAML schema and runs connector tests before merge. There are no locked binary files, no single IDE chokepoint.
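As a sketch of what the CI step can look like: the snippet below validates every pipeline descriptor against a JSON Schema using the open-source check-jsonschema tool (which accepts YAML instance files). The schema file name and repository layout are illustrative assumptions, not part of Open-M:

```yaml
# .github/workflows/pipeline-ci.yml (illustrative layout)
name: pipeline-ci
on: pull_request

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate pipeline descriptors against the Open-M schema
        run: |
          pip install check-jsonschema
          # open-m-pipeline.schema.json is a hypothetical path to the
          # published JSON Schema for apiVersion: open-m/v1 documents
          check-jsonschema --schemafile open-m-pipeline.schema.json pipelines/*.yaml
```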

💡

Install the Open-M VS Code extension to get JSON Schema validation, component ref autocomplete, and hover documentation directly in your editor — no canvas required. The extension registers a JSON Schema for apiVersion: open-m/v1, kind: Pipeline documents automatically.