Pipeline YAML

Pipeline YAML Overview

The pipeline YAML descriptor is the canonical artefact in Open-M. It is human-readable, Git-diffable, and drives everything: the IDE canvas renders it, the control plane provisions topics and subscriptions from it, and CI/CD validates and deploys it. You never have to write XML, click through a wizard, or hand-edit a binary file.

Why YAML?

Enterprise middleware platforms like TIBCO BusinessWorks and MuleSoft use XML as their canonical flow format. XML produces merge conflicts that are practically unresolvable, is unreadable without the designer tool, and forces every change through a single IDE. Open-M uses YAML because:

- it diffs cleanly in Git, so pull requests show exactly which component, connection, or config value changed;
- it is readable and editable in any text editor, with no designer tool required;
- it can be validated and deployed by standard CI/CD tooling, with no locked binary files and no single IDE chokepoint.

Top-level structure

yaml — skeleton
apiVersion: open-m/v1
kind:       Pipeline

metadata:
  name:        order-fulfillment
  namespace:   logistics.orders
  version:     2.5.0
  description: "Receives orders, enriches, validates inventory, dispatches."

spec:
  defaults:    # pipeline-wide defaults, all overridable per component/connection
    ...
  schemas:     # all schema refs used in this pipeline
    ...
  components:  # nodes on the canvas
    ...
  connections: # arrows on the canvas — each is a topic + subscription + route
    ...
  error_handling:
    ...
  scaling:     # kept separate from logical flow
    ...

The two mandatory top-level fields are apiVersion: open-m/v1 and kind: Pipeline. These allow the control plane and IDE to version-gate the parser and provide forward-compatible parsing as the descriptor format evolves.

Stanzas at a glance

metadata
Identity & version
name, namespace, version (semver), description. The namespace determines the Pulsar tenant and topic prefix for all auto-generated topics in this pipeline.
spec.defaults
Pipeline-wide defaults
delivery_guarantee, subscription_type, retention_policy, retry strategy, logging level. Every component and connection inherits these unless overridden.
spec.schemas
Schema declarations
All schema refs used in this pipeline, declared once. Full schema content lives in the Schema Registry. Enables control-plane compatibility validation at deploy time.
spec.components
Components (canvas nodes)
id, ref (namespace.category.name:version), config, ports (with schema_ref), placement (x/y for canvas), logging topic. One entry per component node.
spec.connections
Connections (canvas arrows)
Each arrow = one output topic + one named durable subscription + one route. Carries schema_ref, delivery_guarantee, retry policy, and error_routing.
spec.connections[].transform
UTL-X on the arrow
Mode 1 (plain), Mode 2 (UTL-X inline or ref on the connection), Mode 3 (explicit mapping component). Declared inside the connection stanza.
spec.error_handling
DLQ & retry
Pipeline-level DLQ strategy, replay flag, alert channel. Per-connection retry (max_attempts, strategy, delays) and error_routing topic/subscription live in the connection stanza.
spec.scaling
Multi-cluster placement
Kept separate from the logical flow. Declares which cluster each component runs on and how many replicas. The logical pipeline topology is unchanged regardless of how scaling is configured.
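Inheritance from spec.defaults works per field: a connection (or component) restates only the fields it wants to change. A sketch of a single-connection override, assuming the field names from the full example below; the component and connection ids here are hypothetical, and Failover is assumed to be an accepted subscription_type value (it is a standard Pulsar subscription type):

```yaml
spec:
  defaults:
    delivery_guarantee: at_least_once
    subscription_type:  Shared

  connections:
    - id:   conn-audit                      # hypothetical connection
      from: { component: validator, port: output }
      to:   { component: audit-log, port: input  }
      # Overrides the pipeline-wide default for this arrow only;
      # every other connection still inherits Shared.
      subscription_type: Failover           # assumed valid value
```

Only subscription_type changes for this arrow; delivery_guarantee and the retry strategy still come from spec.defaults.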

Full minimal example

A two-component pipeline — HTTP inbound to SAP RFC — with a UTL-X field mapping on the arrow, a single error handler, and single-cluster scaling:

yaml — complete minimal pipeline
apiVersion: open-m/v1
kind:       Pipeline

metadata:
  name:      order-to-sap
  namespace: logistics.orders
  version:   1.0.0

spec:
  defaults:
    delivery_guarantee: at_least_once
    subscription_type:  Shared
    retry:
      max_attempts:    3
      strategy:        exponential_backoff
      initial_delay_ms: 500
    logging:
      level:           INFO

  schemas:
    - { ref: logistics.schemas.http-order:1.0.0,   type: JSCH }
    - { ref: logistics.schemas.sap-order-idoc:1.0.0, type: XSC  }

  components:
    - id:  http-inbound
      ref: open-m.adapters.http-inbound:1.4.0
      config: { path: /orders, method: POST }
      ports:
        output: { schema_ref: logistics.schemas.http-order:1.0.0, format: JSON }
      placement: { x: 100, y: 300 }
      logging:
        topic: logistics.orders.order-to-sap.http-inbound.log

    - id:  sap-rfc-out
      ref: open-m.connectors.sap-rfc:2.1.0
      config: { bapi: BAPI_SALESORDER_CREATEFROMDAT2 }
      ports:
        input: { schema_ref: logistics.schemas.sap-order-idoc:1.0.0, format: XML }
      placement: { x: 600, y: 300 }
      logging:
        topic: logistics.orders.order-to-sap.sap-rfc-out.log

  connections:
    - id:   conn-http-to-sap                                   # auto
      from: { component: http-inbound, port: output }
      to:   { component: sap-rfc-out,  port: input  }
      route:
        topic:             logistics.orders.order-to-sap.http-inbound.out  # auto
        subscription:      logistics.orders.order-to-sap.sap-rfc-out.sub   # auto
        source_schema_ref: logistics.schemas.http-order:1.0.0
        target_schema_ref: logistics.schemas.sap-order-idoc:1.0.0
      transform:
        type: utlx
        mode: ref
        ref:  logistics.mappings.http-order-to-idoc:1.0.0
      error_routing:
        topic:        logistics.orders.order-to-sap.http-inbound.err      # auto
        subscription: logistics.orders.order-to-sap.error-handler.err-sub  # auto

  error_handling:
    dlq_strategy:   per_component
    replay_enabled: true
    alert_on_dlq:   true

  scaling:
    clusters:
      - cluster-ref: k8s-prod-eu-west
        components:
          - { id: http-inbound, replicas: 2 }
          - { id: sap-rfc-out,  replicas: 1 }

Auto-generated fields

When you draw an arrow in the IDE canvas, the following fields (marked # auto in the example above) are auto-populated from the naming convention. You can override any of them in the text editor:

- the connection id (conn-http-to-sap in the example)
- route.topic, the output topic (logistics.orders.order-to-sap.http-inbound.out)
- route.subscription, the durable subscription (logistics.orders.order-to-sap.sap-rfc-out.sub)
- error_routing.topic and error_routing.subscription (logistics.orders.order-to-sap.http-inbound.err and logistics.orders.order-to-sap.error-handler.err-sub)

⚠️

Subscription names are stable identifiers. In Pulsar, a subscription name is durable — if it changes between deployments, the consumer restarts from the latest offset and silently drops unprocessed messages. Never rename a subscription on a production pipeline without a deliberate migration plan. Auto-generated subscription names are stable as long as component IDs and pipeline names don't change.
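One way to guard against an accidental rename is to pin the subscription name explicitly in the YAML: once written out, a later change to the component id no longer regenerates it. A sketch using the connection from the example above; the pinned value is simply today's auto-generated name, frozen in place:

```yaml
connections:
  - id:   conn-http-to-sap
    from: { component: http-inbound, port: output }
    to:   { component: sap-rfc-out,  port: input  }
    route:
      # Pinned explicitly: renaming the sap-rfc-out component will no
      # longer change this name, so the durable subscription (and its
      # unacknowledged backlog) survives redeployment.
      subscription: logistics.orders.order-to-sap.sap-rfc-out.sub
      topic:        logistics.orders.order-to-sap.http-inbound.out
```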

YAML as the canonical artefact

The pipeline YAML is always the source of truth. The IDE canvas is a visual rendering of the YAML — not the other way around. Every canvas action (drag a component, draw an arrow, change a config value) writes back to the YAML file immediately. The canvas and the YAML are always in sync.

This means your pipelines live in Git. Pull requests show exactly what changed: which component was added, which connection was rewired, what config value changed. CI/CD validates the YAML schema and runs connector tests before merge. There are no locked binary files, no single IDE chokepoint.
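As a sketch of what the CI step can look like: the snippet below validates every pipeline descriptor against a JSON Schema using the open-source check-jsonschema tool (which accepts YAML instance files). The schema file name and repository layout are illustrative assumptions, not part of Open-M:

```yaml
# .github/workflows/pipeline-ci.yml (illustrative layout)
name: pipeline-ci
on: pull_request

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate pipeline descriptors against the Open-M schema
        run: |
          pip install check-jsonschema
          # open-m-pipeline.schema.json is a hypothetical path to the
          # published JSON Schema for apiVersion: open-m/v1 documents
          check-jsonschema --schemafile open-m-pipeline.schema.json pipelines/*.yaml
```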

💡

Install the Open-M VS Code extension to get JSON Schema validation, component ref autocomplete, and hover documentation directly in your editor — no canvas required. The extension registers a JSON Schema for apiVersion: open-m/v1, kind: Pipeline documents automatically.