Concepts

Schema Registry

The Schema Registry is the authoritative catalogue of payload schemas used across all Open-M pipelines. Every MPPM envelope declares a schema_ref. The Schema Registry resolves that reference, enforces compatibility between connected components at deploy time, and enables the IDE's mapping editor to render accurate field trees for UTL-X mapping.

Why a Schema Registry?

A pipeline YAML without schema references is structurally incomplete. It describes the topology — which components are connected — but not the contract: what data flows on each connection. Without contracts:

Schema references in the pipeline YAML are not optional decoration — they are part of the pipeline's operational contract. Schema content, however, is never inlined. It lives in the Schema Registry and is referenced by version.

Schema ref format

Every schema reference is a namespaced, versioned identifier:

format
{namespace}.schemas.{name}:{version}

# Examples
logistics.schemas.order-received:1.0.0
logistics.schemas.sap-order-idoc:2.1.0
finance.schemas.invoice-outbound:3.0.0

The namespace segment matches the pipeline namespace, keeping schema ownership co-located with the pipelines that use the schemas. A schema owned by the logistics team lives in logistics.schemas.* and is governed by that team.
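The ref format above can be split into its parts with a small parser. This is a minimal sketch, not the registry's own parser: the `SchemaRef` type, the `parse_schema_ref` function, and the assumed character set (lowercase letters, digits, hyphens) are illustrative choices, and the real registry may accept a wider grammar.

```python
import re
from typing import NamedTuple


class SchemaRef(NamedTuple):
    """Parsed form of {namespace}.schemas.{name}:{version} (hypothetical type)."""
    namespace: str
    name: str
    version: tuple  # (major, minor, patch)


# Assumption: namespace and name use lowercase alphanumerics and hyphens,
# and the version is a plain MAJOR.MINOR.PATCH triple.
_REF_PATTERN = re.compile(
    r"^(?P<namespace>[a-z0-9][a-z0-9-]*)\.schemas\."
    r"(?P<name>[a-z0-9][a-z0-9-]*):"
    r"(?P<version>\d+\.\d+\.\d+)$"
)


def parse_schema_ref(ref: str) -> SchemaRef:
    """Split a schema ref into namespace, name, and numeric version."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a valid schema ref: {ref!r}")
    major, minor, patch = (int(part) for part in match["version"].split("."))
    return SchemaRef(match["namespace"], match["name"], (major, minor, patch))
```

For example, `parse_schema_ref("logistics.schemas.sap-order-idoc:2.1.0")` yields the namespace `logistics`, the name `sap-order-idoc`, and the version `(2, 1, 0)`.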

Schema types

Open-M supports eight payload formats (JSON, YAML, XML, CSV, fixed-width, OData, Protobuf, and Avro), covered by six schema types in the registry:

JSCH
JSON Schema (draft-07 / 2020-12)
For JSON and YAML payloads. Most common internal format. Renders field trees natively in the UTL-X mapping editor.
XSC
XML Schema Constraint (XSD)
For XML payloads — SAP IDoc, EDIFACT, SOAP, legacy WS. Well-tooled, mature. Required for SAP and fulfillment system integrations.
TSCH
Tabular Schema (UTL-X)
For CSV and fixed-width flat files. Declares column names, types, separator, quoting, and positional field definitions for fixed-width COBOL output.
OSCH
OData / Entity Model Schema
For SAP OData, Workday, Microsoft Dynamics. Captures entity sets, navigation properties, and the $metadata structure for use in UTL-X mappings.
PROTO
Protocol Buffers (.proto)
For gRPC and high-throughput binary payloads. Schema must be registered before pipeline deploy. Wrapper serialises/deserialises at the topic boundary.
AVRO
Apache Avro Schema
For Kafka/Pulsar Schema Registry integration. Binary with embedded schema fingerprint. Schema must be registered before pipeline deploy.
⚠️

Binary formats (PROTO, AVRO) are valid at the pipeline edge but discouraged inside the pipeline. Without the schema, binary payloads cannot be tapped, displayed in the IDE inspector, or previewed in readable form in logs. Adapters at the pipeline entry point should decode binary → JSON before the payload enters internal hops.

Where schema refs appear in the pipeline YAML

Schema refs appear in three places in the pipeline YAML descriptor:

1. spec.schemas declaration stanza

All schemas used by the pipeline are declared once at the top level. This lets the control plane fetch and cache all schemas at deploy time in a single batch, and lets the IDE verify that all refs exist before rendering the canvas.

yaml — spec.schemas
spec:
  schemas:
    - { ref: logistics.schemas.order-received:1.0.0,          type: JSCH }
    - { ref: logistics.schemas.order-enriched-customer:1.0.0,  type: JSCH }
    - { ref: logistics.schemas.fulfillment-dispatch:2.0.0,    type: XSC  }

2. Component port definitions

Each component's input and output ports declare their schema. When validating a connection, the control plane checks that the schemas of the two connected ports are compatible.

yaml — component port schemas
components:
  - id:  customer-enricher
    ref: logistics.services.customer-lookup:2.1.0
    ports:
      input:
        schema_ref: logistics.schemas.order-received:1.0.0
        format:     JSON
      output:
        schema_ref: logistics.schemas.order-enriched-customer:1.0.0
        format:     JSON
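
One check that follows from declaring every schema once in spec.schemas is that no port points at a ref missing from that stanza. The sketch below illustrates the idea on already-parsed YAML. The function name `undeclared_port_refs` is hypothetical, and passing the two lists separately is an assumption made to avoid guessing how the descriptor nests them; the control plane's actual checks are not documented here.

```python
def undeclared_port_refs(schemas: list, components: list) -> list:
    """Return 'component.port: ref' entries whose ref is not in spec.schemas.

    `schemas` is the parsed spec.schemas list, `components` the parsed
    components list (both plain dicts/lists, as a YAML loader would give).
    """
    declared = {entry["ref"] for entry in schemas}
    missing = []
    for component in components:
        for direction, port in component.get("ports", {}).items():
            ref = port.get("schema_ref")
            if ref is not None and ref not in declared:
                missing.append(f"{component['id']}.{direction}: {ref}")
    return missing


# Example data mirroring the YAML above, with one ref left undeclared.
schemas = [{"ref": "logistics.schemas.order-received:1.0.0", "type": "JSCH"}]
components = [{
    "id": "customer-enricher",
    "ports": {
        "input": {"schema_ref": "logistics.schemas.order-received:1.0.0"},
        "output": {"schema_ref": "logistics.schemas.order-enriched-customer:1.0.0"},
    },
}]
problems = undeclared_port_refs(schemas, components)
```

Here `problems` contains a single entry for the `customer-enricher` output port, because its schema is used but not declared.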

3. Connection route schema refs

Each connection declares the schema of the MPPM payload traveling on its topic. For Mode 1 (no transform) this is a single schema_ref. For Mode 2 (UTL-X transform), source_schema_ref and target_schema_ref are both declared — the transform bridges between them.
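
The two connection shapes can be told apart by which keys are present. The following is a rough sketch under that assumption: `connection_mode` is a hypothetical helper, and only the key names `schema_ref`, `source_schema_ref`, and `target_schema_ref` come from the description above.

```python
def connection_mode(connection: dict) -> int:
    """Return 1 for a pass-through connection, 2 for a UTL-X transform.

    Mode 1 declares a single schema_ref. Mode 2 declares both
    source_schema_ref and target_schema_ref. Any other mix is rejected.
    """
    has_single = "schema_ref" in connection
    has_source = "source_schema_ref" in connection
    has_target = "target_schema_ref" in connection

    if has_single and not (has_source or has_target):
        return 1
    if has_source and has_target and not has_single:
        return 2
    raise ValueError(
        "a connection needs either schema_ref, or both "
        "source_schema_ref and target_schema_ref"
    )
```

Rejecting mixed or partial declarations early keeps an ambiguous connection from reaching deployment, which matches the contract-first intent of the descriptor.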

Registering a schema

cli — register a new schema
# Register a JSON Schema
open-m schema register \
  --ref      logistics.schemas.order-received:1.0.0 \
  --type     JSCH \
  --file     ./schemas/order-received.json \
  --env      production

# Register an XSD (XML Schema)
open-m schema register \
  --ref      logistics.schemas.fulfillment-dispatch:2.0.0 \
  --type     XSC \
  --file     ./schemas/fulfillment-dispatch.xsd \
  --env      production

# Check what's registered
open-m schema list --namespace logistics

# View a specific schema
open-m schema get logistics.schemas.order-received:1.0.0

Compatibility rules

Schema versions follow semver. The size of the version bump determines which schema changes are allowed and which deployment model applies:

| Version bump | Allowed schema changes | Deploy model | Old envelopes readable? |
|---|---|---|---|
| PATCH x.x.N | No structural change. Documentation, descriptions, examples only. | Rolling restart. | identical |
| MINOR x.N.0 | New optional fields with defaults. No field removal. No type changes. Consumers that haven't upgraded ignore the new fields. | Rolling restart. Old and new envelopes coexist safely. | backward compatible |
| MAJOR N.0.0 | Any change permitted: field removal, type changes, structural restructure. | Blue/green deployment. Old version drains before new activates. No mixed-version envelopes on the same topic. | breaking change |
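
The mapping from version bump to deploy model can be expressed in a few lines. This is an illustrative sketch only: `Bump`, `DEPLOY_MODEL`, and `classify_bump` are hypothetical names, and the code looks at version numbers alone. It does not inspect schema content to confirm that the change really fits the declared bump.

```python
from enum import Enum


class Bump(Enum):
    PATCH = "patch"
    MINOR = "minor"
    MAJOR = "major"


# Deploy model per bump, taken from the compatibility rules above.
DEPLOY_MODEL = {
    Bump.PATCH: "rolling restart",
    Bump.MINOR: "rolling restart",
    Bump.MAJOR: "blue/green",
}


def parse_version(version: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def classify_bump(old: str, new: str) -> Bump:
    """Classify the step from `old` to `new` as PATCH, MINOR, or MAJOR."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return Bump.MAJOR
    if new_v[1] != old_v[1]:
        return Bump.MINOR
    return Bump.PATCH
```

For instance, moving from `1.4.2` to `2.0.0` classifies as `Bump.MAJOR`, so `DEPLOY_MODEL` selects a blue/green deployment.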

Deploy-time validation

When a pipeline is deployed, the control plane performs schema validation before any Kubernetes manifest is applied or any topic is provisioned. This is the equivalent of compile-time type checking for your integration:

ℹ️

At runtime, the wrapper validates the schema_ref of every received MPPM envelope against the registry. A validation failure is a permanent error: the envelope is routed directly to the DLQ with no retry, regardless of the configured delivery guarantee.
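
The no-retry rule can be pictured as a routing decision made before any delivery logic runs. The sketch below is a simplification under stated assumptions: `Outcome` and `route_envelope` are hypothetical names, the registry is modelled as a plain set of known refs, and only the presence of the ref is checked. What exactly the real wrapper validates is not specified here.

```python
from enum import Enum


class Outcome(Enum):
    DELIVER = "deliver"
    DEAD_LETTER = "dead_letter"


def route_envelope(envelope: dict, registered_refs: set) -> Outcome:
    """Decide where a received envelope goes, based on its schema_ref alone."""
    ref = envelope.get("schema_ref")
    if ref not in registered_refs:
        # Permanent error: retrying cannot make an unknown or missing ref
        # valid, so the envelope skips retry and goes straight to the DLQ.
        return Outcome.DEAD_LETTER
    return Outcome.DELIVER
```

Because the failure is classified before delivery is attempted, the configured delivery guarantee never comes into play for an invalid envelope.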

Binary format handling (PROTO, AVRO)

Protobuf and Avro payloads require special handling because the raw bytes are meaningless without the schema. The Open-M Schema Registry stores the .proto file or Avro schema, and the wrapper uses it to serialise and deserialise at the topic boundary.

Because binary payloads cannot be rendered in the IDE's message inspector or appear in log event previews without the schema, the recommended pattern is to decode binary → JSON in the adapter at the pipeline entry point, before the payload enters internal hops.

This keeps the pipeline interior inspectable, mappable, and debuggable. Binary encoding stays at the edge where it belongs.