FMI 3.0 Co-Simulation with FMPy: Orchestrating a Multi-FMU Digital Twin in Python

FMI 3.0 co-simulation is the discipline of stepping several independently compiled simulation models forward in lockstep, exchanging signals at agreed time points, so that a controller model, a plant model, and a sensor model behave as one coherent digital twin. Each model ships as a Functional Mock-up Unit — a self-contained zip that carries its own equations, its own numerical solver, and a standardized C interface. The magic of the standard is that none of these units needs to know how the others were built: a Simulink-exported valve, a Modelica-exported tank, and a hand-written Python controller can all speak the same protocol. FMPy — an open-source Python implementation of the Functional Mock-up Interface, maintained within the Modelica Association ecosystem — is the tool we will use to load those units, inspect them, and drive them from a master loop we write ourselves. By the end you will have a running twin where a PI controller regulates a tank level, with feedback routed between two FMUs on every step.

What this covers:

What FMI and FMUs are, and how the standard evolved from 1.0 through 2.0 to 3.0
The anatomy of an FMU zip and the FMI 3.0 co-simulation interface
The role of the master algorithm and why it, not the FMU, owns the clock
A complete FMPy walkthrough: read_model_description, dump, instantiate, step, terminate
A worked multi-FMU twin — controller plus tank plant — with feedback and logging
FMI 3.0 specifics: new numeric types, arrays, binary variables, clocks, terminals, and intermediate update
The failure modes that bite real integration projects and how to avoid them

Context and Background

The Functional Mock-up Interface is a tool-independent standard for exchanging dynamic simulation models. It was born inside the MODELISAR project around 2010 to solve a painfully common problem in automotive and aerospace engineering: a supplier builds a component model in one tool, the OEM integrates it into a system model in another tool, and neither wants to hand over source code or force the other onto a single vendor. The answer was to compile each model into a Functional Mock-up Unit (FMU) — a zip archive containing a platform-specific binary plus a machine-readable description of every variable — and to define a small C API that any importing tool can call.

FMI defines two flavours. Model exchange ships the model’s equations and expects the importing tool to supply the numerical solver; the FMU exposes its state derivatives and the master integrates them. Co-simulation bundles a solver inside the FMU: the importer only asks the unit to advance from one communication point to the next via doStep, and the FMU internally decides how to sub-step. Co-simulation is the natural fit for digital twins because it lets each subsystem keep the solver its author tuned for it, and it isolates stiff or discontinuous dynamics behind a clean interface. It also solves a governance problem as much as a technical one: a supplier can ship a co-simulation FMU that protects its intellectual property — the equations are compiled away — while still letting the integrator run a faithful virtual replica. That is why co-simulation became the backbone of virtual integration in automotive and aerospace, where a full vehicle model may stitch together dozens of FMUs from as many suppliers, none of whom sees the others’ source. For a broader architectural treatment of how these units slot into a twin, see our companion piece on FMI and FMU co-simulation in digital twin architecture.

The standard has moved through three major versions. FMI 1.0 (2010) split model exchange and co-simulation into two separate specifications with overlapping but distinct APIs, which meant tools effectively implemented two standards. FMI 2.0 (2014) unified them under one description schema, added fmi2GetFMUstate/fmi2SetFMUstate for save-and-restore, introduced directional derivatives for smarter coupling, and cleaned up initialization into an explicit mode with a defined entry and exit. That 2.0 unification is why the field consolidated: a single modelDescription.xml schema and one binary convention meant a supplier could export once and be consumed anywhere. FMI 3.0, released by the Modelica Association in 2022 and now widely supported across commercial and open tools, is the largest leap yet: it introduces a richer type system (sized integers and 32/64-bit floats), array-valued variables, a binary type for opaque payloads, first-class clocks for event and hybrid co-simulation, terminals and icons for structured connectors, intermediate update so a master can peek inside a step, and a third interface flavour called scheduled execution aimed at real-time and virtual-ECU use. We reach for FMPy because it implements the FMI 3.0 API faithfully in pure-Python-plus-ctypes, runs anywhere Python does, integrates naturally with NumPy and matplotlib, and — unlike a heavyweight GUI tool — lets you script the master algorithm explicitly, which is exactly what a hands-on tutorial needs. That scriptability is also what makes FMPy a favourite for continuous-integration checks: you can load an FMU, assert its interface, run a short simulation, and compare against a reference trajectory, all in a headless test.

FMI 3.0 Co-Simulation Architecture

At the highest level, FMI 3.0 co-simulation is a contract: the FMU promises to advance its internal state by a requested time increment when told to, and to expose named inputs and outputs through integer value references; the master promises to own the global clock, set inputs before each step, call doStep, and read outputs afterward. The FMU never calls the master and never advances on its own — all orchestration lives outside the unit.

Inside the FMU zip

An FMU is just a zip archive with a .fmu extension, and understanding its layout demystifies everything else. At the root sits modelDescription.xml, the single source of truth. It declares the FMI version, the model’s identifying token (a GUID in FMI 2.0, renamed instantiationToken in FMI 3.0, which FMPy still surfaces as the guid attribute), the supported interface types (co-simulation, model exchange, or scheduled execution), and, most importantly, a ModelVariables block listing every variable with its name, valueReference (the integer handle you actually pass to the API), causality (input, output, parameter, local, among others), variability, data type, and start value. A ModelStructure section describing outputs and their dependencies and an optional UnitDefinitions section are carried over from FMI 2.0. FMI 3.0 adds clock-typed variables to ModelVariables and keeps terminal metadata in a separate terminalsAndIcons/ folder rather than in the model description itself.

The binaries/ folder holds the compiled implementation, one subfolder per platform triple — x86_64-linux, x86_64-windows, aarch64-darwin, and so on — each containing a shared library (.so, .dll, .dylib). The resources/ folder carries any runtime files the model needs: lookup tables, parameter CSVs, map data. FMI 3.0 adds terminalsAndIcons/ for the structured-connector metadata and an SVG icon. FMPy reads the XML, extracts the archive to a temporary directory, and dlopen’s the binary that matches your OS and architecture. If a matching binary is missing, the FMU is simply not runnable on your machine — a common surprise when a vendor ships Windows-only binaries and you are on Linux.
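The layout above can be checked before any binary is loaded. The following is a minimal sketch that builds a stand-in archive in memory and lists the platform folders it ships; the archive contents, the file names Tank.dll and lookup.csv, and the helper names are illustrative assumptions, not part of FMPy or of any real FMU.

```python
import io
import zipfile


def make_toy_fmu():
    """Build a stand-in FMU archive in memory (placeholder payloads, not real binaries)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("modelDescription.xml", "<fmiModelDescription fmiVersion='3.0'/>")
        zf.writestr("binaries/x86_64-windows/Tank.dll", b"placeholder")
        zf.writestr("resources/lookup.csv", "x,y\n0,0\n")
    return buf.getvalue()


def platforms_in_fmu(fmu_bytes):
    """Return the sorted platform folder names found under binaries/."""
    with zipfile.ZipFile(io.BytesIO(fmu_bytes)) as zf:
        names = zf.namelist()
    found = set()
    for name in names:
        parts = name.split("/")
        # binaries/<platform>/<library> has at least three path components
        if len(parts) >= 3 and parts[0] == "binaries":
            found.add(parts[1])
    return sorted(found)


available = platforms_in_fmu(make_toy_fmu())
print("platforms shipped:", available)
print("runnable on x86_64-linux:", "x86_64-linux" in available)
```

Running the same check against a vendor FMU (reading its bytes from disk) is a cheap way to catch the Windows-only surprise before a CI job fails at load time.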

It is worth internalizing why the valueReference indirection exists rather than addressing variables by name. Names are for humans and for the description file; the compiled binary knows nothing about them. When the FMU is exported, every variable is assigned a stable integer handle, and the C API’s setters and getters take arrays of those integers. This keeps the hot path — thousands of get/set calls per second in a real twin — free of string hashing, and it lets the same binary serve tools in any language. The practical consequence for you is that the first thing your master must build is a name-to-valueReference map from the model description; everything downstream addresses variables by handle. FMI 3.0 keeps this design but widens what a single handle can carry: because a value reference can now name an array, one handle and one call can move a whole state vector, which changes how you think about granularity when wiring units together.
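To make the name-to-handle indirection concrete, here is a small sketch that builds the map straight from a description file using only the standard library. The XML below is a hand-written toy in the FMI 3.0 style (typed variable elements such as Float64), and the variable names and value references are invented for illustration; with FMPy you would normally get the same information from read_model_description.

```python
import xml.etree.ElementTree as ET

# Toy description, assumed for illustration only
TOY_DESCRIPTION = """
<fmiModelDescription fmiVersion="3.0" modelName="Tank" instantiationToken="{toy-token}">
  <CoSimulation modelIdentifier="Tank"/>
  <ModelVariables>
    <Float64 name="time" valueReference="0" causality="independent" variability="continuous"/>
    <Float64 name="inflow_cmd" valueReference="1" causality="input" start="0"/>
    <Float64 name="level" valueReference="2" causality="output"/>
    <Float64 name="area" valueReference="3" causality="parameter" variability="fixed" start="0.5"/>
  </ModelVariables>
</fmiModelDescription>
"""


def build_vr_map(xml_text):
    """Map each variable name to (valueReference, type, causality)."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for var in root.find("ModelVariables"):
        table[var.get("name")] = (
            int(var.get("valueReference")),
            var.tag,                          # element tag doubles as the data type
            var.get("causality", "local"),    # causality defaults to local when omitted
        )
    return table


vr_map = build_vr_map(TOY_DESCRIPTION)
inputs = [name for name, (_, _, causality) in vr_map.items() if causality == "input"]
print(vr_map["level"])
print("inputs to drive:", inputs)
```

Everything downstream of this table can address variables by integer handle while the human-readable names stay in one place.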

The co-simulation interface and the master’s role

The co-simulation slave exposes a compact set of functions. After instantiation you call enterInitializationMode, set parameters and initial inputs, then exitInitializationMode. The simulation loop is then a repetition of: set inputs with typed setters (setFloat64, setInt32, setBoolean, and so on), call doStep(currentTime, stepSize), and read outputs with the matching getters. The FMU returns a status and, importantly, may signal earlyReturn — telling the master it stopped short of the requested step because an internal event fired. That earlyReturn mechanism, new in FMI 3.0, is what makes event-accurate co-simulation possible: rather than forcing the FMU to swallow an event that occurred mid-interval, the standard lets it return early at the event time, hand control back to the master, and let the master service the event (activate clocks, exchange the new values) before resuming from that exact instant. The status codes themselves — OK, Warning, Discard, Error, Fatal — are worth respecting: a Discard means the FMU could not take the requested step and the master should retry smaller, while Error or Fatal mean the instance is unusable and must be freed. A master that ignores return status and ploughs ahead is a master that produces confident nonsense after the first rejected step.
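One way to keep a master from ploughing past a rejected step is to route every step result through a single decision function. The sketch below is an illustration of that idea rather than FMPy's API: the Status enum mirrors the order of the five status names listed above, and plan_after_step and its return labels are hypothetical names chosen for this example.

```python
from enum import IntEnum


class Status(IntEnum):
    """Stand-in for the five co-simulation status codes, in the order listed above."""
    OK = 0
    WARNING = 1
    DISCARD = 2
    ERROR = 3
    FATAL = 4


def plan_after_step(t, h, status, early_return=False, last_successful_time=None):
    """Decide what the master does next, and from which time."""
    if status in (Status.ERROR, Status.FATAL):
        return ("abort", t)                   # instance is unusable
    if status == Status.DISCARD:
        return ("retry_smaller", t)           # stay at interval start, shrink h
    if early_return:
        return ("service_event", last_successful_time)  # resume from the event time
    return ("advance", t + h)                 # OK or Warning: accept the step


print(plan_after_step(0.0, 0.01, Status.OK))
print(plan_after_step(0.0, 0.01, Status.OK, early_return=True, last_successful_time=0.004))
print(plan_after_step(0.0, 0.01, Status.DISCARD))
```

Keeping this logic in one function makes the master's policy auditable: a reviewer can see at a glance that no status is silently ignored.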

The master algorithm is the piece you own, and its job is deceptively large: it decides the communication step size, the order in which FMUs are stepped, and how the output of one FMU becomes the input of another. The simplest master is a fixed-step Gauss-Seidel scheme — step FMU A, feed its outputs into FMU B, step B, feed B’s outputs back to A on the next iteration. More sophisticated masters use Jacobi ordering (step all FMUs on the old inputs, then exchange), variable step sizes driven by error estimates, or rollback via getFMUState/setFMUState to retry a step at a smaller size. Our companion architecture guide on FMI and FMU co-simulation goes deeper on master families.

The distinction between Gauss-Seidel and Jacobi ordering is not academic — it changes both accuracy and the shape of your code. In Jacobi ordering the master reads every FMU’s outputs at the start of the interval, distributes them as inputs, then steps all FMUs over the same interval in parallel; every unit sees inputs from the previous communication point. This is embarrassingly parallel — you can doStep each FMU on its own thread — but it carries a full one-step delay on every coupling, so the coupling error is larger for a given h. In Gauss-Seidel ordering you step the FMUs in a chosen sequence and immediately propagate each unit’s fresh output to the next unit within the same interval, so only the loop-closing edge (the feedback from the last unit back to the first) carries a delay. Gauss-Seidel is more accurate for a given step but inherently serial and sensitive to the order you pick. There is no universally right choice: strongly one-directional signal chains favour Gauss-Seidel, while tightly bidirectional couplings sometimes behave better under Jacobi with a smaller step, or under an iterative master that repeats the interval until inputs and outputs agree. Whichever you pick, write it down — a co-simulation result is only reproducible if the ordering is documented alongside the step size.
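The accuracy gap between the two orderings is easy to see on a toy loop. The sketch below couples a proportional controller to a pure integrator, both written as plain Python functions standing in for FMUs; the gain, setpoint, and step size are arbitrary illustrative values. Because the integrator is exact for a held input, all of the error measured against the analytic solution is coupling error.

```python
import math

K, R = 1.0, 1.0   # controller gain and setpoint (illustrative values)


def controller(x_measured):
    """Algebraic P controller: u = K * (R - x)."""
    return K * (R - x_measured)


def plant_step(x, u, h):
    """Integrator x' = u; exact when u is held constant over the interval."""
    return x + h * u


def run_gauss_seidel(h, n):
    x, xs = 0.0, [0.0]
    for _ in range(n):
        u = controller(x)            # controller first, on the latest level
        x = plant_step(x, u, h)      # plant sees the fresh command in the same interval
        xs.append(x)
    return xs


def run_jacobi(h, n):
    x, xs = 0.0, [0.0]
    u_out = controller(x)            # controller output available at the first exchange
    for _ in range(n):
        u_in, x_in = u_out, x        # exchange values from the interval start
        x = plant_step(x, u_in, h)   # both units then step over the same interval
        u_out = controller(x_in)     # so the controller output lags by one step
        xs.append(x)
    return xs


def max_error(xs, h):
    """Largest deviation from the exact solution x(t) = R * (1 - exp(-K t))."""
    return max(abs(x - R * (1.0 - math.exp(-K * i * h))) for i, x in enumerate(xs))


h, n = 0.1, 20
print("Gauss-Seidel max error:", max_error(run_gauss_seidel(h, n), h))
print("Jacobi max error      :", max_error(run_jacobi(h, n), h))
```

On this loop the Jacobi run shows the larger error at the same step size, which is the extra one-step delay described above showing up in the numbers.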

How signals are exchanged and held

Between two communication points, an FMU’s inputs are held at whatever the master last set; they are, by default, piecewise-constant across the interval. This is a modelling assumption with real consequences: if a fast signal drives an FMU that only sees it as a constant for the whole 10 ms step, the FMU integrates against a staircase approximation of the true input, and the error scales with both the step size and how quickly the signal actually moves. The standard has offered two escape hatches over its history. FMI 2.0 let the master supply input derivatives alongside the value (fmi2SetRealInputDerivatives), so the FMU could reconstruct the input across the interval instead of treating it as flat, which markedly reduces coupling error for smooth signals. FMI 3.0 replaces that call with intermediate update, where the FMU calls back into the master at internal sub-steps so the master can supply a fresher input value, closing the gap for signals that must not be frozen; output derivatives remain available through fmi3GetOutputDerivatives for masters that build their own extrapolation. Understanding this hold behaviour is essential because it explains a class of bugs where a twin is stable and plausible but subtly wrong: the physics inside each FMU is correct, yet the constant-input assumption at the seams has quietly degraded the coupling. When you tighten h and the answer changes, this is usually why.
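A few lines of arithmetic show how the hold error behaves. The sketch below integrates y' = u(t) for u = cos(t) while only sampling u at communication points, once with the input frozen over each interval and once with a first-order reconstruction that also uses the input's rate of change. The signal, the horizon, and the function names are assumptions made for illustration; no FMU is involved.

```python
import math


def integrate_held(h, t_end, use_slope=False):
    """Integrate y' = u(t), u = cos(t), seeing u only at communication points."""
    n = round(t_end / h)
    y = 0.0
    for k in range(n):
        t = k * h
        u, du = math.cos(t), -math.sin(t)
        y += h * u                     # zero-order hold: input frozen for the interval
        if use_slope:
            y += 0.5 * h * h * du      # first-order reconstruction from the rate of change
    return y


def final_error(h, use_slope=False, t_end=1.0):
    """Absolute error at t_end against the exact integral sin(t_end)."""
    return abs(integrate_held(h, t_end, use_slope) - math.sin(t_end))


for h in (0.1, 0.05):
    print(h, final_error(h), final_error(h, use_slope=True))
```

With the input frozen, halving h roughly halves the error, which is the signature to look for when a twin's answer drifts as you tighten the communication step; supplying slope information cuts the error far more sharply for a smooth signal like this one.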

What FMI 3.0 adds

Six additions matter most for twins. A richer type system replaces FMI 2.0’s single Integer/Real with sized types (Int8, Int16, Int32, Int64, the unsigned variants UInt8 through UInt64, plus Float32 and Float64), so an embedded controller model can expose an 8-bit register accurately instead of widening it to a 32-bit int, and a GPU-oriented model can keep single-precision Float32 signals without a silent widening to double. Each type has its own setter and getter (setUInt8, getFloat32, and so on), and the description file tags every variable with its exact type so the master knows which call to use. Array variables let a single value reference carry a whole vector or matrix (think a state vector, a temperature profile, or a sensor grid), which you get and set in one call rather than N scalar calls; the description declares the dimensions, which may themselves be fixed or bound to a structural parameter set at initialization. The binary type carries opaque byte payloads (serialized images, protocol frames, ML tensors) through the same setter/getter machinery, which is what makes it feasible to co-simulate a perception model that emits an encoded frame alongside a physics model that consumes a scalar. Clocks are the headline feature: a clock is a boolean-like signal that “ticks” to mark discrete events, and it comes in periodic (ticks on a fixed schedule) and triggered (ticks when the model raises an event) flavours. Clocks let co-simulation model hybrid systems (sampled digital controllers, network messages, state machines) cleanly, instead of smearing events across a fixed step. Terminals group related variables into named connectors (a “power port” bundling voltage and current, or a bus bundling several channels) so a tool can wire whole ports rather than individual scalars, and the terminal metadata lives in terminalsAndIcons/. Intermediate update lets an FMU call back into the master mid-step so the master can supply or observe values at internal sub-steps, which is essential for tight coupling and for signals that must not be held constant across a whole communication interval, such as a fast disturbance driving a slow plant. Together these features move FMI from a purely continuous-signal exchange toward a hybrid, event-aware standard suited to modern twins that mix physics, control, and data.
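Because every type has its own setter, a master usually ends up with a small dispatch layer keyed on the type string from the description. The sketch below shows one possible shape for it; the helper names and the choice to range-check sized integers in Python before calling into the binary are assumptions for illustration, and the type list covers only the types named in this section.

```python
# Representable range for each sized integer type
INT_RANGES = {}
for bits in (8, 16, 32, 64):
    INT_RANGES[f"Int{bits}"] = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    INT_RANGES[f"UInt{bits}"] = (0, 2 ** bits - 1)

KNOWN_TYPES = set(INT_RANGES) | {"Float32", "Float64", "Boolean", "Binary"}


def setter_name(var_type):
    """Name of the typed setter for a variable type, following the setXxx convention."""
    if var_type not in KNOWN_TYPES:
        raise ValueError(f"unsupported variable type: {var_type}")
    return "set" + var_type


def check_value(var_type, value):
    """Reject integers that would not fit the declared width instead of wrapping silently."""
    if var_type in INT_RANGES:
        lo, hi = INT_RANGES[var_type]
        if not lo <= value <= hi:
            raise OverflowError(f"{value} does not fit in {var_type} [{lo}, {hi}]")
    return value


print(setter_name("UInt8"), check_value("UInt8", 255))
print(setter_name("Float32"))
```

In a real master the returned name could be resolved with getattr on the slave object, so one code path serves every variable regardless of width.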

Two of these additions deserve a closer look because they change the master’s inner loop rather than just its type declarations. Intermediate update inverts the usual control flow: normally the master calls into the FMU, but with intermediate update the FMU calls back into a master-supplied callback at chosen sub-step boundaries inside a single doStep. Inside that callback the master can read the FMU’s current outputs at a finer granularity than the communication grid, and — if the FMU permits — supply a fresher input, which is precisely how you feed a fast-moving signal into a unit without freezing it for the whole interval. The callback also carries flags telling the master whether it may return early, which is the hook that lets an FMU stop mid-step at an event and hand control back at the exact event time. In FMPy this manifests as a callback you register when instantiating; in a hand-written master it is the difference between a naive fixed-grid loop and one that resolves events and tight couplings correctly. The third interface flavour, scheduled execution, is aimed at real-time and virtual-ECU work: instead of a monolithic doStep that advances continuous time, the model exposes named model partitions — computational tasks — that an external scheduler activates on their own clocks, matching how a real embedded RTOS dispatches tasks at different rates. It is a specialist tool you reach for when co-simulating an actual control unit’s task set, and while FMPy focuses on the co-simulation and model-exchange interfaces most twins need, knowing scheduled execution exists tells you where FMI is heading: toward being the lingua franca not just of offline simulation but of hardware-in-the-loop and virtual ECUs as well.
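The inverted control flow of intermediate update is easier to grasp with a toy. In the sketch below, ToyUnit is a pure-Python stand-in for an FMU (it is not the FMI C API and not FMPy's callback signature): it integrates y' = u with internal sub-steps and, if the master passes a callback, asks it for a fresher input at each sub-step. The signal cos(t) and all names are assumptions for illustration.

```python
import math


class ToyUnit:
    """Stand-in for an FMU that supports intermediate update."""

    def __init__(self, substeps=10):
        self.y = 0.0
        self.u = 0.0
        self.substeps = substeps

    def do_step(self, t, h, intermediate_update=None):
        dt = h / self.substeps
        for i in range(self.substeps):
            tau = t + i * dt
            if intermediate_update is not None:
                fresh = intermediate_update(tau, self.y)  # unit calls back into the master
                if fresh is not None:
                    self.u = fresh                        # master supplied a fresher input
            self.y += dt * self.u                         # explicit Euler sub-step of y' = u
        return self.y


def run(h=0.1, t_end=1.0, use_callback=False):
    unit, calls = ToyUnit(), []

    def on_update(tau, y):
        calls.append(tau)              # master can also observe outputs here
        return math.cos(tau)

    for k in range(round(t_end / h)):
        t = k * h
        unit.u = math.cos(t)           # input set at the communication point
        unit.do_step(t, h, on_update if use_callback else None)
    return abs(unit.y - math.sin(t_end)), len(calls)


print("held input      :", run())
print("with callback   :", run(use_callback=True))
```

The callback run lands much closer to the exact integral because the input is refreshed ten times per communication step instead of once.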

Packaging: how an FMU is built and what ends up in the zip

Because the master treats every unit as an opaque binary, the quality of your twin is decided partly at export time, before FMPy ever sees the file — so it pays to understand what a well-formed FMI 3.0 co-simulation FMU actually contains. When a modelling tool exports a co-simulation FMU it does three things: it code-generates or links the model’s equations together with an embedded solver into a shared library, it emits the modelDescription.xml that enumerates every variable and capability flag, and it zips the two together with any resources/ the model reads at runtime. The capability flags in that XML are the contract the master reads first. canHandleVariableCommunicationStepSize tells you whether the FMU will accept a different stepSize on each doStep or insists on a single fixed step; canGetAndSetFMUState tells you whether rollback is available; canReturnEarlyAfterIntermediateUpdate and the clock-related flags tell you whether event-accurate stepping is even possible. An FMU that declares none of these is a fixed-step, no-rollback, no-events unit, and no amount of master cleverness can change that — the ceiling is set at export.
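Reading the flags first can be automated. The sketch below parses the CoSimulation element of two hand-written toy descriptions and reports the strongest master the pair allows; the XML snippets, the option labels, and the function names are illustrative assumptions, and absent flags are treated as false.

```python
import xml.etree.ElementTree as ET

# Toy descriptions, assumed for illustration only
FLEXIBLE = (
    '<fmiModelDescription fmiVersion="3.0">'
    '<CoSimulation modelIdentifier="Tank" '
    'canHandleVariableCommunicationStepSize="true" canGetAndSetFMUState="true"/>'
    '</fmiModelDescription>'
)
RIGID = (
    '<fmiModelDescription fmiVersion="3.0">'
    '<CoSimulation modelIdentifier="Controller"/>'
    '</fmiModelDescription>'
)


def master_options(xml_text):
    """Extract the capability flags a master needs, defaulting to False."""
    cs = ET.fromstring(xml_text.strip()).find("CoSimulation")

    def flag(name):
        return cs.get(name, "false") in ("true", "1")

    return {
        "variable_step": flag("canHandleVariableCommunicationStepSize"),
        "rollback": flag("canGetAndSetFMUState"),
        "early_return": flag("canReturnEarlyAfterIntermediateUpdate"),
    }


def strongest_master(all_options):
    """The coupled set is limited by its least capable unit."""
    if all(o["variable_step"] and o["rollback"] for o in all_options):
        return "variable-step with rollback"
    if all(o["variable_step"] for o in all_options):
        return "variable-step without retry"
    return "fixed-step"


opts = [master_options(FLEXIBLE), master_options(RIGID)]
print(strongest_master(opts))
```

A check like this belongs at the top of the master script, so an unsuitable FMU is rejected with a clear message rather than discovered mid-run.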

For hand-written or CI-built FMUs there is a second consideration: the binary must be compiled for every platform you intend to run on, placed in the correctly named binaries/<platform-triple>/ folder, and exported against the matching FMI 3.0 header set (fmi3Functions.h, fmi3FunctionTypes.h, fmi3PlatformTypes.h). A frequent packaging mistake is a mismatch between the identifying token baked into the binary and the one declared in modelDescription.xml (the GUID in FMI 2.0, the instantiationToken in FMI 3.0). The importer passes the declared token to the instantiate call, a well-behaved FMU rejects a token that does not match, and FMPy surfaces that as a failed instantiation, which is the standard’s way of stopping you from running a description against the wrong binary. If you are producing FMUs yourself, run them through the reference FMI validator before distribution; it catches missing binaries, malformed XML, undeclared dependencies, and capability flags that the binary does not actually honour. Treating export as a build artifact with its own validation gate, rather than a one-click afterthought, is what makes a fleet of FMUs composable months later.

Arrays, the binary type, and terminals in practice

The three structural additions of FMI 3.0 — arrays, the binary type, and terminals — change how you wire a twin, not just what you can express. Arrays collapse what used to be N scalar ports into one handle: a stratified tank exposing forty temperature nodes is a single Float64 variable of dimension forty, set and read with one call carrying a NumPy array rather than forty round-trips through ctypes. The dimensions may be fixed at export or bound to a structural parameter you set in configuration mode, which is how one FMU can serve a ten-node or a hundred-node discretization without re-export. The binary type is the escape valve for everything that is not a number: a camera FMU can emit an H.264 frame as an opaque byte payload, a network FMU can pass a CAN frame, an ML model can hand over a serialized tensor — all through the same setBinary/getBinary machinery, with the master moving bytes it never has to interpret. Terminals are the wiring abstraction: instead of connecting voltage_p, voltage_n, and current as three loose scalars, an electrical terminal groups them into one named connector with a declared kind, so a tool — or a disciplined master — wires the whole port at once and cannot accidentally cross a voltage into a current. The terminal metadata lives in terminalsAndIcons/terminalsAndIcons.xml alongside the SVG icon, and while FMPy exposes it as data rather than auto-wiring on it, reading it lets your connection table assert that both ends of a coupling belong to compatible terminals before a single signal flows. For large twins this is the difference between a wiring bug found at design time and one found as a diverging trajectory three hours into a run.

Hands-On: Orchestrating a Multi-FMU Twin

We will build a small but complete twin: a PI controller FMU regulating the liquid level in a tank plant FMU. The controller reads a setpoint and the measured level, emits a valve command; the tank integrates a mass balance and emits its level; the master routes the tank’s level back to the controller as feedback and logs everything. This tank-and-controller pairing is deliberately the smallest system that still exhibits every co-simulation concern that bites larger twins: it has a genuine feedback loop, a continuous plant coupled to a controller that in a real deployment would be sampled, a saturating actuator, and a signal that must be routed between two independently compiled units on every step. Master the wiring here and a fifty-FMU vehicle or plant twin is the same pattern repeated. The connection graph is shown below.

Before any code, a note on where these FMUs come from. You rarely author an FMU by hand; you export it from a modelling tool. The tank might be exported from OpenModelica or Dymola, the controller from Simulink or from a hand-written C model compiled against the FMI headers, and the sensor from a Python model wrapped by a tool that emits an FMU. What matters for this tutorial is that once each subsystem is an FMU, its origin is irrelevant to the master — that opacity is the entire value proposition. If you want reference FMUs to practise on, the Modelica Association publishes a set of small, well-behaved test FMUs, and FMPy ships examples; both are ideal for rehearsing the master loop before you point it at proprietary vendor units whose quirks you do not yet know.

First, install FMPy and inspect an FMU. FMPy’s read_model_description parses the XML into Python objects, and dump prints a human-readable summary — always your first move with an unfamiliar unit.

# pip install "fmpy[complete]"  (pulls in lxml, numpy, and plotting extras)
from fmpy import read_model_description, dump, extract
from fmpy.fmi3 import FMU3Slave
import numpy as np

# 1. Inspect before you trust: print the interface
dump("Controller.fmu")   # prints FMI version, variables, causality, start values

# 2. Parse the description into objects we can index by name
md = read_model_description("Controller.fmu")
print("FMI version:", md.fmiVersion)
print("Interfaces :", [ct.modelIdentifier for ct in (md.coSimulation, md.modelExchange) if ct])

# Build a name -> valueReference map. In FMI 3.0 the API is addressed
# entirely by integer valueReference, never by string name.
vr = {v.name: v.valueReference for v in md.modelVariables}
print("Controller variables:", list(vr.keys()))

read_model_description returns an object whose .modelVariables list gives you each variable’s .name, .valueReference, .causality, .type, and .start. We turn that into a dictionary so the rest of the code can look up handles by human-readable name. This step is where FMI 3.0’s typing shows up: a variable’s .type may be Float64, Int32, UInt8, Boolean, or Binary, and you must call the matching setter/getter — mixing them raises a status error inside the FMU. It is worth iterating over md.modelVariables and printing (name, valueReference, causality, type, unit) for every variable the first time you meet an FMU; that single loop tells you which ports are inputs you must drive, which are outputs you can read, which are parameters you set once during initialization, and which are internal locals you should leave alone. Wiring an FMU without reading its causalities is the fastest way to a twin that runs cleanly and produces nonsense.

Now instantiate both slaves. FMPy’s FMU3Slave wraps the FMI 3.0 co-simulation API. We extract each FMU to a temp directory, construct the slave, and drive it through the initialization handshake.

def load_slave(path):
    md = read_model_description(path)
    unzip_dir = extract(path)
    slave = FMU3Slave(
        guid=md.guid,
        modelIdentifier=md.coSimulation.modelIdentifier,
        unzipDirectory=unzip_dir,
        instanceName=md.coSimulation.modelIdentifier,
    )
    slave.instantiate()          # dlopen binary, create instance
    return slave, {v.name: v.valueReference for v in md.modelVariables}

ctrl,  ctrl_vr  = load_slave("Controller.fmu")
plant, plant_vr = load_slave("Tank.fmu")

# --- Initialization: set parameters, then leave init mode ---
for slave in (ctrl, plant):
    slave.enterInitializationMode()

# Controller gains and setpoint (parameters + a start input)
ctrl.setFloat64([ctrl_vr["Kp"]],       [2.0])
ctrl.setFloat64([ctrl_vr["Ki"]],       [0.5])
ctrl.setFloat64([ctrl_vr["setpoint"]], [1.0])   # target level = 1.0 m
# Tank geometry
plant.setFloat64([plant_vr["area"]],    [0.5])  # tank cross-section m^2
plant.setFloat64([plant_vr["outflow"]], [0.05]) # constant drain

for slave in (ctrl, plant):
    slave.exitInitializationMode()

Note the shape of the calls: setFloat64 takes a list of value references and a list of values, because FMI 3.0 lets you set many variables — or a whole array variable — in one call. If the tank exposed its full state as an array (say a stratified-temperature vector), you would pass that variable’s single value reference and a NumPy array of values in the same call. The initialization handshake itself is not optional bookkeeping — it exists because many FMUs must solve an internal consistent-initialization problem (finding a steady state, resolving algebraic constraints) that can only run once all start values and parameters are known. Setting a parameter after exitInitializationMode is either rejected or silently ignored, depending on the variable’s variability, so anything you want to configure for the whole run must be set inside the init window. Structural parameters that change array sizes are stricter still: they can only be set in a dedicated configuration mode before initialization, because the FMU must allocate storage before it can accept values.

With both slaves initialized, the master loop is a fixed-step Gauss-Seidel scheme. On each step we push the current feedback into the controller, step it, read its valve command, push that into the plant, step it, read the new level, and stash it as next step’s feedback. This ordering is drawn out in the sequence diagram below.

The choice to step the controller before the plant is deliberate and reflects the physical causality of the loop: the controller decides an action based on what it last measured, and the plant then reacts to that action. If we reversed the order — stepping the plant first on a command it has not yet received — we would inject an extra step of delay and a subtly different phase into the loop. In a system with one clear direction of causality, Gauss-Seidel in the causal order is almost always the right default. Where the direction is genuinely bidirectional and simultaneous, no static ordering is correct and you must either accept a delay, iterate the interval to a fixed point, or restructure the models to introduce a state that breaks the simultaneity. Making this choice consciously, and recording it, is part of what separates a co-simulation that you can defend from one that merely runs.

t_end, h = 60.0, 0.01             # 60 s, 10 ms communication step
n_steps = round(t_end / h)        # integer step count avoids float drift in t
# Seed the feedback with the plant's initial level rather than assuming 0.0
level_feedback = plant.getFloat64([plant_vr["level"]])[0]
log = {"t": [], "level": [], "cmd": []}

for k in range(n_steps):
    t = k * h                     # start of this communication interval
    # --- FMU_A: controller sees the latest feedback ---
    ctrl.setFloat64([ctrl_vr["measured"]], [level_feedback])
    ctrl.doStep(currentCommunicationPoint=t, communicationStepSize=h)
    cmd = ctrl.getFloat64([ctrl_vr["valve_cmd"]])[0]
    cmd = max(0.0, min(1.0, cmd))          # clamp valve opening 0..1

    # --- FMU_B: plant consumes the command, produces a new level ---
    plant.setFloat64([plant_vr["inflow_cmd"]], [cmd])
    plant.doStep(currentCommunicationPoint=t, communicationStepSize=h)
    level = plant.getFloat64([plant_vr["level"]])[0]

    # --- Route B's output to A's feedback for the next step ---
    level_feedback = level                 # (1-step delay: classic Gauss-Seidel)

    # Outputs are valid at the end of the interval, so log them at t + h
    log["t"].append(t + h); log["level"].append(level); log["cmd"].append(cmd)

for slave in (ctrl, plant):
    slave.terminate()
    slave.freeInstance()

Two subtleties are worth pausing on. First, feeding level into level_feedback after the plant step introduces a one-communication-step delay in the feedback path — a well-known artifact of Gauss-Seidel co-simulation. At h = 10 ms it is negligible for this slow tank; shrink h or switch to intermediate update if your loop is fast. If you wanted zero delay you would need the two FMUs to agree on their coupled values within the interval, which means an iterative master that re-steps the interval — and that in turn requires both FMUs to support state rollback. Second, we clamp the valve command in the master. In a stricter design that saturation belongs inside the controller FMU, because a real actuator saturates whether or not a master is present, but doing it in the master here illustrates a point that scales: the orchestrator is not just a pipe, it is a place where you can inject clamps, rate limiters, unit conversions, fault injection, and instrumentation between units without touching the compiled models. That is one of co-simulation’s underrated strengths — the coupling layer is soft and scriptable even when the components are opaque binaries.

A word on what the plant FMU is actually doing when you call doStep. Internally the tank integrates a first-order mass balance, area * d(level)/dt = inflow_cmd * max_inflow - outflow, using whatever solver its author embedded — perhaps a fixed-step Euler, perhaps an adaptive Runge-Kutta. When the master requests a 10 ms step, the co-simulation slave may take many smaller internal sub-steps to hit its own accuracy target, then report the level at the end of the interval. You never see those sub-steps; that encapsulation is the whole point. It also means the plant’s accuracy is largely its author’s responsibility, and your h controls only how often the two units exchange signals, not how finely the tank itself is integrated. Confusing those two step sizes — the communication step you own and the internal solver step the FMU owns — is a frequent source of misplaced blame when a twin is inaccurate.
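The two step sizes can be separated cleanly in a toy. Below, ToyTank is a pure-Python stand-in for the plant FMU that integrates the mass balance above with its own internal step; area and outflow reuse the values set during initialization, while max_inflow = 0.2 and the 1 ms internal step are assumptions made for this sketch.

```python
class ToyTank:
    """Stand-in for the tank FMU: area * d(level)/dt = inflow_cmd * max_inflow - outflow."""

    def __init__(self, area=0.5, outflow=0.05, max_inflow=0.2, internal_dt=0.001):
        self.area = area
        self.outflow = outflow
        self.max_inflow = max_inflow      # assumed value, not from a real FMU
        self.internal_dt = internal_dt    # solver step owned by the unit, not the master
        self.level = 0.0
        self.inflow_cmd = 0.0
        self.internal_steps = 0

    def do_step(self, t, h):
        """Advance by one communication step h using hidden internal sub-steps."""
        n = max(1, round(h / self.internal_dt))
        dt = h / n
        for _ in range(n):
            rate = (self.inflow_cmd * self.max_inflow - self.outflow) / self.area
            self.level += dt * rate       # explicit Euler sub-step
            self.internal_steps += 1
        return self.level


tank = ToyTank()
tank.inflow_cmd = 1.0                     # valve fully open
h, comm_steps = 0.01, 100                 # master exchanges signals every 10 ms
for k in range(comm_steps):
    tank.do_step(k * h, h)

print("communication steps:", comm_steps)
print("internal steps     :", tank.internal_steps)
print("level after 1 s    :", tank.level)
```

The master made 100 calls, the unit took ten times as many internal steps, and only the former is under your control when you choose h.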

Our loop above uses a single fixed h, but real masters negotiate the step size, and FMI 3.0 gives them the vocabulary to do it safely. The negotiation has two halves. The first is permission: the master must respect each FMU’s canHandleVariableCommunicationStepSize flag. A unit that declares it false must be driven at a constant step, so the least flexible FMU in the set dictates how much freedom the master has over h. The second is feedback: when an FMU cannot take the requested step it returns status Discard, and in FMI 3.0 doStep also reports the last time it reached successfully (lastSuccessfulTime), which is the information the master needs to shrink and retry. A robust variable-step master therefore does not commit blindly; it proposes a step, inspects the returned status and any earlyReturn time, and adapts. The skeleton looks like this:

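# Assumes a recent FMPy, which raises FMICallException on Discard or worse
from fmpy.fmi1 import FMICallException

def negotiated_step(slaves, t, h_proposed, h_min=1e-6):
    """Try a co-simulation step, halving h on Discard or early return."""
    h = h_proposed
    while h >= h_min:
        # snapshot every slave so we can roll back a rejected step
        states = [s.getFMUState() for s in slaves]
        try:
            # FMPy's doStep returns (eventHandlingNeeded, terminateSimulation,
            # earlyReturn, lastSuccessfulTime) rather than a status code
            results = [s.doStep(currentCommunicationPoint=t,
                                communicationStepSize=h) for s in slaves]
            accepted = not any(early for _, _, early, _ in results)
        except FMICallException:
            accepted = False                     # at least one FMU rejected the step
        if not accepted:
            # restore every FMU to the interval start before retrying smaller
            for s, snap in zip(slaves, states):
                s.setFMUState(snap)
        for s, snap in zip(slaves, states):
            s.freeFMUState(snap)                 # release the FMU-allocated snapshot
        if accepted:
            return t + h, h                      # step accepted
        h *= 0.5
    raise RuntimeError("step rejected below h_min; check FMU stability")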

This short pattern is the heart of every error-controlled co-simulation master, and it exposes exactly why rollback is not optional at this level of sophistication. To retry a discarded step you must return every FMU in the coupled set to the state it held at the start of the interval; if even one unit cannot restore, because it does not declare canGetAndSetFMUState, you cannot retry cleanly, and the whole master collapses back to fixed steps at the pace the weakest unit allows. This is why the capability flags are a design-time gate, not a runtime curiosity: they determine the entire class of master algorithm available to you before you write a line of the loop. Note too that getFMUState/setFMUState snapshot the FMU’s internal state (solver history, integrator memory, discrete modes), not just its outputs, which is what makes a rollback physically faithful rather than a superficial value reset. Some FMUs also declare canSerializeFMUState and implement serializeFMUState, letting you persist a snapshot to disk and resume a long twin run later or fork it across machines, a capability that turns co-simulation from a single-process exercise into something you can checkpoint like any long computation.

If the controller were a sampled digital regulator running at, say, 50 Hz while the plant integrates continuously, FMI 3.0 clocks express that cleanly. The controller FMU would declare a periodic input clock; instead of recomputing a fresh command on every 10 ms communication step, the master advances continuous time and, when the clock’s 20 ms tick interval elapses, activates the clock (via fmi3SetClock, in Event Mode) so the controller recomputes its output exactly on the sample boundary and holds it between ticks. A triggered output clock works the reverse way: the FMU raises it to tell the master “an event happened at this instant” (a threshold crossing, an alarm), and the master then queries which clock ticked (via fmi3GetClock) and reacts. Handling clocks in the loop typically means checking the eventHandlingNeeded and earlyReturn flags returned by doStep, entering Event Mode, reading the clock state, servicing the event, and resuming stepping.

The reason this matters for accuracy is subtle but concrete. Without clocks, a 50 Hz controller forced onto a fixed 10 ms continuous step would recompute its command every step whether or not a sample boundary had actually arrived, and its output between true sample instants would drift instead of holding — a small but real error in the loop’s phase. With a periodic clock the controller’s compute-and-hold semantics are exact: the command is a piecewise-constant signal that changes only on the 20 ms boundaries, precisely as the real embedded controller behaves. For events, the payoff is even larger. A triggered clock lets a relay, a gear-shift, or a protection trip fire at the instant its condition is met — signalled through earlyReturn so the master can shorten the interval and land the event on time — instead of being detected one whole communication step late. That one-step latency is exactly what makes naive fixed-step co-simulation of switching systems chatter or diverge, and it is the class of problem clocks were designed to remove.
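
The compute-and-hold behaviour described above can be emulated in a few lines. This sketch does not use the FMI clock API at all; it only mimics what a periodic clock guarantees, and the ramp measurement and the gain of 2.0 are placeholder assumptions standing in for a plant output and a control law.

```python
def run_sampled(n_steps, h_ms=10, period_ms=20):
    """Emulate a periodic clock: recompute on ticks, hold between them.

    Times are integer milliseconds so tick detection is exact, with no
    floating-point drift in the modulo test.
    """
    held = None
    trace = []
    for k in range(n_steps):
        t_ms = k * h_ms
        measurement = t_ms / 1000.0       # stand-in for a plant output (a ramp)
        if t_ms % period_ms == 0:         # clock tick: sample and recompute
            held = 2.0 * measurement      # stand-in for the control law
        trace.append((t_ms, held))        # between ticks the old value is held
    return trace


trace = run_sampled(6)
for t_ms, cmd in trace:
    print(f"t = {t_ms:3d} ms   command = {cmd:.2f}")
```

The command changes only at 0, 20, and 40 ms even though the loop runs every 10 ms, which is the piecewise-constant signal a 50 Hz embedded controller would produce.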

Plotting the two logged signals then takes only a few lines:

import matplotlib.pyplot as plt
plt.plot(log["t"], log["level"], label="tank level (m)")
plt.plot(log["t"], log["cmd"],   label="valve command")
plt.axhline(1.0, ls="--", label="setpoint")
plt.xlabel("time (s)")
plt.legend()
plt.tight_layout()
plt.savefig("twin_response.png", dpi=150)

The end-to-end FMPy lifecycle (read, extract, instantiate, initialize, step, terminate) is summarized in the workflow diagram below; keep it beside you when wiring your own units. Notice that the lifecycle is strictly ordered: you cannot set a Float64 before instantiating, cannot doStep before exiting initialization, and cannot touch the instance at all after freeInstance. Reading final outputs after terminate is still allowed; it is freeInstance that ends the instance’s life. The FMI specification defines this state machine, and a well-behaved FMU answers a call made in the wrong mode with an error rather than doing something surprising. When you scale from two FMUs to many, wrapping each slave’s lifecycle in a small manager object, one that owns the read, extract, instantiate, and init sequence and exposes clean set, step, and get methods keyed by port name, pays for itself immediately, because it lets the master loop read like the connection graph rather than like ctypes plumbing.

The FMPy co-simulation lifecycle: read the model description, extract the archive, instantiate the slave, run the initialization handshake, loop set-step-get, then terminate and free the instance. The FMI state machine fixes the order of these transitions, and a well-behaved FMU answers an out-of-order call with an error rather than misbehaving silently.

For a single FMU where you do not need a custom master, FMPy’s one-liner simulate_fmu("Tank.fmu", stop_time=60, output=["level"]) runs the whole thing and returns a NumPy structured array — perfect for smoke-testing a unit before you wire it into the twin. Run it on every FMU in isolation first: confirm each unit is individually sane, that its outputs move in the direction you expect when you nudge an input, and only then compose them. Most co-simulation debugging time is spent discovering that a unit was misbehaving alone and the coupling merely exposed it, so isolating that failure mode up front is the single highest-leverage habit in the whole workflow.

Trade-offs, Gotchas, and What Goes Wrong

Co-simulation looks tidy on a slide and turns spiky in practice. The first landmine is the algebraic loop: if FMU A’s output depends instantaneously on FMU B’s output and vice versa within the same instant (a direct feedthrough with no state between them), a plain Gauss-Seidel master cannot resolve it in one pass, because whichever FMU steps first must use a stale value for its partner’s output. The result is either a persistent step-lag error or, if the loop gain is high enough, a numerical oscillation that grows until the twin blows up. FMI 3.0’s ModelStructure block declares which outputs depend on which inputs, so a smart master can detect the loop and iterate the interval, re-stepping with getFMUState/setFMUState until the coupled inputs and outputs converge to a fixed point. That is a fixed-point iteration on the coupling variables, and it converges reliably only when the loop gain is below one; stiffer loops need a Newton-type update. Intermediate update helps by exposing sub-step values so the loop can be closed more tightly than once per communication point. But the honest, robust fix is usually structural: insert a small physical state or a first-order filter so the loop passes through an integrated state rather than a pure feedthrough, breaking the instantaneous dependency. A twin that avoids direct algebraic coupling between separately compiled units is a twin that stays stable.
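
As a rough illustration of iterating a coupled pair to a fixed point, the sketch below treats each unit as a plain function from input to output. The function solve_algebraic_loop and the two linear feedthrough laws are hypothetical; with real FMUs each evaluation would be a setFMUState followed by a re-step of the interval, not a cheap function call.

```python
def solve_algebraic_loop(f_a, f_b, y_b0=0.0, tol=1e-10, max_iter=100):
    """Fixed-point iteration on the coupling u_a = y_b, u_b = y_a."""
    y_b = y_b0
    for iteration in range(1, max_iter + 1):
        y_a = f_a(y_b)              # unit A evaluated with B's latest output
        y_b_new = f_b(y_a)          # unit B evaluated with A's fresh output
        if abs(y_b_new - y_b) < tol:
            return y_a, y_b_new, iteration
        y_b = y_b_new
    raise RuntimeError("algebraic loop did not converge; loop gain may be >= 1")


# Two direct-feedthrough units with loop gain 0.5 * 0.4 = 0.2.
y_a, y_b, iterations = solve_algebraic_loop(lambda u: 0.5 * u + 1.0,
                                            lambda u: 0.4 * u)
```

With a loop gain of 0.2 the error shrinks by a factor of five per pass. Raising the gain above one makes the same iteration diverge, which is the situation where a structural fix or a Newton-type update is needed.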

Step size and stability are the next trap. A fixed communication step that is comfortable for a slow thermal plant will destabilize a fast electrical one; the coupling error grows with h, and unlike a monolithic solver, a basic co-simulation master has no rollback — once doStep commits, you cannot un-advance unless every FMU supports getFMUState/setFMUState. Always check the canGetAndSetFMUState capability flag before designing a variable-step or retry strategy. The insidious part is that co-simulation instability does not always announce itself as an obvious blow-up; sometimes it appears as a small, growing oscillation superimposed on an otherwise plausible response, easy to mistake for real dynamics. The discipline that protects you is the convergence sweep: run the twin at your working step, then at half that step, and compare. If the trajectories differ meaningfully, your step is too large and what you are seeing is coupling error, not physics. Only when halving the step leaves the answer essentially unchanged have you earned the right to trust it. This sweep costs a few extra runs and is the cheapest insurance in the whole workflow. As a rule of thumb, the communication step should be several times smaller than the fastest time constant that crosses a coupling — a signal that swings meaningfully within one step is a signal your seams cannot represent.
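
The convergence sweep can be sketched on a toy loop. Everything here is a stand-in rather than the twin built earlier: simulate_twin is a hypothetical proportional controller coupled to a first-order tank, and the gains and time horizon are placeholder values. The plant always integrates with its own internal_dt, so any difference between runs comes from the communication step alone.

```python
def simulate_twin(h, t_end=1.0, setpoint=1.0, kp=2.0, a=0.2, internal_dt=1e-3):
    """Toy P-controller plus tank, coupled once per communication step h.

    Returns the tank level at t_end.
    """
    n_steps = round(t_end / h)
    n_sub = max(1, round(h / internal_dt))
    dt = h / n_sub
    level = 0.0
    for _ in range(n_steps):
        cmd = kp * (setpoint - level)     # controller samples at the communication point
        for _ in range(n_sub):            # plant sub-steps with cmd held constant
            level += dt * (cmd - a * level)
    return level


def convergence_sweep(h0, halvings=2):
    """Return [(h, |y(h) - y(h/2)|), ...] for successive halvings of h0."""
    rows = []
    h, y = h0, simulate_twin(h0)
    for _ in range(halvings):
        y_half = simulate_twin(h / 2)
        rows.append((h, abs(y - y_half)))
        h, y = h / 2, y_half
    return rows


for h, change in convergence_sweep(0.1):
    print(f"h = {h:.4f} s   change on halving = {change:.5f}")
```

If the reported change keeps shrinking as h is halved and ends up below the tolerance you care about, the coupling error is under control; if it does not shrink, the step is still too large.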

Type and unit mismatches are silent killers. FMI 3.0’s sized integers mean an Int32 output wired into a UInt8 input will not raise a compile error; if the master narrows the value without a range check it simply wraps at runtime, so a value of 300 arrives as 44 (300 mod 256). Worse, units are advisory metadata, not enforced contracts: connecting a variable declared in bar to one expecting Pa compiles fine and produces physically wrong results scaled by 100,000, and nothing in the standard will stop you. The UnitDefinitions block gives you the information to catch this, since every unit resolves to SI base units with a factor and offset, so a conscientious master reads both ends’ declared units and either asserts they match or inserts a conversion. Validate .type and .unit from the model description before you connect anything.

Vendor conformance varies too: not every exporter implements clocks, arrays, intermediate update, or state save/restore, and some ship binaries for only one platform or omit the ModelStructure dependencies that a smart master needs. Run the FMU through the official FMI cross-check suite or a validator, and always call dump first so surprises surface before they reach your loop.

The lack of rollback in basic co-simulation deserves repeating as its own hazard: if any FMU in a loop lacks canGetAndSetFMUState, your master is committed to every step it takes, which rules out error-controlled stepping for the whole system. The weakest FMU sets the ceiling on the master’s sophistication.

Finally, clock semantics are subtle: a periodic clock’s interval and phase and a triggered clock’s priority all affect the order in which simultaneous events are serviced, and getting them wrong makes events fire on the wrong step or in the wrong sequence, which in a protection or interlock model is not a rounding error but a functional bug.
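
Both the integer wrap and the unit check mentioned above are easy to reproduce. In this sketch the UNITS table and the convert function are hypothetical; a real master would fill the table from each FMU's UnitDefinitions rather than hard-coding it, and the "base" strings are an informal stand-in for the SI base-unit exponents.

```python
import ctypes

# Narrowing an out-of-range value: ctypes performs no overflow check,
# so 300 written into an 8-bit unsigned slot wraps modulo 256.
wrapped = ctypes.c_uint8(300).value

# Hypothetical unit table, using value_in_base_units = factor * value + offset.
UNITS = {
    "Pa":   {"base": "kg.m-1.s-2", "factor": 1.0, "offset": 0.0},
    "bar":  {"base": "kg.m-1.s-2", "factor": 1e5, "offset": 0.0},
    "K":    {"base": "K",          "factor": 1.0, "offset": 0.0},
    "degC": {"base": "K",          "factor": 1.0, "offset": 273.15},
}


def convert(value, src_unit, dst_unit):
    """Convert between two declared units, refusing mismatched dimensions."""
    src, dst = UNITS[src_unit], UNITS[dst_unit]
    if src["base"] != dst["base"]:
        raise ValueError(f"cannot connect {src_unit} to {dst_unit}")
    value_si = src["factor"] * value + src["offset"]
    return (value_si - dst["offset"]) / dst["factor"]


pressure_pa = convert(2.0, "bar", "Pa")   # a bar output feeding a Pa input
```

Running the check once when the connections are built, rather than on every step, keeps the cost out of the simulation loop.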

Practical Recommendations

Start every integration by running dump and read_model_description on each FMU and writing down the exact valueReference, type, unit, and causality of every port you intend to wire. Do not connect by name alone — connect by validated handle. Prototype the master with a coarse fixed step, confirm the twin is qualitatively correct, then tighten h until your output stops changing; that convergence sweep is your evidence the coupling error is bounded, and it should be part of the record you keep with the result. Prefer FMUs that declare canGetAndSetFMUState so you retain the option of rollback and error-controlled stepping later, and treat any FMU that omits ModelStructure dependencies with suspicion, because your master cannot reason about coupling it cannot see. When any subsystem is a sampled controller or event-driven, model it with a clock rather than forcing it onto the continuous step; the accuracy you buy is real and the code is not much harder. Keep the master’s routing logic in one readable place — a single connection table mapping source (fmu, valueReference) to destination (fmu, valueReference) beats routing scattered through the loop — and log every exchanged signal to disk. Debugging co-simulation without a signal trace is guesswork: when a twin diverges, the trace tells you instantly whether the fault entered through a mis-wired port, a unit mismatch, or a step-size instability, and that first bisection saves hours. Finally, pin your FMPy and FMU versions in the project; a silently updated exporter can change a variable’s causality or a start value and turn a reproducible twin into a moving target.
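
A single connection table of the kind recommended above might look like the following sketch. StubFMU is a stand-in that only mimics the list-in, list-out shape of getFloat64 and setFloat64; the instance names and value references in CONNECTIONS are hypothetical and would come from your own model descriptions.

```python
class StubFMU:
    """Stand-in exposing the two calls the router needs (not a real FMU)."""

    def __init__(self, values):
        self.values = dict(values)            # valueReference -> value

    def getFloat64(self, vr):
        return [self.values[r] for r in vr]

    def setFloat64(self, vr, values):
        self.values.update(zip(vr, values))


# One table describes the whole wiring: (source, VR) -> (destination, VR).
CONNECTIONS = [
    (("controller", 3), ("tank", 1)),         # valve command -> inflow_cmd
    (("tank", 2), ("controller", 4)),         # level -> level_feedback
]


def route(fmus, connections, trace=None):
    """Copy every source value to its destination; optionally log each transfer."""
    for (src, src_vr), (dst, dst_vr) in connections:
        value = fmus[src].getFloat64([src_vr])[0]
        fmus[dst].setFloat64([dst_vr], [value])
        if trace is not None:
            trace.append((src, src_vr, dst, dst_vr, value))


fmus = {"controller": StubFMU({3: 0.4, 4: 0.0}),
        "tank": StubFMU({1: 0.0, 2: 0.9})}
trace = []
route(fmus, CONNECTIONS, trace)
```

Keeping the wiring in data rather than in loop code means the trace and the connection graph always agree, and adding a third FMU is a matter of adding rows.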

Checklist:

[ ] dump each FMU; record VR, type, unit, causality for every wired port
[ ] Confirm a binary exists for your OS/architecture
[ ] Match setters/getters to declared types (setFloat64 for Float64, etc.)
[ ] Complete the init handshake: enterInitializationMode → set params → exitInitializationMode
[ ] Choose Gauss-Seidel vs Jacobi ordering deliberately; document the feedback delay
[ ] Run a step-size convergence sweep before trusting results
[ ] Check canGetAndSetFMUState if you need rollback or variable steps
[ ] Use clocks for sampled/event subsystems; handle earlyReturn
[ ] Validate units at connection points; never assume they match
[ ] terminate and freeInstance every slave to avoid leaked temp dirs

Frequently Asked Questions

What is the difference between model exchange and co-simulation in FMI 3.0?

Model exchange ships only the model’s equations and relies on the importing tool to provide the numerical solver, exposing state derivatives that the master integrates. Co-simulation bundles a solver inside the FMU, so the master only calls doStep to advance from one communication point to the next and the FMU sub-steps internally. Co-simulation is preferred for multi-tool digital twins because each subsystem keeps its author-tuned solver and discontinuities stay hidden behind the interface.

Does FMPy support FMI 3.0, and how mature is it?

Yes. FMPy implements FMI 1.0, 2.0, and 3.0 for both co-simulation and model exchange, including the FMI 3.0 additions such as sized integer and float types, array variables, and the co-simulation slave API exposed through its fmpy.fmi3 module. It is open source, maintained within the Modelica Association community, and is widely used for scripting, validation, and cross-checking FMUs. Clock and intermediate-update support depends on both FMPy and the specific FMU exporter implementing them.

What is a master algorithm and why do I have to write it?

The master algorithm is the orchestrator that owns the global clock, decides the communication step size, sets each FMU’s inputs, calls doStep, reads outputs, and routes signals between units. FMI deliberately leaves the master outside the standard so tool vendors and integrators can choose fixed or variable steps, Gauss-Seidel or Jacobi ordering, and rollback strategies. FMPy gives you the primitives; the coupling policy is yours to encode, which is exactly what makes a hands-on tutorial necessary.

How do clocks change co-simulation compared to FMI 2.0?

In FMI 2.0, discrete events had to be approximated by shrinking the fixed step until they landed close to a communication point, smearing timing. FMI 3.0 clocks make events first-class: a periodic clock ticks on a fixed schedule for sampled controllers, and a triggered clock ticks when the model raises an event such as a threshold crossing. The master activates or reads clocks and services events exactly at their instants, which lets hybrid continuous-discrete systems be simulated accurately without hunting for events by step-shrinking.

Why does my FMU fail to load even though the file exists?

The most common cause is a missing binary for your platform: the binaries/ folder must contain a subfolder matching your OS and architecture (for example x86_64-linux), and vendors sometimes ship Windows-only builds. Other causes are a mismatch between the instantiationToken in modelDescription.xml and the one compiled into the binary (the FMI 3.0 successor to the FMI 2.0 GUID), an unsupported interface type (asking for co-simulation from a model-exchange-only FMU), or a corrupt zip. Run dump and read_model_description first; they surface the declared interfaces and the available binaries before you ever call instantiate.

Can I roll back a co-simulation step if it goes unstable?

Only if every FMU in the loop declares the canGetAndSetFMUState capability. When they do, you snapshot state with getFMUState before a step and restore it with setFMUState if the step produced a too-large error, then retry with a smaller size — the basis of error-controlled variable-step masters. Basic co-simulation has no rollback: once doStep commits, time has advanced. Check the capability flags in modelDescription.xml before designing any retry logic; a single FMU that lacks the flag forces the whole master to fixed steps.

How does FMPy actually call the compiled FMU under the hood?

FMPy loads the platform-appropriate shared library from the FMU’s binaries/ folder with Python’s ctypes and binds the exported FMI C functions to Python callables. When you call slave.setFloat64(...), FMPy marshals your value-reference list and value list into C arrays and invokes the FMU’s fmi3SetFloat64 symbol directly; doStep maps to fmi3DoStep, and so on. There is no code generation and no separate process — the FMU runs in-process with your Python. This is why FMPy is so convenient for scripting and CI, and also why a badly behaved FMU that segfaults can take your Python process down with it: you are, in effect, running the vendor’s C code inside your interpreter. When robustness against a crashing FMU matters, running each unit in its own subprocess or container behind a thin RPC is a common defensive pattern.

FMI 3.0 Co-Simulation with FMPy: A Hands-On Digital Twin Tutorial (2026)