What Is Edge AI Orchestration?

Edge AI orchestration coordinates multiple ML models, sensor inputs, and actuator outputs on microcontrollers through structured workflows. Instead of writing monolithic firmware, orchestration defines what data to collect, which model to run, and what action to take — as a configurable pipeline.

Published 2026-04-01

Beyond Single-Model Inference

Most edge AI tutorials end at the same point: you have a model running on an MCU, printing predictions to the serial console. That is inference, not a system.

A real edge AI application needs more:

  • Multiple inputs. A predictive maintenance node reads vibration, temperature, and current — three sensors feeding different processing pipelines.
  • Conditional logic. If vibration exceeds a threshold AND temperature is rising, run the anomaly detection model. Otherwise, skip inference to save power.
  • Multiple models. A security camera node runs a lightweight motion detector continuously, but only invokes the heavier object classification model when motion is detected.
  • Actions. When the model detects an anomaly, the system must do something — trigger a relay, send an MQTT message, log to flash, or wake a more powerful processor.

Orchestration is the layer that connects these pieces into a coherent system.

What Orchestration Looks Like on an MCU

On a microcontroller, orchestration is not a container scheduler or a Kubernetes pod. It is a firmware architecture pattern that manages the flow between sensing, inference, and action.

A typical orchestrated pipeline:

Sensors → Preprocessing → Model A → Decision Logic → Model B (conditional) → Action
    ↑                                                                          |
    └──────────────────── Feedback Loop ───────────────────────────────────────┘

In practice, this runs as a set of RTOS tasks:

  1. Sensor task — reads hardware inputs at defined intervals, writes to a shared buffer
  2. Preprocessing task — applies signal processing (FFT for vibration, MFCC for audio, normalization for analog signals)
  3. Inference task — loads preprocessed data into the tensor arena, invokes the model, produces predictions
  4. Decision task — evaluates predictions against rules, triggers downstream actions or additional inference
  5. Action task — controls outputs (GPIO, UART, MQTT publish, flash logging)

Each task runs on a schedule. An RTOS like FreeRTOS on ESP32 handles the concurrency — task priorities, synchronization, and inter-task communication via queues.
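The queue handoff between stages can be sketched in plain C. This is a single-threaded simplification: the `queue_t` below stands in for a FreeRTOS queue (`xQueueSend`/`xQueueReceive`), and the `*_step` functions stand in for task bodies that an RTOS would run concurrently; the 0.01 scaling is a placeholder for real preprocessing.

```c
#include <stddef.h>

/* Fixed-size ring queue standing in for a FreeRTOS queue.
   One queue per hop between pipeline stages. */
#define QLEN 8

typedef struct {
    float items[QLEN];
    size_t head, tail, count;
} queue_t;

static int q_send(queue_t *q, float v) {
    if (q->count == QLEN) return 0;      /* queue full: drop sample */
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QLEN;
    q->count++;
    return 1;
}

static int q_recv(queue_t *q, float *out) {
    if (q->count == 0) return 0;         /* nothing pending */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QLEN;
    q->count--;
    return 1;
}

/* Each "task" body does one step: consume from its input queue,
   produce to its output queue. An RTOS schedules these concurrently. */
static void sensor_step(queue_t *raw, float reading) {
    q_send(raw, reading);
}

static void preprocess_step(queue_t *raw, queue_t *features) {
    float v;
    if (q_recv(raw, &v))
        q_send(features, v * 0.01f);     /* placeholder normalization */
}
```

The inference, decision, and action tasks follow the same consume-from-one-queue, produce-to-the-next pattern, which is what keeps the stages decoupled.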

Why Not Just Write Monolithic Firmware?

You can — and many teams do. The problem emerges at scale:

Adding a second model to a monolithic firmware means rewriting the main loop. With orchestration, you add a pipeline stage and connect it to the decision logic.

Changing the action (from GPIO toggle to MQTT alert) in monolithic code means touching the inference code. With orchestration, actions are decoupled from models.

Deploying the same logic on different hardware (ESP32 today, STM32 next quarter) in monolithic code means rewriting sensor and HAL layers throughout. With orchestration, you replace the hardware abstraction layer while the pipeline definition stays the same.

This separation matters most for predictive maintenance deployments where the same detection logic runs on different machines with different sensor configurations.
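The hardware-swap point above usually comes down to a small function-pointer table. A minimal sketch, with a hypothetical `sensor_hal_t` interface and a stub driver standing in for a real accelerometer backend:

```c
#include <stdint.h>

/* Hypothetical sensor HAL: the pipeline calls read() through this
   struct, so porting from ESP32 to STM32 means swapping the table,
   not touching the pipeline code. */
typedef struct {
    int (*init)(void);
    int (*read)(float *out);   /* returns 0 on success */
} sensor_hal_t;

/* Stub backend standing in for a real accelerometer driver. */
static int stub_init(void) { return 0; }
static int stub_read(float *out) { *out = 9.81f; return 0; }

static const sensor_hal_t stub_accel = { stub_init, stub_read };

/* Pipeline code depends only on sensor_hal_t, never on a driver. */
static int sample(const sensor_hal_t *hal, float *out) {
    return hal->read(out);
}
```

A second board gets a second table with the same two function slots; `sample` and everything downstream of it stay unchanged.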

Multi-Model Pipelines

A single MCU can run multiple ML models if memory allows. Common patterns:

Cascade Pipeline

A lightweight model screens all inputs. Only positive detections pass to a larger, more accurate model.

Example on ESP32-S3:

  • Model A (10 KB): Motion detection from accelerometer — runs every 50 ms
  • Model B (200 KB): Object classification from camera — runs only when Model A triggers

Because the heavy model only runs on the small fraction of windows where Model A fires, this gating can eliminate well over 90% of compute cycles compared to running the heavy model continuously.
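The cascade gate itself is a few lines of decision code. In this sketch, `classify_object` stands in for the heavy Model B, and the 0.5/0.7 thresholds are illustrative, not tuned values; the invocation counter makes the duty-cycle saving visible.

```c
/* Cascade gate: the heavy model runs only when the cheap screen fires. */
typedef enum { NO_EVENT, MOTION_ONLY, OBJECT_DETECTED } cascade_result_t;

static int heavy_invocations = 0;   /* how often Model B actually ran */

static float classify_object(void) {
    heavy_invocations++;            /* stands in for the 200 KB model */
    return 0.9f;                    /* placeholder confidence */
}

static cascade_result_t cascade_step(float motion_score) {
    if (motion_score < 0.5f)        /* Model A: nothing moving, skip B */
        return NO_EVENT;
    return classify_object() > 0.7f ? OBJECT_DETECTED : MOTION_ONLY;
}
```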

Parallel Pipeline

Two models process different sensor modalities simultaneously.

Example on STM32H7:

  • Vibration model (30 KB): Analyzes accelerometer FFT spectrum — detects bearing wear
  • Thermal model (15 KB): Monitors temperature gradient — detects overheating

The decision task fuses both outputs: an anomaly flagged by either model triggers an alert, but flagged by both escalates to an immediate shutdown signal.
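That fusion rule is small enough to state directly in code. A sketch of the decision task's core, with the either-alerts / both-escalate policy from above (action names are illustrative):

```c
/* Decision fusion for the parallel pipeline: either model alone
   raises an alert; both together escalate to a shutdown signal. */
typedef enum { ACT_NONE, ACT_ALERT, ACT_SHUTDOWN } action_t;

static action_t fuse(int vibration_anomaly, int thermal_anomaly) {
    if (vibration_anomaly && thermal_anomaly) return ACT_SHUTDOWN;
    if (vibration_anomaly || thermal_anomaly) return ACT_ALERT;
    return ACT_NONE;
}
```

Keeping the policy in one pure function like this makes it trivial to unit test on the host before it ever touches hardware.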

Sequential Pipeline

Each model’s output feeds the next model’s input.

Example: Audio processing pipeline

  • VAD model (5 KB): Voice Activity Detection — is someone speaking?
  • Keyword model (50 KB): Keyword spotting — is the wake word present?
  • Command model (100 KB): Command classification — what was said?

Each stage gates the next. The MCU runs the VAD model continuously but only activates keyword detection when speech is detected.
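The stage gating maps naturally onto a small state machine. A sketch, assuming a three-stage pipeline where a negative result at any stage falls back to the cheap VAD model:

```c
/* Gated audio stages as a state machine: a positive result advances
   to the next (heavier) model; a negative result resets to VAD. */
typedef enum { ST_VAD, ST_KEYWORD, ST_COMMAND } stage_t;

static stage_t advance(stage_t s, int positive) {
    if (!positive) return ST_VAD;           /* gate closed: cheap model only */
    switch (s) {
        case ST_VAD:     return ST_KEYWORD; /* speech -> listen for wake word */
        case ST_KEYWORD: return ST_COMMAND; /* wake word -> classify command */
        default:         return ST_VAD;     /* command handled: start over */
    }
}
```

The scheduler then only runs the model that corresponds to the current state, so the 100 KB command model is loaded into the duty cycle only after the two cheaper gates have passed.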

Orchestration Building Blocks

Whether you build orchestration by hand or adopt a platform, you need these components:

  • Sensor abstraction — uniform API across sensors. Implementation: HAL layer per sensor type
  • Data pipeline — buffering, preprocessing, feature extraction. Implementation: ring buffers + DSP functions
  • Model registry — which models are loaded and their input/output specs. Implementation: static config or runtime table
  • Decision engine — rules, thresholds, conditional model execution. Implementation: state machine or rule evaluator
  • Action dispatcher — maps decisions to hardware outputs or network calls. Implementation: GPIO, UART, MQTT, HTTP handlers
  • Scheduler — when to read, when to infer, when to act. Implementation: RTOS tasks with priorities and timers
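Of these, the decision engine is the piece most often reinvented. A minimal data-driven sketch (the `rule_t` layout and IDs are hypothetical): rules live in a table rather than in code, which is what makes configuration-driven behavior changes possible.

```c
/* Minimal threshold-rule evaluator standing in for a decision engine.
   Rules are data, so behavior can change without touching inference code. */
typedef struct {
    int   input_id;       /* index of a model output or sensor feature */
    float threshold;
    int   action_id;      /* action fired when the value exceeds threshold */
} rule_t;

/* Returns the action of the first matching rule, or -1 for no match. */
static int evaluate(const rule_t *rules, int n_rules, const float *inputs) {
    for (int i = 0; i < n_rules; i++)
        if (inputs[rules[i].input_id] > rules[i].threshold)
            return rules[i].action_id;
    return -1;
}
```

Because rules are ordered, putting escalation rules first gives them priority; swapping the table at runtime changes behavior without reflashing.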

Building this from scratch for every project is where most teams lose time. The models are the easy part. The orchestration plumbing is where complexity lives.

The State of the Field

Edge AI orchestration is an emerging practice, not an established standard. Most production deployments today are hand-coded firmware with hard-wired pipelines. Orchestration as a structured discipline — with reusable components, visual tooling, and deployment automation — is where the field is moving.

What exists today:

  • Framework-level tools: TFLite Micro and Edge Impulse handle inference. They do not handle orchestration.
  • RTOS primitives: FreeRTOS provides the task scheduling foundation, but orchestration logic is left to the developer.
  • Platform approaches: Tools like ForestHub aim to abstract the orchestration layer — defining pipelines visually and generating the coordination code.
  • DIY approaches: Many teams build custom state machines in C that manage their specific sensor-model-action flows.

There is no “Kubernetes for MCUs” — and given typical memory budgets (often 512 KB of SRAM or less), there may never be. Orchestration on microcontrollers will always be more constrained and more tightly coupled to hardware than cloud orchestration. The question is how much of the plumbing can be abstracted without sacrificing control.

When Orchestration Matters

Not every edge AI project needs orchestration. A single model reading one sensor and toggling one GPIO is a simple inference loop — and that is fine.

Orchestration becomes valuable when:

  • You have multiple sensors feeding different models
  • You need conditional logic between inference and action
  • You deploy the same logic on multiple hardware variants
  • You want to change behavior without reflashing (configuration-driven pipelines)
  • Your system has multiple operating modes (normal monitoring, alert mode, low-power sleep)

For teams building predictive maintenance systems across a fleet of machines, orchestration is not optional — it is the difference between a demo and a deployed system.

Frequently Asked Questions

Is edge AI orchestration the same as MLOps?
No. MLOps manages the lifecycle of ML models — training, versioning, deployment. Edge AI orchestration manages what happens after deployment: how models interact with sensors, other models, and actuators on the device at runtime. They are complementary but distinct.
Does edge AI orchestration require an RTOS?
Not strictly, but an RTOS like FreeRTOS simplifies it significantly. Orchestration needs concurrent task management — reading sensors, running inference, and triggering actions in parallel. Bare-metal approaches work for simple pipelines but break down as complexity grows.
Can one MCU orchestrate multiple AI models?
Yes, with careful memory management. An ESP32-S3 with 512 KB SRAM and 8 MB PSRAM can run a small anomaly detection model alongside a classification model. The models share the processor but use separate tensor arenas. Sequential execution is typical — true parallel inference requires multi-core scheduling.
How is orchestration different from a simple inference loop?
A simple inference loop reads a sensor, runs one model, and outputs a result. Orchestration adds decision logic between steps: conditional model selection, multi-sensor fusion, threshold-based escalation, and feedback loops. It is firmware architecture, not just a function call.


Orchestrate Without the Firmware Complexity

ForestHub is an edge AI orchestration platform. Design multi-model workflows visually and generate deployment-ready C code for your target MCU.

Get Started Free