AI Agents for Embedded Systems

An AI agent on an embedded system is a firmware architecture that combines sensor input, ML inference, decision logic, and actuator control into an autonomous loop. Unlike single-model inference, agents coordinate multiple inputs and models to act on their environment without human intervention.

Published 2026-04-01

What “Agent” Means on a Microcontroller

The term “AI agent” in 2024-2025 typically refers to LLM-based systems that use tools, maintain memory, and pursue goals — ChatGPT with function calling, autonomous coding assistants, multi-step reasoning systems. That model does not translate to microcontrollers. MCUs have kilobytes of RAM, not gigabytes. They do not run language models.

On a microcontroller, an AI agent is something different and more specific:

An autonomous sense-think-act loop that runs continuously on the device, making decisions about the physical world based on ML inference — without human input and without cloud connectivity.

The “intelligence” comes not from a foundation model, but from the combination of:

  • Specialized ML models (anomaly detection, classification, object detection)
  • Rule-based decision logic (thresholds, state machines, conditional execution)
  • Physical actions (GPIO control, motor actuation, relay switching, alert transmission)
  • Feedback (action outcomes feed back into the sensing pipeline)

This is not a stripped-down version of cloud-native AI agents. It is a fundamentally different architecture — closer to robotics than to chatbots.
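The four ingredients above can be sketched as a single loop. This is a minimal, hypothetical illustration, not a production pattern: `read_sensor()`, `run_model()`, and the threshold value are stand-ins for project-specific implementations.

```c
/* Minimal sketch of a sense-think-act loop. All helper functions and
   the 0.8 threshold are illustrative stand-ins, not a real API. */

typedef enum { ACTION_NONE, ACTION_ALERT } action_t;

static float read_sensor(void)  { return 0.42f; }  /* stub: sensor driver */
static float run_model(float x) { return x;     }  /* stub: ML inference */

/* Rule-based decision logic layered on top of the model output. */
static action_t decide(float score) {
    return (score > 0.8f) ? ACTION_ALERT : ACTION_NONE;
}

static action_t last_action = ACTION_NONE;  /* feedback: remembered state */

action_t agent_step(void) {
    float raw   = read_sensor();   /* sense */
    float score = run_model(raw);  /* think: ML inference */
    action_t a  = decide(score);   /* think: decision logic */
    last_action = a;               /* feedback into the next cycle */
    return a;                      /* act (output dispatch omitted) */
}
```

In real firmware each stage would typically be its own RTOS task; here they are collapsed into one function to show the data flow.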

From Inference to Agent: The Progression

Most embedded ML projects stop at inference. Building an agent means continuing through additional layers:

Level 1: Inference

Sensor → Model → Prediction (displayed or logged)

The model runs, produces output, and someone else decides what to do. This is the “serial monitor demo” — useful for validation, not for deployment.

Level 2: Reactive System

Sensor → Model → Decision Logic → Action

The firmware acts on the prediction. If the anomaly score exceeds a threshold, toggle a GPIO, send an MQTT message, or activate a relay. Most production edge AI deployments today are reactive systems.

Level 3: Stateful Agent

Sensor → Model → State Machine → Conditional Actions → Feedback → Sensor

The system maintains state across inference cycles. It remembers that the anomaly score has been rising for 4 hours. It escalates from “advisory” to “warning” to “critical” based on trend, not just the current reading. Past actions influence future decisions.

Level 4: Multi-Agent System

Agent A ←→ Coordination Protocol ←→ Agent B
                    ↕
                 Agent C

Multiple agents on separate MCUs collaborate. A vibration monitoring agent on Motor 1 shares its state with an agent on Motor 2. If both flag anomalies simultaneously, the coordination logic infers a systemic cause (power supply issue, ambient temperature spike) rather than two independent failures.
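The coordination check described above can be sketched as follows. The message struct and field names are illustrative assumptions (the transport, e.g. CAN or MQTT, is out of scope here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status message exchanged between agents; field names
   are illustrative, not a defined protocol. */
typedef struct {
    uint8_t  agent_id;
    float    anomaly_score;
    uint32_t timestamp_ms;
} agent_status_t;

/* If two agents flag anomalies within a short window of each other,
   infer a systemic cause rather than two independent failures. */
bool systemic_cause(const agent_status_t *a, const agent_status_t *b,
                    float threshold, uint32_t window_ms) {
    uint32_t dt = (a->timestamp_ms > b->timestamp_ms)
                ? a->timestamp_ms - b->timestamp_ms
                : b->timestamp_ms - a->timestamp_ms;
    return a->anomaly_score > threshold &&
           b->anomaly_score > threshold &&
           dt <= window_ms;
}
```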

Architecture of an Embedded AI Agent

A single agent on an MCU consists of four subsystems:

Sense

The sensing layer abstracts hardware inputs into a uniform data stream:

typedef struct {
    float vibration_rms;
    float temperature_c;
    float current_a;
    uint32_t timestamp_ms;
} sensor_reading_t;

The sensor task runs on a fixed schedule — typically 10 Hz to 1 kHz depending on the modality. It reads raw ADC or I2C data, applies calibration, and writes to a shared ring buffer.

For agents that combine multiple sensors (vibration + temperature + current for predictive maintenance), the sensing layer handles synchronization — ensuring inference operates on temporally aligned data from all sources.
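One way to sketch the shared ring buffer and the temporal-alignment read is shown below. The buffer size and the lookup policy (newest reading at or before a deadline) are illustrative choices; the struct repeats the `sensor_reading_t` definition above so the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    float vibration_rms;
    float temperature_c;
    float current_a;
    uint32_t timestamp_ms;
} sensor_reading_t;

#define RING_SIZE 32  /* illustrative; a power of two simplifies wraparound */

typedef struct {
    sensor_reading_t buf[RING_SIZE];
    uint32_t head;  /* total readings ever written */
} ring_t;

/* Producer: the sensor task writes on its fixed schedule. */
void ring_push(ring_t *r, sensor_reading_t s) {
    r->buf[r->head % RING_SIZE] = s;
    r->head++;
}

/* Consumer: fetch the newest reading whose timestamp is at or before
   `deadline_ms`, so inference sees temporally aligned data from each
   source. Returns false if nothing old enough is buffered. */
bool ring_latest_before(const ring_t *r, uint32_t deadline_ms,
                        sensor_reading_t *out) {
    uint32_t n = r->head < RING_SIZE ? r->head : RING_SIZE;
    for (uint32_t i = 0; i < n; i++) {
        const sensor_reading_t *s = &r->buf[(r->head - 1 - i) % RING_SIZE];
        if (s->timestamp_ms <= deadline_ms) { *out = *s; return true; }
    }
    return false;
}
```

In a multi-sensor agent, each modality gets its own ring; the think stage queries all of them with the same deadline.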

Think

The inference and decision layer runs ML models and applies logic:

// Thresholds are example values; compute_fft() and run_anomaly_model()
// are defined elsewhere in the firmware.
#define CRITICAL_THRESHOLD 0.9f
#define WARNING_THRESHOLD  0.7f

agent_decision_t agent_think(sensor_reading_t* readings, int count) {
    // Preprocess: compute FFT spectrum from vibration data
    float spectrum[128];
    compute_fft(readings, spectrum, count);

    // Run anomaly detection model
    float anomaly_score = run_anomaly_model(spectrum);

    // Stateful logic: track score trend over time
    update_score_history(anomaly_score);
    float trend = compute_trend();

    // Decision: combine current score with trend
    if (anomaly_score > CRITICAL_THRESHOLD) {
        return DECISION_CRITICAL;
    } else if (anomaly_score > WARNING_THRESHOLD && trend > 0) {
        return DECISION_WARNING;
    }
    return DECISION_NORMAL;
}

The key difference from plain inference: the decision logic is not just a threshold on the model output. It incorporates state (score history), trends (is it getting worse?), and cross-references (vibration anomaly + temperature rise = different conclusion than vibration anomaly alone).
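The history helpers referenced in `agent_think()` are left undefined above; one possible implementation is a small sliding window with a first-half versus second-half comparison as a cheap trend estimate. Window size and the trend heuristic are assumptions for illustration.

```c
#include <stddef.h>

/* One possible implementation of the score-history helpers used by
   agent_think(): a fixed sliding window plus an old-half vs new-half
   mean comparison as a lightweight trend estimate. */

#define HISTORY_LEN 16  /* illustrative window size */

static float  score_history[HISTORY_LEN];
static size_t score_count = 0;

void update_score_history(float score) {
    score_history[score_count % HISTORY_LEN] = score;
    score_count++;
}

/* Positive result means scores are rising: the mean of the newer half
   of the window exceeds the mean of the older half. */
float compute_trend(void) {
    size_t n = score_count < HISTORY_LEN ? score_count : HISTORY_LEN;
    if (n < 2) return 0.0f;
    size_t half = n / 2;
    float old_sum = 0.0f, new_sum = 0.0f;
    for (size_t i = 0; i < n; i++) {   /* i = 0 is the oldest sample */
        float s = score_history[(score_count - n + i) % HISTORY_LEN];
        if (i < half) old_sum += s; else new_sum += s;
    }
    return new_sum / (float)(n - half) - old_sum / (float)half;
}
```

A least-squares slope over the window would give a smoother estimate at the cost of a few more multiplies per call.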

Act

The action layer translates decisions into physical outputs:

Decision    Action                                Hardware
NORMAL      Update dashboard periodically         MQTT publish via Wi-Fi
WARNING     Send alert, increase sampling rate    MQTT + timer reconfiguration
CRITICAL    Trigger local alarm, notify, log      GPIO relay + MQTT + flash log

Actions can also modify the agent’s own behavior:

  • Adaptive sampling. Increase sensor polling rate when an anomaly is developing. Drop back to low-frequency sampling when conditions are normal. This saves power on battery-powered STM32L4 nodes.
  • Model switching. Run a lightweight screening model normally. When the screening model flags something, load a more accurate classification model. This cascade pattern works on ESP32-S3 where PSRAM allows loading alternate models at runtime.
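The adaptive-sampling action reduces to mapping the agent's decision onto a polling period. A minimal sketch, with periods mirroring the rates used in the predictive-maintenance example later in this guide (the enum and function names are illustrative):

```c
#include <stdint.h>

/* Illustrative decision type matching the values agent_think() returns. */
typedef enum { DECISION_NORMAL, DECISION_WARNING, DECISION_CRITICAL } agent_decision_t;

/* Map a decision to a sensor polling period in milliseconds. The
   caller would feed this into a platform timer reconfiguration. */
uint32_t sample_period_for(agent_decision_t d) {
    switch (d) {
        case DECISION_CRITICAL: return 1;   /* 1 kHz  */
        case DECISION_WARNING:  return 2;   /* 500 Hz */
        default:                return 10;  /* 100 Hz */
    }
}
```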

Learn (Limited)

On-device learning on MCUs is constrained. Full model retraining requires backpropagation and optimizer state — operations that exceed typical MCU memory.

What is practical today:

  • Threshold adaptation: Adjust decision thresholds based on confirmed false positive/negative feedback
  • Baseline drift compensation: Slowly update the “normal” reference as operating conditions change seasonally
  • Statistical model updates: For models based on statistical bounds (not neural networks), update mean and variance estimates incrementally
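The incremental statistical update in the last bullet fits comfortably on an MCU. A sketch using Welford's online algorithm, which maintains a running mean and variance in O(1) memory per channel:

```c
/* Welford's online algorithm: running mean and variance without
   storing samples, suitable for baseline drift compensation. */
typedef struct {
    unsigned long n;
    float mean;
    float m2;    /* sum of squared deviations from the running mean */
} running_stats_t;

void stats_update(running_stats_t *s, float x) {
    s->n++;
    float delta = x - s->mean;
    s->mean += delta / (float)s->n;
    s->m2   += delta * (x - s->mean);
}

/* Sample variance; returns 0 until at least two samples are seen. */
float stats_variance(const running_stats_t *s) {
    return (s->n > 1) ? s->m2 / (float)(s->n - 1) : 0.0f;
}
```

To track seasonal drift rather than all-time statistics, the same structure can be reset on a schedule or replaced with an exponentially weighted variant.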

Full model retraining happens off-device — on a PC or in the cloud. The retrained model is deployed via firmware update.

Example: Predictive Maintenance Agent

A concrete agent architecture for vibration-based machine monitoring:

Hardware: ESP32 + MEMS accelerometer + MQTT broker

Agent state machine:

MONITORING → (anomaly score > 0.7) → ALERT
ALERT → (score < 0.5 for 1 hour) → MONITORING
ALERT → (score > 0.9 OR rising trend > 2 hours) → CRITICAL
CRITICAL → (maintenance acknowledged via MQTT) → MONITORING
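The transitions above can be sketched as a pure transition function. Timing inputs (how long the score has stayed low, how long the trend has been rising) are passed in as arguments rather than measured, to keep the example self-contained; the names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ST_MONITORING, ST_ALERT, ST_CRITICAL } agent_state_t;

/* Transition function for the state machine above. */
agent_state_t agent_next_state(agent_state_t s, float score,
                               uint32_t low_score_min,    /* minutes with score < 0.5 */
                               uint32_t rising_trend_min, /* minutes of rising trend  */
                               bool maint_ack) {
    switch (s) {
    case ST_MONITORING:
        return (score > 0.7f) ? ST_ALERT : ST_MONITORING;
    case ST_ALERT:
        if (score > 0.9f || rising_trend_min > 120) return ST_CRITICAL;
        if (score < 0.5f && low_score_min >= 60)    return ST_MONITORING;
        return ST_ALERT;
    case ST_CRITICAL:
        return maint_ack ? ST_MONITORING : ST_CRITICAL;
    }
    return s;
}
```

Keeping the transition function free of side effects makes it trivially unit-testable on the host, which helps with the testing difficulty discussed later.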

Behavior per state:

State        Sample Rate   Inference Interval   Action
MONITORING   100 Hz        Every 10 s           Periodic status via MQTT
ALERT        500 Hz        Every 2 s            Alert via MQTT, LED warning
CRITICAL     1 kHz         Every 500 ms         Alarm relay, continuous MQTT, raw data log

This is more than a model running in a loop. The agent adapts its behavior based on what it observes, escalates through defined stages, and takes different physical actions at each stage.

Example: Multi-Sensor Fusion Agent

A more complex agent that combines multiple sensing modalities:

Hardware: STM32H7 + accelerometer + thermocouple + current transformer

Pipeline:

  1. Read all three sensors (synchronized to a common timestamp)
  2. Run vibration anomaly model → score_v
  3. Run thermal trend model → score_t
  4. Run current signature model → score_c
  5. Fusion logic: weighted combination with domain rules

The fusion rules encode engineering knowledge:

  • score_v > 0.8 AND score_t > 0.6: Likely bearing failure — friction causes both vibration and heat
  • score_c > 0.8 AND score_v < 0.3: Likely electrical fault — current anomaly without mechanical vibration
  • All three > 0.5: Likely external cause — power supply problem or ambient temperature spike
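The rules above translate directly into C. One judgment call the bullets leave open is precedence when multiple rules match; this sketch checks the "all three elevated" rule first so a systemic cause takes priority over the single-fault interpretations. Names and the fallthrough case are illustrative.

```c
/* Fusion rules from the bullets above, as C decision logic.
   The external-cause check runs first so it takes precedence
   when all three scores are elevated. */
typedef enum {
    FAULT_NONE,
    FAULT_BEARING,     /* friction: vibration + heat        */
    FAULT_ELECTRICAL,  /* current anomaly without vibration */
    FAULT_EXTERNAL     /* all channels elevated: systemic   */
} fault_t;

fault_t fuse(float score_v, float score_t, float score_c) {
    if (score_v > 0.5f && score_t > 0.5f && score_c > 0.5f)
        return FAULT_EXTERNAL;
    if (score_v > 0.8f && score_t > 0.6f)
        return FAULT_BEARING;
    if (score_c > 0.8f && score_v < 0.3f)
        return FAULT_ELECTRICAL;
    return FAULT_NONE;
}
```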

The fusion logic is where agent intelligence lives — not in any individual model, but in how models are combined with domain knowledge.

Challenges and Limitations

This field is early. There is no standard framework for building AI agents on MCUs. The patterns described here are architectural — implemented in custom C firmware, not with an off-the-shelf agent SDK. Standardized tooling for orchestrating agent workflows on microcontrollers is emerging but not yet mature.

Memory is the hard constraint. Each additional model, state buffer, and sensor pipeline consumes SRAM that does not grow. A three-model agent on ESP32 (520 KB SRAM) leaves minimal headroom for application logic. Memory planning must be done upfront.

Testing is difficult. Agent behavior depends on state transitions that may take hours or days to trigger in real conditions. Simulation and accelerated testing frameworks for embedded AI agents are underdeveloped compared to cloud-native testing tools.

Debugging is harder than inference. When a single model produces wrong output, you check the input data and model weights. When an agent makes a wrong decision, you must trace through sensor fusion, state machine transitions, threshold logic, and action dispatch. Embedded debuggers help, but there is no equivalent of cloud-native observability for MCU agents.

Where This Is Going

The embedded AI agent space is converging from two directions:

From the embedded side: RTOS vendors and MCU manufacturers are adding ML-aware task scheduling, hardware inference accelerators, and inter-device communication protocols. FreeRTOS on ESP32 already provides the concurrency primitives. ST’s Cube.AI provides model optimization. The missing piece is the agent coordination layer that connects them.

From the AI side: The agent paradigm — sense, think, act, learn — is being applied to constrained devices. The question is how much of the orchestration layer can be abstracted without sacrificing the control that embedded developers need.

The intersection — AI agents that run autonomously on $5 microcontrollers, coordinate with each other, and adapt to their environment — is where embedded development is heading. The building blocks (ML inference, RTOS scheduling, wireless communication) exist today. The integration tooling that ties them into coherent agent architectures is what is being built now.

Frequently Asked Questions

What makes an embedded AI agent different from a simple inference loop?
A simple inference loop reads a sensor, runs a model, and prints a result. An agent adds decision logic (thresholds, state machines), multi-source input fusion, physical actions (GPIO, relays, motor control), and feedback loops where action outcomes influence the next sensing cycle. It is firmware architecture, not a function call.

Can MCUs run multiple AI agents simultaneously?
Yes, within memory constraints. An ESP32-S3 with 512 KB SRAM and PSRAM can run 2-3 small agents as separate RTOS tasks. Each agent needs its own tensor arena and sensor pipeline. The practical limit is memory — each additional model and its buffers consume 30-100 KB of SRAM.

How do embedded AI agents communicate with other systems?
Via standard protocols. ESP32-based agents use MQTT over Wi-Fi for cloud or dashboard communication. STM32-based agents use Modbus RTU for PLC integration or UART for inter-MCU communication. Multiple agents on separate MCUs can coordinate via CAN bus, I2C, or MQTT.

Is embedded AI agent development mature enough for production?
Single-agent systems — one MCU running a sensing-inference-action loop — are production-ready today. Predictive maintenance and anomaly detection deployments exist at scale. Multi-agent coordination across MCUs is earlier-stage. Standardized agent frameworks for microcontrollers are still emerging.
