Guide
Edge AI Agents on Microcontrollers
An edge AI agent on a microcontroller is firmware that closes a sense-reason-act loop on the device itself: it reads sensors, runs ML inference or rule-based logic to decide, and drives actuators — without a round trip to the cloud. On MCUs like the ESP32 or STM32 the reasoning is small models plus a state machine, not a language model, because RAM is measured in kilobytes.
Published 2026-06-06
This page covers the embedded, microcontroller-specific case. For the general definition of the term and how it maps to industrial deployments, see the canonical pillar on edge agents.
What “Agent” Means on a Microcontroller
In 2024-2025 the phrase “AI agent” usually means an LLM that calls tools, keeps memory, and pursues goals — ChatGPT with function calling, a coding assistant, a multi-step reasoning system. That definition does not survive contact with a microcontroller. An ESP32 has on the order of 520 KB of SRAM. An STM32F4 has tens to a few hundred kilobytes. Neither runs a language model.
On a microcontroller an edge AI agent is something narrower and more concrete:
An autonomous sense-reason-act loop that runs continuously on the device, deciding about the physical world from local inference, with no human in the loop and no network dependency.
The “intelligence” is not a foundation model. It is the combination of:
- Small specialized models — anomaly detection, classification, keyword spotting — quantized to int8 and run with LiteRT for Microcontrollers (formerly TensorFlow Lite Micro) or CMSIS-NN.
- Rule-based decision logic — thresholds, state machines, conditional execution that encodes engineering knowledge.
- Physical actions — GPIO toggling, relay switching, motor control, an MQTT alert.
- Feedback — the outcome of an action changes the next sensing cycle (adaptive sampling, escalation).
This is closer to control engineering and robotics than to chatbots. Calling it an “agent” is justified by the loop, not by the model.
The Sense-Reason-Act Loop
Every embedded agent is a variation on three stages plus feedback. Most embedded-ML projects stop after sensing and a single model call; they never add a decision policy, an action, or feedback.
    ┌──────────────────────────────────────────┐
    │                                          │
    ▼                                          │
┌────────┐    ┌─────────┐    ┌────────┐    ┌────────┐
│ SENSE  │ ─► │ REASON  │ ─► │  ACT   │ ─► │ STATE  │
│ ADC,   │    │ model + │    │ GPIO,  │    │ history│
│ I2C,   │    │ logic + │    │ relay, │    │ trend  │
│ SPI    │    │ state   │    │ MQTT   │    │        │
└────────┘    └─────────┘    └────────┘    └────────┘
                                               │
                       feedback (sampling rate)│
    ◄──────────────────────────────────────────┘
Sense
The sensing task runs on a fixed schedule — typically 10 Hz to a few kHz depending on the modality — reading raw ADC, I2C, or SPI data, applying calibration, and writing into a shared ring buffer:
typedef struct {
    float    vibration_rms;
    float    temperature_c;
    float    current_a;
    uint32_t timestamp_ms;
} sensor_reading_t;
When an agent fuses multiple sensors, this layer also handles synchronization so the reasoning stage sees temporally aligned data.
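The shared ring buffer mentioned above could look roughly like the following. This is a minimal sketch that assumes one producer (the sensing task) and one consumer (the inference task); `ring_t`, `ring_push`, `ring_pop`, `ring_count`, and `RING_CAPACITY` are hypothetical names, and real firmware would add whatever synchronization its RTOS requires (a critical section or atomic indices).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float    vibration_rms;
    float    temperature_c;
    float    current_a;
    uint32_t timestamp_ms;
} sensor_reading_t;

#define RING_CAPACITY 8u  /* hypothetical size; must be a power of two */

typedef struct {
    sensor_reading_t slots[RING_CAPACITY];
    uint32_t head;  /* total readings written */
    uint32_t tail;  /* total readings consumed or dropped */
} ring_t;

/* Store a reading; when full, drop the oldest so acquisition never blocks. */
static void ring_push(ring_t *rb, const sensor_reading_t *r) {
    if (rb->head - rb->tail == RING_CAPACITY) {
        rb->tail++;
    }
    rb->slots[rb->head % RING_CAPACITY] = *r;
    rb->head++;
}

/* Copy out the oldest unread reading; returns false when the buffer is empty. */
static bool ring_pop(ring_t *rb, sensor_reading_t *out) {
    if (rb->head == rb->tail) {
        return false;
    }
    *out = rb->slots[rb->tail % RING_CAPACITY];
    rb->tail++;
    return true;
}

/* Number of readings waiting to be consumed. */
static size_t ring_count(const ring_t *rb) {
    return (size_t)(rb->head - rb->tail);
}
```

Overwriting the oldest reading is one possible policy. It favors fresh data over completeness, which usually suits a control loop better than stalling the sensor.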
Reason
The reasoning stage is where an agent differs from plain inference. It runs the model, then applies a policy that incorporates state and trend:
agent_decision_t agent_reason(sensor_reading_t *r, int n) {
    float spectrum[128];
    compute_fft(r, spectrum, n);
    float score = run_anomaly_model(spectrum);  // int8 LiteRT model
    push_history(score);                        // keep last N scores
    float trend = compute_trend();              // rising / falling

    if (score > CRITICAL_THRESHOLD) return DECISION_CRITICAL;
    if (score > WARNING_THRESHOLD && trend > 0) return DECISION_WARNING;
    return DECISION_NORMAL;
}
The decision is not a bare threshold on a model output. It mixes the current score, the trend, and (in multi-sensor agents) cross-references between modalities.
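The snippet above calls `push_history` and `compute_trend` without defining them. One plausible way to fill them in, offered as a sketch rather than the original implementation, is a fixed-size score window with the trend taken as the least-squares slope over that window. `HISTORY_LEN` and the choice of a linear fit are assumptions.

```c
#include <stddef.h>

#define HISTORY_LEN 16  /* hypothetical window: the last N anomaly scores */

static float  history[HISTORY_LEN];
static size_t history_count = 0;  /* valid entries, saturates at HISTORY_LEN */
static size_t history_next  = 0;  /* slot the next score is written to */

/* Append a score, overwriting the oldest once the window is full. */
void push_history(float score) {
    history[history_next] = score;
    history_next = (history_next + 1) % HISTORY_LEN;
    if (history_count < HISTORY_LEN) {
        history_count++;
    }
}

/* Least-squares slope of score versus sample index, in score units per sample.
 * Positive means rising, negative means falling; 0 with fewer than 2 samples. */
float compute_trend(void) {
    if (history_count < 2) {
        return 0.0f;
    }
    size_t oldest = (history_next + HISTORY_LEN - history_count) % HISTORY_LEN;
    float n = (float)history_count;
    float sum_x = 0.0f, sum_y = 0.0f, sum_xy = 0.0f, sum_xx = 0.0f;
    for (size_t i = 0; i < history_count; i++) {
        float x = (float)i;
        float y = history[(oldest + i) % HISTORY_LEN];
        sum_x  += x;
        sum_y  += y;
        sum_xy += x * y;
        sum_xx += x * x;
    }
    float denom = n * sum_xx - sum_x * sum_x;  /* nonzero whenever n >= 2 */
    return (n * sum_xy - sum_x * sum_y) / denom;
}
```

A slope over the whole window is less sensitive to a single noisy score than comparing the last two values, at the cost of reacting a few samples later.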
Act
The action stage turns a decision into a physical effect and, optionally, changes the agent’s own behavior:
| Decision | Action | Hardware path |
|---|---|---|
| NORMAL | Periodic status | MQTT publish over Wi-Fi |
| WARNING | Alert, raise sampling rate | MQTT + timer reconfiguration |
| CRITICAL | Local alarm, notify, log | GPIO relay + MQTT + flash log |
Feedback closes the loop: raise the sampling rate when an anomaly is developing, drop back to a low duty cycle when conditions normalize. On a battery node like the STM32L4 running anomaly detection, adaptive sampling is the difference between weeks and months of runtime.
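The table and the feedback step can be sketched as a single planning function that maps a decision to a set of actions plus the next sampling period. The `ACT_*` flags, `action_plan_t`, `agent_plan`, and all three period values are hypothetical; in particular, the faster sampling period in the critical state is an assumption, since the table does not specify one.

```c
#include <stdint.h>

typedef enum { DECISION_NORMAL, DECISION_WARNING, DECISION_CRITICAL } agent_decision_t;

/* Bit flags naming the actions in the table; firmware maps each to a driver call. */
enum {
    ACT_PUBLISH_STATUS = 1u << 0,  /* periodic MQTT status */
    ACT_PUBLISH_ALERT  = 1u << 1,  /* MQTT alert or notification */
    ACT_LOCAL_ALARM    = 1u << 2,  /* GPIO relay */
    ACT_FLASH_LOG      = 1u << 3   /* persistent log entry */
};

typedef struct {
    uint32_t actions;           /* OR of ACT_* flags to execute this cycle */
    uint32_t sample_period_ms;  /* feedback: period for the next sensing cycles */
} action_plan_t;

/* Assumed periods: low duty cycle when healthy, faster while an anomaly develops. */
#define PERIOD_NORMAL_MS   1000u
#define PERIOD_WARNING_MS   100u
#define PERIOD_CRITICAL_MS   10u

action_plan_t agent_plan(agent_decision_t d) {
    action_plan_t plan = { ACT_PUBLISH_STATUS, PERIOD_NORMAL_MS };
    switch (d) {
    case DECISION_WARNING:
        plan.actions = ACT_PUBLISH_ALERT;
        plan.sample_period_ms = PERIOD_WARNING_MS;
        break;
    case DECISION_CRITICAL:
        plan.actions = ACT_LOCAL_ALARM | ACT_PUBLISH_ALERT | ACT_FLASH_LOG;
        plan.sample_period_ms = PERIOD_CRITICAL_MS;
        break;
    case DECISION_NORMAL:
    default:
        break;  /* keep the status publish and the slow period */
    }
    return plan;
}
```

Returning a plan instead of calling drivers directly keeps the policy testable on a host machine; the caller applies `sample_period_ms` to the sensing timer and executes each flagged action.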
Architecture: Tasks, Arenas, and a State Machine
A single agent on an MCU is usually three or four RTOS tasks coordinated through queues:
- Sensor task — fixed-rate acquisition into a ring buffer.
- Inference task — pulls a window, runs the model into a statically allocated tensor arena, emits a score.
- Policy task — runs the state machine, decides, dispatches actions.
- Comms task — handles MQTT/Modbus/UART without blocking the control path.
On the ESP32, FreeRTOS (bundled with ESP-IDF) provides the task and queue primitives directly. On STM32, the same structure runs on FreeRTOS, Zephyr, or a bare super-loop with a timer-driven scheduler. ST’s X-CUBE-AI converts and optimizes the model for the target; LiteRT for Microcontrollers and CMSIS-NN are the common runtimes.
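For the bare super-loop variant, a timer-driven cooperative scheduler can be quite small. The sketch below assumes a millisecond tick maintained by a hardware timer interrupt and passed in as `now_ms`; `task_t`, `scheduler_poll`, and the two counter tasks are hypothetical stand-ins for real acquisition and policy work.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn_t)(void);

typedef struct {
    task_fn_t run;
    uint32_t  period_ms;    /* how often the task should run */
    uint32_t  next_due_ms;  /* next release time */
} task_t;

/* Run every task whose release time has arrived. Called from the main loop.
 * A task that falls behind catches up by one period per call. */
void scheduler_poll(task_t *tasks, size_t n, uint32_t now_ms) {
    for (size_t i = 0; i < n; i++) {
        /* Signed difference keeps the comparison correct across tick wraparound. */
        if ((int32_t)(now_ms - tasks[i].next_due_ms) >= 0) {
            tasks[i].run();
            tasks[i].next_due_ms += tasks[i].period_ms;
        }
    }
}

/* Example tasks: counters stand in for acquisition and policy work. */
static uint32_t sensor_runs;
static uint32_t policy_runs;
static void sensor_task(void) { sensor_runs++; }
static void policy_task(void) { policy_runs++; }
```

With a table such as `{ { sensor_task, 10, 0 }, { policy_task, 100, 0 } }`, the main loop only needs to call `scheduler_poll` with the current tick. Tasks run to completion in this scheme, so each must stay shorter than the tightest period in the table.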
The state machine is the spine of the agent. A predictive-maintenance agent built on the ESP32 with Edge Impulse might look like:
MONITORING --(score > 0.7)-------------------------► ALERT
ALERT      --(score < 0.5 for 1 h)-----------------► MONITORING
ALERT      --(score > 0.9 OR rising trend > 2 h)---► CRITICAL
CRITICAL   --(ack via MQTT)------------------------► MONITORING
Each state changes both the sampling rate and the actions taken — the behavior adapts to what the device observes rather than reacting identically every cycle.
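The transition table above translates fairly directly into C. The following sketch reads the trend condition as "the trend has been rising continuously for more than two hours", which is an interpretation rather than something the table spells out. `agent_fsm_t`, `fsm_step`, and the timer fields are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STATE_MONITORING, STATE_ALERT, STATE_CRITICAL } agent_state_t;

typedef struct {
    agent_state_t state;
    bool     calm;             /* score currently below the 0.5 release level */
    uint32_t calm_since_ms;    /* when the calm period started */
    bool     rising;           /* trend currently positive */
    uint32_t rising_since_ms;  /* when the rising period started */
} agent_fsm_t;

#define HOUR_MS (60u * 60u * 1000u)

/* Advance the state machine by one policy cycle and return the new state. */
agent_state_t fsm_step(agent_fsm_t *f, float score, float trend,
                       bool ack, uint32_t now_ms) {
    switch (f->state) {
    case STATE_MONITORING:
        if (score > 0.7f) {
            f->state  = STATE_ALERT;
            f->calm   = false;
            f->rising = false;
        }
        break;
    case STATE_ALERT:
        /* Track how long the score has stayed below the release level. */
        if (score < 0.5f) {
            if (!f->calm) { f->calm = true; f->calm_since_ms = now_ms; }
        } else {
            f->calm = false;
        }
        /* Track how long the trend has been rising. */
        if (trend > 0.0f) {
            if (!f->rising) { f->rising = true; f->rising_since_ms = now_ms; }
        } else {
            f->rising = false;
        }
        if (score > 0.9f ||
            (f->rising && now_ms - f->rising_since_ms > 2u * HOUR_MS)) {
            f->state = STATE_CRITICAL;
        } else if (f->calm && now_ms - f->calm_since_ms >= HOUR_MS) {
            f->state = STATE_MONITORING;
        }
        break;
    case STATE_CRITICAL:
        if (ack) {
            f->state = STATE_MONITORING;  /* operator acknowledged, e.g. via MQTT */
        }
        break;
    }
    return f->state;
}
```

The gap between the 0.7 entry level and the 0.5 release level is hysteresis: it stops the agent from flapping between states when the score hovers near a single threshold.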
Constraints That Define the Design
Memory is the binding constraint. Each model, tensor arena, sensor buffer, and history ring consumes SRAM that does not grow. A single int8 model with its arena is roughly 30-100 KB. An ESP32-S3 (512 KB SRAM plus external PSRAM) can host two or three small models as separate tasks; an ESP32 anomaly-detection node leaves modest headroom for application logic. Budget memory first, then design the agent to fit.
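One way to make "budget memory first" enforceable is to check the budget at compile time. The sketch below allocates every buffer statically and uses a C11 `_Static_assert` to fail the build when the agent no longer fits. All sizes are hypothetical placeholders (the reserved figure for the RTOS, network stack, and task stacks especially), to be replaced with numbers from a real linker map.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical figures; replace with values measured on the target. */
#define SRAM_TOTAL_BYTES      (512u * 1024u)  /* ESP32-S3 internal SRAM, as cited above */
#define SRAM_RESERVED_BYTES   (200u * 1024u)  /* assumed: RTOS, network stack, task stacks */
#define MODEL_ARENA_BYTES     (60u * 1024u)   /* inside the 30-100 KB per-model range */
#define NUM_MODELS            2u
#define SENSOR_WINDOW_SAMPLES 1024u
#define HISTORY_LEN           64u

/* Static allocation: no heap, so the footprint is fixed at link time. */
uint8_t tensor_arena[NUM_MODELS][MODEL_ARENA_BYTES];
int16_t sensor_window[SENSOR_WINDOW_SAMPLES];
float   score_history[HISTORY_LEN];

#define AGENT_BUDGET_BYTES \
    (sizeof(tensor_arena) + sizeof(sensor_window) + sizeof(score_history))

/* Fail the build, not the device, when the agent outgrows its budget. */
_Static_assert(AGENT_BUDGET_BYTES + SRAM_RESERVED_BYTES <= SRAM_TOTAL_BYTES,
               "agent buffers exceed the SRAM budget");

size_t agent_budget_bytes(void) {
    return AGENT_BUDGET_BYTES;
}

size_t sram_headroom_bytes(void) {
    return SRAM_TOTAL_BYTES - SRAM_RESERVED_BYTES - AGENT_BUDGET_BYTES;
}
```

With these placeholder numbers the agent takes 125,184 bytes and leaves 194,304 bytes of headroom; adding a third 60 KB model would still fit, while a fifth would trip the assertion.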
Compute is fixed and modest. A 240 MHz Xtensa or a 168 MHz Cortex-M4 runs a small model in single-digit to low-tens of milliseconds. That is fine for control loops at tens to hundreds of Hz. It is not enough for large vision transformers — those belong on a gateway or in the cloud.
Determinism matters. The reason to put the loop on the device is bounded, repeatable latency. Network calls reintroduce jitter. If an agent must call out, the call belongs off the hard real-time path (for example, escalation messaging), never inside the control decision.
On-device learning is limited. Full retraining needs backpropagation and optimizer state that exceed MCU memory. What is practical on-device: threshold adaptation from confirmed feedback, slow baseline-drift compensation, and incremental statistics for non-neural models. Real retraining happens off-device and ships as a firmware update.
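Two of the practical options above fit in a few lines of C. The sketch below shows incremental statistics via Welford's online algorithm, and a baseline that drifts slowly through an exponential moving average updated only on scores confirmed as normal. The type and function names, and the idea of expressing the threshold as baseline plus a fixed margin, are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Incremental mean and variance (Welford): O(1) memory, no stored samples. */
typedef struct {
    uint32_t n;
    float    mean;
    float    m2;  /* running sum of squared deviations from the mean */
} running_stats_t;

void stats_update(running_stats_t *s, float x) {
    s->n++;
    float delta = x - s->mean;
    s->mean += delta / (float)s->n;
    s->m2   += delta * (x - s->mean);
}

/* Sample variance; 0 until at least two samples have been seen. */
float stats_variance(const running_stats_t *s) {
    return (s->n > 1) ? s->m2 / (float)(s->n - 1) : 0.0f;
}

/* Threshold that follows slow baseline drift. */
typedef struct {
    float baseline;  /* tracked "normal" score level */
    float margin;    /* distance above baseline that counts as anomalous */
    float alpha;     /* EMA weight per update; small values track slowly */
} adaptive_threshold_t;

/* Fold a score into the baseline only when it was confirmed normal, so a
 * developing fault cannot drag the baseline upward with it. */
void threshold_update(adaptive_threshold_t *t, float score, bool confirmed_normal) {
    if (confirmed_normal) {
        t->baseline += t->alpha * (score - t->baseline);
    }
}

bool threshold_exceeded(const adaptive_threshold_t *t, float score) {
    return score > t->baseline + t->margin;
}
```

Neither structure grows with the number of samples, which is what makes this kind of adaptation feasible where backpropagation is not.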
When to Use an Edge AI Agent on an MCU
Reach for an on-device agent when most of these hold:
- The control decision needs deterministic sub-100 ms latency — vibration response, motor adjustment, safety interlock.
- The device must keep working offline — remote, industrial, or intermittently connected sites.
- Raw data cannot leave the device for privacy or bandwidth reasons. A 3.2 kHz accelerometer produces tens of megabytes per hour; on-device reasoning reduces that to kilobytes of events.
- You are deploying many low-cost nodes where per-inference cloud cost would dominate.
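The bandwidth figure in the list above can be checked with simple arithmetic. The sketch below assumes a 3-axis accelerometer with 16-bit samples, and an event rate of one 64-byte message per minute; those parameters and the function names are illustrative assumptions, not figures from a specific device.

```c
#include <stdint.h>

/* Raw bytes per hour streamed by a multi-axis sensor. */
uint64_t raw_bytes_per_hour(uint32_t sample_rate_hz, uint32_t axes,
                            uint32_t bytes_per_sample) {
    return (uint64_t)sample_rate_hz * axes * bytes_per_sample * 3600u;
}

/* Bytes per hour when the device reports only events. */
uint64_t event_bytes_per_hour(uint32_t events_per_hour, uint32_t bytes_per_event) {
    return (uint64_t)events_per_hour * bytes_per_event;
}
```

Under those assumptions, `raw_bytes_per_hour(3200, 3, 2)` gives 69,120,000 bytes (about 69 MB) per hour, while `event_bytes_per_hour(60, 64)` gives 3,840 bytes, a reduction of roughly four orders of magnitude.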
Do not force it on-device when the model exceeds MCU memory, the reasoning genuinely needs a language model, or you retrain weekly. Those cases want a gateway or cloud tier — often in a hybrid split where the MCU runs the fast local loop and a server handles heavier reasoning. The trade-offs are laid out in detail in edge agents vs cloud agents and edge AI vs cloud AI.
How ForestHub Fits
The patterns above are typically hand-written C firmware: custom RTOS tasks, hand-rolled state machines, manual MQTT plumbing. That is the work that stalls most teams between a working inference demo and a deployable agent.
ForestHub is an orchestration platform for edge AI agents that targets exactly that gap. It runs on your Linux edge gateway, above the MCUs. You design the sense-reason-act loop as a graph: sensor and inference results arrive over MQTT, Modbus, and OPC-UA; decision and state nodes process them, with an LLM as one reasoning node among many; escalation and actuation go back out over the same protocols. ForestHub orchestrates that graph deterministically and keeps it inspectable, replayable, and auditable. It ingests device results, holds state across the fleet, adds reasoning where a step benefits from a language model rather than an on-device pattern matcher, and acts over industrial protocols. The device owns on-device sensing and actuation; the Linux-gateway platform owns the orchestration that turns models into agents.
If you are moving from a single inference loop toward a coordinated agent, the companion guide build an AI agent for embedded systems walks through the architecture in more depth, and how to build agentic edge AI gives the step-by-step build path.
Frequently Asked Questions
- What is an edge AI agent on a microcontroller?
- It is firmware that runs an autonomous sense-reason-act loop on the device. It reads sensors, decides using on-device ML inference plus rule-based logic and state, and drives actuators (GPIO, relays, MQTT messages) — all without a network round trip. The reasoning is small quantized models and a state machine, not a large language model, because an ESP32 has roughly 520 KB of SRAM and an STM32F4 has tens to hundreds of kilobytes.
- Can a microcontroller really run an AI agent?
- Yes, for the right definition of agent. An MCU cannot run a foundation model, but it can run the autonomous loop that an agent describes: sensing, a decision policy, actuation, and feedback. Tools like LiteRT for Microcontrollers (formerly TensorFlow Lite Micro) and CMSIS-NN run quantized neural networks in kilobytes of RAM. The agency comes from the loop and the decision logic, not from model size.
- How much memory does an edge AI agent need on an ESP32 or STM32?
- It depends on the models. A single anomaly-detection or keyword-spotting model with its tensor arena typically needs 30-100 KB of SRAM. On an ESP32-S3 (512 KB SRAM plus PSRAM) you can run two or three small models as separate RTOS tasks. On an STM32F4 with less SRAM, plan one model plus the decision and sensor pipeline. Memory is usually the binding constraint, so it must be budgeted upfront.
- When should I use an edge AI agent instead of cloud inference?
- Use an on-device agent when you need deterministic low latency (sub-100 ms control loops), offline operation, or data that cannot leave the device. Use cloud or a gateway when the model exceeds MCU memory, the reasoning needs a language model, or you retrain frequently. Many production systems are hybrid: the MCU runs the fast local loop, a gateway or cloud handles heavier reasoning.
- What is the difference between an edge AI agent and a simple inference loop?
- A simple inference loop reads a sensor, runs one model, and reports a result. An agent adds a decision policy (thresholds, a state machine), maintains state across cycles, takes physical actions, and feeds action outcomes back into sensing. It is an architecture, not a single function call.
Related Hardware Guides
ESP32 Predictive Maintenance with Edge Impulse
Deploy vibration-based predictive maintenance on ESP32 with Edge Impulse. Sensor setup, model training, and continuous monitoring guide.
ESP32-S3 Object Detection with TFLite Micro
Run object detection on ESP32-S3 with TFLite Micro. Hardware specs, compatibility analysis, getting started guide, and alternatives.
STM32F4 Anomaly Detection with TFLite Micro
Run anomaly detection on STM32F4 with TFLite Micro. Autoencoder-based monitoring on the industry-standard Cortex-M4 platform.
STM32L4 Anomaly Detection with TFLite Micro
Deploy ultra-low-power anomaly detection on STM32L4 with TFLite Micro. Battery-operated monitoring with shutdown current under 100 nA.
ESP32 Anomaly Detection with TFLite Micro
Run anomaly detection on ESP32 with TFLite Micro. Autoencoder setup, sensor integration, and real-time monitoring for industrial applications.
Explore More
Orchestrate Edge AI Agents Across Your Fleet
ForestHub is the edge AI agents orchestration platform. It runs on your Linux edge gateway, ingests device results over MQTT, Modbus, and OPC-UA, and orchestrates the sense-reason-act loop across the fleet as a deterministic, auditable graph.