Guide

How to Build Agentic Edge AI

To build agentic edge AI, implement a sense-reason-act loop on the device: wire up sensors and a fixed-rate acquisition task, run a quantized on-device model for inference, add a decision-and-state layer (thresholds plus a state machine), drive actuators, then add orchestration for state, networking, and escalation. Validate the model offline first, budget SRAM before adding models, and keep network calls off the real-time decision path.

Published 2026-06-06

This is the practical build path. For the definition of the term and how it maps to industrial deployments, see the pillar article on edge agents; for the architecture in depth, see the guide on how to build an AI agent for embedded systems.

What You Are Building

Agentic edge AI is a sense-reason-act loop running on the device. Most embedded-ML tutorials stop at “model runs, prints a result.” An agent continues: it decides, acts on the physical world, holds state across cycles, and feeds outcomes back into sensing. The steps below take you from a validated model to a deployable agent on an ESP32 or STM32, and then to a fleet.

Before you start, get the boundaries right:

The MCU does the fast, deterministic, local work. A language model does not run on it.
Memory is the binding constraint. Budget it before adding models.
Keep network calls off the decision path; escalation is asynchronous.

Step 1 — Define the Loop and Pick the Hardware

Write down the four corners of the loop before any code: what it senses, what decision it must make, what action it takes, and what state it must remember. That single paragraph determines the hardware.

Vibration/predictive maintenance maps well to an ESP32 with Edge Impulse or, for heavier DSP, an STM32H7.
Vision (object detection, people counting) needs PSRAM and a camera; an ESP32-S3 running object detection is the typical fit.
Motion/gesture on a low-power node fits the Arduino Nano 33 BLE.
Anomaly detection on a constrained MCU runs on an STM32F4.

Pick the cheapest part whose SRAM and clock comfortably hold your model plus the loop. Use the MCU compatibility checker to sanity-check the fit.
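
One way to make "comfortably hold" checkable is a compile-time budget. The sketch below is only an illustration: every size in it is a placeholder assumption (not a datasheet figure and not a number from this guide), and the macro and function names are hypothetical. Substitute the arena size and buffer sizes you actually measure.

```c
#include <assert.h>
#include <stddef.h>

/* All sizes below are placeholder assumptions, not measured values. */
#define SRAM_TOTAL_BYTES   (320u * 1024u)  /* assumed usable SRAM on the part */
#define ARENA_BYTES        (60u * 1024u)   /* tensor arena for the model      */
#define RING_BYTES         (16u * 1024u)   /* sensor ring buffer              */
#define TASK_STACKS_BYTES  (24u * 1024u)   /* sum of all task stacks          */
#define NET_BUFFERS_BYTES  (48u * 1024u)   /* network and MQTT buffers        */
#define HEADROOM_PERCENT   20u             /* share of SRAM to keep free      */

#define SRAM_USED_BYTES \
    (ARENA_BYTES + RING_BYTES + TASK_STACKS_BYTES + NET_BUFFERS_BYTES)

/* Largest footprint allowed once the headroom is reserved. */
#define SRAM_LIMIT_BYTES \
    (SRAM_TOTAL_BYTES / 100u * (100u - HEADROOM_PERCENT))

/* The build fails as soon as the planned allocations no longer fit. */
_Static_assert(SRAM_USED_BYTES <= SRAM_LIMIT_BYTES,
               "SRAM budget exceeded: shrink the model or pick a larger part");

/* Bytes still free under the limit, handy for a boot-time log line. */
static size_t sram_budget_remaining(void) {
    return (size_t)(SRAM_LIMIT_BYTES - SRAM_USED_BYTES);
}
```

Keeping the budget in one header means that adding a second model or a larger buffer breaks the build instead of failing on the device.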

Step 2 — Train and Validate the Model Offline

Build the model before the firmware. Collect representative data, train (Edge Impulse or TensorFlow), and quantize to int8 for the MCU. Validate accuracy on a held-out set on a PC first — debugging a bad model inside firmware is far harder than fixing it in a notebook.
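
For orientation, int8 quantization maps each float to an 8-bit integer through a scale and a zero point, following the affine relation real = scale * (q - zero_point). The sketch below illustrates that mapping only; the helper names are hypothetical, and the scale and zero point for a real model come from the converted model file, not from this example.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize one value: q = round(x / scale) + zero_point, saturated to int8. */
static int8_t quantize_int8(float x, float scale, int32_t zero_point) {
    assert(scale > 0.0f);                  /* a valid scale is strictly positive */
    float v = x / scale;
    if (v > 512.0f)  v = 512.0f;           /* bound before the integer cast      */
    if (v < -512.0f) v = -512.0f;
    /* Round to nearest (ties away from zero) without needing libm. */
    int32_t q = (int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f) + zero_point;
    if (q < -128) q = -128;                /* saturate to the int8 range         */
    if (q > 127)  q = 127;
    return (int8_t)q;
}

/* Recover the approximate real value from a quantized one. */
static float dequantize_int8(int8_t q, float scale, int32_t zero_point) {
    return scale * (float)((int32_t)q - zero_point);
}
```

The saturation branches are where accuracy is usually lost, which is one reason to validate the quantized model on the held-out set rather than only the float one.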

Convert to the target runtime: LiteRT for Microcontrollers (formerly TensorFlow Lite Micro) or CMSIS-NN, with ST X-CUBE-AI for STM32. Record the model’s RAM footprint (arena size) and per-inference latency now — both feed the next steps.

Step 3 — Build the Sensing Task

Implement acquisition as a fixed-rate task that writes calibrated readings into a ring buffer. Do not couple sampling to inference timing — they have different rates.

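#include "freertos/FreeRTOS.h"   // ESP-IDF include paths; plain FreeRTOS uses "FreeRTOS.h"
#include "freertos/task.h"

void sensor_task(void *arg) {
    (void)arg;                                     // task parameter is unused
    const TickType_t period = pdMS_TO_TICKS(10);   // 100 Hz
    TickType_t last = xTaskGetTickCount();
    for (;;) {
        sensor_reading_t s = read_calibrated_sensors();
        ring_push(&g_ring, &s);
        vTaskDelayUntil(&last, period);            // fixed rate, independent of how long the read took
    }
}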

For multi-sensor agents, timestamp every reading so the reasoning step operates on aligned data.
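
The snippet above calls `ring_push` without showing the buffer. A minimal sketch of one possible implementation follows, assuming a single scalar reading with a millisecond timestamp; the field layout, the capacity, and the `ring_latest` helper are assumptions made for illustration, not the guide's actual types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 256u   /* assumed depth; a power of two so a mask can wrap */
_Static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
               "RING_CAPACITY must be a power of two");

typedef struct {
    uint32_t timestamp_ms;   /* when the sample was taken */
    float    value;          /* calibrated reading        */
} sensor_reading_t;

typedef struct {
    sensor_reading_t buf[RING_CAPACITY];
    uint32_t next;           /* index of the next write slot                 */
    uint32_t count;          /* valid readings, saturates at RING_CAPACITY   */
} ring_t;

/* Store one reading, overwriting the oldest once the buffer is full. */
static void ring_push(ring_t *r, const sensor_reading_t *s) {
    r->buf[r->next] = *s;
    r->next = (r->next + 1u) & (RING_CAPACITY - 1u);
    if (r->count < RING_CAPACITY) r->count++;
}

/* Copy the newest n readings into out, oldest first.
   Returns 1 on success, 0 if fewer than n readings are stored. */
static int ring_latest(const ring_t *r, sensor_reading_t *out, size_t n) {
    if (n > r->count) return 0;
    uint32_t start = (r->next + RING_CAPACITY - (uint32_t)n) & (RING_CAPACITY - 1u);
    for (size_t i = 0; i < n; i++)
        out[i] = r->buf[(start + (uint32_t)i) & (RING_CAPACITY - 1u)];
    return 1;
}
```

This version is not thread-safe on its own. If the inference task reads while the sensor task writes, guard both calls with a critical section or a mutex, or copy the window while the writer is known to be idle.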

Step 4 — Run Inference On-Device

Statically allocate the tensor arena (never allocate it in the loop) and run the model on a window pulled from the ring buffer:

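#include <cmath>     // NAN
#include <cstdint>   // uint8_t

// Sized in Step 2. The runtime expects the arena to be 16-byte aligned.
alignas(16) static uint8_t tensor_arena[ARENA_SIZE];

float run_inference(const float *window, int n) {
    fill_input_tensor(window, n);                        // copy the window into the model input
    if (interpreter->Invoke() != kTfLiteOk) return NAN;  // caller must check with isnan()
    return read_output_score();
}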

Confirm the on-device latency matches your Step 2 estimate. If it is too slow, shrink the model or move to a faster part — do not paper over it with a longer loop period that breaks the control requirement.
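
A small monitor makes that confirmation continuous instead of a one-off bench test. The sketch below assumes you can read a free-running 32-bit microsecond counter before and after `Invoke()`; the struct and function names are hypothetical, and the budget value in the usage is an assumption.

```c
#include <stdint.h>

typedef struct {
    uint32_t budget_us;   /* allowed time per inference           */
    uint32_t worst_us;    /* longest inference seen so far        */
    uint32_t overruns;    /* inferences that exceeded the budget  */
    uint32_t samples;     /* inferences measured                  */
} latency_monitor_t;

/* Record one inference from its start and end timestamps.
   Unsigned subtraction stays correct across one counter wraparound. */
static uint32_t latency_record(latency_monitor_t *m,
                               uint32_t start_us, uint32_t end_us) {
    uint32_t elapsed = end_us - start_us;
    if (elapsed > m->worst_us)  m->worst_us = elapsed;
    if (elapsed > m->budget_us) m->overruns++;
    m->samples++;
    return elapsed;
}
```

On ESP-IDF, `esp_timer_get_time()` is one possible timestamp source (it returns microseconds since boot as a 64-bit value, so truncate it to 32 bits for this sketch). Reporting `worst_us` and `overruns` with the periodic status message shows whether the Step 2 estimate still holds in the field.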

Step 5 — Add Decision and State Logic

This is the step that turns inference into agency. Combine the model output with state (history) and a state machine, not a bare threshold:

agent_decision_t decide(float score) {
    push_history(score);
    float trend = compute_trend();

    switch (g_state) {
    case MONITORING:
        if (score > 0.7f) g_state = ALERT;
        break;
    case ALERT:
        if (score > 0.9f || trend > RISING_2H) g_state = CRITICAL;
        else if (score < 0.5f) g_state = MONITORING;
        break;
    case CRITICAL:
        if (ack_received()) g_state = MONITORING;
        break;
    }
    return decision_for_state(g_state);
}

The agent now escalates on trend, remembers what it has seen, and behaves differently in each state.
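
The snippet above relies on `push_history` and `compute_trend` without defining them. One plausible implementation, shown here as a sketch, keeps a short window of scores and returns the least-squares slope in score units per sample. The window length is an assumption, and how this slope relates to the `RISING_2H` threshold depends on your inference interval, which the guide does not specify.

```c
#include <stddef.h>

#define HISTORY_LEN 8u            /* assumed window length */

static float  g_history[HISTORY_LEN];
static size_t g_history_count;    /* valid entries, at most HISTORY_LEN */

/* Append a score, discarding the oldest once the window is full. */
static void push_history(float score) {
    if (g_history_count < HISTORY_LEN) {
        g_history[g_history_count++] = score;
        return;
    }
    for (size_t i = 1; i < HISTORY_LEN; i++) g_history[i - 1] = g_history[i];
    g_history[HISTORY_LEN - 1] = score;
}

/* Least-squares slope of score versus sample index.
   Returns 0 until at least two scores are available. */
static float compute_trend(void) {
    size_t n = g_history_count;
    if (n < 2) return 0.0f;
    float mean_x = (float)(n - 1) / 2.0f;
    float mean_y = 0.0f;
    for (size_t i = 0; i < n; i++) mean_y += g_history[i];
    mean_y /= (float)n;
    float num = 0.0f, den = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float dx = (float)i - mean_x;
        num += dx * (g_history[i] - mean_y);
        den += dx * dx;
    }
    return num / den;             /* den > 0 because n >= 2 */
}
```

A fitted slope is less sensitive to a single noisy score than the difference between the last two values, which matters when the trend itself can trigger the CRITICAL state.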

Step 6 — Drive Actuation

Map each decision to a physical action, and let state change the agent’s own behavior (adaptive sampling, model cascade):

State	Sample rate	Action
MONITORING	100 Hz	Periodic status via MQTT
ALERT	500 Hz	Alert via MQTT, warning LED
CRITICAL	1 kHz	Alarm relay, continuous MQTT, raw-data log

Keep actuation deterministic and local. Anything that needs the network (an alert, a log upload) goes through the comms task so it never blocks the decision.
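
The table above translates naturally into a constant lookup table, which keeps actuation deterministic and easy to review. This is a sketch under assumptions: the struct fields, the `policy_for` and `sample_period_us` names, and the choice to treat an invalid state as CRITICAL are all illustrative, and MQTT publishing is left to the comms task.

```c
#include <stdint.h>

typedef enum { MONITORING = 0, ALERT, CRITICAL, STATE_COUNT } agent_state_t;

typedef struct {
    uint32_t sample_hz;      /* acquisition rate in this state */
    uint8_t  warning_led;    /* 1 = warning LED on             */
    uint8_t  alarm_relay;    /* 1 = alarm relay energized      */
    uint8_t  log_raw_data;   /* 1 = keep raw samples           */
} state_policy_t;

/* One row per state, mirroring the table in this step. */
static const state_policy_t k_policy[STATE_COUNT] = {
    [MONITORING] = { 100u,  0, 0, 0 },
    [ALERT]      = { 500u,  1, 0, 0 },
    [CRITICAL]   = { 1000u, 0, 1, 1 },
};

/* Look up the policy; an out-of-range state falls back to CRITICAL
   (an assumed fail-safe choice, adjust to your application). */
static const state_policy_t *policy_for(agent_state_t s) {
    if ((unsigned)s >= (unsigned)STATE_COUNT) s = CRITICAL;
    return &k_policy[s];
}

/* Sampling period in microseconds for the given state. */
static uint32_t sample_period_us(agent_state_t s) {
    return 1000000u / policy_for(s)->sample_hz;
}
```

The sensing task can call `sample_period_us` each cycle to adapt its rate, so the state machine never has to touch timer configuration directly.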

Step 7 — Orchestrate State, Networking, and Escalation

A single device is an agent; a deployment is a system. The orchestration layer handles state persistence across reboots, network reconnection and buffering, escalation to a gateway or cloud reasoning step, and coordination between devices. This is where hand-rolled firmware accumulates the most accidental complexity.
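
State persistence across reboots is one of the pieces that is easy to get subtly wrong, because a power cut can leave a half-written record. A common defence, sketched here, is to seal the record with a magic value and a CRC-32 and to reject anything that fails the check on boot. The record layout and function names are hypothetical; where the bytes are stored (for example NVS on ESP-IDF, or a flash page) is outside this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define STATE_MAGIC 0x41474E54u   /* arbitrary tag, "AGNT" in ASCII */

/* Four 32-bit fields, so the struct has no padding bytes. */
typedef struct {
    uint32_t magic;
    uint32_t state;        /* agent state at the time of saving */
    uint32_t boot_count;
    uint32_t crc;          /* CRC-32 of all fields before it    */
} persisted_state_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but table-free. */
static uint32_t crc32_bytes(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stamp the record before writing it to non-volatile storage. */
static void state_seal(persisted_state_t *s) {
    s->magic = STATE_MAGIC;
    s->crc = crc32_bytes(s, offsetof(persisted_state_t, crc));
}

/* Check a record read back after reboot; fall back to defaults if 0. */
static int state_is_valid(const persisted_state_t *s) {
    return s->magic == STATE_MAGIC &&
           s->crc == crc32_bytes(s, offsetof(persisted_state_t, crc));
}
```

If validation fails, starting in the most conservative state is usually safer than trusting a corrupted record.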

For decisions that genuinely need a larger model or an LLM, escalate off the real-time path: the edge agent makes the fast local decision immediately, sends the harder question to a server asynchronously, and acts on the response when it arrives. The trade-offs of where each step runs are covered in edge agents vs cloud agents.
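
Asynchronous escalation can be sketched as a bounded outbox: the decision path posts a request and returns immediately, and the comms task drains the outbox whenever the network is up. The types, the depth, and the drop-oldest policy below are assumptions chosen for illustration; some deployments would rather drop the newest request or persist the backlog.

```c
#include <stddef.h>
#include <stdint.h>

#define OUTBOX_SLOTS 4u    /* assumed depth */

typedef struct {
    uint32_t request_id;
    float    score;        /* model score that triggered the escalation */
} escalation_t;

typedef struct {
    escalation_t slots[OUTBOX_SLOTS];
    size_t   head;         /* index of the oldest pending request        */
    size_t   count;        /* number of pending requests                 */
    uint32_t dropped;      /* requests discarded because it was full     */
} outbox_t;

/* Called from the decision path. Never blocks: when the outbox is
   full, the oldest pending request is discarded to make room. */
static void outbox_post(outbox_t *o, escalation_t e) {
    if (o->count == OUTBOX_SLOTS) {
        o->head = (o->head + 1u) % OUTBOX_SLOTS;
        o->count--;
        o->dropped++;
    }
    o->slots[(o->head + o->count) % OUTBOX_SLOTS] = e;
    o->count++;
}

/* Called from the comms task. Returns 1 and fills *out, or 0 if empty. */
static int outbox_take(outbox_t *o, escalation_t *out) {
    if (o->count == 0) return 0;
    *out = o->slots[o->head];
    o->head = (o->head + 1u) % OUTBOX_SLOTS;
    o->count--;
    return 1;
}
```

Under FreeRTOS, a queue written with `xQueueSend` and a zero timeout plays the same role and adds the task synchronization this plain-C sketch leaves out.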

Step 8 — Deploy and Update the Fleet

Ship with OTA so you can push model and logic updates without touching hardware. Version models and decision logic independently, validate each update on a staging device, and roll out gradually. Centralize device health and event logs so you can see what every agent is deciding — local-only logs do not scale past a handful of devices.
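
Independent versioning and gradual rollout can both be decided on the device from a small manifest. The sketch below is one possible shape, not a description of any particular OTA service: the manifest fields and function names are hypothetical, and the bucket is derived from the device ID with an FNV-1a hash so that each device lands in the same rollout slice every time.

```c
#include <stdint.h>

typedef struct {
    uint16_t model_version;        /* version of the quantized model        */
    uint16_t logic_version;        /* version of thresholds and state logic */
    uint16_t min_logic_for_model;  /* oldest logic this model works with    */
    uint8_t  rollout_percent;      /* share of the fleet to update, 0..100  */
} update_manifest_t;

/* Deterministic bucket in [0, 99] from a device ID (32-bit FNV-1a). */
static uint8_t rollout_bucket(uint32_t device_id) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < 4; i++) {
        h ^= (device_id >> (8 * i)) & 0xFFu;
        h *= 16777619u;
    }
    return (uint8_t)(h % 100u);
}

/* Apply only if something is newer, the model/logic pair is compatible,
   and this device falls inside the current rollout slice. */
static int should_apply_update(const update_manifest_t *m, uint32_t device_id,
                               uint16_t current_model, uint16_t current_logic) {
    int is_newer   = m->model_version > current_model ||
                     m->logic_version > current_logic;
    int compatible = m->logic_version >= m->min_logic_for_model;
    int in_rollout = rollout_bucket(device_id) < m->rollout_percent;
    return is_newer && compatible && in_rollout;
}
```

Raising `rollout_percent` only ever adds devices to the slice, so a rollout widened from 10 to 50 percent keeps every device that already updated.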

Tools Landscape

Layer	Common options
Model training/conversion	Edge Impulse, TensorFlow + int8 quantization
On-device runtime	LiteRT for Microcontrollers, CMSIS-NN
Vendor optimization	ST X-CUBE-AI (STM32), ESP-DL (ESP32)
Scheduling	FreeRTOS (ESP-IDF), Zephyr, bare super-loop
Connectivity	MQTT (Wi-Fi), Modbus RTU, UART, CAN
Orchestration	ForestHub (edge AI agents orchestration platform)

How ForestHub Fits

Steps 5 through 8 — decision/state logic, orchestration, escalation, and fleet deployment — are where most of the custom firmware lives, and where projects stall after the model works.

ForestHub is the edge AI agents orchestration platform that targets that layer. It runs on your Linux edge gateway, above the devices, and you wire the loop as a graph: sensor and inference results arrive over MQTT, Modbus, and OPC-UA; decision and state nodes process them, with an LLM as one reasoning node among many; escalation and actuation go back out over the same industrial protocols. ForestHub orchestrates that graph deterministically and keeps it inspectable, replayable, and auditable, covering state handling, escalation to cloud or on-premise reasoning, and versioned management across the fleet. The device keeps owning sensing and actuation; the platform owns the orchestration that turns a working model into a deployable, manageable agent. For the conceptual grounding, start with edge AI agents on microcontrollers.

Frequently Asked Questions

What is agentic edge AI?

Agentic edge AI is an on-device system that does more than run a single inference. It closes a loop: it senses, reasons (model plus decision logic and state), acts on the physical world, and feeds outcomes back into the next cycle, autonomously and without a cloud round trip. On microcontrollers the reasoning is small quantized models plus a state machine, not a language model.

What tools do I need to build agentic edge AI?

A model runtime such as LiteRT for Microcontrollers (formerly TensorFlow Lite Micro) or CMSIS-NN; a training and conversion path such as Edge Impulse or TensorFlow plus int8 quantization; an RTOS or scheduler such as FreeRTOS (bundled with ESP-IDF) or Zephyr; and vendor tooling such as ST X-CUBE-AI for STM32. For multi-device orchestration, an edge AI agents orchestration platform like ForestHub handles state, networking, and escalation.

How do I keep an edge AI agent within microcontroller memory limits?

Budget SRAM before writing code. A single int8 model with its tensor arena needs roughly 30-100 KB. Statically allocate the tensor arena, avoid dynamic allocation in the loop, prefer one well-chosen model over several, and use a cascade (a small screening model that only loads a larger model when triggered) when you need more capability than RAM allows at once. An ESP32-S3 with PSRAM gives the most headroom.

Should the reasoning step run on-device or in the cloud?

Run it on-device when the decision needs deterministic low latency, offline operation, or data locality. Move it to a gateway or cloud when the model is too large for the MCU, when the reasoning needs a language model, or when you change behavior often. A common production pattern is hybrid: a fast deterministic loop on-device, heavier reasoning escalated to a server off the real-time path.

How do I deploy and update agentic edge AI on a fleet?

Ship the firmware with an OTA update mechanism so you can push model and logic updates without physical access. Version models and decision logic separately, validate each update on a staging device, and roll out gradually. Centralize device health and event logging so you can see what each agent is deciding. An orchestration platform manages versioned deploys and per-device state across the fleet.

Related Hardware Guides

ESP32 Predictive Maintenance with Edge Impulse

Deploy vibration-based predictive maintenance on ESP32 with Edge Impulse. Sensor setup, model training, and continuous monitoring guide.

ESP32-S3 Object Detection with TFLite Micro

Run object detection on ESP32-S3 with TFLite Micro. Hardware specs, compatibility analysis, getting started guide, and alternatives.

STM32F4 Anomaly Detection with TFLite Micro

Run anomaly detection on STM32F4 with TFLite Micro. Autoencoder-based monitoring on the industry-standard Cortex-M4 platform.

Arduino Nano 33 BLE Gesture Recognition TFLite

Run gesture recognition on Arduino Nano 33 BLE with TFLite Micro. Built-in IMU, Arduino IDE, and the official TFLite gesture tutorial.

STM32H7 Predictive Maintenance with Edge Impulse

Deploy predictive maintenance on STM32H7 with Edge Impulse. High-frequency vibration analysis with 1 MB SRAM and 480 MHz Cortex-M7.
