How to Deploy AI Models to Microcontrollers
To deploy AI models to microcontrollers, train a model in TensorFlow or Edge Impulse, quantize it to int8, convert it to TFLite format, and flash the resulting C array alongside the TFLite Micro runtime to your target MCU.
Published 2026-04-01
What You Need Before Starting
Deploying AI to an MCU requires three things: a trained model, a conversion pipeline, and a firmware project for your target hardware.
On the training side, you need a model that is small enough to fit in your MCU’s memory. For most microcontrollers, that means models under 500 KB. You will train on a desktop machine or cloud service — not on the MCU itself.
On the hardware side, you need:
- A supported MCU (ESP32, STM32, Arduino Nano 33 BLE, or a similar Arm Cortex-M, Xtensa, or RISC-V chip)
- The vendor’s toolchain installed (ESP-IDF for Espressif, STM32CubeIDE for ST, Arduino IDE for Arduino boards)
- A USB cable and basic embedded C experience
Step 1: Train or Select a Model
Start with a pre-trained model or train your own. For first deployments, use one of these proven starting points:
- Image classification: MobileNet V2 (quantized) — fits on ESP32-S3 with PSRAM
- Keyword spotting: The TFLite Micro speech example — runs on virtually any supported MCU
- Anomaly detection: A simple autoencoder or statistical model — under 20 KB on STM32F4
If you use Edge Impulse, the training and conversion happen in one pipeline. Upload your dataset, select a learning block, and Edge Impulse handles quantization and export.
If you use TensorFlow, train normally in Python, then convert to TFLite:
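import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Enable the default post-training optimizations.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Request int8; the converter may then require the Step 2 calibration
# settings to be in place before convert() is called.
converter.target_spec.supported_types = [tf.int8]
tflite_model = converter.convert()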
Step 2: Quantize to int8
Quantization converts 32-bit floating-point weights to 8-bit integers. For most MCUs this is not optional: it shrinks the model roughly 4x and speeds up inference significantly, especially on chips without an FPU.
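As a rough illustration of the 4x figure, the sketch below estimates weight storage from a parameter count. The function name and the 120,000-parameter example are made up for illustration, and the estimate ignores FlatBuffer metadata and per-tensor quantization parameters, so real files come out somewhat larger.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Approximate storage for the weights alone (no file overhead)."""
    return num_params * bits_per_weight // 8


# Hypothetical model with 120,000 trainable parameters.
params = 120_000
float32_size = weight_bytes(params, 32)
int8_size = weight_bytes(params, 8)

print(f"float32: {float32_size / 1024:.1f} KB")
print(f"int8:    {int8_size / 1024:.1f} KB")
print(f"ratio:   {float32_size / int8_size:.0f}x")
```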
Full integer quantization (int8 weights and activations) is the standard for MCU deployment. You need a representative dataset — a small sample of real inputs — to calibrate the activation ranges:
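import numpy as np

def representative_dataset():
    # Yield calibration samples one at a time.
    for sample in calibration_data:
        yield [sample.astype(np.float32)]

converter.representative_dataset = representative_dataset
# Restrict the converter to int8 kernels and make the I/O tensors int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Convert and save; Step 3 reads this file.
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)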
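To make the calibration step more concrete, here is a minimal sketch of the affine mapping that int8 quantization is based on: a real value x is stored as q = round(x / scale) + zero_point, clamped to [-128, 127]. The helper names and the sample range are hypothetical. The actual converter chooses ranges per tensor (and per channel for many weights), so treat this as an illustration of the arithmetic rather than the converter's implementation.

```python
def quant_params(x_min: float, x_max: float):
    """Derive (scale, zero_point) for int8 from an observed value range."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0  # degenerate all-zero range
    scale = (x_max - x_min) / 255.0  # 256 levels span 255 steps
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # saturate to the int8 range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


# Hypothetical activation range observed during calibration.
scale, zero_point = quant_params(-1.0, 3.0)
for x in (-1.0, 0.0, 0.5, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

Values inside the calibrated range come back with an error of at most half a step (scale / 2); values outside it saturate, which is why calibration data that misses real-world extremes tends to hurt accuracy.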
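Expect an accuracy loss of roughly 1-3% from quantization. A larger drop often points to calibration data that does not represent real inputs, or to layers that are sensitive to quantization. Quantization-aware training or a simpler architecture is the usual next step.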
Step 3: Convert to C Array
Most MCU firmware has no filesystem to load a .tflite file from. Instead, you convert the binary model into a C array that gets compiled into the firmware:
xxd -i model.tflite > model_data.cc
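This produces a C source file containing the model bytes and their length, with symbol names derived from the input filename. The raw output looks like this; before building, mark the array const (and ideally alignas(16)) so that it stays in flash rather than being copied to RAM: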
unsigned char model_tflite[] = {
0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
unsigned int model_tflite_len = 152384;
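If xxd is not available, or you want the array declared const and aligned from the start, a short script can do the same job. The sketch below is a stand-in for xxd -i, not part of TFLite Micro: the function name, the g_model symbol, and the 12-bytes-per-line layout are arbitrary choices. It also checks for the TFL3 file identifier that TFLite FlatBuffers carry at byte offset 4.

```python
def to_c_array(data: bytes, symbol: str = "g_model") -> str:
    """Render raw bytes as a const, 16-byte-aligned C++ array definition."""
    if data[4:8] != b"TFL3":
        raise ValueError("missing TFL3 identifier; is this a .tflite file?")
    lines = []
    for i in range(0, len(data), 12):
        chunk = data[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"alignas(16) const unsigned char {symbol}[] = {{\n"
        f"{body}\n"
        f"}};\n"
        f"const unsigned int {symbol}_len = {len(data)};\n"
    )


# Tiny stand-in for a real model file: just the first 8 header bytes.
sample = bytes([0x20, 0x00, 0x00, 0x00]) + b"TFL3"
print(to_c_array(sample))
```

For a real model you would read the bytes with open("model.tflite", "rb").read() and write the returned string to model_data.cc. Because const objects have internal linkage by default in C++, declare the array extern in a header that model_data.cc includes if other source files need to reference it.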
Edge Impulse skips this step — it exports a complete C++ library with the model already embedded.
Step 4: Set Up the TFLite Micro Runtime
Your firmware needs the TFLite Micro interpreter. The setup follows the same pattern regardless of MCU:
- Include the runtime — add TFLite Micro source files to your build (as a static library or source inclusion)
- Register operators — only the ops your model actually uses, keeping binary size small
- Allocate a tensor arena — a static byte array that serves as the interpreter’s working memory
- Load the model — point the interpreter at your C array
- Run inference — copy input data into the input tensor, invoke, read the output tensor
The tensor arena size depends on your model. Start with 80-100 KB, reduce it until AllocateTensors() fails, then add about 10% headroom to the last size that worked.
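The shrink-until-it-fails procedure is effectively a search for the smallest arena that still works. A minimal sketch of that search follows, assuming a hypothetical allocation_succeeds(size) callback that stands in for one build-flash-run cycle (or a host-side test build). Nothing here is a TFLite Micro API. It bisects between a failing and a working size, then adds 10% headroom and rounds up to a 16-byte multiple.

```python
from typing import Callable


def pick_arena_size(allocation_succeeds: Callable[[int], bool],
                    start: int = 100 * 1024,
                    step: int = 1024,
                    headroom_percent: int = 10,
                    align: int = 16) -> int:
    """Find a small working arena size, then pad and align it."""
    if not allocation_succeeds(start):
        raise ValueError("starting size is already too small; increase it")
    lo, hi = 0, start  # invariant: hi works, lo does not (or is 0)
    while hi - lo > step:
        mid = (lo + hi) // 2
        if allocation_succeeds(mid):
            hi = mid
        else:
            lo = mid
    padded = hi * (100 + headroom_percent) // 100
    return (padded + align - 1) // align * align  # round up to alignment


# Hypothetical model that needs 37,000 bytes of arena.
needed = 37_000
size = pick_arena_size(lambda n: n >= needed)
print(size)
```

On the device itself, the MicroInterpreter method arena_used_bytes() reports how much of the arena was actually consumed after AllocateTensors(), which can replace most of the trial and error.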
Step 5: Flash and Test
Build the firmware and flash it to your board:
| MCU | Build System | Flash Command |
|---|---|---|
| ESP32 / ESP32-S3 | ESP-IDF | idf.py flash monitor |
| STM32H7 / STM32F4 | STM32CubeIDE | Build + Run in IDE, or st-flash |
| Arduino Nano 33 BLE | Arduino CLI | arduino-cli upload -b arduino:mbed_nano:nano33ble |
After flashing, verify inference works:
- Check serial output for prediction results
- Measure inference time — compare against your latency requirement
- Monitor RAM usage — the tensor arena plus stack must fit in available SRAM
Common Pitfalls
Model too large for flash. A 500 KB model on a 1 MB flash chip may not leave enough room for the firmware itself. Budget 40-60% of flash for application code and runtime.
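A quick back-of-the-envelope check makes this budget concrete. The sketch below uses a hypothetical helper with the figures from the paragraph above (a 500 KB model, 1 MB of flash, 40-60% reserved for code and runtime); the real split depends on your firmware.

```python
def model_fits_in_flash(model_kb: float, flash_kb: float,
                        reserved_fraction: float) -> bool:
    """True if the model fits in the flash left after reserving a share
    for application code and the runtime."""
    available_kb = flash_kb * (1.0 - reserved_fraction)
    return model_kb <= available_kb


for reserved in (0.40, 0.50, 0.60):
    ok = model_fits_in_flash(500, 1024, reserved)
    print(f"reserve {reserved:.0%}: {'fits' if ok else 'too large'}")
```

With half of the flash reserved, the 500 KB model only just fits into the remaining 512 KB, which leaves no room for the firmware to grow.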
Tensor arena too small. AllocateTensors() returns kTfLiteError, often without a clear message unless debug logging is routed to your serial port. Increase the arena in 10 KB steps until allocation succeeds.
Operator not supported. TFLite Micro supports only a subset of TFLite operators. If your model uses an unsupported op (for example, a TensorFlow op that would need the Flex delegate on full TFLite), you must restructure the model or implement the op manually.
Wrong input format. If your model expects int8 input but you feed it float32 sensor data, the results will be garbage. Match the input tensor type exactly, quantizing readings with the tensor's scale and zero point where needed.
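For a fully int8 model, matching the type means quantizing each reading with the input tensor's scale and zero point before writing it into the tensor. The firmware does this in C/C++; the sketch below shows the same arithmetic in Python so it can be checked on a desktop. The scale and zero-point values are hypothetical placeholders: the real ones are stored in the model and can be read from the input tensor's quantization parameters.

```python
import struct


def to_int8_input(readings, scale, zero_point):
    """Quantize float readings to the signed bytes an int8 input expects."""
    quantized = []
    for x in readings:
        q = round(x / scale) + zero_point
        quantized.append(max(-128, min(127, q)))  # saturate, do not wrap
    return struct.pack(f"{len(quantized)}b", *quantized)


# Hypothetical parameters, e.g. for accelerometer data in g.
INPUT_SCALE = 0.05
INPUT_ZERO_POINT = -10

payload = to_int8_input([0.0, 1.0, -2.5, 50.0], INPUT_SCALE, INPUT_ZERO_POINT)
print(list(struct.unpack(f"{len(payload)}b", payload)))
```

The last reading lies far outside the representable range and saturates at 127 instead of wrapping around, which is the behavior you want to reproduce in firmware.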
Once your model runs on the MCU, coordinating the surrounding agent logic — collecting results across the fleet, holding state, deciding what to do, and acting — is where most development time goes. ForestHub handles this layer on your Linux edge gateway: it ingests device results over MQTT, Modbus, and OPC-UA, runs decision and state logic with LLM reasoning as one node, and acts back over industrial protocols — all as a deterministic, auditable graph, so you can focus on the model and the use case.
Which MCU Should You Use?
The right choice depends on your use case:
| Requirement | Recommended MCU | Why |
|---|---|---|
| Vision (camera input) | ESP32-S3 | Camera interface, SIMD, PSRAM |
| Ultra-low power | STM32L4 | < 100 nA shutdown mode |
| Maximum compute | STM32H7 | 480 MHz Cortex-M7, 1 MB SRAM |
| Budget / prototyping | ESP32-C3 | $1-3 per chip, Wi-Fi included |
| Arduino ecosystem | Nano 33 BLE | Built-in sensors, simple IDE |
Not sure which MCU fits? The MCU Selector helps you filter by use case, framework, and constraints.
Frequently Asked Questions
- What size AI model can run on a microcontroller?
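- Most MCUs run models between 50 KB and 500 KB. A width-reduced int8 MobileNet V2 (small width multiplier and input resolution) for image classification can fit in roughly 250 KB; the full-width model has about 3.5 million parameters and needs several megabytes. Simpler models like keyword spotting or anomaly detection can be under 20 KB.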
- Do I need to train the model on the microcontroller?
- No. Training happens on a PC or cloud service. The microcontroller only runs inference — it executes the pre-trained model on live sensor data. On-device training on MCUs is experimental and not production-ready.
- Can I use Python on a microcontroller for AI?
- MicroPython exists but is too slow for real-time inference. Production deployments use C/C++ with TFLite Micro or Edge Impulse SDK. The model is compiled into a C array and linked directly into the firmware.
- How long does inference take on a typical MCU?
- Depends on the model and hardware. A keyword spotting model on ESP32 runs in roughly 20-50 ms. Object detection on ESP32-S3 with SIMD takes roughly 100-300 ms per frame. Anomaly detection on STM32L4 can run under 10 ms. These are estimated ranges — benchmark on target hardware for production, as performance varies with model architecture and optimization.
Related Hardware Guides
ESP32-S3 Object Detection with TFLite Micro
Run object detection on ESP32-S3 with TFLite Micro. Hardware specs, compatibility analysis, getting started guide, and alternatives.
STM32H7 Object Detection with TFLite Micro
Run object detection on STM32H7 with TFLite Micro. 1 MB SRAM, 480 MHz Cortex-M7, CMSIS-NN acceleration for real-time inference.
Arduino Nano 33 BLE Voice Recognition Edge Impulse
Build keyword spotting on Arduino Nano 33 BLE with Edge Impulse. Built-in microphone, cloud training, and on-device inference.
ESP32 Anomaly Detection with TFLite Micro
Run anomaly detection on ESP32 with TFLite Micro. Autoencoder setup, sensor integration, and real-time monitoring for industrial applications.
STM32F4 Predictive Maintenance with TFLite Micro
Deploy predictive maintenance on STM32F4 with TFLite Micro. A widely used Cortex-M4 for cost-effective vibration monitoring in industrial settings.