Guide
To deploy AI models to microcontrollers, train a model in TensorFlow or Edge Impulse, quantize it to int8, convert it to TFLite format, and flash the resulting C array alongside the TFLite Micro runtime to your target MCU.
Published 2026-04-01
Deploying AI to an MCU requires three things: a trained model, a conversion pipeline, and a firmware project for your target hardware.
On the training side, you need a model that is small enough to fit in your MCU’s memory. For most microcontrollers, that means models under 500 KB. You will train on a desktop machine or cloud service — not on the MCU itself.
On the hardware side, you need a development board with enough flash and RAM for the model plus your firmware, a toolchain for your target MCU, and a USB connection for flashing.
Start with a pre-trained model or train your own. For first deployments, proven starting points include the TFLite Micro example models: micro_speech (keyword spotting), person_detection (vision), and magic_wand (gesture recognition).
If you use Edge Impulse, the training and conversion happen in one pipeline. Upload your dataset, select a learning block, and Edge Impulse handles quantization and export.
If you use TensorFlow, train normally in Python, then convert to TFLite:
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables dynamic-range quantization
tflite_model = converter.convert()
```
Quantization converts 32-bit floating point weights to 8-bit integers. This is not optional for most MCUs — it reduces model size by 4x and speeds up inference significantly on chips without an FPU.
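Under the hood, int8 quantization maps each float to an 8-bit code through a scale and zero point: q = round(x / scale) + zero_point. A minimal numpy sketch of the idea — not TFLite's exact per-channel scheme, just enough to show where the 4x size reduction and the small accuracy loss come from:

```python
import numpy as np

def quantize_int8(x):
    """Affine int8 quantization: derive (scale, zero_point) from the value range."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0            # 256 int8 codes span [lo, hi]
    zero_point = int(round(-128 - lo / scale))  # lo maps to -128, hi to 127
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(weights)
round_trip_error = np.abs(dequantize(q, scale, zp) - weights).max()
# int8 storage is exactly 1/4 the size of float32 -- the 4x reduction above
assert q.nbytes == weights.nbytes // 4
```

The rounding step is where the accuracy loss comes from: each weight can shift by up to about half a quantization step.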
Full integer quantization (int8 weights and activations) is the standard for MCU deployment. You need a representative dataset — a small sample of real inputs — to calibrate the activation ranges:
```python
import numpy as np

def representative_dataset():
    # A few hundred real input samples are enough to calibrate activation ranges
    for sample in calibration_data:
        yield [sample.astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
```
Expect accuracy loss of 1-3% from quantization. If accuracy drops more than that, your model may be too complex for the target hardware.
The MCU cannot read .tflite files from a filesystem. You convert the binary model into a C array that gets compiled into the firmware:
```sh
xxd -i model.tflite > model_data.cc
```
This produces a C source file with the model bytes and their length:

```c
unsigned char model_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
unsigned int model_tflite_len = 152384;
```
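If xxd is unavailable (common on Windows), the same conversion is a few lines of Python. A sketch that mirrors the `xxd -i` output format, using the `model_tflite` variable name from above:

```python
from pathlib import Path

def bytes_to_c_array(data, var="model_tflite"):
    """Emit C source equivalent to `xxd -i`: the model bytes plus a length."""
    lines = []
    for i in range(0, len(data), 12):  # 12 bytes per line, like xxd
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in data[i:i + 12]) + ",")
    body = "\n".join(lines).rstrip(",")
    return (f"unsigned char {var}[] = {{\n{body}\n}};\n"
            f"unsigned int {var}_len = {len(data)};\n")

# Demo with the first 8 bytes of a TFLite flatbuffer (header + "TFL3" magic)
src = bytes_to_c_array(bytes([0x20, 0x00, 0x00, 0x00]) + b"TFL3")

# Real usage:
# Path("model_data.cc").write_text(bytes_to_c_array(Path("model.tflite").read_bytes()))
```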
Edge Impulse skips this step — it exports a complete C++ library with the model already embedded.
Your firmware needs the TFLite Micro interpreter. The setup follows the same pattern regardless of MCU: map the C array with tflite::GetModel(), register the operators your model uses in a MicroMutableOpResolver, construct a MicroInterpreter over a statically allocated tensor arena, call AllocateTensors(), then fill the input tensor and call Invoke() for each inference.
The tensor arena size depends on your model. Start with 80-100 KB and reduce it until inference fails, then add 10% headroom.
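That shrink-until-failure procedure can be scripted if you have a way to test a candidate size — on device, "fits" means AllocateTensors() returns kTfLiteOk with an arena of that size. A sketch of the search, where the fits callback is hypothetical:

```python
def minimal_arena_kb(fits, lo_kb=1, hi_kb=256):
    """Binary-search the smallest arena size (in KB) for which fits(size) succeeds."""
    while lo_kb < hi_kb:
        mid = (lo_kb + hi_kb) // 2
        if fits(mid):
            hi_kb = mid       # still allocates -- try smaller
        else:
            lo_kb = mid + 1   # too small -- go bigger
    return lo_kb

def arena_with_headroom(fits, headroom=0.10):
    """Minimal working size plus ~10% headroom, per the guidance above."""
    return int(minimal_arena_kb(fits) * (1 + headroom))

# Example with a stand-in model that needs 70 KB:
assert arena_with_headroom(lambda kb: kb >= 70) == 77
```

The headroom matters because arena usage can vary slightly between runtime versions, so shipping the exact minimum is fragile.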
Build the firmware and flash it to your board:
| MCU | Build System | Flash Command |
|---|---|---|
| ESP32 / ESP32-S3 | ESP-IDF | idf.py flash monitor |
| STM32H7 / STM32F4 | STM32CubeIDE | Build + Run in IDE, or st-flash |
| Arduino Nano 33 BLE | Arduino CLI | arduino-cli upload -b arduino:mbed_nano:nano33ble |
After flashing, verify inference works: open a serial monitor, feed a known test input, and check that the printed output tensor values match what the same .tflite model produces on your desktop.
- **Model too large for flash.** A 500 KB model on a 1 MB flash chip may not leave enough room for the firmware itself. Budget 40-60% of flash for application code and runtime.
- **Tensor arena too small.** The interpreter returns kTfLiteError without a clear message. Increase the arena in 10 KB steps until inference succeeds.
- **Operator not supported.** TFLite Micro supports a subset of TFLite operators. If your model uses an op outside that subset (for example one that would need the Flex delegate to fall back to full TensorFlow kernels), restructure the model or implement the op as a custom operator.
- **Wrong input format.** If your model expects int8 input but you feed float32 sensor data, the results will be garbage. Match the input tensor type exactly.
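Matching the type means applying the input tensor's quantization parameters before writing to it. A numpy sketch with illustrative scale and zero-point values (the real ones are read from the model; in TFLite Micro they live on the input TfLiteTensor as params.scale and params.zero_point):

```python
import numpy as np

def to_int8_input(x, scale, zero_point):
    """Encode float32 sensor values the way an int8 input tensor expects them."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

# scale and zero_point here are made-up example values
accel_g = np.array([0.0, 1.0, -1.0], dtype=np.float32)
q = to_int8_input(accel_g, scale=0.05, zero_point=-5)  # -> [-5, 15, -25]
```

Feeding the raw floats instead would be interpreted as garbage byte patterns, which is exactly the failure mode described above.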
The right choice depends on your use case:
| Requirement | Recommended MCU | Why |
|---|---|---|
| Vision (camera input) | ESP32-S3 | Camera interface, SIMD, PSRAM |
| Ultra-low power | STM32L4 | < 100 nA shutdown mode |
| Maximum compute | STM32H7 | 480 MHz Cortex-M7, 1 MB SRAM |
| Budget / prototyping | ESP32-C3 | $1-3 per chip, Wi-Fi included |
| Arduino ecosystem | Nano 33 BLE | Built-in sensors, simple IDE |
Related guides:

- Run object detection on ESP32-S3 with TFLite Micro: hardware specs, compatibility analysis, getting-started guide, and alternatives.
- Run object detection on STM32H7 with TFLite Micro: 1 MB SRAM, 480 MHz Cortex-M7, and CMSIS-NN acceleration for real-time inference.
- Build keyword spotting on Arduino Nano 33 BLE with Edge Impulse: built-in microphone, cloud training, and on-device inference.
- Run anomaly detection on ESP32 with TFLite Micro: autoencoder setup, sensor integration, and real-time monitoring for industrial applications.
- Deploy predictive maintenance on STM32F4 with TFLite Micro: a widely used Cortex-M4 for cost-effective vibration monitoring in industrial settings.
ForestHub is designed to generate deployment-ready C code from a visual workflow. Pick your MCU, pick your model, deploy.