Guide
Install ESP-IDF, add the TFLite Micro component to your project, allocate a tensor arena in SRAM, register only the operators your model uses, and invoke the interpreter. ESP32 runs int8 quantized models with 20-300 ms inference depending on complexity.
Published 2026-04-01
TensorFlow Lite Micro was rebranded to LiteRT for Microcontrollers in 2024. This guide uses both names interchangeably — the codebase, API, and model format are the same. The official documentation now lives under the LiteRT name.
Before starting, you need:

- ESP-IDF installed and working (`idf.py` on your PATH)
- A trained, int8-quantized `.tflite` model
- An ESP32 or ESP32-S3 development board
This guide targets ESP-IDF. If you are using Arduino, the concepts are the same but the project structure differs.
Start with a clean ESP-IDF project:
```shell
idf.py create-project tflite_demo
cd tflite_demo
idf.py set-target esp32   # or esp32s3
```
This creates the standard project structure with main/tflite_demo.c as the entry point.
TFLite Micro integrates into ESP-IDF as a managed component. Add it to your project:
```shell
idf.py add-dependency "espressif/esp-tflite-micro"
```
This pulls the Espressif-maintained fork of TFLite Micro, which includes ESP32-specific optimizations — SIMD acceleration on ESP32-S3 and memory placement options for PSRAM.
Alternatively, you can clone the TFLite Micro source directly into your components/ directory. This gives full control but requires manual updates.
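With the managed-component route, the dependency is recorded in `main/idf_component.yml`. A minimal manifest looks like this (the `"*"` constraint is illustrative; pin the concrete version `add-dependency` resolved for reproducible builds):

```yaml
dependencies:
  espressif/esp-tflite-micro: "*"
```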
A stock ESP-IDF project has no filesystem from which to load .tflite files at runtime, so the model is embedded in the firmware instead. Convert the model binary to a C header:
```shell
xxd -i model.tflite > main/model_data.h
```
This generates:
```c
const unsigned char model_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
const unsigned int model_tflite_len = 48672;
```
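If `xxd` is not available (on Windows, for example), the same header can be generated with a few lines of Python. A sketch; the helper names `bytes_to_c_array` and `looks_like_tflite` are my own. As a bonus, note that the `0x54, 0x46, 0x4c, 0x33` bytes at offset 4 in the dump above are ASCII for "TFL3", the FlatBuffer file identifier, which makes a cheap sanity check for a truncated or mis-copied model:

```python
def bytes_to_c_array(data: bytes, name: str = "model_tflite") -> str:
    """Emit a C array equivalent to `xxd -i` output."""
    lines = [f"const unsigned char {name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines)

def looks_like_tflite(data: bytes) -> bool:
    """A .tflite FlatBuffer carries the 'TFL3' identifier at offset 4."""
    return len(data) >= 8 and data[4:8] == b"TFL3"
```

Run the check on your model bytes before converting; `GetModel()` on device gives far less helpful feedback when handed garbage.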
For ESP32-S3 with PSRAM, place the model in external memory to free SRAM for the tensor arena and application code:
```c
__attribute__((section(".ext_ram.bss")))
const unsigned char model_tflite[] = { ... };
```
PSRAM is slower than internal SRAM. Model weights stored in PSRAM are accessed during each inference, adding some latency overhead. However, for many applications this trade-off is worthwhile to free internal SRAM for the tensor arena and application buffers.
The tensor arena is a statically allocated byte array that the interpreter uses as working memory. It holds intermediate activations, temporary buffers, and quantization parameters during inference.
```cpp
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr int kTensorArenaSize = 81920;  // 80 KB — adjust per model
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
```
Sizing the arena: Start with a generous size (80-100 KB). If inference succeeds, reduce by 10 KB at a time until it fails, then add 10% headroom. The interpreter returns kTfLiteError when the arena is too small — with no indication of how much more it needs.
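The trial-and-error loop above can also be framed as a binary search. A host-side sketch of the procedure, where `fits(size)` stands in for "rebuild with this arena size and confirm `AllocateTensors` succeeds on device" (both helper names are hypothetical):

```python
def min_arena_size(fits, lo: int = 0, hi: int = 100 * 1024) -> int:
    """Smallest arena size (bytes) for which `fits` reports success."""
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid          # still fits: try smaller
        else:
            lo = mid + 1      # too small: grow
    return lo

def with_headroom(size: int, frac: float = 0.10) -> int:
    """Add ~10% safety margin on top of the measured minimum."""
    return int(size * (1 + frac))
```

Each probe costs a rebuild and reflash, so the logarithmic number of trials matters in practice.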
Budget carefully: the base ESP32 has 520 KB of SRAM in total, and the Wi-Fi/Bluetooth stacks, FreeRTOS, and your application buffers all compete with the tensor arena for it.
TFLite Micro uses selective operator registration. Instead of linking all operators (which bloats the binary by 100+ KB), you register only the ops your model actually uses:
```cpp
static tflite::MicroMutableOpResolver<6> resolver;
resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddFullyConnected();
resolver.AddMaxPool2D();
```
The template parameter (`<6>`) sets the resolver's capacity: it must be at least the number of ops you register, and registration fails if you exceed it.
To find which ops your model uses, inspect the .tflite file with flatc or the Netron model viewer. Each registered op adds 1-10 KB to the firmware binary. Skip ops your model does not use.
If you miss an op, the interpreter fails at init with `Didn't find op for builtin opcode`, a clear error that tells you exactly which op to add.
The complete inference setup. Note that TFLite Micro is a C++ library, so this file must be compiled as C++ (e.g. rename it to `main.cc`) with `app_main` exported under C linkage:
```cpp
#include "model_data.h"

// resolver and tensor_arena are the file-scope declarations shown above.

extern "C" void app_main(void) {
  // 1. Load and validate the model
  const tflite::Model* model = tflite::GetModel(model_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    printf("Model schema version mismatch\n");
    return;
  }

  // 2. Create the interpreter
  tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);

  // 3. Allocate tensors
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    printf("AllocateTensors failed — arena too small\n");
    return;
  }

  // 4. Get input tensor pointer
  TfLiteTensor* input = interpreter.input(0);
  printf("Input type: %d, bytes: %zu\n", input->type, input->bytes);

  // 5. Inference loop
  while (true) {
    // Copy sensor data into the input tensor. Match the model's
    // expected input type (int8 or float32); read_sensor_data()
    // is application-specific.
    read_sensor_data(input->data.int8, input->bytes);

    // Run inference
    if (interpreter.Invoke() != kTfLiteOk) {
      printf("Invoke failed\n");
      continue;
    }

    // Read output
    TfLiteTensor* output = interpreter.output(0);
    int8_t prediction = output->data.int8[0];
    printf("Prediction: %d\n", prediction);

    vTaskDelay(pdMS_TO_TICKS(100));
  }
}
```
Input format is critical. If your model expects int8 input (standard for quantized models), you must provide int8 data. Feeding float32 data to an int8 input produces garbage output with no error message. Always check `input->type` after `AllocateTensors()`.
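The int8 conversion itself is simple affine quantization using the `scale` and `zero_point` stored in the tensor's quantization params. The math, sketched in Python (read the real scale and zero point from the tensor on device; the values in the test are made up):

```python
def quantize_int8(x: float, scale: float, zero_point: int) -> int:
    """float -> int8: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q: int, scale: float, zero_point: int) -> float:
    """int8 -> float: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)
```

Both functions port directly to two lines of C on device; the inverse transform is what turns the int8 output tensor back into a real-valued score.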
```shell
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor
```
The first build takes 2-5 minutes as TFLite Micro compiles from source. Subsequent builds are faster.
| Symptom | Cause | Fix |
|---|---|---|
| `AllocateTensors` failed | Tensor arena too small | Increase `kTensorArenaSize` by 20 KB |
| `Didn't find op for builtin opcode` | Missing operator registration | Check model ops, add to resolver |
| Garbage output values | Input format mismatch | Verify input->type matches your data |
| Crash or watchdog reset | Stack overflow | Increase main task stack size in menuconfig |
| Slow inference (>500 ms) | Float32 model not optimized for MCU pipeline | Use int8 quantized model |
| Linker errors | TFLite component not found | Run idf.py reconfigure after adding dependency |
| Model | ESP32 (int8) | ESP32-S3 (int8 + SIMD) |
|---|---|---|
| Keyword spotting (20 KB) | 30-50 ms | 15-30 ms |
| Anomaly detection (10 KB) | 5-10 ms | 2-5 ms |
| Gesture recognition (30 KB) | 10-25 ms | 5-15 ms |
| Image classification 96x96 (250 KB) | 200-400 ms | 100-200 ms |
Estimated ranges — benchmark on target hardware for production. Performance varies with model architecture and optimization.
The ESP32-S3’s SIMD instructions accelerate int8 operations by roughly 2x compared to the base ESP32. For ML workloads, the S3 is worth the $3-5 price premium.
The ESP32's single-precision FPU handles individual floating-point operations but is not optimized for batch neural network computations. Int8 quantized models run faster because 8-bit integers need a quarter of the memory bandwidth of float32 and map onto cheaper integer instructions. Quantize during conversion, not as an afterthought:
```python
# During model conversion (on your development PC)
converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration samples
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```
Note: The older tflite_convert CLI is deprecated. Use the Python tf.lite.TFLiteConverter API shown above.
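Full int8 quantization requires a `representative_dataset` so the converter can calibrate activation ranges. A sketch of the expected generator structure; the `(1, 96, 96, 1)` shape is a placeholder for your model's real input shape, and real recorded sensor data calibrates far better than the random samples used here:

```python
import numpy as np

def representative_dataset():
    """Yield ~100 float32 samples shaped like the model's input."""
    for _ in range(100):
        sample = np.random.rand(1, 96, 96, 1).astype(np.float32)
        yield [sample]
```

Assign it with `converter.representative_dataset = representative_dataset` before calling `convert()`.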
On the ESP32-S3, place the model array in PSRAM to free internal SRAM for the tensor arena. PSRAM is slower than internal SRAM, and model weights stored there are accessed during each inference. This adds some latency, but frees internal SRAM for the tensor arena and application code — a worthwhile trade-off for larger models.
Each registered operator adds flash footprint. A model using 5 operators adds roughly 15-30 KB to the binary. Audit your model and strip unnecessary ops — TFLite conversion sometimes introduces ops that an optimized model does not actually need.
Run inference on Core 1 while Core 0 handles sensor reading and Wi-Fi. This prevents inference from blocking communication:
```cpp
xTaskCreatePinnedToCore(inference_task, "inference", 8192,
                        NULL, 5, NULL, 1);  // Pin to Core 1
xTaskCreatePinnedToCore(sensor_task, "sensor", 4096,
                        NULL, 4, NULL, 0);  // Pin to Core 0
```
Once your first model runs on ESP32, these follow-up guides go deeper:

- Object detection on ESP32-S3 with TFLite Micro: hardware specs, compatibility analysis, getting started guide, and alternatives.
- Anomaly detection on ESP32 with TFLite Micro: autoencoder setup, sensor integration, and real-time monitoring for industrial applications.
- Anomaly detection on ESP32-C3 with TFLite Micro: cost-effective sensor monitoring with RISC-V and Wi-Fi connectivity.
- Keyword spotting on ESP32-S3 with TFLite Micro: DS-CNN model setup, audio preprocessing, and real-time voice command recognition.
ForestHub is designed to generate TFLite Micro firmware for ESP32 from a visual workflow. No manual operator registration, no tensor arena guessing.