
How to Run TensorFlow Lite on ESP32

Install ESP-IDF, add the TFLite Micro component to your project, allocate a tensor arena in SRAM, register only the operators your model uses, and invoke the interpreter. The ESP32 runs int8 quantized models with roughly 5-400 ms inference latency depending on model size and complexity.

Published 2026-04-01

Naming Note

TensorFlow Lite Micro was rebranded to LiteRT for Microcontrollers in 2024. This guide uses both names interchangeably — the codebase, API, and model format are the same. The official documentation now lives under the LiteRT name.

Prerequisites

Before starting, you need:

  • ESP-IDF v5.x installed (setup guide)
  • A trained, quantized TFLite model — int8 quantization recommended for ESP32 (see our deployment guide)
  • An ESP32 or ESP32-S3 dev board — the S3 is recommended for ML workloads due to SIMD instructions
  • Basic C programming experience

This guide targets ESP-IDF. If you are using Arduino, the concepts are the same but the project structure differs.

Step 1: Create an ESP-IDF Project

Start with a clean ESP-IDF project:

idf.py create-project tflite_demo
cd tflite_demo
idf.py set-target esp32  # or esp32s3

This creates the standard project structure with main/tflite_demo.c as the entry point.

Step 2: Add TFLite Micro as a Component

TFLite Micro integrates into ESP-IDF as a managed component. Add it to your project:

idf.py add-dependency "espressif/esp-tflite-micro"

This pulls the Espressif-maintained fork of TFLite Micro, which includes ESP32-specific optimizations — SIMD acceleration on ESP32-S3 and memory placement options for PSRAM.

Alternatively, you can clone the TFLite Micro source directly into your components/ directory. This gives full control but requires manual updates.

Step 3: Convert Your Model to a C Array

A bare ESP-IDF project has no filesystem to load .tflite files from at runtime, so the standard approach is to compile the model into the firmware. Convert the model binary to a C header:

xxd -i model.tflite > main/model_data.h

This generates:

unsigned char model_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
unsigned int model_tflite_len = 48672;

Add const to the array declaration yourself. xxd does not emit it, and without const the array lands in .data and is copied into internal SRAM at boot instead of staying memory-mapped in flash.
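Before handing the array to the interpreter, a cheap sanity check can catch a truncated or mis-generated array: every .tflite FlatBuffer carries the identifier "TFL3" at byte offset 4, visible in the generated bytes above. A minimal sketch (looks_like_tflite is a name made up for this example):

```cpp
#include <cstring>

// A .tflite FlatBuffer stores the file identifier "TFL3" at byte offset 4.
// This catches truncated or mis-converted arrays before the interpreter
// fails with a less obvious schema error.
bool looks_like_tflite(const unsigned char* data, unsigned int len) {
    return len > 8 && std::memcmp(data + 4, "TFL3", 4) == 0;
}
```

Call it on model_tflite and model_tflite_len early in app_main and log a warning if it fails.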

For ESP32-S3 boards with PSRAM, large buffers can move to external memory. One caveat: the .ext_ram.bss section is zero-initialized at boot, so it works for uninitialized buffers such as the tensor arena, not for an initialized model array:

// Requires CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY in menuconfig
__attribute__((section(".ext_ram.bss")))
uint8_t tensor_arena[kTensorArenaSize];

To keep the model itself out of internal SRAM, declare it const so it stays memory-mapped in flash, or copy it at startup into a buffer from heap_caps_malloc(model_tflite_len, MALLOC_CAP_SPIRAM).

PSRAM is slower than internal SRAM, so data placed there adds some latency to each access during inference. For many applications this trade-off is worthwhile because it frees internal SRAM for application buffers and latency-critical code.

Step 4: Set Up the Tensor Arena

The tensor arena is a statically allocated byte array that the interpreter uses as working memory. It holds intermediate activations, temporary buffers, and quantization parameters during inference.

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr int kTensorArenaSize = 81920;  // 80 KB — adjust per model
alignas(16) uint8_t tensor_arena[kTensorArenaSize];

Sizing the arena: start with a generous size (80-100 KB). If your TFLite Micro version exposes interpreter.arena_used_bytes(), call it after a successful AllocateTensors() and resize to that figure plus roughly 10% headroom. Otherwise, shrink the arena 10 KB at a time until allocation fails, then add 10% headroom to the last working size. When the arena is too small, AllocateTensors() returns kTfLiteError with no indication of how much more memory it needs.
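The headroom arithmetic can be sketched as a small helper. The measured figure would come from interpreter.arena_used_bytes() (available in recent TFLite Micro versions) or from the manual shrink-until-failure method; recommended_arena_size is a name invented here:

```cpp
#include <cstddef>

// Given measured arena usage (e.g. from interpreter.arena_used_bytes()),
// add ~10% headroom and round up to a 16-byte boundary to match the
// arena's alignment.
size_t recommended_arena_size(size_t used_bytes) {
    size_t with_headroom = used_bytes + used_bytes / 10;
    return (with_headroom + 15) & ~static_cast<size_t>(15);
}
```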

Memory budget for ESP32 (520 KB SRAM total):

  • 80-120 KB for the tensor arena
  • 10-20 KB for the TFLite runtime
  • 20-40 KB for FreeRTOS and application code
  • Remaining for stack, heap, and peripherals
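The budget above can also be enforced at compile time, so a growing arena fails the build instead of crashing at runtime. A sketch, with the 520 KB total and a rough 200 KB reserve taken from the list above as assumptions:

```cpp
#include <cstddef>

constexpr size_t kSramTotalBytes = 520 * 1024;  // ESP32 internal SRAM
constexpr size_t kReservedBytes  = 200 * 1024;  // runtime + FreeRTOS + stack/heap (rough)
constexpr size_t kTensorArenaSize = 80 * 1024;

// Fails the build if the arena no longer fits the budget.
static_assert(kTensorArenaSize <= kSramTotalBytes - kReservedBytes,
              "tensor arena exceeds the SRAM budget");
```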

Step 5: Register Operators

TFLite Micro uses selective operator registration. Instead of linking all operators (which bloats the binary by 100+ KB), you register only the ops your model actually uses:

static tflite::MicroMutableOpResolver<6> resolver;

resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddFullyConnected();
resolver.AddMaxPool2D();

The template parameter (<6>) sets the resolver's capacity. It must be at least the number of ops you register; each Add call beyond the capacity fails.

To find which ops your model uses, inspect the .tflite file with flatc or the Netron model viewer. Each registered op adds 1-10 KB to the firmware binary. Skip ops your model does not use.

If you miss an op, the interpreter fails at init with Didn't find op for builtin opcode — a clear error that tells you exactly which op to add.

Step 6: Write the Inference Code

The complete inference setup. The TFLite Micro API is C++, so this goes in a .cc file (e.g. main/main.cc) with app_main declared extern "C":

#include "model_data.h"

extern "C" void app_main(void) {
    // 1. Load and validate the model
    const tflite::Model* model = tflite::GetModel(model_tflite);
    if (model->version() != TFLITE_SCHEMA_VERSION) {
        printf("Model schema version mismatch\n");
        return;
    }

    // 2. Create the interpreter
    tflite::MicroInterpreter interpreter(
        model, resolver, tensor_arena, kTensorArenaSize);

    // 3. Allocate tensors
    if (interpreter.AllocateTensors() != kTfLiteOk) {
        printf("AllocateTensors failed — arena too small\n");
        return;
    }

    // 4. Get input tensor pointer
    TfLiteTensor* input = interpreter.input(0);
    printf("Input type: %d, bytes: %zu\n", input->type, input->bytes);

    // 5. Inference loop
    while (true) {
        // Copy sensor data into input tensor
        // Match the model's expected input type (int8 or float32)
        read_sensor_data(input->data.int8, input->bytes);

        // Run inference
        if (interpreter.Invoke() != kTfLiteOk) {
            printf("Invoke failed\n");
            continue;
        }

        // Read output
        TfLiteTensor* output = interpreter.output(0);
        int8_t prediction = output->data.int8[0];
        printf("Prediction: %d\n", prediction);

        vTaskDelay(pdMS_TO_TICKS(100));
    }
}
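For classification models, the output tensor holds one score per class, and the prediction is the index of the largest one. A minimal helper (argmax_int8 is a name invented for this sketch, not part of the TFLite API):

```cpp
#include <cstdint>
#include <cstddef>

// Return the index of the highest int8 score; ties go to the lowest index.
size_t argmax_int8(const int8_t* scores, size_t count) {
    size_t best = 0;
    for (size_t i = 1; i < count; ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    return best;
}
```

In the loop above, this would be called as argmax_int8(output->data.int8, output->bytes), since each int8 element is one byte.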

Input format is critical. If your model expects int8 input (standard for quantized models), you must provide int8 data. Feeding float32 data to an int8 input produces garbage output with no error message. Always check input->type after AllocateTensors().
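The mapping between real values and their int8 representation is the standard TFLite affine scheme, real = scale * (q - zero_point), with scale and zero_point read from the tensor's params field. A self-contained sketch of both directions:

```cpp
#include <cstdint>
#include <cmath>

// Quantize a real value into int8 using the tensor's quantization
// parameters (input->params.scale and input->params.zero_point).
int8_t quantize_to_int8(float real, float scale, int zero_point) {
    int q = static_cast<int>(std::lround(real / scale)) + zero_point;
    if (q < -128) q = -128;  // clamp to the int8 range
    if (q > 127) q = 127;
    return static_cast<int8_t>(q);
}

// Recover the real value from an int8 quantized one.
float dequantize_from_int8(int8_t q, float scale, int zero_point) {
    return scale * (static_cast<int>(q) - zero_point);
}
```

Apply quantize_to_int8 when filling input->data.int8 from float sensor readings, and dequantize_from_int8 when interpreting output scores.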

Step 7: Build and Flash

idf.py build
idf.py -p /dev/ttyUSB0 flash monitor

The first build takes 2-5 minutes as TFLite Micro compiles from source. Subsequent builds are faster.

Troubleshooting

  • AllocateTensors failed: tensor arena too small. Increase kTensorArenaSize by 20 KB and retry.
  • Didn't find op for builtin opcode: missing operator registration. Inspect the model's ops and add them to the resolver.
  • Garbage output values: input format mismatch. Verify input->type matches your data.
  • Crash or watchdog reset: stack overflow. Increase the main task stack size in menuconfig.
  • Slow inference (over 500 ms): float32 model. Use an int8 quantized model.
  • Linker errors: TFLite component not found. Run idf.py reconfigure after adding the dependency.

Performance by Model Type

All figures are for int8 models; the ESP32-S3 column assumes SIMD-optimized kernels.

  • Keyword spotting (20 KB): 30-50 ms on ESP32, 15-30 ms on ESP32-S3
  • Anomaly detection (10 KB): 5-10 ms on ESP32, 2-5 ms on ESP32-S3
  • Gesture recognition (30 KB): 10-25 ms on ESP32, 5-15 ms on ESP32-S3
  • Image classification, 96x96 (250 KB): 200-400 ms on ESP32, 100-200 ms on ESP32-S3

Estimated ranges — benchmark on target hardware for production. Performance varies with model architecture and optimization.

The ESP32-S3’s SIMD instructions accelerate int8 operations by roughly 2x compared to the base ESP32. For ML workloads, the S3 is worth the $3-5 price premium.

Optimization Tips

Use int8 Quantization

The ESP32’s single-precision FPU handles individual floating-point operations but is not optimized for batch neural network computations. Int8 quantized models run faster because integer operations process more efficiently on the MCU’s pipeline. Quantize during conversion, not as an afterthought:

# During model conversion (on your development PC)
converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# representative_dataset is a generator you write that yields a few hundred
# sample inputs; the converter uses it to calibrate quantization ranges
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

Note: The older tflite_convert CLI is deprecated. Use the Python tf.lite.TFLiteConverter API shown above.

Place Model in PSRAM (ESP32-S3)

On the ESP32-S3, move large buffers into PSRAM to free internal SRAM. The tensor arena can go in the zero-initialized external-RAM section, and the model can either stay const in memory-mapped flash or be copied at startup into a buffer from heap_caps_malloc(model_tflite_len, MALLOC_CAP_SPIRAM). PSRAM is slower than internal SRAM, so weights or activations placed there add some latency per inference, but for larger models the freed SRAM is usually worth it.

Minimize Operator Count

Each registered operator adds flash footprint. A model using 5 operators adds roughly 15-30 KB to the binary. Audit your model and strip unnecessary ops — TFLite conversion sometimes introduces ops that an optimized model does not actually need.

Use Dual Cores (ESP32/ESP32-S3)

Run inference on Core 1 while Core 0 handles sensor reading and Wi-Fi. This prevents inference from blocking communication:

xTaskCreatePinnedToCore(inference_task, "inference", 8192,
                         NULL, 5, NULL, 1);  // Pin to Core 1
xTaskCreatePinnedToCore(sensor_task, "sensor", 4096,
                         NULL, 4, NULL, 0);  // Pin to Core 0

Frequently Asked Questions

Does TFLite Micro support all TensorFlow operations on ESP32?
No. TFLite Micro supports a subset of TFLite operators, roughly 60-80 depending on the version. Common layers like Conv2D, DepthwiseConv2D, FullyConnected, Softmax, and Reshape are supported. Ops that require the Flex delegate (full TensorFlow kernels) and uncommon ops like HashTableLookup are not. Check your model's ops against the supported list before starting.
How much SRAM does TFLite Micro need on ESP32?
The TFLite Micro runtime itself uses roughly 10-20 KB. The tensor arena — where model activations and intermediate buffers live — is additional and model-dependent. A keyword spotting model needs 30-50 KB of arena. Image classification needs 80-150 KB. Total SRAM usage for ML is typically 50-200 KB.
Can I use Arduino IDE instead of ESP-IDF for TFLite on ESP32?
Yes, the Arduino ESP32 core supports TFLite Micro via library inclusion. But ESP-IDF gives you more control over memory allocation, task priorities, and advanced features like PSRAM placement of the tensor arena. For production firmware, ESP-IDF is the better choice.
What is the difference between TFLite Micro and LiteRT for Microcontrollers?
They are the same project. Google rebranded TensorFlow Lite Micro as LiteRT for Microcontrollers in 2024. The API, model format, and runtime are identical. Documentation has moved to ai.google.dev/edge/litert/microcontrollers/. Most community resources still use the TFLite Micro name.

Skip the Manual Setup

ForestHub is designed to generate TFLite Micro firmware for ESP32 from a visual workflow. No manual operator registration, no tensor arena guessing.

Get Started Free