Guide
Install ESP-IDF, add the TFLite Micro component to your project, allocate a tensor arena in SRAM, register only the operators your model uses, and invoke the interpreter. ESP32 runs int8 quantized models with 20-300 ms inference depending on complexity.
Published 2026-04-01
TensorFlow Lite Micro was rebranded to LiteRT for Microcontrollers in 2024. This guide uses both names interchangeably — the codebase, API, and model format are the same. The official documentation now lives under the LiteRT name.
Before starting, you need:

- ESP-IDF installed and working (`idf.py` on your PATH)
- A trained, int8-quantized `.tflite` model
- An ESP32 or ESP32-S3 development board
This guide targets ESP-IDF. If you are using Arduino, the concepts are the same but the project structure differs.
Start with a clean ESP-IDF project:
```shell
idf.py create-project tflite_demo
cd tflite_demo
idf.py set-target esp32   # or esp32s3
```
This creates the standard project structure with main/tflite_demo.c as the entry point.
TFLite Micro integrates into ESP-IDF as a managed component. Add it to your project:
```shell
idf.py add-dependency "espressif/esp-tflite-micro"
```
This pulls the Espressif-maintained fork of TFLite Micro, which includes ESP32-specific optimizations — SIMD acceleration on ESP32-S3 and memory placement options for PSRAM.
Alternatively, you can clone the TFLite Micro source directly into your components/ directory. This gives full control but requires manual updates.
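With the managed-component route, the dependency is recorded in `main/idf_component.yml`. A minimal manifest looks like this (the `"*"` constraint is illustrative; pin the concrete version `add-dependency` resolved for reproducible builds):

```yaml
dependencies:
  espressif/esp-tflite-micro: "*"
```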
A stock ESP-IDF project has no filesystem from which to load .tflite files at runtime, so the model is embedded in the firmware instead. Convert the model binary to a C header:
```shell
xxd -i model.tflite > main/model_data.h
```
This generates:
```c
const unsigned char model_tflite[] = {
  0x20, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33, ...
};
const unsigned int model_tflite_len = 48672;
```
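If `xxd` is not available (on Windows, for example), the same header can be generated with a few lines of Python. A sketch; the helper names `bytes_to_c_array` and `looks_like_tflite` are my own. As a bonus, note that the `0x54, 0x46, 0x4c, 0x33` bytes at offset 4 in the dump above are ASCII for "TFL3", the FlatBuffer file identifier, which makes a cheap sanity check for a truncated or mis-copied model:

```python
def bytes_to_c_array(data: bytes, name: str = "model_tflite") -> str:
    """Emit a C array equivalent to `xxd -i` output."""
    lines = [f"const unsigned char {name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines)

def looks_like_tflite(data: bytes) -> bool:
    """A .tflite FlatBuffer carries the 'TFL3' identifier at offset 4."""
    return len(data) >= 8 and data[4:8] == b"TFL3"
```

Run the check on your model bytes before converting; `GetModel()` on device gives far less helpful feedback when handed garbage.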
For ESP32-S3 with PSRAM, place the model in external memory to free SRAM for the tensor arena and application code:
```c
__attribute__((section(".ext_ram.bss")))
const unsigned char model_tflite[] = { ... };
```
PSRAM is slower than internal SRAM. Model weights stored in PSRAM are accessed during each inference, adding some latency overhead. However, for many applications this trade-off is worthwhile to free internal SRAM for the tensor arena and application buffers.
The tensor arena is a statically allocated byte array that the interpreter uses as working memory. It holds intermediate activations, temporary buffers, and quantization parameters during inference.
```cpp
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr int kTensorArenaSize = 81920;  // 80 KB — adjust per model
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
```
Sizing the arena: Start with a generous size (80-100 KB). If inference succeeds, reduce by 10 KB at a time until it fails, then add 10% headroom. The interpreter returns kTfLiteError when the arena is too small — with no indication of how much more it needs.
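The trial-and-error loop above can also be framed as a binary search. A host-side sketch of the procedure, where `fits(size)` stands in for "rebuild with this arena size and confirm `AllocateTensors` succeeds on device" (both helper names are hypothetical):

```python
def min_arena_size(fits, lo: int = 0, hi: int = 100 * 1024) -> int:
    """Smallest arena size (bytes) for which `fits` reports success."""
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(mid):
            hi = mid          # still fits: try smaller
        else:
            lo = mid + 1      # too small: grow
    return lo

def with_headroom(size: int, frac: float = 0.10) -> int:
    """Add ~10% safety margin on top of the measured minimum."""
    return int(size * (1 + frac))
```

Each probe costs a rebuild and reflash, so the logarithmic number of trials matters in practice.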
Budget carefully: the base ESP32 has 520 KB of SRAM in total, and the Wi-Fi/Bluetooth stacks, FreeRTOS, and your application buffers all compete with the tensor arena for it.
TFLite Micro uses selective operator registration. Instead of linking all operators (which bloats the binary by 100+ KB), you register only the ops your model actually uses:
```cpp
static tflite::MicroMutableOpResolver<6> resolver;
resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddFullyConnected();
resolver.AddMaxPool2D();
```
The template parameter (`<6>`) sets the resolver's capacity: it must be at least the number of ops you register, and registration fails if you exceed it.
To find which ops your model uses, inspect the .tflite file with flatc or the Netron model viewer. Each registered op adds 1-10 KB to the firmware binary. Skip ops your model does not use.
If you miss an op, the interpreter fails at init with `Didn't find op for builtin opcode`, a clear error that tells you exactly which op to add.
The complete inference setup. Note that TFLite Micro is a C++ library, so this file must be compiled as C++ (e.g. rename it to `main.cc`) with `app_main` exported under C linkage:
```cpp
#include "model_data.h"

// resolver and tensor_arena are the file-scope declarations shown above.

extern "C" void app_main(void) {
  // 1. Load and validate the model
  const tflite::Model* model = tflite::GetModel(model_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    printf("Model schema version mismatch\n");
    return;
  }

  // 2. Create the interpreter
  tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);

  // 3. Allocate tensors
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    printf("AllocateTensors failed — arena too small\n");
    return;
  }

  // 4. Get input tensor pointer
  TfLiteTensor* input = interpreter.input(0);
  printf("Input type: %d, bytes: %zu\n", input->type, input->bytes);

  // 5. Inference loop
  while (true) {
    // Copy sensor data into the input tensor. Match the model's
    // expected input type (int8 or float32); read_sensor_data()
    // is application-specific.
    read_sensor_data(input->data.int8, input->bytes);

    // Run inference
    if (interpreter.Invoke() != kTfLiteOk) {
      printf("Invoke failed\n");
      continue;
    }

    // Read output
    TfLiteTensor* output = interpreter.output(0);
    int8_t prediction = output->data.int8[0];
    printf("Prediction: %d\n", prediction);

    vTaskDelay(pdMS_TO_TICKS(100));
  }
}
```
Input format is critical. If your model expects int8 input (standard for quantized models), you must provide int8 data. Feeding float32 data to an int8 input produces garbage output with no error message. Always check `input->type` after `AllocateTensors()`.
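The int8 conversion itself is simple affine quantization using the `scale` and `zero_point` stored in the tensor's quantization params. The math, sketched in Python (read the real scale and zero point from the tensor on device; the values in the test are made up):

```python
def quantize_int8(x: float, scale: float, zero_point: int) -> int:
    """float -> int8: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_int8(q: int, scale: float, zero_point: int) -> float:
    """int8 -> float: x = scale * (q - zero_point)."""
    return scale * (q - zero_point)
```

Both functions port directly to two lines of C on device; the inverse transform is what turns the int8 output tensor back into a real-valued score.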
```shell
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor
```
The first build takes 2-5 minutes as TFLite Micro compiles from source. Subsequent builds are faster.
| Symptom | Cause | Fix |
|---|---|---|
| `AllocateTensors` failed | Tensor arena too small | Increase `kTensorArenaSize` by 20 KB |
| `Didn't find op for builtin opcode` | Missing operator registration | Check model ops, add to resolver |
| Garbage output values | Input format mismatch | Verify input->type matches your data |
| Crash or watchdog reset | Stack overflow | Increase main task stack size in menuconfig |
| Slow inference (>500 ms) | Float32 model not optimized for MCU pipeline | Use int8 quantized model |
| Linker errors | TFLite component not found | Run idf.py reconfigure after adding dependency |
| Model | ESP32 (int8) | ESP32-S3 (int8 + SIMD) |
|---|---|---|
| Keyword spotting (20 KB) | 30-50 ms | 15-30 ms |
| Anomaly detection (10 KB) | 5-10 ms | 2-5 ms |
| Gesture recognition (30 KB) | 10-25 ms | 5-15 ms |
| Image classification 96x96 (250 KB) | 200-400 ms | 100-200 ms |
Estimated ranges — benchmark on target hardware for production. Performance varies with model architecture and optimization.
The ESP32-S3’s SIMD instructions accelerate int8 operations by roughly 2x compared to the base ESP32. For ML workloads, the S3 is worth the $3-5 price premium.
The ESP32's single-precision FPU handles individual floating-point operations but is not optimized for batch neural network computations. Int8 quantized models run faster because 8-bit integers need a quarter of the memory bandwidth of float32 and map onto cheaper integer instructions. Quantize during conversion, not as an afterthought:
```python
# During model conversion (on your development PC)
converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration samples
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```
Note: The older tflite_convert CLI is deprecated. Use the Python tf.lite.TFLiteConverter API shown above.
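Full int8 quantization requires a `representative_dataset` so the converter can calibrate activation ranges. A sketch of the expected generator structure; the `(1, 96, 96, 1)` shape is a placeholder for your model's real input shape, and real recorded sensor data calibrates far better than the random samples used here:

```python
import numpy as np

def representative_dataset():
    """Yield ~100 float32 samples shaped like the model's input."""
    for _ in range(100):
        sample = np.random.rand(1, 96, 96, 1).astype(np.float32)
        yield [sample]
```

Assign it with `converter.representative_dataset = representative_dataset` before calling `convert()`.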
On the ESP32-S3, place the model array in PSRAM to free internal SRAM for the tensor arena. PSRAM is slower than internal SRAM, and model weights stored there are accessed during each inference. This adds some latency, but frees internal SRAM for the tensor arena and application code — a worthwhile trade-off for larger models.
Each registered operator adds flash footprint. A model using 5 operators adds roughly 15-30 KB to the binary. Audit your model and strip unnecessary ops — TFLite conversion sometimes introduces ops that an optimized model does not actually need.
Run inference on Core 1 while Core 0 handles sensor reading and Wi-Fi. This prevents inference from blocking communication:
```cpp
xTaskCreatePinnedToCore(inference_task, "inference", 8192,
                        NULL, 5, NULL, 1);  // Pin to Core 1
xTaskCreatePinnedToCore(sensor_task, "sensor", 4096,
                        NULL, 4, NULL, 0);  // Pin to Core 0
```
Once your first model runs on ESP32, these follow-up guides go deeper:

- Object detection on ESP32-S3 with TFLite Micro: hardware specs, compatibility analysis, getting started guide, and alternatives.
- Anomaly detection on ESP32 with TFLite Micro: autoencoder setup, sensor integration, and real-time monitoring for industrial applications.
- Anomaly detection on ESP32-C3 with TFLite Micro: cost-effective sensor monitoring with RISC-V and Wi-Fi connectivity.
- Keyword spotting on ESP32-S3 with TFLite Micro: DS-CNN model setup, audio preprocessing, and real-time voice command recognition.
ForestHub is designed to generate TFLite Micro firmware for ESP32 from a visual workflow. No manual operator registration, no tensor arena guessing.