Hardware Guide

ESP32-S3 for Object Detection with TensorFlow Lite Micro

The ESP32-S3 runs quantized object detection models via TFLite Micro at 2-5 FPS. Its 512 KB SRAM and vector instructions handle int8 MobileNet-SSD inference, making it suitable for presence detection, counting, and trigger-based classification tasks.

Hardware Specs

| Spec | ESP32-S3 |
| --- | --- |
| Processor | Dual-core Xtensa LX7 @ 240 MHz |
| SRAM | 512 KB |
| Flash | Up to 16 MB (external) |
| Key Features | Vector instructions (SIMD), USB OTG, LCD/camera interface, up to 8 MB PSRAM |
| Connectivity | Wi-Fi 802.11 b/g/n, Bluetooth 5.0 LE |
| Price Range | $3 - $8 (chip), $10 - $25 (dev board) |

Compatibility: Good

The ESP32-S3 provides 512 KB SRAM against a typical 200-300 KB footprint for quantized MobileNet-SSD v2. The Xtensa LX7 vector instructions accelerate int8 multiply-accumulate operations by roughly 2x compared to the original ESP32. Flash is not a constraint at up to 16 MB external. The bottleneck is inference speed: expect 2-5 FPS with QVGA (320x240) input, which rules out real-time tracking but works for occupancy counting and trigger-based detection. TFLite Micro has first-class ESP-IDF support via the official tflite-micro-esp-examples repository. Camera input requires an OV2640 or OV5640 module connected via the DVP interface; the ESP32-S3-EYE dev board includes one out of the box.
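To see why 512 KB rates as "good" rather than roomy, here is a quick budget check. It assumes the upper-end 300 KB arena figure quoted above and a QVGA RGB565 frame buffer held in internal SRAM; both are rough planning numbers, not measurements:

```python
# Back-of-envelope SRAM budget for the figures quoted above (all sizes in KB).
SRAM_KB = 512                      # ESP32-S3 internal SRAM
ARENA_KB = 300                     # quantized MobileNet-SSD v2, upper end of range
FRAME_KB = 320 * 240 * 2 // 1024   # QVGA frame in RGB565 (2 bytes/pixel) -> 150 KB

headroom_kb = SRAM_KB - ARENA_KB - FRAME_KB
print(headroom_kb)                 # prints 62 -> ~62 KB left for stacks, Wi-Fi, app state
```

In practice the frame buffer is often placed in external PSRAM instead, which frees most of that margin for the tensor arena and application logic.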

Getting Started

  1. Set up ESP-IDF v5.1+

    Install Espressif's development framework with the ESP32-S3 target. Use the VS Code extension or follow the manual installation guide at espressif.github.io/esp-idf, then run idf.py set-target esp32s3 to select the chip.

  2. Add TFLite Micro as an ESP-IDF component

    Clone the tflite-micro-esp-examples repository into your project's components/ directory. This includes pre-built TFLite Micro with ESP-NN optimizations for the Xtensa architecture.

  3. Prepare a quantized detection model

    Use TensorFlow's post-training int8 quantization on a MobileNet-SSD v2 model. Target output size under 300 KB. Convert with tflite_convert and verify operator compatibility with the Micro interpreter.

  4. Connect the camera and flash the model to the device

    Wire an OV2640 camera module via the DVP interface, or use the ESP32-S3-EYE which has one built in. Convert the .tflite model to a C array using xxd -i and include it in your firmware build.
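The conversion in step 3 can also be done with the TensorFlow Lite Python API rather than the tflite_convert CLI. A minimal sketch, assuming you have a SavedModel export of MobileNet-SSD v2 and an iterable of preprocessed representative camera frames (saved_model_dir and rep_images are placeholders):

```python
# Hedged sketch of post-training full-integer quantization with the
# TensorFlow Lite converter. TensorFlow is imported inside the function
# so the module loads even where TF is not installed.

def convert_to_int8(saved_model_dir, rep_images):
    import tensorflow as tf

    def representative_dataset():
        # A few hundred frames matching the deployment camera are typical.
        for img in rep_images:
            yield [img]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force integer-only ops so the Micro interpreter never needs float fallback.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # bytes of the .tflite flatbuffer
```

Check the resulting file size against the 300 KB target and run it through the Micro interpreter on the host before flashing; SSD post-processing ops are a common source of unsupported-operator errors.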
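For step 4, if xxd is unavailable, the same C-array conversion takes a few lines of Python. A sketch; the symbol name g_model and the alignas(16) qualifier follow common TFLite Micro conventions but are assumptions here, not required names:

```python
# Python equivalent of `xxd -i` for embedding a .tflite model in firmware.
def tflite_to_c_array(data: bytes, name: str = "g_model") -> str:
    lines = [f"alignas(16) const unsigned char {name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines)

# Example with dummy bytes; in practice pass open("model.tflite", "rb").read().
print(tflite_to_c_array(b"\x1c\x00\x00\x00TFL3"))
```

The alignment qualifier matters because the flatbuffer is read in place; compile the generated file as C++ (or add stdalign.h for C11) so alignas is available.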


FAQ

Can the ESP32-S3 run TensorFlow Lite for object detection?
Yes. The ESP32-S3 has 512 KB SRAM and vector instructions that accelerate int8 neural network inference. It runs quantized MobileNet-SSD models at 2-5 FPS with a connected camera module like the OV2640.
What camera module works best with ESP32-S3 for object detection?
The OV2640 is a widely supported camera module for ESP32-S3 boards. For higher image quality, the OV5640 is supported on boards like the ESP32-S3-EYE. Both connect via the DVP camera interface.
How much RAM does object detection need on ESP32-S3?
A quantized MobileNet-SSD v2 model needs roughly 200-300 KB for the model plus inference buffers. The ESP32-S3's 512 KB SRAM handles this with headroom for the application logic and camera frame buffer.

Build This Detection Pipeline in ForestHub

Deploy object detection to the ESP32-S3 visually: a drag-and-drop workflow compiles to optimized C firmware.
