Hardware Guide
ESP32-S3 for Object Detection with TensorFlow Lite Micro
The ESP32-S3 runs quantized object detection models via TFLite Micro at roughly 2-5 FPS. Its 512 KB of SRAM and vector instructions handle int8 MobileNet-SSD inference, which makes it suitable for presence detection, counting, and trigger-based classification tasks.
Hardware Specs
| Spec | ESP32-S3 |
|---|---|
| Processor | Dual-core Xtensa LX7 @ 240 MHz |
| SRAM | 512 KB |
| Flash | Up to 16 MB (external) |
| Key Features | Vector instructions (SIMD), USB OTG, LCD/Camera interface, Up to 8 MB PSRAM |
| Connectivity | Wi-Fi 802.11 b/g/n, Bluetooth 5.0 LE |
| Price Range | $3 - $8 (chip), $10 - $25 (dev board) |
Compatibility
The ESP32-S3 provides 512 KB of SRAM against a typical 200-300 KB footprint for a quantized MobileNet-SSD v2 model. The Xtensa LX7 vector instructions accelerate int8 multiply-accumulate operations by roughly 2x compared to the original ESP32. Flash is not a constraint, with up to 16 MB available externally. The bottleneck is inference speed: expect 2-5 FPS with QVGA (320x240) input, which rules out real-time tracking but works for occupancy counting and trigger-based detection.
TFLite Micro has first-class ESP-IDF support via the official tflite-micro-esp-examples repository. Camera input requires an OV2640 or OV5640 module connected via the DVP interface; the ESP32-S3-EYE dev board includes a camera out of the box.
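To make the int8 multiply-accumulate concrete, here is a minimal Python sketch of the affine quantization scheme that int8 TFLite models rely on, where a real value is approximated as scale * (q - zero_point). The scales, zero points, and sample values are made up for illustration, and the helper names are hypothetical. On the device the equivalent loop runs inside optimized kernels rather than in Python.

```python
def quantize(values, scale, zero_point):
    """Map floats to int8 using real ~= scale * (q - zero_point)."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        out.append(max(-128, min(127, q)))  # clamp to the int8 range
    return out


def int8_dot(qa, za, qb, zb):
    """Integer dot product; the accumulator would be int32 on the device."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += (a - za) * (b - zb)
    return acc


# Hypothetical quantization parameters (not taken from a real model).
act_scale, act_zp = 0.02, -128   # activations, asymmetric
w_scale, w_zp = 0.005, 0         # weights, symmetric

activations = [0.0, 0.5, 1.0, 2.0]
weights = [0.1, -0.2, 0.3, 0.05]

qa = quantize(activations, act_scale, act_zp)
qw = quantize(weights, w_scale, w_zp)

acc = int8_dot(qa, act_zp, qw, w_zp)
approx = acc * act_scale * w_scale   # rescale the integer result to a float
exact = sum(a * w for a, w in zip(activations, weights))

print(qa, qw, acc, approx, exact)
```

The inner loop only touches small integers, which is the part that vector instructions can process several lanes at a time. The single floating-point rescale happens once per output value.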
Getting Started
1. Set up ESP-IDF v5.1+
   Install Espressif's development framework with the ESP32-S3 target, using either the VS Code extension or the manual installation described at espressif.github.io/esp-idf. Then run idf.py set-target esp32s3 in your project directory.
2. Add TFLite Micro as an ESP-IDF component
   Clone the tflite-micro-esp-examples repository into your project's components/ directory. It includes a pre-built TFLite Micro with ESP-NN optimizations for the Xtensa architecture.
3. Prepare a quantized detection model
   Apply TensorFlow's post-training int8 quantization to a MobileNet-SSD v2 model, targeting an output size under 300 KB. Convert with tflite_convert (or the tf.lite.TFLiteConverter Python API, which accepts the representative dataset that full-integer quantization needs), then verify that every operator in the model is supported by the Micro interpreter.
4. Connect camera and flash model to device
   Wire an OV2640 camera module via the DVP interface, or use the ESP32-S3-EYE, which has one built in. Convert the .tflite model to a C array using xxd -i and include it in your firmware build.
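Where xxd is not available (for example on a stock Windows machine), the model-to-C-array step can be approximated in a few lines of Python. This is a sketch that mimics the general shape of xxd -i output, not a replacement for it. The names to_c_array, convert_file, and g_model, and the file names in the comment, are hypothetical.

```python
from pathlib import Path


def to_c_array(data: bytes, name: str = "g_model", per_line: int = 12) -> str:
    """Render bytes as a C source snippet similar to `xxd -i` output."""
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(rows)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


def convert_file(src: str, dst: str, name: str = "g_model") -> None:
    """Read a binary file and write the generated C source next to it."""
    Path(dst).write_text(to_c_array(Path(src).read_bytes(), name))


# In-memory demo with placeholder bytes; a real run would look like
# convert_file("model.tflite", "model_data.cc").
demo = to_c_array(bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33]), per_line=4)
print(demo)
```

Unlike plain xxd -i, the sketch marks the array const, on the assumption that the toolchain then keeps it in flash rather than copying it to RAM. TFLite Micro examples usually also add an alignment specifier such as alignas(16) to the array, which you may want to add by hand.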
Alternatives
STM32H7 with TFLite Micro
1 MB of SRAM and a 480 MHz Cortex-M7 deliver higher-resolution detection at faster inference speeds, but there is no built-in Wi-Fi and boards cost 3-4x as much.
ESP32-S3 with Edge Impulse
Same hardware, but Edge Impulse handles model training, quantization, and deployment in one pipeline. Easier onboarding, less control over the model.
FAQ
- Can the ESP32-S3 run TensorFlow Lite for object detection?
- Yes. The ESP32-S3 has 512 KB SRAM and vector instructions that accelerate int8 neural network inference. It runs quantized MobileNet-SSD models at 2-5 FPS with a connected camera module like the OV2640.
- What camera module works best with ESP32-S3 for object detection?
- The OV2640 is a widely supported camera module for ESP32-S3 boards. For higher image quality, the OV5640 is supported on boards like the ESP32-S3-EYE. Both connect via the DVP camera interface.
- How much RAM does object detection need on ESP32-S3?
- A quantized MobileNet-SSD v2 model needs roughly 200-300 KB for the model plus inference buffers. That fits within the ESP32-S3's 512 KB of SRAM and leaves the remainder for application logic and a camera frame buffer. Larger frame buffers can be placed in the optional PSRAM.
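The RAM answer can be checked with some quick arithmetic. The sketch below assumes the whole 512 KB is available to the application, which is optimistic because the RTOS, the Wi-Fi stack, and driver buffers also take a share. It uses the 200-300 KB footprint quoted above and a single QVGA frame buffer. The pixel formats (2 bytes per pixel for RGB565, 1 for grayscale) are assumptions about how the camera is configured.

```python
KB = 1024
SRAM_BYTES = 512 * KB  # total on-chip SRAM quoted in the specs table


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame buffer in bytes."""
    return width * height * bytes_per_pixel


def remaining_sram(model_footprint_kb, frame_buffer_bytes):
    """SRAM left after the model footprint and one frame buffer."""
    return SRAM_BYTES - model_footprint_kb * KB - frame_buffer_bytes


qvga_rgb565 = frame_bytes(320, 240, 2)  # 150 KB
qvga_gray = frame_bytes(320, 240, 1)    # 75 KB

for footprint_kb in (200, 300):
    for label, fb in (("RGB565", qvga_rgb565), ("grayscale", qvga_gray)):
        left = remaining_sram(footprint_kb, fb)
        print(f"{footprint_kb} KB model + QVGA {label}: {left // KB} KB left")
```

At the upper end of the footprint range, a full-color QVGA frame leaves only about 62 KB under these assumptions, which is why moving the frame buffer to PSRAM or capturing in grayscale is worth considering.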
Orchestrate Vision AI Agents with ForestHub
Run detection on-device; ForestHub on your Linux edge gateway orchestrates the agents, ingests results over MQTT, and acts on the line — a deterministic, auditable graph.