Hardware Guide
The STM32H7 is one of the most capable MCUs for on-device object detection. Its 1 MB SRAM, 480 MHz Cortex-M7, and L1 cache run quantized MobileNet-SSD models at 5-15 FPS with CMSIS-NN acceleration — fast enough for real-time counting and tracking applications.
| Spec | STM32H7 |
|---|---|
| Processor | ARM Cortex-M7 @ 480 MHz |
| SRAM | 1024 KB |
| Flash | 2 MB |
| Key Features | Double-precision FPU, L1 cache (16 KB I + 16 KB D), JPEG codec, Chrom-ART Accelerator (DMA2D) |
| Connectivity | Ethernet, USB OTG HS/FS |
| Price Range | $8 - $20 (chip), $30 - $80 (dev board) |
The STM32H7 provides 1024 KB of SRAM: four times the 256 KB minimum for object detection and double what the ESP32-S3 offers. The Cortex-M7 at 480 MHz with L1 cache (16 KB instruction + 16 KB data) significantly reduces memory access latency during inference, and the CMSIS-NN optimized kernels in TFLite Micro make convolution and pooling operations significantly faster than the generic reference implementations. The double-precision FPU handles any floating-point preprocessing efficiently, and the DMA2D (Chrom-ART) accelerator can take over image scaling and color conversion, freeing the CPU for inference. STMicroelectronics' X-CUBE-AI tool can further optimize TFLite models for STM32, but raw TFLite Micro with CMSIS-NN already delivers strong performance. The main limitation is connectivity: there is no built-in Wi-Fi or BLE, so external modules (ESP-AT, ATWINC1500) add cost and complexity.
Set up STM32CubeIDE with TFLite Micro
Install STM32CubeIDE and create a project for your STM32H7 board (e.g., NUCLEO-H743ZI or STM32H7B3I-DK). Add TFLite Micro as a library — use the CMSIS-NN backend for Cortex-M7 optimized kernels.
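A minimal TFLite Micro bring-up might look like the sketch below. The model array name `g_detect_model`, the arena size, and the op list are assumptions for illustration, and the `MicroInterpreter` constructor signature varies slightly between TFLite Micro versions:

```cpp
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model flatbuffer exported as a C array (name is an assumption for this sketch)
extern const unsigned char g_detect_model[];

constexpr size_t kArenaSize = 400 * 1024;   // sized for an int8 SSD-class model
static uint8_t tensor_arena[kArenaSize] __attribute__((aligned(16)));

tflite::MicroInterpreter *setup_interpreter() {
    const tflite::Model *model = tflite::GetModel(g_detect_model);

    // Register only the ops the model actually uses; the CMSIS-NN kernel
    // variants are selected when TFLM is built with the cmsis_nn option.
    static tflite::MicroMutableOpResolver<4> resolver;
    resolver.AddConv2D();
    resolver.AddDepthwiseConv2D();
    resolver.AddReshape();
    resolver.AddLogistic();

    static tflite::MicroInterpreter interpreter(
        model, resolver, tensor_arena, kArenaSize);
    if (interpreter.AllocateTensors() != kTfLiteOk) {
        return nullptr;                     // arena too small for this model
    }
    return &interpreter;
}
```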
Configure camera input via DCMI
Connect an OV5640 or OV7725 camera module to the STM32H7's DCMI (Digital Camera Interface). Configure DMA transfer to write frames directly to SRAM. The H7B3-DK board includes a camera connector and LCD for live preview.
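The DCMI-to-SRAM path can be sketched with the STM32 HAL as below; `hdcmi` is the CubeMX-generated handle, and the buffer placement is an assumption for this sketch. Note that `HAL_DCMI_Start_DMA` takes its length in 32-bit words:

```cpp
#include "stm32h7xx_hal.h"

#define FRAME_W 320
#define FRAME_H 240

// RGB565 frame buffer (~150 KB) that the DCMI DMA writes into
static uint16_t frame_buf[FRAME_W * FRAME_H];

extern DCMI_HandleTypeDef hdcmi;   // generated by CubeMX

void camera_start(void) {
    // Continuous capture; length is in 32-bit words, hence the divide by 4
    HAL_DCMI_Start_DMA(&hdcmi, DCMI_MODE_CONTINUOUS,
                       (uint32_t)frame_buf,
                       (FRAME_W * FRAME_H * 2u) / 4u);
}

// Invoked by the HAL from the DCMI IRQ once a complete frame is in SRAM
void HAL_DCMI_FrameEventCallback(DCMI_HandleTypeDef *hdcmi_cb) {
    (void)hdcmi_cb;
    // signal the inference task that frame_buf holds a full frame
}
```

In practice you would double-buffer (capture into one frame while inference reads the other) to avoid tearing.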
Prepare and quantize the detection model
Train or download a MobileNet-SSD v2 model and apply int8 quantization. On the STM32H7, you have headroom for larger models — up to 500 KB comfortably. Use STM32Cube.AI's model analyzer to verify RAM and Flash requirements before flashing.
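STM32Cube.AI's analyzer gives a static estimate; on hardware you can also confirm the real footprint after allocation. `arena_used_bytes()` is part of the TFLite Micro `MicroInterpreter` API; the arena size here is an illustrative starting guess:

```cpp
#include <cstdio>
#include "tensorflow/lite/micro/micro_interpreter.h"

// Illustrative upper bound; trim it down once the real usage is known
constexpr size_t kArenaSize = 600 * 1024;

void report_arena_usage(tflite::MicroInterpreter &interpreter) {
    // Call only after AllocateTensors() has returned kTfLiteOk
    std::printf("arena used: %u of %u bytes\n",
                static_cast<unsigned>(interpreter.arena_used_bytes()),
                static_cast<unsigned>(kArenaSize));
}
```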
Run inference with CMSIS-NN acceleration
The TFLite Micro interpreter automatically uses CMSIS-NN kernels on Cortex-M7. Allocate the tensor arena in the H7's DTCM or AXI SRAM for optimal access speed. Benchmark inference time on hardware — timing varies significantly with model size and input resolution.
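Arena placement and on-target timing can be sketched with the Cortex-M7 DWT cycle counter. The `.dtcm` section name is an assumption that must match your linker script, and 480 MHz is the assumed core clock:

```cpp
#include <cstdint>
#include "stm32h7xx.h"
#include "tensorflow/lite/micro/micro_interpreter.h"

// DTCM on the H743 is 128 KB with zero-wait-state access from the M7 core.
// The ".dtcm" section name is an assumption -- define it in the linker script.
static uint8_t tensor_arena[96 * 1024]
    __attribute__((section(".dtcm"), aligned(16)));

static void cycle_counter_init(void) {
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   // enable the DWT block
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;             // start the cycle counter
}

uint32_t timed_invoke_us(tflite::MicroInterpreter &interpreter) {
    uint32_t start = DWT->CYCCNT;
    interpreter.Invoke();                             // CMSIS-NN kernels run here
    return (DWT->CYCCNT - start) / 480u;              // cycles @ 480 MHz -> us
}
```

Call `cycle_counter_init()` once at startup, then average `timed_invoke_us()` over many frames; a single measurement can be skewed by cache warm-up.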
ESP32-S3: built-in Wi-Fi and BLE at roughly a third of the cost. Its 512 KB of SRAM runs smaller models at 2-5 FPS. Better for IoT-connected detection where connectivity matters more than speed.
End-to-end pipeline with Edge Impulse's FOMO architecture: easier development and built-in Wi-Fi, with a lower performance ceiling but faster time to deployment.
Deploy object detection to STM32H7 visually — compile camera input to inference output as optimized C firmware.
Get Started Free