Hardware Guide

STM32H7 for Object Detection with TensorFlow Lite Micro

The STM32H7 is one of the most capable MCUs for on-device object detection. Its 1 MB SRAM, 480 MHz Cortex-M7, and L1 cache run quantized MobileNet-SSD models at 5-15 FPS with CMSIS-NN acceleration — fast enough for real-time counting and tracking applications.

Hardware Specs

Processor:     ARM Cortex-M7 @ 480 MHz
SRAM:          1024 KB
Flash:         2 MB
Key Features:  Double-precision FPU, L1 cache (16 KB I + 16 KB D), JPEG codec, Chrom-ART Accelerator (DMA2D)
Connectivity:  Ethernet, USB OTG HS/FS
Price Range:   $8 - $20 (chip), $30 - $80 (dev board)

Compatibility: Excellent

The STM32H7 provides 1024 KB of SRAM, four times the 256 KB minimum for object detection and twice what the ESP32-S3 offers. The Cortex-M7 at 480 MHz with L1 cache (16 KB instruction + 16 KB data) significantly reduces memory access latency during inference, and the CMSIS-NN optimized kernels in TFLite Micro run convolution and pooling operations significantly faster than the generic reference implementations. The double-precision FPU handles any floating-point preprocessing efficiently, and the DMA2D (Chrom-ART) accelerator can take over image scaling and color conversion, freeing the CPU for inference.

STMicroelectronics' X-CUBE-AI tool can further optimize TFLite models for STM32, but raw TFLite Micro with CMSIS-NN already delivers strong performance. The main limitation is connectivity: there is no built-in Wi-Fi or BLE, so external modules (ESP-AT, ATWINC1500) add cost and complexity.

Getting Started

  1. Set up STM32CubeIDE with TFLite Micro

    Install STM32CubeIDE and create a project for your STM32H7 board (e.g., STM32H743-NUCLEO or STM32H7B3-DK). Add TFLite Micro as a library — use the CMSIS-NN backend for Cortex-M7 optimized kernels.

  2. Configure camera input via DCMI

    Connect an OV5640 or OV7725 camera module to the STM32H7's DCMI (Digital Camera Interface). Configure DMA transfer to write frames directly to SRAM. The H7B3-DK board includes a camera connector and LCD for live preview.

  3. Prepare and quantize the detection model

    Train or download a MobileNet-SSD v2 model and apply int8 quantization. On the STM32H7, you have headroom for larger models — up to 500 KB comfortably. Use STM32Cube.AI's model analyzer to verify RAM and Flash requirements before flashing.

  4. Run inference with CMSIS-NN acceleration

    The TFLite Micro interpreter automatically uses CMSIS-NN kernels on Cortex-M7. Allocate the tensor arena in the H7's DTCM or AXI SRAM for optimal access speed. Benchmark inference time on hardware — timing varies significantly with model size and input resolution.

FAQ

Why choose STM32H7 over ESP32-S3 for object detection?
The STM32H7 has 2x the SRAM (1 MB vs 512 KB), 2x the clock speed (480 vs 240 MHz), and L1 cache that reduces memory bottlenecks. This enables higher resolution input, larger models, and 3-5x faster inference. Choose the STM32H7 when detection speed and accuracy matter more than wireless connectivity.
What FPS can the STM32H7 achieve for object detection?
With a quantized MobileNet-SSD v2 at QVGA resolution (320x240), the STM32H7 achieves 5-15 FPS depending on model complexity and CMSIS-NN optimization level. Lightweight detection models like FOMO or a pruned MobileNet-SSD reach higher frame rates; benchmark with your specific model. The DMA2D accelerator offloads pixel copy, fill, and color conversion, so preprocessing does not cut into the inference budget.
Does the STM32H7 support TFLite Micro natively?
Yes. TFLite Micro includes CMSIS-NN optimized kernels that target ARM Cortex-M7 specifically. STMicroelectronics also provides X-CUBE-AI, which can further optimize TFLite models for STM32 hardware. Both approaches work — X-CUBE-AI offers better performance, TFLite Micro offers broader model compatibility.

Build Detection Pipelines in ForestHub

Deploy object detection to STM32H7 visually — compile camera input to inference output as optimized C firmware.

Get Started Free