Hardware Guide
The STM32H7 is one of the most capable MCUs for on-device object detection. Its 1 MB SRAM, 480 MHz Cortex-M7, and L1 cache run quantized MobileNet-SSD models at 5-15 FPS with CMSIS-NN acceleration — fast enough for real-time counting and tracking applications.
| Spec | STM32H7 |
|---|---|
| Processor | ARM Cortex-M7 @ 480 MHz |
| SRAM | 1024 KB |
| Flash | 2 MB |
| Key Features | Double-precision FPU, L1 cache (16 KB I + 16 KB D), JPEG codec, Chrom-ART Accelerator (DMA2D) |
| Connectivity | Ethernet, USB OTG HS/FS |
| Price Range | $8 - $20 (chip), $30 - $80 (dev board) |
The STM32H7 provides 1024 KB of SRAM: four times the 256 KB minimum for object detection and double what the ESP32-S3 offers. The Cortex-M7 at 480 MHz with L1 cache (16 KB instruction + 16 KB data) significantly reduces memory access latency during inference, and the CMSIS-NN optimized kernels in TFLite Micro make convolution and pooling operations significantly faster than the generic reference implementations. The double-precision FPU handles any floating-point preprocessing efficiently, and the DMA2D (Chrom-ART) accelerator can take over image scaling and color conversion, freeing the CPU for inference. STMicroelectronics' X-CUBE-AI tool can further optimize TFLite models for STM32, but raw TFLite Micro with CMSIS-NN already delivers strong performance. The main limitation is connectivity: there is no built-in Wi-Fi or BLE, so external modules (ESP-AT, ATWINC1500) add cost and complexity.
Set up STM32CubeIDE with TFLite Micro
Install STM32CubeIDE and create a project for your STM32H7 board (e.g., NUCLEO-H743ZI or STM32H7B3I-DK). Add TFLite Micro as a library — use the CMSIS-NN backend for Cortex-M7 optimized kernels.
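A minimal TFLite Micro bring-up might look like the sketch below. The model array name `g_detect_model`, the arena size, and the op list are assumptions for illustration, and the `MicroInterpreter` constructor signature varies slightly between TFLite Micro versions:

```cpp
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Model flatbuffer exported as a C array (name is an assumption for this sketch)
extern const unsigned char g_detect_model[];

constexpr size_t kArenaSize = 400 * 1024;   // sized for an int8 SSD-class model
static uint8_t tensor_arena[kArenaSize] __attribute__((aligned(16)));

tflite::MicroInterpreter *setup_interpreter() {
    const tflite::Model *model = tflite::GetModel(g_detect_model);

    // Register only the ops the model actually uses; the CMSIS-NN kernel
    // variants are selected when TFLM is built with the cmsis_nn option.
    static tflite::MicroMutableOpResolver<4> resolver;
    resolver.AddConv2D();
    resolver.AddDepthwiseConv2D();
    resolver.AddReshape();
    resolver.AddLogistic();

    static tflite::MicroInterpreter interpreter(
        model, resolver, tensor_arena, kArenaSize);
    if (interpreter.AllocateTensors() != kTfLiteOk) {
        return nullptr;                     // arena too small for this model
    }
    return &interpreter;
}
```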
Configure camera input via DCMI
Connect an OV5640 or OV7725 camera module to the STM32H7's DCMI (Digital Camera Interface). Configure DMA transfer to write frames directly to SRAM. The H7B3-DK board includes a camera connector and LCD for live preview.
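The DCMI-to-SRAM path can be sketched with the STM32 HAL as below; `hdcmi` is the CubeMX-generated handle, and the buffer placement is an assumption for this sketch. Note that `HAL_DCMI_Start_DMA` takes its length in 32-bit words:

```cpp
#include "stm32h7xx_hal.h"

#define FRAME_W 320
#define FRAME_H 240

// RGB565 frame buffer (~150 KB) that the DCMI DMA writes into
static uint16_t frame_buf[FRAME_W * FRAME_H];

extern DCMI_HandleTypeDef hdcmi;   // generated by CubeMX

void camera_start(void) {
    // Continuous capture; length is in 32-bit words, hence the divide by 4
    HAL_DCMI_Start_DMA(&hdcmi, DCMI_MODE_CONTINUOUS,
                       (uint32_t)frame_buf,
                       (FRAME_W * FRAME_H * 2u) / 4u);
}

// Invoked by the HAL from the DCMI IRQ once a complete frame is in SRAM
void HAL_DCMI_FrameEventCallback(DCMI_HandleTypeDef *hdcmi_cb) {
    (void)hdcmi_cb;
    // signal the inference task that frame_buf holds a full frame
}
```

In practice you would double-buffer (capture into one frame while inference reads the other) to avoid tearing.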
Prepare and quantize the detection model
Train or download a MobileNet-SSD v2 model and apply int8 quantization. On the STM32H7, you have headroom for larger models — up to 500 KB comfortably. Use STM32Cube.AI's model analyzer to verify RAM and Flash requirements before flashing.
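STM32Cube.AI's analyzer gives a static estimate; on hardware you can also confirm the real footprint after allocation. `arena_used_bytes()` is part of the TFLite Micro `MicroInterpreter` API; the arena size here is an illustrative starting guess:

```cpp
#include <cstdio>
#include "tensorflow/lite/micro/micro_interpreter.h"

// Illustrative upper bound; trim it down once the real usage is known
constexpr size_t kArenaSize = 600 * 1024;

void report_arena_usage(tflite::MicroInterpreter &interpreter) {
    // Call only after AllocateTensors() has returned kTfLiteOk
    std::printf("arena used: %u of %u bytes\n",
                static_cast<unsigned>(interpreter.arena_used_bytes()),
                static_cast<unsigned>(kArenaSize));
}
```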
Run inference with CMSIS-NN acceleration
The TFLite Micro interpreter automatically uses CMSIS-NN kernels on Cortex-M7. Allocate the tensor arena in the H7's DTCM or AXI SRAM for optimal access speed. Benchmark inference time on hardware — timing varies significantly with model size and input resolution.
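Arena placement and on-target timing can be sketched with the Cortex-M7 DWT cycle counter. The `.dtcm` section name is an assumption that must match your linker script, and 480 MHz is the assumed core clock:

```cpp
#include <cstdint>
#include "stm32h7xx.h"
#include "tensorflow/lite/micro/micro_interpreter.h"

// DTCM on the H743 is 128 KB with zero-wait-state access from the M7 core.
// The ".dtcm" section name is an assumption -- define it in the linker script.
static uint8_t tensor_arena[96 * 1024]
    __attribute__((section(".dtcm"), aligned(16)));

static void cycle_counter_init(void) {
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   // enable the DWT block
    DWT->CYCCNT = 0;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;             // start the cycle counter
}

uint32_t timed_invoke_us(tflite::MicroInterpreter &interpreter) {
    uint32_t start = DWT->CYCCNT;
    interpreter.Invoke();                             // CMSIS-NN kernels run here
    return (DWT->CYCCNT - start) / 480u;              // cycles @ 480 MHz -> us
}
```

Call `cycle_counter_init()` once at startup, then average `timed_invoke_us()` over many frames; a single measurement can be skewed by cache warm-up.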
ESP32-S3: built-in Wi-Fi and BLE at roughly a third of the cost. Its 512 KB of SRAM runs smaller models at 2-5 FPS. Better for IoT-connected detection where connectivity matters more than speed.
End-to-end pipeline with Edge Impulse's FOMO architecture: easier development and built-in Wi-Fi, with a lower performance ceiling but faster time to deployment.
Deploy object detection to STM32H7 visually — compile camera input to inference output as optimized C firmware.
Get Started Free