Dhrystone Benchmarking on MCUs
The Dhrystone benchmark application for Stratify OS is up and running. Dhrystone is a simple benchmark that is designed to mimic a typical program. It characterizes the performance of both the compiler and the processor (and in this case the operating system). The result of the program is a number called DMIPS: the number of Dhrystone iterations completed per second divided by 1757, which is the number of Dhrystones per second that the VAX 11/780, a nominal 1 MIPS machine, could run.
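As a rough illustration of that definition, here is a minimal sketch of the arithmetic. The function name and the run count and timing in the example are hypothetical and are not taken from the Stratify OS benchmark application.

```python
# Dhrystones per second achieved by the 1 MIPS reference machine (VAX 11/780).
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(runs: int, elapsed_seconds: float) -> float:
    """Convert a timed Dhrystone run into a DMIPS score."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    dhrystones_per_second = runs / elapsed_seconds
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


# Hypothetical measurement: 1,757,000 iterations in 2.0 seconds.
score = dmips(1_757_000, 2.0)
print(f"{score:.1f} DMIPS")  # prints "500.0 DMIPS"
```

A faster processor completes more iterations per second, so a higher DMIPS score is better.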
I ran the benchmark on a variety of processors, in every case inside of Stratify OS. Several factors therefore pull the scores below the maximum DMIPS figures advertised by chip manufacturers.
- The application was compiled using -mlong-calls, which has a 5% to 10% penalty.
- The kernel (including libc) was compiled with -Os rather than being purely speed optimized.
- In some cases, I may have just made an error on the cache/flash settings that is causing degraded performance. I will update this post if I find any issues.
- The application is running with the OS. Some processor time is used by context switching and by providing metrics over USB. This takes away another 10% or so, as shown by the following task analysis from `sl app.run:path=dhrystone,terminal task.analyze`. The output was taken from a run on the STM32H743 but is roughly applicable to all tests.
summary:
dhrystone-1.2:
info:
name: dhrystone
id: 2
pid: 1
memorySize: 32768
stack: 0x2407FCF0
heap: 0x24078000
activity:
cpuTime: 492453363
cpuUsage: 92.787010%
maximumStack: 808
maximumHeap: 12696
maximumMemoryUsage: 41.21%
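To put the cpuUsage figure in perspective, the sketch below estimates how much CPU time went to everything other than the benchmark, and what the score might look like if the benchmark had the whole CPU. This is a first-order estimate that assumes the score scales linearly with CPU share; the function names are hypothetical, and the 457 DMIPS input is the cache-on STM32H743ZI result from the table below.

```python
def os_overhead_fraction(cpu_usage_percent: float) -> float:
    """Fraction of CPU time not spent in the benchmark task."""
    return 1.0 - cpu_usage_percent / 100.0


def scale_to_full_cpu(measured_dmips: float, cpu_usage_percent: float) -> float:
    """Estimate the score if the task had 100% of the CPU (linear assumption)."""
    return measured_dmips / (cpu_usage_percent / 100.0)


cpu_usage = 92.787010  # cpuUsage reported by the task analysis above
overhead = os_overhead_fraction(cpu_usage)
estimate = scale_to_full_cpu(457, cpu_usage)

print(f"overhead: {overhead * 100:.1f}%")       # about 7.2%
print(f"estimated score: {estimate:.0f} DMIPS")  # roughly 493
```

The estimate ignores second-order effects such as cache pollution caused by context switches, so treat it as a ballpark only.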
The Results
Processor | Clock | DMIPS | DMIPS/MHz | Notes |
---|---|---|---|---|
RT1052 | 525MHz | 675 DMIPS | 1.29 DMIPS/MHz | Running from external SDRAM with cache on |
RT1052 | 525MHz | 26 DMIPS | 0.05 DMIPS/MHz | Running from external SDRAM with cache off |
STM32H743ZI | 400MHz | 457 DMIPS | 1.14 DMIPS/MHz | Running from internal flash/RAM with cache on |
STM32H743ZI | 400MHz | 79 DMIPS | 0.20 DMIPS/MHz | Running from internal flash/RAM with cache off |
STM32F746ZG | 216MHz | 231 DMIPS | 1.07 DMIPS/MHz | Running from internal flash/RAM with cache and flash accelerator on |
STM32F746ZG | 216MHz | 120 DMIPS | 0.55 DMIPS/MHz | Running from internal flash/RAM with cache off and flash accelerator on |
STM32F746ZG | 216MHz | 53 DMIPS | 0.25 DMIPS/MHz | Running from internal flash/RAM with cache and flash accelerator off |
STM32F723ZG | 216MHz | 246 DMIPS | 1.14 DMIPS/MHz | Running from internal flash/RAM with cache on and flash accelerator off |
STM32F723ZG | 216MHz | 53 DMIPS | 0.25 DMIPS/MHz | Running from internal flash/RAM with cache off and flash accelerator off |
STM32F446ZE | 168MHz | 106 DMIPS | 0.63 DMIPS/MHz | Running from internal flash with data/instruction cache on |
STM32F446ZE | 168MHz | 52 DMIPS | 0.31 DMIPS/MHz | Running from internal flash/RAM with data/instruction cache off |
LPC4078 | 120MHz | 85 DMIPS | 0.71 DMIPS/MHz | Running from internal ram |
LPC4078 | 120MHz | 63 DMIPS | 0.53 DMIPS/MHz | Running from internal flash |
LPC1768 | 96MHz | 51 DMIPS | 0.53 DMIPS/MHz | Running from internal flash |
STM32L475VG | 80MHz | 52 DMIPS | 0.65 DMIPS/MHz | Running from internal ram |
STM32L475VG | 80MHz | 50 DMIPS | 0.62 DMIPS/MHz | Running from internal flash |
STM32F411VE | 96MHz | 62 DMIPS | 0.65 DMIPS/MHz | Running from internal flash with data/instruction cache on |
STM32F411VE | 96MHz | 47 DMIPS | 0.49 DMIPS/MHz | Running from internal ram with data/instruction cache off |
STM32F411VE | 96MHz | 39 DMIPS | 0.41 DMIPS/MHz | Running from internal flash with data/instruction cache off |
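The DMIPS/MHz column is simply the DMIPS score divided by the clock frequency in MHz, which makes chips running at different clock speeds comparable. A minimal sketch that recomputes it for three rows copied from the table (the helper name is hypothetical):

```python
# (processor, clock in MHz, measured DMIPS) copied from rows of the table above.
ROWS = [
    ("RT1052", 525, 675),
    ("STM32H743ZI", 400, 457),
    ("LPC4078", 120, 85),
]


def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Normalize a DMIPS score by the core clock frequency."""
    return dmips / clock_mhz


for name, clock_mhz, score in ROWS:
    print(f"{name}: {dmips_per_mhz(score, clock_mhz):.2f} DMIPS/MHz")
```

This prints 1.29, 1.14, and 0.71 DMIPS/MHz respectively, matching the table.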
Things to Notice
- Running code from external SDRAM with the cache on vs. off makes a huge difference on the RT1052: 675 DMIPS versus 26 DMIPS, roughly 26 times faster.
- The cache makes a big difference on the STM32H7/F7 chips (roughly 4x to 6x in the table above) even when running from internal flash and RAM. This is mainly due to the flash running at speeds lower than the CPU.
- The LPC4078 has better performance executing from RAM than from flash (85 vs. 63 DMIPS).
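The observations above can be quantified as simple ratios between pairs of table rows. The sketch below is only illustrative; the labels and the helper name are my own, and the DMIPS values are copied from the results table.

```python
def speedup(fast_dmips: float, slow_dmips: float) -> float:
    """Ratio between two DMIPS scores measured on the same chip."""
    return fast_dmips / slow_dmips


# (faster configuration, slower configuration) in DMIPS, from the results table.
COMPARISONS = {
    "RT1052 SDRAM, cache on vs off": (675, 26),
    "STM32H743ZI, cache on vs off": (457, 79),
    "STM32F746ZG, cache and accelerator on vs both off": (231, 53),
    "LPC4078, RAM vs flash": (85, 63),
}

for label, (fast, slow) in COMPARISONS.items():
    print(f"{label}: {speedup(fast, slow):.1f}x")
```

The ratios come out to roughly 26x, 5.8x, 4.4x, and 1.3x, which shows how much more the cache matters for external SDRAM than for internal memory.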
Last Thing
The image at the top of the page shows the benchmark dashboard from when I finally got the cache working on the STM32H743. To see all the data, you can log in to the Stratify Dashboard and browse to things.