The Dhrystone benchmark application for Stratify OS is up and running. Dhrystone is a simple synthetic benchmark designed to mimic a typical program. It characterizes the performance of both the compiler and the processor (and, in this case, the operating system). The result is a number called DMIPS: the number of Dhrystone iterations completed per second divided by 1757, which is the number of Dhrystones per second achieved by the VAX 11/780, a nominal 1 MIPS machine.
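As a rough sketch of that arithmetic, the snippet below converts a Dhrystone run into a DMIPS score. The function name, the iteration count, and the run time are hypothetical values chosen for illustration; they are not measurements from this post.

```python
# Dhrystones per second achieved by the 1 MIPS reference machine.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(iterations: int, elapsed_seconds: float) -> float:
    """Return the DMIPS score for a run of `iterations` Dhrystone loops."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    dhrystones_per_second = iterations / elapsed_seconds
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


# Hypothetical run: 1,757,000 iterations in 10 seconds.
example_score = dmips(1_757_000, 10.0)
print(f"{example_score:.1f} DMIPS")  # prints "100.0 DMIPS"
```

A faster run (more iterations in the same time) gives a proportionally higher score, which is why the result is normalized against a fixed reference rather than reported as a raw time.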
I ran the benchmark on a variety of processors. Because the benchmark was running inside Stratify OS, several factors reduce the score below the maximum DMIPS figure advertised by chip manufacturers:
- The application was compiled using -mlong-calls, which carries a 5% to 10% penalty.
- The kernel (including libc) was compiled with -Os (optimize for size) rather than being optimized purely for speed.
- In some cases, I may simply have made an error in the cache/flash settings that is degrading performance. I will update this post if I find any issues.
- The application is running alongside the OS. Some processor time is spent on context switching and on providing metrics over USB. This takes away another 10% or so, as shown by the following task analysis from the command
sl app.run:path=dhrystone,terminal task.analyze. The output was taken from a run on the STM32H743 but is roughly applicable to all tests.
    summary:
      dhrystone-1.2:
        info:
          name: dhrystone
          id: 2
          pid: 1
          memorySize: 32768
          stack: 0x2407FCF0
          heap: 0x24078000
        activity:
          cpuTime: 492453363
          cpuUsage: 92.787010%
          maximumStack: 808
          maximumHeap: 12696
          maximumMemoryUsage: 41.21%
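To put a number on the OS overhead, the cpuUsage figure from the task analysis can be turned into the share of CPU time that did not go to the benchmark. This is only a sketch: the function names are hypothetical, treating everything outside the dhrystone task as overhead is a simplifying assumption, and so is the idea that the score scales linearly with the CPU time the task receives.

```python
def overhead_percent(cpu_usage_percent: float) -> float:
    """Share of CPU time not spent in the benchmark task."""
    return 100.0 - cpu_usage_percent


def estimate_without_overhead(measured_dmips: float, cpu_usage_percent: float) -> float:
    """Scale a measured score up as if the task had 100% of the CPU.

    Assumes the score is proportional to CPU time, which is an approximation.
    """
    return measured_dmips / (cpu_usage_percent / 100.0)


cpu_usage = 92.787010  # cpuUsage reported for the STM32H743 run
print(f"overhead: {overhead_percent(cpu_usage):.2f}%")  # prints "overhead: 7.21%"

# 457 DMIPS is the STM32H743ZI cache-on result from this post.
estimate = estimate_without_overhead(457, cpu_usage)
print(f"estimated score without OS overhead: {estimate:.0f} DMIPS")
```

By this measure the overhead on that particular run was a little over 7%, in the same ballpark as the 10% estimate.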
|Chip||Clock||Score||Efficiency||Configuration|
|RT1052||525MHz||675 DMIPS||1.29 DMIPS/MHz||Running from external SDRAM with cache on|
|RT1052||525MHz||26 DMIPS||0.05 DMIPS/MHz||Running from external SDRAM with cache off|
|STM32H743ZI||400MHz||457 DMIPS||1.14 DMIPS/MHz||Running from internal flash/RAM with cache on|
|STM32H743ZI||400MHz||79 DMIPS||0.20 DMIPS/MHz||Running from internal flash/RAM with cache off|
|STM32F746ZG||216MHz||231 DMIPS||1.07 DMIPS/MHz||Running from internal flash/RAM with cache and flash accelerator on|
|STM32F746ZG||216MHz||120 DMIPS||0.55 DMIPS/MHz||Running from internal flash/RAM with cache off and flash accelerator on|
|STM32F746ZG||216MHz||53 DMIPS||0.25 DMIPS/MHz||Running from internal flash/RAM with cache and flash accelerator off|
|STM32F723ZG||216MHz||246 DMIPS||1.14 DMIPS/MHz||Running from internal flash/RAM with cache on and flash accelerator off|
|STM32F723ZG||216MHz||53 DMIPS||0.25 DMIPS/MHz||Running from internal flash/RAM with cache off and flash accelerator off|
|STM32F446ZE||168MHz||106 DMIPS||0.63 DMIPS/MHz||Running from internal flash with data/instruction cache on|
|STM32F446ZE||168MHz||52 DMIPS||0.31 DMIPS/MHz||Running from internal flash/RAM with data/instruction cache off|
|LPC4078||120MHz||85 DMIPS||0.71 DMIPS/MHz||Running from internal RAM|
|LPC4078||120MHz||63 DMIPS||0.53 DMIPS/MHz||Running from internal flash|
|LPC1768||96MHz||51 DMIPS||0.53 DMIPS/MHz||Running from internal flash|
|STM32L475VG||80MHz||52 DMIPS||0.65 DMIPS/MHz||Running from internal RAM|
|STM32L475VG||80MHz||50 DMIPS||0.62 DMIPS/MHz||Running from internal flash|
|STM32F411VE||96MHz||62 DMIPS||0.65 DMIPS/MHz||Running from internal flash with data/instruction cache on|
|STM32F411VE||96MHz||47 DMIPS||0.49 DMIPS/MHz||Running from internal RAM with data/instruction cache off|
|STM32F411VE||96MHz||39 DMIPS||0.41 DMIPS/MHz||Running from internal flash with data/instruction cache off|
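The DMIPS/MHz column is simply the score divided by the clock frequency, which makes chips running at different speeds comparable. A minimal sketch that recomputes it for three rows of the table (the list and variable names are hypothetical):

```python
# (chip, clock in MHz, measured DMIPS) taken from the table above.
results = [
    ("RT1052", 525, 675),
    ("STM32H743ZI", 400, 457),
    ("LPC1768", 96, 51),
]

dmips_per_mhz = {chip: score / mhz for chip, mhz, score in results}

for chip, value in dmips_per_mhz.items():
    print(f"{chip}: {value:.2f} DMIPS/MHz")
# RT1052: 1.29 DMIPS/MHz
# STM32H743ZI: 1.14 DMIPS/MHz
# LPC1768: 0.53 DMIPS/MHz
```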
Things to Notice
- Running code on external SDRAM with cache on vs cache off makes a huge difference (obviously) on the RT1052.
- The cache makes a big difference on the STM32H7/F7 chips even when running from internal flash and RAM. This is mainly due to the flash running at speeds lower than the CPU.
- The LPC4078 has better performance executing from RAM than from flash.
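These observations can be quantified as speedup ratios computed directly from the table. The sketch below uses hypothetical names, and the "slow" figure for each chip is the row with the cache (and, where listed, the flash accelerator) off, or the flash row for the LPC4078.

```python
# chip: (faster configuration DMIPS, slower configuration DMIPS)
pairs = {
    "RT1052 (SDRAM, cache on vs off)": (675, 26),
    "STM32H743ZI (cache on vs off)": (457, 79),
    "STM32F723ZG (cache on vs off)": (246, 53),
    "LPC4078 (RAM vs flash)": (85, 63),
}

speedups = {name: fast / slow for name, (fast, slow) in pairs.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

On these numbers the cache is worth roughly 26x on the RT1052 running from external SDRAM and between 4x and 6x on the STM32H7/F7 parts, while running from RAM instead of flash gains the LPC4078 about 1.3x.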
The image at the top of the page shows the benchmark dashboard from when I finally got the cache working on the STM32H743. To see all the data, you can log in to the Stratify Dashboard and browse to things.