Wednesday, February 2, 2022

IRQ logger - a simple tool for diagnosing interrupts

[NOTE: this post is deprecated. Please use STMViewer instead]

Hey there!

Today's post will be about a debug tool I've been thinking about for a while now:

I was recently designing a part of embedded software where I had to check the interrupts execution times and if their priorities are set correctly. In short, it meant I had to take my oscilloscope probes connect them with wires to the MCU's GPIOs and toggle them in each interrupt routine.  

nobody likes that...

Meh. I thought to myself, now's the time to finally do something about it. I wanted something that will show me the interrupts as logic signals in time, such as a signal is high when the interrupt is active and low when it's not. Also, the time resolution should be small enough to measure a few-microsecond interrupts. At first, I wanted to go for realtime logging like trice or directly using Segger's RRT communication, but these were too slow and added too much overhead which could cause differences in operation when logging is present vs when it's not. Moreover, I actually didn't need to send any sophisticated debug messages, just a timestamp, an ID, and a state (0 or 1). I thought I could just make a buffer in the RAM memory, collect all the data for some time (until the buffer is full), and then send it using DMA to the PC where a python script would interpret it as a time plot. I liked the idea at first but found connecting UART or any other additional cables (besides the STlink) quite annoying and not applicable to my current projects. And this is when it struck me - I could just use STlink and it's CLI to send and receive data from the target - no other communication buses are needed. It even has the -dump option to dump the binary data to a file - excellent. 

So I began by creating a struct of two timestamp variables (one is expressed in milliseconds, the other in us) - both are 2 bytes each, a 1 byte ID and 1 byte state variables. Six bytes in total. I thought about integrating the state in the ID's MSB, but for now, I decided to keep it super simple. This is the struct, with a union to be able to easily represent it as a byte array. 


The packed attribute is to avoid padding. After that, I had to set up a timestamp timer and implement a simple function that adds logs to the buffer. The embedded side of the project is really straightforward, as it was intended. 

This is how you add a log to the buffer

Next, I got to the part where I needed to read the buffer from the MCU's memory and present it as waveforms so that the data is actually useful and readable. My initial plan was to check the variable addresses in the *.elf file, insert them into the python script and use ST-Link CLI (pystlink) to read their values. This however seemed really impractical, especially when the addresses changed during the firmware development process. This is why I decided to use pyelftools for elf file parsing and variable address extraction. You just have to know the variable names and not change them during development. The script will automatically find the address of the "enable" variable (which turns logging on), and the buffer with its size. It will trigger the logging process and then download the buffer to a binary file, which can be viewed in case the target is no longer present. 

The rest is just unpacking the binary data to a list using "struct", and taking care of signal states in time so that they are displayed correctly. I also added two cursors (they still need improvements) to be able to quickly check the time period of a particular signal state. A simple example log is presented below: 


Of course, I wanted to check the performance so I conducted some tests with my scope. First I checked how long it takes to toggle a GPIO (directly using BSRR and BRR registers). The MCU's core clock was set to 170 Mhz during the tests and the optimization was turned off  (-O0) for the user files and set to -O3 for the logger source file (with #pragma GCC push_options directive).

GPIO test:


This way all the future measurements should be reduced by the 36 ns period. 

Since adding a log is the thing that happens very often and takes place in interrupts I started by measuring the time it takes to add a single log. There are two options - the delay it introduces when logging is off (only entering the add_log() function and checking the first condition), and the delay it introduces when logging is turned on (when besides calling the function we are actually writing values to the buffer). 

The first option when logging is off, takes about 200ns:


whereas the case when logging is turned on, takes twice as long with about 420 ns:



I think it's pretty decent, even considering the shortest interrupt routines. I also plan to test the add_log() function implementation as a #define, so that there's no overhead due to function call. I've heard it's not really the best practice and today's compilers are pretty aggressive with inlining functions, but we shall see ;) 

The source code can be found on my github account, together with a short step-by-step guide on how to use it. There's also a docs/photo directory with some scope and logger screenshot comparisons, and a STM32f474CCU6 example code. Please bear in mind I'm not really a Python guy so the code may not be the best quality. 

Hope you'll find it useful!