Limiting TCP/IP RAM usage on STM32
The TCP/IP functionality of a connected device uses dynamic RAM allocation because of the unpredictable nature of network behavior. For example, if a device serves a web dashboard, we cannot control how many clients might connect at the same time. Likewise, if a device communicates with a cloud server, we may not know in advance how large the exchanged messages will be.
Therefore, limiting the amount of RAM used by the TCP/IP stack improves the device’s security and reliability, ensuring it remains responsive and does not crash due to insufficient memory.
Watch the video below to see how RAM limiting is implemented in practice using the Mongoose embedded TCP/IP stack.
Microcontroller RAM overview
On microcontrollers, available memory commonly resides in several non-contiguous regions. Each of these regions can have different cache characteristics, performance levels, or power properties, and certain peripheral controllers may only support DMA operations to specific memory areas.
STM32H723ZG RAM regions
Let's take the STM32H723ZG microcontroller as an example. Section 3.3.2 of its datasheet defines the following embedded SRAM regions:
- from 128 to 320 Kbytes of AXI-SRAM mapped onto the AXI bus on D1 domain
- SRAM1 mapped on D2 domain: 16 Kbytes
- SRAM2 mapped on D2 domain: 16 Kbytes
- SRAM4 mapped on D3 domain: 16 Kbytes
- 4 Kbytes of backup SRAM. The content of this area is protected against possible unwanted write accesses, and can be retained in Standby or VBAT mode.
- RAM mapped to TCM interface (ITCM and DTCM): Both ITCM and DTCM RAMs are zero wait state memories. They can be accessed either from the CPU or the MDMA (even in Sleep mode) through a specific AHB slave of the Cortex®-M7 CPU (AHBS):
  – 64 to 256 Kbytes of ITCM-RAM (instruction RAM). This RAM is connected to an ITCM 64-bit interface designed for execution of critical real-time routines by the CPU.
  – 128 Kbytes of DTCM-RAM (2x 64-Kbyte DTCM-RAMs on 2x32-bit DTCM ports). The DTCM-RAM could be used for critical real-time data, such as interrupt service routines or stack/heap memory. Both DTCM-RAMs can be used in parallel (for load/store operations) thanks to the Cortex®-M7 dual issue capability.
The MDMA can be used to load code or data in ITCM or DTCM RAMs. As reflected above, 192 Kbytes of RAM can be used either for AXI SRAM or ITCM, with a 64-Kbyte granularity.
Here is an example linker script snippet for this microcontroller, as generated by CubeMX:
MEMORY {
  ITCMRAM (xrw) : ORIGIN = 0x00000000, LENGTH = 64K
  DTCMRAM (xrw) : ORIGIN = 0x20000000, LENGTH = 128K
  FLASH   (rx)  : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM_D1  (xrw) : ORIGIN = 0x24000000, LENGTH = 320K
  RAM_D2  (xrw) : ORIGIN = 0x30000000, LENGTH = 32K
  RAM_D3  (xrw) : ORIGIN = 0x38000000, LENGTH = 16K
}
Ethernet DMA memory
The linker script shows that RAM is split into several regions. The STM32H723ZG includes a built-in Ethernet MAC controller that uses DMA for its operation. The Ethernet DMA controller is located in domain D2 and cannot reach every RAM region: in particular, the tightly coupled memories (ITCM and DTCM) are not accessible to it at all. The simplest safe choice is to keep the Ethernet DMA data structures in domain D2 itself. Therefore, the linker script and source code must ensure that these structures are placed in a D2 region - for example, in RAM_D2.
To achieve this, first define a section in the linker script and place it in the RAM_D2 region:
.eth_ram (NOLOAD) : { *(.eth_ram* .eth_ram.*) } >RAM_D2  /* NOLOAD: nothing is copied from flash, so no load region is needed */
Second, the Ethernet driver source code must place the relevant data in that section. It may look like this:
static uint8_t s_rxbuf[ETH_DESC_CNT][ETH_PKT_SIZE]
    __attribute__((section(".eth_ram")))
    __attribute__((aligned((8U))));
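A misplaced DMA buffer tends to fail silently, so it can be worth verifying the placement at startup. The sketch below is one possible way to do that. The names `RAM_D2_ORIGIN`, `RAM_D2_LENGTH`, `in_ram_d2()` and `check_dma_buffer()` are hypothetical and not part of any SDK, and the origin and length are simply copied from the RAM_D2 line of the linker script above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumed layout: values copied from the RAM_D2 line of the linker script
#define RAM_D2_ORIGIN 0x30000000UL
#define RAM_D2_LENGTH (32UL * 1024UL)

// Hypothetical helper: true if [addr, addr + len) lies entirely inside RAM_D2
bool in_ram_d2(uintptr_t addr, size_t len) {
  if (addr < RAM_D2_ORIGIN) return false;
  uintptr_t offset = addr - RAM_D2_ORIGIN;
  if (offset > RAM_D2_LENGTH) return false;
  return len <= RAM_D2_LENGTH - offset;  // Written this way to avoid overflow
}

// Hypothetical startup check for a buffer that the Ethernet DMA will use
void check_dma_buffer(const void *buf, size_t len) {
  assert(in_ram_d2((uintptr_t) buf, len));
}
```

In firmware, this could be called once before the driver starts, for example as `check_dma_buffer(s_rxbuf, sizeof(s_rxbuf))`.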
Heap memory
The next important part is the microcontroller's heap memory. The standard C library provides two basic functions for dynamic memory allocation:
void *malloc(size_t size); // Allocate memory from the heap
void free(void *ptr); // Free memory allocated by malloc
Typically, ARM-based microcontroller SDKs ship with the ARM GCC compiler,
which includes the Newlib C library.
This library, like many others, has a concept of so-called "syscalls":
low-level routines that the user can override and that the standard
C functions call. In our case, the malloc() and free() standard C routines
rely on the _sbrk() syscall, which firmware code can override. This is typically
done in the syscalls.c or sysmem.c file, and may look like this:
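#include <stddef.h>  // NULL

void *_sbrk(int incr) {
  extern char _end;  // Symbol defined in the linker script
  static unsigned char *heap = NULL;  // Current top of the heap
  unsigned char *prev_heap;
  if (heap == NULL) heap = (unsigned char *) &_end;  // First call: heap starts at _end
  prev_heap = heap;
  heap += incr;      // Note: there is no upper bound check here
  return prev_heap;  // Start of the newly added memory
}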
As we can see, _sbrk() operates on a single contiguous memory region that
starts at _end. This means that such an implementation cannot use several
RAM regions. There are more advanced implementations, like FreeRTOS's
heap_5.c, which can span multiple RAM regions and provides the
pvPortMalloc() and vPortFree() functions.
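A minimal `_sbrk()` of this kind also has no upper limit, so the heap can silently grow into whatever lies above it. The following is a sketch of a bounded variant, written so that it also runs on a host machine: the static array `s_heap_area`, the `HEAP_SIZE` constant and the name `bounded_sbrk()` are assumptions made for illustration. In real firmware, the bounds would come from linker script symbols, whose names vary between projects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

// Stand-in for the heap region (assumption: real firmware would use
// linker-provided start and end symbols instead of a static array)
#define HEAP_SIZE 4096
static unsigned char s_heap_area[HEAP_SIZE];

// Hypothetical bounded variant of _sbrk(): moves the heap break by incr
// bytes, but refuses to move it outside of s_heap_area.
void *bounded_sbrk(int incr) {
  static size_t used = 0;  // Bytes handed out so far
  unsigned char *prev = s_heap_area + used;
  assert(used <= HEAP_SIZE);
  if (incr >= 0) {
    if ((size_t) incr > HEAP_SIZE - used) {
      errno = ENOMEM;
      return (void *) -1;  // Conventional sbrk() failure value
    }
    used += (size_t) incr;
  } else {
    size_t dec = (size_t) (-(long long) incr);  // Safe even for INT_MIN
    if (dec > used) {
      errno = ENOMEM;
      return (void *) -1;
    }
    used -= dec;
  }
  return prev;  // Previous break, i.e. start of the newly added memory
}
```

With a check like this, malloc() gets a clean failure once the region is full instead of overrunning neighbouring memory. It still does not stop one subsystem from consuming the whole region.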
In any case, standard C functions malloc() and free() provide heap memory
as a shared resource. If several subsystems in a device’s firmware use dynamic
memory and their memory usage is not limited by code, any of them can
potentially exhaust the available memory. This can leave the device in an
out-of-memory state, which typically causes it to stop operating.
Therefore, the solution is to have every subsystem that uses dynamic memory allocation operate within a bounded memory pool. This approach protects the entire device from running out of memory.
Memory pools
The idea behind a memory pool is to split a single shared heap - with a single malloc and free - into multiple “heaps” or memory pools, each with its own malloc and free. The pseudo-code might look like this:
pool_t *pool_create(size_t size);
void *pool_alloc(pool_t *pool, size_t size);
void pool_free(pool_t *pool, void *ptr);
The next step is to make each firmware subsystem use its own memory pool. This can be achieved by creating a separate memory pool for each subsystem and using the pool’s malloc and free functions instead of the standard ones.
In the case of a TCP/IP stack, this would require all parts of the networking code - the driver, the stack, the HTTP/MQTT library, the TLS stack, and the application code - to use a dedicated memory pool. This can be tedious to implement manually.
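To make the idea concrete, here is a minimal sketch of a bounded pool. It is deliberately simplified: instead of carving out a separate memory region, it enforces a byte quota on top of the shared malloc() and free(), which is enough to show how a per-subsystem limit behaves. The type `pool_hdr_t` and the function `pool_init()` (used here in place of `pool_create()`) are hypothetical names, and the per-block header overhead is not counted against the quota.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  size_t limit;  // Maximum number of payload bytes this pool may hold
  size_t used;   // Payload bytes currently allocated
} pool_t;

// Header stored in front of every block. The union keeps the payload
// that follows it suitably aligned for any object type.
typedef union {
  size_t size;
  max_align_t align;
} pool_hdr_t;

void pool_init(pool_t *pool, size_t limit) {
  pool->limit = limit;
  pool->used = 0;
}

void *pool_alloc(pool_t *pool, size_t size) {
  pool_hdr_t *hdr;
  if (size > pool->limit - pool->used) return NULL;  // Over budget
  hdr = (pool_hdr_t *) malloc(sizeof(*hdr) + size);
  if (hdr == NULL) return NULL;  // Shared heap itself is exhausted
  hdr->size = size;
  pool->used += size;
  return hdr + 1;  // Payload starts right after the header
}

void pool_free(pool_t *pool, void *ptr) {
  pool_hdr_t *hdr;
  if (ptr == NULL) return;
  hdr = (pool_hdr_t *) ptr - 1;
  assert(pool->used >= hdr->size);  // Block must belong to this pool
  pool->used -= hdr->size;
  free(hdr);
}
```

A subsystem that is given a `pool_t` with a 100-byte limit can allocate 60 bytes, but a second 60-byte request returns NULL until the first block is freed. A quota like this bounds usage but not fragmentation of the shared heap, which is one reason to prefer an allocator that owns a dedicated region.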
RTOS memory pool API
Some RTOSes provide a memory pool API. For example, Zephyr provides memory heaps:
K_HEAP_DEFINE(my_heap, 1024);  // Defines and initializes struct k_heap my_heap
void *ptr = k_heap_alloc(&my_heap, 100, K_NO_WAIT);
k_heap_free(&my_heap, ptr);
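Another example of an RTOS that provides memory pools is ThreadX:
TX_BYTE_POOL my_pool;
static UCHAR pool_mem[1024];  // Backing storage for the pool
VOID *ptr = TX_NULL;
tx_byte_pool_create(&my_pool, "My Pool", pool_mem, sizeof(pool_mem));
tx_byte_allocate(&my_pool, &ptr, 100, TX_NO_WAIT);  // Returns a status code, pointer goes to ptr
tx_byte_release(ptr);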
Using an external allocator
Another alternative is to use an external allocator. There are many implementations available. Here are some notable ones:
- umm_malloc - specifically designed to work with the ARM7 embedded processor, but it should work on many other 32 bit processors, as well as 16 and 8 bit devices.
- o1heap - a highly deterministic constant-complexity memory allocator designed for hard real-time high-integrity embedded systems. The name stands for O(1) heap.
Example: Mongoose and O1Heap
The Mongoose TCP/IP stack makes it easy to limit its memory usage, because Mongoose
uses its own functions, mg_calloc() and mg_free(), to allocate and release
memory. The default implementation uses the C standard library functions calloc()
and free(), but Mongoose allows users to override these functions with their own
implementations.
We can preallocate memory for Mongoose at firmware startup, for example 50 KB,
hand that block to the o1heap library, and implement mg_calloc() and mg_free()
on top of o1heap. Here are the exact steps:
- Fetch o1heap.c and o1heap.h into your source tree
- Add o1heap.c to the list of your source files
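- Preallocate a memory chunk at firmware startup:
size_t poolsize = 50 * 1024;
// Over-allocate so that poolsize bytes remain after aligning the start address
void *pool = malloc(poolsize + O1HEAP_ALIGNMENT);
// Round the address up (not down) to a multiple of O1HEAP_ALIGNMENT
void *aligned = (void *) (((uintptr_t) pool + O1HEAP_ALIGNMENT - 1) & ~((uintptr_t) O1HEAP_ALIGNMENT - 1));
s_mem = o1heapInit(aligned, poolsize);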
- Implement mg_calloc() and mg_free() using o1heap and the preallocated memory chunk:
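#include <stdint.h>  // SIZE_MAX
#include <string.h>  // memset()
#include "o1heap.h"
O1HeapInstance *s_mem;
void *mg_calloc(size_t count, size_t size) {
  if (size != 0 && count > SIZE_MAX / size) return NULL;  // count * size would overflow
  void *ptr = o1heapAllocate(s_mem, count * size);
  if (ptr) memset(ptr, 0, count * size);
  return ptr;
}
void mg_free(void *ptr) {
  o1heapFree(s_mem, ptr);
}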
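One detail in the preallocation step deserves a closer look. When an aligned pool is carved out of a malloc()-ed block, the start address has to be rounded up, never down: rounding down can produce an address that lies before the allocated block. The sketch below isolates that arithmetic in a hypothetical helper, `align_up()`, which is not part of o1heap or Mongoose and assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: round addr up to the next multiple of align.
// align must be a power of two. Overflow near the top of the address
// space is not handled in this sketch.
uintptr_t align_up(uintptr_t addr, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + (align - 1)) & ~((uintptr_t) align - 1);
}
```

For example, with a 16-byte alignment the address 0x1001 rounds up to 0x1010, while masking alone would give 0x1000, one byte before the block. Rounding up skips at most `align - 1` bytes, so over-allocating by the alignment value guarantees that the full pool size is still available after the adjusted start.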
You can see the full implementation procedure in the video linked at the beginning of this article.
Summary
In this article, you learned how to:
- Understand STM32’s complex RAM layout
- Ensure Ethernet DMA buffers reside in accessible memory
- Avoid memory exhaustion by using bounded memory pools
- Integrate the o1heap allocator with Mongoose to enforce TCP/IP RAM limits
By isolating the network stack's memory usage, you make your firmware more stable, deterministic, and secure - especially in real-time or resource-constrained systems.
Ready to add TCP/IP to your STM32 project? Try the Mongoose Wizard now.