Limiting TCP/IP RAM usage on STM32

The TCP/IP functionality of a connected device uses dynamic RAM allocation because of the unpredictable nature of network behavior. For example, if a device serves a web dashboard, we cannot control how many clients might connect at the same time. Likewise, if a device communicates with a cloud server, we may not know in advance how large the exchanged messages will be.

Therefore, limiting the amount of RAM used by the TCP/IP stack improves the device’s security and reliability, ensuring it remains responsive and does not crash due to insufficient memory.

Watch the video below to see how RAM limiting is implemented in practice using the Mongoose embedded TCP/IP stack.

Microcontroller RAM overview

On microcontrollers, available memory commonly resides in several non-contiguous regions. Each of these regions can have different cache characteristics, performance levels, or power properties, and certain peripheral controllers may only support DMA operations to specific memory areas.

STM32H723ZG RAM regions

Let's take the STM32H723ZG microcontroller as an example. Section 3.3.2 of its datasheet defines the embedded SRAM regions:

- from 128 to 320 Kbytes of AXI-SRAM mapped onto the AXI bus on D1 domain
- SRAM1 mapped on D2 domain: 16 Kbytes
- SRAM2 mapped on D2 domain: 16 Kbytes
- SRAM4 mapped on D3 domain: 16 Kbytes
- 4 Kbytes of backup SRAM. The content of this area is protected against
  possible unwanted write accesses, and can be retained in Standby or VBAT mode.
- RAM mapped to TCM interface (ITCM and DTCM):
  Both ITCM and DTCM RAMs are zero wait state memories. They can be accessed
  either from the CPU or the MDMA (even in Sleep mode) through a specific AHB slave
  of the Cortex®-M7 CPU (AHBSAHBP):
– 64 to 256 Kbytes of ITCM-RAM (instruction RAM)
  This RAM is connected to an ITCM 64-bit interface designed for execution of
  critical real-time routines by the CPU.
– 128 Kbytes of DTCM-RAM (2x 64-Kbyte DTCM-RAMs on 2x32-bit DTCM ports)
  The DTCM-RAM could be used for critical real-time data, such as interrupt service
  routines or stack/heap memory. Both DTCM-RAMs can be used in parallel (for
  load/store operations) thanks to the Cortex®-M7 dual issue capability.
  The MDMA can be used to load code or data in ITCM or DTCM RAMs. As reflected
  above, 192 Kbyte of RAM can be used either for AXI SRAM or ITCM, with a 64Kbyte
  granularity.

This is an example linker script snippet for this microcontroller, generated by CubeMX:

MEMORY {
  ITCMRAM (xrw)    : ORIGIN = 0x00000000,   LENGTH = 64K
  DTCMRAM (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH    (rx)    : ORIGIN = 0x08000000,   LENGTH = 1024K
  RAM_D1  (xrw)    : ORIGIN = 0x24000000,   LENGTH = 320K
  RAM_D2  (xrw)    : ORIGIN = 0x30000000,   LENGTH = 32K
  RAM_D3  (xrw)    : ORIGIN = 0x38000000,   LENGTH = 16K
}

Ethernet DMA memory

We can clearly see that RAM is split into several regions. The STM32H723ZG device includes a built-in Ethernet MAC controller that uses DMA for its operation. It is important to note that the DMA controller is located in domain D2, meaning it cannot directly access memory in domain D1. Therefore, the linker script and source code must ensure that Ethernet DMA data structures are placed in domain D2 - for example, in RAM_D2.

To achieve this, first define a section in the linker script and place it in the RAM_D2 region. The section is marked NOLOAD, so it needs no load address in flash:

.eth_ram (NOLOAD) : { *(.eth_ram .eth_ram.*) } >RAM_D2

Second, the Ethernet driver source code must place the relevant data into that section. It may look like this:

static uint8_t s_rxbuf[ETH_DESC_CNT][ETH_PKT_SIZE]
    __attribute__((section(".eth_ram")))
    __attribute__((aligned((8U))));
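A misplaced DMA buffer usually fails silently, so it can be worth verifying the placement at start-up. The following is a minimal sketch, not part of any SDK: region_contains() and ASSERT_IN_RAM_D2() are hypothetical names, and the RAM_D2 bounds are copied from the MEMORY block of the linker script shown earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// RAM_D2 bounds, copied from the MEMORY block of the linker script
#define RAM_D2_ORIGIN 0x30000000UL
#define RAM_D2_LENGTH (32UL * 1024UL)

// Hypothetical helper: true if [addr, addr + len) lies inside the region
bool region_contains(uintptr_t origin, size_t length, uintptr_t addr,
                     size_t len) {
  if (addr < origin) return false;
  uintptr_t offset = addr - origin;
  return offset <= length && len <= length - (size_t) offset;
}

// Hypothetical start-up check for a DMA buffer or descriptor array
#define ASSERT_IN_RAM_D2(buf)                            \
  assert(region_contains(RAM_D2_ORIGIN, RAM_D2_LENGTH,   \
                         (uintptr_t) (buf), sizeof(buf)))
```

Calling ASSERT_IN_RAM_D2(s_rxbuf); in the driver's init function would then trap early if the section attribute or the linker script entry were missing.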

Heap memory

The next important part is the microcontroller's heap memory. The standard C library provides two basic functions for dynamic memory allocation:

void *malloc(size_t size);  // Allocate memory from the heap
void free(void *ptr);       // Free memory allocated by malloc

Typically, SDKs for ARM-based microcontrollers ship with the ARM GCC compiler, which includes the Newlib C library. This library, like many others, has a concept of so-called "syscalls": low-level routines that the user can override and that are called by the standard C functions. In our case, the malloc() and free() standard C routines call the _sbrk() syscall, which firmware code can override. This is typically done in the syscalls.c or sysmem.c file, and may look like this:

void *_sbrk(int incr) {
  extern char _end;  // Symbol defined in the linker script
  static unsigned char *heap = NULL;
  unsigned char *prev_heap;
  if (heap == NULL) heap = (unsigned char *) &_end;
  prev_heap = heap;
  heap += incr;
  return prev_heap;
}

As we can see, this _sbrk() operates on a single memory region, which starts at the _end symbol and grows upward.

That means that such an implementation cannot use several RAM regions. There are more advanced implementations, such as FreeRTOS's heap_5.c, which can span multiple RAM regions and provides the pvPortMalloc() and vPortFree() functions.
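Note also that the _sbrk() shown above never checks an upper limit, so the heap can grow past the end of its region. A bounded variant might look like the sketch below. To keep it buildable on a host, it uses a static array as the heap region; heap_area, HEAP_SIZE, and bounded_sbrk() are assumptions made for illustration, and in firmware the bounds would normally come from linker symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-in for the heap region (assumption for this sketch)
#define HEAP_SIZE 1024
static unsigned char heap_area[HEAP_SIZE];
static size_t heap_used = 0;

// Like _sbrk(), but refuses to move the break outside heap_area
void *bounded_sbrk(ptrdiff_t incr) {
  assert(heap_used <= HEAP_SIZE);  // Invariant of this allocator
  size_t avail = HEAP_SIZE - heap_used;
  bool ok = incr >= 0 ? (size_t) incr <= avail
                      : (size_t) 0 - (size_t) incr <= heap_used;
  if (!ok) {
    errno = ENOMEM;
    return (void *) -1;  // Conventional sbrk() failure value
  }
  void *prev = &heap_area[heap_used];
  heap_used += (size_t) incr;  // Unsigned wraparound subtracts negative incr
  return prev;
}
```

With a check like this in place, malloc() returns NULL when the region is exhausted instead of silently overwriting whatever follows the heap.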

In any case, standard C functions malloc() and free() provide heap memory as a shared resource. If several subsystems in a device’s firmware use dynamic memory and their memory usage is not limited by code, any of them can potentially exhaust the available memory. This can leave the device in an out-of-memory state, which typically causes it to stop operating.

Therefore, the solution is to have every subsystem that uses dynamic memory allocation operate within a bounded memory pool. This approach protects the entire device from running out of memory.

Memory pools

The idea behind a memory pool is to split a single shared heap - with a single malloc and free - into multiple “heaps” or memory pools, each with its own malloc and free. The pseudo-code might look like this:

pool_t *pool_create(size_t size);
void *pool_alloc(pool_t *pool, size_t size);
void pool_free(pool_t *pool, void *ptr);

The next step is to make each firmware subsystem use its own memory pool. This can be achieved by creating a separate memory pool for each subsystem and using the pool’s malloc and free functions instead of the standard ones.
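One simple way to bound a subsystem, sketched below, is a byte budget layered on top of the shared malloc() and free(). This is a deliberately simplified technique, not a separate arena: it caps how much a subsystem can hold, but all subsystems still share one heap and its fragmentation. The names budget_t, budget_alloc(), and budget_free() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical per-subsystem budget on top of malloc()/free()
typedef struct {
  size_t limit;  // Maximum bytes this subsystem may hold at once
  size_t used;   // Bytes currently held
} budget_t;

// Header stored in front of every block; the extra union members
// keep the user data suitably aligned for any type
typedef union {
  size_t size;
  long long ll;
  long double ld;
  void *p;
} budget_hdr_t;

void *budget_alloc(budget_t *b, size_t size) {
  if (size > b->limit - b->used) return NULL;  // Over budget
  if (size > SIZE_MAX - sizeof(budget_hdr_t)) return NULL;
  budget_hdr_t *h = malloc(sizeof(*h) + size);
  if (h == NULL) return NULL;  // Shared heap exhausted
  h->size = size;
  b->used += size;
  return h + 1;  // User data starts right after the header
}

void budget_free(budget_t *b, void *ptr) {
  if (ptr == NULL) return;
  budget_hdr_t *h = (budget_hdr_t *) ptr - 1;
  assert(h->size <= b->used);  // Block must belong to this budget
  b->used -= h->size;
  free(h);
}
```

A subsystem would own one budget_t, for example budget_t net = {50 * 1024, 0};, and pass it to every allocation it makes.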

In the case of a TCP/IP stack, this would require all parts of the networking code - the driver, the stack, the HTTP/MQTT library, the TLS stack, and the application code - to use a dedicated memory pool. This can be tedious to implement manually.

RTOS memory pool API

Some RTOSes provide a memory pool API. For example, Zephyr provides memory heaps:

K_HEAP_DEFINE(my_heap, 1024);  // Defines struct k_heap my_heap with 1024 bytes

void *ptr = k_heap_alloc(&my_heap, 100, K_NO_WAIT);
k_heap_free(&my_heap, ptr);

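Another example of an RTOS that provides memory pools is ThreadX:

static UCHAR pool_mem[1024];  // Backing storage for the pool
TX_BYTE_POOL my_pool;
tx_byte_pool_create(&my_pool, "My Pool", pool_mem, sizeof(pool_mem));

void *ptr = NULL;
tx_byte_allocate(&my_pool, &ptr, 100, TX_NO_WAIT);  // Returns a status code
tx_byte_release(ptr);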

Using an external allocator

Another alternative is to use an external allocator. There are many implementations available. Here are some notable ones:

- umm_malloc - specifically designed to work with the ARM7 embedded processor, but it should work on many other 32-bit processors, as well as 16-bit and 8-bit devices.
- o1heap - a highly deterministic constant-complexity memory allocator designed for hard real-time high-integrity embedded systems. The name stands for O(1) heap.

Example: Mongoose and O1Heap

The Mongoose TCP/IP stack makes it easy to limit its memory usage, because Mongoose uses its own functions mg_calloc() and mg_free() to allocate and release memory. The default implementation uses the C standard library functions calloc() and free(), but Mongoose allows users to override these functions with their own implementations.

We can pre-allocate memory for Mongoose at firmware startup, for example 50 KB, hand that block to the o1heap library, and implement mg_calloc() and mg_free() on top of it. Here are the exact steps:

- Fetch o1heap.c and o1heap.h into your source tree
- Add o1heap.c to the list of your source files
- Preallocate a memory chunk at firmware startup:

size_t poolsize = 50 * 1024;
void *pool = malloc(poolsize + O1HEAP_ALIGNMENT);
// Round the address up to the next O1HEAP_ALIGNMENT boundary
void *aligned = (void *) (((uintptr_t) pool + O1HEAP_ALIGNMENT - 1) &
                          ~((uintptr_t) O1HEAP_ALIGNMENT - 1));
s_mem = o1heapInit(aligned, poolsize);

- Implement mg_calloc() and mg_free() using o1heap and the preallocated memory chunk:

#include <string.h>  // memset()
#include "o1heap.h"
O1HeapInstance *s_mem;  // Heap instance, set up at firmware startup

void *mg_calloc(size_t count, size_t size) {
  void *ptr = o1heapAllocate(s_mem, count * size);
  if (ptr) memset(ptr, 0, count * size);  // calloc() semantics: zero the block
  return ptr;
}

void mg_free(void *ptr) {
  o1heapFree(s_mem, ptr);
}
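Two details of the steps above benefit from extra care. First, a pool base carved out of a larger block has to be aligned by rounding the address up, never down, so that it stays inside the block. Second, the count * size product in a calloc-style function can overflow. The sketch below shows both guards in isolation; align_up() and checked_calloc() are hypothetical helpers, and raw_alloc() is a stand-in for o1heapAllocate() so that the example builds without o1heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Round addr up to a multiple of align; align must be a power of two
uintptr_t align_up(uintptr_t addr, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

// Stand-in for o1heapAllocate(s_mem, n), an assumption for this sketch
static void *raw_alloc(size_t n) {
  return malloc(n);
}

// calloc-style wrapper that rejects requests whose size would overflow
void *checked_calloc(size_t count, size_t size) {
  if (size != 0 && count > SIZE_MAX / size) return NULL;
  size_t total = count * size;
  void *ptr = raw_alloc(total);
  if (ptr != NULL) memset(ptr, 0, total);
  return ptr;
}
```

In firmware, the same overflow test could go at the top of mg_calloc(), with raw_alloc() replaced by the o1heap call.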

You can see the full implementation procedure in the video linked at the beginning of this article.

Summary

In this article, you learned how to:

- Understand STM32’s complex RAM layout
- Ensure Ethernet DMA buffers reside in accessible memory
- Avoid memory exhaustion by using bounded memory pools
- Integrate the o1heap allocator with Mongoose to enforce TCP/IP RAM limits

By isolating the network stack's memory usage, you make your firmware more stable, deterministic, and secure - especially in real-time or resource-constrained systems.

Ready to add TCP/IP to your STM32 project? Try the Mongoose Wizard now.
