lwIP vs Mongoose - Embedded TCP/IP Stack Comparison on STM32 (Part 1)

lwIP is a lightweight TCP/IP stack commonly bundled with STM32 and other MCU SDKs, but it provides only low-level networking. Mongoose is a complete embedded networking stack that integrates TCP/IP, HTTP server, WebSocket, MQTT client, TLS/HTTPS, and OTA firmware updates in a single C library without external dependencies. This article compares lwIP and Mongoose for embedded systems, including footprint, TLS integration, MQTT support, and production readiness.

lwIP vs Mongoose - Step-by-Step STM32 Comparison Plan

Let's keep this simple and real. We're going to build the same firmware step by step on a NUCLEO-F756ZG using VS Code and a bare metal project. After every step, we measure flash, RAM, and when it makes sense, timing. No theory, just numbers.

  1. We start with a minimal bare metal project that just brings up clocks, GPIO, and toggles an LED. That's our clean baseline on STM32F756ZG.
  2. Next, we add a TCP/IP stack, bring up Ethernet, and make the board reply to ping. Then we check how much flash and RAM basic networking actually costs.
  3. After that, we add an HTTP server and build a tiny web dashboard to toggle the LED from a browser. Again, we measure the footprint.
  4. Then we turn on TLS and switch to HTTPS. We measure memory usage and check how long the TLS handshake really takes on a Cortex-M7.
  5. Once that's stable, we add MQTT and connect to HiveMQ so we can toggle the LED from the cloud. We measure the new footprint.
  6. Finally, we add firmware update over MQTT and measure the total flash and RAM needed for secure OTA.

By the end, we'll have a clear, side-by-side picture of what it really takes to go from bare metal STM32 to a secure, cloud-connected device - and how lwIP and Mongoose compare at every step.

lwIP vs Mongoose - Step 1: STM32 Bare Metal Baseline (Flash and RAM)

First, we create a minimal bare metal STM32 project using CubeMX, with VS Code as our build environment. We set the main clock to 216 MHz, configure the LED GPIOs and the Ethernet pins, and enable a UART for debug output plus the random number generator (RNG). Nothing else. The goal is simple: measure the baseline flash size and free RAM on STM32F756ZG, so that we have solid footprint numbers before adding lwIP or Mongoose and can later see how much TCP/IP, HTTP, TLS, and MQTT actually increase memory usage.

Open CubeMX and create a new project for STM32F756ZG. In Clock Configuration, push the system clock to 216 MHz. Enable RNG. Enable USART3 on PD8 and PD9 for debug output, configure LEDs on PB0, PB7, and PB14 as GPIO outputs, and enable the Ethernet controller in RMII mode. Let CubeMX assign the default ETH pins, then verify them against the Mongoose STM32 board pinout table and correct PG11 and PG13 if needed. In Project Manager, name the project f756-base, select CMake with GCC, generate the code, and open it in VS Code.

Now we redirect printf to the UART so we can see output in the serial console. In the Mongoose documentation, under Tutorials - Common Tasks - Redirect printf to UART, copy the _write override and paste it into main.c. GCC with newlib implements standard C I/O through weak syscall functions, so printf ultimately calls _write. Overriding _write to transmit the data over USART3 therefore redirects all stdio output, including printf, to the debug console.

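int _write(int fd, char *buf, int len) {
  (void) fd;  // Send stdout and stderr alike to the debug console
  // huart3 is the handle CubeMX generates for USART3 in Asynchronous mode
  HAL_UART_Transmit(&huart3, (uint8_t *) buf, (uint16_t) len, HAL_MAX_DELAY);
  return len;  // Report all bytes as written
}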

Next, we add proper free RAM measurement. We override _sbrk, which malloc uses to grow the heap, so we can track heap usage ourselves. CubeMX already generates an _sbrk in sysmem.c, so we disable that one (for example, by removing sysmem.c from the build, since two _sbrk definitions will not link) and replace it with our version, which also exposes helper functions for memory tracking.

On STM32F756ZG the linker script defines 320 KB of RAM. The _end symbol marks the end of the statically allocated data (.data and .bss), which is where the dynamic heap begins. The heap grows upward as _sbrk moves the break pointer, while the stack grows downward from the top of RAM. Free RAM is simply the space between the current heap end and the current stack pointer. We approximate the stack pointer with the address of a local variable and keep a 256-byte safety margin. Two helper functions report the runtime RAM footprint: ramUsed() returns how far the heap has grown, and ramFree() returns the gap between the heap end and the stack.

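// Works with ARM GCC and the newlib C library
#include <errno.h>   // errno, ENOMEM
#include <stddef.h>  // size_t, ptrdiff_t

extern unsigned char _end[];                   // Heap start, defined by the linker script
static unsigned char *s_break_address = _end;  // Current heap end (the "break")

// Bytes handed out to the heap so far
size_t ramUsed(void) {
  return (size_t) (s_break_address - _end);
}

// Bytes between the heap end and the current stack position
size_t ramFree(void) {
  unsigned char endofstack;  // Its address approximates the stack pointer
  return (size_t) (&endofstack - s_break_address);
}

// Called by malloc() to grow the heap by incr bytes
void *_sbrk(ptrdiff_t incr) {
  unsigned char *prev_heap;
  // Keep a 256-byte safety margin below the current stack position
  unsigned char *heap_end = (unsigned char *) ((size_t) &heap_end - 256);
  prev_heap = s_break_address;
  if (s_break_address + incr > heap_end) {
    errno = ENOMEM;  // Tell malloc() the heap cannot grow
    return (void *) -1;
  }
  s_break_address += incr;
  return prev_heap;
}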
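To check the break-pointer arithmetic off-target, the same logic can be modelled on a host PC. This is a sketch under stated assumptions: a static array `ram` stands in for the MCU's RAM, its first 512 bytes play the role of static data so that `ram + 512` acts as `_end`, and the stack pointer is passed in as a parameter instead of being taken from a local variable. All names here (`model_sbrk`, `model_used`, `model_free`, the macros) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RAM_SIZE 4096      // Size of the model RAM (hypothetical)
#define STATIC_DATA 512    // Bytes of "static data" below the heap (hypothetical)
#define SAFETY_MARGIN 256  // Same gap the firmware keeps below the stack

static unsigned char ram[RAM_SIZE];                    // Stands in for the MCU RAM
static unsigned char *heap_start = ram + STATIC_DATA;  // Plays the role of _end
static unsigned char *brk_ptr = ram + STATIC_DATA;     // Plays the role of s_break_address

// Grow the heap by incr bytes, never closer than SAFETY_MARGIN to sp
void *model_sbrk(ptrdiff_t incr, unsigned char *sp) {
  assert(sp >= ram + SAFETY_MARGIN && sp <= ram + RAM_SIZE);
  unsigned char *limit = sp - SAFETY_MARGIN;
  if (incr > limit - brk_ptr) {  // Integer comparison avoids forming an out-of-range pointer
    errno = ENOMEM;
    return (void *) -1;
  }
  unsigned char *prev = brk_ptr;
  brk_ptr += incr;
  return prev;
}

// Like ramUsed(): how far the heap has grown
size_t model_used(void) { return (size_t) (brk_ptr - heap_start); }

// Like ramFree(): gap between the heap end and the stack pointer
size_t model_free(const unsigned char *sp) { return (size_t) (sp - brk_ptr); }
```

For example, with the stack pointer at the top of the model RAM, `model_free()` first reports 3584 bytes; after `model_sbrk(1000, sp)` it reports 2584, and a request larger than the remaining space returns `(void *) -1` with `errno` set to `ENOMEM`, mirroring the firmware version.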

Now we update the main superloop to print free RAM once per second. Include <stdio.h> and <errno.h> to properly declare printf() and errno.

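  uint32_t timer = 0, period = 1000;  // milliseconds
  while (1) {
    // Once per second; unsigned subtraction stays correct when the tick counter wraps
    if (HAL_GetTick() - timer >= period) {
      timer = HAL_GetTick();
      printf("RAM: %u\n", (unsigned) ramFree());
    }
    // ... rest of the CubeMX-generated superloop ...
  }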
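A once-per-second check has to keep working when the 32-bit millisecond counter returned by `HAL_GetTick()` wraps around, which with the default 1 ms tick happens after 2^32 ms, about 49.7 days. Comparing the elapsed time with unsigned subtraction handles the wrap, because the subtraction is done modulo 2^32; a form like `timer + period <= now` can misfire near the wrap, because `timer + period` itself overflows. Here is a host-testable sketch; the helper name `period_elapsed()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Returns true once at least `period` ms have passed since `last`.
// now - last is computed modulo 2^32, so the result stays correct
// even if the tick counter wrapped between `last` and `now`.
bool period_elapsed(uint32_t now, uint32_t last, uint32_t period) {
  return (uint32_t) (now - last) >= period;
}
```

For instance, with `last = 0xFFFFFE00` and `now = 500`, the elapsed time is 512 + 500 = 1012 ms, so a 1000 ms period has expired even though `now` is numerically smaller than `last`.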

By default, a CubeMX CMake project only produces an ELF file. For accurate flash usage measurement on STM32, we also need the raw .bin image. Add the following post-build step to CMakeLists.txt to automatically generate a BIN file after each build:

add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
    COMMAND ${CMAKE_OBJCOPY} -O binary
            $<TARGET_FILE:${PROJECT_NAME}>
            ${PROJECT_NAME}.bin
)

Build and flash the firmware. The CMake build outputs the binaries into build/Debug, and we use the BIN size for accurate flash usage. Open a serial console on USART3 to watch the live free RAM output, then record the baseline flash and RAM numbers for later lwIP vs Mongoose comparison.

Now we have clean baseline numbers on STM32F756ZG - flash size from the BIN and free RAM from the console - with nothing enabled except basic debug printing.

$ ls -l f756-base/build/Debug/
total 3816
-rwxr-xr-x@  1 cpq  staff    18692  1 Mar 11:20 f756-base.bin
-rwxr-xr-x@  1 cpq  staff  1314380  1 Mar 11:20 f756-base.elf
...

Console output looks like this:

RAM: 325431
RAM: 325431
...

Let's start filling in the comparison table:

Metric                           lwIP                 Mongoose
Baseline free RAM, bytes         325431               325431
Baseline flash usage, bytes      18692                18692

lwIP vs Mongoose - Step 2: Integrating the TCP/IP Stack on STM32

Every embedded networking application is built as a stack of layers. At the bottom sits the network driver, which talks to the Ethernet or WiFi hardware. Above it is the TCP/IP stack, which implements IP, ICMP, TCP, and UDP. On top of that usually come TLS, higher-level protocols like HTTP or MQTT, and finally the application logic. In this step we focus only on the first two layers - the network driver and the TCP/IP stack.

The goal is to bring up Ethernet and make the board respond to ping. Ping is a simple network test that sends an ICMP echo request to a device and expects an ICMP echo reply in return. If the board replies to ping, it means the driver works, the TCP/IP stack is running, the device has an IP address, and basic network connectivity is working.

Figure: software stack, driver and TCP/IP layers
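To make "the board replies to ping" concrete at the ICMP layer: an echo reply is the echo request with the type changed from 8 (echo request) to 0 (echo reply) and the checksum recomputed, while the identifier, sequence number, and payload are sent back unchanged. This is a minimal host-runnable sketch that works on a bare ICMP message only; a real stack such as lwIP or Mongoose also validates the IP header and swaps the source and destination addresses. The function names are hypothetical; the checksum is the standard RFC 1071 Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// RFC 1071 Internet checksum: one's complement of the one's complement sum
// of the data taken as 16-bit big-endian words (an odd last byte is padded with 0)
uint16_t inet_checksum(const uint8_t *data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < len; i += 2) sum += (uint32_t) ((data[i] << 8) | data[i + 1]);
  if (len & 1) sum += (uint32_t) data[len - 1] << 8;
  while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);  // Fold carries back in
  return (uint16_t) ~sum;
}

// Turn an ICMP echo request (type 8, code 0) into an echo reply in place.
// Returns 0 on success, -1 if the message is not an echo request.
int icmp_make_echo_reply(uint8_t *icmp, size_t len) {
  if (len < 8 || icmp[0] != 8 || icmp[1] != 0) return -1;
  icmp[0] = 0;            // Type 0: echo reply
  icmp[2] = icmp[3] = 0;  // Zero the checksum field before recomputing it
  uint16_t cs = inet_checksum(icmp, len);
  icmp[2] = (uint8_t) (cs >> 8);
  icmp[3] = (uint8_t) (cs & 0xFF);
  return 0;
}
```

A receiver validates a message by running the same checksum over all of it, checksum field included: a correct message yields 0.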

2.1 Integrating lwIP on STM32F756ZG

The STM32 Cube framework includes lwIP, and the networking software stack looks like this.

Figure: lwIP-based network stack on STM32

At the bottom sits the Ethernet driver provided by ST. It is a fairly large and complex component - more than 3k lines of code - and exposes a plug-in interface through weak functions. TCP/IP stacks such as lwIP or Zephyr implement those functions and connect to the ST Ethernet driver that way. The Cube Ethernet driver has a reputation for issues, especially in more complex projects. In this simple experiment, the driver works fine.
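The weak-function plug-in mechanism is plain GCC and linker behaviour, so it can be shown without any ST code. ST's HAL marks its overridable callbacks with the `__weak` macro, which expands to `__attribute__((weak))` under GCC. The sketch below uses hypothetical names (`drv_rx_complete_hook`, `drv_deliver_frame`); it is not the ST driver's actual interface.

```c
#include <assert.h>
#include <stddef.h>

// Driver side (hypothetical names). The driver calls this hook for every
// received frame. The weak default below just drops the frame; a TCP/IP stack
// overrides it with an ordinary (strong) definition in its own .c file, and
// the linker then uses the stack's version instead of this one.
__attribute__((weak)) int drv_rx_complete_hook(const unsigned char *frame, size_t len) {
  (void) frame;
  (void) len;
  return 0;  // 0: nobody consumed the frame
}

// Simplified driver receive path: hands each frame to whatever implements the hook
int drv_deliver_frame(const unsigned char *frame, size_t len) {
  assert(frame != NULL && len > 0);
  return drv_rx_complete_hook(frame, len);
}
```

A stack would then define, in a separate file, `int drv_rx_complete_hook(const unsigned char *frame, size_t len)` without the attribute. No registration call is needed, which makes this style of interface convenient, but also means a misspelled hook name silently falls back to the do-nothing default.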

Above the driver is the TCP/IP stack. lwIP is highly configurable, but many of its options are hard to understand without knowledge of TCP/IP internals, and several settings depend on each other, which makes configuration tricky for most embedded developers. lwIP also provides three programming interfaces: the raw callback API, the netconn API, and the BSD socket API. Both the netconn and the socket APIs require an RTOS, so in this bare metal experiment we use the raw API.
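To picture what a raw callback API means in practice: the application never blocks waiting for data. Instead it registers functions, and the stack calls them from its polling loop when something happens. In lwIP's raw API this registration is done with functions such as `tcp_arg()` and `tcp_recv()`. The toy model below is not lwIP code; `struct conn`, `stack_input()`, and `count_bytes()` are hypothetical names that only illustrate the control flow.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical connection object: the application stores its callback and
// a pointer to its own state here (lwIP's tcp_recv()/tcp_arg() play this role)
struct conn {
  void (*on_recv)(struct conn *c, const char *data, size_t len);
  void *user;
};

// Hypothetical stack entry point, called from the superloop when data arrives.
// It does not block: it just invokes the registered callback, if any.
void stack_input(struct conn *c, const char *data, size_t len) {
  assert(c != NULL);
  if (c->on_recv != NULL) c->on_recv(c, data, len);
}

// Application callback: adds the received byte count to its user state
void count_bytes(struct conn *c, const char *data, size_t len) {
  (void) data;
  *(size_t *) c->user += len;
}
```

Because nothing blocks, a single superloop can serve the stack and the application without threads, which is why this style works on bare metal.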

By default, lwIP uses its own custom memory pools for network buffers. For this comparison we switch lwIP to use the system malloc instead, just like Mongoose does by default, so the memory measurements remain comparable.

For TLS, Cube integrates mbedTLS from ARM.

lwIP also ships with example applications such as an HTTP server and MQTT client. These examples are built directly on top of the TCP/IP stack internals and do not expose a clean reusable library interface, which makes them harder to adapt for real products. We will look at that later in the article.

For now, we focus only on the first two layers - the Ethernet driver and the TCP/IP stack.

  1. Make a copy of the baseline project folder, rename it to f756-lwip, and open it in CubeMX (open the .ioc file). Clean up the build directory.

  2. In CubeMX, enable Ethernet. Select RMII mode and let CubeMX assign the default ETH pins for NUCLEO-F756ZG. Verify the pins against the appendix table and correct PG11 and PG13 if needed.

  3. Enable the LwIP middleware. Turn on DHCP so the board grabs an IP automatically. In Platform settings, choose LAN8742 PHY.

  4. Configure lwIP to use the system heap instead of its own fixed memory pools. Add this to lwipopts.h:

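    #define MEM_LIBC_MALLOC    1  // mem_malloc()/mem_free() use the C library malloc()/free()
    #define MEM_USE_POOLS      0  // Do not implement mem_malloc() on top of fixed-size pools
    #define MEMP_MEM_MALLOC    1  // Allocate memp objects (PCBs, pbufs, ...) via mem_malloc() too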
    
  5. Generate code from CubeMX. Open the updated project in VS Code and build it to make sure the CubeMX changes compile cleanly.

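  6. Update main(): start DHCP before the superloop, print the obtained IP address, and call MX_LWIP_Process() on every superloop iteration so that lwIP is polled regularly.

    // At the top of main.c
    #include "lwip/dhcp.h"      // dhcp_start()
    #include "lwip/ip4_addr.h"  // ip4addr_ntoa()
    
    extern struct netif gnetif;  // Network interface created by MX_LWIP_Init()
    dhcp_start(&gnetif);         // Before the superloop: start DHCP
    
      // Inside the superloop: replace the RAM printf() so it also prints the IP address
      printf("RAM: %u, IP: %s\n", (unsigned) ramFree(), ip4addr_ntoa(&gnetif.ip_addr));
    }
    MX_LWIP_Process();  // End of the superloop: process received frames and lwIP timers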
    
  7. Build and flash the firmware. Open the USART3 serial console and record the free RAM value.

    RAM: 315807, IP: 192.168.2.28
    RAM: 315807, IP: 192.168.2.28
    
    $ ls -l f756-lwip/build/Debug/
    -rwxr-xr-x@  1 cpq  staff    83452  1 Mar 13:04 f756-base.bin
    -rwxr-xr-x@  1 cpq  staff  1831228  1 Mar 13:04 f756-base.elf
    ...
    
  8. Confirm ping works

    $ ping 192.168.2.28
    PING 192.168.2.28 (192.168.2.28): 56 data bytes
    64 bytes from 192.168.2.28: icmp_seq=0 ttl=255 time=0.980 ms
    
  9. Record the measurements:

    Metric                           lwIP                 Mongoose
    Baseline free RAM, bytes         326063               326063
    Baseline flash usage, bytes      14648                14648
    Ethernet driver                  stm32f7xx_hal_eth.c
    Ethernet driver size, lines      3300
    TCP/IP stack flash usage, bytes  67520
    TCP/IP stack RAM usage, bytes    10184
    Integration difficulty           easy
  10. Now let's see how difficult it is to change the default settings. Go back to CubeMX and open the LWIP configuration. For someone unfamiliar with TCP/IP stack internals, most of those options make little sense, so most embedded developers would struggle to tune lwIP. For example, let's change the maximum TCP segment size (TCP_MSS) from 536 to 1400. CubeMX then reports a configuration error, but it is not obvious which of the related settings must change to fix it. If we build and flash the firmware anyway, it does not work. Therefore, we rate tuning the configuration as hard:

    Metric                           lwIP                 Mongoose
    Baseline free RAM, bytes         326063               326063
    Baseline flash usage, bytes      14648                14648
    Ethernet driver                  stm32f7xx_hal_eth.c
    Ethernet driver size, lines      3300
    TCP/IP stack flash usage, bytes  67520
    TCP/IP stack RAM usage, bytes    10184
    Integration difficulty           easy
    Tuning configuration             hard
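The interdependence of lwIP's TCP options is easy to demonstrate. lwIP's init.c contains compile-time sanity checks that relate TCP_MSS to other TCP settings; as we understand them, three of those rules are TCP_SND_BUF >= 2 * TCP_MSS, TCP_SND_QUEUELEN >= 2 * (TCP_SND_BUF / TCP_MSS), and TCP_WND >= TCP_MSS. The sketch below restates those three rules as a runtime C function. `check_tcp_opts()` is a hypothetical name, and the example values are illustrative, not CubeMX's actual defaults.

```c
#include <assert.h>
#include <stdio.h>

// Runtime restatement of three TCP sanity rules that lwIP checks at compile
// time. Parameter names mirror the lwipopts.h macros. Prints each violated
// rule and returns how many were violated.
int check_tcp_opts(int tcp_mss, int tcp_snd_buf, int tcp_snd_queuelen, int tcp_wnd) {
  int errors = 0;
  assert(tcp_mss > 0);
  if (tcp_snd_buf < 2 * tcp_mss) {
    printf("TCP_SND_BUF (%d) must be at least 2 * TCP_MSS (%d)\n", tcp_snd_buf, 2 * tcp_mss);
    errors++;
  }
  if (tcp_snd_queuelen < 2 * (tcp_snd_buf / tcp_mss)) {
    printf("TCP_SND_QUEUELEN (%d) must be at least 2 * TCP_SND_BUF / TCP_MSS (%d)\n",
           tcp_snd_queuelen, 2 * (tcp_snd_buf / tcp_mss));
    errors++;
  }
  if (tcp_wnd < tcp_mss) {
    printf("TCP_WND (%d) must not be smaller than TCP_MSS (%d)\n", tcp_wnd, tcp_mss);
    errors++;
  }
  return errors;
}
```

For example, a 2144-byte send buffer (four 536-byte segments) passes with TCP_MSS = 536, but raising TCP_MSS to 1400 without resizing the buffer breaks the first rule, since 2 * 1400 = 2800: `check_tcp_opts(1400, 2144, 16, 2144)` reports one violation. Changing one option can force changes to several others.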

2.2 Integrating Mongoose on STM32F756ZG

Figure: Mongoose-based network stack on STM32
  1. Make a copy of the baseline project folder, rename it to f756-mongoose, and open it in CubeMX (open the .ioc file). Clean up the build directory.

  2. In CubeMX, enable Ethernet. Select RMII mode and let CubeMX assign the default ETH pins for NUCLEO-F756ZG. Verify the pins against the appendix table and correct PG11 and PG13 if needed.

  3. Generate code from CubeMX. Open the updated project in VS Code and build it to make sure the CubeMX changes compile cleanly.

  4. Mongoose can also be added via CubeMX middleware, but since it is only two files, we copy them directly from GitHub instead. Go to https://github.com/cesanta/mongoose, then copy mongoose.c to Core/Src and mongoose.h to Core/Inc.

  5. Add Core/Inc/mongoose_config.h with the following contents:

    #pragma once
    
    // See https://mongoose.ws/documentation/#build-options
    #define MG_ENABLE_TCPIP 1          // Enable built-in TCP/IP stack
    #define MG_ARCH MG_ARCH_CUBE       // Change this if not Cube
    #define MG_ENABLE_DRIVER_STM32F 1  // Change this if not STM32Fxx
    #define MG_TLS MG_TLS_NONE         // No TLS
    
  6. Add mongoose.c to cmake/stm32cubemx/CMakeLists.txt

  7. In main.c, add the following snippets to main():

    #include "mongoose.h"  // Add include to the top
    
    // Add before the superloop
    struct mg_mgr mgr;        // Mongoose event manager
    mg_log_set(MG_LL_DEBUG);  // MG_LL_ERROR, MG_LL_INFO, MG_LL_DEBUG, MG_LL_VERBOSE
    mg_mgr_init(&mgr);        // Initialise event manager
    
    mg_mgr_poll(&mgr, 0);   // Add to the superloop
    
  8. Build and flash the firmware. Open the USART3 serial console and record the free RAM value.

    7fd    2 mongoose.c:5224:onstatechange  READY, IP: 192.168.2.27
    803    2 mongoose.c:5225:onstatechange         GW: 192.168.2.1
    809    2 mongoose.c:5227:onstatechange        MAC: 2a:61:54:08:9c:70
    RAM: 302899
    
    $ ls -l f756-mongoose/build/Debug/
    -rwxr-xr-x@  1 cpq  staff    68824  1 Mar 13:42 f756-base.bin
    -rwxr-xr-x@  1 cpq  staff  1706532  1 Mar 13:42 f756-base.elf
    ...
    
  9. Confirm ping works

    $ ping 192.168.2.28
    PING 192.168.2.28 (192.168.2.28): 56 data bytes
    64 bytes from 192.168.2.28: icmp_seq=0 ttl=255 time=0.980 ms
    
  10. Record the measurements:

    Metric                           lwIP                 Mongoose
    Baseline free RAM, bytes         326063               326063
    Baseline flash usage, bytes      14648                14648
    Ethernet driver                  stm32f7xx_hal_eth.c  stm32f.c
    Ethernet driver size, lines      3300                 250
    TCP/IP stack flash usage, bytes  67520                52780
    TCP/IP stack RAM usage, bytes    10184                23092
    Integration difficulty           easy

    lwIP uses about 14.7 KB more flash than Mongoose (67520 vs 52780 bytes), while Mongoose uses more RAM: about 23 KB (23092 bytes) versus about 10 KB for lwIP. Most of Mongoose's RAM, roughly 20 KB, is taken by 4 TX and 4 RX DMA buffers of 1.5 KB each (12 KB) plus an 8 KB RX queue.

  11. Let's try to tune the configuration. All build options are documented at https://mongoose.ws/documentation/#build-options and can be set in mongoose_config.h. The list is short, and the options are understandable for a non-expert. Let's change one of them, for example MG_IO_SIZE, the step by which IO buffers are reallocated as they grow:

    #define MG_IO_SIZE 800
    

    Then rebuild and reflash the project. It builds and works, so we can set "Tuning configuration" to easy.

    Metric                           lwIP                 Mongoose
    Baseline free RAM, bytes         326063               326063
    Baseline flash usage, bytes      14648                14648
    Ethernet driver                  stm32f7xx_hal_eth.c  stm32f.c
    Ethernet driver size, lines      3300                 250
    TCP/IP stack flash usage, bytes  67520                52780
    TCP/IP stack RAM usage, bytes    10184                23092
    Integration difficulty           easy                 easy
    Tuning configuration             hard                 easy
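MG_IO_SIZE is documented as the granularity of send/receive IO buffer growth: a connection's buffers grow in steps of that size rather than byte by byte. A smaller value, such as the 800 used above, leaves less unused slack per connection, at the cost of more frequent reallocations. The sketch below only illustrates that rounding with a hypothetical helper and macro; it is not Mongoose's internal code.

```c
#include <assert.h>
#include <stddef.h>

#define IO_GRANULARITY 800  // Stands in for MG_IO_SIZE in this sketch

// Hypothetical helper: the capacity a buffer ends up with when it must hold
// `needed` bytes and can only grow in whole steps of IO_GRANULARITY
size_t io_capacity_for(size_t needed) {
  if (needed == 0) return 0;
  return ((needed + IO_GRANULARITY - 1) / IO_GRANULARITY) * IO_GRANULARITY;
}
```

For example, holding 2000 bytes takes three 800-byte steps (2400 bytes of capacity), while 801 bytes already needs two steps (1600 bytes).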