lwIP vs Mongoose - TCP/IP Stack Integration Benchmark

When building a connected embedded device, one of the first decisions is the networking stack. Two common choices are lwIP and Mongoose. lwIP is a widely used embedded TCP/IP stack. Mongoose is an integrated networking library that provides TCP/IP, web server, MQTT, TLS, and firmware update support in a single package. In this article we start a practical lwIP vs Mongoose benchmark to measure integration complexity, flash and RAM footprint, and overall production readiness.

This is part 1 of the comparison. Here we build the baseline firmware and integrate the TCP/IP stack. Later parts will add a web server, device dashboard, MQTT cloud connectivity, firmware OTA updates, and TLS.

For the benchmark we use an ST Nucleo-F756ZG board so the results are real, reproducible, and close to what you would see in an actual product.

lwIP vs Mongoose Networking Stack Architecture

Before comparing lwIP and Mongoose, it helps to look at how networking functionality in embedded firmware is typically structured. The following diagram shows the common stack of layers used in a connected device:

Layered architecture of embedded networking functionality: driver, TCP/IP stack, TLS, protocols, application

Each layer has a specific role.

The driver talks to the Ethernet peripheral and sends or receives network frames.
The TCP/IP stack implements IP, TCP, UDP and packet processing.
TLS adds encryption and authentication.
Protocols implement things like HTTP or MQTT.
The application implements the actual device logic.

With that model in mind, we can now see how lwIP and Mongoose fit into this architecture. The next diagram shows how networking functionality is typically assembled in firmware when using lwIP versus Mongoose.

Comparison of networking stack layers implemented by lwIP and Mongoose

On the lwIP side the stack is built from several separate components: the STM32 HAL Ethernet driver, the lwIP TCP/IP stack, mbedTLS for security, and example HTTP and MQTT code.

On the Mongoose side the same functionality is provided by a single integrated library. Mongoose includes the network driver, TCP/IP stack, TLS, and application protocols such as HTTP and MQTT. It also provides extendable examples and tools like Mongoose Wizard for building device dashboards.

One important detail: Mongoose can also run on top of lwIP. In that configuration lwIP provides the TCP/IP layer, while Mongoose provides TLS, protocols such as HTTP and MQTT, and the application framework, as the diagram below shows:

lwIP plus Mongoose architecture

In this benchmark we compare lwIP used in the typical STM32 Cube configuration against Mongoose used as a full networking stack. This reflects how lwIP is most commonly used in real embedded projects.

Baseline Bare-Metal STM32 Project

First we create a minimal bare-metal STM32 project using CubeMX and VS Code as the build environment. This project is our clean baseline for all later measurements, so it contains only the hardware setup needed for the benchmark and nothing more.

We configure the MCU to run at 216 MHz, set up the LED GPIOs, enable the Ethernet pins, bring up UART for debug output, and enable the random number generator. At this stage there is still no networking stack, no web server, no MQTT, and no TLS - just a tiny firmware skeleton that boots and runs on the Nucleo-F756ZG board.

Once the baseline firmware is up, we make it print the amount of free RAM over UART and record the flash footprint of the build. These numbers give us the reference point for everything that follows. After that, we will integrate lwIP and Mongoose separately on top of exactly the same baseline and measure how much code size, RAM usage, and complexity each solution adds.

GCC with newlib implements standard C functions through weak syscalls, so printf ultimately calls _write. By overriding _write to transmit data over USART3, all file output functions, including printf, are redirected to the debug console.
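A minimal sketch of such a redirect is shown below. It assumes the CubeMX-generated handle for USART3 is named huart3 and that the build defines USE_HAL_DRIVER, as CubeMX projects normally do. The helper names debug_tx, debug_write and debug_captured are hypothetical; the host branch exists only so the logic can be checked on a PC.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(USE_HAL_DRIVER)
// Target build: huart3 is assumed to be the CubeMX handle for USART3
#include "main.h"
extern UART_HandleTypeDef huart3;
static void debug_tx(const uint8_t *buf, size_t len) {
  HAL_UART_Transmit(&huart3, (uint8_t *) buf, (uint16_t) len, HAL_MAX_DELAY);
}
#else
// Host build: capture the bytes in a buffer instead of sending them
static uint8_t s_captured[64];
static size_t s_captured_len;
static void debug_tx(const uint8_t *buf, size_t len) {
  for (size_t i = 0; i < len && s_captured_len < sizeof(s_captured); i++) {
    s_captured[s_captured_len++] = buf[i];
  }
}
size_t debug_captured(const uint8_t **out) {
  *out = s_captured;
  return s_captured_len;
}
#endif

// Shared logic: forward stdout (1) and stderr (2) to the debug console
int debug_write(int fd, const char *ptr, int len) {
  if (len <= 0) return 0;
  if (fd == 1 || fd == 2) debug_tx((const uint8_t *) ptr, (size_t) len);
  return len;  // Report every byte as consumed so the C library does not retry
}

#if defined(USE_HAL_DRIVER)
// Strong definition that replaces the weak _write stub
int _write(int fd, char *ptr, int len) {
  return debug_write(fd, ptr, len);
}
#endif
```

Because the transmit call blocks until the bytes are sent, heavy logging slows the superloop. That is acceptable for a benchmark console but worth keeping in mind.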

We override _sbrk, which malloc uses to grow the heap, so we can track heap usage. CubeMX already generates an _sbrk in sysmem.c, so we disable it and replace it with our version.

On STM32F756ZG the linker script defines 320 KB of RAM. The _end symbol marks the end of the statically allocated data (.data and .bss), which is where the dynamic heap begins. The heap grows upward as _sbrk moves the break pointer, while the stack grows downward from the top of RAM. Free RAM is simply the space between the current heap end and the current stack pointer.
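The heap and stack layout described here can be sketched as follows. This is an illustration under stated assumptions rather than the exact code used for the measurements: heap_grow and the host-side s_fake_ram array are hypothetical, and on the target the stack pointer is approximated with GCC's __builtin_frame_address.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__arm__)
extern uint8_t _end;  // Provided by the linker script: first free byte after .bss
static uintptr_t heap_start(void) { return (uintptr_t) &_end; }
// Approximate the stack pointer with the current frame address (GCC builtin)
static uintptr_t stack_ptr(void) { return (uintptr_t) __builtin_frame_address(0); }
#else
static uint8_t s_fake_ram[4096];  // Host stand-in for on-chip RAM
static uintptr_t heap_start(void) { return (uintptr_t) &s_fake_ram[0]; }
// Pretend the stack is empty and sits at the very top of the fake RAM
static uintptr_t stack_ptr(void) { return heap_start() + sizeof(s_fake_ram); }
#endif

static uintptr_t s_heap_end;  // Current program break, 0 until first use

static uintptr_t heap_end(void) {
  if (s_heap_end == 0) s_heap_end = heap_start();
  return s_heap_end;
}

// Free RAM: the gap between the top of the heap and the stack pointer
unsigned ramFree(void) {
  return (unsigned) (stack_ptr() - heap_end());
}

// Move the break by incr bytes. Returns the previous break, or (void *) -1
// with errno set to ENOMEM if the heap would run into the stack.
void *heap_grow(ptrdiff_t incr) {
  uintptr_t prev = heap_end();
  if (incr > 0 && (uintptr_t) incr > stack_ptr() - prev) {
    errno = ENOMEM;
    return (void *) -1;
  }
  s_heap_end = prev + (uintptr_t) incr;  // Negative incr shrinks the heap
  return (void *) prev;
}

#if defined(__arm__)
// newlib's malloc calls _sbrk whenever it needs more heap
void *_sbrk(ptrdiff_t incr) { return heap_grow(incr); }
#endif
```

Keeping the break pointer in one place means ramFree() reflects every malloc that has grown the heap so far, which is what the later measurements rely on.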

Now we update the main superloop to print free RAM once per second.

  uint32_t timer = 0, period = 1000;  // Next deadline and interval, milliseconds
  while (1) {
    uint32_t now = HAL_GetTick();

    // The signed difference stays correct when the tick counter wraps around
    if ((int32_t) (now - timer) >= 0) {
      while ((int32_t) (now - timer) >= 0) timer += period;  // Advance timer
      printf("RAM: %u\n", ramFree());
    }
  }
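The signed-difference comparison in the loop above is what keeps the timer working when the 32-bit millisecond counter wraps around. The same logic can be pulled into a small helper and checked on a PC; timer_expired is a hypothetical name used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

// Returns true when `now` has reached *deadline, then moves *deadline forward
// in steps of `period` until it lies in the future again.
// `period` must be non-zero, otherwise the inner loop never terminates.
bool timer_expired(uint32_t *deadline, uint32_t period, uint32_t now) {
  if ((int32_t) (now - *deadline) < 0) return false;  // Not due yet
  while ((int32_t) (now - *deadline) >= 0) *deadline += period;
  return true;
}
```

In the superloop this would be called as timer_expired(&timer, period, HAL_GetTick()). If the loop stalls for several periods, the helper fires once and skips the missed ticks instead of firing in a burst.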

By default, a CubeMX CMake project only produces an ELF file. Add a post-build step to generate the raw .bin flash image file:

add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
    COMMAND ${CMAKE_OBJCOPY} -O binary
            $<TARGET_FILE:${PROJECT_NAME}>
            ${PROJECT_NAME}.bin
)

Build and flash the firmware. Record free RAM and flash usage:
Baseline free RAM, bytes: 325431
Baseline flash usage, bytes: 18692

Integrating lwIP into the Baseline Firmware

With the baseline firmware in place, the next step is to integrate lwIP.

We start by making a copy of the baseline project. This is important because the same clean baseline will later be used for the Mongoose integration too, so both stacks are measured on exactly the same foundation.

Inside CubeMX we enable lwIP and keep the default configuration.

Add DHCP initialisation so the board can obtain an IP address automatically from the local network:

extern struct netif gnetif;  // Network interface created by MX_LWIP_Init()
dhcp_start(&gnetif);         // Start the DHCP client once, before the superloop

Then we add lwIP processing to the superloop, which is the bare-metal equivalent of telling the stack: wake up, do your job, handle packets, don't just sit there looking expensive:

      // Superloop change: print the obtained IP address
      printf("RAM: %u, IP: %s\n", ramFree(), ip4addr_ntoa(&gnetif.ip_addr));
    }
    MX_LWIP_Process();  // Call lwIP processing on every loop iteration

Once the interface comes up, the firmware prints the obtained IP address over UART. This gives us a simple sanity check that the Ethernet driver, PHY link, DHCP client, and lwIP stack are all alive and cooperating.

After that we verify that the board responds to ping. Ping may look boring, but it is the first useful end-to-end network test. If ping works, we know the MAC, driver, ARP, IP stack, packet RX/TX path, link state, and basic timing are all working together. If ping does not work, nothing else will.
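To make "responds to ping" concrete: at the ICMP layer, answering an echo request means validating the checksum, changing the message type from 8 to 0, and recomputing the checksum. The sketch below shows that single step in isolation, using the standard Internet checksum from RFC 1071. The function names are hypothetical and this is not lwIP or Mongoose source code. A real stack also swaps the IP and MAC addresses before transmitting the reply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Internet checksum: one's complement of the one's complement sum of
// 16-bit big-endian words, with an odd trailing byte padded with zero.
uint16_t inet_checksum(const uint8_t *data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < len; i += 2) {
    sum += ((uint32_t) data[i] << 8) | data[i + 1];
  }
  if (len & 1) sum += (uint32_t) data[len - 1] << 8;
  while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);  // Fold the carries
  return (uint16_t) ~sum;
}

// Turns an ICMP echo request into an echo reply in place.
// Layout: type (1), code (1), checksum (2), id (2), sequence (2), payload.
// Returns false if the message is not a valid echo request.
bool icmp_echo_reply(uint8_t *msg, size_t len) {
  if (len < 8 || msg[0] != 8 || msg[1] != 0) return false;
  if (inet_checksum(msg, len) != 0) return false;  // A valid message sums to 0
  msg[0] = 0;                                      // Type 0: echo reply
  msg[2] = 0;
  msg[3] = 0;                                      // Zero checksum before recomputing
  uint16_t csum = inet_checksum(msg, len);
  msg[2] = (uint8_t) (csum >> 8);
  msg[3] = (uint8_t) (csum & 0xFF);
  return true;
}
```

The identifier, sequence number, and payload are echoed back unchanged, which is how the ping utility matches each reply to its request.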

With lwIP integrated and ping confirmed, we record the new flash footprint and free RAM numbers.

RAM footprint, bytes: 31504 (325431 - 293927)
Flash footprint, bytes: 67020 (85712 - 18692)

Now let's try changing the lwIP configuration settings.

Open CubeMX and navigate to the lwIP configuration panel. Most options there are low-level TCP/IP stack parameters. For engineers who are not familiar with TCP/IP internals, these settings are difficult to understand and even harder to tune correctly.

Takeaway: tuning lwIP configuration parameters requires deep knowledge of the TCP/IP stack internals. For this benchmark we therefore classify lwIP configuration tuning as hard.

Integrating Mongoose into the Baseline Firmware

Now let's integrate Mongoose into the same baseline firmware.

As before, we start by making a copy of the baseline project. This keeps the comparison fair: both lwIP and Mongoose are integrated on top of exactly the same starting point.

Mongoose can be added through CubeMX middleware, but in practice it is simpler to copy the library directly from GitHub because the core library consists of only two source files.

Go to https://github.com/cesanta/mongoose and copy the following files into the project: mongoose.c to Core/Src, mongoose.h to Core/Inc. Create a configuration file Core/Inc/mongoose_config.h:

#pragma once

// See https://mongoose.ws/documentation/#build-options
#define MG_ENABLE_TCPIP 1          // Enable built-in TCP/IP stack
#define MG_ARCH MG_ARCH_CUBE       // Change this if not Cube
#define MG_ENABLE_DRIVER_STM32F 1  // Change this if not STM32Fxx
#define MG_TLS MG_TLS_NONE         // No TLS

This file controls all build options and allows the firmware to enable or disable specific features.

Add mongoose.c to the build system by editing cmake/stm32cubemx/CMakeLists.txt.

With the files in place, we add Mongoose initialisation to main.c by creating an event manager instance:

struct mg_mgr mgr;        // Mongoose event manager
mg_log_set(MG_LL_DEBUG);  // MG_LL_ERROR, MG_LL_INFO, MG_LL_DEBUG, MG_LL_VERBOSE
mg_mgr_init(&mgr);        // Initialise event manager

and add event processing to the superloop:

mg_mgr_poll(&mgr, 0);   // Add to the superloop

Mongoose uses an event-driven architecture, so the firmware simply calls the event manager periodically to process network activity.

Build and flash the firmware. Record the flash footprint and free RAM numbers, and verify that the board responds to ping.

RAM footprint, bytes: 22524 (325431 - 302907)
Flash footprint, bytes: 50140 (68832 - 18692)

Now let's try tuning the configuration.

All Mongoose build options are documented at https://mongoose.ws/documentation/#build-options and are controlled through mongoose_config.h. The number of options is relatively small and the settings are easy to understand even for developers who are not networking experts.

For example, features such as HTTP, MQTT, TLS, or filesystem support can be enabled or disabled using simple compile-time flags. Changing these options does not require deep knowledge of TCP/IP internals.
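As an illustration, a variant of mongoose_config.h that switches on a few more features might look like the fragment below. The option names are quoted from memory of the build-options page and should be checked against the Mongoose version in use. This configuration was not part of the measurements in this article.

```c
// mongoose_config.h variant (sketch, verify names against the build-options page)
#define MG_ARCH MG_ARCH_CUBE
#define MG_ENABLE_TCPIP 1          // Built-in TCP/IP stack
#define MG_ENABLE_DRIVER_STM32F 1  // STM32Fxx Ethernet driver
#define MG_TLS MG_TLS_BUILTIN      // Built-in TLS instead of MG_TLS_NONE
#define MG_ENABLE_PACKED_FS 1      // Embed web UI files in flash
#define MG_ENABLE_LOG 0            // Strip log output to save flash
```

Each line is a plain compile-time switch, so a feature that is turned off adds no code to the firmware image.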

Takeaway: tuning Mongoose configuration is straightforward and easy to understand.

Integration Summary

After integrating both stacks on top of the same baseline firmware, we can compare the integration effort and resulting firmware footprint.

The goal here is not to measure networking performance yet. We simply measure how difficult it is to bring each stack to a minimal working state and how much code and memory it adds to the firmware.

Metric                            lwIP                  Mongoose
--------------------------------  --------------------  ----------------------------------
Integration method                CubeMX middleware     CubeMX middleware, or copy 2 files
Files added to project            279                   3
Lines of code added to project    105k                  32k
Flash footprint, bytes            67020                 50140
RAM footprint, bytes              31504                 22524
Ethernet driver                   stm32f7xx_hal_eth.c   stm32f.c
Ethernet driver size, lines       3300                  250
Integration complexity            Easy                  Easy
Configuration tuning difficulty   Hard                  Easy

Both stacks successfully bring up the Ethernet interface and respond to ping, confirming that the driver, packet processing, and TCP/IP layers are working correctly. Both stacks are straightforward to integrate, but lwIP is harder to configure due to its large number of low-level TCP/IP tuning parameters.

In the next parts of this series we will build on this foundation and add real networking functionality such as a web server, device dashboard, MQTT connectivity, TLS and OTA updates. Those steps will show how the two approaches behave in a realistic embedded application.