A bare-metal dynamic memory allocator for STM32F411CE, written from scratch in C with no standard library dependencies.
HeapForge is a complete reimplementation of malloc, free, and calloc targeting the STM32F411CE (Black Pill) microcontroller. No OS, no stdlib, no HAL — just raw register writes and pointer arithmetic on top of a custom linker script.
The project was built to deeply understand memory management, embedded toolchains, and low-level C programming.
- `stm_malloc(size)`: first-fit allocator with block splitting
- `stm_free(ptr)`: frees a block with forward and backward coalescing
- `stm_calloc(count, size)`: allocates and zero-initializes memory
- `heap_integrity_check()`: walks the linked list and validates every `next->prev` pointer
- RTT debug output via probe-rs (no UART wiring required)
- Zero stdlib dependencies: custom `uint8_t`, `uint16_t`, `uint32_t` typedefs
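`stm_calloc` has to multiply `count * size` before allocating, and that product can silently wrap around in 32 bits. The following is a minimal sketch of an overflow check and a stdlib-free zeroing loop such a function could use. `calloc_total` and `zero_bytes` are hypothetical helper names, not HeapForge's code, and `<stdint.h>` stands in for the project's own typedefs.

```c
#include <stdint.h>

/* Computes count * size for a calloc-style request. Returns 1 and writes
 * the product to *out when it fits in 32 bits, or 0 on overflow so the
 * caller can return NULL instead of allocating a wrapped-around size. */
static int calloc_total(uint32_t count, uint32_t size, uint32_t *out) {
    if (size != 0u && count > UINT32_MAX / size) {
        return 0; /* count * size would exceed 32 bits */
    }
    *out = count * size;
    return 1;
}

/* Zeroes n bytes without memset(). The volatile pointer stops GCC from
 * recognising the loop and replacing it with a memset() call, which a
 * no-stdlib build would then have to provide itself. */
static void zero_bytes(void *p, uint32_t n) {
    volatile uint8_t *b = (volatile uint8_t *)p;
    for (uint32_t i = 0; i < n; i++) {
        b[i] = 0u;
    }
}
```

In a calloc-style function these would wrap the allocator call: reject the request if the product overflows, allocate `total` bytes, and zero them only if the allocation succeeded.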
The heap occupies the space between the end of `.bss` and the stack at the top of RAM, with its bounds defined in the linker script:
```
RAM (128K)
┌─────────────┐ 0x20000000
│    .data    │
├─────────────┤
│    .bss     │ ← _ebss / _heap_start
├─────────────┤
│    HEAP     │ ← HeapForge manages this region
├─────────────┤
│    STACK    │ ← grows downward from _estack
└─────────────┘ 0x20020000
```
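Because the heap bounds come from the linker script, C code sees them as symbols whose *addresses* matter, not their values. Here is a minimal sketch of reading them, assuming `_heap_start` (shown in the diagram) and a hypothetical `_heap_end` symbol. On a host build, a static array stands in for the region so the logic can be tested off-target. `<stdint.h>` and `<stddef.h>` are freestanding headers; the project itself uses its own typedefs.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__ARM_ARCH_7EM__)
/* Cortex-M4 build: symbols defined in linker.ld. Only their addresses are
 * meaningful; the linker assigns a location, not a stored value.
 * _heap_end is a hypothetical name, e.g. defined in linker.ld as
 *   _heap_end = ORIGIN(RAM) + LENGTH(RAM) - STACK_SIZE;  (hypothetical) */
extern uint8_t _heap_start;
extern uint8_t _heap_end;
#define HEAP_BEGIN (&_heap_start)
#define HEAP_END   (&_heap_end)
#else
/* Host build: a static array stands in for the linker-defined region. */
static uint8_t host_heap[4096];
#define HEAP_BEGIN (&host_heap[0])
#define HEAP_END   (&host_heap[sizeof host_heap])
#endif

/* Number of bytes the allocator may manage. */
static size_t heap_region_size(void) {
    return (size_t)(HEAP_END - HEAP_BEGIN);
}
```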
Every allocation is preceded by a small header:
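```c
typedef struct memblock {
    uint32_t size;          // usable bytes in this block
    uint8_t  used;          // 1 = allocated, 0 = free
    struct memblock *next;  // next block in the list
    struct memblock *prev;  // previous block (for O(1) coalescing)
} memblock;
```

With the default (unpacked) layout on Cortex-M4, this header takes 16 bytes: 4 for `size`, 1 for `used` plus 3 bytes of padding, and 4 for each pointer. When the first free block that fits is larger than the request, `stm_malloc` splits it, and the remainder becomes a new free block with its own header. Starting from a single free block of N usable bytes:

```
Before malloc(64):
[ block: N bytes, free ]

After malloc(64):
[ block: 64 B, used ] -> [ block: N - 64 - sizeof(memblock) bytes, free ]
```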
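The split above can be sketched as a first-fit search over the block list. This minimal version runs against a 1 KiB static array instead of the linker-defined region. The 8-byte rounding, the split threshold (one header plus 8 bytes), and the names `heap_init` and `first_fit_malloc` are assumptions for illustration, not HeapForge's code.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct memblock {
    uint32_t size;            /* usable bytes in this block */
    uint8_t used;             /* 1 = allocated, 0 = free */
    struct memblock *next;
    struct memblock *prev;
} memblock;

#define HDR   ((uint32_t)sizeof(memblock))
#define ALIGN 8u  /* assumption: payload sizes rounded up to 8 bytes */

/* Host stand-in for the linker-defined heap region. */
static _Alignas(8) uint8_t arena[1024];
static memblock *heap_head;

/* Turn the whole arena into one free block. */
static void heap_init(void) {
    heap_head = (memblock *)arena;
    heap_head->size = (uint32_t)sizeof(arena) - HDR;
    heap_head->used = 0;
    heap_head->next = NULL;
    heap_head->prev = NULL;
}

static void *first_fit_malloc(uint32_t size) {
    if (size == 0u || size > UINT32_MAX - (ALIGN - 1u)) return NULL;
    size = (size + ALIGN - 1u) & ~(ALIGN - 1u);  /* round up to ALIGN */

    for (memblock *b = heap_head; b != NULL; b = b->next) {
        if (b->used || b->size < size) continue;  /* first fit wins */

        /* Split only if the leftover can hold a header plus ALIGN bytes. */
        if (b->size >= size + HDR + ALIGN) {
            memblock *rest = (memblock *)((uint8_t *)(b + 1) + size);
            rest->size = b->size - size - HDR;
            rest->used = 0;
            rest->next = b->next;
            rest->prev = b;
            if (b->next) b->next->prev = rest;
            b->next = rest;
            b->size = size;
        }
        b->used = 1;
        return b + 1;  /* payload starts right after the header */
    }
    return NULL;  /* no free block is large enough */
}
```

Mirroring the diagram, after a 64-byte request the remainder is the original size minus 64 bytes minus one header. First fit keeps each call to a single O(n) scan that stops at the first adequate block; a best-fit variant would instead scan the whole list for the smallest one.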
When a block is freed, HeapForge merges it with any adjacent free blocks in both directions, which limits external fragmentation:
```
Before free(B):
[ A: free ] -> [ B: used ] -> [ C: free ]

After free(B):
[ A+B+C: free ]
```
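The merge above can be sketched as follows, together with a list check in the spirit of `heap_integrity_check()`. The sketch assumes list order equals address order, so list neighbours are also physical neighbours. That holds when every block is carved from one region by splitting. `coalescing_free`, `absorb_next`, and `list_is_consistent` are hypothetical names, and the adjacency test in `list_is_consistent` is an extra check beyond the `next->prev` validation the README describes.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct memblock {
    uint32_t size;            /* usable bytes in this block */
    uint8_t used;             /* 1 = allocated, 0 = free */
    struct memblock *next;
    struct memblock *prev;
} memblock;

#define HDR ((uint32_t)sizeof(memblock))

/* Absorb b->next into b: its header and payload become part of b. */
static void absorb_next(memblock *b) {
    memblock *n = b->next;
    b->size += HDR + n->size;
    b->next = n->next;
    if (n->next) n->next->prev = b;
}

static void coalescing_free(void *ptr) {
    if (ptr == NULL) return;
    memblock *b = (memblock *)ptr - 1;  /* header sits just before payload */
    b->used = 0;
    if (b->next && !b->next->used) absorb_next(b);        /* forward: B+C */
    if (b->prev && !b->prev->used) absorb_next(b->prev);  /* backward: A+(B+C) */
}

/* Checks that every next->prev points back, and (extra) that each block
 * ends exactly where the next one begins. Returns 1 if consistent. */
static int list_is_consistent(const memblock *head) {
    if (head != NULL && head->prev != NULL) return 0;
    for (const memblock *b = head; b != NULL && b->next != NULL; b = b->next) {
        if (b->next->prev != b) return 0;
        if ((const uint8_t *)(b + 1) + b->size != (const uint8_t *)b->next) return 0;
    }
    return 1;
}
```

Because `prev` is stored in each header, the backward merge needs no list walk, which is the O(1) coalescing the header comment refers to.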
```
heapforge/
├── src/
│   └── main.c          # allocator implementation + test suite
├── inc/
│   └── heapforge.h     # types, macros, register definitions
├── startup.c           # vector table + Reset_Handler
├── linker.ld           # memory layout, heap symbols
└── Makefile
```
Requirements:

- `arm-none-eabi-gcc`
- `probe-rs`
- macOS / Linux

```sh
brew install --cask gcc-arm-embedded
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/probe-rs/probe-rs/releases/latest/download/probe-rs-tools-installer.sh | sh
```

```sh
make        # compile
make flash  # flash via probe-rs over SWD
make re     # clean rebuild
probe-rs rtt --chip STM32F411CEUx --elf blink.elf  # stream RTT debug output
```

| Property | Value |
|---|---|
| MCU | STM32F411CEU6 |
| Core | Cortex-M4F |
| Flash | 512K @ 0x08000000 |
| RAM | 128K @ 0x20000000 |
| Board | WeAct Black Pill |
Concepts covered:

- Writing a linker script from scratch and exposing its symbols to C
- Bare-metal startup code: vector table, `.data` copy, `.bss` zero-init
- Memory-mapped peripheral registers (GPIO, RCC) without any HAL
- Implementing a doubly linked block-list allocator with coalescing
- RTT (Real-Time Transfer) protocol for zero-wiring debug output
- Why `volatile` matters for memory-mapped registers and shared memory
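This is a minimal sketch of the register-level, HAL-free style listed above. It assumes the addresses from the STM32F411 reference manual (RM0383) and that the Black Pill's user LED is on PC13. The pin helpers take a `volatile uint32_t *`, so they can also be exercised on a host against an ordinary variable. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* STM32F411 addresses (RM0383): RCC base 0x40023800, AHB1ENR at offset
 * 0x30; GPIOC base 0x40020800, MODER at 0x00, ODR at 0x14.
 * volatile forces every read and write to actually reach the register,
 * in program order relative to other volatile accesses; without it the
 * compiler may merge, reorder, or drop accesses it thinks are redundant. */
#define RCC_AHB1ENR ((volatile uint32_t *)0x40023830u)
#define GPIOC_MODER ((volatile uint32_t *)0x40020800u)
#define GPIOC_ODR   ((volatile uint32_t *)0x40020814u)
#define RCC_GPIOCEN (1u << 2)  /* GPIOC clock enable bit in AHB1ENR */

/* Configure one pin as a general-purpose output (MODER bits = 01). */
static void pin_set_output(volatile uint32_t *moder, unsigned pin) {
    uint32_t v = *moder;           /* single read of the register */
    v &= ~(3u << (pin * 2u));      /* clear the pin's two mode bits */
    v |= 1u << (pin * 2u);         /* 01 = general-purpose output */
    *moder = v;                    /* single write back */
}

/* Flip one output bit (read-modify-write of ODR). */
static void pin_toggle(volatile uint32_t *odr, unsigned pin) {
    *odr ^= 1u << pin;
}

/* Target-only: enable the GPIOC clock, then drive PC13 (assumed LED pin). */
static void led_init_and_toggle(void) {
    *RCC_AHB1ENR |= RCC_GPIOCEN;
    pin_set_output(GPIOC_MODER, 13u);
    pin_toggle(GPIOC_ODR, 13u);
}
```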
Roadmap:

- `stm_realloc`
- Best-fit allocation strategy
- `stm_malloc_aligned` for DMA-safe allocations
- `heap_dump`: full block list over RTT
- `heap_stats`: fragmentation metrics
- UART backend for standalone debug output
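`heap_stats` is listed as planned. One common way to summarise external fragmentation is `1 - largest_free / total_free`. It is 0% when all free memory is a single block and approaches 100% as free memory splinters into small pieces. The sketch below computes that metric in integer percent over a block list. `heap_stats_t` and `collect_stats` are hypothetical names, not the project's API, and this metric is only one of several in use.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct memblock {
    uint32_t size;            /* usable bytes in this block */
    uint8_t used;             /* 1 = allocated, 0 = free */
    struct memblock *next;
    struct memblock *prev;
} memblock;

typedef struct {
    uint32_t total_free;    /* sum of free payload bytes */
    uint32_t largest_free;  /* biggest single free block */
    uint32_t free_blocks;
    uint32_t used_blocks;
    uint32_t frag_percent;  /* 100 * (1 - largest_free / total_free) */
} heap_stats_t;

static heap_stats_t collect_stats(const memblock *head) {
    heap_stats_t s = {0u, 0u, 0u, 0u, 0u};
    for (const memblock *b = head; b != NULL; b = b->next) {
        if (b->used) {
            s.used_blocks++;
            continue;
        }
        s.free_blocks++;
        s.total_free += b->size;
        if (b->size > s.largest_free) s.largest_free = b->size;
    }
    /* Integer arithmetic only; 100 * largest_free fits in 32 bits for any
     * heap below about 42 MB, far above this MCU's 128K of RAM. */
    if (s.total_free > 0u) {
        s.frag_percent = 100u - (100u * s.largest_free) / s.total_free;
    }
    return s;
}
```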