Reduce aplite memory pressure #119

@mattrossman

Aplite crashes are most likely coming from draw-time allocation, uncapped forecast buffers, and a few remaining low-memory gaps, so we should keep shrinking and instrumenting the heap-sensitive paths.

Full memory investigation

Aplite Memory Investigation

Summary

We were chasing repeated Aplite-only crashes caused by heap pressure. The app was stable on Basalt/Diorite but fragile on Aplite because the runtime heap budget is extremely small once the OS/runtime and app startup allocations are accounted for.

Key Numbers

  • Aplite app init baseline: heap_bytes_free() started around 2356 bytes, fell to 1780 after app_message_init, and to 1756 after config load.
  • Aplite after UI init: some runs were left with only 100 bytes free at runtime.
  • Final Aplite link-time report after the latest changes: 2534 bytes free heap, 22042 bytes footprint in RAM.
  • Basalt comparison: roughly 42-43 KB free heap after the same startup flow.

What We Found

  1. AppMessage is a fixed early cost, but not the main culprit.
    • app_message_init chose a 512 byte inbox.
    • app_message_open consumed about 576 bytes of heap on Aplite in the logged run.
    • This is meaningful, but not the dominant problem.
  2. The biggest Aplite pressure came from layer/object count.
    • forecast_layer was moderate.
    • weather_status_layer was expensive because it created multiple TextLayers and icon objects.
    • time_layer added three TextLayers.
    • calendar_layer originally created 21 TextLayers and was a major heap hog.
    • calendar_status_layer and battery_layer were the next crash points once earlier pressure was reduced.
  3. Calendar refactor was the biggest real improvement.
    • Replacing the 21 calendar TextLayers with a single custom-drawn layer improved the Aplite heap by about +396 bytes at that stage.
    • This moved the crash point forward and confirmed that per-cell text layers were the biggest single fixed cost.
  4. The rain/snow PNG experiment was not a big heap lever.
    • Build-time comparison showed the PNG version had slightly more free heap than the path version (3010 vs 2760 in one build snapshot).
    • The conclusion was that the precip asset choice was a small contributor, not the root issue.
  5. The battery layer crash was a null-allocation bug under low heap.
    • The crash mapped to battery_layer.c:85.
    • That line was dereferencing the palette pointer immediately after malloc(2 * sizeof(GColor)).
    • Null checks were added so the app can degrade gracefully instead of faulting.
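
The battery fix described above comes down to checking the allocation before touching it. Here is a minimal sketch that compiles off-device, assuming a stand-in GColor type and a hypothetical palette_create helper (neither is the actual battery_layer.c code). The allocator is passed in only so the failure path can be exercised without a watch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the Pebble SDK's GColor so the sketch compiles off-device. */
typedef struct { uint8_t argb; } GColor;

/* Allocator signature, injectable so the out-of-memory path can be tested. */
typedef void *(*alloc_fn)(size_t size);

/* Hypothetical helper: returns a two-entry palette, or NULL when the heap
   is exhausted. The original bug wrote to the palette without this check. */
static GColor *palette_create(alloc_fn alloc, GColor fg, GColor bg) {
  GColor *palette = alloc(2 * sizeof(GColor));
  if (palette == NULL) {
    return NULL;  /* caller skips the bitmap instead of dereferencing NULL */
  }
  palette[0] = fg;
  palette[1] = bg;
  return palette;
}

/* Simulates an exhausted heap. */
static void *failing_alloc(size_t size) {
  (void)size;
  return NULL;
}
```

A caller that receives NULL can skip the charging bitmap and draw nothing (or a primitive fallback), which is the graceful degradation the fix aims for.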

Important Conclusions

  • Aplite does not have a realistic “comfortable” heap margin for this design.
  • The app can be perfectly fine on Basalt while still being very close to the edge on Aplite.
  • The biggest Aplite savings came from removing whole layer objects, not from swapping a small icon implementation.
  • The remaining risk is not one single bug; it is the cumulative cost of a fairly layered UI on a 24 KB platform.

What Changed

  • Removed heap GPath allocations from hot draw paths.
  • Flattened the calendar grid from 21 TextLayers to one custom-drawn layer.
  • Added heap probes around startup and layer init/refresh paths.
  • Added null checks in battery_layer and other creation paths.
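
To illustrate the calendar flattening listed above: a single layer can compute each cell's frame on the fly instead of owning 21 TextLayers. This is a rough sketch, assuming a 7-column by 3-row grid and a stand-in CellRect type in place of the SDK's GRect. calendar_cell_rect is a hypothetical name, not the function in calendar_layer.c:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Pebble SDK's GRect so the sketch compiles off-device. */
typedef struct { int16_t x, y, w, h; } CellRect;

/* 21 cells in total; the 7x3 split is an assumption for illustration. */
enum { CAL_COLS = 7, CAL_ROWS = 3 };

/* Returns the frame of cell `index` (0..20) inside a layer of the given
   size. Nothing is allocated: the frame is derived from the index each
   time the layer redraws. */
static CellRect calendar_cell_rect(int index, int16_t layer_w, int16_t layer_h) {
  int16_t cell_w = (int16_t)(layer_w / CAL_COLS);
  int16_t cell_h = (int16_t)(layer_h / CAL_ROWS);
  CellRect r = {
    (int16_t)((index % CAL_COLS) * cell_w),  /* column offset */
    (int16_t)((index / CAL_COLS) * cell_h),  /* row offset */
    cell_w,
    cell_h,
  };
  return r;
}
```

On the watch, the layer's update proc would loop over the 21 indices and pass each frame to graphics_draw_text(), so the fixed heap cost is one layer rather than 21 TextLayers.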

Next Things To Do Later

  • Convert battery_layer to a primitive-drawn Aplite path instead of relying on the charging bitmap/palette.
  • Consider flattening any other small status icons that still use layered bitmap/text stacks.
  • Keep heap_bytes_free() probes around startup and layer creation until the remaining Aplite path is stable.

Change Log

This is the more complete map of the unstaged work that got us here:

  • src/c/watchface.c
    • Added app-start/app-shutdown heap probes so we can see the real budget before and after initialization.
  • src/c/appendix/app_message.c
    • Added heap_bytes_free() logging around AppMessage setup.
    • Logged the chosen inbox size so we could confirm we were not accidentally using a giant buffer.
  • src/c/layers/forecast_layer.c
    • Replaced per-draw GPath allocation with stack paths and added heap logging.
    • Later simplified the forecast drawing path further while profiling Aplite pressure.
  • src/c/layers/time_layer.c
    • Added heap probes around each TextLayer create and tick/update path.
    • Confirmed that the three-layer time stack was a measurable but not catastrophic cost.
  • src/c/layers/calendar_layer.c
    • Replaced the 21 per-day TextLayers with one custom-drawn layer.
    • This was the biggest single heap win in the session.
  • src/c/layers/weather_status_layer.c
    • Added heap probes and null checks for layer creation.
    • Experimented with precip icon rendering, including PNG-based assets and path-based fallbacks.
    • The PNG vs path choice was not a meaningful long-term heap lever compared with the text-layer savings.
  • src/c/layers/calendar_status_layer.c
    • Added profiling around the status bar.
    • Started moving the Aplite path away from a bitmap-heavy status strip toward a simpler draw path.
  • src/c/layers/battery_layer.c
    • Added null checks for bitmap and palette allocation.
    • Identified the low-heap crash at battery_layer.c:85 as a dereference-after-malloc failure.
  • src/c/layers/loading_layer.c
    • Removed an extra TextLayer and drew the loading message directly to save heap.
  • src/c/appendix/persist.c
    • Fixed a bad default write for PRECIP_TREND that was writing to the wrong key.
  • resources/img/rain.png and resources/img/snow.png
    • Added for the precip icon experiment.
    • Later proved to be a small contributor, not the root cause of the Aplite memory pressure.

Decision Summary

  • Keep:
    • startup heap probes
    • AppMessage sizing/logging
    • the calendar grid refactor
    • battery null checks
    • the loading-layer simplification
  • Revisit later:
    • whether the battery/status strip should be flattened further for Aplite
    • whether the precip icon implementation should stay path-based or resource-based once the rest of the app is stable
  • Likely discard if you want to simplify the branch:
    • most of the temporary profiling noise once the remaining heap hotspots are resolved

Automated Memory Monitoring In CI

This can be automated so we do not have to inspect logs by hand after every change.

Basic idea

  1. Add a small C helper like log_heap("after forecast refresh") that prints heap_bytes_free() and heap_bytes_used() with a consistent label.
  2. Run the watchface in the emulator during CI.
  3. Capture emulator stdout/stderr to a log file.
  4. Parse the heap lines and compare them to a baseline or threshold.
  5. Fail the CI job if free heap drops too far or if a checkpoint regresses.
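
Steps 4 and 5 could be sketched as a small host-side checker. This assumes the log lines end in the "free=<n> used=<n>" shape that a log_heap helper would print. The function names parse_heap_line and heap_below_threshold are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses a line containing "<tag> free=<n> used=<n>". Any prefix the
   emulator adds before the tag is ignored. Returns 1 on success, 0 if
   the line is not a heap line. */
static int parse_heap_line(const char *line, unsigned long *free_bytes,
                           unsigned long *used_bytes) {
  const char *p = strstr(line, " free=");
  if (p == NULL) {
    return 0;
  }
  return sscanf(p, " free=%lu used=%lu", free_bytes, used_bytes) == 2;
}

/* Returns 1 when the line is a heap line whose free bytes fall below
   min_free, which is the condition that should fail the CI job. */
static int heap_below_threshold(const char *line, unsigned long min_free) {
  unsigned long free_bytes = 0;
  unsigned long used_bytes = 0;
  if (!parse_heap_line(line, &free_bytes, &used_bytes)) {
    return 0;
  }
  return free_bytes < min_free;
}
```

A CI wrapper would read pebble.log line by line, call heap_below_threshold() on each, and exit non-zero on the first hit. The threshold itself is a project decision, not something this sketch prescribes.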

How to keep CI from hanging

Do not wait on an open-ended emulator session forever. Use a marker-first run:

  • start the emulator in the background
  • wait for a known log marker like MEMORY_CHECK_DONE
  • kill the emulator as soon as that marker appears
  • keep a short timeout only as a fallback if the marker never shows up

Example shape:

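mise install-emulator --logs > pebble.log 2>&1 &
pid=$!

# Fallback timeout (bash SECONDS counter); 60 s is a placeholder, tune as needed.
deadline=$((SECONDS + 60))
until grep -q 'MEMORY_CHECK_DONE' pebble.log 2>/dev/null; do
  if [ "$SECONDS" -ge "$deadline" ]; then
    echo "timed out waiting for MEMORY_CHECK_DONE" >&2
    kill "$pid"; wait "$pid" || true
    exit 1
  fi
  sleep 1
done

kill "$pid"
wait "$pid" || true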

That keeps the job from hanging while still collecting the startup and refresh logs we care about.

Keep heap logging out of release builds

The logging should be compiled out of release builds.

The repo already has a build-profile split:

  • profiles/package.dev.json sets buildProfile to dev
  • profiles/package.release.json sets buildProfile to release
  • mise build and mise build release already select the profile

Use that to generate a tiny C header such as build_config.h with a flag like FCW2_ENABLE_HEAP_LOGGING.

Example:

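// Expected to be 1 in dev builds and 0 (or undefined) in release builds.
#if FCW2_ENABLE_HEAP_LOGGING
// Logs free and used heap under a consistent, greppable label.
static inline void log_heap(const char *tag) {
  APP_LOG(APP_LOG_LEVEL_DEBUG, "%s free=%lu used=%lu",
          tag,
          (unsigned long)heap_bytes_free(),
          (unsigned long)heap_bytes_used());
}
#else
// Expands to nothing, so release builds pay no logging cost.
#define log_heap(tag) ((void)0)
#endif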

That gives us:

  • dev builds: full heap observability
  • release builds: zero runtime logging cost
  • one code path, no manual cleanup before shipping

What to measure automatically

  • boot heap
  • after app_message_open()
  • after each major layer group is created
  • after forecast refresh
  • after config refresh

What the CI check should look for

  • minimum free heap at boot
  • no regression larger than an agreed byte threshold
  • no unexpected heap drop after redraw-only paths
  • no missing heap log lines for the required checkpoints
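
The regression and missing-checkpoint checks above might look like the following host-side sketch. It assumes the heap lines have already been parsed into tag/value pairs. HeapSample, CheckResult, and check_against_baseline are hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *tag;          /* checkpoint label, e.g. "boot" */
  unsigned long free_bytes; /* heap_bytes_free() reading at that point */
} HeapSample;

typedef enum {
  CHECK_OK = 0,
  CHECK_MISSING = 1,   /* a required checkpoint never appeared in the log */
  CHECK_REGRESSED = 2  /* free heap dropped by more than the allowed amount */
} CheckResult;

/* Looks up a checkpoint by tag; returns NULL if it was never logged. */
static const HeapSample *find_sample(const HeapSample *samples, size_t count,
                                     const char *tag) {
  for (size_t i = 0; i < count; i++) {
    if (strcmp(samples[i].tag, tag) == 0) {
      return &samples[i];
    }
  }
  return NULL;
}

/* Compares every baseline checkpoint with the current run and reports the
   first problem found. Gains in free heap are never flagged. */
static CheckResult check_against_baseline(const HeapSample *baseline, size_t nb,
                                          const HeapSample *current, size_t nc,
                                          unsigned long max_drop) {
  for (size_t i = 0; i < nb; i++) {
    const HeapSample *cur = find_sample(current, nc, baseline[i].tag);
    if (cur == NULL) {
      return CHECK_MISSING;
    }
    if (cur->free_bytes < baseline[i].free_bytes &&
        baseline[i].free_bytes - cur->free_bytes > max_drop) {
      return CHECK_REGRESSED;
    }
  }
  return CHECK_OK;
}
```

Treating the baseline as the list of required checkpoints means a renamed or deleted log_heap call fails the job, rather than silently narrowing what the check covers.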

Practical recommendation

Start with one Aplite smoke test and one threshold, not a full benchmark suite. The goal is to catch regressions early, not to perfectly profile every byte.

Keep/Toss Checklist

  • Keep for now
    • src/c/watchface.c heap probes
    • src/c/appendix/app_message.c sizing/logging
    • src/c/layers/calendar_layer.c custom-drawn calendar
    • src/c/layers/battery_layer.c null checks
    • src/c/layers/loading_layer.c no-text-layer loading state
  • Keep only if Aplite still needs it
    • src/c/layers/calendar_status_layer.c simplification
    • src/c/layers/weather_status_layer.c precip/icon fallback work
  • Toss once stable
    • extra heap logging that no longer helps debug
    • the rain/snow PNG experiment if you decide to stay with path/primitives
    • any temporary Aplite-only fallback code that duplicates Basalt behavior without real benefit
