Wired Memory Management System #348
Conversation
So devs can customize admission gating
Tested end-to-end with the changes in the most recent mlx-swift-lm commits. The pipeline from here to inference over there is effectively wired together, with unit tests demonstrating the system, showing how to use it effectively, and preserving default behavior to some degree.

I added a policy-only mode for CPU and future CUDA workflows (I see open issues about CUDA support with indications of inclusion) and made it so Apple Silicon devices in CPU-only mode can still use `maximumRecommendedWorkingSetBytes` as a reference cap 😄
> MLX only provides the generic interfaces. MLXLMCommon (from mlx-swift-lm) provides LLM-focused policies such as `WiredSumPolicy`, `WiredMaxPolicy`, and `WiredFixedPolicy`. You can use `GPU.maxRecommendedWorkingSetBytes()` as a portable upper bound when designing custom policies.
I wonder if one or more of these should be in mlx-swift? `WiredSumPolicy` or `WiredMaxPolicy`, for example, might be generic enough and could be used in domains outside of LLMs. I do agree that the interesting policies will be domain specific.
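For what it's worth, a domain-agnostic sum/max policy could be quite small. The sketch below is illustrative only -- it assumes a simplified protocol (`SimpleWiredPolicy` and `desiredLimit(activeTicketSizes:)` are hypothetical names, not the PR's actual `WiredMemoryPolicy` requirements):

```swift
// Hypothetical sketch of domain-agnostic policies. Assumes a policy only
// needs to map the set of outstanding ticket sizes to a desired wired limit.
protocol SimpleWiredPolicy {
    /// Desired wired limit given the sizes of all active tickets.
    func desiredLimit(activeTicketSizes: [Int]) -> Int
}

/// Sum of all outstanding reservations: safe when tickets run concurrently.
struct SumPolicySketch: SimpleWiredPolicy {
    func desiredLimit(activeTicketSizes: [Int]) -> Int {
        activeTicketSizes.reduce(0, +)
    }
}

/// Largest single reservation: enough when tickets are serialized.
struct MaxPolicySketch: SimpleWiredPolicy {
    func desiredLimit(activeTicketSizes: [Int]) -> Int {
        activeTicketSizes.max() ?? 0
    }
}
```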
```swift
try await ticket.withWiredLimit {
    // run inference
}
```
Nice clear example!
```swift
// Reserve model weights without keeping the limit elevated while idle.
let weights = policy.ticket(size: weightsBytes, kind: .reservation)
_ = await weights.start()
```
Should there be a way to cancel this? E.g., you might have a server that loads and unloads models (weights). It might make sense to cancel the policy rather than this reservation, or if this returned a Cancellable you could cancel the specific reservation.
I think if this were inside a Task you could cancel the task (I can see how that works), but this example looks like you might want to hold a ticket long-term, and wrapping it in a Task is unwieldy.
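One possible shape, purely illustrative (`ReservationHandle` and `reserve()` are hypothetical names, not part of this PR): starting a reservation returns a handle that can be cancelled directly, without wrapping anything in a Task.

```swift
// Hypothetical shape for a cancellable reservation, as an alternative to
// wrapping start() in a Task.
public struct ReservationHandle: Sendable {
    private let onCancel: @Sendable () async -> Void

    init(onCancel: @escaping @Sendable () async -> Void) {
        self.onCancel = onCancel
    }

    /// Releases the reservation and lets the manager shrink the limit.
    public func cancel() async {
        await onCancel()
    }
}

// Usage sketch: a server loading and unloading model weights.
// let handle = await weightsTicket.reserve()   // hypothetical
// ... serve requests ...
// await handle.cancel()                        // unload: release reservation
```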
> ### Choosing a baseline
>
> When wired memory is unsupported, the manager will use:
This might be unclear -- if wired memory is unsupported what do these values actually do? I would think it would just be a NOP.
```swift
let result = mlx_set_wired_limit(&previous, 0)
guard result == 0 else { return nil }
var tmp: size_t = 0
_ = mlx_set_wired_limit(&tmp, previous)
```
I am concerned that temporarily setting it to 0 may cause trouble. Can we avoid this? What if the manager could answer this, and we just document that use of multiple managers outside a testing context is ill-advised or undefined? The manager sets the limit, so it should surely have this answer.
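A sketch of that idea -- the manager answers from its own bookkeeping, assuming a single manager per process. The actor caches the last limit it applied; all names here are illustrative:

```swift
// Sketch: the manager records every limit it sets, so reads never have to
// probe via mlx_set_wired_limit. Assumes a single manager per process
// (multiple managers documented as undefined outside tests).
actor LimitCacheSketch {
    private var lastAppliedLimit: Int?

    func apply(_ limit: Int) {
        // mlx_set_wired_limit(&previous, size_t(limit))  // real backend call
        lastAppliedLimit = limit
    }

    /// Answers "what is the current limit?" from our own record, instead of
    /// temporarily setting the limit to 0 just to read back the old value.
    func currentLimit() -> Int? {
        lastAppliedLimit
    }
}
```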
```swift
///
/// These settings implement hysteresis to prevent small or frequent shrinks
/// while active work is running. Growing the limit is always allowed; shrinking
/// is gated by a minimum drop and a minimum time between changes.
```
Is this something you observed being a problem? If so, awesome -- it seems like it could be useful to keep active memory hot.
```swift
if let lastLimitChange {
    let elapsed = Date().timeIntervalSince(lastLimitChange)
    if elapsed < configuration.shrinkCooldown {
```
I have one concern here: the way I read this is we won't shrink unless another request comes through here?
Consider:
- request for N bytes of wired memory
- request is complete, we try to reduce back to 0, but the cooldown blocks the shrink
- no more requests come in
Does the process still hold the wired memory? I think so, but perhaps there is a path I didn't see.
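One way this could be handled, sketched below under the assumption that the PR does not already do it: when the cooldown blocks a shrink, schedule a deferred task that retries after the cooldown expires, so the wired memory is eventually released even with no further requests. `shrinkCooldown` follows the PR's configuration name; the pending-task bookkeeping is hypothetical.

```swift
import Foundation

// Sketch: a blocked shrink is not dropped, it is retried after the cooldown.
actor DeferredShrinkSketch {
    let shrinkCooldown: TimeInterval = 2.0
    private var lastLimitChange: Date?
    private var pendingShrink: Task<Void, Never>?

    func requestShrink() {
        if let lastLimitChange {
            let elapsed = Date().timeIntervalSince(lastLimitChange)
            if elapsed < shrinkCooldown {
                // Don't drop the request: retry once the cooldown expires.
                pendingShrink?.cancel()
                pendingShrink = Task { [remaining = shrinkCooldown - elapsed] in
                    try? await Task.sleep(nanoseconds: UInt64(remaining * 1e9))
                    guard !Task.isCancelled else { return }
                    await self.requestShrink()
                }
                return
            }
        }
        applyCurrentLimit()
        lastLimitChange = Date()
    }

    private func applyCurrentLimit() {
        // Recompute and apply the limit from active tickets (elided).
    }
}
```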
```swift
/// wired memory control is unsupported (e.g. CPU-only execution). The
/// manager will not attempt to change wired memory, but tickets can still
/// gate admission and emit events.
public var policyOnlyWhenUnsupported: Bool
```
Is there a benefit to having the default be false? Or even having this config -- why not just always do this? It seems harmless (the compute cost is minimal even if the result is a NOP).
```swift
/// Debug label for the policy group, if applicable.
public let policy: String?
/// Baseline wired limit captured from the system.
public let baseline: Int?
```
Maybe the term baseline needs an explicit definition? My concept is that this is the idle floor -- we don't go below this, even when there are no outstanding requests. But the unsupported case and the code seem to tie this to the max supported value -- I am not sure what the intent is.
I think the baseline today (without this code) is 0.
```swift
/// Stable grouping key for policies.
private enum PolicyKey: Hashable {
    case identifier(AnyHashable)
```
Do we need this enum? Why not just use `AnyHashable` as the key? This is internal to the actor, so if the implementation requires adding another level it is safe to do so without callers being aware.
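For illustration, keying directly on `AnyHashable` might look like this (a minimal sketch, not the PR's actual state):

```swift
import Foundation

// Sketch: AnyHashable is itself Hashable, so it can key the dictionary
// directly; a wrapper can be reintroduced later without callers noticing.
actor PolicyGroupsSketch {
    private var groups: [AnyHashable: [UUID]] = [:]

    func add(ticket id: UUID, toPolicy policyID: AnyHashable) {
        groups[policyID, default: []].append(id)
    }
}
```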
```swift
public func end(id: UUID, policy: any WiredMemoryPolicy) async -> Int {
    if let waiter = waiters.removeValue(forKey: id) {
        waiter.resume()
    }
```
There is a call to `resumeWaiters()` at the end that will awaken all the waiters. Why have a one-off here? (Not saying it is wrong, but it isn't clear to me -- if we need it, I think maybe it needs a comment.)
```swift
    waiter.resume()
}

guard WiredMemoryBackend.isSupported || policyOnlyMode else {
```
I think `isSupported` will return false if this is a Task set for the CPU device. A few questions:

- Will returning a 0 here affect the wired memory for the GPU tasks? Would a nil be better to indicate "no change"?
- The return value is more informative -- policy is applied by `applyCurrentLimit()`. Would returning `currentLimit ?? baseline ?? 0` from the end of the function make more sense?
- Is it OK that `resumeWaiters()` is not called (also consider this question if the one-off in a previous line isn't needed)? I think maybe this is OK because the wired memory probably doesn't change here.
```swift
}

guard let state = tickets.removeValue(forKey: id) else {
    emit(kind: .ticketEndIgnored, ticketID: id)
```
What is the case where this would happen? I wonder if this should be a fatalError -- is some invariant violated here (like we lost a ticket)?
I think at one point it was mentioned that `end` should be idempotent. Should it? Or is that papering over programming errors? `free()` isn't idempotent and is a similar resource-release idea.
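A sketch of the stricter, `free()`-like alternative, with illustrative names (the PR's ticket state type is richer than an `Int` size):

```swift
import Foundation

// Sketch: treat a double end as a programming error rather than ignoring it.
func endStrict(id: UUID, tickets: inout [UUID: Int]) -> Int {
    guard let size = tickets.removeValue(forKey: id) else {
        // Invariant violated: the ticket was ended twice or never started.
        preconditionFailure("end called for unknown ticket \(id)")
    }
    return size
}
```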
```swift
    continuation.onTermination = { _ in
        Task { await self.removeEventContinuation(id: id) }
    }
}
```
I think this is safe -- the call to the AsyncStream initializer is synchronous, so the interior happens before it returns. It surprised me because I was used to this form: `let (stream, continuation) = AsyncStream<Generation>.makeStream()`. I don't have a strong preference -- I think they are the same, just pointing it out as I had to read to make sure I understood.
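For reference, the two forms side by side (both run synchronously; `makeStream()` just returns the continuation instead of handing it to a closure):

```swift
// Form 1: continuation handed to the build closure inside the initializer.
let streamA = AsyncStream<Int> { continuation in
    continuation.onTermination = { _ in /* cleanup */ }
}

// Form 2: continuation returned alongside the stream (Swift 5.9+).
let (streamB, continuationB) = AsyncStream<Int>.makeStream()
continuationB.onTermination = { _ in /* cleanup */ }
```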
```swift
var baselineValue = ensureBaseline(refresh: baseline == nil || !hasActiveWork())
if tickets[id] != nil {
    emit(
        kind: .ticketStartIgnored,
```
Similar question as on `end` -- a double start might be a programming error. Is there a use case for allowing this?
```swift
///
/// If this was the last ticket, the manager restores the baseline and
/// clears internal state.
public func end(id: UUID, policy: any WiredMemoryPolicy) async -> Int {
```
This is marked as async but I do not see any await inside. Callers (outside the actor) have to await anyway as that is what the actor requires, but this would also force callers inside the actor (if there were any) to 1) be async and 2) have a potential suspension point, which requires a little more care.
I think the async should be removed.
```swift
if WiredMemoryBackend.isSupported {
    return WiredMemoryBackend.readCurrentLimit() ?? 0
```
Could we use `currentLimit` here? I know there is an issue if you have multiple managers, but we can document that as undefined (it seems useless except for tests).
In fact, I wonder ... we could make the `init()` on the actor be non-public to enforce this. The tests could still create multiple. Or add an obvious path like `static func newTestPolicy()` that would be more noticeable if somebody tried multiple in a real program?
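Sketched out, the suggestion might look like this -- one blessed instance plus a loudly named test-only factory (the factory name adapts the `newTestPolicy()` idea; everything here is illustrative):

```swift
// Sketch: a single process-wide manager with a non-public init.
public actor WiredMemoryManagerSketch {
    /// The single process-wide manager real programs should use.
    public static let shared = WiredMemoryManagerSketch()

    // Non-public init prevents casual creation of a second manager.
    init() {}

    /// Test-only escape hatch; multiple managers are undefined in production.
    public static func newTestManager() -> WiredMemoryManagerSketch {
        WiredMemoryManagerSketch()
    }
}
```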
```swift
    return 0
}

private func ensureBaseline(refresh: Bool) -> Int {
```
This is not taking an action, it is computing the current baseline. `ensure` sounds like it is applying. I see it sits on top of `resolve` so we are already using that name :-). The key here is that it notices when the resolved baseline changes. `resolveBaselineAndEmit`? I am not great with the names (and it is private so ultimately just needed for clarity), see what you think.
> ```
> popd
> ```
>
> ## Wired Memory Management
These docs are well written but I don't think we need them at the top level of the README. I wonder if this would be better in Source/MLX/Documentation.docc/MLX.md (the top level of the MLX docs where there are pointers to some of the articles with more information). This README is much higher level.
```swift
    size: Int,
    policy: any WiredMemoryPolicy,
    kind: WiredMemoryTicketKind
) async -> Int {
```
Unlike `end`, this one must be async as it can suspend per the `admit` on the policy.
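To illustrate why the suspension is inherent: an admission gate typically parks callers on a continuation until capacity frees up, roughly like this hypothetical sketch (not the PR's implementation):

```swift
// Sketch: admission may suspend the caller, so start() is genuinely async.
actor AdmissionGateSketch {
    private var available: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(capacity: Int) { self.available = capacity }

    func admit(size: Int) async {
        while available < size {
            // Suspend here; this is the potential suspension point that
            // makes start() async, unlike end().
            await withCheckedContinuation { waiters.append($0) }
        }
        available -= size
    }

    func release(size: Int) {
        available += size
        let resumed = waiters
        waiters.removeAll()
        for w in resumed { w.resume() }  // waiters re-check capacity
    }
}
```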
> @@ -0,0 +1,121 @@
> # Wired Memory Management
>
> Coordinate a process-wide wired memory limit for GPU workloads.
I think we need a few additions:

- deprecate `Memory.withWired` and `GPU.withWired`
- the sync version of that can be documented as being a NOP
- the async version can use a static policy, e.g. `WiredSumPolicy` (document)

We don't want a back door that will let people bypass the mechanism.
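A sketch of what the deprecation could look like, with illustrative names and signature (the real `Memory.withWired`/`GPU.withWired` shapes may differ):

```swift
// Sketch of the deprecation idea; GPUSketch stands in for the real type.
enum GPUSketch {
    @available(*, deprecated, message: "Use WiredMemoryManager tickets instead")
    static func withWired<R>(limit: Int, _ body: () async throws -> R) async rethrows -> R {
        // Real version would start a ticket on a shared static policy,
        // run body inside ticket.withWiredLimit, then end the ticket,
        // so callers cannot bypass the coordination mechanism.
        try await body()
    }
}
```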
Looks really good! Check out my comments & questions and see what you think.

Proposed changes
Implements option 3 for #347 by moving the wired memory coordinator into mlx-swift and exposing a generic manager/policy/ticket API.
- Adds `WiredMemoryManager`, `WiredMemoryTicket`, `WiredMemoryPolicy`, and `WiredMemoryEvent`.
- Policies expose an `id` (Identifiable) for grouping.
- Adds a documentation article (`wired-memory.md`) plus README/MLX doc updates.

Checklist

Put an `x` in the boxes that apply.

- `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes