fix(local-server,local-driver): improve memory cleanup across local server components#26298
fix(local-server,local-driver): improve memory cleanup across local server components#26298anthony-murphy-agent wants to merge 10 commits intomicrosoft:mainfrom
Conversation
…nents This commit addresses multiple resource cleanup issues in the local server and related test utilities: LocalWebSocketServer: - Track active socket connections - Disconnect all sockets on server close - Clear rooms and remove event listeners on socket disconnect - Add early return if socket already disconnected LocalDeltaConnectionServer: - Remove temporary error handlers after successful connection LocalOrdererManager: - Implement removeOrderer() to actually close and remove orderers PubSub: - Handle unsubscribe gracefully when topic/subscriber doesn't exist - Remove assert() calls that could crash on edge cases LocalKafka/LocalLambdaController: - Add unsubscribe() method to LocalKafka - Track subscription in LocalLambdaController - Unsubscribe from Kafka on controller close LocalDocumentService: - Implement dispose() with proper cleanup - Add disposed flag to prevent double disposal LocalDocumentDeltaConnection: - Fix floating promise in submit() with proper error handling Test utilities (TestDb, TestConsumer, TestPublisher): - Add removeAllListeners() calls in close() methods Also adds unit tests for: - LocalWebSocketServer socket tracking and cleanup - PubSub graceful unsubscribe behavior - LocalKafka unsubscribe functionality Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… cleanup Adds an unsubscribe method to LocalKafka to allow removing subscriptions, which is necessary for proper resource cleanup when documents are closed. Updates LocalNode to track and clean up kafka subscriptions during close. Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…erver components Critical memory leak fixes for unit testing scenarios with frequent create/teardown: local-server (localDeltaConnectionServer.ts): - Fix event listener leak: remove all temp handlers on both success AND error paths - Store and close PubSub and MongoManager in close() method local-driver: - Add dispose() to LocalDocumentStorageService to clear blobsShaCache - Implement close() in LocalSessionStorageDb to clear collections and listeners memory-orderer: - Add close() to IPubSub interface and PubSub class - Add close() to Socket class to remove ws listeners and cleanup - Add close() to RemoteNode to cleanup connectMap, orderers, and reject pending promises tinylicious: - Add close() to PubSubPublisher to implement updated IPubSub interface Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nect The removeOrderer() method is called from disconnect.ts when any client disconnects. The previous implementation closed the orderer, which broke all other connected clients since the orderer serves the entire document. Reverted to no-op behavior - orderer lifecycle is managed separately via the close() method when the server shuts down. Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Clear clients array after disposing containers in test harness - Null out references to help garbage collection - Add periodic GC calls during minimization (every 10 replays) This reduces memory usage from ~1250MB to ~940MB during minimization and prevents OOM when running stress tests with minimization enabled. Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/azp run Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - client packages, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid, Build - protocol-definitions, Build - test-tools |
|
/azp run repo-policy-check, server-gitrest, server-gitssh, server-historian, server-routerlicious |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR implements critical memory leak fixes for local-server and local-driver packages, which are extensively used for unit testing with frequent create/teardown cycles. The changes add proper cleanup methods (close(), dispose()) across multiple components to ensure resources are released when components are no longer needed.
Changes:
- Added
close()methods to IPubSub interface and implementations (PubSub, PubSubPublisher) to clear subscriptions and topics - Implemented proper cleanup in Socket, RemoteNode, LocalNode, LocalKafka, and related components with listener removal and resource clearing
- Enhanced LocalWebSocketServer to track and disconnect all active connections on close
- Fixed event listener leaks in localDeltaConnectionServer by removing handlers on both success and error paths
- Added
dispose()to LocalDocumentStorageService, LocalDocumentServiceFactory, and LocalDocumentService to clear caches and references - Removed lodash dependency in memory-orderer package, replacing with native spread operators
- Added periodic garbage collection in FuzzTestMinimizer to prevent memory buildup during minimization
- Updated type validation configs to acknowledge breaking changes from new public methods
Reviewed changes
Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| server/routerlicious/pnpm-lock.yaml | Removed lodash and @types/lodash dependencies, added libc metadata to resolver bindings |
| server/routerlicious/packages/tinylicious/src/services/pubSubPublisher.ts | Added empty close() method to satisfy IPubSub interface |
| server/routerlicious/packages/test-utils/src/testPublisher.ts | Added cleanup in close() to remove event listeners and clear topics |
| server/routerlicious/packages/test-utils/src/testKafka.ts | Added cleanup in TestConsumer.close() to remove event listeners |
| server/routerlicious/packages/test-utils/src/testCollection.ts | Added cleanup in TestDb.close() to clear collections and remove listeners |
| server/routerlicious/packages/memory-orderer/src/test/types/validateServerMemoryOrdererPrevious.generated.ts | Added ts-expect-error annotations for known breaking changes |
| server/routerlicious/packages/memory-orderer/src/test/pubsub.spec.ts | New comprehensive test file for PubSub functionality including subscribe, unsubscribe, and cleanup |
| server/routerlicious/packages/memory-orderer/src/test/localKafka.spec.ts | Added tests for unsubscribe functionality with single and multiple subscriptions |
| server/routerlicious/packages/memory-orderer/src/socket.ts | Added close() method to cleanup listeners, close underlying socket, and clear pending messages |
| server/routerlicious/packages/memory-orderer/src/remoteNode.ts | Added close() method to cleanup socket listeners, reject pending promises, and clear maps |
| server/routerlicious/packages/memory-orderer/src/pubsub.ts | Added close() method and made unsubscribe() defensive by removing asserts |
| server/routerlicious/packages/memory-orderer/src/localOrderer.ts | Replaced lodash merge with native spread operator |
| server/routerlicious/packages/memory-orderer/src/localNode.ts | Added close() method, replaced lodash clone with spread operator, added heartbeat timer cancellation |
| server/routerlicious/packages/memory-orderer/src/localLambdaController.ts | Added subscription tracking and proper unsubscribe in close() |
| server/routerlicious/packages/memory-orderer/src/localKafka.ts | Changed subscribe() to return subscription, added unsubscribe() method |
| server/routerlicious/packages/memory-orderer/package.json | Removed lodash dependencies, added breaking change entries to typeValidation config |
| server/routerlicious/packages/local-server/src/test/localWebSocketServer.spec.ts | New comprehensive test file for LocalWebSocketServer socket tracking and cleanup |
| server/routerlicious/packages/local-server/src/localWebSocketServer.ts | Added connection tracking and proper cleanup of rooms and listeners on disconnect |
| server/routerlicious/packages/local-server/src/localOrdererManager.ts | Enhanced comment explaining why removeOrderer is a no-op |
| server/routerlicious/packages/local-server/src/localDeltaConnectionServer.ts | Fixed event listener leak by removing handlers on all exit paths, added pubsub and mongoManager cleanup |
| server/routerlicious/packages/local-server/api-report/server-local-server.api.md | Added removeConnection() to LocalWebSocketServer public API |
| packages/test/stochastic-test-utils/src/minification.ts | Added periodic garbage collection during minimization to prevent memory buildup |
| packages/test/local-server-stress-tests/src/localServerStressHarness.ts | Enhanced cleanup by disposing containers, clearing client array, and nulling references |
| packages/drivers/local-driver/src/test/types/validateLocalDriverPrevious.generated.ts | Added ts-expect-error annotation for LocalDocumentServiceFactory breaking change |
| packages/drivers/local-driver/src/localSessionStorageDb.ts | Added cleanup in LocalSessionStorageDb.close() to clear collections and remove listeners |
| packages/drivers/local-driver/src/localDocumentStorageService.ts | Added dispose() method to clear blobsShaCache Map |
| packages/drivers/local-driver/src/localDocumentServiceFactory.ts | Added dispose() method to clear documentDeltaConnectionsMap |
| packages/drivers/local-driver/src/localDocumentService.ts | Enhanced dispose() to dispose inner service and remove listeners |
| packages/drivers/local-driver/src/localDocumentDeltaConnection.ts | Added error handling for submit() promise to prevent unhandled rejections |
| packages/drivers/local-driver/package.json | Added breaking change entry for LocalDocumentServiceFactory |
| packages/drivers/local-driver/api-report/local-driver.legacy.beta.api.md | Added dispose() to LocalDocumentServiceFactory public API |
Files not reviewed (1)
- server/routerlicious/pnpm-lock.yaml: Language not supported
The webSocketServer.close() call was not being awaited, which could cause the close() method to complete before the web socket server has fully cleaned up its resources. Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/azp run Build - api-markdown-documenter, Build - benchmark-tool, Build - build-common, Build - build-tools, Build - client packages, Build - common-utils, Build - eslint-config-fluid, Build - eslint-plugin-fluid, Build - protocol-definitions, Build - test-tools |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
🔗 No broken links found! ✅ Your attention to detail is admirable. linkcheck output |
|
|
||
| // Clear pending messages and remove all listeners from this EventEmitter | ||
| this.pending = []; | ||
| this.removeAllListeners(); |
There was a problem hiding this comment.
Self-review note: The send() method (below the diff hunk) uses setImmediate to batch and send pending messages. If close() is called while a setImmediate callback is already scheduled (sendScheduled = true), the callback will still fire after close and attempt this.socket.send() on the already-closed underlying ws socket.
Consider adding a this.closed guard at the top of the setImmediate callback in send(), similar to how the heartbeat timer in LocalNode guards against firing after close.
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. The setImmediate callback in send() does not check this.closed, so it could fire after close() and attempt to send on the closed underlying ws socket. I will add a this.closed guard at the top of the setImmediate callback.
There was a problem hiding this comment.
Fixed in 910c91d. Added a this.closed guard at the top of the setImmediate callback in send() -- if close() was called while a send was scheduled, the callback now returns immediately instead of attempting to send on the closed socket.
| * Disposes the factory and cleans up all tracked connections. | ||
| */ | ||
| public dispose(): void { | ||
| this.documentDeltaConnectionsMap.clear(); |
There was a problem hiding this comment.
Self-review note: This dispose() only clears the map, but doesn't actually disconnect any of the tracked LocalDocumentDeltaConnection instances. The connections remain live with active event listeners. Should we iterate the map and call disconnectClient or dispose on each connection before clearing?
Also, there's no _disposed guard here (unlike the other dispose methods in this PR), so calling dispose() twice would clear the map again (harmless but inconsistent with the pattern used elsewhere).
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. The dispose() method should disconnect each tracked LocalDocumentDeltaConnection before clearing the map, and should include a _disposed guard for consistency with other dispose methods in this PR. I will fix both issues.
There was a problem hiding this comment.
Fixed in 910c91d. Added a _disposed guard and now iterates all tracked LocalDocumentDeltaConnection instances calling disconnectClient() before clearing the map.
| deliContext, | ||
| moiraContext, | ||
| merge({}, DefaultServiceConfiguration, serviceConfiguration), | ||
| { ...DefaultServiceConfiguration, ...serviceConfiguration }, |
There was a problem hiding this comment.
Self-review note: The original code used lodash.merge({}, DefaultServiceConfiguration, serviceConfiguration) which performs a deep merge of nested objects. The replacement { ...DefaultServiceConfiguration, ...serviceConfiguration } only does a shallow merge -- any nested objects in serviceConfiguration will completely replace the corresponding nested objects from DefaultServiceConfiguration rather than merging their properties.
For example, if DefaultServiceConfiguration has { moira: { enable: false, endpoint: 'x' } } and serviceConfiguration has { moira: { enable: true } }, the old code would produce { moira: { enable: true, endpoint: 'x' } } but the new code produces { moira: { enable: true } } (losing endpoint).
Looking at the callers in this codebase, most pass either {} or a partial config with only top-level overrides, so this may be fine in practice. But it's worth noting the behavior change, especially since IServiceConfiguration has nested properties like deli, scribe, and moira.
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. The shallow spread replacing lodash.merge is a behavior change for nested properties. However, looking at the callers in this codebase, most pass {} or partial configs with only top-level overrides. I think the risk is low enough for test infrastructure, and the lodash removal is a worthwhile cleanup. I will add a code comment documenting this is a shallow merge so future maintainers are aware of the behavior.
There was a problem hiding this comment.
Addressed in 910c91d. Added a code comment documenting the shallow merge behavior so future maintainers are aware that nested objects are replaced rather than deep-merged. Decided not to restore lodash.merge since the current callers only use top-level overrides.
| } | ||
| this.connectMap.clear(); | ||
|
|
||
| // Clear orderers and topic maps |
There was a problem hiding this comment.
Self-review note: this.orderers contains ProxyOrderer instances which implement IOrderer (with a close() method). This just clears the map without calling close() on each orderer. Should we close the ProxyOrderer instances before clearing the map, similar to how LocalNode.close() and LocalOrdererManager.close() close their orderers?
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. The ProxyOrderer instances should be closed before clearing the map, similar to how LocalNode.close() and LocalOrdererManager.close() close their orderers. I will add orderer close calls in RemoteNode.close().
There was a problem hiding this comment.
Fixed in 910c91d. Now calls close() on each ProxyOrderer instance in the map before clearing it. Uses Promise.all with a .catch() for best-effort cleanup so individual failures do not prevent the rest of the cleanup.
| * Removes an orderer from the order map. Called when an orderer is closed. | ||
| */ | ||
| public removeOrderer(tenantId: string, documentId: string): void { | ||
| const fullId = `${tenantId}/${documentId}`; |
There was a problem hiding this comment.
Self-review note: This new public method removeOrderer is added with a JSDoc saying it's called when an orderer is closed, but I don't see any caller in this PR. Is it intended to be wired up to LocalOrderer.close() as a callback, or called from LocalOrdererManager? If it's not needed yet, consider removing it to avoid exposing dead API surface. If it IS needed, it should be wired up -- otherwise the orderMap will only be cleaned up on full close(), not when individual documents are closed.
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. removeOrderer is not wired up to anything and represents dead API surface. I will remove it to keep the public API clean. If needed in the future, it can be added back when the caller is also implemented.
There was a problem hiding this comment.
Fixed in 910c91d. Removed the unused removeOrderer() method entirely. It can be added back in the future when a caller is implemented.
| public async close(): Promise<void> {} | ||
| public async close(): Promise<void> { | ||
| // Clear all collections to release memory | ||
| for (const key of Object.keys(this.collections)) { |
There was a problem hiding this comment.
Self-review note: TestCollection instances returned from collection() likely hold direct references to the arrays in this.collections. Setting this.collections[key] = [] creates a new empty array for each key, but any previously-created TestCollection still references the old array with all its data. For memory cleanup purposes, consider calling .length = 0 on each array instead (which mutates the existing array in place), or clearing the entire this.collections object reference.
Looking at the TestNotImplementedCollection / TestCollection code would clarify whether this is actually an issue -- if collections are always accessed fresh from TestDb.collection() rather than cached, this is fine.
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. You are right that previously-created TestCollection instances still hold references to the old arrays. Looking at the code, TestCollection is constructed with a direct reference to the array from this.collections[name], so setting this.collections[key] = [] does not clear the array the TestCollection holds. I will fix this by using .length = 0 to mutate the arrays in place.
There was a problem hiding this comment.
Fixed in 910c91d. Changed this.collections[key] = [] to this.collections[key].length = 0 which mutates the existing array in place, ensuring any TestCollection instances holding references to those arrays also see the cleared data.
| public async close() { | ||
| await this.webSocketServer.close(); | ||
| await this.ordererManager.close(); | ||
| this.pubsub.close(); |
There was a problem hiding this comment.
Self-review note: If this.ordererManager.close() throws, the subsequent pubsub.close() and mongoManager.close() will be skipped. Since this is test infrastructure and individual failures would indicate a larger problem, this sequential approach is probably fine. But if we want to be more defensive, we could wrap each in a try-catch or use a pattern like:
const errors: unknown[] = [];
try { await this.webSocketServer.close(); } catch (e) { errors.push(e); }
try { await this.ordererManager.close(); } catch (e) { errors.push(e); }
try { this.pubsub.close(); } catch (e) { errors.push(e); }
try { await this.mongoManager.close(); } catch (e) { errors.push(e); }
if (errors.length > 0) { throw errors[0]; }Low priority -- just noting for completeness.
-- anthony-murphy
There was a problem hiding this comment.
Acknowledged. This is a valid point about error handling in sequential close. Since this is test infrastructure and individual failures would indicate a larger problem, I think the current sequential approach is acceptable. The defensive try-catch pattern would add complexity for a low-probability scenario. I will leave this as-is for now but add a brief comment noting the sequential dependency.
There was a problem hiding this comment.
Addressed in 910c91d. Added a code comment noting the sequential dependency and that a failure in any step will skip subsequent cleanup. Keeping as-is since defensive try-catch would add complexity for a low-probability scenario in test infrastructure.
- Add closed guard in Socket.send() setImmediate callback to prevent sending on closed ws socket - Add _disposed guard and disconnect tracked connections in LocalDocumentServiceFactory.dispose() - Close ProxyOrderer instances in RemoteNode.close() before clearing map - Remove unused removeOrderer() method from LocalNode - Mutate arrays in place in TestDb.close() so TestCollection refs see cleared data - Add comment documenting shallow merge behavior in LocalOrderer.load() - Add comment noting sequential close dependency in LocalDeltaConnectionServer.close() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: anthony-murphy <anthony.murphy@microsoft.com>
|
After the push of 910c91d, Azure Pipelines need to be triggered for this fork PR. Please copy and post the following commands: |
1 similar comment
|
After the push of 910c91d, Azure Pipelines need to be triggered for this fork PR. Please copy and post the following commands: |
Summary
Critical memory leak fixes for local-server and local-driver packages, which are used extensively for unit testing with frequent create/teardown cycles.
Changes
local-server (localDeltaConnectionServer.ts):
connectWebSocket()close()methodlocal-driver:
dispose()toLocalDocumentStorageServiceto clearblobsShaCacheMapclose()inLocalSessionStorageDbto clear collections and remove listenersmemory-orderer:
close()toIPubSubinterface andPubSubclass to clear topicsclose()toSocketclass to remove ws listeners and cleanup pending messagesclose()toRemoteNodeto cleanup connectMap, orderers, topicMap, and reject pending promisestinylicious:
close()toPubSubPublisherto implement updatedIPubSubinterfaceBreaking Changes
These changes add new methods to public interfaces/classes:
IPubSub.close()PubSub.close()Socket.close()RemoteNode.close()LocalDocumentServiceFactory.dispose()Type validation configs updated to acknowledge these breaking changes.
Test plan
🤖 Generated with Claude Code