Limit the in-flight downloads for model parameters #381

Open
IAvecilla wants to merge 15 commits into main from limit-p2p-parameter-sharing

@IAvecilla IAvecilla commented Nov 18, 2025

This PR introduces a download scheduler for concurrency control and structured retry logic for P2P parameter sharing:

  • New DownloadSchedulerHandle actor that manages download capacity and enforces a configurable limit on the number of concurrent downloads.
  • Unified retry logic with differentiated policies:
    • Model parameters and config: Retries immediately with no max retry limit. Retries are gated by the concurrency limiter, so they only start when a slot is available (max 8 at the same time).
    • Distro results: Exponential backoff (backoff_base * 2^retries) with a configurable max retry count (default 3). These are not gated by download capacity.
  • Moved download_manager.rs into a download/ module with manager.rs and scheduler.rs.
  • Added unit tests for the scheduler covering capacity management, FIFO ordering, retry policies, and edge cases.
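
For readers who want a feel for the two mechanisms described above, here is a minimal, synchronous sketch of capacity gating with FIFO hand-off and of the two retry policies. It is not the code in this PR: the names `DownloadScheduler`, `RetryPolicy`, `submit`, `complete`, and `next_delay` are hypothetical, the actual DownloadSchedulerHandle is an actor, and the sketch assumes `retries` counts retries already attempted (so the first retry waits exactly `backoff_base`).

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical retry policies mirroring the two cases in the description.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RetryPolicy {
    /// Model parameters and config: retry immediately, no retry cap.
    Immediate,
    /// Distro results: exponential backoff with a retry cap.
    Backoff { base: Duration, max_retries: u32 },
}

impl RetryPolicy {
    /// Delay before the next retry, or `None` once the cap is reached.
    /// `retries` is the number of retries already attempted (assumption).
    pub fn next_delay(&self, retries: u32) -> Option<Duration> {
        match *self {
            RetryPolicy::Immediate => Some(Duration::ZERO),
            RetryPolicy::Backoff { base, max_retries } => {
                if retries >= max_retries {
                    return None;
                }
                // base * 2^retries, saturating instead of overflowing.
                let factor = 2u32.checked_pow(retries).unwrap_or(u32::MAX);
                Some(base.saturating_mul(factor))
            }
        }
    }
}

/// Hypothetical capacity-limited FIFO scheduler (synchronous, not an actor).
pub struct DownloadScheduler<T> {
    max_concurrent: usize,
    in_flight: usize,
    queue: VecDeque<T>,
}

impl<T> DownloadScheduler<T> {
    pub fn new(max_concurrent: usize) -> Self {
        Self { max_concurrent, in_flight: 0, queue: VecDeque::new() }
    }

    /// Returns the item if it may start now; otherwise queues it.
    pub fn submit(&mut self, item: T) -> Option<T> {
        if self.in_flight < self.max_concurrent {
            self.in_flight += 1;
            Some(item)
        } else {
            self.queue.push_back(item);
            None
        }
    }

    /// Called when a download finishes. The freed slot goes to the oldest
    /// queued item (FIFO); if nothing is waiting, the slot is released.
    pub fn complete(&mut self) -> Option<T> {
        match self.queue.pop_front() {
            Some(next) => Some(next), // slot is handed over, in_flight unchanged
            None => {
                self.in_flight = self.in_flight.saturating_sub(1);
                None
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight
    }

    pub fn queued(&self) -> usize {
        self.queue.len()
    }
}
```

In this sketch a failed parameter download would simply be passed to `submit` again, so its retry waits for a free slot, while a distro result would sleep for `next_delay(retries)` and bypass the scheduler entirely.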

@IAvecilla IAvecilla self-assigned this Nov 18, 2025
@IAvecilla IAvecilla linked an issue Dec 17, 2025 that may be closed by this pull request
@IAvecilla IAvecilla marked this pull request as ready for review January 21, 2026 15:14

Development

Successfully merging this pull request may close these issues.

Limit concurrent downloads in model sharing