Labels: known problem (scheduled for future resolution), later (maybe later)
Description
Background
In PR #595, we implemented direct S3 upload to bypass Vercel's 4.5 MB request body limit. The original design included two levels of deduplication:
- Version-level deduplication: If the content hash matches an existing version, skip the upload ✅
- Blob-level deduplication: Upload individual file blobs separately so they can be deduplicated across versions ❌
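As a rough illustration of the version-level check, here is a minimal sketch. It assumes the content hash is a SHA-256 over the sorted file paths and per-file hashes; the actual hashing scheme from PR #595 is not shown in this issue, and `FileMap`, `computeContentHash`, and `needsUpload` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: a version is a map from relative path to file bytes.
type FileMap = Record<string, Uint8Array>;

// Assumed scheme (not confirmed by the source): hash each file, then hash
// the sorted "path\0fileHash" lines so key order does not affect the result.
function computeContentHash(files: FileMap): string {
  const entries = Object.entries(files).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  const versionHash = createHash("sha256");
  for (const [path, bytes] of entries) {
    const fileHash = createHash("sha256").update(bytes).digest("hex");
    versionHash.update(`${path}\0${fileHash}\n`);
  }
  return versionHash.digest("hex");
}

// Version-level dedup: upload only if no existing version has this hash.
function needsUpload(
  contentHash: string,
  existingVersionHashes: ReadonlySet<string>
): boolean {
  return !existingVersionHashes.has(contentHash);
}
```

If the hash is already known, the client can skip uploading the archive entirely, which is why this level of deduplication works without any per-file storage.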
Problem
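Blob-level deduplication creates a paradox: every approach either loses file-level dedup, runs into Vercel's serverless limits, or uploads the same data twice: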
| Approach | Upload | Download | Issue |
|---|---|---|---|
| Only archive | archive.tar.gz | archive.tar.gz | No file-level dedup |
| Only blobs | individual blobs | Server reconstructs | Vercel serverless limits |
| Both blobs + archive | blobs + archive | archive | Double upload (wasteful) |
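The trade-off in the table can be made concrete with a small cost model. This is only a sketch: the `Approach` type and `bytesUploaded` function are hypothetical, and the byte counts passed in are illustrative inputs, not measurements.

```typescript
// Hypothetical names for the three rows of the table above.
type Approach = "archive-only" | "blobs-only" | "blobs-and-archive";

// Bytes the client sends for one new version.
// archiveBytes: size of the full archive.tar.gz
// changedBlobBytes: total size of blobs not already stored
function bytesUploaded(
  approach: Approach,
  archiveBytes: number,
  changedBlobBytes: number
): number {
  switch (approach) {
    case "archive-only":
      return archiveBytes; // full archive every time, no file-level dedup
    case "blobs-only":
      return changedBlobBytes; // cheapest upload, but the server must rebuild the archive
    case "blobs-and-archive":
      return changedBlobBytes + archiveBytes; // always more than archive-only
  }
}
```

Under this model, "blobs + archive" can never upload less than "archive only", so blob-level dedup only pays off if something other than the client produces the archive.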
Current state:
- The prepare endpoint generates presigned URLs for blobs, but they are never used
- The CLI and Sandbox upload only the archive and manifest, not blobs
- Blob table records are created, but no corresponding blob files exist in S3
Decision
For now, we will:
- Remove blob presigned URL generation - saves AWS API calls
- Remove blob ref counting - eliminates race condition risk
- Keep version-level deduplication - this already works and provides value
Future Consideration
If file-level deduplication becomes necessary (e.g., large repos with incremental changes), consider:
Option: Lambda async archive generation
- Client uploads only changed blobs
- Trigger AWS Lambda to generate archive from blobs
- Archive available after async processing
This requires additional infrastructure (Lambda, SQS) and is out of scope for now.
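For the "client uploads only changed blobs" step, the client-side selection could look roughly like this. It is a sketch under assumptions: the manifest is taken to be a map from path to blob hash, the set of stored hashes is assumed to come from the server, and `Manifest` and `blobsToUpload` are hypothetical names. The Lambda and SQS side is not sketched here.

```typescript
// Hypothetical manifest shape: relative path -> blob content hash.
type Manifest = Record<string, string>;

// Return the distinct blob hashes in the new manifest that are not yet
// stored, in sorted order. Files sharing a hash are uploaded once.
function blobsToUpload(
  next: Manifest,
  storedBlobHashes: ReadonlySet<string>
): string[] {
  const missing = new Set<string>();
  for (const blobHash of Object.values(next)) {
    if (!storedBlobHashes.has(blobHash)) {
      missing.add(blobHash);
    }
  }
  return Array.from(missing).sort();
}
```

For a large repo with a small incremental change, the returned list would contain only the hashes of the modified files, which is the case where file-level deduplication would start to matter.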
Related
- PR #595: feat(api): add direct S3 upload endpoints for large file support
- Original issue #588: bug: checkpoint creation fails with 'Invalid JSON response' on large file uploads