Labels
triage (pending for triaging)
Description
Task Summary
The current dataset upload flow exposes MinIO/LakeFS presigned URLs to the client and lets the browser upload file parts directly to object storage. A new design moves the File Service into the data path and introduces server-side upload session state keyed by filePath, uid, and did.
New DB schema (multipart upload session + parts)
| Field / Property | dataset_upload_session | dataset_upload_session_part |
|---|---|---|
| Purpose | Tracks one active multipart upload session for a dataset file (per user + dataset + file path). | Tracks per-part completion state for a multipart upload (stores etag needed for finalize). |
| Primary key | (uid, did, file_path) | (upload_id, part_number) |
| Key columns | upload_id (UNIQUE), physical_address, num_parts_requested | etag |
| Defaults | None | etag TEXT NOT NULL DEFAULT '' |
| Checks | None | CHECK (part_number > 0) |
| Foreign keys | did → dataset(did) ON DELETE CASCADE; uid → "user"(uid) ON DELETE CASCADE | upload_id → dataset_upload_session(upload_id) ON DELETE CASCADE |
| Cleanup behavior | Deleting a session deletes the session row (and cascades to parts). | Deleted automatically when the parent session is deleted. |
| Why this matters | Keeps server-side state (no presigned URLs). Enforces expected total parts. | Enables per-part locking, retries, and DB-based completeness validation (no listParts() call). |
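To make the table concrete, here is a minimal sketch of the two tables and the DB-based completeness check. It makes several assumptions that are not in the issue. SQLite (via Python's `sqlite3`) stands in for the real database, the column types are guesses, and the foreign keys to dataset and "user" are left out so the example stays self-contained. The helper names (`open_db`, `start_session`, `record_part`, `missing_parts`, `abort_session`) are hypothetical. Pre-creating one part row per requested part, with an empty etag meaning "not uploaded yet", is one possible reading of the `DEFAULT ''` column, not a confirmed design.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; column types are guesses,
# and the foreign keys to dataset(did) and "user"(uid) are omitted for brevity.
SCHEMA = """
CREATE TABLE dataset_upload_session (
    uid                 INTEGER NOT NULL,
    did                 INTEGER NOT NULL,
    file_path           TEXT    NOT NULL,
    upload_id           TEXT    NOT NULL UNIQUE,
    physical_address    TEXT    NOT NULL,
    num_parts_requested INTEGER NOT NULL,
    PRIMARY KEY (uid, did, file_path)
);
CREATE TABLE dataset_upload_session_part (
    upload_id   TEXT    NOT NULL
        REFERENCES dataset_upload_session(upload_id) ON DELETE CASCADE,
    part_number INTEGER NOT NULL CHECK (part_number > 0),
    etag        TEXT    NOT NULL DEFAULT '',
    PRIMARY KEY (upload_id, part_number)
);
"""


def open_db():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def start_session(conn, uid, did, file_path, upload_id, physical_address, num_parts):
    """Insert the session row and (assumption) one empty part row per part."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO dataset_upload_session VALUES (?, ?, ?, ?, ?, ?)",
            (uid, did, file_path, upload_id, physical_address, num_parts),
        )
        conn.executemany(
            "INSERT INTO dataset_upload_session_part (upload_id, part_number) "
            "VALUES (?, ?)",
            [(upload_id, n) for n in range(1, num_parts + 1)],
        )


def record_part(conn, upload_id, part_number, etag):
    """Store the etag returned by object storage for one uploaded part."""
    with conn:
        cur = conn.execute(
            "UPDATE dataset_upload_session_part SET etag = ? "
            "WHERE upload_id = ? AND part_number = ?",
            (etag, upload_id, part_number),
        )
    if cur.rowcount != 1:
        raise ValueError(f"unknown part {part_number} for upload {upload_id}")


def missing_parts(conn, upload_id):
    """Return part numbers that still have no etag, without calling listParts()."""
    rows = conn.execute(
        "SELECT part_number FROM dataset_upload_session_part "
        "WHERE upload_id = ? AND etag = '' ORDER BY part_number",
        (upload_id,),
    )
    return [row[0] for row in rows]


def abort_session(conn, upload_id):
    """Delete the session row; its part rows are removed by ON DELETE CASCADE."""
    with conn:
        conn.execute(
            "DELETE FROM dataset_upload_session WHERE upload_id = ?", (upload_id,)
        )
```

Under these assumptions, a finalize step would refuse to complete the upload while `missing_parts` returns a non-empty list. Starting a second session for the same (uid, did, file_path) fails on the primary key, which is how the sketch enforces one active session per file.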
Current Behavior
New Behavior
Priority
P3 – Low
Task Type
- Code Implementation
- Documentation
- Refactor / Cleanup
- Testing / QA
- DevOps / Deployment