fix: harden data room uploads and backfill view schema #1
joelev wants to merge 1 commit into sdamico:main from
Conversation
Reviewer checklist for this PR:
Validation update (local smoke harness): Executed 4 targeted scenarios against the changed handlers with mocked auth/DB dependencies to isolate behavior in this PR:
Additional validation:
Scope note:
sdamico left a comment
Good work overall — the cache-control fix on protected assets is an important security improvement, the atomic chunk enforcement (enforcing MAX_FILE_SIZE in the WHERE clause) is a solid approach, MIME validation is well-factored into isAllowedMimeType, and the migration backfill is clean and safe.
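For context, the shape I'd expect from an allow-list check like this is roughly the following (a sketch only: the PR's actual `isAllowedMimeType` and its allowed-type list may differ, and the types below are assumptions):

```javascript
// Sketch of an allow-list MIME check; the PR's real implementation may differ.
// The set of allowed types here is illustrative, not taken from the PR.
const ALLOWED_MIME_TYPES = new Set([
  'application/pdf',
  'image/png',
  'image/jpeg',
  'text/plain',
]);

function isAllowedMimeType(mimeType) {
  // Normalize: drop parameters like "; charset=utf-8" and lowercase the base type.
  const base = String(mimeType).split(';')[0].trim().toLowerCase();
  return ALLOWED_MIME_TYPES.has(base);
}
```

An allow-list (rather than a deny-list) is the right default for access-controlled content, since unknown types fail closed.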
Two things worth addressing before merging:
Required: MAX_UPLOAD_BYTES must account for base64 overhead
MAX_UPLOAD_BYTES and MAX_FILE_SIZE are both set to 50 * 1024 * 1024:
```js
// api/admin/data-room-upload.js, lines 4-5
const MAX_FILE_SIZE = 50 * 1024 * 1024; // 50MB
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // 50MB ← bug
```

The chunk action reads raw base64 text from the request body and gates it with MAX_UPLOAD_BYTES. But a 50 MB binary file, when base64-encoded, is ~66.7 MB on the wire (~33% overhead). Using the same cap for both means a valid 50 MB file will be rejected mid-transfer with a 413 before it even reaches the SQL layer.
The fix is to size MAX_UPLOAD_BYTES to cover the largest base64-encoded payload that could represent a MAX_FILE_SIZE file:
```js
const MAX_FILE_SIZE = 50 * 1024 * 1024;
const MAX_UPLOAD_BYTES = Math.ceil(MAX_FILE_SIZE * 4 / 3); // ~66.7 MB, covers base64 overhead
```

This keeps the actual stored size enforced at the DB layer (the WHERE clause you added), and MAX_UPLOAD_BYTES becomes a transport-layer guard that won't prematurely kill valid uploads.
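The overhead math can be checked directly in Node (a quick sketch; `base64Length` is an illustrative helper, not part of the PR). One wrinkle worth knowing: because base64 pads each 3-byte group to 4 output characters, the exact encoded length is `4 * ceil(n / 3)`, which can exceed `Math.ceil(n * 4 / 3)` by up to 2 characters, so a small safety margin on the cap is cheap insurance:

```javascript
// Base64 emits 4 output characters per 3-byte input group (with padding),
// so n input bytes encode to exactly 4 * ceil(n / 3) characters.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

const MAX_FILE_SIZE = 50 * 1024 * 1024;

// Sanity-check the formula against Node's own encoder on small buffers.
for (const n of [1, 2, 3, 4, 50, 51]) {
  const encoded = Buffer.alloc(n).toString('base64');
  console.assert(encoded.length === base64Length(n), `mismatch at n=${n}`);
}

// Padding makes the exact length land one character above the 4/3 estimate
// for a 50 MB file, so the estimate alone would reject a max-size upload.
console.log(base64Length(MAX_FILE_SIZE));      // 69905068
console.log(Math.ceil(MAX_FILE_SIZE * 4 / 3)); // 69905067
```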
Minor: decode(chunkBase64, 'base64') is evaluated three times in the chunk UPDATE
In the new chunk append query:
```sql
SET
  content = content || decode(${chunkBase64}, 'base64'),
  size_bytes = octet_length(content) + octet_length(decode(${chunkBase64}, 'base64'))
WHERE id = ${id}
  AND octet_length(content) + octet_length(decode(${chunkBase64}, 'base64')) <= ${MAX_FILE_SIZE}
```

decode(chunkBase64, 'base64') appears three times — once in SET, once in the size_bytes expression, and once in the WHERE predicate. Postgres may optimize this away, but it's not guaranteed to. A CTE avoids the ambiguity:
```sql
WITH decoded AS (
  SELECT decode(${chunkBase64}, 'base64') AS chunk_bytes
)
UPDATE data_room_files
SET
  content = content || (SELECT chunk_bytes FROM decoded),
  size_bytes = octet_length(content) + octet_length((SELECT chunk_bytes FROM decoded))
WHERE id = ${id}
  AND octet_length(content) + octet_length((SELECT chunk_bytes FROM decoded)) <= ${MAX_FILE_SIZE}
RETURNING id
```

This is a minor optimization — the correctness issue with MAX_UPLOAD_BYTES is the one that needs fixing before ship.
Problem
Data room upload/access paths had multiple correctness and security gaps:
- Chunk appends updated `size_bytes` and could succeed without checking post-append size constraints atomically.
- Uploads did not validate `mime_type`.
- Protected assets were served with cacheable headers (`public, max-age=3600`), which is unsafe for access-controlled content.
- `views`/`view_files`/`data_room_access.view_id` were referenced from newer code paths without guaranteed schema presence.

Root Cause
- Newer code paths ran against environments where the `views` artifacts were not present.

What Changed
`api/admin/data-room-upload.js`
- Validate incoming `sizeBytes` values.
- Changed the body-size gate from `MAX_FILE_SIZE` to `MAX_UPLOAD_BYTES` (base64 transport overhead aware).
- Append chunk, update `size_bytes`, and enforce max size in one SQL statement.
- Derive `size_bytes` from actual bytea length.
- Validate `mime_type`.

`api/data/pages.js`
- Serve protected assets with `Cache-Control: no-store, no-cache, must-revalidate`, `Pragma: no-cache`, `Expires: 0`.

`migrations/012_view_schema_backfill.sql`
- `views` table
- `view_files` table + indexes
- `data_room_access.view_id` + index

Migration Impact
- Idempotent: all statements are guarded (`IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`).

Risk Assessment
Rollback Plan
- Revert commit `a219ac0` if regressions are found.
- Or revert only `api/admin/data-room-upload.js` and `api/data/pages.js` while leaving schema backfill intact.

Validation
- `npm run build` passes: `Built content/page.html (63624 bytes, 9 slides)`
- No `test`, `lint`, or `typecheck` scripts are currently defined in `package.json`.