fix: harden data room uploads and backfill view schema by joelev · Pull Request #1 · sdamico/pitch-deck

joelev · 2026-03-02T13:03:36Z

Problem

Data room upload/access paths had multiple correctness and security gaps:

Chunked uploads enforced the wrong size limit and could exceed the intended 50MB file cap.
Chunk append operations did not consistently update size_bytes and could succeed without checking post-append size constraints atomically.
Entry-page metadata updates did not persist mime_type.
Protected data-room page responses were cacheable (public, max-age=3600), which is unsafe for access-controlled content.
Existing environments could reference views/view_files/data_room_access.view_id from newer code paths without guaranteed schema presence.

Root Cause

Validation logic drift between upload initialization and chunk append paths.
Missing atomic guardrail in SQL update for append-size enforcement.
Incomplete schema evolution for legacy databases where views artifacts were not present.
Cache headers on protected endpoints were optimized for performance rather than confidentiality.

What Changed

`api/admin/data-room-upload.js`

Added shared MIME validation helper and enforced MIME checks in upload init path.
Added validation for invalid sizeBytes values.
Switched chunk request-size cutoff from MAX_FILE_SIZE to MAX_UPLOAD_BYTES (base64 transport overhead aware).
Made chunk append update atomic: append + recompute size_bytes + enforce max size in one SQL statement.
Added explicit 404 vs 413 handling when append update does not return a row.
Ensured finalize path synchronizes size_bytes from actual bytea length.
Entry-page init update now persists mime_type.

`api/data/pages.js`

Replaced cacheable headers with strict no-cache headers for protected file responses:
- Cache-Control: no-store, no-cache, must-revalidate
- Pragma: no-cache
- Expires: 0

`migrations/012_view_schema_backfill.sql`

Added idempotent backfill for view access schema:
- views table
- view_files table + indexes
- data_room_access.view_id + index
Designed as safe no-op on fresh installs and corrective for older deployments.

Migration Impact

Forward-compatible and idempotent (IF NOT EXISTS / ADD COLUMN IF NOT EXISTS).
No destructive DDL.
Existing data remains intact; only missing structures are created.

Risk Assessment

Low to medium: upload path behavior is stricter and may reject malformed previously-tolerated requests.
Low: cache policy becomes more restrictive (expected for sensitive content).
Low: migration is additive/idempotent.

Rollback Plan

Revert commit a219ac0 if regressions are found.
Application rollback is immediate; migration objects are additive and can remain in place safely.
If required, isolate rollback to API behavior by reverting only api/admin/data-room-upload.js and api/data/pages.js while leaving schema backfill intact.

Validation

npm run build passes:
- Built content/page.html (63624 bytes, 9 slides)
No test, lint, or typecheck scripts are currently defined in package.json.

joelev · 2026-03-02T13:06:03Z

Reviewer checklist for this PR:

Upload correctness

Confirm chunked uploads now enforce transport size using MAX_UPLOAD_BYTES and final stored size using atomic SQL guard (<= MAX_FILE_SIZE).
Confirm size_bytes remains accurate after append/finalize paths.

Validation/security

Confirm MIME validation is enforced consistently in upload init and standard upload flows.
Confirm invalid sizeBytes inputs fail fast with 400.

Access-controlled content caching

Confirm protected responses in api/data/pages.js are non-cacheable (no-store/no-cache headers) and this is acceptable for UX/perf tradeoff.

Schema/migration safety

Confirm migrations/012_view_schema_backfill.sql is additive and idempotent in production-like DB states.
Confirm no conflict with existing migrations that reference view_id/views.

Regression spot-check

Manual smoke test: standard upload, chunked upload, and data-room page load for an authorized user.
Manual negative test: over-limit chunked upload should fail with 413.

joelev · 2026-03-02T13:09:07Z

Validation update (local smoke harness):

Executed 4 targeted scenarios against the changed handlers with mocked auth/DB dependencies to isolate behavior in this PR:

action=simple upload path

Result: PASS (201), persisted expected size_bytes.

Chunked upload flow (action=init -> action=chunk -> action=finish)

Result: PASS (200 chain), final size_bytes synchronized correctly.

Over-limit chunked upload (> 50MB request payload)

Result: PASS (413), request stream aborted as expected.

Authorized api/data/pages asset fetch caching policy

Result: PASS (200), response headers include:
- Cache-Control: no-store, no-cache, must-revalidate
- Pragma: no-cache
- Expires: 0

Additional validation:

npm run build passes (Built content/page.html (63624 bytes, 9 slides)).

Scope note:

This repo currently has no test/lint/typecheck scripts. The smoke checks were run via a local Node harness with mocked DB/auth modules to validate the exact changed code paths deterministically.

sdamico

Good work overall — the cache-control fix on protected assets is an important security improvement, the atomic chunk enforcement (enforcing MAX_FILE_SIZE in the WHERE clause) is a solid approach, MIME validation is well-factored into isAllowedMimeType, and the migration backfill is clean and safe.

Two things worth addressing before merging:

Required: `MAX_UPLOAD_BYTES` must account for base64 overhead

MAX_UPLOAD_BYTES and MAX_FILE_SIZE are both set to 50 * 1024 * 1024:

// api/admin/data-room-upload.js, lines 4-5
const MAX_FILE_SIZE   = 50 * 1024 * 1024; // 50MB
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // 50MB  ← bug

The chunk action reads raw base64 text from the request body and gates it with MAX_UPLOAD_BYTES. But a 50 MB binary file, when base64-encoded, is ~66.7 MB on the wire (~33% overhead). Using the same cap for both means a valid 50 MB file will be rejected mid-transfer with a 413 before it even reaches the SQL layer.

The fix is to size MAX_UPLOAD_BYTES to cover the largest base64-encoded payload that could represent a MAX_FILE_SIZE file:

const MAX_FILE_SIZE    = 50 * 1024 * 1024;
const MAX_UPLOAD_BYTES = Math.ceil(MAX_FILE_SIZE * 4 / 3); // ~66.7 MB, covers base64 overhead

This keeps the actual stored size enforced at the DB layer (the WHERE clause you added), and MAX_UPLOAD_BYTES becomes a transport-layer guard that won't prematurely kill valid uploads.

Minor: `decode(chunkBase64, 'base64')` is called twice in the chunk UPDATE

In the new chunk append query:

SET
  content = content || decode(${chunkBase64}, 'base64'),
  size_bytes = octet_length(content) + octet_length(decode(${chunkBase64}, 'base64'))
WHERE id = ${id}
  AND octet_length(content) + octet_length(decode(${chunkBase64}, 'base64')) <= ${MAX_FILE_SIZE}

decode(chunkBase64, 'base64') appears three times — once in SET, once in the size_bytes expression, and once in the WHERE predicate. Postgres may optimize this away, but it's not guaranteed to. A CTE avoids the ambiguity:

WITH decoded AS (
  SELECT decode(${chunkBase64}, 'base64') AS chunk_bytes
)
UPDATE data_room_files
SET
  content    = content || (SELECT chunk_bytes FROM decoded),
  size_bytes = octet_length(content) + octet_length((SELECT chunk_bytes FROM decoded))
WHERE id = ${id}
  AND octet_length(content) + octet_length((SELECT chunk_bytes FROM decoded)) <= ${MAX_FILE_SIZE}
RETURNING id

This is a minor optimization — the correctness issue with MAX_UPLOAD_BYTES is the one that needs fixing before ship.

fix: harden data room uploads and add view schema backfill

a219ac0

sdamico requested changes Mar 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden data room uploads and backfill view schema#1

fix: harden data room uploads and backfill view schema#1
joelev wants to merge 1 commit intosdamico:mainfrom
joelev:fix/data-room-upload-backfill

joelev commented Mar 2, 2026 •

edited

Loading

Uh oh!

joelev commented Mar 2, 2026

Uh oh!

joelev commented Mar 2, 2026

Uh oh!

sdamico left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

joelev commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

What Changed

api/admin/data-room-upload.js

api/data/pages.js

migrations/012_view_schema_backfill.sql

Migration Impact

Risk Assessment

Rollback Plan

Validation

Uh oh!

joelev commented Mar 2, 2026

Uh oh!

joelev commented Mar 2, 2026

Uh oh!

sdamico left a comment

Choose a reason for hiding this comment

Required: MAX_UPLOAD_BYTES must account for base64 overhead

Minor: decode(chunkBase64, 'base64') is called twice in the chunk UPDATE

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

joelev commented Mar 2, 2026 •

edited

Loading

`api/admin/data-room-upload.js`

`api/data/pages.js`

`migrations/012_view_schema_backfill.sql`

Required: `MAX_UPLOAD_BYTES` must account for base64 overhead

Minor: `decode(chunkBase64, 'base64')` is called twice in the chunk UPDATE