[BUG] [v0.0.7] cortex-app-server storage.rs save_session() truncates file before acquiring exclusive lock — concurrent readers see empty session file (TOCTOU race) #53301

@petar-vasilev

Project

cortex

Description

In cortex-app-server/src/storage.rs, save_session() uses fs::File::create() which IMMEDIATELY truncates (empties) the file before acquiring the exclusive lock:

    pub fn save_session(&self, session: &StoredSession) -> std::io::Result<()> {
        let path = self.session_path(&session.id);
        let file = fs::File::create(&path)?; // Step 1: TRUNCATES file to 0 bytes
        file.lock_exclusive()?;              // Step 2: lock acquired AFTER truncation
        ...
    }

The race window exists between Step 1 and Step 2. A concurrent reader calling load_session() can:
a. Open the file (now 0 bytes after truncation)
b. Acquire a shared lock (no exclusive lock blocks it — writer hasn't locked yet)
c. Read 0 bytes from the file
d. serde_json::from_reader returns Err('EOF while parsing value')

This can happen in cortex-app-server when one client triggers a session save while another reads the same session (e.g., one WebSocket client sends a message while another polls GET /sessions/{id}). The reader fails with a spurious 'session not found' or 'invalid session data' error even though the session exists and is being saved.

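Additionally, the file lock is intended to prevent concurrent write corruption between multiple server instances, but truncating BEFORE locking defeats that purpose. On Linux the lock is advisory and does not block open() or truncation, so a second writer's File::create() empties the file even while the first writer holds the exclusive lock and is mid-write.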

Error Message

Debug Logs

cortex-app-server/src/storage.rs:88-103 (TOCTOU: truncate before lock):
    pub fn save_session(&self, session: &StoredSession) -> std::io::Result<()> {
        let path = self.session_path(&session.id);
        let file = fs::File::create(&path)?;  // BUG: truncates file immediately!
        file.lock_exclusive()?;               // Lock acquired AFTER truncation
        // ... write data ...
    }

  Race window (between truncation and exclusive lock acquisition):
    Thread A: File::create() -> file is NOW EMPTY
    Thread B: File::open() -> opens empty file
    Thread B: lock_shared() -> succeeds (no exclusive lock yet)
    Thread B: serde_json::from_reader -> Err('EOF while parsing value')
    Thread A: lock_exclusive() -> acquires lock, starts writing

  error: load_session('abc123'): 'Failed to load session: EOF while parsing value at line 1 column 0'
  error: GET /sessions/abc123 returns 500 Internal Server Error
  warning: spurious session read failure due to TOCTOU between File::create and lock_exclusive

System Information

OS: Ubuntu 22.04 LTS  |  Version: v0.0.7

Screenshots

https://github.com/petar-vasilev/screenshots/blob/main/c764ef8877e24e6eba46eb57fc35d579.png

Steps to Reproduce

  1. Run: cortex app-server # concurrent session reads and writes
  2. Two threads/clients use the same session ID concurrently
  3. Thread A (writer): calls save_session('abc123')
  4. -> fs::File::create('abc123.json') -- FILE TRUNCATED to 0 bytes
  5. Thread B (reader): calls load_session('abc123') in the race window
  6. -> fs::File::open('abc123.json') -- opens 0-byte file
  7. -> file.lock_shared() -- no exclusive lock yet, succeeds
  8. -> serde_json::from_reader -> Err('EOF while parsing value at line 1 column 0')
  9. Thread A (writer): file.lock_exclusive() -- now acquires lock and writes data
  10. Thread B returns error: 'Failed to load session: EOF while parsing'
  11. The session is actually saved correctly, but the reader sees a spurious error
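The interleaving above depends on timing, but it can be replayed on a single thread to show what the reader observes inside the window. The following is a minimal sketch using only the standard library: `read_session` and `replay_race` are hypothetical names, locking is left out because the point is that the truncation happens before any lock exists, and the empty-file check stands in for the JSON parser's EOF error.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical stand-in for the reader: fails on an empty file,
/// much as a JSON parser would on zero bytes of input.
fn read_session(path: &Path) -> io::Result<String> {
    let mut text = String::new();
    File::open(path)?.read_to_string(&mut text)?;
    if text.is_empty() {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "empty session file"));
    }
    Ok(text)
}

/// Replays the interleaving on one thread. Returns what the reader saw
/// inside the race window, plus what it sees once the write has completed.
fn replay_race(path: &Path) -> io::Result<(io::Result<String>, String)> {
    fs::write(path, r#"{"id":"abc123","v":1}"#)?; // previously saved session

    // Writer: File::create() truncates immediately, before any lock exists.
    let mut writer = File::create(path)?;

    // Reader runs inside the window: the file is already 0 bytes long.
    let during = read_session(path);

    // Writer: the exclusive lock would be taken here, which is too late.
    writer.write_all(br#"{"id":"abc123","v":2}"#)?;
    writer.flush()?;
    drop(writer);

    let after = read_session(path)?;
    Ok((during, after))
}
```

Calling `replay_race` on a scratch path yields an error for the in-window read and the new contents for the later read, matching steps 8 and 11.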

Expected Behavior

The exclusive lock should be acquired BEFORE the file is truncated/written:

    // Correct approach: open without truncating, then lock, then truncate+write
    let file = fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false) // explicit: do not truncate on open
        .open(&path)?;
    file.lock_exclusive()?; // Lock acquired BEFORE any modification
    file.set_len(0)?;       // Truncate AFTER acquiring lock

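This ensures no reader ever sees the file truncated but unlocked. Assuming load_session() takes a blocking shared lock, a reader that opens the file mid-write waits in lock_shared() until the writer releases the exclusive lock.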
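The ordering can be checked without a locking crate by passing the lock step in as a closure. This is a sketch under stated assumptions, not the project's code: `save_locked` and its `lock` parameter are hypothetical, and in the real function the closure body would be the `file.lock_exclusive()` call.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Sketch of the proposed ordering. `lock` is a hypothetical hook standing in
/// for `file.lock_exclusive()`, so the example needs only the standard library.
fn save_locked<F>(path: &Path, bytes: &[u8], lock: F) -> io::Result<()>
where
    F: FnOnce(&File) -> io::Result<()>,
{
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false) // old contents stay intact until the lock is held
        .open(path)?;
    lock(&file)?;           // 1. take the exclusive lock
    file.set_len(0)?;       // 2. only now discard the old contents
    file.write_all(bytes)?; // 3. write the new contents
    file.sync_all()         // 4. surface I/O errors instead of dropping them
}
```

A closure that records `f.metadata()?.len()` when it is called will see the old file length, which confirms that nothing has been truncated at the moment the lock is taken.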

Actual Behavior

save_session() calls fs::File::create() which immediately truncates the session file to 0 bytes, THEN acquires the exclusive lock. In the race window between these two operations, concurrent readers can open, lock, and read the empty file, receiving a spurious JSON parse error even though the session exists and is actively being written.

Additional Context

── Code Evidence ────────────────────────────────────────────────────
This is related to the BufWriter flush bug (separate bug report). Together:
1. File is truncated before lock (this bug) -> spurious read failures
2. BufWriter flush errors silently dropped -> silent data loss

The pattern should be: open file in write mode without truncating -> lock -> truncate -> write -> flush explicitly -> unlock.
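The "flush explicitly" step matters because a BufWriter that is merely dropped discards any error from its final write. The sketch below illustrates this with a hypothetical `FailingWriter` whose every write fails (standing in for, say, a full disk); it is not taken from storage.rs.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical writer whose every write fails, e.g. a full disk.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "disk full"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns (result of the buffered write, result of the explicit flush).
fn write_then_flush(data: &[u8]) -> (io::Result<()>, io::Result<()>) {
    let mut out = BufWriter::new(FailingWriter);
    // A small write only fills the in-memory buffer, so it reports success.
    let buffered = out.write_all(data);
    // The underlying write happens here, and this is where the error appears.
    let flushed = out.flush();
    // Without the explicit flush, the same write would be attempted when
    // `out` is dropped and its error would be discarded.
    (buffered, flushed)
}
```

For a short payload, the first result is `Ok` and the second is `Err`, so a caller that skips `flush()` would report a successful save that never reached the file.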

On Linux, fs::File::create() calls open() with O_CREAT|O_WRONLY|O_TRUNC, so creation and truncation happen in a single system call. The kernel truncates the file before the descriptor is returned to userspace, so there is no way to 'create without truncating' using this call.
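By contrast, `OpenOptions` with `create(true)` and no truncation leaves existing bytes in place, which is also why the explicit `set_len(0)` cannot be skipped when the new payload is shorter than the old one. A minimal sketch, with `overwrite` as a hypothetical helper name:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Overwrites `path` with `bytes` via OpenOptions (no truncation on open) and
/// returns the resulting file contents. When `truncate_first` is false, any
/// old bytes beyond the new data are left in place.
fn overwrite(path: &Path, bytes: &[u8], truncate_first: bool) -> io::Result<Vec<u8>> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    if truncate_first {
        file.set_len(0)?; // discard old contents explicitly
    }
    file.write_all(bytes)?; // writes start at offset 0
    file.flush()?;
    drop(file);
    fs::read(path)
}
```

Starting from a file containing `0123456789`, writing `abc` without truncation leaves `abc3456789`, which for a JSON session file would be corrupt data; with `truncate_first` set, the file contains only `abc`.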

This affects any concurrent read+write scenario, including:
- WebSocket GET /sessions/{id} while a message is being processed
- Periodic session list refresh (GET /sessions) during active AI work
- Multiple server instances sharing the same ~/.cortex/app-server/ directory

Labels

bug