
inline::sqlite-vec missing WAL mode and busy_timeout on vector store connections #5344

@mfleader


System Info

  • llama-stack version: latest main
  • Python: 3.12
  • OS: Linux (UBI9)
  • Deployment: Kubernetes, multiple uvicorn workers

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Bug Description

The inline::sqlite-vec provider opens database connections without setting WAL mode or busy_timeout. With multiple uvicorn workers, concurrent writes fail immediately with sqlite3.OperationalError: database is locked instead of waiting and retrying.

The KV store (src/llama_stack/core/storage/sqlstore/sqlalchemy_sqlstore.py:140-145) already handles this correctly by setting journal_mode=WAL, busy_timeout=5000, and synchronous=NORMAL. The vector store at src/llama_stack/providers/inline/vector_io/sqlite_vec/sqlite_vec.py:111 does not.
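One way to confirm the difference between the two stores is to read the relevant pragmas back from an open connection. The helper below is a minimal sketch using only the standard library; the name connection_settings is hypothetical and not part of llama-stack.

```python
import sqlite3


def connection_settings(connection: sqlite3.Connection) -> dict:
    """Return the concurrency-related PRAGMA values for an open connection."""
    return {
        # Each of these PRAGMAs returns a single row with a single column.
        name: connection.execute(f"PRAGMA {name}").fetchone()[0]
        for name in ("journal_mode", "busy_timeout", "synchronous")
    }
```

SQLite reports journal_mode as a lowercase string (for example "wal" or "delete"), busy_timeout in milliseconds, and synchronous as an integer (1 corresponds to NORMAL).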

Steps to Reproduce

Create a run config with 2 workers:

run.yaml

server:
  port: 8321
  workers: 2

Start the server:

llama stack run run.yaml

Hit the vector store API from two terminals simultaneously:

# Terminal 1
curl -X POST http://localhost:8321/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name": "test-store-1", "embedding_model": "all-MiniLM-L6-v2"}'

# Terminal 2 (run at the same time)
curl -X POST http://localhost:8321/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name": "test-store-2", "embedding_model": "all-MiniLM-L6-v2"}'

Error logs

sqlite3.OperationalError: database is locked 

Expected behavior

Concurrent writes should wait up to a timeout and succeed, matching the KV store's concurrency behavior. The fix is two lines in _create_sqlite_connection() after sqlite3.connect(db_path):

connection.execute("PRAGMA journal_mode=WAL")
connection.execute("PRAGMA busy_timeout=5000")
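The expected waiting behavior can be sketched outside llama-stack with the standard library alone. In the example below, open_tuned_connection is a hypothetical stand-in for _create_sqlite_connection() (the real function may do more, such as loading the sqlite-vec extension), and two threads stand in for the two uvicorn workers. isolation_level=None is a choice made for the demo so that transactions can be controlled explicitly.

```python
import os
import sqlite3
import tempfile
import threading
import time


def open_tuned_connection(db_path: str) -> sqlite3.Connection:
    # Hypothetical stand-in for _create_sqlite_connection(); it applies
    # only the two pragmas proposed in this issue.
    connection = sqlite3.connect(db_path, isolation_level=None)
    connection.execute("PRAGMA journal_mode=WAL")
    connection.execute("PRAGMA busy_timeout=5000")
    return connection


def demo_waiting_writer(hold_seconds: float = 0.2) -> int:
    """Return the row count after two overlapping writers have both finished."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        waiter = open_tuned_connection(db_path)
        waiter.execute("CREATE TABLE t (x INTEGER)")

        lock_taken = threading.Event()

        def slow_writer() -> None:
            conn = open_tuned_connection(db_path)
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute("INSERT INTO t VALUES (1)")
            lock_taken.set()
            time.sleep(hold_seconds)  # hold the lock for a while
            conn.execute("COMMIT")
            conn.close()

        worker = threading.Thread(target=slow_writer)
        worker.start()
        lock_taken.wait()
        # The other writer holds the lock here; with busy_timeout set,
        # SQLite retries until the lock is released instead of failing.
        waiter.execute("INSERT INTO t VALUES (2)")
        worker.join()
        count = waiter.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        waiter.close()
        return count
```

Note that WAL mode still allows only one writer at a time. It is busy_timeout that makes the second writer wait; WAL mainly keeps readers from being blocked by a writer.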

Actual Behavior

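The second concurrent write fails immediately with no retry: in the default SQLite journal mode (DELETE), a writer locks the whole database file, and because no busy_timeout is set, the blocked connection gives up instead of waiting for the lock to be released.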

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working)
