Description
Affected Component
Codebase
Current Behavior
Collection.query() is not safe for concurrent multi-process access; users must serialize all reads with an external lock
As I understand it, Zvec is an embedded, file-based vector database (similar in design to SQLite). Because there is no server process mediating access, when the same collection is opened by multiple OS processes simultaneously (for example, in a multi-worker web server under Gunicorn/Uvicorn, or across multiple Docker containers sharing a volume), there are no internal concurrency guarantees. Users are forced to wrap every operation, including read-only queries, in an external FileLock, which serializes all search operations and creates a hard throughput ceiling.
Concrete Setup That Reproduces This
Three processes all mount the same collection path:
Process A: API server worker 1 (collection.query(...))
Process B: API server worker 2 (collection.query(...))
Process C: Celery embedding worker (collection.upsert(...))
Without an external lock, Processes A and B opening the same collection simultaneously for parallel reads is undefined behavior: the HNSW graph file could be in a partially-flushed state from Process C's flush(). There is no documented guarantee that Zvec's file format is safe for concurrent readers.
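The three-process topology above can be sketched as a standalone skeleton; the collection path is hypothetical and the zvec calls from the report are shown only as comments, since the point is the process layout, not the API:

```python
import multiprocessing as mp

COLLECTION_PATH = "/shared/volume/my_collection"  # hypothetical shared path

def api_worker(name):
    # Real setup would do: collection = zvec.open(COLLECTION_PATH)
    #                      collection.query(...)
    return f"{name} queried {COLLECTION_PATH}"

def embedding_worker():
    # Real setup would do: collection.upsert(...); collection.flush()
    return f"upsert into {COLLECTION_PATH}"

def main():
    ctx = mp.get_context("fork")  # Linux/macOS; all three mount the same path
    procs = [
        ctx.Process(target=api_worker, args=("worker-1",)),
        ctx.Process(target=api_worker, args=("worker-2",)),
        ctx.Process(target=embedding_worker),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]

if __name__ == "__main__":
    main()
```

With no coordination between the three processes, nothing stops the two readers from observing the writer's half-flushed HNSW file.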
Desired Improvement
Concurrent read safety guarantee: document that multiple processes can safely call collection.query() simultaneously on the same collection path as long as no writer is active. This is the standard MRSW (multiple readers, single writer) model that SQLite's WAL mode implements. If the file format already supports this, simply documenting it would let users replace the exclusive lock with a readers-writer lock.
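A documented MRSW guarantee would let users switch from an exclusive FileLock to POSIX shared/exclusive advisory locks. A minimal sketch (Linux/macOS only; the lock-file path is illustrative, and the commented zvec calls are the report's usage, not a verified API):

```python
import fcntl
from contextlib import contextmanager

LOCK_PATH = "/tmp/zvec_demo.lock"  # illustrative lock-file path

@contextmanager
def collection_lock(shared):
    # LOCK_SH lets any number of readers overlap; LOCK_EX excludes everyone.
    with open(LOCK_PATH, "w") as f:
        fcntl.flock(f, fcntl.LOCK_SH if shared else fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

# Reader processes: queries may proceed in parallel.
with collection_lock(shared=True):
    pass  # collection.query(...) would go here

# Writer process: upsert/flush excludes readers and other writers.
with collection_lock(shared=False):
    pass  # collection.upsert(...); collection.flush() would go here
```

This only pays off if concurrent readers are actually safe at the file-format level, which is exactly the guarantee being requested.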
In-process thread-safety guarantee: if a single process opens the collection once at startup and multiple threads call query() concurrently, is that safe? If so, please document it explicitly. Users deploying with --workers 1 and async I/O (FastAPI/asyncio) could then load the collection once and share it across coroutines without any locking.
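The --workers 1 + asyncio deployment described above would look roughly like the sketch below. StubCollection stands in for a zvec collection (the real query() signature is not verified here); blocking queries are pushed to a thread pool so the event loop stays free, which is only safe if query() is thread-safe:

```python
import asyncio

class StubCollection:
    """Placeholder for a zvec collection opened once at startup."""
    def query(self, vector):
        return f"results for {vector}"

_collection = StubCollection()  # shared by every coroutine in the process

async def handle_request(vector):
    # Offload the blocking query() to the default thread pool; multiple
    # requests may therefore call query() concurrently from worker threads.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _collection.query, vector)

async def main():
    # Four in-flight requests sharing one collection, no locks.
    return await asyncio.gather(*(handle_request(f"v{i}") for i in range(4)))

results = asyncio.run(main())
```

Without a documented thread-safety guarantee, this pattern still needs a threading.Lock around query(), which reintroduces the serialization bottleneck.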
Impact
With 4 Uvicorn workers and a query that holds the lock for ~30 ms, the 4th worker waits up to 90 ms (3 × 30 ms) purely on lock acquisition before any search work starts. The throughput ceiling is approximately 1 / lock_hold_time regardless of how many CPU cores or workers are added. Horizontal scaling is impossible: more processes just mean a longer queue at the lock.
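A back-of-envelope check of that math: with an exclusive lock, N workers arriving together queue behind each other, so the last one waits (N - 1) × hold_time, and aggregate throughput caps at 1 / hold_time no matter how many workers run:

```python
workers = 4
lock_hold_s = 0.030  # ~30 ms per query while the lock is held

# The 4th worker queues behind the other three.
worst_case_wait_s = (workers - 1) * lock_hold_s

# Only one query runs at a time, so adding workers cannot raise this.
throughput_ceiling_qps = 1 / lock_hold_s

print(worst_case_wait_s)        # ~0.09 s
print(throughput_ceiling_qps)   # ~33 queries/sec, process-count independent
```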
Current Workaround
from filelock import FileLock

_zvec_lock = FileLock(f"{settings.ZVEC_PATH}.lock")

class ZvecClient:
    def connect(self):
        _zvec_lock.acquire()  # exclusive lock: blocks ALL other processes
        self._collection = zvec.open(settings.ZVEC_PATH)

    def disconnect(self):
        self._collection.flush()
        _zvec_lock.release()
Zvec Version: 0.2.0
Python: 3.12
Platform: Linux (AWS EC2, ARM Graviton4), macOS (development)