[Enhance]: Concurrent multi-process access support without requiring a FileLock for all operations #247

@Rm1n90

Description


Affected Component

Codebase

Current Behavior

Collection.query() is not safe for concurrent multi-process access; users must serialize all reads with an external lock.

As I understand it, Zvec is an embedded, file-based vector database (similar in design to SQLite). Because there is no server process mediating access, when the same collection is opened by multiple OS processes simultaneously, for example in a multi-worker web server (Gunicorn/Uvicorn) or across multiple Docker containers sharing a volume, there are no internal concurrency guarantees. Users are forced to wrap every operation, including read-only queries, in an external FileLock, which serializes all search operations and creates a hard throughput ceiling.

Concrete Setup That Reproduces This

Three processes all mount the same collection path:

Process A: API server worker 1 (collection.query(...))
Process B: API server worker 2 (collection.query(...))
Process C: Celery embedding worker (collection.upsert(...))

Without an external lock, Process A and B opening the same collection simultaneously for parallel reads is undefined behavior. The HNSW graph file could be in a partially-flushed state from Process C's flush(). There is no documented guarantee that Zvec's file format is safe for concurrent readers.

Desired Improvement

Concurrent read safety guarantee: document that multiple processes can safely call collection.query() simultaneously on the same collection path, as long as no writer is active. This is the standard MRSW (Multiple Readers, Single Writer) model that SQLite's WAL mode implements. If the file format already supports this, simply documenting it would let users switch from an exclusive lock to a readers-writer lock.
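To illustrate what users could do with a documented MRSW guarantee: POSIX flock supports exactly this split, with LOCK_SH for readers and LOCK_EX for the writer. A minimal sketch of the pattern (POSIX-only; the lock file name is a hypothetical placeholder, and the actual collection.query()/upsert() calls are elided since Zvec's API is not confirmed here):

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; in practice something like f"{ZVEC_PATH}.lock".
LOCK_PATH = os.path.join(tempfile.gettempdir(), "zvec_demo.lock")

def read_locked(lock_path):
    """Take a SHARED lock: many readers may hold it at once."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    fcntl.flock(fd, fcntl.LOCK_SH)  # blocks only while a writer holds LOCK_EX
    return fd                        # ... collection.query(...) would go here ...

def unlock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)

# Two readers hold the shared lock at the same time (separate file
# descriptors behave like separate processes for flock purposes):
r1 = read_locked(LOCK_PATH)
r2 = read_locked(LOCK_PATH)

# A writer attempting a non-blocking EXCLUSIVE lock is refused while
# any reader is active -- the single-writer half of MRSW:
fd_w = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
try:
    fcntl.flock(fd_w, fcntl.LOCK_EX | fcntl.LOCK_NB)
    writer_blocked = False
except BlockingIOError:
    writer_blocked = True
os.close(fd_w)

unlock(r1)
unlock(r2)
print(writer_blocked)  # True: the writer had to wait, but readers ran in parallel
```

With a documented guarantee, this would replace the current exclusive-for-everything FileLock and let reads proceed in parallel.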

In-process thread-safety guarantee: if a single process opens the collection once at startup and multiple threads call query() concurrently, is that safe? If so, document it explicitly. Users deploying with --workers 1 and async I/O (FastAPI/asyncio) could then load the collection once and share it across coroutines without any locking.
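The --workers 1 pattern that guarantee would enable looks roughly like the sketch below. Since Zvec's real API is not confirmed here, a trivial in-memory StubCollection stands in for the real collection; the point is the sharing pattern: open once at startup, fan queries out to a thread pool, no locks.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class StubCollection:
    """Stand-in for a zvec collection; the real API is not confirmed by this issue."""
    def query(self, vector):
        # A real query would search the HNSW index; here we just echo the input.
        return {"matches": vector}

# Opened ONCE at process startup, then shared by every coroutine.
_collection = StubCollection()
_pool = ThreadPoolExecutor(max_workers=4)

async def handle_request(vector):
    # Off-load the query to a worker thread so the event loop stays free.
    # Safe without locking only if Zvec documents in-process thread safety.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, _collection.query, vector)

async def main():
    # Eight concurrent "requests" against the single shared handle.
    return await asyncio.gather(*(handle_request([i]) for i in range(8)))

results = asyncio.run(main())
print(len(results))  # 8
```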

Impact

With 4 Uvicorn workers and a query that holds the lock for ~30 ms, the 4th worker waits up to 90 ms purely on lock acquisition before any search work starts. Throughput is capped at roughly 1 / lock_hold_time regardless of how many CPU cores or workers are added. Scaling horizontally is impossible; more processes just mean a longer queue at the lock.
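The ceiling above is simple queueing arithmetic; a quick check using the figures from this issue (30 ms hold time, 4 workers):

```python
lock_hold_ms = 30  # ~30 ms per query while the exclusive lock is held
workers = 4

# Worst case, the last worker queues behind the other three:
worst_wait_ms = (workers - 1) * lock_hold_ms
print(worst_wait_ms)  # 90

# The lock serializes everything, so aggregate throughput is capped at:
ceiling_qps = 1000 / lock_hold_ms
print(round(ceiling_qps, 1))  # 33.3 queries/s, no matter how many workers
```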

Current Workaround

import zvec  # the library under discussion
from filelock import FileLock

# settings comes from the application's own config module
_zvec_lock = FileLock(f"{settings.ZVEC_PATH}.lock")

class ZvecClient:
    def connect(self):
        # Exclusive lock: blocks ALL other processes, including pure readers
        _zvec_lock.acquire()
        self._collection = zvec.open(settings.ZVEC_PATH)

    def disconnect(self):
        self._collection.flush()
        _zvec_lock.release()

Zvec Version: 0.2.0
Python: 3.12
Platform: Linux (AWS EC2, ARM Graviton4), macOS (development)

Labels

enhancement (Improve an existing feature or component)

Type

No type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions