I encountered a problem when reproducing the evaluation experiments on the LongMemEval dataset.
Sliced runs (e.g. --from-conv 234 --to-conv 264) can make search fail with:
BM25 index not found: bm25_index_conv_234.pkl
Root cause: index files may be built with local sequential ids (0..N-1) while search looks up global conversation ids (234..263).
Result: search_results are empty, and evaluation scores drop for the wrong reason.
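The mismatch can be illustrated with a minimal sketch. The helper names and the enumeration logic here are hypothetical and are not taken from the evaluation code; only the file name pattern comes from the error message above, and the half-open range (234..263) follows the root-cause description.

```python
def index_filename(conv_id: int) -> str:
    # File name pattern taken from the error message in this report.
    return f"bm25_index_conv_{conv_id}.pkl"


def built_index_files(from_conv: int, to_conv: int) -> set[str]:
    # Assumed build-time behaviour: the slice is enumerated from 0,
    # so files are named with local ids 0..N-1.
    count = to_conv - from_conv
    return {index_filename(local_id) for local_id in range(count)}


def searched_index_files(from_conv: int, to_conv: int) -> set[str]:
    # Assumed search-time behaviour: lookups use global conversation ids.
    return {index_filename(conv_id) for conv_id in range(from_conv, to_conv)}


built = built_index_files(234, 264)
searched = searched_index_files(234, 264)

# Every file that search asks for is absent from what was built.
missing = sorted(searched - built)
print(len(missing), missing[0])
```

With this slice the two sets do not overlap at all, which matches the symptom of every lookup failing rather than only some.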
Repro:
uv run python -m evaluation.cli \
  --dataset longmemeval \
  --system evermemos \
  --from-conv 234 --to-conv 264 \
  --run-name test
Expected: sliced runs should generate/load indexes with consistent conversation ids so search can find bm25_index_conv_<conv_id>.pkl.
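One possible way to get the expected behaviour is to offset the local position by the slice start when naming index files, so that build and search agree on the global id. This is only a sketch under stated assumptions: save_indexes and load_index are hypothetical names, the indexes are placeholder dicts rather than real BM25 objects, pickle is assumed from the .pkl extension, and conversation ids inside a slice are assumed to be contiguous.

```python
import pickle
import tempfile
from pathlib import Path


def index_path(out_dir: Path, conv_id: int) -> Path:
    # Single source of truth for the file name, shared by build and search.
    return out_dir / f"bm25_index_conv_{conv_id}.pkl"


def save_indexes(indexes: list, from_conv: int, out_dir: Path) -> None:
    # Hypothetical build step: map each local position back to its global id.
    for local_id, index in enumerate(indexes):
        conv_id = from_conv + local_id
        with index_path(out_dir, conv_id).open("wb") as f:
            pickle.dump(index, f)


def load_index(conv_id: int, out_dir: Path):
    # Hypothetical search step: look up by global conversation id.
    path = index_path(out_dir, conv_id)
    if not path.exists():
        raise FileNotFoundError(f"BM25 index not found: {path.name}")
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    placeholder_indexes = [{"slot": i} for i in range(30)]  # stand-ins for real indexes
    save_indexes(placeholder_indexes, from_conv=234, out_dir=out_dir)
    first = load_index(234, out_dir)
    last = load_index(263, out_dir)
    try:
        load_index(0, out_dir)
        local_id_found = True
    except FileNotFoundError:
        local_id_found = False
```

Routing both the build and the lookup through one path helper keeps the two sides from drifting apart again; the alternative of translating global ids to local ids at search time would also work but makes the files on disk depend on which slice produced them.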