fix(vector_io): wire file_processors provider into vector store file insertion #5339
Draft
alinaryan wants to merge 2 commits into llamastack:main from
Conversation
fix(vector_io): wire file_processors provider into vector store file insertion

Vector store file insertion was always using the legacy pypdf chunking path because the configured file_processors provider was never injected into vector_io providers. Add Api.file_processors as an optional dependency and thread it through all provider constructors to the mixin.

Signed-off-by: Alina Ryan <aliryan@redhat.com>
a43f5d7 to c6a1502
Contributor

Recording workflow completed. Providers: gpt, azure. Recordings have been generated and will be committed automatically by the companion workflow. Fork PR: recordings will be committed if you have "Allow edits from maintainers" enabled.
Co-Authored-By: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Contributor

✅ Recordings committed successfully. Recordings from the integration tests have been committed to this PR.
What does this PR do?
When a file is posted to a vector store via POST /v1/vector_stores/{id}/files, the mixin code checks for a file_processor_api to process the file (e.g., using docling for structure-aware PDF parsing). However, that attribute was never actually wired up: no vector_io provider ever received the file_processors dependency from the resolver. So vector store file insertion always fell through to the legacy pypdf chunking path, regardless of which file_processors provider was configured.
This PR fixes that by adding Api.file_processors as an optional dependency for vector_io providers and threading it through all provider constructors to the mixin.
Test Plan
Run the server with docling as the file_processors provider:

```shell
OLLAMA_URL=http://localhost:11434/v1 llama stack run \
  --providers "file_processors=inline::docling,files=inline::localfs,vector_io=inline::faiss,inference=inline::sentence-transformers,inference=remote::ollama" \
  --port 8321
```
Then in another terminal:
```shell
# Upload a PDF
FILE_ID=$(curl -s http://localhost:8321/v1/files \
  -F purpose=assistants \
  -F file=@some.pdf | jq -r '.id')

# Create a vector store
VS_ID=$(curl -s http://localhost:8321/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name":"test"}' | jq -r '.id')

# Attach file to vector store (this should now use docling)
curl -s http://localhost:8321/v1/vector_stores/$VS_ID/files \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$FILE_ID\"}"

# Search and verify chunks have docling-style markdown content
curl -s http://localhost:8321/v1/vector_stores/$VS_ID/search \
  -H "Content-Type: application/json" \
  -d '{"query":"your query","max_num_results":3}' | jq '.data[].content[].text'
```
Verify: server logs should show "Using FileProcessor API to process file" instead of "FileProcessor API not available, using legacy chunking". The returned chunks should contain markdown-formatted content (docling output) rather than raw extracted text (pypdf output).