fix(vector_io): wire file_processors provider into vector store file insertion #5339
Draft
alinaryan wants to merge 2 commits into llamastack:main from
Conversation
fix(vector_io): wire file_processors provider into vector store file insertion

Vector store file insertion was always using the legacy pypdf chunking path because the configured file_processors provider was never injected into vector_io providers. Add Api.file_processors as an optional dependency and thread it through all provider constructors to the mixin.

Signed-off-by: Alina Ryan <aliryan@redhat.com>
a43f5d7 to c6a1502
Contributor

Recording workflow completed. Providers: gpt, azure. Recordings have been generated and will be committed automatically by the companion workflow. Fork PR: recordings will be committed if you have "Allow edits from maintainers" enabled.
Co-Authored-By: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Contributor

✅ Recordings committed successfully. Recordings from the integration tests have been committed to this PR.
What does this PR do?
When a file is posted to a vector store via POST /v1/vector_stores/{id}/files, the mixin code checks for a file_processor_api to process the file (e.g., using docling for structure-aware PDF parsing). However, that attribute was never actually wired up: no vector_io provider ever received the file_processors dependency from the resolver. So vector store file insertion always fell through to the legacy pypdf chunking path, regardless of which file_processors provider was configured.
This PR fixes that by adding Api.file_processors as an optional dependency for vector_io providers and threading it through all provider constructors to the mixin.
Test Plan
Run the server with docling as the file_processors provider:

```shell
OLLAMA_URL=http://localhost:11434/v1 llama stack run \
  --providers "file_processors=inline::docling,files=inline::localfs,vector_io=inline::faiss,inference=inline::sentence-transformers,inference=remote::ollama" \
  --port 8321
```
Then in another terminal:
```shell
# Upload a PDF
FILE_ID=$(curl -s http://localhost:8321/v1/files \
  -F purpose=assistants \
  -F file=@some.pdf | jq -r '.id')

# Create a vector store
VS_ID=$(curl -s http://localhost:8321/v1/vector_stores \
  -H "Content-Type: application/json" \
  -d '{"name":"test"}' | jq -r '.id')

# Attach file to vector store (this should now use docling)
curl -s http://localhost:8321/v1/vector_stores/$VS_ID/files \
  -H "Content-Type: application/json" \
  -d "{\"file_id\":\"$FILE_ID\"}"

# Search and verify chunks have docling-style markdown content
curl -s http://localhost:8321/v1/vector_stores/$VS_ID/search \
  -H "Content-Type: application/json" \
  -d '{"query":"your query","max_num_results":3}' | jq '.data[].content[].text'
```
Verify: server logs should show "Using FileProcessor API to process file" instead of "FileProcessor API not available, using legacy chunking". The returned chunks should contain markdown-formatted content (docling output) rather than raw extracted text (pypdf output).