fix: Wire URL processing for image embed through existing fetch policy #20
Summary
This PR fixes a security gap where image URLs provided to image:embed were being passed directly to embedding providers, potentially leaking internal or signed URLs to third-party APIs. Now, URL-based images are fetched server-side using the existing assetProcessing.fetch policy before being sent to the embedding provider.
Problem
Previously, when ingesting images with multimodal embedding enabled:
embedImage correctly
This bypassed the assetProcessing.fetch security settings (allowlist, HTTPS-only, timeouts) and could expose internal or signed URLs to third-party embedding APIs.
Solution
Modified the ingest pipeline to:
getAssetBytes with the configured assetProcessing.fetch settings
embedImage
assetProcessing.onError setting (skip or fail)
Changes
Core
packages/unrag/registry/core/ingest.ts: Added server-side URL fetching for image assets before calling embedImage
Documentation
apps/web/content/docs/embedding/multimodal-embeddings.mdx: Updated to clarify URL fetching behavior and add security note
apps/web/content/docs/extractors/image/embed.mdx: Added documentation about URL fetch handling and troubleshooting
apps/web/content/docs/reference/asset-processing.mdx: Clarified that FetchConfig applies to image embedding as well as extractors
apps/web/content/docs/reference/core-types.mdx: Added documentation for the stage field in asset_processing_error warnings
Tests
packages/unrag/test/core-image-embed-url-fetch.test.ts: Added comprehensive tests covering:
onError=skip emits warning
onError=fail throws error
Breaking Changes
None. This is a backwards-compatible security hardening:
ImageEmbeddingInput.data now always receives Uint8Array (URLs are fetched first)
assetProcessing.fetch settings
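
For reviewers, here is a minimal sketch of the fetch-before-embed flow described in the Solution section. It is not the code in ingest.ts. The names FetchPolicy, checkUrl, resolveImageBytes, and Fetcher, the policy field names, and the "fetch" value for the warning's stage field are all hypothetical; only the ideas (allowlist, HTTPS-only, timeouts, onError skip or fail, Uint8Array output) come from this PR.

```typescript
// Hypothetical shapes for illustration only; the real unrag types may differ.
type FetchPolicy = {
  httpsOnly: boolean;
  allowedHosts: string[]; // assumption: an empty list means "allow any host"
  timeoutMs: number;
};

type OnError = "skip" | "fail";

type ImageSource =
  | { kind: "url"; url: string }
  | { kind: "bytes"; data: Uint8Array };

// Assumption: "fetch" is a plausible stage value, not confirmed by the PR.
type Warning = { code: "asset_processing_error"; stage: "fetch"; message: string };

// Injected so the sketch stays independent of any particular HTTP client.
type Fetcher = (url: string, timeoutMs: number) => Promise<Uint8Array>;

// Returns a reason string when the URL violates the policy, otherwise null.
function checkUrl(rawUrl: string, policy: FetchPolicy): string | null {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch (_err) {
    return "invalid URL";
  }
  if (policy.httpsOnly && parsed.protocol !== "https:") {
    return "non-HTTPS URL";
  }
  if (policy.allowedHosts.length > 0 && !policy.allowedHosts.includes(parsed.hostname)) {
    return `host not in allowlist: ${parsed.hostname}`;
  }
  return null;
}

// Resolves an image source to bytes before it reaches the embedding provider.
// Returns null when the asset is skipped; throws when onError is "fail".
async function resolveImageBytes(
  source: ImageSource,
  policy: FetchPolicy,
  onError: OnError,
  fetchBytes: Fetcher,
  warnings: Warning[],
): Promise<Uint8Array | null> {
  if (source.kind === "bytes") return source.data;
  try {
    const reason = checkUrl(source.url, policy);
    if (reason !== null) throw new Error(reason);
    return await fetchBytes(source.url, policy.timeoutMs);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    if (onError === "fail") throw new Error(`image fetch failed: ${message}`);
    warnings.push({ code: "asset_processing_error", stage: "fetch", message });
    return null;
  }
}
```

In this sketch the provider only ever sees bytes, so a rejected or failed URL is never forwarded to a third-party API. With onError set to skip the asset is dropped and a warning is recorded; with fail the ingest call rejects.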