@pingSubhajit
Contributor

Summary

This PR fixes a security gap where image URLs provided to image:embed were passed directly to embedding providers, potentially leaking internal or signed URLs to third-party APIs. URL-based images are now fetched server-side under the existing assetProcessing.fetch policy, and only the fetched bytes are sent to the embedding provider.

Problem

Previously, when ingesting images with multimodal embedding enabled:

  • Images provided as bytes were passed to embedImage correctly
  • Images provided as URLs were passed directly to the embedding provider

This bypassed the assetProcessing.fetch security settings (allowlist, HTTPS-only, timeouts) and could expose internal URLs or signed URLs to third-party embedding APIs.
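The kind of check that was being bypassed can be sketched roughly as follows. This is an illustration only, not the code in unrag: the FetchPolicy shape, its field names (enabled, httpsOnly, allowedHosts), and the checkUrlAgainstPolicy helper are all assumptions made for this example, and timeouts are left out.

```typescript
// Hypothetical policy shape; the real assetProcessing.fetch config may differ.
interface FetchPolicy {
  enabled: boolean;
  httpsOnly: boolean;
  allowedHosts: string[]; // empty list means "no host restriction" in this sketch
}

type PolicyResult = { ok: true } | { ok: false; reason: string };

// Decide whether a URL may be fetched server-side under the given policy.
function checkUrlAgainstPolicy(rawUrl: string, policy: FetchPolicy): PolicyResult {
  if (!policy.enabled) {
    return { ok: false, reason: "fetch disabled" };
  }

  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { ok: false, reason: "invalid URL" };
  }

  if (policy.httpsOnly && parsed.protocol !== "https:") {
    return { ok: false, reason: "non-HTTPS URL" };
  }

  if (policy.allowedHosts.length > 0 && !policy.allowedHosts.includes(parsed.hostname)) {
    return { ok: false, reason: "host not in allowlist" };
  }

  return { ok: true };
}
```

Before this PR, a URL that would fail such a check could still reach the embedding provider, because the provider received the URL itself rather than bytes fetched under the policy.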

Solution

Modified the ingest pipeline to:

  1. Check if the image asset is URL-based
  2. Fetch the bytes server-side using getAssetBytes with the configured assetProcessing.fetch settings
  3. Pass the fetched bytes (not the URL) to embedImage
  4. Handle fetch errors according to the assetProcessing.onError setting (skip or fail)
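The four steps above can be sketched as below. This is a simplified illustration under stated assumptions, not the actual ingest.ts change: the ImageAsset union, the injected fetchBytes function (standing in for getAssetBytes with the fetch policy applied), the warning shape, and the "fetch" stage value are all hypothetical.

```typescript
// Hypothetical asset shape: either raw bytes or a URL with optional headers.
type ImageAsset =
  | { kind: "bytes"; bytes: Uint8Array }
  | { kind: "url"; url: string; headers?: Record<string, string> };

// Assumed warning shape; the "fetch" stage value is an illustration.
interface AssetWarning {
  code: "asset_processing_error";
  stage: "fetch";
  message: string;
}

type FetchBytesFn = (url: string, headers?: Record<string, string>) => Promise<Uint8Array>;
type EmbedImageFn = (input: { data: Uint8Array }) => Promise<number[]>;

interface EmbedDeps {
  fetchBytes: FetchBytesFn; // stand-in for getAssetBytes under the fetch policy
  embedImage: EmbedImageFn;
  onError: "skip" | "fail";
}

// Steps 1, 2 and 4: resolve the asset to bytes, fetching server-side for URLs.
async function resolveImageBytes(
  asset: ImageAsset,
  deps: EmbedDeps,
  warnings: AssetWarning[],
): Promise<Uint8Array | null> {
  if (asset.kind === "bytes") return asset.bytes;
  try {
    return await deps.fetchBytes(asset.url, asset.headers);
  } catch (err) {
    if (deps.onError === "fail") throw err;
    // onError = "skip": record a warning and leave this asset out.
    const message = err instanceof Error ? err.message : String(err);
    warnings.push({ code: "asset_processing_error", stage: "fetch", message });
    return null;
  }
}

// Step 3: the provider only ever sees bytes, never the original URL.
async function embedImageAsset(
  asset: ImageAsset,
  deps: EmbedDeps,
  warnings: AssetWarning[],
): Promise<number[] | null> {
  const data = await resolveImageBytes(asset, deps, warnings);
  if (data === null) return null;
  return deps.embedImage({ data });
}
```

Injecting fetchBytes and embedImage keeps the sketch testable with stubs, which mirrors the three cases listed under Tests.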

Changes

Core

  • packages/unrag/registry/core/ingest.ts: Added server-side URL fetching for image assets before calling embedImage

Documentation

  • apps/web/content/docs/embedding/multimodal-embeddings.mdx: Updated to clarify URL fetching behavior and add security note
  • apps/web/content/docs/extractors/image/embed.mdx: Added documentation about URL fetch handling and troubleshooting
  • apps/web/content/docs/reference/asset-processing.mdx: Clarified that FetchConfig applies to image embedding as well as extractors
  • apps/web/content/docs/reference/core-types.mdx: Added documentation for the stage field in asset_processing_error warnings

Tests

  • packages/unrag/test/core-image-embed-url-fetch.test.ts: Added tests covering:
    • URL fetch with headers merging and bytes passed to embedImage
    • Fetch disabled with onError=skip emits warning
    • Fetch disabled with onError=fail throws error

Breaking Changes

None. This is a backwards-compatible security hardening:

  • ImageEmbeddingInput.data now always receives a Uint8Array (URLs are fetched first)
  • Existing code using bytes-based images is unaffected
  • URL-based images now respect assetProcessing.fetch settings

@pingSubhajit pingSubhajit self-assigned this Jan 9, 2026
@vercel

vercel bot commented Jan 9, 2026


Project Deployment Review Updated (UTC)
unrag-web Ready Ready Preview, Comment Jan 9, 2026 6:08pm

@pingSubhajit pingSubhajit merged commit 5a58195 into release/v0.2.9 Jan 9, 2026
3 checks passed
@pingSubhajit pingSubhajit deleted the fix/handle-url-processing-for-image-embed-through-existing-fetch-policy branch January 9, 2026 18:11
pingSubhajit added a commit that referenced this pull request Jan 9, 2026
…L processing (#22)

* fix: wire url processing within image embed through existing fetch policy (#20)
* feat: Add evaluation harness battery for retrieval quality measurement (#21)
* chore: bump package minor version