feat(deepagents): support multimodal files for backends #298

Open
Colin Francis (colifran) wants to merge 61 commits into main from colifran/multimodal-refactor

Contributor

@colifran Colin Francis (colifran) commented Mar 11, 2026

Description

Adds support for binary and multimodal files (images, PDFs, audio, video, etc.) across all backend implementations, with a versioned protocol layer that returns structured results instead of plain values or arrays.

Changes

Protocol

  • Introduced BackendProtocolV2 with structured Result return types:
    • ReadResult — for read() operations
    • ReadRawResult — for readRaw() operations
    • GrepResult — for grepRaw() operations
    • LsResult — for lsInfo() operations
    • GlobResult — for globInfo() operations
  • Introduced SandboxBackendProtocolV2 extending BackendProtocolV2; deprecated SandboxBackendProtocol
  • Deprecated v1 interfaces (BackendProtocol, SandboxBackendProtocol) moved to v1/; current interfaces in v2/ for clear separation
  • Added AnyBackendProtocol union type and adaptBackendProtocol() for backward compatibility — public APIs accept either version and adapt internally
  • FileData split into FileDataV1 (legacy line array) and FileDataV2 (single string + mimeType, supports base64)
  • All Result types follow consistent pattern: { error?: string, [data]?: T } for explicit error propagation
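
As a rough illustration of the Result pattern in the list above, here is a minimal TypeScript sketch. The names `ReadResultSketch` and `readFromMemory` are hypothetical and are not taken from this PR; only the `{ error?, data? }` shape and the optional `content` string come from the description.

```typescript
// Hypothetical sketch of the "{ error?: string, [data]?: T }" pattern.
// These names are illustrative, not the actual deepagents types.
interface ReadResultSketch {
  /** Set when the operation failed. */
  error?: string;
  /** File content (text or base64-encoded binary), undefined on failure. */
  content?: string;
}

// Return an error in the result object instead of throwing.
function readFromMemory(
  store: Map<string, string>,
  path: string,
): ReadResultSketch {
  const content = store.get(path);
  if (content === undefined) {
    return { error: `File not found: ${path}` };
  }
  return { content };
}

const memoryStore = new Map<string, string>([["/notes.txt", "hello"]]);
const okResult = readFromMemory(memoryStore, "/notes.txt");
const missingResult = readFromMemory(memoryStore, "/nope.txt");

// Callers branch on `error` explicitly rather than using try/catch.
if (missingResult.error !== undefined) {
  console.log(`read failed: ${missingResult.error}`);
}
```

Under this shape a caller cannot mistake a failure for an empty value, which is presumably the motivation for moving away from plain values and bare arrays.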

Binary File Support

  • All backends (State, Store, Filesystem, Sandbox, Composite) store and retrieve binary files as base64
  • read_file tool returns typed multimodal content blocks (image, audio, video, file) for binary files
  • Binary reads capped at 10MB to stay within provider inline limits
  • MIME type detection via file extension, stored with v2 FileData
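
The extension-based detection, content block selection, and size cap above could be sketched roughly as follows. Everything here is an assumption for illustration: the extension table, the helper names, the `application/octet-stream` fallback, and the reading of "10MB" as 10 * 1024 * 1024 bytes are not taken from the PR.

```typescript
// Hypothetical extension table; the real list in the PR may differ.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".pdf": "application/pdf",
  ".mp3": "audio/mpeg",
  ".mp4": "video/mp4",
};

// Assumed interpretation of the 10MB cap.
const MAX_BINARY_BYTES = 10 * 1024 * 1024;

type BlockType = "image" | "audio" | "video" | "file";

// Look up a MIME type from the file extension, case-insensitively.
function detectMimeType(path: string): string {
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot).toLowerCase() : "";
  return MIME_BY_EXTENSION[ext] ?? "application/octet-stream";
}

// Map a MIME type onto one of the four block kinds named in the description.
function blockTypeFor(mimeType: string): BlockType {
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType.startsWith("audio/")) return "audio";
  if (mimeType.startsWith("video/")) return "video";
  return "file";
}

// Report oversize files through an error field instead of throwing.
function checkBinarySize(byteLength: number): { error?: string } {
  if (byteLength > MAX_BINARY_BYTES) {
    return { error: `File exceeds ${MAX_BINARY_BYTES} bytes` };
  }
  return {};
}

console.log(blockTypeFor(detectMimeType("/docs/report.pdf")));
```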

Middleware

  • Updated skills.ts middleware to handle LsResult from backend operations
  • Updated fs.ts middleware to handle LsResult and GlobResult with proper error checking

Provider Updates

  • QuickJS REPL: public API accepts AnyBackendProtocol, adapts to v2 internally via adaptBackendProtocol() — fully backward compatible
  • Node VFS: VfsSandbox updated to return LsResult and GlobResult instead of bare arrays

Backward Compatibility

  • v1 backends continue to work everywhere via adaptBackendProtocol() runtime detection
  • v2 backends correctly read and operate on existing v1 FileData (string[] content)
  • No breaking changes to public APIs — all accept AnyBackendProtocol
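
One way the v1 compatibility described above might work is sketched below. This is not the PR's `adaptBackendProtocol()` implementation, whose detection logic is not shown here; the type names, the `files` field, the newline join, and the `text/plain` default are all assumptions.

```typescript
// Hypothetical shapes: only "string[] content" (v1) and
// "single string + mimeType" (v2) come from the PR description.
interface FileDataV1Sketch {
  content: string[];
}
interface FileDataV2Sketch {
  content: string;
  mimeType: string;
}

function isV1(
  data: FileDataV1Sketch | FileDataV2Sketch,
): data is FileDataV1Sketch {
  return Array.isArray(data.content);
}

// Normalize legacy line arrays so v2 code only deals with one shape.
// Joining with "\n" and defaulting to text/plain are assumptions.
function toV2(data: FileDataV1Sketch | FileDataV2Sketch): FileDataV2Sketch {
  if (isV1(data)) {
    return { content: data.content.join("\n"), mimeType: "text/plain" };
  }
  return data;
}

interface LsResultSketch {
  error?: string;
  files?: string[];
}

// Wrap a v1-style bare array in a Result object; pass v2 results through.
function wrapLs(raw: string[] | LsResultSketch): LsResultSketch {
  return Array.isArray(raw) ? { files: raw } : raw;
}

console.log(toV2({ content: ["line 1", "line 2"] }));
console.log(wrapLs(["/a.txt", "/b.txt"]));
```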

Tests

  • All backend tests updated to assert on Result types: composite.test.ts, filesystem.test.ts, state.test.ts, store.test.ts, sandbox.test.ts, local-shell.test.ts
  • Added readRaw() tests returning ReadRawResult across State, LocalShell, Composite, and Node VFS backends
  • utils.test.ts updated to test adaptBackendProtocol() with all Result types, including v1→v2 wrapping
  • Middleware tests (fs.test.ts, skills.test.ts) updated to mock backends returning Result types
  • New binary tests for State and Store backends covering upload, download, round-trip, read, grep, and pagination
  • New multimodal content block tests for read_file (image, audio, video, PDF, size limit enforcement)
  • Sandbox test: readRaw() now returns error in Result object instead of throwing
  • Added v1→v2 backward compatibility tests for State and Store backends (mixed v1/v2 data)
  • Manual e2e testing scripts for all 5 backend configurations plus v1→v2 migration simulation

Example: [image: multimodal]

changeset-bot bot commented Mar 11, 2026

🦋 Changeset detected

Latest commit: 2930258

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 6 packages
Name Type
@langchain/node-vfs Patch
@langchain/quickjs Patch
@langchain/modal Patch
@langchain/sandbox-standard-tests Patch
deepagents Patch
deepagents-acp Patch

@colifran Colin Francis (colifran) changed the title from "feat: support multimodal content in backends" to "feat(deepagents): support multimodal files for backends" on Mar 11, 2026
@colifran Colin Francis (colifran) force-pushed the colifran/multimodal-refactor branch from 516cedf to 4d75256 on March 11, 2026 19:50
@colifran Colin Francis (colifran) force-pushed the colifran/multimodal-refactor branch from 0ace2b2 to 6beb46c on March 12, 2026 20:44
Member

One nit, optional.

Great work 👍

@colifran Colin Francis (colifran) force-pushed the colifran/multimodal-refactor branch from 10210cc to c8b6f3e on March 13, 2026 23:06
Member

@hntrl Hunter Lovell (hntrl) left a comment

On the MIME type:

  • I don't think the filename heuristic for deciding which content block gets created is our best approach. For example, in S3 I can attach a Content-Type header to a file that has an ambiguous file name.

Comment on lines +115 to +116
/** File content as a string (text or base64-encoded binary), undefined on failure */
content?: string;

Same thing with the MIME type.

When would this be base64-encoded? Could we save the encoding time by always using a Uint8Array? For reference, this is how we type it in https://github.com/langchain-ai/langchainjs/blob/c3fcea5f288ac5264b11731d271385e9a2cee537/libs/langchain-core/src/messages/content/multimodal.ts#L51

Contributor Author

I looked into this with Claude, so please confirm, but I think we would run into an issue with the state and store backends. It sounds like the JsonPlusSerializer only handles Uint8Array at the top level (["bytes", obj]); when it's nested inside the files record, it goes through JSON.stringify and comes back as a plain object with numeric keys ({"0": 137, "1": 80, ...}). The replacer/reviver has cases for Set, Map, RegExp, etc., but not Uint8Array, so the data would be corrupted.
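
The nested-serialization behavior described in this reply can be reproduced with plain `JSON.stringify`, as in the sketch below. It does not exercise JsonPlusSerializer itself, so it only shows the underlying JavaScript behavior, and the `toBase64` helper is a hypothetical dependency-free encoder written for this example.

```typescript
// First four bytes of a PNG header, matching the {"0": 137, "1": 80, ...} example.
const pngHeader = new Uint8Array([137, 80, 78, 71]);

// A typed array nested in a record does not survive a JSON round trip:
// it comes back as a plain object keyed by index.
const bytesRoundTrip = JSON.parse(JSON.stringify({ content: pngHeader }));
console.log(bytesRoundTrip.content); // { '0': 137, '1': 80, '2': 78, '3': 71 }

// Hypothetical minimal base64 encoder (standard alphabet, "=" padding).
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit group.
    const group =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);
    out += BASE64_ALPHABET[(group >> 18) & 63];
    out += BASE64_ALPHABET[(group >> 12) & 63];
    out += hasSecond ? BASE64_ALPHABET[(group >> 6) & 63] : "=";
    out += hasThird ? BASE64_ALPHABET[group & 63] : "=";
  }
  return out;
}

// A base64 string is an ordinary JSON string, so it round-trips unchanged.
const encodedHeader = toBase64(pngHeader);
const stringRoundTrip = JSON.parse(JSON.stringify({ content: encodedHeader }));
console.log(stringRoundTrip.content === encodedHeader); // true
```

The trade-off is the encoding cost and the roughly one-third size increase of base64, in exchange for content that stays intact inside any JSON-serialized state.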


3 participants