
gRPC Exception when using Message.withImage() with multimodal model - Gemma 3N E2B #175

@hsy159

Description

Hello, I'm experiencing a gRPC exception and a JVM crash when trying to use the multimodal capabilities of Gemma 3N.
The model is configured with supportImage: true and responds correctly to text-only queries, but crashes with SIGSEGV as soon as the query includes image data.

Environment

  • Model: gemma-3n-E2B-it-int4.litertlm
  • Platform: macOS (darwin, aarch64)
  • flutter_gemma configuration
    • supportImage: true
    • maxNumImages: 1
    • preferredBackend: PreferredBackend.gpu

Expected Behavior

When sending a query with both text and image using Message.withImage(), the model should process the multimodal input and return a response.

Actual Behavior

✅ Text-only queries work correctly using Message.text()
❌ Queries with images cause a JVM crash (SIGSEGV) and gRPC exception

Error Logs

Fatal JVM Errors

[LiteRT-LM Server] #
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000013d9fa400, pid=78383, tid=38919
#
# JRE version: OpenJDK Runtime Environment (25.0.1+8) (build 25.0.1+8-27)
# Java VM: OpenJDK 64-Bit Server VM (25.0.1+8-27, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [liblitertlm_jni.so+0x406400]
  litert::lm::SessionBasic::ProcessAndCombineContents(std::__1::vector<std::__1::variant<litert::lm::InputText, litert::lm::InputImage, litert::lm::InputAudio, litert::lm::InputAudioEnd>, std::__1::allocator<std::__1::variant<litert::lm::InputText, litert::lm::InputImage, litert::lm::InputAudio, litert::lm::InputAudioEnd>>> const&)+0x120

gRPC Exception

[TEST] Error occurred: gRPC Error (code: 2, codeName: UNKNOWN, message: HTTP/2 error: Connection error: Connection is being forcefully terminated. (errorCode: 10), details: null, rawResponse: null, trailers: {})
[ServerProcessManager] Server process exited with code -6

Test Code Example

// Model initialization (works fine)
_inferenceModel = await gemma.createModel(
  modelType: ModelType.gemmaIt,
  preferredBackend: PreferredBackend.gpu,
  maxTokens: 4096,
  supportImage: true,
  maxNumImages: 1,
);

// Session creation and query
final session = await _inferenceModel!.createSession();

// Load image
final Uint8List imageBytes = await imageFile.readAsBytes();
sLogD("[TEST] Image bytes loaded, size: ${imageBytes.length}", tag: "");

// This causes a SIGSEGV crash in litert::lm::SessionBasic::ProcessAndCombineContents
await session.addQueryChunk(
  Message.withImage(
    text: "Describe this image",
    imageBytes: imageBytes,
    isUser: true,
  ),
);
sLogD("[TEST] Query with image added successfully", tag: ""); // This log appears

final response = await session.getResponse(); // Crashes here with gRPC error
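
One sanity check worth running first (a sketch of my own, not part of the failing code above): multimodal backends typically expect JPEG or PNG bytes, so validating the magic bytes before calling Message.withImage() would rule out an unsupported encoding as the cause. The helper below uses only dart:typed_data; the function name is hypothetical.

// Hypothetical helper: verify the buffer starts with a JPEG or PNG
// signature before handing it to Message.withImage(). This only rules
// out obviously unsupported encodings; it does not guarantee the model
// accepts the image.
bool looksLikeSupportedImage(Uint8List bytes) {
  if (bytes.length < 8) return false;
  // JPEG files start with FF D8 FF.
  final isJpeg = bytes[0] == 0xFF && bytes[1] == 0xD8 && bytes[2] == 0xFF;
  // PNG files start with 89 50 4E 47 0D 0A 1A 0A.
  const pngSig = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
  var isPng = true;
  for (var i = 0; i < 8; i++) {
    if (bytes[i] != pngSig[i]) {
      isPng = false;
      break;
    }
  }
  return isJpeg || isPng;
}

// Usage before addQueryChunk:
// assert(looksLikeSupportedImage(imageBytes), 'Unexpected image encoding');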

Analysis

The crash occurs in the native layer at litert::lm::SessionBasic::ProcessAndCombineContents, suggesting the issue might be:

  • Memory access violation when processing image data
  • Incompatible image format or encoding
  • Native library issue with the multimodal model on macOS ARM64
  • ...

Questions

  1. Is this a known issue with the current version on macOS ARM64?
  2. Are there specific image format requirements (JPEG/PNG) or preprocessing steps needed?
  3. Is there a working example of multimodal inference with Gemma 3N?
  4. Could this be related to the model file (gemma-3n-E2B-it-int4.litertlm) not supporting image input?
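
One isolation step I can try on my side, sketched below: re-running the same multimodal query on the CPU backend. This assumes the package exposes PreferredBackend.cpu alongside the PreferredBackend.gpu used above; adjust if the enum differs. If the crash disappears on CPU, that would point at the GPU delegate on macOS ARM64 rather than the model file.

// Sketch: identical setup to the test code above, but with the CPU backend
// to check whether the SIGSEGV is specific to the GPU delegate on
// darwin/aarch64. PreferredBackend.cpu is an assumption on my part.
_inferenceModel = await gemma.createModel(
  modelType: ModelType.gemmaIt,
  preferredBackend: PreferredBackend.cpu, // only change vs. the GPU setup
  maxTokens: 4096,
  supportImage: true,
  maxNumImages: 1,
);

final session = await _inferenceModel!.createSession();
await session.addQueryChunk(
  Message.withImage(
    text: "Describe this image",
    imageBytes: imageBytes,
    isUser: true,
  ),
);
// If this succeeds where the GPU run crashed, the issue is GPU-specific.
final response = await session.getResponse();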

Thank you for creating such an excellent package! I really appreciate the work you've put into making Gemma accessible on Flutter. Any guidance on this issue would be greatly appreciated!
