Description
Hello, I'm experiencing a gRPC exception and JVM crash when trying to use the multimodal capabilities of Gemma 3N.
The model is configured with "supportImage: true" and works correctly with text-only queries, but fails with a SIGSEGV error when including image data.
Environment
- Model: gemma-3n-E2B-it-int4.litertlm
- Platform: macOS (darwin, aarch64)
- flutter_gemma configuration:
  - supportImage: true
  - maxNumImages: 1
  - preferredBackend: PreferredBackend.gpu
Expected Behavior
When sending a query with both text and image using Message.withImage(), the model should process the multimodal input and return a response.
Actual Behavior
✅ Text-only queries work correctly using Message.text()
❌ Queries with images cause a JVM crash (SIGSEGV) and gRPC exception
Error Logs
Fatal JVM Errors
```
[LiteRT-LM Server] #
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000013d9fa400, pid=78383, tid=38919
#
# JRE version: OpenJDK Runtime Environment (25.0.1+8) (build 25.0.1+8-27)
# Java VM: OpenJDK 64-Bit Server VM (25.0.1+8-27, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [liblitertlm_jni.so+0x406400]  litert::lm::SessionBasic::ProcessAndCombineContents(std::__1::vector<std::__1::variant<litert::lm::InputText, litert::lm::InputImage, litert::lm::InputAudio, litert::lm::InputAudioEnd>, std::__1::allocator<std::__1::variant<litert::lm::InputText, litert::lm::InputImage, litert::lm::InputAudio, litert::lm::InputAudioEnd>>> const&)+0x120
```
gRPC Exception
```
[TEST] Error occurred: gRPC Error (code: 2, codeName: UNKNOWN, message: HTTP/2 error: Connection error: Connection is being forcefully terminated. (errorCode: 10), details: null, rawResponse: null, trailers: {})
[ServerProcessManager] Server process exited with code -6
```
Test Code Example
```dart
// Model initialization (works fine)
_inferenceModel = await gemma.createModel(
  modelType: ModelType.gemmaIt,
  preferredBackend: PreferredBackend.gpu,
  maxTokens: 4096,
  supportImage: true,
  maxNumImages: 1,
);

// Session creation and query
final session = await _inferenceModel!.createSession();

// Load image
final Uint8List imageBytes = await imageFile.readAsBytes();
sLogD("[TEST] Image bytes loaded, size: ${imageBytes.length}", tag: "");

// This causes a SIGSEGV crash in litert::lm::SessionBasic::ProcessAndCombineContents
await session.addQueryChunk(
  Message.withImage(
    text: "Describe this image",
    imageBytes: imageBytes,
    isUser: true,
  ),
);
sLogD("[TEST] Query with image added successfully", tag: ""); // This log appears

final response = await session.getResponse(); // Crashes here with gRPC error
```
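One diagnostic I can run is recreating the model with the CPU backend and retrying the same multimodal query, to check whether the crash is specific to PreferredBackend.gpu. A minimal sketch using the same flutter_gemma calls as the repro above; `describeImageOnCpu` is just a hypothetical helper name, and I haven't confirmed this avoids the crash:

```dart
// Diagnostic sketch: re-run the failing multimodal query with the CPU
// backend. If this succeeds while the GPU run crashes, the SIGSEGV is
// likely specific to the GPU delegate on macOS ARM64.
Future<String> describeImageOnCpu(Uint8List imageBytes) async {
  final cpuModel = await gemma.createModel(
    modelType: ModelType.gemmaIt,
    preferredBackend: PreferredBackend.cpu, // only change vs. the repro above
    maxTokens: 4096,
    supportImage: true,
    maxNumImages: 1,
  );
  final session = await cpuModel.createSession();
  await session.addQueryChunk(
    Message.withImage(
      text: "Describe this image",
      imageBytes: imageBytes,
      isUser: true,
    ),
  );
  return session.getResponse();
}
```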
Analysis
The crash occurs in the native layer, inside litert::lm::SessionBasic::ProcessAndCombineContents, which suggests one of the following:
- A memory access violation when processing the image data
- An incompatible image format or encoding
- A native library issue with the multimodal model on macOS ARM64
- ...
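To test the format/encoding hypothesis, I can normalize the bytes before handing them to the session. A minimal sketch assuming package:image is available; `normalizeToJpeg` is a hypothetical helper of my own, not part of flutter_gemma:

```dart
import 'dart:typed_data';

import 'package:image/image.dart' as img;

/// Hypothetical helper: decode whatever format the source file is in
/// and re-encode it as a plain baseline JPEG, so the native layer
/// always receives a known, well-formed encoding.
Uint8List normalizeToJpeg(Uint8List rawBytes) {
  final decoded = img.decodeImage(rawBytes);
  if (decoded == null) {
    throw const FormatException('Unsupported or corrupt image data');
  }
  return Uint8List.fromList(img.encodeJpg(decoded, quality: 90));
}
```

If the SIGSEGV still occurs with freshly re-encoded JPEG bytes, an incompatible input encoding can probably be ruled out, pointing back at the native library or the model file itself.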
Questions
- Is this a known issue with the current version on macOS ARM64?
- Are there specific image format requirements (JPEG/PNG) or preprocessing steps needed?
- Is there a working example of multimodal inference with Gemma 3N?
- Could this be related to the model file (gemma-3n-E2B-it-int4.litertlm) not supporting image input?
Thank you for creating such an excellent package! I really appreciate the work you've put into making Gemma accessible on Flutter. Any guidance on this issue would be greatly appreciated!