Add multimodal input to the system #19

@andreaschandra

Description

Streamlit's chat doesn't support image upload.
We need to find an alternative for the UI to enable multimodal input.

Enhance our existing LangChain-based knowledge chatbot to support multimodal inputs (text, images) alongside the current text-only functionality. This will allow users to interact with the system using various input types for a richer conversational experience.
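As a rough illustration of what "multimodal input" could look like on the LangChain side, here is a minimal sketch that packs a text prompt and an uploaded image into a single content list. The function name `build_multimodal_content` and the `MIME_TYPES` table are hypothetical, and the content-block layout (a `text` block plus an `image_url` block carrying a base64 data URL) is an assumption about what the chosen chat model accepts, not something this issue specifies.

```python
import base64
from typing import Dict, List

# Hypothetical mapping from a detected image format to its MIME type.
MIME_TYPES: Dict[str, str] = {
    "png": "image/png",
    "jpeg": "image/jpeg",
    "webp": "image/webp",
    "gif": "image/gif",
}


def build_multimodal_content(text: str, image_bytes: bytes, image_format: str) -> List[dict]:
    """Combine a text prompt and one image into a list of content blocks.

    Assumption: the target model accepts a "text" block followed by an
    "image_url" block whose URL is a base64-encoded data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{MIME_TYPES[image_format]};base64,{encoded}"
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]
```

If the model behind the chatbot supports image input, the returned list could be passed as `HumanMessage(content=...)` from `langchain_core.messages`. Whether that works depends on the model provider, so treat this as a starting point to verify.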

Current State

✅ Text input processing via LangChain
✅ Text-based responses
✅ Document upload support
❌ Image input support

Must Have

  • Image Input: Support common formats (PNG, JPG, JPEG, WebP, GIF)
  • Input Validation: Enforce file size limits (images: 10 MB max)
  • Error Handling: Clear error messages for unsupported formats/sizes
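The three requirements above could be covered by one validation step that runs before an image reaches the chain. The following is a minimal sketch using only the standard library. The names `validate_image`, `sniff_format`, and `ImageValidationError` are hypothetical, and checking the file's leading bytes in addition to its extension is an extra safeguard assumed here, not a requirement stated in this issue.

```python
from pathlib import Path
from typing import Optional

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # 10 MB limit from the Must Have list

# Allowed extensions mapped to the image format they should contain.
ALLOWED_EXTENSIONS = {
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".webp": "webp",
    ".gif": "gif",
}


class ImageValidationError(ValueError):
    """Raised with a user-facing message when an upload is rejected."""


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the image format from the file's leading magic bytes."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def validate_image(filename: str, data: bytes) -> str:
    """Return the detected format, or raise ImageValidationError."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ImageValidationError(
            f"Unsupported file type '{ext or filename}'. "
            "Allowed formats: PNG, JPG, JPEG, WebP, GIF."
        )
    if len(data) > MAX_IMAGE_BYTES:
        size_mb = len(data) / (1024 * 1024)
        raise ImageValidationError(
            f"Image is {size_mb:.1f} MB; the maximum allowed size is 10 MB."
        )
    detected = sniff_format(data)
    if detected != ALLOWED_EXTENSIONS[ext]:
        raise ImageValidationError(
            f"File content does not match the '{ext}' extension."
        )
    return detected
```

Whichever UI replaces the current chat widget can call `validate_image(name, raw_bytes)` on each upload and show the exception message directly to the user.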

Nice to Have

  • Image OCR: Extract text from images for processing
  • Batch Processing: Multiple file uploads simultaneously
  • Progress Indicators: Upload/processing status feedback

Metadata

Labels

help wanted (Extra attention is needed)
