Draft#10

Open
Kumario1 wants to merge 6 commits into akash-network:main from Kumario1:prince

Kumario1 commented Mar 9, 2025

Added file attachment support for images (PDF and other formats to come later).

Vision Model Integration for AkashChat using Ollama's LLaVA

This feature lets users upload images to the chat and have them analyzed by Ollama's LLaVA vision model. The analysis is then added to the user's message as context, and the combined message is sent to the main AI model.

How It Works

  1. The user uploads an image by clicking the upload button in the chat input area
  2. The file is displayed as a preview with an option to remove it
  3. The user types their question about the file
  4. When the user sends the message:
    • Images are processed using Ollama's LLaVA model
  5. The analysis is prepended to the user's message as context
  6. The combined message is sent to the main AI model API
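
Steps 5 and 6 can be sketched as a small pure function. This is only an illustration: the helper name `buildMessageWithImageContext` and the bracketed section labels are assumptions, since the PR description does not show the actual prompt format.

```typescript
// Hypothetical helper: the name and the "[Image analysis]" / "[User message]"
// labels are illustrative, not taken from the PR's code.
function buildMessageWithImageContext(
  analysis: string,
  userMessage: string,
): string {
  const trimmedAnalysis = analysis.trim();

  // With no analysis available (for example, the vision call failed),
  // send the user's message unchanged.
  if (trimmedAnalysis === "") {
    return userMessage;
  }

  // Prepend the analysis so the main model reads the image context first.
  return `[Image analysis]\n${trimmedAnalysis}\n\n[User message]\n${userMessage}`;
}
```

Keeping this step separate from the network call makes it easy to unit test, and the empty-analysis fallback means a failed vision request does not block the chat message.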

Components

  • ImageUploadButton.tsx: A reusable component for the file upload button
  • ChatInput.tsx: Modified to include file upload and processing
  • pages/api/vision.ts: API endpoint that processes images using Ollama's LLaVA model
  • utils/app/vision.ts: Handles file uploads and conversions
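
As a rough sketch of what the conversion and API pieces might do together, the snippet below turns a browser data URL into the request body that Ollama's `/api/generate` endpoint accepts for LLaVA. The helper names `stripDataUrlPrefix` and `buildLlavaRequest` are hypothetical and the PR's actual code may be organized differently; the `model`, `prompt`, `images`, and `stream` fields are the ones documented in Ollama's API.

```typescript
// Shape of the JSON body for Ollama's POST /api/generate endpoint.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data, without a data-URL prefix
  stream: boolean;
}

// Hypothetical helper: FileReader.readAsDataURL yields strings such as
// "data:image/png;base64,AAAA", but Ollama expects only the base64 part.
function stripDataUrlPrefix(dataUrl: string): string {
  const commaIndex = dataUrl.indexOf(",");
  return dataUrl.startsWith("data:") && commaIndex !== -1
    ? dataUrl.slice(commaIndex + 1)
    : dataUrl;
}

// Hypothetical helper: builds the request body for a single image.
function buildLlavaRequest(
  imageDataUrl: string,
  prompt: string,
): OllamaGenerateRequest {
  return {
    model: "llava",
    prompt,
    images: [stripDataUrlPrefix(imageDataUrl)],
    stream: false, // ask for one complete JSON response instead of a stream
  };
}
```

The resulting object could then be sent with axios, for example `axios.post(url, buildLlavaRequest(dataUrl, question))`, with the analysis text read from the `response` field of the returned JSON.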

New Dependencies

  • axios: For making HTTP requests to the Ollama API

Configuration

  1. Install and set up Ollama:

    • Download and install Ollama from https://ollama.ai/
    • Pull the LLaVA model: ollama pull llava
    • Make sure Ollama is running on the default port (11434)
  2. If your Ollama server is running on a different machine or port, update the API endpoint URL in pages/api/vision.ts.
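
One way to avoid editing the source for each deployment is to read the server address from an environment variable. The sketch below is an assumption, not something the PR implements: the variable name `OLLAMA_BASE_URL` and the helper `resolveOllamaGenerateUrl` are hypothetical.

```typescript
// Ollama's default address when running locally.
const DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434";

// Hypothetical helper: resolves the full /api/generate URL from an
// environment-like object. OLLAMA_BASE_URL is an assumed variable name.
function resolveOllamaGenerateUrl(
  env: Record<string, string | undefined>,
): string {
  const configured = (env["OLLAMA_BASE_URL"] ?? "").trim();
  const base = configured !== "" ? configured : DEFAULT_OLLAMA_BASE_URL;

  // Drop trailing slashes so the joined URL never contains "//api".
  return `${base.replace(/\/+$/, "")}/api/generate`;
}
```

In a Next.js API route this would be called as `resolveOllamaGenerateUrl(process.env)`. Taking the environment as a parameter keeps the function easy to test.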

Limitations

  • Vision analysis accuracy depends on the quality of the image and the capabilities of the LLaVA model
  • Large files take longer to process and may require significant computational resources
  • The Ollama server must be running for the vision analysis to work
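
The cost of large files could be reduced with a size check before upload. The sketch below is a suggestion only: the PR description does not mention any limit, so the 5 MiB cap and the helper `checkImageSize` are hypothetical.

```typescript
// Hypothetical cap; the PR does not specify a maximum upload size.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // 5 MiB

// Returns null when the file is acceptable, otherwise an error message
// that can be shown to the user.
function checkImageSize(
  sizeBytes: number,
  maxBytes: number = MAX_IMAGE_BYTES,
): string | null {
  if (sizeBytes <= maxBytes) {
    return null;
  }
  const toMiB = (n: number): string => (n / (1024 * 1024)).toFixed(1);
  return `Image is ${toMiB(sizeBytes)} MiB; the limit is ${toMiB(maxBytes)} MiB.`;
}
```

In the browser, `sizeBytes` would come from the `size` property of the selected `File`.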
