Draft by Kumario1 · Pull Request #10 · akash-network/akash-chat

Kumario1 · 2025-03-09T19:21:18Z

Added file attachment support for images. (PDF and more to come later)

Vision Model Integration for AkashChat using Ollama's LLaVA

This feature allows users to upload images to the chat and have them analyzed by Ollama's LLaVA vision model. The resulting analysis is added to the user's message as context, and the combined message is sent to the main AI model.

How It Works

1. The user uploads an image by clicking the upload button in the chat input area.
2. The image is displayed as a preview with an option to remove it.
3. The user types a question about the image.
4. When the user sends the message:
   - The image is processed using Ollama's LLaVA model.
   - The analysis is prepended to the user's message as context.
   - The combined message is sent to the main AI model API.
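
The prepend step can be sketched as below. This is a minimal illustration, not the code in this PR: `PendingMessage`, `composeMessage`, and the bracketed section labels are hypothetical names chosen for the example.

```typescript
// Hypothetical shape for a message that is about to be sent.
interface PendingMessage {
  text: string;           // what the user typed
  imageAnalysis?: string; // LLaVA output, present only if an image was attached
}

// Build the text that goes to the main AI model.
// Assumption: the analysis is placed before the user's question, as the steps describe.
function composeMessage(msg: PendingMessage): string {
  const analysis = msg.imageAnalysis?.trim();
  if (!analysis) {
    // No image (or an empty analysis): send the user's text unchanged.
    return msg.text;
  }
  return `[Image analysis]\n${analysis}\n\n[User question]\n${msg.text}`;
}
```

Keeping the composition in one pure function makes it easy to change the layout of the context later without touching the upload or API code.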

Components

ImageUploadButton.tsx: A reusable component for the file upload button
ChatInput.tsx: Modified to include file upload and processing
pages/api/vision.ts: API endpoint that processes images using Ollama's LLaVA model
utils/app/vision.ts: Handles file uploads and conversions
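
For orientation, here is a rough sketch of the request that an endpoint like pages/api/vision.ts might send to Ollama. It assumes Ollama's non-streaming `/api/generate` endpoint, which accepts a `model`, a `prompt`, and an `images` array of raw base64 strings, and returns the generated text in a `response` field. The helper names (`stripDataUrlPrefix`, `buildLlavaRequest`, `extractAnalysis`) are hypothetical and are not taken from this PR.

```typescript
// Body of a non-streaming request to Ollama's /api/generate endpoint.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // raw base64, without a "data:...;base64," prefix
  stream: boolean;
}

// Browsers usually produce data URLs; Ollama expects only the base64 payload.
function stripDataUrlPrefix(dataUrl: string): string {
  const marker = ";base64,";
  const idx = dataUrl.indexOf(marker);
  return idx === -1 ? dataUrl : dataUrl.slice(idx + marker.length);
}

// Assemble the request body for the LLaVA model.
function buildLlavaRequest(imageDataUrl: string, prompt: string): OllamaGenerateRequest {
  return {
    model: "llava",
    prompt: prompt,
    images: [stripDataUrlPrefix(imageDataUrl)],
    stream: false, // ask for one JSON object instead of a stream of chunks
  };
}

// Pull the generated text out of the parsed response body.
function extractAnalysis(body: unknown): string {
  if (typeof body === "object" && body !== null) {
    const text = (body as { response?: unknown }).response;
    if (typeof text === "string") {
      return text.trim();
    }
  }
  throw new Error("Unexpected response shape from Ollama");
}
```

With axios, the body from `buildLlavaRequest` would be passed to `axios.post(url, body)`, and the `data` property of the result handed to `extractAnalysis`.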

New Dependencies

axios: For making HTTP requests to the Ollama API

Configuration

Install and set up Ollama:
- Download and install Ollama from https://ollama.ai/
- Pull the LLaVA model: ollama pull llava
- Make sure Ollama is running on the default port (11434)
If your Ollama server is running on a different machine or port, update the API endpoint URL in pages/api/vision.ts.
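
One way to avoid editing the source for each deployment is to read the server address from an environment variable and fall back to the default port. The sketch below is only a suggestion: `OLLAMA_BASE_URL` and `resolveGenerateUrl` are hypothetical names, and the PR itself hardcodes the URL in pages/api/vision.ts.

```typescript
// Default address of a local Ollama server.
const DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434";

// Resolve the full /api/generate URL from an environment-like object.
// OLLAMA_BASE_URL is a hypothetical variable name, not one defined by this PR.
function resolveGenerateUrl(env: Record<string, string | undefined>): string {
  const configured = env["OLLAMA_BASE_URL"]?.trim();
  const base = configured ? configured : DEFAULT_OLLAMA_BASE_URL;
  // Drop trailing slashes so the path is not doubled.
  return `${base.replace(/\/+$/, "")}/api/generate`;
}
```

In a Next.js API route this would be called as `resolveGenerateUrl(process.env)`.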

Limitations

Vision analysis accuracy depends on the quality of the image and the capabilities of the LLaVA model
Large files take longer to process and may require significant computational resources
The Ollama server must be running for the vision analysis to work
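
Because large files are costly to process, a client-side check before upload may help. The following is a sketch under stated assumptions: the 5 MiB limit, `checkUpload`, and `base64Length` are hypothetical and do not come from this PR, which states no size limit.

```typescript
// Hypothetical limit; the PR does not specify a maximum upload size.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// The two File properties this check needs.
interface UploadCandidate {
  type: string; // MIME type, e.g. "image/png"
  size: number; // size in bytes
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Reject non-image files and files above the size limit.
function checkUpload(file: UploadCandidate, maxBytes: number = MAX_IMAGE_BYTES): UploadCheck {
  if (file.type.indexOf("image/") !== 0) {
    return { ok: false, reason: "Only image files are supported" };
  }
  if (file.size > maxBytes) {
    return { ok: false, reason: `File exceeds ${maxBytes} bytes` };
  }
  return { ok: true };
}

// Base64 encodes every 3 bytes as 4 characters, so the request body
// is roughly one third larger than the original file.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}
```

A browser `File` object already has `type` and `size`, so it can be passed to `checkUpload` directly.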

Kumario1 added 6 commits March 9, 2025 00:38

added upload button its functionality.

67ec8a1

created OCR API endpoint.

0be7490

Finished OCR implementation.

6a5d7dc

Removed context from user input. Now all context from image is shown …

a8e65a6

…as "Text Extracted from Image"

Remove OCR now using Ollama Vision model

5d1a376

removed old OCR function.

f7af8cc

Kumario1 wants to merge 6 commits into akash-network:main from Kumario1:prince
