🧠 Byte Brain Setup Guide

This guide walks you through setting up the environment and dependencies for the Byte Brain project. It includes installing necessary tools, running key services, and preparing the system for local development and experimentation.


🔧 Prerequisites

Make sure the following tools are installed on your machine:

  • Ollama: https://ollama.com/download
  • Docker Desktop: https://docs.docker.com/desktop/
  • Anaconda: https://www.anaconda.com/docs/main
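
A quick way to confirm the prerequisites are reachable from your shell is to look each one up on the PATH. The sketch below assumes the command names `ollama`, `docker`, and `conda`; the `missing_tools` helper is hypothetical and not part of the project.

```shell
# Print the name of every given command that is not found on the PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (assumed command names for the three prerequisites):
#   missing_tools ollama docker conda || echo "install the tools listed above"
```

Any name it prints still needs to be installed, or its install directory added to the PATH.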

🤖 Ollama Model Setup

This project uses multiple AI models served locally via Ollama. Please make sure Ollama is installed and running on your system.

📥 Install Ollama

Follow the installation guide for your OS:
👉 https://ollama.com/download

Once installed, start the Ollama service.


🚀 Run Required Ollama Models

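Use the following commands to download and run each model. Each command downloads the model on first use and then opens an interactive session; type /bye to leave it before running the next command. If ollama run reports that the embedding model mxbai-embed-large does not support generation, download it with ollama pull instead: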

ollama run mxbai-embed-large:latest
ollama run llava:latest
ollama run deepseek-llm:latest
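
Because `ollama run` opens an interactive session, it can be more convenient to download all three models up front with `ollama pull` and then confirm them with `ollama list`. This is a minimal sketch; the `pull_models` helper and the `OLLAMA` override variable are illustrative assumptions, not part of the project.

```shell
# Models named in this guide.
MODELS="mxbai-embed-large:latest llava:latest deepseek-llm:latest"

# Download each model without starting an interactive session.
# OLLAMA is a hypothetical override, e.g. OLLAMA=echo for a dry run.
pull_models() {
  for model in $MODELS; do
    "${OLLAMA:-ollama}" pull "$model" || return 1
  done
}

# Usage:
#   pull_models && ollama list
```

Running it with `OLLAMA=echo` prints the three pull commands without contacting the Ollama service.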

🐳 Docker Container Setup

The Byte Brain project relies on several services running via Docker. Follow the steps below to spin up each required container.


1. 🔁 Redis

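Redis is used for caching and real-time data management. Port 6379 is the Redis server itself; the redis-stack image also serves the RedisInsight web UI on port 8001.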

docker run -d --name redis-stack \
-p 6379:6379 -p 8001:8001 \
redis/redis-stack:latest

2. 📚 ChromaDB

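Used for vector storage and similarity search. The command below maps the container's port 8000 to port 1020 on the host and runs in the foreground; add -d to run it detached like the other containers.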

docker run -v ./chroma-data:/data \
-p 1020:8000 \
chromadb/chroma

3. 🐘 Zookeeper

Zookeeper provides the coordination service that the Kafka broker in the next step connects to on port 2181.

docker run -p 2181:2181 zookeeper

4. 📨 Kafka

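Kafka is used for streaming data between services. Start Zookeeper first, then replace <IP_ADDRESS> with your machine’s IP address in both places below. The command uses ZooKeeper mode, which Apache Kafka 4.0 removed, so pin an image tag that still supports it if the unpinned latest image fails to start: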

docker run -d -p 9092:9092 \
-e KAFKA_ZOOKEEPER_CONNECT=<IP_ADDRESS>:2181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://<IP_ADDRESS>:9092 \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
confluentinc/cp-kafka
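
To avoid editing the placeholder by hand in two places, the address can be passed once as an argument. The sketch below wraps the same command; `start_kafka` and the `DOCKER` override variable are hypothetical names, and the address in the usage line is only an example.

```shell
# Start the Kafka container with <IP_ADDRESS> filled in from the first argument.
# DOCKER is a hypothetical override, e.g. DOCKER=echo to preview the command.
start_kafka() {
  host_ip=$1
  if [ -z "$host_ip" ]; then
    echo "usage: start_kafka <IP_ADDRESS>" >&2
    return 2
  fi
  "${DOCKER:-docker}" run -d -p 9092:9092 \
    -e KAFKA_ZOOKEEPER_CONNECT="${host_ip}:2181" \
    -e KAFKA_ADVERTISED_LISTENERS="PLAINTEXT://${host_ip}:9092" \
    -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
    confluentinc/cp-kafka
}

# Usage (example address):
#   start_kafka 192.168.1.10
```

Running it with `DOCKER=echo` prints the arguments that would be passed to Docker, which makes it easy to check the substituted address before starting the broker.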

5. 🧩 MongoDB

Used as the primary NoSQL database for storing structured and semi-structured data. Replace <USERNAME> and <PASSWORD> with the root credentials you want the container to create:

docker run -d --name mongo_container \
-p 27017:27017 \
-e MONGO_INITDB_ROOT_USERNAME=<USERNAME> \
-e MONGO_INITDB_ROOT_PASSWORD=<PASSWORD> \
mongo
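
Once the containers are started, a simple port probe can show which services are reachable. The sketch below uses the host ports published by the commands in this section and assumes `nc` (netcat) is installed; the service labels and the `port_open` and `check_services` helpers are hypothetical.

```shell
# Host ports published by the containers in this guide.
SERVICES="redis:6379 redisinsight:8001 chromadb:1020 zookeeper:2181 kafka:9092 mongodb:27017"

# Return 0 if a TCP connection to host:port succeeds (assumes nc is available).
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Print one "name port status" line per service; host defaults to localhost.
check_services() {
  host=${1:-localhost}
  for entry in $SERVICES; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open "$host" "$port"; then
      echo "$name $port up"
    else
      echo "$name $port down"
    fi
  done
}

# Usage:
#   check_services
```

An open port only shows that something is listening; it does not confirm that the service inside the container has finished starting.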

📄 OCR Dependencies

The Byte Brain project uses OCR (Optical Character Recognition) to extract text from documents and images. The following tools must be installed:


🔠 Tesseract OCR

Tesseract is an open-source OCR engine used to recognize text in images.

✅ Windows

  • Download the installer from the Tesseract at UB Mannheim page.
  • During installation, make sure the option to add Tesseract to the system PATH is checked.
  • If that option is not available, manually add the Tesseract installation directory (e.g., C:\Program Files\Tesseract-OCR) to the system PATH.

✅ macOS

brew install tesseract

To verify installation:

tesseract --version

📄 Poppler

Poppler is required to convert PDF pages to images, which Tesseract can then process. It can also extract embedded text from PDFs directly.

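✅ Windows

  • Download a Poppler build for Windows and add the directory containing pdftoppm.exe to the system PATH.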

✅ macOS

brew install poppler

To verify installation:

pdftoppm -h
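
With both tools installed, the PDF-to-text flow described above can be sketched as two steps: render the pages with pdftoppm, then run tesseract on each image. The `ocr_pdf` helper, the `ocr-output` directory, and the 300 DPI setting are assumptions for illustration, not part of the project.

```shell
# Convert each page of a PDF to a PNG with pdftoppm, then OCR every page
# with tesseract. Text files are written next to the images in out_dir.
ocr_pdf() {
  pdf=$1
  out_dir=${2:-ocr-output}
  [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
  mkdir -p "$out_dir" || return 1
  # -r 300 renders at 300 DPI; pages are written as page-<n>.png.
  pdftoppm -r 300 -png "$pdf" "$out_dir/page" || return 1
  for image in "$out_dir"/page-*.png; do
    # tesseract appends .txt to the output base name.
    tesseract "$image" "${image%.png}" || return 1
  done
}

# Usage (hypothetical file name):
#   ocr_pdf report.pdf
```

Each page ends up as a PNG with a matching .txt file in the output directory.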

📁 Clone & Setup Project

git clone https://github.com/Yatin-aggarwal/ByteBrain.git
cd ByteBrain

✅ You're all set!

You can now begin working with Byte Brain. Before you start, make sure the Ollama service and all of the Docker containers above are running in the background.
