This guide walks you through setting up the environment and dependencies for the Byte Brain project. It includes installing necessary tools, running key services, and preparing the system for local development and experimentation.
Make sure the following tools are installed on your machine:
1. Ollama
# Follow installation instructions from: https://ollama.com/download
2. Docker
# Follow installation instructions from: https://docs.docker.com/desktop/
3. Anaconda
# Follow installation instructions from: https://www.anaconda.com/docs/main

This project uses multiple AI models served locally via Ollama. Please make sure Ollama is installed and running on your system.
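A quick way to confirm that the prerequisite tools are on your PATH is sketched below. The helper name `missing_tools` is hypothetical (it is not part of Byte Brain), and the sketch assumes the tools are exposed as the `ollama`, `docker`, and `conda` commands, which is the usual case but may differ on your setup.

```shell
# Hypothetical helper, not part of Byte Brain: prints the name of every
# command passed as an argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command exists.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example (assumed command names): missing_tools ollama docker conda
```

If the call prints nothing, every listed command was found; otherwise each missing command is printed on its own line.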
Follow the installation guide for your OS:
👉 https://ollama.com/download
Once installed, start the Ollama service.
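To check that the service is actually responding, something like the sketch below may help. It assumes Ollama is listening on its default local address, `http://localhost:11434`, and that `curl` is installed; `ollama_up` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: succeeds if an Ollama server answers at the given
# base URL (defaults to Ollama's standard local address, an assumption
# about your setup).
ollama_up() {
  # /api/tags lists the locally available models; --fail turns HTTP
  # errors into a non-zero exit status.
  curl --silent --fail --max-time 5 "${1:-http://localhost:11434}/api/tags" >/dev/null
}

# Example:
# ollama_up && echo "Ollama is reachable" || echo "Ollama is not responding"
```

If the check fails, starting the server manually with `ollama serve` in a separate terminal is one option.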
Use the following commands to download and run each model:
ollama run mxbai-embed-large:latest
ollama run llava:latest
ollama run deepseek-llm:latest

The Byte Brain project relies on several services running via Docker. Follow the steps below to spin up each required container.
Redis is used for caching and real-time data management.
docker run -d --name redis-stack \
-p 6379:6379 -p 8001:8001 \
redis/redis-stack:latest

Chroma is used for vector storage and similarity search.
docker run -v ./chroma-data:/data \
-p 1020:8000 \
chromadb/chroma

Zookeeper is required by the Kafka broker started in the next step:
docker run -p 2181:2181 zookeeper

Kafka is used for streaming data between services. Replace <IP_ADDRESS> with your machine’s IP address:
docker run -d -p 9092:9092 \
-e KAFKA_ZOOKEEPER_CONNECT=<IP_ADDRESS>:2181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://<IP_ADDRESS>:9092 \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
confluentinc/cp-kafka

MongoDB is used as the primary NoSQL database for storing structured and semi-structured data. Replace <USERNAME> and <PASSWORD> with the root credentials you want to use:
docker run -d --name mongo_container \
-p 27017:27017 \
-e MONGO_INITDB_ROOT_USERNAME=<USERNAME> \
-e MONGO_INITDB_ROOT_PASSWORD=<PASSWORD> \
mongo

The Byte Brain project uses OCR (Optical Character Recognition) to extract text from documents and images. The following tools must be installed:
Tesseract is an open-source OCR engine used to recognize text in images.
On Windows:
- Download the installer from: Tesseract at UB Mannheim
- During installation, ensure the option to add Tesseract to the system PATH is checked.
- If not, manually add the Tesseract installation directory (e.g., C:\Program Files\Tesseract-OCR) to the system PATH.

With Homebrew (macOS):
brew install tesseract

Verify the installation:
tesseract --version

Poppler is required to convert PDF documents to images or text, which can then be processed by Tesseract.

On Windows:
- Download from: Poppler Windows Builds
- Extract the ZIP file.
- Add the bin/ directory to your system PATH.
With Homebrew (macOS):
brew install poppler

Verify the installation:
pdftoppm -h

Clone the repository:
git clone https://github.com/Yatin-aggarwal/ByteBrain.git
cd ByteBrain

Now you can begin working with Byte Brain! Ensure all services and dependencies are running in the background.
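As a final check, the sketch below reports whether each service port used in this guide accepts TCP connections. It is an illustration, not part of Byte Brain: the helper names `port_open` and `check_services` are hypothetical, the ports are the host ports from the `docker run` commands above, and it relies on bash's `/dev/tcp` feature, so it needs bash rather than plain `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking that the services from this guide
# are listening. Requires bash (uses the /dev/tcp pseudo-device).

port_open() {
  # $1 = host, $2 = port; succeeds only if a TCP connection can be opened.
  # The subshell makes sure the file descriptor is closed again afterwards.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

check_services() {
  # $1 = host; every remaining argument is a name:port pair.
  host="$1"
  shift
  status=0
  for entry in "$@"; do
    name="${entry%%:*}"   # text before the first colon
    port="${entry##*:}"   # text after the last colon
    if port_open "$host" "$port"; then
      echo "ok      $name ($port)"
    else
      echo "DOWN    $name ($port)"
      status=1
    fi
  done
  return "$status"
}

# Example, using the host ports published in this guide:
# check_services localhost redis:6379 redis-insight:8001 chroma:1020 \
#   zookeeper:2181 kafka:9092 mongo:27017
```

A port that accepts connections only shows that something is listening there; it does not prove the service behind it is healthy, so treat this as a first-pass check.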