A Java-based Retrieval-Augmented Generation (RAG) system that combines document ingestion, vector storage, and language model inference to provide contextual question-answering capabilities.
This project implements a complete RAG pipeline using:
- Document Ingestion: Loads and processes text documents
- Vector Storage: Uses Milvus for storing document embeddings
- Embedding Generation: Python-based API using SentenceTransformer models
- Language Model: Integration with Ollama for text generation
- Retrieval System: Vector similarity search for relevant context
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Document     │    │    Embedding    │    │     Vector      │
│    Ingestion    │───▶│   Generation    │───▶│     Storage     │
│                 │    │  (Python API)   │    │    (Milvus)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Answer      │    │    Language     │    │     Context     │
│   Generation    │◀───│      Model      │◀───│    Retrieval    │
│                 │    │    (Ollama)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
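The two flows in the diagram can be sketched in plain Java. The lambdas below are illustrative stand-ins for the real `RemoteEmbedder`, `VectorRetriever`, and Ollama client, and the prompt format is an assumption, not the one used in `RagPipeline.java`:

```java
import java.util.List;
import java.util.function.Function;

public class RagFlowSketch {

    // Query flow: question -> embedding -> nearest chunks -> prompt -> answer.
    static String answer(String question,
                         Function<String, float[]> embed,          // stand-in for RemoteEmbedder
                         Function<float[], List<String>> retrieve, // stand-in for VectorRetriever
                         Function<String, String> generate) {      // stand-in for the Ollama model
        float[] queryVector = embed.apply(question);
        List<String> context = retrieve.apply(queryVector);
        String prompt = "Context:\n" + String.join("\n", context)
                + "\n\nQuestion: " + question;
        return generate.apply(prompt);
    }

    public static void main(String[] args) {
        // Stub implementations so the sketch runs without any external services.
        String out = answer("What is Milvus?",
                q -> new float[] {0.1f, 0.2f},
                v -> List.of("Milvus is a vector database."),
                prompt -> "LLM saw: " + prompt);
        System.out.println(out);
    }
}
```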
- Document Processing: Load and process plain-text documents
- Semantic Search: Vector-based similarity search using embeddings
- Context-aware Responses: Generate answers based on retrieved relevant context
- Modular Design: Separate components for ingestion, retrieval, and generation
- External LLM Integration: Uses Ollama for language model inference
- Scalable Vector Storage: Milvus database for efficient vector operations
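The semantic-search feature ultimately reduces to comparing embedding vectors. Milvus performs this search at scale; the cosine-similarity sketch below just illustrates the idea and is not part of the project:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Identical vectors score 1.0; orthogonal vectors score 0.0.
        System.out.println(cosine(new float[] {1f, 0f}, new float[] {1f, 0f}));
        System.out.println(cosine(new float[] {1f, 0f}, new float[] {0f, 1f}));
    }
}
```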
- Java 11 or higher
- Maven 3.6+
- Milvus: Vector database (default: localhost:19530)
- Ollama: Language model server (default: localhost:11434)
- Python Embedding API: SentenceTransformer service (default: localhost:5005)
```
git clone <repository-url>
cd Java-Rag-System
```

Build the Java project:

```
mvn clean install
```

Start the Python embedding service:

```
cd src/embedding-api
pip install flask sentence-transformers
python embedding_api.py
```

Follow the Milvus installation guide or use Docker:

```
docker run -d --name milvus -p 19530:19530 milvusdb/milvus:latest
```

Install Ollama from https://ollama.ai and pull the required model:

```
ollama pull gemma3:1b
```

First, run the ingestion process to load documents into the vector database:

```
mvn exec:java -Dexec.mainClass="ingestion.App"
```

Then start the main RAG application:

```
mvn exec:java -Dexec.mainClass="LLM.AppRag"
```

The system will prompt you to enter questions, and it will:
- Retrieve relevant context from the vector database
- Generate contextual answers using the language model
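The question loop in `AppRag` presumably resembles the sketch below; the pipeline is injected here as a plain function so the sketch runs standalone, and the real class and method names may differ:

```java
import java.util.Scanner;
import java.util.function.Function;

public class AskLoop {

    // Reads questions line by line until "exit", printing an answer for each.
    static void run(Scanner in, Function<String, String> pipeline, StringBuilder out) {
        while (in.hasNextLine()) {
            String question = in.nextLine().trim();
            if (question.equalsIgnoreCase("exit")) break;
            out.append("Answer: ").append(pipeline.apply(question)).append('\n');
        }
    }

    public static void main(String[] args) {
        // Canned input so the demo runs non-interactively; the stub pipeline echoes.
        StringBuilder out = new StringBuilder();
        run(new Scanner("hello\nexit\n"), q -> "echo " + q, out);
        System.out.print(out);
    }
}
```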
```
src/
├── main/java/
│   ├── ingestion/              # Document processing and vector storage
│   │   ├── App.java            # Main ingestion application
│   │   ├── SimpleDocumentLoader.java
│   │   ├── RemoteEmbedder.java
│   │   ├── MilvusVectorStore.java
│   │   └── MilvusConnection.java
│   ├── retrieval/              # Vector search and context retrieval
│   │   └── VectorRetriever.java
│   └── LLM/                    # Language model integration
│       ├── AppRag.java         # Main RAG application
│       └── RagPipeline.java    # RAG pipeline orchestration
├── embedding-api/              # Python embedding service
│   └── embedding_api.py
└── resources/
    └── doc1.txt                # Sample documents
```
Ollama (text generation):
- Model: gemma3:1b (configurable in RagPipeline.java)
- Temperature: 0.2
- Base URL: http://localhost:11434
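A sketch of how `RagPipeline.java` might wire these settings into the LangChain4J Ollama client; builder option names can vary between LangChain4J versions, so treat this as a configuration fragment rather than the project's actual code:

```java
import dev.langchain4j.model.ollama.OllamaChatModel;

public class ModelConfig {
    static OllamaChatModel buildModel() {
        return OllamaChatModel.builder()
                .baseUrl("http://localhost:11434") // Ollama server
                .modelName("gemma3:1b")            // pulled via `ollama pull gemma3:1b`
                .temperature(0.2)                  // low temperature for grounded answers
                .build();
    }
}
```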
Milvus (vector storage):
- Host: 127.0.0.1
- Port: 19530
- Collection: Automatically managed

Embedding API:
- Model: all-MiniLM-L6-v2
- Endpoint: http://127.0.0.1:5005/embed
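A sketch of what `RemoteEmbedder` likely sends to that endpoint, using only the JDK's HTTP client. The `texts` field name is an assumption; check `embedding_api.py` for the actual request contract:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EmbedRequestSketch {

    // Builds (but does not send) a POST to the embedding endpoint.
    static HttpRequest buildRequest(String text) {
        String body = "{\"texts\": [\"" + text + "\"]}"; // assumed request shape
        return HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:5005/embed"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("hello world");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it with `HttpClient.newHttpClient().send(...)` would return the embedding vector as JSON, which the project parses with Gson.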
Java (Maven):
- LangChain4J: Framework for LLM applications
- Ollama Integration: Language model client
- Milvus SDK: Vector database client
- Gson: JSON processing

Python (embedding service):
- Flask: Web framework for embedding API
- SentenceTransformers: Pre-trained embedding models
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Free to use for educational purposes.
- Connection refused: Ensure all external services (Milvus, Ollama, Python API) are running
- Model not found: Make sure to pull the required Ollama model: `ollama pull gemma3:1b`
- Port conflicts: Check if default ports (19530, 11434, 5005) are available
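A quick way to probe all three services before digging further is a plain TCP connect check (a diagnostic helper, not part of the project):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ServiceCheck {

    // Returns true if a TCP connection to host:port succeeds within 500 ms.
    static boolean isUp(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 500);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Milvus    (19530): " + isUp("127.0.0.1", 19530));
        System.out.println("Ollama    (11434): " + isUp("127.0.0.1", 11434));
        System.out.println("Embeddings (5005): " + isUp("127.0.0.1", 5005));
    }
}
```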
For issues and questions, please create an issue in the repository.