A sophisticated legal assistant inspired by Better Call Saul. Saul Botman uses a Retrieval-Augmented Generation (RAG) architecture, pairing an advanced language model with embeddings to retrieve and generate contextually relevant answers from a provided legal document corpus. It is designed to give accurate information about the Indian Penal Code (IPC).
🔥 Try it now: Saul Botman Live
- Frontend: Streamlit with custom CSS theming
- LLM Integration: Groq's llama-3.3-70b-versatile
- Embeddings: HuggingFace sentence-transformers/all-MiniLM-L6-v2
- Vector Store: FAISS for efficient similarity search
- Document Processing: langchain_text_splitters RecursiveCharacterTextSplitter
- Memory: InMemoryChatMessageHistory for lightweight chat history
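As a quick orientation, here is a minimal sketch of how these components are typically instantiated with LangChain. The exact import paths vary across langchain versions, so treat them as representative rather than the repository's exact code.

```python
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.chat_history import InMemoryChatMessageHistory

# LLM served by Groq (reads GROQ_API_KEY from the environment)
llm = ChatGroq(model="llama-3.3-70b-versatile")

# Embedding model used for both ingestion and retrieval
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Chunking configuration used during document processing
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Lightweight in-memory chat history
history = InMemoryChatMessageHistory()
```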
- Clean and intuitive user interface
- Streamlined conversation experience with legal context
- Vector-based similarity search for relevant IPC sections
- Real-time document retrieval and context analysis
- Conversation memory for maintaining context
- Custom prompt engineering for legal responses
- Clone the repository:

```bash
git clone https://github.com/NiteeshL/Saul-Botman.git
cd Saul-Botman
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Configure API keys: create a `.env` file in the root directory with the following:

```
GROQ_API_KEY=your_groq_api_key_here
```

To obtain the API keys:

- Get your Groq API key from Groq Cloud
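This README does not show how the app consumes the key, but a typical pattern with python-dotenv (assumed here, not confirmed from the source) looks like this:

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # reads variables from .env in the project root
groq_api_key = os.environ["GROQ_API_KEY"]  # fails fast if the key is missing
```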
The system processes Indian Penal Code documents through the following pipeline (sketched in code after the note below):

- PDF documents are loaded from the `legal_documents` directory
- Documents are split into chunks of 1000 characters with a 200-character overlap
- Text chunks are embedded using HuggingFace sentence-transformers/all-MiniLM-L6-v2
- Embeddings are stored in a FAISS vector database for efficient retrieval

Note: You can expand the knowledge base by adding more PDF documents to the `legal_documents` folder. After adding new documents, simply run `python data_ingestion.py` again to update the vector database with the new content.
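For reference, here is a minimal sketch of what an ingestion script like `data_ingestion.py` typically does with the components named above. The actual script may differ in details; in particular, the on-disk index path `vector_db` is assumed from the architecture diagram further down, and import paths vary by langchain version.

```python
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load every PDF in legal_documents/
docs = PyPDFDirectoryLoader("legal_documents").load()

# 2. Split into 1000-character chunks with 200-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3. Embed each chunk with MiniLM
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# 4. Build the FAISS index and persist it to disk
db = FAISS.from_documents(chunks, embeddings)
db.save_local("vector_db")  # index path assumed from the architecture diagram
```

Because `FAISS.from_documents` builds the index from scratch, re-running the script after adding PDFs regenerates the full vector database with the new content included.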
- Place your IPC documents in the `legal_documents` directory
- Run the data ingestion script:

```bash
python data_ingestion.py
```

- Start the Streamlit app:

```bash
streamlit run app.py
```

The application will open in your default web browser at http://localhost:8501.
- User input is processed through the Streamlit chat interface
- Relevant IPC sections are retrieved using FAISS similarity search (k=4)
- Context is combined with the user query and recent chat history using a custom PromptTemplate
- Response is generated using the Groq LLM (llama-3.3-70b-versatile)
- Chat history is maintained via InMemoryChatMessageHistory (last two exchanges)
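Putting those steps together, here is a condensed, illustrative sketch of a single query turn. The prompt wording, the history-trimming logic (last two exchanges, i.e. four messages), and the `vector_db` index path are assumptions for illustration, not the repository's exact code.

```python
from langchain_community.vectorstores import FAISS
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.load_local("vector_db", embeddings, allow_dangerous_deserialization=True)
retriever = db.as_retriever(search_kwargs={"k": 4})  # top-4 similar chunks

llm = ChatGroq(model="llama-3.3-70b-versatile")
history = InMemoryChatMessageHistory()

# Illustrative prompt; the repository's actual PromptTemplate wording may differ
prompt = PromptTemplate.from_template(
    "You are a legal assistant answering questions about the Indian Penal Code.\n"
    "Context:\n{context}\n\n"
    "Recent conversation:\n{chat_history}\n\n"
    "Question: {question}\nAnswer:"
)

def answer(question: str) -> str:
    # Retrieve the most relevant IPC chunks via FAISS similarity search
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)

    # Keep only the last two exchanges (four messages) of chat history
    recent = history.messages[-4:]
    chat_history = "\n".join(f"{m.type}: {m.content}" for m in recent)

    response = llm.invoke(
        prompt.format(context=context, chat_history=chat_history, question=question)
    )
    history.add_user_message(question)
    history.add_ai_message(response.content)
    return response.content
```

The end-to-end flow is summarized in the architecture diagram below.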
```mermaid
flowchart TD
    subgraph Offline Data Ingestion
        A[PDFs in legal_documents] --> B[PyPDFDirectoryLoader]
        B --> C[RecursiveCharacterTextSplitter - chunk 1000, overlap 200]
        C --> D[HuggingFaceEmbeddings - all-MiniLM-L6-v2]
        D --> E[FAISS index]
        E --> F[vector_db on disk]
    end
    subgraph Online Query
        U[User - Streamlit UI] --> Q[Question]
        Q --> R[FAISS Retriever - k 4]
        R --> Ctx[Top-k Context]
        Hist[InMemoryChatMessageHistory] --> P
        Ctx --> P[PromptTemplate - context + chat history + question]
        P --> LLM[Groq Chat LLM - llama-3.3-70b-versatile]
        LLM --> Ans[Answer]
        Ans --> UI[Render with typing effect]
        Clear[Clear Chat] -.-> Hist
    end
    F -. loads .-> R
```
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
