
A legal chatbot project built as a practical application of NLP and conversational AI. Designed to understand and respond to legal queries using natural language processing techniques. Includes dataset preparation, model training, and a simple interactive interface, all in Python.


RK0297/Legal-Chatbot


कानून - AI Legal Assistant Documentation

Overview

कानून (Kanoon) is an AI-powered legal assistant chatbot designed to help users with Indian legal queries. It combines a Retrieval-Augmented Generation (RAG) pipeline with a Large Language Model (LLM) to provide accurate, context-aware legal information grounded in the Indian Constitution and Indian legal cases.

Key Features:

  • Real-time Legal Q&A - Get instant answers to legal questions
  • Hybrid RAG/LLM Mode - Automatically switches between database-backed and general knowledge responses

Tech Stack

Frontend

| Technology | Purpose | Version |
|------------|---------|---------|
| React | UI Framework | 18.3.1 |
| TypeScript | Type Safety | 5.8.3 |
| Vite | Build Tool & Dev Server | 5.4.19 |
| Shadcn UI | Component Library | Latest |
| Tailwind CSS | Styling | 3.4.17 |
| React Router | Navigation | 6.30.1 |
| Lucide React | Icons | 0.462.0 |

Backend

| Technology | Purpose | Version |
|------------|---------|---------|
| FastAPI | Web Framework | Latest |
| Python | Programming Language | 3.12+ |
| Uvicorn | ASGI Server | Latest |
| ChromaDB | Vector Database | Latest |
| Sentence Transformers | Embeddings | Latest |
| Ollama | Local LLM Runtime | Latest |
| Qwen3 | Language Model | 8B parameters |

Data Processing

| Technology | Purpose |
|------------|---------|
| Hugging Face Datasets | Data Loading |
| Beautiful Soup | Web Scraping (optional) |
| Pandas | Data Manipulation |
| tqdm | Progress Bars |

πŸ“ Repository Structure

Ith/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ models/
β”‚   β”‚   β”œβ”€β”€ main.py                    # FastAPI application & API endpoints
β”‚   β”‚   β”œβ”€β”€ rag_pipeline.py            # RAG pipeline implementation
β”‚   β”‚   β”œβ”€β”€ vector_database.py         # ChromaDB vector database manager
β”‚   β”‚   β”œβ”€β”€ data/
β”‚   β”‚   β”‚   └── vectordb/              # ChromaDB persistent storage
β”‚   β”‚   β”‚       β”œβ”€β”€ chroma.sqlite3
β”‚   β”‚   β”‚       └── e1567276.../       # Vector embeddings
β”‚   β”‚   └── __pycache__/
β”‚   β”‚
β”‚   β”œβ”€β”€ scrapers/
β”‚   β”‚   β”œβ”€β”€ dt.py                      # Hugging Face dataset loader
β”‚   β”‚   └── data/
β”‚   β”‚       └── raw/
β”‚   β”‚           └── legal_data_all.json # Raw legal Q&A dataset
β”‚   β”‚
β”‚   β”œβ”€β”€ requirements.txt               # Python dependencies
β”‚   └── .env                           # Environment variables
β”‚
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ ChatInterface.tsx      # Main chat component
β”‚   β”‚   β”‚   β”œβ”€β”€ Navigation.tsx         # Header navigation
β”‚   β”‚   β”‚   β”œβ”€β”€ Footer.tsx             # Footer component
β”‚   β”‚   β”‚   └── ui/                    # Shadcn UI components
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ services/
β”‚   β”‚   β”‚   └── api.ts                 # API service for backend communication
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ pages/
β”‚   β”‚   β”‚   β”œβ”€β”€ Index.tsx              # Landing page
β”‚   β”‚   β”‚   └── NotFound.tsx           # 404 page
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ lib/
β”‚   β”‚   β”‚   └── utils.ts               # Utility functions
β”‚   β”‚   β”‚
β”‚   β”‚   β”œβ”€β”€ App.tsx                    # Main app component
β”‚   β”‚   └── main.tsx                   # App entry point
β”‚   β”‚
β”‚   β”œβ”€β”€ package.json                   # Node dependencies
β”‚   β”œβ”€β”€ vite.config.ts                 # Vite configuration
β”‚   β”œβ”€β”€ tailwind.config.ts             # Tailwind configuration
β”‚   └── .env                           # Frontend environment variables
β”‚
└── DOCUMENTATION.md                   # This file

How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      HTTP/REST      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend  β”‚ ◄─────────────────► β”‚   FastAPI    β”‚
β”‚   (React)   β”‚      JSON Data      β”‚   Backend    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                            β”‚
                                            β”‚
                                            β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚      RAG Pipeline Manager           β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β”‚          β”‚
                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                         β”‚                               β”‚
                         β–Ό                               β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                  β”‚  ChromaDB   β”‚              β”‚   Ollama    β”‚
                  β”‚ Vector DB   β”‚              β”‚  (Qwen3)    β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  Embeddings Search            Text Generation

Breakdown

1. Frontend (React + TypeScript)

  • API Service: Handles all HTTP requests to backend
  • State Management: React hooks (useState, useEffect) for local state

2. Backend API (FastAPI)

  • CORS Configuration: Allows frontend-backend communication
  • Request Validation: Pydantic models for type safety

3. Vector Database (ChromaDB)

  • Embedding Model: sentence-transformers/all-MiniLM-L6-v2
  • Search: Cosine similarity search for relevant Q&A pairs
  • Metadata: Stores question, answer, and other metadata
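
The lookup described above can be sketched as follows. The collection name "legal_qa" is an assumption (the actual name lives in vector_database.py); the distance-to-similarity conversion matches the formula used in the RAG pipeline steps below.

```python
# Illustrative sketch of the ChromaDB lookup; heavy dependencies load only
# when the file is run directly.

def to_hits(results: dict) -> list[dict]:
    """Flatten a ChromaDB query result; cosine distance -> similarity = 1 - distance."""
    return [
        {"id": i, "metadata": m, "similarity": 1 - d}
        for i, m, d in zip(
            results["ids"][0], results["metadatas"][0], results["distances"][0]
        )
    ]

if __name__ == "__main__":
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path="./data/vectordb")
    collection = client.get_or_create_collection(
        "legal_qa", metadata={"hnsw:space": "cosine"}
    )
    embedding = model.encode("What are fundamental rights?").tolist()  # 384-dim
    print(to_hits(collection.query(query_embeddings=[embedding], n_results=5)))
```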

4. RAG Pipeline

  • Document Retrieval: Fetches top-k similar Q&A pairs from ChromaDB
  • Hybrid Mode: Switches between RAG and pure LLM based on similarity threshold
  • LLM Generation: Sends prompt to Ollama (Qwen3) for response generation

5. LLM (Ollama + Qwen3)

  • Local Execution: Runs on user's machine (no API costs)
  • Model: Qwen3 8B - optimized for chat and instruction following
  • Temperature: 0.2 for more factual, consistent responses
  • Max Tokens: 1500 tokens (~1000-1200 words)

RAG + LLM Architecture

What is RAG?

Retrieval-Augmented Generation is a technique that combines:

  1. Information Retrieval - Finding relevant documents from a database
  2. Text Generation - Using an LLM to generate responses based on retrieved context

How RAG Works in This Project

graph TD
    A[User Query] --> B[Generate Query Embedding]
    B --> C[Search Vector DB]
    C --> D{Similarity > Threshold?}
    
    D -->|Yes| E[RAG Mode]
    D -->|No| F[Pure LLM Mode]
    
    E --> G[Retrieve Top-K Q&A Pairs]
    G --> H[Build Context Prompt]
    H --> I[Send to Ollama LLM]
    
    F --> J[Build General Prompt]
    J --> I
    
    I --> K[Generate Response]
    K --> L[Return to User with Sources]

RAG Pipeline Steps

Step 1: Query Processing

# User asks: "What are fundamental rights?"
query = "What are fundamental rights in Indian Constitution?"

Step 2: Embedding Generation

# Convert query to vector representation
query_embedding = embedding_model.encode(query)
# Result: [0.123, -0.456, 0.789, ...] (384 dimensions)

Step 3: Similarity Search

# Search ChromaDB for similar Q&A pairs
results = vector_db.search(query_embedding, top_k=5)
# Returns: Top 5 most similar questions with their answers

Step 4: Relevance Check

# Calculate average similarity
avg_similarity = 1 - avg_distance
threshold = 0.35  # Configurable

if avg_similarity >= threshold:
    mode = "RAG"  # Use database context
else:
    mode = "LLM"  # Use general knowledge

Step 5: Prompt Construction

RAG Mode (with context):

System: You are an AI legal assistant...

Context from Database:
[Reference 1] (ID: qa_123):
Question: What are fundamental rights?
Answer: Fundamental rights are basic human rights enshrined in Part III...


LLM Mode (no context):

System: You are an AI legal assistant with expertise in Indian law... Note: This response is based on general knowledge.

User: What are fundamental rights in Indian Constitution? Assistant:


Step 6: Response Generation

# Send the prompt to Ollama via the ollama Python client; sampling
# options such as temperature and the token limit go in `options`
result = ollama.generate(
    model="qwen3:8b",
    prompt=prompt,
    options={"temperature": 0.2, "num_predict": 1500},
)
response = result["response"]

Step 7: Response Formatting
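
The formatting step packages the LLM answer with its retrieved sources. A plausible sketch, with field names following the response shape shown in the workflow section (the helper name and source fields are illustrative, not taken from the actual code):

```python
# Illustrative sketch of response formatting: attach source metadata,
# a conversation id, and a timestamp to the generated answer.
import uuid
from datetime import datetime, timezone
from typing import Optional

def format_response(answer: str, retrieved: list, conversation_id: Optional[str] = None) -> dict:
    sources = [
        {
            "id": r["id"],
            "question": r["metadata"].get("question", ""),
            "similarity": round(r["similarity"], 3),
        }
        for r in retrieved
    ]
    return {
        "response": answer,
        "sources": sources,
        "conversation_id": conversation_id or str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```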

📊 Workflow & Flowchart

Complete System Workflow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         USER INTERACTION                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. USER TYPES QUERY                                              β”‚
β”‚    Example: "What is Article 21 of Indian Constitution?"        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. FRONTEND SENDS REQUEST                                        β”‚
β”‚    POST /api/chat                                                β”‚
β”‚    Body: { query, conversation_id?, top_k? }                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. BACKEND RECEIVES REQUEST                                      β”‚
β”‚    - Validates input (max 1000 chars)                            β”‚
β”‚    - Checks RAG pipeline status                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 4. GENERATE QUERY EMBEDDING                                      β”‚
β”‚    - Use Sentence Transformer model                              β”‚
β”‚    - Convert text to 384-dim vector                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 5. SEARCH VECTOR DATABASE (ChromaDB)                             β”‚
β”‚    - Cosine similarity search                                    β”‚
β”‚    - Retrieve top-5 Q&A pairs                                    β”‚
β”‚    - Calculate similarity scores                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                β”‚                       β”‚
                β–Ό                       β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Similarity >= 0.35β”‚   β”‚ Similarity < 0.35 β”‚
    β”‚   RAG MODE        β”‚   β”‚   LLM MODE        β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚                       β”‚
              β–Ό                       β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Build RAG Promptβ”‚   β”‚ Build General Promptβ”‚
    β”‚ - Add Q&A pairs β”‚   β”‚ - No database info  β”‚
    β”‚ - Add sources   β”‚   β”‚ - General knowledge β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚                     β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 6. SEND PROMPT TO OLLAMA (Qwen3:8b)                             β”‚
β”‚    - Temperature: 0.2                                            β”‚
β”‚    - Max tokens: 1500                                            β”‚
β”‚    - Timeout: 120s                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 7. OLLAMA GENERATES RESPONSE                                     β”‚
β”‚    - Processes prompt                                            β”‚
β”‚    - Generates coherent answer                                   β”‚
β”‚    - Returns text response                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 8. FORMAT RESPONSE WITH SOURCES                                  β”‚
β”‚    - Extract source metadata                                     β”‚
β”‚    - Format citations                                            β”‚
β”‚    - Update conversation history                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 9. SEND RESPONSE TO FRONTEND                                     β”‚
β”‚    {                                                             β”‚
β”‚      response: "Article 21 states...",                           β”‚
β”‚      sources: [...],                                             β”‚
β”‚      conversation_id: "uuid",                                    β”‚
β”‚      timestamp: "2025-10-29T..."                                 β”‚
β”‚    }                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 10. FRONTEND DISPLAYS RESPONSE                                   β”‚
β”‚     - Shows bot message with answer                              β”‚
β”‚     - Displays source cards below                                β”‚
β”‚     - Maintains conversation context                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Setup & Installation

Prerequisites

  • Python 3.12+
  • Node.js 18+
  • Ollama installed
  • Git

Step 1: Clone Repository

git clone <repository-url>
cd Ith

Step 2: Backend Setup

# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
# Create .env file in backend directory with:
# FRONTEND_ORIGINS=http://localhost:5173
# OLLAMA_MODEL=qwen3:8b
# OLLAMA_BASE_URL=http://localhost:11434
# VECTOR_DB_PATH=./data/vectordb

Step 3: Load Dataset

# Navigate to scrapers
cd scrapers

# Run dataset loader
python dt.py

# This will download the dataset from Hugging Face and save it
# Follow the prompts to process all examples
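
Conceptually, dt.py pulls the Hugging Face dataset and saves it as the raw JSON file shown in the repository structure. A hedged sketch: the dataset is the viber1/indian-law-dataset credited in the Acknowledgments, but the column names ("Instruction"/"Response") here are assumptions that may differ from the actual schema.

```python
# Sketch of the dataset loader (column names are assumptions).
import json

def to_qa_records(rows) -> list[dict]:
    """Normalize dataset rows into the question/answer layout stored on disk."""
    return [{"question": r["Instruction"], "answer": r["Response"]} for r in rows]

if __name__ == "__main__":
    from datasets import load_dataset

    ds = load_dataset("viber1/indian-law-dataset", split="train")
    with open("data/raw/legal_data_all.json", "w", encoding="utf-8") as f:
        json.dump(to_qa_records(ds), f, ensure_ascii=False, indent=2)
```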

Step 4: Build Vector Database

# Navigate to models
cd ../models

# Run vector database setup
python vector_database.py

# This will:
# 1. Create ChromaDB instance
# 2. Generate embeddings for all Q&A pairs
# 3. Store in persistent database
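
The ingestion can be sketched as below. This is not the actual vector_database.py: the collection name and the raw-JSON path are assumptions based on the repository structure, and a real build would batch the `add` calls for speed.

```python
# Illustrative sketch of vector-database ingestion; heavy dependencies
# load only when the file is run directly.
import json

def build_entries(records: list, encode) -> tuple:
    """Turn Q&A records into (ids, embeddings, metadatas) for collection.add."""
    ids = [f"qa_{i}" for i in range(len(records))]
    embeddings = [encode(r["question"]) for r in records]
    metadatas = [{"question": r["question"], "answer": r["answer"]} for r in records]
    return ids, embeddings, metadatas

if __name__ == "__main__":
    import chromadb
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path="./data/vectordb")
    collection = client.get_or_create_collection(
        "legal_qa", metadata={"hnsw:space": "cosine"}
    )
    with open("../scrapers/data/raw/legal_data_all.json", encoding="utf-8") as f:
        records = json.load(f)
    ids, embeddings, metadatas = build_entries(
        records, lambda q: model.encode(q).tolist()
    )
    collection.add(ids=ids, embeddings=embeddings, metadatas=metadatas)
```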

Step 5: Install Ollama Model

# Pull the Qwen3 8B model
ollama pull qwen3:8b

# Verify the model is available
ollama list

Step 6: Start Backend Server

# In backend/models directory
python -m uvicorn main:app --reload --port 8000

# Or simply
python main.py

Step 7: Frontend Setup

# Navigate to frontend
cd ../../frontend

# Install dependencies
npm install

# Create .env file with:
# VITE_API_BASE_URL=http://localhost:8000

# Start development server
npm run dev

Step 8: Access Application

Open browser and navigate to:

  • Frontend: http://localhost:5173
  • Backend API Docs: http://localhost:8000/docs

Screenshots

(Screenshots: Landing Page, Chat Bot, About page)

Future Enhancements

  • User authentication and sessions
  • Chat history persistence
  • Export chat as PDF
  • Multi-language support
  • Voice input/output

Contributors

  • Radhakrishna Bharuka
  • Nilesh Dwivedi
  • Hari Krishna Sharma

License

This project is licensed under the MIT License.


Acknowledgments

  • Hugging Face - For the viber1/indian-law-dataset

Contact

For questions or support, please reach out to the contributors listed above.

