🤖 AI-powered code Q&A platform | Ask questions about any GitHub repo using semantic search & vector embeddings | Built with React, Node.js, PostgreSQL (pgvector) & Google Gemini API

Shubz224/RepoChat

💬 CodeChat - AI-Powered Code Repository Q&A

Ask natural language questions about any GitHub repository and get intelligent, context-aware answers

🌟 Overview

CodeChat is an intelligent code repository analysis system that enables developers to interact with their GitHub repositories using natural language. Instead of manually searching through files and documentation, simply ask questions and get AI-powered answers with precise file references and code citations.

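CodeChat fetches the code files in a repository, generates a summary of each file with Gemini, and stores a 768-dimensional embedding per file in PostgreSQL using pgvector. When you ask a question, it embeds the question, retrieves the most similar files, and passes them to Gemini so the answer is grounded in actual source code.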

Why CodeChat?

  • 🔍 Semantic Code Search - Find relevant code using natural language, not just keywords
  • 🤖 AI-Powered Answers - Get intelligent responses with file citations and code snippets
  • ⚡ Fast & Efficient - Indexed vector embeddings keep similarity search fast
  • 📚 Complete Context - AI understands relationships between files and functions
  • 🔒 Secure & Private - Your code stays secure with token-based authentication

✨ Key Features

🔐 Authentication & User Management

  • Secure user registration and login with JWT tokens
  • Password hashing with bcrypt
  • Session management and protected routes
  • User-specific project isolation

📦 GitHub Repository Integration

  • One-Click Indexing - Simply paste your GitHub repository URL
  • Automatic File Discovery - Intelligently fetches and filters code files
  • Language Detection - Supports multiple programming languages
  • Public & Private Repos - Works with both public and private repositories (with token)
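The file discovery and language detection steps can be sketched as a filter over repository paths. This is a minimal illustration, assuming files are selected by extension and that dependency and build directories are skipped. The extension map, the skipped directories, and the names `detectLanguage` and `filterCodeFiles` are hypothetical and not taken from CodeChat's source.

```javascript
// Hypothetical extension-to-language map; the project's real list may differ.
const LANGUAGE_BY_EXTENSION = {
  ".js": "javascript",
  ".jsx": "javascript",
  ".ts": "typescript",
  ".py": "python",
  ".java": "java",
};

// Directories assumed not to contain code worth indexing.
const SKIPPED_DIRECTORIES = ["node_modules", "dist", "build", ".git"];

// Returns the language for a path, or null when the extension is unknown.
function detectLanguage(filePath) {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return null;
  const extension = filePath.slice(dot).toLowerCase();
  return LANGUAGE_BY_EXTENSION[extension] || null;
}

// Keeps only paths outside skipped directories that have a known language.
function filterCodeFiles(paths) {
  return paths
    .filter((p) => !p.split("/").some((part) => SKIPPED_DIRECTORIES.includes(part)))
    .map((p) => ({ path: p, language: detectLanguage(p) }))
    .filter((file) => file.language !== null);
}
```

For example, `filterCodeFiles(["src/app.js", "node_modules/x/index.js", "README.md"])` keeps only `src/app.js`.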

🧠 Intelligent Code Analysis

  • AI Summarization - Each file gets an AI-generated summary using Gemini 2.5 Flash
  • Vector Embeddings - Code converted to 768-dimensional vectors for semantic search
  • pgvector Integration - PostgreSQL extension for efficient similarity search
  • Contextual Understanding - AI comprehends code structure and relationships

💬 Natural Language Q&A

  • Ask Anything - Query your codebase in plain English
  • Smart Answers - AI generates context-aware responses with explanations
  • File Citations - Every answer includes relevant file references
  • Similarity Scores - See how relevant each file is to your question
  • Code Snippets - View actual code excerpts that answer your question

📊 Project Management

  • Multiple Projects - Index and manage multiple repositories
  • Status Tracking - Real-time indexing progress monitoring
  • Project Dashboard - View file counts, languages, and indexing status
  • Easy Deletion - Remove projects and associated data with one click

📜 Question History

  • Conversation Memory - Access all past questions and answers
  • Export Capabilities - Save important Q&A sessions
  • Search History - Find previous questions quickly
  • Delete Questions - Remove unwanted history items

🎬 Demo

Screenshots: Chat Box, Dashboard, Repository Indexing, Indexing Process


🛠️ Tech Stack

Backend

| Technology | Purpose |
|---|---|
| Node.js | JavaScript runtime environment |
| Express.js | Web application framework |
| PostgreSQL | Primary database with advanced features |
| pgvector | Vector similarity search extension |
| Supabase | Backend-as-a-Service for database and auth |
| Google Gemini API | AI for summarization, embeddings, and Q&A |
| JWT | Secure token-based authentication |
| Bcrypt | Password hashing and security |
| Axios | HTTP client for API requests |
| Express Rate Limit | API rate limiting and abuse prevention |

Frontend

| Technology | Purpose |
|---|---|
| React.js | UI library for building interactive interfaces |
| React Router DOM | Client-side routing and navigation |
| Tailwind CSS | Utility-first CSS framework |
| Context API | Global state management |
| Axios | HTTP client for backend communication |

AI & Machine Learning

  • Google Gemini 2.5 Flash - Code understanding and question answering
  • text-embedding-004 - High-quality text embeddings (768 dimensions)
  • Vector Similarity Search - Cosine similarity for semantic matching
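To make the cosine similarity measure concrete, here is a small standalone sketch. In CodeChat the comparison is performed inside PostgreSQL by pgvector; the function `cosineSimilarity` below is only an illustration of the formula and is not part of the project.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Ranges from -1 (opposite direction) to 1 (same direction).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because only direction matters, two embeddings of different magnitude that point the same way still score 1.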

🏗️ Architecture

Data Flow

  1. User Authentication - JWT tokens secure all API requests
  2. Repository Indexing:
    • Fetch repository structure from GitHub API
    • Filter relevant code files (.js, .py, .java, etc.)
    • Generate AI summaries for each file using Gemini
    • Create vector embeddings using text-embedding-004
    • Store in PostgreSQL with pgvector
  3. Question Answering:
    • User asks question in natural language
    • Question converted to vector embedding
    • pgvector performs similarity search to find relevant files
    • Top matching files sent to Gemini with question
    • AI generates contextual answer with file citations
  4. History Management - All Q&As stored for future reference
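The question-answering step can be sketched as "rank, then prompt". The example below assumes each retrieved file already carries a `similarity` score, and it shows one possible way to assemble the context sent to the model. The names `selectTopFiles` and `buildPrompt`, the default of five files, and the prompt wording are all hypothetical.

```javascript
// Returns the k files with the highest similarity, without mutating the input.
function selectTopFiles(files, k = 5) {
  return [...files].sort((a, b) => b.similarity - a.similarity).slice(0, k);
}

// Assembles a prompt from the question and the retrieved files.
// The wording is illustrative; the project's real prompt is not shown in this README.
function buildPrompt(question, files) {
  const context = files
    .map((f) => `File: ${f.filePath}\nSummary: ${f.summary}\n${f.sourceCode}`)
    .join("\n---\n");
  return [
    "Answer the question using only the files below and cite the file paths you rely on.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Keeping the file path next to each snippet in the prompt is what lets the model return file citations in its answer.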

📋 Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Node.js 16.x or higher
  • PostgreSQL 14.x or higher with pgvector extension
  • Git for version control
  • npm or yarn package manager

Required Accounts & API Keys

  1. Supabase Account (free sign-up)
  2. Google Gemini API Key
  3. GitHub Personal Access Token (optional, for private repos)

🚀 Installation

1. Clone the Repository

git clone https://github.com/yourusername/codechat.git
cd codechat

2. Backend Setup

cd backend
npm install

If any of these packages are missing from package.json, install them explicitly:

npm install express pg @supabase/supabase-js @google/generative-ai bcryptjs jsonwebtoken axios dotenv express-rate-limit cors

3. Frontend Setup

cd ../frontend
npm install

If any of these packages are missing from package.json, install them explicitly:

npm install react react-dom react-router-dom axios tailwindcss

⚙️ Configuration

Backend Environment Variables

Create a .env file in the backend directory:

# Server Configuration
PORT=5000
NODE_ENV=development

# Supabase Configuration
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-anon-key
SUPABASE_SERVICE_KEY=your-supabase-service-role-key

# Database Configuration (from Supabase)
DATABASE_URL=postgresql://postgres:[password]@db.[project-ref].supabase.co:5432/postgres

# JWT Configuration
JWT_SECRET=your-super-secret-jwt-key-min-32-characters
JWT_EXPIRE=7d

# Google Gemini API
GEMINI_API_KEY=your-gemini-api-key-here

# GitHub Configuration (Optional - for private repos)
GITHUB_TOKEN=ghp_your-personal-access-token

# Rate Limiting (window in milliseconds; 900000 ms = 15 minutes)
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000
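A small validation step at startup can catch a missing or malformed variable before the server begins handling requests. The sketch below is an assumption about how this could be done, not the project's actual `config` code: the function name `loadConfig`, the choice of which variables are required, and the defaults are illustrative. The 32-character minimum follows the hint in the `JWT_SECRET` placeholder above.

```javascript
// Variables assumed to be mandatory for the backend to work at all.
const REQUIRED_VARS = ["DATABASE_URL", "JWT_SECRET", "GEMINI_API_KEY"];

// Reads settings from an env-like object (defaults to process.env)
// and fails fast with a descriptive error when something is missing.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  if (env.JWT_SECRET.length < 32) {
    throw new Error("JWT_SECRET must be at least 32 characters");
  }
  return {
    port: Number(env.PORT || 5000),
    databaseUrl: env.DATABASE_URL,
    jwtSecret: env.JWT_SECRET,
    jwtExpire: env.JWT_EXPIRE || "7d",
    geminiApiKey: env.GEMINI_API_KEY,
    githubToken: env.GITHUB_TOKEN || null, // optional, private repos only
    rateLimit: {
      windowMs: Number(env.RATE_LIMIT_WINDOW_MS || 900000),
      max: Number(env.RATE_LIMIT_MAX_REQUESTS || 100),
    },
    frontendUrl: env.FRONTEND_URL || "http://localhost:3000",
  };
}
```

Passing the environment in as a parameter keeps the function easy to test without touching `process.env`.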

Frontend Environment Variables

Create a .env file in the frontend directory:

# API Configuration
REACT_APP_API_URL=http://localhost:5000/api

# App Configuration
REACT_APP_NAME=CodeChat
REACT_APP_VERSION=1.0.0

🗄️ Database Setup

Option 1: Supabase (Recommended)

  1. Create a Supabase Project

  2. Enable pgvector Extension

In the Supabase SQL Editor, run:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

  3. Create Database Schema

-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Projects table
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    repo_owner VARCHAR(255) NOT NULL,
    repo_name VARCHAR(255) NOT NULL,
    github_url TEXT NOT NULL,
    status VARCHAR(50) DEFAULT 'pending',
    file_count INTEGER DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Source code embeddings table
CREATE TABLE source_code_embeddings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    file_path TEXT NOT NULL,
    source_code TEXT NOT NULL,
    summary TEXT,
    embedding vector(768),
    language VARCHAR(50),
    file_size INTEGER,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Questions table
CREATE TABLE questions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    file_references JSONB,
    query_embedding vector(768),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create indexes for better performance
CREATE INDEX idx_projects_user_id ON projects(user_id);
CREATE INDEX idx_embeddings_project_id ON source_code_embeddings(project_id);
CREATE INDEX idx_questions_project_id ON questions(project_id);
CREATE INDEX idx_questions_user_id ON questions(user_id);

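-- Create vector similarity search indexes
-- (ivfflat clusters existing rows, so recall is better when these indexes
-- are created or rebuilt after data has been loaded)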
CREATE INDEX idx_embeddings_vector ON source_code_embeddings 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

CREATE INDEX idx_questions_vector ON questions 
USING ivfflat (query_embedding vector_cosine_ops)
WITH (lists = 100);

  4. Set Row Level Security (Optional)

-- Enable RLS
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE source_code_embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE questions ENABLE ROW LEVEL SECURITY;

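-- Create policies (examples)
-- A table with RLS enabled and no policy denies all access to roles that do
-- not bypass RLS, so source_code_embeddings and questions need policies too.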
CREATE POLICY "Users can view own data" ON users
    FOR SELECT USING (auth.uid() = id);

CREATE POLICY "Users can view own projects" ON projects
    FOR SELECT USING (auth.uid() = user_id);
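With this schema and the cosine index in place, a similarity lookup can be expressed with pgvector's `<=>` cosine distance operator. The sketch below only builds the parameterized query; the `{ text, values }` shape is what node-postgres accepts in `query()`. The function names `toVectorLiteral` and `buildSimilarityQuery` and the default limit of 5 are assumptions, and the project's own `vectorSearch.js` may differ.

```javascript
// Serializes a numeric array into pgvector's text input format, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding) {
  return `[${embedding.join(",")}]`;
}

// Builds a parameterized query returning the files closest to the query embedding.
// "<=>" is cosine distance in pgvector, so 1 - distance is cosine similarity.
function buildSimilarityQuery(projectId, queryEmbedding, limit = 5) {
  return {
    text: `SELECT file_path, summary, source_code,
       1 - (embedding <=> $1::vector) AS similarity
FROM source_code_embeddings
WHERE project_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3`,
    values: [toVectorLiteral(queryEmbedding), projectId, limit],
  };
}
```

Ordering by the raw distance expression (rather than by the computed `similarity` column) is what allows PostgreSQL to use the `vector_cosine_ops` index.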

Option 2: Local PostgreSQL

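  1. Install PostgreSQL and pgvector

# macOS
brew install postgresql pgvector

# Ubuntu/Debian
sudo apt-get install postgresql postgresql-contrib
# pgvector is packaged per PostgreSQL major version in the PostgreSQL APT
# repository, for example (adjust 16 to your installed version):
sudo apt-get install postgresql-16-pgvector

  2. Enable pgvector

CREATE EXTENSION IF NOT EXISTS vector;

  3. Run the same schema SQL from above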

📖 Usage Guide

Starting the Application

1. Start Backend Server

cd backend
npm run dev

Server will start at http://localhost:5000

2. Start Frontend Development Server

cd frontend
npm start

Application will open at http://localhost:3000

Using CodeChat

Step 1: Register/Login

  1. Navigate to the registration page
  2. Create an account with email and password
  3. Log in with your credentials

Step 2: Create a Project

  1. Click "New Project" on the dashboard
  2. Enter a project name
  3. Paste the GitHub repository URL (e.g., https://github.com/facebook/react)
  4. Click "Create Project"

Step 3: Index the Repository

  1. Click "Start Indexing" on your project
  2. Wait for the indexing process to complete
    • Files are fetched from GitHub
    • AI generates summaries
    • Vector embeddings are created
  3. Monitor the status indicator

Step 4: Ask Questions

  1. Open the indexed project
  2. Type your question in natural language:
    • "How does authentication work in this project?"
    • "Where is the database connection established?"
    • "Explain the routing logic"
    • "What libraries are used for state management?"
  3. Review the AI-generated answer with file citations
  4. Explore the referenced files and code snippets

Step 5: View History

  1. Navigate to "Question History"
  2. Browse all past questions and answers
  3. Delete questions you no longer need

Example Questions

✅ "How is user authentication implemented?"
✅ "What API endpoints are available?"
✅ "Where is the database schema defined?"
✅ "Explain how the vector search works"
✅ "What dependencies does this project use?"
✅ "How are errors handled in the API?"
✅ "Where is the configuration loaded?"

📚 API Documentation

Base URL

http://localhost:5000/api

Authentication Endpoints

Register User

POST /api/auth/register
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "SecurePassword123"
}

Response:

{
  "success": true,
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "uuid",
    "email": "user@example.com"
  }
}

Login User

POST /api/auth/login
Content-Type: application/json

{
  "email": "user@example.com",
  "password": "SecurePassword123"
}

Project Endpoints

List Projects

GET /api/projects
Authorization: Bearer <token>

Response:

{
  "success": true,
  "projects": [
    {
      "id": "uuid",
      "name": "React Project",
      "repo_owner": "facebook",
      "repo_name": "react",
      "github_url": "https://github.com/facebook/react",
      "status": "completed",
      "file_count": 245,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}

Create Project

POST /api/projects
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "My Project",
  "githubUrl": "https://github.com/username/repo"
}
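The project record stores `repo_owner` and `repo_name` separately, so the submitted `githubUrl` has to be split at some point. A minimal sketch of that parsing, assuming only `https://github.com/<owner>/<repo>` style URLs are accepted; the function name `parseGitHubUrl` and its error messages are hypothetical.

```javascript
// Extracts the owner and repository name from a GitHub repository URL.
// new URL() throws a TypeError for strings that are not valid URLs.
function parseGitHubUrl(githubUrl) {
  const url = new URL(githubUrl);
  if (url.hostname !== "github.com") {
    throw new Error("Expected a github.com URL");
  }
  const [owner, repo] = url.pathname.split("/").filter(Boolean);
  if (!owner || !repo) {
    throw new Error("URL must look like https://github.com/<owner>/<repo>");
  }
  // Clone URLs often end in ".git"; strip it so both forms map to the same repo.
  return { repoOwner: owner, repoName: repo.replace(/\.git$/, "") };
}
```

For `https://github.com/facebook/react` this yields `repoOwner` "facebook" and `repoName` "react", matching the fields in the List Projects response.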

Get Project Details

GET /api/projects/:id
Authorization: Bearer <token>

Start Indexing

POST /api/projects/:id/process
Authorization: Bearer <token>

Check Indexing Status

GET /api/projects/:id/status
Authorization: Bearer <token>

Response:

{
  "status": "processing",
  "progress": {
    "processed": 120,
    "total": 245,
    "percentage": 48.98
  }
}
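The `percentage` field in this response is the processed share of files rounded to two decimals. The sketch below reproduces that arithmetic and adds a helper a client might use to decide whether to keep polling. Both function names are hypothetical; only the status values "pending", "processing", and "completed" appear in this README, so any other status is treated as final here.

```javascript
// Computes the progress object from raw counts, rounding to two decimals.
function computeProgress(processed, total) {
  const percentage = total === 0 ? 0 : Math.round((processed / total) * 10000) / 100;
  return { processed, total, percentage };
}

// A client keeps polling while indexing has not reached a final state.
function shouldKeepPolling(status) {
  return status === "pending" || status === "processing";
}
```

With the numbers above, `computeProgress(120, 245)` gives a `percentage` of 48.98.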

Delete Project

DELETE /api/projects/:id
Authorization: Bearer <token>

Question Endpoints

Ask Question

POST /api/questions/:projectId/ask
Authorization: Bearer <token>
Content-Type: application/json

{
  "question": "How does authentication work?"
}

Response:

{
  "success": true,
  "answer": "Authentication is implemented using JWT tokens...",
  "fileReferences": [
    {
      "filePath": "src/auth/middleware.js",
      "summary": "JWT authentication middleware",
      "similarity": 0.87,
      "codeSnippet": "const verifyToken = (req, res, next) => {...}"
    }
  ]
}
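A client call to this endpoint might be put together as follows. The sketch builds the request separately from sending it, so it can be used with `fetch` or adapted to Axios. The helper names `buildAskRequest` and `formatReferences` are hypothetical and not part of the project's `api.js`.

```javascript
// Builds the URL and fetch-style options for the Ask Question endpoint.
// baseUrl is the API base, e.g. "http://localhost:5000/api".
function buildAskRequest(baseUrl, token, projectId, question) {
  return {
    url: `${baseUrl}/questions/${encodeURIComponent(projectId)}/ask`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ question }),
    },
  };
}

// Turns the fileReferences array of a response into short display strings.
function formatReferences(fileReferences) {
  return fileReferences.map(
    (ref) => `${ref.filePath} (${Math.round(ref.similarity * 100)}% similar)`
  );
}
```

In an environment with a global `fetch` (browsers, Node.js 18+), the request can be sent with `const { url, options } = buildAskRequest(...); const res = await fetch(url, options);`.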

Get Question History

GET /api/questions/:projectId/history
Authorization: Bearer <token>

Delete Question

DELETE /api/questions/:questionId
Authorization: Bearer <token>

Rate Limits

  • Authentication endpoints: 5 requests per 15 minutes
  • Project endpoints: 50 requests per 15 minutes
  • Question endpoints: 20 requests per 15 minutes
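To illustrate what "N requests per 15 minutes" means in practice, here is a minimal in-memory fixed-window limiter. It is a teaching sketch only: the project uses the express-rate-limit package, whose internals and options are not reproduced here, and the names `createRateLimiter` and `RATE_LIMITS` are hypothetical.

```javascript
const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;

// The per-endpoint limits listed above, expressed as data.
const RATE_LIMITS = {
  auth: { windowMs: FIFTEEN_MINUTES_MS, max: 5 },
  projects: { windowMs: FIFTEEN_MINUTES_MS, max: 50 },
  questions: { windowMs: FIFTEEN_MINUTES_MS, max: 20 },
};

// Returns a function that reports whether a request from `key`
// (for example a client IP or user id) is allowed at time `now` (ms).
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }
  return function isAllowed(key, now = Date.now()) {
    const entry = hits.get(key);
    // Start a fresh window on the first request or once the old window has passed.
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}
```

A limiter created with `createRateLimiter(RATE_LIMITS.auth)` would allow five login attempts per key and reject the sixth until the window resets.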

📁 Project Structure

codechat/
├── backend/
│   ├── config/
│   │   ├── database.js          # Database connection
│   │   └── gemini.js            # Gemini AI configuration
│   ├── middleware/
│   │   ├── auth.js              # JWT authentication
│   │   └── rateLimiter.js       # Rate limiting
│   ├── routes/
│   │   ├── auth.js              # Authentication routes
│   │   ├── projects.js          # Project management
│   │   └── questions.js         # Q&A routes
│   ├── services/
│   │   ├── githubService.js     # GitHub API integration
│   │   ├── embeddingService.js  # Vector embedding generation
│   │   ├── indexingService.js   # Repository indexing
│   │   └── qaService.js         # Question answering
│   ├── utils/
│   │   ├── vectorSearch.js      # pgvector similarity search
│   │   └── fileParser.js        # Code file parsing
│   ├── .env                     # Environment variables
│   ├── server.js                # Express server entry
│   └── package.json
│
├── frontend/
│   ├── public/
│   │   └── index.html
│   ├── src/
│   │   ├── components/
│   │   │   ├── Auth/
│   │   │   │   ├── Login.jsx
│   │   │   │   └── Register.jsx
│   │   │   ├── Dashboard/
│   │   │   │   ├── ProjectList.jsx
│   │   │   │   └── ProjectCard.jsx
│   │   │   ├── Project/
│   │   │   │   ├── ProjectDetails.jsx
│   │   │   │   ├── IndexingStatus.jsx
│   │   │   │   └── QAInterface.jsx
│   │   │   └── History/
│   │   │       └── QuestionHistory.jsx
│   │   ├── context/
│   │   │   └── AuthContext.jsx   # Authentication context
│   │   ├── services/
│   │   │   └── api.js            # Axios API client
│   │   ├── App.jsx               # Main app component
│   │   ├── index.js              # Entry point
│   │   └── index.css             # Tailwind styles
│   ├── .env
│   ├── tailwind.config.js
│   └── package.json
│
└── README.md

🐛 Troubleshooting

Common Issues

1. Database Connection Errors

Error: Connection refused to PostgreSQL

Solution:

  • Verify DATABASE_URL in .env
  • Check Supabase project is active
  • Ensure network connectivity

2. pgvector Extension Not Found

Error: extension "vector" does not exist

Solution:

-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;

3. Gemini API Rate Limits

Error: 429 Too Many Requests

Solution:

  • Reduce concurrent indexing operations
  • Implement request queuing
  • Consider upgrading Gemini API tier
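One common way to handle 429 responses is to retry with exponential backoff. The sketch below assumes the thrown error exposes the HTTP status as `err.status`, which is an assumption about the client library's error shape; the names `backoffDelay` and `withRetry` and the default delays are also hypothetical.

```javascript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Default sleep; can be replaced in tests to avoid real waiting.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn() and retries only on rate-limit errors, up to `retries` extra attempts.
async function withRetry(fn, { retries = 4, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Assumption: the error carries the HTTP status code in `status`.
      const isRateLimit = Boolean(err) && err.status === 429;
      if (!isRateLimit || attempt >= retries) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

Wrapping each summarization or embedding call in `withRetry` spreads requests out when the API starts rejecting them, instead of failing the whole indexing run.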

4. Large Repositories Timeout

Error: Request timeout during indexing

Solution:

  • Process files in smaller batches
  • Increase timeout limits in axios config
  • Filter out non-essential files
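Processing files in smaller batches can be sketched as splitting the file list into fixed-size chunks and handling one chunk at a time. The names `chunk` and `processInBatches` are hypothetical, and a suitable batch size depends on API limits that this README does not specify.

```javascript
// Splits an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `worker` on every item, with at most `size` calls in flight at once.
// Batches run one after another; items inside a batch run concurrently.
async function processInBatches(items, size, worker) {
  const results = [];
  for (const batch of chunk(items, size)) {
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Smaller batches lower the chance of a single request timing out and also reduce pressure on the Gemini rate limits described in the previous item.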

5. CORS Errors

Error: Access-Control-Allow-Origin blocked

Solution:

  • Verify FRONTEND_URL in backend .env
  • Check CORS middleware configuration
  • Ensure correct API URL in frontend .env

6. JWT Token Expired

Error: Token expired or invalid

Solution:

  • The user needs to log in again
  • Implement token refresh mechanism
  • Check JWT_EXPIRE setting
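When debugging expiry problems it helps to inspect the token's `exp` claim directly. The sketch below decodes the payload without verifying the signature, so it is suitable for diagnostics only and must never replace server-side verification. It runs in Node.js (it relies on `Buffer` with the `base64url` encoding), and the function names are hypothetical.

```javascript
// Decodes the payload (middle segment) of a JWT. Does NOT verify the signature.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Malformed JWT: expected three dot-separated segments");
  }
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// A token is expired once the current time reaches its exp claim (seconds since epoch).
// Tokens without a numeric exp claim are reported as not expired.
function isTokenExpired(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const { exp } = decodeJwtPayload(token);
  return typeof exp === "number" && exp <= nowSeconds;
}
```

With `JWT_EXPIRE=7d`, the `exp` value of a fresh token should be roughly 604800 seconds after its issue time.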

Debug Mode

Enable detailed logging:

# Add to backend .env
DEBUG=true
LOG_LEVEL=verbose

🗺️ Roadmap

Version 1.1 (Q2 2024)

  • Support for more programming languages
  • Batch question asking (multiple questions at once)
  • Export Q&A sessions to PDF/Markdown
  • Code snippet highlighting in answers
  • Project sharing with team members

Version 1.2 (Q3 2024)

  • Real-time collaboration on projects
  • Integration with GitLab and Bitbucket
  • Custom AI model fine-tuning
  • Advanced filtering and search options
  • Mobile application (React Native)

Version 2.0 (Q4 2024)

  • Code generation based on Q&A context
  • Automatic documentation generation
  • Integration with IDE plugins (VS Code, IntelliJ)
  • Multi-repository project support
  • Advanced analytics and insights dashboard

Community Requests

  • Voice-to-text question input
  • Diagram generation from code explanations
  • Integration with Slack/Discord bots
  • Webhook support for CI/CD pipelines

🤝 Contributing

We welcome contributions from the community! CodeChat is open-source and thrives on collaboration.

How to Contribute

  1. Fork the Repository

    git clone https://github.com/yourusername/codechat.git
  2. Create a Feature Branch

    git checkout -b feature/amazing-feature
  3. Make Your Changes

    • Write clean, documented code
    • Follow existing code style
    • Add tests if applicable
  4. Commit Your Changes

    git commit -m "Add amazing feature"
  5. Push to Your Fork

    git push origin feature/amazing-feature
  6. Open a Pull Request

    • Describe your changes clearly
    • Reference any related issues
    • Wait for review and feedback

Contribution Guidelines

  • Code Style: Follow ESLint and Prettier configurations
  • Commits: Use conventional commit messages
  • Testing: Add tests for new features
  • Documentation: Update README and inline comments
  • Issues: Check existing issues before creating new ones

Development Setup

# Install development dependencies
npm install --include=dev

# Run tests
npm test

# Run linting
npm run lint

# Format code
npm run format

Areas for Contribution

  • 🐛 Bug fixes and error handling
  • ✨ New features and enhancements
  • 📝 Documentation improvements
  • 🎨 UI/UX enhancements
  • 🧪 Test coverage expansion
  • 🌐 Internationalization (i18n)

📄 License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2024 CodeChat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

See LICENSE file for details.


🙏 Acknowledgments

CodeChat is built on the shoulders of giants. We'd like to thank:

Technologies

  • Google Gemini - Powerful AI models for code understanding
  • Supabase - Backend infrastructure and database
  • pgvector - Vector similarity search for PostgreSQL
  • React - UI library for building the frontend
  • Tailwind CSS - Utility-first CSS framework

Inspiration

  • GitHub Copilot - AI-powered code assistance
  • Phind - AI search for developers
  • Sourcegraph - Code intelligence platform

Contributors

Thank you to all our contributors who help make CodeChat better!



⭐ Star this repository if you find it helpful!

Built with ❤️ by developers, for developers

