A high-performance REST API for the Bhagavad Gita, built with Go and PostgreSQL. Features include full-text search, semantic search using vector embeddings (pgvector), and a React frontend.

devangb3/Gitartha-Engine

Gitartha Engine

Public REST API for serving Bhagavad Gita chapters and verses with English/Hindi translations and semantic search capabilities.

Architecture

  • Go Monolith: Handles all API logic and business operations
  • PostgreSQL + pgvector: Stores verses and vector embeddings for semantic search
  • Python ML Service: Minimal service for real-time embedding generation
  • Semantic Search: AI-powered verse search using vector similarity
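
To make "vector similarity" concrete, here is a minimal Go sketch of cosine similarity, one common way to compare two embeddings. It is an illustration only: the README does not say which distance metric the project uses, and the three-element vectors are toy values rather than real embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ, the input is empty,
// or either vector has zero magnitude.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy vectors standing in for a query embedding and a verse embedding.
	query := []float64{0.2, 0.8, 0.1}
	verse := []float64{0.25, 0.7, 0.05}
	fmt.Printf("similarity: %.3f\n", cosineSimilarity(query, verse))
}
```

Values close to 1 mean the two texts point in nearly the same direction in embedding space. In this architecture the comparison happens inside PostgreSQL rather than in application code.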

1. Prerequisites

  • Go 1.22+
  • PostgreSQL 14+ with pgvector extension
  • Python 3.8+ (for ML service)
  • golang-migrate CLI (for database migrations)

2. Repository Setup

git clone git@github.com:devangb3/Gitartha-Engine.git
cd Gitartha-Engine
go mod tidy

3. Environment Configuration

Create a .env file in the project root:

cat <<'ENV' > .env
DATABASE_URL=postgres://<user>:<password>@localhost:5432/gitartha?sslmode=disable
PORT=8186
ENV=development
LOG_LEVEL=info
ML_SERVICE_URL=http://localhost:5001
ENV

  • The database name (gitartha in the example) is defined inside the DATABASE_URL.
  • Ensure the referenced database already exists in PostgreSQL (createdb gitartha).
  • ML_SERVICE_URL points to the Python ML service for embedding generation.
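
As an illustration of how these variables might be consumed, the sketch below reads them with the standard library only. The project layout lists Viper for configuration loading, so treat this as a simplified stand-in: the `Config` struct, the `loadConfig` function, and the fallback values (copied from the example file above) are assumptions, not the project's code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the variables in the example .env file.
// The struct and its field names are hypothetical.
type Config struct {
	DatabaseURL  string
	Port         string
	Env          string
	LogLevel     string
	MLServiceURL string
}

// loadConfig reads settings through getenv. DATABASE_URL is required;
// the other values fall back to those shown in the example .env.
func loadConfig(getenv func(string) string) (Config, error) {
	orDefault := func(key, fallback string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DatabaseURL:  getenv("DATABASE_URL"),
		Port:         orDefault("PORT", "8186"),
		Env:          orDefault("ENV", "development"),
		LogLevel:     orDefault("LOG_LEVEL", "info"),
		MLServiceURL: orDefault("ML_SERVICE_URL", "http://localhost:5001"),
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	// os.Getenv does not read .env itself; the variables must already
	// be present in the process environment.
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("port %s, ML service at %s\n", cfg.Port, cfg.MLServiceURL)
}
```

Passing `getenv` as a parameter keeps the function easy to test without touching the real environment.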

4. Database Setup

Install pgvector Extension

First, install the pgvector extension in PostgreSQL:

# Ubuntu/Debian (replace 14 with your PostgreSQL major version)
sudo apt install postgresql-14-pgvector

# Or compile from source: https://github.com/pgvector/pgvector

Run Migrations

Apply the database schema including vector embeddings:

make migrate-up

This creates the verse_embeddings table with pgvector support. Use make migrate-down to roll back.

5. Data Ingestion

Load Verses

Run the Go ingestion CLI to load verses:

go run ./cmd/ingest --csv bg.csv

This reads bg.csv, upserts chapters/verses, and updates verse_count totals.
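
The general shape of such an ingestion step can be sketched as follows. This is not the code in cmd/ingest: the three-column layout (chapter, verse, text), the `Verse` type, and both function names are hypothetical, and the real bg.csv may have different columns. The sketch only parses rows and derives per-chapter counts; the database upsert is left out.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Verse holds one parsed row. The column layout is an assumption.
type Verse struct {
	Chapter int
	Number  int
	Text    string
}

// parseVerses parses CSV text with a header row followed by
// chapter,verse,text records.
func parseVerses(csvText string) ([]Verse, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	r.FieldsPerRecord = 3 // reject rows with a different column count
	rows, err := r.ReadAll()
	if err != nil {
		return nil, fmt.Errorf("read csv: %w", err)
	}
	if len(rows) < 2 {
		return nil, nil // empty input or header only
	}
	verses := make([]Verse, 0, len(rows)-1)
	for i, row := range rows[1:] { // skip the header row
		chapter, err := strconv.Atoi(strings.TrimSpace(row[0]))
		if err != nil {
			return nil, fmt.Errorf("record %d: bad chapter %q", i+2, row[0])
		}
		number, err := strconv.Atoi(strings.TrimSpace(row[1]))
		if err != nil {
			return nil, fmt.Errorf("record %d: bad verse %q", i+2, row[1])
		}
		verses = append(verses, Verse{Chapter: chapter, Number: number, Text: row[2]})
	}
	return verses, nil
}

// countByChapter derives the per-chapter totals that a verse_count
// column would store.
func countByChapter(verses []Verse) map[int]int {
	counts := make(map[int]int)
	for _, v := range verses {
		counts[v.Chapter]++
	}
	return counts
}

func main() {
	data, err := os.ReadFile("bg.csv")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	verses, err := parseVerses(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for chapter, n := range countByChapter(verses) {
		fmt.Printf("chapter %d: %d verses\n", chapter, n)
	}
}
```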

Generate Vector Embeddings

Generate embeddings for semantic search:

cd scripts
python generate_embeddings_pgvector.py

This creates vector embeddings for all verses using the all-MiniLM-L6-v2 model and stores them in PostgreSQL.

6. Running the Services

Start Python ML Service

The ML service provides embedding generation for semantic search:

cd internal/ml-service
python3 -m venv venv        # first run only, creates the virtual environment
source venv/bin/activate
pip install -r requirements.txt
python app_pgvector.py

The service runs on http://localhost:5001 and provides:

  • POST /embed - Generate embeddings for text queries
  • GET /health - Health check
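
A Go caller for the /embed endpoint might look like the sketch below. The README does not document the request or response bodies, so the JSON shapes used here (`{"text": ...}` in, `{"embedding": [...]}` out) and all type and function names are assumptions; check app_pgvector.py for the actual contract.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Hypothetical payload shapes; the real service may differ.
type embedRequest struct {
	Text string `json:"text"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// encodeEmbedRequest builds the JSON body for POST /embed.
func encodeEmbedRequest(text string) ([]byte, error) {
	return json.Marshal(embedRequest{Text: text})
}

// decodeEmbedResponse extracts the vector from a response body.
func decodeEmbedResponse(body []byte) ([]float32, error) {
	var resp embedResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if len(resp.Embedding) == 0 {
		return nil, errors.New("response contained no embedding")
	}
	return resp.Embedding, nil
}

// fetchEmbedding posts text to the ML service and returns its vector.
func fetchEmbedding(client *http.Client, baseURL, text string) ([]float32, error) {
	payload, err := encodeEmbedRequest(text)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+"/embed", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("call ml service: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ml service returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("read response: %w", err)
	}
	return decodeEmbedResponse(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	vec, err := fetchEmbedding(client, "http://localhost:5001", "how to act without attachment")
	if err != nil {
		fmt.Println("embedding failed:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```

If the service uses the all-MiniLM-L6-v2 model named earlier, the returned vector should have 384 dimensions. The explicit client timeout keeps a stalled ML service from blocking the caller indefinitely.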

Start Go API Server

make run

Example output:

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
[GIN-debug] GET    /healthz                  --> ... (*handler*).health
[GIN-debug] GET    /api/v1/chapters          --> ...
[GIN-debug] GET    /api/v1/semantic-search   --> ... (*handler*).semanticSearch
...

Visit http://localhost:8186/healthz to confirm the service is healthy.

7. API Overview

Core Endpoints

  • GET /api/v1/chapters — List all chapters.
  • GET /api/v1/chapters/{chapter} — Chapter metadata + verses.
  • GET /api/v1/chapters/{chapter}/verses/{verse} — Specific verse with translations.
  • GET /api/v1/search?query=term&lang=en|hi — Keyword search (English/Hindi).
  • GET /api/v1/random — Random verse.

Semantic Search

  • GET /api/v1/semantic-search?query=text&limit=5 — AI-powered semantic search using vector similarity.

Interactive API Documentation

The API includes interactive Swagger/OpenAPI documentation:

  • Swagger UI: http://localhost:8186/swagger/index.html
  • OpenAPI Spec: http://localhost:8186/swagger/doc.json (JSON format)
  • OpenAPI YAML: http://localhost:8186/swagger/swagger.yaml (YAML format)

Use tools like curl, Postman, or httpie to exercise the endpoints:

curl http://localhost:8186/api/v1/chapters/1/verses/1
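
The same kind of request can be made from Go. The sketch below calls the semantic search endpoint and prints the raw response; it builds the URL with net/url so that spaces and special characters in the query are escaped correctly. The `semanticSearchURL` helper is a hypothetical name, and the body is printed as-is because the response schema is not described here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
)

// semanticSearchURL builds the request URL for the semantic search
// endpoint, escaping the query text.
func semanticSearchURL(baseURL, query string, limit int) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("limit", strconv.Itoa(limit))
	return baseURL + "/api/v1/semantic-search?" + params.Encode()
}

func main() {
	endpoint := semanticSearchURL("http://localhost:8186", "duty without attachment", 5)
	resp, err := http.Get(endpoint)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```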

8. Testing

Run unit tests (includes database layer tests with sqlmock):

make test

Or directly:

go test ./...

9. Project Layout (high level)

cmd/api              # HTTP server entrypoint
cmd/ingest           # Data ingestion CLI
internal/config      # Configuration loading (Viper)
internal/db          # PostgreSQL connection helper
internal/data        # DB store for chapters/verses + semantic search
internal/http        # Gin router & handlers
internal/search      # ML client for embedding generation
internal/ml-service  # Python ML service (embedding generation)
migrations           # Database schema migrations (includes pgvector)
scripts              # Embedding generation scripts

10. Performance & Architecture

Semantic Search Flow

  1. User Query → Go API receives text query
  2. Embedding Generation → Python ML service converts text to vector
  3. Vector Search → Go queries PostgreSQL pgvector for similar verses
  4. Result Enrichment → Go fetches full verse data and combines with similarity scores
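
Step 3 can be sketched as follows, under stated assumptions. pgvector accepts vectors in a bracketed text form such as `[0.5,-1,0.25]`, and its `<=>` operator returns cosine distance. The column names `verse_id` and `embedding` are hypothetical (only the verse_embeddings table name comes from this README), and the project may use a different distance operator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral formats an embedding in pgvector's text input form,
// for example "[0.5,-1,0.25]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// similarVersesQuery ranks rows by cosine distance to the query vector.
// $1 is the vector literal and $2 is the result limit.
// Column names are assumptions, not the project's schema.
const similarVersesQuery = `
SELECT verse_id, 1 - (embedding <=> $1::vector) AS similarity
FROM verse_embeddings
ORDER BY embedding <=> $1::vector
LIMIT $2`

func main() {
	// Toy vector; a real query embedding would come from the ML service.
	fmt.Println(vectorLiteral([]float32{0.5, -1, 0.25}))
	fmt.Println(similarVersesQuery)
}
```

With database/sql, the literal and the limit would be passed as query arguments rather than concatenated into the SQL string. Because `<=>` is a distance, `1 - distance` converts it to a similarity score where higher is better.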

Performance Benefits

  • 40-50% faster than Python-based search
  • Direct SQL queries using pgvector's optimized IVFFlat indexing
  • Scalable architecture with PostgreSQL handling vector operations
  • Minimal Python footprint - only used for embedding generation

11. Next Steps

  • Containerize (Docker Compose for API + Postgres + ML service)
  • Add query caching for frequently searched terms
  • Consider pure Go implementation with ONNX runtime
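
As a starting point for the query caching idea, the sketch below shows a small in-memory cache that maps query text to its embedding, so repeated searches could skip the round trip to the ML service. Nothing here exists in the project yet: the type, its methods, and the arbitrary-entry eviction policy are all assumptions, and a production version would more likely use LRU eviction or an external cache.

```go
package main

import (
	"fmt"
	"sync"
)

// embeddingCache is a bounded, concurrency-safe map from query text
// to embedding. Hypothetical type, shown for illustration.
type embeddingCache struct {
	mu      sync.RWMutex
	entries map[string][]float32
	maxSize int
}

// newEmbeddingCache creates a cache holding at most maxSize entries
// (a minimum of one).
func newEmbeddingCache(maxSize int) *embeddingCache {
	if maxSize < 1 {
		maxSize = 1
	}
	return &embeddingCache{entries: make(map[string][]float32), maxSize: maxSize}
}

// get returns the cached embedding for query, if present.
func (c *embeddingCache) get(query string) ([]float32, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	vec, ok := c.entries[query]
	return vec, ok
}

// put stores an embedding, evicting one arbitrary entry when full.
func (c *embeddingCache) put(query string, vec []float32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.entries[query]; !exists && len(c.entries) >= c.maxSize {
		for k := range c.entries {
			delete(c.entries, k)
			break
		}
	}
	c.entries[query] = vec
}

// size reports the number of cached entries.
func (c *embeddingCache) size() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.entries)
}

func main() {
	cache := newEmbeddingCache(1000)
	cache.put("duty without attachment", []float32{0.1, 0.2})
	if vec, ok := cache.get("duty without attachment"); ok {
		fmt.Println("cache hit, dimensions:", len(vec))
	}
}
```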

Acknowledgements

Special thanks to JDhruv14 for providing the JDhruv14/Bhagavad-Gita_Dataset, which serves as the foundational dataset for this project.

Questions or issues? Open an issue in the GitHub repository or contribute to the docs.
