Public REST API for serving Bhagavad Gita chapters and verses with English/Hindi translations and semantic search capabilities.
- Go Monolith: Handles all API logic and business operations
- PostgreSQL + pgvector: Stores verses and vector embeddings for semantic search
- Python ML Service: Minimal service for real-time embedding generation
- Semantic Search: AI-powered verse search using vector similarity
- Go 1.22+
- PostgreSQL 14+ with pgvector extension
- Python 3.8+ (for ML service)
- golang-migrate CLI (for database migrations)
git clone git@github.com:devangb3/Gitartha-Engine.git
cd Gitartha-Engine
go mod tidy
Create a .env file in the project root:
cat <<'ENV' > .env
DATABASE_URL=postgres://<user>:<password>@localhost:5432/gitartha?sslmode=disable
PORT=8186
ENV=development
LOG_LEVEL=info
ML_SERVICE_URL=http://localhost:5001
ENV
- The database name (gitartha in the example) is defined inside the DATABASE_URL.
- Ensure the referenced database already exists in PostgreSQL (createdb gitartha).
- ML_SERVICE_URL points to the Python ML service for embedding generation.
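As a rough illustration of how the variables above might be consumed, here is a minimal, standard-library-only sketch. It is not the project's Viper-based loader in internal/config. The Config struct, the LoadConfig function, and the fallback defaults (copied from the example values) are assumptions made for illustration, and the sketch assumes the variables have already been exported into the process environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the variables from the .env example (hypothetical struct).
type Config struct {
	DatabaseURL  string
	Port         string
	Env          string
	LogLevel     string
	MLServiceURL string
}

// LoadConfig reads settings through lookup (normally os.LookupEnv).
// The fallback values are assumptions taken from the example .env file.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DatabaseURL:  get("DATABASE_URL", ""),
		Port:         get("PORT", "8186"),
		Env:          get("ENV", "development"),
		LogLevel:     get("LOG_LEVEL", "info"),
		MLServiceURL: get("ML_SERVICE_URL", "http://localhost:5001"),
	}
	// The connection string has no sensible default, so treat it as required.
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("port=%s env=%s ml=%s\n", cfg.Port, cfg.Env, cfg.MLServiceURL)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the loader easy to exercise in tests without touching the real environment.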
First, install the pgvector extension in PostgreSQL:
# Ubuntu/Debian
sudo apt install postgresql-14-pgvector
# Or compile from source: https://github.com/pgvector/pgvector
Apply the database schema including vector embeddings:
make migrate-up
This creates the verse_embeddings table with pgvector support. Use make migrate-down to roll back.
Run the Go ingestion CLI to load verses:
go run ./cmd/ingest --csv bg.csv
This reads bg.csv, upserts chapters/verses, and updates verse_count totals.
Generate embeddings for semantic search:
cd scripts
python generate_embeddings_pgvector.py
This creates vector embeddings for all verses using the all-MiniLM-L6-v2 model and stores them in PostgreSQL.
The ML service provides embedding generation for semantic search:
cd internal/ml-service
source venv/bin/activate
pip install -r requirements.txt
python app_pgvector.py
The service runs on http://localhost:5001 and provides:
- POST /embed - Generate embeddings for text queries
- GET /health - Health check
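To give a feel for how a Go caller might talk to the /embed endpoint, here is a small sketch. The JSON shapes are assumptions: the request is taken to be {"text": "..."} and the response {"embedding": [...]}, neither of which is confirmed by this README, so check app_pgvector.py for the actual contract. The Embed and decodeEmbedResponse names are hypothetical and are not the project's internal/search client.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// ASSUMED payload shapes; the real service may use different field names.
type embedRequest struct {
	Text string `json:"text"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// decodeEmbedResponse parses the assumed response body and rejects empty vectors.
func decodeEmbedResponse(body []byte) ([]float32, error) {
	var out embedResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, fmt.Errorf("decode embed response: %w", err)
	}
	if len(out.Embedding) == 0 {
		return nil, errors.New("embed response contained no vector")
	}
	return out.Embedding, nil
}

// Embed posts text to baseURL + "/embed" and returns the decoded vector.
func Embed(ctx context.Context, client *http.Client, baseURL, text string) ([]float32, error) {
	payload, err := json.Marshal(embedRequest{Text: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/embed", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embed: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeEmbedResponse(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	vec, err := Embed(ctx, http.DefaultClient, "http://localhost:5001", "duty without attachment")
	if err != nil {
		fmt.Println("embed failed (is the ML service running?):", err)
		return
	}
	fmt.Println("vector dimensions:", len(vec))
}
```

The context timeout keeps a slow or stopped ML service from stalling the caller indefinitely.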
make run
Output example:
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
[GIN-debug] GET /healthz --> ... (*handler*).health
[GIN-debug] GET /api/v1/chapters --> ...
[GIN-debug] GET /api/v1/semantic-search --> ... (*handler*).semanticSearch
...
Visit http://localhost:8186/healthz to confirm the service is healthy.
- GET /api/v1/chapters - List all chapters.
- GET /api/v1/chapters/{chapter} - Chapter metadata + verses.
- GET /api/v1/chapters/{chapter}/verses/{verse} - Specific verse with translations.
- GET /api/v1/search?query=term&lang=en|hi - Keyword search (English/Hindi).
- GET /api/v1/random - Random verse.
- GET /api/v1/semantic-search?query=text&limit=5 - AI-powered semantic search using vector similarity.
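For callers written in Go, the semantic search endpoint could be exercised roughly as follows. The semanticSearchURL helper is hypothetical, and because the response schema is not described here, the sketch prints the raw body instead of decoding it into a struct. The client-side checks on query and limit are a design choice of this sketch, not documented API behavior.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// semanticSearchURL builds the request URL with properly escaped parameters.
func semanticSearchURL(base, query string, limit int) (string, error) {
	if query == "" {
		return "", errors.New("query must not be empty")
	}
	if limit < 1 {
		return "", errors.New("limit must be at least 1")
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/semantic-search"
	q := url.Values{}
	q.Set("query", query)
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode() // keys are emitted in sorted order
	return u.String(), nil
}

func main() {
	endpoint, err := semanticSearchURL("http://localhost:8186", "how to act without attachment", 5)
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(endpoint)
	if err != nil {
		fmt.Println("request failed (is the API running?):", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	// Print the raw JSON; the response schema is not assumed here.
	fmt.Printf("%s\n%s\n", resp.Status, body)
}
```

Building the query string with url.Values avoids hand-escaping spaces and non-ASCII text, which matters for Hindi queries.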
The API includes interactive Swagger/OpenAPI documentation:
- Swagger UI: Visit http://localhost:8186/swagger/index.html for interactive API documentation
- OpenAPI Spec: http://localhost:8186/swagger/doc.json (JSON format)
- OpenAPI YAML: http://localhost:8186/swagger/swagger.yaml (YAML format)
Use tools like curl, Postman, or httpie to exercise the endpoints:
curl http://localhost:8186/api/v1/chapters/1/verses/1
Run unit tests (includes database layer tests with sqlmock):
make test
Or directly:
go test ./...
cmd/api # HTTP server entrypoint
cmd/ingest # Data ingestion CLI
internal/config # Configuration loading (Viper)
internal/db # PostgreSQL connection helper
internal/data # DB store for chapters/verses + semantic search
internal/http # Gin router & handlers
internal/search # ML client for embedding generation
internal/ml-service # Python ML service (embedding generation)
migrations # Database schema migrations (includes pgvector)
scripts # Embedding generation scripts
- User Query → Go API receives text query
- Embedding Generation → Python ML service converts text to vector
- Vector Search → Go queries PostgreSQL pgvector for similar verses
- Result Enrichment → Go fetches full verse data and combines with similarity scores
- 40-50% faster than Python-based search
- Direct SQL queries using pgvector's optimized IVFFlat indexing
- Scalable architecture with PostgreSQL handling vector operations
- Minimal Python footprint - only used for embedding generation
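For intuition about the work PostgreSQL takes over, the sketch below computes cosine similarity between two vectors in plain Go. An exact search compares the query against every stored vector this way, whereas an IVFFlat index groups vectors into lists and scans only a subset of them, trading some recall for speed. That the project ranks by cosine similarity is an assumption, and cosineSimilarity is a hypothetical helper, not project code.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// It returns 0 when either vector has zero length (norm), to avoid dividing by zero.
func cosineSimilarity(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("vectors must have the same dimension")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0, nil
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	same, _ := cosineSimilarity([]float32{1, 0}, []float32{1, 0})
	orthogonal, _ := cosineSimilarity([]float32{1, 0}, []float32{0, 1})
	fmt.Println(same, orthogonal)
}
```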
- Containerize (Docker Compose for API + Postgres + ML service)
- Add query caching for frequently searched terms
- Consider pure Go implementation with ONNX runtime
Special thanks to JDhruv14 for providing the JDhruv14/Bhagavad-Gita_Dataset, which serves as the foundational dataset for this project.
Questions or issues? Open an issue in the GitHub repository or add to the docs.