A production-ready Go backend demonstrating Retrieval-Augmented Generation (RAG) using:
- Go 1.23
- Fiber v2
- Postgres 16 + pgvector
- OpenAI embeddings (text-embedding-3-small)
- GPT-4o Mini for generation
- Optional mock AI mode (no API key required)
- Docker + Docker Compose
This repository extends the base notes-memory-core backend into a full AI retrieval system:
- Store notes
- Generate vector embeddings
- Run semantic search using pgvector
- Produce AI answers grounded in your notes (RAG)
The system supports both synchronous and asynchronous execution modes, with graceful degradation when optional infrastructure is unavailable.
- CRUD Notes API
- Structured logging (zerolog)
- In-memory metrics at /metrics
- Automatic migrations
- Dockerized Postgres 16
- Rate limiting middleware to protect AI-backed endpoints
- pgvector semantic search
- Embeddings: mock OR real OpenAI
- LLM responses: mock OR real OpenAI
- Clean modular AI architecture
- Fully runnable without any API keys
notes-memory-core-rag/
├── main.go                 # API entrypoint
├── Dockerfile
├── docker-compose.yml
├── fly.toml
├── .env.example
├── README.md
│
├── cmd/
│   └── worker/
│       └── main.go         # Background job worker (Redis-based)
│
├── internal/
│   ├── ai/                 # AI abstraction layer
│   │   ├── embeddings.go   # Mock + real embeddings (ctx-aware)
│   │   ├── responder.go    # Mock + real LLM responses
│   │   └── openai.go
│   │
│   ├── database/
│   │   ├── database.go     # Postgres + migrations
│   │   ├── redis.go        # Optional Redis initialization
│   │   └── jobs.go         # Async job persistence
│   │
│   ├── handlers/
│   │   ├── notes.go        # CRUD notes
│   │   ├── query.go        # Synchronous RAG
│   │   ├── rag_pipeline.go # Shared RAG pipeline logic
│   │   ├── enqueue_query.go # Async job enqueue
│   │   └── get_job.go      # Job status retrieval
│   │
│   └── middleware/
│       ├── logger.go
│       ├── metrics.go
│       └── rate_limit.go
│
└── .github/workflows/
    ├── ci.yml
    └── fly-deploy.yml
+-----------------------+
| HTTP Client |
+-----------+-----------+
|
v
+-------+--------+
| Fiber API |
+-------+--------+
|
+-------+--------+
| |
v v
+----+----+ +----+-----+
| Handlers | | Middleware|
| notes.go | | logger.go |
| query.go | | metrics.go|
+----+----+ +-----------+
|
v
+------+------------------------------+
| AI Layer |
| embeddings.go openai.go |
| responder.go mock/real toggle |
+-------------------------------------+
|
v
+------+------------------------------+
| Postgres 16 + pgvector |
| notes + note_embeddings tables |
+-------------------------------------+
User Query
|
v
Generate Query Embedding (mock or real)
|
v
pgvector similarity search (<->)
|
v
Top-K Relevant Notes
|
+-----------------------------+
| USE_MOCK_LLM=true  → Mock   |
| USE_MOCK_LLM=false → Real   |
+-----------------------------+
|
v
Final AI Answer
The RAG pipeline is implemented once and reused by both synchronous HTTP handlers and the background worker.
git clone https://github.com/ai-backend-course/notes-memory-core-rag.git
cd notes-memory-core-rag
cp .env.example .env
Default mode:
- mock embeddings
- mock LLM
- no API key needed
docker-compose up --build
API available at:
http://localhost:8081
Health check.
Returns all notes.
{
"title": "My Note",
"content": "This is a test note."
}
Creates:
- note record
- embedding (mock or real)
{
"query": "memory tips"
}
Semantic vector search.
{
"query": "summarize my notes"
}
Full RAG pipeline:
- semantic search
- top-k notes
- AI answer (mock or real)
- Context-aware execution with strict end-to-end timeouts
- Intended for demos, CLI usage, and lightweight UI interactions
This endpoint is always available, even when background infrastructure is not present.
- Enqueues RAG work into Redis
- Jobs are processed by a background worker with retries and backoff
- Designed for long-running or high-latency AI tasks
If Redis is unavailable (e.g., API-only deployments), these endpoints return a clear 503 Service Unavailable response instead of failing.
{
"total_requests": 12,
"total_errors": 0,
"avg_latency_ms": 1.7
}
Inside .env:
USE_MOCK_EMBEDDINGS=false
USE_MOCK_LLM=false
OPENAI_API_KEY=your_key_here
This switches the pipeline to:
- text-embedding-3-small for embeddings
- GPT-4o Mini for generation
curl -X POST http://localhost:8081/notes \
-H "Content-Type: application/json" \
-d '{"title":"Test","content":"This is a demo note."}'
curl -X POST http://localhost:8081/search \
-H "Content-Type: application/json" \
-d '{"query":"demo"}'
curl -X POST http://localhost:8081/query \
-H "Content-Type: application/json" \
-d '{"query":"summarize my notes"}'
curl -X POST http://localhost:8081/jobs/query \
-H "Content-Type: application/json" \
-d '{"query":"summarize my notes"}'
curl http://localhost:8081/jobs/:id
This service is deployed to Fly.io in an API-only mode:
- The synchronous RAG endpoint (/query) is always available
- Background job endpoints (/jobs/*) are enabled only when Redis is present
- Redis is treated as an optional dependency
- When Redis is unavailable, async endpoints return a clear 503 Service Unavailable
This design demonstrates graceful degradation and allows the core API to remain stable even when optional infrastructure is absent.
- All AI calls propagate context.Context
- Strict timeouts are enforced across the full RAG pipeline
- Long-running or blocked AI calls cannot stall the API
- Async jobs include retries with exponential backoff
- Optional infrastructure failures never crash the service
| Component | Technology |
|---|---|
| Language | Go 1.23 |
| Framework | Fiber v2 |
| Database | Postgres 16 |
| Vector Search | pgvector |
| Embeddings | text-embedding-3-small |
| LLM | GPT-4o Mini |
| Containers | Docker Compose |
| Logging | zerolog |
This repo is part of a multi-project AI Backend Portfolio:
- notes-memory-core – template backend
- notes-memory-core-rag – flagship RAG system
- AI Summary Microservice
- Embedding Worker Microservice
- Portfolio Website
This repository:
- Runs without OpenAI keys
- Fully supports real OpenAI
- Uses enterprise Go patterns
- Provides semantic search + RAG
- Is ready for employer review
- CI/CD is handled via GitHub Actions, automatically building and deploying to Fly.io with zero-downtime machine replacement.