REST API in Go to fetch song lyrics by scraping Vagalume using Chrome (chromedp). Includes an in-memory cache with TTL to reduce latency on repeated queries.
- Gin HTTP server with endpoints /health and /v1/lyrics.
- Scraping with chromedp (Chrome runs headless by default).
- In-memory cache with TTL and a background janitor.
- Structured logging with Uber Zap.
- Go (the module declares go 1.24.4).
- Google Chrome or Chromium installed on the system (chromedp uses it under the hood).
- macOS, Linux, or Windows.
- APP_ENV: development or production. Affects logs and Gin mode. Default: development.
- APP_PORT: server port. Default: 8080.
- APP_LYRICS_CACHE_TTL_SECONDS: cache TTL in seconds. Default: 1800 (30 min).
- APP_LYRICS_CACHE_MAX_ENTRIES: maximum cache capacity. Default: 1000.
- MONGODB_URI: MongoDB connection string used by the DB-backed cache. Example: mongodb://localhost:27017 or an Atlas URI.
- MONGODB_DB: (optional) MongoDB database name for the cache. Default: lyricscrawl.
- MONGODB_COLLECTION: (optional) collection used for cache documents. Default: lyrics_cache.
Local testing with MongoDB (optional):
- Start a local Mongo instance with Docker:
docker run -d -p 27017:27017 --name mongo mongo:6
export MONGODB_URI="mongodb://localhost:27017"
- Start the app (dev):
make dev
You can define the environment variables in a .env file at the repository root; if present, it is loaded on startup.
Clone the repository and fetch dependencies.
Development (manual reload):
make dev
Production (binary in ./bin/app):
make build
make start
Alternative without Makefile:
# development
sh scripts/dev.sh
# production
sh scripts/build.sh
sh scripts/start.sh
Health check:
GET /health -> 200 { "status": "ok" }
Get lyrics:
GET /v1/lyrics?query=<artist> - <song>
curl example:
curl "http://localhost:8080/v1/lyrics?query=Coldplay%20-%20Yellow"
Response:
{
  "data": "<lyrics>",
  "cached": true
}
The cache key is the normalized query (lowercased, trimmed). On a cache hit, cached is true and the response is immediate; otherwise scraping runs and the result is stored.
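The hit/miss flow described above can be sketched as follows. This is only an illustration: `normalizeQuery`, `lookup`, and `lyricsResponse` are hypothetical names, the `fetch` callback stands in for the scraper, and a plain map replaces the real cache (it is not safe for concurrent use).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lyricsResponse matches the JSON shape of the /v1/lyrics response.
type lyricsResponse struct {
	Data   string `json:"data"`
	Cached bool   `json:"cached"`
}

// normalizeQuery trims and lowercases the query to build the cache key.
func normalizeQuery(q string) string {
	return strings.ToLower(strings.TrimSpace(q))
}

// lookup returns the cached entry when present; otherwise it calls fetch
// (a stand-in for the scraper) and stores the result under the normalized key.
func lookup(cache map[string]string, query string, fetch func(string) (string, error)) (lyricsResponse, error) {
	key := normalizeQuery(query)
	if lyrics, ok := cache[key]; ok {
		return lyricsResponse{Data: lyrics, Cached: true}, nil
	}
	lyrics, err := fetch(query)
	if err != nil {
		return lyricsResponse{}, err
	}
	cache[key] = lyrics
	return lyricsResponse{Data: lyrics, Cached: false}, nil
}

func main() {
	cache := map[string]string{}
	fetch := func(q string) (string, error) { return "placeholder lyrics", nil }

	// Both spellings normalize to the same key, so the second call is a hit.
	for _, q := range []string{"Coldplay - Yellow", "  coldplay - yellow "} {
		resp, err := lookup(cache, q, fetch)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		out, _ := json.Marshal(resp)
		fmt.Println(string(out))
	}
}
```

Normalizing before the lookup means queries that differ only in case or surrounding whitespace share one cache entry.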
- Opens a Chrome context (headless=true by default) with a random User-Agent.
- Navigates to https://www.vagalume.com.br/search?q=<query> and takes the first result.
- Adjusts the URL (removes -traducao if applicable) and navigates to the detail page.
- If an 18+ notice appears, it tries to accept the modal.
- Extracts the text from div#lyrics and removes confirmation messages.
Relevant code:
- src/scraper/Scraper.go
- src/scraper/UserAgentGenerator.go
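The two URL-handling steps of the scraping flow can be illustrated without Chrome. The sketch below is not the code in Scraper.go: `searchURL` and `stripTranslation` are hypothetical helpers, and the example detail-page URL (with a `-traducao` marker before `.html`) is an assumed shape used only for demonstration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const searchBase = "https://www.vagalume.com.br/search"

// searchURL builds the search page URL for a query such as "Coldplay - Yellow".
// url.QueryEscape encodes spaces as "+", which is valid in a query string.
func searchURL(query string) string {
	return searchBase + "?q=" + url.QueryEscape(query)
}

// stripTranslation removes the first "-traducao" marker so the link points at
// the original lyrics page. Links without the marker are returned unchanged.
func stripTranslation(link string) string {
	return strings.Replace(link, "-traducao", "", 1)
}

func main() {
	fmt.Println(searchURL("Coldplay - Yellow"))

	// Assumed URL shape, for illustration only.
	fmt.Println(stripTranslation("https://www.vagalume.com.br/coldplay/yellow-traducao.html"))
}
```

Keeping these steps as pure functions makes them testable without launching a browser; the navigation and DOM extraction would still go through chromedp.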
- Each request currently creates a fresh Chrome context and closes it afterwards. This is safe and simple.
- If you need to keep Chrome open persistently across requests, consider creating a reusable global context and managing its lifecycle. (Not implemented by default.)
- Implementation in src/cache/LyricsCache.go.
- Automatically initialized in main by reading environment variables.
- Periodic background cleanup (~TTL/2, minimum 30s).
- src/main.go: server bootstrap, loads .env, logger, and router.
- src/api/router/Router.go: routes and groups.
- src/api/controller/TokenController.go: lyrics controller.
- src/scraper/*: scraping and user-agent.
- src/logger/logger.go: Zap configuration.
- scripts/*: helpers for dev/build/start.
- bruno-http/*: optional Bruno collection for testing.
- Chrome not found or slow to start: install stable Google Chrome and keep it updated. In containers, enable flags like --no-sandbox (already set in code) as needed.
- Permissions on macOS: if a security dialog appears when launching Chrome, allow the app.
- Broken selectors: Vagalume selectors may change. Check a.gs-title, div#lyrics, and the 18+ modal if things stop working.
Not specified.