A small, production-style pipeline that ingests Wikipedia RecentChanges events, buffers them, processes them, and serves results.
- Reader pulls RecentChanges from Wikimedia EventStreams and publishes to Amazon Kinesis
- Consumers read from Kinesis, de-duplicate by revision ID, enrich, and write to DynamoDB
- Firehose archives raw events to S3
- FastAPI API provides health, control, and query endpoints
- Optional cache layer (DAX/Redis) for sub-50 ms queries
- services/
- api/
- reader/
- consumer/
- infra/
- terraform/
- scripts/
- tests/
- .github/
- workflows/
- Create a Python venv in each service folder when you start that piece
- Implement reader: write to a local file or in-memory queue first
- Implement consumer: read from the file/queue, dedupe by rev_id, write to a local file
- Implement API: a simple GET /health endpoint
- Replace the file/queue with Kinesis and DynamoDB once local works
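The local-first steps above can be sketched with stdlib stand-ins: a `queue.Queue` in place of Kinesis and a dict in place of DynamoDB (the field name `rev_id` and the function names here are illustrative, not part of any existing code):

```python
import queue

# Local stand-ins for Kinesis (the queue) and DynamoDB (the dict), so the
# reader and consumer can be developed before any AWS wiring exists.
buffer: queue.Queue = queue.Queue()
store: dict[int, dict] = {}
seen_rev_ids: set[int] = set()

def reader_publish(event: dict) -> None:
    """Reader side: push a raw event onto the local buffer."""
    buffer.put(event)

def consumer_drain() -> int:
    """Consumer side: drain the buffer, dedupe by rev_id, write to the store."""
    written = 0
    while not buffer.empty():
        event = buffer.get()
        rev_id = event["rev_id"]
        if rev_id in seen_rev_ids:
            continue  # duplicate delivery; skip it
        seen_rev_ids.add(rev_id)
        store[rev_id] = event
        written += 1
    return written

# Simulate duplicate delivery: revision 101 arrives twice, only one write lands.
reader_publish({"rev_id": 101, "title": "Example"})
reader_publish({"rev_id": 101, "title": "Example"})
reader_publish({"rev_id": 102, "title": "Other"})
print(consumer_drain())  # -> 2
```

Swapping in the real backends later means replacing `buffer` with a Kinesis client and `store`/`seen_rev_ids` with DynamoDB (e.g. a conditional put on rev_id), while the dedupe logic stays the same.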