Real-Time Wiki Change Processing System

A small, production-style pipeline that ingests Wikipedia RecentChanges events, buffers them, processes them, and serves results.

High-level architecture

  • Reader pulls RecentChanges from Wikimedia EventStreams and publishes to Amazon Kinesis (see the reader sketch after this list)
  • Consumers read from Kinesis, de-duplicate by revision ID, enrich, and write to DynamoDB
  • Firehose archives raw events to S3
  • FastAPI API provides health, control, and query endpoints
  • Optional cache layer (DAX/Redis) for sub-50 ms queries
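A minimal sketch of the reader leg, assuming the sseclient-py package for the SSE feed; the Kinesis stream name is a placeholder, and a real service would add retries, backoff, and metrics:

```python
import json

import boto3
import requests
from sseclient import SSEClient  # pip install sseclient-py

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
KINESIS_STREAM = "wiki-changes"  # hypothetical stream name

kinesis = boto3.client("kinesis")


def run_reader() -> None:
    """Pull RecentChanges events over SSE and forward each one to Kinesis."""
    response = requests.get(
        STREAM_URL, stream=True, headers={"Accept": "text/event-stream"}
    )
    for event in SSEClient(response).events():
        if not event.data:
            continue
        change = json.loads(event.data)
        # Partition by wiki so one busy wiki does not hot-spot a single shard.
        kinesis.put_record(
            StreamName=KINESIS_STREAM,
            Data=event.data.encode("utf-8"),
            PartitionKey=change.get("wiki", "unknown"),
        )


if __name__ == "__main__":
    run_reader()
```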

Repo structure

  • services/
    • api/
    • reader/
    • consumer/
  • infra/
    • terraform/
    • scripts/
  • tests/
  • .github/
    • workflows/

Getting started

  1. Create a Python venv in each service folder as you start work on that piece
  2. Implement the reader first - have it write to a local file or an in-memory queue
  3. Implement the consumer - read from the file/queue, de-duplicate by rev_id, and write results to a local file (see the consumer sketch after this list)
  4. Implement the API - start with a simple GET /health endpoint (sketch below)
  5. Swap the file/queue for Kinesis and DynamoDB once the local pipeline works (see the conditional-write sketch below)
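For step 3, a minimal sketch of the local consumer, assuming the reader hands events over a queue.Queue and that edit events carry their revision ID under revision.new (as RecentChanges edits do):

```python
import json
from queue import Queue


def run_consumer(events: Queue, out_path: str = "changes.jsonl") -> None:
    """Drain the queue, drop duplicate revisions, append the rest as JSON lines."""
    seen_rev_ids: set[int] = set()
    with open(out_path, "a", encoding="utf-8") as out:
        while True:  # a sketch; a real consumer would honor a shutdown signal
            change = events.get()  # blocks until the reader enqueues an event
            rev_id = change.get("revision", {}).get("new")
            if rev_id is None or rev_id in seen_rev_ids:
                continue  # not an edit, or a duplicate we have already written
            seen_rev_ids.add(rev_id)
            out.write(json.dumps(change) + "\n")
```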
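For step 4, the smallest useful FastAPI app; run it with, for example, uvicorn main:app --reload:

```python
from fastapi import FastAPI

app = FastAPI()


@app.get("/health")
def health() -> dict:
    """Liveness probe: returns 200 while the process can serve requests."""
    return {"status": "ok"}
```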
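For step 5, one way to keep the dedupe guarantee when writes move to DynamoDB is a conditional put_item that rejects revision IDs already stored; the table name and item attributes here are assumptions, not the project's actual schema:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("wiki-changes")  # hypothetical table


def write_change(change: dict) -> bool:
    """Idempotent write: the condition rejects a rev_id already stored."""
    try:
        table.put_item(
            Item={
                "rev_id": change["revision"]["new"],
                "title": change.get("title", ""),
                "wiki": change.get("wiki", ""),
                "timestamp": change.get("timestamp", 0),
            },
            ConditionExpression="attribute_not_exists(rev_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate delivery from Kinesis; safe to skip
        raise
```

Doing the check at write time means duplicates are dropped even when Kinesis redelivers a record to a restarted consumer.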
