shumisoft/distributed-url-shortener

Distributed URL Shortener

A high-performance, distributed URL shortening and redirection platform designed to handle heavy concurrent traffic, with a scaling target of 1 million users.

This project demonstrates end-to-end ownership of a distributed system. It moves beyond standard CRUD operations by implementing a highly available API gateway, distributed ID generation coordinated through Consul's consensus-backed key-value store rather than database locks, and multi-layered caching to keep redirect latency low.


📂 System Architecture & Microservices

The system is decoupled into independent microservices to allow isolated scaling of the read-heavy (redirection) and write-heavy (shortening) paths.

Live Environment:

  • 🌐 Application Endpoint: here

Traffic Flow

  • Edge Routing: Traefik acts as the API Gateway. It natively integrates with the Consul Server Cluster as its configuration provider, enabling dynamic service discovery. When new instances of a service spin up, Traefik routes traffic to them automatically without manual configuration reloads.
  • Path-Based Routing:
    • Requests to the root path (/) serve the HTML landing page and route POST requests to the Shortening Service.
    • Requests matching the Base62 pattern (/{code}) are routed instantly to the Redirection Service.
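
The routing rules above are applied by Traefik, but the decision logic can be sketched in plain Java to make the rules concrete. This is only an illustration: the `RouteClassifier` class, its `Target` enum, the 1 to 10 character limit on short codes, and the choice to treat any other request as not found are all assumptions, not part of the project.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors the gateway's path-based routing rules. */
final class RouteClassifier {
    enum Target { LANDING_PAGE, SHORTENING_SERVICE, REDIRECTION_SERVICE, NOT_FOUND }

    // Assumption: a short code is 1 to 10 Base62 characters (0-9, A-Z, a-z).
    private static final Pattern SHORT_CODE = Pattern.compile("^/[0-9A-Za-z]{1,10}$");

    static Target classify(String method, String path) {
        if ("/".equals(path)) {
            // Root path: GET serves the landing page, POST creates a short link.
            if ("POST".equals(method)) return Target.SHORTENING_SERVICE;
            if ("GET".equals(method)) return Target.LANDING_PAGE;
            return Target.NOT_FOUND;
        }
        // Any other path must look like /{code} to reach the Redirection Service.
        if ("GET".equals(method) && SHORT_CODE.matcher(path).matches()) {
            return Target.REDIRECTION_SERVICE;
        }
        return Target.NOT_FOUND;
    }
}
```

For example, `RouteClassifier.classify("GET", "/aZ3x9")` selects the Redirection Service, while a path such as `/a-b` matches neither rule because `-` is not a Base62 character.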

⚙️ Distributed Systems Mechanics (System Design)

To ensure high availability and prevent performance bottlenecks like database lock contention and cache penetration, the system implements the following advanced design patterns:

1. Lock-Free Distributed ID Generation (Ticket Server Pattern)

Generating unique, collision-free short codes in a highly concurrent distributed environment is a primary challenge. Relying on a central database's auto-increment creates a massive bottleneck and single point of failure.

  • Solution: We utilize a Consul Key-Value store to implement a distributed range-allocation strategy.
  • Implementation: On startup, each Shortening Service instance requests an ID range (e.g., IDs 1,000 to 1,999) from Consul. We use Consul's Check-And-Set (CAS) operation to advance the range counter atomically: the write succeeds only if the counter has not changed since it was read, and an instance that loses the race simply retries. This guarantees that no two service instances are ever assigned the same block of IDs, so IDs within a block are generated entirely in application memory, with no database locks and no network round trip per ID.
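
A minimal sketch of the range-allocation idea follows. It does not call Consul: `CounterStore` is a hypothetical abstraction over a key-value store with compare-and-set, and `InMemoryCounterStore` stands in for Consul KV so the example runs on its own. The class names, the counter starting at 0, the lazy reservation on first use (rather than at startup), and the Base62 alphabet order are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical abstraction over a KV store that offers compare-and-set on a counter. */
interface CounterStore {
    long read(String key); // current value, 0 if the key is absent
    boolean compareAndSet(String key, long expected, long updated);
}

/** In-memory stand-in for Consul KV, for demonstration only. */
final class InMemoryCounterStore implements CounterStore {
    private final ConcurrentHashMap<String, AtomicLong> values = new ConcurrentHashMap<>();

    private AtomicLong cell(String key) {
        return values.computeIfAbsent(key, k -> new AtomicLong());
    }

    public long read(String key) { return cell(key).get(); }

    public boolean compareAndSet(String key, long expected, long updated) {
        return cell(key).compareAndSet(expected, updated);
    }
}

/** Hands out IDs from a locally held block and reserves a new block when it runs out. */
final class IdRangeAllocator {
    private final CounterStore store;
    private final String key;
    private final long blockSize;
    private long next;  // next ID to hand out
    private long limit; // exclusive upper bound of the current block

    IdRangeAllocator(CounterStore store, String key, long blockSize) {
        this.store = store;
        this.key = key;
        this.blockSize = blockSize;
    }

    // synchronized keeps the sketch short; it only guards this JVM's local block.
    synchronized long nextId() {
        if (next >= limit) reserveBlock();
        return next++;
    }

    private void reserveBlock() {
        while (true) {
            long start = store.read(key);
            // The write succeeds only if no other instance moved the counter first.
            if (store.compareAndSet(key, start, start + blockSize)) {
                next = start;
                limit = start + blockSize;
                return;
            }
            // Lost the race: re-read the counter and try again.
        }
    }
}

/** Turns a numeric ID into a Base62 short code. */
final class Base62 {
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    static String encode(long id) {
        if (id < 0) throw new IllegalArgumentException("id must be non-negative");
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString();
    }
}
```

With two allocators sharing one store and a block size of 1,000, the first allocator to ask receives IDs 0 to 999 and the second receives 1,000 to 1,999, so their outputs never overlap. Only the block reservation touches the shared store; every other `nextId()` call is a local increment.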

2. Cache Penetration Prevention (Bloom Filters)

A common attack vector for URL shorteners involves malicious users or bots requesting non-existent short links to bypass the cache and overwhelm the primary database (Cache Penetration).

  • Solution: The Redirection Service is integrated with Redis, utilizing a Bloom Filter.
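  • Implementation: Before querying the PostgreSQL database or the Redis KV cache, the service checks the Bloom Filter. If the filter reports that the short code definitely does not exist, the request is immediately rejected with a 404. A Bloom filter never produces false negatives, so valid codes are never rejected; it can produce occasional false positives, so a small fraction of requests for non-existent codes still reach the cache and database. This shields the underlying persistence layer from the bulk of malicious load. Valid requests are then resolved via the Redis KV cache for sub-millisecond lookups.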
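
The lookup order described above can be sketched as follows. This is not the project's code: the project keeps its Bloom filter in Redis, whereas this sketch uses a small in-process filter so it runs on its own. `SimpleBloomFilter`, `RedirectResolver`, the FNV-1a based double hashing, the in-memory map standing in for the Redis KV cache, and the `databaseLookups` counter are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Small in-process Bloom filter: no false negatives, occasional false positives. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(index(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) return false; // definitely absent
        }
        return true; // possibly present
    }

    // Double hashing: derive the i-th position from two base hashes.
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}

/** Resolves a short code in the order: Bloom filter, cache, database. */
final class RedirectResolver {
    private final SimpleBloomFilter filter;
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis KV
    private final Function<String, Optional<String>> database;           // stand-in for PostgreSQL
    int databaseLookups = 0; // demo counter, not thread-safe

    RedirectResolver(SimpleBloomFilter filter, Function<String, Optional<String>> database) {
        this.filter = filter;
        this.database = database;
    }

    Optional<String> resolve(String code) {
        if (!filter.mightContain(code)) return Optional.empty(); // caller answers 404
        String cached = cache.get(code);
        if (cached != null) return Optional.of(cached);
        databaseLookups++;
        Optional<String> found = database.apply(code);
        found.ifPresent(url -> cache.put(code, url));
        return found;
    }
}
```

Every short code must be added to the filter when it is created; otherwise a valid link would be rejected. A code that passes the filter by false positive falls through to the cache and database, which then report that it does not exist, so correctness does not depend on the filter being exact.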

🚀 Infrastructure & Self-Managed Deployment

The deployment lifecycle and underlying infrastructure are entirely self-managed, emphasizing a cloud-native, automated operations (DevOps) approach.

  • Self-Managed VPS: The production environment is hosted on a free-tier Oracle Cloud Infrastructure (OCI) Ampere instance (4-core ARM vCPUs, 24GB RAM, 200GB Block Storage) running Ubuntu 24.04 LTS.
  • Container Orchestration: The entire stack—including Traefik, Consul, Redis, PostgreSQL, and the custom Spring Boot microservices—is containerized and managed via Docker Compose.
  • Zero-Downtime Automated Rollouts: A Watchtower container continuously polls the private Docker registry. When it detects a new image pushed by the CI pipeline, Watchtower pulls the multi-architecture image and gracefully replaces the running containers automatically, ensuring continuous delivery with zero downtime.

🔄 CI/CD Pipeline & Quality Assurance

A centralized Jenkins server (running natively on the ARM architecture) automates the build, test, and release lifecycle.

  1. Continuous Integration: On every commit, Jenkins executes the JUnit and Mockito test suites.
  2. Integration Testing: The pipeline utilizes the Testcontainers library to spin up ephemeral Redis and PostgreSQL instances, ensuring the distributed caching and database logic is validated in an environment that strictly mirrors production.
  3. Cross-Platform Builds: Given the ARM-based OCI infrastructure, the Jenkins pipeline uses docker buildx to build multi-platform container images, guaranteeing compatibility across both ARM and AMD64 environments before pushing to the container registry.

👥 Team, Agile Workflow & Collaboration

This system was architected and developed in collaboration with the organization owners under the Shumisoft GitHub umbrella.

Operating as a lean, high-velocity two-person engineering team, we optimized our Software Development Life Cycle (SDLC) for rapid iteration and strict task ownership:

  • Agile Methodology: We utilized Trello for comprehensive task management, organizing our workflow into focused sprints to strictly define and deliver the Minimum Viable Product (MVP) and subsequent scaling features.
  • Cross-Functional Ownership: To maintain high-bandwidth communication and rapid deployment cycles, both engineers took full-stack ownership, collaborating closely on the system design, the Java/Spring Boot microservice implementations, and the underlying Docker/Jenkins infrastructure.
