A highly scalable, event-driven real-time chat application built on a distributed microservices architecture.
This platform handles real-time bidirectional communication, secure media sharing, and presence indicators. It was explicitly designed to demonstrate proficiency in handling highly concurrent WebSocket connections, asynchronous message persistence, centralized configuration, and full-stack observability in a cloud-native environment.
The ecosystem is fully decoupled into 11 distinct repositories, strictly adhering to the single-responsibility principle.
- Traefik Reverse Proxy: Acts as the primary ingress controller, routing external traffic to the API Gateway via Docker labels.
- realtime-chatapp-gateway-service: Built with Spring Cloud Gateway. Dynamically load-balances incoming requests across healthy service instances using Eureka. Handles stateless perimeter security by validating JWTs against the Auth Service's `/.well-known/jwks` endpoint.
- realtime-chatapp-service-registry: Eureka server for dynamic service discovery.
- realtime-chatapp-config-service: Centralized Spring Cloud Config server that dynamically pulls environment configurations from a private Git repository (`realtime-chatapp-central-config`).
- realtime-chatapp-orchestration-config: Central repository for infrastructure-as-code, Docker Compose files, and deployment manifests.
- realtime-chatapp-auth-service: Issues JWTs and manages authentication. Utilizes a Redis Bloom Filter to provide ultra-low-latency "username exists" checks during registration, preventing database cache penetration.
- realtime-chatapp-user-service: Manages user profiles, metadata, and avatars.
- realtime-chatapp-message-service: The real-time engine. Maintains stateful WebSocket (STOMP) connections with clients for messaging, typing indicators, and read/delivered receipts.
- realtime-chatapp-persistence-service: A decoupled worker service that consumes messages from Kafka to permanently store chat history in the database without blocking the real-time WebSocket threads.
- realtime-chatapp-storage-service: Integrates with a self-hosted MinIO object storage bucket. Generates presigned URLs so clients can upload media directly to the bucket, completely bypassing the backend to save network bandwidth and compute resources.
- realtime-chatapp-angular-frontend: Built with Angular 20 (standalone components). Features strict state management via NgRx, reactive data streams via RxJS, and a `SockJS` client for robust WebSocket communication. Styled with TailwindCSS and deployed on Vercel's Edge network.
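The auth service's Bloom-filter fast path can be illustrated with a minimal in-memory sketch. This is not the production implementation (which keeps the filter in Redis so all auth-service instances share it); the class name, sizing, and hashing scheme below are illustrative only:

```java
import java.util.BitSet;

// Minimal in-memory Bloom filter illustrating the auth service's
// "username exists" fast path. A real deployment keeps the filter in
// Redis so every instance shares the same bits; this sketch is local.
class UsernameBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    UsernameBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k bit positions by perturbing the string hash per round.
    private int index(String value, int round) {
        return Math.floorMod((value + "#" + round).hashCode(), size);
    }

    void add(String username) {
        for (int i = 0; i < hashes; i++) bits.set(index(username, i));
    }

    // false => definitely unseen: the username is free, skip the DB.
    // true  => possibly seen: confirm with a real database lookup.
    boolean mightContain(String username) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(index(username, i))) return false;
        return true;
    }
}
```

A Bloom filter can return false positives but never false negatives, so a `false` answer safely skips the database entirely; Redis Stack exposes the same semantics via the `BF.ADD`/`BF.EXISTS` commands.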
To ensure high availability and responsiveness under load, the architecture implements several advanced patterns:
- Challenge: Standard WebSocket connections are stateful: if User A connects to Server Instance 1 and User B connects to Server Instance 2, they cannot exchange messages directly.
- Solution: The `message-service` utilizes Redis Pub/Sub. When a message is received on any WebSocket instance, it is published to a Redis channel; all instances subscribe to this channel and deliver the message to whichever recipients are connected to them, allowing the WebSocket layer to scale horizontally behind the Gateway.
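The fan-out pattern can be simulated in-memory. In the sketch below the `Broker` object stands in for a Redis channel, and the maps stand in for WebSocket session registries; all names are illustrative, not the project's actual code:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// In-memory simulation of the Redis Pub/Sub fan-out: every instance
// publishes to a shared channel, and each instance delivers only to
// recipients whose sockets live on that instance.
class Broker {
    private final List<ChatInstance> subscribers = new CopyOnWriteArrayList<>();
    void subscribe(ChatInstance i) { subscribers.add(i); }
    void publish(String to, String text) {
        for (ChatInstance i : subscribers) i.onChannelMessage(to, text);
    }
}

class ChatInstance {
    // userId -> last message delivered over this instance's "WebSocket".
    final Map<String, String> delivered = new ConcurrentHashMap<>();
    private final Map<String, Boolean> localSessions = new ConcurrentHashMap<>();
    private final Broker broker;

    ChatInstance(Broker broker) { this.broker = broker; broker.subscribe(this); }

    void connect(String userId) { localSessions.put(userId, true); }

    // A user connected to THIS instance sends a message to someone.
    void send(String to, String text) { broker.publish(to, text); }

    // Fan-out callback: deliver only if the recipient's socket lives here.
    void onChannelMessage(String to, String text) {
        if (localSessions.containsKey(to)) delivered.put(to, text);
    }
}
```

Because every instance receives every published message, no instance needs to know where a recipient's socket lives; new instances simply subscribe and start receiving traffic.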
- Challenge: Writing every chat message to a relational database synchronously introduces latency and creates a massive bottleneck during traffic spikes.
- Solution: The `message-service` acts as a producer, immediately acknowledging the message to the user via WebSocket while simultaneously publishing the payload to an Apache Kafka topic.
- The decoupled `persistence-service` consumes this topic at its own pace to perform database inserts, ensuring the real-time chat experience remains instantaneous regardless of database load.
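The write-behind flow above can be sketched with an in-memory queue standing in for the Kafka topic and a list standing in for the database (class and method names are illustrative, not the project's actual code):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the write-behind persistence pattern: the producer side acks
// instantly and hands the payload to a topic; a separate consumer
// persists at its own pace, decoupled from the real-time path.
class WriteBehindSketch {
    static final BlockingQueue<String> topic = new LinkedBlockingQueue<>();
    static final List<String> database = new CopyOnWriteArrayList<>();

    // Producer side (message-service): enqueue and ack without touching the DB.
    static String handleIncoming(String message) {
        topic.add(message);   // publish to the chat-messages "topic"
        return "ACK";         // immediate WebSocket acknowledgement
    }

    // Consumer side (persistence-service): drain the topic and insert.
    static void runConsumerOnce() {
        String msg;
        while ((msg = topic.poll()) != null) {
            database.add(msg);   // stand-in for a database insert
        }
    }
}
```

The key property is visible in the ordering: the acknowledgement returns before any "insert" happens, so a slow database only delays history persistence, never message delivery.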
Operating a complex microservices architecture requires robust telemetry. The platform integrates the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) via OpenTelemetry (OTel).
- Distributed Tracing (Tempo): Every request entering the Traefik/Spring Gateway is assigned a trace ID, allowing deep visualization of a request's lifecycle as it traverses Eureka, the Auth service, and the Kafka message brokers.
- Metrics (Mimir) & Logs (Loki): JVM metrics, Kafka consumer lag, and centralized logs are ingested and visualized on a unified Grafana dashboard.
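Wiring a service into an OTel pipeline like this is largely environment configuration. A Compose-style fragment as a sketch (the collector hostname, port, and service name are placeholders, not the project's actual values):

```yaml
# Illustrative OTel configuration for one Spring service; the collector
# then forwards traces to Tempo, metrics to Mimir, and logs to Loki.
environment:
  OTEL_SERVICE_NAME: message-service
  OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
  OTEL_TRACES_EXPORTER: otlp
  OTEL_METRICS_EXPORTER: otlp
  OTEL_LOGS_EXPORTER: otlp
```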
The system utilizes a highly normalized relational database schema to ensure data integrity across the messaging and user domains. To support advanced features like massive group channels and real-time read receipts, the schema is optimized for both complex joins and rapid state updates.
The deployment lifecycle and underlying infrastructure are entirely self-managed, emphasizing a modern, cloud-native DevOps approach.
- Production Server: Hosted on a self-managed Oracle Cloud Infrastructure (OCI) Ampere instance (4-core ARM vCPUs, 24GB RAM, 200GB Block Storage) running Ubuntu 24.04 LTS.
- Container Orchestration: The entire 11-service architecture—including backing data stores like MinIO, Kafka, Redis, MySQL, and the complete Grafana/OpenTelemetry observability stack—is containerized and managed natively via Docker Compose on the OCI instance.
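The label-based routing from Traefik to the gateway mentioned earlier looks roughly like this in Compose form; the image names, hostname, and port are placeholders rather than the project's actual manifests:

```yaml
# Illustrative fragment: Traefik discovers the gateway via Docker labels.
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
    ports: ["80:80"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  gateway:
    image: example/realtime-chatapp-gateway-service:latest
    labels:
      - traefik.enable=true
      - traefik.http.routers.gateway.rule=Host(`chat.example.com`)
      - traefik.http.services.gateway.loadbalancer.server.port=8080
```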
A dedicated, self-hosted Jenkins server automates the build, test, and release lifecycle to guarantee production reliability:
- Continuous Integration: On every code push, Jenkins compiles the Spring Boot 4 microservices and executes isolated JUnit and Mockito test suites.
- Integration Testing: The pipeline utilizes the Testcontainers library to spin up ephemeral MySQL databases, Redis caches, and Kafka brokers, validating the distributed event logic in an environment that strictly mirrors production.
- Multi-Architecture Builds: Because the OCI production server uses an ARM architecture, the Jenkins pipeline leverages `docker buildx` to compile multi-platform container images (ARM64 and AMD64), pushing the final, production-ready artifacts to an upstream Docker container registry.
- Continuous Delivery (Watchtower): A Watchtower container actively monitors the upstream Docker registry for changes. Upon detecting a newly pushed image tag from the CI pipeline, Watchtower automatically pulls the update and gracefully redeploys the target containers, ensuring seamless, zero-downtime updates.
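A condensed sketch of such a pipeline (the stage names, registry host, build tool invocation, and image tag are assumptions, not the project's actual Jenkinsfile):

```groovy
// Illustrative Jenkinsfile: run tests (including Testcontainers-backed
// integration tests), then cross-compile and push multi-arch images.
pipeline {
  agent any
  stages {
    stage('Test') {
      steps {
        sh './mvnw verify'   // JUnit/Mockito + Testcontainers suites
      }
    }
    stage('Multi-Arch Build & Push') {
      steps {
        sh '''
          docker buildx build \
            --platform linux/amd64,linux/arm64 \
            -t registry.example.com/chatapp/message-service:${BUILD_NUMBER} \
            --push .
        '''
      }
    }
  }
}
```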
This distributed platform was architected and developed in tight collaboration with the organization owners under the Shumisoft GitHub umbrella.
Operating as a highly focused, two-person engineering team, we optimized our Software Development Life Cycle (SDLC) for rapid iteration, avoiding heavy bureaucratic overhead in favor of velocity and strict task ownership:
- Agile Methodology: We utilized Trello for comprehensive task management. By organizing our workflow into focused sprints, we were able to strictly define the Minimum Viable Product (MVP) and systematically roll out complex architectural additions (like transitioning to Kafka for asynchronous persistence).
- Cross-Functional Ownership: To maintain high-bandwidth communication, both engineers took full-stack ownership. We collaborated closely on the overarching system design, the Angular standalone UI, the Spring Boot microservices, and the underlying Jenkins/Docker infrastructure.
