omarmohammed271/omarmohammed271

👋 Hi, I'm Omar Mohammed

Senior Data Engineer | Real-Time & Batch Data Architect


🧱 About Me

  • Passionate about building scalable data pipelines using Spark, Kafka, and modern data platforms.
  • Experienced in both cloud-native architectures (AWS, Azure) and on-premise systems.
  • Strong believer in the power of clean code, observability, and data quality.

💼 What I Do

  • Design and develop ETL and ELT pipelines (batch + streaming).
  • Implement data lakehouse architectures (Bronze / Silver / Gold stages).
  • Build real-time processing systems using Spark Structured Streaming & Kafka.
  • Automate workflows and scheduling with Airflow.
  • Optimize analytics databases (e.g. ClickHouse) for fast query performance.
  • Create CI/CD pipelines for data solutions (using GitHub Actions or similar).

🌐 Tech Stack

| Domain | Technologies |
|---|---|
| Data Processing | PySpark, Spark SQL, Delta Lake |
| Streaming | Apache Kafka, Spark Structured Streaming |
| Workflow Orchestration | Apache Airflow |
| Data Storage | S3 / ADLS, Delta / Parquet |
| Analytics | ClickHouse, PostgreSQL |
| Cloud | AWS, Azure |
| CI / CD | GitHub Actions |
| Languages | Python, SQL |

🚀 Featured Projects

Here are some of my key repositories (feel free to click and explore):

  • [Real-Time Processing Pipeline]: A Kafka → Spark Streaming system with schema validation and data quality checks.
  • [Lakehouse Architecture Demo]: Multi-layer (Bronze / Silver / Gold) data lakehouse built with Delta Lake.
  • [Airflow Data Workflows]: End-to-end DAGs for ingestion, transformation, and orchestration.
  • [Analytics in ClickHouse]: Setup for real-time analytics using ClickHouse materialized views.
  • [CI/CD for Data Jobs]: GitHub Actions to test, build, and deploy data workloads.

📈 GitHub Stats

Omar's GitHub stats


📫 Get in Touch


⚡ Fun Facts

  • I love optimizing pipelines: every millisecond matters.
  • Outside work: I enjoy reading about distributed systems and data infrastructure.
  • Lifelong learner: currently exploring feature stores and ML data platforms.

🌐 Socials:

LinkedIn

💻 Tech Stack:

🧱 Data Engineering

Spark, Kafka, Airflow, ClickHouse, Trino, Hadoop


🐍 Programming

Python, Scala, Java, SQL


☁ Cloud & DevOps

AWS, Azure, Docker, Kubernetes, GitHub Actions, Git, GitHub, GitLab


🗄 Databases & Warehousing

PostgreSQL, MySQL, MongoDB, Redis, Delta Lake, S3


📊 Data Science / ML Tools

NumPy, Pandas, scikit-learn, PyTorch, Matplotlib

