Data Engineer | Technical Writer | Building Scalable, Reliable Data Pipelines | Cloud & Workflow Automation
Passionate about modern data stack tooling, I specialize in building production-ready data pipelines with Python and emerging frameworks like dlt (data load tool). My focus is on clean, maintainable ingestion, reliable orchestration, and cloud-native data workflows.
- Core scripting and advanced querying for data engineering workflows.
- Building robust batch and real-time data ingestion pipelines at scale.
- Cloud data architecture and modern data warehousing solutions.
- Orchestrating reliable, production-grade data workflows.
- Data Ingestion: dlt (data load tool), PySpark
- Orchestration & Workflow: Apache Airflow, Kestra, Prefect
- Data Transformation: dbt, SQL
- Infrastructure & Deployment: Docker, Terraform
- CI/CD: GitHub Actions
- Version Control: Advanced Git
- Monitoring & Reliability: Structured logging, pipeline health checks, alerting (see the sketch after this list)
- Documentation: Pipeline lineage, runbooks, data dictionaries
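
A minimal flavor of the monitoring approach above: one JSON object per log line, plus a health check that fails loudly. Event names and the zero-row threshold are illustrative, not taken from a specific project.

```python
import json
import logging
import sys
import time

# One JSON object per line: trivially parseable by any log aggregator.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


def emit(event: str, **fields) -> None:
    log.info(json.dumps({"event": event, "ts": time.time(), **fields}))


rows_loaded = 1_000  # in a real run, returned by the load step
emit("load_finished", rows=rows_loaded)

# Health check: treat a zero-row load as a failure so alerting can key
# off the non-zero exit code.
if rows_loaded == 0:
    emit("health_check_failed", reason="zero rows loaded")
    sys.exit(1)
```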
Built a production-ready dlt (data load tool) pipeline to ingest, normalize, and load NYC taxi trip data into a cloud data warehouse.
Impact: Automated end-to-end data loading with schema inference, incremental loading, and built-in data quality checks.
Key Challenge: Handling schema evolution across different taxi dataset versions while maintaining idempotent, reliable loads.
Stack: Python · dlt · SQL · GitHub Actions
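
A minimal sketch of the pattern (not the project's actual code), assuming a hypothetical paginated endpoint and a `trip_id` primary key. The merge write disposition plus an incremental cursor is what keeps re-runs idempotent, while dlt's schema inference absorbs drift between dataset versions.

```python
import dlt
from dlt.sources.helpers import requests  # dlt's requests wrapper with retries


@dlt.resource(name="taxi_trips", write_disposition="merge", primary_key="trip_id")
def taxi_trips(
    cursor=dlt.sources.incremental("pickup_datetime", initial_value="2024-01-01T00:00:00"),
):
    # Hypothetical paginated endpoint serving trip records as JSON.
    url = "https://example.com/nyc-taxi/trips"
    page = 1
    while True:
        resp = requests.get(url, params={"page": page, "since": cursor.start_value})
        rows = resp.json()
        if not rows:
            break
        yield rows  # dlt infers (and evolves) the schema from these records
        page += 1


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="nyc_taxi",
        destination="duckdb",  # swap for a cloud warehouse destination in production
        dataset_name="taxi_data",
    )
    print(pipeline.run(taxi_trips))
```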
Designed and implemented a modular dlt-based analytics engineering pipeline with structured ingestion layers and transformation workflows.
Impact: Reduced manual data wrangling effort with automated schema management, enabling clean separation between ingestion and transformation layers.
Key Challenge: Structuring incremental pipeline runs that are both efficient and replayable from any checkpoint.
Stack: Python · dlt · dbt · SQL · Jupyter Notebook
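
A sketch of how that ingestion/transformation split can look using dlt's dbt runner; the dbt project path and table names here are hypothetical. dlt persists incremental state alongside the loaded data, which is what lets a run resume from its last checkpoint instead of reloading everything.

```python
import dlt

# Ingestion layer: dlt lands raw records in a staging dataset and manages the schema.
pipeline = dlt.pipeline(
    pipeline_name="analytics",
    destination="duckdb",
    dataset_name="staging",
)
raw_events = [{"event_id": 1, "kind": "signup"}, {"event_id": 2, "kind": "login"}]
pipeline.run(raw_events, table_name="events",
             write_disposition="merge", primary_key="event_id")

# Transformation layer: a dbt project (hypothetical path) builds models
# on top of the staged tables; requires dbt to be installed.
dbt = dlt.dbt.package(pipeline, "dbt_analytics")
for model in dbt.run_all():
    print(model.model_name, model.status)
```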
Leveraged PySpark to process and analyze large-scale datasets using distributed computing techniques.
Impact: Turned raw, large-scale datasets into structured, analysis-ready formats using distributed processing.
Key Challenge: Optimizing Spark jobs for performance while maintaining code clarity and reproducibility in Jupyter notebooks.
Stack: PySpark · Python · Jupyter Notebook
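
A minimal sketch of that kind of job, assuming a Parquet dataset with `pickup_datetime` and `total_amount` columns and illustrative storage paths. Keeping the whole transformation as one lazy DataFrame expression lets Spark's optimizer plan the shuffle once, and it stays readable in a notebook.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_trip_metrics").getOrCreate()

# Hypothetical input location and column names.
trips = spark.read.parquet("s3a://my-bucket/raw/trips/")

# Lazily built transformation; Spark only executes on the write below.
daily = (
    trips
    .withColumn("pickup_date", F.to_date("pickup_datetime"))
    .groupBy("pickup_date")
    .agg(
        F.count("*").alias("trip_count"),
        F.round(F.sum("total_amount"), 2).alias("revenue"),
    )
)

# Columnar output keeps downstream reads cheap.
daily.write.mode("overwrite").parquet("s3a://my-bucket/marts/daily_trips/")
```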
Open to Remote & Hybrid Opportunities
GitHub: github.com/Derrick-Ryan-Giggs
Open to collaborating on interesting data infrastructure projects and discussions about data engineering, cloud architecture, and modern data stack tooling!
Last Updated: 2026-03-29 17:56:14
