Cloud Data Engineer — Project Repository

Welcome to the Cloud Data Engineer lab, a portfolio-style repository that showcases real-world data engineering and data science projects built on cloud platforms such as AWS and GCP and on open public data.

This repository is structured to reflect professional development practices, with each subproject demonstrating specific cloud pipelines, automation techniques, and scalable architectures.

Purpose

To consolidate hands-on cloud projects that:

Showcase practical skills in data ingestion, transformation, and storage
Leverage public APIs and cloud services (e.g., AWS S3, BigQuery, Cloud Functions, Dataproc)
Apply best practices in modular development, reproducibility, and documentation
Serve as a portfolio for recruiters and technical leads

Subprojects

🔹 ETL: ANEEL Complaint Data to AWS

Extracts consumer complaint data from the ANEEL public API, transforms the dataset using Pandas, and uploads it to an AWS S3 bucket.

Technologies: Python, AWS S3, Boto3, Poetry, Taskipy
Focus: REST API ingestion, transformation, cloud-native ETL pipeline
Automation: Run and deploy with the Taskipy commands task run and task deploy
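The extract, transform, and upload steps described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: it assumes the complaint data is served through a CKAN-style datastore_search endpoint that pages with limit and offset, and API_URL, RESOURCE_ID, BUCKET, and KEY are hypothetical placeholders. The HTTP session and the S3 client are passed in as arguments so each step can be exercised without network access.

```python
"""ETL sketch for the ANEEL complaint data (illustrative, not the repo's code).

Assumptions: a CKAN-style datastore_search endpoint that pages with
limit/offset; API_URL, RESOURCE_ID, BUCKET and KEY are hypothetical placeholders.
"""
import io

import pandas as pd

API_URL = "https://<aneel-open-data-host>/api/3/action/datastore_search"  # hypothetical
RESOURCE_ID = "<complaints-resource-id>"  # hypothetical
BUCKET = "<your-bucket>"  # hypothetical
KEY = "aneel/complaints.csv"  # hypothetical


def extract(session, url: str, resource_id: str, page_size: int = 1000) -> list:
    """Fetch all records, one page at a time, using a requests-style session."""
    records, offset = [], 0
    while True:
        params = {"resource_id": resource_id, "limit": page_size, "offset": offset}
        resp = session.get(url, params=params, timeout=30)
        resp.raise_for_status()
        result = resp.json()["result"]
        batch = result["records"]
        records.extend(batch)
        offset += len(batch)
        total = result.get("total")
        # Stop on an empty page or once the reported total has been fetched.
        if not batch or (total is not None and offset >= total):
            return records


def transform(records: list) -> pd.DataFrame:
    """Tabulate records, normalize column names, drop empty and duplicate rows."""
    df = pd.DataFrame(records)
    if df.empty:
        return df
    # CKAN's datastore adds a per-row "_id"; drop it so duplicates can be detected.
    df = df.drop(columns=["_id"], errors="ignore")
    # Normalize headers to snake_case, e.g. "Data Reclamacao " -> "data_reclamacao".
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"\W+", "_", regex=True)
        .str.strip("_")
    )
    return df.dropna(how="all").drop_duplicates().reset_index(drop=True)


def load(df: pd.DataFrame, s3_client, bucket: str, key: str) -> None:
    """Serialize the frame to CSV and upload it as a single S3 object."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    s3_client.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue().encode("utf-8"))


if __name__ == "__main__":
    import boto3  # credentials come from the standard AWS credential chain
    import requests

    with requests.Session() as session:
        rows = extract(session, API_URL, RESOURCE_ID)
    load(transform(rows), boto3.client("s3"), BUCKET, KEY)
```

Writing one CSV object keeps the sketch simple; a larger export could instead be split by date or written in a columnar format such as Parquet.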

🔹 Hadoop Ecosystem — PySpark Word Count

Runs a distributed word count job using PySpark on Google Cloud Dataproc, processing a text file stored on GCS and saving the sorted word frequencies.

Technologies: PySpark, Google Cloud Dataproc, GCS
Focus: Big data processing with the Hadoop ecosystem
Execution: gcloud dataproc jobs submit pyspark ...
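The word count follows the classic map and reduce pattern, which can be sketched with the PySpark RDD API. This is an illustrative version, not the repository's script: the tokenization (lowercased runs of word characters), the sort order (most frequent first, ties broken alphabetically), and passing the input and output gs:// paths as command-line arguments are all assumptions. The Spark import is deferred to the entry point so the pure-Python helpers can be tested without a cluster.

```python
"""Word count sketch for Dataproc (assumed layout, not the repository's script).

Usage (paths are hypothetical): word_count.py gs://<bucket>/input.txt gs://<bucket>/output
"""
import re
import sys
from collections import Counter
from operator import add

# Assumed tokenization: lowercased runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(line: str) -> list:
    """Split a line into lowercase word tokens."""
    return TOKEN_RE.findall(line.lower())


def count_words_local(lines) -> list:
    """Pure-Python reference of the same map/reduce, handy for unit tests."""
    counts = Counter(word for line in lines for word in tokenize(line))
    # Most frequent first; alphabetical order makes ties deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def run(spark, input_path: str, output_path: str) -> None:
    """Distributed word count over a text file using the RDD API."""
    (
        spark.sparkContext.textFile(input_path)
        .flatMap(tokenize)                   # map: line -> words
        .map(lambda word: (word, 1))         # pair each word with a count of 1
        .reduceByKey(add)                    # reduce: sum the counts per word
        .sortBy(lambda kv: (-kv[1], kv[0]))  # same ordering as the local reference
        .saveAsTextFile(output_path)         # one part file per partition
    )


if __name__ == "__main__":
    from pyspark.sql import SparkSession  # provided on Dataproc clusters

    spark = SparkSession.builder.appName("word-count").getOrCreate()
    try:
        run(spark, sys.argv[1], sys.argv[2])
    finally:
        spark.stop()
```

With gcloud dataproc jobs submit pyspark, job arguments such as the two paths go after a -- separator. Note that saveAsTextFile fails if the output directory already exists, so each run needs a fresh output path.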

Tools & Technologies

Cloud Platforms: AWS (S3), GCP (Dataproc, GCS)
Languages & Frameworks: Python, PySpark
Package & Task Management: Poetry, Taskipy
Libraries: Pandas, Requests, Boto3

How to Use

Clone the repository:

git clone https://github.com/leticiagcsilva/Cloud_Data_Engineer.git
cd Cloud_Data_Engineer
