⚽ ⚽ ⚽ Liga Młodych Talentów (The Young Talent League) is a football competition held across six locations in Poland. Matches are played every fortnight over five rounds. The winter season came to an end last weekend, with just under 600 teams competing in almost 4,500 matches across 23 leagues. Results and league standings are currently maintained in Google Sheets, with a publicly available URL for each city. ⚽ ⚽ ⚽
My aim was to bring all of this data together in one place so that users can see results and standings in real time.
From raw data to insights, here’s what I did:
- ✅ Pulled raw data from six Google Sheets URLs
- ✅ Cleaned & transformed it using Apache Spark in Databricks 🔥
- ✅ Followed the Medallion architecture (Bronze → Silver → Gold)
- ✅ Stored the processed data in AWS S3 (Parquet format) 🏗️
- ✅ Built a Streamlit app to serve insights in real-time! 🎨📊
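
To make the Bronze → Silver → Gold flow above more concrete, here is a minimal sketch of the three layers. It is an illustration only, not the code in this repo: it uses the Python standard library instead of Spark so it runs anywhere, and the column names, the city and team names, the sample rows and the 3/1/0 points rule are all assumptions, since the real sheet layout is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of one city's sheet (made-up columns and rows).
RAW_CSV = """city,league,home_team,away_team,home_goals,away_goals
Warszawa,U10,Orly, Sokoly ,3,1
Warszawa,U10,Sokoly,Wilki,2,2
Warszawa,U10,Wilki,Orly,,
"""


def bronze(raw_text):
    """Bronze: keep the rows exactly as received (every value is a string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: trim whitespace, cast goals to int, drop matches without a score."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if not row["home_goals"] or not row["away_goals"]:
            continue  # match not played yet
        row["home_goals"] = int(row["home_goals"])
        row["away_goals"] = int(row["away_goals"])
        cleaned.append(row)
    return cleaned


def gold(matches):
    """Gold: one standings row per team, assuming 3/1/0 points for win/draw/loss."""
    table = defaultdict(lambda: {"played": 0, "points": 0, "goal_diff": 0})
    for m in matches:
        sides = [
            (m["home_team"], m["home_goals"], m["away_goals"]),
            (m["away_team"], m["away_goals"], m["home_goals"]),
        ]
        for team, scored, conceded in sides:
            entry = table[(m["city"], m["league"], team)]
            entry["played"] += 1
            entry["goal_diff"] += scored - conceded
            entry["points"] += 3 if scored > conceded else 1 if scored == conceded else 0
    rows = [
        {"city": city, "league": league, "team": team, **stats}
        for (city, league, team), stats in table.items()
    ]
    # Best team first within each city and league.
    return sorted(
        rows,
        key=lambda r: (r["city"], r["league"], -r["points"], -r["goal_diff"], r["team"]),
    )


standings = gold(silver(bronze(RAW_CSV)))
for row in standings:
    print(row)
```

In the actual pipeline each layer would presumably be a Spark DataFrame written to S3 as Parquet rather than an in-memory list, but the division of work between the layers is the same idea.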
The pipeline scripts are included in this repo under the Bronze, Silver and Gold folders.
Live Streamlit app: https://liga-mt-cloud-development-mode-stephen-barrie.streamlit.app/
