Only mlb_parser_hit.py and mlb_parser_ranking.py matter in this project; the other files are trivial.
For now, this project is an Airflow DAG that collects MLB data every day, plus an API that serves the collected data.
Composition:
The project comprises three parts: parsing data from the MLB website, committing the data to a database, and serving the data via an API.
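The three parts above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the function names (`parse_batters`, `commit_batters`, `query_batters`), the `batter` table layout, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without any setup.

```python
import sqlite3

# Hypothetical raw rows, standing in for whatever the MLB site parser returns.
# The players and numbers are placeholders, not real statistics.
RAW_ROWS = [
    {"player": "Player A", "team": "AAA", "ab": "400", "h": "120", "hr": "25"},
    {"player": "Player B", "team": "BBB", "ab": "380", "h": "95", "hr": "31"},
]


def parse_batters(raw_rows):
    """Part 1: convert raw string fields into typed tuples."""
    return [
        (r["player"], r["team"], int(r["ab"]), int(r["h"]), int(r["hr"]))
        for r in raw_rows
    ]


def commit_batters(conn, rows):
    """Part 2: store the parsed rows (SQLite here; the project uses PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batter "
        "(player TEXT PRIMARY KEY, team TEXT, ab INTEGER, h INTEGER, hr INTEGER)"
    )
    # INSERT OR REPLACE keeps a daily re-run from duplicating a player.
    conn.executemany("INSERT OR REPLACE INTO batter VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def query_batters(conn, min_hr=0):
    """Part 3: what an API endpoint might return for a simple condition."""
    cur = conn.execute(
        "SELECT player, team, hr FROM batter WHERE hr >= ? ORDER BY hr DESC",
        (min_hr,),
    )
    return [{"player": p, "team": t, "hr": hr} for p, t, hr in cur]


conn = sqlite3.connect(":memory:")
commit_batters(conn, parse_batters(RAW_ROWS))
result = query_batters(conn, min_hr=30)
print(result)
```

Keeping the three parts as separate functions mirrors the split described above, so each one could be wrapped in its own scheduled task or API handler.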
Tools:
Airflow, Docker and PostgreSQL are used to collect the data and send it to the database.
Goals:
Eventually, I want to use the data collected day by day to train a model that predicts the MVP (Most Valuable Player) of MLB (Major League Baseball), as I have already done for the NBA (National Basketball Association). According to the website's records, most MVP winners are batters, which is why I only parse batter-related data.
DAG that collects player data
DAG that updates team rankings and grades
Tables in the database
ER model of these tables
Team grades
Batter ranking ordered by AVG
Batter ranking ordered by HR
Raw batter data, without an advanced query to narrow the result
API response with the information matching the query condition
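As a rough sketch of how the AVG and HR rankings listed above could be computed, the snippet below sorts batters by each statistic. The `Batter` class, its field names, and the sample values are hypothetical and do not come from the project's schema; batting average is taken as hits divided by at-bats.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    # Hypothetical fields; the real tables may use different column names.
    name: str
    at_bats: int
    hits: int
    home_runs: int

    @property
    def avg(self) -> float:
        # Batting average = hits / at-bats; 0.0 when there are no at-bats.
        return self.hits / self.at_bats if self.at_bats else 0.0


def rank_by(batters, key):
    """Return batters sorted in descending order of the given statistic."""
    return sorted(batters, key=key, reverse=True)


# Placeholder players, not real statistics.
batters = [
    Batter("Player A", at_bats=400, hits=120, home_runs=25),
    Batter("Player B", at_bats=380, hits=95, home_runs=31),
    Batter("Player C", at_bats=0, hits=0, home_runs=0),
]

by_avg = [b.name for b in rank_by(batters, lambda b: b.avg)]
by_hr = [b.name for b in rank_by(batters, lambda b: b.home_runs)]
print(by_avg)
print(by_hr)
```

In the project itself the ordering is presumably done by the database query; doing it in Python here just keeps the example self-contained.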
