
DataTwins2023/MlbParserAirflow


MlbParserAirflow

Only mlb_parser_hit.py and mlb_parser_ranking.py are essential to this project; the other files are trivial.

Description

For now, this project is an Airflow DAG that collects MLB data every day, plus an API that serves the collected data.

Project goals and tools

Components:

The project comprises three segments: parsing data from the MLB website, committing the data to a database, and serving the data through an API.
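A minimal sketch of the parsing segment, using only Python's standard library. It assumes the stats page renders batters as a plain HTML `<table>` whose first row holds the column headers. The markup, the column names, and the `BatterTableParser`/`parse_batters` names are hypothetical, and the real mlb_parser_hit.py may fetch and parse the page differently.

```python
from html.parser import HTMLParser


class BatterTableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep only text inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_batters(html):
    """Use the first row as headers and turn every other row into a dict."""
    parser = BatterTableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Illustrative markup only; not taken from the MLB website.
sample = """
<table>
  <tr><th>PLAYER</th><th>TEAM</th><th>AB</th><th>H</th><th>HR</th></tr>
  <tr><td>Player A</td><td>AAA</td><td>500</td><td>150</td><td>30</td></tr>
  <tr><td>Player B</td><td>BBB</td><td>400</td><td>100</td><td>12</td></tr>
</table>
"""
print(parse_batters(sample))
```

Values come back as strings, so converting AB, H and HR to integers before loading them keeps the database columns numeric.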

Tools:

Airflow, Docker and PostgreSQL are used to collect the data and send it to the database.
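For the database segment, one way to keep a daily DAG safe to rerun is an upsert keyed on player and collection date, so a retried run updates rows instead of duplicating them. The sketch below uses `sqlite3` as an in-memory stand-in for PostgreSQL; the `batter_daily` table and its columns are hypothetical. PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` statement, although with psycopg2 the placeholders would be written `%(player)s` rather than `:player`.

```python
import sqlite3
from datetime import date


def create_schema(conn):
    # Hypothetical table: one row per batter per collection day.
    # The composite primary key is what makes reruns idempotent.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS batter_daily (
            player   TEXT    NOT NULL,
            stat_day TEXT    NOT NULL,
            ab       INTEGER NOT NULL,
            h        INTEGER NOT NULL,
            hr       INTEGER NOT NULL,
            PRIMARY KEY (player, stat_day)
        )
        """
    )


def upsert_batters(conn, rows, stat_day):
    # ON CONFLICT ... DO UPDATE works in SQLite 3.24+ and in PostgreSQL.
    conn.executemany(
        """
        INSERT INTO batter_daily (player, stat_day, ab, h, hr)
        VALUES (:player, :stat_day, :ab, :h, :hr)
        ON CONFLICT (player, stat_day) DO UPDATE SET
            ab = excluded.ab, h = excluded.h, hr = excluded.hr
        """,
        [dict(r, stat_day=stat_day.isoformat()) for r in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_schema(conn)
day = date(2023, 12, 4)
upsert_batters(conn, [{"player": "Player A", "ab": 500, "h": 150, "hr": 30}], day)
# Rerunning the same day updates the existing row instead of inserting a duplicate.
upsert_batters(conn, [{"player": "Player A", "ab": 503, "h": 151, "hr": 30}], day)
print(conn.execute("SELECT player, ab, h FROM batter_daily").fetchall())
```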

Goals:

Eventually, I hope to use the data collected day by day to train a model that predicts the MVP (Most Valuable Player) of MLB (Major League Baseball), as I have already done for the NBA (National Basketball Association). According to the records on the website, most MVPs have been batters, which is why I only parse batter-related data.

Screenshot 2023-12-04 at 12.48.13 PM: DAG used to collect player data

Screenshot 2023-12-04 at 12.48.37 PM: DAG that updates team rankings and records
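The README does not show how the ranking DAG computes standings. As an illustration only, a division table is commonly ordered by winning percentage, W / (W + L), with games behind the leader computed as ((leader W - team W) + (team L - leader L)) / 2. The `TeamRecord` and `rank_division` names and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamRecord:
    team: str
    wins: int
    losses: int

    @property
    def pct(self):
        # Winning percentage = W / (W + L); 0.0 before any games are played.
        games = self.wins + self.losses
        return self.wins / games if games else 0.0


def rank_division(records):
    """Order teams by winning percentage and compute games behind the leader."""
    ordered = sorted(records, key=lambda r: r.pct, reverse=True)
    leader = ordered[0]
    table = []
    for rank, r in enumerate(ordered, start=1):
        # Games behind = ((leader W - team W) + (team L - leader L)) / 2
        gb = ((leader.wins - r.wins) + (r.losses - leader.losses)) / 2
        table.append((rank, r.team, round(r.pct, 3), gb))
    return table


division = [TeamRecord("BBB", 90, 72), TeamRecord("AAA", 101, 61), TeamRecord("CCC", 78, 84)]
for row in rank_division(division):
    print(row)
```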

Screenshot 2023-12-04 at 12.30.40 PM: tables in the database

Screenshot 2023-12-04 at 12.31.33 PM: ER model of these tables
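The ER model itself is only available as a screenshot, so the schema below is a hypothetical two-table layout meant to show how such a relationship is typically expressed: each `batter` row references its `team` through a foreign key, and a join recovers the team record for a player. It uses `sqlite3` in place of PostgreSQL; the table and column names are assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical layout: each batter row points at its team via team_id.
SCHEMA = """
CREATE TABLE team (
    team_id INTEGER PRIMARY KEY,
    name    TEXT    NOT NULL UNIQUE,
    wins    INTEGER NOT NULL,
    losses  INTEGER NOT NULL
);
CREATE TABLE batter (
    batter_id INTEGER PRIMARY KEY,
    name      TEXT    NOT NULL,
    team_id   INTEGER NOT NULL REFERENCES team (team_id),
    ab        INTEGER NOT NULL,
    h         INTEGER NOT NULL,
    hr        INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO team VALUES (1, 'AAA', 101, 61)")
conn.execute("INSERT INTO batter VALUES (1, 'Player A', 1, 500, 150, 30)")

# Join a batter with the record of the team it references.
row = conn.execute(
    """
    SELECT b.name, t.name, t.wins, t.losses
    FROM batter AS b
    JOIN team AS t ON t.team_id = b.team_id
    """
).fetchone()
print(row)
```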

Screenshots of each table (LIMIT 10)

Screenshot 2023-12-04 at 12.56.07 PM: team records

Screenshot 2023-12-04 at 12.57.23 PM: batter rankings ordered by AVG (batting average)

Screenshot 2023-12-04 at 12.57.59 PM: batter rankings ordered by HR (home runs)
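Rankings like the two above can be produced with a window function. This sketch assumes AVG is computed as hits divided by at-bats (H / AB) and uses `RANK()`, which both SQLite (3.25+) and PostgreSQL support, so tied players share a position. The `batter` table, its columns, the sample rows, and `top_batters` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batter (name TEXT, ab INTEGER, h INTEGER, hr INTEGER)")
conn.executemany(
    "INSERT INTO batter VALUES (?, ?, ?, ?)",
    [("Player A", 500, 150, 30), ("Player B", 400, 130, 12), ("Player C", 550, 140, 41)],
)


def top_batters(conn, order_by, limit=10):
    """Rank batters by AVG (H / AB) or by HR; RANK() gives tied players the same position."""
    # Only these whitelisted expressions are ever interpolated into the SQL text.
    expressions = {"AVG": "CAST(h AS REAL) / ab", "HR": "hr"}
    expr = expressions[order_by]
    sql = f"""
        SELECT RANK() OVER (ORDER BY {expr} DESC) AS rnk,
               name,
               ROUND(CAST(h AS REAL) / ab, 3) AS batting_avg,
               hr
        FROM batter
        WHERE ab > 0
        ORDER BY rnk
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


for row in top_batters(conn, "AVG"):
    print(row)
for row in top_batters(conn, "HR"):
    print(row)
```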

Screenshot 2023-12-04 at 1.00.55 PM: raw batter data, before any advanced query is applied to extract specific results

Screenshot 2023-12-05 at 10.12.34 PM: information returned by the API for records matching the given conditions
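The README does not say which web framework serves the API, so this sketch covers only the framework-agnostic step: turning query-string conditions into a parameterized SQL query. The `/batters` path, the `team`, `min_hr` and `min_avg` parameters, and the `FILTERS` table are hypothetical. Conditions are whitelisted and each value is converted in Python and bound as a parameter rather than formatted into the SQL string.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping: query-string key -> (SQL condition, converter for the value).
FILTERS = {
    "team": ("team = ?", str),
    "min_hr": ("hr >= ?", int),
    "min_avg": ("CAST(h AS REAL) / ab >= ?", float),
}


def query_batters(conn, url):
    """Translate e.g. /batters?team=AAA&min_hr=20 into a parameterized SELECT."""
    params = parse_qs(urlsplit(url).query)
    clauses, values = ["ab > 0"], []
    for key, (condition, convert) in FILTERS.items():
        if key in params:
            clauses.append(condition)
            values.append(convert(params[key][0]))  # first value wins if a key repeats
    sql = "SELECT name, team, hr FROM batter WHERE " + " AND ".join(clauses) + " ORDER BY name"
    return conn.execute(sql, values).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batter (name TEXT, team TEXT, ab INTEGER, h INTEGER, hr INTEGER)")
conn.executemany(
    "INSERT INTO batter VALUES (?, ?, ?, ?, ?)",
    [
        ("Player A", "AAA", 500, 150, 30),
        ("Player B", "BBB", 400, 130, 12),
        ("Player C", "AAA", 550, 140, 41),
    ],
)
print(query_batters(conn, "/batters?team=AAA&min_hr=35"))
```

A malformed number such as `min_hr=abc` raises `ValueError` from the converter, which a real endpoint would turn into an HTTP 400 response.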
