In this project we have created a data pipeline that combines survey results stored in a database with data obtained through an API and web scraping, in order to enrich those results.
To obtain the survey results for a single country, run:
python main.py -p /data/raw/raw_data_project_m1.db -c Spain
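Below is a minimal sketch of how main.py might parse these flags with the standard argparse module. It assumes `-p` is the path to the survey database and `-c` is an optional country filter, so that omitting it would cover all countries. The long option names `--path` and `--country` are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the pipeline's command-line options (sketch, assumed meanings)."""
    parser = argparse.ArgumentParser(description="Survey enrichment pipeline")
    # -p: path to the survey .db file (assumed meaning of the flag)
    parser.add_argument("-p", "--path", required=True,
                        help="path to the survey database file")
    # -c: optional country filter; None means "all countries" (assumption)
    parser.add_argument("-c", "--country", default=None,
                        help="only report results for this country")
    return parser.parse_args(argv)
```

Passing an explicit list, such as `parse_args(["-p", "data/raw/raw_data_project_m1.db", "-c", "Spain"])`, makes the parser testable without touching `sys.argv`.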
Requirements:
- Python 3.8.5
- pandas==1.1.3
- sqlalchemy==1.3.20
- requests==2.25.1
- seaborn==0.11.0
- numpy==1.19.2
- argparse (part of the Python standard library since 3.2; no separate installation needed)
The pipeline combines three data sources:
- Database tables (.db file) containing the survey results, stored in the data/raw folder.
- API: additional data from the Open Skills Project API.
- Web scraping: country codes retrieved from the Eurostat website.
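Reading the survey tables can be sketched with the standard-library sqlite3 module. This assumes the .db file is a SQLite database (the .db extension and the sqlalchemy dependency suggest so, but it is not stated), and the table name used below is hypothetical. With the listed libraries, `pandas.read_sql_query(query, sqlalchemy.create_engine("sqlite:///" + path))` would be the equivalent call.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple


def load_table(db_path: str, table: str) -> List[Tuple]:
    """Return all rows of `table` from the SQLite file at `db_path`.

    Assumption: the survey .db is a SQLite file; `table` is a
    hypothetical name, since the real table names are not listed here.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # Table names cannot be bound as SQL parameters, so check the
        # requested name against the database's own catalog first.
        known = {
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if table not in known:
            raise ValueError(f"unknown table: {table!r}")
        # Quote the identifier, escaping any embedded double quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        return conn.execute(f"SELECT * FROM {quoted}").fetchall()
```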
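For the web-scraping step, the sketch below extracts code-to-name pairs from already-downloaded HTML using only the standard library. It assumes the Eurostat page lists each country in a table cell as `Name (CODE)`, for example `Belgium (BE)`; check the live page before relying on that pattern. Fetching the page (for instance with `requests.get(url).text`) is left out so the parsing logic can be tested offline.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# Assumed cell format, e.g. "Belgium (BE)"; adjust if the page differs.
CELL_PATTERN = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<code>[A-Z]{2})\)\s*$")


class _CellCollector(HTMLParser):
    """Collect the text content of every <td> element, including nested tags."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def parse_country_codes(html: str) -> Dict[str, str]:
    """Map two-letter country codes to country names found in table cells."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    codes: Dict[str, str] = {}
    for cell in collector.cells:
        match = CELL_PATTERN.match(cell)
        if match:
            codes[match.group("code")] = match.group("name")
    return codes
```

Cells that do not fit the pattern (headers, empty cells, totals) are skipped rather than raising, which keeps the scraper tolerant of extra rows.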
Project structure:

└── project
    ├── .gitignore
    ├── requirements.txt
    ├── README.md
    ├── main.py
    ├── p_acquisition
    │   ├── __init__.py
    │   └── m_acquisition.py
    ├── p_wrangling
    │   ├── __init__.py
    │   └── m_wrangling.py
    ├── p_analysis
    │   ├── __init__.py
    │   └── m_analysis.py
    ├── p_reporting
    │   ├── __init__.py
    │   └── m_reporting.py
    └── data
        ├── raw
        ├── processed
        └── results
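The packages mirror the pipeline stages (acquisition, wrangling, analysis, reporting). A minimal sketch of how main.py could chain such stages follows. The stage functions here are hypothetical stand-ins, not the contents of the m_*.py modules, and the record fields `country_code` and `job` are invented for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda data, stage: stage(data), stages, records)


def make_country_enricher(codes: Dict[str, str]) -> Stage:
    """Hypothetical wrangling stage: add a country name from a code lookup.

    Unknown codes are kept as the country value unchanged.
    """
    def enrich(records: List[Record]) -> List[Record]:
        return [
            {**r, "country": codes.get(r["country_code"], r["country_code"])}
            for r in records
        ]
    return enrich


def make_country_filter(country: str) -> Stage:
    """Hypothetical analysis stage: keep only one country's rows (cf. -c)."""
    def keep(records: List[Record]) -> List[Record]:
        return [r for r in records if r["country"] == country]
    return keep
```

Keeping each stage a plain function means the wrangling and analysis steps can be tested on small in-memory records, without the database, the API, or the network.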
If you have any questions, email me at sarisinhache@gmail.com!
