Lucas Madiedo
IronHack: Data Analysis Bootcamp
The purpose of this project was to clean and unify data from different sources through a pipeline process in order to obtain three different tables. Each of these tables aggregates the combined source data and allows us to draw conclusions from a fully informed point of view.
- Build a professionally structured pipeline
- Use argparse to pass command-line arguments to our script
- Build a working combined dataframe
- Extract the requested information
- Display all the resulting info
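As a minimal sketch of the argparse step, assuming a hypothetical `--country` flag (the real script's argument names may differ):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical flag for illustration; the real pipeline may
    # expose different arguments.
    parser = argparse.ArgumentParser(description="Run the data pipeline")
    parser.add_argument("-c", "--country", default="all",
                        help="country to filter results by, or 'all'")
    return parser.parse_args(argv)

args = parse_args(["--country", "Spain"])
print(args.country)  # Spain
```

Passing `argv=None` lets the same function read `sys.argv` when run from the terminal while staying testable with an explicit list.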
- SQL Database: here you can find the `.db` file
- API: we will use the API from the Open Skills Project
- Web Scraping: finally, we will need to retrieve information about country codes from the Eurostat website
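The SQL acquisition step can be sketched with sqlite3 and pandas; this uses an in-memory database with made-up table and column names so the snippet is self-contained (the project's real tables live in `raw_data_project_m1.db`):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the project's .db file, with invented data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personal_info (uuid TEXT, age INTEGER)")
conn.execute("INSERT INTO personal_info VALUES ('a1', 30), ('b2', 45)")
conn.commit()

# pandas reads any table straight into a DataFrame.
df = pd.read_sql("SELECT * FROM personal_info", conn)
print(df.shape)  # (2, 2)
```

The same pattern is repeated per table, after which each cleaned DataFrame can be written out with `df.to_csv(...)`.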
- argparse
- os
- sqlite3
- pandas
- requests
- BeautifulSoup
- Streamlit
- pandas
- smtplib
- email.message (EmailMessage)
- configparser
├── dashboard.py
├── data
│ ├── processed
│ │ ├── 01_FULL_raw_table.csv
│ │ ├── api_carrer_info_cleaned.csv
│ │ ├── arguments_against_cleaned.csv
│ │ ├── arguments_pro_cleaned.csv
│ │ ├── db_carrer_info_cleaned.csv
│ │ ├── db_countries_info_cleaned.csv
│ │ ├── db_personal_info_cleaned.csv
│ │ ├── db_poll_info_cleaned.csv
│ │ └── ws_countries_info_cleaned.csv
│ ├── raw
│ │ └── raw_data_project_m1.db
│ ├── results
│ │ ├── result_bonus1_procons_args.csv
│ │ ├── result_bonus2.csv
│ │ └── result_challenge1.csv
│ └── sent
│ └── mail_info.csv
├── main_script.py
├── notebooks
│ ├── bonus1.ipynb
│ ├── bonus2.ipynb
│ ├── dashboard bonus1.ipynb
│ ├── mail.ipynb
│ └── reporting.ipynb
├── p_acquisition
│ ├── __init__.py
│ └── m_acquisition.py
├── p_analysis
├── p_reporting
│ ├── __init__.py
│ └── m_reporting.py
├── p_wrangling
│ ├── __init__.py
│ └── m_wrangling.py
├── README.md
├── requirements.txt
└── __trash__
Accesses and cleans all tables in the SQL database, creating a DataFrame and a CSV for each cleaned table
Builds a DataFrame and a CSV file from the web-scraping process
Builds a DataFrame and a CSV from an API request for each job code
Merges our career info DataFrame with the info from the API requests
Merges country info from web scraping and our database
Builds a full DataFrame and CSV with all the info merged (except poll info)
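The merging steps above can be sketched with pandas; the column names here are invented for illustration, not the project's actual schema:

```python
import pandas as pd

# Career info from the database and job titles from the API,
# joined on a shared job-code column (names are hypothetical).
career = pd.DataFrame({"normalized_job_code": ["j1", "j2"],
                       "uuid": ["a1", "b2"]})
api = pd.DataFrame({"normalized_job_code": ["j1", "j2"],
                    "job_title": ["Data Analyst", "Engineer"]})

# Left merge keeps every career row even if the API had no match.
full = career.merge(api, on="normalized_job_code", how="left")
print(full.loc[0, "job_title"])  # Data Analyst
```

Chaining one merge per source table in this way yields the full combined DataFrame that the later steps filter and aggregate.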
- [Challenge 1]: filters our full clean DataFrame according to the chosen arguments, creates new columns, and performs calculations
- [Bonus 1]: counts the number of each type of argument
- [Bonus 2]: makes an API request for each job associated with every education level
Displays all the info on a web-based dashboard.
Allows filtering and working with the data directly on the dashboard.
It is also possible to send the info (all the data or just the filtered subset) by email.
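The email step can be sketched with the standard library's `EmailMessage` and `smtplib`; the addresses, server, and CSV contents below are placeholders, not the project's real configuration:

```python
import smtplib
from email.message import EmailMessage

# Build the message; all addresses here are placeholders.
msg = EmailMessage()
msg["Subject"] = "Pipeline results"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("The filtered results are attached as a CSV file.")

# Attach the results CSV (invented sample data).
csv_bytes = b"country,quantity\nSpain,10\n"
msg.add_attachment(csv_bytes, maintype="text", subtype="csv",
                   filename="results.csv")

# Sending would look like this (not executed in this sketch):
# with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
#     server.login(user, password)
#     server.send_message(msg)
```

Keeping credentials in a config file read with `configparser`, as the library list suggests, avoids hard-coding them in the script.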
- Make dynamic plots and histograms
- Deploy the dashboard to the web
- Add map-based plots
- Improve error handling


