MODULE 1

FINAL PROJECT: PIPELINES AND ANALYSIS ON A DATA PROJECT

Lucas Madiedo
IronHack: Data Analysis Bootcamp



Overview

The purpose of this project is to clean and unify data from different sources through a pipeline process in order to obtain three different tables. Each table aggregates information from these multiple sources, allowing us to draw conclusions from a fully informed point of view.

Goals

  • Build a professionally structured pipeline
  • Use argparse to pass command-line arguments to our script (sketched below)
  • Build a working combined dataframe
  • Extract the requested information
  • Display all the resulting info
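
As a rough illustration of the argparse goal, the entry script might read its options along these lines (the `--country` flag and its behaviour are assumptions for illustration, not necessarily the real interface of main_script.py):

```python
import argparse

def parse_args():
    # Hypothetical CLI; the actual arguments of main_script.py may differ.
    parser = argparse.ArgumentParser(description="Run the data pipeline")
    parser.add_argument("-c", "--country", default="all",
                        help="country to filter the results by (default: all)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Running pipeline for country: {args.country}")
```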

Dataset

  • SQL database: the .db file can be found at data/raw/raw_data_project_m1.db
  • API: we will use the API from the Open Skills Project.
  • Web scraping: finally, we will retrieve information about country codes from the Eurostat website (see the sketch below).
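
As an illustration of the web-scraping source, a minimal sketch using requests and BeautifulSoup (the URL and table layout here are placeholders; the real page and parsing logic may differ):

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL: the actual Eurostat page used by the pipeline may differ.
URL = "https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Country_codes"

response = requests.get(URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Collect (country name, country code) pairs from the table rows on the page.
country_codes = []
for row in soup.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if len(cells) >= 2:
        country_codes.append((cells[0], cells[1]))
```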

Technology stack

Libraries Used

  • argparse
  • os
  • sqlite3
  • pandas
  • requests
  • BeautifulSoup
  • streamlit
  • smtplib
  • email.message (EmailMessage)
  • configparser

Folder structure

.
├── dashboard.py
├── data
│   ├── processed
│   │   ├── 01_FULL_raw_table.csv
│   │   ├── api_carrer_info_cleaned.csv
│   │   ├── arguments_against_cleaned.csv
│   │   ├── arguments_pro_cleaned.csv
│   │   ├── db_carrer_info_cleaned.csv
│   │   ├── db_countries_info_cleaned.csv
│   │   ├── db_personal_info_cleaned.csv
│   │   ├── db_poll_info_cleaned.csv
│   │   └── ws_countries_info_cleaned.csv
│   ├── raw
│   │   └── raw_data_project_m1.db
│   ├── results
│   │   ├── result_bonus1_procons_args.csv
│   │   ├── result_bonus2.csv
│   │   └── result_challenge1.csv
│   └── sent
│       └── mail_info.csv
├── main_script.py
├── notebooks
│   ├── bonus1.ipynb
│   ├── bonus2.ipynb
│   ├── dashboard bonus1.ipynb
│   ├── mail.ipynb
│   └── reporting.ipynb
├── p_acquisition
│   ├── __init__.py
│   └── m_acquisition.py
├── p_analysis
├── p_reporting
│   ├── __init__.py
│   └── m_reporting.py
├── p_wrangling
│   ├── __init__.py
│   └── m_wrangling.py
├── README.md
├── requirements.txt
└── __trash__

Process

Step 1: Getting our data ready @ Data_Acquisition

  • Accesses and cleans all tables in the SQL database, creating a dataframe and a CSV for each cleaned table
  • Builds a dataframe and a CSV file from the web scraping process
  • Builds a dataframe and a CSV from an API request for each job code
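
A minimal sketch of the SQL part of this step, assuming each table is read straight into pandas and dumped to data/processed/ (the cleaning itself is omitted, and table names are whatever the .db file contains; the real m_acquisition.py may work differently):

```python
import sqlite3
import pandas as pd

DB_PATH = "data/raw/raw_data_project_m1.db"

def extract_tables(db_path=DB_PATH):
    # Load every table in the SQLite file into a dataframe and save it as CSV.
    conn = sqlite3.connect(db_path)
    names = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)
    frames = {}
    for name in names["name"]:
        df = pd.read_sql(f"SELECT * FROM {name}", conn)
        df.to_csv(f"data/processed/db_{name}_cleaned.csv", index=False)
        frames[name] = df
    conn.close()
    return frames
```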

Step 2: Putting our data together @ Data_Wrangling

  • Merges our career info dataframe with the info from the API requests
  • Merges the country info from the web scraping and from our database
  • Builds a full dataframe and CSV with all the info merged (except the poll info)
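
A minimal sketch of how these merges might look with pandas (the join keys normalized_job_code, country_code and uuid are assumptions for illustration, not necessarily the real column names):

```python
import pandas as pd

career = pd.read_csv("data/processed/db_carrer_info_cleaned.csv")
api_jobs = pd.read_csv("data/processed/api_carrer_info_cleaned.csv")
db_countries = pd.read_csv("data/processed/db_countries_info_cleaned.csv")
ws_countries = pd.read_csv("data/processed/ws_countries_info_cleaned.csv")
personal = pd.read_csv("data/processed/db_personal_info_cleaned.csv")

# Attach job titles from the API to the career table (join key is assumed).
career_full = career.merge(api_jobs, on="normalized_job_code", how="left")

# Map scraped country names onto the country codes stored in the database.
countries_full = db_countries.merge(ws_countries, on="country_code", how="left")

# Combine everything except the poll info into one table and persist it.
full = personal.merge(career_full, on="uuid").merge(countries_full, on="uuid")
full.to_csv("data/processed/01_FULL_raw_table.csv", index=False)
```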

Step 3: Analysing our data @ Data_Reporting

  • [Challenge 1]: filters our full, clean dataframe according to our choices and performs the column calculations (see the sketch after this list)

  • [Bonus 1]: counts the number of each type of argument

  • [Bonus 2]: makes an API request for each job associated with every education level
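
A minimal sketch of the Challenge 1 style of aggregation, assuming the merged table carries country and job-title columns (all column names here are placeholders, not the project's exact schema):

```python
import pandas as pd

full = pd.read_csv("data/processed/01_FULL_raw_table.csv")

# Example aggregation: respondents per country and job title, plus the share
# of each job within its country. Column names are illustrative only.
result = (
    full.groupby(["country", "job_title"])
        .size()
        .reset_index(name="quantity")
)
result["percentage"] = (
    result["quantity"] / result.groupby("country")["quantity"].transform("sum") * 100
).round(2)

result.to_csv("data/results/result_challenge1.csv", index=False)
```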

Step 4: Sharing our data @ Dashboard

  • Displays all the info on a web-based dashboard
  • Allows filtering and working with the data directly on the dashboard
  • Makes it possible to send the info (all the data or just the filtered subset) by email
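
A minimal sketch of the dashboard and email pieces, assuming Streamlit for the UI and smtplib with EmailMessage for the mail; the SMTP settings file, its keys, and the "country" column are placeholders rather than the project's exact setup:

```python
import smtplib
from email.message import EmailMessage
from configparser import ConfigParser

import pandas as pd
import streamlit as st

full = pd.read_csv("data/results/result_challenge1.csv")

# Simple filter widget: pick a country and show only its rows
# ("country" is a placeholder column name).
country = st.selectbox("Country", ["all"] + sorted(full["country"].unique()))
filtered = full if country == "all" else full[full["country"] == country]
st.dataframe(filtered)

recipient = st.text_input("Recipient email")
if st.button("Send by email") and recipient:
    # SMTP settings are assumed to live in a local config file (not committed).
    config = ConfigParser()
    config.read("settings.ini")
    msg = EmailMessage()
    msg["Subject"] = "Pipeline results"
    msg["From"] = config["mail"]["sender"]
    msg["To"] = recipient
    msg.set_content(filtered.to_csv(index=False))
    with smtplib.SMTP_SSL(config["mail"]["host"]) as server:
        server.login(config["mail"]["sender"], config["mail"]["password"])
        server.send_message(msg)
    st.success("Email sent")
```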

Results

(Dashboard screenshots.)

Next steps

  • Make dynamic plots and histograms
  • Upload the dashboard to the web
  • Make some map-based plots
  • Work on error handling
