GitHub - aaguiare/ih_datamadpt0923_project_m1: Ironhack Madrid - Data Analytics Part Time - Sep 2023 - Project Module 1

Ironhack Data PT MAD - Project Module 1

This README file includes the details of the repository elements required for the module 1 project within Data Analytics Bootcamp in Ironhack Madrid. The project consists of creating a data pipeline that shows the nearest stations by service (Bicimad or Bicipark) to the place of interest given by the project details.

Data

CSV file Bicimad stations (located in data folder)
CSV file Bicipark stations (located in data folder)
API REST connection with Portal de datos abiertos del Ayuntamiento de Madrid, with the following dataset /catalogo/202162-0-instalaciones-accesibles-municip.json, from now known as 'Places of interest'

Resources in this repository

This repository will include the following elements (apart from the ones defined in Data):

Modules:
- Modules_database py file with all the details on data acquisition and wrangling of the place of interest list, Bicimad, and Bicipark data
- Modules_function py file with the auxiliary functions to be used during the data pipeline
- Geo_calculations py file includes the two given functions in the project to calculate distances with coordinates. This module is given for project execution, not created while developing the code.
- General map creation module to create visual maps of all the places of interest and Bicimad or Bicipark stations
- Key map creation module to create visual maps of the place of interest input by the user with the nearest station, as well as a line of relation among two points
- Module api data with all the details of the connection with Bicimad API, extraction in real-time bases, and bikes available in the stations.
Pipeline:
- Main.py file with all the data pipeline execution, applying all the auxiliary resources from the modules
Notebooks:
- A jupyter notebook used for practise and experimentation
Output:
- Output folder with all the results that the data pipeline can create.
Others:
- Slide deck for project introduction on project day
- Trash folder with all the iterations of the main file and modules, as well as additional resources used.


📁 Folder structure
└── project
    ├── __trash__
    ├── .gitignore
    ├── README.md
    ├── LICENSE
    ├── Project M1 Ironhack    
    ├── main.py
    ├── output
    ├── notebooks
    │   ├── dev_notebook_.ipynb
    ├── modules
    │   ├── .env
    │   ├── general_map_creation.py
    │   ├── module_api_data.py
    │   ├── key_api_creation.py
    │   ├── geo_calculations.py
    │   ├── modules_database
    │   ├── modules_function
    └── data
        ├── bicimad_stations.csv
        ├── bicipark_stations.csv

🥤Usage

At the execution of pipeline with main.py, the user must input through argparse the service that he is interested in ('bicimad or bicipark, as "-p" or "--service") and the place of interest where he is situated (as -t or "--title").

The output will be the following:

A string with the nearest station to him, total distance and the availability of free bases and bikes in each stations in real-time with the API connection for Bicimad input; in Bicipark case, the result comes from the given CSV.
A html visual map with the position of both points (in the folder output) and a line of correlation.
A CSV file (in the folder output) indicating th nearest stations with free bases and bikes available. In the case of Bicimad inputs, the results are in real-time; in Bicipark case, the result comes from the given CSV.

In case the user defines the location as "All", the out put will be as it follows:

A CSV file with the list of places of interest aligned with the calculated nerest stations from the services of choice, the distance from each point and the availability of free bases and bikes in each stations in real-time with the API connection (for Bicimad, Bicipark is static data from CSV)
A html visual map including all the places of interest in Madrid with all the existing stations (depending on the service choice, Bicimad or Bicipark).

🔧Configuration

The following libraries are used in the pipeline, so they should be downloaded in the environment to be executed

import numpy as np
import argparse
from scipy.spatial.distance import cdist
import geopandas as gpd
from shapely.geometry import Point
import requests
import pandas as pd 
import folium 
from fuzzywuzzy import process
from dotenv import load_dotenv
import os

💥Technology stack

Python, Pandas, Scipy, Scikit-learn, Numpy, Geopandas, Shapely, Fuzzywuzzy, Folium, Os, Dotenv, ArgParse and Requests.

👀Context

This repository is the final project for Module 1 project for the Part Time Data Analytics Bootcamp in November 2023, which had the following requirements:

Main Challenge:

You must create a Python App (Data Pipeline) allowing potential users to find the nearest BiciMAD station to a set of places of interest using the methods included in the module geo_calculations.py. Your project must meet the following requirements:

It must be contained in a GitHub repository with a README file that explains the aim and content of your code. You may follow the structure suggested here.
It must create, at least, a .csv file including the requested table (i.e. Main Challenge). Alternatively, you may create an image, pdf, plot or any other output format that you may find convenient. You may also send your output by e-mail, upload it to a cloud repository, etc.
It must provide, at least, two options for the final user to select when executing using argparse: (1) To get the table for every 'Place of interest' included in the dataset (or a set of them), (2) To get the table for a specific 'Place of interest' imputed by the user.
Additionally:

You must prepare a 4 minutes presentation (ppt, canva, etc.) to explain your project (Instructors will provide further details about the content of the presentation). The last slide of your presentation must include your candidate for the 'Ironhack Data Code Beauty Pageant'.
Bonus 1: You may include in your table the availability of bikes in each station.
Bonus 2: You may improve the usability of your app by using FuzzyWuzzy.
Bonus 3: Feel free to enrich your output data with any data you may find relevant (e.g.: wiki info for every place of interest) or connect to the BiciMAD API and update bikes availability in realtime or find a better way to calculate distances...there's no limit!!!

💩 ToDo

As next steps and continuous improvements:

Improve map visualization and data input for usability
Create interactive maps to include walkable and transport approaches from the places of interest to the stations, rather than a straight line.

💌 Contact info

Hi! I am Ana! 🎟 Feel free to contact me at teamurjc@gmail.com. Happy to chat!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ironhack Data PT MAD - Project Module 1

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
_trash_		_trash_
data		data
modules		modules
notebooks		notebooks
output		output
.gitignore		.gitignore
LICENSE		LICENSE
Project M1 ironhack.pdf		Project M1 ironhack.pdf
README.md		README.md
main.py		main.py

Folders and files

Latest commit

History

Repository files navigation

Ironhack Data PT MAD - Project Module 1

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages