As part of my Master's in Software and Data Engineering (MSDE), this assignment focuses on exploring, cleaning, and visualizing data using Python libraries introduced in class, such as Pandas and Matplotlib.
The goal of this assignment is to use Python and Jupyter Notebook to explore, analyze, and visualize the provided datasets. You can find the datasets at the following links:
- [Download Airport Dataset](https://drive.google.com/file/d/1MUJrvA0dRDoWGXlIJY9BxhjobL0O8Mg1/view?usp=share_link)
- [Download Countries Dataset](https://drive.google.com/file/d/1mAyCkM2_Y_kLTWpb3dQ2-xBymgsgzQT2/view?usp=share_link)
- [Download Energy Dataset](https://drive.google.com/file/d/12BvtMOuuRCzPawqgSe-nHnHeXGnN73e_/view?usp=share_link)
- [Download Europe GeoJSON Dataset](https://drive.google.com/file/d/1MK3yuScG26-6RcJUR2PU-GZlGUwi_-tz/view?usp=share_link)
- [Download Market Value Decline Dataset](https://drive.google.com/file/d/1BTolE3CDJpe_lP0IBQ9TPnCTATWLFtFI/view?usp=share_link)
- [Download Routes Dataset](https://drive.google.com/file/d/1admk_UHq7fZaFMY7-LAs9L3GKBzmLL8f/view?usp=share_link)
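A minimal sketch of the ingest-and-clean workflow with Pandas and Matplotlib. The column names (`name`, `country`, `passengers`) and the tiny inline frame are illustrative assumptions standing in for the real CSVs, which would normally be loaded with `pd.read_csv`:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

# Illustrative sample standing in for the airport CSV (column names are assumptions)
airports = pd.DataFrame({
    "name": ["Heathrow", "Malpensa", None],
    "country": ["United Kingdom", "Italy", "Italy"],
    "passengers": [79_000_000, None, 9_500_000],
})

# Typical cleaning steps: drop rows missing a name, fill missing numeric values
cleaned = airports.dropna(subset=["name"])
cleaned = cleaned.assign(passengers=cleaned["passengers"].fillna(0).astype(int))

# Quick visual check of the cleaned data
cleaned.plot.bar(x="name", y="passengers")
plt.savefig("passengers.png")
```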
- Clone the repository:

```shell
git clone https://github.com/your-username/visual-analytics.git
cd visual-analytics
```
This project combines data querying, visualization, and the extension of Elasticsearch capabilities through a custom ingestion plugin.
The aim of this assignment is to:
- Explore and analyze a dataset of restaurants using Elasticsearch queries and aggregations.
- Visualize insights through Kibana dashboards and canvas presentations.
- Develop a custom Elasticsearch ingest plugin that performs lookup-based text substitutions during document ingestion.
The dataset used in this assignment is provided as a CSV file at the links below:

- [Download Restaurants Dataset](https://drive.google.com/file/d/1-SQEOkNKFW5VhdHM69CWF9m5nKYuRvHw/view?usp=share_link)
- [Download NDJSON of Restaurants Dataset](https://drive.google.com/file/d/1vNQueoWjHDXbvuk973xo_kC3b9ESu3rh/view?usp=share_link)
- [Download NYC Boroughs GeoJSON Dataset](https://drive.google.com/file/d/18aTi575vHgVT1-XzEm4hsNmshvMyYc9G/view?usp=share_link)
Please ensure your Elasticsearch index is named `restaurants`.
- Indexed `restaurants.csv` into JSON documents
- Crafted advanced search queries and filters (e.g., geolocation, string patterns, numeric ranges)
- Built aggregations for:
- Weighted averages
- Top-N groupings
- Bucket-based analysis
- Created interactive dashboards with:
- Review sentiment trends
- Cost distribution across continents
- Map with vote-based markers
- Heatmap for ratings vs price
- Built a Canvas with:
- Custom filters
- Categorized cost metrics
- Visual summaries of review quality
- Implemented a lookup ingest processor
- Enabled dynamic replacement of coded fields with human-readable values during indexing
- Configured pipeline setup and document transformation via custom plugin
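To give a flavor of the queries and aggregations involved, here is a sketch of two request bodies built as Python dicts. The field names (`votes`, `location`, `cuisine`, `average_cost`) are assumptions about the index mapping, not the actual schema:

```python
# Geo + numeric filter: well-reviewed restaurants within 5 km of a point
geo_query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"votes": {"gte": 100}}},
                {"geo_distance": {"distance": "5km",
                                  "location": {"lat": 40.73, "lon": -73.99}}},
            ]
        }
    }
}

# Top-N grouping: the 5 most common cuisines with their average cost
top_cuisines_agg = {
    "size": 0,
    "aggs": {
        "by_cuisine": {
            "terms": {"field": "cuisine.keyword", "size": 5},
            "aggs": {"avg_cost": {"avg": {"field": "average_cost"}}},
        }
    },
}
```

Either body would be sent to the `restaurants` index via `POST /restaurants/_search`.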
Example of the substitution performed by the plugin:

Input document:

```json
{ "field1": "Need to optimize the C001 temperature. C010 needs to be changed." }
```

Lookup map:

```json
{ "C001": "tyre", "C010": "front wing" }
```

Output document:

```json
{ "field1": "Need to optimize the tyre temperature. front wing needs to be changed." }
```
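A minimal Python sketch of the substitution the ingest processor performs (the actual plugin is written in Java against the Elasticsearch ingest API; this only mirrors the logic):

```python
import re

def lookup_substitute(text: str, lookup: dict) -> str:
    """Replace every coded token found in `lookup` with its value,
    mirroring what the custom ingest processor does at index time."""
    pattern = re.compile("|".join(re.escape(code) for code in lookup))
    return pattern.sub(lambda m: lookup[m.group(0)], text)

lookup = {"C001": "tyre", "C010": "front wing"}
doc = {"field1": "Need to optimize the C001 temperature. C010 needs to be changed."}
doc["field1"] = lookup_substitute(doc["field1"], lookup)
# → "Need to optimize the tyre temperature. front wing needs to be changed."
```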
As part of my Master's in Software and Data Engineering (MSDE), this assignment focuses on learning how to process and analyze large datasets using Polars and Apache Spark, two powerful tools for high-performance data manipulation.
The goal of this assignment is to:
- Ingest and clean a large dataset using Polars and Apache Spark.
- Perform efficient computations and transformations on the data.
- Generate insightful visualizations based on the processed data.
The datasets used in this assignment are provided in CSV format:

- [Download Trip Data Dataset](https://drive.google.com/file/d/14jnhbmcfedFyaj6EksPQr_DtVx8qiBdb/view?usp=share_link)
- [Download Trip Fare Dataset](https://drive.google.com/file/d/14jnhbmcfedFyaj6EksPQr_DtVx8qiBdb/view?usp=share_link)
- Handled missing values, inconsistent formatting, and outliers.
- Applied efficient column-wise transformations using Polars and Spark APIs.
- Computed key statistics and trends across multiple dimensions.
- Generated visual insights from the processed data using Python tools.
- Clone the repository:

```shell
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
```
For this group project, we explored the relationship between tech job markets and cost of living by creating an interactive dashboard using Python tools.
The objective of this group project was to:
- Find and analyze a real-world dataset.
- Use Python libraries covered in class to clean, process, and visualize the data.
- Build an interactive dashboard to explore key insights.
We used several datasets to compute cost-of-living statistics from 2015 to 2025. A folder inside the group project directory contains all the datasets used. They were ingested and merged using Pandas.
- Pandas: for data ingestion, cleaning, and merging
- Dash (by Plotly): for building the interactive web-based dashboard
- CSV: as the data source format
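The ingestion-and-merge step looks roughly like the sketch below. The two inline frames and their column names (`city`, `avg_salary`, `cost_of_living`) are hypothetical stand-ins for the real CSVs in the project folder:

```python
import pandas as pd

# Hypothetical minimal versions of two of the source CSVs
salaries = pd.DataFrame({"city": ["Zurich", "Lisbon"], "avg_salary": [110_000, 45_000]})
living_costs = pd.DataFrame({"city": ["Zurich", "Lisbon"], "cost_of_living": [3_200, 1_400]})

# Join the tables on the shared city column, keeping only cities present in both
merged = salaries.merge(living_costs, on="city", how="inner")
```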
The dashboard allows users to:
- Compare average tech salaries vs cost of living across cities
- Filter by region or job title
- Explore affordability and job density visually
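Behind the affordability view sits a simple derived metric, sketched here with made-up numbers (the column names and figures are illustrative, not the project's real data):

```python
import pandas as pd

# Hypothetical merged table feeding the dashboard (numbers are made up)
df = pd.DataFrame({
    "city": ["Zurich", "Lisbon", "Austin"],
    "avg_salary": [110_000, 45_000, 95_000],
    "monthly_cost": [3_200, 1_400, 2_100],
})

# Affordability index: annual salary divided by annual cost of living
df["affordability"] = df["avg_salary"] / (df["monthly_cost"] * 12)
most_affordable = df.sort_values("affordability", ascending=False).iloc[0]["city"]
```

In the dashboard, Dash callbacks recompute this table after each region or job-title filter and feed it to the charts.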
- Clone the repository:

```shell
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
```