As part of my Master's in Software and Data Engineering (MSDE), where the "D" stands for Data, this project focuses on learning how to ingest, clean, and visualize data using Python libraries like Pandas and Matplotlib.

24luca24/visual-analytics


visual-analytics

📊 Data Exploration and Visualization Assignment

As part of my Master's in Software and Data Engineering (MSDE), this assignment focuses on exploring, cleaning, and visualizing data using Python tools introduced in class.

Python Jupyter Pandas Matplotlib Seaborn GeoPandas Bokeh


Assignment 1

🎯 Goal

The goal of this assignment is to use Python and Jupyter Notebook to explore, analyze, and visualize the provided datasets. You can find the dataset following this link:

▶️ How to Run the Notebook

  1. Clone the repository:
    git clone https://github.com/24luca24/visual-analytics.git
    cd visual-analytics
    

Assignment 2: Visual Analytics with Elasticsearch & Kibana

This project combines data querying, visualization, and the extension of Elasticsearch capabilities through a custom ingestion plugin.

Elasticsearch Kibana Java Gradle CSV

🎯 Goal

The aim of this assignment is to:

  1. Explore and analyze a dataset of restaurants using Elasticsearch queries and aggregations.
  2. Visualize insights through Kibana dashboards and canvas presentations.
  3. Develop a custom Elasticsearch ingest plugin that performs lookup-based text substitutions during document ingestion.

πŸ“ Dataset

The dataset used in this assignment is provided as a CSV file at this link:

Please ensure your Elasticsearch index is named: restaurants

πŸ› οΈ Features Implemented

πŸ”Ž Section 1: Indexing, Queries & Aggregations

  • Indexed the rows of restaurants.csv as JSON documents
  • Crafted advanced search queries and filters (e.g., geolocation, string patterns, numeric ranges)
  • Built aggregations for:
    • Weighted averages
    • Top-N groupings
    • Bucket-based analysis
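
The filters and aggregations listed above could be combined in a single search request. The following is only a minimal sketch of what such a request body might look like, built as a plain Python dictionary. Every field name (`location`, `name`, `rating`, `votes`, `city`, `cost`) and the function name `build_restaurant_search` are assumptions made for illustration and are not taken from the actual index mapping.

```python
import json


def build_restaurant_search(lat, lon, radius="5km", min_rating=4.0, top_n=5):
    """Build a search body with three filters and three aggregations.

    All field names below are hypothetical; replace them with the
    fields defined in your own index mapping.
    """
    return {
        "size": 10,
        "query": {
            "bool": {
                "filter": [
                    # Geolocation filter (assumes `location` is a geo_point).
                    {"geo_distance": {
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }},
                    # String pattern filter on the restaurant name.
                    {"wildcard": {"name": {"value": "*pizza*"}}},
                    # Numeric range filter on the rating.
                    {"range": {"rating": {"gte": min_rating}}},
                ]
            }
        },
        "aggs": {
            # Weighted average: rating weighted by the number of votes.
            "weighted_rating": {
                "weighted_avg": {
                    "value": {"field": "rating"},
                    "weight": {"field": "votes"},
                }
            },
            # Top-N grouping: cities ordered by their average rating.
            "top_cities": {
                "terms": {
                    "field": "city",
                    "size": top_n,
                    "order": {"avg_rating": "desc"},
                },
                "aggs": {"avg_rating": {"avg": {"field": "rating"}}},
            },
            # Bucket-based analysis: restaurants grouped into cost bands.
            "cost_bands": {
                "range": {
                    "field": "cost",
                    "ranges": [
                        {"to": 20},
                        {"from": 20, "to": 50},
                        {"from": 50},
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Print the body as JSON so it can be pasted into a search request.
    print(json.dumps(build_restaurant_search(45.46, 9.19), indent=2))
```

The printed JSON can be sent as the body of a search request against the restaurants index, for example from the Kibana Dev Tools console.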

📊 Section 2: Kibana Visualization

  • Created interactive dashboards with:
    • Review sentiment trends
    • Cost distribution across continents
    • Map with vote-based markers
    • Heatmap for ratings vs price
  • Built a Canvas with:
    • Custom filters
    • Categorized cost metrics
    • Visual summaries of review quality

🔌 Section 3: Elasticsearch Ingest Plugin

  • Implemented a lookup ingest processor
  • Enabled dynamic replacement of coded fields with human-readable values during indexing
  • Configured pipeline setup and document transformation via custom plugin

🔄 Example of Plugin in Action

Input document:

{ "field1": "Need to optimize the C001 temperature. C010 needs to be changed." }

Lookup map:

{ "C001": "tyre", "C010": "front wing" }

Output:

{ "field1": "Need to optimize the tyre temperature. front wing needs to be changed." }
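
The plugin itself is written in Java, so the snippet below is not its implementation. It is a small Python sketch of the substitution logic that the example above describes, assuming that codes are matched as whole tokens. The function names `apply_lookup` and `process_document` are hypothetical.

```python
import re


def apply_lookup(text, lookup):
    """Replace every whole-word key from `lookup` found in `text`."""
    if not lookup:
        return text
    # Try longer keys first so that overlapping codes resolve predictably.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: lookup[match.group(1)], text)


def process_document(doc, field, lookup):
    """Return a copy of `doc` with the lookup applied to one string field."""
    result = dict(doc)
    if isinstance(result.get(field), str):
        result[field] = apply_lookup(result[field], lookup)
    return result


if __name__ == "__main__":
    document = {
        "field1": "Need to optimize the C001 temperature. "
                  "C010 needs to be changed."
    }
    lookup_map = {"C001": "tyre", "C010": "front wing"}
    print(process_document(document, "field1", lookup_map))
```

Documents without the target field, or with a non-string value in it, pass through unchanged in this sketch. Whether the real processor behaves the same way is not stated here.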

▶️ How to Run the Notebook


Assignment 3: Data Processing with Polars & Apache Spark

As part of my Master's in Software and Data Engineering (MSDE), this assignment focuses on learning how to process and analyze large datasets using Polars and Apache Spark, two powerful tools for high-performance data manipulation.

Polars Apache Spark

🎯 Goal

The goal of this assignment is to:

  1. Ingest and clean a large dataset using Polars and Apache Spark.
  2. Perform efficient computations and transformations on the data.
  3. Generate insightful visualizations based on the processed data.

πŸ“ Dataset

The dataset used in this assignment is provided in CSV format.
[Download Trip Data Dataset](https://drive.google.com/file/d/14jnhbmcfedFyaj6EksPQr_DtVx8qiBdb/view?usp=share_link)
[Download Trip Fare Dataset](https://drive.google.com/file/d/14jnhbmcfedFyaj6EksPQr_DtVx8qiBdb/view?usp=share_link)

πŸ› οΈ Features Implemented

🧼 Data Cleaning & Transformation

  • Handled missing values, inconsistent formatting, and outliers.
  • Applied efficient column-wise transformations using Polars and Spark APIs.

📊 Visualization & Analytics

  • Computed key statistics and trends across multiple dimensions.
  • Generated visual insights from the processed data using Python tools.

▶️ How to Run the Notebook

  1. Clone the repository:
    git clone https://github.com/24luca24/visual-analytics.git
    cd visual-analytics
    

🧠 Group Project: Tech Jobs & Cost of Living Dashboard

For this group project, we explored the relationship between tech job markets and cost of living by creating an interactive dashboard using Python tools.

Python Pandas Dash CSV

🎯 Goal

The objective of this group project was to:

  1. Find and analyze a real-world dataset.
  2. Use Python libraries covered in class to clean, process, and visualize the data.
  3. Build an interactive dashboard to explore key insights.

πŸ“ Dataset

We used several dataset to compute the statistics of cost of living from 2015 t0 2025. There is a folder inside the group project one, containing all the dataset used. Datasets were ingested and merged using Pandas.

πŸ› οΈ Tools & Technologies

  • Pandas: for data ingestion, cleaning, and merging
  • Dash (by Plotly): for building the interactive web-based dashboard
  • CSV: as the data source format

📊 Final Output

The dashboard allows users to:

  • Compare average tech salaries vs cost of living across cities
  • Filter by region or job title
  • Explore affordability and job density visually

▶️ How to Run the Dashboard

  1. Clone the repository:
    git clone https://github.com/24luca24/visual-analytics.git
    cd visual-analytics
    
