European music festival dataset

This project consists of several parts:

Scraping: data were scraped from https://www.festival-alarm.com/us using BeautifulSoup. After some research, this website seemed to be the only one providing a decent overview of European music festivals from 2014 to 2025. The tables from the pages for each year were merged into a single pandas DataFrame. A first analysis shows that the dataset has not been curated and has many missing values, especially in the number of visitors and the ticket price (ca. 30% each). Furthermore, many well-known festivals are not listed, and the data appear to be reasonably complete only for countries like Germany and the UK, while other countries like Italy have far too few entries (only 17 in total, which is clearly incomplete). The dataset could be improved by integrating information from other sources, such as Wikipedia.
Data cleaning: after parsing the HTML in the scraping step, the resulting DataFrame had 11 columns, which required several cleaning steps, including data type conversion, string cleaning (such as separating or merging words) and datetime feature extraction.
Feature engineering: dates were converted to a uniform format that keeps only the start day of the event. Since there is a separate 'Duration' column, the end date seemed redundant, and a column without a uniform format is hard to use for data analysis. An additional column 'Total_revenue' was created by simply multiplying the number of visitors by the ticket price. Further analysis could give a better estimate of this value. Also, because the missing values were not filled in, the revenue could be calculated for only ca. 50% of the rows.
Geocoding: locations were geocoded using Nominatim. This adds two new columns, 'Latitude' and 'Longitude', whose coordinates can be used to display the entries on a map.
Data analysis: some observations and visualizations of the data.
Streamlit app: an interactive map of the festivals that can be filtered in different ways.
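The scraping and merging step can be sketched with BeautifulSoup and pandas. This is a minimal illustration, not the project's own scraping code: it assumes each yearly page holds one HTML `<table>` with `<th>` header cells and `<td>` data cells, and the function names `parse_festival_table` and `merge_years` are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup


def parse_festival_table(html: str, year: int) -> pd.DataFrame:
    """Parse the first <table> of one yearly page into a DataFrame.

    Assumption: the table has <th> header cells and <td> data cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return pd.DataFrame()  # page without a table: contributes no rows
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped here
            rows.append(cells)
    df = pd.DataFrame(rows, columns=headers)
    df["Year"] = year  # remember which yearly page each row came from
    return df


def merge_years(pages: dict[int, str]) -> pd.DataFrame:
    """Parse one HTML page per year and stack all tables into a single DataFrame."""
    frames = [parse_festival_table(html, year) for year, html in pages.items()]
    return pd.concat(frames, ignore_index=True)
```

In practice, `pages` would map each year to the downloaded HTML of that year's page (for example fetched with `requests.get(url).text`); the per-year URLs are not reproduced here. Tagging each row with its `Year` before `pd.concat` keeps the source page recoverable after the merge.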
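The type conversion and string cleaning in the data cleaning step might look like the following sketch. The column names `Name`, `Visitors` and `Price` are assumptions (the 11 scraped columns are not listed here), and the number parser assumes US-style formatting such as `90,000` or `$229`, where commas are thousands separators.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Turn strings like '90,000' or '$229' into numbers; unparsable values become NaN.

    Assumes US-style formatting: commas are thousands separators, '.' is the decimal point.
    """
    # Drop everything except digits and the decimal point ("None"/"n/a" become "").
    digits = series.astype(str).str.replace(r"[^\d.]", "", regex=True)
    return pd.to_numeric(digits, errors="coerce")


def clean_text(series: pd.Series) -> pd.Series:
    """Trim a text column and collapse runs of whitespace into single spaces."""
    return series.str.strip().str.replace(r"\s+", " ", regex=True)


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to hypothetical 'Name', 'Visitors' and 'Price' columns."""
    out = df.copy()  # leave the raw scraped frame untouched
    out["Name"] = clean_text(out["Name"])
    for col in ("Visitors", "Price"):
        out[col] = to_number(out[col])
    return out
```

Using `errors="coerce"` turns unparsable entries into `NaN` instead of raising, so the missing values stay visible for the later analysis rather than being silently dropped.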
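A sketch of the feature engineering step, assuming a hypothetical `Date` column that holds either a single date or a range such as `06/07/2024 - 06/09/2024` in US month/day/year order (the actual raw format on the site may differ). Only the start day is kept, in a hypothetical `Start_date` column. `Total_revenue` is visitors times price, so a missing value in either input leaves the revenue missing, which is why it can be computed for only part of the rows.

```python
import pandas as pd


def add_features(df: pd.DataFrame, date_fmt: str = "%m/%d/%Y") -> pd.DataFrame:
    """Add a start-date column and a revenue estimate (hypothetical input column names)."""
    out = df.copy()
    # Keep only the part before " - ", i.e. the first day of a date range.
    start_text = out["Date"].astype(str).str.split(" - ", n=1).str[0].str.strip()
    # Unparsable entries (e.g. "TBA") become NaT instead of raising.
    out["Start_date"] = pd.to_datetime(start_text, format=date_fmt, errors="coerce")
    # NaN in either factor propagates, so revenue stays missing for incomplete rows.
    out["Total_revenue"] = out["Visitors"] * out["Price"]
    return out


def revenue_coverage(df: pd.DataFrame) -> float:
    """Share of rows for which a revenue value could be computed."""
    return float(df["Total_revenue"].notna().mean())
```

`revenue_coverage` returns the share of rows with a computed revenue, the quantity described above as ca. 50% for the real data.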
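A sketch of the geocoding step using geopy's `Nominatim` geocoder. The `Location` column name and the helper names are assumptions. The geocoder is passed in as a function so the lookup logic can be tested without network access, each distinct location is looked up only once, and geopy's `RateLimiter` keeps requests to at most one per second, as Nominatim's usage policy requires.

```python
import math
from typing import Any, Callable, Optional

import pandas as pd


def build_nominatim_geocoder(user_agent: str) -> Callable[[str], Any]:
    """Return a rate-limited Nominatim lookup function (requires geopy and network access)."""
    # Imported here so the rest of the sketch runs without geopy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy allows at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(
    df: pd.DataFrame,
    geocode: Callable[[str], Optional[Any]],
    location_col: str = "Location",
) -> pd.DataFrame:
    """Add 'Latitude'/'Longitude' columns; places that cannot be found get NaN."""
    cache: dict[str, tuple[float, float]] = {}
    lats, lons = [], []
    for place in df[location_col]:
        if not isinstance(place, str) or not place.strip():
            lat, lon = math.nan, math.nan  # missing location: nothing to look up
        else:
            if place not in cache:  # look up each distinct place only once
                result = geocode(place)
                cache[place] = (
                    (result.latitude, result.longitude)
                    if result is not None
                    else (math.nan, math.nan)
                )
            lat, lon = cache[place]
        lats.append(lat)
        lons.append(lon)
    out = df.copy()
    out["Latitude"] = lats
    out["Longitude"] = lons
    return out
```

On the real data this would be called as `add_coordinates(df, build_nominatim_geocoder("festival-dataset-example"))`, where the user-agent string is a placeholder that should identify your own application.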
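For the data analysis step, the missing-value shares and per-country counts mentioned for the scraped data can be computed with a few pandas calls. A minimal sketch, assuming hypothetical `Country` and `Year` columns in the merged DataFrame:

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


def festivals_per_country(df: pd.DataFrame) -> pd.Series:
    """Number of listed festivals per country, most frequent first."""
    return df["Country"].value_counts()


def festivals_per_country_year(df: pd.DataFrame) -> pd.DataFrame:
    """Country x year table of festival counts; missing combinations are filled with 0."""
    return df.groupby(["Country", "Year"]).size().unstack(fill_value=0)
```

The country-by-year table makes coverage gaps easy to spot: a country with only a handful of entries across all years stands out immediately next to well-covered ones.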

Requirements

Python version: 3.11.8

pip install -r requirements.txt

Streamlit app

The Streamlit app lets you visualize subsets of the dataset: select the properties of a specific subset and display it on the map.

Run the Streamlit app from the console with:

streamlit run app.py
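For orientation, a minimal filter-and-map app might look like the sketch below. It is not the project's app.py: the column names `Country` and `Year`, the CSV file name `festivals_tot.csv` (assumed to hold the prepared data) and the helper names are assumptions. The filtering is kept in plain functions so it can be tested without Streamlit, and the coordinate columns are renamed to `lat`/`lon` because `st.map` looks for columns with those names.

```python
import sys

import pandas as pd


def filter_festivals(df: pd.DataFrame, countries=None, years=None) -> pd.DataFrame:
    """Keep rows matching the selected countries and years; an empty selection means 'all'."""
    mask = pd.Series(True, index=df.index)
    if countries:
        mask &= df["Country"].isin(countries)
    if years:
        mask &= df["Year"].isin(years)
    return df[mask]


def to_map_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Rename coordinates for st.map and drop rows that could not be geocoded."""
    renamed = df.rename(columns={"Latitude": "lat", "Longitude": "lon"})
    return renamed.dropna(subset=["lat", "lon"])


def main(csv_path: str = "festivals_tot.csv") -> None:
    import streamlit as st  # imported lazily so the helpers above work without Streamlit

    df = pd.read_csv(csv_path)
    st.title("European music festivals")
    countries = st.multiselect("Country", sorted(df["Country"].dropna().unique()))
    years = st.multiselect("Year", sorted(df["Year"].dropna().unique()))
    subset = filter_festivals(df, countries, years)
    st.write(f"{len(subset)} festivals match the selection")
    st.map(to_map_frame(subset))


# `streamlit run` imports streamlit before executing this file, so the UI is only
# built when the script is launched that way (not when imported or run with plain python).
if __name__ == "__main__" and "streamlit" in sys.modules:
    main()
```

Saved as a separate script (for example a hypothetical `sketch_app.py`), it would be started with `streamlit run sketch_app.py`.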

Vinsora/Festival

Folders and files

Latest commit

History

Repository files navigation

European music festival dataset

Requirements

Streamlit app

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages