A data visualization project covering the complete pipeline: data collection, cleaning, feature engineering, exploratory data analysis (EDA), and applying a machine learning technique to recommend songs.
Data Preprocessing Steps
Collected data from [state your data source, e.g. Kaggle, UCI, scraped websites, survey, etc.] ensuring relevance and comprehensiveness for the project goal.
Imported datasets to the working environment for further exploration.
Explored the raw data: inspected shapes, types, unique values, descriptive statistics, and column names.
Detected obvious errors, inconsistencies, or unusual patterns through summary statistics and visualization.
Handled missing values: identified null or blank entries and treated them by removing, imputing (mean/median/mode), or flagging as needed.
Removed duplicate rows to ensure each data point is unique.
Detected and addressed outliers or impossible values using statistical thresholds or domain logic (e.g., replacing '0' in age column with NaN).
Standardized formats (e.g., date/time) and ensured units and labels are uniform across the dataset.
Encoded categorical features: converted non-numerical values into numeric representations using label encoding or one-hot encoding, making the dataset suitable for machine learning models.
Created or modified features as required by feature engineering tasks.
Normalized or standardized numeric features to ensure comparability and improve model performance, typically using methods such as MinMaxScaler or StandardScaler.
Split the dataset into training, validation, and test sets to assess model performance and detect overfitting.
Documented each step with clear code comments and maintained a changelog of modifications during preprocessing for transparency and reproducibility.
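The steps above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual preprocessing code: the tiny `raw` frame, its column names (`age`, `genre`, `popularity`), the helper names `preprocess` and `split`, and the 60/20/20 split ratio are all assumptions made for the example. Min-max scaling is written out by hand here; scikit-learn's MinMaxScaler applies the same formula.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the raw dataset (hypothetical columns).
raw = pd.DataFrame({
    "age": [25, 0, 31, 31, 47, np.nan, 19, 62],
    "genre": ["pop", "rock", "jazz", "jazz", "pop", "rock", "edm", "jazz"],
    "popularity": [80, 55, 40, 40, 70, 65, 90, 30],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, fix impossible values, impute, encode, and scale."""
    out = df.drop_duplicates().copy()
    # Domain rule from the steps above: an age of 0 is impossible.
    out["age"] = out["age"].replace(0, np.nan)
    # Impute missing ages with the median.
    out["age"] = out["age"].fillna(out["age"].median())
    # One-hot encode the categorical column.
    out = pd.get_dummies(out, columns=["genre"], dtype=int)
    # Min-max scale numeric columns to the range [0, 1].
    for col in ["age", "popularity"]:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out.reset_index(drop=True)


def split(df: pd.DataFrame, train: float = 0.6, val: float = 0.2, seed: int = 42):
    """Shuffle, then cut into train / validation / test partitions."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled.iloc[:n_train],
        shuffled.iloc[n_train:n_train + n_val],
        shuffled.iloc[n_train + n_val:],
    )


clean = preprocess(raw)
train_df, val_df, test_df = split(clean)
print(clean.shape, len(train_df), len(val_df), len(test_df))
```

The sketch scales before splitting to mirror the order of the steps listed above. When the scaled features feed a model that is evaluated on the test set, the scaling statistics are usually computed on the training partition only, so that nothing from the test data leaks into training.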
Imported cleaned dataset.
Built dashboards to explore countrywide music trends and age-group preferences.
Generated insights like:
Most popular genres per country.
Trending songs across different age brackets.
Artist-level popularity comparisons.
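The dashboard tool is not named here, and the column list for cleaned_genres_data.csv contains no country or age field, so the following pandas sketch covers only the genre-level and artist-level aggregations, using the documented `artists`, `track_genre`, and `popularity` columns. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up rows using the documented column names.
tracks = pd.DataFrame({
    "artists": ["A", "A", "B", "C", "C"],
    "track_genre": ["pop", "pop", "rock", "jazz", "pop"],
    "popularity": [80, 60, 50, 40, 90],
})

# Genres ranked by mean popularity, highest first.
genre_popularity = (
    tracks.groupby("track_genre")["popularity"]
    .mean()
    .sort_values(ascending=False)
)

# Artist-level comparison: mean popularity and number of tracks.
artist_popularity = (
    tracks.groupby("artists")["popularity"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(genre_popularity)
print(artist_popularity)
```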

- Python 3.7 or higher installed on your system
- Your dataset file named cleaned_genres_data.csv
Save these files in the same directory:
- app.py (the main application code)
- requirements.txt (dependencies list)
- cleaned_genres_data.csv (your dataset)
Open terminal/command prompt in the project directory and run:
pip install -r requirements.txt
Alternative (if you don't want to use requirements.txt):
pip install streamlit pandas numpy
Make sure your cleaned_genres_data.csv file contains these columns:
- track_id
- artists
- album_name
- track_name
- popularity
- duration_ms
- explicit
- danceability
- energy
- loudness
- speechiness
- acousticness
- instrumentalness
- liveness
- valence
- tempo
- track_genre
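A quick way to confirm the column requirement before launching the app is to compare the CSV header against the expected names. The helper below is a hypothetical sketch using only the standard library; `missing_columns` is not part of app.py.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "track_id", "artists", "album_name", "track_name", "popularity",
    "duration_ms", "explicit", "danceability", "energy", "loudness",
    "speechiness", "acousticness", "instrumentalness", "liveness",
    "valence", "tempo", "track_genre",
]


def missing_columns(csv_stream) -> list:
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(csv_stream), [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Demo on an in-memory CSV; for the real file, pass
# open("cleaned_genres_data.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO("track_id,artists,track_name,popularity\n1,Someone,Song,50\n")
print(missing_columns(sample))
```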
In your terminal/command prompt, navigate to the project directory and run:
streamlit run app.py
- The application will automatically open in your default web browser
- If it doesn't open automatically, go to: http://localhost:8501
- The terminal will show you the exact URL
- Enter your age in the sidebar (10-100 years)
- Choose number of recommendations (5-20 songs)
- Click "Get My Recommendations" to get personalized song suggestions
- Use the search feature to find specific songs or artists
- Expand song details to see more information about each track
- Teenagers (13-19): High energy pop, hip-hop, EDM
- Young Adults (20-29): Mix of popular and indie tracks
- Adults (30-49): Rock, jazz, acoustic, more mature sounds
- Seniors (50+): Classical, jazz, acoustic, timeless classics
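The age brackets above can be expressed as a small lookup function. This is a sketch, not the code in app.py. One assumption is needed: the sidebar accepts ages from 10, but the youngest bracket starts at 13, so ages 10 to 12 are folded into the teenager group here.

```python
def age_group(age: int) -> str:
    """Map an age to one of the four listening groups listed above."""
    if age < 20:
        # Assumption: ages 10-12 are treated like teenagers (13-19).
        return "Teenagers"
    if age < 30:
        return "Young Adults"
    if age < 50:
        return "Adults"
    return "Seniors"


for example_age in (15, 24, 42, 67):
    print(example_age, age_group(example_age))
```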
- Search functionality: type a song name or artist to find songs
- Popular songs section: browse top popular songs if you're unsure
- Select button: choose the song you want to listen to
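A search like the one described could be implemented as a case-insensitive substring match on the `track_name` and `artists` columns. The function name `search_tracks`, the `limit` parameter, and the ordering by popularity are assumptions for this sketch; the app may do it differently.

```python
import pandas as pd


def search_tracks(df: pd.DataFrame, query: str, limit: int = 10) -> pd.DataFrame:
    """Return tracks whose name or artist contains the query text."""
    q = query.strip()
    if not q:
        return df.head(0)  # empty result for a blank query
    # regex=False treats the query as literal text, so characters such as
    # "(" or "+" in a song title cannot break the search.
    by_name = df["track_name"].str.contains(q, case=False, regex=False, na=False)
    by_artist = df["artists"].str.contains(q, case=False, regex=False, na=False)
    matches = df[by_name | by_artist]
    return matches.sort_values("popularity", ascending=False).head(limit)


# Made-up rows using the documented column names.
songs = pd.DataFrame({
    "track_name": ["Blue Sky", "Red Moon", "Skyline", "Other"],
    "artists": ["Ann", "Sky Band", "Bob", "Zed"],
    "popularity": [10, 50, 30, 99],
})
print(search_tracks(songs, "sky"))
```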
- Based on your selected song: uses audio feature similarity
- Filtered by your age group: applies age-based preferences
- Combined scoring: 70% similarity + 30% popularity for best results
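The 70/30 combined score can be written as a weighted sum. In this sketch, popularity is assumed to be on a 0-100 scale and is divided by 100 so both terms share a 0-1 range; the function name `combined_score` is illustrative.

```python
import numpy as np


def combined_score(similarity, popularity, w_sim: float = 0.7, w_pop: float = 0.3):
    """Blend similarity (0-1) with popularity (assumed 0-100) into one score."""
    similarity = np.asarray(similarity, dtype=float)
    popularity = np.asarray(popularity, dtype=float) / 100.0  # rescale to 0-1
    return w_sim * similarity + w_pop * popularity


# Two candidate songs: one very similar but obscure, one less similar but popular.
scores = combined_score([0.9, 0.5], [20, 100])
print(scores)
```

With these weights, a close match with low popularity can still outrank a popular song that is only moderately similar.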
Cosine Similarity from scikit-learn:
- Analyzes audio features: danceability, energy, loudness, speechiness, acousticness, etc.
- Creates a similarity matrix between all songs
- Finds the songs most similar to your selected track
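To show what the similarity matrix contains, here is a minimal sketch written with plain NumPy rather than scikit-learn. It computes the same quantity (the dot product of L2-normalized feature vectors) that scikit-learn's `cosine_similarity` returns for a single input matrix. The helper names and the two-feature toy data are assumptions for the example.

```python
import numpy as np


def cosine_similarity_matrix(features) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a feature matrix."""
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = X / norms
    return unit @ unit.T


def most_similar(features, index: int, top_n: int = 5) -> list:
    """Row indices of the top_n songs most similar to the song at `index`."""
    sims = cosine_similarity_matrix(features)[index].copy()
    sims[index] = -np.inf  # exclude the selected song itself
    return np.argsort(sims)[::-1][:top_n].tolist()


# Toy data: each row is one song, each column one audio feature.
toy_features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
print(most_similar(toy_features, index=0, top_n=2))
```

Features such as loudness and tempo have much larger numeric ranges than danceability or energy, so scaling all columns to a common range before computing similarity keeps any single feature from dominating the result.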
- Search & Select: find and choose a song you like
- Age Input: enter your age for personalized filtering
- Get Recommendations: the system finds similar songs based on:
  - Audio feature similarity to your selected song
  - Age group preferences for better matching
  - Popularity score for quality assurance
Error: "Dataset file not found"
- Make sure cleaned_genres_data.csv is in the same folder as app.py
- Check the file name spelling exactly
Error: "Module not found"
- Run pip install streamlit pandas numpy again
- Make sure you're using the correct Python environment
Port already in use
- Use: streamlit run app.py --server.port 8502
- Or close other Streamlit applications
Application not loading
- Check terminal for error messages
- Ensure Python version is 3.7+
- Try refreshing the browser page
✅ Age-based recommendations using music characteristics
✅ Interactive web interface with Streamlit
✅ Song search functionality
✅ Detailed song information (duration, popularity, audio features)
✅ Responsive design with sidebar controls
✅ Real-time filtering based on danceability, energy, and valence
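Filtering on danceability, energy, and valence might look like the range filter below. The function name `filter_by_features` and the threshold values are hypothetical; the actual ranges used per age group are not stated in this document.

```python
import pandas as pd


def filter_by_features(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Keep rows whose feature values fall inside every (low, high) range."""
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in ranges.items():
        mask &= df[column].between(low, high)  # inclusive on both ends
    return df[mask]


# Made-up rows and made-up thresholds, for illustration only.
candidates = pd.DataFrame({
    "track_name": ["a", "b", "c"],
    "danceability": [0.9, 0.4, 0.7],
    "energy": [0.8, 0.3, 0.9],
    "valence": [0.6, 0.2, 0.1],
})
upbeat = {"danceability": (0.6, 1.0), "energy": (0.7, 1.0), "valence": (0.5, 1.0)}
print(filter_by_features(candidates, upbeat))
```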
Press Ctrl + C in the terminal to stop the Streamlit server.
🎵 Enjoy discovering new music based on your age preferences!

