MentalHealth-GeoAnalysis is a Python-based project that fetches and analyzes top Reddit posts from selected mental health-related subreddits. It aims to uncover geographic trends in mental health discussions through data cleaning, analysis, and interactive visualizations.
-
🔍 Reddit Scraper:
Uses PRAW (Python Reddit API Wrapper) to fetch top posts from specified mental health-focused subreddits. -
🔐 Secure API Access:
Authenticates using environment variables to protect your Reddit API credentials. -
🧹 Data Cleaning & Analysis:
Cleans and processes Reddit data for accurate analysis, removing noise and irrelevant content. -
🗺️ Geo Visualization:
Generates an interactive HTML map (reddit_crisis_map_final.html) highlighting posts with geographical context to visualize mental health crisis trends. -
🕵️ PII Removal:
Utilizes Microsoft Presidio in a Jupyter notebook (PII_Removal.ipynb) to anonymize personal information. -
📊 Reproducible Analysis:
A Jupyter notebook (CodeFile.ipynb) for exploratory analysis and visual storytelling.
Clone the repository and install the required dependencies:
git clone https://github.com/OnePunchMonk/MentalHealth-GeoAnalysis.git
cd MentalHealth-GeoAnalysis
pip install praw pandas geopandas jupyter| Tool / Library | Purpose |
|---|---|
| Python | Core programming language |
| PRAW | Reddit API integration |
| Pandas | Data manipulation and analysis |
| GeoPandas | Geospatial data processing |
| Jupyter Notebook | Interactive analysis and documentation |
| Microsoft Presidio | PII detection and anonymization |
| HTML/Leaflet.js | Interactive geographic visualization |
-
Reddit API Authentication:
- Create a
.envfile in the project root directory. - Add your Reddit API credentials:
CLIENT_ID=your_client_id CLIENT_SECRET=your_client_secret USER_AGENT=your_user_agent
- Create a
-
Run the data collection and analysis scripts as needed (see notebooks).
-
Processed datasets are saved as:
cleaned_reddit_posts_final.csvcleaned_reddit_posts_final.json
-
To explore the interactive map, open:
reddit_crisis_map_final.htmlin your browser to see a geographical visualization of Reddit posts related to mental health topics.
CodeFile.ipynb: Core data exploration and visualization workflow.PII_Removal.ipynb: Uses Microsoft Presidio for identifying and removing personally identifiable information (PII).