AI-powered health education platform using Random Forest Classification to predict diseases from symptoms and provide personalized recommendations (medications, diet, workouts, precautions). Integrates RxTerms API and Indian medical databases (CDSCO, IPC, 1mg). Features interactive visualizations with Plotly/Matplotlib. Frontend made by my friend

sobhan2204/HealthEdu

Run Health_edu with: streamlit run ossp_pbl.py

πŸ₯ HealthEdu - AI-Powered Health Education & Disease Prediction System

A comprehensive health education platform that leverages Random Forest Classification to predict diseases based on symptoms and provides personalized health recommendations including medications, dietary plans, precautions, and workout routines. Built with a strong emphasis on medical ethics and educational transparency.

🌟 Features

🤖 Machine Learning Core

  • Random Forest Classifier: Ensemble of decision trees that maps symptom patterns to diseases
  • Multi-Symptom Processing: Handles combinations of symptoms encoded as binary features
  • 30% Test Split Validation: Model is evaluated on a held-out 30% of the data
  • Real-time Prediction Engine: Instant disease classification from user inputs

πŸ” Symptom Information System

  • Interactive symptom selection interface
  • Pattern-based condition matching
  • Educational disease information lookup
  • Multi-symptom analysis capabilities

💊 Medication Education Tools

  • Drug-condition association learning
  • Integration with RxTerms API (NLM Clinical Tables)
  • Links to Indian government portals (CDSCO, IPC)
  • Access to verified databases (1mg, PharmEasy, Drugs.com)
  • Intelligent medicine name normalization and fallback queries
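The normalization and fallback idea can be sketched roughly as follows. This is a minimal illustration, not the app's actual logic: the `FORM_WORDS` list, the function names, and the order of fallbacks are all assumptions.

```python
import re
from typing import List

# Hypothetical list of dosage-form words to strip; the app's real list is not shown.
FORM_WORDS = {"tablet", "tablets", "capsule", "capsules", "syrup", "injection", "cream"}


def normalize_medicine_name(name: str) -> str:
    """Lowercase the name and drop strengths, dosage-form words, and punctuation."""
    text = name.lower()
    text = re.sub(r"\d+(\.\d+)?\s*(mg|mcg|g|ml|iu)\b", " ", text)  # strengths such as "500 mg"
    text = re.sub(r"[^a-z\s]", " ", text)                            # punctuation and stray digits
    words = [w for w in text.split() if w not in FORM_WORDS]
    return " ".join(words)


def fallback_queries(name: str) -> List[str]:
    """Return search terms from most to least specific, without duplicates."""
    raw = name.strip()
    normalized = normalize_medicine_name(name)
    first_word = normalized.split()[0] if normalized else ""
    queries: List[str] = []
    for candidate in (raw, normalized, first_word):
        if candidate and candidate not in queries:
            queries.append(candidate)
    return queries


print(fallback_queries("Amoxicillin-Clavulanate 625mg"))
```

A lookup would try each query in turn and stop at the first one that returns results.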

πŸ” Medication Verification System

  • Educational medication-condition verification
  • Cross-reference with medical databases
  • Safe medication information lookup

📊 Health Data Visualization

  • Interactive Charts: Condition frequency analysis using Streamlit charts
  • Pie Charts: Top 10 conditions distribution with Matplotlib
  • Scatter Plots: Symptom correlation analysis using Plotly
  • Real-time data exploration tools

🛡️ Safety & Ethics

  • Comprehensive medical disclaimers throughout the application
  • Clear distinction between education and medical advice
  • Professional resource directory (WHO, CDC, Mayo Clinic)
  • Indian-specific medical resources (Emergency: 112, 102/108, 104)
  • Emphasis on consulting healthcare professionals

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone the repository:
git clone https://github.com/sobhan2204/HealthEdu.git
cd HealthEdu
  2. Install required dependencies:
pip install -r req.txt

Required packages:

  • streamlit
  • pandas
  • matplotlib
  • plotly
  • scikit-learn
  • requests
  • urllib3

Running the Application

Launch the Streamlit app:

streamlit run Health_edu.py

The application will open automatically in your default web browser at http://localhost:8501

📊 Dataset Architecture

The project utilizes 8 comprehensive medical datasets:

| Dataset | Purpose | Contents |
|---|---|---|
| Training.csv | ML Model Training | Disease-symptom relationships for Random Forest |
| Symptom-severity.csv | Severity Analysis | Weighted severity scores for symptoms |
| symptoms_df.csv | Symptom Database | Comprehensive symptom catalog |
| description.csv | Medical Information | Detailed disease descriptions |
| medications.csv | Pharmacology | Medication recommendations per condition |
| diets.csv | Nutrition | Dietary guidelines for conditions |
| precautions_df.csv | Safety Guidelines | Precautionary measures |
| workout_df.csv | Fitness Plans | Condition-specific exercise routines |

🤖 Machine Learning Pipeline

Architecture & Implementation

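import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the disease-symptom matrix
training_df = pd.read_csv("Training.csv")

# Feature Engineering
# Assumption: the disease label is the last column; adjust if it sits elsewhere
disease_column = training_df.columns[-1]
symptom_columns = training_df.columns[1:-1]
X = training_df[symptom_columns]  # Multi-symptom features
y = training_df[disease_column]   # Disease labels

# Train-Test Split (70-30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Random Forest Classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Prediction with accuracy tracking
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)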

Key ML Features:

  • Algorithm: Random Forest (ensemble method with multiple decision trees)
  • Validation: 70-30 train-test split for robust evaluation
  • Input Processing: Binary symptom encoding (present=1, absent=0)
  • Output: Multi-class disease classification
  • Performance Metrics: Accuracy score and classification reports
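To illustrate the accuracy score and classification report mentioned above, here is a small self-contained sketch. It runs on synthetic data rather than Training.csv: the symptom names, the three conditions, and the `prognosis` label name are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Training.csv: three made-up conditions, each tied to
# two characteristic symptoms, plus a little random background noise.
rng = np.random.default_rng(42)
symptoms = ["fever", "cough", "headache", "nausea", "rash", "itching"]
profiles = {"flu": [0, 1], "migraine": [2, 3], "allergy": [4, 5]}

rows, labels = [], []
for disease, active in profiles.items():
    for _ in range(40):
        row = (rng.random(len(symptoms)) < 0.05).astype(int)  # background noise
        row[active] = 1                                        # characteristic symptoms
        rows.append(row)
        labels.append(disease)

X = pd.DataFrame(rows, columns=symptoms)
y = pd.Series(labels, name="prognosis")

# Same 70-30 split and random_state as described in this README
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)  # per-class precision, recall, F1
print(f"accuracy: {accuracy:.3f}")
print(report)
```

The report breaks results down per class, which helps spot conditions the model confuses even when overall accuracy looks high.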

API Integrations:

  • RxTerms API: NLM Clinical Tables for medication information
  • Caching: @st.cache_data for optimized API performance (1-hour TTL)
  • Error Handling: Comprehensive fallback mechanisms
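A rough sketch of how a RxTerms lookup could be assembled and parsed is shown below. It reflects my understanding of the public NLM Clinical Tables search API (a JSON array of total count, codes, extra fields, and display strings); the function names and the sample payload are illustrative, and no network request is made here.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Base URL of the NLM Clinical Tables RxTerms search service.
RXTERMS_URL = "https://clinicaltables.nlm.nih.gov/api/rxterms/v3/search"


def build_rxterms_url(term: str, max_results: int = 5) -> str:
    """Build a search URL; `ef` requests the strengths/forms extra field."""
    params = {"terms": term, "maxList": max_results, "ef": "STRENGTHS_AND_FORMS"}
    return f"{RXTERMS_URL}?{urlencode(params)}"


def parse_rxterms_response(payload: str) -> List[Dict[str, object]]:
    """Turn the JSON array [total, codes, extra_fields, display, ...] into records."""
    data = json.loads(payload)
    names = data[1]
    extra = data[2] or {}
    strengths = extra.get("STRENGTHS_AND_FORMS", [])
    records = []
    for i, name in enumerate(names):
        forms = strengths[i] if i < len(strengths) else []
        records.append({"name": name, "strengths_and_forms": forms})
    return records


# Illustrative payload shaped like a search response (not fetched live).
sample = (
    '[1, ["Arava (Oral Pill)"], '
    '{"STRENGTHS_AND_FORMS": [["10 mg Tab", "20 mg Tab", "100 mg Tab"]]}, '
    '[["Arava (Oral Pill)"]]]'
)
print(build_rxterms_url("arava"))
print(parse_rxterms_response(sample))
```

In the app, the function that performs the HTTP request could carry `@st.cache_data(ttl=3600)`, which matches the 1-hour TTL listed above.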

🛠️ Technology Stack

Machine Learning & Backend (Developed by sobhan2204)

  • Python 3.8+: Core programming language
  • Scikit-learn: Random Forest implementation, preprocessing, metrics
  • Pandas: Data manipulation and CSV processing
  • NumPy: Numerical computations (via sklearn dependencies)
  • Requests: External API integration
  • Urllib: URL encoding for API calls

Frontend & Visualization (Developed by my friend)

  • Streamlit: Interactive web application framework
  • Matplotlib: Static visualizations (pie charts, histograms)
  • Plotly Express: Interactive 3D scatter plots and dynamic charts
  • Custom CSS: Styled UI components with disclaimer boxes

πŸ“ Project Structure

HealthEdu/
├── Health_edu.py              # Main Streamlit application (ML + Frontend)
│   ├── ML Components:
│   │   ├── Random Forest training
│   │   ├── Symptom preprocessing
│   │   ├── Disease prediction
│   │   └── API integrations
│   └── Frontend Components:
│       ├── Streamlit UI
│       ├── Interactive visualizations
│       └── User interface design
│
├── Training.csv               # ML training dataset (disease-symptom matrix)
├── Symptom-severity.csv       # Symptom severity weights
├── symptoms_df.csv            # Symptom reference database
├── description.csv            # Disease information
├── medications.csv            # Medication database
├── diets.csv                  # Dietary recommendations
├── precautions_df.csv         # Safety precautions
├── workout_df.csv             # Exercise routines
├── req.txt                    # Python dependencies
└── README.md                  # Documentation

💡 How It Works

1. User Input

Users select symptoms from an interactive multi-select interface laid out as a five-column grid of buttons.

2. Data Preprocessing

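import pandas as pd

# Binary encoding of symptoms: `selected` holds the symptoms chosen in the UI,
# `symptom_columns` the feature columns used during training
input_data = {symptom: (1 if symptom in selected else 0)
              for symptom in symptom_columns}
input_df = pd.DataFrame([input_data])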

3. ML Prediction

The Random Forest classifier analyzes symptom patterns and predicts the most likely condition.
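As a hedged illustration of this step, the sketch below encodes a symptom selection and ranks conditions by the forest's class probabilities. The symptom names, condition names, training rows, and helper names (`encode_symptoms`, `rank_conditions`) are made up so the example runs on its own; whether the app shows more than the top prediction is not stated here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

symptom_columns = ["fever", "cough", "headache", "nausea"]  # hypothetical names

# Tiny made-up training set: five identical rows per condition.
patterns = {
    "flu": [1, 1, 0, 0],
    "migraine": [0, 0, 1, 1],
    "gastritis": [0, 0, 0, 1],
}
rows, labels = [], []
for disease, pattern in patterns.items():
    for _ in range(5):
        rows.append(pattern)
        labels.append(disease)

X = pd.DataFrame(rows, columns=symptom_columns)
model = RandomForestClassifier(random_state=42).fit(X, labels)


def encode_symptoms(selected, columns):
    """Binary-encode the selected symptoms in the training column order."""
    data = {name: (1 if name in selected else 0) for name in columns}
    return pd.DataFrame([data], columns=columns)


def rank_conditions(model, input_df, k=3):
    """Return the k most probable (condition, probability) pairs."""
    probs = model.predict_proba(input_df)[0]
    ranked = sorted(zip(model.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return [(str(label), float(p)) for label, p in ranked[:k]]


input_df = encode_symptoms({"headache", "nausea"}, symptom_columns)
print(rank_conditions(model, input_df))
```

Showing a short ranked list instead of a single label makes it clearer to the user that the output is a pattern match, not a diagnosis.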

4. Information Retrieval

The system fetches relevant information from multiple CSV databases:

  • Disease descriptions
  • Medication recommendations (with API verification)
  • Dietary guidelines
  • Precautionary measures
  • Workout suggestions
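The lookup across CSV files might look something like the sketch below. The column names (`Disease`, `Description`, `Precaution_1`, ...) and the placeholder rows are assumptions, and in-memory text stands in for description.csv and precautions_df.csv so the example is self-contained.

```python
import io

import pandas as pd

# In-memory stand-ins for description.csv and precautions_df.csv.
# Column names and contents are placeholders, not the real files.
DESCRIPTION_CSV = """Disease,Description
Condition A,Placeholder description for condition A.
Condition B,Placeholder description for condition B.
"""
PRECAUTIONS_CSV = """Disease,Precaution_1,Precaution_2
Condition A,placeholder precaution 1,placeholder precaution 2
Condition B,placeholder precaution 3,placeholder precaution 4
"""

description_df = pd.read_csv(io.StringIO(DESCRIPTION_CSV))
precautions_df = pd.read_csv(io.StringIO(PRECAUTIONS_CSV))


def lookup(df, disease, key="Disease"):
    """Return the first matching row as a dict (case-insensitive), or {} if absent."""
    mask = df[key].str.strip().str.lower() == disease.strip().lower()
    if not mask.any():
        return {}
    return df[mask].iloc[0].drop(labels=key).to_dict()


def gather_info(disease):
    """Collect the pieces shown to the user for one predicted condition."""
    return {
        "description": lookup(description_df, disease).get("Description"),
        "precautions": list(lookup(precautions_df, disease).values()),
    }


print(gather_info("condition b"))
```

Returning empty values for an unknown condition lets the UI show a "no information available" message instead of raising an error.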

5. Educational Presentation

Results are displayed with clear disclaimers, professional resource links, and next steps.

🎯 Application Modules

🏠 Home

  • Platform overview and mission
  • Clear educational disclaimers
  • Usage instructions
  • Professional resource directory

📋 Symptom Information System

  • Multi-symptom selection interface
  • ML-based pattern matching
  • Condition information display
  • Doctor consultation guidance

💊 Medication Education

  • Symptom-to-medication learning
  • Interactive medication lookup
  • RxTerms API integration
  • Indian government portal links
  • Never replaces prescriptions

πŸ” Medication Verification

  • Condition-medication association checker
  • Educational database cross-reference
  • Individual medicine lookup
  • Safety information

📊 Data Visualization

  • Condition frequency bar charts
  • Distribution pie charts (Top 10)
  • Symptom scatter plots with Plotly
  • Interactive data exploration
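The Top 10 pie chart could be produced along these lines. This is only a sketch: the function names and output path are hypothetical, and the plotting helper imports Matplotlib lazily so the counting step works without it.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_conditions(labels: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count how often each condition occurs and keep the n most frequent."""
    return Counter(labels).most_common(n)


def plot_top_conditions(labels, n=10, path="top_conditions.png"):
    """Save a pie chart of the n most frequent conditions and return the file path."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is required
    import matplotlib.pyplot as plt

    top = top_conditions(labels, n)
    names = [name for name, _ in top]
    counts = [count for _, count in top]

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(counts, labels=names, autopct="%1.1f%%", startangle=90)
    ax.set_title(f"Top {len(top)} conditions")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path


print(top_conditions(["a", "b", "a", "c", "a", "b"], n=2))
```

Inside a Streamlit page, the figure would typically be passed to `st.pyplot(fig)` instead of being saved to disk.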

πŸ₯ Professional Resources

  • Emergency contacts (India: 112, 102/108, 104)
  • WHO, CDC, Mayo Clinic links
  • Indian government health portals
  • Verified medical databases

⚠️ Critical Disclaimers

This application is STRICTLY educational and NOT for medical diagnosis, treatment, or prescription.

  • ❌ NOT a substitute for professional medical advice
  • ❌ NOT a diagnostic tool
  • ❌ NOT a prescription service
  • ✅ IS an educational learning platform
  • ✅ IS designed to facilitate informed conversations with doctors

Always consult qualified healthcare professionals for:

  • Medical diagnosis
  • Treatment decisions
  • Medication prescriptions
  • Health concerns

Emergency contacts are provided throughout the application.

🌍 Indian Healthcare Integration

Special features for Indian users:

  • CDSCO (Central Drugs Standard Control Organization)
  • IPC (Indian Pharmacopoeia Commission)
  • Emergency numbers: 112 (National), 102/108 (Ambulance), 104 (Medical Helpline)
  • Links to 1mg, PharmEasy, Apollo Pharmacy
  • Doctor search via Practo, DocPrime, Lybrate

👥 Development Credits

🤖 Machine Learning & Backend Development

Developed by: sobhan2204

Responsibilities:

  • Random Forest classifier implementation
  • Data preprocessing and feature engineering
  • API integrations (RxTerms, medical databases)
  • Prediction engine development
  • Model training and validation
  • Backend logic and data flow

🎨 Frontend Development

Developed by: My talented friend

Responsibilities:

  • Streamlit interface design
  • Custom CSS styling
  • Interactive visualizations (Matplotlib, Plotly)
  • User experience optimization
  • UI components and layouts

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Areas for contribution:

  • Additional ML algorithms (SVM, Neural Networks)
  • Expanded datasets
  • Multi-language support
  • Mobile responsiveness
  • Additional visualizations

πŸ“ Future Enhancements

  • Deep learning models (CNN, LSTM)
  • Medical imaging analysis
  • Chatbot integration with LLMs
  • Multi-language support (Hindi, regional languages)
  • Mobile app (React Native/Flutter)
  • Wearable device integration
  • Enhanced API integrations
  • User health history tracking
  • Telemedicine platform integration
  • Expanded Indian regional medical resources

πŸ› Known Issues

  • CSV filename typo handling: the symptom file may be named symptoms_df.csv or, misspelled, symtoms_df.csv
  • RxTerms API may have rate limits (cached for 1 hour)
  • Some medications may require name normalization for accurate lookup
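One way to cope with the filename mismatch is to try both spellings and use whichever exists. The helper below is a minimal sketch with a hypothetical name, not necessarily how the app handles it.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_first_existing(directory: Path, candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate filename that exists in `directory`, else None."""
    for name in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


# Prefer the correct spelling, then fall back to the misspelled one.
symptoms_path = find_first_existing(Path("."), ["symptoms_df.csv", "symtoms_df.csv"])
if symptoms_path is None:
    print("No symptom CSV found under either name")
else:
    print(f"Loading symptoms from {symptoms_path.name}")
```

The resulting path can then be handed to `pd.read_csv`; renaming the file in the repository would remove the need for the fallback.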

📊 Model Performance

The Random Forest classifier is trained with:

  • Test size: 30% of dataset
  • Random state: 42 (reproducible results)
  • Validation: Accuracy score and classification report
  • Real-time inference: < 1 second prediction time
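The inference-time figure can be checked with a simple timing loop. The sketch below uses random binary data with an assumed 50 symptom columns and 10 classes, so its numbers only indicate how one might measure latency, not what the real model achieves.

```python
import statistics
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random binary features as a stand-in for the real symptom matrix.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(500, 50))
y = rng.integers(0, 10, size=500)
model = RandomForestClassifier(random_state=42).fit(X, y)


def time_single_prediction(model, row, repeats=20):
    """Median wall-clock seconds needed to predict one row."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.predict(row)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


latency = time_single_prediction(model, X[:1])
print(f"median single-row latency: {latency * 1000:.1f} ms")
```

Using the median over several runs reduces the influence of one-off delays such as the first call warming up.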

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📧 Contact

Sobhan - @sobhan2204

Project Link: https://github.com/sobhan2204/HealthEdu


⭐ If you find this project helpful for learning about health and ML, please star the repository!

Made with ❤️ for health education, medical ethics, and responsible AI


πŸ™ Acknowledgments

  • Medical datasets used for educational purposes
  • NLM Clinical Tables (RxTerms API)
  • Indian government health portals
  • Streamlit community
  • Scikit-learn documentation
  • Healthcare professionals who inspire ethical health tech
