Run HealthEdu with: `streamlit run ossp_pbl.py`
A comprehensive health education platform that leverages Random Forest Classification to predict diseases based on symptoms and provides personalized health recommendations including medications, dietary plans, precautions, and workout routines. Built with a strong emphasis on medical ethics and educational transparency.
- Random Forest Classifier: Advanced ensemble learning for accurate disease-symptom pattern matching
- Multi-Label Symptom Processing: Handles complex symptom combinations
- 30% Test Split Validation: 30% of the data is held out to evaluate the model on unseen examples
- Real-time Prediction Engine: Instant disease classification from user inputs
- Interactive symptom selection interface
- Pattern-based condition matching
- Educational disease information lookup
- Multi-symptom analysis capabilities
- Drug-condition association learning
- Integration with RxTerms API (NLM Clinical Tables)
- Links to Indian government portals (CDSCO, IPC)
- Access to verified databases (1mg, PharmEasy, Drugs.com)
- Intelligent medicine name normalization and fallback queries
- Educational medication-condition verification
- Cross-reference with medical databases
- Safe medication information lookup
- Interactive Charts: Condition frequency analysis using Streamlit charts
- Pie Charts: Top 10 conditions distribution with Matplotlib
- Scatter Plots: Symptom correlation analysis using Plotly
- Real-time data exploration tools
- Comprehensive medical disclaimers throughout the application
- Clear distinction between education and medical advice
- Professional resource directory (WHO, CDC, Mayo Clinic)
- Indian-specific medical resources (Emergency: 112, 102/108, 104)
- Emphasis on consulting healthcare professionals
- Python 3.8 or higher
- pip package manager
- Clone the repository:

```bash
git clone https://github.com/sobhan2204/HealthEdu.git
cd HealthEdu
```

- Install required dependencies:

```bash
pip install -r req.txt
```

Required packages:
- streamlit
- pandas
- matplotlib
- plotly
- scikit-learn
- requests
- urllib3
Launch the Streamlit app:

```bash
streamlit run Health_edu.py
```

The application will open automatically in your default web browser at http://localhost:8501.
The project utilizes 8 comprehensive medical datasets:
| Dataset | Purpose | Contents |
|---|---|---|
| Training.csv | ML Model Training | Disease-symptom relationships for Random Forest |
| Symptom-severity.csv | Severity Analysis | Weighted severity scores for symptoms |
| symptoms_df.csv | Symptom Database | Comprehensive symptom catalog |
| description.csv | Medical Information | Detailed disease descriptions |
| medications.csv | Pharmacology | Medication recommendations per condition |
| diets.csv | Nutrition | Dietary guidelines for conditions |
| precautions_df.csv | Safety Guidelines | Precautionary measures |
| workout_df.csv | Fitness Plans | Condition-specific exercise routines |
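
The app reads all eight CSV files listed above. Below is a minimal loading sketch, assuming all files sit in one directory. `DATASET_FILES` and `load_datasets` are hypothetical names. The alternate `symtoms_df.csv` spelling comes from the known issue about filename typos, and the loader tries each candidate name in order.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from a short key to candidate filenames, tried in order.
# "symtoms_df.csv" covers the known misspelling of symptoms_df.csv.
DATASET_FILES = {
    "training": ["Training.csv"],
    "severity": ["Symptom-severity.csv"],
    "symptoms": ["symptoms_df.csv", "symtoms_df.csv"],
    "description": ["description.csv"],
    "medications": ["medications.csv"],
    "diets": ["diets.csv"],
    "precautions": ["precautions_df.csv"],
    "workout": ["workout_df.csv"],
}


def load_datasets(data_dir="."):
    """Load every dataset into a dict of DataFrames, trying filename alternatives."""
    base = Path(data_dir)
    frames = {}
    for key, candidates in DATASET_FILES.items():
        for name in candidates:
            path = base / name
            if path.exists():
                frames[key] = pd.read_csv(path)
                break
        else:
            # None of the candidate filenames exist for this dataset
            raise FileNotFoundError(f"None of {candidates} found in {base}")
    return frames
```

Failing loudly on a missing file is a deliberate choice here. A silently empty table would otherwise surface much later as a confusing lookup miss.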
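```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

training_df = pd.read_csv("Training.csv")

# Feature Engineering
# Assumes the first column is not a symptom and the last column is the disease label
symptom_columns = training_df.columns[1:-1]
disease_column = training_df.columns[-1]
X = training_df[symptom_columns]  # Multi-symptom features (binary 0/1)
y = training_df[disease_column]  # Disease labels

# Train-Test Split (70-30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Random Forest Classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Prediction with accuracy tracking
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
```

- Algorithm: Random Forest (ensemble method with multiple decision trees)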
- Validation: 70-30 train-test split, with metrics computed on the held-out 30%
- Input Processing: Binary symptom encoding (present=1, absent=0)
- Output: Multi-class disease classification
- Performance Metrics: Accuracy score and classification reports
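
The training snippet needs `Training.csv` to run. The following self-contained sketch produces the accuracy score and classification report on a small synthetic symptom matrix instead. The symptom names, diseases, and 5% bit-flip noise rate are invented. `stratify=y` is an addition that does not appear in the original snippet.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Training.csv: binary symptom columns plus a label column.
rng = np.random.default_rng(42)
patterns = {
    "flu": [1, 1, 0, 0],
    "allergy": [0, 1, 1, 0],
    "gastritis": [0, 0, 0, 1],
}
rows = []
for disease, base in patterns.items():
    for _ in range(40):
        # Flip each symptom bit with 5% probability to add noise
        noisy = [bit if rng.random() > 0.05 else 1 - bit for bit in base]
        rows.append(noisy + [disease])
columns = ["fever", "sneezing", "itchy_eyes", "stomach_pain", "prognosis"]
df = pd.DataFrame(rows, columns=columns)

X = df[columns[:-1]]
y = df["prognosis"]
# 70-30 split; stratify keeps each disease's share equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)  # per-class precision/recall/F1
print(f"Accuracy: {accuracy:.2f}")
print(report)
```

Stratifying matters most when some conditions have only a few rows. Without it, a rare disease can end up entirely in the test split.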
- RxTerms API: NLM Clinical Tables for medication information
- Caching: `@st.cache_data` for faster repeated API lookups (1-hour TTL)
- Error Handling: fallback queries when a lookup fails
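
Below is a rough sketch of an RxTerms lookup with name normalization and fallback queries. The endpoint is the NLM Clinical Tables RxTerms search API. Everything else is an assumption for illustration and not the app's actual code: the normalization rules, the first-word fallback, the `ef=STRENGTHS_AND_FORMS` extra field, and all function names. The sketch uses the standard library's `urllib` rather than `requests` so it has no dependencies.

```python
import json
import re
import urllib.parse
import urllib.request

# NLM Clinical Tables RxTerms search endpoint
RXTERMS_URL = "https://clinicaltables.nlm.nih.gov/api/rxterms/v3/search"


def normalize_medicine_name(name):
    """Lowercase, drop dosage strengths like '500 mg', and collapse whitespace."""
    name = name.lower()
    name = re.sub(r"\d+(\.\d+)?\s*(mg|mcg|g|ml)\b", " ", name)
    name = re.sub(r"[^a-z\s-]", " ", name)  # strip punctuation and stray digits
    return " ".join(name.split())


def candidate_queries(name):
    """Full normalized name first, then its first word as a fallback query."""
    normalized = normalize_medicine_name(name)
    candidates = [normalized]
    first_word = normalized.split(" ")[0] if normalized else ""
    if first_word and first_word != normalized:
        candidates.append(first_word)
    return candidates


def build_rxterms_url(term):
    """Build the search URL; 'ef' requests an extra field (assumed choice)."""
    params = {"terms": term, "ef": "STRENGTHS_AND_FORMS"}
    return f"{RXTERMS_URL}?{urllib.parse.urlencode(params)}"


def lookup_medicine(name, timeout=10):
    """Try each candidate query; return the first response with matches, else None."""
    for term in candidate_queries(name):
        try:
            with urllib.request.urlopen(build_rxterms_url(term), timeout=timeout) as resp:
                data = json.load(resp)
        except (OSError, ValueError):
            continue  # network error or malformed JSON: try the next candidate
        if data and data[0]:  # first element of the response is the total match count
            return data
    return None
```

In the Streamlit app, a fetch function like `lookup_medicine` could be decorated with `@st.cache_data(ttl=3600)` to match the 1-hour cache described above.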
- Python 3.8+: Core programming language
- Scikit-learn: Random Forest implementation, preprocessing, metrics
- Pandas: Data manipulation and CSV processing
- NumPy: Numerical computations (via sklearn dependencies)
- Requests: External API integration
- Urllib: URL encoding for API calls
- Streamlit: Interactive web application framework
- Matplotlib: Static visualizations (pie charts, histograms)
- Plotly Express: Interactive 3D scatter plots and dynamic charts
- Custom CSS: Styled UI components with disclaimer boxes
```text
HealthEdu/
├── Health_edu.py          # Main Streamlit application (ML + Frontend)
│   ├── ML Components:
│   │   ├── Random Forest training
│   │   ├── Symptom preprocessing
│   │   ├── Disease prediction
│   │   └── API integrations
│   └── Frontend Components:
│       ├── Streamlit UI
│       ├── Interactive visualizations
│       └── User interface design
│
├── Training.csv           # ML training dataset (disease-symptom matrix)
├── Symptom-severity.csv   # Symptom severity weights
├── symptoms_df.csv        # Symptom reference database
├── description.csv        # Disease information
├── medications.csv        # Medication database
├── diets.csv              # Dietary recommendations
├── precautions_df.csv     # Safety precautions
├── workout_df.csv         # Exercise routines
├── req.txt                # Python dependencies
└── README.md              # Documentation
```
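Users select symptoms from an interactive multi-select interface with a 5-column button layout.

```python
import pandas as pd

# Binary encoding of symptoms: 1 if the user selected it, else 0.
# `selected` holds the symptoms chosen in the UI; `symptom_columns` comes from training.
input_data = {symptom: (1 if symptom in selected else 0)
              for symptom in symptom_columns}
input_df = pd.DataFrame([input_data])
```

The Random Forest classifier analyzes the symptom pattern and predicts the most likely condition.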
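
Here the encoding step is extended into a prediction. This sketch returns the three most probable conditions via `predict_proba` instead of only the single top label, which can make it clearer to users that the output is a pattern match and not a diagnosis. The training rows and the `predict_top_k` helper are hypothetical, and the app itself may only show the top prediction. Passing `columns=symptom_columns` keeps the input columns in the same order as the training features.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training matrix: binary symptom columns, one disease label per row.
symptom_columns = ["fever", "cough", "rash", "nausea"]
train = pd.DataFrame(
    [
        [1, 1, 0, 0, "flu"],
        [1, 1, 0, 1, "flu"],
        [0, 0, 1, 0, "allergy"],
        [1, 0, 1, 0, "allergy"],
        [0, 0, 0, 1, "gastritis"],
        [1, 0, 0, 1, "gastritis"],
    ],
    columns=symptom_columns + ["prognosis"],
)
model = RandomForestClassifier(random_state=42).fit(
    train[symptom_columns], train["prognosis"]
)


def predict_top_k(model, symptom_columns, selected, k=3):
    """Binary-encode the selected symptoms and return the k most probable labels."""
    input_data = {symptom: (1 if symptom in selected else 0) for symptom in symptom_columns}
    input_df = pd.DataFrame([input_data], columns=symptom_columns)
    probabilities = model.predict_proba(input_df)[0]  # one row: one probability per class
    ranked = sorted(zip(model.classes_, probabilities), key=lambda p: p[1], reverse=True)
    return ranked[:k]


for disease, prob in predict_top_k(model, symptom_columns, {"fever", "cough"}):
    print(f"{disease}: {prob:.2f}")
```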
The system fetches relevant information from multiple CSV databases:
- Disease descriptions
- Medication recommendations (with API verification)
- Dietary guidelines
- Precautionary measures
- Workout suggestions
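
Here is a minimal sketch of that lookup, assuming each CSV has a `Disease` column to match the predicted label against. The column names, sample rows, and `get_condition_info` helper are all hypothetical. Matching is case-insensitive and ignores surrounding whitespace, and empty cells are dropped.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files. The "Disease" key column
# and the value column names are assumptions about the file layout.
description_df = pd.DataFrame(
    {"Disease": ["Migraine"], "Description": ["A recurring type of headache."]}
)
diets_df = pd.DataFrame({"Disease": ["Migraine"], "Diet": ["Stay hydrated"]})
precautions_df = pd.DataFrame(
    {
        "Disease": ["Migraine"],
        "Precaution_1": ["Rest in a dark room"],
        "Precaution_2": ["Avoid known triggers"],
        "Precaution_3": [None],
    }
)

# Maps a section name to (table, columns holding the information to show)
TABLES = {
    "description": (description_df, ["Description"]),
    "diet": (diets_df, ["Diet"]),
    "precautions": (precautions_df, ["Precaution_1", "Precaution_2", "Precaution_3"]),
}


def get_condition_info(disease, tables=TABLES):
    """Return non-empty values for `disease` from every table (case-insensitive match)."""
    key = disease.strip().lower()
    info = {}
    for section, (df, value_columns) in tables.items():
        matches = df[df["Disease"].str.strip().str.lower() == key]
        values = matches[value_columns].to_numpy().ravel().tolist()
        info[section] = [v for v in values if pd.notna(v)]  # drop empty cells
    return info


print(get_condition_info("migraine"))
```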
Results are displayed with clear disclaimers, professional resource links, and next steps.
- Platform overview and mission
- Clear educational disclaimers
- Usage instructions
- Professional resource directory
- Multi-symptom selection interface
- ML-based pattern matching
- Condition information display
- Doctor consultation guidance
- Symptom-to-medication learning
- Interactive medication lookup
- RxTerms API integration
- Indian government portal links
- Never replaces prescriptions
- Condition-medication association checker
- Educational database cross-reference
- Individual medicine lookup
- Safety information
- Condition frequency bar charts
- Distribution pie charts (Top 10)
- Symptom scatter plots with Plotly
- Interactive data exploration
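
Below is a sketch of the Top 10 distribution pie chart. It assumes the condition labels come from the training data's disease column, and `top_conditions_pie` is a hypothetical helper. The non-interactive `Agg` backend lets it run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def top_conditions_pie(labels, n=10):
    """Count condition labels and draw a pie chart of the n most frequent."""
    counts = pd.Series(labels).value_counts().head(n)  # sorted, most frequent first
    fig, ax = plt.subplots()
    ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%")
    ax.set_title(f"Top {n} conditions")
    return fig, counts
```

In Streamlit, the figure can be rendered with `st.pyplot(fig)`. The same `counts` Series can be passed to `st.bar_chart(counts)` for the condition frequency bar chart.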
- Emergency contacts (India: 112, 102/108, 104)
- WHO, CDC, Mayo Clinic links
- Indian government health portals
- Verified medical databases
This application is STRICTLY educational and NOT for medical diagnosis, treatment, or prescription.
- ❌ NOT a substitute for professional medical advice
- ❌ NOT a diagnostic tool
- ❌ NOT a prescription service
- ✅ IS an educational learning platform
- ✅ IS designed to facilitate informed conversations with doctors
Always consult qualified healthcare professionals for:
- Medical diagnosis
- Treatment decisions
- Medication prescriptions
- Health concerns
Emergency contacts are provided throughout the application.
Special features for Indian users:
- CDSCO (Central Drugs Standard Control Organization)
- IPC (Indian Pharmacopoeia Commission)
- Emergency numbers: 112 (National), 102/108 (Ambulance), 104 (Medical Helpline)
- Links to 1mg, PharmEasy, Apollo Pharmacy
- Doctor search via Practo, DocPrime, Lybrate
Developed by: sobhan2204
Responsibilities:
- Random Forest classifier implementation
- Data preprocessing and feature engineering
- API integrations (RxTerms, medical databases)
- Prediction engine development
- Model training and validation
- Backend logic and data flow
Developed by: My talented friend
Responsibilities:
- Streamlit interface design
- Custom CSS styling
- Interactive visualizations (Matplotlib, Plotly)
- User experience optimization
- UI components and layouts
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Areas for contribution:
- Additional ML algorithms (SVM, Neural Networks)
- Expanded datasets
- Multi-language support
- Mobile responsiveness
- Additional visualizations
- Deep learning models (CNN, LSTM)
- Medical imaging analysis
- Chatbot integration with LLMs
- Multi-language support (Hindi, regional languages)
- Mobile app (React Native/Flutter)
- Wearable device integration
- Enhanced API integrations
- User health history tracking
- Telemedicine platform integration
- Expanded Indian regional medical resources
- CSV filename typo handling: `symptoms_df.csv` vs `symtoms_df.csv`
- RxTerms API may have rate limits (responses are cached for 1 hour)
- Some medications may require name normalization for accurate lookup
The Random Forest classifier is trained with:
- Test size: 30% of dataset
- Random state: 42 (reproducible results)
- Validation: Accuracy score and classification report
- Real-time inference: < 1 second prediction time
This project is licensed under the MIT License - see the LICENSE file for details.
Sobhan - @sobhan2204
Project Link: https://github.com/sobhan2204/HealthEdu
⭐ If you find this project helpful for learning about health and ML, please star the repository!
Made with ❤️ for health education, medical ethics, and responsible AI
- Medical datasets used for educational purposes
- NLM Clinical Tables (RxTerms API)
- Indian government health portals
- Streamlit community
- Scikit-learn documentation
- Healthcare professionals who inspire ethical health tech