HealthNER is a Natural Language Processing (NLP) project built using spaCy to automatically extract meaningful medical entities such as diseases, symptoms, and drugs from unstructured clinical text. This allows healthcare data to be converted into a more structured and analyzable format.
Try it now: https://healthner.streamlit.app/
- 🔍 Extracts Diseases, Symptoms, and other entities from raw medical text
- 🧠 Trains a custom Named Entity Recognition (NER) model using the NCBI Disease dataset
- 📈 Evaluates model performance using Precision, Recall, and F1 Score
- 🖼️ Visualizes entities using spaCy's `displacy` and a Streamlit UI
- 📦 Deployable as an interactive web app
- 🌙 Dark/Light mode toggle for better user experience
- 📊 Interactive charts and analytics with entity distribution
- 📄 Export functionality (PDF/CSV) for analysis reports
- 🕓 Analysis history tracking
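The entity-distribution chart and the CSV export listed above both start from a flat list of extracted entities. The following is a minimal sketch of that step using only the standard library. It assumes entities arrive as `(text, label)` pairs; the function names `entity_distribution` and `entities_to_csv` and the sample labels are hypothetical and are not taken from the project's code.

```python
import csv
import io
from collections import Counter


def entity_distribution(entities):
    """Count how many entities were found per label."""
    return Counter(label for _text, label in entities)


def entities_to_csv(entities):
    """Serialize (text, label) pairs to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "label"])
    writer.writerows(entities)
    return buffer.getvalue()


# Hypothetical sample; with spaCy these pairs could come from
# [(ent.text, ent.label_) for ent in doc.ents].
sample_entities = [
    ("asthma", "DISEASE"),
    ("cough", "SYMPTOM"),
    ("diabetes", "DISEASE"),
]

distribution = entity_distribution(sample_entities)
csv_text = entities_to_csv(sample_entities)
```

The counts in `distribution` are what a bar or pie chart would plot, and `csv_text` is what a download button would serve.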
- Python - Core programming language
- spaCy - NLP library for entity recognition
- Streamlit - Web app framework
- Plotly - Interactive visualizations
- Pandas - Data manipulation
- FPDF - PDF report generation
.
├── convert_ncbi_to_spacy.py # Converts NCBI dataset into spaCy format
├── train.py # Trains the NER model using spaCy
├── test_model.py # Tests the model on simple input
├── test_model_on_data.py # Tests on complex clinical paragraphs
├── streamlit_app.py # Streamlit UI for real-time testing
├── visualize.py # Saves NER visualizations as HTML
├── requirements.txt # Python dependencies
├── output/ # Trained model files
└── README.md # Project overview
- Python 3.7+
- pip package manager
- Clone the repository
  git clone https://github.com/DSUCS0018/HeallthNER.git
  cd HeallthNER
- Install dependencies
  pip install -r requirements.txt
- Download the spaCy English model
  python -m spacy download en_core_web_sm
# Convert NCBI dataset to spaCy format
python convert_ncbi_to_spacy.py
# Train the custom NER model
python train.py
# Test on simple input
python test_model.py
# Test on complex clinical data
python test_model_on_data.py
# Launch the Streamlit app
streamlit run streamlit_app.py

The app will be available at http://localhost:8501
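To illustrate what the conversion step involves, here is a rough sketch that turns one PubTator-style record into the `(text, {"entities": [...]})` shape commonly used as spaCy training input. It is not the contents of `convert_ncbi_to_spacy.py`. It assumes that annotation offsets are counted over the title and abstract joined by a single character, and that every annotation type is collapsed into one `DISEASE` label. The sample record, its `X1`/`X2` identifiers, and the name `parse_pubtator` are made up for the example.

```python
def parse_pubtator(record, label="DISEASE"):
    """Convert one PubTator-style record to (text, {"entities": [...]})."""
    title, abstract, spans = "", "", []
    for line in record.strip().splitlines():
        fields = line.split("\t")
        # Annotation line: id, start, end, mention, type, concept id
        if len(fields) >= 5 and fields[1].isdigit() and fields[2].isdigit():
            spans.append((int(fields[1]), int(fields[2]), label))
            continue
        # Text line: id|t|title or id|a|abstract
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[1] == "t":
            title = parts[2]
        elif len(parts) == 3 and parts[1] == "a":
            abstract = parts[2]
    # Assumption: offsets run over title + one separator character + abstract
    text = f"{title} {abstract}"
    return text, {"entities": spans}


# Made-up record in the assumed format (not real corpus data)
SAMPLE = (
    "1|t|Familial breast cancer study\n"
    "1|a|We examined patients with ovarian cancer.\n"
    "1\t9\t22\tbreast cancer\tSpecificDisease\tX1\n"
    "1\t55\t69\tovarian cancer\tSpecificDisease\tX2\n"
)

text, annotations = parse_pubtator(SAMPLE)
```

Slicing `text` with each `(start, end)` pair should return the annotated mention, which is a cheap sanity check to run before training.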
The custom NER model is trained on the NCBI Disease dataset and evaluated using standard NLP metrics:
- Precision: Fraction of predicted entities that are correct
- Recall: Fraction of actual entities that the model finds
- F1 Score: Harmonic mean of precision and recall
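The three metrics above can be computed directly from sets of entity spans. The sketch below uses exact-match scoring, where a prediction counts as correct only if its start, end, and label all match a gold span. That matching rule is an assumption (the project may score differently), and `span_prf` and the sample spans are hypothetical.

```python
def span_prf(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans as (start, end, label)
gold_spans = [(0, 6, "DISEASE"), (20, 28, "DISEASE"), (40, 45, "DISEASE")]
predicted_spans = [
    (0, 6, "DISEASE"),
    (20, 28, "DISEASE"),
    (50, 55, "DISEASE"),
    (60, 66, "DISEASE"),
]

precision, recall, f1 = span_prf(gold_spans, predicted_spans)
```

Here two of the four predictions are correct (precision 0.5) and two of the three gold entities are found (recall about 0.67), giving an F1 of 4/7, roughly 0.57.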
- Clinical Research: Extract entities from research papers and clinical notes
- Healthcare Analytics: Convert unstructured patient data into structured format
- Medical Documentation: Automatically tag and categorize medical documents
- Drug Discovery: Identify relationships between diseases, symptoms, and treatments
- spaCy team for the excellent NLP library
- NCBI for providing the disease dataset
- Streamlit for the amazing web app framework
Your Name - [Your Email] - [Your LinkedIn]
Project Link: https://github.com/DSUCS0018/HeallthNER
Live Demo: https://heallthner.streamlit.app/

