This project automates the cleansing of healthcare records using Natural Language Processing (NLP). By applying NLP techniques, we aim to improve the quality, consistency, and accuracy of healthcare data so that it is suitable for further analysis and machine learning applications.
Name: healthcare_dataset.csv
Source: Kaggle
Description: The dataset consists of healthcare records with various inconsistencies such as missing values, duplicate entries, and incorrect formats. It is used to train and evaluate our NLP-based data cleansing model.
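The three kinds of inconsistency described above can be counted before any cleansing starts. The sketch below is illustrative and is not taken from NLP.ipynb. It works on rows as a list of dicts (the shape `csv.DictReader` produces), treats blank cells as missing, flags exact duplicate rows, and assumes that date columns should follow ISO 8601 (`YYYY-MM-DD`). The function name `find_issues` and the sample column names are hypothetical.

```python
import re
from collections import Counter

# Assumed target format for date columns: ISO 8601 (YYYY-MM-DD).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_issues(rows, date_columns=()):
    """Count missing cells, exact duplicate rows and mis-formatted dates.

    rows: list of dicts, e.g. list(csv.DictReader(f)).
    date_columns: names of columns expected to hold ISO dates (hypothetical).
    """
    issues = Counter()
    seen = set()
    for row in rows:
        # A cell is "missing" if it is None or contains only whitespace.
        for value in row.values():
            if value is None or str(value).strip() == "":
                issues["missing_values"] += 1

        # Exact duplicate: same values as an earlier row. Assumes every row
        # lists its columns in the same order, as csv.DictReader does.
        key = tuple(str(value) for value in row.values())
        if key in seen:
            issues["duplicates"] += 1
        else:
            seen.add(key)

        # Non-empty date cells that do not fully match the expected pattern.
        for column in date_columns:
            value = (row.get(column) or "").strip()
            if value and not ISO_DATE.fullmatch(value):
                issues["format_errors"] += 1
    return issues


# Hypothetical sample rows; column names are illustrative only.
rows = [
    {"Name": "Bobby Jackson", "Age": "30", "Date of Admission": "2024-01-31"},
    {"Name": "Bobby Jackson", "Age": "30", "Date of Admission": "2024-01-31"},
    {"Name": "", "Age": "62", "Date of Admission": "31/01/2024"},
]
print(find_issues(rows, date_columns=["Date of Admission"]))
```

Counting issues per category, rather than fixing them inline, keeps detection separate from correction, which makes it easy to report the same counts before and after cleansing.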
Notebook Name: NLP.ipynb
Details: The dataset is preprocessed, and the cleansing model is trained and evaluated, in the Jupyter Notebook NLP.ipynb.
The notebook contains:
Data loading and preprocessing steps
NLP-based cleansing techniques
Model training and validation
Performance analysis and results
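The notebook's NLP-based cleansing code is not reproduced in this README. As a minimal sketch of the kind of text normalization such a step might perform, the code below collapses whitespace and casing in free-text fields, maps misspelled categorical values to a controlled vocabulary with `difflib.get_close_matches` (plain string-similarity matching, swapped in here for illustration), and removes duplicates after normalization. The vocabulary `CANONICAL_CONDITIONS`, the 0.8 similarity cutoff, and all function names are assumptions.

```python
import difflib
import re

# Hypothetical controlled vocabulary for a categorical column such as a
# medical-condition field; replace with the dataset's real categories.
CANONICAL_CONDITIONS = ["Arthritis", "Asthma", "Cancer", "Diabetes", "Hypertension", "Obesity"]


def normalize_text(value):
    """Collapse runs of whitespace, trim, and title-case a free-text value."""
    return re.sub(r"\s+", " ", value).strip().title()


def standardize_category(value, vocabulary, cutoff=0.8):
    """Map a noisy variant (typo, casing, spacing) to the closest canonical term.

    Returns None when nothing in `vocabulary` is similar enough, so the value
    can be flagged for manual review instead of being silently changed.
    """
    matches = difflib.get_close_matches(normalize_text(value), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def deduplicate(rows, key_columns):
    """Keep the first row for each normalized key; later near-duplicates are dropped."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalize_text(row.get(column) or "") for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


print(normalize_text("  bobby   JacksOn "))                    # Bobby Jackson
print(standardize_category("diabetis", CANONICAL_CONDITIONS))  # Diabetes
print(standardize_category("flu", CANONICAL_CONDITIONS))       # None
```

Returning `None` for unmatched values, rather than forcing the nearest term, keeps low-confidence corrections visible for review. Note that `str.title()` mishandles some names (for example, `McDonald` becomes `Mcdonald`), so a real pipeline would need exceptions for such cases.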
The Healthcare Data Cleansing Dashboard is an interactive web-based tool designed to monitor, analyze, and enhance the quality of healthcare datasets. It enables users to track data quality scores, identify common issues, and oversee cleansing operations in real time.
Data Quality Score Visualization: Displays an overall data quality score with color-coded indicators.
Trends Analysis: Line charts track quality improvements over time.
Issue Summary: Highlights common data issues like missing values, format inconsistencies, and duplicates.
Data Upload & Analysis: Allows users to upload healthcare datasets for cleansing and validation.
Cleansing Operations Tracking: Displays recent data cleansing activities with details on processed records and resolved issues.
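The README does not say how the overall quality score or its color bands are computed. One plausible definition, sketched here in Python under that assumption, is the percentage of cells with no detected issue, mapped to traffic-light colors at hypothetical thresholds of 90 and 70; the same arithmetic would carry over directly to the dashboard's JavaScript. The `CleansingOperation` record is likewise a hypothetical schema for the processed-records and resolved-issues details shown in the operations feed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def quality_score(total_cells, problem_cells):
    """Share of cells with no detected issue, as a percentage (hypothetical metric)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    clean_cells = max(total_cells - problem_cells, 0)
    return round(100 * clean_cells / total_cells, 1)


def score_color(score, good=90.0, fair=70.0):
    """Map a score to a traffic-light indicator; the thresholds are assumptions."""
    if score >= good:
        return "green"
    if score >= fair:
        return "amber"
    return "red"


@dataclass
class CleansingOperation:
    """One entry in a recent-operations feed (hypothetical schema)."""
    dataset: str
    records_processed: int
    issues_found: int
    issues_resolved: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def resolution_rate(self):
        """Percentage of detected issues that were resolved (100 if none were found)."""
        if self.issues_found == 0:
            return 100.0
        return 100 * self.issues_resolved / self.issues_found


score = quality_score(total_cells=15000, problem_cells=450)
print(score, score_color(score))  # 97.0 green
operation = CleansingOperation("healthcare_dataset.csv", 1000, 120, 90)
print(operation.resolution_rate)  # 75.0
```

Storing each score alongside its operation's timestamp would give the (time, score) points that a trend line chart needs.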
Frontend: HTML, CSS, Bootstrap
Backend & Data Processing: JavaScript (Chart.js for visualization)
File Handling: CSV data upload and analysis
Abhishek R (1BC21AI001)
Fairoz Khan (1BC21AI004)
Syed Saad (1BC21AI001)
Pruthviraj M Y (1BC22AI401)