AbhishekR045/NLP

Healthcare-data-cleansing-NLP

Project Overview

This project focuses on automating the data cleansing process in healthcare records using Natural Language Processing (NLP). By leveraging NLP techniques, we aim to enhance the quality, consistency, and accuracy of healthcare data, ensuring it is suitable for further analysis and machine learning applications.

Dataset

Name: healthcare_dataset.csv

Source: Kaggle

Description: The dataset consists of healthcare records with various inconsistencies such as missing values, duplicate entries, and incorrect formats. It is used to train and evaluate our NLP-based data cleansing model.
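The three kinds of inconsistency named above can be counted with a short profiling pass. The following is a minimal sketch that uses only the Python standard library; the sample rows, the column names (`name`, `age`, `admission_date`), and the `profile_issues` helper are hypothetical, since the actual columns of healthcare_dataset.csv are not listed here, and the ISO date check is an assumed format rule.

```python
import csv
import io
import re

# Hypothetical sample data; the real healthcare_dataset.csv columns may differ.
SAMPLE_CSV = """name,age,admission_date
Alice Smith,34,2023-01-15
Bob Jones,,2023-02-30x
Alice Smith,34,2023-01-15
Carol White,51,15/03/2023
"""

# Assumed format rule: dates should look like YYYY-MM-DD.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def profile_issues(csv_text, date_column):
    """Count missing values, exact duplicate rows, and badly formatted dates."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # A cell counts as missing when it is empty or whitespace only.
    missing = sum(
        1 for row in rows for value in row.values() if not (value or "").strip()
    )

    # A row counts as a duplicate when an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.values())
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Only non-empty date cells are checked against the expected pattern.
    bad_dates = 0
    for row in rows:
        value = (row.get(date_column) or "").strip()
        if value and not ISO_DATE.match(value):
            bad_dates += 1

    return {
        "rows": len(rows),
        "missing": missing,
        "duplicates": duplicates,
        "bad_dates": bad_dates,
    }


report = profile_issues(SAMPLE_CSV, "admission_date")
print(report)
```

On the sample above, the report shows four rows, one missing value, one duplicate row, and two dates that do not match the assumed pattern.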

Training Notebook

Notebook Name: NLP.ipynb

Details: Preprocessing of the dataset, together with training and evaluation of the model, is carried out in the Jupyter Notebook NLP.ipynb.

The notebook contains:

Data loading and preprocessing steps

NLP-based cleansing techniques

Model training and validation

Performance analysis and results
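The specific cleansing techniques used in the notebook are not described here. As an illustration only, one simple text-normalization step of the kind such a pipeline might include is mapping misspelled or inconsistently cased free-text values onto a canonical vocabulary. The sketch below uses `difflib` from the Python standard library; the vocabulary, the `normalize_term` helper, and the 0.8 similarity cutoff are all assumptions, not the notebook's method.

```python
import difflib

# Hypothetical canonical vocabulary; a real one would come from the dataset
# or from a clinical terminology.
CANONICAL_CONDITIONS = ["diabetes", "hypertension", "asthma", "arthritis"]


def normalize_term(raw, vocabulary, cutoff=0.8):
    """Map a free-text value to its closest canonical term, if one is close enough.

    Returns None for blank input, the closest vocabulary entry when its
    similarity ratio is at least `cutoff`, and the cleaned input otherwise.
    """
    # Lowercase, trim, and collapse internal whitespace before matching.
    cleaned = " ".join(raw.strip().lower().split())
    if not cleaned:
        return None
    matches = difflib.get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


examples = ["  Diabetes ", "hypertensoin", "ASTHMA", "migraine"]
normalized = [normalize_term(value, CANONICAL_CONDITIONS) for value in examples]
print(normalized)
```

Unknown terms such as "migraine" are kept in cleaned form and not forced onto the nearest entry, which avoids silently rewriting values the vocabulary does not cover.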

Healthcare Data Cleansing Dashboard

Overview

The Healthcare Data Cleansing Dashboard is an interactive web-based tool designed to monitor, analyze, and enhance the quality of healthcare datasets. It enables users to track data quality scores, identify common issues, and oversee cleansing operations in real time.

Key Features

Data Quality Score Visualization: Displays an overall data quality score with color-coded indicators.

Trends Analysis: Line charts track quality improvements over time.

Issue Summary: Highlights common data issues like missing values, format inconsistencies, and duplicates.

Data Upload & Analysis: Allows users to upload healthcare datasets for cleansing and validation.

Cleansing Operations Tracking: Displays recent data cleansing activities with details on processed records and resolved issues.
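How the dashboard derives its overall score is not specified. The sketch below shows one plausible way to turn issue counts into a 0 to 100 score and a color band; the formula, the `quality_score` and `score_color` helpers, and the 90 and 70 thresholds are assumptions made for illustration. It is written in Python for consistency with the notebook, although the dashboard itself runs in JavaScript.

```python
def quality_score(total_cells, missing, format_errors, total_rows, duplicate_rows):
    """Return a 0-100 score based on the share of cells and rows free of known issues.

    Assumed formula: (fraction of clean cells) * (fraction of unique rows) * 100.
    """
    if total_cells <= 0 or total_rows <= 0:
        raise ValueError("dataset must contain at least one row and one cell")
    cell_quality = 1 - (missing + format_errors) / total_cells
    row_quality = 1 - duplicate_rows / total_rows
    # Clamp at zero in case a cell is counted under more than one issue type.
    return round(100 * max(0.0, cell_quality) * max(0.0, row_quality), 1)


def score_color(score):
    """Map a score to a color band; the thresholds are hypothetical."""
    if score >= 90:
        return "green"
    if score >= 70:
        return "amber"
    return "red"


score = quality_score(
    total_cells=1000, missing=30, format_errors=20, total_rows=100, duplicate_rows=10
)
color = score_color(score)
print(score, color)
```

With these example counts, 95% of cells are clean and 90% of rows are unique, so the score is 85.5 and falls in the amber band.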

Technologies Used

Frontend: HTML, CSS, Bootstrap

Backend & Data Processing: JavaScript (Chart.js for visualization)

File Handling: CSV data upload and analysis

Contributors

Abhishek R (1BC21AI001)

Fairoz Khan (1BC21AI004)

Syed Saad (1BC21AI001)

Pruthviraj M Y (1BC22AI401)
