This project applies Logistic Regression to classify breast cancer tumors as Benign or Malignant using the Breast Cancer Wisconsin (Diagnostic) dataset from Kaggle.
The goal is to build a simple but effective predictive model for early detection of breast cancer, which is a critical step in assisting doctors with diagnosis.
- Source: Breast Cancer Wisconsin (Diagnostic) Dataset - Kaggle
- Features: 30 numerical features computed from digitized images of breast masses (radius, texture, smoothness, etc.)
- Target:
  - 0 → Malignant (cancerous)
  - 1 → Benign (non-cancerous)
- Samples: 569 total
- Language: Python
- Environment: Google Colab
- Libraries Used:
  - numpy
  - pandas
  - matplotlib / seaborn (for visualization)
  - scikit-learn (for preprocessing, model training & evaluation)
- Data Loading & Exploration
  - Import dataset, inspect features, check for missing values
  - Visualize class distribution
- Data Preprocessing
  - Train-test split
- Model Building
  - Logistic Regression model using scikit-learn
- Model Evaluation
  - Accuracy score
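
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset (`load_breast_cancer`) in place of the Kaggle CSV, and the 80/20 split, `random_state=42`, and `max_iter=10000` are illustrative choices rather than settings taken from the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the copy bundled with scikit-learn stands in for the Kaggle CSV.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target  # in this copy: 0 = malignant, 1 = benign

# Data loading & exploration: shape, missing values, class distribution.
n_samples, n_features = X.shape
n_missing = int(X.isna().sum().sum())
print(f"{n_samples} samples, {n_features} features, {n_missing} missing values")
print(y.value_counts())

# Data preprocessing: hold out 20% of the samples for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Model building: max_iter is raised because the features are unscaled
# and the default iteration limit may not be enough to converge.
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Model evaluation: accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

To plot the class distribution as well, something like `seaborn.countplot(x=y)` would likely be enough.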
- Accuracy achieved: ~95% (the exact value varies with the random train-test split)
- Logistic Regression proved effective in distinguishing between benign and malignant cases.
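
Because the reported accuracy depends on the random split, one way to see how much it moves is to repeat the split with several seeds. The sketch below again assumes scikit-learn's bundled copy of the dataset. The five seeds and the 80/20 split are arbitrary illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: bundled dataset copy; seeds 0..4 are arbitrary.
X, y = load_breast_cancer(return_X_y=True)

scores = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=10000).fit(X_train, y_train)
    # score() returns mean accuracy on the given test data for classifiers.
    scores.append(model.score(X_test, y_test))

print(f"Accuracy range over {len(scores)} splits: "
      f"{min(scores):.3f} to {max(scores):.3f}")
```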
- Thevindu Dilmith