Lung Cancer Prediction Using Machine Learning

This project focuses on the detection of lung cancer using a dataset consisting of various symptoms and risk factors. The notebook explores preprocessing techniques, model training, evaluation, and insights to build effective prediction systems for lung cancer using supervised machine learning algorithms.

🎯 Objective

Build a machine learning model that predicts the presence of lung cancer from lifestyle, health, and demographic attributes. The aim is to help healthcare professionals identify high-risk individuals early.

📊 Dataset

The dataset includes the following features:

GENDER: Male or Female
AGE: Age of the individual
SMOKING: Smoker (1) or Non-smoker (0)
YELLOW_FINGERS: Presence of yellow fingers due to smoking
ANXIETY, PEER_PRESSURE, CHRONIC DISEASE, FATIGUE, ALLERGY, WHEEZING, ALCOHOL CONSUMING, COUGHING, SHORTNESS OF BREATH, SWALLOWING DIFFICULTY, CHEST PAIN: Binary indicators of symptoms and risk factors
LUNG_CANCER: Target class (YES or NO)

Target Variable: LUNG_CANCER

🧰 Dependencies

Install the following Python packages before running the notebook:

pip install pandas numpy matplotlib seaborn scikit-learn

A requirements.txt file is not yet included in the repository; one can be generated from the notebook environment (for example with pip freeze) and installed with pip install -r requirements.txt.

🔍 Exploratory Data Analysis

Basic Exploration:

Shape of dataset
Info and data types
Null value check
Value counts of target label LUNG_CANCER
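
The checks listed above can be sketched with pandas roughly as follows. The small in-memory DataFrame is a made-up stand-in for the real CSV (its values are illustrative only, not taken from the dataset); with the real data you would load the file with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; values are illustrative only.
df = pd.DataFrame({
    "GENDER": ["M", "F", "F", "M", "M", "F"],
    "AGE": [69, 74, 59, 63, 51, 68],
    "SMOKING": [1, 0, 1, 1, 0, 1],
    "LUNG_CANCER": ["YES", "YES", "NO", "YES", "NO", "YES"],
})

print(df.shape)   # (rows, columns)
df.info()         # column names, dtypes, and non-null counts

null_counts = df.isnull().sum()                    # missing values per column
target_counts = df["LUNG_CANCER"].value_counts()   # class balance of the target
print(null_counts)
print(target_counts)
```

The target value counts are worth reading closely: if one class dominates, accuracy alone can look good even for a model that rarely predicts the minority class.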

Visualizations:

Countplots for categorical features (e.g., Gender, Smoking)
Pie chart for lung cancer distribution
Heatmap showing feature correlation
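
A minimal sketch of the three plot types listed above, assuming the categorical columns have already been converted to integers. The data here is a hypothetical stand-in, and the figure layout, color map, and column choice are illustrative rather than the notebook's exact settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; values are illustrative only.
df = pd.DataFrame({
    "GENDER": [1, 0, 0, 1, 1, 0, 1, 0],
    "SMOKING": [1, 0, 1, 1, 0, 1, 1, 0],
    "CHEST PAIN": [1, 0, 1, 1, 0, 0, 1, 0],
    "LUNG_CANCER": [1, 0, 1, 1, 0, 1, 1, 0],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Count plot: smokers vs non-smokers, split by the target class
sns.countplot(data=df, x="SMOKING", hue="LUNG_CANCER", ax=axes[0])

# Pie chart: share of each target class
class_counts = df["LUNG_CANCER"].value_counts()
axes[1].pie(class_counts, labels=class_counts.index, autopct="%1.1f%%")

# Heatmap: pairwise Pearson correlation between the numeric columns
corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=axes[2])

fig.tight_layout()
```

Sorting corr["LUNG_CANCER"] gives a quick ranking of which features move most closely with the target, which is the same information the heatmap shows in its last row.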

🔄 Data Preprocessing

Label Encoding

Categorical columns like GENDER and LUNG_CANCER were label-encoded using LabelEncoder() from sklearn.

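from sklearn.preprocessing import LabelEncoder

# Use one encoder per column so each label mapping can be inspected or inverted later
gender_encoder = LabelEncoder()
target_encoder = LabelEncoder()
df['GENDER'] = gender_encoder.fit_transform(df['GENDER'])
df['LUNG_CANCER'] = target_encoder.fit_transform(df['LUNG_CANCER'])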
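
It can help to check which integer each label received, since LabelEncoder assigns codes in sorted label order. The sketch below assumes the raw columns hold the strings "M"/"F" and "YES"/"NO" (an assumption about the file's contents) and keeps a separate, hypothetically named encoder per column so that predictions can be decoded afterwards.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw values; the real column values may differ.
df = pd.DataFrame({
    "GENDER": ["M", "F", "F", "M"],
    "LUNG_CANCER": ["YES", "NO", "YES", "YES"],
})

gender_encoder = LabelEncoder()
target_encoder = LabelEncoder()
df["GENDER"] = gender_encoder.fit_transform(df["GENDER"])
df["LUNG_CANCER"] = target_encoder.fit_transform(df["LUNG_CANCER"])

# classes_ holds the original labels in sorted order; the position is the encoded integer
gender_mapping = {str(label): code for code, label in enumerate(gender_encoder.classes_)}
target_mapping = {str(label): code for code, label in enumerate(target_encoder.classes_)}
print(gender_mapping)
print(target_mapping)

# Turn encoded predictions back into the original labels
decoded = target_encoder.inverse_transform([1, 0])
print(list(decoded))
```

With these assumed labels, NO maps to 0 and YES maps to 1, so class 1 in later reports is the positive (cancer) class.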

Splitting Dataset

Split into features and target:

X = df.drop(columns='LUNG_CANCER')
y = df['LUNG_CANCER']

Train/test split:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
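
If the target classes are imbalanced, a plain random split can leave the test set with a different class ratio than the full data. One option, not used in the split above, is the stratify argument of train_test_split. The sketch below uses synthetic stand-in data with an assumed 80/20 class ratio to show the effect.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 80 positive and 20 negative cases (assumed ratio)
X = pd.DataFrame({"AGE": np.arange(100), "SMOKING": np.arange(100) % 2})
y = pd.Series([1] * 80 + [0] * 20, name="LUNG_CANCER")

# stratify=y keeps the class proportions the same in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_train.value_counts().to_dict())
print(y_test.value_counts().to_dict())
```

Both partitions keep the 4:1 ratio of the full data, so test metrics are computed on a class mix that matches the data the model was trained on.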

🤖 Model Building

Trained various classification models:

Logistic Regression
Support Vector Classifier
Decision Tree
Random Forest
K-Nearest Neighbors

Each model was trained and tested using scikit-learn’s implementation.

Example:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
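
The same fit/predict pattern extends to all five models with a simple loop. This is a sketch on synthetic data from make_classification, not the notebook's code: the hyperparameters shown (max_iter=1000, a linear SVC kernel, fixed random_state values) are assumptions, and the scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded feature matrix and target
X, y = make_classification(n_samples=300, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (Linear Kernel)": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}

# Fit each model on the training split and score it on the held-out split
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Distance- and margin-based models such as KNN and SVC are sensitive to feature scale, so with a wide-ranging column like AGE it may be worth standardizing features before fitting them.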

📈 Model Evaluation

Metrics Used:

Accuracy Score
Confusion Matrix
Classification Report

Example:

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))
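
To make the confusion matrix easier to read, here is a small worked example on hand-written labels (illustrative values, not model output). For binary labels 0 and 1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]], so ravel() returns the four counts in that order.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-written labels for illustration: 1 = cancer, 0 = no cancer
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 4

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total = 6 / 8
recall = recall_score(y_true, y_pred)        # tp / (tp + fn) = 4 / 5
precision = precision_score(y_true, y_pred)  # tp / (tp + fp) = 4 / 5
print(accuracy, recall, precision)
```

In a screening setting a false negative is a missed case, so recall on the positive class is usually worth reporting alongside accuracy.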

Summary of Results:

Model	Accuracy
Logistic Regression	~87%
SVM (Linear Kernel)	~87%
Decision Tree	~86%
Random Forest	~86%
KNN	~86%

(Note: These are approximations from notebook outputs. Exact scores depend on the dataset version and on the train/test split.)

✅ Conclusion

Logistic Regression and Support Vector Machine models showed the highest accuracy (about 87%), with the other models within roughly one percentage point.
Features such as Smoking, Yellow Fingers, and Chest Pain had strong correlations with lung cancer diagnosis.
Machine learning can aid in early lung cancer detection when combined with reliable data.

🔮 Future Work

Use cross-validation for robust evaluation
Apply hyperparameter tuning (e.g., GridSearchCV)
Explore ensemble stacking techniques
Integrate model into a web-based application using Streamlit or Flask
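
As a starting point for the first two items, cross-validation and a grid search could look roughly like the sketch below. It runs on synthetic data, and the fold count, the grid of C values, and the choice of Logistic Regression are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the encoded feature matrix and target
X, y = make_classification(n_samples=300, n_features=15, random_state=42)

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validated accuracy: one score per fold instead of a single split
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Grid search over the regularization strength C (candidate values are assumptions)
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=cv, scoring="accuracy"
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Reporting the mean and spread across folds shows whether differences of a percentage point between models are larger than the variation from one split to another.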

📜 License

This project is licensed under the MIT License. Feel free to modify and use it for educational or commercial purposes.

📧 Contact

For queries or contributions, please contact iamnaveen1401@gmail.com.
