This project predicts whether two Quora questions are duplicates of each other using classical machine learning. It trains a Random Forest classifier on a subset of the official Quora Question Pairs dataset.
- Source: Quora Question Pairs
- Samples Used: 100,000 rows from the training data
- Target Column: `is_duplicate` (1 if duplicate, 0 otherwise)
- Algorithm: Random Forest Classifier
- Library: scikit-learn
- Accuracy Achieved: 80.755%
- Python 3.x
- pandas, numpy
- scikit-learn
- Jupyter Notebook: https://www.kaggle.com/code/pradeepkrmahato/getting-started-with-nlp
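
The setup listed above (question pairs in, `is_duplicate` out, Random Forest from scikit-learn) could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the bag-of-words features via `CountVectorizer`, the 80/20 split, the hyperparameters, and the helper names `build_features` and `train_and_score` are all assumptions, and the toy question pairs are made up so the example runs without the dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def build_features(q1, q2, vectorizer):
    """Concatenate the bag-of-words vectors of both questions (assumed featurization)."""
    v1 = vectorizer.transform(list(q1)).toarray()
    v2 = vectorizer.transform(list(q2)).toarray()
    return np.hstack([v1, v2])


def train_and_score(q1, q2, labels, test_size=0.2, seed=42):
    """Hypothetical helper: split, vectorize, fit a Random Forest, return test accuracy."""
    q1_tr, q1_te, q2_tr, q2_te, y_tr, y_te = train_test_split(
        list(q1), list(q2), list(labels),
        test_size=test_size, random_state=seed, stratify=list(labels),
    )
    # Learn the vocabulary from the training questions only.
    vectorizer = CountVectorizer(max_features=3000)
    vectorizer.fit(q1_tr + q2_tr)

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(build_features(q1_tr, q2_tr, vectorizer), y_tr)

    predictions = model.predict(build_features(q1_te, q2_te, vectorizer))
    return model, vectorizer, accuracy_score(y_te, predictions)


# Toy stand-in data (invented for illustration, not taken from the dataset).
Q1 = [
    "how do i learn python", "what is machine learning", "how can i lose weight",
    "what is the capital of france", "how do i cook rice", "how do i learn python",
    "what is machine learning", "how can i lose weight", "who wrote hamlet",
    "how do i cook rice",
]
Q2 = [
    "how can i learn python", "what is meant by machine learning", "how do i lose weight",
    "which city is the capital of france", "how should i cook rice", "who wrote hamlet",
    "how do i cook rice", "what is the capital of france", "how can i learn guitar",
    "what is deep learning",
]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    _, _, accuracy = train_and_score(Q1, Q2, LABELS)
    print(f"Toy accuracy: {accuracy:.3f}")
```

To try this on the real data, the lists could be taken from the `question1`, `question2`, and `is_duplicate` columns of the Kaggle training CSV after loading it with pandas and keeping 100,000 rows. How those rows were selected for this project is not stated here, so any particular sampling choice would be a guess.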