NLP + Machine Learning project identifying student math misconceptions using open-ended responses. Includes TF-IDF, embeddings, logistic regression, deep learning baselines, and full model evaluation.

LanaGeis/MAP-Student-Math-Misunderstandings_Kaggle


MAP - Charting Student Math Misunderstandings

Project Overview

This project uses Natural Language Processing (NLP) and machine learning to identify student math misconceptions from open-ended responses. It is based on the dataset from the Kaggle competition "MAP - Charting Student Math Misunderstandings".

The project implements a 3-stage modeling approach:

  1. Binary Classification: Predict correct vs. incorrect answers.
  2. 3-Class Classification: Categorize explanations (Correct, Misconception, Neither).
  3. Multiclass Classification: Identify specific misconception types (35+ categories).
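All three targets can be derived from the training labels. The sketch below assumes the Kaggle train.csv format, where a `Category` column combines answer correctness and explanation type (for example `True_Misconception`) and a `Misconception` column holds the specific misconception name, or is empty/`NA` otherwise. These column semantics and the `derive_targets` helper are assumptions for illustration, not code taken from the notebook.

```python
# Sketch: derive the three stage targets from one labelled training row.
# ASSUMPTION: "Category" looks like "<True|False>_<Correct|Misconception|Neither>",
# where the prefix says whether the selected answer is correct, and "Misconception"
# holds the misconception name (empty or "NA" when there is none).
from typing import NamedTuple, Optional

EXPLANATION_TYPES = {"Correct", "Misconception", "Neither"}
MISSING = {"", "NA", "nan"}  # ways an absent label may appear once read as text


class StageTargets(NamedTuple):
    answer_correct: bool          # Stage 1: binary target
    explanation_type: str         # Stage 2: Correct / Misconception / Neither
    misconception: Optional[str]  # Stage 3: only set for misconception rows


def derive_targets(category: str, misconception: str) -> StageTargets:
    """Split a Category value and attach the misconception label (hypothetical helper)."""
    prefix, _, explanation = category.partition("_")
    if prefix not in {"True", "False"} or explanation not in EXPLANATION_TYPES:
        raise ValueError(f"unexpected Category value: {category!r}")
    label = misconception.strip()
    keep_label = explanation == "Misconception" and label not in MISSING
    return StageTargets(
        answer_correct=(prefix == "True"),
        explanation_type=explanation,
        misconception=label if keep_label else None,
    )
```

For example, `derive_targets("False_Misconception", "Hypothetical_Label")` yields an incorrect answer, explanation type `Misconception`, and the label `Hypothetical_Label` (a made-up name). Under this scheme, the Stage 3 classifier would be trained only on rows whose `misconception` is not `None`.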

Structure

  • Term_Project_geissinger_final.ipynb: Main analysis and modeling notebook.
  • project_math/: Folder containing the dataset (train.csv, test.csv).
  • requirements.txt: List of dependencies.

Setup

  1. Clone the repository.
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the notebook: Open Term_Project_geissinger_final.ipynb and run all cells. The notebook is configured to look for data in the project_math/ directory by default.
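A minimal loader for the default data location might look like the following. `load_split` and `DATA_DIR` are hypothetical names rather than the notebook's own code, and the sketch assumes UTF-8 CSV files with a header row.

```python
# Sketch of reading the dataset from the default project_math/ directory.
# ASSUMPTIONS: load_split and DATA_DIR are hypothetical names; files are
# UTF-8 CSVs with a header row.
import csv
from pathlib import Path

DATA_DIR = Path("project_math")  # default location described in this README


def load_split(name: str, data_dir: Path = DATA_DIR) -> list[dict[str, str]]:
    """Read <data_dir>/<name>.csv (e.g. "train" or "test") into a list of row dicts."""
    path = Path(data_dir) / f"{name}.csv"
    if not path.is_file():
        # Fail early with a hint instead of a bare traceback deep inside the notebook.
        raise FileNotFoundError(
            f"{path} not found; place train.csv and test.csv in {Path(data_dir)}/"
        )
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

Usage: `train_rows = load_split("train")` when running from the repository root. Passing a different `data_dir` keeps the default path while allowing the data to live elsewhere.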

Models

  • Text Representation: TF-IDF and Sentence Transformers (embeddings).
  • Classifiers: Random Forest and Logistic Regression.
  • Handling Imbalance: BorderlineSMOTE.
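To show what the TF-IDF representation computes before the vectors reach the classifiers (and BorderlineSMOTE, which oversamples in that feature space), here is a standard-library sketch. It assumes the smoothed weighting that scikit-learn's `TfidfVectorizer` uses by default (raw term counts, idf = ln((1 + n) / (1 + df)) + 1, L2-normalised rows); the notebook's actual vectorizer settings are not stated in this README, and in practice the library class would be used instead.

```python
# Minimal TF-IDF sketch using only the standard library.
# ASSUMPTION: mirrors scikit-learn TfidfVectorizer defaults (lowercasing,
# smooth_idf=True, sublinear_tf=False, norm="l2"); not the notebook's own code.
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep tokens of 2+ word characters,
    # like scikit-learn's default token_pattern r"(?u)\b\w\w+\b".
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one L2-normalised TF-IDF row per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in vec])
    return vocab, rows
```

With `tfidf(["the fraction is bigger", "the fraction is smaller", "the cat"])`, the term `the` appears in every response and gets idf 1, while `bigger` appears in only one and is weighted higher. This is why TF-IDF emphasises the distinctive words in a student's explanation over common ones.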
