Tentanator - AI-Powered CSV Exam Grading Assistant

Tentanator is a Python-based intelligent grading system that combines manual grading with OpenAI fine-tuning to streamline exam grading workflows. It enables educators to grade a sample of responses manually, train custom AI models on that data, and automatically grade the remaining responses with AI assistance.

Features

Interactive CSV Grading Interface: Grade exam responses directly from CSV files with a user-friendly CLI
AI-Assisted Grading: After grading sample responses, use fine-tuned GPT models to suggest grades for remaining responses
Content Moderation: Automatic filtering of harmful content before training with OpenAI's moderation API
OpenAI Fine-Tuning Integration: Automatically export graded data to JSONL format and train custom grading models
Session Persistence: Resume grading sessions at any time with automatic session saving
Smart Sampling: Choose from multiple sampling algorithms (KMeans, maximin, random, GPTSort) to select representative responses
Smart Auto-Grading: Automatically assigns grade "0" to blank or dash responses
Batch Export: Export fully graded CSV files with all grades filled in
Excel Export: Convert graded CSV files to Excel format with auto-adjusted column widths
Model Registry: Track and manage all fine-tuned models for different questions
Global Question Bank: Link questions across multiple exams to build comprehensive training datasets

Installation

Prerequisites

Python 3.8 or higher
OpenAI API key (for AI-assisted grading features)

Setup

Clone the repository:

git clone https://github.com/Edwinexd/tentanator.git
cd tentanator

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Set up your OpenAI API key: Create a .env file in the project root:

OPENAI_API_KEY=your-api-key-here

Usage

Complete Grading Workflow

Step 1: Initial Setup and Sampling

Start the grading application:

python tentanator.py

Select your CSV file:
- Place exam CSV files in the exams/ directory
- The program will list available files for selection
Configure column mappings:
- Select ID columns (e.g., student ID, name)
- Select input columns (student responses to grade)
- Select output columns (where grades will be stored)
- Link to global question bank (optional, for cross-exam model reuse)
Choose sampling method (configurable in tentanator.py):
- kmeans_auto: Automatically determines optimal clusters (recommended)
- kmeans_fixed: Fixed number of clusters
- maximin: Diversity-based sampling
- random: Random selection
- gptsort: GPT-based quality sorting
- Default: 25 representative samples per question

Step 2: Grade Sample Responses

Grade the selected samples:
- Grade the representative samples shown by the system
- Minimum 25 valid responses required per question (configurable)
- Use commands:
  - q - quit and save session
  - s - skip current response
  - b - go back to previous response
  - Type grade value directly
- Session is auto-saved after each grade
Export training data:
- After reaching the minimum sample threshold (default: 25)
- Choose to export to JSONL format for OpenAI fine-tuning
- Training data saved in training_data/ directory
- Enter exam question text when prompted (used in training)

Step 3: Train the AI Model

Run the training module:

python openai_trainer.py

Configure and start training:
- Select JSONL file(s) to train (or choose "all" for batch training)
- Content moderation runs automatically:
  - Each training example is checked for harmful content
  - Flagged examples are excluded from training
  - Statistics displayed: total examples, flagged count, categories
  - Training aborted if >50% of content is flagged
- Upload proceeds with clean examples only
- Fine-tuning job is created and monitored
Monitor training progress:
- Training typically takes 10-30 minutes per model
- OpenAI allows up to 6 concurrent fine-tuning jobs
- Models are automatically registered in models.json when complete
- Global question IDs link models across exams

Step 4: AI-Assisted Grading

Resume grading with trained model:

python tentanator.py

Select the same session:

System detects available trained models
AI automatically suggests grades for remaining responses
Pre-computes suggestions for smooth grading experience

Review and finalize:

Review AI suggestions (shown before each response)
Press [Enter] to accept suggestion
Type grade value to override
All grades are recorded with timestamps
Export final CSV when complete

File Format Conversion

The make_excel.py utility handles bidirectional conversion between CSV and Excel formats:

python make_excel.py

What it does:

Excel → CSV (Input Processing):

Converts ungraded Excel files from exams_in/ directory
Creates CSV files in exams/ directory for grading
Supports both .xlsx and .xls formats
Reads first sheet by default

CSV → Excel (Output Formatting):

Converts graded CSV files from graded_exams/ directory
Creates Excel files in graded_exams_out/ directory
Auto-adjusts column widths for better readability
Preserves all data and formatting from the CSV files

Workflow:

Place ungraded Excel files in exams_in/
Run python make_excel.py to convert to CSV
Grade exams using tentanator.py
Run python make_excel.py again to convert graded results to Excel

Configuration Options

Key settings in tentanator.py:

GRADING_THRESHOLD = 25              # Minimum manual grades before training
NUM_REPRESENTATIVE_SAMPLES = 25     # Number of samples to grade
SAMPLING_ALGORITHM = "kmeans_auto"  # Sampling method

Available sampling algorithms:

kmeans_auto: Automatically determines optimal number of clusters
kmeans_fixed: Uses fixed number of clusters
maximin: Maximizes diversity in selected samples
random: Random selection
gptsort: Uses GPT to sort responses by quality

Content Moderation

All training data is automatically moderated before upload to OpenAI:

Moderation Categories Checked:

Harassment and threatening content
Hate speech and threatening hate
Illicit content and violent instructions
Self-harm content, intent, and instructions
Sexual content and minors
Violence and graphic violence

Moderation Behavior:

Individual messages (system, user, assistant) are checked
Flagged examples are automatically excluded from training
Detailed statistics shown: total, flagged count, categories
Training prevented if >50% of content is flagged
Fails open if moderation API encounters errors

To disable moderation (not recommended):

trainer.upload_training_file(filepath, question_name, moderate=False)

Project Structure

tentanator/
├── tentanator.py              # Main grading application
├── openai_trainer.py          # OpenAI fine-tuning module with moderation
├── sampling.py                # Sampling algorithms (KMeans, maximin, etc.)
├── embeddings.py              # OpenAI embeddings wrapper
├── make_excel.py              # Bidirectional CSV/Excel converter utility
├── global_bank.py             # Global question bank management
├── test_moderation.py         # Moderation testing suite
├── requirements.txt           # Python dependencies
├── .env                       # API keys (not in version control)
├── exams_in/                  # Input Excel files (converted to CSV)
│   └── *.xlsx, *.xls
├── exams/                     # Input CSV files (ready for grading)
│   └── *.csv
├── graded_exams/              # Output CSV files with grades
│   └── *.csv
├── graded_exams_out/          # Excel exports of graded exams
│   └── *.xlsx
├── training_data/             # JSONL files for fine-tuning
│   ├── *.jsonl                # Combined training files
│   └── partials/              # Per-exam training data
│       └── *.jsonl
├── .tentanator_sessions/      # Saved grading sessions
│   └── *.json
├── global_bank.json           # Global question bank registry
├── models.json                # Registry of fine-tuned models
└── .tentanator_training_session.json  # Saved training session

Key Components

tentanator.py

The main application module containing:

Data Classes:
- GradedItem: Individual graded response
- QuestionGrades: Grades for a single question
- GradingSession: Complete grading session data
Core Functions:
- grade_questions(): Main interactive grading interface
- export_to_csv(): Export graded data to CSV
- export_to_jsonl(): Export training data for fine-tuning
- get_ai_grade_suggestion(): Get grade suggestions from trained models

make_excel.py

The bidirectional CSV/Excel converter utility containing:

Excel → CSV Conversion (convert_excel_to_csv()):
- Reads Excel files from exams_in/ directory
- Supports .xlsx and .xls formats
- Converts to CSV in exams/ directory
- Batch processing of multiple Excel files
CSV → Excel Conversion (convert_csv_to_excel()):
- Reads CSV files from graded_exams/ directory
- Creates Excel files in graded_exams_out/ directory
- Auto-adjusted column widths for optimal readability
- Batch processing of multiple CSV files
Main Function (make_excel()):
- Runs both conversion processes sequentially
- Excel→CSV first, then CSV→Excel
- Handles missing directories gracefully

openai_trainer.py

The OpenAI fine-tuning module with content moderation:

Data Classes:
- FineTuningConfig: Configuration for training jobs
- TrainingFile: Uploaded training file metadata
- FineTuningJob: Fine-tuning job tracking
- ModelRegistry: Registry of trained models
OpenAITrainer Class:
- moderate_content(): Check content using OpenAI moderation API
- validate_and_moderate_jsonl(): Validate format and filter harmful content
- validate_jsonl_file(): Validate training data format (no moderation)
- upload_training_file(): Upload data to OpenAI (with moderation by default)
- create_fine_tuning_job(): Start fine-tuning
- monitor_job(): Track job progress
- batch_grade_with_model(): Grade multiple responses

sampling.py

Implements various sampling algorithms for selecting representative responses:

SamplingAlgorithm: Type definition for available algorithms
Functions:
- kmeans_sample(): K-means clustering with auto/fixed cluster selection
- maximin_sample(): Maximize diversity using maximin distance
- random_sample(): Simple random sampling
- gptsort_sample(): GPT-based quality sorting
- select_representative_samples(): Main interface for all algorithms

embeddings.py

Wrapper for OpenAI text embeddings:

Uses text-embedding-3-large model
Async API calls for performance
Caching for repeated requests

Quick Start Workflow

First Time Setup

# 0. Convert Excel to CSV (if needed)
# Place Excel files in exams_in/
python make_excel.py
# → Converts Excel files to CSV in exams/

# 1. Setup and configure
python tentanator.py
# → Choose CSV file from exams/
# → Map ID, input, and output columns
# → Link to global question bank (optional)
# → Grade 25 representative samples (default)
# → Export to JSONL when prompted

# 2. Train AI models
python openai_trainer.py
# → Select "all" to train all untrained files
# → Content moderation runs automatically
# → Wait 10-30 minutes per model
# → Models registered automatically

# 3. Complete grading with AI assistance
python tentanator.py
# → Select the same session
# → AI suggests grades for remaining responses
# → Review and accept/modify suggestions
# → Export final CSV when complete

# 4. Convert to Excel
python make_excel.py
# → Converts graded CSV to Excel in graded_exams_out/

Detailed Workflow Example

Day 0: Convert Excel Files (Optional)

Place exam1.xlsx in exams_in/ directory
Run python make_excel.py
Excel file converted to exams/exam1.csv

Day 1: Setup and Initial Grading

Exam CSV already in exams/ directory (from conversion or manual placement)
Run python tentanator.py
Configure column mappings (ID: "Student ID", Input: "Response Q1", Output: "Grade Q1")
Link to global question "Calculus Derivatives" in global bank
System selects 25 representative samples using KMeans clustering
Grade the 25 samples (takes ~10-15 minutes)
Export to JSONL → creates gq1_exam1_Grade_Q1.jsonl
Quit and save session

Day 1: Train Model

Run python openai_trainer.py
Content moderation checks all 25 examples
- Example output: "Valid: 24 training examples (excluded 1 flagged)"
- Flagged categories: harassment (1 example excluded)
Upload proceeds with 24 clean examples
Fine-tuning job created (Job ID: ftjob-xxx)
Wait 15-20 minutes for completion
Model registered as ft:gpt-4-mini:...:tentanator_grade_q1:xxx

Day 2: Complete Grading

Run python tentanator.py
Select existing session "exam1_..."
System loads trained model for "Grade Q1"
AI pre-computes suggestions for next 5 responses
Review each suggestion, press Enter to accept or type override
Complete all 200 remaining responses (~15-20 minutes)
Export final CSV to graded_exams/exam1.csv
Run python make_excel.py → creates graded_exams_out/exam1.xlsx

Future Exams: Reuse Model

Load exam2.csv with same question
Link to same global question "Calculus Derivatives"
Grade 25 new samples → adds to existing training data
Retrain model with combined data from both exams
Use improved model for grading

Features in Detail

Session Persistence

All grading progress is automatically saved after each grade
Sessions stored in .tentanator_sessions/ directory
Sessions can be resumed at any time
Tracks graded items, timestamps, embeddings cache, and configuration
Multiple sessions can be active for different exams

Smart Grading Logic

Blank or dash responses auto-graded as "0"
Valid response counter excludes auto-graded items
Configurable threshold (default: 25 valid responses) required for training
Representative sampling reduces manual grading workload

Sampling Algorithms

KMeans Auto: Automatically determines optimal clusters using silhouette scoring
KMeans Fixed: Uses specified number of clusters
Maximin: Selects diverse samples by maximizing minimum distance
Random: Simple random selection (baseline)
GPTSort: Uses GPT to sort responses by quality without embeddings

Global Question Bank

Links identical questions across multiple exams
Combines training data from all linked exams
Single model trained on data from multiple exam iterations
Improves model accuracy with larger, more diverse datasets
Tracked in global_bank.json

Content Moderation

Automatic: Runs by default on all training data before upload
Categories: 13 moderation categories checked (harassment, hate, violence, etc.)
Statistics: Detailed reporting of flagged content and categories
Safety: Training prevented if >50% of content is flagged
Transparent: Shows which examples were excluded and why
Fail-Safe: Fails open if moderation API encounters errors

AI Integration

Pre-computes suggestions for smoother grading experience
Rolling window of 5 suggestions cached for performance
Models matched to questions by global question ID or normalized naming
Base system prompt includes exam question for context

Export and Import Options

CSV Export: Complete graded exam file with all grades
JSONL Export: OpenAI fine-tuning format with moderation
Partial Files: Per-exam training data in partials/ subdirectory
Combined Files: Merged training data for global questions
Excel Export: Formatted .xlsx files with auto-adjusted columns
Excel Import: Convert Excel files to CSV format for grading

Tips and Best Practices

Use Representative Sampling: The default kmeans_auto algorithm selects diverse samples, reducing the amount of manual grading needed while maintaining quality
Consistent Grading: Be consistent in your manual grading as the AI will learn from your patterns
Link Global Questions: Use the global question bank to combine training data across multiple exam iterations for more accurate models
Monitor Content Moderation: Check moderation statistics to ensure your training data is appropriate and unbiased
Review AI Suggestions: Always review AI-suggested grades, especially for edge cases or unusual responses
Gradual Improvement: Models improve with more training data - consider retraining after grading multiple exams
Backup Sessions: Session files are automatically created in .tentanator_sessions/ but consider backing up important grading data
Model Management: Use models.json to track which models are trained for which questions and when they were created
Batch Training: Use "all" option in openai_trainer.py to train multiple models simultaneously (up to 6 concurrent jobs)
Quality Over Quantity: 25 well-chosen representative samples often perform better than 50+ random samples

Troubleshooting

Common Issues

Missing API Key

Ensure .env file exists in project root with valid OPENAI_API_KEY
Test with: python -c "import dotenv; dotenv.load_dotenv(); import os; print('OK' if os.getenv('OPENAI_API_KEY') else 'MISSING')"

Session Recovery

Sessions stored in .tentanator_sessions/ directory
If corrupted, backup and delete the specific session JSON file
Start fresh by selecting "New session" in tentanator.py

Training Failures

Check OpenAI dashboard for quota or billing issues
Verify moderation didn't exclude too many examples (>50%)
Ensure minimum 10 examples remain after moderation
OpenAI limits: 6 concurrent fine-tuning jobs

Content Moderation Blocking Training

Review which categories are being flagged
Check if student responses contain inappropriate content
Consider if content is genuinely problematic or false positive
If false positive, contact OpenAI support or manually review

CSV Format Issues

Ensure CSV files have proper headers in first row
Use UTF-8 encoding (not ASCII or Latin-1)
Avoid special characters in column names
Check for consistent delimiter (comma vs semicolon)

Model Not Found

Verify model is registered in models.json
Check global question ID matches between session and model
Ensure fine-tuning job completed successfully
Run python openai_trainer.py to check job status

Slow Performance

Embeddings are cached after first use
First-time sampling may take 1-2 minutes for large datasets
AI suggestions are pre-computed in batches of 5
Consider using random sampling for faster initial selection

Requirements

Python Dependencies

Python 3.8+
openai>=1.0.0 (for fine-tuning and moderation APIs)
python-dotenv>=1.0.0 (for environment variable management)
pandas>=2.0.0 (for CSV processing)
openpyxl>=3.1.0 (for Excel export)
scikit-learn (for KMeans clustering and embeddings)
numpy (for numerical operations)

OpenAI API Access

Valid OpenAI API key with access to:
- Fine-tuning API (GPT-4 Mini recommended)
- Moderation API (free)
- Embeddings API (for sampling algorithms)
- Chat completions (for GPTSort sampling)

Costs Estimate

Embeddings: ~$0.13 per 1M tokens (text-embedding-3-large)
Fine-tuning: ~$3.00 per 1M tokens training (gpt-4.1-mini)
Inference: ~$0.30 per 1M tokens (fine-tuned model)
Moderation: Free
Example: 200 responses × 100 words each = ~26,700 tokens
- Embeddings: <$0.01
- Fine-tuning (25 samples): <$0.10
- Inference (175 graded): ~$0.01
- Total per exam: ~$0.12

License

GNU Affero General Public License - See LICENSE file for details

Contributing

Contributions are welcome! Please ensure all code follows the existing patterns:

Type hints for all functions
Dataclasses for data structures
Comprehensive error handling
Session persistence for long-running operations

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
combine_moodle_dumps.py		combine_moodle_dumps.py
embeddings.py		embeddings.py
make_excel.py		make_excel.py
openai_trainer.py		openai_trainer.py
process_global.py		process_global.py
requirements.txt		requirements.txt
sampling.py		sampling.py
search_question.py		search_question.py
tentanator.py		tentanator.py
test_sampling.py		test_sampling.py
test_visualization.py		test_visualization.py
workspace.py		workspace.py

Folders and files

Latest commit

History

Repository files navigation

Tentanator - AI-Powered CSV Exam Grading Assistant

Features

Installation

Prerequisites

Setup

Usage

Complete Grading Workflow

Step 1: Initial Setup and Sampling

Step 2: Grade Sample Responses

Step 3: Train the AI Model

Step 4: AI-Assisted Grading

File Format Conversion

Configuration Options

Content Moderation

Project Structure

Key Components

tentanator.py

make_excel.py

openai_trainer.py

sampling.py

embeddings.py

Quick Start Workflow

First Time Setup

Detailed Workflow Example

Features in Detail

Session Persistence

Smart Grading Logic

Sampling Algorithms

Global Question Bank

Content Moderation

AI Integration

Export and Import Options

Tips and Best Practices

Troubleshooting

Common Issues

Requirements

Python Dependencies

OpenAI API Access

Costs Estimate

License

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages