Tentanator is a Python-based intelligent grading system that combines manual grading with OpenAI fine-tuning to streamline exam grading workflows. It enables educators to grade a sample of responses manually, train custom AI models on that data, and automatically grade the remaining responses with AI assistance.
- Interactive CSV Grading Interface: Grade exam responses directly from CSV files with a user-friendly CLI
- AI-Assisted Grading: After grading sample responses, use fine-tuned GPT models to suggest grades for remaining responses
- Content Moderation: Automatic filtering of harmful content before training with OpenAI's moderation API
- OpenAI Fine-Tuning Integration: Automatically export graded data to JSONL format and train custom grading models
- Session Persistence: Resume grading sessions at any time with automatic session saving
- Smart Sampling: Choose from multiple sampling algorithms (KMeans, maximin, random, GPTSort) to select representative responses
- Smart Auto-Grading: Automatically assigns grade "0" to blank or dash responses
- Batch Export: Export fully graded CSV files with all grades filled in
- Excel Export: Convert graded CSV files to Excel format with auto-adjusted column widths
- Model Registry: Track and manage all fine-tuned models for different questions
- Global Question Bank: Link questions across multiple exams to build comprehensive training datasets
- Python 3.8 or higher
- OpenAI API key (for AI-assisted grading features)
- Clone the repository:
git clone https://github.com/Edwinexd/tentanator.git
cd tentanator- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate- Install dependencies:
pip install -r requirements.txt- Set up your OpenAI API key:
Create a
.envfile in the project root:
OPENAI_API_KEY=your-api-key-here- Start the grading application:
python tentanator.py-
Select your CSV file:
- Place exam CSV files in the
exams/directory - The program will list available files for selection
- Place exam CSV files in the
-
Configure column mappings:
- Select ID columns (e.g., student ID, name)
- Select input columns (student responses to grade)
- Select output columns (where grades will be stored)
- Link to global question bank (optional, for cross-exam model reuse)
-
Choose sampling method (configurable in
tentanator.py):kmeans_auto: Automatically determines optimal clusters (recommended)kmeans_fixed: Fixed number of clustersmaximin: Diversity-based samplingrandom: Random selectiongptsort: GPT-based quality sorting- Default: 25 representative samples per question
-
Grade the selected samples:
- Grade the representative samples shown by the system
- Minimum 25 valid responses required per question (configurable)
- Use commands:
q- quit and save sessions- skip current responseb- go back to previous response- Type grade value directly
- Session is auto-saved after each grade
-
Export training data:
- After reaching the minimum sample threshold (default: 25)
- Choose to export to JSONL format for OpenAI fine-tuning
- Training data saved in
training_data/directory - Enter exam question text when prompted (used in training)
- Run the training module:
python openai_trainer.py-
Configure and start training:
- Select JSONL file(s) to train (or choose "all" for batch training)
- Content moderation runs automatically:
- Each training example is checked for harmful content
- Flagged examples are excluded from training
- Statistics displayed: total examples, flagged count, categories
- Training aborted if >50% of content is flagged
- Upload proceeds with clean examples only
- Fine-tuning job is created and monitored
-
Monitor training progress:
- Training typically takes 10-30 minutes per model
- OpenAI allows up to 6 concurrent fine-tuning jobs
- Models are automatically registered in
models.jsonwhen complete - Global question IDs link models across exams
- Resume grading with trained model:
python tentanator.py- Select the same session:
- System detects available trained models
- AI automatically suggests grades for remaining responses
- Pre-computes suggestions for smooth grading experience
- Review and finalize:
- Review AI suggestions (shown before each response)
- Press
[Enter]to accept suggestion - Type grade value to override
- All grades are recorded with timestamps
- Export final CSV when complete
The make_excel.py utility handles bidirectional conversion between CSV and Excel formats:
python make_excel.pyWhat it does:
Excel → CSV (Input Processing):
- Converts ungraded Excel files from
exams_in/directory - Creates CSV files in
exams/directory for grading - Supports both
.xlsxand.xlsformats - Reads first sheet by default
CSV → Excel (Output Formatting):
- Converts graded CSV files from
graded_exams/directory - Creates Excel files in
graded_exams_out/directory - Auto-adjusts column widths for better readability
- Preserves all data and formatting from the CSV files
Workflow:
- Place ungraded Excel files in
exams_in/ - Run
python make_excel.pyto convert to CSV - Grade exams using
tentanator.py - Run
python make_excel.pyagain to convert graded results to Excel
Key settings in tentanator.py:
GRADING_THRESHOLD = 25 # Minimum manual grades before training
NUM_REPRESENTATIVE_SAMPLES = 25 # Number of samples to grade
SAMPLING_ALGORITHM = "kmeans_auto" # Sampling methodAvailable sampling algorithms:
kmeans_auto: Automatically determines optimal number of clusterskmeans_fixed: Uses fixed number of clustersmaximin: Maximizes diversity in selected samplesrandom: Random selectiongptsort: Uses GPT to sort responses by quality
All training data is automatically moderated before upload to OpenAI:
Moderation Categories Checked:
- Harassment and threatening content
- Hate speech and threatening hate
- Illicit content and violent instructions
- Self-harm content, intent, and instructions
- Sexual content and minors
- Violence and graphic violence
Moderation Behavior:
- Individual messages (system, user, assistant) are checked
- Flagged examples are automatically excluded from training
- Detailed statistics shown: total, flagged count, categories
- Training prevented if >50% of content is flagged
- Fails open if moderation API encounters errors
To disable moderation (not recommended):
trainer.upload_training_file(filepath, question_name, moderate=False)tentanator/
├── tentanator.py # Main grading application
├── openai_trainer.py # OpenAI fine-tuning module with moderation
├── sampling.py # Sampling algorithms (KMeans, maximin, etc.)
├── embeddings.py # OpenAI embeddings wrapper
├── make_excel.py # Bidirectional CSV/Excel converter utility
├── global_bank.py # Global question bank management
├── test_moderation.py # Moderation testing suite
├── requirements.txt # Python dependencies
├── .env # API keys (not in version control)
├── exams_in/ # Input Excel files (converted to CSV)
│ └── *.xlsx, *.xls
├── exams/ # Input CSV files (ready for grading)
│ └── *.csv
├── graded_exams/ # Output CSV files with grades
│ └── *.csv
├── graded_exams_out/ # Excel exports of graded exams
│ └── *.xlsx
├── training_data/ # JSONL files for fine-tuning
│ ├── *.jsonl # Combined training files
│ └── partials/ # Per-exam training data
│ └── *.jsonl
├── .tentanator_sessions/ # Saved grading sessions
│ └── *.json
├── global_bank.json # Global question bank registry
├── models.json # Registry of fine-tuned models
└── .tentanator_training_session.json # Saved training session
The main application module containing:
-
Data Classes:
GradedItem: Individual graded responseQuestionGrades: Grades for a single questionGradingSession: Complete grading session data
-
Core Functions:
grade_questions(): Main interactive grading interfaceexport_to_csv(): Export graded data to CSVexport_to_jsonl(): Export training data for fine-tuningget_ai_grade_suggestion(): Get grade suggestions from trained models
The bidirectional CSV/Excel converter utility containing:
-
Excel → CSV Conversion (
convert_excel_to_csv()):- Reads Excel files from
exams_in/directory - Supports
.xlsxand.xlsformats - Converts to CSV in
exams/directory - Batch processing of multiple Excel files
- Reads Excel files from
-
CSV → Excel Conversion (
convert_csv_to_excel()):- Reads CSV files from
graded_exams/directory - Creates Excel files in
graded_exams_out/directory - Auto-adjusted column widths for optimal readability
- Batch processing of multiple CSV files
- Reads CSV files from
-
Main Function (
make_excel()):- Runs both conversion processes sequentially
- Excel→CSV first, then CSV→Excel
- Handles missing directories gracefully
The OpenAI fine-tuning module with content moderation:
-
Data Classes:
FineTuningConfig: Configuration for training jobsTrainingFile: Uploaded training file metadataFineTuningJob: Fine-tuning job trackingModelRegistry: Registry of trained models
-
OpenAITrainer Class:
moderate_content(): Check content using OpenAI moderation APIvalidate_and_moderate_jsonl(): Validate format and filter harmful contentvalidate_jsonl_file(): Validate training data format (no moderation)upload_training_file(): Upload data to OpenAI (with moderation by default)create_fine_tuning_job(): Start fine-tuningmonitor_job(): Track job progressbatch_grade_with_model(): Grade multiple responses
Implements various sampling algorithms for selecting representative responses:
- SamplingAlgorithm: Type definition for available algorithms
- Functions:
kmeans_sample(): K-means clustering with auto/fixed cluster selectionmaximin_sample(): Maximize diversity using maximin distancerandom_sample(): Simple random samplinggptsort_sample(): GPT-based quality sortingselect_representative_samples(): Main interface for all algorithms
Wrapper for OpenAI text embeddings:
- Uses
text-embedding-3-largemodel - Async API calls for performance
- Caching for repeated requests
# 0. Convert Excel to CSV (if needed)
# Place Excel files in exams_in/
python make_excel.py
# → Converts Excel files to CSV in exams/
# 1. Setup and configure
python tentanator.py
# → Choose CSV file from exams/
# → Map ID, input, and output columns
# → Link to global question bank (optional)
# → Grade 25 representative samples (default)
# → Export to JSONL when prompted
# 2. Train AI models
python openai_trainer.py
# → Select "all" to train all untrained files
# → Content moderation runs automatically
# → Wait 10-30 minutes per model
# → Models registered automatically
# 3. Complete grading with AI assistance
python tentanator.py
# → Select the same session
# → AI suggests grades for remaining responses
# → Review and accept/modify suggestions
# → Export final CSV when complete
# 4. Convert to Excel
python make_excel.py
# → Converts graded CSV to Excel in graded_exams_out/Day 0: Convert Excel Files (Optional)
- Place
exam1.xlsxinexams_in/directory - Run
python make_excel.py - Excel file converted to
exams/exam1.csv
Day 1: Setup and Initial Grading
- Exam CSV already in
exams/directory (from conversion or manual placement) - Run
python tentanator.py - Configure column mappings (ID: "Student ID", Input: "Response Q1", Output: "Grade Q1")
- Link to global question "Calculus Derivatives" in global bank
- System selects 25 representative samples using KMeans clustering
- Grade the 25 samples (takes ~10-15 minutes)
- Export to JSONL → creates
gq1_exam1_Grade_Q1.jsonl - Quit and save session
Day 1: Train Model
- Run
python openai_trainer.py - Content moderation checks all 25 examples
- Example output: "Valid: 24 training examples (excluded 1 flagged)"
- Flagged categories: harassment (1 example excluded)
- Upload proceeds with 24 clean examples
- Fine-tuning job created (Job ID: ftjob-xxx)
- Wait 15-20 minutes for completion
- Model registered as
ft:gpt-4-mini:...:tentanator_grade_q1:xxx
Day 2: Complete Grading
- Run
python tentanator.py - Select existing session "exam1_..."
- System loads trained model for "Grade Q1"
- AI pre-computes suggestions for next 5 responses
- Review each suggestion, press Enter to accept or type override
- Complete all 200 remaining responses (~15-20 minutes)
- Export final CSV to
graded_exams/exam1.csv - Run
python make_excel.py→ createsgraded_exams_out/exam1.xlsx
Future Exams: Reuse Model
- Load
exam2.csvwith same question - Link to same global question "Calculus Derivatives"
- Grade 25 new samples → adds to existing training data
- Retrain model with combined data from both exams
- Use improved model for grading
- All grading progress is automatically saved after each grade
- Sessions stored in
.tentanator_sessions/directory - Sessions can be resumed at any time
- Tracks graded items, timestamps, embeddings cache, and configuration
- Multiple sessions can be active for different exams
- Blank or dash responses auto-graded as "0"
- Valid response counter excludes auto-graded items
- Configurable threshold (default: 25 valid responses) required for training
- Representative sampling reduces manual grading workload
- KMeans Auto: Automatically determines optimal clusters using silhouette scoring
- KMeans Fixed: Uses specified number of clusters
- Maximin: Selects diverse samples by maximizing minimum distance
- Random: Simple random selection (baseline)
- GPTSort: Uses GPT to sort responses by quality without embeddings
- Links identical questions across multiple exams
- Combines training data from all linked exams
- Single model trained on data from multiple exam iterations
- Improves model accuracy with larger, more diverse datasets
- Tracked in
global_bank.json
- Automatic: Runs by default on all training data before upload
- Categories: 13 moderation categories checked (harassment, hate, violence, etc.)
- Statistics: Detailed reporting of flagged content and categories
- Safety: Training prevented if >50% of content is flagged
- Transparent: Shows which examples were excluded and why
- Fail-Safe: Fails open if moderation API encounters errors
- Pre-computes suggestions for smoother grading experience
- Rolling window of 5 suggestions cached for performance
- Models matched to questions by global question ID or normalized naming
- Base system prompt includes exam question for context
- CSV Export: Complete graded exam file with all grades
- JSONL Export: OpenAI fine-tuning format with moderation
- Partial Files: Per-exam training data in
partials/subdirectory - Combined Files: Merged training data for global questions
- Excel Export: Formatted .xlsx files with auto-adjusted columns
- Excel Import: Convert Excel files to CSV format for grading
-
Use Representative Sampling: The default
kmeans_autoalgorithm selects diverse samples, reducing the amount of manual grading needed while maintaining quality -
Consistent Grading: Be consistent in your manual grading as the AI will learn from your patterns
-
Link Global Questions: Use the global question bank to combine training data across multiple exam iterations for more accurate models
-
Monitor Content Moderation: Check moderation statistics to ensure your training data is appropriate and unbiased
-
Review AI Suggestions: Always review AI-suggested grades, especially for edge cases or unusual responses
-
Gradual Improvement: Models improve with more training data - consider retraining after grading multiple exams
-
Backup Sessions: Session files are automatically created in
.tentanator_sessions/but consider backing up important grading data -
Model Management: Use
models.jsonto track which models are trained for which questions and when they were created -
Batch Training: Use "all" option in
openai_trainer.pyto train multiple models simultaneously (up to 6 concurrent jobs) -
Quality Over Quantity: 25 well-chosen representative samples often perform better than 50+ random samples
Missing API Key
- Ensure
.envfile exists in project root with validOPENAI_API_KEY - Test with:
python -c "import dotenv; dotenv.load_dotenv(); import os; print('OK' if os.getenv('OPENAI_API_KEY') else 'MISSING')"
Session Recovery
- Sessions stored in
.tentanator_sessions/directory - If corrupted, backup and delete the specific session JSON file
- Start fresh by selecting "New session" in tentanator.py
Training Failures
- Check OpenAI dashboard for quota or billing issues
- Verify moderation didn't exclude too many examples (>50%)
- Ensure minimum 10 examples remain after moderation
- OpenAI limits: 6 concurrent fine-tuning jobs
Content Moderation Blocking Training
- Review which categories are being flagged
- Check if student responses contain inappropriate content
- Consider if content is genuinely problematic or false positive
- If false positive, contact OpenAI support or manually review
CSV Format Issues
- Ensure CSV files have proper headers in first row
- Use UTF-8 encoding (not ASCII or Latin-1)
- Avoid special characters in column names
- Check for consistent delimiter (comma vs semicolon)
Model Not Found
- Verify model is registered in
models.json - Check global question ID matches between session and model
- Ensure fine-tuning job completed successfully
- Run
python openai_trainer.pyto check job status
Slow Performance
- Embeddings are cached after first use
- First-time sampling may take 1-2 minutes for large datasets
- AI suggestions are pre-computed in batches of 5
- Consider using
randomsampling for faster initial selection
- Python 3.8+
- openai>=1.0.0 (for fine-tuning and moderation APIs)
- python-dotenv>=1.0.0 (for environment variable management)
- pandas>=2.0.0 (for CSV processing)
- openpyxl>=3.1.0 (for Excel export)
- scikit-learn (for KMeans clustering and embeddings)
- numpy (for numerical operations)
- Valid OpenAI API key with access to:
- Fine-tuning API (GPT-4 Mini recommended)
- Moderation API (free)
- Embeddings API (for sampling algorithms)
- Chat completions (for GPTSort sampling)
- Embeddings: ~$0.13 per 1M tokens (text-embedding-3-large)
- Fine-tuning: ~$3.00 per 1M tokens training (gpt-4.1-mini)
- Inference: ~$0.30 per 1M tokens (fine-tuned model)
- Moderation: Free
- Example: 200 responses × 100 words each = ~26,700 tokens
- Embeddings: <$0.01
- Fine-tuning (25 samples): <$0.10
- Inference (175 graded): ~$0.01
- Total per exam: ~$0.12
GNU Affero General Public License - See LICENSE file for details
Contributions are welcome! Please ensure all code follows the existing patterns:
- Type hints for all functions
- Dataclasses for data structures
- Comprehensive error handling
- Session persistence for long-running operations