A Django-based web application for advanced Persian spelling error correction using BERT (Bidirectional Encoder Representations from Transformers) and Levenshtein distance algorithms.
This application leverages neural networks, particularly the ParsBERT masked language model, to identify and correct diverse spelling errors in Persian text. It handles both real-word and non-real-word errors through a combined approach using BERT and Levenshtein distance, offering superior performance for Persian language spell checking.
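The Levenshtein half of the combined approach can be illustrated on its own. The sketch below is a pure-Python stand-in (the project itself lists Polyleven for string similarity) that keeps dictionary words within a small edit distance of a suspect token. The function names, the tiny word list, and the distance threshold are illustrative assumptions, and how the app merges such candidates with ParsBERT's masked-word predictions is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def candidates(token: str, dictionary: list, max_distance: int = 1) -> list:
    """Dictionary words within max_distance edits of token, closest first."""
    scored = sorted((levenshtein(token, word), word) for word in dictionary)
    return [word for distance, word in scored if distance <= max_distance]


# Hypothetical mini-dictionary; the real app reads dictionary.txt.
words = ["کتاب", "کباب", "کتان", "حساب"]
print(candidates("کتاپ", words))
```

In a full pipeline, a list like this would typically be re-ranked by a language model so that the candidate fitting the sentence context wins.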
- Advanced ML Model: Uses HappyTransformer with ParsBERT for accurate spell correction
- Multiple Error Types: Handles homophone, keyboard, and substitution errors
- User Authentication: Secure login and registration system
- File Processing: Upload text files for batch spell correction
- Async Task Processing: Background task processing using Dramatiq
- User Dashboard: Track your correction history and download results
- Real-time Correction: Process text directly through the web interface
Keywords: Spelling mistakes, Neural Networks, BERT masked language model, Error correction system, Real and non-real word errors, ParsBERT model, Levenshtein distance
- Python 3.8 or higher
- pip (Python package manager)
- Virtual environment (recommended)
- PostgreSQL (optional, SQLite is used by default)
- Clone the repository
  git clone <repository-url>
  cd SpellCorrectionApp-main
- Create and activate a virtual environment
  # On macOS/Linux
  python3 -m venv venv
  source venv/bin/activate
  # On Windows
  python -m venv venv
  venv\Scripts\activate
- Install dependencies
  pip install -r requirements.txt
- Set up environment variables
  Create a .env file in the root directory:
  SECRET_KEY=your-secret-key-here
  DEBUG=True
  To generate a secure SECRET_KEY:
  python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
- Run database migrations
  python manage.py makemigrations
  python manage.py migrate
- Create a superuser (admin account)
  python manage.py createsuperuser
- Prepare ML Model and Dictionary Files
  Ensure you have the following in your project:
  - Trained BERT model (ParsBERT)
  - Dictionary files:
    - dictionary.txt
    - keyboard_realword_errors.txt
    - substitution_realword_errors.txt
    - homophone_realword_errors.txt
- Start the development server
  python manage.py runserver
- Start the Dramatiq worker (in a separate terminal)
  # Activate your virtual environment first
  python manage.py rundramatiq
- Access the application
  Open your browser and navigate to:
  http://127.0.0.1:8000/
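The .env file created during setup holds SECRET_KEY and DEBUG as plain KEY=VALUE lines. How this project actually loads that file is not shown here, so the following is only a minimal standard-library sketch of reading such a file and interpreting the DEBUG flag. The helper names `read_env_file` and `as_bool` are hypothetical.

```python
from pathlib import Path


def read_env_file(path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def as_bool(raw, default: bool = False) -> bool:
    """Interpret strings such as 'True', '1', or 'yes' as True."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.is_file():
        env = read_env_file(env_path)
        print("SECRET_KEY set:", bool(env.get("SECRET_KEY")))
        print("DEBUG:", as_bool(env.get("DEBUG")))
    else:
        print("No .env file found in the current directory")
```

Treating a missing DEBUG value as False is a deliberate choice in this sketch, so that a forgotten setting does not silently enable debug mode.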
- Register an Account
  - Navigate to the registration page
  - Provide username, email, and password
  - Submit the form to create your account
- Login
  - Use your email and password to log in
  - You'll be redirected to the home page
- Correct Text
  Option A: Direct Text Input
  - Enter or paste Persian text directly into the text area
  - Click the correction button
  - View corrected text and download results
  Option B: File Upload
  - Upload a text file containing Persian text
  - Submit for processing
  - The task will be processed in the background
  - Check your profile/dashboard for results
  - Download the corrected file and correction report
- View Your History
  - Access your profile page
  - View all previous correction tasks
  - Download corrected files and reports
  - Track task status (processing/completed)
- Access Admin Panel
  http://127.0.0.1:8000/admin/
- Manage Users
  - View, edit, or delete user accounts
  - Monitor user activity
- Manage Tasks
  - View input and output tasks
  - Monitor task processing status
  - Access user-uploaded and corrected files
SpellCorrectionApp-main/
├── manage.py                 # Django management script
├── requirements.txt          # Project dependencies
├── README.md                 # This file
├── .env                      # Environment variables (create this)
├── base/                     # Main application
│   ├── models.py             # Database models (User, InputTask, OutputTask)
│   ├── views.py              # Request handlers
│   ├── forms.py              # Form definitions
│   ├── urls.py               # URL routing
│   ├── tasks.py              # Background task definitions
│   ├── ml_model.py           # ML model implementation
│   ├── templates/            # HTML templates
│   └── migrations/           # Database migrations
├── SpellCorrectionApp/       # Project settings
│   ├── settings.py           # Django configuration
│   ├── urls.py               # Root URL configuration
│   └── wsgi.py               # WSGI configuration
├── static/                   # Static files (CSS, JS, images)
└── templates/                # Base templates
By default, the app uses SQLite. To use PostgreSQL:
- Install psycopg2 (already in requirements.txt)
- Update settings.py:
  DATABASES = {
      'default': {
          'ENGINE': 'django.db.backends.postgresql',
          'NAME': 'your_db_name',
          'USER': 'your_db_user',
          'PASSWORD': 'your_db_password',
          'HOST': 'localhost',
          'PORT': '5432',
      }
  }
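Rather than hard-coding credentials, the choice between SQLite and PostgreSQL could be driven by environment variables. The sketch below is one possible approach, not the project's actual settings: the variable names (`DB_ENGINE`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`), the helper `database_config`, and the SQLite file name `db.sqlite3` are all assumptions.

```python
import os
from pathlib import Path


def database_config(base_dir, env=None) -> dict:
    """Build a Django-style DATABASES dict from environment variables.

    Falls back to SQLite unless DB_ENGINE is set to 'postgresql'.
    """
    env = os.environ if env is None else env
    if env.get("DB_ENGINE", "sqlite").lower() == "postgresql":
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["DB_NAME"],          # required; KeyError if missing
                "USER": env["DB_USER"],
                "PASSWORD": env["DB_PASSWORD"],
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        }
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": str(Path(base_dir) / "db.sqlite3"),  # assumed file name
        }
    }


if __name__ == "__main__":
    # With no DB_ENGINE set, this prints the SQLite configuration.
    print(database_config(Path.cwd(), env={}))
```

In settings.py this might be used as `DATABASES = database_config(BASE_DIR)`, which keeps passwords out of version control.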
For production, collect static files:
python manage.py collectstatic
Uploaded files are stored in:
- media/uploads/ - User input files
- media/downloads/ - Corrected output files
- media/reports/ - Correction reports
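If these directories do not exist yet on a fresh checkout, they can be created up front. The following is a small sketch under that assumption; the helper name `ensure_media_dirs` is hypothetical, and the app may well create the folders on its own when files are saved.

```python
from pathlib import Path

# Subdirectory names taken from the list above.
MEDIA_SUBDIRS = ("uploads", "downloads", "reports")


def ensure_media_dirs(media_root) -> dict:
    """Create media/<subdir> folders if missing and return their paths."""
    root = Path(media_root)
    paths = {}
    for name in MEDIA_SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths[name] = path
    return paths


if __name__ == "__main__":
    for name, path in ensure_media_dirs("media").items():
        print(f"{name}: {path}")
```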
Run the test suite:
python manage.py test
Run tests for a specific app:
python manage.py test base
- Backend Framework: Django 4.1.7
- ML Framework: HappyTransformer (BERT)
- Task Queue: Dramatiq with django-dramatiq
- String Similarity: Polyleven (Levenshtein distance)
- Database: SQLite (default) / PostgreSQL
- Frontend: HTML, CSS (SASS), JavaScript
- Data Processing: Pandas, NumPy
- / - Home page
- /login/ - User login
- /register/ - User registration
- /about/ - About page

- /profile/ - User profile and task history
- /logout/ - User logout
- /update-password/ - Change password
- /update-user/ - Update user information
- Import Errors
  - Ensure all dependencies are installed: pip install -r requirements.txt
  - Activate your virtual environment
- Database Errors
  - Run migrations: python manage.py migrate
  - Check database configuration in settings.py
- Static Files Not Loading
  - Run: python manage.py collectstatic
  - Check STATIC_URL and STATIC_ROOT in settings.py
- Background Tasks Not Processing
  - Ensure Dramatiq worker is running: python manage.py rundramatiq
  - Check task queue configuration
- ML Model Errors
  - Verify model path in settings
  - Ensure dictionary files are present and accessible
  - Check model compatibility with HappyTransformer version
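For the dictionary-file check in particular, a short preflight script can report what is missing before the server starts. This is a sketch only: the four file names come from the setup instructions, but the directory they live in is not stated here, so the `base_dir` argument (defaulting to the current directory) is an assumption, as is the helper name `missing_files`.

```python
import sys
from pathlib import Path

# File names listed in the setup instructions.
REQUIRED_FILES = (
    "dictionary.txt",
    "keyboard_realword_errors.txt",
    "substitution_realword_errors.txt",
    "homophone_realword_errors.txt",
)


def missing_files(base_dir=".") -> list:
    """Return the required file names that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Optional first argument: the directory to check (assumed location).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(target)
    if missing:
        print("Missing files:", ", ".join(missing))
        sys.exit(1)
    print("All dictionary files found in", target)
```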
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch: git checkout -b feature/YourFeature
- Commit your changes: git commit -m 'Add YourFeature'
- Push to the branch: git push origin feature/YourFeature
- Open a Pull Request
This project is part of academic research on Persian spelling error correction using BERT.
For questions or support, please open an issue in the repository.
- ParsBERT model contributors
- HappyTransformer library developers
- Django community