Amir79Naziri/SpellCorrectionApp

SpellCorrectionApp - Persian Spell Checker

A Django-based web application for advanced Persian spelling error correction using BERT (Bidirectional Encoder Representations from Transformers) and Levenshtein distance algorithms.

📖 About

This application uses the ParsBERT masked language model to identify and correct spelling errors in Persian text. It handles both real-word errors (a valid word used in the wrong context) and non-real-word errors (a string that is not a valid word) by combining BERT with Levenshtein distance.
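To make the Levenshtein side of this concrete, here is a minimal pure-Python sketch of edit-distance lookup against a word list. It is an illustration only, not the code in ml_model.py: the app itself lists Polyleven for this, and the `suggest` helper, its `max_distance` parameter, and the four-word dictionary are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=1):
    """Return dictionary entries within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in dictionary)
    return [entry for distance, entry in scored if distance <= max_distance]


# Hypothetical mini-dictionary; the real app reads dictionary.txt.
dictionary = ["کتاب", "کباب", "کتان", "مدرسه"]
misspelled = "کتاپ"  # not a dictionary word: one substitution away from "کتاب"
suggestions = suggest(misspelled, dictionary)
print(suggestions)
```

Edit distance alone cannot choose between equally close candidates, and it cannot flag a real-word error at all, because the misspelling is itself a dictionary word. That is presumably where the masked language model comes in: it can score candidates against the surrounding sentence.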

Key Features

  • Advanced ML Model: Uses HappyTransformer with ParsBERT for accurate spell correction
  • Multiple Error Types: Handles homophone, keyboard, and substitution errors
  • User Authentication: Secure login and registration system
  • File Processing: Upload text files for batch spell correction
  • Async Task Processing: Background task processing using Dramatiq
  • User Dashboard: Track your correction history and download results
  • Real-time Correction: Process text directly through the web interface

Keywords

Spelling mistakes, Neural Networks, BERT masked language model, Error correction system, Real and non-real word errors, ParsBERT model, Levenshtein distance


🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • Virtual environment (recommended)
  • PostgreSQL (optional, SQLite is used by default)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd SpellCorrectionApp    # the folder is SpellCorrectionApp-main if you downloaded the ZIP instead
  2. Create and activate a virtual environment

    # On macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
    
    # On Windows
    python -m venv venv
    venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    Create a .env file in the root directory:

    SECRET_KEY=your-secret-key-here
    DEBUG=True

    To generate a secure SECRET_KEY:

    python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
  5. Run database migrations

    python manage.py makemigrations
    python manage.py migrate
  6. Create a superuser (admin account)

    python manage.py createsuperuser
  7. Prepare ML Model and Dictionary Files

    Ensure you have the following in your project:

    • Trained BERT model (ParsBERT)
    • Dictionary files:
      • dictionary.txt
      • keyboard_realword_errors.txt
      • substitution_realword_errors.txt
      • homophone_realword_errors.txt
  8. Start the development server

    python manage.py runserver
  9. Start the Dramatiq worker (in a separate terminal)

    # Activate your virtual environment first
    python manage.py rundramatiq
  10. Access the application

    Open your browser and navigate to: http://127.0.0.1:8000/
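Step 4 creates a .env file, but the README does not say how settings.py reads it. The sketch below shows one way to do it with the standard library only. It is an assumption, not the project's actual loader (which may rely on a package such as python-dotenv), and the helper names `load_env_file` and `read_settings` are hypothetical.

```python
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values  # a missing file simply yields no values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_settings(env):
    """Convert the raw strings into the values settings.py would need."""
    secret_key = env.get("SECRET_KEY")
    if not secret_key:
        raise RuntimeError("SECRET_KEY is not set; add it to .env")
    # Environment values are always strings, so "False" must be parsed explicitly.
    debug = env.get("DEBUG", "False").lower() in ("1", "true", "yes")
    return {"SECRET_KEY": secret_key, "DEBUG": debug}
```

In settings.py this could be used as `config = read_settings(load_env_file(BASE_DIR / ".env"))`, followed by `SECRET_KEY = config["SECRET_KEY"]` and `DEBUG = config["DEBUG"]`. Parsing DEBUG explicitly matters because the non-empty string "False" is truthy in Python.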


📋 Usage

For Regular Users

  1. Register an Account

    • Navigate to the registration page
    • Provide username, email, and password
    • Submit the form to create your account
  2. Login

    • Use your email and password to log in
    • You'll be redirected to the home page
  3. Correct Text

    Option A: Direct Text Input

    • Enter or paste Persian text directly into the text area
    • Click the correction button
    • View corrected text and download results

    Option B: File Upload

    • Upload a text file containing Persian text
    • Submit for processing
    • The task will be processed in the background
    • Check your profile/dashboard for results
    • Download the corrected file and correction report
  4. View Your History

    • Access your profile page
    • View all previous correction tasks
    • Download corrected files and reports
    • Track task status (processing/completed)

For Administrators

  1. Access Admin Panel

    http://127.0.0.1:8000/admin/
    
  2. Manage Users

    • View, edit, or delete user accounts
    • Monitor user activity
  3. Manage Tasks

    • View input and output tasks
    • Monitor task processing status
    • Access user-uploaded and corrected files

🏗️ Project Structure

SpellCorrectionApp-main/
├── manage.py                  # Django management script
├── requirements.txt           # Project dependencies
├── README.md                  # This file
├── .env                       # Environment variables (create this)
├── base/                      # Main application
│   ├── models.py             # Database models (User, InputTask, OutputTask)
│   ├── views.py              # Request handlers
│   ├── forms.py              # Form definitions
│   ├── urls.py               # URL routing
│   ├── tasks.py              # Background task definitions
│   ├── ml_model.py           # ML model implementation
│   ├── templates/            # HTML templates
│   └── migrations/           # Database migrations
├── SpellCorrectionApp/        # Project settings
│   ├── settings.py           # Django configuration
│   ├── urls.py               # Root URL configuration
│   └── wsgi.py               # WSGI configuration
├── static/                    # Static files (CSS, JS, images)
└── templates/                 # Base templates

🔧 Configuration

Database Configuration

By default, the app uses SQLite. To use PostgreSQL:

  1. Install psycopg2 (already in requirements.txt)
  2. Update settings.py:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': 'your_db_name',
            'USER': 'your_db_user',
            'PASSWORD': 'your_db_password',
            'HOST': 'localhost',
            'PORT': '5432',
        }
    }
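If you would rather not hard-code credentials in settings.py, the same choice between SQLite and PostgreSQL can be driven by environment variables. The sketch below is one possible approach, not part of the project: the `build_databases` helper and the DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, and DB_PORT variable names are assumptions chosen for illustration.

```python
import os


def build_databases(env, base_dir="."):
    """Return a Django-style DATABASES dict.

    PostgreSQL is selected when DB_NAME is set; otherwise SQLite is used.
    """
    if env.get("DB_NAME"):
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["DB_NAME"],
                "USER": env.get("DB_USER", ""),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        }
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(base_dir, "db.sqlite3"),
        }
    }


# In settings.py this would read: DATABASES = build_databases(os.environ, BASE_DIR)
DATABASES = build_databases(os.environ)
```

With this in place, a development machine without any DB_* variables keeps using SQLite, and a server switches to PostgreSQL by exporting DB_NAME and the related variables.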

Static Files

For production, collect static files:

python manage.py collectstatic

Media Files

Uploaded and generated files are stored in:

  • media/uploads/ - User input files
  • media/downloads/ - Corrected output files
  • media/reports/ - Correction reports

🧪 Testing

Run the test suite:

python manage.py test

Run tests for a specific app:

python manage.py test base

🛠️ Technologies Used

  • Backend Framework: Django 4.1.7
  • ML Framework: HappyTransformer (BERT)
  • Task Queue: Dramatiq with django-dramatiq
  • String Similarity: Polyleven (Levenshtein distance)
  • Database: SQLite (default) / PostgreSQL
  • Frontend: HTML, CSS (SASS), JavaScript
  • Data Processing: Pandas, NumPy

📊 API Endpoints

Public Routes

  • / - Home page
  • /login/ - User login
  • /register/ - User registration
  • /about/ - About page

Protected Routes (Login Required)

  • /profile/ - User profile and task history
  • /logout/ - User logout
  • /update-password/ - Change password
  • /update-user/ - Update user information

🐛 Troubleshooting

Common Issues

  1. Import Errors

    • Ensure all dependencies are installed: pip install -r requirements.txt
    • Activate your virtual environment
  2. Database Errors

    • Run migrations: python manage.py migrate
    • Check database configuration in settings.py
  3. Static Files Not Loading

    • Run: python manage.py collectstatic
    • Check STATIC_URL and STATIC_ROOT in settings.py
  4. Background Tasks Not Processing

    • Ensure Dramatiq worker is running: python manage.py rundramatiq
    • Check task queue configuration
  5. ML Model Errors

    • Verify model path in settings
    • Ensure dictionary files are present and accessible
    • Check model compatibility with HappyTransformer version
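For the ML model errors above, a quick pre-flight check can confirm that the four dictionary files from the installation steps are in place before you start the server. This is a hedged sketch: the README does not say which directory the files belong in, so the `directory` argument is an assumption, and `missing_dictionary_files` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path

# File names taken from the installation steps in this README.
REQUIRED_FILES = (
    "dictionary.txt",
    "keyboard_realword_errors.txt",
    "substitution_realword_errors.txt",
    "homophone_realword_errors.txt",
)


def missing_dictionary_files(directory):
    """Return the required file names that are absent or empty in directory."""
    base = Path(directory)
    missing = []
    for name in REQUIRED_FILES:
        candidate = base / name
        # An empty file is treated like a missing one.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_dictionary_files("path/to/your/data")` returns an empty list when everything is present, and otherwise names the files to restore.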

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/YourFeature
  3. Commit your changes: git commit -m 'Add YourFeature'
  4. Push to the branch: git push origin feature/YourFeature
  5. Open a Pull Request

📝 License

This project is part of academic research on Persian spelling error correction using BERT.


📧 Contact

For questions or support, please open an issue in the repository.


🙏 Acknowledgments

  • ParsBERT model contributors
  • HappyTransformer library developers
  • Django community
