A web-based tool for conducting human evaluation of machine translation outputs. This tool allows evaluators to assess and compare translations from different systems, mark errors, and provide detailed feedback.
- User authentication and authorization
- Support for multiple language pairs
- Error marking and categorization
- Severity level assessment
- Side-by-side comparison of translations
- Progress tracking
- Results aggregation and export
Below is a quick video showing how the Human Evaluation Tool looks and works:
Demo.mov
The project consists of three main components:
- backend/: Flask-based REST API server
- frontend/: React-based web application
- public/: Static assets and built files
- Python 3.10 or later
- Node.js 18 or later
- PostgreSQL 13 or later
- Poetry (Python package manager)
- npm (Node.js package manager)
- Build the Docker image:
docker build -t yaraku/human-evaluation-tool .
Note: The Docker build installs Poetry versions that satisfy >=1.5,<1.7 by default. You can override the constraint with --build-arg POETRY_VERSION_CONSTRAINT="==1.6.1" if you need to pin an exact release.
- Run the container:
docker run --rm -it -p 8000:8000 yaraku/human-evaluation-tool
- Install prerequisites:
  - Python 3.10 or later
  - Node.js 18 or later
  - PostgreSQL 13 or later
  - Poetry (Python package manager)
  - npm (Node.js package manager)
- Set up PostgreSQL:
# Start PostgreSQL service
sudo service postgresql start
# Create database and set password
sudo -u postgres createdb he_tool
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'postgres';"
- Set up the backend:
cd backend
# Install dependencies (Poetry 1.5.x - 1.6.x)
poetry install
# Create and configure .env file
cat > .env << EOL
FLASK_APP=human_evaluation_tool:app
FLASK_ENV=development
DB_HOST=localhost
DB_PORT=5432
DB_NAME=he_tool
DB_USER=postgres
DB_PASSWORD=postgres
JWT_SECRET_KEY=development-secret-key
EOL
# Initialize and run migrations
poetry run flask db init
poetry run flask db migrate
poetry run flask db upgrade
# Start the backend server (development)
poetry run python main.py
# Or launch with Gunicorn (production-style)
poetry run gunicorn --bind 0.0.0.0:8000 human_evaluation_tool:app
- Set up the frontend (in a new terminal):
cd frontend
# Install dependencies
npm install
# Create and configure .env file
echo "VITE_API_URL=http://localhost:5000" > .env
# Start the development server
npm run dev
- Access the application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
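Before opening the browser, it can help to confirm that both development servers are actually listening. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is illustrative, and the ports are simply the ones from the URLs listed above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from the frontend and backend URLs listed above.
for name, port in (("Frontend", 5173), ("Backend API", 5000)):
    status = "listening" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service behind it is healthy.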
When you run the backend without PostgreSQL credentials, it falls back to a local SQLite database. That database is pre-populated with a demo user (yaraku@yaraku.com / yaraku) and a "Sample Evaluation" so you can explore the workflow immediately.
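The fallback can be pictured as a small selection function. The sketch below is a hypothetical illustration, not the project's actual configuration code: the function name `build_database_uri`, the SQLite file name `he_tool.db`, and the rule that all five `DB_*` variables must be set are assumptions. Only the variable names come from the `.env` example.

```python
from typing import Mapping
from urllib.parse import quote_plus

# Variable names come from the .env example; the selection rule is assumed.
REQUIRED_KEYS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")


def build_database_uri(env: Mapping[str, str], sqlite_path: str = "he_tool.db") -> str:
    """Return a PostgreSQL URI when every DB_* value is set, else a SQLite URI."""
    if all(env.get(key) for key in REQUIRED_KEYS):
        # Escape the credentials so special characters do not break the URI.
        user = quote_plus(env["DB_USER"])
        password = quote_plus(env["DB_PASSWORD"])
        host, port, name = env["DB_HOST"], env["DB_PORT"], env["DB_NAME"]
        return f"postgresql://{user}:{password}@{host}:{port}/{name}"
    # Hypothetical fallback file name; the real project may use another path.
    return f"sqlite:///{sqlite_path}"
```

Calling `build_database_uri(os.environ)` after `import os` would apply the same rule to the current shell environment.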
- Access the application at http://localhost:5173
- Log in with the demo credentials above or register a new account
- Open the "Sample Evaluation" to try the annotation UI, or create a new evaluation project
- Upload documents and system outputs when running your own studies
- Start evaluating translations
The backend is built with Flask and uses:
- SQLAlchemy for database ORM
- Flask-JWT-Extended for authentication
- Flask-Migrate for database migrations
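Flask-JWT-Extended signs tokens with the value of `JWT_SECRET_KEY`, so the `development-secret-key` placeholder from the setup example should not be reused outside local development. A minimal sketch of generating a random replacement with Python's standard library follows; the helper name `generate_jwt_secret` and the 32-byte length are illustrative choices, not project requirements.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


# Print a line that can replace the development value in the backend .env file.
print(f"JWT_SECRET_KEY={generate_jwt_secret()}")
```

Changing the secret invalidates previously issued tokens, so users would need to log in again.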
Key commands:
cd backend
poetry run flask db migrate # Create new migrations
poetry run flask db upgrade # Apply migrations
poetry run python main.py # Run development server
poetry run gunicorn --bind 0.0.0.0:8000 human_evaluation_tool:app # Run with Gunicorn
The frontend is built with React and uses:
- Vite for build tooling
- TailwindCSS for styling
- React Query for data fetching
Key commands:
cd frontend
npm run dev # Start development server
npm run build # Build for production
npm run preview # Preview production build
The application uses a PostgreSQL database with the following main entities:
- Users: Evaluators and administrators
- Documents: Source texts for evaluation
- Systems: MT systems being evaluated
- Evaluations: Evaluation projects
- Annotations: User annotations and feedback
- Markings: Error markings and categorizations
For a detailed ER diagram, see backend/README.md.
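As a rough illustration of how annotations and markings could feed results aggregation, the sketch below counts markings per system and severity. It is not the project's schema: every class, field name, and example value here is hypothetical, and the ER diagram in backend/README.md remains the authoritative description.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Marking:
    category: str  # hypothetical label, e.g. "mistranslation"
    severity: str  # hypothetical level, e.g. "minor" or "major"


@dataclass
class Annotation:
    user: str      # evaluator who made the annotation
    system: str    # MT system whose output was annotated
    document: str  # source document identifier
    markings: List[Marking] = field(default_factory=list)


def severity_counts(annotations: List[Annotation]) -> Dict[Tuple[str, str], int]:
    """Count markings for each (system, severity) pair."""
    counts: Counter = Counter()
    for annotation in annotations:
        for marking in annotation.markings:
            counts[(annotation.system, marking.severity)] += 1
    return dict(counts)


# Made-up sample data, for illustration only.
sample = [
    Annotation("eval-1", "system-a", "doc-1", [Marking("mistranslation", "major")]),
    Annotation("eval-2", "system-a", "doc-1", [Marking("grammar", "minor")]),
    Annotation("eval-1", "system-b", "doc-1", []),
]
print(severity_counts(sample))
```

Keying the counts by (system, severity) keeps the result easy to flatten into rows for a CSV or spreadsheet export.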
- Fork the repository
- Create a feature branch
- For backend changes, install dependencies with poetry install --with dev and run:
  - poetry run black --check src tests
  - poetry run isort --check-only src tests
  - poetry run flake8 src tests
  - poetry run mypy src tests
  - poetry run pytest
- Commit your changes
- Push to the branch
- Create a Pull Request and ensure the Backend CI workflow passes when touching backend code.
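The backend checks listed in the steps above can be chained so that the first failure stops the run, roughly as a CI job would. This is a convenience sketch, not a script shipped with the project: the names `CHECKS`, `run_command`, and `run_checks` are hypothetical, and it assumes it is started from the directory that contains `src` and `tests` (presumably backend/).

```python
import subprocess
from typing import Callable, List, Sequence

# Commands copied from the contribution steps; the order is kept as listed.
CHECKS: List[List[str]] = [
    ["poetry", "run", "black", "--check", "src", "tests"],
    ["poetry", "run", "isort", "--check-only", "src", "tests"],
    ["poetry", "run", "flake8", "src", "tests"],
    ["poetry", "run", "mypy", "src", "tests"],
    ["poetry", "run", "pytest"],
]


def run_command(command: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> int:
    """Run checks in order; return the first non-zero exit code, or 0."""
    for command in checks:
        print("$", " ".join(command))
        code = runner(command)
        if code != 0:
            return code
    return 0
```

To use it as a script, add `import sys` and finish with `sys.exit(run_checks())`. The `runner` parameter exists so the control flow can be exercised without invoking Poetry.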
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.