Human Evaluation Tool

A web-based tool for conducting human evaluation of machine translation outputs. This tool allows evaluators to assess and compare translations from different systems, mark errors, and provide detailed feedback.

Features

User authentication and authorization
Support for multiple language pairs
Error marking and categorization
Severity level assessment
Side-by-side comparison of translations
Progress tracking
Results aggregation and export

Demo

Below is a quick video showing how the Human Evaluation Tool looks and works:

Demo.mov

Project Structure

The project consists of three main components:

backend/: Flask-based REST API server
frontend/: React-based web application
public/: Static assets and built files

Prerequisites

Python 3.10 or later
Node.js 18 or later
PostgreSQL 13 or later
Poetry (Python package manager)
npm (Node.js package manager)

Installation and Setup

Option 1: Using Docker (Recommended)

Build the Docker image:

docker build -t yaraku/human-evaluation-tool .

Note: By default, the Docker build installs a Poetry release that satisfies >=1.5,<1.7. To pin an exact release, override the constraint with --build-arg POETRY_VERSION_CONSTRAINT="==1.6.1".
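To make the default constraint concrete, here is a minimal sketch that checks whether a plain X.Y.Z version string falls inside >=1.5,<1.7. The helper names are hypothetical and not part of this repository, and the parsing is deliberately simple: it does not handle pre-release tags such as 1.6.0a1.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.6.1' into (1, 6, 1)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_default_constraint(version: str) -> bool:
    """Return True if the version lies in the range >=1.5,<1.7."""
    parsed = parse_version(version)
    # Tuple comparison is lexicographic, so (1, 5) <= (1, 5, 0) and
    # (1, 6, 1) < (1, 7), while (1, 7, 0) is not below (1, 7).
    return (1, 5) <= parsed < (1, 7)
```

Under these assumptions, 1.5.0 and 1.6.1 are accepted while 1.4.2 and 1.7.0 are rejected.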

Run the container:

docker run --rm -it -p 8000:8000 yaraku/human-evaluation-tool

Option 2: Manual Setup

Install prerequisites:
- Python 3.10 or later
- Node.js 18 or later
- PostgreSQL 13 or later
- Poetry (Python package manager)
- npm (Node.js package manager)

Set up PostgreSQL:

# Start PostgreSQL service
sudo service postgresql start

# Create database and set password
sudo -u postgres createdb he_tool
sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'postgres';"

Set up the backend:

cd backend

# Install dependencies (Poetry 1.5.x - 1.6.x)
poetry install

# Create and configure .env file
cat > .env << EOL
FLASK_APP=human_evaluation_tool:app
FLASK_ENV=development
DB_HOST=localhost
DB_PORT=5432
DB_NAME=he_tool
DB_USER=postgres
DB_PASSWORD=postgres
JWT_SECRET_KEY=development-secret-key
EOL

# Initialize and run migrations
poetry run flask db init
poetry run flask db migrate
poetry run flask db upgrade

# Start the backend server (development)
poetry run python main.py

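# Or launch with Gunicorn (production-style). Note that this binds port 8000, not 5000,
# so set VITE_API_URL to http://localhost:8000 in the frontend .env if you use it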
poetry run gunicorn --bind 0.0.0.0:8000 human_evaluation_tool:app

Set up the frontend (in a new terminal):

cd frontend

# Install dependencies
npm install

# Create and configure .env file
echo "VITE_API_URL=http://localhost:5000" > .env

# Start the development server
npm run dev

Access the application:
- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
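If one of the two URLs does not respond, it can help to check whether anything is listening on the expected port. The following is a small sketch using only the Python standard library. The SERVICES mapping simply restates the two addresses above, and is_port_open is a hypothetical helper, not something shipped with this project.

```python
import socket

# Ports taken from the URLs listed above.
SERVICES = {
    "Frontend": ("localhost", 5173),
    "Backend API": ("localhost", 5000),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def report() -> None:
    """Print one status line per service."""
    for name, (host, port) in SERVICES.items():
        status = "up" if is_port_open(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```

Calling report() after starting both servers prints one line per service. A "not reachable" result usually means the server is not running or is bound to a different port.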

Usage

When you run the backend without PostgreSQL credentials, it falls back to a local SQLite database that is pre-populated with a demo user (yaraku@yaraku.com / yaraku) and a "Sample Evaluation", so you can explore the workflow immediately.

Access the application at http://localhost:5173
Log in with the demo credentials above or register a new account
Open the "Sample Evaluation" to try the annotation UI, or create a new evaluation project
Upload documents and system outputs when running your own studies
Start evaluating translations
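The SQLite fallback mentioned at the start of this section could be implemented along the following lines. This is only a sketch under stated assumptions, not the backend's actual code: the DB_* variable names come from the .env example in the manual setup, while build_database_uri and the he_tool.db file name are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

REQUIRED_KEYS = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")


def build_database_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a PostgreSQL URI if all DB_* values are set, else a SQLite URI."""
    env = os.environ if env is None else env
    if all(env.get(key) for key in REQUIRED_KEYS):
        # Quote user and password so special characters survive in the URI.
        user = quote_plus(env["DB_USER"])
        password = quote_plus(env["DB_PASSWORD"])
        return (
            f"postgresql://{user}:{password}"
            f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
        )
    # Hypothetical file name; the real backend may store its SQLite file elsewhere.
    return "sqlite:///he_tool.db"
```

With the values from the manual setup, this returns postgresql://postgres:postgres@localhost:5432/he_tool. With an empty environment it returns the SQLite URI.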

Development

Backend Development

The backend is built with Flask and uses:

SQLAlchemy for database ORM
Flask-JWT-Extended for authentication
Flask-Migrate for database migrations

Key commands:

cd backend
poetry run flask db migrate  # Create new migrations
poetry run flask db upgrade  # Apply migrations
poetry run python main.py  # Run development server
poetry run gunicorn --bind 0.0.0.0:8000 human_evaluation_tool:app  # Run with Gunicorn

Frontend Development

The frontend is built with React and uses:

Vite for build tooling
TailwindCSS for styling
React Query for data fetching

Key commands:

cd frontend
npm run dev  # Start development server
npm run build  # Build for production
npm run preview  # Preview production build

Database Schema

The application stores its data in PostgreSQL (or in the SQLite fallback described under Usage) with the following main entities:

Users: Evaluators and administrators
Documents: Source texts for evaluation
Systems: MT systems being evaluated
Evaluations: Evaluation projects
Annotations: User annotations and feedback
Markings: Error markings and categorizations

For a detailed ER diagram, see backend/README.md.
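As a rough illustration of how some of these entities might relate, the sketch below models a subset of them as plain dataclasses and counts error markings by severity. Every field name here is an assumption made for the example. The authoritative schema is the ER diagram in backend/README.md and the SQLAlchemy models in the backend.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Marking:
    """One marked error; 'category' and 'severity' are hypothetical fields."""
    id: int
    category: str
    severity: str


@dataclass
class Annotation:
    """A user's feedback on one translation, holding zero or more markings."""
    id: int
    user_id: int
    system_id: int
    markings: List[Marking] = field(default_factory=list)


@dataclass
class Evaluation:
    """An evaluation project that groups annotations."""
    id: int
    name: str
    annotations: List[Annotation] = field(default_factory=list)


def count_by_severity(evaluation: Evaluation) -> Counter:
    """Count all markings in an evaluation, grouped by severity label."""
    return Counter(
        marking.severity
        for annotation in evaluation.annotations
        for marking in annotation.markings
    )
```

An aggregation like count_by_severity is the kind of summary that a results export could be built on, though the real tool may compute its results differently.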

Contributing

Fork the repository
Create a feature branch
For backend changes, install dependencies with poetry install --with dev and run:
- poetry run black --check src tests
- poetry run isort --check-only src tests
- poetry run flake8 src tests
- poetry run mypy src tests
- poetry run pytest
Commit your changes
Push to the branch
Create a Pull Request and ensure the Backend CI workflow passes when touching backend code.
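The backend checks listed above can also be run in one pass with a small script. The sketch below assumes the commands are executed from the backend/ directory. run_checks is a hypothetical helper and not part of the repository; the commands themselves are the ones given in the list.

```python
import subprocess
from typing import Callable, List

# The five backend checks, in the order listed above.
CHECKS: List[List[str]] = [
    ["poetry", "run", "black", "--check", "src", "tests"],
    ["poetry", "run", "isort", "--check-only", "src", "tests"],
    ["poetry", "run", "flake8", "src", "tests"],
    ["poetry", "run", "mypy", "src", "tests"],
    ["poetry", "run", "pytest"],
]


def run_checks(run: Callable = subprocess.run, cwd: str = "backend") -> List[str]:
    """Run every check and return the commands that exited with a non-zero code."""
    failed = []
    for command in CHECKS:
        # All checks run even if an earlier one fails, so one pass shows every problem.
        result = run(command, cwd=cwd)
        if result.returncode != 0:
            failed.append(" ".join(command))
    return failed
```

Calling run_checks() from the repository root returns an empty list when everything passes. The run parameter exists only so the helper can be exercised without Poetry installed.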

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
