
forzagreen/wikitermbase


wikitermbase


Overview

Wiki Term Base is a tool designed to standardise terminology used on Arabic Wikipedia and accelerate vocabulary translation.

ℹ For functional documentation, please check the dedicated Wikipedia page مسرد الويكي (in Arabic).

🌐 The website is available at: https://wikitermbase.toolforge.org

It is hosted on Toolforge, as a Python web application built with the Flask framework, using a MariaDB relational database.

The website's frontend is built with React.

The Wikipedia gadget frontend is built with OOUI and can be enabled in Arabic Wikipedia's user preferences.

Wiki Gadget

The Wikipedia gadget can be activated in user preferences -> "مسرد الويكي".

The deployed version in Arabic Wikipedia:

On Wikipedia, gadgets are production-ready features, while user scripts serve as a flexible environment for development and experimentation.

The user script, available at gadget/SearchTerm.js, differs from the gadget code in that it consolidates all imports, JavaScript code, and CSS styles into a single file.

Local Setup

Note that the database content is managed in the arabterm project (currently in beta, on branch feature/arabterm_v2).

Clone the arabterm repository, and start the MariaDB database in a Docker container:

make init
make init_mariadb  # start or create container
make delete_mariadb  # delete database if exists
make migrate_to_mariadb  # migrate the SQLite content to MariaDB

Then, from the wikitermbase repository, install the Python dependencies:

make init

Create a file at ./var/local.cnf with (adapt values):

[client]
user = MyUserName
password = MyTestPassword

Start the Flask application:

FLASK_APP=backend/app.py python -m flask run --port=5001

You can then open the web application at http://127.0.0.1:5001/
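
The ./var/local.cnf file shown above uses INI-style option-file syntax, so it can be read with Python's standard configparser module. The following is a minimal sketch for checking that the file parses as expected; it is not the project's actual loading code, and the function name read_client_credentials is hypothetical.

```python
import configparser


def read_client_credentials(cnf_text: str) -> dict:
    """Parse the [client] section of an option file (hypothetical helper)."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cnf_text)
    client = parser["client"]
    return {"user": client["user"], "password": client["password"]}


if __name__ == "__main__":
    sample = "[client]\nuser = MyUserName\npassword = MyTestPassword\n"
    print(read_client_credentials(sample)["user"])
```

A missing [client] section or a missing key raises KeyError, which makes a misconfigured file fail early.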

Backend

Python version: 3.11

Flask API

  • Aggregated search (results are grouped by the Arabic term):
GET /api/v1/search/aggregated?q=magnetoscope
GET /api/v1/search/aggregated?q=اشتقاق

The result is a JSON document. An example can be found at gadget/response.json.

  • Raw search (without grouping):
GET /api/v1/search?q=magnetoscope
GET /api/v1/search?q=اشتقاق
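
Arabic search terms have to be percent-encoded in the query string. The sketch below builds the two request URLs listed above with the standard library. It assumes the API is served by the local development server from the Local Setup section; the function name build_search_url and the BASE_URL constant are illustrative and not part of the project.

```python
from urllib.parse import urlencode

# Assumption: the local development server from "Local Setup" serves the API.
BASE_URL = "http://127.0.0.1:5001"


def build_search_url(term: str, aggregated: bool = False, base_url: str = BASE_URL) -> str:
    """Build a search URL; urlencode percent-encodes Arabic terms as UTF-8."""
    path = "/api/v1/search/aggregated" if aggregated else "/api/v1/search"
    return f"{base_url}{path}?{urlencode({'q': term})}"


if __name__ == "__main__":
    print(build_search_url("magnetoscope", aggregated=True))
    print(build_search_url("اشتقاق"))
```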

Flask API on Toolforge

Initial Setup


For the initial setup of the repository in Toolforge:

  • ssh toolforge and become wikitermbase
  • Generate a token in GitHub
  • Clone the repository: git clone https://github.com/forzagreen/wikitermbase
  • Enter webservice shell: toolforge webservice --backend=kubernetes python3.11 shell
  • mkdir -p $HOME/www/python
  • Create a symlink from $HOME/www/python/src to the folder backend of the cloned repo:
    • ln -s /data/project/wikitermbase/wikitermbase/backend /data/project/wikitermbase/www/python/src
  • Create a virtual environment, activate it, and install dependencies:
    • python3 -m venv $HOME/www/python/venv
    • source $HOME/www/python/venv/bin/activate
    • pip install -r $HOME/www/python/src/requirements.txt
  • Exit out of webservice shell (Ctrl + D or exit)
  • toolforge webservice --backend=kubernetes python3.11 start
  • To test, go to: https://wikitermbase.toolforge.org/search?q=telescope
  • Check logs in /data/project/wikitermbase/uwsgi.log
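
The steps above leave a specific layout under the tool's home directory. The following sketch checks that layout before starting the webservice. The paths come from the steps above; the helper name missing_paths is hypothetical, and this script is not part of the repository.

```python
from pathlib import Path

# Paths relative to the tool's home directory, taken from the setup steps.
EXPECTED = (
    "www/python/src",
    "www/python/src/requirements.txt",
    "www/python/venv",
)


def missing_paths(home: Path) -> list[str]:
    """Return the expected entries that do not exist under `home`."""
    # Path.exists() follows symlinks, so a dangling src symlink is reported.
    return [rel for rel in EXPECTED if not (home / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.home())
    print("layout OK" if not missing else f"missing: {missing}")
```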

Updating the Codebase

  • ssh toolforge and become wikitermbase
  • cd wikitermbase and git pull origin main (supply username and token)
  • If the Python code changed:
    • Enter webservice shell: toolforge webservice --backend=kubernetes python3.11 shell
    • Activate the Python virtual environment and update dependencies:
      source $HOME/www/python/venv/bin/activate
      pip uninstall arabterm
      pip install -r $HOME/www/python/src/requirements.txt
    • Exit the webservice shell (exit)
  • If npm dependencies changed (or to rebuild the JavaScript/HTML/CSS code):
    • Enter Node.js shell: toolforge webservice node18 shell
    • cd wikitermbase, make build_frontend, and exit the shell.
  • toolforge webservice --backend=kubernetes python3.11 restart
  • To test, go to: https://wikitermbase.toolforge.org/search?q=telescope
  • Make sure the gadget in Wikipedia is still working.
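
The "To test" step above can be scripted as a small smoke test. This is a sketch under assumptions: the function name is_up is hypothetical, the User-Agent value is an arbitrary placeholder, and an HTTP 200 response is taken as the only success criterion. It does not check the gadget itself.

```python
import urllib.request

# URL taken from the "To test" step above.
TEST_URL = "https://wikitermbase.toolforge.org/search?q=telescope"


def is_up(url: str = TEST_URL, opener=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with HTTP 200; `opener` is injectable for tests."""
    # Placeholder User-Agent (assumption): some hosts reject the urllib default.
    request = urllib.request.Request(url, headers={"User-Agent": "wikitermbase-smoke-test"})
    try:
        with opener(request, timeout=10) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


if __name__ == "__main__":
    print("up" if is_up() else "down")
```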

Database: MariaDB

Updating data

Ref: https://mariadb.com/kb/en/backup-and-restore-overview/

Prerequisite: the SQLite database arabterm.db is up to date in the arabterm repository (currently on branch feature/arabterm_v2).

From the arabterm repository, generate the MariaDB database:

make init_mariadb  # start or create container
make delete_mariadb
make migrate_to_mariadb

# Make sure search works in MariaDB:
make search term="telescope"

# Generate database dumps, SQLite and MariaDB:
make dump

Commit and push arabterm.db and db/ to the arabterm GitHub repository.

Then, from the wikitermbase repository:

# If python dependencies changed (including arabterm python package):
pip uninstall arabterm
make init

# Download dump from arabterm repository, branch feature/arabterm_v2
make download_dump
make fix_dump

Commit the changes to git. Then go to Toolforge and update the database.

MariaDB on Toolforge

Initial Setup

Ref: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#User_databases

  • ssh toolforge and become wikitermbase
  • Find your database user name in $HOME/replica.my.cnf
  • Create the database:
    • Open the SQL console: sql tools
    • Create the database: MariaDB [(none)]> CREATE DATABASE s55953__arabterm;

Updating the Database

To update/restore the database:

  • ssh toolforge and become wikitermbase
  • cd wikitermbase and git pull origin main (supply username and token)
  • cd ~/wikitermbase/db
  • mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud s55953__arabterm < arabterm.sql
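
For scripted restores, the mariadb command above can be driven from Python. This is a minimal sketch, not part of the repository: the helper names restore_command and restore are hypothetical, while the host and database name are the ones used in the command above.

```python
import subprocess

DB_HOST = "tools.db.svc.wikimedia.cloud"
DB_NAME = "s55953__arabterm"


def restore_command(home: str) -> list[str]:
    """Build the argument list equivalent to the mariadb command above."""
    return ["mariadb", f"--defaults-file={home}/replica.my.cnf", "-h", DB_HOST, DB_NAME]


def restore(dump_path: str, home: str) -> None:
    """Feed the SQL dump to mariadb on stdin; raises CalledProcessError on failure."""
    with open(dump_path, "rb") as sql:
        subprocess.run(restore_command(home), stdin=sql, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and check=True makes a failed restore raise an exception.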

Troubleshooting

All of the issues below are fixed by running make fix_dump.

  • MDEV-34183 (https://jira.mariadb.org/browse/MDEV-34183): drop the line /*!999999\- enable the sandbox mode */ or /*M!999999\- enable the sandbox mode */ from the dump.
  • ERROR 1273 (HY000) at line 25: Unknown collation: 'utf8mb4_uca1400_ai_ci'. Replace the collation with utf8mb4_unicode_520_ci.
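
The two fixes above amount to a simple text transformation of the dump. The sketch below illustrates them in Python. It is not the implementation of the make fix_dump target, whose contents are not shown here, and the function name fix_dump is reused only for readability.

```python
import re

# Matches the sandbox-mode comment line in either form quoted above.
SANDBOX_LINE = re.compile(r"^/\*M?!999999\\- enable the sandbox mode \*/\s*$")
OLD_COLLATION = "utf8mb4_uca1400_ai_ci"
NEW_COLLATION = "utf8mb4_unicode_520_ci"


def fix_dump(sql: str) -> str:
    """Drop the sandbox-mode line and replace the unsupported collation."""
    kept = [
        line
        for line in sql.splitlines(keepends=True)
        if not SANDBOX_LINE.match(line)
    ]
    return "".join(kept).replace(OLD_COLLATION, NEW_COLLATION)


if __name__ == "__main__":
    import sys

    # Usage: python fix_dump.py < arabterm.sql > arabterm.fixed.sql
    sys.stdout.write(fix_dump(sys.stdin.read()))
```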
