Generate SQLite and PostgreSQL relational databases from Japanese dictionary data sources:
- JMdict: Japanese-multilingual dictionary data (expression/vocabulary entries)
- KANJIDIC2: Comprehensive kanji character information
- RADKFILE: Kanji radical decomposition data
The tool produces SQL schemas for SQLite, as well as CSV exports used by PostgreSQL.
SQL is generated using scripts written in Dart.
Download the required packages with:
dart pub get
SQL or CSV files are generated with the Dart scripts under the src directory.
- to SQL scripts
Using src/to_sql_expression.dart and src/to_sql_kanji.dart
Options:
- --langs: languages to process (glosses or meanings), as a comma-separated list. Languages are in ISO 639-3 format for expression and ISO 639-2 for kanji.
- --max-inserts: how many VALUES rows per INSERT statement in the generated SQL. 0 puts all VALUES in a single INSERT.
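To make the batching behaviour concrete, here is a hedged sketch of what the generated SQL could look like with --max-inserts 2; the table and column names are invented for the example and are not the project's actual schema.

```sql
-- Hypothetical output shape with --max-inserts 2 (names are illustrative only).
INSERT INTO gloss (entry_id, lang, txt) VALUES
  (1, 'eng', 'dictionary'),
  (2, 'eng', 'word');
INSERT INTO gloss (entry_id, lang, txt) VALUES
  (3, 'eng', 'kanji'),
  (4, 'eng', 'radical');
-- With --max-inserts 0, all rows go into a single INSERT statement.
```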
English and French:

dart src/to_sql_expression.dart --langs "eng,fre" --max-inserts 1

English only:

dart src/to_sql_expression.dart --langs "eng"
dart src/to_sql_kanji.dart en

- to CSV scripts
Using src/to_csv_expression.dart and src/to_csv_kanji.dart
Files are generated under the data/generated directory.
Databases can be opened and tested with the sqlite3 binary:

sqlite3 data/generated/db/expression.db
sqlite3 data/generated/db/kanji.db
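As a quick sanity check, the following sqlite3 session lists and counts the tables; this is a minimal sketch that only relies on standard sqlite3 features, since the actual table names depend on the generated schema.

```sql
-- Inside: sqlite3 data/generated/db/expression.db
-- List the generated tables:
.tables
-- Count the tables defined in the schema:
SELECT count(*) FROM sqlite_master WHERE type = 'table';
```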
Instead of calling dart directly, the helper bash scripts under the scripts directory can be used; they help download the source files and init and populate the databases.
For all the scripts, the first argument must be expression or kanji.
run.bash downloads the dictionaries with wget and uncompresses them with gunzip.

- --download: Download the JMdict and KANJIDIC2/RADKFILE source files
- --clean: Clear generated SQL or CSV files
- --sql: Generate SQL insert statements
- --csv: Generate CSV files
sqlite.bash creates and populates the sqlite3 db files using the sqlite3 binary.

sudo apt install sqlite3

- --init: Create the SQLite database file with tables and indexes
- --populate: Insert data using previously generated SQL
- --compress: Compress the SQLite database
- --clean: Remove the database
postgres.bash populates the Postgres database using psql.

- --init: Create the PostgreSQL database with tables and indexes
- --populate: Import data using previously generated CSV files via COPY
- --clean: Remove the database
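For reference, the --populate step conceptually boils down to one COPY per table; the sketch below is illustrative only, with a made-up table, column list, and CSV path rather than the project's actual schema and file layout.

```sql
-- Illustrative only: real table/column names and CSV paths come from the generated files.
\copy gloss (entry_id, lang, txt) FROM 'data/generated/csv/gloss.csv' WITH (FORMAT csv)
```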
First, download the dictionaries:
bash scripts/run.bash expression --download
bash scripts/run.bash kanji --download
Then generate SQL for the English senses, populate the database, and compress it:
- For expression
bash scripts/run.bash expression --sql "eng"
bash scripts/sqlite.bash expression --clean --init --populate --compress "zip" --compress "xz"

- For kanji
bash scripts/run.bash kanji --clean --init --sql "en"
bash scripts/sqlite.bash kanji --populate --compress "zip" --compress "xz"

- Create a container
docker run -v "$(pwd):/workspace" \
--name postgres-container \
-e POSTGRES_PASSWORD=mysecretpassword \
-e POSTGRES_DB=edict \
-p 5432:5432 \
-d postgres
- At next boot
docker start postgres-container
- Data are imported from CSV files; generate them with
bash scripts/run.bash expression --csv
bash scripts/run.bash kanji --csv
- Import expression and kanji
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash expression --init --populate
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash kanji --init --populate
- Wipe schema
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash expression --clean
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash kanji --clean
- Interactive session
docker exec -it -w /workspace postgres-container psql -U postgres -d edict
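Once connected, a few standard psql commands can confirm that the import worked; this is a minimal sketch using only built-in catalogs, since the actual table names depend on the generated schema.

```sql
-- Inside the psql session opened above.
-- List the imported tables:
\dt
-- Count the tables in the public schema:
SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';
```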
For more information on the database structure and SQL recipes, see the Wiki at https://github.com/odrevet/edict_database/wiki
The edict_database project is not affiliated with the edict project.
The source code in the src folder is licensed under the MIT license.
The generated SQL and db files are licensed under the EDRDG license, the same as the edict dictionary.