Goal

Generate SQLite and PostgreSQL relational databases from Japanese dictionary data sources:

JMdict: Japanese-multilingual dictionary data (expression/vocabulary entries)
KANJIDIC2: Comprehensive kanji character information
RADKFILE: Kanji radical decomposition data

The tool generates SQL for SQLite and CSV exports for PostgreSQL.

setup

Dart SDK

SQL is generated by scripts written in Dart, so the Dart SDK must be installed.

Dart packages

Download required packages with

dart pub get

Generate sql for selected languages

SQL and CSV files are generated by the Dart scripts in the src directory.

to SQL scripts

Using src/to_sql_expression.dart and src/to_sql_kanji.dart

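Options:

--langs: comma-separated list of languages to process (glosses for expressions, meanings for kanji). Language codes are three-letter ISO 639-2 codes for expressions (e.g. eng, fre) and two-letter ISO 639-1 codes for kanji (e.g. en).
--max-inserts: number of VALUES rows per INSERT statement in the generated SQL; 0 puts all rows in a single INSERT.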
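
To make the effect of --max-inserts concrete, here is a minimal sketch of the batching idea. It is not the project's Dart implementation: the batch_inserts helper and the demo table name are hypothetical and only illustrate how value tuples might be grouped into INSERT statements.

```shell
# Hypothetical helper, for illustration only (not part of this repository).
# Reads one value tuple per line on stdin and groups them into INSERTs.
# $1 = table name, $2 = max tuples per INSERT (0 = all in one statement).
batch_inserts() {
  awk -v table="$1" -v max="$2" '
    {
      # Start a new statement, or continue the current one with a comma.
      if (count == 0) printf "INSERT INTO %s VALUES\n", table
      else printf ",\n"
      printf "  %s", $0
      count++
      # Close the statement once the batch is full (never when max is 0).
      if (max > 0 && count == max) { printf ";\n"; count = 0 }
    }
    END { if (count > 0) printf ";\n" }
  '
}

# Three tuples with at most two per INSERT yields two statements.
printf '%s\n' "(1, 'a')" "(2, 'b')" "(3, 'c')" | batch_inserts demo 2
```

With a limit of 1, every row gets its own INSERT statement; with 0, all rows end up in a single statement.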

English and French

dart src/to_sql_expression.dart --langs "eng,fre" --max-inserts 1

English only

dart src/to_sql_expression.dart --langs "eng"

dart src/to_sql_kanji.dart en

to CSV scripts

Using src/to_csv_expression.dart and src/to_csv_kanji.dart

generated files

Files are generated under the data/generated directory.

Databases can be opened and tested with the sqlite3 binary:

sqlite3 data/generated/db/expression.db

sqlite3 data/generated/db/kanji.db
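
As a quick sanity check after generation, the sketch below lists each table of a database with its row count. The summarize_db function is a hypothetical helper, not part of the repository; it relies only on the standard sqlite_master catalog and makes no assumption about the table names.

```shell
# Hypothetical helper, for illustration only (not part of this repository).
# Prints "<table><TAB><row count>" for every table in the given database.
summarize_db() {
  db="$1"
  # Refuse to run on a missing file, since sqlite3 would create an empty one.
  [ -f "$db" ] || { echo "missing: $db" >&2; return 1; }
  for t in $(sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"); do
    printf '%s\t%s\n' "$t" "$(sqlite3 "$db" "SELECT count(*) FROM \"$t\";")"
  done
}

# Usage, once the databases have been generated:
#   summarize_db data/generated/db/expression.db
#   summarize_db data/generated/db/kanji.db
```

An empty or unexpectedly small table in the output usually means the populate step did not run for that table.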

Helper scripts

Instead of calling dart directly, you can use the helper bash scripts under the scripts directory. They download the source files, then initialize and populate the databases.

For all the scripts, the first argument must be expression or kanji.

scripts/run.sh

Downloads the dictionaries with wget, uncompresses them with gunzip, and generates the SQL or CSV files.
--download: Download JMdict and KANJIDIC2/RADKFILE source files
--clean: Clear generated SQL or CSV files
--sql: Generate SQL insert statements
--csv: Generate CSV files

scripts/sqlite.sh

SQLite database files are created and populated using the sqlite3 binary, which can be installed on Debian-based systems with:

sudo apt install sqlite3

--init: Create SQLite database file with tables and indexes
--populate: Insert data using previously generated SQL
--compress: Compress the SQLite database
--clean: Remove the database

scripts/postgres.sh

The PostgreSQL database is populated using psql.

--init: Create PostgreSQL database with tables and indexes
--populate: Import data using previously generated CSV files via COPY
--clean: Remove the database
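
For readers unfamiliar with CSV import via COPY, the sketch below prints one psql \copy meta-command per CSV file. It is not what scripts/postgres.sh actually runs: the copy_commands helper is hypothetical, and it assumes each file is named after its target table and has no header row, which may not match the generated files.

```shell
# Hypothetical helper, for illustration only (not part of this repository).
# ASSUMPTION: each CSV is named <table>.csv and contains no header row.
copy_commands() {
  dir="$1"
  for f in "$dir"/*.csv; do
    # Skip the unexpanded pattern when the directory has no CSV files.
    [ -e "$f" ] || continue
    table=$(basename "$f" .csv)
    printf '%s\n' "\\copy $table FROM '$f' WITH (FORMAT csv)"
  done
}

# Usage (the directory path is an assumption): review the commands first,
# then pipe them to psql.
#   copy_commands path/to/csv
#   copy_commands path/to/csv | psql -U postgres -d edict
```

Printing the commands before executing them makes it easy to verify the table-to-file mapping against the real schema.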

Helper scripts examples

First download the dictionaries

bash scripts/run.bash expression --download
bash scripts/run.bash kanji --download

sqlite

Generate SQL for English senses, then populate and compress the database:

For expression

bash scripts/run.bash expression --sql "eng"
bash scripts/sqlite.bash expression --clean --init --populate --compress "zip" --compress "xz"

For kanji

bash scripts/run.bash kanji --sql "en"
bash scripts/sqlite.bash kanji --clean --init --populate --compress "zip" --compress "xz"

postgres using docker

Create a container

docker run -v "$(pwd):/workspace" \
  --name postgres-container \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=edict \
  -p 5432:5432 \
  -d postgres

After a reboot, restart the existing container

docker start postgres-container

Data is imported from CSV files, so generate them first

bash scripts/run.bash expression --csv
bash scripts/run.bash kanji --csv

import expression and kanji

docker exec -i -w /workspace postgres-container bash scripts/postgres.bash expression --init --populate
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash kanji --init --populate

Wipe schema

docker exec -i -w /workspace postgres-container bash scripts/postgres.bash expression --clean
docker exec -i -w /workspace postgres-container bash scripts/postgres.bash kanji --clean

Interactive session

docker exec -it -w /workspace postgres-container psql -U postgres -d edict

Documentation

For more information on the database structure and SQL recipes, see the Wiki at https://github.com/odrevet/edict_database/wiki

Licensing

The edict_database project is not affiliated with the edict project.

The source code in the src folder is licensed under the MIT license.

The generated SQL and database files are licensed under the EDRDG license, the same as the edict dictionary.
