A tool for preprocessing of text data in Japanese for further machine learning. It uses MeCab for tokenization and part-of-speech tagging and Cabocha for shallow and deep parsing.

ptaszynski/cabocha-extractor

cabocha-extractor

[DESCRIPTION]

Preprocesses Japanese text data for downstream machine learning. Tokenization and part-of-speech tagging are done with MeCab; shallow and deep parsing are done with CaboCha.

usage:

bash main.sh input_file.txt

[DEPENDENCIES]

MeCab, MeCab Perl binding, Cabocha, mecab-ipadic-neologd

The tool assumes that the mecab-ipadic-neologd dictionary is installed at /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd
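
Because a missing dependency may only surface partway through a run, it can help to check for the tools listed above before calling main.sh. The following is a minimal sketch and is not part of the repository: the function names `have_cmd` and `check_deps` and the `NEOLOGD_DIR` override variable are hypothetical, and the check assumes the MeCab Perl binding is loadable as the Perl module `MeCab`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of cabocha-extractor itself.

# Dictionary location stated in the README. NEOLOGD_DIR is an assumed
# override variable introduced only for this sketch.
NEOLOGD_DIR="${NEOLOGD_DIR:-/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd}"

# Succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one line per missing dependency.
# Returns 0 if everything was found, 1 otherwise.
check_deps() {
    local missing=0
    local cmd
    for cmd in mecab cabocha perl; do
        if ! have_cmd "$cmd"; then
            echo "missing command: $cmd"
            missing=1
        fi
    done
    # MeCab Perl binding: try to load the module without running anything.
    if have_cmd perl && ! perl -MMeCab -e 1 >/dev/null 2>&1; then
        echo "missing Perl module: MeCab"
        missing=1
    fi
    # neologd dictionary directory.
    if [ ! -d "$NEOLOGD_DIR" ]; then
        echo "missing dictionary directory: $NEOLOGD_DIR"
        missing=1
    fi
    return "$missing"
}
```

With these functions sourced, `check_deps && bash main.sh input_file.txt` runs the tool only when every check passes. Note that this sketch cannot change where main.sh itself looks for the dictionary; it only reports whether the expected directory exists.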
