cabocha-extractor

[DESCRIPTION]

A tool for preprocessing Japanese text data for downstream machine learning. It uses MeCab for tokenization and part-of-speech tagging, and CaboCha for shallow and deep parsing (bunsetsu chunking and dependency analysis).
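The processing described above can be sketched as a minimal shell function that runs the standard `mecab` and `cabocha` command-line tools over one file. This is an illustration, not the repository's `main.sh` (whose contents are not reproduced here): the function name `parse_file` and the `.mecab`/`.cabocha` output suffixes are hypothetical, and the sketch uses each tool's default dictionary rather than neologd.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's main.sh.
# Runs MeCab (tokenization + POS tags) and CaboCha (chunking + dependencies)
# over one input file and writes each result next to the input.
set -euo pipefail

parse_file() {
  local input="$1"
  local base="${input%.txt}"   # strip a trailing .txt, if present
  # MeCab: one token per line (surface form, tab, comma-separated features), then EOS.
  mecab < "$input" > "$base.mecab"
  # CaboCha: -f1 selects the lattice format, where lines starting with "*"
  # mark chunks and their dependency heads, followed by MeCab-style token lines.
  cabocha -f1 < "$input" > "$base.cabocha"
}
```

For example, `parse_file input_example.txt` would write `input_example.mecab` and `input_example.cabocha` alongside the input.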

[USAGE]

bash main.sh input_file.txt

[DEPENDENCIES]

MeCab, the MeCab Perl binding, CaboCha, mecab-ipadic-neologd

The tool assumes that mecab-ipadic-neologd is installed at /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd
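Because the tool assumes a fixed dictionary path, a quick pre-flight check can turn a confusing failure into a clear message. The helper below is a sketch and not part of the repository: the function `neologd_dir` and the override variable `NEOLOGD_DIR` are hypothetical. It checks for `sys.dic`, the compiled system dictionary file present in any built MeCab dictionary directory, and prints the directory so it can be passed to MeCab's `-d` (dictionary directory) option. If neologd was installed elsewhere, `mecab-config --dicdir` usually reports the base dictionary directory to look under.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Prints the mecab-ipadic-neologd directory the tool expects,
# unless NEOLOGD_DIR (a hypothetical override) points elsewhere.
DEFAULT_NEOLOGD=/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd

neologd_dir() {
  local dir="${NEOLOGD_DIR:-$DEFAULT_NEOLOGD}"
  # A compiled MeCab dictionary directory contains sys.dic.
  if [ -f "$dir/sys.dic" ]; then
    printf '%s\n' "$dir"
  else
    echo "mecab-ipadic-neologd not found at $dir" >&2
    return 1
  fi
}

# Usage: tokenize with neologd via MeCab's -d option.
# mecab -d "$(neologd_dir)" < input_file.txt
```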

[FILES]

LICENSE
cabocha_extractor_upgraded.pl
cleaner.pl
input_example.txt
main.sh
mecab_extractor_better.pl

Repository: ptaszynski/cabocha-extractor
