Mykytea with no word segmentation #14

@himkt

I want to use Mykytea in no-word-segmentation mode (-nows),
but this does not seem to be possible with the current implementation.

An example of -nows with the kytea command is shown below.

echo '私 は 猫 で す'| kytea -nows

私/代名詞/わたくし は/助詞/は 猫/名詞/ねこ で/助動詞/で す/語尾/す

The code I wrote to try -nows from Python is shown below.

import Mykytea


if __name__ == '__main__':
    kytea_tagger = Mykytea.Mykytea('-nows')
    print(kytea_tagger.getTagsToString('私 は 猫 で す'))

Running this program gives the following result.

python main.py

私/代名詞/わたくし  /補助記号/UNK は/助詞/は  /補助記号/UNK 猫/名詞/ねこ  /補助記号/UNK で/助動詞/で  /補助記号/UNK す/語尾/す

Problem

The analysis result contains unnecessary UNK tokens for the spaces.
This is the same result as analyzing the space-separated sentence without -nows.

echo '私 は 猫 で す'| kytea

私/代名詞/わたくし \ /補助記号/UNK は/助詞/は \ /補助記号/UNK 猫/名詞/ねこ \ /補助記号/UNK で/助動詞/で \ /補助記号/UNK す/語尾/す

So I think Mykytea() does not handle the -nows option correctly.
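
Until the option works, one possible stopgap is to strip the space tokens from the output string afterwards. The sketch below is only an assumption about a workaround, not part of Mykytea: `drop_space_tokens` is a hypothetical helper, and it works purely on the tagged string, so it needs no KyTea model. It does not reproduce -nows, because KyTea still decides the word boundaries itself; it only removes tokens whose surface is a space (optionally escaped as `\ `, as in the command-line output).

```python
import re

# Matches a token separator (or the start of the string) followed by a token
# whose surface is a single space, optionally backslash-escaped.
_SPACE_TOKEN = re.compile(r"(?:^| )\\? /\S+")


def drop_space_tokens(tagged: str) -> str:
    """Remove tokens whose surface form is a space from a tagged string.

    Hypothetical helper: post-processes output such as that returned by
    getTagsToString(); it does not change how KyTea segments the input.
    """
    return _SPACE_TOKEN.sub("", tagged).strip()


if __name__ == '__main__':
    # Output copied from the Mykytea run above.
    raw = ('私/代名詞/わたくし  /補助記号/UNK は/助詞/は  /補助記号/UNK '
           '猫/名詞/ねこ  /補助記号/UNK で/助動詞/で  /補助記号/UNK す/語尾/す')
    print(drop_space_tokens(raw))
```

For this sentence the filtered string matches the kytea -nows output shown earlier, but that is not guaranteed for inputs where KyTea's own segmentation differs from the given spaces.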
Regards,
