Tate-chu-yoko (縦中横) detection by mattn · Pull Request #30 · ndl-lab/ndlocr-lite

mattn · 2026-03-07T07:19:57Z

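This PR adds detection and recognition of horizontal runs embedded in vertical text (tate-chu-yoko, 縦中横). It handles cases such as newspapers, where digits and Latin letters are set horizontally inside vertical lines.

The feature is opt-in and is enabled with the --enable-tcy flag, so existing behavior is unaffected. When the flag is not given, the module is not even loaded.

When the flag is given, tcy_wrapper.py is loaded and wraps PARSEQ. The --tcy-* parameters, which adjust detection thresholds and related settings, are available only when --enable-tcy is specified. Using this feature requires cv2.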
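The opt-in behavior described above (no import unless the flag is given) could be wired up roughly as follows. This is only a sketch, not the code in this PR: `build_parser`, `maybe_wrap_recognizer`, the flag spelling `--tcy-max-aspect-ratio`, and the `TateChuYokoWrapper` constructor arguments are all assumptions made for illustration. Only `--enable-tcy`, `tcy_wrapper.py`, `TateChuYokoWrapper`, and the value `max_aspect_ratio=0.75` come from the description.

```python
import argparse
import importlib


def build_parser():
    """Build a CLI parser with the opt-in TCY flag."""
    parser = argparse.ArgumentParser(description="TCY opt-in sketch")
    # --enable-tcy is the flag named in the PR description.
    parser.add_argument("--enable-tcy", action="store_true",
                        help="detect horizontal text inside vertical lines")
    # Hypothetical spelling of one --tcy-* tuning flag; the real names may differ.
    parser.add_argument("--tcy-max-aspect-ratio", type=float, default=0.75)
    return parser


def maybe_wrap_recognizer(recognizer, args, module_name="tcy_wrapper"):
    """Return the recognizer unchanged unless --enable-tcy was given.

    The wrapper module is imported only inside the enabled branch, so a
    default run never loads it (and never needs cv2).
    """
    if not args.enable_tcy:
        return recognizer
    module = importlib.import_module(module_name)
    # Assumed constructor signature, for illustration only.
    return module.TateChuYokoWrapper(
        recognizer, max_aspect_ratio=args.tcy_max_aspect_ratio
    )
```

Keeping the import inside the enabled branch is what makes the extra dependency optional: users who never pass the flag do not pay for it.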

Add --enable-tcy flag to detect and correctly OCR horizontal text embedded in vertical lines (tate-chuu-yoko), commonly found in newspaper headlines (e.g. "4317" in "全国4317人不足").

- TateChuYokoWrapper segments vertical line images into blocks and identifies TCY regions by horizontal component analysis
- Filter noise blocks (height < seg_min_gap) from TCY detection
- Use max_aspect_ratio=0.75 to avoid false positives on regular kanji
- All TCY parameters are tunable via --tcy-* flags
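To make the detection steps in the commit message concrete, here is a minimal sketch of one way they could fit together. It is an illustration under assumptions, not the TateChuYokoWrapper implementation: it uses plain Python lists instead of cv2, treats the aspect ratio as block height divided by ink width, treats "horizontal component analysis" as counting ink runs along the columns of a block, and uses `seg_min_gap=2` as an arbitrary default. The helper names `ink_runs`, `find_tcy_blocks`, and `demo_image` are hypothetical.

```python
def ink_runs(flags, min_gap=1):
    """Return [start, end) runs of truthy values.

    Runs separated by fewer than min_gap falsy values are merged.
    """
    runs, start, gap = [], None, 0
    for i, on in enumerate(flags):
        if on:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(flags) - gap))
    return runs


def find_tcy_blocks(img, seg_min_gap=2, max_aspect_ratio=0.75):
    """Return (top, bottom) row ranges of TCY candidates in a binary image.

    img is a list of rows of 0/1 values for one vertical text line.
    """
    candidates = []
    row_has_ink = [any(row) for row in img]
    # Segment the line into blocks separated by blank row runs.
    for top, bottom in ink_runs(row_has_ink, seg_min_gap):
        height = bottom - top
        if height < seg_min_gap:
            continue  # noise block, excluded from TCY detection
        # Count horizontally separated ink components inside the block.
        col_has_ink = [any(img[r][c] for r in range(top, bottom))
                       for c in range(len(img[0]))]
        comps = ink_runs(col_has_ink)
        width = comps[-1][1] - comps[0][0]
        # Assumed rule: several side-by-side components in a block that is
        # clearly wider than tall. Square kanji (ratio near 1.0) are rejected.
        if len(comps) >= 2 and height / width <= max_aspect_ratio:
            candidates.append((top, bottom))
    return candidates


def demo_image():
    """Synthetic 34x12 line: three-stroke kanji, TCY digits, speck, kanji."""
    img = [[0] * 12 for _ in range(34)]

    def paint(top, bottom, left, right):
        for r in range(top, bottom):
            for c in range(left, right):
                img[r][c] = 1

    for left in (1, 5, 9):      # three vertical strokes, 10 rows tall
        paint(0, 10, left, left + 2)
    paint(13, 18, 0, 5)         # two short, wide glyphs side by side
    paint(13, 18, 7, 12)
    paint(21, 22, 3, 4)         # one-pixel speck
    paint(24, 34, 1, 11)        # solid square glyph
    return img


print(find_tcy_blocks(demo_image()))  # [(13, 18)]
```

The first block in the demo has three horizontally separated strokes but is as tall as it is wide, so the aspect-ratio check rejects it. That is the kind of false positive on regular kanji that the 0.75 threshold is described as avoiding.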

mattn wants to merge 1 commit into ndl-lab:master from mattn:feature/tcy-addon

mattn commented Mar 7, 2026 • edited by ndl-lab-staff


1 participant
