
Fix: read UTF-8 files, correct paragraph parsing, generate concated.json #408

Open
farhana-lgtm wants to merge 1 commit into agwaBom:main from farhana-lgtm:fix/main-read-encoding


@farhana-lgtm

1. Replace main.py so that it reads input files as UTF-8.
2. Join broken lines into paragraphs, using blank lines as paragraph separators.
3. Write one JSON object per line, serialized with ensure_ascii=False.

Why:

  • The original code did not read files as UTF-8 and did not merge lines into paragraphs.
    This change fixes the resulting encoding and paragraph pairing issues.
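
For reviewers, the three steps above can be sketched roughly as follows. This is a minimal illustration and not the actual main.py from this PR: the function names (`split_paragraphs`, `to_json_lines`, `convert`), the `"text"` key, and joining lines with a single space are all assumptions made for the example.

```python
import json
from pathlib import Path


def split_paragraphs(text):
    """Join consecutive non-blank lines into paragraphs; blank lines separate them."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Flush the last paragraph when the text does not end with a blank line.
        paragraphs.append(" ".join(current))
    return paragraphs


def to_json_lines(paragraphs):
    """Serialize one JSON object per line, keeping non-ASCII characters readable."""
    # The "text" key is hypothetical; the real output schema may differ.
    return "\n".join(
        json.dumps({"text": p}, ensure_ascii=False) for p in paragraphs
    )


def convert(src_path, dst_path):
    """Read src_path as UTF-8 and write one JSON object per paragraph to dst_path."""
    text = Path(src_path).read_text(encoding="utf-8")
    Path(dst_path).write_text(
        to_json_lines(split_paragraphs(text)) + "\n", encoding="utf-8"
    )
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default encoding, and `ensure_ascii=False` keeps non-ASCII text as-is in the output instead of `\uXXXX` escapes, so the output file must also be written as UTF-8.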
