Thanks for pointing out this opportunity to improve the functionality of the pythonbible library. I honestly had not considered this use case (most of my personal use cases involve searching through published texts, sometimes hundreds of years old), but this is certainly a valid use case, and there would be genuine value in being able to clean up that sort of data. This enhancement will take some serious thought and probably discussion, so I am converting this issue into a discussion where I, and whoever else is interested in helping out, can post potential solution ideas. The solution may need to be implemented in phases as well. Thanks again!
I added unit tests to represent each of the examples listed above. I was pleasantly surprised to discover that two of the examples already work as desired. For the rest, hopefully we can make incremental progress.
Machine-generated ASR (automatic speech recognition) programs like OpenAI's Whisper are on the rise and tend to output messily formatted scripture references: book, chapter, and verse numbers appear inconsistently as digits, ordinals, or words; spans are written in many different ways; capitalization varies; and so on.
Here are a handful of example lines from WebVTT/SRT output from a batch I ran recently:
Second Timothy chapter two verses three and four says endure hardship
If you read Ephesians four 17 through 32 all the ammunition
remember that powerful message of Paul in first Corinthians nine
in Jesus's first sermonic presentation on planet earth in Matthew five through seven,
Jesus said over in Matthew chapter six, verse number 12,
Genesis four, 25.
and forth between Haggai two and Ezra three.
and go and report to John one-fifteen and thirty.
I want to focus on here is Colossians chapter three, 22 through verses through chapter four, verse one.
In 1 Corinthians 9.22, you see Paul saying
says in Mark 16 10 that the disciples were
through that fire, 1 Kings 18.24-38, 1 Chronicles 21.26, 2 Chronicles 7.1-3.
open their Bibles to first Corinthians 14, 34, 35 and say, look
Genesis 1, 26, 2, 7, and 21, 22.
look in Revelations 21, 1 through 7, you can start reading all about
Psalms 103.12 says
for one another Galatians 6 1 & 2 clearly gives us

Nearly anyone using these tools seriously will need a post-processing step to clean up this sort of data. While feeding the output into an LLM or NLP toolkit may make sense, it would be swell if a library like this one could do some of the heavy lifting of normalizing scripture references in a string. Tall order/deep rabbit hole, I understand, but worth a shot.
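One possible first step is to rewrite spoken numbers as digits before any reference matching happens. This is a minimal sketch under stated assumptions, not pythonbible's API: `normalize_number_words`, `NUMBER_RE`, and `ORDINAL_RE` are hypothetical names, the grammar only covers cardinals up to the hundreds, and "first/second/third" is converted only when it directly precedes a numbered book name (Samuel, Kings, Chronicles, Corinthians, Thessalonians, Timothy, Peter, John), so phrases like "Jesus's first sermonic presentation" are left alone.

```python
import re

# Hypothetical helper, not part of pythonbible: rewrite spoken numbers
# ("two", "twenty-five", "one hundred three") as digits.
_UNITS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
_TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
          "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

_VALUES = {word: i + 1 for i, word in enumerate(_UNITS)}
_VALUES.update({word: i + 10 for i, word in enumerate(_TEENS)})
_VALUES.update({word: (i + 2) * 10 for i, word in enumerate(_TENS)})

_U = "(?:" + "|".join(_UNITS) + ")"
_TEEN = "(?:" + "|".join(_TEENS) + ")"
_T = "(?:" + "|".join(_TENS) + ")"
_SEP = r"[\s-]+"

# One spoken number: "<unit> hundred [rest]", "<tens> [unit]", a teen, or a unit.
NUMBER_RE = re.compile(
    rf"\b(?:{_U}{_SEP}hundred(?:{_SEP}(?:{_T}(?:{_SEP}{_U})?|{_TEEN}|{_U}))?"
    rf"|{_T}(?:{_SEP}{_U})?|{_TEEN}|{_U})\b",
    re.IGNORECASE,
)

# "First/Second/Third" only when directly followed by a numbered book name.
_ORDINALS = {"first": "1", "second": "2", "third": "3"}
ORDINAL_RE = re.compile(
    r"\b(first|second|third)\s+"
    r"(?=(?:Samuel|Kings|Chronicles|Corinthians|Thessalonians|Timothy|Peter|John)\b)",
    re.IGNORECASE,
)


def _to_int(phrase):
    total = 0
    for word in re.split(r"[\s-]+", phrase.lower()):
        # "hundred" multiplies the single unit that precedes it.
        total = total * 100 if word == "hundred" else total + _VALUES[word]
    return total


def normalize_number_words(text):
    text = ORDINAL_RE.sub(lambda m: _ORDINALS[m.group(1).lower()] + " ", text)
    return NUMBER_RE.sub(lambda m: str(_to_int(m.group(0))), text)


print(normalize_number_words("Second Timothy chapter two verses three and four says endure hardship"))
# 2 Timothy chapter 2 verses 3 and 4 says endure hardship
print(normalize_number_words("and go and report to John one-fifteen and thirty."))
# and go and report to John 1-15 and 30.
print(normalize_number_words("for one another Galatians 6 1 & 2"))
# for 1 another Galatians 6 1 & 2
```

The last two calls show why context matters: the pronoun "one" is converted too, and "one-fifteen" becomes "1-15", which still reads as a verse range rather than chapter 1, verse 15. Only converting numbers that follow a book name, "chapter", or "verse" would be one way to tighten this.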
I suggest a `reformat_fuzzy_references` function that attempts to return a reformatted `input_string`, rewriting even a subset of the most common speech patterns into a normalized form. Bonus points if the user gets some configuration control over output style, e.g. omitting "chapter" or using "v."/"vv."

Assumed gotchas, where numbers could be mistaken for references:
I was just in class at 8.30 with my friend Wilson
We're going to talk at 3.30 this afternoon about the discipline of grace and there is
So in Acts chapter 2, 3,000 were saved.
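The `reformat_fuzzy_references` function suggested above does not exist in pythonbible; what follows is a hypothetical sketch of one narrow slice of it, assuming numbers are already digits. It rewrites "Book C.V", "Book C, V" and "Book C V" (with an optional "chapter", and "-", "through" or "&" for a verse span) into a normalized form. The `verse_style` parameter is an assumed option covering the "v."/"vv." request. The gotchas are avoided in two ways: a match must start at a known book name, so "at 8.30" is never touched, and a negative lookahead refuses "3,000"-style thousands grouping. `BOOKS` is a tiny illustrative subset; a real version would presumably reuse whatever book-name matching the library already does.

```python
import re

# Hypothetical, deliberately tiny list of book names for illustration only.
# A leading "1 "/"2 " (as in "1 Kings") is not matched and is left in place.
BOOKS = ("Acts", "Psalms", "Galatians", "Kings", "Chronicles", "Corinthians")

REFERENCE_RE = re.compile(
    r"\b(?P<book>" + "|".join(BOOKS) + r")\s+(?:chapter\s+)?"
    r"(?P<chapter>\d{1,3})(?:\s*[.:]\s*|,\s*|\s+)(?P<start>\d{1,3})"
    r"(?:\s*(?:-|through|&)\s*(?P<end>\d{1,3}))?"
    # Refuse to stop inside a longer number or before "3,000"-style grouping.
    r"(?!\d|,\d{3})",
    re.IGNORECASE,
)


def reformat_fuzzy_references(input_string, verse_style="colon"):
    """Hypothetical sketch of the function proposed in this thread.

    verse_style="colon" -> "Psalms 103:12", "Galatians 6:1-2"
    verse_style="v"     -> "Psalms 103 v. 12", "Galatians 6 vv. 1-2"
    """

    def replace(match):
        book = match["book"].title()
        chapter, start, end = match["chapter"], match["start"], match["end"]
        verses = f"{start}-{end}" if end else start
        if verse_style == "v":
            return f"{book} {chapter} {'vv.' if end else 'v.'} {verses}"
        return f"{book} {chapter}:{verses}"

    return REFERENCE_RE.sub(replace, input_string)


print(reformat_fuzzy_references("through that fire, 1 Kings 18.24-38, 1 Chronicles 21.26"))
# through that fire, 1 Kings 18:24-38, 1 Chronicles 21:26
print(reformat_fuzzy_references("Galatians 6 1 & 2", verse_style="v"))
# Galatians 6 vv. 1-2
print(reformat_fuzzy_references("So in Acts chapter 2, 3,000 were saved."))
# So in Acts chapter 2, 3,000 were saved.
```

Treating "&" as a span is only right for adjacent verses such as "1 & 2", and chapter-level spans like "Matthew five through seven" are not handled at all, so this would be one of the incremental phases rather than a complete solution.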