
Conversation

@samueleternity
Collaborator

No description provided.

Implemented a text normalization algorithm with the TextNormalization, WhitespaceNormalization, RepetitionNormalization, and InvisibleCharacterNormalization functions. Implemented leetspeak normalization with the LeetspeakNormalization and IsPrimarlyCyrillic functions. Implemented a word checker that tokenizes the input, builds a hash trie, and checks tokens against it with the Wordchecking and Tokenize functions. These are NOT yet adapted to the moderation-service environment; they still need adjustment and integration with proto and the database.
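For readers without the diff open, the pipeline described above might look roughly like the sketch below. It is written in Python purely for illustration, since the PR's language and code are not shown here. Every name, the leetspeak table, the step order, and the rule of skipping the Latin leetspeak table for mostly-Cyrillic text are assumptions, not the PR's actual implementation.

```python
import re
import unicodedata

# Hypothetical leetspeak table; the PR's real constants are not shown.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})


def strip_invisible(text):
    """Drop Unicode format characters (category Cf), e.g. zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def collapse_repetitions(text, max_run=2):
    """Cap runs of one repeated character at max_run ('heeeello' -> 'heello')."""
    return re.sub(r"(.)\1{%d,}" % max_run, lambda m: m.group(1) * max_run, text)


def normalize_whitespace(text):
    """Collapse each whitespace run to a single space and trim both ends."""
    return " ".join(text.split())


def is_primarily_cyrillic(text):
    """True when more than half of the letters are in the basic Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    cyrillic = sum("\u0400" <= ch <= "\u04ff" for ch in letters)
    return bool(letters) and cyrillic * 2 > len(letters)


def normalize_text(text):
    """Run the steps in one assumed order; the PR's order may differ."""
    text = strip_invisible(text).lower()
    if not is_primarily_cyrillic(text):  # assumption: Latin table only
        text = text.translate(LEET_MAP)
    return normalize_whitespace(collapse_repetitions(text))


def build_trie(words):
    """Build a trie of nested dicts (hash maps); '$end' marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True
    return root


def trie_contains(trie, word):
    """Walk the trie one character at a time; succeed only on a marked end."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return "$end" in node


def find_flagged(text, trie):
    """Normalize, split into word tokens, and return tokens found in the trie."""
    tokens = re.findall(r"\w+", normalize_text(text))
    return [tok for tok in tokens if trie_contains(trie, tok)]
```

With a one-word list, `find_flagged("that is sp4m, really", build_trie(["spam"]))` returns `["spam"]`: the leetspeak step rewrites `sp4m` before the trie lookup runs.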
@samueleternity samueleternity changed the title from "minor changes" to "algorithm-implementation" Oct 25, 2025
Added a unified TextProcessor namespace that groups all the new functions for easier use. Integrated the text processing into the service; the integration was successful.
@samueleternity samueleternity self-assigned this Oct 25, 2025
@samueleternity samueleternity linked an issue Oct 25, 2025 that may be closed by this pull request
Added a model folder with the constants used by the algorithm; cleaned up functions.
@kataevandrey kataevandrey force-pushed the algorithm-implementation branch from 36082c0 to 25cc18c Compare October 30, 2025 09:04
@samueleternity samueleternity merged commit 056b0c9 into main Nov 6, 2025
@samueleternity samueleternity deleted the algorithm-implementation branch November 6, 2025 11:43

Development

Successfully merging this pull request may close these issues.

Text processing algorithm

3 participants