Skip to content

realsanjeev/nepali_unicoder

Repository files navigation

Nepali Unicoder

Documentation PyPI version

A robust Python package for converting Romanized Nepali text and Preeti font text into Unicode Devanagari script. It uses a greedy matching algorithm for Roman transliteration and a two-phase conversion process for Preeti with contextual rules.

Read the Full Documentation for detailed usage guides, Preeti mapping references, and API details.

Features

  • Accurate Transliteration: Uses a greedy matching algorithm to prioritize longer phonetic matches (e.g., 'kha' is matched before 'k' and 'h').
  • Preeti Font Support: Full support for Preeti to Unicode conversion with 30+ contextual rules for accurate transformation.
  • Smart Vowel Handling: Distinguishes between independent vowels (e.g., 'aa' -> 'आ') and vowel signs/matras (e.g., 'ka' -> 'क', 'kaa' -> 'का').
  • Contextual Rules: Handles complex Devanagari rules like reph positioning, matra reordering, and special character combinations.
  • Mixed Content Support: Allows keeping English words or specific text in Roman script using {} blocks.
  • Customizable: Supports custom word-level overrides via word_maps.json.
  • CLI Support: Can be used directly from the command line.

Installation

You can install the package locally:

pip install nepali-unicoder

Usage

Command Line Interface (CLI)

You can use the converter directly from the terminal:

# Using the installed command
nepali-unicoder "namaste"
# Output: नमस्ते

# Or using python module
python -m nepali_unicoder "namaste"
# Output: नमस्ते

# Pipe input
echo "mero naam sanjeev ho" | nepali-unicoder
# Output: मेरो नाम सन्जीव् हो

Python API

from nepali_unicoder.convert import Converter

converter = Converter()

# Basic conversion
text = "namaste nepal"
print(converter.convert(text))
# Output: नमस्ते नेपाल

# Using 'as-is' blocks for English text
mixed_text = "mero naam {Sanjeev} ho"
print(converter.convert(mixed_text))
# Output: मेरो नाम Sanjeev हो

Preeti Mode

Convert Preeti font text to Unicode with full support for contextual rules:

from nepali_unicoder.convert import Converter

# Create converter in Preeti mode
preeti_converter = Converter(mode="preeti")

# Basic conversion
preeti_text = "s{sf"  # Preeti characters
print(preeti_converter.convert(preeti_text))
# Output: र्कर्का

# The converter handles:
# - Reph positioning: { → र् (moves before consonant)
# - Matra reordering: l (ि) moves after consonant
# - Special m transformations
# - Vowel combinations
# - Literal brackets: { and } are treated as normal characters in Preeti mode

Preeti Character Examples

Preeti Unicode Description
s Consonant ka
s{ र्क Reph + ka (contextual)
sl कि ka + short i (reordered)
qm क्र Special m transformation
!@# १२३ Nepali numbers
Ù / Ú ; / : Literal punctuation
« / » ्र Ra-foot (for ट, ठ, ड, ढ)
¿ रू Combined ruu
å द्व Combined dva
ˆ फ् Half ph
ª Consonant nga
æ / Æ / Curly quotes
¥ र् Half ra
ठ्ठ Combined thth
§ ट्ट Combined tt
£ घ् Half gh
Ë / Í ङ्ग / ङ्क Combined nga-ga / nga-ka
झ् Half jh

CLI for Preeti

python -m nepali_unicoder --preeti "s{sf"
# Output: र्कर्का

Transliteration Rules

  • Consonants: k -> क्, ka -> , kh -> ख्, kha ->
  • Vowels: a -> , aa -> , i -> , u ->
  • Matras: ki -> कि, ko -> को
  • Special: . -> , .. ->
  • Numbers: 0-9 -> ०-९ (Decimal points are preserved: 1.5 -> १.५)

Advanced Usage

Handling Complex Text

The converter handles mixed content gracefully. You can use {} to keep text as-is (e.g., for English words or code snippets).

text = "mero naam {Sanjeev} ho ra ma 12.5 barsa ko bhaye."
print(converter.convert(text))
# Output: मेरो नाम Sanjeev हो र म १२.५ बर्स को भए।

Configuration

The package uses word_maps.json for custom word-level overrides, located in the src/nepali_unicoder directory.

  1. word_maps.json: Defines custom word-level overrides. Use this for words that don't follow standard phonetic rules.

Example word_maps.json:

{
    "nepal": "नेपाल",
    "kathamandu": "काठमाडौँ"
}

Contribution

We welcome contributions! Here's how you can help:

  1. Clone the repository:

    git clone https://github.com/realsanjeev/nepali_unicoder.git
    cd nepali_unicoder
  2. Set up a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -e .
  3. Run tests:

    python -m unittest discover tests
  4. Submit a Pull Request: Create a new branch, make your changes, and submit a PR.

Development

To run tests:

python -m unittest discover tests

License

MIT

About

A library for converting Nepali Preeti to Unicode and translating Roman text into Nepali.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages