lupodevelop/str

str

Unicode-aware string utilities for Gleam

Production-ready Gleam library providing Unicode-aware string operations with a focus on grapheme-cluster correctness, pragmatic ASCII transliteration, and URL-friendly slug generation.


✨ Features

| Category | Highlights |
| --- | --- |
| 🎯 Grapheme-Aware | All operations correctly handle Unicode grapheme clusters (emoji, ZWJ sequences, combining marks) |
| 🔤 Case Conversions | snake_case, camelCase, kebab-case, PascalCase, Title Case, capitalize |
| 🔗 Slug Generation | Configurable slugify with token limits, custom separators, and Unicode preservation |
| 🔍 Search & Replace | index_of, last_index_of, replace_first, replace_last, contains_any/all |
| ✅ Validation | is_uppercase, is_lowercase, is_title_case, is_ascii, is_hex, is_numeric, is_alpha |
| 🛡️ Escaping | escape_html, unescape_html, escape_regex |
| 📏 Similarity | Levenshtein distance, percentage similarity, hamming_distance |
| 🧩 Splitting | splitn, partition, rpartition, chunk, lines, words |
| 📐 Padding | pad_left, pad_right, center, fill |
| 🚀 Minimal Dependencies | Pure Gleam implementation with no OTP requirement |

📦 Installation

gleam add str

🚀 Quick Start

import str

pub fn main() {
  // 🎯 Grapheme-safe truncation preserves emoji
  let text = "Hello 👩‍👩‍👧‍👦 World"
  str.truncate(text, 10, "...")
  // → "Hello 👩‍👩‍👧‍👦..."

  // 🔗 ASCII transliteration and slugification
  str.slugify("Crème Brûlée — Recipe 2025!")
  // → "creme-brulee-recipe-2025"

  // 🔤 Case conversions
  str.to_camel_case("hello world")   // → "helloWorld"
  str.to_snake_case("Hello World")   // → "hello_world"
  str.capitalize("hELLO wORLD")      // → "Hello world"

  // 🔍 Grapheme-aware search
  str.index_of("👨‍👩‍👧‍👦 family test", "family")
  // → Ok(2): the index counts grapheme clusters, not bytes

  // 📏 String similarity
  str.similarity("hello", "hallo")
  // → 0.8 (80% similar)

  // 🛡️ HTML escaping
  str.escape_html("<script>alert('xss')</script>")
  // → "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"
}

📚 API Reference

🔤 Case & Capitalization

| Function | Example | Result |
| --- | --- | --- |
| `capitalize(text)` | `"hELLO wORLD"` | `"Hello world"` |
| `swapcase(text)` | `"Hello World"` | `"hELLO wORLD"` |
| `is_uppercase(text)` | `"HELLO123"` | `True` |
| `is_lowercase(text)` | `"hello_world"` | `True` |
| `is_title_case(text)` | `"Hello World"` | `True` |

✂️ Grapheme Extraction

| Function | Example | Result |
| --- | --- | --- |
| `take(text, n)` | `take("👨‍👩‍👧‍👦abc", 2)` | `"👨‍👩‍👧‍👦a"` |
| `drop(text, n)` | `drop("hello", 2)` | `"llo"` |
| `take_right(text, n)` | `take_right("hello", 3)` | `"llo"` |
| `drop_right(text, n)` | `drop_right("hello", 2)` | `"hel"` |
| `at(text, index)` | `at("hello", 1)` | `Ok("e")` |
| `chunk(text, size)` | `chunk("abcdef", 2)` | `["ab", "cd", "ef"]` |
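
To illustrate what counting in grapheme clusters means for the extraction functions above, here is a minimal sketch built only on the Gleam standard library. The name take_sketch is hypothetical and this is not the library's implementation; it only shows the general idea of splitting into graphemes before slicing.

```gleam
import gleam/list
import gleam/string

/// Hypothetical helper (not part of str): keep the first `n` grapheme
/// clusters of `text`, so a multi-codepoint emoji is never cut in half.
pub fn take_sketch(text: String, n: Int) -> String {
  text
  |> string.to_graphemes
  |> list.take(n)
  |> string.concat
}
```

For plain ASCII input, take_sketch("hello", 2) yields "he". How emoji sequences are segmented depends on the grapheme support of the runtime.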

🔍 Search & Replace

| Function | Example | Result |
| --- | --- | --- |
| `index_of(text, needle)` | `"hello world", "world"` | `Ok(6)` |
| `last_index_of(text, needle)` | `"hello hello", "hello"` | `Ok(6)` |
| `contains_any(text, needles)` | `"hello", ["x", "e", "z"]` | `True` |
| `contains_all(text, needles)` | `"hello", ["h", "e"]` | `True` |
| `replace_first(text, old, new)` | `"aaa", "a", "b"` | `"baa"` |
| `replace_last(text, old, new)` | `"aaa", "a", "b"` | `"aab"` |

⚠️ Experimental: Search Strategies

Algorithms:

  • KMP: optimized for long/repetitive patterns
  • Sliding: fast for short patterns, zero allocations

APIs:

| Function | Description |
| --- | --- |
| `index_of_auto(text, pattern)` | Auto-select algorithm (heuristic) |
| `index_of_strategy(text, pattern, Kmp \| Sliding)` | Explicit algorithm choice |
| `count_auto(text, pattern, overlapping)` | Auto-select for counting |
| `count_strategy(text, pattern, overlapping, Kmp \| Sliding)` | Explicit count algorithm |

Examples:

// Force KMP explicitly
str.index_of_strategy("long text...", "pattern", str.Kmp)

// Let heuristic decide (experimental)
str.index_of_auto("some text", "pat")

Note: _auto variants use heuristics and may not always choose optimally. For performance-critical code, use _strategy variants. Configure thresholds in src/str/config.gleam.

🧩 Splitting & Partitioning

| Function | Example | Result |
| --- | --- | --- |
| `partition(text, sep)` | `"a-b-c", "-"` | `#("a", "-", "b-c")` |
| `rpartition(text, sep)` | `"a-b-c", "-"` | `#("a-b", "-", "c")` |
| `splitn(text, sep, n)` | `"a-b-c-d", "-", 2` | `["a", "b-c-d"]` |
| `words(text)` | `"hello world"` | `["hello", "world"]` |
| `lines(text)` | `"a\nb\nc"` | `["a", "b", "c"]` |
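
The three-part result of partition can be sketched with the standard library's string.split_once. The name partition_sketch is hypothetical, and the result when the separator is missing (whole text plus two empty strings) is an assumption of this sketch, not documented behaviour of str.

```gleam
import gleam/string

/// Hypothetical helper (not part of str): split `text` around the first
/// occurrence of `sep` and return #(before, separator, after).
/// Assumption: when `sep` is absent, return the text and two empty strings.
pub fn partition_sketch(
  text: String,
  sep: String,
) -> #(String, String, String) {
  case string.split_once(text, on: sep) {
    Ok(#(before, after)) -> #(before, sep, after)
    Error(_) -> #(text, "", "")
  }
}
```

With the table's input, partition_sketch("a-b-c", "-") gives #("a", "-", "b-c").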

📐 Padding & Filling

| Function | Example | Result |
| --- | --- | --- |
| `pad_left(text, width, pad)` | `"42", 5, "0"` | `"00042"` |
| `pad_right(text, width, pad)` | `"hi", 5, "*"` | `"hi***"` |
| `center(text, width, pad)` | `"hi", 6, "-"` | `"--hi--"` |
| `fill(text, width, pad, pos)` | `"x", 5, "-", "both"` | `"--x--"` |
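
As a rough illustration of how a width measured in graphemes drives padding, the sketch below pads on the left using only the standard library. pad_left_sketch is a hypothetical name, and the sketch assumes pad is a single grapheme; the library's own handling of longer pad strings may differ.

```gleam
import gleam/int
import gleam/string

/// Hypothetical helper (not part of str): left-pad `text` to `width`
/// graphemes. Assumes `pad` is exactly one grapheme long.
pub fn pad_left_sketch(text: String, width: Int, pad: String) -> String {
  // string.length counts graphemes, so emoji count as one column each.
  let missing = int.max(width - string.length(text), 0)
  string.repeat(pad, missing) <> text
}
```

With the table's input, pad_left_sketch("42", 5, "0") gives "00042", and text already at or beyond the width is returned unchanged.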

✅ Validation

| Function | Description |
| --- | --- |
| `is_numeric(text)` | Digits only (0-9) |
| `is_alpha(text)` | Letters only (a-z, A-Z) |
| `is_alphanumeric(text)` | Letters and digits |
| `is_ascii(text)` | ASCII only (0x00-0x7F) |
| `is_printable(text)` | Printable ASCII (0x20-0x7E) |
| `is_hex(text)` | Hexadecimal (0-9, a-f, A-F) |
| `is_blank(text)` | Whitespace only |
| `is_title_case(text)` | Title Case format |

🔗 Prefix & Suffix

| Function | Example | Result |
| --- | --- | --- |
| `remove_prefix(text, prefix)` | `"hello world", "hello "` | `"world"` |
| `remove_suffix(text, suffix)` | `"file.txt", ".txt"` | `"file"` |
| `ensure_prefix(text, prefix)` | `"world", "hello "` | `"hello world"` |
| `ensure_suffix(text, suffix)` | `"file", ".txt"` | `"file.txt"` |
| `starts_with_any(text, list)` | `"hello", ["hi", "he"]` | `True` |
| `ends_with_any(text, list)` | `"file.txt", [".txt", ".md"]` | `True` |
| `common_prefix(strings)` | `["abc", "abd"]` | `"ab"` |
| `common_suffix(strings)` | `["abc", "xbc"]` | `"bc"` |

🛡️ Escaping

| Function | Example | Result |
| --- | --- | --- |
| `escape_html(text)` | `"<div>"` | `"&lt;div&gt;"` |
| `unescape_html(text)` | `"&lt;div&gt;"` | `"<div>"` |
| `escape_regex(text)` | `"a.b*c"` | `"a\\.b\\*c"` |

📏 Similarity & Distance

| Function | Example | Result |
| --- | --- | --- |
| `distance(a, b)` | `"kitten", "sitting"` | `3` |
| `similarity(a, b)` | `"hello", "hallo"` | `0.8` |
| `hamming_distance(a, b)` | `"karolin", "kathrin"` | `Ok(3)` |
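
The values in this table can be reproduced with two short sketches. Both function names are hypothetical and neither is the library's implementation. The Error(Nil) result for unequal lengths and the normalisation by the longer string are assumptions chosen to match the examples shown (3 for "karolin" and "kathrin", 0.8 for one edit over five graphemes).

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical helper: count positions whose graphemes differ.
/// Assumption: inputs of different grapheme length yield Error(Nil).
pub fn hamming_sketch(a: String, b: String) -> Result(Int, Nil) {
  let xs = string.to_graphemes(a)
  let ys = string.to_graphemes(b)
  case list.length(xs) == list.length(ys) {
    False -> Error(Nil)
    True ->
      list.zip(xs, ys)
      |> list.filter(fn(pair) { pair.0 != pair.1 })
      |> list.length
      |> Ok
  }
}

/// Hypothetical helper: turn an edit distance into a score between 0.0
/// and 1.0 by dividing by the grapheme length of the longer input.
pub fn similarity_sketch(a: String, b: String, distance: Int) -> Float {
  let longest = int.max(string.length(a), string.length(b))
  case longest {
    // Two empty strings are treated as identical.
    0 -> 1.0
    _ -> 1.0 -. { int.to_float(distance) /. int.to_float(longest) }
  }
}
```

For example, hamming_sketch("karolin", "kathrin") returns Ok(3), and similarity_sketch("hello", "hallo", 1) is approximately 0.8.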

📝 Text Manipulation

| Function | Description |
| --- | --- |
| `truncate(text, len, suffix)` | Truncate with emoji preservation |
| `ellipsis(text, len)` | Truncate with … |
| `reverse(text)` | Grapheme-aware reversal |
| `reverse_words(text)` | Reverse word order |
| `initials(text)` | Extract initials ("John Doe" → "JD") |
| `normalize_whitespace(text)` | Collapse whitespace |
| `strip(text, chars)` | Remove chars from ends |
| `squeeze(text, char)` | Collapse consecutive chars |
| `chomp(text)` | Remove trailing newline |

📄 Line Operations

| Function | Description |
| --- | --- |
| `lines(text)` | Split into lines |
| `dedent(text)` | Remove common indentation |
| `indent(text, spaces)` | Add indentation |
| `wrap_at(text, width)` | Word wrap |

🔤 Case Conversions & ASCII Folding

Case Conversions

import str

str.to_snake_case("Hello World")    // → "hello_world"
str.to_camel_case("hello world")    // → "helloWorld"
str.to_pascal_case("hello world")   // → "HelloWorld"
str.to_kebab_case("Hello World")    // → "hello-world"
str.to_title_case("hello world")    // → "Hello World"

ASCII Folding (Deburr)

str.ascii_fold("Crème Brûlée")  // → "Creme Brulee"
str.ascii_fold("straße")        // → "strasse"
str.ascii_fold("æon")           // → "aeon"

Slug Generation

str.slugify("Hello, World!")                     // → "hello-world"
str.slugify_opts("one two three", 2, "-", False) // → "one-two"
str.slugify_opts("Hello World", 0, "_", False)   // → "hello_world"

🏗️ Module Guide

Which module should I use?

| Module | When to use | Import |
| --- | --- | --- |
| `str` | All string operations (recommended) | `import str` |
| `str/advanced` | Low-level KMP algorithms, caching | `import str/advanced` |
| `str/config` | Search heuristics configuration | `import str/config` |

Quick start: Use import str for all your needs. The main str module provides the complete public API including grapheme operations, ASCII folding, slugs, and case conversions.

Advanced users: Import str/advanced for explicit control over search algorithms and KMP map caching.

Module structure

str/
├── str.gleam       # Main module (complete public API)
├── advanced.gleam  # Low-level search algorithms
├── config.gleam    # Search heuristics configuration
└── internal/       # Implementation details (not public API)

📖 Documentation

| Document | Description |
| --- | --- |
| Core API | Grapheme-aware string operations |
| Extra API | ASCII folding and slug generation |
| Tokenizer | Pure-Gleam tokenizer reference |
| Examples | Integration examples and OTP patterns |
| Character Tables | Machine-readable transliteration data |

⚡ Optional OTP Integration

The library core is OTP-free by design. For production Unicode normalization (NFC/NFD), pass your own normalizer function to the _with_normalizer variants:

import str

// In your application code:
pub fn otp_nfd(s: String) -> String {
  // Placeholder: call Erlang's unicode module here.
  // As written, this stub returns the input unchanged.
  s
}

// Use with str:
str.ascii_fold_with_normalizer("Crème", otp_nfd)
str.slugify_with_normalizer("Café", otp_nfd)
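
One possible way to make otp_nfd perform real normalization on the Erlang target is an external function binding, sketched below. It assumes the input is valid UTF-8 (which holds for Gleam strings), in which case Erlang's unicode:characters_to_nfd_binary/1 returns a binary. Treat it as a sketch under that assumption rather than the project's recommended integration.

```gleam
// Erlang target only. Binds directly to unicode:characters_to_nfd_binary/1
// from the Erlang standard library.
// Assumption: the input is valid UTF-8, so the call returns a binary and
// never the error tuple that Erlang produces for malformed input.
@external(erlang, "unicode", "characters_to_nfd_binary")
pub fn otp_nfd(s: String) -> String
```

The binding has the same String -> String shape as the stub, so it can be passed to ascii_fold_with_normalizer and slugify_with_normalizer in the same way.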

🧪 Development

# Run the test suite
gleam test

# Regenerate character tables documentation
python3 scripts/generate_character_tables.py

Note: as of 2.0.0, escape_html uses the houdini library for fast, allocation-friendly escaping, and unescape_html uses odysseus for comprehensive entity support (named, decimal, and hex numeric entities). See CHANGELOG.md for details.


📊 Test Coverage

  • Tests covering all public functions
  • Unicode edge cases (emoji, ZWJ, combining marks)
  • Grapheme cluster boundary handling
  • Cross-module integration tests

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Expanding character transliteration tables
  • Additional test cases for edge cases
  • Documentation improvements
  • Performance optimizations

gleam test  # Ensure tests pass before submitting PRs

📄 License

MIT License. See LICENSE for details.


Made with 💜 for the Gleam community
