Unicode-aware string utilities for Gleam
Production-ready Gleam library providing Unicode-aware string operations with a focus on grapheme-cluster correctness, pragmatic ASCII transliteration, and URL-friendly slug generation.
| Category | Highlights |
|---|---|
| 🎯 Grapheme-Aware | All operations correctly handle Unicode grapheme clusters (emoji, ZWJ sequences, combining marks) |
| 🔤 Case Conversions | snake_case, camelCase, kebab-case, PascalCase, Title Case, capitalize |
| Slug Generation | Configurable slugify with token limits, custom separators, and Unicode preservation |
| Search & Replace | index_of, last_index_of, replace_first, replace_last, contains_any/all |
| ✅ Validation | is_uppercase, is_lowercase, is_title_case, is_ascii, is_hex, is_numeric, is_alpha |
| 🛡️ Escaping | escape_html, unescape_html, escape_regex |
| Similarity | Levenshtein distance, percentage similarity, hamming_distance |
| 🧩 Splitting | splitn, partition, rpartition, chunk, lines, words |
| Padding | pad_left, pad_right, center, fill |
| Minimal Dependencies | Pure Gleam implementation with no OTP requirement |
```gleam
import str

pub fn main() {
  // 🎯 Grapheme-safe truncation preserves emoji
  let text = "Hello 👩‍👩‍👧‍👦 World"
  str.truncate(text, 10, "...")
  // → "Hello 👩‍👩‍👧‍👦..."

  // ASCII transliteration and slugification
  str.slugify("Crème Brûlée – Recipe 2025!")
  // → "creme-brulee-recipe-2025"

  // 🔤 Case conversions
  str.to_camel_case("hello world") // → "helloWorld"
  str.to_snake_case("Hello World") // → "hello_world"
  str.capitalize("hELLO wORLD") // → "Hello world"

  // Grapheme-aware search
  str.index_of("👨‍👩‍👧‍👦 family test", "family")
  // → Ok(2): counts grapheme clusters, not bytes!

  // String similarity
  str.similarity("hello", "hallo")
  // → 0.8 (80% similar)

  // 🛡️ HTML escaping
  str.escape_html("<script>alert('xss')</script>")
  // → "&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;"
}
```
🔤 Case & Capitalization

| Function | Example | Result |
|---|---|---|
| capitalize(text) | "hELLO wORLD" | "Hello world" |
| swapcase(text) | "Hello World" | "hELLO wORLD" |
| is_uppercase(text) | "HELLO123" | True |
| is_lowercase(text) | "hello_world" | True |
| is_title_case(text) | "Hello World" | True |
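To make the grapheme-level behaviour of swapcase and capitalize concrete, here is a minimal sketch that relies only on gleam_stdlib's `gleam/string` and `gleam/list`. The names `swapcase_sketch` and `capitalize_sketch` are hypothetical and this is not str's implementation. The rule "a grapheme counts as lowercase or caseless when lowercasing leaves it unchanged" is also an assumption.

```gleam
import gleam/list
import gleam/string

/// Hypothetical sketch: flip the case of every grapheme. A grapheme whose
/// lowercase form equals itself (lowercase or caseless) is uppercased;
/// everything else is lowercased.
pub fn swapcase_sketch(text: String) -> String {
  text
  |> string.to_graphemes
  |> list.map(fn(g) {
    case string.lowercase(g) == g {
      True -> string.uppercase(g)
      False -> string.lowercase(g)
    }
  })
  |> string.concat
}

/// Hypothetical sketch: uppercase the first grapheme, lowercase the rest.
pub fn capitalize_sketch(text: String) -> String {
  case string.to_graphemes(text) {
    [] -> ""
    [first, ..rest] ->
      string.uppercase(first) <> string.lowercase(string.concat(rest))
  }
}
```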
✂️ Grapheme Extraction

| Function | Example | Result |
|---|---|---|
| take(text, n) | take("👨‍👩‍👧‍👦abc", 2) | "👨‍👩‍👧‍👦a" |
| drop(text, n) | drop("hello", 2) | "llo" |
| take_right(text, n) | take_right("hello", 3) | "llo" |
| drop_right(text, n) | drop_right("hello", 2) | "hel" |
| at(text, index) | at("hello", 1) | Ok("e") |
| chunk(text, size) | chunk("abcdef", 2) | ["ab", "cd", "ef"] |
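These functions count grapheme clusters rather than bytes or codepoints. Here is a minimal sketch of that idea that assumes only gleam_stdlib. The `*_sketch` names are hypothetical and do not reflect str's code.

```gleam
import gleam/list
import gleam/string

/// Hypothetical sketches: every operation works on grapheme clusters, so a
/// ZWJ emoji such as 👨‍👩‍👧‍👦 counts as one unit and is never split.
pub fn take_sketch(text: String, n: Int) -> String {
  text |> string.to_graphemes |> list.take(n) |> string.concat
}

pub fn drop_sketch(text: String, n: Int) -> String {
  text |> string.to_graphemes |> list.drop(n) |> string.concat
}

/// Take the last n graphemes by reversing, taking, and reversing back.
pub fn take_right_sketch(text: String, n: Int) -> String {
  text
  |> string.to_graphemes
  |> list.reverse
  |> list.take(n)
  |> list.reverse
  |> string.concat
}

/// Group graphemes into chunks of `size`; the final chunk may be shorter.
pub fn chunk_sketch(text: String, size: Int) -> List(String) {
  text
  |> string.to_graphemes
  |> list.sized_chunk(size)
  |> list.map(string.concat)
}
```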
Search & Replace

| Function | Example | Result |
|---|---|---|
| index_of(text, needle) | "hello world", "world" | Ok(6) |
| last_index_of(text, needle) | "hello hello", "hello" | Ok(6) |
| contains_any(text, needles) | "hello", ["x", "e", "z"] | True |
| contains_all(text, needles) | "hello", ["h", "e"] | True |
| replace_first(text, old, new) | "aaa", "a", "b" | "baa" |
| replace_last(text, old, new) | "aaa", "a", "b" | "aab" |
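A grapheme-aware index_of can be sketched by comparing grapheme lists. replace_first and replace_last can be sketched by splitting on the needle. These helpers have hypothetical names and are not the library's implementation. `string.split_once` and `string.split` match on the underlying text, so the two replacement helpers do not themselves enforce grapheme boundaries.

```gleam
import gleam/list
import gleam/string

/// Hypothetical sketch: grapheme index of the first `needle`, or Error(Nil).
pub fn index_of_sketch(text: String, needle: String) -> Result(Int, Nil) {
  find_at(string.to_graphemes(text), string.to_graphemes(needle), 0)
}

fn find_at(hay: List(String), pat: List(String), index: Int) -> Result(Int, Nil) {
  // Does the pattern match at the current grapheme position?
  case list.take(hay, list.length(pat)) == pat, hay {
    True, _ -> Ok(index)
    False, [] -> Error(Nil)
    False, [_, ..rest] -> find_at(rest, pat, index + 1)
  }
}

/// Hypothetical sketch: replace only the first occurrence of `old`.
pub fn replace_first_sketch(text: String, old: String, new: String) -> String {
  case string.split_once(text, old) {
    Ok(#(before, after)) -> before <> new <> after
    Error(Nil) -> text
  }
}

/// Hypothetical sketch: replace only the last occurrence of `old` by
/// splitting on it and re-joining every piece except the final one.
pub fn replace_last_sketch(text: String, old: String, new: String) -> String {
  case list.reverse(string.split(text, old)) {
    [last, second_last, ..rest] ->
      string.join(list.reverse([second_last, ..rest]), old) <> new <> last
    _ -> text
  }
}
```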
⚠️ Experimental: Search Strategies

Algorithms:
- KMP: optimized for long or repetitive patterns
- Sliding: fast for short patterns, zero allocations

APIs:

| Function | Description |
|---|---|
| index_of_auto(text, pattern) | Auto-select algorithm (heuristic) |
| index_of_strategy(text, pattern, Kmp\|Sliding) | Explicit algorithm choice |
| count_auto(text, pattern, overlapping) | Auto-select for counting |
| count_strategy(text, pattern, overlapping, Kmp\|Sliding) | Explicit count algorithm |

Examples:

```gleam
// Force KMP explicitly
str.index_of_strategy("long text...", "pattern", str.Kmp)

// Let the heuristic decide (experimental)
str.index_of_auto("some text", "pat")
```

Note: the _auto variants rely on heuristics and may not always choose the best algorithm. For performance-critical code, use the _strategy variants. Thresholds are configured in src/str/config.gleam.
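KMP avoids re-reading the text by precomputing a failure table. For each pattern prefix, the table stores the length of its longest proper prefix that is also a suffix. The sketch below is a textbook KMP over grapheme lists and uses `gleam/dict` for positional lookup into the pattern. `kmp_index_of` and its helpers are hypothetical names, and this is not str's internal code.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/result
import gleam/string

/// Hypothetical sketch of KMP search over grapheme clusters.
/// Returns the grapheme index of the first match, or Error(Nil).
pub fn kmp_index_of(text: String, pattern: String) -> Result(Int, Nil) {
  let pat = string.to_graphemes(pattern)
  let m = list.length(pat)
  case m {
    0 -> Ok(0)
    _ -> {
      let p = indexed(pat)
      // table[0] is always 0; fill the rest starting at i = 1.
      let table = failure(p, m, 1, 0, dict.from_list([#(0, 0)]))
      search(string.to_graphemes(text), p, m, table, 0, 0)
    }
  }
}

/// Position -> grapheme map, so the pattern can be read by index.
fn indexed(items: List(String)) -> Dict(Int, String) {
  let #(_, d) =
    list.fold(items, #(0, dict.new()), fn(acc, item) {
      let #(i, d) = acc
      #(i + 1, dict.insert(d, i, item))
    })
  d
}

fn lookup(table: Dict(Int, Int), i: Int) -> Int {
  result.unwrap(dict.get(table, i), 0)
}

/// table[i] = length of the longest proper prefix of pattern[0..i] that is
/// also a suffix of it; k is the current candidate length.
fn failure(
  p: Dict(Int, String),
  m: Int,
  i: Int,
  k: Int,
  table: Dict(Int, Int),
) -> Dict(Int, Int) {
  case i >= m {
    True -> table
    False ->
      case dict.get(p, i) == dict.get(p, k), k > 0 {
        True, _ -> failure(p, m, i + 1, k + 1, dict.insert(table, i, k + 1))
        False, True -> failure(p, m, i, lookup(table, k - 1), table)
        False, False -> failure(p, m, i + 1, 0, dict.insert(table, i, 0))
      }
  }
}

/// Follow failure links until `c` can extend the match or nothing is matched.
fn fallback(c: String, p: Dict(Int, String), table: Dict(Int, Int), j: Int) -> Int {
  case j > 0 && dict.get(p, j) != Ok(c) {
    True -> fallback(c, p, table, lookup(table, j - 1))
    False -> j
  }
}

/// j = pattern graphemes matched so far; each text grapheme is read once.
fn search(
  text: List(String),
  p: Dict(Int, String),
  m: Int,
  table: Dict(Int, Int),
  pos: Int,
  j: Int,
) -> Result(Int, Nil) {
  case text {
    [] -> Error(Nil)
    [c, ..rest] -> {
      let j = fallback(c, p, table, j)
      let j = case dict.get(p, j) == Ok(c) {
        True -> j + 1
        False -> j
      }
      case j == m {
        True -> Ok(pos - m + 1)
        False -> search(rest, p, m, table, pos + 1, j)
      }
    }
  }
}
```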
🧩 Splitting & Partitioning

| Function | Example | Result |
|---|---|---|
| partition(text, sep) | "a-b-c", "-" | #("a", "-", "b-c") |
| rpartition(text, sep) | "a-b-c", "-" | #("a-b", "-", "c") |
| splitn(text, sep, n) | "a-b-c-d", "-", 2 | ["a", "b-c-d"] |
| words(text) | "hello world" | ["hello", "world"] |
| lines(text) | "a\nb\nc" | ["a", "b", "c"] |
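partition, rpartition and splitn can all be expressed with `string.split_once` and `string.split`. The sketch below uses hypothetical names. The table above does not say what the library returns when the separator is missing, so the Python-style fallbacks here are an assumption.

```gleam
import gleam/list
import gleam/string

/// Hypothetical sketch: split around the first `sep`.
/// Assumed fallback when `sep` is absent: #(text, "", "").
pub fn partition_sketch(text: String, sep: String) -> #(String, String, String) {
  case string.split_once(text, sep) {
    Ok(#(before, after)) -> #(before, sep, after)
    Error(Nil) -> #(text, "", "")
  }
}

/// Hypothetical sketch: split around the last `sep`.
/// Assumed fallback when `sep` is absent: #("", "", text).
pub fn rpartition_sketch(text: String, sep: String) -> #(String, String, String) {
  case list.reverse(string.split(text, sep)) {
    [last, second, ..rest] ->
      #(string.join(list.reverse([second, ..rest]), sep), sep, last)
    _ -> #("", "", text)
  }
}

/// Hypothetical sketch: at most n pieces; the last keeps the remainder.
pub fn splitn_sketch(text: String, sep: String, n: Int) -> List(String) {
  case n <= 1 {
    True -> [text]
    False ->
      case string.split_once(text, sep) {
        Ok(#(head, tail)) -> [head, ..splitn_sketch(tail, sep, n - 1)]
        Error(Nil) -> [text]
      }
  }
}
```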
Padding

| Function | Example | Result |
|---|---|---|
| pad_left(text, width, pad) | "42", 5, "0" | "00042" |
| pad_right(text, width, pad) | "hi", 5, "*" | "hi***" |
| center(text, width, pad) | "hi", 6, "-" | "--hi--" |
| fill(text, width, pad, pos) | "x", 5, "-", "both" | "--x--" |
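Padding widths are naturally measured in graphemes. This sketch assumes that `pad` is a single grapheme. It also assumes that center puts the extra pad grapheme on the right when the gap is odd. Both are assumptions, and the helper names are hypothetical.

```gleam
import gleam/int
import gleam/string

/// Hypothetical helper: graphemes missing to reach `width` (never negative).
fn gap(text: String, width: Int) -> Int {
  int.max(width - string.length(text), 0)
}

pub fn pad_left_sketch(text: String, width: Int, pad: String) -> String {
  string.repeat(pad, gap(text, width)) <> text
}

pub fn pad_right_sketch(text: String, width: Int, pad: String) -> String {
  text <> string.repeat(pad, gap(text, width))
}

/// Odd gaps put the extra pad grapheme on the right (an assumption).
pub fn center_sketch(text: String, width: Int, pad: String) -> String {
  let total = gap(text, width)
  let left = total / 2
  string.repeat(pad, left) <> text <> string.repeat(pad, total - left)
}
```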
✅ Validation

| Function | Description |
|---|---|
| is_numeric(text) | Digits only (0-9) |
| is_alpha(text) | Letters only (a-z, A-Z) |
| is_alphanumeric(text) | Letters and digits |
| is_ascii(text) | ASCII only (0x00-0x7F) |
| is_printable(text) | Printable ASCII (0x20-0x7E) |
| is_hex(text) | Hexadecimal (0-9, a-f, A-F) |
| is_blank(text) | Whitespace only |
| is_title_case(text) | Title Case format |
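The ASCII character-class checks reduce to codepoint range tests. Here is a sketch built on `string.to_utf_codepoints`. The helper names are hypothetical. Treating the empty string as neither numeric nor hex is an assumption, and str may behave differently.

```gleam
import gleam/list
import gleam/string

/// Hypothetical helper: map text to its Unicode codepoint integers.
fn codes(text: String) -> List(Int) {
  text
  |> string.to_utf_codepoints
  |> list.map(string.utf_codepoint_to_int)
}

fn in_range(c: Int, low: Int, high: Int) -> Bool {
  c >= low && c <= high
}

/// Digits 0-9 only; "" is treated as False here (an assumption).
pub fn is_numeric_sketch(text: String) -> Bool {
  text != "" && list.all(codes(text), fn(c) { in_range(c, 0x30, 0x39) })
}

/// 0-9, a-f, A-F only; "" is treated as False here (an assumption).
pub fn is_hex_sketch(text: String) -> Bool {
  text != ""
  && list.all(codes(text), fn(c) {
    in_range(c, 0x30, 0x39) || in_range(c, 0x61, 0x66) || in_range(c, 0x41, 0x46)
  })
}

/// Every codepoint in 0x00-0x7F (so "" is trivially ASCII).
pub fn is_ascii_sketch(text: String) -> Bool {
  list.all(codes(text), fn(c) { c <= 0x7F })
}
```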
Prefix & Suffix

| Function | Example | Result |
|---|---|---|
| remove_prefix(text, prefix) | "hello world", "hello " | "world" |
| remove_suffix(text, suffix) | "file.txt", ".txt" | "file" |
| ensure_prefix(text, prefix) | "world", "hello " | "hello world" |
| ensure_suffix(text, suffix) | "file", ".txt" | "file.txt" |
| starts_with_any(text, list) | "hello", ["hi", "he"] | True |
| ends_with_any(text, list) | "file.txt", [".txt", ".md"] | True |
| common_prefix(strings) | ["abc", "abd"] | "ab" |
| common_suffix(strings) | ["abc", "xbc"] | "bc" |
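common_prefix can be computed by folding a pairwise, grapheme-by-grapheme comparison over the list. common_suffix applies the same fold to reversed strings (`string.reverse` reverses graphemes). The helper names are hypothetical, and this is not the library's implementation.

```gleam
import gleam/list
import gleam/string

/// Hypothetical sketch: longest common prefix, compared per grapheme.
pub fn common_prefix_sketch(strings: List(String)) -> String {
  case list.map(strings, string.to_graphemes) {
    [] -> ""
    [first, ..rest] -> list.fold(rest, first, shared_prefix) |> string.concat
  }
}

/// Keep graphemes while both lists agree, stop at the first difference.
fn shared_prefix(a: List(String), b: List(String)) -> List(String) {
  case a, b {
    [x, ..xs], [y, ..ys] if x == y -> [x, ..shared_prefix(xs, ys)]
    _, _ -> []
  }
}

/// Hypothetical sketch: reverse, take the common prefix, reverse back.
pub fn common_suffix_sketch(strings: List(String)) -> String {
  strings
  |> list.map(string.reverse)
  |> common_prefix_sketch
  |> string.reverse
}
```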
🛡️ Escaping

| Function | Example | Result |
|---|---|---|
| escape_html(text) | "<div>" | "&lt;div&gt;" |
| unescape_html(text) | "&lt;div&gt;" | "<div>" |
| escape_regex(text) | "a.b*c" | "a\\.b\\*c" |
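For reference, here is the classic replace-based approach to escaping. It is a naive sketch, not what str does: per the 2.0.0 note later in this README, str delegates HTML escaping to houdini and unescaping to odysseus. Replacing `&` first keeps the ampersands of newly inserted entities from being escaped again. The regex metacharacter set and the helper names are assumptions.

```gleam
import gleam/list
import gleam/string

/// Naive sketch, not str's houdini-based implementation. `&` is replaced
/// first so the `&` inside newly inserted entities is not escaped again.
pub fn escape_html_sketch(text: String) -> String {
  text
  |> string.replace("&", "&amp;")
  |> string.replace("<", "&lt;")
  |> string.replace(">", "&gt;")
  |> string.replace("\"", "&quot;")
  |> string.replace("'", "&#39;")
}

/// Assumed metacharacter set; the library's exact list may differ.
const regex_meta = [
  "\\", ".", "*", "+", "?", "^", "$", "(", ")", "[", "]", "{", "}", "|",
]

/// Hypothetical sketch: prefix each metacharacter with a backslash.
pub fn escape_regex_sketch(text: String) -> String {
  text
  |> string.to_graphemes
  |> list.map(fn(g) {
    case list.contains(regex_meta, g) {
      True -> "\\" <> g
      False -> g
    }
  })
  |> string.concat
}
```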
Similarity & Distance

| Function | Example | Result |
|---|---|---|
| distance(a, b) | "kitten", "sitting" | 3 |
| similarity(a, b) | "hello", "hallo" | 0.8 |
| hamming_distance(a, b) | "karolin", "kathrin" | Ok(3) |
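distance is the Levenshtein edit distance. The similarity example matches the formula 1 - distance / max(length a, length b): for "hello" vs "hallo", 1 - 1/5 = 0.8. That formula is inferred from the example, not confirmed by the library. Below is a sketch that computes both over graphemes and keeps a single rolling row of the DP table. The helper names are hypothetical.

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical sketch: Levenshtein distance over grapheme clusters,
/// keeping only one row of the dynamic-programming table at a time.
pub fn distance_sketch(a: String, b: String) -> Int {
  let bs = string.to_graphemes(b)
  // Row 0: turning "" into the first j graphemes of b costs j insertions.
  let first_row = count_up(list.length(bs), 0, [])
  let last_row =
    list.fold(string.to_graphemes(a), first_row, fn(prev, ca) {
      next_row(ca, bs, prev)
    })
  case list.last(last_row) {
    Ok(d) -> d
    Error(Nil) -> 0
  }
}

/// Builds [0, 1, ..., n].
fn count_up(n: Int, i: Int, acc: List(Int)) -> List(Int) {
  case i > n {
    True -> list.reverse(acc)
    False -> count_up(n, i + 1, [i, ..acc])
  }
}

fn next_row(ca: String, bs: List(String), prev: List(Int)) -> List(Int) {
  case prev {
    [p0, ..prev_rest] -> fill_row(ca, bs, p0, prev_rest, p0 + 1, [p0 + 1])
    [] -> []
  }
}

/// diag = prev[j-1], left = current[j-1], and `ups` starts at prev[j].
fn fill_row(
  ca: String,
  bs: List(String),
  diag: Int,
  ups: List(Int),
  left: Int,
  acc: List(Int),
) -> List(Int) {
  case bs, ups {
    [cb, ..bs_rest], [up, ..ups_rest] -> {
      let cost = case ca == cb {
        True -> 0
        False -> 1
      }
      // Cheapest of deletion, insertion, and substitution (or match).
      let cell = int.min(int.min(up + 1, left + 1), diag + cost)
      fill_row(ca, bs_rest, up, ups_rest, cell, [cell, ..acc])
    }
    _, _ -> list.reverse(acc)
  }
}

/// Assumed formula: 1 - distance / grapheme length of the longer string.
pub fn similarity_sketch(a: String, b: String) -> Float {
  let longest = int.max(string.length(a), string.length(b))
  case longest {
    0 -> 1.0
    _ -> 1.0 -. int.to_float(distance_sketch(a, b)) /. int.to_float(longest)
  }
}
```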
Text Manipulation

| Function | Description |
|---|---|
| truncate(text, len, suffix) | Truncate with emoji preservation |
| ellipsis(text, len) | Truncate with … |
| reverse(text) | Grapheme-aware reversal |
| reverse_words(text) | Reverse word order |
| initials(text) | Extract initials ("John Doe" → "JD") |
| normalize_whitespace(text) | Collapse whitespace |
| strip(text, chars) | Remove chars from both ends |
| squeeze(text, char) | Collapse consecutive chars |
| chomp(text) | Remove trailing newline |
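truncate keeps whole grapheme clusters. The quick example truncates to 10 graphemes and yields "Hello 👩‍👩‍👧‍👦...", so the length appears to include the suffix. The sketch below assumes that and adds a squeeze helper. Both names are hypothetical.

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical sketch: keep whole graphemes so ZWJ emoji and combining
/// marks are never cut. Assumes `len` counts the suffix too.
pub fn truncate_sketch(text: String, len: Int, suffix: String) -> String {
  case string.length(text) <= len {
    True -> text
    False -> {
      let keep = int.max(len - string.length(suffix), 0)
      string.slice(text, 0, keep) <> suffix
    }
  }
}

/// Hypothetical sketch: collapse runs of `char` into a single occurrence.
pub fn squeeze_sketch(text: String, char: String) -> String {
  text
  |> string.to_graphemes
  |> list.fold([], fn(acc, g) {
    // `acc` is built in reverse, so its head is the previous grapheme.
    case g == char, acc {
      True, [prev, ..] if prev == char -> acc
      _, _ -> [g, ..acc]
    }
  })
  |> list.reverse
  |> string.concat
}
```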
Multiline Text

| Function | Description |
|---|---|
| lines(text) | Split into lines |
| dedent(text) | Remove common indentation |
| indent(text, spaces) | Add indentation |
| wrap_at(text, width) | Word wrap |
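dedent removes the smallest leading indentation shared by the non-blank lines, and indent does the reverse. In this minimal sketch only spaces count as indentation; ignoring tabs is a simplification of the sketch. Both helper names are hypothetical.

```gleam
import gleam/int
import gleam/list
import gleam/string

/// Hypothetical helper: number of leading spaces (tabs are not counted).
fn indent_width(line: String) -> Int {
  line
  |> string.to_graphemes
  |> list.take_while(fn(g) { g == " " })
  |> list.length
}

/// Hypothetical sketch: strip the smallest indentation shared by all
/// non-blank lines; blank lines do not affect the amount removed.
pub fn dedent_sketch(text: String) -> String {
  let lines = string.split(text, "\n")
  let widths =
    lines
    |> list.filter(fn(line) { string.trim(line) != "" })
    |> list.map(indent_width)
  let common = case widths {
    [] -> 0
    [w, ..rest] -> list.fold(rest, w, int.min)
  }
  lines
  |> list.map(fn(line) {
    line |> string.to_graphemes |> list.drop(common) |> string.concat
  })
  |> string.join("\n")
}

/// Hypothetical sketch: prefix every line with `spaces` spaces.
pub fn indent_sketch(text: String, spaces: Int) -> String {
  let pad = string.repeat(" ", spaces)
  text
  |> string.split("\n")
  |> list.map(fn(line) { pad <> line })
  |> string.join("\n")
}
```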
🔤 Case Conversions & ASCII Folding

```gleam
import str

pub fn main() {
  str.to_snake_case("Hello World") // → "hello_world"
  str.to_camel_case("hello world") // → "helloWorld"
  str.to_pascal_case("hello world") // → "HelloWorld"
  str.to_kebab_case("Hello World") // → "hello-world"
  str.to_title_case("hello world") // → "Hello World"

  str.ascii_fold("Crème Brûlée") // → "Creme Brulee"
  str.ascii_fold("straße") // → "strasse"
  str.ascii_fold("æon") // → "aeon"

  str.slugify("Hello, World!") // → "hello-world"
  str.slugify_opts("one two three", 2, "-", False) // → "one-two"
  str.slugify_opts("Hello World", 0, "_", False) // → "hello_world"
}
```
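The case converters and slugify share one step: split the text into words, then rejoin them. The sketch below treats only ASCII letters and digits as word characters. It skips ASCII folding and camelCase boundary detection, so it is much cruder than str. Reading a token limit of 0 as "no limit" follows the slugify_opts example above but is an assumption. All helper names are hypothetical.

```gleam
import gleam/list
import gleam/string

fn in_range(c: Int, low: Int, high: Int) -> Bool {
  c >= low && c <= high
}

/// Hypothetical helper: True for a single ASCII letter or digit.
fn is_ascii_alnum(g: String) -> Bool {
  case string.to_utf_codepoints(g) {
    [cp] -> {
      let c = string.utf_codepoint_to_int(cp)
      in_range(c, 0x30, 0x39) || in_range(c, 0x41, 0x5A) || in_range(c, 0x61, 0x7A)
    }
    _ -> False
  }
}

/// Replace every non-alphanumeric grapheme with a space, then split.
fn words(text: String) -> List(String) {
  text
  |> string.to_graphemes
  |> list.map(fn(g) {
    case is_ascii_alnum(g) {
      True -> g
      False -> " "
    }
  })
  |> string.concat
  |> string.split(" ")
  |> list.filter(fn(w) { w != "" })
}

/// 0 means "no token limit" here, an assumption modelled on slugify_opts.
pub fn slug_sketch(text: String, max_tokens: Int, sep: String) -> String {
  let tokens = words(string.lowercase(text))
  let tokens = case max_tokens > 0 {
    True -> list.take(tokens, max_tokens)
    False -> tokens
  }
  string.join(tokens, sep)
}

pub fn snake_sketch(text: String) -> String {
  slug_sketch(text, 0, "_")
}

pub fn camel_sketch(text: String) -> String {
  case words(string.lowercase(text)) {
    [] -> ""
    [first, ..rest] -> first <> string.concat(list.map(rest, string.capitalise))
  }
}
```

Without folding, `slug_sketch("Crème", 0, "-")` yields "cr-me". That shows why a real slugifier runs ASCII folding before splitting.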
Which module should I use?

| Module | When to use | Import |
|---|---|---|
| str | All string operations (recommended) | import str |
| str/advanced | Low-level KMP algorithms, caching | import str/advanced |
| str/config | Search heuristics configuration | import str/config |
Quick start: Use import str for all your needs. The main str module provides the complete public API including grapheme operations, ASCII folding, slugs, and case conversions.
Advanced users: Import str/advanced for explicit control over search algorithms and KMP map caching.
```text
str/
├── str.gleam        # Main module (complete public API)
├── advanced.gleam   # Low-level search algorithms
├── config.gleam     # Search heuristics configuration
└── internal/        # Implementation details (not public API)
```
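⚡ Optional OTP Integration

The library core is OTP-free by design. For production-grade Unicode normalization (NFC/NFD), pass an OTP-backed normalizer to the *_with_normalizer functions:

```gleam
import str

// Erlang target only: bind OTP's unicode:characters_to_nfd_binary/1, which
// returns the canonical decomposition (NFD) of its input. For valid UTF-8
// input (every Gleam String is valid UTF-8) it returns a binary.
@external(erlang, "unicode", "characters_to_nfd_binary")
pub fn otp_nfd(s: String) -> String

pub fn main() {
  // Use the OTP-backed normalizer with str:
  str.ascii_fold_with_normalizer("Crème", otp_nfd)
  str.slugify_with_normalizer("Café", otp_nfd)
}
```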
```sh
# Run the test suite
gleam test

# Regenerate character tables documentation
python3 scripts/generate_character_tables.py
```
Note: as of 2.0.0, escape_html uses the houdini library for fast, allocation-friendly escaping, and unescape_html uses odysseus for comprehensive entity support (named, decimal, and hex numeric entities). See CHANGELOG.md for details.
Tests covering all public functions
Unicode edge cases (emoji, ZWJ, combining marks)
Grapheme cluster boundary handling
Cross-module integration tests
Contributions welcome! Areas for improvement:
Expanding character transliteration tables
Additional test cases for edge cases
Documentation improvements
Performance optimizations
```sh
gleam test # Ensure tests pass before submitting PRs
```
MIT License. See LICENSE for details.
Made with love for the Gleam community