23 changes: 23 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,23 @@
name: test

on:
  push:
    branches:
      - master
      - main
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with:
          otp-version: "28"
          gleam-version: "1.13.0"
          rebar3-version: "3"
          # elixir-version: "1"
      - run: gleam deps download
      - run: gleam test
      - run: gleam format --check src test
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,24 @@

All notable changes to this project are documented in this file.

## [1.2.3] - 2026-01-08
### Changed
- Replaced `escape_html` implementation with `houdini.escape` for faster,
allocation-friendly HTML escaping.
- Replaced `unescape_html` with `odysseus.unescape` for comprehensive HTML
entity unescaping (named entities, numeric decimal and hex entities).
- Added dependencies: `houdini`, `odysseus`.

### Tests
- Added tests for HTML escape/unescape and numeric entities (decimal and hex).

Contributed by: Daniele (`lupodevelop`)
Suggested by: Louis Pilfold (`@lpil`)

Suggested by: NNB (`@NNBnh`)
Suggested change: updated README logo pointer to use the raw.githubusercontent URL
(pointing to the repository commit) so the logo is resolvable on Hexdocs.

## [1.2.2] - 2026-01-05
### Added
- Added internal helper `grapheme_len/1` (internal) to centralize grapheme cluster length computation and avoid repetitive `string.to_graphemes |> list.length` patterns.
34 changes: 34 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,34 @@
# Contributing to str

Thanks for helping! Short, practical guide.

## Quick start
- Fork, create a branch: `git switch -c feat/your-change`.
- Run `gleam format` and `gleam test` locally.
- Open a PR against `main` with a short description and tests.

## Setup
- Requirements: Gleam (see `gleam.toml`)

Commands:
```bash
gleam format
gleam test
```

## Commits
Use brief prefixes: `feat:`, `fix:`, `chore:`, `test:`, `perf:`.
Example: `feat(display): add truncate_display`
No strict enforcement, use these prefixes as a guideline, not a hard rule.

## PR checklist
- [ ] Tests added/updated
- [ ] `gleam format` & `gleam test` pass
- [ ] Update `CHANGELOG.md` if behaviour changes
- [ ] Document noteworthy changes in `README.md` , docs/ or examples/
Copilot AI Jan 9, 2026

There is an extra space before the comma in "README.md , docs/". It should be "README.md, docs/" without the space before the comma.

Suggested change
- - [ ] Document noteworthy changes in `README.md` , docs/ or examples/
+ - [ ] Document noteworthy changes in `README.md`, docs/ or examples/

## Deprecations
- Report breaking changes in an issue and add migration notes in PRs. See `DEPRECATIONS.md` if present.

## Testing
- Add unit tests for edge cases (ZWJ, skin tones, combining marks, CJK, ambiguous widths).
4 changes: 3 additions & 1 deletion README.md
@@ -1,5 +1,5 @@
<p align="center">
- <img src="assets/img/logo-str.png" alt="str logo" width="280">
+ <img src="https://raw.githubusercontent.com/lupodevelop/str/c190b21/assets/img/logo-str.png" alt="str logo" width="280">
</p>

<h1 align="center">str</h1>
@@ -327,6 +327,8 @@ gleam test
python3 scripts/generate_character_tables.py
```

Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation‑friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.
Copilot AI Jan 9, 2026

The hyphen in "allocation‑friendly" is a non-breaking hyphen (U+2011) instead of a regular ASCII hyphen. This is inconsistent with the same term in CHANGELOG.md line 8 which uses a regular hyphen. Consider using a regular hyphen for consistency.

Suggested change
- Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation‑friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.
+ Note: as of **1.2.3**, `escape_html` now uses the `houdini` library for fast, allocation-friendly escaping, and `unescape_html` uses `odysseus` for comprehensive entity support (named, decimal and hex numeric entities). See [CHANGELOG.md](CHANGELOG.md) for details.

---

## 📊 Test Coverage
5 changes: 3 additions & 2 deletions gleam.toml
@@ -1,17 +1,18 @@
name = "str"
- version = "1.2.2"
+ version = "1.2.3"

# Project metadata (fill or replace placeholders before publishing)
description = "Unicode-aware string utilities for Gleam: grapheme-safe operations, pragmatic ASCII transliteration, and slug generation."
licenses = ["MIT"]
repository = { type = "github", user = "lupodevelop", repo = "str" }
links = [{ title = "Repository", href = "https://github.com/lupodevelop/str" }]

# For a full reference of all the available options, see:
# https://gleam.run/writing-gleam/gleam-toml/

[dependencies]
gleam_stdlib = ">= 0.44.0 and < 2.0.0"
houdini = ">= 1.0.0 and < 2.0.0"
odysseus = ">= 1.0.0 and < 2.0.0"

[dev-dependencies]
gleeunit = ">= 1.0.0 and < 2.0.0"
4 changes: 4 additions & 0 deletions manifest.toml
@@ -4,8 +4,12 @@
packages = [
{ name = "gleam_stdlib", version = "0.65.0", build_tools = ["gleam"], requirements = [], otp_app = "gleam_stdlib", source = "hex", outer_checksum = "7C69C71D8C493AE11A5184828A77110EB05A7786EBF8B25B36A72F879C3EE107" },
{ name = "gleeunit", version = "1.9.0", build_tools = ["gleam"], requirements = ["gleam_stdlib"], otp_app = "gleeunit", source = "hex", outer_checksum = "DA9553CE58B67924B3C631F96FE3370C49EB6D6DC6B384EC4862CC4AAA718F3C" },
{ name = "houdini", version = "1.2.0", build_tools = ["gleam"], requirements = [], otp_app = "houdini", source = "hex", outer_checksum = "5DB1053F1AF828049C2B206D4403C18970ABEF5C18671CA3C2D2ED0DD64F6385" },
{ name = "odysseus", version = "1.0.0", build_tools = ["gleam"], requirements = [], otp_app = "odysseus", source = "hex", outer_checksum = "6A97DA1075BDDEA8B60F47B1DFFAD49309FA27E73843F13A0AF32EA7087BA11C" },
]

[requirements]
gleam_stdlib = { version = ">= 0.44.0 and < 2.0.0" }
gleeunit = { version = ">= 1.0.0 and < 2.0.0" }
houdini = { version = ">= 1.0.0 and < 2.0.0" }
odysseus = { version = ">= 1.0.0 and < 2.0.0" }
16 changes: 4 additions & 12 deletions src/str/core.gleam
@@ -13,6 +13,8 @@ import gleam/dict
import gleam/int
import gleam/list
import gleam/string
import houdini
import odysseus
import str/config

/// Detects if a grapheme cluster likely contains emoji components.
@@ -1766,12 +1768,7 @@ pub fn is_hex(text: String) -> Bool {
/// escape_html("Say \"hello\"") -> "Say &quot;hello&quot;"
///
pub fn escape_html(text: String) -> String {
-  text
-  |> string.replace("&", "&amp;")
-  |> string.replace("<", "&lt;")
-  |> string.replace(">", "&gt;")
-  |> string.replace("\"", "&quot;")
-  |> string.replace("'", "&#39;")
+  houdini.escape(text)
}

/// Unescapes HTML entities to their character equivalents.
@@ -1781,12 +1778,7 @@ pub fn escape_html(text: String) -> String {
/// unescape_html("Tom &amp; Jerry") -> "Tom & Jerry"
///
pub fn unescape_html(text: String) -> String {
-  text
-  |> string.replace("&#39;", "'")
-  |> string.replace("&quot;", "\"")
-  |> string.replace("&gt;", ">")
-  |> string.replace("&lt;", "<")
-  |> string.replace("&amp;", "&")
+  odysseus.unescape(text)
}

/// Escapes regex metacharacters so the string can be used as a literal pattern.
64 changes: 64 additions & 0 deletions test/str_html_escape_extended_test.gleam
@@ -0,0 +1,64 @@
import gleam/list
import gleeunit
import str

pub fn main() -> Nil {
  gleeunit.main()
}

pub fn roundtrip_basic_entities_test() {
  let cases = [
    "<div>Hello</div>",
    "Tom & Jerry",
    "Say \"hello\"",
    "It's me",
    "5 < 10 && 10 > 5",
    "Ampersand: &",
  ]

  list.fold(cases, True, fn(_, s) {
    let escaped = str.escape_html(s)
    let unescaped = str.unescape_html(escaped)
    assert unescaped == s
    True
  })
}

Comment on lines +19 to +26
Copilot AI Jan 9, 2026

The first parameter in the fold function is unused (indicated by the underscore). The function returns True but this return value is never used. The entire fold seems unnecessary here - consider using list.each instead, or restructure to avoid the unused parameter and return value.

Suggested change
-  list.fold(cases, True, fn(_, s) {
-    let escaped = str.escape_html(s)
-    let unescaped = str.unescape_html(escaped)
-    assert unescaped == s
-    True
-  })
-}
+  list.each(cases, fn(s) {
+    let escaped = str.escape_html(s)
+    let unescaped = str.unescape_html(escaped)
+    assert unescaped == s
+  })
+}

pub fn numeric_and_named_entities_test() {
  assert str.unescape_html("&lt;&gt;&amp;&#39;&#x27;&#34;") == "<>&''\""
  assert str.unescape_html("&quot; and &#34; and &#x22;") == "\" and \" and \""
  assert str.unescape_html("I like &#39;quotes&#39;") == "I like 'quotes'"
  assert str.unescape_html("Hex: &#x27;") == "Hex: '"
}

pub fn malformed_and_unknown_entity_test() {
  // Missing semicolon should remain unchanged
  assert str.unescape_html("This &amp is broken") == "This &amp is broken"

  // Unknown entity should remain unchanged
  assert str.unescape_html("This &notanentity; remains")
    == "This &notanentity; remains"
}

pub fn combined_and_adjacent_entities_test() {
  assert str.unescape_html("&lt;&lt; &gt;&gt;") == "<< >>"
  assert str.unescape_html("&amp;&amp;&amp;") == "&&&"
}

pub fn unicode_and_emoji_roundtrip_test() {
let s = "Café — ️👩‍👩‍👧‍👦 \u{00A0}"
let escaped = str.escape_html(s)
// Expect unescape to restore the original (escape may not change emoji/nbspace)
assert str.unescape_html(escaped) == s
}

pub fn idempotence_and_double_escape_test() {
  let s = "&"
  let once = str.escape_html(s)
  let twice = str.escape_html(once)
  assert once == "&amp;"
  assert twice == "&amp;amp;"
  // unescape decodes one level: "&amp;amp;" -> "&amp;"; double unescape restores original
  assert str.unescape_html(twice) == "&amp;"
  assert str.unescape_html(str.unescape_html(twice)) == s
}
70 changes: 70 additions & 0 deletions test/str_html_escape_fuzz_test.gleam
@@ -0,0 +1,70 @@
import gleeunit
import str
import gleam/list
import gleam/string

pub fn main() -> Nil {
  gleeunit.main()
}

// Deterministic, simple generator over a token pool.
fn gen_token_pool() -> List(String) {
[
"a","b","c","1","2","3"," ","\n","<",">","&","\"","'",
"&amp;","&lt;","&gt;","&quot;","&#39;","&#x27;","&#x22;","&notanentity;",
"&","&amp","&#", "&#x",
"\u{00A0}", // NBSP
"Café","naïve","ø","漢","字",
"👩‍👩‍👧‍👦","👨‍👩‍👧","️","✈️","🏳️‍🌈",
"\u{0301}", // combining acute
"&alpha;","&beta;","&gamma;"
]
}

// Deterministic pseudo-random index using seed and i
fn idx_for(seed: Int, i: Int, len: Int) -> Int {
  // simple LCG-ish formula; keep small to avoid large-int overhead
  let v = seed * 1103515245 + 12345 + i
  let v_pos = case v < 0 {
    True -> -v
    False -> v
  }
  v_pos % len
}

fn gen_string(seed: Int, tokens: List(String), n: Int) -> String {
  let len = list.length(tokens)
  let seq = list.range(0, n - 1)
  seq
  |> list.map(fn(i) {
    let j = idx_for(seed, i, len)
    case list.drop(tokens, j) {
      [first, ..] -> first
      [] -> ""
    }
  })
  |> list.fold("", fn(acc, s) { acc <> s })
}

fn run_cfg(seed: Int, n: Int, tokens: List(String)) -> Bool {
  let s = gen_string(seed, tokens, n)
  // Roundtrip: unescape(escape(s)) == s
  let escaped = str.escape_html(s)
  let unescaped = str.unescape_html(escaped)
  assert unescaped == s

  // Escaped string must not contain raw angle brackets or quotes
  assert string.contains(escaped, "<") == False
  assert string.contains(escaped, ">") == False
  assert string.contains(escaped, "\"") == False
  assert string.contains(escaped, "'") == False

  True
Copilot AI Jan 9, 2026

The function returns True but test functions in gleeunit should return Nil. Additionally, the run_cfg calls on lines 65-67 have return values that are ignored. Consider removing the return value from both run_cfg and this test function.
}

pub fn fuzz_roundtrip_test() {
  let tokens = gen_token_pool()

  run_cfg(1, 20, tokens)
  run_cfg(42, 50, tokens)
  run_cfg(123, 200, tokens)

  True
Copilot AI Jan 9, 2026

The test function returns True but this value is never used. In Gleam tests using gleeunit, test functions should return Nil. The function should not return True at the end.
}
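
Both Copilot comments on this file ask for the trailing `True` to be dropped. One possible shape for that change is sketched below. It is not the author's code: `check_roundtrip` and `roundtrip_samples_test` are hypothetical names, the fixed sample strings stand in for `gen_string(seed, tokens, n)`, and the sketch assumes `str.escape_html`, `str.unescape_html` and the `assert` keyword behave as they do in the tests in this diff.

```gleam
import gleam/list
import gleam/string
import str

// Hypothetical helper: the same checks as run_cfg, but it takes the input
// string directly and ends on list.each, so it returns Nil instead of True.
fn check_roundtrip(s: String) -> Nil {
  let escaped = str.escape_html(s)

  // Roundtrip: unescape(escape(s)) == s
  assert str.unescape_html(escaped) == s

  // Escaped output must not contain raw angle brackets or quotes.
  list.each(["<", ">", "\"", "'"], fn(raw) {
    assert string.contains(escaped, raw) == False
  })
}

pub fn roundtrip_samples_test() {
  // Assumed sample inputs; the real test would pass generated strings.
  let samples = [
    "Tom & Jerry",
    "<div class=\"x\">It's</div>",
    "&amp;&#x27;",
  ]
  list.each(samples, check_roundtrip)
}
```

Because `list.each` returns `Nil`, neither function needs an explicit return value, and no call site is left with an ignored `Bool`.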
31 changes: 31 additions & 0 deletions test/str_html_escape_test.gleam
@@ -0,0 +1,31 @@
import str

pub fn escape_basic_test() {
  assert str.escape_html("<div>Hello</div>") == "&lt;div&gt;Hello&lt;/div&gt;"
  assert str.escape_html("Tom & Jerry") == "Tom &amp; Jerry"
  assert str.escape_html("Say \"hello\"") == "Say &quot;hello&quot;"
}

pub fn unescape_basic_test() {
  assert str.unescape_html("&lt;div&gt;") == "<div>"
  assert str.unescape_html("Tom &amp; Jerry") == "Tom & Jerry"
  assert str.unescape_html("Say &quot;hello&quot;") == "Say \"hello\""
  assert str.unescape_html("It&#39;s me") == "It's me"
}

pub fn roundtrip_test() {
  let s = "Hello & < > \""
  let escaped = str.escape_html(s)
  assert str.unescape_html(escaped) == s
}

pub fn numeric_entities_test() {
  // Decimal numeric entity
  assert str.unescape_html("I like &#39;quotes&#39;") == "I like 'quotes'"

  // Hex numeric entity
  assert str.unescape_html("Hex: &#x27;") == "Hex: '"

  // Double quote numeric and hex
  assert str.unescape_html("&quot; and &#34; and &#x22;") == "\" and \" and \""
}