@marcarl commented on Jan 8, 2026

Implements tag_swedish_amounts, a function that identifies Swedish currency amounts (kronor,
kr, SEK) and percentages (%, procent) and wraps them in semantic <data> tags containing:

  • id: context-aware slug (e.g., "avgift-1500-kr", "ranta-8-5-procent")
  • type: "amount" or "percentage"
  • value: normalized numeric value

The function:

  • Handles Swedish number formats (space separators, decimal comma)
  • Supports multipliers (miljoner, miljarder, tusen)
  • Extracts context words for descriptive slugs
  • Skips markdown headers and XML/HTML tags
  • Includes 48 unit tests
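
A minimal sketch of the number handling described above, assuming a single regular expression does the matching. The names `MULTIPLIERS`, `PATTERN`, `normalize` and `tag_amounts` are hypothetical; this is not the PR's tag_swedish_amounts, and it leaves out the id attribute entirely. Markdown headers are detected as lines starting with "#", and text inside `<...>` tags is left alone by splitting on tags first.

```python
import re
from decimal import Decimal
from typing import Optional

# Multiplier words listed in the PR description (hypothetical mapping).
MULTIPLIERS = {"tusen": 1_000, "miljoner": 1_000_000, "miljarder": 1_000_000_000}

# A number with space-separated thousand groups and/or a decimal comma
# ("1 500", "8,5"), an optional multiplier word, then a currency or percent unit.
PATTERN = re.compile(
    r"(?<!\d)"
    r"(?P<num>\d{1,3}(?:[ \u00a0]\d{3})+(?:,\d+)?|\d+(?:,\d+)?)"
    r"(?:\s+(?P<mult>tusen|miljoner|miljarder))?"
    r"\s*(?P<unit>kronor|kr\b|SEK\b|procent|%)"
)
TAG = re.compile(r"(<[^>]+>)")  # capture group keeps tags in the split result


def normalize(num: str, mult: Optional[str]) -> str:
    """'1 500' -> '1500', '8,5' -> '8.5', ('2', 'miljoner') -> '2000000'."""
    cleaned = num.replace(" ", "").replace("\u00a0", "").replace(",", ".")
    value = Decimal(cleaned) * MULTIPLIERS.get(mult, 1)
    return format(value.normalize(), "f")  # plain notation, no trailing zeros


def _wrap(m: "re.Match[str]") -> str:
    kind = "percentage" if m.group("unit") in ("procent", "%") else "amount"
    value = normalize(m.group("num"), m.group("mult"))
    return f'<data type="{kind}" value="{value}">{m.group(0)}</data>'


def tag_amounts(text: str) -> str:
    out = []
    for line in text.split("\n"):
        if line.lstrip().startswith("#"):
            out.append(line)  # leave markdown headers untouched
            continue
        parts = TAG.split(line)  # odd indices are tags, even indices are text
        out.append("".join(
            part if i % 2 else PATTERN.sub(_wrap, part)
            for i, part in enumerate(parts)
        ))
    return "\n".join(out)
```

For example, "Avgiften är 1 500 kronor." becomes `Avgiften är <data type="amount" value="1500">1 500 kronor</data>.`. Using Decimal rather than float keeps values such as 8,5 exact, and normalize() drops trailing zeros, so "1,50 miljoner kronor" gets value 1500000.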

claude added 7 commits January 8, 2026 19:55

Remove the numeric value and unit from the id attribute, keeping only the
context-derived identifier (e.g., "avgift", "ranta", "moms").

This allows tracking the same data point across law amendments,
since the id stays constant while only the value changes.

Before: id="avgift-1500-kr"
After:  id="avgift"

Replace context-based slug generation with positional ids that can be
mapped to descriptive slugs via a reference table.

Changes:
- Generate positional ids like "kap5.2-belopp-1" based on section + type + position
- Add reference table support (data/amount-references.json) for custom slugs
- Section tags in the text automatically set the current section id
- Counters reset when entering a new section

This approach allows:
- Consistent ids across law amendments (same position = same id)
- Human/LLM curation of descriptive slugs like "riksbankens-referensranta"
- Tracking value changes over time using the stable id

Example with reference table:
  {"kap5.2-belopp-1": "tillstandsavgift"}

Output:
  <data id="tillstandsavgift" type="amount" value="1500">1 500 kronor</data>
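
The positional scheme can be sketched as a small counter object, assuming the caller invokes enter_section() whenever it meets a section tag. `PositionalIds`, `enter_section` and `next_id` are hypothetical names, and the "procent" label for percentages is a guess; only "belopp" appears in the PR's examples. Ids follow this commit's format, without an SFS prefix.

```python
from collections import Counter
from typing import Dict, Optional

# "belopp" appears in the PR's examples; "procent" is an assumption.
TYPE_LABELS = {"amount": "belopp", "percentage": "procent"}


class PositionalIds:
    """Hypothetical helper: numbers amounts/percentages per section and type."""

    def __init__(self, references: Optional[Dict[str, str]] = None):
        self.references = references or {}  # positional id -> curated slug
        self.section = ""
        self.counters = Counter()

    def enter_section(self, section_id: str) -> None:
        # A new section tag resets all per-type counters.
        self.section = section_id
        self.counters.clear()

    def next_id(self, kind: str) -> str:
        self.counters[kind] += 1
        positional = f"{self.section}-{TYPE_LABELS[kind]}-{self.counters[kind]}"
        # Prefer the curated slug when the reference table has one.
        return self.references.get(positional, positional)
```

Keeping one counter per type means that a percentage appearing in a section does not shift the numbering of the amounts in the same section.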

Include the SFS designation (e.g., "2024:123") in positional ids to enable:
- Unique identification across different laws
- Tracking value changes when the same slug maps to multiple SFS versions

New id format: sfs-2024-123-kap5.2-belopp-1

Reference table now supports tracking changes over time:
{
  "sfs-2020-100-kap5.2-belopp-1": "tillstandsavgift",
  "sfs-2024-123-kap5.2-belopp-1": "tillstandsavgift"
}

Both resolve to id="tillstandsavgift" but with different values,
allowing comparison of the same data point across amendments.

Also extracts the SFS id from <article selex:id="lag-2024-123"> tags.
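
A sketch of the SFS extraction, assuming the designation always appears in the lag-YYYY-N form shown above. The regex and the function names `sfs_from_article` and `sfs_prefix` are hypothetical.

```python
import re
from typing import Optional

# Matches e.g. <article selex:id="lag-2024-123"> (only the "lag-" form is handled).
ARTICLE_ID = re.compile(r'<article\b[^>]*\bselex:id="lag-(\d{4})-(\d+)"')


def sfs_from_article(text: str) -> Optional[str]:
    """Return the SFS designation, e.g. '2024:123', or None if no article tag matches."""
    m = ARTICLE_ID.search(text)
    return f"{m.group(1)}:{m.group(2)}" if m else None


def sfs_prefix(sfs: str) -> str:
    """'2024:123' -> 'sfs-2024-123', the prefix used in positional ids."""
    return "sfs-" + sfs.replace(":", "-")
```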

Change the positional id format from:
  sfs-2024-123-kap5.2-belopp-1
To:
  sfs-2024-123/kap5.2-belopp-1

The "/" creates a clearer visual hierarchy:
- Before slash: the law (SFS designation)
- After slash: position within the document

Added test for reference table slug resolution.
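
With "/" as the only separator between the two parts, splitting an id back into law and position is a single call. A sketch with hypothetical helper names:

```python
from typing import Tuple


def make_id(sfs: str, position: str) -> str:
    """'2024:123' + 'kap5.2-belopp-1' -> 'sfs-2024-123/kap5.2-belopp-1'."""
    return f"sfs-{sfs.replace(':', '-')}/{position}"


def split_id(positional_id: str) -> Tuple[str, str]:
    """Split at the single '/' into (law part, position within the document)."""
    law, position = positional_id.split("/", 1)
    return law, position


def sfs_of(law: str) -> str:
    """'sfs-2024-123' -> '2024:123' (assumes the 'sfs-' prefix)."""
    return law[len("sfs-"):].replace("-", ":", 1)
```

Under the earlier format, "sfs-2024-123-kap5.2-belopp-1".split("-") yields six pieces, and the caller has to know that the first three belong to the law.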

Add a function that finds amounts and percentages that still need slugs in the reference table.

Returns a list of dicts with:
- positional_id: the id that needs mapping
- type: "amount" or "percentage"
- value: normalized numeric value
- matched_text: original text matched
- context: surrounding text for understanding

Useful for batch curation of slugs with LLM assistance.
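
A sketch of the filtering step, assuming detection has already produced records with character offsets into the text. The `Match` record, `find_unmapped` and the 40-character context window are hypothetical choices, not the PR's function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Match:
    """Hypothetical record for one detected amount or percentage."""
    positional_id: str
    type: str
    value: str
    start: int  # character offsets into the source text
    end: int


def find_unmapped(text: str, matches: List[Match], references: Dict[str, str],
                  window: int = 40) -> List[dict]:
    missing = []
    for m in matches:
        if m.positional_id in references:
            continue  # already has a curated slug
        lo, hi = max(0, m.start - window), min(len(text), m.end + window)
        missing.append({
            "positional_id": m.positional_id,
            "type": m.type,
            "value": m.value,
            "matched_text": text[m.start:m.end],
            "context": " ".join(text[lo:hi].split()),  # collapse whitespace
        })
    return missing
```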

Add comprehensive reference table entries covering:
- Socialtjänstlagen (2025:400) - sanction fees
- Inkomstskattelagen (1999:1229) - base amounts, deductions, tax rates
- Socialförsäkringsbalken (2010:110) - sickness benefit, parental benefit
- Brottsbalken (1962:700) - penal provisions
- Aktiebolagslagen (2005:551) - capital requirements
- Räntelagen (1975:635) - reference interest rate
- And 27 more Swedish laws

This enables tracking of amount changes across law amendments using
stable descriptive slugs like "prisbasbelopp", "referensranta", etc.
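
Once several SFS versions map to the same slug, a value history falls out of a simple grouping. A sketch, assuming tagged data is available as (positional_id, value) pairs in the "/" id format; `history_by_slug` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def history_by_slug(tagged: List[Tuple[str, str]],
                    references: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (positional_id, value) pairs under their curated slug."""
    history = defaultdict(list)
    for positional_id, value in tagged:
        slug = references.get(positional_id)
        if slug is None:
            continue  # not yet curated
        law = positional_id.split("/", 1)[0]  # e.g. "sfs-2020-100"
        history[slug].append((law, value))
    return dict(history)
```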
@marcarl changed the title from "Add tag_swedish_amounts function for tagging monetary amounts and percentages" to "Märk upp belopp och avgifter i SFS som data-taggar" ("Mark up amounts and fees in SFS as data tags") on Jan 9, 2026
claude added 2 commits January 9, 2026 07:38
YAML supports inline comments, making it easier to document and
organize the reference table with section headers and annotations.

Changes:
- Convert data/amount-references.json to data/amount-references.yaml
- Update load_reference_table() to use yaml.safe_load()
- Replace json import with yaml import
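
A sketch of load_reference_table() built on yaml.safe_load(), assuming the PyYAML package and a flat mapping from positional id to slug. The default path comes from this PR; returning an empty dict for a missing or comment-only file is an assumption of this sketch.

```python
from pathlib import Path
from typing import Dict, Union

import yaml  # PyYAML


def load_reference_table(
    path: Union[str, Path] = "data/amount-references.yaml",
) -> Dict[str, str]:
    p = Path(path)
    if not p.exists():
        return {}  # assumption: no table means no curated slugs
    with p.open(encoding="utf-8") as f:
        data = yaml.safe_load(f)  # inline "# ..." comments are ignored
    return data or {}  # an empty or comment-only file loads as None
```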

Each entry now includes an inline comment with:
- The actual value (e.g., "57 300 kr", "80%")
- A text excerpt showing context from the law

Example:
  "sfs-1999-1229/kap2.1-belopp-1": prisbasbelopp  # 57 300 kr - "prisbasbeloppet enligt 2 kap."

This makes it easier to understand and verify each mapping.
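
A sketch of how such an entry line could be generated; `display_value`, `format_entry` and the 60-character excerpt limit are hypothetical. The Swedish-style formatting (space as thousands separator, decimal comma) mirrors the values shown in the comments, and the function reproduces the example line above.

```python
def display_value(value: str, kind: str) -> str:
    """'57300' -> '57 300 kr'; '80' (percentage) -> '80%'."""
    if kind == "percentage":
        return value.replace(".", ",") + "%"
    whole, _, frac = value.partition(".")
    grouped = f"{int(whole):,}".replace(",", " ")  # Swedish thousands grouping
    return f"{grouped},{frac} kr" if frac else f"{grouped} kr"


def format_entry(positional_id: str, slug: str, value: str, kind: str,
                 excerpt: str, max_len: int = 60) -> str:
    """Build one YAML line: quoted key, slug, then an inline comment."""
    excerpt = " ".join(excerpt.split())  # collapse line breaks from the law text
    if len(excerpt) > max_len:
        excerpt = excerpt[: max_len - 3] + "..."
    return f'"{positional_id}": {slug}  # {display_value(value, kind)} - "{excerpt}"'
```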
@marcarl linked an issue on Jan 9, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

Märka upp belopp och avgifter som data-taggar ("Mark up amounts and fees as data tags")

3 participants