feat: PostgreSQL extension for Icelandic full-text search#1

Open
jokull wants to merge 1 commit into main from feat/postgres-extension

jokull commented Feb 1, 2026

Summary

  • C extension embedding lemma-is binary data for native PostgreSQL FTS
  • icelandic_lexize(), icelandic_fts_lemmas(), icelandic_fts_query(), icelandic_tsvector() functions
  • Docker setup for dev/testing
  • Integration tests verifying parity with JS implementation

Usage

CREATE EXTENSION icelandic_fts;

-- Lemmatize a word
SELECT icelandic_lexize('hestinum');  -- {hestur}

-- Build tsvector for indexing
SELECT icelandic_tsvector('Börnin fóru í bíó');

-- Search with lemma expansion
SELECT * FROM documents
WHERE search_vector @@ to_tsquery('simple', icelandic_fts_query('hestur'));
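
For readers wondering what a "tsquery string" with lemma expansion could look like, here is a minimal TypeScript sketch. It is an assumption, not the extension's actual logic: this description does not show the output format of icelandic_fts_query(), and the helper name `buildTsquery` and its input shape are hypothetical. The sketch ORs the candidate lemmas of each term and ANDs the terms together.

```typescript
// Hypothetical helper, not part of the extension or of lemma-is.
// Each inner array holds the candidate lemmas for one query term.
function buildTsquery(lemmaGroups: string[][]): string {
  return lemmaGroups
    .filter((group) => group.length > 0) // skip terms that produced no lemmas
    .map((group) => {
      const unique = Array.from(new Set(group)); // drop duplicate lemmas
      // Quote each lexeme and double any quote or backslash inside it,
      // so operator characters in a lemma are treated as plain text.
      const quoted = unique.map(
        (lemma) => `'${lemma.replace(/['\\]/g, (ch) => ch + ch)}'`
      );
      return quoted.length > 1 ? `(${quoted.join(" | ")})` : quoted[0];
    })
    .join(" & "); // every term must match
}

// Made-up input for illustration only.
console.log(buildTsquery([["hestur"]]));
```

A string built this way could in principle be passed to to_tsquery('simple', ...) in the same way as the result of icelandic_fts_query() above. An empty input yields an empty string, which a caller would need to handle before querying.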

Test plan

  • pnpm test tests/pg-extension.integration.test.ts passes
  • Manual testing with real corpus
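
As a rough illustration of what a parity check between the extension and the JS implementation might compare, the sketch below treats two lemma lists as equal when they contain the same lemmas regardless of order or duplicates. The helper `sameLemmaSet` is hypothetical, and order-insensitive comparison is an assumption; the actual test file is not shown in this description.

```typescript
// Hypothetical comparison helper; the real integration test may differ.
function sameLemmaSet(fromPostgres: string[], fromJs: string[]): boolean {
  const left = new Set(fromPostgres);
  const right = new Set(fromJs);
  if (left.size !== right.size) return false;
  // Sizes match, so checking one direction is sufficient.
  return Array.from(left).every((lemma) => right.has(lemma));
}

// Made-up inputs for illustration only.
console.log(sameLemmaSet(["hestur"], ["hestur"]));
```

If the order of lemmas is meaningful for ranking, a strict element-by-element comparison would be the safer choice.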

🤖 Generated with Claude Code

C extension that embeds lemma-is binary data for native PostgreSQL FTS:

Functions:
- icelandic_lexize(text) → text[] - lemmatize single word
- icelandic_fts_lemmas(text) → text[] - extract indexable lemmas
- icelandic_fts_query(text) → text - build tsquery string
- icelandic_tsvector(text) → tsvector - full document indexing

Includes:
- Docker setup for development/testing
- Integration tests verifying parity with JS implementation
- Codegen script for static data (stopwords, rules, etc.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>