
Fix missing Presidio recognizers for URL, US_SSN, CRYPTO, etc.#69

Merged
sgasser merged 1 commit into main from fix/missing-presidio-recognizers
Feb 9, 2026

@sgasser commented Feb 9, 2026

Summary

  • Restores the Presidio recognizers whose absence caused detection failures for URL, US_SSN, US_PASSPORT, CRYPTO, and MEDICAL_LICENSE
  • Adds a comprehensive set of recognizers, split into global (all languages) and language-specific (loaded only when that language is configured)
  • Optimizes for performance by loading only the recognizers relevant to the configured languages

Changes

  • Add GLOBAL_RECOGNIZERS list with 7 pattern-based recognizers (Email, URL, IP, IBAN, CreditCard, Crypto, Date)
  • Add LANGUAGE_RECOGNIZERS dict for language-specific recognizers:
    • en: 13 recognizers (US, UK, AU, SG)
    • es: 2 recognizers (Spanish NIF, NIE)
    • it: 5 recognizers (Italian docs)
    • pl: 1 recognizer (PESEL)
    • ko: 1 recognizer (RRN)
  • Generate the recognizer config dynamically from the configured languages

Test plan

  • Built Docker image with fix
  • Verified Presidio starts without missing recognizer warnings
  • Tested URL detection: ✓ detected
  • Tested CRYPTO detection: ✓ detected
  • Tested US_PASSPORT detection: ✓ detected
  • Verified all recognizers loaded via /recognizers endpoint
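
The last check can be scripted along these lines. This is a sketch under assumptions: it relies on the Presidio analyzer's `GET /recognizers?language=...` endpoint returning a JSON list of recognizer names, and the helper names and base URL are hypothetical, not taken from this PR.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_loaded_recognizers(base_url, language):
    """Ask the analyzer which recognizers are loaded for one language."""
    url = f"{base_url}/recognizers?{urlencode({'language': language})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def missing_recognizers(loaded, expected):
    """Return the expected recognizer names absent from the loaded list."""
    loaded_set = set(loaded)
    return [name for name in expected if name not in loaded_set]


def check_recognizers(base_url, language, expected):
    """Raise if any expected recognizer is not loaded for the language."""
    missing = missing_recognizers(fetch_loaded_recognizers(base_url, language), expected)
    if missing:
        raise RuntimeError(f"Missing recognizers for {language}: {missing}")
```

For example, `check_recognizers("http://localhost:5002", "en", ["UrlRecognizer", "UsSsnRecognizer", "CryptoRecognizer"])` would fail loudly if the container started without one of them; the host and port are placeholders for wherever the analyzer is exposed.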

Fixes #67

@sgasser force-pushed the fix/missing-presidio-recognizers branch from 215d264 to beefe97 on February 9, 2026 08:02
The config generator only included 6 recognizers, missing standard ones
like UrlRecognizer, UsSsnRecognizer, CryptoRecognizer. This caused
detection failures when users enabled these entity types.

Changes:
- Add GLOBAL_RECOGNIZERS for pattern-based detection (7 recognizers)
- Add LANGUAGE_RECOGNIZERS for language-specific detection
- Only load language-specific recognizers when that language is configured
- EN: US + UK recognizers (8)
- ES: Spanish NIF/NIE (2)
- IT: Italian documents (5)
- PL: Polish PESEL (1)
- KO: Korean RRN (1)

Fixes #67
@sgasser force-pushed the fix/missing-presidio-recognizers branch from beefe97 to 4cb908d on February 9, 2026 08:04
@sgasser merged commit 5871fa5 into main on Feb 9, 2026
3 checks passed
@sgasser deleted the fix/missing-presidio-recognizers branch on February 23, 2026 06:35
Development

Successfully merging this pull request may close these issues.

[Bug] Missing Standard Presidio Recognizers (URL, US_SSN) cause detection failures