Copilot AI commented Dec 19, 2025

Implements three v2.0 enhancements: JSON structure-preserving redaction, four Turkish PII recognizers (VKN, SGK, license plates, passports), and comprehensive test coverage.

Changes

JSON-Safe Redaction

  • Added RedactionStyle.JsonSafe enum value
  • Implemented JSON-aware redaction that parses JSON, redacts only values (preserving keys), and falls back to partial redaction on parse failure
  • Email-specific handling maintains readability: test@example.com → t***@***.com
var opt = new DataGuardianOptions { Redaction = RedactionStyle.JsonSafe };
// Input:  {"email":"user@example.com","name":"Ali Veli"}
// Output: {"email":"u***@***.com","name":"Ali Veli"}  // name not PII, email redacted
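
The PR's implementation is not shown here, so the following is only a minimal sketch of how value-only redaction with a fallback could work, assuming .NET 6+ and System.Text.Json.Nodes. The class name JsonSafeSketch, the email regex, and the masking helper are all hypothetical; the real code delegates to the library's recognizers rather than a single pattern.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.RegularExpressions;

public static class JsonSafeSketch
{
    // Hypothetical email pattern, standing in for the library's real recognizers.
    private static readonly Regex Email =
        new(@"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\.([A-Za-z]{2,})");

    // Keeps the first character and the top-level domain: user@example.com -> u***@***.com
    private static string MaskEmails(string text) =>
        Email.Replace(text, m => $"{m.Groups[1].Value}***@***.{m.Groups[2].Value}");

    public static string Redact(string body)
    {
        JsonNode? root;
        try { root = JsonNode.Parse(body); }
        catch (JsonException) { return MaskEmails(body); } // fallback: redact as plain text

        if (root is null) return body;                     // the JSON literal null
        if (TryMask(root, out var masked))                 // the root itself is a string
            return JsonValue.Create(masked)!.ToJsonString();

        RedactInPlace(root);
        return root.ToJsonString();
    }

    // Walks objects and arrays, replacing string values only; keys are never touched.
    private static void RedactInPlace(JsonNode? node)
    {
        switch (node)
        {
            case JsonObject obj:
                foreach (var key in obj.Select(p => p.Key).ToList())
                {
                    if (TryMask(obj[key], out var masked)) obj[key] = masked;
                    else RedactInPlace(obj[key]);
                }
                break;
            case JsonArray arr:
                for (int i = 0; i < arr.Count; i++)
                {
                    if (TryMask(arr[i], out var masked)) arr[i] = masked;
                    else RedactInPlace(arr[i]);
                }
                break;
        }
    }

    // True only for JSON string values; numbers, booleans, and containers are skipped.
    private static bool TryMask(JsonNode? node, out string masked)
    {
        masked = "";
        if (node is JsonValue v && v.TryGetValue<string>(out var s))
        {
            masked = MaskEmails(s);
            return true;
        }
        return false;
    }
}
```

Because only string values are rewritten and the result is re-serialized from the node tree, the output stays valid JSON whenever the input was valid. Input that fails to parse goes through the plain-text path instead.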

Turkish Identity Recognizers

VknRecognizer (Vergi Kimlik Numarası, weight: 9)

  • 10-digit tax ID with Modulo 10 checksum validation
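
The PR text only says "Modulo 10 checksum" and does not show the code. As an illustration, here is a sketch of the commonly published VKN check-digit calculation, in which the first nine digits are transformed position by position and the tenth digit must equal the mod-10 complement of their sum. Whether VknRecognizer uses exactly this formula is an assumption; the class name VknChecksumSketch is hypothetical.

```csharp
using System;
using System.Linq;

public static class VknChecksumSketch
{
    public static bool IsValid(string? candidate)
    {
        // Exactly ten ASCII digits; null and malformed input are rejected up front.
        if (candidate is null || candidate.Length != 10 ||
            !candidate.All(c => c >= '0' && c <= '9'))
            return false;

        int sum = 0;
        for (int i = 0; i < 9; i++)
        {
            int digit = candidate[i] - '0';
            int tmp = (digit + 9 - i) % 10;          // shift by position
            int term = (tmp * (1 << (9 - i))) % 9;   // multiply by 2^(9-i), reduce mod 9
            if (tmp != 0 && term == 0) term = 9;     // a non-zero input never maps to 0
            sum += term;
        }

        int check = (10 - sum % 10) % 10;            // expected tenth digit
        return check == candidate[9] - '0';
    }
}
```

Under this formula the sequence 1234567890 passes (every shifted digit becomes 0, so the check digit is 0), while 1234567891 fails.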

SgkRecognizer (SGK Sicil Numarası, weight: 7)

  • 12-digit social security number

LicensePlateRecognizer (weight: 5)

  • Pattern: \d{2}\s?[A-Z]{1,3}\s?\d{2,4} (e.g., "34 ABC 1234")
  • Turkish language only
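
To make the pattern concrete, the sketch below applies it with .NET's Regex and reports each hit with its position. The surrounding \b word boundaries and the LicensePlateSketch name are assumptions added for illustration; the PR does not say how the recognizer anchors its matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LicensePlateSketch
{
    // Pattern from the PR description, wrapped in \b (an assumption) so that
    // plates embedded in longer alphanumeric runs are not reported.
    private static readonly Regex Plate =
        new(@"\b\d{2}\s?[A-Z]{1,3}\s?\d{2,4}\b");

    // Returns every match together with its start offset in the input.
    public static List<(int Start, string Value)> Find(string text)
    {
        var hits = new List<(int Start, string Value)>();
        foreach (Match m in Plate.Matches(text))
            hits.Add((m.Index, m.Value));
        return hits;
    }
}
```

Note that the pattern is case-sensitive and the whitespace is optional, so "34ABC1234" matches while "34 abc 1234" does not.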

PassportRecognizer (weight: 8)

  • Pattern: [A-Z]\d{8} (e.g., "U12345678")
  • Turkish and English contexts
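
One way to read "Turkish and English contexts" is that the recognizer only runs when the request language is "tr" or "en". The sketch below assumes that reading; the language parameter, the \b boundaries, and the PassportSketch name are hypothetical and not taken from the PR's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PassportSketch
{
    // Pattern from the PR description; \b keeps it from matching inside longer tokens.
    private static readonly Regex Passport = new(@"\b[A-Z]\d{8}\b");

    // Assumed language gate: only "tr" and "en" are scanned.
    private static readonly HashSet<string> Languages =
        new(StringComparer.OrdinalIgnoreCase) { "tr", "en" };

    public static List<string> Find(string text, string language)
    {
        var hits = new List<string>();
        if (!Languages.Contains(language)) return hits;
        foreach (Match m in Passport.Matches(text))
            hits.Add(m.Value);
        return hits;
    }
}
```

With the boundaries in place, a letter followed by seven or nine digits is not reported, which avoids partial matches inside longer identifiers.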

All recognizers are registered in DataGuardianEngine and added to the default weights and RedactTypes in DataGuardianOptions.

Test Coverage

  • RecognizerTests.cs: 6 tests covering valid/invalid cases for new recognizers
  • JsonRedactionTests.cs: 6 tests for JSON parsing, nesting, arrays, fallback behavior
  • IntegrationTests.cs: 5 E2E tests verifying detector integration and configuration

22/22 tests passing. CodeQL: 0 alerts.

Original prompt

Objective

Implement three high-priority enhancements for DataGuardian v2.0 roadmap:

1. JSON-Safe Redaction (Priority: High)

Problem: Current redaction (MaskAll, Partial, Hash) operates on the entire text string, potentially breaking JSON structure.

Solution:

  • Add new RedactionStyle.JsonSafe enum value
  • Implement JSON-aware redaction that:
    • Parses JSON content safely
    • Redacts only values containing PII, preserving keys
    • Maintains valid JSON structure
    • Falls back to regular redaction if JSON parsing fails
  • Update DataGuardianOptions.cs to include the new enum value
  • Update the Redact() method in DataGuardianMiddleware.cs to handle JSON-safe mode

Example:

// Before: { "email": "test@example.com", "name": "Ali Veli" }
// After (JsonSafe): { "email": "***@***.com", "name": "Ali Veli" }

2. Turkish Identity Detectors (Priority: High)

Expand PII detection for Turkish market by adding four new recognizers in src/Devoplus.DataGuardian/Recognizers/:

a) VknRecognizer.cs (Vergi Kimlik Numarası)

  • Pattern: 10 digits
  • Validation: Must start with 0-9, basic checksum validation
  • Type: "VKN"
  • Weight suggestion: 9 (add to default weights in DataGuardianOptions)

b) SgkRecognizer.cs (SGK Sicil Numarası)

  • Pattern: 12 digits
  • Type: "SGK"
  • Weight suggestion: 7

c) LicensePlateRecognizer.cs (Turkish License Plate)

  • Pattern: Turkish format - 2 digits + 1-3 letters + 2-4 digits (e.g., "34 ABC 1234", "06 XY 9876")
  • Type: "LICENSE_PLATE"
  • Weight suggestion: 5
  • Language: "tr" only

d) PassportRecognizer.cs (Passport Number)

  • Pattern: Turkish passport format - 1 letter + 8 digits (e.g., "U12345678")
  • Type: "PASSPORT"
  • Weight suggestion: 8
  • Support both "tr" and "en" languages

Requirements:

  • Each recognizer must implement IPiiRecognizer interface
  • Follow existing patterns from TcknRecognizer.cs and CreditCardRecognizer.cs
  • Include validation logic where applicable
  • Register all new recognizers in DataGuardianEngine.cs constructor
  • Add new types to RedactTypes default set in DataGuardianOptions.cs
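
The actual IPiiRecognizer signature is not shown in this prompt, so the following sketch uses a stand-in interface to illustrate the overall shape such a recognizer might take, using the 12-digit SGK rule as the example. IPiiRecognizerSketch, PiiHit, and every member on them are assumptions; the real interface in the repository may differ.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type; not the library's own.
public sealed record PiiHit(string Type, int Start, int Length, string Value);

// Hypothetical stand-in for IPiiRecognizer.
public interface IPiiRecognizerSketch
{
    string Type { get; }
    IEnumerable<PiiHit> Recognize(string? text);
}

public sealed class SgkRecognizerSketch : IPiiRecognizerSketch
{
    // Exactly 12 ASCII digits, not preceded or followed by another digit.
    private static readonly Regex TwelveDigits =
        new(@"(?<![0-9])[0-9]{12}(?![0-9])");

    public string Type => "SGK";

    public IEnumerable<PiiHit> Recognize(string? text)
    {
        // Null and empty input yield no hits (the project uses nullable reference types).
        if (string.IsNullOrEmpty(text)) yield break;

        foreach (Match m in TwelveDigits.Matches(text))
            yield return new PiiHit(Type, m.Index, m.Length, m.Value);
    }
}
```

The digit lookarounds keep a 13-digit run from being reported as an SGK number, which is the kind of validation logic the requirements ask for where a checksum is not specified.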

3. Expanded Test Coverage (Priority: High)

Create comprehensive tests in tests/Devoplus.DataGuardian.Tests/:

a) Update RecognizerTests.cs

Add test methods for each new recognizer:

  • Vkn_Valid_Sample_Is_Detected()
  • Vkn_Invalid_Sample_Is_Rejected()
  • Sgk_Valid_Sample_Is_Detected()
  • LicensePlate_Turkish_Format_Is_Detected()
  • LicensePlate_Invalid_Format_Is_Rejected()
  • Passport_Turkish_Format_Is_Detected()

b) Create new file: JsonRedactionTests.cs

Add comprehensive JSON redaction tests:

  • JsonSafe_Redacts_Only_Values_Not_Keys()
  • JsonSafe_Preserves_Valid_Json_Structure()
  • JsonSafe_Handles_Nested_Objects()
  • JsonSafe_Handles_Arrays()
  • JsonSafe_Falls_Back_On_Invalid_Json()
  • JsonSafe_Handles_Multiple_PII_Types()

c) Create new file: IntegrationTests.cs

Add end-to-end tests:

  • Engine_Detects_All_Turkish_Identifiers()
  • Engine_Calculates_Risk_With_New_Detectors()
  • Middleware_JsonSafe_Mode_Works_E2E()

Test Requirements:

  • Use xUnit framework (already configured)
  • Follow AAA pattern (Arrange, Act, Assert)
  • Include both positive and negative test cases
  • Test edge cases (empty strings, null values, malformed input)

Success Criteria

  • JSON-safe redaction preserves valid JSON structure
  • All 4 new Turkish identity detectors work correctly
  • New detectors are registered in engine and weights
  • Test coverage includes all new features
  • All existing tests still pass
  • Code follows existing conventions and style

Additional Notes

  • Maintain backward compatibility - existing configurations should work unchanged
  • Updating README.md is NOT required in this PR (will be done separately)
  • Ensure all new code has proper null handling (project uses nullable reference types)

This pull request was created from Copilot chat.



Copilot AI self-assigned this Dec 19, 2025
Copilot AI and others added 2 commits December 19, 2025 07:48
…ive tests

Co-authored-by: saracmert <5950989+saracmert@users.noreply.github.com>
Co-authored-by: saracmert <5950989+saracmert@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement JSON-safe redaction for DataGuardian Add JSON-safe redaction and Turkish identity detectors Dec 19, 2025
Copilot AI requested a review from saracmert December 19, 2025 07:55