Skip to content

Conversation

@tddang-linagora
Copy link
Collaborator

@tddang-linagora tddang-linagora commented Dec 24, 2025

Issue

Summary by CodeRabbit

  • Bug Fixes
    • Hardened regex-driven parsing for HTML/CSS, links, images, line-height, whitespace, emails/addresses and Base64 to prevent catastrophic slowdowns and improve correctness and performance.
  • New Features
    • Added a public helper to detect localhost email addresses and testing accessors for validation patterns.
  • Tests
    • Added comprehensive ReDoS/regression/performance stress tests and targeted address-parsing unit tests.
  • Documentation
    • Added an ADR describing the regex hardening strategy, mitigations, and testing approach.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 24, 2025

Warning

Rate limit exceeded

@tddang-linagora has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 44 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 836f077 and fedb4be.

📒 Files selected for processing (4)
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/html/html_utils.dart
  • core/test/utils/redos_vulnerability_test.dart

Walkthrough

Replaces many inline regular expressions with precompiled private RegExp fields across multiple modules (DomTransformer, NormalizeLineHeightInStyleTransformer, HtmlUtils, StringConvert, and AppUtils). Adds new helper methods and getters in StringConvert (including isEmailLocalhost, base64/email-localhost test getters) and adjusts email separator handling. Introduces a comprehensive ReDoS-focused test suite (core/test/utils/redos_vulnerability_test.dart) plus targeted string_convert tests and an ADR documenting the ReDoS mitigation strategy and affected files. No public API signature changes except additions in StringConvert noted above.

Suggested reviewers

  • hoangdat

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'TF-4224 Reduce ReDoS vulnerability' directly summarizes the main change: addressing ReDoS (Regular Expression Denial of Service) vulnerabilities through regex pattern optimization and refactoring.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/lib/utils/string_convert.dart (1)

169-170: Bug: contains with anchored regex always matches unintended input.

input.contains(RegExp(r'^[A-Za-z0-9+/=]+$')) does not work as intended. The contains method checks if the pattern matches any substring, but the ^...$ anchors only make sense for a full-string match. This means contains will return true if any single character matches [A-Za-z0-9+/=], which is almost always true.

Use RegExp(...).hasMatch(input) instead to test the entire string.

🔎 Proposed fix
      if (input.length % 4 == 0 &&
-          input.contains(RegExp(r'^[A-Za-z0-9+/=]+$'))) {
+          RegExp(r'^[A-Za-z0-9+/=]+$').hasMatch(input)) {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29b54a7 and a512142.

📒 Files selected for processing (5)
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/html/html_utils.dart
  • core/lib/utils/string_convert.dart
  • lib/main/utils/app_utils.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • lib/main/utils/app_utils.dart
  • core/lib/utils/html/html_utils.dart
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • core/lib/utils/string_convert.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • lib/main/utils/app_utils.dart
  • core/lib/utils/html/html_utils.dart
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • core/lib/utils/string_convert.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (default)
  • GitHub Check: Build web version and deploy
🔇 Additional comments (12)
lib/main/utils/app_utils.dart (1)

54-59: LGTM! Improved regex pattern reduces ReDoS risk.

The updated pattern uses non-capturing groups and a negated character class [^"\r\n]+ for quoted local-parts instead of the greedy .+. This eliminates potential backtracking issues while maintaining correct email validation semantics.

core/lib/presentation/utils/html_transformer/base/dom_transformer.dart (1)

28-47: LGTM! Stricter URL extraction pattern.

The updated regex with word boundary \b and negated character class [^' ")]+ improves matching precision and avoids catastrophic backtracking.

Note: The .replaceAll('\'', '').replaceAll('"', '') on line 37 is now redundant since the capture group already excludes quotes, but keeping it as defensive code is acceptable.

core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart (2)

18-22: Good use of RegExp.escape for safe pattern construction.

Building the alternation with escaped values prevents potential injection issues if _removableLineHeights were ever extended with regex-special characters.


32-33: Behavioral change: only literal spaces are now collapsed.

Changing from \s{2,} to {2,} means tabs and newlines are no longer collapsed. This is appropriate for inline style attributes where multiple literal spaces are the common case, and it reduces regex complexity.

core/lib/utils/html/html_utils.dart (4)

699-700: Good ReDoS mitigation with bounded quantifier.

The end tag pattern uses {0,128} to bound the match length, preventing catastrophic backtracking on malformed HTML. The start tag pattern is also appropriately constrained.


849-854: Safe dynamic tag pattern construction.

Using RegExp.escape on validTags before joining ensures special characters in tag names won't break the pattern. The resulting regex is well-structured with a non-capturing group.


869-873: Verify URL regex edge cases.

The URL pattern uses character classes that don't overlap excessively, reducing ReDoS risk. The negative lookbehind (?<![.,:;!?"')\]]) ensures trailing punctuation isn't captured.

However, the pattern [^\s<.]+(?:\.[^\s<.]+)* could still cause some backtracking on inputs with many dots. Consider testing with strings like www.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t to verify acceptable performance.


898-900: Well-anchored protocol check.

The anchored regex ^(?:https?|ftp|mailto|file) fails fast on non-matching input, which is efficient for the prefix check use case.

core/lib/utils/string_convert.dart (4)

12-13: Good separator pattern consolidation.

Including \n\r\t in the character class eliminates the need for separate .replaceAll() calls, reducing string allocations.


40-50: Effective pre-filtering before regex execution.

The preliminary length and whitespace checks (input.length % 4 == 0 && !input.contains(' ')) provide a fast-path rejection, reducing unnecessary regex evaluations. The Base64 character class regex is simple and anchored, posing no ReDoS risk.


138-160: Improved Markdown table detection with bounded patterns.

Using [ \t]* instead of \s* is more specific and avoids matching newlines within the pattern. The early exit logic when both conditions are determined is a good optimization.


179-193: Good pattern with negated character classes.

The named address pattern uses [^"]+, [^']+, and [^>]+ which cannot cause catastrophic backtracking. The memory-efficient iteration via for-in on allMatches is also a nice improvement.

@dab246
Copy link
Member

dab246 commented Dec 24, 2025

  • How to test it work?

@dab246
Copy link
Member

dab246 commented Dec 24, 2025

  • Initialize Regex as a const outside the function for reuse, avoiding multiple initializations.

@github-actions
Copy link

This PR has been deployed to https://linagora.github.io/tmail-flutter/4225.

@tddang-linagora
Copy link
Collaborator Author

How to test it work?

Tests added

Initialize Regex as a const outside the function for reuse, avoiding multiple initializations.

Done

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart (1)

19-23: Consider precomputing the line-height pattern as a static field.

Since _removableLineHeights is a const list, the alternation and pattern construction can be moved to a static final field instead of being rebuilt on every process() call.

Based on PR comments, this aligns with the recommendation to initialize regex outside functions for reuse.

🔎 Proposed refactor
 class NormalizeLineHeightInStyleTransformer extends DomTransformer {
   const NormalizeLineHeightInStyleTransformer();
 
   static const _removableLineHeights = ['1px', '100%'];
   static final _multipleSpacesRegex = RegExp(r' {2,}');
+  static final _lineHeightPattern = RegExp(
+    r'line-height\s*:\s*(?:' + 
+    _removableLineHeights.map(RegExp.escape).join('|') + 
+    r')\s*;?',
+    caseSensitive: false,
+  );
 
   @override
   Future<void> process({
     required Document document,
     required DioClient dioClient,
     Map<String, String>? mapUrlDownloadCID,
   }) async {
     try {
-      final alternation = _removableLineHeights.map(RegExp.escape).join('|');
-      final pattern = RegExp(
-        r'line-height\s*:\s*(?:' + alternation + r')\s*;?',
-        caseSensitive: false,
-      );
 
       final elements = document.querySelectorAll('[style*="line-height"]');
 
       for (final element in elements) {
         final originalStyle = element.attributes['style'];
         if (originalStyle == null) continue;
 
-        var updatedStyle = originalStyle.replaceAll(pattern, '').trim();
+        var updatedStyle = originalStyle.replaceAll(_lineHeightPattern, '').trim();
core/lib/utils/html/html_utils.dart (1)

860-868: Move tagRegex to a static field to avoid repeated compilation.

The tagRegex is currently reconstructed on every call to extractPlainText. Since validTags is already a compile-time constant, the regex pattern can be precomputed once as a static field.

🔎 Proposed refactor to eliminate repeated regex compilation

Add this static field near the other regex patterns (after line 36):

  static final _tagRemovalRegex = RegExp(
    '</?(?:${validTags.map(RegExp.escape).join('|')})(?:\\s+[^>]*)?>',
    caseSensitive: false,
  );

Then update extractPlainText:

-    // Delete all remaining HTML tags → replace tag with space to avoid text sticking
-    final String tagsPipe = validTags.map(RegExp.escape).join('|');
-    final tagRegex = RegExp(
-      '</?(?:$tagsPipe)(?:\\s+[^>]*)?>',
-      caseSensitive: false,
-    );
-    cleaned = cleaned.replaceAll(tagRegex, ' ');
+    // Delete all remaining HTML tags → replace tag with space to avoid text sticking
+    cleaned = cleaned.replaceAll(_tagRemovalRegex, ' ');

This eliminates redundant regex compilation and improves performance, directly addressing dab246's feedback about avoiding multiple initializations.


Based on learnings, dab246 suggested: "Initialize Regex as a const outside the function for reuse, avoiding multiple initializations."

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a512142 and 83bd970.

📒 Files selected for processing (6)
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/html/html_utils.dart
  • core/lib/utils/string_convert.dart
  • core/test/utils/redos_vulnerability_test.dart
  • lib/main/utils/app_utils.dart
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/lib/presentation/utils/html_transformer/base/dom_transformer.dart
  • lib/main/utils/app_utils.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/lib/utils/html/html_utils.dart
  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/string_convert.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/lib/utils/html/html_utils.dart
  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/string_convert.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build web version and deploy
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (core)
🔇 Additional comments (8)
core/test/utils/redos_vulnerability_test.dart (1)

1-384: Excellent ReDoS test coverage!

This comprehensive test suite effectively validates that the regex patterns are safe from ReDoS attacks while maintaining correctness. The tests cover:

  • Performance bounds for various edge cases
  • Validation that legitimate inputs still work correctly
  • Multiple attack vectors (long strings, nested patterns, repetitive characters)
  • Regex reuse optimization verification

The time bounds (100-500ms) are reasonable for the operations being tested.

core/lib/utils/string_convert.dart (3)

42-73: LGTM! Efficient base64 validation approach.

The optimization to check length and whitespace before running the regex validation is a good performance improvement. This reduces unnecessary regex operations for strings that obviously aren't base64.


15-22: Good use of static regex patterns for ReDoS mitigation.

The introduction of static final regex fields (_base64ValidationRegex, _mdSeparatorRegex, _asciiArtRegex, _namedAddressRegex) aligns with the PR's goal of reducing ReDoS vulnerabilities by avoiding repeated regex compilation and using safer patterns.


13-13: The implementation is safe. extractNamedAddresses preprocesses all newlines to spaces before pattern matching (line 174: input.replaceAll('\n', ' ')), so quoted display names with these whitespace characters cannot cause unintended splitting. The existing test at lines 635–652 confirms this behavior passes. For extractEmailAddress, treating newlines/tabs/carriage returns as separators is intentional and covered by tests (line 188–196).

core/lib/utils/html/html_utils.dart (4)

710-711: HTML detection logic is sound.

The approach of checking for both start and end tags before attempting to parse is good defensive programming. The detection will be safer once the _htmlStartTagRegex pattern is corrected per the previous comment.


880-880: Good refactor to use shared URL regex.

Reusing _urlRegex instead of creating inline regex instances aligns with the PR objectives and improves performance.


933-935: Protocol detection is efficient and safe.

Using startsWith(_protocolRegex) with the anchored pattern is both ReDoS-safe and performant. The refactor correctly normalizes URLs that lack explicit protocols.


27-36: The _htmlStartTagRegex pattern is not vulnerable to ReDoS; no changes required.

The pattern <[a-zA-Z][^>\s]*[^>]*> correctly handles HTML tags. While it has overlapping character classes, it does not cause catastrophic backtracking because:

  • The pattern requires a closing > to match, which terminates the search instantly
  • Incomplete tags (like <a followed by spaces without >) fail to match in negligible time
  • The existing test suite (core/test/utils/redos_vulnerability_test.dart) specifically tests this with 10,000+ character inputs and confirms performance within 100ms
  • The code is already marked as "ReDoS-safe" and represents the intended fix for TF-4224

The proposed fix <[a-zA-Z][^>]{0,128}> would incorrectly limit tag content to 128 characters, breaking legitimate HTML with longer attributes.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/lib/utils/html/html_utils.dart (1)

27-40: Good ReDoS mitigation with precompiled regex patterns.

The introduction of static final regex fields addresses the ReDoS vulnerability and follows the PR review recommendation to avoid repeated initialization. A few observations:

  1. _htmlEndTagRegex uses a bounded quantifier {0,128} which is excellent for preventing catastrophic backtracking.

  2. _htmlStartTagRegex has a minor redundancy in [^>\s]*[^>]* — the second character class subsumes the first for non-whitespace characters. Consider simplifying:

🔎 Optional simplification
-  static final _htmlStartTagRegex = RegExp(r'<[a-zA-Z][^>\s]*[^>]*>');
+  static final _htmlStartTagRegex = RegExp(r'<[a-zA-Z][^>]*>');

However, the current pattern is still safe (linear backtracking) since the split point is determined by whitespace boundaries.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83bd970 and b52d1f9.

📒 Files selected for processing (3)
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
  • core/lib/utils/html/html_utils.dart
  • core/test/utils/string_convert_test.dart
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/lib/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/string_convert_test.dart
  • core/lib/utils/html/html_utils.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/string_convert_test.dart
  • core/lib/utils/html/html_utils.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: Build web version and deploy
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (core)
🔇 Additional comments (7)
core/test/utils/string_convert_test.dart (1)

818-843: Good edge case coverage for the refactored regex patterns.

These tests validate important edge cases for the extractNamedAddresses function after the ReDoS-safe refactoring:

  • Base64-like plain text not being corrupted
  • Bracket-only addresses with empty names
  • Quoted names containing commas (which would otherwise be separators)
  • Graceful handling of malformed input

Minor formatting nit: Line 819 has an extra leading space before the comment.

core/lib/utils/html/html_utils.dart (6)

31-35: URL regex is well-structured for ReDoS safety.

The pattern uses literal . as separators between [^\s<.]+ segments, which prevents exponential backtracking. The negative lookbehind correctly strips trailing punctuation from matched URLs.


37-40: Good use of RegExp.escape for dynamic pattern construction.

Building _tagRemovalRegex from the validTags list with proper escaping prevents regex injection and keeps the pattern maintainable.


714-715: LGTM!

The refactored code correctly uses the precompiled regex patterns for HTML detection.


864-867: LGTM!

The tag removal and whitespace normalization now use the shared precompiled patterns, eliminating repeated regex compilation overhead.


879-879: LGTM!

The URL regex is now passed as a shared instance rather than being constructed inline.


932-934: LGTM!

Using the precompiled _protocolRegex for protocol detection is efficient and correctly determines when to prepend https:// to relative URLs like www.example.com.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/test/utils/redos_vulnerability_test.dart (1)

68-93: Consider testing the actual transformer directly.

The tests create inline RegExp instances rather than exercising the actual NormalizeLineHeightInStyleTransformer. To fully validate the ReDoS fixes, consider calling the transformer's methods directly (if available) to ensure the static regex patterns are actually tested.

core/lib/utils/string_convert.dart (1)

25-26: Consider alternative approach for test visibility.

The public getter exposes internal implementation details solely for testing. Consider these alternatives:

  • Test the public behavior of methods using these patterns rather than the patterns themselves
  • If direct pattern testing is essential, document why this exposure is necessary

The current approach creates a permanent public API for a testing concern.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b52d1f9 and 652b0ce.

📒 Files selected for processing (2)
  • core/lib/utils/string_convert.dart
  • core/test/utils/redos_vulnerability_test.dart
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
📚 Learning: 2025-12-12T11:00:57.625Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4205
File: core/lib/utils/app_logger.dart:80-86
Timestamp: 2025-12-12T11:00:57.625Z
Learning: In the tmail-flutter codebase, logging philosophy differs by platform: web applications use the `webConsoleEnabled` parameter in `app_logger.dart` to selectively display logs in production console, while mobile applications intentionally suppress all logs in release builds (enforced by `BuildUtils.isDebugMode` check in `_debugPrint`).

Applied to files:

  • core/lib/utils/string_convert.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build web version and deploy
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (default)
🔇 Additional comments (10)
core/test/utils/redos_vulnerability_test.dart (6)

1-12: LGTM! Excellent documentation and setup.

The imports are appropriate, and the documentation clearly explains the purpose of these ReDoS vulnerability tests. The test structure follows best practices.


11-66: LGTM! Comprehensive ReDoS protection tests.

The background-image regex tests provide excellent coverage:

  • Performance validation against catastrophic backtracking
  • Correctness validation with various valid formats
  • Proper null handling for malformed inputs

95-177: LGTM! Thorough HtmlUtils regex validation.

Excellent coverage of HTML processing patterns:

  • Bounded quantifiers for tag matching
  • URL detection with malicious inputs
  • Both performance and correctness validation
  • Appropriate timing thresholds

179-274: LGTM! Comprehensive StringConvert pattern tests.

The tests provide excellent coverage:

  • Base64 validation with long inputs
  • Markdown and ASCII art detection
  • Named address extraction stress test (1000 addresses)
  • Email extraction with malicious separators
  • Table identification correctness

276-327: LGTM! Well-documented testing approach.

The localhost email regex tests provide good coverage. The comment clearly explains why the regex pattern is tested directly rather than through AppUtils.isEmailLocalhost. This is a reasonable approach given the module boundary constraints.


329-369: LGTM! Excellent stress and optimization tests.

The performance stress tests exercise various edge cases comprehensively. The regex initialization test on lines 362-367 effectively validates that regex patterns are static and reused rather than recreated, which is crucial for performance.

core/lib/utils/string_convert.dart (4)

13-23: LGTM! Excellent optimization addressing reviewer feedback.

The static final regex patterns address the reviewer's (dab246's) recommendation to initialize regex outside functions for reuse. This prevents repeated regex compilation and improves performance, which is crucial for ReDoS mitigation.

Based on learnings, this directly addresses past review feedback about regex initialization.


46-77: LGTM! Well-optimized base64 handling.

The two-step base64 validation approach is excellent:

  1. Quick pre-check (length % 4 and no spaces) eliminates obvious non-base64
  2. Regex validation only when pre-check passes

This prevents expensive regex operations on clearly invalid inputs, directly mitigating ReDoS risks.


143-163: LGTM! Efficient table detection with early exit.

The refactored logic effectively uses the static regex patterns and includes a smart early exit optimization (lines 158-159). This prevents unnecessary checks once both conditions are determined, improving performance without sacrificing correctness.


165-204: LGTM! Memory-efficient address extraction.

The refactored implementation uses direct iteration over _namedAddressRegex.allMatches (line 182) instead of building intermediate lists. This streaming approach reduces memory allocation, which is especially beneficial when processing large recipient lists.

@dab246
Copy link
Member

dab246 commented Dec 26, 2025

@coderabbitai please review on this change.

@coderabbitai
Copy link

coderabbitai bot commented Dec 26, 2025

@dab246, I'll review the changes in this PR that address ReDoS vulnerabilities by refactoring inline regex patterns to static final fields and adding comprehensive test coverage.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 652b0ce and f537019.

📒 Files selected for processing (1)
  • docs/adr/0069-redos-vulnerability-mitigation.md
🧰 Additional context used
🪛 LanguageTool
docs/adr/0069-redos-vulnerability-mitigation.md

[grammar] ~230-~230: Ensure spelling is correct
Context: ...trict time limits: - Simple patterns: < 100ms - Complex operations (e.g., extractNamedAd...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~235-~235: Ensure spelling is correct
Context: ...nd image regex: 10+ seconds → now < 100ms - URL regex with long URLs: *30+ seconds...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/adr/0069-redos-vulnerability-mitigation.md

273-273: Bare URL used

(MD034, no-bare-urls)


274-274: Bare URL used

(MD034, no-bare-urls)


275-275: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (default)
  • GitHub Check: Build web version and deploy
🔇 Additional comments (1)
docs/adr/0069-redos-vulnerability-mitigation.md (1)

1-277: Excellent, comprehensive ADR document.

This ADR effectively documents the ReDoS mitigation strategy with clear context, decision rationale, concrete examples, and a solid testing approach. The three mitigation strategies (pattern optimization, static compilation, algorithmic optimization) are well-explained with before/after code examples. The testing strategy with specific performance thresholds and categorized test cases demonstrates thorough planning. The document appropriately acknowledges trade-offs and includes follow-up actions.

@hoangdat
Copy link
Member

How about to add some kinds of tests?

test('Should handle Gmail newsletter HTML structure', () {
        // Real pattern from Gmail promotional emails
        const gmailHtml = '''
          <div dir="ltr">
            <table cellpadding="0" cellspacing="0" border="0" width="100%">
              <tr>
                <td style="background-image: url('https://ci3.googleusercontent.com/proxy/ABCDEFGHIJKLMNOP1234567890abcdefghijklmnop=s0-d-e1-ft#https://example.com/images/header.jpg'); background-size: cover;">
                  <p style="font-family: Arial, sans-serif;">Hello User,</p>
                  <p>Visit https://example.com/products for our latest offers.</p>
                </td>
              </tr>
            </table>
          </div>
        ''';

        final stopwatch = Stopwatch()..start();

        // Extract plain text
        final plainText = HtmlUtils.extractPlainText(gmailHtml);

        // Wrap URLs
        final linkedHtml = HtmlUtils.wrapPlainTextLinks(gmailHtml);

        stopwatch.stop();

        // Performance check
        expect(stopwatch.elapsedMilliseconds, lessThan(200),
            reason: 'Gmail HTML processing should be fast');

        // Functional check
        expect(plainText, contains('Hello User'));
        expect(plainText, contains('Visit'));
        expect(linkedHtml, contains('<a href='));
      });
test('Should handle URLs with complex query parameters', () {
        const complexUrls = [
          // Tracking parameters
          'https://example.com/article?utm_source=email&utm_medium=newsletter&utm_campaign=spring2024',
          // Nested URL in parameter
          'https://redirect.example.com?url=https://target.com/page&ref=email',
          // Fragment identifier
          'https://docs.example.com/guide#installation-steps',
          // Multiple fragments (invalid but should handle)
          'https://example.com/page#section1#subsection',
          // Query + Fragment
          'https://api.example.com/v1/users?id=123&format=json#results',
          // Encoded characters
          'https://example.com/search?q=hello%20world&lang=en',
        ];

        for (final url in complexUrls) {
          final html = '<p>Visit $url for details</p>';

          final stopwatch = Stopwatch()..start();
          final result = HtmlUtils.wrapPlainTextLinks(html);
          stopwatch.stop();

          // Performance check
          expect(stopwatch.elapsedMilliseconds, lessThan(100),
              reason: 'Should process quickly: $url');

          // Functional check
          expect(result, contains('<a href='),
              reason: 'Should wrap URL: $url');
          expect(result, contains(url.split('?')[0]),
              reason: 'Should preserve URL base: $url');
        }
      });
test('Should handle HTML tags at 128 character boundary', () {
        final exactly128 = '</div${'x' * 122}>';

        // 129 characters (over limit, should still process fast)
        final over128 = '</div${'x' * 123}>';

        // Very long (1000 chars - malicious)
        final malicious = '</div${'x' * 995}>';

        final stopwatch1 = Stopwatch()..start();
        HtmlUtils.addQuoteToggle('Text $exactly128 end');
        stopwatch1.stop();

        final stopwatch2 = Stopwatch()..start();
        HtmlUtils.addQuoteToggle('Text $over128 end');
        stopwatch2.stop();

        final stopwatch3 = Stopwatch()..start();
        HtmlUtils.addQuoteToggle('Text $malicious end');
        stopwatch3.stop();

        // All should complete quickly (bounded quantifier prevents backtracking)
        expect(stopwatch1.elapsedMilliseconds, lessThan(100),
            reason: '128 chars (boundary) should be fast');
        expect(stopwatch2.elapsedMilliseconds, lessThan(100),
            reason: '129 chars (over boundary) should be fast');
        expect(stopwatch3.elapsedMilliseconds, lessThan(100),
            reason: '1000 chars (malicious) should be fast');
      });

@hoangdat
Copy link
Member

  • test with special character in email
  • Deeply Nested HTML Structure
        final deepNesting = '<div>' * 50 +
                           '<p>Deep content with https://example.com link</p>' +
                           '</div>' * 50;

        // 100 levels (extreme)
        final extremeNesting = '<span>' * 100 +
                              'Content' +
                              '</span>' * 100;

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/test/utils/redos_vulnerability_test.dart (2)

449-500: Consider extracting the duplicated regex pattern.

The comment on lines 454-455 appropriately acknowledges the limitation of testing the inline pattern rather than the production code. However, the same regex pattern is duplicated three times (lines 456-459, 467-469, 484-486).

🔎 Suggested refactor to reduce duplication
    group('AppUtils._emailLocalhostRegex', () {
+     // Pattern must match the one in lib/main/utils/app_utils.dart
+     final localhostEmailRegex = RegExp(
+       r'^(?:"[^"\r\n]+"|[^<>()[\]\\.,;:\s@"]+(?:\.[^<>()[\]\\.,;:\s@"]+)*)@localhost$',
+     );
+
      test('should handle very long email addresses efficiently', () {
        final longEmail = '${'a' * 10000}@localhost';

        final stopwatch = Stopwatch()..start();
        // Note: We can't directly test AppUtils.isEmailLocalhost in core tests
        // But we can test the regex pattern
-       final regex = RegExp(
-         r'^(?:"[^"\r\n]+"|[^<>()[\]\\.,;:\s@"]+(?:\.[^<>()[\]\\.,;:\s@"]+)*)@localhost$',
-       );
-       final result = regex.hasMatch(longEmail);
+       final result = localhostEmailRegex.hasMatch(longEmail);
        stopwatch.stop();

Apply similar changes to the other two tests.


68-93: Test group name is misleading—tests regex patterns in isolation, not the transformer.

The test group labeled 'NormalizeLineHeightInStyleTransformer' (lines 68–93) recreates regex patterns inline rather than exercising the actual transformer. While the inline patterns (r'line-height\s*:\s*(?:1px|100%)\s*;?' and r' {2,}') currently match the transformer's static patterns, the tests don't validate against the production code.

A dedicated test suite already exists in core/test/utils/normalize_line_height_in_style_transformer_test.dart that properly instantiates the transformer and tests its process() method. Consider either renaming this group to clarify it tests regex performance in isolation, or consolidating redos vulnerability checks into the dedicated transformer test file to avoid confusion.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f537019 and 40805af.

📒 Files selected for processing (1)
  • core/test/utils/redos_vulnerability_test.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build web version and deploy
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (server_settings)
🔇 Additional comments (6)
core/test/utils/redos_vulnerability_test.dart (6)

1-10: LGTM! Excellent test structure and documentation.

The file is well-organized with clear documentation explaining the ReDoS vulnerability testing purpose. The imports are appropriate and the test group structure is clean.


11-66: LGTM! Comprehensive DomTransformer tests.

The tests properly exercise the production code through ImageTransformer.findImageUrlFromStyleTag with appropriate adversarial inputs (long strings, nested parentheses, malformed patterns) and validate both performance (<100ms) and correctness. The valid pattern tests ensure functionality is preserved after the ReDoS fix.


95-305: LGTM! Exceptionally comprehensive HtmlUtils test coverage.

These tests thoroughly address all requirements from the PR comments:

  • Gmail newsletter HTML structure with performance validation (<200ms) ✓
  • Complex query parameters (tracking, nested URLs, fragments, encoded chars) ✓
  • 128-character boundary testing at exactly 128, 129, and 1000 chars ✓
  • Deeply nested HTML (50 and 100 levels) with performance assertions ✓

The tests properly exercise production code and validate both performance and correctness, ensuring the ReDoS fixes don't break functionality.


307-447: LGTM! Thorough StringConvert pattern testing.

The tests comprehensively cover ReDoS scenarios including:

  • Long invalid base64 strings
  • Malicious markdown and ASCII art patterns
  • Bulk processing (1000 named addresses)
  • Special character emails as requested in PR comments

The functional tests for text table identification (both valid and invalid inputs) ensure correctness is preserved alongside performance improvements.


502-532: LGTM! Effective performance stress testing.

The try-catch approach with the comment "It's OK if these fail, we're just testing performance" is appropriate for stress testing. The 1-second threshold effectively validates that none of the regex patterns exhibit catastrophic backtracking behavior on adversarial inputs.


534-542: LGTM! Good verification of the regex reuse optimization.

This test validates that the regex patterns are initialized once and reused (as recommended by dab246 in the PR comments), using identical() to confirm reference equality rather than just pattern equality.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
core/test/utils/redos_vulnerability_test.dart (2)

68-93: Consider testing the actual transformer implementation instead of inline regex patterns.

These tests create new RegExp objects inline rather than exercising the actual NormalizeLineHeightInStyleTransformer static patterns. If the implementation's regex differs from these inline patterns, tests could pass while the actual code remains vulnerable.

🔎 Suggested approach

Consider either:

  1. Import and test the actual transformer class directly (similar to how ImageTransformer is tested in the previous group)
  2. Expose test getters for the static regex patterns similar to StringConvert.base64ValidationRegex (line 529)
// Option 1: Test via transformer
import 'package:core/presentation/utils/html_transformer/dom/normalize_line_height_in_style_transformer.dart';

test('should handle very long style strings efficiently', () {
  final transformer = NormalizeLineHeightInStyleTransformer();
  // Create a mock DOM element and test through the transformer
  ...
});

449-464: Consider moving the regex pattern to a shared location to avoid duplication.

The inline regex at lines 450-452 duplicates the pattern from AppUtils._emailLocalhostRegex. As noted in the comment, this is because core tests can't import from lib/main/. However, pattern duplication creates a maintenance risk — if the implementation pattern changes, this test would still pass with the old pattern.

🔎 Options to consider
  1. Move the regex to core: Extract _emailLocalhostRegex to a shared utility in core/lib/utils/ that both AppUtils and these tests can import.

  2. Add integration tests in lib/test/: Create lib/test/main/utils/app_utils_test.dart that can directly test AppUtils.isEmailLocalhost.

  3. Keep as-is with documentation: If the above aren't feasible, add a comment linking to the source pattern:

// Pattern duplicated from lib/main/utils/app_utils.dart - keep in sync
final localhostRegex = RegExp(
  r'^(?:"[^"\r\n]+"|[^<>()[\]\\.,;:\s@"]+(?:\.[^<>()[\]\\.,;:\s@"]+)*)@localhost$',
);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40805af and 4efa2f6.

📒 Files selected for processing (1)
  • core/test/utils/redos_vulnerability_test.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: Build web version and deploy
🔇 Additional comments (5)
core/test/utils/redos_vulnerability_test.dart (5)

1-9: LGTM!

Good test file structure with clear documentation of the test purpose. The imports are appropriate for the tested functionality.


11-66: LGTM!

Excellent ReDoS test coverage for the background-image regex. Tests cover:

  • Very long input strings (10000 chars)
  • Deeply nested parentheses (1000 levels)
  • Malformed patterns with excessive whitespace
  • Valid pattern variations with different quote styles

The time thresholds are reasonable and the tests use the actual ImageTransformer implementation.


95-305: LGTM!

Comprehensive test coverage for HtmlUtils regex patterns. The tests properly use the actual HtmlUtils implementation and cover:

  • Bounded quantifier behavior at the 128-character boundary
  • Gmail-like realistic HTML structures
  • Complex URL patterns with query params, fragments, and encoding
  • Deep nesting scenarios (50-100 levels)

Time thresholds are appropriately scaled for complexity.


526-533: Good test for regex reuse, consider extending coverage.

The identical() check correctly verifies static regex reuse. Consider adding similar checks for other precompiled patterns mentioned in the PR (e.g., HtmlUtils patterns if they expose test getters) to ensure the optimization is applied consistently.


494-523: LGTM!

Good stress test design — using try-catch to focus purely on performance characteristics without failing on expected parsing errors. The 1-second threshold is appropriately lenient for 100k+ character inputs.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
core/test/utils/redos_vulnerability_test.dart (2)

475-512: Mixed String/Matcher map pattern works but could be clearer.

The approach of storing both String and Matcher values in the same map (line 490) works correctly with the type-checking at lines 504-510, but it's unconventional. Consider using a consistent type for better readability.

🔎 Alternative approach for clarity
-      test('Should handle emails with special characters - valid cases', () {
-        // Valid emails that should be extracted successfully
-        final validEmailInputs = {
-          'user+tag@example.com': 'user+tag@example.com',
-          ...
-          'Multiple emails: alice@test.com, bob@test.org': contains('alice@test.com'),
-        };
-
-        for (final entry in validEmailInputs.entries) {
-          ...
-          if (expectedMatch is String) {
-            expect(result, contains(expectedMatch),
-                reason: 'Should extract "$expectedMatch" from: $input');
-          } else {
-            expect(result, expectedMatch,
-                reason: 'Should match expected pattern from: $input');
-          }
+      test('Should handle emails with special characters - valid cases', () {
+        // Valid emails that should be extracted successfully
+        final validEmailInputs = <String, String>{
+          'user+tag@example.com': 'user+tag@example.com',
+          ...
+          'Multiple emails: alice@test.com, bob@test.org': 'alice@test.com',
+        };
+
+        for (final entry in validEmailInputs.entries) {
+          ...
+          expect(result, contains(entry.value),
+              reason: 'Should extract "${entry.value}" from: ${entry.key}');
         }

657-663: Consider adding explicit ignore annotation for linter compliance.

The empty catch block is intentional per the comment, but adding an // ignore: comment or using _ for the exception variable would make this clearer and satisfy linter rules if applicable.

🔎 Suggested improvement
           try {
             StringConvert.extractEmailAddress(input);
             StringConvert.isTextTable('$input\n$input');
             HtmlUtils.extractPlainText(input);
-          } catch (e) {
-            // It's OK if these fail, we're just testing performance
+          } catch (_) {
+            // Intentionally ignored - focus is on performance, not correctness
           }
core/lib/utils/string_convert.dart (1)

59-90: Approve the optimization, but consider consistency with extractNamedAddresses.

The refactored base64 validation with early checks (length % 4 == 0 and !input.contains(' ')) is a good performance optimization before applying the regex.

However, Line 184 in extractNamedAddresses uses _base64ValidationRegex.hasMatch(input) without the length and whitespace checks. Consider applying the same optimization there for consistency and performance:

🔎 Suggested consistency improvement for extractNamedAddresses
-      if (input.length % 4 == 0 && _base64ValidationRegex.hasMatch(input)) {
+      if (input.length % 4 == 0 && !input.contains(' ') && _base64ValidationRegex.hasMatch(input)) {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4efa2f6 and 15fcc7f.

📒 Files selected for processing (3)
  • core/lib/utils/string_convert.dart
  • core/test/utils/redos_vulnerability_test.dart
  • lib/main/utils/app_utils.dart
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
  • lib/main/utils/app_utils.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
  • lib/main/utils/app_utils.dart
📚 Learning: 2025-12-12T11:00:57.625Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4205
File: core/lib/utils/app_logger.dart:80-86
Timestamp: 2025-12-12T11:00:57.625Z
Learning: In the tmail-flutter codebase, logging philosophy differs by platform: web applications use the `webConsoleEnabled` parameter in `app_logger.dart` to selectively display logs in production console, while mobile applications intentionally suppress all logs in release builds (enforced by `BuildUtils.isDebugMode` check in `_debugPrint`).

Applied to files:

  • core/lib/utils/string_convert.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build web version and deploy
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (forward)
🔇 Additional comments (12)
core/test/utils/redos_vulnerability_test.dart (5)

16-72: Well-structured ReDoS tests for background-image regex.

Good coverage of malicious inputs (long strings, nested parentheses) and valid patterns. The force unwrap on line 69 is safe given the preceding isNotNull assertion.


74-199: Comprehensive tests for line-height transformer.

Good coverage of edge cases including long styles, space normalization, case-insensitive matching, and empty style attribute removal. The tests effectively verify both performance and correctness.


201-411: Excellent coverage of HtmlUtils regex patterns.

Tests thoroughly cover the bounded quantifier mitigations (128-character boundary tests), deep nesting scenarios (50 and 100 levels as requested), and realistic inputs like Gmail newsletter HTML. The boundary tests on lines 354-382 specifically validate the {0,128} quantifier limit.


601-639: Good coverage for isEmailLocalhost.

Tests cover valid localhost patterns, including edge cases like quoted local parts, and properly verify rejection of invalid patterns.


673-681: Good verification of regex reuse optimization.

Using identical() correctly verifies that the regex pattern is a static field rather than being recreated on each access, which is a key part of the ReDoS mitigation strategy.

core/lib/utils/string_convert.dart (6)

7-7: LGTM! Good addition of visibleForTesting and expanded email separators.

The addition of semicolons, newlines, and tabs to the email separator pattern better handles real-world email address inputs and multi-line formats.

Also applies to: 13-14


34-39: LGTM! Clean API design.

The delegation to the static regex pattern and input normalization with trim() are appropriate.


155-176: LGTM! Efficient table detection with early exit optimization.

The refactored logic correctly distinguishes Markdown tables from ASCII art tables and includes an early exit when both conditions are determined, improving performance.


178-217: LGTM! Memory-efficient streaming iteration.

The refactoring to use _namedAddressRegex.allMatches(input) with direct iteration instead of building an intermediate list is a good memory optimization, especially for large inputs.


16-26: Comprehensive performance and edge-case tests have been implemented.

The requested tests already exist in core/test/utils/redos_vulnerability_test.dart and thoroughly validate the ReDoS mitigations:

Performance tests (with Stopwatch measurements):

  • Gmail newsletter HTML structure: <200ms ✓
  • URLs with complex query parameters (tracking params, nested URLs, fragments, encoded characters): <100ms each ✓
  • HTML tags at 128-character boundary: <100ms ✓

Edge-case tests:

  • Special characters in email addresses ✓
  • Deeply nested HTML structures (50-level and 100-level nesting) ✓
  • URLs with tracking parameters (utm_source, utm_medium, utm_campaign) ✓
  • Nested URLs, fragments, and encoded characters ✓
  • Localhost email validation ✓

Additional coverage includes validation of base64, markdown separator, ASCII art, and named address regex patterns with malicious inputs.


16-32: The localhost email regex is well-designed and handles edge cases appropriately.

The ReDoS mitigation approach of using static regex fields is excellent. All patterns are safe from catastrophic backtracking, including the _emailLocalhostRegex.

The pattern correctly handles the edge cases:

  • Quoted local parts allow special characters: "user@name"@localhost
  • Multiple consecutive dots are rejected: user..name@localhost
  • Leading/trailing dots are rejected: .user@localhost, user.@localhost
  • Case sensitivity is intentional: user@LOCALHOST requires lowercase "localhost" ✗
  • Newlines in quoted sections are properly excluded by [^"\r\n]+
lib/main/utils/app_utils.dart (1)

54-56: LGTM! Clean refactoring eliminates code duplication.

The delegation to StringConvert.isEmailLocalhost(email) centralizes the localhost email detection logic, eliminating duplicate regex patterns and improving maintainability.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/test/utils/redos_vulnerability_test.dart (2)

35-44: Clarify test expectation for nested parentheses.

The input '(' * 1000 + 'example.jpg' + ')' * 1000 creates a deeply malformed URL pattern. Line 43 asserts result, isNotNull, but this might be implementation-dependent. If the regex correctly rejects this malformed input, null would be the expected result.

Consider documenting whether this test verifies "performance under malicious input" (regardless of result) or "successful extraction despite nesting."


384-410: Consider testing link wrapping on extreme nesting.

For the 50-level nesting case, both extractPlainText and wrapPlainTextLinks are tested. However, for the 100-level extreme nesting (lines 396-398), only extractPlainText is called. If URL extraction should also work on deeply nested HTML, consider adding a similar wrapPlainTextLinks call.

This aligns with reviewer hoangdat's request to validate both correctness and performance on deep nesting.

🔎 Suggested addition
         final stopwatch2 = Stopwatch()..start();
         final plainText2 = HtmlUtils.extractPlainText(extremeNesting);
+        final linked2 = HtmlUtils.wrapPlainTextLinks(extremeNesting);
         stopwatch2.stop();

         // Performance checks
         expect(stopwatch1.elapsedMilliseconds, lessThan(200),
             reason: '50 levels of nesting should process quickly');
         expect(stopwatch2.elapsedMilliseconds, lessThan(300),
             reason: '100 levels of nesting should process quickly');

         // Functional checks
         expect(plainText1, contains('Deep content'));
         expect(linked1, contains('<a href='));
         expect(plainText2, contains('Content'));
+        // Verify extreme nesting doesn't break link wrapping
+        expect(linked2, isNotNull);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15fcc7f and 4cda176.

📒 Files selected for processing (2)
  • core/lib/utils/string_convert.dart
  • core/test/utils/redos_vulnerability_test.dart
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
  • core/lib/utils/string_convert.dart
📚 Learning: 2025-12-12T11:00:57.625Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4205
File: core/lib/utils/app_logger.dart:80-86
Timestamp: 2025-12-12T11:00:57.625Z
Learning: In the tmail-flutter codebase, logging philosophy differs by platform: web applications use the `webConsoleEnabled` parameter in `app_logger.dart` to selectively display logs in production console, while mobile applications intentionally suppress all logs in release builds (enforced by `BuildUtils.isDebugMode` check in `_debugPrint`).

Applied to files:

  • core/lib/utils/string_convert.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (default)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (model)
  • GitHub Check: Build web version and deploy
🔇 Additional comments (10)
core/test/utils/redos_vulnerability_test.dart (5)

1-11: LGTM - Imports and mock setup are appropriate.

The imports cover necessary dependencies for testing the ReDoS mitigations, and MockDioClient properly mocks the DioClient for transformer tests.


74-102: LGTM - Comprehensive transformer tests.

The test setup and performance validation for NormalizeLineHeightInStyleTransformer are well-structured. The long style string test (10,000 repetitions) effectively validates ReDoS mitigation.


475-507: LGTM - Well-structured email validation tests.

The explicit Map<String, String> for valid email inputs with expected extracted values addresses the previous review feedback about fragile assertions. Clear separation between valid cases (lines 475-507), quoted cases (lines 509-531), and edge cases (lines 533-559).


636-666: LGTM - Effective stress testing approach.

The stress tests appropriately focus on performance (1-second timeout) rather than correctness for edge-case inputs. The try-catch block (lines 652-658) correctly suppresses exceptions to measure execution time under adversarial inputs.


668-676: LGTM - Validates static regex reuse.

The identical(regex1, regex2) check correctly verifies that the regex is a static field rather than recreated on each access, confirming the optimization per reviewer dab246's recommendation.

core/lib/utils/string_convert.dart (5)

12-26: LGTM - Well-designed static regex patterns.

The static final declarations ensure regex compilation happens once, addressing the ReDoS mitigation strategy. Key observations:

  • emailSeparatorPattern now includes newlines/tabs, eliminating the need for replaceAll() preprocessing
  • _base64ValidationRegex uses simple character class with anchors (ReDoS-safe)
  • _mdSeparatorRegex and _asciiArtRegex are bounded and non-greedy
  • _emailLocalhostRegex uses anchors (^...$) preventing catastrophic backtracking

34-39: LGTM - Clean implementation.

The isEmailLocalhost method is straightforward: trim, then match. This properly handles inputs with leading/trailing whitespace.


59-90: LGTM - Efficient two-step Base64 validation.

The optimization correctly applies:

  1. URL decoding first (line 63)
  2. Quick length/whitespace pre-checks before regex (line 67)
  3. Regex validation only if pre-checks pass (line 69)
  4. Direct split without redundant replaceAll() (lines 80-84)

This reduces unnecessary regex evaluations for most inputs.


156-176: LGTM - Efficient table detection with early exit.

The single-loop approach with early exit (line 172) is well-optimized. The logic correctly handles both Markdown-style tables and ASCII art tables.

Minor note: lines.length >= 2 in line 175 is already guaranteed by the early return on line 159, but keeping it makes the return condition self-documenting.


178-217: LGTM - Memory-efficient streaming iteration.

The refactored implementation:

  • Consolidates Base64 validation into a single condition (line 184)
  • Uses streaming iteration via for-in on allMatches() (line 195) instead of materializing all matches into a list
  • Preserves the original gap-handling logic for plain addresses between named addresses

This addresses both the ReDoS mitigation (static regex) and memory efficiency concerns. Based on past review comments, unit tests were added to validate this behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/test/utils/redos_vulnerability_test.dart (1)

685-692: Verify static initialization of regex patterns.

The test correctly uses identical() to verify that base64ValidationRegex returns the same static instance on repeated calls, confirming the regex is not recreated.

💡 Optional: Expand coverage to verify all refactored regex patterns

The PR refactored multiple regex patterns across several classes (DomTransformer, NormalizeLineHeightInStyleTransformer, HtmlUtils, StringConvert). Consider expanding this test to verify static initialization for other key patterns:

test('regex patterns should be reused, not recreated', () {
  // StringConvert patterns
  final base64_1 = StringConvert.base64ValidationRegex;
  final base64_2 = StringConvert.base64ValidationRegex;
  expect(identical(base64_1, base64_2), true, 
    reason: 'base64ValidationRegex should be static');

  // Add similar checks for other exposed patterns
  // Example (if accessible):
  // final emailLocalhostTest1 = StringConvert.emailLocalhostTestRegex;
  // final emailLocalhostTest2 = StringConvert.emailLocalhostTestRegex;
  // expect(identical(emailLocalhostTest1, emailLocalhostTest2), true,
  //   reason: 'emailLocalhostTestRegex should be static');
});

This ensures comprehensive verification of the refactoring goal: moving inline regex to static/const fields.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cda176 and 836f077.

📒 Files selected for processing (1)
  • core/test/utils/redos_vulnerability_test.dart
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/test/utils/redos_vulnerability_test.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: analyze-test (fcm)
  • GitHub Check: analyze-test (model)
  • GitHub Check: analyze-test (core)
  • GitHub Check: analyze-test (forward)
  • GitHub Check: analyze-test (contact)
  • GitHub Check: analyze-test (email_recovery)
  • GitHub Check: analyze-test (rule_filter)
  • GitHub Check: analyze-test (server_settings)
  • GitHub Check: analyze-test (default)
🔇 Additional comments (7)
core/test/utils/redos_vulnerability_test.dart (7)

1-10: LGTM!

The imports and mock setup are appropriate for comprehensive ReDoS vulnerability testing. The MockDioClient is correctly defined for transformer tests that require a DioClient dependency.


17-84: Excellent ReDoS coverage for background-image extraction.

The test suite comprehensively validates:

  • Long URLs (10k chars) don't trigger catastrophic backtracking
  • Deeply nested parentheses (1000 levels) are handled efficiently
  • Malformed patterns fail quickly
  • Valid patterns with various quote styles are correctly parsed

The comment explaining permissive parser behavior (lines 40-42) is particularly helpful for future maintainers.


86-211: Thorough testing of line-height normalization with ReDoS safeguards.

The test suite validates:

  • Performance with very long style strings (10k repetitions) and excessive spaces
  • Correct removal of line-height: 1px and 100% patterns (including case-insensitive matching)
  • Preservation of other line-height values
  • Cleanup of empty style attributes
  • Space normalization after transformations

All tests complete within <100ms bounds, confirming no catastrophic backtracking.


213-428: Outstanding HtmlUtils test coverage addressing all reviewer requests.

The test suite comprehensively validates ReDoS mitigations and addresses all feedback from @hoangdat:

Gmail newsletter HTML (lines 296-329): Tests realistic promotional email structure with long proxy URLs and embedded links within <200ms

Complex URL handling (lines 331-364): Validates tracking parameters, nested URLs, fragments, query strings, and encoded characters within <100ms per URL

Boundary-length HTML tags (lines 366-394): Tests at exactly 128 chars, 129 chars, and 1000 chars (malicious) to confirm bounded quantifier {0,128} prevents catastrophic backtracking

Deeply nested HTML (lines 396-427): Tests 50-level (realistic) and 100-level (extreme) nesting for both plain-text extraction and link wrapping

The test organization and performance assertions ensure the refactored regex patterns are both correct and performant.


430-611: Comprehensive StringConvert testing with excellent edge-case coverage.

The test suite validates ReDoS mitigations across multiple regex patterns and addresses reviewer feedback:

Special characters in email addresses (lines 492-524): Tests plus signs, dots, underscores, hyphens, numbers, international domains, and long local parts. Each case explicitly maps input to expected output, resolving the fragile assertion issue from the previous review.

Strong edge-case handling:

  • Quoted email addresses (lines 526-548)
  • Invalid emails that should fail gracefully (lines 550-576)
  • Malicious separator patterns with 10k repetitions (lines 479-490)
  • Stress tests with 1000 named addresses (lines 464-477)

All tests validate both performance (<100ms–500ms bounds) and correctness.


613-651: LGTM! Focused localhost email validation.

The tests validate:

  • Performance with very long localhost emails (10k chars) within <100ms
  • Correct matching of valid localhost patterns (including quoted names)
  • Proper rejection of invalid localhost emails

653-683: LGTM! Effective stress testing across all regex patterns.

The stress tests validate performance under extreme conditions (100k chars, 1000-level nesting, many pipes/dashes) across multiple functions. The intentional try-catch block (lines 669-675) correctly prioritizes performance verification over correctness for these adversarial inputs, ensuring none trigger catastrophic backtracking within the 1-second bound.

Copy link
Member

@dab246 dab246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@hoangdat hoangdat merged commit 0604162 into master Jan 6, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants