The air-gap system supports content-based filtering of events at the upstream side before transmission, and in the resend application when replaying missing events. This allows you to control which events are forwarded to the downstream Kafka based on regex patterns matching the payload content.
Important: Filtered events are not dropped. Instead, they are sent across the diode with an empty payload. This ensures the downstream gap-detector sees every sequence ID and does not report false gaps. The deduplication application (PartitionDedupApp) silently discards zero-length events before writing to the clean topic, so they never appear in downstream Kafka.
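The flow described above can be sketched in Go. This is only an illustration: the `Event` type and the `clearIfFiltered` and `keepForCleanTopic` functions are hypothetical names, not the project's actual API.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a message crossing the diode.
type Event struct {
	SeqID   uint64
	Payload []byte
}

// clearIfFiltered returns the event unchanged when it is allowed, or a copy
// with an empty payload when it is filtered. The sequence ID is always kept,
// so the downstream gap detector still sees it.
func clearIfFiltered(e Event, allowed bool) Event {
	if allowed {
		return e
	}
	return Event{SeqID: e.SeqID, Payload: nil}
}

// keepForCleanTopic mimics the dedup step: zero-length events are discarded.
func keepForCleanTopic(e Event) bool {
	return len(e.Payload) > 0
}

func main() {
	in := []Event{{1, []byte("ok")}, {2, []byte("secret")}, {3, []byte("ok")}}
	allowed := []bool{true, false, true}
	for i, e := range in {
		sent := clearIfFiltered(e, allowed[i])
		fmt.Printf("seq=%d sent_bytes=%d written=%v\n",
			sent.SeqID, len(sent.Payload), keepForCleanTopic(sent))
	}
}
```

Every sequence ID is still transmitted, but only events with a non-empty payload reach the clean topic.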
Input filtering provides:
- Payload inspection: Filter events based on their content, not just metadata
- Allow/deny rules: Create ordered lists of regex patterns with allow or deny actions
- First match wins: Rules are evaluated in order, and the first matching rule determines the action
- Security: Protection against ReDoS attacks with timeout limits and dangerous pattern detection
- Performance: Minimal overhead with configurable regex timeout (default 100ms)
Add to your config/upstream.properties or config/resend.properties:
# File-based rules (recommended for complex filters). Use a .txt suffix so the value is reliably detected as a rule file.
inputFilterRules=config/input-filter-rules.txt
inputFilterDefaultAction=allow
inputFilterTimeout=100
# Or inline rules (comma-separated)
inputFilterRules=allow:127\.0\.0\.1,allow:10\.10\.\d{1,3}\.\d{1,3},deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
inputFilterDefaultAction=allow
inputFilterTimeout=100
Override via environment variables (upstream):
export AIRGAP_UPSTREAM_INPUT_FILTER_RULES=config/input-filter-rules.txt
export AIRGAP_UPSTREAM_INPUT_FILTER_DEFAULT_ACTION=allow
export AIRGAP_UPSTREAM_INPUT_FILTER_TIMEOUT=100
For the resend application:
export AIRGAP_RESEND_INPUT_FILTER_RULES=config/input-filter-rules.txt
export AIRGAP_RESEND_INPUT_FILTER_DEFAULT_ACTION=allow
export AIRGAP_RESEND_INPUT_FILTER_TIMEOUT=100
Each rule has the format: action:regex_pattern
- Action: Either allow or deny
- Pattern: A valid Go regex pattern (RE2 syntax)
- Comments: Lines starting with # are ignored (must be on separate lines, not inline)
- Empty lines: Ignored
- Important: Do NOT use inline comments on the same line as rules, as they will be included in the regex pattern and cause matching to fail
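A minimal sketch of how a rule file in this format could be parsed, assuming each line is split at its first colon. The `Rule` type and `parseRules` function are hypothetical and only illustrate the format. The real parser may differ, for example in how it treats surrounding whitespace.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical in-memory form of one "action:regex_pattern" line.
type Rule struct {
	Allow   bool
	Pattern *regexp.Regexp
}

// parseRules reads rule-file text line by line. It skips blank lines and
// lines starting with '#', and splits each rule at the first ':' only, so
// colons inside the pattern are preserved.
func parseRules(text string) ([]Rule, error) {
	var rules []Rule
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		action, pattern, ok := strings.Cut(line, ":")
		if !ok || (action != "allow" && action != "deny") {
			return nil, fmt.Errorf("line %d: expected allow:<regex> or deny:<regex>", n)
		}
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", n, err)
		}
		rules = append(rules, Rule{Allow: action == "allow", Pattern: re})
	}
	return rules, sc.Err()
}

const sample = `# Allow localhost
allow:127\.0\.0\.1

deny:\d{3}-\d{2}-\d{4} # SSN
`

func main() {
	rules, err := parseRules(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rules loaded:", len(rules))
	// The inline comment became part of the second pattern, so a plain
	// SSN-shaped string no longer matches it.
	fmt.Println("SSN matches:", rules[1].Pattern.MatchString("123-45-6789"))
}
```

The last rule in `sample` deliberately carries an inline comment to show why that is a mistake: the text ` # SSN` becomes part of the regex.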
Example rule file (config/input-filter-rules.txt):
# Allow localhost traffic
allow:127\.0\.0\.1
# Allow internal network 10.x.x.x
allow:10\.\d{1,3}\.\d{1,3}\.\d{1,3}
# Allow internal network 192.168.x.x
allow:192\.168\.\d{1,3}\.\d{1,3}
# Allow internal network 172.16.0.0/12
allow:172\.(1[6-9]|2[0-9]|3[0-1])\.(\d{1,3})\.(\d{1,3})
# Deny any other IP addresses
deny:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
# Rest is allowed by default action
When no rules match, the inputFilterDefaultAction determines the behavior:
- allow (default): Send the event if no rules match
- deny: Block the event if no rules match
This lets you create either:
- Allowlist approach: Set default to deny, use allow rules
- Blocklist approach: Set default to allow, use deny rules
Block all events except high/critical severity:
# Allow high and critical severity
allow:.*"severity":\s*"(high|critical)".*
# Deny everything else
deny:.*
With inputFilterDefaultAction=deny, this creates a strict allowlist.
Allow everything except events containing PII:
# Block Social Security Numbers (US format)
deny:\b\d{3}-\d{2}-\d{4}\b
# Block email addresses
deny:(?i)[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}
# Block credit card numbers (basic pattern)
deny:\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
# Block phone numbers (various formats)
deny:\b\d{3}[-.]?\d{3}[-.]?\d{4}\b
# Block credentials
deny:(?i)(password|passwd|pwd|secret|token|api[_-]?key)\s*[:=]\s*\S+
# Default: allow everything else
With inputFilterDefaultAction=allow, this blocks sensitive data while allowing normal traffic.
Network example - only allow specific internal networks:
# Allow localhost
allow:127\.0\.0\.1
# Allow 10.10.x.x network
allow:10\.10\.\d{1,3}\.\d{1,3}
# Deny any other IP addresses
deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
# Default: allow (non-IP content passes through)
With inputFilterDefaultAction=allow, this filters out unauthorized IP addresses while allowing other content.
Reduce traffic by filtering low-priority logs:
# Deny debug logs
deny:(?i)"level":\s*"debug"
# Deny trace logs
deny:(?i)"level":\s*"trace"
# Deny verbose logs
deny:(?i)"level":\s*"verbose"
# Allow everything else (info, warn, error)
Use filtering for simple content routing (alternative to multiple upstreams):
# Only send security events
allow:(?i)"category":\s*"(security|authentication|authorization)"
# Deny everything else
deny:.*
The input filter includes multiple layers of protection against Regular Expression Denial of Service (ReDoS) attacks:
- Pattern validation: Dangerous nested quantifiers are detected at startup: (\w+)*, (a+)+, (.*)*, (\w*)*, (\d+)*, (.+)+, (\S+)*
- Timeout protection: Each regex match has a configurable timeout (default 100ms via inputFilterTimeout)
- Panic recovery: Regex panics are caught and treated as non-matches
- Length limits: Regex patterns are limited to 1000 characters
When logLevel=DEBUG is set, the upstream Kafka consumer logs up to 80 bytes of the raw message payload and the full message key before the input filter is applied. This means sensitive content that the filter is configured to suppress may still appear in debug logs. Do not use logLevel=DEBUG in production environments where payload confidentiality is required.
- inputFilterRules: Path to rule file or inline comma-separated rules
- inputFilterDefaultAction: Default action (allow or deny) when no rules match
- inputFilterTimeout: Regex match timeout in milliseconds (default: 100)
- Test patterns: Validate regex patterns before deployment. Note that grep -P uses PCRE, which accepts some constructs (such as lookaheads and backreferences) that Go's RE2 syntax rejects, so a pattern that passes this check can still fail at startup
  echo "test content" | grep -P "your_pattern"
- Start simple: Begin with simple patterns, add complexity as needed
- Fail open: On filter errors, events are allowed (not blocked)
- Monitor statistics: Track filtered and total_filtered in statistics logs
- Use anchors: Add ^ and $ when matching entire payloads
- Escape special characters: Remember to escape . * + ? [ ] ( ) { } | \
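Because rules are compiled as Go (RE2) patterns, one way to check a pattern with the same syntax rules is a few lines of Go using the standard regexp package. The `checkPattern` helper and the sample payloads are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// checkPattern compiles a pattern with Go's RE2-based regexp package and
// reports whether it matches the sample. A compile error means the pattern
// is not valid RE2, even if a PCRE tool accepts it.
func checkPattern(pattern, sample string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(sample), nil
}

func main() {
	tests := []struct{ pattern, sample string }{
		{`\b\d{3}-\d{2}-\d{4}\b`, `ssn=123-45-6789`},
		{`^\{.*"severity":\s*"high".*\}$`, `{"severity": "high"}`},
		{`foo(?=bar)`, `foobar`}, // lookahead: valid PCRE, rejected by RE2
	}
	for _, t := range tests {
		ok, err := checkPattern(t.pattern, t.sample)
		fmt.Printf("%-32s match=%v err=%v\n", t.pattern, ok, err)
	}
}
```

The third pattern is included to show a construct that a PCRE-based check would accept but the Go regex engine refuses to compile.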
At typical throughput levels:
- Low throughput (<1,000 eps): Negligible impact (<1% CPU)
- Medium throughput (1,000-10,000 eps): ~5-10% CPU overhead
- High throughput (>10,000 eps): ~10-20% CPU overhead
Optimization tips:
- Use simple patterns when possible
- Avoid complex alternations with many options
- Place most common matches first in the rule list
- Use anchors so a match attempt can finish early (Go's RE2 engine does not backtrack, but an unanchored pattern may still scan the whole payload)
Filtered events are tracked in the statistics output:
STATISTICS: {
"id": "Upstream_1",
"time": 1735776000,
"time_start": 1735772400,
"interval": 60,
"received": 3600,
"sent": 3200,
"filtered": 400,
"unfiltered": 3200,
"filter_timeouts": 2,
"eps": 60,
"total_received": 72000,
"total_sent": 64000,
"total_filtered": 8000,
"total_unfiltered": 64000,
"total_filter_timeouts": 15
}
Fields:
- filtered: Events whose payload was cleared by the input filter during the last interval (sent with empty payload; not counted in sent)
- unfiltered: Events that passed through the input filter unchanged during the last interval
- total_filtered: Total events whose payload was cleared since startup
- total_unfiltered: Total events that passed through the input filter unchanged since startup
- filter_timeouts: Regex timeout errors during the last interval
- total_filter_timeouts: Total regex timeout errors since startup
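As a rough illustration of using these fields, the sketch below parses a statistics entry and computes the share of events whose payload was cleared. It assumes the whole entry, including the `STATISTICS:` prefix, is available as a single string. The `stats` type and helper functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stats holds only the filter-related fields from the statistics output.
type stats struct {
	Received       int64 `json:"received"`
	Filtered       int64 `json:"filtered"`
	Unfiltered     int64 `json:"unfiltered"`
	FilterTimeouts int64 `json:"filter_timeouts"`
}

// parseStats strips the "STATISTICS:" prefix and decodes the JSON object.
func parseStats(entry string) (stats, error) {
	var s stats
	body := strings.TrimPrefix(strings.TrimSpace(entry), "STATISTICS:")
	err := json.Unmarshal([]byte(body), &s)
	return s, err
}

// filteredPercent returns the share of received events whose payload was cleared.
func filteredPercent(s stats) float64 {
	if s.Received == 0 {
		return 0
	}
	return 100 * float64(s.Filtered) / float64(s.Received)
}

func main() {
	entry := `STATISTICS: {"received": 3600, "sent": 3200, "filtered": 400, "unfiltered": 3200, "filter_timeouts": 2}`
	s, err := parseStats(entry)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("filtered %.1f%% of %d events, %d timeouts\n",
		filteredPercent(s), s.Received, s.FilterTimeouts)
}
```

In the example output above, filtered plus unfiltered (400 + 3200) equals received (3600), so a sudden rise in the filtered share is a quick signal that a rule may be too broad.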
Check if your filter rules are too restrictive:
- Set inputFilterDefaultAction=allow temporarily
- Check logs for "Input filter cleared payload for message" (debug level); filtered events are still sent but with empty payload
- Verify your regex patterns match what you expect
- Verify the rule file path is correct
- Check for syntax errors in rules (look for startup errors)
- Enable debug logging: logLevel=DEBUG
- Verify patterns with a regex tester
- Check for complex regex patterns (nested quantifiers)
- Monitor CPU usage during high load
- Simplify patterns or reduce rule count
config/security-filter.txt:
# Allow security-relevant events
allow:(?i)"category":\s*"(security|authentication|authorization|audit)"
allow:(?i)"severity":\s*"(high|critical|error)"
allow:(?i)"event":\s*"(login|logout|access_denied|permission_change)"
# Block everything else
deny:.*
config/privacy-filter.txt:
# Block common PII patterns
# SSN
deny:\b\d{3}-\d{2}-\d{4}\b
# Email
deny:(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
# Credit cards
deny:\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
# Credentials
deny:(?i)(password|passwd|pwd|secret|token|api_key)\s*[:=]
# Allow everything else
config/network-filter.txt:
# Allow private networks only
allow:10\.\d{1,3}\.\d{1,3}\.\d{1,3}
allow:172\.(1[6-9]|2[0-9]|3[01])\.\d{1,3}\.\d{1,3}
allow:192\.168\.\d{1,3}\.\d{1,3}
allow:127\.0\.0\.1
allow:::1
# Block all other IPs
deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
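# Caution: unanchored, this pattern matches any payload containing a hex digit (0-9, a-f) or a colon, not only IPv6 addresses
deny:[0-9a-fA-F:]+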
# Allow non-IP content
- Installation and Configuration - General configuration guide
- Monitoring - Tracking filter statistics