Input Filtering

The air-gap system supports content-based filtering of events on the upstream side before transmission, and in the resend application when replaying missing events. This lets you control which events are forwarded to the downstream Kafka, based on regex patterns matched against the payload content.

Important: Filtered events are not dropped. Instead, they are sent across the diode with an empty payload. This ensures the downstream gap-detector sees every sequence ID and does not report false gaps. The deduplication application (PartitionDedupApp) silently discards zero-length events before writing to the clean topic, so they never appear in downstream Kafka.

Overview

Input filtering provides:

Payload inspection: Filter events based on their content, not just metadata
Allow/deny rules: Create ordered lists of regex patterns with allow or deny actions
First match wins: Rules are evaluated in order; the first matching rule determines the action
Security: Protection against ReDoS attacks with timeout limits and dangerous pattern detection
Performance: Minimal overhead with configurable regex timeout (default 100ms)

Configuration

Basic Setup

Add to your config/upstream.properties or config/resend.properties:

# File-based rules (recommended for complex filters). Give rule files a .txt suffix so the value is more reliably detected as a rule file.
inputFilterRules=config/input-filter-rules.txt
inputFilterDefaultAction=allow
inputFilterTimeout=100

# Or inline rules (comma-separated)
inputFilterRules=allow:127\.0\.0\.1,allow:10\.10\.\d{1,3}\.\d{1,3},deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
inputFilterDefaultAction=allow
inputFilterTimeout=100
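Note that the inline example contains commas inside {1,3} quantifiers, so splitting on every comma would cut patterns apart. How the air-gap parser handles this is not stated here. One plausible approach, shown purely as an assumption, is to treat a comma as a separator only when the text after it starts with allow: or deny:. The splitInlineRules helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitInlineRules splits a comma-separated rule string, treating a comma as a
// separator only when the text after it begins with "allow:" or "deny:".
// This is an illustrative assumption, not the documented parser behaviour.
func splitInlineRules(s string) []string {
	var rules []string
	start := 0
	for i := 0; i < len(s); i++ {
		if s[i] != ',' {
			continue
		}
		rest := s[i+1:]
		if strings.HasPrefix(rest, "allow:") || strings.HasPrefix(rest, "deny:") {
			rules = append(rules, s[start:i])
			start = i + 1
		}
	}
	return append(rules, s[start:])
}

func main() {
	inline := `allow:127\.0\.0\.1,allow:10\.10\.\d{1,3}\.\d{1,3},deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`
	for _, r := range splitInlineRules(inline) {
		fmt.Println(r)
	}
}
```

If in doubt about how inline rules are split, a rule file avoids the question entirely, because each rule sits on its own line.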

Environment Variables

Override via environment variables (upstream):

export AIRGAP_UPSTREAM_INPUT_FILTER_RULES=config/input-filter-rules.txt
export AIRGAP_UPSTREAM_INPUT_FILTER_DEFAULT_ACTION=allow
export AIRGAP_UPSTREAM_INPUT_FILTER_TIMEOUT=100

For the resend application:

export AIRGAP_RESEND_INPUT_FILTER_RULES=config/input-filter-rules.txt
export AIRGAP_RESEND_INPUT_FILTER_DEFAULT_ACTION=allow
export AIRGAP_RESEND_INPUT_FILTER_TIMEOUT=100

Rule Format

Each rule has the format: action:regex_pattern

Action: Either allow or deny
Pattern: A valid Go regex pattern (RE2 syntax)
Comments: Lines starting with # are ignored (must be on separate lines, not inline)
Empty lines: Ignored
Important: Do NOT use inline comments on the same line as rules, as they will be included in the regex pattern and cause matching to fail

Example rule file (config/input-filter-rules.txt):

# Allow localhost traffic
allow:127\.0\.0\.1

# Allow internal network 10.x.x.x
allow:10\.\d{1,3}\.\d{1,3}\.\d{1,3}

# Allow internal network 192.168.x.x
allow:192\.168\.\d{1,3}\.\d{1,3}

# Allow internal network 172.16.0.0/12
allow:172\.(1[6-9]|2[0-9]|3[0-1])\.(\d{1,3})\.(\d{1,3})

# Deny any other IP addresses
deny:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

# Rest is allowed by default action
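A minimal Go sketch of a parser for this format follows. The Rule type and parseRules function are illustrative names, not the project's API, and trimming surrounding whitespace from each line is an assumption. The last part of main shows why an inline comment breaks a rule: everything after the first colon becomes the pattern.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical parsed form of one "action:regex_pattern" line.
type Rule struct {
	Allow   bool
	Pattern *regexp.Regexp
}

// parseRules reads rule-file text, skipping blank lines and "#" comment lines.
func parseRules(text string) ([]Rule, error) {
	var rules []Rule
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split at the first colon only; the pattern may contain more colons.
		action, pattern, ok := strings.Cut(line, ":")
		if !ok || (action != "allow" && action != "deny") {
			return nil, fmt.Errorf("line %d: expected allow:<regex> or deny:<regex>", n)
		}
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", n, err)
		}
		rules = append(rules, Rule{Allow: action == "allow", Pattern: re})
	}
	return rules, sc.Err()
}

func main() {
	text := `# Allow localhost traffic
allow:127\.0\.0\.1

# Deny any other IP addresses
deny:\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
`
	rules, err := parseRules(text)
	fmt.Println("rules parsed:", len(rules), "error:", err)

	// An inline comment becomes part of the regex and stops it from matching.
	bad, _ := parseRules("deny:secret # block secrets")
	fmt.Println("pattern:", bad[0].Pattern, "matches \"secret\":", bad[0].Pattern.MatchString("secret"))
}
```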

Default Action

When no rules match, the inputFilterDefaultAction determines the behavior:

allow (default): Send the event if no rules match
deny: Block the event if no rules match

This lets you create either:

Allowlist approach: Set default to deny, use allow rules
Blocklist approach: Set default to allow, use deny rules

Use Cases

Use Case 1: Allow Only High-Severity Events

Block all events except high/critical severity:

# Allow high and critical severity
allow:.*"severity":\s*"(high|critical)".*

# Deny everything else
deny:.*

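The final deny:.* rule matches every remaining payload, so this is a strict allowlist regardless of the default. Setting inputFilterDefaultAction=deny as well keeps the intent explicit.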

Use Case 2: Block PII (Personally Identifiable Information)

Allow everything except events containing PII:

# Block Social Security Numbers (US format)
deny:\b\d{3}-\d{2}-\d{4}\b

# Block email addresses
deny:(?i)[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}

# Block credit card numbers (basic pattern)
deny:\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b

# Block phone numbers (various formats)
deny:\b\d{3}[-.]?\d{3}[-.]?\d{4}\b

# Block credentials
deny:(?i)(password|passwd|pwd|secret|token|api[_-]?key)\s*[:=]\s*\S+

# Default: allow everything else

With inputFilterDefaultAction=allow, this blocks sensitive data while allowing normal traffic.
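Broad PII patterns can also catch harmless data, so it helps to run them against representative payloads first. The Go sketch below copies the five deny patterns from the example and reports which one fires first. The firstDeny helper and the sample payloads are invented for illustration. Note that a 10-digit epoch timestamp is caught by the phone-number pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// denyRules holds the deny patterns from the example, in the same order.
var denyRules = []*regexp.Regexp{
	regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),                                            // 0: SSN
	regexp.MustCompile(`(?i)[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,}`),                      // 1: email
	regexp.MustCompile(`\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b`),                       // 2: credit card
	regexp.MustCompile(`\b\d{3}[-.]?\d{3}[-.]?\d{4}\b`),                                    // 3: phone
	regexp.MustCompile(`(?i)(password|passwd|pwd|secret|token|api[_-]?key)\s*[:=]\s*\S+`), // 4: credentials
}

// firstDeny returns the index of the first deny rule matching payload, or -1.
func firstDeny(payload string) int {
	for i, re := range denyRules {
		if re.MatchString(payload) {
			return i
		}
	}
	return -1
}

func main() {
	samples := []string{
		`user=alice action=login`,
		`contact bob@example.com`,
		`{"time": 1735776000}`, // epoch seconds, not a phone number
		`token=abc123`,
	}
	for _, s := range samples {
		fmt.Printf("%q -> rule %d\n", s, firstDeny(s))
	}
}
```

If timestamps or other long numeric fields are common in your payloads, the phone-number rule probably needs tightening before deployment.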

Use Case 3: IP Address Allowlist

Network example - only allow specific internal networks:

# Allow localhost
allow:127\.0\.0\.1

# Allow 10.10.x.x network
allow:10\.10\.\d{1,3}\.\d{1,3}

# Deny any other IP addresses
deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

# Default: allow (non-IP content passes through)

With inputFilterDefaultAction=allow, this filters out unauthorized IP addresses while allowing other content.

Use Case 4: Block Debug/Verbose Logs

Reduce traffic by filtering low-priority logs:

# Deny debug logs
deny:(?i)"level":\s*"debug"

# Deny trace logs
deny:(?i)"level":\s*"trace"

# Deny verbose logs
deny:(?i)"level":\s*"verbose"

# Allow everything else (info, warn, error)

Use Case 5: Content-Based Routing Alternative

Use filtering for simple content routing (alternative to multiple upstreams):

# Only send security events
allow:(?i)"category":\s*"(security|authentication|authorization)"

# Deny everything else
deny:.*

Security Considerations

ReDoS Protection

The input filter includes multiple layers of protection against Regular Expression Denial of Service (ReDoS) attacks:

Pattern validation: Dangerous nested quantifiers are detected at startup, including (\w+)*, (a+)+, (.*)*, (\w*)*, (\d+)*, (.+)+ and (\S+)*
Timeout protection: Each regex match has a configurable timeout (default 100ms via inputFilterTimeout)
Panic recovery: Regex panics are caught and treated as non-matches
Length limits: Regex patterns are limited to 1000 characters

DEBUG Logging Caveat

When logLevel=DEBUG is set, the upstream Kafka consumer logs up to 80 bytes of the raw message payload and the full message key before the input filter is applied. This means sensitive content that the filter is configured to suppress may still appear in debug logs. Do not use logLevel=DEBUG in production environments where payload confidentiality is required.

Configuration Options

inputFilterRules: Path to rule file or inline comma-separated rules
inputFilterDefaultAction: Default action (allow or deny) when no rules match
inputFilterTimeout: Regex match timeout in milliseconds (default: 100)

Best Practices

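Test patterns: Validate regex patterns before deployment. Note that grep -P uses PCRE rather than Go's RE2 syntax, so a pattern that passes this check can still be rejected at startup (lookarounds and backreferences, for example, are not supported by RE2).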
```
echo "test content" | grep -P "your_pattern"
```
Start simple: Begin with simple patterns, add complexity as needed
Fail open: On filter errors, events are allowed (not blocked)
Monitor statistics: Track filtered and total_filtered in statistics logs
Use anchors: Add ^ and $ when matching entire payloads
Escape special characters: Remember to escape . * + ? [ ] ( ) { } | \
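Because the filter uses Go's RE2 syntax, the most faithful pre-deployment check is to compile the pattern with Go's own regexp package. The sketch below uses a hypothetical checkPattern helper. The second pattern contains a lookahead, which PCRE tools accept but Go's regexp.Compile rejects.

```go
package main

import (
	"fmt"
	"regexp"
)

// checkPattern compiles pattern with Go's regexp package and, if it is valid,
// reports whether it matches sample.
func checkPattern(pattern, sample string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(sample), nil
}

func main() {
	sample := `{"severity": "critical", "msg": "disk full"}`
	patterns := []string{
		`"severity":\s*"(high|critical)"`,
		`"severity":(?=.*critical)`, // lookahead: not supported by RE2
	}
	for _, p := range patterns {
		matched, err := checkPattern(p, sample)
		fmt.Printf("%s -> matched=%v err=%v\n", p, matched, err)
	}
}
```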

Performance Impact

At typical throughput levels:

Low throughput (<1,000 eps): Negligible impact (<1% CPU)
Medium throughput (1,000-10,000 eps): ~5-10% CPU overhead
High throughput (>10,000 eps): ~10-20% CPU overhead

Optimization tips:

Use simple patterns when possible
Avoid complex alternations with many options
Place most common matches first in the rule list
Use anchors where they apply: Go's RE2 engine does not backtrack, but an anchored pattern can let a non-matching payload be rejected sooner

Statistics

Filtered events are tracked in the statistics output:

STATISTICS: {
  "id": "Upstream_1",
  "time": 1735776000,
  "time_start": 1735772400,
  "interval": 60,
  "received": 3600,
  "sent": 3200,
  "filtered": 400,
  "unfiltered": 3200,
  "filter_timeouts": 2,
  "eps": 60,
  "total_received": 72000,
  "total_sent": 64000,
  "total_filtered": 8000,
  "total_unfiltered": 64000,
  "total_filter_timeouts": 15
}

Fields:

filtered: Events whose payload was cleared by the input filter during the last interval (sent with empty payload; not counted in sent)
unfiltered: Events that passed through the input filter unchanged during the last interval
total_filtered: Total events whose payload was cleared since startup
total_unfiltered: Total events that passed through the input filter unchanged since startup
filter_timeouts: Regex timeout errors during the last interval
total_filter_timeouts: Total regex timeout errors since startup
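The sample output suggests a few relationships between the counters: received equals filtered plus unfiltered, sent equals unfiltered, and eps equals received divided by the interval. These are read off the example rather than guaranteed by the text, so treat the Go sketch below as a monitoring aid under those assumptions. IntervalStats and its methods are hypothetical names.

```go
package main

import "fmt"

// IntervalStats mirrors the per-interval counters from the sample output.
type IntervalStats struct {
	Interval   int64 // seconds
	Received   int64
	Sent       int64
	Filtered   int64
	Unfiltered int64
}

// Consistent checks the relationships seen in the sample output (an assumption).
func (s IntervalStats) Consistent() bool {
	return s.Received == s.Filtered+s.Unfiltered && s.Sent == s.Unfiltered
}

// FilteredRatio is the share of received events whose payload was cleared.
func (s IntervalStats) FilteredRatio() float64 {
	if s.Received == 0 {
		return 0
	}
	return float64(s.Filtered) / float64(s.Received)
}

// EPS is events received per second over the interval.
func (s IntervalStats) EPS() int64 {
	if s.Interval == 0 {
		return 0
	}
	return s.Received / s.Interval
}

func main() {
	s := IntervalStats{Interval: 60, Received: 3600, Sent: 3200, Filtered: 400, Unfiltered: 3200}
	fmt.Println("consistent:", s.Consistent())
	fmt.Printf("filtered ratio: %.3f\n", s.FilteredRatio())
	fmt.Println("eps:", s.EPS())
}
```

A sudden rise in the filtered ratio after a rule change is a quick signal that a pattern is broader than intended.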

Troubleshooting

No events are being sent

Check if your filter rules are too restrictive:

Set inputFilterDefaultAction=allow temporarily
Check logs for "Input filter cleared payload for message" (debug level). Filtered events are still sent, but with an empty payload
Verify your regex patterns match what you expect

Filter not working

Verify the rule file path is correct
Check for syntax errors in rules (look for startup errors)
Enable debug logging: logLevel=DEBUG (outside production only; see the DEBUG Logging Caveat)
Verify patterns with a regex tester

Performance issues

Check for complex regex patterns (nested quantifiers)
Monitor CPU usage during high load
Simplify patterns or reduce rule count

Example Filter Files

Security-focused Filter

config/security-filter.txt:

# Allow security-relevant events
allow:(?i)"category":\s*"(security|authentication|authorization|audit)"
allow:(?i)"severity":\s*"(high|critical|error)"
allow:(?i)"event":\s*"(login|logout|access_denied|permission_change)"

# Block everything else
deny:.*

Privacy-protection Filter

config/privacy-filter.txt:

# Block common PII patterns
# SSN
deny:\b\d{3}-\d{2}-\d{4}\b
# Email
deny:(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
# Credit cards
deny:\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
# Credentials
deny:(?i)(password|passwd|pwd|secret|token|api_key)\s*[:=]

# Allow everything else

Network Filter

config/network-filter.txt:

# Allow private networks only
allow:10\.\d{1,3}\.\d{1,3}\.\d{1,3}
allow:172\.(1[6-9]|2[0-9]|3[01])\.\d{1,3}\.\d{1,3}
allow:192\.168\.\d{1,3}\.\d{1,3}
allow:127\.0\.0\.1
allow:::1

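# Block all other IPv4 addresses
deny:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
# Block full-form IPv6 addresses (eight hex groups). Compressed "::" forms need additional patterns.
deny:(?i)\b(?:[0-9a-f]{1,4}:){7}[0-9a-f]{1,4}\b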

# Allow non-IP content
