Open
Description
Problem
The traffic_forwarder.py script that forwards traffic between vsock and TCP connections has several issues that can lead to connection leaks and resource exhaustion:
- Sockets are not being properly closed, leading to connection accumulation
- No idle timeout detection for stale connections
- Buffer size is too small (1KB) for efficient data transfer
- Excessive logging creates noise in production
Root Cause
When connections are closed, the sockets are only partially shut down (SHUT_RD/SHUT_WR) and never fully closed, so their file descriptors leak over time. Additionally, stale connections that stop transferring data are never cleaned up.
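To illustrate the root cause, here is a minimal sketch (not taken from traffic_forwarder.py) showing that shutdown() alone leaves the file descriptor allocated, while close() releases it. It uses socket.socketpair() as a stand-in for a real vsock/TCP connection, which is an assumption made only to keep the example self-contained.

```python
import socket

# A connected pair of sockets stands in for one forwarded connection.
a, b = socket.socketpair()
fd = a.fileno()

# shutdown() ends the data flow but keeps the file descriptor allocated.
a.shutdown(socket.SHUT_RDWR)
still_open_after_shutdown = a.fileno() == fd

# close() releases the descriptor; Python then reports fileno() as -1.
a.close()
released_after_close = a.fileno() == -1

b.close()
print(still_open_after_shutdown, released_after_close)
```

Both values should be True: the descriptor survives shutdown() and is only released by close().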
Solution
Implement proper socket cleanup and connection management:
Code Changes
# traffic_forwarder.py changes

# Imports used by the changes below
import logging
import os
import socket

# 1. Add configurable logging level (reduce spam)
# Set up logging - use WARNING level by default to reduce log spam
# Set TRAFFIC_LOG_LEVEL env var to "DEBUG" or "INFO" for more verbose logging
log_level = os.environ.get('TRAFFIC_LOG_LEVEL', 'WARNING').upper()
logging.basicConfig(level=getattr(logging, log_level, logging.WARNING),
                    format='%(asctime)s - %(levelname)s - %(message)s')

# 2. In forward() function - add idle timeout and proper cleanup
# (shutdown_flag is assumed to be the threading.Event already defined in the script)
def forward(source, destination, connection_id, direction):
    """Forward data between sockets with proper cleanup"""
    try:
        source.settimeout(1.0)  # 1 second timeout for checking shutdown
        idle_count = 0
        max_idle = 300  # 300 one-second timeouts = 5 minutes idle before considering connection dead
        while not shutdown_flag.is_set():
            try:
                data = source.recv(8192)  # Increased buffer size for better performance
                if not data:
                    logging.info(f"Connection {connection_id}: End of data stream ({direction})")
                    break
                idle_count = 0  # Reset idle counter on data
                logging.debug(f"Forwarding {len(data)} bytes")  # Changed to debug level
                destination.sendall(data)
            except socket.timeout:
                idle_count += 1
                if idle_count >= max_idle:
                    logging.warning(f"Connection {connection_id}: Idle timeout reached ({direction})")
                    break
                continue  # Check shutdown flag
    # ... error handling ...
    finally:
        # Properly close both sockets to avoid connection leaks
        try:
            # First try graceful shutdown
            source.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass  # Socket might already be closed
        try:
            destination.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass
        # Then close the sockets
        try:
            source.close()
        except OSError:
            pass
        try:
            destination.close()
        except OSError:
            pass
        logging.info(f"Connection {connection_id}: Completed ({direction})")

Benefits
- Prevents connection leaks: Proper socket cleanup with SHUT_RDWR and close()
- Handles stale connections: 5-minute idle timeout prevents zombie connections
- Better performance: 8KB buffer size (up from 1KB) for more efficient data transfer
- Reduced log noise: WARNING level by default, configurable via TRAFFIC_LOG_LEVEL env var
- Resource protection: Prevents file descriptor exhaustion
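The shutdown-then-close cleanup behind the first benefit could also be factored into a small helper. The sketch below is one possible shape, not part of the proposed patch; the name close_quietly is hypothetical, and socket.socketpair() is used only so the example runs on its own.

```python
import socket


def close_quietly(sock):
    """Shut down and close a socket, ignoring errors from already-closed sockets."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # Peer already gone or socket already closed
    try:
        sock.close()
    except OSError:
        pass


# Usage: the helper is safe to call more than once on the same socket.
left, right = socket.socketpair()
close_quietly(left)
close_quietly(left)  # Second call hits the OSError path and does nothing
close_quietly(right)
```

Being idempotent matters here because, if forward() runs once per direction, both invocations end up closing the same pair of sockets.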
Testing
- Monitor with `lsof -n | grep vsock | wc -l` to verify connections are cleaned up
- Check file descriptor usage with `ls /proc/*/fd | wc -l`
- Verify idle connections are terminated after 5 minutes
- Confirm no "Bad file descriptor" errors in logs
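Waiting a full 5 minutes per test run is slow, so the idle-timeout logic can be exercised with shortened intervals. The following sketch reproduces the counting loop from forward() in isolation; the function name wait_for_idle and its parameters poll_seconds and max_idle are hypothetical, and socket.socketpair() again stands in for a real connection.

```python
import socket


def wait_for_idle(sock, poll_seconds, max_idle):
    """Return 'idle' after max_idle consecutive empty polls, 'eof' on a clean close."""
    sock.settimeout(poll_seconds)
    idle_count = 0
    while True:
        try:
            data = sock.recv(8192)
        except socket.timeout:
            idle_count += 1
            if idle_count >= max_idle:
                return "idle"
            continue
        if not data:
            return "eof"
        idle_count = 0  # Any data resets the idle counter


# A silent peer triggers the idle path after roughly 3 * 0.05 seconds.
quiet, peer = socket.socketpair()
idle_result = wait_for_idle(quiet, poll_seconds=0.05, max_idle=3)

# Closing the peer triggers the end-of-stream path instead.
peer.close()
eof_result = wait_for_idle(quiet, poll_seconds=0.05, max_idle=3)
quiet.close()
print(idle_result, eof_result)
```

With poll_seconds=1.0 and max_idle=300 the same loop corresponds to the 5-minute timeout described above.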
Impact
This fix is critical for long-running services that use vsock communication, preventing resource exhaustion that could cause service outages after extended operation.