Improve traffic_forwarder.py socket cleanup and connection management #1

@AnthonyRonning

Problem

The traffic_forwarder.py script that forwards traffic between vsock and TCP connections has several issues that can lead to connection leaks and resource exhaustion:

  1. Sockets are not being properly closed, leading to connection accumulation
  2. No idle timeout detection for stale connections
  3. Buffer size is too small (1KB) for efficient data transfer
  4. Excessive logging creates noise in production

Root Cause

When a connection ends, the sockets are only partially shut down (SHUT_RD/SHUT_WR) and never closed. shutdown() stops traffic in one direction but does not release the file descriptor, so descriptors leak over time. In addition, stale connections that stop transferring data are never cleaned up.
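
To make the shutdown/close distinction concrete, here is a minimal sketch using a local socketpair() in place of the vsock/TCP sockets. The function name is only for illustration and is not part of traffic_forwarder.py.

```python
import socket


def fd_after_shutdown_and_close():
    """Return the socket's fileno after shutdown() and again after close()."""
    a, b = socket.socketpair()
    try:
        a.shutdown(socket.SHUT_RDWR)    # I/O is disabled in both directions...
        fd_after_shutdown = a.fileno()  # ...but the descriptor is still allocated
        a.close()                       # this is what releases the descriptor
        fd_after_close = a.fileno()     # Python reports -1 for a closed socket
        return fd_after_shutdown, fd_after_close
    finally:
        a.close()  # close() is safe to call more than once
        b.close()
```

The first value is a valid (non-negative) descriptor even though the socket is fully shut down; only close() turns it into -1.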

Solution

Implement proper socket cleanup and connection management:

Code Changes

# traffic_forwarder.py changes

import logging
import os
import socket

# 1. Add a configurable logging level (reduces log spam)
# Default to WARNING; set the TRAFFIC_LOG_LEVEL env var to "DEBUG" or "INFO"
# for more verbose logging. Unknown level names fall back to WARNING.
log_level = os.environ.get('TRAFFIC_LOG_LEVEL', 'WARNING').upper()
logging.basicConfig(level=getattr(logging, log_level, logging.WARNING),
                    format='%(asctime)s - %(levelname)s - %(message)s')

# 2. In forward() function - add idle timeout and proper cleanup
# (shutdown_flag is assumed to be the script's existing module-level threading.Event)
def forward(source, destination, connection_id, direction):
    """Forward data between sockets with proper cleanup"""
    try:
        source.settimeout(1.0)  # 1 second timeout for checking shutdown
        idle_count = 0
        max_idle = 300  # 5 minutes of idle time before considering connection dead
        
        while not shutdown_flag.is_set():
            try:
                data = source.recv(8192)  # Increased buffer size for better performance
                if not data:
                    logging.info(f"Connection {connection_id}: End of data stream ({direction})")
                    break
                idle_count = 0  # Reset idle counter on data
                logging.debug(f"Connection {connection_id}: Forwarding {len(data)} bytes ({direction})")  # Changed to debug level
                destination.sendall(data)
            except socket.timeout:
                idle_count += 1
                if idle_count >= max_idle:
                    logging.warning(f"Connection {connection_id}: Idle timeout reached ({direction})")
                    break
                continue  # Check shutdown flag
            # ... error handling ...
    finally:
        # Properly close both sockets to avoid connection leaks
        try:
            # First try graceful shutdown
            source.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass  # Socket might already be closed
        try:
            destination.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass
        
        # Then close the sockets
        try:
            source.close()
        except OSError:
            pass
        try:
            destination.close()
        except OSError:
            pass
        
        logging.info(f"Connection {connection_id}: Completed ({direction})")
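
For anyone who wants to exercise the loop in isolation, here is a self-contained sketch of the same idea. The names pump, close_quietly, poll_seconds and max_idle_polls are mine and do not exist in traffic_forwarder.py. The poll interval and idle limit are parameters so the idle path can be triggered in milliseconds instead of 5 minutes, and a generic OSError handler stands in for the elided error handling above.

```python
import socket
import threading


def close_quietly(sock):
    """Shut down and close a socket, ignoring errors from dead sockets."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # not connected, or already shut down / closed
    try:
        sock.close()
    except OSError:
        pass


def pump(source, destination, stop_event, poll_seconds=1.0, max_idle_polls=300):
    """Copy bytes from source to destination and return why the loop ended."""
    reason = "shutdown"
    try:
        source.settimeout(poll_seconds)  # wake up regularly to check stop_event
        idle_polls = 0
        while not stop_event.is_set():
            try:
                data = source.recv(8192)
                if not data:
                    reason = "eof"  # peer closed its side
                    break
                idle_polls = 0  # any data resets the idle counter
                destination.sendall(data)
            except socket.timeout:
                idle_polls += 1
                if idle_polls >= max_idle_polls:
                    reason = "idle"
                    break
            except OSError:
                reason = "error"  # e.g. the other direction already closed us
                break
    finally:
        # Always release both descriptors, whatever ended the loop.
        close_quietly(source)
        close_quietly(destination)
    return reason
```

With the defaults this matches the 1 second poll and 300 idle polls above. Note that the idle counter is per direction, so a connection that only sends data one way for more than 5 minutes would be closed by the quiet direction; whether that is acceptable depends on the traffic being forwarded.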

Benefits

  1. Prevents connection leaks: Proper socket cleanup with SHUT_RDWR and close()
  2. Handles stale connections: 5-minute idle timeout prevents zombie connections
  3. Better performance: 8KB buffer size (up from 1KB) for more efficient data transfer
  4. Reduced log noise: WARNING level by default, configurable via TRAFFIC_LOG_LEVEL env var
  5. Resource protection: Prevents file descriptor exhaustion
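
On point 4, the getattr() lookup accepts any uppercase attribute of the logging module, not only level names. A slightly stricter variant is sketched below; resolve_log_level and configure_logging are hypothetical helpers of mine, not functions in the script.

```python
import logging
import os

# Level names accepted from the environment variable.
_LEVEL_NAMES = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(name, default=logging.WARNING):
    """Map a level name such as "debug" to its numeric value, else return default."""
    key = str(name).strip().upper()
    return getattr(logging, key) if key in _LEVEL_NAMES else default


def configure_logging(environ=os.environ):
    """Configure root logging from TRAFFIC_LOG_LEVEL and return the level used."""
    level = resolve_log_level(environ.get("TRAFFIC_LOG_LEVEL", "WARNING"))
    logging.basicConfig(level=level,
                        format="%(asctime)s - %(levelname)s - %(message)s")
    return level
```

With this, TRAFFIC_LOG_LEVEL=debug enables debug output, while a typo or a non-level name such as BASIC_FORMAT quietly falls back to WARNING.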

Testing

  • Monitor with lsof -n | grep vsock | wc -l to verify connections are cleaned up
  • Check the forwarder's file descriptor count with ls /proc/<pid>/fd | wc -l, where <pid> is the traffic_forwarder.py process ID
  • Verify idle connections are terminated after 5 minutes
  • Confirm no "Bad file descriptor" errors in logs
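
The descriptor checks can also be scripted. The sketch below is Linux-only because it reads /proc, and count_open_fds is an illustrative name of mine. It counts a process's open descriptors so a test can compare the number before and after opening and closing connections.

```python
import os
import socket


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only).

    The listing can include the descriptor used to read the directory
    itself, so compare differences rather than absolute values.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_delta_for_socketpair(close_sockets):
    """Return how many descriptors a socketpair leaves behind.

    Returns the delta and the sockets so a caller can close them later.
    """
    before = count_open_fds()
    a, b = socket.socketpair()
    # Shutting down alone does not free the descriptors.
    a.shutdown(socket.SHUT_RDWR)
    b.shutdown(socket.SHUT_RDWR)
    if close_sockets:
        a.close()
        b.close()
    return count_open_fds() - before, (a, b)
```

Calling fd_delta_for_socketpair(False) reproduces the leak described in this issue (two descriptors left open), while fd_delta_for_socketpair(True) shows the count returning to its starting value.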

Impact

This fix is critical for long-running services that use vsock communication, preventing resource exhaustion that could cause service outages after extended operation.
