-
Notifications
You must be signed in to change notification settings - Fork 4
Add file descriptor limits and buffered output for enclave logging #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Increase file descriptor limit to 65536 to prevent socket exhaustion - Add line buffering (stdbuf -oL) to socat output for timely log delivery - Add background monitoring script to track FD usage every 5 minutes - Warn when FD usage exceeds 50,000 to provide early warning Co-authored-by: Anthony <AnthonyRonning@users.noreply.github.com>
WalkthroughUpdates entrypoint.sh to line-buffer and redirect logs via socat to VSOCK, raise the open file descriptor limit to 65536 with logging, and add a periodic background monitor that logs FD usage and warns above 50,000. The monitor loop appears twice, creating two concurrent monitors. Changes
Sequence Diagram(s)sequenceDiagram
participant Entrypoint
participant stdbuf
participant socat as socat (VSOCK 3:8011)
participant FDMonitor
participant App
Entrypoint->>Entrypoint: ulimit -n 65536
Entrypoint->>stdbuf: Redirect stdout/stderr (line-buffered)
stdbuf->>socat: Stream logs (unbuffered)
Entrypoint->>FDMonitor: Start background loop (periodic FD count)
Entrypoint->>App: Launch main application
App-->>stdbuf: Logs
stdbuf-->>socat: Lines forwarded to VSOCK
FDMonitor-->>Entrypoint: Log FD count / warnings
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Possibly related issues
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements critical infrastructure improvements to the AWS Nitro Enclave entrypoint script to address resource exhaustion issues and improve logging reliability. The changes focus on three key areas:
File Descriptor Limit Increase: The PR raises the file descriptor limit from the system default to 65,536 using ulimit -n 65536. This change is crucial for the enclave environment which runs numerous background services including 22+ traffic forwarders for proxying connections to external services like databases, APIs, and other resources. Each of these services consumes file descriptors for network connections, and under heavy load, the system was hitting the default limit causing "too many open files" errors.
Buffered Logging Enhancement: The logging pipeline is modified to use stdbuf -oL with socat, enabling line buffering for the VSOCK connection that sends logs to the parent instance. This ensures logs are delivered immediately rather than being batched or delayed, which is critical for real-time debugging and monitoring in the production enclave environment.
Proactive Monitoring: A background monitoring script is added that checks file descriptor usage every 5 minutes by counting entries in /proc/*/fd. When usage exceeds 50,000 descriptors, it logs a warning message, providing early detection before hitting the 65,536 limit.
These changes integrate well with the existing enclave architecture documented in the nitro deployment guide, where the entrypoint script is responsible for initializing the enclave environment, setting up traffic forwarders, and managing the application lifecycle. The improvements address production stability concerns while maintaining the security and isolation guarantees of the Nitro Enclave environment.
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it addresses critical production stability issues
- Score reflects well-understood infrastructure improvements with clear benefits and low complexity
- No files require special attention as the changes are straightforward system configuration updates
1 file reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
entrypoint.sh (1)
31-34: Harden ulimit setting: respect hard limit, check errors, and log both soft/hard
ulimit -n 65536silently fails if the hard limit is lower, and you currently don’t check the exit status. Set the soft limit up to the hard cap and log both values.Apply this diff:
-# Increase file descriptor limits to prevent socket exhaustion -ulimit -n 65536 -log "File descriptor limit set to: $(ulimit -n)" +# Increase file descriptor limits to prevent socket exhaustion +TARGET_NOFILE=65536 +HARD_NOFILE="$(ulimit -Hn || echo 65536)" +if [ "$TARGET_NOFILE" -gt "$HARD_NOFILE" ]; then + log "Requested nofile $TARGET_NOFILE exceeds hard limit $HARD_NOFILE; using hard limit" + TARGET_NOFILE="$HARD_NOFILE" +fi +if ulimit -Sn "$TARGET_NOFILE"; then + log "File descriptor soft limit set to: $(ulimit -Sn); hard limit: $(ulimit -Hn)" +else + log "WARNING: Failed to set file descriptor limit to $TARGET_NOFILE; current soft: $(ulimit -Sn); hard: $(ulimit -Hn)" +fi
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
entrypoint.sh(3 hunks)
| # Redirect all output to the logging script via VSOCK with buffering | ||
| # Use stdbuf to line-buffer output for timely delivery | ||
| exec > >(stdbuf -oL socat -u - VSOCK-CONNECT:3:8011) 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make log redirection resilient to vsock logger outages to avoid whole script termination
With set -e, if the socat process in the process substitution exits (e.g., VSOCK 3:8011 not yet ready), subsequent writes to stdout/stderr can hit EPIPE and terminate the entrypoint unexpectedly. This is a production risk.
Refactor to use a FIFO + a supervised, auto-retrying logger so the writer FD remains valid even if the remote is temporarily unavailable. Also add a fallback when stdbuf is unavailable.
Apply this diff:
-# Redirect all output to the logging script via VSOCK with buffering
-# Use stdbuf to line-buffer output for timely delivery
-exec > >(stdbuf -oL socat -u - VSOCK-CONNECT:3:8011) 2>&1
+# Redirect all output to the logging script via VSOCK with buffering.
+# Use a FIFO and a supervised logger to avoid EPIPE killing the script if the vsock target is not ready.
+rm -f /tmp/opensecret.logpipe
+mkfifo -m 600 /tmp/opensecret.logpipe
+(
+ while true; do
+ if command -v stdbuf >/dev/null 2>&1; then
+ stdbuf -oL socat -u - VSOCK-CONNECT:3:8011 < /tmp/opensecret.logpipe || true
+ else
+ socat -u - VSOCK-CONNECT:3:8011 < /tmp/opensecret.logpipe || true
+ fi
+ sleep 1
+ done
+) &
+exec > /tmp/opensecret.logpipe 2>&1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Redirect all output to the logging script via VSOCK with buffering | |
| # Use stdbuf to line-buffer output for timely delivery | |
| exec > >(stdbuf -oL socat -u - VSOCK-CONNECT:3:8011) 2>&1 | |
| # Redirect all output to the logging script via VSOCK with buffering. | |
| # Use a FIFO and a supervised logger to avoid EPIPE killing the script if the vsock target is not ready. | |
| rm -f /tmp/opensecret.logpipe | |
| mkfifo -m 600 /tmp/opensecret.logpipe | |
| ( | |
| while true; do | |
| if command -v stdbuf >/dev/null 2>&1; then | |
| stdbuf -oL socat -u - VSOCK-CONNECT:3:8011 < /tmp/opensecret.logpipe || true | |
| else | |
| socat -u - VSOCK-CONNECT:3:8011 < /tmp/opensecret.logpipe || true | |
| fi | |
| sleep 1 | |
| done | |
| ) & | |
| exec > /tmp/opensecret.logpipe 2>&1 |
🤖 Prompt for AI Agents
In entrypoint.sh around lines 15-17, the current exec > >(stdbuf -oL socat -u -
VSOCK-CONNECT:3:8011) 2>&1 can cause the script to die with EPIPE if the socat
process exits; replace the process-substitution approach with a named FIFO and a
supervised, auto-retrying background logger so the writer FD stays valid
regardless of remote availability. Create a tmp FIFO (mkfifo), start a
background loop that reads from the FIFO and runs socat (or a fallback logger)
with exponential/backoff retry on failure, and redirect stdout/stderr of the
main script into the FIFO (using stdbuf -oL if available, otherwise use cat for
no-buffering fallback). Ensure the background supervisor logs its own errors,
runs detached, and that you trap EXIT to remove the FIFO and kill the background
logger to avoid leaks.
| # Start file descriptor monitoring in background | ||
| ( | ||
| while true; do | ||
| sleep 300 # Check every 5 minutes | ||
| FD_COUNT=$(ls /proc/*/fd 2>/dev/null | wc -l) | ||
| log "File descriptors in use: $FD_COUNT" | ||
| if [ "$FD_COUNT" -gt 50000 ]; then | ||
| log "WARNING: High file descriptor usage: $FD_COUNT" | ||
| fi | ||
| done | ||
| ) & | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix FD monitor count method and emit first reading immediately
ls /proc/*/fd | wc -l is inaccurate (directory header lines inflate counts) and relatively expensive. Prefer /proc/sys/fs/file-nr for a cheap, accurate snapshot. Also, move sleep to the end so you don’t wait 5 minutes for the first log.
Apply this diff:
-# Start file descriptor monitoring in background
-(
- while true; do
- sleep 300 # Check every 5 minutes
- FD_COUNT=$(ls /proc/*/fd 2>/dev/null | wc -l)
- log "File descriptors in use: $FD_COUNT"
- if [ "$FD_COUNT" -gt 50000 ]; then
- log "WARNING: High file descriptor usage: $FD_COUNT"
- fi
- done
-) &
+# Start file descriptor monitoring in background
+(
+ while true; do
+ # /proc/sys/fs/file-nr: <allocated> <unused> <max>
+ if read -r ALLOC UNUSED MAX < /proc/sys/fs/file-nr; then
+ FD_USED=$((ALLOC - UNUSED))
+ log "File descriptors in use: $FD_USED (allocated: $ALLOC, unused: $UNUSED, max: $MAX)"
+ THRESHOLD=50000
+ # If the system max is below the static threshold, warn when within ~90% of max.
+ if [ "$MAX" -lt "$THRESHOLD" ]; then
+ WARN_AT=$(( (MAX * 90) / 100 ))
+ else
+ WARN_AT="$THRESHOLD"
+ fi
+ if [ "$FD_USED" -gt "$WARN_AT" ]; then
+ log "WARNING: High file descriptor usage: $FD_USED (threshold: $WARN_AT)"
+ fi
+ else
+ log "WARNING: Unable to read /proc/sys/fs/file-nr for FD usage"
+ fi
+ sleep 300 # Check every 5 minutes
+ done
+) &🤖 Prompt for AI Agents
In entrypoint.sh around lines 667 to 678, replace the current FD counting and
timing: use /proc/sys/fs/file-nr and take its first field as the accurate, cheap
file descriptor count instead of ls /proc/*/fd | wc -l, and move the sleep to
the end of the loop so the first reading is emitted immediately; also ensure the
read handles errors (e.g., fallback to 0) and keep the log and warning behavior
unchanged.
Fixes #86
This PR implements critical improvements to prevent enclave resource exhaustion and ensure reliable logging:
Changes
ulimit -nto prevent socket exhaustionstdbuf -oL) to socat for timely log deliveryBenefits
Testing
ulimit -ninside enclaveGenerated with Claude Code
Summary by CodeRabbit