Fix SIGTERM propagation to AWF process for timeout-minutes enforcement#24013
Fix SIGTERM propagation to AWF process for timeout-minutes enforcement#24013
Conversation
…substitution and add SIGTERM trap The original `cmd 2>&1 | tee -a logfile` pipeline breaks SIGTERM delivery when GitHub Actions enforces timeout-minutes. The runner sends SIGTERM to the shell, but tee does not forward it upstream to awf, and sudo may also intercept the signal. The fix uses two techniques: 1. Process substitution (> >(tee -a)) instead of a pipe so that awf is the foreground process and its PID can be captured. 2. An explicit SIGTERM/SIGINT trap that kills the captured AWF PID, ensuring the signal reaches awf even when sudo intercepts it. Updated wasm golden test files and recompiled all 179 workflow lock files. Fixes: #23965 Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e0e3e49b-8af7-4519-a71b-2748d64cffcc Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@lpcox maybe we should have a job level timeout as well. |
There was a problem hiding this comment.
Pull request overview
This PR fixes timeout-minutes enforcement for agent steps by changing how the AWF-wrapped engine command is executed and logged, ensuring SIGTERM/SIGINT can be relayed to the AWF process rather than being swallowed by a | tee pipeline and/or sudo.
Changes:
- Update
BuildAWFCommand()to replace2>&1 | tee -a ...with process substitution> >(tee -a ...) 2>&1, and add a TERM/INT trap that attempts to terminate the captured PID. - Regenerate workflow lock files to reflect the new generated shell script shape.
- Update WASM golden fixtures to match the new generated script.
Reviewed changes
Copilot reviewed 157 out of 182 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/awf_helpers.go | Generates a different bash script for AWF execution to improve SIGTERM/SIGINT propagation under timeouts. |
| pkg/workflow/testdata/wasm_golden/TestWasmGolden_CompileFixtures/with-imports.golden | Updates golden output for the new AWF execution/trap script. |
| pkg/workflow/testdata/wasm_golden/TestWasmGolden_CompileFixtures/basic-copilot.golden | Updates golden output for the new AWF execution/trap script. |
| .github/workflows/workflow-health-manager.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/workflow-generator.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/weekly-blog-post-writer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/test-workflow.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/test-project-url-default.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/test-dispatcher.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/terminal-stylist.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/super-linter.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/sub-issue-closer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/security-review.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/security-compliance.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/repository-quality-improver.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/repo-tree-map.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/repo-audit-analyzer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/refiner.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/q.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/pr-triage-agent.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/pr-nitpick-reviewer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/plan.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/pdf-summary.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/notion-issue-summary.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/metrics-collector.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/jsweep.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/issue-triage-agent.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/issue-monster.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/gpclean.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/github-remote-mcp-auth-test.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/firewall.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/example-permissions-warning.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/dictation-prompt.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/dev.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/dependabot-go-checker.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/dependabot-burner.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/dead-code-remover.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-team-status.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-semgrep-scan.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-secrets-analysis.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-regulatory.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-malicious-code-scan.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-cli-tools-tester.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-cli-performance.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-assign-issue-to-user.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/daily-architecture-diagram.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/craft.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/contribution-check.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/constraint-solving-potd.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/codex-github-remote-mcp-test.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/code-simplifier.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/code-scanning-fixer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/ci-coach.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/changeset.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/brave.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/bot-detection.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/archie.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/ai-moderator.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/agentic-observability-kit.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/agent-persona-explorer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/agent-performance-analyzer.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
| .github/workflows/ace-editor.lock.yml | Regenerated to use process substitution + trap/wait pattern. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const awfSignalTrap = `# Signal forwarding: capture AWF PID to relay SIGTERM/SIGINT on timeout. | ||
| # Without this, the pipe (| tee) and sudo intercept the signal before awf | ||
| # receives it, so GitHub Actions timeout-minutes is not enforced. | ||
| _awf_pid="" | ||
| _awf_terminate() { [ -n "$_awf_pid" ] && kill -TERM "$_awf_pid" 2>/dev/null || true; } | ||
| trap '_awf_terminate' TERM INT` |
There was a problem hiding this comment.
The signal trap kills $_awf_pid, but with sudo -E awf ... & $! will typically be the PID of sudo, not the actual awf process (sudo often forks/execs the target). In that case, kill -TERM "$ _awf_pid" may terminate sudo while leaving awf running, which would still prevent timeout-minutes from being enforced. Consider sending the signal to the whole process group (e.g., kill -TERM -- -"$_awf_pid") or otherwise ensuring the PID you capture is the one that will reliably terminate awf.
pkg/workflow/awf_helpers.go
Outdated
| // 1. Process substitution (> >(tee -a)) instead of a pipe so that awf | ||
| // is the foreground process and its PID can be captured. | ||
| // 2. An explicit SIGTERM/SIGINT trap that kills the captured AWF PID, | ||
| // ensuring the signal reaches awf even when sudo intercepts it. |
There was a problem hiding this comment.
The rationale comment says process substitution makes awf the “foreground process”, but the generated command backgrounds it (... 2>&1 &) and then waits. This is a bit misleading for future readers (and could affect how signals/TTY behavior is reasoned about). Suggest rewording the comment to emphasize “not in a pipe / PID can be captured” rather than “foreground process”.
pkg/workflow/awf_helpers.go
Outdated
| command = fmt.Sprintf(`set -o pipefail | ||
| %s | ||
| %s | ||
| # shellcheck disable=SC1003 | ||
| %s %s %s \ | ||
| -- %s 2>&1 | tee -a %s`, | ||
| -- %s > >(tee -a %s) 2>&1 & | ||
| _awf_pid=$! | ||
| wait "$_awf_pid"`, |
There was a problem hiding this comment.
Switching from a pipe to process substitution means the script no longer observes tee’s exit status. Previously, set -o pipefail would fail the step if tee -a failed (disk full/permission issues), but now wait "$_awf_pid" only reflects the awf/sudo exit code and logging failures could go unnoticed. If log capture is still expected to be reliable, consider a pattern that can also wait/check tee (e.g., FIFO + background tee with captured PID) or document that tee failures are intentionally ignored.
|
@copilot review comments |
…ect tee failures via FIFO - Use `kill -TERM -- -"$_awf_pid"` to signal the whole process group, ensuring awf is reached even when sudo intercepts the initial SIGTERM - Replace process substitution with a FIFO so both awf and tee PIDs can be tracked and waited on — tee failures (e.g. disk full) now fail the step - Fix comment: "foreground process" → "not in a pipe / PID can be captured" (the command is backgrounded with & and waited on) Recompiled all 179 lock files and updated wasm golden fixtures. Agent-Logs-Url: https://github.com/github/gh-aw/sessions/be54e6d4-adb2-4bb9-9a99-72d2c3434477 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed all three review comments in 1fe2d5f:
Generated script shape: _awf_log=$(mktemp -u)
mkfifo "$_awf_log"
_awf_pid="" _awf_tee_pid=""
_awf_terminate() {
[ -n "$_awf_pid" ] && kill -TERM -- -"$_awf_pid" 2>/dev/null || true
}
trap '_awf_terminate' TERM INT
tee -a /tmp/gh-aw/agent-stdio.log < "$_awf_log" &
_awf_tee_pid=$!
sudo -E awf ... > "$_awf_log" 2>&1 &
_awf_pid=$!
wait "$_awf_pid"; _awf_exit=$?
wait "$_awf_tee_pid"
rm -f "$_awf_log"
exit "$_awf_exit"Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot merge main and recompile |
…ignal-propagation-path # Conflicts: # .github/workflows/ai-moderator.lock.yml # .github/workflows/changeset.lock.yml # .github/workflows/ci-doctor.lock.yml # .github/workflows/codex-github-remote-mcp-test.lock.yml # .github/workflows/daily-fact.lock.yml # .github/workflows/daily-issues-report.lock.yml # .github/workflows/daily-observability-report.lock.yml # .github/workflows/duplicate-code-detector.lock.yml # .github/workflows/grumpy-reviewer.lock.yml # .github/workflows/issue-arborist.lock.yml # .github/workflows/schema-feature-coverage.lock.yml # .github/workflows/smoke-agent-all-merged.lock.yml # .github/workflows/smoke-agent-all-none.lock.yml # .github/workflows/smoke-agent-public-approved.lock.yml # .github/workflows/smoke-agent-public-none.lock.yml # .github/workflows/smoke-agent-scoped-approved.lock.yml # .github/workflows/smoke-call-workflow.lock.yml # .github/workflows/smoke-codex.lock.yml Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/ac282fd9-3a60-4964-8f24-0ecf2725dc38 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 6b9c6d6. Merged origin/main (resolved 18 lock file conflicts), rebuilt the binary, and recompiled all 179 workflow lock files. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
This comment has been minimized.
This comment has been minimized.
|
Hey The PR is well-structured, includes a thorough before/after script comparison, and the golden test files and 179 regenerated lock files all look like expected artifacts of the template change. This is exactly the kind of core infrastructure improvement the project benefits from. This PR looks ready for maintainer review. ✅
|
|
I think the right fix should be in awf: github/gh-aw-firewall#1623 |
timeout-minuteson the agent step is not enforced because thecmd 2>&1 | tee -a logfilepipeline blocks SIGTERM:teedoesn't forward signals upstream andsudomay intercept beforeawfreceives them.Changes
pkg/workflow/awf_helpers.go—BuildAWFCommand()now generates a shell script that:mkfifo) instead of a pipe or process substitution, soawfandteeare independent background processes with separately captured PIDskill -TERM -- -"$_awf_pid"(process group kill) in the TERM/INT trap, ensuring the signal reachesawfeven whensudointercepts the initial delivery$_awf_pid(awf/sudo) and$_awf_tee_pid(tee) independently, so logging failures (e.g. disk full) are detected and fail the stepawfexit code via_awf_exitand passes it to the finalexitGolden files + 179 lock files — recompiled to reflect new script shape
Generated script (before → after)
Before:
After: