feat: CMS autoTriage improvements and review feedback #283
RossHastie wants to merge 5 commits into microsoft:main
autoTriage: extract shared constants, refactor 360-line triage_issues into 3 helpers (212 lines), add type hints, replace emojis with text indicators, bounded caching with eviction, LLM rate limiting, input truncation, pin dependencies, add Dependabot config, GitHub Enterprise URL support.

Feedback: add all CMS V1 review reports and consolidated findings register from 4 independent code review agents (security, architecture, code quality, evaluation methodology).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR improves the autoTriage tooling by refactoring the intake pipeline, adding LLM call protections (rate limiting + input truncation/sanitization), improving caching behavior in GitHub API wrappers, and aligning user-facing output with repo conventions. It also adds operational documentation artifacts and enables Dependabot for Python dependency updates.
Changes:
- Refactors `triage_issues()` by extracting helpers and standardizing outputs (including removing emoji status markers in autoTriage outputs).
- Adds LLM request hardening: rolling-window rate limiter, consistent input truncation, and user-content sanitization before prompt interpolation.
- Replaces `@lru_cache` usage on GitHubService instance methods with a bounded, TTL-based module cache; pins Python dependencies and adds Dependabot config.
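
For readers unfamiliar with the pattern, here is a minimal sketch of a bounded, TTL-based cache with eviction. It is not the PR's implementation: the class name `BoundedTTLCache`, the TTL value, and the injectable `clock` are assumptions made for illustration; only `CACHE_MAX_ENTRIES = 100` comes from the PR summary. The two eviction phases shown (drop expired entries, then drop the oldest insertions) are one plausible reading of the approach.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_MAX_ENTRIES = 100    # value quoted in the PR summary
CACHE_TTL_SECONDS = 300.0  # assumed; the PR does not state the TTL


class BoundedTTLCache:
    """Hypothetical module-level cache: bounded size, per-entry TTL."""

    def __init__(
        self,
        max_entries: int = CACHE_MAX_ENTRIES,
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be >= 1")
        self._max_entries = max_entries
        self._ttl = ttl
        self._clock = clock
        # key -> (time stored, value); dicts preserve insertion order.
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        self._data.pop(key, None)  # re-insert so the key becomes the newest
        if len(self._data) >= self._max_entries:
            # Phase 1: remove every entry whose TTL has elapsed.
            expired = [k for k, (ts, _) in self._data.items() if now - ts >= self._ttl]
            for k in expired:
                del self._data[k]
        # Phase 2: still full, so evict the oldest insertions (FIFO).
        while len(self._data) >= self._max_entries:
            del self._data[next(iter(self._data))]
        self._data[key] = (now, value)

    def __len__(self) -> int:
        return len(self._data)
```

Unlike `@lru_cache` on an instance method, a cache like this does not hold a reference to `self` as part of the key, and its size stays bounded regardless of how many repositories are queried.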
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/cli/install-cli.ps1 | Adds strict mode, prerequisite validation, and comment-based help for CLI installation script. |
| scripts/cli/Auth/New-Agent365ToolsServicePrincipalProdPublic.ps1 | Improves robustness via strict mode and error handling for module install/import. |
| autoTriage/services/teams_service.py | Replaces emoji markers in Teams Adaptive Cards with text labels. |
| autoTriage/services/llm_service.py | Adds sanitization, truncation, and process-wide rate limiting; updates prompt formatting. |
| autoTriage/services/intake_service.py | Adds URL parsing via urlparse, refactors triage orchestration into helpers, and applies sanitization in Copilot paths. |
| autoTriage/services/github_service.py | Introduces bounded cache eviction and migrates repository label caching to TTL cache. |
| autoTriage/requirements.txt | Pins dependency versions and adds guidance for hash verification. |
| autoTriage/constants.py | Introduces shared constants module (e.g., contributors-to-show limit). |
| Feedback/*.md | Adds multiple review reports, consolidated findings, and verification documentation. |
| .github/dependabot.yml | Enables weekly Dependabot updates for /autoTriage pip dependencies. |
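
As an illustration of the input truncation listed for `llm_service.py`, the sketch below applies the two limits named in the PR summary (`MAX_ISSUE_TITLE_LENGTH=200`, `MAX_ISSUE_BODY_LENGTH=2000`) through a single helper. The function names `truncate` and `prepare_issue_fields` are hypothetical, and plain slicing is an assumption; the PR may truncate differently (for example, by appending a marker).

```python
from typing import Optional, Tuple

MAX_ISSUE_TITLE_LENGTH = 200   # from the PR summary
MAX_ISSUE_BODY_LENGTH = 2000   # from the PR summary


def truncate(text: Optional[str], limit: int) -> str:
    """Return text cut to at most `limit` characters; None becomes ''."""
    if not text:
        return ""
    return text if len(text) <= limit else text[:limit]


def prepare_issue_fields(title: Optional[str], body: Optional[str]) -> Tuple[str, str]:
    """Apply the same limits on every code path that builds a prompt."""
    return (
        truncate(title, MAX_ISSUE_TITLE_LENGTH),
        truncate(body, MAX_ISSUE_BODY_LENGTH),
    )
```

Routing every prompt-building path through one helper is what makes the limits apply uniformly, rather than relying on each call site to remember them.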
- Sanitise exceptions to prevent token/key leakage in logs
- Add XML entity escaping to defend against prompt injection
- Bound _repo_cache with FIFO eviction at CACHE_MAX_ENTRIES
- Guard rate limiter against negative sleep times
- Validate LLM JSON output against allowlisted type/priority enums

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
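
The allowlist validation of LLM output could look roughly like the sketch below. The enum values, the fallback classification, and the function name `parse_classification` are all assumptions for illustration; the PR does not list the actual type or priority values.

```python
import json
from typing import Dict

# Hypothetical allowlists; the real enum values are not shown in the PR.
ALLOWED_TYPES = frozenset({"bug", "feature", "question", "documentation"})
ALLOWED_PRIORITIES = frozenset({"P0", "P1", "P2", "P3"})
FALLBACK = {"type": "question", "priority": "P3"}  # assumed safe default


def parse_classification(raw: str) -> Dict[str, str]:
    """Parse LLM JSON and keep only values found in the allowlists."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return dict(FALLBACK)
    if not isinstance(data, dict):
        return dict(FALLBACK)

    result = dict(FALLBACK)
    issue_type = data.get("type")
    if isinstance(issue_type, str) and issue_type.strip().lower() in ALLOWED_TYPES:
        result["type"] = issue_type.strip().lower()
    priority = data.get("priority")
    if isinstance(priority, str) and priority.strip().upper() in ALLOWED_PRIORITIES:
        result["priority"] = priority.strip().upper()
    return result
```

Anything the model returns outside the allowlists is replaced by the fallback, so a prompt-injected value can never flow into labels or downstream automation.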
…rovements

- Extract sanitise_exception/sanitise_user_content to utils/sanitise.py so they are shared across services rather than imported as private symbols
- Fix XML escaping order: & escaped before < and > to prevent double-encoding
- Fix Bearer token redaction order: Bearer regex runs before key=value regex so the full JWT value is captured before the header name consumes it
- Add threading.Lock to RateLimiter for safe concurrent Azure Functions use
- Add github_host propagation through _fetch_issues_to_triage so GHE issue URLs in IssueClassification point at the correct server, not github.com
- Add ValueError guard when both issue_url and issue_numbers are provided
- Replace CONFIDENTIAL classification markers with INTERNAL in Feedback files
- Replace emoji character in PowerShell script with plain-text WARNING label
- Add 39 tests: sanitise utils, RateLimiter, _parse_issue_url (incl. GHE), and _fetch_issues_to_triage mutual exclusion guard

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
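
The two ordering fixes in this commit are easy to get wrong, so here is a sketch of why the order matters. The function names match the commit message, but the regular expressions, the `[REDACTED]` placeholder, and the list of sensitive key names are assumptions; the actual patterns in `utils/sanitise.py` are not shown on this page.

```python
import re

# Assumed patterns, for illustration only.
_BEARER_RE = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE)
_KEY_VALUE_RE = re.compile(
    r"\b(authorization|token|api[_-]?key|password|secret)(\s*[=:]\s*)\S+",
    re.IGNORECASE,
)


def sanitise_user_content(text: str) -> str:
    """Escape XML entities. '&' must go first: escaping '<' first would
    produce '&lt;', whose '&' would then be re-escaped to '&amp;lt;'."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def sanitise_exception(exc: BaseException) -> str:
    """Redact credentials from an exception message before logging it."""
    message = str(exc)
    # Bearer first: if the key=value pattern ran first on
    # 'Authorization: Bearer <jwt>', it would treat the word 'Bearer' as
    # the value and leave the JWT itself in the message.
    message = _BEARER_RE.sub("[REDACTED]", message)
    message = _KEY_VALUE_RE.sub(r"\1\2[REDACTED]", message)
    return message
```

With this order, `Authorization: Bearer <jwt>` first becomes `Authorization: [REDACTED]`, and the second pass leaves it unchanged apart from re-substituting the placeholder.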
- Plumb GHE hostname through GitHubService: add github_host parameter to __init__ and configure PyGithub base_url for non-github.com hosts; triage_issues extracts the host from issue_url before constructing the service so API calls reach the correct GHE endpoint
- Defensive parsing for LLM_MAX_CALLS_PER_MINUTE: wrap int() in try/except and fall back to the default (60) with a warning rather than crashing at import time when the env var contains a non-integer value

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
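
The defensive parsing described in the second bullet might be sketched as follows. The environment variable name and the default of 60 come from the commit message; the helper name `read_max_calls_per_minute` and the injectable `env` mapping are assumptions added to keep the example testable.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

DEFAULT_MAX_CALLS_PER_MINUTE = 60  # default stated in the commit message


def read_max_calls_per_minute(env: Optional[Mapping[str, str]] = None) -> int:
    """Read LLM_MAX_CALLS_PER_MINUTE, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("LLM_MAX_CALLS_PER_MINUTE")
    if raw is None:
        return DEFAULT_MAX_CALLS_PER_MINUTE
    try:
        return int(raw)
    except ValueError:
        # Warn and continue instead of crashing at import time.
        logger.warning(
            "LLM_MAX_CALLS_PER_MINUTE=%r is not an integer; using default %d",
            raw,
            DEFAULT_MAX_CALLS_PER_MINUTE,
        )
        return DEFAULT_MAX_CALLS_PER_MINUTE
```

Note that this only handles non-integer input; a parsed value of `0` or a negative number still passes through and needs a separate range check.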
Keep main's bash-delegation approach (install-cli.ps1 → install-cli.sh) and retain the PR branch's synopsis header and strict-mode improvements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@RossHastie please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
    def __init__(self, max_calls_per_minute: int = 60):
        self._max_calls = max_calls_per_minute
        self._calls: List[float] = []
        self._lock = threading.Lock()

    def wait_if_needed(self) -> None:
        """Block until a call is allowed under the rate limit."""
        with self._lock:
            now = time.monotonic()
            # Remove timestamps that have aged out of the 60-second window.
            self._calls = [t for t in self._calls if now - t < 60]

            if len(self._calls) >= self._max_calls:
                # Sleep until the oldest recorded call falls outside the window.
                # max(0, ...) guards against a negative value that could arise from
                # clock skew or a race where the window entry just aged out between
                # the list-comprehension above and this calculation.
                sleep_time = max(0, 60 - (now - self._calls[0]))
                if sleep_time > 0:
RateLimiter doesn't validate max_calls_per_minute. If LLM_MAX_CALLS_PER_MINUTE is set to 0 (or a negative value), wait_if_needed() will enter the limit branch with an empty _calls list and then access self._calls[0], raising IndexError. Consider enforcing max_calls_per_minute >= 1 (or treating 0 as 'disabled' and skipping limiting).
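
One way to address this comment is the first option it suggests: reject values below 1 in the constructor. The sketch below is an assumption-laden illustration, not the PR's code. The quoted hunk is cut off after `if sleep_time > 0:`, so everything past that point (the sleep, the re-prune, and recording the call) is guessed, and the `clock` and `sleep` parameters are hypothetical additions that make the class testable without real delays.

```python
import threading
import time
from typing import Callable, List


class RateLimiter:
    """Rolling 60-second window limiter with constructor validation."""

    WINDOW_SECONDS = 60.0

    def __init__(
        self,
        max_calls_per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,  # hypothetical test hook
        sleep: Callable[[float], None] = time.sleep,  # hypothetical test hook
    ) -> None:
        if max_calls_per_minute < 1:
            # Prevents the IndexError on self._calls[0] for 0 or negatives.
            raise ValueError("max_calls_per_minute must be >= 1")
        self._max_calls = max_calls_per_minute
        self._calls: List[float] = []
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def _prune(self, now: float) -> None:
        # Remove timestamps that have aged out of the window.
        self._calls = [t for t in self._calls if now - t < self.WINDOW_SECONDS]

    def wait_if_needed(self) -> None:
        """Block until a call is allowed, then record it."""
        with self._lock:
            now = self._clock()
            self._prune(now)
            if len(self._calls) >= self._max_calls:
                # Non-empty here because max_calls >= 1 is enforced above.
                sleep_time = max(0.0, self.WINDOW_SECONDS - (now - self._calls[0]))
                if sleep_time > 0:
                    self._sleep(sleep_time)
                now = self._clock()
                self._prune(now)
            self._calls.append(now)
```

The sleep happens while the lock is held, which serialises waiting callers; that appears consistent with the quoted hunk, but whether it is the intended behaviour under concurrent Azure Functions invocations is a design decision for the author.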
Summary
- `triage_issues()` function, deduplicated constants into `constants.py`, replaced emojis with text labels, added type hints
- `LLM_MAX_CALLS_PER_MINUTE` config
- `@lru_cache` on instance methods with bounded dict cache (`CACHE_MAX_ENTRIES=100`) and two-phase eviction
- `MAX_ISSUE_TITLE_LENGTH=200`, `MAX_ISSUE_BODY_LENGTH=2000` applied uniformly
- `urlparse`-based URL handling instead of hardcoded regex
- `.github/dependabot.yml` for automated dependency updates

23 files changed, +4,607/-292 lines
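
To illustrate the `urlparse`-based URL handling, here is a minimal sketch that accepts both github.com and GitHub Enterprise issue URLs. The names `IssueRef` and `parse_issue_url` are hypothetical (the PR's helper is `_parse_issue_url`, whose signature and return type are not shown), and the HTTPS-only check and the example hostnames are assumptions.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class IssueRef(NamedTuple):
    host: str
    owner: str
    repo: str
    number: int


def parse_issue_url(url: str) -> IssueRef:
    """Split https://<host>/<owner>/<repo>/issues/<n> into its parts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"Not an HTTPS issue URL: {url!r}")
    # Dropping empty segments tolerates a trailing slash.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 4 or parts[2] != "issues":
        raise ValueError(f"Unexpected issue URL path: {parsed.path!r}")
    owner, repo, _, number = parts
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"Issue number is not numeric: {number!r}")
    return IssueRef(parsed.hostname, owner, repo, int(number))
```

Because the host is returned rather than assumed, the caller can pass it on to the service layer so that API calls and generated issue links target the same server the URL came from.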
Test plan
🤖 Generated with Claude Code