Add automated template submission pipeline #799
Conversation
Issue form + workflow that validates repos, updates templates.json, and creates PRs automatically. Mirrors the extension submission model.

- Issue form with 12 fields (source repo, title, description, author, etc.)
- Workflow with SHA-pinned actions, SSRF protection, input sanitization
- Standalone scripts: parse-template-issue.js, update-templates-json.js, validate-template.js (shared validation library)
- 129 tests across 5 test suites
- No auto-merge, no trusted-publisher labels
- Human reviews and approves the PR

Closes Azure#784

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds an issue-form–driven automation path for adding new azd templates to the gallery, similar in spirit to the existing automated extension flows, by validating a submitted repo URL and generating a PR that updates website/static/templates.json.
Changes:
- Introduces a new GitHub Issue Form + workflow (`template-submission`) that parses issue/dispatch inputs, validates the repo URL, updates `templates.json`, and opens a PR.
- Adds standalone Node scripts for issue parsing, URL/SSRF validation, and updating `templates.json`.
- Adds Jest tests covering URL canonicalization/SSRF checks, issue parsing, and JSON updating; updates Jest config.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `.github/workflows/template-submission.yml` | New workflow to parse submission inputs, validate the repo, update templates.json, and create a PR + issue comments. |
| `.github/ISSUE_TEMPLATE/template-submission.yml` | New issue form for template submissions (collects repo/title/description/tags/etc.). |
| `website/scripts/validate-template.js` | URL validation + SSRF protections and a HEAD reachability check. |
| `website/scripts/parse-template-issue.js` | Parses GitHub issue body / workflow_dispatch inputs into structured fields. |
| `website/scripts/update-templates-json.js` | Adds a new template entry to templates.json with duplicate detection and sanitization. |
| `website/test/validate-template.test.ts` | Unit tests for URL canonicalization, SSRF checks, and validation behavior. |
| `website/test/parse-template-issue.test.ts` | Unit tests for issue parsing and required-field detection. |
| `website/test/update-templates-json.test.ts` | Unit tests for sanitization, tag parsing, duplicate detection, and JSON writing. |
| `website/jest.config.js` | Updates Jest config (adds transformIgnorePatterns). |
| `website/docs/contribute.md` | Documents the new automated submission path and links to the issue form. |
`website/scripts/validate-template.js` (outdated)
```js
function isPrivateHost(hostname) {
  if (/^localhost$/i.test(hostname)) return true;

  // Strip brackets for IPv6 literal check
  const bare = hostname.replace(/^\[|\]$/g, "");
  if (net.isIP(bare)) return isPrivateIP(bare);

  // IPv4 patterns (URL parser normalises hex/octal to dotted decimal)
  if (PRIVATE_IPV4_RANGES.some((re) => re.test(hostname))) return true;
```
isPrivateHost applies IPv4 private-range regexes to any hostname string (e.g., 10.example.com), which will incorrectly classify some legitimate public hostnames as private. Since net.isIP(bare) already detects real IP literals (including URL parser normalization), consider removing the PRIVATE_IPV4_RANGES hostname regex check and relying on IP-literal detection + safeLookup for DNS results.
Fixed. Removed the hostname regex check in `isPrivateHost`: `net.isIP(bare)` already catches real IP literals, and `safeLookup` handles DNS-resolved private IPs for non-literal hostnames. This eliminates false positives such as `10.example.com`.
```yaml
on:
  issues:
    types: [labeled]
  workflow_dispatch:
    inputs:
      source_repo:
        description: "GitHub repository URL"
        required: true
      template_title:
        description: "Template title"
        required: true
      description:
        description: "1-2 sentence description"
        required: true
      author:
        description: "Author name"
        required: true
      author_url:
        description: "Author GitHub URL"
        required: true
      author_type:
        description: "Microsoft or Community"
        required: true
        default: "Community"
        type: choice
        options:
          - Community
          - Microsoft
      preview_image:
        description: "Preview image URL (optional)"
        required: false
      iac_provider:
        description: "IaC provider"
        required: true
        default: "Bicep"
        type: choice
        options:
          - Bicep
          - Terraform
          - Both
      languages:
        description: "Comma-separated language tags (optional)"
        required: false
      frameworks:
        description: "Comma-separated framework tags (optional)"
        required: false
      azure_services:
        description: "Comma-separated Azure service tags (optional)"
        required: false

permissions:
  contents: write
  pull-requests: write
  issues: write

concurrency:
  group: template-submission
  cancel-in-progress: false

jobs:
  process-template:
    if: >-
      github.event_name == 'workflow_dispatch' ||
      contains(github.event.issue.labels.*.name, 'template-submission')
    runs-on: ubuntu-latest
```
The workflow listens for issues: types: [labeled], but the issue form applies the template-submission label at creation time. In practice, this can result in submissions not being processed until someone re-labels the issue. Either (a) change the trigger to include opened (and keep the label check), or (b) remove the default label from the issue template and require a maintainer to apply the label to start processing.
Fixed. Added `opened` to the trigger types alongside `labeled`. The label check in the `if` condition still gates processing, but now submissions are reliably processed at creation time.
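The resulting trigger would look roughly like this (a sketch based on the workflow excerpt above, not the exact file contents):

```yaml
on:
  issues:
    # "opened" catches issue-form submissions that arrive pre-labeled;
    # "labeled" still covers manual re-labeling by a maintainer.
    types: [opened, labeled]
  workflow_dispatch: {}  # inputs elided in this sketch

jobs:
  process-template:
    # The label check continues to gate which issues are processed.
    if: >-
      github.event_name == 'workflow_dispatch' ||
      contains(github.event.issue.labels.*.name, 'template-submission')
    runs-on: ubuntu-latest
```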
```yaml
name: "Submit an azd Template"
description: "Add your azd template to the Awesome azd gallery"
title: "[Template]: "
labels: ["template-submission"]
```
This issue template auto-applies the template-submission label. Combined with a workflow that can create branches/PRs, this removes the maintainer gate and can enable automated PR spam unless additional submitter allowlisting is implemented. If the intention is to require maintainers to opt-in per issue, consider removing the default label and letting a maintainer apply it when ready to process.
```yaml
labels: ["template-submission"]
```
Keeping the auto-label by design: the workflow only creates a PR (never auto-merges), so the maintainer gate is the PR review itself. The `concurrency` group limits to one in-flight run. Added the `opened` trigger (comment above) as defense-in-depth.
```js
const regex = new RegExp(`### ${fieldName}\\s*\\n\\s*([^\\n]+)`, "i");
const match = body.match(regex);
if (!match) return "";
const value = match[1].trim();
// Treat GitHub's default placeholder as empty
return value === "_No response_" ? "" : value;
```
extractField only captures the first non-empty line after a heading. In the issue form, Description is a textarea and can contain newlines, so this will silently truncate user input. Consider capturing until the next ### heading (or end of body) so multi-line values are parsed correctly, and update tests to cover this case.
Suggested change:

```diff
-const regex = new RegExp(`### ${fieldName}\\s*\\n\\s*([^\\n]+)`, "i");
-const match = body.match(regex);
-if (!match) return "";
-const value = match[1].trim();
-// Treat GitHub's default placeholder as empty
-return value === "_No response_" ? "" : value;
+// Capture everything after the heading up to the next heading (`### `) or end of body
+const regex = new RegExp(
+  `### ${fieldName}\\s*\\n([\\s\\S]*?)(?=\\n### |$)`,
+  "i"
+);
+const match = body.match(regex);
+if (!match) return "";
+const value = match[1].trim();
+// Treat GitHub's default placeholder as empty
+if (!value || value === "_No response_") {
+  return "";
+}
+return value;
```
Fixed. `extractField` now captures everything between headings (until the next `### ` or end of body), so multi-line descriptions are preserved correctly.
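For reference, a self-contained sketch of the fixed extraction logic, mirroring the suggested change above (the shipped `extractField` may differ in details):

```javascript
// Extract the value of an issue-form field from the rendered issue body.
// Captures everything after "### <fieldName>" up to the next "### " heading
// or the end of the body, so textarea answers keep their newlines.
function extractField(body, fieldName) {
  const regex = new RegExp(
    `### ${fieldName}\\s*\\n([\\s\\S]*?)(?=\\n### |$)`,
    "i"
  );
  const match = body.match(regex);
  if (!match) return "";
  const value = match[1].trim();
  // GitHub inserts "_No response_" for empty optional fields.
  return value === "_No response_" ? "" : value;
}
```

A multi-line description such as `"Line one.\nLine two."` now survives parsing intact instead of being truncated to its first line.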
```js
 * Write key=value pairs to the GITHUB_OUTPUT file.
 * @param {string} outputPath - File path from $GITHUB_OUTPUT
 * @param {Record<string, string>} fields
 */
function writeOutputs(outputPath, fields) {
  const lines = Object.entries(fields)
    .map(([k, v]) => `${k}=${v}`)
```
writeOutputs writes key=value lines directly to $GITHUB_OUTPUT. If any parsed field contains a newline (issue body edits, API-submitted workflow_dispatch inputs), this can inject additional outputs and change later step conditions. Use the multi-line output format (key<<EOF ... EOF) or strip/replace \r/\n in values before writing.
Suggested change:

```diff
- * Write key=value pairs to the GITHUB_OUTPUT file.
- * @param {string} outputPath - File path from $GITHUB_OUTPUT
- * @param {Record<string, string>} fields
- */
-function writeOutputs(outputPath, fields) {
-  const lines = Object.entries(fields)
-    .map(([k, v]) => `${k}=${v}`)
+ * Sanitize a value so it can be safely written as a GitHub Actions output.
+ * Newlines and carriage returns are replaced to prevent output injection.
+ *
+ * @param {unknown} value
+ * @returns {string}
+ */
+function sanitizeOutputValue(value) {
+  if (value === null || value === undefined) {
+    return "";
+  }
+  return String(value).replace(/[\r\n]+/g, " ");
+}
+/**
+ * Write key=value pairs to the GITHUB_OUTPUT file.
+ * @param {string} outputPath - File path from $GITHUB_OUTPUT
+ * @param {Record<string, string>} fields
+ */
+function writeOutputs(outputPath, fields) {
+  const lines = Object.entries(fields)
+    .map(([k, v]) => `${k}=${sanitizeOutputValue(v)}`)
```
Fixed. Added `sanitizeOutputValue()` that strips CR/LF characters before writing to `$GITHUB_OUTPUT`. Applied to all three scripts that write outputs (parse-template-issue.js, update-templates-json.js, merge-extracted-metadata.js).
```js
 * @param {string} outputPath
 * @param {Record<string, string>} outputs
 */
function writeOutputs(outputPath, outputs) {
  const lines = Object.entries(outputs)
    .map(([k, v]) => `${k}=${v}`)
```
writeOutputs appends raw key=value lines to $GITHUB_OUTPUT. Because title/description/author are only HTML-stripped (newlines are preserved), an attacker can inject new outputs via newline characters and influence downstream workflow logic. Use GitHub’s multi-line output format or normalize values to a single line (e.g., replace CR/LF with spaces) before writing.
Suggested change:

```diff
- * @param {string} outputPath
- * @param {Record<string, string>} outputs
- */
-function writeOutputs(outputPath, outputs) {
-  const lines = Object.entries(outputs)
-    .map(([k, v]) => `${k}=${v}`)
+ * Normalizes values to a single line to prevent output injection.
+ * @param {string} outputPath
+ * @param {Record<string, string>} outputs
+ */
+function writeOutputs(outputPath, outputs) {
+  const lines = Object.entries(outputs)
+    .map(([k, v]) => {
+      const safeValue = String(v).replace(/[\r\n]+/g, " ");
+      return `${k}=${safeValue}`;
+    })
```
Fixed. Same newline sanitization applied here and also in merge-extracted-metadata.js.
```js
const PRIVATE_IPV4_RANGES = [
  /^127\./,
  /^10\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
  /^192\.168\./,
  /^169\.254\./,
  /^0\./,
```
PRIVATE_IPV4_RANGES is missing several reserved/non-routable ranges commonly blocked in SSRF defenses (e.g., 100.64.0.0/10, 192.0.2.0/24, 198.18.0.0/15, multicast 224.0.0.0/4, etc.). Since this file advertises “private/reserved” filtering, consider expanding coverage to RFC 6890 reserved ranges (and add tests for the newly blocked ranges).
Suggested change:

```diff
-const PRIVATE_IPV4_RANGES = [
-  /^127\./,
-  /^10\./,
-  /^172\.(1[6-9]|2\d|3[01])\./,
-  /^192\.168\./,
-  /^169\.254\./,
-  /^0\./,
+const PRIVATE_IPV4_RANGES = [
+  // Loopback
+  /^127\./,
+  // RFC1918 private space
+  /^10\./,
+  /^172\.(1[6-9]|2\d|3[01])\./,
+  /^192\.168\./,
+  // Link-local
+  /^169\.254\./,
+  // "This" network
+  /^0\./,
+  // Carrier-Grade NAT (RFC 6598) 100.64.0.0/10
+  /^100\.(6[4-9]|7\d|8\d|9\d|1[01]\d|12[0-7])\./,
+  // IETF protocol assignments & special-use (covers 192.0.0.0/24 including 192.0.0.170/31, 192.0.0.8/29, etc.)
+  /^192\.0\.0\./,
+  // Deprecated 6to4 relay anycast 192.88.99.0/24
+  /^192\.88\.99\./,
+  // Documentation ranges (RFC 5737)
+  /^192\.0\.2\./,
+  /^198\.51\.100\./,
+  /^203\.0\.113\./,
+  // Benchmarking (RFC 2544) 198.18.0.0/15
+  /^198\.(1[89])\./,
+  // Multicast 224.0.0.0/4
+  /^(22[4-9]|23\d)\./,
+  // Reserved for future use 240.0.0.0/4 (includes 255.255.255.255 broadcast)
+  /^(24\d|25[0-5])\./,
```
Fixed. Expanded `PRIVATE_IPV4_RANGES` to cover RFC 6890 reserved ranges: CGNAT (100.64/10), documentation (192.0.2, 198.51.100, 203.0.113), benchmarking (198.18/15), IETF assignments (192.0.0/24), 6to4 relay (192.88.99), multicast (224/4), and future-use (240/4).
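An alternative to maintaining a regex list is to parse the dotted quad into a 32-bit integer and test CIDR membership directly. `isReservedIPv4` below is illustrative, not the code in this PR; the blocks mirror the ranges listed above:

```javascript
// RFC 6890 reserved IPv4 blocks as [base, prefixLength] pairs.
const RESERVED_BLOCKS = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["100.64.0.0", 10], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.0.0.0", 24], ["192.0.2.0", 24],
  ["192.88.99.0", 24], ["192.168.0.0", 16], ["198.18.0.0", 15],
  ["198.51.100.0", 24], ["203.0.113.0", 24], ["224.0.0.0", 4], ["240.0.0.0", 4],
];

// Convert a dotted-quad string to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True if the address falls inside any reserved block (mask-and-compare).
function isReservedIPv4(ip) {
  const addr = ipv4ToInt(ip);
  return RESERVED_BLOCKS.some(([base, prefix]) => {
    const mask = (~0 << (32 - prefix)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
  });
}
```

The integer form makes awkward prefixes such as /10 and /15 trivial to express, at the cost of requiring a separate path for IPv6 (which `net.isIP` already distinguishes).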
```js
const canonicalSource = canonicalizeUrl(sourceRepo);
const duplicate = templates.find(
  (t) => canonicalizeUrl(t.source) === canonicalSource
);
if (duplicate) {
  return {
    skipped: true,
    skipReason: `Template with source ${sourceRepo} already exists ("${duplicate.title}")`,
  };
}

let iac;
if (iacProvider === "Both") {
  iac = ["bicep", "terraform"];
} else if (iacProvider === "Terraform") {
  iac = ["terraform"];
} else {
  iac = ["bicep"];
}

const tags =
  authorType === "Microsoft" ? ["msft", "new"] : ["community", "new"];

const generateId = uuidGenerator || (() => crypto.randomUUID());

const entry = {
  title,
  description,
  preview: previewImage || "templates/images/default-template.png",
  authorUrl,
  author,
  source: sourceRepo,
  tags,
  IaC: iac,
  id: generateId(),
```
updateTemplatesJson detects duplicates using canonicalizeUrl(sourceRepo) but stores the original sourceRepo string in the new entry. This can leave .git, query/fragment, trailing slashes, or inconsistent casing in templates.json, even though canonicalization exists. Consider storing the canonicalized URL (and possibly trimming) in entry.source so the file stays normalized and future duplicate checks are simpler.
Fixed. `entry.source` now stores the canonicalized URL (lowercased, no .git suffix, no trailing slash, no query/fragment) for consistent templates.json entries.
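A minimal sketch of a canonicalizer with the properties described above (the actual `canonicalizeUrl` in validate-template.js may differ in details):

```javascript
// Normalize a GitHub repo URL: lowercase host and path, drop ".git",
// trailing slashes, query string, and fragment. GitHub redirects
// owner/repo case-insensitively, so lowercasing the path is safe here.
function canonicalizeUrl(rawUrl) {
  const parsed = new URL(rawUrl.trim()); // URL lowercases the host itself
  const pathname = parsed.pathname
    .replace(/\.git$/i, "")
    .replace(/\/+$/, "");
  return `https://${parsed.hostname}${pathname.toLowerCase()}`;
}
```

Storing this canonical form in `entry.source` means future duplicate checks reduce to plain string equality.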
```yaml
- name: Comment on validation failure
  if: steps.validate.outputs.valid == 'false' && github.event_name != 'workflow_dispatch'
  uses: actions/github-script@7888e9ef07d6bca382a90e753d77e014ff4fd756 # v8
  env:
    ERRORS: ${{ steps.validate.outputs.errors }}
  with:
    script: |
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.payload.issue.number,
        body: `❌ **Template validation failed**\n\nPlease check your repository URL and try again.\n\n\`\`\`\n${process.env.ERRORS}\n\`\`\``
      });
```
PR description says the workflow uses “no inline JS”, but this workflow relies on actions/github-script with inline scripts for commenting (and thus executes inline JavaScript). Either update the PR description, or replace these steps with a non-JS alternative (e.g., gh api / curl to the REST API).
Fixed. Replaced all three `actions/github-script` steps with `gh issue comment` CLI calls. No more inline JS in the workflow; zero inline scripts now.
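A sketch of what such a CLI-based step can look like, assuming `gh` is preinstalled on the runner (it is on GitHub-hosted images) and `GH_TOKEN` grants issue-write access; this is illustrative, not the exact step from the workflow:

```yaml
- name: Comment on validation failure
  if: steps.validate.outputs.valid == 'false' && github.event_name != 'workflow_dispatch'
  env:
    GH_TOKEN: ${{ github.token }}
    ERRORS: ${{ steps.validate.outputs.errors }}
    ISSUE_NUMBER: ${{ github.event.issue.number }}
  run: |
    gh issue comment "$ISSUE_NUMBER" \
      --repo "${{ github.repository }}" \
      --body "❌ Template validation failed. Please check your repository URL and try again: $ERRORS"
```

Passing untrusted values through `env` rather than interpolating `${{ }}` directly into `run` also avoids shell-injection via the error text.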
```js
function validateUrl(value, label) {
  if (!value) return;
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    throw new Error(`Invalid ${label} URL: "${value}"`);
  }
  if (parsed.protocol !== "https:") {
    throw new Error(
      `${label} URL must use HTTPS (got "${parsed.protocol}")`
    );
  }
  if (isPrivateHost(parsed.hostname)) {
    throw new Error(`${label} URL points to a private/reserved address`);
  }
}
```
validateUrl does not reject URLs containing userinfo (e.g., https://user:token@github.com/org/repo). If a submitter accidentally includes credentials, they could be persisted into templates.json and echoed in workflow logs. Consider explicitly rejecting parsed.username/parsed.password (and optionally non-default ports) for all user-supplied URLs.
Fixed. `validateUrl` now rejects URLs containing userinfo (`parsed.username` or `parsed.password`), preventing accidental credential leakage into templates.json or workflow logs.
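A self-contained sketch of the hardened validator, following the shape shown in the diff above (the private-host check is omitted here for brevity, and the shipped code may differ):

```javascript
// Validate a user-supplied URL: must parse, must be HTTPS, and must not
// embed credentials (user:token@host), which could otherwise be persisted
// into templates.json or echoed in workflow logs.
function validateUrl(value, label) {
  if (!value) return;
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    throw new Error(`Invalid ${label} URL: "${value}"`);
  }
  if (parsed.protocol !== "https:") {
    throw new Error(`${label} URL must use HTTPS (got "${parsed.protocol}")`);
  }
  if (parsed.username || parsed.password) {
    throw new Error(`${label} URL must not contain embedded credentials`);
  }
}
```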
Simplify the submission form to only require the repo URL. Metadata (title, description, author, languages, Azure services, IaC provider) is auto-extracted from the GitHub API, azure.yaml, and README.md. Form fields become optional overrides.

- New: extract-template-metadata.js (safe YAML, size limits, SSRF protection)
- New: merge-extracted-metadata.js (form values override extracted)
- Updated: issue form and workflow_dispatch (only source_repo required)
- Updated: parse-template-issue.js (relaxed required fields)
- 180 tests passing across 6 suites

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Node 24's https.request passes {all:true} to the lookup function,
causing dns.lookup to return an array of {address, family} objects
instead of a plain string. Return the array format back to the
caller after validating each resolved address against the private
IP blocklist.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract the first non-badge image from the README as the template preview, converting relative paths to raw.githubusercontent.com URLs. Detect frameworks from dependency files (package.json, requirements.txt, pyproject.toml, pom.xml, build.gradle, Gemfile) and GitHub topics. Covers all 22 framework tags: FastAPI, Flask, Django, React, Vue, Angular, Next.js, NestJS, Spring, Quarkus, LangChain, and more.

The README fetch limit is increased to 50KB and shared between title and image extraction. All dependency fetches run in parallel and are non-fatal on 404. 211 tests passing across 6 suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… 6890 ranges

- Remove hostname regex check in isPrivateHost (false positives on e.g. 10.example.com)
- Expand PRIVATE_IPV4_RANGES to cover RFC 6890 (CGNAT, documentation, benchmarking, multicast)
- Reject URLs with userinfo (credentials) in validateUrl
- Fix extractField to capture multi-line values (description textarea)
- Sanitize newlines in writeOutputs across all 3 scripts (prevents output injection)
- Store canonicalized URL in templates.json entry.source
- Add 'opened' to workflow trigger types (defensive, alongside 'labeled')
- Replace actions/github-script with gh CLI (no more inline JS in workflow)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds an automated template submission pipeline that mirrors the existing extension submission model. Trusted partners can submit templates via an issue form instead of manually editing templates.json.
Flow: Issue form -> Workflow validates repo -> Updates templates.json -> Creates PR -> Human reviews and approves
What's included
Workflow & Issue Form
- .github/ISSUE_TEMPLATE/template-submission.yml - 12-field issue form
- .github/workflows/template-submission.yml - 174-line YAML orchestration (no inline JS)

Standalone Scripts (testable, reusable)

- website/scripts/validate-template.js - URL validation with SSRF protection (IPv4 + IPv6 + DNS rebinding defense)
- website/scripts/parse-template-issue.js - Issue body parser
- website/scripts/update-templates-json.js - JSON updater (imports shared validation, no duplication)

Tests

Docs

- website/docs/contribute.md with automated submission instructions

Security hardening
What's NOT included (by design)
Closes #784