ci: add GCP Workload Identity Federation for Vertex AI in record-integration-tests workflow #5272

@Artemon-line

🚀 Describe the new functionality needed

The record-integration-tests.yml workflow currently references secrets.GOOGLE_APPLICATION_CREDENTIALS, secrets.VERTEX_AI_PROJECT, and secrets.VERTEX_AI_LOCATION for the vertexai provider, but none of these secrets are configured in the repository. As a result, the Vertex AI recording job starts without valid credentials and fails at authentication.

The fix is to adopt GCP Workload Identity Federation (OIDC-based, keyless authentication) instead of static service account credentials.

What needs to change:

  1. GCP side: Set up a GCP project with a Workload Identity Pool and Provider that trusts the llamastack/llama-stack GitHub repo's OIDC tokens.

  2. GitHub repo side: Add two repository secrets:

    • GCP_WORKLOAD_IDENTITY_PROVIDER — the full provider resource name (projects/<id>/locations/global/workloadIdentityPools/<pool>/providers/<provider>)
    • VERTEX_AI_PROJECT — the GCP project ID (already referenced but not set)
  3. Workflow changes in record-integration-tests.yml:

    • Add id-token: write permission (required for OIDC token exchange)
    • Add a google-github-actions/auth step (pinned SHA) before the test run step for the vertexai provider:
      - name: Authenticate to Google Cloud (Vertex AI)
        if: matrix.provider.setup == 'vertexai'
        uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093 # v3
        with:
          project_id: ${{ secrets.VERTEX_AI_PROJECT }}
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
    • Remove the GOOGLE_APPLICATION_CREDENTIALS secret reference (the auth action sets GOOGLE_APPLICATION_CREDENTIALS automatically via a generated credentials file)
    • Set VERTEX_AI_LOCATION to a hardcoded value (e.g., global) rather than a secret, since it's not sensitive
  4. Security: Fork PRs and Dependabot PRs should skip the Vertex AI auth step (OIDC tokens are not available).
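Taken together, the workflow changes in items 3 and 4 could look roughly like the fragment below. This is a sketch under stated assumptions, not the actual workflow: the job name `record-tests`, the one-entry matrix, and the `contents: read` line are placeholders, and the guard expression is one possible way to skip fork and Dependabot PRs.

```yaml
# Sketch of the relevant parts of record-integration-tests.yml.
# Job name, matrix layout, and the contents permission are assumptions.
permissions:
  contents: read    # placeholder; keep whatever the workflow already grants
  id-token: write   # required for the OIDC token exchange

jobs:
  record-tests:     # hypothetical job name
    runs-on: ubuntu-latest
    strategy:
      matrix:
        provider:
          - setup: vertexai
    steps:
      - name: Authenticate to Google Cloud (Vertex AI)
        # Run only for the vertexai provider, and skip Dependabot and fork
        # PRs, where OIDC tokens are not available.
        if: >-
          matrix.provider.setup == 'vertexai' &&
          github.actor != 'dependabot[bot]' &&
          (github.event_name != 'pull_request' ||
           github.event.pull_request.head.repo.full_name == github.repository)
        uses: google-github-actions/auth@7c6bc770dae815cd3e89ee6cdf493a5fab2cc093 # v3
        with:
          project_id: ${{ secrets.VERTEX_AI_PROJECT }}
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
```

The `github.event_name != 'pull_request'` branch keeps the step enabled for manual `workflow_dispatch` runs, where there is no `pull_request` payload to compare against.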

💡 Why is this needed? What if we don't build it?

Without this, the vertexai provider in the recording matrix is effectively dead — it appears in the workflow but can never authenticate. This means:

  • Vertex AI integration test recordings cannot be auto-generated or updated via CI
  • Contributors must record Vertex AI tests manually with their own credentials
  • The workflow gives a false sense of coverage by listing vertexai as a provider

Workload Identity Federation is the recommended approach for GitHub Actions ↔ GCP auth (no long-lived keys to rotate, no secret file management).

Other thoughts

  • The vertexai provider is already gated behind workflow_dispatch (not auto-triggered on PRs), so the id-token: write permission only applies to manual runs, minimizing security surface.
  • The existing security model (fork PR blocking, read-only pull_request trigger) is preserved.
  • VERTEX_AI_LOCATION should be global for Gemini models (not every Gemini model is served from regional endpoints).
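The environment settings proposed in this issue might be expressed as a workflow-level `env` block like the one below. It is a minimal sketch: whether these variables belong at workflow, job, or step level in record-integration-tests.yml is an assumption.

```yaml
# Hypothetical environment block for the Vertex AI recording run.
env:
  VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
  VERTEX_AI_LOCATION: global  # not sensitive, so hardcoded instead of a secret
  # GOOGLE_APPLICATION_CREDENTIALS is deliberately not set here: the
  # google-github-actions/auth step exports it, pointing at the credentials
  # file it generates.
```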
