Expose Google Antigravity models to VS Code via Copilot's official BYOK (Bring Your Own Key) interface. Manages the CLIProxyAPI server lifecycle and configures custom language models using VS Code's supported extension APIs.
⚠️ Disclaimer: This extension uses VS Code's official Language Model API for custom model configuration. It does not modify GitHub Copilot internals, intercept Copilot traffic, or patch any Copilot files. This project is unofficial and not affiliated with GitHub, Microsoft, Google, or Anthropic.
- One-Click Server Management: Start/stop CLIProxyAPI directly from VS Code
- Automatic Configuration: Creates a default `config.yaml` if missing
- BYOK Model Registration: Registers Antigravity models using VS Code's official Language Model API
- Status Bar Integration: Quick visual status and server controls
- Sidebar Dashboard: Monitor server status, view available models, and manage settings
- Auto-Start Support: Configure the server to start automatically with VS Code
- Authentication Launcher: Launches Antigravity's authentication flow via CLIProxyAPI
- Rate Limiting: Built-in rate limiter to prevent 429 errors with thinking models
- Optional Throttling Proxy: Local proxy that queues BYOK requests and can clamp max output tokens to reduce upstream 429s
| Model | Description | Capabilities |
|---|---|---|
| Claude Sonnet 4.5 | Latest Claude model | Tools |
| Claude Sonnet 4.5 (Thinking) | Extended thinking mode | Tools, Thinking |
| Claude Opus 4.5 (Thinking) | Most powerful Claude | Tools, Thinking |
| Gemini 2.5 Flash | Fast Gemini model | Tools |
| Gemini 2.5 Flash Lite | Lightweight Gemini | Tools |
| Gemini 3 Pro (Preview) | Latest Gemini Pro | Tools |
| Gemini 3 Flash (Preview) | Latest Gemini Flash | Tools |
| Gemini 3 Pro Image (Preview) | Gemini with vision | Tools, Vision |
| Gemini 2.5 Computer Use | Computer interaction | Tools, Vision |
| gpt-oss-120b-medium | Open source model | Basic |
- VS Code Insiders (required for custom models support)

  ```powershell
  winget install --id Microsoft.VisualStudioCode.Insiders
  ```

- GitHub Copilot Pro subscription

- GitHub Copilot Extensions (pre-release versions)

  ```powershell
  code-insiders --install-extension github.copilot --pre-release
  code-insiders --install-extension github.copilot-chat --pre-release
  ```

- CLIProxyAPI installed in `%USERPROFILE%\CLIProxyAPI\`

  The extension can automatically download and install the latest version for you when you attempt to start the server. Alternatively, you can install it manually:

  ```powershell
  $zipPath = "$env:TEMP\CLIProxyAPI.zip"
  $extractPath = "$env:USERPROFILE\CLIProxyAPI"

  # Download latest from GitHub
  Invoke-WebRequest -Uri "https://github.com/router-for-me/CLIProxyAPI/releases/latest/download/CLIProxyAPI_windows_amd64.zip" -OutFile $zipPath
  Expand-Archive -Path $zipPath -DestinationPath $extractPath -Force
  Remove-Item $zipPath
  ```
After configuring models, you must manually enable them in VS Code:
- Open Copilot Chat (`Ctrl+Alt+I`)
- Click the model picker dropdown
- Select "Manage Models..."
- Enable these recommended models:
- Gemini 3 Pro (Preview)
- Gemini 3 Flash (Preview)
- Claude Opus 4.5 (Thinking)
- Click the 👁️ eye icon next to each model to enable it
Note: Models can only be enabled manually through the VS Code UI. Programmatic enablement is not supported by the current VS Code BYOK API.
- CLIProxyAPI Configuration: (Optional) The extension will automatically create a default `config.yaml` if one doesn't exist.
  - Default location: `%USERPROFILE%\CLIProxyAPI\config.yaml`
  - Default content:

    ```yaml
    port: 8317
    host: "127.0.0.1"
    auth-dir: "C:\\Users\\<USERNAME>\\.cli-proxy-api"
    providers:
      antigravity:
        enabled: true
    ```
- Download the `.vsix` file
- Open VS Code Insiders
- Press `Ctrl+Shift+P` → "Extensions: Install from VSIX..."
- Select the downloaded file
- Click the Antigravity icon in the Activity Bar
- Click "Login to Antigravity" button
- Follow the authentication flow in the terminal (Server will stop temporarily during login)
- Click "Start Server" in the sidebar
- Wait for the server to start (status will turn green)
- Click "Configure Models" button
- Reload VS Code when prompted
- Open Copilot Chat (`Ctrl+Alt+I`)
- Click the model picker dropdown → "Manage Models..."
- Find the Antigravity models and click the eye icon to enable them
- The models will now appear in the model picker dropdown
Open VS Code Settings (`Ctrl+,`) and search for `antigravityCopilot`:
| Setting | Default | Description |
|---|---|---|
| `server.enabled` | `false` | Enable server on startup |
| `server.autoStart` | `false` | Auto-start server with VS Code |
| `server.executablePath` | (auto) | Path to cli-proxy-api.exe |
| `server.port` | `8317` | Starting port (auto-increments if in use) |
| `server.host` | `127.0.0.1` | Server host |
| `autoConfigureCopilot` | `true` | Auto-configure models on startup |
| `showNotifications` | `true` | Show notifications |
Rate limiting provides a safety net for thinking models. The primary 429 mitigation is now aggressive retries matching Antigravity IDE's behavior.
| Setting | Default | Description |
|---|---|---|
| `rateLimit.enabled` | `true` | Enable rate limiting |
| `rateLimit.cooldownMs` | `5000` | Base cooldown between requests |
| `rateLimit.showNotifications` | `true` | Show notifications when blocked |
When consecutive 429 errors occur, the rate limiter automatically applies exponential backoff:
- Each consecutive 429 doubles the effective cooldown (up to 5× the base)
- The backoff resets after a successful request
- Check current backoff status via Command Palette → "Antigravity: Rate Limit Status"
This prevents hammering the upstream server when quota is exhausted.
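The backoff rule above can be sketched as a small function. This is an illustrative model of the documented behavior (each consecutive 429 doubles the effective cooldown, capped at 5× the base), with hypothetical names — it is not the extension's actual source:

```typescript
// Illustrative sketch of the documented backoff rule: each consecutive
// 429 doubles the effective cooldown, capped at 5x the configured base.
function effectiveCooldownMs(baseMs: number, consecutive429s: number): number {
  const doubled = baseMs * Math.pow(2, consecutive429s);
  return Math.min(doubled, baseMs * 5);
}

// With the default base of 5000 ms:
//   0 consecutive 429s -> 5000 ms
//   1 -> 10000 ms
//   2 -> 20000 ms
//   3+ -> capped at 25000 ms (5x base)
```

A successful request would reset `consecutive429s` to 0, returning the cooldown to its base value.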
The optional throttling proxy queues requests to prevent upstream 429 errors with thinking models. Thinking models use reduced token limits (maxInputTokens: 32000, maxOutputTokens: 2048) to minimize quota burn.
| Setting | Default | Description |
|---|---|---|
| `proxy.enabled` | `true` | Enable the local throttling proxy |
| `proxy.host` | `127.0.0.1` | Proxy bind host |
| `proxy.port` | `8420` | Starting port for proxy (auto-increments if in use) |
| `proxy.rewriteMaxTokens` | `true` | Clamp output tokens to reduce long generations |
| `proxy.maxTokensThinking` | `1024` | Max output tokens for Thinking models |
| `proxy.maxTokensStandard` | `4096` | Max output tokens for standard models |
| `proxy.logRequests` | `true` | Log request metadata (model, status, duration) |
| `proxy.transformThinking` | `true` | Transform streaming responses for thinking display |
| `proxy.thinkingTransformMode` | `none` | Transform mode: none, annotate, enhanced, or claude |
| `proxy.thinkingTimeoutMs` | `60000` | Timeout for Thinking requests (abort long runs) |
| `proxy.requestTimeoutMs` | `120000` | Timeout for standard requests |
| `proxy.truncateToolOutput` | `true` | Truncate very large tool outputs (e.g., git diff) |
| `proxy.maxToolOutputChars` | `12000` | Max chars kept per tool output after truncation |
| `proxy.toolOutputHeadChars` | `6000` | Chars kept from start of tool output |
| `proxy.toolOutputTailChars` | `2000` | Chars kept from end of tool output |
| `proxy.maxRequestBodyBytes` | `10485760` | Max request body size; returns 413 if exceeded |
| `proxy.thinkingConcurrency` | `1` | Max concurrent requests for Thinking models |
| `proxy.standardConcurrency` | `3` | Max concurrent requests for standard models |
| `proxy.maxRetries` | `3` | Retry attempts for 429 errors (0 to disable) |
| `proxy.retryBaseDelayMs` | `1000` | Base delay before first retry (exponential backoff) |
The proxy can transform streaming responses from Thinking models to help clients display reasoning content:
- `none`: Direct passthrough without any transformation (most compatible)
- `annotate` (default): Adds minimal `_is_thinking` markers to delta objects
- `enhanced`: Adds comprehensive thinking block markers in OpenAI format
- `claude`: Full conversion to Anthropic/Claude streaming format (experimental)
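As a rough sketch of what an `annotate`-style transform might do to a streaming delta object — note the delta shape here is hypothetical (an OpenAI-style `reasoning_content` field stands in for the actual thinking payload, which is not documented here):

```typescript
// Hypothetical sketch of an "annotate"-style transform: mark deltas that
// carry reasoning text with a minimal _is_thinking flag so clients can
// render them apart from normal content. Not the extension's actual code.
interface Delta {
  content?: string;
  reasoning_content?: string; // assumed field name for thinking text
  _is_thinking?: boolean;
}

function annotateDelta(delta: Delta): Delta {
  if (delta.reasoning_content !== undefined) {
    return { ...delta, _is_thinking: true };
  }
  return delta;
}
```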
Long “thinking” runs can consume a lot of quota (even if you didn’t request a huge visible answer). The proxy can abort long-running requests:
- `antigravityCopilot.proxy.thinkingTimeoutMs` (default: 60s)
- `antigravityCopilot.proxy.requestTimeoutMs` (default: 120s)
Copilot Chat tool calls like git diff can produce very large outputs that get embedded into subsequent requests. This increases prompt size and can trigger upstream RESOURCE_EXHAUSTED.
When enabled, the proxy truncates only messages with `role: "tool"` that exceed your configured limits:
- `antigravityCopilot.proxy.truncateToolOutput: true`
- `antigravityCopilot.proxy.maxToolOutputChars: 12000`
- `antigravityCopilot.proxy.toolOutputHeadChars: 6000`
- `antigravityCopilot.proxy.toolOutputTailChars: 2000`
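The head/tail scheme above can be sketched as follows (a hypothetical helper under the default limits, not the extension's actual code):

```typescript
// Sketch of head/tail truncation: keep the first headChars and last
// tailChars of an oversized tool output, replacing the middle with a note.
function truncateToolOutput(
  text: string,
  maxChars = 12000,
  headChars = 6000,
  tailChars = 2000
): string {
  if (text.length <= maxChars) return text; // small outputs pass through
  const head = text.slice(0, headChars);
  const tail = text.slice(-tailChars);
  const dropped = text.length - headChars - tailChars;
  return `${head}\n... [${dropped} chars truncated] ...\n${tail}`;
}
```

Keeping both the head and the tail preserves the parts of a `git diff` that usually matter most: the first hunks and the final summary.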
If you want to hard-limit request size regardless, set:
- `antigravityCopilot.proxy.maxRequestBodyBytes` (default: 10MB)
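A minimal sketch of such a size guard (hypothetical helper): the limit is in bytes, so the body should be measured after encoding, not by character count:

```typescript
// Sketch of a request body size guard: measure the encoded body in bytes
// and signal HTTP 413 (Payload Too Large) when it exceeds the limit.
function checkBodySize(body: string, maxBytes = 10485760): number {
  const bytes = new TextEncoder().encode(body).length;
  return bytes > maxBytes ? 413 : 200;
}
```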
Copilot Chat can fire multiple requests per prompt (tools, retries, follow-ups). For resource-intensive Thinking models, this can trip upstream quota even if you only clicked once.
The proxy uses a semaphore-based concurrency queue with separate limits for thinking vs standard models:
- `antigravityCopilot.proxy.thinkingConcurrency: 1` (keep low to avoid exhaustion)
- `antigravityCopilot.proxy.standardConcurrency: 3`
Excess requests queue until a slot opens. Thinking requests have lower priority than standard requests.
When 429 or RESOURCE_EXHAUSTED errors occur, the proxy automatically retries with aggressive short-delay retries matching Antigravity IDE's approach:
- `antigravityCopilot.proxy.maxRetries: 5` (set to 0 to disable)
- `antigravityCopilot.proxy.retryBaseDelayMs: 100` (almost immediate first retry)
Retry delays: ~200ms → ~400ms → ~800ms → ~1.6s → ~3.2s. Most 429 errors resolve within 2-3 retries.
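The delay sequence above is consistent with a simple doubling schedule. As an illustrative sketch (assuming `delay = base × 2^attempt`, which reproduces ~200 ms → ~3.2 s for a 100 ms base; not necessarily the extension's exact formula):

```typescript
// Sketch of an exponential retry schedule consistent with the delays
// listed above. attempt is 1-based: the first retry waits base * 2.
function retryDelayMs(baseMs: number, attempt: number): number {
  return baseMs * Math.pow(2, attempt);
}

// With baseMs = 100, attempts 1..5 give 200, 400, 800, 1600, 3200 ms.
```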
```json
{
  "antigravityCopilot.server.autoStart": true,
  "antigravityCopilot.autoConfigureCopilot": true,
  "antigravityCopilot.showNotifications": true,
  "antigravityCopilot.rateLimit.enabled": true,
  "antigravityCopilot.rateLimit.cooldownMs": 5000,
  "antigravityCopilot.proxy.enabled": true,
  "antigravityCopilot.proxy.thinkingConcurrency": 1,
  "antigravityCopilot.proxy.standardConcurrency": 3,
  "antigravityCopilot.proxy.maxRetries": 5,
  "antigravityCopilot.proxy.retryBaseDelayMs": 100,
  "antigravityCopilot.proxy.thinkingTimeoutMs": 60000,
  "antigravityCopilot.proxy.requestTimeoutMs": 120000,
  "antigravityCopilot.proxy.truncateToolOutput": true,
  "antigravityCopilot.proxy.maxToolOutputChars": 12000,
  "antigravityCopilot.proxy.toolOutputHeadChars": 6000,
  "antigravityCopilot.proxy.toolOutputTailChars": 2000,
  "antigravityCopilot.proxy.maxRequestBodyBytes": 10485760
}
```

Access commands via Command Palette (`Ctrl+Shift+P`):
- Antigravity: Start Server - Start the CLIProxyAPI server
- Antigravity: Stop Server - Stop the server
- Antigravity: Restart Server - Restart the server
- Antigravity: Login to Antigravity - Authenticate with Google
- Antigravity: Configure Models - Add models to Copilot Chat
- Antigravity: Show Server Controls - Open quick controls menu
- Antigravity: Rate Limit Status - View and manage rate limiter status
- Verify CLIProxyAPI is installed at `%USERPROFILE%\CLIProxyAPI\cli-proxy-api.exe`
- Check if port 8317 is already in use: `netstat -ano | findstr :8317`
- Review logs: Click "Show Logs" in the dashboard
- Ensure you're using VS Code Insiders (not stable VS Code)
- Ensure Copilot extensions are pre-release versions
- Click "Configure Models" and reload VS Code
- Check if Custom OpenAI feature is available (gradual rollout)
- Run the Antigravity: Login to Antigravity command
- Follow the browser authentication flow
- Check auth files in `%USERPROFILE%\.cli-proxy-api\`
429 RESOURCE_EXHAUSTED indicates a quota or concurrency limit on the server side — not a syntax error. Common causes:
- Too many concurrent requests (especially with Thinking models)
- Hitting model-provider GPU/compute quota
- Very large context causing expensive internal passes
- Repeated long "thinking" requests consuming extra backend resources
Quick fixes:
- Enable rate limiting: `antigravityCopilot.rateLimit.enabled: true`
- Increase cooldown if errors persist: `antigravityCopilot.rateLimit.cooldownMs: 30000`
- Enable tool output truncation to reduce context size
- Check rate limiter status via Command Palette → "Antigravity: Rate Limit Status"
- Reset the rate limiter from the sidebar dashboard if needed
The rate limiter applies exponential backoff automatically after consecutive 429s.
Copilot Chat can send multiple requests per prompt (tools, retries, follow-ups). For resource-intensive Thinking models, this can trip upstream quota/throttling even if you only clicked once.
This extension includes an optional local throttling proxy that queues requests before they reach CLIProxyAPI. It does not modify Copilot internals; it simply changes the BYOK endpoint URL Copilot uses.
- Enable the proxy: `antigravityCopilot.proxy.enabled: true`
- Re-run Antigravity: Configure Models, then reload VS Code.
- Use a longer cooldown (start with 30–60s): `antigravityCopilot.rateLimit.cooldownMs: 60000`
If you still see RESOURCE_EXHAUSTED immediately, your Antigravity account/model quota may be exhausted; switch to a lighter model or wait for quota reset.
If you want to understand why 429s happen (bursting, large requested outputs, etc.), you can enable proxy request logging.
- Setting: `antigravityCopilot.proxy.logRequests: true`
- What it logs: request metadata only (endpoint, model, token limits, status code, duration)
- What it does not log: your prompt text or chat content
Open the Antigravity output channel to view [PROXY ...] log lines.
- Node.js (v18 or later)

  ```powershell
  winget install OpenJS.NodeJS.LTS
  ```

- VS Code Extension Manager (vsce)

  ```powershell
  npm install -g @vscode/vsce
  ```

- Clone the repository

  ```powershell
  git clone https://github.com/punal100/antigravity-copilot.git
  cd antigravity-copilot
  ```

- Install dependencies

  ```powershell
  npm install
  ```

- Compile TypeScript

  ```powershell
  npm run compile
  ```

- Package the extension

  ```powershell
  # Using npm script
  npm run package

  # Or directly with vsce
  vsce package
  ```
This creates a `.vsix` file (e.g., `antigravity-copilot-1.5.3.vsix`) in the project root.
- Watch mode (auto-recompile on changes):

  ```powershell
  npm run watch
  ```

- Lint the code:

  ```powershell
  npm run lint
  ```
- Open VS Code Insiders
- Press `Ctrl+Shift+P` → "Extensions: Install from VSIX..."
- Select the generated `.vsix` file
MIT License
This extension:
- Manages CLIProxyAPI: Starts and stops a local OpenAI-compatible proxy server and launches Antigravity authentication through it
- Registers Models via BYOK: Uses VS Code/Copilot's BYOK setting `github.copilot.chat.customOAIModels` to register custom OpenAI-compatible endpoints
- Displays Status: Provides a sidebar UI for server management and status monitoring
No Copilot internals are modified. The extension only uses documented VS Code APIs and settings.
This extension explicitly does NOT:
- ❌ Modify GitHub Copilot internals or files
- ❌ Host or redistribute any AI models
- ❌ Collect, store, or transmit user credentials
- ❌ Intercept or proxy GitHub Copilot’s own service traffic
- ❌ Provide access to Antigravity (users must obtain access independently)
- ❌ Connect to any internal/private services
Notes:
- ✅ If you enable `antigravityCopilot.proxy.enabled`, the extension runs an optional local throttling proxy only for the BYOK endpoint you configured (Copilot → your local endpoint). This is used to queue requests and reduce upstream 429s.
- Punal Manalan - Author and maintainer
- CLIProxyAPI - The proxy server powering this extension
This extension requires CLIProxyAPI and a Google account with Antigravity access.
- This project does not provide access to Antigravity — users must obtain access independently
- This project is unofficial and not affiliated with GitHub, Microsoft, Google, Anthropic, or OpenAI
- Users are responsible for ensuring their use complies with all applicable terms of service
- The authors assume no liability for any misuse or ToS violations
- Antigravity access may be subject to eligibility requirements or usage policies set by Google