chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 17 commits into main from pricing-update/vertex-ai

siddharthsambharia-portkey (Collaborator) commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
|---|---|
| ➕ Models added | 0 |
| 🔄 Models updated (merged) | 9 |

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3-pro-image-preview
  • gemini-3-flash-preview
  • veo-3.1-fast-generate-001
  • veo-3.0-fast-generate-preview
  • gemini-embedding-2-preview

Model → Pricing Page Mapping

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard input $1.25/output $10 (≤200K); tiered pricing above 200K |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API | Matches "Gemini 2.5 Pro Computer Use-Preview" row |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Input $0.30, output $2.50 |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias; uses gemini-2.5-flash pricing |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | Matches "Gemini 2.5 Flash Image"; image_token $30/1M |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Input $0.10, output $0.40 |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias; uses gemini-2.5-flash-lite pricing |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | Input $0.15, output $0.60 |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | Input $0.075, output $0.30 |
| gemini-3.1-pro-preview | Google – Gemini 3 | API | Input $2, output $12 (≤200K) |
| gemini-3.1-flash-image-preview | Google – Gemini 3 | API | Input $0.50, output $3; image_token $60/1M |
| gemini-3.1-flash-lite-preview | Google – Gemini 3 | API | Input $0.25, output $1.50 |
| gemini-3-pro-preview | Google – Gemini 3 | API | Input $2, output $12 (≤200K) |
| gemini-3-pro-image-preview | Google – Gemini 3 | API | Input $2, output $12; image_token $120/1M |
| gemini-3-flash-preview | Google – Gemini 3 | API | Input $0.50, output $3 |
| imagen-4.0-generate-001 | Google – Imagen | API | $0.04/image; row matched via lookup_variant imagen-4.0-generate |
| imagen-4.0-fast-generate-001 | Google – Imagen | API | $0.02/image; row matched via imagen-4.0-fast-generate |
| imagen-4.0-ultra-generate-001 | Google – Imagen | API | $0.06/image; row matched via imagen-4.0-ultra-generate |
| imagen-3.0-generate-002 | Google – Imagen | API | $0.04/image; row matched via imagen-3.0-generate |
| imagen-3.0-capability-001 | Google – Imagen | API | $0.04/image; uses imagen-3.0-generate pricing (capability shares generate price) |
| imagen-3.0-capability-002 | Google – Imagen | API | $0.04/image; uses imagen-3.0-generate pricing (capability shares generate price) |
| veo-3.1-generate-001 | Google – Veo | API | $0.20/sec (720p/1080p video); row matched via veo-3.1-generate |
| veo-3.1-generate-preview | Google – Veo | API | $0.20/sec (720p/1080p video); preview alias for veo-3.1 |
| veo-3.1-fast-generate-001 | Google – Veo | API | $0.10/sec (720p/1080p video); row matched via veo-3.1-fast-generate |
| veo-3.1-fast-generate-preview | Google – Veo | API | $0.10/sec; preview alias for veo-3.1-fast |
| veo-3.0-generate-001 | Google – Veo | API | $0.20/sec (720p/1080p); row matched via veo-3.0-generate |
| veo-3.0-generate-preview | Google – Veo | API | $0.20/sec; preview alias for veo-3.0 |
| veo-3.0-fast-generate-001 | Google – Veo | API | $0.10/sec; row matched via veo-3.0-fast-generate |
| veo-3.0-fast-generate-preview | Google – Veo | API | $0.10/sec; preview alias for veo-3.0-fast |
| veo-2.0-generate-001 | Google – Veo | API | $0.50/sec; row matched via veo-2.0-generate |
| gemini-embedding-001 | Google – Gemini Embedding | API | $0.00015/1K tokens online, $0.00012/1K batch |
| gemini-embedding-2-preview | Google – Gemini Embedding | API | Preview; uses Gemini Embedding pricing |
| text-embedding-005 | Google – Text Embedding | API | $0.000025/1K chars (per_thousand_tokens unit) |
| text-multilingual-embedding-002 | Google – Text Embedding | API | $0.000025/1K chars |
| text-embedding-large-exp-03-07 | Google – Text Embedding | API | Experimental; uses text-embedding pricing $0.000025/1K chars |
| textembedding-gecko | Google – Text Embedding | API | Legacy; uses text-embedding pricing $0.000025/1K chars |
| multimodalembedding | Google – Multimodal Embedding | API | Per-image $0.0001, video-plus $0.0020/sec, standard $0.0010/sec, essential $0.0005/sec |
| claude-opus-4-6 | Anthropic – Claude | API | @default stripped; input $5, output $25; 5m cache write $6.25, cache hit $0.50 |
| claude-sonnet-4-6 | Anthropic – Claude | API | @default stripped; input $3, output $15; cache write $3.75, cache hit $0.30 |
| claude-opus-4@20250514 | Anthropic – Claude | API | Input $15, output $75; cache write $18.75, cache hit $1.50 |
| claude-sonnet-4@20250514 | Anthropic – Claude | API | Input $3, output $15; cache write $3.75, cache hit $0.30 |
| claude-opus-4-1@20250805 | Anthropic – Claude | API | Input $15, output $75; cache write $18.75, cache hit $1.50 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude | API | Input $3, output $15; cache write $3.75, cache hit $0.30 |
| claude-haiku-4-5@20251001 | Anthropic – Claude | API | Input $1, output $5; cache write $1.25, cache hit $0.10 |
| claude-opus-4-5@20251101 | Anthropic – Claude | API | Input $5, output $25; cache write $6.25, cache hit $0.50 |
| gpt-oss-120b-maas | OpenAI | API | Matches "gpt-oss-120b" row; input $0.09, output $0.36 |
| llama-3.3-70b-instruct-maas | Meta – Llama | API | Matches "Llama 3.3 70B"; input $0.72, output $0.72 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama | API | Matches "Llama 4 Maverick"; input $0.35, output $1.15 |
| mistral-small-2503 | Mistral AI | API | Matches "Mistral Small 3.1 (25.03)"; input $0.10, output $0.30 |
| mistral-medium-3 | Mistral AI | API | Input $0.40, output $2.00 |
| codestral-2 | Mistral AI | API | Input $0.30, output $0.90 |
| deepseek-r1-0528-maas | DeepSeek | API | Matches "DeepSeek-R1 (0528)"; input $1.35, output $5.40 |
| deepseek-v3.1-maas | DeepSeek | API | Matches "DeepSeek-V3.1"; input $0.60, output $1.70; cache hit $0.06 |
| deepseek-v3.2-maas | DeepSeek | API | Matches "DeepSeek-V3.2"; input $0.56, output $1.68; cache hit $0.056 |
| qwen3-235b-a22b-instruct-2507-maas | Qwen | API | Matches "Qwen3-235B-A22B-Instruct-2507"; input $0.22, output $0.88 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen | API | Matches "Qwen3-Coder-480B-A35B-Instruct"; input $0.22, output $1.80; cache hit $0.022 |
| qwen3-next-80b-a3b-instruct-maas | Qwen | API | Matches "Qwen3-Next-80B-Instruct"; input $0.15, output $1.20 |
| qwen3-next-80b-a3b-thinking-maas | Qwen | API | Matches "Qwen3-Next-80B-Thinking"; input $0.15, output $1.20 |
| kimi-k2-thinking-maas | Kimi / Moonshot | API | Matches "Kimi-K2-Thinking"; input $0.60, output $2.50; cache hit $0.06 |
| minimax-m2-maas | MiniMax | API | Matches "MiniMax-M2"; input $0.30, output $1.20; cache hit $0.03 |
| glm-4.7-maas | ZAI.org / GLM | API | Matches "GLM-4.7"; input $0.60, output $2.20 |
| glm-5-maas | ZAI.org / GLM | API | Matches "GLM-5"; input $1.00, output $3.20; cache hit $0.10 |
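
Several notes in the mapping refer to a row being "matched via lookup_variant", to `@default` being stripped, or to a `-maas` ID matching a row named without that suffix. The sketch below illustrates one way such candidate keys could be derived. It is not the Pricing Agent's actual code: the function name `lookup_variants`, the candidate ordering, and the assumption that version suffixes are exactly three digits (as in `-001`, `-002`) are all hypothetical.

```python
import re


def lookup_variants(model_id: str) -> list[str]:
    """Return candidate lookup keys for a model ID, most specific first.

    Hypothetical helper: the real matching rules are not shown in this PR.
    """
    candidates = [model_id]
    # "@default stripped": drop only the "@default" alias, so dated
    # versions such as "@20250514" are left untouched.
    base = model_id.removesuffix("@default")
    candidates.append(base)
    # Assumed rule: drop a trailing three-digit version such as "-001".
    candidates.append(re.sub(r"-\d{3}$", "", base))
    # Assumed rule: drop the "-maas" suffix to match the pricing-page row name.
    candidates.append(base.removesuffix("-maas"))
    # De-duplicate while preserving order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    for mid in ("imagen-4.0-generate-001", "claude-opus-4-6@default", "gpt-oss-120b-maas"):
        print(mid, "->", lookup_variants(mid))
```

Trying the full ID first keeps exact rows authoritative, and the stripped forms only act as fallbacks. Four-digit suffixes such as the one in `mistral-small-2503` are deliberately left alone under the three-digit assumption.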

Excluded Models

| Model ID | Publisher | Reason |
|---|---|---|
| gemini-live-2.5-flash-native-audio | Google | Gemini Live streaming; excluded per global rules |
| lyria-002 | Google | Music generation; excluded per global rules |
| imagegeneration | Google | Legacy, superseded by imagen-3.0+; excluded per google.md |
| virtual-try-on-001 | Google | Product-specific retail model; excluded per google.md |
| pretrained-ocr | Google | OCR; excluded per global rules |
| shieldgemma2 | Google | Safety/guard model; excluded per global rules |
| t5gemma, paligemma, codegemma, gemma, gemma2, gemma3, gemma3n, functiongemma, translategemma, embeddinggemma, mammut, txgemma, medgemma, medsiglip, medasr, hear, path-foundation, derm-foundation, cxr-foundation, bert-base-uncased, timesfm, weathernext | Google | Self-deploy only or non-generative ML; excluded per google.md |
| chirp-2, chirp-3 | Google | Audio transcription, not generative inference; excluded |
| translate-llm, text-translation | Google | Translation, not generative inference |
| video-text-detection, video-speech-transcription | Google | Non-generative CV/transcription |
| earth-ai-imagery-owlvit-eap-10-2025, earth-ai-imagery-mammut-eap-10-2025 | Google | Non-generative vision models |
| image-segmentation-001 | Google | Image segmentation; non-generative |
| Various legacy/CV models | Google | Non-generative: imageclassification-, imageobjectdetection-, bert-base, resnet50, etc. |
| clip-vit-base-patch32, openclip | OpenAI | Non-generative (vision embedding/classification) |
| whisper-large | OpenAI | Audio transcription; excluded per openai.md |
| gpt-oss | OpenAI | Self-deploy (has_deploy: true, no -maas suffix) |
| faster-r-cnn, retinanet, mask-r-cnn, segment-anything, sam3 | Meta | Non-generative CV |
| roberta-large, xlm-roberta-large, nllb, imagebind | Meta | Non-generative NLP/embedding |
| llama-guard, prompt-guard | Meta | Safety/guard models |
| codellama-7b-hf, llama2, llama-2-quantized, llama3, llama3_1, llama3-2, llama3-3, llama4 | Meta | Self-deploy only (has_deploy: true, no -maas) |
| mistral, mixtral | Mistral (mistral-ai) | Self-deploy only |
| codestral-2501-self-deploy | Mistral (mistralai) | Self-deploy (name contains self-deploy) |
| mistral-ocr-2505 | Mistral | OCR; excluded per global rules |
| ministral-3, mistral-large-3 | Mistral | Self-deploy only |
| deepseek-r1, deepseek-v3, deepseek-v3-1, deepseek-v3-2 | DeepSeek | Self-deploy only |
| deepseek-ocr, deepseek-ocr-2, deepseek-ocr-maas | DeepSeek | OCR; excluded per global rules |
| qwq, qwen3, qwen3-embedding, qwen3-5, qwen2, qwen3-coder, qwen3-coder-next, qwen3-next, qwen3-vl | Qwen | Self-deploy only |
| qwen-image | Qwen | Excluded by policy (qwen-image) |
| kimi-k2, kimi-k2-5 | Kimi | Self-deploy only |
| minimax-m2 | MiniMax | Self-deploy only |
| glm-4.7, glm-5, glm-4.5 | ZAI.org | Self-deploy only |
| glm-ocr | ZAI.org | OCR; excluded per global rules |
| glm-image | ZAI.org | Excluded by policy (glm-image) |
| jamba-large-1.6 | AI21 | Self-deploy only (has_deploy: true, no -maas) |
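
Three of the exclusion reasons above are mechanical enough to sketch in code: an ID containing `self-deploy`, an OCR model, and a model with `has_deploy: true` but no `-maas` suffix. The following is a minimal illustration of those three checks only, not the agent's real rule set. The function name `exclusion_reason`, the token-based OCR test, and the idea that `has_deploy` arrives as a boolean per model are assumptions; the publisher-specific rules in google.md and openai.md are not modeled.

```python
from typing import Optional


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Return why a model would be excluded, or None if no rule here applies.

    Hypothetical sketch covering only three reasons named in the table.
    """
    # Rule: the name itself marks a self-deploy variant.
    if "self-deploy" in model_id:
        return "self-deploy (name contains self-deploy)"
    # Rule: OCR models are excluded even when they carry a -maas suffix,
    # so this check runs before the has_deploy check.
    if "ocr" in model_id.split("-"):
        return "OCR"
    # Rule: deployable model without a managed (-maas) endpoint.
    if has_deploy and not model_id.endswith("-maas"):
        return "self-deploy only (has_deploy: true, no -maas)"
    return None


if __name__ == "__main__":
    print(exclusion_reason("gpt-oss", has_deploy=True))
    print(exclusion_reason("gpt-oss-120b-maas", has_deploy=False))
    print(exclusion_reason("deepseek-ocr-maas", has_deploy=False))
```

Ordering matters in this sketch: `deepseek-ocr-maas` is listed as excluded for OCR despite its `-maas` suffix, so the OCR check has to come before the suffix check.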

Generated by Pricing Agent on 2026-03-24
