Created by snubroot
================================================================================
Version 3.0 - Updated for Veo 3.1 / Flow
Date Updated: January 2026
Compatibility: Designed for Veo 3.1 / Flow as of January 2026
================================================================================
- Updated for Veo 3.1: Reflects current model capabilities including native audio generation
- Flow Integration: Added workflow patterns for Google Flow filmmaking workspace
- Revised Technical Specs: Updated duration (4/6/8s), resolution (720p/1080p), aspect ratios (16:9/9:16)
- New Prompt Formula: Adopted Google's official 5-part prompt structure
- Timestamp Prompting: Added multi-shot sequencing within single generations
- Reference Image System: Up to 3 reference images for consistency ("ingredients to video")
- First/Last Frame Control: New transition control capabilities
- Clip Extension: Build longer sequences by extending generated clips
- Simplified Dialogue Syntax: Updated to current best practices (colon format no longer required for subtitle prevention)
- Removed Deprecated Techniques: "(thats where the camera is)" syntax no longer necessary
- BREAKING: The "(thats where the camera is)" camera positioning syntax is deprecated; use standard cinematography terms instead
- BREAKING: 8-second maximum duration remains, but 4s and 6s options now available for complex actions
- UPDATE: Dialogue now uses quotation marks with the "says:" format (e.g., The character says: "dialogue")
- UPDATE: Negative prompts should name the unwanted elements directly (avoid "no" or "don't" phrasing)
================================================================================
- Veo 3.1 Technical Specifications
- The 5-Part Prompt Formula
- Character Development Framework
- Cinematography Integration
- Audio Engineering Excellence
- Timestamp Prompting
- Reference Images & Ingredients to Video
- First and Last Frame Control
- Clip Extension Workflows
- Troubleshooting & Optimization
================================================================================
Meta prompts are AI system prompts that generate professional Veo 3.1 video prompts automatically. Instead of crafting each complex prompt by hand, you describe what you want to a meta prompt, and it produces a complete Veo 3.1 prompt by working through layered stages of requirements analysis, creative development, technical optimization, and quality validation.
User Input → Meta Prompt Analysis → Character Development → Scene Architecture → Technical Specification → Professional Veo 3.1 Prompt → High-Quality Video
- Professional 5-component Veo 3.1 prompts using Google's official format standards
- Character descriptions with comprehensive attributes for consistency
- Native audio integration with dialogue, SFX, and ambient sound
- Platform-optimized formatting for different aspect ratios and durations
- Reference image guidance for multi-shot consistency
- Quality assurance protocols with effective negative prompts
================================================================================
| Traditional Approach | Meta Prompt Approach |
|---|---|
| ❌ Manual prompt crafting | ✅ Automated generation |
| ❌ Inconsistent results | ✅ Professional consistency |
| ❌ Limited expertise | ✅ Advanced knowledge base |
| ❌ Time-intensive | ✅ Rapid optimization |
| ❌ Trial and error | ✅ Proven methodologies |
- Precision: Meta prompts use tested techniques and proven methodologies
- Speed: Generate professional prompts rapidly with automated systems
- Consistency: Maintain character and brand consistency across projects
- Scalability: Create variations and test efficiently across platforms
- Expertise: Access advanced cinematography and audio engineering principles
================================================================================
| Parameter | Options | Notes |
|---|---|---|
| Duration | 4, 6, or 8 seconds | Use 4-6s for complex action; 8s for atmospheric shots |
| Resolution | 720p or 1080p | 720p for iterations; 1080p for final output |
| Aspect Ratio | 16:9 or 9:16 | 16:9 for desktop/YouTube; 9:16 for Shorts/Reels/TikTok |
| Frame Rate | 24fps | Standard cinematic output |
| Audio | Native generation | Dialogue, SFX, ambient sound generated with video |
| Reference Images | Up to 3 | For character/object/style consistency |
- Image-to-Video: Animate a source image with prompt adherence and audio
- Ingredients to Video: Provide reference images for consistent aesthetics across shots
- First and Last Frame: Generate transitions between provided start/end images
- Add/Remove Object: Introduce or remove objects from generated video (uses Veo 2, no audio)
- Clip Extension: Extend previously generated clips to build longer sequences
- Digital Watermarking: All videos are marked with SynthID
IMPORTANT: Be aware of these constraints when crafting prompts:
- Complex multi-action scenes may fragment; use one major action per shot
- Character identity can drift across shots without consistent reference images
- Exact lip-sync is not guaranteed; plan for VO alignment in post
- Native audio may need replacement for brand work or precise dialogue
- Hand and finger details may require attention in complex scenes
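The parameter table above lends itself to a quick pre-flight check before a prompt is submitted. The following is a minimal sketch, not an official Veo or Flow API: `GenerationSettings` and `validate_settings` are hypothetical names, and the allowed values are simply copied from the table in this guide.

```python
from dataclasses import dataclass, field

# Allowed values copied from the specification table above.
ALLOWED_DURATIONS = (4, 6, 8)            # seconds
ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
MAX_REFERENCE_IMAGES = 3


@dataclass
class GenerationSettings:
    """Hypothetical settings container; not an official Veo or Flow type."""
    duration_s: int = 8
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    reference_images: list = field(default_factory=list)


def validate_settings(settings: GenerationSettings) -> list:
    """Return human-readable problems; an empty list means the settings fit the table."""
    problems = []
    if settings.duration_s not in ALLOWED_DURATIONS:
        problems.append(f"duration {settings.duration_s}s is not 4, 6, or 8 seconds")
    if settings.resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution {settings.resolution!r} is not 720p or 1080p")
    if settings.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect ratio {settings.aspect_ratio!r} is not 16:9 or 9:16")
    if len(settings.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{len(settings.reference_images)} reference images exceeds the limit of {MAX_REFERENCE_IMAGES}"
        )
    return problems


# Example: a draft with an unsupported duration and aspect ratio.
draft = GenerationSettings(duration_s=10, aspect_ratio="1:1")
for problem in validate_settings(draft):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a meta prompt report every mismatch in a single pass.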
================================================================================
Google's official recommended structure for Veo 3.1 prompts:
[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
Define camera work and shot composition:
- Shot Types: Wide shot, medium shot, close-up, extreme close-up
- Camera Angles: Eye-level, low-angle, high-angle, bird's-eye, Dutch angle
- Camera Movements: Static, pan, tilt, dolly, truck, crane, handheld, arc shot
- Lens Effects: Shallow depth of field, wide-angle, rack focus, fisheye
Identify the main character or focal point with specificity:
- Physical attributes, clothing, distinctive features
- Profession, role, or character type
- Emotional state and demeanor
Describe what the subject is doing:
- Primary movement or behavior
- Interactions with environment or other subjects
- Emotional expressions and subtle gestures
Detail the environment and background:
- Location (interior/exterior)
- Time of day and weather
- Atmospheric details and props
Specify overall aesthetic, mood, and lighting:
- Lighting setup and quality
- Color palette and visual tone
- Film style or genre aesthetic
Medium shot, a tired corporate worker, rubbing his temples in exhaustion, in front of a bulky 1980s computer in a cluttered office late at night. The scene is lit by the harsh fluorescent overhead lights and the green glow of the monochrome monitor. Retro aesthetic, shot as if on 1980s color film, slightly grainy.
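One way a meta prompt might assemble the five parts is sketched below. The function name `build_prompt` and its joining rules (first four parts as one comma-separated sentence, style as its own sentences) are assumptions chosen to reproduce the shape of the example above; they are not a format Google prescribes.

```python
def build_prompt(cinematography: str, subject: str, action: str, context: str, style: str) -> str:
    """Join the five parts in formula order; raise if any part is empty."""
    parts = {
        "cinematography": cinematography,
        "subject": subject,
        "action": action,
        "context": context,
        "style": style,
    }
    missing = [name for name, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"missing prompt parts: {', '.join(missing)}")
    # The first four parts read as one comma-separated sentence;
    # style and ambiance follows as its own sentence(s).
    lead = ", ".join(
        parts[name].strip().rstrip(".,")
        for name in ("cinematography", "subject", "action", "context")
    )
    return f"{lead}. {style.strip()}"


prompt = build_prompt(
    cinematography="Medium shot",
    subject="a tired corporate worker",
    action="rubbing his temples in exhaustion",
    context="in front of a bulky 1980s computer in a cluttered office late at night",
    style="Retro aesthetic, shot as if on 1980s color film, slightly grainy.",
)
print(prompt)
```

Keeping the parts as separate arguments makes it easy to vary one component (for example, the camera work) while holding the others fixed.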
================================================================================
For maintaining character consistency across multiple shots:
[NAME/ROLE], a [AGE] [ETHNICITY] [GENDER] with [HAIR_DETAILS], [EYE_COLOR] eyes, [FACIAL_FEATURES], [BUILD], wearing [CLOTHING], with [POSTURE/MANNERISMS], [EMOTIONAL_STATE]
✅ Essential Elements:
- Age and appearance
- Gender presentation
- Hair: color, style, length, texture
- Eyes: color, shape, expression
- Facial features: distinctive characteristics
- Build: height, weight, body type
- Clothing: style, color, fit, material
- Posture and movement patterns
- Emotional baseline
- Identical Descriptions: Use exact same wording across all prompts in a series
- Reference Images: Use 1-3 consistent reference images per character
- Wardrobe Lock: Keep clothing and accessories consistent
- Pose Similarity: Maintain similar poses in reference images to reduce drift
- Aspect Ratio Lock: Don't switch between 16:9 and 9:16 mid-project
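The "identical descriptions" rule is easier to follow when the character text is generated from one source of truth rather than retyped. Below is a minimal sketch of the template above as a frozen dataclass; the class name `Character`, its field names, and the sample character are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the description cannot drift between shots
class Character:
    name_role: str
    age: str
    ethnicity: str
    gender: str
    hair: str
    eye_color: str
    facial_features: str
    build: str
    clothing: str
    mannerisms: str
    emotional_state: str

    def describe(self) -> str:
        """Render the template; the wording is identical on every call."""
        return (
            f"{self.name_role}, a {self.age} {self.ethnicity} {self.gender} with {self.hair}, "
            f"{self.eye_color} eyes, {self.facial_features}, {self.build}, "
            f"wearing {self.clothing}, with {self.mannerisms}, {self.emotional_state}"
        )


# Illustrative character, invented for this example.
detective = Character(
    name_role="Detective Reyes",
    age="45-year-old",
    ethnicity="Latina",
    gender="woman",
    hair="short gray-streaked black hair",
    eye_color="dark brown",
    facial_features="a thin scar above her left eyebrow",
    build="lean build",
    clothing="a rumpled navy trench coat",
    mannerisms="a slight forward lean",
    emotional_state="weary but alert",
)

# Every shot in the series reuses the exact same description string.
shots = [
    f"Wide shot of {detective.describe()}, crossing a rain-soaked street at night.",
    f"Close-up of {detective.describe()}, studying a photograph under a desk lamp.",
]
for shot in shots:
    print(shot)
```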
================================================================================
| Angle | Effect | Example |
|---|---|---|
| Eye-level | Neutral, relatable | "eye-level shot of a woman sipping tea" |
| Low-angle | Powerful, imposing | "low-angle tracking shot of a superhero landing" |
| High-angle | Vulnerable, small | "high-angle shot of a child lost in a crowd" |
| Bird's-eye | Map-like overview | "bird's-eye view of a bustling city intersection" |
| Dutch angle | Unease, dynamism | "dutch angle shot of a character running down a hallway" |
| Movement | Description | Example |
|---|---|---|
| Static | No movement | "static shot of a serene landscape" |
| Pan | Horizontal rotation | "slow pan left across a city skyline at dusk" |
| Tilt | Vertical rotation | "tilt down from face to letter in hands" |
| Dolly | Camera physically moves toward or away from the subject | "dolly out to emphasize isolation" |
| Truck | Horizontal travel | "truck right, following character walking" |
| Crane | Vertical sweep | "crane shot revealing vast battlefield" |
| Handheld | Realistic, immediate | "handheld camera during chaotic chase" |
| Arc | Circular path | "arc shot around couple embracing" |
| Shot Type | Framing | Use Case |
|---|---|---|
| Extreme Wide (EWS) | Full environment | Establishing location |
| Wide Shot (WS) | Full body + environment | Context and scale |
| Medium Shot (MS) | Waist up | Dialogue, conversation |
| Close-Up (CU) | Head and shoulders | Emotion, connection |
| Extreme Close-Up (ECU) | Eyes/detail only | Intense emotion, detail |
- Shallow depth of field: Subject sharp, background bokeh
- Rack focus: Shift focus between foreground and background
- Wide-angle: Expanded field of view, slight distortion
- Fisheye: Extreme barrel distortion, panoramic
- Vertigo effect (dolly zoom): Background perspective shifts while the subject stays the same size
================================================================================
Veo 3.1 generates synchronized audio with video. Clearly specify audio elements in your prompt using separate sentences.
Use quotation marks with character identification:
✅ RECOMMENDED FORMAT:
"The seasoned detective says: 'Your story has holes.'"
"A woman says, 'We have to leave now.'"
"The narrator speaks with a polished British accent in a serious, urgent tone."
Describe individual, distinct sounds:
"SFX: thunder cracks in the distance"
"The sound of a phone ringing"
"Soft house sounds, the creak of a closet door, and a ticking clock"
Define background soundscape:
"Ambient noise: the quiet hum of a starship bridge"
"The sounds of city traffic and distant sirens"
"Waves crashing on the shore"
- Keep: Use native audio when ambient/environmental sound fits the scene
- Refine: Check for audio jumps between clips; crossfade in NLE
- Replace: For brand work, precise dialogue, or music-driven pieces, plan to replace native audio
Exact lip-sync is not guaranteed. For projects requiring precise lip movement, plan for VO alignment and possible retiming in your NLE.
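The audio formats above (dialogue in quotation marks, `SFX:` cues, `Ambient noise:` cues, each in its own sentence) can be sketched as small helper functions. The helper names are hypothetical, and the exact punctuation is an assumption modeled on the examples in this section.

```python
def dialogue(speaker: str, line: str) -> str:
    """Format spoken dialogue with speaker identification and quotation marks."""
    return f'{speaker} says: "{line}"'


def sfx(sound: str) -> str:
    """Format a single, distinct sound effect as its own sentence."""
    return f"SFX: {sound.strip().rstrip('.')}."


def ambient(sound: str) -> str:
    """Format the background soundscape as its own sentence."""
    return f"Ambient noise: {sound.strip().rstrip('.')}."


def with_audio(visual_prompt: str, *audio_sentences: str) -> str:
    """Append each audio cue as a separate sentence after the visual description."""
    return " ".join([visual_prompt.strip(), *(s.strip() for s in audio_sentences)])


prompt = with_audio(
    "Close-up of a detective in a rain-soaked alley.",
    dialogue("The seasoned detective", "Your story has holes."),
    sfx("thunder cracks in the distance"),
    ambient("the sounds of city traffic and distant sirens"),
)
print(prompt)
```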
================================================================================
Every effective Veo 3.1 meta prompt follows this architecture:
🧠 COGNITIVE LAYERS:
├── Identity Layer: Role definition and expertise areas
├── Knowledge Layer: Technical specifications and best practices
├── Analysis Layer: Requirement parsing and optimization
├── Generation Layer: Professional format application
├── Quality Layer: Validation and error prevention
└── Output Layer: Structured response with alternatives
PROCESSING PHASES:
Phase 1: Requirements Analysis
├── Parse user intent and objectives
├── Identify target platform and aspect ratio
├── Determine content type and duration
├── Assess reference image needs
└── Plan quality assurance checkpoints
Phase 2: Creative Development
├── Design character profiles with consistency framework
├── Develop scene environments with atmospheric details
├── Plan camera work using 5-part formula
├── Script dialogue and audio elements
└── Integrate brand messaging if applicable
Phase 3: Technical Optimization
├── Apply 5-part prompt structure
├── Ensure duration-appropriate action complexity
├── Integrate effective negative prompts
├── Specify reference images if needed
└── Validate audio-visual synchronization
Phase 4: Quality Validation
├── Review prompt clarity and specificity
├── Check character description completeness
├── Validate technical accuracy
├── Ensure platform compliance
└── Estimate the likelihood of a successful generation
================================================================================
- ✅ Subject description is specific and detailed
- ✅ Action is singular and clear (one major action per shot)
- ✅ Camera work specifies shot type, angle, and movement
- ✅ Context includes location, time, and atmospheric details
- ✅ Style & ambiance defines lighting and visual aesthetic
- ✅ Audio elements are explicitly specified if needed
- ✅ Negative prompts describe unwanted elements (not using "no" or "don't")
- ✅ Duration is appropriate for action complexity
- ✅ Reference images prepared if character consistency needed
- ✅ Aspect ratio matches delivery platform
Format: List the elements you don't want to see as plain descriptive terms (avoid instructive language)
❌ AVOID: "no walls" or "don't show walls"
✅ USE: "wall, frame" (meaning you don't want these elements)
Common Quality Control Negatives:
subtitles, captions, watermark, text overlays, words on screen, logo, blurry footage, low resolution, artifacts, distorted hands, compression noise, camera shake
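A meta prompt can normalize user-supplied negatives into the descriptive form shown above. This is a rough sketch under the assumption that stripping a leading instructive word ("no", "don't show", "without", and so on) is enough; the function name `to_descriptive_negative` and the prefix list are illustrative, not exhaustive.

```python
import re

# Leading instructive phrases to strip; longer alternatives come first so
# "don't show walls" loses the whole phrase rather than just "don't".
_INSTRUCTIVE_PREFIX = re.compile(
    r"^(?:do not show|don['’]t show|do not|don['’]t|without|avoid|not|no)\s+",
    re.IGNORECASE,
)


def to_descriptive_negative(terms: list) -> str:
    """Turn instructive negatives into a comma-separated list of unwanted elements."""
    cleaned = []
    seen = set()
    for term in terms:
        element = _INSTRUCTIVE_PREFIX.sub("", term.strip())
        key = element.lower()
        if element and key not in seen:  # skip empties and case-insensitive duplicates
            seen.add(key)
            cleaned.append(element)
    return ", ".join(cleaned)


print(to_descriptive_negative(["no walls", "don't show walls", "subtitles", "No watermark"]))
```

Because the pattern requires whitespace after the prefix, words that merely begin with "no" (such as "noise") pass through unchanged.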
- Subject: Is the character/object exactly what you asked for?
- Motion: Is the action readable and not jittery?
- Framing: Does the camera match your plan?
- Lighting: Does the mood match your specification?
- Audio: Does the native audio generally fit?
================================================================================
Create complete sequences with precise cinematic pacing in a single generation by assigning actions to timed segments:
[00:00-00:02] Medium shot from behind a young female explorer with a leather satchel and messy brown hair in a ponytail, as she pushes aside a large jungle vine to reveal a hidden path.
[00:02-00:04] Reverse shot of the explorer's freckled face, her expression filled with awe as she gazes upon ancient, moss-covered ruins in the background. SFX: The rustle of dense leaves, distant exotic bird calls.
[00:04-00:06] Tracking shot following the explorer as she steps into the clearing and runs her hand over the intricate carvings on a crumbling stone wall. Emotion: Wonder and reverence.
[00:06-00:08] Wide, high-angle crane shot, revealing the lone explorer standing small in the center of the vast, forgotten temple complex, half-swallowed by the jungle. SFX: A swelling, gentle orchestral score begins to play.
- Use 2-second segments for distinct beats
- Maintain character consistency across segments
- Specify camera changes at each timestamp
- Include audio cues where appropriate
- Keep total duration within 8 seconds
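The timestamp layout above can be generated mechanically from a list of beats. The sketch below assumes equal 2-second segments and an 8-second ceiling, as in the guidelines; `build_timestamp_prompt` is a hypothetical helper name, and the beats are shortened versions of the explorer example.

```python
def format_timestamp(seconds: int) -> str:
    """Render whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def build_timestamp_prompt(beats: list, segment_s: int = 2, max_total_s: int = 8) -> str:
    """Prefix each beat with its [start-end] window; reject sequences over the limit."""
    if not beats:
        raise ValueError("at least one beat is required")
    total_s = len(beats) * segment_s
    if total_s > max_total_s:
        raise ValueError(f"{len(beats)} beats x {segment_s}s = {total_s}s exceeds {max_total_s}s")
    lines = []
    for index, beat in enumerate(beats):
        start = index * segment_s
        end = start + segment_s
        lines.append(f"[{format_timestamp(start)}-{format_timestamp(end)}] {beat.strip()}")
    return "\n".join(lines)


sequence = build_timestamp_prompt([
    "Medium shot from behind a young female explorer pushing aside a jungle vine.",
    "Reverse shot of the explorer's freckled face, filled with awe.",
    "Tracking shot following the explorer into the clearing.",
    "Wide, high-angle crane shot revealing the vast temple complex.",
])
print(sequence)
```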
================================================================================
Use up to 3 reference images per generation to maintain consistency:
- Character Reference: Lock appearance across shots
- Object Reference: Maintain product/prop consistency
- Style Reference: Ensure aesthetic continuity
- Generate References: Create reference images using Gemini 2.5 Flash Image or similar
- Compose Scene: Use Ingredients to Video feature with relevant references
- Prompt with References: Specify which reference applies to which element
Example:
"Using the provided images for the detective, the woman, and the office setting, create a medium shot of the detective behind his desk. He looks up at the woman and says in a weary voice, 'Of all the offices in this town, you had to walk into mine.'"
- Keep subject's clothing and pose similar in references
- Reuse the same 1-3 references across related clips
- Lock aspect ratio across your project
- Use shorter durations (4-6s) for action-heavy beats
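To keep the mapping between references and prompt elements explicit, a meta prompt could track each image with its role and the label used in the prompt text. The following sketch is an assumption about how to organize that bookkeeping; `Reference`, `reference_preamble`, and the file names are hypothetical, and nothing here uploads images or calls a real API.

```python
from dataclasses import dataclass

ROLES = ("character", "object", "style")  # the three reference types listed above
MAX_REFERENCES = 3


@dataclass(frozen=True)
class Reference:
    path: str   # hypothetical local file name
    role: str   # one of ROLES
    label: str  # how the prompt refers to this image, e.g. "the detective"


def reference_preamble(refs: list) -> str:
    """Build the opening clause that tells the model which images apply to what."""
    if not 1 <= len(refs) <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} references, got {len(refs)}")
    for ref in refs:
        if ref.role not in ROLES:
            raise ValueError(f"unknown role {ref.role!r} for {ref.path}")
    labels = [ref.label for ref in refs]
    if len(labels) == 1:
        joined = labels[0]
    elif len(labels) == 2:
        joined = " and ".join(labels)
    else:
        joined = ", ".join(labels[:-1]) + f", and {labels[-1]}"
    noun = "image" if len(labels) == 1 else "images"
    return f"Using the provided {noun} for {joined}, "


refs = [
    Reference("detective.png", "character", "the detective"),
    Reference("desk_lamp.png", "object", "the desk lamp"),
    Reference("noir_look.png", "style", "the film-noir look"),
]
print(reference_preamble(refs) + "create a medium shot of the detective behind his desk.")
```

Reusing the same `refs` list across related clips follows the advice above to keep references fixed within a project.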
================================================================================
Create controlled camera movements or transformations between two distinct points:
Step 1: Create starting frame (image generation)
"Medium shot of a female pop star singing passionately into a vintage microphone. She is on a dark stage, lit by a single, dramatic spotlight from the front. Photorealistic, cinematic."
Step 2: Create ending frame
"POV shot from behind the singer on stage, looking out at a large, cheering crowd. The stage lights are bright, creating lens flare. Energetic atmosphere."
Step 3: Animate with Veo 3.1
"The camera performs a smooth 180-degree arc shot, starting with the front-facing view of the singer and circling around her to seamlessly end on the POV shot from behind her on stage. The singer sings 'when you look me in the eyes, I can see a million stars.'"
- Controlled camera movements between specific compositions
- Smooth transitions between scenes
- Maintaining lighting and composition cohesion
================================================================================
Extend previously generated Veo clips to build longer sequences:
- Generate Base Clip: Create your best short clip (4-8s)
- Review and Select: Choose the strongest generation
- Extend: Use extension feature to continue the action
- Iterate: Refine extensions as needed
- Start with shorter durations (4-6s) for complex action
- Extend only your best clips
- Maintain consistent prompt language
- Use first/last frame control for smooth handoffs
- Plan to stitch multiple clips in your NLE for final edit
================================================================================
| Symptom | Likely Cause | Fix |
|---|---|---|
| Soft/smeared motion | Too many concurrent actions | Shorten to 4-6s; simplify action; add reference image |
| Character identity drift | Inconsistent references | Reuse same 1-3 references; keep wardrobe similar |
| Lighting shifts between clips | Prompt language variance | Standardize 2-3 lighting descriptors; use frame control |
| Camera mismatch | Vague terms ("cinematic" only) | Specify focal length, shot size, direction |
| Distracting native audio | Ambience/level mismatch | Disable and replace in NLE |
| Repeated artifacts | Seed stuck; over-constrained | Change seed; reduce adjectives; adjust one variable |
- One major action per shot - Complex multi-action scenes fragment
- Use reference images when consistency matters - Up to 3 per generation
- Lock aspect ratio for the project - Don't flip mid-project
- Keep language plain - Replace poetic metaphors with visual specifics
- Use seed for reproducibility - Change seed when stuck in a "look rut"
- Move one variable at a time - Adjust camera OR lighting, not both
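The "move one variable at a time" rule can be enforced by comparing the settings of consecutive iterations. This is a minimal sketch; the dictionary keys (`camera`, `lighting`, `seed`) and the function names are illustrative assumptions about how an iteration log might be stored.

```python
def changed_variables(previous: dict, current: dict) -> list:
    """List the keys whose values differ between two iterations."""
    keys = sorted(set(previous) | set(current))
    return [key for key in keys if previous.get(key) != current.get(key)]


def check_single_change(previous: dict, current: dict) -> list:
    """Raise if more than one variable moved; otherwise return what changed."""
    changed = changed_variables(previous, current)
    if len(changed) > 1:
        raise ValueError(f"changed {len(changed)} variables at once: {', '.join(changed)}")
    return changed


take_1 = {"camera": "slow dolly in", "lighting": "warm key light", "seed": 42}
take_2 = {**take_1, "lighting": "cool moonlight"}  # only lighting moves
print(check_single_change(take_1, take_2))
```

When two variables change together and the result improves, there is no way to tell which one helped, which is why the check rejects that case.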
================================================================================
Corporate Standards:
- Executive presence and authority
- Brand-compliant visual elements
- Professional attire and grooming
- Confident body language
- Clear, authoritative communication
- Appropriate office environments
Learning Psychology:
- Visual-auditory synchronization
- Cognitive load management
- Attention-grabbing techniques
- Clear progression and structure
- Multi-sensory engagement
Conversion Triggers:
- Hook within first 2 seconds
- Emotional engagement activation
- Platform-specific formatting (16:9 vs 9:16)
- Call-to-action optimization
- Demographic targeting precision
30-Second Brand Montage (16:9)
- Plan: 5-6 clips, each 4-6s
- Settings: 1080p, 24fps, 16:9
- Use reference images for products
- Replace audio with licensed track and VO
Vertical Social Teaser (9:16)
- Plan: 3-4 clips at 4-6s each
- Settings: 720p for iteration, 1080p final
- Strong subject isolation
- Keep text-safe zones for captions
Short Narrative with Continuity
- Plan: 4 clips (establishing, action, reaction, resolve)
- Use first/last frame control between clips
- Reference images for character and setting
- Extend strongest clip to bridge beats
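The clip counts in these playbooks follow from simple arithmetic on the available clip lengths. The sketch below works through it; the helper names are hypothetical, and the 4/6/8-second options are taken from the specifications in this guide.

```python
import math

ALLOWED_CLIP_LENGTHS = (4, 6, 8)  # seconds, per the duration options in this guide


def clips_needed(target_s: float, clip_s: int) -> int:
    """Smallest number of equal-length clips that covers the target runtime."""
    if clip_s not in ALLOWED_CLIP_LENGTHS:
        raise ValueError(f"clip length must be one of {ALLOWED_CLIP_LENGTHS}")
    if target_s <= 0:
        raise ValueError("target duration must be positive")
    return math.ceil(target_s / clip_s)


def footage_range(n_clips: int, shortest_s: int = 4, longest_s: int = 6) -> tuple:
    """Minimum and maximum raw footage (seconds) for a given clip count."""
    return n_clips * shortest_s, n_clips * longest_s


print(clips_needed(30, 6))  # 30 / 6 = 5 clips
print(clips_needed(30, 4))  # 30 / 4 = 7.5, rounded up to 8 clips
print(footage_range(5))     # (20, 30)
print(footage_range(6))     # (24, 36)
```

For the 30-second montage, five 6-second clips land exactly on the target, while six clips of mixed 4-6s lengths give 24 to 36 seconds of raw footage, leaving room to trim in the edit.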
================================================================================
✅ Core Architecture
- 5-Part Prompt Formula implementation
- Character Consistency Framework
- Quality Assurance Protocols
✅ Technical Excellence
- Appropriate duration for action complexity
- Reference images for consistency
- Effective negative prompts (descriptive, not instructive)
- Audio elements explicitly specified
✅ Advanced Techniques
- Timestamp prompting for multi-shot sequences
- First/last frame control for transitions
- Clip extension for longer sequences
- Seed management for reproducibility
- Start Simple: Master the 5-part formula before advanced techniques
- Iterate Quickly: Use 720p for iterations, 1080p for finals
- Lock Variables: Keep aspect ratio, references, and style consistent
- One Change at a Time: Adjust single variables between iterations
- Plan for Post: Native audio is a starting point; plan replacements
- Use References: Character consistency requires consistent reference images
- Mind Duration: Complex action needs shorter clips (4-6s)
- Test and Learn: Track what works for your specific use cases
================================================================================
- Google Veo 3.1 Model Reference
- Vertex AI Video Generation Prompt Guide
- Ultimate Prompting Guide for Veo 3.1 (Google Cloud Blog)
- Gemini API Video Documentation
- Google Flow Labs
- DeepMind Veo Overview
- Topaz Labs' Video Upscaler: 4K/60fps enhancement
- Luma's Reframe Video: Vertical format conversion
- DaVinci Resolve: Professional editing and color grading
================================================================================
Streamline your video generation workflow with systematic meta prompt architecture.
Professional video generation through systematic automation
🔧 Professional Meta Prompt Architecture Guide
================================================================================
Guide Features:
- Complete meta prompt architecture with 5-part prompt formula
- Updated technical specifications for Veo 3.1 (4/6/8s, 720p/1080p, 16:9/9:16)
- Native audio generation guidance (dialogue, SFX, ambient)
- Reference image system for character/object consistency
- Timestamp prompting for multi-shot sequences
- First/last frame control for transitions
- Clip extension workflows for longer sequences
- Domain-specific templates and scenario playbooks
- Comprehensive troubleshooting guide
- Quality assurance protocols and checklists
A comprehensive resource for professional Veo 3.1 / Flow video generation through systematic meta prompt automation.
================================================================================