Why this experiment?
At Terranoha, our approach is driven by curiosity and hands-on exploration of emerging technologies. When Google DeepMind released Veo 3, its new experimental AI video generation tool, we wanted to understand how it actually performs, beyond the demos. Our goal was simple:
- Test what Veo 3 can actually deliver today
- Identify its technical limitations in a professional context
- Explore how to produce a short professional video using only text-based prompts
What Veo 3 Promises
On paper, Veo 3 offers impressive capabilities:
- 1080p video generation from text prompts
- Multiple visual styles (cinematic, animated, documentary…)
- Temporal and visual consistency
- Voice-over integration via audio/text prompts
What We Actually Encountered
During our deep dive into Veo 3, we encountered several significant limitations for our professional use case:
- Severely limited length: A maximum of 8 seconds per sequence, which forces the video to be artificially fragmented
- Voice-over sync issues: Audio sometimes failed to generate despite accurate prompts
- Subtitle inconsistencies: Despite Google’s recent updates, we continued to face recurring errors
- Prompt variability: Even with highly detailed descriptions of Emmie, the character in our video, her face varied significantly between sequences, disrupting visual consistency
- Inconsistent voice: Despite identical instructions, Emmie’s voice tone often changed, affecting auditory coherence
- Unrealistic generations: Several outputs had visual oddities (unnatural expressions, odd angles, strange movements), requiring multiple re-renders to get usable clips
- High experimentation costs: Veo 3 uses Google Cloud credits. 20,000 credits cost $200 ($0.01 per credit), and one 8-second video consumes ~100 credits, or around $1 per 8 seconds. With multiple re-renders per clip, the cost of a full experiment adds up quickly.
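To make the credit arithmetic concrete, here is a minimal Python sketch of a cost estimator. The pack price, credits per clip, and clip length are the figures quoted above. The `attempts_per_clip` parameter is a hypothetical average number of renders needed per usable clip, not a value we measured.

```python
import math

# Figures quoted in the article.
CREDITS_PER_PACK = 20_000
PACK_PRICE_USD = 200.0
CREDITS_PER_CLIP = 100  # approximate, per 8-second clip
CLIP_SECONDS = 8


def estimate_cost(target_seconds, attempts_per_clip=1):
    """Return (clips, credits, usd) for a video of target_seconds.

    attempts_per_clip is a hypothetical average number of renders
    needed before a clip is usable; it is an assumption, not a measurement.
    """
    # A video longer than 8 seconds has to be split into several clips.
    clips = math.ceil(target_seconds / CLIP_SECONDS)
    credits = clips * attempts_per_clip * CREDITS_PER_CLIP
    usd = credits * PACK_PRICE_USD / CREDITS_PER_PACK
    return clips, credits, usd


if __name__ == "__main__":
    clips, credits, usd = estimate_cost(60, attempts_per_clip=4)
    print(f"{clips} clips, {credits} credits, about ${usd:.2f}")
```

Under these assumptions, a 60-second video with four attempts per clip needs 8 clips and 3,200 credits, or about $32. The estimate scales linearly with the number of re-renders, which is why unusable generations dominate the budget.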
Our Methodology
Here’s how we optimized our use of Veo 3:
- Use “Veo 3 quality”: Include this phrase in every prompt for optimal rendering.
- Ultra-detailed character identity: Describe characters with extreme precision (appearance, outfit, demeanor…). ChatGPT can help refine these descriptions.
- Highly specific environment: Define every element of the scene, including style, objects, lighting, and mood. Every detail counts.
- Scene direction: Provide exact instructions for movement and interaction to minimize misinterpretation.
- Short, clear dialogue: With the 8-second limit, each line must be concise and time-efficient.
- Always revise scripts after poor outputs: If the result is subpar, tweak the wording. Repeating the same prompt often yields worse results.
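One way to apply this checklist consistently is to assemble every clip prompt from the same fixed blocks, so the character description is repeated word for word in each sequence. The sketch below is illustrative only. The labeled layout (`Character:`, `Environment:`, `Direction:`) is our own convention rather than a format Veo 3 requires, and `MAX_DIALOGUE_WORDS` is an assumed budget for an 8-second clip, not a documented limit.

```python
from dataclasses import dataclass

QUALITY_TAG = "Veo 3 quality"  # phrase recommended in the checklist above
MAX_DIALOGUE_WORDS = 20        # assumed budget for an 8-second clip, not a Veo 3 rule


@dataclass(frozen=True)
class Character:
    name: str
    description: str  # appearance, outfit, demeanor


def build_prompt(character, environment, direction, dialogue):
    """Assemble one clip prompt from the same fixed blocks every time."""
    words = len(dialogue.split())
    if words > MAX_DIALOGUE_WORDS:
        raise ValueError(f"dialogue has {words} words; limit is {MAX_DIALOGUE_WORDS}")
    return "\n".join([
        QUALITY_TAG + ".",
        f"Character: {character.name}, {character.description}",
        f"Environment: {environment}",
        f"Direction: {direction}",
        f'{character.name} says: "{dialogue}"',
    ])


if __name__ == "__main__":
    # Placeholder values for illustration; not the descriptions we actually used.
    emmie = Character("Emmie", "<detailed appearance, outfit, demeanor>")
    print(build_prompt(
        emmie,
        environment="<style, objects, lighting, mood>",
        direction="<exact movement and interaction>",
        dialogue="Welcome to Terranoha.",
    ))
```

Keeping the character block in a single frozen object means any revision after a poor output changes only the field that was reworded, which makes it easier to see which change improved the result.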





