AI Experiment: How We Created an Emmie Video with Google Veo 3

Why this experiment?

At Terranoha, our approach is driven by curiosity and hands-on exploration of emerging technologies. When Google DeepMind released Veo 3, its new experimental AI video generation tool, we wanted to understand how it actually performs in practice, beyond the published demos.

Our goal was simple:

Test what Veo 3 can actually deliver today
Identify its technical limitations in a professional context
Explore how to produce a short professional video using only text-based prompts

To do this, we chose a concrete use case: creating a video introducing Emmie, our virtual agent dedicated to trading workflows, in a realistic environment.

What Veo 3 Promises

On paper, Veo 3 offers impressive capabilities:

1080p video generation from text prompts
Multiple visual styles (cinematic, animated, documentary…)
Temporal and visual consistency
Voice-over integration via audio/text prompts

It appears to be a promising solution for creating innovative video content.

What We Actually Encountered

During our deep dive into Veo 3, we encountered several significant limitations for our professional use case:

Severely limited length: Max 8 seconds per sequence, forcing the video to be artificially fragmented
Voice-over generation issues: Audio sometimes failed to generate at all, even when the prompt specified it accurately
Subtitle inconsistencies: Despite Google’s recent updates, we continued to face recurring errors
Inconsistent character appearance: Even with highly detailed descriptions of Emmie, her face varied significantly between sequences, disrupting visual consistency
Inconsistent voice: Despite identical instructions, Emmie’s voice tone often changed, affecting auditory coherence
Unrealistic generations: Several outputs had visual oddities (unnatural expressions, odd angles, strange movements), requiring multiple re-renders to get usable clips
High experimentation costs: Veo 3 uses Google Cloud credits. 20,000 credits cost $200 ($0.01 per credit), and one 8-second video consumes about 100 credits, so each clip costs roughly $1. Because most clips need several re-renders, a full experiment adds up quickly.

These concrete constraints highlight that Veo 3 remains experimental and not yet suited for demanding professional video production.
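To make the budget impact concrete, here is a minimal back-of-the-envelope sketch in Python based on the figures quoted above (20,000 credits for $200, about 100 credits per clip, 8 seconds per clip). The function name and the `attempts_per_clip` re-render factor are our own illustrative assumptions, not measured values or part of any Google tooling.

```python
import math

# Figures quoted in this article
CREDITS_PER_PACK = 20_000
PACK_PRICE_USD = 200.0
CREDITS_PER_CLIP = 100   # approximate cost of one 8-second generation
MAX_CLIP_SECONDS = 8     # maximum length of a single sequence


def estimate_cost(target_seconds: float, attempts_per_clip: float = 1.0) -> dict:
    """Estimate clips, generations, credits and USD for a video of a given length.

    attempts_per_clip is a hypothetical average number of renders needed
    before a clip is usable; it is an assumption, not a measured value.
    """
    clips = math.ceil(target_seconds / MAX_CLIP_SECONDS)
    generations = math.ceil(clips * attempts_per_clip)
    credits = generations * CREDITS_PER_CLIP
    usd = credits * PACK_PRICE_USD / CREDITS_PER_PACK
    return {"clips": clips, "generations": generations, "credits": credits, "usd": usd}


if __name__ == "__main__":
    # A 60-second video, assuming four renders per usable clip
    print(estimate_cost(60, attempts_per_clip=4))
```

Under these assumptions, a 60-second video needs 8 clips, and at four renders per clip that comes to 32 generations, or about $32. The real figure depends entirely on how many attempts each clip takes.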

Our Methodology

Here’s how we optimized our use of Veo 3:

Use “Veo 3 quality”: Include this phrase in every prompt for optimal rendering.
Ultra-detailed character identity: Describe characters with extreme precision (appearance, outfit, demeanor…). ChatGPT can help refine these descriptions.
Highly specific environment: Every element of the scene must be defined: style, objects, lighting, mood. Every detail counts.
Scene direction: Provide exact instructions for movement and interaction to minimize misinterpretation.
Short, clear dialogue: With the 8-second limit, each line must be concise and time-efficient.
Always revise scripts after poor outputs: If the result is subpar, tweak the wording. Repeating the same prompt often yields worse results.
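The checklist above can be sketched as a small prompt-assembly helper. This is an illustration only: the `Shot` structure, the field labels, and the speaking rate of 2.5 words per second are assumptions we introduce here, not a format required by Veo 3, and the sample values are placeholders rather than our actual prompts.

```python
from dataclasses import dataclass

MAX_CLIP_SECONDS = 8      # sequence limit mentioned in this article
WORDS_PER_SECOND = 2.5    # assumed speaking rate (about 150 words per minute)


@dataclass
class Shot:
    character: str    # ultra-detailed character identity
    environment: str  # style, objects, lighting, mood
    direction: str    # exact movement and interaction
    dialogue: str     # short, clear line of speech


def estimated_speech_seconds(dialogue: str) -> float:
    """Rough spoken duration of a line, based on the assumed speaking rate."""
    return len(dialogue.split()) / WORDS_PER_SECOND


def build_prompt(shot: Shot) -> str:
    """Assemble one text prompt, rejecting dialogue too long for a single clip."""
    seconds = estimated_speech_seconds(shot.dialogue)
    if seconds > MAX_CLIP_SECONDS:
        raise ValueError(
            f"Dialogue needs about {seconds:.1f}s, over the {MAX_CLIP_SECONDS}s limit"
        )
    return "\n".join([
        "Veo 3 quality.",
        f"Character: {shot.character}",
        f"Environment: {shot.environment}",
        f"Scene direction: {shot.direction}",
        f'Dialogue: "{shot.dialogue}"',
    ])


if __name__ == "__main__":
    # Placeholder values; reuse the exact same character text in every shot.
    emmie = "Emmie, <full description of appearance, outfit and demeanor>"
    shot = Shot(
        character=emmie,
        environment="<trading floor: style, objects, lighting, mood>",
        direction="<where she stands, how she moves, where she looks>",
        dialogue="Hi, I'm Emmie, your virtual agent for trading workflows.",
    )
    print(build_prompt(shot))
```

Keeping the character description in a single variable makes it easy to paste identical wording into every sequence, although in our tests identical wording alone did not guarantee a consistent face or voice.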

Our Scripts and Prompts

You can download the full scripts and prompts we used for this experiment:
DOWNLOAD FULL SCRIPTS

Final Result

Despite the limitations, our process allowed us to create a video that aligns with our original vision for Emmie: professional, smooth, visually coherent, and tailored to the trading environment.

Conclusion & Outlook

This experiment with Google Veo 3 gave us deeper insights into the current capabilities — and limits — of AI-driven video generation. While still experimental and imperfect, Veo 3 offers a promising glimpse into the future of intelligent video creation.

We’ll continue exploring these emerging technologies to further enhance the user experience powered by Emmie.
