Expanding the Toolkit

When I started this next phase of exploration, my goal was simple: make a “music video” with my imagery — something purely visual, a rhythmic flow of light and color set to sound. I wasn’t thinking about narrative yet. I just wanted to feel movement — to see my worlds pulse to music, like living paintings.

But as I experimented, something unexpected happened. The tools began to grow more powerful, and so did my understanding of how to use them. Each test led to a new question, and each question pulled me closer to something that felt like story. What began as a visual poem started to hint at characters, emotion, and the structure of a film.

I started expanding my toolkit to see how far I could push both myself and the technology. Kling 1.6 and later 2.1 quickly stood out — incredible for visual effects, atmosphere, and the kind of light energy I’d only been able to paint before. Flux Kontext became my new go-to for image generation, especially when I needed character consistency across scenes. Wan 2.1, another Chinese video model, offered unique stylistic motion — something more cinematic, more unpredictable.

Then I discovered Seedance Pro 1.0. For pure visual fidelity and realism, especially with human characters, it was far ahead of anything else I’d used. With carefully written and engineered prompts, I could achieve subtlety — micro muscle movements, shifting emotional tones, and nuanced performances within a single take. That kind of control is nearly impossible in traditional CG without massive investments in 4D facial capture technology. In fact, this was one of the most persistent and expensive challenges I faced across several of the AAA games I’ve worked on. To see something close to that level of performance emerging from these AI tools felt like witnessing a creative threshold being crossed.

And then there was Veo 3. It’s still rough around the edges, but it remains the only model that can handle speaking characters with any real believability. I’ve only started early tests here, but even the smallest success — a line that syncs, an expression that feels human — feels like a glimpse of the future.

On the audio side, I began incorporating ElevenLabs for AI voiceover and Audacity for mixing, SFX layering, and polish. Once I started shaping dialogue, breath, and rhythm, everything changed. The images stopped feeling like experiments. They started to feel like performances.

It’s interesting — in these hybrid workflows, the technology evolves almost daily, but the real learning is still human. Every new tool introduces a new way of thinking. I’ve found that AI excels at handling environments, architecture, and effects — anything with structure and physics. But people? People are harder. Anatomy, emotion, and timing live on a razor’s edge. One pixel too far and you tumble into the uncanny valley.

Building the Origin Story

As part of this new toolkit phase, I’ve begun experimenting directly with the Utherworlds origin story — the story of Lucas Sellers, the central figure in the book. The illustrated novel never revealed who Lucas was or how he came to Utherworlds. It existed as a journal, a chronicle of his journey written from within the dream — fragmented, emotional, and shaped by the resonance of the world around him.

These early sequences begin to explore what came before that journal. They’re not full scenes yet — more like moving metaphors, cinematic sketches of transformation and memory. I’m using this stage to establish the visual and symbolic language that defines the brand: the relationship between fire and rebirth, light and loss, form and emotion.

The footage I’m sharing here isn’t narrative in the traditional sense; it’s thematic. Each moment — a spark, a floating paper, a reflection of firelight — is a test of how symbolism and design can carry meaning long before dialogue or plot enter the frame. In this way, I’m treating the visuals as mythic language, testing how tone, rhythm, and visual metaphor can connect story, character, and worldbuilding into a single emotional identity.

What’s exciting to me is that this is where art direction and narrative direction merge. The same creative ingredients that define the visual brand — the textures, symbols, and emotional contrasts — also become the narrative DNA of the property itself. Even at this early stage, I can feel Utherworlds beginning to speak in its own cinematic tongue.

Still, I can feel something shifting. What began as a visual experiment — a music video without words — is becoming something larger. I’m no longer asking, Can I make something move? Now I’m asking, Can I make something mean something?

That’s where Utherworlds begins.

Defining Lucas

Here is where I begin to establish what Lucas looks like and what his “costume” is. I created several sketches and fed them into Midjourney and Flux Kontext in an attempt to create consistency. I then started to create 2D turnarounds and preliminary 3D turnarounds as well. That’s for another post.

I am fairly impressed with some of the emotion I am achieving here. However, the quality of Lucas still leaves quite a bit to be desired at certain points in these video samples.

Additionally, there is still quite a bit of hand painting over the generated plates to achieve these results, and I often have to try multiple prompts, in some cases more than a dozen, before I get something close to what I’m looking for.

One last thing: you can see that I am starting to define the color palette in this section, so this sequence has its own identity.
