What Did Grok Just Fix in AI Video?

Hey Creator,

Most AI video tools hand you half a product. You get the visuals. The audio, syncing, and second export are still entirely on you.

Here's the one that finally finishes the job.

Do the math on your last AI video clip: generation time, plus the time spent finding sound effects, plus the time syncing them, plus a re-export. The "30-second idea" rarely stays a 30-second job.

Most AI video models generate silent clips, leaving audio as a separate, manual step every time
Syncing dialogue or sound effects to motion eats more time than the actual generation
By the time you're done, the spontaneity that made the idea good is long gone

That gap — between having an idea and having something postable — is exactly what xAI built Grok Imagine Video 1.5 to close.

What it actually does

The recently released Grok Imagine Video 1.5 generates video and audio in the same pass: no separate dubbing, no manual sync.

Audio built in, not bolted on — dialogue, sound effects, ambience, and background music generate alongside the video itself, with no separate tool or alignment step required
Genuinely fast — the Fast variant produces a 6-second 720p clip in about 25 seconds, down from 40+ seconds in the previous version
Holds together across the clip — built on xAI's Aurora architecture, which keeps subjects, lighting, and scene details consistent from first frame to last, reducing the warping and inconsistency common in earlier AI video
Extend instead of regenerate — the Extend from Frame feature adds 6-10 seconds onto an existing clip, so you can grow a sequence without starting over
Currently #1 on the Image-to-Video Arena leaderboard, with a 52-point Elo jump over the previous version, beating Seedance 2.0 and Google's Veo in blind testing

Where it genuinely shines

It's strongest for image-to-video work — animating product shots, portraits, or concept art into motion — and for social-native short clips where built-in audio means you skip post-production entirely. If you're testing five visual ideas before committing to one, this is built for exactly that kind of rapid, throwaway-friendly iteration.

Where it's not the right tool

Be honest with yourself about what you're making. It lacks the precise, "surgical" editing controls you'd get in something like Runway, and it's not built for long-form storytelling or polished, highly structured brand videos. Think fast concept and social content, not finished client deliverables.

Pro tip: Generate first at 480p and 5 seconds to test your prompt direction cheaply. Once the motion and audio land right, scale up to 720p and your full clip length — this saves you from paying full price on prompts that need three more tries.

Cost and access

Grok Imagine Video 1.5 went generally available in mid-June 2026. The Fast variant is live now on grok.com/imagine and in the iOS/Android apps, with standard API access for developers.

Pricing is $4.20 per minute at 720p. That is 86% cheaper than Sora 2 Pro's old $30/minute rate for 1024p output (Sora's consumer app shut down in April 2026, with its API set to retire later this year), and it also undercuts Google Veo 3.1, which starts at $9/minute.
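
If you want to check those numbers against your own clip lengths, here is a minimal Python sketch. It uses only the per-minute rates quoted above. The helper names (clip_cost, percent_cheaper) are made up for illustration and are not part of any xAI, OpenAI, or Google API, and the sketch assumes billing scales linearly with clip length, which may not match how each provider actually rounds or bills.

```python
# Per-minute list prices quoted in this issue, in USD.
# Assumption: cost scales linearly with clip length (no minimums or rounding).
PRICE_PER_MINUTE = {
    "Grok Imagine Video 1.5 (720p)": 4.20,
    "Sora 2 Pro (old rate)": 30.00,
    "Google Veo 3.1 (starting rate)": 9.00,
}


def clip_cost(price_per_minute: float, seconds: float) -> float:
    """Estimated cost in USD of one clip of the given length."""
    return price_per_minute * seconds / 60.0


def percent_cheaper(price: float, baseline: float) -> float:
    """How much cheaper `price` is than `baseline`, as a percentage."""
    return (1.0 - price / baseline) * 100.0


if __name__ == "__main__":
    grok = PRICE_PER_MINUTE["Grok Imagine Video 1.5 (720p)"]
    for name, rate in PRICE_PER_MINUTE.items():
        # Cost of a single 6-second clip at each quoted rate.
        line = f"{name}: ${clip_cost(rate, 6):.2f} per 6-second clip"
        if rate != grok:
            line += f" (Grok is {percent_cheaper(grok, rate):.0f}% cheaper)"
        print(line)
```

Swap the 6 for your own clip length, or multiply by the number of retries you usually need, to get a rough per-idea budget.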

Try it / learn more

xAI's Grok Imagine Video 1.5 announcement
Grok Imagine app
