CastVoice demos AI casting tool
CastVoice showcased a demo tool that assigns AI or cloned voices to characters from uploaded text, mixes in effects, and outputs finished audio drama segments, all built on ElevenLabs’ models. The demo illustrates how casting automation can speed up prototype dialogue reads and mockups for audio fiction production. (x.com)
A script that used to need a casting spreadsheet, a stack of audition files, and an audio editor can now be turned into a rough scene by one demo tool in a single pass. CastVoice showed a workflow that takes uploaded text, assigns voices to characters, adds effects, and spits out an audio drama segment built on ElevenLabs models. (x.com, elevenlabs.io)

The timing matters because ElevenLabs already sells the pieces CastVoice is stitching together. Its platform offers text to speech, voice cloning, sound effects, and an editor aimed at podcasts, audiobooks, and voiceovers. (elevenlabs.io)

The voice-selection part is not just picking from a menu of stock narrators. ElevenLabs says creators can use a voice library, make a custom voice from a written prompt with Voice Design, or use professional and instant voice cloning for replicas. (elevenlabs.io)

That changes what “casting” means at the prototype stage. Instead of hiring five actors just to hear whether a scene works, a writer can test “tired detective,” “teen sidekick,” or “grandmother with a dry voice” as generated performances and swap them in minutes. (elevenlabs.io)

The dialogue engine underneath is built for exactly this kind of mockup. ElevenLabs’ Text to Dialogue tool is designed to generate expressive multi-speaker scenes for games, podcasts, and audiobooks, and it reads emotional cues directly from the text. (elevenlabs.io)

That means the script itself can steer the performance. ElevenLabs documents tags like “[sad],” “[laughing],” and “[whispering],” plus sound cues like “[gentle footsteps]” or “[applause],” so a draft can carry both acting notes and scene texture in the same text block. (elevenlabs.io)

CastVoice’s demo pushes one step past plain read-throughs by bundling those pieces into a finished-feeling output. The result is closer to an animatic for audio fiction: not the final production, but a fast version that lets a team hear pacing, chemistry, and scene transitions before spending money on studio sessions. (x.com, elevenlabs.io)

ElevenLabs’ own documentation also shows the ceiling on this approach. It says Text to Dialogue is not meant for real-time use and may require several generations before a user gets the result they want, which makes the best fit prototyping rather than one-click final mastering. (elevenlabs.io)

Its Voice Design docs make a similar point from the casting side. The company says prompt-made voices are best for quick exploration and iteration, while professional voice clones are the higher-quality option when consistency matters. (elevenlabs.io)

So the clearest use case is the messy middle of production, where teams are still deciding who a character is and how a scene should sound. CastVoice is showing that the bottleneck there may no longer be recording audio at all, but choosing which version of the scene is worth making with humans next. (x.com, elevenlabs.io)
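
To make the tagged-script idea concrete, here is a minimal sketch of how a draft with bracketed cues such as “[sad]” or “[gentle footsteps]” might be split into per-character lines before being handed to a dialogue generator. It is not CastVoice's implementation and it does not call any ElevenLabs API. The "NAME: text" script layout, the cast sheet, and the placeholder voice identifiers are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical cast sheet: character name -> placeholder voice identifier.
# These identifiers are illustrative only and do not refer to real voices.
CAST = {
    "DETECTIVE": "voice_detective_placeholder",
    "SIDEKICK": "voice_sidekick_placeholder",
}

# Assumed script layout: "NAME: spoken text", with bracketed cues inline.
LINE_RE = re.compile(r"^(?P<speaker>[A-Z][A-Z ]*):\s*(?P<text>.+)$")
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class DialogueLine:
    speaker: str
    voice_id: str
    text: str        # spoken text with bracketed cues left in place
    tags: list[str]  # cues found in the line, in order of appearance


def parse_script(script: str, cast: dict[str, str]) -> list[DialogueLine]:
    """Split a tagged script into per-speaker lines with assigned voices."""
    lines: list[DialogueLine] = []
    for raw in script.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = LINE_RE.match(raw)
        if match is None:
            # A standalone cue such as "[applause]" is attached to the
            # previous spoken line; one before any dialogue is skipped.
            if lines:
                prev = lines[-1]
                prev.text = f"{prev.text} {raw}"
                prev.tags.extend(TAG_RE.findall(raw))
            continue
        speaker = match["speaker"].strip()
        if speaker not in cast:
            raise KeyError(f"no voice assigned for character {speaker!r}")
        text = match["text"].strip()
        lines.append(
            DialogueLine(speaker, cast[speaker], text, TAG_RE.findall(text))
        )
    return lines


SCRIPT = """\
DETECTIVE: [sad] I have seen this before.
SIDEKICK: [whispering] Then why are we still here?
[gentle footsteps]
DETECTIVE: [laughing] Because someone has to be.
"""

dialogue = parse_script(SCRIPT, CAST)
for line in dialogue:
    print(line.speaker, line.voice_id, line.tags)
```

Recasting a character in this sketch means changing one entry in the cast sheet and regenerating, which is the kind of quick swap the prototype stage calls for.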