Create multi-voice podcasts with AI text-to-speech. Add segments, assign a different voice to each, preview in real time, and export the result as audio.
- Multi-segment editing — Add unlimited text segments
- Per-segment voices — Assign different voices to each segment
- Audio pre-generation — Generate audio clips before playback for consistency
- Audio caching — Generated clips are cached for instant replay
- Real-time preview — Play individual segments or the entire podcast
- Audio export — Download your podcast as a single WAV file
- Project management — Export/Import podcasts as JSON
Sample output: example/podcast-1765734509668.wav
git clone https://github.com/microsoft/VibeVoice
cd VibeVoice
Copy the server script to your VibeVoice directory:
cp example/server.py /path/to/VibeVoice/demo/server.py
Run the server:
cd /path/to/VibeVoice
python demo/server.py --model microsoft/VibeVoice-Realtime-0.5B --device cuda --port 8880
The server will start at http://localhost:8880.
# Install dependencies
bun install # or npm install
# Start development server
bun dev # or npm run dev
Open http://localhost:5173 in your browser.
- Add segments — Click "Add Segment" to create new text entries
- Write content — Enter text for each segment
- Select voices — Choose a voice from the dropdown for each segment
- Generate audio — Click "Generate" on each segment to create the audio clip
- Preview — Click "Play" on individual segments or "Play Podcast" for all
Each segment has a Generate / Regenerate button:
- Generate — Creates the audio clip for that segment (amber button)
- Regenerate — Re-creates the audio if you want a different take
- Generate All — Generates all missing audio clips at once
- Ready — Green indicator shows the segment has cached audio
Tip: Pre-generating audio ensures consistent playback. Each generated clip is cached and plays back identically every time, so you get no more "I feel lucky" randomness between plays.
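As a rough illustration of what "Generate All" does, the sketch below fills in only the segments that have no audio yet and leaves existing clips untouched. The `SegmentState` shape and the `generate` callback are hypothetical names chosen for this example, not taken from the app's source.

```typescript
// Hypothetical segment shape; the fields in src/App.tsx may differ.
interface SegmentState {
  id: number;
  text: string;
  voice: string;
  audio?: ArrayBuffer; // present once the clip has been generated
}

// Segments that still need a clip (these would show the "Generate" button).
function missingSegments(segments: SegmentState[]): SegmentState[] {
  return segments.filter((segment) => segment.audio === undefined);
}

// Generate clips one at a time for segments without audio.
// `generate` is an injected, assumed function that calls the TTS backend.
async function generateAll(
  segments: SegmentState[],
  generate: (segment: SegmentState) => Promise<ArrayBuffer>,
): Promise<SegmentState[]> {
  const result: SegmentState[] = [];
  for (const segment of segments) {
    if (segment.audio === undefined) {
      result.push({ ...segment, audio: await generate(segment) });
    } else {
      result.push(segment); // already "Ready": keep the existing take
    }
  }
  return result;
}
```

Running the requests sequentially is a design choice in this sketch: it keeps load on a single local TTS server predictable, at the cost of a longer total wait.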
- If a segment has cached audio, it plays instantly
- If not cached, it will generate first, then play
- The cache is invalidated when you change text or voice
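The playback rules above can be sketched as a cache keyed on the two inputs that determine a clip, its text and its voice. This is a minimal sketch, not the app's actual code: `Segment`, `clipKey`, `ClipCache`, and the `synthesize` callback are all hypothetical names.

```typescript
// Hypothetical types and names; the real cache in src/App.tsx may differ.
interface Segment {
  text: string;
  voice: string;
}

// The key covers exactly the inputs that affect the audio, so editing
// either field makes the old entry unreachable (implicit invalidation).
function clipKey(segment: Segment): string {
  return JSON.stringify([segment.voice, segment.text]);
}

class ClipCache {
  private clips = new Map<string, ArrayBuffer>();

  get(segment: Segment): ArrayBuffer | undefined {
    return this.clips.get(clipKey(segment));
  }

  set(segment: Segment, audio: ArrayBuffer): void {
    this.clips.set(clipKey(segment), audio);
  }

  // Return the cached clip, or synthesize, store, and return a new one.
  async getOrGenerate(
    segment: Segment,
    synthesize: (segment: Segment) => Promise<ArrayBuffer>,
  ): Promise<ArrayBuffer> {
    const cached = this.get(segment);
    if (cached !== undefined) return cached;
    const audio = await synthesize(segment);
    this.set(segment, audio);
    return audio;
  }
}
```

Note that this sketch never evicts stale entries; a real implementation would likely delete the old key when a segment is edited to keep memory bounded.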
- JSON — Save your project with "Export JSON" to continue editing later
- Audio — Click "Download Audio" to export the complete podcast
  - Uses cached audio where available (fast!)
  - Generates missing segments automatically
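To illustrate how several clips can become a single WAV file, here is a minimal sketch that concatenates PCM samples behind a standard 44-byte RIFF/WAVE header. It assumes every clip has already been decoded to 16-bit mono PCM at the same sample rate; `encodeWav` is a hypothetical helper, and the app may assemble its export differently.

```typescript
// Assumes all clips are 16-bit mono PCM sharing one sample rate.
function encodeWav(clips: Int16Array[], sampleRate: number): ArrayBuffer {
  const totalSamples = clips.reduce((count, clip) => count + clip.length, 0);
  const dataBytes = totalSamples * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true); // block align = channels * 2
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);

  // Append each clip's samples in order, little-endian.
  let offset = 44;
  for (const clip of clips) {
    for (let i = 0; i < clip.length; i++) {
      view.setInt16(offset, clip[i], true);
      offset += 2;
    }
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped in a `Blob` with type `audio/wav` and offered as a download.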
Click "Import" and select a previously saved JSON file to restore your project.
Check the example/ folder for sample podcast JSON files you can import.
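For readers scripting against project files, the sketch below shows one way to validate a project before loading it. The `PodcastProject` shape (a `segments` array of `text` and `voice` strings) is an assumption made for illustration; open one of the files in example/ to see the real fields.

```typescript
// Hypothetical project shape; check a file in example/ for the real fields.
interface SegmentData {
  text: string;
  voice: string;
}

interface PodcastProject {
  segments: SegmentData[];
}

// Parse a JSON string and reject anything that does not match the shape.
function parseProject(json: string): PodcastProject {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Project must be a JSON object");
  }
  const segments = (data as { segments?: unknown }).segments;
  if (!Array.isArray(segments)) {
    throw new Error('Project must contain a "segments" array');
  }
  return {
    segments: segments.map((item: unknown, index: number) => {
      const { text, voice } = (item ?? {}) as { text?: unknown; voice?: unknown };
      if (typeof text !== "string" || typeof voice !== "string") {
        throw new Error(`Segment ${index} needs string "text" and "voice" fields`);
      }
      return { text, voice };
    }),
  };
}

// Serialize a project in a readable, diff-friendly form.
function serializeProject(project: PodcastProject): string {
  return JSON.stringify(project, null, 2);
}
```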
The frontend connects to the API at http://localhost:8880/api. To change this, edit the API_BASE constant in src/App.tsx.
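As a small sketch of how a base-URL constant is typically used, the helper below joins `API_BASE` with an endpoint path while tolerating stray slashes. Only the `API_BASE` name and its default value come from this README; `apiUrl` and the `voices` path are hypothetical.

```typescript
// Default from this README; point it at another host or port as needed.
const API_BASE = "http://localhost:8880/api";

// Join the base URL and a path with exactly one slash between them.
// The helper and any endpoint names passed to it are illustrative only.
function apiUrl(path: string, base: string = API_BASE): string {
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}
```

For example, `apiUrl("voices", "http://192.168.1.20:8880/api/")` yields `http://192.168.1.20:8880/api/voices`.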
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: FastAPI + VibeVoice TTS
- Audio: REST API synthesis with client-side caching
MIT
