🎙️ GibbsTTS — Zero-Shot Voice Cloning TTS
Official demo for Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech.
Upload a short reference clip (a few seconds is enough). The reference transcript is optional: if you leave it blank, Whisper transcribes the clip automatically. Then type the text you want to synthesize, and the model will speak it in the reference voice. Supports English and Mandarin Chinese (plus experimental EN/ZH mixing).
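The input flow above can be sketched in Python as follows. This is only an illustration of the "optional transcript, fall back to automatic transcription" logic, not the demo's actual code: `CloneRequest`, `build_request`, and the `transcribe` callable are hypothetical names, and the `"en"`/`"zh"` language codes are an assumed encoding of the two supported languages.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed codes for the two supported languages (English, Mandarin Chinese).
SUPPORTED_LANGUAGES = {"en", "zh"}


@dataclass
class CloneRequest:
    """Hypothetical container for the inputs the demo asks for."""
    reference_audio: str       # path to the short reference clip
    reference_transcript: str  # what is said in the reference clip
    target_text: str           # what the model should speak
    language: str


def build_request(
    reference_audio: str,
    target_text: str,
    language: str,
    reference_transcript: Optional[str] = None,
    transcribe: Optional[Callable[[str], str]] = None,
) -> CloneRequest:
    """Validate inputs and fill in a missing reference transcript.

    `transcribe` stands in for an ASR model such as Whisper; it is passed
    in as a plain callable so this sketch does not depend on any library.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if not target_text.strip():
        raise ValueError("target text must not be empty")

    transcript = (reference_transcript or "").strip()
    if not transcript:
        # Transcript left blank: fall back to automatic transcription.
        if transcribe is None:
            raise ValueError("no transcript given and no transcriber available")
        transcript = transcribe(reference_audio).strip()

    return CloneRequest(reference_audio, transcript, target_text.strip(), language)
```

For example, `build_request("ref.wav", "Hello there.", "en", transcribe=my_asr)` would call the hypothetical `my_asr` on the clip only because no transcript was supplied; a supplied transcript is used as-is.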
Examples
| Reference audio (prompt) | Reference transcript (optional) | Target text (what you want the model to speak) | Language |
|---|---|---|---|
If you find this work useful, please cite the paper. Model trained on Emilia-en/zh.