🎙️ GibbsTTS — Zero-Shot Voice Cloning TTS

Official interactive demo for Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech.

Paper: https://arxiv.org/abs/2605.09386
Code: https://github.com/ydqmkkx/GibbsTTS
Weights: https://huggingface.co/ydqmkkx/GibbsTTS

Upload a short reference speech clip (a few seconds is enough).
The reference transcript is optional. If you leave it blank, choose the ASR language (and click the Auto-transcribe button), and Whisper will transcribe the reference audio automatically.
Then type the text you want to synthesize, choose the reference and TTS languages, and click the Synthesize button. The model will speak the text in the reference voice.
Supports English, Mandarin Chinese, mixed English/Chinese text, and Japanese (LoRA fine-tuned).
Also supports cross-lingual synthesis.
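
The input handling described above can be sketched in Python. This is only an illustration of the workflow, not the demo's actual code: `SynthesisRequest`, `build_request`, the `transcribe` callable, and the language codes `en`/`zh`/`ja` are all hypothetical names chosen for this example. The ASR step is passed in as a plain function so that any Whisper wrapper could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed codes for the languages listed above (English, Mandarin Chinese, Japanese).
SUPPORTED_LANGUAGES = {"en", "zh", "ja"}


@dataclass
class SynthesisRequest:
    """Hypothetical container for everything the Synthesize step needs."""
    ref_audio_path: str
    ref_text: str
    ref_language: str
    target_text: str
    target_language: str

    @property
    def is_cross_lingual(self) -> bool:
        # Cross-lingual synthesis: the reference and TTS languages differ.
        return self.ref_language != self.target_language


def build_request(
    ref_audio_path: str,
    target_text: str,
    ref_language: str,
    target_language: str,
    ref_text: Optional[str] = None,
    transcribe: Optional[Callable[[str, str], str]] = None,
    asr_language: Optional[str] = None,
) -> SynthesisRequest:
    """Validate the inputs and fill in a blank reference transcript via ASR."""
    for lang in (ref_language, target_language):
        if lang not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {lang!r}")
    if not target_text.strip():
        raise ValueError("target text must not be empty")

    text = (ref_text or "").strip()
    if not text:
        # Blank transcript: fall back to the ASR callable (Whisper in the demo).
        if transcribe is None:
            raise ValueError("no reference transcript and no ASR callable given")
        # Assumption: the ASR language defaults to the reference language.
        text = transcribe(ref_audio_path, asr_language or ref_language).strip()

    return SynthesisRequest(
        ref_audio_path, text, ref_language, target_text.strip(), target_language
    )


if __name__ == "__main__":
    def fake_asr(path: str, language: str) -> str:
        # Stand-in for a real Whisper call; returns a fixed transcript.
        return " hello there "

    req = build_request("ref.wav", "Nice to meet you.", "ja", "en", transcribe=fake_asr)
    print(req.ref_text)          # hello there
    print(req.is_cross_lingual)  # True
```

Passing the transcriber as an argument keeps the sketch runnable without any model weights. In a real setup the callable would wrap an actual Whisper model.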