CosyVoice Text-to-Speech


Official | 500 runs | $0.01 per second | Commercial use

CosyVoice is a Text-to-Speech model that generates expressive, natural-sounding speech from text, using a reference audio clip to mimic a speaker's voice characteristics.

Input

Configure your audio generation parameters

Prompt *

Enter the text you want to convert to speech

Reference Audio *

Upload an audio file to use as a reference for voice characteristics
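
The two required inputs above could be checked before submission with something like the sketch below. It is only an illustration: the page does not document an API schema, so the field names `prompt` and `reference_audio`, the helper `build_request`, and the list of accepted audio extensions are all assumptions.

```python
from pathlib import Path

# Assumed set of accepted extensions; the page does not list supported formats.
ALLOWED_AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def build_request(prompt: str, reference_audio: str) -> dict:
    """Validate the two required inputs and return a payload dict.

    The keys are hypothetical and mirror the form labels on the page,
    not a documented API.
    """
    text = prompt.strip()
    if not text:
        raise ValueError("prompt is required")

    suffix = Path(reference_audio).suffix.lower()
    if suffix not in ALLOWED_AUDIO_SUFFIXES:
        raise ValueError(f"unsupported reference audio type: {suffix or '(none)'}")

    return {"prompt": text, "reference_audio": reference_audio}
```

Calling `build_request("Hello there.", "speaker.wav")` returns a dict with both fields filled in, while an empty prompt or an unrecognized file extension raises `ValueError` before anything is sent.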

Output

No audio generated yet

Enter a prompt, upload a reference audio file, and click Generate to create your first speech clip