Overview
The Audio API provides voice cloning and text-to-speech. Clone a voice from an audio sample, then generate natural-sounding speech in 30+ languages. Typical uses include voiceovers, audiobooks, virtual assistants, and character dialogue.
Base URL
Core Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /voices/clone | Clone voice from audio sample |
| GET | /voices | List your voice clones |
| GET | /voices/{voiceId} | Get voice clone details |
| DELETE | /voices/{voiceId} | Delete voice clone |
| POST | /audio/generate | Generate speech from text |
| GET | /audio/{audioId} | Get audio generation status |
| GET | /audio | List generated audio files |
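The endpoint table maps naturally onto a small lookup that a client can share across calls. The sketch below is an illustration, not an official SDK: the base URL is a hypothetical placeholder (substitute the real Base URL for your account), and the short names such as `get_voice` are labels invented here. Path parameters are percent-encoded so that an ID cannot alter the request path.

```python
from urllib.parse import quote

# Hypothetical placeholder; replace with the real Base URL.
BASE_URL = "https://api.example.com/v1"

# (HTTP method, path template) pairs copied from the Core Endpoints table.
# The dictionary keys are illustrative names, not part of the API.
ENDPOINTS = {
    "clone_voice": ("POST", "/voices/clone"),
    "list_voices": ("GET", "/voices"),
    "get_voice": ("GET", "/voices/{voiceId}"),
    "delete_voice": ("DELETE", "/voices/{voiceId}"),
    "generate_audio": ("POST", "/audio/generate"),
    "get_audio": ("GET", "/audio/{audioId}"),
    "list_audio": ("GET", "/audio"),
}


def endpoint(name: str, base_url: str = BASE_URL, **path_params: str) -> tuple[str, str]:
    """Return (HTTP method, full URL) for a named endpoint.

    Path parameters such as voiceId are percent-encoded (including "/")
    so an ID containing reserved characters cannot change the path.
    """
    method, template = ENDPOINTS[name]
    encoded = {key: quote(value, safe="") for key, value in path_params.items()}
    return method, base_url.rstrip("/") + template.format(**encoded)
```

For example, `endpoint("get_voice", voiceId="voice_123")` yields `("GET", "https://api.example.com/v1/voices/voice_123")`.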
Voice Clone Object
Audio Generation Object
Pricing
Voice Cloning
- Cost: 5 credits per voice clone (one-time)
- Reusable: Use the same voice ID for any number of generations (each generation is billed separately)
Audio Generation
- Base: 5 credits
- Duration: +1 credit per second
- Formula: Total = 5 + (duration_seconds × 1)
Examples
| Duration | Calculation | Total Credits |
|---|---|---|
| 5s | 5 + (5 × 1) | 10 |
| 15s | 5 + (15 × 1) | 20 |
| 60s | 5 + (60 × 1) | 65 |
| 120s (2 min) | 5 + (120 × 1) | 125 |
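The pricing formula can be expressed as a small helper and checked against the table. One assumption is flagged in the code: the page does not say how fractional durations are billed, so this sketch rounds partial seconds up.

```python
import math

CLONE_COST = 5  # one-time cost per voice clone, in credits


def generation_cost(duration_seconds: float) -> int:
    """Credits for one audio generation: 5 base + 1 per second.

    Assumption: partial seconds are rounded up to a whole second;
    the documented formula only covers whole-second durations.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return 5 + math.ceil(duration_seconds) * 1
```

Because the base fee applies to every request, cloning once and generating three 15-second clips costs 5 + 3 × 20 = 65 credits, while a single 45-second clip costs only 50.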
Quick Start
Clone a Voice
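A clone is created with POST /voices/clone. The sketch below only builds the request with Python's standard library; it does not send it. Everything beyond the method and path is an assumption for illustration: the placeholder base URL, the Bearer `Authorization` header, and the JSON fields `name` and `audio_url` are hypothetical (the real endpoint may expect a file upload instead), so check the Voice Clone Object reference for the actual schema.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical placeholder


def build_clone_request(api_key: str, name: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /voices/clone request.

    Hypothetical: the Bearer auth scheme and the "name" / "audio_url"
    body fields are assumptions, not the documented schema.
    """
    body = json.dumps({"name": name, "audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/voices/clone",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be `urllib.request.urlopen(build_clone_request(...))`; each successful clone costs 5 credits.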
Generate Speech
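The endpoint list includes GET /audio/{audioId} for generation status, which suggests that generation is asynchronous: a client submits POST /audio/generate and then polls until the audio is ready. This sketch assumes that flow. The `status` field and its `"completed"` / `"failed"` values are hypothetical; adapt them to the Audio Generation Object. The HTTP call is injected as `fetch_status` so the loop can be tested without a network.

```python
import time
from typing import Callable


def wait_for_audio(
    fetch_status: Callable[[str], dict],
    audio_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GET /audio/{audioId} until the generation finishes.

    `fetch_status` performs the GET and returns the decoded JSON.
    Hypothetical: the "status" field and its values are assumptions.
    """
    waited = 0.0
    while True:
        audio = fetch_status(audio_id)
        status = audio.get("status")
        if status == "completed":
            return audio
        if status == "failed":
            raise RuntimeError(f"generation {audio_id} failed: {audio}")
        if waited >= timeout:
            raise TimeoutError(f"generation {audio_id} not finished after {timeout}s")
        sleep(interval)  # fixed interval keeps request volume predictable
        waited += interval
```

A fixed interval is the simplest choice; under the daily request limits described in Rate Limits, a longer interval for long scripts avoids spending requests on status checks.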
Supported Languages
Languages are grouped into four categories: English, European, Asian, and Other. English variants:
- en-US: American English
- en-GB: British English
- en-AU: Australian English
- en-CA: Canadian English
- en-IN: Indian English
Preset Voices
Use built-in voices without cloning:
| Voice ID | Description | Languages |
|---|---|---|
| preset-professional-male | Deep, authoritative | All supported |
| preset-professional-female | Clear, confident | All supported |
| preset-friendly-male | Warm, approachable | All supported |
| preset-friendly-female | Upbeat, energetic | All supported |
| preset-narrator | Storytelling style | All supported |
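A preset ID can be used where a cloned voice ID would otherwise go. The sketch below builds a POST /audio/generate body and validates it locally against limits stated elsewhere on this page (10,000-character text limit, the MP3/WAV/OGG formats, and the three listed sample rates). The JSON field names and the lowercase format strings are hypothetical, and treating the listed sample rates as the only accepted values is an assumption.

```python
import json

FORMATS = {"mp3", "wav", "ogg"}          # from Output Formats; lowercase is assumed
SAMPLE_RATES = {22050, 44100, 48000}     # from Sample Rates; assumed exhaustive
MAX_CHARS = 10_000                       # documented text_too_long limit


def generate_payload(text: str, voice_id: str, language: str = "en-US",
                     fmt: str = "mp3", sample_rate: int = 44100) -> bytes:
    """Build a JSON body for POST /audio/generate.

    Hypothetical field names. voice_id may be a preset such as
    "preset-narrator" or the ID of one of your clones.
    """
    if not text or len(text) > MAX_CHARS:
        raise ValueError(f"text must be 1-{MAX_CHARS} characters")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate {sample_rate}")
    payload = {
        "text": text,
        "voice_id": voice_id,
        "language": language,
        "format": fmt,
        "sample_rate": sample_rate,
    }
    return json.dumps(payload).encode("utf-8")
```

Validating locally catches problems before a request is made, rather than after a round trip returns an error.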
Voice Modulation
Control speech characteristics.
Audio Quality Options
Output Formats
| Format | Codec | Use Case | File Size |
|---|---|---|---|
| MP3 | MPEG Audio Layer 3 | Web, streaming, general use | Small |
| WAV | Uncompressed PCM | Professional editing, highest quality | Large |
| OGG | Ogg Vorbis | Open format, web embedding | Medium |
Sample Rates
- 22050 Hz: Voice-only, minimal quality
- 44100 Hz: CD quality, recommended for most uses
- 48000 Hz: Professional audio, broadcast quality
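The "Large" file size for WAV follows directly from the sample rate: uncompressed PCM stores every sample. The helper below computes the audio data size; bit depth and channel count are not stated on this page, so 16-bit mono is assumed, and the small WAV header (typically 44 bytes) is excluded.

```python
def pcm_bytes(seconds: float, sample_rate: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size of uncompressed PCM audio data in bytes (WAV header excluded).

    Assumption: 16-bit mono unless told otherwise.
    """
    return int(seconds * sample_rate * (bits_per_sample // 8) * channels)
```

At 16-bit mono, one minute is about 2.6 MB at 22050 Hz, 5.3 MB at 44100 Hz, and 5.8 MB at 48000 Hz, so halving the sample rate halves the WAV size.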
SSML Support
Use Speech Synthesis Markup Language (SSML) for fine control.
Error Responses
- invalid_audio_sample: Poor quality or too short
- unsupported_language: Language not available
- text_too_long: Exceeds maximum length (10,000 characters)
- voice_not_found: Voice ID doesn’t exist
- insufficient_credits: Not enough credits
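To avoid text_too_long, longer scripts can be split before submission. This is a minimal sketch of one approach, not an API feature: it splits after sentence-ending punctuation and packs sentences into chunks of at most 10,000 characters, hard-splitting any single sentence that is longer than the limit. Whitespace between sentences is normalized to one space.

```python
import re

MAX_CHARS = 10_000  # documented limit; longer text returns text_too_long


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks after ".", "!" or "?" where possible; whitespace between
    sentences becomes a single space.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-split into pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is a separate generation and pays the 5-credit base fee, so chunks close to the limit are cheaper than many short ones.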
Rate Limits
| Tier | Voice Cloning | Audio Generation | Concurrent |
|---|---|---|---|
| Free | 5/day | 100/day | 2 |
| Pro | 50/day | 1000/day | 5 |
| Enterprise | Unlimited | Unlimited | 20 |
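The Concurrent column caps how many requests may be in flight at once. A client-side `asyncio.Semaphore` is one common way to respect such a cap. The sketch assumes each job is a zero-argument function returning a coroutine (for example, one that submits a single generation request); it does not track the daily limits.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence

# "Concurrent" column from the Rate Limits table.
CONCURRENCY = {"free": 2, "pro": 5, "enterprise": 20}


async def run_limited(tier: str, jobs: Sequence[Callable[[], Awaitable[Any]]]) -> list:
    """Run jobs with at most CONCURRENCY[tier] in flight; results keep input order."""
    semaphore = asyncio.Semaphore(CONCURRENCY[tier])

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        # Waits here while all of the tier's concurrency slots are taken.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Limiting on the client avoids sending requests that the server would reject and that would then have to be retried.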
Best Practices
Recording Quality
For best voice cloning results:
- Record in a quiet, noise-free environment
- Provide 15-60 seconds of clear speech
- Use a professional microphone if possible
- Sample rate: 44.1 kHz or higher
- Format: WAV, MP3, or FLAC
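The duration and sample-rate guidelines can be checked before uploading, which helps avoid invalid_audio_sample. This sketch uses Python's standard `wave` module, so it handles only WAV files (MP3 and FLAC would need a third-party decoder), and it cannot judge noise or microphone quality.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 15, 60
MIN_RATE = 44_100


def check_wav_sample(path_or_file) -> list[str]:
    """Return problems with a WAV voice sample (empty list if none).

    Checks only duration and sample rate against the guidelines above.
    """
    problems = []
    with wave.open(path_or_file, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    if rate < MIN_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_RATE} Hz")
    return problems
```

Usage: `check_wav_sample("sample.wav")` returns an empty list when the file is within both ranges.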
Text Optimization
- Keep sentences natural and conversational
- Use punctuation for proper pauses
- Break long text into smaller chunks
- Specify pronunciation for technical terms
- Test with short samples first
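Pauses and pronunciation of technical terms can be expressed in SSML. The sketch below wraps sentences in the standard W3C elements `<speak>`, `<break>`, and `<sub alias="...">`; which SSML elements this API accepts is not specified on this page, so treat the markup as an assumption. User text is XML-escaped first so characters like `&` cannot break the document. Terms are matched as plain substrings in one pass, longest first, with no word-boundary check.

```python
import re
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def to_ssml(sentences: list[str], pronunciations: Optional[dict[str, str]] = None,
            pause_ms: int = 400) -> str:
    """Join sentences with <break> pauses; wrap known terms in <sub alias>."""
    pronunciations = pronunciations or {}
    pattern = None
    if pronunciations:
        # Longest terms first so "SQL Server" wins over "SQL".
        terms = sorted(pronunciations, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(escape(t)) for t in terms))
        lookup = {escape(t): alias for t, alias in pronunciations.items()}

    def mark_up(sentence: str) -> str:
        text = escape(sentence)  # escape &, <, > in user text
        if pattern is None:
            return text
        # Single pass, so inserted markup is never re-scanned.
        return pattern.sub(
            lambda m: f"<sub alias={quoteattr(lookup[m.group(0)])}>{m.group(0)}</sub>", text
        )

    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(mark_up(s) for s in sentences) + "</speak>"
```

For example, `to_ssml(["Query SQL now."], {"SQL": "sequel"})` produces `<speak>Query <sub alias="sequel">SQL</sub> now.</speak>`.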
Content Policy
- Only clone voices you have permission to use
- Don’t impersonate anyone without their consent
- Clearly disclose AI-generated content
- Comply with local voice biometric laws