Overview
Percify’s Voice Cloning technology enables you to create realistic voice replicas and generate natural-sounding speech in multiple languages. Perfect for adding personality to avatars, creating audio content, or building voice-enabled applications.
Natural Speech
Human-like intonation and emotion
Multi-Language
Support for 30+ languages
Quick Cloning
Clone voices from 15s samples
Flexible Output
Control speed, pitch, and emotion
How Voice Cloning Works
1. Upload Voice Sample
Provide a clean audio recording (15-60 seconds recommended)
2. AI Analysis
Our models analyze vocal characteristics, tone, and patterns
3. Voice Profile Creation
Generate a unique voice ID you can reuse
4. Text-to-Speech Generation
Convert any text to speech using the cloned voice
Voice Cloning Requirements
Audio Sample Quality
Recording Guidelines
Optimal Conditions:
- Clear, noise-free environment
- Professional or high-quality microphone
- Consistent volume level
- No background music or effects
- Sample rate: 44.1kHz or higher
- Format: WAV, MP3, or FLAC
Duration Requirements
- Minimum: 15 seconds (basic cloning)
- Recommended: 30-60 seconds (better quality)
- Maximum: 5 minutes (professional cloning)
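
The duration and sample-rate limits above can be checked locally before uploading. The following is a minimal sketch, not part of any Percify SDK: the function names `check_sample` and `check_wav_file` are hypothetical, and the file helper only reads WAV files through Python's standard `wave` module (MP3 and FLAC would need a third-party decoder).

```python
import wave

MIN_SECONDS = 15          # minimum duration listed above
MAX_SECONDS = 5 * 60      # maximum duration listed above
MIN_SAMPLE_RATE = 44_100  # "44.1kHz or higher"


def check_sample(duration_seconds: float, sample_rate_hz: int) -> list[str]:
    """Return a list of problems; an empty list means the sample passes."""
    problems = []
    if duration_seconds < MIN_SECONDS:
        problems.append(f"too short: {duration_seconds:.1f}s < {MIN_SECONDS}s")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"too long: {duration_seconds:.1f}s > {MAX_SECONDS}s")
    if sample_rate_hz < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate_hz} Hz < {MIN_SAMPLE_RATE} Hz")
    return problems


def check_wav_file(path: str) -> list[str]:
    """Read duration and sample rate from a WAV header and check them."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return check_sample(duration, rate)
```

Calling `check_wav_file("sample.wav")` on a recording returns an empty list when the file meets the limits, or one message per failed check.
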
Content Guidelines
Read clear, varied sentences that include:
- Different emotions (neutral, happy, serious)
- Various pitch ranges
- Natural pauses and breathing
- Complete sentences with proper intonation
Pricing
Voice Cloning (One-time per voice)
- Cost: 5 credits per voice clone
- Includes: Voice profile creation and storage
- Reusable: Generate unlimited audio with the same voice ID
Audio Generation (Per use)
- Base Cost: 5 credits
- Duration Cost: +1 credit per second of generated audio
- Formula:
Total = 5 + (duration_seconds × 1)
Examples
| Duration | Calculation | Total Credits |
|---|---|---|
| 5 seconds | 5 + (5 × 1) | 10 credits |
| 10 seconds | 5 + (10 × 1) | 15 credits |
| 30 seconds | 5 + (30 × 1) | 35 credits |
| 60 seconds | 5 + (60 × 1) | 65 credits |
| 2 minutes | 5 + (120 × 1) | 125 credits |
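
The pricing formula can be expressed as a small helper for budgeting credits. This is an illustrative sketch: the function names are hypothetical, and rounding partial seconds up to the next whole second is an assumption, since the pricing rules above do not say how fractional durations are billed.

```python
import math

CLONE_COST = 5       # one-time credits per voice clone
BASE_COST = 5        # base credits per generation
PER_SECOND_COST = 1  # credits per second of generated audio


def generation_cost(duration_seconds: float) -> int:
    """Credits for one generation: base cost plus per-second cost."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: partial seconds are billed as a full second.
    return BASE_COST + math.ceil(duration_seconds) * PER_SECOND_COST


def project_cost(new_voices: int, clip_durations: list[float]) -> int:
    """Total credits for cloning some voices and generating several clips."""
    return new_voices * CLONE_COST + sum(generation_cost(d) for d in clip_durations)


print(generation_cost(30))          # 35, matching the table above
print(project_cost(1, [10, 60]))    # 5 + 15 + 65 = 85
```
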
Creating a Voice Clone
Via Dashboard
1. Navigate to Voice Studio
Click “Voice Cloning” from the main menu
2. Upload Audio Sample
Drag and drop or select your audio file (15s-5min)
3. Name Your Voice
Give your voice clone a memorable name
4. Set Language
Select primary language for the voice
5. Create Clone
Click “Clone Voice” and wait 30-60 seconds for processing
Via API
Generating Speech
Basic Text-to-Speech
Advanced Options
Supported Languages
- English Variants
- European Languages
- Asian Languages
- Other Languages
English Variants: en-US (American English), en-GB (British English), en-AU (Australian English), en-CA (Canadian English), en-IN (Indian English)
Language detection is automatic based on input text. Specify language explicitly for best results with multilingual content.
Preset Voices
Don’t have a voice sample? Use our preset voices:
Professional Male
Deep, authoritative, news anchor style
Professional Female
Clear, confident, corporate presenter
Friendly Male
Warm, approachable, conversational
Friendly Female
Upbeat, energetic, engaging
Narrator
Storytelling, documentary style
Character Voices
Various character archetypes
Combining Voice with Video
Create fully animated, voiced avatar videos:
Voice Management
List Your Voices
Update Voice Metadata
Delete Voice
Audio Quality Optimization
Output Format Selection
- MP3: Best for web/streaming (smaller files)
- WAV: Highest quality, uncompressed (large files)
- OGG: Good compression, open format
Sample Rate & Bitrate
Higher values give better quality but larger files.
Recommended settings:
- Web/mobile: 44.1kHz, 128kbps MP3
- Professional: 48kHz, 320kbps MP3 or WAV
- Podcast: 44.1kHz, 192kbps MP3
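
To see how these settings trade quality against file size, the sketch below estimates output size from bitrate and duration. The function names are hypothetical, and the WAV estimate assumes 16-bit mono PCM and ignores the small file header; neither the bit depth nor the channel count is stated in the settings above.

```python
def compressed_size_bytes(bitrate_kbps: int, duration_seconds: int) -> int:
    """Approximate size of constant-bitrate audio such as MP3."""
    # kilobits per second -> bits per second -> bytes
    return bitrate_kbps * 1000 * duration_seconds // 8


def wav_size_bytes(sample_rate_hz: int, duration_seconds: int,
                   bit_depth: int = 16, channels: int = 1) -> int:
    """Approximate size of uncompressed PCM audio (header not included)."""
    return sample_rate_hz * (bit_depth // 8) * channels * duration_seconds


# One minute of audio at each recommended setting
print(compressed_size_bytes(128, 60))  # 960000 bytes, web/mobile MP3
print(compressed_size_bytes(192, 60))  # 1440000 bytes, podcast MP3
print(compressed_size_bytes(320, 60))  # 2400000 bytes, professional MP3
print(wav_size_bytes(48_000, 60))      # 5760000 bytes, 48 kHz 16-bit mono WAV
```
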
Post-Processing
Percify automatically applies:
- Noise reduction
- Volume normalization
- De-essing (reduces harsh ‘s’ sounds)
- Breath removal (optional)
Set applyProcessing: false for raw output.
Advanced Features
SSML Support
Use Speech Synthesis Markup Language for fine control:
Phoneme-Level Control
Specify exact pronunciation:
Voice Mixing
Combine multiple voices in one audio:
Use Cases
Content Creation
YouTube videos, podcasts, audiobooks
E-Learning
Online courses, training materials
Accessibility
Screen readers, audio descriptions
Gaming
Character dialogue, narration
Virtual Assistants
Chatbots, voice interfaces
Marketing
Ads, promotional content
Best Practices
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Robotic sound | Poor quality sample | Use longer, clearer audio sample |
| Wrong pronunciation | Text ambiguity | Use SSML or phonemes for clarity |
| Volume inconsistency | Unprocessed output | Enable applyProcessing: true |
| Generation fails | Unsupported language | Check language code, use preset if unavailable |
| Audio cuts off | Duration limit hit | Split into multiple requests, concatenate |
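
For the "Audio cuts off" case, long text can be split at sentence boundaries before it is sent as separate requests. The helper below is a rough sketch: `split_text` is a hypothetical name, and the character budget is only a stand-in for the duration limit, whose exact value is not given here.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        # A single sentence longer than max_chars is kept whole, not cut.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```

Each chunk can then be generated on its own and the resulting audio files joined in order.
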
API Rate Limits
| Tier | Cloning | Generation | Concurrent |
|---|---|---|---|
| Free | 5/day | 100/day | 2 |
| Pro | 50/day | 1000/day | 5 |
| Enterprise | Unlimited | Unlimited | 20 |
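
A client can track its own usage against these limits to avoid rejected requests. This is a small sketch under stated assumptions: the dictionary layout and the `remaining_today` function are hypothetical, and the usage count is something your application would have to maintain itself.

```python
from typing import Optional

# Limits copied from the table above; None means unlimited.
RATE_LIMITS = {
    "free": {"cloning": 5, "generation": 100, "concurrent": 2},
    "pro": {"cloning": 50, "generation": 1000, "concurrent": 5},
    "enterprise": {"cloning": None, "generation": None, "concurrent": 20},
}


def remaining_today(tier: str, kind: str, used_today: int) -> Optional[int]:
    """Remaining daily 'cloning' or 'generation' calls, or None if unlimited."""
    if kind not in ("cloning", "generation"):
        raise ValueError("kind must be 'cloning' or 'generation'")
    limit = RATE_LIMITS[tier][kind]
    if limit is None:
        return None
    return max(limit - used_today, 0)


print(remaining_today("free", "generation", 40))  # 60
```
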
Next Steps
Image to Video
Combine voice with animation
Audio API Reference
Complete API docs
Credits System
Voice pricing details