Overview

Percify’s Voice Cloning technology lets you create realistic voice replicas and generate natural-sounding speech in multiple languages. Use it to add personality to avatars, create audio content, or build voice-enabled applications.

  • Natural Speech: Human-like intonation and emotion
  • Multi-Language: Support for 30+ languages
  • Quick Cloning: Clone voices from 15s samples
  • Flexible Output: Control speed, pitch, and emotion

How Voice Cloning Works

1. Upload Voice Sample: Provide a clean audio recording (15-60 seconds recommended)
2. AI Analysis: Our models analyze vocal characteristics, tone, and patterns
3. Voice Profile Creation: Generate a unique voice ID you can reuse
4. Text-to-Speech Generation: Convert any text to speech using the cloned voice

Voice Cloning Requirements

Audio Sample Quality

Optimal Conditions:
  • Clear, noise-free environment
  • Professional or high-quality microphone
  • Consistent volume level
  • No background music or effects

Technical Specifications:
  • Sample rate: 44.1kHz or higher
  • Format: WAV, MP3, or FLAC

Duration:
  • Minimum: 15 seconds (basic cloning)
  • Recommended: 30-60 seconds (better quality)
  • Maximum: 5 minutes (professional cloning)
Longer samples provide better voice fidelity and more natural intonation.

Sample Content:
Read clear, varied sentences that include:
  • Different emotions (neutral, happy, serious)
  • Various pitch ranges
  • Natural pauses and breathing
  • Complete sentences with proper intonation
Example script: “Hello, I’m excited to try voice cloning with Percify. The technology is amazing and opens up so many creative possibilities. I can imagine using this for podcasts, videos, or even virtual assistants. Let’s see how well it captures my unique voice characteristics.”
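
A quick local check before uploading can catch samples that fall outside the limits listed above. The sketch below is illustrative only: `checkVoiceSample` and its field names are hypothetical (not part of the Percify API), and it assumes you have already read the file's format, duration, and sample rate with a tool of your choice.

```javascript
// Limits come from the requirements above; all names here are hypothetical.
const SUPPORTED_FORMATS = ['wav', 'mp3', 'flac'];
const MIN_SECONDS = 15;          // minimum for basic cloning
const RECOMMENDED_SECONDS = 30;  // lower end of the recommended range
const MAX_SECONDS = 5 * 60;      // 5-minute maximum
const MIN_SAMPLE_RATE_HZ = 44100;

function checkVoiceSample({ format, durationSeconds, sampleRateHz }) {
  const problems = []; // violations of the stated limits
  const warnings = []; // within limits, but below the recommended conditions

  if (!SUPPORTED_FORMATS.includes(String(format).toLowerCase())) {
    problems.push(`Unsupported format: ${format}`);
  }

  if (durationSeconds < MIN_SECONDS) {
    problems.push(`Sample is shorter than ${MIN_SECONDS} seconds`);
  } else if (durationSeconds > MAX_SECONDS) {
    problems.push(`Sample is longer than ${MAX_SECONDS} seconds`);
  } else if (durationSeconds < RECOMMENDED_SECONDS) {
    warnings.push('30-60 seconds is recommended for better quality');
  }

  if (sampleRateHz < MIN_SAMPLE_RATE_HZ) {
    warnings.push('Sample rate is below the recommended 44.1kHz');
  }

  return { ok: problems.length === 0, problems, warnings };
}

console.log(checkVoiceSample({ format: 'wav', durationSeconds: 20, sampleRateHz: 44100 }));
```

Treating the sample rate as a warning rather than a hard failure is a design choice of this sketch, since the page lists it under optimal conditions rather than as a strict limit.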

Pricing

Voice Cloning (One-time per voice)

  • Cost: 5 credits per voice clone
  • Includes: Voice profile creation and storage
  • Reusable: Generate unlimited audio with the same voice ID

Audio Generation (Per use)

  • Base Cost: 5 credits
  • Duration Cost: +1 credit per second of generated audio
  • Formula: Total = 5 + (duration_seconds × 1)

Examples

| Duration | Calculation | Total Credits |
|---|---|---|
| 5 seconds | 5 + (5 × 1) | 10 credits |
| 10 seconds | 5 + (10 × 1) | 15 credits |
| 30 seconds | 5 + (30 × 1) | 35 credits |
| 60 seconds | 5 + (60 × 1) | 65 credits |
| 2 minutes | 5 + (120 × 1) | 125 credits |

Cost Optimization: Clone a voice once (5 credits), then reuse it indefinitely. Only pay for generated audio duration.
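
To budget a project, the pricing formula can be wrapped in a small helper. This is a minimal sketch using the credit values stated above; the function names are hypothetical, and how fractional seconds are billed is not specified here, so the sketch does no rounding.

```javascript
// Credit values from the pricing section above.
const CLONE_COST = 5;           // one-time, per cloned voice
const GENERATION_BASE_COST = 5; // per generation request
const COST_PER_SECOND = 1;      // per second of generated audio

// Total = 5 + (duration_seconds × 1)
function generationCost(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative number');
  }
  return GENERATION_BASE_COST + durationSeconds * COST_PER_SECOND;
}

// Cost of cloning `newVoices` voices plus generating every clip in the list.
function projectCost(clipDurations, newVoices = 1) {
  const generation = clipDurations.reduce((sum, d) => sum + generationCost(d), 0);
  return newVoices * CLONE_COST + generation;
}

console.log(generationCost(30));           // 35
console.log(projectCost([10, 30, 60]));    // 5 + 15 + 35 + 65 = 120
```

Because every request carries the 5-credit base cost, one 60-second clip (65 credits) is cheaper than two 30-second clips (70 credits).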

Creating a Voice Clone

Via Dashboard

1. Navigate to Voice Studio: Click “Voice Cloning” from the main menu
2. Upload Audio Sample: Drag and drop or select your audio file (15s-5min)
3. Name Your Voice: Give your voice clone a memorable name
4. Set Language: Select the primary language for the voice
5. Create Clone: Click “Clone Voice” and wait 30-60 seconds for processing

Via API

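const fs = require('fs/promises');

// Requires Node.js 18+ (built-in fetch, FormData, and Blob).
async function cloneVoice() {
  // Read the sample into memory and wrap it in a Blob for the multipart body
  const sample = await fs.readFile('voice-sample.wav');

  const form = new FormData();
  form.append('audio', new Blob([sample], { type: 'audio/wav' }), 'voice-sample.wav');
  form.append('name', 'My Custom Voice');
  form.append('language', 'en-US');

  // fetch sets the multipart Content-Type header (with boundary) automatically
  const response = await fetch('https://api.percify.io/v1/voices/clone', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`
    },
    body: form
  });
  if (!response.ok) {
    throw new Error(`Voice cloning failed: HTTP ${response.status}`);
  }

  const voice = await response.json();
  console.log(`Voice ID: ${voice.id}`);
  return voice;
}

cloneVoice().catch(console.error);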
Response:
{
  "id": "voice_abc123",
  "name": "My Custom Voice",
  "language": "en-US",
  "status": "completed",
  "sampleDuration": 45.3,
  "creditCost": 5,
  "createdAt": "2025-11-25T05:45:00Z"
}

Generating Speech

Basic Text-to-Speech

const response = await fetch('https://api.percify.io/v1/audio/generate', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to Percify! Your AI-powered creative platform.',
    voiceId: 'voice_abc123',
    speed: 1.0,
    pitch: 0,
    outputFormat: 'mp3'
  })
});

const audio = await response.json();
console.log(`Audio URL: ${audio.audioUrl}`);

Advanced Options

// Assumes `client` is an initialized Percify SDK client (setup not shown on this page)
const audio = await client.audio.generate({
  text: 'This is an exciting announcement!',
  voiceId: 'voice_abc123',
  
  // Voice modulation
  speed: 1.1,           // 0.5 to 2.0 (1.0 = normal)
  pitch: 2,             // -10 to +10 semitones
  
  // Emotion & style
  emotion: 'excited',   // neutral, happy, sad, excited, calm
  emphasis: ['exciting', 'announcement'], // Words to emphasize
  
  // Output options
  outputFormat: 'mp3',  // mp3, wav, ogg
  sampleRate: 44100,    // 22050, 44100, 48000
  bitrate: 192          // kbps for mp3
});
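
The ranges in the comments above can be checked client-side before spending credits on a request. The helper below is a sketch under the assumption that those comments list the complete set of accepted values; `validateGenerateOptions` is a hypothetical name, not an SDK function.

```javascript
// Allowed values as documented in the Advanced Options comments above.
const EMOTIONS = ['neutral', 'happy', 'sad', 'excited', 'calm'];
const OUTPUT_FORMATS = ['mp3', 'wav', 'ogg'];
const SAMPLE_RATES = [22050, 44100, 48000];

// Returns a list of human-readable problems; an empty list means the options look valid.
function validateGenerateOptions(options) {
  const errors = [];
  const { speed, pitch, emotion, outputFormat, sampleRate } = options;
  const inRange = (value, min, max) =>
    typeof value === 'number' && value >= min && value <= max;

  // Only options that are actually provided are checked.
  if (speed !== undefined && !inRange(speed, 0.5, 2.0)) {
    errors.push('speed must be a number from 0.5 to 2.0');
  }
  if (pitch !== undefined && !inRange(pitch, -10, 10)) {
    errors.push('pitch must be a number from -10 to +10 semitones');
  }
  if (emotion !== undefined && !EMOTIONS.includes(emotion)) {
    errors.push(`emotion must be one of: ${EMOTIONS.join(', ')}`);
  }
  if (outputFormat !== undefined && !OUTPUT_FORMATS.includes(outputFormat)) {
    errors.push(`outputFormat must be one of: ${OUTPUT_FORMATS.join(', ')}`);
  }
  if (sampleRate !== undefined && !SAMPLE_RATES.includes(sampleRate)) {
    errors.push(`sampleRate must be one of: ${SAMPLE_RATES.join(', ')}`);
  }
  return errors;
}

console.log(validateGenerateOptions({ speed: 3, emotion: 'excited' }));
```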

Supported Languages

Languages are grouped into English Variants, European Languages, Asian Languages, and Other Languages.

English Variants:
  • en-US - American English
  • en-GB - British English
  • en-AU - Australian English
  • en-CA - Canadian English
  • en-IN - Indian English
Language detection is automatic based on input text. Specify the language explicitly for best results with multilingual content.

Preset Voices

Don’t have a voice sample? Use our preset voices:

  • Professional Male: Deep, authoritative, news anchor style
  • Professional Female: Clear, confident, corporate presenter
  • Friendly Male: Warm, approachable, conversational
  • Friendly Female: Upbeat, energetic, engaging
  • Narrator: Storytelling, documentary style
  • Character Voices: Various character archetypes
Access preset voices:
const audio = await client.audio.generate({
  text: 'Your message here',
  voiceId: 'preset-professional-male',
  // or: preset-professional-female, preset-friendly-male, etc.
});

Combining Voice with Video

Create fully animated, voiced avatar videos:
// 1. Clone voice (one-time)
const voice = await client.voices.clone({
  audio: voiceSample,
  name: 'Character Voice'
});

// 2. Generate avatar video
const video = await client.videos.fromImage({
  imageId: 'avatar_123',
  durationSeconds: 8
});

// 3. Generate speech audio
const audio = await client.audio.generate({
  text: 'Welcome to our platform! I\'m your AI guide.',
  voiceId: voice.id
});

// 4. Sync audio with video (automatic lip-sync)
const finalVideo = await client.videos.addAudio({
  videoId: video.id,
  audioId: audio.id,
  enableLipSync: true
});

console.log(`Complete video: ${finalVideo.videoUrl}`);

Voice Management

List Your Voices

curl -X GET https://api.percify.io/v1/voices \
  -H "Authorization: Bearer $PERCIFY_API_KEY"

Update Voice Metadata

await client.voices.update('voice_abc123', {
  name: 'Updated Voice Name',
  description: 'Professional narrator voice for documentaries',
  tags: ['professional', 'narrator', 'documentary']
});

Delete Voice

curl -X DELETE https://api.percify.io/v1/voices/voice_abc123 \
  -H "Authorization: Bearer $PERCIFY_API_KEY"

Audio Quality Optimization

Output Format:
  • MP3: Best for web/streaming (smaller files)
  • WAV: Highest quality, uncompressed (large files)
  • OGG: Good compression, open format
Choose based on your use case and delivery method.

Sample Rate and Bitrate:
Higher values give better quality but larger files. Recommended settings:
  • Web/mobile: 44.1kHz, 128kbps MP3
  • Professional: 48kHz, 320kbps MP3 or WAV
  • Podcast: 44.1kHz, 192kbps MP3

Audio Processing:
Percify automatically applies:
  • Noise reduction
  • Volume normalization
  • De-essing (reduces harsh ‘s’ sounds)
  • Breath removal (optional)
Disable with applyProcessing: false for raw output.
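
One way to apply the recommended settings consistently is a small preset table merged into each request. This is a sketch only: the preset names and the `withQualityPreset` helper are hypothetical, while `outputFormat`, `sampleRate`, and `bitrate` are the option names used in the Advanced Options example.

```javascript
// Recommended settings from the list above, keyed by a hypothetical preset name.
const QUALITY_PRESETS = {
  web:          { outputFormat: 'mp3', sampleRate: 44100, bitrate: 128 },
  podcast:      { outputFormat: 'mp3', sampleRate: 44100, bitrate: 192 },
  professional: { outputFormat: 'mp3', sampleRate: 48000, bitrate: 320 }
};

// Returns a copy of the request with the preset's output settings applied.
function withQualityPreset(request, presetName) {
  if (!Object.prototype.hasOwnProperty.call(QUALITY_PRESETS, presetName)) {
    throw new Error(`Unknown quality preset: ${presetName}`);
  }
  return { ...request, ...QUALITY_PRESETS[presetName] };
}

const request = withQualityPreset(
  { text: 'Welcome to the show.', voiceId: 'voice_abc123' },
  'podcast'
);
console.log(request);
```

The professional preset uses 320kbps MP3 here; WAV is the other option the list recommends for that tier.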

Advanced Features

SSML Support

Use Speech Synthesis Markup Language for fine control:
<speak>
  <prosody rate="slow" pitch="+2st">
    Welcome to Percify!
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">Create amazing content</emphasis>
  with AI-powered tools.
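</speak>

Pass the markup as the text with textFormat: 'ssml' (here ssmlContent holds the SSML string above):
const audio = await client.audio.generate({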
  text: ssmlContent,
  voiceId: 'voice_abc123',
  textFormat: 'ssml'
});
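
When SSML is assembled from user-supplied text, characters such as `&` and `<` have to be escaped or the markup becomes invalid XML. The sketch below shows one possible way to build the `ssmlContent` string; `escapeXml` and `buildSsml` are hypothetical helpers, and only the `<break>` and `<emphasis>` elements from the example above are covered.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Each part is either { breakMs } for a pause or { text, emphasis? } for speech.
function buildSsml(parts) {
  const body = parts.map((part) => {
    if (part.breakMs !== undefined) {
      const ms = Number(part.breakMs);
      if (!Number.isFinite(ms) || ms < 0) {
        throw new RangeError('breakMs must be a non-negative number');
      }
      return `<break time="${ms}ms"/>`;
    }
    const text = escapeXml(part.text);
    return part.emphasis
      ? `<emphasis level="${escapeXml(part.emphasis)}">${text}</emphasis>`
      : text;
  });
  return `<speak>${body.join(' ')}</speak>`;
}

const ssmlContent = buildSsml([
  { text: 'Welcome to Percify!' },
  { breakMs: 500 },
  { text: 'Create amazing content', emphasis: 'strong' },
  { text: 'with AI-powered tools.' }
]);
console.log(ssmlContent);
```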

Phoneme-Level Control

Specify exact pronunciation:
{
  "text": "Welcome to Percify",
  "phonemes": {
    "Percify": "ˈpɜːrsɪfaɪ"
  }
}

Voice Mixing

Combine multiple voices in one audio:
const audio = await client.audio.generateMultiVoice({
  segments: [
    { text: 'Hello, I\'m Sarah.', voiceId: 'voice_sarah' },
    { text: 'And I\'m John.', voiceId: 'voice_john' },
    { text: 'Together we\'ll guide you.', voiceId: 'both' }
  ]
});

Use Cases

  • Content Creation: YouTube videos, podcasts, audiobooks
  • E-Learning: Online courses, training materials
  • Accessibility: Screen readers, audio descriptions
  • Gaming: Character dialogue, narration
  • Virtual Assistants: Chatbots, voice interfaces
  • Marketing: Ads, promotional content

Best Practices

Recording Tips:
  • Use a pop filter to reduce plosives (p, b sounds)
  • Record in a quiet, carpeted room
  • Maintain consistent distance from mic (6-12 inches)
  • Speak naturally, don’t over-enunciate
  • Do multiple takes and choose the best
Content Policy:
  • Only clone voices you have permission to use
  • Don’t impersonate others without consent
  • No deceptive or fraudulent use
  • Comply with local voice biometric laws
  • Clearly disclose AI-generated content when required

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| Robotic sound | Poor quality sample | Use longer, clearer audio sample |
| Wrong pronunciation | Text ambiguity | Use SSML or phonemes for clarity |
| Volume inconsistency | Unprocessed output | Enable applyProcessing: true |
| Generation fails | Unsupported language | Check language code, use preset if unavailable |
| Audio cuts off | Duration limit hit | Split into multiple requests, concatenate |

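
For the "Audio cuts off" case, splitting at sentence boundaries keeps each request short without breaking a sentence in half. The sketch below is illustrative: `splitIntoChunks` is a hypothetical helper, and the character budget is an assumption you would tune yourself, since the actual duration limit is not stated on this page.

```javascript
// Greedily pack whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept whole rather than cut mid-sentence.
function splitIntoChunks(text, maxChars) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.trim().split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === '' || candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const script = 'Welcome to the course. Today we cover voice cloning. Let us begin!';
console.log(splitIntoChunks(script, 40));
```

Each chunk can then be sent as its own generation request and the resulting audio files concatenated in order. Note that every request carries its own base cost.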

API Rate Limits

| Tier | Cloning | Generation | Concurrent |
|---|---|---|---|
| Free | 5/day | 100/day | 2 |
| Pro | 50/day | 1000/day | 5 |
| Enterprise | Unlimited | Unlimited | 20 |

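
To stay within the concurrent-request limit when generating many clips, requests can be run through a small worker pool. This is a generic sketch, not a Percify SDK feature: `runWithConcurrency` is a hypothetical helper, and each task is any function that returns a promise, such as a wrapper around a generation request.

```javascript
// Run async task functions with at most `limit` in flight at once.
// Results are returned in the same order as the tasks.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example with stand-in tasks; a limit of 2 matches the Free tier in the table above.
const demoTasks = ['intro', 'chapter-1', 'outro'].map((name) => async () => {
  await new Promise((resolve) => setTimeout(resolve, 10)); // placeholder for a real request
  return `${name}: done`;
});

runWithConcurrency(demoTasks, 2).then((results) => console.log(results));
```

If any task rejects, the returned promise rejects as well, so wrap individual tasks in try/catch if you want the remaining clips to continue.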

Support

Need help with voice cloning? Visit the FAQ or email support@percify.io.