Voice Cloning - Percify Docs

Overview

Percify’s Voice Cloning technology lets you create realistic voice replicas and generate natural-sounding speech in multiple languages. Use it to add personality to avatars, create audio content, or build voice-enabled applications.

Natural Speech

Human-like intonation and emotion

Multi-Language

Support for 30+ languages

Quick Cloning

Clone voices from 15s samples

Flexible Output

Control speed, pitch, and emotion

How Voice Cloning Works

Upload Voice Sample

Provide a clean audio recording (15 seconds minimum, 30-60 seconds recommended)

AI Analysis

Our models analyze vocal characteristics, tone, and patterns

Voice Profile Creation

Generate a unique voice ID you can reuse

Text-to-Speech Generation

Convert any text to speech using the cloned voice

Voice Cloning Requirements

Audio Sample Quality

Recording Guidelines

Optimal Conditions:

Clear, noise-free environment
Professional or high-quality microphone
Consistent volume level
No background music or effects
Sample rate: 44.1kHz or higher
Format: WAV, MP3, or FLAC

Duration Requirements

Minimum: 15 seconds (basic cloning)
Recommended: 30-60 seconds (better quality)
Maximum: 5 minutes (professional cloning)

Longer samples provide better voice fidelity and more natural intonation.
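The recording and duration guidelines above can be checked on the client before uploading. The following is a minimal sketch: the helper name `checkSample` and the shape of its input object are hypothetical, and Percify may apply its own validation on the server.

```javascript
// Limits taken from the guidelines above.
const MIN_SECONDS = 15;
const MAX_SECONDS = 5 * 60;
const RECOMMENDED_MIN_SECONDS = 30;
const MIN_SAMPLE_RATE_HZ = 44100;
const ALLOWED_FORMATS = ['wav', 'mp3', 'flac'];

// Hypothetical helper: `sample` is metadata you have already read from the file,
// e.g. { durationSeconds: 45.3, sampleRateHz: 48000, format: 'wav' }.
function checkSample(sample) {
  const errors = [];
  const warnings = [];

  if (!ALLOWED_FORMATS.includes(String(sample.format).toLowerCase())) {
    errors.push(`Unsupported format: ${sample.format}`);
  }
  if (sample.durationSeconds < MIN_SECONDS) {
    errors.push(`Sample is shorter than ${MIN_SECONDS} seconds`);
  } else if (sample.durationSeconds > MAX_SECONDS) {
    errors.push(`Sample is longer than ${MAX_SECONDS} seconds`);
  } else if (sample.durationSeconds < RECOMMENDED_MIN_SECONDS) {
    warnings.push(`Samples of ${RECOMMENDED_MIN_SECONDS}-60 seconds are recommended`);
  }
  // 44.1kHz is listed under optimal conditions, so treat a lower rate as a warning.
  if (sample.sampleRateHz < MIN_SAMPLE_RATE_HZ) {
    warnings.push('Sample rate is below the recommended 44.1kHz');
  }
  return { ok: errors.length === 0, errors, warnings };
}
```

A result with `ok: true` and a non-empty `warnings` list means the sample is within the stated limits but below the recommended quality.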

Content Guidelines

Read clear, varied sentences that include:

Different emotions (neutral, happy, serious)
Various pitch ranges
Natural pauses and breathing
Complete sentences with proper intonation

Example script: “Hello, I’m excited to try voice cloning with Percify. The technology is amazing and opens up so many creative possibilities. I can imagine using this for podcasts, videos, or even virtual assistants. Let’s see how well it captures my unique voice characteristics.”

Pricing

Voice Cloning (One-time per voice)

Cost: 5 credits per voice clone
Includes: Voice profile creation and storage
Reusable: Generate unlimited audio with the same voice ID

Audio Generation (Per use)

Base Cost: 5 credits
Duration Cost: +1 credit per second of generated audio
Formula: Total = 5 + (duration_seconds × 1)

Examples

Duration	Calculation	Total Credits
5 seconds	5 + (5 × 1)	10 credits
10 seconds	5 + (10 × 1)	15 credits
30 seconds	5 + (30 × 1)	35 credits
60 seconds	5 + (60 × 1)	65 credits
2 minutes	5 + (120 × 1)	125 credits

Cost Optimization: Clone a voice once (5 credits), then reuse it indefinitely. After that, you pay only the per-generation cost (5 credits base plus 1 credit per second of audio).
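The pricing formula above can be turned into a small estimator for budgeting. This is a sketch only: the function names are hypothetical, and the source does not say how fractional seconds are rounded, so no rounding is applied here.

```javascript
const CLONE_COST = 5;           // one-time, per voice
const GENERATION_BASE_COST = 5; // per generation request
const COST_PER_SECOND = 1;      // per second of generated audio

// Credits for a single generation request.
function generationCost(durationSeconds) {
  return GENERATION_BASE_COST + durationSeconds * COST_PER_SECOND;
}

// Credits for cloning `newVoices` voices and generating clips of the given lengths.
function projectCost(newVoices, clipDurationsSeconds) {
  const generation = clipDurationsSeconds.reduce(
    (sum, seconds) => sum + generationCost(seconds),
    0
  );
  return newVoices * CLONE_COST + generation;
}

console.log(generationCost(30));       // 35
console.log(projectCost(1, [10, 60])); // 5 + 15 + 65 = 85
```

Because each request carries the 5-credit base cost, one 60-second clip (65 credits) is cheaper than two 30-second clips (70 credits).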

Creating a Voice Clone

Via Dashboard

Navigate to Voice Studio

Click “Voice Cloning” from the main menu

Upload Audio Sample

Drag and drop or select your audio file (15s-5min)

Name Your Voice

Give your voice clone a memorable name

Set Language

Select primary language for the voice

Create Clone

Click “Clone Voice” and wait 30-60 seconds for processing

Via API

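// Requires Node.js 18+, which provides fetch, FormData, and Blob as globals
const fs = require('fs/promises');

async function cloneVoice() {
  // Read the sample into memory and attach it as a file part
  const sample = await fs.readFile('voice-sample.wav');
  const form = new FormData();
  form.append('audio', new Blob([sample], { type: 'audio/wav' }), 'voice-sample.wav');
  form.append('name', 'My Custom Voice');
  form.append('language', 'en-US');

  const response = await fetch('https://api.percify.io/v1/voices/clone', {
    method: 'POST',
    // fetch sets the multipart Content-Type (including the boundary) itself
    headers: {
      'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`
    },
    body: form
  });

  if (!response.ok) {
    throw new Error(`Voice cloning failed with HTTP ${response.status}`);
  }

  const voice = await response.json();
  console.log(`Voice ID: ${voice.id}`);
  return voice;
}

cloneVoice().catch(console.error);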

Response:

{
  "id": "voice_abc123",
  "name": "My Custom Voice",
  "language": "en-US",
  "status": "completed",
  "sampleDuration": 45.3,
  "creditCost": 5,
  "createdAt": "2025-11-25T05:45:00Z"
}

Generating Speech

Basic Text-to-Speech

const response = await fetch('https://api.percify.io/v1/audio/generate', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to Percify! Your AI-powered creative platform.',
    voiceId: 'voice_abc123',
    speed: 1.0,
    pitch: 0,
    outputFormat: 'mp3'
  })
});

const audio = await response.json();
console.log(`Audio URL: ${audio.audioUrl}`);

Advanced Options

// Assumes `client` is an initialized Percify SDK client (setup is not shown in this section)
const audio = await client.audio.generate({
  text: 'This is an exciting announcement!',
  voiceId: 'voice_abc123',
  
  // Voice modulation
  speed: 1.1,           // 0.5 to 2.0 (1.0 = normal)
  pitch: 2,             // -10 to +10 semitones
  
  // Emotion & style
  emotion: 'excited',   // neutral, happy, sad, excited, calm
  emphasis: ['exciting', 'announcement'], // Words to emphasize
  
  // Output options
  outputFormat: 'mp3',  // mp3, wav, ogg
  sampleRate: 44100,    // 22050, 44100, 48000
  bitrate: 192          // kbps for mp3
});
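The ranges in the comments above can be checked before a request is sent. The sketch below uses a hypothetical helper, `validateGenerateOptions`, built only from the ranges and values listed in this section; the API may accept other values or report errors differently.

```javascript
const EMOTIONS = ['neutral', 'happy', 'sad', 'excited', 'calm'];
const OUTPUT_FORMATS = ['mp3', 'wav', 'ogg'];
const SAMPLE_RATES = [22050, 44100, 48000];

// Hypothetical helper: returns a list of problems.
// An empty list means the options are within the documented ranges.
// Fields that are left undefined are skipped.
function validateGenerateOptions(options) {
  const problems = [];
  const { speed, pitch, emotion, outputFormat, sampleRate } = options;

  if (speed !== undefined && (speed < 0.5 || speed > 2.0)) {
    problems.push('speed must be between 0.5 and 2.0');
  }
  if (pitch !== undefined && (pitch < -10 || pitch > 10)) {
    problems.push('pitch must be between -10 and +10 semitones');
  }
  if (emotion !== undefined && !EMOTIONS.includes(emotion)) {
    problems.push(`emotion must be one of: ${EMOTIONS.join(', ')}`);
  }
  if (outputFormat !== undefined && !OUTPUT_FORMATS.includes(outputFormat)) {
    problems.push(`outputFormat must be one of: ${OUTPUT_FORMATS.join(', ')}`);
  }
  if (sampleRate !== undefined && !SAMPLE_RATES.includes(sampleRate)) {
    problems.push(`sampleRate must be one of: ${SAMPLE_RATES.join(', ')}`);
  }
  return problems;
}
```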

Supported Languages

Supported languages are grouped into English variants, European languages, Asian languages, and other languages.

English Variants

en-US - American English
en-GB - British English
en-AU - Australian English
en-CA - Canadian English
en-IN - Indian English

Language detection is automatic and based on the input text. Specify the language explicitly for best results with multilingual content.

Preset Voices

Don’t have a voice sample? Use our preset voices:

Professional Male

Deep, authoritative, news anchor style

Professional Female

Clear, confident, corporate presenter

Friendly Male

Warm, approachable, conversational

Friendly Female

Upbeat, energetic, engaging

Narrator

Storytelling, documentary style

Character Voices

Various character archetypes

Access preset voices:

const audio = await client.audio.generate({
  text: 'Your message here',
  voiceId: 'preset-professional-male',
  // or: preset-professional-female, preset-friendly-male, etc.
});

Combining Voice with Video

Create fully animated, voiced avatar videos:

// 1. Clone voice (one-time)
const voice = await client.voices.clone({
  audio: voiceSample,
  name: 'Character Voice'
});

// 2. Generate avatar video
const video = await client.videos.fromImage({
  imageId: 'avatar_123',
  durationSeconds: 8
});

// 3. Generate speech audio
const audio = await client.audio.generate({
  text: 'Welcome to our platform! I\'m your AI guide.',
  voiceId: voice.id
});

// 4. Sync audio with video (automatic lip-sync)
const finalVideo = await client.videos.addAudio({
  videoId: video.id,
  audioId: audio.id,
  enableLipSync: true
});

console.log(`Complete video: ${finalVideo.videoUrl}`);

Voice Management

List Your Voices

curl -X GET https://api.percify.io/v1/voices \
  -H "Authorization: Bearer $PERCIFY_API_KEY"

Update Voice Metadata

await client.voices.update('voice_abc123', {
  name: 'Updated Voice Name',
  description: 'Professional narrator voice for documentaries',
  tags: ['professional', 'narrator', 'documentary']
});

Delete Voice

curl -X DELETE https://api.percify.io/v1/voices/voice_abc123 \
  -H "Authorization: Bearer $PERCIFY_API_KEY"

Audio Quality Optimization

Output Format Selection

MP3: Best for web/streaming (smaller files)
WAV: Highest quality, uncompressed (large files)
OGG: Good compression, open format

Choose based on your use case and delivery method.

Sample Rate & Bitrate

Higher values = better quality but larger files.

Recommended settings:

Web/mobile: 44.1kHz, 128kbps MP3
Professional: 48kHz, 320kbps MP3 or WAV
Podcast: 44.1kHz, 192kbps MP3
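To illustrate the size trade-off, the recommended settings can be kept as presets and paired with a rough file-size estimate. This is a sketch under assumptions: the preset names are hypothetical, the option names follow the Advanced Options example, and the estimate assumes constant-bitrate MP3, which the source does not confirm.

```javascript
// Recommended settings from the list above, keyed by use case.
const QUALITY_PRESETS = {
  web: { outputFormat: 'mp3', sampleRate: 44100, bitrate: 128 },
  podcast: { outputFormat: 'mp3', sampleRate: 44100, bitrate: 192 },
  professional: { outputFormat: 'mp3', sampleRate: 48000, bitrate: 320 }
};

// Approximate size of a constant-bitrate MP3 in kilobytes:
// kilobits per second x seconds / 8 bits per byte.
function estimateMp3Kilobytes(bitrateKbps, durationSeconds) {
  return (bitrateKbps * durationSeconds) / 8;
}

console.log(estimateMp3Kilobytes(QUALITY_PRESETS.web.bitrate, 60));          // 960
console.log(estimateMp3Kilobytes(QUALITY_PRESETS.podcast.bitrate, 60));      // 1440
console.log(estimateMp3Kilobytes(QUALITY_PRESETS.professional.bitrate, 60)); // 2400
```

A preset can be spread into a generation request alongside `text` and `voiceId`.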

Post-Processing

Percify automatically applies:

Noise reduction
Volume normalization
De-essing (reduces harsh ‘s’ sounds)
Breath removal (optional)

Disable with applyProcessing: false for raw output.

Advanced Features

SSML Support

Use Speech Synthesis Markup Language for fine control:

<speak>
  <prosody rate="slow" pitch="+2st">
    Welcome to Percify!
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">Create amazing content</emphasis>
  with AI-powered tools.
</speak>

// ssmlContent is a string containing the SSML document shown above
const audio = await client.audio.generate({
  text: ssmlContent,
  voiceId: 'voice_abc123',
  textFormat: 'ssml'
});
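SSML is XML, so characters such as `&` and `<` in user-supplied text need escaping before they are placed inside `<speak>`. The sketch below shows one way to build the SSML string; `escapeXml` and `buildSsml` are hypothetical helpers, and how Percify's parser treats XML entities is an assumption.

```javascript
// Escape characters that have special meaning in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wraps plain text in <speak> with an optional trailing pause.
function buildSsml(text, pauseMs = 0) {
  const pause = pauseMs > 0 ? `<break time="${pauseMs}ms"/>` : '';
  return `<speak>${escapeXml(text)}${pause}</speak>`;
}

const ssmlContent = buildSsml('Tips & tricks for <new> users', 500);
console.log(ssmlContent);
// <speak>Tips &amp; tricks for &lt;new&gt; users<break time="500ms"/></speak>
```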

Phoneme-Level Control

Specify exact pronunciation:

{
  "text": "Welcome to Percify",
  "phonemes": {
    "Percify": "ˈpɜːrsɪfaɪ"
  }
}

Voice Mixing

Combine multiple voices in one audio:

const audio = await client.audio.generateMultiVoice({
  segments: [
    { text: 'Hello, I\'m Sarah.', voiceId: 'voice_sarah' },
    { text: 'And I\'m John.', voiceId: 'voice_john' },
    { text: 'Together we\'ll guide you.', voiceId: 'both' }
  ]
});

Use Cases

Content Creation

YouTube videos, podcasts, audiobooks

E-Learning

Online courses, training materials

Accessibility

Screen readers, audio descriptions

Gaming

Character dialogue, narration

Virtual Assistants

Chatbots, voice interfaces

Marketing

Ads, promotional content

Best Practices

Recording Tips:

Use a pop filter to reduce plosives (p, b sounds)
Record in a quiet, carpeted room
Maintain consistent distance from mic (6-12 inches)
Speak naturally; don’t over-enunciate
Record multiple takes and choose the best one

Content Policy:

Only clone voices you have permission to use
Don’t impersonate others without consent
No deceptive or fraudulent use
Comply with local voice biometric laws
Clearly disclose AI-generated content when required

Troubleshooting

Issue	Cause	Solution
Robotic sound	Poor quality sample	Use longer, clearer audio sample
Wrong pronunciation	Text ambiguity	Use SSML or phonemes for clarity
Volume inconsistency	Unprocessed output	Enable `applyProcessing: true`
Generation fails	Unsupported language	Check language code, use preset if unavailable
Audio cuts off	Duration limit hit	Split into multiple requests, concatenate
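For the "Audio cuts off" case, long text can be split at sentence boundaries before it is sent as multiple requests. The sketch below is one simple approach: `splitText` is a hypothetical helper, and `maxChars` is a value you choose yourself, since the source does not state the actual duration limit.

```javascript
// Hypothetical helper: split text into chunks of at most `maxChars` characters,
// breaking at sentence boundaries. A single sentence longer than `maxChars`
// is kept whole rather than cut mid-sentence.
function splitText(text, maxChars) {
  // A sentence is a run of text ending in ., !, or ? plus any trailing whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current && (current + sentence).trim().length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

console.log(splitText('One. Two. Three.', 10)); // [ 'One. Two.', 'Three.' ]
```

Each chunk can then be generated separately and the resulting audio files joined in order. Note that each request is billed its own base cost.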

API Rate Limits

Tier	Cloning	Generation	Concurrent
Free	5/day	100/day	2
Pro	50/day	1000/day	5
Enterprise	Unlimited	Unlimited	20
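To stay within the concurrent-request column of the table above, client code can cap how many requests are in flight at once. The sketch below is a generic promise pool, not part of any Percify SDK; `runWithLimit` and `CONCURRENT_LIMITS` are hypothetical names, and the daily quotas are not handled here.

```javascript
// Concurrency caps taken from the rate-limit table above.
const CONCURRENT_LIMITS = { free: 2, pro: 5, enterprise: 20 };

// Run async task functions with at most `limit` in flight at once.
// Results are returned in the same order as the input tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next task before awaiting
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each task is a function that returns a promise, for example `() => fetch(url, options)`, so that no request starts until a worker picks it up.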

Next Steps

Image to Video

Combine voice with animation

Audio API Reference

Complete API docs

Credits System

Voice pricing details

Support

Need help with voice cloning? Visit the FAQ or email support@percify.io.
