
Overview

The Audio API provides voice cloning and text-to-speech capabilities. Clone voices from audio samples and generate natural-sounding speech in 30+ languages. Perfect for voiceovers, audiobooks, virtual assistants, and character dialogue.

Base URL

https://api.percify.io/v1

Core Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /voices/clone | Clone voice from audio sample |
| GET | /voices | List your voice clones |
| GET | /voices/{voiceId} | Get voice clone details |
| DELETE | /voices/{voiceId} | Delete voice clone |
| POST | /audio/generate | Generate speech from text |
| GET | /audio/{audioId} | Get audio generation status |
| GET | /audio | List generated audio files |
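
As a rough sketch of how the endpoints listed above might be called from Node.js, the helper below builds the URL and options for a `fetch` call. `buildRequest` is a hypothetical name and not part of any official SDK; the Bearer header format follows the Quick Start examples in this document.

```javascript
// Hypothetical helper (not an official SDK function): builds fetch() arguments
// for one of the endpoints listed above.
const BASE_URL = 'https://api.percify.io/v1';

function buildRequest(method, path, apiKey, body) {
  const options = {
    method,
    headers: { Authorization: `Bearer ${apiKey}` }
  };
  if (body !== undefined) {
    // JSON endpoints only; multipart uploads such as /voices/clone need FormData instead
    options.headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(body);
  }
  return { url: `${BASE_URL}${path}`, options };
}

// Example: arguments for listing voice clones and for fetching one by ID
const listVoices = buildRequest('GET', '/voices', 'YOUR_API_KEY');
const getVoice = buildRequest('GET', `/voices/${encodeURIComponent('voice_abc123')}`, 'YOUR_API_KEY');
console.log(listVoices.url);
console.log(getVoice.url);
```

The result can be passed straight to `fetch(url, options)`. Keeping URL construction separate from the network call makes the helper easy to check without hitting the API.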

Voice Clone Object

{
  "id": "voice_abc123",
  "userId": "user_xyz789",
  "name": "My Custom Voice",
  "language": "en-US",
  "status": "completed",
  "sampleUrl": "https://cdn.percify.io/voices/voice_abc123_sample.mp3",
  "sampleDuration": 45.3,
  "quality": "high",
  "creditCost": 5,
  "metadata": {
    "pitch": "medium",
    "pace": "normal",
    "emotion": "neutral"
  },
  "createdAt": "2025-11-25T06:20:00Z"
}

Audio Generation Object

{
  "id": "audio_def456",
  "userId": "user_xyz789",
  "voiceId": "voice_abc123",
  "text": "Welcome to Percify! Your AI-powered creative platform.",
  "status": "completed",
  "audioUrl": "https://cdn.percify.io/audio/audio_def456.mp3",
  "durationSeconds": 5.2,
  "format": "mp3",
  "sampleRate": 44100,
  "bitrate": 192,
  "creditCost": 10,
  "language": "en-US",
  "metadata": {
    "speed": 1.0,
    "pitch": 0,
    "emotion": "neutral"
  },
  "createdAt": "2025-11-25T06:25:00Z",
  "completedAt": "2025-11-25T06:25:03Z"
}

Pricing

Voice Cloning

  • Cost: 5 credits per voice clone (one-time)
  • Reusable: Generate unlimited audio with same voice ID

Audio Generation

  • Base: 5 credits
  • Duration: +1 credit per second
  • Formula: Total = 5 + (duration_seconds × 1)

Examples

| Duration | Calculation | Total Credits |
| --- | --- | --- |
| 5s | 5 + (5 × 1) | 10 |
| 15s | 5 + (15 × 1) | 20 |
| 60s | 5 + (60 × 1) | 65 |
| 2min | 5 + (120 × 1) | 125 |
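
The pricing formula can be sketched as a small estimator. `estimateAudioCredits` is a hypothetical helper name. How fractional seconds are rounded is not stated in this document, so the sketch is only meaningful for whole-second durations.

```javascript
// Sketch of the documented formula: Total = 5 + (duration_seconds × 1).
// Rounding of fractional durations is not documented, so none is applied here.
const BASE_CREDITS = 5;
const CREDITS_PER_SECOND = 1;

function estimateAudioCredits(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative number');
  }
  return BASE_CREDITS + durationSeconds * CREDITS_PER_SECOND;
}

console.log(estimateAudioCredits(15));  // 20
console.log(estimateAudioCredits(120)); // 125
```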

Quick Start

Clone a Voice

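const fs = require('fs/promises');

// Requires Node.js 18+, which provides fetch, FormData, and Blob as globals.
async function cloneVoice() {
  const sample = await fs.readFile('voice-sample.wav');

  const form = new FormData();
  form.append('audio', new Blob([sample], { type: 'audio/wav' }), 'voice-sample.wav');
  form.append('name', 'Professional Narrator');
  form.append('language', 'en-US');

  const response = await fetch('https://api.percify.io/v1/voices/clone', {
    method: 'POST',
    // fetch sets the multipart Content-Type (including the boundary) from the FormData body
    headers: {
      'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`
    },
    body: form
  });

  const voice = await response.json();
  console.log(`Voice ID: ${voice.id}`);
}

// CommonJS modules do not support top-level await, so the call runs inside an async function
cloneVoice().catch(console.error);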

Generate Speech

const response = await fetch('https://api.percify.io/v1/audio/generate', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.PERCIFY_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Welcome to Percify! Your AI-powered creative platform.',
    voiceId: 'voice_abc123',
    speed: 1.0,
    outputFormat: 'mp3'
  })
});

const audio = await response.json();
console.log(`Audio URL: ${audio.audioUrl}`);
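
Because the Audio Generation Object carries a `status` field and the API exposes GET /audio/{audioId} for checking it, generation may not always be finished when the first response arrives. A minimal polling sketch follows. `waitForAudio`, its options, and the `"failed"` status are assumptions for illustration; only `"completed"` appears in this document.

```javascript
// Sketch only: polls GET /audio/{audioId} until status is "completed".
// Assumptions: the endpoint returns the Audio Generation Object, and "failed"
// is a hypothetical terminal status not confirmed by this document.
async function waitForAudio(audioId, options = {}) {
  const { apiKey, fetchImpl = fetch, intervalMs = 2000, maxAttempts = 30 } = options;
  const url = `https://api.percify.io/v1/audio/${encodeURIComponent(audioId)}`;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiKey}` }
    });
    if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);

    const audio = await res.json();
    if (audio.status === 'completed') return audio;
    if (audio.status === 'failed') throw new Error(`Generation failed for ${audioId}`);

    // Wait before the next status check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for ${audioId}`);
}
```

A caller would use it as `const audio = await waitForAudio('audio_def456', { apiKey: process.env.PERCIFY_API_KEY });`. The `fetchImpl` parameter exists so the loop can be exercised with a stub instead of the network.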

Supported Languages

Languages are grouped as English, European, Asian, and Other. English variants:
  • en-US - American English
  • en-GB - British English
  • en-AU - Australian English
  • en-CA - Canadian English
  • en-IN - Indian English

Preset Voices

Use built-in voices without cloning:
| Voice ID | Description | Languages |
| --- | --- | --- |
| preset-professional-male | Deep, authoritative | All supported |
| preset-professional-female | Clear, confident | All supported |
| preset-friendly-male | Warm, approachable | All supported |
| preset-friendly-female | Upbeat, energetic | All supported |
| preset-narrator | Storytelling style | All supported |
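
A short sketch of building a generation request for a preset voice. It assumes a preset ID is passed in the same `voiceId` field as a cloned voice ID, which this document implies but does not state outright; `presetRequestBody` is a hypothetical helper name.

```javascript
// Preset IDs copied from the list above.
const PRESET_VOICES = [
  'preset-professional-male',
  'preset-professional-female',
  'preset-friendly-male',
  'preset-friendly-female',
  'preset-narrator'
];

// Assumption: presets go in the same `voiceId` field used for cloned voices.
function presetRequestBody(voiceId, text) {
  if (!PRESET_VOICES.includes(voiceId)) {
    throw new Error(`Unknown preset voice: ${voiceId}`);
  }
  return { text, voiceId };
}

console.log(JSON.stringify(presetRequestBody('preset-narrator', 'Once upon a time.')));
```

The resulting object can be serialized as the JSON body of POST /audio/generate.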

Voice Modulation

Control speech characteristics:
// `client` is assumed to be an initialized Percify SDK client; its setup is not shown in this section.
const audio = await client.audio.generate({
  text: 'This is an exciting announcement!',
  voiceId: 'voice_abc123',

  // Speed control
  speed: 1.1,              // 0.5 to 2.0 (1.0 = normal)

  // Pitch control
  pitch: 2,                // -10 to +10 semitones

  // Emotion
  emotion: 'excited',      // neutral, happy, sad, excited, calm

  // Emphasis: words in the text to stress
  emphasis: ['exciting', 'announcement'],

  // Output format
  outputFormat: 'mp3',     // mp3, wav, ogg
  sampleRate: 44100,       // 22050, 44100, 48000 (Hz)
  bitrate: 192             // kbps for mp3
});
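
The ranges in the comments above can be checked client-side before a request is sent. The following is a minimal sketch; `validateModulation` is a hypothetical helper, and whether the API itself rejects out-of-range values is not stated here.

```javascript
// Allowed values taken from the comments in the example above.
const EMOTIONS = ['neutral', 'happy', 'sad', 'excited', 'calm'];
const FORMATS = ['mp3', 'wav', 'ogg'];
const SAMPLE_RATES = [22050, 44100, 48000];

// Returns a list of problems; an empty list means every supplied option is in range.
// Only options that are present are checked, so no defaults are assumed.
function validateModulation(opts) {
  const errors = [];
  if (opts.speed !== undefined && !(opts.speed >= 0.5 && opts.speed <= 2.0)) {
    errors.push('speed must be between 0.5 and 2.0');
  }
  if (opts.pitch !== undefined && !(opts.pitch >= -10 && opts.pitch <= 10)) {
    errors.push('pitch must be between -10 and +10 semitones');
  }
  if (opts.emotion !== undefined && !EMOTIONS.includes(opts.emotion)) {
    errors.push(`emotion must be one of: ${EMOTIONS.join(', ')}`);
  }
  if (opts.outputFormat !== undefined && !FORMATS.includes(opts.outputFormat)) {
    errors.push(`outputFormat must be one of: ${FORMATS.join(', ')}`);
  }
  if (opts.sampleRate !== undefined && !SAMPLE_RATES.includes(opts.sampleRate)) {
    errors.push(`sampleRate must be one of: ${SAMPLE_RATES.join(', ')}`);
  }
  return errors;
}

console.log(validateModulation({ speed: 3, emotion: 'angry' }));
```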

Audio Quality Options

Output Formats

| Format | Codec | Use Case | File Size |
| --- | --- | --- | --- |
| MP3 | MPEG Audio Layer 3 | Web, streaming, general use | Small |
| WAV | Uncompressed PCM | Professional editing, highest quality | Large |
| OGG | Ogg Vorbis | Open format, web embedding | Medium |

Sample Rates

  • 22050 Hz: Voice-only, minimal quality
  • 44100 Hz: CD quality, recommended for most uses
  • 48000 Hz: Professional audio, broadcast quality
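
To make the relative file sizes concrete, here is a back-of-the-envelope estimate. It rests on assumptions this document does not confirm: constant-bitrate MP3 output and 16-bit mono PCM for WAV. Both function names are hypothetical.

```javascript
// Assumptions (not stated in this document): MP3 output is constant bitrate,
// and WAV output is 16-bit mono PCM. The small WAV header is ignored.
function estimateMp3Bytes(bitrateKbps, durationSeconds) {
  // kbps -> bytes per second, then multiply by duration
  return Math.round((bitrateKbps * 1000 / 8) * durationSeconds);
}

function estimateWavBytes(sampleRate, durationSeconds, channels = 1, bitsPerSample = 16) {
  // samples per second × channels × bytes per sample × duration
  return Math.round(sampleRate * channels * (bitsPerSample / 8) * durationSeconds);
}

console.log(estimateMp3Bytes(192, 60));   // 1440000
console.log(estimateWavBytes(44100, 60)); // 5292000
```

Under these assumptions, one minute of 44.1 kHz WAV is roughly 3.7 times the size of the same minute as 192 kbps MP3. OGG is left out because Vorbis is typically variable bitrate.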

SSML Support

Use Speech Synthesis Markup Language for fine control:
<speak>
  <prosody rate="slow" pitch="+2st">
    Welcome to Percify!
  </prosody>
  <break time="500ms"/>
  <emphasis level="strong">Create amazing content</emphasis>
  with our AI-powered tools.
  <break time="300ms"/>
  <prosody rate="fast">Let's get started!</prosody>
</speak>
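
Pass the SSML document as a string in text and set textFormat to 'ssml':
// ssmlContent holds the <speak> document above as a string
const audio = await client.audio.generate({
  text: ssmlContent,
  voiceId: 'voice_abc123',
  textFormat: 'ssml'
});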
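
When SSML is assembled from user-supplied text, characters such as `&` and `<` must be escaped or the markup becomes invalid XML. A minimal sketch with hypothetical helper names:

```javascript
// Escapes the five XML special characters so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: two phrases separated by a pause, using the
// <speak> and <break> elements shown in the example above.
function speakWithPause(first, second, pauseMs) {
  return `<speak>${escapeXml(first)}<break time="${pauseMs}ms"/>${escapeXml(second)}</speak>`;
}

console.log(speakWithPause('Q&A session', 'Starting now', 500));
// <speak>Q&amp;A session<break time="500ms"/>Starting now</speak>
```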

Error Responses

{
  "error": {
    "code": "invalid_audio_sample",
    "message": "Audio sample too short. Minimum 15 seconds required.",
    "details": {
      "duration": 8.3,
      "minimum": 15,
      "recommended": 30
    }
  }
}
Common error codes:
  • invalid_audio_sample - Poor quality or too short
  • unsupported_language - Language not available
  • text_too_long - Exceeds maximum length (10,000 chars)
  • voice_not_found - Voice ID doesn’t exist
  • insufficient_credits - Not enough credits
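
One way to turn these codes into actionable messages is sketched below. It assumes every error response uses the `{ error: { code, message, details } }` shape shown above; `describeApiError` is a hypothetical helper, and the `details` fields are only known for the sample case.

```javascript
// Assumption: all error responses share the shape of the sample above.
function describeApiError(payload) {
  const { code, message, details = {} } = payload.error;
  switch (code) {
    case 'invalid_audio_sample':
      // `details` follows the sample response; it may be absent for quality-related failures
      return details.minimum !== undefined
        ? `Sample rejected: got ${details.duration}s, need at least ${details.minimum}s.`
        : `Sample rejected: ${message}`;
    case 'text_too_long':
      return 'Text exceeds 10,000 characters; split it into smaller requests.';
    case 'insufficient_credits':
      return 'Not enough credits for this request.';
    case 'voice_not_found':
    case 'unsupported_language':
      return `Check the request parameters: ${message}`;
    default:
      return `Unexpected error (${code}): ${message}`;
  }
}

const sample = {
  error: {
    code: 'invalid_audio_sample',
    message: 'Audio sample too short. Minimum 15 seconds required.',
    details: { duration: 8.3, minimum: 15, recommended: 30 }
  }
};
console.log(describeApiError(sample));
// Sample rejected: got 8.3s, need at least 15s.
```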

Rate Limits

| Tier | Voice Cloning | Audio Generation | Concurrent |
| --- | --- | --- | --- |
| Free | 5/day | 100/day | 2 |
| Pro | 50/day | 1000/day | 5 |
| Enterprise | Unlimited | Unlimited | 20 |
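
Assuming the Concurrent column caps the number of requests in flight at once (an interpretation, since the column is not explained further), a client can stay under it with a small worker pool. `runWithLimit` is a hypothetical helper.

```javascript
// Runs async task functions with at most `limit` in flight at a time.
// Results are returned in the same order as the input tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    // Each worker repeatedly claims the next unstarted task
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

For the Free tier this might be called as `runWithLimit(requests, 2)`, where each entry in `requests` is a function that starts one API call and returns its promise. If any task rejects, the returned promise rejects with that error.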

Best Practices

For best voice cloning results:
  • Record in a clean, noise-free environment
  • Provide 15-60 seconds of clear speech
  • Use a professional microphone if possible
  • Sample rate: 44.1kHz or higher
  • Format: WAV, MP3, or FLAC

For best speech generation results:
  • Keep sentences natural and conversational
  • Use punctuation for proper pauses
  • Break long text into smaller chunks
  • Specify pronunciation for technical terms
  • Test with short samples first

For responsible use:
  • Only clone voices you have permission to use
  • Don’t impersonate anyone without their consent
  • Clearly disclose AI-generated content
  • Comply with local voice biometric laws
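
The advice to break long text into smaller chunks can be sketched as a sentence-aware splitter. The 10,000-character default comes from the `text_too_long` error described earlier; `chunkText` is a hypothetical helper, and the sentence detection is deliberately simple.

```javascript
// Splits text into chunks of at most `maxChars`, breaking only between sentences.
// Limitation: a single sentence longer than `maxChars` is kept whole and
// would still need manual splitting.
function chunkText(text, maxChars = 10000) {
  // Split after ., ! or ? when followed by whitespace
  const sentences = text.trim().split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && candidate.length > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText('One two. Three four. Five six.', 20));
// [ 'One two. Three four.', 'Five six.' ]
```

Each chunk can then be sent as its own generation request. Since every request carries the 5-credit base cost, many small chunks cost more than a few large ones.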

Webhooks

Subscribe to audio completion events. An audio.completed event delivers a payload like this:
{
  "event": "audio.completed",
  "data": {
    "audioId": "audio_def456",
    "voiceId": "voice_abc123",
    "audioUrl": "https://cdn.percify.io/audio/audio_def456.mp3",
    "durationSeconds": 5.2,
    "creditCost": 10
  },
  "timestamp": "2025-11-25T06:25:03Z"
}
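
A minimal sketch of processing this payload on the receiving side. `handleWebhook` is a hypothetical function that would sit behind whatever HTTP endpoint receives the event. No signature verification is shown because this document does not describe one.

```javascript
// Parses a raw webhook body and extracts the fields of an audio.completed event.
// Other event types are reported as unhandled rather than treated as errors.
function handleWebhook(rawBody) {
  const payload = JSON.parse(rawBody);
  if (payload.event !== 'audio.completed') {
    return { handled: false, event: payload.event };
  }
  const { audioId, voiceId, audioUrl, durationSeconds, creditCost } = payload.data;
  return { handled: true, audioId, voiceId, audioUrl, durationSeconds, creditCost };
}

const example = JSON.stringify({
  event: 'audio.completed',
  data: {
    audioId: 'audio_def456',
    voiceId: 'voice_abc123',
    audioUrl: 'https://cdn.percify.io/audio/audio_def456.mp3',
    durationSeconds: 5.2,
    creditCost: 10
  },
  timestamp: '2025-11-25T06:25:03Z'
});
console.log(handleWebhook(example).audioUrl);
// https://cdn.percify.io/audio/audio_def456.mp3
```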
