Start Translation

Create a new translation project by uploading a video or audio file. The translation process runs asynchronously in the background. Use the status endpoint to track progress and retrieve results.

Concurrency Limit

You can run up to 10 translations at the same time per account. If 10 translations are already in progress, new requests return CONCURRENT_TRANSLATION_LIMIT_REACHED (HTTP 429).
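One way to cope with the concurrency limit is to wait and retry when the API answers with HTTP 429. The sketch below is a minimal illustration, not part of the VoiceCheap API: is_retryable and backoff_delays are hypothetical helper names, and the delay values are arbitrary. It assumes your client has already extracted the error code string from the response, since this page does not document the error body layout.

```python
# Error codes from the Errors table that signal a temporary limit.
RETRYABLE_CODES = {"CONCURRENT_TRANSLATION_LIMIT_REACHED", "RATE_LIMIT_EXCEEDED"}


def is_retryable(status: int, error_code: str) -> bool:
    """Return True for 429 responses whose code signals a temporary limit."""
    return status == 429 and error_code in RETRYABLE_CODES


def backoff_delays(base_seconds: float = 30.0, factor: float = 2.0,
                   max_seconds: float = 600.0, attempts: int = 5) -> list[float]:
    """Exponentially growing wait times in seconds, capped at max_seconds."""
    return [min(base_seconds * factor ** i, max_seconds) for i in range(attempts)]
```

A caller could sleep for each delay in turn and resubmit the request until it succeeds or the list is exhausted.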

Request

This endpoint accepts multipart/form-data with a file upload.

Headers

x-api-key

string

required

Your VoiceCheap API key. Get one from app.voicecheap.ai/page-api.

Body Parameters

file

required

The video or audio file to translate.

Supported video formats: video/mp4, video/quicktime, video/x-matroska, video/webm, video/mpeg

Supported audio formats: audio/mpeg, audio/wav, audio/mp4, audio/x-m4a, audio/flac, audio/ogg, audio/aac

Maximum file size: 20 GB
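Checking the file locally before uploading can save a failed multi-gigabyte transfer. The following is a sketch only: check_upload is a hypothetical helper, the caller is expected to supply the MIME type, and "20 GB" is read as decimal gigabytes, which this page does not specify.

```python
# MIME types listed in this reference for the file parameter.
SUPPORTED_MIME_TYPES = {
    "video/mp4", "video/quicktime", "video/x-matroska", "video/webm", "video/mpeg",
    "audio/mpeg", "audio/wav", "audio/mp4", "audio/x-m4a", "audio/flac",
    "audio/ogg", "audio/aac",
}
MAX_FILE_BYTES = 20 * 1000**3  # assumption: "20 GB" means decimal gigabytes


def check_upload(mime_type: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if mime_type not in SUPPORTED_MIME_TYPES:
        problems.append(f"unsupported MIME type: {mime_type}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 20 GB")
    return problems
```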

targetLanguage

string

required

The language to translate the content into. Must be lowercase.

Allowed values: american english, arabic, brazilian portuguese, british english, bulgarian, canadian french, chinese, croatian, czech, danish, dutch, finnish, french, german, greek, hindi, hungarian, indonesian, italian, japanese, korean, malay, mandarin, norwegian, polish, portuguese, romanian, russian, slovak, spanish, swedish, tagalog, tamil, turkish, ukrainian, vietnamese
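Because the value must be lowercase and match one of the allowed names exactly, it can help to normalize user input before sending it. This is a sketch under that assumption; normalize_target_language is a hypothetical helper, and the set simply copies the allowed values listed above.

```python
ALLOWED_TARGET_LANGUAGES = frozenset({
    "american english", "arabic", "brazilian portuguese", "british english",
    "bulgarian", "canadian french", "chinese", "croatian", "czech", "danish",
    "dutch", "finnish", "french", "german", "greek", "hindi", "hungarian",
    "indonesian", "italian", "japanese", "korean", "malay", "mandarin",
    "norwegian", "polish", "portuguese", "romanian", "russian", "slovak",
    "spanish", "swedish", "tagalog", "tamil", "turkish", "ukrainian",
    "vietnamese",
})


def normalize_target_language(value: str) -> str:
    """Lowercase the value, collapse whitespace, and check it is allowed."""
    normalized = " ".join(value.lower().split())
    if normalized not in ALLOWED_TARGET_LANGUAGES:
        raise ValueError(f"unsupported targetLanguage: {value!r}")
    return normalized
```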

originalLanguage

string

The source language of the content using ISO language codes (e.g., en, es, fr, de, ja, zh).

Strongly recommended: Leave this empty for auto-detection.

Only provide this parameter if you are 100% certain the language code is correct and in valid ISO format. Incorrect language codes will cause transcription failures. Our auto-detection supports 80+ languages and is highly accurate.

Default: auto-detect

projectName

string

A custom name for the project. Useful for identifying projects in your dashboard.

Default: The project ID will be used if not provided.

keepBackgroundMusic

boolean

Whether to preserve background audio in the output.

When enabled, keeps background music, ambience, laughs, claps, and crowd sounds while removing only the original voice (stem separation). Turn off if your source has no background audio.

Default: true

voiceIsolatorOption

string

Voice isolation mode when keepBackgroundMusic is enabled. Controls the quality and characteristics of voice separation.

Studio (Recommended)

Our default voice processing, designed for professional audio quality:

Removes echoes and reverberations
Cleans technical imperfections
Produces a clear and crisp voice

Recommended for: Most projects where audio quality is paramount. Ideal for tutorials, educational content, marketing videos, and any content requiring optimal voice clarity.

Realistic

Preserves the natural characteristics of the recording environment:

Maintains a sound closer to the original recording
Preserves environmental characteristics

Recommended for: Content where the authenticity of the environment is important, such as outdoor vlogs, documentaries, or content where the sound ambiance is an integral part of the experience.

This option may create artifacts or unexpected effects in some cases due to the preservation of background elements.

Allowed values: studio, realistic

Default: studio

keepOriginalVoice

boolean

Whether to keep the original background voice (vocal track) underneath the dub.

When enabled, the original voice is mixed in the background at a reduced level. Use originalVoiceVolume to control the target level.

Default: false

originalVoiceVolume

number

The target level, in dB, for the original background voice bed when keepOriginalVoice is enabled.

Range: 10 - 70 (higher = louder original voice)

Default: 30

subtitles

boolean

Whether to generate subtitles for the translated video.

When enabled, adds clean Netflix-style black and white subtitles. Use subtitlesSource to choose original (source language) or translated (target language) text. Subtitles are automatically synced for optimal readability.

Note: Burned-in subtitles require FFmpeg with the subtitles filter (libass). If unavailable, the API falls back to embedding a subtitle track instead of hard-burned styling.

Default: false

subtitlesSource

string

Choose the subtitle text source when subtitles is enabled.

Allowed values: translated, original

Default: translated

Note: If original is selected but the original transcription is unavailable, subtitles fall back to translated.

translationTimeSkips

array

Optional list of time ranges (in seconds) that should not be translated. The original audio (voices + background) is kept for these ranges.

translationTimeSkips[].startTime

number

required

Start timestamp in seconds.

translationTimeSkips[].endTime

number

required

End timestamp in seconds (must be greater than startTime).

translationTimeSkips[].duration

number

required

Duration in seconds (endTime - startTime).

Rules:

Ranges must be within the media duration and cannot overlap.
Ranges cannot include any existing transcription segments.
Requires keepBackgroundMusic=true and keepOriginalVoice=false.

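Form-data: Send as a JSON string (e.g., -F 'translationTimeSkips=[{"startTime":12.5,"endTime":18.0,"duration":5.5}]'). Inside single quotes the shell passes characters through literally, so the JSON double quotes need no backslash escaping.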
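Computing duration by hand is error-prone, so the JSON string can be generated from start and end times. The sketch below assumes a hypothetical helper build_time_skips and checks only the rules a client can verify locally (bounds, ordering, overlap). It cannot check the transcription-segment rule, and it treats ranges that merely touch as non-overlapping, which is an assumption.

```python
import json


def build_time_skips(ranges, media_duration: float) -> str:
    """Turn (start, end) pairs in seconds into the JSON string for form-data."""
    items = []
    previous_end = 0.0
    for start, end in sorted(ranges):
        if start < 0 or end > media_duration:
            raise ValueError(f"range {start}-{end} is outside the media duration")
        if end <= start:
            raise ValueError(f"endTime must be greater than startTime: {start}-{end}")
        if items and start < previous_end:
            raise ValueError(f"range {start}-{end} overlaps the previous range")
        # duration is derived, so it always equals endTime - startTime
        items.append({"startTime": start, "endTime": end,
                      "duration": round(end - start, 3)})
        previous_end = end
    return json.dumps(items)
```

The returned string is what would be sent as the translationTimeSkips form field.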

lipsyncPro

boolean

Trigger lip-sync processing after translation completes.

false = Standard lip-sync (4 minutes of credits per 1 minute of video)
true = Professional lip-sync (9 minutes of credits per 1 minute of video)
Max duration: 30 minutes per video.
Latency: Lip-sync processing typically adds 2x-4x the original video duration.

Lip-sync completion and failure emails are not sent for API-triggered requests. Use the status endpoint to track progress.

Default: not enabled (omit the field to skip lip-sync)

Form-data: Send boolean values as true or false strings (e.g., -F "lipsyncPro=false").
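The credit rates and latency range above translate into simple arithmetic. The sketch below uses hypothetical helper names and assumes credits scale proportionally with duration. This page does not say whether billing rounds up to whole minutes, so treat the result as an estimate.

```python
MAX_LIPSYNC_SECONDS = 30 * 60  # 30-minute limit per video


def lipsync_credit_minutes(video_seconds: float, pro: bool) -> float:
    """Estimate credit minutes: 4 per video minute (standard) or 9 (professional)."""
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    if video_seconds > MAX_LIPSYNC_SECONDS:
        raise ValueError("lip-sync supports at most 30 minutes per video")
    rate = 9 if pro else 4
    return video_seconds / 60 * rate


def lipsync_latency_seconds(video_seconds: float) -> tuple[float, float]:
    """Typical added processing time: between 2x and 4x the video duration."""
    return (2 * video_seconds, 4 * video_seconds)
```

For a 2-minute video, this estimates 8 credit minutes for standard lip-sync and 18 for professional, with roughly 4 to 8 minutes of added processing time.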

voiceCloningSettings

object

Fine-tune voice cloning parameters for advanced control over the generated voice. Pass as a JSON string when using form-data. All values must be between 0 and 1 (with step of 0.01).

stability

number

Voice Stability (0.00 - 1.00)

Determines how stable the voice is and the randomness between each generation.

Lower values allow a wider emotional range but may cause odd or rushed speech
Higher values produce more consistent output but may sound monotone

Default: 0.60

Tip: To avoid accent reproduction, use 0.80

similarity

number

Voice Similarity (0.00 - 1.00)

Controls how closely the AI adheres to the original voice.

Higher values make the cloned voice more similar to the original
If the original audio is noisy and similarity is too high, artifacts or background noise may carry into the generated voice

Default: 0.85

Tip: To avoid accent reproduction, use 0.20

speakerBoost

number

Speaker Boost (0.00 - 1.00)

Boosts similarity to the original speaker. This is a subtle enhancement that increases resemblance to the source voice.

Note: Higher values increase compute time and latency.

Default: 0.60

Tip: To avoid accent reproduction, use 0.00

Default values (balanced):

{
  "stability": 0.6,
  "similarity": 0.85,
  "speakerBoost": 0.6
}

Recommended for avoiding accent reproduction:

{
  "stability": 0.8,
  "similarity": 0.2,
  "speakerBoost": 0.0
}
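Since voiceCloningSettings travels as a JSON string in form-data, a small builder can enforce the 0 to 1 range and the 0.01 step before serializing. This is a sketch: voice_cloning_form_value is a hypothetical helper, and filling unspecified fields with the documented defaults is an assumption about what you want to send, not a documented requirement.

```python
import json

# Default values documented above.
DEFAULT_VOICE_SETTINGS = {"stability": 0.60, "similarity": 0.85, "speakerBoost": 0.60}


def voice_cloning_form_value(**overrides: float) -> str:
    """Merge overrides into the defaults, validate, and return a JSON string."""
    unknown = set(overrides) - set(DEFAULT_VOICE_SETTINGS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    merged = {**DEFAULT_VOICE_SETTINGS, **overrides}
    settings = {}
    for name, value in merged.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        settings[name] = round(value, 2)  # snap to the 0.01 step
    return json.dumps(settings)
```

Calling voice_cloning_form_value(stability=0.8, similarity=0.2, speakerBoost=0.0) produces the accent-avoidance preset shown above as a string ready for the form field.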

Response

success

boolean

required

Always true for successful requests

message

string

required

A human-readable message describing the result

projectId

string

required

The unique identifier for the created translation project. Use this ID to check status.

estimatedDuration

number

required

The detected duration of the uploaded file in seconds

Examples

originalVoiceVolume expects a numeric dB level (10-70). Higher values keep more of the original background voice. subtitlesSource accepts translated or original to control which text is used for subtitles.

curl -X POST https://api.voicecheap.ai/v1/translate \
  -H "x-api-key: vc_your-api-key" \
  -F "file=@video.mp4" \
  -F "targetLanguage=spanish" \
  -F "projectName=My Spanish Translation" \
  -F "keepBackgroundMusic=true" \
  -F "voiceIsolatorOption=studio" \
  -F "keepOriginalVoice=true" \
  -F "originalVoiceVolume=40" \
  -F "subtitles=true" \
  -F "subtitlesSource=translated" \
  -F "lipsyncPro=false"
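When building the same request from code, booleans must become the strings true or false and structured values must become JSON strings. The sketch below shows one way to do that conversion. to_form_fields is a hypothetical helper, and it covers only the text fields; the file part would be attached separately by whichever HTTP client you use.

```python
import json


def to_form_fields(params: dict) -> dict[str, str]:
    """Convert Python values into the string form expected in multipart fields."""
    fields = {}
    for name, value in params.items():
        if value is None:
            continue  # omit unset optional parameters entirely
        if isinstance(value, bool):  # test bool first: bool is a subclass of int
            fields[name] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            fields[name] = json.dumps(value)  # objects and arrays go as JSON strings
        else:
            fields[name] = str(value)
    return fields


fields = to_form_fields({
    "targetLanguage": "spanish",
    "keepOriginalVoice": True,
    "originalVoiceVolume": 40,
    "lipsyncPro": False,
    "originalLanguage": None,  # left out so the API auto-detects
})
```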

Response Example

{
  "success": true,
  "message": "Translation started successfully",
  "projectId": "abc123-def456-ghi789",
  "estimatedDuration": 125.5
}

Errors

Status	Code	Description
400	`FILE_REQUIRED`	No file was uploaded with the request
400	`INVALID_FILE_TYPE`	The uploaded file type is not supported
400	`DURATION_DETECTION_FAILED`	Could not detect the duration of the uploaded file
400	`INVALID_BOOLEAN_VALUE`	A boolean parameter has an invalid value (use “true” or “false”)
400	`INVALID_JSON_FORMAT`	The voiceCloningSettings JSON is malformed
401	`INVALID_API_KEY`	The provided API key is invalid
403	`SUBSCRIPTION_REQUIRED`	API access requires a paid subscription
403	`INSUFFICIENT_CREDITS`	Not enough credits to process this file
429	`RATE_LIMIT_EXCEEDED`	Too many requests (limit: 10 requests per minute)
429	`CONCURRENT_TRANSLATION_LIMIT_REACHED`	Too many translations in progress (limit: 10 concurrent translations)
