Create a new translation project by uploading a video or audio file. The translation process runs asynchronously in the background. Use the status endpoint to track progress and retrieve results.
You can run up to 10 translations at the same time per account. Once that limit is reached, new requests return CONCURRENT_TRANSLATION_LIMIT_REACHED (HTTP 429) until a running translation finishes.
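A client can treat the HTTP 429 response as a signal to wait and resubmit. The following is a minimal sketch of that idea, not the API's prescribed retry policy: `submit` is a hypothetical callable that sends the create request and returns the HTTP status with the parsed response body, and the attempt count and delays are illustrative assumptions.

```python
import time
from typing import Callable, Tuple


def submit_with_retry(
    submit: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Resubmit while the server answers HTTP 429, doubling the wait each time.

    `submit` is a hypothetical helper (not part of the documented API) that
    performs the upload and returns (http_status, parsed_body).
    """
    status, body = submit()
    for attempt in range(1, max_attempts):
        if status != 429:
            break  # accepted, or failed for a reason that retrying will not fix
        sleep(base_delay * 2 ** (attempt - 1))  # 30 s, 60 s, 120 s, ...
        status, body = submit()
    return status, body
```

Because translations run for a while, the delays here are deliberately long; the caller still has to handle a final 429 if every attempt is rejected.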
The video or audio file to translate.
Supported video formats: video/mp4, video/quicktime, video/x-matroska, video/webm, video/mpeg
Supported audio formats: audio/mpeg, audio/wav, audio/mp4, audio/x-m4a, audio/flac, audio/ogg, audio/aac
Maximum file size: 20 GB
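Checking the MIME type and size locally avoids starting a large upload that would be rejected. The sketch below assumes the caller already knows the file's MIME type and size; the function name is illustrative, and reading "20 GB" as 20 × 1000³ bytes is a conservative assumption, since the source does not say whether the limit is decimal or binary.

```python
from typing import List

# MIME types listed as supported for the uploaded file.
SUPPORTED_MIME_TYPES = {
    "video/mp4", "video/quicktime", "video/x-matroska", "video/webm", "video/mpeg",
    "audio/mpeg", "audio/wav", "audio/mp4", "audio/x-m4a", "audio/flac",
    "audio/ogg", "audio/aac",
}

# Assumption: "20 GB" interpreted as decimal gigabytes (the stricter reading).
MAX_FILE_BYTES = 20 * 1000**3


def check_upload(mime_type: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if mime_type not in SUPPORTED_MIME_TYPES:
        problems.append(f"unsupported MIME type: {mime_type}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return problems
```

A passing local check does not guarantee acceptance, because the server may inspect the file contents rather than the declared type.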
The source language of the content using ISO language codes (e.g., en, es, fr, de, ja, zh).
Strongly recommended: Leave this empty for auto-detection. Only provide this parameter if you are 100% certain the language code is correct and in valid ISO format. Incorrect language codes will cause transcription failures. Our auto-detection supports 80+ languages and is highly accurate.
Whether to preserve background audio in the output. When enabled, keeps background music, ambience, laughs, claps, and crowd sounds while removing only the original voice (stem separation). Turn off if your source has no background audio.
Default: true
Voice isolation mode when keepBackgroundMusic is enabled. Controls the quality and characteristics of voice separation.
Studio (Recommended)
Our default voice processing, designed for professional audio quality:
Removes echoes and reverberations
Cleans technical imperfections
Produces a clear and crisp voice
Recommended for: Most projects where audio quality is paramount. Ideal for tutorials, educational content, marketing videos, and any content requiring optimal voice clarity.
Realistic
Preserves the natural characteristics of the recording environment:
Maintains a sound closer to the original recording
Preserves environmental characteristics
Recommended for: Content where the authenticity of the environment is important, such as outdoor vlogs, documentaries, or content where the sound ambiance is an integral part of the experience.
This option may create artifacts or unexpected effects in some cases due to the preservation of background elements.
Whether to keep the original background voice (vocal track) underneath the dub. When enabled, the original voice is mixed in the background at a reduced level. Use originalVoiceVolume to control the target level.
Default: false
The target level, in dB, for the original background voice bed when keepOriginalVoice is enabled.
Range: 10 - 70 (higher = louder original voice)
Default: 30
Whether to generate subtitles for the translated video. When enabled, adds clean Netflix-style black and white subtitles. Use subtitlesSource to choose original (source language) or translated (target language) text. Subtitles are automatically synced for optimal readability.
Note: Burned-in subtitles require FFmpeg with the subtitles filter (libass). If unavailable, the API falls back to embedding a subtitle track instead of hard-burned styling.
Default: false
Choose the subtitle text source when subtitles is enabled.
Allowed values: translated, original
Default: translated
Note: If original is selected but the original transcription is unavailable, subtitles fall back to translated.
Fine-tune voice cloning parameters for advanced control over the generated voice. Pass as a JSON string when using form-data. All values must be between 0 and 1, in steps of 0.01.
Speaker Boost (0.00 - 1.00)
Boosts similarity to the original speaker. This is a subtle enhancement that increases resemblance to the source voice.
Note: Higher values increase compute time and latency.
Default: 0.60
Tip: To avoid reproducing the original accent, use 0.00.
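Since the voice settings travel as a JSON string inside form-data, it can help to validate and serialize them in one step. This is a sketch under assumptions: the source does not give the JSON key for Speaker Boost, so `speakerBoost` in the usage note is a hypothetical name, and the function name is illustrative.

```python
import json
from typing import Dict


def encode_voice_settings(settings: Dict[str, float]) -> str:
    """Validate each value (0 to 1, two decimals at most) and return a JSON string."""
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        scaled = value * 100
        # A value on the 0.01 grid is a whole number after scaling by 100,
        # up to floating-point rounding error.
        if abs(scaled - round(scaled)) > 1e-9:
            raise ValueError(f"{name} must be a multiple of 0.01, got {value}")
    return json.dumps(settings)
```

For example, `encode_voice_settings({"speakerBoost": 0.0})` would produce the string to place in the form field when the goal is to avoid accent reproduction, assuming that key name matches the API.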
originalVoiceVolume expects a numeric dB level (10-70). Higher values keep more of the original background voice.
subtitlesSource accepts translated or original to control which text is used for subtitles.
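The audio and subtitle options above depend on each other: originalVoiceVolume only matters with keepOriginalVoice, and subtitlesSource only matters with subtitles. A minimal sketch of assembling them into form fields follows. The field names and defaults come from the descriptions above; the function name and the encoding of booleans as the strings "true" and "false" are assumptions about how the form-data is expected to look.

```python
from typing import Dict


def build_options(
    keep_background_music: bool = True,
    keep_original_voice: bool = False,
    original_voice_volume: int = 30,
    subtitles: bool = False,
    subtitles_source: str = "translated",
) -> Dict[str, str]:
    """Validate the documented ranges and return form fields as strings."""
    if not 10 <= original_voice_volume <= 70:
        raise ValueError("originalVoiceVolume must be between 10 and 70 dB")
    if subtitles_source not in ("translated", "original"):
        raise ValueError("subtitlesSource must be 'translated' or 'original'")

    # Assumption: booleans are sent as lowercase "true"/"false" strings.
    fields = {
        "keepBackgroundMusic": str(keep_background_music).lower(),
        "keepOriginalVoice": str(keep_original_voice).lower(),
        "subtitles": str(subtitles).lower(),
    }
    # Dependent parameters are only sent when the option they modify is on.
    if keep_original_voice:
        fields["originalVoiceVolume"] = str(original_voice_volume)
    if subtitles:
        fields["subtitlesSource"] = subtitles_source
    return fields
```

Omitting the dependent fields when their parent option is off keeps the request minimal; sending them anyway may also be accepted, but the source does not say.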