What the File name too long: 'AAAAGGZ0eXBtcDQy...' error means
That string (AAAAGGZ0eXBtcDQy...) is the base64 encoding of your .m4a file header (an MP4 container). The server is not decoding it as base64 audio. Instead, it treats the inputs string as a file path and tries to open it as one; because the string is thousands of characters long, the OS raises "File name too long".
This is a known failure mode for ASR requests where a base64 string is accidentally interpreted as a path. (Hugging Face Forums)
So: your audio length (15 seconds) is not the issue. The request is being interpreted incorrectly on the server side.
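For context, the failing request most likely has this shape (a reconstruction based on the error message, not your exact payload): a JSON body whose inputs value is the base64 string, which the backend then tries to open as a filename.
{
  "inputs": "AAAAGGZ0eXBtcDQy... (thousands more base64 characters)",
  "parameters": { "generation_parameters": { "do_sample": false } }
}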
Why this happens on this endpoint
1) The ASR schema says inputs is a base64 string (but some backends still treat strings as paths)
Hugging Face’s Inference Providers ASR documentation states:
inputs: a base64-encoded audio string, or raw bytes if you don't send parameters. (Hugging Face)
However, some ASR serving wrappers (and older endpoint implementations) handle a string inputs as “path/URL to an audio file” first, and only decode bytes in other branches. When that happens, base64 gets misread as a path → “file name too long”. (Hugging Face Forums)
2) .m4a decoding and content-type handling can be brittle
For binary-audio tasks, HF has historically relied on “content-type guessing” or backend-specific decoding paths; inconsistencies between serverless inference and other deployments are documented as a practical pitfall. (Hugging Face)
Fixes (choose based on whether you must send parameters)
Fix A — Most reliable: send raw audio bytes (no JSON, no parameters)
Per the ASR docs, if you omit parameters, you can send raw bytes directly. (Hugging Face)
Dart (raw bytes)
// Requires: import 'package:http/http.dart' as http;
final url = Uri.parse(
  '/static-proxy?url=https%3A%2F%2Frouter.huggingface.co%2Fhf-inference%2Fmodels%2Fopenai%2Fwhisper-large-v3-turbo',
);
final req = http.Request('POST', url)
  ..headers['Authorization'] = 'Bearer $_hfToken'
  ..headers['Accept'] = 'application/json'
  // For .m4a in an MP4 container, audio/mp4 is generally safer than audio/m4a.
  ..headers['Content-Type'] = 'audio/mp4'
  ..bodyBytes = audioBytes;

final streamed = await req.send();
final body = await streamed.stream.bytesToString();
if (streamed.statusCode != 200) {
  throw Exception('HTTP ${streamed.statusCode}: $body');
}
Why this works: the server receives bytes and can’t misinterpret them as a filename. (Hugging Face)
Tradeoff: you can’t pass generation_parameters in this “raw bytes” mode (the docs only describe raw bytes when no parameters are provided). (Hugging Face)
Fix B — If you need generation_parameters: pass a URL as inputs (JSON)
If your backend is treating string inputs as a path, then use that intentionally: provide an HTTPS URL to the audio file (ideally a short-lived signed URL), and keep your JSON parameters.
This aligns with HF’s client documentation: ASR inputs can be raw bytes, a local file, or a URL. (Hugging Face)
JSON body
{
  "inputs": "https://<signed-url>/audio.m4a",
  "parameters": {
    "generation_parameters": {
      "do_sample": false,
      "temperature": 0,
      "top_p": 1,
      "num_beams": 1,
      "max_new_tokens": 448
    }
  }
}
Dart (URL input + parameters)
// Requires: import 'dart:convert'; plus package:http as in Fix A.
final payload = jsonEncode({
  "inputs": signedAudioUrl, // https://...
  "parameters": {
    "generation_parameters": {
      "do_sample": false,
      "temperature": 0,
      "top_p": 1,
      "num_beams": 1,
      "max_new_tokens": 448,
    }
  }
});

final req = http.Request('POST', url)
  ..headers['Authorization'] = 'Bearer $_hfToken'
  ..headers['Content-Type'] = 'application/json'
  ..headers['Accept'] = 'application/json'
  ..body = payload;
Why this works: it avoids base64 entirely, and it matches the “string interpreted as path/URL” behavior that is causing your error. (Hugging Face Forums)
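Sending the request and reading the result then works exactly as in Fix A. A minimal sketch, assuming the standard ASR response shape where the transcription comes back in a text field:
final streamed = await req.send();
final body = await streamed.stream.bytesToString();
if (streamed.statusCode != 200) {
  throw Exception('HTTP ${streamed.statusCode}: $body');
}
// Typical ASR responses look like {"text": "..."}.
final transcript = jsonDecode(body)['text'] as String;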
Fix C — If you need to force “transcribe” vs “translate” reliably: use a deployment that exposes Whisper’s task/language controls
Whisper supports explicit generation controls:
task:"transcribe"or"translate"language: tokens like"en"/"english"(Hugging Face)
But the serverless ASR interface you’re calling is a generic ASR wrapper; even when generation_parameters is supported, Whisper-specific task/language may not be exposed the way Transformers exposes them. (Hugging Face)
If you must guarantee “never translate, always transcribe (and optionally force English)”, the robust approach is to run Whisper behind an endpoint where you control the inference code (so you can set task="transcribe" and language="english" explicitly). (Hugging Face)
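For illustration only, a custom handler you write could accept a body like the one below; the task and language keys here are hypothetical and only take effect if your own inference code reads them and passes them to Whisper's generate call.
{
  "inputs": "https://<signed-url>/audio.m4a",
  "parameters": {
    "task": "transcribe",
    "language": "english"
  }
}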
Practical stability tips (to reduce “random languages”)
Even after you fix the request shape, "random language" output is often caused by audio decoding / language-ID instability. The highest-impact changes are:
- Convert to WAV PCM, mono, 16 kHz before sending (then use Content-Type: audio/wav); see the ffmpeg example after this list.
- If you keep .m4a, use audio/mp4 rather than audio/m4a (some stacks handle it more consistently).
- Make decoding deterministic (temperature: 0, do_sample: false), but that requires Fix B (URL input) or an endpoint that accepts parameters with bytes. (Hugging Face)
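If you have ffmpeg available (an assumption; any resampling tool works), the WAV conversion from the first bullet looks like this:
ffmpeg -i input.m4a -ac 1 -ar 16000 -c:a pcm_s16le output.wav
Here -ac 1 forces mono, -ar 16000 resamples to 16 kHz, and pcm_s16le produces 16-bit PCM WAV, which matches the input format Whisper expects.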
Recommendation for your exact situation
- First, switch to Fix A (raw bytes) and confirm transcription works consistently (this isolates request-format issues). (Hugging Face)
- If you need deterministic decoding knobs, move to Fix B (URL input) and keep generation_parameters. (Hugging Face)
- If you need a hard guarantee on transcribe vs translate, use a setup that exposes Whisper's task/language controls directly. (Hugging Face)