Hello,
XTTS v2 clone sounds American and stutters on long sentences. How can I get a UK accent and stable speech?
I am using XTTS v2 for an AI-Tutor project. My reference voice is a British English teacher. I pass a recording of this voice as speaker_wav and set language="en".
Issues
- The output often sounds US English instead of British.
- Some long or connected sentences repeat or get stuck.
Setup
- OS: Windows
- Python: 3.11
- TTS: 0.22.0
- Model: xtts-v2 (local)
- Inference: language="en", speaker_wav=<british_teacher.wav>
- Data: ~75 clean clips of the same speaker (mono WAV)
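For anyone trying to reproduce this, the inference described above can be sketched roughly as follows. This assumes the high-level `TTS.api` wrapper from Coqui TTS and the stock XTTS v2 model name; a local checkpoint may be loaded differently. The helper names, the output path, and the clip file name are placeholders, not the exact code from the project.

```python
from pathlib import Path

# Assumption: the stock XTTS v2 model name from Coqui TTS. A local
# checkpoint directory may need to be loaded in a different way.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_request(text, reference_clips, out_path="out.wav"):
    """Collect keyword arguments for TTS.tts_to_file (hypothetical helper)."""
    clips = [str(Path(p)) for p in reference_clips]
    if not clips:
        raise ValueError("at least one reference clip is required")
    return {
        "text": text,
        "speaker_wav": clips,   # one or more clips of the same speaker
        "language": "en",       # the language code used in this post
        "file_path": out_path,  # placeholder output path
    }


def synthesize(text, reference_clips, out_path="out.wav"):
    """Run zero-shot cloning; needs the TTS package and the model weights."""
    from TTS.api import TTS  # imported lazily so the helper above stays testable

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(**build_request(text, reference_clips, out_path))
    return out_path
```

Calling `synthesize("Good morning, class.", ["british_teacher.wav"])` would write `out.wav`, provided the package and model are installed.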
What I tried
- Multiple short reference clips from the same speaker
- Clean audio with no music or noise
- Added full stops and commas, and split long text into shorter sentences
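The sentence splitting mentioned above can be sketched like this. It is a minimal illustration in plain Python, not a recommended recipe: the `split_text` name and the `max_chars=200` limit are assumptions chosen for the example, and the regex only handles simple `.`, `!`, and `?` boundaries.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_text(text, max_chars=200):
    """Split text into sentences, then break overlong sentences on word boundaries.

    max_chars is an arbitrary illustrative limit, not a documented XTTS setting.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
    chunks = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            chunks.append(sentence)
            continue
        # Sentence is too long: pack words greedily into chunks of <= max_chars.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the audio concatenated, so that no single call receives a very long input.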
Questions
- Is there a way to force a UK accent in XTTS v2 (for example a code like en-gb) or is accent only taken from the reference audio?
- How much reference audio is recommended for better accent retention?
- What settings help to avoid stutters on long sentences? Is there a best practice for sentence splitting with XTTS v2?
- Would a small fine-tune on my British dataset help more than zero-shot cloning? If yes, which training recipe should I follow?