Audio / Text-to-Speech (xtts-v2)

Hello,

My XTTS v2 clone sounds American and stutters on long sentences. How can I get a UK accent and stable speech?

I am using XTTS v2 for an AI tutor project. My reference voice is a British English teacher; I pass that clip as speaker_wav and set language="en".

Issues

  1. The output often sounds like US English rather than British English.

  2. On long or run-on sentences, the output sometimes repeats words or gets stuck.

Setup

  • OS: Windows

  • Python: 3.11

  • TTS: 0.22.0

  • Model: xtts-v2 (local)

  • Inference: language="en", speaker_wav=<british_teacher.wav>

  • Data: ~75 clean clips of the same speaker (mono WAV)
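For reference, my inference call looks roughly like this (a sketch using the Coqui TTS 0.22.0 Python API; the file paths are placeholders for my actual data):

```python
def synthesize(text: str, out_path: str = "out.wav") -> str:
    """Zero-shot XTTS v2 synthesis from a single British reference clip."""
    from TTS.api import TTS  # requires `pip install TTS`

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav="british_teacher.wav",  # my reference recording
        language="en",  # I only see "en"; I cannot find an "en-gb" option
        file_path=out_path,
    )
    return out_path
```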

What I tried

  • Multiple short reference clips from the same speaker

  • Clean audio with no music or noise

  • Added full stops and commas, and split long text into shorter sentences
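Concretely, this is roughly how I split the text before synthesis (a simplistic regex splitter; abbreviations like "Dr." will be split incorrectly):

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence-ending punctuation, then pack consecutive
    sentences into chunks of at most max_chars characters, so XTTS
    never sees one very long input."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

I then synthesize each chunk separately and concatenate the resulting audio.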

Questions

  1. Is there a way to force a UK accent in XTTS v2 (for example, a language code like en-gb), or is the accent taken only from the reference audio?

  2. How much reference audio is recommended for better accent retention?

  3. What settings help avoid stutters on long sentences? Is there a best practice for sentence splitting with XTTS v2?

  4. Would a small fine-tune on my British dataset help more than zero-shot cloning? If so, which training recipe should I follow?


If you’re particular about accents, it might be easier to use a TTS other than XTTS-v2…
