After learning more about Vedal and the AI vtuber Neuro-sama, I've been wondering: is it possible to create an AI "buddy" for new streamers, to help them get used to talking while they're live? It would be a good project to add to my resume, and I hope it would help other introverted streamers.
Some of the members in Vedal’s server told me it would involve different models working together and that Hugging Face is a good place to start looking. Are there any other factors I should consider? I’m a software developer/tester, but an AI novice.
I’m not particularly knowledgeable about the actual Neuro-sama, but from what I’ve gathered by reading threads and such, the difficulty in replicating it lies more in the software engineering than in the AI models themselves. The models have become efficient enough over time that the core challenges are things like pipeline structure, real-time performance, model integration, network processing, and stitching together existing components.
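To make the pipeline idea concrete, here's a rough sketch of how the stages might be wired together using only standard-library queues and threads. The stage functions (`run_vad`, `run_asr`, etc.) are hypothetical placeholders I made up for illustration, not from any real project; the point is the shape: each stage consumes from one queue and feeds the next, so a slow stage never blocks audio capture.

```python
import queue
import threading

# Hypothetical stage functions; in a real build each would wrap a model
# (a VAD, an ASR model, an LLM, a TTS model).
def run_vad(chunk):   # would return the chunk only if it contains speech
    return chunk      # placeholder: pass everything through

def run_asr(chunk):   # speech audio -> text
    return f"transcript of {chunk}"

def run_llm(text):    # text -> reply text
    return f"reply to: {text}"

def run_tts(text):    # reply text -> audio (here, just a string)
    return f"audio of: {text}"

def stage(fn, inbox, outbox):
    """Consume items from inbox, process them, pass results downstream."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut the stage down
            outbox.put(None)      # and propagate shutdown downstream
            return
        result = fn(item)
        if result is not None:
            outbox.put(result)

# One queue between each pair of stages keeps them decoupled, so a slow
# LLM call never stalls audio capture or VAD.
mic_q, asr_q, llm_q, tts_q, out_q = (queue.Queue() for _ in range(5))

pairs = [(run_vad, mic_q, asr_q), (run_asr, asr_q, llm_q),
         (run_llm, llm_q, tts_q), (run_tts, tts_q, out_q)]
for p in pairs:
    threading.Thread(target=stage, args=p, daemon=True).start()

# Feed a couple of fake audio chunks through, then the shutdown sentinel.
for chunk in ["chunk-1", "chunk-2", None]:
    mic_q.put(chunk)

while (item := out_q.get()) is not None:
    print(item)
```

In a real system you'd replace each placeholder with an actual model call and start worrying about latency budgets and backpressure, but the queue-per-stage structure is the part that tends to survive contact with reality.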
Regarding the AI models needed within the system, a basic understanding of LLMs for the text side, plus VAD (voice activity detection), ASR (automatic speech recognition), and TTS (text-to-speech) for the audio side, should likely suffice for implementation. You might have to wrestle with Python or library version mismatches though…
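Since the server folks pointed you at Hugging Face, here's one way the model side could look with the `transformers` pipeline API. This is a sketch under assumptions, not a recommendation: the specific checkpoints (`openai/whisper-small`, `Qwen/Qwen2.5-0.5B-Instruct`, `suno/bark-small`) and the input filename are just examples I picked; swap in whatever fits your latency budget.

```python
# pip install transformers torch  (pin exact versions; see note below)
from transformers import pipeline

# ASR: audio in, text out. Any automatic-speech-recognition
# checkpoint should slot in here.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# LLM: text in, reply text out. A small instruct model keeps latency down.
llm = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# TTS: text in, audio out (a dict with "audio" and "sampling_rate").
tts = pipeline("text-to-speech", model="suno/bark-small")

text = asr("viewer_clip.wav")["text"]          # hypothetical audio file
reply = llm(text, max_new_tokens=60,
            return_full_text=False)[0]["generated_text"]
speech = tts(reply)
print(reply, speech["sampling_rate"])
```

Two caveats: VAD isn't part of `transformers`, so people typically put something like Silero VAD or `webrtcvad` in front of the ASR step. And on the version-mismatch point, pinning exact versions in a `requirements.txt` inside a virtual environment will save you a lot of pain later.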
Wow. Did you write those files just to answer my question? Either way, thanks for the valuable information.