Highest Voted 'voice' Questions - Data Science Stack Exchange

5

votes

2 answers

How can you efficiently cluster speech segments by speaker?

We have ~30 audio snippets, of which around 50% are from the same speaker, who is our target speaker, and the rest are from various different speakers. We want to extract all audio snippets from our target speaker, so basically figure out which…

asked Mar 11 '25 at 17:54

Yes

181
4
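One possible way to approach the speaker-clustering question above, sketched under loud assumptions: each snippet has already been turned into a fixed-length speaker embedding by some model (not shown here), and the target speaker forms the largest group of mutually similar embeddings. The function name, the toy vectors, and the 0.8 threshold are all hypothetical and would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def largest_speaker_group(embeddings, threshold=0.8):
    """Return indices of the largest set of snippets similar to one anchor snippet.

    Assumption: the target speaker accounts for the biggest group, as in the
    question (about half the snippets, the rest from various other speakers).
    """
    best = []
    for anchor in embeddings:
        group = [j for j, other in enumerate(embeddings)
                 if cosine(anchor, other) >= threshold]
        if len(group) > len(best):
            best = group
    return best


# Toy 2-D "embeddings" (hypothetical): snippets 0, 1 and 3 point the same way.
toy = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.3]]
print(largest_speaker_group(toy))
```

With only ~30 snippets the quadratic comparison is cheap; a proper clustering algorithm may be preferable if the other speakers also form sizeable groups.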

1

vote

0 answers

How can I get tortoise-tts to pronounce acronyms correctly?

I'm trying to get tortoise-tts to pronounce acronyms correctly. Example of text that I'd like tortoise-tts to generate an audio file for: "OpenAI ChatGPT is a new language model." The audio file generated by tortoise-tts is: "OpenAI Chat is a new…

nlp voice text-to-speech

asked Apr 07 '23 at 05:36

Franck Dernoncourt

5,862
12
44
80

0

votes

1 answer

How to create an AI voice generator for a fantasy language?

I have a "fantasy language" (a conlang), which has a very simple pronunciation system. Every letter represents one sound, as opposed to English, where you can have the same sound with different spellings ("here" and "hear", for example). In the…

ai voice text-to-speech

asked Aug 14 '23 at 22:28

Lance Pollard

75
2
9
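To illustrate the point in the conlang question above that every letter represents exactly one sound: under that assumption, the text-to-phoneme step of a voice generator can be a plain lookup table. The letter inventory and phoneme symbols below are hypothetical placeholders, not the asker's actual language.

```python
# Hypothetical one-letter-one-sound table; a real conlang would list its own letters.
LETTER_TO_PHONEME = {
    "a": "a", "i": "i", "u": "u",
    "k": "k", "s": "s", "t": "t",
}


def to_phonemes(word):
    """Map each letter to its single phoneme; reject letters outside the table."""
    phonemes = []
    for ch in word.lower():
        if ch not in LETTER_TO_PHONEME:
            raise ValueError(f"no phoneme defined for letter {ch!r}")
        phonemes.append(LETTER_TO_PHONEME[ch])
    return phonemes


print(to_phonemes("Kitsu"))
```

English would need a pronunciation dictionary or a learned model at this step, because "here" and "hear" share a sound but not a spelling.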

0

votes

1 answer

What is the difference between VAD and Speaker Segmentation?

I'm not sure I can distinguish and understand the difference between VAD (Voice Activity Detection) and Speaker Segmentation. I understand that VAD splits audio into segments of speech or non-speech, and that Speaker Segmentation splits audio into segments…

deep-learning voice

asked Apr 03 '23 at 14:50

user3668129

769
4
15
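As a rough illustration of the VAD half of the question above, here is a minimal energy-based sketch: it marks a frame as speech when its RMS energy exceeds a threshold and merges consecutive speech frames into segments. Real VAD systems are usually model-based; the frame length, threshold, and toy signal here are assumptions chosen only to keep the example small.

```python
import math


def energy_vad(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample ranges whose frame RMS energy exceeds threshold."""
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold and start is None:
            start = i                      # speech begins at this frame
        elif rms <= threshold and start is not None:
            segments.append((start, i))    # speech ended before this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# Toy signal: 8 silent samples, 8 loud samples, 4 silent samples.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4
print(energy_vad(signal))
```

This only answers "is anyone speaking?". It does not say who is speaking, so it would not place a boundary where one speaker hands over to another without a pause.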
