I found these code pieces regarding voice anonymization:
Would you mind elaborating on how exactly this works, and what your thought process was compared to normal voice conversion? From my understanding, you just generate a random tensor in the first code piece and pass it to inference as a kind of random voice embedding. Could you go into detail on how this works, whether it is a truly random voice, and how you came to that conclusion?
Would be really helpful <3
Voice anonymization is achieved by not conditioning generation on any timbre prompt. At first I expected the generated timbre to be random, but it turns out that instead of producing a random voice, anonymization converts all source speech into the same voice, which may be some kind of "average voice" of the training set. This is also a good sign, indicating that the source speaker identity is completely removed by the speech tokenizer.
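To make the idea concrete, here is a toy sketch of why dropping the timbre prompt would map every speaker to one voice. It is not this repository's code: `tokenize`, `decode`, `anonymize`, `TRAIN_TIMBRES` and the scalar "timbre" are all hypothetical stand-ins, and the fallback to the training-set mean simply encodes the "average voice" guess above as an assumption.

```python
from statistics import fmean

# Hypothetical scalar "timbre" of each training speaker (illustrative values only).
TRAIN_TIMBRES = [0.2, 0.5, 0.8, 1.1]
# ASSUMPTION: with no prompt, the decoder falls back to the training-set mean.
AVERAGE_TIMBRE = fmean(TRAIN_TIMBRES)


def tokenize(frames):
    """Toy speech tokenizer: keep the content id, drop the speaker timbre."""
    return [content for content, _timbre in frames]


def decode(tokens, timbre_prompt=None):
    """Toy decoder: use the prompt's mean timbre if given, else the average."""
    if timbre_prompt is None:
        timbre = AVERAGE_TIMBRE
    else:
        timbre = fmean(t for _content, t in timbre_prompt)
    return [(token, timbre) for token in tokens]


def anonymize(frames):
    """Anonymization = tokenize, then decode without any timbre prompt."""
    return decode(tokenize(frames))


# Two speakers saying the same content with different timbres.
alice = [(3, 0.2), (7, 0.2)]
bob = [(3, 1.1), (7, 1.1)]

# Both collapse to the same output, because nothing speaker-specific
# survives tokenization and no prompt reintroduces it.
same_voice = anonymize(alice) == anonymize(bob)
```

In this sketch, ordinary voice conversion would be `decode(tokenize(alice), timbre_prompt=bob)`, and the only difference in the anonymization path is that the prompt is left out.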
Interesting, thanks for the clarification. Maybe manipulating this "random average" voice could be a way to create artificial voices. I think that's kind of novel; I don't know of any model that lets you create an artificial voice.