Voice anonymization question #169


Open
nilslacroix opened this issue Apr 21, 2025 · 2 comments

Comments

@nilslacroix

I found these code pieces regarding voice anonymization:

```python
if anonymization_only:
    chunk_ar_cond = self.ar_length_regulator(chunk[None])[0]
    chunk_ar_out = self.ar.generate(chunk_ar_cond, torch.zeros([1, 0]).long().to(device),
                                    compiled_decode_fn=self.compiled_decode_fn,
                                    top_p=top_p, temperature=temperature,
                                    repetition_penalty=repetition_penalty)
```

```python
vc_mel = self.cfm.inference(
    cat_condition,
    torch.LongTensor([original_len]).to(device),
    target_mel, target_style, diffusion_steps,
    inference_cfg_rate=[intelligebility_cfg_rate, similarity_cfg_rate],
    random_voice=anonymization_only,
)
```

Do you mind elaborating on how exactly this works, or what your thought process was compared to normal voice conversion? From my understanding, you just generate a random tensor in the first code piece and pass it as a kind of random voice embedding to the inference? Would you mind going into detail about how this works, whether it is a truly random voice, and how you came to that conclusion?

Would be really helpful <3

@Plachtaa
Owner

Voice anonymization is achieved by not conditioning generation on any timbre prompt. At first I expected the generated timbre to be random, but it turns out that instead of a random voice, anonymization turns all source speeches into the same voice, which may be some kind of "average voice" of the training set. This is also a good sign, indicating that the source speaker identity is completely removed by the speech tokenizer.
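
For illustration only, a minimal sketch of the contrast described above, assuming placeholder names (`run_vc`, `extract_timbre_prompt`, and `generate` are not the project's actual API): the only difference between normal conversion and anonymization is whether a timbre prompt is supplied at all.

```python
import torch

def run_vc(model, content_tokens: torch.Tensor,
           reference_wav: torch.Tensor | None = None,
           anonymization_only: bool = False) -> torch.Tensor:
    """Hypothetical wrapper contrasting normal conversion with anonymization.

    Normal VC: generation is conditioned on a timbre prompt taken from a
    reference utterance. Anonymization: a zero-length prompt is passed instead
    (mirroring torch.zeros([1, 0]) in the excerpt above), so nothing ties the
    output timbre to any speaker and the model falls back to its prior, which
    in practice comes out as one shared "average" training voice.
    """
    if anonymization_only or reference_wav is None:
        timbre_prompt = torch.zeros([1, 0], dtype=torch.long)  # empty prompt: no timbre conditioning
    else:
        timbre_prompt = model.extract_timbre_prompt(reference_wav)  # hypothetical helper
    return model.generate(content_tokens, timbre_prompt)
```

Since every anonymized utterance is decoded under the same (empty) conditioning, they all converge to the same voice rather than each getting a random one.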

@nilslacroix
Author

Interesting, thanks for the clarification. Maybe manipulating this "random average" voice could be a way to create artificial voices. I think that's kind of novel; I don't know of any model where you can create an artificial voice.
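
Purely as a sketch of the idea floated here (nothing like this exists in the repository; the function and its parameters are hypothetical), one way to "manipulate" the timbre conditioning would be to interpolate between, or perturb, speaker style embeddings:

```python
import torch

def make_artificial_style(style_a: torch.Tensor, style_b: torch.Tensor,
                          alpha: float = 0.5, noise_scale: float = 0.0) -> torch.Tensor:
    """Blend two speaker style embeddings and optionally perturb the result.

    style_a, style_b: timbre/style embeddings extracted from two real
    reference speakers. alpha interpolates between them and noise_scale adds
    a small random offset, yielding a timbre that belongs to neither speaker.
    """
    style = (1.0 - alpha) * style_a + alpha * style_b
    if noise_scale > 0.0:
        style = style + noise_scale * torch.randn_like(style)
    return style
```

Feeding such a blended embedding in place of `target_style` in the `cfm.inference` call quoted above would be one way to probe for voices that match no training speaker, though how natural the result would sound is an open question.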
