Voice Control and Naturalization #631

Open
stnxo2023 opened this issue Mar 13, 2025 · 0 comments

Hello,

I'm seeking information on how to manipulate the following aspects of voice output:

  1. Expressing emotions and non-verbal sounds such as laughter, crying, and coughing.
  2. Adjusting pitch and speaking speed.
  3. Ensuring accurate pronunciation and conveying different tones (happy, sad, neutral).
  4. Incorporating natural pauses and filler words such as "hmm" and so on.
  5. How can I clone a voice with high fidelity so that it retains all of these characteristics (changing tone on demand during streaming, adjusting pitch, speaking speed, pronunciation, pauses, and natural language with a low word error rate)?

I would also appreciate details regarding:

  • The timeline for future fine-tuning of this model.
  • Methods to enhance the voice's naturalness and conversational quality.

Please advise on these points.

Thank you.
