
zxcq544/russian_text_to_speech


About

This text-to-speech service uses the Silero neural network, which is optimized for the Russian language. Numbers are converted to Russian words with num2words, and English words are transliterated. By default it runs on the CPU with 4 cores, but you can switch to CUDA in NeuralSpeaker.py.
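The text preprocessing described above (numbers to words, English words to Cyrillic) can be sketched as follows. This is a minimal illustration, not the project's code: `normalize`, `transliterate`, and `LATIN_TO_CYRILLIC` are hypothetical names, the letter map is a naive stand-in for whatever transliteration rules the project actually applies, and the number converter is passed in so the sketch runs without num2words installed.

```python
import re
from collections.abc import Callable

# Hypothetical, naive Latin-to-Cyrillic letter map for illustration only.
# The project's real transliteration rules are not documented in this README.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "к", "d": "д", "e": "е", "f": "ф",
    "g": "г", "h": "х", "i": "и", "j": "дж", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "q": "к", "r": "р",
    "s": "с", "t": "т", "u": "у", "v": "в", "w": "в", "x": "кс",
    "y": "й", "z": "з",
}


def transliterate(word: str) -> str:
    """Map each Latin letter of `word` to Cyrillic, letter by letter."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in word.lower())


def normalize(text: str, number_to_words: Callable[[int], str]) -> str:
    """Spell out integers, then transliterate Latin words, before synthesis."""
    # Digits first: the converter returns Cyrillic, so the Latin pass below
    # never touches the spelled-out numbers.
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    # Replace every run of Latin letters with a Cyrillic approximation.
    return re.sub(r"[A-Za-z]+", lambda m: transliterate(m.group()), text)
```

With num2words installed, passing `lambda n: num2words(n, lang="ru")` as the converter would spell out digits as Russian words.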

You can try the Silero text-to-speech model in Colab: https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb

Voice samples

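Text used for the examples, a Russian tongue twister: На дворе трава, на траве дрова: раз дрова, два дрова, три дрова. На дворе трава, на траве дрова — раз дрова, два дрова, три дрова: дрова вдоль двора, дрова вширь двора, не вместит двор дров, надо дрова выдворить обратно со двора. (Roughly: "Grass in the yard, firewood on the grass: one firewood, two firewood, three firewood... firewood along the yard, firewood across the yard; the yard cannot hold the firewood, so the firewood must be cleared back out of the yard.")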

baya.mp4
kseniya.mp4
xenia.mp4
aidar.mp4
random.mp4

Install dependencies

Requires Python 3.10 or newer.
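If you are unsure which interpreter `pip` will install into, a small check like the one below can confirm the version requirement first. The function name `python_is_supported` is hypothetical and is not part of this repository.

```python
import sys

# Minimum (major, minor) version stated in this README.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (or running) interpreter is Python 3.10+."""
    return tuple(version_info[:2]) >= MIN_VERSION
```

Call it with no arguments to check the interpreter that is running the script.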

Windows

pip install -r requirements.txt

Linux

Install gcc if it is not already present, then run:

sudo apt-get install -y python3-dev libasound2-dev
pip install -r requirements.txt

Run in production

uvicorn main:app --host 0.0.0.0 --port 8080 --workers 1 --proxy-headers

Run the dev server

uvicorn main:app --reload 

Try it out

Simple GUI

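Start the dev server (uvicorn listens on port 8000 by default; the production command above uses port 8080), then visit http://localhost:8000/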

Run with GET parameters

Visit http://localhost:8000/speak?words=Привет&speaker=xenia&sample_rate=48000

The endpoint accepts three parameters:

  1. words - the phrase to pronounce.
  2. speaker - the voice to use. Available voices are aidar, baya, kseniya, xenia, eugene, and random; random generates a new voice each time.
  3. sample_rate - the output audio sample rate in Hz. Available options are 8000, 24000, and 48000.
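Because words is usually Cyrillic text, it should be percent-encoded when the URL is built in code. The sketch below uses only the Python standard library to build such a URL and to check speaker and sample_rate against the values listed above. The helper `speak_url` and its `base_url` and `endpoint` arguments are hypothetical conveniences, and the allowed-value sets only mirror this README, so they may drift from what the server actually accepts.

```python
from urllib.parse import urlencode

# Values listed in this README; the server is the real source of truth.
SPEAKERS = {"aidar", "baya", "kseniya", "xenia", "eugene", "random"}
SAMPLE_RATES = {8000, 24000, 48000}


def speak_url(words: str, speaker: str = "xenia", sample_rate: int = 48000,
              base_url: str = "http://localhost:8000",
              endpoint: str = "speak") -> str:
    """Build a percent-encoded query URL for the speak endpoint."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    # urlencode UTF-8 encodes and percent-escapes the Cyrillic text.
    query = urlencode({"words": words, "speaker": speaker,
                       "sample_rate": sample_rate})
    return f"{base_url}/{endpoint}?{query}"
```

Validating locally gives a clear error before any request is sent; passing `endpoint="get_audio_file"` builds the download URL instead, since both endpoints take the same parameters.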

Generate and download audio

Visit http://localhost:8000/get_audio_file?words=Привет&speaker=xenia&sample_rate=48000 to generate and download an audio file. get_audio_file takes the same parameters as speak.
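To fetch the file from a script instead of a browser, a standard-library sketch might look like this. `audio_file_url` and `download_audio` are hypothetical helper names, and this README does not state which audio format the server returns, so choose the extension of `out_path` accordingly.

```python
import shutil
import urllib.request
from urllib.parse import urlencode


def audio_file_url(words: str, speaker: str = "xenia", sample_rate: int = 48000,
                   base_url: str = "http://localhost:8000") -> str:
    """Build a percent-encoded URL for the get_audio_file endpoint."""
    query = urlencode({"words": words, "speaker": speaker,
                       "sample_rate": sample_rate})
    return f"{base_url}/get_audio_file?{query}"


def download_audio(url: str, out_path: str, timeout: float = 60.0) -> int:
    """Stream the response body of `url` into `out_path`; return bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(out_path, "wb") as out:
        # Copy in chunks so large files are not held fully in memory.
        shutil.copyfileobj(resp, out)
        return out.tell()
```

For example, `download_audio(audio_file_url("Привет"), "privet.wav")` would save the response while the dev server is running; the `.wav` extension there is a guess.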

OpenAPI generated docs

In dev mode, the OpenAPI-generated docs are available at http://localhost:8000/docs.
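The `/docs` page suggests the app is built with FastAPI, which by default also serves the raw schema at `/openapi.json`. Assuming that default is unchanged, the sketch below lists every operation the schema declares. `fetch_schema` and `list_operations` are hypothetical helper names.

```python
import json
import urllib.request

# HTTP methods that may appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema from FastAPI's default /openapi.json route (assumed)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def list_operations(schema: dict) -> list[str]:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    ops = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for key in item:
            # Path items can also hold 'parameters', 'summary', etc.; skip those.
            if key in HTTP_METHODS:
                ops.append(f"{key.upper()} {path}")
    return ops
```

With the dev server running, `list_operations(fetch_schema())` should show the speak and get_audio_file routes alongside any others the app defines.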

Speak GET request with curl

curl -X 'GET' 'http://localhost:8000/speak?words=Привет&speaker=xenia&sample_rate=48000' -H 'accept: application/json'

About

Russian text to speech using a neural network
