This text-to-speech service uses the Silero neural network, which is optimized for the Russian language. Numbers are converted to Russian words using num2words, and English words are transliterated. By default it runs on the CPU with 4 cores, but you can switch to CUDA in NeuralSpeaker.py.
You can test the Silero text-to-speech model here: https://colab.research.google.com/github/snakers4/silero-models/blob/master/examples_tts.ipynb
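The number handling described above can be pictured with a small sketch. This is not the code from this repository: the helper name replace_numbers and its to_words parameter are hypothetical, and it only assumes that every run of digits is replaced by whatever the supplied converter returns.

```python
import re
from typing import Callable


def replace_numbers(text: str, to_words: Callable[[int], str]) -> str:
    """Replace every run of digits in `text` with its spelled-out form.

    `to_words` is a hypothetical hook: any function mapping an int to a string.
    """
    return re.sub(r"\d+", lambda match: to_words(int(match.group())), text)


# With num2words installed, the converter could plausibly be wired in like this
# (left commented out so the sketch runs with the standard library only):
#   from num2words import num2words
#   replace_numbers("У меня 3 кота", lambda n: num2words(n, lang="ru"))
```

Passing the converter in as an argument keeps the sketch independent of num2words; the real project may structure this step differently.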
Text used as example: На дворе трава, на траве дрова: раз дрова, два дрова, три дрова. На дворе трава, на траве дрова — раз дрова, два дрова, три дрова: дрова вдоль двора, дрова вширь двора, не вместит двор дров, надо дрова выдворить обратно со двора.
baya.mp4
kseniya.mp4
xenia.mp4
aidar.mp4
random.mp4
Requires Python 3.10 or newer.
pip install -r requirements.txt
Install gcc if you need to. Then:
sudo apt-get install -y python3-dev libasound2-dev
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8080 --workers 1 --proxy-headers
uvicorn main:app --reload
Visit http://localhost:8000/ (use port 8080 instead if you started the server with --port 8080).
Visit http://localhost:8000/speak?words=Привет&speaker=xenia&sample_rate=48000
Here you can set 3 parameters:
- words: the phrase that you want to pronounce.
- speaker: the voice to use. Available voices are: aidar, baya, kseniya, xenia, eugene, random. random generates a new voice each time.
- sample_rate: sets the output audio sample rate. Available options are 8000, 24000, 48000.
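As an illustration of how these parameters fit together, here is a minimal sketch that validates them on the client side and builds a request URL. The function name build_url and the default values are assumptions made for this example; the server may validate differently.

```python
from urllib.parse import urlencode

# Values listed in this README.
SPEAKERS = {"aidar", "baya", "kseniya", "xenia", "eugene", "random"}
SAMPLE_RATES = {8000, 24000, 48000}


def build_url(endpoint: str, words: str, speaker: str = "xenia",
              sample_rate: int = 48000,
              base_url: str = "http://localhost:8000") -> str:
    """Build a query URL for an endpoint such as "speak" (hypothetical helper)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate!r}")
    # urlencode percent-encodes non-ASCII text such as Cyrillic as UTF-8.
    query = urlencode({"words": words, "speaker": speaker,
                       "sample_rate": sample_rate})
    return f"{base_url}/{endpoint}?{query}"
```

For example, build_url("speak", "Привет") yields a URL in which the Cyrillic text is percent-encoded, which is safer to pass to tools that do not encode the query string themselves.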
Visit http://localhost:8000/get_audio_file?words=Привет&speaker=xenia&sample_rate=48000 to generate and download an audio file.
The parameters for get_audio_file are the same as for speak.
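The same download can be scripted. The sketch below uses only the Python standard library and assumes the server is running on localhost:8000 and returns the audio bytes directly in the response body; the function name download_speech and the open_url parameter are hypothetical, and the format of the returned file is not specified here.

```python
import shutil
import urllib.request
from urllib.parse import urlencode


def download_speech(words, dest_path, speaker="xenia", sample_rate=48000,
                    base_url="http://localhost:8000",
                    open_url=urllib.request.urlopen):
    """Request get_audio_file and stream the response body to dest_path.

    `open_url` can be swapped for a stub when testing without a server.
    Returns the URL that was requested.
    """
    query = urlencode({"words": words, "speaker": speaker,
                       "sample_rate": sample_rate})
    url = f"{base_url}/get_audio_file?{query}"
    with open_url(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return url


# Example call (the file name and extension are assumptions):
#   download_speech("Привет", "privet.wav")
```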
In dev mode you can visit http://localhost:8000/docs for the generated OpenAPI docs.
curl -X 'GET' 'http://localhost:8000/speak?words=Привет&speaker=xenia&sample_rate=48000' -H 'accept: application/json'