A powerful Python application that converts text to speech using OpenAI's Text-to-Speech API. Designed for handling texts of any size with automatic chunking and seamless audio stitching.
📌 Repository: github.com/theosp/open-ai-text-to-speech
This project's development was sponsored by JustDo.com, a Source-Available enterprise-grade project management platform.
Every business is unique, and no single project management tool can meet everyone's needs. JustDo's source-availability lets you turn complexity into your competitive edge.
Check out JustDo on GitHub: github.com/justdoinc/justdo - stars are highly appreciated 🙏
- Convert text from files to speech with high-quality results
- Direct PDF document support with automatic text extraction
- Multiple voice options (alloy, echo, fable, onyx, nova, shimmer)
- Robust command-line interface with intuitive options
- Smart handling of large files with automatic chunking and stitching
- Cost estimation before processing to avoid surprises
- Comprehensive test suite with 100% client-side test coverage
- Advanced error handling with automatic retries
- Works with both standard and high-definition TTS models
- Python 3.7+
- FFmpeg (required for audio processing and stitching)
- OpenAI API key
FFmpeg is required for audio processing and file stitching. The application will not work properly without it.
# Using Homebrew
brew install ffmpeg
# On Debian/Ubuntu
sudo apt-get update
sudo apt-get install ffmpeg
# Using Chocolatey
choco install ffmpeg
# Or using Scoop
scoop install ffmpeg
To verify installation:
ffmpeg -version
# Clone the repository
git clone https://github.com/theosp/open-ai-text-to-speech.git
cd open-ai-text-to-speech
# Install dependencies
pip install -r requirements.txt
You'll need an OpenAI API key to use the Text-to-Speech service. You can provide it in one of three ways:
- As a command-line argument:
  python generator.py --api-key YOUR_API_KEY
- As an environment variable:
  export OPENAI_API_KEY="your-api-key-here"
- In a .env file in the project directory:
  OPENAI_API_KEY=your-api-key-here
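The precedence among these sources can be sketched as follows. This is an illustrative sketch, not the project's actual code: the flag name mirrors the `--api-key` option above, and the `.env` parsing is a simplified stand-in for a library such as python-dotenv.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional


def resolve_api_key(argv: Optional[List[str]] = None) -> Optional[str]:
    """Resolve the OpenAI API key, checking (in priority order):
    1. the --api-key command-line flag,
    2. the OPENAI_API_KEY environment variable,
    3. a .env file in the current directory (simple KEY=value lines).
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--api-key")
    args, _ = parser.parse_known_args(argv)
    if args.api_key:
        return args.api_key
    if os.environ.get("OPENAI_API_KEY"):
        return os.environ["OPENAI_API_KEY"]
    env_file = Path(".env")
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            if line.startswith("OPENAI_API_KEY="):
                return line.split("=", 1)[1].strip().strip('"')
    return None
```

A command-line flag wins over the environment, and the environment wins over the `.env` file, so an exported variable is never silently overridden by a stale file.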
# Create an input.txt file with your text
echo "Hello, this is a test of the text to speech system." > input.txt
# Generate speech with default settings
python generator.py --api-key YOUR_OPENAI_API_KEY
# Use a different voice
python generator.py --api-key YOUR_API_KEY --voice nova
# Specify input and output files
python generator.py --api-key YOUR_API_KEY --input-file custom.txt --output-file custom.mp3
# Use a different model (higher quality)
python generator.py --api-key YOUR_API_KEY --model tts-1-hd
# Skip the confirmation prompt
python generator.py --api-key YOUR_API_KEY --force
The application automatically handles large text files:
- Text exceeding OpenAI's character limit (4096 characters) is split into smaller chunks
- Each chunk is processed separately
- The resulting audio files are stitched together seamlessly
- The final audio file is saved to the specified output location
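The chunk-and-stitch flow can be sketched roughly as below. This is an illustration under assumptions, not the project's actual implementation: the real chunker may split differently, and the ffmpeg invocation uses the concat demuxer as one common lossless approach.

```python
import subprocess
import tempfile
from typing import List


def chunk_text(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters,
    preferring sentence boundaries and hard-splitting any single
    sentence that exceeds the limit."""
    chunks: List[str] = []
    current = ""
    for sentence in text.split(". "):
        if not sentence:
            continue
        piece = sentence if sentence.endswith(".") else sentence + "."
        piece += " "
        while len(piece) > limit:  # hard-split an oversized sentence
            if current:
                chunks.append(current.rstrip())
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        if len(current) + len(piece) > limit:
            chunks.append(current.rstrip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.rstrip())
    return chunks


def stitch_mp3(parts: List[str], output_file: str) -> None:
    """Concatenate MP3 files losslessly via ffmpeg's concat demuxer."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for path in parts:
            f.write(f"file '{path}'\n")
        list_path = f.name
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", output_file],
        check=True,
    )
```

Stream copy (`-c copy`) avoids re-encoding, so the stitched output keeps the quality of the individual chunks.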
Before processing, you'll see information about:
- Text length
- Number of chunks required
- Estimated cost based on character count and model
- Option to confirm or cancel the operation
Instead of passing your API key as a command line argument, you can set it as an environment variable:
export OPENAI_API_KEY="your-api-key-here"
python generator.py
Or use a .env file in the project directory:
OPENAI_API_KEY=your-api-key-here
The application calculates cost based on OpenAI's pricing:
- Standard model (tts-1): $0.015 per 1,000 characters
- High-definition model (tts-1-hd): $0.030 per 1,000 characters
You'll see the estimated cost before processing, giving you a chance to cancel if needed.
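The estimate itself is simple enough to sketch from the pricing table above (a minimal illustration; the project's own estimator may round or format differently):

```python
# Pricing per 1,000 characters, per the rates listed above.
PRICE_PER_1K = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_cost(text: str, model: str = "tts-1") -> float:
    """Return the estimated USD cost of synthesizing `text`."""
    return len(text) / 1000 * PRICE_PER_1K[model]
```

For example, a 4,000-character input on the standard model comes to about $0.06, while the same text on tts-1-hd doubles to about $0.12.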
This project has extensive test coverage for both server-side and client-side code.
# Navigate to the project directory
cd open-ai-text-to-speech
# Install npm dependencies if not already installed
npm install
# Run client-side tests
npm test
# Run Python tests
pytest tests/
# Run tests with coverage report
pytest --cov=app --cov=generator --cov=utils
For more detailed information about testing practices in this project, see the testing guides included in the repository.
These guides include best practices for:
- Asynchronous testing
- DOM manipulation testing
- API mocking
- Proper test structure and organization
The application handles various error scenarios:
- Missing API key
- Authentication errors
- Rate limiting
- Network timeouts
- File I/O errors
- Large file processing issues
Contributions are welcome! Here's how you can help improve this project:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes
- Run the tests to ensure everything works:
python run_tests.py --coverage
- Submit a pull request
- Follow PEP 8 style guidelines
- Write tests for new features
- Maintain or improve code coverage (currently at 78%)
- Document new features in the README
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for their Text-to-Speech API
- FFmpeg project for audio processing capabilities
- Contributors to all the open source libraries used in this project
In addition to the command-line tool, this project now includes a web interface built with Flask, making it easier to use the text-to-speech generation capabilities.
- User-friendly web interface for text input
- Support for all available voices and models
- Preview of estimated cost before processing
- Storage of generation history
- Audio playback, download, and management
The application requires an OpenAI API key to function. You can set this up in one of two ways:
- Copy the example environment file and edit it:
  cp .env.example .env
  # Edit .env with your OpenAI API key
- Export the environment variable directly:
  export OPENAI_API_KEY=your-api-key-here
The easiest way to run the application is using Docker, which eliminates any dependency issues:
- Make sure you have Docker and Docker Compose installed.
- Set your OpenAI API key as an environment variable:
  export OPENAI_API_KEY=your-api-key-here
  Or create a .env file in the project directory with the following content:
  OPENAI_API_KEY=your-api-key-here
- Build and start the Docker container:
  # If using Docker Compose V2
  docker compose up -d
  # If using Docker Compose V1
  docker-compose up -d
- Open your web browser and navigate to:
  http://localhost:5001/
- To stop the container:
  # If using Docker Compose V2
  docker compose down
  # If using Docker Compose V1
  docker-compose down
Note: The Docker setup has been tested and works correctly. The application will be available on port 5001, and all generated files will be stored in the output directory, which is mounted as a volume in the container.
If you prefer to run the application locally:
- Make sure you have installed all the required dependencies:
  pip install -r requirements.txt
- Set your OpenAI API key as an environment variable:
  export OPENAI_API_KEY=your-api-key-here
  Or use a .env file in the project directory with the following content:
  OPENAI_API_KEY=your-api-key-here
- Start the Flask application:
  python app.py
- Open your web browser and navigate to:
  http://localhost:5001/
- Enter the text you want to convert to speech in the text area
- Select your preferred voice and model
- Click the "Preview Cost" button to see the estimated cost before proceeding
- Click "Generate Speech" to convert your text to speech
- When processing is complete, you'll be redirected to a results page where you can:
- Play the generated audio
- Download the MP3 file
- View processing details
- Visit the History page to access all your previously generated audio files
Generated audio files are stored in the output directory. The application maintains a history of all generations in a JSON file.
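The history file could be maintained with a pattern like the following. The file location (`output/history.json`) and the record fields are assumptions for illustration; the actual schema may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

HISTORY_FILE = Path("output/history.json")  # assumed location


def record_generation(text: str, voice: str, model: str,
                      output_file: str) -> None:
    """Append one generation record to the JSON history file."""
    history = []
    if HISTORY_FILE.exists():
        history = json.loads(HISTORY_FILE.read_text())
    history.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "text_preview": text[:100],   # keep history compact
        "voice": voice,
        "model": model,
        "output_file": output_file,
    })
    HISTORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    HISTORY_FILE.write_text(json.dumps(history, indent=2))
```

Reading the whole file, appending, and rewriting keeps the format trivially simple; for a single-user tool that trade-off is usually fine.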