awesome-llm-apps/voice_ai_agents/voice_rag_openaisdk at main · Shubhamsaboo/awesome-llm-apps


🎙️ Voice RAG with OpenAI SDK

This script demonstrates how to build a voice-enabled Retrieval-Augmented Generation (RAG) system using OpenAI's SDK and Streamlit. Users upload PDF documents, ask questions about them, and receive each answer both as text and as speech generated with OpenAI's text-to-speech capabilities.

Features

Creates a voice-enabled RAG system using OpenAI's SDK
Supports PDF document processing and chunking
Uses Qdrant as the vector database for efficient similarity search
Implements real-time text-to-speech with multiple voice options
Provides a user-friendly Streamlit interface
Allows downloading of generated audio responses
Supports multiple document uploads and tracking

How to Get Started

Clone the GitHub repository

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/voice_ai_agents/voice_rag_openaisdk

Install the required dependencies:

pip install -r requirements.txt

Set up your API keys:

Get your OpenAI API key
Set up a Qdrant Cloud account and get your API key and URL
Create a .env file with your credentials:

OPENAI_API_KEY='your-openai-api-key'
QDRANT_URL='your-qdrant-url'
QDRANT_API_KEY='your-qdrant-api-key'
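
As a rough illustration of how a script could consume these values, the sketch below checks that all three are present before anything else runs. It assumes the variables have already been exported into the process environment (for example by a loader such as python-dotenv, which this README does not confirm), and the helper name load_credentials is hypothetical.

```python
import os

# The three variable names come from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


def load_credentials(env=None):
    """Return the required credentials, raising if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    This helper is hypothetical and not part of rag_voice.py.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by the OpenAI or Qdrant client.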

Run the Voice RAG application:

streamlit run rag_voice.py

Open your web browser and navigate to the URL provided in the console output to interact with the Voice RAG system.

How It Works

Document Processing:
- Upload PDF documents through the Streamlit interface
- Documents are split into chunks using LangChain's RecursiveCharacterTextSplitter
- Each chunk is embedded using FastEmbed and stored in Qdrant
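
To make the chunking step concrete, here is a simplified stand-in written with the standard library only. It is not LangChain's RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraphs and sentences; this sketch only shows the idea of fixed-size chunks with overlap. The function name and the default chunk_size and overlap values are assumptions, not settings taken from rag_voice.py.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap.

    Simplified illustration only: the real splitter prefers natural
    boundaries, while this version cuts purely by character count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which usually helps retrieval quality.
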
Query Processing:
- User questions are converted to embeddings
- Similar documents are retrieved from Qdrant
- A processing agent generates a clear, spoken-word friendly response
- A TTS agent optimizes the response for speech synthesis
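
The retrieval step can be sketched with an in-memory list standing in for Qdrant. The example assumes cosine similarity as the distance measure, which is a common choice but is not stated in this README, and the names cosine_similarity and top_k are hypothetical. In the real application the vectors would come from FastEmbed and the search would be performed by Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=3):
    """Return the k (score, text) pairs most similar to the query vector.

    `indexed_chunks` is a list of (text, vector) pairs, an in-memory
    stand-in for the collection that Qdrant would hold.
    """
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The retrieved chunk texts are what the processing agent receives as context when it writes the spoken-word answer.
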
Voice Generation:
- Text responses are converted to speech using OpenAI's TTS
- Users can choose from multiple voice options
- Audio can be played directly or downloaded as MP3
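
A minimal sketch of the speech step, assuming the official openai Python package (v1 or later) and its audio.speech.create endpoint. The model name "tts-1" and the voice "alloy" are illustrative assumptions; this README does not say which model or default voice rag_voice.py uses. The client is passed in as an argument so the function can be exercised without network access.

```python
def synthesize_speech(client, text, voice="alloy", model="tts-1"):
    """Return MP3 bytes for `text` from the OpenAI text-to-speech endpoint.

    `client` is expected to be an openai.OpenAI instance; the model and
    voice defaults are assumptions chosen for illustration.
    """
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        response_format="mp3",
    )
    return response.read()  # raw audio bytes


# Usage (requires the openai package and OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   audio = synthesize_speech(OpenAI(), "Here is your answer.")
#   with open("response.mp3", "wb") as f:
#       f.write(audio)
```

Returning bytes rather than writing a file directly keeps the function usable both for in-browser playback and for a download button.
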
Features:
- Real-time audio streaming
- Multiple voice personality options
- Document source tracking
- Download capability for audio responses
- Progress tracking for document processing
