🎙️ Voice RAG with OpenAI SDK

This script demonstrates how to build a voice-enabled Retrieval-Augmented Generation (RAG) system using OpenAI's SDK and Streamlit. The application allows users to upload PDF documents, ask questions, and receive both text and voice responses using OpenAI's text-to-speech capabilities.

Features

  • Creates a voice-enabled RAG system using OpenAI's SDK
  • Supports PDF document processing and chunking
  • Uses Qdrant as the vector database for efficient similarity search
  • Implements real-time text-to-speech with multiple voice options
  • Provides a user-friendly Streamlit interface
  • Allows downloading of generated audio responses
  • Supports multiple document uploads and tracking

How to Get Started

  1. Clone the GitHub repository:
      git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
      cd awesome-llm-apps/rag_tutorials/voice_rag_openaisdk
  2. Install the required dependencies:
      pip install -r requirements.txt
  3. Set up your API keys:
      OPENAI_API_KEY='your-openai-api-key'
      QDRANT_URL='your-qdrant-url'
      QDRANT_API_KEY='your-qdrant-api-key'
  4. Run the Voice RAG application:
      streamlit run rag_voice.py
  5. Open your web browser and navigate to the URL printed in the console output to interact with the Voice RAG system.
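The three settings from the API-key step can be checked before launching the app. The sketch below is a hypothetical pre-flight check: it assumes the values are exposed as environment variables (for example via `export` or a `.env` loader), which this README does not state, and `missing_settings` is an illustrative helper, not part of `rag_voice.py`.

```python
import os

# The three settings listed in the API-key setup step.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


def missing_settings(env=None):
    """Return the names of required settings that are unset or blank.

    Hypothetical helper. `env` defaults to the process environment;
    passing a plain dict lets you test the check without real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no argument inspects the real environment; an empty list means all three values are set.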

How It Works

  1. Document Processing:

    • Upload PDF documents through the Streamlit interface
    • Documents are split into chunks using LangChain's RecursiveCharacterTextSplitter
    • Each chunk is embedded using FastEmbed and stored in Qdrant
  2. Query Processing:

    • User questions are converted to embeddings
    • The most similar document chunks are retrieved from Qdrant
    • A processing agent generates a clear, spoken-word-friendly response
    • A TTS agent optimizes the response for speech synthesis
  3. Voice Generation:

    • Text responses are converted to speech using OpenAI's TTS
    • Users can choose from multiple voice options
    • Audio can be played directly or downloaded as MP3
  4. Features:

    • Real-time audio streaming
    • Multiple voice personality options
    • Document source tracking
    • Download capability for audio responses
    • Progress tracking for document processing
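To illustrate the document-processing step, here is a simplified, pure-Python version of the recursive splitting idea behind LangChain's RecursiveCharacterTextSplitter: try paragraph breaks first, then line breaks, then spaces, and hard-cut only as a last resort. This is not LangChain's implementation (it keeps no overlap between chunks and drops the separators), and `recursive_split` with its default `chunk_size=200` is hypothetical; the app's actual chunk settings are not given here.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split `text` into chunks of at most `chunk_size` characters.

    Tries the coarsest separator first (paragraphs, then lines, then words)
    and falls back to finer separators only for pieces that are still too
    long. Simplified sketch: no chunk overlap, separators are not preserved.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # Pick the first separator that occurs in the text ("" always matches).
    sep = next((s for s in separators if s in text), "")
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    finer = separators[separators.index(sep) + 1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current.strip():
            chunks.append(current)  # flush the chunk that is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

In the app itself the splitter comes from LangChain, whose class also accepts a `chunk_overlap` so that neighboring chunks share some context.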
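The query-processing step comes down to nearest-neighbor search over embeddings. The sketch below swaps in a toy bag-of-words vector for FastEmbed and a plain Python list for Qdrant, purely to show the ranking logic: embed the question, score every stored chunk by cosine similarity, and keep the top `k`. Each chunk is stored with its source file name, in the spirit of the document source tracking listed above; `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Rank (text, source) chunks against the query; return the k best as (score, text, source)."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), text, source) for text, source in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

A real deployment lets Qdrant perform this search over dense FastEmbed vectors instead of scanning every chunk in Python.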
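OpenAI's speech endpoint caps each request's input at 4096 characters, so long answers may need splitting before voice generation. Whether `rag_voice.py` does this is not stated here; the sketch below is a hypothetical helper, `tts_segments`, that packs whole sentences into segments under the limit and hard-wraps any single sentence that exceeds the limit on its own.

```python
import re

# Documented maximum input length for OpenAI's speech endpoint.
MAX_TTS_CHARS = 4096


def tts_segments(text, limit=MAX_TTS_CHARS):
    """Split text into segments of at most `limit` characters, preferring sentence ends."""
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        # Hard-wrap a sentence that is longer than the limit by itself.
        while len(sentence) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence fits in the current segment
        else:
            segments.append(current)  # start a new segment with this sentence
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment could then be sent to the speech endpoint separately (for example with the official Python SDK's `client.audio.speech.create(model=..., voice=..., input=segment)`) and the resulting audio played or saved in order.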