Whisper Assistant is an extension for Visual Studio Code and its forks (e.g., Cursor) that transcribes your spoken words into text within the editor. This hands-free approach to coding lets you focus on your ideas instead of your typing.
Whisper Assistant can also be integrated with other powerful AI tools, such as GPT-4 or the Cursor.so application, to create a dynamic, AI-driven development environment.
Whisper Assistant uses OpenAI's Whisper model and offers voice transcription in two ways:
- Local Processing: Free voice transcription using locally installed Whisper models
- API-based Processing: Enhanced transcription via hosted API services
By default, Whisper Assistant uses the base model, which balances accuracy and performance. You can select a different model in the extension settings, but be sure to download your chosen model before using Whisper Assistant; selecting a model that has not been downloaded will lead to errors. The base model is recommended for most users.
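If you want to fetch a model ahead of time, one way is to load it once from Python, which caches it locally. A minimal sketch, assuming the openai-whisper package is installed (see the local-processing setup below):

```python
import whisper

# Downloads the model to ~/.cache/whisper on first use, then loads it.
model = whisper.load_model("base")
print("base model downloaded and cached")
```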
For more details about Whisper, visit the Whisper OpenAI GitHub page.
Install SoX to enable microphone recording through the command line:
- MacOS:
  ```bash
  brew install sox
  ```
- Windows:
  ```bash
  choco install sox.portable
  ```
- Ubuntu/Linux:
  ```bash
  sudo apt install sox pulseaudio
  ```
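To confirm that SoX can reach your microphone, you can make a short test recording. A minimal sketch via Python's subprocess module, assuming sox is on your PATH and a default input device exists:

```python
import subprocess

# "-d" records from the default audio device; "trim 0 3" stops after 3 seconds.
subprocess.run(["sox", "-d", "test.wav", "trim", "0", "3"], check=True)
print("Recorded 3 seconds of audio to test.wav")
```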
If you plan to use local processing, follow these steps:
1. Install Python 3 (version 3.8 or higher recommended)
2. Install pip (it should come bundled with Python)
3. Install Whisper:
   ```bash
   pip install -U openai-whisper
   ```
   This will automatically install PyTorch and other dependencies.
4. Verify the installation:
   ```bash
   whisper --help
   ```
Note on PyTorch: Whisper relies on PyTorch for its neural network operations. When you install Whisper via pip, PyTorch is automatically installed as a dependency. The verification script will check for proper PyTorch installation.
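You can also sanity-check the installation from Python. A minimal sketch, assuming you have an audio file to hand (test.wav below is just a placeholder):

```python
import whisper

model = whisper.load_model("base")     # uses the cached base model
result = model.transcribe("test.wav")  # runs the full transcription pipeline
print(result["text"])                  # the recognized text
```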
For faster transcription with GPU support:
- Ensure you have a compatible NVIDIA GPU
- Install CUDA toolkit (version 11.6+ recommended)
- PyTorch will automatically detect and use CUDA if available
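To see which device Whisper will actually use, you can check and pin it explicitly. A short sketch, assuming openai-whisper and PyTorch are installed:

```python
import torch
import whisper

# Whisper falls back to CPU automatically, but pinning makes the choice explicit.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Loading Whisper on: {device}")
model = whisper.load_model("base", device=device)
```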
Install the Whisper Assistant extension:
- Open VS Code or Cursor.so
- Go to Extensions (Ctrl+Shift+X or Cmd+Shift+X)
- Search for "Whisper Assistant"
- Click "Install"
Whisper Assistant supports multiple API providers for transcription:
- Open VS Code settings (File > Preferences > Settings)
- Search for "Whisper Assistant"
- Configure the following settings:
- API Provider: Choose from:
  - `local` (default, uses locally installed Whisper)
  - `openai` (uses OpenAI's API)
  - `localhost` (uses a local Faster Whisper server)
- API Key: Enter your provider's API key (if applicable)
- Whisper Model: Choose the model size (tiny, base, small, medium, large)
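For reference, the `openai` provider sends your recording to OpenAI's hosted transcription endpoint. The sketch below shows the equivalent standalone API call with the official openai Python package; it is an illustration of the underlying service, not the extension's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "whisper-1" is OpenAI's hosted Whisper model.
with open("recording.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```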
To ensure everything is set up correctly, run the included verification script:
1. Download `setup_whisper_assistant.sh` from the repository
2. Make it executable:
   ```bash
   chmod +x setup_whisper_assistant.sh
   ```
3. Run the script:
   ```bash
   ./setup_whisper_assistant.sh
   ```
The script will verify your environment, checking for:
- Required dependencies (SoX, Python, ffmpeg)
- Whisper installation
- PyTorch installation and CUDA capability
- Recording devices
- Docker (if using local server)
This comprehensive check will help ensure Whisper Assistant is ready to use.
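If you prefer to run the checks by hand, most of them are easy to replicate. A rough sketch of the dependency checks (not the script itself):

```python
import shutil

# Command-line tools the extension depends on.
for tool in ("sox", "ffmpeg", "whisper"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")

# PyTorch and CUDA capability.
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch: MISSING")
```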
- Initialization: Upon loading Visual Studio Code, the extension verifies the correct installation of SoX and Whisper. If any issues are detected, an error message will be displayed. These dependencies must be installed to use the extension.
Once initialization is complete, a quote icon will appear in the bottom right status bar.
- Starting the Recording: Activate the extension by clicking on the quote icon or using the shortcut `Command+M` (for Mac) or `Control+M` (for Windows). You can record for as long as you like, but remember, the longer the recording, the longer the transcription process. The recording time will be displayed in the status bar.
- Stopping the Recording: Stop the recording using the same shortcut (`Command+M` or `Control+M`). The extension icon in the status bar will change to a loading icon, and a progress message will be displayed, indicating that the transcription is underway.
- Transcription: Once the transcription is complete, the text will be saved to the clipboard. This allows you to use the transcription in any program, not just within Visual Studio Code. If an editor is active, the transcription will be pasted there automatically.
Tip: A good microphone will improve transcription accuracy, although it is not a requirement.
Tip: For an optimal experience, consider using the Cursor.so application to directly call the GPT-4 API for code instructions. This allows you to use your voice to instruct GPT-4 to refactor your code, write unit tests, and implement various improvements.
To enhance your development experience with Cursor.so and Whisper Assistant, follow these simple steps:
1. Start the recording: Press `Command+M` (Mac) or `Control+M` (Windows).
2. Speak your instructions clearly.
3. Stop the recording: Press `Command+M` (Mac) or `Control+M` (Windows). Note: This initiates the transcription process.
4. Open the Cursor dialog: Press `Command+K` or `Command+L`. Important: Do this before the transcription completes.
5. The transcribed text will automatically populate the Cursor dialog. Here, you can edit the text or add files/docs, then press `Enter` to execute the GPT query.
By integrating Cursor.so with Whisper Assistant, you can provide extensive instructions without the need for typing, significantly enhancing your development workflow.
If you encounter "ENOENT: no such file or directory" errors:
- Check that all prerequisites are installed correctly
- Ensure the directories used by the extension have proper permissions:
  ```bash
  # Create and set permissions for the temp directory
  mkdir -p ~/.whisper-assistant-vscode/temp
  chmod 755 ~/.whisper-assistant-vscode/temp
  ```
- On Linux systems, make sure PulseAudio is running:
  ```bash
  pulseaudio --start
  ```
If your microphone isn't being detected:
- Make sure your microphone is connected and enabled in your system settings
- Run the verification script to check detected recording devices:
  ```bash
  ./setup_whisper_assistant.sh
  ```
- Try using a different microphone or adjusting your system's audio input settings
If experiencing issues with API providers:
- Verify your API key is correctly entered in the extension settings
- Check that you've selected the right provider
- If using the local provider, ensure Whisper is installed correctly
If encountering problems with PyTorch or GPU acceleration:
- Verify PyTorch installation:
  ```bash
  python3 -c "import torch; print(f'PyTorch {torch.__version__} installed')"
  ```
- Check CUDA availability:
  ```bash
  python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
  ```
- For GPU acceleration issues, ensure compatible CUDA drivers are installed for your GPU
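The two one-liners above can be combined into a slightly fuller diagnostic, for example:

```python
import torch

print(f"PyTorch {torch.__version__} installed")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```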
For contributors who want to work on the extension:
1. Clone the repository:
   ```bash
   git clone https://github.com/martin-opensky/whisper-assistant-vscode.git
   cd whisper-assistant-vscode
   ```
2. Install dependencies:
   ```bash
   npm install
   ```
3. Copy the settings template:
   ```bash
   cp .vscode/settings.template.json .vscode/settings.json
   ```
4. Run the extension in development mode:
   ```bash
   npm run watch
   ```
5. Press `F5` to open a new window with the extension loaded
This extension supports using a local Faster Whisper model through Docker. This provides faster transcription and doesn't require an API key.
- Install Docker on your system
- Build the Docker image:
  ```bash
  docker build -t whisper-assistant-server .
  ```
- Run the container:
  ```bash
  docker run -d -p 4444:4444 --name whisper-assistant whisper-assistant-server
  ```
- Open VSCode settings (File > Preferences > Settings)
- Search for "Whisper Assistant"
- Set "Api Provider" to "localhost"
- Set "Api Key" to any non-empty string (e.g., "local")
- The extension will now use your local Faster Whisper server
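To test the server outside the extension, you can POST an audio file to it directly. The sketch below assumes an OpenAI-compatible transcription route; check server/main.py for the route the container actually serves, as it may differ:

```python
import requests

# NOTE: the endpoint path below is an assumption based on OpenAI-compatible
# servers; consult server/main.py for the actual route.
with open("recording.wav", "rb") as audio_file:
    response = requests.post(
        "http://localhost:4444/v1/audio/transcriptions",
        files={"file": audio_file},
    )
response.raise_for_status()
print(response.json())
```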
The local server uses the "base" model by default. To use a different model, modify the `server/main.py` file:

```python
model = WhisperModel("large-v2", device="cpu", compute_type="int8")
```
Available models:
- tiny
- base
- small
- medium
- large-v2
- large-v3
Note: Larger models require more memory but provide better accuracy.
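For context, the server is built on the faster-whisper package, which is where the `WhisperModel` line above comes from. A minimal standalone sketch of that API:

```python
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus transcription metadata.
segments, info = model.transcribe("recording.wav")
print(f"Detected language: {info.language}")
for segment in segments:
    print(segment.text)
```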
If you have a CUDA-capable GPU, you can modify the Dockerfile to use the GPU version:
```dockerfile
FROM python:3.10-slim

# Install CUDA dependencies
RUN apt-get update && apt-get install -y \
    git \
    ffmpeg \
    cuda-toolkit-11-8 \
    && rm -rf /var/lib/apt/lists/*

# ... rest of Dockerfile ...
```
Then update the model initialization in `server/main.py`:

```python
model = WhisperModel("base", device="cuda", compute_type="float16")
```
Run the container with GPU support:
```bash
docker run -d -p 4444:4444 --gpus all --name whisper-assistant whisper-assistant-server
```
- Check if the server is running:
  ```bash
  curl http://localhost:4444/health
  ```
- View server logs:
  ```bash
  docker logs whisper-assistant
  ```
- If you encounter memory issues, try a smaller model or increase Docker's memory limit.
Please note that this extension has been primarily tested on macOS and Ubuntu 22.04 LTS. While efforts have been made to ensure compatibility, its functionality on other platforms, such as Windows, cannot be fully guaranteed. Pull requests that address issues on those platforms are welcome and appreciated.