A Python-based benchmarking tool for comparing the performance of different Ollama models on Next.js development tasks. This tool helps developers evaluate and choose the most efficient model for their coding assistance needs.
(Screenshot: sample benchmark results comparing response times across different models)
- Benchmark multiple Ollama models in a single run
- Measure response times and performance metrics
- Test models against real-world Next.js development scenarios
- Generate detailed performance reports
- Isolated testing environment using Python virtual environment
- Python 3.8 or higher
- Ollama installed and running locally
- At least one Ollama model pulled and ready to use
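If you don't have a model pulled yet, you can fetch one and verify it's available with the standard Ollama CLI, for example:
ollama pull llama2
ollama list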
- Clone the repository:
git clone https://github.com/binoymanoj/ollama-benchmark
cd ollama-benchmark
- Create and activate virtual environment:
Windows:
python -m venv venv
venv\Scripts\activate
Unix/macOS:
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Run the benchmark:
python benchmark.py
The benchmark tool provides an interactive terminal interface for selecting which models to test:
Navigation Controls:
- ↑ or k: Move cursor up
- ↓ or j: Move cursor down
- SPACE: Toggle model selection
- ENTER: Confirm selection and start benchmark
Selection Interface:
Select models using SPACE, navigate with UP/DOWN or j/k. Press ENTER when done.
[ ] llama2
[*] mistral
[ ] codellama
[*] neural-chat
- [ ]: Unselected model
- [*]: Selected model
- Highlighted row: Current cursor position
Note: You can select multiple models to benchmark them against each other in a single run. There is no limit to the number of models you can select, but testing more models will naturally take longer.
Add or modify test cases in benchmark.py:
test_cases = [
{
"name": "Custom Test",
"prompt": "Your custom prompt here"
}
]
Default test cases cover common Next.js development scenarios (a sketch of how these might look in the test_cases format follows this list):
- Component Creation
  - Tests the model's ability to generate responsive React components
  - Evaluates understanding of Next.js patterns
- API Route Implementation
  - Tests knowledge of Next.js API routes
  - Evaluates authentication handling
- Data Fetching
  - Tests understanding of Next.js data fetching methods
  - Evaluates server-side rendering knowledge
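For illustration only, these scenarios could be expressed in the same test_cases structure shown above; the prompts below are hypothetical sketches, not the exact prompts shipped in benchmark.py:
test_cases = [
    {
        "name": "Component Creation",
        "prompt": "Create a responsive navbar component for Next.js using Tailwind CSS"
    },
    {
        "name": "API Route Implementation",
        "prompt": "Write a Next.js API route that handles an authenticated POST request"
    },
    {
        "name": "Data Fetching",
        "prompt": "Fetch data server-side in a Next.js page using getServerSideProps"
    }
]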
The benchmark generates a JSON report with the following structure:
{
"summary": {
"model_name": {
"average_response_time": float,
"min_response_time": float,
"max_response_time": float
}
},
"detailed_results": {
"model_name": {
"response_times": [...],
"responses": [...]
}
}
}
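Because the report is plain JSON, it is easy to post-process. Here is a minimal sketch that ranks models by average response time; the filename benchmark_results.json is an assumption, so check benchmark.py for the actual output path:
import json

# Load the generated report (filename assumed; see benchmark.py for the real path).
with open("benchmark_results.json") as f:
    report = json.load(f)

# Rank models from fastest to slowest by average response time.
ranked = sorted(
    report["summary"].items(),
    key=lambda item: item[1]["average_response_time"],
)

for model, stats in ranked:
    print(
        f"{model}: avg={stats['average_response_time']:.2f} "
        f"min={stats['min_response_time']:.2f} "
        f"max={stats['max_response_time']:.2f}"
    )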
ollama-benchmark/
├── LICENSE
├── README.md
├── benchmark.py
├── requirements.txt
├── screenshots/
└── venv/
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Commit changes:
git commit -m 'Add amazing feature'
- Push to branch:
git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama team for providing the model serving infrastructure
- Next.js community for inspiration on test cases
If you encounter any issues or have questions:
- Check existing GitHub issues
- Create a new issue with a detailed description
- Include your system information and Ollama version
Made with ❤️ by Binoy Manoj