Tutorial: First Benchmark¶
This tutorial walks through a complete benchmark run -- from checking your hardware to storing results in KARR.
Prerequisites
KITT must be installed and your GPU must be accessible to Docker. See the Installation guide if you haven't set that up yet.
1. Check Your Hardware Fingerprint¶
KITT identifies each machine by a compact fingerprint string. Run the fingerprint command to print it, and pass `--verbose` for a full breakdown of every detected component.
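The original command listing is missing from this page; as a sketch, assuming the subcommand is named `fingerprint` (the name is an assumption, not confirmed by the source):

```shell
# Print the compact hardware fingerprint (subcommand name is an assumption)
kitt fingerprint

# Full breakdown of every detected component
kitt fingerprint --verbose
```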
Tip
The fingerprint is embedded in every result set so you can always trace which hardware produced a given benchmark.
2. Pull the Engine Image¶
Before running benchmarks, make sure the engine's Docker image is available locally. KITT can pull it for you:
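Based on the `kitt engines setup vllm` command referenced in step 3, the pull step looks like:

```shell
# Pull the default vLLM image if it is not already present locally
kitt engines setup vllm
```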
This downloads the default vLLM image (`vllm/vllm-openai:latest`). The first pull may take several minutes depending on your connection.
3. Verify Engine Readiness¶
List all registered engines and their status:
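A sketch, assuming a `list` subcommand alongside the `kitt engines setup` command mentioned below (the exact name is an assumption):

```shell
# Show every registered engine and whether its image is available
kitt engines list
```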
You should see vLLM listed with its image marked as available. If the image column shows "missing", re-run `kitt engines setup vllm`.
4. List Available Benchmarks¶
See what benchmarks KITT ships with:
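The listing command is not preserved on this page; assuming a `benchmarks list` subcommand (the name is an assumption):

```shell
# List all bundled benchmarks, grouped by category (subcommand name is an assumption)
kitt benchmarks list
```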
Output groups benchmarks by category:
| Category | Benchmarks |
|---|---|
| Performance | throughput, latency, memory, warmup_analysis |
| Quality | mmlu, gsm8k, truthfulqa, hellaswag |
Note
Quality benchmarks require the `datasets` extra. Install it with `poetry install -E datasets` if you haven't already.
5. Run a Quick Benchmark¶
The quick suite runs a single throughput benchmark -- ideal for verifying that
everything works before committing to a full evaluation.
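By analogy with the `standard` suite invocation shown under Next Steps, a quick run looks like this (the model path is a placeholder):

```shell
# Run the single-benchmark quick suite against a local model with vLLM
kitt run -m /path/to/model -e vllm -s quick
```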
KITT will:
- Start a vLLM container with your model
- Wait for the health check to pass
- Execute the throughput benchmark
- Tear down the container
- Write results to `kitt-results/` and store them in KARR
Warning
Make sure the model format matches the engine. vLLM accepts safetensors/pytorch; llama.cpp and Ollama require GGUF.
6. View the Results¶
Each run produces a timestamped directory under `kitt-results/` containing:
| File | Contents |
|---|---|
| `metrics.json` | Raw benchmark measurements (tokens/sec, latencies, memory) |
| `hardware.json` | System fingerprint captured at run time |
| `config.json` | Exact configuration used for the run |
| `summary.md` | Human-readable Markdown report |
Open the summary for a quick overview:
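For example, with `cat` (the timestamped directory name is a placeholder):

```shell
# Print the Markdown report from a run
cat kitt-results/<timestamped-run>/summary.md
```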
Or view results as a Rich table in the terminal:
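The terminal-view command is not preserved here; a sketch, assuming a `results show` subcommand (the name is an assumption):

```shell
# Render the run's metrics as a Rich table in the terminal (subcommand name is an assumption)
kitt results show
```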
7. Browse Results in KARR¶
Results are stored in KARR automatically. Initialize the database if this is your first run:
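The initialization command is missing from this page; a sketch, assuming a `karr init` subcommand (the name is an assumption):

```shell
# Create the default SQLite database at ~/.kitt/kitt.db (subcommand name is an assumption)
kitt karr init
```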
Then browse and query your stored results:
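A sketch of browsing, assuming a `karr list` subcommand (the name is an assumption):

```shell
# Browse stored benchmark runs (subcommand name is an assumption)
kitt karr list
```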
You can also import any flat-file results from previous runs:
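A sketch of importing, assuming a `karr import` subcommand that accepts a results directory (both the name and the argument form are assumptions):

```shell
# Import flat-file results from earlier runs into KARR (subcommand name is an assumption)
kitt karr import kitt-results/
```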
Tip
KARR uses SQLite by default (`~/.kitt/kitt.db`) with zero configuration. For production or multi-agent setups, see the KARR concepts page for PostgreSQL configuration.
Next Steps¶
- Run the full `standard` suite: `kitt run -m /path/to/model -e vllm -s standard`
- Compare engines: run the same model on multiple engines, then use `kitt compare`
- Try Docker-based workflows: Docker Quickstart