
Results & KARR

KARR (Kitt's AI Results Repository) persists all benchmark results. The current backend is a relational database (SQLite by default, PostgreSQL for production). Flat JSON files are still written for convenience, and the legacy Git-backed backend remains available.

Database Storage (Default)

Initialize the Database

Create tables in the default SQLite database (~/.kitt/kitt.db) or in a PostgreSQL instance pointed to by KITT_DB_DSN:

kitt storage init

Results are saved to the database automatically after every kitt run. No extra flags are needed.
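The backend choice described above (SQLite at ~/.kitt/kitt.db unless KITT_DB_DSN points at PostgreSQL) can be pictured with a small sketch. This is not KITT's actual resolution code: the function name resolve_backend and its return shape are assumptions made for illustration, and only the environment variable name and the default path come from this page.

```python
import os

# Default path taken from the text above; the rest of this module is illustrative.
DEFAULT_SQLITE_PATH = "~/.kitt/kitt.db"


def resolve_backend(env=None):
    """Return (backend_name, target) for a hypothetical storage resolver."""
    env = os.environ if env is None else env
    dsn = env.get("KITT_DB_DSN", "").strip()
    if dsn:
        # A DSN is set: treat it as the PostgreSQL target.
        return "postgresql", dsn
    # No DSN: fall back to the default SQLite file in the home directory.
    return "sqlite", os.path.expanduser(DEFAULT_SQLITE_PATH)
```

Passing a plain dict as env makes it easy to check what a given environment would select without touching the real one.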

Import Existing JSON Results

Bring previously exported or flat-file results into KARR:

kitt storage import ./kitt-results/run1/metrics.json
kitt storage import ./kitt-results/               # imports all runs found

Export Results

Export runs from KARR back to JSON files:

kitt storage export --output ./export/
kitt storage export --model llama --engine vllm --output ./export/

List Stored Runs

Browse what is stored in KARR with optional filters:

kitt storage list
kitt storage list --model llama --engine vllm --limit 20

Output includes run ID, model, engine, suite, timestamp, and pass/fail counts.

Database Statistics

Get a high-level summary of KARR contents:

kitt storage stats

This shows total runs, unique models, unique engines, date range, and storage size.

Querying Results

The ResultStore interface supports filtered queries and aggregation, both from the CLI and programmatically.

Filter by Model, Engine, or Suite

kitt storage list --model "Llama-3.1-8B" --engine vllm
kitt storage list --suite performance --limit 5

Aggregation

Group results by model or engine to compare averages:

kitt storage stats --group-by model
kitt storage stats --group-by engine

This is useful for spotting regressions across a fleet of models or engines.
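To make the idea of grouped averages concrete, here is a minimal sketch of the same kind of aggregation over exported run records. The record fields (model, engine, throughput) and the helper average_by are hypothetical; they are not taken from KITT's schema or from the ResultStore interface.

```python
from collections import defaultdict
from statistics import mean


def average_by(runs, key, metric):
    """Average `metric` over runs grouped by `key` (e.g. "model" or "engine")."""
    groups = defaultdict(list)
    for run in runs:
        # Skip runs that lack the grouping field or the metric.
        if key in run and metric in run:
            groups[run[key]].append(run[metric])
    return {name: mean(values) for name, values in groups.items()}


# Hypothetical records; the field names are illustrative only.
runs = [
    {"model": "llama", "engine": "vllm", "throughput": 100.0},
    {"model": "llama", "engine": "vllm", "throughput": 120.0},
    {"model": "llama", "engine": "other", "throughput": 80.0},
]

by_engine = average_by(runs, "engine", "throughput")
```

A drop in one group's average between two snapshots is the kind of signal the grouped view is meant to surface.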

Schema Migrations

When upgrading KITT, apply any pending database migrations:

kitt storage migrate

The current schema version is 2. See the Database Schema Reference for full table documentation.

Flat File Output

Every kitt run writes JSON results to a kitt-results/ directory in the current working directory, regardless of database settings. These files are handy for:

  • Quick inspection with jq or a text editor
  • Archiving to external storage
  • Sharing individual run data without database access

The flat files mirror the content stored in runs.raw_json in the database.
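Because the same JSON lives in runs.raw_json, the stored payloads can also be read straight from a SQLite database. The sketch below assumes only what this page states (a runs table with a raw_json column) and further assumes the column holds JSON text; the helper name load_raw_runs is made up for illustration.

```python
import json
import sqlite3


def load_raw_runs(conn):
    """Decode every runs.raw_json value into a Python object."""
    # Assumes raw_json is stored as JSON text, one payload per run row.
    rows = conn.execute("SELECT raw_json FROM runs").fetchall()
    return [json.loads(raw) for (raw,) in rows]
```

With the default backend, the connection would be opened with sqlite3.connect() on the expanded ~/.kitt/kitt.db path. See the Database Schema Reference for the actual column types before relying on this.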

Comparing Results

CLI Comparison

Compare metrics across two or more benchmark runs:

kitt results compare ./run1 ./run2
kitt results compare ./run1 ./run2 --additional ./run3 --format json

The table output shows min, max, average, standard deviation, and coefficient of variation for each metric. Paths can point to flat-file result directories or exported database runs.
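For reference, the five statistics in the table can be computed as in the sketch below. It assumes the sample standard deviation and defines the coefficient of variation as stddev divided by the mean; this page does not say which standard deviation variant KITT uses, so treat that choice as an assumption.

```python
from statistics import mean, stdev


def summarize(values):
    """Return min, max, average, stddev, and CV for one metric across runs."""
    values = list(values)
    avg = mean(values)
    # Sample standard deviation needs at least two values.
    sd = stdev(values) if len(values) > 1 else 0.0
    # The coefficient of variation is undefined when the mean is zero.
    cv = sd / avg if avg != 0 else None
    return {"min": min(values), "max": max(values), "avg": avg, "stddev": sd, "cv": cv}
```

A low CV indicates that the metric is stable across the compared runs; a high CV suggests the runs differ enough to warrant a closer look.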

Interactive TUI

Launch a side-by-side terminal comparison (requires the cli_ui extra):

kitt compare ./run1 ./run2

Both kitt results compare and kitt compare work with flat-file directories and database-exported results interchangeably.

Git-Backed Storage (Legacy)

Legacy Backend

Git-backed KARR storage is the previous generation (Gen 2). It remains available via --store-karr for backward compatibility, but the database backend is recommended for all new deployments.

Gen 2 stores results in a Git repository with LFS tracking. To use it, initialize a results repository, then pass --store-karr with its path to kitt run:

kitt results init --path ./my-results
kitt run -m /models/llama-7b -e vllm -s standard --store-karr ./my-results
kitt results list --karr ./my-results

Directory Structure (Gen 2)

karr-<fingerprint>/
  <model>/
    <engine>/
      <timestamp>/
        metrics.json
        summary.md
        hardware.json
        config.json
        outputs/          # compressed .jsonl.gz, tracked by Git LFS
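
The layout above is regular enough to walk with a short script, for example when auditing an old Gen 2 repository. This is a sketch that relies only on the <model>/<engine>/<timestamp>/metrics.json nesting shown here; the function name find_gen2_runs is hypothetical and not part of KITT.

```python
from pathlib import Path


def find_gen2_runs(root):
    """Yield (model, engine, timestamp, metrics_path) for each run under root."""
    root = Path(root)
    # Layout from the tree above: <model>/<engine>/<timestamp>/metrics.json
    for metrics in sorted(root.glob("*/*/*/metrics.json")):
        timestamp_dir = metrics.parent
        engine_dir = timestamp_dir.parent
        model_dir = engine_dir.parent
        yield model_dir.name, engine_dir.name, timestamp_dir.name, metrics
```

Point root at the karr-<fingerprint>/ directory itself, since the glob pattern starts at the model level.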

Git repositories are cumbersome to mount and manage in Docker-based deployments; the database backend avoids that overhead.

Next Steps