Docker Files

KITT uses Docker both for its own container image and for launching inference engines as sibling containers. This page documents the project Dockerfile, the monitoring docker-compose stack, and the Docker patterns KITT relies on.

Dockerfile

Location: Dockerfile (project root)

Build stages

The Dockerfile uses a single-stage build based on python:3.12-slim:

FROM python:3.12-slim

# System dependencies for psutil/pynvml compilation and Docker CLI
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc python3-dev docker.io \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Poetry and project dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --no-root --without dev

COPY src/ ./src/
COPY configs/ ./configs/
RUN poetry install --only-root

ENTRYPOINT ["poetry", "run", "kitt"]

Key points:

  • docker.io is installed inside the container so KITT can call the Docker CLI to manage inference engine containers.
  • gcc and python3-dev are needed to compile native extensions for psutil and pynvml.
  • The entry point runs poetry run kitt, so any CLI command can be passed as arguments (e.g. docker run kitt run -m /model -e vllm).
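For example, building locally and passing a command through the entry point (the kitt image tag here is illustrative):

# Build the image from the project root
docker build -t kitt .

# Arguments after the image name are forwarded to the kitt CLI
docker run --rm kitt --help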

Multi-architecture support

The base image python:3.12-slim is published for both amd64 and arm64, so the Dockerfile is architecture-agnostic. Agents build the image locally via kitt-agent build, producing a native image for the host architecture. This avoids cross-architecture issues when the server (amd64) and agents (e.g. arm64 NVIDIA Grace Blackwell systems) differ.
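If you do need an image for a different architecture than the build host, Docker Buildx can cross-build; this is generic Docker usage rather than a KITT-specific workflow, and the tag is illustrative:

docker buildx build --platform linux/arm64 -t kitt:arm64 --load .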

Additional Dockerfiles

File                               Purpose
docker/web/Dockerfile              Web dashboard image (also used by kitt-agent build)
docker/llama_cpp/Dockerfile.spark  llama.cpp build for DGX Spark

docker-compose.yaml (Monitoring)

Location: docker/monitoring/docker-compose.yaml

This stack provides metrics collection and visualization:

Service     Image                    Port  Purpose
prometheus  prom/prometheus:latest   9090  Metrics scraping
grafana     grafana/grafana:latest   3000  Dashboard visualization
influxdb    influxdb:2               8086  Time-series storage for benchmark data

Named volumes persist data across restarts: prometheus_data, grafana_data, influxdb_data.
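One way to bring the stack up and confirm each service is listening, using the standard health endpoints these images expose:

docker compose -f docker/monitoring/docker-compose.yaml up -d

curl -s http://localhost:9090/-/healthy     # Prometheus
curl -s http://localhost:3000/api/health    # Grafana
curl -s http://localhost:8086/ping          # InfluxDB 2.x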

Docker socket mounting

KITT manages inference engines as sibling containers: it calls the Docker CLI from inside its own container to start and stop engine containers on the host daemon. This requires mounting the Docker socket:

docker run -v /var/run/docker.sock:/var/run/docker.sock kitt run ...

Without the socket mount, KITT cannot launch or manage engine containers.
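A quick sanity check is to override the entry point and run the Docker CLI baked into the image; if the socket mount works, this lists the host's containers (kitt tag illustrative):

docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --entrypoint docker \
  kitt ps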

Network mode

All engine containers use --network host so they bind directly to the host network. This avoids port-mapping complexity and lets KITT reach engines at localhost:<port>:

Engine      Default Port
vLLM        8000
llama.cpp   8081
Ollama      11434
ExLlamaV2   8000
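With host networking in place, the engines' HTTP endpoints are reachable directly. For example, assuming the default ports above and each engine's usual API routes:

curl -s http://localhost:8000/v1/models     # vLLM (OpenAI-compatible API)
curl -s http://localhost:8081/health        # llama.cpp server
curl -s http://localhost:11434/api/tags     # Ollama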

GPU passthrough

Engine containers require GPU access. KITT passes --gpus all to Docker when launching engine containers:

docker run --gpus all --network host vllm/vllm-openai:latest ...

Ensure the NVIDIA Container Toolkit is installed on the host system. KITT detects GPU availability through pynvml or by parsing nvidia-smi output.
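A common way to verify the toolkit before launching engines is to run nvidia-smi in a throwaway CUDA container (the CUDA image tag is illustrative; any recent base tag works):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi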

Model volume mounting

Models stored on the host are mounted into engine containers. The host path is typically set via the MODEL_PATH environment variable or the -m CLI flag:

docker run --gpus all --network host \
  -v /models/llama-8b:/models/llama-8b \
  vllm/vllm-openai:latest --model /models/llama-8b
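
Mounting the host directory at the same path inside the container keeps the --model argument valid on both sides. A sketch using the MODEL_PATH variable mentioned above (value illustrative):

MODEL_PATH=/models/llama-8b
docker run --gpus all --network host \
  -v "$MODEL_PATH:$MODEL_PATH" \
  vllm/vllm-openai:latest --model "$MODEL_PATH"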