With your cluster prerequisites in place (GPU Operator + DRA Driver), install the GreenThread engine via Helm.
This page covers the engine only. To layer the UI and customer portal on top, install LiquidCompute and AI Console afterwards.
Pull credentials
The chart and the engine images all live on licence.greenthread.ai — see Licensing › Pulling charts and images for the URL convention. Two setup steps, both using your licence token:
# 1. Helm registry login (one-time, per workstation)
helm registry login licence.greenthread.ai \
--username licence \
--password <your-licence-token>
# 2. In-cluster image-pull secret (the namespace must exist first)
kubectl create namespace greenthread-system
kubectl create secret docker-registry greenthread-registry \
-n greenthread-system \
--docker-server=licence.greenthread.ai \
--docker-username=licence \
--docker-password=<your-licence-token>
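Typing the token inline leaves it in your shell history. A minimal sketch of the same two steps that reads the token from an environment variable and can be re-run safely; `GT_LICENCE_TOKEN` and the function name are assumptions made for this example, not names the chart defines. The kubectl step still passes the token as a process argument.

```shell
# Sketch only: GT_LICENCE_TOKEN and setup_pull_credentials are hypothetical names.
setup_pull_credentials() {
  local registry="licence.greenthread.ai"
  local namespace="greenthread-system"

  # Stop early if the token variable is unset or empty.
  if [ -z "${GT_LICENCE_TOKEN:-}" ]; then
    echo "export GT_LICENCE_TOKEN first" >&2
    return 1
  fi

  # 1. Helm registry login, with the token supplied on stdin.
  printf '%s' "$GT_LICENCE_TOKEN" |
    helm registry login "$registry" --username licence --password-stdin

  # 2. Namespace and pull secret, rendered client-side and then applied,
  #    so a second run does not fail with "already exists".
  kubectl create namespace "$namespace" --dry-run=client -o yaml |
    kubectl apply -f -
  kubectl create secret docker-registry greenthread-registry \
    -n "$namespace" \
    --docker-server="$registry" \
    --docker-username=licence \
    --docker-password="$GT_LICENCE_TOKEN" \
    --dry-run=client -o yaml |
    kubectl apply -f -
}
```

Call `setup_pull_credentials` once after exporting the token. Rendering with `--dry-run=client -o yaml` and piping into `kubectl apply` also lets you rotate the token by re-running the function.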
Install
helm upgrade --install greenthread \
oci://licence.greenthread.ai/greenthread/charts/greenthread \
--namespace greenthread-system \
--set licence.token=<your-licence-token> \
--set controller.huggingFaceToken=<your-hf-token> \
--set 'global.imagePullSecrets[0].name=greenthread-registry'
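The same settings can be kept in a values file, which is easier to review and keeps tokens off the command line. The sketch below assumes the tokens are exported as `GT_LICENCE_TOKEN` and `HF_TOKEN`; those variable names, the function names, and the file name `greenthread-values.yaml` are choices made for this example. The YAML keys mirror the three `--set` flags above.

```shell
# Sketch only: variable names, function names, and file name are assumptions.
write_greenthread_values() {
  local out="${1:-greenthread-values.yaml}"
  # The same three settings as the --set flags, written as YAML.
  cat >"$out" <<EOF
licence:
  token: "${GT_LICENCE_TOKEN:-}"
controller:
  huggingFaceToken: "${HF_TOKEN:-}"
global:
  imagePullSecrets:
    - name: greenthread-registry
EOF
  chmod 600 "$out"   # the file now contains secrets
}

install_greenthread() {
  local values="${1:-greenthread-values.yaml}"
  helm upgrade --install greenthread \
    oci://licence.greenthread.ai/greenthread/charts/greenthread \
    --namespace greenthread-system \
    --values "$values"
}
```

Run `write_greenthread_values` and then `install_greenthread`. Treat the generated file as a secret and keep it out of version control.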
The chart's agent, controller, webhook, and sidecar images all default to licence.greenthread.ai/greenthread/*, so there are no image-repository overrides to pass — the pull secret above is what authenticates them. Image tags default to the chart's appVersion, and the licence server only serves tags you have access to. If you have direct GHCR access instead, point the repos back at ghcr.io/greenthread-ai/* with --set agent.image.repository=… (and the controller / webhook / controller.sidecarImage equivalents).
That's it. The chart installs:
| Workload | Type | What it does |
|---|---|---|
| greenthread-controller | Deployment (2 replicas, leader-elected) | Reconciles Model, GPUShareClaim, GPU, StorageNode. One leader phones home to the licence server. |
| greenthread-webhook | Deployment (2 replicas) | Validating admission webhook for Model + GPUShareClaim. |
| greenthread-agent | DaemonSet (on every node with nvidia.com/gpu.present=true) | Discovers GPUs via NVML, runs per-GPU scheduler daemons, registers as a DRA kubelet plugin, publishes a ResourceSlice. |
Plus the licence Secret (greenthread-licence in the operator namespace) and supporting RBAC.
The standalone storage server has been folded into the per-Model sidecar. There's no gthread-storage DaemonSet to operate, no hugepages to configure, no NVMe host-mount to plan around. Weight staging is handled inline.
Helm values reference
The full chart values.yaml is at greenthread/charts/greenthread/values.yaml. The values most installs care about:
| Value | Default | Purpose |
|---|---|---|
| licence.enabled | true | Set false to skip the licence Secret + RBAC (dev clusters only). |
| licence.token | "" | Customer licence token. Required when licence.enabled=true. |
| licence.serverURL | https://licence.greenthread.ai | Licence server base URL. See Licensing. |
| slotsPerGPU | 10 | How many DRA slots the agent publishes per physical GPU. Higher = finer-grained sharing of small models; lower = whole-GPU workloads. |
| controller.replicas | 2 | Controller replica count (leader-elected). |
| controller.vllmImage | vllm/vllm-openai:v0.19.0 | Default vLLM image stamped on every Model pod when Model.spec.vllmImage is empty. |
| controller.vllmOmniImage | vllm/vllm-omni:v0.20.0 | Same, but for modelType: tts (vllm-omni). |
| controller.huggingFaceToken | "" | Passed to conversion Jobs as HF_TOKEN for gated models (Llama, Gemma, etc.). |
| controller.imagePullSecret | "" | Image-pull Secret attached to every Model pod and copied into every Model namespace. |
| controller.runtimeClassName | "" | Optional runtimeClassName (e.g. nvidia) for GPU pods. |
| agent.nodeSelector | { nvidia.com/gpu.present: "true" } | Which nodes the agent DaemonSet runs on. |
| agent.cdiSpecDir | /var/run/cdi | Host path the DRA driver writes per-claim CDI specs to. Must be in containerd's cdi_spec_dirs. |
| webhook.failurePolicy | Ignore | Set to Fail if you want an unreachable webhook to block all CR writes; keep Ignore if you'd rather risk a brief admission gap during a rollout. |
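To get a feel for slotsPerGPU, it helps to see how many slots a given setting adds up to across the cluster. The sketch below assumes the total is the number of GPU CRs multiplied by slotsPerGPU, which follows from the description above but is not a documented formula. `total_slots` is a hypothetical helper name.

```shell
# Sketch only: total_slots is a hypothetical helper, and the formula
# (GPU count x slotsPerGPU) is inferred from the table, not documented.
total_slots() {
  local slots_per_gpu="${1:-10}"
  local gpus
  # Count non-empty lines on stdin, one per GPU CR.
  gpus=$(grep -c .)
  echo $(( gpus * slots_per_gpu ))
}

# Usage against a live cluster:
#   kubectl get gpus --no-headers | total_slots 10
```

Under that assumption, a 4-GPU cluster with the default of 10 would publish 40 slots.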
Verify
# 1. All pods Running
kubectl get pods -n greenthread-system
# 2. The licence Secret has a cached JWT (within ~1 minute of install)
kubectl get secret greenthread-licence -n greenthread-system -o json \
| jq '.data | keys'
# → ["cached-jwt", "token"]
# 3. GPUs auto-discovered (one GPU CR per physical device)
kubectl get gpus
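For scripted installs, the first two checks can be turned into a blocking wait. This is a sketch: it assumes the Deployment and DaemonSet names match the workload names listed earlier, and the function name, timeout, and polling defaults are arbitrary choices for this example.

```shell
# Sketch only: resource names are assumed to match the workload names in
# the install table; the timeout and polling defaults are arbitrary.
wait_for_greenthread() {
  local tries="${1:-12}"   # how many times to check for the cached JWT
  local delay="${2:-10}"   # seconds between checks
  local ns="greenthread-system"

  # Block until each workload reports a completed rollout.
  kubectl rollout status deployment/greenthread-controller -n "$ns" --timeout=180s || return 1
  kubectl rollout status deployment/greenthread-webhook -n "$ns" --timeout=180s || return 1
  kubectl rollout status daemonset/greenthread-agent -n "$ns" --timeout=180s || return 1

  # Poll the licence Secret until the cached-jwt key appears.
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if kubectl get secret greenthread-licence -n "$ns" -o json | grep -q '"cached-jwt"'; then
      echo "licence JWT cached"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "cached-jwt not present after $tries checks" >&2
  return 1
}
```

The function returns non-zero if a rollout times out or the JWT never appears, so it can gate a CI step.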
Example from a 4-GPU dev cluster (1 dev node + 2 Blackwell workstation nodes):
$ kubectl get pods -n greenthread-system
NAME READY STATUS AGE
greenthread-agent-6rxxs 1/1 Running 4h
greenthread-agent-hhzgj 1/1 Running 4h
greenthread-agent-xv88f 1/1 Running 4h
greenthread-controller-85d5fc9d68-j8kw8 1/1 Running 4h
greenthread-controller-85d5fc9d68-vscpb 1/1 Running 4h
greenthread-webhook-c4cb6887b-5sfb5 1/1 Running 4h
greenthread-webhook-c4cb6887b-5vhhm 1/1 Running 4h
$ kubectl get gpus
NAME NODE INDEX PRODUCT MODE HEALTH AVAILABLE
gpu-blackwell-0-0 blackwell-0 0 NVIDIA-RTX-PRO-4000-Blackwell inference Healthy 4178575360
gpu-blackwell-1-0 blackwell-1 0 NVIDIA-RTX-PRO-4000-Blackwell inference Healthy 3740270592
gpu-gt-gpu-dev-01-0 gt-gpu-dev-01 0 NVIDIA-A40 inference Healthy 2856321024
gpu-gt-gpu-dev-01-1 gt-gpu-dev-01 1 NVIDIA-A40 inference Healthy 51527024640
The AVAILABLE column shows free VRAM in bytes; MODE is the current sharing class published by the DRA driver.
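Raw byte counts are hard to scan, so a small filter can convert the AVAILABLE column to GiB. This sketch assumes AVAILABLE is the last column, as in the sample output above; `gpu_free_gib` is a hypothetical helper name.

```shell
# Sketch only: assumes AVAILABLE is the last column, as in the sample output.
gpu_free_gib() {
  # Skip the header row, then print NAME and AVAILABLE converted to GiB.
  awk 'NR > 1 { printf "%s %.1f GiB\n", $1, $NF / (1024 * 1024 * 1024) }'
}

# Usage against a live cluster:
#   kubectl get gpus | gpu_free_gib
```

For the second A40 in the example, 51527024640 bytes comes out at roughly 48.0 GiB.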
Next steps
- Deploy your first Model: drop a Model CR, watch it convert, hit it with curl.
- LiquidCompute: add the Projects / Services UI on top.
- AI Console: add customer-facing API keys + playgrounds.
- Monitoring: wire in Prometheus + Grafana.
Troubleshooting
licence.token is required
You ran helm install without --set licence.token=…. Either provide one or pass --set licence.enabled=false to install in unlicenced mode (dev clusters only).
kubectl get secret greenthread-licence shows only token, no cached-jwt
The controller couldn't reach the licence server. Check logs:
kubectl logs -n greenthread-system -l app.kubernetes.io/component=controller \
  --tail=50 | grep -i licence
Common causes: outbound HTTPS blocked by a NetworkPolicy, wrong licence.serverURL, or a revoked / mistyped token.
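To separate a network problem from a token problem, you can test outbound HTTPS from a throwaway pod in the operator namespace. This is a sketch: the pod name and the `curlimages/curl` image are choices made for this example, and the licence server's response to a bare GET is not documented here. Any HTTP status code means the connection succeeded; a timeout or TLS error points at the network. Policies that select the controller's labels specifically will not apply to this pod.

```shell
# Sketch only: the pod name and curl image are choices made for this example.
check_licence_egress() {
  local url="${1:-https://licence.greenthread.ai}"
  # One-shot pod in the operator namespace, so namespace-wide
  # NetworkPolicies apply to it as they do to the controller.
  kubectl run licence-egress-check \
    -n greenthread-system \
    --rm -i --restart=Never \
    --image=curlimages/curl \
    --command -- \
    curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
}
```

Pass your own licence.serverURL as the first argument if you have overridden the default.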
Webhook denies every Model create with "cluster is unlicenced"
Same as above — the cached JWT is missing or expired. The webhook fails closed when the licence reader has no valid cached JWT to verify.
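If the cached-jwt key exists but the webhook still denies requests, it can help to look inside the token. The sketch below assumes the Secret holds a standard compact JWT (header.payload.signature) and that `base64 -d` is available, as in GNU coreutils. `jwt_payload` is a hypothetical helper name, and whether the payload carries an `exp` claim is not confirmed by this page.

```shell
# Sketch only: assumes a standard compact JWT and GNU-style `base64 -d`.
jwt_payload() {
  local payload
  # Take the middle segment and map base64url characters to plain base64.
  payload=$(cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url omits.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Usage against a live cluster (Secret values are themselves base64-encoded):
#   kubectl get secret greenthread-licence -n greenthread-system -o json \
#     | jq -r '.data["cached-jwt"]' | base64 -d | jwt_payload
```

If an `exp` claim is present, it is a Unix timestamp in seconds; a value in the past means the token has expired.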
Agent stuck CrashLoopBackOff
Almost always a CDI spec dir mismatch. Verify agent.cdiSpecDir (default /var/run/cdi) is one of the dirs in your containerd config's cdi_spec_dirs (the GPU Operator usually configures /etc/cdi and /var/run/cdi).
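A quick way to check this on a GPU node is to grep the containerd config for the directory. This sketch assumes the config lives at /etc/containerd/config.toml and that cdi_spec_dirs is written as a single-line array; both vary between installs, so pass your own path if needed. `cdi_dir_configured` is a hypothetical helper name.

```shell
# Sketch only: the default config path is an assumption, and multi-line
# cdi_spec_dirs arrays are not handled.
cdi_dir_configured() {
  local dir="${1:-/var/run/cdi}"
  local config="${2:-/etc/containerd/config.toml}"
  # Find the cdi_spec_dirs line, then look for the quoted directory in it.
  grep -E '^[[:space:]]*cdi_spec_dirs[[:space:]]*=' "$config" |
    grep -qF "\"$dir\""
}

# Usage on the node:
#   cdi_dir_configured /var/run/cdi && echo "ok" || echo "not listed"
```

If the directory is not listed, either add it to cdi_spec_dirs or set agent.cdiSpecDir to a directory that is already there.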
DRA Driver controller Pending on a managed Kubernetes service
Managed Kubernetes services do not expose schedulable control-plane nodes, so a DRA controller pinned to them has nowhere to run. Clear the node selector and affinity, and add a toleration:
--set controller.nodeSelector=null \
--set "controller.affinity=null" \
--set "controller.tolerations[0].key=CriticalAddonsOnly" \
--set "controller.tolerations[0].operator=Exists"
(Pass these to the nvidia-dra-driver-gpu chart, not the GreenThread chart.)
