GreenThread Docs

The GreenThread platform is three independent components that snap together. You install only what you need — every layer below the one you stop at is optional.

Component	Role	Audience
GreenThread	The engine. Kubernetes-native GPU-sharing platform; deploys models, schedules them across GPUs, sleeps idle ones to free VRAM.	Platform team
LiquidCompute	Self-hosted "Render.com for GPUs". Adds a Service abstraction, Projects, custom domains, a Fleet view, and an OpenAI-aggregator proxy on top of the engine.	Platform team / app teams
AI Console	Customer-facing AI portal. API keys, usage dashboards, audit log, and chat / TTS / STT playgrounds in front of LiquidCompute.	End users / customers

How they fit together

Request flow for a chat completion

A customer's app POSTs to https://api.example.com/v1/chat/completions with an API key. The DNS points at AI Console.
AI Console authenticates the key, records the request for usage, and proxies the payload to LiquidCompute's lc-proxy.
lc-proxy selects the right Model based on the request's model field and forwards to the engine pod's sidecar.
The engine's sidecar:
- If the Model is serving, forwards straight to vLLM.
- If the Model is sleeping, restores weights to GPU (usually < 1 s), then forwards.
- If the GPU is full, the fairness policy sleeps an idle Model first.
The streamed response flows back through lc-proxy, AI Console, to the customer.

The customer never sees the engine, the GPU scheduling, or the sleep/wake — they get a standard OpenAI API.

Install order

Always install bottom-up. Each layer assumes the one below it exists.

Stop at step 2 if you only want kubectl apply -f model.yaml and a per-Model URL. The engine is a complete, standalone platform.
Stop at step 3 if you want a self-service Projects/Services UI and a unified /v1/* endpoint, but your end users are internal engineers, not customers.
Go to step 4 when you want a customer-facing portal with API keys, usage, and chat playgrounds — i.e. you're building "our own OpenAI".

Component reference

GreenThread (engine) — the inference engine
LiquidCompute — Render.com-for-GPUs platform
AI Console — customer-facing portal
Licensing — how the cluster authorises itself