Onyx is an open-source AI assistant and knowledge platform with RAG (retrieval-augmented generation), connectors for Slack, Google Drive, Confluence, and more, plus a chat UI. By connecting Onyx to GreenThread, all RAG and chat inference runs on your self-hosted models instead of cloud APIs.
Architecture
Users → Onyx (Chat UI / RAG) → Bifrost or GreenThread Ingress → Models
Onyx uses LiteLLM under the hood, so any OpenAI-compatible endpoint works as an LLM provider. You can point it at:
- Bifrost — if you want auth, logging, and multi-provider routing
- GreenThread ingress — for direct access to your models
Prerequisites
Onyx's Helm chart bundles PostgreSQL, Redis, and nginx as sub-chart dependencies. In most clusters you'll already have these (or prefer your own). Disable them and deploy standalone instances instead.
Create namespace
kubectl create namespace onyx
Deploy PostgreSQL
You can use any PostgreSQL instance — CloudNativePG, a managed service like RDS, or a simple deployment. Here's a minimal deployment for testing:
apiVersion: v1
kind: Secret
metadata:
name: onyx-postgresql
namespace: onyx
stringData:
username: postgres
password: postgres
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: onyx-postgres
namespace: onyx
spec:
replicas: 1
selector:
matchLabels:
app: onyx-postgres
template:
metadata:
labels:
app: onyx-postgres
spec:
containers:
- name: postgres
image: postgres:16-alpine
env:
- name: POSTGRES_USER
value: postgres
- name: POSTGRES_PASSWORD
value: postgres
- name: POSTGRES_DB
value: postgres
ports:
- containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
name: onyx-postgres
namespace: onyx
spec:
selector:
app: onyx-postgres
ports:
- port: 5432
This minimal PostgreSQL deployment has no persistence or backups. For production, use CloudNativePG, Amazon RDS, or another managed PostgreSQL service.
Deploy Redis
apiVersion: apps/v1
kind: Deployment
metadata:
name: onyx-redis
namespace: onyx
spec:
replicas: 1
selector:
matchLabels:
app: onyx-redis
template:
metadata:
labels:
app: onyx-redis
spec:
containers:
- name: redis
image: redis:7-alpine
args: ["--requirepass", "password"]
ports:
- containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
name: onyx-redis
namespace: onyx
spec:
selector:
app: onyx-redis
ports:
- port: 6379
Apply both:
kubectl apply -f onyx-prereqs.yaml
Install Onyx
Add the Helm repo
helm repo add onyx https://onyx-dot-app.github.io/onyx/
helm repo update
Create a values file
Create onyx-values.yaml:
# Disable bundled operators — we deploy our own
redis:
enabled: false
postgresql:
enabled: false
nginx:
enabled: false
# Disable Onyx's built-in inference model server
# (we use GreenThread for LLM inference)
# Keep indexing model server — Onyx requires it for embeddings at startup
inferenceCapability:
replicaCount: 0
# Point at our standalone Redis + Postgres
configMap:
DOMAIN: "onyx.example.com"
WEB_DOMAIN: "https://onyx.example.com"
POSTGRES_HOST: "onyx-postgres"
REDIS_HOST: "onyx-redis"
# Ingress (assumes nginx ingress controller is installed)
ingress:
enabled: true
className: nginx
api:
host: onyx.example.com
webserver:
host: onyx.example.com
# TLS via cert-manager
letsencrypt:
enabled: true
email: admin@example.com
# OpenSearch password
auth:
opensearch:
values:
opensearch_admin_password: "<your-opensearch-password>"
Set POSTGRES_HOST to the Service name of your PostgreSQL instance. If using CloudNativePG, this is typically <cluster-name>-rw. The Onyx chart expects a secret named onyx-postgresql with username and password keys.
Install
helm upgrade --install onyx onyx/onyx \
--namespace onyx \
-f onyx-values.yaml
Wait for all pods to come up:
kubectl get pods -n onyx -w
Connect Onyx to GreenThread
The LLM provider is configured in the Onyx admin UI, not in Helm values.
- Open your Onyx instance (e.g.
https://onyx.example.com) - Create your admin account on first login
- Go to Admin Panel → LLM → Add Custom LLM Provider
Provider settings
| Field | Value |
|---|---|
| Display Name | GreenThread |
| Provider Name | openai |
| API Key | Your Bifrost Virtual Key, or not-needed if connecting to the ingress directly |
| API Base | http://bifrost.bifrost.svc.cluster.local:8080/v1 or http://gthread-ingress.greenthread-system.svc.cluster.local/v1 |
Via Bifrost — use if you want auth, request logging, and cost tracking. Set the API Base to http://bifrost.bifrost.svc.cluster.local:8080/v1 and the API Key to a Bifrost Virtual Key. Prefix model names with greenthread/ (e.g. greenthread/gpt-oss-20b).
Direct to GreenThread — simpler setup. Set the API Base to http://gthread-ingress.greenthread-system.svc.cluster.local/v1 and the API Key to not-needed. Use the HuggingFace model ID as the model name (e.g. Qwen/Qwen3.5-9B).
Add models
Under Model Configurations, add each model you want Onyx to use:
| Model Name | Max Input Tokens |
|---|---|
greenthread/gpt-oss-20b | 128000 |
Qwen/Qwen3-4B-Thinking-2507 | 128000 |
Qwen/Qwen3.5-9B | 128000 |
Set the Default Model to your preferred model for chat.
Click Update to save.
The model names must match what the LLM backend expects. When routing through Bifrost, prefix with the provider name (e.g. greenthread/model-name). When connecting directly to the GreenThread ingress, use the HuggingFace model ID (e.g. Qwen/Qwen3.5-9B).
Verifying the integration
After saving the LLM provider config:
- Go to the Onyx chat interface
- Start a new conversation
- Ask a question — Onyx will send the request to GreenThread via LiteLLM
- If the model is sleeping, GreenThread wakes it automatically (first response may take a few seconds longer)
You can verify requests are flowing by checking the GreenThread ingress logs:
kubectl logs -n greenthread-system deployment/gthread-ingress -f | grep chat/completions
Troubleshooting
Onyx pods in CrashLoopBackOff
The most common cause is PostgreSQL or Redis not being reachable. Check:
# Verify Postgres is up
kubectl get pods -n onyx -l app=onyx-postgres
# Verify the secret exists
kubectl get secret -n onyx onyx-postgresql
# Check Onyx API server logs
kubectl logs -n onyx deployment/onyx-api-server --tail=50
"Model not found" errors in chat
Ensure the model names in the Onyx LLM config exactly match what GreenThread serves. Check available models:
# Direct to ingress
curl http://gthread-ingress.greenthread-system.svc.cluster.local/v1/models
# Or via Bifrost
curl http://bifrost.bifrost.svc.cluster.local:8080/v1/models \
-H "x-provider: greenthread"
Embedding models
Onyx requires an embedding model for document indexing. By default, the Onyx Helm chart deploys its own indexing model server (inferenceCapability). If you set inferenceCapability.replicaCount: 0, Onyx will not be able to index documents unless you configure an embedding model in the LLM admin panel.
For most deployments, leave the default Onyx indexing model server running — it handles embeddings separately from the LLM used for chat.
