Bifrost is an open-source AI gateway that provides rate limiting, authentication, request logging, cost tracking, and multi-provider routing. It connects to GreenThread as an OpenAI-compatible provider, giving your team a managed access layer in front of your self-hosted models.
Why use Bifrost with GreenThread?
GreenThread handles GPU scheduling, model lifecycle, and inference serving. Bifrost sits in front as an API gateway and adds:
- Authentication & API keys — issue keys per team or application
- Request logging & audit trails — track usage across models and users
- Cost tracking — monitor token usage and costs per key
- Rate limiting — protect models from request spikes
- Multi-provider routing — route between GreenThread and cloud providers (OpenAI, Anthropic, etc.) from a single endpoint
- Governance headers — enforce organizational policies on requests
Architecture
Clients → Bifrost (AI Gateway) → GreenThread Ingress → Models
Bifrost connects to the GreenThread ingress service as a custom OpenAI-compatible provider. All inference requests pass through Bifrost for authentication and logging, then on to the ingress, which handles model routing, wake/sleep, and proxying to vLLM.
Install Bifrost
Add the Helm repo
helm repo add bifrost https://maximhq.github.io/bifrost/helm-charts
helm repo update
Create namespace
kubectl create namespace bifrost
Create a values file
Create bifrost-values.yaml:
bifrost:
  # Admin dashboard credentials
  authConfig:
    isEnabled: true
    adminUsername: admin
    adminPassword: <your-admin-password>
    # Set to true to skip auth on inference endpoints
    # (useful for internal clusters)
    disableAuthOnInference: true
  client:
    enforceGovernanceHeader: true
  # Connect to GreenThread as an OpenAI-compatible provider
  providers:
    greenthread:
      custom_provider_config:
        base_provider_type: openai
        allowed_requests:
          chat_completion: true
          chat_completion_stream: true
          completion: true
          embeddings: true
          model_list: true
      keys:
        - name: default
          value: not-needed
          weight: 1
      network_config:
        base_url: http://gthread-ingress.greenthread-system.svc.cluster.local/
        default_request_timeout_in_seconds: 300
The base_url uses the Kubernetes service DNS name for the GreenThread ingress. This works when Bifrost runs in the same cluster. If Bifrost is external, use your ingress's external URL instead.
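The in-cluster URL follows the standard Kubernetes Service DNS pattern, `<service>.<namespace>.svc.<cluster-domain>`. As a minimal sketch, the helper below assembles that URL from its parts. `service_base_url` is a hypothetical name used only for illustration (it is not part of Bifrost or GreenThread), and it assumes the default `cluster.local` cluster domain and plain HTTP on the default port:

```python
def service_base_url(service: str, namespace: str,
                     cluster_domain: str = "cluster.local",
                     scheme: str = "http") -> str:
    """Build an in-cluster base URL from a Service name and namespace.

    Hypothetical helper for illustration only. Assumes the Service listens
    on the scheme's default port, as the base_url in the values file does.
    """
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}/"


# Service and namespace names are taken from the base_url in the values file.
url = service_base_url("gthread-ingress", "greenthread-system")
print(url)  # http://gthread-ingress.greenthread-system.svc.cluster.local/
```

If your cluster uses a non-default cluster domain, or the GreenThread ingress is installed in a different namespace, adjust the arguments accordingly.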
Install
helm upgrade --install bifrost bifrost/bifrost \
  --namespace bifrost \
  -f bifrost-values.yaml
Expose Bifrost (optional)
To expose Bifrost externally with an Ingress:
# Add to bifrost-values.yaml
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: bifrost.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts:
        - bifrost.example.com
      secretName: bifrost-tls
Making requests through Bifrost
Once installed, send requests to Bifrost instead of directly to the GreenThread ingress. Bifrost forwards them to GreenThread transparently.
curl
curl https://bifrost.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "greenthread/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'
The model field is prefixed with the provider name (greenthread/) so Bifrost knows which backend to route to.
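Because GreenThread model IDs often contain slashes themselves (for example `meta-llama/Llama-3.1-8B-Instruct`), only the first path segment names the provider. The sketch below illustrates that naming convention on the client side. `split_model` is a hypothetical helper, and this guide does not show how Bifrost parses the field internally:

```python
def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-id' on the first slash only.

    Hypothetical helper illustrating the naming convention; it is not
    Bifrost's own parsing code.
    """
    provider, sep, model_id = model.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, model_id


provider, model_id = split_model("greenthread/meta-llama/Llama-3.1-8B-Instruct")
print(provider)  # greenthread
print(model_id)  # meta-llama/Llama-3.1-8B-Instruct
```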
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://bifrost.example.com/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="greenthread/meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
When enforceGovernanceHeader is enabled, Bifrost requires requests to authenticate with Virtual Keys. disableAuthOnInference: true is a separate setting: it skips the login configured under authConfig on inference endpoints. With both set, inference clients do not need the admin credentials, while governance is still enforced through Virtual Keys.
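With governance enforced, each request has to carry its Virtual Key. The sketch below prepares a chat completion request with the key attached, using only the standard library. The header name `x-bf-vk` is an assumption about how Bifrost expects Virtual Keys to be sent, so confirm it against the Bifrost documentation for your version. `build_chat_request` and the key value `vk-example` are hypothetical:

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(base_url: str, model: str, prompt: str,
                       virtual_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a chat completion request for Bifrost."""
    headers = {"Content-Type": "application/json"}
    if virtual_key is not None:
        # ASSUMPTION: header name used for Virtual Keys; verify in the Bifrost docs.
        headers["x-bf-vk"] = virtual_key
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=body, headers=headers, method="POST",
    )


req = build_chat_request(
    "https://bifrost.example.com/v1",
    "greenthread/meta-llama/Llama-3.1-8B-Instruct",
    "Hello!",
    virtual_key="vk-example",  # placeholder, not a real key
)
# To actually send it: urllib.request.urlopen(req, timeout=300)
```

With the OpenAI SDK, the same header could be attached once through the client's `default_headers` argument instead of per request.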
Streaming
Streaming works the same way: set "stream": true in the request body (stream=True in the Python SDK) and Bifrost proxies the SSE stream from GreenThread:
stream = client.chat.completions.create(
    model="greenthread/meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text (e.g. role-only or final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
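If you consume the stream without the SDK, the proxied response is a sequence of OpenAI-style server-sent events: `data:` lines holding JSON chunks, terminated by `data: [DONE]`. The following is a minimal sketch of extracting the text deltas from such lines. `iter_sse_content` is a hypothetical helper and the sample events are hand-written for illustration, not captured from a real GreenThread response:

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events (illustrative only).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Once upon"
```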
Adding cloud providers alongside GreenThread
Bifrost can route between GreenThread and cloud providers from a single endpoint. Add more providers to your values file:
bifrost:
  providers:
    greenthread:
      # ... (as above)
    openai:
      keys:
        - name: default
          value: sk-...
          weight: 1
      network_config:
        default_request_timeout_in_seconds: 120
    anthropic:
      keys:
        - name: default
          value: sk-ant-...
          weight: 1
      network_config:
        default_request_timeout_in_seconds: 120
Then route requests to different providers by prefixing the model name with the provider:
# Route to GreenThread (self-hosted)
curl ... -d '{"model": "greenthread/meta-llama/Llama-3.1-8B-Instruct", ...}'
# Route to OpenAI
curl ... -d '{"model": "openai/gpt-4o", ...}'
# Route to Anthropic
curl ... -d '{"model": "anthropic/claude-sonnet-4-20250514", ...}'
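In application code, the provider prefix can be kept in one place so that callers choose a route by name. The sketch below shows one possible way to do that. The `ROUTES` table, its alias names, and `chat_payload` are hypothetical; the model IDs are the ones used in the examples above:

```python
# Hypothetical route aliases mapped to (provider, model ID) pairs.
ROUTES = {
    "self-hosted": ("greenthread", "meta-llama/Llama-3.1-8B-Instruct"),
    "openai": ("openai", "gpt-4o"),
    "anthropic": ("anthropic", "claude-sonnet-4-20250514"),
}


def chat_payload(route: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat completion body whose model field carries the provider prefix."""
    provider, model_id = ROUTES[route]
    return {
        "model": f"{provider}/{model_id}",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


for route in ROUTES:
    print(route, "->", chat_payload(route, "Hello!")["model"])
```

Each payload can be sent to the same Bifrost endpoint; only the model field changes between providers.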
Bifrost dashboard
Bifrost includes a web dashboard for managing keys, viewing logs, and monitoring usage. Access it at your Bifrost URL (e.g. https://bifrost.example.com) and log in with the admin credentials configured in your values.
Configuration reference
Key Bifrost settings relevant to GreenThread:
| Setting | Description |
|---|---|
| bifrost.providers.greenthread.network_config.base_url | GreenThread ingress service URL |
| bifrost.providers.greenthread.network_config.default_request_timeout_in_seconds | Request timeout in seconds (increase for large model responses) |
| bifrost.authConfig.disableAuthOnInference | Skip auth on inference endpoints (for internal use) |
| bifrost.client.enforceGovernanceHeader | Require Virtual Keys for API authentication |
| bifrost.client.max_request_body_size_mb | Max request body size (default: 100 MB) |
