GreenThread Docs

Bifrost is an open-source AI gateway that provides rate limiting, authentication, request logging, cost tracking, and multi-provider routing. It connects to GreenThread as an OpenAI-compatible provider, giving your team a managed access layer in front of your self-hosted models.

Why use Bifrost with GreenThread?

GreenThread handles GPU scheduling, model lifecycle, and inference serving. Bifrost sits in front as an API gateway and adds:

Authentication & API keys — issue keys per team or application
Request logging & audit trails — track usage across models and users
Cost tracking — monitor token usage and costs per key
Rate limiting — protect models from request spikes
Multi-provider routing — route between GreenThread and cloud providers (OpenAI, Anthropic, etc.) from a single endpoint
Governance headers — enforce organizational policies on requests

Architecture

Clients → Bifrost (AI Gateway) → GreenThread Ingress → Models

Bifrost connects to the GreenThread ingress service as a custom OpenAI-compatible provider. Every inference request passes through Bifrost first for authentication and logging, then continues to the ingress, which handles model routing, wake/sleep, and proxying to vLLM.

Install Bifrost

Add the Helm repo

helm repo add bifrost https://maximhq.github.io/bifrost/helm-charts
helm repo update

Create namespace

kubectl create namespace bifrost

Create a values file

Create bifrost-values.yaml:

bifrost:
  # Admin dashboard credentials
  authConfig:
    isEnabled: true
    adminUsername: admin
    adminPassword: <your-admin-password>
    # Set to true to skip auth on inference endpoints
    # (useful for internal clusters)
    disableAuthOnInference: true

  client:
    enforceGovernanceHeader: true

  # Connect to GreenThread as an OpenAI-compatible provider
  providers:
    greenthread:
      custom_provider_config:
        base_provider_type: openai
        allowed_requests:
          chat_completion: true
          chat_completion_stream: true
          completion: true
          embeddings: true
          model_list: true
      keys:
        - name: default
          value: not-needed
          weight: 1
      network_config:
        base_url: http://gthread-ingress.greenthread-system.svc.cluster.local/
        default_request_timeout_in_seconds: 300

Internal service URL

The base_url uses the Kubernetes service DNS name for the GreenThread ingress, which resolves only from inside the cluster. This works when Bifrost runs in the same cluster as GreenThread. If Bifrost runs outside the cluster, use the external URL of your GreenThread ingress instead.
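As a small illustration of how that in-cluster URL is put together, the sketch below builds it from a service name and namespace using the standard Kubernetes `<service>.<namespace>.svc.<cluster-domain>` pattern. The helper name `service_base_url` is hypothetical, and the default cluster domain `cluster.local` and port 80 are assumptions that match the values file above; adjust them if your cluster differs.

```python
def service_base_url(service: str, namespace: str, port: int = 80,
                     cluster_domain: str = "cluster.local") -> str:
    """Build an in-cluster HTTP URL: http://<service>.<namespace>.svc.<domain>/."""
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    # Leave the port out when it is the HTTP default, as in the values file.
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/"


print(service_base_url("gthread-ingress", "greenthread-system"))
# http://gthread-ingress.greenthread-system.svc.cluster.local/
```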

Install

helm upgrade --install bifrost bifrost/bifrost \
  --namespace bifrost \
  -f bifrost-values.yaml

Expose Bifrost (optional)

To expose Bifrost externally with an Ingress:

# Add to bifrost-values.yaml
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: bifrost.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts:
        - bifrost.example.com
      secretName: bifrost-tls

Making requests through Bifrost

Once installed, send requests to Bifrost instead of directly to the GreenThread ingress. Bifrost forwards them to GreenThread transparently.

curl

curl https://bifrost.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "greenthread/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
  }'

The model field is prefixed with the provider name (greenthread/) so Bifrost knows which backend to route to.
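The naming convention can be sketched in a few lines of Python. This only illustrates the `provider/model` format described above, not Bifrost's actual parsing code; the helper name `split_model` is hypothetical. Note that only the first slash separates the provider, so model names that contain slashes themselves stay intact.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/model-name' on the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, name


provider, name = split_model("greenthread/meta-llama/Llama-3.1-8B-Instruct")
print(provider)  # greenthread
print(name)      # meta-llama/Llama-3.1-8B-Instruct
```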

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://bifrost.example.com/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="greenthread/meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Governance header

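When enforceGovernanceHeader is enabled, Bifrost requires every inference request to carry a Virtual Key and applies that key's governance policies to the request. disableAuthOnInference: true is a separate setting: it skips the admin login check on inference endpoints only. With both set, as in the values file above, clients need no admin credentials but must still identify themselves with a Virtual Key.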
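A Virtual Key is sent as a request header. The sketch below builds (but does not send) a chat completion request with such a header, using only the Python standard library. The header name `x-bf-vk` is an assumption based on Bifrost's governance documentation, so verify it against the Bifrost version you run. The helper name `build_chat_request` and the `<your-virtual-key>` placeholder are illustrative.

```python
import json
import urllib.request

# Assumption: header name Bifrost reads the Virtual Key from.
# Check the governance docs for your Bifrost version.
VIRTUAL_KEY_HEADER = "x-bf-vk"


def build_chat_request(base_url: str, model: str, prompt: str,
                       virtual_key: str) -> urllib.request.Request:
    """Build a POST to /chat/completions that carries a Virtual Key header."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            VIRTUAL_KEY_HEADER: virtual_key,
        },
        method="POST",
    )


req = build_chat_request(
    "https://bifrost.example.com/v1",
    "greenthread/meta-llama/Llama-3.1-8B-Instruct",
    "Hello!",
    "<your-virtual-key>",  # placeholder: a key issued from the dashboard
)
# To actually send it: urllib.request.urlopen(req)
```

With the OpenAI SDK, the same header can be attached to every request by passing `default_headers={...}` when constructing the client.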

Streaming

Streaming works the same way: set "stream": true in the request body (stream=True in the Python SDK) and Bifrost proxies the SSE stream from GreenThread:

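# Reuses the client created in the Python example above.
stream = client.chat.completions.create(
    model="greenthread/meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or no text (e.g. role-only or final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)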
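For clients that do not use the SDK, the proxied stream can be read directly. The following is a minimal sketch that assumes the stream follows the OpenAI-style SSE format: one `data: <json>` line per chunk, terminated by `data: [DONE]`. The function name `iter_sse_text` and the sample lines are illustrative, not captured from a real GreenThread response.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative sample of what a streamed response might look like.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " upon"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # Once upon
```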

Adding cloud providers alongside GreenThread

Bifrost can route between GreenThread and cloud providers from a single endpoint. Add more providers to your values file:

bifrost:
  providers:
    greenthread:
      # ... (as above)

    openai:
      keys:
        - name: default
          value: sk-...
          weight: 1
      network_config:
        default_request_timeout_in_seconds: 120

    anthropic:
      keys:
        - name: default
          value: sk-ant-...
          weight: 1
      network_config:
        default_request_timeout_in_seconds: 120

Then route requests to different providers by prefixing the model name with the provider:

# Route to GreenThread (self-hosted)
curl ... -d '{"model": "greenthread/meta-llama/Llama-3.1-8B-Instruct", ...}'

# Route to OpenAI
curl ... -d '{"model": "openai/gpt-4o", ...}'

# Route to Anthropic
curl ... -d '{"model": "anthropic/claude-sonnet-4-20250514", ...}'
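From application code, switching backends then comes down to changing the model string. The sketch below keeps the provider-prefixed names from the examples above in one lookup table and builds the request body for a chosen route. The alias names (`self_hosted`, `openai`, `anthropic`) and the helper `chat_payload` are hypothetical and exist only for this illustration.

```python
import json

# Hypothetical aliases mapped to the provider-prefixed model names used above.
ROUTES = {
    "self_hosted": "greenthread/meta-llama/Llama-3.1-8B-Instruct",
    "openai": "openai/gpt-4o",
    "anthropic": "anthropic/claude-sonnet-4-20250514",
}


def chat_payload(route: str, prompt: str) -> dict:
    """Build a chat completion body; only the model string differs per route."""
    if route not in ROUTES:
        raise KeyError(f"unknown route {route!r}; expected one of {sorted(ROUTES)}")
    return {
        "model": ROUTES[route],
        "messages": [{"role": "user", "content": prompt}],
    }


for route in ROUTES:
    print(json.dumps(chat_payload(route, "Hello!")))
```

Keeping the mapping in one place means application code refers to a route alias, and moving a workload between GreenThread and a cloud provider is a one-line change.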

Bifrost dashboard

Bifrost includes a web dashboard for managing keys, viewing logs, and monitoring usage. Access it at your Bifrost URL (e.g. https://bifrost.example.com) and log in with the admin credentials configured in your values.

Configuration reference

Key Bifrost settings relevant to GreenThread:

Setting	Description
`bifrost.providers.greenthread.network_config.base_url`	GreenThread ingress service URL
`bifrost.providers.greenthread.network_config.default_request_timeout_in_seconds`	Request timeout in seconds (increase for long generations or for models that must wake before responding)
`bifrost.authConfig.disableAuthOnInference`	Skip auth on inference endpoints (for internal use)
`bifrost.client.enforceGovernanceHeader`	Require Virtual Keys for API authentication
`bifrost.client.max_request_body_size_mb`	Maximum request body size in MB (default: 100)