gpubox.ai

UK-sovereign AI inference

OpenAI-compatible.
UK-domiciled.

Drop-in replacement for the OpenAI API on dedicated UK GPU infrastructure. Built for AI developers who want predictable pricing and for regulated industries that need a UK-sovereign option.

Private beta. Email hello@gpubox.ai for a same-day API key.

hello.py — 3 lines to switch from OpenAI
from openai import OpenAI

client = OpenAI(
    api_key="gpb_...",
    base_url="https://api.gpubox.ai/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Migrating from OpenAI

Change the base URL and swap in your GPUBox API key. Keep every other line of your existing code.

# before

base_url = "https://api.openai.com/v1"

# after

base_url = "https://api.gpubox.ai/v1"
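If your code reads the endpoint from configuration instead of hard-coding it, the switch needs no code edit at all. The sketch below is one way to do that, not a required pattern. The environment variable names LLM_API_KEY and LLM_BASE_URL and the helpers client_settings and make_client are hypothetical names chosen for this example.

```python
import os

OPENAI_URL = "https://api.openai.com/v1"
GPUBOX_URL = "https://api.gpubox.ai/v1"


def client_settings(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables.

    LLM_API_KEY and LLM_BASE_URL are hypothetical variable names used only
    in this sketch. If LLM_BASE_URL is unset, the OpenAI URL is used.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env["LLM_API_KEY"],
        "base_url": env.get("LLM_BASE_URL", OPENAI_URL),
    }


def make_client(env=None):
    """Create an OpenAI SDK client pointed at whichever endpoint is configured."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    return OpenAI(**client_settings(env))
```

With LLM_BASE_URL set to https://api.gpubox.ai/v1 and LLM_API_KEY set to your gpb_... key, make_client() returns a client that talks to GPUBox. Unsetting LLM_BASE_URL sends the same code back to OpenAI.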

Models

The same names you expect, served from UK GPUs

LLM · chat completions

live

qwen2.5-32b-instruct

Qwen2.5-32B-Instruct (AWQ-int4) on RTX 5090. 8k context. Supports streaming, tool use, JSON mode.
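Streaming uses the same call with stream=True. The sketch below assumes the endpoint emits chunks in the shape the OpenAI Python SDK expects, which is what an OpenAI-compatible API implies but is not confirmed here. The helper name stream_reply is hypothetical.

```python
def stream_reply(client, prompt, model="qwen2.5-32b-instruct"):
    """Yield text fragments from a streamed chat completion.

    `client` is an OpenAI SDK client (or anything with the same interface).
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

To print tokens as they arrive, loop over stream_reply(client, "Hello!") and call print(piece, end="", flush=True) on each piece.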

/v1/chat/completions

Speech-to-text

live

whisper-large-v3-turbo

OpenAI Whisper large-v3-turbo via faster-whisper. Multipart upload, 100+ languages, verbose JSON with segments + timestamps.
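A transcription request through the OpenAI Python SDK might look like the sketch below. It assumes the verbose JSON response exposes segments with start, end, and text fields, as recent versions of the SDK do for OpenAI's own endpoint. The helper names transcribe and segment_lines are hypothetical.

```python
def transcribe(client, path, model="whisper-large-v3-turbo"):
    """Upload an audio file and request verbose JSON with segment timestamps."""
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            model=model,
            file=audio,
            response_format="verbose_json",
        )


def segment_lines(result):
    """Format each segment as '[start-end] text', with times in seconds."""
    return [
        f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text.strip()}"
        for seg in result.segments
    ]
```

Calling segment_lines(transcribe(client, "meeting.mp3")) would return one line per segment, where meeting.mp3 stands in for your own audio file.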

/v1/audio/transcriptions

Embeddings

soon

bge-m3

BAAI BGE-M3. Multilingual, dense + sparse retrieval, 8k context.

/v1/embeddings

Why GPUBox

Sovereign infrastructure, transparent pricing, real models.

Drop-in OpenAI replacement

Change one URL. Every OpenAI client, from the Python, Node, and Go libraries to plain curl, works with no other code changes. Migrate in minutes.
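Because the API is plain HTTP, no SDK is strictly needed. The sketch below builds the equivalent of a curl request with Python's standard library. It assumes GPUBox accepts the same Bearer-token Authorization header as OpenAI; the helper name build_chat_request is hypothetical.

```python
import json
import urllib.request


def build_chat_request(api_key, prompt, base_url="https://api.gpubox.ai/v1"):
    """Build (but do not send) a POST request to the chat completions endpoint."""
    body = {
        "model": "qwen2.5-32b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: same auth scheme as the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen, parse the response body with json.load, and read choices[0]["message"]["content"] from the result.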

UK-sovereign by design

Every inference runs on UK-domiciled hardware operated by a UK-registered company. Banks, public sector, and regulated industries can deploy without a sovereignty audit.

Transparent pricing

Per-token and per-audio-minute rates published openly. No GPU-hour mental math. No surprise bills. £1.00 per million tokens, £0.005 per audio minute.

Real model names

We tell you exactly which model is serving your request. No opaque endpoints. No silent model swaps. You name the model in your code, we serve that model.

Pay-as-you-go

Honest, published rates.

One blended rate per million tokens. One rate per audio minute. No GPU-hour roulette. No mystery bill.

See full pricing →

Chat / LLM

£1.00

per million tokens

Audio / Whisper

£0.005

per audio minute
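With two flat rates, a bill can be estimated by hand. The sketch below applies the published rates, assuming the blended token rate covers input and output tokens alike and that totals are rounded to the nearest penny. Both are assumptions of this example, not documented billing rules, and the function name estimate_cost_gbp is hypothetical.

```python
from decimal import Decimal

TOKEN_RATE_GBP = Decimal("1.00") / Decimal(1_000_000)  # GBP per token
AUDIO_RATE_GBP = Decimal("0.005")                      # GBP per audio minute


def estimate_cost_gbp(tokens=0, audio_minutes=0):
    """Estimate a bill in GBP from token and audio-minute usage."""
    cost = (
        Decimal(tokens) * TOKEN_RATE_GBP
        + Decimal(str(audio_minutes)) * AUDIO_RATE_GBP
    )
    # Assumption: round to whole pence.
    return cost.quantize(Decimal("0.01"))
```

For example, 2.5 million tokens plus 120 minutes of audio comes to £2.50 + £0.60 = £3.10, which is what estimate_cost_gbp(tokens=2_500_000, audio_minutes=120) returns.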

Built in the UK. Billed in GBP.

Email us for an API key — we hand them out same-day during the private beta.