UK-sovereign AI inference
OpenAI-compatible.
UK-domiciled.
Drop-in replacement for the OpenAI API on dedicated UK GPU infrastructure. Built for AI developers who want predictable pricing, and regulated industries that need a sovereign answer.
Private beta. Email hello@gpubox.ai for a same-day API key.
from openai import OpenAI

client = OpenAI(
    api_key="gpb_...",
    base_url="https://api.gpubox.ai/v1",
)

response = client.chat.completions.create(
    model="qwen2.5-32b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Migrating from OpenAI
Change one URL, swap in your GPUBox API key and model name, and keep the rest of your existing code.
# before
base_url = "https://api.openai.com/v1"
# after
base_url = "https://api.gpubox.ai/v1"
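If you would rather not hard-code the endpoint, the switch can live in configuration. The sketch below is one possible approach, not an official GPUBox helper: the environment variable names GPUBOX_API_KEY and GPUBOX_BASE_URL and the function client_settings are hypothetical names chosen for illustration.

```python
import os
from typing import Mapping, Optional

# Base URL taken from the quick-start above.
DEFAULT_BASE_URL = "https://api.gpubox.ai/v1"


def client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for the OpenAI client from the environment.

    GPUBOX_API_KEY and GPUBOX_BASE_URL are hypothetical variable names;
    use whatever fits your deployment.
    """
    env = os.environ if env is None else env
    return {
        "api_key": env.get("GPUBOX_API_KEY", ""),
        "base_url": env.get("GPUBOX_BASE_URL", DEFAULT_BASE_URL),
    }


# Usage (needs the `openai` package and a real key):
#   from openai import OpenAI
#   client = OpenAI(**client_settings())
```

Keeping the URL in configuration makes it straightforward to point a staging environment at one provider and production at another without touching application code.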
Models
The same names you expect, served from UK GPUs
LLM · chat completions
live · qwen2.5-32b-instruct
Qwen2.5-32B-Instruct (AWQ-int4) on RTX 5090. 8k context. Supports streaming, tool use, JSON mode.
/v1/chat/completions
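Streaming should work the same way as with the OpenAI Python client: pass stream=True and read the text deltas as they arrive. The following is a minimal sketch under that assumption; collect_stream and main are illustrative names, and the chunk shape (choices[0].delta.content) is the one the OpenAI client uses for chat-completion streams.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (for example usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:  # the first and last chunks often have no text
            parts.append(delta)
    return "".join(parts)


def main() -> None:
    # Imported here so collect_stream stays usable without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key="gpb_...", base_url="https://api.gpubox.ai/v1")
    stream = client.chat.completions.create(
        model="qwen2.5-32b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    print(collect_stream(stream))
```

Call main() with a real API key to try it. In an interactive application you would typically print each delta as it arrives rather than joining them at the end.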
Speech-to-text
live · whisper-large-v3-turbo
OpenAI Whisper large-v3-turbo via faster-whisper. Multipart upload, 100+ languages, verbose JSON with segments + timestamps.
/v1/audio/transcriptions
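A transcription request with segment timestamps might look like the sketch below. It assumes the endpoint accepts the same parameters as the OpenAI Python client's audio.transcriptions.create, including response_format="verbose_json", and that each segment exposes start, end, and text. The file name meeting.mp3 and the helper names are placeholders.

```python
from typing import Any, Iterable, List


def format_segments(segments: Iterable[Any]) -> List[str]:
    """Render each segment as '[start - end] text' with times in seconds."""
    return [
        f"[{seg.start:.2f}s - {seg.end:.2f}s] {seg.text.strip()}"
        for seg in segments
    ]


def main() -> None:
    # Imported here so format_segments stays usable without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key="gpb_...", base_url="https://api.gpubox.ai/v1")
    # "meeting.mp3" is a placeholder path; the file is sent as a multipart upload.
    with open("meeting.mp3", "rb") as audio:
        result = client.audio.transcriptions.create(
            model="whisper-large-v3-turbo",
            file=audio,
            response_format="verbose_json",
        )
    for line in format_segments(result.segments or []):
        print(line)
```

Call main() with a real API key and an audio file on disk to try it.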
Embeddings
soon · bge-m3
BAAI BGE-M3. Multilingual, dense + sparse retrieval, 8k context. Coming soon.
/v1/embeddings
Why GPUBox
Sovereign infrastructure, transparent pricing, real models.
Drop-in OpenAI replacement
Change one URL. OpenAI client libraries for Python, Node, and Go, as well as plain curl, work once they point at the new base URL. Migrate in minutes.
UK-sovereign by design
Every inference runs on UK-domiciled hardware operated by a UK-registered company. Banks, public sector, and regulated industries can deploy without a sovereignty audit.
Transparent pricing
Per-token and per-audio-minute rates published openly. No GPU-hour mental math. No surprise bills. £1.00 per million tokens, £0.005 per audio minute.
Real model names
We tell you exactly which model is serving your request. No opaque endpoints. No silent model swaps. You name the model in your code, we serve that model.
Pay-as-you-go
Honest, published rates.
One blended rate per million tokens. One rate per audio minute. No GPU-hour roulette. No mystery bill.
See full pricing →

Chat / LLM
£1.00
per million tokens
Audio / Whisper
£0.005
per audio minute
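To sanity-check a budget against these rates, the arithmetic can be sketched as below. Two assumptions are not confirmed by this page: that the blended token rate applies to input and output tokens combined, and that audio is billed pro rata by the second rather than rounded up to whole minutes. The function name estimate_cost_gbp is illustrative.

```python
from decimal import Decimal

# Published rates from the pricing section.
GBP_PER_MILLION_TOKENS = Decimal("1.00")
GBP_PER_AUDIO_MINUTE = Decimal("0.005")


def estimate_cost_gbp(tokens: int = 0, audio_seconds: float = 0) -> Decimal:
    """Estimate a bill in GBP.

    Assumes tokens means input plus output tokens, and that audio is
    charged pro rata per second (both are assumptions, not stated rates policy).
    """
    token_cost = GBP_PER_MILLION_TOKENS * Decimal(tokens) / Decimal(1_000_000)
    audio_cost = GBP_PER_AUDIO_MINUTE * Decimal(str(audio_seconds)) / Decimal(60)
    return token_cost + audio_cost
```

For example, estimate_cost_gbp(tokens=2_500_000) works out to £2.50, and one hour of audio (audio_seconds=3600) to £0.30.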
Built in the UK. Billed in GBP.
Email us for an API key — we hand them out same-day during the private beta.