Oriora — AI model router: one API, best model per request

Pick your lane

Which lane is yours?

The one thing Oriora always manages is the model choice. From there it's a dial: take just the pick and run the call yourself, or hand us the call too. You pay only for the layers you switch on.

You run the call

1 layer · just the pick

Your key stays on your own infrastructure. Ask us which model fits; we return the best-fit model plus ranked alternatives. You make the call yourself — we never see your key or your output, and your prompt only if you choose to have us classify it.

Pick this if you want maximum privacy or control, or already have your own call setup.

One flat fee per recommendation.

We run the call

2 layers · pick + run

Hand us your vendor key. One request — we pick the best model and run it, with caching, fallback, and retries, then return the output. Point your tool at one endpoint.

Pick this if you want the least work — one endpoint and we do the rest.

Two flat fees per call.

Both run on the same scoring brain — quality, cost, and latency across every supported vendor, within the preferences you set. Two things are yours to set on top: how many layers you hand us (the pick, or the pick and the call), and whether you declare the task or let us classify it for you.

Don't want to wire anything?

Quick start: Oriora routing in your terminal

The easiest way to watch Oriora pick the model for you — a capable AI terminal on your own key, set up in one step.

First time? Open a terminal on your Mac — 3 steps

1. Press ⌘ Command + Space
2. Type Terminal and press Enter
3. Hit Copy on a command below, paste it in, press Enter

Don't want a terminal at all? Use the one-click installer below — double-click, paste your key, done. · On Windows, run the commands in WSL or Git Bash.

One-click installer

Double-click, paste your key, done. Sets everything up for you — no terminal experience needed.

↓ Download for macOS

Already have Claude Code?

Add the Oriora layer to your existing install — nothing re-downloaded.

curl -fsSL https://orioralabs.com/terminal/add-oriora.sh | bash

What it does, plainly

1. Installs Claude Code — Anthropic's terminal AI — straight from Anthropic, not from us.
2. Adds our layer: a few skills, a memory system, and model-routing.
3. Connects your own AI key — your prompts go straight to your provider; we just pick the best model for each job.

Claude Code is a product of Anthropic — we only add the layer + the routing. Not affiliated with or endorsed by Anthropic.

Prefer your own agent? (Hermes, OpenClaw, or any OpenAI tool)

A different setup from the terminal above: here your vendor keys live on your Oriora account (not your Mac), and Oriora runs each call for you. The agent only ever needs your Oriora key. Good if you'd rather not keep keys on your own machine.

1. Install it and point it at Oriora. Copy a line, paste it into your terminal, and press Enter:

Hermes

curl -fsSL https://orioralabs.com/terminal/add-oriora-hermes.sh | bash

OpenClaw

curl -fsSL https://orioralabs.com/terminal/add-oriora-openclaw.sh | bash

2. It asks for your sk_oriora_ key. Create one in Settings.
3. Add your vendor keys (DeepSeek, OpenAI, …) at Settings → Provider keys. That's where they live: on your Oriora account, never in the agent.
4. Use the agent. Oriora picks the best model per task and runs it on your key. Add more vendor keys anytime; it routes across all of them.

Any OpenAI-compatible agent or SDK works the same way — point its base URL at https://api.orioralabs.com/v1, use your Oriora key, model oriora-auto.

From download to chatting — every step

Never used a terminal? This is the whole thing. The wizard does the setup — you just paste your key when it asks.

1. Press Download for macOS above. You get a .zip.
2. Double-click it. It installs Claude Code (Anthropic's terminal AI) plus the Oriora layer, and opens a terminal.
3. A setup wizard runs by itself and asks which provider's key you'll use, from a numbered list (OpenRouter, DeepSeek, MiniMax…).
4. Type the number, then paste that key when asked, and press Enter. The wizard saves it for you, locked, on your Mac. Nothing to file by hand.
5. Optionally paste your sk_oriora_ key so Oriora picks the best model per job, or press Enter to skip.
6. You're in. Chat away: it runs on your own key, and your key never leaves your machine.

One OpenRouter key

Oriora picks the best model for each job from our whole catalogue — every vendor, one key.

One vendor's key

Oriora picks the best of that vendor's models for each job. Either way, the picking is the value.

Note on four vendors: OpenAI, Google, Mistral and Cohere speak OpenAI's API format, which the Claude terminal can't call directly — pick OpenRouter to use those (it carries them). The rest — OpenRouter, DeepSeek, Anthropic, Zhipu, Moonshot, MiniMax, xAI — connect directly.
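If you script the setup, the vendor note above boils down to a small lookup. This is only a sketch: the two vendor lists are copied from that note, while the function name and the lowercase spellings are our own illustration, not part of any Oriora tool.

```python
# Vendor lists copied from the note above. The names used here are illustrative only.
VIA_OPENROUTER = {"openai", "google", "mistral", "cohere"}
DIRECT = {"openrouter", "deepseek", "anthropic", "zhipu", "moonshot", "minimax", "xai"}


def connection_for(vendor: str) -> str:
    """Return 'direct' or 'openrouter' for a vendor name, following the note above."""
    name = vendor.strip().lower()
    if name in DIRECT:
        return "direct"
    if name in VIA_OPENROUTER:
        return "openrouter"
    raise ValueError(f"vendor not covered by the note: {vendor!r}")
```

For example, connection_for("Mistral") says to go through OpenRouter, while connection_for("DeepSeek") says the key connects directly.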

Optional · give your agent more power

Add Computer Use

Want to give your terminal more power? Let your agent see your screen and click and type to run real on-screen tasks (fill a form, organize files, grab a screenshot) while you're away. It runs on your own machine, on your own key, billed per call like everything else.

You stay in control — keep it sandboxed and watch what it does. An agent driving your computer is powerful; treat it like handing over the keyboard.

Claude Code & MCP agents

Add a computer-use MCP server — it gives the agent screenshot, click and type tools.

Hermes & agents with it built in

Just ask it — "install computer use and control my Mac." No config wrangling.

OpenAI-compatible

Works with any tool you already use

Oriora uses the same API shape as OpenAI. Any tool that accepts a custom base URL works today — set it to Oriora's endpoint, drop in your Oriora key, and the tool gets intelligent model routing on your own vendor keys. No code changes, no new SDK. Your vendor keys live on your Oriora account (Settings → Provider keys); the tool only needs your Oriora key.

Claude Code

One-click installer above. An OpenRouter key lets Oriora route across our full catalogue per request; a single vendor key uses that vendor — your own key either way. Docs →

Cursor

AI code editor. Add Oriora as a custom model (Settings → Models). Cursor chat then routes through us to the best model on your keys. (Composer/agent + tab keep using its own models.) Docs → · Watch ↗

Continue.dev

VS Code / JetBrains AI extension. Add Oriora in config.json (apiBase) → chat + inline edits in your editor, routed to the best model on your keys. Docs → · Watch ↗

LiteLLM

Building a stack? Add Oriora as a model in your LiteLLM proxy — everything behind it gets our routing, on your keys. Docs → · Watch ↗

LangChain

Building your own app? ChatOpenAI(base_url="https://api.orioralabs.com/v1") — one line, every call routed to the best model on your keys. Docs → · Watch ↗

LlamaIndex

Building your own app or RAG? OpenAI(api_base="https://api.orioralabs.com/v1") — one line, every call routed on your keys. Docs → · Watch ↗

Vercel AI SDK

Building a web app? createOpenAI({ baseURL: "https://api.orioralabs.com/v1" }) — drop-in, every call routed on your keys. Docs →

Open WebUI

Self-hosted ChatGPT-style chat app. Add Oriora as a connection → every message routes to the best model on your keys. Docs → · Watch ↗

Dify

Visual AI-app builder. Add Oriora as an OpenAI-compatible provider → use it in any app or workflow, routed on your keys. Docs → · Watch ↗

Flowise

Visual LLM-flow builder. Point a ChatOpenAI node at Oriora → your flows route to the best model on your keys. Docs → · Watch ↗

Cloudflare AI Gateway

Add Oriora as a custom provider in your CF AI Gateway → your traffic routes through us to the best model on your keys. Docs →

Any OpenAI SDK

Python / Node / Go / Rust — set the base URL to Oriora, keep your code. Every call routed to the best model on your keys. Docs →

The pattern

base_url = "https://api.orioralabs.com/v1"
api_key  = "<your-oriora-key>"
# Everything else stays the same — model name, messages, stream, all of it.
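If you'd rather see the pattern without any SDK, here is a minimal standard-library sketch. It assumes the /v1 base URL and the oriora-auto model name given for OpenAI-compatible agents earlier on this page, plus the standard OpenAI path /chat/completions. The function names are ours, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.orioralabs.com/v1"


def build_chat_request(api_key: str, messages: list,
                       app: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shaped chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if app:
        # Optional per-app label, as described in the developer reference.
        headers["x-oriora-app"] = app
    body = json.dumps({"model": "oriora-auto", "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply. This performs a network call."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Building and sending are split so you can inspect the exact URL, headers, and body before anything leaves your machine.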

Or — Oriora as an MCP server

Plug Oriora into Claude Desktop, Cursor, or any MCP client and your agent gains one tool — recommend_model — that picks the best model for each task from your router, so it's never locked to one model or guessing which to use. Your agent then runs the call on your own keys; Oriora never sees your prompt or vendor key. $0.001 per recommendation. Nothing to install or self-host — it's hosted at api.orioralabs.com/mcp.

{
  "mcpServers": {
    "oriora": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://api.orioralabs.com/mcp",
               "--header", "Authorization: Bearer ${ORIORA_API_KEY}"]
    }
  }
}

Get your Oriora key at Settings → Oriora API keys. The tool returns the same fields as /api/select (model · provider · alternatives); see the Reference below. What is MCP? →
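Once your agent has a recommendation, it still has to choose something it can actually run. A rough sketch of that step follows. It assumes each entry in alternatives is an object with its own model and provider fields; the page doesn't spell that shape out, so treat it as a guess and check the real response first. The function name is ours.

```python
from typing import Optional


def first_runnable(recommendation: dict, my_providers: set) -> Optional[str]:
    """Return the first recommended model whose provider you hold a key for.

    Assumes each alternative looks like {"model": ..., "provider": ...};
    that shape is a guess, not confirmed by the documentation.
    """
    candidates = [recommendation] + list(recommendation.get("alternatives") or [])
    for candidate in candidates:
        if candidate.get("provider") in my_providers:
            return candidate.get("model")
    return None
```

If the top pick is on a vendor you have no key for, this falls through to the ranked alternatives in order, and returns None when nothing matches.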

Oriora is an independent product — not affiliated with, partnered with, or endorsed by any of the tools listed above. We work with them because Oriora is OpenAI-compatible; anyone can point a compatible client at our endpoint.

If your app knows its job, declare the task and we route on that — we never read your prompt. If it can't — an off-the-shelf agent, a mixed-purpose tool — let us classify the prompt instead: read in memory, stored nowhere. Same router, same scores. You choose whether we read it.

Classify — when you can't declare

For agents & mixed-purpose tools
An off-the-shelf agent can't pass a task label, so we read the prompt to work out what kind of task it is.
Read in memory, stored nowhere
The prompt is sorted into a task type in memory and never written down — the same zero-storage promise as everywhere else.
Local or server-side — privacy vs accuracy
Classify on your machine with a fast rules classifier (your prompt never leaves) for maximum privacy — or send it to our sharper server-side classifier, which catches the ambiguous, mixed prompts that simple rules miss and routes them better. Either way: in memory, stored nowhere.
Same router, same scores
Once the task is known, routing is identical to declaring it — classifying only fills in the label.
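To make "a fast rules classifier" concrete, here is a toy version. It is not Oriora's classifier: the keyword rules below are invented for illustration, and only the task names coding and reasoning come from this page. It returns nothing when the rules disagree or find no match, which is the kind of ambiguous prompt the server-side classifier is described as handling better.

```python
import re
from typing import Optional

# Hypothetical rules for illustration only. The real rules are not published here.
RULES = [
    ("coding", re.compile(r"\b(bug|refactor|function|regex|stack trace|compile)\b", re.I)),
    ("reasoning", re.compile(r"\b(prove|derive|why|explain step by step)\b", re.I)),
]


def classify_locally(prompt: str) -> Optional[str]:
    """Return a task type if exactly one rule matches, else None (unknown or mixed)."""
    hits = [task for task, pattern in RULES if pattern.search(prompt)]
    return hits[0] if len(hits) == 1 else None
```

A None result is your cue to either declare the task yourself or fall back to server-side classification.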

Declare — the clean default

You declare the task
Your app is a code reviewer, or a support bot, or a research tool. It always will be. You pass taskType — we never need to read your prompt.
No black box
Routing is deterministic — the top scorer for your task within the vendors and priority you set. The response tells you which model actually ran.
Zero classification overhead
No ML inference before routing. Your request goes straight to the best model.
Built for apps with a purpose
If your app knows what it does, declare it. That's the only requirement.

The full menu

Three ways the task gets decided — all still client-side (you run the call, your key never leaves, one flat fee). It's a menu, not an either-or: mix them per workload, even per task type.

1

Declare it

Prompt never sent

You pass the task type yourself. No classifier runs — the cleanest, fastest path.

2

Classify locally

Prompt never leaves your machine

The oriora-c2 adapter classifies on your side with a fast rules classifier. Maximum privacy — built for off-the-shelf agents that can’t pass a label.

3

Classify server-side

Read in memory, stored nowhere

Send the prompt to /api/select and our sharper classifier reads it — better on the ambiguous, mixed prompts that simple rules miss.
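In request terms, the menu mostly comes down to which field you put in the /api/select body. A small sketch, assuming the field names task_type and messages from the developer reference further down this page. The helper name is ours.

```python
from typing import Optional


def select_payload(task_type: Optional[str] = None,
                   messages: Optional[list] = None) -> dict:
    """Build the JSON body for /api/select.

    Declare: pass task_type, and no prompt is sent.
    Classify server-side: pass messages instead, and Oriora reads the prompt.
    """
    if (task_type is None) == (messages is None):
        raise ValueError("pass exactly one of task_type or messages")
    if task_type is not None:
        return {"task_type": task_type}
    return {"messages": messages}
```

Classifying locally is the declare path with one step in front: your own classifier produces the task_type, so the prompt still never leaves your machine.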

Privacy

No prompt at rest. Anywhere.

Not a policy. Not a contractual add-on. The architecture itself has nowhere for your prompt to land.

Managed selection

Logs the task type, the model that ran, and the flat fee per request. That's it. Prompt content is never written to disk or any database.

Model gateway

Forwards your request to the model provider. Configured for zero prompt retention — nothing stored in transit.

Model provider

Runs inference and returns a response. Same as calling them directly — the routing layer adds no extra data surface.

Relevant for any privacy-conscious product where prompt content shouldn't pass through a third-party logging layer. Some routing tools store prompts by default and charge extra for zero-data-retention. We don't store them at all.

Powered by Oriora — every call is labeled

Each AI result can carry an honest, un-fakeable credit line — proof that Oriora routed the call and your key was never retained. It rides on the apps you build and spreads wherever Oriora ran — proof, not a logo.

Routed by Oriora

Bring Your Own Memory

Your key is yours. So is your memory.

Bring your own key — and bring your own memory. Your rules, your preferences, your way of working, kept in a file you own. Any Oriora app or agent reads it the moment you act, uses it for that one reply, then forgets it. One memory, everywhere you work — and we never store it.

It stays yours

Your memory lives in a file you own — your repo, your device. You edit the source; the next reply already has it. We keep no copy.

Read per call, stored nowhere

When you act, the memory is added to the prompt for that single call — then it is gone. Nothing of it is written to disk or any database.

One memory, every surface

The same memory follows you across every app and agent you use. Not per-app silos — one you, everywhere.
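As a rough picture of "read per call, stored nowhere": the memory file is read when you act, placed in front of that one request, and not kept afterwards. The sketch below assumes the memory is injected as a system message, which is our guess at the mechanism and not a documented detail. Both function names are ours.

```python
from pathlib import Path


def load_memory(path: str) -> str:
    """Read the memory file fresh on every call, so edits take effect on the next reply."""
    return Path(path).read_text(encoding="utf-8").strip()


def with_memory(messages: list, memory_text: str) -> list:
    """Return a new message list with the memory prepended for this one call.

    The input list is left untouched and nothing is cached or written back.
    Using a system message is an assumption made for this sketch.
    """
    if not memory_text:
        return list(messages)
    return [{"role": "system", "content": memory_text}] + list(messages)
```

Because load_memory runs per call and keeps no state, editing the file is all it takes to change the next reply.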

Up in three steps.

01

Create an account

Sign up at orioralabs.com. Connect your vendor API keys — Oriora charges only its flat per-call fees.

02

Generate a key

Inside your account, generate an sk_oriora_... key. Yours in seconds.

03

Pass your task type

One POST request with taskType declared. Oriora handles everything from there.

Generate your API key →

Opens Settings → Oriora API keys (sign in required).

Client-side — get a model recommendation

Oriora returns the best-fit model; you make the call with your own vendor key. Your key never touches us. Flat $0.001 per recommendation.

curl https://api.orioralabs.com/api/select \
  -H "Authorization: Bearer sk_oriora_..." \
  -H "Content-Type: application/json" \
  -d '{"task_type": "coding_extra_hard"}'

# → { "model": "anthropic/claude-fable-5", "provider": "anthropic",
#     "native_model": "claude-fable-5", "alternatives": [...],
#     "task_type": "coding_extra_hard",
#     "orchestration": { "pattern": "quality-verify",
#                        "verifier_model": "deepseek/deepseek-v4-pro",
#                        "verify_prompt": "Review the answer above for correctness
#                          and completeness, then produce an improved, final version.
#                          Respond with only the final answer." } }
#
# Task tiers: coding → coding_hard → coding_extra_hard (same for reasoning/agentic).
# coding_extra_hard = premium model + a quality-verify recommendation. To use it:
# run the call on "model", then run a second pass on "verifier_model" (a different
# vendor) feeding it your prompt + the first answer + "verify_prompt" — that second
# model reviews and returns the corrected final answer. To disable the recommendation
# account-wide, toggle "Orchestration hints" off in Settings. Oriora runs no extra call.
#
# *_extra_hard is reached by declaration (as above) or by auto-classification when the
# classifier detects high-consequence signals (security vulnerabilities, whole-system
# scope, completeness requirements). orchestration is null on all other task types.
#
# "models" is optional — omit it and Oriora ranks across the full catalogue.
# Prefer not to name the task? Send "messages" instead of "task_type" and
# Oriora classifies the prompt in memory — stored nowhere (see above).
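The quality-verify recommendation described in the comments above can be sketched client-side like this. call_model is a hypothetical function you supply: it takes a model name and a prompt and returns the text answer from your own vendor call. How the prompt, first answer, and verify_prompt are joined into one input is also our choice; the source only says to feed all three to the verifier.

```python
from typing import Callable


def run_with_verify(selection: dict, prompt: str,
                    call_model: Callable[[str, str], str]) -> str:
    """Run the recommended model, then the verifier pass if one is recommended.

    `selection` is the parsed /api/select response.
    `call_model(model, text)` is your own code; Oriora runs no extra call.
    """
    answer = call_model(selection["model"], prompt)
    orchestration = selection.get("orchestration")
    if not orchestration or orchestration.get("pattern") != "quality-verify":
        return answer  # orchestration is null on all other task types
    # Second pass: prompt + first answer + verify_prompt, sent to a different vendor.
    review_input = f"{prompt}\n\n{answer}\n\n{orchestration['verify_prompt']}"
    return call_model(orchestration["verifier_model"], review_input)
```

When orchestration is null the function makes a single call, so the same wrapper can sit in front of every task type.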

Discover task types (no auth): GET /api/select/task-types

Choosing the routing tier

Every request is classified into a routing tier. You can set the tier explicitly with task_type, or let Oriora infer it from your prompt. Explicit task_type is always honored exactly; prompt-based inference is a convenience for when you'd rather not classify yourself.

To request	Set task_type, or signal in your prompt
Standard routing — the best model for the task	The default; no action needed
A higher-capability model for demanding work	Phrasing such as “this is complex / difficult” → the _hard tier
A higher-capability model plus a second, independent model that reviews the answer	Declare task_type: "coding_extra_hard" (or reasoning_/agentic_), or phrasing such as “double-check this”, “must be correct”, “high-stakes”
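If you set the tier in code, the naming rule from the reference (coding → coding_hard → coding_extra_hard, same for reasoning and agentic) is easy to wrap. The sketch below covers only those three families; the label "standard" for the default tier is our own, and the full list of valid task types comes from GET /api/select/task-types.

```python
FAMILIES = ("coding", "reasoning", "agentic")
TIERS = ("standard", "hard", "extra_hard")  # "standard" is our label for the default tier


def tiered_task_type(family: str, tier: str = "standard") -> str:
    """Build a task_type string such as 'coding_hard' from a family and a tier."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return family if tier == "standard" else f"{family}_{tier}"
```

Declaring the result as task_type means the tier is honored exactly, with no dependence on how the prompt happens to be phrased.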

Server-side — routed completion (OpenAI-compatible)

Point your existing OpenAI SDK at Oriora. We route to the best model and run it on your own vendor key (BYOK) — your vendor bills the tokens; Oriora charges only the platform & routing fees.

curl https://api.orioralabs.com/v1/chat/completions \
  -H "Authorization: Bearer sk_oriora_..." \
  -H "x-oriora-app: my-app" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "oriora-auto",
    "messages": [{"role": "user", "content": "Review this PR diff..."}]
  }'

# Returns an OpenAI-shaped chat.completion.
# model:"oriora-auto" lets Oriora route. x-oriora-app is an optional label —
# it keeps each app's usage breakdown and cache/route warmth separate.
# Vendor keys connect account-wide under Settings -> Provider keys.

Server-side — declare the task (plain JSON)

The simplest integration: one POST, no SDK shape. Same routing and BYOK execution as the OpenAI-compatible endpoint — you name the task type yourself. Flat $0.002 per call.

curl https://api.orioralabs.com/api/route \
  -H "Authorization: Bearer sk_oriora_..." \
  -H "Content-Type: application/json" \
  -d '{
    "taskType": "coding",
    "messages": [{"role": "user", "content": "Write a regex for..."}]
  }'

# → { "response": "...", "amount_usd": 0.002, "provider": "...", "latency_ms": ... }
# You declare the task (full list: GET /api/select/task-types). Add
# "classify_task": true to let Oriora read the prompt and escalate the task.
# Three tiers: coding → coding_hard (premium model) → coding_extra_hard
# (premium model + quality-verify orchestration; also works for agentic/reasoning).

Limits & errors

What you can rely on, and what each status code means. Failed BYOK calls are not charged.

Rate limit   60 calls/min per account on the AI endpoints.
             Every response carries standard RateLimit-* headers.
Input cap    800,000 characters (~200k tokens) per request.
Output cap   16,384 tokens (max_tokens above this is clamped).

401  missing or invalid key — these endpoints take sk_oriora_... API
     keys, not website session logins
402  insufficient_funds — top up your wallet at orioralabs.com
403  requires BYOK — connect a vendor key in Settings -> Provider keys
400  unknown task_type — valid values: GET /api/select/task-types
413  input too large (see caps above)
429  rate limited — wait for the RateLimit-Reset header, then retry
502  routing or vendor failure — the call was not charged (BYOK)
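A client can check the published limits before sending and decide how long to back off on a 429. This sketch makes two assumptions the table does not confirm: that the input cap is counted over the characters of each message's content, and that RateLimit-Reset holds a number of seconds. All names are ours.

```python
from typing import Optional

MAX_INPUT_CHARS = 800_000   # input cap from the table above
MAX_OUTPUT_TOKENS = 16_384  # output cap from the table above


def clamp_max_tokens(requested: int) -> int:
    """Mirror the server-side clamp so the client knows the effective limit."""
    return min(requested, MAX_OUTPUT_TOKENS)


def input_too_large(messages: list) -> bool:
    """Estimate whether a request would hit 413.

    Counting only message content is an assumption about how the cap is measured.
    """
    total = sum(len(m.get("content", "")) for m in messages)
    return total > MAX_INPUT_CHARS


def retry_delay(status: int, headers: dict) -> Optional[float]:
    """Seconds to wait before retrying a 429, or None for any other status.

    Assumes RateLimit-Reset is a number of seconds; falls back to 1 if missing.
    """
    if status != 429:
        return None
    return float(headers.get("RateLimit-Reset", 1))
```

Only 429 is treated as wait-and-retry here. Whether to retry a 502 is left to you, since the table says only that it is not charged.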

You know what
you're building.
We route it.

