For teams spending $10K–$200K / month on LLM APIs

Cut your LLM bill 30–75%.
One line of code.
Numbers backed by published research.

A research-backed routing engine that classifies each request in under 1 ms and sends it to the model that actually fits the task, not the most expensive one. Same outputs. Same SLA. Up to 97% cheaper per token, and 30–75% off real bills depending on workload.

GitHub: jewanchen/casca · 📄 Published on Zenodo (DOI) · 🛡️ SLA: refund if savings < 30% · 🌐 Native support for 14 languages
Integration: base_url = "https://api.cascaio.com/v1"
Escape hatch: CASCA_BYPASS=true → direct connection in 5 seconds
Live in < 30 minutes
Prompts never stored or trained on
One-click bypass if anything goes wrong

Routing in real time

Casca classifies every request and dispatches it to the most cost-effective model.

LOW · MED · HIGH · CACHE

Built for Zero-Compromise Cost Optimization

Four systems working together: classify complexity, protect quality, cache answers, and learn automatically.

Complexity-Aware Routing

Every prompt is classified as HIGH, MED, or LOW in under 1ms. Simple queries route to Gemini Flash (up to 97% cheaper per token). Critical analysis stays on GPT-4o or Claude Sonnet. No manual rules to maintain — our production engine handles 14 languages natively, with an L2 MiniLM fallback for ambiguous cases.

160 RULES · 14 LANGUAGES · L1+L2 PIPELINE
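A minimal sketch of what an L1 rule stage could look like, in Python. The regex rules, the `TIER_MODELS` mapping, and the model identifiers are hypothetical placeholders, not Casca's 160 production rules. The L2 MiniLM fallback is not modeled: a prompt that matches no rule returns `None`, and `route` then defaults to the most capable tier.

```python
import re
from typing import Optional

# Hypothetical L1 rules, checked in order: HIGH first so critical work is never downgraded.
RULES = [
    (re.compile(r"\b(analy[sz]e|compare|architect|prove)\b", re.IGNORECASE), "HIGH"),
    (re.compile(r"\b(summari[sz]e|draft|rewrite|translate)\b", re.IGNORECASE), "MED"),
    (re.compile(r"\b(what is|order status|balance|reset password)\b", re.IGNORECASE), "LOW"),
]

# Hypothetical tier-to-model mapping; identifiers are placeholders.
TIER_MODELS = {
    "HIGH": "gpt-4o",
    "MED": "mid-tier-model",
    "LOW": "gemini-flash",
}


def classify(prompt: str) -> Optional[str]:
    """Return the tier of the first matching rule, or None if no L1 rule fires."""
    for pattern, tier in RULES:
        if pattern.search(prompt):
            return tier
    return None  # a real pipeline would hand this prompt to an L2 classifier


def route(prompt: str, default_tier: str = "HIGH") -> str:
    """Pick a model; unmatched prompts fall back to the most capable tier."""
    tier = classify(prompt) or default_tier
    return TIER_MODELS[tier]
```

Defaulting unmatched prompts to HIGH trades some savings for safety; a pipeline with a second-stage classifier would send them there instead.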

SLA Quality Protection

Legal, compliance, and medical prompts are always force-routed to GPT-4o or Claude Sonnet. If quality drops below your threshold for 3 consecutive days, you get a full refund. It is written into the contract, not just promised.

FORCE HIGH · ONE-CLICK ROLLBACK · SLA GUARANTEE
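The force-HIGH override and the 3-consecutive-day trigger can be sketched as two small functions. The keyword list, the idea of a single daily quality score, and the function names are assumptions for illustration; the contract, not this code, defines how quality is measured.

```python
from typing import Sequence

# Hypothetical keywords for domains that must always stay on a HIGH-tier model.
FORCE_HIGH_KEYWORDS = ("contract", "compliance", "gdpr", "diagnosis", "dosage", "lawsuit")


def apply_force_high(prompt: str, tier: str) -> str:
    """Override the classifier's tier when the prompt touches a protected domain."""
    text = prompt.lower()
    if any(keyword in text for keyword in FORCE_HIGH_KEYWORDS):
        return "HIGH"
    return tier


def quality_breach(daily_scores: Sequence[float], threshold: float, days: int = 3) -> bool:
    """True if the daily score stayed below `threshold` for `days` consecutive days."""
    streak = 0
    for score in daily_scores:
        streak = streak + 1 if score < threshold else 0
        if streak >= days:
            return True
    return False
```

Substring matching is deliberately over-inclusive here: a false positive only costs money, while a false negative could send a legal or medical prompt to a cheaper model.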

Semantic Caching

"What is an API?" gets asked 200 times a day. Same question, same answer, zero cost. Our global knowledge cache matches semantically — typos, rephrasing, multilingual variants all hit cache at $0.

FUZZY MATCH · LEVENSHTEIN < 5 · GLOBAL POOL
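The typo-tolerant part of the cache can be illustrated with plain edit distance. This sketch assumes "Levenshtein < 5" means a hit at distance 4 or less on normalized text; `FuzzyCache` is a hypothetical name, and the linear scan is for clarity only. Edit distance alone cannot match a question asked in another language, so multilingual matching would need a different technique (for example, embedding similarity), which this sketch does not attempt.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(prompt.lower().split())


class FuzzyCache:
    """Toy answer cache: a lookup hits if a stored prompt is within max_distance edits."""

    def __init__(self, max_distance: int = 4) -> None:
        self.max_distance = max_distance  # "< 5" means distance <= 4
        self.entries: Dict[str, str] = {}

    def put(self, prompt: str, answer: str) -> None:
        self.entries[normalize(prompt)] = answer

    def get(self, prompt: str) -> Optional[str]:
        """Return the answer for the closest stored prompt, or None on a miss."""
        key = normalize(prompt)
        best, best_distance = None, self.max_distance + 1
        for stored, answer in self.entries.items():
            distance = levenshtein(key, stored)
            if distance < best_distance:
                best, best_distance = answer, distance
        return best
```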

Auto-Learn Flywheel

Ambiguous prompts ("handle this for me", "fix it") enter the AMBIG queue for review, and every resolution trains the engine. Engine versioning is public: v1.2 (26 rules, published paper) → v2.6.2 (160 rules, current production). Your savings compound monthly.

AMBIG QUEUE · v2.6.2 PRODUCTION · COMPOUNDING
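A toy model of the AMBIG-queue loop, assuming each reviewer decision becomes a new exact-match rule. The `Flywheel` class and its methods are hypothetical; how Casca's engine turns resolutions into rules is not described here, and exact-match keys keep the example short.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class Flywheel:
    """Unmatched prompts wait for review; each resolution becomes a rule for future traffic."""
    rules: Dict[str, str] = field(default_factory=dict)  # normalized prompt -> tier
    ambig_queue: Deque[str] = field(default_factory=deque)

    @staticmethod
    def _key(prompt: str) -> str:
        return " ".join(prompt.lower().split())

    def classify(self, prompt: str) -> Optional[str]:
        """Return a known tier, or queue the prompt for review and return None."""
        tier = self.rules.get(self._key(prompt))
        if tier is None:
            self.ambig_queue.append(prompt)
        return tier

    def resolve_next(self, tier: str) -> str:
        """A reviewer labels the oldest queued prompt; the label becomes a rule."""
        prompt = self.ambig_queue.popleft()
        self.rules[self._key(prompt)] = tier
        return prompt
```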

Change one line.
Done this afternoon.

Fully compatible with the OpenAI SDK. No logic changes, no prompt rewriting, no engineering sprint. Swap the base URL (and use your Casca API key), and everything else works unchanged.

100% OpenAI SDK compatible · 0 other code changes · < 1h total setup time
```python
from openai import OpenAI

# Your current code
client = OpenAI(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
)

# Change to this. Nothing else.
client = OpenAI(
    api_key="sk-casca-...",
    base_url="https://api.cascaio.com/v1",  # ✓ Done
)

# Escape hatch: 5 seconds to revert
# export CASCA_BYPASS=true
```
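If you want the escape hatch wired into your own code, one option is to choose the endpoint from the environment when the client is built. This is a sketch under assumptions: it presumes your application reads `CASCA_BYPASS` itself, and `OPENAI_API_KEY` / `CASCA_API_KEY` are hypothetical variable names. It returns keyword arguments rather than a client so it runs without the SDK installed.

```python
import os
from typing import Dict, Mapping, Optional

CASCA_BASE_URL = "https://api.cascaio.com/v1"
OPENAI_BASE_URL = "https://api.openai.com/v1"


def client_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return OpenAI() keyword arguments, honoring a CASCA_BYPASS=true flag."""
    env = os.environ if env is None else env
    bypass = env.get("CASCA_BYPASS", "").strip().lower() == "true"
    if bypass:
        # Direct connection to the provider, skipping Casca entirely.
        return {"api_key": env["OPENAI_API_KEY"], "base_url": OPENAI_BASE_URL}
    # Normal path: route through Casca.
    return {"api_key": env["CASCA_API_KEY"], "base_url": CASCA_BASE_URL}
```

With the `openai` package installed, `client = OpenAI(**client_config())` builds the client. Because the flag is read at construction time, a running process only picks up a changed `CASCA_BYPASS` when it rebuilds the client or restarts.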

Most "save 60%" claims are fiction.
The truth is industry-dependent.

We modeled 8 representative workloads through Casca's v2.6.2 routing engine. The numbers below are what we actually saw — including the industries where Casca saves less. We publish the underperformers because honesty scales better than marketing.

| Industry | Bill reduction | Why this savings level |
|---|---|---|
| Fintech support | 65–75% | Balance, transaction, and KYC lookups dominate; almost all traffic is LOW tier |
| E-commerce CS | 60–72% | Order status, returns, and sizing: the same questions repeat at scale |
| HR helpdesk | 60–75% | Every employee asks the same PTO / benefits / 401k questions |
| Insurance support | 55–68% | Policy lookups and claim status checks are repetitive and simple |
| Marketing / content tools | 41–55% | Mid-tier content generation dominates, with fewer LOW and fewer HIGH requests |
| EdTech / online learning | 38–52% | Mixed Q&A and tutoring give a broad complexity distribution |
| B2B SaaS in-app AI | 30–45% | Mixed reasoning workloads that blend analyze / draft / summarize |
| Developer tools / code gen | 19–31% | Most queries genuinely need reasoning, so there is less to optimize |
Numbers represent modeled workloads using engine v2.6.2 against a GPT-4o flat-rate baseline ($5.00/M tokens). Real customer bills depend on traffic mix, retry policies, and overhead. The published v1.2 paper documents the foundational methodology; v2.6.2 industry benchmark coming soon.
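To see how a routing mix turns into a bill reduction against the flat $5.00/M baseline, here is the arithmetic as a short function. The traffic shares and per-tier prices in the example are hypothetical placeholders, not measured Casca data or quoted provider rates.

```python
from typing import Mapping


def blended_price(mix: Mapping[str, float], price_per_m: Mapping[str, float]) -> float:
    """Average $ per 1M tokens for a traffic mix whose shares sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price_per_m[tier] for tier, share in mix.items())


def bill_reduction(mix: Mapping[str, float], price_per_m: Mapping[str, float],
                   baseline_per_m: float = 5.00) -> float:
    """Fractional saving versus sending every token at the flat baseline rate."""
    return 1.0 - blended_price(mix, price_per_m) / baseline_per_m


# Hypothetical example inputs (not measured data).
example_mix = {"CACHE": 0.05, "LOW": 0.45, "MED": 0.25, "HIGH": 0.25}
example_prices = {"CACHE": 0.00, "LOW": 0.15, "MED": 1.00, "HIGH": 5.00}
```

With these placeholders the blended rate is $1.5675/M, about 68.65% below the baseline. A single flat rate also glosses over separate input and output token prices, which real bills have.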

Intelligent multi-language parsing across 14 languages

🇺🇸 English
🇹🇼 繁體中文
🇨🇳 简体中文
🇯🇵 日本語
🇫🇷 Français
🇰🇷 한국어
🇩🇪 Deutsch
🇪🇸 Español
🇮🇹 Italiano
🇮🇳 हिन्दी
🇸🇦 العربية
🇹🇭 ไทย
🇻🇳 Tiếng Việt
🇮🇩 Bahasa Indonesia

Why not just use OpenRouter, Helicone, or Portkey?

Real comparison, no marketing spin. Casca is a routing engine, not an aggregator or observability layer — and we publish our research.

Capability OpenRouter Helicone Portkey Casca
Auto-route by request complexity Manual rules ✓ Native
Quality SLA with monthly refund ✓ Contractual
Published technical paper + DOI ✓ Zenodo
Per-industry savings benchmark ✓ 8 verticals
Auto-learn from real traffic ✓ Flywheel
14-language native classification EN-only routing N/A Limited ✓ Built-in
Cross-customer semantic cache ✓ + isolation mode
OpenAI SDK-compatible (drop-in) ✓ One line
Bypass switch (≤ 5s rollback) Config-based ✓ Env var
All comparisons based on public documentation as of May 2026. Disagree? Open an issue on GitHub — we'll update this table.

Pay for what you save.
Not what you spend.

Self-serve plans for predictable pricing. Or pick our outcome-based option, where you pay a share of verified savings and nothing if there are none.

Free
$0 / mo
10M tokens included · BYO API keys
No overage — quota-capped
For founders trying Casca on a side project or proof-of-concept.
  • 3-tier intelligent routing
  • Cross-customer semantic cache
  • Real-time savings dashboard
  • Community support (GitHub)
  • Bring your own OpenAI / Anthropic / Google keys
Start free →
Starter
$299 / mo
100M tokens included · BYO keys
Overage: $0.10 / 1M tokens routed
For Series A AI-native startups spending $5K–$30K/mo on LLMs.
  • Everything in Free
  • Email support · 24h response
  • Routing analytics + alerts
  • Quality SLA (automated)
  • Pause subscription anytime
Start 60-day trial →
Scale
From $2,499 / mo
2B tokens · or 12% of savings
Two pricing options · see below
For mid-market teams spending $80K–$300K/mo on LLMs. Choose flat or outcome-based.
  • Everything in Growth
  • Outcome-based pricing available
  • Provider Pool add-on available
  • Dedicated success manager
  • Custom SLA + audit log retention
See options ↓

Scale tier · two ways to pay

Switch anytime
Option A · Predictable
Flat Subscription
$2,499 / mo

2 billion tokens included. Overage at $0.05 / 1M tokens. Predictable monthly bill — easy for procurement to approve, easy for finance to forecast.

Predictable budget line for finance · No data sharing required
Option B · Aligned
Outcome-Based
12% of verified savings

Grant read access to your provider billing. We verify savings against a GPT-4o flat-rate baseline and bill 12% of what you actually save. Floor: $0 if there are no savings. Cap: 1.5× Option A, so your bill never exceeds 1.5× the flat plan.

Floor $0: zero risk if the engine underperforms · Cap 1.5× Option A: no surprise bills · Includes monthly peer-benchmark report
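The pricing rules for the two Scale options can be written as a small calculation. This sketch uses only the figures stated above; applying the 1.5× cap to the $2,499 base fee (rather than to a full Option A bill with overage), and taking verified savings as a given input, are assumptions.

```python
# Figures from the Scale tier description; the cap is assumed to apply to the base fee.
BASE_FEE = 2_499.00          # Option A monthly fee
INCLUDED_TOKENS_M = 2_000.0  # 2B tokens, in millions
OVERAGE_PER_M = 0.05         # $ per 1M tokens beyond the included amount
OUTCOME_RATE = 0.12          # Option B: 12% of verified savings
CAP_MULTIPLE = 1.5           # Option B cap: 1.5x Option A


def option_a_fee(tokens_m: float) -> float:
    """Flat plan: base fee plus overage on tokens beyond the 2B included."""
    overage_m = max(0.0, tokens_m - INCLUDED_TOKENS_M)
    return BASE_FEE + overage_m * OVERAGE_PER_M


def option_b_fee(verified_savings: float) -> float:
    """Outcome plan: 12% of verified savings, floored at $0 and capped at 1.5x Option A."""
    fee = OUTCOME_RATE * max(0.0, verified_savings)
    return min(fee, CAP_MULTIPLE * BASE_FEE)
```

Under these assumptions Option B is cheaper than Option A whenever verified savings stay below about $20,825/mo ($2,499 / 0.12), and it never exceeds $3,748.50.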
Optional add-on · Scale tier and above

Casca Provider Pool — Future-Proof Your AI Stack

One integration. Every major LLM provider, plus every credible second-tier alternative. New models are available the day they launch. When OpenAI changes pricing, you switch in one click. When a new provider beats GPT-4o on price/quality, you benefit immediately. Built for teams that don't want their AI strategy tied to one vendor's roadmap.

OpenAI Anthropic Google Groq Mistral Together AI Fireworks AI Cohere + more
Pre-negotiated rates · Unified DPA · Auto failover · One invoice across all providers
Add-on
+$499 / mo
included in Enterprise
Add to Scale plan →
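Auto failover, in its simplest form, means trying providers in priority order until one succeeds. The sketch below is generic and hypothetical: each provider is a plain callable standing in for a real SDK call, and catching every exception simplifies what a production client would do (retry only on timeouts, rate limits, and server errors).

```python
from typing import Callable, List, Sequence, Tuple

# (provider name, function that sends a prompt and returns the response text)
Provider = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, send in providers:
        try:
            return name, send(prompt)
        except Exception as exc:  # simplification: real code would catch specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```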
Enterprise
Custom annual contract · Outcome-based primary

For teams spending $300K+/mo on LLMs. Outcome pricing is 12–15% of verified savings (lower at higher volumes), and Provider Pool is included by default. Custom SLA, named-account support, private deployment, and custom rate-card negotiation with providers on your behalf are all available.

Provider Pool default
Private deployment available
Quarterly business reviews
Setup fee: $15K–$30K
Contact sales →
💡 The math (e-commerce CS workload, $50K/mo on GPT-4o): Casca routes ~60% of traffic to cheaper models → LLM bill drops to ~$15K. Add the Growth plan: 5B tokens × $0.05 / 1M + $999 ≈ $1,250. Total Casca cost: $1,250/mo. Net savings: $33,750/mo. ROI: 27:1. Other industries see different ratios; see the industry table above.
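The ROI arithmetic in that example can be reproduced in a few lines. The Growth plan's terms are not listed in this section, so the sketch takes the example's inputs as given, reading "5B tokens × $0.05" as $0.05 per 1M tokens. ROI here means net savings divided by the Casca fee, which is how 27:1 follows from $33,750 / $1,250.

```python
from typing import Dict


def roi_summary(original_bill: float, routed_bill: float, metered_tokens_m: float,
                rate_per_m: float, base_fee: float) -> Dict[str, float]:
    """Monthly Casca fee, net savings, and ROI (net savings / Casca fee)."""
    casca_fee = base_fee + metered_tokens_m * rate_per_m
    net_savings = (original_bill - routed_bill) - casca_fee
    return {"casca_fee": casca_fee, "net_savings": net_savings, "roi": net_savings / casca_fee}


# Inputs from the e-commerce example: $50K -> $15K, $999 base, 5B tokens at $0.05 / 1M.
summary = roi_summary(50_000, 15_000, 5_000, 0.05, 999)
```

Unrounded, the fee comes to $1,249 and the ratio to about 27.02:1, consistent with the rounded figures in the example.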
On all BYO-key plans, your API keys never leave your infrastructure, LLM usage is billed directly by OpenAI / Anthropic / Google, and Casca charges only the routing fee.

Designed so you can leave anytime.

The best SaaS commitment isn't lock-in — it's making cancel-anytime so frictionless that customers stay because they want to. Here's exactly what we promise, in plain English.

1. 60-Day Savings Guarantee

If your verified bill reduction is under 30% after 60 days of active use, we refund your subscription fee for the trial period. No questions, no negotiation. Available once per customer account.

2. Cancel Anytime

Cancel from your dashboard at any time. No annual lock-in on self-serve plans, and no salesperson to call. Service runs to the end of the current billing cycle; the unused portion is not refunded. Setting CASCA_BYPASS=true reverts traffic to your provider in < 5 seconds.

3. Pause Subscription

Pause for 1–3 months at any time. Zero subscription fees during pause; routing engine inactive (traffic falls back via bypass). All settings, API keys, and dashboard data preserved. Resume anytime within 90 days, no penalty.

4. Automated Quality SLA

Objective metrics and automatic credits, with no claim required and no subjective dispute. A credit is issued when any of the following occurs:

· Routing accuracy < 90% on 7-day rolling window
· Bypass switch latency > 5 seconds
· p99 latency > 1.5× provider baseline for 3 consecutive days
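The three triggers are simple enough to check mechanically. This sketch assumes daily counts of correctly routed requests, a single measured bypass time, and daily p99 figures for both Casca and the direct provider path, and it evaluates only the most recent 7-day window. The function names are hypothetical, and the contract's own definitions take precedence.

```python
from typing import Sequence


def rolling_accuracy_breach(daily_correct: Sequence[int], daily_total: Sequence[int],
                            window: int = 7, floor: float = 0.90) -> bool:
    """True if pooled routing accuracy over the last `window` days is below `floor`."""
    correct = sum(daily_correct[-window:])
    total = sum(daily_total[-window:])
    return total > 0 and correct / total < floor


def bypass_breach(bypass_seconds: float, limit: float = 5.0) -> bool:
    """True if the bypass switch took longer than `limit` seconds."""
    return bypass_seconds > limit


def p99_breach(daily_p99_ms: Sequence[float], daily_baseline_p99_ms: Sequence[float],
               ratio: float = 1.5, days: int = 3) -> bool:
    """True if p99 exceeded `ratio` x the provider baseline on `days` consecutive days."""
    streak = 0
    for observed, baseline in zip(daily_p99_ms, daily_baseline_p99_ms):
        streak = streak + 1 if observed > ratio * baseline else 0
        if streak >= days:
            return True
    return False
```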

Enterprise customers receive a custom MSA with named-account performance commitments and contractual remedies beyond the standard self-serve SLA.


Stop Burning Money.
Start Today.

Pick the path that fits how you buy. Both lead to the same engine.

Self-serve · all team sizes

Create a free account

Sign up, get an API key, swap your base_url, and start routing in under 30 minutes. Free tier covers 10M tokens/month — enough to validate the savings on your real traffic.

Create free account → ↳ No credit card · < 30 min to live
Free to start  ·  No credit card  ·  Cancel anytime