For teams spending $10K–$200K / month on LLM APIs

Cut your LLM bill 30–75%.
One line of code.
Numbers backed by published research.

A research-backed routing engine that classifies each request in <1ms and sends it to the model that actually fits, not the most expensive one. Same outputs. Same SLA. Up to 97% cheaper per token, and 30–75% off real bills depending on workload.

⭐ GitHub jewanchen/casca 📄 Published Zenodo · DOI 🛡️ SLA Refund if < 30% savings 🌐 Native 14 languages

Integration base_url = "https://api.cascaio.com/v1"

Escape hatch: CASCA_BYPASS=true → direct connection in 5 seconds
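As a rough illustration of the one-line integration and the escape hatch, the sketch below picks a base URL from the environment. The Casca URL and the `CASCA_BYPASS` variable come from the lines above. The helper name `resolve_base_url` is hypothetical, and so is the idea that your own client code reads the variable. The page does not say where the switch is evaluated.

```python
import os

CASCA_BASE_URL = "https://api.cascaio.com/v1"    # from the integration line above
PROVIDER_BASE_URL = "https://api.openai.com/v1"  # OpenAI's default endpoint


def resolve_base_url(env=None):
    """Return the provider URL when CASCA_BYPASS=true, otherwise the Casca URL.

    Assumption: the bypass flag is read by client code. This helper is
    illustrative and is not part of any published Casca SDK.
    """
    env = os.environ if env is None else env
    bypass = env.get("CASCA_BYPASS", "").strip().lower() == "true"
    return PROVIDER_BASE_URL if bypass else CASCA_BASE_URL


# With the official OpenAI Python SDK installed, the one-line change would be:
#   from openai import OpenAI
#   client = OpenAI(base_url=resolve_base_url())
print(resolve_base_url({}))
print(resolve_base_url({"CASCA_BYPASS": "true"}))
```

Keeping the decision in one function means a rollback is an environment change rather than a code change.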

✓ Live in < 30 minutes

✓ Prompts never stored or trained on

✓ One-click bypass if anything goes wrong

Core Technology

Built for Zero-Compromise
Cost Optimization

Four systems working together: classify complexity, protect quality, cache answers, and learn automatically.

◈

Complexity-Aware Routing

Every prompt is classified as HIGH, MED, or LOW in under 1ms. Simple queries route to Gemini Flash (up to 97% cheaper per token). Critical analysis stays on GPT-4o or Claude Sonnet. No manual rules to maintain — our production engine handles 14 languages natively, with an L2 MiniLM fallback for ambiguous cases.

160 RULES · 14 LANGUAGES · L1+L2 PIPELINE
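To make the HIGH / MED / LOW idea concrete, here is a toy rule-based classifier. It is only a sketch. The keyword lists are invented for illustration, the production rule set is not shown on this page, and the model names are descriptive labels rather than exact provider model IDs. The MED target is a placeholder because the text does not name one.

```python
# Hypothetical keyword rules, not the production rule set.
HIGH_HINTS = ("analyze", "compare", "draft a contract")
LOW_HINTS = ("what is", "order status", "balance")

# Illustrative labels only. "mid-tier-model" is a placeholder.
MODEL_BY_TIER = {
    "LOW": "gemini-flash",
    "MED": "mid-tier-model",
    "HIGH": "gpt-4o",
}


def classify(prompt: str) -> str:
    """Return 'HIGH', 'MED', or 'LOW' using simple substring rules."""
    text = prompt.lower()
    if any(hint in text for hint in HIGH_HINTS):
        return "HIGH"  # check HIGH first so hard prompts are never downgraded
    if any(hint in text for hint in LOW_HINTS):
        return "LOW"
    return "MED"       # anything unmatched lands in the middle tier


def route(prompt: str) -> str:
    """Map a prompt to the model label for its tier."""
    return MODEL_BY_TIER[classify(prompt)]


for p in ("What is an API?", "Analyze this quarterly report", "Write a product description"):
    print(classify(p), "->", route(p))
```

Checking the HIGH rules before the LOW rules is a deliberate choice in this sketch: a misrouted hard prompt costs quality, while a misrouted easy prompt only costs money.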

⛊

SLA Quality Protection

Legal, compliance, and medical prompts are force-routed to GPT-4o / Claude Sonnet — always. If quality drops below your threshold for 3 consecutive days, you get a full refund. Written in the contract, not a promise.

FORCE HIGH · ONE-CLICK ROLLBACK · SLA GUARANTEE
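The force-routing rule can be pictured as an override applied after classification. The sketch below assumes each request already carries topic tags. How topics are detected is not described here, so `apply_force_high` and its inputs are hypothetical.

```python
from typing import Iterable

# Topics named in the text above as always routed to the top tier.
SENSITIVE_TOPICS = {"legal", "compliance", "medical"}


def apply_force_high(tier: str, topics: Iterable[str]) -> str:
    """Return 'HIGH' when any sensitive topic is present, else the given tier."""
    tagged = {t.lower() for t in topics}
    return "HIGH" if tagged & SENSITIVE_TOPICS else tier


print(apply_force_high("LOW", ["medical"]))
print(apply_force_high("LOW", ["shipping"]))
```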

⟁

Semantic Caching

"What is an API?" gets asked 200 times a day. Same question, same answer, zero cost. Our global knowledge cache matches semantically — typos, rephrasing, multilingual variants all hit cache at $0.

FUZZY MATCH · LEVENSHTEIN < 5 · GLOBAL POOL
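The "LEVENSHTEIN < 5" tag suggests an edit-distance threshold. The sketch below shows how such a lookup might work for typos and casing differences. It is an assumption-laden illustration, not the production cache: the class name `FuzzyCache` and the normalization step are made up, and plain edit distance does not cover the rephrasing or multilingual matches described above.

```python
MAX_DISTANCE = 5  # hits require distance < 5, following the tag line above


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row in memory."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(question.lower().split())


class FuzzyCache:
    """Hypothetical in-memory cache keyed by normalized question text."""

    def __init__(self):
        self._entries = {}

    def put(self, question: str, answer: str) -> None:
        self._entries[normalize(question)] = answer

    def get(self, question: str):
        """Return the closest cached answer within the threshold, else None."""
        key = normalize(question)
        best = None
        for cached_key, answer in self._entries.items():
            dist = levenshtein(key, cached_key)
            if dist < MAX_DISTANCE and (best is None or dist < best[0]):
                best = (dist, answer)
        return best[1] if best else None


cache = FuzzyCache()
cache.put("What is an API?", "An API is an interface between programs.")
print(cache.get("Wht is an APi?"))
print(cache.get("How do I reset my password?"))
```

A linear scan is fine for a demonstration. A cache of any real size would need an index instead of comparing against every entry.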

↻

Auto-Learn Flywheel

Ambiguous prompts ("handle this for me", "fix it") enter the AMBIG queue for review. Every resolution trains the engine. Engine versioning is public — v1.2 (26 rules, peer-reviewed paper) → v2.6.2 (160 rules, current production). Your savings compound monthly.

AMBIG QUEUE · v2.6.2 PRODUCTION · COMPOUNDING

Honest Math

Most "save 60%" claims are fiction.
The truth is industry-dependent.

We modeled 8 representative workloads through Casca's v2.6.2 routing engine. The numbers below are the modeled results, including the industries where Casca saves less. We publish the underperformers because honesty scales better than marketing.

Industry	Bill Reduction	Why this savings level
Fintech support	65–75%	Balance, transaction, KYC lookups dominate; almost all LOW-tier traffic
E-commerce CS	60–72%	Order status, returns, sizing: the same questions repeat at scale
HR helpdesk	60–75%	Every employee asks the same PTO / benefits / 401k questions
Insurance support	55–68%	Policy lookups and claim status checks are repetitive and simple
Marketing / content tools	41–55%	Mid-tier content generation dominates, with fewer LOW and fewer HIGH
EdTech / online learning	38–52%	Mixed Q&A and tutoring, with a broad complexity distribution
B2B SaaS in-app AI	30–45%	Mixed reasoning workloads: an analyze / draft / summarize blend
Developer tools / code gen	19–31%	Most queries genuinely need reasoning, so there is less to optimize

Get my workload analyzed → Read methodology & paper →

Numbers represent modeled workloads using engine v2.6.2 against a GPT-4o flat-rate baseline ($5.00/M tokens). Real customer bills depend on traffic mix, retry policies, and overhead. The published v1.2 paper documents the foundational methodology; v2.6.2 industry benchmark coming soon.
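For a feel of how traffic mix drives these ranges, the following back-of-the-envelope sketch uses only two figures from this page: the $5.00/M GPT-4o baseline and the "up to 97% cheaper per token" LOW tier. It is a simplification, not the published methodology. It assumes every token that is not routed to LOW stays at the baseline price, and it ignores the MED tier, cache hits, retries, and overhead. The 70% share is a made-up input.

```python
BASELINE_PRICE = 5.00  # USD per 1M tokens, GPT-4o flat-rate baseline from the text
LOW_DISCOUNT = 0.97    # best-case per-token saving quoted for the LOW tier


def bill_reduction(share_low: float, low_discount: float = LOW_DISCOUNT) -> float:
    """Fraction of the baseline bill saved when `share_low` of tokens move to LOW.

    Assumption: all remaining tokens are still billed at the baseline price.
    """
    if not 0.0 <= share_low <= 1.0:
        raise ValueError("share_low must be between 0 and 1")
    return share_low * low_discount


def blended_price(share_low: float) -> float:
    """Average USD price per 1M tokens after routing."""
    return BASELINE_PRICE * (1.0 - bill_reduction(share_low))


share = 0.70  # hypothetical: 70% of tokens classified LOW
print(f"reduction: {bill_reduction(share):.1%}")
print(f"blended price: ${blended_price(share):.2f} per 1M tokens")
```

Under these assumptions the reduction can never exceed the LOW share itself, which is why reasoning-heavy workloads such as code generation sit at the bottom of the table.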

Compare

Why not just use OpenRouter, Helicone, or Portkey?

Real comparison, no marketing spin. Casca is a routing engine, not an aggregator or observability layer — and we publish our research.

Capability	OpenRouter	Helicone	Portkey	Casca
Auto-route by request complexity	—	—	Manual rules	✓ Native
Quality SLA with monthly refund	—	—	—	✓ Contractual
Published technical paper + DOI	—	—	—	✓ Zenodo
Per-industry savings benchmark	—	—	—	✓ 8 verticals
Auto-learn from real traffic	—	—	—	✓ Flywheel
14-language native classification	EN-only routing	N/A	Limited	✓ Built-in
Cross-customer semantic cache	—	—	✓	✓ + isolation mode
OpenAI SDK-compatible (drop-in)	✓	✓	✓	✓ One line
Bypass switch (≤ 5s rollback)	—	—	Config-based	✓ Env var

All comparisons based on public documentation as of May 2026. Disagree? Open an issue on GitHub — we'll update this table.

Pricing

Pay for what you save.
Not what you spend.

Self-serve plans for predictable pricing. Or pick our outcome-based tier — and only pay when you save more than the subscription would have cost.

Free

$0 / mo

10M tokens included · BYO API keys

No overage — quota-capped

For founders trying Casca on a side project or proof-of-concept.

3-tier intelligent routing
Cross-customer semantic cache
Real-time savings dashboard
Community support (GitHub)
Bring your own OpenAI / Anthropic / Google keys

Start free →

Starter

$299 / mo

100M tokens included · BYO keys

Overage: $0.10 / 1M tokens routed

For Series A AI-native startups, $5K–$30K/mo on LLMs.

Everything in Free
Email support · 24h response
Routing analytics + alerts
Quality SLA (automated)
Pause subscription anytime

Start 60-day trial →

Scale tier · two ways to pay

Switch anytime

Option A · Predictable

Flat Subscription

$2,499 / mo

2 billion tokens included. Overage at $0.05 / 1M tokens. Predictable monthly bill — easy for procurement to approve, easy for finance to forecast.

● Predictable line item for finance
● No data sharing required
          

Option B · Aligned

Outcome-Based

12% of verified savings

Connect read-only access to your provider billing. We verify savings against a GPT-4o flat-rate baseline and bill 12% of what you actually save. Floor: $0 if no savings. Cap: 1.5× Option A, so the fee never exceeds 1.5× the flat plan's price.

● Floor $0: zero risk if engine underperforms
● Cap 1.5× Option A: no surprise bills
● Includes monthly peer-benchmark report
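The outcome-based fee reduces to a clamp: 12% of verified savings, never below $0 and never above 1.5× the $2,499 flat plan. A minimal sketch, assuming savings are measured as baseline cost minus actual cost for the month. The function name and the sample inputs are illustrative.

```python
FLAT_PLAN_USD = 2499.00  # Option A monthly price
RATE = 0.12              # share of verified savings billed under Option B
CAP_MULTIPLIER = 1.5     # cap expressed as a multiple of Option A


def outcome_fee(baseline_cost: float, actual_cost: float) -> float:
    """Monthly Option B fee in USD for the given baseline and actual LLM spend."""
    savings = max(0.0, baseline_cost - actual_cost)  # floor: no savings, no fee
    cap = CAP_MULTIPLIER * FLAT_PLAN_USD             # 3,748.50
    return min(RATE * savings, cap)


print(f"{outcome_fee(10_000, 8_000):,.2f}")   # small savings, below the cap
print(f"{outcome_fee(50_000, 15_000):,.2f}")  # large savings, capped
print(f"{outcome_fee(10_000, 12_000):,.2f}")  # spend went up, fee is zero
```

With these figures the cap takes effect once monthly savings pass $31,237.50, since 12% of that amount equals $3,748.50.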
          

⬡ Optional add-on · Scale tier and above

Casca Provider Pool — Future-Proof Your AI Stack

One integration covers every major LLM provider and every credible second-tier alternative. New models are available the day they launch. When OpenAI changes pricing, you switch in one click. When a new provider beats GPT-4o on price/quality, you benefit immediately. Built for teams that don't want their AI strategy tied to one vendor's roadmap.

OpenAI Anthropic Google Groq Mistral Together AI Fireworks AI Cohere + more

Pre-negotiated rates · Unified DPA · Auto failover · One invoice across all providers

Add-on

+$499 / mo

included in Enterprise

Add to Scale plan →

Enterprise

Custom annual contract · Outcome-based primary

For teams spending $300K+/mo on LLMs. Outcome pricing 12–15% of verified savings (lower at higher volumes). Provider Pool included by default. Custom SLA, named-account support, private deployment available, custom rate-card negotiation with providers on your behalf.

● Provider Pool default

● Private deployment available

● Quarterly business reviews

● Setup fee: $15K–$30K

Contact sales →

💡 The math (e-commerce CS workload, $50K/mo on GPT-4o): Casca routes ~60% to cheaper models → LLM bill drops to ~$15K. Add Growth plan: 5B tokens × $0.05 per 1M tokens + $999 = ~$1,250. Total Casca cost: ~$1,250/mo. Net savings: ~$33,750/mo. ROI: 27:1. Other industries see different ratios; see the Honest Math table above.
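The arithmetic in this example can be checked in a few lines. The sketch takes the quoted figures at face value: $50K baseline, ~$15K after routing, 5B tokens at $0.05 per 1M, plus a $999 base fee. The variable names are illustrative.

```python
baseline_bill = 50_000.0     # USD per month on GPT-4o, from the example
routed_bill = 15_000.0       # USD per month after routing, from the example
tokens_millions = 5_000      # 5B tokens expressed in millions
per_million_fee = 0.05       # USD per 1M tokens routed
base_fee = 999.0             # USD per month

casca_cost = tokens_millions * per_million_fee + base_fee
net_savings = (baseline_bill - routed_bill) - casca_cost
roi = net_savings / casca_cost

print(f"Casca cost:  ${casca_cost:,.0f}")
print(f"Net savings: ${net_savings:,.0f}")
print(f"ROI:         {roi:.0f}:1")
```

The exact cost works out to $1,249, which the example rounds to ~$1,250. The net figure and the 27:1 ratio follow from that.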

All BYO-key plans: your API keys never leave your infrastructure. LLM costs are billed directly by OpenAI / Anthropic / Google. Casca charges a routing fee only.

Service Commitment

Designed so you can leave anytime.

The best SaaS commitment isn't lock-in — it's making cancel-anytime so frictionless that customers stay because they want to. Here's exactly what we promise, in plain English.

60-Day Savings Guarantee

If your verified bill reduction is under 30% after 60 days of active use, we refund your subscription fee for the trial period. No questions, no negotiation. Available once per customer account.

Cancel Anytime

Cancel from your dashboard at any time. No annual lock-in on Self-Serve plans, no salesperson to call. Service runs to the end of the current billing cycle; no refund of unused portion. Bypass switch via CASCA_BYPASS=true reverts traffic to your provider in < 5 seconds.

Pause Subscription

Pause for 1–3 months at any time. Zero subscription fees during pause; routing engine inactive (traffic falls back via bypass). All settings, API keys, and dashboard data preserved. Resume anytime within 90 days, no penalty.

Automated Quality SLA

Objective metrics, automatic credits — no claim required, no subjective dispute:

· Routing accuracy < 90% on 7-day rolling window
· Bypass switch latency > 5 seconds
· p99 latency > 1.5× provider baseline for 3 consecutive days
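The three triggers above are simple threshold checks, sketched below. Only the thresholds come from the list. The function names are hypothetical, and treating the 7-day rolling accuracy as the mean of the seven most recent daily values is an assumption.

```python
from typing import Sequence


def accuracy_breach(daily_accuracy: Sequence[float],
                    window: int = 7, floor: float = 0.90) -> bool:
    """True when the mean of the last `window` daily accuracies is below `floor`."""
    recent = list(daily_accuracy)[-window:]
    if len(recent) < window:
        return False  # not enough history to evaluate the window
    return sum(recent) / window < floor


def bypass_breach(bypass_seconds: float, limit: float = 5.0) -> bool:
    """True when the bypass switch took longer than `limit` seconds."""
    return bypass_seconds > limit


def latency_breach(daily_p99_ms: Sequence[float], baseline_p99_ms: float,
                   factor: float = 1.5, days: int = 3) -> bool:
    """True when p99 exceeded factor x baseline on `days` consecutive days."""
    streak = 0
    for p99 in daily_p99_ms:
        streak = streak + 1 if p99 > factor * baseline_p99_ms else 0
        if streak >= days:
            return True
    return False


print(accuracy_breach([0.95] * 4 + [0.80] * 3))
print(bypass_breach(3.2))
print(latency_breach([100, 160, 170, 180], baseline_p99_ms=100))
```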

Enterprise customers receive a custom MSA with named-account performance commitments and contractual remedies beyond the standard self-serve SLA.

Stop Burning Money.
Start Today.

Pick the path that fits how you buy. Both lead to the same engine.

High-touch · $50K+/mo teams

Get a free workload analysis

Send us a sample of 100 real requests (anonymized). We run your traffic through our v2.6.2 engine and return a per-tier savings projection within 24 hours. No commitment, no sales call required.

Request analysis → ↳ Response in < 24h · Reply directly to founder

Self-serve · all team sizes

Create a free account

Sign up, get an API key, swap your base_url, and start routing in under 30 minutes. Free tier covers 10M tokens/month — enough to validate the savings on your real traffic.

Create free account → ↳ No credit card · < 30 min to live

✓ Free to start · ✓ No credit card · ✓ Cancel anytime
