A research-backed routing engine that classifies each request in <1ms and sends it to the model that actually fits — not the most expensive one. Same outputs. Same SLA. Up to 97% per token, 30–75% on real bills depending on workload.
CASCA_BYPASS=true → direct connection in 5 seconds
Casca classifies every request and dispatches it to the most cost-effective model.
Four systems working together: classify complexity, protect quality, cache answers, and learn automatically.
Every prompt is classified as HIGH, MED, or LOW in under 1ms. Simple queries route to Gemini Flash (up to 97% cheaper per token). Critical analysis stays on GPT-4o or Claude Sonnet. No manual rules to maintain — our production engine handles 14 languages natively, with an L2 MiniLM fallback for ambiguous cases.
160 RULES · 14 LANGUAGES · L1+L2 PIPELINELegal, compliance, and medical prompts are force-routed to GPT-4o / Claude Sonnet — always. If quality drops below your threshold for 3 consecutive days, you get a full refund. Written in the contract, not a promise.
FORCE HIGH · ONE-CLICK ROLLBACK · SLA GUARANTEE"What is an API?" gets asked 200 times a day. Same question, same answer, zero cost. Our global knowledge cache matches semantically — typos, rephrasing, multilingual variants all hit cache at $0.
FUZZY MATCH · LEVENSHTEIN < 5 · GLOBAL POOLAmbiguous prompts ("handle this for me", "fix it") enter the AMBIG queue for review. Every resolution trains the engine. Engine versioning is public — v1.2 (26 rules, peer-reviewed paper) → v2.6.2 (160 rules, current production). Your savings compound monthly.
AMBIG QUEUE · v2.6.2 PRODUCTION · COMPOUNDINGFully compatible with the OpenAI SDK. No logic changes, no prompt rewriting, no engineering sprint. Swap the base URL and everything works.
We modeled 8 representative workloads through Casca's v2.6.2 routing engine. The numbers below are what we actually saw — including the industries where Casca saves less. We publish the underperformers because honesty scales better than marketing.
Real comparison, no marketing spin. Casca is a routing engine, not an aggregator or observability layer — and we publish our research.
| Capability | OpenRouter | Helicone | Portkey | Casca |
|---|---|---|---|---|
| Auto-route by request complexity | — | — | Manual rules | ✓ Native |
| Quality SLA with monthly refund | — | — | — | ✓ Contractual |
| Published technical paper + DOI | — | — | — | ✓ Zenodo |
| Per-industry savings benchmark | — | — | — | ✓ 8 verticals |
| Auto-learn from real traffic | — | — | — | ✓ Flywheel |
| 14-language native classification | EN-only routing | N/A | Limited | ✓ Built-in |
| Cross-customer semantic cache | — | — | ✓ | ✓ + isolation mode |
| OpenAI SDK-compatible (drop-in) | ✓ | ✓ | ✓ | ✓ One line |
| Bypass switch (≤ 5s rollback) | — | — | Config-based | ✓ Env var |
Self-serve plans for predictable pricing. Or pick our outcome-based tier — and only pay when you save more than the subscription would have cost.
One integration. Every major LLM provider — and every credible second-tier alternative. As new models launch, they're available the day they launch. When OpenAI changes pricing, you switch in one click. When a new provider beats GPT-4o on price/quality, you benefit immediately. Built for teams that don't want their AI strategy tied to one vendor's roadmap.
For teams spending $300K+/mo on LLMs. Outcome pricing 12–15% of verified savings (lower at higher volumes). Provider Pool included by default. Custom SLA, named-account support, private deployment available, custom rate-card negotiation with providers on your behalf.
The best SaaS commitment isn't lock-in — it's making cancel-anytime so frictionless that customers stay because they want to. Here's exactly what we promise, in plain English.
If your verified bill reduction is under 30% after 60 days of active use, we refund your subscription fee for the trial period. No questions, no negotiation. Available once per customer account.
Cancel from your dashboard at any time. No annual lock-in on Self-Serve plans, no salesperson to call. Service runs to the end of the current billing cycle; no refund of unused portion. Bypass switch via CASCA_BYPASS=true reverts traffic to your provider in < 5 seconds.
Pause for 1–3 months at any time. Zero subscription fees during pause; routing engine inactive (traffic falls back via bypass). All settings, API keys, and dashboard data preserved. Resume anytime within 90 days, no penalty.
Objective metrics, automatic credits — no claim required, no subjective dispute:
· Routing accuracy < 90% on 7-day rolling window
· Bypass switch latency > 5 seconds
· p99 latency > 1.5× provider baseline for 3 consecutive days
Pick the path that fits how you buy. Both lead to the same engine.
Send us a sample of 100 real requests (anonymized). We run your traffic through our v2.6.2 engine and return a per-tier savings projection within 24 hours. No commitment, no sales call required.
Request analysis →Sign up, get an API key, swap your base_url, and start routing in under 30 minutes. Free tier covers 10M tokens/month — enough to validate the savings on your real traffic.