AI case studies.
Receipts, not slideware.

Six anonymized engagements: clinical triage on Claude Sonnet, RAG over 12k product docs, Realtime API voice agents, bank fraud disposition, contract review, and a Flutter voice copilot. Each one ships with an eval set, latency budget, kill points, and the math behind the metric. Client names are changed at their request; numbers are drawn from shadow-mode logs and frozen eval sets unless explicitly noted as published cost math.

Industry

Healthcare · SaaS · Legal · E-commerce

Capability

RAG · AI agents · Voice agents · Chatbots

Stack

Claude · OpenAI · LangChain · LangGraph · pgvector

Outcome

Time saved · Deflection · Conversion lift · Compliance

six engagements · live + in-flight

Six AI success stories,
anonymized at the client's request.

Click through to the live pilot below for the full operator-detail write-up: eval table, signature architecture diagram, kill-point section, the works. The other five render in Phase 2 (same template, a different signature SVG per page).

Healthcare · Regional health system · Case study

HIPAA-safe clinical triage agent — shipped in 9 weeks

Problem

Pre-triage queue averaging 38–62 min wait at peak. Nurse triage line overflow routing the wrong-acuity patients to ER. PHI-safe AI never piloted.

Approach

FHIR-pulled chart context → PHI redaction → hybrid pgvector + BM25 retrieval over the clinical-pathway corpus → Claude Sonnet 4.6 forced-JSON decision → policy + 2-eye guardrails. Three outcome lanes.

Claude Sonnet 4.6 · pgvector 0.7 · FHIR R4 · LangGraph 0.2 · Langfuse
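
The approach above names hybrid pgvector + BM25 retrieval but does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF); the sketch below assumes that technique, and the document ids and the constant `k=60` are illustrative rather than taken from the engagement.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in; scores add up.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ids standing in for clinical-pathway chunks.
vector_hits = ["pathway-12", "pathway-07", "pathway-31"]
bm25_hits = ["pathway-07", "pathway-44", "pathway-12"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that rank well in both lists rise to the top without needing the vector and BM25 scores to share a scale, which is the usual reason to reach for RRF.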

Outcome

38–62% pre-triage wait reduction (n=14,200 shadow encounters)

B2B SaaS · Developer tooling · Case study

Claude case study — RAG over 12,000 product-docs pages

Problem

Documentation search rated 2.3/5 by users; support ticket volume 41% docs-recoverable (n=1,200). Existing keyword search couldn't reason across nested module hierarchies; old answer-bot hallucinated on synonyms.

Approach

Claude Sonnet 4.6 + Haiku 4.5 router over a hybrid pgvector + Algolia index. voyage-3-large embeddings, bge-reranker-large self-hosted. Forced-JSON answer schema with regex-enforced anchor citations — every claim links to a doc anchor or the validator rejects.

Claude Sonnet 4.6 · Claude Haiku 4.5 · pgvector 0.7 · BAAI bge-reranker
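
The regex-enforced citation rule can be pictured as a small validator that rejects any answer with an uncited claim. This is a minimal sketch, not the engagement's schema: the `claims` and `anchor` field names and the `/docs/...#fragment` anchor pattern are assumptions made for illustration.

```python
import json
import re

# Hypothetical anchor format: a docs path followed by a #fragment.
ANCHOR_RE = re.compile(r"^/docs/[a-z0-9/_-]+#[a-z0-9_-]+$")


def validate_answer(raw):
    """Return (ok, errors) for a JSON answer whose claims must each cite an anchor."""
    try:
        answer = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]
    if not isinstance(answer, dict):
        return False, ["answer must be a JSON object"]
    claims = answer.get("claims")
    if not isinstance(claims, list) or not claims:
        return False, ["answer must contain a non-empty 'claims' list"]
    errors = []
    for i, claim in enumerate(claims):
        anchor = claim.get("anchor", "") if isinstance(claim, dict) else ""
        if not isinstance(anchor, str) or not ANCHOR_RE.match(anchor):
            errors.append(f"claim {i}: missing or malformed anchor")
    return not errors, errors


good = json.dumps(
    {"claims": [{"text": "Tokens refresh hourly.", "anchor": "/docs/modules/auth#token-refresh"}]}
)
bad = json.dumps({"claims": [{"text": "Tokens refresh hourly."}]})
```

A regex only proves the citation is well-formed; checking that the anchor exists in the index would be a separate lookup.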

Outcome

≈ 64% docs-recoverable tickets deflected at conf ≥ 0.8 (95% CI · n=3,400)
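
The outcome line cites a 95% CI without giving its width or method. As a rough guide to the sampling error alone, the sketch below assumes a simple binomial model and a normal-approximation interval; it is not the calculation behind the published figure.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p observed over n trials."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 64% deflection over n=3,400 tickets, the figures quoted above.
low, high = proportion_ci(0.64, 3400)
```

Under those assumptions the interval is roughly 62.4% to 65.6%, about ±1.6 points.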

SaaS · Customer support · Case study

OpenAI case study — Realtime API voice agent at $0.10/call

Problem

Tier-1 voice queue averaging 4-minute wait at peak; 5 inbound questions accounted for 62% of call volume. Existing IVR bouncing 80%+ to a human.

Approach

gpt-realtime-2 voice agent over the help-center RAG corpus. p95 580ms first-token, function-calling handoff_to_human when confidence < 0.7, Twilio + Cloudflare edge audio transport. Published $0.10/call cost math vs $4 live-agent baseline.

gpt-realtime-2 · Whisper-large-v3 · pgvector 0.7 · Twilio Voice

Outcome

≈ 38% tier-1 voice deflection (95% CI · n=11,400 calls)
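
The $0.10 and $4 per-call figures and the 38% deflection rate combine into a blended cost per inbound call. The sketch below assumes every call reaches the voice agent first and that each non-deflected call then incurs the full live-agent cost on top; the published cost math may account for escalations differently.

```python
def blended_cost_per_call(agent_cost, human_cost, deflection_rate):
    """Expected cost per inbound call when the agent answers first.

    Assumption: escalated calls pay the agent cost plus the full human cost.
    """
    return agent_cost + (1 - deflection_rate) * human_cost


blended = blended_cost_per_call(agent_cost=0.10, human_cost=4.00, deflection_rate=0.38)
savings_per_call = 4.00 - blended
```

With these inputs the blended cost comes to about $2.58 per call, a saving of about $1.42 against the all-human baseline.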

Fintech · Mid-market US bank · Case study

Anthropic case study — Claude Sonnet 4.6 fraud agent at a US mid-market bank

Problem

Rules-engine bleeding 18% false-positive rate on 1.2B/yr transactions across card · wire · ACH · RTP. Median analyst review-prep at 8 minutes per flagged case at $14 fully-loaded. Every flag needed a regulator-audit-defensible case note — the binding constraint.

Approach

XGBoost velocity score short-circuits the LLM on the auto-clear band → hybrid pgvector + BM25 retrieval over a 4-yr KYC + case-note corpus → bge-reranker-large self-host → Claude Sonnet 4.6 forced-JSON disposition over AWS PrivateLink → policy-as-code + 2-eye gate → 3 outcome lanes (clear / case-note / regulatory escalate).

Claude Sonnet 4.6 · Claude Haiku 4.5 · pgvector 0.7 · XGBoost 2.0 · LangGraph 0.2
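
The short-circuit at the front of this pipeline amounts to a gate: transactions scoring inside the auto-clear band never reach the model. A minimal sketch follows, in which the `AUTO_CLEAR_MAX` threshold and the `call_llm` callable are hypothetical stand-ins; only the three lane names come from the description above.

```python
AUTO_CLEAR_MAX = 0.05  # hypothetical upper edge of the auto-clear band
LANES = {"clear", "case-note", "regulatory-escalate"}


def route_transaction(velocity_score, call_llm):
    """Skip the LLM for low-risk scores; otherwise use its disposition."""
    if velocity_score <= AUTO_CLEAR_MAX:
        # Cheap path: the gradient-boosted score alone clears the transaction.
        return {"lane": "clear", "llm_called": False}
    disposition = call_llm()
    if disposition not in LANES:
        raise ValueError(f"unexpected disposition: {disposition!r}")
    return {"lane": disposition, "llm_called": True}
```

Passing the model call in as a function keeps the gate testable with a stub and makes the cost saving explicit: no call, no tokens.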

Outcome

≥ 0.96 precision @ 1% FPR (n=412 eval + 1,840 production · ±0.012 CI)
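
For readers unfamiliar with the metric, precision at a fixed false-positive rate can be computed by lowering the score threshold until the FPR budget is spent, then reading off precision. The sketch below is one plain-Python way to do that on toy data; it is not the engagement's eval harness, and the scores are invented.

```python
def precision_at_fpr(scores, labels, max_fpr=0.01):
    """Return (threshold, precision) at the lowest threshold with FPR <= max_fpr.

    labels use 1 for fraud and 0 for legitimate. Returns None if no threshold fits.
    """
    negatives = sum(1 for y in labels if y == 0)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = None
    i = 0
    while i < len(ranked):
        # Items with tied scores share a threshold, so count them together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if negatives and fp / negatives > max_fpr:
            break
        best = (ranked[i][0], tp / (tp + fp))
        i = j
    return best


# Toy data: 9 fraud cases and 200 legitimate ones.
toy_scores = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.89, 0.88] + [0.1] * 197
toy_labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0] + [0] * 197
result = precision_at_fpr(toy_scores, toy_labels)
```

On the toy data the budget allows two false positives out of 200 legitimate transactions, so the threshold settles at 0.89 with 9 true positives against 2 false positives.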

Legal · Mid-market firm · Case study

RAG case study — first-pass MSA review for a mid-market law firm

Problem

Partners spending 6–9 hours per MSA on first-pass review; clause-library drift across 4 practice groups producing inconsistent calls; 11% of post-execution disputes traced to first-pass drift.

Approach

LangChain 0.3 + LangGraph 0.2 orchestrator over a reconciled clause library (1,420 clauses post-reconciliation, down from 1,840). Hybrid pgvector + tsvector BM25 retrieval, bge-reranker-large (Cohere Rerank A/B'd, kept as fallback). Forced-JSON clause-risk schema with regex-enforced policy_id citations.

Claude Sonnet 4.6 · LangChain 0.3 · LangGraph 0.2 · pgvector 0.7

Outcome

≈ 71% first-pass MSA review time saved · partner-signed-off (95% CI · n=180 MSAs)

E-commerce · DTC apparel · Flutter mobile · Case study

AI chatbot case study — Flutter voice copilot in a DTC apparel app

Problem

Mobile-app conversion lagging desktop by 18 points across a 1.4M-MAU Flutter app. In-app search UX rated 2.8/5 (n=1,200). Team had failed two prior on-device voice A/B tests — both rejected on trigger UX, third strike on the line.

Approach

Tap-to-talk on-device VAD → WebRTC over Cloudflare-minted ephemeral keys → gpt-realtime-2 streaming with function-calls into the existing Algolia facet index → product grid re-renders live. Surface shipped as a new GFVoiceCopilot widget in the open-source GetWidget Flutter UI kit (4.8k★). 30-day A/B with matched control.

gpt-realtime-2 · Flutter 3.24 · GetWidget OSS · Algolia · Cloudflare Workers
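
The function-call step into the facet index can be sketched as a translation from tool arguments to an Algolia `filters` string. The facet names (`color`, `size`) and the argument shape are hypothetical; the sketch assumes values contain no double quotes, which would need escaping.

```python
def facets_to_filter(facets):
    """Build an Algolia-style filter string from {attribute: [values]}.

    Values for one attribute are OR-ed; different attributes are AND-ed.
    """
    clauses = []
    for attribute in sorted(facets):
        values = facets[attribute]
        if not values:
            continue
        ors = " OR ".join(f'{attribute}:"{value}"' for value in values)
        # Parenthesize multi-value groups so OR binds before AND.
        clauses.append(f"({ors})" if len(values) > 1 else ors)
    return " AND ".join(clauses)


# Hypothetical arguments the model might emit for "navy or black, in medium".
tool_args = {"color": ["navy", "black"], "size": ["M"]}
filter_string = facets_to_filter(tool_args)
```

Keeping the model's output to structured facets, rather than a free-form query string, leaves the existing index as the single source of what can be filtered.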

Outcome

+11.4 pts mobile conversion · voice-engaged sessions (n=42,318 · ±1.6pt CI · 30d A/B)

full pilot write-up

Start with the clinical triage agent.
The full-detail exemplar.

The pilot case study runs ~2,400 words with an interactive architecture diagram, a 6-row eval table, and a `When NOT to ship this` section. The other five case studies will render in the same template over the next sprint.

Ready to ship

Want a case study like this
for your stack?

Book a free audit. We review your highest-ROI candidate workflow, recommend a model + retrieval recipe, project token + run-cost, and tell you whether it's case-study-shaped (or whether you should buy an off-the-shelf platform). No deck, no obligation to build.

See pricing

30 min, async or live · Eval-first scoping · Walk-away point in the pilot

keep exploring

From the case studies
back to the pillars.

Each case study feeds back into a service or industry pillar — start anywhere.

Healthcare AI Development

AI Agent Development

Claude Development

OpenAI Developers

AI Voice Agents

AI Automation

Talk to an engineer, not a salesperson.
