The architecture below is the production shape, not a marketing diagram. The docs corpus ingests from the customer's MDX monorepo via a GitHub Action that triggers on PR-merge — only changed pages reindex, the rest hot-reload from cache. The anchor-preserving chunker is the load-bearing detail: each chunk carries a stable anchor_id of shape `doc_[slug]#[anchor-slug]`, minted at chunk-time and never regenerated, so a citation issued today resolves to the same passage tomorrow even after editorial reshuffling. The shape of the id is also exactly the shape the answer schema enforces downstream with a regex.
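For concreteness, here is a minimal sketch of what that minting could look like. The `slugify` and `mint_anchor_id` names and the exact regex are illustrative assumptions, not the customer's actual chunker; the grounded parts are the `doc_[slug]#[anchor-slug]` shape and the rule that an id is minted once at chunk-time and persisted rather than regenerated.

```python
import re

# Assumed shape of the id; this is also the kind of regex the answer schema
# can enforce downstream. The real pattern may differ.
ANCHOR_ID_RE = re.compile(r"^doc_[a-z0-9-]+#[a-z0-9-]+$")

def slugify(text: str) -> str:
    """Lowercase, drop non-alphanumerics, collapse whitespace and runs of hyphens."""
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")

def mint_anchor_id(doc_slug: str, heading: str) -> str:
    """Minted once when the chunk is created; the pipeline stores the id with the
    chunk and never regenerates it on later reindex runs."""
    anchor_id = f"doc_{slugify(doc_slug)}#{slugify(heading)}"
    assert ANCHOR_ID_RE.match(anchor_id), anchor_id
    return anchor_id
```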
Retrieval is hybrid. pgvector 0.7 sits on the embedding side with voyage-3-large at 1,024 dimensions. Algolia sits on the lexical side over the same corpus, reusing the synonym table the docs team has maintained for four years. We fuse with reciprocal-rank fusion at k=60, take the top-40 from each lane, dedupe by anchor_id, and rerank with BAAI's bge-reranker-large self-hosted on a single g5.xlarge inside the customer's VPC. The reranker carries a score-margin gate: if the best candidate doesn't beat the second by at least 0.04, the agent refuses on principle and routes to a human ticket. That gate is part of why the false-anchor rate moved from 6.2% to 0.9% — most synonym-hallucination cases fail it before the model ever sees them.
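The fusion and the gate fit in a few lines. This is a sketch under assumptions: `RRF_K`, `PER_LANE`, and `MARGIN_GATE` mirror the numbers above, the inputs are already-ranked lists of anchor_ids from each lane, and the production service does this against pgvector and Algolia responses rather than plain lists.

```python
from collections import defaultdict

RRF_K = 60          # constant in the reciprocal-rank fusion score
PER_LANE = 40       # candidates kept from each lane before fusion
MARGIN_GATE = 0.04  # minimum reranker-score margin between the top two candidates

def rrf_fuse(vector_hits: list[str], lexical_hits: list[str]) -> list[str]:
    """Fuse two ranked lists of anchor_ids with reciprocal-rank fusion.
    Summing per anchor_id also dedupes passages that appear in both lanes."""
    scores: dict[str, float] = defaultdict(float)
    for lane in (vector_hits[:PER_LANE], lexical_hits[:PER_LANE]):
        for rank, anchor_id in enumerate(lane, start=1):
            scores[anchor_id] += 1.0 / (RRF_K + rank)
    return sorted(scores, key=scores.get, reverse=True)

def passes_margin_gate(reranked: list[tuple[str, float]]) -> bool:
    """reranked: (anchor_id, reranker_score) pairs, best first.
    If the winner doesn't clear the runner-up by the margin, refuse and
    route to a human ticket instead of answering."""
    if len(reranked) < 2:
        return bool(reranked)
    return (reranked[0][1] - reranked[1][1]) >= MARGIN_GATE
```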
The decision step is two-model. Claude Haiku 4.5 routes intent first: greeting, product-list lookup, docs-question, or out-of-scope. Roughly 28% of inbound queries terminate at Haiku — they don't need Sonnet's reasoning, and paying for it would be a waste. The other 72% pass through to Claude Sonnet 4.6 with `response_format: json_schema` set to the DocsAnswer shape. The model has zero write tools — it cannot open tickets, send emails, or modify docs. All it produces is a JSON object: the prose answer, a 0–1 confidence float, an array of cited claims (each tied to an anchor_id), and an explicit refusal boolean. Every claim has to point to an anchor_id whose pathway-id matches the routed product context, or the validator rejects the answer.
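A rough Pydantic rendering of the DocsAnswer contract, to make it concrete. The field names (`answer`, `confidence`, `claims`, `refusal`, `text`) and the anchor-id regex are my guesses at the schema described above, not the production definitions, and the pathway check is approximated here as membership in the set of anchor_ids allowed for the routed product context.

```python
from pydantic import BaseModel, Field

# Same assumed shape as the chunker mints; the real schema's regex may differ.
ANCHOR_ID_PATTERN = r"^doc_[a-z0-9-]+#[a-z0-9-]+$"

class CitedClaim(BaseModel):
    text: str
    anchor_id: str = Field(pattern=ANCHOR_ID_PATTERN)

class DocsAnswer(BaseModel):
    answer: str
    confidence: float = Field(ge=0.0, le=1.0)
    claims: list[CitedClaim]
    refusal: bool

def validate_answer(answer: DocsAnswer, allowed_anchor_ids: set[str]) -> None:
    """Reject any claim that cites a passage outside the routed product context."""
    for claim in answer.claims:
        if claim.anchor_id not in allowed_anchor_ids:
            raise ValueError(
                f"claim cites {claim.anchor_id}, which is outside the routed context"
            )
```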
The third outcome lane — refusal — is first-class on purpose. Below confidence 0.5, the surface doesn't answer; it tells the user it doesn't know and opens a draft ticket with the query, the retrieved candidates, and the refusal reason pre-attached. Between 0.5 and 0.8, it shows the answer as a draft with a "verify with support" banner. At ≥ 0.8 it renders the cited answer inline. This three-tier surface is the thing that buys the support lead's sign-off. A confident-wrong answer is the failure mode that scares the team; the architecture makes confident-wrong structurally hard to ship.
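The tiering itself is a few comparisons. A minimal sketch, assuming the confidence float and refusal boolean come straight out of the validated DocsAnswer object; the enum names are illustrative.

```python
from enum import Enum

class Surface(Enum):
    REFUSE_AND_TICKET = "refuse_and_ticket"   # < 0.5: no answer; draft ticket pre-filled
    DRAFT_WITH_BANNER = "draft_with_banner"   # 0.5 to just under 0.8: "verify with support" banner
    INLINE_CITED = "inline_cited"             # >= 0.8: cited answer rendered inline

def route_surface(confidence: float, refused: bool) -> Surface:
    """Map the model's confidence (and explicit refusal) onto the three outcome lanes."""
    if refused or confidence < 0.5:
        return Surface.REFUSE_AND_TICKET
    if confidence < 0.8:
        return Surface.DRAFT_WITH_BANNER
    return Surface.INLINE_CITED
```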