The architecture below is the production shape, not a marketing diagram. MSA uploads route through iManage Work via the firm's existing matter-scoped OAuth flow — the agent reads only documents the responsible attorney has matter-level access to, and every read is logged. The semantic clause splitter is the load-bearing detail: each clause is bounded on its section heading (§N.M, with sub-clause merge logic that respects cross-references like `subject to Section 8.3`), tagged with a clause_id that survives editorial reshuffling, and carries the surrounding context window the model needs to reason about it without being polluted by neighboring clauses.
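As a rough illustration of the splitter behaviour described above, here is a minimal sketch. It assumes headings appear at the start of a line as `8.3 Title` or `8.3.1 Title`, that cross-references are written `Section N.M`, and that the stable clause_id is a content hash with heading numbers stripped. All three are assumptions made for the example, not details confirmed by the production splitter, and the names `split_clauses` and `make_clause_id` are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Assumed heading shape: "8.3 Title" or "8.3.1 Title" at the start of a line.
HEADING = re.compile(r"^(\d+\.\d+)((?:\.\d+)*)\s+\S", re.MULTILINE)
# Assumed cross-reference shape: "Section 8.3".
CROSS_REF = re.compile(r"\bSection\s+(\d+\.\d+)")
# Leading heading numbers, stripped before hashing.
LEADING_NUMBER = re.compile(r"^\d+(?:\.\d+)+\s+", re.MULTILINE)


def make_clause_id(body: str) -> str:
    """Hypothetical stable id: hash of the body with heading numbers removed,
    so renumbering a clause does not change its id."""
    stripped = LEADING_NUMBER.sub("", body)
    normalized = " ".join(stripped.split()).lower()
    return "cl_" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


@dataclass
class Clause:
    section: str                      # parent section number, e.g. "8.3"
    text: str                         # clause body including merged sub-clauses
    cross_refs: list = field(default_factory=list)

    @property
    def clause_id(self) -> str:
        return make_clause_id(self.text)


def split_clauses(text: str) -> list:
    """Split on N.M headings; merge N.M.K sub-clauses into their parent."""
    clauses = []
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.start():end].strip()
        section, sub = m.group(1), m.group(2)
        if sub and clauses and clauses[-1].section == section:
            # Sub-clause such as 8.3.1 stays with its parent 8.3.
            clauses[-1].text += "\n" + body
        else:
            clauses.append(Clause(section=section, text=body))
    for clause in clauses:
        # Record references to other sections so context can be attached later.
        refs = set(CROSS_REF.findall(clause.text)) - {clause.section}
        clause.cross_refs = sorted(refs)
    return clauses
```

Text before the first heading is ignored in this sketch, and a body line that happens to start with a decimal number would be misread as a heading; a production splitter would need stricter heading detection.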
Retrieval is hybrid and fans out into two parallel lanes at stage 4. The clause-RAG lane runs pgvector 0.7 for dense retrieval and Postgres tsvector for BM25-style lexical retrieval over the reconciled clause library (1,420 unique reference clauses post-reconciliation, down from 1,840 pre-reconciliation), fuses the two result lists with reciprocal-rank fusion at k=60, dedupes by clause_id, and reranks with BAAI's bge-reranker-large self-hosted on a single g5.xlarge in the firm's tenant. The policy-lookup lane runs a separate regex-validated index over house policy documents. Every policy carries an id of shape `policy_(practice-group)-(NNN)` (e.g. policy_IP-014, policy_MA-203), and the lane is practice-group-aware: IP clauses route to IP policies first, real estate clauses route to real estate policies first, with cross-practice retrieval as a deliberate fallback rather than a default. Senior counsel specifically asked during reconciliation for the two lanes to run as parallel branches rather than a single fused retrieval: they wanted the policy citation and the precedent retrieval to be visibly independent, not blended.
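The fusion step in the clause-RAG lane can be sketched as follows. This is a minimal illustration of reciprocal-rank fusion with k=60 and clause_id deduplication, where each clause scores the sum of 1 / (k + rank) over the lists it appears in. The function name `rrf_fuse` and the input shape (one ranked list of clause_ids per retriever) are assumptions for the example.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of clause_ids with reciprocal-rank fusion.

    Returns (clause_id, score) pairs, highest score first. Each clause_id
    appears once in the output, which is the dedupe step.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, clause_id in enumerate(ranking, start=1):
            if clause_id in seen:
                continue  # count only the best rank within a single list
            seen.add(clause_id)
            scores[clause_id] += 1.0 / (k + rank)
    # Sort by score descending, then clause_id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ids: one list from the dense lane, one from the lexical lane.
dense_hits = ["c1", "c2", "c3"]
lexical_hits = ["c2", "c4", "c1"]
fused = rrf_fuse([dense_hits, lexical_hits])
```

With k=60 the score differences between adjacent ranks are small, so a clause that appears in both lists generally outranks one that appears high in only one list; the fused list is what would then be handed to the reranker.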
During build, we A/B-tested two rerankers — BAAI's bge-reranker-large and Cohere Rerank v3 — on the held-out clause-eval slice. bge won by roughly three points on top-1 precision over the legal corpus; we shipped bge as primary and kept Cohere wired as a runtime fallback so the firm has a swap-out path if bge ever degrades. Both rerankers' top-1 precisions are logged in Langfuse per-decision so the comparison can be re-run on a fresh slice at any time.
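For readers who want to re-run the comparison, top-1 precision is simply the share of eval queries whose first reranked result is the expected clause. The sketch below assumes the eval slice is available as a mapping from query id to expected clause_id and that each reranker's output is a mapping from query id to a ranked list; the function name and the sample data are hypothetical and do not reproduce the firm's results.

```python
def top1_precision(ranked_by_query, expected_by_query):
    """Fraction of queries whose top-ranked clause_id equals the expected one."""
    if not expected_by_query:
        raise ValueError("eval slice is empty")
    hits = 0
    for query_id, expected in expected_by_query.items():
        results = ranked_by_query.get(query_id) or [None]  # missing or empty counts as a miss
        if results[0] == expected:
            hits += 1
    return hits / len(expected_by_query)


# Illustrative data only.
expected = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
reranker_a = {"q1": ["a", "x"], "q2": ["b"], "q3": ["c", "y"], "q4": ["z", "d"]}
reranker_b = {"q1": ["a"], "q2": ["x", "b"], "q3": ["c"], "q4": []}

precision_a = top1_precision(reranker_a, expected)
precision_b = top1_precision(reranker_b, expected)
```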
The decision step is Claude Sonnet 4.6 with `response_format: json_schema` set to the ClauseRisk shape. The model has zero write tools: it cannot send anything to counterparties, modify documents in iManage, or finalize a redline. All it produces is a JSON object: the clause_id, the risk band (one of four enum values), an array of rationale entries each tied to a policy_id (regex-enforced via Zod) plus up to eight precedent_ids, and an optional suggested_redline string. Every rationale claim has to cite a policy_id that matches the regex and resolves to a live policy document, or the validator rejects the output. Confidence below 0.8 routes the clause to manual-review regardless of band; the partner sees the manual-review marker on the redline draft and reads the clause themselves.
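The validation and routing rules above can be sketched in a few lines. The production validator is described as Zod-based; this Python version is only an illustration of the same checks. Several details are assumptions: the field names (`risk_band`, `rationale`, `confidence`), the placeholder band names, the regex for the practice-group code (two or more uppercase letters), and the reading that the eight-precedent cap applies per rationale entry.

```python
import re

# Assumed id shape from the examples policy_IP-014 and policy_MA-203.
POLICY_ID = re.compile(r"policy_[A-Z]{2,}-\d{3}")
# Placeholder names; the source only says there are four enum values.
RISK_BANDS = ("band_1", "band_2", "band_3", "band_4")
MAX_PRECEDENTS = 8
CONFIDENCE_FLOOR = 0.8


def validate_clause_risk(obj, live_policy_ids):
    """Return a list of validation errors; an empty list means the object passes."""
    errors = []
    if not isinstance(obj.get("clause_id"), str) or not obj.get("clause_id"):
        errors.append("clause_id must be a non-empty string")
    if obj.get("risk_band") not in RISK_BANDS:
        errors.append("risk_band must be one of the four enum values")
    confidence = obj.get("confidence")
    if (isinstance(confidence, bool) or not isinstance(confidence, (int, float))
            or not 0.0 <= confidence <= 1.0):
        errors.append("confidence must be a number in [0, 1]")
    rationale = obj.get("rationale")
    if not isinstance(rationale, list) or not rationale:
        errors.append("rationale must be a non-empty list")
        rationale = []
    for i, entry in enumerate(rationale):
        if not isinstance(entry, dict):
            errors.append(f"rationale[{i}] must be an object")
            continue
        policy_id = entry.get("policy_id")
        if not isinstance(policy_id, str) or not POLICY_ID.fullmatch(policy_id):
            errors.append(f"rationale[{i}].policy_id does not match the id shape")
        elif policy_id not in live_policy_ids:
            errors.append(f"rationale[{i}].policy_id does not resolve to a live policy")
        precedents = entry.get("precedent_ids", [])
        if not isinstance(precedents, list) or len(precedents) > MAX_PRECEDENTS:
            errors.append(f"rationale[{i}].precedent_ids must list at most {MAX_PRECEDENTS} ids")
    redline = obj.get("suggested_redline")
    if redline is not None and not isinstance(redline, str):
        errors.append("suggested_redline must be a string when present")
    return errors


def route(obj, live_policy_ids):
    """Reject invalid output, send low confidence to manual review, else keep the band."""
    if validate_clause_risk(obj, live_policy_ids):
        return "rejected"
    if obj["confidence"] < CONFIDENCE_FLOOR:
        return "manual-review"
    return obj["risk_band"]
```

Checking the id against a set of live policy ids is a stand-in for whatever lookup resolves a policy_id to a live document in the real system.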