Claude Sonnet 4.6
Anthropic · Long-context · tool use · default pick
Anthropic developers, Claude consultants, and Claude Code experts who ship production AI — long-context agents, tool-using workflows, Computer Use, Claude RAG, and Claude Code engineering. Model-agnostic. Daily operators. We'll show you the token-cost math before you commit.
From chat to agents to repo-scale code work, these are the patterns we ship most often. Every one of them comes with an eval suite, audit logging, and a token-cost target — not a demo.
Customer-facing or internal chat built on Claude's tool-use API. Multi-turn dialogue, context memory, structured outputs, escalation paths. Deployed in web, mobile, Slack, Teams, or your own front-end.
Production agents that plan, call tools, observe results, and recover from errors. ReAct, plan-and-execute, hierarchical agent patterns. LangGraph orchestration or custom Python — we pick the simpler one that works.
Full-contract review, multi-document synthesis, repo-scale code analysis, regulatory comparison. Claude's long context unlocks workflows you couldn't ship on GPT-4's 128K — without the chunking pain.
Production Claude RAG — retrieval-augmented agents over Notion, Drive, Confluence, your CRM, or your code. Pinecone, pgvector, or Weaviate retrieval, eval-tested with your real questions before launch. Claude's 200K context lets us cite full source passages instead of chunks.
Claude's Computer Use API for browser and desktop automation — form-filling, legacy-system workflows, complex UI tasks without API hooks. Honest about where Computer Use is and isn't production-ready in 2026.
Internal AI engineering for your team using Claude Code: code generation, repo-scale refactoring, test authoring, on-call triage. We dogfood this for our own engineering, daily.
Some teams need a Claude consultant to sort the strategy first — which model, which deployment, which workflow. Others know what they want and just need Anthropic developers to build it. We do both; which path fits depends on what stage you're at. Either way, fixed-fee at the start.
You're not sure whether Sonnet 4.6 / Haiku 4.5 / Opus 4.7 fits the workflow, whether Computer Use is the right call, or whether Bedrock vs Anthropic-direct is the right deployment. We run a fixed-fee Anthropic consulting audit, deliver a ranked roadmap, and you decide whether to build with us or in-house.
You know what you want shipped. We build one Claude workflow end-to-end against your real systems in 4–6 weeks — agents, RAG, Computer Use, or Claude Code engagement. Fixed-price. Eval suite, monitoring, and a runbook ship with it. Walk-away point if the data doesn't move at the pilot stage.
You have a roadmap of 3–5 Claude workflows. Embedded Anthropic developers ship them on cadence, with monthly cost-of-ownership and drift reporting. Includes Claude Code consulting for your engineering team. Cancel any month.
Both are production-ready. The honest answer per dimension — drawn from shipped client work, not benchmarks — is below. We pick per workflow, not per vendor.
Generalizations from shipped client work + public benchmark suites (HELM, SWE-bench, GAIA). Specifics vary per workload; we benchmark on your eval before recommending.
The Claude family covers three price/quality bands. The default is Sonnet 4.6 for most workloads. Haiku is ~3× cheaper for narrow tasks; Opus 4.7 is only ~1.7× more expensive than Sonnet (a big drop from the old Opus 4 pricing), which changes when Opus is worth the spend. Here's how we choose.
Prices reflect Anthropic API list pricing as of 2026; Bedrock pricing tracks within ±10%. Latency from typical production traces.
Claude Sonnet 4.6's 200K-token context isn't a benchmark stat — it's an unlock. Three workflows we now ship without chunking, retrieval errors, or per-section state-juggling.
Past 100K tokens, you can put an entire master services agreement + every amendment + the playbook of redlined clauses into a single Claude call. No chunking, no retrieval errors, no "the model missed clause 14(b) because it was on page 12."
Drop a 150-file Python service into the prompt and ask Claude to find every place a deprecated function is called. We've shipped this for an internal devtools client — found 41 call sites, zero false positives.
Compare three regulatory filings against your current policy doc. Or merge five meeting transcripts into a single decision log. Or summarize 200 customer interviews. All in one prompt, with citations back to the source.
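Across all three workflows, the call shape is the same. A minimal sketch of the single-call pattern — the file names, prompt, and model string here are illustrative, not from a client engagement:

from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

# Concatenate the full document set; at 200K tokens of context,
# an MSA + amendments + playbook fits in one call. Paths are illustrative.
docs = "\n\n---\n\n".join(
    Path(p).read_text() for p in ("msa.txt", "amendment_1.txt", "playbook.txt")
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{docs}\n\nList every clause that deviates from the "
                   "playbook, citing specific clause numbers.",
    }],
)
print(response.content[0].text)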
Three tactics stacked. Each one independently saves money; together they typically bring effective token cost to 10–15% of the naive baseline — at the same eval-suite quality.
The number-one complaint about Claude in production: "the bill ran away." The fix is rarely "use a worse model." Three tactics we apply in every pilot — and report on monthly afterwards.
Most workflows have 70% easy decisions and 30% hard ones. We route easy decisions to Haiku ($1/$5 per M) and only escalate to Sonnet when needed. Typical result on the same workflow: 60–80% cost reduction with zero quality drop on the eval suite.
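A minimal routing sketch — the classifier prompt and the EASY/HARD threshold are ours to illustrate; in practice both are tuned against your eval suite:

def pick_model(ticket: dict) -> str:
    # Cheap first pass: Haiku decides whether the ticket is routine.
    verdict = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": "Answer EASY or HARD only. Is this support ticket "
                       f"routine?\n\n{ticket['body']}",
        }],
    )
    # Easy tickets stay on Haiku; hard ones escalate to Sonnet.
    return ("claude-haiku-4-5" if "EASY" in verdict.content[0].text
            else "claude-sonnet-4-6")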
Anthropic's prompt caching cuts cached prefix tokens to 10% of input cost. We restructure prompts so the system prompt + tool definitions + long context are cacheable. On document workflows we see effective input cost drop by 70–85% within a week.
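In the API, that means marking the stable prefix with cache_control — a sketch, where SYSTEM_PROMPT, LONG_CONTEXT, and user_question stand in for your own values:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[{
        "type": "text",
        # Stable prefix: system prompt + long document context.
        "text": SYSTEM_PROMPT + LONG_CONTEXT,
        # Cached reads of this prefix bill at a fraction of normal input cost.
        "cache_control": {"type": "ephemeral"},
    }],
    # Only the user question varies call to call, so the prefix cache hits.
    messages=[{"role": "user", "content": user_question}],
)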
Long-running agents bloat their own context. We add summarization layers that compress old turns into 200-token gists, so a 12-step agent runs in ~8K tokens instead of 40K. Same quality, 5× cheaper, faster latency.
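A sketch of the compression layer — the turn count and gist budget are illustrative, and it assumes plain-string message contents:

def compress_history(messages: list[dict], keep_last: int = 4) -> list[dict]:
    """Fold stale turns into a short gist; keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    stale, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in stale)
    gist = client.messages.create(
        model="claude-haiku-4-5",  # summarization is a cheap-model job
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": "Summarize this conversation in under 200 tokens:\n\n"
                       f"{transcript}",
        }],
    )
    # Assumes recent[0] is an assistant turn so roles still alternate
    # after prepending the summary; adjust keep_last otherwise.
    return [{"role": "user",
             "content": f"[Summary of earlier turns] {gist.content[0].text}"}] + recent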
The migration that scares teams least: prompts + tool-use + eval suite. Three weeks of work, milestone-billed, with a walk-away point if the data doesn't move. Most teams see a 30–60% token-cost reduction after the optimization pass.
We rebuild your eval set from existing GPT outputs, audit every system prompt and adapt it to Claude's instruction style, and map your function-calling schema to Claude's tool-use format.
Most GPT-4 workloads map to Sonnet 4.6. We benchmark on your eval to confirm; sometimes Haiku is enough, sometimes Opus is required. You see the data, not our opinion.
We run Claude in shadow mode alongside your GPT pipeline. Same inputs, both outputs logged. You see the quality + cost delta on real traffic. Cut over only when the data shows parity or better.
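The wrapper is small — the pipeline and logging function names here are placeholders for your existing code:

def handle(request: dict) -> dict:
    primary = run_gpt_pipeline(request)            # existing production path
    try:
        shadow = run_claude_pipeline(request)      # same inputs, output only logged
        log_shadow_pair(request, primary, shadow)  # feeds the quality/cost diff
    except Exception as exc:
        log_shadow_error(request, exc)             # shadow failures never reach users
    return primary                                 # users keep getting the GPT answer

In practice the shadow call runs async off the request path, so it adds no user-facing latency.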
Once live, we run the token-optimization playbook: prompt caching, complexity routing, context compression. Most teams see another 30–60% cost reduction in the first month after cutover.
The same ticket-triage agent across Sonnet, Haiku, and Opus. Pick a model on the left — the model= line swaps and the per-ticket cost stat updates. This is how we choose a model: by running your eval, then looking at the bill.
from anthropic import Anthropic

client = Anthropic()

# Tool definitions in Claude's tool-use schema.
TOOLS = [
    {"name": "search_docs",
     "description": "Search internal product docs",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": ["query"]}},
    {"name": "reply",
     "description": "Create a Zendesk reply (pending review)",
     "input_schema": {"type": "object",
                      "properties": {"ticket_id": {"type": "integer"},
                                     "body": {"type": "string"}},
                      "required": ["ticket_id", "body"]}},
]

# Handlers that run_tool_loop dispatches to when Claude requests a tool.
def search_docs(query: str) -> list[dict]:
    return vector_db.search(query, k=5)

def reply(ticket_id: int, body: str) -> dict:
    return zendesk.update(ticket_id, body=body, status="pending")

def triage(ticket: dict) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",  # the line that swaps per model
        max_tokens=2048,
        tools=TOOLS,
        messages=[{"role": "user", "content": format_ticket(ticket)}],
    )
    return run_tool_loop(response, ticket)
The right deployment depends on which compliance regime you need to satisfy. We've shipped all three, walked the security teams through each, and provide a DPIA template at the audit stage.
Anthropic offers a BAA on Claude Enterprise. We deploy with audit logging, no-train-on-data toggle, and a 30-day retention default you can shorten to zero. We walk your security team through the architecture before code ships.
For workloads that need AWS's compliance posture, we deploy Claude through Bedrock with PrivateLink, KMS encryption, and CloudTrail logging. Your data never leaves your VPC. Same Claude model, AWS-managed control plane.
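Switching control planes barely changes the code — the anthropic SDK ships a Bedrock client. A sketch; the region and Bedrock-style model ID below are illustrative:

from anthropic import AnthropicBedrock

# Same messages API, AWS-managed control plane inside your compliance boundary.
client = AnthropicBedrock(aws_region="us-east-1")

response = client.messages.create(
    model="anthropic.claude-sonnet-4-6-v1:0",  # Bedrock model ID (illustrative)
    max_tokens=1024,
    messages=[{"role": "user", "content": "ping"}],
)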
Anthropic's EU data residency option for Claude API and Bedrock's eu-central-1 region cover the most common EU compliance asks. Data Processing Agreements available. We provide a DPIA template at the audit stage for your privacy team.
Same pricing as our other engagements. Most clients begin with the audit to scope, run a 4–6 week pilot on the highest-ROI workflow, then move to monthly for the next 3–5.
Find the Claude workflows worth shipping before you commit a budget.
One workflow shipped end-to-end on Claude, with eval data — not a demo.
Embedded squad shipping the next Claude workflow on your roadmap.
Three anonymized capability patterns drawn from real engagements. Named references shared under NDA once we know what you're building.
Inside legal team reviewing 80-page master agreements + amendments manually; 6 hours per contract average; clause deviations slipping through.
Claude Sonnet 4.6 ingests the full contract + amendment chain + the team's clause-deviation playbook in a single prompt. Returns a redline summary with citations to specific clause numbers.
Tier-1 support team drowning in repetitive product questions; help-center docs underused; agents copy-paste-editing the same replies.
Claude Sonnet RAG agent over product docs + historical ticket replies. Drafts reply if confidence > 0.7, escalates with a redacted draft otherwise. Learns from every agent edit.
Mid-size engineering team losing 4–8 hours per on-call rotation triaging stale alerts and tracing through a 150-file legacy service.
Claude Code agentic loop with custom subagents for repo navigation, log query, and PR drafting. Plugged into PagerDuty + the team's incident runbook.
Claude is developed by Anthropic, an AI-safety company founded in 2021 by former senior members of OpenAI. Anthropic's research focuses on alignment, interpretability, and Constitutional AI. The current model family — Claude Sonnet 4.6, Haiku 4.5, and Opus 4.7 — is widely benchmarked as competitive with or ahead of GPT-4 on reasoning, long-context, and code generation. GetWidget is not affiliated with Anthropic; we're a development partner that builds production applications on top of Claude.
Anthropic. The company is headquartered in San Francisco, was founded by Dario and Daniela Amodei alongside other former OpenAI researchers, and is backed by Google, Amazon, and other strategic investors. Anthropic ships Claude both directly through the Anthropic API and through cloud partners — most commonly AWS Bedrock and Google Vertex AI. Where you access Claude affects pricing, compliance posture, and data residency; we walk teams through that tradeoff at the audit stage.
For some tasks, yes — for others, no. The honest comparison: Claude Sonnet 4.6 wins on long-context work (200K vs OpenAI's 128K), most code benchmarks, the cleanliness of its tool-use API for long-running agents, and Constitutional-AI safety posture for regulated industries. GPT-4 wins on raw ecosystem maturity (assistants API, plugin marketplace, broader third-party library support) and audio via the Realtime API. For high-volume classification, both Haiku 4.5 and GPT-4o-mini are excellent and the choice often comes down to latency and cost on your specific eval. We benchmark per use case rather than pick a winner.
Not in 2026. Claude (and other frontier models) replaces narrowly-scoped engineering tasks — boilerplate generation, snippet-level refactoring, test-stub authoring, and parts of code review. It does not replace the work of figuring out what to build, designing systems, debugging novel production failures, owning long-running services, or making architectural calls. Teams that lean on Claude Code and Claude agents tend to ship more output per engineer — which means engineers do higher-leverage work, not less work. We use Claude Code on our own engineering team for exactly these reasons.
Yes — and it's usually less painful than teams expect. The work splits into three phases. (1) Prompt audit: most GPT system prompts translate with minor edits to match Claude's instruction style. (2) Function-calling to tool-use: Claude's tool-use API has a slightly different schema, but the conversion is mechanical and we automate most of it. (3) Re-eval: we rebuild your eval suite, run shadow-mode against your current GPT pipeline, and only cut over when the data shows quality is equal or better. Typical migration runs 4–6 weeks for a single application. Token-cost reduction post-migration is typically 30–60% with our optimization pass.
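The mechanical part of step (2) looks like this — a sketch assuming standard OpenAI function-calling definitions as input:

def to_claude_tool(openai_tool: dict) -> dict:
    """Map an OpenAI function-calling definition to Claude's tool-use schema."""
    fn = openai_tool.get("function", openai_tool)  # handle wrapped and bare forms
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }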
Three tactics, in order of impact. (1) Complexity routing — route 70% of decisions to Haiku 4.5 ($1/$5 per million tokens) and only escalate to Sonnet 4.6 when the eval suite says it's needed. Typical cost drop: 60–80% on the same workflow. (2) Prompt caching — Anthropic's caching makes repeated prefix reads cost 10% of normal input. We restructure prompts so system prompt + tool definitions + long context become cacheable. Typical effective input cost reduction: 70–85% within a week. (3) Context compression — long-running agents bloat their own context with stale turns; we add summarization layers that compress old turns into 200-token gists. Same quality, 5× cheaper, faster latency. We include this optimization pass in every Claude pilot.
Yes. Claude Code consulting is one of our most common engagements. We use Claude Code internally daily for code generation, repo-scale refactoring, test authoring, and on-call triage — so we speak from operator experience, not slides. Common Claude Code consulting engagements: setting up Claude Code for a 10–50 engineer team, building custom subagents tuned to your codebase, integrating Claude Code with your CI and PR review flow, and training the team on the prompting patterns that actually work for production code. Claude Code consulting sits inside our standard Claude pilot or continuous-team engagement.
Two valid paths. Path A — Claude consulting first: a fixed-fee two-week audit ($3K) where we map your workload, recommend which Claude (Sonnet 4.6 / Haiku 4.5 / Opus 4.7) fits each use case, project token cost, and deliver a 90-day Anthropic roadmap. You leave with a written plan; you can build with us or in-house. Path B — build with Claude directly: skip the audit if you already know what you want shipped. We move straight into a pilot ($10–25K, 4–6 weeks) on the highest-ROI workflow. Most teams take Path A. About 30% know exactly what they want built and skip to Path B. Anthropic consulting and Claude development engagements ladder into the same continuous team afterwards.
Yes — three deployment options depending on your compliance posture. Anthropic offers Claude Enterprise with SOC 2 Type II and a BAA for HIPAA workloads. AWS Bedrock gives you Claude inside AWS's compliance posture (SOC 2, HIPAA via BAA, PCI, FedRAMP-eligible regions) with PrivateLink so data never leaves your VPC. For EU sovereignty, Anthropic's EU data residency option and Bedrock's eu-central-1 region cover most GDPR-driven asks. We disable training-on-your-data on every deployment as a default. At the audit stage we provide an architecture review and DPIA template for your security and privacy teams.
Book a free Claude audit. We'll review your current LLM workload (if any), recommend Sonnet / Haiku / Opus per workflow, project token cost vs your current spend, and give you a 90-day Claude roadmap. No deck, no obligation to build.
Building with Claude often connects to OpenAI, agent frameworks, or full AI integration. These pages go deeper.
01 · GPT-4, GPT-4o, Realtime API integration.
02 · Multi-step autonomous agents with LangGraph, CrewAI.
03 · Plug Claude into Salesforce, Slack, NetSuite, and more.
04 · Production chatbots on Claude + GPT — RAG, guardrails, multi-channel.
05 · The umbrella pillar — operator-grade AI dev across Claude, GPT, and open-weights.
06 · Production workflow automation in 6–8 weeks.
07 · Strategy and roadmap before the build.
08 · Sonnet 4.6 for ambient scribe + Haiku 4.5 for surge-mode triage routing.
09 · Sonnet 4.6 for predictive-maintenance work orders + Haiku 4.5 for shift-handoff summaries.
10 · Sonnet 4.6 for clause analysis + cite-grounded brief research, Bedrock + customer-managed keys for ring 2.
11 · Sonnet 4.6 for AI revenue management + IROPS draft rationale, Haiku 4.5 as the surge-mode swap.
12 · Sonnet 4.6 for Socratic tutoring scaffolds + rubric-anchored grading drafts, LMS write-back on teacher sign-off.
13 · Sonnet 4.6 for KYC tier-2 enrichment + ECOA principal-reason extraction; Haiku 4.5 for inline card-auth fraud screen at 200ms p99.
14 · HR AI audit + roadmap — EEOC / AEDT / ADA regulatory ledger, bias-audit harness scoping, and HRIS / ATS integration gate-in before any inference.
15 · Insurance AI audit + roadmap — claim-lifecycle state machine, underwriting capacity sankey, and fraud-network mapping before any core-system integration.