You sat through the security review three times this quarter. Each time, a different version of the same line:


"Our security team needs to see Anthropic's invoice. Your invoice doesn't help them."


It arrives wrapped in other objections too:


  • "We already have OpenAI Enterprise with zero retention — why are we paying you a markup to use a worse contract?"
  • "Our model committee approved Claude on Bedrock in our VPC. You only support public endpoints."
  • "We hit a rate limit during the demo because we're on a shared tier with everyone else on the platform."
  • "We can't tell which prompts you log and which you don't."

Different complaints, one shape. The LLM bill is in the vendor's name, and the relationship with the model provider follows the bill.


The procurement loop usually runs like this: security asks for a DPA. The vendor provides a SOC 2. Security asks who can see the prompts, not who currently does. The vendor says they don't train on customer data. Security asks if that's contractually fixed or a policy setting. The vendor offers to escalate to their enterprise team. The deal stalls. Three weeks later, it happens again with a different stakeholder.


Most text-to-SQL pricing pages frame BYO keys as a cost question — saving the markup on a million tokens. That is not why your CISO blocked procurement. The real question is who sits between the prompt and the provider, and the answer follows the invoice.


The Key Is the Trust Boundary


Whoever signs the request is the account that holds the prompt logs. Their TOS dictates retention. Their abuse team can sample traffic. Their billing tier caps your throughput. Their negotiated price sets your floor. If the provider flips a clause — "we now train on customer data unless you contact sales" — the only safe response is to stop sending data, which means the product stops working. Teams in regulated industries live one DPA paragraph away from a procurement freeze.


This is not a theoretical risk. It is the daily operational reality for any team whose queries include customer PII, transaction records, or anything that touches a data residency requirement. "We don't train on your data" is a statement about current policy, not a contractual guarantee. Security teams know the difference — which is why the objection keeps resurfacing in new forms even after the vendor swears it's resolved.


Key Takeaway

Whoever's API key signs the request is the party that legally and operationally sees the prompt. Retention, sampling, rate limits, and rotation control all follow from that one fact.


Why "Bring Your Own Key" Usually Doesn't Move the Boundary

Most BYO-key offerings move the billing, not the boundary.


Vendor-paid with a markup. The vendor's key signs the request, the vendor's account reads the prompts, and you pay a margin for the privilege. Even if the vendor offers "zero retention" under their enterprise plan, the key is still theirs, the account is still theirs, and their policy still governs what the provider can do with data at rest. The audit finding doesn't close.


A reverse proxy in front of the SaaS. Cloudflare, an LLM gateway, an egress filter — useful for redaction and traffic logging, but it sits in front of the SaaS, not in front of the model provider. The vendor still sees the server-side payload. Nothing about the underlying contract changed. You've added operational complexity without resolving the DPA question, and when the vendor updates their API surface, the proxy breaks first.


Build it yourself on LangChain. Full control. Now you also own the schema linker, the dialect-aware repair pass, the eval harness, the agent UI, and the data source connectors. Six months of plumbing for the part that was never your differentiator. Engineering leadership runs the math on the first sprint and the project dies in a different meeting.


Per-user OAuth to OpenAI or Anthropic. Solves the billing question, breaks the team-shared workspace. End users don't want to authorize a personal model account to do their day job. The platform doesn't want to manage per-user credential delegation across hundreds of analysts who rotate teams, go on leave, or get their personal API access revoked for an unrelated reason.


Each alternative fails on a different axis. The shared root: the key is either on the vendor's side of the wire, or the integration degrades into something neither team would deploy in production.


What It Looks Like When the Customer Owns the Boundary

The architecture that actually moves the boundary keeps it simple: credentials are first-class tenant state, stored in the customer's own database, loaded at inference time, and used directly. No proxy. No platform-key fallback. What follows is the implementation — the same pattern applies to any self-hosted system that treats credentials as tenant state.


The data model is one row per team per provider:


```prisma
// packages/persistence/db/models/team-ai-provider.prisma
model TeamAIProvider {
  id        String  @id @default(cuid())
  teamId    String
  provider  String  // openai | anthropic | groq | google | deepseek | openrouter | ollama | lmstudio | oai-compat-{slug}
  apiKey    String? @db.Text
  baseUrl   String? // gateways, proxies, and self-hosted endpoints
  isEnabled Boolean @default(true)

  @@unique([teamId, provider])
}
```

No shared credential pool. If a team hasn't configured a provider, the agent refuses to run. That refusal is the feature: there is no fallback path where data crosses the wrong boundary when something is misconfigured.
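

A sketch of that lookup-or-refuse step, assuming the Prisma client generated from the schema above (Prisma derives the teamId_provider selector from the @@unique constraint):


```ts
// One row per team per provider; nothing to fall back to if it is
// missing or disabled. teamId and provider come from the request context.
const teamProvider = await prisma.teamAIProvider.findUnique({
  where: { teamId_provider: { teamId, provider } },
});

if (!teamProvider?.isEnabled) {
  throw new ProviderNotConfiguredError(provider);
}
```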


With that row loaded at inference time, the agent instantiates the model client with the key directly:


```ts
// apps/backend/src/core/agents/sql-agent.ts
// modelName resolved earlier from agent config
const { provider, apiKey, baseUrl } = teamProvider;

if (!apiKey && !isLocalProvider(provider)) {
  throw new ProviderNotConfiguredError(provider);
}

return {
  model: modelProvider(provider, apiKey ?? '', baseUrl ?? undefined)(modelName),
};
```

modelProvider is a switch over the major provider SDKs — OpenAI, Anthropic, Groq, Google, DeepSeek, OpenRouter — plus an OpenAI-compatible adapter for any endpoint that speaks the OpenAI API format: Bedrock-fronted gateways, custom proxies, Ollama, LM Studio. For Ollama and LM Studio, the API key field is optional — the base URL is the whole config, which makes air-gapped inference a one-field setup. The request leaves the customer's network on their TLS connection, signed with their key, hitting whichever endpoint is in the row.
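

A minimal sketch of that switch, assuming the Vercel AI SDK provider factories (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/openai-compatible); the actual modelProvider may be built on different SDKs, but each branch amounts to one factory call with the tenant's key and optional base URL:


```ts
import { createOpenAI } from '@ai-sdk/openai';
import { createAnthropic } from '@ai-sdk/anthropic';
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

function modelProvider(provider: string, apiKey: string, baseUrl?: string) {
  switch (provider) {
    case 'openai':
      return createOpenAI({ apiKey, baseURL: baseUrl });
    case 'anthropic':
      return createAnthropic({ apiKey, baseURL: baseUrl });
    // groq, google, deepseek, openrouter: same pattern, one factory each
    default: {
      // Ollama, LM Studio, Bedrock-fronted gateways, oai-compat-{slug}:
      // anything that speaks the OpenAI wire format. The base URL is the
      // whole config; the key may be empty for local endpoints.
      if (!baseUrl) throw new Error(`${provider} requires a base URL`);
      return createOpenAICompatible({
        name: provider,
        baseURL: baseUrl,
        apiKey: apiKey || undefined,
      });
    }
  }
}
```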


A few smaller decisions reinforce this: the API response masks the saved key as eight bullets plus the last four characters so it never round-trips back to the browser. Each agent can point at a different provider row — a "PII-safe" agent pinned to a self-hosted model, a "general analytics" agent using Claude. The per-call model namespace (anthropic:claude-sonnet-4) overrides the provider at request time without borrowing credentials across teams.
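

The masking itself is small enough to show whole; a hypothetical version matching the shape described (the function name is illustrative):


```ts
// Eight bullets plus the last four characters; the full key never
// round-trips back to the browser.
function maskApiKey(key: string): string {
  return '••••••••' + key.slice(-4);
}

maskApiKey('sk-proj-abcdefwxyz'); // '••••••••wxyz'
```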


Three Shapes the Same Architecture Covers

A regulated bank pointing at an internal Bedrock gateway. Their model committee approved Claude on Bedrock inside the VPC — not on public api.anthropic.com. They register an OpenAI-compatible provider with their Bedrock-fronted gateway as the base URL and a service-account token as the key. The agent code path is identical to public OpenAI; no special Bedrock SDK required. The request never leaves the VPC. Their procurement team gets what they needed: Anthropic's data handling terms apply to their key, their account, their negotiated contract — not a vendor's.
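

In row terms, that registration could look like the following; the gateway URL and token source are hypothetical, though the oai-compat-{slug} convention comes from the schema comment above:


```ts
// Illustrative values; teamId comes from the tenant being configured.
const serviceAccountToken = process.env.BEDROCK_GW_TOKEN!;

await prisma.teamAIProvider.upsert({
  where: { teamId_provider: { teamId, provider: 'oai-compat-bedrock-gw' } },
  create: {
    teamId,
    provider: 'oai-compat-bedrock-gw',
    baseUrl: 'https://llm-gw.bank.internal/v1', // gateway inside the VPC
    apiKey: serviceAccountToken,                // their token, not a vendor's
  },
  update: {
    baseUrl: 'https://llm-gw.bank.internal/v1',
    apiKey: serviceAccountToken,
  },
});
```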


A startup splitting workloads across Claude and self-hosted Llama. They register two providers on the same team — Anthropic for prose-heavy queries that benefit from Claude's reasoning, and Ollama at an internal endpoint for joins that touch sensitive customer records. Two agents, each pointing at a different provider row. The model dropdown shows both namespaced correctly, and side-by-side comparison routes each prompt through the right key automatically. Sensitive data never leaves their network at all.


An enterprise rotating a leaked key. Their security team spots a committed secret at 2 AM. They update the provider row with the new key. The next request loads the updated record and instantiates a fresh client — no in-process cache to invalidate, no support ticket to file, no vendor workflow to follow. Rotation is effective on the next request. The flip side is equally plain: if the key is compromised at the model provider level, you revoke there and re-key here. The boundary is yours to manage, in both directions.
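

Under the same assumptions, the 2 AM fix is one statement; nothing else has to happen for the new key to take effect:


```ts
// The next request loads this row fresh; there is no in-process cache
// to invalidate and no vendor in the loop.
async function rotateKey(teamId: string, provider: string, newKey: string) {
  await prisma.teamAIProvider.update({
    where: { teamId_provider: { teamId, provider } },
    data: { apiKey: newKey },
  });
}
```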


Renting the Boundary vs. Owning It

The difference shows up along the axes procurement actually asks about.


Vendor-held key

  • Prompts cross the vendor's network and sit in the vendor's logs
  • Retention and sampling are governed by the vendor's TOS
  • Rate limits are shared with everyone else on the platform tier
  • Rotation requires the vendor's cooperation or a support escalation

Customer-held key

  • Prompts cross your network on your TLS, signed with your key
  • Retention and sampling follow your direct contract with the model provider
  • Rate limits match whatever tier your account holds, including enterprise contracts you already pay for
  • Rotation is an upsert to your own database row — effective on the next request

What Owning the Boundary Doesn't Solve

Putting the boundary on your side of the wire moves the operational surface to your side too. Worth being plain about what that means in practice.


The provider key column is stored as a plain Postgres TEXT value, not pre-encrypted at the application layer. Encryption at rest is your Postgres deployment's job: managed Postgres with a customer-managed key, a KMS-aware secret manager wrapping the deployment, or disk encryption at the infrastructure layer. If your database runs on an unencrypted volume with a permissive role, the key is readable to anyone who can query that table. Self-hosting means you own the security posture of the layer below the application: nothing in the data model encrypts the key for you, so you choose the envelope that meets your control requirements before a key ever reaches the database.
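

If you want application-layer encryption on top, one possible envelope, sketched with Node's built-in crypto and a 32-byte data-encryption key you would unwrap via your KMS or secret manager; none of this ships with the platform:


```ts
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Encrypt before the row is written; store the base64 result in apiKey.
function sealKey(plaintext: string, dek: Buffer): string {
  const iv = randomBytes(12); // standard GCM nonce size
  const cipher = createCipheriv('aes-256-gcm', dek, iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString('base64');
}

// Decrypt after the row is read, just before instantiating the client.
function openKey(sealed: string, dek: Buffer): string {
  const buf = Buffer.from(sealed, 'base64');
  const decipher = createDecipheriv('aes-256-gcm', dek, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28)); // 16-byte GCM auth tag
  const ct = buf.subarray(28);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8');
}
```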


Key rotation today is a naive upsert — no rolling window where old and new keys are simultaneously valid during a transition, and no built-in audit log of who changed what and when. For most teams this is fine; they rotate infrequently and can tolerate a brief window. For teams with strict rotation policies or SOX-adjacent audit requirements, that gap is real. Application-level logging can capture the rotation event, but the platform does not provide a rotation workflow, a dual-key validity window, or a change ledger out of the box.
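

Closing the audit gap yourself is mechanical; a hypothetical wrapper that records the rotation in the same transaction, where auditLog is a table you would have to add to the schema:


```ts
async function rotateKeyWithAudit(
  teamId: string,
  provider: string,
  newKey: string,
  actorId: string,
) {
  await prisma.$transaction([
    prisma.teamAIProvider.update({
      where: { teamId_provider: { teamId, provider } },
      data: { apiKey: newKey },
    }),
    // auditLog is hypothetical, not part of the shipped data model
    prisma.auditLog.create({
      data: { teamId, actorId, action: 'provider.key.rotate' },
    }),
  ]);
}
```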


A few more edges worth naming directly: keys are team-scoped, not user-scoped. If two analysts need separate budget envelopes inside the same team workspace, you model that as two teams — there is no per-user provider binding. Provider-side budget alarms — monthly token caps, spend alerts, hard cutoffs — live on the provider's dashboard, not here; the platform does not enforce a token ceiling per provider row.


These are not reasons to avoid owning the boundary. They are the operating conditions that come with it. The alternative — renting the boundary from a vendor — does not make these problems disappear. It makes them someone else's problems to solve on someone else's schedule.