Defense in Depth: Keeping Read-Only Really Read-Only

When you connect a production database to an AI agent, every query the agent generates runs against live data.In Limerence, an agent is an AI assistant connected to your database that turns plain-language questions into SQL queries and executes them. If the agent ever writes instead of reads — even once — that is a production incident. Deleted rows, corrupted tables, dropped indexes. The kind of failure that makes people stop trusting the product.

So "read-only" cannot be a feature checkbox. It is an engineering invariant that needs enforcement at every level of the stack.

One Layer Fails to One Bypass

The most common approach to read-only enforcement is a single check: set the database connection to read-only mode, or parse the SQL and reject anything that is not a SELECT. Either one sounds reasonable. Neither one is sufficient alone.

A connection-level setting can fail silently. If the session initialization throws an error and the system logs it but continues, the connection is live with full write access and nothing raises an alarm. A SQL regex can be fooled by invisible Unicode characters — a zero-width space inside DROP makes it look like an unknown token to a naive pattern matcher while some database engines still parse it correctly. An LLM instruction that says "only generate SELECT queries" can be overridden if the model follows an injected instruction in user input.

Each layer alone is breakable in a different way. Defense in depth means stacking independent layers so that a dangerous query has to defeat all of them to cause harm.

◆Key Takeaway

Read-only enforcement works because independent layers each catch a different class of threat. Grounded scope rejects queries that touch the wrong data, SQL validation rejects unsafe statements, database sessions refuse writes where supported, and file protections close the loop for local data.

Scope Is a Runtime Gate

The earliest safety boundary is the agent's data scope. Limerence does not hand an agent an unbounded map of the company database, unrelated conversations, and every saved file. It grounds the agent in the business knowledge for that agent, the current conversation, the user's relevant saved preferences, and the tables and views configured for that agent.

That scoping matters before SQL exists. An agent for revenue operations should be grounded in revenue terms, revenue tables, and revenue workflows. It should not behave like a general database console that happens to speak natural language. The narrower the working context, the less likely the model is to invent unrelated tables, follow irrelevant instructions, or reach for data outside the task.

The agent's instructions add another constraint: validate SQL before executing it, and never present results from a query that has not been validated and executed in the current turn.The model is also told to ignore override instructions embedded in user messages, which reduces the chance that prompt injection changes its behavior. Instructions shape the query before the runtime gates ever see it. They are useful because they lower the number of bad queries the harder layers need to catch.

But scope is not only prompt context. Once SQL is generated, Limerence checks the tables and views referenced by the query against the agent's grounded data scope. If the SQL tries to read an entity outside that scope, the query is rejected before validation or execution can reach the database.

sql

-- Agent grounded to accounts, opportunities, and invoices:
SELECT * FROM payroll_adjustments;
-- This is still a SELECT, but it leaves the agent's scope.
-- The runtime rejects it before the database sees it.

-- Agent grounded to accounts, opportunities, and invoices:
SELECT * FROM payroll_adjustments;
-- This is still a SELECT, but it leaves the agent's scope.
-- The runtime rejects it before the database sees it.

This catches a different failure mode than read-only validation. A query can be perfectly read-only and still be wrong: the model might hallucinate a table, a user might ask for data the agent was not configured to use, or a prompt injection might try to steer the agent toward unrelated records. Statement-type validation answers "is this a read?" Grounded scope answers "is this a read from the data this agent is allowed to use?"

The safe behavior is fail-closed. If Limerence cannot analyze the query's table references reliably, it stops before validation or execution. That can reject a valid dialect edge case, but it is better than letting an unanalyzed query become a database call.

Normalize Before You Validate

Every SQL query passes through a preprocessor before it reaches the database — on every adapter method (execute, validate, test), for every engine.A preprocessor is a function that transforms or validates the SQL string before the database sees it. The read-only validator is always the first preprocessor in the chain, installed in every adapter — Postgres, MySQL, MariaDB, SQL Server, SQLite, BigQuery.

The first thing the preprocessor does is strip invisible characters. Unicode defines two categories of characters that are visually absent but can affect parsing: format charactersUnicode category Cf — zero-width spaces, zero-width non-joiners, soft hyphens, byte-order marks. Characters that change rendering or layout without being visible. and control characters.Unicode category Cc — ASCII control codes like null bytes, backspaces, and escape sequences. Newlines and tabs are preserved because SQL line comments depend on them. Both are dropped before any validation runs. The stripped, normalized string is what every subsequent check operates on.

This closes a specific attack vector. A zero-width space is a Unicode character that occupies no visible space — it is present in the bytes but absent to the eye. Placed inside a SQL keyword, it can fool a naive regex while some database engines still parse the token correctly:

sql

-- Byte-level view of what the model or an attacker might produce:
SE[U+200B]LECT * FROM users; DR[U+200B]OP TABLE users
--  ^^^^^^^^                    ^^^^^^^^
--  zero-width spaces hiding inside keywords

-- After stripping, the validator sees plain text:
SELECT * FROM users; DROP TABLE users
-- Now the multi-statement check catches the semicolon.

-- Byte-level view of what the model or an attacker might produce:
SE[U+200B]LECT * FROM users; DR[U+200B]OP TABLE users
--  ^^^^^^^^                    ^^^^^^^^
--  zero-width spaces hiding inside keywords

-- After stripping, the validator sees plain text:
SELECT * FROM users; DROP TABLE users
-- Now the multi-statement check catches the semicolon.

After normalization, the validator parses the SQL into an abstract syntax treeAn AST is a structured representation of the SQL statement. Instead of matching text patterns, the system can inspect the statement's type, clauses, and structure programmatically. and runs a fixed sequence of checks:

1
Empty query — blank strings after normalization are rejected.
2
Comments rejected — both -- line comments and /* */ block comments are rejected outright as injection vectors, not stripped and continued.
3
Multi-statement blocked — a semicolon followed by another statement is caught. AST-based when possible, regex-based as fallback.
4
Statement type enforced — only SELECT is allowed. The AST node type must be select. If the parser failed, the query must start with SELECT or WITH.
5
EXPLAIN guarded — EXPLAIN SELECT is allowed for debugging. EXPLAIN ANALYZE is blocked because it actually executes the query.
6
Data-modifying CTEs caught — WITH clauses are walked for INSERT, UPDATE, or DELETE hiding inside what looks like a read query.
7
Platform-specific traps — SQLite write-PRAGMAs blocked. MySQL INTO OUTFILE blocked. SQL Server SET PARSEONLY wrappers unwrapped and the inner query re-validated.

The design choice that matters most is the dual strategy. The system tries AST parsing first using a SQL parser that understands dialect-specific syntax. If the parser cannot handle the query — an unusual syntax extension, a very long statement, a parser bug — it does not silently pass. It falls back to keyword matching that covers DROP, TRUNCATE, ALTER, CREATE, GRANT, REVOKE, EXEC, MERGE, COPY, LOCK, and others.

If any check fails, the system throws a typed error that the API layer returns as a 400 with a machine-readable code. The caller gets a clear signal that the query was rejected for safety, not a generic execution failure.

The Database Session Refuses Writes

The preprocessor catches dangerous SQL before it reaches the database. But what if the preprocessor has a bug, or an edge case slips through a parser limitation? The next layer makes the database engine itself refuse write operations.

How this works depends on the engine:

Every new pool connection runs SET default_transaction_read_only = on during initialization. This is a session-level setting that persists for the life of the connection. PostgreSQL will reject any INSERT, UPDATE, DELETE, or DDL with a read-only transaction error before it touches any data.

Single layer

If the preprocessor misses a query, the database executes it with full write access.

one parser bug = one write to production
no fallback, no second opinion
the blast radius of a single failure is unbounded

Preprocessor plus session enforcement

Even if the preprocessor misses a query, the database session refuses writes independently.

PostgreSQL: session-level rejection before data is touched
MySQL / MariaDB: per-call rejection
SQL Server and BigQuery: preprocessor remains the primary guard
where session enforcement exists, the database engine acts as an independent second voter

The File System Refuses the Write

For SQLite databases — including spreadsheet uploads, which are converted to SQLite at import time — another layer operates below the database engine entirely.

Every SQLite file is opened with three independent read-only flags:

The first two are SQLite URI parameters. The third is a Node.js constructor option.

mode=ro          → SQLite URI: opens the file in read-only mode
immutable=1      → SQLite URI: skips the locking protocol entirely

mode=ro          → SQLite URI: opens the file in read-only mode
immutable=1      → SQLite URI: skips the locking protocol entirely

new DatabaseSync(uri, { readOnly: true });
// Node.js binding-level enforcement, independent of the URI flags

new DatabaseSync(uri, { readOnly: true });
// Node.js binding-level enforcement, independent of the URI flags

mode=ro tells the operating system to refuse any write system call to the file. immutable=1 goes further — SQLite treats the database as permanently constant and will not attempt any write operation, not even a journal file. The readOnly constructor option adds a third enforcement point at the Node.js binding level, independent of the URI flags.

This is triple-redundant by design. Even if the preprocessor and every other layer were somehow bypassed, the operating system kernel would return a read-only error before a single byte reached the disk. And because the database file is opened fresh and closed after every query, there is no persistent handle that could accumulate stale state.

Where Enforcement Is Thinner

No system is uniformly strong. The defense-in-depth model means a gap in one layer does not become an incident — but the gaps still exist.

PostgreSQL's session hook is not fail-fast. If the SET default_transaction_read_only = on command fails during connection initialization — the database user lacks SET permission, or the connection times out mid-handshake — the error is logged but the connection is returned to the pool anyway. In that window, only the preprocessor stands between the query and a writable connection.

SQL Server execution has no session guard. Validation wraps in a rolled-back transaction, but execution runs directly against the pool with no database-level write protection. The preprocessor is the only code-level safety net for SQL Server queries at execution time.

BigQuery relies on infrastructure configuration. If the service account has write permissions in Google Cloud and the preprocessor is bypassed, there is no code-level fallback. The assumption is that IAM roles are configured correctly — a reasonable assumption, but not an engineering guarantee.

MySQL and MariaDB enforce per-call, not per-connection. The read-only guard fires at the start of every adapter method call. Code that accesses the underlying connection pool directly — bypassing the adapter — would skip the per-call SET SESSION entirely. No such path exists in the current codebase, but the pool object is accessible, so the guard is architecturally softer than PostgreSQL's per-connection setting.

AST parsing failure degrades to regex. If the SQL parser cannot handle a query, the system falls back to keyword matching. The keyword list is comprehensive, but regex matching is not semantically equivalent to AST analysis. The fallback catches the common cases, but an adversary with knowledge of the regex patterns could in theory craft something that passes.

Scope analysis depends on parser coverage. Limerence rejects queries that reference tables or views outside the agent's grounded scope, but it has to analyze the SQL to know what the query touches. If a dialect-specific construct cannot be analyzed reliably, the safe behavior is to reject the query before validation or execution. That protects data, but it can also block valid read-only queries until parser coverage improves.

Semantic Understanding, Not Just Keywords

The path forward is moving from "block dangerous statement types" to "understand what the query does." Limerence already takes the first step by checking which tables and views the query touches. It still has more to learn about cost, result size, column breadth, and whether a technically valid query matches the user's actual intent.

That distinction matters. Limerence does not stop at asking the model to stay inside the right semantic neighborhood. It checks the generated SQL against the agent's grounded data scope and rejects out-of-scope reads before they become database calls.

The remaining frontier is resource and intent governance inside the allowed scope. A query can touch the right tables and still scan too much data, return too many rows, or select every column from a wide table. Cost estimation, row limits, SELECT * blocking, stronger dialect coverage, and fail-fast session hooks are the next hardening work on top of the existing write-safety and scope-safety foundation.

The layers already make write-through a multi-fault scenario. Grounded scope adds another important property: even a read-only query has to stay inside the part of the data model the agent was configured to use. For the broader picture of how Limerence treats trust — auth model, deployment topology, audit logging — see the security overview.

Teaching Agents Your Business Language

How domain jargon becomes typed instruction fragments and runtime context — the knowledge layer that helps agents generate correct queries in the first place.