LIMERENCE
EngineeringMAY 12, 20268 min read

Defense in Depth: Keeping Read-Only Really Read-Only

Four independent layers of SQL safety — from invisible character stripping and AST validation to session-level enforcement and OS-level file protection — ensure AI-generated queries never modify your data.

When you connect a production database to an AI agent, every query the agent generates runs against live data.In Limerence, an agent is an AI assistant connected to your database that turns plain-language questions into SQL queries and executes them. If the agent ever writes instead of reads — even once — that is a production incident. Deleted rows, corrupted tables, dropped indexes. The kind of failure that makes people stop trusting the product.


So "read-only" cannot be a feature checkbox. It is an engineering invariant that needs enforcement at every level of the stack.

One Layer Fails to One Bypass

The most common approach to read-only enforcement is a single check: set the database connection to read-only mode, or parse the SQL and reject anything that is not a SELECT. Either one sounds reasonable. Neither one is sufficient alone.


A connection-level setting can fail silently. If the session initialization throws an error and the system logs it but continues, the connection is live with full write access and nothing raises an alarm. A SQL regex can be fooled by invisible Unicode characters — a zero-width space inside DROP makes it look like an unknown token to a naive pattern matcher while some database engines still parse it correctly. An LLM instruction that says "only generate SELECT queries" can be overridden if the model follows an injected instruction in user input.


Each layer alone is breakable in a different way. Defense in depth means stacking independent layers so that a dangerous query has to defeat all of them to cause harm.

Key Takeaway

Read-only enforcement works because four independent layers each catch a different class of threat. Connection bypass, SQL obfuscation, parser failure, and prompt injection are all handled by different parts of the system.

Normalize Before You Validate

Every SQL query passes through a preprocessor before it reaches the database — on every adapter method (execute, validate, test), for every engine.A preprocessor is a function that transforms or validates the SQL string before the database sees it. The read-only validator is always the first preprocessor in the chain, installed in every adapter — Postgres, MySQL, MariaDB, SQL Server, SQLite, BigQuery.


The first thing the preprocessor does is strip invisible characters. Unicode defines two categories of characters that are visually absent but can affect parsing: format charactersUnicode category Cf — zero-width spaces, zero-width non-joiners, soft hyphens, byte-order marks. Characters that change rendering or layout without being visible. and control characters.Unicode category Cc — ASCII control codes like null bytes, backspaces, and escape sequences. Newlines and tabs are preserved because SQL line comments depend on them. Both are dropped before any validation runs. The stripped, normalized string is what every subsequent check operates on.


This closes a specific attack vector. A zero-width space is a Unicode character that occupies no visible space — it is present in the bytes but absent to the eye. Placed inside a SQL keyword, it can fool a naive regex while some database engines still parse the token correctly:

sql
-- Byte-level view of what the model or an attacker might produce:
SE[U+200B]LECT * FROM users; DR[U+200B]OP TABLE users
--  ^^^^^^^^                    ^^^^^^^^
--  zero-width spaces hiding inside keywords

-- After stripping, the validator sees plain text:
SELECT * FROM users; DROP TABLE users
-- Now the multi-statement check catches the semicolon.

After normalization, the validator parses the SQL into an abstract syntax treeAn AST is a structured representation of the SQL statement. Instead of matching text patterns, the system can inspect the statement's type, clauses, and structure programmatically. and runs a fixed sequence of checks:

  1. 1
    Empty query — blank strings after normalization are rejected.
  2. 2
    Comments rejected — both -- line comments and /* */ block comments are rejected outright as injection vectors, not stripped and continued.
  3. 3
    Multi-statement blocked — a semicolon followed by another statement is caught. AST-based when possible, regex-based as fallback.
  4. 4
    Statement type enforced — only SELECT is allowed. The AST node type must be select. If the parser failed, the query must start with SELECT or WITH.
  5. 5
    EXPLAIN guardedEXPLAIN SELECT is allowed for debugging. EXPLAIN ANALYZE is blocked because it actually executes the query.
  6. 6
    Data-modifying CTEs caughtWITH clauses are walked for INSERT, UPDATE, or DELETE hiding inside what looks like a read query.
  7. 7
    Platform-specific traps — SQLite write-PRAGMAs blocked. MySQL INTO OUTFILE blocked. SQL Server SET PARSEONLY wrappers unwrapped and the inner query re-validated.

The design choice that matters most is the dual strategy. The system tries AST parsing first using a SQL parser that understands dialect-specific syntax. If the parser cannot handle the query — an unusual syntax extension, a very long statement, a parser bug — it does not silently pass. It falls back to keyword matching that covers DROP, TRUNCATE, ALTER, CREATE, GRANT, REVOKE, EXEC, MERGE, COPY, LOCK, and others.


If any check fails, the system throws a typed error that the API layer returns as a 400 with a machine-readable code. The caller gets a clear signal that the query was rejected for safety, not a generic execution failure.

The Database Session Refuses Writes

The preprocessor catches dangerous SQL before it reaches the database. But what if the preprocessor has a bug, or an edge case slips through a parser limitation? The second layer makes the database engine itself refuse write operations.


How this works depends on the engine:

Every new pool connection runs SET default_transaction_read_only = on during initialization. This is a session-level setting that persists for the life of the connection. PostgreSQL will reject any INSERT, UPDATE, DELETE, or DDL with a read-only transaction error before it touches any data.

Single layer

If the preprocessor misses a query, the database executes it with full write access.

  • one parser bug = one write to production
  • no fallback, no second opinion
  • the blast radius of a single failure is unbounded

Preprocessor plus session enforcement

Even if the preprocessor misses a query, the database session refuses writes independently.

  • PostgreSQL: session-level rejection before data is touched
  • MySQL / MariaDB: per-call rejection
  • SQL Server and BigQuery: preprocessor remains the primary guard
  • where session enforcement exists, the database engine acts as an independent second voter

The File System Refuses the Write

For SQLite databases — including spreadsheet uploads, which are converted to SQLite at import time — there is a third layer that operates below the database engine entirely.


Every SQLite file is opened with three independent read-only flags:

The first two are SQLite URI parameters. The third is a Node.js constructor option.

mode=ro          → SQLite URI: opens the file in read-only mode
immutable=1      → SQLite URI: skips the locking protocol entirely
ts
new DatabaseSync(uri, { readOnly: true })
// Node.js binding-level enforcement, independent of the URI flags

mode=ro tells the operating system to refuse any write system call to the file. immutable=1 goes further — SQLite treats the database as permanently constant and will not attempt any write operation, not even a journal file. The readOnly constructor option adds a third enforcement point at the Node.js binding level, independent of the URI flags.


This is triple-redundant by design. Even if the preprocessor and every other layer were somehow bypassed, the operating system kernel would return a read-only error before a single byte reached the disk. And because the database file is opened fresh and closed after every query, there is no persistent handle that could accumulate stale state.

The Model Is Already Shaped

The fourth layer operates before SQL exists at all. The agent's system prompt includes guardrails that shape what the model generates in the first place.The model is instructed to validate SQL before executing it, and to never present results from a query that has not been validated and executed in the current turn. Skill-level instructions explicitly tell the model to call a validation step before any execution step. Anti-prompt-injection rules instruct the model to ignore override instructions embedded in user messages.


This is the weakest layer. It is a soft constraint — the model could in theory ignore it. But it is also the earliest layer in the chain, and it dramatically reduces the surface area for the harder layers downstream. The vast majority of queries the model generates are already well-formed SELECT statements because that is what the prompt shapes it to produce. The engineering layers exist to catch the exceptions.

Where Enforcement Is Thinner

No system is uniformly strong. The defense-in-depth model means a gap in one layer does not become an incident — but the gaps still exist.


PostgreSQL's session hook is not fail-fast. If the SET default_transaction_read_only = on command fails during connection initialization — the database user lacks SET permission, or the connection times out mid-handshake — the error is logged but the connection is returned to the pool anyway. In that window, only the preprocessor stands between the query and a writable connection.


SQL Server execution has no session guard. Validation wraps in a rolled-back transaction, but execution runs directly against the pool with no database-level write protection. The preprocessor is the only code-level safety net for SQL Server queries at execution time.


BigQuery relies on infrastructure configuration. If the service account has write permissions in Google Cloud and the preprocessor is bypassed, there is no code-level fallback. The assumption is that IAM roles are configured correctly — a reasonable assumption, but not an engineering guarantee.


MySQL and MariaDB enforce per-call, not per-connection. The read-only guard fires at the start of every adapter method call. Code that accesses the underlying connection pool directly — bypassing the adapter — would skip the per-call SET SESSION entirely. No such path exists in the current codebase, but the pool object is accessible, so the guard is architecturally softer than PostgreSQL's per-connection setting.


AST parsing failure degrades to regex. If the SQL parser cannot handle a query, the system falls back to keyword matching. The keyword list is comprehensive, but regex matching is not semantically equivalent to AST analysis. The fallback catches the common cases, but an adversary with knowledge of the regex patterns could in theory craft something that passes.

Semantic Understanding, Not Keyword Matching

The path forward is moving from "block dangerous statement types" to "understand what the query does." The current system knows that a statement is a SELECT. It does not know which tables the query touches, what the expected cost is, or whether the query matches the user's stated intent.


Table allow-lists at the preprocessor level would prevent hallucinated table names from reaching the database. Cost estimation would cap damage even for technically read-only queries that scan terabytes. And making every session-enforcement hook fail-fast — refusing the connection instead of logging and continuing — would close the remaining windows where a single layer failure matters more than it should.


The four layers already make write-through a multi-fault scenario. The work ahead is making each layer individually airtight.

Teaching Agents Your Business Language

How domain jargon becomes typed instruction fragments and runtime context — the knowledge layer that helps agents generate correct queries in the first place.

Keep reading