nl2sql

Security Architecture

The NL2SQL Platform implements a multi-layered security approach designed to ensure that users can only access data they are explicitly authorized to see. This document outlines the security mechanisms for authentication, authorization, and query validation.

1. Authentication & Context

The platform assumes that the caller (API or CLI) has already authenticated the user. The user’s identity and roles are passed into the execution pipeline via the user_context dictionary in the GraphState.

User Context Structure

The user_context must contain the following keys to enable authorization checks:

{
  "role": "sales_analyst",
  "allowed_datasources": ["manufacturing_history", "manufacturing_supply"],
  "allowed_tables": [
      "customers", "sales_orders", "products", "inventory"
  ]
}

2. Authorization Layers

Security checks are performed at two distinct stages of the pipeline to fail fast and prevent unauthorized data access.

Layer 1: Datasource Access (Routing)

Component: DecomposerNode & OrchestratorVectorStore

Before any query planning begins, the system enforces Knowledge Isolation.

Layer 2: Table Access (Logical Validation)

Component: LogicalValidatorNode

After the PlannerNode generates an abstract syntax tree (AST) for the query, the LogicalValidatorNode performs a strict policy check.

3. Query Safety & Validation

Beyond RBAC, the system enforces strict structural constraints to prevent SQL injection and accidental mutation.

Read-Only Enforcement

The LogicalValidatorNode enforces that all generated plans are strictly READ operations (SELECT).

Validated AST vs. Raw SQL

The system does not rely on the LLM to generate raw SQL strings directly.

  1. Planner: Generates a typed JSON AST (PlanModel).
  2. Validator: Validates the AST logic and security.
  3. Generator: Compiles the validated AST into SQL using a deterministic compiler (Visitor pattern).

This separation ensures that “Prompt Injection” attacks cannot easily force the model to output malicious SQL syntax, as the Generator controls the final output syntax.

4. Resource Protection (DoS)

To prevent Denial of Service (DoS) attacks or run-away queries, the system implements resource safeguards in the ExecutorNode.

5. Secret Management

The platform employs a Pluggable Secret Management system (nl2sql.secrets) to handle sensitive credentials securely.

Mechanism

Providers

The system supports extensible providers via the SecretProvider protocol. You configure them in secrets.yaml with a unique id.

  1. Environment (env): Standard lookup (e.g., ${env:DB_PASS}). Always available.
  2. AWS Secrets Manager: Defined by type aws. (e.g., ${aws-prod:db/pass}).
  3. Azure Key Vault: Defined by type azure. (e.g., ${azure-main:db-secret}).
  4. HashiCorp Vault: Defined by type hashi. (e.g., ${vault-internal:secret/data/db:pass}).

Dependencies: Cloud providers require optional extras (nl2sql-core[aws], nl2sql-core[azure], etc.) to keep the core lightweight.

6. Configuration Security

Strict Validation (V3)

Datasource configurations are validated strictly at load time. This ensures:

6.1 Policy Definition (configs/policies.json)

The application uses Role-Based Access Control (RBAC). The policies.json file defines policies keyed by Role ID (e.g., admin, analyst).

Strict Namespacing Rule: To prevent namespace collisions, allowed_tables MUST use the format datasource_id.table_name. Simple table names are not supported.

Example

{
  "sales_analyst": {
    "description": "Access to Sales DB only",
    "role": "analyst",
    "allowed_datasources": ["sales_db"],
    "allowed_tables": [
      // Exact Match
      "sales_db.orders",
      
      // Datasource Wildcard
      "sales_db.customers_*"
    ]
  },
  "admin": {
    "description": "Super Admin",
    "role": "admin",
    "allowed_tables": ["*"]
  }
}

In CLI execution: nl2sql run ... --role sales_analyst. The application assumes the identity provider has already authenticated the user and assigned this role.

6.2 Policy Schema & Validation

Policies are treated as Configuration Code. To prevent misconfiguration (e.g., typos, invalid types), the system validates policies.json against a strict Pydantic Schema at startup.

Schema (nl2sql.security.policies):

  1. Strict Typing: Fields like allowed_datasources MUST be lists of strings.
  2. Syntax Enforcement: allowed_tables values are validated ensuring they match the datasource_id.table_name or wildcard format.
  3. Fail Fast: If the configuration is invalid, the application refuses to start, printing a clear error message describing the violation.

6.3 Policy Management CLI

You can validate your policy file without running a query using the CLI.

# Validate Syntax & Integrity
nl2sql policy validate

This command performs two checks:

  1. Schema Check: Validates syntax against the Pydantic model.
  2. Integrity Check: Verifies that referenced datasources and tables actually exist in datasources.yaml. Users often typo table names; this catches those errors before runtime.