The NL2SQL Platform implements a multi-layered security approach designed to ensure that users can only access data they are explicitly authorized to see. This document outlines the security mechanisms for authentication, authorization, and query validation.
The platform assumes that the caller (API or CLI) has already authenticated the user. The user’s identity and roles are passed into the execution pipeline via the user_context dictionary in the GraphState.
The user_context must contain the following keys to enable authorization checks:
{
"role": "sales_analyst",
"allowed_datasources": ["manufacturing_history", "manufacturing_supply"],
"allowed_tables": [
"customers", "sales_orders", "products", "inventory"
]
}
datasource_id strings that the user can query. Use ["*"] for full access (e.g., Admin).Security checks are performed at two distinct stages of the pipeline to fail fast and prevent unauthorized data access.
Component: DecomposerNode & OrchestratorVectorStore
Before any query planning begins, the system enforces Knowledge Isolation.
allowed_datasources from state.user_context.filter={"datasource_id": {"$in": allowed_ids}}.Component: LogicalValidatorNode
After the PlannerNode generates an abstract syntax tree (AST) for the query, the LogicalValidatorNode performs a strict policy check.
PlanModel (AST).state.user_context["allowed_tables"].SECURITY_VIOLATION error is raised.Beyond RBAC, the system enforces strict structural constraints to prevent SQL injection and accidental mutation.
The LogicalValidatorNode enforces that all generated plans are strictly READ operations (SELECT).
PlanModel AST.The system does not rely on the LLM to generate raw SQL strings directly.
PlanModel).Visitor pattern).This separation ensures that “Prompt Injection” attacks cannot easily force the model to output malicious SQL syntax, as the Generator controls the final output syntax.
To prevent Denial of Service (DoS) attacks or run-away queries, the system implements resource safeguards in the ExecutorNode.
row_limit (configured in datasources.yaml, default: 1000), execution is aborted with a SAFEGUARD_VIOLATION.datasources.yaml) support statement_timeout_ms to kill long-running queries at the driver level (Native enforcement for Postgres/MySQL).max_bytes) on the serialized result set. If the data returned by the adapter exceeds this limit (default: 10MB), the execution is halted to prevent OOM (Out Of Memory) crashes.The platform employs a Pluggable Secret Management system (nl2sql.secrets) to handle sensitive credentials securely.
secrets.yaml.datasources.yaml) using the syntax ${provider_id:key}.SecretManager resolves these references before the configuration is parsed, ensuring that sensitive values are never hardcoded in YAML files or committed to version control.The system supports extensible providers via the SecretProvider protocol. You configure them in secrets.yaml with a unique id.
env): Standard lookup (e.g., ${env:DB_PASS}). Always available.aws. (e.g., ${aws-prod:db/pass}).azure. (e.g., ${azure-main:db-secret}).hashi. (e.g., ${vault-internal:secret/data/db:pass}).Dependencies: Cloud providers require optional extras (nl2sql-core[aws], nl2sql-core[azure], etc.) to keep the core lightweight.
Datasource configurations are validated strictly at load time. This ensures:
PostgresAdapter) defines and validates its own configuration schema requirements.configs/policies.json)The application uses Role-Based Access Control (RBAC). The policies.json file defines policies keyed by Role ID (e.g., admin, analyst).
Strict Namespacing Rule: To prevent namespace collisions, allowed_tables MUST use the format datasource_id.table_name. Simple table names are not supported.
{
"sales_analyst": {
"description": "Access to Sales DB only",
"role": "analyst",
"allowed_datasources": ["sales_db"],
"allowed_tables": [
// Exact Match
"sales_db.orders",
// Datasource Wildcard
"sales_db.customers_*"
]
},
"admin": {
"description": "Super Admin",
"role": "admin",
"allowed_tables": ["*"]
}
}
In CLI execution: nl2sql run ... --role sales_analyst.
The application assumes the identity provider has already authenticated the user and assigned this role.
Policies are treated as Configuration Code. To prevent misconfiguration (e.g., typos, invalid types), the system validates policies.json against a strict Pydantic Schema at startup.
Schema (nl2sql.security.policies):
allowed_datasources MUST be lists of strings.allowed_tables values are validated ensuring they match the datasource_id.table_name or wildcard format.You can validate your policy file without running a query using the CLI.
# Validate Syntax & Integrity
nl2sql policy validate
This command performs two checks:
datasources and tables actually exist in datasources.yaml. Users often typo table names; this catches those errors before runtime.