Open
Labels: bug (Something isn't working)
Description
Bug Description
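A critical security vulnerability exists in the database adapters. Structural SQL identifiers (table names, column names, project IDs) are interpolated directly into queries with Python f-strings, and embedded quote characters are not escaped. A malicious or malformed identifier in the manifest/project configuration can therefore close the quoted identifier early and append arbitrary SQL (SQL injection). Separately, `DataProduct` assembly hardcodes `is_pii=False`, so PII tags in the manifest are ignored.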
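A minimal sketch of the vulnerable pattern, reconstructed for illustration only. `build_count_query` is a hypothetical stand-in, not the actual adapter code. It wraps each identifier in double quotes with an f-string but does not escape embedded quotes, which matches the query shape reported under Actual Behavior.

```python
# Hypothetical stand-in for an adapter's query builder (not intugle's real code).
# Identifiers are wrapped in double quotes, but embedded quotes are NOT escaped.
def build_count_query(schema: str, table: str) -> str:
    fqn = f'"{schema}"."{table}"'
    return f"SELECT COUNT(*) as count FROM {fqn}"


query = build_count_query("schema", 'users"; DROP TABLE accounts; --')
print(query)
# SELECT COUNT(*) as count FROM "schema"."users"; DROP TABLE accounts; --"
```

The embedded `"` closes the identifier early, `;` ends the first statement, and `--` comments out the leftover closing quote.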
Steps to Reproduce
- Configure a dataset in the project manifest (e.g., `postgres` or `bigquery`).
- Set the table identifier to a malicious string: `users"; DROP TABLE accounts; --`
- Run the profiling pipeline or generate a data product:

```python
from intugle.analysis.models import DataSet

# A malicious identifier from the manifest triggers the exploit during profiling
ds = DataSet(data={"identifier": "users\"; DROP TABLE accounts; --"}, name="exploit")
ds.profile()
```

- Observe that the generated SQL breaks out of the quoted identifier and executes the second command.
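The steps above can be approximated without intugle or a live database server. This sketch uses Python's built-in `sqlite3` module as an assumed stand-in. SQLite also quotes identifiers with double quotes, and `executescript` plays the role of a driver call that accepts several statements in one string. (`sqlite3`'s plain `execute` refuses multi-statement strings, so it would not show the drop.)

```python
import sqlite3

# In-memory database with the target table and an unrelated table the payload attacks.
conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE users (id INTEGER); CREATE TABLE accounts (id INTEGER);")

# Identifier taken verbatim from the manifest, interpolated without escaping.
table = 'users"; DROP TABLE accounts; --'
query = f'SELECT COUNT(*) AS count FROM "{table}"'

# A multi-statement API runs the SELECT, then the injected DROP; "--" hides the stray quote.
conn.executescript(query)

tables = {name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
print(tables)  # {'users'}: the accounts table has been dropped
```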
Expected Behavior
- Structural identifiers should be quoted and escaped with dialect-specific logic (e.g., `sqlglot`) so they cannot break out of the quoted identifier.
- The `is_pii` flag should be read from the manifest metadata or the profiling pipeline rather than hardcoded to `False`.
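A minimal sketch of dialect-aware quoting, assuming the standard doubling rule: PostgreSQL (and ANSI SQL) double an embedded `"`, while MySQL/MariaDB and Databricks double an embedded backtick. `quote_identifier`, `quote_qualified`, and `QUOTE_CHARS` are hypothetical helpers, not part of intugle. BigQuery is left out because its backtick-quoted identifiers use string-literal escape sequences rather than doubling. A maintained library such as sqlglot, as suggested above, is the safer choice in production.

```python
from typing import Iterable

# Hypothetical helper (not intugle API): quote character per dialect.
# Each dialect listed here escapes an embedded quote character by doubling it.
QUOTE_CHARS = {
    "postgres": '"',
    "mysql": "`",       # also MariaDB
    "databricks": "`",
}


def quote_identifier(name: str, dialect: str) -> str:
    """Wrap one identifier in quotes, doubling any embedded quote character."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    q = QUOTE_CHARS[dialect]
    return q + name.replace(q, q * 2) + q


def quote_qualified(parts: Iterable[str], dialect: str) -> str:
    """Quote each part of a dotted name (e.g. schema, table) separately."""
    return ".".join(quote_identifier(p, dialect) for p in parts)


fqn = quote_qualified(["schema", 'users"; DROP TABLE accounts; --'], "postgres")
count_query = f"SELECT COUNT(*) as count FROM {fqn}"
print(count_query)
# SELECT COUNT(*) as count FROM "schema"."users""; DROP TABLE accounts; --"
```

The whole payload now parses as a single (nonexistent) table name, so the database reports a missing relation instead of running `DROP TABLE`. Because each part is quoted separately, a `.` inside a name stays part of that name and cannot add a fake qualifier.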
Actual Behavior
- `PostgresAdapter` and `BigQueryAdapter` generate strings such as `"schema"."users"; DROP TABLE accounts; --"` and send them directly to the database engine.
- `DataProduct` assembly ignores any PII tags in the YAML manifest because `is_pii` is hardcoded to `False` when each field is initialized.
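One way the hardcoded `is_pii=False` could instead be derived from metadata, sketched under an assumed manifest shape. The column-to-metadata mapping, the `is_pii` and `tags` keys, and `resolve_is_pii` are all hypothetical and may not match intugle's actual YAML schema or models. In this sketch an explicit manifest flag takes precedence, then a `PII` tag, then a flag supplied by the profiling pipeline.

```python
from typing import Any, Mapping

# Assumed manifest shape (hypothetical): column name -> metadata dict.
users_columns: dict[str, dict[str, Any]] = {
    "email": {"tags": ["PII"]},
    "signup_date": {"tags": []},
    "ssn": {"is_pii": True},
    "country": {"is_pii": False, "tags": ["pii"]},  # explicit flag overrides the tag
}


def resolve_is_pii(meta: Mapping[str, Any], profiled_pii: bool = False) -> bool:
    """Explicit manifest flag first, then a case-insensitive 'pii' tag, then profiling."""
    if "is_pii" in meta:
        return bool(meta["is_pii"])
    tags = {str(tag).lower() for tag in meta.get("tags", [])}
    return "pii" in tags or profiled_pii


flags = {name: resolve_is_pii(meta) for name, meta in users_columns.items()}
print(flags)
# {'email': True, 'signup_date': False, 'ssn': True, 'country': False}
```

In `data_product.py`, the field could then be built with something like `is_pii=resolve_is_pii(column_meta)`, where `column_meta` stands in for whatever metadata the pipeline holds for that column.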
Error Messages/Stack Trace
```
# Example of what happens in BigQuery when a malformed name is injected
google.api_core.exceptions.BadRequest: 400 Syntax error: Unexpected identifier "DROP" at
```
Environment
- Intugle Version: fc1920d (Commit)
- Python Version: 3.10+
- Operating System: All
- LLM Provider: Gemini
- Data Source Type: Postgres, BigQuery, Databricks, MySQL, MariaDB
Code Snippet
Vulnerable structural SQL generation in BigQueryAdapter.py:
```python
# line 153
count_query = f"SELECT COUNT(*) as count FROM {fqn}"
```
Hardcoded governance logic in data_product.py:
```python
# line 186
field_detail: FieldDetailsModel = FieldDetailsModel(
    id=f"{source.table.name}.{column.name}",
    ...
    is_pii=False,  # This should not be hardcoded
    ...
)
```
Additional Context
- This is a regression (it used to work in a previous version)
- This blocks my workflow
- I'm willing to submit a PR to fix this