
[BUG] [SECURITY/CRITICAL] SQL Injection Vulnerabilities in DB Adapters and Hardcoded PII Logic #216

@anmolp1

Bug Description

A critical security vulnerability exists in the database adapters: structural SQL identifiers (table names, column names, project IDs) are interpolated directly into queries using Python f-strings. This allows SQL injection through malicious or malformed metadata in the manifest or project configuration.
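Identifiers cannot simply be passed as bound query parameters the way values can, which is presumably why the adapters fall back to f-strings. The stdlib-only sketch below uses sqlite3 as a stand-in for the real drivers (an assumption for illustration): a placeholder works for a value but is a syntax error where a table name is expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

# Values can be bound safely as parameters...
conn.execute("SELECT COUNT(*) FROM users WHERE email = ?", ("a@example.com",))

# ...but identifiers cannot: a placeholder in the FROM clause fails to parse.
identifier_error = None
try:
    conn.execute("SELECT COUNT(*) FROM ?", ("users",))
except sqlite3.OperationalError as exc:
    identifier_error = exc
    print("identifier placeholder rejected:", exc)
```

So identifiers have to be quoted and escaped explicitly rather than bound.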

Steps to Reproduce

  1. Configure a dataset in the project manifest (e.g., postgres or bigquery).
  2. Set the table identifier to a malicious string, e.g. users"; DROP TABLE accounts; --
  3. Run the profiling pipeline or generate a data product:
    from intugle.analysis.models import DataSet
    # Malicious identifier from manifest triggers the exploit during profiling
    ds = DataSet(data={"identifier": "users\"; DROP TABLE accounts; --"}, name="exploit")
    ds.profile()
  4. Observe that the generated SQL breaks out of the quoted identifier and the database executes the injected second statement.
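The same breakout can be reproduced without Postgres or BigQuery. The sketch below is an illustration only: it uses an in-memory sqlite3 database as a stand-in warehouse, and executescript() plays the role of a driver that accepts multi-statement query strings. It runs the f-string pattern and then a version that escapes embedded double quotes by doubling them.

```python
import sqlite3

MALICIOUS = 'users"; DROP TABLE accounts; --'


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the warehouse: a users table and an accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("CREATE TABLE users (id INTEGER); CREATE TABLE accounts (id INTEGER);")
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


# Vulnerable pattern: the identifier is wrapped in double quotes but not escaped,
# producing: SELECT COUNT(*) FROM "users"; DROP TABLE accounts; --"
vulnerable = make_db()
vulnerable.executescript(f'SELECT COUNT(*) FROM "{MALICIOUS}"')
print(table_names(vulnerable))  # accounts has been dropped

# Escaped pattern: embedded double quotes are doubled, so the payload stays one
# (nonexistent) identifier and nothing after it is executed.
safe = make_db()
escaped = '"' + MALICIOUS.replace('"', '""') + '"'
try:
    safe.executescript(f"SELECT COUNT(*) FROM {escaped}")
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
print(table_names(safe))  # both tables still exist
```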

Expected Behavior

  1. Structural identifiers should be safely quoted and escaped using dialect-specific logic (e.g., sqlglot) to prevent syntax breakout.
  2. The is_pii flag should be dynamically retrieved from the manifest metadata or the profiling pipeline rather than being hardcoded to False.
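For the first point, a minimal sketch of dialect-specific quoting, assuming the adapters only need Postgres-style double quotes and MySQL/MariaDB-style backticks. In both dialects an embedded quote character is escaped by doubling it. The function names and the QUOTERS mapping are hypothetical; in the real adapters, sqlglot or psycopg2's psycopg2.sql.Identifier are probably better choices than hand-rolled quoting.

```python
def quote_double(name: str) -> str:
    """PostgreSQL/ANSI style: wrap in double quotes and double any embedded quote."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def quote_backtick(name: str) -> str:
    """MySQL/MariaDB style: wrap in backticks and double any embedded backtick."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return "`" + name.replace("`", "``") + "`"


# Hypothetical mapping; a real adapter would take its dialect from its own config.
QUOTERS = {"postgres": quote_double, "mysql": quote_backtick, "mariadb": quote_backtick}


def qualified_name(dialect: str, *parts: str) -> str:
    """Quote each part separately (schema, table, ...), then join with dots."""
    quote = QUOTERS[dialect]
    return ".".join(quote(part) for part in parts)


print(qualified_name("postgres", "schema", 'users"; DROP TABLE accounts; --'))
```

Each part is quoted on its own rather than quoting a pre-joined dotted string, so a dot inside a name cannot introduce an extra path component, and the payload above remains a single identifier.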

Actual Behavior

  1. The PostgresAdapter and BigQueryAdapter generate strings like "schema"."users"; DROP TABLE accounts; --" and send them directly to the database engine.
  2. DataProduct assembly ignores any PII tags present in the YAML manifest, because is_pii is hardcoded to False when each FieldDetailsModel is initialized.

Error Messages/Stack Trace

# Example of what happens in BigQuery when a malformed name is injected
google.api_core.exceptions.BadRequest: 400 Syntax error: Unexpected identifier "DROP" at

Environment

  • Intugle Version: fc1920d (Commit)
  • Python Version: 3.10+
  • Operating System: All
  • LLM Provider: Gemini
  • Data Source Type: Postgres, BigQuery, Databricks, MySQL, MariaDB

Code Snippet

Vulnerable structural SQL generation in BigQueryAdapter.py:

# line 153
count_query = f"SELECT COUNT(*) as count FROM {fqn}"
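A possible hardening of the line above, sketched under two assumptions: fqn is built from separate project, dataset and table strings, and each part is checked against a deliberately conservative ASCII allowlist before being wrapped in backticks. The allowlist is hypothetical and stricter than BigQuery's actual naming rules (it rejects some valid names), and bigquery_fqn/count_query are illustrative helpers, not existing adapter functions.

```python
import re

# Hypothetical, conservative allowlist: ASCII letters, digits, underscores, hyphens.
# BigQuery permits a wider set of table names, so some valid names are rejected.
_SAFE_BQ_PART = re.compile(r"[A-Za-z0-9_-]+")


def bigquery_fqn(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted `project`.`dataset`.`table` reference."""
    quoted = []
    for part in (project, dataset, table):
        # fullmatch: the whole part must be allowlisted, so no backticks,
        # quotes, semicolons or whitespace can reach the query text.
        if not _SAFE_BQ_PART.fullmatch(part):
            raise ValueError(f"unsafe BigQuery identifier: {part!r}")
        quoted.append(f"`{part}`")
    return ".".join(quoted)


def count_query(project: str, dataset: str, table: str) -> str:
    # Same shape as line 153, but the FQN is validated and quoted first.
    return f"SELECT COUNT(*) AS count FROM {bigquery_fqn(project, dataset, table)}"


print(count_query("my-project", "analytics", "users"))
```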

Hardcoded governance logic in data_product.py:

# line 186
field_detail: FieldDetailsModel = FieldDetailsModel(
    id=f"{source.table.name}.{column.name}",
    ...
    is_pii=False, # This should not be hardcoded
    ...
)
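One way the hardcoded flag could be replaced, as a sketch only: the manifest layout (a per-column mapping with an optional is_pii boolean and a tags list), the "pii" tag name, the resolve_is_pii helper and the precedence order are all assumptions, not the project's actual schema.

```python
from typing import Any, Iterable, Mapping, Optional

# Hypothetical tag name marking a column as PII in the YAML manifest.
PII_TAGS = frozenset({"pii"})


def resolve_is_pii(column_meta: Mapping[str, Any], profiled_pii: Optional[bool] = None) -> bool:
    """Decide is_pii for one column instead of hardcoding False.

    Assumed precedence: explicit manifest flag, then manifest tags,
    then the profiling pipeline's result, then False.
    """
    explicit = column_meta.get("is_pii")
    if isinstance(explicit, bool):
        return explicit
    tags: Iterable[Any] = column_meta.get("tags") or []
    if any(str(tag).lower() in PII_TAGS for tag in tags):
        return True
    return bool(profiled_pii)
```

In data_product.py the call site would then pass something like is_pii=resolve_is_pii(column_meta, profiled_pii=...), where column_meta is whatever manifest entry describes the column (a hypothetical name here). Letting an explicit manifest flag win means a user can also override a false positive from profiling.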

Additional Context

  • This is a regression (it used to work in a previous version)
  • This blocks my workflow
  • I'm willing to submit a PR to fix this

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests