Add data_purposes support to datasets #7674

Open
galvana wants to merge 15 commits into main from feat/dataset-data-purposes

galvana commented Mar 16, 2026

Description Of Changes

Update fideslang dependency to version 3.1.3, which includes data_purposes field support. This enables purpose-based access control (PBAC) at dataset, collection, field, and sub-field levels via existing dataset CRUD endpoints.

Code Changes

  • Updated pyproject.toml to pin fideslang==3.1.3 (from 3.1.4a1)

Steps to Confirm

  1. Upsert a dataset with data_purposes at all levels via POST /api/v1/dataset/upsert
  2. GET the dataset back and confirm data_purposes persists at all levels
  3. Verify backward compat: existing datasets without data_purposes still work
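The confirmation steps above could be scripted roughly as follows. This is only a sketch: the payload shape (a list of purpose keys under `data_purposes`, and nested `fields` for sub-fields), the `pbac_example` key, and the `marketing` purpose are assumptions for illustration, not taken from the fideslang 3.1.3 schema. The helpers only build and inspect dictionaries; sending them to the API is left to whatever authenticated client you already use.

```python
from typing import Any, Dict, List, Optional


def build_dataset(purposes: List[str]) -> Dict[str, Any]:
    """Build a dataset dict that sets data_purposes at every level.

    The key names below are assumptions about the payload shape.
    """
    return {
        "fides_key": "pbac_example",  # hypothetical dataset key
        "name": "PBAC example",
        "data_purposes": list(purposes),  # dataset level
        "collections": [
            {
                "name": "users",
                "data_purposes": list(purposes),  # collection level
                "fields": [
                    {
                        "name": "contact",
                        "data_purposes": list(purposes),  # field level
                        "fields": [
                            # sub-field level
                            {"name": "email", "data_purposes": list(purposes)},
                        ],
                    }
                ],
            }
        ],
    }


def purposes_by_level(dataset: Dict[str, Any]) -> Dict[str, Optional[List[str]]]:
    """Map each dotted path in the dataset to its data_purposes value (or None)."""
    root = dataset["fides_key"]
    found: Dict[str, Optional[List[str]]] = {root: dataset.get("data_purposes")}

    def walk_fields(prefix: str, fields: Optional[List[Dict[str, Any]]]) -> None:
        for field in fields or []:
            path = f"{prefix}.{field['name']}"
            found[path] = field.get("data_purposes")
            walk_fields(path, field.get("fields"))  # recurse into sub-fields

    for collection in dataset.get("collections", []):
        path = f"{root}.{collection['name']}"
        found[path] = collection.get("data_purposes")
        walk_fields(path, collection.get("fields"))
    return found
```

For steps 1 and 2, upsert the result of `build_dataset(...)` via POST /api/v1/dataset/upsert, GET the dataset back, and compare `purposes_by_level` on the sent and returned dictionaries. For step 3, a legacy dataset without the key should yield `None` at every path.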

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

Update fideslang dependency to use the feat/add-data-purposes-to-dataset-models
branch which adds data_purposes at dataset, collection, field, and sub-field
levels.

Dependency: ethyca/fideslang#39

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project | Deployment | Actions | Updated (UTC)
fides-plus-nightly | Ignored | Preview | Mar 25, 2026 8:25pm
fides-privacy-center | Ignored | | Mar 25, 2026 8:25pm

Adrian Galvan and others added 2 commits March 16, 2026 17:49
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rect-reference error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
galvana changed the title from "feat: add data_purposes support to datasets" to "Add data_purposes support to datasets" Mar 18, 2026
galvana marked this pull request as ready for review March 24, 2026 22:30

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR introduces data_purposes support in datasets by taking a dependency on an unreleased branch of fideslang. The single change is a new root-level requirements.txt file that references fideslang from a Git branch.

  • The override mechanism appears broken: pyproject.toml still declares fideslang==3.1.4a1 and uv.lock still resolves to that PyPI release. The new requirements.txt is not wired into any uv install step, so the branch dependency will never actually be installed.
  • The branch reference (@feat/add-data-purposes-to-dataset-models) is non-deterministic — it is not pinned to a commit SHA and can silently drift as the upstream branch evolves.
  • The PR description itself flags this as pre-merge work ("The fideslang pin will be updated to the release version before merge"), but as currently written, even the dev/review environment won't pick up the new fideslang code without explicitly running uv pip install -r requirements.txt by hand.
  • pyproject.toml and uv.lock must be updated together for the override to take effect in a uv-managed project.

Confidence Score: 1/5

  • Not safe to merge — the dependency override has no effect as implemented, and the PR is explicitly flagged as pre-merge work pending an upstream release.
  • The sole change (new requirements.txt) does not actually install the intended fideslang branch in a uv-managed project because pyproject.toml and uv.lock were not updated. The feature being enabled (data_purposes support) will therefore not be present at runtime. Additionally, the PR author explicitly notes this should not be merged until fideslang#39 is released and the pin is updated to a released version.
  • requirements.txt — the override approach needs to be replaced with updates to pyproject.toml and uv.lock

Important Files Changed

Filename | Overview
requirements.txt | New root-level requirements.txt pinning fideslang to a git branch. It does not override pyproject.toml or uv.lock, so the actual installed version is unchanged. It also uses a mutable branch ref instead of a commit SHA.

Reviews (1): Last reviewed commit: "Merge branch 'main' into feat/dataset-da..."

requirements.txt Outdated
@@ -0,0 +1 @@
fideslang @ git+https://github.com/ethyca/fideslang.git@feat/add-data-purposes-to-dataset-models

P0 Dependency override has no effect

This new requirements.txt file is not referenced anywhere in the project's install configuration, CI workflows, or Makefile. The actual project dependency is declared in pyproject.toml (line 58: "fideslang==3.1.4a1") and locked in uv.lock. A standalone requirements.txt at the repo root does not override either of them, so uv sync will continue installing fideslang==3.1.4a1 from PyPI.

To actually swap in the unreleased branch build, you would need to do both of the following:

  1. Update pyproject.toml to use the git reference: "fideslang @ git+https://github.com/ethyca/fideslang.git@feat/add-data-purposes-to-dataset-models"
  2. Regenerate uv.lock with uv lock

The CI check in .github/workflows/publish_docs.yaml that uses requirements.txt only reads docs/fides/requirements.txt, not the root-level file added here.

requirements.txt Outdated
@@ -0,0 +1 @@
fideslang @ git+https://github.com/ethyca/fideslang.git@feat/add-data-purposes-to-dataset-models

P1 Branch reference is non-deterministic

The dependency is pinned to a branch name (@feat/add-data-purposes-to-dataset-models) rather than a specific commit SHA. This means different installs at different points in time may pick up different code if new commits are pushed to that branch, making builds non-reproducible.

When the intent is to pin to a specific pre-release state, use a commit SHA instead:

fideslang @ git+https://github.com/ethyca/fideslang.git@<specific-commit-sha>

The PR description already acknowledges this needs to be replaced with the final release version before merge, but documenting the SHA in the meantime prevents accidental drift during development/review.
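A small guard along these lines could flag mutable refs during review. It is a sketch, not an existing check in this repository: `git_ref` and `is_pinned_to_commit` are hypothetical helpers, the parsing only covers the `name @ git+URL@ref` form shown above, and the SHA in the usage note is a placeholder rather than a real fideslang commit.

```python
import re
from typing import Optional

# A full-length Git commit SHA-1 is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def git_ref(requirement: str) -> Optional[str]:
    """Extract the ref from a 'name @ git+URL@ref' requirement, if present."""
    _, sep, url = requirement.partition("@")
    url = url.strip()
    if not sep or not url.startswith("git+"):
        return None
    url = url.split("#", 1)[0]  # drop any '#egg=...' style fragment
    without_scheme = url.split("://", 1)[-1]
    # Skip the host part so a 'user@host' prefix is not mistaken for a ref.
    _, _, path = without_scheme.partition("/")
    _, at, ref = path.partition("@")
    return ref if at else None


def is_pinned_to_commit(requirement: str) -> bool:
    """True only when the requirement's Git ref is a full commit SHA."""
    ref = git_ref(requirement)
    return ref is not None and _FULL_SHA.fullmatch(ref) is not None
```

Applied to the line in this diff, `is_pinned_to_commit` returns false because the ref is a branch name. The same URL ending in a placeholder such as `@0123456789abcdef0123456789abcdef01234567` would pass.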

claude bot left a comment

Code Review

This PR introduces a single file change — a new root-level requirements.txt that pins fideslang to a feature branch in order to get data_purposes field support ahead of its official release.

Critical (Must Fix)

1. Mutable branch reference is non-reproducible.
The pin fideslang @ git+...@feat/add-data-purposes-to-dataset-models resolves to branch HEAD at install time. Any force-push or new commit to that branch silently changes what gets installed. Pin to an immutable commit SHA until the release is available, then update to the release version before merging.

2. Conflicting dependency declaration.
pyproject.toml already declares fideslang==3.1.4a1 as the authoritative dependency. This new requirements.txt at the repo root creates a second, conflicting source of truth. CI workflows in this repo only reference docs/fides/requirements.txt and qa/requirements.txt — there is no evidence this root-level file is consumed by any pipeline, Docker build, or install step. The net effect is that local environments that manually run pip install -r requirements.txt will get a different fideslang than environments driven by pip install -e . (i.e., CI and production). If the intent is to override the dependency for development purposes, the right place is pyproject.toml directly.

Suggestions

  • The PR description explicitly notes this must not be merged until ethyca/fideslang#39 is released. Consider adding a draft status or a blocking label to prevent accidental early merge, since there is no automated guard on this.
  • The CHANGELOG entry is unchecked in the pre-merge checklist — if data_purposes support is user-facing, a changelog entry will be needed before merge.
  • No application code changes accompany the dependency bump. Even if the new field flows through automatically via fideslang's Pydantic models, it would be worth adding or calling out at least one integration test that round-trips data_purposes at each level (dataset, collection, field, sub-field) to guard against regressions.

github-actions bot commented Mar 25, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 4056b0e.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

  • uv.lock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
galvana enabled auto-merge March 25, 2026 17:53
Adrian Galvan and others added 4 commits March 25, 2026 10:56
The fideslang 3.1.3 bump added a data_purposes field to the Dataset
Pydantic model, but the SQLAlchemy model was missing the corresponding
column, causing TypeError on dataset create/update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
galvana requested a review from a team as a code owner March 25, 2026 20:05
galvana requested review from dsill-ethyca and removed request for a team March 25, 2026 20:05
Adrian Galvan and others added 2 commits March 25, 2026 13:14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The customer_dataset was not being normalized through FideslangDataset
like stored and upcoming datasets were, causing DeepDiff to detect
false changes when fideslang adds new fields (e.g. data_purposes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
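
The false-change problem described in the commit message above can be reproduced without fideslang or DeepDiff. In this sketch, `normalize` is a hypothetical stand-in for passing a dataset through the FideslangDataset model (it only fills in defaults for newly added fields), and plain dictionary equality stands in for the DeepDiff comparison. Neither is the code from this PR.

```python
from typing import Any, Dict

# Assumed default for the field that the newer model adds at every level.
MODEL_DEFAULTS: Dict[str, Any] = {"data_purposes": None}


def normalize(node: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in model defaults at this level and in nested collections/fields."""
    normalized = {**MODEL_DEFAULTS, **node}  # explicit values win over defaults
    for key in ("collections", "fields"):
        if normalized.get(key):
            normalized[key] = [normalize(child) for child in normalized[key]]
    return normalized


# A dataset as a customer might supply it, written before data_purposes existed.
customer_dataset = {
    "fides_key": "example",
    "collections": [{"name": "users", "fields": [{"name": "email"}]}],
}

# The same dataset after it has been stored through the newer model.
stored_dataset = {
    "fides_key": "example",
    "data_purposes": None,
    "collections": [
        {
            "name": "users",
            "data_purposes": None,
            "fields": [{"name": "email", "data_purposes": None}],
        }
    ],
}

# Comparing raw against normalized data reports a change that is not real.
raw_differs = customer_dataset != stored_dataset
# Normalizing both sides first makes the comparison reflect real edits only.
normalized_differs = normalize(customer_dataset) != normalize(stored_dataset)
```

Here `raw_differs` is true and `normalized_differs` is false, which is the behavior the fix aims for: all inputs go through the same normalization before they are diffed.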
2 participants