feat: OpenGraph findings- BED-6903 #2333

Open
brandonshearin wants to merge 26 commits into main from BED-6903

Conversation

@brandonshearin
Contributor

@brandonshearin brandonshearin commented Feb 3, 2026

Description

This PR contains assorted changes that support wiring datapipe up to the OpenGraph schema definitions that both teams have been working on.

  • az_graph_schema incorrectly defined the Azure environment with a type of Tenant when AZTenant is needed. @wes-mil, please look over my change to az_graph_schema.sql.
  • Refactor the GetEnvironments* methods on the OpenGraphSchema interface to use a single shared func with configurable SQL filters.
  • Refactor the GetSchemaRelationshipFinding* methods on the OpenGraphSchema interface to use a shared func with configurable SQL filters.
  • Add methods to get a kind or source_kind by ID.
  • In convertors.go, I uppercase the environment_id property on any generic-ingested node. This keeps environment_id values consistent with the uppercase convention we currently maintain for DomainSIDs and TenantIDs.
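
For readers unfamiliar with the last bullet, the normalization can be pictured roughly as follows. This is only a sketch: the function name `normalizeEnvironmentID` and the plain `map[string]any` property bag are assumptions for illustration, not the code in convertors.go.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEnvironmentID uppercases the "environment_id" property in place
// when it is present and holds a string. Other properties are left untouched.
// The function name and the map-based node shape are hypothetical.
func normalizeEnvironmentID(props map[string]any) {
	if raw, ok := props["environment_id"]; ok {
		if id, isString := raw.(string); isString {
			props["environment_id"] = strings.ToUpper(id)
		}
	}
}

func main() {
	props := map[string]any{
		"environment_id": "s-1-5-21-abc",
		"name":           "example-node",
	}
	normalizeEnvironmentID(props)
	fmt.Println(props["environment_id"], props["name"])
}
```

Non-string values are skipped rather than coerced, so a malformed property is left for later validation instead of being silently rewritten.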

Motivation and Context

BED-6903
OpenGraph findings

How Has This Been Tested?

Screenshots (optional):

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

Summary by CodeRabbit

  • New Features

    • Filtered environment retrieval and environment-scoped relationship findings.
    • New lookups to fetch kinds and source kinds by ID.
  • Improvements

    • Migrations and bootstrap now upsert missing kinds on demand instead of failing.
    • Environment handling moved to a single-key uniqueness approach; environment identifiers are normalized (uppercase) where applicable.
  • Tests

    • Mocks and test helpers updated to reflect the new retrieval and lookup methods.

@brandonshearin brandonshearin self-assigned this Feb 3, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Centralizes environment and schema-relationship retrieval behind new filtered DB methods, adds ID-based Kind/SourceKind getters, changes SourceKind.Name to string with ToKind(), updates mocks and migrations to upsert missing kinds, and normalizes environment_id strings to uppercase in node conversion.
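
The "filtered core method plus thin wrappers" shape described in the walkthrough might look something like the sketch below. It is a simplified illustration, not the PR's implementation: `buildWhere`, `environmentsQuery`, and `environmentByKindIDQuery` are hypothetical names, the filter type is a plain map rather than `model.Filters`, and only equality filters are handled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildWhere turns column -> value equality filters into a parameterized
// WHERE clause. Keys are sorted so the generated SQL is deterministic.
// Column names are assumed to come from trusted code, never from user input.
func buildWhere(filters map[string]any) (string, []any) {
	if len(filters) == 0 {
		return "", nil
	}
	columns := make([]string, 0, len(filters))
	for column := range filters {
		columns = append(columns, column)
	}
	sort.Strings(columns)

	clauses := make([]string, 0, len(columns))
	args := make([]any, 0, len(columns))
	for i, column := range columns {
		clauses = append(clauses, fmt.Sprintf("%s = $%d", column, i+1))
		args = append(args, filters[column])
	}
	return "WHERE " + strings.Join(clauses, " AND "), args
}

// environmentsQuery is the single shared builder that every getter routes through.
func environmentsQuery(filters map[string]any) (string, []any) {
	query := "SELECT se.id, se.environment_kind_id FROM schema_environments se"
	where, args := buildWhere(filters)
	if where != "" {
		query += " " + where
	}
	return query, args
}

// environmentByKindIDQuery is a thin wrapper that only supplies a filter.
func environmentByKindIDQuery(kindID int32) (string, []any) {
	return environmentsQuery(map[string]any{"se.environment_kind_id": kindID})
}

func main() {
	query, args := environmentByKindIDQuery(42)
	fmt.Println(query, args)
}
```

Each specialised getter then differs only in the filter it passes, which is what lets the duplicated query code collapse into one place.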

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Graph schema & DB API | cmd/api/src/database/graphschema.go | Introduced filter-driven core methods (GetEnvironmentsFiltered, getSchemaRelationshipFindingsFiltered), added GetEnvironmentByEnvironmentKindId and GetSchemaRelationshipFindingsByEnvironmentId, removed GetEnvironmentByKinds/GetEnvironmentById, and routed existing getters to filtered implementations. |
| Kind & SourceKind APIs | cmd/api/src/database/kind.go, cmd/api/src/database/sourcekinds.go | Added GetKindById and GetSourceKindById; changed SourceKind.Name from graph.Kind to string and added ToKind(); adjusted mappings and queries to new types. |
| Mocks | cmd/api/src/database/mocks/db.go | Removed mocks for GetEnvironmentByKinds; added mocks/recorders for GetEnvironmentsFiltered, GetKindById, GetSourceKindById, GetSchemaRelationshipFindingsByEnvironmentId, and updated recorder signatures. |
| Upsert & finding logic | cmd/api/src/database/upsert_schema_environment.go, cmd/api/src/database/upsert_schema_finding.go, packages/go/schemagen/generator/sql.go | Switched environment lookup to use environment_kind_id single-key; replace-on-create now deletes existing by environment_kind_id; replaced raise-on-missing-kind with upsert-and-retry in generator/sql. |
| Migrations / SQL scripts | cmd/api/src/database/migration/extensions/ad_graph_schema.sql, cmd/api/src/database/migration/extensions/az_graph_schema.sql, cmd/api/src/database/migration/migrations/v8.7.0.sql | SQL now upserts missing kinds before use, removes forced pre-upserts for some envs (AZTenant changes), and changes unique constraint on schema_environments to be unique on environment_kind_id. |
| Services, API surface & dataflow | cmd/api/src/services/graphify/convertors.go, cmd/api/src/api/v2/database_wipe.go, cmd/api/src/api/v2/kinds.go, cmd/api/src/daemons/datapipe/pipeline.go | Normalize environment_id strings to uppercase in node conversion; convert stored source-kind names to graph.Kind via ToKind() where required; minor list/mapping adjustments. |
| Tests / Integration | cmd/api/src/database/sourcekinds_integration_test.go, cmd/api/src/database/graphschema_integration_test.go | Updated fixtures to use string SourceKind.Name and replaced calls to GetEnvironmentById with GetEnvironmentByEnvironmentKindId. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement, api, dbmigration

Suggested reviewers

  • cweidenkeller
  • kpowderly
  • LawsonWillard
  • AD7ZJ

Poem

🐰 I hop through rows and filtered streams,

Upserts stitch missing kindly dreams,
AZTenant finds its place so bright,
Env IDs gleam in uppercase light,
Mocks and queries dance into the night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main feature change (OpenGraph findings support) and includes the associated ticket identifier (BED-6903). |
| Description check | ✅ Passed | The description covers the key changes (az_graph_schema fix, refactored GetEnvironments/GetSchemaRelationshipFinding methods, new ID-based accessors, environment_id capitalization). While testing details and some checklist items are missing, the core content adequately describes the PR's purpose and changes. |

@brandonshearin brandonshearin changed the title from "feat: OpenGraph findings- BED 6903" to "feat: OpenGraph findings- BED-6903" Feb 3, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@cmd/api/src/database/graphschema.go`:
- Around line 748-762: GetSchemaRelationshipFindingsByEnvironmentId may return a
nil slice when no rows are found; modify it so it always returns a non-nil empty
slice instead of nil. After the query (before returning), ensure findings is
initialized to an empty slice when length is zero (e.g., set findings =
[]model.SchemaRelationshipFinding{}), so callers serializing the result get []
not null; keep references to the existing variables and use the same Raw(...)
call and CheckError handling as currently implemented.
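
The nil-versus-empty distinction matters because Go's `encoding/json` serializes a nil slice as `null` and an empty slice as `[]`. A small sketch of the guard the comment asks for; the `finding` type and the `ensureNonNil` helper are hypothetical stand-ins, not names from the PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finding is a hypothetical stand-in for model.SchemaRelationshipFinding.
type finding struct {
	Name string `json:"name"`
}

// ensureNonNil returns an empty, non-nil slice when given a nil one, so that
// JSON encoding produces [] rather than null.
func ensureNonNil(findings []finding) []finding {
	if findings == nil {
		return []finding{}
	}
	return findings
}

// mustJSON marshals v and panics on error; acceptable for a demo only.
func mustJSON(v any) string {
	encoded, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(encoded)
}

func main() {
	var findings []finding // nil, as when a query returns no rows

	fmt.Println(mustJSON(findings))               // null
	fmt.Println(mustJSON(ensureNonNil(findings))) // []
}
```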

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/api/src/database/graphschema.go (1)

598-641: ⚠️ Potential issue | 🟠 Major

Update buildSQLFilter to properly handle qualified column names like se.id.

The current implementation passes the entire qualified name (e.g., "se.id") to pgsql.Identifier(), which treats it as a single identifier rather than splitting it into table alias and column. The pattern exists in cmd/api/src/model/filter.go where BuildSQLFilter accepts a tableAlias parameter and uses pgsql.AsCompoundIdentifier(tableAlias, columnName) when needed.

Since GetEnvironmentByKinds and GetEnvironmentById both pass qualified filters like "se.environment_kind_id" and "se.id" to buildSQLFilter, the function should either:

  • Accept a tableAlias parameter and use pgsql.AsCompoundIdentifier
  • Parse and split qualified names on the dot separator before passing to pgsql.Identifier

Without proper handling, the generated SQL may be semantically incorrect (e.g., "se.id" instead of "se"."id").
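
The second option (splitting on the dot before quoting) can be sketched as below. This deliberately avoids the `pgsql` helpers named in the comment, whose exact signatures are not shown here; `quoteIdent` and `quoteQualified` are hypothetical, hand-rolled stand-ins that apply standard SQL double-quote escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps one identifier part in double quotes and doubles any
// embedded double quote, following standard SQL identifier quoting.
func quoteIdent(part string) string {
	return `"` + strings.ReplaceAll(part, `"`, `""`) + `"`
}

// quoteQualified splits "alias.column" on the first dot and quotes each part
// separately; an unqualified name is quoted as a single identifier.
func quoteQualified(name string) string {
	if alias, column, found := strings.Cut(name, "."); found {
		return quoteIdent(alias) + "." + quoteIdent(column)
	}
	return quoteIdent(name)
}

func main() {
	fmt.Println(quoteQualified("se.id"))
	fmt.Println(quoteQualified("id"))
}
```

Quoting the whole string as one identifier would instead make the database look for a column literally named `se.id`, which is the failure mode the comment describes.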

Contributor

@wes-mil wes-mil left a comment

The schema changes look good to me

Contributor

@mistahj67 mistahj67 left a comment

Not blocking, but left a few comments to ponder 🤔

Contributor

@wes-mil wes-mil left a comment

My only comment is that you could return the ID from genscript_upsert_source_kind and then skip the SELECT, but tomato, tomato.

EDIT: Approving specifically the sql generator changes, please get other approvals for the rest of the changes 😄

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/api/src/database/upsert_schema_environment.go (1)

117-133: ⚠️ Potential issue | 🟠 Major

Fix environment lookup: GetEnvironmentById expects schema environment ID, not environment_kind_id.

Line 122 passes EnvironmentKindId to GetEnvironmentById, which filters on se.id. That will usually miss the existing environment and, after the new unique constraint, will cause duplicate insert errors. Query by environment_kind_id instead (and update the comment to match the new constraint).

🐛 Suggested fix
 func (s *BloodhoundDB) replaceSchemaEnvironment(ctx context.Context, graphSchema model.SchemaEnvironment) (int32, error) {
-	if existing, err := s.GetEnvironmentById(ctx, graphSchema.EnvironmentKindId); err != nil && !errors.Is(err, ErrNotFound) {
-		return 0, fmt.Errorf("error retrieving schema environment: %w", err)
-	} else if !errors.Is(err, ErrNotFound) {
-		// Environment exists - delete it first
-		if err := s.DeleteEnvironment(ctx, existing.ID); err != nil {
-			return 0, fmt.Errorf("error deleting schema environment %d: %w", existing.ID, err)
-		}
-	}
+	filters := model.Filters{
+		"se.environment_kind_id": []model.Filter{{Operator: model.Equals, Value: fmt.Sprintf("%d", graphSchema.EnvironmentKindId)}},
+	}
+	if existing, err := s.GetEnvironmentsFiltered(ctx, filters); err != nil {
+		return 0, fmt.Errorf("error retrieving schema environment: %w", err)
+	} else if len(existing) > 0 {
+		// Environment exists - delete it first
+		if err := s.DeleteEnvironment(ctx, existing[0].ID); err != nil {
+			return 0, fmt.Errorf("error deleting schema environment %d: %w", existing[0].ID, err)
+		}
+	}
🤖 Fix all issues with AI agents
In `@cmd/api/src/database/migration/migrations/v8.7.0.sql`:
- Around line 18-30: The migration adds a UNIQUE constraint on
schema_environments.environment_kind_id which will fail if duplicate
environment_kind_id values already exist; modify the migration (around the ALTER
TABLE ... ADD CONSTRAINT schema_environments_environment_kind_id_key) to first
check for duplicates by querying schema_environments with GROUP BY
environment_kind_id HAVING COUNT(*) > 1, and if any rows are returned either (a)
perform a cleanup strategy (delete or consolidate duplicates) or (b) RAISE
EXCEPTION with a clear message listing the offending environment_kind_id values
so the upgrade fails fast and informs operators; ensure this pre-check runs
before attempting to add the constraint and reference schema_environments and
schema_environments_environment_kind_id_key in the added check/exception.
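
The pre-check above boils down to grouping rows by environment_kind_id and keeping any group with more than one member. The real check would run as SQL inside the migration; the Go sketch below only mirrors that grouping logic on an in-memory slice, and `duplicateKindIDs` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateKindIDs returns, in ascending order, every environment_kind_id
// that appears more than once. It is the in-memory analogue of
// GROUP BY environment_kind_id HAVING COUNT(*) > 1.
func duplicateKindIDs(kindIDs []int32) []int32 {
	counts := make(map[int32]int, len(kindIDs))
	for _, id := range kindIDs {
		counts[id]++
	}

	var duplicates []int32
	for id, count := range counts {
		if count > 1 {
			duplicates = append(duplicates, id)
		}
	}
	sort.Slice(duplicates, func(i, j int) bool { return duplicates[i] < duplicates[j] })
	return duplicates
}

func main() {
	rows := []int32{10, 11, 10, 12, 11, 10}
	if duplicates := duplicateKindIDs(rows); len(duplicates) > 0 {
		// Fail fast with the offending values instead of letting the
		// constraint creation fail with a less helpful error.
		fmt.Printf("cannot add unique constraint; duplicate environment_kind_id values: %v\n", duplicates)
	}
}
```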

In `@cmd/api/src/database/upsert_schema_finding.go`:
- Around line 39-47: The code incorrectly calls GetEnvironmentById(ctx,
environmentKindId) with an environment *kind* ID; change this to look up by
environment kind instead (either call an existing
GetEnvironmentByKindID/GetEnvironmentByEnvironmentKindID function or add one
that queries se.environment_kind_id = environmentKindId) so the lookup uses
environment_kind_id not se.id, and update the surrounding comment to state the
uniqueness is on environment_kind_id (not environment id); keep references to
validateAndTranslateSourceKind, GetEnvironmentById, and environmentKindId to
locate and replace the call.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/api/src/database/upsert_schema_environment.go (1)

29-62: ⚠️ Potential issue | 🟠 Major

Multi-step upsert is not wrapped in a transaction — partial failures leave inconsistent state.

UpsertSchemaEnvironmentWithPrincipalKinds performs multiple dependent write operations (delete old environment → create new environment → delete old principal kinds → create new principal kinds) without a transaction. If any intermediate step fails, the database is left in a half-updated state (e.g., old environment deleted but new one not created, or environment created but principal kinds missing).

Consider wrapping the write operations in s.Transaction(ctx, ...) to ensure atomicity.
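
To illustrate why atomicity matters here, the sketch below uses a toy in-memory store whose `transaction` method works on a copy and only commits on success. Everything in it (`store`, `transaction`, `replaceEnvironment`) is hypothetical; the actual fix would use the database's own transaction support, as the comment suggests with s.Transaction(ctx, ...).

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for the database: it maps an
// environment_kind_id to that environment's principal kind names.
type store struct {
	environments map[int32][]string
}

// transaction runs fn against a copy of the data and swaps the copy in only
// when fn succeeds, so a failure leaves the original state untouched.
func (s *store) transaction(fn func(tx *store) error) error {
	working := &store{environments: make(map[int32][]string, len(s.environments))}
	for id, kinds := range s.environments {
		working.environments[id] = append([]string(nil), kinds...)
	}
	if err := fn(working); err != nil {
		return err // nothing was committed
	}
	s.environments = working.environments
	return nil
}

// replaceEnvironment deletes the old environment and recreates it with new
// principal kinds as one all-or-nothing unit.
func replaceEnvironment(s *store, kindID int32, principalKinds []string) error {
	return s.transaction(func(tx *store) error {
		delete(tx.environments, kindID)
		if len(principalKinds) == 0 {
			return errors.New("at least one principal kind is required")
		}
		tx.environments[kindID] = append([]string(nil), principalKinds...)
		return nil
	})
}

func main() {
	db := &store{environments: map[int32][]string{1: {"User"}}}

	// The second step fails after the delete, yet the old row survives.
	err := replaceEnvironment(db, 1, nil)
	fmt.Println(err, db.environments[1])
}
```

Without the wrapper, the same failure would leave the environment deleted and nothing created in its place, which is the half-updated state the comment warns about.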

🤖 Fix all issues with AI agents
In `@cmd/api/src/database/graphschema_integration_test.go`:
- Around line 2491-2492: The test calls GetEnvironmentByEnvironmentKindId with
environment.ID (the row PK) but the function expects the environment_kind_id;
change the call to pass environment.EnvironmentKindId (the field on the
environment struct) instead of environment.ID so the query filters by the
intended foreign-key value when invoking GetEnvironmentByEnvironmentKindId in
graphschema_integration_test.go.
🧹 Nitpick comments (3)
cmd/api/src/database/graphschema_integration_test.go (1)

1392-1470: Test naming and arg field are now misleading after the method rename.

TestGetSchemaEnvironmentById and the args.environmentId field no longer describe what's being tested — the method now looks up by environment_kind_id, not the row ID. Consider renaming the test to TestGetSchemaEnvironmentByEnvironmentKindId and the arg to environmentKindId to keep the test self-documenting.

cmd/api/src/database/upsert_schema_finding.go (1)

39-42: Source kind validated but result discarded — consider documenting the intent.

validateAndTranslateSourceKind is called for its side effect (registering the source kind if absent), but the returned ID is discarded. A brief inline comment explaining why this call exists would help future readers understand the intent, since the source kind ID isn't used in the finding creation.

cmd/api/src/database/graphschema.go (1)

987-1012: Remove commented-out dead code.

This block of commented-out code (a Kind struct and GetKindByName method) with a TODO note should be cleaned up. It adds noise and the TODO suggests it's already been implemented elsewhere.

BEGIN
SELECT id INTO retreived_environment_kind_id FROM kind WHERE name = v_environment_kind_name;
IF retreived_environment_kind_id IS NULL THEN
RAISE EXCEPTION 'couldn''t find matching kind_id';
Contributor

I think if the exception is removed, the environment kinds must also be declared as node kinds before being created as an environment.

6 participants