Conversation
…itHub API fetching spdx id
|
Thank you for submitting this pull request! We appreciate your contribution to the project. Before we can merge it, we need to review the changes you've made to ensure they align with our code standards and meet the requirements of the project. We'll get back to you as soon as we can with feedback. Thanks again! |
Reviewer's GuideRefactors license and metadata workflows to centralize file detection and validation, adds robust database update flows and logging, modernizes the FAIR dashboard templates, and updates the validator API and Nuxt UI configuration to support richer validation states and dependency alignment. Sequence diagram for metadata validation and database update workflowsequenceDiagram
actor GitHubWebhook
participant BotComplianceChecks
participant MetadataModule as metadata_index_js
participant GitHubAPI as GitHub_API
participant ValidatorService as Validator_API
participant Database as DB
participant Logwatch
GitHubWebhook->>BotComplianceChecks: push event(context, owner, repository)
BotComplianceChecks->>MetadataModule: checkMetadataFilesExists(context, owner, repository)
MetadataModule->>GitHubAPI: GET codemeta.json
GitHubAPI-->>MetadataModule: 200 or 404
MetadataModule->>GitHubAPI: GET CITATION.cff
GitHubAPI-->>MetadataModule: 200 or 404
MetadataModule-->>BotComplianceChecks: {codemeta, citation}
BotComplianceChecks->>MetadataModule: updateMetadataDatabase(repoId, subjects, repository, owner, context)
MetadataModule->>Database: ensureMetadataRecord(repository.id, subjects)
Database-->>MetadataModule: existing codeMetadata or created record
MetadataModule->>MetadataModule: determineRevalidationNeeds(context, subjects)
MetadataModule-->>Logwatch: info revalidation flags
alt revalidationNeeded
MetadataModule->>GitHubAPI: gatherMetadata(context, owner, repository)
GitHubAPI-->>MetadataModule: base metadata
MetadataModule->>MetadataModule: applyDbMetadata(existing, metadata)
alt codemetaExists and revalidateCodemeta
MetadataModule->>GitHubAPI: getCodemetaContent(context, owner, repository)
GitHubAPI-->>MetadataModule: codemetaContent or null
alt codemetaContent
MetadataModule->>ValidatorService: POST /validate-codemeta {file_content}
ValidatorService-->>MetadataModule: {message, version, error}
MetadataModule->>MetadataModule: validateCodemeta() -> ValidationResult
MetadataModule-->>Logwatch: success/warn/info codemeta status
MetadataModule->>MetadataModule: applyCodemetaMetadata(codemetaContent, metadata, repository)
else no codemetaContent
MetadataModule->>MetadataModule: codemetaValidation = ValidationResult.invalid("File not found")
MetadataModule-->>Logwatch: info no codemeta.json
end
end
alt citationExists and revalidateCitation
MetadataModule->>GitHubAPI: getCitationContent(context, owner, repository)
GitHubAPI-->>MetadataModule: citationContent or null
alt citationContent
MetadataModule->>ValidatorService: POST /validate-citation {file_path}
ValidatorService-->>MetadataModule: {message, output, error}
MetadataModule->>MetadataModule: validateCitation() -> ValidationResult
MetadataModule-->>Logwatch: success/warn/info citation status
MetadataModule->>MetadataModule: applyCitationMetadata(citationContent, metadata, repository)
else no citationContent
MetadataModule->>MetadataModule: citationValidation = ValidationResult.invalid("File not found")
MetadataModule-->>Logwatch: info no CITATION.cff
end
end
MetadataModule->>Database: updateMetadataRecord(repoId, metadata, codemetaValidation, citationValidation, subjects)
Database-->>MetadataModule: updated record
else noRevalidation
MetadataModule->>MetadataModule: reuse existing metadata and validation
end
MetadataModule-->>BotComplianceChecks: {metadata, validCodemeta, validCitation, codemetaValidation, citationValidation, existing}
BotComplianceChecks-->>GitHubWebhook: compliance results
Sequence diagram for rerunMetadataValidation command workflowsequenceDiagram
actor Maintainer
participant GitHubUI as GitHub_UI
participant BotApp as Bot_App
participant CommandHandler as rerunMetadataValidation
participant MetadataModule as metadata_index_js
participant LicenseModule as license_index_js
participant GitHubAPI as GitHub_API
participant Database as DB
participant Logwatch
Maintainer->>GitHubUI: comment /rerun-metadata-validation
GitHubUI->>BotApp: issue_comment event(context)
BotApp->>CommandHandler: rerunMetadataValidation(context, owner, repository, issueBody)
CommandHandler->>Logwatch: start "Rerunning metadata validation"
CommandHandler->>MetadataModule: checkMetadataFilesExists(context, owner, repository)
MetadataModule->>GitHubAPI: GET codemeta.json and CITATION.cff
GitHubAPI-->>MetadataModule: contents or 404
MetadataModule-->>CommandHandler: subjects {codemeta, citation}
CommandHandler->>LicenseModule: checkForLicense(context, owner, repository.name)
LicenseModule->>GitHubAPI: check license files
GitHubAPI-->>LicenseModule: file or not found
LicenseModule-->>CommandHandler: license object {status, path, content, spdx_id}
CommandHandler->>CommandHandler: subjects.license = license
CommandHandler->>CommandHandler: build syntheticContext with pusher.name = GH_APP_NAME[bot]
CommandHandler->>MetadataModule: updateMetadataDatabase(repository.id, subjects, repository, owner, syntheticContext)
MetadataModule->>Database: ensureMetadataRecord and updateMetadataRecord
Database-->>MetadataModule: updated codeMetadata
MetadataModule-->>CommandHandler: validation results
CommandHandler->>MetadataModule: applyMetadataTemplate(subjects, baseTemplate, repository, owner, syntheticContext)
MetadataModule-->>CommandHandler: newMetadataSection
CommandHandler->>CommandHandler: replace Metadata section in issueBody
CommandHandler->>CommandHandler: updatedBody = applyLastModifiedTemplate(updatedBody)
CommandHandler->>BotApp: createIssue(context, owner, repository, ISSUE_TITLE, updatedBody)
BotApp->>GitHubAPI: create or update issue
GitHubAPI-->>BotApp: issue updated
BotApp-->>CommandHandler: ok
CommandHandler->>Logwatch: info "Metadata validation rerun completed"
alt error
CommandHandler-->>Logwatch: error details
CommandHandler->>GitHubAPI: update issue with restored body
GitHubAPI-->>CommandHandler: ok
CommandHandler-->>BotApp: throw error
end
Sequence diagram for rerunLicenseValidation command workflowsequenceDiagram
actor Maintainer
participant GitHubUI as GitHub_UI
participant BotApp as Bot_App
participant CommandHandler as rerunLicenseValidation
participant LicenseModule as license_index_js
participant MetadataModule as metadata_index_js
participant GitHubAPI as GitHub_API
participant Database as DB
participant Logwatch
Maintainer->>GitHubUI: comment /rerun-license-validation
GitHubUI->>BotApp: issue_comment event(context)
BotApp->>CommandHandler: rerunLicenseValidation(context, owner, repository, issueBody)
CommandHandler->>Logwatch: start "Rerunning License Validation"
CommandHandler->>LicenseModule: checkForLicense(context, owner, repository.name)
LicenseModule->>GitHubAPI: check LICENSE, LICENSE.md, LICENSE.txt
GitHubAPI-->>LicenseModule: license file or none
alt license found
LicenseModule-->>CommandHandler: license {status, path, content, spdx_id}
CommandHandler->>LicenseModule: updateLicenseDatabase(repository, license)
LicenseModule->>Database: findUnique licenseRequest(repository_id)
alt existing entry
Database-->>LicenseModule: existingLicense
LicenseModule->>LicenseModule: validateLicense(license, existingLicense)
LicenseModule->>Database: update licenseRequest
else no entry
Database-->>LicenseModule: null
LicenseModule->>Database: create licenseRequest
end
else no license
LicenseModule-->>CommandHandler: {status: false, path, content: "", spdx_id: null}
CommandHandler-->>Logwatch: error "License not found"
CommandHandler-->>BotApp: throw Error
end
note over CommandHandler,GitHubAPI: If license updated successfully, rebuild LICENSE section in issue body
CommandHandler->>CommandHandler: strip last updated timestamp from issueBody
CommandHandler->>LicenseModule: applyLicenseTemplate({license}, baseTemplate, repository, owner, context)
LicenseModule-->>CommandHandler: newLicenseSection
CommandHandler->>CommandHandler: replace LICENSE section in issueBody
CommandHandler->>CommandHandler: lastModifiedBody = applyLastModifiedTemplate(updatedBody)
CommandHandler->>BotApp: createIssue(context, owner, repository, ISSUE_TITLE, lastModifiedBody)
BotApp->>GitHubAPI: create or update issue
GitHubAPI-->>BotApp: issue updated
CommandHandler->>Logwatch: info "License validation rerun completed"
alt any error
CommandHandler-->>Logwatch: error details
CommandHandler->>GitHubAPI: restore issue with body without command
GitHubAPI-->>CommandHandler: ok
CommandHandler-->>BotApp: throw error with cause
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Thanks for making updates to your pull request. Our team will take a look and provide feedback as soon as possible. Please wait for any GitHub Actions to complete before editing your pull request. If you have any additional questions or concerns, feel free to let us know. Thank you for your contributions! |
PR Summary
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In several places
contains_metadatais computed as!!(subjects.citation & subjects.codemeta)which uses bitwise&instead of logical&&; this will coerce booleans to numbers and can silently produce incorrect state in the DB, so it should be updated tosubjects.citation && subjects.codemeta. - The
subjects.licenseshape is now inconsistent (boolean in some paths, object withstatusin others, e.g.runComplianceChecks/iterateCommitDetailsvsapplyArchivalTemplateandapplyLicenseTemplate), which can lead to treating a repository with a license as if it has none; consider normalizingsubjects.licenseto a single object shape throughout. - In
rerunMetadataValidationandrerunLicenseValidation, the logic that slices the issue body on the'Last updated'marker and on section headings (## Metadata/## LICENSE) assumes those markers always exist; adding guards forindexOfreturning-1would make the rerun flows more robust against manually edited or legacy issues.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In several places `contains_metadata` is computed as `!!(subjects.citation & subjects.codemeta)` which uses bitwise `&` instead of logical `&&`; this will coerce booleans to numbers and can silently produce incorrect state in the DB, so it should be updated to `subjects.citation && subjects.codemeta`.
- The `subjects.license` shape is now inconsistent (boolean in some paths, object with `status` in others, e.g. `runComplianceChecks`/`iterateCommitDetails` vs `applyArchivalTemplate` and `applyLicenseTemplate`), which can lead to treating a repository with a license as if it has none; consider normalizing `subjects.license` to a single object shape throughout.
- In `rerunMetadataValidation` and `rerunLicenseValidation`, the logic that slices the issue body on the `'Last updated'` marker and on section headings (`## Metadata`/`## LICENSE`) assumes those markers always exist; adding guards for `indexOf` returning `-1` would make the rerun flows more robust against manually edited or legacy issues.
## Individual Comments
### Comment 1
<location> `bot/compliance-checks/metadata/index.js:365` </location>
<code_context>
+ contains_metadata: !!(subjects.citation & subjects.codemeta),
</code_context>
<issue_to_address>
**issue (bug_risk):** Bitwise `&` is used instead of logical `&&` when computing `contains_metadata`, which will produce incorrect results.
In both `ensureMetadataRecord` and `updateMetadataRecord`, `contains_metadata` is computed as `!!(subjects.citation & subjects.codemeta)`. Because `&` does numeric coercion, this only behaves like a logical AND while both values are strictly boolean. If either becomes a non-boolean truthy/falsy value, the result can be incorrect. Please use `subjects.citation && subjects.codemeta` instead to avoid this subtle bug.
</issue_to_address>
### Comment 2
<location> `bot/compliance-checks/metadata/index.js:109-118` </location>
<code_context>
+ `Metadata validation rerun completed for repo: ${repository.name} (ID: ${repository.id})`
);
- await createIssue(context, owner, repository, ISSUE_TITLE, lastModified);
} catch (error) {
- // Remove the command from the issue body
- const issueBodyRemovedCommand = issueBody.substring(
- 0,
- issueBody.indexOf(`<sub><span style="color: grey;">Last updated`)
- );
- const lastModified = await applyLastModifiedTemplate(
- issueBodyRemovedCommand
+ logwatch.error(
+ {
+ message: "Failed to rerun metadata validation",
+ repo: repoInfo,
+ error: error.message,
+ stack: error.stack,
+ },
+ true
);
- await createIssue(context, owner, repository, ISSUE_TITLE, lastModified);
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Codemeta fetch/validation errors are logged but do not update the validation status, which may leave stale or misleading DB state.
When `codemeta.json` fetch fails in `updateMetadataDatabase`, the catch only logs a warning and leaves `codemetaValidation` unchanged (likely carrying over a previous successful state). This can cause the DB to show a stale “valid” status when validation actually failed. Align this with the CITATION branch by setting `codemetaValidation = ValidationResult.error(error)` in the catch so `updateMetadataRecord` captures the current failure correctly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Thanks for closing this pull request! If you have any further questions, please feel free to open a new issue. We are always happy to help! |
Summary by Sourcery
Refactor and harden license and metadata validation workflows, centralizing file detection and database updates, improving logging, and enhancing dashboard issue rendering to better reflect validation states and service errors.
New Features:
Bug Fixes:
Enhancements:
Build:
Chores: