Skip to content

Implement Heretto CMS uploader and integration tests#100

Merged
hawkeyexl merged 9 commits intomainfrom
heretto2
Jan 4, 2026
Merged

Implement Heretto CMS uploader and integration tests#100
hawkeyexl merged 9 commits intomainfrom
heretto2

Conversation

@hawkeyexl
Copy link
Contributor

@hawkeyexl hawkeyexl commented Jan 3, 2026

Introduce a Heretto CMS uploader for automatic screenshot synchronization, enhancing the upload process and error handling. Add integration tests for Heretto API interactions and improve job asset handling. Update dependencies and refine screenshot handling logic. Document the new features and ensure robust logging throughout the upload operations.

Summary by CodeRabbit

  • New Features

    • REST API client and helpers for Heretto: resource discovery, asset validation, file mapping, search, and uploads.
    • Integration test suite and CI workflow with scheduled and PR-triggered runs, result artifacts, and automated failure notifications.
    • Metadata and spec ID generation enabling cross-file integration tracking.
  • Chores

    • Updated test scripts and dependencies for integration testing reliability.

✏️ Tip: You can customize this high-level summary in your review settings.

…nization

- Added HerettoUploader class to handle file uploads to Heretto CMS.
- Integrated uploader into the existing upload process for changed screenshots.
- Enhanced screenshot saving logic to track changes and source integration metadata.
- Created documentation for Heretto screenshot sync feature, detailing the upload process and architecture.
- Updated resolver to build file mappings and handle Heretto-specific configurations.
- Introduced utility functions to find Heretto integrations based on file paths.
- Added error handling and logging for upload operations to improve reliability.
- Create integration tests for Heretto API interactions.
- Implement job asset retrieval with pagination support.
- Validate presence of .ditamap files in job assets.
- Update package.json scripts for integration testing.
- Modify Heretto functions to improve error handling and logging.
- Update `saveScreenshot.js` to mark new screenshots as changed for upload.
- Add a new test file `screenshot.test.js` to verify sourceIntegration preservation and changed status for screenshots.
- Create a new test file `upload.test.js` to validate the upload module, including collecting changed files and integration configuration.
- Modify `package-lock.json` to link to local `doc-detective-common` instead of a versioned package.
- Enhance `heretto.js` to retrieve resource dependencies from the Heretto REST API, including detailed logging and XML parsing.
… paths

- Implement generateSpecId to derive safe spec IDs from absolute or relative file paths.
- Update parseTests to use generateSpecId for generating spec IDs, reducing collisions from files with the same basename.
- Change doc-detective-common from local file reference to version "^3.6.0-dev.2"
- Update posthog-node to version "^5.18.1"
- Upgrade chai to version "^6.2.2"
- Upgrade sinon to version "^21.0.1"
- Update ajv to version "^8.17.1"
- Update ajv-errors to version "3.0.0"
- Update ajv-formats to version "3.0.1"
- Update ajv-keywords to version "5.1.0"
- Update other dependencies in package-lock.json
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 3, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This PR adds Heretto integration features and CI: a new GitHub Actions integration-test workflow, expanded package scripts/deps, a REST-based Heretto client with asset/resource handling and file-mapping/upload helpers, integration and unit tests, and utilities to map Heretto outputs to test specs.

Changes

Cohort / File(s) Summary
CI/CD Integration Testing Infrastructure
.github/workflows/integration-tests.yml
New GitHub Actions workflow to run scheduled, manual, and PR-triggered integration tests; sets Node 24, uses secrets for HERETTO credentials, uploads artifacts, and creates/updates a GitHub issue on scheduled failures.
Test Scripts & Dependencies
package.json
Added test:integration and test:all scripts; unit test ignores integration tests; bumped/added deps (doc-detective-common^3.6.0-dev.2, added fast-xml-parser, posthog-node bumped) and minor devDependency bumps.
DITA Parsing Patterns
src/config.js
Added a DITA inline step pattern (<data name="step" ...>) to dita_1_0 inlineStatements; minor formatting/quote normalization only.
Heretto REST Client & Asset Handling
src/heretto.js
New REST client (createRestApiClient), REST helpers (getResourceDependencies, XML parsing helpers), paginated asset listing (getJobAssetDetails), DITA presence validation (validateDitamapInAssets), updated polling (pollJobStatus validates assets at completion), file-mapping (buildFileMapping, findFilesWithExtensions, extractImageReferences), search/upload/resolve helpers (searchFileByName, uploadFile, resolveFileId), and exports for these utilities.
Integration Test Suite
src/heretto.integration.test.js
New CI-only integration tests gated by env vars; exercises client creation, scenario discovery, publish triggering, polling with asset validation, download/extract, and end-to-end loadHerettoContent flows; includes extended timeouts and cleanup.
Unit Tests Expanded
src/heretto.test.js
Added tests for getJobAssetDetails (pagination/filters), validateDitamapInAssets, and updated pollJobStatus expectations to include post-poll asset validation.
Integration Mapping & Spec IDs
src/utils.js
Added findHerettoIntegration(config, filePath) export and generateSpecId(filePath); initialize/populate config._herettoPathMapping; attach sourceIntegration metadata to parsed steps and specs for Heretto screenshot handling.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Client as Doc Detective Client
    participant HerettoModule as src/heretto.js
    participant HerettoAPI as Heretto Publishing API
    participant RestAPI as Heretto REST API
    participant FS as File System

    User->>Client: loadHerettoContent(herettoConfig, uploadOnChange)
    Client->>HerettoModule: createRestApiClient(herettoConfig)
    HerettoModule->>RestAPI: configure REST client (XML Accept)
    
    alt uploadOnChange enabled
        Client->>HerettoModule: getResourceDependencies(ditamapId)
        HerettoModule->>RestAPI: GET ditamap dependencies (XML)
        RestAPI-->>HerettoModule: dependency XML → parsed mapping
        HerettoModule-->>Client: attach resourceDependencies
    end

    Client->>HerettoAPI: trigger publish job(fileId)
    HerettoAPI-->>Client: jobId

    loop poll until terminal
        Client->>HerettoModule: pollJobStatus(jobId)
        HerettoModule->>HerettoAPI: GET job status
        HerettoAPI-->>HerettoModule: status (PENDING/SUCCESS/FAIL)
        alt terminal
            HerettoModule->>RestAPI: getJobAssetDetails(fileId, jobId)
            RestAPI-->>HerettoModule: assets (paginated)
            HerettoModule->>HerettoModule: validateDitamapInAssets(assets)
            alt ditamap present
                HerettoModule-->>Client: completed job + assets
            else missing
                HerettoModule-->>Client: null (warning logged)
            end
        else continue
            HerettoModule-->>Client: null (continue polling)
        end
    end

    Client->>HerettoAPI: download published archive
    Client->>FS: extract to outputPath

    alt uploadOnChange enabled
        Client->>HerettoModule: buildFileMapping(outputPath, herettoConfig)
        HerettoModule->>FS: findFilesWithExtensions(.dita,.ditamap,.xml)
        FS-->>HerettoModule: file list
        HerettoModule->>HerettoModule: extractImageReferences(parsedXml)
        HerettoModule-->>Client: file mapping (local → relative refs)
    end

    Client-->>User: content loaded (with optional mapping)
Loading
sequenceDiagram
    autonumber
    actor CI as Integration Test Runner
    participant Test as src/heretto.integration.test.js
    participant HerettoModule as src/heretto.js
    participant HerettoAPI as Heretto API
    participant RestAPI as Heretto REST API
    participant FS as File System

    CI->>Test: start integration test
    Test->>Test: validate CI env vars & secrets
    Test->>HerettoModule: createClient(orgId, username, token)
    HerettoModule-->>Test: client with base URL

    Test->>HerettoModule: findScenario(scenarioName)
    HerettoModule->>HerettoAPI: search scenarios
    HerettoAPI-->>HerettoModule: scenario metadata
    HerettoModule-->>Test: scenarioId

    Test->>HerettoAPI: trigger publish
    HerettoAPI-->>Test: jobId

    loop polling
        Test->>HerettoModule: pollJobStatus(jobId)
        HerettoModule->>HerettoAPI: GET job status
        HerettoAPI-->>HerettoModule: status
        HerettoModule->>RestAPI: getJobAssetDetails
        RestAPI-->>HerettoModule: assets
        HerettoModule->>HerettoModule: validate ditamap presence
    end

    Test->>HerettoAPI: download output
    Test->>FS: extract & validate output
    Test-->>CI: assertions complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Integrate Heretto API #95: Directly related — overlaps changes to Heretto integration code, client/polling logic, file mapping, and tests.

Poem

🐰 I hopped through XML, fetch and map in paw,
Polling jobs and parsing ditamaps with awe,
Uploads and searches, a clever little trick,
CI wakes each morning to run the test pick.
🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement Heretto CMS uploader and integration tests' directly and clearly summarizes the main changes in the pull request, which adds Heretto CMS uploader functionality and integration tests.
Docstring Coverage ✅ Passed Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
.github/workflows/integration-tests.yml (1)

53-61: Artifact upload may fail if paths don't exist.

The upload-artifact@v4 action will fail if none of the specified paths exist. Consider adding if-no-files-found: ignore or if-no-files-found: warn to prevent workflow failures when no test results or logs are generated.

🔎 Proposed fix
       - name: Upload test results
         if: always()
         uses: actions/upload-artifact@v4
         with:
           name: integration-test-results
           path: |
             test-results/
             *.log
           retention-days: 7
+          if-no-files-found: warn
src/heretto.js (5)

228-261: Consider adding a maximum page limit to prevent potential infinite loops.

While the pagination logic is correct, if the API returns inconsistent totalPages values, this could result in excessive requests. Adding a safeguard maximum page count would improve resilience.

🔎 Proposed fix
 async function getJobAssetDetails(client, fileId, jobId) {
   const allAssets = [];
   let page = 0;
   const pageSize = 100;
+  const maxPages = 100; // Safeguard against infinite pagination
   let hasMorePages = true;

-  while (hasMorePages) {
+  while (hasMorePages && page < maxPages) {
     const response = await client.get(
       `/files/${fileId}/publishes/${jobId}/assets`,
       {
         params: {
           page,
           size: pageSize,
         },
       }
     );
     // ... rest of logic
   }
+
+  if (page >= maxPages) {
+    // Log warning about hitting max pages limit
+  }

   return allAssets;
 }

319-323: Use consistent log level naming.

The coding guidelines specify available log levels as: debug, info, warn, error. This code uses "warning" instead of "warn". This inconsistency appears in multiple places throughout the file.

As per coding guidelines, consider using "warn" instead of "warning" for consistency:

-          log(
-            config,
-            "warning",
-            `Publishing job ${jobId} completed but no .ditamap file found in ot-output/dita/`
-          );
+          log(
+            config,
+            "warn",
+            `Publishing job ${jobId} completed but no .ditamap file found in ot-output/dita/`
+          );

515-515: Remove unused parentPath parameter.

The parentPath parameter in the extractDependencies inner function is defined but never used.

🔎 Proposed fix
-    const extractDependencies = (obj, parentPath = "") => {
+    const extractDependencies = (obj) => {

863-870: Consider using the REST client or dedicated search client.

The function uses createApiClient which sets baseURL to /ezdnxtgen/api/v2, then overrides it in the request. This works but is confusing. Consider creating the client with the correct base URL or using createRestApiClient if appropriate.


1008-1017: Caching via config mutation is a side effect.

The function mutates herettoConfig.fileMapping to cache search results. While this works, it's a side effect that could be surprising. Consider documenting this behavior in the JSDoc or using a separate cache mechanism.

Add a note to the JSDoc:

 /**
  * Resolves a local file path to a Heretto file ID.
  * First checks file mapping, then searches by filename if needed.
+ * Note: Caches search results by mutating herettoConfig.fileMapping.
  * @param {Object} herettoConfig - Heretto integration configuration
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b09773 and b7309fa.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
  • .github/workflows/integration-tests.yml
  • package.json
  • src/config.js
  • src/heretto.integration.test.js
  • src/heretto.js
  • src/heretto.test.js
  • src/utils.js
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
  • src/utils.js
  • src/config.js
  • src/heretto.js
**/*.test.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.js: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests
Mock external dependencies (file system, HTTP requests) in tests
Test both successful and error scenarios
Validate configuration handling thoroughly in tests
Use realistic test data that matches actual usage patterns
Use Mocha for unit tests
Use Chai for assertions

Files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
  • src/utils.js
  • src/config.js
  • src/heretto.js
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Add comprehensive test coverage when adding new features
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use realistic test data that matches actual usage patterns

Applied to files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

  • src/heretto.test.js
  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Add comprehensive test coverage when adding new features

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Mock external dependencies (file system, HTTP requests) in tests

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : When possible, directly import and run functions rather than use extensive mocking and stubbing in tests

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to src/resolve.js : Follow existing regex patterns for markup detection when adding new features

Applied to files:

  • src/config.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use Mocha for unit tests

Applied to files:

  • package.json
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Use meaningful variable names that reflect Doc Detective terminology

Applied to files:

  • package.json
🧬 Code graph analysis (3)
src/heretto.test.js (2)
src/heretto.integration.test.js (6)
  • heretto (17-17)
  • assets (173-177)
  • assets (190-194)
  • completedJob (155-161)
  • mockLog (60-64)
  • mockConfig (65-65)
src/heretto.js (2)
  • assets (303-303)
  • completedJob (645-651)
src/heretto.integration.test.js (1)
src/heretto.js (8)
  • client (597-597)
  • client (841-841)
  • client (908-908)
  • job (290-290)
  • job (636-640)
  • completedJob (645-651)
  • assets (303-303)
  • outputPath (662-669)
src/config.js (2)
src/config.test.js (2)
  • config (291-291)
  • config (334-334)
src/index.test.js (14)
  • config (221-223)
  • config (249-251)
  • config (265-267)
  • config (280-282)
  • config (350-366)
  • config (379-395)
  • config (408-424)
  • config (456-472)
  • config (517-548)
  • config (581-659)
  • config (705-736)
  • config (794-846)
  • config (1000-1000)
  • config (1021-1021)
🔇 Additional comments (24)
src/config.js (1)

91-91: LGTM - New DITA step pattern is well-formed.

The new inline statement pattern correctly captures DITA <data name="step"> elements and aligns with the existing pattern style. The regex properly escapes XML characters and uses non-greedy matching.

package.json (1)

7-9: LGTM - Test script organization follows best practices.

The separation of unit and integration tests is well-structured. The extended timeout for integration tests (10 minutes) is appropriate for real API interactions, and the default test script correctly excludes integration tests for faster local development.

src/utils.js (5)

31-50: LGTM - Well-implemented path-based integration lookup.

The function correctly normalizes paths and uses straightforward logic to match files to their Heretto integration. The early return and clear iteration pattern make it easy to understand and maintain.


63-89: LGTM - Robust spec ID generation prevents basename collisions.

The function intelligently uses relative paths when possible to create unique, portable spec IDs while maintaining readability. The path normalization and character sanitization are appropriate for filesystem and URL safety.


283-286: LGTM - Proper initialization and storage of Heretto path mappings.

The defensive initialization and strategic storage of path mappings enable proper tracking of which files belong to which Heretto integration. The underscore prefix convention for internal state is appropriate.

Also applies to: 314-314


787-827: LGTM - Comprehensive sourceIntegration metadata for screenshot tracking.

The code properly attaches integration metadata to screenshot steps from Heretto sources. The handling of different screenshot value types (string, boolean, object) is thorough, and the metadata structure is consistent across both attachment points.


1008-1010: LGTM - generateSpecId prevents spec ID collisions.

Replacing basename-based IDs with path-based IDs generated by generateSpecId prevents collisions when multiple files share the same basename but exist in different directories.

src/heretto.test.js (3)

232-328: LGTM - Comprehensive test coverage for asset retrieval.

The tests thoroughly cover asset retrieval scenarios including pagination, empty results, and edge cases like missing filePath properties. The test data is realistic and aligns with the Heretto API structure.

Based on learnings: The tests use realistic test data that matches actual usage patterns and cover both successful and error scenarios.


330-381: LGTM - Thorough validation test coverage.

The tests comprehensively cover ditamap validation scenarios, including correct location, missing files, wrong directory, and edge cases. The tests verify the directory-specific requirement for ditamaps in ot-output/dita/.

Based on learnings: Tests cover both successful and error scenarios appropriately.


391-533: LGTM - Tests properly verify integrated asset validation.

The updated tests correctly validate that pollJobStatus now fetches and validates assets after job completion. The tests cover success, failure, and edge cases including missing ditamaps and asset fetch failures. The call count expectations are accurate (3 status polls + 1 asset call).

Based on learnings: Tests validate configuration handling thoroughly and cover both successful and error scenarios.

src/heretto.integration.test.js (4)

1-77: LGTM - Excellent CI-gating and developer experience.

The integration test setup is well-designed with clear environment variable requirements, helpful skip messages for local development, and proper CI-gating logic. The DEBUG-based logging control is a nice touch for debugging failed integration tests.

Based on learnings: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests. This integration test file follows that guidance by testing against real APIs in CI.


79-90: LGTM - Appropriate API client validation.

The tests verify essential client setup properties including method availability and correct base URL construction. These are good smoke tests for the integration environment.


92-214: LGTM - Comprehensive workflow testing.

The tests thoroughly validate the full Heretto publishing workflow from scenario lookup through job completion, asset validation, and output extraction. The extended timeouts are appropriate for real API interactions, and the tests verify critical integration points.

Based on learnings: Tests use realistic test data that matches actual usage patterns and add comprehensive test coverage when adding new features.


216-231: LGTM - Appropriate end-to-end test.

The end-to-end test of loadHerettoContent validates the complete workflow and serves as a good integration smoke test. The 10-minute timeout is reasonable for the full Heretto content loading process.

.github/workflows/integration-tests.yml (2)

63-119: Well-structured failure notification job.

The notify-on-failure job properly handles idempotent issue creation by checking for existing open issues before creating new ones, and adds comments for subsequent failures. The condition failure() && github.event_name == 'schedule' correctly limits notifications to scheduled runs only.


36-38: No action needed. Node.js 24 is a valid LTS version as of January 2026, having entered Active LTS status on October 28, 2025 and scheduled to remain in Active LTS through October 20, 2026.

Likely an incorrect or invalid review comment.

src/heretto.js (8)

58-76: LGTM!

The createRestApiClient function correctly mirrors the createApiClient pattern with appropriate modifications for the REST API endpoint (different base URL and XML Accept header).


263-273: LGTM!

Clean validation function with clear intent and correct use of Array.some() for early termination.


702-758: LGTM!

The buildFileMapping function properly handles XML parsing with per-file error handling and builds a useful mapping of local paths to file metadata.


766-789: LGTM!

Recursive file finder with appropriate extension matching and intentional silent error handling for inaccessible directories.


797-823: LGTM!

Clean recursive traversal that correctly handles both single and multiple image elements in the DITA content.


907-966: LGTM!

Well-implemented file upload with proper MIME type detection, error handling, and consistent return structure.


617-628: LGTM!

Good conditional logic to only fetch resource dependencies when uploadOnChange is enabled, avoiding unnecessary API calls.


1028-1049: LGTM!

All new functions are properly exported and the module structure is well-organized with constants exported for testing.

@promptless
Copy link

promptless bot commented Jan 3, 2026

📝 Documentation updates detected!

Updated existing suggestion: Document Heretto CMS integration for DITA content testing

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive Heretto CMS integration for automatic screenshot synchronization and content management. The implementation adds REST API client capabilities, job asset validation, file upload functionality, and integration tests with CI/CD automation.

Key changes include:

  • Added Heretto REST API client for file uploads and resource management with UUID-to-path mapping
  • Enhanced job polling to validate .ditamap presence in published outputs before considering jobs successful
  • Introduced integration tests that run against real Heretto APIs with scheduled CI/CD pipeline and failure notifications

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/utils.js Added functions to find Heretto integration by path, generate spec IDs from file paths, and attach sourceIntegration metadata to screenshot steps
src/heretto.js Added REST API client, asset validation functions, file upload/search capabilities, resource dependency retrieval, and file mapping for screenshot synchronization
src/heretto.test.js Updated unit tests for new asset validation logic and removed obsolete assertions for unused post calls
src/heretto.integration.test.js New integration test suite for real Heretto API interactions with CI environment detection and credential checks
src/config.js Added new DITA step pattern for data elements and updated screenshot image regex patterns (with syntax errors)
package.json Added fast-xml-parser dependency, updated test scripts to separate unit and integration tests, and bumped dependency versions
package-lock.json Lock file updates reflecting new and updated dependencies
.github/workflows/integration-tests.yml New CI/CD workflow for scheduled integration tests with automatic issue creation on failures

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +200 to +214
it("should download and extract output", async function () {
const outputPath = await heretto.downloadAndExtractOutput(
client,
scenarioInfo.fileId,
jobId,
herettoConfig.name,
mockLog,
mockConfig
);

expect(outputPath).to.not.be.null;
expect(outputPath).to.be.a("string");
expect(outputPath).to.include("heretto_");
});
});
Copy link

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration tests download and extract content to temporary directories but don't clean them up after the tests complete. Consider adding an after() hook to clean up any temporary directories created during testing to prevent disk space accumulation in CI environments.

Copilot uses AI. Check for mistakes.
HERETTO_ORGANIZATION_ID: ${{ secrets.HERETTO_ORGANIZATION_ID }}
HERETTO_USERNAME: ${{ secrets.HERETTO_USERNAME }}
HERETTO_API_TOKEN: ${{ secrets.HERETTO_API_TOKEN }}
HERETTO_SCENARIO_NAME: ${{ secrets.HERETTO_SCENARIO_NAME || 'Doc Detective' }}
Copy link

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value syntax for GitHub Actions should use double pipes || as a fallback operator, not within the secrets expression. The current syntax ${{ secrets.HERETTO_SCENARIO_NAME || 'Doc Detective' }} will work, but it's more idiomatic to handle the default at the application level or use the env context. Consider documenting that this secret is optional.

Suggested change
HERETTO_SCENARIO_NAME: ${{ secrets.HERETTO_SCENARIO_NAME || 'Doc Detective' }}
HERETTO_SCENARIO_NAME: ${{ secrets.HERETTO_SCENARIO_NAME }}

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +88
function generateSpecId(filePath) {
const absolutePath = path.resolve(filePath);
const cwd = process.cwd();

let relativePath;
if (absolutePath.startsWith(cwd)) {
relativePath = path.relative(cwd, absolutePath);
} else {
relativePath = absolutePath;
}

const normalizedPath = relativePath
.split(path.sep)
.join("/")
.replace(/^\.\//, "")
.replace(/[^a-zA-Z0-9._\-\/]/g, "_");

return normalizedPath;
}
Copy link

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generateSpecId function is defined but not exported in the exports section at the top of the file. If this function needs to be used by other modules, it should be added to the exports. If it's only used internally within utils.js, this is fine.

Copilot uses AI. Check for mistakes.
async function uploadFile(herettoConfig, fileId, localFilePath, log, config) {
const client = createRestApiClient(herettoConfig);

try {
Copy link

Copilot AI Jan 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The uploadFile function uses fs.readFileSync without checking if the file exists first. Consider adding a file existence check before attempting to read the file to provide better error messages. You could use fs.existsSync(localFilePath) to check before reading.

Suggested change
try {
try {
// Ensure the local file exists before attempting to read it
if (!fs.existsSync(localFilePath)) {
log(
config,
"warning",
`Local file does not exist, cannot upload to Heretto: ${localFilePath}`
);
return {
status: "FAIL",
description: `Local file not found: ${localFilePath}`,
};
}

Copilot uses AI. Check for mistakes.
- Update HERETTO_SCENARIO_NAME handling in integration tests to clarify it's optional.
- Refine regex patterns for screenshot image handling in config.
- Add cleanup logic for temporary directories in Heretto integration tests.
- Implement file existence check before uploading to Heretto.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
src/heretto.integration.test.js (2)

22-25: Consider using a more conventional Chai import pattern.

Using global.expect in a before hook is unconventional and could cause issues if tests run in parallel or in isolation. Consider importing Chai at the module level using dynamic import or CommonJS require with a wrapper.

🔎 Suggested alternative
+let expect;
+
 before(async function () {
-  const { expect } = await import("chai");
-  global.expect = expect;
+  const chai = await import("chai");
+  expect = chai.expect;
 });

Then use expect directly in tests without polluting global scope.


63-63: Unused variable tempDirectories.

The tempDirectories array is declared but never populated or used. The cleanup logic in the after hook uses a different approach (scanning the temp directory). Remove this unused variable.

🔎 Proposed fix
   let client;
   let herettoConfig;
-  let tempDirectories = []; // Track temp directories for cleanup
   const mockLog = (...args) => {
.github/workflows/integration-tests.yml (1)

54-62: Artifact paths may not exist.

The test-results/ directory and *.log files may not be created by the test run. Consider adding if-no-files-found: ignore to prevent workflow warnings if these paths don't exist.

🔎 Suggested configuration
       - name: Upload test results
         if: always()
         uses: actions/upload-artifact@v4
         with:
           name: integration-test-results
           path: |
             test-results/
             *.log
           retention-days: 7
+          if-no-files-found: ignore
src/heretto.js (5)

220-261: Good pagination handling for asset retrieval.

The pagination logic correctly iterates through all pages and handles the totalPages check. Consider adding a safeguard against infinite loops if the API returns unexpected pagination data.

🔎 Optional safeguard
   while (hasMorePages) {
+    // Safeguard against infinite loops
+    if (page > 1000) {
+      break;
+    }
     const response = await client.get(

493-498: Consider using a separate metadata object instead of underscore-prefixed keys.

Storing metadata like _ditamapPath, _ditamapId, and _ditamapParentFolderId directly in the pathToUuidMap object risks collision with file paths that might start with underscores. Consider returning a structured object with separate files and metadata properties.

🔎 Suggested structure
return {
  files: pathToUuidMap,
  metadata: {
    ditamapPath: relativePath,
    ditamapId: ditamapId,
    ditamapParentFolderId: ditamapParentFolder,
  }
};

783-785: Consider logging suppressed errors.

Silently ignoring read errors could make debugging difficult. Consider adding debug-level logging for errors, even if they're expected (e.g., permission issues).

🔎 Suggested improvement
-  } catch (error) {
-    // Ignore read errors for inaccessible directories
+  } catch (error) {
+    // Ignore read errors for inaccessible directories but log for debugging
+    // Note: This function doesn't have access to log/config, so consider
+    // passing them as parameters if debugging becomes necessary
   }

Alternatively, pass log and config as parameters to enable debug logging.


862-869: Potential baseURL conflict.

The request overrides baseURL while using client created with createApiClient which already has a baseURL. The endpoint /ezdnxtgen/api/search also partially duplicates the path in createApiClient's baseURL. This works but is confusing.

🔎 Consider using the REST client instead

Since you're overriding the baseURL anyway, consider using createRestApiClient which has the simpler base URL, or extract the search endpoint to a constant for clarity.


1021-1029: Mutation of config object for caching.

Directly mutating herettoConfig.fileMapping to cache search results is a side effect that could be unexpected. This is acceptable for performance optimization but should be documented in the JSDoc.

🔎 Documentation suggestion

Add a note to the JSDoc:

/**
 * ...
 * @note This function may mutate herettoConfig.fileMapping to cache resolved file IDs.
 * ...
 */
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7309fa and fb53596.

📒 Files selected for processing (4)
  • .github/workflows/integration-tests.yml
  • src/config.js
  • src/heretto.integration.test.js
  • src/heretto.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/config.js
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

  • src/heretto.integration.test.js
  • src/heretto.js
**/*.test.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.js: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests
Mock external dependencies (file system, HTTP requests) in tests
Test both successful and error scenarios
Validate configuration handling thoroughly in tests
Use realistic test data that matches actual usage patterns
Use Mocha for unit tests
Use Chai for assertions

Files:

  • src/heretto.integration.test.js
src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

  • src/heretto.integration.test.js
  • src/heretto.js
🧠 Learnings (5)
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Mock external dependencies (file system, HTTP requests) in tests

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use realistic test data that matches actual usage patterns

Applied to files:

  • src/heretto.integration.test.js
📚 Learning: 2025-11-25T23:59:45.298Z
Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : When possible, directly import and run functions rather than use extensive mocking and stubbing in tests

Applied to files:

  • src/heretto.integration.test.js
🧬 Code graph analysis (1)
src/heretto.integration.test.js (1)
src/heretto.js (3)
  • client (596-596)
  • client (840-840)
  • client (907-907)
🔇 Additional comments (15)
src/heretto.integration.test.js (5)

30-39: Good CI gating implementation.

The conditional test execution pattern using describe.skip is clean and the detailed skip messages help developers understand why tests aren't running locally.


110-121: LGTM!

The API client creation tests appropriately validate the client instance and its configuration.


123-150: Good test coverage for findScenario.

Tests cover both the success case (existing scenario) and failure case (non-existent scenario), aligning with the coding guidelines for testing both successful and error scenarios.


152-245: Test dependency chain is acceptable for integration tests.

The sequential dependency where later tests rely on jobId from earlier tests is appropriate for this integration workflow. The this.skip() in the before hook provides graceful handling when the scenario isn't found.


247-262: LGTM!

The end-to-end test appropriately exercises the full workflow with an extended timeout to accommodate real API interactions.

.github/workflows/integration-tests.yml (3)

27-28: Good security practice for fork PRs.

The condition to skip on fork PRs prevents exposure of secrets to untrusted code. The inline comment explaining that HERETTO_SCENARIO_NAME is optional is a good documentation addition.


64-119: Well-designed failure notification.

The job appropriately only runs for scheduled failures, checks for existing issues before creating new ones, and provides useful context in the issue body. Good practice for maintaining visibility into integration test health.


34-38: No action needed. Node.js v24 is in Active LTS (as of October 28, 2025) and is fully supported by GitHub Actions setup-node. Using v24 is appropriate and stable for the integration tests workflow.

Likely an incorrect or invalid review comment.

src/heretto.js (7)

58-76: LGTM!

The REST API client factory follows the same pattern as createApiClient with appropriate headers for XML responses. Good separation of concerns.


263-273: LGTM!

Clear and focused validation function with good JSDoc documentation.


285-351: Good enhancement with asset validation.

The updated pollJobStatus now validates that the job output contains the expected .ditamap file, providing early detection of publishing issues. The error handling for asset validation failures is appropriate.


692-757: LGTM!

The file mapping function properly handles parse errors per-file without failing the entire operation, and provides useful debug logging.


790-822: LGTM!

The recursive traversal correctly handles both single and array image elements, properly extracting href attributes.


906-978: Good implementation with file existence check.

The file existence check before reading addresses the previous review comment. Content type detection is comprehensive and error handling returns structured results for upstream processing.


1040-1061: LGTM!

Exports are complete and include all new public functions along with constants for testing purposes.

@hawkeyexl hawkeyexl merged commit b7345ab into main Jan 4, 2026
14 checks passed
@hawkeyexl hawkeyexl deleted the heretto2 branch January 4, 2026 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants