Integrate Heretto API by hawkeyexl · Pull Request #95 · doc-detective/resolver

hawkeyexl · 2025-12-16T19:41:42Z

Implement Heretto integration with enhanced logging and error handling. Update dependencies and improve timeout settings for downloads. Revise tests and ignore already processed DITA maps to streamline processing.

Summary by CodeRabbit

New Features
- Added Heretto content integration to load external content as processing inputs and inject them into the processing sequence
- Improved handling of empty input sequences with clear warnings
- Prevent duplicate processing of DITA map outputs
Bug Fixes
- Adjusted log severity label for missing-test warnings
Tests
- Added comprehensive tests covering Heretto integration and polling behavior
Chores
- Added ZIP handling dependency, pinned one package to a dev build, bumped analytics client version

…ample configuration

coderabbitai · 2025-12-16T19:41:52Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Heretto CCMS integration (API client, publish trigger, polling, ZIP download/extract), wires it into source processing to load heretto: sources, adds tests, adds adm-zip dependency and updates two dependency versions, and changes one log-level string.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependencies (`package.json`) | Added `adm-zip`; pinned `doc-detective-common` `^3.6.0` → `3.6.0-dev.1`; bumped `posthog-node` `^5.17.0` → `^5.17.2`. |
| Heretto CCMS Integration (`src/heretto.js`) | New module implementing Basic auth header, API client creation, scenario discovery/validation, publish-job trigger, polling with timeout, ZIP download & extraction (uses AdmZip), and top-level `loadHerettoContent`; exports functions and constants. |
| Heretto Tests (`src/heretto.test.js`) | New comprehensive test suite (proxyquire/sinon, fake timers) covering auth, client creation, scenario validation branches, job triggering, polling behavior, download/extract, error cases, and exported constants. |
| Source Processing Integration (`src/utils.js`) | Detects `heretto:` sources, calls `loadHerettoContent` to fetch/extract content, injects extracted output into processing inputs, tracks already-processed DITA maps to avoid re-processing, and guards empty input sequences. |
| Logging Update (`src/index.js`) | Changed log level string from `"warn"` to `"warning"` when no tests are detected (severity label only). |

Sequence Diagram(s)

sequenceDiagram
    participant Processor as Processor (src/utils.js)
    participant Heretto as Heretto Module (src/heretto.js)
    participant API as Heretto CCMS API
    participant FS as File System

    Processor->>Heretto: loadHerettoContent(herettoConfig, log, config)

    rect rgba(210,235,255,0.6)
    note over Heretto,API: Init & discover scenario
    Heretto->>Heretto: createApiClient()
    Heretto->>API: GET /scenarios
    API-->>Heretto: scenarios[]
    Heretto->>API: GET /scenarios/{id}/parameters
    API-->>Heretto: parameters
    Heretto->>Heretto: validate scenario params
    end

    rect rgba(220,255,220,0.6)
    note over Heretto,API: Trigger publish & poll
    Heretto->>API: POST /files/{fileId}/publishes
    API-->>Heretto: { jobId }
    loop poll until SUCCESS/FAIL or timeout
        Heretto->>API: GET /files/{fileId}/publishes/{jobId}/status
        API-->>Heretto: status
    end
    end

    rect rgba(255,250,210,0.6)
    note over Heretto,FS: Download & extract
    Heretto->>API: GET /files/{fileId}/publishes/{jobId}/output (zip)
    API-->>Heretto: zip binary
    Heretto->>FS: save zip -> temp
    Heretto->>FS: extract with AdmZip (safe traversal)
    Heretto->>FS: cleanup temp zip
    FS-->>Heretto: extraction path
    end

    Heretto-->>Processor: extraction path | null

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Focus review items:
- src/heretto.js: auth header correctness, scenario parameter validation, polling/timeouts, temp file handling and AdmZip extraction safety, error logging and cleanup.
- src/utils.js: integration logic for heretto: sources, management of ignored DITA maps, and interaction with existing processing flow and error handling.
- src/heretto.test.js: reliability of mocks/stubs, use of fake timers, and coverage of failure branches.

Possibly related PRs

Update dependencies to latest versions #84 — overlaps package.json edits (pins for doc-detective-common, posthog-node).
Update dependencies #91 — related dependency updates touching posthog-node.

Poem

🐰 I hopped to Heretto to fetch a map or two,
I polled and I waited till the job said "woohoo!"
I zipped and I unzipped, then left paths in a line,
Now inputs are tidy and the DITA maps align.
✨ Hoppy builds and neat little hops!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Integrate Heretto API' directly summarizes the main change: adding a new Heretto CCMS API integration module with supporting utilities and tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c5d03d6 and 2da6d15.

📒 Files selected for processing (2)

src/heretto.js (1 hunks)
src/heretto.test.js (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/heretto.test.js

🧰 Additional context used

📓 Path-based instructions (2)

**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

src/heretto.js

src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

src/heretto.js

🧠 Learnings (3)

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Add JSDoc comments for complex functions

Applied to files:

src/heretto.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

src/heretto.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

src/heretto.js

🧬 Code graph analysis (1)

src/heretto.js (1)

src/heretto.test.js (9)

herettoConfig (57-61)

herettoConfig (327-333)

herettoConfig (344-350)

scenarioParameters (87-93)

scenarioParameters (134-140)

scenarioParameters (159-164)

scenarioParameters (183-188)

scenarioParameters (352-358)

completedJob (240-243)

🔇 Additional comments (7)

src/heretto.js (7)

1-14: LGTM! Well-organized imports and constants.

The timeout constants are properly defined and named, addressing previous concerns about magic numbers. The separation of API_REQUEST_TIMEOUT_MS (30s) and DOWNLOAD_TIMEOUT_MS (5min) appropriately reflects the different nature of these operations.

21-53: LGTM! Clean helper functions with proper documentation.

All three helper functions have complete JSDoc comments and implement their responsibilities correctly. The createApiClient function properly configures axios with authentication, timeout, and headers.

60-76: LGTM! API methods are well-documented.

Both functions now have complete JSDoc comments, addressing previous feedback. The fallback to an empty array in getPublishingScenarios (line 62) is a good defensive practice.

172-192: LGTM! Publishing job functions are well-documented.

Both functions have complete JSDoc comments and implement straightforward API operations correctly.

203-240: LGTM! Robust polling implementation with proper timeout and error handling.

The polling logic correctly handles success, failure, and timeout scenarios with appropriate logging at each stage. The use of optional chaining (line 209, 211, 215) safely handles potentially undefined status properties.
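For readers who want the shape of such a loop, here is a minimal sketch of polling with an overall timeout. It is not the module's code: `pollUntilDone`, `fetchStatus`, `intervalMs`, and `timeoutMs` are hypothetical names, and the `status.result` values `SUCCESS` and `FAIL` are taken from the test data quoted elsewhere in this review.

```javascript
// Hypothetical sketch: poll a status function until it reports a terminal
// result or the overall timeout elapses. Not the actual src/heretto.js code.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilDone({ fetchStatus, intervalMs, timeoutMs }) {
  const startedAt = Date.now();
  while (Date.now() - startedAt < timeoutMs) {
    try {
      const job = await fetchStatus();
      // Optional chaining tolerates responses that lack a `status` object.
      const result = job?.status?.result;
      if (result === "SUCCESS") return job;
      if (result === "FAIL") return null;
    } catch (error) {
      // In this sketch a failed status request ends polling.
      return null;
    }
    await sleep(intervalMs);
  }
  return null; // timed out without a terminal result
}
```

Returning `null` for failure, request errors, and timeout alike keeps the caller's check to a single falsy test, at the cost of hiding which case occurred; logging inside each branch would recover that detail.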

252-334: LGTM! Path traversal protection properly implemented.

The function now correctly addresses previous security concerns:

Uses recursive: true for directory creation (line 263)

Uses the DOWNLOAD_TIMEOUT_MS constant (line 282)

Implements robust path validation (lines 296-307) to prevent zip slip attacks

The path traversal protection logic correctly validates that each extracted entry's resolved path stays within the intended output directory.
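The containment check can be isolated into a small pure function, which also makes it easy to unit test. This is a sketch under assumptions: `resolveEntryPath` is a hypothetical helper name, not a function confirmed to exist in `src/heretto.js`.

```javascript
const path = require("node:path");

// Hypothetical helper: returns the absolute target path for a ZIP entry,
// or null when the entry would resolve outside outputDir (zip slip).
function resolveEntryPath(outputDir, entryName) {
  const root = path.resolve(outputDir);
  const target = path.resolve(root, entryName);
  // Comparing against root + path.sep avoids treating a sibling such as
  // "/tmp/out-evil" as if it were inside "/tmp/out".
  if (target !== root && !target.startsWith(root + path.sep)) {
    return null;
  }
  return target;
}
```

A caller would skip (and log) any entry for which the helper returns `null` before writing anything to disk.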

344-430: LGTM! Well-orchestrated integration with proper fallback handling.

The main loadHerettoContent function correctly orchestrates the complete Heretto publishing workflow with appropriate error handling at each stage. Line 355 properly handles the case where scenarioName is not provided by falling back to DEFAULT_SCENARIO_NAME, addressing previous feedback.
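The fallback described here amounts to a one-line default. A sketch, assuming the constant's value is `"Doc Detective"` (the value the constants test expects) and using a hypothetical helper name `resolveScenarioName`:

```javascript
// Assumed value, taken from the constants test expectation in this PR.
const DEFAULT_SCENARIO_NAME = "Doc Detective";

// Hypothetical helper: use the configured scenario name when present,
// otherwise fall back to the default.
function resolveScenarioName(herettoConfig = {}) {
  return herettoConfig.scenarioName || DEFAULT_SCENARIO_NAME;
}
```

Using `||` rather than `??` means an empty string also falls back to the default, which is usually the desired behavior for a name.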

The module exports are well-structured, exposing the main functions and constants needed for testing.

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (2)

src/heretto.js (2)
64-69: Consider adding JSDoc for internal helper function.

getPublishingScenarioParameters lacks JSDoc documentation. As per coding guidelines, consider adding JSDoc comments for clarity.
+/**
+ * Fetches parameters for a specific publishing scenario.
+ * @param {Object} client - Configured axios instance
+ * @param {string} scenarioId - ID of the publishing scenario
+ * @returns {Promise<Object>} Scenario parameters object
+ */
 async function getPublishingScenarioParameters(client, scenarioId) {
187-190: Unused function: getPublishingJobHistory.

This function is defined but only referenced in commented-out code (lines 346-351). Consider removing it or documenting its intended future use.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef908c and ecfb4fc.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (5)

package.json (1 hunks)
src/heretto.js (1 hunks)
src/heretto.test.js (1 hunks)
src/index.js (1 hunks)
src/utils.js (3 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

src/heretto.test.js
src/index.js
src/heretto.js
src/utils.js

**/*.test.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.js: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests
Mock external dependencies (file system, HTTP requests) in tests
Test both successful and error scenarios
Validate configuration handling thoroughly in tests
Use realistic test data that matches actual usage patterns
Use Mocha for unit tests
Use Chai for assertions

Files:

src/heretto.test.js

src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

src/heretto.test.js
src/index.js
src/heretto.js
src/utils.js

🧠 Learnings (6)

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

src/heretto.test.js
src/index.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

src/heretto.test.js
src/index.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use realistic test data that matches actual usage patterns

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Mock external dependencies (file system, HTTP requests) in tests

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to src/**/*.js : Use the built-in logging system with available log levels: debug, info, warn, error

Applied to files:

src/index.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Use meaningful variable names that reflect Doc Detective terminology

Applied to files:

src/index.js
src/utils.js

🧬 Code graph analysis (2)

src/heretto.test.js (1)

src/heretto.js (1)

os (4-4)

src/utils.js (1)

src/heretto.js (1)

outputPath (385-392)

🪛 GitHub Actions: Test (& Publish)

package.json

[error] 1-1: Command failed with exit code 2. npm test ran and 2 tests failed.

src/heretto.test.js

[error] 105-105: Heretto Integration: findScenario should return scenarioId and fileId when valid scenario is found. AssertionError: expected null to deeply equal { scenarioId: 'scenario-123', ... }

[error] 342-342: Heretto Integration: Constants should export expected constants: AssertionError: expected undefined to equal 'Doc Detective'

🔇 Additional comments (13)

src/index.js (1)

37-37: LGTM!

The log level change from "warn" to "warning" aligns with the built-in logging system which supports "error", "warning", "info", and "debug" levels.

src/utils.js (3)

227-230: LGTM!

Good defensive guard for empty input sequences with appropriate warning logging.

238-286: Heretto integration logic looks correct.

The implementation properly:

Extracts the integration name from the heretto: prefix

Looks up configuration from config.integrations.heretto

Loads content asynchronously and handles failures gracefully

Injects the output path into the processing sequence

Tracks already-processed DITA maps to avoid reprocessing

One observation: mutating the sequence array while iterating over it (line 262, 282) works correctly here because you're inserting at currentIndex + 1 and then continue-ing, so the for-loop's next iteration picks up the new item. This is intentional and correct for the flow.
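To illustrate why the in-loop insertion is safe, here is a simplified, synchronous sketch. `expandSources` and `loadContent` are hypothetical names, and the real code in `src/utils.js` is asynchronous and does more than this.

```javascript
// Hypothetical sketch: expand "heretto:" placeholders in place while
// iterating, and record the order in which items are visited.
function expandSources(sequence, loadContent) {
  const visited = [];
  for (let currentIndex = 0; currentIndex < sequence.length; currentIndex++) {
    const source = sequence[currentIndex];
    visited.push(source);
    if (!source.startsWith("heretto:")) continue;

    const outputPath = loadContent(source.slice("heretto:".length));
    // Insert directly after the current item. The loop condition re-reads
    // sequence.length, so the next iteration visits the inserted path.
    if (outputPath) sequence.splice(currentIndex + 1, 0, outputPath);
  }
  return visited;
}
```

Because the insertion always lands at `currentIndex + 1`, no existing item is skipped or visited twice; inserting at or before `currentIndex` would break that guarantee.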

307-308: LGTM!

The condition correctly skips DITA maps that are already in the ignoredDitaMaps list, preventing redundant processing of Heretto-loaded content.

src/heretto.test.js (1)

1-11: LGTM!

Good setup using Mocha, Sinon for stubs, proxyquire for dependency injection, and dynamic Chai import. This follows the project's testing conventions.

src/heretto.js (7)

1-12: LGTM!

Clean module structure with well-organized imports and clearly defined internal constants. The polling configuration (5 seconds interval, 5 minutes timeout) is reasonable for a publishing job workflow.

14-52: Well-documented authentication and client setup.

Good JSDoc comments, proper Base64 encoding for Basic auth, and sensible 30-second timeout for individual requests.
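As a point of reference, a Basic auth header value is the word `Basic` followed by the Base64 encoding of `username:secret`. A minimal sketch, with `buildBasicAuthHeader` and its parameter names as hypothetical stand-ins (the module's helper may return only the encoded part or use different names):

```javascript
// Hypothetical helper: builds an HTTP Basic Authorization header value.
function buildBasicAuthHeader(username, apiToken) {
  // Encode "username:apiToken" as Base64, per the Basic scheme (RFC 7617).
  const encoded = Buffer.from(`${username}:${apiToken}`, "utf8").toString("base64");
  return `Basic ${encoded}`;
}
```

Encoding from an explicit UTF-8 buffer keeps special characters in the credentials intact, which is the case the special-character test in the suite is meant to cover.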

103-114: Verify API response structure for transtype parameter.

The code checks transtypeParam.value !== "dita" but the test data (and possibly the actual API) may return the value nested as options[0].value. This mismatch is causing test failures.

Please verify the actual Heretto API response structure for scenario parameters. If the API returns:

{ name: "transtype", value: "dita" } → current code is correct, fix tests

{ name: "transtype", options: [{ value: "dita" }] } → update this check to transtypeParam.options?.[0]?.value
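Until the response shape is confirmed, one option is a lookup that tolerates both shapes. This is only a sketch of that idea, not what the module does; `getParamValue` is a hypothetical helper name.

```javascript
// Hypothetical helper: read a scenario parameter's value from either the
// flat shape ({ name, value }) or the nested shape ({ name, options: [...] }).
function getParamValue(parameters, name) {
  const param = parameters.find((candidate) => candidate.name === name);
  if (!param) return undefined;
  // Prefer the flat value; fall back to the first option's value.
  return param.value ?? param.options?.[0]?.value;
}
```

A tolerant reader like this hides the mismatch rather than resolving it, so it is best treated as a stopgap while the actual API contract is verified.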

201-238: Polling implementation is solid.

Good use of:

Timeout boundary check with POLLING_TIMEOUT_MS

Graceful error handling with logging

Appropriate use of Promise with setTimeout for delays

Clear status logging for debugging

250-315: Download and extraction logic is well-implemented.

Proper handling of:

Temp directory creation with { recursive: true }

Unique output directory naming using MD5 hash

Binary download with correct responseType: "arraybuffer"

Extended timeout (5 minutes) for large downloads

ZIP cleanup after extraction

Graceful error handling with fallback to null
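The hash-based directory naming can be sketched as follows. The inputs to the hash (`fileId` and `jobId`), the `heretto_` prefix, and the `doc-detective` temp subdirectory are assumptions for illustration; the review does not state what the module actually hashes.

```javascript
const crypto = require("node:crypto");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: derive a deterministic output directory for a given
// file and job. MD5 is used only for naming here, not for security.
function outputDirFor(fileId, jobId, baseDir = path.join(os.tmpdir(), "doc-detective")) {
  const hash = crypto.createHash("md5").update(`${fileId}_${jobId}`).digest("hex");
  return path.join(baseDir, `heretto_${hash}`);
}
```

A deterministic name means repeated runs for the same job reuse one directory instead of accumulating copies in the temp folder.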

325-403: Main orchestration function is well-structured.

Clean async flow with appropriate logging at each stage and proper error handling. The function correctly chains the workflow: create client → find scenario → trigger job → poll status → download/extract.

405-417: LGTM!

Clean module exports with functions and constants properly exposed for testing and external use.

package.json (1)

28-28: LGTM!

The adm-zip dependency at version 0.5.16 is appropriately added to support ZIP extraction in the new Heretto integration module. This is the latest stable version with no known vulnerabilities and is widely adopted across the ecosystem.

promptless · 2025-12-16T19:52:32Z

📝 Documentation updates detected!

Updated existing suggestion: Document Heretto CMS integration for DITA content testing

- Change transtype options structure to a single value
- Pass scenario name to findScenario function calls
- Update constant name from SCENARIO_NAME to DEFAULT_SCENARIO_NAME

Copilot

Pull request overview

This PR integrates Heretto API functionality to load external content from Heretto CCMS as processing inputs. The implementation includes API authentication, publishing job triggering, polling mechanisms with timeout handling, and ZIP file extraction. Additionally, it updates dependencies and standardizes log level usage.

- Added comprehensive Heretto API integration with enhanced error handling and logging
- Implemented polling mechanism for asynchronous publishing job completion
- Updated log severity from "warn" to "warning" for consistency
- Added logic to prevent duplicate processing of DITA maps

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 25 comments.

| File | Description |
| --- | --- |
| src/heretto.js | New module implementing Heretto CCMS API client, authentication, scenario management, publishing job triggering, polling, and content download/extraction |
| src/heretto.test.js | Comprehensive test suite covering Heretto integration functions with mocked API responses |
| src/utils.js | Integrated Heretto content loading into qualifyFiles, added empty input check, fixed DITA map processing logic, and prevented duplicate processing |
| src/index.js | Fixed log level from "warn" to "warning" for consistency with codebase |
| package.json | Added adm-zip dependency for ZIP handling, updated posthog-node and doc-detective-common versions, updated axios version constraint |
| package-lock.json | Updated lock file to reflect dependency changes |

- Change 'options' to 'value' for 'transtype' in multiple test cases
- Update test description to reflect that 'tool-kit-name' parameter can be missing
- Add 'scenarioName' to the context in relevant tests

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

src/heretto.test.js (1)
1-6: Remove unused imports.

The following imports are declared but never used: assert, path, fs, and os. Removing them will clean up the code.

Apply this diff:
-const assert = require("assert");
 const sinon = require("sinon");
 const proxyquire = require("proxyquire");
-const path = require("path");
-const fs = require("fs");
-const os = require("os");

🧹 Nitpick comments (2)

src/heretto.test.js (2)
204-220: Consider adding an error scenario test.

While the success path is well tested, consider adding a test case for when the POST request fails (e.g., network error, invalid scenario ID). This would make the test suite more complete.

Example test case:
it("should throw error when job creation fails", async function () {
  mockClient.post.rejects(new Error("API error"));

  // Note: `.rejected` is provided by the chai-as-promised plugin, not core Chai
  await expect(
    heretto.triggerPublishingJob(mockClient, "file-uuid", "scenario-id")
  ).to.be.rejected;
});
As per coding guidelines, testing both success and error scenarios improves test coverage.

281-337: Consider adding a success path test or integration test.

The current tests cover important failure scenarios (scenario lookup and job creation failures), but there's no test for the full success path of loadHerettoContent.

Given the complexity of the function (it orchestrates scenario lookup, job triggering, polling, downloading, and extracting), you might consider:

Adding a unit test that mocks all the intermediate steps and verifies the success path

Creating an integration test that exercises the full flow with test fixtures

This would provide more confidence in the happy path and align with the coding guideline to "Add comprehensive test coverage when adding new features."

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0cedc5 and 877ed1b.

📒 Files selected for processing (1)

src/heretto.test.js (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

src/heretto.test.js

**/*.test.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.js: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests
Mock external dependencies (file system, HTTP requests) in tests
Test both successful and error scenarios
Validate configuration handling thoroughly in tests
Use realistic test data that matches actual usage patterns
Use Mocha for unit tests
Use Chai for assertions

Files:

src/heretto.test.js

src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

src/heretto.test.js

🧠 Learnings (6)

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Mock external dependencies (file system, HTTP requests) in tests

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use realistic test data that matches actual usage patterns

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Add comprehensive test coverage when adding new features

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Use meaningful variable names that reflect Doc Detective terminology

Applied to files:

src/heretto.test.js

🧬 Code graph analysis (1)

src/heretto.test.js (1)

src/heretto.js (3)

authHeader (40-43)

scenarioParameters (89-92)

completedJob (368-374)

🔇 Additional comments (6)

src/heretto.test.js (6)

18-38: LGTM: Excellent test setup using proxyquire and sinon.

The test setup properly mocks the axios client using proxyquire, which allows clean dependency injection without affecting the actual module. The beforeEach/afterEach pattern ensures proper test isolation.

Based on learnings, this follows the guideline to mock external dependencies (HTTP requests) in tests.

40-55: LGTM: Comprehensive coverage for authentication header creation.

The tests properly verify Base64 encoding and include special character handling, which is important for password security scenarios.

57-73: LGTM: Proper verification of API client configuration.

The test comprehensively verifies the axios client is created with the correct baseURL, authorization header, and content type.

75-202: LGTM: Excellent test coverage for scenario discovery.

The test suite thoroughly covers the findScenario function with:

Success path with valid parameters

Scenario not found case

Network failure handling

Parameter validation for transtype, tool-kit-name, and file_uuid_picker

This aligns with coding guidelines to test both successful and error scenarios with realistic test data.

As per coding guidelines, this properly tests both success and error scenarios and validates configuration handling thoroughly.

222-279: LGTM: Excellent polling test using fake timers.

The test suite properly covers:

Success and failure scenarios

Polling behavior with multiple intermediate states

Proper use of sinon.useFakeTimers() to avoid actual delays

The fake timer approach is best practice for testing time-dependent async operations.

As per coding guidelines, this tests both success and error scenarios appropriately.

339-345: LGTM: Proper verification of module constants.

The test correctly verifies all exported constants with their expected values.

- Update timeout constants for API requests and downloads
- Modify findScenario function to use scenarioName parameter
- Remove unused scenario description constant
- Simplify temp directory creation logic in download function
- Clean up test file by removing unnecessary imports

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

src/heretto.js (1)

292-296: Path traversal vulnerability in ZIP extraction.

The zip.extractAllTo() method doesn't validate that extracted file paths stay within outputDir. A malicious ZIP could contain entries like ../../../etc/passwd that write files outside the intended directory (zip slip attack).

Consider validating each entry's path before extraction:

     // Extract ZIP contents
     log(config, "debug", `Extracting output to ${outputDir}...`);
     const zip = new AdmZip(zipPath);
-    zip.extractAllTo(outputDir, true);
+    
+    // Validate and extract entries safely
+    for (const entry of zip.getEntries()) {
+      const entryPath = path.join(outputDir, entry.entryName);
+      const resolvedPath = path.resolve(entryPath);
+      
+      // Ensure the resolved path is within outputDir
+      if (!resolvedPath.startsWith(path.resolve(outputDir) + path.sep)) {
+        log(config, "warning", `Skipping potentially malicious ZIP entry: ${entry.entryName}`);
+        continue;
+      }
+      
+      if (entry.isDirectory) {
+        fs.mkdirSync(resolvedPath, { recursive: true });
+      } else {
+        fs.mkdirSync(path.dirname(resolvedPath), { recursive: true });
+        fs.writeFileSync(resolvedPath, entry.getData());
+      }
+    }

🧹 Nitpick comments (3)

src/heretto.js (1)
78-84: Inaccurate JSDoc return description.

The JSDoc states "Scenario object or null if creation failed" but this function only finds and validates an existing scenario—it doesn't create anything. The comment is misleading.
 /**
  * Finds an existing publishing scenario by name and validates its configuration.
  * @param {Object} client - Configured axios instance
  * @param {Function} log - Logging function
  * @param {Object} config - Doc Detective config for logging
- * @returns {Promise<Object|null>} Scenario object or null if creation failed
+ * @param {string} scenarioName - Name of the scenario to find
+ * @returns {Promise<Object|null>} Object with scenarioId and fileId, or null if not found/invalid
  */
src/heretto.test.js (2)
218-275: Good test coverage, but timeout scenario is not tested.

The polling tests cover SUCCESS, FAIL, and intermediate polling states well. Consider adding a test for the timeout scenario where the job never completes within POLLING_TIMEOUT_MS.

Example timeout test:
it("should return null on timeout", async function () {
  const clock = sinon.useFakeTimers();

  try {
    // Always return PENDING status so the job never completes
    mockClient.get.resolves({
      data: { id: "job-123", status: { status: "PENDING", result: null } }
    });

    const pollPromise = heretto.pollJobStatus(mockClient, "file-uuid", "job-123", mockLog, mockConfig);

    // Advance past the timeout
    await clock.tickAsync(heretto.POLLING_TIMEOUT_MS + 1000);

    const result = await pollPromise;
    expect(result).to.be.null;
  } finally {
    // Restore real timers even if an assertion above throws
    clock.restore();
  }
});
277-333: Missing success path and downloadAndExtractOutput coverage.

The loadHerettoContent tests only cover failure scenarios. Per coding guidelines, both successful and error scenarios should be tested. Additionally, downloadAndExtractOutput has no direct test coverage despite handling critical file system and ZIP operations.

Consider adding tests for:

Success path for loadHerettoContent - mocking the full workflow through to extraction

Direct tests for downloadAndExtractOutput - testing download, extraction, and cleanup

Example success path test structure:
it("should return output path on successful workflow", async function () {
  const herettoConfig = { /* ... */ };
  
  // Mock scenario lookup
  mockClient.get
    .onCall(0).resolves({ data: { content: [{ id: "scenario-123", name: "Doc Detective" }] } })
    .onCall(1).resolves({ data: { content: [/* valid params */] } })
    .onCall(2).resolves({ data: { status: { result: "SUCCESS" } } });
  
  // Mock job creation
  mockClient.post.resolves({ data: { jobId: "job-123" } });
  
  // Need to also mock file system operations for downloadAndExtractOutput
  // ...
});

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 877ed1b and c5d03d6.

📒 Files selected for processing (2)

src/heretto.js (1 hunks)
src/heretto.test.js (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{js,ts}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{js,ts}: Use async/await for asynchronous operations
Prefer destructuring for function parameters
Use meaningful variable names that reflect Doc Detective terminology
Add JSDoc comments for complex functions

Files:

src/heretto.test.js
src/heretto.js

**/*.test.js

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.js: When possible, directly import and run functions rather than use extensive mocking and stubbing in tests
Mock external dependencies (file system, HTTP requests) in tests
Test both successful and error scenarios
Validate configuration handling thoroughly in tests
Use realistic test data that matches actual usage patterns
Use Mocha for unit tests
Use Chai for assertions

Files:

src/heretto.test.js

src/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use the built-in logging system with available log levels: debug, info, warn, error

Files:

src/heretto.test.js
src/heretto.js

🧠 Learnings (9)

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Validate configuration handling thoroughly in tests

Applied to files:

src/heretto.test.js
src/heretto.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Mock external dependencies (file system, HTTP requests) in tests

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use realistic test data that matches actual usage patterns

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Test both successful and error scenarios

Applied to files:

src/heretto.test.js
src/heretto.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Add comprehensive test coverage when adding new features

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : When possible, directly import and run functions rather than use extensive mocking and stubbing in tests

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.test.js : Use Chai for assertions

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Use meaningful variable names that reflect Doc Detective terminology

Applied to files:

src/heretto.test.js

📚 Learning: 2025-11-25T23:59:45.298Z

Learnt from: CR
Repo: doc-detective/resolver PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T23:59:45.298Z
Learning: Applies to **/*.{js,ts} : Add JSDoc comments for complex functions

Applied to files:

src/heretto.js

🧬 Code graph analysis (2)

src/heretto.test.js (1)

src/heretto.js (3)

authHeader (41-44)

scenarioParameters (95-98)

completedJob (361-367)

src/heretto.js (1)

src/heretto.test.js (5)

scenarioParameters (85-91)

scenarioParameters (132-138)

scenarioParameters (157-162)

scenarioParameters (181-186)

scenarioParameters (311-317)

🔇 Additional comments (12)

src/heretto.js (6)

1-13: LGTM!

The imports are appropriate and the constants are well-named with clear inline documentation. The separate timeout constants (API_REQUEST_TIMEOUT_MS vs DOWNLOAD_TIMEOUT_MS) address previous magic number concerns.

15-33: LGTM!

Both helper functions are simple, focused, and have appropriate JSDoc documentation. The auth header creation follows standard Basic Auth patterns.
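For readers unfamiliar with the pattern referenced here, a minimal sketch of Basic auth header construction follows. The function name `buildBasicAuthHeader` and its parameter names are hypothetical and are not taken from src/heretto.js; the real helper may return a different shape (for example, only the encoded credentials).

```javascript
// Hypothetical helper (not the PR's implementation): builds the value for an
// HTTP Authorization header from a username and an API token.
function buildBasicAuthHeader(username, apiToken) {
  // Basic auth joins the credentials with a colon, then base64-encodes them.
  const encoded = Buffer.from(`${username}:${apiToken}`, "utf8").toString("base64");
  return `Basic ${encoded}`;
}
```

A client would typically pass the returned string as the `Authorization` header when creating its HTTP instance. Credentials that contain characters such as `@` or `:` survive the round trip because the whole string is encoded as bytes.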

35-53: LGTM!

The axios client creation is clean with proper timeout configuration and authorization headers.

55-76: LGTM!

Simple API wrapper functions with appropriate JSDoc documentation. The logical-OR fallback (|| []) in getPublishingScenarios is a good defensive practice.

164-239: LGTM!

The publishing job workflow functions are well-implemented:

- triggerPublishingJob cleanly triggers the job
- getJobStatus is a simple status check
- pollJobStatus correctly handles SUCCESS/FAIL states with timeout protection and appropriate logging levels
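The polling behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/heretto.js: the name `pollJob`, the `fetchJob` callback, and the option names are hypothetical, and the job shape (`status.result` of `"SUCCESS"` or `"FAIL"`) is inferred from the mocked responses in the tests quoted in this review.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling loop: calls fetchJob until the job succeeds, fails,
// or the timeout elapses. Returns the job on success, otherwise null.
async function pollJob(fetchJob, { intervalMs = 5000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await fetchJob();
    const result = job.status && job.status.result;
    if (result === "SUCCESS") return job; // finished successfully
    if (result === "FAIL") return null;   // finished with an error
    await sleep(intervalMs);              // still pending, wait and retry
  }
  return null; // timed out while the job was still pending
}
```

Returning null for both failure and timeout matches the timeout test shown earlier in this review, which expects a null result once the clock passes the polling timeout. A caller that needs to tell the two cases apart would have to rely on log output or a richer return value.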

316-410: LGTM!

The orchestrator function is well-structured with comprehensive error handling. The fallback to DEFAULT_SCENARIO_NAME when herettoConfig.scenarioName is not provided is a good pattern. The module exports appropriately expose functions and constants needed for testing.

src/heretto.test.js (6)

1-34: LGTM!

The test setup is clean and follows best practices:

- Dynamic import for chai with global assignment handles ESM compatibility
- Proxyquire properly injects the mocked axios
- afterEach ensures sinon stubs are restored between tests

36-51: LGTM!

Good test coverage for createAuthHeader including the edge case with special characters in credentials.

53-69: LGTM!

The test properly validates that createApiClient configures the axios instance with the correct baseURL, authorization header, and content type.

71-198: LGTM!

Comprehensive test coverage for findScenario including:

- Success case with valid scenario
- Scenario not found
- Network error handling
- Parameter validation failures (transtype, tool-kit-name, file_uuid_picker)

The test data structure correctly uses value: "dita" matching the production code expectations. As per coding guidelines, both successful and error scenarios are tested.

200-216: LGTM!

The test validates the correct API endpoint and request payload for job triggering. The error scenario is covered indirectly in loadHerettoContent tests.

335-341: LGTM!

The constants export test properly verifies the public API contract for the exported constants.

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.


Copilot · 2025-12-17T03:39:35Z

src/heretto.js

+      return null;
+    }
+
+    // Make sure that scenarioParameters.content has an object with name="tool-kit-name" and value="default/dita-ot-3.6.1"


The comment mentions checking for 'value="default/dita-ot-3.6.1"', but the code only validates that the value exists (is truthy), not that it matches this specific version. The comment should be updated to match the actual validation logic, or the validation should be strengthened to check for the specific value if that's a requirement.
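The difference between the two options in this comment can be illustrated with a small sketch. The helper name `hasParameter` is hypothetical; the only detail taken from the review is that `scenarioParameters.content` holds objects with `name` and `value` fields. Whether the exact toolkit version should be enforced is a project decision that this example does not settle.

```javascript
// Hypothetical validator: checks that a named parameter exists with a truthy
// value, and optionally that the value equals a specific expected string.
function hasParameter(parameters, name, expectedValue) {
  const match = parameters.find((param) => param.name === name);
  if (!match || !match.value) return false; // missing or empty value
  // With no expected value, presence is enough (the looser check).
  return expectedValue === undefined || match.value === expectedValue;
}
```

Calling `hasParameter(content, "tool-kit-name")` mirrors the presence-only check the reviewer describes, while passing a third argument gives the stricter match that the code comment implies.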

Copilot · 2025-12-17T03:39:35Z

src/utils.js

+
+  const ignoredDitaMaps = [];
+
  for (let source of sequence) {


Modifying the sequence array during iteration with for...of can lead to unexpected behavior. While splicing elements after the current index will cause them to be iterated, this pattern is error-prone and hard to reason about. Consider using a traditional for loop with an index (e.g., for (let i = 0; i < sequence.length; i++)) or collecting modifications to apply after the loop completes, to make the iteration behavior more predictable and maintainable.
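One way to make the iteration explicit, as suggested, is an index-based loop over a copy of the array. The sketch below is not the code in src/utils.js: `expandSources` and the `expand` callback are hypothetical stand-ins for whatever logic loads extra inputs for a source.

```javascript
// Hypothetical helper: walks the sequence by index and inserts any extra
// sources produced by expand() directly after the source that produced them.
function expandSources(sequence, expand) {
  const result = [...sequence]; // copy so the caller's array is untouched
  for (let i = 0; i < result.length; i++) {
    const extra = expand(result[i]);
    if (extra.length > 0) {
      // Items inserted at i + 1 are visited on later iterations because
      // result.length is re-read on every pass.
      result.splice(i + 1, 0, ...extra);
    }
  }
  return result;
}
```

With an index, the fact that newly inserted items are also visited is visible in the loop condition rather than depending on iterator behavior. Collecting the additions in a separate array and merging them after the loop is an equally valid alternative when inserted items should not be processed again.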

Copilot · 2025-12-17T03:39:36Z

src/heretto.js

+/**
+ * Finds an existing publishing scenario by name and validates its configuration.
+ * @param {Object} client - Configured axios instance
+ * @param {Function} log - Logging function
+ * @param {Object} config - Doc Detective config for logging
+ * @returns {Promise<Object|null>} Scenario object or null if creation failed
+ */


The function documentation is missing the @param tag for the 'scenarioName' parameter. The function signature includes this parameter, but it's not documented in the JSDoc comment.

Copilot · 2025-12-17T03:39:36Z

src/heretto.js

+ * @param {Object} client - Configured axios instance
+ * @param {Function} log - Logging function
+ * @param {Object} config - Doc Detective config for logging
+ * @returns {Promise<Object|null>} Scenario object or null if creation failed


The function documentation states it returns "Scenario object or null if creation failed", but the function doesn't create scenarios - it only finds and validates existing ones. The return description should be updated to accurately reflect that it returns a scenario object if found and validated, or null if not found or validation fails.

Copilot · 2025-12-17T03:39:36Z

src/utils.js

      path.extname(source) === ".ditamap" &&
-      config.processDitaMap
+      !ignoredDitaMaps.some((ignored) => source.includes(ignored)) &&
+      config.processDitaMaps


The config property name has been changed from "processDitaMap" (singular) to "processDitaMaps" (plural), which is inconsistent with the function name "processDitaMap" (singular). This appears to be a naming inconsistency that could lead to confusion. Consider using consistent singular or plural naming throughout, or verify this matches the actual config schema definition.

- Update findScenario function to include scenarioName parameter and clarify return type
- Implement path traversal protection in downloadAndExtractOutput function
- Add tests for error handling in triggerPublishingJob and pollJobStatus functions
- Create tests for downloadAndExtractOutput to validate ZIP extraction and malicious entry handling
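Path traversal protection for ZIP extraction usually means checking that each entry resolves to a location inside the output directory before writing it. The following is a minimal sketch of that check using only Node's `path` module; `resolveSafeEntryPath` is a hypothetical name, and the actual check in downloadAndExtractOutput may differ.

```javascript
const path = require("node:path");

// Hypothetical guard: returns the absolute destination for a ZIP entry, or
// null when the entry name would resolve outside outputDir.
function resolveSafeEntryPath(outputDir, entryName) {
  const root = path.resolve(outputDir);
  const target = path.resolve(root, entryName);
  const relative = path.relative(root, target);
  const escapesRoot =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  return escapesRoot ? null : target;
}
```

An extraction loop would skip and log any entry for which this returns null. Comparing the relative path, rather than testing whether the target string starts with the root, avoids accepting sibling directories such as `/tmp/out-evil` when the root is `/tmp/out`.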

promptless · 2025-12-17T14:19:31Z

📝 Documentation updates detected!

Updated existing suggestion: Document Heretto CMS integration for DITA content testing

hawkeyexl added 13 commits December 2, 2025 19:43

First pass at Heretto integration

14ef5de

Fix resolver log level

c43daf5

Heretto API failure workflows

f419c4e

Set octect-stream accept header

2fd032f

Move Heretto loading into qualifyFiles()

c1f0200

Ignore already processed ditamaps

300c1b1

Revise heretto tests

ad062a7

Add .env to .gitignore and update Heretto integration schemas with example configuration

b3407a6

Longer timeout for downloads

ae4d4c4

Loosen transtype validation

e6a3c66

Use job.jobId when downloading Heretto output

8f93e24

Added senarioName to heretto integration

84e3e1c

Set dep version

b5c31b7

Merge branch 'main' into heretto

ecfb4fc

coderabbitai bot reviewed Dec 16, 2025


promptless bot mentioned this pull request Dec 16, 2025

docs: Document Heretto CMS integration for DITA content testing doc-detective/doc-detective.github.io#141

Open

Update findScenario function and constants in Heretto integration tests

b0cedc5

- Change transtype options structure to a single value
- Pass scenario name to findScenario function calls
- Update constant name from SCENARIO_NAME to DEFAULT_SCENARIO_NAME

hawkeyexl requested a review from Copilot December 16, 2025 23:16

Copilot started reviewing on behalf of hawkeyexl December 16, 2025 23:16

Copilot AI reviewed Dec 16, 2025

Refactor scenario parameters in Heretto integration tests

877ed1b

- Change 'options' to 'value' for 'transtype' in multiple test cases
- Update test description to reflect that 'tool-kit-name' parameter can be missing
- Add 'scenarioName' to the context in relevant tests

coderabbitai bot reviewed Dec 17, 2025


hawkeyexl requested a review from Copilot December 17, 2025 03:35

Copilot started reviewing on behalf of hawkeyexl December 17, 2025 03:35

Copilot AI reviewed Dec 17, 2025

hawkeyexl merged commit 15c58e0 into main Dec 17, 2025
12 checks passed

hawkeyexl deleted the heretto branch December 17, 2025 14:07

This was referenced Jan 3, 2026

Implement Heretto CMS uploader and integration tests #100

Merged

Add comprehensive test coverage and edge case tests #103

Open

coderabbitai bot mentioned this pull request Jan 28, 2026

feat: migrate to TypeScript #110

Open
