Skip to content

Conversation

@dlee3
Copy link

@dlee3 dlee3 commented Aug 5, 2025

Fixes issues with removeBOM to properly handle UTF-8 BOM.

Problem

The current implementation of removeBOM() in utils.js has critical issues:

  1. It correctly detects the UTF-8 BOM (0xEF 0xBB 0xBF) but doesn't actually remove it
  2. trimStart() removes whitespace rather than removing the BOM bytes
  3. It creates unnecessary buffer copies with Buffer.from(chunk) even for inputs that are already Buffers
  4. It doesn't handle string inputs with BOM characters (Unicode BOM 0xFEFF)
  5. These issues can lead to improper file processing when files contain BOM markers, potentially causing validation errors and unexpected behavior in downstream processes.

Solution

The updated implementation:

  1. Uses proper type checking with Buffer.isBuffer() to avoid unnecessary conversions
  2. Correctly removes the UTF-8 BOM from buffer inputs using slice(3) which removes the BOM characters instead of whitespace
  3. Properly handles string inputs with BOM characters using charCodeAt(0) === 0xfeff
  4. Removes the whitespace trimming functionality, which was unrelated to BOM removal

Result

This ensures that files with BOMs are correctly processed by the validator, which prevents validation errors and inconsistencies in the processing pipeline.

Our use case is that we're streaming files from a S3 bucket to a Lambda to validate them. We found some specific files that would fail with the error below. But if we downloaded the file and used the CLI or online validator the file validated without issue. These updates fix this S3 streaming issue and allow the validator to function properly.

Test Plan

We noticed this issue with this MRF: https://smmcnj.com/wp-content/uploads/2024/12/1568825545_SaintMichaelsMedicalCenter_standardcharges.json

That MRF has been tested and verified. I initially made these updates our code base back in June. We've tested around 200 MRFs since then and have seen no issues.

@mint-thompson
Copy link
Collaborator

Thank you for opening this PR. I've confirmed that these changes don't seem to cause any problems when used with the CLI or online validator. I would like a little more information about your use case: you mentioned an error that would occur. What is the error message, and where in the process would that error occur?

@dlee3
Copy link
Author

dlee3 commented Aug 7, 2025

The file mentioned above fails with the message below. It looks like the process never really gets past trying to open the file before it fails. We're using this package directly in our code. We are not using the CLI or the online validator.

file:///node_modules/hpt-validator/lib/utils.js:45
        chunk = chunk.trimStart();
                      ^
TypeError: chunk.trimStart is not a function
    at removeBOM (file:///node_modules/hpt-validator/lib/utils.js:45:23)
    at ChecksumStream.<anonymous> (file:///node_modules/hpt-validator/lib/versions/common/json.js:29:24)
    at ChecksumStream.emit (node:events:518:28)
    at ChecksumStream.emit (node:domain:489:12)
    at ChecksumStream.Readable.read (node:internal/streams/readable:782:10)
    at flow (node:internal/streams/readable:1283:53)
    at resume_ (node:internal/streams/readable:1262:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

@mint-thompson
Copy link
Collaborator

This change looks good! Could you add some test cases to show the function handling a Buffer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants