[BUG] [0.1.0] reconstruct_new_content and apply_to_content aggressively normalize file line endings to LF (\n), destroying CRLF (\r\n) #53305

@sp643541-bit

Project

cortex

Description

The patch routines in diff.rs (apply_to_content and reconstruct_new_content) join parsed lines back together with a hardcoded \n. This ignores Windows-style line endings (\r\n). When a file that uses CRLF is processed by the diff patching engine, every line, unmodified and added alike, comes out with an LF ending. The file's bytes therefore change far beyond what the patch itself touches, and in cross-platform repositories version control systems such as Git report a large, unintended line-ending diff.

Code Context

File: /home/im8/bounty-challenge/cortex/src/cortex-engine/src/diff.rs

    // diff.rs, lines 265-276 (excerpt)
    /// Reconstruct new file content from additions.
    fn reconstruct_new_content(&self) -> String {
    ...
            .collect::<Vec<_>>()
            .join("\n") // ← hardcoded `\n`; the source's original line endings are lost
    }
    ...
    // diff.rs, lines 278-315 (excerpt)
    /// Apply hunks to existing content.
    fn apply_to_content(&self, content: &str) -> Result<String> {
        let mut lines: Vec<String> = content
            .lines() // ← `str::lines` strips the trailing `\n` or `\r\n` from each line
    ...
        Ok(lines.join("\n")) // ← hardcoded `\n`; every CRLF ending is lost
    }

System Information

Linux

Screenshots

https://github.com/sp643541-bit/images/blob/main/Proofis.png

Steps to Reproduce

  1. Start a Cortex task that modifies an existing source file whose line endings are all \r\n (CRLF).
  2. Have Cortex apply a small, localized patch through its built-in patch mechanism.
  3. Inspect the output. Every \r\n in the patched file has become \n, regardless of the host platform or the file's original line endings.

Expected Behavior

Before modifying content, the patching engine should detect which line ending the source file uses. If \r\n is the predominant terminator, the engine should write \r\n back consistently when it re-joins the lines.
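One way to read the expected behavior is sketched below. It is a minimal illustration and not the code in diff.rs: the `LineEnding` enum and the `detect_line_ending` and `join_preserving` helpers are hypothetical names introduced here, and "predominant" is assumed to mean a simple count of \r\n against bare \n, with ties falling back to LF.

```rust
/// Line-ending styles this sketch distinguishes (hypothetical type, not from diff.rs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LineEnding {
    Lf,
    CrLf,
}

impl LineEnding {
    fn as_str(self) -> &'static str {
        match self {
            LineEnding::Lf => "\n",
            LineEnding::CrLf => "\r\n",
        }
    }
}

/// Pick the predominant ending by counting `\r\n` against bare `\n`.
/// Ties and content without any newline fall back to LF (an assumption).
fn detect_line_ending(content: &str) -> LineEnding {
    let crlf = content.matches("\r\n").count();
    // Every `\r\n` contains one `\n`, so this subtraction cannot underflow.
    let bare_lf = content.matches('\n').count() - crlf;
    if crlf > bare_lf {
        LineEnding::CrLf
    } else {
        LineEnding::Lf
    }
}

/// Re-join lines with the ending detected in `original`, and restore the
/// trailing terminator if the original content had one.
fn join_preserving(lines: &[String], original: &str) -> String {
    let eol = detect_line_ending(original).as_str();
    let mut out = lines.join(eol);
    if original.ends_with('\n') && !out.is_empty() {
        out.push_str(eol);
    }
    out
}

fn main() {
    let original = "line1\r\nline2\r\nline3\r\n";
    // Split as the issue describes, edit one line, then re-join.
    let mut lines: Vec<String> = original.lines().map(String::from).collect();
    lines[1] = "changed".to_string();
    let patched = join_preserving(&lines, original);
    assert_eq!(patched, "line1\r\nchanged\r\nline3\r\n");
}
```

Detecting once per file keeps the change small, since only the final join needs to know the ending. The trade-off is that a file with mixed endings is normalized to whichever style is more common.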

Actual Behavior

The patching routines convert the entire file to LF endings without inspecting the original bytes, so every \r (CR) in the file is lost. In the output below, the final line terminator is dropped as well.

Verified output from repro_bug38.rs:

Original bytes: [108, 105, 110, 101, 49, 13, 10, 108, 105, 110, 101, 50, 13, 10, 108, 105, 110, 101, 51, 13, 10]
Reconstructed bytes: [108, 105, 110, 101, 49, 10, 108, 105, 110, 101, 50, 10, 108, 105, 110, 101, 51]
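The source of repro_bug38.rs is not included in this report. The byte arrays above are consistent with the following minimal sketch, which assumes the input `line1\r\nline2\r\nline3\r\n` and mimics only the split-and-join step highlighted in the Code Context section. The function name `reconstruct_with_lf` is hypothetical.

```rust
/// Mimics the step flagged in the issue: split with `str::lines`,
/// then re-join with a hardcoded "\n".
fn reconstruct_with_lf(content: &str) -> String {
    content.lines().collect::<Vec<_>>().join("\n")
}

fn main() {
    let original = "line1\r\nline2\r\nline3\r\n";
    let reconstructed = reconstruct_with_lf(original);

    println!("Original bytes: {:?}", original.as_bytes());
    println!("Reconstructed bytes: {:?}", reconstructed.as_bytes());

    // Every CR (13) is gone, and so is the final line terminator.
    assert_eq!(reconstructed, "line1\nline2\nline3");
}
```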

Additional Context

If the IDE uses the Cortex engine to apply code edits or patches for developers on Windows, or in cross-platform repositories with `core.autocrlf false`, a small targeted edit rewrites the line ending of every unmodified line. VS Code and other source control tools then report the whole file as changed even though only a single line was modified.
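To illustrate what a one-line edit could look like when unmodified lines are left untouched, here is a hedged sketch of a different technique from the one currently in diff.rs: splitting with `str::split_inclusive` so that each line keeps its own terminator. The helper `replace_line_preserving_endings` is hypothetical and handles only a single-line replacement, not full hunk application.

```rust
/// Replace one line (0-based index) while leaving every other line
/// byte-for-byte intact. Hypothetical helper for illustration only.
fn replace_line_preserving_endings(
    content: &str,
    index: usize,
    new_text: &str,
) -> Option<String> {
    // `split_inclusive` keeps each line's own terminator attached to it.
    let mut lines: Vec<String> = content
        .split_inclusive('\n')
        .map(String::from)
        .collect();

    // Reuse whatever terminator the replaced line carried.
    let old = lines.get(index)?;
    let eol = if old.ends_with("\r\n") {
        "\r\n"
    } else if old.ends_with('\n') {
        "\n"
    } else {
        ""
    };

    lines[index] = format!("{}{}", new_text, eol);
    Some(lines.concat())
}

fn main() {
    let original = "line1\r\nline2\r\nline3\r\n";
    let patched = replace_line_preserving_endings(original, 1, "changed")
        .expect("index 1 exists");
    assert_eq!(patched, "line1\r\nchanged\r\nline3\r\n");

    // Only the edited line differs between the two versions.
    let differing = original
        .split_inclusive('\n')
        .zip(patched.split_inclusive('\n'))
        .filter(|(before, after)| before != after)
        .count();
    assert_eq!(differing, 1);
}
```

Because terminators travel with their lines, this approach also leaves files with mixed endings unchanged outside the edited line.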
