Conversation
Pull request overview
This PR implements a comprehensive integration test suite and CI/CD pipeline for the bdstorage file deduplication tool. The changes establish automated testing infrastructure to ensure data safety and correctness across dedupe, restore, and scan operations. The PR includes 9 integration tests covering happy paths, edge cases (zero-byte files, deep nesting, sparse hashing collisions), metadata preservation, safety features (paranoid mode, dry-run), and hardlink fallback behavior. A GitHub Actions workflow enforces code quality through formatting checks, linting, testing, and coverage reporting.
Changes:
- Added comprehensive integration test suite using assert_cmd and tempfile for isolated HOME directory sandboxing
- Implemented GitHub Actions CI pipeline with formatting, linting, testing (sequential with --test-threads=1), and code coverage
- Applied rustfmt formatting improvements across source files (line breaking, import ordering)
- Added dev-dependencies: assert_cmd, tempfile, predicates, walkdir
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/integration_tests.rs | New integration test suite with 9 tests covering deduplication, restoration, metadata integrity, dry-run mode, paranoid mode, and edge cases |
| .github/workflows/ci.yml | CI pipeline configuration with test, format, lint, and coverage jobs running on ubuntu-latest |
| Cargo.toml | Added dev-dependencies for integration testing framework |
| Cargo.lock | Dependency lockfile updates for new dev-dependencies |
| src/main.rs | Formatting improvements: multi-line formatting for long function calls and string literals |
| src/state.rs | Formatting improvements: line breaking for method chains |
| src/vault.rs | Import reordering (alphabetical) and multi-line formatting for conditionals |
| src/dedupe.rs | Import reordering and formatting improvements for function signatures and method chains |
| src/hasher.rs | Formatting improvements for method chains |
```diff
@@ -0,0 +1,346 @@
+use assert_cmd::Command;
+use std::fs;
+use std::os::unix::fs::{MetadataExt, PermissionsExt};
```
The integration tests import Unix-specific APIs (std::os::unix::fs) without platform guards. While the PR description mentions testing on ubuntu-latest, the code should either have cfg attributes or documentation noting these tests are Unix-only. This is consistent with the main codebase which is Unix-only, but should be explicit in the test file.
tests/integration_tests.rs (Outdated)
```rust
let vault = home.join(".imprint").join("store");
assert!(
```
The dry-run test verifies that the vault is not created, but doesn't check if the state database is created. In true dry-run mode, the state database at ~/.imprint/state.redb should also not be created or modified. Consider adding an assertion to verify the entire ~/.imprint directory doesn't exist after a dry-run to ensure complete isolation.
Suggested change:

```diff
-let vault = home.join(".imprint").join("store");
-assert!(
+let imprint_dir = home.join(".imprint");
+assert!(
+    !imprint_dir.exists(),
+    ".imprint directory should not be created in dry-run mode"
+);
+let vault = imprint_dir.join("store");
+assert!(
```
.github/workflows/ci.yml (Outdated)
```yaml
        uses: actions/cache@v3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo index
        uses: actions/cache@v3
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo build
        uses: actions/cache@v3
```
The workflow uses actions/cache@v3 alongside actions/checkout@v4. Consider updating to actions/cache@v4 for consistency and to pick up the latest cache improvements and bug fixes; v4 is the current major version of the cache action.
Suggested change:

```diff
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: ~/.cargo/registry
           key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
       - name: Cache cargo index
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: ~/.cargo/git
           key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}
       - name: Cache cargo build
-        uses: actions/cache@v3
+        uses: actions/cache@v4
```
tests/integration_tests.rs (Outdated)
```rust
assert_eq!(
    file1_meta.ino(),
    file2_meta.ino(),
    "Hardlinked files should have the same inode"
);
```
The test_hardlink_fallback test assumes that reflinks will not be available and hardlinks will be used. However, on systems that support reflinks (like modern Linux filesystems with CoW support), this test will fail because the code will create reflinks (different inodes) instead of hardlinks (same inode). The test should either check which type of link was created, or ensure reflinks are unavailable for this specific test scenario.
Suggested change:

```diff
-assert_eq!(
-    file1_meta.ino(),
-    file2_meta.ino(),
-    "Hardlinked files should have the same inode"
-);
+if file1_meta.ino() == file2_meta.ino() {
+    // On filesystems without reflink support (or when hardlinks are used),
+    // deduplication should result in hardlinked files with the same inode.
+    assert_eq!(
+        file1_meta.ino(),
+        file2_meta.ino(),
+        "Hardlinked files should have the same inode"
+    );
+} else {
+    // On filesystems with reflink (CoW) support, the implementation may create
+    // reflinks instead of hardlinks, which results in different inodes while
+    // still sharing storage. In that case, we at least verify that the
+    // deduplicated files have identical contents.
+    let file1_content =
+        fs::read(target.join("file1.txt")).expect("Failed to read file1 contents");
+    let file2_content =
+        fs::read(target.join("file2.txt")).expect("Failed to read file2 contents");
+    assert_eq!(
+        file1_content, file2_content,
+        "Deduplicated files should have identical contents regardless of link type"
+    );
+}
```
.github/workflows/ci.yml (Outdated)
```yaml
      - name: Cache cargo registry
        uses: actions/cache@v3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo index
        uses: actions/cache@v3
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}

      - name: Cache cargo build
        uses: actions/cache@v3
        with:
          path: target
          key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}
```
The cache paths use ~/.cargo which may not be portable across different runner configurations. Consider using Swatinem/rust-cache action instead, which handles Rust-specific caching more robustly and includes both ~/.cargo and target directories automatically with better cache key management.
Suggested change:

```diff
-      - name: Cache cargo registry
-        uses: actions/cache@v3
-        with:
-          path: ~/.cargo/registry
-          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
-      - name: Cache cargo index
-        uses: actions/cache@v3
-        with:
-          path: ~/.cargo/git
-          key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}
-      - name: Cache cargo build
-        uses: actions/cache@v3
-        with:
-          path: target
-          key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}
+      - name: Cache Rust build
+        uses: Swatinem/rust-cache@v2
```
```rust
let mut restore_cmd = run_cmd(home, &["restore", &target.to_string_lossy()]);
restore_cmd.assert().success();

let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.file_type().is_file())
    .collect();

assert!(
    vault_files.is_empty(),
    "Vault should be empty after restore and GC"
);
```
The restore test verifies the vault is empty after restore but doesn't verify that the restored files have the correct content. The test should verify that at least one of the deduplicated files can be read and contains the expected content after restoration to ensure the restore operation actually works correctly.
tests/integration_tests.rs (Outdated)
```rust
if vault.exists() {
    let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
        .into_iter()
        .filter_map(|e| e.ok())
        .filter(|e| e.file_type().is_file())
        .collect();

    assert!(
        vault_files.len() >= 1,
        "Vault should contain master files after dedupe of large files"
    );
}
```
The test creates two 15KB files with a single byte difference at position 7KB. This tests the sparse hashing logic, but the assertion only checks if the vault exists and has at least 1 file. The test should verify that exactly 2 vault files exist (one for each unique content) to properly validate that the sparse hashing correctly distinguished between the two files.
Suggested change:

```diff
-if vault.exists() {
-    let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
-        .into_iter()
-        .filter_map(|e| e.ok())
-        .filter(|e| e.file_type().is_file())
-        .collect();
-
-    assert!(
-        vault_files.len() >= 1,
-        "Vault should contain master files after dedupe of large files"
-    );
-}
+assert!(
+    vault.exists(),
+    "Vault store directory should exist after dedupe of large files"
+);
+let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
+    .into_iter()
+    .filter_map(|e| e.ok())
+    .filter(|e| e.file_type().is_file())
+    .collect();
+assert_eq!(
+    vault_files.len(),
+    2,
+    "Vault should contain exactly 2 master files (one per unique large file content)"
+);
```
```yaml
        run: cargo tarpaulin --out Xml --timeout 300 -- --test-threads=1

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
```
The workflow uses codecov/codecov-action@v3, but v4 is available and recommended as of late 2023. Consider updating to v4 which includes improvements to upload reliability and tokenless uploads for public repositories. Note that v4 may require explicit token configuration for private repositories.
Suggested change:

```diff
-        uses: codecov/codecov-action@v3
+        uses: codecov/codecov-action@v4
```
tests/integration_tests.rs (Outdated)
```rust
file_content.push(0xFF);
fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
```
The paranoid mode test corrupts a vault file by appending data, but this changes the file size which may be detected before the hash comparison. The test should verify that paranoid mode actually performs byte-by-byte comparison. Consider corrupting a byte in the middle of the file instead of appending to better test the paranoid verification logic.
Suggested change:

```diff
-file_content.push(0xFF);
-fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
+if !file_content.is_empty() {
+    let idx = file_content.len() / 2;
+    file_content[idx] ^= 0xFF;
+    fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
+}
```
tests/integration_tests.rs (Outdated)
```rust
if let Ok(Some(attr_val)) = xattr::get(&dup_path, "user.test_attr") {
    assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
```
The xattr assertion is wrapped in an if let Ok(Some(...)) which silently passes the test if xattrs are not supported or fail to be retrieved. This means the test could pass even if xattr preservation is completely broken. Consider using expect() or at minimum logging a warning when xattrs cannot be verified, or skip the xattr assertion on systems where user xattrs are not supported.
Suggested change:

```diff
-if let Ok(Some(attr_val)) = xattr::get(&dup_path, "user.test_attr") {
-    assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
+match xattr::get(&dup_path, "user.test_attr") {
+    Ok(Some(attr_val)) => {
+        assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
+    }
+    Ok(None) => {
+        panic!("Extended attribute 'user.test_attr' not found on duplicate file");
+    }
+    Err(e) => {
+        panic!(
+            "Failed to retrieve extended attribute 'user.test_attr' on duplicate file: {e}"
+        );
+    }
+}
```
test(#28): implement robust integration test suite and CI pipeline

- Integration tests use `assert_cmd` and `tempfile` for safe `$HOME` sandboxing.
- Coverage includes `--allow-unsafe-hardlinks` and `--paranoid` bit-rot detection.
- GitHub Actions workflow (`ci.yml`) enforces formatting, linting, and sequential testing on `ubuntu-latest`.

Closes #28