Skip to content

integration tests#30

Merged
Rakshat28 merged 3 commits intomainfrom
feat/integration-testing
Feb 27, 2026
Merged

integration tests#30
Rakshat28 merged 3 commits intomainfrom
feat/integration-testing

Conversation

@Rakshat28
Copy link
Copy Markdown
Owner

test(#28): implement robust integration test suite and CI pipeline

  • Add integration test harness using assert_cmd and tempfile for safe $HOME sandboxing.
  • Implement comprehensive test cases: happy path, zero-byte files, deep nesting, and tiered sparse-hashing collisions.
  • Verify metadata integrity (mtime, permissions, xattrs) across the deduplication pipeline.
  • Add safety checks for --allow-unsafe-hardlinks and --paranoid bit-rot detection.
  • Add GitHub Actions workflow (ci.yml) to enforce formatting, linting, and sequential testing on ubuntu-latest.

Closes #28

Copilot AI review requested due to automatic review settings February 27, 2026 13:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive integration test suite and CI/CD pipeline for the bdstorage file deduplication tool. The changes establish automated testing infrastructure to ensure data safety and correctness across dedupe, restore, and scan operations. The PR includes 9 integration tests covering happy paths, edge cases (zero-byte files, deep nesting, sparse hashing collisions), metadata preservation, safety features (paranoid mode, dry-run), and hardlink fallback behavior. A GitHub Actions workflow enforces code quality through formatting checks, linting, testing, and coverage reporting.

Changes:

  • Added comprehensive integration test suite using assert_cmd and tempfile for isolated HOME directory sandboxing
  • Implemented GitHub Actions CI pipeline with formatting, linting, testing (sequential with --test-threads=1), and code coverage
  • Applied rustfmt formatting improvements across source files (line breaking, import ordering)
  • Added dev-dependencies: assert_cmd, tempfile, predicates, walkdir

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tests/integration_tests.rs New integration test suite with 9 tests covering deduplication, restoration, metadata integrity, dry-run mode, paranoid mode, and edge cases
.github/workflows/ci.yml CI pipeline configuration with test, format, lint, and coverage jobs running on ubuntu-latest
Cargo.toml Added dev-dependencies for integration testing framework
Cargo.lock Dependency lockfile updates for new dev-dependencies
src/main.rs Formatting improvements: multi-line formatting for long function calls and string literals
src/state.rs Formatting improvements: line breaking for method chains
src/vault.rs Import reordering (alphabetical) and multi-line formatting for conditionals
src/dedupe.rs Import reordering and formatting improvements for function signatures and method chains
src/hasher.rs Formatting improvements for method chains

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1,346 @@
use assert_cmd::Command;
use std::fs;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration tests import Unix-specific APIs (std::os::unix::fs) without platform guards. While the PR description mentions testing on ubuntu-latest, the code should either have cfg attributes or documentation noting these tests are Unix-only. This is consistent with the main codebase which is Unix-only, but should be explicit in the test file.

Copilot uses AI. Check for mistakes.
Comment on lines +341 to +342
let vault = home.join(".imprint").join("store");
assert!(
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dry-run test verifies that the vault is not created, but doesn't check if the state database is created. In true dry-run mode, the state database at ~/.imprint/state.redb should also not be created or modified. Consider adding an assertion to verify the entire ~/.imprint directory doesn't exist after a dry-run to ensure complete isolation.

Suggested change
let vault = home.join(".imprint").join("store");
assert!(
let imprint_dir = home.join(".imprint");
assert!(
!imprint_dir.exists(),
".imprint directory should not be created in dry-run mode"
);
let vault = imprint_dir.join("store");
assert!(

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +35
uses: actions/cache@v3
with:
path: ~/.cargo/registry
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

- name: Cache cargo index
uses: actions/cache@v3
with:
path: ~/.cargo/git
key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}

- name: Cache cargo build
uses: actions/cache@v3
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow uses actions/cache@v3 while actions/checkout@v4 is used. Consider updating to actions/cache@v4 for consistency and to benefit from the latest cache improvements and bug fixes. GitHub Actions v4 is the current major version as of the knowledge cutoff.

Suggested change
uses: actions/cache@v3
with:
path: ~/.cargo/registry
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo index
uses: actions/cache@v3
with:
path: ~/.cargo/git
key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo build
uses: actions/cache@v3
uses: actions/cache@v4
with:
path: ~/.cargo/registry
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo index
uses: actions/cache@v4
with:
path: ~/.cargo/git
key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo build
uses: actions/cache@v4

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +243
assert_eq!(
file1_meta.ino(),
file2_meta.ino(),
"Hardlinked files should have the same inode"
);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_hardlink_fallback test assumes that reflinks will not be available and hardlinks will be used. However, on systems that support reflinks (like modern Linux filesystems with CoW support), this test will fail because the code will create reflinks (different inodes) instead of hardlinks (same inode). The test should either check which type of link was created, or ensure reflinks are unavailable for this specific test scenario.

Suggested change
assert_eq!(
file1_meta.ino(),
file2_meta.ino(),
"Hardlinked files should have the same inode"
);
if file1_meta.ino() == file2_meta.ino() {
// On filesystems without reflink support (or when hardlinks are used),
// deduplication should result in hardlinked files with the same inode.
assert_eq!(
file1_meta.ino(),
file2_meta.ino(),
"Hardlinked files should have the same inode"
);
} else {
// On filesystems with reflink (CoW) support, the implementation may create
// reflinks instead of hardlinks, which results in different inodes while
// still sharing storage. In that case, we at least verify that the
// deduplicated files have identical contents.
let file1_content =
fs::read(&target.join("file1.txt")).expect("Failed to read file1 contents");
let file2_content =
fs::read(&target.join("file2.txt")).expect("Failed to read file2 contents");
assert_eq!(
file1_content, file2_content,
"Deduplicated files should have identical contents regardless of link type"
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +39
- name: Cache cargo registry
uses: actions/cache@v3
with:
path: ~/.cargo/registry
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

- name: Cache cargo index
uses: actions/cache@v3
with:
path: ~/.cargo/git
key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}

- name: Cache cargo build
uses: actions/cache@v3
with:
path: target
key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache paths use ~/.cargo which may not be portable across different runner configurations. Consider using Swatinem/rust-cache action instead, which handles Rust-specific caching more robustly and includes both ~/.cargo and target directories automatically with better cache key management.

Suggested change
- name: Cache cargo registry
uses: actions/cache@v3
with:
path: ~/.cargo/registry
key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo index
uses: actions/cache@v3
with:
path: ~/.cargo/git
key: ${{ runner.os }}-cargo-git-${{ hashFiles('**/Cargo.lock') }}
- name: Cache cargo build
uses: actions/cache@v3
with:
path: target
key: ${{ runner.os }}-cargo-build-target-${{ hashFiles('**/Cargo.lock') }}
- name: Cache Rust build
uses: Swatinem/rust-cache@v2

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +91
let mut restore_cmd = run_cmd(home, &["restore", &target.to_string_lossy()]);
restore_cmd.assert().success();

let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.collect();

assert!(
vault_files.is_empty(),
"Vault should be empty after restore and GC"
);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restore test verifies the vault is empty after restore but doesn't verify that the restored files have the correct content. The test should verify that at least one of the deduplicated files can be read and contains the expected content after restoration to ensure the restore operation actually works correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +159 to +170
if vault.exists() {
let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.collect();

assert!(
vault_files.len() >= 1,
"Vault should contain master files after dedupe of large files"
);
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test creates two 15KB files with a single byte difference at position 7KB. This tests the sparse hashing logic, but the assertion only checks if the vault exists and has at least 1 file. The test should verify that exactly 2 vault files exist (one for each unique content) to properly validate that the sparse hashing correctly distinguished between the two files.

Suggested change
if vault.exists() {
let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.collect();
assert!(
vault_files.len() >= 1,
"Vault should contain master files after dedupe of large files"
);
}
assert!(
vault.exists(),
"Vault store directory should exist after dedupe of large files"
);
let vault_files: Vec<_> = walkdir::WalkDir::new(&vault)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.collect();
assert_eq!(
vault_files.len(),
2,
"Vault should contain exactly 2 master files (one per unique large file content)"
);

Copilot uses AI. Check for mistakes.
run: cargo tarpaulin --out Xml --timeout 300 -- --test-threads=1

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow uses codecov/codecov-action@v3, but v4 is available and recommended as of late 2023. Consider updating to v4 which includes improvements to upload reliability and tokenless uploads for public repositories. Note that v4 may require explicit token configuration for private repositories.

Suggested change
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4

Copilot uses AI. Check for mistakes.
Comment on lines +269 to +270
file_content.push(0xFF);
fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paranoid mode test corrupts a vault file by appending data, but this changes the file size which may be detected before the hash comparison. The test should verify that paranoid mode actually performs byte-by-byte comparison. Consider corrupting a byte in the middle of the file instead of appending to better test the paranoid verification logic.

Suggested change
file_content.push(0xFF);
fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
if !file_content.is_empty() {
let idx = file_content.len() / 2;
file_content[idx] ^= 0xFF;
fs::write(&vault_path, file_content).expect("Failed to corrupt vault file");
}

Copilot uses AI. Check for mistakes.
Comment on lines +209 to +210
if let Ok(Some(attr_val)) = xattr::get(&dup_path, "user.test_attr") {
assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xattr assertion is wrapped in an if let Ok(Some(...)) which silently passes the test if xattrs are not supported or fail to be retrieved. This means the test could pass even if xattr preservation is completely broken. Consider using expect() or at minimum logging a warning when xattrs cannot be verified, or skip the xattr assertion on systems where user xattrs are not supported.

Suggested change
if let Ok(Some(attr_val)) = xattr::get(&dup_path, "user.test_attr") {
assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
match xattr::get(&dup_path, "user.test_attr") {
Ok(Some(attr_val)) => {
assert_eq!(attr_val, b"test_value", "Extended attribute value should match");
}
Ok(None) => {
panic!("Extended attribute 'user.test_attr' not found on duplicate file");
}
Err(e) => {
panic!(
"Failed to retrieve extended attribute 'user.test_attr' on duplicate file: {e}"
);
}

Copilot uses AI. Check for mistakes.
@Rakshat28 Rakshat28 merged commit 7021598 into main Feb 27, 2026
2 checks passed
@Rakshat28 Rakshat28 deleted the feat/integration-testing branch March 11, 2026 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement robust integration test suite and CI pipeline (Trust Building)

2 participants