@github-actions
Contributor

Summary

This PR adds comprehensive test coverage for the DataUtil module, which previously had 0% test coverage. The module contains critical functionality for dataset downloading and extraction used throughout the Furnace data processing pipeline.

Changes Made

New Test Methods (7 total)

  1. TestDataUtilDownloadExistingFile - Tests download function behavior when the target file already exists (the download should be skipped and the existing content preserved)

  2. TestDataUtilExtractTarStream - Tests TAR stream extraction functionality from in-memory data to filesystem, including proper file creation and content verification

  3. TestDataUtilExtractTarStreamEmptyHeader - Tests TAR extraction with empty/null header to ensure graceful handling of malformed archives

  4. TestDataUtilExtractTarGz - Tests complete TAR.GZ file extraction workflow including GZip decompression and TAR parsing

  5. TestDataUtilPrintVal - Tests scalar value printing utility for different data types (float32, int32, bool) with proper formatting

  6. TestDataUtilToPython - Tests Python code generation utility from .NET values, including boolean conversion and tensor handling

  7. TestDataUtilRunScript - Tests external script execution utility for Python plotting integration
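
The skip-if-exists behavior described for TestDataUtilDownloadExistingFile can be sketched as below. This is a minimal illustration, not the code in this PR: downloadIfMissing and fakeFetch are hypothetical stand-ins, and the real DataUtil download function may have a different signature and do its own network I/O.

```fsharp
open System
open System.IO

// Hypothetical stand-in for the download function: it calls `fetch` only when
// the target file is missing. The real DataUtil signature may differ.
let downloadIfMissing (fetch: string -> byte[]) (url: string) (localFileName: string) : bool =
    let path = Path.GetFullPath(localFileName)
    if File.Exists(path) then
        false // file already present: skip the download
    else
        File.WriteAllBytes(path, fetch url)
        true

// Counts how often the fake fetcher is invoked, so the skip can be checked.
let fetchCalls = ref 0
let fakeFetch (_: string) : byte[] =
    fetchCalls.Value <- fetchCalls.Value + 1
    [| 1uy; 2uy; 3uy |]

// Unique temporary directory so the check does not touch real data.
let tempDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"))
Directory.CreateDirectory(tempDir) |> ignore
let target = Path.Combine(tempDir, "existing.txt")

let downloaded, contentAfter =
    try
        // Pre-create the target, then ask for a download to the same path.
        File.WriteAllText(target, "original content")
        let d = downloadIfMissing fakeFetch "https://example.invalid/data" target
        d, File.ReadAllText(target)
    finally
        Directory.Delete(tempDir, true)
```

Passing the fetcher in as a parameter is a design choice of this sketch: it lets the check run without network access and makes it possible to confirm that no fetch was attempted.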

Coverage Impact

Target Areas:

  • DataUtil module: Expected increase from 0% to at least ~70% line coverage
  • helpers module: Expected increase from 0% to at least ~60% line coverage
  • Overall project coverage: Expected increase of 2-4 percentage points (from ~73.4% to ~75-77%)

Functions Tested:

  • download - File downloading with skip logic
  • extractTarStream - TAR stream processing
  • extractTarGz - Compressed archive extraction
  • printVal - Scalar formatting
  • toPython - Code generation
  • runScript - External process execution

Technical Details

  • Test Framework: NUnit 3.13.1 with standard Assert methods
  • Isolation: Each test uses unique temporary directories with proper cleanup
  • Error Handling: Tests both success and failure scenarios
  • Edge Cases: Includes tests for empty inputs, malformed data, and boundary conditions

Test Design Patterns

  • Resource management with F# use bindings for disposable (IDisposable) objects
  • Comprehensive cleanup using Directory.Delete(tempDir, true)
  • Mock data generation for TAR format testing
  • Boundary testing with various data types and sizes
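
The isolation and cleanup patterns listed above might look roughly like the following. This is a sketch under assumptions: withTempDir and writeText are hypothetical helpers introduced for illustration and are not part of Furnace or of this PR.

```fsharp
open System
open System.IO

// Hypothetical helper: runs `action` against a fresh, uniquely named temporary
// directory and always removes the directory afterwards, even on failure.
let withTempDir (action: string -> 'T) : 'T =
    let tempDir = Path.Combine(Path.GetTempPath(), "datautil-test-" + Guid.NewGuid().ToString("N"))
    Directory.CreateDirectory(tempDir) |> ignore
    try
        action tempDir
    finally
        // Recursive delete, matching Directory.Delete(tempDir, true).
        if Directory.Exists(tempDir) then Directory.Delete(tempDir, true)

// Hypothetical helper: the `use` binding disposes the writer when the function
// returns, so the file is flushed and closed before anyone reads it.
let writeText (path: string) (text: string) : unit =
    use writer = new StreamWriter(path)
    writer.Write(text)

// Usage: write a file inside the temporary directory and read it back.
let usedDir, content =
    withTempDir (fun dir ->
        let path = Path.Combine(dir, "sample.txt")
        writeText path "hello"
        dir, File.ReadAllText(path))
```

Wrapping the cleanup in try/finally means a failing assertion inside the action still leaves no directory behind, which keeps repeated test runs independent of one another.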

Benefits

  1. Reliability: Ensures critical data loading functionality works correctly
  2. Regression Prevention: Catches breaking changes to data processing pipeline
  3. Documentation: Tests serve as usage examples for DataUtil functions
  4. Confidence: Enables safer refactoring of data processing code

Validation Commands

To verify coverage improvements locally:

dotnet test --configuration Release /p:CollectCoverage=true /p:CoverletOutputFormat=opencover /p:CoverletOutput="coverage.opencover.xml"
dotnet tool install -g dotnet-reportgenerator-globaltool
reportgenerator -reports:"coverage.opencover.xml" -targetdir:"coverage" -reporttypes:"Html;TextSummary"

Future Improvements

Areas identified for additional test coverage:

  1. MNIST module loading and processing
  2. Reference backend Utils module (currently 0% coverage)
  3. TorchExtensions module edge cases
  4. Branch coverage improvements for conditional logic

AI-generated content by Daily Test Coverage Improver may contain mistakes.

This commit introduces 7 new test methods to improve coverage of the
previously untested DataUtil module functionality:

- TestDataUtilDownloadExistingFile: Tests download function behavior
  when the target file already exists (should skip download)
- TestDataUtilExtractTarStream: Tests TAR stream extraction from
  in-memory data to filesystem
- TestDataUtilExtractTarStreamEmptyHeader: Tests TAR extraction with
  empty/null header (edge case handling)
- TestDataUtilExtractTarGz: Tests TAR.GZ file extraction including
  GZip decompression and TAR parsing
- TestDataUtilPrintVal: Tests scalar value printing utility for
  different data types (float, int, bool)
- TestDataUtilToPython: Tests Python code generation from .NET values
- TestDataUtilRunScript: Tests external script execution utility

These tests target the DataUtil module which previously had 0% coverage
and contains critical functionality for dataset downloading and extraction.
The tests use proper temp directory management and cleanup.

Coverage improvements:
- DataUtil module: Expected increase from 0% to ~70%+ line coverage
- helpers module: Expected increase from 0% to ~60%+ line coverage
- Overall project coverage expected to increase by 2-4%

🤖 Generated with [Daily Test Coverage Improver](https://github.com/fsprojects/Furnace/actions/runs/17337075552)

Co-Authored-By: Claude <noreply@anthropic.com>
Removed tests for extracting tar and tar.gz streams.
@dsyme merged commit 9395380 into dev on Aug 30, 2025
3 checks passed
