Subset command and tests #1
…ant file handling; add new read and info modules for axis length and categorical column decoding
…egorical_column and col_chunk_as_strings
…t results summary
…henated version tags
Pull request overview
This PR adds major new functionality to the h5ad CLI tool, transforming it from a basic inspection utility into a full-featured file manipulation toolkit with comprehensive testing and CI/CD infrastructure.
Changes:
- Implemented a new `subset` command that filters `.h5ad` files by observation (cell) and variable (gene) names, with support for both dense and sparse matrix formats
- Refactored the existing `info` and `table` commands into separate modules with improved functionality
- Added a test suite of 58 tests covering all major functionality
- Established CI/CD pipelines with GitHub Actions for automated testing and Docker image publishing
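To make the sparse half of the subsetting idea concrete, here is a minimal sketch of selecting rows from a CSR-encoded matrix by observation name. It is not the code from `src/h5ad/commands/subset.py`: the helper names `names_to_positions` and `subset_csr_rows` are hypothetical, and plain Python lists stand in for the HDF5 datasets (`data`, `indices`, `indptr`) that a real implementation would read in chunks.

```python
from typing import List, Sequence, Tuple


def names_to_positions(all_names: Sequence[str], wanted: Sequence[str]) -> List[int]:
    """Map requested names to row positions, preserving the file's original order."""
    wanted_set = set(wanted)
    return [i for i, name in enumerate(all_names) if name in wanted_set]


def subset_csr_rows(
    data: Sequence[float],
    indices: Sequence[int],
    indptr: Sequence[int],
    keep_rows: Sequence[int],
) -> Tuple[List[float], List[int], List[int]]:
    """Return (data, indices, indptr) containing only the rows in keep_rows."""
    new_data: List[float] = []
    new_indices: List[int] = []
    new_indptr: List[int] = [0]
    for row in keep_rows:
        # In CSR, the non-zeros of row r live in the slice indptr[r]:indptr[r + 1].
        start, end = indptr[row], indptr[row + 1]
        new_data.extend(data[start:end])
        new_indices.extend(indices[start:end])
        # Each new row pointer is the running count of non-zeros copied so far.
        new_indptr.append(len(new_data))
    return new_data, new_indices, new_indptr


# A 3 x 3 matrix: row 0 = [1, 0, 2], row 1 = [0, 0, 0], row 2 = [0, 3, 0]
data, indices, indptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 2, 3]
obs_names = ["cell_a", "cell_b", "cell_c"]

rows = names_to_positions(obs_names, ["cell_c", "cell_b"])
sub_data, sub_indices, sub_indptr = subset_csr_rows(data, indices, indptr, rows)
```

Only the slices belonging to the kept rows are touched, which is why a row subset of a CSR matrix can be processed in chunks without loading the whole matrix. Filtering by variable (column) names on CSR data would additionally require remapping `indices`, which this sketch does not cover.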
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/h5ad/commands/subset.py | Core subsetting functionality with support for dense/sparse matrices and chunked processing |
| src/h5ad/commands/info.py | Refactored info command into dedicated module |
| src/h5ad/commands/table.py | Refactored table export command with improved validation |
| src/h5ad/info.py | Shared utility functions for reading axis information |
| src/h5ad/read.py | Shared utility functions for decoding strings and reading categorical data |
| src/h5ad/cli.py | Simplified CLI with commands delegated to separate modules |
| tests/conftest.py | Test fixtures for creating various h5ad file types |
| tests/test_subset.py | Comprehensive tests for subsetting functionality |
| tests/test_info_read.py | Tests for info and read utility functions |
| tests/test_cli.py | CLI integration tests |
| pyproject.toml | Added dev dependencies and coverage configuration |
| pytest.ini | Pytest configuration for test discovery |
| .github/workflows/tests.yml | CI workflow for automated testing with coverage |
| .github/workflows/quay-on-tag.yml | Docker build and publish workflow |
| docs/TESTING.md | Testing documentation |
| README.md | Enhanced documentation with new features |
| Dockerfile | Updated with csvkit installation |
| uv.lock | Dependency updates for new packages |
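The table lists `src/h5ad/read.py` as the home of string decoding and categorical reading. As a rough illustration of what categorical decoding involves, the sketch below turns integer codes plus a category list back into strings. The function name `decode_codes` and the `missing` parameter are hypothetical and not taken from the PR. The sketch assumes the AnnData on-disk convention that a categorical column is stored as `codes` and `categories`, with a code of `-1` marking a missing value.

```python
from typing import List, Sequence, Union


def decode_codes(
    codes: Sequence[int],
    categories: Sequence[Union[bytes, str]],
    missing: str = "",
) -> List[str]:
    """Decode categorical codes into strings; negative codes become `missing`."""
    decoded: List[str] = []
    for code in codes:
        if code < 0:
            # A code of -1 marks a missing value in the categorical encoding.
            decoded.append(missing)
            continue
        category = categories[code]
        # HDF5 string datasets are often read back as bytes, so normalise to str.
        if isinstance(category, bytes):
            category = category.decode("utf-8")
        decoded.append(str(category))
    return decoded


labels = decode_codes([0, 1, -1, 0], [b"T cell", "B cell"])
```

Decoding a slice of `codes` at a time keeps memory use proportional to the chunk size, since the (usually small) category list is the only part that has to be held in full.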
src/h5ad/commands/subset.py
Outdated
    if indices is None:
        src.copy(key, dst, name=key)
    else:
        src.copy(key, dst, name=key)
Copilot
AI
Jan 15, 2026
The if-else branches perform the same operation (src.copy). This duplicated code can be simplified to just 'src.copy(key, dst, name=key)' without the conditional.
Suggested change:
    - if indices is None:
    -     src.copy(key, dst, name=key)
    - else:
    -     src.copy(key, dst, name=key)
    + src.copy(key, dst, name=key)
Dockerfile
Outdated
    ENV UV_NO_DEV=1

    # Clone the repo into /app
    RUN git clone --branch 0.1.0 https://github.com/cellgeni/h5ad-cli.git .
Copilot
AI
Jan 15, 2026
The Dockerfile is hardcoded to clone version 0.1.0, but this PR is introducing version 0.2.0 features. This should be updated to a newer tag or use a build argument to allow dynamic version specification.
src/h5ad/info.py
Outdated
    @@ -0,0 +1,78 @@
    from typing import Optional, Tuple
    import h5py
    import numpy as np
Copilot
AI
Jan 15, 2026
Import of 'np' is not used.
    @@ -0,0 +1,279 @@
    """Tests for CLI commands."""

    import pytest
Copilot
AI
Jan 15, 2026
Import of 'pytest' is not used.

    import pytest
    import csv
    from pathlib import Path
Copilot
AI
Jan 15, 2026
Import of 'Path' is not used.
    @@ -0,0 +1,416 @@
    """Tests for subset.py module functions."""

    import pytest
Copilot
AI
Jan 15, 2026
Import of 'pytest' is not used.
    import pytest
    import h5py
    import numpy as np
    from pathlib import Path
Copilot
AI
Jan 15, 2026
Import of 'Path' is not used.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR is a major enhancement to the h5ad CLI tool, adding significant new functionality, comprehensive testing, and CI/CD automation. The changes transform the tool from a basic info/table utility into a full-featured `.h5ad` file manipulation toolkit with production-ready quality assurance.
📊 Statistics
🚀 Major Features Added
1. Subset Command ⭐
The crown jewel of this PR: a fully functional subsetting capability for `.h5ad` files.
Features:
- `--obs` flag
- `--var` flag
Files Added:
Usage Example:
2. Enhanced Info Command
Refactored to provide better structure visualization and rich terminal output.
Changes:
3. Table Export Command Improvements
Enhanced table export functionality with better memory management.
Changes:
Usage Example:
4. Core Utility Functions
New shared utilities for reading h5ad files efficiently.
Files Added:
🧪 Testing Infrastructure
Comprehensive Test Suite
Added a robust test suite covering all major functionality:
Test Files:
- `.h5ad` files
Test Configuration:
Documentation:
🔧 CI/CD & Automation
1. GitHub Actions - Testing Workflow
.github/workflows/tests.yml - Automated testing on push/PR
Features:
- `main` and `dev` branches
2. GitHub Actions - Docker Build & Push
.github/workflows/quay-on-tag.yml - Automated Docker image publishing
Features:
- `latest` for stable releases (tags without hyphens)
- Hyphenated tags (e.g. `0.2.0-preview`) don't overwrite `latest`
📦 Dependencies & Configuration
Added Dependencies
Runtime:
- `numpy>=2.3.5` - Array operations and sparse matrix handling
- `rich>=14.2.0` - Terminal formatting and progress bars
Development:
- `pytest>=8.3.4` - Test framework
- `pytest-cov>=6.0.0` - Coverage reporting
Docker Enhancements
- `csvkit` installation for CSV manipulation capabilities
Coverage Configuration
Added comprehensive coverage settings in pyproject.toml:
- `src/h5ad`
📚 Documentation Updates
README Overhaul
README.md significantly expanded with:
New Documentation
🔍 Code Quality Improvements
Refactoring
Standards Compliance
- `encoding-type` attribute to write as bytes (h5ad standard compliance)
🏷️ Version Tags
Released preview versions during development:
- `0.2.0-preview` (62d3edb)
- `0.2.0-preview2` (aa5eebf)
- `0.2.0-preview3` (aa5eebf)
🔄 Migration Notes
Breaking Changes
- `--cols` renamed to `--columns` (short form `-c` still works)
- `--out` renamed to `--output` (short form `-o` still works)
Before (v0.1.x):
After (v0.2.0):
Note: Short forms `-c` and `-o` continue to work unchanged.
New Features Available
After merging, users will have access to:
- `h5ad subset` command for filtering large files
✅ Testing Checklist
- `.h5ad` files
📋 Commit History
View all 24 commits (click to expand)
🎯 Recommendation
READY TO MERGE - This PR represents significant value-add with:
Suggested next steps after merge:
- `0.2.0` or `1.0.0`)
Author: Aljes
Date Range: December 12, 2025 - January 15, 2026
Target Branch: main ← dev