Releases: SemClone/binarysniffer
1.11.3
BinarySniffer v1.11.3 Release Notes
Release Date: November 5, 2025
Critical Fix Release
Fixed
- Integration Restoration - Restored critical osslili and UPMEX integrations that were incorrectly removed
- Re-added binarysniffer/integrations/enhanced_oslili.py for comprehensive license detection
- Re-added binarysniffer/integrations/upmex_adapter.py for package metadata extraction
- Re-added binarysniffer/integrations/__init__.py with proper module exports
- Fixed analyzer_enhanced.py to initialize and use both integrations properly
- Fixed archive.py to extract licenses and package metadata from archives
- Added conversion of UPMEX license data to ComponentMatch objects with upmex_detection match type
- OSLiLi license detection now shows oslili_detection match type with proper confidence scores
- Package metadata now properly extracted and displayed for JAR, WAR, WHL, and other supported formats
Verified
- Complete Integration Testing - Verified all integrations work correctly across file types
- OSLiLi detects licenses in source code files (.py, .js, .java, .c, .cpp, etc.) with 80-100% confidence
- UPMEX extracts package metadata including Maven coordinates, SPDX licenses, and notice text
- Archive files (JAR, IPA) properly analyzed with both component signatures and license detection
- Binary files (libssl.so, libcurl.so) correctly identify OpenSSL, cURL, and other components
- CLI output displays package type, license info, and comprehensive match evidence
What This Fixes
This release restores functionality that was accidentally removed in repository cleanup commits d6bef5c and 1ed6695. Users will now see:
- License Detection: Proper oslili_detection and upmex_detection match types in results
- Package Metadata: Maven coordinates, SPDX licenses, and package information for supported formats
- Enhanced Coverage: Comprehensive analysis across source code, archives, and binary files
- Evidence Details: Detailed confidence scores, detection methods, and source attribution
Upgrade Instructions
pip install --upgrade binarysniffer
No configuration changes required - all functionality is restored automatically.
Dependencies: osslili>=1.5.6, upmex>=1.6.7
1.11.2
[1.11.2] - 2025-10-27
Fixed
- Dependency Resolution - Fixed dependency compatibility issues with renamed packages
- Ensured clean migration from semantic-copycat-oslili to osslili>=1.5.6
- Ensured clean migration from semantic-copycat-upmex to upmex>=1.6.7
- Removed conflicting legacy package installations that caused version mismatches
- All integrations now properly import from updated package namespaces
This release resolves issues with the dependency migration from the semantic-copycat packages to their renamed versions.
1.11.1
Release v1.11.1: Major performance and usability improvements
Fixed
- Progress Display - Fixed progress bar stuck at 0% for directory analysis
- Progress callbacks now properly track file processing
- Consolidated summary table for directory scans instead of individual file results
- File Processing - Resolved hanging issues with problematic files
- Added configurable timeout system (60s default) with --timeout option
- Automatic detection and exclusion of XML/plist metadata files
- Fixed processing of large files (>50MB excluded by default, use --include-large to analyze)
Added
- CLI Enhancements - New options for better control and visibility
- --timeout - Configure per-file timeout (default: 60 seconds)
- --include-large / -l - Include files larger than 50MB in analysis
- --debug / -v - Show files being processed in real-time
- --skip-metadata - Skip XML/plist metadata files entirely
- --with-hashes - Now properly displays file hash values
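The size limit, metadata handling, and per-file timeout described for this release can be pictured as one small decision function. The sketch below is illustrative only: plan_file and its parameters are hypothetical names that mirror the CLI options, "50MB" is read as 50 MiB, and metadata files are assumed to get the reduced 3-second timeout unless --skip-metadata is given. It is not BinarySniffer's actual implementation.

```python
from pathlib import Path
from typing import Optional

# Assumption: "50MB" is interpreted as 50 MiB.
LARGE_FILE_BYTES = 50 * 1024 * 1024
# Assumption: metadata files are recognized by extension only.
METADATA_SUFFIXES = {".xml", ".plist"}
METADATA_TIMEOUT_SECONDS = 3.0


def plan_file(
    name: str,
    size_bytes: int,
    *,
    timeout: float = 60.0,
    include_large: bool = False,
    skip_metadata: bool = False,
) -> Optional[float]:
    """Return the timeout (seconds) to use for a file, or None to skip it."""
    # Large files are excluded unless explicitly included.
    if size_bytes > LARGE_FILE_BYTES and not include_large:
        return None
    # Metadata files are either skipped or given a short timeout.
    if Path(name).suffix.lower() in METADATA_SUFFIXES:
        if skip_metadata:
            return None
        return min(timeout, METADATA_TIMEOUT_SECONDS)
    return timeout


if __name__ == "__main__":
    for name, size in [("libssl.so", 4096), ("Info.plist", 512), ("model.bin", 80 * 1024 * 1024)]:
        print(name, plan_file(name, size))
```

Returning None for "skip" keeps the caller's loop simple: any file with a numeric result is analyzed under that timeout.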
Improved
- Performance - Optimized file processing for large directories
- Sequential processing for large directories (>100 files) to prevent resource exhaustion
- Smart file filtering to skip problematic formats early
- Reduced timeout for metadata files (3 seconds)
- Documentation - Updated user guide with all new CLI options
- Comprehensive option descriptions and usage examples
- Fixed outdated references to deprecated flags
- License Detection - Verified OSLiLi integration functionality
- Successfully detects BSD, ISC, Python-2.0 licenses with high confidence (85-97%)
- TLSH Fuzzy Matching - Enabled and documented TLSH support
- Created example TLSH signature database
- Comprehensive setup guide in docs/TLSH_SETUP_GUIDE.md
- 100% confidence matching for system binaries
Removed
- Deprecated Options - Cleaned up obsolete CLI flags
- Removed --enhanced flag (enhanced mode is now always enabled)
- Fixed partially wired options to be fully functional
1.10.5
Changelog for v1.10.5 - 2025-10-19
Added
- UPMEX Integration - Comprehensive package metadata extraction for enhanced analysis
- Universal Package Support: Analyzes eight package ecosystems (Maven, PyPI, NPM, NuGet, Composer, Cargo, Gem, Conda)
- Quick Package Detection: Fast metadata extraction before deep binary analysis
- Enhanced JAR Analysis: Maven coordinates (groupId/artifactId/version), manifest parsing, license file extraction
- Package Metadata in Reports: JSON and console outputs now display discovered package information
- Multi-format Support: JAR, WAR, EAR, ZIP, WHL package analysis ready
- Enhanced OSLiLi Integration - Advanced SPDX license detection with multiple methods
- Mandatory Dependency: OSLiLi is now required for comprehensive license detection
- Source Code License Tags: Detects SPDX license identifiers in .py, .js, .java, .c, .cpp, .h, .go, .rs, and other source files
- Multi-method Detection: Hash matching, tag detection, keyword analysis, regex patterns, and full-text analysis
- License Reference Parsing: Maps 15+ common license names to SPDX identifiers (e.g., "Mozilla Public License, v. 2.0" → "MPL-2.0")
- License Categorization: Declared, detected, and referenced license classification
- Compatibility Analysis: License compatibility information using SPDX framework
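As a rough illustration of the license reference parsing described above, the sketch below maps free-form license names to SPDX identifiers after normalizing case and whitespace. The table and the to_spdx helper are hypothetical: only the MPL-2.0 example comes from these notes, and the real mapping of 15+ names lives in the OSLiLi integration.

```python
import re
from typing import Optional

# Illustrative subset; keys are stored lowercase with single spaces.
LICENSE_NAME_TO_SPDX = {
    "mozilla public license, v. 2.0": "MPL-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "mit license": "MIT",
}


def to_spdx(name: str) -> Optional[str]:
    """Return the SPDX identifier for a license name, or None if unknown."""
    # Collapse runs of whitespace and ignore case before the lookup.
    key = re.sub(r"\s+", " ", name).strip().lower()
    return LICENSE_NAME_TO_SPDX.get(key)


if __name__ == "__main__":
    print(to_spdx("Mozilla Public License, v. 2.0"))
    print(to_spdx("Some Unlisted License"))
```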
Fixed
- Test Suite - Achieved 100% test success rate (205/205 tests passing)
- Fixed ONNX extractor file type detection to properly handle .pb files
- Fixed pickle extractor missing security methods (validate_safe_unpickle())
- Fixed PyTorch native extractor functionality:
- State dict detection (has_state_dict)
- Optimizer detection (has_optimizer)
- Architecture detection (ResNet, Transformer, etc.)
- Suspicious operations detection via STACK_GLOBAL handling
- Layer counting functionality
- Fixed static library BSD extended names parsing bug (double subtraction issue)
Improved
- Code Quality - Cleaned up dead code and duplicated imports
- Removed unused import struct from hashing utilities
- Consolidated duplicate import zipfile statements in UPMEX adapter
- Moved imports to module level for better organization
- Enhanced code maintainability with zero orphaned files
- Removed progressive_deterministic.py dead code
- Consolidated duplicate OSLiLi integrations into unified EnhancedOsliliIntegration class
Changed
- Dependencies - Updated multiple outdated dependencies to more recent versions
- Added semantic-copycat-upmex>=1.6.2 as required dependency
- Updated semantic-copycat-oslili>=1.5.0 for enhanced license detection
- Improved compatibility and security with the latest package versions
- Maintained backward compatibility while upgrading core dependencies
Usage
- When UPMEX is Used: Automatically activated for package files (JAR, WAR, EAR, WHL, ZIP) to extract metadata before signature analysis
- When OSLiLi is Used: Applied to all supported files for license detection - package files via UPMEX integration, source files via direct analysis, and license files via content analysis
- Performance Impact: Zero regression - analysis times remain sub-second with enhanced metadata extraction
1.10.1
v1.10.1 - Enhanced License Detection with OSLiLi
Highlights
This release integrates semantic-copycat-oslili for significantly improved license detection accuracy. BinarySniffer now correctly identifies licenses from LICENSE files, package metadata, and source code
with SPDX-compliant identifiers.
What's New
Added
- OSLiLi Integration - Enhanced license detection using semantic-copycat-oslili
- Automatic license detection from package metadata (package.json, pom.xml, etc.)
- SPDX-compliant license identifiers
- ML-based license matching with higher accuracy
- Support for license categorization (declared, detected, referenced)
- Integrated into the archive extraction pipeline
Improved
- License Detection Accuracy - Better detection rates
- Correctly identifies licenses from LICENSE files in archives
- Reduced false positives through ML-based matching
- TLSH fuzzy matching for license text similarity
- Proper detection of Apache-2.0, MIT, BSD, GPL, and other standard licenses
Fixed
- Code Quality - Removed dead code and fixed potential bugs
- Fixed potential None reference in analyze_licenses method
- Removed duplicate license detection implementations
- Cleaned up unused methods in the integrations module
- Improved error handling when OSLiLi is unavailable
Changed
- Dependencies - Added semantic-copycat-oslili as a required dependency
- semantic-copycat-oslili >= 1.3.2 now required for license detection
- LicenseMatcher retained as fallback for compatibility
Testing Results
Tested with Apache Commons Lang3 JAR:
- Apache-2.0 license correctly detected from META-INF/LICENSE.txt
- 100% confidence using TLSH fuzzy matching
- SPDX identifiers in all output formats (JSON, CycloneDX, KissBOM)
Installation
pip install semantic-copycat-binarysniffer==1.10.1
Compatibility
No breaking changes - backward compatibility maintained with fallback to pattern matching when OSLiLi is unavailable.
Full Changelog: v1.10.0...v1.10.1
1.10.0
v1.10.0 - ML Model Security Analysis System
Major Security Features
- ML Model Security Analysis - Comprehensive security module for machine learning models with a dedicated ml-scan command
- MITRE ATT&CK Integration - Threat categorization using industry-standard framework
- Multi-level Risk Assessment - SAFE, LOW, MEDIUM, HIGH, CRITICAL risk levels for ML models
- Deep Pickle Analysis - Analyzes pickle opcodes without code execution for safe inspection
- Obfuscation Detection - Uses entropy analysis and pattern matching to identify hidden threats
Threat Detection
- 50+ Malicious Patterns - Extensive database of threat patterns mapped to MITRE techniques
- Code Execution Detection - Identifies os.system, subprocess.Popen, eval, exec patterns
- Network Operations - Detects socket connections, requests, urllib usage
- Shell Commands - Finds /bin/bash, cmd.exe, reverse shell indicators
- Encoding Techniques - Detects base64, zlib, marshal obfuscation
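To make the idea of inspecting pickle opcodes without executing them concrete, here is a minimal sketch built on the standard library's pickletools.genops, which walks the opcode stream without unpickling anything. The deny-list and the find_suspicious_imports helper are hypothetical and far smaller than the pattern database described above; STACK_GLOBAL is resolved with a simple "last two string operands" heuristic rather than a full stack simulation.

```python
import pickle
import pickletools

# Hypothetical deny-list for illustration only.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}


def find_suspicious_imports(data: bytes) -> list:
    """Return 'module.name' references to deny-listed modules in a pickle stream."""
    findings = []
    recent_strings = []  # string operands seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        module = name = None
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these operands as "module name".
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: the module and name are the last two strings pushed.
            module, name = recent_strings[-2], recent_strings[-1]
        elif isinstance(arg, str):
            recent_strings.append(arg)
        if module and module.split(".")[0] in SUSPICIOUS_MODULES:
            findings.append(f"{module}.{name}")
    return findings


if __name__ == "__main__":
    benign = pickle.dumps({"weights": [0.1, 0.2]})
    print(find_suspicious_imports(benign))
    # Hand-written protocol 0 payload; it is parsed here, never loaded.
    payload = b"cos\nsystem\n(S'echo hi'\ntR."
    print(find_suspicious_imports(payload))
```

Because the stream is only parsed, a payload that would call os.system on load is reported rather than run.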
Enhanced Output Formats
- SARIF Format - CI/CD integration with GitHub Actions and IDE support
- Security-Enhanced SBOM - CycloneDX format with ML security metadata
- Markdown Reports - Human-readable security assessment reports
- Model Integrity - Hash verification for supply chain security
Improvements
- Better ML Framework Detection - XGBoost (77.3%), PyTorch (96%), scikit-learn (94%) confidence
- Fixed Entropy Calculation - Resolved float.bit_length() issue in obfuscation detection
- Malicious File Handling - Properly flags malicious pickle files as CRITICAL risk
This release addresses issues #25 and #26, providing comprehensive ML model security analysis capabilities for defensive security operations.
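For context on the entropy-based obfuscation check, a byte-level Shannon entropy can be computed as sketched below. This is a generic formulation, not the code that was fixed in this release; note that bit_length() exists on Python int but not on float, so a floating-point entropy calculation would normally go through math.log2 instead. The function name shannon_entropy is an illustrative choice.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(p) over the observed byte values, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaaaaaa"))
    print(shannon_entropy(bytes(range(256))))
```

Values near 8 bits per byte suggest compressed or encrypted content, which is why entropy is a common signal for hidden payloads.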
1.9.9
Version 1.9.9
Added
- XGBoost Detection - New signature file for XGBoost gradient boosting framework
- Detects xgboost.sklearn, XGBClassifier, XGBRegressor patterns
- Identifies gradient boosting specific parameters (max_depth, n_estimators, learning_rate)
- Successfully detects XGBoost in mixed ML model files
- Apache-2.0 license attribution for XGBoost components
- Malformed File Detection - Enhanced handling of corrupted and invalid files
- New signature set for detecting malformed pickle files
- Categorizes errors: invalid opcodes, truncated files, unknown errors
- Provides precise WARNING classification for problematic files
- Tracks risk levels in metadata (malformed, error, dangerous, safe)
Improved
- Enhanced Pickle Extractor - Better error handling and user feedback
- Distinguishes between different types of file corruption
- Adds specific error features for signature matching
- Provides suspicious_items metadata for detailed diagnostics
- Improved risk assessment with new "malformed" and "error" categories
- CLI User Experience - Clearer warnings and feedback for problematic files
- Table components can display warning titles and captions
- Special formatting for malformed file detection
- Risk level indicators shown before component tables
- Better visual distinction between normal detections and warnings
Fixed
- Mixed ML model files (e.g., XGBoost models) now properly detected instead of showing "No components"
- Malformed pickle files now show a clear WARNING instead of generic error messages
- The classification column properly handles both licenses and security warnings
1.9.6
[1.9.6] - 2025-08-13
Added
- Signature Collision Detection - New system to identify and resolve pattern conflicts between components
- Implemented SignatureCollisionDetector for cross-signature pattern analysis
- Added --check-collisions flag to analyze pattern overlaps
- Added --interactive mode for guided collision resolution
- Added --collision-threshold to control sensitivity (default: 2 patterns)
- Smart severity levels: critical (5+ components), high (3-4), medium (2 unrelated), low (2 related)
- Recognizes related component families (FFmpeg, OpenSSL, Qt, etc.)
- Automatic Generic Word Filtering - Filters 100+ common programming terms to prevent false positives
- Automatically removes generic patterns like "error", "debug", "init", "handler", etc.
- Preserves library-specific prefixes (av_, ssl_, qt_, etc.)
- Signature Deduplication - All signatures are now automatically deduplicated during generation
- Enhanced Test Coverage - Added comprehensive test suite for signature validation
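The severity levels listed for collision detection can be expressed as a small classification function. The sketch below is an assumption-laden illustration: collision_severity and RELATED_FAMILIES are hypothetical names, the family table is a tiny example, and it does not reproduce SignatureCollisionDetector or the --collision-threshold handling.

```python
# Illustrative family table; the real tool recognizes more families.
RELATED_FAMILIES = [
    {"ffmpeg", "libavcodec", "libavformat"},
    {"openssl", "libssl", "libcrypto"},
]


def collision_severity(components) -> str:
    """Classify a pattern shared by the given components."""
    names = {c.lower() for c in components}
    count = len(names)
    if count >= 5:
        return "critical"
    if count >= 3:
        return "high"
    if count == 2:
        # Two components from the same family are a low-severity overlap.
        related = any(names <= family for family in RELATED_FAMILIES)
        return "low" if related else "medium"
    return "none"


if __name__ == "__main__":
    print(collision_severity(["FFmpeg", "libavcodec"]))
    print(collision_severity(["FFmpeg", "OpenSSL"]))
```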
Improved
- Signature Quality - Significantly reduced false positives through automatic filtering
- Documentation - Updated signature management docs with collision detection features
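The filtering and deduplication behaviour in this release can be sketched as a single pass over candidate patterns. The word list and prefixes below are taken from the examples in these notes and are only a small sample of the 100+ terms mentioned; filter_patterns is a hypothetical helper, not the project's signature generator.

```python
# Sample of generic terms and library prefixes named in the release notes.
GENERIC_WORDS = {"error", "debug", "init", "handler"}
LIBRARY_PREFIXES = ("av_", "ssl_", "qt_")


def filter_patterns(patterns):
    """Drop generic words and duplicates, keeping library-prefixed patterns."""
    kept = []
    seen = set()
    for pattern in patterns:
        lowered = pattern.lower()
        # A library-specific prefix always protects the pattern.
        is_protected = lowered.startswith(LIBRARY_PREFIXES)
        if not is_protected and lowered in GENERIC_WORDS:
            continue
        if pattern in seen:
            continue
        seen.add(pattern)
        kept.append(pattern)
    return kept


if __name__ == "__main__":
    print(filter_patterns(["error", "av_init", "ssl_error", "debug", "av_init", "png_read"]))
```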
1.9.5
Summary of Changes
- Version bumped to 1.9.5 in both pyproject.toml and binarysniffer/__init__.py
- CHANGELOG.md updated with v1.9.5 entry:
- Fixed signature status command to handle both "signatures" and "patterns" keys
- Merged OpenSSL signature files into a single file with 135 patterns
- Improved signature organization and status display
1.9.4
Summary of Changes
- Version bumped to 1.9.4 in both pyproject.toml and binarysniffer/__init__.py
- CHANGELOG.md updated with the critical codec signature fix:
- Fixed bug where codec signatures weren't being imported
- SignatureManager now handles both "signatures" and "patterns" keys
- Increased signature count from 1,193 to 1,324 with proper codec imports
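One way to read "handles both signatures and patterns keys" is a loader that accepts either layout. The sketch below assumes a signature file is a JSON object whose entries sit under one of those two keys, and that "signatures" wins when both are present; load_patterns is a hypothetical helper, and the actual SignatureManager logic may differ.

```python
import json


def load_patterns(signature_doc: dict) -> list:
    """Return the entries of a signature document stored under either key."""
    # Assumption: prefer "signatures" and fall back to "patterns".
    for key in ("signatures", "patterns"):
        entries = signature_doc.get(key)
        if entries:
            return list(entries)
    return []


if __name__ == "__main__":
    doc = json.loads('{"component": "example-codec", "patterns": ["codec_open", "codec_close"]}')
    print(load_patterns(doc))
```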