Releases: SemClone/binarysniffer

1.11.3

06 Nov 02:08
7eae72c

BinarySniffer v1.11.3 Release Notes

Release Date: November 5, 2025

Critical Fix Release

Fixed

  • Integration Restoration - Restored critical osslili and UPMEX integrations that were incorrectly removed
    • Re-added binarysniffer/integrations/enhanced_oslili.py for comprehensive license detection
    • Re-added binarysniffer/integrations/upmex_adapter.py for package metadata extraction
    • Re-added binarysniffer/integrations/__init__.py with proper module exports
    • Fixed analyzer_enhanced.py to initialize and use both integrations properly
    • Fixed archive.py to extract licenses and package metadata from archives
    • Added conversion of UPMEX license data to ComponentMatch objects with upmex_detection match type
    • OSLiLi license detection now shows oslili_detection match type with proper confidence scores
    • Package metadata now properly extracted and displayed for JAR, WAR, WHL, and other supported formats

Verified

  • Complete Integration Testing - Verified all integrations work correctly across file types
    • OSLiLi detects licenses in source code files (.py, .js, .java, .c, .cpp, etc.) with 80-100% confidence
    • UPMEX extracts package metadata including Maven coordinates, SPDX licenses, and notice text
    • Archive files (JAR, IPA) properly analyzed with both component signatures and license detection
    • Binary files (libssl.so, libcurl.so) correctly identify OpenSSL, cURL, and other components
    • CLI output displays package type, license info, and comprehensive match evidence

What This Fixes

This release restores functionality that was accidentally removed in repository cleanup commits d6bef5c and 1ed6695. Users will now see:

  • License Detection: Proper oslili_detection and upmex_detection match types in results
  • Package Metadata: Maven coordinates, SPDX licenses, and package information for supported formats
  • Enhanced Coverage: Comprehensive analysis across source code, archives, and binary files
  • Evidence Details: Detailed confidence scores, detection methods, and source attribution

Upgrade Instructions

pip install --upgrade binarysniffer

No configuration changes required - all functionality is restored automatically.


Dependencies: osslili>=1.5.6, upmex>=1.6.7
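
To confirm after upgrading that the environment meets the minimum versions listed above, a check along these lines may help. It is a sketch that uses only the standard library's importlib.metadata; the helper names parse_version and check_dependencies are illustrative and not part of BinarySniffer.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the "Dependencies" line of the release notes.
REQUIRED = {"osslili": (1, 5, 6), "upmex": (1, 6, 7)}


def parse_version(text):
    """Turn '1.6.7' into (1, 6, 7); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependencies(required=REQUIRED):
    """Return {package: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = parse_version(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed >= minimum else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```

The version parsing is deliberately simple and ignores pre-release suffixes; a project that already depends on the packaging library could compare versions with that instead.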

1.11.2

28 Oct 06:57

[1.11.2] - 2025-10-27

Fixed

  • Dependency Resolution - Fixed dependency compatibility issues with renamed packages
    • Ensured clean migration from semantic-copycat-oslili to osslili>=1.5.6
    • Ensured clean migration from semantic-copycat-upmex to upmex>=1.6.7
    • Removed conflicting legacy package installations that caused version mismatches
    • All integrations now properly import from updated package namespaces

This version focused on resolving issues with the dependency migration from semantic-copycat packages to
their renamed versions.
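
One way to spot the kind of leftover legacy installation described above is to ask the packaging metadata whether the old distribution names are still present. The sketch below assumes nothing about BinarySniffer internals; find_legacy_packages and installed_version are hypothetical helper names, and only the package names come from the notes.

```python
from importlib.metadata import PackageNotFoundError, version

# Legacy distribution name -> renamed distribution, as listed in the notes.
LEGACY_TO_RENAMED = {
    "semantic-copycat-oslili": "osslili",
    "semantic-copycat-upmex": "upmex",
}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_legacy_packages(lookup=installed_version):
    """Return {legacy_name: (installed_version, renamed_package)}."""
    leftovers = {}
    for legacy, renamed in LEGACY_TO_RENAMED.items():
        found = lookup(legacy)
        if found is not None:
            leftovers[legacy] = (found, renamed)
    return leftovers


if __name__ == "__main__":
    for legacy, (found, renamed) in find_legacy_packages().items():
        print(f"{legacy} {found} is still installed; {renamed} replaces it")
```

The lookup parameter exists only so the function can be exercised without touching the real environment.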

1.11.1

24 Oct 06:28
e10a7bd

Release v1.11.1: Major performance and usability improvements

Fixed

  • Progress Display - Fixed progress bar stuck at 0% for directory analysis
    • Progress callbacks now properly track file processing
    • Consolidated summary table for directory scans instead of individual file results
  • File Processing - Resolved hanging issues with problematic files
    • Added configurable timeout system (60s default) with --timeout option
    • Automatic detection and exclusion of XML/plist metadata files
    • Fixed processing of large files (>50MB excluded by default, use --include-large to analyze)
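
The notes do not say how the timeout and the large-file exclusion are implemented. As a rough illustration of the idea only, the sketch below skips files over a size limit and stops waiting for an analysis call after a deadline. The names should_skip and analyze_with_timeout are hypothetical, and treating "50MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

LARGE_FILE_BYTES = 50 * 1024 * 1024  # assumed meaning of ">50MB"
DEFAULT_TIMEOUT = 60.0               # seconds, the default named in the notes


def should_skip(path, include_large=False, limit=LARGE_FILE_BYTES):
    """True if the file exceeds the size limit and large files are excluded."""
    if include_large:
        return False
    return os.path.getsize(path) > limit


def analyze_with_timeout(analyze, path, timeout=DEFAULT_TIMEOUT):
    """Run analyze(path); return None if it does not finish in time.

    A thread cannot be killed from outside, so on timeout the caller
    simply stops waiting while the worker finishes in the background.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(analyze, path)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None
    finally:
        pool.shutdown(wait=False)
```

A tool that must reclaim resources from a truly hung file would more likely run the analysis in a separate process that can be terminated; the thread version is shown because it is shorter.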

Added

  • CLI Enhancements - New options for better control and visibility
    • --timeout - Configure per-file timeout (default: 60 seconds)
    • --include-large / -l - Include files larger than 50MB in analysis
    • --debug / -v - Show files being processed in real-time
    • --skip-metadata - Skip XML/plist metadata files entirely
    • --with-hashes - Now properly displays file hash values

Improved

  • Performance - Optimized file processing for large directories
    • Sequential processing for large directories (>100 files) to prevent resource exhaustion
    • Smart file filtering to skip problematic formats early
    • Reduced timeout for metadata files (3 seconds)
  • Documentation - Updated user guide with all new CLI options
    • Comprehensive option descriptions and usage examples
    • Fixed outdated references to deprecated flags
  • License Detection - Verified OSLiLi integration functionality
    • Successfully detects BSD, ISC, Python-2.0 licenses with high confidence (85-97%)
  • TLSH Fuzzy Matching - Enabled and documented TLSH support
    • Created example TLSH signature database
    • Comprehensive setup guide in docs/TLSH_SETUP_GUIDE.md
    • 100% confidence matching for system binaries

Removed

  • Deprecated Options - Cleaned up obsolete CLI flags
    • Removed --enhanced flag (enhanced mode is now always enabled)
    • Fixed partially wired options to be fully functional

1.10.5

19 Oct 08:13
ffdefe2

Changelog for v1.10.5 - 2025-10-19

Added

  • UPMEX Integration - Comprehensive package metadata extraction for enhanced analysis
    • Universal Package Support: Analyzes eight package ecosystems (Maven, PyPI, NPM, NuGet, Composer, Cargo, Gem, Conda)
    • Quick Package Detection: Fast metadata extraction before deep binary analysis
    • Enhanced JAR Analysis: Maven coordinates (groupId/artifactId/version), manifest parsing, license file extraction
    • Package Metadata in Reports: JSON and console outputs now display discovered package information
    • Multi-format Support: JAR, WAR, EAR, ZIP, WHL package analysis ready
  • Enhanced OSLiLi Integration - Advanced SPDX license detection with multiple methods
    • Mandatory Dependency: OSLiLi is now required for comprehensive license detection
    • Source Code License Tags: Detects SPDX license identifiers in .py, .js, .java, .c, .cpp, .h, .go, .rs, and other source files
    • Multi-method Detection: Hash matching, tag detection, keyword analysis, regex patterns, and full-text analysis
    • License Reference Parsing: Maps 15+ common license names to SPDX identifiers (e.g., "Mozilla Public License, v. 2.0" → "MPL-2.0")
    • License Categorization: Declared, detected, and referenced license classification
    • Compatibility Analysis: License compatibility information using SPDX framework
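
The license reference parsing described above amounts to a lookup from free-form license names to SPDX identifiers. The sketch below shows the general shape under stated assumptions: only the Mozilla entry comes from the notes, the other entries are common examples chosen for illustration, and to_spdx is a hypothetical name rather than the integration's actual API.

```python
# Keys are lowercase with whitespace collapsed so lookups are forgiving.
LICENSE_NAME_TO_SPDX = {
    "mozilla public license, v. 2.0": "MPL-2.0",  # example from the notes
    "apache license, version 2.0": "Apache-2.0",  # illustrative entry
    "mit license": "MIT",                         # illustrative entry
    "isc license": "ISC",                         # illustrative entry
}


def to_spdx(license_name):
    """Map a human-readable license name to an SPDX identifier, or None."""
    key = " ".join(license_name.lower().split())
    return LICENSE_NAME_TO_SPDX.get(key)


if __name__ == "__main__":
    print(to_spdx("Mozilla Public License, v. 2.0"))
```

Returning None for unknown names leaves it to the caller to decide whether to fall back to another detection method.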

Fixed

  • Test Suite - Achieved 100% test success rate (205/205 tests passing)
    • Fixed ONNX extractor file type detection to properly handle .pb files
    • Added the security methods that were missing from the pickle extractor (validate_safe_unpickle())
    • Fixed PyTorch native extractor functionality:
      • State dict detection (has_state_dict)
      • Optimizer detection (has_optimizer)
      • Architecture detection (ResNet, Transformer, etc.)
      • Suspicious operations detection via STACK_GLOBAL handling
      • Layer counting functionality
    • Fixed static library BSD extended names parsing bug (double subtraction issue)
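
For context on the BSD extended names fix: in BSD-style ar archives a member whose header name is "#1/<len>" stores its real file name in the first <len> bytes of the member data, and the header's size field counts those bytes too. The notes do not describe the bug beyond "double subtraction", so the sketch below only shows one way to parse such members while subtracting the name length exactly once. iter_ar_members is a hypothetical helper, not BinarySniffer's extractor.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # fixed-size ar member header


def iter_ar_members(data):
    """Yield (name, payload) for each member of an ar archive in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii").strip())
        body_start = pos + HEADER_SIZE
        payload_start = body_start
        payload_size = size
        if name.startswith("#1/"):
            # BSD extended name: the real name leads the member data.
            name_len = int(name[3:])
            raw_name = data[body_start:body_start + name_len]
            name = raw_name.rstrip(b"\x00").decode("utf-8", "replace")
            payload_start += name_len
            payload_size -= name_len  # subtract the name length once only
        yield name, data[payload_start:payload_start + payload_size]
        # Member data is padded to an even offset.
        pos = body_start + size + (size % 2)
```

GNU-style archives use a different long-name scheme (a "//" string table), which this sketch does not handle.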

Improved

  • Code Quality - Cleaned up dead code and duplicated imports
    • Removed unused import struct from hashing utilities
    • Consolidated duplicate import zipfile statements in UPMEX adapter
    • Moved imports to module level for better organization
    • Enhanced code maintainability with zero orphaned files
    • Removed progressive_deterministic.py dead code
    • Consolidated duplicate OSLiLi integrations into unified EnhancedOsliliIntegration class

Changed

  • Dependencies - Updated multiple outdated dependencies to more recent versions
    • Added semantic-copycat-upmex>=1.6.2 as required dependency
    • Raised the semantic-copycat-oslili requirement to >=1.5.0 for enhanced license detection
    • Improved compatibility and security with the latest package versions
    • Maintained backward compatibility while upgrading core dependencies

Usage

  • When UPMEX is Used: Automatically activated for package files (JAR, WAR, EAR, WHL, ZIP) to extract metadata before signature analysis
  • When OSLiLi is Used: Applied to all supported files for license detection - package files via UPMEX integration, source files via direct analysis, and license files via content analysis
  • Performance Impact: Zero regression - analysis times remain sub-second with enhanced metadata extraction

1.10.1

30 Aug 17:44
1dc7493

v1.10.1 - Enhanced License Detection with OSLiLi

Highlights

This release integrates semantic-copycat-oslili for significantly improved license detection accuracy. BinarySniffer now correctly identifies licenses from LICENSE files, package metadata, and source code
with SPDX-compliant identifiers.

What's New

Added

  • OSLiLi Integration - Enhanced license detection using semantic-copycat-oslili
    • Automatic license detection from package metadata (package.json, pom.xml, etc.)
    • SPDX-compliant license identifiers
    • ML-based license matching with higher accuracy
    • Support for license categorization (declared, detected, referenced)
    • Integrated into the archive extraction pipeline

Improved

  • License Detection Accuracy - Better detection rates
    • Correctly identifies licenses from LICENSE files in archives
    • Reduced false positives through ML-based matching
    • TLSH fuzzy matching for license text similarity
    • Proper detection of Apache-2.0, MIT, BSD, GPL, and other standard licenses

Fixed

  • Code Quality - Removed dead code and fixed potential bugs
    • Fixed potential None reference in analyze_licenses method
    • Removed duplicate license detection implementations
    • Cleaned up unused methods in the integrations module
    • Improved error handling when OSLiLi is unavailable

Changed

  • Dependencies - Added semantic-copycat-oslili as a required dependency
    • semantic-copycat-oslili >= 1.3.2 now required for license detection
    • LicenseMatcher retained as fallback for compatibility

Testing Results

Tested with Apache Commons Lang3 JAR:

  • Apache-2.0 license correctly detected from META-INF/LICENSE.txt
  • 100% confidence using TLSH fuzzy matching
  • SPDX identifiers in all output formats (JSON, CycloneDX, KissBOM)

Installation

pip install semantic-copycat-binarysniffer==1.10.1

Compatibility

No breaking changes - backward compatibility maintained with fallback to pattern matching when OSLiLi is unavailable.

Full Changelog: v1.10.0...v1.10.1

1.10.0

15 Aug 06:52

v1.10.0 - ML Model Security Analysis System

Major Security Features

  • ML Model Security Analysis - Comprehensive security module for machine learning models with dedicated ml-scan command
  • MITRE ATT&CK Integration - Threat categorization using industry-standard framework
  • Multi-level Risk Assessment - SAFE, LOW, MEDIUM, HIGH, CRITICAL risk levels for ML models
  • Deep Pickle Analysis - Analyzes pickle opcodes without code execution for safe inspection
  • Obfuscation Detection - Uses entropy analysis and pattern matching to identify hidden threats

Threat Detection

  • 50+ Malicious Patterns - Extensive database of threat patterns mapped to MITRE techniques
  • Code Execution Detection - Identifies os.system, subprocess.Popen, eval, exec patterns
  • Network Operations - Detects socket connections, requests, urllib usage
  • Shell Commands - Finds /bin/bash, cmd.exe, reverse shell indicators
  • Encoding Techniques - Detects base64, zlib, marshal obfuscation
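
Inspecting pickle opcodes without executing them can be illustrated with the standard library's pickletools.genops, which walks the opcode stream without unpickling anything. The sketch below is not the ml-scan implementation: the function names and the small SUSPICIOUS_GLOBALS set (built from the callables named above) are assumptions for illustration, and the STACK_GLOBAL handling is a simplification that looks at the two most recently pushed strings.

```python
import pickletools

# (module, name) pairs based on the patterns named in the release notes.
SUSPICIOUS_GLOBALS = {
    ("os", "system"),
    ("subprocess", "Popen"),
    ("builtins", "eval"),
    ("builtins", "exec"),
}

# Opcodes that push a text string onto the pickle stack.
_STRING_OPS = {
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
    "STRING", "BINSTRING", "SHORT_BINSTRING",
}


def find_imports(data):
    """List (module, name) pairs a pickle would import, without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # The argument is "module name" separated by a single space.
            module, _, name = arg.partition(" ")
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Module and name were pushed as the two preceding strings.
            if len(recent_strings) >= 2:
                found.append((recent_strings[-2], recent_strings[-1]))
        elif opcode.name in _STRING_OPS and isinstance(arg, str):
            recent_strings.append(arg)
    return found


def suspicious_imports(data):
    """Return only the imports that appear in SUSPICIOUS_GLOBALS."""
    return [pair for pair in find_imports(data) if pair in SUSPICIOUS_GLOBALS]
```

A real scanner would need a fuller stack model, since a crafted pickle can build the module and name strings indirectly; this version only catches the straightforward cases.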

Enhanced Output Formats

  • SARIF Format - CI/CD integration with GitHub Actions and IDE support
  • Security-Enhanced SBOM - CycloneDX format with ML security metadata
  • Markdown Reports - Human-readable security assessment reports
  • Model Integrity - Hash verification for supply chain security

Improvements

  • Better ML Framework Detection - XGBoost (77.3%), PyTorch (96%), scikit-learn (94%) confidence
  • Fixed Entropy Calculation - Resolved float.bit_length() issue in obfuscation detection
  • Malicious File Handling - Properly flags malicious pickle files as CRITICAL risk

This release addresses issues #25 and #26, providing comprehensive ML model security analysis capabilities for defensive security operations.

1.9.9

14 Aug 22:15
4e744c8

Version 1.9.9

Added

  • XGBoost Detection - New signature file for XGBoost gradient boosting framework
    • Detects xgboost.sklearn, XGBClassifier, XGBRegressor patterns
    • Identifies gradient boosting specific parameters (max_depth, n_estimators, learning_rate)
    • Successfully detects XGBoost in mixed ML model files
    • Apache-2.0 license attribution for XGBoost components
  • Malformed File Detection - Enhanced handling of corrupted and invalid files
    • New signature set for detecting malformed pickle files
    • Categorizes errors: invalid opcodes, truncated files, unknown errors
    • Provides precise WARNING classification for problematic files
    • Tracks risk levels in metadata (malformed, error, dangerous, safe)

Improved

  • Enhanced Pickle Extractor - Better error handling and user feedback
    • Distinguishes between different types of file corruption
    • Adds specific error features for signature matching
    • Provides suspicious_items metadata for detailed diagnostics
    • Improved risk assessment with new "malformed" and "error" categories
  • CLI User Experience - Clearer warnings and feedback for problematic files
    • Table components can display warning titles and captions
    • Special formatting for malformed file detection
    • Risk level indicators shown before component tables
    • Better visual distinction between normal detections and warnings

Fixed

  • Mixed ML model files (e.g., XGBoost models) now properly detected instead of showing "No components"
  • Malformed pickle files now show a clear WARNING instead of generic error messages
  • The classification column properly handles both licenses and security warnings

1.9.6

13 Aug 21:09

[1.9.6] - 2025-08-13

Added

  • Signature Collision Detection - New system to identify and resolve pattern conflicts between components
    • Implemented SignatureCollisionDetector for cross-signature pattern analysis
    • Added --check-collisions flag to analyze pattern overlaps
    • Added --interactive mode for guided collision resolution
    • Added --collision-threshold to control sensitivity (default: 2 patterns)
    • Smart severity levels: critical (5+ components), high (3-4), medium (2 unrelated), low (2 related)
    • Recognizes related component families (FFmpeg, OpenSSL, Qt, etc.)
  • Automatic Generic Word Filtering - Filters 100+ common programming terms to prevent false positives
    • Automatically removes generic patterns like "error", "debug", "init", "handler", etc.
    • Preserves library-specific prefixes (av_, ssl_, qt_, etc.)
  • Signature Deduplication - All signatures are now automatically deduplicated during generation
  • Enhanced Test Coverage - Added comprehensive test suite for signature validation
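
The severity levels listed for collision detection can be read as a small decision rule over the set of components that share a pattern. The sketch below encodes that reading; it is not the SignatureCollisionDetector code, and both the collision_severity function and the example family table are hypothetical.

```python
# Hypothetical component -> family table, for illustration only.
EXAMPLE_FAMILIES = {
    "libavcodec": "FFmpeg",
    "libavformat": "FFmpeg",
    "libssl": "OpenSSL",
    "libcrypto": "OpenSSL",
}


def collision_severity(components, family_of=None):
    """Classify a shared pattern by how many components claim it.

    Returns 'critical' (5+), 'high' (3-4), 'medium' (2 unrelated),
    'low' (2 related), or None when fewer than two components share it.
    """
    family_of = family_of or {}
    unique = sorted(set(components))
    if len(unique) < 2:
        return None
    if len(unique) >= 5:
        return "critical"
    if len(unique) >= 3:
        return "high"
    first, second = (family_of.get(name) for name in unique)
    related = first is not None and first == second
    return "low" if related else "medium"
```

For example, collision_severity(["libavcodec", "libavformat"], EXAMPLE_FAMILIES) yields "low" because both names map to the same family in the example table.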

Improved

  • Signature Quality - Significantly reduced false positives through automatic filtering
  • Documentation - Updated signature management docs with collision detection features

1.9.5

11 Aug 21:49

Summary of Changes

  1. Version bumped to 1.9.5 in both pyproject.toml and binarysniffer/__init__.py
  2. CHANGELOG.md updated with v1.9.5 entry:
    - Fixed signature status command to handle both "signatures" and "patterns" keys
    - Merged OpenSSL signature files into a single file with 135 patterns
    - Improved signature organization and status display

1.9.4

11 Aug 21:37

Summary of Changes

  1. Version bumped to 1.9.4 in both pyproject.toml and binarysniffer/__init__.py
  2. CHANGELOG.md updated with the critical codec signature fix:
    - Fixed bug where codec signatures weren't being imported
    - SignatureManager now handles both "signatures" and "patterns" keys
    - Increased signature count from 1,193 to 1,324 with proper codec imports
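
Handling both the "signatures" and "patterns" keys can be pictured as a tolerant reader that accepts either spelling. The sketch below assumes a signature file is a JSON object with a list under one of those keys; the real file schema is not given in the notes, and extract_patterns is a hypothetical helper rather than a SignatureManager method.

```python
import json


def extract_patterns(document):
    """Return the pattern list from a parsed signature file.

    Accepts either a "signatures" or a "patterns" key and returns an
    empty list when neither is present or both are empty.
    """
    for key in ("signatures", "patterns"):
        value = document.get(key)
        if value:
            return list(value)
    return []


if __name__ == "__main__":
    sample = json.loads('{"component": "example", "patterns": ["av_codec_open"]}')
    print(extract_patterns(sample))
```

Checking "signatures" first is an arbitrary choice in this sketch; a file that carried both keys would need a merge policy.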