Move tier bridge #12
* Introduced a new test method `PsScript_Populates_ScriptLanguage_And_Cmdlets` to validate PowerShell script detection.
* Enhanced the `FileInspector` analysis logic to map script languages from file extensions and improve cmdlet extraction for PowerShell scripts.
* Updated `SecurityHeuristics` to limit the number of cmdlets extracted and added helper methods for cmdlet identification.
* Modified `FileInspectorX.psd1` to include additional metadata for the module.
…abilities
* Introduced `FileSystemMetadata` and `FileMetadataOptions` classes to encapsulate file metadata details.
* Implemented `ReadFileMetadata` method to read and return metadata for a specified file path.
* Added `InspectWithMetadata` method to combine file inspection results with metadata.
* Enhanced `CollectMetadata` methods to include file system metadata in analysis results.
* Introduced `SignatureStatus` class to summarize signature presence and validity.
* Added support for extracting top tokens from script/log content with configurable options.
* Updated settings to control top token extraction behavior.
* Enhanced analysis and report views to include top tokens in the output.
PR Review: Move Tier Bridge
I've completed a thorough review of this pull request. Overall, this is a well-structured feature addition that introduces metadata extraction capabilities and enhanced text metrics. Here's my detailed feedback:
✅ Strengths
🐛 Potential Bugs & Issues
1. BOM Handling Issue in
Pull request overview
This PR introduces several enhancements to the FileInspectorX library focused on improving file analysis capabilities, script detection, and metadata extraction. The changes add token extraction for scripts/logs, comprehensive file system metadata support, enhanced declared vs detected type comparison with security implications, and improved script language detection.
Key changes include:
- New TopTokens feature for extracting frequent tokens from script/log content (opt-in via settings)
- File system metadata capture (FileSystemMetadata, FileInspectionSummary, SignatureStatus classes)
- Enhanced declared type comparison with strong alternative candidates and danger detection (DeclaredTypeComparison)
- Improved script language detection from file extensions with expanded coverage
- Better UTF BOM handling in text reading functions
- Enhanced PowerShell cmdlet extraction with module/function hints
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| Module/FileInspectorX.psd1 | Adds RequireLicenseAcceptance field and reformats PSData section |
| FileInspectorX/Views/ReportView.cs | Adds TopTokens property to report view with heuristics display logic |
| FileInspectorX/Views/AnalysisView.cs | Adds TopTokens property and updates EstimatedLineCount documentation |
| FileInspectorX/NuGet.README.md | Documents new features: InspectWithMetadata, TopTokens settings, and CompareDeclaredDetailed |
| FileInspectorX/Logging/Settings.cs | Adds TopTokens configuration settings (enabled, max, min length, min count) |
| FileInspectorX/Helpers.cs | Expands IsText helper to include script file extensions |
| FileInspectorX/Detection/FileInspector.DeclaredComparison.cs | New file implementing detailed declared vs detected type comparison with dangerous alternative detection |
| FileInspectorX/Detection/DeclaredTypeComparison.cs | New class modeling detailed comparison results with strong alternatives and danger flags |
| FileInspectorX/Analysis/SignatureStatus.cs | New lightweight signature summary class for simplified signature information access |
| FileInspectorX/Analysis/SecurityHeuristics.cs | Enhances cmdlet extraction, adds script hint parsing, improves UTF BOM handling |
| FileInspectorX/Analysis/SecurityHeuristics.Cmdlets.cs | New file with PowerShell-specific module/function/class extraction logic |
| FileInspectorX/Analysis/FileSystemMetadata.cs | New classes for file system metadata and options |
| FileInspectorX/Analysis/FileInspector.TextMetrics.cs | New file implementing top token extraction with stop words and frequency counting |
| FileInspectorX/Analysis/FileInspector.Metadata.cs | New file with metadata reading, signature status, and filename mismatch detection |
| FileInspectorX/Analysis/FileInspector.Analyze.cs | Improves script detection from extensions, adds TextMetrics population, enhances UTF BOM handling, maps "node" to "javascript" |
| FileInspectorX/Analysis/FileInspectionSummary.cs | New bundled class combining analysis, report, metadata, and flattened dictionary |
| FileInspectorX/Analysis/FileAnalysis.cs | Adds TopTokens property with documentation |
| FileInspectorX.Tests/FileInspectorXHeuristicsTests.cs | Adds test for ScriptLanguage and ScriptCmdlets population |
        return token.Substring(start, end - start + 1);
    }

    private static bool IsTrimChar(char c) => c == '-' || c == '_' || c == '.' || c == ':';
Copilot
AI
Dec 26, 2025
The IsTrimChar function includes colon (':') as a trim character, which conflicts with the output format, where colon is the delimiter in "token:count" (line 88). Tokens should not contain colons in the first place, because IsTokenChar (line 95) does not accept them, so a colon can never appear at the edge of a token and the check is dead code. Either IsTokenChar should include colon or IsTrimChar should not. Given the "token:count" format, IsTrimChar should not include colon.
Suggested change:
-    private static bool IsTrimChar(char c) => c == '-' || c == '_' || c == '.' || c == ':';
+    private static bool IsTrimChar(char c) => c == '-' || c == '_' || c == '.';
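To make the point above concrete, here is a minimal sketch of edge trimming followed by "token:count" formatting. The names `TokenTrimSketch`, `TrimToken`, and `Format` are hypothetical, and the body of `IsTokenChar` is an assumption, since the PR excerpt only shows `IsTrimChar` and the final `Substring` call.

```csharp
using System;

public static class TokenTrimSketch
{
    // Assumed token alphabet; the real IsTokenChar is not shown in the excerpt.
    // The only property that matters here is that ':' is NOT a token character.
    public static bool IsTokenChar(char c) =>
        char.IsLetterOrDigit(c) || c == '-' || c == '_' || c == '.';

    // Suggested version from the review: colon is not a trim character.
    public static bool IsTrimChar(char c) => c == '-' || c == '_' || c == '.';

    // Strips leading and trailing trim characters; returns "" if nothing is left.
    public static string TrimToken(string token)
    {
        int start = 0, end = token.Length - 1;
        while (start <= end && IsTrimChar(token[start])) start++;
        while (end >= start && IsTrimChar(token[end])) end--;
        return start > end ? string.Empty : token.Substring(start, end - start + 1);
    }

    // Colon is reserved as the delimiter between token and count.
    public static string Format(string token, int count) => token + ":" + count;
}
```

Because the assumed scanner never admits ':' into a token, `Format` output can be split on the last colon without ambiguity.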
     foreach (var p in probes)
     {
-        if (text.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0) ordered.Add(p);
+        if (text.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0) AddCmdlet(p);
     }
Copilot
AI
Dec 26, 2025
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
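A minimal sketch of what the suggested `.Where(...)` form could look like for the probe loop. `ProbeFilterSketch` and `MatchProbes` are hypothetical names, and the sketch collects matches into a plain list because the behavior of the PR's `AddCmdlet` helper (for example deduplication or a cap) is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProbeFilterSketch
{
    // Returns the probes that occur in the text, case-insensitively,
    // in the same order as the input sequence.
    public static List<string> MatchProbes(string text, IEnumerable<string> probes) =>
        probes
            .Where(p => text.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
}
```

In the PR's loop the filtered sequence would then be passed to `AddCmdlet` one item at a time. Whether that reads better than the original `foreach` is a style call.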
    foreach (var candidate in candidates)
    {
        if (!IsStrongCandidate(candidate)) continue;
        var ext = NormalizeExtension(candidate.Extension);
        if (!string.IsNullOrEmpty(primaryExt) &&
            !string.IsNullOrEmpty(ext) &&
            string.Equals(ext, primaryExt, StringComparison.OrdinalIgnoreCase))
            continue;
        list.Add(candidate);
    }
Copilot
AI
Dec 26, 2025
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
    foreach (var alt in strong)
    {
        var ext = NormalizeExtension(alt.Extension);
        if (string.IsNullOrEmpty(ext)) continue;
        if (string.Equals(ext, decl, StringComparison.OrdinalIgnoreCase))
        {
            cmp.DeclaredMatchesAlternative = true;
            break;
        }
    }
Copilot
AI
Dec 26, 2025
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
    foreach (var candidate in candidates)
    {
        var ext = NormalizeExtension(candidate.Extension);
        if (ext == null || ext.Length == 0) continue;
        if (!isDangerous(ext)) continue;
        if (seen.Add(ext)) list.Add(ext);
        if (list.Count >= 3) break;
    }
Copilot
AI
Dec 26, 2025
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
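One way the loop above might look as an explicit `.Select(...)` pipeline is sketched below. `DangerousExtensionSketch`, `Collect`, and `Normalize` are hypothetical, as is the body of the normalization, since the PR's `NormalizeExtension` is not shown. The sketch also relies on `Distinct` yielding first occurrences in input order, which holds for the LINQ to Objects implementation but is not a documented guarantee.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class DangerousExtensionSketch
{
    // Assumed normalization: trim, drop the leading dot, lower-case.
    private static string Normalize(string? ext) =>
        (ext ?? string.Empty).Trim().TrimStart('.').ToLowerInvariant();

    // Map, filter, deduplicate, and cap at `max` entries (3 in the PR's loop).
    public static List<string> Collect(
        IEnumerable<string?> rawExtensions,
        Func<string, bool> isDangerous,
        int max = 3) =>
        rawExtensions
            .Select(Normalize)
            .Where(ext => ext.Length > 0 && isDangerous(ext))
            .Distinct(StringComparer.Ordinal)
            .Take(max) // lazy evaluation stops enumerating once `max` items are produced
            .ToList();
}
```

The `Take(max)` step plays the role of the early `break`, so enumeration still stops once three extensions have been collected.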
    private static string? NormalizeModuleName(string? raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;
        var name = (raw ?? string.Empty).Trim().Trim('"', '\'');
Copilot
AI
Dec 26, 2025
Condition is always not null because of call to method IsNullOrWhiteSpace.
    }
    // JS minified heuristic if file extension is .js
    var declaredExt = System.IO.Path.GetExtension(path)?.TrimStart('.').ToLowerInvariant();
    var detectedExt = (det?.Extension ?? string.Empty).Trim().TrimStart('.').ToLowerInvariant();
Copilot
AI
Dec 26, 2025
Condition is always not null because of ... is ....
Suggested change:
-    var detectedExt = (det?.Extension ?? string.Empty).Trim().TrimStart('.').ToLowerInvariant();
+    var detectedExt = det?.Extension?.Trim().TrimStart('.').ToLowerInvariant() ?? string.Empty;

     // PDF heuristics
-    if (det.Extension == "pdf") {
+    if (det != null && det.Extension == "pdf") {
Copilot
AI
Dec 26, 2025
Condition is always true because of ... is ....
Suggested change:
-    if (det != null && det.Extension == "pdf") {
+    if (det.Extension == "pdf") {

     // OLE2 Office macros (VBA) check for legacy formats (.doc/.xls/.ppt)
-    if (det.Extension is "doc" or "xls" or "ppt")
+    if (det != null && det.Extension is "doc" or "xls" or "ppt")
Copilot
AI
Dec 26, 2025
Condition is always true because of ... is ....
Suggested change:
-    if (det != null && det.Extension is "doc" or "xls" or "ppt")
+    if (det != null && (det.Extension == "doc" || det.Extension == "xls" || det.Extension == "ppt"))
* Introduced `MapTextSubtypeFromExtension` method for improved mapping of file extensions to text subtypes.
* Refactored token extraction logic to support a maximum number of unique tokens.
* Added settings for `TopTokensMaxUniqueTokens` and `ScriptHintMaxLineLength` to enhance configurability.
* Improved handling of script hints by enforcing line length constraints.
Pull Request Review: Move tier bridge
Summary
This PR introduces significant enhancements to FileInspectorX, adding file system metadata reading capabilities, improved PowerShell script detection, text subtype mapping enhancements, and top token extraction for script/log content analysis.
Overall Assessment: Strong implementation with minor concerns ✅
Code Quality & Best Practices
Strengths ⭐
Suggestions for Improvement 📝
Potential Bugs & Issues
Critical Issues 🔴
None identified.
Medium Priority Issues 🟡
Low Priority Issues 🟢
Performance Considerations
Positive Aspects ⚡
Potential Concerns
…n scanning limit
- Introduced `TopTokensMaxBytes` to allow users to set a maximum byte limit for scanning when computing top tokens.
- Default value set to 256 KB, with an option to fall back to `DetectionReadBudgetBytes` if set to 0.
- Enhanced script language detection logic for improved accuracy.
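The fallback rule described in this commit can be sketched as follows. `ScanBudgetSketch` and `ResolveScanBudget` are hypothetical names, and the `int` parameter types are assumptions; only the setting names `TopTokensMaxBytes` and `DetectionReadBudgetBytes`, the 256 KB default, and the "0 means fall back" rule come from the commit message.

```csharp
using System;

public static class ScanBudgetSketch
{
    // 256 KB, the default stated in the commit message.
    public const int DefaultTopTokensMaxBytes = 256 * 1024;

    // A value of 0 (treated here as "not positive") defers to the general detection read budget.
    public static int ResolveScanBudget(int topTokensMaxBytes, int detectionReadBudgetBytes) =>
        topTokensMaxBytes > 0 ? topTokensMaxBytes : detectionReadBudgetBytes;
}
```

Treating negative values the same as 0 is a choice made for this sketch; the commit message only specifies the behavior for 0.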
Code Review for PR #12: Move tier bridge
I have completed a comprehensive review of this pull request. Overall, this is a well-structured enhancement that adds significant new functionality while maintaining code quality.
=== STRENGTHS ===
=== POTENTIAL ISSUES ===
=== PERFORMANCE ===
=== SECURITY ===
=== TEST COVERAGE RECOMMENDATIONS ===
=== RECOMMENDATION ===
APPROVE with minor suggestions
The issues identified are relatively minor and do not block merging. Main improvements:
The code quality is high with good defensive programming practices and performance considerations.
Review generated with Claude Code
…settings
* Updated XML comments to indicate potential exposure of sensitive content for `MagicHeaderHex` and `IncludeMagicHeader`.
* Introduced `ScriptHintMaxLines` setting to control the maximum number of lines scanned for script hints, with a default of 400.
* Added note in `TopTokensEnabled` to highlight the risk of surfacing sensitive keywords.
* Updated version numbers in project files to reflect the new release.
* Ensured consistency across `Update-Version.ps1`, `.csproj`, and module manifest files.
Pull Request Review: Move tier bridge (PR #12)
Summary
This PR introduces several significant enhancements to FileInspectorX, including metadata extraction, text metrics with token analysis, improved script detection, and enhanced declared vs detected type comparison. The changes span 19 files with 1027 additions and 40 deletions.
Code Quality and Best Practices
Strengths ✅
Areas for Improvement 🔍
Potential Bugs or Issues
Critical 🔴
None identified. The code appears solid.
Medium Priority 🟡
Performance Considerations
Positive Aspects ✅
Potential Optimizations 💡
Security Concerns
Good Security Practices ✅
Security Notes 🔐
Test Coverage
Existing Tests ✅
Test Coverage Gaps 🧪
Recommendation: Add comprehensive unit tests for all new public APIs before merging.
Specific File-by-File Comments
FileInspector.TextMetrics.cs
FileInspector.Metadata.cs
SecurityHeuristics.Cmdlets.cs
FileInspector.DeclaredComparison.cs
Settings.cs
Recommendations
Before Merging
Future Enhancements
Conclusion
This is a high-quality PR that significantly enhances FileInspectorX's capabilities. The code follows good practices, is well-documented, and introduces useful features. The main areas for improvement are:
Overall Assessment: Approve with minor suggestions ✅
The code is production-ready after addressing test coverage gaps. Excellent work on the metadata extraction and text analysis features!
Review generated with Claude Code
* Corrected the formatting of the module manifest.
* Ensured proper syntax for PowerShell module compatibility.
* Introduced new properties in `FileAnalysis` for caching detection results:
  - `DetectedExtension`
  - `DetectedMimeType`
  - `DetectionConfidence`
  - `DetectionReason`
  - `DetectionValidationStatus`
  - `DetectionScore`
  - `DetectionIsDangerous`
* Implemented `PopulateDetectionSummary` method to populate these properties based on detection results.
* Enhanced PE kind detection logic in `FileInspector`.
Pull Request Review: v1.0.6 Feature Enhancements
Summary
This PR introduces significant enhancements to FileInspectorX v1.0.6, including file metadata reading capabilities, improved script detection, top token extraction, and various detection improvements. The changes span 24 files with 1,078 additions and 47 deletions.
Code Quality & Best Practices ✅
Strengths
Areas for Improvement
1. BOM Handling in ReadHeadText (FileInspector.Analyze.cs:2084-2094)
The new BOM detection logic is excellent, but consider extracting it into a separate helper method for reusability:
    private static string ReadHeadTextWithBomDetection(byte[] buffer, int bytesRead)
2. Magic Number Constants (FileInspector.Analyze.cs:898-900)
The PE characteristics constant should be defined at the class level:
    private const ushort IMAGE_FILE_DLL = 0x2000;
Currently it's defined inline, which reduces discoverability.
3. Error Handling Granularity
Several methods use empty catch blocks (
Potential Bugs & Issues
…routines
* Enhanced exception handling to log specific errors in `SecurityHeuristics.Cmdlets.cs`, `TextMetrics.cs`, and `Metadata.cs`.
* Moved `IMAGE_FILE_DLL` constant to a new file `PeConstants.cs` for better organization.
* Added a hard limit for unique tokens in `TextMetrics.cs` to prevent excessive memory usage.
* Introduced a new method `ReadHeadText` in `TextReading.cs` for improved text reading with BOM detection.
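For readers unfamiliar with BOM detection, the idea behind a helper like `ReadHeadText` can be sketched as below. This is not the PR's implementation, which is not shown; `BomSketch` and `DecodeHead` are hypothetical names, and the sketch handles only the UTF-8 and UTF-16 byte order marks (a UTF-32 LE BOM begins with the same two bytes as UTF-16 LE and would be misread here).

```csharp
using System;
using System.Text;

public static class BomSketch
{
    // Decodes the first `bytesRead` bytes of `buffer`, skipping a recognized BOM.
    // Falls back to UTF-8 when no BOM is present.
    public static string DecodeHead(byte[] buffer, int bytesRead)
    {
        // UTF-8 BOM: EF BB BF
        if (bytesRead >= 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF)
            return Encoding.UTF8.GetString(buffer, 3, bytesRead - 3);
        // UTF-16 little-endian BOM: FF FE
        if (bytesRead >= 2 && buffer[0] == 0xFF && buffer[1] == 0xFE)
            return Encoding.Unicode.GetString(buffer, 2, bytesRead - 2);
        // UTF-16 big-endian BOM: FE FF
        if (bytesRead >= 2 && buffer[0] == 0xFE && buffer[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(buffer, 2, bytesRead - 2);
        return Encoding.UTF8.GetString(buffer, 0, bytesRead);
    }
}
```

Skipping the BOM bytes before decoding keeps the marker out of the returned text, which matters for heuristics that inspect the first characters of a script.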
Pull Request Review: Move tier bridge (v1.0.6)
Summary
This PR adds significant functionality to FileInspectorX with new metadata collection APIs, text metrics extraction, and improved script detection. The changes introduce ~1,131 additions across 26 files with good test coverage.
Code Quality & Best Practices
Strengths
Areas for Improvement
1. Missing Null Check in FileInspector.TextMetrics.cs:85
    list.Sort((a, b) =>
    {
        int c = b.Value.CompareTo(a.Value);
        return c != 0 ? c : string.Compare(a.Key, b.Key, StringComparison.OrdinalIgnoreCase);
    });
The diff appears truncated, but the
2. Performance Consideration: Token Extraction
FileInspector.TextMetrics.cs:55-96 - The
3. Exception Swallowing
Multiple locations use empty catch blocks or only log in debug mode:
Recommendation: While acceptable for optional/best-effort operations, consider logging at least the exception type at warning level for production debugging.
4. Magic Number in FileInspector.DeclaredComparison.cs:7
    private const int StrongCandidateScoreThreshold = 80;
This threshold is hardcoded but would benefit from being configurable in
5. Incomplete Refactoring in FileInspector.Analyze.cs
Lines 508-523 show good refactoring with new helper methods
    var mappedScript = MapScriptLanguageFromExtension(detectedExt);
    if (mappedScript == null && !string.IsNullOrWhiteSpace(declaredExt))
        mappedScript = MapScriptLanguageFromExtension(declaredExt);
    // ... followed by more conditional backfilling
This could be simplified with a clearer priority order to reduce cognitive load.
Potential Bugs
1. Possible NullReferenceException in FileInspector.Analyze.cs:778
    if (det != null && det.Extension == "pdf") {
The code checks
2. Dictionary Key Collision Handling
FileInspector.Metadata.cs:131-136:
    foreach (var kv in reportDict)
    {
        if (!dict.ContainsKey(kv.Key))
            dict[kv.Key] = kv.Value;
        else if (Settings.Logger.IsDebug)
            Settings.Logger.WriteDebug("metadata:collision key '{0}' skipped", kv.Key);
    }
Silent collision resolution (first wins) could hide bugs. Consider:
3. File Extension Comparison Case Sensitivity
Multiple locations use
Security Concerns
1. File Path Inclusion in Metadata
    if (options.IncludePath) meta.Path = path;
Good: This is opt-in with
2. Magic Header Hex Exposure
FileSystemMetadata.cs:43:
    /// <summary>Magic header bytes as hex, when requested. Note: may expose sensitive content.</summary>
    public string? MagicHeaderHex { get; set; }
Good: The XML comment warns about sensitive content exposure.
3. Top Tokens May Leak Sensitive Information
Settings.cs:326:
    /// Note: enabling this can surface sensitive keywords from scripts/logs.
    public static bool TopTokensEnabled { get; set; } = false;
Good: Disabled by default with clear warning.
4. Command Injection Risk - Low
SecurityHeuristics.Cmdlets.cs extracts module names and function names from PowerShell scripts. The extraction appears safe (no execution), but verify downstream consumers don't execute or eval these strings.
Performance Considerations
1. Multiple File Reads
The analysis flow may read the same file multiple times:
Recommendation: Consider caching the initial read buffer for reuse across heuristics.
2. String Allocations in Token Processing
FileInspector.TextMetrics.cs:73:
    var token = trimmed.ToString().ToLowerInvariant();
Each token creates two string allocations. For large files with many tokens, this could pressure GC.
3. BOM Detection on Every Read
FileInspector.TextReading.cs:25-42 -
Test Coverage
Strengths
Gaps
Recommendation: Add test coverage for the public APIs introduced in this PR.
Additional Observations
1. Version Bump
Version incremented from 1.0.5 to 1.0.6 across all projects - appropriate for a feature addition.
2. Documentation
XML comments are comprehensive and helpful. Consider adding a CHANGELOG.md entry for this release.
3. Backwards Compatibility
All changes appear to be additive (new properties, new methods) with no breaking changes. Existing code should continue to work.
4. Settings Proliferation
The
Verdict
Recommendation: Approve with Minor Revisions
This is a solid PR that adds valuable functionality with good code quality. The issues identified are minor and can be addressed in follow-up commits or the next PR.
Must Fix Before Merge
None - no critical issues found.
Should Fix Soon
Nice to Have
Great work overall! The new metadata and text analysis capabilities will be very useful.
* Refactored `ReadHeadText` to cache results for improved performance.
* Updated logging levels for error handling in metadata and text reading.
* Enhanced `FileMetadataOptions` to default sensitive fields to false.
* Introduced `TopTokensRedactPatterns` for sensitive token redaction.
* Adjusted strong candidate score threshold to use settings for flexibility.
Pull Request Review: v1.0.6 - PowerShell Detection & Metadata Enhancements
Overview
This PR introduces significant enhancements to FileInspectorX v1.0.6, adding PowerShell script detection, file system metadata reading capabilities, top token extraction, and improved text subtype mapping. The changes span 26 files with 1,206 additions and 62 deletions.
Code Quality & Best Practices
Strengths
Areas for Improvement
Performance Considerations
Optimizations
Potential Issues
Security Concerns
Security Strengths
Recommendations
Test Coverage
Good Coverage
Suggested Additions
Additional Observations
Positive Changes
Questions
Summary
High-quality PR with well-thought-out features and good defensive coding. Main improvements needed:
Recommendation: Approve with minor suggestions
Code is production-ready with good error handling, security awareness, and documentation. Suggested improvements can be addressed in follow-up work.
Review generated with Claude Code
No description provided.