Add dataset loader and ground truth pairing for FTE-HARM validation (#2)
This commit introduces a comprehensive dataset loading and ground truth pairing system for forensic log analysis validation:

- dataset_loader.py: Core module with classes for:
  - DatasetScanner: Scans directories for log and label files
  - DatasetPairer: Matches logs with ground truth using multiple rules
  - GroundTruthLoader: Parses line-by-line, CSV, and JSON formats
  - DatasetValidator: Validates dataset integrity and pairing
  - DatasetStatsGenerator: Generates comprehensive statistics
  - DatasetIterator: Iterates through paired datasets
  - FTEHARMValidator: Complete validation workflow integration
- Notebook additions: Demonstration cells showing:
  - Dataset configuration and scanning
  - Log-ground truth pairing workflow
  - Validation and statistics generation
  - FTE-HARM integration examples
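As a rough illustration of the pairing idea (the actual DatasetPairer applies multiple matching rules; the function and parameter names below are invented for this sketch), matching log files to ground-truth files by shared filename stem might look like:

```python
from pathlib import Path

def pair_logs_with_labels(log_dir, label_dir):
    """Pair each *.log file with a ground-truth file sharing its stem.

    Minimal sketch only: the real DatasetPairer supports additional
    matching rules beyond exact stem equality.
    """
    # Index label files by stem, e.g. "apache.json" -> key "apache"
    labels = {p.stem: p for p in Path(label_dir).glob("*") if p.is_file()}
    pairs = []
    for log in sorted(Path(log_dir).glob("*.log")):
        if log.stem in labels:
            pairs.append((log, labels[log.stem]))
    return pairs
```

Logs without a matching label file are simply skipped here; a validator (as described above) would flag them instead.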
- Create dedicated dataset_loader.ipynb notebook with demonstration cells:
  - Module import and Google Drive mounting
  - Dataset path configuration
  - Directory scanning
  - Log-ground truth pairing
  - Dataset validation
  - Statistics generation
  - Iteration workflow examples
  - FTE-HARM validation integration
  - Convenience functions reference
  - Validation checklist
- Restore AI_AGENTS_lab_8_(1).ipynb to its original state

The dataset_loader.ipynb notebook provides a complete, Colab-ready workflow for loading forensic log datasets and pairing them with ground-truth annotations for FTE-HARM validation.
Changes:
- Update GroundTruthLoader._parse_line_entry() to handle the AIT dataset
  JSON format: {"labels": ["attacker_vpn"]} or {"labels": []}
  - A non-empty labels list means malicious (binary=1)
  - An empty labels list means benign (binary=0)
- Embed the full dataset_loader module code directly in the notebook
  (Colab doesn't load external .py files)
- Remove the external import dependency
- Streamline the notebook workflow to 8 steps
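The binary mapping described above can be sketched as follows (a minimal stand-in for the updated `_parse_line_entry()`; the helper name is ours, not the module's API):

```python
import json

def parse_ait_label_line(line):
    """Map one AIT-style ground-truth line to a binary label.

    {"labels": ["attacker_vpn"]} -> 1 (malicious)
    {"labels": []}               -> 0 (benign)
    """
    entry = json.loads(line)
    # Any non-empty labels list counts as malicious
    return 1 if entry.get("labels") else 0
```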
Implements simplified FTE-HARM validation with:

- ONE hypothesis (first label from dataset)
- ONE P_Score method (Option A: Binary Presence)
- ONE validation approach (Binary: TP/FP/TN/FN)

Features:

- Flexible dataset discovery (handles variable naming conventions)
- Label discovery across all datasets
- Physical Token Quantization for entity extraction
- Binary validation with confusion matrix and metrics
- Results saved to Google Drive
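A minimal sketch of the binary TP/FP/TN/FN validation approach (illustrative only; the function name and return shape are our assumptions, not the notebook's actual API):

```python
def binary_validate(predictions, ground_truth):
    """Count confusion-matrix cells and derive precision/recall.

    predictions and ground_truth are aligned lists of 0/1 labels.
    """
    tp = sum(1 for p, g in zip(predictions, ground_truth) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predictions, ground_truth) if p == 1 and g == 0)
    tn = sum(1 for p, g in zip(predictions, ground_truth) if p == 0 and g == 0)
    fn = sum(1 for p, g in zip(predictions, ground_truth) if p == 0 and g == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "precision": precision, "recall": recall}
```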
Changes:

- Add Cell 2: Load MITRE ATT&CK hypotheses from the summary folder
- Add Cell 5: Select hypothesis (MITRE or fallback)
- Add SUMMARY_PATH and MITRE_PATH configuration
- Update validation to use target_labels (a list) instead of a single label
- Save results to the summary folder instead of the root output path
- Add hypothesis_source tracking in results

Hypothesis loading priority:

1. mitre_att&ck/fte_harm_hypotheses.json
2. summary/fte_harm_config_latest.json
3. summary/fte_harm_hypotheses.json
4. Fallback from discovered labels
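The priority chain above could be implemented along these lines (a sketch under our own assumptions; the loader function, its signature, and the fallback hypothesis shape are invented here):

```python
import json
from pathlib import Path

# Candidate files, checked in the priority order listed above
HYPOTHESIS_SOURCES = [
    "mitre_att&ck/fte_harm_hypotheses.json",
    "summary/fte_harm_config_latest.json",
    "summary/fte_harm_hypotheses.json",
]

def load_hypotheses(base_dir, fallback_labels=None):
    """Return (hypotheses, source) from the first file that exists.

    If none exist, derive trivial hypotheses from discovered labels
    so hypothesis_source can still be recorded in the results.
    """
    for rel in HYPOTHESIS_SOURCES:
        path = Path(base_dir) / rel
        if path.exists():
            return json.loads(path.read_text()), rel
    return [{"label": lbl} for lbl in (fallback_labels or [])], "fallback"
```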
Features:

- 10 dataset-specific hypotheses targeting AITv2 labels
- MITRE ATT&CK metadata for forensic corroboration
- Threshold testing prioritizing HIGH RECALL
- Two P_Score methods: Option A (Binary) and B3 (Confidence-Weighted)
- Two-stage validation: detection + hypothesis matching
- MITRE corroboration table generation
- Automatic label mapping between hypotheses and ground truth

Hypotheses cover:

- Privilege Escalation (T1548.003, T1068)
- Discovery/Scanning (T1046)
- Credential Access (T1110.001)
- Persistence (T1505.003)
- Exfiltration (T1048, T1071.004)
- Lateral Movement (T1021.004)
- Command & Control (T1059, T1071.001)

Output paths:

- summary/: Main validation results
- threshold_test/: Threshold analysis
- mitre_att&ck/: MITRE corroboration tables
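One way to read "threshold testing prioritizing HIGH RECALL" is a sweep that keeps raising the decision threshold on P_Scores only while recall stays above a floor. This is a sketch under our own assumptions, not the notebook's actual sweep:

```python
def pick_threshold(scores, ground_truth, candidates, min_recall=0.9):
    """Return the highest candidate threshold whose recall >= min_recall.

    scores: per-item P_Scores; ground_truth: aligned 0/1 labels.
    Returns None if no candidate meets the recall floor.
    """
    best = None
    for t in sorted(candidates):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, g in zip(preds, ground_truth) if p == 1 and g == 1)
        fn = sum(1 for p, g in zip(preds, ground_truth) if p == 0 and g == 1)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if recall >= min_recall:
            best = t  # keep raising the threshold while recall holds
    return best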
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit introduces a comprehensive dataset loading and ground truth pairing system for forensic log analysis validation:
dataset_loader.py: Core module with classes for:
Notebook additions: Demonstration cells showing: