Add dataset loader and ground truth pairing for FTE-HARM validation by Abumaude · Pull Request #2 · Abumaude/AI-Foolosophy

Abumaude · 2025-11-25T15:28:35Z

This commit introduces a comprehensive dataset loading and ground truth pairing system for forensic log analysis validation:

dataset_loader.py: Core module with classes for:
- DatasetScanner: Scans directories for log and label files
- DatasetPairer: Matches logs with ground truth using multiple rules
- GroundTruthLoader: Parses line-by-line, CSV, and JSON formats
- DatasetValidator: Validates dataset integrity and pairing
- DatasetStatsGenerator: Generates comprehensive statistics
- DatasetIterator: Iterates through paired datasets
- FTEHARMValidator: Complete validation workflow integration
Notebook additions: Demonstration cells showing:
- Dataset configuration and scanning
- Log-ground truth pairing workflow
- Validation and statistics generation
- FTE-HARM integration examples
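
As an illustration of the "multiple rules" pairing idea, here is a minimal sketch. It is not the actual DatasetPairer code: the function name `pair_logs_with_labels` and the two rules (exact file name match, then stem match) are assumptions chosen for the example.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def pair_logs_with_labels(logs: List[str], labels: List[str]) -> Dict[str, Optional[str]]:
    """Pair each log path with a label path using two hypothetical rules.

    Rule 1: the label file has exactly the same file name as the log.
    Rule 2: the label file shares the log's stem (name without last suffix).
    Logs with no match map to None so they can be reported later.
    """
    by_name = {PurePosixPath(p).name: p for p in labels}
    by_stem = {PurePosixPath(p).stem: p for p in labels}

    pairs: Dict[str, Optional[str]] = {}
    for log in logs:
        log_path = PurePosixPath(log)
        match = by_name.get(log_path.name)        # Rule 1
        if match is None:
            match = by_stem.get(log_path.stem)    # Rule 2
        pairs[log] = match
    return pairs


if __name__ == "__main__":
    example = pair_logs_with_labels(
        ["gather/host1/auth.log", "gather/host1/dns.log"],
        ["labels/host1/auth.log", "labels/host1/dns.json"],
    )
    for log, label in example.items():
        print(log, "->", label)
```

Unmatched logs are kept in the result with a `None` value so that a validator step can flag them instead of silently dropping them.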


- Create dedicated dataset_loader.ipynb notebook with demonstration cells:
  - Module import and Google Drive mounting
  - Dataset path configuration
  - Directory scanning
  - Log-ground truth pairing
  - Dataset validation
  - Statistics generation
  - Iteration workflow examples
  - FTE-HARM validation integration
  - Convenience functions reference
  - Validation checklist
- Restore AI_AGENTS_lab_8_(1).ipynb to original state

The dataset_loader.ipynb provides a complete Colab-ready workflow for loading forensic log datasets and pairing them with ground truth annotations for FTE-HARM validation.

Changes:
- Update GroundTruthLoader._parse_line_entry() to handle AIT dataset JSON format: {"labels": ["attacker_vpn"]} or {"labels": []}
  - Non-empty labels list = malicious (binary=1)
  - Empty labels list = benign (binary=0)
- Embed full dataset_loader module code directly in notebook (Colab doesn't load external .py files)
- Remove external import dependency
- Streamlined notebook workflow with 8 steps
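
The label rule described above can be sketched as follows. This is a simplified stand-in, not the real `_parse_line_entry()` body: the function name `parse_label_line` and the returned dictionary shape are assumptions, and only the `{"labels": [...]}` format quoted in the commit message is handled.

```python
import json
from typing import Any, Dict


def parse_label_line(line: str) -> Dict[str, Any]:
    """Parse one AIT-style label line into labels plus a binary flag.

    A non-empty "labels" list is treated as malicious (binary=1),
    an empty or missing list as benign (binary=0).
    """
    entry = json.loads(line)
    labels = entry.get("labels", [])
    return {"labels": labels, "binary": 1 if labels else 0}


if __name__ == "__main__":
    print(parse_label_line('{"labels": ["attacker_vpn"]}'))
    print(parse_label_line('{"labels": []}'))
```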

Implements simplified FTE-HARM validation with:
- ONE hypothesis (first label from dataset)
- ONE P_Score method (Option A: Binary Presence)
- ONE validation approach (Binary: TP/FP/TN/FN)

Features:
- Flexible dataset discovery (handles variable naming conventions)
- Label discovery across all datasets
- Physical Token Quantization for entity extraction
- Binary validation with confusion matrix and metrics
- Results saved to Google Drive
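
For reference, binary validation with TP/FP/TN/FN counts can be sketched like this. The function name `binary_metrics` and the choice of reported metrics (precision, recall, F1, accuracy) are assumptions; the notebook may compute a different set.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix counts and common metrics for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "precision": precision, "recall": recall,
        "f1": f1, "accuracy": accuracy,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```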

Changes:
- Add Cell 2: Load MITRE ATT&CK hypotheses from summary folder
- Add Cell 5: Select hypothesis (MITRE or fallback)
- Add SUMMARY_PATH and MITRE_PATH configuration
- Update validation to use target_labels (list) instead of single label
- Save results to summary folder instead of root output path
- Add hypothesis_source tracking in results

Hypothesis loading priority:
1. mitre_att&ck/fte_harm_hypotheses.json
2. summary/fte_harm_config_latest.json
3. summary/fte_harm_hypotheses.json
4. Fallback from discovered labels
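
The loading priority above can be sketched as a first-match search. Several details here are assumptions: a single `output_root` directory stands in for the separate SUMMARY_PATH and MITRE_PATH settings, the function name `load_hypotheses` is hypothetical, the JSON content is returned unparsed beyond `json.load` because its schema is not described, and the fallback structure (`{"target_labels": [label]}`) is invented for illustration.

```python
import json
from pathlib import Path
from typing import Any, List, Tuple

# Relative paths in the priority order listed in the commit message.
CANDIDATES = [
    "mitre_att&ck/fte_harm_hypotheses.json",
    "summary/fte_harm_config_latest.json",
    "summary/fte_harm_hypotheses.json",
]


def load_hypotheses(output_root: Path, discovered_labels: List[str]) -> Tuple[Any, str]:
    """Return (hypotheses, source) from the first candidate file that exists.

    If none of the files exist, build a placeholder hypothesis per
    discovered label and report the source as "fallback_from_labels".
    """
    for rel in CANDIDATES:
        candidate = output_root / rel
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as fh:
                return json.load(fh), rel

    # Hypothetical fallback shape; the real notebook may differ.
    fallback = [{"target_labels": [label]} for label in discovered_labels]
    return fallback, "fallback_from_labels"
```

Returning the source alongside the data makes it straightforward to record something like hypothesis_source in the saved results.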

Features:
- 10 dataset-specific hypotheses targeting AITv2 labels
- MITRE ATT&CK metadata for forensic corroboration
- Threshold testing prioritizing HIGH RECALL
- Two P_Score methods: Option A (Binary) & B3 (Confidence-Weighted)
- Two-stage validation: detection + hypothesis matching
- MITRE corroboration table generation
- Automatic label mapping between hypotheses and ground truth

Hypotheses cover:
- Privilege Escalation (T1548.003, T1068)
- Discovery/Scanning (T1046)
- Credential Access (T1110.001)
- Persistence (T1505.003)
- Exfiltration (T1048, T1071.004)
- Lateral Movement (T1021.004)
- Command & Control (T1059, T1071.001)

Output paths:
- summary/: Main validation results
- threshold_test/: Threshold analysis
- mitre_att&ck/: MITRE corroboration tables
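
One way to read "threshold testing prioritizing HIGH RECALL" is a sweep that keeps the threshold with the best recall and uses precision only as a tie-breaker. The sketch below works under that assumption. The function names, the `score >= threshold` decision rule, and the example scores are all hypothetical; the P_Score methods themselves are not reproduced here.

```python
from typing import List, Sequence, Tuple

# (threshold, recall, precision)
SweepRow = Tuple[float, float, float]


def sweep_thresholds(scores: Sequence[float], y_true: Sequence[int],
                     thresholds: Sequence[float]) -> List[SweepRow]:
    """Evaluate recall and precision at each candidate threshold."""
    rows: List[SweepRow] = []
    for th in thresholds:
        preds = [1 if s >= th else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        rows.append((th, recall, precision))
    return rows


def pick_high_recall(rows: Sequence[SweepRow]) -> SweepRow:
    """Prefer the highest recall; break ties by precision."""
    return max(rows, key=lambda r: (r[1], r[2]))


if __name__ == "__main__":
    rows = sweep_thresholds([0.9, 0.6, 0.4, 0.2], [1, 1, 0, 0], [0.3, 0.5, 0.7])
    print(pick_high_recall(rows))
```

Ranking by recall first reflects the forensic setting, where a missed malicious entry is usually costlier than an extra alert to review.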

claude and others added 7 commits November 25, 2025 14:57

Create Thesis

4ed5759

Abumaude wants to merge 7 commits into main from claude/load-pair-ground-truth-01M9aGDtzBzBgCYgSCmjRYHS

Abumaude commented Nov 25, 2025


2 participants
