Add dataset loader and ground truth pairing for FTE-HARM validation #2

Draft
Abumaude wants to merge 7 commits into main from claude/load-pair-ground-truth-01M9aGDtzBzBgCYgSCmjRYHS

Conversation

@Abumaude
Owner

This commit introduces a comprehensive dataset loading and ground truth pairing system for forensic log analysis validation:

  • dataset_loader.py: Core module with classes for:
    • DatasetScanner: Scans directories for log and label files
    • DatasetPairer: Matches logs with ground truth using multiple rules
    • GroundTruthLoader: Parses line-by-line, CSV, and JSON formats
    • DatasetValidator: Validates dataset integrity and pairing
    • DatasetStatsGenerator: Generates comprehensive statistics
    • DatasetIterator: Iterates through paired datasets
    • FTEHARMValidator: Complete validation workflow integration
  • Notebook additions: Demonstration cells showing:
    • Dataset configuration and scanning
    • Log-to-ground-truth pairing workflow
    • Validation and statistics generation
    • FTE-HARM integration examples
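
As a rough illustration of the scan-and-pair step attributed to DatasetScanner and DatasetPairer, here is a minimal sketch. It is not the code in dataset_loader.py: the function name `pair_logs_with_labels`, the `.log` extension, and the single matching rule (a label file at the same relative path under a separate label directory) are assumptions made for the example, whereas the PR describes multiple pairing rules.

```python
from pathlib import Path


def pair_logs_with_labels(log_dir, label_dir):
    """Pair each *.log file with a label file at the same relative path.

    Assumed rule (hypothetical): <log_dir>/a/b.log pairs with <label_dir>/a/b.log.
    Returns (pairs, unpaired_logs).
    """
    log_dir, label_dir = Path(log_dir), Path(label_dir)
    pairs, unpaired = [], []
    # sorted() keeps the output order stable across file systems.
    for log_path in sorted(log_dir.rglob("*.log")):
        candidate = label_dir / log_path.relative_to(log_dir)
        if candidate.is_file():
            pairs.append((log_path, candidate))
        else:
            unpaired.append(log_path)
    return pairs, unpaired
```

Returning the unpaired logs alongside the pairs makes it easy to report which logs have no ground truth, which is the kind of check a validator step would need.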

claude and others added 7 commits November 25, 2025 14:57
- Create dedicated dataset_loader.ipynb notebook with demonstration cells:
  - Module import and Google Drive mounting
  - Dataset path configuration
  - Directory scanning
  - Log-to-ground-truth pairing
  - Dataset validation
  - Statistics generation
  - Iteration workflow examples
  - FTE-HARM validation integration
  - Convenience functions reference
  - Validation checklist

- Restore AI_AGENTS_lab_8_(1).ipynb to original state

The dataset_loader.ipynb provides a complete Colab-ready workflow
for loading forensic log datasets and pairing them with ground truth
annotations for FTE-HARM validation.

Changes:
- Update GroundTruthLoader._parse_line_entry() to handle AIT dataset
  JSON format: {"labels": ["attacker_vpn"]} or {"labels": []}
- Non-empty labels list = malicious (binary=1)
- Empty labels list = benign (binary=0)
- Embed full dataset_loader module code directly in notebook
  (Colab cannot import a local .py file unless it is uploaded or mounted)
- Remove external import dependency
- Streamlined notebook workflow with 8 steps
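
The label rule described in this commit (non-empty `labels` list means malicious, empty list means benign) can be sketched as below. This is an illustration only, not the body of `GroundTruthLoader._parse_line_entry()`: the standalone function name `label_line_to_binary` is hypothetical, and treating a missing `labels` key as benign is an assumption.

```python
import json


def label_line_to_binary(line):
    """Map one AIT-style ground-truth line to 1 (malicious) or 0 (benign)."""
    entry = json.loads(line)
    # Assumption: a line without a "labels" key is treated like an empty list.
    labels = entry.get("labels", [])
    return 1 if labels else 0


print(label_line_to_binary('{"labels": ["attacker_vpn"]}'))  # 1
print(label_line_to_binary('{"labels": []}'))                # 0
```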

Implements simplified FTE-HARM validation with:
- ONE hypothesis (first label from dataset)
- ONE P_Score method (Option A: Binary Presence)
- ONE validation approach (Binary: TP/FP/TN/FN)

Features:
- Flexible dataset discovery (handles variable naming conventions)
- Label discovery across all datasets
- Physical Token Quantization for entity extraction
- Binary validation with confusion matrix and metrics
- Results saved to Google Drive
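
For reference, binary validation with TP/FP/TN/FN counts and the usual derived metrics can be sketched as follows. The function names are hypothetical and the notebook's own implementation may differ; the sketch assumes ground truth and predictions are aligned lists of 0/1 values, one per log line.

```python
def confusion_counts(y_true, y_pred):
    """Count TP/FP/TN/FN for aligned lists of 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["TP" if truth == 1 else "FP"] += 1
        else:
            counts["FN" if truth == 1 else "TN"] += 1
    return counts


def binary_metrics(counts):
    """Derive precision, recall, F1, and accuracy; empty denominators give 0.0."""
    tp, fp, tn, fn = (counts[k] for k in ("TP", "FP", "TN", "FN"))
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Guarding each division matters in practice because a dataset with no malicious lines, or a run that flags nothing, would otherwise raise `ZeroDivisionError`.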

Changes:
- Add Cell 2: Load MITRE ATT&CK hypotheses from summary folder
- Add Cell 5: Select hypothesis (MITRE or fallback)
- Add SUMMARY_PATH and MITRE_PATH configuration
- Update validation to use target_labels (list) instead of single label
- Save results to summary folder instead of root output path
- Add hypothesis_source tracking in results

Hypothesis loading priority:
1. mitre_att&ck/fte_harm_hypotheses.json
2. summary/fte_harm_config_latest.json
3. summary/fte_harm_hypotheses.json
4. Fallback from discovered labels
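
The loading priority above amounts to "first existing file wins, otherwise build hypotheses from the discovered labels". A minimal sketch under stated assumptions: the three relative paths come from the list above, but resolving them against a single `base_dir`, the function name `load_hypotheses`, and the shape of the fallback structure are all hypothetical (the notebook configures SUMMARY_PATH and MITRE_PATH separately).

```python
import json
from pathlib import Path

# Candidate files in priority order, as listed in the commit message.
CANDIDATES = (
    "mitre_att&ck/fte_harm_hypotheses.json",
    "summary/fte_harm_config_latest.json",
    "summary/fte_harm_hypotheses.json",
)


def load_hypotheses(base_dir, discovered_labels):
    """Return (hypotheses, source) from the first candidate file that exists."""
    base = Path(base_dir)
    for rel in CANDIDATES:
        path = base / rel
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                return json.load(fh), rel
    # Fallback (hypothetical structure): one hypothesis per discovered label.
    fallback = {"hypotheses": [{"label": label} for label in discovered_labels]}
    return fallback, "fallback"
```

Returning the source next to the data mirrors the `hypothesis_source` tracking mentioned in the changes, so saved results can record where the hypotheses came from.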

Features:
- 10 dataset-specific hypotheses targeting AITv2 labels
- MITRE ATT&CK metadata for forensic corroboration
- Threshold testing prioritizing HIGH RECALL
- Two P_Score methods: Option A (Binary) & B3 (Confidence-Weighted)
- Two-stage validation: detection + hypothesis matching
- MITRE corroboration table generation
- Automatic label mapping between hypotheses and ground truth

Hypotheses cover:
- Privilege Escalation (T1548.003, T1068)
- Discovery/Scanning (T1046)
- Credential Access (T1110.001)
- Persistence (T1505.003)
- Exfiltration (T1048, T1071.004)
- Lateral Movement (T1021.004)
- Command & Control (T1059, T1071.001)

Output paths:
- summary/: Main validation results
- threshold_test/: Threshold analysis
- mitre_att&ck/: MITRE corroboration tables
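
The threshold testing that prioritizes high recall can be pictured as a sweep over candidate thresholds, keeping the one with the best recall. This is a sketch, not the notebook's procedure: the function name `pick_threshold`, the rule that a line is flagged when its score is at or above the threshold, and the tie-breaking order (precision, then the higher threshold) are assumptions.

```python
def pick_threshold(scores, y_true, thresholds):
    """Return (threshold, recall, precision) maximizing recall.

    Ties on recall go to higher precision, then to the higher threshold.
    Assumption: a line is flagged as malicious when score >= threshold.
    """
    best_key = None
    for th in thresholds:
        tp = sum(1 for s, t in zip(scores, y_true) if s >= th and t == 1)
        fp = sum(1 for s, t in zip(scores, y_true) if s >= th and t == 0)
        fn = sum(1 for s, t in zip(scores, y_true) if s < th and t == 1)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        key = (recall, precision, th)
        if best_key is None or key > best_key:
            best_key = key
    if best_key is None:
        raise ValueError("thresholds must not be empty")
    recall, precision, th = best_key
    return th, recall, precision
```

Ranking by recall first reflects the stated priority: in a forensic setting a missed malicious line is usually costlier than an extra line to review, and precision only decides between thresholds that miss equally little.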