Merged
24 changes: 24 additions & 0 deletions .github/workflows/run-yara-forge.yml
@@ -47,3 +47,27 @@ jobs:
- name: Run YARA-Forge
  run: |
    python yara-forge.py

- name: Upload build statistics
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: build-statistics
    path: build_stats.md
    retention-days: 30

- name: Upload build log
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: build-log
    path: yara-forge.log
    retention-days: 30

- name: Upload rule issues
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: rule-issues
    path: yara-forge-rule-issues.yml
    retention-days: 30
24 changes: 24 additions & 0 deletions .github/workflows/weekly-release.yml
@@ -43,6 +43,30 @@ jobs:
  run: |
    python yara-forge.py

- name: Upload build statistics
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: build-statistics
    path: build_stats.md
    retention-days: 90

- name: Upload build log
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: build-log
    path: yara-forge.log
    retention-days: 90

- name: Upload rule issues
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: rule-issues
    path: yara-forge-rule-issues.yml
    retention-days: 90

- name: Get current date
  run: echo "CURRENT_DATE=$(date +'%Y%m%d')" >> $GITHUB_ENV
  shell: bash
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ repos/*
yara-forge-rule-issues.yml
build_stats.md
yara-forge-config-testing.yml
.claude/settings.local.json
26 changes: 18 additions & 8 deletions README.md
@@ -1,17 +1,27 @@
# yara-forge
# YARA Forge

Automated YARA Rule Standardization and Quality Assurance Tool

YARA Forge is a robust tool designed to streamline the process of sourcing, standardizing, and optimizing YARA rules. It automates the collection of rules from various online repositories, ensures they adhere to a unified standard, conducts thorough quality checks, and eliminates any broken or non-compliant rules.
YARA Forge collects YARA rules from 45+ vetted security repositories, standardizes their metadata, performs multi-level quality checks, and generates tiered rule packages (core/extended/full) ready for integration into security products. It handles deduplication, private rule dependencies, and custom scoring to produce consistent, reliable rule sets for malware detection and threat hunting.

The tool generates curated rule packages, ready for integration into various security products, with an emphasis on performance and stability.
The tool is used by security teams and analysts who need curated YARA rules without manually managing multiple sources. Weekly releases are published automatically via GitHub Actions.

Perfect for analysts and security teams seeking consistent, reliable, and effective YARA rules.
## Components

This [web page](https://yarahq.github.io/) contains all information on the YARA Forge project.

Note: the repositories used for YARA Forge have been carefully selected. If you want to add other sets that random people publish on the Internet, you're on your own.
- `yara-forge.py` — CLI entry point and pipeline orchestrator
- `main/` — Rule collection, processing, and output generation
- `qa/` — Quality assurance checks and validation
- `packages/` — Generated rule packages (core, extended, full)

## Documentation

Detailed technical documentation on code structure, modules, classes, and functions: [code-structure.md](./docs/code-structure.md)
- **[Project Map / IKL](./docs/index.md)** — Navigation guide for the codebase
- [Architecture](./docs/architecture.md) — System design and data flows
- [Code Structure](./docs/code-structure.md) — API reference for modules and functions

## Quick Links

- [YARA Forge Website](https://yarahq.github.io/) — Official project page
- [GitHub Releases](https://github.com/YARAHQ/yara-forge/releases) — Weekly rule packages

> **Note:** The repositories used for YARA Forge have been carefully selected. Adding unvetted sources is not supported.
245 changes: 245 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,245 @@
# YARA Forge — Architecture

## Overview

YARA Forge is a batch processing pipeline that transforms YARA rules from multiple source repositories into standardized, quality-checked rule packages.

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ 45+ GitHub      │     │                  │     │  Rule Packages  │
│ Repositories    │────▶│   YARA Forge     │────▶│  (core/ext/     │
│ (YARA rules)    │     │   Pipeline       │     │   full .yar)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                          ┌──────────────┐
                          │  QA Reports  │
                          │  Build Stats │
                          └──────────────┘
```

## Major Components

### 1. CLI Orchestrator (`yara-forge.py`)

The entry point that:

- Parses command-line arguments (`--debug`, `-c`)
- Configures logging (console + file)
- Loads configuration from YAML
- Executes pipeline stages in sequence
- Reports status via section headers
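
The argument parsing and logging setup listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `dest` name `config`, the logger name, and the default paths (taken from file names that appear elsewhere in this change) are assumptions.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse the two options the list mentions: --debug and -c."""
    parser = argparse.ArgumentParser(description="YARA Forge (illustrative sketch)")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    # The dest name and the default path are assumptions for this sketch.
    parser.add_argument("-c", dest="config", default="yara-forge-config.yml",
                        help="path to the YAML configuration file")
    return parser.parse_args(argv)


def configure_logging(debug, log_file="yara-forge.log"):
    """Attach a console handler and a file handler to one logger."""
    logger = logging.getLogger("yara-forge-sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Passing `argv=None` lets `argparse` fall back to `sys.argv`, while tests can pass an explicit list.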

### 2. Rule Collector (`main/rule_collector.py`)

Responsible for:

- Cloning/updating Git repositories
- Sparse checkout for repositories with path filters
- Extracting license files
- Finding all `.yar`/`.yara` files
- Parsing rules via plyara library
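
The file discovery step can be illustrated with the standard library alone. The sketch below assumes a repository has already been cloned to a local directory; the function name, the `sub_path` parameter, and the case-insensitive extension match are assumptions, and the text of each file found would then be handed to plyara for parsing.

```python
from pathlib import Path

YARA_EXTENSIONS = {".yar", ".yara"}


def find_yara_files(repo_root, sub_path=""):
    """Return all .yar/.yara files below repo_root/sub_path, sorted for stable output."""
    base = Path(repo_root) / sub_path
    return sorted(
        p for p in base.rglob("*")
        # Matching the suffix case-insensitively is an assumption of this sketch.
        if p.is_file() and p.suffix.lower() in YARA_EXTENSIONS
    )
```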

### 3. Rule Processors (`main/rule_processors.py`)

The largest module, handling:

- Logic-hash deduplication across repositories
- Metadata standardization (author, date, description, hashes)
- Rule name prefixing with repository identifier
- UUID generation for tracking
- Tag extraction and normalization
- Score evaluation and importance assignment
- Private rule dependency management
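
To make the deduplication and prefixing steps concrete, here is a simplified sketch. The hash function is a stand-in that hashes the string values and the whitespace-normalized condition with SHA-256; it is not the logic hash the project computes. The `PREFIX_name` format and the shape of the `repos` argument are likewise assumptions.

```python
import hashlib
import re


def sketch_logic_hash(rule):
    """Stand-in hash over string values and the condition, ignoring whitespace."""
    values = sorted(str(s.get("value", "")) for s in rule.get("strings", []))
    condition = re.sub(r"\s+", " ", rule.get("raw_condition", "")).strip()
    payload = "\n".join(values + [condition])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate_and_prefix(repos):
    """Keep the first rule seen for each hash and prefix its name.

    `repos` maps a (hypothetical) repository prefix to a list of rule dicts.
    """
    seen = set()
    kept = []
    for prefix, rules in repos.items():
        for rule in rules:
            logic_hash = sketch_logic_hash(rule)
            if logic_hash in seen:
                continue  # same detection logic already taken from an earlier repo
            seen.add(logic_hash)
            kept.append({
                **rule,
                "logic_hash": logic_hash,
                "original_rule_name": rule["rule_name"],
                "rule_name": f"{prefix}_{rule['rule_name']}",
            })
    return kept
```

Because the first occurrence wins, the order in which repositories are processed decides which copy of a duplicated rule survives.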

### 4. Quality Assurance (`qa/rule_qa.py` + `qa/yaraQA/`)

Performs:

- Syntax validation (compile test)
- Critical issue detection (Level 4 → rule rejected)
- Efficiency analysis via yaraQA submodule
- Performance benchmarking
- Quality score adjustments
- Issue reporting to YAML
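
The relationship between issue levels and rule quality can be sketched as follows. Only two behaviours come from this document: a Level 4 issue rejects the rule, and Level 2-3 issues reduce its score. The function name and the penalty values are placeholders, not the project's real numbers.

```python
def apply_issues(base_quality, issue_levels, reduction_per_level=None):
    """Return (accepted, quality) for a rule given the levels of its issues."""
    if reduction_per_level is None:
        reduction_per_level = {2: 5, 3: 20}  # hypothetical penalties
    if any(level >= 4 for level in issue_levels):
        return False, base_quality  # critical issue: the rule is rejected
    quality = base_quality
    for level in issue_levels:
        # Levels without a configured penalty leave the quality unchanged.
        quality -= reduction_per_level.get(level, 0)
    return True, quality
```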

### 5. Rule Output (`main/rule_output.py`)

Generates:

- Filtered rule packages based on thresholds
- Private rule inclusion for dependencies
- Package headers with metadata
- Build statistics for releases

## Data Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│ CONFIGURATION │
│ yara-forge-config.yml yara-forge-custom-scoring.yml │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STAGE 1: COLLECT │
│ rule_collector.retrieve_yara_rule_sets() │
│ │
│ Input: Config (repository URLs, paths, branches) │
│ Output: yara_rule_repo_sets[] │
│ [{name, url, author, quality, license, commit, │
│ rules_sets: [{file_path, rules: [parsed_rule]}]}] │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STAGE 2: PROCESS │
│ rule_processors.process_yara_rules() │
│ │
│ Transformations: │
│ - Deduplicate by logic hash │
│ - Prefix rule names with repo identifier │
│ - Normalize metadata fields │
│ - Generate UUIDs │
│ - Extract and normalize tags │
│ - Calculate scores and importance │
│ - Track private rule dependencies │
│ │
│ Output: processed_yara_repos[] (same structure, enriched rules) │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STAGE 3: QA │
│ rule_qa.evaluate_rules_quality() │
│ │
│ Checks: │
│ - Critical syntax issues (Level 4 → reject rule) │
│ - Syntax warnings (Level 2-3 → reduce score) │
│ - yaraQA efficiency analysis │
│ - Custom quality reductions (noisy-rules config) │
│ │
│ Side effects: │
│ - Updates rule quality scores │
│ - Writes yara-forge-rule-issues.yml │
│ │
│ Output: evaluated_yara_repos[] (quality-filtered) │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STAGE 4: OUTPUT │
│ rule_output.write_yara_packages() │
│ │
│ For each package (core, extended, full): │
│ - Filter by: minimum_quality, minimum_score, age range │
│ - Apply importance overrides (force include/exclude) │
│ - Collect required private rules │
│ - Generate .yar file with headers │
│ - Track statistics │
│ │
│ Output: │
│ - packages/core/yara-rules-core.yar │
│ - packages/extended/yara-rules-extended.yar │
│ - packages/full/yara-rules-full.yar │
│ - build_stats.md │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STAGE 5: VALIDATE │
│ rule_qa.check_yara_packages() │
│ │
│ - Compile each generated package with YARA engine │
│ - Exit 0 on success, 1 on failure │
└─────────────────────────────────────────────────────────────────────┘
```
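
The five stages above chain together in a straightforward way. The sketch below passes the stages in as callables so that it stays self-contained; the parameter names and the call signatures are assumptions, and the real functions named in the diagram may take different arguments.

```python
def run_pipeline(collect, process, evaluate, write_packages, check_packages, config):
    """Run the five stages in order and return a process exit code."""
    repo_sets = collect(config)                   # Stage 1: COLLECT
    processed = process(repo_sets, config)        # Stage 2: PROCESS
    evaluated = evaluate(processed, config)       # Stage 3: QA
    packages = write_packages(evaluated, config)  # Stage 4: OUTPUT
    ok = check_packages(packages)                 # Stage 5: VALIDATE
    return 0 if ok else 1                         # 0 on success, 1 on failure
```

A caller would typically hand the result to `sys.exit` so that a failed package compilation fails the CI job.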

## Key Data Structures

### Rule (plyara format)

```python
{
    'rule_name': str,
    'metadata': [{'key': value}, ...],
    'strings': [...],
    'condition_terms': [...],
    'raw_condition': str,
    'scopes': ['private', ...],  # optional
    # Added by YARA Forge:
    'logic_hash': str,
    'original_rule_name': str,
    'private_rules_used': [str, ...],
}
```
```
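
Because `metadata` is a list of single-key dictionaries rather than one flat dictionary, reading or updating a field means scanning the list. The helpers below are a sketch of that access pattern; their names are hypothetical and they are not part of plyara or YARA Forge.

```python
def get_meta(rule, key, default=None):
    """Return the first metadata value stored under `key`, or `default`."""
    for entry in rule.get("metadata", []):
        if key in entry:
            return entry[key]
    return default


def set_meta(rule, key, value):
    """Replace the first entry for `key`, or append a new single-key dict."""
    metadata = rule.setdefault("metadata", [])
    for entry in metadata:
        if key in entry:
            entry[key] = value
            return
    metadata.append({key: value})


def is_private(rule):
    """True when the optional 'scopes' list marks the rule as private."""
    return "private" in rule.get("scopes", [])
```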

### Repository Set

```python
{
    'name': str,       # e.g., "signature-base"
    'url': str,        # GitHub URL
    'author': str,
    'quality': int,    # Base quality score (70-90)
    'license': str,    # License text
    'commit': str,     # Git commit hash
    'rules_sets': [
        {
            'file_path': str,
            'rules': [rule, ...]
        }
    ]
}
```
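
The nesting of repository, file, and rule is easiest to see when walking it. The following sketch flattens the structure into one stream of rules; the function names are illustrative and only the keys shown above are used.

```python
def iter_rules(repo_sets):
    """Yield (repo_name, file_path, rule) for every rule in every repository set."""
    for repo in repo_sets:
        for rule_set in repo.get("rules_sets", []):
            for rule in rule_set.get("rules", []):
                yield repo["name"], rule_set["file_path"], rule


def count_rules_per_repo(repo_sets):
    """Count rules per repository; repositories without rules are omitted."""
    counts = {}
    for repo_name, _file_path, _rule in iter_rules(repo_sets):
        counts[repo_name] = counts.get(repo_name, 0) + 1
    return counts
```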

## External Dependencies

### Python Libraries

| Dependency | Usage |
| ---------- | ----- |
| plyara | YARA rule parsing and manipulation |
| yara-python | Rule compilation and validation |
| GitPython | Repository cloning and history |
| PyYAML | Configuration loading |
| dateparser | Flexible date parsing |
| pyre2 (fb-re2) | Regex performance analysis |

### External Services

| Service | Usage |
| ------- | ----- |
| GitHub | Source repositories (via Git clone) |

### Submodules

| Submodule | Location | Purpose |
| --------- | -------- | ------- |
| yaraQA | `qa/yaraQA/` | Advanced rule efficiency analysis |

## Boundaries

### Internal (This Project)

- Rule collection, processing, QA, output
- Configuration management
- Pipeline orchestration
- Build statistics

### External (Not This Project)

- Source YARA rule content (from external repos)
- yaraQA analysis logic (submodule)
- YARA engine compilation
- Git operations (GitPython wrapper)

## Package Tiers

| Package | Quality | Score | Age | Use Case |
| ------- | ------- | ----- | --- | -------- |
| core | >= 70 | >= 65 | 1-2500 days | Production, low FP tolerance |
| extended | >= 50 | >= 60 | 1-5000 days | Broader coverage, moderate FP |
| full | >= 20 | >= 40 | 0-10000 days | Research, threat hunting |

Importance levels can override these thresholds via `force_include_importance_level` and `force_exclude_importance_level` settings.
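
The tier thresholds in the table translate into a simple membership test. The sketch below encodes the table values directly. How the two importance settings behave is an assumption made for illustration (importance at or above the include level always passes, importance below the exclude level never passes), as are the inclusive age bounds and the `Tier` and `belongs_in` names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    min_quality: int
    min_score: int
    min_age_days: int
    max_age_days: int
    force_include_importance_level: Optional[int] = None
    force_exclude_importance_level: Optional[int] = None


# Threshold values copied from the Package Tiers table.
TIERS = {
    "core": Tier(70, 65, 1, 2500),
    "extended": Tier(50, 60, 1, 5000),
    "full": Tier(20, 40, 0, 10000),
}


def belongs_in(tier, quality, score, age_days, importance=None):
    """Decide whether a rule qualifies for a tier."""
    if importance is not None:
        # Assumed override semantics; see the paragraph above.
        include = tier.force_include_importance_level
        exclude = tier.force_exclude_importance_level
        if include is not None and importance >= include:
            return True
        if exclude is not None and importance < exclude:
            return False
    return (
        quality >= tier.min_quality
        and score >= tier.min_score
        and tier.min_age_days <= age_days <= tier.max_age_days
    )
```

With these values, a rule with quality 60, score 70, and an age of 100 days is left out of `core` but included in `extended` and `full`.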