This repository aggregates multiple public threat intelligence (TI) data sources into a single, normalized, and de-duplicated list of malicious, phishing, C2, and suspicious domains.
The goal is to provide a clean, ready-to-consume IOC dataset for:
- SOC & DFIR teams
- Blue-team threat hunting
- SIEM lookup enrichment
- DNS/Firewall blocking
- OSINT/CTI research
✔ Aggregates 19+ raw feeds
✔ Extracts domains using strict regex
✔ Automatically deduplicates
✔ Deterministic sorted output (stable Git diffs)
✔ CI/CD ready feed pipeline
✔ Designed for SOC production environments
malicious-domains/
├── sources/ # Raw upstream threat intel feeds
├── scripts/ # TI ingestion + normalization pipeline
│ ├── update_feeds.sh
│ └── combine_feeds.py
├── output/ # Final unified domain lists
│ ├── domains.txt
│ └── domains.csv
├── docs/ # Engineering documentation
│ ├── ARCHITECTURE.md
│ ├── DATA_MODEL.md
│ └── FEED_SOURCES.md
└── CONTRIBUTING.md
The pipeline follows a clean separation of layers:
[Raw OSINT Feeds] --> sources/
(untouched)
sources/ --> combine_feeds.py
(parse + extract + dedupe)
combine_feeds.py --> output/
(normalized artifacts)
Principles:
- Lossless ingestion (retain original data in sources/)
- Normalization only in scripts
- Idempotent runs
- Deterministic ordering
More visuals: see docs/ARCHITECTURE.md
To refresh the raw .txt feed files in sources/, run:
./scripts/update_feeds.sh
You can wire this script to cron or a GitHub Action.
NOTE: Replace placeholder URLs in the script with real feed URLs.
Then combine the feeds:
python3 scripts/combine_feeds.py
Outputs are generated under output/:
| File | Purpose |
|---|---|
| domains.txt | One domain per line (ready for DNS/firewall) |
| domains.csv | CSV format with header (SIEM lookup tables, SOAR enrichment) |
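As a rough illustration of how the two artifacts could be produced with deduplication and a stable sort order, here is a minimal sketch. It is not the actual combine_feeds.py; the function name `write_outputs` and the CSV header name `domain` are assumptions made for this example.

```python
import csv
from pathlib import Path


def write_outputs(domains, out_dir="output"):
    """Write deduplicated, sorted domains to domains.txt and domains.csv.

    Hypothetical helper; the CSV header name "domain" is an assumption.
    """
    # A set removes duplicates; sorting makes the output deterministic,
    # so re-running on the same input yields byte-identical files.
    unique = sorted({d.strip().lower() for d in domains if d.strip()})

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Plain list: one domain per line.
    with open(out / "domains.txt", "w", newline="\n", encoding="utf-8") as fh:
        fh.writelines(f"{d}\n" for d in unique)

    # CSV with a single header row, suitable for lookup-table imports.
    with open(out / "domains.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["domain"])
        writer.writerows([d] for d in unique)

    return unique
```

Because the output depends only on the set of input domains, repeated runs over unchanged sources produce no Git diff.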
- Indicator type: Domain
- Regex-based strict extraction
- Canonical form: lower-cased domain only
- No URLs, IPs, paths, or protocols
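The normalization rules above can be sketched as follows. This is an illustrative approximation, not the regex used in combine_feeds.py: the pattern `DOMAIN_RE` and the function `extract_domains` are hypothetical, and the pattern only accepts alphabetic TLDs, so punycode TLDs (`xn--...`) would be rejected.

```python
import re

# Assumed pattern: one or more DNS labels followed by an alphabetic TLD.
# Each label is 1-63 characters and cannot start or end with a hyphen.
DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}"
)


def extract_domains(line):
    """Return the canonical (lower-cased, bare) domains found in one feed line."""
    # Drop trailing comments and normalize case.
    text = line.split("#", 1)[0].strip().lower()
    found = []
    for token in text.split():
        host = token.split("://", 1)[-1]                 # strip the protocol
        host = re.split(r"[/:?]", host, maxsplit=1)[0]   # strip port, path, query
        host = host.rstrip(".")                          # strip a trailing root dot
        # fullmatch rejects IPs (numeric last label) and single-label names.
        if DOMAIN_RE.fullmatch(host):
            found.append(host)
    return found
```

For example, a hosts-style line such as `0.0.0.0 evil.example.com` yields only `evil.example.com`, because the IP token fails the alphabetic-TLD check.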
Future metadata planned:
- source feed
- threat type (phishing/malware/c2)
- first_seen / last_seen timestamps
- confidence score
More details: docs/DATA_MODEL.md
All OSINT-provider files are located in sources/.
Mapping details: docs/FEED_SOURCES.md
Upload output/domains.csv as:
- A lookup table
- A dynamic blacklist
- An enrichment dataset
Use case: when DNS/Proxy/Firewall logs contain a domain:
- check membership in this list
- tag as suspicious
- map to threat intelligence source
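Outside a SIEM, the membership check and tagging steps could look roughly like the sketch below. The event field name `domain` and the functions `load_blocklist` and `tag_event` are assumptions for illustration. Mapping to a source feed is not shown, since per-domain source metadata is not part of the current output.

```python
def load_blocklist(path="output/domains.txt"):
    """Load the one-domain-per-line list into a set for O(1) lookups."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def tag_event(event, blocklist):
    """Return a copy of a log event with a 'suspicious' flag added.

    Assumes the event is a dict with a 'domain' field (hypothetical schema).
    """
    # Normalize the same way the list is normalized: lower-case, no root dot.
    domain = str(event.get("domain", "")).strip().lower().rstrip(".")
    tagged = dict(event)
    tagged["suspicious"] = domain in blocklist
    return tagged
```

This matches exact names only; whether subdomains of a listed domain should also be flagged is a policy decision left to the consumer.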
Convert domains to hosts file format:
0.0.0.0 bad-domain.example
Example:
sed 's/^/0.0.0.0 /' output/domains.txt > output/hosts.txt
Use hosts.txt as the blocklist.
Convert to bulk blacklist import format.
Example URL pattern:
*.malicious-domain.com
Future plan: auto-generate firewall import format.
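Until a generator exists, a conversion to the wildcard pattern shown above might be sketched like this. The function name `to_wildcard_patterns` is hypothetical, and the exact import syntax differs between firewall vendors, so treat the `*.` prefix as an assumption to verify against your product's documentation.

```python
def to_wildcard_patterns(domains):
    """Turn bare domains into sorted, deduplicated '*.domain' patterns."""
    # Normalize before prefixing so differently-cased duplicates collapse.
    return sorted({f"*.{d.strip().lower()}" for d in domains if d.strip()})
```

Note that on some products a `*.` pattern matches subdomains only and not the bare domain, in which case both forms would need to be imported.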
Feed domains.csv into:
- Cortex XSOAR playbooks
- Shuffle automations
- Any custom SOC enrichment microservice
✔ Malicious infra trend analysis
✔ Domain age profiling
✔ Malware campaign correlation
✔ TI scoring models
✔ WHOIS intel pivoting
✔ APT/C2 infra clustering
- Add automated feed ingestion via GitHub Actions
- Export artifacts:
  - STIX
  - MISP JSON
  - hosts file
- Add metadata annotations:
  - threat_type
  - first_seen
  - confidence
- Build lookup API for realtime domain reputation:
  GET /lookup?domain=xyz.com
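The planned lookup endpoint does not exist yet. As one possible shape for it, the sketch below uses only the Python standard library. The response fields (`domain`, `listed`), the port, and the names `lookup`, `LookupHandler`, and `serve` are all assumptions, not a committed design.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

BLOCKLIST = set()  # filled by serve() from output/domains.txt


def lookup(domain, blocklist):
    """Return a small reputation record for one domain (assumed schema)."""
    name = domain.strip().lower().rstrip(".")
    return {"domain": name, "listed": name in blocklist}


class LookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        params = parse_qs(url.query)
        if url.path != "/lookup" or "domain" not in params:
            self.send_error(400, "expected /lookup?domain=<name>")
            return
        body = json.dumps(lookup(params["domain"][0], BLOCKLIST)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(blocklist_path="output/domains.txt", port=8080):
    """Load the list once, then answer lookups on localhost until interrupted."""
    with open(blocklist_path, encoding="utf-8") as fh:
        BLOCKLIST.update(line.strip().lower() for line in fh if line.strip())
    HTTPServer(("127.0.0.1", port), LookupHandler).serve_forever()
```

Calling `serve()` and requesting `/lookup?domain=xyz.com` would return a JSON object stating whether the name is on the list. The standard-library server is meant for local experiments, not production traffic.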
Contributions welcome!
Please check CONTRIBUTING.md
All data are collected for:
- research
- blue-team defensive security
- SOC/Threat Intel usage only
❗ Do NOT use this dataset for any offensive or unlawful purpose.
❗ The maintainer holds no liability for misuse.