ArgAtlas is a Streamlit dashboard for OSINT workflows on usernames, emails, domains, and IP indicators. The project combines multi-platform discovery, external API enrichment, local SQLite persistence, visual analytics, and exports suitable for technical analysis and reporting.
- Purpose
- Core features
- Architecture
- Project structure
- Requirements
- Setup and run
- Environment configuration (`.env`)
- Application configuration (`config.py`)
- Operational workflows
- Data model and persistence
- Exports and artifacts
- Tests
- Performance and resilience
- Troubleshooting
- Security and compliance
- Suggested roadmap
ArgAtlas is designed to:
- map the digital footprint of an identifier
- correlate indicators across multiple platforms
- support OSINT investigations from a single UI
- keep a local history of scans
- produce portable outputs (PDF/Excel/JSON/CSV/JSONL/HTML)
The project is local-first and does not require an external backend.
- operational KPIs (volume, coverage, risk)
- global map with points, heatmap, clustering, and threat zones
- combined filters by username, date range, found percentage, risk, and verified state
- daily timeline, weekly trend, platform distribution
- user-platform entity graph
- HTML snapshot export of visualizations
- fast scan for a single input
- profile status checks across many platforms
- optional scraping preview
- optional GitHub enrichment
- optional external API enrichment
- extended scan with deeper control
- scan parameter tuning (for example max profiles)
- risk assessment computed at the end of each run
- persistent storage with temporal deduplication
- CSV ingestion using a `username` column
- memory-safe chunking (`CSV_BATCH_SIZE`) for larger files
- processed/saved/skipped summary
- deduplication support by time window
- recent scan review
- on-demand export preparation
- PDF, Excel, JSON, and profile CSV downloads
- bulk exports (JSONL, CSV summary, Excel)
- email reverse lookup (Hunter.io)
- account correlation
- naming and platform-preference pattern detection
- alert generation and alert state tracking (OPEN/RESOLVED)
- scan-to-scan comparison (added/removed platforms, risk delta)
- total scans and unique users
- reports folder size
- API integration status
- quick operational health metrics
- `ArgAtlas_v4.py`: Streamlit UI and workflow orchestration
- `engine_core.py`: scan engine, API integrations, risk scoring
- `datastore.py`: SQLite persistence, filtering, bulk operations, alert storage
- `analysis_tools.py`: correlation, pattern analysis, alert logic, scan comparison
- `exporters.py`: report file generation
- `viz.py`: Plotly visual layers and HTML snapshots
- `utils.py`: HTTP/rate limiting, input validation, metadata extraction
- `config.py`: app configuration and feature flags
```
User input
-> validate_username (utils)
-> run_scan_for_input (engine_core)
-> build_services_for_username
-> check_profiles_exist (HTTP status)
-> scrape_social_preview (metadata)
-> github_lookup + external enrichments
-> compute_risk_assessment
-> save_scan (datastore)
-> generate_alerts / analysis_tools
-> dashboard rendering
-> file export (exporters / viz)
```
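As a minimal, self-contained sketch of this flow, the pipeline can be read as a thin orchestration function. The stub implementations below stand in for the real code in `utils`, `engine_core`, and `datastore`; their behaviors are invented for illustration and only the function names follow the flow above.

```python
# Self-contained sketch of the scan pipeline; stub behaviors are
# illustrative, not the real module implementations.
import re

def validate_username(raw: str) -> str:
    # Mirrors the documented rules: length <= 100, restricted charset.
    if not re.fullmatch(r"[\w.\-@+ ]{1,100}", raw):
        raise ValueError(f"invalid input: {raw!r}")
    return raw.strip().lower()

def build_services_for_username(username: str) -> dict:
    # Real code covers dozens of platforms; two suffice here.
    return {"github": f"https://github.com/{username}",
            "reddit": f"https://www.reddit.com/user/{username}"}

def check_profiles_exist(services: dict) -> dict:
    # Real code issues HTTP status checks; here every profile "exists".
    return {name: {"url": url, "found": True} for name, url in services.items()}

def compute_risk_assessment(statuses: dict) -> dict:
    found_pct = 100 * sum(s["found"] for s in statuses.values()) / len(statuses)
    level = "High" if found_pct >= 70 else "Medium" if found_pct >= 40 else "Low"
    return {"found_pct": found_pct, "level": level}

def run_pipeline(raw_input: str) -> dict:
    username = validate_username(raw_input)
    services = build_services_for_username(username)
    statuses = check_profiles_exist(services)
    risk = compute_risk_assessment(statuses)
    return {"username": username, "statuses": statuses, "risk": risk}

result = run_pipeline("octocat")
print(result["risk"]["level"])  # -> High (both stub profiles "found")
```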
```
osint_suite_pro/
|-- .env.example
|-- ArgAtlas_v4.py
|-- analysis_tools.py
|-- config.py
|-- datastore.py
|-- engine_core.py
|-- exporters.py
|-- requirements.txt
|-- utils.py
|-- viz.py
|-- data/
|   `-- capitals.json
|-- tests/
|   |-- test_analysis_tools.py
|   `-- test_datastore.py
|-- reports/
|-- fonts/
|-- backup_unused/
`-- osint_scans.db
```
- Python 3.10+ (3.11 recommended)
- dedicated virtual environment
- internet connectivity for external lookups
- font available at `fonts/NotoSans-Regular.ttf` for Unicode PDF export
Main dependencies: `streamlit>=1.20`, `requests`, `beautifulsoup4`, `pandas`, `fpdf`, `plotly`, `openpyxl`, `networkx`, `python-dotenv`
- Move to the project folder:

  ```powershell
  cd .\osint_suite_pro
  ```

- Create and activate a virtual environment.

  If you are in the workspace root (Project_Sherlook):

  ```powershell
  python -m venv myenv
  .\myenv\Scripts\Activate.ps1
  ```

  If you are already inside osint_suite_pro:

  ```powershell
  python -m venv ..\myenv
  .\..\myenv\Scripts\Activate.ps1
  ```

- Install dependencies:

  ```powershell
  pip install -r requirements.txt
  ```

- Configure `.env`:

  ```powershell
  Copy-Item .env.example .env
  ```

- Start the app:

  ```powershell
  streamlit run ArgAtlas_v4.py
  ```

Typical UI URL: http://localhost:8501
For subsequent runs:

```powershell
cd .\osint_suite_pro
.\..\myenv\Scripts\Activate.ps1
streamlit run ArgAtlas_v4.py
```

The `.env.example` file includes all supported variables.
API keys:

- `HUNTER_IO_API_KEY`
- `GITHUB_API_TOKEN`
- `ABUSEIPDB_API_KEY`
- `VIRUSTOTAL_API_KEY`
- `IPINFO_TOKEN`
- `REDDIT_CLIENT_ID`
- `REDDIT_CLIENT_SECRET`
- `REDDIT_USER_AGENT`
- `YOUTUBE_API_KEY`
- `URLSCAN_API_KEY`
- `OTX_API_KEY`
- `GREYNOISE_API_KEY`

Tuning variables:

- `EXTERNAL_API_TIMEOUT` (default: 10)
- `EXTERNAL_API_RETRIES` (default: 2)
- `EXTERNAL_API_RETRY_BACKOFF` (default: 0.75)
- `EXTERNAL_API_CACHE_TTL` (default: 86400)
Important notes:
- some integrations still work without keys (for example limited URLScan, baseline OTX, crt.sh, URLhaus, ipapi)
- when a key is missing, the related module returns `enabled: false` or `skipped` without breaking the pipeline
- paths and branding: `BASE_DIR`, `DB_PATH`, `REPORTS_PATH`, `ACCENT_COLOR`, `PAGE_TITLE`
- map and caching: `CACHE_LIMIT`, `CLUSTER_LEVELS`, `DEFAULT_MAP_ZOOM`, `DEFAULT_MAP_CENTER`, `THREAT_ZONES`
- scan limits: `MAX_CSV_ROWS`, `CSV_ENCODING`, `CSV_BATCH_SIZE`, `DEFAULT_MAX_PROFILES`, `MAX_CONCURRENT_CHECKS`, `SKIP_DUPLICATE_DAYS`
- HTTP behavior: `HTTP_TIMEOUT`, `SCRAPE_DELAY`, `STATUS_CHECK_DELAY`, `MAX_DOMAIN_CACHE_SIZE`, `DOMAIN_RATE_LIMITS`
- analysis: `ALERT_CONFIG`, `CORRELATION_MIN_PLATFORMS`, `CORRELATION_MIN_SIMILARITY`
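As an illustration only, a `config.py` of this shape might look like the fragment below. The variable names come from the list above; the concrete values are invented defaults, not the project's real settings.

```python
# Hypothetical excerpt of config.py; values are illustrative only.
from pathlib import Path

# Paths and branding
BASE_DIR = Path(__file__).resolve().parent
DB_PATH = BASE_DIR / "osint_scans.db"
REPORTS_PATH = BASE_DIR / "reports"

# Scan limits
DEFAULT_MAX_PROFILES = 50
MAX_CONCURRENT_CHECKS = 8
SKIP_DUPLICATE_DAYS = 1

# CSV ingestion
CSV_BATCH_SIZE = 500
MAX_CSV_ROWS = 10_000
CSV_ENCODING = "utf-8"

# HTTP behavior
HTTP_TIMEOUT = 10
DOMAIN_RATE_LIMITS = {"github.com": 1.0, "default": 0.5}  # seconds between hits
```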
- username
- handle with allowed symbols
- IP (for threat intel modules)
- domain/URL (for reputation modules)
Input validation (utils.validate_username):
- maximum length: 100
- allowed charset: letters, digits, `._-@+`, spaces
- reserved username blocklist (`admin`, `root`, `system`, `null`, `undefined`)
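The rules above can be sketched as a small validator. This is an assumed reimplementation for illustration, not the actual `utils.validate_username` code:

```python
# Sketch of username validation per the documented rules; the real
# utils.validate_username may differ in details.
import re

RESERVED = {"admin", "root", "system", "null", "undefined"}
ALLOWED = re.compile(r"^[A-Za-z0-9._\-@+ ]+$")

def validate_username(value: str) -> str:
    value = value.strip()
    if not value or len(value) > 100:
        raise ValueError("username must be 1-100 characters")
    if not ALLOWED.match(value):
        raise ValueError("username contains disallowed characters")
    if value.lower() in RESERVED:
        raise ValueError("username is reserved")
    return value

print(validate_username("jane.doe-42"))  # -> jane.doe-42
```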
`run_scan_for_input` performs:
- input normalization (base username)
- social service map generation (`build_services_for_username`)
- profile status checks (`check_profiles_exist`)
- metadata scraping (`scrape_social_preview`)
- GitHub lookup (`github_lookup`)
- external enrichment (`run_external_enrichment`)
- risk scoring (`compute_risk_assessment`)
- username variant generation (`brute_username`)
`run_batch_scan_from_csv` uses `pandas.read_csv(..., chunksize=CSV_BATCH_SIZE)`:
- more stable memory profile on larger files
- chunk-level logging
- skips empty rows and repeated header rows
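The chunked-ingestion pattern can be sketched as follows. The sample data, tiny batch size, and skip heuristics here are illustrative, not the project's exact logic:

```python
# Chunked CSV ingestion sketch; assumes a "username" column as ArgAtlas
# expects. Sample data and batch size are for demonstration only.
import io
import pandas as pd

CSV_BATCH_SIZE = 2  # tiny for the demo; the real value is configurable

csv_data = io.StringIO("username\nalice\nbob\n\ncarol\nusername\n")

processed, skipped = [], 0
for chunk in pd.read_csv(csv_data, chunksize=CSV_BATCH_SIZE,
                         skip_blank_lines=False):
    for value in chunk["username"]:
        value = str(value).strip()
        # skip empty rows (NaN) and repeated header rows
        if not value or value.lower() in ("nan", "username"):
            skipped += 1
            continue
        processed.append(value)

print(processed, skipped)  # -> ['alice', 'bob', 'carol'] 2
```

Reading in chunks keeps only `CSV_BATCH_SIZE` rows in memory at a time, which is what gives the stable memory profile on large files.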
Weighted score (0-100) based on:
- found profile percentage (`found_profiles_pct`)
- VirusTotal findings (`malicious`, `suspicious`)
- AbuseIPDB confidence score
Levels:
- High: score >= 70
- Medium: score >= 40
- Low: score < 40
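A weighted score of this shape can be sketched as below. The exact weights used by `compute_risk_assessment` are not documented here, so the 0.5 / 0.3 / 0.2 split and the VirusTotal scaling are assumptions; only the inputs and level thresholds come from the description above:

```python
# Illustrative weighted risk scoring; weights are assumed, not the
# project's real compute_risk_assessment coefficients.
def risk_score(found_profiles_pct: float,
               vt_malicious: int, vt_suspicious: int,
               abuseipdb_confidence: float) -> tuple:
    # Fold VirusTotal counts into a 0-100 component (assumed scaling).
    vt_component = min(vt_malicious * 10 + vt_suspicious * 5, 100)
    score = (0.5 * found_profiles_pct
             + 0.3 * vt_component
             + 0.2 * abuseipdb_confidence)
    level = "High" if score >= 70 else "Medium" if score >= 40 else "Low"
    return round(score, 1), level

print(risk_score(80, 2, 1, 50))  # -> (57.5, 'Medium')
```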
`build_services_for_username` covers many platforms, including:
- GitHub, X/Twitter, Instagram, Facebook, TikTok, YouTube, LinkedIn, Reddit, Telegram
- Twitch, Mastodon, Discord, Bluesky, Threads, Pinterest, Tumblr, Medium, Dev.to
- Stack Overflow, Quora
- Steam, PlayStation, Xbox, Roblox
- DeviantArt, ArtStation, Flickr, Behance
- Spotify, SoundCloud, Bandcamp, Last.fm
- GitLab, Kaggle, Replit, Codepen, and more
- GitHub API
- Reddit API (OAuth with public fallback)
- YouTube Data API v3
- AbuseIPDB
- IPinfo
- VirusTotal
- URLScan.io
- AlienVault OTX
- GreyNoise
- crt.sh
- URLhaus
- ipapi
All lookups use a resilient wrapper (`_api_request_json`) with:
- centralized timeout
- retry for retryable statuses (408, 425, 429, 500, 502, 503, 504)
- progressive backoff
- in-memory TTL cache for GET requests
Local SQLite database: `osint_scans.db`
Scans table main fields:
`id`, `username`, `queried_at`, `result_json`, `found_pct`, `risk_score`, `verified`
Main indexes:
`idx_username`, `idx_queried_at`, `idx_found_pct`, `idx_risk_score`, `idx_verified`
Alerts table main fields:
`id`, `scan_id` (FK -> scans), `alert_type`, `alert_msg`, `severity`, `status`, `created_at`
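The alert lifecycle (OPEN -> RESOLVED) against a table of this shape can be illustrated with `sqlite3`. Column types and defaults below are assumed from the listed fields, not copied from the real schema:

```python
# Illustrative alert lifecycle on an assumed alerts schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY,
        scan_id INTEGER,
        alert_type TEXT,
        alert_msg TEXT,
        severity TEXT,
        status TEXT DEFAULT 'OPEN',
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
con.execute(
    "INSERT INTO alerts (scan_id, alert_type, alert_msg, severity) "
    "VALUES (?, ?, ?, ?)",
    (1, "HIGH_RISK", "risk score above threshold", "high"),
)
# Resolve the alert: OPEN -> RESOLVED
con.execute("UPDATE alerts SET status = 'RESOLVED' WHERE id = 1")
status = con.execute("SELECT status FROM alerts WHERE id = 1").fetchone()[0]
print(status)  # -> RESOLVED
```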
Aggregated per-username registry:
`username` (unique), `first_seen`, `last_seen`, `scans_count`, `last_found_pct`, `last_risk_score`, `last_result_json`
`save_scan` blocks duplicate inserts when the same username was scanned within `SKIP_DUPLICATE_DAYS`.
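The deduplication window can be sketched with a time-bounded existence check. This is an assumed simplification of `save_scan` in `datastore.py`, using a minimal stand-in schema:

```python
# Sketch of SKIP_DUPLICATE_DAYS deduplication; the real save_scan
# in datastore.py may implement this differently.
import sqlite3

SKIP_DUPLICATE_DAYS = 1

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, username TEXT, queried_at TEXT)"
)

def save_scan(username):
    # Was this username scanned within the dedup window?
    row = con.execute(
        "SELECT 1 FROM scans WHERE username = ? "
        "AND queried_at >= datetime('now', ?)",
        (username, f"-{SKIP_DUPLICATE_DAYS} days"),
    ).fetchone()
    if row:
        return False  # duplicate within the window, skip the insert
    con.execute(
        "INSERT INTO scans (username, queried_at) VALUES (?, datetime('now'))",
        (username,),
    )
    return True

print(save_scan("alice"), save_scan("alice"))  # -> True False
```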
Output directory: `reports/`
Per-scan exports:

- PDF: `generate_pdf_report`
- Excel: `generate_excel`
- JSON: `generate_json`
- Profile CSV: `generate_csv_profiles`

Bulk exports:

- JSONL: `generate_jsonl_bulk`
- CSV summary: `generate_csv_bulk_summary`
- bulk Excel (from UI)
HTML snapshots of visualizations are handled through `viz.export_snapshot_html` in the dashboard.
Filename policy: prefix, safe username, and timestamp (`YYYYMMDD_HHMMSS`).
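That policy can be sketched as a small helper; the function name and sanitization rule here are assumptions for illustration:

```python
# Hypothetical filename builder following the stated policy:
# prefix, sanitized username, timestamp.
import re
from datetime import datetime

def report_filename(prefix, username, ext):
    # Replace anything outside a filesystem-safe charset (assumed rule).
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", username)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{safe}_{stamp}.{ext}"

print(report_filename("report", "jane doe@example", "pdf"))
# e.g. report_jane_doe_example_20250101_120000.pdf
```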
Current suite: `unittest`

`tests/test_analysis_tools.py`:

- account correlations
- pattern detection
- alert generation
- scan comparison

`tests/test_datastore.py`:

- save deduplication
- alert lifecycle (OPEN -> RESOLVED)
Run tests:
```powershell
cd .\osint_suite_pro
.\..\myenv\Scripts\python.exe -m unittest discover -s tests -v
```

- per-domain rate limiting (with LRU domain cache)
- retry + backoff for transient failures
- API response caching with configurable TTL
- CSV chunking to reduce memory pressure
- prepared export cleanup in UI session
Tuning tips:

- reduce `do_preview` usage for very large runs
- lower `max_profiles` in slow or heavily rate-limited environments
- increase `EXTERNAL_API_TIMEOUT` on high-latency networks
- keep `EXTERNAL_API_RETRIES` conservative to avoid long blocking runs
- if PowerShell blocks virtual environment activation, allow scripts for the session:

  ```powershell
  Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
  ```

- if port 8501 is already in use, start on another port:

  ```powershell
  streamlit run ArgAtlas_v4.py --server.port 8502
  ```

- if external enrichment returns no data:
  - check keys in `.env`
  - verify provider quotas/rate limits
  - ensure input type matches module expectations (IP, domain, username)
- for database issues:
  - confirm `DB_PATH` in `config.py`
  - verify the app points to the intended SQLite file
  - check whether deduplication is skipping recent saves
- for export issues:
  - verify the font exists at `fonts/NotoSans-Regular.ttf`
  - verify write permissions on `reports/`
- do not hardcode secrets in source code
- always use environment variables for tokens/API keys
- run only for authorized use cases
- respect provider ToS, privacy requirements, and local regulations
- many profile checks are heuristic (HTTP status + URL patterns)
- metadata scraping quality depends on platform markup changes
- external enrichment quality depends on third-party provider quality/availability
- advanced sections provide baseline analytics, not a full forensic pipeline
- broader test coverage across engine and UI logic
- additional providers with quota-aware scheduling
- STIX/TAXII or SOC-oriented export formats
- recurring scan scheduler
- authentication/roles for multi-user deployments
Use ArgAtlas only for subjects, accounts, and domains where you have legal basis or authorization. The operator is responsible for compliance with applicable laws, privacy rules, and platform terms.