This project is a small, composable static document scanner implemented in Go.
For a step‑by‑step walkthrough of installation and usage, see Guide.md.
For a detailed list of currently supported document types and how to extend them, see Supported_Document_Types.md.
- Walks a directory tree recursively
- Uses a worker pool to scan files concurrently based on `runtime.NumCPU()`
- Analyzes files using pluggable analyzers
- Computes SHA-256 hashes for every scanned file
- Detects macro-enabled Word documents (`.docx`, `.docm`) by locating `vbaProject.bin` inside the ZIP structure
- Detects suspicious PDF indicators in `.pdf` files using simple heuristic string matching
- Emits structured JSON results on stdout
From the docscanner directory (after cloning the repo):

```
go run ./cmd/scanner <directory>
```

Example (scan the provided samples directory from the repo root):

```
go run ./cmd/scanner ../samples
```

The CLI prints an array of JSON objects, each matching the ScanResult model defined in internal/model/result.go.

To save the JSON output to a file instead of just printing it:

```
go run ./cmd/scanner ../samples > results.json
```

You can then open results.json in your editor.
For more detailed usage patterns (different directories, troubleshooting, etc.) see Guide.md.
Out of the box, the scanner understands:
- Microsoft Word: `.docx`, `.docm`
- PDF: `.pdf`
Other file types are walked but ignored unless you implement and register a new analyzer. See Supported_Document_Types.md for details.
- New document types can be added by implementing the `Analyzer` interface in `internal/analyzer/analyzer.go`.
- The worker pool and directory walker do not need to change when new analyzers are introduced.
This is intentionally a foundation: a minimal but solid base to grow more advanced detection logic.
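As an illustration of the extension point, here is a hypothetical RTF analyzer. The `ScanResult` fields and the exact `Analyzer` method signatures below are assumptions inferred from the scan flow; check `internal/analyzer/analyzer.go` and `internal/model/result.go` for the real definitions:

```go
package main

import (
	"fmt"
	"strings"
)

// ScanResult mirrors the model serialized to JSON (fields are illustrative).
type ScanResult struct {
	Path       string   `json:"path"`
	Suspicious bool     `json:"suspicious"`
	Indicators []string `json:"indicators,omitempty"`
}

// Analyzer is the extension point: Supports filters by path, Analyze
// inspects the raw file bytes. (Assumed shape; see analyzer.go.)
type Analyzer interface {
	Supports(path string) bool
	Analyze(path string, data []byte) (*ScanResult, error)
}

// RTFAnalyzer is a hypothetical new analyzer that flags RTF files
// containing embedded OLE objects (a common malware carrier).
type RTFAnalyzer struct{}

func (RTFAnalyzer) Supports(path string) bool {
	return strings.HasSuffix(strings.ToLower(path), ".rtf")
}

func (RTFAnalyzer) Analyze(path string, data []byte) (*ScanResult, error) {
	res := &ScanResult{Path: path}
	if strings.Contains(string(data), `\objdata`) {
		res.Suspicious = true
		res.Indicators = append(res.Indicators, `\objdata (embedded OLE object)`)
	}
	return res, nil
}

func main() {
	var a Analyzer = RTFAnalyzer{}
	res, _ := a.Analyze("doc.rtf", []byte(`{\rtf1 \objdata 0105...}`))
	fmt.Println(res.Suspicious) // true
}
```

Registering such an analyzer would mean appending `RTFAnalyzer{}` to the `[]Analyzer` slice built in main.go; the walker and worker pool stay untouched.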
The code is structured so the core scanning logic under internal/ can be reused from a long‑running backend service (for example, an HTTP API). A typical next step is to add a cmd/server entrypoint that:
- Listens on an HTTP port
- Accepts scan requests (e.g. directory paths or uploaded files)
- Invokes the existing walker, worker pool, and analyzers to produce `ScanResult` JSON
Once such a server entrypoint exists, you can deploy it to platforms like Render as a Go web service.
High-level flow:
```
main.go
 ├─ parses CLI args
 ├─ creates channels (files, results)
 ├─ builds analyzers []Analyzer
 ├─ starts directory walker (WalkDirectory)
 ├─ starts worker pool (StartWorkerPool)
 └─ aggregates []ScanResult and prints JSON

WalkDirectory (internal/scanner/walker.go)
 └─ walks the filesystem and pushes file paths into the files chan

StartWorkerPool (internal/scanner/workerpool.go)
 └─ spins up N workers
     └─ for each file:
         ├─ os.ReadFile
         ├─ pick matching Analyzer via Supports
         └─ Analyzer.Analyze → *ScanResult → results chan

Analyzers (internal/analyzer/*.go)
 ├─ WordAnalyzer – detects vbaProject.bin in .docx/.docm
 └─ PDFAnalyzer – scans for suspicious PDF keywords

Model (internal/model/result.go)
 └─ ScanResult – structure that is serialized to JSON
```
```mermaid
flowchart TD
    CLI["CLI (main.go)"] --> ARGS["Parse args (root dir)"]
    ARGS --> CHANNELS["Create channels: files, results"]
    CHANNELS --> ANALYZERS["Build analyzers []Analyzer"]
    CHANNELS --> WALKER["WalkDirectory (walker.go)"]
    WALKER --> FILES["files chan <- file paths"]
    FILES --> WORKERPOOL["StartWorkerPool (workerpool.go)"]
    WORKERPOOL --> WORKER["N workers"]
    WORKER --> READ["os.ReadFile(file)"]
    READ --> DISPATCH{"Supports(file)?"}
    DISPATCH -->|yes| ANALYZE["Analyzer.Analyze(file, data)"]
    ANALYZE --> RESULTS["results chan <- *ScanResult"]
    RESULTS --> AGG["Aggregate []ScanResult"]
    AGG --> JSON["Marshal JSON & print"]
```