Releases: PC5518/anscom-nfie-python-extension
v1.5.0 — Per-File Intelligence (CSV, Duplicates, Top-N, Regex)
The largest single feature release since v1.0.0. Five new opt-in capabilities, all running on the same single-pass traversal — no re-scanning, no behavioral change to existing code.
What's New
return_files=True — Per-file list in the result dict
The returned dict gains a "files" key: a Python list of dicts, one per scanned file.
result = anscom.scan("/project", return_files=True, silent=True)
for f in result["files"]:
    print(f["path"], f["size"], f["category"], f["mtime"])
Each entry has path, size, ext, category, mtime. Size and mtime come from the same stat / FindFirstFile call already being made — no extra syscall on Windows. On Linux, fstatat is called only when needed.
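Because each entry is a plain dict, the list composes directly with the standard library. A small sketch totaling bytes per category, using a hand-built list in the documented entry shape (the sample values are illustrative):

```python
from collections import Counter

def size_by_category(files):
    """Sum file sizes per category from a scan's per-file list."""
    totals = Counter()
    for f in files:
        totals[f["category"]] += f["size"]
    return totals

# Hand-built sample in the documented entry shape:
files = [
    {"path": "a.py", "size": 1200, "ext": ".py", "category": "code", "mtime": 0},
    {"path": "b.png", "size": 5000, "ext": ".png", "category": "image", "mtime": 0},
    {"path": "c.py", "size": 300, "ext": ".py", "category": "code", "mtime": 0},
]
print(size_by_category(files))  # Counter({'image': 5000, 'code': 1500})
```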
export_csv="inventory.csv" — UTF-8 CSV export
Per-file inventory with columns path,size,ext,category,mtime. RFC 4180-compliant quoting. Zero dependencies. Reuses the per-file array collected for return_files.
anscom.scan("/data", export_csv="inventory.csv", silent=True)
Pipe directly into pandas, openpyxl, or any standard CSV consumer:
import pandas as pd
df = pd.read_csv("inventory.csv")
df.to_excel("report.xlsx", index=False)
largest_n=N — Top-N largest files
Per-thread min-heap of capacity N. O(log N) cost per file, no extra pass, no full sort.
result = anscom.scan("/mnt/storage", largest_n=20, silent=True)
for f in result["largest_files"]:
    print(f"{f['size'] / 1024**3:.2f} GB {f['path']}")
The printed report also gains a "TOP N LARGEST FILES" section.
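The bounded-heap idea can be sketched in pure Python with heapq (the extension does this in C, one heap per worker thread; this is an illustration, not the actual implementation):

```python
import heapq

def top_n(sizes_and_paths, n):
    """Keep only the N largest items seen so far using a min-heap of
    capacity N: each file costs O(log N), and no full sort is needed."""
    heap = []
    for size, path in sizes_and_paths:
        if len(heap) < n:
            heapq.heappush(heap, (size, path))
        elif size > heap[0][0]:
            # New item beats the smallest retained one: swap it in.
            heapq.heapreplace(heap, (size, path))
    return sorted(heap, reverse=True)

files = [(120, "a"), (5, "b"), (900, "c"), (42, "d"), (300, "e")]
print(top_n(files, 3))  # [(900, 'c'), (300, 'e'), (120, 'a')]
```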
find_duplicates=True — CRC32-based duplicate detection
Two-phase: (1) sort by size — files with a unique size are skipped entirely with zero I/O. (2) For same-size groups ≥2, read first 4096 bytes and compute CRC32.
result = anscom.scan("/media-library", find_duplicates=True, silent=True)
print(f"Duplicate groups: {len(result['duplicates'])}")
Combine with return_files=True to compute reclaimable space.
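A pure-Python sketch of this two-phase strategy (illustrative, not the extension's C code; note that a matching CRC32 of the first 4 KiB marks files as duplicate candidates, not proven byte-identical copies):

```python
import os
import tempfile
import zlib
from collections import defaultdict

def find_duplicate_candidates(entries):
    """Phase 1: bucket (path, size) pairs by size; unique sizes are
    skipped with zero I/O. Phase 2: for same-size groups of 2+, read
    the first 4096 bytes and group by CRC32."""
    by_size = defaultdict(list)
    for path, size in entries:
        by_size[size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: no read needed
        by_crc = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_crc[zlib.crc32(f.read(4096))].append(path)
        groups.extend(g for g in by_crc.values() if len(g) > 1)
    return groups

# Demo on three throwaway files, two of them identical:
tmp = tempfile.mkdtemp()
names = {"a": b"hello", "b": b"hello", "c": b"xorld"}
for name, data in names.items():
    with open(os.path.join(tmp, name), "wb") as f:
        f.write(data)
entries = [(os.path.join(tmp, n), len(d)) for n, d in names.items()]
print(find_duplicate_candidates(entries))  # one group: the two "hello" files
```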
regex_filter="pattern" — Path pattern filter
Only count files whose full path matches a regex.
anscom.scan("/codebase", regex_filter=r"/tests/.*\.py$", silent=True)
- Linux / macOS: native POSIX regcomp(REG_EXTENDED | REG_NOSUB) + regexec — zero GIL acquisition, runs fully in C inside the worker threads.
- Windows: falls back to Python's re module. For large Windows scans, prefer the extensions whitelist for zero-GIL filtering.
Invalid patterns raise ValueError immediately — no scan is started.
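The fail-fast behavior is equivalent to validating the pattern up front before any traversal begins. A sketch (validate_regex is a hypothetical helper, not part of the anscom API; the extension validates via regcomp on POSIX and re on Windows):

```python
import re

def validate_regex(pattern):
    """Reject a bad pattern before any work starts, mirroring
    anscom's 'no scan is started' guarantee."""
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex_filter: {exc}") from exc

validate_regex(r"/tests/.*\.py$")   # valid: returns silently
try:
    validate_regex(r"[unclosed")    # invalid: raises immediately
except ValueError as e:
    print("rejected:", e)
```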
Performance
All five features are strictly opt-in. A plain anscom.scan(".") with no new parameters runs the identical hot path as v1.3.0 — no extra syscalls, no allocations per file, no behavioral change.
- Per-thread FileInfo array pre-allocated at 65,536 entries — zero reallocations for typical scans.
- fstatat on Linux called only when needed (two separate guards: one for type resolution, one for size/mtime collection).
- Per-thread min-heap for largest_n — lock-free, merged after join.
- Per-thread file arrays — lock-free, merged after join.
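The post-join heap merge can be sketched in pure Python (the heap contents below are hypothetical; the extension performs this step in C after the worker threads join):

```python
import heapq

def merge_thread_heaps(per_thread_heaps, n):
    """Each thread kept its own private top-N heap with no locking;
    after join, a single pass over all retained items yields the
    global top N."""
    return heapq.nlargest(
        n, (item for heap in per_thread_heaps for item in heap)
    )

t1 = [(120, "a"), (900, "c"), (300, "e")]   # thread 1's retained items
t2 = [(450, "x"), (50, "y")]                # thread 2's retained items
print(merge_thread_heaps([t1, t2], 3))  # [(900, 'c'), (450, 'x'), (300, 'e')]
```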
Migration from v1.3.0
No breaking changes. All v1.3.0 code runs unchanged on v1.5.0.
# v1.3.0 code — works identically on v1.5.0
result = anscom.scan("/data", silent=True, ignore_junk=True)
# v1.5.0 — opt into new features as needed
result = anscom.scan(
"/data",
silent=True,
ignore_junk=True,
return_files=True,
largest_n=20,
find_duplicates=True,
export_csv="inventory.csv",
)
Bug Fixes
- sorted_top paths are now strdup'd independently from global_heap — no lifetime overlap, no double-free.
- fstatat on Linux is called only when min_size, return_files, export_csv, find_duplicates, or largest_n > 0 is active. Two separate guards for type resolution vs. size/mtime collection.
- Full docstring on anscom.scan is now accessible via help(anscom.scan).
Removed
export_excel — was crashing on Windows due to an openpyxl Workbook.read_only exception. Use export_csv + pandas.to_excel() instead, which is faster, dependency-free at scan time, and works identically across platforms.
Full Example — Everything at Once
import anscom
result = anscom.scan(
"/mnt/enterprise",
max_depth=20,
workers=32,
ignore_junk=True,
silent=True,
return_files=True,
largest_n=50,
find_duplicates=True,
export_json="audit.json",
export_csv="inventory.csv",
show_tree=True,
export_tree="tree.txt",
)
print(f"Files : {result['total_files']:,}")
print(f"Duration : {result['duration_seconds']:.3f}s")
print(f"Dup groups : {len(result['duplicates'])}")
print(f"Largest file : {result['largest_files'][0]['path']}")
One scan pass. Four output files. Full in-memory results.
Install: pip install --upgrade anscom
PyPI: https://pypi.org/project/anscom/1.5.0/
License: MIT
v1.4.0 — ignore (unstable version)
I apologize for making the system unstable. I am working on restoring a stable version.
anscom.c
LICENSE.txt
README.md
setup.py


v1.3.0 — Export Release (JSON, Tree, Excel)
What's New in v1.3.0
New Features
- export_json — Export full scan results as a formatted JSON file. Zero external dependencies — fully native using Python's built-in json module.
- export_tree — Save the complete DFS directory tree to a .txt file alongside stdout. Written incrementally — no memory accumulation.
- export_excel — Export scan results to a structured .xlsx file with three sheets: Categories, Extensions, and Summary. Requires openpyxl.
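The JSON export is equivalent to a standard-library dump of the result dict. A minimal round-trip sketch (the result shape shown here is illustrative, not the exact schema anscom writes):

```python
import json
import os
import tempfile

# Hypothetical result shape for illustration only.
result = {"total_files": 3, "categories": {"code": 2, "image": 1}}

path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(result, fh, indent=2)  # formatted output, no dependencies

with open(path, encoding="utf-8") as fh:
    assert json.load(fh) == result   # lossless round trip
print("round-trip ok")
```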
Bug Fixes
- Fixed MSVC Windows compiler compatibility — added #include <stdint.h> for uint64_t support
- Fixed PATH_MAX undefined error on Windows MSVC
- Fixed variable declarations for strict C89/C90 MSVC compliance
- Added /std:c11 flag in setup.py for Windows builds
API Changes
Three new optional parameters added to anscom.scan():
- export_json=None — path to output JSON file
- export_tree=None — path to output TXT file (requires show_tree=True)
- export_excel=None — path to output XLSX file (requires pip install anscom[excel])
Installation
pip install anscom==1.3.0
# For Excel support:
pip install anscom[excel]
Full Example
import anscom
anscom.scan(
"/path",
show_tree=True,
export_json="results.json",
export_tree="tree.txt",
export_excel="report.xlsx"
)