Releases · PC5518/anscom-nfie-python-extension

09 Apr 13:48

PC5518

v1.5.0

be38b35

v1.5.0 — Per-File Intelligence (CSV, Duplicates, Top-N, Regex) Latest

The largest single feature release since v1.0.0. Five new opt-in capabilities, all running on the same single-pass traversal — no re-scanning, no behavioral change to existing code.

What's New

`return_files=True` — Per-file list in the result dict

The returned dict gains a "files" key: a Python list of dicts, one per scanned file.

result = anscom.scan("/project", return_files=True, silent=True)
for f in result["files"]:
    print(f["path"], f["size"], f["category"], f["mtime"])

Each entry has path, size, ext, category, mtime. Size and mtime come from the same stat / FindFirstFile call already being made — no extra syscall on Windows. On Linux, fstatat is called only when needed.
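As a rough illustration of working with the per-file list, the sketch below sums bytes per category. The sample entries are invented and only mimic the five documented keys; the category names, the `ext` format, and the `mtime` type shown here are assumptions, not confirmed anscom output.

```python
from collections import defaultdict

# Hypothetical sample shaped like result["files"]. Real entries would come
# from anscom.scan(..., return_files=True); every value below is invented.
sample_files = [
    {"path": "/project/a.py", "size": 1200, "ext": "py", "category": "Code", "mtime": 1712650000},
    {"path": "/project/b.py", "size": 800, "ext": "py", "category": "Code", "mtime": 1712650100},
    {"path": "/project/logo.png", "size": 50000, "ext": "png", "category": "Images", "mtime": 1712650200},
    {"path": "/project/notes.md", "size": 300, "ext": "md", "category": "Documents", "mtime": 1712650300},
]


def bytes_by_category(files):
    """Sum the 'size' field of each entry, grouped by its 'category' field."""
    totals = defaultdict(int)
    for entry in files:
        totals[entry["category"]] += entry["size"]
    return dict(totals)


totals = bytes_by_category(sample_files)
# Print categories from largest to smallest total.
for category, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<10} {total:>8,} bytes")
```

With a real scan, `sample_files` would be replaced by `result["files"]`.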

`export_csv="inventory.csv"` — UTF-8 CSV export

Per-file inventory with columns path,size,ext,category,mtime. RFC 4180-compliant quoting. Zero dependencies. Reuses the per-file array collected for return_files.

anscom.scan("/data", export_csv="inventory.csv", silent=True)

Load it with pandas or any standard CSV consumer. Writing .xlsx through pandas additionally requires an Excel engine such as openpyxl:

import pandas as pd
df = pd.read_csv("inventory.csv")
df.to_excel("report.xlsx", index=False)
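For readers without pandas, the same file can be read with the standard library. The sketch below does not call anscom: it writes one invented row with Python's `csv` module (whose default dialect quotes fields containing commas and doubles embedded quotes, in the RFC 4180 style) and reads it back, to show why a quoting-aware reader matters for paths with commas or quotes.

```python
import csv
import io

# Build a tiny in-memory CSV with the documented column names.
# The row is invented; a real inventory.csv would come from export_csv.
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(["path", "size", "ext", "category", "mtime"])
writer.writerow(['/data/report, final "v2".txt', 2048, "txt", "Documents", 1712650000])

# Read it back with a quoting-aware parser instead of splitting on commas.
buf.seek(0)
rows = list(csv.DictReader(buf))

first = rows[0]
print(first["path"])        # the comma and quotes survive the round trip
print(int(first["size"]))   # CSV fields are strings, so convert explicitly
```

To read a real export, replace the `io.StringIO` buffer with `open("inventory.csv", newline="", encoding="utf-8")`.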

`largest_n=N` — Top-N largest files

Per-thread min-heap of capacity N. O(log N) cost per file, no extra pass, no full sort.

result = anscom.scan("/mnt/storage", largest_n=20, silent=True)
for f in result["largest_files"]:
    print(f"{f['size'] / 1024**3:.2f} GB  {f['path']}")

The printed report also gains a "TOP N LARGEST FILES" section.
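The bounded min-heap idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the extension's C code; the function name and the `(size, path)` tuples are hypothetical.

```python
import heapq


def top_n_largest(files, n):
    """Return the n largest (size, path) pairs using a min-heap of capacity n."""
    if n <= 0:
        return []
    heap = []  # heap[0] is always the smallest entry currently kept
    for size, path in files:
        if len(heap) < n:
            heapq.heappush(heap, (size, path))
        elif size > heap[0][0]:
            # Evict the smallest kept entry in O(log n); no full sort needed.
            heapq.heapreplace(heap, (size, path))
    # Only the n survivors are sorted, largest first.
    return sorted(heap, reverse=True)


# Invented sample data.
files = [(120, "/a"), (9000, "/b"), (45, "/c"), (7000, "/d"), (300, "/e")]
largest_two = top_n_largest(files, 2)
print(largest_two)
```

Memory stays proportional to N regardless of how many files are scanned, which is what makes the approach suitable for a single pass.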

`find_duplicates=True` — CRC32-based duplicate detection

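Two-phase: (1) sort by size, so files with a unique size are skipped entirely with zero I/O; (2) for each group of two or more same-size files, read the first 4096 bytes and compute CRC32. Only that prefix is hashed, so files larger than 4096 bytes that land in the same group are likely duplicates, not byte-for-byte verified ones.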

result = anscom.scan("/media-library", find_duplicates=True, silent=True)
print(f"Duplicate groups: {len(result['duplicates'])}")

Combine with return_files=True to compute reclaimable space.
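The two-phase idea can be approximated in pure Python as follows. This is a sketch under assumptions: it groups by size with a dict rather than sorting, the function name `candidate_duplicates` is hypothetical, and it runs on temporary files rather than on anscom output.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def candidate_duplicates(paths, prefix_bytes=4096):
    """Group paths by size, then by CRC32 of the first prefix_bytes bytes."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: the file is never opened
        by_crc = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as fh:
                by_crc[zlib.crc32(fh.read(prefix_bytes))].append(path)
        groups.extend(g for g in by_crc.values() if len(g) >= 2)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "a.bin": b"x" * 5000,
        "b.bin": b"x" * 5000,  # same size and same prefix as a.bin
        "c.bin": b"y" * 5000,  # same size, different prefix
        "d.bin": b"x" * 10,    # unique size
    }
    paths = []
    for name, data in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(data)
        paths.append(path)
    groups = candidate_duplicates(paths)
    group_names = [sorted(os.path.basename(p) for p in g) for g in groups]

print(group_names)
```

Before deleting anything, a full comparison such as `filecmp.cmp(a, b, shallow=False)` is a reasonable way to confirm that the members of a group really are identical.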

`regex_filter="pattern"` — Path pattern filter

Only count files whose full path matches a regex.

anscom.scan("/codebase", regex_filter=r"/tests/.*\.py$", silent=True)

Linux / macOS: Native POSIX regcomp(REG_EXTENDED | REG_NOSUB) + regexec — zero GIL acquisition, runs fully in C inside the worker threads.
Windows: Falls back to Python's re module. For large Windows scans, prefer the extensions whitelist for zero-GIL filtering.

Invalid patterns raise ValueError immediately — no scan is started.
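A pattern can be checked before a long scan is launched. The sketch below uses Python's `re` module, which is the engine on Windows according to the notes above; the helper name `compile_path_filter` is hypothetical, and the sample paths are invented. POSIX extended regular expressions do not support every Python construct (lookarounds, for example), so patterns that rely on Python-only syntax may not be portable across platforms.

```python
import re


def compile_path_filter(pattern):
    """Compile a pattern, converting re.error into ValueError."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex_filter: {exc}") from exc


# Invented sample paths.
paths = [
    "/codebase/tests/test_io.py",
    "/codebase/src/io.py",
    "/codebase/tests/data.json",
    "/codebase/pkg/tests/unit/test_x.py",
]

matcher = compile_path_filter(r"/tests/.*\.py$")
# search() matches anywhere in the full path, not only at the start.
matched = [p for p in paths if matcher.search(p)]
print(matched)

# An unbalanced pattern is rejected before any work is done.
try:
    compile_path_filter("([a-z")
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
print(invalid_rejected)
```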

Performance

All five features are strictly opt-in. A plain anscom.scan(".") with no new parameters runs the identical hot path as v1.3.0 — no extra syscalls, no allocations per file, no behavioral change.

Per-thread FileInfo array pre-allocated at 65,536 entries — zero reallocations for typical scans.
fstatat on Linux called only when needed (two separate guards: one for type resolution, one for size/mtime collection).
Per-thread min-heap for largest_n — lock-free, merged after join.
Per-thread file arrays — lock-free, merged after join.
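The "per-thread state, merged after join" pattern from the list above can be illustrated in Python. This is only a structural sketch with hypothetical names: each worker writes to its own slot, so no lock is needed, and the merge happens once all threads have finished. Python's GIL means it shows the shape of the design rather than the performance of the C implementation.

```python
import heapq
import threading


def scan_chunk(chunk, n, out, index):
    """Worker: keep only this chunk's n largest entries in its own slot."""
    out[index] = heapq.nlargest(n, chunk)


def parallel_top_n(chunks, n):
    out = [None] * len(chunks)  # one private slot per worker
    threads = [
        threading.Thread(target=scan_chunk, args=(chunk, n, out, i))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merge step: runs on one thread after every worker has finished.
    return heapq.nlargest(n, (item for local in out for item in local))


# Invented (size, path) pairs, pre-split as if by directory.
chunks = [
    [(10, "a"), (500, "b")],
    [(300, "c"), (20, "d")],
    [(400, "e")],
]
merged_top = parallel_top_n(chunks, 2)
print(merged_top)
```

Each worker forwards at most N entries, so the merge cost depends on the worker count and N, not on the total number of files.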

Migration from v1.3.0

No breaking changes. All v1.3.0 code runs unchanged on v1.5.0.

# v1.3.0 code — works identically on v1.5.0
result = anscom.scan("/data", silent=True, ignore_junk=True)

# v1.5.0 — opt into new features as needed
result = anscom.scan(
    "/data",
    silent=True,
    ignore_junk=True,
    return_files=True,
    largest_n=20,
    find_duplicates=True,
    export_csv="inventory.csv",
)

Bug Fixes

sorted_top paths are now strdup'd independently from global_heap — no lifetime overlap, no double-free.
fstatat on Linux is called only when min_size, return_files, export_csv, find_duplicates, or largest_n > 0 is active. Two separate guards for type resolution vs. size/mtime collection.
Full docstring on anscom.scan is now accessible via help(anscom.scan).

Removed

export_excel: removed because it was crashing on Windows due to an openpyxl Workbook.read_only exception. Use export_csv followed by pandas DataFrame.to_excel() instead, which is faster, dependency-free at scan time, and works identically across platforms.

Full Example — Everything at Once

import anscom

result = anscom.scan(
    "/mnt/enterprise",
    max_depth=20,
    workers=32,
    ignore_junk=True,
    silent=True,
    return_files=True,
    largest_n=50,
    find_duplicates=True,
    export_json="audit.json",
    export_csv="inventory.csv",
    show_tree=True,
    export_tree="tree.txt",
)

print(f"Files        : {result['total_files']:,}")
print(f"Duration     : {result['duration_seconds']:.3f}s")
print(f"Dup groups   : {len(result['duplicates'])}")
print(f"Largest file : {result['largest_files'][0]['path']}")

One scan pass. Three output files (audit.json, inventory.csv, tree.txt). Full in-memory results.

Install: pip install --upgrade anscom
PyPI: https://pypi.org/project/anscom/1.5.0/
License: MIT

09 Apr 07:51

PC5518

v1.4.0

effe3f7

v1.4.0-- ignore (unstable version)

I apologize for making the system unstable. I am working on restoring a stable version.
anscom.c
LICENSE.txt
README.md
setup.py

17 Mar 00:18

PC5518

v1.3.0

d87701b

v1.3.0 — Export Release (JSON, Tree, Excel)

What's New in v1.3.0

New Features

export_json — Export full scan results as formatted JSON file. Zero external dependencies — fully native using Python's built-in json module.
export_tree — Save complete DFS directory tree to a .txt file simultaneously alongside stdout. Written incrementally — no memory accumulation.
export_excel — Export scan results to a structured .xlsx file with three sheets: Categories, Extensions, and Summary. Requires openpyxl.

Bug Fixes

Fixed MSVC Windows compiler compatibility — added #include <stdint.h> for uint64_t support
Fixed PATH_MAX undefined error on Windows MSVC
Fixed variable declarations for strict C89/C90 MSVC compliance
Added /std:c11 flag in setup.py for Windows builds

API Changes

Three new optional parameters added to anscom.scan():

export_json=None — path to output JSON file
export_tree=None — path to output TXT file (requires show_tree=True)
export_excel=None — path to output XLSX file (requires pip install anscom[excel])

Installation

pip install anscom==1.3.0
# For Excel support:
pip install anscom[excel]

Full Example

import anscom

anscom.scan(
    "/path",
    show_tree=True,
    export_json="results.json",
    export_tree="tree.txt",
    export_excel="report.xlsx"
)
