Releases: PC5518/anscom-nfie-python-extension

v1.5.0 — Per-File Intelligence (CSV, Duplicates, Top-N, Regex)

09 Apr 13:48
be38b35

The largest single feature release since v1.0.0. Five new opt-in capabilities, all running on the same single-pass traversal — no re-scanning, no behavioral change to existing code.

What's New

return_files=True — Per-file list in the result dict

The returned dict gains a "files" key: a Python list of dicts, one per scanned file.

result = anscom.scan("/project", return_files=True, silent=True)
for f in result["files"]:
    print(f["path"], f["size"], f["category"], f["mtime"])

Each entry has path, size, ext, category, mtime. Size and mtime come from the same stat / FindFirstFile call already being made — no extra syscall on Windows. On Linux, fstatat is called only when needed.
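
Because every entry carries the same five keys, the list is easy to aggregate. The following is a minimal sketch, not part of anscom: size_by_category is a hypothetical helper, and the sample entries (category names, ext without a leading dot, mtime as a Unix timestamp) are assumptions made purely for illustration.

```python
from collections import defaultdict


def size_by_category(files):
    """Sum file sizes per category from a return_files-style list of dicts."""
    totals = defaultdict(int)
    for entry in files:
        totals[entry["category"]] += entry["size"]
    return dict(totals)


# Hypothetical sample shaped like result["files"]; all values are made up.
sample_files = [
    {"path": "/project/a.py", "size": 1200, "ext": "py",
     "category": "Code", "mtime": 1712660000.0},
    {"path": "/project/b.py", "size": 800, "ext": "py",
     "category": "Code", "mtime": 1712660100.0},
    {"path": "/project/logo.png", "size": 5000, "ext": "png",
     "category": "Images", "mtime": 1712660200.0},
]

totals = size_by_category(sample_files)
for category, total in sorted(totals.items()):
    print(f"{category:<10} {total:>8} bytes")
```

With a real scan, the same helper would be called as size_by_category(result["files"]).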

export_csv="inventory.csv" — UTF-8 CSV export

Per-file inventory with columns path,size,ext,category,mtime. RFC 4180-compliant quoting. Zero dependencies. Reuses the per-file array collected for return_files.

anscom.scan("/data", export_csv="inventory.csv", silent=True)

Load the file directly into pandas, openpyxl, or any standard CSV consumer:

import pandas as pd

# Writing .xlsx requires an Excel engine such as openpyxl to be installed.
df = pd.read_csv("inventory.csv")
df.to_excel("report.xlsx", index=False)
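
RFC 4180 quoting matters for paths that contain commas or double quotes. The sketch below illustrates the quoting rule with Python's standard csv module; it is not anscom's C writer, and write_inventory and the sample row are hypothetical. A field containing a comma or a quote is wrapped in double quotes, and embedded quotes are doubled, so the path survives a round trip.

```python
import csv
import io

COLUMNS = ["path", "size", "ext", "category", "mtime"]


def write_inventory(rows, stream):
    """Write rows with the same column order as the export_csv inventory."""
    writer = csv.DictWriter(
        stream,
        fieldnames=COLUMNS,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\r\n",      # RFC 4180 uses CRLF line endings
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical row with a comma and double quotes in the path.
rows = [{
    "path": '/data/report, "final".txt',
    "size": 42,
    "ext": "txt",
    "category": "Documents",
    "mtime": 1712660000,
}]

buffer = io.StringIO()
write_inventory(rows, buffer)
csv_text = buffer.getvalue()

# Reading the text back recovers the original path unchanged.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[0]["path"])
```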

largest_n=N — Top-N largest files

Per-thread min-heap of capacity N. O(log N) cost per file, no extra pass, no full sort.

result = anscom.scan("/mnt/storage", largest_n=20, silent=True)
for f in result["largest_files"]:
    print(f"{f['size'] / 1024**3:.2f} GB  {f['path']}")

The printed report also gains a "TOP N LARGEST FILES" section.
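
The bounded min-heap idea can be sketched in a few lines of Python with heapq. This is an illustration of the technique only, assuming (size, path) tuples as input; top_n_largest is a hypothetical name and the real implementation lives in C. The heap root is always the smallest file kept so far, so a new file either replaces the root in O(log N) or is discarded after one comparison.

```python
import heapq


def top_n_largest(files, n):
    """Return the n largest (size, path) pairs, largest first."""
    if n <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest entry currently kept
    for size, path in files:
        if len(heap) < n:
            heapq.heappush(heap, (size, path))
        elif size > heap[0][0]:
            # Evict the current smallest and insert the new entry in one step.
            heapq.heapreplace(heap, (size, path))
    # Only the n survivors are sorted, never the full file list.
    return sorted(heap, reverse=True)


sample = [(10, "a"), (500, "b"), (70, "c"), (900, "d"), (30, "e")]
largest = top_n_largest(sample, 3)
print(largest)
```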

find_duplicates=True — CRC32-based duplicate detection

Two-phase: (1) Sort by size. Files with a unique size are skipped entirely, with zero I/O. (2) For each group of two or more same-size files, read the first 4096 bytes of every file and compute CRC32.

result = anscom.scan("/media-library", find_duplicates=True, silent=True)
print(f"Duplicate groups: {len(result['duplicates'])}")
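
The two-phase approach can be approximated in pure Python as follows. This is a sketch under stated assumptions, not anscom's code: candidate_duplicates is a hypothetical name, and it groups by size with a dictionary rather than sorting, which gives the same groups. Because only the first 4096 bytes are hashed, the groups this sketch returns are candidates; files that differ later in their contents would still land in the same group.

```python
import os
import tempfile
import zlib
from collections import defaultdict

PREFIX_BYTES = 4096


def candidate_duplicates(paths):
    """Group paths by size, then by CRC32 of the first PREFIX_BYTES bytes."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: never opened, so no read I/O
        by_crc = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as fh:
                by_crc[zlib.crc32(fh.read(PREFIX_BYTES))].append(path)
        groups.extend(sorted(g) for g in by_crc.values() if len(g) >= 2)
    return groups


# Demo on throwaway files: a and b are identical, c has the same size but
# differs in its last byte, d has a unique size.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "a.bin": b"x" * 100,
        "b.bin": b"x" * 100,
        "c.bin": b"x" * 99 + b"y",
        "d.bin": b"z" * 7,
    }
    paths = []
    for name, data in contents.items():
        full = os.path.join(tmp, name)
        with open(full, "wb") as fh:
            fh.write(data)
        paths.append(full)
    groups = candidate_duplicates(paths)
    group_names = [[os.path.basename(p) for p in g] for g in groups]

print(group_names)
```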

Combine with return_files=True to compute reclaimable space.
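
One way to estimate reclaimable space is to keep one copy per group and count the rest. The release notes do not document the shape of result["duplicates"], so this sketch assumes each group is a list of paths; reclaimable_bytes is a hypothetical helper and would need adjusting if the real structure differs.

```python
def reclaimable_bytes(duplicate_groups, files):
    """Bytes freed if all but one file in each duplicate group were removed.

    Assumes duplicate_groups is a list of lists of paths (an assumption,
    not a documented format) and files is a return_files-style list.
    """
    size_of = {entry["path"]: entry["size"] for entry in files}
    total = 0
    for group in duplicate_groups:
        sizes = [size_of[p] for p in group if p in size_of]
        if len(sizes) >= 2:
            total += sum(sizes) - max(sizes)  # keep one copy, count the rest
    return total


# Hypothetical data: three copies of a 1000-byte file, two of a 250-byte file.
files = [
    {"path": "/m/a.mp4", "size": 1000},
    {"path": "/m/b.mp4", "size": 1000},
    {"path": "/m/c.mp4", "size": 1000},
    {"path": "/m/x.jpg", "size": 250},
    {"path": "/m/y.jpg", "size": 250},
]
groups = [["/m/a.mp4", "/m/b.mp4", "/m/c.mp4"], ["/m/x.jpg", "/m/y.jpg"]]

saved = reclaimable_bytes(groups, files)
print(f"Reclaimable: {saved} bytes")
```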

regex_filter="pattern" — Path pattern filter

Only count files whose full path matches a regex.

anscom.scan("/codebase", regex_filter=r"/tests/.*\.py$", silent=True)

  • Linux / macOS: Native POSIX regcomp(REG_EXTENDED | REG_NOSUB) + regexec. Zero GIL acquisition; runs fully in C inside the worker threads.
  • Windows: Falls back to Python's re module. For large Windows scans, prefer the extensions whitelist for zero-GIL filtering.

Invalid patterns raise ValueError immediately — no scan is started.
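
The validate-first behavior can be sketched with Python's re module, which is what the Windows fallback is described as using. compile_path_filter and filter_paths are hypothetical names for illustration, and the unanchored search semantics are an assumption chosen to mirror regexec. Note that POSIX extended syntax and Python's re syntax are not identical, so a pattern may behave differently across platforms.

```python
import re


def compile_path_filter(pattern):
    """Compile the pattern up front, turning regex errors into ValueError."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex_filter {pattern!r}: {exc}") from exc


def filter_paths(paths, pattern):
    """Keep only paths where the pattern matches anywhere in the full path."""
    compiled = compile_path_filter(pattern)  # fails before any path is examined
    return [p for p in paths if compiled.search(p)]


paths = [
    "/codebase/tests/test_a.py",
    "/codebase/src/a.py",
    "/codebase/tests/data.json",
]
matched = filter_paths(paths, r"/tests/.*\.py$")
print(matched)

# An unterminated character class is rejected before any filtering happens.
try:
    filter_paths(paths, "([a-z")
    rejected = False
except ValueError:
    rejected = True
```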

Performance

All five features are strictly opt-in. A plain anscom.scan(".") with no new parameters runs the same hot path as v1.3.0: no extra syscalls, no per-file allocations, no behavioral change.

  • Per-thread FileInfo array pre-allocated at 65,536 entries — zero reallocations for typical scans.
  • fstatat on Linux called only when needed (two separate guards: one for type resolution, one for size/mtime collection).
  • Per-thread min-heap for largest_n — lock-free, merged after join.
  • Per-thread file arrays — lock-free, merged after join.
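
The "per-thread state, merged after join" pattern from the list above can be sketched with Python threads. This only shows the structure: in CPython the GIL serializes the work, whereas the C extension runs its workers without it. scan_shard and parallel_scan are hypothetical names, and the shards stand in for whatever subtrees each worker would traverse.

```python
import heapq
import threading


def scan_shard(shard, n, out, index):
    """Worker: collect files and a bounded min-heap in thread-local state."""
    local_files = []
    local_heap = []
    for path, size in shard:
        local_files.append((path, size))
        if len(local_heap) < n:
            heapq.heappush(local_heap, (size, path))
        elif local_heap and size > local_heap[0][0]:
            heapq.heapreplace(local_heap, (size, path))
    out[index] = (local_files, local_heap)  # each worker owns exactly one slot


def parallel_scan(shards, n):
    out = [None] * len(shards)
    threads = [
        threading.Thread(target=scan_shard, args=(shard, n, out, i))
        for i, shard in enumerate(shards)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Merging happens only after every worker has finished, so no locks
    # are needed while scanning.
    all_files = [f for files, _ in out for f in files]
    top = heapq.nlargest(n, (item for _, heap in out for item in heap))
    return all_files, top


shards = [
    [("a", 10), ("b", 500)],
    [("c", 70), ("d", 900), ("e", 30)],
]
all_files, top = parallel_scan(shards, 2)
print(top)
```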

Migration from v1.3.0

No breaking changes, with one exception: export_excel has been removed (see Removed below). All other v1.3.0 code runs unchanged on v1.5.0.

# v1.3.0 code — works identically on v1.5.0
result = anscom.scan("/data", silent=True, ignore_junk=True)

# v1.5.0 — opt into new features as needed
result = anscom.scan(
    "/data",
    silent=True,
    ignore_junk=True,
    return_files=True,
    largest_n=20,
    find_duplicates=True,
    export_csv="inventory.csv",
)

Bug Fixes

  • sorted_top paths are now strdup'd independently from global_heap — no lifetime overlap, no double-free.
  • fstatat on Linux is called only when min_size, return_files, export_csv, find_duplicates, or largest_n > 0 is active. Two separate guards for type resolution vs. size/mtime collection.
  • Full docstring on anscom.scan is now accessible via help(anscom.scan).

Removed

  • export_excel: crashed on Windows due to an openpyxl Workbook.read_only exception. Use export_csv followed by pandas' DataFrame.to_excel() instead, which is faster, keeps the scan itself dependency-free, and works identically across platforms.

Full Example — Everything at Once

import anscom

result = anscom.scan(
    "/mnt/enterprise",
    max_depth=20,
    workers=32,
    ignore_junk=True,
    silent=True,
    return_files=True,
    largest_n=50,
    find_duplicates=True,
    export_json="audit.json",
    export_csv="inventory.csv",
    show_tree=True,
    export_tree="tree.txt",
)

print(f"Files        : {result['total_files']:,}")
print(f"Duration     : {result['duration_seconds']:.3f}s")
print(f"Dup groups   : {len(result['duplicates'])}")
print(f"Largest file : {result['largest_files'][0]['path']}")

One scan pass. Three output files (audit.json, inventory.csv, tree.txt). Full in-memory results.


Install: pip install --upgrade anscom
PyPI: https://pypi.org/project/anscom/1.5.0/
License: MIT

v1.4.0-- ignore (unstable version)

09 Apr 07:51
effe3f7

I apologize for making the system unstable. I am working on restoring a stable version.
anscom.c
LICENSE.txt
README.md
setup.py

v1.3.0 — Export Release (JSON, Tree, Excel)

17 Mar 00:18
d87701b

What's New in v1.3.0

New Features

  • export_json: Export full scan results as a formatted JSON file. Zero external dependencies; uses Python's built-in json module.
  • export_tree: Save the complete DFS directory tree to a .txt file while also printing it to stdout. Written incrementally, so nothing accumulates in memory.
  • export_excel: Export scan results to a structured .xlsx file with three sheets: Categories, Extensions, and Summary. Requires openpyxl.
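
The incremental tree export can be pictured with a small Python sketch: each line is written the moment its entry is visited, so only one directory listing is held in memory at a time. This is an illustration, not anscom's C code; write_tree, the two-space indentation, and the trailing slash on directories are assumptions.

```python
import io
import os
import tempfile


def write_tree(root, out_file, echo=None, depth=0):
    """Depth-first walk that streams one line per entry to out_file.

    If echo is given (for example sys.stdout), every line is mirrored there.
    """
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        is_dir = entry.is_dir(follow_symlinks=False)
        line = "  " * depth + entry.name + ("/" if is_dir else "") + "\n"
        out_file.write(line)  # written immediately, never buffered as a tree
        if echo is not None:
            echo.write(line)
        if is_dir:
            write_tree(entry.path, out_file, echo, depth + 1)


# Demo on a throwaway directory: a.txt and sub/b.txt.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    buffer = io.StringIO()
    write_tree(tmp, buffer)
    tree_text = buffer.getvalue()

print(tree_text, end="")
```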

Bug Fixes

  • Fixed MSVC Windows compiler compatibility — added #include <stdint.h> for uint64_t support
  • Fixed PATH_MAX undefined error on Windows MSVC
  • Fixed variable declarations for strict C89/C90 MSVC compliance
  • Added /std:c11 flag in setup.py for Windows builds

API Changes

Three new optional parameters added to anscom.scan():

  • export_json=None — path to output JSON file
  • export_tree=None — path to output TXT file (requires show_tree=True)
  • export_excel=None — path to output XLSX file (requires pip install anscom[excel])

Installation

pip install anscom==1.3.0
# For Excel support:
pip install anscom[excel]

Full Example

import anscom

anscom.scan(
    "/path",
    show_tree=True,
    export_json="results.json",
    export_tree="tree.txt",
    export_excel="report.xlsx"
)