3 changes: 1 addition & 2 deletions README.md
@@ -1,5 +1,4 @@
# Portable Audit eXporter (PAX) Solution Set
<!-- v1.0.11 -->

**Portable Audit eXporter (PAX)** exports Copilot and AI usage data from Purview and Graph API audit logs via Graph API or EOM methods. All solutions export to CSV or Excel formats, ready for analysis in Power BI or your preferred data analysis tool.

@@ -67,7 +66,7 @@ This is an experimental script. On occasion, you may notice small deviations fro

---

> **🔍 Purview Audit Log Processor:** Download the script → [`PAX_Purview_Audit_Log_Processor_v1.10.5.ps1`](https://github.com/microsoft/PAX/releases/download/purview-v1.10.5/PAX_Purview_Audit_Log_Processor_v1.10.5.ps1)
> **🔍 Purview Audit Log Processor:** Download the script → [`PAX_Purview_Audit_Log_Processor_v1.10.6.ps1`](https://github.com/microsoft/PAX/releases/download/purview-v1.10.6/PAX_Purview_Audit_Log_Processor_v1.10.6.ps1)
>
> **📖 Resources:** [Latest Documentation](https://github.com/microsoft/PAX/blob/release/release_documentation/Purview_Audit_Log_Processor/PAX_Purview_Audit_Log_Processor_Documentation_v1.10.0.md) | [Latest Release Notes](https://github.com/microsoft/PAX/blob/release/release_notes/Purview_Audit_Log_Processor/PAX_Purview_Audit_Log_Processor_Release_Note_v1.10.0.md)
>
2 changes: 1 addition & 1 deletion release_documentation/.gitkeep
@@ -1 +1 @@
# Last updated: 2026-01-30 (PAX v1.0.17, Graph v1.0.1, Purview v1.10.5, CopilotInteractions v1.2.0)
# Last updated: 2026-02-10 (PAX v1.0.18, Graph v1.0.1, Purview v1.10.6, CopilotInteractions v1.2.0)
@@ -1,14 +1,14 @@
# Portable Audit eXporter (PAX) - <br/>Purview Audit Log Processor

> **📥 Quick Start:** Download the script → [`PAX_Purview_Audit_Log_Processor_v1.10.5.ps1`](https://github.com/microsoft/PAX/releases/download/purview-v1.10.5/PAX_Purview_Audit_Log_Processor_v1.10.5.ps1)
> **📥 Quick Start:** Download the script → [`PAX_Purview_Audit_Log_Processor_v1.10.6.ps1`](https://github.com/microsoft/PAX/releases/download/purview-v1.10.6/PAX_Purview_Audit_Log_Processor_v1.10.6.ps1)
>
> **📋 Release Notes:** See what's new → [v1.10.x Release Notes](https://github.com/microsoft/PAX/blob/release/release_notes/Purview_Audit_Log_Processor/PAX_Purview_Audit_Log_Processor_Release_Note_v1.10.0.md) | [All Release Notes](https://github.com/microsoft/PAX/tree/release/release_notes/Purview_Audit_Log_Processor)
>
> **📜 Previous Script Versions:** [All Purview Releases](https://github.com/microsoft/PAX/releases?q=purview-&expanded=true)
>
> **📚 Documentation Archive:** [v1.10.x Documentation](https://github.com/microsoft/PAX/blob/release/release_documentation/Purview_Audit_Log_Processor/PAX_Purview_Audit_Log_Processor_Documentation_v1.10.0.md) | [All Documentation](https://github.com/microsoft/PAX/tree/release/release_documentation/Purview_Audit_Log_Processor)

**Script:** `PAX_Purview_Audit_Log_Processor_v1.10.5.ps1`
**Script:** `PAX_Purview_Audit_Log_Processor_v1.10.6.ps1`
**Documentation Version:** 1.10.x
**Audience:** IT admins, security/compliance analysts, BI/data teams
**Runtime:** PowerShell 5.1 (compatible) / PowerShell 7+ (recommended)
@@ -51,7 +51,7 @@ This is an experimental script. On occasion, you may notice small deviations fro
12. [Combining Filters](#combining-filters)
13. [DSPM for AI](#dspm-for-ai)
14. [Excel Export](#excel-export)
15. [Incremental Data Collection](#incremental-data-collection-appendfile)
15. [Incremental Data Collection](#incremental-data-collection)
16. [Checkpoint & Resume](#checkpoint--resume)
17. [Output Files & Schema](#output-files--schema)
18. [Activity Types Reference](#activity-types-reference)
@@ -143,6 +143,7 @@ The **Portable Audit eXporter (PAX)** is an enterprise-grade PowerShell script t
- **Learned Block Sizes:** Per-activity and global adaptive sizing based on observed densities
- **Fast Data Writer:** Direct `StreamWriter` usage for CSV; ImportExcel module for Excel exports
- **Schema Sampling:** Configurable initial sampling to optimize column discovery vs. memory usage
- **Memory Management:** Automatic memory monitoring (`-MaxMemoryMB`) that streams records directly to JSONL files when system memory reaches the threshold (75% of RAM by default)

</details>

@@ -450,7 +451,7 @@ powershell -ExecutionPolicy Bypass -File .\PAX_Purview_Audit_Log_Processor.ps1 -

**Notes:**

- See [Incremental Data Collection](#incremental-data-collection-appendfile) section for complete documentation
- See [Incremental Data Collection](#incremental-data-collection) section for complete documentation
- Validates header compatibility before appending
- Works with both live query and offline replay modes
- NOT compatible with `-IncludeUserInfo` or `-OnlyUserInfo`
@@ -1165,6 +1166,33 @@ ExchangeAdmin, ExchangeItem, ExchangeMailbox, SharePointFileOperation, SharePoin

---

#### `-MaxMemoryMB` (int)

**Purpose:** Memory threshold that controls when PAX switches to JSONL-only streaming mode (records bypass in-memory collection and are written directly to incremental JSONL files). Active by default — PAX automatically monitors memory usage and streams to disk when the threshold is reached.
**Range:** `-1` to `65536`
**Default:** `-1` (auto = 75% of system RAM)
**Adjust When:**

- Running on memory-constrained machines where 75% of RAM is still too generous
- Running alongside other processes that need available RAM — set an explicit lower cap
- Scheduled/unattended exports where you want a predictable, fixed memory ceiling

**Notes:**

- Always active by default at 75% of system RAM — no action needed for most users
- Set to `0` to disable the memory threshold entirely (all records collected in memory)
- Not compatible with `-ExplodeArrays` or `-ExplodeDeep` (explosion modes always use in-memory processing; the threshold is ignored with a logged warning)
- Stored in the checkpoint file; can be overridden on the command line when resuming with `-Resume` (e.g., resuming on different hardware)

**Examples:**

```
-MaxMemoryMB 4096 # Override auto-detection — cap at 4 GB
-MaxMemoryMB 0 # Disable — keep all records in memory
```
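
The three value classes above (`-1` for auto, `0` for disabled, any other value as an explicit cap) can be sketched as a small resolver. This is an illustration of the documented behavior, not code from the PAX script: `Resolve-MemoryThresholdMB` and its `-TotalSystemMB` parameter are hypothetical names, and total RAM is passed in rather than detected so the sketch stays platform-neutral.

```powershell
# Hypothetical helper, not part of PAX: maps a -MaxMemoryMB value to an effective cap in MB.
function Resolve-MemoryThresholdMB {
    param(
        [ValidateRange(-1, 65536)]
        [int]$MaxMemoryMB = -1,

        # Total system RAM in MB, supplied by the caller (assumption for this sketch).
        [Parameter(Mandatory)]
        [long]$TotalSystemMB
    )

    # 0 disables the threshold: no cap, all records stay in memory.
    if ($MaxMemoryMB -eq 0) { return $null }

    # -1 means auto: 75% of system RAM.
    if ($MaxMemoryMB -eq -1) { return [long][math]::Floor($TotalSystemMB * 0.75) }

    # Any other value in range is an explicit cap.
    return [long]$MaxMemoryMB
}
```

On a machine with 16384 MB of RAM, the auto setting would resolve to a 12288 MB cap under these assumptions.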

---

### Observability & Completeness Parameters

#### `-EmitMetricsJson` (switch)
@@ -1271,6 +1299,7 @@ The `-Resume` switch restores ALL settings from the checkpoint file to ensure da
| `-ClientId` | Override client ID (for AppRegistration) |
| `-ClientSecret` | Provide client secret (for AppRegistration) |
| `-ExplosionThreads` | Override thread count for parallel explosion (e.g., resuming on different hardware) |
| `-MaxMemoryMB` | Override memory threshold (e.g., resuming on different hardware) |
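
As a sketch of the override described in the table, a resume on a smaller machine might look like the following. The 2048 MB value is an arbitrary illustration, and the script filename follows the unversioned form used in the other examples.

```powershell
# Resume from the checkpoint, lowering the memory threshold for this machine (value is illustrative)
./PAX_Purview_Audit_Log_Processor.ps1 -Resume -MaxMemoryMB 2048
```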

**NOT Allowed with `-Resume`:**

@@ -1821,7 +1850,7 @@ elseif ($LASTEXITCODE -eq 20) { Write-Host 'Circuit breaker tripped – investig

</details>

### Performance Tuning
### Performance Tuning Examples

<details>
<summary>💻 Show Performance Tuning Examples</summary>
@@ -1838,6 +1867,9 @@ elseif ($LASTEXITCODE -eq 20) { Write-Host 'Circuit breaker tripped – investig

# Parallel explosion for large datasets (PS7+ only)
./PAX_Purview_Audit_Log_Processor.ps1 -ExplodeDeep -ExplosionThreads 8 -StartDate 2025-10-01 -EndDate 2025-10-31

# Cap memory at 4 GB for large standard exports
./PAX_Purview_Audit_Log_Processor.ps1 -MaxMemoryMB 4096 -StartDate 2025-10-01 -EndDate 2025-10-31
```

</details>
@@ -3821,6 +3853,7 @@ This reactive approach is more reliable than time-based prompts because token li
- `-Auth` - Override authentication method
- `-TenantId`, `-ClientId`, `-ClientSecret` - Auth credentials for AppRegistration
- `-ExplosionThreads` - Override thread count for parallel explosion (e.g., resuming on different hardware)
- `-MaxMemoryMB` - Override memory threshold (e.g., resuming on different hardware)

**NOT Allowed with `-Resume`:**
- Any other parameter (dates, activities, explosion settings, etc.)
@@ -5060,6 +5093,18 @@ pwsh -ExecutionPolicy Bypass -File ./PAX_Purview_Audit_Log_Processor.ps1 `
-EndDate 2025-10-02
```

**For Large Standard (Non-Exploded) Exports:**

PAX automatically monitors memory and streams to JSONL when 75% of system RAM is reached. Use `-MaxMemoryMB` only to override the default threshold or disable it.

```powershell
# Override auto-detection — explicit 4 GB cap on memory-constrained machines
./PAX_Purview_Audit_Log_Processor.ps1 -MaxMemoryMB 4096 -StartDate 2025-10-01 -EndDate 2025-10-31

# Disable memory threshold — keep all records in memory (not recommended for large exports)
./PAX_Purview_Audit_Log_Processor.ps1 -MaxMemoryMB 0 -StartDate 2025-10-01 -EndDate 2025-10-31
```

</details>

### Parallel Execution Tuning
2 changes: 1 addition & 1 deletion release_notes/.gitkeep
Original file line number Diff line number Diff line change
@@ -1 +1 @@
# Last updated: 2026-01-30 (PAX v1.0.17, Graph v1.0.1, Purview v1.10.5, CopilotInteractions v1.2.0)
# Last updated: 2026-02-10 (PAX v1.0.18, Graph v1.0.1, Purview v1.10.6, CopilotInteractions v1.2.0)
@@ -3,7 +3,7 @@
## Release Information

- **Version:** 1.10.x
- **Release Date:** 2026-01-30
- **Release Date:** 2026-02-10
- **Released By:** Microsoft Copilot Growth ROI Advisory Team (copilot-roi-advisory-team-gh@microsoft.com)

---
@@ -12,7 +12,7 @@

Download the script below. For questions or issues, refer to the documentation.

- **PAX Purview Audit Log Processor Script v1.10.5:** [PAX_Purview_Audit_Log_Processor_v1.10.5.ps1](https://github.com/microsoft/PAX/releases/download/purview-v1.10.5/PAX_Purview_Audit_Log_Processor_v1.10.5.ps1)
- **PAX Purview Audit Log Processor Script v1.10.6:** [PAX_Purview_Audit_Log_Processor_v1.10.6.ps1](https://github.com/microsoft/PAX/releases/download/purview-v1.10.6/PAX_Purview_Audit_Log_Processor_v1.10.6.ps1)
- **Documentation v1.10.x (Markdown):** [PAX_Purview_Audit_Log_Processor_Documentation_v1.10.x.md](https://github.com/microsoft/PAX/blob/release/release_documentation/Purview_Audit_Log_Processor/PAX_Purview_Audit_Log_Processor_Documentation_v1.10.0.md)

---
Expand All @@ -25,7 +25,7 @@ The **Microsoft 365 Usage Bundle** (`-IncludeM365Usage`) is a single-switch acti

**Checkpoint & Resume** (`-Resume`) enables recovery from interrupted exports—a critical capability for multi-hour queries spanning large date ranges. PAX automatically saves progress after each partition completes, allowing seamless resumption after token expiry, network interruptions, or system restarts. Combined with intelligent token refresh (silent refresh attempts before prompting, proactive refresh for AppRegistration), this ensures reliable completion of even the longest exports.

Additional enhancements include **parallel explosion processing** (`-ExplosionThreads`) for faster post-retrieval performance on PS7+, **automatic 1M record limit detection** for Graph API queries (with BlockHours auto-subdivision), new CopilotInteraction control switches, an execution telemetry export option, improved automation support with the `-Force` parameter, and UX safeguards when many output files or tabs are expected.
Additional enhancements include **memory management** (`-MaxMemoryMB`) to prevent out-of-memory crashes on large exports by streaming records through JSONL files instead of accumulating them in memory, **parallel explosion processing** (`-ExplosionThreads`) for faster post-retrieval performance on PS7+, **automatic 1M record limit detection** for Graph API queries (with BlockHours auto-subdivision), new CopilotInteraction control switches, an execution telemetry export option, improved automation support with the `-Force` parameter, and UX safeguards when many output files or tabs are expected.

---

@@ -420,6 +420,42 @@ If minimum window reached:

---

### Memory Management: `-MaxMemoryMB`

| Area | Details |
| --- | --- |
| **Purpose** | Automatically prevents out-of-memory conditions during large audit log exports (100K+ records) by streaming records directly to JSONL files on disk instead of accumulating them in memory. Active by default — no switch required. |
| **Default** | `-1` (auto-detect: 75% of system RAM). Use `0` to disable and restore original unlimited behavior. |
| **How It Works** | Once memory usage reaches the threshold, records are written directly to incremental JSONL files on disk instead of accumulating in memory. At export time, records are streamed from the JSONL files to CSV in batches, with HashSet-based deduplication. |
| **Limitation** | Not compatible with explosion modes (`-ExplodeDeep`/`-ExplodeArrays`), which require all records in memory. When explosion is specified, `-MaxMemoryMB` is ignored with a warning. |
| **Checkpoint** | Value is saved in checkpoint JSON and restored on `-Resume`. Can be overridden on the resume command line. |

#### Example

```powershell
# Default (auto-detect 75% of system RAM)
./PAX_Purview_Audit_Log_Processor.ps1 `
-StartDate 2026-01-01 `
-EndDate 2026-02-01 `
-OutputPath "C:\Exports\"

# Explicit 4 GB limit
./PAX_Purview_Audit_Log_Processor.ps1 `
-StartDate 2026-01-01 `
-EndDate 2026-02-01 `
-MaxMemoryMB 4096 `
-OutputPath "C:\Exports\"

# Disable memory management (unlimited, original behavior)
./PAX_Purview_Audit_Log_Processor.ps1 `
-StartDate 2026-01-01 `
-EndDate 2026-02-01 `
-MaxMemoryMB 0 `
-OutputPath "C:\Exports\"
```
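
The export step described in the table (streaming JSONL to CSV in batches with HashSet-based deduplication) can be illustrated with a minimal sketch. This is not the script's implementation: `Convert-JsonlToCsvDeduplicated`, the `Id` key property, and the batch size are assumptions chosen for the example, and every record is assumed to share the same set of properties.

```powershell
# Illustrative sketch only: stream JSONL files to one CSV in batches, skipping duplicate keys.
function Convert-JsonlToCsvDeduplicated {
    param(
        [Parameter(Mandatory)][string[]]$JsonlPath,
        [Parameter(Mandatory)][string]$CsvPath,
        [string]$KeyProperty = 'Id',   # assumed unique record key (hypothetical)
        [int]$BatchSize = 5000         # records buffered before each CSV write
    )

    $seen  = [System.Collections.Generic.HashSet[string]]::new()
    $batch = [System.Collections.Generic.List[object]]::new()

    # Start from a clean file because the batches below are appended.
    if (Test-Path -LiteralPath $CsvPath) { Remove-Item -LiteralPath $CsvPath }

    foreach ($path in $JsonlPath) {
        # ReadLines enumerates lazily, so only the current batch is held in memory.
        $fullPath = (Resolve-Path -LiteralPath $path).Path
        foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
            if ([string]::IsNullOrWhiteSpace($line)) { continue }
            $record = $line | ConvertFrom-Json

            # HashSet.Add returns $false when the key was already seen.
            if (-not $seen.Add([string]$record.$KeyProperty)) { continue }

            $batch.Add($record)
            if ($batch.Count -ge $BatchSize) {
                $batch | Export-Csv -LiteralPath $CsvPath -NoTypeInformation -Append -Encoding UTF8
                $batch.Clear()
            }
        }
    }

    # Flush whatever is left in the final partial batch.
    if ($batch.Count -gt 0) {
        $batch | Export-Csv -LiteralPath $CsvPath -NoTypeInformation -Append -Encoding UTF8
    }

    return $seen.Count   # number of unique records written
}
```

Only the set of keys and one batch of records stay in memory at a time, which is the property that makes this pattern suitable for exports too large to hold as a single collection.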

---

## Bug Fixes

- **(v1.10.0) Activity Type Breakdown metrics:** Fixed an issue where "Retrieved" counts showed 0 in the Activity Type Breakdown and Pipeline Summary sections. Per-activity retrieved counts now display correctly in all code paths.
@@ -450,6 +486,16 @@ If minimum window reached:

- **(v1.10.5) AppRegistration token refresh failure:** Fixed "Parameter set cannot be resolved using the specified named parameters" error during automatic token refresh in long-running AppRegistration operations. The `Invoke-TokenRefresh` function had the same parameter set conflict fixed in v1.10.2 for initial authentication—passing `-ClientId` alongside `-ClientSecretCredential` when the Graph SDK expects ClientId embedded only in the PSCredential.

- **(v1.10.6) AppRegistration token reliability for long-running exports:** Fixed multiple issues causing 401 authentication cascades during exports exceeding 60 minutes with `-Auth AppRegistration`. ThreadJob parallel partitions now build fresh headers from the shared auth state for every API call (12 locations fixed), token refresh logic now correctly uses AppRegistration credentials instead of defaulting to interactive WebLogin, and proactive token refresh now runs periodically every 30 minutes throughout the export.

- **(v1.10.6) Partition error recovery and final reconciliation:** Fixed an issue where partitions encountering non-authentication errors were not being queued for retry, potentially resulting in missing data in the final export. Added a final reconciliation safety net before export that detects any incomplete partitions and retries them sequentially (up to 5 attempts). Error messages now accurately indicate that failed partitions will be retried automatically.

- **(v1.10.6) Query slot cleanup and fetch retry:** Failed partitions now clean up their server-side query slots immediately, preventing orphaned queries from filling all 10 concurrent slots and blocking subsequent queries. Record fetch failures are now retried (3 attempts, 30-second delays) before the query is deleted, which preserves the costly server-side query preparation work.

- **(v1.10.6) Zero-record run cleanup:** Fixed `_PARTIAL` suffix remaining on output CSV and log filenames when all partitions completed successfully but returned 0 records. Checkpoint files are now properly cleaned up on zero-record runs.

- **(v1.10.6) Log message completeness:** Fixed missing and duplicate "Query succeeded" messages in the log file. All three ThreadJob output processing code paths now reliably emit exactly one success message per partition.
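
The fetch retry noted above (3 attempts, 30-second delays) follows a common bounded-retry pattern. The sketch below shows that pattern in general terms and is not taken from the script: `Invoke-WithRetry` and its parameters are hypothetical names, and it only catches terminating errors thrown by the action.

```powershell
# Hypothetical helper, not part of PAX: run an action up to $MaxAttempts times with a fixed delay.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts = 3,     # matches the 3 attempts in the release note
        [int]$DelaySeconds = 30    # matches the 30-second delay in the release note
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Success: hand the action's output back to the caller.
            return (& $Action)
        }
        catch {
            # Out of attempts: rethrow the last error so the caller can clean up.
            if ($attempt -eq $MaxAttempts) { throw }

            Write-Warning "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message). Retrying in $DelaySeconds s."
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}
```

A caller would wrap only the record fetch in the action and delete the server-side query after the helper returns or throws, so a transient fetch failure does not discard a query that was expensive to prepare.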

---

## Known Considerations
2 changes: 1 addition & 1 deletion script_archive/.gitkeep
Original file line number Diff line number Diff line change
@@ -1 +1 @@
# Last updated: 2026-01-30 (PAX v1.0.17, Graph v1.0.1, Purview v1.10.5, CopilotInteractions v1.2.0)
# Last updated: 2026-02-10 (PAX v1.0.18, Graph v1.0.1, Purview v1.10.6, CopilotInteractions v1.2.0)