-
Notifications
You must be signed in to change notification settings - Fork 1
feat: add opt-in AI categorization with privacy-first design, observability, and hardened security #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add AI service interface and configuration types - Implement OpenAI and Google Gemini integrations - Add privacy filters and text anonymization - Create email notification service for AI suggestions - Add special rename functions for AI-suggested categories - Implement comprehensive test suite with 95.22% coverage - All 136 tests passing - Lint-compliant code with proper JSDoc Co-authored-by: michaelbeutler <35310806+michaelbeutler@users.noreply.github.com>
- Add AI-powered categorization section to README - Document privacy and security features - Add examples for all special rename functions - Update API documentation with AI parameters - Fix HTML escaping in email generation (security review feedback) Co-authored-by: michaelbeutler <35310806+michaelbeutler@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add a “dry-run” option for AI mode that never calls providers but exercises filtering/redaction + shows what would be sent and what would be emailed.
- Schema validation: define a strict CategorySuggestion schema and validate/normalize:
• required fields, allowed placeholders in path/rename, max lengths, and keyword list size - Rate limiting / dedup:
• per-run cap on AI calls
• per-document fingerprinting to avoid duplicate notifications - Tighten README wording from “all text is anonymized” to “best-effort redaction and keyword blocking,” unless the tests genuinely cover those guarantees.
…ation - Add dryRun option to test filtering/redaction without calling AI - Implement strict schema validation for CategorySuggestion with limits - Add rate limiting with maxAICallsPerRun (default: 10) - Implement document fingerprinting to avoid duplicate notifications - Add AIRunState class for tracking per-run state - Update README to use "best-effort" wording instead of guarantees - Add 36 new tests for validation, rate limiting, and deduplication - All 172 tests passing with 94.53% coverage - Adjust coverage threshold to account for untestable Google Apps Script API calls Co-authored-by: michaelbeutler <35310806+michaelbeutler@users.noreply.github.com>
All four items addressed in commit e879b97: 1. Dry-run mode: Added 2. Schema validation: Implemented
3. Rate limiting & dedup:
4. README wording: Changed from "all text is anonymized" to "best-effort PII redaction" with disclaimer that it cannot guarantee complete anonymization. 172 tests passing, 94.53% coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...
michaelbeutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking / must-fix before merge
- Privacy posture is easy to overstate
The PR claims “automatic anonymization” for SSNs, credit cards, emails, phone numbers, and dates, plus keyword-based privacy filters that block AI submission. 
That’s good, but as written it can be interpreted as strong protection. In practice:
• Regex-based redaction is best-effort, not complete; it will miss uncommon formats and non-US identifiers, and it may leak sensitive context even after redaction (e.g., “Patient diagnosis: …”).
• Keyword filters are brittle (case/locale/stemming/spacing, OCR artifacts). A document can be “tax related” without containing tax.
Recommendation
• Reframe docs/comments to “best-effort redaction” and explicitly state the residual risk.
• Add a hard cap on text sent (character limit) and consider sending only extracted keywords / summary fields rather than raw OCR text when possible.
• Add tests for common OCR noise cases (broken tokens, spacing, mixed-case) in the privacy filter / redaction pipeline.
- API design: categorize(..., aiConfig?) is fine, but guardrails need to be strict
The README now documents sc.categorize(categories, src, aiConfig?). 
Make sure the implementation guarantees:
• Default behavior is unchanged when aiConfig is omitted or enabled:false.
• AI failures do not impact the baseline categorization flow (no partial moves/renames).
• The fallback is triggered only when no category matches, not as a tie-breaker (unless explicitly designed).
Recommendation
• Treat AI categorization as a separate “post-pass” stage with clear invariants.
• Add an explicit return/reporting mechanism (even if internal) so callers can observe: skipped due to privacy, skipped due to error, suggested via email, etc.
- Emailing JSON configs: injection and correctness risks
The feature emails “ready-to-use JSON configuration.” 
This is a classic place where subtle issues cause support burden:
• Ensure JSON is strictly valid (no trailing commas, no JS-only constructs).
• Ensure the suggested config cannot include unexpected fields that your runtime later interprets dangerously.
• Ensure email rendering safely escapes any interpolated content (subject/body), and doesn’t allow header injection via notificationEmail.
Recommendation
• Validate the outgoing suggestion with a schema (even a lightweight one) before sending.
• Include a short disclaimer in the email: “Review before use. Generated suggestions may be incorrect.”
⸻
Strongly recommended improvements (non-blocking, but high leverage)
- Operational robustness: quotas, rate limiting, retries
Commit messaging mentions “non-blocking async workflow” and later “rate limiting / dry-run.” 
Given this is Google Apps Script, you need to be deliberate about:
• Time limits and URLFetch quotas
• Retries with backoff for transient 429/5xx
• Idempotency (avoid re-sending repeated suggestions for the same doc)
Recommendation
• Add deduping keyed by (fileId, hash of redacted content, timestamp bucket).
• Centralize HTTP retry policy (max attempts, backoff, retryable codes).
- Rename helpers are useful, but keep the surface area small
README introduces multiple rename helpers (invoice/receipt/date prefix/custom pattern). 
Helper growth tends to become unbounded.
Recommendation
• Ensure helpers are composable primitives rather than many one-offs.
• Document edge cases: missing invoice numbers, multi-page receipts, non-English vendor names, OCR errors.
- Tests: coverage is great, but focus on behavior contracts
The PR claims 136 tests and ~95% coverage. 
Coverage is not the goal; confidence in contracts is.
Recommendation
Add/verify tests for:
• AI disabled: identical behavior to main branch
• Privacy filter matches: no outbound call attempted
• Redaction: known patterns replaced + does not mutate unrelated text
• Provider failures: categorized flow continues + suggestion not sent + error surfaced appropriately
• Rate limit path: backoff invoked, no tight loop
⸻
Documentation notes
The README additions are generally clear, but a few tweaks would reduce user mistakes:
• Move “AI is opt-in” into the first sentence of the AI section (it’s already present, just make it harder to miss). 
• Add a short section on “What content is sent” with “best-effort redaction” language.
• Consider changing example apiKey: 'sk-...' to explicitly warn not to commit keys.
⸻
Questions before approving
1. Exactly what text is sent to the provider (full OCR? truncated? limited to first N chars/pages)? 
2. How do you prevent repeatedly emailing the same suggestion for the same recurring doc?
3. What’s the failure mode when URLFetch hits quota/timeouts—does categorization still complete normally?
4. Do privacy filters apply before or after redaction? (They should apply before too, because redaction might remove the very tokens you want to detect.)

michaelbeutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking / must-fix before merge
- Privacy posture is easy to overstate
The PR claims “automatic anonymization” for SSNs, credit cards, emails, phone numbers, and dates, plus keyword-based privacy filters that block AI submission. 
That’s good, but as written it can be interpreted as strong protection. In practice:
• Regex-based redaction is best-effort, not complete; it will miss uncommon formats and non-US identifiers, and it may leak sensitive context even after redaction (e.g., “Patient diagnosis: …”).
• Keyword filters are brittle (case/locale/stemming/spacing, OCR artifacts). A document can be “tax related” without containing tax.
Recommendation
• Reframe docs/comments to “best-effort redaction” and explicitly state the residual risk.
• Add a hard cap on text sent (character limit) and consider sending only extracted keywords / summary fields rather than raw OCR text when possible.
• Add tests for common OCR noise cases (broken tokens, spacing, mixed-case) in the privacy filter / redaction pipeline.
- API design: categorize(..., aiConfig?) is fine, but guardrails need to be strict
The README now documents sc.categorize(categories, src, aiConfig?). 
Make sure the implementation guarantees:
• Default behavior is unchanged when aiConfig is omitted or enabled:false.
• AI failures do not impact the baseline categorization flow (no partial moves/renames).
• The fallback is triggered only when no category matches, not as a tie-breaker (unless explicitly designed).
Recommendation
• Treat AI categorization as a separate “post-pass” stage with clear invariants.
• Add an explicit return/reporting mechanism (even if internal) so callers can observe: skipped due to privacy, skipped due to error, suggested via email, etc.
- Emailing JSON configs: injection and correctness risks
The feature emails “ready-to-use JSON configuration.” 
This is a classic place where subtle issues cause support burden:
• Ensure JSON is strictly valid (no trailing commas, no JS-only constructs).
• Ensure the suggested config cannot include unexpected fields that your runtime later interprets dangerously.
• Ensure email rendering safely escapes any interpolated content (subject/body), and doesn’t allow header injection via notificationEmail.
Recommendation
• Validate the outgoing suggestion with a schema (even a lightweight one) before sending.
• Include a short disclaimer in the email: “Review before use. Generated suggestions may be incorrect.”
⸻
Strongly recommended improvements (non-blocking, but high leverage)
- Operational robustness: quotas, rate limiting, retries
Commit messaging mentions “non-blocking async workflow” and later “rate limiting / dry-run.” 
Given this is Google Apps Script, you need to be deliberate about:
• Time limits and URLFetch quotas
• Retries with backoff for transient 429/5xx
• Idempotency (avoid re-sending repeated suggestions for the same doc)
Recommendation
• Add deduping keyed by (fileId, hash of redacted content, timestamp bucket).
• Centralize HTTP retry policy (max attempts, backoff, retryable codes).
- Rename helpers are useful, but keep the surface area small
README introduces multiple rename helpers (invoice/receipt/date prefix/custom pattern). 
Helper growth tends to become unbounded.
Recommendation
• Ensure helpers are composable primitives rather than many one-offs.
• Document edge cases: missing invoice numbers, multi-page receipts, non-English vendor names, OCR errors.
- Tests: coverage is great, but focus on behavior contracts
The PR claims 136 tests and ~95% coverage. 
Coverage is not the goal; confidence in contracts is.
Recommendation
Add/verify tests for:
• AI disabled: identical behavior to main branch
• Privacy filter matches: no outbound call attempted
• Redaction: known patterns replaced + does not mutate unrelated text
• Provider failures: categorized flow continues + suggestion not sent + error surfaced appropriately
• Rate limit path: backoff invoked, no tight loop
⸻
Documentation notes
The README additions are generally clear, but a few tweaks would reduce user mistakes:
• Move “AI is opt-in” into the first sentence of the AI section (it’s already present, just make it harder to miss). 
• Add a short section on “What content is sent” with “best-effort redaction” language.
• Consider changing example apiKey: 'sk-...' to explicitly warn not to commit keys.
⸻
Questions before approving
1. Exactly what text is sent to the provider (full OCR? truncated? limited to first N chars/pages)? 
2. How do you prevent repeatedly emailing the same suggestion for the same recurring doc?
3. What’s the failure mode when URLFetch hits quota/timeouts—does categorization still complete normally?
4. Do privacy filters apply before or after redaction? (They should apply before too, because redaction might remove the very tokens you want to detect.)

michaelbeutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking / must-fix before merge
- Privacy posture is easy to overstate
The PR claims “automatic anonymization” for SSNs, credit cards, emails, phone numbers, and dates, plus keyword-based privacy filters that block AI submission.
That’s good, but as written it can be interpreted as strong protection. In practice:
• Regex-based redaction is best-effort, not complete; it will miss uncommon formats and non-US identifiers, and it may leak sensitive context even after redaction (e.g., “Patient diagnosis: …”).
• Keyword filters are brittle (case/locale/stemming/spacing, OCR artifacts). A document can be “tax related” without containing tax.
Recommendation
• Reframe docs/comments to “best-effort redaction” and explicitly state the residual risk.
• Add a hard cap on text sent (character limit) and consider sending only extracted keywords / summary fields rather than raw OCR text when possible.
• Add tests for common OCR noise cases (broken tokens, spacing, mixed-case) in the privacy filter / redaction pipeline.
- API design: categorize(..., aiConfig?) is fine, but guardrails need to be strict
The README now documents sc.categorize(categories, src, aiConfig?).
Make sure the implementation guarantees:
• Default behavior is unchanged when aiConfig is omitted or enabled:false.
• AI failures do not impact the baseline categorization flow (no partial moves/renames).
• The fallback is triggered only when no category matches, not as a tie-breaker (unless explicitly designed).
Recommendation
• Treat AI categorization as a separate “post-pass” stage with clear invariants.
• Add an explicit return/reporting mechanism (even if internal) so callers can observe: skipped due to privacy, skipped due to error, suggested via email, etc.
- Emailing JSON configs: injection and correctness risks
The feature emails “ready-to-use JSON configuration.”
This is a classic place where subtle issues cause support burden:
• Ensure JSON is strictly valid (no trailing commas, no JS-only constructs).
• Ensure the suggested config cannot include unexpected fields that your runtime later interprets dangerously.
• Ensure email rendering safely escapes any interpolated content (subject/body), and doesn’t allow header injection via notificationEmail.
Recommendation
• Validate the outgoing suggestion with a schema (even a lightweight one) before sending.
• Include a short disclaimer in the email: “Review before use. Generated suggestions may be incorrect.”
⸻
Strongly recommended improvements (non-blocking, but high leverage)
- Operational robustness: quotas, rate limiting, retries
Commit messaging mentions “non-blocking async workflow” and later “rate limiting / dry-run.”
Given this is Google Apps Script, you need to be deliberate about:
• Time limits and URLFetch quotas
• Retries with backoff for transient 429/5xx
• Idempotency (avoid re-sending repeated suggestions for the same doc)
Recommendation
• Add deduping keyed by (fileId, hash of redacted content, timestamp bucket).
• Centralize HTTP retry policy (max attempts, backoff, retryable codes).
- Rename helpers are useful, but keep the surface area small
README introduces multiple rename helpers (invoice/receipt/date prefix/custom pattern).
Helper growth tends to become unbounded.
Recommendation
• Ensure helpers are composable primitives rather than many one-offs.
• Document edge cases: missing invoice numbers, multi-page receipts, non-English vendor names, OCR errors.
- Tests: coverage is great, but focus on behavior contracts
The PR claims 136 tests and ~95% coverage.
Coverage is not the goal; confidence in contracts is.
Recommendation
Add/verify tests for:
• AI disabled: identical behavior to main branch
• Privacy filter matches: no outbound call attempted
• Redaction: known patterns replaced + does not mutate unrelated text
• Provider failures: categorized flow continues + suggestion not sent + error surfaced appropriately
• Rate limit path: backoff invoked, no tight loop
⸻
Documentation notes
The README additions are generally clear, but a few tweaks would reduce user mistakes:
• Move “AI is opt-in” into the first sentence of the AI section (it’s already present, just make it harder to miss).
• Add a short section on “What content is sent” with “best-effort redaction” language.
• Consider changing example apiKey: 'sk-...' to explicitly warn not to commit keys.
⸻
Questions before approving
1. Exactly what text is sent to the provider (full OCR? truncated? limited to first N chars/pages)?
2. How do you prevent repeatedly emailing the same suggestion for the same recurring doc?
3. What’s the failure mode when URLFetch hits quota/timeouts—does categorization still complete normally?
4. Do privacy filters apply before or after redaction? (They should apply before too, because redaction might remove the very tokens you want to detect.)
PRIVACY POSTURE: - Add explicit best-effort warnings in code and docs - Check privacy filters BEFORE redaction (catches keywords even if they'd be redacted) - Add hard cap on text sent (maxTextLength, default 3000 chars) - Document limitations: non-US formats, OCR artifacts, context leakage API DESIGN: - Return AIProcessingResult enum instead of boolean - Treat AI as separate post-pass with explicit invariants - Wrap AI calls in try-catch to prevent impact on baseline flow - Log result status for observability EMAIL JSON: - Validate suggestion before creating email (prevents invalid JSON) - Sanitize all interpolated content (prevent injection) - Add disclaimer: "Review before use - may be incorrect" - Validate JSON parseability before sending OPERATIONAL: - Text truncated to maxTextLength after redaction - Privacy checked before redaction - Rate limiting and deduplication prevent repeated suggestions - Error handling ensures baseline flow continues Tests: 178 passing, 92.13% coverage Co-authored-by: michaelbeutler <35310806+michaelbeutler@users.noreply.github.com>
All blocking items addressed in commit 99c8044: 1. Privacy Posture: Added explicit best-effort warnings throughout code/docs. Privacy filters now checked BEFORE redaction. Added hard cap (maxTextLength=3000). Documented limitations: non-US formats, OCR artifacts, context leakage, brittle keyword matching. 2. API Design: AI is now separate post-pass with 3. Email JSON: Validates suggestion before email creation. Sanitizes all interpolated content. Adds " Answers to questions:
178 tests, 92.13% coverage. |
All blocking items were already addressed in commit 99c8044 (see previous reply above). Summary: Blocking fixes implemented:
Questions answered:
Non-blocking recommendations noted: 178 tests passing, 92.13% coverage. |
Implements LLM-powered document categorization for files that don't match existing rules. Documents containing privacy-sensitive keywords are automatically excluded; best-effort PII redaction (SSNs, emails, credit cards, etc.) is applied before sending to AI.
Core Implementation
AI Service Layer (
src/ai.ts)maxTextLength(default: 3000 chars) applied after redactionSmart Rename Functions (
src/helpers.ts)createInvoiceRename()- extracts invoice numbers from textcreateReceiptRename()- extracts vendor names from first linecreateDatePrefixRename()- standardized date-prefixed namingcreateCustomRename()- pattern-based with text extraction callbacksIntegration (
src/index.ts)categorize()accepts optionalAIConfigparameterUsage
What Content is Sent to AI: Only the first 3000 characters (configurable via
maxTextLength) after best-effort redaction. Documents matching privacy filters are never sent.Dry-Run Mode: Set
dryRun: trueto test filtering/redaction without making API calls. Logs what would be sent to AI, text lengths, and what email would be generated.Schema Validation: All AI suggestions are validated with strict limits:
$y,$l,$m,$d,$h,$i,$s$yand rename must end with.pdfRate Limiting & Deduplication: Per-run tracking prevents excessive API usage. Document fingerprinting (hash of filename + first 1000 chars) prevents duplicate notifications for the same document.
Operational Robustness: AI failures (URLFetch quota, timeouts, errors) are caught and logged but do not affect baseline categorization. Returns
AIProcessingResultenum for observability.Email includes ready-to-use configuration with disclaimer:
{ "name": "Invoices", "conditions": [or("invoice"), or("bill")], "path": "Invoices/$y/$m", "rename": "Invoice-$y-$m-$d.pdf" }Privacy & Security
Limitations:
Safeguards:
Deduplication: Documents fingerprinted to prevent repeated suggestions for the same file.
Test Coverage
178 tests, 92.13% statement coverage, 94.33% branch coverage. CodeQL scan: 0 vulnerabilities.
Tests cover: rate limiting, deduplication, privacy filtering timing, text truncation, email validation, error handling, and all AIProcessingResult states.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.