Implementation Plan

mandla-enkosi edited this page Apr 6, 2025 · 7 revisions

Stage 1: MVP Implementation

1. Overview

  • Objective:
    Build the core functionality of the Code Repository Cleaner as a static, front-end–only application using React. This stage focuses on file upload, initial analysis, filtering via gitignore rules, sensitive data scanning with standard regex patterns, basic binary/media filtering, and dual export options (zip and concatenated document). A light ad integration will be embedded using static scripts.
  • Expected Result:
    A functional MVP that allows users to upload a code directory, receive an initial analysis, and process files through the filtering and sensitive data scanning pipeline. The tool will output a cleaned repository as a downloadable zip archive or concatenated document. The UI will include basic progress feedback and unobtrusive ad placements.

2. Dependency Validation

  • Pre-Stage Requirements:
    • Verified file upload capability (including directory selection using webkitdirectory).
    • Integrated core libraries: React, Material UI, JSZip, Micromatch, and static ad scripts.
    • Environment configuration: Node.js and npm installed, with the project scaffolded via create-vite using the react-ts template.
  • External Dependencies:
    • Third-party libraries for zip generation and pattern matching.
    • Ad network scripts (e.g., Google AdSense or custom internal ad code).

3. Diagrams

  • Visual Aids:
    • Architecture Diagram:
flowchart TD
    A[React UI Components] --> B[File Upload & Analysis Module]
    B --> C[File Processing Engine]
    C --> D[Gitignore Filtering Module]
    C --> E[Sensitive Data Scanner]
    C --> F[Media/Binary Filter]
    D & E & F --> G[Initial Export List]
    G --> H[File Override Panel]
    H --> I[Final Export Engine]
    I --> J[Output: Cleaned Zip / Concatenated Document]
    A --> K[Ad Integration Module]
    K --> A

4. Touched Parts

  • Modules/Functionalities/Files:
    • Components (these draw their core logic and state management from the custom hooks in /src/hooks):
      • FileUploader.tsx (file/directory upload)
      • AnalysisDashboard.tsx (initial analysis results)
      • ConfigurationPanel.tsx (filtering and export options)
      • FileOverridePanel.tsx (hierarchical view of all files with toggles/checkboxes for user selection)
      • ProgressIndicator.tsx (processing status)
      • ExportOptions.tsx (export controls)
      • AdBanner.tsx (ad placement)
    • Hooks:
      • /hooks/useFileProcessing.ts: Manages the state machine for a single cleaning job (status, files, configuration used, overrides, results) and orchestrates worker interaction via useWorkerManager
      • /hooks/useWorkerManager.ts: Manages the Web Worker lifecycle and provides a typed interface for communication (sending tasks, receiving messages)
      • /hooks/useConfiguration.ts: Manages global/persistent user settings (e.g., preferred export format)
    • Modules:
      • /modules/fileProcessing.ts (orchestration of processing tasks)
      • /modules/gitignoreFilter.ts (applies gitignore rules)
      • /modules/sensitiveScanner.ts (scans and obfuscates sensitive data)
      • /modules/mediaBinaryFilter.ts (filters binaries, caches, media files)
      • /modules/exportEngine.ts (generates zip or concatenated output)
    • Web Workers:
      • /workers/processor.worker.ts (offloads heavy processing)
    • Utilities:
      • /utils/api.ts (ad integration helpers)
      • /utils/storage.ts (IndexedDB/local storage utilities)
    • Entry Points & Styles:
      • App.tsx, main.tsx, styles/Theme.ts and styles/index.css
    • Continuous Integration (CI):
      • .github/workflows/ci.yml for GitHub Actions performing linting, testing (vitest --run), and building

5. Module Contracts & Interface Definitions

  • Interface Definitions:
    • File Processing:
      • processFiles(files: File[]): Promise<ProcessedData>
        • where ProcessedData might be { filesToInclude: ProcessedFile[], removedFileCount: number, sensitiveDataFound: boolean }
      • applyGitignoreRules(files: File[], rules: string[]): File[]
      • overrideFileSelection(files: File[], selections: Record<string, boolean>): File[]
    • Sensitive Scanner:
      • scanSensitiveData(content: string): string
    • Export Engine:
      • generateZip(processedData: ProcessedData): Promise<Blob> (asynchronous, because JSZip builds archives through generateAsync)
  • Local State & Error Handling:
    • Define state for file upload status, processing progress, and export readiness.
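    • Errors such as FileUploadError, ProcessingError, and ExportError should surface as UI error messages. React Error Boundaries only catch errors thrown during rendering, so errors raised in event handlers, promises, or worker messages need to be captured in hook state and rendered from there (or rethrown during render so that a boundary can catch them).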
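
As a rough illustration of the scanSensitiveData(content: string): string contract above, the sketch below redacts a few well-known secret shapes. The rule list, the ScanRule type, and the "[REDACTED]" placeholder are assumptions for this example; the plan only says "standard regex patterns" and does not enumerate them.

```typescript
// Hypothetical rule shape: a global regex plus a String.replace replacement.
interface ScanRule {
  name: string;
  regex: RegExp;
  replacement: string;
}

// Illustrative rules only; a real scanner would need a broader, vetted set.
const SCAN_RULES: ScanRule[] = [
  {
    // AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    name: "aws-access-key-id",
    regex: /\bAKIA[0-9A-Z]{16}\b/g,
    replacement: "[REDACTED]",
  },
  {
    // PEM-encoded private key blocks, including the BEGIN/END lines.
    name: "private-key-block",
    regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
    replacement: "[REDACTED]",
  },
  {
    // Quoted assignments such as password = "..." or api_key: '...'.
    // The key, separator, and quotes are kept; only the value is replaced.
    name: "credential-assignment",
    regex: /\b(password|passwd|secret|api[_-]?key|token)(\s*[:=]\s*)(["'])[^"'\n]+\3/gi,
    replacement: "$1$2$3[REDACTED]$3",
  },
];

// Applies every rule in order and returns the obfuscated content.
function scanSensitiveData(content: string): string {
  return SCAN_RULES.reduce(
    (text, rule) => text.replace(rule.regex, rule.replacement),
    content,
  );
}
```

Returning the obfuscated string keeps the function pure, which makes it straightforward to unit-test with known inputs and to run inside a Web Worker.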

6. Tests

Tests will be written and executed using vitest. Unit tests will be implemented in files named *.test.ts(x) co-located with their corresponding source files in /src. Component and hook tests will utilize @testing-library/react.

  • Unit Tests:
    • Test file upload parsing and validation.
    • Test gitignore rule application with multiple scenarios.
    • Validate sensitive data scanning with known input strings.
    • Verify export engine outputs (zip file and concatenated document) using simulated processed data.
    • Test FileOverridePanel component to ensure that manual toggling correctly updates the selection state.
    • Test the new overrideFileSelection function with various scenarios (e.g., all files toggled off, mixed selections, etc.).
    • Test the custom React hooks (useFileProcessing, useWorkerManager, useConfiguration) independently using @testing-library/react's renderHook to verify their state transitions, logic, and interactions (mocking worker communication for useFileProcessing when testing it, mocking hook interactions when testing components).
  • Integration Tests:
    • Simulate a complete user flow from file upload to export.
    • Use React Testing Library to ensure component interactions (upload → configuration → processing → export) work as expected.
    • Validate the end-to-end flow: Upload → Automatic Processing → Manual Override → Export.
    • Ensure that the final output respects both the automatic filtering and the manual overrides.
    • Component tests should focus on rendering and user interaction, typically mocking the custom hooks they consume (useFileProcessing, etc.) to isolate component logic, especially for hooks that manage complex state or side effects.
  • Edge Cases & Negative Testing:
    • Test handling of empty directories, extremely large files, and unsupported file types.
    • Simulate failure scenarios (e.g., file read errors) and ensure graceful degradation.
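
The override scenarios listed under Unit Tests (all files toggled off, mixed selections) can be exercised against a small pure function. The sketch below is one possible reading of overrideFileSelection: it uses a plain CandidateFile object with an autoIncluded flag instead of the browser File type so that it runs outside a browser. Both the type and the flag are assumptions for this example.

```typescript
// Hypothetical stand-in for a scanned file and its automatic filter decision.
interface CandidateFile {
  path: string;
  autoIncluded: boolean; // result of gitignore/binary filtering
}

// Returns the files to export. A manual selection for a path wins over the
// automatic decision; paths without a selection keep the automatic decision.
function overrideFileSelection(
  files: readonly CandidateFile[],
  selections: Readonly<Record<string, boolean>>,
): CandidateFile[] {
  return files.filter((file) =>
    Object.prototype.hasOwnProperty.call(selections, file.path)
      ? selections[file.path]
      : file.autoIncluded,
  );
}

const sampleFiles: CandidateFile[] = [
  { path: "src/index.ts", autoIncluded: true },
  { path: "assets/logo.png", autoIncluded: false },
  { path: ".env", autoIncluded: false },
];

// Mixed selections: drop one auto-included file, re-include one excluded file.
const mixedSelection = overrideFileSelection(sampleFiles, {
  "src/index.ts": false,
  "assets/logo.png": true,
});
```

In this example mixedSelection contains only assets/logo.png: the manual toggles replace the automatic decisions for the two toggled paths, and .env stays excluded because nothing overrides it.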

7. Module Integration Checkpoints

  • Internal Module Integration:
    • Confirm that the file upload module correctly passes files to the processing engine.
    • Validate that outputs from gitignore filtering, sensitive scanning, and media filtering combine seamlessly into the export engine.
  • Interface Integration:
    • Ensure that the Export Engine generates correct outputs from processed data.
    • Verify that the Ad Integration module loads static ad scripts without affecting core functionality.
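
For the export checkpoint above, the concatenated-document output can be checked against a deterministic generator. The sketch below assumes a ProcessedFile shape of { path, content } and a "===== path =====" header line; neither is specified in this plan.

```typescript
// Hypothetical shape of a file that survived filtering and scanning.
interface ProcessedFile {
  path: string;
  content: string;
}

// Joins all files into one text document. Files are sorted by path so the
// output is stable regardless of upload order; each file gets a header line
// and trailing newlines are normalized to exactly one.
function generateConcatenatedDocument(files: readonly ProcessedFile[]): string {
  return [...files]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((file) => `===== ${file.path} =====\n${file.content.replace(/\n*$/, "")}\n`)
    .join("\n");
}
```

Sorting before joining is a design choice for this sketch: it makes snapshot-style assertions in integration tests independent of the order in which the browser reports files.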

8. Documentation Deliverables

  • Internal Documentation:
    • Inline code comments and module-level documentation.
    • Developer guide covering module interactions and state/error management.
    • Documentation detailing the purpose, state managed, parameters, return values, key interactions (e.g., with workers or other hooks), and usage context for the core custom hooks (useFileProcessing, useWorkerManager, useConfiguration).
  • External Developer Guides:
    • User manual explaining how to use the tool.
    • Quick-start guide for local project setup.
  • Change Logs & Versioning:
    • Maintain a CHANGELOG.md file to document feature additions, fixes, and version releases.

9. Considerations & Notes

  • Key Considerations:
    • Implement chunked processing using Web Workers to maintain UI responsiveness.
    • Ensure the tool is compatible with all major browsers.
    • Keep ad integration unobtrusive to the core user experience.
    • Component tests (using Vitest and React Testing Library) will interact with MUI components; they should query those components by accessible role or label and must not depend on MUI's internal implementation details.
  • Risks & Mitigation:
    • Large repositories may strain browser memory; mitigate with chunked processing and fallback messages.
    • The File API behaves inconsistently across browsers; provide clear user guidance and alternative upload methods.
    • Material UI increases the bundle size; mitigate this by importing only the components that are actually used.
  • Other Notes:
    • Prioritize privacy by ensuring all processing is client-side.
    • Maintain a clean separation between core functionality and ad content.
    • CI Pipeline: GitHub Actions will automate checks (lint, test, build) via:
      • npm ci
      • npm run lint
      • npm run test -- --run
      • npm run build

Stage 2: Enhancement Implementation

1. Overview

  • Objective:
    Enhance the MVP by improving performance and user feedback, and by introducing basic customization options. This stage will refine the Web Worker integration, add cancelable and chunked processing, and expand configuration settings. Additionally, the ad integration module will be refined based on initial user feedback.
  • Expected Result:
    • Improved responsiveness and memory management for processing large codebases.
    • Detailed progress reporting and the ability to cancel ongoing operations.
    • Enhanced configuration options for users.
    • Refined ad module capable of dynamic switching between internal and external ads.

2. Dependency Validation

  • Pre-Stage Requirements:
    • Successfully deployed and validated MVP functionalities.
    • Verified file upload, processing, and export operations.
    • Updated library versions as necessary for enhanced processing (e.g., any improved Web Worker libraries).
    • Continued use of static site hosting for seamless updates.
  • External Dependencies:
    • Continued use of React, JSZip, and Micromatch.
    • Optionally integrate IndexedDB utilities for enhanced storage management.

3. Diagrams (Optional)

  • Visual Aids:
    • Enhanced Workflow Diagram:
flowchart LR
    A[Enhanced React UI] --> B[Advanced File Upload & Analysis]
    B --> C[Optimized File Processing Engine]
    C --> D[Improved Gitignore & Sensitive Scanner Modules]
    C --> E[Enhanced Export Engine]
    D & E --> F[Refined Output Delivery]
    A --> G[Enhanced Ad Integration Module]

4. Touched Parts

  • Modules/Functionalities/Files:
    • Refine existing React components: Update ProgressIndicator.tsx for detailed metrics.
    • Refactor /modules/fileProcessing.ts for cancelable, chunked processing with Web Worker integration.
    • Enhance /workers/processor.worker.ts to support cancelable tasks.
    • Extend /utils/storage.ts for potential IndexedDB integration.
    • Update ConfigurationPanel.tsx to include additional customization options.
    • Modify AdBanner.tsx for dynamic ad switching and performance tracking.
    • Refactor /hooks/useFileProcessing.ts and /hooks/useWorkerManager.ts to integrate cancelable operations and handle detailed progress reporting from the worker.
    • Extend /hooks/useConfiguration.ts or /hooks/useFileProcessing.ts to manage state for basic custom sensitive data patterns.
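
The typed worker interface that useWorkerManager is meant to provide, extended with cancellation and detailed progress, might look like the following sketch. Every message name and field here is hypothetical, as is the ProgressView shape; the plan does not define the wire format.

```typescript
// Messages sent from the main thread to the worker (hypothetical names).
type WorkerRequest =
  | { kind: "start"; jobId: string; paths: string[] }
  | { kind: "cancel"; jobId: string };

// Messages sent from the worker back to the main thread (hypothetical names).
type WorkerResponse =
  | { kind: "progress"; jobId: string; processed: number; total: number }
  | { kind: "done"; jobId: string; removedFileCount: number }
  | { kind: "cancelled"; jobId: string }
  | { kind: "error"; jobId: string; message: string };

// What a progress indicator would need in order to render.
interface ProgressView {
  jobId: string;
  percent: number;
  status: "running" | "done" | "cancelled" | "error";
  detail?: string;
}

// Builds the cancel request for a job.
function cancelRequest(jobId: string): WorkerRequest {
  return { kind: "cancel", jobId };
}

// Folds one worker message into the view state. Messages that belong to a
// different (stale) job are ignored.
function applyResponse(view: ProgressView, msg: WorkerResponse): ProgressView {
  if (msg.jobId !== view.jobId) return view;
  switch (msg.kind) {
    case "progress": {
      const percent = msg.total === 0 ? 100 : Math.round((msg.processed / msg.total) * 100);
      return { ...view, percent, status: "running" };
    }
    case "done":
      return { ...view, percent: 100, status: "done", detail: `${msg.removedFileCount} files removed` };
    case "cancelled":
      return { ...view, status: "cancelled" };
    case "error":
      return { ...view, status: "error", detail: msg.message };
  }
}
```

In the hook, the worker's message handler would pass each received WorkerResponse through applyResponse, and the cancel button would post cancelRequest(jobId) to the worker. Tagging messages with a jobId is one way to discard late messages from a job that was already cancelled.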

5. Module Contracts & Interface Definitions

  • Interface Definitions:
    • Extend functions with additional parameters for progress callbacks:
      • processFiles(files: File[], onProgress: (progress: number) => void): Promise<ProcessedData>
    • Define new interfaces for enhanced configuration settings.
  • Local State & Error Handling:
    • Introduce additional state variables for Web Worker status and user configuration.
    • Update error handling to capture and report Web Worker termination and configuration errors.
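
One way to structure the chunked, cancelable processing behind the onProgress callback is a generator that yields after every chunk, so the caller decides when to report progress and when to stop. This is a sketch under assumptions: CancelToken, ChunkProgress, and processInChunks are names invented for this example and are not part of the contracts above.

```typescript
// Hypothetical cancellation flag that the caller flips to stop processing.
interface CancelToken {
  cancelled: boolean;
}

// Emitted after each chunk: that chunk's results plus overall progress (0..1).
interface ChunkProgress<R> {
  results: R[];
  progress: number;
}

// Processes items in fixed-size chunks. The token is checked before each
// chunk, so cancellation takes effect at the next chunk boundary.
function* processInChunks<T, R>(
  items: readonly T[],
  chunkSize: number,
  handle: (item: T) => R,
  token: CancelToken,
): Generator<ChunkProgress<R>, void, void> {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  for (let start = 0; start < items.length; start += chunkSize) {
    if (token.cancelled) return;
    const results = items.slice(start, start + chunkSize).map(handle);
    const processed = Math.min(start + chunkSize, items.length);
    yield { results, progress: processed / items.length };
  }
}
```

A processFiles implementation could drive this generator, call onProgress with each yielded progress value, and yield to the event loop between chunks. That pause matters inside a Web Worker: a cancel message can only be received, and the token flipped, while the worker is not busy in synchronous code.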

6. Tests

  • Unit Tests:
    • Extend file processing tests to cover chunking and cancellation.
    • Test new configuration options to ensure they correctly influence processing behavior.
  • Integration Tests:
    • Run end-to-end tests simulating long-running processes and cancellation.
    • Validate that enhanced progress reporting is accurately reflected in the UI.
  • Edge Cases & Negative Testing:
    • Simulate interruptions and verify that cancellation recovers gracefully.
    • Stress-test large repositories to ensure improved memory management.

7. Module Integration Checkpoints

  • Internal Module Integration:
    • Validate Web Worker integration: ensure that the enhanced file processing updates the UI responsively.
    • Confirm that enhanced configuration options integrate properly with the processing engine.
  • Interface Integration:
    • Verify that new progress callbacks and error handlers are correctly consumed across modules.
    • Ensure compatibility with the MVP’s export outputs.

8. Documentation Deliverables

  • Internal Documentation:
    • Update developer guides with enhanced Web Worker and configuration details.
    • Document new parameters and state management updates inline.
  • External Developer Guides:
    • Revise user guides to include instructions for the enhanced configuration and progress features.
  • Change Logs & Versioning:
    • Update CHANGELOG.md with details of performance and UI enhancements.

9. Considerations & Notes

  • Key Considerations:
    • Focus on maintaining a responsive UI even under heavy processing loads.
    • Ensure enhancements do not compromise cross-browser compatibility.
  • Risks & Mitigation:
    • Cancelable operations are complex and may introduce bugs; extensive testing and fallback strategies are essential.
    • Customization features could overwhelm users; use defaults and clear UI guidance.
  • Other Notes:
    • Continue to monitor the impact of ad integration as performance is optimized.

Stage 3: Future Enhancements & API Integrations

1. Overview

  • Objective:
    Expand the application with advanced features such as customizable sensitive data detection, selective file retention, and a minimal JavaScript API for integration with development tools and CI/CD pipelines.
  • Expected Result:
    • Advanced configuration options that allow users to fine-tune sensitive data scanning.
    • Selective retention controls for overriding default exclusion rules.
    • A minimal API that exposes key functionality for programmatic access.
    • Further export optimizations including LLM-specific formatting (e.g., token count estimation).

2. Dependency Validation

  • Pre-Stage Requirements:
    • Stable MVP and Enhancement stages in production.
    • Positive user feedback and analytics supporting demand for advanced options.
    • Updated dependencies to support additional API integrations.
    • Environment: Local server setup may be required for API simulation.
  • External Dependencies:
    • Additional libraries if necessary for API creation or advanced processing.

3. Diagrams (Optional)

  • Visual Aids:
    • Future Integration Diagram:
flowchart LR
    A[React UI with Advanced Options] --> B[Customizable Sensitive Data Module]
    B --> C[Selective Retention Module]
    C --> D[Advanced Export Engine]
    A --> E[JavaScript API Integration]

4. Touched Parts

  • Modules/Functionalities/Files:
    • Add new React component AdvancedConfigPanel.tsx for advanced settings.
    • Extend /modules/sensitiveScanner.ts to support custom pattern input.
    • Create new module(s) for selective file retention logic.
    • Create a new directory /api containing modules that expose a minimal JavaScript API.
    • Update /modules/exportEngine.ts for LLM-specific formatting options.
    • Extend /hooks/useConfiguration.ts or /hooks/useFileProcessing.ts to manage state for advanced configuration options (custom sensitive patterns, LLM formatting choices).
    • Note: Selective retention logic is managed within useFileProcessing as part of the override state established in MVP.
    • Note: The JavaScript API in /api will likely interact with the core processing modules directly, not necessarily requiring new UI-focused hooks.
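
The LLM-specific formatting planned for /modules/exportEngine.ts could include a token estimate in the exported document. The sketch below uses a character-count heuristic; the ratio of about four characters per token is only a common rule of thumb for English text and is an assumption here, since real counts depend on the tokenizer of the target model. The function names and header format are hypothetical as well.

```typescript
// Assumption: roughly four characters per token. Treat the result as a
// rough guide, not an exact count.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Estimates the token count of a text from its length, rounding up.
function estimateTokenCount(
  text: string,
  charsPerToken: number = ASSUMED_CHARS_PER_TOKEN,
): number {
  if (charsPerToken <= 0) throw new RangeError("charsPerToken must be positive");
  return Math.ceil(text.length / charsPerToken);
}

// Prefixes a concatenated document with a one-line estimate header.
function formatForLlm(document: string): string {
  return `# Estimated tokens: ~${estimateTokenCount(document)}\n\n${document}`;
}
```

Keeping the ratio as a parameter leaves room to swap in a real tokenizer later without changing callers.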

5. Module Contracts & Interface Definitions

  • Interface Definitions:
    • New functions for customizable scanning:
      • setCustomPatterns(patterns: Pattern[]): void
      • scanWithCustomPatterns(content: string, patterns?: Pattern[]): string
    • Define interfaces for the exposed JavaScript API (e.g., cleanRepository(config: Config): Promise<ProcessedData>, asynchronous to match processFiles).
  • Local State & Error Handling:
    • Extend state management to store advanced configuration settings.
    • Expand error handling for new modules and API integration failures.
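
The two custom-scanning functions above leave the Pattern type undefined. The sketch below assumes a serializable shape (regex source, optional flags, optional replacement text) so that patterns can be stored in user configuration; that shape, the module-level storage, and the "[REDACTED]" default are assumptions for this example.

```typescript
// Hypothetical Pattern shape: plain data, so it can be persisted as JSON.
interface Pattern {
  name: string;
  source: string; // regular expression source, without delimiters
  flags?: string; // e.g. "i"; the global flag is always added when scanning
  replacement?: string; // literal text; defaults to "[REDACTED]"
}

let customPatterns: Pattern[] = [];

// Validates and stores the patterns. An invalid regex source makes the
// RegExp constructor throw a SyntaxError, and nothing is stored.
function setCustomPatterns(patterns: Pattern[]): void {
  for (const pattern of patterns) {
    new RegExp(pattern.source, pattern.flags);
  }
  customPatterns = [...patterns];
}

// Applies the given patterns (or the stored ones) in order and returns the
// obfuscated content.
function scanWithCustomPatterns(
  content: string,
  patterns: Pattern[] = customPatterns,
): string {
  return patterns.reduce((text, pattern) => {
    const base = pattern.flags ?? "";
    const flags = base.indexOf("g") === -1 ? base + "g" : base;
    const replacement = pattern.replacement ?? "[REDACTED]";
    // A replacer function keeps "$" in the replacement text literal.
    return text.replace(new RegExp(pattern.source, flags), () => replacement);
  }, content);
}
```

Validating in setCustomPatterns means a malformed user pattern is reported when it is configured rather than in the middle of a processing job.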

6. Tests

  • Unit Tests:
    • Develop tests for new advanced sensitive data and selective retention modules.
    • Test new API functions with various configurations.
  • Integration Tests:
    • Validate end-to-end processing with advanced options enabled.
    • Ensure the exposed API functions behave as documented.
  • Edge Cases & Negative Testing:
    • Test scenarios with conflicting advanced configuration options.
    • Simulate API call failures and ensure graceful degradation.

7. Module Integration Checkpoints

  • Internal Module Integration:
    • Confirm that new advanced modules integrate without breaking core functionality.
  • Interface Integration:
    • Validate that the API interfaces work as expected when called from external tools.
    • Ensure backwards compatibility with outputs from previous stages.

8. Documentation Deliverables

  • Internal Documentation:
    • Update developer documentation with details on advanced module contracts and API usage.
  • External Developer Guides:
    • Create comprehensive API documentation and user guides for advanced features.
  • Change Logs & Versioning:
    • Document all new functionalities and integrations in the versioned changelog.

9. Considerations & Notes

  • Key Considerations:
    • Maintain the core simplicity and performance even with advanced features.
    • Provide clear default configurations to avoid overwhelming non-technical users.
  • Risks & Mitigation:
    • Increased complexity may lead to integration challenges; extensive testing is required.
    • Customization options might confuse some users; provide comprehensive documentation and sensible defaults.
  • Other Notes:
    • Regularly review user feedback to decide which advanced features to prioritize.
    • Monitor API usage and performance analytics to guide future iterations.