A professional Windows desktop application for PDF analysis, redaction, and classification with advanced OCR capabilities.
- PDF Text Extraction: Extract and analyze text content from PDF documents
- Document Classification: Automatically classify documents based on content sensitivity
- Redaction Engine: Visual and true text removal redaction capabilities
- OCR Integration: Automatic OCR for scanned documents using Tesseract
- License Management: Free and Pro versions, with some features restricted to Pro
- Advanced OCR: Automatic text recognition for scanned documents
- Unlimited Redaction: No limits on redaction areas
- Export Redacted PDFs: True text removal with secure export
- Advanced Classification: Enhanced document classification algorithms
- Secure Vault Export: Encrypted export capabilities
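The Free/Pro split above can be pictured as a small feature gate. This is only a sketch: `LicenseTier`, `FeatureGate`, and the `FreeRedactionLimit` value are hypothetical names and numbers chosen for illustration, and the actual `LicenseManager` API and Free-version limits are not documented in this README.

```csharp
using System;

// Hypothetical names throughout: the real LicenseManager API is not shown in this README.
public enum LicenseTier { Free, Pro }

public sealed class FeatureGate
{
    // Assumed value for illustration only; the actual Free limit is not documented here.
    public const int FreeRedactionLimit = 3;

    private readonly LicenseTier _tier;

    public FeatureGate(LicenseTier tier) => _tier = tier;

    // Assumption: the features in the Pro list are unavailable in the Free version.
    public bool CanUseAdvancedOcr => _tier == LicenseTier.Pro;
    public bool CanExportRedactedPdf => _tier == LicenseTier.Pro;

    // Free users may add areas until the assumed limit is reached; Pro has no limit.
    public bool CanAddRedaction(int existingAreas)
    {
        if (existingAreas < 0)
            throw new ArgumentOutOfRangeException(nameof(existingAreas));
        return _tier == LicenseTier.Pro || existingAreas < FreeRedactionLimit;
    }
}
```

UI code could consult such a gate before enabling a ribbon button, for example `new FeatureGate(LicenseTier.Free).CanAddRedaction(currentCount)`.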
PDFClassifierPro/
├── PDFClassifierPro.Core/           # Core business logic
│   ├── Ocr/                         # OCR functionality
│   │   └── OcrEngine.cs
│   ├── Redaction/                   # Redaction engine
│   │   └── RedactionEngine.cs
│   ├── Classification/              # Document classification
│   │   └── ClassificationEngine.cs
│   ├── License/                     # License management
│   │   └── LicenseManager.cs
│   └── Utils/                       # Utility classes
│       ├── PdfHandler.cs
│       ├── PdfInspector.cs
│       └── FileService.cs
├── PDFClassifierPro.UI/             # WPF User Interface
│   ├── MainWindow.xaml              # Main application window
│   ├── Views/                       # UI components
│   │   ├── Controls/
│   │   │   └── PdfViewerControl.xaml
│   │   └── LicenseActivationDialog.xaml
│   └── App.xaml                     # Application entry point
└── PDFClassifierPro.Tests/          # Unit tests
    ├── Ocr/
    ├── Redaction/
    ├── Classification/
    ├── License/
    └── Utils/
- .NET 8: Modern C# framework
- WPF: Windows Presentation Foundation for UI
- Fluent.Ribbon: Professional Office-style ribbon interface
- PdfiumViewer: PDF rendering and manipulation
- Tesseract: OCR engine for text recognition
- xUnit: Unit testing framework
- Windows 10/11
- .NET 8.0 SDK
- Visual Studio 2022 or VS Code
# Clone the repository
git clone https://github.com/your-username/pdf-classifier-pro.git
cd pdf-classifier-pro
# Restore dependencies
dotnet restore
# Build the solution
dotnet build
# Run tests
dotnet test
# Run the application
dotnet run --project PDFClassifierPro.UI
- Launch the Application: Run PDFClassifierPro.UI.exe
- Open a PDF: Click "Open PDF" in the File group
- Analyze Content: Use "Classify Document" to analyze sensitivity
- Add Redactions: Use "Add Redaction Area" to mark sensitive content
- Export Results: Save or export redacted documents
- Click "Activate Pro" in the Pro Features group
- Enter your Pro license key
- Enjoy advanced features like unlimited redactions and OCR
- Unclassified: Public documents
- Confidential: Internal use only
- Secret: Sensitive information
- Top Secret: Highly classified content
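The four levels above map naturally onto an ordered enum. The sketch below pairs that enum with a deliberately naive marker search to show how a document could be assigned the highest matching level. `SensitivityLevel`, `MarkerClassifier`, and the marker phrases are hypothetical; the rules used by the real `ClassificationEngine` are not described in this README and are likely more involved.

```csharp
using System;

// The four levels from the list above, ordered from least to most sensitive.
public enum SensitivityLevel { Unclassified, Confidential, Secret, TopSecret }

public static class MarkerClassifier
{
    // Hypothetical marker phrases, listed from highest to lowest level.
    // "TOP SECRET" must come before "SECRET" because it contains it.
    private static readonly (string Marker, SensitivityLevel Level)[] Markers =
    {
        ("TOP SECRET", SensitivityLevel.TopSecret),
        ("SECRET", SensitivityLevel.Secret),
        ("CONFIDENTIAL", SensitivityLevel.Confidential),
    };

    // Returns the highest level whose marker appears in the text,
    // or Unclassified when no marker is found.
    public static SensitivityLevel Classify(string text)
    {
        if (string.IsNullOrEmpty(text))
            return SensitivityLevel.Unclassified;

        foreach (var (marker, level) in Markers)
        {
            if (text.Contains(marker, StringComparison.OrdinalIgnoreCase))
                return level;
        }
        return SensitivityLevel.Unclassified;
    }
}
```

Because the enum values are ordered, callers can compare levels directly, for example `level >= SensitivityLevel.Secret` to decide whether redaction should be suggested.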
The project includes comprehensive unit tests, with a 94% pass rate:
# Run all tests
dotnet test
# Run specific test categories
dotnet test --filter "Category=Ocr"
dotnet test --filter "Category=Redaction"
dotnet test --filter "Category=Classification"
- OCR Engine: Image processing and text extraction
- Redaction Engine: Visual and true redaction capabilities
- Classification Engine: Document sensitivity analysis
- License Management: Pro feature validation
- Utility Classes: File operations and PDF inspection
- Single Responsibility: Each class has one clear purpose
- Modular Design: Features are separated into logical modules
- Clean Architecture: Clear separation between UI, business logic, and data
- No Comments: Following project guidelines for clean code
- Create feature class in appropriate Core subfolder
- Add corresponding test class in Tests subfolder
- Update UI if needed
- Ensure all tests pass
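For a new test class to be picked up by the category filters (`dotnet test --filter "Category=Redaction"`), it needs a matching xUnit trait. The sketch below shows one way this could look. `RedactionArea` is a hypothetical type standing in for a new feature class, not something confirmed to exist in the codebase.

```csharp
using Xunit;

// Hypothetical feature type; it would live under PDFClassifierPro.Core/Redaction/.
public readonly record struct RedactionArea(int Page, double X, double Y, double Width, double Height)
{
    // An area with no positive extent covers nothing.
    public bool IsEmpty => Width <= 0 || Height <= 0;
}

// The test class would live under PDFClassifierPro.Tests/Redaction/, mirroring the Core folder.
public class RedactionAreaTests
{
    [Fact]
    [Trait("Category", "Redaction")] // selected by --filter "Category=Redaction"
    public void IsEmpty_ReturnsTrue_ForZeroWidth()
    {
        var area = new RedactionArea(Page: 1, X: 10, Y: 10, Width: 0, Height: 20);
        Assert.True(area.IsEmpty);
    }

    [Theory]
    [Trait("Category", "Redaction")]
    [InlineData(5.0, 5.0, false)]
    [InlineData(5.0, 0.0, true)]
    [InlineData(-1.0, 5.0, true)]
    public void IsEmpty_DependsOnBothDimensions(double width, double height, bool expected)
    {
        var area = new RedactionArea(1, 0, 0, width, height);
        Assert.Equal(expected, area.IsEmpty);
    }
}
```

Without the `[Trait]` attribute the tests still run under a plain `dotnet test`, but the category filters skip them.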
- Debug: Development with full debugging
- Release: Optimized production build
- Test: Automated testing with coverage
- OS: Windows 10/11 (x64)
- RAM: 4GB minimum, 8GB recommended
- Storage: 500MB available space
- .NET: .NET 8.0 Runtime
- IDE: Visual Studio 2022 or VS Code
- SDK: .NET 8.0 SDK
- Git: Version control
# Create release build
dotnet publish PDFClassifierPro.UI -c Release -r win-x64 --self-contained
# Output location
PDFClassifierPro.UI/bin/Release/net8.0-windows/win-x64/publish/
- Use WiX Toolset or similar for MSI creation
- Include .NET 8.0 Runtime if not self-contained
- Register file associations for .pdf files
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
- Follow existing code structure
- Add unit tests for new features
- Ensure all tests pass before submitting
- Update documentation as needed
This project is licensed under the MIT License - see the LICENSE file for details.
- OCR Not Working: Ensure Tesseract data files are available
- PDF Loading Errors: Check file permissions and PDF validity
- License Issues: Verify license key format and validation
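The first two checks above can be scripted. The sketch below is a rough preflight helper, not part of the application: `Preflight` and its method names are hypothetical, and the tessdata folder location is an assumption you need to adapt to your build output. It relies on two facts: Tesseract language models are files named `<lang>.traineddata`, and a PDF file begins with the bytes `%PDF-`.

```csharp
using System;
using System.IO;
using System.Text;

public static class Preflight
{
    // Checks that the Tesseract model for the given language exists in tessdataDir.
    // tessdataDir is an assumption: pass whatever folder your build copies the data files to.
    public static bool HasLanguageData(string tessdataDir, string language = "eng")
    {
        return File.Exists(Path.Combine(tessdataDir, language + ".traineddata"));
    }

    // Quick sanity check only, not full validation: reads the first five bytes
    // and compares them with the "%PDF-" header. Returns false when the file is
    // missing, unreadable, or has a different header.
    public static bool LooksLikePdf(string path)
    {
        try
        {
            using var stream = File.OpenRead(path);
            var header = new byte[5];
            int read = stream.Read(header, 0, header.Length);
            return read == 5 && Encoding.ASCII.GetString(header) == "%PDF-";
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

A `false` from `LooksLikePdf` covers both permission problems and invalid files, so checking `File.Exists` separately helps tell the two apart.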
- Issues: Create a GitHub issue with a detailed description
- Documentation: Check inline code comments and test examples
- Community: Join our Discord server for real-time support
- Batch processing capabilities
- Advanced OCR with multiple languages
- Cloud storage integration
- Enhanced security features
- Machine learning classification
- API for enterprise integration
- Mobile companion app
- Advanced redaction tools
- Web-based interface
- Multi-platform support
- Enterprise deployment tools
- Advanced analytics dashboard
PDF Classifier Pro - Professional PDF analysis and redaction for the modern workplace.