Description
Summary
Implement a comprehensive Google Drive import tool that allows users to authenticate with their Google account, select files/folders from their Drive, and import them into PinShare with full monitoring capabilities. The tool should support both manual one-time imports and optional continuous synchronization.
User Story
As a PinShare user, I want to import my files from Google Drive into the decentralized PinShare network, so that I can:
- Liberate my data from centralized cloud storage
- Share my files via P2P/IPFS without relying on Google's infrastructure
- Maintain a decentralized backup of my important files
- Optionally keep my PinShare instance in sync with my Drive
Current State
PinShare currently supports file uploads via:
- File system watcher monitoring an ./upload folder
- Manual file placement by users
Architecture Gaps for Google Drive Import:
- ❌ No user authentication system (OAuth or otherwise)
- ❌ No Google Drive API integration
- ❌ No background job queue for long-running operations
- ❌ No per-user file import tracking
- ❌ Limited upload status tracking (designed for quick local file processing)
- ❌ No continuous sync/monitoring capability
Existing Assets to Leverage:
- ✅ Robust upload pipeline (validation → hashing → security scanning → IPFS → metadata storage)
- ✅ Upload status tracking system (UploadStatusManager)
- ✅ Real-time UI updates via React Query
- ✅ Security scanning infrastructure (VirusTotal, ClamAV, P2P-Sec)
- ✅ P2P metadata distribution via PubSub
Proposed Solution
Architecture Components
1. User Authentication System
- OAuth 2.0 with PKCE for Google Drive API access
- Per-user token storage (encrypted)
- Token refresh mechanism
- Support for multiple users on the same PinShare instance
Tech Stack:
- google.golang.org/api/drive/v3 for the Google Drive API
- golang.org/x/oauth2 for the OAuth flow
- Secure token storage (encrypted database or keyring)
2. Google Drive Integration
- List user's Drive folders and files
- Stream file downloads directly to PinShare
- Folder hierarchy preservation (optional)
- Metadata mapping (Drive metadata → PinShare metadata)
Features:
- Folder tree browser UI
- Multi-select file/folder selection
- Filter by file type, size, date
- Preview file list before import
3. Background Job System
- Job queue for import operations
- Worker pool for concurrent processing
- Job persistence (survive restarts)
- Progress tracking per job
- Retry mechanism with exponential backoff
Job States:
pending → downloading → hashing → scanning → uploading → completed
└→ failed (with retry)
4. Enhanced Monitoring Dashboard
Real-time metrics:
- Overall import progress (X of Y files, % complete)
- Per-file status with detailed stages
- Error tracking with specific failure reasons
- Bandwidth metrics (current speed, average speed, ETA)
- Success/failure statistics
Historical tracking:
- Import job history
- Per-file import logs
- Retry attempts
- Total data imported
5. Continuous Sync Engine (Phase 3)
- Watch Google Drive for changes (polling or webhooks)
- Auto-import new/modified files
- Configurable sync interval
- Conflict resolution strategy
- Sync pause/resume capability
Technical Implementation
Backend API Endpoints
Authentication
POST /api/google-drive/authorize
→ Initiates OAuth flow, returns authorization URL
POST /api/google-drive/callback?code={authCode}
→ Exchanges auth code for tokens, stores encrypted
GET /api/google-drive/auth-status
→ Returns whether user is authenticated
DELETE /api/google-drive/revoke
→ Revokes access and deletes tokens
File Selection
GET /api/google-drive/folders?path={folderId}
→ Lists files/folders in specified folder (defaults to root)
Response: { id, name, mimeType, size, modifiedTime, parents[] }
POST /api/google-drive/preview-import
Request: { fileIds: [], folderIds: [], recursive: bool }
Response: { files: [], totalSize, totalCount }
Import Operations
POST /api/google-drive/import
Request: {
fileIds: [],
folderIds: [],
recursive: bool,
options: { preserveHierarchy, skipDuplicates }
}
Response: { jobId, status, filesQueued }
GET /api/google-drive/import/{jobId}/status
Response: {
jobId,
status: "pending|running|completed|failed|cancelled",
progress: {
totalFiles,
completedFiles,
failedFiles,
currentFile,
percentComplete,
bytesTransferred,
totalBytes,
transferRate,
estimatedTimeRemaining
},
files: [
{
driveId,
fileName,
status: "pending|downloading|hashing|scanning|uploading|completed|failed",
progress: 0-100,
error: ""
}
]
}
POST /api/google-drive/import/{jobId}/cancel
→ Cancels running import job
POST /api/google-drive/import/{jobId}/retry-failed
→ Retries all failed files in the job
GET /api/google-drive/import/history
→ Returns list of past import jobs with summary stats
Continuous Sync (Phase 3)
POST /api/google-drive/sync/configure
Request: { enabled, folderId, interval, options }
GET /api/google-drive/sync/status
Response: { enabled, lastSync, nextSync, syncedFiles, errors }
POST /api/google-drive/sync/trigger
→ Manually triggers sync cycle
Database Schema
Users Table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
google_id VARCHAR(255) UNIQUE NOT NULL,
email VARCHAR(255) NOT NULL,
encrypted_access_token TEXT,
encrypted_refresh_token TEXT,
token_expiry TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
Import Jobs Table
CREATE TABLE import_jobs (
id UUID PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
status VARCHAR(50), -- pending, running, completed, failed, cancelled
total_files INTEGER,
completed_files INTEGER,
failed_files INTEGER,
total_bytes BIGINT,
transferred_bytes BIGINT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
options JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
Import Files Table
CREATE TABLE import_files (
id SERIAL PRIMARY KEY,
job_id UUID REFERENCES import_jobs(id),
drive_file_id VARCHAR(255),
file_name VARCHAR(500),
file_size BIGINT,
status VARCHAR(50), -- pending, downloading, hashing, scanning, uploading, completed, failed
progress INTEGER, -- 0-100
sha256_hash VARCHAR(64),
ipfs_cid VARCHAR(255),
error_message TEXT,
retry_count INTEGER DEFAULT 0,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
Sync Configurations Table (Phase 3)
CREATE TABLE sync_configs (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
drive_folder_id VARCHAR(255),
enabled BOOLEAN DEFAULT TRUE,
sync_interval INTEGER, -- minutes
last_sync_at TIMESTAMP,
next_sync_at TIMESTAMP,
options JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
Internal Architecture
New Go Packages
internal/gdrive/
- client.go - Google Drive API client wrapper
- oauth.go - OAuth flow management
- downloader.go - File download from Drive
- mapper.go - Drive metadata → PinShare metadata conversion
internal/jobs/
- queue.go - Job queue interface
- worker.go - Worker pool implementation
- import_job.go - Import job definition and state machine
- persistence.go - Job state persistence
internal/users/
- auth.go - User authentication
- store.go - User data storage
- tokens.go - Encrypted token management
internal/sync/ (Phase 3)
- engine.go - Continuous sync orchestration
- watcher.go - Drive change detection
- scheduler.go - Sync scheduling
Integration with Existing Systems
Upload Pipeline Integration:
// In internal/jobs/import_job.go
func (j *ImportJob) processFile(driveFile *drive.File) error {
    // 1. Download from Google Drive
    j.updateFileStatus(driveFile.Id, "downloading", 0)
    localPath, err := j.driveClient.Download(driveFile)
    if err != nil {
        return j.failFile(driveFile.Id, err.Error())
    }

    // 2. Plug into existing upload pipeline
    j.updateFileStatus(driveFile.Id, "hashing", 30)
    sha256 := psfs.ComputeSHA256(localPath)

    j.updateFileStatus(driveFile.Id, "scanning", 50)
    secResult := psfs.SecurityCheck(localPath, sha256)
    if !secResult.Safe {
        return j.failFile(driveFile.Id, "Security scan failed")
    }

    j.updateFileStatus(driveFile.Id, "uploading", 70)
    cid, err := psfs.AddFileIPFS(localPath)
    if err != nil {
        return j.failFile(driveFile.Id, err.Error())
    }

    j.updateFileStatus(driveFile.Id, "storing", 90)
    metadata := store.BaseMetadata{
        FileSHA256: sha256,
        IPFSCID:    cid,
        FileName:   driveFile.Name,
        // ... map other Drive metadata
    }
    store.GlobalStore.AddFile(metadata)

    j.updateFileStatus(driveFile.Id, "completed", 100)
    return nil
}
UI Requirements
1. Google Drive Authorization Page
Location: /import/google-drive/authorize
Components:
- "Connect to Google Drive" button
- OAuth consent explanation
- Permissions required list
- Privacy policy link
2. Folder/File Selection Interface
Location: /import/google-drive/select
Features:
- Tree view of Drive folders (collapsible)
- File list view with checkboxes
- Multi-select capability
- File type icons
- Size/date metadata display
- Search/filter bar
- "Select All" / "Deselect All" buttons
- Preview import summary (X files, Y GB)
- Import options:
- Preserve folder hierarchy
- Skip duplicates (by SHA256)
- Include shared files
- "Start Import" button
3. Import Status Dashboard
Location: /import/google-drive/status/{jobId}
Real-time Metrics:
╔════════════════════════════════════════════════════════╗
║ Import Progress [Cancel] ║
╠════════════════════════════════════════════════════════╣
║ ████████████████░░░░░░░░░░ 45% (45/100 files) ║
║ ⬇ Downloading: document.pdf (2.5 MB/s) ║
║ ⏱ ETA: 5 minutes ║
║ 📊 Status: 40 completed, 5 failed, 55 pending ║
╚════════════════════════════════════════════════════════╝
Files:
┌─────────────────────────────────────────────────────┐
│ ✅ report.pdf │ Completed │ 2.3 MB │ 12:30 │
│ ⏳ presentation.pptx │ Scanning │ ████░░ │ │
│ ❌ large-video.mp4 │ Failed │ Error: Too large│
│ ⏸ document.docx │ Pending │ 45 KB │ │
└─────────────────────────────────────────────────────┘
[Retry Failed Files] [View Details]
Detailed Per-File View:
- File name with Drive icon
- Progress bar for current file
- Current stage (downloading/hashing/scanning/uploading)
- Transfer speed
- Success/error indicator
- Retry button for failed files
4. Import History Page
Location: /import/google-drive/history
Display:
- List of past import jobs
- Job ID, start time, duration
- Success/failure counts
- Total data imported
- "View Details" link to status page
5. Sync Configuration Page (Phase 3)
Location: /import/google-drive/sync
Settings:
- Enable/disable continuous sync
- Select Drive folder to sync
- Sync interval (hourly, daily, etc.)
- Conflict resolution strategy
- Last sync timestamp
- Manual "Sync Now" button
Monitoring & Observability
Metrics to Track
Job-Level Metrics:
- gdrive_import_jobs_total{status="completed|failed|cancelled"}
- gdrive_import_duration_seconds
- gdrive_import_files_total{status="completed|failed"}
- gdrive_import_bytes_total
File-Level Metrics:
- gdrive_file_download_duration_seconds
- gdrive_file_size_bytes{stage="downloaded|uploaded"}
- gdrive_transfer_rate_bytes_per_second
API Metrics:
- gdrive_api_requests_total{endpoint,status}
- gdrive_api_errors_total{error_type}
- gdrive_api_rate_limit_hits_total
Sync Metrics (Phase 3):
- gdrive_sync_cycles_total{status}
- gdrive_sync_new_files_detected
- gdrive_sync_lag_seconds (time since last successful sync)
Logging Strategy
- Structured JSON logs
- Log levels: DEBUG, INFO, WARN, ERROR
- Include job ID, user ID, file ID in all log entries
- Detailed error logging with stack traces
Security Considerations
OAuth Security
- PKCE Flow - Use Proof Key for Code Exchange for additional security
- Token Encryption - Encrypt tokens at rest using AES-256
- Secure Storage - Store encrypted tokens in database or OS keyring
- Token Rotation - Implement automatic refresh token rotation
- Scope Minimization - Request only the drive.readonly scope
API Security
- Rate Limiting - Respect Google Drive API quotas (per-user limits)
- Authentication Required - All endpoints require valid user session
- Input Validation - Validate all Drive file IDs, folder paths
- CORS - Restrict to localhost and configured domains
File Security
- Leverage Existing Scanning - All imported files go through security checks
- Size Limits - Enforce max file size limits
- Type Validation - Respect PinShare's allowed file types
- Malware Scanning - VirusTotal/ClamAV on all imports
Privacy
- User Data Isolation - Each user only sees their own imports
- Token Revocation - Support complete data deletion
- Audit Logging - Log all import operations
Testing Strategy
Unit Tests
- Google Drive client mocking
- OAuth flow state machine
- Job queue operations
- Metadata mapping accuracy
Integration Tests
- End-to-end import flow with test files
- OAuth callback handling
- Database persistence
- Worker pool concurrency
Load Tests
- Import 1,000 files concurrently
- Test with 10+ concurrent import jobs
- Measure memory usage and performance
- Test rate limit handling
Security Tests
- OAuth PKCE flow validation
- Token encryption/decryption
- Unauthorized access attempts
- Input validation edge cases
Implementation Phases
Phase 1: OAuth + Basic Import (MVP)
Goal: Import files manually from Google Drive
Deliverables:
- User authentication system
- Google Drive OAuth integration
- Drive folder/file browser UI
- Basic import job queue
- Simple progress tracking
- Integration with existing upload pipeline
Estimated Effort: 2-3 weeks
Phase 2: Enhanced Monitoring
Goal: Comprehensive import monitoring
Deliverables:
- Per-file status tracking
- Real-time progress updates (WebSocket)
- Bandwidth/speed metrics
- Error tracking and retry mechanism
- Import history dashboard
- Prometheus metrics
Estimated Effort: 1-2 weeks
Phase 3: Continuous Sync
Goal: Auto-sync Drive changes
Deliverables:
- Drive change detection (polling)
- Sync scheduler
- Sync configuration UI
- Conflict resolution
- Sync pause/resume
Estimated Effort: 2-3 weeks
Phase 4: Performance & Polish
Goal: Production-ready reliability
Deliverables:
- Performance optimization
- Webhook support (vs polling)
- Advanced filtering options
- Batch operations
- Comprehensive documentation
Estimated Effort: 1-2 weeks
Dependencies
External Services
- Google Cloud Project - OAuth credentials, API enablement
- Google Drive API v3 - File access
- Database - PostgreSQL or SQLite for job/user persistence
Go Packages
require (
google.golang.org/api v0.XXX
golang.org/x/oauth2 v0.XXX
github.com/lib/pq v1.XXX // PostgreSQL driver
github.com/google/uuid v1.XXX // Job IDs
)
Configuration
# config.yaml additions
google_drive:
oauth:
client_id: "${GOOGLE_OAUTH_CLIENT_ID}"
client_secret: "${GOOGLE_OAUTH_CLIENT_SECRET}"
redirect_url: "http://localhost:9090/api/google-drive/callback"
scopes:
- "https://www.googleapis.com/auth/drive.readonly"
import:
max_concurrent_downloads: 5
max_file_size_mb: 1024
temp_download_dir: "./tmp/gdrive"
rate_limiting:
requests_per_second: 10
burst: 20
Success Metrics
User Adoption
- Number of users connecting Google Drive
- Total files imported
- Active sync configurations
Performance
- Average import speed (files/minute, MB/s)
- P95 latency for import operations
- Error rate < 1%
Reliability
- Job success rate > 99%
- Retry success rate
- Sync lag < 5 minutes (Phase 3)
Future Enhancements
Beyond Initial Implementation
- Dropbox Integration - Apply same pattern to Dropbox
- OneDrive Support - Microsoft OneDrive import
- S3 Import - AWS S3 bucket import
- Selective Export - PinShare → Google Drive
- Smart Deduplication - Cross-user file deduplication
- Bandwidth Scheduling - Import during off-peak hours
- Multi-folder Sync - Sync multiple Drive folders simultaneously
Related Issues
- #2 - Server-side search endpoint for scalable file searching (needed for filtering large import lists)
- TBD - User authentication system (foundational for per-user OAuth)
- TBD - PostgreSQL migration (needed for job persistence)
Priority: High
Complexity: High
Impact: High - Unlocks PinShare for users with existing cloud storage