Implement Google Drive Import Tool with Comprehensive Monitoring #4


Summary

Implement a comprehensive Google Drive import tool that allows users to authenticate with their Google account, select files/folders from their Drive, and import them into PinShare with full monitoring capabilities. The tool should support both manual one-time imports and optional continuous synchronization.

User Story

As a PinShare user, I want to import my files from Google Drive into the decentralized PinShare network, so that I can:

  • Liberate my data from centralized cloud storage
  • Share my files via P2P/IPFS without relying on Google's infrastructure
  • Maintain a decentralized backup of my important files
  • Optionally keep my PinShare instance in sync with my Drive

Current State

PinShare currently supports file uploads via:

  • File system watcher monitoring an ./upload folder
  • Manual file placement by users

Architecture Gaps for Google Drive Import:

  1. ❌ No user authentication system (OAuth or otherwise)
  2. ❌ No Google Drive API integration
  3. ❌ No background job queue for long-running operations
  4. ❌ No per-user file import tracking
  5. ❌ Limited upload status tracking (designed for quick local file processing)
  6. ❌ No continuous sync/monitoring capability

Existing Assets to Leverage:

  • ✅ Robust upload pipeline (validation → hashing → security scanning → IPFS → metadata storage)
  • ✅ Upload status tracking system (UploadStatusManager)
  • ✅ Real-time UI updates via React Query
  • ✅ Security scanning infrastructure (VirusTotal, ClamAV, P2P-Sec)
  • ✅ P2P metadata distribution via PubSub

Proposed Solution

Architecture Components

1. User Authentication System

  • OAuth 2.0 with PKCE for Google Drive API access
  • Per-user token storage (encrypted)
  • Token refresh mechanism
  • Support for multiple users on the same PinShare instance

Tech Stack:

  • google.golang.org/api/drive/v3 for Google Drive API
  • golang.org/x/oauth2 for OAuth flow
  • Secure token storage (encrypted database or keyring)
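The PKCE verifier/challenge pair at the heart of the flow needs nothing beyond the standard library. A minimal sketch following RFC 7636 (function names here are illustrative, not part of PinShare):

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newCodeVerifier returns a high-entropy code_verifier per RFC 7636:
// 32 random bytes, base64url-encoded without padding (43 characters).
func newCodeVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// codeChallenge derives the S256 code_challenge included in the
// authorization URL; the verifier itself is sent only on token exchange.
func codeChallenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, _ := newCodeVerifier()
	fmt.Println(len(verifier), len(codeChallenge(verifier))) // both 43 chars
}
```

Recent versions of golang.org/x/oauth2 ship equivalent helpers (`oauth2.GenerateVerifier` and the S256 challenge option), so in practice the library versions should be preferred over hand-rolled ones.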

2. Google Drive Integration

  • List user's Drive folders and files
  • Stream file downloads directly to PinShare
  • Folder hierarchy preservation (optional)
  • Metadata mapping (Drive metadata → PinShare metadata)

Features:

  • Folder tree browser UI
  • Multi-select file/folder selection
  • Filter by file type, size, date
  • Preview file list before import

3. Background Job System

  • Job queue for import operations
  • Worker pool for concurrent processing
  • Job persistence (survive restarts)
  • Progress tracking per job
  • Retry mechanism with exponential backoff

Job States:

pending → downloading → hashing → scanning → uploading → completed
                                                      └→ failed (with retry)

4. Enhanced Monitoring Dashboard

Real-time metrics:

  • Overall import progress (X of Y files, % complete)
  • Per-file status with detailed stages
  • Error tracking with specific failure reasons
  • Bandwidth metrics (current speed, average speed, ETA)
  • Success/failure statistics

Historical tracking:

  • Import job history
  • Per-file import logs
  • Retry attempts
  • Total data imported

5. Continuous Sync Engine (Phase 3)

  • Watch Google Drive for changes (polling or webhooks)
  • Auto-import new/modified files
  • Configurable sync interval
  • Conflict resolution strategy
  • Sync pause/resume capability

Technical Implementation

Backend API Endpoints

Authentication

POST /api/google-drive/authorize
  → Initiates OAuth flow, returns authorization URL

GET /api/google-drive/callback?code={authCode}
  → OAuth redirect target (Google redirects the browser here via GET);
    exchanges the auth code for tokens and stores them encrypted

GET /api/google-drive/auth-status
  → Returns whether user is authenticated

DELETE /api/google-drive/revoke
  → Revokes access and deletes tokens

File Selection

GET /api/google-drive/folders?path={folderId}
  → Lists files/folders in specified folder (defaults to root)
  Response: { id, name, mimeType, size, modifiedTime, parents[] }

POST /api/google-drive/preview-import
  Request: { fileIds: [], folderIds: [], recursive: bool }
  Response: { files: [], totalSize, totalCount }

Import Operations

POST /api/google-drive/import
  Request: { 
    fileIds: [], 
    folderIds: [], 
    recursive: bool,
    options: { preserveHierarchy, skipDuplicates }
  }
  Response: { jobId, status, filesQueued }

GET /api/google-drive/import/{jobId}/status
  Response: { 
    jobId, 
    status: "pending|running|completed|failed|cancelled",
    progress: { 
      totalFiles, 
      completedFiles, 
      failedFiles, 
      currentFile,
      percentComplete,
      bytesTransferred,
      totalBytes,
      transferRate,
      estimatedTimeRemaining
    },
    files: [
      { 
        driveId, 
        fileName, 
        status: "pending|downloading|hashing|scanning|uploading|completed|failed",
        progress: 0-100,
        error: ""
      }
    ]
  }

POST /api/google-drive/import/{jobId}/cancel
  → Cancels running import job

POST /api/google-drive/import/{jobId}/retry-failed
  → Retries all failed files in the job

GET /api/google-drive/import/history
  → Returns list of past import jobs with summary stats

Continuous Sync (Phase 3)

POST /api/google-drive/sync/configure
  Request: { enabled, folderId, interval, options }
  
GET /api/google-drive/sync/status
  Response: { enabled, lastSync, nextSync, syncedFiles, errors }

POST /api/google-drive/sync/trigger
  → Manually triggers sync cycle

Database Schema

Users Table

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  google_id VARCHAR(255) UNIQUE NOT NULL,
  email VARCHAR(255) NOT NULL,
  encrypted_access_token TEXT,
  encrypted_refresh_token TEXT,
  token_expiry TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

Import Jobs Table

CREATE TABLE import_jobs (
  id UUID PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  status VARCHAR(50), -- pending, running, completed, failed, cancelled
  total_files INTEGER,
  completed_files INTEGER,
  failed_files INTEGER,
  total_bytes BIGINT,
  transferred_bytes BIGINT,
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  options JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

Import Files Table

CREATE TABLE import_files (
  id SERIAL PRIMARY KEY,
  job_id UUID REFERENCES import_jobs(id),
  drive_file_id VARCHAR(255),
  file_name VARCHAR(500),
  file_size BIGINT,
  status VARCHAR(50), -- pending, downloading, hashing, scanning, uploading, completed, failed
  progress INTEGER, -- 0-100
  sha256_hash VARCHAR(64),
  ipfs_cid VARCHAR(255),
  error_message TEXT,
  retry_count INTEGER DEFAULT 0,
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

Sync Configurations Table (Phase 3)

CREATE TABLE sync_configs (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  drive_folder_id VARCHAR(255),
  enabled BOOLEAN DEFAULT TRUE,
  sync_interval INTEGER, -- minutes
  last_sync_at TIMESTAMP,
  next_sync_at TIMESTAMP,
  options JSONB,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

Internal Architecture

New Go Packages

internal/gdrive/

  • client.go - Google Drive API client wrapper
  • oauth.go - OAuth flow management
  • downloader.go - File download from Drive
  • mapper.go - Drive metadata → PinShare metadata conversion

internal/jobs/

  • queue.go - Job queue interface
  • worker.go - Worker pool implementation
  • import_job.go - Import job definition and state machine
  • persistence.go - Job state persistence

internal/users/

  • auth.go - User authentication
  • store.go - User data storage
  • tokens.go - Encrypted token management

internal/sync/ (Phase 3)

  • engine.go - Continuous sync orchestration
  • watcher.go - Drive change detection
  • scheduler.go - Sync scheduling

Integration with Existing Systems

Upload Pipeline Integration:

// In internal/jobs/import_job.go
func (j *ImportJob) processFile(driveFile *drive.File) error {
    // 1. Download from Google Drive into a temp location
    j.updateFileStatus(driveFile.Id, "downloading", 0)
    localPath, err := j.driveClient.Download(driveFile)
    if err != nil {
        return j.failFile(driveFile.Id, fmt.Sprintf("download failed: %v", err))
    }
    defer os.Remove(localPath) // clean up the temp file on every exit path
    
    // 2. Plug into the existing upload pipeline
    j.updateFileStatus(driveFile.Id, "hashing", 30)
    sha256 := psfs.ComputeSHA256(localPath)
    
    j.updateFileStatus(driveFile.Id, "scanning", 50)
    secResult := psfs.SecurityCheck(localPath, sha256)
    if !secResult.Safe {
        return j.failFile(driveFile.Id, "security scan failed")
    }
    
    j.updateFileStatus(driveFile.Id, "uploading", 70)
    cid, err := psfs.AddFileIPFS(localPath)
    if err != nil {
        return j.failFile(driveFile.Id, fmt.Sprintf("IPFS add failed: %v", err))
    }
    
    // 3. Store metadata for P2P distribution
    j.updateFileStatus(driveFile.Id, "uploading", 90)
    metadata := store.BaseMetadata{
        FileSHA256: sha256,
        IPFSCID:    cid,
        FileName:   driveFile.Name,
        // ... map other Drive metadata
    }
    store.GlobalStore.AddFile(metadata)
    
    j.updateFileStatus(driveFile.Id, "completed", 100)
    return nil
}

UI Requirements

1. Google Drive Authorization Page

Location: /import/google-drive/authorize

Components:

  • "Connect to Google Drive" button
  • OAuth consent explanation
  • Permissions required list
  • Privacy policy link

2. Folder/File Selection Interface

Location: /import/google-drive/select

Features:

  • Tree view of Drive folders (collapsible)
  • File list view with checkboxes
  • Multi-select capability
  • File type icons
  • Size/date metadata display
  • Search/filter bar
  • "Select All" / "Deselect All" buttons
  • Preview import summary (X files, Y GB)
  • Import options:
    • Preserve folder hierarchy
    • Skip duplicates (by SHA256)
    • Include shared files
  • "Start Import" button

3. Import Status Dashboard

Location: /import/google-drive/status/{jobId}

Real-time Metrics:

╔════════════════════════════════════════════════════════╗
║  Import Progress                            [Cancel]  ║
╠════════════════════════════════════════════════════════╣
║  ████████████████░░░░░░░░░░  45% (45/100 files)      ║
║  ⬇ Downloading: document.pdf (2.5 MB/s)              ║
║  ⏱ ETA: 5 minutes                                     ║
║  📊 Status: 40 completed, 5 failed, 55 pending       ║
╚════════════════════════════════════════════════════════╝

Files:
┌─────────────────────────────────────────────────────┐
│ ✅ report.pdf         │ Completed  │ 2.3 MB │ 12:30 │
│ ⏳ presentation.pptx  │ Scanning   │ ████░░ │       │
│ ❌ large-video.mp4    │ Failed     │ Error: Too large│
│ ⏸ document.docx      │ Pending    │ 45 KB  │       │
└─────────────────────────────────────────────────────┘

[Retry Failed Files]  [View Details]

Detailed Per-File View:

  • File name with Drive icon
  • Progress bar for current file
  • Current stage (downloading/hashing/scanning/uploading)
  • Transfer speed
  • Success/error indicator
  • Retry button for failed files

4. Import History Page

Location: /import/google-drive/history

Display:

  • List of past import jobs
  • Job ID, start time, duration
  • Success/failure counts
  • Total data imported
  • "View Details" link to status page

5. Sync Configuration Page (Phase 3)

Location: /import/google-drive/sync

Settings:

  • Enable/disable continuous sync
  • Select Drive folder to sync
  • Sync interval (hourly, daily, etc.)
  • Conflict resolution strategy
  • Last sync timestamp
  • Manual "Sync Now" button

Monitoring & Observability

Metrics to Track

Job-Level Metrics:

  • gdrive_import_jobs_total{status="completed|failed|cancelled"}
  • gdrive_import_duration_seconds
  • gdrive_import_files_total{status="completed|failed"}
  • gdrive_import_bytes_total

File-Level Metrics:

  • gdrive_file_download_duration_seconds
  • gdrive_file_size_bytes{stage="downloaded|uploaded"}
  • gdrive_transfer_rate_bytes_per_second

API Metrics:

  • gdrive_api_requests_total{endpoint,status}
  • gdrive_api_errors_total{error_type}
  • gdrive_api_rate_limit_hits_total

Sync Metrics (Phase 3):

  • gdrive_sync_cycles_total{status}
  • gdrive_sync_new_files_detected
  • gdrive_sync_lag_seconds (time since last successful sync)

Logging Strategy

  • Structured JSON logs
  • Log levels: DEBUG, INFO, WARN, ERROR
  • Include job ID, user ID, file ID in all log entries
  • Detailed error logging with stack traces

Security Considerations

OAuth Security

  1. PKCE Flow - Use Proof Key for Code Exchange for additional security
  2. Token Encryption - Encrypt tokens at rest using AES-256
  3. Secure Storage - Store encrypted tokens in database or OS keyring
  4. Token Rotation - Implement automatic refresh token rotation
  5. Scope Minimization - Request only drive.readonly scope
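Token encryption at rest maps onto AES-256-GCM from the standard library; a sketch of the seal/open pair (key management, e.g. loading the key from the OS keyring, is out of scope here):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// encryptToken seals plaintext with AES-256-GCM; the random nonce is
// prepended to the ciphertext so decryptToken can recover it.
func encryptToken(key [32]byte, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptToken splits off the nonce and authenticates + decrypts the rest.
func decryptToken(key [32]byte, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	var key [32]byte // in production, load from keyring/KMS; never hardcode
	sealed, _ := encryptToken(key, []byte("example-access-token"))
	plain, _ := decryptToken(key, sealed)
	fmt.Println(string(plain))
}
```

GCM authenticates as well as encrypts, so a tampered ciphertext or a wrong key fails loudly at decrypt time instead of yielding garbage tokens.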

API Security

  1. Rate Limiting - Respect Google Drive API quotas (per-user limits)
  2. Authentication Required - All endpoints require valid user session
  3. Input Validation - Validate all Drive file IDs, folder paths
  4. CORS - Restrict to localhost and configured domains

File Security

  1. Leverage Existing Scanning - All imported files go through security checks
  2. Size Limits - Enforce max file size limits
  3. Type Validation - Respect PinShare's allowed file types
  4. Malware Scanning - VirusTotal/ClamAV on all imports

Privacy

  1. User Data Isolation - Each user only sees their own imports
  2. Token Revocation - Support complete data deletion
  3. Audit Logging - Log all import operations

Testing Strategy

Unit Tests

  • Google Drive client mocking
  • OAuth flow state machine
  • Job queue operations
  • Metadata mapping accuracy

Integration Tests

  • End-to-end import flow with test files
  • OAuth callback handling
  • Database persistence
  • Worker pool concurrency

Load Tests

  • Import 1,000 files concurrently
  • Test with 10+ concurrent import jobs
  • Measure memory usage and performance
  • Test rate limit handling

Security Tests

  • OAuth PKCE flow validation
  • Token encryption/decryption
  • Unauthorized access attempts
  • Input validation edge cases

Implementation Phases

Phase 1: OAuth + Basic Import (MVP)

Goal: Import files manually from Google Drive

Deliverables:

  • User authentication system
  • Google Drive OAuth integration
  • Drive folder/file browser UI
  • Basic import job queue
  • Simple progress tracking
  • Integration with existing upload pipeline

Estimated Effort: 2-3 weeks

Phase 2: Enhanced Monitoring

Goal: Comprehensive import monitoring

Deliverables:

  • Per-file status tracking
  • Real-time progress updates (WebSocket)
  • Bandwidth/speed metrics
  • Error tracking and retry mechanism
  • Import history dashboard
  • Prometheus metrics

Estimated Effort: 1-2 weeks

Phase 3: Continuous Sync

Goal: Auto-sync Drive changes

Deliverables:

  • Drive change detection (polling)
  • Sync scheduler
  • Sync configuration UI
  • Conflict resolution
  • Sync pause/resume

Estimated Effort: 2-3 weeks

Phase 4: Performance & Polish

Goal: Production-ready reliability

Deliverables:

  • Performance optimization
  • Webhook support (vs polling)
  • Advanced filtering options
  • Batch operations
  • Comprehensive documentation

Estimated Effort: 1-2 weeks


Dependencies

External Services

  • Google Cloud Project - OAuth credentials, API enablement
  • Google Drive API v3 - File access
  • Database - PostgreSQL or SQLite for job/user persistence

Go Packages

require (
    google.golang.org/api v0.XXX
    golang.org/x/oauth2 v0.XXX
    github.com/lib/pq v1.XXX // PostgreSQL driver
    github.com/google/uuid v1.XXX // Job IDs
)

Configuration

# config.yaml additions
google_drive:
  oauth:
    client_id: "${GOOGLE_OAUTH_CLIENT_ID}"
    client_secret: "${GOOGLE_OAUTH_CLIENT_SECRET}"
    redirect_url: "http://localhost:9090/api/google-drive/callback"
    scopes:
      - "https://www.googleapis.com/auth/drive.readonly"
  
  import:
    max_concurrent_downloads: 5
    max_file_size_mb: 1024
    temp_download_dir: "./tmp/gdrive"
    
  rate_limiting:
    requests_per_second: 10
    burst: 20

Success Metrics

User Adoption

  • Number of users connecting Google Drive
  • Total files imported
  • Active sync configurations

Performance

  • Average import speed (files/minute, MB/s)
  • P95 latency for import operations
  • Error rate < 1%

Reliability

  • Job success rate > 99%
  • Retry success rate
  • Sync lag < 5 minutes (Phase 3)

Future Enhancements

Beyond Initial Implementation

  • Dropbox Integration - Apply same pattern to Dropbox
  • OneDrive Support - Microsoft OneDrive import
  • S3 Import - AWS S3 bucket import
  • Selective Export - PinShare → Google Drive
  • Smart Deduplication - Cross-user file deduplication
  • Bandwidth Scheduling - Import during off-peak hours
  • Multi-folder Sync - Sync multiple Drive folders simultaneously

Priority: High
Complexity: High
Impact: High - Unlocks PinShare for users with existing cloud storage
