Add optional metrics and observability support #28

@sgaunet

Feature Request

Add optional metrics collection for troubleshooting and monitoring long-running processes.

Use Cases

  1. Debugging performance issues
  2. Monitoring log processing throughput
  3. Analyzing cache effectiveness
  4. Identifying bottlenecks

Proposed Metrics

Processing Metrics

  • Lines processed per second
  • Total lines processed
  • Bytes processed
  • Processing latency (p50, p95, p99)

Cache Metrics

  • Cache hit rate (%)
  • Cache size (entries)
  • Cache hits / misses

Stream Metrics

  • Stdout vs stderr line counts
  • Lines per log level
  • Average line length

Proposed Implementation

Option 1: CLI Flag with Summary

# Enable metrics collection
logwrap --metrics -- long-running-command

# Output at end:
# Metrics Summary:
# Lines processed: 125,432
# Processing rate: 1,234 lines/sec
# Cache hit rate: 94.3%
# Log levels: ERROR=12 WARN=45 INFO=125,375
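
A rough sketch of how the end-of-run summary above could be rendered, including the thousands separators. `formatThousands` and `renderSummary` are hypothetical names, and the `--metrics` flag itself is still only a proposal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatThousands renders n with comma separators, e.g. 125432 -> "125,432".
func formatThousands(n uint64) string {
	s := strconv.FormatUint(n, 10)
	var b strings.Builder
	for i, c := range s {
		// Insert a comma whenever the remaining digit count is a multiple of 3.
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte(',')
		}
		b.WriteRune(c)
	}
	return b.String()
}

// renderSummary builds the summary text in the layout proposed for Option 1.
// linesPerSec must be non-negative; it is rounded to the nearest integer.
func renderSummary(lines uint64, linesPerSec, hitRatePct float64, levels map[string]uint64) string {
	return fmt.Sprintf(
		"Metrics Summary:\nLines processed: %s\nProcessing rate: %s lines/sec\nCache hit rate: %.1f%%\nLog levels: ERROR=%s WARN=%s INFO=%s\n",
		formatThousands(lines),
		formatThousands(uint64(linesPerSec+0.5)),
		hitRatePct,
		formatThousands(levels["ERROR"]),
		formatThousands(levels["WARN"]),
		formatThousands(levels["INFO"]),
	)
}

func main() {
	levels := map[string]uint64{"ERROR": 12, "WARN": 45, "INFO": 125375}
	fmt.Print(renderSummary(125432, 1234, 94.3, levels))
}
```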

Option 2: Metrics File

# Write metrics to file
logwrap --metrics-file /tmp/logwrap-metrics.json -- command

# Contents of /tmp/logwrap-metrics.json:
{
  "duration_seconds": 120,
  "lines_processed": 125432,
  "lines_per_second": 1045.27,
  "bytes_processed": 15234567,
  "cache": {
    "hits": 118234,
    "misses": 7198,
    "hit_rate": 0.943,
    "size": 7198
  },
  "levels": {
    "ERROR": 12,
    "WARN": 45,
    "INFO": 125375
  },
  "streams": {
    "stdout": 125400,
    "stderr": 32
  }
}

Option 3: Periodic Stats

# Print stats every 10 seconds
logwrap --metrics --metrics-interval 10s -- command

# Output:
# [10s] 12,543 lines (1,254/s) - Cache: 92.1% - Levels: E=1 W=5 I=12,537
# [20s] 25,123 lines (1,256/s) - Cache: 93.4% - Levels: E=2 W=8 I=25,113

Implementation

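// pkg/metrics/metrics.go
package metrics

import (
    "fmt"
    "sync"
    "time"
)

// Metrics holds the counters for one run. Use New to create one, and go
// through the methods so that every access is guarded by mu.
type Metrics struct {
    StartTime      time.Time
    LinesProcessed uint64
    BytesProcessed uint64
    CacheHits      uint64
    CacheMisses    uint64
    LevelCounts    map[string]uint64
    StreamCounts   map[string]uint64
    mu             sync.Mutex
}

// New returns a Metrics with its maps initialized. RecordLine would panic
// on a zero-value Metrics because writing to a nil map panics.
func New() *Metrics {
    return &Metrics{
        StartTime:    time.Now(),
        LevelCounts:  make(map[string]uint64),
        StreamCounts: make(map[string]uint64),
    }
}

// RecordLine counts one processed line, its size, its level and its stream.
func (m *Metrics) RecordLine(line, level, stream string) {
    m.mu.Lock()
    defer m.mu.Unlock()

    m.LinesProcessed++
    m.BytesProcessed += uint64(len(line))
    m.LevelCounts[level]++
    m.StreamCounts[stream]++
}

// RecordCacheHit counts one cache lookup as a hit or a miss.
func (m *Metrics) RecordCacheHit(hit bool) {
    m.mu.Lock()
    defer m.mu.Unlock()

    if hit {
        m.CacheHits++
    } else {
        m.CacheMisses++
    }
}

// Summary returns a one-line, human-readable report.
func (m *Metrics) Summary() string {
    m.mu.Lock()
    defer m.mu.Unlock()

    // Guard both divisions: a zero duration or zero cache lookups would
    // otherwise produce +Inf or NaN in the output.
    var lps, hitRate float64
    if duration := time.Since(m.StartTime).Seconds(); duration > 0 {
        lps = float64(m.LinesProcessed) / duration
    }
    if total := m.CacheHits + m.CacheMisses; total > 0 {
        hitRate = float64(m.CacheHits) / float64(total) * 100
    }

    return fmt.Sprintf(
        "Lines: %d (%.0f/s) - Cache hit rate: %.1f%% - Levels: E=%d W=%d I=%d",
        m.LinesProcessed, lps, hitRate,
        m.LevelCounts["ERROR"], m.LevelCounts["WARN"], m.LevelCounts["INFO"],
    )
}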
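
If the mutex ever shows up as a bottleneck on the per-line hot path, the scalar counters could be kept with `sync/atomic` instead. This is only an alternative sketch, with a hypothetical `Counters` type; the per-level and per-stream maps would still need a lock or a fixed set of atomic fields.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counters is a hypothetical lock-free variant of the scalar counters.
type Counters struct {
	lines, bytes, hits, misses uint64
}

// RecordLine counts one line and its length in bytes.
func (c *Counters) RecordLine(line string) {
	atomic.AddUint64(&c.lines, 1)
	atomic.AddUint64(&c.bytes, uint64(len(line)))
}

// RecordCacheHit counts one cache lookup as a hit or a miss.
func (c *Counters) RecordCacheHit(hit bool) {
	if hit {
		atomic.AddUint64(&c.hits, 1)
	} else {
		atomic.AddUint64(&c.misses, 1)
	}
}

// Lines and Bytes return the current totals.
func (c *Counters) Lines() uint64 { return atomic.LoadUint64(&c.lines) }
func (c *Counters) Bytes() uint64 { return atomic.LoadUint64(&c.bytes) }

// HitRate returns the cache hit rate in percent, or 0 before any lookup.
func (c *Counters) HitRate() float64 {
	hits := atomic.LoadUint64(&c.hits)
	total := hits + atomic.LoadUint64(&c.misses)
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total) * 100
}

func main() {
	var c Counters
	var wg sync.WaitGroup
	// Eight goroutines stand in for concurrent stdout/stderr readers.
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.RecordLine("hello")
				c.RecordCacheHit(i%10 != 0)
			}
		}()
	}
	wg.Wait()
	fmt.Printf("lines=%d bytes=%d hit rate=%.1f%%\n", c.Lines(), c.Bytes(), c.HitRate())
}
```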

Configuration

# config.yaml
metrics:
  enabled: false
  output: "stderr"  # where to write metrics
  interval: "10s"   # periodic stats (0 = summary only)
  format: "text"    # text or json

Benefits

Troubleshooting:

  • Identify slow processing
  • Detect cache inefficiency
  • Monitor resource usage

Optimization:

  • Measure impact of changes
  • Compare configurations
  • Validate improvements

Monitoring:

  • Track long-running processes
  • Alert on performance degradation
  • Analyze log patterns

Implementation Checklist

  • Create metrics package
  • Add Metrics struct with thread-safe counters
  • Integrate metrics collection in processor
  • Add CLI flag --metrics
  • Add summary output at completion
  • Add periodic stats option
  • Add JSON output format
  • Document metrics in README

Example Usage

# Basic metrics
$ logwrap --metrics -- ./long-running-app
# ... app output ...
# Metrics: 1,234,567 lines (12,345/s), Cache: 94.2%, E=12 W=456 I=1,234,099

# JSON metrics to file
$ logwrap --metrics-file metrics.json -- ./app
$ cat metrics.json
{"lines": 1234567, "rate": 12345.67, ...}

# Live stats every 5s
$ logwrap --metrics --metrics-interval 5s -- ./app
[5s]  61,728 lines (12,346/s) ...
[10s] 123,456 lines (12,346/s) ...
