Open
30 commits
b342525
docs: Add CLAUDE.md with development guidance
danielunderwood Jun 26, 2025
8a5f205
feat(deps): Add FastAPI dependencies and basic app structure
danielunderwood Jun 26, 2025
9ff5479
feat(migration): Complete Phase 1 - FastAPI/Quart dual setup
danielunderwood Jun 27, 2025
cd583be
ci: Enable CI pipeline for feature branches
danielunderwood Jun 27, 2025
2e5ab71
feat(docker): Switch to uvicorn with hybrid app
danielunderwood Jun 27, 2025
8d28118
feat(deps): Add structlog for structured logging
danielunderwood Jun 27, 2025
685ad6a
fix(logging): Remove duplicate logging handlers and configurations
danielunderwood Jun 27, 2025
0b81361
feat(logging): Initialize structlog for structured logging
danielunderwood Jun 27, 2025
093aee0
feat(dev): Add Slumber REST API testing configuration
danielunderwood Jun 27, 2025
77bad83
feat(logging): Rewrite structlog config with clean output
danielunderwood Jun 27, 2025
cff7d43
feat(logging): Convert all modules to structured logging
danielunderwood Jun 27, 2025
9ccb7c2
feat(provider): Add debug decorator for async operation timing
danielunderwood Jun 27, 2025
9f3070f
feat(logging): Modernize logging configuration with pydantic-settings
danielunderwood Jun 27, 2025
314413b
docs(logging): Add logging configuration documentation
danielunderwood Jun 27, 2025
a0ab5ca
fix(ssl): Update certifi and fix SSL certificate verification errors
danielunderwood Jun 27, 2025
8717b86
feat(docker): Optimize Dockerfile for Poetry dependency caching
danielunderwood Jul 5, 2025
8eb4723
fix(async): Fix asyncio timeout and unpacking issues across API funct…
danielunderwood Jul 5, 2025
517beb4
refactor(async): Extract common async timeout handling to utility fun…
danielunderwood Jul 5, 2025
1c0ed26
feat(async): Add timeout handling to artist and album endpoints with …
danielunderwood Jul 5, 2025
8a64771
feat(async): Add comprehensive async operation monitoring and resilience
danielunderwood Jul 5, 2025
a2102fe
fix(async): Add missing imports and endpoint registration
danielunderwood Jul 5, 2025
b5cc3eb
fix(async): Add timeout protection to search endpoints
danielunderwood Jul 5, 2025
50b6a47
fix(async): Increase album timeout and add detailed operation tracking
danielunderwood Jul 5, 2025
a7ab3ea
feat(resilience): Add circuit breaker protection to database operations
danielunderwood Jul 5, 2025
69cfa92
feat(async): Add aggressive timeout handling and hanging operation cl…
danielunderwood Jul 6, 2025
7f4dfa7
feat(typing): Add structured Pydantic models and proper exception han…
danielunderwood Jul 6, 2025
422cba6
feat(fastapi): Migrate artist endpoint and reorganize models structure
danielunderwood Jul 6, 2025
9343730
feat(config): Add configurable async operation timeouts with Pydantic…
danielunderwood Jul 6, 2025
cc87090
feat(debug): Add comprehensive debugging infrastructure for timeout d…
danielunderwood Jul 6, 2025
cbcc822
fix(fastapi): Convert datetime to string for InfoResponse replication…
danielunderwood Jul 9, 2025
1 change: 1 addition & 0 deletions .envrc
@@ -0,0 +1 @@
dotenv
105 changes: 105 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,105 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LidarrAPI.Metadata is a Python-based metadata API server that provides music metadata services for Lidarr. It integrates with the MusicBrainz database, Solr search, Redis caching, and external music services such as Spotify to deliver comprehensive music metadata.

## Development Commands

### Environment Setup
```bash
# Install dependencies with poetry (preferred)
poetry install --with=dev
poetry shell

# Or use pip with requirements.txt
pip install -r requirements.txt
```

### Running the Application
```bash
# Run the metadata server directly
python lidarrmetadata/server.py

# Or use the installed command
lidarr-metadata-server

# Run the crawler
lidarr-metadata-crawler
```

### Testing
```bash
# Run tests with pytest (in poetry environment)
pytest tests --doctest-modules

# Run tests with coverage
pytest tests --doctest-modules --cov=lidarrmetadata --cov-report=xml --cov-report=html

# Run tests with tox
tox
```

### Docker Services
```bash
# Start database services
docker-compose up -d db
docker-compose run --rm musicbrainz /usr/local/bin/createdb.sh -fetch

# Set up search indexing
docker-compose up -d indexer musicbrainz
docker-compose exec indexer python -m sir amqp_setup

# Development environment (exposes service ports)
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

# Production environment
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

## Architecture

### Core Components
- **lidarrmetadata/app.py**: Main Quart application with API routes
- **lidarrmetadata/api.py**: Core API logic and data processing functions
- **lidarrmetadata/server.py**: Gunicorn-based WSGI server wrapper
- **lidarrmetadata/provider.py**: External service integrations (Spotify, Last.fm, etc.)
- **lidarrmetadata/cache.py**: Redis caching layer
- **lidarrmetadata/crawler.py**: Background data crawler

### Data Layer
- **lidarrmetadata/sql/**: SQL queries for MusicBrainz database operations
- PostgreSQL database with MusicBrainz schema
- Redis for caching and session management
- Solr for search indexing

### External Services
- MusicBrainz: Primary metadata source
- Spotify: Additional metadata and mapping
- Last.fm: Charts and popularity data
- Billboard: Chart data integration

### Configuration
- Environment-based configuration in `lidarrmetadata/config.py`
- Docker environment files: `postgres.env`
- Test configuration via `LIDARR_METADATA_CONFIG=TEST`

## Key Files
- **pyproject.toml**: Poetry dependencies and project configuration
- **tox.ini**: Test runner configuration
- **docker-compose*.yml**: Service orchestration for different environments
- **lidarrmetadata/sql/CreateIndices.sql**: Additional database indices for Lidarr

## Development Best Practices

- Use semantic commits like `feat(component): Add ...`

## Version Control

- Use git flow branches

## Development Memories

- Add new settings to the pydantic-settings `BaseSettings` model rather than the old config model
267 changes: 267 additions & 0 deletions DEBUGGING_GUIDE.md
@@ -0,0 +1,267 @@
# Debugging Guide for `/artist/<mbid>` Timeout Issues

This guide explains how to use the debugging endpoints and log output added to diagnose and resolve persistent timeouts in the `/artist/<mbid>` endpoint.

## Quick Start - Debugging a Timeout

When you encounter a timeout on `/artist/<mbid>`, follow these steps:

### 1. Check Overall Health
```bash
curl http://localhost:5001/health/async
```
This shows:
- Active async operations
- Hanging operations (operations running longer than their timeout)
- Recent failures
- Circuit breaker status

### 2. Check Database Performance
```bash
curl http://localhost:5001/debug/database
```
This shows:
- Connection pool utilization
- Average query times
- Slow queries
- Connection acquisition metrics

### 3. Debug Specific Artist
```bash
curl "http://localhost:5001/debug/artist/<PROBLEMATIC_MBID>"
```
This provides:
- Current timeout settings
- Current database metrics
- Active async operations
- Provider information
- Circuit breaker status

### 4. Monitor Real-Time Operations
```bash
curl http://localhost:5001/debug/operations/hanging
```
This shows currently hanging operations with detailed context.
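
The four checks above can also be collected in one pass, which is useful when a timeout is brief and you want all four views from the same moment. The following is a minimal sketch using only the standard library. The helper names `debug_urls` and `collect_snapshot` are hypothetical and not part of the project; the endpoint paths and the default port are the ones used in the steps above.

```python
from __future__ import annotations

import json
from urllib.parse import quote
from urllib.request import urlopen


def debug_urls(mbid: str, base_url: str = "http://localhost:5001") -> dict[str, str]:
    """Return the four Quick Start URLs, keyed in the order they are checked."""
    base = base_url.rstrip("/")
    return {
        "health": f"{base}/health/async",
        "database": f"{base}/debug/database",
        "artist": f"{base}/debug/artist/{quote(mbid, safe='')}",
        "hanging": f"{base}/debug/operations/hanging",
    }


def collect_snapshot(mbid: str, base_url: str = "http://localhost:5001",
                     timeout: float = 5.0) -> dict[str, dict]:
    """Fetch every debug endpoint once; record an error entry instead of raising."""
    snapshot = {}
    for name, url in debug_urls(mbid, base_url).items():
        try:
            with urlopen(url, timeout=timeout) as response:
                snapshot[name] = json.load(response)
        except (OSError, ValueError) as exc:  # network failure or non-JSON body
            snapshot[name] = {"error": str(exc)}
    return snapshot
```

Calling `collect_snapshot("<PROBLEMATIC_MBID>")` against a running server returns one dictionary per endpoint, which can be dumped to a file and attached to a bug report.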

## Understanding the Debug Output

### Database Metrics (`/debug/database`)

```json
{
  "connection_pool": {
    "pool_size": 10,
    "active_connections": 2,
    "idle_connections": 8,
    "utilization_percent": 20.0
  },
  "connection_acquisition": {
    "total_acquisitions": 150,
    "failed_acquisitions": 0,
    "avg_acquisition_time": 0.0034,
    "failure_rate_percent": 0.0
  },
  "query_performance": {
    "recent_queries_count": 12,
    "avg_query_time": 0.156,
    "slow_queries_count": 2,
    "slow_query_threshold": 5.0
  }
}
```

**Red Flags:**
- `utilization_percent > 80%` - Connection pool exhaustion
- `avg_acquisition_time > 1.0` - Slow connection acquisition
- `failure_rate_percent > 5%` - Connection failures
- `avg_query_time > 2.0` - Slow queries
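
The red flags above translate directly into threshold checks on the `/debug/database` response. This sketch assumes the response has the shape shown in the sample JSON; `database_red_flags` is a hypothetical helper name, and the thresholds are the ones listed above.

```python
from __future__ import annotations


def database_red_flags(metrics: dict) -> list[str]:
    """Return a description of each red-flag threshold the metrics exceed."""
    pool = metrics.get("connection_pool", {})
    acquisition = metrics.get("connection_acquisition", {})
    queries = metrics.get("query_performance", {})
    # (label, observed value, threshold) for each red flag in the guide
    checks = [
        ("connection pool exhaustion", pool.get("utilization_percent", 0.0), 80.0),
        ("slow connection acquisition", acquisition.get("avg_acquisition_time", 0.0), 1.0),
        ("connection failures", acquisition.get("failure_rate_percent", 0.0), 5.0),
        ("slow queries", queries.get("avg_query_time", 0.0), 2.0),
    ]
    return [
        f"{label}: {value} > {limit}"
        for label, value, limit in checks
        if value > limit
    ]
```

For the sample response above the function returns an empty list, since every value is below its threshold.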

### Async Operations (`/health/async`)

```json
{
  "healthy": false,
  "active_operations": 3,
  "hanging_operations": 1,
  "hanging_details": [
    {
      "name": "database_artist_lookup",
      "running_time": 15.4,
      "timeout": 10.0,
      "context": {
        "mbids": ["artist-id-here"]
      }
    }
  ]
}
```

**Red Flags:**
- `hanging_operations > 0` - Operations stuck longer than timeout
- `active_operations` growing without completing
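
To see how far each hanging operation has overrun, compare `running_time` with `timeout` for every entry in `hanging_details`. A minimal sketch, assuming the response shape shown above; `summarize_hanging` is a hypothetical helper name.

```python
from __future__ import annotations


def summarize_hanging(health: dict) -> list[str]:
    """Describe each hanging operation and how long it has exceeded its timeout."""
    lines = []
    for op in health.get("hanging_details", []):
        # Round to avoid floating-point noise such as 5.400000000000001
        overrun = round(op["running_time"] - op["timeout"], 2)
        lines.append(f"{op['name']}: {overrun}s over its {op['timeout']}s timeout")
    return lines
```

For the sample response this yields a single line reporting that `database_artist_lookup` is 5.4 seconds past its 10-second timeout.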

## Common Timeout Scenarios and Solutions

### Scenario 1: Database Query Timeout
**Symptoms:**
- `database_artist_lookup` appears in hanging operations
- High `avg_query_time` in database metrics
- EXPLAIN ANALYZE logs showing slow query plans

**Investigation:**
1. Check slow queries in `/debug/database`
2. Look for EXPLAIN ANALYZE logs in application logs
3. Check connection pool utilization

**Solutions:**
- Optimize slow SQL queries
- Add database indexes
- Increase the `database_query` timeout (`ASYNC_TIMEOUT_DATABASE_QUERY`)
- Scale database resources

### Scenario 2: External API Timeout
**Symptoms:**
- `artist_overviews_batch` or `artist_images_*` in hanging operations
- Circuit breaker showing failures for external services

**Investigation:**
1. Check circuit breaker status in `/debug/artist/<mbid>`
2. Monitor external API response times in logs
3. Check network connectivity to external services

**Solutions:**
- Increase the `external_api` timeout (`ASYNC_TIMEOUT_EXTERNAL_API`)
- Implement retry logic
- Let the circuit breaker open sooner so requests to a failing service are skipped instead of waited on
- Cache external API responses longer

### Scenario 3: Provider-Specific Issues
**Symptoms:**
- Specific provider operations appearing in hanging operations
- Provider operation logs showing slow responses

**Investigation:**
1. Check provider-specific metrics in debug output
2. Look for "Slow provider operation" warnings in logs
3. Monitor specific provider response times

**Solutions:**
- Increase timeout for specific provider operations
- Implement provider-specific circuit breakers
- Cache provider responses more aggressively
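
The symptoms above can be turned into a small lookup that maps a hanging operation's name to the scenario to investigate first. This is a sketch, not project code: `classify_operation` is a hypothetical helper, the operation names come from the symptom lists above, and treating every other name as a provider-specific issue is an assumption.

```python
from fnmatch import fnmatchcase

# Operation-name patterns taken from the "Symptoms" lists in this guide
SCENARIO_PATTERNS = [
    ("Scenario 1: Database Query Timeout", ["database_artist_lookup"]),
    ("Scenario 2: External API Timeout", ["artist_overviews_batch", "artist_images_*"]),
]
# Assumption: anything unrecognized is investigated as a provider issue
FALLBACK_SCENARIO = "Scenario 3: Provider-Specific Issues"


def classify_operation(name: str) -> str:
    """Return the scenario heading that matches a hanging operation name."""
    for scenario, patterns in SCENARIO_PATTERNS:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return scenario
    return FALLBACK_SCENARIO
```

Feeding it the `name` field of each entry in `hanging_details` gives a starting point for which Investigation list to follow.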

## Log Analysis

### Key Log Entries to Monitor

1. **Slow Database Queries:**
```
logger.warning("Slow database query detected", extra={
    'execution_time': 8.5,
    'sql_preview': 'SELECT row_to_json(artist_data)...',
    'performance_issue': True
})
```

2. **Critical Slow Queries with EXPLAIN:**
```
logger.error("Critical slow query - EXPLAIN ANALYZE", extra={
    'execution_time': 9.8,
    'explain_plan': {...},
    'critical_performance_issue': True
})
```

3. **Hanging Operations:**
```
logger.warning("album_search tasks timed out after 20s: get_overview(), artist_images_primary()", extra={
    'timed_out_coroutines': ['get_overview()', 'artist_images_primary()']
})
```

4. **Provider Operations:**
```
logger.warning("Slow provider operation", extra={
    'provider': 'WikipediaProvider',
    'operation': 'get_artist_overview',
    'elapsed_seconds': 7.2,
    'performance_concern': True
})
```
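
Once these entries are in a log file, a short filter can pull out only the lines that matter. The sketch below rests on an assumption about the output format: it expects one JSON object per line, with the message under an `event` key (the key structlog uses) or a `message` key. Adjust the parsing if your logs are rendered differently. `matching_entries` and `MARKERS` are hypothetical names; the marker strings are copied from the log calls above.

```python
import json
from typing import Iterable, Iterator

# Message fragments taken from the four log entries in this section
MARKERS = (
    "Slow database query detected",
    "Critical slow query - EXPLAIN ANALYZE",
    "timed out after",
    "Slow provider operation",
)


def matching_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log entries whose message contains one of the markers."""
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        # Assumed keys: "event" (structlog) or "message"
        message = str(entry.get("event") or entry.get("message") or "")
        if any(marker in message for marker in MARKERS):
            yield entry
```

Usage is `for entry in matching_entries(open("app.log")): print(entry)`, where `app.log` stands in for wherever your deployment writes its logs.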

## Timeout Configuration

Timeouts are set through environment variables; the values currently in effect appear in the debug output:

```bash
# View current timeouts
curl http://localhost:5001/debug/artist/any-valid-uuid | jq '.timeouts'
```

### Environment Variables for Timeout Tuning
```bash
export ASYNC_TIMEOUT_ARTIST_INFO=60 # Increase from 45s
export ASYNC_TIMEOUT_DATABASE_QUERY=15 # Increase from 10s
export ASYNC_TIMEOUT_EXTERNAL_API=15 # Increase from 10s
export ASYNC_TIMEOUT_ARTIST_IMAGES=15 # Increase from 10s
```
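
To illustrate how such variables resolve to effective timeouts, here is a standard-library sketch. It is not the project's loader; this PR's commit history indicates the application reads these settings through Pydantic settings models. `load_timeouts` is a hypothetical name, and the defaults are the "from" values in the comments above.

```python
import os
from typing import Mapping

# Defaults taken from the "Increase from ..." comments in this guide
DEFAULT_TIMEOUTS = {
    "ASYNC_TIMEOUT_ARTIST_INFO": 45.0,
    "ASYNC_TIMEOUT_DATABASE_QUERY": 10.0,
    "ASYNC_TIMEOUT_EXTERNAL_API": 10.0,
    "ASYNC_TIMEOUT_ARTIST_IMAGES": 10.0,
}


def load_timeouts(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve each timeout (in seconds) from the environment, else its default."""
    timeouts = {}
    for name, default in DEFAULT_TIMEOUTS.items():
        raw = env.get(name)
        value = float(raw) if raw not in (None, "") else default
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
        timeouts[name] = value
    return timeouts
```

Passing a plain dictionary instead of `os.environ` makes it easy to preview a configuration before exporting it.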

## Emergency Actions

### Clear Hanging Operations
```bash
curl -X POST http://localhost:5001/health/async/cleanup
```

### Reset Database Metrics
```bash
curl -X POST http://localhost:5001/debug/database/reset
```

### Circuit Breaker Status
Check if external services are being circuit-broken:
```bash
curl http://localhost:5001/health/async | jq '.circuit_breakers'
```

## Performance Monitoring Script

Use this script to continuously monitor for issues:

```bash
#!/bin/bash
# monitor_performance.sh
# Polls the debug endpoints every 30 seconds and prints an alert when
# operations hang or the connection pool is nearly exhausted.
# Requires curl, jq, and bc.

echo "Monitoring artist endpoint performance..."
while true; do
    echo "=== $(date) ==="

    # Check for hanging operations (fall back to 0 if the request fails)
    hanging=$(curl -s http://localhost:5001/health/async | jq -r '.hanging_operations // 0')
    hanging=${hanging:-0}
    if [ "$hanging" -gt 0 ]; then
        echo "⚠️ ALERT: $hanging hanging operations detected!"
        curl -s http://localhost:5001/debug/operations/hanging | jq '.'
    fi

    # Check database performance (fall back to 0 if the request fails)
    db_util=$(curl -s http://localhost:5001/debug/database | jq -r '.connection_pool.utilization_percent // 0')
    db_util=${db_util:-0}
    if (( $(echo "$db_util > 80" | bc -l) )); then
        echo "⚠️ ALERT: High database connection utilization: $db_util%"
    fi

    echo "Status: $hanging hanging ops, $db_util% DB utilization"
    sleep 30
done
```

## Next Steps

If timeouts persist after using these debugging tools:

1. **Gather Evidence:** Collect logs and debug output during a timeout event
2. **Identify Pattern:** Determine if it's always the same operation timing out
3. **Resource Analysis:** Check if it's a resource constraint (CPU, memory, network)
4. **Infrastructure:** Consider if the issue is at the infrastructure level
5. **Code Review:** Review the specific operation that's consistently timing out

The tracking described in this guide should show where the 45-second timeout budget of the `/artist/<mbid>` endpoint is being spent.