
Conversation

@tgvashworth
Contributor

Custom TraceAwareCloudLoggingFilter fixes spanId mismatch where app logs were detached from parent HTTP requests in Cloud Logging.

  • Request-level logs use parent spanId (nests under Cloud Run logs)
  • Custom span logs use child spanId (correlates with Cloud Trace spans)
  • Detection via span.kind: SERVER = root request, INTERNAL = custom span
  • Removed LoggingInstrumentor (custom filter handles trace context)
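
In rough terms, the selection might look like the sketch below (illustrative only, not the PR's exact code; choose_span_id is a hypothetical helper, and the real filter also carries trace IDs and sampling flags):

# Sketch: request-level logs keep the parent spanId from the
# X-Cloud-Trace-Context header so they nest under the Cloud Run request log,
# while logs emitted inside custom (INTERNAL) spans use the OpenTelemetry
# span id so they line up with Cloud Trace.
from opentelemetry import trace


def choose_span_id(header_span_id, otel_span_id):
    span = trace.get_current_span()
    if not span.get_span_context().is_valid:
        return header_span_id  # no active span: fall back to the header value
    if getattr(span, "kind", trace.SpanKind.SERVER) == trace.SpanKind.SERVER:
        # SERVER span created by DjangoInstrumentor: this is a request-level log
        return header_span_id
    # INTERNAL span created by application code: correlate with Cloud Trace
    return otel_span_id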

Copilot AI review requested due to automatic review settings December 23, 2025 13:36

Copilot AI left a comment


Pull request overview

This PR fixes a spanId mismatch issue where application logs were detached from parent HTTP request logs in Cloud Logging. The solution introduces a custom logging filter that intelligently chooses between parent and child span IDs based on span type.

Key changes:

  • Custom TraceAwareCloudLoggingFilter uses parent spanId for request-level logs (nesting under Cloud Run logs) and child spanId for custom span logs (correlating with Cloud Trace spans)
  • Detection based on span.kind: SERVER spans (from DjangoInstrumentor) vs INTERNAL spans (custom application spans)
  • Removed LoggingInstrumentor since the custom filter now handles trace context injection

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

  • gyrinx/logging_filter.py: New module implementing TraceAwareCloudLoggingFilter and TraceAwareStructuredLogHandler for proper spanId correlation
  • gyrinx/tests/test_logging_filter.py: Comprehensive test suite covering filter logic, span detection, and fallback scenarios
  • gyrinx/tracing.py: Removed the LoggingInstrumentor import and usage; added a comment explaining that the custom filter now handles trace context
  • gyrinx/settings_prod.py: Updated the logging handler class to use the custom TraceAwareStructuredLogHandler
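
For context on the settings_prod.py change, a handler like this is typically wired up through Django's LOGGING dictConfig, roughly as sketched below (illustrative only; the project id and logger layout are assumptions, not taken from the PR, though the dotted path follows the module and class names above):

# Sketch of a dictConfig-style wiring; extra keys on a handler entry are
# passed as keyword arguments to the handler's constructor.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "cloud_logging": {
            "class": "gyrinx.logging_filter.TraceAwareStructuredLogHandler",
            "project_id": "my-gcp-project",  # illustrative project id
        },
    },
    "root": {
        "handlers": ["cloud_logging"],
        "level": "INFO",
    },
}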


Comment on lines +285 to +317
class TestTraceAwareStructuredLogHandler:
    """Tests for TraceAwareStructuredLogHandler."""

    def test_uses_custom_filter(self):
        """Handler should use TraceAwareCloudLoggingFilter."""
        handler = TraceAwareStructuredLogHandler(
            project_id="test-project", labels={"env": "test"}
        )

        # Should have exactly one filter of our custom type
        custom_filters = [
            f for f in handler.filters if isinstance(f, TraceAwareCloudLoggingFilter)
        ]
        assert len(custom_filters) == 1

        # Check filter configuration
        custom_filter = custom_filters[0]
        assert custom_filter.project == "test-project"
        assert custom_filter.default_labels == {"env": "test"}

    def test_stores_project_id(self):
        """Handler should store project_id attribute."""
        handler = TraceAwareStructuredLogHandler(project_id="my-project")
        assert handler.project_id == "my-project"

    def test_stores_json_encoder(self):
        """Handler should store json_encoder_cls attribute."""
        import json

        handler = TraceAwareStructuredLogHandler(
            project_id="test", json_encoder_cls=json.JSONEncoder
        )
        assert handler._json_encoder_cls == json.JSONEncoder

Copilot AI Dec 23, 2025


The tests for TraceAwareStructuredLogHandler only verify that attributes are stored correctly, but don't test the actual logging functionality. There's no test that verifies the handler can successfully format and emit log records, which is critical given that the handler bypasses StructuredLogHandler's __init__ method.

Consider adding a test that creates a log record, passes it through the handler, and verifies the output is correctly formatted as JSON. This would catch any issues with the missing initialization or formatting setup.
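
One possible shape for such a test (a sketch only: it assumes the handler accepts a stream argument and that emit() writes one JSON object per record, as google-cloud-logging's StructuredLogHandler does; the payload key names are assumptions):

import io
import json
import logging

from gyrinx.logging_filter import TraceAwareStructuredLogHandler


def test_emit_formats_record_as_json():
    # Route a record through the handler's filters and emit(), then check
    # that the stream received parseable JSON with the logged message.
    stream = io.StringIO()
    handler = TraceAwareStructuredLogHandler(project_id="test-project", stream=stream)
    record = logging.LogRecord(
        name="gyrinx.test",
        level=logging.INFO,
        pathname=__file__,
        lineno=1,
        msg="hello from the handler",
        args=(),
        exc_info=None,
    )
    handler.handle(record)  # applies filters, then emit()
    payload = json.loads(stream.getvalue())
    assert payload["message"] == "hello from the handler"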

Comment on lines +121 to +123
is_custom_span = (
    getattr(span, "kind", None) != opentelemetry.trace.SpanKind.SERVER
)

Copilot AI Dec 23, 2025


The span kind comparison uses getattr with a default of None, which could cause unexpected behavior. If span.kind is None (attribute doesn't exist), the comparison None != opentelemetry.trace.SpanKind.SERVER will be True, meaning the span will be incorrectly classified as a custom span.

This could lead to logs using the wrong span ID. Consider using a more explicit check or handling the case where kind attribute doesn't exist differently. For example, you could default to treating missing kind as SERVER (request-level) rather than as a custom span, since that's likely the safer default for logging correlation.

Suggested change
- is_custom_span = (
-     getattr(span, "kind", None) != opentelemetry.trace.SpanKind.SERVER
- )
+ # If `kind` is missing, default to SERVER (treat as request-level span).
+ kind = getattr(span, "kind", opentelemetry.trace.SpanKind.SERVER)
+ is_custom_span = kind != opentelemetry.trace.SpanKind.SERVER


def test_returns_none_when_no_opentelemetry(self, filter_instance):
    """Returns None values when OpenTelemetry is not available."""
    with patch("gyrinx.logging_filter.HAS_OPENTELEMETRY", False):

Copilot AI Dec 23, 2025


The test mocks gyrinx.logging_filter.HAS_OPENTELEMETRY but this approach may not work as expected in Python. The module-level constant HAS_OPENTELEMETRY is evaluated once when the module is imported, and patching it in a test might not affect the already-imported value used by the function.

A more reliable approach would be to patch the import check at the point where it's used in _get_otel_context(), or to patch opentelemetry.trace to be None/unavailable. The current approach might lead to flaky tests or tests that don't actually validate the intended behavior.

Suggested change
- with patch("gyrinx.logging_filter.HAS_OPENTELEMETRY", False):
+ # Patch the value at the point where _get_otel_context reads it,
+ # rather than patching the module-level constant directly.
+ with patch.dict(
+     filter_instance._get_otel_context.__globals__, {"HAS_OPENTELEMETRY": False}
+ ):

Comment on lines +226 to +227
with patch("gyrinx.logging_filter.opentelemetry.trace") as mock_trace:
    mock_trace.get_current_span.return_value = mock_trace.span.INVALID_SPAN

Copilot AI Dec 23, 2025


The test compares the return value of get_current_span() with mock_trace.span.INVALID_SPAN, but this creates an inconsistent mock structure. The code being tested accesses opentelemetry.trace.span.INVALID_SPAN (line 111 in logging_filter.py), but the mock sets mock_trace.span.INVALID_SPAN.

This means the comparison span == opentelemetry.trace.span.INVALID_SPAN in the actual code won't match the mocked value correctly because the mock paths don't align. The test should ensure that mock_trace.span.INVALID_SPAN is the same object that get_current_span() returns, or it should properly mock the full path gyrinx.logging_filter.opentelemetry.trace.span.INVALID_SPAN.

Suggested change
- with patch("gyrinx.logging_filter.opentelemetry.trace") as mock_trace:
-     mock_trace.get_current_span.return_value = mock_trace.span.INVALID_SPAN
+ with patch("gyrinx.logging_filter.opentelemetry") as mock_otel:
+     invalid_span = MagicMock()
+     mock_otel.trace.span.INVALID_SPAN = invalid_span
+     mock_otel.trace.get_current_span.return_value = invalid_span

Comment on lines +66 to +70
    span_id = otel_span_id  # Correlate with custom span in Cloud Trace
else:
    span_id = header_span_id  # Nest under Cloud Run request log
trace_id = otel_trace_id
sampled = otel_sampled

Copilot AI Dec 23, 2025


The logic for determining which span ID to use has a potential edge case. When both otel_trace_id and header_span_id are present (line 62) but is_custom_span is False (lines 67-68), the code uses header_span_id for the span but otel_trace_id for the trace.

However, if the trace IDs don't match (header_trace_id != otel_trace_id), this creates an inconsistency where the span ID comes from one trace context but the trace ID comes from another. This could happen if there's a propagation issue or if the trace contexts are out of sync.

Consider adding validation to ensure trace IDs match when both contexts are available, or at minimum add a comment explaining why using mixed IDs from different sources is safe in this scenario.

Suggested change
-     span_id = otel_span_id  # Correlate with custom span in Cloud Trace
- else:
-     span_id = header_span_id  # Nest under Cloud Run request log
- trace_id = otel_trace_id
- sampled = otel_sampled
+     # Custom span logs: use OTel span/trace so they correlate with Cloud Trace spans.
+     span_id = otel_span_id
+     trace_id = otel_trace_id
+     sampled = otel_sampled
+ else:
+     # Request-level logs: prefer parent spanId from header to nest under Cloud Run logs.
+     if header_trace_id and header_trace_id != otel_trace_id:
+         # If header and OTel trace IDs disagree, keep span/trace consistent by
+         # using the header trace context for request-level logs.
+         span_id = header_span_id
+         trace_id = header_trace_id
+         sampled = header_sampled
+     else:
+         # Normal case: trace IDs match or header_trace_id is unavailable.
+         # Use header spanId for nesting, but keep OTel traceId for cross-service correlation.
+         span_id = header_span_id
+         trace_id = otel_trace_id
+         sampled = otel_sampled

Comment on lines +144 to +155
# Call StreamHandler init directly to skip StructuredLogHandler's filter setup
logging.StreamHandler.__init__(self, stream=stream)
self.project_id = project_id

# Use our custom filter instead of default CloudLoggingFilter
log_filter = TraceAwareCloudLoggingFilter(
    project=project_id, default_labels=labels
)
self.addFilter(log_filter)

# Store json encoder class (used by StructuredLogHandler.emit)
self._json_encoder_cls = json_encoder_cls

Copilot AI Dec 23, 2025


The TraceAwareStructuredLogHandler bypasses StructuredLogHandler's __init__ method by calling StreamHandler.__init__ directly. This means it doesn't inherit important initialization logic from StructuredLogHandler, particularly the setFormatter() call that configures JSON formatting.

Without calling super().__init__() or StructuredLogHandler.__init__(), the handler may not properly format log records as JSON for Cloud Logging. The parent class typically sets up a StructuredLogFormatter, which is critical for proper JSON output.

Consider calling the parent class's __init__ method and then replacing the filter, rather than bypassing it entirely. Alternatively, ensure all necessary initialization from StructuredLogHandler is replicated here, including formatter setup.

Suggested change
- # Call StreamHandler init directly to skip StructuredLogHandler's filter setup
- logging.StreamHandler.__init__(self, stream=stream)
- self.project_id = project_id
- # Use our custom filter instead of default CloudLoggingFilter
- log_filter = TraceAwareCloudLoggingFilter(
-     project=project_id, default_labels=labels
- )
- self.addFilter(log_filter)
- # Store json encoder class (used by StructuredLogHandler.emit)
- self._json_encoder_cls = json_encoder_cls
+ # Initialize using StructuredLogHandler to ensure formatter and internals are set up
+ super().__init__(
+     labels=labels,
+     stream=stream,
+     project_id=project_id,
+     json_encoder_cls=json_encoder_cls,
+     **kwargs,
+ )
+ # Replace any default filters with our custom TraceAwareCloudLoggingFilter
+ self.filters.clear()
+ log_filter = TraceAwareCloudLoggingFilter(
+     project=project_id, default_labels=labels
+ )
+ self.addFilter(log_filter)

Comment on lines +144 to +155
# Call StreamHandler init directly to skip StructuredLogHandler's filter setup
logging.StreamHandler.__init__(self, stream=stream)
self.project_id = project_id

# Use our custom filter instead of default CloudLoggingFilter
log_filter = TraceAwareCloudLoggingFilter(
    project=project_id, default_labels=labels
)
self.addFilter(log_filter)

# Store json encoder class (used by StructuredLogHandler.emit)
self._json_encoder_cls = json_encoder_cls

Copilot AI Dec 23, 2025


The TraceAwareStructuredLogHandler doesn't override the emit() method from StructuredLogHandler. While it inherits emit() from the parent class, the parent's emit() method expects certain attributes and formatting to be set up during initialization.

Since this handler bypasses StructuredLogHandler.__init__() and only calls StreamHandler.__init__(), the emit() method may not work correctly because it won't have access to the formatter or other setup that StructuredLogHandler normally provides.

Verify that the inherited emit() method works correctly with this initialization approach, or override emit() to ensure proper JSON formatting for Cloud Logging.

Suggested change
- # Call StreamHandler init directly to skip StructuredLogHandler's filter setup
- logging.StreamHandler.__init__(self, stream=stream)
- self.project_id = project_id
- # Use our custom filter instead of default CloudLoggingFilter
- log_filter = TraceAwareCloudLoggingFilter(
-     project=project_id, default_labels=labels
- )
- self.addFilter(log_filter)
- # Store json encoder class (used by StructuredLogHandler.emit)
- self._json_encoder_cls = json_encoder_cls
+ # Initialize StructuredLogHandler normally so that emit() has all expected state
+ super().__init__(
+     labels=labels,
+     stream=stream,
+     project_id=project_id,
+     json_encoder_cls=json_encoder_cls,
+     **kwargs,
+ )
+ # Replace the default CloudLoggingFilter with our custom trace-aware filter
+ for existing_filter in list(self.filters):
+     if isinstance(existing_filter, CloudLoggingFilter):
+         self.removeFilter(existing_filter)
+ log_filter = TraceAwareCloudLoggingFilter(
+     project=project_id,
+     default_labels=labels,
+ )
+ self.addFilter(log_filter)

@tgvashworth marked this pull request as draft December 23, 2025 14:01