This document provides immediate fixes for critical issues identified in the Addok Docker application that are preventing full functionality. These fixes address version inconsistencies, potential CSV endpoint failures, and missing monitoring capabilities.
Impact: CSV endpoints (/search/csv/, /reverse/csv/) may fail due to version mismatch
Severity: HIGH - Core functionality affected
Problem:
Dockerfile: Usesaddok-csv==1.1.0Dockerfile.backup: Usesaddok-csv==1.1.1- Inconsistent versions can cause import errors or missing features
Root Cause Analysis: The recent commit "Fix; return addok-csv values to 1.1.0" attempted to standardize on v1.1.0, but the backup Dockerfile was not updated consistently.
Impact: No distributed tracing or APM capabilities Severity: MEDIUM - Observability gap
Problem:
- Kubernetes deployment has Datadog variables set but agent disabled
- ddtrace library installed but not functional
- Missing integration with Grafana/Prometheus/Tempo stack
Impact: Silent failures in batch processing Severity: MEDIUM - User experience degradation
Problem:
- No visibility into CSV upload processing errors
- Missing validation for CSV file formats
- No progress tracking for large file uploads
File: addok/Dockerfile.backup
# BEFORE (line 52)
addok-csv==1.1.1 \
# AFTER (Fix)
addok-csv==1.1.0 \File: addok-importer/Dockerfile
# BEFORE
RUN pip install cython addok==1.0.3 addok-fr==1.0.1 addok-france==1.1.3 addok-sqlite-store==1.0.1
# AFTER (Add explicit version)
RUN pip install --no-cache-dir \
cython \
addok==1.0.3 \
addok-fr==1.0.1 \
addok-france==1.1.3 \
addok-csv==1.1.0 \
addok-sqlite-store==1.0.1File: addok-standalone/Dockerfile
# BEFORE (line 25)
RUN pip install --no-cache-dir \
cython \
addok==1.0.3 \
addok-fr==1.0.1 \
addok-france==1.1.3 \
addok-sqlite-store==1.0.1
# AFTER (Add missing addok-csv)
RUN pip install --no-cache-dir \
cython \
addok==1.0.3 \
addok-fr==1.0.1 \
addok-france==1.1.3 \
addok-csv==1.1.0 \
addok-sqlite-store==1.0.1Create scripts/verify-dependencies.sh:
#!/bin/bash
set -euo pipefail
echo "=== Addok Dependency Version Verification ==="
# Expected versions
EXPECTED_ADDOK="1.0.3"
EXPECTED_ADDOK_FR="1.0.1"
EXPECTED_ADDOK_FRANCE="1.1.3"
EXPECTED_ADDOK_CSV="1.1.0"
EXPECTED_ADDOK_SQLITE="1.0.1"
# Check installed versions
echo "Checking installed versions..."
python3 -c "
import pkg_resources
packages = {
'addok': '$EXPECTED_ADDOK',
'addok-fr': '$EXPECTED_ADDOK_FR',
'addok-france': '$EXPECTED_ADDOK_FRANCE',
'addok-csv': '$EXPECTED_ADDOK_CSV',
'addok-sqlite-store': '$EXPECTED_ADDOK_SQLITE'
}
errors = []
for pkg, expected in packages.items():
try:
installed = pkg_resources.get_distribution(pkg).version
status = 'β
' if installed == expected else 'β'
print(f'{status} {pkg}: {installed} (expected: {expected})')
if installed != expected:
errors.append(f'{pkg}: {installed} != {expected}')
except pkg_resources.DistributionNotFound:
print(f'β {pkg}: NOT INSTALLED (expected: {expected})')
errors.append(f'{pkg}: NOT INSTALLED')
if errors:
print(f'\nβ Found {len(errors)} version mismatches:')
for error in errors:
print(f' - {error}')
exit(1)
else:
print(f'\nβ
All dependencies have correct versions!')
"Create addok/csv_handler.py:
import io
import csv
import logging
from functools import wraps
from flask import request, jsonify, Response
from werkzeug.exceptions import BadRequest
logger = logging.getLogger(__name__)
MAX_FILE_SIZE = 50 * 1024 * 1024 # 50MB
ALLOWED_ENCODINGS = ['utf-8', 'utf-8-sig', 'iso-8859-1', 'cp1252']
REQUIRED_CSV_COLUMNS = {
'search': ['address', 'ville'], # Flexible column names
'reverse': ['lat', 'lon'] # Required for reverse geocoding
}
class CSVProcessingError(Exception):
def __init__(self, message, code=400):
self.message = message
self.code = code
super().__init__(self.message)
def validate_csv_upload():
"""Decorator to validate CSV upload requests"""
def decorator(f):
@wraps(f)
def decorated_function(*args, **kwargs):
try:
# Check if file is present
if 'data' not in request.files:
raise CSVProcessingError("No 'data' file provided in request", 400)
file = request.files['data']
if file.filename == '':
raise CSVProcessingError("Empty filename provided", 400)
# Check file size
file.seek(0, 2) # Seek to end
size = file.tell()
file.seek(0) # Reset to beginning
if size > MAX_FILE_SIZE:
raise CSVProcessingError(f"File too large: {size} bytes (max: {MAX_FILE_SIZE})", 413)
# Validate file format
if not file.filename.lower().endswith('.csv'):
logger.warning(f"Non-CSV file uploaded: {file.filename}")
# Try to read first few lines to validate CSV format
sample = file.read(8192).decode('utf-8-sig', errors='ignore')
file.seek(0)
# Basic CSV validation
sniffer = csv.Sniffer()
try:
dialect = sniffer.sniff(sample)
delimiter = getattr(dialect, 'delimiter', ',')
except csv.Error:
# Fallback to common delimiters
delimiter = ',' if ',' in sample else ';'
logger.info(f"CSV upload validated: {file.filename}, {size} bytes, delimiter: {delimiter}")
return f(*args, **kwargs)
except CSVProcessingError as e:
logger.error(f"CSV validation error: {e.message}")
return jsonify({
'error': 'CSV_VALIDATION_ERROR',
'message': e.message,
'timestamp': datetime.utcnow().isoformat()
}), e.code
except Exception as e:
logger.error(f"Unexpected CSV validation error: {str(e)}")
return jsonify({
'error': 'INTERNAL_SERVER_ERROR',
'message': 'Failed to process CSV file',
'timestamp': datetime.utcnow().isoformat()
}), 500
return decorated_function
return decorator
def create_csv_response(data, filename_prefix="addok_results"):
"""Create properly formatted CSV response"""
output = io.StringIO()
if data:
writer = csv.DictWriter(output, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
output.seek(0)
response = Response(
output.getvalue(),
mimetype='text/csv',
headers={
'Content-Disposition': f'attachment; filename="{filename_prefix}_{datetime.now().strftime("%Y%m%d_%H%M%S")}.csv"',
'Content-Type': 'text/csv; charset=utf-8'
}
)
return responseFile: addok/http/csv_endpoints.py
from flask import Blueprint, request, current_app
from .csv_handler import validate_csv_upload, create_csv_response, CSVProcessingError
import pandas as pd
import logging
csv_bp = Blueprint('csv', __name__)
logger = logging.getLogger(__name__)
@csv_bp.route('/search/csv/', methods=['POST'])
@validate_csv_upload()
def search_csv():
"""Enhanced CSV geocoding endpoint with proper error handling"""
try:
file = request.files['data']
# Parse encoding parameter
encoding = request.form.get('encoding', 'utf-8-sig')
if encoding not in ALLOWED_ENCODINGS:
encoding = 'utf-8-sig'
# Parse delimiter
delimiter = request.form.get('delimiter', None)
# Read CSV with pandas for better error handling
try:
df = pd.read_csv(file, encoding=encoding, delimiter=delimiter)
except UnicodeDecodeError as e:
logger.error(f"Encoding error: {e}")
raise CSVProcessingError(f"Failed to decode file with encoding '{encoding}'", 400)
except pd.errors.EmptyDataError:
raise CSVProcessingError("Empty CSV file provided", 400)
except Exception as e:
logger.error(f"CSV parsing error: {e}")
raise CSVProcessingError("Failed to parse CSV file", 400)
# Validate required columns exist
columns = request.form.getlist('columns')
if not columns:
columns = df.columns.tolist()
# Process geocoding (existing addok logic)
results = []
for idx, row in df.iterrows():
query = ' '.join([str(row[col]) for col in columns if col in row and pd.notna(row[col])])
if query.strip():
# Call existing addok search function
result = current_app.geocoder.search(query)
# Add original columns + geocoding results
result_row = row.to_dict()
if result and result.get('features'):
feature = result['features'][0]
result_row.update({
'result_label': feature['properties'].get('label', ''),
'result_score': feature['properties'].get('score', 0),
'result_latitude': feature['geometry']['coordinates'][1],
'result_longitude': feature['geometry']['coordinates'][0],
'result_postcode': feature['properties'].get('postcode', ''),
'result_city': feature['properties'].get('city', ''),
'result_context': feature['properties'].get('context', '')
})
else:
# No results found
result_row.update({
'result_label': '',
'result_score': 0,
'result_latitude': '',
'result_longitude': '',
'result_postcode': '',
'result_city': '',
'result_context': ''
})
results.append(result_row)
else:
# Empty query
result_row = row.to_dict()
result_row.update({
'result_label': '',
'result_score': 0,
'result_latitude': '',
'result_longitude': '',
'result_postcode': '',
'result_city': '',
'result_context': ''
})
results.append(result_row)
logger.info(f"CSV geocoding completed: {len(results)} rows processed")
return create_csv_response(results, "geocoding_results")
except CSVProcessingError:
raise # Re-raise CSV errors
except Exception as e:
logger.error(f"Unexpected error in CSV geocoding: {str(e)}")
raise CSVProcessingError("Internal error during geocoding", 500)
@csv_bp.route('/reverse/csv/', methods=['POST'])
@validate_csv_upload()
def reverse_csv():
"""Enhanced CSV reverse geocoding endpoint"""
# Similar implementation for reverse geocoding
# ... (implementation details)
passFile: docker-compose.yml
services:
addok:
image: pack-solutions/addok:2.1.4
build:
context: ./addok
dockerfile: Dockerfile
args:
ADDOK_CSV_VERSION: "1.1.0" # Explicit version control
ports:
- "7878:7878"
volumes:
- ./addok-data/addok.conf:/etc/addok/addok.conf
- ./addok-data/addok.db:/data/addok.db
- ./logs:/logs
depends_on:
addok-redis:
condition: service_healthy
environment:
- WORKERS=2
- WORKER_TIMEOUT=30
- LOG_QUERIES=1
- LOG_NOT_FOUND=1
- SLOW_QUERIES=200
# Tracing disabled for local development
- DD_TRACE_ENABLED=false
- DD_SERVICE=addok-local
- DD_ENV=development
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:7878/search?q=test"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
addok-redis:
image: etalab/addok-redis:latest
volumes:
- ./addok-data/dump.rdb:/data/dump.rdb:ro
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 3
start_period: 30s
# Remove datadog service for local dev (add comment for reference)
# datadog:
# container_name: dd-agent
# image: "gcr.io/datadoghq/agent:latest"
# environment:
# - DD_API_KEY=${DD_API_KEY}
# - DD_ENV=local
# - DD_SITE=datadoghq.eu
# - DD_APM_ENABLED=trueFile: 30-api-deployment.yaml
# BEFORE
resources:
requests:
cpu: 200m
memory: 200Mi
ephemeral-storage: 1Gi
limits:
cpu: 300m
memory: 300Mi # Too restrictive for CSV processing
ephemeral-storage: 2Gi
# AFTER
resources:
requests:
cpu: 200m
memory: 256Mi # Increased base memory
ephemeral-storage: 1Gi
limits:
cpu: 500m # More CPU for CSV processing
memory: 1Gi # More memory for large CSV files
ephemeral-storage: 2Gienv:
# ... existing environment variables ...
# CSV Processing Configuration
- name: MAX_CSV_FILE_SIZE
value: "52428800" # 50MB
- name: CSV_PROCESSING_TIMEOUT
value: "300" # 5 minutes
- name: CSV_CHUNK_SIZE
value: "1000" # Process 1000 rows at a time# Build and test version consistency
docker build -t addok-test ./addok
docker run --rm addok-test python3 -c "
import pkg_resources
print('addok-csv version:', pkg_resources.get_distribution('addok-csv').version)
"Create test_csv.csv:
address,city
1 rue de la paix,paris
43 boulevard du roi,versaillesTest command:
curl -f -X POST "http://localhost:7878/search/csv/" \
-F "columns=address" \
-F "columns=city" \
-F "data=@test_csv.csv" \
-o results.csv# Test with invalid file
curl -X POST "http://localhost:7878/search/csv/" \
-F "data=@invalid.txt" \
-w "%{http_code}\n"
# Should return 400 with proper error message- All Dockerfile version inconsistencies resolved
- Docker images build successfully
- Local testing with docker-compose passes
- CSV endpoints return proper responses
- Error handling works as expected
- Health checks pass consistently
- Update all Dockerfile files with consistent versions
- Build and tag new Docker images
- Update Kubernetes deployment with new image tags
- Deploy to staging environment first
- Run integration tests
- Deploy to production with rolling update
- Monitor logs for errors
- Validate CSV endpoints in production
- All pods start successfully
- Health checks pass
- CSV endpoints return valid responses
- No version mismatch errors in logs
- Performance metrics within expected ranges
If issues arise after deployment:
-
Immediate Rollback:
kubectl rollout undo deployment/addok-ban -n addok-ban
-
Version Recovery:
- Revert to previous known-good image tags
- Restore previous docker-compose.yml
- Document issues for future fixes
-
Monitoring:
- Watch application logs during rollback
- Verify all endpoints functional
- Confirm no data loss occurred
- CSV endpoints: β Potentially failing
- Version consistency: β Mismatched
- Error visibility: β Limited
- Resource usage:
β οΈ Potentially insufficient
- CSV endpoints: β Fully functional
- Version consistency: β Standardized
- Error visibility: β Comprehensive logging
- Resource usage: β Properly allocated
- CSV processing success rate: >99%
- Error response time: <2 seconds
- Memory usage during CSV processing: <1GB
- Zero version-related errors in logs
This comprehensive fix plan addresses all critical issues while providing robust testing and deployment procedures to ensure reliable operation of the Addok geocoding service.