[Feat] Monitoring configuration (Prometheus, Grafana, Loki) #197
📝 Walkthrough
Adds a monitoring stack based on Prometheus, Grafana, and Loki: adds the Micrometer Prometheus dependency to Gradle, adds Docker Compose and Prometheus/Loki configuration files, and updates the Spring Boot Actuator endpoint exposure and the security permit-all paths.
Sequence Diagram(s)
sequenceDiagram
rect rgba(100,150,240,0.5)
participant Client
end
rect rgba(50,200,100,0.5)
participant App as KareerServer
end
rect rgba(240,200,60,0.5)
participant Prom as Prometheus
end
rect rgba(220,120,200,0.5)
participant Graf as Grafana
end
rect rgba(200,100,100,0.5)
participant Loki
end
Client->>App: Handle request (generates logs and metrics)
App-->>Prom: Expose /actuator/prometheus (metrics)
Prom->>Prom: Scrape and store metrics
Graf->>Prom: Query (dashboard data)
Client->>Graf: View dashboard
App-->>Loki: Send logs (via collector/Fluent)
Loki->>Loki: Index and store logs
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 7
🧹 Nitpick comments (4)
deployment/docker-compose-monitoring.yml (2)
31-33: Recommend adding health check conditions to depends_on
Currently, depends_on only waits for the containers to start and does not check whether the services are actually ready. If Grafana tries to connect before Prometheus or Loki has finished initializing, it may hit errors at startup.
♻️ Example of adding health check conditions
     prometheus:
       image: prom/prometheus:latest
       ...
    +  healthcheck:
    +    test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
    +    interval: 10s
    +    timeout: 5s
    +    retries: 3

     grafana:
       ...
       depends_on:
    -    - prometheus
    -    - loki
    +    prometheus:
    +      condition: service_healthy
    +    loki:
    +      condition: service_healthy

     loki:
       image: grafana/loki:latest
       ...
    +  healthcheck:
    +    test: ["CMD", "wget", "-q", "--spider", "http://localhost:3100/ready"]
    +    interval: 10s
    +    timeout: 5s
    +    retries: 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deployment/docker-compose-monitoring.yml` around lines 31 - 33, The compose uses a simple depends_on (the depends_on block with - prometheus and - loki) which only waits for container start, not service readiness; add healthcheck definitions to the prometheus and loki services (command, interval, timeout, retries) and change Grafana's depends_on to use condition: service_healthy for prometheus and loki so Grafana waits until those services pass their healthchecks before starting; update the depends_on block and add corresponding healthcheck blocks referenced by the service names prometheus and loki.
1-1: Consider removing the version field
In Docker Compose V2, the version field is no longer needed and is ignored. Removing it is recommended with current Docker Compose releases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deployment/docker-compose-monitoring.yml` at line 1, Remove the top-level "version" field from the compose file (the literal version: "3.8") because modern Docker Compose v2 ignores it; locate the compose YAML containing the version key and delete that single line so the file relies on the newer schema (top-level keys like services, networks, volumes remain unchanged) and validate the resulting docker-compose-monitoring.yml to ensure no syntax issues.
deployment/prometheus/prometheus.yml (1)
5-10: Recommend adding scrape timeout and TLS settings
Adding a scrape_timeout setting helps keep scraping stable under network latency or transient failures. Prometheus defaults scrape_timeout to 10s regardless of scrape_interval, but setting it explicitly makes the configuration easier to manage.
♻️ Suggested change
     global:
       scrape_interval: 15s
       evaluation_interval: 15s
    +  scrape_timeout: 10s

     scrape_configs:
       - job_name: 'kareer_server'
         metrics_path: '/actuator/prometheus'
    +    scheme: https
         static_configs:
    -      - targets: ['api.ka-reer.com:443'] # accessed over HTTPS
    -        scheme: https
    +      - targets: ['api.ka-reer.com:443']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deployment/prometheus/prometheus.yml` around lines 5 - 10, the scrape configuration for job_name 'kareer_server' currently has no scrape_timeout or detailed TLS settings, which leaves it vulnerable to network latency and authentication problems; when updating, explicitly set scrape_timeout (e.g., "15s" or a value suited to the service) in the scrape_configs -> job_name: 'kareer_server' block, and add tls_config under scheme: https to specify server_name or ca_file, or set insecure_skip_verify: false if needed, so that TLS verification is configured explicitly (keep metrics_path: '/actuator/prometheus' and targets: ['api.ka-reer.com:443'] as they are).
build.gradle (1)
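89-90: Recommend removing the unused tess4j dependency
tess4j is not used anywhere in the codebase. The project currently uses the Clova OCR service, and no Java file imports or uses tess4j. Because it bundles native libraries, it increases the build size unnecessarily; it is recommended to add it only when a Tesseract-based OCR implementation is actually needed.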
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@build.gradle` around lines 89 - 90, Remove the unused Tess4J dependency declaration implementation 'net.sourceforge.tess4j:tess4j:5.13.0' from build.gradle; search for any references to "tess4j" or imports of net.sourceforge.tess4j in the codebase (classes/methods) and ensure none remain, then run the build (e.g., ./gradlew assemble or dependency report) to confirm the project still compiles and the dependency is no longer pulled in; if OCR is needed later, re-add the dependency at that time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deployment/docker-compose-monitoring.yml`:
- Line 9: Replace the use of floating :latest image tags with explicit
version-pinned tags to ensure reproducible deployments: locate the image string
"prom/prometheus:latest" and the other image entries referenced (the occurrences
matching the diff at the same image lines and the Loki image entries tied to
loki/config.yml) and change them to specific, tested version tags (e.g., a
concrete Prometheus and Loki release) and update any related compose references
so all three occurrences are pinned consistently; also add a short comment or
note near the image entries indicating the chosen version and source of truth
for future upgrades.
- Around line 22-34: The Grafana service currently exposes port 3000 with
default credentials; update the grafana service configuration to set a strong
admin password via environment variables (e.g., add GF_SECURITY_ADMIN_PASSWORD)
and disable anonymous access by setting GF_AUTH_ANONYMOUS_ENABLED=false; ensure
these env vars are added to the grafana service block and consider restricting
external exposure (remove or limit the "3000:3000" port mapping or bind it to
localhost) so the grafana service is no longer accessible with default
admin/admin credentials or anonymously.
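A minimal sketch of how the two compose findings above might be addressed, assuming the service names prometheus, grafana, and loki. The image tags are illustrative placeholders rather than versions named in the review, and GRAFANA_ADMIN_PASSWORD is a hypothetical variable expected from the shell or an .env file.

```yaml
services:
  prometheus:
    # Illustrative tag only: pin to a release you have tested.
    image: prom/prometheus:v2.53.0

  loki:
    # Illustrative tag only: must match the schema used in loki/config.yml.
    image: grafana/loki:2.9.8
    ports:
      - "127.0.0.1:3100:3100"   # bind to localhost instead of all interfaces

  grafana:
    # Illustrative tag only.
    image: grafana/grafana:11.1.0
    ports:
      - "127.0.0.1:3000:3000"   # reach it through a reverse proxy or SSH tunnel
    environment:
      # Hypothetical variable; compose aborts with this message if it is unset.
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:?set an admin password}
      GF_AUTH_ANONYMOUS_ENABLED: "false"
```

Binding to 127.0.0.1 keeps both ports off the public interface while leaving container-to-container traffic on the compose network unchanged.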
In `@deployment/loki/config.yml`:
- Line 1: Summary: auth_enabled is set to false leaving Loki open to
unauthenticated Promtail pushes and docker-compose-monitoring.yml binds port
3100 to the host. Fix: set auth_enabled: true in deployment/loki/config.yml (or
document/automate secure credentials) and update docker-compose-monitoring.yml
to avoid public host binding (use internal network only or bind to
127.0.0.1:3100:3100) and ensure firewall rules block external access; reference
the auth_enabled key, Promtail clients, and the ports mapping "3100:3100" when
making the changes.
- Around line 6-20: The Loki config uses storage_config with boltdb and
schema_config setting store: boltdb and schema: v11 which is incompatible with
Loki 3.x; update the config to use the tsdb store and a supported schema (e.g.,
change store: boltdb -> store: tsdb and schema: v12 or newer) and adjust
storage_config to match tsdb expectations (replace or remove boltdb-specific
keys and ensure filesystem/chunks layout remains valid), or alternatively pin
the Docker image in docker-compose-monitoring.yml to a Loki 2.x tag to keep the
existing boltdb/schema: v11 settings.
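As a rough sketch of the first option in the finding above (moving to the tsdb store), the relevant parts of loki/config.yml could look like the following. The start date and the directories under /loki are assumptions, not values taken from the PR, and an existing deployment would normally add a new schema period rather than edit an old one.

```yaml
schema_config:
  configs:
    - from: 2024-01-01          # assumed start date for this schema period
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index   # assumed path
    cache_location: /loki/tsdb-cache           # assumed path
  filesystem:
    directory: /loki/chunks                    # assumed path
```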
In `@src/main/java/org/sopt/kareer/global/config/security/SecurityConfig.java`:
- Line 43: The /actuator/prometheus entry currently listed publicly in
SecurityConfig should not be exposed without protection; update SecurityConfig
(e.g., the configure(HttpSecurity http) or WebSecurityCustomizer setup in
SecurityConfig.java) to restrict access to the "/actuator/prometheus"
endpoint—either require authentication/authority (add a requestMatcher for
"/actuator/prometheus" and .authenticated() or a specific role) and enable HTTP
Basic for the actuator endpoints, or implement an IP whitelist by adding a
custom WebExpression (hasIpAddress(...)) or a filter that allows only Prometheus
scrape IPs; ensure the change is applied where the antMatchers/requestMatchers
are configured so the prometheus endpoint is no longer permitAll.
In `@src/main/resources/application.yml`:
- Around line 103-104: application.yml currently exposes the info endpoint but
SecurityConfig.java's PERMIT_ALL_PATTERNS does not include /actuator/info,
causing unauthenticated requests to get 401/403; either remove "info" from
management.endpoints.web.exposure.include in application.yml if you want it
protected, or add "/actuator/info" (and optionally "/actuator/info/**") to the
PERMIT_ALL_PATTERNS constant in SecurityConfig (and update any related security
matcher logic) so /actuator/info is publicly accessible; choose one of these two
actions and make the corresponding change consistently.
- Around line 108-109: The health endpoint is currently exposing sensitive infra
details via management.endpoint.health.show-details: always while
/actuator/health is in PERMIT_ALL_PATTERNS; change
management.endpoint.health.show-details to either when-authorized (preferred) or
never in application.yml and ensure your security config no longer leaves
/actuator/health publicly permitted (remove it from PERMIT_ALL_PATTERNS or add
an authorization rule) so that detailed health info is only returned to
authenticated/authorized principals; update any related config or docs
referencing /actuator/health accordingly.
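One possible shape for the application.yml side of the two findings above, shown as a sketch. It assumes the exposed set is health, info, and prometheus, and it takes the "remove info" branch of the first finding; the alternative is to keep info and add /actuator/info to PERMIT_ALL_PATTERNS.

```yaml
management:
  endpoints:
    web:
      exposure:
        # 'info' is dropped here on the assumption that it should not be public.
        include: health,prometheus
  endpoint:
    health:
      # Component details are returned only to authorized callers.
      show-details: when-authorized
```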
---
Nitpick comments:
In `@build.gradle`:
- Around line 89-90: Remove the unused Tess4J dependency declaration
implementation 'net.sourceforge.tess4j:tess4j:5.13.0' from build.gradle; search
for any references to "tess4j" or imports of net.sourceforge.tess4j in the
codebase (classes/methods) and ensure none remain, then run the build (e.g.,
./gradlew assemble or dependency report) to confirm the project still compiles
and the dependency is no longer pulled in; if OCR is needed later, re-add the
dependency at that time.
In `@deployment/docker-compose-monitoring.yml`:
- Around line 31-33: The compose uses a simple depends_on (the depends_on block
with - prometheus and - loki) which only waits for container start, not service
readiness; add healthcheck definitions to the prometheus and loki services
(command, interval, timeout, retries) and change Grafana's depends_on to use
condition: service_healthy for prometheus and loki so Grafana waits until those
services pass their healthchecks before starting; update the depends_on block
and add corresponding healthcheck blocks referenced by the service names
prometheus and loki.
- Line 1: Remove the top-level "version" field from the compose file (the
literal version: "3.8") because modern Docker Compose v2 ignores it; locate the
compose YAML containing the version key and delete that single line so the file
relies on the newer schema (top-level keys like services, networks, volumes
remain unchanged) and validate the resulting docker-compose-monitoring.yml to
ensure no syntax issues.
In `@deployment/prometheus/prometheus.yml`:
- Around line 5-10: 현재 job_name 'kareer_server'의 scrape 설정에 scrape_timeout과 TLS
세부 설정이 없어 네트워크 지연/인증 문제에 취약합니다; 업데이트할 때 scrape_configs -> job_name:
'kareer_server' 블록에 scrape_timeout (예: "15s" 또는 서비스에 맞는 값)을 명시하고 scheme: https
아래에 tls_config을 추가하여 server_name 또는 ca_file을 지정하거나 필요시 insecure_skip_verify:
false로 설정해 TLS 검증을 명시적으로 구성하세요 (metrics_path: '/actuator/prometheus'와 targets:
['api.ka-reer.com:443']는 그대로 유지).
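Put together, the Prometheus suggestion might look like the sketch below. The 10s timeout and the server_name value are assumptions chosen for illustration; Prometheus requires scrape_timeout to be no greater than scrape_interval.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s            # must not exceed scrape_interval

scrape_configs:
  - job_name: 'kareer_server'
    metrics_path: '/actuator/prometheus'
    scheme: https                # set at the job level, not under static_configs
    tls_config:
      server_name: api.ka-reer.com   # assumed to match the certificate
      insecure_skip_verify: false
    static_configs:
      - targets: ['api.ka-reer.com:443']
```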
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6aa87b94-7751-4c4d-b85a-057150a12936
📒 Files selected for processing (6)
build.gradle
deployment/docker-compose-monitoring.yml
deployment/loki/config.yml
deployment/prometheus/prometheus.yml
src/main/java/org/sopt/kareer/global/config/security/SecurityConfig.java
src/main/resources/application.yml
Related issue 🛠
Work Description 📝
1. docker-compose-monitoring.yml
2. prometheus.yml
3. loki/config.yml
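Promtail is not configured yet. After this PR is approved, I will check that basic monitoring results show up on the dashboard, and then implement log collection from the application server with Promtail, forwarding the logs to Loki.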
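For reference, a minimal Promtail configuration for that follow-up step could look roughly like this. It is only a sketch: the log path, the label values, and the Loki URL are hypothetical, and the URL in particular depends on how the application server reaches the monitoring host.

```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml        # where Promtail records read offsets

clients:
  # Hypothetical address; replace with the reachable Loki endpoint.
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: kareer_server_logs       # hypothetical job name
    static_configs:
      - targets: [localhost]
        labels:
          job: kareer_server
          __path__: /var/log/kareer/*.log   # hypothetical log location
```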
ScreenShots 📷
To Reviewers 📢
Summary by CodeRabbit
New Features
Configuration