[Chore] Mock external dependencies in CI tests #68

Merged
zweadfx merged 4 commits into main from chore/ci-test-mock on Mar 17, 2026

Conversation

@zweadfx
Owner

@zweadfx zweadfx commented Mar 17, 2026

What does this PR change?

When pytest ran in CI, the entire test suite failed because of external dependencies such as ChromaDB and the OpenAI API. This PR fixes that by mocking those dependencies.

Work details

  • tests/conftest.py: shared generate_embeddings mock (blocks OpenAI calls during lifespan initialization)
  • tests/test_gear_advisor.py: mocks ChromaDB queries and OpenAI API calls
  • tests/test_skill_lab.py: mocks ChromaDB drill queries
  • tests/test_weekly_coach.py: keeps the existing mocks; the lifespan problem is solved by conftest
  • tests/test_whistle.py: mocks ChromaDB rule/glossary queries
  • Confirmed that the full test suite passes locally (52 passed)
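The shared embedding mock described for tests/conftest.py could look roughly like the sketch below. It is an illustration only: `embedding_module` is a hypothetical stand-in for the project's real module (whose import path is not shown on this page), and the 1536 dimension is taken from the review walkthrough. To stay self-contained it uses a context manager; in a real conftest.py the same body would sit inside a `@pytest.fixture(autouse=True)` generator fixture.

```python
import types
from contextlib import contextmanager
from unittest.mock import patch

EMBEDDING_DIM = 1536  # dimension quoted in the review walkthrough


def _real_generate_embeddings(texts):
    # Stand-in for the real function, which would call the OpenAI API.
    raise RuntimeError("would call the OpenAI API")


# Hypothetical stand-in for the module that owns generate_embeddings.
embedding_module = types.SimpleNamespace(
    generate_embeddings=_real_generate_embeddings
)


def fake_generate_embeddings(texts):
    """Return one zero vector per input text, with no network access."""
    return [[0.0] * EMBEDDING_DIM for _ in texts]


@contextmanager
def mocked_embeddings():
    # In conftest.py this would be the body of an autouse fixture,
    # so every test gets the patch without requesting it.
    with patch.object(
        embedding_module,
        "generate_embeddings",
        side_effect=fake_generate_embeddings,
    ) as mock:
        yield mock


with mocked_embeddings() as mock:
    vectors = embedding_module.generate_embeddings(["shoe", "drill"])
```

Patching the attribute on the module that the application looks it up from is what keeps the lifespan initialization from reaching OpenAI; once the block exits, the original function is restored.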

Checklist

  • Did you perform a self-test?
  • Did you update the related documentation or comments?
  • Did you follow the agreed coding conventions?

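Related issues

Review points

  • tests/conftest.py: the autouse=True fixture is applied automatically to every test and blocks the OpenAI embedding call made during lifespan
  • Whether the side_effect function in each test file returns results filtered by its input and so simulates the real DB logic adequately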

Notes and screenshots (optional)

  • Cause of the CI failure: ValueError: Failed to retrieve from database, openai.RateLimitError: insufficient_quota
  • Previous CI result: 9 passed, 43 failed → now: 52 passed, 0 failed

Summary by CodeRabbit

  • Tests
    • Enhanced test infrastructure with mocked external service dependencies to improve CI/CD reliability.
    • Expanded test coverage for core features including gear recommendations, skill drills, and rules retrieval.
    • Tests now execute in isolated environments without requiring live API connections.

@zweadfx zweadfx self-assigned this Mar 17, 2026
@zweadfx zweadfx added the chore label (configuration work not directly related to actual code or features) Mar 17, 2026
@zweadfx zweadfx linked an issue Mar 17, 2026 that may be closed by this pull request
5 tasks
@coderabbitai

coderabbitai bot commented Mar 17, 2026

Walkthrough

This PR introduces automated mocking infrastructure for ChromaDB and OpenAI embedding services across the test suite. A new conftest.py module patches the embedding generation function globally for all tests, while existing test files are refactored to use mock ChromaDB data and fixture-based dependency injection, eliminating external API calls during CI execution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Embedding and ChromaDB Mocking Infrastructure: tests/conftest.py | Introduces autouse fixture that mocks generate_embeddings with a zero-valued vector (dimension 1536), allowing ChromaDB initialization without OpenAI API access in CI environments. |
| Gear Advisor Tests: tests/test_gear_advisor.py | Adds mock_chroma, shoe_retriever_instance, and mock_gear_agent fixtures with synthetic ChromaDB responses. Introduces 7 new unit tests covering shoe retrieval scenarios (normal, sensory filtering, archetype matching, complex conditions, exception handling, budget, signature boosting) and 3 refactored API endpoint tests with mocked agent responses. |
| Skill Lab Tests: tests/test_skill_lab.py | Introduces mock_drills fixture with _fake_query_drills function and synthetic drill documents/metadata. Refactors test methods to accept the mock fixture, replacing direct chroma_manager.query_drills calls with controlled in-memory responses. |
| Whistle Tests: tests/test_whistle.py | Adds mock_chroma_rules fixture with _fake_query_rules and _fake_query_glossary functions for rules/glossary data. Introduces rule_retriever_instance fixture for dependency injection, refactoring RuleRetriever tests to use mocked ChromaDB access instead of direct instantiation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues

  • Issue #64 ([Test] Secure test coverage for Skill Lab / Weekly Coach): This PR provides the test infrastructure foundation (conftest mocking and fixture patterns) that enables the comprehensive test coverage outlined in the linked issue.

Possibly related PRs

  • PR #64 — Both modify tests/test_skill_lab.py with mock fixtures for chroma_manager.query_drills and extend Skill Lab test coverage using synchronized mocking patterns.
  • PR #18 — This PR's test additions and embedding mocks directly exercise the Gear Advisor endpoints and RAG retrieval components that were introduced in that PR.

Poem

🐇 With mocks in place and fixtures set,
Our tests now need not fret,
No API calls shall dare appear,
ChromaDB whispers, crystal clear,
The CI tests dance without a tear!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[Chore] Mock external dependencies in CI tests' accurately describes the main change: adding mocks for external dependencies in CI tests to prevent API calls and database interactions. |
| Linked Issues check | ✅ Passed | The PR addresses the primary objective from issue #64 by adding comprehensive test coverage with mocking infrastructure (conftest.py with autouse mock for embeddings, test fixtures for ChromaDB mocks) that enables the test suite to pass in CI environments without external API calls. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to test infrastructure improvements: adding mocks and fixtures to tests/conftest.py, tests/test_gear_advisor.py, tests/test_skill_lab.py, and tests/test_whistle.py. No production code modifications are included. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.37% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_skill_lab.py (1)

173-189: ⚠️ Potential issue | 🟠 Major

Unit tests are asserting the fake helper, not the retrieval path.

At Line 185 / 213 / 262 / 298 / 314 / 326, tests call _fake_query_drills() directly, so the patch at Line 147 is never actually exercised. This can hide regressions in the real retrieval wiring.

Suggested fix
+from src.services.rag.chroma_db import chroma_manager
...
-        results = _fake_query_drills(
+        results = chroma_manager.query_drills(
             query_texts=[query_text],
             n_results=10,
             where={"category": "dribble"},
         )
+        mock_drills.assert_called()

Apply the same pattern to the other retrieval tests currently invoking _fake_query_drills directly.

Also applies to: 203-215, 253-264, 288-302, 310-330
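To illustrate the point of this comment, the sketch below calls the patched manager instead of the fake helper, so the patch itself is exercised. `chroma_manager` here is a hypothetical stand-in object (the real one lives in src.services.rag.chroma_db), and `retrieve_dribble_drills` is an invented function representing the code under test.

```python
import types
from unittest.mock import patch


def _real_query_drills(**kwargs):
    # Stand-in for the real query, which needs a live ChromaDB.
    raise ValueError("Failed to retrieve from database")


# Hypothetical stand-in for the project's chroma_manager singleton.
chroma_manager = types.SimpleNamespace(query_drills=_real_query_drills)


def _fake_query_drills(query_texts, n_results=3, where=None):
    return {
        "documents": [["crossover drill"]],
        "metadatas": [[{"category": "dribble"}]],
    }


def retrieve_dribble_drills(query_text):
    """Invented code under test: reaches the DB only through chroma_manager."""
    return chroma_manager.query_drills(
        query_texts=[query_text],
        n_results=10,
        where={"category": "dribble"},
    )


with patch.object(
    chroma_manager, "query_drills", side_effect=_fake_query_drills
) as mock_drills:
    results = retrieve_dribble_drills("ball handling")
    # Fails if the retrieval path stops going through the patched attribute.
    mock_drills.assert_called_once_with(
        query_texts=["ball handling"],
        n_results=10,
        where={"category": "dribble"},
    )
```

If a test called `_fake_query_drills` directly, the `assert_called_once_with` check would fail, which is the kind of wiring regression the comment is concerned about.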

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_skill_lab.py` around lines 173 - 189, Tests currently call the
helper _fake_query_drills directly (e.g., in test_tc01_normal_drill_retrieval
and the other listed tests), so the real retrieval patch is never exercised;
update those tests to call the actual retrieval function introduced in the patch
(the real query/retrieval function that replaces _fake_query_drills) with the
same parameters (query_texts, n_results, where) while keeping the mock_drills
fixture or mocks for external dependencies intact, and then assert on the
returned results as before so the true retrieval wiring is exercised instead of
the fake helper.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 07e0b407-e398-41b0-9a99-00cc768aed68

📥 Commits

Reviewing files that changed from the base of the PR and between e234e64 and c864685.

📒 Files selected for processing (4)
  • tests/conftest.py
  • tests/test_gear_advisor.py
  • tests/test_skill_lab.py
  • tests/test_whistle.py

Comment on lines +124 to +135
def _fake_query_shoes(query_texts, n_results=10, where=None):
    """Return filtered shoe results based on where clause."""
    docs, metas = [], []
    for d, m in zip(_FAKE_SHOE_DOCS, _FAKE_SHOE_METAS):
        if where and "price_krw" in str(where):
            # Extract budget limit from where clause
            budget = where.get("price_krw", {}).get("$lte")
            if budget is not None and m["price_krw"] > budget:
                continue
        docs.append(d)
        metas.append(m)
    return {"documents": [docs], "metadatas": [metas]}

⚠️ Potential issue | 🟡 Minor

Mock query_shoes should honor n_results to match real query behavior.

Right now the fake returns all matches regardless of requested limit, which can hide pagination/limit regressions.

Suggested fix
 def _fake_query_shoes(query_texts, n_results=10, where=None):
     """Return filtered shoe results based on where clause."""
     docs, metas = [], []
     for d, m in zip(_FAKE_SHOE_DOCS, _FAKE_SHOE_METAS):
         if where and "price_krw" in str(where):
             # Extract budget limit from where clause
             budget = where.get("price_krw", {}).get("$lte")
             if budget is not None and m["price_krw"] > budget:
                 continue
         docs.append(d)
         metas.append(m)
+    docs = docs[:n_results]
+    metas = metas[:n_results]
     return {"documents": [docs], "metadatas": [metas]}
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 127-127: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_gear_advisor.py` around lines 124 - 135, The mock
_fake_query_shoes currently ignores the n_results parameter and returns all
matches from _FAKE_SHOE_DOCS/_FAKE_SHOE_METAS; update _fake_query_shoes to
respect n_results by applying the filter logic first and then truncating the
matched docs and metas to at most n_results before returning, keeping the same
return shape {"documents": [docs], "metadatas": [metas]}; ensure edge cases
(n_results is None or <=0) are handled sensibly (e.g., treat None as no limit,
<=0 returns empty lists).
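The edge cases named in the prompt above could be isolated in a small helper, sketched here under the convention the prompt suggests (None means no limit, zero or negative means no results). The helper name `limit_results` is invented for this example, and a real ChromaDB client may treat such values differently.

```python
def limit_results(docs, metas, n_results):
    """Truncate already-filtered docs/metas to at most n_results items.

    Convention assumed from the review prompt, not from ChromaDB itself:
    None means "no limit"; zero or a negative value means "no results".
    """
    if n_results is None:
        return list(docs), list(metas)
    if n_results <= 0:
        return [], []
    return list(docs[:n_results]), list(metas[:n_results])


docs = ["shoe A", "shoe B", "shoe C"]
metas = [{"price_krw": 100}, {"price_krw": 200}, {"price_krw": 300}]

top_two = limit_results(docs, metas, 2)
no_limit = limit_results(docs, metas, None)
nothing = limit_results(docs, metas, 0)
```

A fake such as _fake_query_shoes would apply its where filter first and then pass the matches through a helper like this before building the {"documents": [docs], "metadatas": [metas]} result.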

def _fake_query_shoes(query_texts, n_results=10, where=None):
    """Return filtered shoe results based on where clause."""
    docs, metas = [], []
    for d, m in zip(_FAKE_SHOE_DOCS, _FAKE_SHOE_METAS):

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Confirm target Python/lint config for zip(strict=...)
fd -H -a 'pyproject.toml|setup.cfg|ruff.toml|.ruff.toml' .
rg -n --no-heading 'requires-python|target-version|B905|select|ignore' \
  pyproject.toml setup.cfg ruff.toml .ruff.toml 2>/dev/null || true

Repository: zweadfx/assist

Length of output: 453


🏁 Script executed:

# Check the full ruff.toml configuration
head -30 /home/jailuser/git/ruff.toml

Repository: zweadfx/assist

Length of output: 1079


🏁 Script executed:

# View the zip() calls at the specified lines in the test file
sed -n '125,145p' /home/jailuser/git/tests/test_gear_advisor.py | cat -n

Repository: zweadfx/assist

Length of output: 1157


Add strict=True to both zip() calls to comply with B905 linting rule.

This is enforced by the project's ruff configuration (flake8-bugbear is selected) and prevents silent truncation if paired lists have mismatched lengths.

# Line 127:
for d, m in zip(_FAKE_SHOE_DOCS, _FAKE_SHOE_METAS, strict=True):

# Line 142:
paired = list(zip(_FAKE_PLAYER_DOCS, _FAKE_PLAYER_METAS, strict=True))

Also applies to: 142-142

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 127-127: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_gear_advisor.py` at line 127, The two zip usages silently truncate
if the paired lists differ in length; update the for-loop and the paired
creation to use zip(..., strict=True) so mismatched lengths raise an error:
change the for d, m in zip(_FAKE_SHOE_DOCS, _FAKE_SHOE_METAS) loop to use
strict=True and change the paired = list(zip(_FAKE_PLAYER_DOCS,
_FAKE_PLAYER_METAS)) call to use strict=True as well so the linter B905 is
satisfied and length mismatches are caught.

def _fake_query_drills(query_texts, n_results=3, where=None):
    """Return filtered drill results based on where clause."""
    docs, metas = [], []
    for d, m in zip(_FAKE_DRILL_DOCS, _FAKE_DRILL_METAS):

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify Python target and Ruff rule configuration for B905/zip(strict=...)
fd -H -a 'pyproject.toml|setup.cfg|ruff.toml|.ruff.toml' .
rg -n --no-heading 'requires-python|target-version|B905|select|ignore' \
  pyproject.toml setup.cfg ruff.toml .ruff.toml 2>/dev/null || true

Repository: zweadfx/assist

Length of output: 453


🏁 Script executed:

cat ruff.toml

Repository: zweadfx/assist

Length of output: 1135


🏁 Script executed:

# Check if B905 is explicitly mentioned anywhere in config files
rg 'B905' .

Repository: zweadfx/assist

Length of output: 40


🏁 Script executed:

# Look at the actual code at line 130 and surrounding context
sed -n '125,135p' tests/test_skill_lab.py

Repository: zweadfx/assist

Length of output: 432


🏁 Script executed:

# Check for per-file ignores that might affect tests/test_skill_lab.py
rg 'test_skill_lab|test.*\.py.*B905' ruff.toml

Repository: zweadfx/assist

Length of output: 40


🏁 Script executed:

# Find the definitions of _FAKE_DRILL_DOCS and _FAKE_DRILL_METAS
rg -n '_FAKE_DRILL_DOCS|_FAKE_DRILL_METAS' tests/test_skill_lab.py | head -20

Repository: zweadfx/assist

Length of output: 168


🏁 Script executed:

# Get context around these fixture definitions
sed -n '1,100p' tests/test_skill_lab.py | head -50

Repository: zweadfx/assist

Length of output: 1832


🏁 Script executed:

# Get the full definitions of both fixtures
sed -n '80,95p' tests/test_skill_lab.py

Repository: zweadfx/assist

Length of output: 647


🏁 Script executed:

# Get the complete _FAKE_DRILL_METAS definition
sed -n '88,120p' tests/test_skill_lab.py

Repository: zweadfx/assist

Length of output: 943


🏁 Script executed:

# Count the items in both fixtures
sed -n '80,86p' tests/test_skill_lab.py | grep -c '"'

Repository: zweadfx/assist

Length of output: 59


🏁 Script executed:

# Get the complete _FAKE_DRILL_METAS list
sed -n '88,128p' tests/test_skill_lab.py

Repository: zweadfx/assist

Length of output: 1144


Add strict=True to zip() to satisfy the enabled B905 linting rule.

This zip() call lacks the strict= parameter. Since B905 is enabled in the ruff configuration and your project requires Python 3.10+, use zip(..., strict=True) to satisfy the rule and catch any future fixture-data length mismatches during tests.

Example fix
for d, m in zip(_FAKE_DRILL_DOCS, _FAKE_DRILL_METAS, strict=True):
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 130-130: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_skill_lab.py` at line 130, The zip call iterating over
_FAKE_DRILL_DOCS and _FAKE_DRILL_METAS should use strict=True to satisfy B905
and catch length mismatches; update the loop that currently does for d, m in
zip(_FAKE_DRILL_DOCS, _FAKE_DRILL_METAS): to call zip with strict=True (i.e.,
zip(_FAKE_DRILL_DOCS, _FAKE_DRILL_METAS, strict=True)) so the test fails if the
fixtures differ in length.

def _fake_query_rules(query_texts, n_results=5, where=None):
    """Return filtered rule results based on where clause."""
    docs, metas, dists = [], [], []
    for d, m in zip(_FAKE_RULE_DOCS, _FAKE_RULE_METAS):

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify Python and Ruff settings before enforcing zip(strict=...)
fd -H -a 'pyproject.toml|setup.cfg|ruff.toml|.ruff.toml' .
rg -n --no-heading 'requires-python|target-version|B905|select|ignore' \
  pyproject.toml setup.cfg ruff.toml .ruff.toml 2>/dev/null || true

Repository: zweadfx/assist

Length of output: 453


🏁 Script executed:

# Check if B905 is in the enabled rules
cat ruff.toml | head -50

Repository: zweadfx/assist

Length of output: 1135


🏁 Script executed:

# Examine the specific lines in tests/test_whistle.py
sed -n '100,125p' tests/test_whistle.py

Repository: zweadfx/assist

Length of output: 989


🏁 Script executed:

# Search for all zip() calls in the test file to understand context
rg -n 'zip\(' tests/test_whistle.py -A 1 -B 1

Repository: zweadfx/assist

Length of output: 312


Add strict=True to both zip() calls (B905 rule violation).

The Ruff configuration explicitly enables flake8-bugbear rules ("B" in select), which includes B905. Both zip() calls at lines 103 and 120 lack the strict= parameter and violate this rule. Adding strict=True prevents silent truncation if the zipped sequences have different lengths, which helps catch bugs during testing.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 103-103: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_whistle.py` at line 103, The two zip usages iterating test
fixtures (e.g., the loop using for d, m in zip(_FAKE_RULE_DOCS,
_FAKE_RULE_METAS) and the other zip at the later test) should be made strict to
avoid silent truncation; update both zip(...) calls to pass strict=True so the
test will raise if the sequences differ in length, leaving the rest of the loop
logic unchanged.

@zweadfx zweadfx merged commit 1b3b7cd into main Mar 17, 2026
2 of 4 checks passed
@zweadfx zweadfx deleted the chore/ci-test-mock branch March 17, 2026 05:19
zweadfx added a commit that referenced this pull request Mar 17, 2026
[Chore] Remove ChatGPT code review workflow and pytest CI