Conversation
There was a problem hiding this comment.
Pull request overview
This PR centralizes configuration-backed values into a new listenr.constants module, updates core modules to import constants instead of reading config inline, and adds unit tests to validate constant types/values and migration hygiene.
Changes:
- Introduce
src/listenr/constants.pywith typed, config-backed constants plus areload()helper. - Migrate
cli.py,unified_asr.py,llm_processor.py, andbuild_dataset.pyto consume centralized constants. - Add
tests/test_constants.pycoverage for constant sanity, reload behavior, and smoke-import/migration checks; extendDEFAULT_CONFIGfor dataset/output keys.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_constants.py | Adds constant type/value/reload tests plus migration smoke checks. |
| src/listenr/constants.py | New config-backed constants module and reload() implementation. |
| src/listenr/unified_asr.py | Replaces inline config reads with imports from listenr.constants. |
| src/listenr/llm_processor.py | Uses constants for defaults while keeping runtime override via config for API base. |
| src/listenr/config_manager.py | Extends defaults and renames Output format to line_format. |
| src/listenr/cli.py | Removes inline config reads and imports audio/LLM/storage settings from constants. |
| src/listenr/build_dataset.py | Sources dataset defaults and manifest path from constants/config defaults. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class TestNoCfgCallsInMigratedModules: | ||
| """ | ||
| Ensure migrated modules do not contain inline cfg.get_*_setting() calls | ||
| that duplicate what constants.py already exposes. We allow cfg.get_setting | ||
| ONLY inside _api_base() in llm_processor (URL may be overridden at runtime). | ||
| """ | ||
|
|
||
| def _source(self, module_name: str) -> str: | ||
| mod = importlib.import_module(module_name) | ||
| import inspect | ||
| return inspect.getsource(mod) | ||
|
|
||
| def test_cli_has_no_inline_cfg_get_calls(self): | ||
| src = self._source('listenr.cli') | ||
| # cli.py should contain no cfg.get_*_setting() calls at all | ||
| import re | ||
| calls = re.findall(r'cfg\.get_\w+_setting\(', src) | ||
| assert calls == [], f"cli.py still has inline cfg calls: {calls}" | ||
|
|
||
| def test_build_dataset_has_no_inline_cfg_get_calls(self): | ||
| src = self._source('listenr.build_dataset') | ||
| import re | ||
| calls = re.findall(r'cfg\.get_\w+_setting\(', src) | ||
| assert calls == [], f"build_dataset.py still has inline cfg calls: {calls}" | ||
|
|
||
| def test_unified_asr_has_no_inline_cfg_get_calls(self): | ||
| src = self._source('listenr.unified_asr') | ||
| import re | ||
| calls = re.findall(r'cfg\.get_\w+_setting\(', src) | ||
| assert calls == [], f"unified_asr.py still has inline cfg calls: {calls}" | ||
|
|
||
| def test_llm_processor_cfg_calls_only_in_api_base(self): | ||
| src = self._source('listenr.llm_processor') | ||
| import re | ||
| # Find all lines with cfg.get_*_setting | ||
| lines_with_cfg = [ | ||
| (i + 1, line.strip()) | ||
| for i, line in enumerate(src.splitlines()) | ||
| if re.search(r'cfg\.get_\w+_setting\(', line) | ||
| ] | ||
| for lineno, line in lines_with_cfg: | ||
| assert '_api_base' in src.splitlines()[lineno - 2] or 'def _api_base' in line or '_api_base' in line, ( | ||
| f"llm_processor.py has unexpected cfg call outside _api_base at line ~{lineno}: {line!r}" | ||
| ) |
There was a problem hiding this comment.
The “No stale cfg.get_* calls” checks don't actually match cfg.get_setting(...) because the regex requires get_..._setting (e.g., get_int_setting). As written, cfg.get_setting usages can slip through even though the docstring says they’re disallowed outside _api_base. Consider broadening the pattern (or using inspect.getsource + AST/tokenize) to enforce what the test claims to enforce.
| "--format", | ||
| choices=["csv", "hf", "both"], | ||
| default="csv", | ||
| help="Output format: csv, hf (HuggingFace datasets), or both (default: csv)", | ||
| default=DEFAULT_FORMAT, | ||
| help=f"Output format: csv, hf (HuggingFace datasets), or both (default: from config, currently {DEFAULT_FORMAT})", | ||
| ) |
There was a problem hiding this comment.
--format restricts choices to ['csv','hf','both'], but DEFAULT_FORMAT comes from DATASET_FORMAT (config-backed). If a user config sets an unsupported value (e.g. 'parquet'), argparse will error immediately because the default is invalid. Either validate/normalize DATASET_FORMAT in constants/config_manager, or keep the CLI choices in sync with all documented/accepted config values.
| 'Output': { | ||
| 'file': '~/transcripts_raw.txt', | ||
| 'llm_file': '~/transcripts_clean.txt', | ||
| 'format': '[{timestamp}] {text}', | ||
| 'line_format': '[{timestamp}] {text}', | ||
| 'timestamp_format': '%%Y-%%m-%%d %%H:%%M:%%S', # Double %% for configparser escaping | ||
| 'show_raw': 'false', |
There was a problem hiding this comment.
Renaming the Output key from format to line_format in DEFAULT_CONFIG is a breaking change for existing config.ini files that still set format—their customized value will be ignored and the fallback will be used. Consider adding a small backward-compat alias/migration (e.g., when reading line_format, fall back to format if present) or documenting the migration clearly.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…o centralize-constants
Make use of single constants entry for app instead of calling cfg everywhere and add unit tests