```bash
# 1) Create venv & install
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

# 2) Run on sample data (produces output/run_YYYYMMDD_HHMM/report.html)
qa-bugs run --config configs/example.config.yml --input data/sample_bugs.csv --since 2025-09-01 --until 2025-09-30 --metrics defect_age,age_by_priority --llm off
```

Open the generated report.html in a browser.
Data-Driven Approach:
- Environments are discovered from your uploaded data, not predefined in config
- Only environments present in the data are analyzed and displayed in reports
- Environments are ordered by defect count (most defects first) for better visibility (see the sketch after this list)
- If an environment doesn't exist in your data, it won't appear in any metric
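As a rough sketch of how this discovery-and-ordering step might look (not the tool's actual code), the snippet below loads a CSV with pandas and orders environments by defect count; the `Environment` column name and file path are assumptions for the example.

```python
import pandas as pd

# Illustrative only: discover environments from the uploaded data and
# order them by defect count (most defects first). The column name
# "Environment" is an assumption; use whatever your mapped field is called.
bugs = pd.read_csv("data/sample_bugs.csv")
env_counts = bugs["Environment"].dropna().value_counts()

discovered_envs = list(env_counts.index)   # only environments present in the data
print(discovered_envs)                     # e.g. ['QA', 'PROD', 'STAGE']
```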
Environment Mapping:
- Use `env_value_mapping` (manual) or `auto_env_mapping` (LLM-based) to normalize values
- Example: "production" → "PROD", "testing" → "QA"
- Mapping happens before analysis, so configure `intended_env` and `leak_envs` using the mapped values
Example:

```yaml
# If your data has: "prod-server", "qa-env", "staging"
# And you map them to: PROD, QA, STAGE
# Configure leakage_rate to use mapped names:
leakage_rate:
  intended_env: ["QA", "STAGE"]  # Use mapped names
  leak_envs: ["PROD"]            # Use mapped names
```

Warnings:
- If you configure `intended_env` or `leak_envs` with environments not in your data, warnings will be logged
- Missing environments are skipped automatically; analysis continues with available data (see the sketch below)
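The sketch below illustrates the mapping and warning behaviour described above, assuming a plain dictionary for `env_value_mapping`; the names, config keys, and logging here are illustrative, not the tool's internal API.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("env_mapping_example")

# Illustrative mapping of raw environment values to normalized names.
env_value_mapping = {"prod-server": "PROD", "qa-env": "QA", "staging": "STAGE"}
raw_envs = ["prod-server", "qa-env", "qa-env", "staging"]
mapped_envs = {env_value_mapping.get(e, e) for e in raw_envs}

# leakage_rate settings must use the *mapped* names; unknown names are
# skipped with a warning, mirroring the behaviour described above.
leakage_cfg = {"intended_env": ["QA", "STAGE"], "leak_envs": ["PROD", "UAT"]}
for key, envs in leakage_cfg.items():
    missing = [e for e in envs if e not in mapped_envs]
    if missing:
        logger.warning("%s references environments not found in data: %s (skipped)", key, missing)
```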
Automatic Classification:
- The system can automatically classify your data semantics using AI
- No more hardcoded status lists or priority mappings
- Works with any bug tracking system (Jira, Azure DevOps, GitHub Issues, etc.)
What Gets Classified:
- Statuses → Open / Closed / Rejected categories
- Priorities → Severity order (Critical → High → Medium → Low)
- Environments → Production vs Non-Production
How to Enable:
```yaml
# In config file:
auto_classification:
  enabled: true
  classify_statuses: true
  classify_priorities: true
  classify_environments: true
  confidence_threshold: 0.6  # Auto-apply if confidence ≥ 60%
```

Or via CLI:

```bash
qa-bugs run --config config.yml --input data.csv --auto-classify --llm on
```

How It Works:
- Upload your data → AI analyzes unique status/priority values
- LLM classification (if enabled) or fuzzy keyword matching (fallback; see the sketch after this list)
- Classifications shown in report with confidence scores
- High-confidence classifications auto-applied to metrics
- You can review and override AI decisions
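To make the fallback path concrete, here is a hypothetical fuzzy keyword matcher for status values that also yields a rough confidence score; the keyword lists, scoring, and threshold handling are illustrative and not the tool's actual classifier.

```python
from difflib import SequenceMatcher

# Hypothetical keyword sets for the three status categories.
STATUS_KEYWORDS = {
    "Closed": ["closed", "done", "resolved", "fixed"],
    "Rejected": ["rejected", "won't fix", "invalid", "duplicate"],
    "Open": ["open", "new", "to do", "in progress", "reopened"],
}

def classify_status(raw_value: str) -> tuple[str, float]:
    """Return (category, confidence) for a raw status via fuzzy matching."""
    value = raw_value.strip().lower()
    best_category, best_score = "Open", 0.0
    for category, keywords in STATUS_KEYWORDS.items():
        for keyword in keywords:
            score = SequenceMatcher(None, value, keyword).ratio()
            if score > best_score:
                best_category, best_score = category, score
    return best_category, best_score

# Low-confidence results (e.g. below a 0.6 threshold) would be surfaced
# for manual review instead of being auto-applied.
print(classify_status("Won't Fix"))    # high confidence
print(classify_status("Backlog"))      # low confidence, needs review
```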
Benefits:
- ✅ Works with any project (no config changes needed)
- ✅ Transparent (see what AI decided + confidence scores)
- ✅ Fallback to fuzzy matching if LLM unavailable
- ✅ Reduces config complexity
- ✅ Adapts to your project's terminology
Report Display: Reports now include an "AI Data Profile" section showing:
- Status Tab: Open/Closed/Rejected classifications with confidence scores
- Priority Tab: Severity ordering from highest to lowest
- Environment Tab: Production vs Non-Production, pipeline order (DEV → QA → STAGE → PROD)
- Summary Tab: Field completeness, date range, applicable metrics
- Method used (LLM or fuzzy matching) with confidence percentage
- Any warnings or unclassified values
Available In:
- ✅ CLI HTML Reports - Detailed profile section with styling
- ✅ Streamlit UI - Interactive expandable tabs (Status/Priority/Environment/Summary)
- ✅ UI Toggle - Enable/disable auto-classification from sidebar
You can pull fresh issues directly from Jira into a CSV compatible with the analytics pipeline.
Copy `.env.example` to `.env` and fill in the values (provide the full JQL in `JIRA_JQL_EXTRA`, including the project clause):
```
JIRA_URL=https://your-domain.atlassian.net
JIRA_USER=your-email@example.com
JIRA_TOKEN=your_api_token
JIRA_JQL_EXTRA=project=PROJECTKEY AND status != Done AND priority in (High, Critical)
```
Generate an API token from your Atlassian account security settings.

`pip install -e .` installs the `requests` dependency used by the exporter.
```bash
python -m qa_bugs.automation.jira_export export --output data/jira_issues.csv --limit 100
```

Filtering now uses an env var `JIRA_JQL_EXTRA` (required, full JQL). The example in `.env.example` already includes the `project=` clause.
Adjust batch size or limit:
- `--batch-size 500` (Jira caps at 1000)
- `--limit 1000`
```bash
qa-bugs run --config configs/example.config.yml --input data/jira_issues.csv --llm off
```

The CSV headers will match the configured `fields_mapping` (e.g., Created, Resolved, FixVersion).
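As an optional sanity check before running the analysis, a small script like this can confirm the export contains the columns your `fields_mapping` expects; the column names below are examples only.

```python
import csv

# Example expected columns - substitute the names from your fields_mapping.
expected = {"Created", "Resolved", "FixVersion", "Priority", "Status"}

with open("data/jira_issues.csv", newline="", encoding="utf-8") as f:
    headers = set(next(csv.reader(f)))

missing = sorted(expected - headers)
if missing:
    print(f"Missing columns: {missing} - adjust fields_mapping or the Jira export")
else:
    print("All expected columns are present")
```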
Unit tests use pytest.
Run the full suite (excluding optional live tests by default):
```bash
python -m pytest
```

Run a single test file:
```bash
python -m pytest tests/test_defect_age.py -q
```

tests/test_llm_live.py is marked with `@pytest.mark.live` and performs a real Azure OpenAI request.
It is skipped unless the following environment variables are set:
- `AZURE_OPENAI_KEY`
- `AZURE_OPENAI_ENDPOINT`
- (optional) `AZURE_OPENAI_DEPLOYMENT` (defaults to `gpt-4o`)
- (optional) `AZURE_OPENAI_API_VERSION` (defaults to `2024-05-01-preview`)
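The gating works along these lines (a sketch, not the actual contents of tests/test_llm_live.py): a custom `live` marker, assumed to be registered in the project's pytest configuration, combined with a `skipif` on the required environment variables.

```python
import os

import pytest

# Skip unless the required Azure OpenAI credentials are configured.
requires_azure = pytest.mark.skipif(
    not (os.getenv("AZURE_OPENAI_KEY") and os.getenv("AZURE_OPENAI_ENDPOINT")),
    reason="AZURE_OPENAI_KEY / AZURE_OPENAI_ENDPOINT not set",
)

@pytest.mark.live
@requires_azure
def test_llm_smoke():
    # Placeholder body: a real live test would issue an Azure OpenAI request here.
    deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")
    api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-05-01-preview")
    assert deployment and api_version
```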
Run only live tests:
```bash
python -m pytest -m live
```

Exclude live tests:

```bash
python -m pytest -m "not live"
```

Place new test files in tests/ named test_*.py. Keep each test focused, with minimal assertions covering (see the skeleton after this list):
- Happy path
- One edge case (e.g., empty dataframe)
- One configuration variance
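Below is a self-contained skeleton of that shape; the metric function is a local placeholder so the example runs on its own, and should be replaced with an import from the package under test.

```python
# tests/test_defect_age_example.py - illustrative skeleton only.
import pandas as pd

def defect_age_days(df: pd.DataFrame):
    """Placeholder metric so the example is self-contained; swap in the real one."""
    if df.empty:
        return None
    ages = pd.to_datetime(df["Resolved"]) - pd.to_datetime(df["Created"])
    return float(ages.dt.days.mean())

def test_happy_path():
    df = pd.DataFrame({"Created": ["2025-09-01"], "Resolved": ["2025-09-05"]})
    assert defect_age_days(df) == 4.0

def test_empty_dataframe_edge_case():
    assert defect_age_days(pd.DataFrame(columns=["Created", "Resolved"])) is None

def test_configuration_variance():
    # Stand-in for a config option, e.g. excluding unresolved bugs.
    df = pd.DataFrame({"Created": ["2025-09-01", "2025-09-02"],
                       "Resolved": ["2025-09-05", None]})
    assert defect_age_days(df.dropna(subset=["Resolved"])) == 4.0
```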
Automatic file logging is now enabled by default:
CLI runs:
- Logs saved to: `output/run_YYYYMMDD_HHMM/qa_bugs.log`
- Includes all field mapping, analysis, and LLM activity
- DEBUG-level details in the file, INFO level in the console
UI runs:
- Field mapping: `output/ui_session_YYYYMMDD_HHMMSS/qa_bugs_ui.log`
- Analysis: `output/ui_run_YYYYMMDD_HHMMSS/qa_bugs_ui.log`
- Includes all activity at DEBUG level
LLM prompt/response files (when `log_prompts: true`):
- Saved in same output directory as logs
- Format: `prompt_{metric_id}_{timestamp}.txt` and `response_{metric_id}_{timestamp}.txt`
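If you need to pair prompts with their responses after a run, a short helper like this works against the naming pattern above; the run directory name is an example only.

```python
from pathlib import Path

run_dir = Path("output/run_20250930_1200")  # example run directory

# Pair each prompt file with its matching response file, if present.
for prompt in sorted(run_dir.glob("prompt_*.txt")):
    response = prompt.with_name(prompt.name.replace("prompt_", "response_", 1))
    status = "ok" if response.exists() else "missing response"
    print(f"{prompt.name} -> {response.name} ({status})")
```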
For scripts or notebooks, configure logging manually:
```python
import logging

# For detailed debugging (includes fuzzy match scores, LLM responses)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# For high-level progress tracking
logging.basicConfig(level=logging.INFO)
```

Field Mapping Detection:
- `Auto-detecting field mapping for N columns` - Detection starting
- `LLM service is enabled` or `using fuzzy matching only` - Detection method selected
- `LLM prompt:` / `LLM response:` - DEBUG level shows the full LLM interaction
- `Fuzzy match: 'key' -> 'Issue ID' (score: 0.75)` - DEBUG-level match scores
- `LLM detection successful` or `Falling back to fuzzy matching` - Result status
- `validation: valid=True, errors=0, warnings=2` - Validation summary
When to Use:
- Debugging why certain CSV columns aren't detected
- Understanding LLM vs fuzzy matching decisions
- Reviewing actual LLM prompts and responses for troubleshooting
- Troubleshooting missing required fields
- Analyzing low similarity scores in fuzzy matching
See demo_field_mapper_logging.py for a working example.
- Auto Field Mapping: See docs/AUTO_MAPPING.md for a detailed guide on automatic CSV field detection
- Streamlit Deployment: See STREAMLIT_DEPLOYMENT.md for UI deployment instructions