diff --git a/.claude/commands/check-db.md b/.claude/commands/check-db.md new file mode 100644 index 0000000..2da92e2 --- /dev/null +++ b/.claude/commands/check-db.md @@ -0,0 +1,29 @@ +--- +description: Verify RootsMagic database file exists and is accessible +--- + +Check that the RootsMagic database file is present and can be queried. + +```bash +DB_PATH="${RM_DATABASE_PATH:-data/Iiams.rmtree}" + +if [ ! -f "$DB_PATH" ]; then + echo "❌ Database file not found: $DB_PATH" + echo "" + echo "Set RM_DATABASE_PATH in config/.env" + exit 1 +fi + +echo "✅ Database found: $DB_PATH" +echo "" + +# Get basic stats +PERSON_COUNT=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM PersonTable;") +EVENT_COUNT=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM EventTable;") +SOURCE_COUNT=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM SourceTable;") + +echo "📊 Database Statistics:" +echo " • Persons: $PERSON_COUNT" +echo " • Events: $EVENT_COUNT" +echo " • Sources: $SOURCE_COUNT" +``` diff --git a/.claude/commands/coverage.md b/.claude/commands/coverage.md new file mode 100644 index 0000000..1dd5556 --- /dev/null +++ b/.claude/commands/coverage.md @@ -0,0 +1,11 @@ +--- +description: Run tests with coverage report +--- + +Run the full test suite with coverage analysis. Shows which lines are covered and identifies gaps. + +```bash +uv run pytest --cov=rmagent --cov-report=term-missing --cov-report=html +echo "" +echo "📊 Coverage report generated at: file://$(pwd)/htmlcov/index.html" +``` diff --git a/.claude/commands/doc-review.md b/.claude/commands/doc-review.md new file mode 100644 index 0000000..cdac1df --- /dev/null +++ b/.claude/commands/doc-review.md @@ -0,0 +1,149 @@ +--- +description: Review documentation for accuracy and completeness +show-prompt: true +meta: true +--- + +Review project documentation to ensure it's concise, current, and accurate. 
+
+**Modes:**
+- `/doc-review` or `/doc-review brief` - Review root docs and INDEX.md
+- `/doc-review deep` - Review ALL documentation files
+
+This command reads documentation files and asks the LLM to verify:
+1. Content is concise and well-organized
+2. Information is current (no outdated references)
+3. Cross-references and links are accurate
+4. No contradictions between documents
+5. INDEX.md accurately reflects documentation structure
+
+```bash
+MODE="${1:-brief}"
+
+if [ "$MODE" = "deep" ]; then
+  cat <<'PROMPT'
+Please perform a DEEP documentation review of the RMAgent project.
+
+Review ALL documentation files for:
+1. **Accuracy** - Are all statements, commands, and examples correct?
+2. **Currency** - Is information up-to-date? Any outdated references?
+3. **Consistency** - Do documents contradict each other?
+4. **Completeness** - Are there gaps in documentation coverage?
+5. **Conciseness** - Can any content be condensed without losing value?
+6. **Organization** - Is content in the right location?
+
+**Root Documentation Files:**
+PROMPT
+
+  echo ""
+  echo "=== CLAUDE.md ==="
+  head -100 CLAUDE.md
+  echo ""
+  echo "=== README.md ==="
+  head -100 README.md
+  echo ""
+  echo "=== AGENTS.md ==="
+  head -50 AGENTS.md
+  echo ""
+  echo "=== CONTRIBUTING.md ==="
+  head -50 CONTRIBUTING.md
+  echo ""
+  echo "=== CHANGELOG.md ==="
+  head -30 CHANGELOG.md
+
+  cat <<'PROMPT'
+
+**Documentation Index:**
+PROMPT
+
+  echo ""
+  echo "=== docs/INDEX.md ==="
+  cat docs/INDEX.md
+
+  cat <<'PROMPT'
+
+**All Documentation Files:**
+PROMPT
+
+  echo ""
+  # IFS= and -r keep leading whitespace and backslashes in file names intact
+  find docs -name "*.md" -type f | sort | while IFS= read -r file; do
+    echo ""
+    echo "=== $file ==="
+    head -80 "$file"
+    echo "... [file continues]"
+  done
+
+  cat <<'PROMPT'
+
+Please provide:
+1. **Overall Assessment** - General state of documentation
+2. **Critical Issues** - Must-fix problems (inaccuracies, broken links, outdated info)
+3. **Recommendations** - Suggestions for improvement
+4. 
**Specific Fixes** - Line-by-line corrections needed + +Focus on actionable feedback that improves documentation quality. +PROMPT + +else + # Brief mode - just check root docs + INDEX.md + cat <<'PROMPT' +Please perform a BRIEF documentation review of the RMAgent project's core files. + +Review the following files for: +1. **Accuracy** - Are statements, commands, and statistics correct? +2. **Currency** - Any outdated references or old information? +3. **Consistency** - Do these files contradict each other? +4. **Conciseness** - Is content appropriately detailed (not too verbose)? +5. **INDEX.md Accuracy** - Does INDEX.md correctly reference all docs in the repository? + +**Root Documentation Files:** +PROMPT + + echo "" + echo "=== CLAUDE.md ===" + cat CLAUDE.md + echo "" + echo "=== README.md ===" + cat README.md + echo "" + echo "=== AGENTS.md ===" + cat AGENTS.md + echo "" + echo "=== CONTRIBUTING.md ===" + cat CONTRIBUTING.md + echo "" + echo "=== CHANGELOG.md ===" + cat CHANGELOG.md + + cat <<'PROMPT' + +**Documentation Index:** +PROMPT + + echo "" + echo "=== docs/INDEX.md ===" + cat docs/INDEX.md + + cat <<'PROMPT' + +**Verify INDEX.md Completeness:** +Check that docs/INDEX.md accurately references all documentation files in the repository. +PROMPT + + echo "" + echo "=== All docs files in repository ===" + find docs -name "*.md" -type f | sort + + cat <<'PROMPT' + +Please provide: +1. **Quick Assessment** - Overall state of core documentation +2. **Critical Issues** - Any inaccuracies, broken references, or outdated info +3. **INDEX.md Status** - Is it complete and accurate? +4. **Quick Wins** - Easy improvements to make immediately + +Be concise and actionable. Focus on what needs fixing right now. 
+PROMPT
+
+fi
+```
diff --git a/.claude/commands/docs.md b/.claude/commands/docs.md
new file mode 100644
index 0000000..420d3c9
--- /dev/null
+++ b/.claude/commands/docs.md
@@ -0,0 +1,33 @@
+---
+description: Open documentation in browser or show quick reference
+---
+
+Quick access to RMAgent documentation.
+
+Usage:
+- `/docs` - Show documentation index
+- `/docs schema` - Open schema reference
+- `/docs data-formats` - Open data formats reference
+- `/docs dev` - Open developer guide
+
+```bash
+case "$ARGUMENTS" in
+  schema)
+    echo "📖 Schema Reference: docs/reference/schema/schema-reference.md"
+    head -50 docs/reference/schema/schema-reference.md
+    ;;
+  data-formats)
+    echo "📖 Data Formats: docs/reference/data-formats/"
+    ls -1 docs/reference/data-formats/
+    ;;
+  dev)
+    echo "📖 Developer Guide: docs/guides/developer-guide.md"
+    head -50 docs/guides/developer-guide.md
+    ;;
+  *)
+    echo "📚 RMAgent Documentation Index"
+    echo ""
+    head -80 docs/INDEX.md
+    ;;
+esac
+```
diff --git a/.claude/commands/lint.md b/.claude/commands/lint.md
new file mode 100644
index 0000000..d13d387
--- /dev/null
+++ b/.claude/commands/lint.md
@@ -0,0 +1,22 @@
+---
+description: Run linting checks (ruff and black)
+---
+
+Run code quality checks using ruff and black formatters.
+
+Usage:
+- `/lint` - Check code without modifying
+- `/lint fix` - Auto-fix issues where possible
+
+```bash
+if [ "$ARGUMENTS" = "fix" ]; then
+  echo "🔧 Auto-fixing issues..."
+  uv run ruff check --fix .
+  uv run black .
+  echo "✅ Fixes applied"
+else
+  echo "🔍 Checking code quality..."
+  uv run ruff check .
+  uv run black --check .
+fi
+```
diff --git a/.claude/commands/rm-ask.md b/.claude/commands/rm-ask.md
new file mode 100644
index 0000000..e628a91
--- /dev/null
+++ b/.claude/commands/rm-ask.md
@@ -0,0 +1,23 @@
+---
+description: Ask a question about the genealogy database using AI
+show-prompt: true
+meta: true
+---
+
+Ask natural language questions about your genealogy data. 
Uses AI to query and analyze the database.
+
+Usage:
+- `/rm-ask "Who are the ancestors of John Smith?"`
+- `/rm-ask "Find everyone born in Baltimore"`
+- `/rm-ask "Show family relationships for person 1"`
+
+```bash
+if [ -z "$ARGUMENTS" ]; then
+  echo "❌ Error: Question required"
+  echo "Usage: /rm-ask \"Your question here\""
+  exit 1
+fi
+
+echo "🤔 Asking AI about your genealogy data..."
+uv run rmagent ask "$ARGUMENTS"
+```
diff --git a/.claude/commands/rm-bio.md b/.claude/commands/rm-bio.md
new file mode 100644
index 0000000..3d1a3be
--- /dev/null
+++ b/.claude/commands/rm-bio.md
@@ -0,0 +1,27 @@
+---
+description: Generate a biography for a person using rmagent
+show-prompt: true
+meta: true
+---
+
+Generate a biography using the rmagent CLI. Requires a person ID.
+
+Usage:
+- `/rm-bio 1` - Generate standard biography for person 1
+- `/rm-bio 1 short` - Generate short biography
+- `/rm-bio 1 comprehensive` - Generate comprehensive biography
+
+```bash
+if [ -z "$1" ]; then
+  echo "❌ Error: Person ID required"
+  echo "Usage: /rm-bio [length]"
+  echo "Example: /rm-bio 1 standard"
+  exit 1
+fi
+
+PERSON_ID="$1"
+LENGTH="${2:-standard}"
+
+echo "📝 Generating $LENGTH biography for person $PERSON_ID..."
+uv run rmagent bio "$PERSON_ID" --length "$LENGTH" --output reports/biographies/
+```
diff --git a/.claude/commands/rm-person.md b/.claude/commands/rm-person.md
new file mode 100644
index 0000000..a980dc5
--- /dev/null
+++ b/.claude/commands/rm-person.md
@@ -0,0 +1,23 @@
+---
+description: Query person details from RootsMagic database
+---
+
+Query detailed information about a person using rmagent CLI.
+
+Usage:
+- `/rm-person 1` - Basic person info
+- `/rm-person 1 --events` - Include all events
+- `/rm-person 1 --family` - Include family relationships
+- `/rm-person 1 --ancestors` - Include ancestors
+- `/rm-person 1 --descendants` - Include descendants
+
+```bash
+if [ -z "$1" ]; then
+  echo "❌ Error: Person ID required"
+  echo "Usage: /rm-person [--options]"
+  exit 1
+fi
+
+echo "👤 Querying person $1..."
+uv run rmagent person "$@"
+```
diff --git a/.claude/commands/rm-quality.md b/.claude/commands/rm-quality.md
new file mode 100644
index 0000000..d47b57f
--- /dev/null
+++ b/.claude/commands/rm-quality.md
@@ -0,0 +1,22 @@
+---
+description: Run data quality validation on RootsMagic database
+---
+
+Run data quality checks on the RootsMagic database.
+
+Usage:
+- `/rm-quality` - Run all quality checks
+- `/rm-quality dates` - Check only date issues
+- `/rm-quality names` - Check only name issues
+- `/rm-quality places` - Check only place issues
+- `/rm-quality relationships` - Check only relationship issues
+
+```bash
+if [ -z "$ARGUMENTS" ]; then
+  echo "🔍 Running all data quality checks..."
+  uv run rmagent quality --format table
+else
+  echo "🔍 Running quality checks for category: $ARGUMENTS"
+  uv run rmagent quality --category "$ARGUMENTS" --format table
+fi
+```
diff --git a/.claude/commands/rm-search.md b/.claude/commands/rm-search.md
new file mode 100644
index 0000000..6d1681b
--- /dev/null
+++ b/.claude/commands/rm-search.md
@@ -0,0 +1,21 @@
+---
+description: Search RootsMagic database by name or place
+---
+
+Search the RootsMagic database for people by name or place.
+
+Usage:
+- `/rm-search --name "John Smith"` - Search by name
+- `/rm-search --place "Baltimore"` - Search by place
+- `/rm-search --name "Smith" --limit 10` - Limit results
+
+```bash
+if [ -z "$ARGUMENTS" ]; then
+  echo "❌ Error: Search parameters required"
+  echo "Usage: /rm-search --name \"Name\" or /rm-search --place \"Place\""
+  exit 1
+fi
+
+echo "🔎 Searching database..."
+uv run rmagent search $ARGUMENTS
+```
diff --git a/.claude/commands/rm-timeline.md b/.claude/commands/rm-timeline.md
new file mode 100644
index 0000000..9600c66
--- /dev/null
+++ b/.claude/commands/rm-timeline.md
@@ -0,0 +1,22 @@
+---
+description: Generate timeline visualization for a person
+---
+
+Generate a TimelineJS3 timeline for a person's life events.
+
+Usage:
+- `/rm-timeline 1` - Generate JSON timeline for person 1
+- `/rm-timeline 1 --format html` - Generate HTML timeline
+- `/rm-timeline 1 --include-family` - Include family events
+- `/rm-timeline 1 --group-by-phase` - Group events by life phases
+
+```bash
+if [ -z "$1" ]; then
+  echo "❌ Error: Person ID required"
+  echo "Usage: /rm-timeline [options]"
+  exit 1
+fi
+
+echo "📅 Generating timeline for person $1..."
+uv run rmagent timeline "$@"
+```
diff --git a/.claude/commands/test.md b/.claude/commands/test.md
new file mode 100644
index 0000000..a6088f5
--- /dev/null
+++ b/.claude/commands/test.md
@@ -0,0 +1,19 @@
+---
+description: Run pytest tests with optional filters
+---
+
+Run the test suite using pytest. You can optionally specify a test file or pattern.
+ +Usage: +- `/test` - Run all tests +- `/test unit` - Run only unit tests +- `/test integration` - Run only integration tests +- `/test test_queries.py` - Run specific test file + +```bash +if [ -z "$ARGUMENTS" ]; then + uv run pytest -v +else + uv run pytest -v tests/$ARGUMENTS* +fi +``` diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..e038605 --- /dev/null +++ b/.gitattributes @@ -0,0 +1 @@ +data/Iiams.rmtree filter=lfs diff=lfs merge=lfs -text diff --git a/.github/workflows/pr-tests.yml b/.github/workflows/pr-tests.yml new file mode 100644 index 0000000..b90992e --- /dev/null +++ b/.github/workflows/pr-tests.yml @@ -0,0 +1,50 @@ +name: PR Tests + +on: + pull_request: + branches: [ develop, main ] + +jobs: + test: + runs-on: macos-latest + + steps: + - name: Checkout code + uses: actions/checkout@v4 + with: + lfs: true + + - name: Install uv + uses: astral-sh/setup-uv@v3 + with: + version: "latest" + + - name: Set up Python + run: uv python install 3.12 + + - name: Install dependencies + run: uv sync --extra dev + + - name: Run linting + run: | + uv run ruff check . + uv run black --check . 
+ + - name: Run tests with coverage + run: uv run pytest --cov=rmagent --cov-report=term-missing --cov-report=xml --cov-fail-under=80 + env: + # Set test environment variables + RM_DATABASE_PATH: data/Iiams.rmtree + DEFAULT_LLM_PROVIDER: anthropic + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} + OLLAMA_BASE_URL: ${{ secrets.OLLAMA_BASE_URL }} + LOG_LEVEL: WARNING + + - name: Upload coverage reports + uses: codecov/codecov-action@v4 + if: always() + with: + token: ${{ secrets.CODECOV_TOKEN }} + files: ./coverage.xml + fail_ci_if_error: false diff --git a/CLAUDE.md b/CLAUDE.md index 86937f1..0ea7586 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -24,40 +24,45 @@ rmagent/ │ └── rmlib/ # Core library (database, parsers, queries) ├── config/ # Runtime config (config/.env) ├── data/ # Database files (*.rmtree, NOT tracked in git) -├── data_reference/ # 18 schema/format docs (RM11_*.md) -├── docs/ # Project docs (AI_AGENT_TODO.md, USER_GUIDE.md, MVP_CHECKPOINT.md) +├── docs/ # **📚 START HERE: docs/INDEX.md** - Complete documentation +│ ├── INDEX.md # Master table of contents +│ ├── getting-started/ # Installation, quickstart, configuration +│ ├── guides/ # User & developer guides +│ ├── reference/ # Schema, formats, query patterns +│ ├── projects/ # Active feature development +│ └── archive/ # Completed milestones & summaries ├── sqlite-extension/ # ICU extension for RMNOCASE collation -└── tests/unit/ # Test suite (245+ tests, pytest) +└── tests/unit/ # Test suite (490+ tests, pytest) ``` -## Essential Documentation (data_reference/) +## 📚 Essential Documentation -**Schema & Structure:** -- **RM11_Schema_Reference.md** - START HERE: tables, fields, relationships, query patterns -- **RM11_schema_annotated.sql** - SQL with comments for query writing -- **RM11_DataDef.yaml** - Field enumerations and constraints +**For complete documentation, see [`docs/INDEX.md`](docs/INDEX.md)** -**Core Formats:** -- **RM11_Date_Format.md** 
- CRITICAL: 24-char date encoding (ranges, qualifiers, BC/AD) -- **RM11_Place_Format.md** - Comma-delimited hierarchy (City, County, State, Country) -- **RM11_FactTypes.md** - 65 built-in event types +### Quick Reference (Most Important Files) -**BLOB Structures (UTF-8 XML with BOM):** -- **RM11_BLOB_SourceFields.md** - SourceTable.Fields extraction -- **RM11_BLOB_SourceTemplateFieldDefs.md** - Template definitions (433 templates) -- **RM11_BLOB_CitationFields.md** - CitationTable.Fields extraction +**Schema & Database:** +- **[schema-reference.md](docs/reference/schema/schema-reference.md)** - START HERE: tables, fields, relationships +- **[annotated-schema.sql](docs/reference/schema/annotated-schema.sql)** - SQL with comments +- **[data-definitions.yaml](docs/reference/schema/data-definitions.yaml)** - Field enumerations -**Data Quality & Output:** -- **RM11_Data_Quality_Rules.md** - 24 validation rules across 6 categories -- **RM11_Query_Patterns.md** - 15 optimized SQL patterns -- **RM11_Biography_Best_Practices.md** - 9-section structure, citation styles -- **RM11_Timeline_Construction.md** - TimelineJS3 JSON generation +**Critical Data Formats:** +- **[date-format.md](docs/reference/data-formats/date-format.md)** - ⚠️ CRITICAL: 24-char date encoding +- **[place-format.md](docs/reference/data-formats/place-format.md)** - Comma-delimited hierarchy +- **[fact-types.md](docs/reference/data-formats/fact-types.md)** - 65 built-in event types -**Additional References:** -- RM11_Relationships.md (Relate1/Relate2 calculations) -- RM11_Name_Display_Logic.md (context-aware name selection) -- RM11_EventTable_Details.md (Details field patterns) -- RM11_Sentence_Templates.md (reference only - AI generates text natively) +**BLOB Parsing (UTF-8 XML with BOM):** +- **[blob-source-fields.md](docs/reference/data-formats/blob-source-fields.md)** - SourceTable.Fields +- **[blob-citation-fields.md](docs/reference/data-formats/blob-citation-fields.md)** - CitationTable.Fields +- 
**[blob-template-field-defs.md](docs/reference/data-formats/blob-template-field-defs.md)** - Template definitions + +**Query & Quality:** +- **[query-patterns.md](docs/reference/query-patterns/query-patterns.md)** - 15 optimized SQL patterns +- **[data-quality-rules.md](docs/reference/query-patterns/data-quality-rules.md)** - 24 validation rules + +**Biography & Output:** +- **[biography-best-practices.md](docs/reference/biography/biography-best-practices.md)** - 9-section structure +- **[timeline-construction.md](docs/reference/biography/timeline-construction.md)** - TimelineJS3 format ## Critical Schema Patterns @@ -106,6 +111,26 @@ RM_DATABASE_PATH=data/Iiams.rmtree LOG_LEVEL=DEBUG # Enable LLM logging ``` +### Claude Code Integration + +RMAgent includes 12 custom slash commands and automated hooks for Claude Code: + +**Quick Commands:** +- `/rm-bio ` - Generate biography with AI +- `/rm-person ` - Query person from database +- `/rm-quality` - Run data quality checks +- `/doc-review [brief|deep]` - Review documentation for accuracy with AI +- `/test` - Run pytest suite +- `/coverage` - Run tests with coverage +- `/check-db` - Verify database connection + +**Automated Hooks:** +- Coverage reminders after pytest runs +- Commit preview before git push +- Documentation review reminder (pre-commit) + +**See [`docs/guides/claude-code-setup.md`](docs/guides/claude-code-setup.md) for complete setup and usage guide.** + ## Project Status (2025-10-12) 🎉 **Milestone 2: MVP ACHIEVED** - All foundation phases complete (33/33 tasks) @@ -129,11 +154,11 @@ LOG_LEVEL=DEBUG # Enable LLM logging - Source formatting improvements (italic rendering, type prefix removal) - Biography collision handling with sequential numbering -**Test Coverage:** 418 tests, 82% overall coverage (97% database, 96% parsers, 91% quality) +**Test Coverage:** 490 tests, 88% overall coverage (97% database, 96% parsers, 94% rendering) **Next Phase:** Phase 7 - Production Polish (performance optimization, 
advanced features) -See `docs/AI_AGENT_TODO.md` for complete roadmap. +See [`docs/projects/ai-agent/roadmap.md`](docs/projects/ai-agent/roadmap.md) for complete roadmap. ## CLI Commands @@ -159,18 +184,58 @@ All commands use `uv run rmagent [command]`: ## LangChain v1.0 Integration (Future) -**Status:** Zero active LangChain imports. v1.0 upgrade planned for Phase 7. See `docs/RM11_LangChain_Upgrade.md` and `AGENTS.md` for patterns. +**Status:** Zero active LangChain imports. v1.0 upgrade planned for Phase 7. See [`docs/projects/ai-agent/langchain-upgrade.md`](docs/projects/ai-agent/langchain-upgrade.md) for detailed plan. **v1.0 Requirements:** `create_agent()`, `system_prompt="string"`, TypedDict state only. New code goes in `rmagent/agent/lc/` directory. ## Git Workflow +**RMAgent uses a gitflow workflow with automated testing on all PRs.** + +### Branch Structure +- **main** - Production-ready code only (protected, PR-only) +- **develop** - Default integration branch (all work starts here) +- **feature/** - Individual features (branch from develop, PR back to develop) + +### Quick Start ```bash +# Clone and setup git clone git@github.com:miams/rmagent.git && ssh-add ~/.ssh/miams-github -git commit -m "feat: description" # Use: feat|fix|docs|test|refactor + +# Start new feature +git checkout develop && git pull +git checkout -b feature/description +git push -u origin feature/description + +# Make changes and commit +git add . 
&& git commit -m "feat: description" +git push + +# Create PR to develop (squash merge) +gh pr create --base develop + +# Merge when tests pass +gh pr merge --squash --delete-branch ``` -**Branches:** main (production), develop (integration), feature/* (new work) +### Commit Convention +Use: `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `chore:` + +### Release Process +When milestone complete: Create PR from `develop` → `main` (merge commit, not squash) + +### CI/CD +All PRs automatically run: +- Linting (black, ruff) +- Full test suite with coverage (must maintain 80%+) +- See `.github/workflows/pr-tests.yml` + +**For detailed workflow instructions, see [`docs/guides/git-workflow.md`](docs/guides/git-workflow.md)** + +**New to git collaboration?** See [`docs/guides/git-for-newbies.md`](docs/guides/git-for-newbies.md) for fundamentals: +- `git pull` vs `git fetch` explained +- Feature branch relationships +- Multi-developer sync strategies ## Quick Reference diff --git a/README.md b/README.md index 267c868..89883ec 100644 --- a/README.md +++ b/README.md @@ -132,7 +132,7 @@ quality_summary = agent.analyze_data_quality() ### CLI Setup Options **Option 1: Direct Access (Recommended)** -Run `./setup_cli.sh` to enable direct CLI access and tab completion. See [docs/CLI_SETUP.md](docs/CLI_SETUP.md) for details. +Run `./setup_cli.sh` to enable direct CLI access and tab completion. After setup, use commands directly: ```bash @@ -377,13 +377,12 @@ When LangChain v1.0 stable releases, use these patterns: ### Migration Plan -See `docs/RM11_LangChain_Upgrade.md` for complete upgrade strategy and timeline. +See [`docs/projects/ai-agent/langchain-upgrade.md`](docs/projects/ai-agent/langchain-upgrade.md) for complete upgrade strategy and timeline. 
**Key Points:** - New LangChain code goes in `rmagent/agent/lc/` directory - Use v1.0 patterns from day one (no migration needed) - Maintain 80%+ test coverage for all LangChain features -- See `AGENTS.md` for comprehensive best practices ## Development @@ -414,33 +413,34 @@ uv run pytest --cov=rmagent --cov-report=html ## Documentation -**📚 Complete Documentation Index:** [docs/README.md](docs/README.md) +**📚 Complete Documentation Index:** **[docs/INDEX.md](docs/INDEX.md)** ← START HERE ### For New Users -Start here to get up and running: +Get up and running quickly: -1. **[INSTALL.md](INSTALL.md)** - Installation guide (macOS, Linux, Windows/WSL2) -2. **[CONFIGURATION.md](CONFIGURATION.md)** - Configuration, LLM providers, prompt customization -3. **[USAGE.md](USAGE.md)** - Complete CLI reference with 50+ examples -4. **[FAQ.md](FAQ.md)** - Common questions and troubleshooting - -**Comprehensive Guide:** [docs/USER_GUIDE.md](docs/USER_GUIDE.md) (31KB, all-in-one) +1. **[Installation Guide](docs/getting-started/installation.md)** - Install RMAgent and dependencies +2. **[Quick Start](docs/getting-started/quickstart.md)** - 5-minute tutorial +3. **[Configuration Guide](docs/getting-started/configuration.md)** - Set up API keys and database +4. **[User Guide](docs/guides/user-guide.md)** - Complete CLI reference with examples +5. **[FAQ](docs/faq.md)** - Troubleshooting and common questions ### For Developers -Start here to contribute or extend RMAgent: +Contribute or extend RMAgent: -1. **[DEVELOPER_GUIDE.md](DEVELOPER_GUIDE.md)** - Architecture, design patterns, API reference +1. **[Developer Guide](docs/guides/developer-guide.md)** - Architecture, design patterns, API reference 2. **[CONTRIBUTING.md](CONTRIBUTING.md)** - Contribution workflow and coding standards -3. **[TESTING.md](TESTING.md)** - Testing guide (279 tests, coverage analysis) -4. **[CHANGELOG.md](CHANGELOG.md)** - Complete version history +3. 
**[Testing Guide](docs/guides/testing-guide.md)** - Testing guide (490 tests, 88% coverage) +4. **[Git Workflow](docs/guides/git-workflow.md)** - Branching strategy and PR process +5. **[CHANGELOG.md](CHANGELOG.md)** - Version history -### Additional Documentation +### Technical Reference -- **[AGENTS.md](AGENTS.md)** - Agent design patterns -- **[data_reference/](data_reference/)** - RootsMagic 11 schema (18 reference docs) -- **[docs/](docs/)** - Project documentation and completion reports +- **[Schema Reference](docs/reference/schema/)** - RootsMagic 11 database schema +- **[Data Formats](docs/reference/data-formats/)** - Date/place/BLOB formats +- **[Query Patterns](docs/reference/query-patterns/)** - Optimized SQL patterns +- **[Biography Reference](docs/reference/biography/)** - Biography generation guidelines ## Status @@ -450,7 +450,7 @@ Start here to contribute or extend RMAgent: **Completion:** All 26 foundation tasks complete (Phases 1-4) **Next Focus:** Testing & Quality improvements (Phase 5) -See [docs/MVP_CHECKPOINT.md](docs/MVP_CHECKPOINT.md) for complete verification report. +See [docs/archive/checkpoints/mvp-checkpoint.md](docs/archive/checkpoints/mvp-checkpoint.md) for complete verification report. --- @@ -501,9 +501,9 @@ See [docs/MVP_CHECKPOINT.md](docs/MVP_CHECKPOINT.md) for complete verification r - ✅ Export Command (Hugo blog export with batch support, 8 tests, 74% coverage) - ✅ Search Command (name/place search with phonetic matching, 8 tests, 88% coverage) -**⏭️ Next Tasks:** Phase 5 - Testing & Quality (comprehensive integration testing) +**⏭️ Next Tasks:** Phase 7 - Production Polish (performance optimization, advanced features) -See `docs/AI_AGENT_TODO.md` for detailed progress and roadmap. +See [`docs/projects/ai-agent/roadmap.md`](docs/projects/ai-agent/roadmap.md) for detailed progress and roadmap. 
## Repository diff --git a/data/Iiams.rmtree b/data/Iiams.rmtree new file mode 100644 index 0000000..00bb873 --- /dev/null +++ b/data/Iiams.rmtree @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7163c4982fa83bb0aae58e894986dc5e4b6f86634cca4cd3aeaf2d8e01e1571a +size 66928640 diff --git a/data/Iiams.rmtree-shm b/data/Iiams.rmtree-shm new file mode 100644 index 0000000..fe9ac28 Binary files /dev/null and b/data/Iiams.rmtree-shm differ diff --git a/data/Iiams.rmtree-wal b/data/Iiams.rmtree-wal new file mode 100644 index 0000000..e69de29 diff --git a/docs/INDEX.md b/docs/INDEX.md new file mode 100644 index 0000000..d48122d --- /dev/null +++ b/docs/INDEX.md @@ -0,0 +1,173 @@ +# RMAgent Documentation Index + +**Version**: 1.0 +**Last Updated**: 2025-10-15 + +This is the master table of contents for all RMAgent documentation. All paths are relative to the `docs/` directory. + +--- + +## Quick Links + +- [Installation Guide](getting-started/installation.md) - Get RMAgent installed +- [Quick Start](getting-started/quickstart.md) - 5-minute getting started guide +- [User Guide](guides/user-guide.md) - Complete CLI reference +- [Developer Guide](guides/developer-guide.md) - Contributing and development setup +- [FAQ](faq.md) - Frequently asked questions + +--- + +## 📚 Documentation Structure + +### Getting Started + +New users start here: + +- **[installation.md](getting-started/installation.md)** - Installation instructions (uv, dependencies, database setup) +- **[quickstart.md](getting-started/quickstart.md)** - 5-minute tutorial with first commands +- **[configuration.md](getting-started/configuration.md)** - Environment configuration and API keys + +### User Guides + +Complete guides for using RMAgent: + +- **[user-guide.md](guides/user-guide.md)** - Complete CLI command reference and usage patterns +- **[faq.md](faq.md)** - Frequently asked questions and troubleshooting + +### Developer Guides + +For contributors and developers: + +- 
**[developer-guide.md](guides/developer-guide.md)** - Architecture, coding standards, development setup +- **[testing-guide.md](guides/testing-guide.md)** - Running tests, writing tests, coverage +- **[git-workflow.md](guides/git-workflow.md)** - Branching strategy, commit conventions, PR process +- **[git-for-newbies.md](guides/git-for-newbies.md)** - Git collaboration fundamentals for new developers +- **[claude-code-setup.md](guides/claude-code-setup.md)** - Claude Code slash commands and hooks configuration + +### Reference Documentation + +Technical reference materials: + +#### Schema & Database +- **[reference/schema/](reference/schema/)** - RootsMagic 11 database schema documentation + - `schema-reference.md` - Complete table and field reference + - `annotated-schema.sql` - Annotated SQL schema + - `data-definitions.yaml` - Field enumerations and constraints + - `relationships.md` - Parent-child and family relationships + +#### Data Formats +- **[reference/data-formats/](reference/data-formats/)** - RootsMagic data format specifications + - `date-format.md` - 24-character date encoding (CRITICAL) + - `place-format.md` - Comma-delimited place hierarchy + - `blob-fields.md` - UTF-8 XML BLOB parsing (sources, citations, templates) + - `fact-types.md` - 65 built-in event types + +#### Query Patterns +- **[reference/query-patterns/](reference/query-patterns/)** - SQL query patterns and best practices + - `query-patterns.md` - 15 optimized query patterns + - `rmnocase-collation.md` - ICU extension for case-insensitive text matching + - `data-quality-rules.md` - 24 validation rules across 6 categories + +#### Biography +- **[reference/biography/](reference/biography/)** - Biography generation reference + - `biography-best-practices.md` - 9-section structure, citation styles + - `timeline-construction.md` - TimelineJS3 JSON generation + +### Active Projects + +Point-in-time feature work and planning documents: + +#### AI Agent Development +- 
**[projects/ai-agent/](projects/ai-agent/)** + - `roadmap.md` - AI agent development roadmap (phases 1-7) + - `langchain-upgrade.md` - LangChain 1.0 migration plan + - `multi-agent-plan.md` - Multi-agent architecture design + +#### Census Extraction +- **[projects/census-extraction/](projects/census-extraction/)** + - `architecture.md` - Census extraction system architecture + - `implementation-plan.md` - Step-by-step implementation plan + +#### Biography & Citations +- **[projects/biography-citations/](projects/biography-citations/)** + - `citation-implementation.md` - Citation processing implementation + - `married-name-search.md` - Married name search optimization + +### Archive + +Completed milestones and historical documentation: + +- **[archive/checkpoints/](archive/checkpoints/)** - Milestone completion documents + - `mvp-checkpoint.md` - MVP milestone (Phases 1-6 complete) + - `phase-5-completion.md` - Testing & quality phase + - `phase-6-completion.md` - Documentation phase + +- **[archive/summaries/](archive/summaries/)** - Implementation summaries + - `integration-testing-summary.md` - Integration test implementation + - `optimization-summary.md` - Performance optimization work + - `test-coverage-analysis.md` - Coverage improvement analysis + +### Root Documentation + +Key files in the repository root (not in docs/): + +- **[CLAUDE.md](../CLAUDE.md)** - AI assistant context and project guide +- **[AGENTS.md](../AGENTS.md)** - LangChain patterns and multi-agent architecture +- **[README.md](../README.md)** - Repository entry point and quick start +- **[CONTRIBUTING.md](../CONTRIBUTING.md)** - Contribution guidelines +- **[CHANGELOG.md](../CHANGELOG.md)** - Version history and release notes + +--- + +## 🎯 For Claude Code and AI Assistants + +When working on this codebase: + +1. **Start with**: `CLAUDE.md` (root) for project context and guidelines +2. **Schema questions**: `reference/schema/schema-reference.md` +3. 
**Data parsing**: `reference/data-formats/` (especially `date-format.md`) +4. **Query patterns**: `reference/query-patterns/query-patterns.md` +5. **Active work**: `projects/` subdirectories for current feature development +6. **Testing**: `guides/testing-guide.md` + +--- + +## 📝 Documentation Maintenance + +### Keeping Docs Current + +- **Active work** → `projects/` (move to `archive/` when complete) +- **Completed milestones** → `archive/checkpoints/` +- **Implementation notes** → `archive/summaries/` +- **Update INDEX.md** when adding/moving documentation + +### Naming Conventions + +- **Directories**: `lowercase-with-hyphens/` +- **Files**: `lowercase-with-hyphens.md` +- **Be descriptive**: `installation.md` not `install.md` +- **No abbreviations** unless universally known (e.g., `faq.md` is OK) + +### Where Does It Go? + +| Content Type | Location | Example | +|--------------|----------|---------| +| User-facing guides | `guides/` | CLI reference, tutorials | +| Installation/setup | `getting-started/` | Installation, configuration | +| Technical reference | `reference/` | Schema, formats, patterns | +| Active feature work | `projects/{feature}/` | Planning, architecture docs | +| Completed work | `archive/` | Checkpoints, summaries | +| General questions | `faq.md` | Common issues, how-tos | + +--- + +## 🔗 External Resources + +- **Repository**: https://github.com/miams/rmagent +- **RootsMagic**: https://www.rootsmagic.com/ +- **SQLite**: https://www.sqlite.org/docs.html +- **LangChain**: https://python.langchain.com/ + +--- + +**Need help?** Check [faq.md](faq.md) or open an issue on GitHub. diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index 4997f9e..0000000 --- a/docs/README.md +++ /dev/null @@ -1,224 +0,0 @@ -# RMAgent Documentation Index - -Comprehensive documentation for RMAgent - AI-powered genealogy assistant for RootsMagic 11 databases. 
- -## 📚 Documentation Overview - -### For New Users - -Start here if you're new to RMAgent: - -1. **[README.md](../README.md)** - Project overview and quick start -2. **[INSTALL.md](../INSTALL.md)** - Installation guide (all platforms) -3. **[CONFIGURATION.md](../CONFIGURATION.md)** - Configuration and setup -4. **[USAGE.md](../USAGE.md)** - Complete CLI command reference -5. **[FAQ.md](../FAQ.md)** - Common questions and troubleshooting - -### For Developers - -Start here if you want to contribute or extend RMAgent: - -1. **[DEVELOPER_GUIDE.md](../DEVELOPER_GUIDE.md)** - Architecture, design patterns, API reference -2. **[CONTRIBUTING.md](../CONTRIBUTING.md)** - Contribution workflow -3. **[TESTING.md](../TESTING.md)** - Testing guide -4. **[CHANGELOG.md](../CHANGELOG.md)** - Version history - -### Additional Documentation - -- **[USER_GUIDE.md](USER_GUIDE.md)** - Comprehensive user guide (31KB, all-in-one) -- **[AGENTS.md](../AGENTS.md)** - Agent design patterns - ---- - -## 📖 Documentation by Topic - -### Installation & Setup - -| Document | Description | Audience | -|----------|-------------|----------| -| [INSTALL.md](../INSTALL.md) | Platform-specific installation (macOS, Linux, Windows/WSL2) | All users | -| [CONFIGURATION.md](../CONFIGURATION.md) | Environment variables, LLM providers, prompt customization | All users | -| [FAQ.md](../FAQ.md) | Troubleshooting installation issues | All users | - -### Using RMAgent - -| Document | Description | Audience | -|----------|-------------|----------| -| [USAGE.md](../USAGE.md) | All 7 CLI commands with 50+ examples | All users | -| [USER_GUIDE.md](USER_GUIDE.md) | Comprehensive guide (installation → advanced usage) | All users | -| [FAQ.md](../FAQ.md) | Common workflows and troubleshooting | All users | - -### Development - -| Document | Description | Audience | -|----------|-------------|----------| -| [DEVELOPER_GUIDE.md](../DEVELOPER_GUIDE.md) | Architecture, API reference, adding features | Developers | -| 
[CONTRIBUTING.md](../CONTRIBUTING.md) | Git workflow, coding standards, PR process | Contributors | -| [TESTING.md](../TESTING.md) | Test suite guide (279 tests, coverage analysis) | Developers | -| [CHANGELOG.md](../CHANGELOG.md) | Complete version history | All | - -### Technical Reference - -| Document | Description | Audience | -|----------|-------------|----------| -| [AGENTS.md](../AGENTS.md) | Agent design patterns | Developers | -| [data_reference/](../data_reference/) | RootsMagic 11 schema (18 docs) | Developers | - ---- - -## 🎯 Quick Navigation - -### By Task - -**I want to...** - -- **Install RMAgent** → [INSTALL.md](../INSTALL.md) -- **Configure API keys** → [CONFIGURATION.md](../CONFIGURATION.md) (LLM Provider Setup) -- **Generate a biography** → [USAGE.md](../USAGE.md) (Biography Command) -- **Customize prompts** → [CONFIGURATION.md](../CONFIGURATION.md) (Prompt Customization) -- **Run quality checks** → [USAGE.md](../USAGE.md) (Quality Command) -- **Export to Hugo** → [USAGE.md](../USAGE.md) (Export Command) -- **Fix errors** → [FAQ.md](../FAQ.md) (Troubleshooting) -- **Add a new feature** → [DEVELOPER_GUIDE.md](../DEVELOPER_GUIDE.md) (Adding New Features) -- **Run tests** → [TESTING.md](../TESTING.md) (Running Tests) -- **Contribute code** → [CONTRIBUTING.md](../CONTRIBUTING.md) (Contribution Workflow) - -### By Role - -**New User:** -1. [README.md](../README.md) → [INSTALL.md](../INSTALL.md) → [CONFIGURATION.md](../CONFIGURATION.md) → [USAGE.md](../USAGE.md) - -**Contributor:** -1. [README.md](../README.md) → [CONTRIBUTING.md](../CONTRIBUTING.md) → [TESTING.md](../TESTING.md) - -**Developer (adding features):** -1. 
[DEVELOPER_GUIDE.md](../DEVELOPER_GUIDE.md) → [CONTRIBUTING.md](../CONTRIBUTING.md) → [TESTING.md](../TESTING.md) - ---- - -## 📦 Project Documentation Files - -### Top-Level Documentation (8 files) - -Located in project root: - -``` -RM11/ -├── README.md # Project overview & quick start -├── INSTALL.md # Installation guide (515 lines) -├── USAGE.md # CLI command reference (800+ lines) -├── CONFIGURATION.md # Configuration guide (1,000+ lines) -├── FAQ.md # Common questions (550+ lines) -├── CONTRIBUTING.md # Contribution guidelines (450+ lines) -├── TESTING.md # Testing guide (600+ lines) -├── CHANGELOG.md # Version history (350+ lines) -├── DEVELOPER_GUIDE.md # Developer guide (1,000+ lines) -└── AGENTS.md # Agent design patterns -``` - -**Total:** 5,865+ lines of documentation - -### docs/ Directory - -Project-specific documentation: - -``` -docs/ -├── README.md # This file - documentation index -├── USER_GUIDE.md # Comprehensive user guide (1,424 lines) -├── MVP_CHECKPOINT.md # Milestone 2 verification -├── PHASE_5_COMPLETION.md # Testing & Quality report -├── PHASE_6_COMPLETION.md # Documentation & Polish report -├── AI_AGENT_TODO.md # Project roadmap -├── Test_Coverage_Analysis.md -├── INTEGRATION_TESTING_SUMMARY.md -├── REAL_API_VERIFICATION.md -└── OPTIMIZATION_SUMMARY.md -``` - -### data_reference/ Directory - -RootsMagic 11 schema documentation (18 files): - -``` -data_reference/ -├── RM11_Schema_Reference.md # Complete schema -├── RM11_Date_Format.md # 24-char date encoding -├── RM11_Place_Format.md # Place hierarchy -├── RM11_FactTypes.md # Event types -├── RM11_BLOB_*.md # XML BLOB specs (3 files) -├── RM11_Biography_Best_Practices.md # 9-section structure -├── RM11_Data_Quality_Rules.md # 24 validation rules -├── RM11_Query_Patterns.md # SQL patterns -├── RM11_Timeline_Construction.md # TimelineJS3 format -└── ... 
(10 more reference docs) -``` - ---- - -## 🔍 Documentation Statistics - -**Total Documentation:** -- **Lines:** 7,289+ lines -- **Files:** 36 files (10 top-level + 10 docs/ + 16 data_reference/) -- **Coverage:** Installation, usage, configuration, development, schema, API - -**By Category:** -- **User Documentation:** 3,500+ lines (INSTALL, USAGE, CONFIGURATION, FAQ, USER_GUIDE) -- **Developer Documentation:** 2,050+ lines (DEVELOPER_GUIDE, CONTRIBUTING, TESTING) -- **Project Documentation:** 1,200+ lines (CHANGELOG, completion reports, roadmap) -- **Schema Reference:** 18 files (data_reference/) - ---- - -## 📝 Documentation Updates - -### Recent Updates (v0.2.0 - October 12, 2025) - -**New Documentation:** -- ✅ **DEVELOPER_GUIDE.md** - Comprehensive developer guide with architecture, API reference, adding features -- ✅ **Prompt Customization** - Complete section in CONFIGURATION.md with YAML examples -- ✅ **USER_GUIDE.md** - Updated to v0.2.0 with new prompt system - -**Updated Documentation:** -- ✅ USER_GUIDE.md - "Customizing Prompts" section updated for YAML system -- ✅ CONFIGURATION.md - Added 350+ line "Prompt Customization" section -- ✅ CHANGELOG.md - Phase 5 and Phase 6 entries - -### Version History - -- **v0.2.0** (2025-10-12): Phase 5 & 6 complete - Testing, Quality, Documentation -- **v0.1.0** (2025-10-10): MVP complete - All 8 CLI commands working -- **v0.0.3** (2025-10-09): Milestone 1 - Working prototype - ---- - -## 🤝 Contributing to Documentation - -Documentation improvements are always welcome! - -**To contribute documentation:** - -1. **Small fixes:** Edit directly and submit PR -2. **New sections:** Open issue first to discuss -3. **Style guide:** Follow existing format -4. **Examples:** Include practical code examples -5. **Testing:** Verify all commands work - -See [CONTRIBUTING.md](../CONTRIBUTING.md) for details. 
- ---- - -## 📞 Getting Help - -**Questions about documentation:** -- Open an issue: https://github.com/miams/rmagent/issues -- Discussions: https://github.com/miams/rmagent/discussions - -**Found a bug in documentation:** -- Report it: https://github.com/miams/rmagent/issues -- Include: Which file, what's wrong, suggested fix - ---- - -**Last Updated:** October 12, 2025 diff --git a/docs/MVP_CHECKPOINT.md b/docs/archive/checkpoints/mvp-checkpoint.md similarity index 100% rename from docs/MVP_CHECKPOINT.md rename to docs/archive/checkpoints/mvp-checkpoint.md diff --git a/docs/PHASE_5_COMPLETION.md b/docs/archive/checkpoints/phase-5-completion.md similarity index 100% rename from docs/PHASE_5_COMPLETION.md rename to docs/archive/checkpoints/phase-5-completion.md diff --git a/docs/PHASE_6_COMPLETION.md b/docs/archive/checkpoints/phase-6-completion.md similarity index 100% rename from docs/PHASE_6_COMPLETION.md rename to docs/archive/checkpoints/phase-6-completion.md diff --git a/biography.md b/docs/archive/summaries/biography-notes.md similarity index 100% rename from biography.md rename to docs/archive/summaries/biography-notes.md diff --git a/docs/CLI_SETUP.md b/docs/archive/summaries/cli-setup.md similarity index 100% rename from docs/CLI_SETUP.md rename to docs/archive/summaries/cli-setup.md diff --git a/docs/INTEGRATION_TESTING_SUMMARY.md b/docs/archive/summaries/integration-testing-summary.md similarity index 100% rename from docs/INTEGRATION_TESTING_SUMMARY.md rename to docs/archive/summaries/integration-testing-summary.md diff --git a/data_reference/RM11_Documentation_Index.md b/docs/archive/summaries/old-documentation-index.md similarity index 100% rename from data_reference/RM11_Documentation_Index.md rename to docs/archive/summaries/old-documentation-index.md diff --git a/docs/USER_GUIDE.md b/docs/archive/summaries/old-user-guide.md similarity index 100% rename from docs/USER_GUIDE.md rename to docs/archive/summaries/old-user-guide.md diff --git 
a/docs/OPTIMIZATION_SUMMARY.md b/docs/archive/summaries/optimization-summary.md similarity index 100% rename from docs/OPTIMIZATION_SUMMARY.md rename to docs/archive/summaries/optimization-summary.md diff --git a/docs/REAL_API_VERIFICATION.md b/docs/archive/summaries/real-api-verification.md similarity index 100% rename from docs/REAL_API_VERIFICATION.md rename to docs/archive/summaries/real-api-verification.md diff --git a/docs/SEARCH_LOGIC_FIX.md b/docs/archive/summaries/search-logic-fix.md similarity index 100% rename from docs/SEARCH_LOGIC_FIX.md rename to docs/archive/summaries/search-logic-fix.md diff --git a/docs/SETUP_COMPLETE.md b/docs/archive/summaries/setup-complete.md similarity index 100% rename from docs/SETUP_COMPLETE.md rename to docs/archive/summaries/setup-complete.md diff --git a/docs/Test_Coverage_Analysis.md b/docs/archive/summaries/test-coverage-analysis.md similarity index 100% rename from docs/Test_Coverage_Analysis.md rename to docs/archive/summaries/test-coverage-analysis.md diff --git a/docs/VALIDATION_RESULTS.md b/docs/archive/summaries/validation-results.md similarity index 100% rename from docs/VALIDATION_RESULTS.md rename to docs/archive/summaries/validation-results.md diff --git a/FAQ.md b/docs/faq.md similarity index 100% rename from FAQ.md rename to docs/faq.md diff --git a/CONFIGURATION.md b/docs/getting-started/configuration.md similarity index 100% rename from CONFIGURATION.md rename to docs/getting-started/configuration.md diff --git a/INSTALL.md b/docs/getting-started/installation.md similarity index 100% rename from INSTALL.md rename to docs/getting-started/installation.md diff --git a/QUICKSTART.md b/docs/getting-started/quickstart.md similarity index 100% rename from QUICKSTART.md rename to docs/getting-started/quickstart.md diff --git a/docs/guides/claude-code-setup.md b/docs/guides/claude-code-setup.md new file mode 100644 index 0000000..48f5d6a --- /dev/null +++ b/docs/guides/claude-code-setup.md @@ -0,0 +1,438 @@ +# 
Claude Code Setup for RMAgent + +This guide documents the slash commands and hooks configured for the RMAgent project. + +## Slash Commands + +All slash commands are defined in `.claude/commands/`. RMAgent-specific commands use the `rm-` prefix. + +**Total Commands: 12** (6 RMAgent-specific, 3 development, 3 utility) + +### RMAgent Commands (AI & Data) + +Commands that interact with RootsMagic database and AI features: + +#### `/rm-bio [length]` +**Description:** Generate a biography for a person using AI + +**Options:** +- `show-prompt: true` - Shows the AI prompt +- `meta: true` - Shows token usage and timing + +**Usage:** +``` +/rm-bio 1 # Standard biography +/rm-bio 1 short # Short biography +/rm-bio 1 comprehensive # Comprehensive biography +``` + +**Output:** Markdown file in `reports/biographies/` + +--- + +#### `/rm-ask ""` +**Description:** Ask natural language questions about genealogy data using AI + +**Options:** +- `show-prompt: true` - Shows the AI prompt +- `meta: true` - Shows token usage and timing + +**Usage:** +``` +/rm-ask "Who are the ancestors of John Smith?" 
+/rm-ask "Find everyone born in Baltimore" +/rm-ask "Show family relationships for person 1" +``` + +**Output:** AI-generated answer with data from database + +--- + +#### `/rm-person [options]` +**Description:** Query person details from RootsMagic database + +**Usage:** +``` +/rm-person 1 # Basic info +/rm-person 1 --events # Include all events +/rm-person 1 --family # Include family relationships +/rm-person 1 --ancestors # Include ancestors +/rm-person 1 --descendants # Include descendants +``` + +**Output:** Formatted person information with requested details + +--- + +#### `/rm-search [--name "Name"] [--place "Place"] [--limit N]` +**Description:** Search database by name or place + +**Usage:** +``` +/rm-search --name "John Smith" +/rm-search --place "Baltimore" +/rm-search --name "Smith" --limit 10 +``` + +**Output:** List of matching persons with IDs + +--- + +#### `/rm-quality [category]` +**Description:** Run data quality validation + +**Categories:** +- `dates` - Date format and range issues +- `names` - Missing or invalid names +- `places` - Place format issues +- `relationships` - Relationship inconsistencies +- `sources` - Citation and source issues +- `events` - Event data issues + +**Usage:** +``` +/rm-quality # All checks +/rm-quality dates # Only date issues +/rm-quality names # Only name issues +``` + +**Output:** Formatted table of data quality issues + +--- + +#### `/rm-timeline [options]` +**Description:** Generate timeline visualization + +**Usage:** +``` +/rm-timeline 1 # JSON format +/rm-timeline 1 --format html # HTML format +/rm-timeline 1 --include-family # Include family events +/rm-timeline 1 --group-by-phase # Group by life phases +``` + +**Output:** TimelineJS3 JSON or HTML file + +--- + +### Development Commands + +Generic development and testing commands (no `rm-` prefix): + +#### `/test [filter]` +**Description:** Run pytest tests with optional filters + +**Usage:** +``` +/test # Run all tests +/test unit # Run unit tests only 
+/test integration # Run integration tests only +/test test_queries.py # Run specific test file +``` + +**Output:** Test results with pass/fail status + +--- + +#### `/coverage` +**Description:** Run tests with coverage analysis + +**Usage:** +``` +/coverage +``` + +**Output:** Coverage report with line-by-line analysis. HTML report at `htmlcov/index.html` + +--- + +#### `/lint [fix]` +**Description:** Run code quality checks (ruff + black) + +**Usage:** +``` +/lint # Check only +/lint fix # Auto-fix issues +``` + +**Output:** Linting errors and warnings + +--- + +### Utility Commands + +#### `/docs [topic]` +**Description:** Quick access to documentation + +**Usage:** +``` +/docs # Show INDEX.md +/docs schema # Schema reference +/docs data-formats # Data formats reference +/docs dev # Developer guide +``` + +**Output:** Documentation preview (first 50-80 lines) + +--- + +#### `/doc-review [mode]` +**Description:** Review documentation for accuracy and completeness using AI + +**Options:** +- `show-prompt: true` - Shows the AI prompt +- `meta: true` - Shows token usage and timing + +**Usage:** +``` +/doc-review # Brief mode: review root docs + INDEX.md +/doc-review brief # Same as above +/doc-review deep # Deep mode: review ALL documentation files +``` + +**Brief Mode Reviews:** +- CLAUDE.md, README.md, AGENTS.md, CONTRIBUTING.md, CHANGELOG.md +- docs/INDEX.md +- Verifies INDEX.md accurately references all docs + +**Deep Mode Reviews:** +- All files from brief mode +- All documentation in docs/ directory +- Cross-references and consistency checks + +**Output:** AI assessment with critical issues, recommendations, and specific fixes + +--- + +#### `/check-db` +**Description:** Verify database file exists and is accessible + +**Usage:** +``` +/check-db +``` + +**Output:** Database path, status, and basic statistics (person/event/source counts) + +--- + +## Claude Code Hooks + +Hooks are configured in `.claude/settings.local.json` and run automatically at specific 
events. + +### PostToolUse Hooks + +#### Coverage Reminder Hook +**Trigger:** After running pytest with coverage (`uv run pytest --cov`) + +**Action:** Extracts coverage percentage and reminds to update CLAUDE.md and README.md if significantly changed + +**Output:** +``` +📊 Test coverage: 88% +💡 Reminder: Update coverage stats in CLAUDE.md and README.md if significantly changed +``` + +--- + +### PreToolUse Hooks + +#### Git Push Confirmation Hook +**Trigger:** Before git push commands + +**Action:** Shows recent commits being pushed + +**Output:** +``` +⚠️ Pushing to remote. Recent commits: +66a29ca docs: reorganize documentation structure +4cdea76 fix: constrain caption width to match image margins +528f1ef fix: handle sqlite3.Row objects in biography rendering +``` + +--- + +## Git Pre-Commit Hook + +A standard git pre-commit hook is configured at `.git/hooks/pre-commit` to remind about documentation updates. + +**Triggers:** +- Documentation structure changes → Check CLAUDE.md +- Test modifications → Check README.md for coverage stats +- Core code changes → Check CLAUDE.md +- Dependency changes → Check README.md + +**Output:** +``` +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +⚠️ DOCUMENTATION REVIEW REMINDER +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ + +This commit may require updates to key documentation files: + + • Documentation structure changed + +Please review and update if necessary: + + ✓ CLAUDE.md - Already modified + 📄 README.md - User-facing docs, badges, quick start + 📄 AGENTS.md - LangChain patterns, agent architecture + +Continue with commit? 
(y/n) +``` + +--- + +## Configuration Files + +### `.claude/settings.local.json` +- **Permissions:** Pre-approved bash commands, web fetch domains +- **Hooks:** PostToolUse and PreToolUse hook configurations + +### `.claude/commands/` +- All custom slash command definitions +- Markdown files with frontmatter and bash scripts + +### `.git/hooks/pre-commit` +- Git pre-commit hook for documentation review +- Executable shell script + +--- + +## Best Practices + +### When to Use Each Command + +**For Genealogy Work:** +- Use `/rm-person` to quickly explore individuals +- Use `/rm-search` to find people by name/place +- Use `/rm-bio` when you need a formatted biography +- Use `/rm-ask` for complex queries that need AI interpretation +- Use `/rm-quality` regularly to maintain data integrity + +**For Development:** +- Use `/test` for quick test runs during development +- Use `/coverage` before committing to check test coverage +- Use `/lint` before committing to catch style issues +- Use `/check-db` when debugging database connection issues + +**For Documentation:** +- Use `/docs` to quickly reference schema or data formats +- Use `/doc-review` regularly to ensure docs stay accurate and current +- Use `/doc-review brief` before major commits or PRs +- Use `/doc-review deep` after significant feature additions +- Keep CLAUDE.md, README.md, and AGENTS.md updated (hooks will remind you) + +### show-prompt and meta Flags + +Only use `show-prompt` and `meta` in commands that interact with LLMs: +- ✅ `/rm-bio` - Generates biography with AI +- ✅ `/rm-ask` - Uses AI to answer questions +- ✅ `/doc-review` - Uses AI to review documentation +- ❌ `/rm-person` - Pure database query +- ❌ `/rm-search` - Pure database query +- ❌ `/rm-quality` - Rule-based validation +- ❌ `/rm-timeline` - Data extraction and formatting + +### Naming Conventions + +- **RMAgent-specific commands:** Use `rm-` prefix (`/rm-bio`, `/rm-person`) +- **Generic dev commands:** No prefix (`/test`, `/lint`, 
`/coverage`) +- **Utility commands:** No prefix (`/docs`, `/check-db`) + +--- + +## Testing Your Setup + +After configuring slash commands and hooks: + +1. **Test a slash command:** + ``` + /check-db + ``` + +2. **Test an AI command:** + ``` + /doc-review brief + ``` + +3. **Test a hook:** Run pytest with coverage to see PostToolUse hook: + ```bash + uv run pytest --cov=rmagent + ``` + +4. **Test git hook:** Make a change to tests and try to commit: + ```bash + # Modify a test file + git add tests/ + git commit -m "test: update test" + # You should see the documentation review reminder + ``` + +5. **View all commands:** + ``` + /help + ``` + +--- + +## Troubleshooting + +### Slash commands not appearing +- Check that files exist in `.claude/commands/` +- Ensure markdown files have proper frontmatter +- Restart Claude Code + +### Hooks not firing +- Verify JSON syntax in `.claude/settings.local.json` +- Check that regex matchers are properly escaped +- Use `claude --debug` for detailed hook execution info + +### Git hook not running +- Verify `.git/hooks/pre-commit` exists and is executable: + ```bash + chmod +x .git/hooks/pre-commit + ``` +- Check that it's not being bypassed with `--no-verify` + +--- + +## Extending This Setup + +### Adding New Slash Commands + +1. Create `.claude/commands/your-command.md`: + ```markdown + --- + description: Your command description + show-prompt: true # Only if uses AI + meta: true # Only if uses AI + --- + + Usage instructions here. + + ```bash + # Your bash script + echo "Command output" + ``` + ``` + +2. Use `rm-` prefix if RMAgent-specific +3. Document in this file + +### Adding New Hooks + +1. Edit `.claude/settings.local.json` +2. Add to appropriate hook type (PreToolUse, PostToolUse, etc.) +3. Use regex matcher to target specific tools +4. 
Test thoroughly before committing + +--- + +## References + +- **Claude Code Docs:** https://docs.claude.com/en/docs/claude-code/ +- **Slash Commands:** https://docs.claude.com/en/docs/claude-code/slash-commands +- **Hooks:** https://docs.claude.com/en/docs/claude-code/hooks +- **RMAgent User Guide:** [guides/user-guide.md](user-guide.md) +- **Developer Guide:** [guides/developer-guide.md](developer-guide.md) diff --git a/DEVELOPER_GUIDE.md b/docs/guides/developer-guide.md similarity index 100% rename from DEVELOPER_GUIDE.md rename to docs/guides/developer-guide.md diff --git a/docs/guides/git-for-newbies.md b/docs/guides/git-for-newbies.md new file mode 100644 index 0000000..8acf204 --- /dev/null +++ b/docs/guides/git-for-newbies.md @@ -0,0 +1,535 @@ +# Git for Newbies - RMAgent Collaboration Guide + +A practical guide to git collaboration for developers new to the RMAgent project. + +## Table of Contents + +- [git pull vs git fetch](#git-pull-vs-git-fetch) +- [Feature Branch Relationships](#feature-branch-relationships) +- [Multi-Developer Sync Strategy](#multi-developer-sync-strategy) +- [Common Scenarios](#common-scenarios) +- [Quick Reference](#quick-reference) + +--- + +## git pull vs git fetch + +### `git pull` (Recommended for Most Work) + +```bash +git pull +# Equivalent to: +# git fetch origin +# git merge origin/ +``` + +**What it does:** +- Fetches updates from the current branch's remote (usually `origin`) +- Automatically merges the upstream changes into your **current branch only** +- Fast and focused on what you're working on + +**When to use:** +- Daily work on feature branches +- Updating develop before creating new features +- Most common operation (95% of the time) + +**Example:** +```bash +git checkout develop +git pull # Get latest develop +git checkout -b feature/my-feature +# ... work on your feature ... 
+``` + +--- + +### `git fetch --all --tags` (Comprehensive Sync) + +```bash +git fetch --all --tags +``` + +**What it does:** +- `--all`: Fetches from **all remotes** (origin, upstream, etc.) +- `--tags`: Fetches all tag information +- **Does NOT merge** - only updates your local cache of remote state + +**When to use:** +- After being away from project for a while +- Want to see all branch activity without merging +- Need tag information (releases, versions) +- Working with multiple remotes (forks, upstreams) + +**Example:** +```bash +git fetch --all --tags +git branch -r # View all remote branches +git tag -l # View all tags +git log --all --graph # Visualize repository state +``` + +--- + +## Feature Branch Relationships + +### Understanding Branch Independence + +**Key concept:** Feature branches are **NOT sub-branches** of develop. They are independent branches that **start from** develop and **merge back to** develop. + +### Creating a Feature Branch + +```bash +git checkout develop +git pull # Get latest develop +git checkout -b feature/my-feature # Create new branch FROM develop +``` + +**What happens:** +- Git creates a new branch pointer at the **same commit** as develop +- The branches are now independent - neither is a "parent" or "child" +- They're like two roads that split from the same point + +**Visual:** +``` +main: A---B---C + \ +develop: D---E---F + \ +feature/x: G---H---I +``` + +At the moment you create `feature/x`, both `feature/x` and `develop` point to commit `F`. Then as you work, `feature/x` adds commits `G`, `H`, `I`. + +--- + +### Feature Branch Diverges from Develop + +While you're working on `feature/x`, other developers might merge features into develop: + +``` +main: A---B---C + \ +develop: D---E---F---J---K (other features merged) + \ +feature/x: G---H---I (your feature) +``` + +Now `feature/x` and `develop` have **diverged** - they're completely independent branches with different commits. 
+ +--- + +### Merging Feature Back to Develop + +When your feature is done, you create a PR to merge it back: + +```bash +gh pr create --base develop --head feature/x +gh pr merge --squash +``` + +**What happens with squash merge:** +``` +main: A---B---C + \ +develop: D---E---F---J---K---L + \ ↑ +feature/x: G---H---I -----┘ + (squashed into L) +``` + +Commits `G`, `H`, `I` get "squashed" into a single commit `L` on develop. + +--- + +### Eventually Develop Merges to Main + +When a milestone is complete: + +```bash +gh pr create --base main --head develop +gh pr merge --merge # Merge commit, NOT squash +``` + +**Result:** +``` +main: A---B---C-----------M + \ / +develop: D---E---F---J---K---L +``` + +Commit `M` is a **merge commit** that brings all of develop's changes into main. + +--- + +## Multi-Developer Sync Strategy + +### Sync Frequency Based on Activity + +| Primary Dev Activity | Other Devs Should Pull | Reason | +|---------------------|------------------------|--------| +| Multiple PRs per day | Every 2-4 hours | Avoid large merge conflicts | +| 1-2 PRs per day | Morning + before PR | Stay reasonably current | +| Few PRs per week | Once daily | Minimal divergence | +| Inactive | Before starting work | No need for frequent checks | + +--- + +### Primary Developer Workflow (Active Development) + +**Starting your day:** +```bash +git checkout develop +git pull +git checkout -b feature/new-thing +# ... work ... +``` + +**Before creating PR:** +```bash +# Make sure develop hasn't changed +git fetch origin develop +git merge origin/develop # Or rebase if you prefer +git push +gh pr create --base develop +``` + +**Frequency:** Pull `develop` before starting each new feature (once or twice daily) + +--- + +### Other Developers Workflow (Collaborating) + +**Starting their day:** +```bash +git checkout develop +git pull # Get latest develop +git checkout -b feature/their-feature +# ... work ... 
+``` + +**Mid-development check (if primary dev is actively merging PRs):** +```bash +# Every few hours or before taking a break +git checkout develop +git pull + +# If they want their feature branch updated: +git checkout feature/their-feature +git merge develop # Bring in latest changes +``` + +**Before creating PR:** +```bash +git checkout develop +git pull # Critical: get absolute latest +git checkout feature/their-feature +git merge develop # Sync with latest +git push +gh pr create --base develop +``` + +**Frequency when primary is active:** Pull `develop` **every 2-4 hours** or **before each PR** + +--- + +## Common Scenarios + +### Scenario 1: Multiple Developers Working Simultaneously + +**Primary Developer (You):** +```bash +# Monday morning +git checkout develop && git pull +git checkout -b feature/add-docs +# ... work for 2 hours ... +git push +gh pr create --base develop +gh pr merge --squash # Merged at 10am + +# Monday afternoon +git checkout develop && git pull +git checkout -b feature/fix-bug +# ... work for 1 hour ... +git push +gh pr create --base develop +gh pr merge --squash # Merged at 2pm +``` + +**Other Developer:** +```bash +# Monday 9am - starts work +git checkout develop && git pull # Gets your state from Sunday + +# Works on their feature all morning +git checkout -b feature/new-parser +# ... 3 hours of work ... + +# Monday 12pm - takes lunch break, syncs develop +git checkout develop && git pull # Gets your 10am merge +git checkout feature/new-parser +git merge develop # Optional: bring changes into their feature + +# Monday 3pm - ready to create PR +git checkout develop && git pull # Gets your 2pm merge +git checkout feature/new-parser +git merge develop # Brings in both your merges +git push +gh pr create --base develop +``` + +**Key point:** They pulled develop **3 times** during a day when you were actively merging. 
+ +--- + +### Scenario 2: Working with Forks and Upstreams + +If contributing from a fork: + +```bash +# Add upstream remote (one time) +git remote add upstream git@github.com:original/rmagent.git + +# Check your remotes +git remote -v +# origin git@github.com:you/rmagent.git (your fork) +# upstream git@github.com:original/rmagent.git (original repo) + +# Sync your fork with upstream +git fetch upstream +git checkout develop +git merge upstream/develop +git push origin develop + +# Now create feature from updated develop +git checkout -b feature/my-contribution +``` + +--- + +### Scenario 3: Keeping Feature Branch Updated + +If develop has advanced while you're working on a long-running feature: + +```bash +# You're on feature/long-feature +git checkout develop +git pull # Get latest develop + +git checkout feature/long-feature +git merge develop # Merge latest develop into your feature + +# Resolve any conflicts if they arise +git add . +git commit -m "merge: sync with latest develop" +git push +``` + +**Alternative using rebase (cleaner history):** +```bash +git checkout feature/long-feature +git rebase develop # Replay your commits on top of latest develop + +# If conflicts, resolve them then: +git add . +git rebase --continue + +# Force push (only safe on your feature branch!) 
+git push --force +``` + +--- + +### Scenario 4: Viewing Repository State Without Merging + +Want to see what's changed without affecting your working branch: + +```bash +# Fetch all updates +git fetch --all + +# View all branches +git branch -r + +# Compare your branch with remote develop +git log HEAD..origin/develop # What's in develop that you don't have +git log origin/develop..HEAD # What you have that's not in develop + +# View changes in a specific branch +git log origin/develop --oneline -10 +git show origin/develop:path/to/file.py +``` + +--- + +### Scenario 5: Checking Tags and Releases + +```bash +# Fetch tags +git fetch --all --tags + +# List all tags +git tag -l + +# View specific tag +git show v1.2.3 + +# Checkout a release +git checkout v1.2.3 # Detached HEAD at this tag +git checkout -b hotfix-v1.2.3 # Create branch from tag +``` + +--- + +## When You Actually Need `git fetch --all --tags` + +### Use Case 1: Multiple Remotes + +```bash +# You have a fork and upstream +git remote -v +# origin git@github.com:you/rmagent.git +# upstream git@github.com:original/rmagent.git + +git fetch --all # Gets from both origin AND upstream +``` + +### Use Case 2: Checking Tags/Releases + +```bash +git fetch --all --tags +git tag -l # See all version tags +git checkout v1.2.3 # Check out specific release +``` + +### Use Case 3: Viewing All Branch Activity + +```bash +git fetch --all +git branch -r # See all remote branches +git log --all --graph # Visualize entire repository state +``` + +**For RMAgent** (single remote, team collaboration), these scenarios are rare. + +--- + +## Quick Reference + +### Daily Commands + +```bash +# Start of day +git checkout develop && git pull + +# Create new feature +git checkout -b feature/name + +# Save work +git add . 
&& git commit -m "message" && git push + +# Sync with latest develop (mid-work) +git checkout develop && git pull +git checkout feature/name && git merge develop + +# Before creating PR (CRITICAL) +git checkout develop && git pull +git checkout feature/name && git merge develop +gh pr create --base develop +``` + +--- + +### Understanding Branch State + +```bash +# What branch am I on? +git branch + +# What's changed locally? +git status +git diff + +# What's different from remote? +git fetch +git log HEAD..origin/develop # What's new in remote develop + +# View recent history +git log --oneline -10 +git log --all --graph --oneline +``` + +--- + +### Sync Checklist for Collaborators + +**Before starting work:** +- [ ] `git checkout develop && git pull` + +**Every 2-4 hours (if primary dev is active):** +- [ ] `git checkout develop && git pull` +- [ ] (Optional) Merge develop into your feature branch + +**Before creating PR:** +- [ ] `git checkout develop && git pull` +- [ ] `git checkout feature/your-feature` +- [ ] `git merge develop` (resolve conflicts) +- [ ] `git push` +- [ ] `gh pr create --base develop` + +--- + +## Key Takeaways + +### ✅ Do This: +- **Use `git pull`** for 95% of your work +- **Pull develop frequently** when primary dev is active (every 2-4 hours) +- **Always pull develop before creating a PR** +- **Merge develop into your feature** if it's long-running +- **Delete feature branches after merge** (`gh pr merge --delete-branch` does this automatically) + +### ❌ Don't Do This: +- Don't assume feature branches are "sub-branches" of develop +- Don't go days without pulling if others are actively merging +- Don't create PRs without pulling develop first +- Don't force-push to develop or main (ever) +- Don't use `--admin` to bypass branch protection unless it's a genuine emergency + +### 🎯 Remember: +- Feature branches are **independent** - they start from develop and merge back to develop +- `git pull` = fetch + merge for current branch +- `git 
fetch --all --tags` = comprehensive sync without merging +- Sync frequency depends on team activity level +- **When in doubt, pull develop before doing anything important** + +--- + +## Getting Help + +If you're stuck: + +```bash +# Check your current state +git status + +# View recent commits +git log --oneline -10 + +# See what's on remote +git fetch +git log origin/develop --oneline -10 + +# Undo local changes (if needed) +git checkout -- file.py # Discard changes to file +git reset --hard origin/develop # Reset to match remote (DESTRUCTIVE) +``` + +**Still stuck?** Check the full workflow guide: [git-workflow.md](git-workflow.md) + +--- + +## Additional Resources + +- [Pro Git Book](https://git-scm.com/book/en/v2) - Free comprehensive guide +- [Git Branching Model](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) - Gitflow explained +- [GitHub CLI Manual](https://cli.github.com/manual/) - gh commands reference +- [RMAgent Git Workflow](git-workflow.md) - Project-specific workflow guide diff --git a/docs/guides/git-workflow.md b/docs/guides/git-workflow.md new file mode 100644 index 0000000..b766440 --- /dev/null +++ b/docs/guides/git-workflow.md @@ -0,0 +1,424 @@ +# Git Workflow Guide - RMAgent + +## Overview + +RMAgent uses a **gitflow** workflow to manage development. This guide will walk you through the daily workflow, from starting a new feature to merging it into production. + +**📚 New to git or team collaboration?** See **[git-for-newbies.md](git-for-newbies.md)** for: +- `git pull` vs `git fetch` explained +- How feature branches relate to develop +- Multi-developer sync strategies +- Common collaboration scenarios + +## Branch Structure + +``` +main Production-ready code (protected) + ↑ + PR (merge commit) + | +develop Integration branch (default) + ↑ + PR (squash merge) + | +feature/* Individual features +``` + +### Branches Explained + +- **main**: Production-ready code only. **Protected** - no direct commits allowed. 
+ - Requires PR with 1 approving review + - Requires all status checks to pass (linting + tests) + - Requires branch to be up-to-date before merge + - No force pushes or deletions allowed + - Enforced for administrators + +- **develop**: Default working branch. **Protected** - all features merge here first. + - Requires PR (self-merge allowed for solo dev) + - Requires all status checks to pass (linting + tests) + - No force pushes or deletions allowed + - Admins can bypass in emergencies + +- **feature/**: Individual features. Branch from develop, PR back to develop. + - No protection - you can commit directly and force push if needed + - Still requires PR to merge into develop + +## Daily Workflow + +### 1. Starting a New Feature + +Always start from an up-to-date `develop` branch: + +```bash +# Make sure you're on develop +git checkout develop + +# Get latest changes +git pull origin develop + +# Create a new feature branch +git checkout -b feature/my-feature-name + +# Push the branch to GitHub +git push -u origin feature/my-feature-name +``` + +**Branch Naming Convention**: `feature/description-with-dashes` + +Examples: +- `feature/census-extraction` +- `feature/add-source-validation` +- `feature/fix-date-parsing` + +### 2. Working on Your Feature + +Make commits as you normally would: + +```bash +# Make changes to your files +# Stage changes +git add . 
+ +# Commit with descriptive message +git commit -m "feat: add census extraction parser" + +# Push to GitHub +git push +``` + +**Commit Message Convention**: +- `feat:` - New feature +- `fix:` - Bug fix +- `docs:` - Documentation only +- `test:` - Adding or updating tests +- `refactor:` - Code refactoring +- `chore:` - Maintenance tasks + +**Pre-Commit Hook** (Automatic): + +When you commit, a pre-commit hook automatically checks if key documentation files need updating: + +``` +⚠️ DOCUMENTATION REVIEW REMINDER + +This commit may require updates to key documentation files: + + • Documentation structure changed + +Please review and update if necessary: + + 📄 CLAUDE.md - Project overview, structure, key patterns + 📄 README.md - User-facing docs, badges, quick start + ✓ AGENTS.md - Already modified +``` + +The hook will ask you to confirm before proceeding. This ensures CLAUDE.md, README.md, and AGENTS.md stay current. + +### 3. Creating a Pull Request + +When your feature is ready: + +```bash +# Make sure all changes are committed and pushed +git status +git push + +# Create PR using GitHub CLI +gh pr create --base develop --title "feat: add census extraction" --body "Description of changes" +``` + +Or create the PR through GitHub web interface. + +**PR Checklist**: +- [ ] All tests pass locally (`uv run pytest` or `/test`) +- [ ] Code is formatted (`uv run black .` or `/lint fix`) +- [ ] No linting errors (`uv run ruff check .` or `/lint`) +- [ ] Changes are documented (if needed) +- [ ] Pre-commit hook reviewed (auto-runs on commit) +- [ ] CLAUDE.md, README.md, AGENTS.md updated if needed + +### 4. Merging Your PR + +Once tests pass on GitHub Actions: + +```bash +# Merge using GitHub CLI (squash merge) +gh pr merge --squash --delete-branch + +# Or use GitHub web interface with "Squash and merge" button +``` + +Then update your local develop: + +```bash +git checkout develop +git pull origin develop +``` + +### 5. 
Releasing to Main (Milestone Complete) + +When you've completed a milestone from the roadmap (`docs/projects/ai-agent/roadmap.md`, formerly `AI_AGENT_TODO.md`): + +```bash +# Make sure develop is up to date +git checkout develop +git pull origin develop + +# Create PR to main +gh pr create --base main --title "Release: Milestone X Complete" --body "Summary of changes" + +# Review and merge with "Create a merge commit" (not squash) +gh pr merge --merge +``` + +## Common Scenarios + +### Switching Between Features + +```bash +# Save your current work +git add . +git commit -m "wip: partial implementation" +git push + +# Switch to different feature +git checkout feature/other-feature + +# Or start a new one +git checkout develop +git pull +git checkout -b feature/new-feature +``` + +### Updating Feature Branch with Latest Develop + +If develop has new changes you need: + +```bash +# On your feature branch +git checkout feature/my-feature + +# Get latest develop +git fetch origin develop + +# Merge develop into your feature +git merge origin/develop + +# Resolve any conflicts, then push +git push +``` + +### Abandoning a Feature + +```bash +# Switch back to develop +git checkout develop + +# Delete local branch +git branch -D feature/unwanted-feature + +# Delete remote branch +git push origin --delete feature/unwanted-feature +``` + +### Fixing a Mistake in Your Last Commit + +```bash +# Make the fix +git add . + +# Amend the last commit +git commit --amend --no-edit + +# Force push (only safe on feature branches!) +git push --force +``` + +## Automated Checks & Hooks + +RMAgent has three layers of automated checks: + +### 1. 
Git Pre-Commit Hook (Local) + +**Location:** `.git/hooks/pre-commit` + +**Triggers when:** +- Documentation structure changes → Reminds to check CLAUDE.md +- Test modifications → Reminds to check README.md for coverage stats +- Core code changes → Reminds to check CLAUDE.md +- Dependency updates → Reminds to check README.md + +**What it does:** +- Detects changes that might affect key documentation +- Shows which docs need review (with checkmarks for already-modified files) +- Prompts to confirm before allowing commit +- Prevents accidental outdated documentation + +**Example:** +``` +⚠️ DOCUMENTATION REVIEW REMINDER + + • Tests modified (check if coverage stats need updating) + +Please review and update if necessary: + + ✓ CLAUDE.md - Already modified + 📄 README.md - User-facing docs, badges, quick start + 📄 AGENTS.md - LangChain patterns, agent architecture + +Continue with commit? (y/n) +``` + +### 2. Claude Code Hooks (Development) + +**Location:** `.claude/settings.local.json` + +**PostToolUse Hook** - After running pytest with coverage: +``` +📊 Test coverage: 88% +💡 Reminder: Update coverage stats in CLAUDE.md and README.md if significantly changed +``` + +**PreToolUse Hook** - Before git push: +``` +⚠️ Pushing to remote. Recent commits: +66a29ca docs: reorganize documentation structure +e3937b5 feat: add Claude Code slash commands +``` + +**See [`docs/guides/claude-code-setup.md`](claude-code-setup.md) for full hook documentation.** + +### 3. GitHub Actions CI/CD (Remote) + +**Location:** `.github/workflows/pr-tests.yml` + +Every PR automatically runs: +1. **Linting** - `ruff check` and `black --check` +2. **Tests** - Full test suite with coverage +3. **Coverage Check** - Must maintain 80%+ coverage + +**Required for merge** due to branch protection on `develop` and `main`. + +**If tests fail**: +1. Check the Actions tab on GitHub for error details +2. Fix the issues locally +3. 
Push the fixes (CI runs again automatically) + +```bash +# Run the same checks locally before pushing +uv run black . +uv run ruff check . +uv run pytest --cov=rmagent --cov-fail-under=80 +``` + +## Cheat Sheet + +### Quick Commands + +```bash +# Start new feature +git checkout develop && git pull && git checkout -b feature/name + +# Save and push work +git add . && git commit -m "message" && git push + +# Create PR to develop +gh pr create --base develop + +# Merge PR and update local +gh pr merge --squash --delete-branch && git checkout develop && git pull + +# Check status +git status +git log --oneline -10 +git branch -a +``` + +### Useful Git Commands + +```bash +# See what changed +git diff # Uncommitted changes +git diff --staged # Staged changes +git log --oneline -10 # Recent commits + +# Undo changes +git checkout -- file.py # Discard changes to file +git reset HEAD file.py # Unstage file +git reset --soft HEAD~1 # Undo last commit (keep changes) + +# Branch info +git branch -a # List all branches +git branch -vv # Show tracking info +git remote -v # Show remotes +``` + +## Troubleshooting + +### "Your branch is behind origin/develop" + +```bash +git pull origin develop +``` + +### "Your branch has diverged from origin/develop" + +```bash +git fetch origin +git rebase origin/develop +# Force push only on your own feature branch; develop and main reject force pushes +git push --force-with-lease +``` + +### "Merge conflict" + +1. Git will mark conflicts in files with `<<<<<<<` markers +2. Edit files to resolve conflicts (remove markers, keep desired code) +3. Stage resolved files: `git add file.py` +4. Complete merge: `git commit` (or `git rebase --continue` if rebasing) +5. Push: `git push` + +### PR Tests Failing + +```bash +# Run tests locally to debug +uv run pytest -v + +# Check specific test +uv run pytest tests/unit/test_file.py::test_name -v + +# Run with more output +uv run pytest -vv -s +``` + +### Need Help? 
+ +- Check git status: `git status` +- View recent history: `git log --oneline --graph --all -10` +- See what branch you're on: `git branch` +- GitHub CLI help: `gh pr --help` + +## Best Practices + +1. **Keep features small** - Easier to review and merge +2. **Commit often** - Small, logical commits are better than large ones +3. **Write good commit messages** - Future you will thank you +4. **Pull before push** - Stay up to date with develop +5. **Test before PR** - Don't rely on CI to catch basic issues +6. **One feature per branch** - Don't mix unrelated changes +7. **Delete merged branches** - Keep your branch list clean +8. **Trust the hooks** - Pre-commit and Claude Code hooks help maintain quality +9. **Update docs proactively** - Don't wait for the hook to remind you +10. **Use slash commands** - `/test`, `/lint`, `/coverage` for quick checks + +## Resources + +### RMAgent Documentation +- **[git-for-newbies.md](git-for-newbies.md)** - Git collaboration fundamentals (pull vs fetch, branch relationships, sync strategies) +- **[developer-guide.md](developer-guide.md)** - Complete developer documentation +- **[claude-code-setup.md](claude-code-setup.md)** - Slash commands and hooks + +### External Resources +- [Understanding Git Branching](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) +- [Conventional Commits](https://www.conventionalcommits.org/) +- [GitHub CLI Manual](https://cli.github.com/manual/) +- [Gitflow Workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) diff --git a/TESTING.md b/docs/guides/testing-guide.md similarity index 100% rename from TESTING.md rename to docs/guides/testing-guide.md diff --git a/USAGE.md b/docs/guides/user-guide.md similarity index 100% rename from USAGE.md rename to docs/guides/user-guide.md diff --git a/docs/DATA_PARSING_TODO.md b/docs/projects/ai-agent/data-parsing-todo.md similarity index 100% rename from docs/DATA_PARSING_TODO.md rename to 
docs/projects/ai-agent/data-parsing-todo.md diff --git a/docs/RM_Features_using_Langchain.md b/docs/projects/ai-agent/langchain-features.md similarity index 100% rename from docs/RM_Features_using_Langchain.md rename to docs/projects/ai-agent/langchain-features.md diff --git a/docs/RM11_LangChain_Upgrade.md b/docs/projects/ai-agent/langchain-upgrade.md similarity index 100% rename from docs/RM11_LangChain_Upgrade.md rename to docs/projects/ai-agent/langchain-upgrade.md diff --git a/docs/MULTI_AGENT_PLAN.md b/docs/projects/ai-agent/multi-agent-plan.md similarity index 100% rename from docs/MULTI_AGENT_PLAN.md rename to docs/projects/ai-agent/multi-agent-plan.md diff --git a/docs/AI_AGENT_TODO.md b/docs/projects/ai-agent/roadmap.md similarity index 100% rename from docs/AI_AGENT_TODO.md rename to docs/projects/ai-agent/roadmap.md diff --git a/docs/RM11_TimelineTODO.md b/docs/projects/ai-agent/timeline-todo.md similarity index 100% rename from docs/RM11_TimelineTODO.md rename to docs/projects/ai-agent/timeline-todo.md diff --git a/docs/Biography_Citation_Implementation_Plan.md b/docs/projects/biography-citations/citation-implementation.md similarity index 100% rename from docs/Biography_Citation_Implementation_Plan.md rename to docs/projects/biography-citations/citation-implementation.md diff --git a/docs/MARRIED_NAME_SEARCH_OPTIMIZATION.md b/docs/projects/biography-citations/married-name-search.md similarity index 100% rename from docs/MARRIED_NAME_SEARCH_OPTIMIZATION.md rename to docs/projects/biography-citations/married-name-search.md diff --git a/docs/RM11_CensusExtraction_Architecture.md b/docs/projects/census-extraction/architecture.md similarity index 100% rename from docs/RM11_CensusExtraction_Architecture.md rename to docs/projects/census-extraction/architecture.md diff --git a/docs/RM11_CensusExtraction_Plan.md b/docs/projects/census-extraction/implementation-plan.md similarity index 100% rename from docs/RM11_CensusExtraction_Plan.md rename to 
docs/projects/census-extraction/implementation-plan.md diff --git a/data_reference/RM11_Biography_Best_Practices.md b/docs/reference/biography/biography-best-practices.md similarity index 100% rename from data_reference/RM11_Biography_Best_Practices.md rename to docs/reference/biography/biography-best-practices.md diff --git a/data_reference/RM11_Timeline_Construction.md b/docs/reference/biography/timeline-construction.md similarity index 100% rename from data_reference/RM11_Timeline_Construction.md rename to docs/reference/biography/timeline-construction.md diff --git a/data_reference/RM11_BLOB_CitationFields.md b/docs/reference/data-formats/blob-citation-fields.md similarity index 100% rename from data_reference/RM11_BLOB_CitationFields.md rename to docs/reference/data-formats/blob-citation-fields.md diff --git a/data_reference/RM11_BLOB_SourceFields.md b/docs/reference/data-formats/blob-source-fields.md similarity index 100% rename from data_reference/RM11_BLOB_SourceFields.md rename to docs/reference/data-formats/blob-source-fields.md diff --git a/data_reference/RM11_BLOB_SourceTemplateFieldDefs.md b/docs/reference/data-formats/blob-template-field-defs.md similarity index 100% rename from data_reference/RM11_BLOB_SourceTemplateFieldDefs.md rename to docs/reference/data-formats/blob-template-field-defs.md diff --git a/data_reference/RM11_Date_Format.md b/docs/reference/data-formats/date-format.md similarity index 100% rename from data_reference/RM11_Date_Format.md rename to docs/reference/data-formats/date-format.md diff --git a/data_reference/RM11_Date_Format.yaml b/docs/reference/data-formats/date-format.yaml similarity index 100% rename from data_reference/RM11_Date_Format.yaml rename to docs/reference/data-formats/date-format.yaml diff --git a/data_reference/RM11_FactTypes.md b/docs/reference/data-formats/fact-types.md similarity index 100% rename from data_reference/RM11_FactTypes.md rename to docs/reference/data-formats/fact-types.md diff --git 
a/data_reference/RM11_Place_Format.md b/docs/reference/data-formats/place-format.md similarity index 100% rename from data_reference/RM11_Place_Format.md rename to docs/reference/data-formats/place-format.md diff --git a/data_reference/RM11_Sentence_Templates.md b/docs/reference/data-formats/sentence-templates.md similarity index 100% rename from data_reference/RM11_Sentence_Templates.md rename to docs/reference/data-formats/sentence-templates.md diff --git a/data_reference/RM11_Data_Quality_Rules.md b/docs/reference/query-patterns/data-quality-rules.md similarity index 100% rename from data_reference/RM11_Data_Quality_Rules.md rename to docs/reference/query-patterns/data-quality-rules.md diff --git a/data_reference/RM11_Query_Patterns.md b/docs/reference/query-patterns/query-patterns.md similarity index 100% rename from data_reference/RM11_Query_Patterns.md rename to docs/reference/query-patterns/query-patterns.md diff --git a/data_reference/RM11_schema_annotated.sql b/docs/reference/schema/annotated-schema.sql similarity index 100% rename from data_reference/RM11_schema_annotated.sql rename to docs/reference/schema/annotated-schema.sql diff --git a/data_reference/RM11_DataDef.yaml b/docs/reference/schema/data-definitions.yaml similarity index 100% rename from data_reference/RM11_DataDef.yaml rename to docs/reference/schema/data-definitions.yaml diff --git a/data_reference/RM11_EventTable_Details.md b/docs/reference/schema/event-table-details.md similarity index 100% rename from data_reference/RM11_EventTable_Details.md rename to docs/reference/schema/event-table-details.md diff --git a/data_reference/RM11_Name_Display_Logic.md b/docs/reference/schema/name-display-logic.md similarity index 100% rename from data_reference/RM11_Name_Display_Logic.md rename to docs/reference/schema/name-display-logic.md diff --git a/data_reference/RM11_Relationships.md b/docs/reference/schema/relationships.md similarity index 100% rename from data_reference/RM11_Relationships.md 
rename to docs/reference/schema/relationships.md diff --git a/data_reference/RM11_Schema_Reference.md b/docs/reference/schema/schema-reference.md similarity index 100% rename from data_reference/RM11_Schema_Reference.md rename to docs/reference/schema/schema-reference.md diff --git a/data_reference/RM11_schema.json b/docs/reference/schema/schema.json similarity index 100% rename from data_reference/RM11_schema.json rename to docs/reference/schema/schema.json diff --git a/rmagent/agent/formatters.py b/rmagent/agent/formatters.py index d2b88cc..f18b765 100644 --- a/rmagent/agent/formatters.py +++ b/rmagent/agent/formatters.py @@ -8,7 +8,6 @@ from __future__ import annotations from rmagent.rmlib.parsers.date_parser import parse_rm_date -from rmagent.rmlib.queries import QueryService class GenealogyFormatters: @@ -74,7 +73,7 @@ def format_events(events, event_citations: dict[int, list[int]] | None = None) - # Add note if present (often contains full article transcriptions) if note: # Show "NOTE: " prefix only once, then indent subsequent lines - note_lines = note.split('\n') + note_lines = note.split("\n") for idx, note_line in enumerate(note_lines): if note_line.strip(): if idx == 0: @@ -233,9 +232,7 @@ def format_siblings(siblings) -> list[str]: return lines @staticmethod - def format_early_life( - person, parents, siblings, life_span: dict[str, int | None] - ) -> str: + def format_early_life(person, parents, siblings, life_span: dict[str, int | None]) -> str: """Format early life narrative with birth order, parental ages, migration notes.""" person_name = GenealogyFormatters.format_person_name(person) birth_year = life_span.get("birth_year") @@ -322,16 +319,10 @@ def format_family_losses(life_span, parents, spouses, siblings, children) -> str name = GenealogyFormatters.format_person_name(data) losses.append(f"- {name} ({relation}) died in {death_year_value}.") - return ( - "\n".join(losses) - if losses - else "No recorded family deaths occurred during the subject's 
lifetime." - ) + return "\n".join(losses) if losses else "No recorded family deaths occurred during the subject's lifetime." @staticmethod - def calculate_parent_age( - parents, birth_year_key: str, child_birth_year: int | None - ) -> int | None: + def calculate_parent_age(parents, birth_year_key: str, child_birth_year: int | None) -> int | None: """Calculate parent's age at child's birth.""" if not parents or child_birth_year is None: return None diff --git a/rmagent/agent/genealogy_agent.py b/rmagent/agent/genealogy_agent.py index 0bf0890..6664c45 100644 --- a/rmagent/agent/genealogy_agent.py +++ b/rmagent/agent/genealogy_agent.py @@ -63,9 +63,7 @@ class GenealogyAgent: # ---- Public API ----------------------------------------------------- - def generate_biography( - self, person_id: int, style: str = "standard", max_tokens: int | None = None - ) -> LLMResult: + def generate_biography(self, person_id: int, style: str = "standard", max_tokens: int | None = None) -> LLMResult: """Generate a narrative biography using the configured prompts/LLM.""" context = self._build_biography_context(person_id, style) @@ -84,9 +82,7 @@ def _run_validator(db: RMDatabase | None) -> QualityReport: return self._with_database(_run_validator) - def ask( - self, question: str, person_id: int | None = None, max_tokens: int | None = None - ) -> LLMResult: + def ask(self, question: str, person_id: int | None = None, max_tokens: int | None = None) -> LLMResult: """Answer ad-hoc questions with light context and persistent memory.""" context = self._build_qa_context(question, person_id) @@ -138,15 +134,11 @@ def _builder(db: RMDatabase | None) -> dict[str, str]: life_span, parents, spouses, siblings, children ) sibling_lines = GenealogyFormatters.format_siblings(siblings) - sibling_summary = ( - "\n".join(sibling_lines) if sibling_lines else "No sibling records available." - ) + sibling_summary = "\n".join(sibling_lines) if sibling_lines else "No sibling records available." 
# Extract person-level notes person_notes = person.get("Note") or "" - person_notes_formatted = ( - person_notes if person_notes else "No person-level notes available." - ) + person_notes_formatted = person_notes if person_notes else "No person-level notes available." # Generate style-specific length guidance length_guidance = self._get_length_guidance_for_style(style) @@ -185,9 +177,7 @@ def _builder(db: RMDatabase | None) -> dict[str, str]: snippets.append(GenealogyFormatters.format_family_overview(spouses, children, siblings)) snippets.append(GenealogyFormatters.format_early_life(person, parents, siblings, life_span)) - history_snippets = [ - f"Q: {turn.question}\nA: {turn.answer}" for turn in self._memory[-3:] - ] + history_snippets = [f"Q: {turn.question}\nA: {turn.answer}" for turn in self._memory[-3:]] snippets.extend(history_snippets) return { @@ -297,9 +287,7 @@ def _fetch_siblings(self, query: QueryService, parents: dict[str, str] | None, p ) return siblings - def _build_event_citations_map( - self, query: QueryService, events: list[dict] - ) -> dict[int, list[int]]: + def _build_event_citations_map(self, query: QueryService, events: list[dict]) -> dict[int, list[int]]: """ Build mapping of EventID -> list of CitationIDs for inline citation markers. @@ -333,9 +321,7 @@ def _build_event_citations_map( return event_citations_map - def _collect_all_citations_for_person( - self, query: QueryService, person_id: int - ) -> list[dict]: + def _collect_all_citations_for_person(self, query: QueryService, person_id: int) -> list[dict]: """ Collect all citations for a person's events using QueryService. Returns list of citation dicts with CitationID, SourceID, SourceName, CitationName, EventType. 
diff --git a/rmagent/agent/llm_provider.py b/rmagent/agent/llm_provider.py index 4f15b10..99ca3ed 100644 --- a/rmagent/agent/llm_provider.py +++ b/rmagent/agent/llm_provider.py @@ -86,9 +86,7 @@ def __init__( self.model = model self.default_max_tokens = default_max_tokens self.retry_config = retry_config or RetryConfig() - self.prompt_cost_per_1k, self.completion_cost_per_1k = ( - pricing_per_1k if pricing_per_1k else (0.0, 0.0) - ) + self.prompt_cost_per_1k, self.completion_cost_per_1k = pricing_per_1k if pricing_per_1k else (0.0, 0.0) def generate(self, prompt: str, **kwargs: Any) -> LLMResult: """Invoke provider with retry semantics.""" @@ -135,9 +133,7 @@ def _with_cost(self, result: LLMResult) -> LLMResult: def _invoke(self, prompt: str, **kwargs: Any) -> LLMResult: """Concrete providers implement this call.""" - def _log_debug( - self, prompt: str, result: LLMResult, elapsed: float, kwargs: dict[str, Any] - ) -> None: + def _log_debug(self, prompt: str, result: LLMResult, elapsed: float, kwargs: dict[str, Any]) -> None: debug_logger = logging.getLogger("rmagent.llm_debug") if not debug_logger.isEnabledFor(logging.DEBUG): return diff --git a/rmagent/agent/tools.py b/rmagent/agent/tools.py index 4097508..9dc8970 100644 --- a/rmagent/agent/tools.py +++ b/rmagent/agent/tools.py @@ -78,10 +78,7 @@ def __init__(self, query_service: QueryService): self.query_service = query_service def run(self, person_id: int, generations: int = 3): - return [ - dict(row) - for row in self.query_service.get_direct_ancestors(person_id, generations=generations) - ] + return [dict(row) for row in self.query_service.get_direct_ancestors(person_id, generations=generations)] @dataclass @@ -99,14 +96,8 @@ def run(self, person_a: int, person_b: int) -> dict[str, str | None]: if person_a == person_b: return {"relationship": "Same person"} - ancestors_a = { - row["PersonID"]: row - for row in self.query_service.get_direct_ancestors(person_a, generations=5) - } - ancestors_b = { - 
row["PersonID"]: row - for row in self.query_service.get_direct_ancestors(person_b, generations=5) - } + ancestors_a = {row["PersonID"]: row for row in self.query_service.get_direct_ancestors(person_a, generations=5)} + ancestors_b = {row["PersonID"]: row for row in self.query_service.get_direct_ancestors(person_b, generations=5)} shared = set(ancestors_a).intersection(ancestors_b) if not shared: @@ -137,8 +128,7 @@ def run(self): report = validator.run_all_checks() return { "totals_by_severity": { - k.value if hasattr(k, "value") else str(k): v - for k, v in report.totals_by_severity.items() + k.value if hasattr(k, "value") else str(k): v for k, v in report.totals_by_severity.items() }, "totals_by_category": report.totals_by_category, "issue_count": report.summary.get("issue_total", 0), diff --git a/rmagent/cli/commands/bio.py b/rmagent/cli/commands/bio.py index d8c7c75..96ebd68 100644 --- a/rmagent/cli/commands/bio.py +++ b/rmagent/cli/commands/bio.py @@ -99,7 +99,8 @@ def bio( }[citation_style.lower()] # Create generator and agent - config = ctx.load_config() + # Skip LLM credential validation if using template-based generation + config = ctx.load_config(require_llm_credentials=not no_ai) agent = ( None if no_ai diff --git a/rmagent/cli/commands/export.py b/rmagent/cli/commands/export.py index 174fada..807fa54 100644 --- a/rmagent/cli/commands/export.py +++ b/rmagent/cli/commands/export.py @@ -87,7 +87,7 @@ def hugo( }[bio_length.lower()] # Create exporter - config = ctx.load_config() + config = ctx.load_config(require_llm_credentials=False) exporter = HugoExporter( db=config.database.database_path, extension_path=config.database.sqlite_extension_path, @@ -100,9 +100,7 @@ def hugo( # Get all person IDs from rmagent.rmlib.database import RMDatabase - with RMDatabase( - config.database.database_path, extension_path=config.database.sqlite_extension_path - ) as db: + with RMDatabase(config.database.database_path, extension_path=config.database.sqlite_extension_path) 
as db: all_persons = db.query("SELECT PersonID FROM PersonTable") person_ids = [p["PersonID"] for p in all_persons] diff --git a/rmagent/cli/commands/person.py b/rmagent/cli/commands/person.py index ca1f76b..2973901 100644 --- a/rmagent/cli/commands/person.py +++ b/rmagent/cli/commands/person.py @@ -46,9 +46,7 @@ def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool raise click.Abort() # Display person header - name = ( - f"{_get_value(person_data, 'Given')} {_get_value(person_data, 'Surname')}".strip() - ) + name = f"{_get_value(person_data, 'Given')} {_get_value(person_data, 'Surname')}".strip() birth_year = _get_value(person_data, "BirthYear", "?") death_year = _get_value(person_data, "DeathYear", "?") console.print(f"\n[bold]📋 Person: {name}[/bold] ({birth_year}–{death_year})") @@ -68,9 +66,7 @@ def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool from rmagent.rmlib.parsers.date_parser import parse_rm_date date_str = _get_value(event, "Date") - formatted_date = ( - parse_rm_date(date_str).format_display() if date_str else "" - ) + formatted_date = parse_rm_date(date_str).format_display() if date_str else "" table.add_row( formatted_date, _get_value(event, "EventType"), @@ -89,15 +85,13 @@ def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool # Check for father if _get_value(parents_row, "FatherID"): father_name = ( - f"{_get_value(parents_row, 'FatherGiven')} " - f"{_get_value(parents_row, 'FatherSurname')}" + f"{_get_value(parents_row, 'FatherGiven')} " f"{_get_value(parents_row, 'FatherSurname')}" ).strip() console.print(f" • Father: {father_name}") # Check for mother if _get_value(parents_row, "MotherID"): mother_name = ( - f"{_get_value(parents_row, 'MotherGiven')} " - f"{_get_value(parents_row, 'MotherSurname')}" + f"{_get_value(parents_row, 'MotherGiven')} " f"{_get_value(parents_row, 'MotherSurname')}" ).strip() console.print(f" • Mother: {mother_name}") @@ -106,9 +100,7 @@ 
def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool if spouses: console.print("\n[bold]Spouses:[/bold]") for spouse in spouses: - spouse_name = ( - f"{_get_value(spouse, 'Given')} {_get_value(spouse, 'Surname')}".strip() - ) + spouse_name = f"{_get_value(spouse, 'Given')} {_get_value(spouse, 'Surname')}".strip() console.print(f" • {spouse_name}") # Get children @@ -116,9 +108,7 @@ def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool if children: console.print("\n[bold]Children:[/bold]") for child in children: - child_name = ( - f"{_get_value(child, 'Given')} {_get_value(child, 'Surname')}".strip() - ) + child_name = f"{_get_value(child, 'Given')} {_get_value(child, 'Surname')}".strip() console.print(f" • {child_name}") # Show ancestors if requested @@ -141,8 +131,7 @@ def person(ctx, person_id: int, events: bool, ancestors: bool, descendants: bool console.print("\n[bold]Descendants:[/bold] (4 generations)") for descendant in descendant_rows: descendant_name = ( - f"{_get_value(descendant, 'Given')} " - f"{_get_value(descendant, 'Surname')}" + f"{_get_value(descendant, 'Given')} " f"{_get_value(descendant, 'Surname')}" ).strip() gen = _get_value(descendant, "Generation", 1) indent = " " * gen diff --git a/rmagent/cli/commands/quality.py b/rmagent/cli/commands/quality.py index a53b50e..3bc6435 100644 --- a/rmagent/cli/commands/quality.py +++ b/rmagent/cli/commands/quality.py @@ -112,7 +112,7 @@ def quality( task = progress.add_task("Running data quality validation...", total=None) # Create generator - config = ctx.load_config() + config = ctx.load_config(require_llm_credentials=False) generator = QualityReportGenerator( db=config.database.database_path, extension_path=config.database.sqlite_extension_path, @@ -141,9 +141,7 @@ def quality( console.print() console.print(report_output) else: - console.print( - "[yellow]Warning:[/yellow] HTML and CSV formats require --output option" - ) + 
console.print("[yellow]Warning:[/yellow] HTML and CSV formats require --output option") except Exception as e: console.print(f"\n[red]Error:[/red] {e}") diff --git a/rmagent/cli/commands/search.py b/rmagent/cli/commands/search.py index 06d371f..421c5ee 100644 --- a/rmagent/cli/commands/search.py +++ b/rmagent/cli/commands/search.py @@ -1,6 +1,7 @@ """Search command - Search database by name or place.""" import re + import click from rich.console import Console from rich.table import Table @@ -22,9 +23,7 @@ def _get_value(row, key, default=""): def _get_surname_metaphone(db, surname: str) -> str | None: """Get Metaphone encoding for a surname from the database.""" # Query a sample name to get the Metaphone encoding - result = db.query_one( - "SELECT SurnameMP FROM NameTable WHERE Surname = ? COLLATE RMNOCASE LIMIT 1", (surname,) - ) + result = db.query_one("SELECT SurnameMP FROM NameTable WHERE Surname = ? COLLATE RMNOCASE LIMIT 1", (surname,)) return result["SurnameMP"] if result else None @@ -49,9 +48,9 @@ def _parse_name_variations(name: str, all_variants: list[str]) -> list[str]: return [name] # Extract brackets and base name (everything before first bracket) - bracket_pattern = r'\[([^\]]+)\]' + bracket_pattern = r"\[([^\]]+)\]" brackets = re.findall(bracket_pattern, name) - base_name = re.sub(bracket_pattern, '', name).strip() + base_name = re.sub(bracket_pattern, "", name).strip() if not brackets: return [name] @@ -200,9 +199,7 @@ def search( # Validate radius search options if kilometers is not None and miles is not None: - console.print( - "[red]Error:[/red] Cannot specify both --kilometers and --miles. Choose one." - ) + console.print("[red]Error:[/red] Cannot specify both --kilometers and --miles. 
Choose one.") raise click.Abort() radius_km = None @@ -221,9 +218,7 @@ def search( radius_unit = "mi" if radius_km is not None and not place: - console.print( - "[red]Error:[/red] Radius search requires --place to be specified" - ) + console.print("[red]Error:[/red] Radius search requires --place to be specified") raise click.Abort() with ctx.get_database() as db: @@ -232,7 +227,7 @@ def search( # Search by name if name: # Load config to get surname variants for [ALL] keyword - config = load_app_config(configure_logger=False) + config = load_app_config(configure_logger=False, require_llm_credentials=False) all_variants = config.search.surname_variants_all # Parse name variations (supports [variant] and [ALL] syntax) @@ -240,9 +235,7 @@ def search( # Show which variations are being searched if len(name_variations) > 1: - console.print( - f"[dim]Searching {len(name_variations)} name variations...[/dim]" - ) + console.print(f"[dim]Searching {len(name_variations)} name variations...[/dim]") # Collect results from all variations all_results = [] @@ -257,9 +250,7 @@ def search( # Single word - could be surname or given name # Try both try: - surname_results = queries.search_primary_names( - surname=name_parts[0], limit=limit - ) + surname_results = queries.search_primary_names(surname=name_parts[0], limit=limit) for r in surname_results: if r["PersonID"] not in seen_person_ids: all_results.append(r) @@ -267,9 +258,7 @@ def search( except ValueError: pass try: - given_results = queries.search_primary_names( - given=name_parts[0], limit=limit - ) + given_results = queries.search_primary_names(given=name_parts[0], limit=limit) for r in given_results: if r["PersonID"] not in seen_person_ids: all_results.append(r) @@ -313,15 +302,11 @@ def search( if len(variation.strip().split()) > 1: # Multi-word: Use word-based search (more precise) # This finds people where ALL words appear across name fields - variation_results = queries.search_names_by_words( - search_text=variation, 
limit=limit - ) + variation_results = queries.search_names_by_words(search_text=variation, limit=limit) else: # Single word: Use flexible search # This finds people where word appears in surname OR given name - variation_results = queries.search_names_flexible( - search_text=variation, limit=limit - ) + variation_results = queries.search_names_flexible(search_text=variation, limit=limit) # Add unique results for r in variation_results: @@ -347,9 +332,7 @@ def search( # Display name search results if results: - console.print( - f"\n[bold]🔍 Found {len(results)} person(s) matching '{name}':[/bold]" - ) + console.print(f"\n[bold]🔍 Found {len(results)} person(s) matching '{name}':[/bold]") console.print("─" * 60) table = Table(show_header=True, header_style="bold cyan") @@ -420,9 +403,7 @@ def search( ) if radius_results: - console.print( - f"\n[bold]🌍 Found {len(radius_results)} place(s) within radius:[/bold]" - ) + console.print(f"\n[bold]🌍 Found {len(radius_results)} place(s) within radius:[/bold]") table = Table(show_header=True, header_style="bold cyan") table.add_column("ID", style="dim", width=8) @@ -461,9 +442,7 @@ def search( else: # Standard place search (no radius) if place_results: - console.print( - f"\n[bold]📍 Found {len(place_results)} place(s) matching '{place}':[/bold]" - ) + console.print(f"\n[bold]📍 Found {len(place_results)} place(s) matching '{place}':[/bold]") console.print("─" * 60) table = Table(show_header=True, header_style="bold cyan") diff --git a/rmagent/cli/commands/timeline.py b/rmagent/cli/commands/timeline.py index 22d62f4..4ad35eb 100644 --- a/rmagent/cli/commands/timeline.py +++ b/rmagent/cli/commands/timeline.py @@ -70,7 +70,7 @@ def timeline( task = progress.add_task(f"Generating timeline for person {person_id}...", total=None) # Create generator - config = ctx.load_config() + config = ctx.load_config(require_llm_credentials=False) generator = TimelineGenerator( db=config.database.database_path, 
extension_path=config.database.sqlite_extension_path, diff --git a/rmagent/cli/main.py b/rmagent/cli/main.py index e72045e..553bfdd 100644 --- a/rmagent/cli/main.py +++ b/rmagent/cli/main.py @@ -36,10 +36,14 @@ def __init__( self.config = None self.db = None - def load_config(self): - """Load application configuration.""" + def load_config(self, require_llm_credentials: bool = True): + """Load application configuration. + + Args: + require_llm_credentials: When True, validate LLM provider credentials. + """ if not self.config: - self.config = load_app_config() + self.config = load_app_config(require_llm_credentials=require_llm_credentials) # Override with CLI options if provided if self.database_path: self.config.database.database_path = self.database_path @@ -50,7 +54,8 @@ def load_config(self): def get_database(self) -> RMDatabase: """Get database connection (creates if needed).""" if not self.db: - config = self.load_config() + # Database access doesn't require LLM credentials + config = self.load_config(require_llm_credentials=False) db_path = config.database.database_path if not db_path: raise click.UsageError( @@ -154,11 +159,11 @@ def completion(shell: str): # For fish rmagent completion fish """ - shell_upper = shell.upper() prog_name = "rmagent" if shell == "zsh": - click.echo(f"""# Add this to your ~/.zshrc: + click.echo( + f"""# Add this to your ~/.zshrc: eval "$(_RMAGENT_COMPLETE=zsh_source {prog_name})" # Or generate and save the completion script: @@ -166,23 +171,29 @@ def completion(shell: str): # Then add this to ~/.zshrc: fpath=(~/.zfunc $fpath) autoload -Uz compinit && compinit -""") +""" + ) elif shell == "bash": - click.echo(f"""# Add this to your ~/.bashrc: + click.echo( + f"""# Add this to your ~/.bashrc: eval "$(_RMAGENT_COMPLETE=bash_source {prog_name})" # Or generate and save the completion script: _RMAGENT_COMPLETE=bash_source {prog_name} > ~/.bash_completion.d/{prog_name} # Then add this to ~/.bashrc: source 
~/.bash_completion.d/{prog_name} -""") +""" + ) elif shell == "fish": - click.echo(f"""# Add this to ~/.config/fish/completions/{prog_name}.fish: + click.echo( + f"""# Add this to ~/.config/fish/completions/{prog_name}.fish: _RMAGENT_COMPLETE=fish_source {prog_name} | source # Or generate and save the completion script: _RMAGENT_COMPLETE=fish_source {prog_name} > ~/.config/fish/completions/{prog_name}.fish -""") +""" + ) + cli.add_command(person.person) cli.add_command(bio.bio) diff --git a/rmagent/config/config.py b/rmagent/config/config.py index e7aa005..4036357 100644 --- a/rmagent/config/config.py +++ b/rmagent/config/config.py @@ -90,9 +90,7 @@ class LLMSettings(BaseModel): def check_provider(cls, provider: str) -> str: provider_lower = provider.lower() if provider_lower not in cls.allowed_providers: - raise ValueError( - f"Unknown provider '{provider}'. Allowed: {sorted(cls.allowed_providers)}" - ) + raise ValueError(f"Unknown provider '{provider}'. Allowed: {sorted(cls.allowed_providers)}") return provider_lower def ensure_credentials(self) -> None: @@ -173,9 +171,7 @@ class CitationSettings(BaseModel): def check_style(cls, style: str) -> str: style_lower = style.lower() if style_lower not in cls.allowed_styles: - raise ValueError( - f"Invalid citation style '{style}'. Allowed: {sorted(cls.allowed_styles)}" - ) + raise ValueError(f"Invalid citation style '{style}'. Allowed: {sorted(cls.allowed_styles)}") return style_lower @@ -306,6 +302,7 @@ def load_app_config( env_path: Path | None = None, auto_create_dirs: bool = True, configure_logger: bool = True, + require_llm_credentials: bool = True, ) -> AppConfig: """ Load application configuration. @@ -314,6 +311,7 @@ def load_app_config( env_path: Optional path to a .env file. Defaults to config/.env when not provided. auto_create_dirs: When True, create output/export directories. configure_logger: When True, configure global logging handlers. 
+ require_llm_credentials: When True, validate LLM provider credentials. """ if env_path is None: env_path = DEFAULT_ENV_PATH @@ -336,9 +334,7 @@ def load_app_config( media_root = _env("RM_MEDIA_ROOT_DIRECTORY") database_settings = DatabaseSettings( database_path=Path(_env("RM_DATABASE_PATH", "data/Iiams.rmtree")), - sqlite_extension_path=Path( - _env("SQLITE_ICU_EXTENSION", "./sqlite-extension/icu.dylib") - ), + sqlite_extension_path=Path(_env("SQLITE_ICU_EXTENSION", "./sqlite-extension/icu.dylib")), media_root_directory=Path(media_root) if media_root else None, ) @@ -361,9 +357,7 @@ def load_app_config( ) search_settings = SearchSettings( - surname_variants_all=_env( - "SURNAME_VARIANTS_ALL", "Iams,Iames,Iiams,Iiames,Ijams,Ijames,Imes,Eimes" - ), + surname_variants_all=_env("SURNAME_VARIANTS_ALL", "Iams,Iames,Iiams,Iiames,Ijams,Ijames,Imes,Eimes"), ) logging_settings = LoggingSettings( @@ -391,10 +385,11 @@ def load_app_config( if configure_logger: configure_logging(config.logging) - try: - config.llm.ensure_credentials() - except ValueError as exc: - raise LLMError(str(exc)) from exc + if require_llm_credentials: + try: + config.llm.ensure_credentials() + except ValueError as exc: + raise LLMError(str(exc)) from exc return config diff --git a/rmagent/generators/biography.py b/rmagent/generators/biography.py deleted file mode 100644 index 50b1913..0000000 --- a/rmagent/generators/biography.py +++ /dev/null @@ -1,1579 +0,0 @@ -""" -Biography generator for RMAgent. - -Generates formatted biographical narratives following the 9-section structure -from RM11_Biography_Best_Practices.md. Handles privacy rules, citation formatting, -and length variations (short/standard/comprehensive). 
-""" - -from __future__ import annotations - -from dataclasses import dataclass, field -from datetime import datetime, timezone -from enum import Enum -from pathlib import Path -import time - -from rmagent.agent.genealogy_agent import GenealogyAgent -from rmagent.rmlib.database import RMDatabase -from rmagent.rmlib.models import OwnerType -from rmagent.rmlib.parsers.date_parser import is_unknown_date, parse_rm_date -from rmagent.rmlib.parsers.name_parser import format_full_name -from rmagent.rmlib.parsers.place_parser import format_place_medium, format_place_short -from rmagent.rmlib.queries import QueryService - - -class BiographyLength(str, Enum): - """Biography length variations.""" - - SHORT = "short" # 250-500 words (2-3 paragraphs) - STANDARD = "standard" # 500-1500 words (5-8 paragraphs) - COMPREHENSIVE = "comprehensive" # 1500+ words (10+ paragraphs, multiple sections) - - -class CitationStyle(str, Enum): - """Citation formatting styles.""" - - FOOTNOTE = "footnote" # Academic style with numbered footnotes - PARENTHETICAL = "parenthetical" # Genealogical style with inline source references - NARRATIVE = "narrative" # Popular style with narrative attribution - - -@dataclass -class LLMMetadata: - """Metadata from LLM generation for biography.""" - - provider: str # anthropic, openai, ollama - model: str - prompt_tokens: int - completion_tokens: int - total_tokens: int - prompt_time: float # seconds (context building) - llm_time: float # seconds (LLM generation) - cost: float | None = None - - -@dataclass -class EventContext: - """Contextual information for a single event.""" - - event_id: int - event_type: str - date: str # Formatted display date - place: str - details: str - note: str # Event note (EventTable.Note) - often contains full transcriptions - is_private: bool - proof: int - citations: list[dict] # CitationID, SourceID, Page, etc. 
- sort_date: int - - -@dataclass -class PersonContext: - """Complete person context for biography generation.""" - - person_id: int - full_name: str - given_name: str - surname: str - prefix: str | None - suffix: str | None - nickname: str | None - - birth_year: int | None - birth_date: str | None - birth_place: str | None - - death_year: int | None - death_date: str | None - death_place: str | None - - sex: int # 0=Male, 1=Female, 2=Unknown - is_private: bool - is_living: bool # Calculated based on 110-year rule - - # Person-level notes (PersonTable.Note) - person_notes: str | None = None - - # Relationships - father_id: int | None = None - father_name: str | None = None - mother_id: int | None = None - mother_name: str | None = None - spouses: list[dict] = field(default_factory=list) - children: list[dict] = field(default_factory=list) - siblings: list[dict] = field(default_factory=list) - - # Events categorized by type - vital_events: list[EventContext] = field(default_factory=list) - education_events: list[EventContext] = field(default_factory=list) - occupation_events: list[EventContext] = field(default_factory=list) - military_events: list[EventContext] = field(default_factory=list) - residence_events: list[EventContext] = field(default_factory=list) - other_events: list[EventContext] = field(default_factory=list) - - # Media - media_files: list[dict] = field(default_factory=list) - - # Sources - all_citations: list[dict] = field(default_factory=list) - - -@dataclass -class Biography: - """Generated biography with structured sections.""" - - person_id: int - full_name: str - length: BiographyLength - citation_style: CitationStyle - - # Generated content - introduction: str - early_life: str - education: str - career: str - marriage_family: str - later_life: str - death_legacy: str - footnotes: str # Footnotes section (only for FOOTNOTE citation style) - sources: str - - # Metadata - generated_at: datetime = field(default_factory=lambda: 
datetime.now(timezone.utc).astimezone()) - word_count: int = 0 - privacy_applied: bool = False - birth_year: int | None = None - death_year: int | None = None - llm_metadata: LLMMetadata | None = None - citation_count: int = 0 - source_count: int = 0 - media_files: list[dict] = field(default_factory=list) # Media files for images - - def _calculate_word_count(self) -> int: - """Calculate word count from all biography sections.""" - all_text = "\n".join([ - self.introduction, - self.early_life, - self.education, - self.career, - self.marriage_family, - self.later_life, - self.death_legacy, - self.footnotes, - self.sources, - ]) - return len(all_text.split()) - - @staticmethod - def _format_tokens(count: int) -> str: - """Format token count with k suffix.""" - if count >= 1000: - return f"{count/1000:.1f}k" - return str(count) - - @staticmethod - def _format_duration(seconds: float) -> str: - """Format duration as Xm Ys or Xs.""" - if seconds >= 60: - minutes = int(seconds // 60) - secs = int(seconds % 60) - return f"{minutes}m{secs}s" if secs > 0 else f"{minutes}m" - return f"{int(seconds)}s" - - def render_metadata(self) -> str: - """Render Hugo-style front matter metadata.""" - lines = ["---"] - - # Title with years - years_str = "" - if self.birth_year or self.death_year: - birth = self.birth_year or "????" - death = self.death_year or "????" 
- years_str = f" ({birth}-{death})" - lines.append(f'Title: "Biography of {self.full_name}{years_str}"') - - # Timestamp in ISO 8601 format with timezone (format as -05:00) - tz_str = self.generated_at.strftime("%z") - tz_formatted = f"{tz_str[:3]}:{tz_str[3:]}" if tz_str else "" - date_str = self.generated_at.strftime("%Y-%m-%dT%H:%M:%S") + tz_formatted - lines.append(f'Date: {date_str}') - - # Person ID - lines.append(f'PersonID: {self.person_id}') - - # LLM Metadata (if available) - if self.llm_metadata: - lines.append(f'TokensIn: {self._format_tokens(self.llm_metadata.prompt_tokens)}') - lines.append(f'TokensOut: {self._format_tokens(self.llm_metadata.completion_tokens)}') - lines.append(f'TotalTokens: {self._format_tokens(self.llm_metadata.total_tokens)}') - lines.append(f'LLM: {self.llm_metadata.provider.capitalize()}') - lines.append(f'Model: {self.llm_metadata.model}') - lines.append(f'PromptTime: {self._format_duration(self.llm_metadata.prompt_time)}') - lines.append(f'LLMTime: {self._format_duration(self.llm_metadata.llm_time)}') - - # Biography stats (calculate word count dynamically) - word_count = self._calculate_word_count() - lines.append(f'Words: {word_count:,}') - lines.append(f'Citations: {self.citation_count}') - lines.append(f'Sources: {self.source_count}') - - lines.append("---\n") - return "\n".join(lines) - - def render_markdown(self, include_metadata: bool = True) -> str: - """Render complete biography as Markdown with optional front matter.""" - sections = [] - - # Hugo-style front matter metadata - if include_metadata: - sections.append(self.render_metadata()) - - # Title with lifespan years - years_str = "" - if self.birth_year or self.death_year: - birth = self.birth_year or "????" - death = self.death_year or "????" 
- years_str = f" ({birth}-{death})" - sections.append(f"# Biography of {self.full_name}{years_str}\n") - - # Separate primary and additional images (only for STANDARD and COMPREHENSIVE) - primary_image = None - additional_images = [] - if self.length != BiographyLength.SHORT and self.media_files: - for media in self.media_files: - is_primary = media.get("IsPrimary", 0) == 1 if hasattr(media, 'get') else media["IsPrimary"] == 1 - if is_primary and primary_image is None: - primary_image = media - elif not is_primary: - additional_images.append(media) - - # Introduction - if self.introduction: - sections.append("## Introduction\n") - - # Add primary portrait image with text wrapping (if available) - if primary_image: - from pathlib import Path - # Format the media path - media_path = primary_image.get("MediaPath", "") if hasattr(primary_image, 'get') else primary_image["MediaPath"] - media_file = primary_image.get("MediaFile", "") if hasattr(primary_image, 'get') else primary_image["MediaFile"] - - # Strip RootsMagic's ?\ or ?/ prefix if present - if media_path.startswith("?\\"): - media_path = media_path[2:] - elif media_path.startswith("?/"): - media_path = media_path[2:] - - # Combine path components - if media_path: - full_path = Path(media_path) / media_file - else: - full_path = Path(media_file) - - # Convert to POSIX-style path for Markdown - image_path = full_path.as_posix() - - # Caption: "Full Name (birth_year-death_year)" - caption = f"{self.full_name}" - if self.birth_year or self.death_year: - birth = self.birth_year or "????" - death = self.death_year or "????" 
- caption += f" ({birth}-{death})" - - # Use HTML for text wrapping - align right with width constraint - sections.append(f'{caption}\n') - - sections.append(self.introduction) - sections.append("") - - # Early Life & Family Background - if self.early_life: - sections.append("## Early Life & Family Background\n") - sections.append(self.early_life) - sections.append("") - - # Education - if self.education: - sections.append("## Education\n") - sections.append(self.education) - sections.append("") - - # Career & Accomplishments - if self.career: - sections.append("## Career & Accomplishments\n") - sections.append(self.career) - sections.append("") - - # Marriage & Family - if self.marriage_family: - sections.append("## Marriage & Family\n") - sections.append(self.marriage_family) - sections.append("") - - # Later Life & Activities - if self.later_life: - sections.append("## Later Life & Activities\n") - sections.append(self.later_life) - sections.append("") - - # Death & Legacy - if self.death_legacy: - sections.append("## Death & Legacy\n") - sections.append(self.death_legacy) - sections.append("") - - # Photos (additional non-primary images) - if additional_images: - sections.append("## Photos\n") - for media in additional_images: - from pathlib import Path - # Format the media path - media_path = media.get("MediaPath", "") if hasattr(media, 'get') else media["MediaPath"] - media_file = media.get("MediaFile", "") if hasattr(media, 'get') else media["MediaFile"] - - # Strip RootsMagic's ?\ or ?/ prefix if present - if media_path.startswith("?\\"): - media_path = media_path[2:] - elif media_path.startswith("?/"): - media_path = media_path[2:] - - # Combine path components - if media_path: - full_path = Path(media_path) / media_file - else: - full_path = Path(media_file) - - # Convert to POSIX-style path for Markdown - image_path = full_path.as_posix() - - # Caption: "Full Name (birth_year-death_year)" - caption = f"{self.full_name}" - if self.birth_year or 
self.death_year: - birth = self.birth_year or "????" - death = self.death_year or "????" - caption += f" ({birth}-{death})" - - # Standard markdown image format (no text wrapping for additional images) - sections.append(f"![{caption}]({image_path})\n") - sections.append(f"*{caption}*\n") - sections.append("") - - # Footnotes (only for FOOTNOTE citation style) - if self.footnotes and self.citation_style == CitationStyle.FOOTNOTE: - sections.append("## Footnotes\n") - sections.append(self.footnotes) - sections.append("") - - # Sources - if self.sources: - sections.append("## Sources\n") - sections.append(self.sources) - sections.append("") - - content = "\n".join(sections) - # Update word_count for consistency (though metadata renders dynamically) - self.word_count = self._calculate_word_count() - return content - - def __str__(self) -> str: - """String representation returns rendered markdown.""" - return self.render_markdown() - - -@dataclass -class CitationInfo: - """Formatted citation information for footnotes and bibliography.""" - - citation_id: int - source_id: int - footnote: str # Full footnote (first use) - short_footnote: str # Short footnote (subsequent use) - bibliography: str # Bibliography entry - is_freeform: bool # True if TemplateID == 0 - template_name: str | None # Template name if not free-form - - -@dataclass -class CitationTracker: - """Track citations for footnote numbering and source-level deduplication.""" - - # Map: CitationID -> FootnoteNumber - citation_to_footnote: dict[int, int] = field(default_factory=dict) - - # Map: SourceID -> first CitationID encountered - source_first_citation: dict[int, int] = field(default_factory=dict) - - # Ordered list of citations as they appear in text - citation_order: list[int] = field(default_factory=list) - - def add_citation(self, citation_id: int, source_id: int) -> int: - """ - Add citation to tracker, returns footnote number. - Tracks first citation per source for full vs short footnote logic. 
- """ - if citation_id in self.citation_to_footnote: - # Already encountered, return existing number - return self.citation_to_footnote[citation_id] - - # New citation - footnote_num = len(self.citation_order) + 1 - self.citation_to_footnote[citation_id] = footnote_num - self.citation_order.append(citation_id) - - # Track first citation for this source - if source_id not in self.source_first_citation: - self.source_first_citation[source_id] = citation_id - - return footnote_num - - def is_first_for_source(self, citation_id: int, source_id: int) -> bool: - """Check if this is the first citation for a given source.""" - return self.source_first_citation.get(source_id) == citation_id - - -def _get_row_value(row, key: str, default=None): - """Get value from sqlite3.Row object with default.""" - try: - return row[key] if key in row.keys() else default - except (KeyError, TypeError): - return default - - -class BiographyGenerator: - """ - Generate biographical narratives from RootsMagic data. - - Follows the 9-section structure from RM11_Biography_Best_Practices.md: - 1. Introduction (Birth & Identity) - 2. Early Life & Family Background - 3. Education - 4. Career & Occupation - 5. Marriage & Family Life - 6. Later Life & Activities - 7. Death & Legacy - 8. 
Sources & Notes - - Args: - db: RMDatabase instance or path to database - agent: Optional GenealogyAgent for AI-powered narrative generation - extension_path: Path to ICU extension (default: ./sqlite-extension/icu.dylib) - current_year: Current year for 110-year rule calculation (default: current year) - - Example: - ```python - from rmagent.generators.biography import BiographyGenerator - from rmagent.agent.genealogy_agent import GenealogyAgent - - agent = GenealogyAgent(llm_provider=provider, db_path="data/Iiams.rmtree") - generator = BiographyGenerator(db_path="data/Iiams.rmtree", agent=agent) - - bio = generator.generate( - person_id=1, - length=BiographyLength.STANDARD, - citation_style=CitationStyle.FOOTNOTE, - include_sources=True - ) - - print(bio.render_markdown()) - ``` - """ - - def __init__( - self, - db: RMDatabase | Path | str | None = None, - agent: GenealogyAgent | None = None, - extension_path: Path | str = Path("./sqlite-extension/icu.dylib"), - current_year: int | None = None, - ): - # Handle db parameter - if isinstance(db, (Path, str)): - self.db_path = Path(db) - self._db = None - self._owns_db = True - elif isinstance(db, RMDatabase): - self.db_path = None - self._db = db - self._owns_db = False - else: - self.db_path = None - self._db = None - self._owns_db = False - - self.agent = agent - self.extension_path = Path(extension_path) - self.current_year = current_year or datetime.now().year - - def generate( - self, - person_id: int, - length: BiographyLength = BiographyLength.STANDARD, - citation_style: CitationStyle = CitationStyle.FOOTNOTE, - include_sources: bool = True, - include_media: bool = True, - use_ai: bool = True, - ) -> Biography: - """ - Generate a biography for the specified person. 
- - Args: - person_id: PersonID from PersonTable - length: Biography length (short/standard/comprehensive) - citation_style: Citation formatting style - include_sources: Include sources section - include_media: Include media references - use_ai: Use AI agent for narrative generation (requires agent parameter) - - Returns: - Biography instance with all sections populated - - Raises: - ValueError: If person not found or if use_ai=True but no agent provided - """ - # Check for agent requirement early - if use_ai and not self.agent: - raise ValueError("AI generation requested but no agent provided") - - # Extract person context - context = self._extract_person_context(person_id, include_media) - - # Apply privacy rules - self._apply_privacy_rules(context) - - # Generate biography sections - if use_ai and self.agent: - biography = self._generate_with_ai(context, length, citation_style, include_sources) - else: - biography = self._generate_template_based( - context, length, citation_style, include_sources - ) - - return biography - - # ---- Private Methods: Data Extraction ---- - - def _extract_person_context(self, person_id: int, include_media: bool = True) -> PersonContext: - """Extract complete person context from database.""" - - def _extract(db: RMDatabase) -> PersonContext: - query = QueryService(db) - - # Get person with primary name - person = query.get_person_with_primary_name(person_id) - if not person: - raise ValueError(f"Person {person_id} not found") - - # Extract name components - full_name = format_full_name( - given=_get_row_value(person, "Given"), - surname=_get_row_value(person, "Surname"), - prefix=_get_row_value(person, "Prefix"), - suffix=_get_row_value(person, "Suffix"), - ) - - # Calculate is_living based on 110-year rule - birth_year = _get_row_value(person, "BirthYear") - is_living = False - if birth_year: - age = self.current_year - birth_year - is_living = age < 110 - - # Extract birth/death information - birth_date_str, birth_place = 
self._extract_vital_info( - db, person_id, fact_type_id=1 - ) # Birth - death_date_str, death_place = self._extract_vital_info( - db, person_id, fact_type_id=2 - ) # Death - - # Get relationships - parents = query.get_parents(person_id) - father_name = None - mother_name = None - father_id = None - mother_id = None - - if parents: - father_id = _get_row_value(parents, "FatherID") - mother_id = _get_row_value(parents, "MotherID") - if father_id: - father_name = format_full_name( - given=_get_row_value(parents, "FatherGiven"), - surname=_get_row_value(parents, "FatherSurname"), - ) - if mother_id: - mother_name = format_full_name( - given=_get_row_value(parents, "MotherGiven"), - surname=_get_row_value(parents, "MotherSurname"), - ) - - spouses = query.get_spouses(person_id) or [] - children = query.get_children(person_id) or [] - siblings = [] # TODO: Implement get_siblings() in QueryService - - # Get all events and categorize - all_events = query.get_person_events(person_id) - ( - vital_events, - education_events, - occupation_events, - military_events, - residence_events, - other_events, - ) = self._categorize_events(db, all_events) - - # Get media if requested - media_files = [] - if include_media: - media_files = self._get_media_for_person(db, person_id) - - # Get all citations - all_citations = self._get_all_citations_for_person(db, person_id) - - # Extract person-level notes - person_notes = _get_row_value(person, "Note") - - return PersonContext( - person_id=person_id, - full_name=full_name, - given_name=_get_row_value(person, "Given", ""), - surname=_get_row_value(person, "Surname", ""), - prefix=_get_row_value(person, "Prefix"), - suffix=_get_row_value(person, "Suffix"), - nickname=_get_row_value(person, "Nickname"), - birth_year=birth_year, - birth_date=birth_date_str, - birth_place=birth_place, - death_year=_get_row_value(person, "DeathYear"), - death_date=death_date_str, - death_place=death_place, - sex=_get_row_value(person, "Sex", 2), - 
is_private=bool(_get_row_value(person, "IsPrivate", 0)), - is_living=is_living, - person_notes=person_notes, - father_id=father_id, - father_name=father_name, - mother_id=mother_id, - mother_name=mother_name, - spouses=spouses, - children=children, - siblings=siblings, - vital_events=vital_events, - education_events=education_events, - occupation_events=occupation_events, - military_events=military_events, - residence_events=residence_events, - other_events=other_events, - media_files=media_files, - all_citations=all_citations, - ) - - if self._db: - return _extract(self._db) - elif self.db_path: - with RMDatabase(self.db_path, extension_path=self.extension_path) as db: - return _extract(db) - else: - raise ValueError("No database provided") - - def _extract_vital_info( - self, db: RMDatabase, person_id: int, fact_type_id: int - ) -> tuple[str | None, str | None]: - """Extract date and place for a vital event (birth/death).""" - query = QueryService(db) - vital_events = query.get_vital_events(person_id) - - for event in vital_events: - if _get_row_value(event, "FactTypeID") == fact_type_id: - # Parse date - date_str = _get_row_value(event, "Date") - formatted_date = None - if date_str and not is_unknown_date(date_str): - try: - parsed_date = parse_rm_date(date_str) - formatted_date = parsed_date.format_display() - except Exception: - formatted_date = None - - # Format place - place_str = _get_row_value(event, "Place") - formatted_place = None - if place_str: - try: - formatted_place = format_place_medium(place_str) - except Exception: - formatted_place = place_str - - return formatted_date, formatted_place - - return None, None - - def _categorize_events( - self, db: RMDatabase, events: list[dict] - ) -> tuple[list[EventContext], ...]: - """Categorize events into vital, education, occupation, military, residence, and other.""" - vital = [] - education = [] - occupation = [] - military = [] - residence = [] - other = [] - - for event in events: - event_ctx = 
self._build_event_context(db, event) - event_type = _get_row_value(event, "EventType", 0) - - # Categorize based on FactTypeID - # Vital: Birth (1), Death (2), Burial (3), Baptism (4), Christening (5) - if event_type in (1, 2, 3, 4, 5, 6): - vital.append(event_ctx) - # Education: Education (17), Graduation (18) - elif event_type in (17, 18): - education.append(event_ctx) - # Occupation: Occupation (12), Retirement (27) - elif event_type in (12, 27): - occupation.append(event_ctx) - # Military: Military Service (10), Drafted (63), Military Discharge (64) - elif event_type in (10, 63, 64): - military.append(event_ctx) - # Residence: Residence (13), Immigration (20), Emigration (19) - elif event_type in (13, 19, 20): - residence.append(event_ctx) - else: - other.append(event_ctx) - - return vital, education, occupation, military, residence, other - - def _build_event_context(self, db: RMDatabase, event: dict) -> EventContext: - """Build EventContext from event row.""" - # Parse date - date_str = _get_row_value(event, "Date", "") - formatted_date = "" - if date_str and not is_unknown_date(date_str): - try: - parsed = parse_rm_date(date_str) - formatted_date = parsed.format_display() - except Exception: - formatted_date = date_str - - # Format place - place_str = _get_row_value(event, "Place", "") - formatted_place = "" - if place_str: - try: - formatted_place = format_place_short(place_str) - except Exception: - formatted_place = place_str - - # Get citations for this event - citations = self._get_citations_for_event(db, _get_row_value(event, "EventID", 0)) - - return EventContext( - event_id=_get_row_value(event, "EventID", 0), - event_type=_get_row_value(event, "EventType", ""), - date=formatted_date, - place=formatted_place, - details=_get_row_value(event, "Details", ""), - note=_get_row_value(event, "Note", ""), - is_private=bool(_get_row_value(event, "IsPrivate", 0)), - proof=_get_row_value(event, "Proof", 0), - citations=citations, - 
sort_date=_get_row_value(event, "SortDate", 0), - ) - - def _get_citations_for_event(self, db: RMDatabase, event_id: int) -> list[dict]: - """Get all citations for an event with formatted text (Footnote, ShortFootnote, Bibliography).""" - query = QueryService(db) - return query.get_event_citations(event_id) - - def _get_media_for_person(self, db: RMDatabase, person_id: int) -> list[dict]: - """Get all media files linked to person, including path and primary flag.""" - cursor = db.execute( - """ - SELECT m.MediaID, m.MediaPath, m.MediaFile, m.Caption, m.Date, m.Description, - ml.IsPrimary, ml.SortOrder - FROM MediaLinkTable ml - JOIN MultimediaTable m ON ml.MediaID = m.MediaID - WHERE ml.OwnerType = ? AND ml.OwnerID = ? - ORDER BY ml.IsPrimary DESC, ml.SortOrder, ml.LinkID - """, - (OwnerType.PERSON.value, person_id), - ) - return cursor.fetchall() - - def _get_all_citations_for_person(self, db: RMDatabase, person_id: int) -> list[dict]: - """Get all citations associated with person (via events, names, etc.) with full citation data.""" - query = QueryService(db) - - # Get all events for person - events = query.get_person_events(person_id) - - # Collect citations from all events (deduplicated by CitationID) - all_citations = [] - seen_citation_ids = set() - - for event in events: - event_id = _get_row_value(event, "EventID") - if not event_id: - continue - - # Get full citation data including BLOBs - citations = query.get_event_citations(event_id) - for citation in citations: - citation_id = _get_row_value(citation, "CitationID") - if citation_id in seen_citation_ids: - continue - seen_citation_ids.add(citation_id) - all_citations.append(citation) - - return all_citations - - def _format_media_path(self, media_path: str, media_file: str) -> str: - r""" - Format media path for local file access. - - Converts RootsMagic's ?\path notation to a path relative to the database directory. 
- - Args: - media_path: MediaPath from MultimediaTable (e.g., "?\Pictures - People") - media_file: MediaFile from MultimediaTable (e.g., "Iams, Franklin Pierce (1852-1917).jpg") - - Returns: - Formatted path relative to database directory - """ - # Strip RootsMagic's ?\ or ?/ prefix if present - if media_path.startswith("?\\"): - media_path = media_path[2:] - elif media_path.startswith("?/"): - media_path = media_path[2:] - - # Combine path components - if media_path: - # Use Path to handle cross-platform separators - full_path = Path(media_path) / media_file - else: - full_path = Path(media_file) - - # Convert to POSIX-style path (forward slashes) for Markdown - return full_path.as_posix() - - def _calculate_age_at_death(self, birth_year: int | None, death_year: int | None) -> int | None: - """Calculate age at death from birth and death years.""" - if birth_year and death_year: - return death_year - birth_year - return None - - # ---- Private Methods: Privacy Rules ---- - - def _apply_privacy_rules(self, context: PersonContext) -> None: - """Apply privacy rules to person context (modifies in place).""" - # If person is marked private or likely living, filter events - if context.is_private or context.is_living: - context.privacy_applied = True # type: ignore[misc] - - # Remove all private events - context.vital_events = [e for e in context.vital_events if not e.is_private] - context.education_events = [e for e in context.education_events if not e.is_private] - context.occupation_events = [e for e in context.occupation_events if not e.is_private] - context.military_events = [e for e in context.military_events if not e.is_private] - context.residence_events = [e for e in context.residence_events if not e.is_private] - context.other_events = [e for e in context.other_events if not e.is_private] - - # If likely living, also remove sensitive event types - if context.is_living: - # Remove occupation, residence, and education events for living persons - 
context.occupation_events = [] - context.residence_events = [] - context.education_events = [] - - # ---- Private Methods: Biography Generation ---- - - def _generate_with_ai( - self, - context: PersonContext, - length: BiographyLength, - citation_style: CitationStyle, - include_sources: bool, - ) -> Biography: - """Generate biography using AI agent.""" - if not self.agent: - raise ValueError("AI generation requested but no agent provided") - - # Time the prompt building and LLM generation - prompt_start = time.time() - - # Use agent's generate_biography method (includes internal timing) - result = self.agent.generate_biography(person_id=context.person_id, style=length.value) - - total_time = time.time() - prompt_start - - # Extract LLM metadata from result - llm_metadata = None - if hasattr(self.agent, 'llm_provider'): - provider_name = self.agent.llm_provider.__class__.__name__.replace('Provider', '').lower() - llm_metadata = LLMMetadata( - provider=provider_name, - model=result.model, - prompt_tokens=result.usage.prompt_tokens, - completion_tokens=result.usage.completion_tokens, - total_tokens=result.usage.total_tokens, - prompt_time=total_time * 0.1, # Estimate ~10% for prompt building - llm_time=total_time * 0.9, # Estimate ~90% for LLM - cost=result.cost, - ) - - # Process citations for FOOTNOTE style BEFORE parsing into sections - footnotes_text = "" - sources_text = "" - response_text = result.text - citation_count = 0 - source_count = 0 - - if citation_style == CitationStyle.FOOTNOTE: - # Process {cite:ID} markers in full response (preserves section headers) - modified_text, footnotes, tracker = self._process_citations_in_text( - response_text, context.all_citations - ) - - # Use modified text for section parsing - response_text = modified_text - - # Generate footnotes section - if footnotes: - footnotes_text = self._generate_footnotes_section(footnotes, tracker) - citation_count = len(footnotes) - - # Generate sources section using new bibliography method 
- if include_sources: - sources_text = self._generate_sources_section(context.all_citations) - # Count unique sources - source_ids = set() - for citation in context.all_citations: - source_id = _get_row_value(citation, "SourceID", 0) - if source_id: - source_ids.add(source_id) - source_count = len(source_ids) - else: - # For other citation styles, use existing format - if include_sources: - sources_text = self._format_sources_section(context, citation_style) - citation_count = len(context.all_citations) - # Count unique sources - source_ids = set() - for citation in context.all_citations: - source_id = _get_row_value(citation, "SourceID", 0) - if source_id: - source_ids.add(source_id) - source_count = len(source_ids) - - # Parse AI response into sections (after citation processing) - sections = self._parse_ai_response(response_text) - - return Biography( - person_id=context.person_id, - full_name=context.full_name, - length=length, - citation_style=citation_style, - introduction=sections.get("introduction", ""), - early_life=sections.get("early_life", ""), - education=sections.get("education", ""), - career=sections.get("career", ""), - marriage_family=sections.get("marriage_family", ""), - later_life=sections.get("later_life", ""), - death_legacy=sections.get("death_legacy", ""), - footnotes=footnotes_text, - sources=sources_text, - privacy_applied=getattr(context, "privacy_applied", False), - birth_year=context.birth_year, - death_year=context.death_year, - llm_metadata=llm_metadata, - citation_count=citation_count, - source_count=source_count, - media_files=context.media_files, - ) - - def _generate_template_based( - self, - context: PersonContext, - length: BiographyLength, - citation_style: CitationStyle, - include_sources: bool, - ) -> Biography: - """Generate biography using template-based approach (no AI).""" - # Generate each section using templates - intro = self._generate_introduction(context) - early_life = self._generate_early_life(context) - education 
= self._generate_education(context) - career = self._generate_career(context) - marriage = self._generate_marriage_family(context) - later_life = self._generate_later_life(context) - death = self._generate_death_legacy(context) - - sources_text = "" - citation_count = 0 - source_count = 0 - if include_sources: - sources_text = self._format_sources_section(context, citation_style) - citation_count = len(context.all_citations) - # Count unique sources - source_ids = set() - for citation in context.all_citations: - source_id = _get_row_value(citation, "SourceID", 0) - if source_id: - source_ids.add(source_id) - source_count = len(source_ids) - - return Biography( - person_id=context.person_id, - full_name=context.full_name, - length=length, - citation_style=citation_style, - introduction=intro, - early_life=early_life, - education=education, - career=career, - marriage_family=marriage, - later_life=later_life, - death_legacy=death, - footnotes="", # Template-based biographies don't use citations - sources=sources_text, - privacy_applied=getattr(context, "privacy_applied", False), - birth_year=context.birth_year, - death_year=context.death_year, - llm_metadata=None, # No LLM used for template-based - citation_count=citation_count, - source_count=source_count, - media_files=context.media_files, - ) - - # ---- Private Methods: Template Generation ---- - - def _generate_introduction(self, context: PersonContext) -> str: - """Generate introduction section.""" - lines = [] - - # Basic intro: Name was born on [date] in [place] - birth_info = "" - if context.birth_date: - birth_info = f" on {context.birth_date}" - if context.birth_place: - birth_info += f" in {context.birth_place}" - - if birth_info: - lines.append(f"{context.full_name} was born{birth_info}.") - else: - lines.append(f"{context.full_name}'s birth date and place are not recorded.") - - # Parents - if context.father_name or context.mother_name: - parent_info = [] - if context.father_name: - 
parent_info.append(context.father_name) - if context.mother_name: - parent_info.append(context.mother_name) - parent_str = " and ".join(parent_info) - pronoun = "He" if context.sex == 0 else "She" if context.sex == 1 else "They" - verb = "was" if context.sex != 2 else "were" - lines.append(f"{pronoun} {verb} the child of {parent_str}.") - - # Death information (if applicable) - if context.death_date or context.death_place: - death_info = "" - pronoun = "He" if context.sex == 0 else "She" if context.sex == 1 else "They" - verb = "died" if context.sex != 2 else "died" - - if context.death_date: - death_info = f" on {context.death_date}" - if context.death_place: - death_info += f" in {context.death_place}" - - # Calculate age at death if both years available - age = self._calculate_age_at_death(context.birth_year, context.death_year) - if age is not None: - death_info += f" at the age of {age}" - - lines.append(f"{pronoun} {verb}{death_info}.") - - return " ".join(lines) - - def _generate_early_life(self, context: PersonContext) -> str: - """Generate early life section.""" - if not context.siblings: - return "" - - sibling_count = len(context.siblings) - pronoun = "He" if context.sex == 0 else "She" if context.sex == 1 else "They" - verb = "grew" if context.sex != 2 else "grew" - - if sibling_count == 0: - return f"{pronoun} {verb} up as an only child." - elif sibling_count == 1: - return f"{pronoun} had one sibling." - else: - return f"{pronoun} had {sibling_count} siblings." 
- - def _generate_education(self, context: PersonContext) -> str: - """Generate education section.""" - if not context.education_events: - return "" - - lines = [] - for event in context.education_events: - event_desc = f"{event.date}" if event.date else "At an unknown date" - if event.place: - event_desc += f" in {event.place}" - if event.details: - event_desc += f", {event.details}" - lines.append(event_desc + ".") - - return " ".join(lines) - - def _generate_career(self, context: PersonContext) -> str: - """Generate career section.""" - if not context.occupation_events: - return "" - - lines = [] - for event in context.occupation_events: - if event.details: - desc = f"{context.given_name} worked as {event.details}" - if event.date: - desc += f" in {event.date}" - if event.place: - desc += f" in {event.place}" - lines.append(desc + ".") - - return " ".join(lines) - - def _generate_marriage_family(self, context: PersonContext) -> str: - """Generate marriage and family section.""" - lines = [] - - # Marriages - if context.spouses: - for spouse in context.spouses: - spouse_name = format_full_name( - given=_get_row_value(spouse, "Given"), - surname=_get_row_value(spouse, "Surname"), - ) - marriage_date = _get_row_value(spouse, "MarriageDate") - if marriage_date and not is_unknown_date(marriage_date): - try: - parsed = parse_rm_date(marriage_date) - date_str = parsed.format_display() - lines.append(f"{context.given_name} married {spouse_name} on {date_str}.") - except Exception: - lines.append(f"{context.given_name} married {spouse_name}.") - else: - lines.append(f"{context.given_name} married {spouse_name}.") - - # Children - if context.children: - child_count = len(context.children) - if child_count == 1: - lines.append("They had one child.") - else: - lines.append(f"They had {child_count} children.") - - return " ".join(lines) - - def _generate_later_life(self, context: PersonContext) -> str: - """Generate later life section.""" - # Could include residence changes, 
later events - if context.residence_events: - places = [e.place for e in context.residence_events if e.place] - if places: - return f"{context.given_name} resided in {', '.join(places[:3])}." - return "" - - def _generate_death_legacy(self, context: PersonContext) -> str: - """Generate death and legacy section.""" - if not context.death_date and not context.death_place: - return "" - - death_info = "" - if context.death_date: - death_info = f" on {context.death_date}" - if context.death_place: - death_info += f" in {context.death_place}" - - pronoun = "He" if context.sex == 0 else "She" if context.sex == 1 else "They" - verb = "died" if context.sex != 2 else "died" - - return f"{pronoun} {verb}{death_info}." - - def _format_sources_section(self, context: PersonContext, citation_style: CitationStyle) -> str: - """Format sources section based on citation style.""" - if not context.all_citations: - return "" - - lines = [] - for i, citation in enumerate(context.all_citations, 1): - source_name_raw = _get_row_value(citation, "SourceName", "Unknown Source") - citation_name = _get_row_value(citation, "CitationName", "") - - # Remove source type prefixes like "Book: " or "Newspapers: " - source_name = self._strip_source_type_prefix(source_name_raw) - - if citation_style == CitationStyle.FOOTNOTE: - lines.append(f"{i}. *{source_name}*") - if citation_name: - lines.append(f" {citation_name}") - elif citation_style == CitationStyle.PARENTHETICAL: - lines.append(f"- *{source_name}*") - if citation_name: - lines.append(f" ({citation_name})") - else: # NARRATIVE - if citation_name: - lines.append(f"- *{source_name}*: {citation_name}") - else: - lines.append(f"- *{source_name}*") - - return "\n".join(lines) - - @staticmethod - def _strip_source_type_prefix(source_name: str) -> str: - """ - Remove source type prefixes like 'Book: ', 'Newspapers: ', etc. 
- - Examples: - "Book: Smith Family History" -> "Smith Family History" - "Newspapers: Baltimore Sun" -> "Baltimore Sun" - "US Census Records" -> "US Census Records" (no change) - """ - # Common source type prefixes in RootsMagic - prefixes = [ - "Book: ", - "Books: ", - "Newspaper: ", - "Newspapers: ", - "Cemetery: ", - "Cemeteries: ", - "Census: ", - "Church Records: ", - "Court Records: ", - "Military Records: ", - "Vital Records: ", - "Website: ", - "Websites: ", - "Document: ", - "Documents: ", - "Letter: ", - "Letters: ", - "Photo: ", - "Photos: ", - ] - - for prefix in prefixes: - if source_name.startswith(prefix): - return source_name[len(prefix):] - - return source_name - - def _parse_ai_response(self, response_text: str) -> dict[str, str]: - """Parse AI-generated biography into sections.""" - # Simple parser - looks for section headers - sections = { - "introduction": "", - "early_life": "", - "education": "", - "career": "", - "marriage_family": "", - "later_life": "", - "death_legacy": "", - } - - # Split by markdown headers and categorize - # This is a simplified version - production would use more robust parsing - lines = response_text.split("\n") - current_section = None - current_text = [] - - for line in lines: - if line.startswith("##"): - # Save previous section - if current_section and current_text: - sections[current_section] = "\n".join(current_text).strip() - - # Detect new section - header = line.lower() - if "introduction" in header or "birth" in header: - current_section = "introduction" - elif "early life" in header or "family background" in header: - current_section = "early_life" - elif "education" in header: - current_section = "education" - elif "career" in header or "occupation" in header: - current_section = "career" - elif "marriage" in header or "family" in header: - current_section = "marriage_family" - elif "later life" in header: - current_section = "later_life" - elif "death" in header or "legacy" in header: - current_section = 
"death_legacy" - else: - current_section = None - - current_text = [] - elif current_section: - current_text.append(line) - - # Save final section - if current_section and current_text: - sections[current_section] = "\n".join(current_text).strip() - - return sections - - # ---- Citation Formatting Methods ---- - - def _format_citation_info(self, citation: dict) -> CitationInfo: - """ - Format citation into CitationInfo with all text versions. - Handles free-form (TemplateID=0) and template-based citations. - """ - citation_id = _get_row_value(citation, "CitationID", 0) - source_id = _get_row_value(citation, "SourceID", 0) - template_id = _get_row_value(citation, "TemplateID", 0) - template_name = _get_row_value(citation, "TemplateName") - - is_freeform = template_id == 0 - - if is_freeform: - # Use formatted fields from CitationTable if available - footnote = _get_row_value(citation, "Footnote") - short_footnote = _get_row_value(citation, "ShortFootnote") - bibliography = _get_row_value(citation, "CitationBibliography") - - # Fallback: Generate from Fields BLOB if NULL - if not footnote: - footnote = self._generate_citation_from_fields(citation) - if not short_footnote: - short_footnote = self._generate_short_footnote_from_fields(citation, footnote) - if not bibliography: - bibliography = self._generate_bibliography_from_fields(citation) - else: - # Template-based: Show placeholders - footnote = f"[Citation {citation_id}, Template: {template_name}]" - short_footnote = footnote - bibliography = f"[Source {source_id}, Template: {template_name}]" - - return CitationInfo( - citation_id=citation_id, - source_id=source_id, - footnote=footnote, - short_footnote=short_footnote, - bibliography=bibliography, - is_freeform=is_freeform, - template_name=template_name, - ) - - def _generate_citation_from_fields(self, citation: dict) -> str: - """ - Generate footnote text from BLOB fields (fallback). 
- First checks SourceFields for pre-formatted Footnote, then CitationFields for page/details. - Returns citation with WARNING only if all approaches fail. - """ - citation_id = _get_row_value(citation, "CitationID", 0) - - # First, check SourceFields BLOB for pre-formatted Footnote - source_fields_blob = _get_row_value(citation, "SourceFields") - if source_fields_blob: - from rmagent.rmlib.parsers.blob_parser import parse_source_fields - - try: - source_fields = parse_source_fields(source_fields_blob) - footnote = source_fields.get("Footnote", "") - if footnote: - return footnote - except Exception: - pass # Continue to next approach - - # Fallback: Check CitationFields BLOB for page/details - citation_fields_blob = _get_row_value(citation, "CitationFields") - if citation_fields_blob: - from rmagent.rmlib.parsers.blob_parser import parse_citation_fields - - try: - fields = parse_citation_fields(citation_fields_blob) - # Simple format: Page field is most common - page = fields.get("Page", "") - if page: - return f"p. {page}" - # If no page, show first non-empty field - for key, value in fields.items(): - if value: - return f"{key}: {value}" - except Exception: - pass - - return f"[Citation {citation_id}] ⚠️ WARNING: Missing citation fields" - - def _generate_short_footnote_from_fields(self, citation: dict, full_footnote: str) -> str: - """ - Generate short footnote text from BLOB fields (fallback). - First checks SourceFields for pre-formatted ShortFootnote, then falls back to full footnote. 
- """ - # Check SourceFields BLOB for pre-formatted ShortFootnote - source_fields_blob = _get_row_value(citation, "SourceFields") - if source_fields_blob: - from rmagent.rmlib.parsers.blob_parser import parse_source_fields - - try: - source_fields = parse_source_fields(source_fields_blob) - short_footnote = source_fields.get("ShortFootnote", "") - if short_footnote: - return short_footnote - except Exception: - pass - - # Fallback: use full footnote - return full_footnote - - def _generate_bibliography_from_fields(self, citation: dict) -> str: - """ - Generate bibliography entry from SourceFields BLOB (fallback). - First checks for pre-formatted Bibliography field, then constructs from individual fields. - Returns source name with WARNING only if all approaches fail. - """ - source_id = _get_row_value(citation, "SourceID", 0) - source_name = _get_row_value(citation, "SourceName", "[Unknown Source]") - fields_blob = _get_row_value(citation, "SourceFields") - - if not fields_blob: - return f"{source_name} ⚠️ WARNING: Missing source fields" - - from rmagent.rmlib.parsers.blob_parser import parse_source_fields - - try: - fields = parse_source_fields(fields_blob) - - # First, check for pre-formatted Bibliography field (RootsMagic stores formatted text here) - bibliography = fields.get("Bibliography", "") - if bibliography: - return bibliography - - # Fallback: Evidence Explained basic format: Author. Title. Publisher, Year. 
- author = fields.get("Author", "") - title = fields.get("Title", "") - publisher = fields.get("Publisher", "") - year = fields.get("Year", "") - - parts = [] - if author: - parts.append(f"{author}.") - if title: - parts.append(f"*{title}.*") - if publisher and year: - parts.append(f"{publisher}, {year}.") - elif publisher: - parts.append(f"{publisher}.") - elif year: - parts.append(f"{year}.") - - if parts: - return " ".join(parts) - return f"{source_name} ⚠️ WARNING: No source details in fields" - except Exception as e: - return f"{source_name} ⚠️ WARNING: Failed to parse source fields ({e})" - - def _process_citations_in_text( - self, text: str, all_citations: list[dict] - ) -> tuple[str, list[tuple[int, CitationInfo]], CitationTracker]: - """ - Process {{cite:ID}} markers in text, replace with [^N] footnote markers. - - Returns: - - Modified text with [^N] markers - - List of (footnote_num, CitationInfo) in order of appearance - - CitationTracker with all citation metadata - """ - import re - - tracker = CitationTracker() - - # Build lookup: CitationID -> CitationInfo - citation_lookup = {} - for citation in all_citations: - cid = _get_row_value(citation, "CitationID", 0) - citation_lookup[cid] = self._format_citation_info(citation) - - # Find all {{cite:ID}} markers (double braces as specified in prompt) - pattern = r"\{\{cite:(\d+)\}\}" - matches = list(re.finditer(pattern, text)) - - # Replace markers with footnote numbers (in reverse to preserve positions) - replacements = [] - for match in matches: - citation_id = int(match.group(1)) - - if citation_id not in citation_lookup: - # Citation not found, leave placeholder - footnote_marker = f"[^{citation_id}?]" - else: - citation_info = citation_lookup[citation_id] - source_id = citation_info.source_id - - # Get or assign footnote number - footnote_num = tracker.add_citation(citation_id, source_id) - footnote_marker = f"[^{footnote_num}]" - - replacements.append((match.span(), footnote_marker)) - - # Apply 
replacements in reverse order to preserve positions - modified_text = text - for (start, end), replacement in reversed(replacements): - modified_text = modified_text[:start] + replacement + modified_text[end:] - - # Build ordered footnote list - footnotes = [] - for citation_id in tracker.citation_order: - citation_info = citation_lookup.get(citation_id) - if citation_info: - footnote_num = tracker.citation_to_footnote[citation_id] - footnotes.append((footnote_num, citation_info)) - - return modified_text, footnotes, tracker - - def _generate_footnotes_section( - self, footnotes: list[tuple[int, CitationInfo]], tracker: CitationTracker - ) -> str: - """ - Generate footnotes section with numbered entries. - First citation per source uses full footnote, subsequent use short. - """ - lines = [] - - for footnote_num, citation_info in footnotes: - # Determine if first citation for this source - is_first = tracker.is_first_for_source(citation_info.citation_id, citation_info.source_id) - - # Use full or short footnote - footnote_text = citation_info.footnote if is_first else citation_info.short_footnote - - lines.append(f"[^{footnote_num}]: {footnote_text}") - - return "\n".join(lines) - - def _generate_sources_section(self, all_citations: list[dict]) -> str: - """ - Generate alphabetically sorted bibliography using SourceTable.ActualText. - Deduplicate by SourceID. 
- """ - # Build unique sources map: SourceID -> CitationInfo - sources = {} - for citation in all_citations: - source_id = _get_row_value(citation, "SourceID", 0) - if source_id not in sources: - citation_info = self._format_citation_info(citation) - sources[source_id] = citation_info - - # Sort alphabetically by bibliography text - sorted_sources = sorted(sources.values(), key=lambda c: c.bibliography.lower()) - - # Format as list - lines = [] - for citation_info in sorted_sources: - lines.append(f"- {citation_info.bibliography}") - - return "\n".join(lines) diff --git a/rmagent/generators/biography/citations.py b/rmagent/generators/biography/citations.py index 3d1eb28..65346be 100644 --- a/rmagent/generators/biography/citations.py +++ b/rmagent/generators/biography/citations.py @@ -49,7 +49,7 @@ def strip_source_type_prefix(source_name: str) -> str: for prefix in prefixes: if source_name.startswith(prefix): - return source_name[len(prefix):] + return source_name[len(prefix) :] return source_name @@ -162,7 +162,6 @@ def _generate_bibliography_from_fields(self, citation: dict) -> str: First checks for pre-formatted Bibliography field, then constructs from individual fields. Returns source name with WARNING only if all approaches fail. """ - source_id = get_row_value(citation, "SourceID", 0) source_name = get_row_value(citation, "SourceName", "[Unknown Source]") fields_blob = get_row_value(citation, "SourceFields") @@ -259,9 +258,7 @@ def process_citations_in_text( return modified_text, footnotes, tracker - def generate_footnotes_section( - self, footnotes: list[tuple[int, CitationInfo]], tracker: CitationTracker - ) -> str: + def generate_footnotes_section(self, footnotes: list[tuple[int, CitationInfo]], tracker: CitationTracker) -> str: """ Generate footnotes section with numbered entries and 3-character indent. First citation per source uses full footnote, subsequent use short. 
diff --git a/rmagent/generators/biography/generator.py b/rmagent/generators/biography/generator.py index 10905cf..deb4272 100644 --- a/rmagent/generators/biography/generator.py +++ b/rmagent/generators/biography/generator.py @@ -6,9 +6,9 @@ from __future__ import annotations +import time from datetime import datetime from pathlib import Path -import time from rmagent.agent.genealogy_agent import GenealogyAgent from rmagent.rmlib.database import RMDatabase @@ -18,6 +18,7 @@ from rmagent.rmlib.parsers.place_parser import format_place_medium, format_place_short from rmagent.rmlib.queries import QueryService +from .citations import CitationProcessor from .models import ( Biography, BiographyLength, @@ -27,7 +28,6 @@ PersonContext, get_row_value, ) -from .citations import CitationProcessor from .templates import BiographyTemplates @@ -141,9 +141,7 @@ def generate( if use_ai and self.agent: biography = self._generate_with_ai(context, length, citation_style, include_sources) else: - biography = self._generate_template_based( - context, length, citation_style, include_sources - ) + biography = self._generate_template_based(context, length, citation_style, include_sources) return biography @@ -176,12 +174,8 @@ def _extract(db: RMDatabase) -> PersonContext: is_living = age < 110 # Extract birth/death information - birth_date_str, birth_place = self._extract_vital_info( - db, person_id, fact_type_id=1 - ) # Birth - death_date_str, death_place = self._extract_vital_info( - db, person_id, fact_type_id=2 - ) # Death + birth_date_str, birth_place = self._extract_vital_info(db, person_id, fact_type_id=1) # Birth + death_date_str, death_place = self._extract_vital_info(db, person_id, fact_type_id=2) # Death # Get relationships parents = query.get_parents(person_id) @@ -273,9 +267,7 @@ def _extract(db: RMDatabase) -> PersonContext: else: raise ValueError("No database provided") - def _extract_vital_info( - self, db: RMDatabase, person_id: int, fact_type_id: int - ) -> tuple[str | 
None, str | None]: + def _extract_vital_info(self, db: RMDatabase, person_id: int, fact_type_id: int) -> tuple[str | None, str | None]: """Extract date and place for a vital event (birth/death).""" query = QueryService(db) vital_events = query.get_vital_events(person_id) @@ -305,9 +297,7 @@ def _extract_vital_info( return None, None - def _categorize_events( - self, db: RMDatabase, events: list[dict] - ) -> tuple[list[EventContext], ...]: + def _categorize_events(self, db: RMDatabase, events: list[dict]) -> tuple[list[EventContext], ...]: """Categorize events into vital, education, occupation, military, residence, and other.""" vital = [] education = [] @@ -471,8 +461,8 @@ def _generate_with_ai( # Extract LLM metadata from result llm_metadata = None - if hasattr(self.agent, 'llm_provider'): - provider_name = self.agent.llm_provider.__class__.__name__.replace('Provider', '').lower() + if hasattr(self.agent, "llm_provider"): + provider_name = self.agent.llm_provider.__class__.__name__.replace("Provider", "").lower() llm_metadata = LLMMetadata( provider=provider_name, model=result.model, @@ -480,7 +470,7 @@ def _generate_with_ai( completion_tokens=result.usage.completion_tokens, total_tokens=result.usage.total_tokens, prompt_time=total_time * 0.1, # Estimate ~10% for prompt building - llm_time=total_time * 0.9, # Estimate ~90% for LLM + llm_time=total_time * 0.9, # Estimate ~90% for LLM cost=result.cost, ) diff --git a/rmagent/generators/biography/models.py b/rmagent/generators/biography/models.py index f164435..bb8a277 100644 --- a/rmagent/generators/biography/models.py +++ b/rmagent/generators/biography/models.py @@ -7,8 +7,9 @@ from __future__ import annotations from dataclasses import dataclass, field -from datetime import datetime, timezone +from datetime import UTC, datetime from enum import Enum +from pathlib import Path class BiographyLength(str, Enum): @@ -129,7 +130,7 @@ class Biography: sources: str # Metadata - generated_at: datetime = 
field(default_factory=lambda: datetime.now(timezone.utc).astimezone()) + generated_at: datetime = field(default_factory=lambda: datetime.now(UTC).astimezone()) word_count: int = 0 privacy_applied: bool = False birth_year: int | None = None @@ -138,7 +139,7 @@ class Biography: citation_count: int = 0 source_count: int = 0 media_files: list[dict] = field(default_factory=list) # Media files for images - media_root_directory: "Path | None" = None # Root directory for media files (replaces ? in MediaPath) + media_root_directory: Path | None = None # Root directory for media files (replaces ? in MediaPath) def calculate_word_count(self) -> int: """ @@ -146,21 +147,24 @@ def calculate_word_count(self) -> int: Excludes front matter, footnotes, and sources sections. """ - all_text = "\n".join([ - self.introduction, - self.early_life, - self.education, - self.career, - self.marriage_family, - self.later_life, - self.death_legacy, - ]) + all_text = "\n".join( + [ + self.introduction, + self.early_life, + self.education, + self.career, + self.marriage_family, + self.later_life, + self.death_legacy, + ] + ) return len(all_text.split()) def render_markdown(self, include_metadata: bool = True) -> str: """Render complete biography as Markdown with optional front matter.""" # Import here to avoid circular dependency from .rendering import BiographyRenderer + renderer = BiographyRenderer(media_root_directory=self.media_root_directory) return renderer.render_markdown(self, include_metadata) @@ -168,6 +172,7 @@ def render_metadata(self) -> str: """Render Hugo-style front matter metadata.""" # Import here to avoid circular dependency from .rendering import BiographyRenderer + renderer = BiographyRenderer(media_root_directory=self.media_root_directory) return renderer.render_metadata(self) diff --git a/rmagent/generators/biography/rendering.py b/rmagent/generators/biography/rendering.py index ba4d9c7..c094c0d 100644 --- a/rmagent/generators/biography/rendering.py +++ 
b/rmagent/generators/biography/rendering.py @@ -55,26 +55,26 @@ def render_metadata(self, bio: Biography) -> str: tz_str = bio.generated_at.strftime("%z") tz_formatted = f"{tz_str[:3]}:{tz_str[3:]}" if tz_str else "" date_str = bio.generated_at.strftime("%Y-%m-%dT%H:%M:%S") + tz_formatted - lines.append(f'Date: {date_str}') + lines.append(f"Date: {date_str}") # Person ID - lines.append(f'PersonID: {bio.person_id}') + lines.append(f"PersonID: {bio.person_id}") # LLM Metadata (if available) if bio.llm_metadata: - lines.append(f'TokensIn: {self.format_tokens(bio.llm_metadata.prompt_tokens)}') - lines.append(f'TokensOut: {self.format_tokens(bio.llm_metadata.completion_tokens)}') - lines.append(f'TotalTokens: {self.format_tokens(bio.llm_metadata.total_tokens)}') - lines.append(f'LLM: {bio.llm_metadata.provider.capitalize()}') - lines.append(f'Model: {bio.llm_metadata.model}') - lines.append(f'PromptTime: {self.format_duration(bio.llm_metadata.prompt_time)}') - lines.append(f'LLMTime: {self.format_duration(bio.llm_metadata.llm_time)}') + lines.append(f"TokensIn: {self.format_tokens(bio.llm_metadata.prompt_tokens)}") + lines.append(f"TokensOut: {self.format_tokens(bio.llm_metadata.completion_tokens)}") + lines.append(f"TotalTokens: {self.format_tokens(bio.llm_metadata.total_tokens)}") + lines.append(f"LLM: {bio.llm_metadata.provider.capitalize()}") + lines.append(f"Model: {bio.llm_metadata.model}") + lines.append(f"PromptTime: {self.format_duration(bio.llm_metadata.prompt_time)}") + lines.append(f"LLMTime: {self.format_duration(bio.llm_metadata.llm_time)}") # Biography stats (calculate word count dynamically) word_count = bio.calculate_word_count() - lines.append(f'Words: {word_count:,}') - lines.append(f'Citations: {bio.citation_count}') - lines.append(f'Sources: {bio.source_count}') + lines.append(f"Words: {word_count:,}") + lines.append(f"Citations: {bio.citation_count}") + lines.append(f"Sources: {bio.source_count}") lines.append("---\n") return "\n".join(lines) @@ 
-124,18 +124,28 @@ def render_markdown(self, bio: Biography, include_metadata: bool = True) -> str: db_caption = primary_image["Caption"] if "Caption" in primary_image.keys() else "" except (AttributeError, TypeError): db_caption = "" - caption = db_caption if db_caption else self._format_image_caption(bio.full_name, bio.birth_year, bio.death_year) - alt_text = self._format_image_caption(bio.full_name, bio.birth_year, bio.death_year) # Always use name/dates for alt text + if db_caption: + caption = db_caption + else: + caption = self._format_image_caption(bio.full_name, bio.birth_year, bio.death_year) + # Always use name/dates for alt text + alt_text = self._format_image_caption(bio.full_name, bio.birth_year, bio.death_year) sections.append('
') sections.append('
') - sections.append(f' {alt_text}') - sections.append(f'

{caption}

') - sections.append('
') + sections.append( + f' {alt_text}' + ) + sections.append( + f'

{caption}

' + ) + sections.append("
") sections.append('
') - sections.append(f' {bio.introduction}') - sections.append('
') - sections.append('\n') + sections.append(f" {bio.introduction}") + sections.append(" ") + sections.append("\n") else: sections.append(bio.introduction) diff --git a/rmagent/generators/biography/templates.py b/rmagent/generators/biography/templates.py index cbdf0e9..f90eec1 100644 --- a/rmagent/generators/biography/templates.py +++ b/rmagent/generators/biography/templates.py @@ -6,10 +6,11 @@ from __future__ import annotations -from .models import PersonContext, get_row_value from rmagent.rmlib.parsers.date_parser import is_unknown_date, parse_rm_date from rmagent.rmlib.parsers.name_parser import format_full_name +from .models import PersonContext, get_row_value + class BiographyTemplates: """Generates biography sections using templates (no AI).""" diff --git a/rmagent/generators/hugo_exporter.py b/rmagent/generators/hugo_exporter.py index c6dd1ae..d25a366 100644 --- a/rmagent/generators/hugo_exporter.py +++ b/rmagent/generators/hugo_exporter.py @@ -529,9 +529,7 @@ def _build_index(db: RMDatabase) -> str: lines.append(f"- [{person['name']}]({person['slug']}/){lifespan}") lines.append("") - lines.append( - f"*{len(people)} biographies • Generated {datetime.now().strftime('%Y-%m-%d')}*" - ) + lines.append(f"*{len(people)} biographies • Generated {datetime.now().strftime('%Y-%m-%d')}*") return "\n".join(lines) diff --git a/rmagent/generators/quality_report.py b/rmagent/generators/quality_report.py index da84822..9fa84eb 100644 --- a/rmagent/generators/quality_report.py +++ b/rmagent/generators/quality_report.py @@ -153,15 +153,11 @@ def _apply_filters( # Apply category filter if category_filter: - filtered_issues = [ - issue for issue in filtered_issues if issue.category == category_filter - ] + filtered_issues = [issue for issue in filtered_issues if issue.category == category_filter] # Apply severity filter if severity_filter: - filtered_issues = [ - issue for issue in filtered_issues if issue.severity == severity_filter - ] + filtered_issues = [issue for issue 
in filtered_issues if issue.severity == severity_filter] # Recalculate totals for filtered issues totals_by_severity = { @@ -320,10 +316,7 @@ def _format_html(self, report: QualityReport) -> str: " body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Arial, " "sans-serif; margin: 40px; }" ) - lines.append( - " h1 { color: #333; border-bottom: 2px solid #4CAF50; " - "padding-bottom: 10px; }" - ) + lines.append(" h1 { color: #333; border-bottom: 2px solid #4CAF50; " "padding-bottom: 10px; }") lines.append(" h2 { color: #555; margin-top: 30px; }") lines.append(" h3 { color: #666; }") lines.append( @@ -338,10 +331,7 @@ def _format_html(self, report: QualityReport) -> str: " .issue { background-color: #fff; border: 1px solid #ddd; padding: 15px; " "margin: 15px 0; border-radius: 4px; }" ) - lines.append( - " .issue-header { font-weight: bold; font-size: 1.1em; " - "margin-bottom: 10px; }" - ) + lines.append(" .issue-header { font-weight: bold; font-size: 1.1em; " "margin-bottom: 10px; }") lines.append(" .metadata { color: #666; font-size: 0.9em; }") lines.append(" .samples { margin-top: 10px; }") lines.append(" .sample { margin: 5px 0; padding-left: 20px; }") @@ -354,24 +344,16 @@ def _format_html(self, report: QualityReport) -> str: # Content lines.append("

Data Quality Report

") - lines.append( - f"

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

" - ) + lines.append(f"

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

") # Summary lines.append("
") lines.append("

Summary Statistics

") lines.append(" ") lines.append(" ") - lines.append( - f" " - ) - lines.append( - f" " - ) - lines.append( - f" " - ) + lines.append(f" ") + lines.append(f" ") + lines.append(f" ") lines.append( f" " ) @@ -409,9 +391,7 @@ def _format_html(self, report: QualityReport) -> str: severity_issues = [issue for issue in report.issues if issue.severity == severity] if severity_issues: css_class = severity.value - lines.append( - f"

{severity.value.capitalize()} Issues

" - ) + lines.append(f"

{severity.value.capitalize()} Issues

") for issue in severity_issues: lines.append("
") @@ -423,9 +403,7 @@ def _format_html(self, report: QualityReport) -> str: lines.append(f"

{issue.description}

") if issue.samples: - lines.append( - "
Sample Issues:
    " - ) + lines.append("
    Sample Issues:
      ") for sample in issue.samples[: self.sample_limit]: sample_text = self._format_sample_html(sample) lines.append(f"
    • {sample_text}
    • ") diff --git a/rmagent/generators/timeline.py b/rmagent/generators/timeline.py index 1b3e22e..744defe 100644 --- a/rmagent/generators/timeline.py +++ b/rmagent/generators/timeline.py @@ -234,9 +234,7 @@ def _extract(db: RMDatabase) -> dict: continue # Build timeline event - timeline_event = self._build_timeline_event( - db, event, person_id, birth_year, group_by_phase - ) + timeline_event = self._build_timeline_event(db, event, person_id, birth_year, group_by_phase) if timeline_event: timeline_events.append(timeline_event) @@ -286,9 +284,7 @@ def _build_timeline_event( place_formatted = self._format_place_for_timeline(place_str) # Build narrative text - narrative = self._build_event_narrative( - event_type_name, display_date, place_formatted, details - ) + narrative = self._build_event_narrative(event_type_name, display_date, place_formatted, details) # Get media media = self._get_event_media(db, event_id) @@ -330,9 +326,7 @@ def _build_timeline_event( return timeline_event - def _parse_date_to_timelinejs( - self, rm_date: str - ) -> tuple[dict | None, dict | None, str | None]: + def _parse_date_to_timelinejs(self, rm_date: str) -> tuple[dict | None, dict | None, str | None]: """Parse RM11 date to TimelineJS3 format.""" # Check if date string is null/unknown (empty or starts with ".") if not rm_date or rm_date.startswith("."): @@ -425,11 +419,7 @@ def _get_event_type_name(self, db: RMDatabase, event_type_id: int) -> str: """Get event type name from FactTypeTable.""" cursor = db.execute("SELECT Name FROM FactTypeTable WHERE FactTypeID = ?", (event_type_id,)) row = cursor.fetchone() - return ( - _get_row_value(row, "Name", f"Event {event_type_id}") - if row - else f"Event {event_type_id}" - ) + return _get_row_value(row, "Name", f"Event {event_type_id}") if row else f"Event {event_type_id}" def _get_event_media(self, db: RMDatabase, event_id: int) -> dict | None: """Get primary media for an event.""" diff --git a/rmagent/rmlib/database.py 
b/rmagent/rmlib/database.py index 7613c19..1c801b5 100644 --- a/rmagent/rmlib/database.py +++ b/rmagent/rmlib/database.py @@ -145,9 +145,7 @@ def _load_rmnocase_collation(self) -> None: # - caseLevel=off: Ignore case differences # - normalization=on: Normalize Unicode characters self._conn.execute( - "SELECT icu_load_collation(" - "'en_US@colStrength=primary;caseLevel=off;normalization=on'," - "'RMNOCASE')" + "SELECT icu_load_collation(" "'en_US@colStrength=primary;caseLevel=off;normalization=on'," "'RMNOCASE')" ) logger.debug("RMNOCASE collation registered successfully") finally: @@ -173,9 +171,7 @@ def connection(self) -> sqlite3.Connection: DatabaseError: If no active connection """ if self._conn is None: - raise DatabaseError( - "No active connection - use 'with RMDatabase(...)' or call connect()" - ) + raise DatabaseError("No active connection - use 'with RMDatabase(...)' or call connect()") return self._conn def execute(self, query: str, params: tuple | None = None) -> sqlite3.Cursor: diff --git a/rmagent/rmlib/models.py b/rmagent/rmlib/models.py index 04f1c1f..4f9c720 100644 --- a/rmagent/rmlib/models.py +++ b/rmagent/rmlib/models.py @@ -115,28 +115,18 @@ class Person(RMBaseModel): """ person_id: int = Field(..., alias="PersonID", description="Unique person identifier") - unique_id: str | None = Field( - None, alias="UniqueID", description="36-character hexadecimal unique ID" - ) + unique_id: str | None = Field(None, alias="UniqueID", description="36-character hexadecimal unique ID") sex: Sex = Field(..., alias="Sex", description="Person's sex/gender") parent_id: int = Field(0, alias="ParentID", description="FamilyID of parents (0 = no parents)") spouse_id: int = Field(0, alias="SpouseID", description="FamilyID of spouse (0 = no spouse)") - color: int = Field( - 0, alias="Color", ge=0, le=27, description="Color coding (0=None, 1-27=specific colors)" - ) - relate1: int = Field( - 0, ge=0, le=999, alias="Relate1", description="Generations to Most Recent Common 
Ancestor" - ) - relate2: int = Field( - 0, ge=0, alias="Relate2", description="Generations from reference person to MRCA" - ) + color: int = Field(0, alias="Color", ge=0, le=27, description="Color coding (0=None, 1-27=specific colors)") + relate1: int = Field(0, ge=0, le=999, alias="Relate1", description="Generations to Most Recent Common Ancestor") + relate2: int = Field(0, ge=0, alias="Relate2", description="Generations from reference person to MRCA") flags: int = Field(0, ge=0, le=10, alias="Flags", description="Relationship prefix descriptor") living: bool = Field(False, alias="Living", description="True if person is living") is_private: int = Field(0, alias="IsPrivate", description="Privacy flag (not implemented)") proof: int = Field(0, alias="Proof", description="Proof level (not implemented)") - bookmark: int = Field( - 0, alias="Bookmark", description="Bookmark flag (0=not bookmarked, 1=bookmarked)" - ) + bookmark: int = Field(0, alias="Bookmark", description="Bookmark flag (0=not bookmarked, 1=bookmarked)") note: str | None = Field(None, alias="Note", description="User-defined notes") @field_validator("sex", mode="before") @@ -168,43 +158,25 @@ class Name(RMBaseModel): surname: str | None = Field(None, alias="Surname", description="Surname/family name") given: str | None = Field(None, alias="Given", description="Given/first name") prefix: str | None = Field(None, alias="Prefix", description="Name prefix (Dr., Rev., etc.)") - suffix: str | None = Field( - None, alias="Suffix", description="Name suffix (Jr., Sr., III, etc.)" - ) + suffix: str | None = Field(None, alias="Suffix", description="Name suffix (Jr., Sr., III, etc.)") nickname: str | None = Field(None, alias="Nickname", description="Nickname") name_type: NameType = Field(NameType.NULL, alias="NameType", description="Type of name") - date: str | None = Field( - None, alias="Date", description="Date associated with this name (24-char encoded)" - ) + date: str | None = Field(None, alias="Date", 
description="Date associated with this name (24-char encoded)") sort_date: int | None = Field( None, alias="SortDate", description="Sortable date representation (9223372036854775807 = unknown)", ) - is_primary: bool = Field( - False, alias="IsPrimary", description="True if this is the primary name" - ) + is_primary: bool = Field(False, alias="IsPrimary", description="True if this is the primary name") is_private: bool = Field(False, alias="IsPrivate", description="True if name is private") - proof: ProofLevel = Field( - ProofLevel.BLANK, alias="Proof", description="Evidence quality rating" - ) + proof: ProofLevel = Field(ProofLevel.BLANK, alias="Proof", description="Evidence quality rating") sentence: str | None = Field(None, alias="Sentence", description="Custom sentence template") note: str | None = Field(None, alias="Note", description="User-defined notes") - birth_year: int | None = Field( - None, alias="BirthYear", description="Year extracted from birth event" - ) - death_year: int | None = Field( - None, alias="DeathYear", description="Year extracted from death event" - ) - surname_mp: str | None = Field( - None, alias="SurnameMP", description="Metaphone encoding of surname" - ) - given_mp: str | None = Field( - None, alias="GivenMP", description="Metaphone encoding of given name" - ) - nickname_mp: str | None = Field( - None, alias="NicknameMP", description="Metaphone encoding of nickname" - ) + birth_year: int | None = Field(None, alias="BirthYear", description="Year extracted from birth event") + death_year: int | None = Field(None, alias="DeathYear", description="Year extracted from death event") + surname_mp: str | None = Field(None, alias="SurnameMP", description="Metaphone encoding of surname") + given_mp: str | None = Field(None, alias="GivenMP", description="Metaphone encoding of given name") + nickname_mp: str | None = Field(None, alias="NicknameMP", description="Metaphone encoding of nickname") @field_validator("is_primary", "is_private", 
mode="before") @classmethod @@ -238,26 +210,18 @@ class Event(RMBaseModel): event_id: int = Field(..., alias="EventID", description="Unique event identifier") event_type: int = Field(..., alias="EventType", description="FactTypeID from FactTypeTable") - owner_type: OwnerType = Field( - ..., alias="OwnerType", description="Type of owner (person or family)" - ) + owner_type: OwnerType = Field(..., alias="OwnerType", description="Type of owner (person or family)") owner_id: int = Field(..., alias="OwnerID", description="PersonID or FamilyID") - family_id: int = Field( - 0, alias="FamilyID", description="FamilyID for parent-related events (0 = not applicable)" - ) + family_id: int = Field(0, alias="FamilyID", description="FamilyID for parent-related events (0 = not applicable)") place_id: int = Field(0, alias="PlaceID", description="PlaceID (0 = no place)") site_id: int = Field(0, alias="SiteID", description="PlaceID of place details (0 = no details)") date: str | None = Field(None, alias="Date", description="Date in 24-character encoded format") - sort_date: int | None = Field( - None, alias="SortDate", description="Sortable date representation" - ) + sort_date: int | None = Field(None, alias="SortDate", description="Sortable date representation") is_primary: bool = Field( False, alias="IsPrimary", description="True if this is primary event (suppresses conflicts)" ) is_private: bool = Field(False, alias="IsPrivate", description="True if event is private") - proof: ProofLevel = Field( - ProofLevel.BLANK, alias="Proof", description="Evidence quality rating" - ) + proof: ProofLevel = Field(ProofLevel.BLANK, alias="Proof", description="Evidence quality rating") status: int = Field(0, alias="Status", description="LDS status (0=default, 1-12=LDS statuses)") sentence: str | None = Field(None, alias="Sentence", description="Custom sentence template") details: str | None = Field(None, alias="Details", description="Event details/description") @@ -280,24 +244,16 @@ class 
Place(RMBaseModel): """ place_id: int = Field(..., alias="PlaceID", description="Unique place identifier") - place_type: PlaceType = Field( - PlaceType.PLACE, alias="PlaceType", description="Type of place entry" - ) - name: str | None = Field( - None, alias="Name", description="Place name (comma-delimited hierarchy)" - ) + place_type: PlaceType = Field(PlaceType.PLACE, alias="PlaceType", description="Type of place entry") + name: str | None = Field(None, alias="Name", description="Place name (comma-delimited hierarchy)") abbrev: str | None = Field(None, alias="Abbrev", description="Abbreviated place name") normalized: str | None = Field(None, alias="Normalized", description="Standardized place name") latitude: int = Field(0, alias="Latitude", description="Latitude (decimal degrees × 1e7)") longitude: int = Field(0, alias="Longitude", description="Longitude (decimal degrees × 1e7)") - lat_long_exact: bool = Field( - False, alias="LatLongExact", description="True if coordinates are exact" - ) + lat_long_exact: bool = Field(False, alias="LatLongExact", description="True if coordinates are exact") master_id: int = Field(0, alias="MasterID", description="PlaceID of master place (for details)") note: str | None = Field(None, alias="Note", description="User-defined notes") - reverse: str | None = Field( - None, alias="Reverse", description="Reverse order of place hierarchy (for indexing)" - ) + reverse: str | None = Field(None, alias="Reverse", description="Reverse order of place hierarchy (for indexing)") fs_id: int | None = Field(None, alias="fsID", description="FamilySearch place ID") an_id: int | None = Field(None, alias="anID", description="Ancestry.com place ID") @@ -338,9 +294,7 @@ class Source(RMBaseModel): comments: str | None = Field(None, alias="Comments", description="Source comments") is_private: bool = Field(False, alias="IsPrivate", description="True if source is private") template_id: int = Field(0, alias="TemplateID", description="SourceTemplateID 
(0=free-form)") - fields: bytes | None = Field( - None, alias="Fields", description="XML BLOB with field values (UTF-8 with BOM)" - ) + fields: bytes | None = Field(None, alias="Fields", description="XML BLOB with field values (UTF-8 with BOM)") @field_validator("is_private", mode="before") @classmethod @@ -364,18 +318,12 @@ class Citation(RMBaseModel): actual_text: str | None = Field(None, alias="ActualText", description="Research note") ref_number: str | None = Field(None, alias="RefNumber", description="Detail reference number") footnote: str | None = Field(None, alias="Footnote", description="Custom footnote override") - short_footnote: str | None = Field( - None, alias="ShortFootnote", description="Custom short footnote override" - ) - bibliography: str | None = Field( - None, alias="Bibliography", description="Custom bibliography override" - ) + short_footnote: str | None = Field(None, alias="ShortFootnote", description="Custom short footnote override") + bibliography: str | None = Field(None, alias="Bibliography", description="Custom bibliography override") fields: bytes | None = Field( None, alias="Fields", description="XML BLOB with citation field values (UTF-8 with BOM)" ) - citation_name: str | None = Field( - None, alias="CitationName", description="Auto-generated or user-defined name" - ) + citation_name: str | None = Field(None, alias="CitationName", description="Auto-generated or user-defined name") class Family(RMBaseModel): @@ -392,21 +340,11 @@ class Family(RMBaseModel): husb_order: int = Field(0, alias="HusbOrder", description="Spouse order (0=never rearranged)") wife_order: int = Field(0, alias="WifeOrder", description="Spouse order (0=never rearranged)") is_private: bool = Field(False, alias="IsPrivate", description="True if family is private") - proof: ProofLevel = Field( - ProofLevel.BLANK, alias="Proof", description="Evidence quality rating" - ) - father_label: ParentLabel = Field( - ParentLabel.FATHER, alias="FatherLabel", 
description="Label for father role" - ) - mother_label: MotherLabel = Field( - MotherLabel.MOTHER, alias="MotherLabel", description="Label for mother role" - ) - father_label_str: str | None = Field( - None, alias="FatherLabelStr", description="Custom label when FatherLabel=99" - ) - mother_label_str: str | None = Field( - None, alias="MotherLabelStr", description="Custom label when MotherLabel=99" - ) + proof: ProofLevel = Field(ProofLevel.BLANK, alias="Proof", description="Evidence quality rating") + father_label: ParentLabel = Field(ParentLabel.FATHER, alias="FatherLabel", description="Label for father role") + mother_label: MotherLabel = Field(MotherLabel.MOTHER, alias="MotherLabel", description="Label for mother role") + father_label_str: str | None = Field(None, alias="FatherLabelStr", description="Custom label when FatherLabel=99") + mother_label_str: str | None = Field(None, alias="MotherLabelStr", description="Custom label when MotherLabel=99") note: str | None = Field(None, alias="Note", description="User-defined notes") @field_validator("is_private", mode="before") @@ -430,21 +368,15 @@ class FactType(RMBaseModel): alias="FactTypeID", description="Unique fact type identifier (<1000=built-in, ≥1000=custom)", ) - owner_type: OwnerType = Field( - ..., alias="OwnerType", description="Type of owner (person or family)" - ) + owner_type: OwnerType = Field(..., alias="OwnerType", description="Type of owner (person or family)") name: str = Field(..., alias="Name", description="Fact type name") abbrev: str | None = Field(None, alias="Abbrev", description="Abbreviation") gedcom_tag: str | None = Field(None, alias="GedcomTag", description="GEDCOM tag") - use_value: bool = Field( - False, alias="UseValue", description="True if fact uses description field" - ) + use_value: bool = Field(False, alias="UseValue", description="True if fact uses description field") use_date: bool = Field(True, alias="UseDate", description="True if fact uses date field") use_place: bool = 
Field(True, alias="UsePlace", description="True if fact uses place field") sentence: str | None = Field(None, alias="Sentence", description="Sentence template") - flags: int = Field( - 0, alias="Flags", description="6-bit position-coded flags for Include settings" - ) + flags: int = Field(0, alias="Flags", description="6-bit position-coded flags for Include settings") @field_validator("use_value", "use_date", "use_place", mode="before") @classmethod diff --git a/rmagent/rmlib/parsers/blob_parser.py b/rmagent/rmlib/parsers/blob_parser.py index b6bdd3c..84ed276 100644 --- a/rmagent/rmlib/parsers/blob_parser.py +++ b/rmagent/rmlib/parsers/blob_parser.py @@ -170,9 +170,7 @@ def parse_template_field_defs(blob_data: bytes | None) -> list[TemplateField]: hint = hint_elem.text if hint_elem is not None else None long_hint = long_hint_elem.text if long_hint_elem is not None else None - citation_field = ( - citation_field_elem.text == "True" if citation_field_elem is not None else False - ) + citation_field = citation_field_elem.text == "True" if citation_field_elem is not None else False field_defs.append( TemplateField( @@ -242,12 +240,7 @@ def is_freeform_source(fields: dict[str, str]) -> bool: Returns: True if this appears to be a free-form source """ - return ( - len(fields) == 3 - and "Footnote" in fields - and "ShortFootnote" in fields - and "Bibliography" in fields - ) + return len(fields) == 3 and "Footnote" in fields and "ShortFootnote" in fields and "Bibliography" in fields def get_citation_level_fields(template_fields: list[TemplateField]) -> list[str]: diff --git a/rmagent/rmlib/parsers/date_parser.py b/rmagent/rmlib/parsers/date_parser.py index 3b85fb1..bdfd899 100644 --- a/rmagent/rmlib/parsers/date_parser.py +++ b/rmagent/rmlib/parsers/date_parser.py @@ -176,13 +176,7 @@ def to_datetime(self) -> datetime | None: - Date is BC - Date is a range """ - if ( - self.is_null - or self.date_type == DateType.TEXT - or self.is_partial - or self.is_bc - or self.is_range 
- ): + if self.is_null or self.date_type == DateType.TEXT or self.is_partial or self.is_bc or self.is_range: return None try: @@ -329,9 +323,7 @@ def parse_rm_date(date_str: str | None) -> RMDate: year, month, day, is_bc, is_double_date, qualifier = _parse_date_components(date_str[2:13]) # Parse second date (for ranges) - year2, month2, day2, is_bc2, is_double_date2, qualifier2 = _parse_date_components( - date_str[13:24] - ) + year2, month2, day2, is_bc2, is_double_date2, qualifier2 = _parse_date_components(date_str[13:24]) return RMDate( date_type=date_type, diff --git a/rmagent/rmlib/parsers/name_parser.py b/rmagent/rmlib/parsers/name_parser.py index c40ff0c..4e595af 100644 --- a/rmagent/rmlib/parsers/name_parser.py +++ b/rmagent/rmlib/parsers/name_parser.py @@ -284,9 +284,7 @@ def get_all_names(person_id: int, db_connection: sqlite3.Connection) -> list[Nam return names -def get_name_at_date( - person_id: int, event_sort_date: int | None, db_connection: sqlite3.Connection -) -> Name | None: +def get_name_at_date(person_id: int, event_sort_date: int | None, db_connection: sqlite3.Connection) -> Name | None: """ Get appropriate name for a specific date (context-aware). diff --git a/rmagent/rmlib/parsers/place_parser.py b/rmagent/rmlib/parsers/place_parser.py index 5f3df14..d2497db 100644 --- a/rmagent/rmlib/parsers/place_parser.py +++ b/rmagent/rmlib/parsers/place_parser.py @@ -225,9 +225,7 @@ def format_place_medium(place_name: str | None) -> str: return place_name -def convert_coordinates( - lat_int: int | None, lon_int: int | None -) -> tuple[float | None, float | None]: +def convert_coordinates(lat_int: int | None, lon_int: int | None) -> tuple[float | None, float | None]: """ Convert integer coordinates to decimal degrees. 
diff --git a/rmagent/rmlib/prototype.py b/rmagent/rmlib/prototype.py deleted file mode 100644 index 37db724..0000000 --- a/rmagent/rmlib/prototype.py +++ /dev/null @@ -1,651 +0,0 @@ -#!/usr/bin/env python3 -""" -Prototype script for Milestone 1: Working Prototype - -Demonstrates: -1. Database connection with RMNOCASE -2. Person query with complete data (name, events, places) -3. Date parsing for all formats -4. Data quality checks -5. Basic biography generation (no AI yet) - -Usage: - python -m rmagent.rmlib.prototype --person-id 1 --check-quality - python -m rmagent.rmlib.prototype --person-id 1541 --check-quality -""" - -from __future__ import annotations - -import argparse -import sys -from pathlib import Path - -# Add project root to path -PROJECT_ROOT = Path(__file__).resolve().parents[2] -if str(PROJECT_ROOT) not in sys.path: - sys.path.insert(0, str(PROJECT_ROOT)) - -from rmagent.rmlib.database import RMDatabase -from rmagent.rmlib.parsers.blob_parser import parse_citation_fields, parse_source_fields -from rmagent.rmlib.parsers.date_parser import parse_rm_date -from rmagent.rmlib.parsers.name_parser import format_full_name -from rmagent.rmlib.parsers.place_parser import ( - format_place_medium, - format_place_short, -) -from rmagent.rmlib.quality import DataQualityValidator -from rmagent.rmlib.queries import QueryService - - -def get_row_value(row, key: str, default=None): - """Get value from sqlite3.Row object with default.""" - try: - return row[key] if key in row.keys() else default - except (KeyError, TypeError): - return default - - -def render_italics(text: str) -> str: - """ - Render <i>...</i> or <I>...</I> tags as italic text in terminal. - - Uses ANSI italic codes: \033[3m for italic start, \033[23m to reset italic. - Handles both lowercase and uppercase tags.
- """ - if not text: - return "" - - # Replace both lowercase and uppercase italic tags - result = text.replace("<i>", "\033[3m").replace("</i>", "\033[23m") - result = result.replace("<I>", "\033[3m").replace("</I>", "\033[23m") - return result - - -def format_person_info(person: dict, query_service: QueryService) -> str: - """Format person information for display.""" - lines = [] - lines.append("=" * 70) - lines.append(f"PERSON INFORMATION (ID: {person['PersonID']})") - lines.append("=" * 70) - - # Format name - full_name = format_full_name( - given=get_row_value(person, "Given"), - surname=get_row_value(person, "Surname"), - prefix=get_row_value(person, "Prefix"), - suffix=get_row_value(person, "Suffix"), - ) - lines.append(f"\nName: {full_name}") - - # Format birth/death years - birth_year = get_row_value(person, "BirthYear") - death_year = get_row_value(person, "DeathYear") - if birth_year or death_year: - years = f"({birth_year or '?'} - {death_year or '?'})" - lines.append(f"Years: {years}") - - # Format sex - sex_map = {0: "Male", 1: "Female", 2: "Unknown"} - sex = sex_map.get(get_row_value(person, "Sex"), "Unknown") - lines.append(f"Sex: {sex}") - - return "\n".join(lines) - - -def format_web_tags(person_id: int, db: RMDatabase) -> str: - """Format web tags (URLs) for a person.""" - lines = [] - - # Query URLTable for this person - web_tags = db.query( - """ - SELECT Name, URL, Note - FROM URLTable - WHERE OwnerType = 0 AND OwnerID = ?
- ORDER BY Name - """, - (person_id,), - ) - - if not web_tags: - return "" - - lines.append("\n" + "-" * 70) - lines.append("WEB LINKS") - lines.append("-" * 70) - - for tag in web_tags: - name = get_row_value(tag, "Name", "[Unnamed]") - url = get_row_value(tag, "URL", "") - note = get_row_value(tag, "Note", "") - - lines.append(f"\n{name}: {url}") - if note: - lines.append(f" Note: {note}") - - return "\n".join(lines) - - -def format_events(person_id: int, query_service: QueryService) -> str: - """Format person's events for display.""" - lines = [] - lines.append("\n" + "-" * 70) - lines.append("EVENTS") - lines.append("-" * 70) - - events = query_service.get_person_events(person_id) - - if not events: - lines.append("\nNo events recorded.") - return "\n".join(lines) - - for event in events: - event_type = event["EventType"] - - # Parse and format date - date_str = event["Date"] - if date_str: - try: - date = parse_rm_date(date_str) - formatted_date = date.format_display() - except Exception: - formatted_date = date_str - else: - formatted_date = "[No date]" - - # Parse and format place - place_str = get_row_value(event, "Place", "") - if place_str: - try: - formatted_place = format_place_short(place_str) - except Exception: - formatted_place = place_str - else: - formatted_place = "[No place]" - - # Format details - details = get_row_value(event, "Details", "") - if details: - details_str = f" - {details}" - else: - details_str = "" - - lines.append(f"\n{event_type}: {formatted_date}, {formatted_place}{details_str}") - - return "\n".join(lines) - - -def format_citations(person_id: int, db: RMDatabase) -> str: - """Format citations for a person's events.""" - lines = [] - - # Query citations linked to this person's events - citations = db.query( - """ - SELECT - e.EventID, - ft.Name as EventType, - e.Date, - e.Details, - c.CitationID, - c.CitationName, - c.Fields as CitationFields, - s.SourceID, - s.Name as SourceName - FROM EventTable e - JOIN FactTypeTable ft 
ON e.EventType = ft.FactTypeID - LEFT JOIN CitationLinkTable cl ON cl.OwnerType = 2 AND cl.OwnerID = e.EventID - LEFT JOIN CitationTable c ON c.CitationID = cl.CitationID - LEFT JOIN SourceTable s ON s.SourceID = c.SourceID - WHERE e.OwnerType = 0 AND e.OwnerID = ? - AND c.CitationID IS NOT NULL - ORDER BY e.SortDate, cl.SortOrder - """, - (person_id,), - ) - - if not citations: - return "" - - lines.append("\n" + "-" * 70) - lines.append("CITATIONS") - lines.append("-" * 70) - - # Group citations by event - current_event = None - citation_num = 0 - - for cit in citations: - event_id = cit["EventID"] - event_type = cit["EventType"] - event_date = cit["Date"] - event_details = get_row_value(cit, "Details", "") - - # Format event header - if event_id != current_event: - current_event = event_id - - # Format date - if event_date: - try: - date = parse_rm_date(event_date) - date_str = date.format_display() - except Exception: - date_str = event_date - else: - date_str = "[No date]" - - # Event header - event_header = f"{event_type} ({date_str})" - if event_details: - event_header += f" - {event_details}" - - lines.append(f"\n{event_header}") - - # Citation details - citation_num += 1 - _citation_id = cit["CitationID"] # Available for future use - source_name = get_row_value(cit, "SourceName", "[Unknown Source]") - - # Parse citation fields to get page number - page = "" - if cit["CitationFields"]: - try: - fields = parse_citation_fields(cit["CitationFields"]) - page = fields.get("Page", "") - except Exception: - pass - - if page: - lines.append(f" [{citation_num}] Citation: Page {page} → Source: {source_name}") - else: - lines.append(f" [{citation_num}] Citation: (no page) → Source: {source_name}") - - return "\n".join(lines) - - -def format_sources(person_id: int, db: RMDatabase) -> str: - """Format unique sources for a person's citations.""" - lines = [] - - # Query unique sources for this person's events - sources = db.query( - """ - SELECT - s.SourceID, - s.Name, - 
s.TemplateID, - s.ActualText, - s.Fields as SourceFields, - COUNT(DISTINCT c.CitationID) as CitationCount - FROM EventTable e - JOIN CitationLinkTable cl ON cl.OwnerType = 2 AND cl.OwnerID = e.EventID - JOIN CitationTable c ON c.CitationID = cl.CitationID - JOIN SourceTable s ON s.SourceID = c.SourceID - WHERE e.OwnerType = 0 AND e.OwnerID = ? - GROUP BY s.SourceID - ORDER BY s.Name - """, - (person_id,), - ) - - if not sources: - return "" - - lines.append("\n" + "-" * 70) - lines.append(f"SOURCES ({len(sources)} unique source{'s' if len(sources) != 1 else ''})") - lines.append("-" * 70) - - for i, src in enumerate(sources, 1): - _source_id = src["SourceID"] # Available for future use - source_name = src["Name"] - template_id = src["TemplateID"] - actual_text = get_row_value(src, "ActualText", "") - citation_count = src["CitationCount"] - - # Display bibliography - bibliography_text = None - - # First, try to get Bibliography field from BLOB - if src["SourceFields"]: - try: - fields = parse_source_fields(src["SourceFields"]) - if "Bibliography" in fields and fields["Bibliography"]: - bibliography_text = fields["Bibliography"] - except Exception: - pass - - # Fallback to ActualText for free-form sources - if not bibliography_text and template_id == 0 and actual_text: - bibliography_text = actual_text - - if bibliography_text: - # Render with italics support - formatted_bib = render_italics(bibliography_text) - lines.append(f"\n[{i}] {formatted_bib}") - else: - # No bibliography text available, just show source name - lines.append(f"\n[{i}] {source_name}") - - # Show citation count - lines.append(f" (Used in {citation_count} citation{'s' if citation_count != 1 else ''})") - - return "\n".join(lines) - - -def format_family(person_id: int, query_service: QueryService) -> str: - """Format person's family relationships for display.""" - lines = [] - lines.append("\n" + "-" * 70) - lines.append("FAMILY RELATIONSHIPS") - lines.append("-" * 70) - - # Parents - parents = 
query_service.get_parents(person_id) - if parents: - father_name = ( - format_full_name( - given=get_row_value(parents, "FatherGiven"), - surname=get_row_value(parents, "FatherSurname"), - ) - if get_row_value(parents, "FatherID") - else "Unknown" - ) - - mother_name = ( - format_full_name( - given=get_row_value(parents, "MotherGiven"), - surname=get_row_value(parents, "MotherSurname"), - ) - if get_row_value(parents, "MotherID") - else "Unknown" - ) - - lines.append(f"\nFather: {father_name} (ID: {get_row_value(parents, 'FatherID', 'N/A')})") - lines.append(f"Mother: {mother_name} (ID: {get_row_value(parents, 'MotherID', 'N/A')})") - - # Spouses - spouses = query_service.get_spouses(person_id) - if spouses: - lines.append(f"\nSpouses ({len(spouses)}):") - for spouse in spouses: - spouse_name = format_full_name( - given=get_row_value(spouse, "Given"), surname=get_row_value(spouse, "Surname") - ) - marriage_date = get_row_value(spouse, "MarriageDate", "") - if marriage_date: - try: - date = parse_rm_date(marriage_date) - date_str = f" (m. {date.format_display()})" - except Exception: - date_str = f" (m. {marriage_date})" - else: - date_str = "" - lines.append(f" - {spouse_name} (ID: {spouse['PersonID']}){date_str}") - - # Children - children = query_service.get_children(person_id) - if children: - lines.append(f"\nChildren ({len(children)}):") - for child in children: - child_name = format_full_name( - given=get_row_value(child, "Given"), surname=get_row_value(child, "Surname") - ) - birth_year = get_row_value(child, "BirthYear", "") - year_str = f" (b. 
{birth_year})" if birth_year else "" - lines.append(f" - {child_name} (ID: {child['PersonID']}){year_str}") - - if not parents and not spouses and not children: - lines.append("\nNo family relationships recorded.") - - return "\n".join(lines) - - -def generate_basic_biography(person_id: int, query_service: QueryService) -> str: - """Generate a basic biography (no AI enhancement yet).""" - lines = [] - lines.append("\n" + "-" * 70) - lines.append("BASIC BIOGRAPHY (Text-based, no AI)") - lines.append("-" * 70) - - # Get person info - person = query_service.get_person_with_primary_name(person_id) - if not person: - return "\n".join(lines + ["\nPerson not found."]) - - full_name = format_full_name( - given=get_row_value(person, "Given"), - surname=get_row_value(person, "Surname"), - prefix=get_row_value(person, "Prefix"), - suffix=get_row_value(person, "Suffix"), - ) - - # Introduction - birth_year = get_row_value(person, "BirthYear") - death_year = get_row_value(person, "DeathYear") - - intro = f"\n{full_name}" - if birth_year and death_year: - intro += f" ({birth_year}-{death_year})" - elif birth_year: - intro += f" (b. {birth_year})" - elif death_year: - intro += f" (d. 
{death_year})" - - lines.append(intro) - - # Get vital events - vital_events = query_service.get_vital_events(person_id) - - # Birth - birth = next((e for e in vital_events if e["FactTypeID"] == 1), None) - if birth: - birth_date = get_row_value(birth, "Date", "") - birth_place = get_row_value(birth, "Place", "") - if birth_date: - try: - date = parse_rm_date(birth_date) - birth_text = f"{full_name} was born on {date.format_display()}" - except Exception: - birth_text = f"{full_name} was born" - else: - birth_text = f"{full_name} was born" - - if birth_place: - try: - formatted_place = format_place_medium(birth_place) - birth_text += f" in {formatted_place}" - except Exception: - birth_text += f" in {birth_place}" - - lines.append(f"\n{birth_text}.") - - # Marriage - spouses = query_service.get_spouses(person_id) - if spouses: - for spouse in spouses: - spouse_name = format_full_name( - given=get_row_value(spouse, "Given"), surname=get_row_value(spouse, "Surname") - ) - marriage_date = get_row_value(spouse, "MarriageDate", "") - if marriage_date: - try: - date = parse_rm_date(marriage_date) - lines.append(f"\n{full_name} married {spouse_name} on {date.format_display()}.") - except Exception: - lines.append(f"\n{full_name} married {spouse_name}.") - - # Children - children = query_service.get_children(person_id) - if children: - if len(children) == 1: - lines.append(f"\n{full_name} had one child.") - else: - lines.append(f"\n{full_name} had {len(children)} children.") - - # Death - death = next((e for e in vital_events if e["FactTypeID"] == 2), None) - if death: - death_date = get_row_value(death, "Date", "") - death_place = get_row_value(death, "Place", "") - if death_date: - try: - date = parse_rm_date(death_date) - death_text = f"{full_name} died on {date.format_display()}" - except Exception: - death_text = f"{full_name} died" - else: - death_text = f"{full_name} died" - - if death_place: - try: - formatted_place = format_place_medium(death_place) - death_text 
+= f" in {formatted_place}" - except Exception: - death_text += f" in {death_place}" - - lines.append(f"\n{death_text}.") - - return "\n".join(lines) - - -def run_quality_checks(db: RMDatabase, person_id: int | None = None) -> str: - """Run data quality checks.""" - lines = [] - lines.append("\n" + "-" * 70) - lines.append("DATA QUALITY CHECKS") - lines.append("-" * 70) - - validator = DataQualityValidator(db, sample_limit=5) - - # Run all validation checks - lines.append("\nRunning all validation rules...") - report = validator.run_all_checks() - - # Display summary - lines.append(f"\nTotal Issues: {len(report.issues)}") - - # Show issues by severity - lines.append("\nIssues by Severity:") - for severity, count in report.totals_by_severity.items(): - lines.append(f" {severity}: {count}") - - # Show issues by category - lines.append("\nIssues by Category:") - for category, count in report.totals_by_category.items(): - lines.append(f" {category}: {count}") - - # Show entity counts - lines.append("\nEntity Counts:") - for entity, count in report.summary.items(): - lines.append(f" {entity}: {count}") - - # Show a few sample issues - if report.issues: - lines.append("\nSample Issues (first 3):") - for issue in report.issues[:3]: - lines.append(f"\n [{issue.severity}] {issue.rule_id}: {issue.name}") - lines.append(f" Description: {issue.description}") - lines.append(f" Affected count: {issue.count}") - if issue.samples: - lines.append(f" Sample records: {len(issue.samples)}") - for sample in issue.samples[:2]: - lines.append(f" - {sample}") - - lines.append("\nData quality validation complete.") - - return "\n".join(lines) - - -def main(): - """Main entry point.""" - parser = argparse.ArgumentParser( - description="Milestone 1 Working Prototype", - formatter_class=argparse.RawDescriptionHelpFormatter, - ) - parser.add_argument("--person-id", type=int, required=True, help="Person ID to query") - parser.add_argument("--check-quality", action="store_true", help="Run data 
quality checks") - parser.add_argument( - "--database", - type=str, - default="data/Iiams.rmtree", - help="Path to RootsMagic database (default: data/Iiams.rmtree)", - ) - parser.add_argument( - "--extension", - type=str, - default="sqlite-extension/icu.dylib", - help="Path to ICU extension (default: sqlite-extension/icu.dylib)", - ) - - args = parser.parse_args() - - # Validate paths - db_path = Path(args.database) - extension_path = Path(args.extension) - - if not db_path.exists(): - print(f"Error: Database not found: {db_path}", file=sys.stderr) - sys.exit(1) - - if not extension_path.exists(): - print(f"Error: ICU extension not found: {extension_path}", file=sys.stderr) - sys.exit(1) - - # Connect to database - print("Connecting to RootsMagic database...") - try: - with RMDatabase(db_path, extension_path=extension_path) as db: - query_service = QueryService(db) - - # Query person - person = query_service.get_person_with_primary_name(args.person_id) - if not person: - print(f"\nError: Person ID {args.person_id} not found.", file=sys.stderr) - sys.exit(1) - - # Display person information - print(format_person_info(person, query_service)) - - # Display web tags (Find a Grave, etc.) 
- web_tags_output = format_web_tags(args.person_id, db) - if web_tags_output: - print(web_tags_output) - - # Display events - print(format_events(args.person_id, query_service)) - - # Display family - print(format_family(args.person_id, query_service)) - - # Generate basic biography - print(generate_basic_biography(args.person_id, query_service)) - - # Display citations - citations_output = format_citations(args.person_id, db) - if citations_output: - print(citations_output) - - # Display sources - sources_output = format_sources(args.person_id, db) - if sources_output: - print(sources_output) - - # Run quality checks - if args.check_quality: - print(run_quality_checks(db, args.person_id)) - - print("\n" + "=" * 70) - print("Milestone 1 prototype complete!") - print("=" * 70) - - except Exception as e: - print(f"\nError: {e}", file=sys.stderr) - import traceback - - traceback.print_exc() - sys.exit(1) - - -if __name__ == "__main__": - main() diff --git a/rmagent/rmlib/quality.py b/rmagent/rmlib/quality.py index 7af0057..c637c67 100644 --- a/rmagent/rmlib/quality.py +++ b/rmagent/rmlib/quality.py @@ -20,7 +20,7 @@ parse_source_fields, parse_template_field_defs, ) -from .parsers.date_parser import UNKNOWN_SORT_DATE, parse_rm_date +from .parsers.date_parser import UNKNOWN_SORT_DATE # Numeric constants YEAR_SECONDS = 31557600 @@ -688,11 +688,7 @@ def _rule_4_3(self, rule: QualityRule) -> list[QualityIssue]: continue required = [field.name for field in template_fields if not field.citation_field] - missing = [ - field_name - for field_name in required - if not actual_fields.get(field_name, "").strip() - ] + missing = [field_name for field_name in required if not actual_fields.get(field_name, "").strip()] if missing: issues.append( { @@ -753,9 +749,7 @@ def _rule_5_1(self, rule: QualityRule) -> list[QualityIssue]: AND LENGTH(CAST(ABS(CAST(SortDate AS INTEGER)) AS TEXT)) NOT IN (18, 19)) ) """ - rows = self.db.query( - sql, (UNKNOWN_SORT_DATE, UNKNOWN_SORT_DATE, 
UNKNOWN_SORT_DATE, UNKNOWN_SORT_DATE) - ) + rows = self.db.query(sql, (UNKNOWN_SORT_DATE, UNKNOWN_SORT_DATE, UNKNOWN_SORT_DATE, UNKNOWN_SORT_DATE)) if not rows: return [] diff --git a/rmagent/rmlib/queries.py b/rmagent/rmlib/queries.py index 991923d..70a546e 100644 --- a/rmagent/rmlib/queries.py +++ b/rmagent/rmlib/queries.py @@ -337,9 +337,7 @@ def get_unsourced_vital_events( return self.db.query(sql, tuple(params)) # Pattern 13 - def find_places_by_name( - self, pattern: str, limit: int = DEFAULT_RESULT_LIMIT, exact: bool = False - ): + def find_places_by_name(self, pattern: str, limit: int = DEFAULT_RESULT_LIMIT, exact: bool = False): """ Find places by name with flexible or exact matching. @@ -382,7 +380,7 @@ def find_places_by_name( else: # Flexible matching (original behavior) # Split pattern by comma-space to get hierarchy parts - parts = [p.strip() for p in pattern.split(',') if p.strip()] + parts = [p.strip() for p in pattern.split(",") if p.strip()] if len(parts) == 1: # Simple case: single search term @@ -453,9 +451,7 @@ def find_places_within_radius( center_lon = center["Longitude"] if center["Longitude"] is not None else 0 if not center_lat or not center_lon or center_lat == 0 or center_lon == 0: - raise ValueError( - f"Place '{center['Name']}' (ID {center_place_id}) has no GPS coordinates" - ) + raise ValueError(f"Place '{center['Name']}' (ID {center_place_id}) has no GPS coordinates") # Convert integer coordinates to degrees center_lat_deg = center_lat / 10_000_000.0 @@ -481,9 +477,7 @@ def find_places_within_radius( place_lat_deg = place["Latitude"] / 10_000_000.0 place_lon_deg = place["Longitude"] / 10_000_000.0 - distance_km = _haversine_distance( - center_lat_deg, center_lon_deg, place_lat_deg, place_lon_deg - ) + distance_km = _haversine_distance(center_lat_deg, center_lon_deg, place_lat_deg, place_lon_deg) if distance_km <= radius_km: results.append( @@ -562,7 +556,7 @@ def _haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) 
-> f import math # Earth radius in kilometers - R = 6371.0 + earth_radius_km = 6371.0 # Convert degrees to radians lat1_rad = math.radians(lat1) @@ -571,11 +565,8 @@ def _haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> f delta_lon = math.radians(lon2 - lon1) # Haversine formula - a = ( - math.sin(delta_lat / 2) ** 2 - + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon / 2) ** 2 - ) + a = math.sin(delta_lat / 2) ** 2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon / 2) ** 2 c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)) - distance = R * c + distance = earth_radius_km * c return distance diff --git a/sqlite-extension/python_example.py b/sqlite-extension/python_example.py index de3bab0..c717b89 100755 --- a/sqlite-extension/python_example.py +++ b/sqlite-extension/python_example.py @@ -49,9 +49,7 @@ def connect_rmtree(db_path, extension_path="./sqlite-extension/icu.dylib"): # - caseLevel=off: Ignore case differences # - normalization=on: Normalize Unicode characters conn.execute( - "SELECT icu_load_collation(" - "'en_US@colStrength=primary;caseLevel=off;normalization=on'," - "'RMNOCASE')" + "SELECT icu_load_collation(" "'en_US@colStrength=primary;caseLevel=off;normalization=on'," "'RMNOCASE')" ) finally: # Disable extension loading (security best practice) diff --git a/tests/integration/test_llm_providers.py b/tests/integration/test_llm_providers.py index 0430307..42ccbc6 100644 --- a/tests/integration/test_llm_providers.py +++ b/tests/integration/test_llm_providers.py @@ -176,9 +176,7 @@ class TestProviderInterfaceCompliance: ), ( OllamaProvider, - lambda m: setattr( - m, "generate", lambda **kw: {"response": "Text", "eval_count": 10} - ), + lambda m: setattr(m, "generate", lambda **kw: {"response": "Text", "eval_count": 10}), ), ], ) diff --git a/tests/integration/test_real_providers.py b/tests/integration/test_real_providers.py index 4698482..4c183b3 100644 --- a/tests/integration/test_real_providers.py +++ 
b/tests/integration/test_real_providers.py @@ -24,6 +24,7 @@ if _env_path.exists(): load_dotenv(_env_path) + # Environment checks - detect placeholder vs real keys def _is_real_key(key_value: str | None) -> bool: """Check if API key is real (not placeholder like sk-xxxxx).""" @@ -68,9 +69,7 @@ def test_genealogy_specific_prompt(self): assert result.usage.total_tokens > 0 # Check for genealogy keywords text_lower = result.text.lower() - assert any( - word in text_lower for word in ["census", "vital", "records", "birth", "death", "marriage"] - ) + assert any(word in text_lower for word in ["census", "vital", "records", "birth", "death", "marriage"]) @pytest.mark.real_api diff --git a/tests/unit/test_biography_generator.py b/tests/unit/test_biography_generator.py index 39d4dc0..d1a376a 100644 --- a/tests/unit/test_biography_generator.py +++ b/tests/unit/test_biography_generator.py @@ -336,6 +336,7 @@ def test_apply_privacy_rules_for_living_person(self): def test_generate_introduction(self): """Test generating introduction section.""" from rmagent.generators.biography import BiographyTemplates + templates = BiographyTemplates() context = PersonContext( @@ -370,6 +371,7 @@ def test_generate_introduction(self): def test_generate_early_life(self): """Test generating early life section.""" from rmagent.generators.biography import BiographyTemplates + templates = BiographyTemplates() # Test with siblings @@ -399,6 +401,7 @@ def test_generate_early_life(self): def test_format_sources_footnote_style(self): """Test formatting sources in footnote style.""" from rmagent.generators.biography import CitationProcessor + citation_processor = CitationProcessor() context = PersonContext( @@ -438,6 +441,7 @@ def test_format_sources_footnote_style(self): def test_format_sources_parenthetical_style(self): """Test formatting sources in parenthetical style.""" from rmagent.generators.biography import CitationProcessor + citation_processor = CitationProcessor() context = PersonContext( @@ 
-470,6 +474,7 @@ def test_format_sources_parenthetical_style(self): def test_parse_ai_response(self): """Test parsing AI-generated biography.""" from rmagent.generators.biography import BiographyTemplates + templates = BiographyTemplates() ai_response = """ @@ -641,9 +646,7 @@ def test_categorize_events(self, real_db_path, extension_path): }, # Residence ] - vital, education, occupation, military, residence, other = generator._categorize_events( - db, events - ) + vital, education, occupation, military, residence, other = generator._categorize_events(db, events) assert len(vital) == 1 assert len(education) == 1 diff --git a/tests/unit/test_citations.py b/tests/unit/test_citations.py new file mode 100644 index 0000000..99919b7 --- /dev/null +++ b/tests/unit/test_citations.py @@ -0,0 +1,408 @@ +""" +Unit tests for biography citation processing and formatting. + +Tests citation formatting, footnote generation, and bibliography creation. +""" + +from rmagent.generators.biography.citations import CitationProcessor +from rmagent.generators.biography.models import CitationInfo, CitationStyle, CitationTracker + + +class TestStripSourceTypePrefix: + """Test strip_source_type_prefix static method.""" + + def test_strip_book_prefix(self): + """Test removing 'Book: ' prefix.""" + result = CitationProcessor.strip_source_type_prefix("Book: Smith Family History") + assert result == "Smith Family History" + + def test_strip_newspaper_prefix(self): + """Test removing 'Newspaper: ' prefix.""" + result = CitationProcessor.strip_source_type_prefix("Newspaper: Baltimore Sun") + assert result == "Baltimore Sun" + + def test_strip_newspapers_plural_prefix(self): + """Test removing 'Newspapers: ' prefix.""" + result = CitationProcessor.strip_source_type_prefix("Newspapers: New York Times") + assert result == "New York Times" + + def test_no_prefix_to_strip(self): + """Test source name without prefix remains unchanged.""" + result = CitationProcessor.strip_source_type_prefix("US Census 
Records") + assert result == "US Census Records" + + def test_strip_cemetery_prefix(self): + """Test removing 'Cemetery: ' prefix.""" + result = CitationProcessor.strip_source_type_prefix("Cemetery: Oak Hill") + assert result == "Oak Hill" + + def test_strip_website_prefix(self): + """Test removing 'Website: ' prefix.""" + result = CitationProcessor.strip_source_type_prefix("Website: Ancestry.com") + assert result == "Ancestry.com" + + +class TestFormatCitationInfo: + """Test format_citation_info method.""" + + def test_format_freeform_citation_with_all_fields(self): + """Test formatting free-form citation with all fields populated.""" + processor = CitationProcessor() + citation = { + "CitationID": 123, + "SourceID": 456, + "TemplateID": 0, # Free-form + "Footnote": "Smith, *Family History*, p. 42", + "ShortFootnote": "Smith, p. 42", + "CitationBibliography": "Smith, John. *Family History*. Publisher, 2000.", + } + + result = processor.format_citation_info(citation) + + assert isinstance(result, CitationInfo) + assert result.citation_id == 123 + assert result.source_id == 456 + assert result.footnote == "Smith, *Family History*, p. 42" + assert result.short_footnote == "Smith, p. 42" + assert result.bibliography == "Smith, John. *Family History*. Publisher, 2000." 
+ assert result.is_freeform is True + assert result.template_name is None + + def test_format_template_citation(self): + """Test formatting template-based citation shows placeholders.""" + processor = CitationProcessor() + citation = { + "CitationID": 789, + "SourceID": 101, + "TemplateID": 5, # Template-based + "TemplateName": "US Census", + "Footnote": None, + "ShortFootnote": None, + "CitationBibliography": None, + } + + result = processor.format_citation_info(citation) + + assert result.citation_id == 789 + assert result.source_id == 101 + assert result.is_freeform is False + assert result.template_name == "US Census" + assert "[Citation 789, Template: US Census]" in result.footnote + assert "[Source 101, Template: US Census]" in result.bibliography + + +class TestProcessCitationsInText: + """Test process_citations_in_text method.""" + + def test_process_single_citation(self): + """Test processing single citation marker in text.""" + processor = CitationProcessor() + text = "He was born in 1850.{{cite:123}}" + citations = [ + { + "CitationID": 123, + "SourceID": 456, + "TemplateID": 0, + "Footnote": "Birth Record, p. 10", + "ShortFootnote": "Birth Record", + "CitationBibliography": "Vital Records Office.", + } + ] + + modified_text, footnotes, tracker = processor.process_citations_in_text(text, citations) + + assert modified_text == "He was born in 1850.[^1]" + assert len(footnotes) == 1 + assert footnotes[0][0] == 1 # Footnote number + assert footnotes[0][1].citation_id == 123 + assert len(tracker.citation_order) == 1 + + def test_process_multiple_citations(self): + """Test processing multiple citation markers in text.""" + processor = CitationProcessor() + text = "He was born{{cite:123}} and died{{cite:456}}." 
+ citations = [ + { + "CitationID": 123, + "SourceID": 1, + "TemplateID": 0, + "Footnote": "Birth Record", + "ShortFootnote": "Birth Record", + "CitationBibliography": "Vital Records.", + }, + { + "CitationID": 456, + "SourceID": 2, + "TemplateID": 0, + "Footnote": "Death Record", + "ShortFootnote": "Death Record", + "CitationBibliography": "Death Index.", + }, + ] + + modified_text, footnotes, tracker = processor.process_citations_in_text(text, citations) + + assert modified_text == "He was born[^1] and died[^2]." + assert len(footnotes) == 2 + assert footnotes[0][0] == 1 + assert footnotes[1][0] == 2 + + def test_process_duplicate_citation(self): + """Test that duplicate citations get same footnote number.""" + processor = CitationProcessor() + text = "First mention{{cite:123}} and second mention{{cite:123}}." + citations = [ + { + "CitationID": 123, + "SourceID": 456, + "TemplateID": 0, + "Footnote": "Source A", + "ShortFootnote": "Source A", + "CitationBibliography": "Bibliography A.", + } + ] + + modified_text, footnotes, tracker = processor.process_citations_in_text(text, citations) + + assert modified_text == "First mention[^1] and second mention[^1]." + assert len(footnotes) == 1 # Only one unique citation + + def test_process_missing_citation(self): + """Test processing citation marker with missing citation.""" + processor = CitationProcessor() + text = "Reference to missing citation{{cite:999}}." + citations = [] # No citations available + + modified_text, footnotes, tracker = processor.process_citations_in_text(text, citations) + + assert "[^999?]" in modified_text # Should show placeholder with ? + assert len(footnotes) == 0 + + def test_process_no_citations(self): + """Test text with no citation markers.""" + processor = CitationProcessor() + text = "Plain text with no citations." 
+ citations = [] + + modified_text, footnotes, tracker = processor.process_citations_in_text(text, citations) + + assert modified_text == text + assert len(footnotes) == 0 + assert len(tracker.citation_order) == 0 + + +class TestGenerateFootnotesSection: + """Test generate_footnotes_section method.""" + + def test_generate_single_footnote(self): + """Test generating footnotes section with single entry.""" + processor = CitationProcessor() + tracker = CitationTracker() + tracker.add_citation(123, 456) + + citation_info = CitationInfo( + citation_id=123, + source_id=456, + footnote="Full footnote text", + short_footnote="Short footnote", + bibliography="Bibliography entry", + is_freeform=True, + template_name=None, + ) + footnotes = [(1, citation_info)] + + result = processor.generate_footnotes_section(footnotes, tracker) + + assert result == " [^1]: Full footnote text" + + def test_generate_multiple_footnotes_first_and_subsequent(self): + """Test first citation uses full footnote, subsequent use short.""" + processor = CitationProcessor() + tracker = CitationTracker() + + # Same source cited twice + tracker.add_citation(123, 456) # First citation for source 456 + tracker.add_citation(124, 456) # Second citation for same source + + citation1 = CitationInfo( + citation_id=123, + source_id=456, + footnote="Full footnote for source 456", + short_footnote="Short for 456", + bibliography="Bibliography", + is_freeform=True, + template_name=None, + ) + citation2 = CitationInfo( + citation_id=124, + source_id=456, + footnote="Full footnote for source 456", + short_footnote="Short for 456", + bibliography="Bibliography", + is_freeform=True, + template_name=None, + ) + + footnotes = [(1, citation1), (2, citation2)] + result = processor.generate_footnotes_section(footnotes, tracker) + + lines = result.split("\n") + assert "Full footnote for source 456" in lines[0] # First uses full + assert "Short for 456" in lines[1] # Second uses short + + +class TestGenerateSourcesSection: + 
"""Test generate_sources_section method.""" + + def test_generate_single_source(self): + """Test generating bibliography with single source.""" + processor = CitationProcessor() + citations = [ + { + "CitationID": 123, + "SourceID": 456, + "TemplateID": 0, + "Footnote": "Footnote", + "ShortFootnote": "Short", + "CitationBibliography": "Smith, John. *Family History*. 2000.", + } + ] + + result = processor.generate_sources_section(citations) + + assert " Smith, John. *Family History*. 2000." in result + + def test_generate_multiple_sources_sorted(self): + """Test bibliography is alphabetically sorted.""" + processor = CitationProcessor() + citations = [ + { + "CitationID": 1, + "SourceID": 1, + "TemplateID": 0, + "Footnote": "F", + "ShortFootnote": "S", + "CitationBibliography": "Zimmerman, Alice. Book Z.", + }, + { + "CitationID": 2, + "SourceID": 2, + "TemplateID": 0, + "Footnote": "F", + "ShortFootnote": "S", + "CitationBibliography": "Adams, Bob. Book A.", + }, + ] + + result = processor.generate_sources_section(citations) + + lines = result.split("\n") + assert "Adams" in lines[0] # Adams should be first alphabetically + assert "Zimmerman" in lines[1] # Zimmerman should be second + + def test_deduplicate_sources_by_id(self): + """Test that sources are deduplicated by SourceID.""" + processor = CitationProcessor() + citations = [ + { + "CitationID": 1, + "SourceID": 100, + "TemplateID": 0, + "Footnote": "F", + "ShortFootnote": "S", + "CitationBibliography": "Same Source.", + }, + { + "CitationID": 2, + "SourceID": 100, # Same SourceID + "TemplateID": 0, + "Footnote": "F", + "ShortFootnote": "S", + "CitationBibliography": "Same Source.", + }, + ] + + result = processor.generate_sources_section(citations) + + # Should only appear once + assert result.count("Same Source.") == 1 + + +class TestFormatSourcesSection: + """Test format_sources_section method for legacy formatting.""" + + @staticmethod + def _create_minimal_context(**kwargs): + """Helper to create 
PersonContext with minimal required fields.""" + from rmagent.generators.biography.models import PersonContext + + defaults = { + "person_id": 1, + "full_name": "Test Person", + "given_name": "Test", + "surname": "Person", + "prefix": None, + "suffix": None, + "nickname": None, + "birth_year": None, + "birth_date": None, + "birth_place": None, + "death_year": None, + "death_date": None, + "death_place": None, + "sex": 2, # Unknown + "is_private": False, + "is_living": False, + } + defaults.update(kwargs) + return PersonContext(**defaults) + + def test_format_footnote_style(self): + """Test formatting sources in footnote style.""" + processor = CitationProcessor() + context = self._create_minimal_context( + all_citations=[ + {"SourceName": "Book: Family History", "CitationName": "Page 42"}, + ] + ) + + result = processor.format_sources_section(context, CitationStyle.FOOTNOTE) + + assert "1. *Family History*" in result # Prefix stripped + assert " Page 42" in result + + def test_format_parenthetical_style(self): + """Test formatting sources in parenthetical style.""" + processor = CitationProcessor() + context = self._create_minimal_context( + all_citations=[ + {"SourceName": "Newspaper: Daily News", "CitationName": "1950-01-01"}, + ] + ) + + result = processor.format_sources_section(context, CitationStyle.PARENTHETICAL) + + assert "- *Daily News*" in result # Prefix stripped + assert " (1950-01-01)" in result + + def test_format_narrative_style(self): + """Test formatting sources in narrative style.""" + processor = CitationProcessor() + context = self._create_minimal_context( + all_citations=[ + {"SourceName": "Census Records", "CitationName": ""}, + ] + ) + + result = processor.format_sources_section(context, CitationStyle.NARRATIVE) + + assert "- *Census Records*" in result + + def test_format_no_citations(self): + """Test formatting with no citations returns empty string.""" + processor = CitationProcessor() + context = 
self._create_minimal_context(all_citations=[]) + + result = processor.format_sources_section(context, CitationStyle.FOOTNOTE) + + assert result == "" diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py index 74f5715..922cdf7 100644 --- a/tests/unit/test_cli.py +++ b/tests/unit/test_cli.py @@ -62,6 +62,59 @@ def test_person_with_id(self, runner, test_db_path): # Should succeed even if person not found (graceful error) assert "Person" in result.output or "Error" in result.output + def test_person_with_events(self, runner, test_db_path): + """Test person command with --events flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "person", "1", "--events"]) + assert result.exit_code == 0 + # Should show events section + assert "Events" in result.output or "Birth" in result.output + + def test_person_with_family(self, runner, test_db_path): + """Test person command with --family flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "person", "1", "--family"]) + assert result.exit_code == 0 + # Should show family information + assert "Family" in result.output or "Parents" in result.output or "Children" in result.output + + def test_person_with_ancestors(self, runner, test_db_path): + """Test person command with --ancestors flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "person", "1", "--ancestors"]) + assert result.exit_code == 0 + # Should show ancestors + assert "Ancestors" in result.output or "Generation" in result.output + + def test_person_with_descendants(self, runner, test_db_path): + """Test person command with --descendants flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "person", "1", "--descendants"]) + assert result.exit_code == 0 + # Should show descendants + assert "Descendants" in result.output or "Generation" in result.output + + def test_person_with_all_flags(self, runner, test_db_path): + """Test person command with all information flags.""" + result = runner.invoke( + cli, + 
[ + "--database", + test_db_path, + "person", + "1", + "--events", + "--family", + "--ancestors", + "--descendants", + ], + ) + assert result.exit_code == 0 + # Should contain comprehensive information + assert "Person" in result.output + + def test_person_invalid_id(self, runner, test_db_path): + """Test person command with invalid person ID.""" + result = runner.invoke(cli, ["--database", test_db_path, "person", "999999"]) + # Should handle gracefully - either show error or empty result + assert result.exit_code in [0, 1] + class TestBioCommand: """Test bio command.""" @@ -87,9 +140,7 @@ def test_bio_with_invalid_length(self, runner, test_db_path): def test_bio_no_ai_template_based(self, runner, test_db_path, tmp_path): """Test bio command with --no-ai flag (template-based generation).""" output_file = tmp_path / "bio_test.md" - result = runner.invoke( - cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--output", str(output_file)] - ) + result = runner.invoke(cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--output", str(output_file)]) # Should succeed with template-based generation assert result.exit_code == 0 assert output_file.exists() @@ -119,26 +170,20 @@ def test_bio_length_variations(self, runner, test_db_path): def test_bio_citation_styles(self, runner, test_db_path): """Test bio with different citation styles.""" for style in ["footnote", "parenthetical", "narrative"]: - result = runner.invoke( - cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--citation-style", style] - ) + result = runner.invoke(cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--citation-style", style]) assert result.exit_code == 0 def test_bio_with_file_output(self, runner, test_db_path, tmp_path): """Test bio with file output.""" output_file = tmp_path / "biography.md" - result = runner.invoke( - cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--output", str(output_file)] - ) + result = runner.invoke(cli, ["--database", test_db_path, 
"bio", "1", "--no-ai", "--output", str(output_file)]) assert result.exit_code == 0 assert "Biography written to" in result.output assert output_file.exists() def test_bio_no_sources(self, runner, test_db_path): """Test bio with --no-sources flag.""" - result = runner.invoke( - cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--no-sources"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "bio", "1", "--no-ai", "--no-sources"]) assert result.exit_code == 0 # Biography should not include sources section when --no-sources is used # (We can't easily verify this without parsing output, but command should succeed) @@ -165,9 +210,7 @@ def test_quality_with_invalid_format(self, runner): def test_quality_basic(self, runner, test_db_path, tmp_path): """Test basic quality report generation.""" output_file = tmp_path / "quality.md" - result = runner.invoke( - cli, ["--database", test_db_path, "quality", "--output", str(output_file)] - ) + result = runner.invoke(cli, ["--database", test_db_path, "quality", "--output", str(output_file)]) assert result.exit_code == 0 assert output_file.exists() assert "📊 Data Quality Summary" in result.output @@ -299,6 +342,28 @@ def test_ask_with_question(self, runner, test_db_path): # Either way, command should recognize the question format assert "Ask questions" not in result.output # Not showing help text + def test_ask_interactive_mode(self, runner, test_db_path): + """Test ask interactive mode with simulated user input.""" + # Simulate user typing a question then "quit" + result = runner.invoke( + cli, + ["--database", test_db_path, "ask", "--interactive"], + input="Who is person 1?\nquit\n", + ) + # Should enter interactive mode + assert "Interactive Q&A Mode" in result.output or result.exit_code in [0, 1] + + def test_ask_interactive_exit_commands(self, runner, test_db_path): + """Test that various exit commands work in interactive mode.""" + for exit_cmd in ["exit", "quit", "q"]: + result = runner.invoke( + cli, + 
["--database", test_db_path, "ask", "--interactive"], + input=f"{exit_cmd}\n", + ) + # Should accept exit command and show goodbye message + assert result.exit_code in [0, 1] # May succeed or fail depending on LLM + class TestTimelineCommand: """Test timeline command.""" @@ -397,9 +462,7 @@ def test_timeline_with_include_family(self, runner, test_db_path, tmp_path): def test_timeline_invalid_format(self, runner, test_db_path): """Test timeline with invalid format option.""" - result = runner.invoke( - cli, ["--database", test_db_path, "timeline", "1", "--format", "invalid"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "timeline", "1", "--format", "invalid"]) assert result.exit_code != 0 @@ -573,9 +636,7 @@ def test_search_by_name(self, runner, test_db_path): def test_search_by_full_name(self, runner, test_db_path): """Test search by full name (given and surname).""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "Michael Iams"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Michael Iams"]) assert result.exit_code == 0 def test_search_by_place(self, runner, test_db_path): @@ -587,40 +648,30 @@ def test_search_by_place(self, runner, test_db_path): def test_search_with_limit(self, runner, test_db_path): """Test search with custom limit.""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "Smith", "--limit", "10"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Smith", "--limit", "10"]) assert result.exit_code == 0 def test_search_exact_mode(self, runner, test_db_path): """Test search with --exact flag (no phonetic matching).""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "Iams", "--exact"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Iams", "--exact"]) assert result.exit_code == 0 def test_search_name_and_place(self, runner, 
test_db_path): """Test search with both name and place criteria.""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "Iams", "--place", "Maryland"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Iams", "--place", "Maryland"]) # Should show results for both searches assert result.exit_code == 0 def test_search_with_surname_variation(self, runner, test_db_path): """Test search with surname variation syntax [variant].""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "John Iiams [Ijams]"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "John Iiams [Ijams]"]) assert result.exit_code == 0 # Should show that it's searching multiple variations assert "Searching 2 name variations" in result.output or "Found" in result.output def test_search_with_multiple_variations(self, runner, test_db_path): """Test search with multiple surname variations.""" - result = runner.invoke( - cli, ["--database", test_db_path, "search", "--name", "John Iams [Ijams] [Imes]"] - ) + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "John Iams [Ijams] [Imes]"]) assert result.exit_code == 0 # Should search 3 variations (base + 2 variants) assert "Searching 3 name variations" in result.output or "Found" in result.output @@ -632,6 +683,54 @@ def test_search_with_all_keyword(self, runner, test_db_path): # Should search all 8 configured variants assert "Searching 8 name variations" in result.output or "Found" in result.output + def test_search_with_married_name(self, runner, test_db_path): + """Test search with --married-name flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Janet", "--married-name"]) + assert result.exit_code == 0 + # Should search for females by maiden and married names + assert "Found" in result.output or "No persons" in result.output + + def 
test_search_radius_both_units_error(self, runner, test_db_path): + """Test that specifying both --kilometers and --miles fails.""" + result = runner.invoke( + cli, + [ + "--database", + test_db_path, + "search", + "--place", + "Phoenix, Arizona", + "--kilometers", + "100", + "--miles", + "50", + ], + ) + assert result.exit_code != 0 + assert "Cannot specify both" in result.output + + def test_search_radius_negative_value(self, runner, test_db_path): + """Test that negative radius value fails.""" + result = runner.invoke( + cli, + ["--database", test_db_path, "search", "--place", "Phoenix, Arizona", "--kilometers", "-10"], + ) + assert result.exit_code != 0 + assert "must be positive" in result.output or "Error" in result.output + + def test_search_radius_without_place(self, runner, test_db_path): + """Test that radius search requires --place.""" + result = runner.invoke(cli, ["--database", test_db_path, "search", "--name", "Smith", "--kilometers", "100"]) + assert result.exit_code != 0 + assert "requires --place" in result.output or "Error" in result.output + + def test_search_place_exact_match(self, runner, test_db_path): + """Test place search with --exact flag.""" + result = runner.invoke(cli, ["--database", test_db_path, "search", "--place", "Maryland", "--exact"]) + assert result.exit_code == 0 + # Should return results or no matches + assert "Found" in result.output or "No places" in result.output + class TestGlobalOptions: """Test global CLI options.""" diff --git a/tests/unit/test_hugo_exporter.py b/tests/unit/test_hugo_exporter.py index df3c770..a69260a 100644 --- a/tests/unit/test_hugo_exporter.py +++ b/tests/unit/test_hugo_exporter.py @@ -112,9 +112,7 @@ def test_export_person_raises_error_without_database(self, tmp_path): with pytest.raises(ValueError, match="No database provided"): exporter.export_person(person_id=1, output_dir=tmp_path) - def test_export_person_raises_error_for_nonexistent_person( - self, tmp_path, real_db_path, extension_path - ): + 
def test_export_person_raises_error_for_nonexistent_person(self, tmp_path, real_db_path, extension_path): """Test that export_person raises ValueError for nonexistent person.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -276,9 +274,7 @@ def test_export_batch_with_index(self, tmp_path, real_db_path, extension_path): assert "Family Biographies" in content assert "---" in content # Has front matter - def test_export_batch_handles_invalid_person_gracefully( - self, tmp_path, real_db_path, extension_path - ): + def test_export_batch_handles_invalid_person_gracefully(self, tmp_path, real_db_path, extension_path): """Test batch export continues when one person fails.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -351,9 +347,7 @@ def test_complete_hugo_export_workflow(self, tmp_path, real_db_path, extension_p if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") - exporter = HugoExporter( - db=real_db_path, extension_path=extension_path, media_base_path="/media/" - ) + exporter = HugoExporter(db=real_db_path, extension_path=extension_path, media_base_path="/media/") # Create Hugo directory structure content_dir = tmp_path / "content" / "people" @@ -403,9 +397,7 @@ def test_media_references_in_export(self, tmp_path, real_db_path, extension_path if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") - exporter = HugoExporter( - db=real_db_path, extension_path=extension_path, media_base_path="/media/" - ) + exporter = HugoExporter(db=real_db_path, extension_path=extension_path, media_base_path="/media/") result = exporter.export_person( person_id=1, diff --git a/tests/unit/test_llm_provider.py b/tests/unit/test_llm_provider.py index 0722b95..bee920d 100644 --- 
a/tests/unit/test_llm_provider.py +++ b/tests/unit/test_llm_provider.py @@ -41,9 +41,7 @@ def _invoke(self, prompt: str, **kwargs): return LLMResult( text=text, model=self.model, - usage=TokenUsage( - prompt_tokens=len(prompt.split()), completion_tokens=len(text.split()) - ), + usage=TokenUsage(prompt_tokens=len(prompt.split()), completion_tokens=len(text.split())), ) @@ -64,9 +62,7 @@ def _invoke(self, prompt: str, **kwargs): self.invocations += 1 if self.invocations < 2: raise LLMError("temporary failure") - return LLMResult( - text="ok", model=self.model, usage=TokenUsage(prompt_tokens=1, completion_tokens=1) - ) + return LLMResult(text="ok", model=self.model, usage=TokenUsage(prompt_tokens=1, completion_tokens=1)) provider = FlakyProvider() result = provider.generate("prompt") diff --git a/tests/unit/test_name_parser.py b/tests/unit/test_name_parser.py index febee4f..0eecdfe 100644 --- a/tests/unit/test_name_parser.py +++ b/tests/unit/test_name_parser.py @@ -179,9 +179,7 @@ def test_full_name_minimal(self): def test_full_name_surname_only(self): """Test full name with surname only.""" - name = Name( - name_id=1, person_id=1, is_primary=True, name_type=NameType.BIRTH, surname="Smith" - ) + name = Name(name_id=1, person_id=1, is_primary=True, name_type=NameType.BIRTH, surname="Smith") assert name.full_name() == "Smith" @@ -457,9 +455,7 @@ def test_format_minimal(self): def test_format_no_nickname(self): """Test formatting without nickname.""" - full = format_full_name( - surname="Smith", given="John", nickname="Jack", include_nickname=False - ) + full = format_full_name(surname="Smith", given="John", nickname="Jack", include_nickname=False) assert full == "John Smith" diff --git a/tests/unit/test_place_parser.py b/tests/unit/test_place_parser.py index dfeed21..87db7b8 100644 --- a/tests/unit/test_place_parser.py +++ b/tests/unit/test_place_parser.py @@ -172,9 +172,7 @@ def test_get_level_2_state(self): def test_get_level_3_country(self): """Test getting level 3 
(country).""" - assert ( - get_place_level("Baltimore, Baltimore, Maryland, United States", 3) == "United States" - ) + assert get_place_level("Baltimore, Baltimore, Maryland, United States", 3) == "United States" def test_get_level_out_of_range(self): """Test getting level that doesn't exist.""" @@ -192,10 +190,7 @@ class TestGetPlaceShort: def test_get_short_us_place_2_levels(self): """Test short form for US place (skips county).""" - assert ( - get_place_short("Baltimore, Baltimore, Maryland, United States", 2) - == "Baltimore, Maryland" - ) + assert get_place_short("Baltimore, Baltimore, Maryland, United States", 2) == "Baltimore, Maryland" def test_get_short_international_place_2_levels(self): """Test short form for international place.""" @@ -217,18 +212,12 @@ class TestFormatPlaceShort: def test_format_us_4_level(self): """Test formatting US 4-level place.""" - assert ( - format_place_short("Baltimore, Baltimore, Maryland, United States") - == "Baltimore, Maryland" - ) + assert format_place_short("Baltimore, Baltimore, Maryland, United States") == "Baltimore, Maryland" def test_format_us_3_level(self): """Test formatting US 3-level place.""" # 3-level place: City, State, Country - format returns City, Country (level 0 and 2) - assert ( - format_place_short("Abbeville, South Carolina, United States") - == "Abbeville, United States" - ) + assert format_place_short("Abbeville, South Carolina, United States") == "Abbeville, United States" def test_format_international_4_level(self): """Test formatting international 4-level place.""" @@ -249,10 +238,7 @@ class TestFormatPlaceMedium: def test_format_medium_4_level(self): """Test medium format for 4-level place.""" - assert ( - format_place_medium("Baltimore, Baltimore, Maryland, United States") - == "Baltimore, Baltimore, Maryland" - ) + assert format_place_medium("Baltimore, Baltimore, Maryland, United States") == "Baltimore, Baltimore, Maryland" def test_format_medium_3_level(self): """Test medium format for 
3-level place.""" diff --git a/tests/unit/test_quality.py b/tests/unit/test_quality.py index d9a3b69..d3b4c53 100644 --- a/tests/unit/test_quality.py +++ b/tests/unit/test_quality.py @@ -7,11 +7,8 @@ from __future__ import annotations -from collections.abc import Iterable from pathlib import Path -import pytest - # Ensure repository root is available on sys.path when running with pytest -o addopts='' PROJECT_ROOT = Path(__file__).resolve().parents[2] import sys diff --git a/tests/unit/test_quality_report.py b/tests/unit/test_quality_report.py index 427ff7e..5f34aae 100644 --- a/tests/unit/test_quality_report.py +++ b/tests/unit/test_quality_report.py @@ -261,9 +261,7 @@ def test_generate_raises_error_without_database(self): with pytest.raises(ValueError, match="No database provided"): generator.generate(format=ReportFormat.MARKDOWN) - def test_generate_markdown_with_mock_validation( - self, real_db_path, extension_path, mock_quality_report - ): + def test_generate_markdown_with_mock_validation(self, real_db_path, extension_path, mock_quality_report): """Test generate with mocked validation.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -278,9 +276,7 @@ def test_generate_markdown_with_mock_validation( assert "Total People:** 10,000" in report assert "Total Issues Found:** 185" in report - def test_generate_html_with_mock_validation( - self, real_db_path, extension_path, mock_quality_report - ): + def test_generate_html_with_mock_validation(self, real_db_path, extension_path, mock_quality_report): """Test HTML generation with mocked validation.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -293,9 +289,7 @@ def test_generate_html_with_mock_validation( assert "" in report assert "

Data Quality Report
      " in report - def test_generate_csv_with_mock_validation( - self, real_db_path, extension_path, mock_quality_report - ): + def test_generate_csv_with_mock_validation(self, real_db_path, extension_path, mock_quality_report): """Test CSV generation with mocked validation.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -309,9 +303,7 @@ def test_generate_csv_with_mock_validation( assert "Rule Name" in report assert "1.1" in report - def test_generate_with_output_path( - self, tmp_path, real_db_path, extension_path, mock_quality_report - ): + def test_generate_with_output_path(self, tmp_path, real_db_path, extension_path, mock_quality_report): """Test writing report to file.""" if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") @@ -363,9 +355,7 @@ def test_generate_real_markdown_report(self, real_db_path, extension_path): if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") - generator = QualityReportGenerator( - db=real_db_path, extension_path=extension_path, sample_limit=5 - ) + generator = QualityReportGenerator(db=real_db_path, extension_path=extension_path, sample_limit=5) report = generator.generate(format=ReportFormat.MARKDOWN) @@ -394,9 +384,7 @@ def test_generate_real_html_report(self, real_db_path, extension_path): if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") - generator = QualityReportGenerator( - db=real_db_path, extension_path=extension_path, sample_limit=5 - ) + generator = QualityReportGenerator(db=real_db_path, extension_path=extension_path, sample_limit=5) report = generator.generate(format=ReportFormat.HTML) @@ -419,9 +407,7 @@ def test_generate_real_csv_report(self, real_db_path, extension_path): if not real_db_path.exists() or not extension_path.exists(): 
pytest.skip("Real database or ICU extension not available") - generator = QualityReportGenerator( - db=real_db_path, extension_path=extension_path, sample_limit=5 - ) + generator = QualityReportGenerator(db=real_db_path, extension_path=extension_path, sample_limit=5) report = generator.generate(format=ReportFormat.CSV) @@ -441,9 +427,7 @@ def test_generate_all_formats(self, real_db_path, extension_path): if not real_db_path.exists() or not extension_path.exists(): pytest.skip("Real database or ICU extension not available") - generator = QualityReportGenerator( - db=real_db_path, extension_path=extension_path, sample_limit=3 - ) + generator = QualityReportGenerator(db=real_db_path, extension_path=extension_path, sample_limit=3) # Generate all three formats markdown_report = generator.generate(format=ReportFormat.MARKDOWN) diff --git a/tests/unit/test_queries.py b/tests/unit/test_queries.py index b063138..2d12f0a 100644 --- a/tests/unit/test_queries.py +++ b/tests/unit/test_queries.py @@ -148,3 +148,111 @@ def test_find_logical_inconsistencies(query_service: QueryService) -> None: assert rows for row in rows: assert row["DeathSort"] < row["BirthSort"] + + +def test_search_names_flexible(query_service: QueryService) -> None: + """Test flexible name search that searches surname or given name.""" + rows = query_service.search_names_flexible("Michael", limit=10) + assert rows + # Should find people with "Michael" in given or surname + for row in rows: + name_text = f"{row['Given']} {row['Surname']}".lower() + assert "michael" in name_text + + +def test_search_names_by_words(query_service: QueryService) -> None: + """Test multi-word search where all words must appear.""" + rows = query_service.search_names_by_words("Michael Iams", limit=10) + assert rows + # All results should contain both "Michael" and "Iams" + for row in rows: + name_text = f"{row['Given']} {row['Surname']}".lower() + assert "michael" in name_text and "iams" in name_text.lower() + + +def 
test_search_names_with_married(query_service: QueryService) -> None: + """Test search for females by maiden or married name.""" + # This searches only females (Sex=1) and includes spouse surnames + rows = query_service.search_names_with_married("Dorsey", limit=10) + # Should return results if there are females with Dorsey as maiden or married name + # Note: May be empty if no matches, so just verify it runs without error + assert isinstance(rows, list) + + +def test_search_names_with_married_by_words(query_service: QueryService) -> None: + """Test multi-word search for females by maiden or married name.""" + rows = query_service.search_names_with_married_by_words("Janet Iams", limit=10) + # Should search for females where all words appear in name + assert isinstance(rows, list) + + +def test_find_places_within_radius(query_service: QueryService) -> None: + """Test finding places within a radius of a center point.""" + # First find a place with coordinates to use as center + all_places = query_service.db.query( + "SELECT PlaceID, Name, Latitude, Longitude FROM PlaceTable " + "WHERE Latitude IS NOT NULL AND Latitude != 0 " + "AND Longitude IS NOT NULL AND Longitude != 0 " + "LIMIT 1" + ) + if not all_places: + pytest.skip("No places with GPS coordinates in database") + + center_place_id = all_places[0]["PlaceID"] + + # Search within 100km radius + rows = query_service.find_places_within_radius(center_place_id, radius_km=100, limit=10) + + # Results should be sorted by distance + if len(rows) > 1: + distances = [r["DistanceKm"] for r in rows] + assert distances == sorted(distances) + + # All results should have distance and be within radius + for row in rows: + assert "DistanceKm" in row + assert row["DistanceKm"] <= 100 + + +def test_find_places_within_radius_no_coordinates(query_service: QueryService) -> None: + """Test that radius search fails gracefully for places without coordinates.""" + # Create or find a place without coordinates + places_no_coords = 
query_service.db.query( + "SELECT PlaceID FROM PlaceTable WHERE Latitude IS NULL OR Latitude = 0 LIMIT 1" + ) + if places_no_coords: + with pytest.raises(ValueError, match="has no GPS coordinates"): + query_service.find_places_within_radius(places_no_coords[0]["PlaceID"], radius_km=100) + + +def test_get_person_count_by_place(query_service: QueryService) -> None: + """Test counting unique people with events at a place.""" + # Find a place that has events + places = query_service.find_places_by_name("Maryland", limit=1) + if not places: + pytest.skip("No places named Maryland in database") + + place_id = places[0]["PlaceID"] + count = query_service.get_person_count_by_place(place_id) + + # Count should be non-negative integer + assert isinstance(count, int) + assert count >= 0 + + +def test_find_places_exact_match(query_service: QueryService) -> None: + """Test exact place name matching.""" + # First get a known place name + all_places = query_service.find_places_by_name("Maryland", limit=1, exact=False) + if not all_places: + pytest.skip("No places with Maryland in name") + + exact_name = all_places[0]["Name"] + + # Now search for exact match + exact_results = query_service.find_places_by_name(exact_name, limit=10, exact=True) + + # Should return only places with exact name match (case-insensitive) + assert len(exact_results) > 0 + for place in exact_results: + assert place["Name"].lower() == exact_name.lower() diff --git a/tests/unit/test_rendering.py b/tests/unit/test_rendering.py new file mode 100644 index 0000000..188ce9d --- /dev/null +++ b/tests/unit/test_rendering.py @@ -0,0 +1,424 @@ +""" +Unit tests for biography rendering and markdown generation. + +Tests biography rendering, metadata formatting, and image handling. 
+""" + +from rmagent.generators.biography.models import Biography, BiographyLength, CitationStyle, LLMMetadata +from rmagent.generators.biography.rendering import BiographyRenderer + + +class TestFormatTokens: + """Test format_tokens static method.""" + + def test_format_less_than_thousand(self): + """Test formatting tokens less than 1000.""" + assert BiographyRenderer.format_tokens(500) == "500" + assert BiographyRenderer.format_tokens(999) == "999" + + def test_format_thousands(self): + """Test formatting tokens in thousands.""" + assert BiographyRenderer.format_tokens(1000) == "1.0k" + assert BiographyRenderer.format_tokens(1500) == "1.5k" + assert BiographyRenderer.format_tokens(2300) == "2.3k" + + def test_format_large_numbers(self): + """Test formatting large token counts.""" + assert BiographyRenderer.format_tokens(10000) == "10.0k" + assert BiographyRenderer.format_tokens(42500) == "42.5k" + + +class TestFormatDuration: + """Test format_duration static method.""" + + def test_format_seconds_only(self): + """Test formatting durations less than 60 seconds.""" + assert BiographyRenderer.format_duration(5.2) == "5s" + assert BiographyRenderer.format_duration(45.9) == "45s" + assert BiographyRenderer.format_duration(59) == "59s" + + def test_format_minutes_and_seconds(self): + """Test formatting durations with minutes and seconds.""" + assert BiographyRenderer.format_duration(65) == "1m5s" + assert BiographyRenderer.format_duration(125) == "2m5s" + assert BiographyRenderer.format_duration(183.5) == "3m3s" + + def test_format_minutes_only(self): + """Test formatting durations with even minutes.""" + assert BiographyRenderer.format_duration(60) == "1m" + assert BiographyRenderer.format_duration(120) == "2m" + assert BiographyRenderer.format_duration(180) == "3m" + + +class TestFormatImageCaption: + """Test _format_image_caption static method.""" + + def test_caption_with_both_years(self): + """Test caption with birth and death years.""" + caption = 
BiographyRenderer._format_image_caption("John Doe", 1850, 1920) + assert caption == "John Doe (1850-1920)" + + def test_caption_birth_only(self): + """Test caption with only birth year.""" + caption = BiographyRenderer._format_image_caption("Jane Smith", 1900, None) + assert caption == "Jane Smith (1900-????)" + + def test_caption_death_only(self): + """Test caption with only death year.""" + caption = BiographyRenderer._format_image_caption("Bob Jones", None, 1950) + assert caption == "Bob Jones (????-1950)" + + def test_caption_no_years(self): + """Test caption without any years.""" + caption = BiographyRenderer._format_image_caption("Alice Brown", None, None) + assert caption == "Alice Brown" + + +class TestFormatImagePath: + """Test _format_image_path method.""" + + def test_format_path_with_question_mark_backslash(self): + """Test formatting path with question-backslash prefix (Windows-style).""" + renderer = BiographyRenderer() + media = {"MediaPath": r"?\Photos\Family", "MediaFile": "portrait.jpg"} + + result = renderer._format_image_path(media) + + # Path object preserves backslashes on Unix, but as_posix() converts separators + # The actual behavior depends on the implementation - accept either format + assert "../images" in result + assert "portrait.jpg" in result + + def test_format_path_with_question_mark_slash(self): + """Test formatting path with ?/ prefix (Unix-style).""" + renderer = BiographyRenderer() + media = {"MediaPath": "?/Photos/Family", "MediaFile": "photo.png"} + + result = renderer._format_image_path(media) + + assert result == "../images/Photos/Family/photo.png" + + def test_format_path_without_question_mark(self): + """Test formatting path without ? 
prefix.""" + renderer = BiographyRenderer() + media = {"MediaPath": "Photos/Family", "MediaFile": "image.jpg"} + + result = renderer._format_image_path(media) + + assert result == "Photos/Family/image.jpg" + + def test_format_path_no_media_path(self): + """Test formatting with no MediaPath (only MediaFile).""" + renderer = BiographyRenderer() + media = {"MediaPath": "", "MediaFile": "standalone.jpg"} + + result = renderer._format_image_path(media) + + assert result == "standalone.jpg" + + +class TestRenderMetadata: + """Test render_metadata method.""" + + @staticmethod + def _create_minimal_biography(**kwargs): + """Helper to create Biography with minimal required fields.""" + defaults = { + "person_id": 1, + "full_name": "Test Person", + "length": BiographyLength.STANDARD, + "citation_style": CitationStyle.FOOTNOTE, + "introduction": "Test intro", + "early_life": "", + "education": "", + "career": "", + "marriage_family": "", + "later_life": "", + "death_legacy": "", + "footnotes": "", + "sources": "", + } + defaults.update(kwargs) + return Biography(**defaults) + + def test_render_metadata_basic(self): + """Test rendering basic metadata without LLM metadata.""" + bio = self._create_minimal_biography( + full_name="John Doe", + birth_year=1850, + death_year=1920, + citation_count=5, + source_count=3, + ) + renderer = BiographyRenderer() + + result = renderer.render_metadata(bio) + + assert "---" in result + assert 'Title: "Biography of John Doe (1850-1920)"' in result + assert "PersonID: 1" in result + assert "Words:" in result + assert "Citations: 5" in result + assert "Sources: 3" in result + + def test_render_metadata_with_llm_metadata(self): + """Test rendering metadata with LLM metadata.""" + llm_meta = LLMMetadata( + provider="anthropic", + model="claude-3-5-sonnet-20241022", + prompt_tokens=1500, + completion_tokens=800, + total_tokens=2300, + prompt_time=2.5, + llm_time=5.3, + ) + bio = self._create_minimal_biography( + llm_metadata=llm_meta, + 
citation_count=10, + source_count=5, + ) + renderer = BiographyRenderer() + + result = renderer.render_metadata(bio) + + assert "TokensIn: 1.5k" in result + assert "TokensOut: 800" in result + assert "TotalTokens: 2.3k" in result + assert "LLM: Anthropic" in result + assert "Model: claude-3-5-sonnet-20241022" in result + assert "PromptTime: 2s" in result + assert "LLMTime: 5s" in result + + def test_render_metadata_missing_years(self): + """Test rendering metadata with missing birth/death years.""" + bio = self._create_minimal_biography( + birth_year=None, + death_year=None, + ) + renderer = BiographyRenderer() + + result = renderer.render_metadata(bio) + + # Should not include years in title when both are None + assert 'Title: "Biography of Test Person"' in result + assert "????" not in result # No placeholder years + + +class TestRenderMarkdown: + """Test render_markdown method.""" + + @staticmethod + def _create_minimal_biography(**kwargs): + """Helper to create Biography with minimal required fields.""" + defaults = { + "person_id": 1, + "full_name": "Test Person", + "length": BiographyLength.STANDARD, + "citation_style": CitationStyle.FOOTNOTE, + "introduction": "Test intro", + "early_life": "", + "education": "", + "career": "", + "marriage_family": "", + "later_life": "", + "death_legacy": "", + "footnotes": "", + "sources": "", + } + defaults.update(kwargs) + return Biography(**defaults) + + def test_render_markdown_with_all_sections(self): + """Test rendering biography with all sections populated.""" + bio = self._create_minimal_biography( + full_name="Jane Smith", + birth_year=1900, + death_year=1980, + introduction="Jane was born in 1900.", + early_life="She grew up in Maryland.", + education="She attended local schools.", + career="She worked as a teacher.", + marriage_family="She married John.", + later_life="She retired in 1965.", + death_legacy="She passed away in 1980.", + sources="Source 1\nSource 2", + ) + renderer = BiographyRenderer() + + result 
= renderer.render_markdown(bio, include_metadata=False) + + # Check all sections are present + assert "# Biography of Jane Smith (1900-1980)" in result + assert "## Introduction" in result + assert "Jane was born in 1900." in result + assert "## Early Life & Family Background" in result + assert "She grew up in Maryland." in result + assert "## Education" in result + assert "She attended local schools." in result + assert "## Career & Accomplishments" in result + assert "She worked as a teacher." in result + assert "## Marriage & Family" in result + assert "She married John." in result + assert "## Later Life & Activities" in result + assert "She retired in 1965." in result + assert "## Death & Legacy" in result + assert "She passed away in 1980." in result + assert "## Sources" in result + assert "Source 1" in result + + def test_render_markdown_with_metadata(self): + """Test rendering biography with front matter metadata.""" + bio = self._create_minimal_biography( + introduction="Test introduction.", + ) + renderer = BiographyRenderer() + + result = renderer.render_markdown(bio, include_metadata=True) + + # Should have front matter + assert "---" in result + assert "PersonID:" in result + + def test_render_markdown_without_metadata(self): + """Test rendering biography without front matter.""" + bio = self._create_minimal_biography( + introduction="Test introduction.", + ) + renderer = BiographyRenderer() + + result = renderer.render_markdown(bio, include_metadata=False) + + # Should not have front matter + lines = result.split("\n") + # First line should be the title, not --- + assert not lines[0].startswith("---") + assert lines[0].startswith("# Biography") + + def test_render_markdown_with_footnotes(self): + """Test rendering biography with footnotes section.""" + bio = self._create_minimal_biography( + introduction="Test intro.", + footnotes="[^1]: Footnote 1\n[^2]: Footnote 2", + citation_style=CitationStyle.FOOTNOTE, + ) + renderer = BiographyRenderer() + + 
result = renderer.render_markdown(bio, include_metadata=False) + + assert "## Footnotes" in result + assert "[^1]: Footnote 1" in result + + def test_render_markdown_no_footnotes_for_other_styles(self): + """Test that footnotes section is omitted for non-footnote citation styles.""" + bio = self._create_minimal_biography( + introduction="Test intro.", + footnotes="[^1]: Footnote 1", + citation_style=CitationStyle.NARRATIVE, # Not FOOTNOTE + ) + renderer = BiographyRenderer() + + result = renderer.render_markdown(bio, include_metadata=False) + + assert "## Footnotes" not in result + + def test_render_markdown_short_biography_no_images(self): + """Test that SHORT biographies don't include images.""" + bio = self._create_minimal_biography( + length=BiographyLength.SHORT, + introduction="Short bio.", + media_files=[{"IsPrimary": 1, "MediaPath": "?/test", "MediaFile": "photo.jpg"}], + ) + renderer = BiographyRenderer() + + result = renderer.render_markdown(bio, include_metadata=False) + + # Should not have image HTML + assert "' in result + assert '
| Metric | Count |
| --- | --- |
| Total People | {report.summary.get('total_people', 0):,} |
| Total Events | {report.summary.get('total_events', 0):,} |
| Total Sources | {report.summary.get('total_sources', 0):,} |
| Total Citations | {report.summary.get('total_citations', 0):,} |