Developer Documentation

Architecture

GedGraph is structured as a modular Python application with clear separation of concerns:

gedgraph/
├── __init__.py       # Package metadata and version
├── __main__.py       # Entry point for `python -m gedgraph`
├── parser.py         # GEDCOM file parsing and queries
├── pathfinder.py     # Relationship path finding algorithms
├── dotgen.py         # GraphViz DOT file generation
├── progress.py       # Braille-spinner progress indicators (vendored from gedcom_tools)
└── cli.py            # Command-line interface

Core Modules

parser.py - GEDCOM Parsing

Purpose: Parse GEDCOM files and provide query methods for individuals and relationships.

Key Classes:

GedcomParser: Main parser class using ged4py library

Key Methods:

load(): Parse GEDCOM file into memory
get_individual(xref_id): Retrieve individual by ID
get_name(individual): Format a name from its parts in the order NPFX/TITL, GIVN, SURN, NSFX (prefix/title, given name, surname, suffix)
get_birth_year(), get_death_year(): Get vital dates with fallback to baptism/burial
get_parents(individual): Get father and mother
get_children(individual): Get all children
get_spouse_for_child(): Get spouse in context of specific child, with marriage status
is_full_sibling(), is_half_sibling(): Determine sibling relationships
_extract_year(): Extract year from GEDCOM event tag (used by birth/death methods)
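
To illustrate the year extraction and the baptism/burial fallback listed above, here is a minimal sketch that works on raw GEDCOM date strings. The names `extract_year` and `first_year`, and the assumption that dates arrive as plain strings, are hypothetical; the real `_extract_year()` operates on ged4py records and may behave differently.

```python
import re
from typing import Optional

# Matches a standalone 3- or 4-digit number, e.g. the year in "ABT 1850" or "12 JAN 1901".
_YEAR_RE = re.compile(r"\b(\d{3,4})\b")


def extract_year(date_text: Optional[str]) -> Optional[int]:
    """Return the first year found in a GEDCOM date string, or None."""
    if not date_text:
        return None
    match = _YEAR_RE.search(date_text)
    return int(match.group(1)) if match else None


def first_year(*date_texts: Optional[str]) -> Optional[int]:
    """Return the year of the first date that yields one (e.g. birth, then baptism)."""
    for text in date_texts:
        year = extract_year(text)
        if year is not None:
            return year
    return None
```

Calling `first_year(birth_date, baptism_date)` returns the birth year when one is present and falls back to the baptism year otherwise.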

Implementation Notes:

Individuals and families are loaded at initialization and kept in memory
GedcomReader stays open to allow lazy resolution of references
Individuals cached in _individuals dict for O(1) lookup
Families cached in _families dict
Accepts IDs with or without @ symbols for convenience
Name parsing checks GEDCOM sub-tags (NPFX, TITL, GIVN, SURN, NSFX) before falling back to parsed tuple
Marriage detection checks for MARR tag in family records
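
The ID handling and dictionary caching described in these notes could look roughly like the sketch below. `normalize_xref` and `IndividualCache` are hypothetical names introduced for illustration, and records are plain dicts here rather than ged4py objects.

```python
from typing import Dict, Optional


def normalize_xref(xref_id: str) -> str:
    """Return the ID wrapped in exactly one pair of @ symbols."""
    return "@" + xref_id.strip().strip("@") + "@"


class IndividualCache:
    """Hypothetical stand-in for the parser's _individuals dict."""

    def __init__(self) -> None:
        self._individuals: Dict[str, dict] = {}

    def add(self, xref_id: str, record: dict) -> None:
        # Store under the normalized key so both ID spellings hit the same entry.
        self._individuals[normalize_xref(xref_id)] = record

    def get(self, xref_id: str) -> Optional[dict]:
        # Dict lookup is O(1) on average; unknown IDs return None.
        return self._individuals.get(normalize_xref(xref_id))
```

With this approach, `cache.get("I1")` and `cache.get("@I1@")` resolve to the same record.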

pathfinder.py - Relationship Path Finding

Purpose: Find relationship paths between individuals using graph traversal.

Key Classes:

PathFinder: BFS-based path finding
RelationshipPath: Represents a complete path with metadata
PathStep: Represents a single parent/child relationship with boolean flags

Algorithm:

Uses breadth-first search (BFS) to find shortest paths
Explores both upward (parents) and downward (children) relationships
Tracks visited nodes to avoid infinite loops
Continues searching until all paths of minimum length are found
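
One way to collect every path of minimum length with BFS is to record, for each node, all predecessors that reach it at its shortest distance, then rebuild the paths backwards from the target. The sketch below assumes a plain adjacency dict in which each person's neighbors are their parents and children. The function name and data layout are hypothetical and not taken from `PathFinder`.

```python
from collections import deque
from typing import Dict, List


def all_shortest_paths(
    graph: Dict[str, List[str]], start: str, end: str, max_depth: int = 50
) -> List[List[str]]:
    """Return every shortest path from start to end as lists of node IDs."""
    if start == end:
        return [[start]]
    distance = {start: 0}  # doubles as the visited set
    predecessors: Dict[str, List[str]] = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end or distance[node] >= max_depth:
            continue  # do not expand the target or go past the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                predecessors[neighbor] = [node]
                queue.append(neighbor)
            elif distance[neighbor] == distance[node] + 1:
                # Another equally short route into an already-seen node.
                predecessors[neighbor].append(node)
    if end not in distance:
        return []

    def build(node: str) -> List[List[str]]:
        if node == start:
            return [[start]]
        return [path + [node] for pred in predecessors[node] for path in build(pred)]

    return build(end)


# Two full siblings, A and B, connected through father F and mother M.
siblings = {"A": ["F", "M"], "B": ["F", "M"], "F": ["A", "B"], "M": ["A", "B"]}
paths = all_shortest_paths(siblings, "A", "B")
```

For the sibling example, `paths` contains two routes of equal length, one through each parent.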

Path Sorting: Paths are sorted by a tuple key:

Length (number of steps)
Blood score (count of half-blood relationships)
Male preference score (count of female-line steps)

This prioritizes: shorter paths, full blood over half blood, male line over female line.
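
The tuple key can be sketched as follows. `Step` is a hypothetical, reduced stand-in for `PathStep`, and its flag names are assumptions. Python compares tuples element by element, which gives the priority order described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """Hypothetical stand-in for PathStep, reduced to two boolean flags."""

    is_full_blood: bool
    is_male_line: bool


def sort_key(path: List[Step]) -> Tuple[int, int, int]:
    """Return (length, half-blood count, female-line count); lower sorts first."""
    length = len(path)
    half_blood = sum(1 for step in path if not step.is_full_blood)
    female_line = sum(1 for step in path if not step.is_male_line)
    return (length, half_blood, female_line)


full_male = [Step(True, True), Step(True, True)]
full_female = [Step(True, False), Step(True, True)]
half_male = [Step(False, True), Step(True, True)]
longer = [Step(True, True)] * 3

ranked = sorted([longer, half_male, full_female, full_male], key=sort_key)
```

Here `ranked` starts with `full_male`, followed by `full_female`, `half_male`, and finally `longer`.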

Key Methods:

find_pedigree(): BFS to find ancestors up to N generations
find_pedigree_with_generations(): Find ancestors with generation tracking
find_pedigree_split(): Find paternal and maternal pedigrees separately
find_descendants(): Find descendants with generation tracking
find_relationship_paths(): BFS to find all paths between two individuals
get_shortest_paths(): Find and sort shortest paths
_bfs_traverse(): Generic BFS traversal used by all find methods
_get_neighbors(): Get all adjacent individuals in the graph
_is_full_blood(): Check if parent-child relationship is full blood
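
As a rough illustration of ancestor search with generation tracking, the sketch below walks upward through a hypothetical child-to-parents mapping. The function name, the `ParentMap` type, and the choice to number the root as generation 0 are assumptions and do not reflect the actual signature of `find_pedigree_with_generations()`.

```python
from collections import deque
from typing import Dict, Optional, Tuple

# Hypothetical input: child ID -> (father ID, mother ID); None marks an unknown parent.
ParentMap = Dict[str, Tuple[Optional[str], Optional[str]]]


def ancestors_by_generation(parents: ParentMap, root: str, generations: int) -> Dict[str, int]:
    """Map each person to a generation number (root = 0, parents = 1, ...)."""
    found = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        generation = found[person]
        if generation >= generations:
            continue  # generation limit reached on this line
        for parent in parents.get(person, (None, None)):
            if parent is not None and parent not in found:
                found[parent] = generation + 1
                queue.append(parent)
    return found


family: ParentMap = {"I1": ("I2", "I3"), "I2": ("I4", None), "I4": ("I5", "I6")}
pedigree = ancestors_by_generation(family, "I1", generations=2)
```

Because BFS visits nearer generations first, an ancestor who appears on several lines keeps the lowest generation number in this sketch.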

dotgen.py - GraphViz Generation

Purpose: Generate DOT format files for visualization.

Key Classes:

DotGenerator: Creates DOT syntax for charts

Key Methods:

generate_pedigree(): Create pedigree chart DOT file (ancestors only)
generate_hourglass(): Create hourglass chart DOT file (vertical split)
generate_bowtie(): Create bowtie chart DOT file (horizontal split)
generate_relationship(): Create relationship chart DOT file with spouse nodes
_format_label(): Format names with dates in (YYYY - YYYY) format
_describe_relationship(): Generate human-readable relationship description
_build_generation_map(): Build generation map for hourglass/bowtie charts
_render_chart(): Generic chart renderer for all chart types
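
A label in the (YYYY - YYYY) format might be built as in the sketch below. The function name, the use of a DOT line break between name and dates, the "?" placeholder for a missing year, and the omission of the date line when both years are unknown are all assumptions, not confirmed behaviour of `_format_label()`.

```python
from typing import Optional


def format_label(name: str, birth_year: Optional[int], death_year: Optional[int]) -> str:
    """Return a DOT label: the name, a DOT line break, then (YYYY - YYYY)."""
    # Escape double quotes so the label can sit inside a quoted DOT string.
    safe_name = name.replace('"', '\\"')
    if birth_year is None and death_year is None:
        return safe_name
    birth = str(birth_year) if birth_year is not None else "?"
    death = str(death_year) if death_year is not None else "?"
    # "\\n" yields a literal backslash-n, which GraphViz renders as a line break.
    return f"{safe_name}\\n({birth} - {death})"
```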

Chart Types:

Pedigree: Ancestors only, top-to-bottom layout
Hourglass: Two variants with vertical layout (rankdir=TB)
- ancestor-split: Father's line above root, mother's line below root
- descendants: Ancestors above root, descendants below root
Bowtie: Two variants with horizontal layout (rankdir=LR)
- ancestor-split: Father's line left of root, mother's line right of root
- descendants: Ancestors left of root, descendants right of root
Relationship: Path between two individuals with spouses

DOT Generation:

Uses rankdir=TB (top-to-bottom) for pedigree, hourglass, and relationship charts
Uses rankdir=LR (left-to-right) for bowtie charts
Color codes nodes: lightcoral (start/root), lightblue (end), lightgreen (bloodline), lightyellow (spouses)
Spouse nodes positioned using {rank=same; ...} constraints
Marriage status indicated by line style: solid (married), dashed (unmarried)
Spouse lines use dir=none and constraint=false to avoid affecting layout
Includes metadata as comments (generations, path length, etc.)
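
The conventions above can be combined into a small generator. This is a simplified sketch, not the `DotGenerator` implementation: `relationship_dot` and its argument layout are hypothetical, and edges are drawn in path order, whereas a real chart would direct them from parent to child.

```python
from typing import List, Tuple


def relationship_dot(bloodline: List[str], spouses: List[Tuple[str, str, bool]]) -> str:
    """Build DOT for a bloodline path plus (person, spouse, married) triples."""
    lines = ["digraph relationship {", "  rankdir=TB;", "  node [shape=box, style=filled];"]
    lines.append(f"  // path length: {len(bloodline) - 1}")
    last = len(bloodline) - 1
    for index, person in enumerate(bloodline):
        if index == 0:
            color = "lightcoral"  # start
        elif index == last:
            color = "lightblue"  # end
        else:
            color = "lightgreen"  # bloodline
        lines.append(f'  "{person}" [fillcolor={color}];')
    for current, following in zip(bloodline, bloodline[1:]):
        lines.append(f'  "{current}" -> "{following}";')
    for person, spouse, married in spouses:
        style = "solid" if married else "dashed"
        lines.append(f'  "{spouse}" [fillcolor=lightyellow];')
        # Keep the spouse on the same rank without letting the edge affect layout.
        lines.append(f'  {{rank=same; "{person}"; "{spouse}";}}')
        lines.append(f'  "{person}" -> "{spouse}" [dir=none, constraint=false, style={style}];')
    lines.append("}")
    return "\n".join(lines)


dot = relationship_dot(["@I1@", "@I2@", "@I3@"], [("@I2@", "@I9@", False)])
```

The resulting string can be written to a `.dot` file and rendered with the GraphViz `dot` command.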

cli.py - Command Line Interface

Purpose: Provide user-friendly CLI using argparse.

Commands:

pedigree: Generate ancestor chart
relationship: Generate relationship chart between two individuals
hourglass: Generate hourglass chart (vertical split layout)
bowtie: Generate bowtie chart (horizontal split layout)

Common Options:

-o, --output: Output DOT file path (required for all commands)
-g, --generations: Number of generations (default: 4, used by pedigree/hourglass/bowtie)
-d, --max-depth: Maximum search depth (default: 50, used by relationship)
-v, --variant: Chart variant (used by hourglass/bowtie)
- ancestor-split: Split by parental lines
- descendants: Split by ancestors/descendants

Global Flags (must precede the subcommand name):

--verbose: Show detailed progress with timing
-q, --quiet: Suppress progress output (spinner phases on stderr)
--no-color: Disable colored output
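
The option layout above maps onto argparse subparsers roughly as sketched below. Global flags are defined on the top-level parser, which is why they must appear before the subcommand name. The positional arguments (`gedcom`, `individual`, `individual1`, `individual2`) are assumptions made for illustration; the source does not list them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose global flags sit on the top level, before the subcommand."""
    parser = argparse.ArgumentParser(prog="gedgraph")
    parser.add_argument("--verbose", action="store_true", help="detailed progress with timing")
    parser.add_argument("-q", "--quiet", action="store_true", help="suppress progress output")
    parser.add_argument("--no-color", action="store_true", help="disable colored output")
    subparsers = parser.add_subparsers(dest="command", required=True)

    for name in ("pedigree", "hourglass", "bowtie"):
        sub = subparsers.add_parser(name)
        sub.add_argument("gedcom")  # assumed positional: path to the GEDCOM file
        sub.add_argument("individual")  # assumed positional: root individual ID
        sub.add_argument("-o", "--output", required=True)
        sub.add_argument("-g", "--generations", type=int, default=4)
        if name != "pedigree":
            sub.add_argument("-v", "--variant", choices=["ancestor-split", "descendants"])

    rel = subparsers.add_parser("relationship")
    rel.add_argument("gedcom")  # assumed positional
    rel.add_argument("individual1")  # assumed positional
    rel.add_argument("individual2")  # assumed positional
    rel.add_argument("-o", "--output", required=True)
    rel.add_argument("-d", "--max-depth", type=int, default=50)
    return parser


args = build_parser().parse_args(
    ["--verbose", "pedigree", "family.ged", "@I1@", "-o", "out.dot"]
)
```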

Progress Feedback:

Each command runs through 3 phases: Loading GEDCOM, Generating/Finding chart, Writing output
Progress is displayed on stderr via PhaseTracker from progress.py
In quiet mode, all spinner output is suppressed; the stdout summary line is always emitted
In non-TTY environments (pipes, redirects), progress output degrades gracefully: no animation, only the final status
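
The quiet-mode and non-TTY behaviour can be illustrated with a small context manager. This is a hypothetical sketch and not the API of `PhaseTracker`: the name `phase`, its parameters, and the message format are all assumptions.

```python
import sys
import time
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def phase(label: str, quiet: bool = False, stream: TextIO = sys.stderr) -> Iterator[None]:
    """Report one phase on `stream`; stay silent in quiet mode."""
    interactive = stream.isatty()
    if not quiet and interactive:
        # On a terminal, show the phase right away; "\r" lets the final status overwrite it.
        stream.write(f"{label}...\r")
        stream.flush()
    started = time.perf_counter()
    yield
    elapsed = time.perf_counter() - started
    if not quiet:
        # Pipes and redirects only ever receive this single final line.
        stream.write(f"{label}: done ({elapsed:.2f}s)\n")
        stream.flush()
```

A command would wrap each of its phases, for example `with phase("Loading GEDCOM", quiet=args.quiet): ...`, so that the TTY check lives in one place.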

Error Handling:

Validates GEDCOM file exists
Validates individual IDs exist
Reports when no relationship found
Exits with appropriate error codes

Development Setup

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

Testing

Test Structure

tests/
├── fixtures/
│   └── sample.ged       # Sample GEDCOM for testing
├── test_parser.py       # Parser unit tests
├── test_pathfinder.py   # Path finding unit tests
├── test_dotgen.py       # DOT generation unit tests
├── test_progress.py     # Progress indicator unit tests
├── test_cli.py          # CLI flag parsing and wiring tests
└── test_integration.py  # End-to-end CLI tests

Running Tests

# All tests
pytest tests/ -v

# Specific test file
pytest tests/test_parser.py -v

# With coverage
pytest tests/ --cov=gedgraph --cov-report=html

# Single test
pytest tests/test_parser.py::test_get_individual -v

Test Fixtures

The sample.ged file contains:

10 individuals across 4 generations
Various relationships (parent-child, siblings, cousins)
Birth and death dates for testing date parsing
Both connected and disconnected individuals

Makefile Targets

Target	Description
`make venv`	Create virtual environment in `.venv/`
`make install`	Upgrade pip and install package in editable mode with dev deps
`make test`	Run test suite with pytest
`make lint`	Check formatting (black) and linting (ruff)
`make fmt`	Auto-format code with black
`make audit`	Audit dependencies for known vulnerabilities
`make build`	Build distribution packages
`make clean`	Remove build artifacts and `__pycache__` directories
`make distclean`	Run `clean` and also remove the virtual environment

Code Quality

Tools

black: Code formatting (line length: 100)
ruff: Fast Python linter
pip-audit: Dependency vulnerability scanning
pytest: Testing framework

Adding New Features

Adding a New Chart Type

1. Add new generation method in dotgen.py (e.g., generate_newchart())
   - Use PathFinder methods to gather individuals
   - Organize individuals by generation or other criteria
   - Generate DOT syntax with appropriate rankdir and constraints
2. Add new subcommand in cli.py
   - Create subparser with appropriate arguments
   - Handle command in main() function
   - Add error handling and user feedback
3. Add tests in tests/test_dotgen.py
   - Test successful generation
   - Test error cases
   - Verify DOT output contains expected elements
4. Update README.md and DEVELOPER.md with usage examples

Example: The hourglass and bowtie charts share common infrastructure:

Both use _build_generation_map() to organize individuals by generation
Both use _render_chart() for DOT generation
They differ only in rankdir: TB (hourglass) or LR (bowtie)
Both support ancestor-split and descendants variants
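
The shared-renderer idea can be sketched as one function parameterised by rankdir, with thin wrappers per chart type. All names and signatures here are hypothetical, as is the convention of numbering generations with negative values on one side of the root and positive values on the other.

```python
from typing import Dict, List, Tuple


def render_chart(
    generation_map: Dict[int, List[str]], edges: List[Tuple[str, str]], rankdir: str
) -> str:
    """Emit DOT that groups each generation with rank=same; edges order the ranks."""
    if rankdir not in ("TB", "LR"):
        raise ValueError("rankdir must be 'TB' or 'LR'")
    lines = ["digraph chart {", f"  rankdir={rankdir};"]
    for generation in sorted(generation_map):
        members = " ".join(f'"{person}";' for person in generation_map[generation])
        lines.append(f"  {{rank=same; {members}}}  // generation {generation}")
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


def hourglass_dot(generation_map: Dict[int, List[str]], edges: List[Tuple[str, str]]) -> str:
    return render_chart(generation_map, edges, "TB")  # vertical layout


def bowtie_dot(generation_map: Dict[int, List[str]], edges: List[Tuple[str, str]]) -> str:
    return render_chart(generation_map, edges, "LR")  # horizontal layout


generations = {-1: ["F", "M"], 0: ["R"], 1: ["C"]}
links = [("F", "R"), ("M", "R"), ("R", "C")]
hourglass = hourglass_dot(generations, links)
bowtie = bowtie_dot(generations, links)
```

In this sketch the two outputs are identical apart from the rankdir line, which mirrors the relationship between the hourglass and bowtie charts described above.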

Adding New Relationship Metrics

1. Add new method to PathFinder class
2. Update RelationshipPath dataclass if needed
3. Add tests in tests/test_pathfinder.py
4. Use in dotgen.py for chart annotations

Performance Considerations

Parser: GEDCOM files are loaded entirely into memory for fast access
GedcomReader: Kept open during program execution to allow lazy reference resolution
Path Finding: BFS returns shortest paths because every parent/child step has equal weight; max_depth bounds the search when individuals are distantly related or not connected at all
Caching: Individuals and families are cached to avoid repeated parsing
Spouse Detection: For relationship charts, spouses are identified by examining family records in the context of specific children

Common Issues

Memory Usage

For very large GEDCOM files (>100K individuals), consider:

Streaming parsing instead of loading all individuals
Limiting search depth
Using iterative deepening for path finding
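
For reference, iterative deepening repeats a depth-limited depth-first search with a growing limit, so memory use follows the current path length instead of the BFS frontier. The sketch below is not part of GedGraph: the function names and the adjacency-dict input are hypothetical, and it returns a single shortest path, not all of them.

```python
from typing import Dict, List, Optional


def _depth_limited(
    graph: Dict[str, List[str]], node: str, end: str, limit: int, path: List[str]
) -> Optional[List[str]]:
    """Depth-first search that gives up once `limit` further steps are used."""
    if node == end:
        return list(path)
    if limit == 0:
        return None
    for neighbor in graph.get(node, []):
        if neighbor in path:  # avoid cycles along the current path
            continue
        path.append(neighbor)
        found = _depth_limited(graph, neighbor, end, limit - 1, path)
        path.pop()
        if found is not None:
            return found
    return None


def iddfs_path(
    graph: Dict[str, List[str]], start: str, end: str, max_depth: int = 50
) -> Optional[List[str]]:
    """Return one shortest path from start to end, or None within max_depth."""
    for limit in range(max_depth + 1):
        found = _depth_limited(graph, start, end, limit, [start])
        if found is not None:
            return found  # the first limit that succeeds is the shortest distance
    return None


siblings = {"A": ["F", "M"], "B": ["F", "M"], "F": ["A", "B"], "M": ["A", "B"]}
route = iddfs_path(siblings, "A", "B")
```

The trade-off is repeated work: shallow levels are searched again at every new limit, so this suits cases where memory, not time, is the constraint.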

Path Finding Performance

Default max_depth of 50 handles most genealogies
Increase max_depth for very distant relationships
Very large families may have many equally short paths

Dependencies

ged4py: GEDCOM parsing library (runtime dependency)
GraphViz: System tool (dot command) for rendering DOT files to images. It is not a Python package dependency and must be installed separately via your OS package manager
