GedGraph is structured as a modular Python application with clear separation of concerns:
gedgraph/
├── __init__.py # Package metadata and version
├── __main__.py # Entry point for `python -m gedgraph`
├── parser.py # GEDCOM file parsing and queries
├── pathfinder.py # Relationship path finding algorithms
├── dotgen.py # GraphViz DOT file generation
├── progress.py # Braille-spinner progress indicators (vendored from gedcom_tools)
└── cli.py # Command-line interface
Purpose: Parse GEDCOM files and provide query methods for individuals and relationships.
Key Classes:
GedcomParser: Main parser class using ged4py library
Key Methods:
- `load()`: Parse GEDCOM file into memory
- `get_individual(xref_id)`: Retrieve individual by ID
- `get_name(individual)`: Format name using NPFX/TITL GIVN SURN NSFX sequence
- `get_birth_year()`, `get_death_year()`: Get vital dates with fallback to baptism/burial
- `get_parents(individual)`: Get father and mother
- `get_children(individual)`: Get all children
- `get_spouse_for_child()`: Get spouse in context of specific child, with marriage status
- `is_full_sibling()`, `is_half_sibling()`: Determine sibling relationships
- `_extract_year()`: Extract year from GEDCOM event tag (used by birth/death methods)
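The fallback from birth to baptism (or death to burial) can be illustrated with a small sketch. It is not the parser's actual code: `extract_year`, `first_year`, and the plain dict of tag-to-date strings are hypothetical stand-ins for the ged4py records, and only four-digit years are recognized.

```python
import re


def extract_year(date_text):
    """Return the first four-digit year in a GEDCOM date string, or None."""
    if not date_text:
        return None
    match = re.search(r"\b(\d{4})\b", date_text)
    return int(match.group(1)) if match else None


def first_year(events, tags):
    """Try each event tag in order and return the first year found."""
    for tag in tags:
        year = extract_year(events.get(tag))
        if year is not None:
            return year
    return None


# Hypothetical event data: no BIRT record, so the baptism date is used.
events = {"BAPM": "3 MAR 1851", "DEAT": "ABT 1910"}
birth = first_year(events, ["BIRT", "BAPM"])
death = first_year(events, ["DEAT", "BURI"])
```

Returning `None` instead of raising lets a label formatter simply omit a missing year.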
Implementation Notes:
- Individuals and families are loaded at initialization and kept in memory
- GedcomReader stays open to allow lazy resolution of references
- Individuals cached in `_individuals` dict for O(1) lookup
- Families cached in `_families` dict
- Accepts IDs with or without @ symbols for convenience
- Name parsing checks GEDCOM sub-tags (NPFX, TITL, GIVN, SURN, NSFX) before falling back to parsed tuple
- Marriage detection checks for MARR tag in family records
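As an illustration of the ID handling and cached lookup described above, here is a minimal sketch. The helper name `normalize_xref` and the sample dictionary are assumptions made for this example; the real cache holds ged4py records rather than strings.

```python
def normalize_xref(xref_id: str) -> str:
    """Return the ID in canonical GEDCOM form, e.g. 'I1' -> '@I1@'."""
    core = xref_id.strip().strip("@")
    if not core:
        raise ValueError("empty individual ID")
    return f"@{core}@"


# Hypothetical cache keyed by canonical ID; values stand in for parsed records.
individuals = {"@I1@": "John Smith", "@I2@": "Mary Jones"}


def get_individual(xref_id: str):
    """O(1) dict lookup that tolerates both 'I1' and '@I1@'."""
    return individuals.get(normalize_xref(xref_id))
```

Normalizing once at the lookup boundary keeps the cache keys in a single form, so callers never need to know which spelling the user typed.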
Purpose: Find relationship paths between individuals using graph traversal.
Key Classes:
- `PathFinder`: BFS-based path finding
- `RelationshipPath`: Represents a complete path with metadata
- `PathStep`: Represents a single parent/child relationship with boolean flags
Algorithm:
- Uses breadth-first search (BFS) to find shortest paths
- Explores both upward (parents) and downward (children) relationships
- Tracks visited nodes to avoid infinite loops
- Continues searching until all paths of minimum length are found
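The search described above can be sketched roughly as follows. This is a simplified stand-in, not the `PathFinder` implementation: the function name `all_shortest_paths` and the adjacency dictionary are assumptions, and each node's neighbors are taken to be its parents and children combined.

```python
from collections import deque


def all_shortest_paths(neighbors, start, goal, max_depth=50):
    """Return every minimum-length path from start to goal as a list of node lists."""
    if start == goal:
        return [[start]]
    best_depth = {start: 0}  # shortest known distance to each visited node
    queue = deque([[start]])
    found = []
    while queue:
        path = queue.popleft()
        depth = len(path) - 1
        # Stop once remaining paths cannot beat the shortest hit, or at the depth cap.
        if (found and depth >= len(found[0]) - 1) or depth >= max_depth:
            break
        for nxt in neighbors.get(path[-1], ()):
            # Skip nodes already reached by a strictly shorter route (this also blocks cycles).
            if best_depth.get(nxt, depth + 1) < depth + 1:
                continue
            best_depth[nxt] = depth + 1
            if nxt == goal:
                found.append(path + [nxt])
            else:
                queue.append(path + [nxt])
    return found


# Two siblings who share both parents: one shortest path through each parent.
family = {
    "c1": ["f", "m"],
    "c2": ["f", "m"],
    "f": ["c1", "c2"],
    "m": ["c1", "c2"],
}
paths = all_shortest_paths(family, "c1", "c2")
```

Recording the depth at which each node was first reached, rather than a plain visited set, is what allows several equally short paths through the same intermediate node to survive.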
Path Sorting: Paths are sorted by a tuple key:
- Length (number of steps)
- Blood score (count of half-blood relationships)
- Male preference score (count of female-line steps)
This prioritizes: shorter paths, full blood over half blood, male line over female line.
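A tuple sort key of this shape might look like the sketch below. The `Step` dataclass and its field names are hypothetical; the real `PathStep` flags may be named differently.

```python
from dataclasses import dataclass


@dataclass
class Step:
    full_blood: bool  # False for a half-blood link
    via_male: bool    # False when the step follows the female line


def sort_key(path):
    """Lower tuples sort first: length, then half-blood count, then female-line count."""
    half_blood = sum(not step.full_blood for step in path)
    female_line = sum(not step.via_male for step in path)
    return (len(path), half_blood, female_line)


# Hypothetical candidate paths, named for readability.
candidates = {
    "long": [Step(True, True)] * 3,
    "half": [Step(False, True), Step(True, True)],
    "maternal": [Step(True, False), Step(True, False)],
    "paternal": [Step(True, True), Step(True, True)],
}
ranked = sorted(candidates, key=lambda name: sort_key(candidates[name]))
```

Because Python compares tuples element by element, the later scores only break ties among paths that agree on every earlier element.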
Key Methods:
- `find_pedigree()`: BFS to find ancestors up to N generations
- `find_pedigree_with_generations()`: Find ancestors with generation tracking
- `find_pedigree_split()`: Find paternal and maternal pedigrees separately
- `find_descendants()`: Find descendants with generation tracking
- `find_relationship_paths()`: BFS to find all paths between two individuals
- `get_shortest_paths()`: Find and sort shortest paths
- `_bfs_traverse()`: Generic BFS traversal used by all find methods
- `_get_neighbors()`: Get all adjacent individuals in the graph
- `_is_full_blood()`: Check if parent-child relationship is full blood
Purpose: Generate DOT format files for visualization.
Key Classes:
DotGenerator: Creates DOT syntax for charts
Key Methods:
- `generate_pedigree()`: Create pedigree chart DOT file (ancestors only)
- `generate_hourglass()`: Create hourglass chart DOT file (vertical split)
- `generate_bowtie()`: Create bowtie chart DOT file (horizontal split)
- `generate_relationship()`: Create relationship chart DOT file with spouse nodes
- `_format_label()`: Format names with dates in (YYYY - YYYY) format
- `_describe_relationship()`: Generate human-readable relationship description
- `_build_generation_map()`: Build generation map for hourglass/bowtie charts
- `_render_chart()`: Generic chart renderer for all chart types
Chart Types:
- Pedigree: Ancestors only, top-to-bottom layout
- Hourglass: Two variants with vertical layout (rankdir=TB)
  - `ancestor-split`: Father's line above root, mother's line below root
  - `descendants`: Ancestors above root, descendants below root
- Bowtie: Two variants with horizontal layout (rankdir=LR)
  - `ancestor-split`: Father's line left of root, mother's line right of root
  - `descendants`: Ancestors left of root, descendants right of root
- Relationship: Path between two individuals with spouses
DOT Generation:
- Uses `rankdir=TB` (top-to-bottom) for pedigree, hourglass, and relationship charts
- Uses `rankdir=LR` (left-to-right) for bowtie charts
- Color codes nodes: lightcoral (start/root), lightblue (end), lightgreen (bloodline), lightyellow (spouses)
- Spouse nodes positioned using `{rank=same; ...}` constraints
- Marriage status indicated by line style: solid (married), dashed (unmarried)
- Spouse lines use `dir=none` and `constraint=false` to avoid affecting layout
- Includes metadata as comments (generations, path length, etc.)
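To show how these conventions fit together, the following sketch emits a small relationship chart. It is an illustration under assumptions, not the `DotGenerator` code: the function name `relationship_dot` and its argument shapes are invented here, and labels are assumed to contain no double quotes.

```python
def relationship_dot(path, spouses):
    """path: list of (node_id, label); spouses: {node_id: (spouse_id, label, married)}."""
    lines = [
        "digraph relationship {",
        "  rankdir=TB;",
        f"  // path length: {len(path) - 1}",
        "  node [shape=box, style=filled];",
    ]
    last = len(path) - 1
    for i, (node_id, label) in enumerate(path):
        # Start node, end node, and everything in between get distinct fills.
        color = "lightcoral" if i == 0 else "lightblue" if i == last else "lightgreen"
        lines.append(f'  {node_id} [label="{label}", fillcolor={color}];')
    for (a, _), (b, _) in zip(path, path[1:]):
        lines.append(f"  {a} -> {b};")
    for node_id, (sp_id, sp_label, married) in spouses.items():
        style = "solid" if married else "dashed"
        lines.append(f'  {sp_id} [label="{sp_label}", fillcolor=lightyellow];')
        # Keep the spouse on the same rank without letting the edge pull the layout.
        lines.append(f"  {{rank=same; {node_id}; {sp_id};}}")
        lines.append(f"  {node_id} -> {sp_id} [dir=none, constraint=false, style={style}];")
    lines.append("}")
    return "\n".join(lines)


dot = relationship_dot(
    [("I1", "Ann (1900 - 1970)"), ("I2", "Bob (1925 - 1990)"), ("I3", "Cy (1950 - )")],
    {"I2": ("I4", "Dee (1927 - 1995)", False)},
)
```

The resulting string can be written to a `.dot` file and rendered with the GraphViz `dot` command.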
Purpose: Provide user-friendly CLI using argparse.
Commands:
- `pedigree`: Generate ancestor chart
- `relationship`: Generate relationship chart between two individuals
- `hourglass`: Generate hourglass chart (vertical split layout)
- `bowtie`: Generate bowtie chart (horizontal split layout)
Common Options:
- `-o, --output`: Output DOT file path (required for all commands)
- `-g, --generations`: Number of generations (default: 4, used by pedigree/hourglass/bowtie)
- `-d, --max-depth`: Maximum search depth (default: 50, used by relationship)
- `-v, --variant`: Chart variant (used by hourglass/bowtie)
  - `ancestor-split`: Split by parental lines
  - `descendants`: Split by ancestors/descendants
Global Flags (must precede the subcommand name):
- `--verbose`: Show detailed progress with timing
- `-q, --quiet`: Suppress progress output (spinner phases on stderr)
- `--no-color`: Disable colored output
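The reason global flags must precede the subcommand follows from how argparse scopes options: flags defined on the top-level parser are not known to the subparsers. The sketch below shows one subcommand only. The positional arguments `gedcom` and `individual` are assumptions for illustration, since the actual positionals are not listed here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="gedgraph")
    # Global flags live on the top-level parser, so they must come before the subcommand.
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("--no-color", action="store_true")
    sub = parser.add_subparsers(dest="command", required=True)

    pedigree = sub.add_parser("pedigree")
    pedigree.add_argument("gedcom")      # assumed positional: path to the GEDCOM file
    pedigree.add_argument("individual")  # assumed positional: root individual ID
    pedigree.add_argument("-o", "--output", required=True)
    pedigree.add_argument("-g", "--generations", type=int, default=4)
    return parser


args = build_parser().parse_args(
    ["-q", "pedigree", "family.ged", "I1", "-o", "out.dot"]
)
```

Placing `-q` after `pedigree` in this sketch makes argparse exit with an "unrecognized arguments" error, which matches the ordering rule stated above.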
Progress Feedback:
- Each command runs through 3 phases: Loading GEDCOM, Generating/Finding chart, Writing output
- Progress is displayed on stderr via `PhaseTracker` from `progress.py`
- In quiet mode, all spinner output is suppressed; the stdout summary line is always emitted
- Non-TTY environments (pipes, redirects) degrade gracefully: no animation, just final status
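The quiet and non-TTY behavior can be approximated with a check on the output stream. This sketch does not reproduce the `PhaseTracker` API, which is not shown here; `report_phase` is a hypothetical helper that writes a single frame where a real spinner would animate.

```python
import io
import sys


def report_phase(name, stream=None, quiet=False):
    """Write a progress line for one phase, adapting to the kind of stream."""
    stream = sys.stderr if stream is None else stream
    if quiet:
        return  # quiet mode: no progress output at all
    if stream.isatty():
        # Interactive terminal: carriage return so the next frame overwrites this one.
        stream.write(f"\r\u28bf {name}...")
    else:
        # Pipe or redirect: no animation, just a final status line.
        stream.write(f"{name}: done\n")
    stream.flush()


# io.StringIO reports isatty() == False, so it behaves like a pipe.
captured = io.StringIO()
report_phase("Loading GEDCOM", stream=captured)
```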
Error Handling:
- Validates GEDCOM file exists
- Validates individual IDs exist
- Reports when no relationship found
- Exits with appropriate error codes
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install in editable mode with dev dependencies
pip install -e ".[dev]"

tests/
├── fixtures/
│ └── sample.ged # Sample GEDCOM for testing
├── test_parser.py # Parser unit tests
├── test_pathfinder.py # Path finding unit tests
├── test_dotgen.py # DOT generation unit tests
├── test_progress.py # Progress indicator unit tests
├── test_cli.py # CLI flag parsing and wiring tests
└── test_integration.py # End-to-end CLI tests
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_parser.py -v
# With coverage
pytest tests/ --cov=gedgraph --cov-report=html
# Single test
pytest tests/test_parser.py::test_get_individual -v

The sample.ged file contains:
- 10 individuals across 4 generations
- Various relationships (parent-child, siblings, cousins)
- Birth and death dates for testing date parsing
- Both connected and disconnected individuals
| Target | Description |
|---|---|
| `make venv` | Create virtual environment in `.venv/` |
| `make install` | Upgrade pip and install package in editable mode with dev deps |
| `make test` | Run test suite with pytest |
| `make lint` | Check formatting (black) and linting (ruff) |
| `make fmt` | Auto-format code with black |
| `make audit` | Audit dependencies for known vulnerabilities |
| `make build` | Build distribution packages |
| `make clean` | Remove build artifacts and `__pycache__` directories |
| `make distclean` | Run clean and also remove the virtual environment |
- black: Code formatting (line length: 100)
- ruff: Fast Python linter
- pip-audit: Dependency vulnerability scanning
- pytest: Testing framework
- Add new generation method in `dotgen.py` (e.g., `generate_newchart()`)
  - Use PathFinder methods to gather individuals
  - Organize individuals by generation or other criteria
  - Generate DOT syntax with appropriate rankdir and constraints
- Add new subcommand in `cli.py`
  - Create subparser with appropriate arguments
  - Handle command in `main()` function
  - Add error handling and user feedback
- Add tests in `tests/test_dotgen.py`
  - Test successful generation
  - Test error cases
  - Verify DOT output contains expected elements
- Update README.md and DEVELOPER.md with usage examples
Example: The hourglass and bowtie charts share common infrastructure:
- Both use `_build_generation_map()` to organize individuals by generation
- Both use `_render_chart()` for DOT generation
- They differ only in rankdir: TB (hourglass) or LR (bowtie)
- Both support `ancestor-split` and `descendants` variants
- Add new method to `PathFinder` class
- Update `RelationshipPath` dataclass if needed
- Add tests in `tests/test_pathfinder.py`
- Use in `dotgen.py` for chart annotations
- Parser: GEDCOM files are loaded entirely into memory for fast access
- GedcomReader: Kept open during program execution to allow lazy reference resolution
- Path Finding: BFS is optimal for finding shortest paths; max_depth prevents infinite searches
- Caching: Individuals and families are cached to avoid repeated parsing
- Spouse Detection: For relationship charts, spouses are identified by examining family records in the context of specific children
For very large GEDCOM files (>100K individuals), consider:
- Streaming parsing instead of loading all individuals
- Limiting search depth
- Using iterative deepening for path finding
- Default max_depth of 50 handles most genealogies
- Increase max_depth for very distant relationships
- Very large families may have many equally short paths
- ged4py: GEDCOM parsing library (runtime dependency)
- GraphViz: System tool (`dot` command) for rendering DOT files to images. It is not a Python package dependency and must be installed separately via your OS package manager