Problem
TreeScraper extracts links but loses their hierarchical context. When a link's text is just "View" but its parent element reads "Council Minutes December 30, 2024", the date never reaches the extracted Link.
Current Behavior
interface Link {
  href: string;
  text: string; // Just "View"
  title?: string;
  ariaLabel?: string;
  // ... no parent context
}
Proposed Enhancement
Extend the Link interface to capture parent context:
interface Link {
  href: string;
  text: string;
  title?: string;
  ariaLabel?: string;
  // NEW: Parent context
  parentText?: string; // Immediate parent's text content
  ancestorTexts?: string[]; // Path of ancestor texts (e.g., ["2024", "Minutes", "December 30"])
  hierarchyLevel?: number; // Depth in tree expansion
}
Implementation Notes
The code already constructs element paths in extractLinksWithTreeExpansion (lines 196-209 in tree.ts) but discards them. Changes needed:
- In extractLinks() (line 124), traverse up from each <a> to capture parent text
- In extractLinksWithTreeExpansion(), track which expansion iteration revealed each link
- Add parentText to the Link interface in types.ts
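The upward traversal in the first bullet could look roughly like the sketch below. It is not taken from tree.ts: `ElementLike`, `ParentContext`, `collectParentContext`, and the `maxDepth` cap are hypothetical names. The structural interface assumes only the standard DOM properties `parentElement` and `textContent`, so the sketch runs without a browser.

```typescript
// Minimal structural view of a DOM element (assumption: only these two
// standard DOM properties are needed; real Element objects satisfy it).
interface ElementLike {
  textContent: string | null;
  parentElement: ElementLike | null;
}

// Hypothetical result shape mirroring two of the proposed Link fields.
interface ParentContext {
  parentText?: string;
  ancestorTexts: string[];
}

// Collapse runs of whitespace so indented markup yields clean text.
function normalizeText(text: string | null): string {
  return (text ?? "").replace(/\s+/g, " ").trim();
}

// Walk up from the anchor and collect the non-empty text of each ancestor.
// maxDepth is an illustrative cap so the whole page is not captured.
function collectParentContext(anchor: ElementLike, maxDepth = 3): ParentContext {
  const ancestorTexts: string[] = [];
  let current = anchor.parentElement;
  for (let depth = 0; current !== null && depth < maxDepth; depth++) {
    const text = normalizeText(current.textContent);
    if (text !== "") ancestorTexts.unshift(text); // outermost ancestor first
    current = current.parentElement;
  }

  const context: ParentContext = { ancestorTexts };
  const parentText = normalizeText(anchor.parentElement?.textContent ?? null);
  if (parentText !== "") context.parentText = parentText;
  return context;
}
```

Note that a parent's `textContent` includes the link's own text, so for the eckville.com markup this sketch yields "Council Minutes December 30, 2024 View". A real implementation may want to strip the link text or read only sibling nodes.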
Use Case
Municipal sites like eckville.com have:
<div class="meeting-item">
  <span>Council Minutes December 30, 2024</span>
  <a href="/public/download/files/266576">View</a>
</div>
With parent context, praeco's parser can extract "December 30, 2024" from parentText instead of just seeing "View".
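As an illustration of what a consumer could do with the new field, here is a minimal sketch of pulling a "Month D, YYYY" date out of `parentText`. praeco's actual parser is not shown in this issue, so `extractMeetingDate`, `ParsedDate`, and the single date format handled here are assumptions made for the example.

```typescript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical result type; month is 1-based.
interface ParsedDate {
  year: number;
  month: number;
  day: number;
}

// Find the first "Month D, YYYY" occurrence in the given text, if any.
function extractMeetingDate(text: string | undefined): ParsedDate | undefined {
  if (!text) return undefined;
  const pattern = new RegExp(
    `\\b(${MONTHS.join("|")})\\s+(\\d{1,2}),\\s*(\\d{4})\\b`,
    "i",
  );
  const match = pattern.exec(text);
  if (match === null) return undefined;

  const [, monthName = "", day = "", year = ""] = match;
  const month =
    MONTHS.findIndex((m) => m.toLowerCase() === monthName.toLowerCase()) + 1;
  return { year: Number(year), month, day: Number(day) };
}
```

Called with the link text "View" this returns `undefined`; called with the parent text from the markup above it finds the December 30, 2024 date.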
Backwards Compatibility
- New fields are optional, so existing consumers won't break
- Existing link extraction behavior is unchanged
- The change only adds metadata to the Link objects
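The compatibility claim can be checked with a small sketch. The `Link` interface is the proposed one from above; `describeLink` and `bestLabel` are hypothetical consumers invented for this example, not functions from the codebase.

```typescript
interface Link {
  href: string;
  text: string;
  title?: string;
  ariaLabel?: string;
  parentText?: string;
  ancestorTexts?: string[];
  hierarchyLevel?: number;
}

// Hypothetical existing consumer: reads only the original fields,
// so it behaves the same whether or not parent context is present.
function describeLink(link: Link): string {
  return `${link.text} -> ${link.href}`;
}

// Hypothetical newer consumer: prefers parent context when available
// and falls back to the fields that already exist today.
function bestLabel(link: Link): string {
  return link.parentText ?? link.title ?? link.text;
}

// A Link as produced today still type-checks against the extended interface.
const legacy: Link = { href: "/public/download/files/266576", text: "View" };

// The same Link with the proposed metadata attached.
const enriched: Link = {
  ...legacy,
  parentText: "Council Minutes December 30, 2024",
};
```

Here `describeLink` returns the same string for `legacy` and `enriched`, while `bestLabel` returns "View" for the first and the parent text for the second.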