@@ -88,12 +89,12 @@ Canonical examples live in `examples/` and are verified by `zig build examples-c
 - Scoped queries:
   - same query family as `Document` (`queryOne/queryAll`, runtime, cached, debug)
 
-### Additional helpers
+### Helpers
 
 - `doc.html()`, `doc.head()`, `doc.body()`
-- `doc.isOwned(slice)` to check whether a returned slice points into document source bytes
+- `doc.isOwned(slice)` to check whether a slice points into document source bytes
 
-### Options
+### Parse/Text options
 
 - `ParseOptions`
   - `eager_child_views: bool = true`
@@ -136,18 +137,18 @@ Compilation modes:
 
 ## Mode Guidance
 
-`htmlparser` is permissive by design. Choose parse options per site behavior:
+`htmlparser` is permissive by design. Choose parse options by workload:
 
 | Mode | Parse Options | Best For | Tradeoffs |
 |---|---|---|---|
-| `strictest` | `.eager_child_views = true`, `.drop_whitespace_text_nodes = false` | Maximum traversal predictability and text fidelity | More parse-time work |
-| `fastest` | `.eager_child_views = false`, `.drop_whitespace_text_nodes = true` | Throughput-first scraping | Whitespace-only text nodes dropped; child views built lazily |
+| `strictest` | `.eager_child_views = true`, `.drop_whitespace_text_nodes = false` | traversal predictability and text fidelity | higher parse-time work |
+| `fastest` | `.eager_child_views = false`, `.drop_whitespace_text_nodes = true` | throughput-first scraping | whitespace-only text nodes dropped; child views built lazily |
 
 Fallback playbook:
 
 1. Start with `fastest` for bulk workloads.
-2. Switch problematic domains to `strictest` if text/navigation assumptions fail.
-3. Use `queryOneRuntimeDebug` and inspect `QueryDebugReport` before changing selectors.
+2. Move unstable domains to `strictest`.
+3. Use `queryOneRuntimeDebug` and `QueryDebugReport` before changing selectors.
 
 ## Performance and Benchmarks
 
@@ -164,12 +165,57 @@ Artifacts:
 - `bench/results/latest.md`
 - `bench/results/latest.json`
 
-Notes:
+Benchmark policy:
 
 - parse comparisons include `strlen`, `lexbor`, and parse-only `lol-html`