Add hosted Defuddle URL conversion to the app#29
Conversation
There was a problem hiding this comment.
Pull request overview
Adds first-class website URL inputs to the home flow and routes URL conversions through the hosted Defuddle service, with supporting helpers, UI, translations, docs, and tests.
Changes:
- Add URL input bars to the empty and queue states and allow queueing URLs alongside files.
- Add Defuddle-backed URL conversion path in the conversion pipeline and surface Defuddle backend usage in summaries.
- Introduce shared input-source helpers (URL detection, display name, output filename stem) and update docs/translations/tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/utils/test_translations.py | Extends translation-key coverage for new URL/Defuddle/help strings. |
| tests/core/test_input_sources.py | Adds unit tests for URL detection, display names, and filename stem sanitization. |
| tests/core/test_conversion.py | Adds tests for Defuddle URL conversion behavior and error surfacing. |
| markitdowngui/utils/translations.py | Adds new UI/help strings and updates copy to reflect URL inputs. |
| markitdowngui/ui/home_interface.py | Adds URL input UI, queues URL “sources”, and updates progress/result labeling for URLs. |
| markitdowngui/ui/help_interface.py | Adds Defuddle docs link and new FAQ entries for URL conversion. |
| markitdowngui/ui/components/url_input_bar.py | New reusable URL input widget used by the home flow. |
| markitdowngui/core/input_sources.py | New helpers for URL detection, display names, and output filename stems. |
| markitdowngui/core/conversion.py | Adds Defuddle backend + HTTP request path for URL conversions. |
| README.md | Documents website URL conversion behavior, dependencies, and free-tier limits. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| candidate = value.strip() | ||
| if not candidate: | ||
| return False | ||
|
|
There was a problem hiding this comment.
is_web_url() is used as the primary validation gate for URL input, but it currently returns True for strings that contain whitespace in the URL path (e.g., "https://example.com/hello world"). Those inputs will later fail during HTTP request construction. Consider tightening validation (reject any whitespace / control chars) or normalizing to a properly encoded URL before enqueueing so the UI’s "Invalid URL" path catches these cases.
| # Reject URLs containing any whitespace or control characters, as these | |
| # will cause failures when constructing HTTP requests unless encoded. | |
| if any(ch.isspace() or ord(ch) < 32 for ch in candidate): | |
| return False |
tests/core/test_conversion.py
Outdated
|
|
||
| assert outcome.markdown == "# Article" | ||
| assert outcome.backend == conversion.BACKEND_DEFUDDLE | ||
| assert captured["url"] == "https://defuddle.md/https://example.com/article" |
There was a problem hiding this comment.
These assertions lock in the exact Defuddle request URL string. If the implementation is updated to percent-encode/normalize embedded target URLs (to handle spaces, fragments, etc.), this test will need to assert the normalized/encoded request URL instead of the raw concatenation.
| assert captured["url"] == "https://defuddle.md/https://example.com/article" | |
| expected_url = conversion._build_defuddle_request_url("https://example.com/article") | |
| assert captured["url"] == expected_url |
markitdowngui/core/conversion.py
Outdated
| def _build_defuddle_request_url(url: str) -> str: | ||
| return f"{DEFUDDLE_API_BASE_URL}{url.strip()}" |
There was a problem hiding this comment.
_build_defuddle_request_url() concatenates the raw user-provided URL into the Defuddle request URL. If the pasted URL contains characters that must be percent-encoded in a URL (common case: spaces), requests.get() will raise an InvalidURL/RequestException and the app will report it as a service reachability problem. Consider normalizing/encoding the embedded URL (e.g., percent-encode the target URL before appending) so the request URL is always syntactically valid, and so reserved characters like # aren’t silently treated as fragments by the client.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6310467ddd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
markitdowngui/core/conversion.py
Outdated
|
|
||
|
|
||
| def _build_defuddle_request_url(url: str) -> str: | ||
| return f"{DEFUDDLE_API_BASE_URL}{url.strip()}" |
There was a problem hiding this comment.
Percent-encode URL before building Defuddle request path
_build_defuddle_request_url appends the raw URL string directly to https://defuddle.md/, so any query string is sent as the outer request query (for example .../https://site/page?key=abc). Defuddle reserves key in the outer query for API authentication, so normal target URLs that legitimately contain ?key=... are treated as auth attempts and can fail with 401 instead of being converted. Encode the full target URL before appending (or send it via a dedicated parameter) so target query params are not reinterpreted as Defuddle control params.
Useful? React with 👍 / 👎.
Summary
Testing
.\.venv-win\Scripts\python.exe -m pytest -q