feat: added remote_file provider #92
Conversation
Reviewer's Guide

Introduce a new remote-file provider.

Sequence diagram for fetching and processing remote files with the RemoteFile provider:

```mermaid
sequenceDiagram
    participant Main
    participant RemoteFile
    participant requests
    Main->>RemoteFile: __init__(urls, excludeip6)
    RemoteFile->>requests: get(url) for each url
    requests-->>RemoteFile: response (text)
    RemoteFile->>RemoteFile: _get_ranges()
    RemoteFile->>RemoteFile: _process_ranges()
    RemoteFile-->>Main: processed_ranges
```
Entity relationship diagram for provider registration, including RemoteFile:

```mermaid
erDiagram
    PROVIDERS {
        string name
        class reference
    }
    PROVIDERS ||--o| REMOTEFILE : registers
    REMOTEFILE {
        list urls
        bool excludeip6
    }
```
Class diagram for the new RemoteFile provider:

```mermaid
classDiagram
    class RemoteFile {
        +urls: list[str]
        +excludeip6: bool
        +source_ranges: dict
        +processed_ranges: dict
        +__init__(urls: list[str], excludeip6: bool = False)
        +_get_ranges() dict
        +_process_ranges() dict
    }
    RemoteFile --|> BaseProvider
```
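Based on the class diagram, the provider's shape can be sketched roughly as below. This is a minimal standalone sketch, not the PR's actual code: the real class subclasses BaseProvider, and the exact method bodies live in `src/sephiroth/providers/remote_file.py`.

```python
class RemoteFile:
    """Sketch of the remote-file provider (assumed shape; see the class diagram)."""

    def __init__(self, urls, excludeip6=False):
        self.urls = urls
        self.excludeip6 = excludeip6
        self.source_ranges = self._get_ranges()
        self.processed_ranges = self._process_ranges()

    def _get_ranges(self):
        ranges = {}
        if not self.urls:
            return ranges
        # Deferred import: only needed when URLs are actually fetched.
        import requests
        for url in self.urls:
            try:
                resp = requests.get(url, timeout=10)
                resp.raise_for_status()
                ranges[url] = resp.text.splitlines()
            except Exception as e:
                print(f"[!] Failed to fetch {url}: {e}")
        return ranges

    def _process_ranges(self):
        processed = []
        for fname, range_list in self.source_ranges.items():
            for ip_line in range_list:
                line = ip_line.strip()
                # Skip blank lines and comment-only lines.
                if not line or line.startswith("#"):
                    continue
                # Split off a trailing "# ..." comment, if any.
                ip_addr, _, comment = (s.strip() for s in line.partition("#"))
                # Drop IPv6 ranges when requested.
                if self.excludeip6 and ":" in ip_addr:
                    continue
                processed.append({"range": ip_addr,
                                  "comment": f"{fname} {comment}".strip()})
        return processed
```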
File-Level Changes
Hey there - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected a 'requests' call without a timeout set. By default, 'requests' calls wait until the connection is closed. This means a 'requests' call without a timeout will hang the program if a response is never received. Consider setting a timeout for all 'requests'. (link)
General comments:
- Consider adding a timeout (and optionally retry logic) to the requests.get calls in the remote-file provider to avoid hanging or failing silently on slow/unresponsive URLs.
- You may want to deduplicate and sort the merged IP ranges from multiple URLs before emitting them to prevent duplicate entries and improve consistency.
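The dedupe-and-sort suggestion could be implemented with the stdlib `ipaddress` module. A sketch under assumptions: the function name `merge_ranges` is hypothetical and not part of this PR.

```python
import ipaddress

def merge_ranges(cidrs):
    """Hypothetical helper: dedupe, merge, and sort CIDR strings."""
    nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # collapse_addresses only accepts a single address family,
    # so handle IPv4 and IPv6 separately.
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    # collapse_addresses removes duplicates, merges adjacent/contained
    # networks, and yields the result in sorted order.
    merged = list(ipaddress.collapse_addresses(v4)) + \
             list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in merged]
```

Note that collapsing also supernets adjacent ranges (e.g. two adjacent /24s become one /23), which is usually desirable for block lists.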
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider adding a timeout (and optionally retry logic) to the requests.get calls in the remote-file provider to avoid hanging or failing silently on slow/unresponsive URLs.
- You may want to deduplicate and sort the merged IP ranges from multiple URLs before emitting them to prevent duplicate entries and improve consistency.
## Individual Comments
### Comment 1
<location> `src/sephiroth/providers/remote_file.py:15` </location>
<code_context>
+ ranges = {}
+ for url in self.urls:
+ try:
+ resp = requests.get(url)
+ resp.raise_for_status()
+ lines = resp.text.splitlines()
</code_context>
<issue_to_address>
Consider setting a timeout for requests.get to avoid hanging.
A timeout will prevent indefinite waiting if a server does not respond.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
resp = requests.get(url)
=======
resp = requests.get(url, timeout=10)
>>>>>>> REPLACE
</suggested_fix>
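For the optional retry logic mentioned in the overall comments, a common pattern (a sketch, not part of this PR; `make_session` is a hypothetical helper) is to mount an `HTTPAdapter` configured with urllib3's `Retry` on a `requests.Session`:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3, backoff=0.5):
    """Hypothetical helper: a Session that retries transient failures."""
    # Retry connection errors and 5xx responses, with exponential
    # backoff between attempts.
    retry = Retry(total=retries, backoff_factor=backoff,
                  status_forcelist=(500, 502, 503, 504))
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

The provider could then call `session.get(url, timeout=10)` instead of `requests.get(url)`, combining both suggestions.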
### Comment 2
<location> `src/sephiroth/providers/remote_file.py:18` </location>
<code_context>
+ resp = requests.get(url)
+ resp.raise_for_status()
+ lines = resp.text.splitlines()
+ ranges[url] = [line + "\n" for line in lines]
+ except Exception as e:
+ print(f"[!] Failed to fetch {url}: {e}")
</code_context>
<issue_to_address>
Appending '\n' to each line may be unnecessary and could cause issues downstream.
Downstream consumers may receive lines with extra newlines or inconsistent formatting. Please verify if appending '\n' is required.
</issue_to_address>
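For context on why the trailing newline is suspect: `str.splitlines` already removes line endings, so the appended `"\n"` is purely re-added, and downstream code then has to strip it again. A quick illustration:

```python
text = "10.0.0.0/8\n192.0.2.0/24\r\n# comment\n"

# splitlines() drops the line endings (including \r\n) ...
lines = text.splitlines()
assert lines == ["10.0.0.0/8", "192.0.2.0/24", "# comment"]

# ... so appending "\n" only forces consumers to strip once more.
with_newlines = [line + "\n" for line in lines]
assert [l.strip() for l in with_newlines] == lines
```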
### Comment 3
<location> `src/sephiroth/providers/remote_file.py:27` </location>
<code_context>
+ ranges = []
+ for fname, range_list in self.source_ranges.items():
+ for ip_line in range_list:
+ if ip_line.startswith("#"):
+ continue
+ if ":" in ip_line and self.excludeip6:
</code_context>
<issue_to_address>
Lines starting with whitespace before '#' will not be treated as comments.
Consider stripping leading whitespace before checking for '#', to ensure all comment lines are correctly identified.
Suggested implementation:
```python
for ip_line in range_list:
stripped_ip_line = ip_line.lstrip()
if stripped_ip_line.startswith("#"):
continue
if ":" in stripped_ip_line and self.excludeip6:
```
```python
if "#" in stripped_ip_line:
ip_addr, comment = map(str.strip, stripped_ip_line.split("#", 1))
else:
ip_addr = stripped_ip_line.strip()
comment = ""
```
</issue_to_address>
### Comment 4
<location> `src/sephiroth/providers/remote_file.py:29` </location>
<code_context>
+ for ip_line in range_list:
+ if ip_line.startswith("#"):
+ continue
+ if ":" in ip_line and self.excludeip6:
+ continue
+ if "#" in ip_line:
</code_context>
<issue_to_address>
IPv6 exclusion logic may match false positives if ':' appears in comments.
Splitting the line to separate comments before checking for ':' will prevent incorrect exclusions.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
for ip_line in range_list:
if ip_line.startswith("#"):
continue
if ":" in ip_line and self.excludeip6:
continue
if "#" in ip_line:
ip_addr, comment = map(str.strip, ip_line.split("#", 1))
else:
ip_addr = ip_line.strip()
comment = ""
ranges.append({
"range": ip_addr,
"comment": f"{fname} {comment}".strip()
})
=======
for ip_line in range_list:
if ip_line.startswith("#"):
continue
if "#" in ip_line:
ip_addr, comment = map(str.strip, ip_line.split("#", 1))
else:
ip_addr = ip_line.strip()
comment = ""
if ":" in ip_addr and self.excludeip6:
continue
ranges.append({
"range": ip_addr,
"comment": f"{fname} {comment}".strip()
})
>>>>>>> REPLACE
</suggested_fix>
### Comment 5
<location> `src/sephiroth/main.py:290` </location>
<code_context>
- template_vars["header_comments"] += provider_vars["header_comments"]
- template_vars["ranges"] += provider_vars["ranges"]
+
+ if provider_vars:
+ template_vars["header_comments"] += provider_vars["header_comments"]
+ template_vars["ranges"] += provider_vars["ranges"]
</code_context>
<issue_to_address>
Skipping providers with no output may mask silent failures.
Consider adding a log or warning when provider_vars is None to make silent failures more visible during debugging.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
if provider_vars:
template_vars["header_comments"] += provider_vars["header_comments"]
template_vars["ranges"] += provider_vars["ranges"]
=======
if provider_vars:
template_vars["header_comments"] += provider_vars["header_comments"]
template_vars["ranges"] += provider_vars["ranges"]
else:
import logging
logging.warning(f"No output from provider '{provider}'. This may indicate a silent failure.")
>>>>>>> REPLACE
</suggested_fix>
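As a style note on the fix suggested for Comment 5, the `logging` import is conventionally hoisted to module scope with a per-module logger rather than imported inside the branch. A sketch of the same warning with that pattern; `merge_provider_output` is a hypothetical wrapper around the diff's merge logic, not a function in this PR:

```python
import logging

logger = logging.getLogger(__name__)

def merge_provider_output(template_vars, provider, provider_vars):
    """Hypothetical helper wrapping the merge logic from the diff."""
    if provider_vars:
        template_vars["header_comments"] += provider_vars["header_comments"]
        template_vars["ranges"] += provider_vars["ranges"]
    else:
        # Make silent provider failures visible during debugging.
        logger.warning("No output from provider '%s'; this may indicate "
                       "a silent failure.", provider)
    return template_vars
```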
## Security Issues
### Issue 1
<location> `src/sephiroth/providers/remote_file.py:15` </location>
<issue_to_address>
**security (python.requests.best-practice.use-timeout):** Detected a 'requests' call without a timeout set. By default, 'requests' calls wait until the connection is closed. This means a 'requests' call without a timeout will hang the program if a response is never received. Consider setting a timeout for all 'requests'.
```suggestion
resp = requests.get(url, timeout=30)
```
*Source: opengrep*
</issue_to_address>
Hi, I'd like to be able to give some remote files that are constantly updated as sources of IPs to block (e.g. https://github.com/X4BNet/lists_vpn/tree/main/output), so I implemented these changes to be able to do so. This works if you use it like this: `-t remote-file https://xyz.com/ip-list.txt`, but also like this: `-t _all --url https://xyz.com/ip-list.txt`. Multiple `--url` flags are supported, so sources are fetched and merged into a single file, as with the use of the `file` flag.

Summary by Sourcery
Add a remote-file provider to allow fetching IP lists from remote URLs, extend CLI to accept --url flags, and integrate it into the existing target processing flow.
New Features:
Enhancements: