-
Notifications
You must be signed in to change notification settings - Fork 17
Updated URLScan for archive URLs. Fixes #84 #94
base: develop
Are you sure you want to change the base?
Conversation
Signed-off-by: rajukoushik <g.rajukoushik@gmail.com>
JonoYang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. I made a few comments regarding your code. Thanks for the contribution!
| Create and save a file at `path` present at `url` using `scan_id` and bare `path` and | ||
| `file_name` and apply the scan. | ||
| """ | ||
| r = requests.get(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is path repeated here?
| url_parse = urlparse(url) | ||
| os.chdir(path) | ||
|
|
||
| if r.status_code == 200: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have code from extractcode in scancode that can do extractions on different archive types which may be helpful, for example https://github.com/nexB/scancode-toolkit/blob/develop/src/extractcode/extract.py#L101
| scan_directory = None | ||
| scan_id = create_scan_id(user, url, scan_directory, scan_start_time) | ||
| current_scan = Scan.objects.get(pk=scan_id) | ||
| path = '/'.join([path, '{}'.format(current_scan.pk)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use os.path.join() to ensure consistency when joining paths
| for i in allowed_exts: | ||
| if url_parse.path.endswith(i): | ||
| is_zip_url = True | ||
| finally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is try-finally used?
| is_zip_url = False | ||
|
|
||
| try: | ||
| for i in allowed_exts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have some code that identify whether or not files are archives or not. I will ask @pombredanne
Signed-off-by: rajukoushik g.rajukoushik@gmail.com