Conversation
Summary of ChangesHello @gantoine, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request expands the API's feed generation capabilities by adding support for PKGJ feeds across multiple PlayStation platforms (PSP, PSVita, and PSX). It streamlines the process of creating these text-based feeds by introducing a reusable response helper and new data schemas, ensuring that users can access game and DLC information in a format compatible with PKGJ clients. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces new PKGJ feed endpoints and refactors existing feed generation logic. However, a critical vulnerability exists across all new and refactored endpoints: they are susceptible to CSV/TSV injection due to unsanitized delimiters and newline characters in the generated responses. This could allow an attacker to hijack downloads by manipulating column data. Additionally, the new PKGJ endpoints for games and DLCs suffer from significant code duplication and a bug where content is not filtered by type, leading to incorrect feed content. The current tests are insufficient to catch these filtering issues.
| txt_lines.append( | ||
| "\t".join( | ||
| [ | ||
| pkgj_item.title_id, | ||
| pkgj_item.region, | ||
| pkgj_item.type, | ||
| pkgj_item.name, | ||
| pkgj_item.download_link, | ||
| pkgj_item.content_id, | ||
| last_modified, | ||
| pkgj_item.rap, | ||
| pkgj_item.download_rap_file, | ||
| str(pkgj_item.file_size), | ||
| pkgj_item.sha_256, | ||
| ] | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The Pkgj feeds are vulnerable to TSV injection. An attacker can inject a tab character into the ROM name to shift the columns, potentially hijacking the "PKG direct link" column and pointing it to a malicious URL. All fields joined with tabs should have tab and newline characters removed or replaced. This specific issue highlights the need for robust sanitization within the pkgj_*_feed functions, which currently suffer from significant code duplication, making consistent application of such fixes challenging.
| txt_lines.append( | |
| "\t".join( | |
| [ | |
| pkgj_item.title_id, | |
| pkgj_item.region, | |
| pkgj_item.type, | |
| pkgj_item.name, | |
| pkgj_item.download_link, | |
| pkgj_item.content_id, | |
| last_modified, | |
| pkgj_item.rap, | |
| pkgj_item.download_rap_file, | |
| str(pkgj_item.file_size), | |
| pkgj_item.sha_256, | |
| ] | |
| ) | |
| ) | |
| txt_lines.append( | |
| "\t".join( | |
| [ | |
| pkgj_item.title_id, | |
| pkgj_item.region, | |
| pkgj_item.type, | |
| pkgj_item.name.replace("\t", " ").replace("\n", " ").replace("\r", " "), | |
| pkgj_item.download_link, | |
| pkgj_item.content_id, | |
| last_modified, | |
| pkgj_item.rap, | |
| pkgj_item.download_rap_file, | |
| str(pkgj_item.file_size), | |
| pkgj_item.sha_256, | |
| ] | |
| ) | |
| ) |
| txt_lines.append( | ||
| ",".join( | ||
| [ | ||
| pkgi_item.contentid, | ||
| str(pkgi_item.type), | ||
| f'"{pkgi_item.name}"', | ||
| pkgi_item.description, | ||
| pkgi_item.rap, | ||
| f'"{pkgi_item.url}"', | ||
| str(pkgi_item.size), | ||
| pkgi_item.checksum, | ||
| ] | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The feed endpoints construct text-based responses (CSV or TSV) by joining fields with commas or tabs without sanitizing or escaping these delimiters or newline characters within the fields (e.g., ROM name). An attacker with permissions to edit ROM metadata or upload files with crafted names can inject arbitrary columns or rows into the feeds. For Pkgj feeds, this can be used to hijack downloads by shifting the "PKG direct link" column to a malicious URL.
To remediate this, sanitize or escape delimiters (tabs, commas) and newline characters in all fields before joining them. For TSV formats, replace tabs and newlines with spaces. For CSV formats, ensure proper escaping of quotes and delimiters.
| txt_lines.append( | |
| ",".join( | |
| [ | |
| pkgi_item.contentid, | |
| str(pkgi_item.type), | |
| f'"{pkgi_item.name}"', | |
| pkgi_item.description, | |
| pkgi_item.rap, | |
| f'"{pkgi_item.url}"', | |
| str(pkgi_item.size), | |
| pkgi_item.checksum, | |
| ] | |
| ) | |
| ) | |
| txt_lines.append( | |
| ",".join( | |
| [ | |
| pkgi_item.contentid, | |
| str(pkgi_item.type), | |
| f'"{pkgi_item.name.replace("\"", "\"\"")}"', | |
| pkgi_item.description.replace("\"", "\"\""), | |
| pkgi_item.rap.replace("\"", "\"\""), | |
| f'"{pkgi_item.url}"', | |
| str(pkgi_item.size), | |
| pkgi_item.checksum, | |
| ] | |
| ) | |
| ) |
| txt_lines.append( | ||
| ",".join( | ||
| [ | ||
| pkgi_item.contentid, | ||
| str(pkgi_item.flags), | ||
| f'"{pkgi_item.name}"', | ||
| pkgi_item.name2, | ||
| pkgi_item.zrif, | ||
| f'"{pkgi_item.url}"', | ||
| str(pkgi_item.size), | ||
| pkgi_item.checksum, | ||
| ] | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Similar to the PKGi PS3 feed, the PS Vita feed is vulnerable to CSV injection if the ROM name or other fields contain double quotes or newlines. Delimiters and newlines should be properly escaped or replaced.
| txt_lines.append( | |
| ",".join( | |
| [ | |
| pkgi_item.contentid, | |
| str(pkgi_item.flags), | |
| f'"{pkgi_item.name}"', | |
| pkgi_item.name2, | |
| pkgi_item.zrif, | |
| f'"{pkgi_item.url}"', | |
| str(pkgi_item.size), | |
| pkgi_item.checksum, | |
| ] | |
| ) | |
| ) | |
| txt_lines.append( | |
| ",".join( | |
| [ | |
| pkgi_item.contentid, | |
| str(pkgi_item.flags), | |
| f'"{pkgi_item.name.replace("\"", "\"\"")}"', | |
| pkgi_item.name2.replace("\"", "\"\""), | |
| pkgi_item.zrif.replace("\"", "\"\""), | |
| f'"{pkgi_item.url}"', | |
| str(pkgi_item.size), | |
| pkgi_item.checksum, | |
| ] | |
| ) | |
| ) |
| txt_lines.append( | ||
| "\t".join( | ||
| [ | ||
| kekatsu_item.title, | ||
| kekatsu_item.platform, | ||
| kekatsu_item.region, | ||
| kekatsu_item.version, | ||
| kekatsu_item.author, | ||
| kekatsu_item.download_url, | ||
| kekatsu_item.filename, | ||
| str(kekatsu_item.size), | ||
| kekatsu_item.box_art_url, | ||
| ] | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The Kekatsu feed uses a tab-separated format. If fields like the ROM title or filename contain tabs or newlines, it will break the TSV structure and potentially allow for column or row injection. Tabs and newlines should be replaced with spaces or escaped.
| txt_lines.append( | |
| "\t".join( | |
| [ | |
| kekatsu_item.title, | |
| kekatsu_item.platform, | |
| kekatsu_item.region, | |
| kekatsu_item.version, | |
| kekatsu_item.author, | |
| kekatsu_item.download_url, | |
| kekatsu_item.filename, | |
| str(kekatsu_item.size), | |
| kekatsu_item.box_art_url, | |
| ] | |
| ) | |
| ) | |
| txt_lines.append( | |
| "\t".join( | |
| [ | |
| kekatsu_item.title.replace("\t", " ").replace("\n", " ").replace("\r", " "), | |
| kekatsu_item.platform, | |
| kekatsu_item.region, | |
| kekatsu_item.version, | |
| kekatsu_item.author, | |
| kekatsu_item.download_url, | |
| kekatsu_item.filename.replace("\t", " ").replace("\n", " ").replace("\r", " "), | |
| str(kekatsu_item.size), | |
| kekatsu_item.box_art_url, | |
| ] | |
| ) | |
| ) |
| def pkgj_psp_dlcs_feed(request: Request) -> Response: | ||
| platform = db_platform_handler.get_platform_by_slug(UPS.PSP) | ||
| if not platform: | ||
| raise HTTPException( | ||
| status_code=404, detail="PlayStation Portable platform not found" | ||
| ) | ||
|
|
||
| roms = db_rom_handler.get_roms_scalar(platform_ids=[platform.id]) | ||
| txt_lines = [] | ||
| txt_lines.append( | ||
| "Title ID\tRegion\tName\tPKG direct link\tContent ID\tLast Modification Date\tRAP\tDownload .RAP file\tFile Size\tSHA256" | ||
| ) | ||
|
|
||
| for rom in roms: | ||
| download_url = generate_rom_download_url(request, rom) | ||
| last_modified = _format_pkgj_datetime(rom.updated_at) | ||
|
|
||
| pkgj_item = PkgjPSPDlcsItemSchema( | ||
| title_id="", | ||
| region=rom.regions[0] if rom.regions else "", | ||
| name=(rom.name or rom.fs_name_no_tags).strip(), | ||
| download_link=download_url, | ||
| content_id="", | ||
| last_modified=rom.updated_at, | ||
| rap="", | ||
| download_rap_file="", | ||
| file_size=rom.fs_size_bytes, | ||
| sha_256=rom.sha1_hash or "", | ||
| ) | ||
|
|
||
| txt_lines.append( | ||
| "\t".join( | ||
| [ | ||
| pkgj_item.title_id, | ||
| pkgj_item.region, | ||
| pkgj_item.name, | ||
| pkgj_item.download_link, | ||
| pkgj_item.content_id, | ||
| last_modified, | ||
| pkgj_item.rap, | ||
| pkgj_item.download_rap_file, | ||
| str(pkgj_item.file_size), | ||
| pkgj_item.sha_256, | ||
| ] | ||
| ) | ||
| ) | ||
|
|
||
| return _text_response(txt_lines, "pkgj_psp_dlc.txt") |
There was a problem hiding this comment.
This endpoint is intended to provide a feed of DLCs, but it fetches all ROMs for the platform without any filtering (db_rom_handler.get_roms_scalar(platform_ids=[platform.id])). This means it will incorrectly include games and other content types in the DLC feed.
The roms should be filtered to include only those that are actual DLCs. This could be done by checking the category of the RomFiles associated with each Rom. For example:
all_roms = db_rom_handler.get_roms_scalar(platform_ids=[platform.id])
roms = [rom for rom in all_roms if any(f.category == RomFileCategory.DLC for f in rom.files)]A more efficient approach would be to filter at the database level if possible. This issue also applies to the other games and dlc feeds, which should similarly filter for their respective content types.
Additionally, there's a naming inconsistency. The function is pkgj_psp_dlcs_feed (plural), but the endpoint path is /pkgj/psp/dlc (singular) and the filename is pkgj_psp_dlc.txt (singular). For consistency with the games endpoints, consider using the plural form dlcs in the path and filename as well.
| def test_pkgj_psp_dlc_feed( | ||
| client: TestClient, access_token: str, platform: Platform, rom: Rom | ||
| ): | ||
| platform = db_platform_handler.update_platform( | ||
| platform.id, | ||
| {"name": "PlayStation Portable", "slug": UPS.PSP, "fs_slug": UPS.PSP}, | ||
| ) | ||
| db_rom_handler.update_rom( | ||
| rom.id, | ||
| { | ||
| "platform_id": platform.id, | ||
| "name": "Test PSP DLC", | ||
| "fs_name": "Test PSP DLC.pkg", | ||
| "fs_name_no_tags": "Test PSP DLC", | ||
| "fs_name_no_ext": "Test PSP DLC", | ||
| "fs_extension": "pkg", | ||
| "fs_path": f"{platform.slug}/roms", | ||
| "fs_size_bytes": 123, | ||
| "sha1_hash": "deadbeef", | ||
| "regions": ["US"], | ||
| }, | ||
| ) | ||
|
|
||
| response = client.get( | ||
| "/api/feeds/pkgj/psp/dlc", | ||
| headers={"Authorization": f"Bearer {access_token}"}, | ||
| ) | ||
| assert response.status_code == status.HTTP_200_OK | ||
| assert response.headers["content-disposition"] == "filename=pkgj_psp_dlc.txt" | ||
| assert "Test PSP DLC" in response.text | ||
|
|
There was a problem hiding this comment.
This test for the DLC feed is not specific enough. It currently asserts that a ROM named 'Test PSP DLC' is in the response, but it doesn't ensure that only DLCs are present. The test ROM isn't configured as a DLC, so this test would pass even with the filtering bug in the endpoint.
To make this test more robust, you should:
- Set up test data where some ROMs are explicitly categorized as DLCs (e.g., by adding a
RomFilewithcategory=RomFileCategory.DLC) and others are not. - Assert that the DLC feed contains the DLC ROM.
- Assert that the DLC feed does not contain the non-DLC ROM.
This would verify that the endpoint's filtering logic is working correctly. This same principle applies to the other new pkgj feed tests.
☂️ Python Coverage
Overall Coverage
New FilesNo new covered files... Modified Files
|
Description
Explain the changes or enhancements you are proposing with this pull request.
Checklist
Please check all that apply.
Screenshots (if applicable)