fix(commentPreprocessor, licenseDownloader): normalize keyword matching, SPDX URL handling, missing comma #123

Open
divyamagrawal06 wants to merge 2 commits into fossology:master from divyamagrawal06:fix/missing-comma

divyamagrawal06 commented Mar 26, 2026

Description

Small bug fixes in comment extraction and SPDX download normalization.

Changes

Fixed the keyword list in commentPreprocessor.py by adding a missing comma, and added the keyword "merchantability" (the list already contained "merchantibility", but "merchantability" is the standard spelling).
Normalized seeAlso handling in licenseDownloader.py so that the URL is always stored as a string.
Created a one-row DataFrame per license/exception entry by wrapping the dict in a list ([licenseDict]).
Made SPDX filename generation Windows-safe by sanitizing releaseDate before building the CSV filename in licenseDownloader.py.
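
For reviewers, here is a minimal sketch of the kinds of fixes listed above. It is not the code in this PR. The keyword "warranty", the helper names `normalize_see_also` and `safe_csv_name`, the `licenseList` prefix, the timestamp format, and the choice to keep only the first URL are all assumptions made for illustration.

```python
import re

# Pitfall behind a missing comma: Python concatenates adjacent string
# literals, so the list silently ends up with one merged entry.
# "warranty" is a hypothetical neighbour keyword used for illustration.
keywords_buggy = [
    "warranty"  # no comma here
    "merchantibility",
]
keywords_fixed = [
    "warranty",
    "merchantibility",
    "merchantability",
]


def normalize_see_also(see_also):
    """Return a single URL string for a list, a string, or a missing value.

    Assumption: when several URLs are given, the first one is kept.
    """
    if isinstance(see_also, (list, tuple)):
        return str(see_also[0]) if see_also else ""
    return "" if see_also is None else str(see_also)


def safe_csv_name(release_date, prefix="licenseList"):
    """Build a CSV filename with Windows-forbidden characters replaced.

    Windows rejects < > : " / \\ | ? * in filenames, so a timestamp that
    contains colons cannot be used as-is.
    """
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", str(release_date))
    return f"{prefix}_{cleaned}.csv"


if __name__ == "__main__":
    print(keywords_buggy)
    print(normalize_see_also(["https://example.org/a", "https://example.org/b"]))
    print(safe_csv_name("2025-07-01T00:00:00Z"))
```

Storing the URL as a plain string keeps the CSV column type uniform, and sanitizing the date before formatting the filename means the same code path works on Windows and POSIX systems.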

How to test

poetry install
poetry run preprocess

Create a new test file, e.g. dummy_license_file.txt:

poetry run atarashi -a DLD .\dummy_license_file.txt

Validation:

Output from tests: [two screenshots attached in the original PR]