feat(newagent): Add a new Keyword Agent for pre-checking #109
rajuljha wants to merge 1 commit into fossology:master
Kaushl2208 left a comment
Hi @rajuljha,
I've reviewed the changes and recommended a few minor adjustments. Overall, the changes look good at first glance.
I have a question regarding the NomosTestFiles: Is it necessary to commit these files directly? Could we consider downloading them at runtime instead? For instance, if a user wishes to run the evaluator for Keywords, they could download the files when needed, allowing the evaluation to be performed dynamically.
From our perspective, the evaluation script serves as a form of functional testing, which we can utilize in CI stages to ensure everything is functioning correctly. In these scenarios, I believe we could download the TestFiles and execute the evaluator as required, correct?
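A minimal sketch of the download-at-runtime idea, assuming the test files are published as a zip archive somewhere. The function name `fetch_test_files`, the archive layout, and the URL passed in are all hypothetical; nothing like this exists in the PR yet.

```python
import os
import urllib.request
import zipfile


def fetch_test_files(archive_url, cache_dir):
    """Download and unpack a test-file archive once, reusing it on later runs.

    archive_url is a placeholder: the real location of NomosTestFiles
    would have to be decided by the project.
    """
    target_dir = os.path.join(cache_dir, "NomosTestFiles")
    if os.path.isdir(target_dir):
        return target_dir  # Already fetched on an earlier run

    os.makedirs(cache_dir, exist_ok=True)
    archive_path = os.path.join(cache_dir, "NomosTestFiles.zip")
    urllib.request.urlretrieve(archive_url, archive_path)

    # Unpack, then drop the archive so only the extracted files remain
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    os.remove(archive_path)
    return target_dir
```

The evaluator or a CI job could call this before running, so the files stay out of the repository and are only fetched when someone actually needs them.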
atarashi/agents/keywordAgent.py
Outdated
if os.path.isfile(input_path):
    results = agent.scan(input_path)
    if results:
        print(f"Scan results for {input_path}:")
- print(f"Scan results for {input_path}:")
+ print(f"Keyword Scan results for {input_path}:")
file_path = os.path.join(root, file)
results = agent.scan(file_path)
if results:
    print(f"Scan results for {file_path}:")
Same here.
Since it is more of a support agent than a license detector as of now, we should mention that explicitly. I am also wondering whether it can act as a standalone agent altogether, or whether it should act as a bypass for all the other agents. What do you think, @rajuljha?
KeywordAgent can also act independently by running the keywordAgent.py file directly, but not through the atarashi CLI currently. Inside the atarashi CLI, it only acts as a support agent for now. Let me know if it should be made accessible in the CLI as well.
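For reference, the standalone file-or-directory handling discussed in this thread could be folded into one helper along these lines. This is only a sketch: `scan_path` is a hypothetical name, and `agent` is assumed to be any object with a `scan(path)` method that returns a truthy value when keywords are found.

```python
import os


def scan_path(agent, input_path):
    """Scan a single file, or every file under a directory.

    Returns a dict mapping each file path with findings to its results.
    """
    if os.path.isfile(input_path):
        candidates = [input_path]
    else:
        candidates = [
            os.path.join(root, name)
            for root, _dirs, files in os.walk(input_path)
            for name in files
        ]

    findings = {}
    for file_path in candidates:
        results = agent.scan(file_path)
        if results:
            print(f"Keyword Scan results for {file_path}:")
            findings[file_path] = results
    return findings
```

Keeping the print prefix in one place also means the "Keyword Scan results" wording suggested above only has to be changed once.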
warnings.simplefilter("ignore")
sklearn_tfidf = TfidfVectorizer(min_df=1, use_idf=True, smooth_idf=True,
                                sublinear_tf=True, tokenizer=tokenize,
                                token_pattern=None,
I am hoping this is for a bug fix??
ngram_json = defaultJSON
# Validate compatibility between agent and similarity
if args.agent_name == "tfidf" and args.similarity not in ["CosineSim", "ScoreSim"]:
    print("Error: TFIDF agent supports only CosineSim or ScoreSim.", file=sys.stderr)
Ideally, this warning should be in the help command for the agent, right?
I have added this to the help command, but left this as a sanity check for the agent!
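One way to keep both the help text and the sanity check is sketched below with plain `argparse`. The option names and help strings here are assumptions for illustration, not the actual atarashi CLI definition; only the tfidf rule itself comes from the diff above.

```python
import argparse

# Similarity measures the tfidf agent accepts, per the check in the diff
TFIDF_SIMILARITIES = ("CosineSim", "ScoreSim")


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="atarashi-sketch")
    parser.add_argument("-a", "--agent_name", required=True,
                        help="Name of the agent to run")
    parser.add_argument("-s", "--similarity", default="CosineSim",
                        help="Similarity measure; the tfidf agent supports "
                             "only CosineSim or ScoreSim")
    args = parser.parse_args(argv)

    # Sanity check kept alongside the help text: parser.error() prints the
    # usage line plus this message to stderr and exits with status 2.
    if args.agent_name == "tfidf" and args.similarity not in TFIDF_SIMILARITIES:
        parser.error("TFIDF agent supports only CosineSim or ScoreSim.")
    return args
```

Using `parser.error` instead of a bare `print` keeps the message format consistent with other argument errors.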
result = {"file": os.path.abspath(inputPath), "results": result}
result = json.dumps(result, sort_keys=True, ensure_ascii=False, indent=4)
print(result + "\n")
keyword_ok = args.skip_keyword or keyword_scanner.scan(inputPath)
Oh, my bad. The Keyword agent is actually working as a support agent here for any type of scan. :D
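The gating in the `keyword_ok` line can be pictured roughly as follows. This is a sketch under assumptions: `scan_with_precheck` is a hypothetical wrapper, and returning an empty list when the pre-check fails is a guess at the behaviour, since the diff excerpt does not show what happens next.

```python
def scan_with_precheck(input_path, agent, keyword_scanner, skip_keyword=False):
    """Run the main agent only if the keyword pre-check passes or is skipped."""
    # `or` short-circuits: with skip_keyword=True the keyword scanner
    # is never invoked at all.
    keyword_ok = skip_keyword or keyword_scanner.scan(input_path)
    if not keyword_ok:
        return []  # Assumption: no keywords means no license scan is attempted
    return agent.scan(input_path)
```

So the keyword scanner sits in front of whichever agent was selected, and `skip_keyword` restores the old behaviour.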
  :return: HTTP Pool Manager
  """
- proxy_val = os.environ.get('http_proxy', False)
+ proxy_val = os.environ.get('http_proxy')
cpuCount = os.cpu_count()

num_threads = threads if threads is not None else cpuCount
if num_threads is None:
    num_threads = 1  # Fallback
if cpuCount is not None:
    num_threads = min(num_threads, cpuCount * 2)
Can we make this a modular function??
Yeah, we can do that. I'll take a look and fix this!
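A possible shape for that helper, mirroring the logic in the snippet above. The name `resolve_thread_count` and the injectable `cpu_counter` parameter are assumptions added here to make the function easy to test; they are not part of the PR.

```python
import os


def resolve_thread_count(threads=None, cpu_counter=os.cpu_count):
    """Pick a worker count: the requested value, capped at twice the CPU count.

    cpu_counter is injectable so tests can simulate machines where
    os.cpu_count() returns None.
    """
    cpu_count = cpu_counter()
    num_threads = threads if threads is not None else cpu_count
    if num_threads is None:
        num_threads = 1  # Fallback when neither value is known
    if cpu_count is not None:
        num_threads = min(num_threads, cpu_count * 2)
    return num_threads
```

The call site would then shrink to a single line such as `num_threads = resolve_thread_count(threads)`.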
Keyword agent pre-checks for certain keywords for license-possibility detection. Keywords are from Nomos's STRINGS.in, FOSSology's license_ref.json and SPDX licenses and exceptions in a two stage pipeline.

Signed-off-by: Rajul Jha <rajuljha49@gmail.com>
6031e7a to adf3d98
Description
Introduced a new agent called KeywordAgent that performs pre-checks for license-possibility.
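As a rough illustration of what a two-stage keyword pre-check might look like (this is not the PR's implementation; the keyword list, the license names, and the order of the stages are all assumptions made for the sketch):

```python
import re

# Stage 1 (assumed): generic license-related terms
GENERIC_KEYWORDS = re.compile(
    r"\b(?:licen[sc]e[sd]?|copyright|warranty)\b", re.IGNORECASE
)

# Stage 2 (assumed): a tiny sample of license identifiers; a real list
# would be derived from sources such as the SPDX license list.
LICENSE_NAMES = ("MIT", "Apache-2.0", "GPL-2.0", "BSD-3-Clause")
LICENSE_NAME_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in LICENSE_NAMES) + r")\b",
    re.IGNORECASE,
)


def keyword_precheck(text):
    """Return True if the text might contain license information."""
    if GENERIC_KEYWORDS.search(text):
        return True
    return bool(LICENSE_NAME_PATTERN.search(text))
```

Word boundaries matter in the second stage: without them, a short name like "MIT" would match inside ordinary words such as "submit".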
Changes
How to test
> [!NOTE]
> If building atarashi, run the build_depy.py script as well.
Screenshots