This program detects patterns in text files. It can detect both cybersecurity patterns and vulnerability patterns.
To use the program, you need to provide it with three arguments:
- -vce: The path to the vulnerability pattern file
- -me: The path to the code to analyze
- -m: The model used to find similarities, either codex or ccflex
Typing '-h' or '--help' will display the help menu.
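As a rough illustration of the interface described above, the three arguments could be declared with Python's standard argparse module. This is only a sketch and not the contents of parse_arguments.py: the function name build_parser, the destination names, the choice to make all three arguments required, and the file names in the example call are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Detect vulnerability patterns in source code."
    )
    # argparse registers -h / --help automatically.
    parser.add_argument("-vce", dest="vce", required=True,
                        help="path to the vulnerability pattern file")
    parser.add_argument("-me", dest="me", required=True,
                        help="path to the code to analyze")
    # Restricting the values to the two documented models is an assumption.
    parser.add_argument("-m", dest="model", required=True,
                        choices=["codex", "ccflex"],
                        help="model used to find similarities")
    return parser


# Hypothetical file names, used only to show how the arguments are read.
args = build_parser().parse_args(
    ["-vce", "test_code/pattern.c", "-me", "test_code/module.c", "-m", "codex"]
)
print(args.vce, args.me, args.model)
```

With a parser like this, an unknown model name or a missing argument makes argparse print a usage message and exit.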
The test_code directory contains code for testing the tool, which can be passed as the -vce and -me arguments.
The software uses programming language models to transform each module into an embedding vector and compares the vectors. The comparison is done based on the distances between the modules and the reference code.
The reference code was collected manually as part of a research project and illustrates examples of vulnerabilities as well as their solutions.
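The comparison step can be pictured with a small sketch. It assumes cosine distance as the metric (the text above only says "distances") and uses toy three-dimensional vectors in place of real model embeddings. The function names and the reference labels are hypothetical and are not taken from analysis.py.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_reference(module_vec, references):
    """Return (name, distance) of the reference embedding nearest to module_vec."""
    return min(
        ((name, cosine_distance(module_vec, vec)) for name, vec in references.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors standing in for embeddings of the reference code (assumed values).
references = {
    "vulnerable_example": [1.0, 0.0, 0.0],
    "fixed_example": [0.0, 1.0, 0.0],
}
# Toy embedding of the module under analysis (assumed values).
module_embedding = [0.9, 0.1, 0.0]

name, distance = closest_reference(module_embedding, references)
print(name, round(distance, 4))
```

Here the analyzed module lies much closer to the vulnerable reference than to the fixed one, so it would be reported as most similar to the vulnerability example.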
The models that are used in this program are:
- CodeBERT from Hugging Face
- singBerta - our own model trained on the Linux kernel
- wolfbert - our own model trained on wolfSSL
- CCFlex: https://github.com/mochodek/py-ccflex
- OpenAI Codex: https://openai.com/
The program consists of the following modules:
- main.py - the main module to execute with the parameters
- analysis.py - module that analyzes the distances and finds the most similar programs
- checks.py - module that checks if the environment is set in the correct way
- codex_embeddings.py - module to extract the embeddings from Codex using the OpenAI API
- mslogging.py - logging module (not used at the moment)
- parse_arguments.py - module to parse the arguments from the command line
- print_header.py - module to print the header and footer of the program
- setup_dirs.py - module that sets up the directories for the product
There are three ways to use the program:
- From the command line - python main.py followed by the arguments above
- As a web app - run python app.py and send JSON strings to it. The endpoint "/" on port 5000 provides instructions on how to use it
- As a Docker container - pull miroslawstaron/ccsat from hub.docker.com
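For the web app, the usage instructions served at "/" can be read with a few lines of standard-library Python. This sketch assumes the app runs on the same machine (localhost), is reachable over plain HTTP, and returns UTF-8 text. The helper names are hypothetical, and the JSON format expected by the app is not shown here because it is documented by the endpoint itself.

```python
import urllib.request


def instructions_url(host="localhost", port=5000):
    """Build the URL of the "/" endpoint that serves the usage instructions."""
    return f"http://{host}:{port}/"


def fetch_instructions(host="localhost", port=5000, timeout=5.0):
    """GET the "/" endpoint and return the response body as text."""
    url = instructions_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Decoding as UTF-8 is an assumption about the server's response.
        return response.read().decode("utf-8")
```

With the app started through python app.py, calling print(fetch_instructions()) should show the instructions for sending JSON to the service.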
Miroslaw Staron, email (C) Miroslaw Staron, 2021-2023