
ADD Tree Transformation Pipeline #28

@AlpacaMax


Add a tree transformation stage that runs after parsing but before the plagiarism detection algorithm executes.

This stage addresses two problems:

  1. The tree-sitter parsers generate a concrete syntax tree (CST), which contains many redundant nodes that confuse the algorithm. At this stage we can remove those nodes and keep only the ones that carry syntactic meaning.
  2. We can also unify equivalent syntaxes. For example, we can rewrite lambda functions as normal functions and for loops as while loops. This counters obfuscation techniques that swap one construct for an equivalent one.
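
As a rough illustration of the two points above, here is a minimal sketch of such a pipeline. It is not mayat's actual code: the `Node` class, the `PUNCTUATION` and `WRAPPERS` sets, and the function names are all hypothetical, and the tree is a simplified stand-in for what a tree-sitter parser produces. It also assumes a C-style `for` statement that is left with exactly four children (init, condition, update, body) once punctuation is pruned.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Simplified stand-in for a parser node (hypothetical, not tree-sitter's API)."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Assumed node kinds; a real grammar would need its own lists.
PUNCTUATION = {"(", ")", "{", "}", ";", ",", ":"}
WRAPPERS = {"parenthesized_expression", "expression_statement"}


def prune_redundant(node: Node) -> Node:
    """Drop punctuation leaves and collapse single-child wrapper nodes."""
    kids = [prune_redundant(c) for c in node.children if c.kind not in PUNCTUATION]
    if node.kind in WRAPPERS and len(kids) == 1:
        return kids[0]
    return Node(node.kind, node.text, kids)


def unify_syntax(node: Node) -> Node:
    """Rewrite lambdas as functions and C-style for loops as while loops."""
    kids = [unify_syntax(c) for c in node.children]
    if node.kind == "lambda":
        return Node("function_definition", node.text, kids)
    if node.kind == "for_statement" and len(kids) == 4:
        init, cond, update, body = kids
        # for (init; cond; update) body  ->  init; while (cond) { body; update }
        loop = Node("while_statement", children=[cond, Node("block", children=[body, update])])
        return Node("block", children=[init, loop])
    return Node(node.kind, node.text, kids)


def run_pipeline(root: Node, passes: List[Callable[[Node], Node]]) -> Node:
    """Apply each transformation pass in order and return the final tree."""
    for transform in passes:
        root = transform(root)
    return root


def kinds(node: Node) -> List[str]:
    """Pre-order list of node kinds, handy for inspecting a tree."""
    out = [node.kind]
    for child in node.children:
        out.extend(kinds(child))
    return out
```

Calling `run_pipeline(tree, [prune_redundant, unify_syntax])` on a parsed `for` loop would yield a tree whose root is a `block` holding the initializer and a `while_statement`. The order of the passes matters here, because the loop rewrite relies on punctuation already being gone. The rewrite is only structural (it ignores how `continue` interacts with the update step), which is likely acceptable for similarity comparison but would not be a semantics-preserving compiler transformation.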

Note that the parsers I used before switching to tree-sitter did some of this automatically, so this issue essentially restores those features to mayat. Until it is resolved, I recommend using v1.0.0.


Labels: enhancement
