Skip to content

Conversation

@19mdavenport
Copy link
Contributor

@19mdavenport 19mdavenport commented Aug 12, 2025

Second improvement to the commented code check. Does more in-depth regex scanning (including regex for negative examples that shouldn't be counted as code, most importantly javadoc comments/documentation) rather than the simple approach of checking the end character of each line.

I'd recommend playing with this, maybe changing some of the default variables, and running it on real student code before merging (I no longer have access to the code of students taking the course)

Copy link
Contributor

@Maillman Maillman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated check is working really well against all of the student code that I've ran it against! It's even catching commented code that the previous check wasn't catching because of empty comment lines inserted in the commented code. Like this commented code that I found when I ran it on a certain student's code:

        //Session session;

        //URI socketURI = new URI(url+"/ws");
//        this.notificationHandler = notificationHandler;
//
//        WebSocketContainer container = ContainerProvider.getWebSocketContainer();
//
//        this.session = container.connectToServer(this,socketURI);

I also ran it against the autograder code under src/. It unfortunately marked https://github.com/softwareconstruction240/autograder/blob/edcbd89b68d4fdd58d0181363d0cf308ed6af8e0/src/main/java/edu/byu/cs/service/ConfigService.java#L95-L97 as a "3 comment lines likely containing commented code (75% confidence)" and https://github.com/softwareconstruction240/autograder/blob/edcbd89b68d4fdd58d0181363d0cf308ed6af8e0/src/main/java/edu/byu/cs/service/ConfigService.java#L121-L123 as a "3 comment lines likely containing commented code (75% confidence)".
It seems to include the empty comment lines as part of the "likely containing commented code" when I don't think it should.
Could you make it so it doesn't include empty lines as part of the check?

@19mdavenport
Copy link
Contributor Author

Yep, Thank you for catching that

Comment on lines +29 to +53
static {
ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("[\\s\\S]*(?:if|for|while)\\s*\\([\\s\\S]+\\)\\s*\\{[\\s\\S]*}[\\s\\S]*"), 0.9); //simple if/for/while
ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("[\\s\\S]*if\\s*\\([\\s\\S]+\\)\\s*\\{[\\s\\S]*}\\s*else(?:\\sif\\s*\\([\\s\\S]+\\))?\\s*\\{[\\s\\S]*}[\\s\\S]*"), 0.95); //if/else or if/elseif
ENTIRE_COMMENT_CODE_PATTERNS.put(Pattern.compile("^(?:[\\s\\S]*\\n\\s*)?(?:private|public|protected)?\\s+\\w+\\s+\\w+\\s*\\([\\s\\S]*\\)\\s*\\{[\\s\\S]*}[\\s\\S]*$"), 0.9); //method declaration

SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^(?:if|while)\\s*\\([\\s\\S]+\\)\\s*\\{?\\s$"), 0.85); //simple if/while
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^for\\s*\\([\\s\\S]*;[\\s\\S]*;[\\s\\S]*\\)\\s*\\{?\\s$"), 0.9); //start of for loop
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+\\s+\\w+\\s*=[\\s\\S]*$"), 0.9); //variable declaration
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+\\s*(?:\\+|-|\\*|/|%|&|\\^|\\||<<|>>|>>>)?=[\\s\\S]*$"), 0.85); //variable assignment
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^(?:\\w+(?:\\(.*\\))?\\.)*\\w+\\(.*\\);$"), 0.85); //method call
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^return\\s+.*;$"), 0.9); // return statements
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^\\w+(?:--|\\+\\+);?$"), 0.9); //increment/decrement
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^.*;$"), 0.8); //end semicolon
SINGLE_LINE_CODE_PATTERNS.put(Pattern.compile("^}$"), 0.6); //only end curly


ENTIRE_COMMENT_NONCODE_PATTERNS.put(Pattern.compile("^(?:\\*.*\n)*\\*.*?$"), 0.85); //Javadoc comment
ENTIRE_COMMENT_NONCODE_PATTERNS.put(Pattern.compile("^\\{(?:\\s*\"\\w+\":\\s*(?:\"?\\w+\"?|\"\"|\\[[\\s\\S]*]|\\{[\\s\\S]*}),?\\s*)+}$"), 0.75); //JSON object

SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\\*.*$"), 0.65); //start of javadoc comment line
SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\\*\\s+@(?:param|return|throws)\\s+.*$"), 0.8); //start of javadoc comment line
SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^\"\\w+\":\\s*(?:\"?[\\w: ]+\"?|\"\"),?$"), 0.65); //individual line of json field
SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^(?i)(?:TODO|FIXME):.*$"), 0.8);
SINGLE_LINE_NONCODE_PATTERNS.put(Pattern.compile("^NOTE:.*$"), 0.7);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Maillman Are there any other code patterns or common non-code patterns you think we should be checking for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is robust enough, and already significantly better than the previous commented code check. I don't think there's anything else I would add.

Copy link
Contributor

@Maillman Maillman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'll get this merged in and make a PR for the autograder to the updated checkstyle jar.

@Maillman Maillman merged commit aaf39e4 into softwareconstruction240:main Sep 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants