Implement subtask-based scoring for problems#347

Merged
Holmes98 merged 20 commits into master from feature/subtask-scoring
Dec 17, 2025

Conversation

@BelgianSalamander
Member

This PR adds a use_subtask_scoring field (false by default) to contests. When this field is set to true, it changes how contest scores are calculated for each problem.

Specifically: instead of a user's score on a problem being the score of their best submissions, the best score is calculated for each subtask, and then these scores are added up. This allows students to solve different subtasks in separate submissions, which is especially nice for problems with subtasks that are not supersets of previous subtasks.
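As a minimal sketch of the difference (the data shapes here are illustrative, not the actual nztrain models):

```ruby
# Each submission is reduced to a per-subtask score hash for this sketch.
Submission = Struct.new(:subtask_scores) # e.g. { 1 => 10, 2 => 0 }

# Current behaviour: the single best submission's total.
def max_submission_score(submissions)
  submissions.map { |s| s.subtask_scores.values.sum }.max || 0
end

# New behaviour: best score per subtask, summed across submissions.
def subtask_score(submissions)
  submissions
    .flat_map { |s| s.subtask_scores.to_a }
    .group_by(&:first)
    .sum { |_subtask, pairs| pairs.map(&:last).max }
end

a = Submission.new({ 1 => 10, 2 => 0 })  # solves only subtask 1
b = Submission.new({ 1 => 0, 2 => 10 })  # solves only subtask 2

max_submission_score([a, b]) # => 10
subtask_score([a, b])        # => 20
```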

This is a common feature on other judges (like CMS, Kattis) and in most olympiads (including IOI - see scoring section here and EGOI). We also often emulate this feature at camps by painstakingly calculating these per-subtask scores by hand.

One slightly hacky thing I had to do here was create the fast_judge_data method on submissions. This does almost the same thing as judge_data but returns less complete information about per-testcase results (which are not necessary for calculating these subtask scores). The advantage is that this method is much faster, because judge_data performs one database query per testset.
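One way to picture the difference (purely illustrative; `TestCaseResult` and its columns are guesses, not the real nztrain schema): `judge_data` fetches full per-testcase rows testset by testset, whereas a `fast_judge_data`-style approach could fetch just per-testset aggregates in a single grouped query.

```ruby
# judge_data-style: one query per testset, full per-testcase rows.
submission.testsets.each do |testset|
  TestCaseResult.where(submission_id: submission.id, testset_id: testset.id).to_a
end

# fast_judge_data-style idea: one grouped query, per-testset aggregates only.
TestCaseResult.where(submission_id: submission.id)
              .group(:testset_id)
              .minimum(:score) # => { testset_id => aggregate score }
```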

This is reasonably fast overall. Some rough local testing on NZIC 2025 Round 1 (which has ~3000 submissions) gave me the following numbers:

  • Recalculating all contest scores for Round 1 with subtask scoring and with fast_judge_data took about 30 seconds.
  • Recalculating all contest scores for Round 1 without subtask scoring (so without these changes) took about 25 seconds.
  • Recalculating all contest scores for Round 1 with subtask scoring and without fast_judge_data took about 150 seconds (hence the addition of fast_judge_data).

So, subtask scoring does not make score calculations much slower. (Recalculating all contest scores for a contest only happens when a contest is un-finalized, or when use_subtask_scoring is changed).

One slight issue is that this currently applies only to contests, which could be a little confusing: students may see different scores on a problem depending on whether they view it in a contest or in a problem set. Potentially, subtask scoring could also be implemented for UserProblemRelation; however, this is tricky since:

  • Older problems don't implement subtasks properly (as previously mentioned by Jonathan) and so this would need to be toggleable and off by default
  • Having it per-problem could result in contests with mixed scoring types which would also be confusing.

I am looking for feedback on this PR.

@coveralls

coveralls commented Sep 1, 2025

Coverage Status

coverage: 38.386% (+0.7%) from 37.654%
when pulling bbdfb40 on feature/subtask-scoring
into 01e4e8f on master.

@puqeko
Member

puqeko commented Sep 2, 2025

Nice work. I worry it will be confusing for NZIC participants if the practice problems are scored differently from the problems in-contest. I also think the decision about how subtasks are scored should be made at the problem level instead of the contest level since this may influence how the test data and subtasks are structured during problem design.

What if we expose a 'Scoring Method' (or similar) parameter on each problem statement (alongside the memory/time limits) with something like 'Best overall submission' or 'Best submission for each subtask' as options?

Regarding the issues mentioned:

  • Can the new scoring method be on by default when creating a new problem and off by default when migrating existing problems?
  • Just don't mix scoring types when making a contest. If for some reason a problem has a different scoring method, edit or duplicate the problem. If you do choose to mix types, it should at least be made clear in the problem statement (e.g. via Scoring Method). I think it's fine for the contest creator to be responsible for that. If we were really worried about this happening by accident, then maybe a warning could be generated during contest creation if there is a mixture of scoring types in the contest problem set.

An advantage is that this approach generalises, should another scoring method ever be introduced.

@BelgianSalamander
Member Author

Hmm I do mostly agree. Unless anyone says anything else, I'll change this to what you've said.

@BelgianSalamander
Member Author

It looks like this might require a minor rework of how scores are stored in user_problem_relations (i.e. non-contest scores). Currently, user_problem_relation stores the id of the best submission, and the score is retrieved from that submission every time. This is incompatible with subtask scoring, so I will endeavour to instead have a score field directly on the relation (and update anything that needs to get this score).

@BelgianSalamander
Member Author

This change caused and uncovered some issues in the tests, which I have now fixed. Notably:

  • Since problems were defaulting to subtask scoring (which requires a submission to have a judge log), the specs were creating problems with subtask scoring but without judge logs, so all scores were calculated incorrectly. I fixed this by setting scoring_method to 0 in these specs.
  • When testing finalized contests, in the teardown, the contest gets unfinalized but never saved. I have added the save statement.

I have also added a test for subtask scoring in contests

@BelgianSalamander
Member Author

I have just discovered a small bug in how I modified the display of users' scores when an admin views a problem. If a user has only made submissions that caused a judge error, rendering the problem page raises an error. I will fix this tonight.

@bagedevimo
Contributor

I think we'd want to introduce some new tests for the new scoring method. I'll look over the rest of the code tonight and provide more detailed feedback :)

@BelgianSalamander
Member Author

I have added some tests that make sure subtask scores are calculated and updated correctly, for both contest scores and user problem relations. Also, because I slightly modified the admin view of problems to display subtask scores, there are some minor changes to the admin-facing list of users and their scores on a problem. I have chosen not to do much about them, since this view only appears for problems you can edit (so it wouldn't actually be visible to all users) and they are very minor. These are:

  • The users in this list are now ordered by whoever viewed the problem first, instead of who submitted to the problem first
  • If a user's first submission to a problem causes a judge error, and they get zero points on all later submissions and one of these submissions does not cause a judge error, then the score gets displayed as "-" in a white box instead of "0" in a dark red outline.

If anyone does think it is worth fixing these I am happy to try, and realistically it might not be that difficult.

@BelgianSalamander BelgianSalamander changed the title Implement subtask-based scoring for contests Implement subtask-based scoring for problems Sep 5, 2025
@bagedevimo
Contributor

A bunch of misc lines here have trailing semicolons, which is very un-Ruby-like and kinda a recipe for disaster down the road, maybe?

Comment on lines +28 to +29
SCORING_METHOD = Enumeration.new 0 => :max_submission, 1 => :subtask_scoring
validates :scoring_method, presence: true, inclusion: { in: [0, 1] }
Contributor

Rails has enum support, which includes a few nice features for working with enums. Can we use that instead of having scoring_method be a raw integer column? https://api.rubyonrails.org/v5.0/classes/ActiveRecord/Enum.html
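For reference, the Rails 5 enum form being suggested keeps the integer column and replaces the hand-rolled Enumeration plus inclusion validation (model shown is a sketch):

```ruby
class Problem < ActiveRecord::Base
  # Maps the existing integer column to symbolic names.
  enum scoring_method: { max_submission: 0, subtask_scoring: 1 }
end

# Rails then generates helpers such as:
#   problem.subtask_scoring?   # predicate
#   problem.subtask_scoring!   # setter + save
#   Problem.subtask_scoring    # scope for querying
```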

Member Author

We probably can, will look into this

.joins("LEFT OUTER JOIN user_problem_relations ON user_problem_relations.problem_id = problems.id AND user_problem_relations.user_id = #{user_id} LEFT OUTER JOIN submissions ON submissions.id = user_problem_relations.submission_id")
.select(
"id", "name", "test_error_count", "test_warning_count", "test_status", "submissions.points", "submissions.maximum_points", "problem_set_problems.weighting"
"id", "name", "test_error_count", "test_warning_count", "test_status", "user_problem_relations.unweighted_score", "problem_set_problems.weighting"
Contributor

will this change result in a visual change for existing problem sets?

Member Author

The place where this method is used (in the view) has been modified so it should display as before. The only exception is the edge case I described earlier, although I think this is minor enough to ignore:

If a user's first submission to a problem causes a judge error, and they get zero points on all later submissions and one of these submissions does not cause a judge error, then the score gets displayed as "-" in a white box instead of "0" in a dark red outline.

@@ -0,0 +1,16 @@
class AddUnweightedScoreToUserProblemRelation < ActiveRecord::Migration
def change
add_column :user_problem_relations, :unweighted_score, :decimal
Contributor

This column should be non-nullable, but if you want to defer that for a later change to avoid the table scan & lock, that's fine.

Member Author

Currently the score in a UserProblemRelation is null if the relation has been created (i.e. the user has viewed the problem) but there haven't been any submissions yet, which matches the previous behaviour. I'd be happy to change it so that it can be zero instead, if others agree.

This null-versus-zero distinction does result in a small visible difference: if a user has made no submissions and their score is null, they see "-" when they view the problem, but if they have a zero-point submission and their score is zero, they see "0/100".

# Calculates the score for a problem based on a list of submissions
# The score is determined using the correct scoring method
# Returns [the score (0..1), the number of attempts needed to get that score, the last submission that earned points]
def score_problem_submissions(submissions)
Contributor

We could extract the different scoring methods to service objects; then the Problem class doesn't need to gain ~60 lines of fairly dense, submission-specific code, and it'll be a little more easily testable.

Member Author

Sounds like a good idea
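One possible shape for that extraction, with illustrative names (a sketch, not the PR's actual code): each scoring method becomes a small callable object, and the model just dispatches on its scoring_method.

```ruby
# Illustrative service objects; names and data shapes are assumptions.
module ScoringMethods
  class MaxSubmission
    # submissions respond to #score (total points for that submission)
    def call(submissions)
      submissions.map(&:score).max || 0
    end
  end

  class SubtaskScoring
    # submissions respond to #subtask_scores (Hash of testset id => score)
    def call(submissions)
      submissions.flat_map { |s| s.subtask_scores.to_a }
                 .group_by(&:first)
                 .sum { |_id, pairs| pairs.map(&:last).max }
    end
  end

  METHODS = {
    max_submission:  MaxSubmission.new,
    subtask_scoring: SubtaskScoring.new
  }.freeze

  def self.for(method) # method would come from problem.scoring_method
    METHODS.fetch(method)
  end
end

Sub = Struct.new(:score, :subtask_scores)
subs = [Sub.new(10, { 1 => 10, 2 => 0 }), Sub.new(10, { 1 => 0, 2 => 10 })]

ScoringMethods.for(:max_submission).call(subs)  # => 10
ScoringMethods.for(:subtask_scoring).call(subs) # => 20
```

With this shape, score_problem_submissions shrinks to a dispatch call, and each scorer can be unit-tested in isolation.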

@BelgianSalamander
Member Author

To expand on the goal of this PR and to summarise the comments so far:

Goal

This PR adds a different way of calculating a user's score on a problem. The current way just gets the max score a user achieved across all of their submissions to a problem. This works fine for simple problems but can get annoying for more complex problems.

Most modern problems have subtasks (represented by testsets in code). Simply, a subtask of a problem is the same problem but with simpler constraints than the full problem. Each subtask is worth a fraction of the total points of the problem and competitors get the points for a subtask if they pass all the testcases in it (in its testset).

The goal of subtasks is usually two-fold. The obvious one is that it allows competitors to get "partial credit". That is, if they get partway to a full solution, they can still get points for their solution (albeit not full marks). Secondly, subtasks can often be used to "guide" a competitor towards a full solution. By focusing on simpler problems, competitors can usually make observations that help them find the full solution.

But this is where the issue with the current scoring method lies. Take this problem for example. It is very much possible to create:

  • A solution that passes only subtask 1
  • A solution that passes only subtask 2

Each of these subtasks is worth ten points, but if a competitor were to submit these two solutions separately, their score for that problem would be recorded as only ten points. However, since they have solved both subtasks, they would probably deserve 20 points. In practice, contestants can make "frankenstein" solutions. In this case, this would mean writing something like this:
if (N == 3) {
  // Subtask one solution
} else if (max_h <= 5) {
  // Subtask two solution
}

However, this can be somewhat annoying, and for problems with more complicated or more disjoint subtasks it can get quite messy (for example, with this problem it is theoretically possible to solve subtasks 1, 2, 4, 5, and 6 separately). Additionally, in the NZIC, novice coders will sometimes solve two subtasks separately but not submit a merged solution, so they unfortunately miss out on points.

The solution to this, which is used in most major olympiads (see scoring section here or here), is the scoring method this pull request introduces, often called "subtask scoring". Scores are first calculated per subtask: we check the user's highest score on subtask one across all of their submissions, then subtask two, then subtask three, and so on. Their final score on the problem is then the sum of their best scores across all subtasks. With this method, competitors can submit solutions to separate subtasks as above and earn a much more representative score (so in the case above they would in fact get 20 points). This also brings us more in line with how most olympiads actually run. We have been using this scoring method for camp contests, but have had to calculate scores by hand after the contest, which is slow and not as nice for the students.

Current implementation

To not break past problems, this new scoring method is something that can be enabled per-problem. A new scoring_method column is added to the problems table. For existing problems, this will default to a value of 0 (representing max_submission_scoring in an enum) but for new problems this will default to 1 (for subtask_scoring).

After a problem gets judged, this new column is used to determine how to calculate scores for both user_problem_relation (i.e. the score a user will see when they view the problem on the front page) and contest_score (the score that will be visible on a contest scoreboard).

The scoring logic has now all been extracted to services/scoring_methods.rb.

This also required a rework of how scores were stored in user_problem_relation. While contest_score just stores the actual score, user_problem_relation only stored the id of the best-scoring submission, and the score was retrieved from this submission whenever it was needed. Obviously, this is incompatible with the new scoring method, as a user's score on a problem may be higher than the score of any of their individual submissions. So, an unweighted_score field is added to this table (unweighted meaning that it stores the score as a fraction of the maximum score, since the number of points a problem is worth depends on which problem set it is in). For existing entries, this field is automatically set to the appropriate value (user_problem_relation.submission.points / user_problem_relation.submission.maximum_points, making sure nothing is null and that we are not dividing by zero). This migration has been tested on a backup of the nztrain database.
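The backfill described above, with its null and divide-by-zero guards, might look roughly like this (a sketch assuming PostgreSQL; the class name and SQL are illustrative, not the PR's actual migration):

```ruby
class BackfillUnweightedScore < ActiveRecord::Migration
  def up
    # Guard against relations with no submission, null points,
    # and division by zero on maximum_points.
    execute <<-SQL
      UPDATE user_problem_relations
      SET unweighted_score =
        submissions.points::decimal / submissions.maximum_points
      FROM submissions
      WHERE submissions.id = user_problem_relations.submission_id
        AND submissions.points IS NOT NULL
        AND submissions.maximum_points IS NOT NULL
        AND submissions.maximum_points > 0
    SQL
  end

  def down
    execute "UPDATE user_problem_relations SET unweighted_score = NULL"
  end
end
```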

Surprisingly, this needed minimal changes for everything to display properly. When rendered in problem sets, the score is retrieved via problem#weighted_score. This is itself meant to be called on problems joined to user_problem_relations from problem_set#problems_with_scores_by_user, so this needed to be modified to include the unweighted_score field. The only other place this score is seen is in the admin view of a problem, so this required some modifications to problems_controller.rb.

@Holmes98 Holmes98 merged commit 6474deb into master Dec 17, 2025
4 checks passed
@Holmes98 Holmes98 deleted the feature/subtask-scoring branch December 17, 2025 06:55