Implement subtask-based scoring for problems#347

Merged
Holmes98 merged 20 commits into master from feature/subtask-scoring
Dec 17, 2025

Conversation

@BelgianSalamander
Member

This PR adds a use_subtask_scoring field (false by default) to contests. When this field is set to true, it changes how contest scores are calculated for each problem.

Specifically: instead of a user's score on a problem being the score of their best submissions, the best score is calculated for each subtask, and then these scores are added up. This allows students to solve different subtasks in separate submissions, which is especially nice for problems with subtasks that are not supersets of previous subtasks.
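As a minimal sketch of the difference (the data shapes here are illustrative, not the actual nztrain models):

```ruby
# Each submission is reduced to a per-subtask score hash for this sketch.
Submission = Struct.new(:subtask_scores) # e.g. { 1 => 10, 2 => 0 }

# Current behaviour: the single best submission's total.
def max_submission_score(submissions)
  submissions.map { |s| s.subtask_scores.values.sum }.max || 0
end

# New behaviour: best score per subtask, summed across submissions.
def subtask_score(submissions)
  submissions
    .flat_map { |s| s.subtask_scores.to_a }
    .group_by(&:first)
    .sum { |_subtask, pairs| pairs.map(&:last).max }
end

a = Submission.new({ 1 => 10, 2 => 0 })  # solves only subtask 1
b = Submission.new({ 1 => 0, 2 => 10 })  # solves only subtask 2

max_submission_score([a, b]) # => 10
subtask_score([a, b])        # => 20
```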

This is a common feature on other judges (like CMS, Kattis) and in most olympiads (including IOI - see scoring section here and EGOI). We also often emulate this feature at camps by painstakingly calculating these per-subtask scores by hand.

One slightly hacky thing I had to do here was create the fast_judge_data method on submissions. This does almost the same thing as judge_data but returns less complete information about per-testcase results (which are not necessary for calculating these subtask scores). The advantage is that this method is much faster, because judge_data performs one database query per testset.
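One way to picture the difference (purely illustrative; `TestCaseResult` and its columns are guesses, not the real nztrain schema): `judge_data` fetches full per-testcase rows testset by testset, whereas a `fast_judge_data`-style approach could fetch just per-testset aggregates in a single grouped query.

```ruby
# judge_data-style: one query per testset, full per-testcase rows.
submission.testsets.each do |testset|
  TestCaseResult.where(submission_id: submission.id, testset_id: testset.id).to_a
end

# fast_judge_data-style idea: one grouped query, per-testset aggregates only.
TestCaseResult.where(submission_id: submission.id)
              .group(:testset_id)
              .minimum(:score) # => { testset_id => aggregate score }
```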

This is reasonably fast overall. Some rough local testing on NZIC 2025 Round 1 (which has ~3000 submissions) gave me the following numbers:

  • Recalculating all contest scores for Round 1 with subtask scoring and with fast_judge_data took about 30 seconds.
  • Recalculating all contest scores for Round 1 without subtask scoring (so without these changes) took about 25 seconds.
  • Recalculating all contest scores for Round 1 with subtask scoring and without fast_judge_data took about 150 seconds (hence the addition of fast_judge_data).

So, subtask scoring does not make score calculations much slower. (Recalculating all contest scores for a contest only happens when a contest is un-finalized, or when use_subtask_scoring is changed).

One slight issue is that this currently applies only to contests, which could be a little confusing: students may see different scores on a problem depending on whether they view it in a contest or in a problem set. Potentially, subtask scoring could also be implemented for UserProblemRelation; however, this is tricky since:

  • Older problems don't implement subtasks properly (as previously mentioned by Jonathan) and so this would need to be toggleable and off by default
  • Having it per-problem could result in contests with mixed scoring types which would also be confusing.

I am looking for feedback on this PR.

@coveralls

coveralls commented Sep 1, 2025

Coverage Status

coverage: 38.386% (+0.7%) from 37.654%
when pulling bbdfb40 on feature/subtask-scoring
into 01e4e8f on master.

@puqeko
Member

puqeko commented Sep 2, 2025

Nice work. I worry it will be confusing for NZIC participants if the practice problems are scored differently from the problems in-contest. I also think the decision about how subtasks are scored should be made at the problem level instead of the contest level since this may influence how the test data and subtasks are structured during problem design.

What if we expose a 'Scoring Method' (or similar) parameter on each problem statement (alongside the memory/time limits) with something like 'Best overall submission' or 'Best submission for each subtask' as options?

Regarding the issues mentioned:

  • Can the new scoring method be on by default when creating a new problem and off by default when migrating existing problems?
  • Just don't mix scoring types when making a contest. If for some reason a problem has a different scoring method, edit or duplicate the problem. If you do choose to mix types, it should at least be made clear in the problem statement (e.g. via Scoring Method). I think it's fine for the contest creator to be responsible for that. If we were really worried about this happening by accident, then maybe a warning could be generated during contest creation if there is a mixture of scoring types in the contest problem set.

An advantage is that this approach generalises, should another scoring method ever be introduced.

@BelgianSalamander
Member Author

Hmm I do mostly agree. Unless anyone says anything else, I'll change this to what you've said.

@BelgianSalamander
Member Author

It looks like this might require a minor rework of how scores are stored in user_problem_relations (i.e. non-contest scores). Currently, user_problem_relation stores the id of the best submission, and the score is retrieved from that submission every time. This is incompatible with subtask scoring, so I will endeavour to instead have a score field directly on the relation (and update anything that needs to get this score).

@BelgianSalamander
Member Author

This change caused and uncovered some issues in the tests, which I have now fixed. Notably:

  • Since problems were defaulting to subtask scoring (which requires a submission to have a judge log), the specs were creating problems with subtask scoring but without judge logs, so all scores were calculated incorrectly. I fixed this by setting scoring_method to 0 in these specs.
  • When testing finalized contests, in the teardown, the contest gets unfinalized but never saved. I have added the save statement.

I have also added a test for subtask scoring in contests

@BelgianSalamander
Member Author

I have just discovered a small bug in how I modified the display of users' scores when an admin views a problem. If a user has only made submissions that caused a judge error, rendering the problem page raises an error. I will fix this tonight.

@bagedevimo
Contributor

I think we'd want to introduce some new tests for the new scoring method. I'll look over the rest of the code tonight and provide more detailed feedback :)

@BelgianSalamander
Member Author

I have added some tests that make sure subtask scores are calculated and updated correctly, for both contest scores and user problem relations. Also, because I slightly modified the admin view of problems to display subtask scores, there are some minor changes to the admin-facing list of users and their scores on a problem. I have chosen not to do much about them, since this view only appears for problems you can edit (so it wouldn't actually be visible to all users) and they are very minor. These are:

  • The users in this list are now ordered by whoever viewed the problem first, instead of who submitted to the problem first
  • If a user's first submission to a problem causes a judge error, and they get zero points on all later submissions and one of these submissions does not cause a judge error, then the score gets displayed as "-" in a white box instead of "0" in a dark red outline.

If anyone does think it is worth fixing these I am happy to try, and realistically it might not be that difficult.

@BelgianSalamander BelgianSalamander changed the title Implement subtask-based scoring for contests Implement subtask-based scoring for problems Sep 5, 2025
@bagedevimo
Contributor

A bunch of misc lines here have trailing semicolons, which is very un-Ruby-like and kinda a recipe for disaster down the road, maybe?

Comment on lines +28 to +29
SCORING_METHOD = Enumeration.new 0 => :max_submission, 1 => :subtask_scoring
validates :scoring_method, presence: true, inclusion: { in: [0, 1] }
Contributor

Rails has enum support, which includes a few nice features for working with enums. Can we use that instead of having scoring_method be a raw integer column? https://api.rubyonrails.org/v5.0/classes/ActiveRecord/Enum.html
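For reference, the Rails 5 enum form being suggested keeps the integer column and replaces the hand-rolled Enumeration plus inclusion validation (model shown is a sketch):

```ruby
class Problem < ActiveRecord::Base
  # Maps the existing integer column to symbolic names.
  enum scoring_method: { max_submission: 0, subtask_scoring: 1 }
end

# Rails then generates helpers such as:
#   problem.subtask_scoring?   # predicate
#   problem.subtask_scoring!   # setter + save
#   Problem.subtask_scoring    # scope for querying
```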

Member Author

We probably can, will look into this

.joins("LEFT OUTER JOIN user_problem_relations ON user_problem_relations.problem_id = problems.id AND user_problem_relations.user_id = #{user_id} LEFT OUTER JOIN submissions ON submissions.id = user_problem_relations.submission_id")
.select(
"id", "name", "test_error_count", "test_warning_count", "test_status", "submissions.points", "submissions.maximum_points", "problem_set_problems.weighting"
"id", "name", "test_error_count", "test_warning_count", "test_status", "user_problem_relations.unweighted_score", "problem_set_problems.weighting"
Contributor

will this change result in a visual change for existing problem sets?

Member Author

The place where this method is used (in the view) has been modified so it should display as before. The only exception is the edge case I described earlier, although I think this is minor enough to ignore:

If a user's first submission to a problem causes a judge error, and they get zero points on all later submissions and one of these submissions does not cause a judge error, then the score gets displayed as "-" in a white box instead of "0" in a dark red outline.

@@ -0,0 +1,16 @@
class AddUnweightedScoreToUserProblemRelation < ActiveRecord::Migration
def change
add_column :user_problem_relations, :unweighted_score, :decimal
Contributor

This column should be non-nullable, but if you want to defer that for a later change to avoid the table scan & lock, that's fine.

Member Author

Currently the score in a UserProblemRelation is null if the relation has been created (i.e. the user has viewed the problem) but there haven't been any submissions yet, which matches the previous behaviour. I'd be happy to change it so that it can be zero instead, if others agree.

This null-versus-zero distinction does result in a small visible difference: if a user has made no submissions and their score is null, they see "-" when they view the problem, but if they have a zero-point submission and their score is zero, they see "0/100".

# Calculates the score for a problem based on a list of submissions
# The score is determined using the correct scoring method
# Returns [the score (0..1), the number of attempts needed to get that score, the last submission that earned points]
def score_problem_submissions(submissions)
Contributor

We could extract the different scoring methods to service objects; then the Problem class doesn't need to gain ~60 lines of fairly dense, submission-specific code, and it'll be a little more easily testable.

Member Author

Sounds like a good idea
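One possible shape for that extraction, with illustrative names (a sketch, not the PR's actual code): each scoring method becomes a small callable object, and the model just dispatches on its scoring_method.

```ruby
# Illustrative service objects; names and data shapes are assumptions.
module ScoringMethods
  class MaxSubmission
    # submissions respond to #score (total points for that submission)
    def call(submissions)
      submissions.map(&:score).max || 0
    end
  end

  class SubtaskScoring
    # submissions respond to #subtask_scores (Hash of testset id => score)
    def call(submissions)
      submissions.flat_map { |s| s.subtask_scores.to_a }
                 .group_by(&:first)
                 .sum { |_id, pairs| pairs.map(&:last).max }
    end
  end

  METHODS = {
    max_submission:  MaxSubmission.new,
    subtask_scoring: SubtaskScoring.new
  }.freeze

  def self.for(method) # method would come from problem.scoring_method
    METHODS.fetch(method)
  end
end

Sub = Struct.new(:score, :subtask_scores)
subs = [Sub.new(10, { 1 => 10, 2 => 0 }), Sub.new(10, { 1 => 0, 2 => 10 })]

ScoringMethods.for(:max_submission).call(subs)  # => 10
ScoringMethods.for(:subtask_scoring).call(subs) # => 20
```

With this shape, score_problem_submissions shrinks to a dispatch call, and each scorer can be unit-tested in isolation.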

@BelgianSalamander
Member Author

To expand on the goal of this PR and to summarise the comments so far:

Goal

This PR adds a different way of calculating a user's score on a problem. The current way just gets the max score a user achieved across all of their submissions to a problem. This works fine for simple problems but can get annoying for more complex problems.

Most modern problems have subtasks (represented by testsets in code). Simply, a subtask of a problem is the same problem but with simpler constraints than the full problem. Each subtask is worth a fraction of the total points of the problem and competitors get the points for a subtask if they pass all the testcases in it (in its testset).

The goal of subtasks is usually two-fold. The obvious one is that it allows competitors to get "partial credit". That is, if they get partway to a full solution, they can still get points for their solution (albeit not full marks). Secondly, subtasks can often be used to "guide" a competitor towards a full solution. By focusing on simpler problems, competitors can usually make observations that help them find the full solution.

But this is where the issue with the current scoring method lies. Take this problem for example. It is very much possible to create:

  • A solution that passes only subtask 1
  • A solution that passes only subtask 2

Each of these subtasks is worth ten points, but if a competitor were to submit these two solutions separately, their score for that problem would be recorded as only ten points. However, since they have solved both subtasks, they would probably deserve 20 points. In practice, contestants can make "frankenstein" solutions. In this case, this would mean writing something like this:
if (N == 3) {
  // Subtask one solution
} else if (max_h <= 5) {
  // Subtask two solution
}

However, this can be somewhat annoying, and for problems with more complicated or more disjoint subtasks it can get quite messy (for example, with this problem it is theoretically possible to solve subtasks 1, 2, 4, 5, and 6 separately). Additionally, in the NZIC, novice coders will sometimes solve two subtasks separately but not submit a merged solution, so they unfortunately miss out on points.

The solution to this, which is used in most major olympiads (see scoring section here or here), is the scoring method this pull request introduces, often called "subtask scoring". Scores are first calculated per subtask: we check the user's highest score on subtask one across all of their submissions, then subtask two, then subtask three, and so on. Their final score on the problem is then the sum of their best scores across all subtasks. With this method, competitors can submit solutions to separate subtasks as above and earn a much more representative score (so in the case above they would in fact get 20 points). This also brings us more in line with how most olympiads actually run. We have been using this scoring method for camp contests, but have had to calculate scores by hand after the contest, which is slow and not as nice for the students.

Current implementation

To not break past problems, this new scoring method is something that can be enabled per-problem. A new scoring_method column is added to the problems table. For existing problems, this will default to a value of 0 (representing max_submission_scoring in an enum) but for new problems this will default to 1 (for subtask_scoring).

After a problem gets judged, this new column is used to determine how to calculate scores for both user_problem_relation (i.e. the score a user will see when they view the problem on the front page) and contest_score (the score that will be visible on a contest scoreboard).

The scoring logic has now all been extracted to services/scoring_methods.rb.

This also required a rework of how scores were stored in user_problem_relation. While contest_score just stores the actual score, user_problem_relation only stored the id of the best-scoring submission, and the score was retrieved from this submission whenever it was needed. Obviously, this is incompatible with the new scoring method, as a user's score on a problem may be higher than the score of any of their individual submissions. So, an unweighted_score field is added to this table (unweighted meaning that it stores the score as a fraction of the maximum score, since the number of points a problem is worth depends on which problem set it is in). For existing entries, this field is automatically set to the appropriate value (user_problem_relation.submission.points / user_problem_relation.submission.maximum_points, making sure nothing is null and that we are not dividing by zero). This migration has been tested on a backup of the nztrain database.
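The backfill described above, with its null and divide-by-zero guards, might look roughly like this (a sketch assuming PostgreSQL; the class name and SQL are illustrative, not the PR's actual migration):

```ruby
class BackfillUnweightedScore < ActiveRecord::Migration
  def up
    # Guard against relations with no submission, null points,
    # and division by zero on maximum_points.
    execute <<-SQL
      UPDATE user_problem_relations
      SET unweighted_score =
        submissions.points::decimal / submissions.maximum_points
      FROM submissions
      WHERE submissions.id = user_problem_relations.submission_id
        AND submissions.points IS NOT NULL
        AND submissions.maximum_points IS NOT NULL
        AND submissions.maximum_points > 0
    SQL
  end

  def down
    execute "UPDATE user_problem_relations SET unweighted_score = NULL"
  end
end
```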

Surprisingly, this needed minimal changes for everything to display properly. When rendered in problem sets, the score is retrieved via problem#weighted_score. This is itself meant to be called on problems joined to user_problem_relations from problem_set#problems_with_scores_by_user, so this needed to be modified to include the unweighted_score field. The only other place this score is seen is in the admin view of a problem, so this required some modifications to problems_controller.rb.

@Holmes98 Holmes98 merged commit 6474deb into master Dec 17, 2025
4 checks passed
@Holmes98 Holmes98 deleted the feature/subtask-scoring branch December 17, 2025 06:55