Implement subtask-based scoring for problems #347
Conversation
Nice work. I worry it will be confusing for NZIC participants if the practice problems are scored differently from the problems in-contest. I also think the decision about how subtasks are scored should be made at the problem level instead of the contest level, since this may influence how the test data and subtasks are structured during problem design. What if we expose a 'Scoring Method' (or similar) parameter on each problem statement (alongside the memory/time limits) with something like 'Best overall submission' or 'Best submission for each subtask' as options? Regarding the issues mentioned:
An advantage is that this approach generalises if another scoring method is ever introduced.

Hmm, I do mostly agree. Unless anyone says anything else, I'll change this to what you've said.

It looks like this might require a minor rework of how scores are stored in
…d problem sets use new scoring
This caused and uncovered some issues in the tests which I have now fixed. Notably:
I have also added a test for subtask scoring in contests.

I have just discovered a small bug in how I modified the display of users' scores on a problem when an admin views it. If a user has only made submissions that cause a judge error, then rendering the problem page errors. I will fix this tonight.

I think we'd want to introduce some new tests for the new scoring method. I'll look over the rest of the code tonight and provide more detailed feedback :)
I have added some tests that make sure subtask scores are calculated and updated correctly for both contest scores and user problem relations. Also, because I've slightly modified the admin view of problems to display subtask scores, there are some minute display changes. I have chosen not to do much about them since they only appear when you view a problem you can edit (so they wouldn't actually be accessible to all users) and they are very minor. They specifically affect the list of people who have solved a problem. These are:

If anyone does think it is worth fixing these, I am happy to try, and realistically it might not be that difficult.

A bunch of misc lines here have trailing semicolons, which is very un-Ruby-like and kinda a recipe for disaster down the road, maybe?
app/models/problem.rb
Outdated
SCORING_METHOD = Enumeration.new 0 => :max_submission, 1 => :subtask_scoring
validates :scoring_method, presence: true, inclusion: { in: [0, 1] }
Rails has enum support which includes a few nice features for working with the enums; can we use that instead of having the scoring method be a raw-integer column? https://api.rubyonrails.org/v5.0/classes/ActiveRecord/Enum.html
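For reference, here is a plain-Ruby sketch (no Rails dependency, names simplified) of roughly what the `enum` macro would give the model: named values over the integer column plus generated predicate methods, instead of comparing raw integers against the `Enumeration` constant.

```ruby
# Plain-Ruby approximation of what Rails' `enum` macro provides.
# In the actual app this would just be
#   enum scoring_method: { max_submission: 0, subtask_scoring: 1 }
# inside the Problem model; this standalone sketch is illustrative only.
SCORING_METHODS = { max_submission: 0, subtask_scoring: 1 }.freeze

class Problem
  attr_reader :scoring_method

  def initialize(scoring_method)
    # Rails' enum also rejects values outside the mapping.
    raise ArgumentError, "unknown scoring method" unless SCORING_METHODS.value?(scoring_method)
    @scoring_method = scoring_method
  end

  # Rails' enum would generate these predicate methods automatically.
  SCORING_METHODS.each do |name, value|
    define_method("#{name}?") { @scoring_method == value }
  end
end

Problem.new(1).subtask_scoring?  # => true
Problem.new(0).max_submission?   # => true
```

The main win over a bare integer column is that call sites read as `problem.subtask_scoring?` rather than `problem.scoring_method == 1`.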
We probably can, will look into this
.joins("LEFT OUTER JOIN user_problem_relations ON user_problem_relations.problem_id = problems.id AND user_problem_relations.user_id = #{user_id} LEFT OUTER JOIN submissions ON submissions.id = user_problem_relations.submission_id")
.select(
-   "id", "name", "test_error_count", "test_warning_count", "test_status", "submissions.points", "submissions.maximum_points", "problem_set_problems.weighting"
+   "id", "name", "test_error_count", "test_warning_count", "test_status", "user_problem_relations.unweighted_score", "problem_set_problems.weighting"
will this change result in a visual change for existing problem sets?
The place where this method is used (in the view) has been modified, so it should display like before. The only exception is the edge case I described before, although I think this is minor enough to ignore:
If a user's first submission to a problem causes a judge error, and they get zero points on all later submissions and one of these submissions does not cause a judge error, then the score gets displayed as "-" in a white box instead of "0" in a dark red outline.
@@ -0,0 +1,16 @@
class AddUnweightedScoreToUserProblemRelation < ActiveRecord::Migration
  def change
    add_column :user_problem_relations, :unweighted_score, :decimal
This column should be non-nullable, but if you want to defer that to a later change to avoid the table scan & lock, that's fine.
Currently the scores in a UserProblemRelation are null if the relation has been created (i.e. the user has viewed the problem) but there haven't been any submissions yet, which matches the previous behaviour. I'd be happy to change it so that it can be zero if others agree.

This null-versus-zero distinction does result in a small difference: if a user has made no submissions and their score is null, they see "-" when they view the problem, but if they have a zero-point submission and their score is zero, they will see "0/100".
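To illustrate the distinction, here is a hypothetical view-helper sketch (not the app's actual code) that renders nil and zero differently:

```ruby
# Hypothetical helper illustrating the null-vs-zero distinction:
# nil means "no submissions yet" and renders as "-", while 0.0 is a
# real (zero-point) score and renders as "0/100".
def format_score(unweighted_score, max_points = 100)
  return "-" if unweighted_score.nil?
  "#{(unweighted_score * max_points).round}/#{max_points}"
end

format_score(nil)   # => "-"
format_score(0.0)   # => "0/100"
format_score(0.25)  # => "25/100"
```

If the column were made non-nullable with a default of zero, the "-" branch would disappear and a fresh relation would display as "0/100" instead.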
app/models/problem.rb
Outdated
# Calculates the score for a problem based on a list of submissions.
# The score is determined using the correct scoring method.
# Returns [the score (0..1), the number of attempts needed to get that score, the last submission that earned points].
def score_problem_submissions(submissions)
We could extract the different scoring methods to service objects; then this Problem class doesn't need to gain ~60 lines of fairly dense, submission-specific code, and it'll be a little more easily testable.
Sounds like a good idea
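A rough sketch of that extraction (all names hypothetical, and submissions simplified to arrays of per-subtask points rather than the app's actual records): each scoring method becomes a small object that Problem can dispatch to.

```ruby
# Hypothetical service-object sketch, not the PR's actual classes.
# Submissions are simplified to arrays of per-subtask points.
class MaxSubmissionScorer
  # "Best overall submission": the highest total across all submissions.
  def score(submissions)
    submissions.map(&:sum).max || 0
  end
end

class SubtaskScorer
  # "Best submission for each subtask": best per subtask, then summed.
  def score(submissions)
    return 0 if submissions.empty?
    (0...submissions.first.size).sum do |i|
      submissions.map { |scores| scores[i] }.max
    end
  end
end

# Problem would then only need to pick the right scorer for its
# scoring_method value, instead of holding the logic itself.
SCORERS = { max_submission: MaxSubmissionScorer, subtask_scoring: SubtaskScorer }.freeze

def score_for(method, submissions)
  SCORERS.fetch(method).new.score(submissions)
end

subs = [[10, 0], [0, 10]]
score_for(:max_submission, subs)   # => 10
score_for(:subtask_scoring, subs)  # => 20
```

Each scorer can then be unit-tested in isolation with plain data, without constructing a Problem at all.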
To add to the goal of this PR and to summarise comments together:

### Goal

This PR adds a different way of calculating a user's score on a problem. The current way just takes the max score a user achieved across all of their submissions to a problem. This works fine for simple problems but can get annoying for more complex ones. Most modern problems have subtasks (represented by …).

The goal of subtasks is usually two-fold. The obvious one is that they allow competitors to get "partial credit": if they get partway to a full solution, they can still get points for it (albeit not full marks). Secondly, subtasks can often be used to "guide" a competitor towards a full solution. By focusing on simpler subproblems, competitors can usually make observations that help them find the full solution.

But this is where the issue with the current scoring method lies. Take this problem for example. It is very much possible to create:
```cpp
if (N == 3) {
    // Subtask one solution
} else if (max_h <= 5) {
    // Subtask two solution
}
```

However, this can be somewhat annoying, and for problems with more complicated or more disjoint subtasks it can get quite messy (for example, with this problem it is theoretically possible to solve subtasks 1, 2, 4, 5, and 6 separately). Additionally, in the NZIC, novice coders will sometimes solve two subtasks separately but will not submit a merged solution, so they unfortunately miss out on points.

The solution to this, which is used in most major olympiads (see the scoring sections here or here), is the scoring method this pull request introduces, often called "subtask scoring". Simply, scores are first calculated per subtask: we check what the user's highest score on subtask one is across all of their submissions, then subtask two, then subtask three, etcetera. Their final score on the problem is then just the sum of their best scores across all subtasks. With this method, competitors can submit solutions to separate subtasks as above and earn a much more representative score (so in the above case they would in fact get 20 points). This also brings us more in line with how most olympiads actually run. We have been using this new scoring method for camp contests, but have had to calculate scores by hand after the contest, which is slow and not as nice for the students.

### Current implementation

To not break past problems, this new scoring method can be enabled per problem. A new … After a problem gets judged, this new column is used to determine how to calculate scores for both … The scoring logic has now all been extracted to … This also required a rework of how scores were stored in …

Surprisingly, this needed minimal changes for everything to display properly. When rendered in problem sets, the score is gotten from …
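To make the "best per subtask, then summed" calculation concrete, here is a minimal plain-Ruby sketch (the names and data shapes are illustrative, not the PR's actual code), where each submission is represented as an array of per-subtask points:

```ruby
# Illustrative sketch of subtask scoring, not the PR's actual code.
# Each submission is an array of per-subtask points, in subtask order.
def subtask_score(submissions)
  return 0 if submissions.empty?
  num_subtasks = submissions.first.size
  # Best score on each subtask across every submission...
  best_per_subtask = (0...num_subtasks).map do |i|
    submissions.map { |scores| scores[i] }.max
  end
  # ...summed to give the final problem score.
  best_per_subtask.sum
end

# Two submissions that each solve a different 10-point subtask
# earn both subtasks under subtask scoring:
subtask_score([[10, 0, 0], [0, 10, 0]])  # => 20
```

Under the current max-submission method, the same two submissions would score only 10, since neither single submission earns more than that.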
This PR adds a `use_subtask_scoring` field (false by default) to contests. When this field is set to true, it changes how contest scores are calculated for each problem. Specifically: instead of a user's score on a problem being the score of their best submission, the best score is calculated for each subtask, and then these scores are added up. This allows students to solve different subtasks in separate submissions, which is especially nice for problems with subtasks that are not supersets of previous subtasks.
This is a common feature on other judges (like CMS, Kattis) and in most olympiads (including IOI - see scoring section here and EGOI). We also often emulate this feature at camps by painstakingly calculating these per-subtask scores by hand.
One slightly hacky thing I had to do here was create the `fast_judge_data` method on submissions. This does almost the same thing as `judge_data` but returns less complete information about per-testcase results (which is not necessary for calculating these subtask scores). The advantage is that this method is much faster, because `judge_data` performs one database query per testset.

This is reasonably fast overall. Some rough local testing on NZIC 2025 Round 1 (which has ~3000 submissions) gave me the following numbers:

- Recalculating with `fast_judge_data` took about 30 seconds.
- Recalculating with `judge_data` took about 150 seconds (hence why I added `fast_judge_data`).

So, subtask scoring does not make score calculations much slower. (Recalculating all contest scores for a contest only happens when a contest is un-finalized, or when `use_subtask_scoring` is changed.)
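The per-testset query pattern that `fast_judge_data` avoids can be illustrated with a small plain-Ruby sketch (the data and field names here are hypothetical): fetching all testcase rows once and grouping them in memory replaces one lookup per testset.

```ruby
# Hypothetical illustration of avoiding a per-testset lookup:
# fetch every testcase row once, then group in memory.
rows = [
  { testset: 1, testcase: "1a", passed: true  },
  { testset: 1, testcase: "1b", passed: false },
  { testset: 2, testcase: "2a", passed: true  },
]

# One pass over all rows replaces a separate query per testset.
by_testset = rows.group_by { |row| row[:testset] }

# A testset counts as passed only if every one of its testcases passed,
# which is all the subtask-score calculation needs to know.
testset_passed = by_testset.transform_values do |cases|
  cases.all? { |c| c[:passed] }
end
# testset_passed => {1=>false, 2=>true}
```

The trade-off is the one described above: the grouped form keeps less per-testcase detail, which is fine for scoring but not for the full judge-results view.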
One slight issue is that currently this is only for contests, and so could be a little confusing, since it means students may see different scores on problems depending on whether they view them in a contest or in a problem set. Potentially subtask scoring could also be implemented for `UserProblemRelation`, however this is tricky since:

I am looking for feedback on this PR.