Reconsider fare_leg_rules.rule_priority matching semantics/spec wording #575

@jsteelz

Describe the problem

In #418, the rule_priority field was added to the GTFS specification, along with a new set of matching semantics for calculating the price of a given fare leg. As part of this change, the correct interpretation of the rule_priority field depends on whether or not the field is included in the file's header. If rule_priority is included in the header, an empty value means rule_priority=0 for that row, and the row applies to all networks/areas/etc. when those fields are left empty. If rule_priority is not included in the header, its value is essentially empty instead of 0, and the matching rules are completely different. After a year or so of ingesting fares-v2 files since this change (which, it should be mentioned, we supported!), we've noticed that this header-dependent condition is dangerous at best and, given the rest of the spec, seems unnecessary.
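To make the header dependence concrete, here is a minimal Python sketch of how a consumer might read an empty network_id under the behaviour described above. The function name, the returned descriptions, and the two sample files are made up for illustration, and the columns are reduced to a bare minimum; this is not a full implementation of the spec's matching rules.

```python
import csv
import io


def empty_network_meaning(csv_text):
    """Describe how network_id is read for each row of a fare_leg_rules.txt-like file.

    Hypothetical helper: it only models the header-dependent rule discussed in this issue.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # The interpretation hinges on the header, not on the row itself.
    has_priority_column = "rule_priority" in (reader.fieldnames or [])
    meanings = []
    for row in reader:
        if row.get("network_id"):
            meanings.append("matches only network " + row["network_id"])
        elif has_priority_column:
            # Column present: empty rule_priority is read as 0 and the
            # empty network_id does not affect matching.
            meanings.append("network_id does not affect matching")
        else:
            # Column absent: empty network_id means "every network not
            # listed elsewhere in this file".
            meanings.append("matches networks not listed elsewhere in the file")
    return meanings


# Two sample files whose data rows carry the same values (illustrative only).
without_column = "network_id,fare_product_id\nbus,bus_fare\n,other_fare\n"
with_column = "network_id,fare_product_id,rule_priority\nbus,bus_fare,\n,other_fare,\n"

print(empty_network_meaning(without_column))
print(empty_network_meaning(with_column))
```

The second row of each file has the same values, yet it is described differently purely because of the header.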

For data consumers (such as ourselves at Transit), this becomes an annoyance when reading in such files. It should be noted that a guiding principle of the GTFS spec is that the spec should be as "database-friendly" as possible; for us, having the meaning of a row change depending on whether or not the original file included a field in its header is a headache, since the most logical and obvious way of handling missing fields/values in any CSV-based format is to treat their values as undefined. (If you're designing a database that handles more than one producer's flavour of GTFS, you necessarily coalesce fields that one producer doesn't include to empty, so that you correctly import when the other producer includes them.)
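As a rough illustration of the coalescing problem, the following Python sketch imports two files into one fixed set of columns. The SCHEMA tuple, the import_rows helper, and the sample data are assumptions made for this example, not anything defined by GTFS.

```python
import csv
import io

# Hypothetical unified schema shared by every producer's feed.
SCHEMA = ("network_id", "fare_product_id", "rule_priority")


def import_rows(csv_text):
    """Load rows into the unified schema, coalescing missing or empty fields to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [tuple(row.get(column) or None for column in SCHEMA) for row in reader]


# One producer omits the rule_priority column, the other includes it but leaves it empty.
producer_a = "network_id,fare_product_id\nbus,bus_fare\n,other_fare\n"
producer_b = "network_id,fare_product_id,rule_priority\nbus,bus_fare,\n,other_fare,\n"

rows_a = import_rows(producer_a)
rows_b = import_rows(producer_b)

# After import the two feeds are indistinguishable, so the header-dependent
# meaning can only be recovered by storing extra per-file metadata.
print(rows_a == rows_b)
```

Once the rows are stored this way, nothing in the table records whether the column was present in the original header.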

Additionally, we notice that producers (and producers can correct me if I'm misinterpreting!) will often include all or many of the GTFS specification's headers in a given file, even if they don't use some or many of them. This is normal practice as far as I can tell, but it runs the risk of breaking their fares-v2 data when "correctly" interpreted by consumers such as ourselves.

With the original fare_leg_rules.txt, GTFS gained a new type of semantics that was already relatively complicated - the idea that an empty field could refer to all of the valid values not mentioned in that field, elsewhere in the file. This is a new snag in the specification that I can see leading to misunderstanding, and I argue that it is unnecessary.
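For readers less familiar with this "empty means everything not mentioned" behaviour, here is a small Python sketch of it for network_id, assuming the full set of network IDs is already known from elsewhere in the feed. The function name and the sample values are hypothetical.

```python
def networks_matched_by_empty(all_network_ids, fare_leg_rules_rows):
    """Return the networks an empty network_id stands for under the original semantics.

    all_network_ids: every network ID known in the feed (assumed to be given).
    fare_leg_rules_rows: dicts with a "network_id" key; empty means unspecified.
    """
    # Networks that are named explicitly somewhere in the file.
    listed = {row["network_id"] for row in fare_leg_rules_rows if row.get("network_id")}
    # An empty value refers to everything else.
    return set(all_network_ids) - listed


rows = [
    {"network_id": "bus", "fare_product_id": "bus_fare"},
    {"network_id": "", "fare_product_id": "other_fare"},
]
print(networks_matched_by_empty({"bus", "rail", "ferry"}, rows))
```

The meaning of the empty row already depends on the other rows of the file; the header rule adds a second, file-level dependency on top of that.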

Use cases

Ensure that the spec is as clear as possible to avoid mistakes during the creation and interpretation of fares-v2 data.

Proposed solution

The simplest way around this is to remove the "defaulting" behaviour to 0 on rule_priority. With that change, the matching rules for a given row in fare_leg_rules.txt could be correctly derived from reading that row alone, without needing to infer context from the original file's headers: if the rule_priority field has a number, then empty network/area fields do not affect matching; if the field is empty, then empty network/area fields refer to all unmentioned network/area IDs in the file. The corollary benefit of such a change is that the meaning of the field could be correctly understood from reading the field's definition in the spec, rather than having to reconcile it with the preamble at the top of fare_leg_rules.txt, as well as the further appendix that defines what "existing" means for a given field.
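A minimal Python sketch of the proposed row-local rule follows. The function name and the returned descriptions are made up for illustration, and only network_id is modelled; the point is that the decision uses nothing but the row itself.

```python
def empty_network_meaning_proposed(row):
    """Describe how network_id would be read for one row under the proposal.

    Hypothetical helper: `row` is a dict of field values, with missing or
    empty fields treated the same way.
    """
    if row.get("network_id"):
        return "matches only network " + row["network_id"]
    if row.get("rule_priority"):
        # Any explicit number, including "0", switches to priority-based matching.
        return "network_id does not affect matching"
    # No rule_priority value: keep the original "all unmentioned networks" meaning.
    return "matches networks not listed elsewhere in the file"


# The same answer whether the column was absent or present but empty.
print(empty_network_meaning_proposed({"network_id": "", "fare_product_id": "other_fare"}))
print(empty_network_meaning_proposed({"network_id": "", "fare_product_id": "other_fare", "rule_priority": ""}))
print(empty_network_meaning_proposed({"network_id": "", "fare_product_id": "other_fare", "rule_priority": "0"}))
```

Under this rule a producer who wants priority-based matching with priority zero would write 0 explicitly rather than relying on a default.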

If it is considered that such a change would be too breaking and thus unsafe, then I propose that the spec should be further clarified, both in the preamble to fare_leg_rules.txt, and in the rule_priority field's definition. In the field's definition,

An empty value for rule_priority is treated as zero.

should be changed to something like

If the rule_priority field is defined in the header, an empty value for rule_priority is treated as zero.

Additionally, the oft-used verbiage of "If the rule_priority field exists in the file" is unclear (even if defined in an auxiliary location in the spec, which I assume goes unread since it's not something that sounds like it needs to be further defined). I can easily see this being interpreted as if there are no defined rule_priority values in the file. I understand that this wording was used to make an already-long spec somewhat more readable, but the trade-off on clarity here is lopsided. Where this wording is used, it should be replaced with wording that states explicitly that it is about whether or not the file contains rule_priority in the header.

Lastly, I strongly advocate that the GTFS specification going forward should refrain from using semantics based on the presence of a field in a file's header, for the reasons stated above: notably, that it is confusing for producers and consumers alike, and that it can be easily corrected by a more appropriate data type for the concerned field(s).

Additional information

No response

Labels

Change type: Functional (refers to modifications that significantly affect specification functionalities)
GTFS Schedule (issues and pull requests that focus on GTFS Schedule)
GTFS-Fares (issues and pull requests that focus on the GTFS-Fares extension)
