Duplicate transit segments cause join explosion #14

@yueshuaing

Description

Background
tm2py exports two sets of transit segment-level outputs:

  • boardings_by_segment_{time_period}.geojson
  • transit_segment_{time_period}.csv (more variables)

Transit segments can repeat within a line (loops or round trips). For example, line 10_573_201_AM_d0_s490 traverses the same segment (INODE=266386, JNODE=294084) twice. The second pass is indicated by a “-2” suffix, and the two passes have different boardings.
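
A minimal sketch of how such repeated passes could be spotted, assuming the export is loaded into a pandas DataFrame with LINE_ID, INODE and JNODE columns. The table below is toy data: only the line ID and the repeated node pair come from this issue, the other node IDs are invented.

```python
import pandas as pd

line = "10_573_201_AM_d0_s490"

# Toy segment table. Nodes 300000 and 300001 are invented for illustration.
segments = pd.DataFrame(
    {
        "LINE_ID": [line] * 4,
        "INODE": [266386, 294084, 300000, 266386],
        "JNODE": [294084, 300000, 300001, 294084],
    }
)

keys = ["LINE_ID", "INODE", "JNODE"]

# keep=False flags every occurrence of a repeated key, not only the later ones.
repeated = segments[segments.duplicated(subset=keys, keep=False)]
print(repeated)
```

Any row that shows up in `repeated` belongs to a segment that the (LINE_ID, INODE, JNODE) key cannot identify uniquely.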

Issue
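Because of these duplicate segments, it is not reliable to merge the two outputs on LINE_ID (together with INODE and JNODE, as in the code below): every pass over a repeated segment matches every other pass of that segment, so rows multiply. Instead, we should merge on the unique, longer segment name (line in the screenshots).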

in_file = self.scenario_dir / f"output_summaries/boardings_by_segment_{time_period}.geojson"
logging.info(f"Reading {in_file}")
gdf = gpd.read_file(in_file)
logging.debug(f"gdf:\n{gdf}")
gdf["first_row_in_line"] = gdf.groupby("LINE_ID").cumcount() == 0
df = pd.DataFrame(gdf.drop(columns=["geometry"]))
# Add AM boardings (these merge keys are not unique when a line traverses a segment twice)
a_df = self.am_segment_boardings_df
df = pd.merge(df, a_df, how="left", on=["LINE_ID", "INODE", "JNODE"])
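
The row multiplication can be reproduced on toy data. The sketch below is not the project's code: the frame names, the `pass_no` column and the boardings values are invented, and only the line ID and node pair come from this issue. It also shows that the `validate` argument of `pd.merge` could be used to make such a merge fail loudly instead of silently adding rows.

```python
import pandas as pd

keys = ["LINE_ID", "INODE", "JNODE"]
line = "10_573_201_AM_d0_s490"

# Two passes over the same segment in each table (toy values).
geo_df = pd.DataFrame(
    {"LINE_ID": [line] * 2, "INODE": [266386] * 2, "JNODE": [294084] * 2,
     "pass_no": [1, 2]}
)
csv_df = pd.DataFrame(
    {"LINE_ID": [line] * 2, "INODE": [266386] * 2, "JNODE": [294084] * 2,
     "am_boardings": [12.0, 3.0]}
)

# Each left row matches both right rows, so 2 rows become 4.
merged = pd.merge(geo_df, csv_df, how="left", on=keys)
print(len(geo_df), len(merged))  # prints: 2 4

# validate="one_to_one" checks key uniqueness on both sides and raises
# MergeError when the keys repeat.
rejected = False
try:
    pd.merge(geo_df, csv_df, how="left", on=keys, validate="one_to_one")
except pd.errors.MergeError as err:
    rejected = True
    print(f"merge rejected: {err}")
```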

Plan
We could add the unique segment name to boardings_by_segment_{time_period}.geojson, and also add the boardings variable to it, so that no extra step is needed to merge boardings from transit_segment_{time_period}.csv. This can be done in tm2py, post-processor.
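
Until the unique segment name is available in both files, one possible stopgap is to number the passes and include that number in the merge key. This is only a sketch under a strong assumption: both tables must list the segments of a line in the same traversal order. The `pass_no` column, the `add_pass_number` helper and the toy values are hypothetical and not part of tm2py.

```python
import pandas as pd

keys = ["LINE_ID", "INODE", "JNODE"]
line = "10_573_201_AM_d0_s490"


def add_pass_number(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a hypothetical pass_no column.

    pass_no is 1 for the first occurrence of a segment within a line,
    2 for the second, and so on. Assumes rows are in traversal order.
    """
    out = df.copy()
    out["pass_no"] = out.groupby(keys).cumcount() + 1
    return out


# Toy tables: the same segment is traversed twice (boardings are invented).
geo_df = pd.DataFrame(
    {"LINE_ID": [line] * 2, "INODE": [266386] * 2, "JNODE": [294084] * 2}
)
csv_df = pd.DataFrame(
    {"LINE_ID": [line] * 2, "INODE": [266386] * 2, "JNODE": [294084] * 2,
     "am_boardings": [12.0, 3.0]}
)

# With pass_no in the key the merge is one-to-one, and validate enforces it.
merged = pd.merge(
    add_pass_number(geo_df),
    add_pass_number(csv_df),
    how="left",
    on=keys + ["pass_no"],
    validate="one_to_one",
)
print(merged)
```

Merging on the exported unique segment name, as proposed above, would avoid the ordering assumption entirely.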
