Fix timetable stop sorting algorithm #72

Closed
opened 2026-04-08 00:16:46 +00:00 by maxtkc · 1 comment
Owner

It used to work correctly, but now I can end up with a row that has no times, which shouldn't be possible... Are we not still using the correct algorithm? Investigate this.

  Investigation Findings

  SCS Algorithm: Theoretically Sound, But...

  I traced through the full algorithm and I can prove that the current SCS + greedy alignment cannot produce an all-null row by design. Every position in the supersequence must have at least one aligned trip (the element at position p was
   placed there by some sequence, and that sequence's greedy alignment will cover p).

  The deduplication in shortestCommonSupersequence is safe because computeAlignments runs on all original sequences, and duplicate sequences are identical to their unique counterpart so they align identically.

  The Real Weakness: rebuildStopTimesFromTable Corrupts Stop Order

  This is the most dangerous part of the system. Every time any time value is edited, rebuildStopTimesFromTable is called which:
  1. Reads all inputs for that trip from the DOM
  2. Sorts stop_times by time value to assign new stop_sequence numbers
  3. Replaces all stop_times for the trip atomically

  So if you edit Stop B's time from 09:00 to 06:00, making it earlier than Stop A (08:00):

  - Before edit: DB has A=seq1, B=seq2 (route order)
  - After rebuildStopTimesFromTable: DB has B=seq1, A=seq2 (time order — WRONG)

  This means the SCS sees trips with conflicting stop orders even when they all serve the same stops in the same route order. The SCS then produces a supersequence with stops appearing twice, like [B, A, B], which shows a confusing
  duplicate row — but not an all-null row.

  Possible Scenario for Original Null Row

  I believe the null-row bug may have existed in an earlier version of the alignment code (the commits fix SCS and other changes, fix trip alignment again suggest repeated fixes). The current code may not reproduce it. The time-sort
  corruption is a separate, still-active problem.

  ---
  Questions Before Planning

  1. The SCS algorithm's computeExactSCS only advances sequences one-at-a-time when they disagree. For 3+ sequences, this is suboptimal — it can produce unnecessarily long supersequences (causing duplicate rows). Example: with trips
  [A→B→C], [A→C→B], [B→C→A] you'd get a 6-stop supersequence instead of something shorter. Do you want the fix to also improve the SCS algorithm for correctness with 3+ sequences, or just address the stop-sorting bug?

  2. The rebuildStopTimesFromTable sorts by time to determine stop_sequence. This was presumably intentional — it keeps the sequence in "chronological order of visits." But for routes with branching or out-of-order times, this corrupts
  the topology. Should stop_sequence be time-ordered or should it be preserved as explicitly set (with time edits leaving sequences unchanged)?

  3. When you had the all-null row — do you remember approximately when relative to the commits I can see? Was it before or after the fix SCS and other changes commit (e813591)? And was it with a single trip or multiple trips?

  4. Can you try to reproduce it by: creating two trips with stops in opposite orders (Trip A: Stop1→Stop2→Stop3, Trip B: Stop3→Stop2→Stop1), then editing times aggressively? I want to see if that produces duplicate rows vs null rows in
  the current code.

Can't figure out how to reproduce the error...

It used to work correctly, but now I can end up with a row that has no times, which shouldn't be possible... Are we not still using the correct algorithm? Investigate this. ``` Investigation Findings SCS Algorithm: Theoretically Sound, But... I traced through the full algorithm and I can prove that the current SCS + greedy alignment cannot produce an all-null row by design. Every position in the supersequence must have at least one aligned trip (the element at position p was placed there by some sequence, and that sequence's greedy alignment will cover p). The deduplication in shortestCommonSupersequence is safe because computeAlignments runs on all original sequences, and duplicate sequences are identical to their unique counterpart so they align identically. The Real Weakness: rebuildStopTimesFromTable Corrupts Stop Order This is the most dangerous part of the system. Every time any time value is edited, rebuildStopTimesFromTable is called which: 1. Reads all inputs for that trip from the DOM 2. Sorts stop_times by time value to assign new stop_sequence numbers 3. Replaces all stop_times for the trip atomically So if you edit Stop B's time from 09:00 to 06:00, making it earlier than Stop A (08:00): - Before edit: DB has A=seq1, B=seq2 (route order) - After rebuildStopTimesFromTable: DB has B=seq1, A=seq2 (time order — WRONG) This means the SCS sees trips with conflicting stop orders even when they all serve the same stops in the same route order. The SCS then produces a supersequence with stops appearing twice, like [B, A, B], which shows a confusing duplicate row — but not an all-null row. Possible Scenario for Original Null Row I believe the null-row bug may have existed in an earlier version of the alignment code (the commits fix SCS and other changes, fix trip alignment again suggest repeated fixes). The current code may not reproduce it. The time-sort corruption is a separate, still-active problem. --- Questions Before Planning 1. The SCS algorithm's computeExactSCS only advances sequences one-at-a-time when they disagree. For 3+ sequences, this is suboptimal — it can produce unnecessarily long supersequences (causing duplicate rows). Example: with trips [A→B→C], [A→C→B], [B→C→A] you'd get a 6-stop supersequence instead of something shorter. Do you want the fix to also improve the SCS algorithm for correctness with 3+ sequences, or just address the stop-sorting bug? 2. The rebuildStopTimesFromTable sorts by time to determine stop_sequence. This was presumably intentional — it keeps the sequence in "chronological order of visits." But for routes with branching or out-of-order times, this corrupts the topology. Should stop_sequence be time-ordered or should it be preserved as explicitly set (with time edits leaving sequences unchanged)? 3. When you had the all-null row — do you remember approximately when relative to the commits I can see? Was it before or after the fix SCS and other changes commit (e813591)? And was it with a single trip or multiple trips? 4. Can you try to reproduce it by: creating two trips with stops in opposite orders (Trip A: Stop1→Stop2→Stop3, Trip B: Stop3→Stop2→Stop1), then editing times aggressively? I want to see if that produces duplicate rows vs null rows in the current code. ``` Can't figure out how to reproduce the error...
maxtkc self-assigned this 2026-04-08 00:16:46 +00:00
Author
Owner

Closing for now

Closing for now
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
gtfs.zone/coloring-book#72
No description provided.