2. Multi-modal routing with a GTFS-Multi feed

The gtfsmulti package implements a very experimental approach to creating, storing and analyzing multi-modal transportation networks. The first vignette introduced the GTFS-Multi format and showed how to construct, read and write one. In this vignette we will show how to calculate multi-modal routes with a GTFS-Multi feed.

library(gtfsmulti)
library(mapview)
library(sf)

Getting the data

The GTFS-Multi feed for the city center of Tampere, created in the first vignette, is included in the package as an object called tampere. It is of class multinet and contains all required datasets of a GTFS-Multi feed.

names(tampere)
#>  [1] "calendar_dates" "agency"         "stop_times"     "routes"        
#>  [5] "transfers"      "calendar"       "trips"          "stops"         
#>  [9] "grid"           "access"         "direct"         "egress"

class(tampere)
#> [1] "multinet" "gtfs"     "list"

In addition, we will read the reference grid that was used to build this feed. Such a grid will usually be stored as a GeoTIFF file alongside the feed itself. We have it included as external package data.

grid = import_grid(system.file("tampere.geotiff", package = "gtfsmulti"))
grid
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> tampere.geotiff     1       1      1    1       1    1  859
#> dimension(s):
#>   from to  offset delta                refsys point values x/y
#> x    1 50  325200   100 ETRS89 / TM35FIN(E,N) FALSE   NULL [x]
#> y    1 50 6824850  -100 ETRS89 / TM35FIN(E,N) FALSE   NULL [y]

Setting the origin location and start time limits

Routes will always be calculated from a single origin to all possible destinations. In practice, this means that the given origin point is first snapped to its nearest grid point, and that from that grid points shortest routes are found to all other grid points. The coordinates of the origin location are supposed to be expressed as longitude and latitude values in coordinate reference system EPSG:4326.

Lets set an origin location somewhere in downtown Tampere:

origin = st_sfc(st_point(c(23.761215, 61.496725)), crs = 4326)

Internally the function get_nearest_neighbour() is used to find the nearest grid point in the GTFS-Multi feed to such an origin location. It returns the index of this grid point as well as the distance in meters between the origin location and the grid point. But as said, this is all done internally.

coords = st_coordinates(origin)
get_nearest_neighbour(coords, tampere$grid[, c("stop_id", "stop_lon", "stop_lat")])
#> $idx
#> [1] "I25J25"
#> 
#> $dist
#> [1] 30.22641

In addition we should define at which time a trip is allowed to start. You can set this as a time interval with an upper and lower bound. The time instants should be given as seconds since midnight. In our examples we will use a time interval between 8:00 AM and 9:00 AM. That is, a trip may start at any time during this time interval. We should also define during which day of the week we want to calculate travel times, as transit timetables may be different between days (if we don’t set this automatically the current day is taken).

time = c(8 * 3600, 9 * 3600)
day = "Monday"

Finding the shortest travel time

The standard approach is to calculate the shortest possible travel time of a trip between an origin and a destination given that the trip should start somewhere in the given time interval. In gtfsmulti, the multinet_traveltimes() function is the core multi-modal routing function. It calculates door-to-door travel times that take into account all different parts of a multi-modal trip. The calculation steps are the following:

The nearest grid point to the given origin location is found. An imaginary pre-leg of the trip accounts to some extent for the distance between these two points. This distance (as-the-crow-flies) is assumed to be travelled at a fixed speed, specified in the grid_access_speed parameter. By default its value is set to a walking-like speed of 5 / 3.6 m/s, i.e. 5 km/h. When setting include_grid_access = FALSE the distance between the actual origin location and its nearest grid point is not accounted for and the trip is assumed to effectively start at the nearest grid point. Keep again in mind that no matter what approach you choose, results get less accurate with a coarser grid.
For each transit stop listed in the stops table of the GTFS-Multi feed, the shortest multi-modal travel time from the selected grid point to that stop is calculated. This is done by a purely timetable-based routing algorithm. Access trips listed in the access table of the GTFS-Multi feed that originate from the selected grid point are included in the transit timetable by modelling them as direct transit trips from the selected grid point towards reachable transit stops, operating on a frequency-based schedule. The frequency of these trips is defined by the access_frequency parameter, which by default is set to 60 seconds, i.e. one departure every minute. The length of such an access trip is the minimum travel time for a given access connection among all different modes that are present in the access table. The modes can be filtered by setting the access_modes parameter, e.g. access_modes = "walk" to include only access times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the access_limit parameter. During the transit leg of the trip it is possible to transfer from one vehicle to another using a transfer connection listed in the transfers table of the GTFS-Multi feed. The travel time of such a transfer connection is the minimum travel time among all different modes that are present in the transfer table. The modes can be filtered by setting the transfer_modes parameter, e.g. transfer_modes = "walk" to include only transfer times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the transfer_limit parameter. For efficiency reasons, the routing algorithm in this phase has an upper bound defined, meaning that if the cumulative travel time of a trip from the selected grid point exceeds this upper bound before the transit stop is reached, the search for a route to that stop is terminated and the transit stop is marked as unreachable. This upper bound can be defined through the router_timeout parameter and defaults to 60 * 60 seconds, i.e. 1 hour.
For each destination grid point (i.e. all grid points except the one selected as origin) listed in the grid table of the GTFS-Multi feed, multi-modal travel times from the selected origin grid point to that destination grid point are calculated. This is done by extracting travel times of egress trips that lead to the destination grid point from the egress table of the GTFS-Multi feed, and adding those to the travel times from the origin grid point towards transit stops as calculated in the previous step. For a destination grid point this will often result in a set of multiple travel times (since there may be multiple egress trips leading to the same grid point), of which only the shortest one is preserved. The travel time of an egress connection is the minimum travel time among all different modes that are present in the egress table. The modes can be filtered by setting the egress_modes parameter, e.g. egress_modes = "walk" to include only egress times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the egress_limit parameter.
For each destination grid point listed in the grid table of the GTFS-Multi feed, the shortest multi-modal travel time (as calculated in the step above) is compared to the travel time of a single-modal direct trip listed in the direct table of the GTFS-Multi feed. The travel time of such a direct connection is the minimum travel time among all different modes that are present in the direct table. The modes can be filtered by setting the direct_modes parameter, e.g. direct_modes = "bike" to include only direct times by bicycle. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the direct_limit parameter. The minimum between the multi-modal travel times and the single-modal travel time is the returned shortest travel time from the selected origin grid point to the destination grid point.

Since all access, egress, direct and transfer times are already pre-calculated and stored in the GTFS Multi-feed, there is no street network routing needed anymore at this stage. The multi-modal travel times are calculated with a purely timetable-based routing algorithm. For that, multinet_traveltimes() relies on gtfsrouter::gtfs_traveltimes(), a function from the great gtfsrouter package that implements a recently developed algorithm for efficient one-to-many routing in transit networks.

An example:

short = multinet_traveltimes(tampere, origin, time, day = day, egress_modes = "walk")
short
#>       stop_id row_id col_id        stop_name stop_lat stop_lon travel_time
#>    1:   I39J4     39      4  Grid cell I39J4 61.51618 23.78610   1110.9811
#>    2:   I34J5     34      5  Grid cell I34J5 61.51506 23.77681    972.9811
#>    3:   I35J5     35      5  Grid cell I35J5 61.51511 23.77868   1025.9811
#>    4:   I36J5     36      5  Grid cell I36J5 61.51515 23.78056    929.9811
#>    5:   I37J5     37      5  Grid cell I37J5 61.51520 23.78244    947.9811
#>   ---                                                                     
#> 1637:  I46J50     46     50 Grid cell I46J50 61.47526 23.80347   1476.9811
#> 1638:  I47J50     47     50 Grid cell I47J50 61.47530 23.80534   1441.9811
#> 1639:  I48J50     48     50 Grid cell I48J50 61.47535 23.80722   1380.9811
#> 1640:  I49J50     49     50 Grid cell I49J50 61.47539 23.80909   1509.9811
#> 1641:  I50J50     50     50 Grid cell I50J50 61.47544 23.81096   1592.9811

As can be seen the function returns the grid table of the GTFS-Multi feed, with an additional column containing for each grid point the shortest travel time between the given origin location and that grid point. Grid points that could not be reached get assigned a travel time of NA.

We can use create_traveltime_grid() to write these travel times as values to the pixels of the reference grid. Such travel time grids are easy to visualize, and later on it will be easy to combine them for example with data about population, jobs or service locations in order to calculate accessibility metrics.

short_grid = create_traveltime_grid(short, grid)
short_grid
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
#> travel_time  161.9811 382.2311 567.4811 596.7634 767.9811 2493.981  906
#> dimension(s):
#>   from to  offset delta                refsys point values x/y
#> x    1 50  325200   100 ETRS89 / TM35FIN(E,N) FALSE   NULL [x]
#> y    1 50 6824850  -100 ETRS89 / TM35FIN(E,N) FALSE   NULL [y]

# Remove outliers for clearer visualization.
drop_outliers = function(x) { x[[1]][x[[1]] > quantile(x[[1]], 0.95, na.rm = TRUE)] = NA; x }
short_grid = drop_outliers(short_grid)

# Get bounding box of reference grid.
bbox = st_as_sfc(st_bbox(grid))

mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
  mapview(short_grid, layer.name = "travel time (s)", na.color = NA) +
  mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex  = 3)

Travel time grids can be imported and exported from/to GeoTIFF files using respectively the import_grid() and export_grid() functions.

Doing it the R5 way

The shortest possible travel time is not always a good metric to evaluate the quality of transit accessibility, because it does not tell you anything about the frequency of the service. There might be a very good bus connection that only operates once a day. When reporting only the shortest travel time to a destination along this connection the transit accessibility will be evaluated as being of very high quality. However, most of the day you might be hardly able to reach your destination by transit within an acceptable travel time. Hence, frequency is an important evaluation measure as well.

That is the reason why Conveyal with their R5 routing engine takes a different approach to travel time calculations. They calculate the travel time of multi-modal trips from a given origin to all destinations with a fixed start time. They iteratively repeat that calculation for every minute within the window of allowed starting times. In that way they obtain a distribution of travel times, which allows to analyze frequency of service.

Using multinet_traveltimes() you can do the same by setting iterate = TRUE. This will run the travel time calculation explained in the previous section iteratively, and during each iteration fixing the start time of the trips by forwarding a time period that has the same upper as lower bound. The start times that are iterated over are defined by the access_frequency parameter. By default this is set to 60 seconds, i.e. every minute.

all = multinet_traveltimes(
  tampere,
  origin,
  time,
  day = day,
  egress_modes = "walk",
  iterate = TRUE
)

names(all)[grep("travel_time", names(all))]
#>  [1] "travel_time_28800" "travel_time_28860" "travel_time_28920"
#>  [4] "travel_time_28980" "travel_time_29040" "travel_time_29100"
#>  [7] "travel_time_29160" "travel_time_29220" "travel_time_29280"
#> [10] "travel_time_29340" "travel_time_29400" "travel_time_29460"
#> [13] "travel_time_29520" "travel_time_29580" "travel_time_29640"
#> [16] "travel_time_29700" "travel_time_29760" "travel_time_29820"
#> [19] "travel_time_29880" "travel_time_29940" "travel_time_30000"
#> [22] "travel_time_30060" "travel_time_30120" "travel_time_30180"
#> [25] "travel_time_30240" "travel_time_30300" "travel_time_30360"
#> [28] "travel_time_30420" "travel_time_30480" "travel_time_30540"
#> [31] "travel_time_30600" "travel_time_30660" "travel_time_30720"
#> [34] "travel_time_30780" "travel_time_30840" "travel_time_30900"
#> [37] "travel_time_30960" "travel_time_31020" "travel_time_31080"
#> [40] "travel_time_31140" "travel_time_31200" "travel_time_31260"
#> [43] "travel_time_31320" "travel_time_31380" "travel_time_31440"
#> [46] "travel_time_31500" "travel_time_31560" "travel_time_31620"
#> [49] "travel_time_31680" "travel_time_31740" "travel_time_31800"
#> [52] "travel_time_31860" "travel_time_31920" "travel_time_31980"
#> [55] "travel_time_32040" "travel_time_32100" "travel_time_32160"
#> [58] "travel_time_32220" "travel_time_32280" "travel_time_32340"
#> [61] "travel_time_32400"

As can be seen the function returns still returns the grid table of the GTFS-Multi feed, but now with many additional columns containing for each grid point the shortest travel time between the given origin location and that grid point fixed at a given start time. Since we used a time interval of 1 hour with an access frequency of 1 minute, we obtain 61 different travel time values for each grid point. If we want a lower level of detail to gain performance, we can change the value of the access_frequency parameter, e.g. to every 10 minutes.

few = multinet_traveltimes(
  tampere,
  origin,
  time,
  day = day,
  egress_modes = "walk",
  iterate = TRUE,
  access_frequency = 600
)

names(few)[grep("travel_time", names(few))]
#> [1] "travel_time_28800" "travel_time_29400" "travel_time_30000"
#> [4] "travel_time_30600" "travel_time_31200" "travel_time_31800"
#> [7] "travel_time_32400"

The travel time grid we create with all these calculated travel times has an additional dimension next to the two spatial dimensions. This time dimension has one coordinate per iterated start time.

all_grid = create_traveltime_grid(all, grid)
all_grid
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max.  NA's
#> travel_time  161.9811 607.9811 867.9811 884.3045 1132.981 4113.981 55270
#> dimension(s):
#>      from to  offset delta                refsys point values x/y
#> x       1 50  325200   100 ETRS89 / TM35FIN(E,N) FALSE   NULL [x]
#> y       1 50 6824850  -100 ETRS89 / TM35FIN(E,N) FALSE   NULL [y]
#> time    1 61   28800    60                    NA    NA   NULL

Following the Conveyal approach, we can reduce this full distribution of travel time values to a set of given percentile values. Different percentile values represent different views on what high quality transit accessibility means. For example, for a certain destination, the value of the travel time distribution at a percentile value close to 0 will be similar to the shortest possible travel time towards this destination during the given interval. It is appropriate for people that are flexible enough to fit their time schedule of travel to the transit time schedule, and thus don’t care too much about the frequency of the service. At the other side, the value of the travel time distribution at a percentile value close to 100 gives the time it will at least take you to reach the origin. It is appropriate for people that want to be able to leave at anytime, not even knowing the exact transit schedule. To get a balance between these two extremes, you may use any percentile value in between 0 and 100.

The reduce_to_percentiles() function lets you reduce a dense travel time grid to, for each pixel, a set of travel times at different percentile values of the full distribution. You can choose as many percentile values as you want. The result is a travel time grid in which the original time dimension is replaced by a percentile dimension, with one coordinate per selected percentile value.

NOTE You can also call reduce_to_percentiles() directly on the expanded grid table returned by multinet_traveltimes(). This will replace the original travel time columns by travel time percentile columns.

An example:

prc_grid = reduce_to_percentiles(all_grid, c(5, 95))
prc_grid
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                  Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
#> travel_time  161.9811 573.9811 828.9811 892.8773 1162.981 4053.981 1812
#> dimension(s):
#>            from to  offset delta                refsys point   values x/y
#> x             1 50  325200   100 ETRS89 / TM35FIN(E,N) FALSE     NULL [x]
#> y             1 50 6824850  -100 ETRS89 / TM35FIN(E,N) FALSE     NULL [y]
#> percentile    1  2      NA    NA                    NA    NA 5% , 95%

Visualizing different travel times at different percentile values clearly show the differences (note the different color scales in the different plots). The higher the percentile value, the higher the travel times. Note also how direct single-modal travel times become more prevalent at higher percentile values. These single-modal travel times are constant during the full time interval, and may at higher percentile values well be shorter than multi-modal travel times that are bound to transit schedules, especially if the transit services don’t operate at high frequency.

# Get 5th percentile values.
p05_grid = split(prc_grid, "percentile")[1]

# Remove outliers for clearer visualization.
p05_grid = drop_outliers(p05_grid)

mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
  mapview(p05_grid, layer.name = "travel time (s)", na.color = NA) +
  mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex  = 3)

# Get 95th percentile values.
p95_grid = split(prc_grid, "percentile")[2]

# Remove outliers for clearer visualization.
p95_grid = drop_outliers(p95_grid)

mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
  mapview(p95_grid, layer.name = "travel time (s)", na.color = NA) +
  mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex  = 3)

Results will look different if we for example restrict access trips to be done by foot, limit both access and egress trips to be maximum 5 minutes, and don’t include direct trips at all.

all = multinet_traveltimes(
  tampere,
  origin,
  time,
  day = day,
  access_modes = "walk",
  direct_modes = NA,
  egress_modes = "walk",
  access_limit = 5 * 60,
  egress_limit = 5 * 60,
  iterate = TRUE
)

all_grid = create_traveltime_grid(all, grid)
prc_grid = reduce_to_percentiles(all_grid, c(5, 95))

# Get 5th percentile values.
p05_grid = split(prc_grid, "percentile")[1]

# Remove outliers for clearer visualization.
p05_grid = drop_outliers(p05_grid)

mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
  mapview(p05_grid, layer.name = "travel time (s)", na.color = NA) +
  mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex  = 3)

# Get 95th percentile values.
p95_grid = split(prc_grid, "percentile")[2]

# Remove outliers for clearer visualization.
p95_grid = drop_outliers(p95_grid)

mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
  mapview(p95_grid, layer.name = "travel time (s)", na.color = NA) +
  mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex  = 3)

Lucas van der Meer, triply GmbH

2021-12-21

Getting the data

Setting the origin location and start time limits

Finding the shortest travel time

Doing it the R5 way