02_route.Rmd
The gtfsmulti
package implements a very experimental approach to creating, storing and analyzing multi-modal transportation networks. The first vignette introduced the GTFS-Multi format and showed how to construct, read and write one. In this vignette we will show how to calculate multi-modal routes with a GTFS-Multi feed.
The GTFS-Multi feed for the city center of Tampere, created in the first vignette, is included in the package as an object called tampere
. It is of class multinet
and contains all required datasets of a GTFS-Multi feed.
names(tampere)
#> [1] "calendar_dates" "agency" "stop_times" "routes"
#> [5] "transfers" "calendar" "trips" "stops"
#> [9] "grid" "access" "direct" "egress"
class(tampere)
#> [1] "multinet" "gtfs" "list"
In addition, we will read the reference grid that was used to build this feed. Such a grid will usually be stored as a GeoTIFF file alongside the feed itself. We have it included as external package data.
grid = import_grid(system.file("tampere.geotiff", package = "gtfsmulti"))
grid
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> tampere.geotiff 1 1 1 1 1 1 859
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 50 325200 100 ETRS89 / TM35FIN(E,N) FALSE NULL [x]
#> y 1 50 6824850 -100 ETRS89 / TM35FIN(E,N) FALSE NULL [y]
Routes will always be calculated from a single origin to all possible destinations. In practice, this means that the given origin point is first snapped to its nearest grid point, and that from that grid points shortest routes are found to all other grid points. The coordinates of the origin location are supposed to be expressed as longitude and latitude values in coordinate reference system EPSG:4326.
Lets set an origin location somewhere in downtown Tampere:
Internally the function get_nearest_neighbour()
is used to find the nearest grid point in the GTFS-Multi feed to such an origin location. It returns the index of this grid point as well as the distance in meters between the origin location and the grid point. But as said, this is all done internally.
coords = st_coordinates(origin)
get_nearest_neighbour(coords, tampere$grid[, c("stop_id", "stop_lon", "stop_lat")])
#> $idx
#> [1] "I25J25"
#>
#> $dist
#> [1] 30.22641
In addition we should define at which time a trip is allowed to start. You can set this as a time interval with an upper and lower bound. The time instants should be given as seconds since midnight. In our examples we will use a time interval between 8:00 AM and 9:00 AM. That is, a trip may start at any time during this time interval. We should also define during which day of the week we want to calculate travel times, as transit timetables may be different between days (if we don’t set this automatically the current day is taken).
time = c(8 * 3600, 9 * 3600)
day = "Monday"
The standard approach is to calculate the shortest possible travel time of a trip between an origin and a destination given that the trip should start somewhere in the given time interval. In gtfsmulti
, the multinet_traveltimes()
function is the core multi-modal routing function. It calculates door-to-door travel times that take into account all different parts of a multi-modal trip. The calculation steps are the following:
grid_access_speed
parameter. By default its value is set to a walking-like speed of 5 / 3.6 m/s, i.e. 5 km/h. When setting include_grid_access = FALSE
the distance between the actual origin location and its nearest grid point is not accounted for and the trip is assumed to effectively start at the nearest grid point. Keep again in mind that no matter what approach you choose, results get less accurate with a coarser grid.access_frequency
parameter, which by default is set to 60 seconds, i.e. one departure every minute. The length of such an access trip is the minimum travel time for a given access connection among all different modes that are present in the access table. The modes can be filtered by setting the access_modes
parameter, e.g. access_modes = "walk"
to include only access times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the access_limit
parameter. During the transit leg of the trip it is possible to transfer from one vehicle to another using a transfer connection listed in the transfers table of the GTFS-Multi feed. The travel time of such a transfer connection is the minimum travel time among all different modes that are present in the transfer table. The modes can be filtered by setting the transfer_modes
parameter, e.g. transfer_modes = "walk"
to include only transfer times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the transfer_limit
parameter. For efficiency reasons, the routing algorithm in this phase has an upper bound defined, meaning that if the cumulative travel time of a trip from the selected grid point exceeds this upper bound before the transit stop is reached, the search for a route to that stop is terminated and the transit stop is marked as unreachable. This upper bound can be defined through the router_timeout
parameter and defaults to 60 * 60 seconds, i.e. 1 hour.egress_modes
parameter, e.g. egress_modes = "walk"
to include only egress times by foot. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the egress_limit
parameter.direct_modes
parameter, e.g. direct_modes = "bike"
to include only direct times by bicycle. Possible trips itself may be filtered as well, by setting a maximum allowed travel time through the direct_limit
parameter. The minimum between the multi-modal travel times and the single-modal travel time is the returned shortest travel time from the selected origin grid point to the destination grid point.Since all access, egress, direct and transfer times are already pre-calculated and stored in the GTFS Multi-feed, there is no street network routing needed anymore at this stage. The multi-modal travel times are calculated with a purely timetable-based routing algorithm. For that, multinet_traveltimes()
relies on gtfsrouter::gtfs_traveltimes()
, a function from the great gtfsrouter package that implements a recently developed algorithm for efficient one-to-many routing in transit networks.
An example:
short = multinet_traveltimes(tampere, origin, time, day = day, egress_modes = "walk")
short
#> stop_id row_id col_id stop_name stop_lat stop_lon travel_time
#> 1: I39J4 39 4 Grid cell I39J4 61.51618 23.78610 1110.9811
#> 2: I34J5 34 5 Grid cell I34J5 61.51506 23.77681 972.9811
#> 3: I35J5 35 5 Grid cell I35J5 61.51511 23.77868 1025.9811
#> 4: I36J5 36 5 Grid cell I36J5 61.51515 23.78056 929.9811
#> 5: I37J5 37 5 Grid cell I37J5 61.51520 23.78244 947.9811
#> ---
#> 1637: I46J50 46 50 Grid cell I46J50 61.47526 23.80347 1476.9811
#> 1638: I47J50 47 50 Grid cell I47J50 61.47530 23.80534 1441.9811
#> 1639: I48J50 48 50 Grid cell I48J50 61.47535 23.80722 1380.9811
#> 1640: I49J50 49 50 Grid cell I49J50 61.47539 23.80909 1509.9811
#> 1641: I50J50 50 50 Grid cell I50J50 61.47544 23.81096 1592.9811
As can be seen the function returns the grid table of the GTFS-Multi feed, with an additional column containing for each grid point the shortest travel time between the given origin location and that grid point. Grid points that could not be reached get assigned a travel time of NA
.
We can use create_traveltime_grid()
to write these travel times as values to the pixels of the reference grid. Such travel time grids are easy to visualize, and later on it will be easy to combine them for example with data about population, jobs or service locations in order to calculate accessibility metrics.
short_grid = create_traveltime_grid(short, grid)
short_grid
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> travel_time 161.9811 382.2311 567.4811 596.7634 767.9811 2493.981 906
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 50 325200 100 ETRS89 / TM35FIN(E,N) FALSE NULL [x]
#> y 1 50 6824850 -100 ETRS89 / TM35FIN(E,N) FALSE NULL [y]
# Remove outliers for clearer visualization.
drop_outliers = function(x) { x[[1]][x[[1]] > quantile(x[[1]], 0.95, na.rm = TRUE)] = NA; x }
short_grid = drop_outliers(short_grid)
# Get bounding box of reference grid.
bbox = st_as_sfc(st_bbox(grid))
mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
mapview(short_grid, layer.name = "travel time (s)", na.color = NA) +
mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex = 3)
Travel time grids can be imported and exported from/to GeoTIFF files using respectively the import_grid()
and export_grid()
functions.
The shortest possible travel time is not always a good metric to evaluate the quality of transit accessibility, because it does not tell you anything about the frequency of the service. There might be a very good bus connection that only operates once a day. When reporting only the shortest travel time to a destination along this connection the transit accessibility will be evaluated as being of very high quality. However, most of the day you might be hardly able to reach your destination by transit within an acceptable travel time. Hence, frequency is an important evaluation measure as well.
That is the reason why Conveyal with their R5 routing engine takes a different approach to travel time calculations. They calculate the travel time of multi-modal trips from a given origin to all destinations with a fixed start time. They iteratively repeat that calculation for every minute within the window of allowed starting times. In that way they obtain a distribution of travel times, which allows to analyze frequency of service.
Using multinet_traveltimes()
you can do the same by setting iterate = TRUE
. This will run the travel time calculation explained in the previous section iteratively, and during each iteration fixing the start time of the trips by forwarding a time period that has the same upper as lower bound. The start times that are iterated over are defined by the access_frequency
parameter. By default this is set to 60 seconds, i.e. every minute.
all = multinet_traveltimes(
tampere,
origin,
time,
day = day,
egress_modes = "walk",
iterate = TRUE
)
names(all)[grep("travel_time", names(all))]
#> [1] "travel_time_28800" "travel_time_28860" "travel_time_28920"
#> [4] "travel_time_28980" "travel_time_29040" "travel_time_29100"
#> [7] "travel_time_29160" "travel_time_29220" "travel_time_29280"
#> [10] "travel_time_29340" "travel_time_29400" "travel_time_29460"
#> [13] "travel_time_29520" "travel_time_29580" "travel_time_29640"
#> [16] "travel_time_29700" "travel_time_29760" "travel_time_29820"
#> [19] "travel_time_29880" "travel_time_29940" "travel_time_30000"
#> [22] "travel_time_30060" "travel_time_30120" "travel_time_30180"
#> [25] "travel_time_30240" "travel_time_30300" "travel_time_30360"
#> [28] "travel_time_30420" "travel_time_30480" "travel_time_30540"
#> [31] "travel_time_30600" "travel_time_30660" "travel_time_30720"
#> [34] "travel_time_30780" "travel_time_30840" "travel_time_30900"
#> [37] "travel_time_30960" "travel_time_31020" "travel_time_31080"
#> [40] "travel_time_31140" "travel_time_31200" "travel_time_31260"
#> [43] "travel_time_31320" "travel_time_31380" "travel_time_31440"
#> [46] "travel_time_31500" "travel_time_31560" "travel_time_31620"
#> [49] "travel_time_31680" "travel_time_31740" "travel_time_31800"
#> [52] "travel_time_31860" "travel_time_31920" "travel_time_31980"
#> [55] "travel_time_32040" "travel_time_32100" "travel_time_32160"
#> [58] "travel_time_32220" "travel_time_32280" "travel_time_32340"
#> [61] "travel_time_32400"
As can be seen the function returns still returns the grid table of the GTFS-Multi feed, but now with many additional columns containing for each grid point the shortest travel time between the given origin location and that grid point fixed at a given start time. Since we used a time interval of 1 hour with an access frequency of 1 minute, we obtain 61 different travel time values for each grid point. If we want a lower level of detail to gain performance, we can change the value of the access_frequency
parameter, e.g. to every 10 minutes.
few = multinet_traveltimes(
tampere,
origin,
time,
day = day,
egress_modes = "walk",
iterate = TRUE,
access_frequency = 600
)
names(few)[grep("travel_time", names(few))]
#> [1] "travel_time_28800" "travel_time_29400" "travel_time_30000"
#> [4] "travel_time_30600" "travel_time_31200" "travel_time_31800"
#> [7] "travel_time_32400"
The travel time grid we create with all these calculated travel times has an additional dimension next to the two spatial dimensions. This time dimension has one coordinate per iterated start time.
all_grid = create_traveltime_grid(all, grid)
all_grid
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> travel_time 161.9811 607.9811 867.9811 884.3045 1132.981 4113.981 55270
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 50 325200 100 ETRS89 / TM35FIN(E,N) FALSE NULL [x]
#> y 1 50 6824850 -100 ETRS89 / TM35FIN(E,N) FALSE NULL [y]
#> time 1 61 28800 60 NA NA NULL
Following the Conveyal approach, we can reduce this full distribution of travel time values to a set of given percentile values. Different percentile values represent different views on what high quality transit accessibility means. For example, for a certain destination, the value of the travel time distribution at a percentile value close to 0 will be similar to the shortest possible travel time towards this destination during the given interval. It is appropriate for people that are flexible enough to fit their time schedule of travel to the transit time schedule, and thus don’t care too much about the frequency of the service. At the other side, the value of the travel time distribution at a percentile value close to 100 gives the time it will at least take you to reach the origin. It is appropriate for people that want to be able to leave at anytime, not even knowing the exact transit schedule. To get a balance between these two extremes, you may use any percentile value in between 0 and 100.
The reduce_to_percentiles()
function lets you reduce a dense travel time grid to, for each pixel, a set of travel times at different percentile values of the full distribution. You can choose as many percentile values as you want. The result is a travel time grid in which the original time dimension is replaced by a percentile dimension, with one coordinate per selected percentile value.
NOTE You can also call
reduce_to_percentiles()
directly on the expanded grid table returned bymultinet_traveltimes()
. This will replace the original travel time columns by travel time percentile columns.
An example:
prc_grid = reduce_to_percentiles(all_grid, c(5, 95))
prc_grid
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> travel_time 161.9811 573.9811 828.9811 892.8773 1162.981 4053.981 1812
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 50 325200 100 ETRS89 / TM35FIN(E,N) FALSE NULL [x]
#> y 1 50 6824850 -100 ETRS89 / TM35FIN(E,N) FALSE NULL [y]
#> percentile 1 2 NA NA NA NA 5% , 95%
Visualizing different travel times at different percentile values clearly show the differences (note the different color scales in the different plots). The higher the percentile value, the higher the travel times. Note also how direct single-modal travel times become more prevalent at higher percentile values. These single-modal travel times are constant during the full time interval, and may at higher percentile values well be shorter than multi-modal travel times that are bound to transit schedules, especially if the transit services don’t operate at high frequency.
# Get 5th percentile values.
p05_grid = split(prc_grid, "percentile")[1]
# Remove outliers for clearer visualization.
p05_grid = drop_outliers(p05_grid)
mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
mapview(p05_grid, layer.name = "travel time (s)", na.color = NA) +
mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex = 3)
# Get 95th percentile values.
p95_grid = split(prc_grid, "percentile")[2]
# Remove outliers for clearer visualization.
p95_grid = drop_outliers(p95_grid)
mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
mapview(p95_grid, layer.name = "travel time (s)", na.color = NA) +
mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex = 3)
Results will look different if we for example restrict access trips to be done by foot, limit both access and egress trips to be maximum 5 minutes, and don’t include direct trips at all.
all = multinet_traveltimes(
tampere,
origin,
time,
day = day,
access_modes = "walk",
direct_modes = NA,
egress_modes = "walk",
access_limit = 5 * 60,
egress_limit = 5 * 60,
iterate = TRUE
)
all_grid = create_traveltime_grid(all, grid)
prc_grid = reduce_to_percentiles(all_grid, c(5, 95))
# Get 5th percentile values.
p05_grid = split(prc_grid, "percentile")[1]
# Remove outliers for clearer visualization.
p05_grid = drop_outliers(p05_grid)
mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
mapview(p05_grid, layer.name = "travel time (s)", na.color = NA) +
mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex = 3)
# Get 95th percentile values.
p95_grid = split(prc_grid, "percentile")[2]
# Remove outliers for clearer visualization.
p95_grid = drop_outliers(p95_grid)
mapview(bbox, alpha.regions = 0, alpha = 1, lwd = 3) +
mapview(p95_grid, layer.name = "travel time (s)", na.color = NA) +
mapview(origin, col.regions = "black", alpha.regions = 1, alpha = 0, cex = 3)