The `gtfsmulti` package implements a very experimental approach to creating, storing and analyzing multi-modal transportation networks. This vignette will explain how the ideas behind the approach developed and how you can construct a GTFS-Multi feed, a multi-modal extension to the regular GTFS format.
library(gtfsmulti)
library(dodgr)
library(mapview)
library(sf)
library(sfnetworks)
library(tidygraph)
library(units)
The ideas behind `gtfsmulti` are largely inspired by how Conveyal approaches multi-modal routing with their R5 routing engine. This can be summarized as follows. R5 utilizes a regular grid of cells covering the entire analysis area. The centroids of these grid cells (the grid points) form the set of possible destinations, e.g. when calculating catchment areas or cumulative accessibility measures for a given origin. A multi-modal route between the origin and one of the destinations may then consist of several parts: travelling on the street network from the origin to a transit stop (the access leg), travelling between transit stops in a transit vehicle, travelling on the street network between transit stops to transfer from one transit vehicle to another (the transfer leg), and travelling on the street network from a transit stop to the selected grid point (the egress leg). It is also possible to travel directly from the origin to the destination without using transit. Such a trip is called a direct trip.
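To make the notion of a reference grid concrete, the following sketch computes the centroids of a regular grid in base R. The bounds are those of the Tampere example used later in this vignette; the 500 m cell size, the `make_grid_points()` helper and the `grid_` ID prefix are assumptions made for illustration, and are not taken from R5 or `gtfsmulti`.

```r
# Hypothetical helper: centroids of a regular grid over a bounding box.
# bounds = c(xmin, ymin, xmax, ymax) in a projected CRS (metres).
make_grid_points = function(bounds, cellsize) {
  # Centroid coordinates along each axis, half a cell in from the border.
  xs = seq(bounds[1] + cellsize / 2, bounds[3] - cellsize / 2, by = cellsize)
  ys = seq(bounds[2] + cellsize / 2, bounds[4] - cellsize / 2, by = cellsize)
  # One row per grid cell; expand.grid varies the first factor fastest,
  # so grid points are numbered row by row.
  idx = expand.grid(col_id = seq_along(xs), row_id = seq_along(ys))
  data.frame(
    stop_id = paste0("grid_", seq_len(nrow(idx))),
    row_id = idx$row_id,
    col_id = idx$col_id,
    x = xs[idx$col_id],
    y = ys[idx$row_id],
    stringsAsFactors = FALSE
  )
}

grid_points = make_grid_points(c(325200, 6819850, 330200, 6824850), cellsize = 500)
head(grid_points)
```

Every grid point produced this way is a fixed location that can act as an origin or a destination. The coordinates here stay in the projected system; storing them as latitudes and longitudes would require a separate reprojection step.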
Finding optimal multi-modal routes is a challenging task that involves both street network routing and transit network routing. The street network routing task is usually solved by graph-based algorithms in which street segments are edges and street intersections are nodes. The transit network routing task is usually solved by schedule-based algorithms that only scan the transit timetable rather than explicitly modelling the network as a graph. To reduce the complexity of the routing task at runtime, R5 pre-calculates (as far as I understand) quite a lot of street network travel times, e.g. between transit stop locations, street network vertices and grid points, when combining the street network (loaded from an OpenStreetMap extract in PBF format) and the transit network (loaded from a GTFS archive) into a single, internally used transport network representation.
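As an illustration of what "scanning the timetable" can look like, here is a minimal sketch of a connection-scan style earliest-arrival search in base R. The toy timetable, the stop names and the `earliest_arrival()` helper are invented for this example. It ignores walking transfers and minimum transfer times, and it is not meant to describe how R5 or any other routing engine is implemented.

```r
# Toy timetable: each row is one vehicle movement between two consecutive
# stops. Times are in seconds after midnight.
connections = data.frame(
  from = c("A", "A", "B", "C"),
  to = c("B", "C", "D", "D"),
  departure = c(28800, 28860, 29400, 29100),
  arrival = c(29100, 29040, 29700, 29640),
  stringsAsFactors = FALSE
)

# Hypothetical helper: earliest arrival time at every stop from one origin.
earliest_arrival = function(connections, origin, start_time) {
  stops = unique(c(connections$from, connections$to))
  best = setNames(rep(Inf, length(stops)), stops)
  best[origin] = start_time
  # Scan all connections exactly once, in order of departure time.
  connections = connections[order(connections$departure), ]
  for (i in seq_len(nrow(connections))) {
    con = connections[i, ]
    # A connection is usable if we can be at its first stop before it
    # departs; it is useful if it improves the arrival at its second stop.
    if (best[con$from] <= con$departure && con$arrival < best[con$to]) {
      best[con$to] = con$arrival
    }
  }
  best
}

arrivals = earliest_arrival(connections, origin = "A", start_time = 28800)
arrivals
```

No graph is built at any point: the search is a single pass over a sorted table, which is why adding extra rows to the timetable is enough to make new kinds of movements available to it.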
While trying to understand the internals of R5, I got the following idea. If we are already pre-calculating so many street network travel times, can we not simply include them in the transit network timetable, which can then be utilized by any GTFS-consuming routing engine? In the end, the grid points are fixed locations in space, just like transit stops. When we take the R5 approach a step further and also use the grid points as origins of trips, the access leg of a multi-modal route can be seen as a “transfer” from a grid point to a transit stop, and the egress leg as a “transfer” from a transit stop to a grid point. Using the same thought, a direct trip is a “transfer” from one grid point to another. I formalized this idea as an extension to the standard GTFS format: GTFS-Multi. It takes quite some pre-processing to create such a GTFS-Multi feed, but the benefit is that afterwards only schedule-based routing algorithms (i.e. those that scan a transit timetable) are sufficient to find optimal multi-modal routes between an origin and (multiple) destination(s).
The `gtfsmulti` package contains functions to create, read and write GTFS-Multi feeds, as well as functions for multi-modal routing with a GTFS-Multi feed. It internally relies on gtfsrouter, an R package for fast schedule-based routing with GTFS feeds, and dodgr, an R package for dual-weighted street network routing.
Please note that `gtfsmulti` is not meant to be a substitute for R5 at all. R5 is a great tool created and maintained by a group of highly experienced people who have spent years of work on its development. The `gtfsmulti` package will by no means be able to do all the things that R5 can do, let alone at the same scale and the same speed. However, when using R5 (usually through the r5r package) in a more explorative way, I regularly feel the need for more control over the different phases in the routing workflow. For example:
Of course, R5 is open-source software, and thus in theory nothing holds you back from modifying the source code (and potentially contributing to R5 itself). However, in practice this can be rather complicated, given that the source code of R5 contains hundreds of thousands of lines of code. That is the reason why I started thinking about an alternative. As said, the idea and the implementation are very experimental, and to be honest I am quite sceptical about its scalability to larger areas. However, it is primarily intended to be used for exploratory analysis on smaller scales and with full flexibility, and I think it serves that purpose quite well.
A GTFS-Multi feed extends a regular, static GTFS feed by defining four additional files in the dataset: grid.txt, access.txt, direct.txt and egress.txt. On top of that, it modifies the existing specification of the transfers.txt file.
The field definitions of the added/modified files are as follows.
File: grid.txt (Required)
The grid table stores the locations of all grid points in the reference grid. It has a similar structure to the stops.txt file in a regular GTFS feed.
| Field Name | Type | Required | Description |
|---|---|---|---|
| `stop_id` | ID | Required | Uniquely identifies a grid point. |
| `row_id` | Enum | Optional | Row number of the position of the grid point inside the grid. |
| `col_id` | Enum | Optional | Column number of the position of the grid point inside the grid. |
| `stop_name` | Text | Required | Name of the grid point. Merely required to correspond with regular stop table requirements. |
| `stop_desc` | Text | Optional | Description of the location that provides useful, quality information. Do not simply duplicate the name of the location. |
| `stop_lat` | Latitude | Required | Latitude of the location. |
| `stop_lon` | Longitude | Required | Longitude of the location. |
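As a small illustration of the grid table, the sketch below builds a two-row example as a data frame and checks it against the required fields listed above. The coordinate values, the ID scheme and the `check_grid()` helper are made up for this example and are not part of `gtfsmulti`.

```r
# Hypothetical grid table with two grid points (coordinates are made up).
grid = data.frame(
  stop_id = c("grid_1", "grid_2"),
  row_id = c(1L, 1L),
  col_id = c(1L, 2L),
  stop_name = c("Grid point 1", "Grid point 2"),
  stop_lat = c(61.4950, 61.4950),
  stop_lon = c(23.7600, 23.7694),
  stringsAsFactors = FALSE
)

# Hypothetical check of the required fields of the grid table.
check_grid = function(grid) {
  required = c("stop_id", "stop_name", "stop_lat", "stop_lon")
  missing_cols = setdiff(required, names(grid))
  if (length(missing_cols) > 0) {
    stop("Missing required fields: ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(grid$stop_id) > 0) {
    stop("stop_id values must be unique")
  }
  if (any(abs(grid$stop_lat) > 90 | abs(grid$stop_lon) > 180)) {
    stop("Coordinates must be valid latitudes and longitudes")
  }
  invisible(TRUE)
}

check_grid(grid)
```

Because the required fields mirror those of a regular stop table, a grid point can be handled by timetable-based tools in the same way as a transit stop.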
File: access.txt (Required)
The access table stores street network travel times between grid points and transit stop locations. It has a similar structure to the transfers.txt file in a regular GTFS feed. Travel times can be stored separately for different modes of transport.
| Field Name | Type | Required | Description |
|---|---|---|---|
| `from_stop_id` | ID referencing `grid.stop_id` | Required | Identifies a grid point where an access trip begins. |
| `to_stop_id` | ID referencing `stops.stop_id` | Required | Identifies a stop or station where an access trip ends. If this field refers to a station, the transfer rule applies to all child stops. |
| `transfer_type` | Enum | Required | Indicates the type of connection. Only valid option is 2. Merely required to correspond with regular transfer table requirements. |
| `transfer_time_`*mode* | Non-negative integer | Required | Amount of time, in seconds, it takes to complete the trip, using a specific transport mode. Each included transport mode will have its own column, e.g. `transfer_time_bicycle` and `transfer_time_walk`. |
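To show how the per-mode travel time columns of the access table might be used, here is a minimal sketch with a toy table. All travel times are invented, and `reachable_stops()` is a hypothetical helper for this example rather than a `gtfsmulti` function.

```r
# Toy access table: travel times in seconds from grid points to transit stops.
access = data.frame(
  from_stop_id = c("grid_1", "grid_1", "grid_2"),
  to_stop_id = c("0001", "0011", "0001"),
  transfer_type = 2L,
  transfer_time_walk = c(420L, 660L, 900L),
  transfer_time_bicycle = c(150L, 240L, 310L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops reachable from a grid point within a time budget,
# using the travel time column that belongs to the requested mode.
reachable_stops = function(access, grid_id, mode, max_seconds) {
  time_col = paste0("transfer_time_", mode)
  if (!time_col %in% names(access)) {
    stop("No travel times stored for mode: ", mode)
  }
  keep = access$from_stop_id == grid_id & access[[time_col]] <= max_seconds
  access$to_stop_id[keep]
}

reachable_stops(access, "grid_1", mode = "walk", max_seconds = 600)
reachable_stops(access, "grid_1", mode = "bicycle", max_seconds = 600)
```

Storing one column per mode keeps a single row per pair of locations, so switching the access mode is a matter of reading a different column.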
File: direct.txt (Required)
The direct table stores street network travel times between two different grid points. It has a similar structure to the transfers.txt file in a regular GTFS feed. Travel times can be stored separately for different modes of transport.
| Field Name | Type | Required | Description |
|---|---|---|---|
| `from_stop_id` | ID referencing `grid.stop_id` | Required | Identifies a grid point where a direct trip begins. |
| `to_stop_id` | ID referencing `grid.stop_id` | Required | Identifies a grid point where a direct trip ends. |
| `transfer_type` | Enum | Required | Indicates the type of connection. Only valid option is 2. Merely required to correspond with regular transfer table requirements. |
| `transfer_time_`*mode* | Non-negative integer | Required | Amount of time, in seconds, it takes to complete the trip, using a specific transport mode. Each included transport mode will have its own column, e.g. `transfer_time_bicycle` and `transfer_time_walk`. |
File: egress.txt (Required)
The egress table stores street network travel times between transit stop locations and grid points. It has a similar structure to the transfers.txt file in a regular GTFS feed. Travel times can be stored separately for different modes of transport.
| Field Name | Type | Required | Description |
|---|---|---|---|
| `from_stop_id` | ID referencing `stops.stop_id` | Required | Identifies a stop or station where an egress trip begins. If this field refers to a station, the transfer rule applies to all child stops. |
| `to_stop_id` | ID referencing `grid.stop_id` | Required | Identifies a grid point where an egress trip ends. |
| `transfer_type` | Enum | Required | Indicates the type of connection. Only valid option is 2. Merely required to correspond with regular transfer table requirements. |
| `transfer_time_`*mode* | Non-negative integer | Required | Amount of time, in seconds, it takes to complete the trip, using a specific transport mode. Each included transport mode will have its own column, e.g. `transfer_time_bicycle` and `transfer_time_walk`. |
File: transfers.txt (Required)
The transfer table stores street network travel times between two different transit stop locations. It has a similar structure to the transfers.txt file in a regular GTFS feed. Travel times can be stored separately for different modes of transport.
| Field Name | Type | Required | Description |
|---|---|---|---|
| `from_stop_id` | ID referencing `stops.stop_id` | Required | Identifies a stop or station where a transfer trip begins. If this field refers to a station, the transfer rule applies to all child stops. |
| `to_stop_id` | ID referencing `stops.stop_id` | Required | Identifies a stop or station where a transfer trip ends. If this field refers to a station, the transfer rule applies to all child stops. |
| `transfer_type` | Enum | Required | Indicates the type of connection. Only valid option is 2. Merely required to correspond with regular transfer table requirements. |
| `transfer_time_`*mode* | Non-negative integer | Required | Amount of time, in seconds, it takes to complete the trip, using a specific transport mode. Each included transport mode will have its own column, e.g. `transfer_time_bicycle` and `transfer_time_walk`. |
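Since the access, direct, egress and transfer tables all share the same layout, they can be stacked into one table with the columns of a regular GTFS transfers.txt file (`from_stop_id`, `to_stop_id`, `transfer_type`, `min_transfer_time`) once a transport mode is chosen. The sketch below shows that idea with toy data. The `leg()` and `as_regular_transfers()` helpers and all travel times are assumptions for illustration; whether `gtfsmulti` combines the tables in this way internally is not stated here.

```r
# Hypothetical constructor for a one-row table in the shared layout.
leg = function(from, to, walk) {
  data.frame(
    from_stop_id = from,
    to_stop_id = to,
    transfer_type = 2L,
    transfer_time_walk = walk,
    stringsAsFactors = FALSE
  )
}

# Toy GTFS-Multi tables (travel times in seconds are invented).
tables = list(
  access = leg("grid_1", "0001", 420L),
  direct = leg("grid_1", "grid_2", 1500L),
  egress = leg("0011", "grid_2", 300L),
  transfers = leg("0001", "0011", 120L)
)

# Hypothetical helper: stack the tables into the regular GTFS transfers
# layout, using the travel time column of a single transport mode.
as_regular_transfers = function(tables, mode) {
  time_col = paste0("transfer_time_", mode)
  parts = lapply(tables, function(tab) {
    stopifnot(time_col %in% names(tab))
    data.frame(
      from_stop_id = tab$from_stop_id,
      to_stop_id = tab$to_stop_id,
      transfer_type = tab$transfer_type,
      min_transfer_time = tab[[time_col]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, unname(parts))
}

transfers = as_regular_transfers(tables, mode = "walk")
transfers
```

In the stacked table a grid point is indistinguishable from a transit stop, which is what allows a purely schedule-based router to treat access, egress and direct trips as ordinary transfers.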
The spatial extent of the analysis is a rectangular area over which the reference grid will be created. That is, all origins and destinations of multi-modal routes must lie inside this area. As an example we will use an area covering the centre of Tampere, Finland. We can use the `create_extent()` function to create one. We will express coordinates in the common local projected coordinate reference system, which in Finland is EPSG:3067.
bounds = c(325200, 6819850, 330200, 6824850)
extent = create_extent(bounds, input_crs = 3067, output_crs = 3067)
The transit network (i.e. GTFS file) and street network (i.e. OSM PBF file) will be cropped such that they only contain data relevant for the analysis. For this cropping we use a slightly larger area than the analysis extent itself, to minimize border effects.
large_bounds = st_bbox(extent) + c(-2000, -2000, 2000, 2000)
large_extent = create_extent(large_bounds, input_crs = 3067, output_crs = 3067)
The source of the transit network should be a regular GTFS feed stored in a `.zip` file. We can use the `import_transitnet()` function to read such a transit network into R. It lets you provide a spatial extent through the `extent` argument. Doing so will only keep transit stops within that extent, and the trips that contain them.
gtfs_file = tempfile(fileext = ".zip")
download.file("https://github.com/luukvdmeer/tampere/raw/main/tampere.zip", gtfs_file)
transitnet = import_transitnet(gtfs_file, extent = large_extent)
names(transitnet)
#> [1] "calendar_dates" "agency" "stop_times" "routes"
#> [5] "fare_attributes" "transfers" "calendar" "fare_rules"
#> [9] "trips" "stops" "shapes"
stops = transitnet$stops
stops
#> stop_id stop_code stop_name stop_lat stop_lon zone_id
#> 1: 0001 0001 Keskustori H 61.49751 23.76151 A
#> 2: 0002 0002 Keskustori G 61.49756 23.76148 A
#> 3: 0005 0005 Keskustori K 61.49734 23.76154 A
#> 4: 0007 0007 Keskustori J 61.49740 23.76154 A
#> 5: 0011 0011 Keskustori E 61.49767 23.76044 A
#> ---
#> 581: 0904 0904 Hippos B 61.50166 23.80087 A
#> 582: 0901 0901 Kalevan kirkko A 61.50014 23.79254 A
#> 583: 0902 0902 Kalevan kirkko B 61.50014 23.79294 A
#> 584: 0951 0951 Sorin aukio A 61.49493 23.76965 A
#> 585: 0950 0950 Sorin aukio B 61.49475 23.76964 A
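The kind of cropping described for the `extent` argument can be pictured with the base R sketch below. It is not the implementation of `import_transitnet()`: the toy tables, the `crop_feed()` helper, the bounding box values and the choice to drop stop times at stops outside the box are all assumptions made for illustration.

```r
# Toy feed tables (all values invented).
toy_stops = data.frame(
  stop_id = c("s1", "s2", "s3"),
  stop_lon = c(23.76, 23.80, 24.50),
  stop_lat = c(61.50, 61.50, 61.90),
  stringsAsFactors = FALSE
)
toy_stop_times = data.frame(
  trip_id = c("t1", "t1", "t2", "t2", "t3"),
  stop_id = c("s1", "s2", "s2", "s3", "s3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep stops inside a bounding box, the stop times at
# those stops, and the IDs of the trips that serve at least one kept stop.
crop_feed = function(stops, stop_times, bbox) {
  inside = stops$stop_lon >= bbox[["xmin"]] & stops$stop_lon <= bbox[["xmax"]] &
    stops$stop_lat >= bbox[["ymin"]] & stops$stop_lat <= bbox[["ymax"]]
  kept_stops = stops[inside, ]
  kept_times = stop_times[stop_times$stop_id %in% kept_stops$stop_id, ]
  list(
    stops = kept_stops,
    stop_times = kept_times,
    trip_ids = unique(kept_times$trip_id)
  )
}

bbox = c(xmin = 23.64, ymin = 61.44, xmax = 23.88, ymax = 61.56)
cropped = crop_feed(toy_stops, toy_stop_times, bbox)
cropped$trip_ids
```

Here trip `t3` disappears because it only serves a stop outside the box, while `t2` is kept because one of its stops lies inside.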
The source of the street network should be an OpenStreetMap dump stored as an `.osm.pbf` file. We can use the `import_streetnet()` function to extract the streets from such a file and read them as `LINESTRING` geometries into R. It assumes that streets have a value for the OSM highway tag. Through the `highway_types` argument you can specify a subset of all possible values for this tag, such that only a subset of streets is imported. For example, when setting `highway_types = c("primary", "secondary")`, only streets tagged as primary or secondary are imported. By default, the following highway types are considered streets:
gtfsmulti::DEFAULT_HIGHWAY_TYPES
#> [1] "motorway" "motorway_link" "trunk" "trunk_link"
#> [5] "primary" "primary_link" "secondary" "secondary_link"
#> [9] "tertiary" "tertiary_link" "unclassified" "residential"
#> [13] "living_street" "service" "pedestrian" "track"
#> [17] "footway" "bridleway" "steps" "path"
#> [21] "cycleway"
Through the `tags` argument you can specify which OSM tags you want to include as attribute columns of the imported linestrings. Such attributes can be used to define custom edge weights later on. For example, if you want to create custom edge weights based on the number of lanes of a street, you’ll need to include the lanes tag as an attribute of the street linestrings when importing the street network. Also, the oneway tag should be present if you want to consider one-directional streets during graph building. By default, the following tags are included as attribute columns:
gtfsmulti::DEFAULT_TAGS
#> [1] "highway" "oneway" "name" "lanes" "maxspeed" "surface"
The `import_streetnet()` function also lets you provide a spatial extent through the `extent` argument. Doing so will only keep street linestrings that intersect with that extent.
Since we will at a later stage create very simple custom edge weights based on nothing more than the highway type of a street, we only request a limited number of tags to be included as attribute columns:
osm_file = tempfile(fileext = ".osm.pbf")
download.file("https://github.com/luukvdmeer/tampere/raw/main/tampere.osm.pbf", osm_file)
osm_tags = c("highway", "oneway")
streets = import_streetnet(osm_file, extent = large_extent, tags = osm_tags, quiet = TRUE)
streets
#> Simple feature collection with 16497 features and 3 fields
#> Geometry type: LINESTRING
#> Dimension: XY
#> Bounding box: xmin: 23.64334 ymin: 61.44738 xmax: 23.87721 ymax: 61.55741
#> Geodetic CRS: WGS 84
#> First 10 features:
#> osm_id highway oneway geometry
#> 1 4011780 primary yes LINESTRING (23.75679 61.491...
#> 2 4011781 primary <NA> LINESTRING (23.7508 61.5027...
#> 3 4011789 primary yes LINESTRING (23.74667 61.503...
#> 4 4012510 trunk yes LINESTRING (23.67901 61.511...
#> 5 4058167 primary yes LINESTRING (23.82317 61.492...
#> 6 4319507 motorway yes LINESTRING (23.77596 61.481...
#> 7 4516988 residential <NA> LINESTRING (23.75261 61.497...
#> 8 4696135 secondary yes LINESTRING (23.75911 61.454...
#> 9 4797048 primary yes LINESTRING (23.76376 61.476...
#> 10 4814383 residential <NA> LINESTRING (23.68993 61.507...
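To give an idea of what edge weights based on nothing more than the highway type could look like, here is a minimal base R sketch on a toy table. The street lengths, the walking speeds per highway type and the `time_walk` column name are assumptions for illustration only; they are not the weights used by `gtfsmulti` or dodgr.

```r
# Toy street table; lengths in metres are invented.
streets_toy = data.frame(
  osm_id = c(1, 2, 3, 4),
  highway = c("primary", "residential", "footway", "motorway"),
  length_m = c(500, 200, 150, 1000),
  stringsAsFactors = FALSE
)

# Assumed walking speeds in km/h per highway type. NA marks highway types
# that are treated as not usable on foot.
walk_speed_kmh = c(primary = 4.5, residential = 5, footway = 5, motorway = NA)

# Look up the speed of each street by its highway type.
speed = unname(walk_speed_kmh[streets_toy$highway])

# Travel time in seconds: length (m) divided by speed (m/s).
streets_toy$time_walk = streets_toy$length_m / (speed / 3.6)

# Streets without a usable speed drop out of the walking network.
walkable = streets_toy[!is.na(streets_toy$time_walk), ]
walkable
```

A lookup of this kind only needs the highway column, which is why importing the highway and oneway tags is enough for the simple weights used in this vignette.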