Working with Uber Movement Speeds data

David Schnurr
Published in Uber Movement
May 15, 2019 (5 min read)

We recently announced Uber Movement Speeds — a dataset of historical street speeds aggregated from anonymized Uber trips to help urban planners and researchers solve complicated urban mobility problems.

Movement Speeds Web Exploration Tool for New York City

In addition to providing our web-based exploration tool, we are offering access to the raw speeds data to support a variety of use cases and analyses.

We offer three primary types of data (made available as CSVs):

Hourly Time Series — This dataset provides the average and standard deviation of speeds on road segments throughout a city for each hour historically. It only includes road segments which received enough traffic in each hour to provide reliable estimates and preserve customer privacy.

Quarterly Statistics by Hour of Day — This dataset provides the average, standard deviation, 50th percentile, and 85th percentile speeds aggregated by hour of day across all days in the specified quarter. Similar to the hourly time series dataset, data is only provided for road segments which received sufficient Uber traffic during the time period — percentiles in particular are only included for highly trafficked roads and may not be available for many local roads.

Movement to OSM ID Mapping Files — While our speeds data is aligned with the OpenStreetMap road network, the files above report speeds for roads using Movement-specific IDs (the reasons for this are outside the scope of this document). These mapping files allow you to map Movement segment & junction IDs back to their respective OSM way & node IDs, and as such they play a critical role in leveraging and supporting interoperability of Movement Speeds data.

There are two primary ways you can begin working with this data:

Movement Data Toolkit — This is a purpose-built command line tool we’ve created to help automate and simplify the process of downloading, joining, and visualizing Movement speeds data. You can get the toolkit here.

Manual processing — Advanced users can manually download the desired files and write scripts or SQL queries to join the underlying data with OSM geospatial data that they’ve downloaded separately from an OSM data provider. This is the most challenging and time-consuming route.

Given the relative simplicity of the Movement Data Toolkit compared to the alternative, we strongly recommend you explore that route first. But for those who wish to dive into the raw data, we’ll help you get started in this article.

Anatomy of Speeds Data

A Movement Segment represents a stretch of road between two junctions. Segments are conceptually the same as OSM Ways, but an OSM Way can often span many blocks, whereas a Movement Segment tends to be much shorter, spanning only from one junction to an adjacent junction. In practice, this means that there is often a 1-to-many relationship between OSM Ways and Movement Segments.
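To make the 1-to-many relationship concrete, here is a minimal Python sketch that groups Movement Segment IDs by OSM Way ID. The column names (`segment_id`, `osm_way_id`) and every ID value are assumptions invented for illustration, so check the header of the mapping file you actually download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping rows; real files come from the Movement website
# and may use different column names.
MAPPING_CSV = """segment_id,osm_way_id
seg-a,1001
seg-b,1001
seg-c,1001
seg-d,2002
"""


def segments_by_way(csv_text):
    """Return {osm_way_id: [segment_id, ...]} from mapping CSV text."""
    ways = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ways[row["osm_way_id"]].append(row["segment_id"])
    return dict(ways)


way_to_segments = segments_by_way(MAPPING_CSV)
for way_id, segment_ids in way_to_segments.items():
    print(way_id, "->", segment_ids)
```

In this made-up sample, one long OSM Way is covered by three short Movement Segments, while a short way maps to a single segment.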

When reporting speed data for a road segment, we provide a triplet of Movement IDs — the Start Junction, Segment, and End Junction. These three IDs allow us to determine directionality of traffic by reporting traffic moving from one junction towards another along a given road segment. Here’s an example of what this looks like in practice:

Note that there are two rows for each segment in the given hour: these are two-way roads, so speeds are reported in both directions.
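The sketch below shows one way to read such rows in Python and group them by direction of travel. The rows, IDs, speeds, and column names (`hour`, `segment_id`, `start_junction_id`, `end_junction_id`, `speed_mph_mean`) are all assumptions invented for illustration and are not taken from a real download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for a single hour; every value here is invented.
SPEEDS_CSV = """hour,segment_id,start_junction_id,end_junction_id,speed_mph_mean
8,seg-a,j1,j2,21.5
8,seg-a,j2,j1,18.0
8,seg-b,j2,j3,30.2
"""


def directions_by_segment(csv_text):
    """Group rows by (hour, segment), keyed by (start, end) junction pair."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["hour"]), row["segment_id"])
        direction = (row["start_junction_id"], row["end_junction_id"])
        grouped[key][direction] = float(row["speed_mph_mean"])
    return dict(grouped)


speeds = directions_by_segment(SPEEDS_CSV)
for (hour, segment_id), by_direction in speeds.items():
    label = "both directions" if len(by_direction) == 2 else "one direction reported"
    print(hour, segment_id, label, by_direction)
```

A segment with only one row in an hour is not necessarily a one-way street. It may simply be that the opposite direction did not receive enough traffic in that hour to be included.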

Getting Started with the Movement Data Toolkit

As mentioned earlier, the Movement Data Toolkit is the easiest way to get started working with speeds data. It will automatically download, unzip, join, and process the speeds data into a GeoJSON file that you can drop into Kepler.gl or other similar geospatial tools.

Please read the documentation to learn more about installing the toolkit and all the functionality it offers.

Manually working with Speeds Data

If Movement Data Toolkit doesn’t suit your needs, you can manually download, process, and join the data yourself using custom scripts or SQL queries. We recommend loosely following the process outlined below:

1. Download the speeds and mapping data

From the Movement website, you should download the CSV files containing speeds data for the cities and time periods you care about. This could be either the hourly time series data or the quarterly statistics.

You should also download the files for mapping Movement Segments to OSM Ways and Movement Junctions to OSM Nodes for the cities you care about.

All these files are compressed .zip files, so you’ll need to unzip them to obtain the CSVs. In most operating systems this is as simple as double-clicking the zip file.
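If you would rather script this step, Python's standard `zipfile` module can do the extraction. The sketch below is a minimal example: the helper name `extract_csvs` and the archive contents are hypothetical, and an in-memory archive stands in for a file downloaded from the Movement website.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_csvs(zip_source, dest_dir):
    """Extract only the .csv members of an archive and return their paths."""
    dest = Path(dest_dir)
    extracted = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                archive.extract(name, dest)
                extracted.append(dest / name)
    return extracted


# Build a small in-memory archive that stands in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("speeds-example.csv", "segment_id,speed_mph_mean\nseg-a,21.5\n")
    archive.writestr("README.txt", "not a CSV")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    paths = extract_csvs(buffer, tmp)
    names = [p.name for p in paths]
    first_line = paths[0].read_text().splitlines()[0]

print(names, first_line)
```

For a real download, pass the path of the .zip file instead of the in-memory buffer.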

2. Obtain OSM road data

Uber Movement only provides files to map Movement IDs to corresponding OSM IDs in our download portal, so you’ll still need to figure out which roads in a city these OSM IDs correspond to. Depending on your use case, there are a couple of strategies here.

If you only care about a handful of road segments in a city, you can use a tool like the official OSM Viewer to find the roads you care about and pull their OSM Way and Node IDs. These IDs can then be used to filter the speeds data down to the roads of interest.
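As a rough sketch of that filtering step, the Python below looks up the Movement Segments that belong to a chosen OSM Way and keeps only their speed rows. The helper `filter_speeds_by_way`, the column names, and all IDs and speeds are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical inputs; IDs, speeds, and column names are invented.
MAPPING_CSV = """segment_id,osm_way_id
seg-a,1001
seg-b,1001
seg-c,2002
"""

SPEEDS_CSV = """hour,segment_id,start_junction_id,end_junction_id,speed_mph_mean
8,seg-a,j1,j2,21.5
8,seg-b,j2,j3,30.2
8,seg-c,j4,j5,12.7
"""


def filter_speeds_by_way(speeds_text, mapping_text, wanted_way_ids):
    """Keep only speed rows whose segment maps to one of the wanted OSM ways."""
    wanted = {str(way_id) for way_id in wanted_way_ids}
    # Step 1: find the Movement segments that belong to the wanted ways.
    segment_ids = {
        row["segment_id"]
        for row in csv.DictReader(io.StringIO(mapping_text))
        if row["osm_way_id"] in wanted
    }
    # Step 2: keep the speed rows reported for those segments.
    return [
        row
        for row in csv.DictReader(io.StringIO(speeds_text))
        if row["segment_id"] in segment_ids
    ]


rows_of_interest = filter_speeds_by_way(SPEEDS_CSV, MAPPING_CSV, [1001])
for row in rows_of_interest:
    print(row["segment_id"], row["speed_mph_mean"])
```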

If you want to perform a larger analysis or visualization across all roads in a city, you’ll need to download OSM data directly. There are a number of ways to do this, so we recommend visiting the OSM wiki to determine the strategy that best matches your use case.

One very important thing to note about OSM data is that it is constantly changing as people contribute updates to the map to improve its coverage and accuracy. This is a great thing, but it presents a challenge for us because OSM IDs can change or disappear over time as roads are split, combined, added, or removed, making it difficult to match them to our speeds data. Whenever possible, we recommend using an OSM build closest to the time period of the speed data you are downloading.

3. Process and join the data

Once you have both speeds data and OSM data, the real analysis can begin. One possible approach is to use PostGIS or another database that facilitates SQL-style joins and queries to carry out your analysis.
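To illustrate the shape of such a join without setting up PostGIS, the sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in. The table names, column names, and values are hypothetical, and the query takes a plain average of segment-level means per OSM Way, which is a simplification rather than a recommended statistic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE speeds (hour INTEGER, segment_id TEXT, speed_mph_mean REAL);
    CREATE TABLE segment_to_way (segment_id TEXT, osm_way_id INTEGER);
    """
)

# Hypothetical rows; in practice these would be loaded from the CSV files.
conn.executemany(
    "INSERT INTO speeds VALUES (?, ?, ?)",
    [(8, "seg-a", 20.0), (8, "seg-b", 30.0), (8, "seg-c", 12.0)],
)
conn.executemany(
    "INSERT INTO segment_to_way VALUES (?, ?)",
    [("seg-a", 1001), ("seg-b", 1001), ("seg-c", 2002)],
)

# Join speeds to the mapping table and average per OSM way and hour.
query = """
    SELECT m.osm_way_id, s.hour, AVG(s.speed_mph_mean) AS avg_speed_mph
    FROM speeds AS s
    JOIN segment_to_way AS m ON m.segment_id = s.segment_id
    GROUP BY m.osm_way_id, s.hour
    ORDER BY m.osm_way_id
"""
results = conn.execute(query).fetchall()
for osm_way_id, hour, avg_speed in results:
    print(osm_way_id, hour, avg_speed)
conn.close()
```

The same join carries over to PostGIS, where the OSM geometry can live in a spatial column alongside the way IDs.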

You can also use Python, JavaScript, or another programming language to read, process, and join the raw data files, both performing analysis and possibly outputting reports or geospatial files that can be loaded into other tools. This is essentially what the Movement Data Toolkit is doing, and its source code can serve as sample code for those wishing to take a similar route.
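As a hedged illustration of the scripting route, the sketch below joins speed rows to OSM Way geometry and emits a GeoJSON FeatureCollection. It is not the toolkit's implementation. All inputs (IDs, coordinates, speeds, and the helper `to_feature_collection`) are invented, and attaching the whole way's geometry to each segment is a simplification.

```python
import json

# Hypothetical inputs. Coordinates are [longitude, latitude] pairs as GeoJSON expects.
way_geometries = {1001: [[-73.99, 40.73], [-73.98, 40.74]]}
segment_to_way = {"seg-a": 1001, "seg-x": 9999}
speed_rows = [
    {"segment_id": "seg-a", "speed_mph_mean": 21.5},
    {"segment_id": "seg-x", "speed_mph_mean": 15.0},
]


def to_feature_collection(rows, seg_to_way, geometries):
    """Build a GeoJSON FeatureCollection, skipping rows with no known geometry."""
    features = []
    for row in rows:
        way_id = seg_to_way.get(row["segment_id"])
        coords = geometries.get(way_id)
        if coords is None:
            # The OSM way is absent from this extract, for example because
            # the map changed between the speeds data and the OSM build.
            continue
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "LineString", "coordinates": coords},
                "properties": {
                    "segment_id": row["segment_id"],
                    "osm_way_id": way_id,
                    "speed_mph_mean": row["speed_mph_mean"],
                },
            }
        )
    return {"type": "FeatureCollection", "features": features}


geojson = to_feature_collection(speed_rows, segment_to_way, way_geometries)
print(json.dumps(geojson, indent=2))
```

Counting the skipped rows is a quick way to check how well your OSM build lines up with the time period of the speeds data.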

Since we’re only recommending this manual data processing approach to advanced users, we’ll leave it to you to determine what will work best based on your use case and skillset.

Feedback

As we expand the rollout of Movement Speeds in the coming months, we hope to gather feedback on our speeds data and iterate on it to better match our customers’ workflows. If you have ideas on how to improve the process or would like to partner with us on a research project, don’t hesitate to reach out to us through the Movement website.
