TUMFTM/european-bike-sharing-dataset

“Data-Driven Insights into (E-)Bike-Sharing: Mining a Large-Scale Dataset on Usage and Urban Characteristics - Descriptive Analysis and Performance Modeling” (supplementary dataset)

Scope – This repository accompanies the article published in Transportation (Springer, DOI 10.1007/s11116-025-10661-2, 2025). It provides the full relational data export (43 million km, 2.3 GiB compressed) required to reproduce all analyses, together with 1 000-row excerpts of every file for quick exploration.

Authors

| Name | Affiliation | Email | Notes |
| --- | --- | --- | --- |
| Felix Waldner* | Technical University of Munich, School of Engineering and Design, Institute of Automotive Technology, Germany | f.waldner@tum.de | *Corresponding author, equal contribution |
| Georg Balke | Technical University of Munich, School of Engineering and Design, Institute of Automotive Technology, Germany | georg.balke@tum.de | Equal contribution |
| Felix Rech | Technical University of Munich, School of Computation, Information and Technology, Germany | | |
| Martin Lellep | The University of Edinburgh, School of Physics and Astronomy, United Kingdom | | |

Repository layout

.
├── full/                     # full‑resolution CSV exports
│   └── dataset.zip
│       ├── bikes.csv              (≈ 88 k rows)
│       ├── bike_types.csv         (121 rows)
│       ├── cities.csv             (267 rows)
│       ├── city_areas.csv         (267 rows – EWKB/EWKT/GeoJSON geometries)
│       ├── stations.csv           (≈ 13 k rows)
│       ├── station_status.csv     (≈ 38 M rows, 1.7 GiB uncompressed)
│       └── trips.csv              (≈ 25 M rows, 2.7 GiB uncompressed)
├── sample/                   # first 1 000 rows of every file (handy for testing, exploration)
│   ├── bikes.csv
│   └── …
├── etl/                      # Code used to process the raw data - just informational
│   ├── data_pipeline.py      # Data decompression, explosion, splitting, upload
│   ├── utils.py              # little helper functions for PostgreSQL upload
│   ├── config.template.py    # config file for database credentials and paths
│   └── schema_definition.sql # Data inserts, upserts, trip extraction handled in-database (PostgreSQL)
└── README.md                 # you are here

Data model & file schemas

All timestamps are Unix epoch seconds in UTC. To convert them to local time, join to cities (via trips.city_id, or via stations.city_id for station snapshots) and apply cities.timezone. All coordinates are WGS-84 (EPSG:4326).
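As a minimal sketch of the local-time conversion described above, the following uses only the Python standard library. The two in-memory records are hypothetical stand-ins for rows of cities.csv and trips.csv, and the helper name local_start is illustrative, not part of this repository.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical rows standing in for cities.csv and trips.csv (not real data).
cities = {1: {"name": "Example scheme", "timezone": "Europe/Paris"}}
trips = [{"bike_id": 42, "city_id": 1, "time_start": 1700000000}]


def local_start(trip, cities):
    """Return the rental start as a timezone-aware local datetime."""
    tz = ZoneInfo(cities[trip["city_id"]]["timezone"])
    # time_start is Unix epoch seconds in UTC, so build an aware UTC datetime first.
    utc = datetime.fromtimestamp(trip["time_start"], tz=timezone.utc)
    return utc.astimezone(tz)


for trip in trips:
    print(local_start(trip, cities).isoformat())
```

Converting per row through the scheme's IANA timezone keeps daylight-saving transitions correct, which a fixed UTC offset would not.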

trips.csv (25 210 627 rows)

| column | unit | description |
| --- | --- | --- |
| bike_id | | Bike identifier; bikes.id |
| city_id | | Rental city; cities.id |
| time_start | s | Rental start (UTC) |
| lon_start, lat_start, lon_end, lat_end | ° | Start & end coordinates WGS-84 |
| station_id_start, station_id_end | | Station IDs, NULL if free-floating |
| battery_start, battery_end | % | State of charge (e-bikes only) |
| duration | s | Trip duration |
| distance | m | Great-circle distance (PostGIS ST_Distance) |
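The distance column can be sanity-checked from the start and end coordinates. The sketch below computes a haversine great-circle distance and a mean trip speed. The Earth radius constant and both function names are assumptions made for illustration, and results may deviate slightly from the distance column depending on how ST_Distance was applied in the original pipeline.

```python
import math

# Mean Earth radius in metres; an assumption for this sketch.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS-84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_speed_kmh(distance_m, duration_s):
    """Mean speed along the straight line; None for non-positive durations."""
    if duration_s <= 0:
        return None
    return distance_m / duration_s * 3.6


# Hypothetical trip: 1 500 m straight-line distance in 600 s.
print(mean_speed_kmh(1500, 600))
```

Because the distance is measured along the straight line between start and end, the derived speed is a lower bound on the speed actually ridden.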
cities.csv (267 rows)

| column | unit | description |
| --- | --- | --- |
| id | | Primary key |
| name | | Name of the bike-sharing scheme |
| lat, lon | ° | Approximate city center |
| timezone | | IANA timezone string (e.g. Europe/Paris) |
| country | | ISO-3166 alpha-3 country code |
| return_to_official_only | bool | true if bikes must be left at a (virtual) station |
city_areas.csv (267 rows)

| column | description |
| --- | --- |
| city_id | cities.id |
| geom_ewkb | Operational area estimated via DBSCAN in EWKB |
| geom_ewkt | … in EWKT |
| geom_geojson | … in GeoJSON |
bike_types.csv (121 rows)

| column | unit | description |
| --- | --- | --- |
| id | | Technical bike type |
| vehicle_image | | URL of representative image |
| name | | Commercial name |
| description | | Free-text description |
| form_factor | | regular / cargo |
| rider_capacity | | Typical seats (1, 2, …) |
| propulsion_type | | human / electric |
| max_range | m | Nominal electric range |
| battery_capacity | Wh | Battery energy |
bikes.csv (88 444 rows)

| column | unit | description |
| --- | --- | --- |
| id | | Bike identifier |
| bike_type_id | | Technical type; bike_types.id |
| computer_id | | On-board computer identifier |
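Trips carry only a bike_id, so telling e-bike trips from conventional ones means following bikes.bike_type_id to bike_types.propulsion_type. The following sketch walks that chain with the standard csv module. The inline CSV excerpts are hypothetical and contain only the columns needed here, and the helper names are illustrative.

```python
import csv
import io

# Hypothetical excerpts; the real files contain more columns and rows.
BIKE_TYPES_CSV = "id,name,propulsion_type\n7,Example classic,human\n9,Example e-bike,electric\n"
BIKES_CSV = "id,bike_type_id,computer_id\n100,7,c-1\n101,9,c-2\n"
TRIPS_CSV = "bike_id,city_id,duration\n100,1,600\n101,1,900\n101,1,300\n"


def read_rows(text):
    """Parse CSV text into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def propulsion_by_bike(bikes, bike_types):
    """Map bikes.id -> bike_types.propulsion_type via bikes.bike_type_id."""
    type_to_propulsion = {t["id"]: t["propulsion_type"] for t in bike_types}
    return {b["id"]: type_to_propulsion.get(b["bike_type_id"]) for b in bikes}


def trips_per_propulsion(trips, lookup):
    """Count trips per propulsion type; unmatched bikes fall under 'unknown'."""
    counts = {}
    for trip in trips:
        key = lookup.get(trip["bike_id"]) or "unknown"
        counts[key] = counts.get(key, 0) + 1
    return counts


lookup = propulsion_by_bike(read_rows(BIKES_CSV), read_rows(BIKE_TYPES_CSV))
print(trips_per_propulsion(read_rows(TRIPS_CSV), lookup))
```

Building the lookup once keeps the pass over trips linear, which matters for the roughly 25 million rows of the full file.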
stations.csv (13 192 rows)

| column | unit | description |
| --- | --- | --- |
| id | | Station identifier |
| city_id | | cities.id |
| name | | Human-readable label |
| app_number | | Number shown to users |
| terminal_type | | Hardware generation (12 values) |
| place_type | | Unknown (23 observed values) |
| bike_racks | | Regular parking positions |
| special_racks | | Charging racks |
| lon, lat | ° | Location |
station_status.csv (38 279 885 rows)

| column | unit | description |
| --- | --- | --- |
| station_id | | stations.id |
| time | s | Snapshot timestamp (UTC) |
| bikes | | Total bikes currently docked |
| booked_bikes | | Bikes reserved by users |
| bikes_available_to_rent | | Bikes ready for rental |
| free_racks | | Empty regular docks |
| free_special_racks | | Empty charging docks |
| maintenance | bool | true = offline |
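One way to use the station snapshots is to estimate how often a station had no rentable bike. The sketch below is an illustration under stated assumptions: the rows are hypothetical, maintenance is already parsed to a Python bool, and every snapshot is weighted equally, which only approximates a share of time if snapshots are taken at roughly regular intervals.

```python
from collections import defaultdict

# Hypothetical snapshots with only the columns used here (not real data).
snapshots = [
    {"station_id": 1, "time": 1700000000, "bikes_available_to_rent": 3, "maintenance": False},
    {"station_id": 1, "time": 1700000600, "bikes_available_to_rent": 0, "maintenance": False},
    {"station_id": 2, "time": 1700000000, "bikes_available_to_rent": 0, "maintenance": True},
    {"station_id": 2, "time": 1700000600, "bikes_available_to_rent": 5, "maintenance": False},
]


def empty_share(rows):
    """Share of online snapshots per station with no bike available to rent."""
    total = defaultdict(int)
    empty = defaultdict(int)
    for row in rows:
        if row["maintenance"]:
            continue  # offline stations say nothing about bike availability
        total[row["station_id"]] += 1
        if row["bikes_available_to_rent"] == 0:
            empty[row["station_id"]] += 1
    return {station: empty[station] / count for station, count in total.items()}


print(empty_share(snapshots))
```

Stations that were offline in every snapshot do not appear in the result, since no share can be computed for them.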

Quick start

# 1. Clone with Git LFS for large files (optional but recommended)
$ git lfs install
$ git clone https://github.com/TUMFTM/european-bike-sharing-dataset.git

# 2. Decompress the dataset
$ cd european-bike-sharing-dataset/full
$ unzip dataset.zip

# 3. Spin up PostGIS (example: imports trips with the start coordinates as WGS-84 point geometry)
$ createdb bikesharing
$ psql bikesharing -c 'CREATE EXTENSION postgis;'
$ ogr2ogr -f PostgreSQL PG:"dbname=bikesharing" trips.csv -nln trips -a_srs EPSG:4326 -oo X_POSSIBLE_NAMES=lon_start -oo Y_POSSIBLE_NAMES=lat_start

For exploratory work you can start with the 1 000‑row sample files:

import pandas as pd
import geopandas as gpd

trips = pd.read_csv("sample/trips.csv")
# ...
# city_areas.csv is a plain CSV, so the geometry has to be decoded manually.
# This assumes geom_ewkb holds hex-encoded EWKB; all coordinates are WGS-84.
city_areas = pd.read_csv("sample/city_areas.csv")
city_areas = gpd.GeoDataFrame(
    city_areas,
    geometry=gpd.GeoSeries.from_wkb(city_areas["geom_ewkb"], crs="EPSG:4326"),
)
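The full trips.csv is about 2.7 GiB uncompressed, so loading it in one go may exceed available memory. A minimal sketch of a streaming aggregation with the standard csv module follows. The function name is illustrative, the demo input is hypothetical, and treating an empty field as a missing distance is an assumption about how NULLs were exported.

```python
import csv
import io
from collections import Counter


def distance_km_by_city(lines):
    """Sum trip distances (km) per city_id while reading one row at a time."""
    totals = Counter()
    for row in csv.DictReader(lines):
        if row["distance"]:  # assumption: NULL distances are exported as empty fields
            totals[row["city_id"]] += float(row["distance"]) / 1000.0
    return dict(totals)


# Hypothetical excerpt with only the two columns used here.
demo = io.StringIO("city_id,distance\n1,1500\n1,500\n2,\n2,2500\n")
print(distance_km_by_city(demo))

# With the real file, pass an open file handle instead:
# with open("full/trips.csv", newline="") as fh:
#     totals = distance_km_by_city(fh)
```

With pandas, passing chunksize to pd.read_csv gives a comparable chunk-wise iteration if DataFrame operations are preferred.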

License

The dataset is distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. When using or adapting the data, please cite the paper, which links back to this repository.

@article{waldnerbalke2025,
  title={Data-Driven Insights into (E-)Bike-Sharing: Mining a Large-Scale Dataset on Usage and Urban Characteristics - Descriptive Analysis and Performance Modeling},
  author={Waldner, Felix and Balke, Georg and Rech, Felix and Lellep, Martin},
  year={2025},
  journal={Transportation},
  doi={10.1007/s11116-025-10661-2}
}

About

Data from 267 bike sharing schemes across Europe - 43 million km & 88000 bikes
