“Data-Driven Insights into (E-)Bike-Sharing: Mining a Large-Scale Dataset on Usage and Urban Characteristics - Descriptive Analysis and Performance Modeling” (supplementary dataset)
Scope – This repository accompanies the article in Springer Transportation (DOI 10.1007/s11116-025-10661-2, 2025). It provides the full relational data export (43 million km, 2.3 GiB compressed) required to reproduce all analyses, together with 1 000‑row excerpts of every file for quick exploration.
Authors
Name
Affiliation
Email
Notes
Felix Waldner*
Technical University of Munich, School of Engineering and Design, Institute of Automotive Technology, Germany
Technical University of Munich, School of Computation, Information and Technology, Germany
Martin Lellep
The University of Edinburgh, School of Physics and Astronomy, United Kingdom
Repository layout
.
├── full/ # full‑resolution CSV exports
│ └── dataset.zip
│ └── bikes.csv (≈ 88 k rows )
│ └── bike_types.csv (≈ 121 rows)
│ └── cities.csv (267 rows)
│ └── city_areas.csv (267 rows – EWKT/EWKB/GeoJSON geometries)
│ └── stations.csv (≈ 13 k rows)
│ └── station_status.csv (≈ 38 M rows, 1.7 GiB uncompressed)
│ └── trips.csv (≈ 25 M rows, 2.7 GiB uncompressed)
├── sample/ # first 1 000 rows of every file (handy for testing, exploration)
│ ├── bikes.csv
│ ├── …
├── etl/ # Code used to process the raw data - just informational
│ ├── data_pipeline.py # Data decompression, explosion, splitting, upload
│ ├── utils.py # little helper functions for PostgreSQL upload
│ ├── config.template.py # config file for database credentials and paths
│ └── schema_definition.sql # Data inserts, upserts, trip extraction handled in-database (PostgreSQL)
└── README.md # you are here
Data model & file schemas
All timestamps are Unix epoch seconds in UTC; join via cities.timezone to convert to local times. All coordinates are WGS‑84 (EPSG:4326).
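As a minimal sketch of the conversion described above, the snippet below turns an epoch timestamp into local time using only the Python standard library. The function name `to_local` and the example values are hypothetical; in practice the timezone string would come from `cities.timezone` after joining on `city_id`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(epoch_seconds: int, tz_name: str) -> datetime:
    """Convert Unix epoch seconds (UTC) into an aware datetime in tz_name."""
    utc_dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))


# Hypothetical values: a trips.time_start and the matching cities.timezone.
time_start = 1_700_000_000
city_timezone = "Europe/Paris"

local_start = to_local(time_start, city_timezone)
print(local_start.isoformat())
```

Using IANA timezone names rather than fixed offsets means daylight saving transitions are handled by the timezone database.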
trips.csv (25 210 627 rows)
| column | unit | description |
| --- | --- | --- |
| bike_id | – | Bike identifier; bikes.id |
| city_id | – | Rental city; cities.id |
| time_start | s | Rental start (UTC) |
| lon_start, lat_start, lon_end, lat_end | ° | Start & end coordinates WGS-84 |
| station_id_start, station_id_end | – | Station IDs, NULL if free‑floating |
| battery_start, battery_end | % | State of charge (e‑bikes only) |
| duration | s | Trip duration |
| distance | m | Great‑circle distance (PostGIS ST_Distance) |
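To sanity-check the distance column or derive a mean trip speed, a great-circle distance can be recomputed from the start and end coordinates. The sketch below uses the haversine formula on a sphere; the radius constant and both function names are assumptions of this example, and results may differ slightly from the stored values depending on how ST_Distance was invoked in the original pipeline.

```python
import math

# Mean Earth radius in metres; an assumption of this sketch.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS-84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_speed_kmh(distance_m, duration_s):
    """Mean speed in km/h, or None for non-positive durations."""
    if duration_s <= 0:
        return None
    return distance_m / duration_s * 3.6
```

Because the distance is measured in a straight line between start and end, the derived speed is a lower bound on the speed actually ridden.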
cities.csv (267 rows)
| column | unit | description |
| --- | --- | --- |
| id | – | Primary key |
| name | – | Name of the bike‑sharing scheme |
| lat, lon | ° | Approximate city center |
| timezone | – | IANA timezone string (e.g. Europe/Paris) |
| country | – | ISO‑3166 alpha‑3 country code |
| return_to_official_only | bool | true if bikes must be left at a (virtual) station |
city_areas.csv (267 rows)
| column | description |
| --- | --- |
| city_id | cities.id |
| geom_ewkb | Operational area estimated via DBSCAN in EWKB |
| geom_ewkt | … in EWKT |
| geom_geojson | … in GeoJSON |
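EWKT is the PostGIS extension of WKT that prefixes the geometry with its SRID (for example `SRID=4326;POLYGON(...)`), and some plain WKT parsers do not accept that prefix. A minimal sketch for separating the two parts follows; the helper name `split_ewkt` is hypothetical, and the example geometry is made up rather than taken from the dataset.

```python
def split_ewkt(ewkt: str):
    """Return (srid, wkt) from an EWKT string; srid is None without a prefix."""
    text = ewkt.strip()
    if text.upper().startswith("SRID="):
        header, _, wkt = text.partition(";")
        return int(header[len("SRID="):]), wkt.strip()
    return None, text


# Made-up example geometry, not a row from city_areas.csv.
srid, wkt = split_ewkt("SRID=4326;POLYGON((0 0,1 0,1 1,0 0))")
print(srid, wkt)
```

The returned WKT string can then be handed to a WKT parser such as `shapely.wkt.loads`.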
bike_types.csv (121 rows)
| column | unit | description |
| --- | --- | --- |
| id | – | Technical bike type |
| vehicle_image | – | URL of representative image |
| name | – | Commercial name |
| description | – | Free‑text description |
| form_factor | – | regular / cargo |
| rider_capacity | – | Typical seats (1, 2, …) |
| propulsion_type | – | human / electric |
| max_range | m | Nominal electric range |
| battery_capacity | Wh | Battery energy |
bikes.csv (88 444 rows)
| column | unit | description |
| --- | --- | --- |
| id | – | Bike identifier |
| bike_type_id | – | Technical type; bike_types.id |
| computer_id | – | On‑board computer identifier |
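The foreign keys above (trips.bike_id → bikes.id, bikes.bike_type_id → bike_types.id) allow trips to be split by propulsion type. The following is a minimal pandas sketch of that two-step join; the tiny in-memory frames are made-up stand-ins for the CSV files, and only the column names are taken from the schemas in this README.

```python
import pandas as pd

# Made-up stand-ins for trips.csv, bikes.csv and bike_types.csv.
trips = pd.DataFrame({"bike_id": [1, 1, 2], "duration": [300, 600, 900]})
bikes = pd.DataFrame({"id": [1, 2], "bike_type_id": [7, 8]})
bike_types = pd.DataFrame({"id": [7, 8],
                           "propulsion_type": ["human", "electric"]})

# Rename the primary keys so both merges can use a shared column name.
bike_lookup = bikes.rename(columns={"id": "bike_id"}).merge(
    bike_types.rename(columns={"id": "bike_type_id"}),
    on="bike_type_id",
    how="left",
)

# Attach the propulsion type to every trip, then aggregate.
trips_typed = trips.merge(
    bike_lookup[["bike_id", "propulsion_type"]], on="bike_id", how="left"
)
mean_duration = trips_typed.groupby("propulsion_type")["duration"].mean()
print(mean_duration)
```

Left joins keep trips whose bike has no matching type; those rows end up with a missing propulsion_type and drop out of the groupby.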
stations.csv (13 192 rows)
| column | unit | description |
| --- | --- | --- |
| id | – | Station identifier |
| city_id | – | cities.id |
| name | – | Human‑readable label |
| app_number | – | Number shown to users |
| terminal_type | – | Hardware generation (12 values) |
| place_type | – | Unknown (23 observed values) |
| bike_racks | – | Regular parking positions |
| special_racks | – | Charging racks |
| lon, lat | ° | Location |
station_status.csv (38 279 885 rows)
| column | unit | description |
| --- | --- | --- |
| station_id | – | stations.id |
| time | s | Snapshot timestamp (UTC) |
| bikes | – | Total bikes currently docked |
| booked_bikes | – | Bikes reserved by users |
| bikes_available_to_rent | – | Bikes ready for rental |
| free_racks | – | Empty regular docks |
| free_special_racks | – | Empty charging docks |
| maintenance | bool | true = offline |
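One possible use of these snapshots is to estimate how often a station has no rentable bike. The pandas sketch below computes, per station, the share of non-maintenance snapshots with `bikes_available_to_rent == 0`. The data is made up, and the metric weights every snapshot equally (an assumption of this example); it does not account for irregular snapshot intervals.

```python
import pandas as pd

# Made-up snapshots using the column names of station_status.csv.
status = pd.DataFrame({
    "station_id": [1, 1, 1, 2, 2],
    "bikes_available_to_rent": [0, 3, 0, 2, 5],
    "maintenance": [False, False, True, False, False],
})

# Ignore snapshots taken while the station was offline.
active = status[~status["maintenance"]]

# Mean of a boolean series gives the share of "empty" snapshots.
empty_share = (
    active["bikes_available_to_rent"].eq(0)
    .groupby(active["station_id"])
    .mean()
)
print(empty_share)
```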
Quick start
# 1. Clone with Git LFS for large files (optional but recommended)
$ git lfs install
$ git clone https://github.com/tumftm/european-bike-sharing-dataset.git
# 2. Decompress the dataset
$ cd full
$ unzip dataset.zip
# 3. Spin up PostGIS (example)
$ createdb bikesharing
$ psql bikesharing -c 'CREATE EXTENSION postgis;'
$ ogr2ogr -f PostgreSQL PG:"dbname=bikesharing" trips.csv -nln trips -oo X_POSSIBLE_NAMES=lon_start -oo Y_POSSIBLE_NAMES=lat_start
For exploratory work you can start with the 1 000‑row sample files:
import pandas as pd
import geopandas as gpd

trips = pd.read_csv('sample/trips.csv')
# ...
# You might want to perform the EWKB conversion manually
city_areas = gpd.read_file("sample/city_areas.csv")
city_areas = city_areas.set_geometry(gpd.GeoSeries.from_wkb(city_areas.geom_ewkb))
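The full trips.csv and station_status.csv are several GiB uncompressed, so loading them in one call may exhaust memory on smaller machines. A minimal sketch of chunked aggregation with pandas follows; the function name `count_trips_per_city` is hypothetical, and the demo uses an in-memory CSV rather than the real file.

```python
import io
from collections import Counter

import pandas as pd


def count_trips_per_city(csv_source, chunksize=1_000_000):
    """Count rows per city_id while reading the CSV in chunks."""
    totals = Counter()
    reader = pd.read_csv(csv_source, usecols=["city_id"], chunksize=chunksize)
    for chunk in reader:
        # Add this chunk's per-city counts to the running totals.
        totals.update(chunk["city_id"].value_counts().to_dict())
    return dict(totals)


# Made-up in-memory data; a real call would pass a path to trips.csv instead.
demo = io.StringIO("bike_id,city_id,duration\n1,10,300\n2,10,120\n3,20,600\n")
counts = count_trips_per_city(demo, chunksize=2)
print(counts)
```

Restricting `usecols` to the columns actually needed reduces memory use further, since the remaining columns are never materialized.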
License
The dataset is distributed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. When using or adapting the data, please cite the paper, which links back to this repository.
@article{waldnerbalke2025,
title={Data-Driven Insights into (E-)Bike-Sharing: Mining a Large-Scale Dataset on Usage and Urban Characteristics - Descriptive Analysis and Performance Modeling},
author={Waldner, Felix and Balke, Georg and Rech, Felix and Lellep, Martin},
year={2025},
journal={Transportation}
}