A Python CLI tool that vectorizes raster mask files into polygon vector files (Shapefile, GeoPackage, or GeoJSON) with topology-preserving simplification.
- Convert raster masks (int8/16/32 class IDs) to vector polygons
- Pure Python topology-preserving Visvalingam-Whyatt (TPVW) simplification
- No GEOS dependency for simplification: self-contained pure Python implementation
- Support for large images (30000 × 30000 pixels and larger)
- 4-connectivity polygonization
- Output formats: Shapefile (.shp), GeoPackage (.gpkg), GeoJSON (.geojson)
- Class IDs preserved as a polygon attribute
- CRS carried over from the input raster
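Under 4-connectivity, two pixels of the same class belong to the same polygon only if they share an edge, so pixels that touch only at a corner end up in separate polygons. The following minimal pure-Python sketch shows that grouping rule with a breadth-first flood fill. `label_4connected` is a hypothetical helper written for illustration, not part of the vectorizer-pro API; the tool itself is described as using a GDAL Polygonize-style edge-tracing approach rather than labelling pixels like this.

```python
from collections import deque


def label_4connected(mask, nodata=None):
    """Group equal-valued pixels into 4-connected regions (illustrative sketch).

    Returns (labels, classes): labels is a grid of region IDs where 0 marks
    nodata, and classes maps each region ID to its class ID.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    classes = {}
    next_id = 1
    for r0 in range(rows):
        for c0 in range(cols):
            value = mask[r0][c0]
            if labels[r0][c0] or value == nodata:
                continue  # already assigned, or excluded as nodata
            # Flood fill using only the four edge-sharing neighbours.
            labels[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc] and mask[nr][nc] == value):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            classes[next_id] = value
            next_id += 1
    return labels, classes
```

For example, `label_4connected([[1, 0], [0, 1]], nodata=0)` yields two separate class-1 regions, because the two 1-pixels touch only diagonally; under 8-connectivity they would form a single region.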
```bash
pip install -e .
```

Or install from source:

```bash
git clone https://github.com/CVEO/vectorizer-pro.git
cd vectorizer-pro
pip install -e .
```

```bash
# Basic usage
vectorizer-pro input.tif output.shp
# Specify nodata value to exclude
vectorizer-pro input.tif output.shp --nodata 0
# Remove small regions (merge regions smaller than 100 pixels)
vectorizer-pro input.tif output.shp --min-area 100
# Set simplification tolerance
vectorizer-pro input.tif output.shp --tolerance 0.1
# Output as GeoPackage
vectorizer-pro input.tif output.gpkg --format gpkg
# Simplify only internal edges (preserve the exterior boundary)
vectorizer-pro input.tif output.shp --no-simplify-boundary
```

| Option | Description |
|---|---|
| `--nodata INT` | Nodata value to exclude from vectorization |
| `--min-area FLOAT` | Minimum polygon area threshold; smaller polygons are merged into their largest adjacent neighbor |
| `--tolerance FLOAT` | Simplification tolerance (default: half the pixel size) |
| `--format`, `-f` | Output format: `shp`, `gpkg`, or `geojson` (default: `shp`) |
| `--simplify-boundary` / `--no-simplify-boundary` | Simplify exterior boundaries (default: enabled) |
| `--detect-nodata` | Print the nodata value and exit |
| `--list-classes` | List unique class IDs and exit |
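The `--min-area` option merges undersized regions into their largest adjacent neighbor. The sketch below illustrates that idea on a grid of region IDs, assuming (as the usage comment "smaller than 100 pixels" suggests) that area is measured in pixels and that "largest" means largest by pixel area. `merge_small_regions` is a hypothetical helper, not the tool's implementation; it recomputes adjacency after every merge for clarity, which would be far too slow for 30000 × 30000 rasters.

```python
from collections import Counter


def merge_small_regions(labels, min_area):
    """Merge every region smaller than min_area pixels into its largest
    4-adjacent neighbour (illustrative sketch). labels is a grid of region
    IDs where 0 means nodata; nodata is never used as a merge target."""
    rows, cols = len(labels), len(labels[0])
    labels = [row[:] for row in labels]  # work on a copy of the input grid
    while True:
        # Pixel count (area) of every region currently present.
        area = Counter(v for row in labels for v in row if v != 0)
        # Region adjacency from horizontally and vertically adjacent pixels.
        neighbours = {rid: set() for rid in area}
        for r in range(rows):
            for c in range(cols):
                a = labels[r][c]
                for nr, nc in ((r + 1, c), (r, c + 1)):
                    if nr < rows and nc < cols:
                        b = labels[nr][nc]
                        if a and b and a != b:
                            neighbours[a].add(b)
                            neighbours[b].add(a)
        # Undersized regions that touch at least one other region.
        candidates = [rid for rid in area if area[rid] < min_area and neighbours[rid]]
        if not candidates:
            return labels
        # Merge the smallest first, into its largest neighbour (ties: lowest ID).
        small = min(candidates, key=lambda rid: (area[rid], rid))
        target = max(neighbours[small], key=lambda rid: (area[rid], -rid))
        for row in labels:
            for c, v in enumerate(row):
                if v == small:
                    row[c] = target
```

With the grid `[[1, 1, 1, 2], [1, 1, 3, 2], [1, 1, 1, 2]]` and `min_area=2`, the single-pixel region 3 is absorbed by region 1 (8 pixels) rather than region 2 (3 pixels). Regions that touch only nodata have no neighbor to merge into and are left as they are.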
```python
from vectorizer_pro import vectorize, VectorizeResult

# Simple usage: vectorize and write to file
result = vectorize("input.tif", "output.shp", nodata=0)

# Remove small regions via the Python API
result = vectorize("input.tif", "output.shp", nodata=0, min_area=100)

# Get geometries without writing a file
result: VectorizeResult = vectorize("input.tif", nodata=0, output_path=None)
polygons = result.polygons
class_ids = result.class_ids
crs = result.crs
```

```bash
# Check nodata value
vectorizer-pro sample/top_potsdam_2_13.tif --detect-nodata
# List class IDs
vectorizer-pro sample/top_potsdam_2_13.tif --list-classes
# Vectorize excluding class 0
vectorizer-pro sample/top_potsdam_2_13.tif output.shp --nodata 0
```

```bash
# High simplification for smoother polygons
vectorizer-pro input.tif output.shp --nodata 0 --tolerance 0.5
# Remove small regions before simplification
vectorizer-pro input.tif output.shp --nodata 0 --min-area 50 --tolerance 0.1
# Preserve exact boundary shape
vectorizer-pro input.tif output.shp --nodata 0 --no-simplify-boundary
# GeoPackage output with custom tolerance
vectorizer-pro input.tif output.gpkg --format gpkg --tolerance 0.05
```

- Python >= 3.10
- rasterio
- shapely >= 2.1
- click
- fiona
- numpy
- GDAL - Raster I/O and Polygonize algorithm: https://gdal.org/
- Shapely - Python geometry operations: https://shapely.readthedocs.io/
- GEOS - C/C++ geometry engine (reference implementation for the TPVW algorithm): https://libgeos.org/
- JTS (Java Topology Suite) - Java topology processing: https://github.com/locationtech/jts
- GDAL Polygonize - Two-arm chain edge-tracing algorithm for 4-connectivity raster vectorization
- Visvalingam-Whyatt - Area-based vertex-removal simplification: the vertex whose "effective area" (the triangle it forms with its two neighbors) is smallest is removed first
- TPVW (Topology-Preserving Visvalingam-Whyatt) - Extension of the VW algorithm that ensures shared edges between adjacent polygons are simplified identically, preventing gaps and overlaps in polygonal coverages
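The two simplification ideas above can be sketched in a few lines. `vw_simplify` repeatedly removes the interior vertex with the smallest effective area until every remaining vertex meets an area threshold, keeping the endpoints fixed. `simplify_shared_edges` shows the shared-edge part of TPVW: each edge is simplified once, under a direction-independent key, and the same result is handed to both adjacent polygons, so they cannot drift apart. Both names are hypothetical, the threshold is a plain area (its relation to the CLI's `--tolerance` is not specified here), and the full TPVW algorithm (as in GEOS) also guards against removals that would make an edge cross other edges, which this sketch omits.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def vw_simplify(line, min_area):
    """Visvalingam-Whyatt sketch: drop the interior vertex with the smallest
    effective area until all remaining ones are >= min_area. Endpoints stay."""
    pts = list(line)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= min_area:
            break
        del pts[smallest + 1]  # areas[k] belongs to vertex pts[k + 1]
    return pts


def simplify_shared_edges(rings, min_area):
    """rings maps a polygon name to its boundary edges (lists of (x, y)
    tuples). An edge shared by two polygons is simplified exactly once, so
    both neighbours receive identical vertices (no gaps or overlaps)."""
    cache = {}
    result = {}
    for name, edges in rings.items():
        out = []
        for edge in edges:
            # Direction-independent key: the lexicographically smaller orientation.
            key = min(tuple(edge), tuple(reversed(edge)))
            if key not in cache:
                cache[key] = vw_simplify(key, min_area)
            simplified = cache[key]
            # Hand back the edge in the caller's own direction.
            out.append(simplified if tuple(edge) == key else simplified[::-1])
        result[name] = out
    return result
```

Caching matters because plain VW with tie-breaking can give different results when the same edge is traversed in opposite directions; simplifying each shared edge once removes that source of mismatch between neighbors.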
- `sample/top_potsdam_2_13.tif` - Semantic labeling result generated by an AI model on the ISPRS Potsdam 2D Semantic Labeling Contest benchmark dataset; used to demonstrate vectorizing large raster masks.
- `sample/small.tif` - A smaller sample for quick testing.
The original Potsdam aerial imagery and ground truth are from the ISPRS benchmark: https://www.isprs.org/
Wuhan University CVEO Team (武汉大学CVEO课题组)
Website: https://www.whu-cveo.com/
MIT License - see LICENSE for details.