This repository presents a complete, validation-driven workflow for stereo depth estimation based on the Semi-Global Matching (SGM) algorithm.
The project demonstrates how a computationally intensive stereo matching algorithm can be:
- rigorously developed and validated in Python,
- systematically transformed into hardware-oriented implementations,
- evaluated across multiple abstraction levels, from software baselines to RTL hardware outputs.
Particular emphasis is placed on algorithmic transparency, reproducibility, and structured cross-validation.
The workflow highlights key architectural trade-offs, including path aggregation strategies, memory constraints, and the balance between disparity quality and hardware efficiency.
An end-to-end stereo depth estimation workflow using Semi-Global Matching (SGM),
bridging algorithmic reference models and FPGA-oriented architectures.
- Abstract
- Project Overview
- Problem Context: Stereo Depth Estimation
- Design Flow Overview
- Algorithm at a Glance
- FPGA-Oriented Design Decisions
- Results & Visual Comparison
- Dependencies & Execution Environment
- Input / Output Data Configuration
- Repository Structure
- Contributing
- License
This repository investigates an algorithm-to-architecture co-design workflow for stereo depth estimation using Semi-Global Matching (SGM).
A fully transparent, from-scratch Python implementation serves as the algorithmic reference model. It enables controlled experimentation and systematic analysis of:
- matching cost computation,
- smoothness penalties ($P_1$, $P_2$),
- disparity range selection,
- and multi-path aggregation strategies.
Once validated, the reference logic is translated into:
- HLS-style C++ for hardware synthesis, and
- Verilog RTL implementations optimized for FPGA deployment.
The hardware-oriented designs explicitly address practical constraints such as streaming dataflow, limited on-chip buffering, fixed-point arithmetic, and path-reduction trade-offs (1-path, 2-path, and 4-path variants).
This structured progression ensures pixel-level consistency between software and hardware implementations while enabling systematic evaluation of quality–complexity trade-offs.
Stereo depth estimation recovers scene geometry by matching corresponding points between two horizontally displaced images.
The disparity—the horizontal pixel shift between corresponding points—is inversely proportional to depth.
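For a rectified pair, this inverse relation is the standard pinhole-stereo formula $Z = f \cdot B / d$, where $f$ is the focal length in pixels and $B$ is the baseline between the two cameras. The sketch below assumes NumPy and uses hypothetical calibration values (700 px, 0.1 m) that are not taken from this project's dataset.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) via Z = f * B / d.

    focal_px and baseline_m are hypothetical calibration values. Pixels with
    zero or negative disparity have no valid match and are returned as inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# 35 px -> 2.0 m, 70 px -> 1.0 m, 0 px (no match) -> inf
print(disparity_to_depth([35, 70, 0], focal_px=700.0, baseline_m=0.1))
```

Doubling the disparity halves the depth, which is also why depth resolution degrades for distant objects with small disparities.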
While local matching methods are computationally efficient, they often suffer from:
- noise,
- streaking artifacts,
- poor performance in low-texture or occluded regions.
Semi-Global Matching (SGM) improves robustness by enforcing smoothness along multiple 1-D paths, achieving a balance between local accuracy and global consistency—at the cost of increased computational and memory complexity.
Figure: Left image, right image, and ground-truth disparity map.
The workflow follows a structured, validation-driven approach:
- Library-Based Baseline (OpenCV): Establishes an initial quality reference using standard stereo matching implementations.
- Python From-Scratch Implementation: A transparent SGM implementation for understanding the algorithm, verifying correctness, and comparing with library versions.
- FPGA-Oriented Simplification: Algorithm parameters are constrained to reflect realistic memory, bandwidth, and streaming considerations.
- HLS-Style C++ Implementation: Hardware-friendly translation emphasizing fixed-point arithmetic and line-based processing.
- Verilog RTL Path Variants: Cycle-accurate RTL implementations with 1-path, 2-path, and 4-path aggregation to study quality–complexity trade-offs.
The SGM pipeline consists of four core stages:
Figure: End-to-end SGM development.
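Computes the similarity between pixel windows using the Sum of Absolute Differences (SAD):
$$C(x, y, d) = \sum_{(i, j) \in \mathcal{W}} \left| I_L(x + i,\, y + j) - I_R(x + i - d,\, y + j) \right|$$
- $C(x, y, d)$: cost of matching pixel $(x, y)$ at disparity $d$
- $I_L$, $I_R$: left and right images
- $\mathcal{W}$: set of offsets $(i, j)$ forming the local window around $(x, y)$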
Caption: Low cost → likely correct match; high cost → unlikely match.
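A direct reading of the SAD definition for one pixel and one disparity might look as follows. This is a minimal NumPy sketch, not the notebook's code: the helper name `sad_cost`, the square window of half-width `half_window`, and the convention that the match for left pixel $(x, y)$ lies at $(x - d, y)$ in the right image are assumptions, and border handling is left to the caller.

```python
import numpy as np

def sad_cost(left, right, x, y, d, half_window=1):
    """SAD cost C(x, y, d) over a square window of side 2*half_window + 1.

    Hypothetical helper: compares the left window centred at (x, y) with the
    right window centred at (x - d, y). Both windows must lie inside the images.
    """
    r = half_window
    win_l = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
    win_r = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.int32)
    # Cast to int32 first so uint8 subtraction cannot wrap around.
    return int(np.abs(win_l - win_r).sum())

# Synthetic pair: each left pixel reappears 2 columns further left in the right image.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(8, 12), dtype=np.uint8)
right = np.roll(left, -2, axis=1)
costs = [sad_cost(left, right, x=6, y=4, d=d) for d in range(5)]
print(costs)  # the entry at d = 2 is 0
```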
Generates a 3D cost volume for all disparity candidates:
$$C \in \mathbb{R}^{H \times W \times D}$$
- $H$, $W$: image height and width
- $D$: maximum disparity (disparity range)
- Each slice at disparity $d$ (denoted C[:,:,d]) represents the cost map for that disparity
Caption: Serves as input for path-wise aggregation.
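A vectorized way to fill the whole $H \times W \times D$ volume is to compute one absolute-difference image per disparity and then box-filter it. The sketch below assumes 8-bit grayscale inputs, uses an integral image for the window sums (zero padding at the image border), and assigns a worst-case difference of 255 to columns that have no partner in the right image. These choices and the function names are illustrative, not taken from the notebook.

```python
import numpy as np

def box_sum(img, r):
    """Sum of img over a (2r+1) x (2r+1) window at every pixel (zero padding at borders)."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="constant")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)  # integral image
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def build_cost_volume(left, right, max_disp, half_window=1, invalid_diff=255):
    """Return the SAD cost volume C with shape (H, W, max_disp); C[:, :, d] is one slice."""
    L = left.astype(np.int64)
    R = right.astype(np.int64)
    H, W = L.shape
    C = np.empty((H, W, max_disp), dtype=np.int64)
    for d in range(max_disp):
        # Columns x < d have no partner at x - d in the right image.
        diff = np.full((H, W), invalid_diff, dtype=np.int64)
        diff[:, d:] = np.abs(L[:, d:] - R[:, :W - d])
        C[:, :, d] = box_sum(diff, half_window)
    return C
```

Taking `np.argmin(C, axis=2)` on this volume directly gives a local, block-matching-style disparity map; SGM instead feeds the volume into path-wise aggregation.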
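Aggregates costs along multiple paths to enforce smoothness:
$$L_r(p, d) = C(p, d) + \min\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_k L_r(p - r, k) + P_2 \Big) - \min_k L_r(p - r, k)$$
where $p = (x, y)$ and $p - r$ is the previous pixel along path $r$.
- $L_r(p, d)$: Aggregated cost at pixel $p$ with disparity $d$ along path $r$
- $P_1$: Penalty for small disparity changes ($\pm 1$ pixel)
- $P_2$: Penalty for larger disparity jumps ($> 1$ pixel)
- Multiple paths (horizontal, vertical, diagonal) are summed for the final cost $S(p, d) = \sum_r L_r(p, d)$.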
Caption: Preserves object boundaries while smoothing disparities.
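Because each $L_r$ depends only on the previous pixel along the path, a single path reduces to a 1-D recurrence over a sequence of cost vectors. The following sketch assumes NumPy and a hypothetical helper `aggregate_scanline`; it applies the recurrence above to an $(N, D)$ array holding one cost vector per pixel along the path.

```python
import numpy as np

def aggregate_scanline(costs, P1, P2):
    """Aggregate costs (shape (N, D)) along one path; returns L_r with the same shape."""
    costs = np.asarray(costs, dtype=np.int64)
    INF = 1 << 40  # stands in for the missing neighbour at d = 0 and d = D - 1
    L = np.empty_like(costs)
    L[0] = costs[0]  # the path starts at the image border: no predecessor
    for i in range(1, len(costs)):
        prev = L[i - 1]
        prev_min = prev.min()
        # L_r(p - r, d - 1) and L_r(p - r, d + 1), padded with INF at the ends
        down = np.concatenate(([INF], prev[:-1]))
        up = np.concatenate((prev[1:], [INF]))
        best = np.minimum(prev, np.minimum(down, up) + P1)
        best = np.minimum(best, prev_min + P2)
        # Subtracting prev_min keeps L_r bounded without changing the argmin
        L[i] = costs[i] + best - prev_min
    return L

print(aggregate_scanline([[0, 10, 10], [10, 10, 0]], P1=1, P2=4))
# rows: [0, 10, 10] and [10, 11, 4]
```

In this example the second pixel keeps its raw minimum at $d = 2$ because its evidence outweighs the $P_2$ jump penalty; with a weaker cost vector such as [10, 10, 8] the aggregated minimum would move to the neighbour's disparity $d = 0$.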
Selects the disparity with the minimum aggregated cost (winner-takes-all, WTA):
$$d^*(p) = \arg\min_d \sum_r L_r(p, d)$$
- Produces a dense, consistent disparity map
- Used as reference for HLS and Verilog validation
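Disparity selection reduces to summing the per-path volumes and taking an argmin over the disparity axis. A minimal NumPy sketch, assuming every path volume has shape $(H, W, D)$; the function name is hypothetical.

```python
import numpy as np

def winner_takes_all(path_volumes):
    """Sum per-path volumes L_r (each H x W x D) and pick the argmin over d."""
    S = np.sum(np.stack(path_volumes, axis=0), axis=0)  # S(p, d) = sum_r L_r(p, d)
    # np.argmin returns the smallest d on ties, which gives a deterministic tie-break
    return np.argmin(S, axis=-1)

L_a = np.array([[[3, 1, 2], [0, 5, 5]]])
L_b = np.array([[[3, 2, 0], [1, 1, 9]]])
print(winner_takes_all([L_a, L_b]))  # [[2 0]]
```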
Key design choices reflecting hardware constraints include:
- Resolution: 272 × 240
- Disparity Range: 16 or 64
- Path Variants:
- 1-path: minimal complexity, visible streaking
- 2-path: improved consistency
- 4-path: best visual quality
- Fixed-Point Arithmetic to reduce complexity and improve determinism
These parameters enable controlled exploration of quality–complexity trade-offs.
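The impact of these constraints can be estimated with a back-of-the-envelope model. The sketch below does not describe the actual HLS or RTL buffers: it assumes 8-bit pixels, a square SAD window, 16-bit storage per aggregated cost, and one line buffer per vertical or diagonal path. The word-width bound relies on the $-\min_k L_r(p - r, k)$ term of the recurrence, which keeps every $L_r(p, d) \le C_{\max} + P_2$; the window size and the $P_2$ value used below are hypothetical.

```python
def aggregated_cost_bits(window, P2, n_paths, pixel_bits=8):
    """Unsigned bits needed to hold S(p, d) = sum_r L_r(p, d) without overflow."""
    c_max = (2 ** pixel_bits - 1) * window * window  # largest possible SAD
    s_max = n_paths * (c_max + P2)                   # each L_r(p, d) <= C_max + P2
    return s_max.bit_length()

def buffer_bytes(width, height, max_disp, cost_bits=16):
    """Storage for a full cost volume versus per-path streaming buffers."""
    per_cost = cost_bits // 8
    return {
        "full_volume": width * height * max_disp * per_cost,
        "horizontal_path": max_disp * per_cost,           # previous pixel only
        "line_buffer_path": width * max_disp * per_cost,  # previous row (vertical/diagonal)
    }

for D in (16, 64):
    print(D, buffer_bytes(272, 240, D))
print(aggregated_cost_bits(window=5, P2=200, n_paths=1),
      aggregated_cost_bits(window=5, P2=200, n_paths=4))
```

Under these assumptions, a full 64-disparity volume at 272 × 240 needs about 8.4 MB, whereas one vertical or diagonal path needs a line buffer of about 34.8 kB, which is why streaming designs keep only the previous pixel or row per path. A 5 × 5 window with $P_2 = 200$ needs 13 bits per path and 15 bits for a four-path sum, so a 16-bit fixed-point word suffices.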
The following visualization compares all implemented methods, from library baselines to hardware-oriented outputs:
Figure: Disparity maps for all implemented methods — SAD, OpenCV SGBM, OpenCV SGBM+WLS, Python SGM, HLS C++, and Verilog 4-path RTL implementation.
The following implementations are evaluated to demonstrate the progression from software-based baselines to hardware-oriented realizations of the SGM algorithm:
| Method | Description |
|---|---|
| SAD (Sum of Absolute Differences) | OpenCV-based local block-matching baseline using fixed window aggregation; provides a simple and computationally efficient reference. |
| OpenCV SGBM (Reference) | Standard OpenCV Semi-Global Block Matching implementation used as the primary software benchmark for quality comparison. |
| OpenCV SGBM + WLS Filter | SGBM output refined using a Weighted Least Squares (WLS) post-processing filter to enhance edge preservation and reduce noise in homogeneous regions. |
| Python SGM | Fully custom, from-scratch implementation of the complete SGM pipeline, including cost computation, smoothness penalties ($P_1$, $P_2$), and multi-path aggregation. Serves as the algorithmic reference for hardware validation. |
| HLS (C++ Synthesis) | Hardware-oriented implementation derived from the Python SGM logic and adapted for High-Level Synthesis, incorporating fixed-point arithmetic and streaming constraints. |
| Verilog (4-Path RTL) | Cycle-accurate RTL implementation with four-path cost aggregation, optimized for FPGA deployment and improved disparity consistency. |
- Software Baselines (SAD / OpenCV SGBM / SGBM + WLS): Provide fast disparity estimation and establish quality benchmarks. The WLS-filtered variant improves smoothness and reduces noise, particularly in low-texture regions.
- Python SGM: Produces dense and consistent disparity maps, enabling transparent inspection of all algorithmic stages and serving as the validation reference for subsequent hardware implementations.
- HLS and Verilog Implementations: Preserve the structural and visual characteristics of the Python SGM results while adhering to fixed-point arithmetic and streaming constraints, demonstrating a successful transition from algorithm to hardware.
- Path Aggregation Impact (1-Path → 4-Path): Increasing the number of aggregation paths progressively reduces streaking artifacts, enhances depth continuity, and improves overall disparity completeness, at the cost of additional computational and memory complexity.
Figure: Comparison of 1-path, 2-path, and 4-path aggregation. Increasing the number of paths reduces streaking artifacts and improves depth continuity.
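The 1-, 2-, and 4-path variants differ only in which directions $r$ are summed. The sketch below makes that explicit with a per-pixel loop (clear rather than fast). The direction sets are an assumption, because the source does not state which directions each RTL variant uses; raster-causal choices (left, top, top-left, and top-right predecessors), which suit streaming hardware, are shown for illustration.

```python
import numpy as np

# Hypothetical direction sets (dx, dy): the predecessor of (x, y) is (x - dx, y - dy).
PATH_SETS = {
    1: [(1, 0)],                           # from the left
    2: [(1, 0), (0, 1)],                   # + from the top
    4: [(1, 0), (0, 1), (1, 1), (-1, 1)],  # + top-left and top-right diagonals
}

def aggregate_direction(C, dx, dy, P1, P2):
    """SGM recurrence along one direction over an (H, W, D) cost volume."""
    H, W, D = C.shape
    C = C.astype(np.int64)
    L = np.zeros_like(C)
    INF = 1 << 40
    ys = range(H) if dy >= 0 else range(H - 1, -1, -1)
    xs = range(W) if dx >= 0 else range(W - 1, -1, -1)  # predecessors visited first
    for y in ys:
        for x in xs:
            px, py = x - dx, y - dy
            if not (0 <= px < W and 0 <= py < H):
                L[y, x] = C[y, x]  # the path enters the image here
                continue
            prev = L[py, px]
            m = prev.min()
            pm = np.minimum(np.concatenate(([INF], prev[:-1])),
                            np.concatenate((prev[1:], [INF]))) + P1
            L[y, x] = C[y, x] + np.minimum(np.minimum(prev, pm), m + P2) - m
    return L

def sgm_disparity(C, P1, P2, n_paths=4):
    """Sum the selected directions and apply winner-takes-all."""
    S = sum(aggregate_direction(C, dx, dy, P1, P2) for dx, dy in PATH_SETS[n_paths])
    return np.argmin(S, axis=2)

C = np.full((4, 5, 4), 10, dtype=np.int64)
C[:, :, 2] = 0             # true disparity 2 everywhere
C[1, 2] = [9, 10, 10, 10]  # one noisy pixel whose raw minimum is d = 0
for n in (1, 2, 4):
    print(n, sgm_disparity(C, P1=1, P2=4, n_paths=n)[1, 2])  # 2 for every variant
```

On this toy volume every variant recovers the outlier; on real images, additional directions matter most where a single path propagates an error along its own direction, which is the streaking visible in the 1-path result.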
This project spans algorithmic development (Python), High-Level Synthesis (HLS), and RTL verification (Verilog).
The following tools and libraries were used to ensure deterministic cross-validation across all stages.
Required:
- Python 3.8+
- Jupyter Notebook
Libraries:
import cv2
import numpy as np
from PIL import Image
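import matplotlib.pyplot as plt
Install dependencies using:
pip install opencv-python numpy pillow matplotlib
If the SGBM + WLS baseline uses the cv2.ximgproc module, install opencv-contrib-python in place of opencv-python, since ximgproc ships only with the contrib build.
The primary algorithmic reference and evaluation framework is provided in: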
Stereo_Depth_Estimation.ipynb
This notebook:
- Implements the full SGM pipeline from scratch
- Generates processed pixel streams for hardware validation
- Performs visual and numerical comparison across all methods
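The numerical side of such a comparison can be expressed as a few pixel-level metrics between a test disparity map and the Python reference. This is a hedged sketch: the metric set (exact-match rate, bad-pixel rate above a threshold, mean and maximum absolute error) and the loader's file format (whitespace-separated integers in raster order, 240 × 272) are assumptions, since the notebook's actual metrics and the layout of results/hls_disparity.txt are not described here.

```python
import numpy as np

def load_disparity_txt(path, height=240, width=272):
    """Hypothetical loader: assumes whitespace-separated integers in raster order."""
    return np.loadtxt(path, dtype=np.int64).reshape(height, width)

def compare_disparity(test, reference, bad_threshold=1):
    """Pixel-level agreement between two disparity maps of identical shape."""
    test = np.asarray(test, dtype=np.int64)
    reference = np.asarray(reference, dtype=np.int64)
    if test.shape != reference.shape:
        raise ValueError(f"shape mismatch: {test.shape} vs {reference.shape}")
    err = np.abs(test - reference)
    return {
        "exact_match_pct": 100.0 * float(np.mean(err == 0)),
        "bad_pixel_pct": 100.0 * float(np.mean(err > bad_threshold)),
        "mean_abs_err": float(err.mean()),
        "max_abs_err": int(err.max()),
    }

print(compare_disparity([[1, 2], [3, 4]], [[1, 2], [3, 7]]))
# {'exact_match_pct': 75.0, 'bad_pixel_pct': 25.0, 'mean_abs_err': 0.75, 'max_abs_err': 3}
```

A hardware output that is bit-exact with the reference yields exact_match_pct of 100.0 and max_abs_err of 0, which is a stricter check than visual similarity.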
- Xilinx Vivado 2019.1, used for:
  - HLS C++ synthesis
  - C-Simulation and RTL Co-Simulation
  - Verilog RTL simulation
⚠️ The project was developed and validated using Vivado 2019.1.
Later Vivado versions may require minor project configuration updates.
The HLS and Verilog implementations rely on file-based pixel streaming to enable deterministic comparison against the Python reference model.
Proper path configuration is required before simulation.
The HLS testbench (main_tb.cpp) defines dataset and result paths using compile-time macros:
#ifndef DATA_PATH
#define DATA_PATH "../../../data/processed/"
#endif
#ifndef RESULT_PATH
#define RESULT_PATH "../../../results/"
#endif
Expected input files:
data/processed/left_pixels.txt
data/processed/right_pixels.txt
Generated output:
results/hls_disparity.txt
If your Vivado project directory differs, adjust DATA_PATH and RESULT_PATH accordingly.
The RTL comparison testbench loads pixel streams using:
$readmemh("../../../data/processed/left_pixels.hex", memory_left);
$readmemh("../../../data/processed/right_pixels.hex", memory_right);
Generated outputs:
results/verilog_disparity_1path.txt
results/verilog_disparity_2path.txt
results/verilog_disparity_4path.txt
Before simulation, ensure that:
- The relative paths match your simulation working directory
- The data/processed/ directory contains correctly formatted .txt and .hex pixel streams
- The results/ directory exists
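The stream files can be produced and checked from Python before launching a simulation. The sketch below assumes one 8-bit value per line in raster order, written as decimal in .txt and as two-digit hex in .hex (a layout that $readmemh accepts for an 8-bit-wide memory). The exact format expected by main_tb.cpp and the RTL testbench should be confirmed against the notebook, and the function names are hypothetical.

```python
from pathlib import Path
import numpy as np

# Input files listed in the HLS and RTL testbench sections
EXPECTED_INPUTS = ["left_pixels.txt", "right_pixels.txt", "left_pixels.hex", "right_pixels.hex"]

def write_pixel_streams(gray, stem, out_dir):
    """Write a 2-D uint8 image as <stem>.txt (decimal) and <stem>.hex (hex), one pixel per line."""
    gray = np.asarray(gray)
    if gray.dtype != np.uint8 or gray.ndim != 2:
        raise ValueError("expected a 2-D uint8 image")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    flat = gray.ravel()  # raster order: row by row, left to right
    (out / f"{stem}.txt").write_text("".join(f"{int(v)}\n" for v in flat))
    (out / f"{stem}.hex").write_text("".join(f"{int(v):02x}\n" for v in flat))

def preflight(repo_root):
    """Return the input files missing from data/processed/ and ensure results/ exists."""
    root = Path(repo_root)
    processed = root / "data" / "processed"
    missing = [name for name in EXPECTED_INPUTS if not (processed / name).is_file()]
    (root / "results").mkdir(parents=True, exist_ok=True)
    return missing
```

For example, calling write_pixel_streams(left_gray, "left_pixels", "data/processed") and the same for the right image, then preflight(".") from the repository root, should return an empty list when all four stream files are in place.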
stereo-depth-estimation-sgm/
├── data/ # Rectified stereo image pairs for evaluation and testing
├── diagram/ # Algorithmic and architectural block diagrams
├── hls/ # HLS-style C++ implementations targeting FPGA synthesis
├── results/ # Generated disparity maps and visual comparison outputs
├── verilog/ # RTL modules including cost aggregation paths and WTA logic
├── Stereo_Depth_Estimation.ipynb # Primary Jupyter Notebook: Python reference implementation,
│ # documentation, validation, and cross-comparison framework
├── .gitignore # Git ignore rules
├── LICENSE # MIT License and project legal information
└── README.md # Project documentation and usage guide
Contributions are welcome! If you have suggestions or improvements, feel free to fork the repository and create a pull request.
- Fork the repository.
- Create a new branch:
git checkout -b feature-name
- Commit your changes:
git commit -m "Description of changes"
- Push the changes and open a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.





