A GPU-Accelerated Unsteady Vortex Method Solver for Multi-Body Aerospace Problems
VortexGPU is a mesh-free, Lagrangian vortex method solver that simulates unsteady incompressible aerodynamic flows around single and multiple interacting bodies. It is built entirely from scratch — no external CFD libraries — and is GPU-native, leveraging fast N-body summation algorithms (treecode and fast multipole method) to achieve O(N log N) or O(N) computational scaling.
The solver tracks discrete vortex particles (vortex blobs) as they are shed from body surfaces and advected through the flow field. At every timestep, the velocity of each particle is computed from the collective influence of all other particles via the Biot-Savart law — an N-body problem that maps naturally onto GPU hardware.
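In two dimensions, the induced velocity at particle $i$ from all other point vortices (circulation $\Gamma_j$ at position $\mathbf{x}_j$) is

$$
\mathbf{u}(\mathbf{x}_i) \;=\; \mathbf{U}_\infty \;+\; \sum_{j \neq i} \frac{\Gamma_j}{2\pi}\,\frac{\bigl(-(y_i - y_j),\; x_i - x_j\bigr)}{|\mathbf{x}_i - \mathbf{x}_j|^2}
$$

Evaluated directly for all $N$ particles this costs $O(N^2)$ operations per timestep, which is exactly the structure that both the GPU kernels and the fast summation algorithms target. Typical applications include: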
- Unsteady airfoil aerodynamics: Impulsive starts, pitching/plunging motions, dynamic stall onset
- Wake interaction: Multiple bodies flying through each other's wakes (tandem, formation, biplane)
- Vortex-dominated flows: Problems where wake structure is the primary physical feature — flow visualization, vortex rollup, Kelvin-Helmholtz instability
Traditional grid-based CFD (finite volume, finite element, finite difference) discretizes the entire fluid domain. Vortex methods take a fundamentally different approach:
- No mesh in the wake: Vorticity-carrying particles are tracked in a Lagrangian frame. The wake evolves naturally without requiring mesh refinement to resolve it.
- Automatic far-field satisfaction: The Biot-Savart law inherently gives velocity that decays to zero at infinity — no artificial boundary conditions needed.
- Natural wake capture: Wake rollup, convection, merging, and interaction emerge from the physics without special treatment.
- Embarrassing parallelism: The core N-body velocity evaluation is perfectly suited for GPU computation.
- Physical intuition: The simulation output is literally the vortical structures — you see the physics directly.
The fundamental trade-off: vortex methods excel at external, vortex-dominated, unsteady flows. They are not well-suited for wall-bounded turbulence, internal flows, or problems where pressure fields are the primary output.
The solver is grounded in the vorticity-velocity formulation of the incompressible Navier-Stokes equations. In this formulation, pressure is eliminated entirely, and the key insight emerges: at high Reynolds numbers, vorticity is approximately materially conserved — it moves with the fluid. So the computational task reduces to:
- Track where vorticity is (particle positions)
- Compute the velocity field from the vorticity (Biot-Savart summation)
- Advect particles using the computed velocity
- Shed new vorticity from body trailing edges (Kutta condition)
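The material-conservation claim above follows from the vorticity transport equation, obtained by taking the curl of the momentum equation (which eliminates the pressure gradient):

$$
\frac{\partial \omega}{\partial t} + \mathbf{u}\cdot\nabla\omega \;=\; \nu\,\nabla^2\omega
$$

In 2D there is no vortex-stretching term, so in the inviscid limit $D\omega/Dt = 0$: each particle carries its circulation unchanged as it moves with the flow.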
Airfoil surfaces are represented as collections of flat panels. Each panel carries a bound vortex whose strength is determined by enforcing a no-penetration boundary condition (flow cannot pass through the body). This produces a linear system that is solved at every timestep. For rigid bodies, the system matrix is constant and can be LU-factored once and reused — only the right-hand side (which depends on the evolving wake) changes.
The panel method uses the Pistolesi (1/4–3/4) rule: the bound vortex of each panel is placed at that panel's quarter point, and the boundary condition is enforced at a collocation point at the panel's three-quarter point. With this placement the panel method reproduces thin-airfoil theory behavior in the limit of many panels.
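As a sketch of the resulting system (the notation here is illustrative, not the project's): with $N_p$ panels of unknown strengths $\Gamma_j$, the no-penetration condition at each collocation point $\mathbf{x}_{c,i}$ with unit normal $\hat{\mathbf{n}}_i$ reads

$$
\sum_{j=1}^{N_p} a_{ij}\,\Gamma_j \;=\; -\bigl(\mathbf{U}_\infty + \mathbf{u}_{\text{wake}}(\mathbf{x}_{c,i})\bigr)\cdot \hat{\mathbf{n}}_i, \qquad i = 1,\dots,N_p
$$

where $a_{ij}$ is the normal velocity induced at collocation point $i$ by a unit-strength vortex on panel $j$. The matrix $[a_{ij}]$ depends only on geometry, which is why its LU factors can be reused for rigid bodies while the wake-dependent right-hand side is rebuilt every step.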
At every timestep, a new vortex particle is shed from the trailing edge. Its strength is determined by Kelvin's circulation theorem: the total circulation in the flow (bound + wake) must remain constant. If the flow starts from rest, total circulation stays zero for all time.
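One common way to write this bookkeeping (assuming one particle is shed per step): if $\Gamma_b^{\,n}$ is the total bound circulation at step $n$, the new particle carries

$$
\Gamma_{\text{shed}}^{\,n} \;=\; -\bigl(\Gamma_b^{\,n} - \Gamma_b^{\,n-1}\bigr)
\qquad\Longrightarrow\qquad
\Gamma_b^{\,n} + \sum_{k=1}^{n} \Gamma_{\text{shed}}^{\,k} \;=\; 0
$$

Because the new particle also induces velocity on the body, its strength is coupled to the panel solution; a typical treatment augments the panel system with the shed strength as one extra unknown and the Kelvin constraint as one extra equation.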
The wake vortex particles are then advected by the total velocity field — freestream plus the induced velocity from all other particles (both wake and bound). This advection step is the computational bottleneck and is where GPU acceleration is critical.
Point vortices have singular velocity fields (infinite velocity at the vortex center). This is regularized by replacing point vortices with vortex blobs — vorticity distributed over a small core of radius σ. The smoothing parameter σ is typically set to 1.5× the initial inter-particle spacing.
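With the low-order algebraic blob (one of the regularization kernels covered in docs/theory.md), the desingularization amounts to adding $\sigma^2$ to the squared distance:

$$
\mathbf{u}_\sigma(\mathbf{x}) \;=\; \sum_{j} \frac{\Gamma_j}{2\pi}\,\frac{\bigl(-(y - y_j),\; x - x_j\bigr)}{|\mathbf{x} - \mathbf{x}_j|^2 + \sigma^2}
$$

This recovers the point-vortex velocity for $|\mathbf{x} - \mathbf{x}_j| \gg \sigma$ and remains finite at the blob center.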
Aerodynamic forces are computed via the impulse method: the force is proportional to the rate of change of the first moment of vorticity. This avoids the difficulty of computing pressure fields (which don't exist explicitly in the vortex method formulation) and is generally more robust than pressure integration in this context.
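In 2D the discrete form is compact. Assuming counterclockwise-positive circulation (sign conventions vary in the literature), the linear impulse over all vorticity, bound and wake, and the resulting body force are

$$
\mathbf{I} \;=\; \rho \sum_{j} \Gamma_j \,(y_j,\; -x_j), \qquad \mathbf{F} \;=\; -\,\frac{d\mathbf{I}}{dt}
$$

with the time derivative evaluated by finite differences across timesteps.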
The solver is developed incrementally across four phases of increasing complexity:
Phase 1: A complete working unsteady vortex method solver for a single 2D airfoil. Includes NACA 4-digit airfoil geometry generation with cosine panel spacing, the full panel method linear system, wake shedding via Kelvin's theorem, and RK2 time integration. The naive O(N²) velocity evaluation runs on GPU using a tiled shared-memory kernel. Validated against the Wagner function (impulsive start), steady thin-airfoil theory, and the Theodorsen function (unsteady pitching).
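A minimal sketch of such a tiled kernel in CUDA C++, using the algebraic blob kernel above; the `Particle` struct, names, and launch assumptions are illustrative, not the project's actual API:

```cpp
// Tiled O(N^2) Biot-Savart evaluation. Each thread owns one target point;
// source particles are staged through shared memory in tiles so each source
// is read from global memory once per block rather than once per thread.
constexpr int TILE = 256;  // launch with blockDim.x == TILE

struct Particle { double x, y, gamma; };  // illustrative layout

__global__ void biotSavartDirect(const Particle* __restrict__ src, int nSrc,
                                 const Particle* __restrict__ tgt, int nTgt,
                                 double sigmaSq, double2* vel)
{
    __shared__ Particle tile[TILE];
    const double inv2pi = 1.0 / (2.0 * 3.141592653589793);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    double px = 0.0, py = 0.0, ux = 0.0, uy = 0.0;
    if (i < nTgt) { px = tgt[i].x; py = tgt[i].y; }

    for (int base = 0; base < nSrc; base += TILE) {
        int j = base + threadIdx.x;
        if (j < nSrc) tile[threadIdx.x] = src[j];  // cooperative tile load
        __syncthreads();
        int count = min(TILE, nSrc - base);
        for (int k = 0; k < count; ++k) {
            double dx = px - tile[k].x;
            double dy = py - tile[k].y;
            double r2 = dx * dx + dy * dy + sigmaSq;  // algebraic blob
            double w  = inv2pi * tile[k].gamma / r2;
            ux -= w * dy;  // u = Gamma/(2*pi*r2) * (-(y - yj), x - xj)
            uy += w * dx;
        }
        __syncthreads();  // finish reads before the next tile overwrites
    }
    if (i < nTgt) vel[i] = make_double2(ux, uy);
}
```

Note that out-of-range threads still execute the tile loop, so every `__syncthreads()` is reached by the whole block.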
Phase 2: Replaces the O(N²) velocity evaluation with an O(N log N) Barnes-Hut treecode. Particles are sorted by Morton codes (Z-order curve) for spatial locality, a quadtree is built implicitly from the sorted order, and tree traversal uses a multipole acceptance criterion to decide when distant particle groups can be approximated as single sources. The treecode is validated against direct summation for accuracy, and scaling is demonstrated from N = 10³ to N = 10⁶. An optional FMM (O(N)) implementation adds local expansions and a systematic downward pass for further speedup.
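A sketch of the 2D Morton encoding step, assuming particle coordinates pre-normalized to $[0, 1)$ and 16 bits per axis (the function names are illustrative):

```cpp
#include <cstdint>

// Spread the low 16 bits of v so a zero sits between each original bit:
// ...b3 b2 b1 b0  ->  ...0 b3 0 b2 0 b1 0 b0
static uint32_t spreadBits(uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Interleave quantized x and y bits into a Z-order key. Sorting particles
// by this key places spatially nearby particles nearby in memory, which is
// what makes the implicit quadtree construction and GPU traversal coherent.
uint32_t morton2D(double x, double y) {
    uint32_t xi = static_cast<uint32_t>(x * 65535.0);
    uint32_t yi = static_cast<uint32_t>(y * 65535.0);
    return spreadBits(xi) | (spreadBits(yi) << 1);
}
```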
Phase 3: Extends the solver to handle two or more bodies simultaneously. Each body has its own panel set and sheds its own wake, but all bodies share a single velocity field. The panel method linear system is coupled across all bodies. Applications include tandem airfoil wake interaction, formation flight, and gust encounter problems. Bodies can translate and rotate independently.
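In sketch form, for two bodies the coupled system has the block structure

$$
\begin{pmatrix} A^{(11)} & A^{(12)} \\ A^{(21)} & A^{(22)} \end{pmatrix}
\begin{pmatrix} \boldsymbol{\Gamma}^{(1)} \\ \boldsymbol{\Gamma}^{(2)} \end{pmatrix}
=
\begin{pmatrix} \mathbf{b}^{(1)} \\ \mathbf{b}^{(2)} \end{pmatrix}
$$

where the diagonal blocks are each body's self-influence, the off-diagonal blocks are the cross-body influence coefficients, and the right-hand sides carry freestream, wake, and body-motion contributions. When bodies move relative to one another, the cross blocks become time-dependent and the factorization must be refreshed.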
Phase 4: Extends from 2D point vortices to 3D vortex filament segments. The 3D Biot-Savart law is integrated analytically for straight filament segments. A finite wing is modeled with a lifting line (bound filament along the quarter-chord) and trailing vortex filaments shed at each spanwise station. The key physical phenomenon is tip vortex rollup: the trailing vortex sheet rolls up into concentrated tip vortex cores under its own self-induced velocity. Validated against the analytical induced drag result for the elliptic wing.
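For a straight segment with endpoints $\mathbf{x}_1$, $\mathbf{x}_2$ and evaluation point $\mathbf{x}$, writing $\mathbf{r}_1 = \mathbf{x} - \mathbf{x}_1$, $\mathbf{r}_2 = \mathbf{x} - \mathbf{x}_2$, and $\mathbf{r}_0 = \mathbf{x}_2 - \mathbf{x}_1$, the closed-form induced velocity (in the form given by Katz & Plotkin) is

$$
\mathbf{u} \;=\; \frac{\Gamma}{4\pi}\,
\frac{\mathbf{r}_1 \times \mathbf{r}_2}{|\mathbf{r}_1 \times \mathbf{r}_2|^2}\;
\mathbf{r}_0 \cdot \left(\frac{\mathbf{r}_1}{|\mathbf{r}_1|} - \frac{\mathbf{r}_2}{|\mathbf{r}_2|}\right)
$$

which must be regularized (for example with a finite core radius added to the denominator) to remain bounded on and near the segment axis.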
| Component | Choice | Rationale |
|---|---|---|
| GPU Kernels | CUDA C++ | Direct control over shared memory tiling, warp-level operations for treecode traversal |
| Host Logic | C++ | Performance-critical panel solver, tree construction |
| Prototyping | Python + CuPy/Numba | Acceptable for Phase 1-2 development; CUDA C++ preferred for Phase 3-4 |
| Visualization | VTK (.vtp) + ParaView | 3D wake visualization, especially critical for Phase 4 |
| 2D Animation | matplotlib | Quick 2D wake animations, force history plots |
| Build System | CMake | Standard for CUDA C++ projects |
No external CFD libraries are used. Every component — geometry generation, panel method, Biot-Savart kernels, treecode, force computation, I/O — is implemented from scratch.
All detailed mathematical derivations, implementation specifications, and validation procedures are in the docs/ directory:
| Document | What's Inside |
|---|---|
| `docs/theory.md` | Governing equations (vorticity transport), Biot-Savart law (2D component form), regularization kernels (Gaussian, algebraic), viscous diffusion (core spreading, PSE), Kutta condition physics, Kelvin's circulation theorem, non-dimensionalization table, compressible flow extension |
| `docs/phase1_2d_single_body.md` | Panel representation (lumped vortex vs constant-strength), influence coefficient formulas, no-penetration boundary condition assembly, Kutta condition implementation (wake-shedding approach), complete timestepping algorithm, RK2 integration, force computation (impulse + unsteady Bernoulli), NACA 4-digit geometry generation with cosine spacing |
| `docs/phase2_gpu_fast_summation.md` | Barnes-Hut treecode (quadtree construction, multipole acceptance criterion, higher-order expansions), FMM (P2M/M2M/M2L/L2L/L2P/P2P operators, translation formulas), GPU implementation (Morton codes, implicit tree, warp-level traversal, shared-memory P2P), accuracy control methodology |
| `docs/phase3_multi_body.md` | Coupled multi-body panel system formulation, cross-body influence coefficients, moving/rotating body boundary conditions, Body class + shared WakeField architecture |
| `docs/phase4_3d_extension.md` | 3D Biot-Savart law for filament segments (integrated closed-form + regularized version), lifting line + free wake vortex lattice method, tip vortex rollup physics |
| `docs/validation.md` | Wagner function test (impulsive flat plate), NACA 0012 steady lift, Theodorsen function test (pitching airfoil), treecode accuracy vs direct sum, tandem airfoil qualitative test, elliptic wing induced drag, performance scaling targets |
| `docs/references.md` | Essential reading: Katz & Plotkin, Cottet & Koumoutsakos, Barba GPU vortex codes, Greengard & Rokhlin FMM, Barnes & Hut treecode, Yokota & Barba CUDA implementation guide; validation data sources |
```
VortexGPU/
├── CMakeLists.txt
├── src/
│   ├── core/         # Particle/Panel/Body structs, Biot-Savart kernels (CPU + GPU)
│   ├── geometry/     # NACA airfoil generation, panel discretization
│   ├── solver/       # Panel linear system, wake management, time integration, forces
│   ├── treecode/     # Morton codes, quadtree construction, treecode traversal kernel
│   ├── multibody/    # Coupled multi-body solver, Body class with reference frame
│   ├── io/           # VTK and CSV output writers
│   └── main.cpp      # Entry point and config parsing
├── tests/            # Unit and validation tests
├── configs/          # JSON simulation configs (single airfoil, tandem, etc.)
├── scripts/          # Python post-processing (force plots, wake animation)
├── benchmarks/       # Performance measurement scripts
└── docs/             # Full mathematical and implementation documentation
```
| Configuration | N Particles | Time per Step | Hardware |
|---|---|---|---|
| Phase 1 (O(N²) direct, GPU) | 10,000 | ~1 ms | Modern GPU |
| Phase 1 (O(N²) direct, GPU) | 100,000 | ~100 ms | Modern GPU |
| Phase 2 (Treecode, GPU) | 100,000 | ~5-10 ms | Modern GPU |
| Phase 2 (Treecode, GPU) | 1,000,000 | ~50-100 ms | Modern GPU |
The treecode provides 10-20× speedup over direct summation at N = 100,000, enabling long-time simulations with dense wakes that would be impractical with O(N²) scaling.
- Double precision (fp64) for positions and circulations: Single precision causes visible drift in vortex trajectories over long simulations. Mixed precision (fp64 positions, fp32 far-field kernel) is acceptable since treecode approximation error exceeds fp32 rounding error.
- RK2 minimum for time integration: Forward Euler produces unacceptable trajectory drift. RK2 (midpoint method) doubles the cost per step but is essential for accuracy.
- Cosine panel spacing: Clustering panels near the leading and trailing edges is critical for accurate pressure distributions; uniform spacing gives poor results. The spacing formula is sketched after this list.
- Treecode before FMM: The Barnes-Hut treecode is simpler to implement on GPU (regular traversal pattern) and provides O(N log N) scaling. FMM (O(N)) is only worth the additional complexity for very large N.
- Inviscid first, viscosity later: Start with the inviscid approximation (vorticity conservation). Core spreading for viscous diffusion can be added as an optional feature.
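For reference, the cosine spacing places chordwise stations at

$$
x_k \;=\; \frac{c}{2}\left(1 - \cos\frac{k\pi}{N}\right), \qquad k = 0, 1, \dots, N
$$

which clusters points near both the leading edge ($x = 0$) and the trailing edge ($x = c$), where curvature and loading gradients are largest.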
See LICENSE for details.