
Commit a199b84

mivertowski and claude committed
Update paper to paradigm-focused ecosystem coverage with arXiv template
- Convert all sections from RingKernel-focused to paradigm-focused
- Add ecosystem coverage: DotCompute, Orleans.GpuBridge, RustGraph
- Switch to arXiv-style template with Latin Modern fonts
- Add cross-implementation benchmarks and comparison tables
- Add ORCID and author affiliation
- Update abstract, intro, background, design, implementation, evaluation, discussion, and conclusion sections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 8e967b5 commit a199b84

12 files changed

Lines changed: 846 additions & 310 deletions

docs/paper/main.pdf

166 KB
Binary file not shown.

docs/paper/main.tex

Lines changed: 114 additions & 44 deletions
@@ -1,65 +1,109 @@
-% RingKernel: A GPU-Native Persistent Actor Model
-% Technical Paper - Main Document
+% The GPU-Native Persistent Actor Model
+% Technical Paper - arXiv Preprint Style
 %
-% Target venues: arXiv (cs.DC, cs.PL), ASPLOS, EuroSys, PLDI, PPoPP
+% Describes a paradigm for treating GPU compute units as actors with
+% persistent kernels, lock-free message passing, and causal ordering.
 %
-\documentclass[11pt,a4paper]{article}
-
-% Page geometry
-\usepackage[margin=1in]{geometry}
+% Implementations: RingKernel (Rust), DotCompute (.NET), Orleans.GpuBridge, RustGraph
+%
+% Target: arXiv (cs.DC, cs.PL, cs.AR)
+%
+\documentclass[11pt,letterpaper]{article}
+
+%% ============================================================================
+%% arXiv-style formatting
+%% ============================================================================
+
+% Page geometry - generous margins for readability
+\usepackage[
+letterpaper,
+top=1in,
+bottom=1in,
+left=1.25in,
+right=1.25in
+]{geometry}
+
+% Typography
+\usepackage[T1]{fontenc}
+\usepackage{lmodern} % Latin Modern fonts
+\usepackage{microtype} % Improved typography
+\usepackage{setspace}
+\setstretch{1.15} % Slightly increased line spacing
+
+% Math
+\usepackage{amsmath}
+\usepackage{amssymb}
+\usepackage{amsthm}
 
-% Packages
+% Tables and figures
 \usepackage{booktabs}
+\usepackage{multirow}
 \usepackage{subcaption}
+\usepackage{graphicx}
+\usepackage{float}
+
+% Code listings
 \usepackage{listings}
 \usepackage{xcolor}
+
+% Graphics
 \usepackage{tikz}
 \usepackage{pgfplots}
-% algorithm/algpseudocode removed - use lstlisting for pseudocode
+\pgfplotsset{compat=1.16}
+\usetikzlibrary{shapes,arrows,positioning,fit,calc}
+
+% References and links
 \usepackage{hyperref}
-\usepackage{amsmath}
-\usepackage{amssymb}
-\usepackage{graphicx}
-\usepackage{multirow}
 \usepackage{url}
-\usepackage{natbib}
+\usepackage[numbers,sort&compress]{natbib}
 
-\pgfplotsset{compat=1.16}
-\usetikzlibrary{shapes,arrows,positioning,fit,calc}
+% Author handling
+\usepackage{authblk}
+\usepackage{orcidlink} % ORCID icons
 
-% Hyperref setup
+%% ============================================================================
+%% Hyperref setup
+%% ============================================================================
 \hypersetup{
 colorlinks=true,
-linkcolor=blue,
+linkcolor=blue!70!black,
 filecolor=magenta,
-urlcolor=cyan,
-citecolor=blue,
+urlcolor=blue!70!black,
+citecolor=green!50!black,
+pdftitle={The GPU-Native Persistent Actor Model},
+pdfauthor={Michael Ivertowski},
+pdfsubject={GPU Computing, Actor Model, Distributed Systems},
+pdfkeywords={Actor Model, GPU, CUDA, Persistent Kernels, HLC}
 }
 
-% Code listing style for Rust
+%% ============================================================================
+%% Code listing styles
+%% ============================================================================
+
+% Rust
 \lstdefinelanguage{Rust}{
 keywords={fn, let, mut, if, else, match, for, while, loop, return, struct, enum, impl, trait, pub, use, mod, async, await, self, Self, where, type, const, static, unsafe, extern, crate, super},
-keywordstyle=\color{blue}\bfseries,
-keywords=[2]{i32, i64, u32, u64, f32, f64, bool, usize, isize, String, Vec, Option, Result, Arc, Box},
-keywordstyle=[2]\color{teal},
+keywordstyle=\color{blue!80!black}\bfseries,
+keywords=[2]{i32, i64, u32, u64, f32, f64, bool, usize, isize, String, Vec, Option, Result, Arc, Box, AtomicU32, AtomicU64},
+keywordstyle=[2]\color{teal!80!black},
 comment=[l]{//},
 morecomment=[s]{/*}{*/},
 commentstyle=\color{gray}\itshape,
-stringstyle=\color{red},
+stringstyle=\color{red!70!black},
 morestring=[b]",
 basicstyle=\ttfamily\small,
 breaklines=true,
 showstringspaces=false,
 tabsize=2,
 }
 
-% Code listing style for CUDA
+% CUDA
 \lstdefinelanguage{CUDA}{
 language=C++,
-morekeywords={__global__, __device__, __shared__, __host__, threadIdx, blockIdx, blockDim, gridDim, atomicAdd, atomicCAS, __syncthreads},
-keywordstyle=\color{blue}\bfseries,
+morekeywords={__global__, __device__, __shared__, __host__, threadIdx, blockIdx, blockDim, gridDim, atomicAdd, atomicCAS, __syncthreads, __threadfence},
+keywordstyle=\color{blue!80!black}\bfseries,
 commentstyle=\color{gray}\itshape,
-stringstyle=\color{red},
+stringstyle=\color{red!70!black},
 basicstyle=\ttfamily\small,
 breaklines=true,
 showstringspaces=false,
@@ -68,39 +112,65 @@
 \lstset{
 language=Rust,
 frame=single,
+framerule=0.5pt,
+rulecolor=\color{gray!50},
 numbers=left,
 numberstyle=\tiny\color{gray},
 xleftmargin=2em,
 framexleftmargin=1.5em,
+backgroundcolor=\color{gray!5},
+captionpos=b,
 }
 
-% Document metadata
-\title{\textbf{RingKernel: A GPU-Native Persistent Actor Model for\\High-Performance Concurrent Computing}}
+%% ============================================================================
+%% Custom commands
+%% ============================================================================
+\newcommand{\arxiv}[1]{\href{https://arxiv.org/abs/#1}{arXiv:#1}}
+\newcommand{\github}[1]{\href{https://github.com/#1}{\texttt{github.com/#1}}}
 
-\author{
-Michael Ivertowski\\
-\textit{Independent Researcher}\\
-Zurich, Switzerland\\
-\texttt{mivertowski@outlook.com}
+%% ============================================================================
+%% Document metadata
+%% ============================================================================
+
+\title{%
+\LARGE\textbf{The GPU-Native Persistent Actor Model:}\\[0.3em]
+\Large\textbf{Bringing Actor Semantics to Massively Parallel Hardware}
+}
+
+\author[1]{Michael Ivertowski~\orcidlink{0009-0008-7829-2249}}
+\affil[1]{%
+Ernst \& Young AG\\
+Zurich, Switzerland\\
+\texttt{michael.ivertowski@ch.ey.com}
 }
 
-\date{\today}
+\date{%
+January 2026\\[1em]
+\small\textit{Preprint. Under review.}
+}
 
+%% ============================================================================
+%% Document
+%% ============================================================================
 \begin{document}
 
 \maketitle
 
-% Abstract
+%% Abstract
 \begin{abstract}
+\noindent
 \input{sections/00-abstract}
 \end{abstract}
 
 \vspace{1em}
-\noindent\textbf{Keywords:} Actor Model, GPU Computing, Persistent Kernels, Message Passing, CUDA, Hybrid Logical Clocks, Lock-Free Algorithms
+\noindent\textbf{Keywords:} Actor Model, GPU Computing, Persistent Kernels, Message Passing, Hybrid Logical Clocks, Lock-Free Algorithms, CUDA, WebGPU, Distributed Systems, Graph Analytics
 
-\vspace{1em}
+\vspace{0.5em}
+\noindent\textbf{ACM CCS:} Computer systems organization $\rightarrow$ Parallel architectures; Software and its engineering $\rightarrow$ Concurrent programming structures
+
+\vspace{1.5em}
 
-% Main content sections
+%% Main content sections
 \input{sections/01-introduction}
 \input{sections/02-background}
 \input{sections/03-related-work}
@@ -110,17 +180,17 @@
 \input{sections/07-discussion}
 \input{sections/08-conclusion}
 
-% Acknowledgments
+%% Acknowledgments
 \section*{Acknowledgments}
 We thank the open-source community for their contributions to the CUDA ecosystem,
 particularly the cudarc project for Rust CUDA bindings. We also acknowledge the
 foundational work on the actor model by Carl Hewitt and colleagues.
 
-% Bibliography
+%% Bibliography
 \bibliographystyle{plainnat}
 \bibliography{references}
 
-% Appendix
+%% Appendix
 \appendix
 \input{sections/09-appendix}
 

docs/paper/references.bib

Lines changed: 34 additions & 0 deletions
@@ -266,3 +266,37 @@ @misc{nvidia2023cooperative
 year = {2023},
 howpublished = {\url{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cooperative-groups}},
 }
+
+%% ===== GPU-Native Actor Ecosystem =====
+
+@misc{ringkernel2025,
+author = {Ivertowski, Michael},
+title = {RingKernel: GPU-Native Persistent Actor Model for Rust},
+year = {2025},
+howpublished = {\url{https://github.com/mivertowski/RustCompute}},
+note = {Rust implementation with CUDA/WebGPU backends},
+}
+
+@misc{dotcompute2025,
+author = {Ivertowski, Michael},
+title = {DotCompute: Universal Compute Acceleration for .NET},
+year = {2025},
+howpublished = {\url{https://github.com/mivertowski/DotCompute}},
+note = {.NET 9 implementation with CUDA/OpenCL/Metal backends},
+}
+
+@misc{orleansgpubridge2025,
+author = {Ivertowski, Michael},
+title = {Orleans.GpuBridge: GPU Acceleration for Microsoft Orleans},
+year = {2025},
+howpublished = {\url{https://github.com/mivertowski/Orleans.GpuBridge}},
+note = {Integration of GPU-native actors with Orleans virtual actors},
+}
+
+@misc{rustgraph2025,
+author = {Ivertowski, Michael},
+title = {RustGraph: Living Graph Database with GPU-Native Actors},
+year = {2025},
+howpublished = {\url{https://github.com/mivertowski/RustGraph}},
+note = {Graph nodes as persistent GPU actors with 64+ living analytics},
+}

docs/paper/sections/00-abstract.tex

Lines changed: 16 additions & 14 deletions
@@ -4,20 +4,22 @@
 The actor model, introduced by Hewitt in 1973, has become foundational for building
 concurrent and distributed systems. However, existing implementations target CPU
 architectures, leaving GPU parallelism largely unexplored for actor-based computation.
-We present \textbf{RingKernel}, a GPU-native persistent actor model that treats GPU
+We present the \textbf{GPU-Native Persistent Actor Model}, a paradigm that treats GPU
 compute units as long-running actors with lock-free message passing and causal ordering.
 
-Our key contributions are: (1) a formal extension of the actor model for GPU execution
-with Host-to-Kernel (H2K), Kernel-to-Host (K2H), and Kernel-to-Kernel (K2K) messaging
-channels; (2) a 128-byte \texttt{ControlBlock} structure for GPU-resident actor lifecycle
-management; (3) integration of Hybrid Logical Clocks (HLC) for causal ordering across
-thousands of concurrent GPU actors; and (4) a Rust-to-CUDA transpiler that generates
-persistent kernel code from high-level actor definitions.
+This paper describes \textbf{RingKernel}, the Rust implementation of this paradigm,
+alongside three companion frameworks: \textbf{DotCompute} (.NET), \textbf{Orleans.GpuBridge}
+(Microsoft Orleans integration), and \textbf{RustGraph} (living graph database). Together,
+these systems demonstrate the broad applicability of GPU-native actors.
 
-We evaluate RingKernel on NVIDIA RTX Ada GPUs, demonstrating that persistent GPU actors
-achieve \textbf{11,327$\times$ lower latency} for interactive commands compared to
-traditional kernel launches (0.03$\mu$s vs 317$\mu$s). For mixed workloads combining
-computation with interactive commands, RingKernel achieves \textbf{2.7$\times$ higher
-throughput} than the traditional launch-per-operation model. Our system bridges the
-gap between high-level actor semantics and GPU hardware capabilities, enabling new
-classes of interactive GPU applications.
+Our key contributions are: (1) formalization of GPU actor semantics with Host-to-Kernel (H2K),
+Kernel-to-Host (K2H), and Kernel-to-Kernel (K2K) messaging channels; (2) a 128-byte
+\texttt{ControlBlock} structure for GPU-resident actor lifecycle management; (3) integration
+of Hybrid Logical Clocks (HLC) for causal ordering across thousands of concurrent GPU actors;
+and (4) cross-language implementations proving the paradigm's universality.
+
+We evaluate on NVIDIA RTX Ada GPUs, demonstrating that persistent GPU actors achieve
+\textbf{11,327$\times$ lower latency} for interactive commands compared to traditional
+kernel launches (0.03$\mu$s vs 317$\mu$s). For mixed workloads, GPU-native actors achieve
+\textbf{2.7$\times$ higher throughput}, enabling new classes of interactive GPU applications
+including real-time fraud detection, living graph analytics, and distributed digital twins.
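
The abstract names Hybrid Logical Clocks (HLC) as the mechanism for causal ordering across GPU actors. As a rough illustration only, the standard HLC update rules (a local or send event, and a receive event) can be sketched in host-side Rust as below. The names `HlcTimestamp`, `Hlc`, `tick`, and `recv` are hypothetical and are not taken from RingKernel, and the physical time `pt` is assumed to be supplied by the caller (on a GPU this might be a device clock reading).

```rust
/// Hypothetical HLC timestamp (illustrative names, not RingKernel's API).
/// Deriving `Ord` compares `l` first, then `c`, which is the HLC ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    /// Largest physical time observed so far.
    pub l: u64,
    /// Logical counter that breaks ties among events sharing the same `l`.
    pub c: u32,
}

/// Per-actor clock state.
pub struct Hlc {
    last: HlcTimestamp,
}

impl Hlc {
    pub fn new() -> Self {
        Hlc { last: HlcTimestamp { l: 0, c: 0 } }
    }

    /// Local or send event. `pt` is the caller-supplied physical time.
    pub fn tick(&mut self, pt: u64) -> HlcTimestamp {
        let prev = self.last;
        let l = prev.l.max(pt);
        // If physical time did not advance past `l`, bump the counter.
        let c = if l == prev.l { prev.c + 1 } else { 0 };
        self.last = HlcTimestamp { l, c };
        self.last
    }

    /// Receive event: merge the timestamp carried by an incoming message.
    pub fn recv(&mut self, pt: u64, msg: HlcTimestamp) -> HlcTimestamp {
        let prev = self.last;
        let l = prev.l.max(msg.l).max(pt);
        let c = if l == prev.l && l == msg.l {
            prev.c.max(msg.c) + 1
        } else if l == prev.l {
            prev.c + 1
        } else if l == msg.l {
            msg.c + 1
        } else {
            0
        };
        self.last = HlcTimestamp { l, c };
        self.last
    }
}
```

Under these rules a received message always yields a timestamp strictly greater than the one it carried, which is the property that lets actors order causally related messages without a shared clock.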

docs/paper/sections/01-introduction.tex

Lines changed: 38 additions & 10 deletions
@@ -52,12 +52,35 @@ \subsection{Persistent Kernels: A Partial Solution}
 
 \subsection{Our Contribution: GPU-Native Actors}
 
-We present \textbf{RingKernel}, a system that applies actor model semantics to GPU
-computing. Our key insight is that GPU threads (or thread blocks) can be viewed as
-actors: they have private state (registers, shared memory), communicate via messages
-(through lock-free queues), and run persistently.
+We present the \textbf{GPU-Native Persistent Actor Model}, a paradigm that applies
+actor semantics to GPU computing. Our key insight is that GPU threads (or thread blocks)
+can be viewed as actors: they have private state (registers, shared memory), communicate
+via messages (through lock-free queues), and run persistently.
 
-RingKernel makes the following contributions:
+This paradigm is realized through four complementary implementations:
+
+\begin{itemize}
+\item \textbf{RingKernel} (Rust): The reference implementation described in this paper,
+featuring a Rust-to-CUDA transpiler and comprehensive runtime.
+
+\item \textbf{DotCompute} (.NET 9/C\#): A production-grade framework with multi-backend
+support (CUDA, OpenCL, Metal), LINQ-to-GPU compilation, and Native AOT compatibility.
+
+\item \textbf{Orleans.GpuBridge} (.NET/Orleans): Integration with Microsoft Orleans'
+virtual actor model, enabling distributed GPU actors across Orleans clusters with
+hypergraph support and temporal causality.
+
+\item \textbf{RustGraph} (Rust): A living graph database where nodes and edges are
+persistent GPU actors, maintaining 64+ analytics algorithms via continuous message
+propagation with O(1) query latency.
+\end{itemize}
+
+Together, these systems demonstrate that GPU-native actors are a universal paradigm
+applicable across languages, frameworks, and domains.
+
+\subsection{Contributions}
+
+This paper makes the following contributions:
 
 \begin{enumerate}
 \item \textbf{Formalization of GPU Actor Semantics} (\S\ref{sec:design}): We extend
@@ -73,9 +96,13 @@ \subsection{Our Contribution: GPU-Native Actors}
 HLC~\cite{kulkarni2014hlc} for causal ordering of messages across GPU actors,
 enabling distributed systems semantics on massively parallel hardware.
 
-\item \textbf{Rust-to-CUDA Transpilation} (\S\ref{sec:implementation}): We provide
-a DSL and transpiler that generates persistent kernel CUDA code from high-level
-Rust actor definitions, including automatic message envelope handling.
+\item \textbf{Cross-Language Implementations} (\S\ref{sec:implementation}): We provide
+implementations in Rust and .NET, with transpilers generating CUDA, WGSL, and MSL,
+demonstrating the paradigm's language-independence.
+
+\item \textbf{Domain-Specific Applications}: We apply GPU-native actors to FDTD
+simulation (RingKernel), enterprise accounting (DotCompute), distributed virtual
+actors (Orleans.GpuBridge), and living graph analytics (RustGraph).
 
 \item \textbf{Comprehensive Evaluation} (\S\ref{sec:evaluation}): We demonstrate
 11,327$\times$ lower command latency and 2.7$\times$ higher mixed-workload
@@ -86,7 +113,8 @@ \subsection{Paper Organization}
 
 The remainder of this paper is organized as follows. Section~\ref{sec:background}
 provides background on the actor model and GPU programming. Section~\ref{sec:related}
-discusses related work. Section~\ref{sec:design} presents the RingKernel system design.
-Section~\ref{sec:implementation} details the implementation. Section~\ref{sec:evaluation}
+discusses related work including our companion implementations. Section~\ref{sec:design}
+presents the GPU-native actor system design. Section~\ref{sec:implementation} details
+the RingKernel implementation and cross-language ecosystem. Section~\ref{sec:evaluation}
 evaluates performance. Section~\ref{sec:discussion} discusses limitations and future
 work. Section~\ref{sec:conclusion} concludes.
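
The introduction describes actors that communicate through lock-free queues. As a hedged, host-side sketch of that idea, the following single-producer/single-consumer ring buffer passes 64-bit messages using only atomics. The type `SpscRing` and its methods are hypothetical names chosen for illustration. The actual RingKernel queues, their message envelope format, and their GPU memory placement are not shown in this diff and may differ.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Illustrative SPSC ring buffer (hypothetical, not RingKernel's queue).
/// Exactly one thread may push and exactly one thread may pop.
pub struct SpscRing {
    slots: Vec<AtomicU64>,
    mask: usize,
    /// Next index to read; written only by the consumer.
    head: AtomicUsize,
    /// Next index to write; written only by the producer.
    tail: AtomicUsize,
}

impl SpscRing {
    /// Capacity must be a power of two so indices can be wrapped with a mask.
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        SpscRing {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns the message back if the ring is full.
    pub fn try_push(&self, msg: u64) -> Result<(), u64> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == self.slots.len() {
            return Err(msg);
        }
        self.slots[tail & self.mask].store(msg, Ordering::Relaxed);
        // Release publishes the slot write before the new tail becomes visible.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    /// Consumer side: returns `None` if the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let msg = self.slots[head & self.mask].load(Ordering::Relaxed);
        // Release hands the slot back to the producer after the read.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(msg)
    }
}
```

Neither side ever blocks or takes a lock: a full or empty ring is reported to the caller, who decides whether to retry. That is the behavior a persistent kernel polling its input queue would rely on.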
