Commit 3294baf

committed
TOC
1 parent d3748c5 commit 3294baf

2 files changed

Lines changed: 143 additions & 0 deletions

File tree

OpenTOC/fcpc25.html

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
1+
<html xmlns:bkstg="http://www.atypon.com/backstage-ns" xmlns:urlutil="java:com.atypon.literatum.customization.UrlUtil" xmlns:pxje="java:com.atypon.frontend.services.impl.PassportXslJavaExtentions"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta http-equiv="Content-Style-Type" content="text/css"><style type="text/css">
2+
#DLtoc {
3+
font: normal 12px/1.5em Arial, Helvetica, sans-serif;
4+
}
5+
6+
#DLheader {
7+
}
8+
#DLheader h1 {
9+
font-size:16px;
10+
}
11+
12+
#DLcontent {
13+
font-size:12px;
14+
}
15+
#DLcontent h2 {
16+
font-size:14px;
17+
margin-bottom:5px;
18+
}
19+
#DLcontent h3 {
20+
font-size:12px;
21+
padding-left:20px;
22+
margin-bottom:0px;
23+
}
24+
25+
#DLcontent ul{
26+
margin-top:0px;
27+
margin-bottom:0px;
28+
}
29+
30+
.DLauthors li{
31+
display: inline;
32+
list-style-type: none;
33+
padding-right: 5px;
34+
}
35+
36+
.DLauthors li:after{
37+
content:",";
38+
}
39+
.DLauthors li.nameList.Last:after{
40+
content:"";
41+
}
42+
43+
.DLabstract {
44+
padding-left:40px;
45+
padding-right:20px;
46+
display:block;
47+
}
48+
49+
.DLformats li{
50+
display: inline;
51+
list-style-type: none;
52+
padding-right: 5px;
53+
}
54+
55+
.DLformats li:after{
56+
content:",";
57+
}
58+
.DLformats li.formatList.Last:after{
59+
content:"";
60+
}
61+
62+
.DLlogo {
63+
vertical-align:middle;
64+
padding-right:5px;
65+
border:none;
66+
}
67+
68+
.DLcitLink {
69+
margin-left:20px;
70+
}
71+
72+
.DLtitleLink {
73+
margin-left:20px;
74+
}
75+
76+
.DLotherLink {
77+
margin-left:0px;
78+
}
79+
80+
</style><title>FCPC '25: Proceedings of the 1st FastCode Programming Challenge</title></head><body><div id="DLtoc"><div id="DLheader"><h1>FCPC '25: Proceedings of the 1st FastCode Programming Challenge</h1><a class="DLcitLink" title="Go to the ACM Digital Library for additional information about this proceeding" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/proceedings/10.1145/3711708"><img class="DLlogo" alt="Digital Library logo" height="30" src="https://dl.acm.org/specs/products/acm/releasedAssets/images/footer-logo1.png">
81+
Full Citation in the ACM Digital Library
82+
</a></div><div id="DLcontent">
83+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723442">ParaCoder: Parallel Code Generation with Large Language Model</a></h3><ul class="DLauthors"><li class="nameList">Xiaowen Huang</li><li class="nameList">Xu Zhang</li><li class="nameList">Lvfang Tao</li><li class="nameList">Renjie Mao</li><li class="nameList">Nan Zhou</li><li class="nameList">Wenxi Zhu</li><li class="nameList">Minwen Deng</li><li class="nameList">Jintao Meng</li><li class="nameList">Yanjie Wei</li><li class="nameList">Amelie Chi Zhou</li><li class="nameList">Bingqiang Wang</li><li class="nameList Last">Shengzhong Feng</li></ul><div class="DLabstract"><div style="display:inline">
84+
<p>High-performance parallel code generation is a complex and fascinating area in computer science that focuses on producing code that executes as quickly and efficiently as possible. In our paper, we design a new architecture for a parallel code generation agent with four interconnected LLM components: <em>Memory, Planning, Tools, and Action</em>. It also incorporates two techniques to improve the performance of the parallel code: data augmentation, and prompting with retrieval-augmented editing. Data augmentation is implemented by extracting and processing the PIE dataset, together with a synthetic dataset generated by LLMs with the ParEval benchmark. Finally, planning-oriented prompting, code verification, and retrieval-augmented editing are used to improve the actual performance of the LLM-generated code. The evaluation results confirm that speedups of roughly 6.06× and 5.13× are achieved using the Qwen2.5-Coder-7B-Instruct and Qwen2.5-Coder-14B-Instruct models, respectively.</p>
85+
</div></div>
86+
87+
88+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723443">An Efficient Implementation of Parallel Breadth-first Search</a></h3><ul class="DLauthors"><li class="nameList">Pao-I Chen</li><li class="nameList Last">Tsung-Wei Huang</li></ul><div class="DLabstract"><div style="display:inline">
89+
<p>Breadth-first search (BFS) is a fundamental building block of many graph algorithms, such as shortest path finding, network flow analysis, and connected component detection. As graph sizes continue to increase, efficient implementations of parallel BFS across multiple cores are critical to the design of scalable graph algorithms. In this paper, we introduce an efficient implementation of the popular bi-directional BFS (BD-BFS) algorithm. Evaluated on the Speedcode platform [3], it achieves a 38.07% throughput improvement (3.01B/s vs 2.18B/s) over the conventional BD-BFS implementation.</p>
90+
</div></div>
91+
92+
93+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723445">HisOrder: A Historical Frontier-Aware Graph Reordering for Efficient BFS</a></h3><ul class="DLauthors"><li class="nameList">Xinmiao Zhang</li><li class="nameList">Xiaorong Qin</li><li class="nameList">Lei Zhang</li><li class="nameList Last">Cheng Liu</li></ul><div class="DLabstract"><div style="display:inline">
94+
<p>The efficiency of Breadth-first Search (BFS) is predominantly constrained by its poor locality and severe load imbalance. The former primarily arises from random accesses to neighbors of active vertices (<em>a.k.a.</em> frontiers), while the latter is caused by the skewed distribution of edge traversals among CPU cores. Graph reordering aims to improve data locality by contiguously placing vertices that are likely to be consecutively accessed, and can potentially enhance load balance by distributing more balanced workloads to CPU cores. Prior graph reordering methods generally adopt static structural metrics like vertex degree and graph community to explore vertex locality. However, the data access patterns and load distribution of BFS are intrinsically contingent upon the frontiers generated at runtime, which can vary significantly across different inputs and iterations. To this end, we propose a frontier-based locality feature vector that is capable of characterizing the locality of the frontiers in BFS requests. With this feature vector, we propose a graph reordering method named HisOrder, which uses the history of multiple BFS requests to construct a feature vector and explore the locality of vertices with <em>K</em>-Means. Furthermore, HisOrder fine-tunes the vertex order to balance workloads among CPU cores by predicting runtime computing intensity using the learned frontier distribution. To enhance the efficiency of HisOrder, we eliminate redundant distance computations in <em>K</em>-Means, thereby achieving a significant acceleration. Based on an evaluation on four small-world graph datasets, HisOrder achieves an average performance improvement of 1.20× with reasonable overhead, and consistently surpasses state-of-the-art graph reordering methods.</p>
95+
</div></div>
96+
97+
98+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723451">Parallel Breadth-First Search Optimization Strategies</a></h3><ul class="DLauthors"><li class="nameList">Marati Bhaskar</li><li class="nameList Last">Raghavendra Kanakagiri</li></ul><div class="DLabstract"><div style="display:inline">
99+
<p>Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance, and synchronization overhead. In this paper, we introduce a set of optimization strategies for parallel BFS on multicore systems, including hybrid traversal, a bitmap-based visited set, and a novel non-atomic distance update mechanism. We evaluate these optimizations across two different architectures (a 24-core Intel Xeon platform and a 128-core AMD EPYC system) using a diverse set of synthetic and real-world graphs. Our results demonstrate that the effectiveness of optimizations varies significantly based on graph characteristics and hardware architecture. For small-diameter graphs, our hybrid BFS implementation achieves speedups of 3–8× on the Intel platform and 3–10× on the AMD system compared to a conventional parallel BFS implementation. However, the performance on large-diameter graphs is more nuanced, with some of the optimizations showing varied performance across platforms, including performance degradation in some cases.</p>
100+
</div></div>
101+
102+
103+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723452">Cache-optimized BFS on multi-core CPUs</a></h3><ul class="DLauthors"><li class="nameList">Salvatore Domenico Andaloro</li><li class="nameList">Thomas Pasquali</li><li class="nameList Last">Flavio Vella</li></ul><div class="DLabstract"><div style="display:inline">
104+
<p>Breadth-First Search (BFS) performance on shared-memory systems is often limited by irregular memory access and cache inefficiencies. This work presents two optimizations for BFS graph traversal: a bitmap-based algorithm designed for small-diameter graphs and MergedCSR, a graph storage format that improves cache locality for large-scale graphs. Experimental results on real-world datasets show an average 1.3× speedup over a state-of-the-art implementation, with MergedCSR reducing RAM accesses by approximately 15%.</p>
105+
</div></div>
106+
107+
108+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723449">Optimized Parallel Breadth-First Search with Adaptive Strategies</a></h3><ul class="DLauthors"><li class="nameList">Chaoqun Li</li><li class="nameList">Runbang Hu</li><li class="nameList">Xiaojiang Du</li><li class="nameList Last">Yuede Ji</li></ul><div class="DLabstract"><div style="display:inline">
109+
<p>Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores a graph level by level. It has been widely used in real-world applications, such as social network analysis, scientific computing, and web crawling. However, achieving high performance for BFS on large-scale graphs remains a challenging task due to irregular memory access patterns, diverse graph structures, and the necessity for efficient parallelization. This paper addresses these challenges by designing a highly optimized parallel BFS implementation based on the top-down and bottom-up traversal strategies. It further integrates several key innovations, including graph type-aware computation strategy selection, graph pruning, two-level bottom-up traversal, and efficient parallel implementation. We evaluate our method on 11 graphs that are diverse in size, diameter, and density. On a CPU server with 48 threads, our method achieves an average speedup of 9.5× over the serial BFS implementation. Also, on a synthetic dense graph, our method processes 9.3 billion edges per second, showing its efficiency in dense graph traversal.</p>
110+
</div></div>
111+
112+
113+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723444">BFSBlitz: A Highly Parallel Graph System for Breadth-First Search</a></h3><ul class="DLauthors"><li class="nameList">Sakib Fuad</li><li class="nameList Last">Ashkan Vedadi Gargary</li></ul><div class="DLabstract"><div style="display:inline">
114+
<p>Breadth-First Search (BFS) is a fundamental graph traversal algorithm widely used in applications such as social network analysis, web crawling, and shortest path computations. However, efficiently parallelizing BFS for large-scale and irregular graph structures remains a significant challenge due to issues like workload imbalance, excessive synchronization, and data locality constraints.</p>
115+
<p>In this paper, we introduce BFSBlitz, a highly parallel graph processing system for BFS on shared memory systems. BFSBlitz dynamically adapts between TopDown and BottomUp traversals, utilizing hybrid partitioning for load balancing and contention reduction while mitigating data races. Experimental results demonstrate an average speedup of 34.61× over serial standard BFS, highlighting its efficiency for large-scale graph processing.</p>
116+
</div></div>
117+
118+
119+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723448">Techniques for Practical Parallel BFS and SSSP</a></h3><ul class="DLauthors"><li class="nameList">Quinten De Man</li><li class="nameList">Richard Wen</li><li class="nameList Last">Laxman Dhulipala</li></ul><div class="DLabstract"><div style="display:inline">
120+
<p>Breadth-first search (BFS) and single-source shortest paths (SSSP) are two fundamental graph problems with countless real-world applications. It is of major interest to develop efficient parallel algorithms and implementations to solve these problems on modern multicore architectures. The challenge is that these computations are often irregular, and there is no known algorithm that guarantees polylogarithmic depth for all graphs. For BFS, this paper describes several performance engineering methods to develop a practical parallel implementation for all classes of real-world graphs. For SSSP, we introduce a unique contraction-based preprocessing method which significantly speeds up queries on high-diameter graphs, like road networks. Our method is generic and can be used with any algorithm for SSSP queries.</p>
121+
</div></div>
122+
123+
124+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723446">Relax and don't Stop: Graph-aware Asynchronous SSSP</a></h3><ul class="DLauthors"><li class="nameList">Marco D'Antonio</li><li class="nameList">Kåre von Geijer</li><li class="nameList">Thai Son Mai</li><li class="nameList">Philippas Tsigas</li><li class="nameList Last">Hans Vandierendonck</li></ul><div class="DLabstract"><div style="display:inline">
125+
<p>The Parallel Single-Source Shortest Path (SSSP) problem has been tackled through many implementations, yet no single approach consistently outperforms others across diverse graph structures. Moreover, most implementations require extensive parameter tuning to reach peak performance. In this paper, we introduce the AdaMW scheduler, which dynamically selects between the schedulers Wasp and Multi-Queue. To achieve this, we use graph sampling and heuristics to select and configure the scheduler. In contrast to common state-of-the-art bulk-synchronous implementations, AdaMW is fully asynchronous, thus not needing to stop for global barriers. The resulting scheduler is highly competitive with the best manually-tuned, state-of-the-art implementations.</p>
126+
</div></div>
127+
128+
129+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723447">Hyb-Stepping: Hybrid Stepping for Parallel Shortest Paths</a></h3><ul class="DLauthors"><li class="nameList">Ashkan Vedadi Gargary</li><li class="nameList Last">Sakib Fuad</li></ul><div class="DLabstract"><div style="display:inline">
130+
<p>Shortest path algorithms have many real-world applications. It's challenging to achieve high scalability due to their sequential nature. To address this, stepping algorithms, including Δ* and <em>ρ</em> stepping, enable parallel execution by grouping nodes into buckets based on distance ranges. However, existing implementations require manual selection of the optimal method and parameters.</p>
131+
<p>In this work, we propose <em>hyb-stepping</em>, a parallel shortest path algorithm that dynamically selects between Δ* and <em>ρ</em> stepping based on graph properties. By leveraging a light preprocessing analysis, we classify graphs using four key metrics: (i) undirected graph average degree, (ii) general scale, (iii) large diameter scale, and (iv) small diameter scale. Our method eliminates manual tuning and ensures optimal parameter selection, significantly improving performance. Our results indicate <em>hyb-stepping</em> achieves a 1.5× speedup over fixed-parameter methods (440<em>M</em> edges per second).</p>
132+
</div></div>
133+
134+
135+
<h3><a class="DLtitleLink" title="Full Citation in the ACM Digital Library" referrerpolicy="no-referrer-when-downgrade" href="https://dl.acm.org/doi/10.1145/3711708.3723450">Adaptive Optimizations for Parallel Single-Source Shortest Paths</a></h3><ul class="DLauthors"><li class="nameList">Runbang Hu</li><li class="nameList">Chaoqun Li</li><li class="nameList">Xiaojiang Du</li><li class="nameList Last">Yuede Ji</li></ul><div class="DLabstract"><div style="display:inline">
136+
<p>The single-source shortest path (SSSP) problem is essential in graph theory, with applications in navigation, biology, social networks, and traffic analysis. The Δ-Stepping algorithm enhances parallelism by grouping vertices into "buckets" based on their tentative distances. However, its performance depends on Δ values and graph properties. This paper introduces an adaptive parallel Δ-Stepping implementation with three innovations: neighbor reordering, bucket fusion, and graph type-aware Δ selection. Tested on 11 diverse graphs, it achieves an average 7.1× speedup over serial Dijkstra's algorithm on a 48-thread CPU.</p>
137+
</div></div>
138+
139+
</div></div></body></html>

_data/OpenTOC.yaml

Lines changed: 4 additions & 0 deletions
@@ -1543,3 +1543,7 @@
15431543
year: 2025
15441544
title: "Proceedings of the 2025 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, Volume 1"
15451545
link: oopsla25-1.html
1546+
-
1547+
event: FCPC
1548+
year: 2025
1549+
title: "Proceedings of the 1st FastCode Programming Challenge"
