


PPoPP 2025: Las Vegas, NV, USA
- Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025, Las Vegas, NV, USA, March 1-5, 2025. ACM 2025, ISBN 979-8-4007-1443-6
Keynote
- Charles E. Leiserson:
Setting a Course for Post-Moore Software Performance. 1
Graph Neural Networks
- Jie Sun, Zuocheng Shi, Li Su, Wenting Shen, Zeke Wang, Yong Li, Wenyuan Yu, Wei Lin, Fei Wu, Bingsheng He, Jingren Zhou:
Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference. 2-15
- Jou-An Chen, Hsin-Hsuan Sung, Ruifeng Zhang, Ang Li, Xipeng Shen:
Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering. 16-28
- Kaihao Ma, Renjie Liu, Xiao Yan, Zhenkun Cai, Xiang Song, Minjie Wang, Yichao Li, James Cheng:
Adaptive Parallel Training for Graph Neural Networks. 29-42
GPU I
- Vani Nagarajan, Rohan Gangaraju, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni:
RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware. 43-56
- Anna Yue, Pen-Chung Yew, Sanyam Mehta:
EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs. 57-69
- Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen:
TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs. 70-84
Concurrent Data Structures and Synchronization I
- Dave Dice, Alex Kogan:
Reciprocating Locks. 85-98
- Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun:
Aggregating Funnels for Faster Fetch&Add and Queues. 99-114
- Takashi Hoshino, Kenjiro Taura:
Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management. 115-127
- Ajay Singh, Trevor Brown:
Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures. 128-141
Memory
- Fulin Nan, Ronglong Wu, Zhirong Shen, Jiahui Yang, Li Cheng, Zheng Chen, Yiming Zhang, Jiwu Shu:
AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations. 142-155
- Yun Wang, Liang Chen, Tianmai Deng, Ben Luo, Yibin Shen, Zhixiang Wei, Yixiao Xu, Minglang Huang, Zhengwei Qi:
Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications. 156-169
- Hulin Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng:
Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion. 170-182
Deep Neural Networks
- Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai:
FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property. 183-196
- Weijian Liu, Mingzhen Li, Guangming Tan, Weile Jia:
Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism. 197-211
- Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao:
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers. 212-224
Large Language Models
- Junfeng Lin, Ziming Liu, Yang You, Jun Wang, Weihao Zhang, Rong Zhao:
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training. 225-238
- Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh:
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. 239-251
- Yuhang Liang, Xinyi Li, Jie Ren, Ang Li, Bo Fang, Jieyang Chen:
ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training. 252-266
Scheduling and Resource Management
- Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yunzhe Li, Zhifeng Jiang, Yang Li, Xiaowen Chu, Huaicheng Li:
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs. 267-281
- Zhengqing Liu, Musa Unal, Matthew J. Parkinson, Marios Kogias:
DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing. 282-296
- Yankai Jiang, Rohan Basu Roy, Raghavendra Kanakagiri, Devesh Tiwari:
WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing. 297-311
Tensor Cores
- Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu:
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores. 312-325
- Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi:
Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores. 326-338
- Yuyao Niu, Marc Casas:
BerryBees: Breadth First Search by Bit-Tensor-Cores. 339-354
- Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang:
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units. 355-368
Concurrent Data Structures and Synchronization II
- Xizhe Yin, Chao Gao, Zhijia Zhao, Rajiv Gupta:
PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search. 369-381
- Kåre von Geijer, Philippas Tsigas, Elias Johansson, Sebastian Hermansson:
Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue. 382-395
- Liang Geng, Rubao Lee, Xiaodong Zhang:
LibRTS: A Spatial Indexing Library by Ray Tracing. 396-411
- Hao Wang, Minghao Pan, Jiaping Wang:
Crystality: A Programming Model for Smart Contracts on Parallel EVMs. 412-425
GPU II
- Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi:
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra. 426-440
- Zhibin Wang, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, Sheng Zhong:
Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm. 441-454
- Weichen Cao, Ke Meng, Zhiheng Lin, Guangming Tan:
GLumin: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining. 455-468
- Hansheng Wang, Zhekai Duan, Zitian Zhao, Siqi Wu, Saiqi Zheng, Qiao Li, Xu Jiang, Shaoshuai Zhang:
Improving Tridiagonalization Performance on GPU Architectures. 469-480
Parallel Algorithms and Applications
- Yiwei Zhang, Kun Li, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang:
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers. 481-495
- Yi Zong, Chensong Zhang, Longjiang Mu, Jianchun Wang, Jian Sun, Xiaowen Xu, Xinliang Wang, Peinan Yu, Wei Xue:
Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid. 496-511
- Weicong Chen, Hao Qi, Curtis Tatsuoka, Xiaoyi Lu:
SBMGT: Scaling Bayesian Multinomial Group Testing. 512-523
- Xiaohui Duan, Yi Zhang, Kai Xu, Haohuan Fu, Bin Yang, Yiming Wang, Yilun Han, Siyuan Chen, Zhuangzhuang Zhou, Chenyu Wang, Dongqiang Huang, Huihai An, Xiting Ju, Haopeng Huang, Zhuang Liu, Wei Xue, Weiguo Liu, Bowen Yan, Jianye Hou, Maoxue Yu, Wenguang Chen, Jian Li, Zhao Jing, Hailong Liu, Lixin Wu:
An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores. 524-538
Posters
- Daniel Anderson, Guy E. Blelloch, Siddhartha V. Jayanti:
Big Atomics and Fast Hash Tables. 539-541
- Xinmiao Zhang, Cheng Liu, Shengwen Liang, Chenwei Xiong, Yu Zhang, Lei Zhang, Huawei Li, Xiaowei Li:
Frontier-guided Graph Reordering. 542-544
- Yaodong Sheng, Ahmed Hassan, Michael F. Spear:
Transactional Data Structures with Orthogonal Metadata. 545-547
- Ao Li, Wenhai Li, Yuan Chen, Lingfeng Deng:
Boost Lock-free Queue and Stack with Batching. 548-550
- Yucheng Ouyang, Ying Liu, Honghui Shang, Zhenchuan Chen, Jiahao Shan, Huimin Cui, Xiaobing Feng, Xin Chen, Xingyu Gao, Lifang Wang, Haifeng Song, Xin Chen, Rongfen Lin, Fang Li:
TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms. 551-553
- Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, Guangming Tan:
FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline. 554-556
- Boyuan Zhang, Luanzheng Guo, Jiannan Tian, Jinyang Liu, Daoce Wang, Fanjiang Ye, Chengming Zhang, Jan Strube, Nathan R. Tallent, Dingwen Tao:
High-performance Visual Semantics Compression for AI-Driven Science. 557-559
- YuAng Chen, Jeffrey Xu Yu:
Triangle Counting on Tensor Cores. 560-562
- Zhanyuan Di, Leping Wang, Ziyi Ren, En Shao, Jie Zhao, Siyuan Feng, Dingwen Tao, Guangming Tan, Ninghui Sun:
Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators. 563-565
- Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
A General and Scalable GCN Training Framework on CPU Supercomputers. 566-568
- Angelo Borsotti, Luca Breveglieri, Angelo Morzenti, Stefano Crespi-Reghizzi:
Minimizing speculation overhead in a parallel recognizer for regular texts. 569-572
