


31st PACT 2022: Chicago, IL, USA
- Andreas Klöckner, José Moreira:
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, PACT 2022, Chicago, Illinois, October 8-12, 2022. ACM 2022, ISBN 978-1-4503-9868-8
Compilers for ever
- Tong Zhou, Ruiqin Tian, Rizwan A. Ashraf, Roberto Gioiosa, Gokcen Kestor, Vivek Sarkar:
ReACT: Redundancy-Aware Code Generation for Tensor Expressions. 1-13
- Bodhisatwa Chatterjee, Sharjeel Khan, Santosh Pande:
Com-CAS: Effective Cache Apportioning under Compiler Guidance. 14-27
- Perry Gibson, José Cano:
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation. 28-39
Optimizing the execution of GNNs
- Mingi Yoo, Jaeyong Song, Hyeyoon Lee, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee:
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators. 40-53
- Zhe Zhou, Cong Li, Xuechao Wei, Xiaoyang Wang, Guangyu Sun:
GNNear: Accelerating Full-Batch Training of Graph Neural Networks with near-Memory Processing. 54-68
- Chengying Huan, Shuaiwen Leon Song, Yongchao Liu, Heng Zhang, Hang Liu, Charles He, Kang Chen, Jinlei Jiang, Yongwei Wu:
T-GCN: A Sampling Based Streaming Graph Neural Network System with Hybrid Architecture. 69-82
- Zhuoran Ji, Cho-Li Wang:
Optimizing Aggregate Computation of Graph Neural Networks with on-GPU Interpreter-Style Programming. 83-95
Getting more out of your memory
- Albin Eldstål-Ahrens, Angelos Arelakis, Ioannis Sourdis:
FlatPack: Flexible Compaction of Compressed Memory. 96-108
- Han Jie Qiu, Sihang Liu, Xinyang Song, Samira Manabi Khan, Gennady Pekhimenko:
Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications. 109-123
- Taiyu Zhou, Yajuan Du, Fan Yang, Xiaojian Liao, Youyou Lu:
Efficient Atomic Durability on eADR-Enabled Persistent Memory. 124-134
Sparse matrix computations
- Roberto L. Castro, Diego Andrade, Basilio B. Fraguela:
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs. 135-147
- Xin He, Kuan-Yu Chen, Siying Feng, Hun-Seok Kim, David T. Blaauw, Ronald G. Dreslinski, Trevor N. Mudge:
Squaring the circle: Executing Sparse Matrix Computations on FlexTPU - A TPU-Like Processor. 148-159
- Marcos Horro, Louis-Noël Pouchet, Gabriel Rodríguez, Juan Touriño:
Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations. 160-171
Graph processing
- Han-Yi Chou, Sayan Ghosh:
Batched Graph Community Detection on GPUs. 172-184
- Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, Bo Wu:
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation. 185-197
- Shinnung Jeong, Yongwoo Lee, Jaeho Lee, Heelim Choi, Seungbin Song, Jinho Lee, Youngsok Kim, Hanjun Kim:
Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing. 198-210
Miscellaneous
- Jian Zhou, Jianfeng Wu, Weizhou Huang, You Zhou, Fei Wu, Liu Shi, Xiaoyi Zhang, Kun Wang, Feng Zhu, Shu Li:
Tiered Hashing: Revamping Hash Indexing under a Unified Memory-Storage Hierarchy. 211-222
- Qi Zhao, Zhengyi Qiu, Shudi Shao, Xinning Hui, Hassan Ali Khan, Guoliang Jin:
Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism. 223-238
- Sankeerth Durvasula, Raymond Kiguru, Samarth Mathur, Jenny Xu, Jimmy Lin, Nandita Vijaykumar:
VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks. 239-251
Better neural networks
- Yufan Xu, Qiwei Yuan, Erik Curtis Barton, Rui Li, P. Sadayappan, Aravind Sukumaran-Rajam:
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs. 252-264
- Lizhi Xiang, P. Sadayappan, Aravind Sukumaran-Rajam:
High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs. 265-278
- Zachary Susskind, Aman Arora, Igor D. S. Miranda, Luis Armando Quintanilla Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego Leonel Cadette Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John:
Weightless Neural Networks for Efficient Edge Inference. 279-290
- Cheng Fu, Hanxian Huang, Bram Wasti, Chris Cummins, Riyadh Baghdadi, Kim M. Hazelwood, Yuandong Tian, Jishen Zhao, Hugh Leather:
Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition. 291-303
Getting more out of your GPU
- Leul Belayneh, Haojie Ye, Kuan-Yu Chen, David T. Blaauw, Trevor N. Mudge, Ronald G. Dreslinski, Nishil Talati:
Locality-Aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems. 304-316
- Xiaodan Serina Tan, Pavel Golikov, Nandita Vijaykumar, Gennady Pekhimenko:
GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud. 317-332
- Yuhui Bao, Yifan Sun, Zlatan Feric, Michael Tian Shen, Micah Weston, José L. Abellán, Trinayan Baruah, John Kim, Ajay Joshi, David R. Kaeli:
NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs. 333-345
Better hardware
- Parmida Vahdatniya, Amirali Sharifian, Reza Hojabr, Arrvindh Shriraman:
mu-grind: A Framework for Dynamically Instrumenting HLS-Generated RTL. 346-358
- Seyed Armin Vakil-Ghahani, Soheil Khadirsharbiyani, Jagadish B. Kotra, Mahmut T. Kandemir:
Athena: An Early-Fetch Architecture to Reduce on-Chip Page Walk Latencies. 359-371
- Mingjian He, Hua Wang, Ke Zhou, Kaichao Cui, Huabing Yan, Chang Guo, Rongfeng He:
DSDP: Dual Stream Data Prefetcher. 372-383
Task parallelism
- Oh-Kyoung Kwon, Ji Hoon Kang, Seungchul Lee, Wonjung Kim, Junehwa Song:
Efficient Task-Mapping of Parallel Applications Using a Space-Filling Curve. 384-397
- Mahyar Emami, Endri Bezati, Jörn W. Janneck, James R. Larus:
Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks. 398-411
Optimization
- Xinyu Chen, Marco Minutoli, Jiannan Tian, Mahantesh Halappanavar, Ananth Kalyanaraman, Dingwen Tao:
HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures. 412-425
- Jedidiah McClurg, Miles Claver, Jackson Garner, Jake Vossen, Jordan Schmerge, Mehmet E. Belviranli:
Optimizing Regular Expressions via Rewrite-Guided Synthesis. 426-438
- Bangtian Liu, Avery Laird, Wai Hung Tsang, Bardia Mahjour, Maryam Mehri Dehnavi:
Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. 439-450
GPU algorithms
- Jie Zhao, Cédric Bastoul, Yanzhi Yi, Jiahui Hu, Wang Nie, Renwei Zhang, Zhen Geng, Chong Li, Thibaut Tachon, Zhiliang Gan:
Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions Atomically. 451-466
- Haoyuan Xing, Gagan Agrawal, Rajiv Ramnath:
GPU Adaptive In-situ Parallel Analytics (GAP). 467-480
- Muhammad A. Awad, Serban D. Porumbescu, John D. Owens:
A GPU Multiversion B-Tree. 481-493
Portable performance
- Johannes Doerfert, Marc Jasper, Joseph Huber, Khaled Abdelaal, Giorgis Georgakoudis, Thomas Scogland, Konstantinos Parasyris:
Breaking the Vendor Lock: Performance Portable Programming through OpenMP as Target Independent Runtime Layer. 494-504
- Foivos Tsimpourlas, Pavlos Petoumenos, Min Xu, Chris Cummins, Kim M. Hazelwood, Ajitha Rajan, Hugh Leather:
BenchPress: A Deep Active Benchmark Generator. 505-516
- Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia:
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement. 517-529
Posters
- Anjia Wang, Xinyao Yi, Yonghong Yan:
UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models. 530-531
- Dongwei Chen, Dong Tong, Chun Yang, Jiangfang Yi, Xu Cheng:
FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. 532-533
- Michail Boulasikis, Flavius Gruian, Gareth Callanan, Jörn W. Janneck:
Analysing Dataflow Programs with Causation Traces. 534-535
- Jaeyoung Kang, Weihong Xu, Wout Bittremieux, Tajana Rosing:
Massively Parallel Open Modification Spectral Library Searching with Hyperdimensional Computing. 536-537
- Victor Ferrari, Rafael C. F. Sousa, Márcio Machado Pereira, João P. L. de Carvalho, José Nelson Amaral, Guido Araujo:
Improving Convolution via Cache Hierarchy Tiling and Reduced Packing. 538-539
- Jie Li, Yuhui Deng, Zhaorui Wu, Shujie Pang:
A Thermal-Aware Data Replica Placement Strategy for Data-Intensive Data Centers. 540-541
- Luanzheng Guo, Rizwan A. Ashraf, Ryan D. Friese, Gokcen Kestor:
Towards Supporting Semiring in MLIR-Based COMET Compiler. 542-543
- Serena Curzel, Sofija Jovic, Michele Fiorito, Antonino Tumeo, Fabrizio Ferrandi:
MLIR Loop Optimizations for High-Level Synthesis: A Case Study. 544-545
- Jeongeun Kim, Young Woo Jeong, Su-Yeon Jang, Seung Eun Lee:
An Architecture for Resilient Federated Learning through Parallel Recognition. 546-547
- Truls Asheim, Boris Grot, Rakesh Kumar:
A Specialized BTB Organization for Servers. 548-549
