


ICPP 2024: Gotland, Sweden
- Proceedings of the 53rd International Conference on Parallel Processing, ICPP 2024, Gotland, Sweden, August 12-15, 2024. ACM 2024, ISBN 979-8-4007-1793-2
Algorithm Optimization
- Wojciech Kwedlo
:
Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems. 1-10 - Subhajit Sahu
, Kishore Kothapalli
, Dip Sankar Banerjee
:
Fast Leiden Algorithm for Community Detection in Shared Memory Setting. 11-20 - Xianglin Wang
, Xin Yi
, Hengbiao Yu
, Chun Huang
, Lin Peng
:
Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary Functions. 21-31 - Abhishek V. N. Taraka Josyula
, Pritesh Verma
, Amar Gaonkar
, Amlan Barua
, Nikhil Hegde
:
Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable Matrices. 32-41
Best Paper Finalists
- Donney Fan
, Ben Liang
:
Online Non-preemptive Multi-Resource Scheduling for Weighted Completion Time on Multiple Machines. 42-51 - Yi Zong
, Peinan Yu
, Haopeng Huang
, Wei Xue
:
FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications. 52-62 - Kan Zhong
, Zhiwang Yu
, Qiao Li
, Xianqiang Luo
, Linbo Long
, Yujuan Tan
, Ao Ren
, Duo Liu
:
DPC: DPU-accelerated High-Performance File System Client. 63-72
Co-design
- Sonia Rani Gupta
, Nikela Papadopoulou
, Jing Chen
, Miquel Pericàs
:
Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving. 73-83 - Quentin Anthony
, Jacob Hatef
, Deepak Narayanan
, Stella Biderman
, Stas Bekman
, Junqi Yin
, Aamir Shafi
, Hari Subramoni
, Dhabaleswar K. Panda
:
The Case for Co-Designing Model Architectures with Hardware. 84-96 - Qifeng Pan
, Ralf Schneider
:
Improving efficiency of Monte Carlo method via code intrinsic framework. 97-106 - Yongseok Soh
, Ramakrishnan Kannan
, Piyush Sao
, Jee W. Choi
:
Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. 107-116
Communication and Networks
- Ujjaini Mukhopadhyay
, Alok Tripathy
, Oguz Selvitopi
, Katherine A. Yelick
, Aydin Buluç
:
Sparsity-Aware Communication for Distributed Graph Neural Network Training. 117-126 - Jing Xu
, Zhan Wang
, Fan Yang
, Ning Kang
, Zhenlong Ma
, Guojun Yuan
, Guangming Tan
, Ninghui Sun
:
FNCC: Fast Notification Congestion Control in Data Center Networks. 127-137 - Wen Xu
, Juncheng Wang
, Ben Liang
, Gary Boudreau
, Hamza Umit Sokun
:
Distributed Minimax Fair Optimization over Hierarchical Networks. 138-147
Communication and Scalability
- Jing Peng
, Zihan Li
, Shaohuai Shi
, Bo Li
:
Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning. 148-157 - Xinbiao Gan
, Tiejun Li
, Qiang Zhang
, Bo Yang
, Xinhai Chen
, Jie Liu
:
SuperCSR: A Space-Time-Efficient CSR Representation for Large-scale Graph Applications on Supercomputers. 158-167 - Tim Beringer
, Jakob Stock
, Arya Mazaheri
, Felix Wolf
:
Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction. 168-178
GPU Memory
- Jiajian Zhang
, Fangyu Wu
, Hai Jiang
, Guangliang Cheng
, Genlang Chen
, Qiufeng Wang
:
SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All Scales. 179-188 - Abdun Nihaal
, Madhu Mutyam
:
Selective Memory Compression for GPU Memory Oversubscription Management. 189-198 - Gabin Schieffer
, Jacob Wahlgren
, Jie Ren
, Jennifer Faj
, Ivy Peng
:
Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper. 199-209
In-situ Workflow
- Jiahui Liu
, Tobias Edwards
, Kristina Durovic
, Philipp Schlatter
, Tino Weinkauf
:
In-Situ Binary Segmentation of 3D time-dependent Flows into Laminar and Turbulent Regions. 210-219 - Dewi Yokelson
, Mikhail Titov
, Srinivasan Ramesh
, Ozgur O. Kilic
, Matteo Turilli
, Shantenu Jha
, Allen D. Malony
:
Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA. 220-230 - Jaime Cernuda
, Jie Ye
, Anthony Kougkas
, Xian-He Sun
:
HStream: A hierarchical data streaming engine for high-throughput scientific applications. 231-240
Parallel Algorithm
- Jialin Li
, Zhichen Feng
, Yaqian Gao
, Shaobo Tian
, Haoyuan Zhang
, Huang Ye
, Jian Zhang
:
High-Performance 3D convolution on the Latest Generation Sunway Processor. 241-251 - Gaurav Bhardwaj
, Bapi Chatterjee
, Abhinav Sharma
, Sathya Peri
, Siddharth Nayak
:
Kanva: A Lock-free Learned Search Data Structure. 252-261 - Haopeng Huang
, Yuyang Jin
, Wei Xue
:
BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System. 262-272
Parallel Language
- Buddhi Ashan Mallika Kankanamalage
, Satish Puri
, Sushil K. Prasad
:
Extending Segment Tree for Polygon Clipping and Parallelizing using OpenMP and OpenACC Directives. 273-283 - Ruben Laso
, Diego Krupitza
, Sascha Hunold
:
Exploring Scalability in C++ Parallel STL Implementations. 284-293 - Zaman Lantra
, Steven A. Wright
, Gihan R. Mudalige
:
OP-PIC - an Unstructured-Mesh Particle-in-Cell DSL for Developing Nuclear Fusion Simulations. 294-304
Scheduling Cloud
- Svetlana Kulagina
, Henning Meyerhenke
, Anne Benoit
:
Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms. 305-316 - Liang Zhang
, Hongzi Zhu
, Yunzhe Li
, Jiangang Shen
, Minyi Guo
:
The Blind and the Elephant: A Preference-aware Edge Video Analytics Scheduler for Maximizing System Benefit. 317-326 - Tomasz Kanas
, Krzysztof Rzadca
:
Diminishing cold starts in serverless computing with approximation algorithms. 327-336 - Jiawei Huang
, Qile Wang
, Zhaoyi Li
, Yijun Li
, Zihao Chen
, Sitan Li
, Jing Shao
, Jingling Liu
, Min Zhan
, Jianxin Wang
:
Achieving Efficient Scheduling based on Accurate Measurement of Small Flows in Data Center. 337-346 - Huadong Li
, Hui Liu
, Aoqi Chen
, Xirui Ma
, Junzhao Du
:
Thawbringer: An Orchestrator to Mitigate Cascading Cold Starts of Serverless Function Chains. 347-356 - Ying Zheng
, Lei Jiao
, Han Yang
, Lulu Chen
, Ying Liu
, Yuxiao Wang
, Yuedong Xu
, Xin Wang
, Zongpeng Li
:
Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks. 357-366 - Xin Tan
, Jiamin Li
, Yitao Yang
, Jingzong Li
, Hong Xu
:
Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths. 367-376
Scientific Simulations
- Yaqian Gao
, Jian Zhang
, Huang Ye
, Xuebin Chi
:
Large-scale Phase-Field Simulations for Solid-Solid Phase Transformations involving Elastic Energy. 377-387 - Shui Jiang
, Rongliang Fu
, Lukas Burgholzer
, Robert Wille
, Tsung-Yi Ho
, Tsung-Wei Huang
:
FlatDD: A High-Performance Quantum Circuit Simulator using Decision Diagram and Flat Array. 388-399 - Yi Zhang, Ziyu Zhang, Yang Zhao
, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen:
Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation. 400-410 - Ran Zhao
, Chao Li
, Xiaowei Guo
, Sen Zhang
, Xi Yang
, Tao Tang
, Canqun Yang
:
A Motion Trace Decomposition-based overset grid method for parallel CFD simulations with moving boundaries. 411-420
Distributed Systems
- Conor James Green
, Mithuna Thottethodi
:
NetSmith: An Optimization Framework for Machine-Discovered Network Topologies. 421-432 - Chen Chen
, Li Shen
, Yingwen Chen
:
A Distributed Framework for Subgraph Isomorphism Leveraging CPU and GPU Heterogeneous Computing. 433-442 - Jinbin Hu
, Ying Liu
, Hao Wang
, Jin Wang
:
AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster. 443-452 - Ruisong Zhou
, Yuzhan Zhang
, Chunhua Li
, Ke Zhou
, Peng Wang
, Gong Zhang
, Ji Zhang
, Guangyu Zhang
:
HyperDB: a Novel Key Value Store for Reducing Background Traffic in Heterogeneous SSD Storage. 453-463
Federated Learning
- Fuyuan Xia
, Chenhao Ying
, David S. L. Wei
, Wei Chen
, Weiting Zhang
, Haiming Jin
, Yuan Luo
:
ChronusFed: Reinforcement-Based Adaptive Partial Training for Heterogeneous Federated Learning. 464-473 - Md Sirajul Islam
, Simin Javaherian
, Fei Xu
, Xu Yuan
, Li Chen
, Nian-Feng Tzeng
:
FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering. 474-483 - Jiangshan Hao
, Fang Dong
, Bingheng Cen
, Shucun Fu
, Ruiting Zhou
, Ding Ding
:
HASFL: Harnessing Heterogeneous Models Across Diverse Devices for Enhanced Federated Learning. 484-493 - Na Lv
, Zhi Shen
, Chen Chen
, Zhifeng Jiang
, Jiayi Zhang
, Quan Chen
, Minyi Guo
:
FedCA: Efficient Federated Learning with Client Autonomy. 494-503
GPU Cluster Optimization
- Bowen Zhang
, Shuxin Li
, Zhuozhao Li
:
MIGER: Integrating Multi-Instance GPU and Multi-Process Service for Deep Learning Clusters. 504-513 - Fei Yang
, Shuang Peng
, Ning Sun
, Fangyu Wang
, Yuanyuan Wang
, Fu Wu
, Jiezhong Qiu
, Aimin Pan
:
Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. 514-523 - Bowen Yuchi
, Heng Shi
, Guoqing Bao
:
SPHINX: Search Space-Pruning Heterogeneous Task Scheduling for Deep Neural Networks. 524-533
Graph on GPU
- Chenle Yu
, Sara Royuela
, Eduardo Quiñones
:
Enhancing Heterogeneous Computing Through OpenMP and GPU Graph. 534-543 - Shinnung Jeong
, Sungjun Cho
, Yongwoo Lee
, Hyunjun Park
, Seonyeong Heo
, Gwangsun Kim
, Youngsok Kim
, Hanjun Kim
:
CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU. 544-554 - Chen Zhao
, Ting Yu
, Zhigao Zheng
, Yuanyuan Zhu
, Song Jin
, Bo Du
, Dacheng Tao
:
SpeedCore: Space-efficient and Dependency-aware GPU Parallel Framework for Core Decomposition. 555-564 - Chih-Chun Chang
, Boyang Zhang
, Tsung-Wei Huang
:
GSAP: A GPU-Accelerated Stochastic Graph Partitioner. 565-575 - Mahesh Lakshminarasimhan
, Mary W. Hall
, Samuel Williams
, Oscar Antepara
:
BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs. 576-586 - Mithinti Srikanth
, Prashant Singh
, G. Ramakrishna
:
GPU Algorithms for Fastest Path Problem in Temporal Graphs. 587-596
Memory and Storage
- Wenda Tang
, Ying Han
, Tianxiang Ai
, Guanghui Li
, Bin Yu
, Xin Yang
:
Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory. 597-606 - Ziwei Xiong
, Dejun Jiang
, Jin Xiong
:
DiStore: A Fully Memory Disaggregation Friendly Key-Value Store with Improved Tail Latency and Space Efficiency. 607-617 - Liuying Ma
, Zhenqing Liu
, Jin Xiong
, Yue Wu
, Renhai Chen
, Xi Peng
, Ying Zhang
, Gong Zhang
, Dejun Jiang
:
zQoS: Unleashing full performance capabilities of NVMe SSDs while enforcing SLOs in distributed storage systems. 618-628
Memory Optimization
- Stavroula Zouzoula
, Mohammad Ali Maleki
, Muhammad Waqar Azhar
, Pedro Trancoso
:
Scratchpad Memory Management for Deep Learning Accelerators. 629-639 - Jihu Guo
, Rui Xia
, Jie Liu
, Xiaoxiong Zhu
, Xiang Zhang
:
CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. 640-649 - Qisheng Jiang
, Lei Jia
, Chundong Wang
:
GNNDrive: Reducing Memory Contention and I/O Congestion for Disk-based GNN Training. 650-659 - XinYu Piao
, Jong-Kook Kim
:
GMM: An Efficient GPU Memory Management-based Model Serving System for Multiple DNN Inference Models. 660-668
Performance Optimization
- Kaveh Mahdavi
:
A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications. 669-678 - Fugeng Zhu
, Xinxin Qi
, Peng Zhang
, Jianbin Fang
, Tao Tang
, Yonggang Che
, Kainan Yu
, Jing Xie
, Chun Huang
, Jie Ren
:
Optimizing Stencil Computation on Multi-core DSPs. 679-690 - Ricardo Jesus
, Michèle Weiland
:
Evaluating and optimising compiler code generation for NVIDIA Grace. 691-700
Resource Allocation
- Yingwen Chen
, Wenxin Li
, Huan Zhou
, Xiangrui Yang
, Yanfei Yin
:
DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks. 701-711 - Jiazhen Zhu
, Wenda Tang
, Xianglong Meng
, Nan Gong
, Tianxiang Ai
, Guanghui Li
, Bin Yu
, Xin Yang
:
PheCon: Fine-Grained VM Consolidation with Nimble Resource Defragmentation in Public Cloud Platforms. 712-721 - Dingyu Yang
, Ziyang Xiao
, Dongxiang Zhang
, Shuhao Zhang
, Jian Cao
, Gang Chen
:
PREACT: Predictive Resource Allocation for Bursty Workloads in a Co-located Data Center. 722-731
Scheduling Cloud
- Siyuan Chen
, Decheng Zuo
, Zhan Zhang
:
FlexSP: (1 + β)-Choice based Flexible Stream Partitioning for Stateful Operators. 732-741 - Wen Gao
, Zhiwen Yu
, Hui Xiong
, Bin Guo
, Liang Wang
, Yuan Yao
:
Parallel Task Scheduling in Autonomous Robotic Systems: An Event-Driven Multimodal Prediction Approach. 742-751 - Bin Gao
, Zhehui Wang
, Zhuomin He
, Tao Luo
, Weng-Fai Wong
, Zhi Zhou
:
IMI: In-memory Multi-job Inference Acceleration for Large Language Models. 752-761
Scheduling Edge
- Bei Ouyang
, Shengyuan Ye
, Liekang Zeng
, Tianyi Qian
, Jingyi Li
, Xu Chen
:
Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning. 762-771 - Huadong Li
, Hui Liu
, Aoqi Chen
, Xirui Ma
, Qiaoqiao Liu
, Junzhao Du
:
RIA: Return on Investment Auto-scaler for Serverless Edge Functions. 772-781 - Yan Zhuang
, Zhenzhe Zheng
, Yunfeng Shao
, Bingshuai Li
, Fan Wu
, Guihai Chen
:
Nebula: An Edge-Cloud Collaborative Learning Framework for Dynamic Edge Environments. 782-791 - Jieyu Lin
, Minghao Li
, Sai Qian Zhang
, Alberto Leon-Garcia
:
Murmuration: On-the-fly DNN Adaptation for SLO-Aware Distributed Inference in Dynamic Edge Environments. 792-801
Tools
- Praseetha M
, Madhu Mutyam
, Venkata Kalyan Tavva
:
Cache Line Pinning for Mitigating Row Hammer Attack. 802-811 - Jie Ye
, Jaime Cernuda
, Neeraj Rajesh
, Keith Bateman
, Orcun Yildiz
, Tom Peterka
, Arnur Nigmetov
, Dmitriy Morozov
, Xian-He Sun
, Anthony Kougkas
, Bogdan Nicolae
:
Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models. 812-821 - Siyu Wu
, Hailong Yang
, Xin You
, Ruihao Gong
, Yi Liu
, Zhongzhi Luan
, Depei Qian
:
PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis. 822-832 - Simon Schwitanski
, Yussur Mustafa Oraji
, Cornelius Pätzold
, Joachim Jenke
, Felix Tomski
, Matthias S. Müller
:
RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications. 833-844
Compression
- Tri Nguyen
, Md Hasanur Rahman
, Sheng Di
, Michela Becchi
:
Significantly Improving Fixed-Ratio Compression Framework for Resource-limited Applications. 845-855 - André Weißenberger
, Bertil Schmidt
:
Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUs. 856-865 - Zichen Tang
, Junlin Huang
, Rudan Yan
, Yuxin Wang
, Zhenheng Tang
, Shaohuai Shi
, Amelie Chi Zhou
, Xiaowen Chu
:
Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning. 866-875
Configurable Hardware
- Kyle Zhao Bin Chen
, Tarek S. Abdelrahman
, Reza Azimi
, Tomasz S. Czajkowski
, Maziar Goudarzi
:
RoDMap: A Reserve-on-Demand Mapper for Spatially-Configured Coarse-Grained Reconfigurable Arrays. 876-886 - Jie Cheng
, Lifu Hu
, Wei Xu
, Hanhua Chen
, Tian Xia
:
Hardware Acceleration of Minimap2 Genomic Sequence Alignment Algorithm. 887-897 - Weilin Zhu
, Wei Tong
, Hujun Ge
, Zuoxian Zhang
, Mengran Zhang
, Wen Zhou
:
LpaqHP: A High-Performance FPGA Accelerator for LPAQ Compression. 898-907
Distributed Memory
- Piyush Sao
, Andrey Prokopenko
, Damien Lebrun-Grandié
:
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. 908-918 - Yifan Li
, Giulia Guidi
:
High-Performance Sorting-Based K-mer Counting in Distributed Memory with Flexible Hybrid Parallelism. 919-928 - Rui Zhang
, Yukai Huang
, Sicheng Liang
, Shangyi Sun
, Shaonan Ma
, Chengying Huan
, Lulu Chen
, Zhihui Lu
, Yang Xu
, Ming Yan
, Jie Wu
:
Revisiting Learned Index with Byte-addressable Persistent Storage. 929-938
Energy-aware Computing
- Hanfei Geng, Yi Sun, Yuanzhe Li, Jichao Leng, Xiangyu Zhu, Xianyuan Zhan, Yuanchun Li, Feng Zhao, Yunxin Liu:
TESLA: Thermally Safe, Load-Aware, and Energy-Efficient Cooling Control System for Data Centers. 939-949 - Hanlong Liao
, Guoming Tang
, Deke Guo
, Yi Wang
, Ruide Cao
:
Rethinking Low-Carbon Edge Computing System Design with Renewable Energy Sharing. 950-960 - Tiago Da Silva Barros
, Davide Ferré
, Frédéric Giroire
, Ramon Aparicio-Pardo
, Stephane Perennes
:
Scheduling Machine Learning Compressible Inference Tasks with Limited Energy Budget. 961-970
Federated Learning
- Haoyu Chen
, Yuxin Zhang
, Jin Zhao
, Xin Wang
, Yuedong Xu
:
Gradient Free Personalized Federated Learning. 971-980 - Yinlong Li
, Hao Zhang
, Siyao Cheng
, Jie Liu
:
Federated Edge Learning with Blurred or Pseudo Data Sharing. 981-990 - Dezhong Yao
, Ziquan Zhu
, Tongtong Liu
, Zhiqiang Xu
, Hai Jin
:
Rethinking Personalized Federated Learning from Knowledge Perspective. 991-1000
GPU Optimization
- Xu Zhang
, Guangda Zhang
, Lu Wang
, Shiqing Zhang
, Xia Zhao
:
AdCoalescer: An Adaptive Coalescer to Reduce the Inter-Module Traffic in MCM-GPUs. 1001-1011 - Jaebeom Jeon
, Minseong Gil
, Junsu Kim
, Jaeyong Park
, Gunjae Koo
, Myung Kuk Yoon
, Yunho Oh
:
VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing. 1012-1021 - Qianchao Zhu
:
FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs. 1022-1031
Memory-centric Computing
- Pingdan Xiao
, Qinghui Hong
, Sichun Du
, Jiliang Zhang
:
CIM-KF: Efficient Computing-in-memory Circuits for Full-Process Execution of Kalman Filter Algorithm. 1032-1041 - Mohammad Sabri Abrebekoh
, Marc Riera Villanueva
, Antonio González
:
ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNNs. 1042-1051 - Tong Wu
, Shuibing He
, Jianxin Zhu
, Weijian Chen
, Siling Yang
, Ping Chen
, Yanlong Yin
, Xuechen Zhang
, Xian-He Sun
, Gang Chen
:
AUTOHET: An Automated Heterogeneous ReRAM-Based Accelerator for DNN Inference. 1052-1061 - Meven Mognol
, Dominique Lavenier
, Julien Legriel
:
Parallelization of the Banded Needleman & Wunsch Algorithm on UPMEM PiM Architecture for Long DNA Sequence Alignment. 1062-1071
Simulations on GPU
- Zhiyi Zhang
, Pengfei Zhang
, Zhuopin Xu
, Bingjie Yan
, Qi Wang
:
Im2col-Winograd: An Efficient and Flexible Fused-Winograd Convolution for NHWC Format on GPUs. 1072-1081 - Taisuke Boku
, Masatake Sugita
, Ryohei Kobayashi
, Shinnosuke Furuya
, Takuya Fujie
, Masahito Ohue
, Yutaka Akiyama
:
Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization. 1082-1091 - Fazeleh S. Kazemian
, Jorge L. Galvez Vallejo
, Giuseppe M. J. Barca
:
High-Performance, Accurate Large-Scale Quantum Chemistry Calculations on GPU Supercomputers using Coulomb-Perturbed Fragmentation. 1092-1102 - Runfeng Jin
, Wenhao Liang
, Haoyuan Zhang
, Yinxuan Song
, Zhen Luo
, Haibo Ma
, Yingjin Ma
, Zhong Jin
:
PASCI: A Scalable Framework for Heterogeneous Parallel Calculation of Dynamical Electron Correlation. 1103-1113
Sparse Tensor
- Seungbin Song
, Ju Min Lee
, Haeeun Jeong
, Hyunho Kwon
, Shinnung Jeong
, Jaeho Lee
, Hanjun Kim
:
TeMCO: Tensor Memory Compiler Optimization across Tensor Decompositions in Deep Learning Inference. 1114-1123 - Kaige Zhang
, Xiaoyan Liu
, Hailong Yang
, Tianyu Feng
, Xinyu Yang
, Yi Liu
, Zhongzhi Luan
, Depei Qian
:
Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core. 1124-1134 - YuAng Chen
, Jeffrey Xu Yu
:
Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores. 1135-1144
SpMV
- Deshun Bi
, Shengguo Li
, Dezun Dong
, Peng Zhang
, Jianbin Fang
:
Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization. 1145-1155 - Zhong Zheng
, Junshi Chen
, Yang Zhao
, Longsheng Song
, Xinming Qin
, Hong An
:
DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations. 1156-1165 - Chuhe Hong
, Qinglin Wang
, Runzhang Mao
, Yuechao Liang
, Rui Xia
, Jie Liu
:
SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. 1166-1175 - Haotian Mo
, Qinglin Wang
, Linyu Liao
, Biao Li
, Lihua Chi
, Jie Liu
:
Detailed Analysis and Optimization of Irregular-Shaped Matrix Multiplication on Multi-Core DSPs. 1176-1186
Storage
- Junhyeok Park
, Chang-Gyu Lee
, Soon Hwang
, Soonyeal Yang
, Jungki Noh
, Woosuk Chung
, Junghee Lee
, Youngjae Kim
:
BandSlim: A Novel Bandwidth and Space-Efficient KV-SSD with an Escape-from-Block Approach. 1187-1196 - Guantian Lin
, Si Wu
, Cheng Li
, Yinlong Xu
:
Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses. 1197-1206 - Piao Hu
, Huangzhen Xue
, Chentao Wu
, Jie Li
, Minyi Guo
:
HMT: A Hybrid Mitigating and Transferring Approach on I/O Throughput Degradation for Erasure Coded Storage Systems. 1207-1216 - Renping Liu
, Junhua Chen
, Peng Chen
, Linbo Long
, Anping Xiong
, Duo Liu
:
Hi-ZNS: High Space Efficiency and Zero-Copy LSM-Tree Based Stores on ZNS SSDs. 1217-1226 - Jiawei Huang
, Zihao Chen
, Yiting Wang
, Hui Li
, Zhaoyi Li
, Qile Wang
, Sitan Li
, Zhidong He
, Wanchun Jiang
:
Achieving High Efficiency for Datacenter Multicast using Skewed Bloom Filter. 1227-1236 - Shucheng Wang
, Kaiye Zhou
, Zhandong Guo
, Qiang Cao
, Jun Xu
, Jie Yao
:
SIndex: An SSD-based Large-scale Indexing with Deterministic Latency for Cloud Block Storage. 1237-1246 - Jiawei Huang
, Shengwen Zhou
, Zhaoyi Li
, Yijun Li
, Zihao Chen
, Xiaojun Zhu
, Jing Shao
, Sitan Li
, Wanchun Jiang
, Jianxin Wang
, Ping Zhong
:
Coupling Congestion Control and Flow Pausing in Data Center Network. 1247-1256
