SC 2024: Atlanta, GA, USA
- Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2024, Atlanta, GA, USA, November 17-22, 2024. IEEE 2024
ACM Gordon Bell Climate Modeling Finalists
- Xiao Wang, Siyan Liu, Aristeidis Tsaris, Jong-Youl Choi, Ashwin M. Aji, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash:
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability. 1
- Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun:
Boosting Earth System Model Outputs And Saving PetaBytes in Their Storage Using Exascale Climate Emulators. 2
- Junlin Wei, Xiang Han, Jiangfeng Yu, Jinrong Jiang, Hailong Liu, Pengfei Lin, Maoxue Yu, Kai Xu, Lian Zhao, Pengfei Wang, Weipeng Zheng, Jingwei Xie, Yanzhi Zhou, Tao Zhang, Feng Zhang, Yehong Zhang, Yue Yu, Yuzhu Wang, Yidi Bai, Chen Li, Zipeng Yu, Haoyu Deng, Yaxin Li, Xuebin Chi:
A Performance-Portable Kilometer-Scale Global Ocean Model on ORISE and New Sunway Heterogeneous Supercomputers. 3
ACM Gordon Bell Finalists
- Siddharth Singh, Prajwal Singhania, Aditya Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele:
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers. 4
- Honghui Shang, Ying Liu, Zhikun Wu, Zhenchuan Chen, Jinfeng Liu, Meiyue Shao, Yingzhou Li, Bowen Kan, Huimin Cui, Xiaobing Feng, Yunquan Zhang, Donald G. Truhlar, Hong An, Xiao He, Jinlong Yang:
Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectra of a Biological System with 100 Million Atoms. 5
- Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes:
Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression. 6
- Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni Katti Sastry, Huihuo Zheng, Logan T. Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian T. Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan:
MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization. 7
- Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan P. Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam:
Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System. 8
- Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca:
Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials. 9
Technical Papers: Energy and Carbon-Efficient Architectures
- Dongho Ha, Yunan Zhang, Chen-Chien Kao, Christopher J. Hughes, Won Woo Ro, Hung-Wei Tseng:
M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs. 10
- Yiwei Li, Mingyu Gao:
Hydrogen: Contention-Aware Hybrid Memory for Heterogeneous CPU-GPU Architectures. 11
- Yankai Jiang, Rohan Basu Roy, Baolin Li, Devesh Tiwari:
EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing. 12
Technical Papers: High-Performance Compression and I/O Management
- Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello:
cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation. 13
- Zhenbo Qiao, Qirui Tian, Zhenlu Qin, Jinzhen Wang, Qing Liu, Norbert Podhorszki, Scott Klasky, Hongjian Zhu:
Tango: A Cross-layer Approach to Managing I/O Interference over Local Ephemeral Storage. 14
- Yafan Huang, Sheng Di, Guanpeng Li, Franck Cappello:
cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio. 15
Technical Papers: Workflow Characterization and Optimization
- Malgorzata Lazuka, Andreea Anghel, Thomas P. Parnell:
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services. 16
- Hariharan Devarajan, Loïc Pottier, Kaushik Velusamy, Huihuo Zheng, Izzet Yildirim, Olga Kogiou, Weikuan Yu, Anthony Kougkas, Xian-He Sun, Jae-Seung Yeom, Kathryn M. Mohror:
DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows. 17
- Michael Mandulak, Sayan Ghosh, S. M. Ferdous, Mahantesh Halappanavar, George M. Slota:
Efficient Weighted Graph Matching on GPUs. 18
Technical Papers: Compiler Analysis and Code Generation
- Ryuichi Sai, John M. Mellor-Crummey, Jinfan Xu, Mauricio Araya-Polo:
Automated Code Generation of High-Order Stencils for a Dataflow Architecture. 19
- Xiaoyan Liu, Xinyu Yang, Kejie Ma, Shanghao Liu, Kaige Zhang, Hailong Yang, Yi Liu, Zhongzhi Luan, Depei Qian:
Moirae: Generating High-Performance Composite Stencil Programs with Global Optimizations. 20
- Du Wu, Jintao Meng, Wenxi Zhu, Minwen Deng, Xiao Wang, Tao Luo, Mohamed Wahib, Yanjie Wei:
autoGEMM: Pushing the Limits of Irregular Matrix Multiplication on Arm Architectures. 21
Technical Papers: Power Management and Cooling
- Zeyu Yang, Karel Adámek, Wesley Armour:
Accurate and Convenient Energy Measurements for GPUs: A Detailed Study of NVIDIA GPU's Built-In Power Sensor. 22
- Wesley Brewer, Matthias Maiterth, Vineet Kumar, Rafal P. Wojda, Sedrick Bouknight, Jesse Hines, Woong Shin, Scott Greenwood, David Grant, Wesley Williams, Feiyi Wang:
A Digital Twin Framework for Liquid-cooled Supercomputers as Demonstrated at Exascale. 23
- Ana Luisa Veroneze Solórzano, Kento Sato, Keiji Yamamoto, Fumiyoshi Shoji, Jim M. Brandt, Benjamin Schwaller, Sara Petra Walton, Jennifer Green, Devesh Tiwari:
Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. 24
Technical Papers: Scheduling
- Yiqin Dai, Ruibo Wang, Yong Dong, Kai Lu:
Towards Highly Compatible I/O-Aware Workflow Scheduling on HPC Systems. 25
- Rutwik Jain, Brandon Tran, Keting Chen, Matthew D. Sinclair, Shivaram Venkataraman:
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters. 26
- Donghyeon Ryu, Chanik Park:
Toward High-Performance Blockchain System by Blurring the Line between Ordering and Execution. 27
Technical Papers: Advanced Computational Methods and Architectures
- Ryuichi Sai, François P. Hamon, John M. Mellor-Crummey, Mauricio Araya-Polo:
Matrix-Free Finite Volume Kernels on a Dataflow Architecture. 28
- Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang:
Rapid GPU-Based Pangenome Graph Layout. 29
- Jianxiong Li, Boyang Li, Zhuoqiang Guo, Mingzhen Li, Enji Li, Lijun Liu, Guojun Yuan, Zhan Wang, Guangming Tan, Weile Jia:
Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day. 30
Technical Papers: Leadership-class Supercomputers
- Awais Khan, John R. Lange, Nick Hagerty, Edwin F. Posada, John K. Holmen, James B. White, James Austin Harris, Verónica Melesse Vergara, Christopher Zimmer, Scott Atchley:
An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers. 31
- Andreas Herten, Sebastian Achilles, Damian Alvarez, Jayesh Badwaik, Eric Behle, Mathis Bode, Thomas Breuer, Daniel Caviedes-Voullième, Mehdi Cherti, Adel Dabah, Salem El Sayed, Wolfgang Frings, Ana Gonzalez-Nicolas, Eric B. Gregory, Kaveh Haghighi Mood, Thorsten Hater, Jenia Jitsev, Chelsea Maria John, Jan H. Meinke, Catrin I. Meyer, Pavel Mezentsev, Jan-Oliver Mirus, Stepan Nassyr, Carolin Penke, Manoel Römmer, Ujjwal Sinha, Benedikt von St. Vieth, Olaf Stein, Estela Suarez, Dennis Willsch, Ilya Zhukov:
Application-Driven Exascale: The JUPITER Benchmark Suite. 32
- Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler:
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. 33
Technical Papers: Parallel Program Analysis and Code Optimization
- Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng:
MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators. 34
- Luke Marzen, Akash Dutta, Ali Jannesari:
Static Generation of Efficient OpenMP Offload Data Mappings. 35
- John Jacobson, Martin Burtscher, Ganesh Gopalakrishnan:
HiRace: Accurate and Fast Data Race Checking for GPU Programs. 36
Technical Papers: Serverless Computing and Disaggregated Memory
- Jing Wang, Hanzhang Yang, Chao Li, Yiming Zhuansun, Wang Yuan, Cheng Xu, Xiaofeng Hou, Minyi Guo, Yang Hu, Yaqian Zhao:
Boosting Data Center Performance via Intelligently Managed Multi-backend Disaggregated Memory. 37
- Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, Chengzhong Xu:
SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing. 38
- Hanfei Yu, Hao Wang, Devesh Tiwari, Jian Li, Seung-Jong Park:
Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing. 39
Technical Papers: GPU Optimizations for ML
- Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari:
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation. 40
- Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, Yufei Ding:
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules. 41
- Munkyu Lee, Sihoon Seong, Minki Kang, Jihyuk Lee, Gap-Joo Na, In-Geol Chun, Dimitrios Nikolopoulos, Cheol-Ho Hong:
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments. 42
Technical Papers: Parallel Programming Frameworks, Libraries and Runtimes
- Cédric Augonnet, Andrei Alexandrescu, Albert Sidelnik, Michael Garland:
CUDASTF: Bridging the Gap Between CUDA and Task Parallelism. 43
- Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders:
KaMPIng: Flexible and (Near) Zero-Overhead C++ Bindings for MPI. 44
- George Karlos, Henri E. Bal, Lin Wang:
NetCL: A Unified Programming Framework for In-Network Computing. 45
Technical Papers: Sparse Matrix Computations
- Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad:
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication. 46
- Yuxi Hong, Aydin Buluç:
A Sparsity-Aware Distributed-Memory Algorithm for Sparse-Sparse Matrix Multiplication. 47
- Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Jian Zhang, Liang Deng, Yue Ding, Qingsong Wang, Shizhao Chen, Yonggang Che, Jie Liu:
A Conflict-aware Divide-and-Conquer Algorithm for Symmetric Sparse Matrix-Vector Multiplication. 48
Technical Papers: Machine Learning Applications
- Weihu Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng:
Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching. 49
- Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng:
Scaling New Heights: Transformative Cross-GPU Sampling for Training Billion-Edge Graphs. 50
- Youguang Chen, Zheyu Wen, George Biros:
A Scalable Algorithm for Active Learning. 51
Technical Papers: Matrix Computations on Tensor Cores
- Yuechen Lu, Lijie Zeng, Tengcheng Wang, Xu Fu, Wenxuan Li, Helin Cheng, Dechuang Yang, Zhou Jin, Marc Casas, Weifeng Liu:
AmgT: Algebraic Multigrid Solver on Tensor Cores. 52
- Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang:
LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores. 53
- Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler:
High Performance Unstructured SpMM Computation Using Tensor Cores. 54
Technical Papers: Performance Analysis
- Yafan Huang, Sheng Di, Zhaorui Zhang, Xiaoyi Lu, Guanpeng Li:
Versatile Datapath Soft Error Detection on the Cheap for HPC Applications. 55
- Francesco Antici, Andrea Bartolini, Zeynep Kiziltan, Özalp Babaoglu, Yuetsu Kodama:
MCBound: An Online Framework to Characterize and Classify Memory/Compute-bound HPC Jobs. 56
- Xin You, Zhibo Xuan, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian:
GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems. 57
Technical Papers: High-Performance Solvers
- Dechuang Yang, Yuxuan Zhao, Yiduo Niu, Weile Jia, En Shao, Weifeng Liu, Guangming Tan, Zhou Jin:
Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs. 58
- Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong:
DBSR: An Efficient Storage Format for Vectorizing Sparse Triangular Solvers on Structured Grids. 59
- Shikhar Shah, Boqin Zhang, Hua Huang, John E. Pask, Phanish Suryanarayana, Edmond Chow:
Many-Body Electronic Correlation Energy using Krylov Subspace Linear Solvers. 60
Technical Papers: HPC for Physics and Material Science
- Wentiao Wu, Zhengbang Zhou, Qingcai Jiang, Junwei Feng, Xinming Qin, Huanhuan Ma, Zhenwei Cao, Junshi Chen, Sheng Chen, Xinyong Meng, Bingkun Hou, Yuanfan Xiong, Linhao Wang, Yixuan Sun, Hong An, Jinlong Yang, Wei Hu:
Enabling 13K-Atom Excited-State GW Calculations via Low-Rank Approximations and HPC on the New Sunway Supercomputer. 61
- Barry Sly-Delgado, Ben Tovar, Jin Zhou, Douglas Thain:
Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine. 62
- Leonard Deuschle, Alexander Maeder, Vincent Maillou, Nicolas Vetsch, Anders Winka, Jiang Cao, Alexandros Nikolaos Ziogas, Mathieu Luisier:
Towards Exascale Simulations of Nanoelectronic Devices in the GW Approximation. 63
Technical Papers: Performance Modeling
- Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler:
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming. 64
- Nan Ding, Brian Austin, Yang Liu, Neil Mehta, Steven Farrell, Johannes P. Blaschke, Leonid Oliker, Hai Ah Nam, Nicholas J. Wright, Samuel Williams:
A Workflow Roofline Model for End-to-End Workflow Performance Analysis. 65
- Lingda Li, Thomas Flynn, Adolfy Hoisie:
Learning Generalizable Program and Architecture Representations for Performance Modeling. 66
Technical Papers: Quantum and Approximate Computing I
- Daniel Silver, Aditya Ranjan, Rakesh Achutha, Tirthak Patel, Devesh Tiwari:
LexiQL: Quantum Natural Language Processing on NISQ-era Machines. 67
- Yuwei Jin, Xiangyu Gao, Minghao Guo, Henry Chen, Fei Hua, Chi Zhang, Eddy Z. Zhang:
Optimizing Quantum Fourier Transformation (QFT) Kernels for Modern NISQ and FT Architectures. 68
- Marzio Vallero, Gioele Casagranda, Flavio Vella, Paolo Rech:
On the Efficacy of Surface Codes in Compensating for Radiation Events in Superconducting Devices. 69
Technical Papers: Analysis of HPC Systems
- Jeremiah Giordani, Ziyang Xu, Ella Colby, August Ning, Bhargav Reddy Godala, Ishita Chaturvedi, Shaowei Zhu, Yebin Chon, Greg Chan, Zujun Tan, Galen Collier, Jonathan D. Halverson, Enrico Armenio Deiana, Jasper Liang, Federico Sossai, Yian Su, Atmn Patel, Bangyen Pham, Nathan Greiner, Simone Campanoni, David I. August:
Revisiting Computation for Research: Practices and Trends. 70
- Anna Giannakou, Damian Hazen, Bjoern Enders, Lavanya Ramakrishnan, Nicholas J. Wright:
Understanding Data Movement Patterns in HPC: A NERSC Case Study. 71
Technical Papers: Quantum and Approximate Computing II
- Zane Fink, Konstantinos Parasyris, Praneet Rathi, Giorgis Georgakoudis, Harshitha Menon, Peer-Timo Bremer:
HPAC-ML: A Programming Model for Embedding ML Surrogates in Scientific Applications. 72
- Jason Ludmir, Tirthak Patel:
Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints. 73
Technical Papers: Sparsity and Quantization in ML
- Yidong Chen, Chen Zhang, Rongchao Dong, Haoyuan Zhang, Yonghua Zhang, Zhonghua Lu, Jidong Zhai:
MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction. 74
- Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang:
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity. 75
Technical Papers: Efficient Transformers
- Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Masaharu Munetomo, Mohamed Wahib:
Adaptive Patching for High-resolution Image Segmentation with Transformers. 76
- Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang:
TorchGT: A Holistic System for Large-Scale Graph Transformer Training. 77
- Xun Wang, Zeyang Zhu, Xiangyu Meng, Tao Song:
Exploring Efficient Partial Differential Equation Solution Using Speed Galerkin Transformer. 78
Technical Papers: Quantum and Approximate Computing III
- Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhilin Pei, Xingcheng Zhang, Wanli Ouyang:
Surpassing Sycamore: Achieving Energetic Superiority Through System-Level Circuit Simulation. 79
- Mekena Metcalf, Pablo Andrés-Martínez, Nathan Fitzpatrick:
Realizing Quantum Kernel Models at Scale with Matrix Product State Simulation. 80
- Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia:
Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs. 81
Technical Papers: Resource Utilization and Package Management
- Weicheng Xue, Kai Yang, Yongxiang Liu, Dengdong Fan, Pengxiang Xu, Yonghong Tian:
Unlocking High Performance with Low-Bit NPUs and CPUs for Highly Optimized HPL-MxP on Cloud Brain II. 82
- Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou:
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning. 83
- Daniel Nichols, Harshitha Menon, Todd Gamblin, Abhinav Bhatele:
A Probabilistic Approach To Selecting Build Configurations in Package Managers. 84
Technical Papers: Scientific Data Processing and Visualization
- Daoce Wang, Pascal Grosset, Jesus Pulido, Tushar M. Athawale, Jiannan Tian, Kai Zhao, Zarija Lukic, Axel Huebl, Zhe Wang, James P. Ahrens, Dingwen Tao:
A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization. 85
- Xuan Wu, Qian Gong, Jieyang Chen, Qing Liu, Norbert Podhorszki, Xin Liang, Scott Klasky:
Error-controlled Progressive Retrieval of Scientific Data under Derivable Quantities of Interest. 86
- Ankush Jain, Charles D. Cranor, Qing Zheng, Bradley W. Settlemyer, George Amvrosiadis, Gary A. Grider:
CARP: Range Query-Optimized Indexing for Streaming Data. 87
Technical Papers: Communication Optimization for ML
- Kishore Punniyamurthy, Khaled Hamidouche, Bradford M. Beckmann:
Optimizing Distributed ML Communication with Fused Computation-Collective Operations. 88
- Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao:
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression. 89
- Yuanxin Wei, Jiangsu Du, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang, Nong Xiao, Yutong Lu:
APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes. 90
Technical Papers: Computational Efficiency and Learning Techniques
- Manasa Kaniselvan, Alexander Maeder, Marko Mladenovic, Mathieu Luisier, Alexandros Nikolaos Ziogas:
Accelerated Atomistic Kinetic Monte Carlo Simulations of Resistive Memory Arrays. 91
- Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman:
Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning. 92
- Aristotle X. Martin, Geng Liu, Bálint Joó, Runxin Wu, Mohammed Shihab Kabir, Erik W. Draeger, Amanda Randles:
Designing a GPU-Accelerated Communication Layer for Efficient Fluid-Structure Interaction Computations on Heterogeneous Systems. 93
Technical Papers: Scale-Up Interconnects
- Dong Xu, Yuan Feng, Kwangsik Shin, Daewoo Kim, Hyeran Jeon, Dong Li:
Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link. 94
- Albert Cho, Anish Saxena, Moinuddin Qureshi, Alexandros Daglis:
COAXIAL: A CXL-Centric Memory System for Scalable Servers. 95
- Yinxiao Feng, Kaisheng Ma:
Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration. 96
Technical Papers: Scaling and Checkpointing
- Anyesha Ghosh, Neeraja J. Yadwadkar, Mattan Erez:
Fast and Efficient Scaling for Microservices with SurgeGuard. 97
- Theresa Pollinger, Alexander Van Craen, Philipp Offenhäuser, Dirk Pflüger:
Realizing Joint Extreme-Scale Simulations on Multiple Supercomputers - Two Superfacility Case Studies. 98
- Xiang Fu, Weiping Zhang, Shiman Meng, Xin Huang, Wubiao Xu, Luanzheng Guo, Kento Sato:
AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis. 99
Technical Papers: Graph Algorithms and Computation on Graphs
- Zhe Pan, Shuibing He, Xu Li, Xuechen Zhang, Yanlong Yin, Rui Wang, Lidan Shou, Mingli Song, Xian-He Sun, Gang Chen:
Enumeration of Billions of Maximal Bicliques in Bipartite Graphs without Using GPUs. 100
- Junya Arai, Masahiro Nakao, Yuto Inoue, Kanto Teranishi, Koji Ueno, Keiichiro Yamamura, Mitsuhisa Sato, Katsuki Fujisawa:
Doubling Graph Traversal Efficiency to 198 TeraTEPS on the Supercomputer Fugaku. 101
- Shubhendra Pal Singhal, Souvadra Hati, Jeffrey Young, Vivek Sarkar, Akihiro Hayashi, Richard W. Vuduc:
Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization. 102
Technical Papers: Scale-Out Interconnects
- Mikhail Khalilov, Salvatore Di Girolamo, Marcin Chrapek, Rami Nudelman, Gil Bloch, Torsten Hoefler:
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. 103
- Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression. 104
- Guangnan Feng, Jiabin Xie, Dezun Dong, Yutong Lu:
UNR: Unified Notifiable RMA Library for HPC. 105
Technical Papers: Storage Management
- Shi Qiu, Li Wang, Yiming Zhang:
EXO: Accelerating Storage Paravirtualization with eBPF. 106
- Hai Zhou, Dan Feng, Yuchong Hu, Wei Wang, Huadong Huang:
CoRD: Combining Raid and Delta for Fast Partial Updates in Erasure-Coded Storage Clusters. 107
- Luke Logan, Anthony Kougkas, Xian-He Sun:
MegaMmap: Blurring the Boundary Between Memory and Storage for Data-Intensive Workloads. 108