Hari Subramoni
2020 – today
- 2024
- [j17]Tu Tran, Bharath Ramesh, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
Accelerating communication with multi-HCA aware collectives in MPI. Concurr. Comput. Pract. Exp. 36(1) (2024) - [c169]Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Large Language Model Training with Hybrid GPU-based Compression. CCGrid 2024: 196-205 - [c168]Nawras Alnaasan, Horng-Ruey Huang, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models. HOTI 2024: 11-19 - [c167]Tu Tran, Goutham Kalikrishna Reddy Kuncham, Bharath Ramesh, Shulei Xu, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
OHIO: Improving RDMA Network Scalability in MPI_Alltoall Through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design. HOTI 2024: 47-56 - [c166]Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Demystifying the Communication Characteristics for Distributed Transformer Models. HOTI 2024: 57-65 - [c165]Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
The Case for Co-Designing Model Architectures with Hardware. ICPP 2024: 84-96 - [c164]Dhabaleswar K. Panda, Hari Subramoni:
Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 1 - [c163]Dhabaleswar K. Panda, Hari Subramoni:
Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 4 - [c162]Mingzhe Han, Goutham Kalikrishna Reddy Kuncham, Benjamin Michalowicz, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI. IPDPS (Workshops) 2024: 761-770 - [c161]Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. IPDPS 2024: 802-813 - [c160]Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. IPDPS 2024: 915-925 - [c159]Qinghua Zhou, Bharath Ramesh, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters. ISC 2024: 1-12 - [c158]Nicholas Contini, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL. PEARC 2024: 1:1-1:9 - [c157]Radha Gulhane, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning. PEARC 2024: 5:1-5:9 - [c156]Chen-Chun Chen, Goutham Kalikrishna Reddy Kuncham, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
Design and Implementation of an IPC-based Collective MPI Library for Intel GPUs. PEARC 2024: 17:1-17:9 - [c155]Tu Tran, Mustafa Abduljabbar, HooYoung Ahn, SeonYoung Kim, Yoo-Mi Park, Woojong Han, Shin-Young Ahn, Hari Subramoni, Dhabaleswar K. Panda:
OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices. PEARC 2024: 27:1-27:8 - [i15]Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. CoRR abs/2401.08383 (2024) - [i14]Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
The Case for Co-Designing Model Architectures with Hardware. CoRR abs/2401.14489 (2024) - [i13]Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Demystifying the Communication Characteristics for Distributed Transformer Models. CoRR abs/2408.10197 (2024) - [i12]Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer. CoRR abs/2408.16978 (2024) - [i11]Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Large Language Model Training with Hybrid GPU-based Compression. CoRR abs/2409.02423 (2024)
- 2023
- [j16]Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
High Performance MPI over the Slingshot Interconnect. J. Comput. Sci. Technol. 38(1): 128-145 (2023) - [j15]Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries. IEEE Micro 43(2): 131-139 (2023) - [c154]Pouya Kousha, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data. Bench 2023: 104-119 - [c153]Nawras Alnaasan, Matthew Lieber, Aamir Shafi, Hari Subramoni, Scott A. Shearer, Dhabaleswar K. Panda:
HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training. IEEE Big Data 2023: 139-148 - [c152]Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
MPI4Spark Meets YARN: Enhancing MPI4Spark through YARN support for HPC. IEEE Big Data 2023: 2265-2274 - [c151]Chen-Chun Chen, Kawthar Shafie Khorassani, Goutham Kalikrishna Reddy Kuncham, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences. CCGrid 2023: 131-140 - [c150]Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGridW 2023: 346-348 - [c149]Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGrid 2023: 391-402 - [c148]Shulei Xu, Goutham Kalikrishna Reddy Kuncham, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Optimized All-to-All Connection Establishment for High-Performance MPI Libraries Over InfiniBand. HiPC 2023: 41-50 - [c147]Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. HiPC 2023: 107-116 - [c146]Bharath Ramesh, Goutham Kalikrishna Reddy Kuncham, Kaushik Kandadi Suresh, Rahul Vaidya, Nawras Alnaasan, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Designing In-network Computing Aware Reduction Collectives in MPI. HOTI 2023: 25-32 - [c145]Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. Panda, Stephen W. Poole:
Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs. HOTI 2023: 41-48 - [c144]Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of Using Quantization for DNN Inference on Edge Devices. ICFEC 2023: 1-6 - [c143]Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. ICS 2023: 477-487 - [c142]Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. IPDPS 2023: 123-133 - [c141]Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. IPDPS 2023: 134-144 - [c140]Benjamin Michalowicz, Kaushik Kandadi Suresh, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences. IPDPS Workshops 2023: 354-363 - [c139]Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc*. IPDPS 2023: 646-656 - [c138]Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. IPDPS 2023: 996-1006 - [c137]Pouya Kousha, Vivekananda Sathu, Matthew Lieber, Hari Subramoni, Dhabaleswar K. Panda:
Democratizing HPC Access and Use with Knowledge Graphs. SC Workshops 2023: 242-251 - [c136]Chen-Chun Chen, Kawthar Shafie Khorassani, Pouya Kousha, Qinghua Zhou, Jinghan Yao, Hari Subramoni, Dhabaleswar K. Panda:
MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators. SC Workshops 2023: 847-854 - [c135]Pouya Kousha, Arpan Jain, Ayyappa Kolli, Matthew Lieber, Mingzhe Han, Nicholas Contini, Hari Subramoni, Dhabaleswar K. Panda:
SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC. ISC 2023: 402-424 - [c134]Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. Panda, Steve Poole:
DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency Of SmartNICs. PEARC 2023: 94-101 - [c133]Samuel Khuvis, Karen Tomko, Scott R. Brozell, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Optimizing Amber for Device-to-Device GPU Communication. PEARC 2023: 200-205 - [i10]Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version. CoRR abs/2303.05016 (2023) - [i9]Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023) - [i8]Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. CoRR abs/2305.13484 (2023)
- 2022
- [j14]Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs. IEEE Micro 42(2): 53-60 (2022) - [c132]Kinan Al-Attar, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI. CLUSTER 2022: 71-81 - [c131]Apan Qasem, Hartwig Anzt, Eduard Ayguadé, Katharine J. Cahill, Ramon Canal, Jany Chan, Eric Fosler-Lussier, Fritz Göbel, Arpan Jain, Marcel Koch, Mateusz Kuzak, Josep Llosa, Raghu Machiraju, Xavier Martorell, Pratik Nayak, Shameema Oottikkal, Marcin Ostasz, Dhabaleswar K. Panda, Dirk Pleiter, Rajiv Ramnath, Maria-Ribera Sancho, Alessio Sclocco, Aamir Shafi, Hanno Spreeuw, Hari Subramoni, Karen Tomko:
Lightning Talks of EduHPC 2022. EduHPC@SC 2022: 42-49 - [c130]Qinghua Zhou, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. HIPC 2022: 22-31 - [c129]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters. HIPC 2022: 32-41 - [c128]Bharath Ramesh, Qinghua Zhou, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries. HIPC 2022: 95-99 - [c127]Kaushik Kandadi Suresh, Akshay Paniraja Guptha, Benjamin Michalowicz, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters. HIPC 2022: 100-104 - [c126]Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. HOTI 2022: 13-20 - [c125]Tu Tran, Benjamin Michalowicz, Bharath Ramesh, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
Designing Hierarchical Multi-HCA Aware Allgather in MPI. ICPP Workshops 2022: 28:1-28:10 - [c124]Chen-Chun Chen, Kawthar Shafie Khorassani, Quentin G. Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. IPDPS Workshops 2022: 24-33 - [c123]Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter. IPDPS Workshops 2022: 449-456 - [c122]Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Towards Java-based HPC using the MVAPICH2 Library: Early Experiences. IPDPS Workshops 2022: 510-519 - [c121]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. IPDPS Workshops 2022: 870-879 - [c120]Qinghua Zhou, Pouya Kousha, Quentin Anthony, Kawthar Shafie Khorassani, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters. ISC 2022: 3-25 - [c119]Pouya Kousha, Arpan Jain, Ayyappa Kolli, Prasanna Sainath, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
"Hey CAI" - Conversational AI Enabled User Interface for HPC Tools. ISC 2022: 87-108 - [c118]Arpan Jain, Aamir Shafi, Quentin Anthony, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters. ISC 2022: 109-130 - [c117]Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
High Performance MPI over the Slingshot Interconnect: Early Experiences. PEARC 2022: 15:1-15:7
- 2021
- [j13]Dhabaleswar Kumar Panda, Hari Subramoni, Ching-Hsiang Chu, Mohammadreza Bayatpour:
The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci. 52: 101208 (2021) - [c116]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. CCGRID 2021: 113-122 - [c115]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CCGRID 2021: 277-286 - [c114]Bharath Ramesh, Jahanzeb Maqbool Hashmi, Shulei Xu, Aamir Shafi, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems. HiPC 2021: 272-281 - [c113]Yuntian He, Saket Gurukar, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda, Srinivasan Parthasarathy:
DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding. HiPC 2021: 282-291 - [c112]Kaushik Kandadi Suresh, Bharath Ramesh, Chen-Chun Chen, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Layout-aware Hardware-assisted Designs for Derived Data Types in MPI. HiPC 2021: 302-311 - [c111]Nick Sarkauskas, Mohammadreza Bayatpour, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
Large-Message Nonblocking MPI_Iallgather and MPI_Ibcast Offload via BlueField-2 DPU. HiPC 2021: 388-393 - [c110]Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs. HOTI 2021: 17-24 - [c109]Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda:
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters*. IPDPS 2021: 444-453 - [c108]Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen:
SUPER: SUb-Graph Parallelism for TransformERs. IPDPS 2021: 629-638 - [c107]Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda:
Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. IPDPS Workshops 2021: 923-932 - [c106]Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. ISC 2021: 18-37 - [c105]Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. ISC 2021: 118-136 - [c104]Pouya Kousha, Kamal Raj Sankarapandian Dayala Ganesh Ram, Mansa Kedia, Hari Subramoni, Arpan Jain, Aamir Shafi, Dhabaleswar K. Panda, Trey Dockendorf, Heechang Na, Karen Tomko:
INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications. PEARC 2021: 14:1-14:11 - [i7]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CoRR abs/2101.08878 (2021) - [i6]Pouya Kousha, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters. CoRR abs/2109.08329 (2021) - [i5]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. CoRR abs/2110.10659 (2021)
- 2020
- [j12]Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn M. Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni:
EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications. Concurr. Comput. Pract. Exp. 32(3) (2020) - [j11]Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. J. Parallel Distributed Comput. 144: 1-13 (2020) - [j10]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020) - [c103]Mohammadreza Bayatpour, Seyedeh Mahdieh Ghazimirsaeed, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda:
Design and Characterization of InfiniBand Hardware Tag Matching in MPI. CCGRID 2020: 101-110 - [c102]Ching-Hsiang Chu, Kawthar Shafie Khorassani, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER 2020: 130-141 - [c101]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications. HiPC 2020: 111-120 - [c100]Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12 - [c99]Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures. IPDPS 2020: 32-41 - [c98]Amit Ruhela, Shulei Xu, Karthik Vadambacheri Manian, Hari Subramoni, Dhabaleswar K. Panda:
Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR. IPDPS Workshops 2020: 869-878 - [c97]Kaushik Kandadi Suresh, Bharath Ramesh, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI. IPDPS Workshops 2020: 896-905 - [c96]Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023 - [c95]Bharath Ramesh, Kaushik Kandadi Suresh, Nick Sarkauskas, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System. ExaMPI@SC 2020: 11-20 - [c94]Seyedeh Mahdieh Ghazimirsaeed, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR. MLHPC/AI4S@SC 2020: 17-28 - [c93]Shulei Xu, Seyedeh Mahdieh Ghazimirsaeed, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure. IPDRM@SC 2020: 41-48 - [c92]Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45 - [c91]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103 - [c90]Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
Communication-Aware Hardware-Assisted MPI Overlap Engine. ISC 2020: 517-535 - [c89]Pouya Kousha, Kamal Raj S. D., Hari Subramoni, Dhabaleswar K. Panda, Heechang Na, Trey Dockendorf, Karen Tomko:
Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM. PEARC 2020: 215-223
2010 – 2019
- 2019
- [j9]Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda:
Efficient design for MPI asynchronous progress without dedicated resources. Parallel Comput. 85: 13-26 (2019) - [j8]Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019) - [j7]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda:
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019) - [c88]Karthik Vadambacheri Manian, A. A. Ammar, Amit Ruhela, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures. GPGPU@ASPLOS 2019: 43-52 - [c87]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures. CCGRID 2019: 410-419 - [c86]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507 - [c85]Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. CLUSTER 2019: 1-11 - [c84]Pouya Kousha, Bharath Ramesh, Kaushik Kandadi Suresh, Ching-Hsiang Chu, Arpan Jain, Nick Sarkauskas, Hari Subramoni, Dhabaleswar K. Panda:
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters. HiPC 2019: 93-102 - [c83]Ching-Hsiang Chu, Jahanzeb Maqbool Hashmi, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. HiPC 2019: 267-276 - [c82]Sourav Chakraborty, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda:
Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter. Hot Interconnects 2019: 40-44 - [c81]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53 - [c80]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures. IPDPS 2019: 355-364 - [c79]Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni:
High performance distributed deep learning: a beginner's guide. PPoPP 2019: 452-454 - [c78]Amit Ruhela, Bharath Ramesh, Sourav Chakraborty, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast. IPDRM@SC 2019: 34-41 - [c77]Shulei Xu, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda:
Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2. IPDRM@SC 2019: 42-49 - [c76]Arpan Jain, Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. DLS@SC 2019: 76-83 - [c75]Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni:
OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. PMBS@SC 2019: 82-92 - [c74]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences. ISC Workshops 2019: 361-378 - [i4]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow. CoRR abs/1911.05146 (2019)
- 2018
- [j6]Dhabaleswar K. Panda, Xiaoyi Lu, Hari Subramoni:
Networking and communication challenges for post-exascale systems. Frontiers Inf. Technol. Electron. Eng. 19(10): 1230-1235 (2018) - [j5]Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Amit Ruhela, Dhabaleswar K. Panda:
MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU. Parallel Comput. 77: 19-37 (2018) - [c73]Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Pouya Kousha, Dhabaleswar K. Panda:
SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives. CLUSTER 2018: 12-23 - [c72]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152 - [c71]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. IPDPS 2018: 1020-1029 - [c70]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9 - [c69]Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda:
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures. EuroMPI 2018: 4:1-4:10 - [c68]Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda:
Efficient Asynchronous Communication Progress for MPI without Dedicated Resources. EuroMPI 2018: 14:1-14:11 - [c67]Sourav Chakraborty, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Cooperative rendezvous protocols for improved performance and overlap. SC 2018: 28:1-28:13 - [i3]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018)
- 2017
- [c66]Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda:
Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems. CLUSTER 2017: 13-24 - [c65]Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems. CLUSTER 2017: 354-358 - [c64]Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda:
Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand. HiPC 2017: 62-71 - [c63]Jahanzeb Maqbool Hashmi, Khaled Hamidouche, Hari Subramoni, Dhabaleswar K. Panda:
Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors. HiPC 2017: 84-93 - [c62]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda:
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170 - [c61]Jahanzeb Maqbool Hashmi, Mingzhe Li, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting and Evaluating OpenSHMEM on KNL Architecture. OpenSHMEM 2017: 143-158 - [c60]Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Dhabaleswar K. Panda:
MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU. EuroMPI/USA 2017: 16:1-16:11 - [c59]Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC 2017: 8:1-8:8 - [c58]Mohammadreza Bayatpour, Sourav Chakraborty, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
Scalable reduction collectives with data partitioning-based multi-leader design. SC 2017: 64 - [c57]Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda:
Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication. ISC 2017: 334-354 - [i2]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017)
- 2016
- [j4]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016) - [c56]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Dhabaleswar K. Panda:
SHMEMPMI - Shared Memory Based PMI for Improved Performance and Scalability. CCGrid 2016: 60-69 - [c55]Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, Hari Subramoni, Dhabaleswar K. Panda:
Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase. CloudCom 2016: 310-317 - [c54]Mohammadreza Bayatpour, Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda:
Adaptive and Dynamic Design for MPI Tag Matching. CLUSTER 2016: 1-10 - [c53]Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman:
System-Level Scalable Checkpoint-Restart for Petascale Computing. ICPADS 2016: 932-941 - [c52]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Dip Sankar Banerjee, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems. IPDPS 2016: 983-992 - [c51]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters. SBAC-PAD 2016: 59-66 - [c50]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications. COMHPC@SC 2016: 29-38 - [c49]Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda:
Designing MPI library with on-demand paging (ODP) of infiniband: challenges and benefits. SC 2016: 433-443 - [c48]Hari Subramoni, Albert Mathews Augustine, Mark Daniel Arnold, Jonathan L. Perkins, Xiaoyi Lu, Khaled Hamidouche, Dhabaleswar K. Panda:
INAM2: InfiniBand Network Analysis and Monitoring with MPI. ISC 2016: 300-320 - [i1]Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman:
System-level Scalable Checkpoint-Restart for Petascale Computing. CoRR abs/1607.07995 (2016)
- 2015
- [c47]Sourav Chakraborty, Hari Subramoni, Adam Moody, Akshay Venkatesh, Jonathan L. Perkins, Dhabaleswar K. Panda:
Non-Blocking PMI Extensions for Fast MPI Startup. CCGRID 2015: 131-140 - [c46]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. CLUSTER 2015: 78-87 - [c45]Mingzhe Li, Hari Subramoni, Khaled Hamidouche, Xiaoyi Lu, Dhabaleswar K. Panda:
High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits. CLUSTER 2015: 226-235 - [c44]Akshay Venkatesh, Khaled Hamidouche, Hari Subramoni, Dhabaleswar K. Panda:
Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters. HiPC 2015: 234-243 - [c43]Hari Subramoni, Akshay Venkatesh, Khaled Hamidouche, Karen Tomko, Dhabaleswar K. Panda:
Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms. Hot Interconnects 2015: 60-67 - [c42]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Ammar Ahmad Awan, Dhabaleswar K. Panda:
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI. IPDPS Workshops 2015: 235-244 - [c41]A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Jonathan L. Perkins, Hari Subramoni, Dhabaleswar K. Panda:
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks. EuroMPI 2015: 9:1-9:10 - [c40]Hari Subramoni, Ammar Ahmad Awan, Khaled Hamidouche, Dmitry Pekurovsky, Akshay Venkatesh, Sourav Chakraborty, Karen Tomko, Dhabaleswar K. Panda:
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters. ISC 2015: 434-453 - [e1]Dhabaleswar K. Panda, Karl W. Schulz, Khaled Hamidouche, Hari Subramoni:
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, ESPM 2015, Austin, Texas, USA, November 15, 2015. ACM 2015, ISBN 978-1-4503-3996-4
- 2014
- [c39]Akshay Venkatesh, Hari Subramoni, Khaled Hamidouche, Dhabaleswar K. Panda:
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters. HiPC 2014: 1-10 - [c38]Prasad Calyam, Alex Berryman, Erik Saule, Hari Subramoni, Paul Schopis, Gordon Springer, Ümit V. Çatalyürek, Dhabaleswar K. Panda:
Wide-area overlay networking to manage science DMZ accelerated flows. ICNC 2014: 269-275 - [c37]Hari Subramoni, Krishna Chaitanya Kandalla, Jithin Jose, Karen Tomko, Karl W. Schulz, Dmitry Pekurovsky, Dhabaleswar K. Panda:
Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters. ICPP 2014: 231-240 - [c36]Jithin Jose, Sreeram Potluri, Hari Subramoni, Xiaoyi Lu, Khaled Hamidouche, Karl W. Schulz, Hari Sundar, Dhabaleswar K. Panda:
Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models. PGAS 2014: 7:1-7:9 - [c35]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Adam Moody, Mark Daniel Arnold, Dhabaleswar K. Panda:
PMI Extensions for Scalable MPI Startup. EuroMPI/ASIA 2014: 21 - [c34]Hari Subramoni, Khaled Hamidouche, Akshay Venkatesh, Sourav Chakraborty, Dhabaleswar K. Panda:
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences. ISC 2014: 278-295
- 2013
- [c33]Hari Subramoni, Devendar Bureddy, Krishna Chaitanya Kandalla, Karl W. Schulz, Bill Barth, Jonathan L. Perkins, Mark Daniel Arnold, Dhabaleswar K. Panda:
Design of network topology aware scheduling services for large InfiniBand clusters. CLUSTER 2013: 1-8 - [c32]Krishna Chaitanya Kandalla, Hari Subramoni, Karen Tomko, Dmitry Pekurovsky, Dhabaleswar K. Panda:
A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems. ICPP 2013: 611-620 - [c31]Xiaoyi Lu, Nusrat S. Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. Panda:
High-Performance Design of Hadoop RPC with RDMA over InfiniBand. ICPP 2013: 641-650 - [c30]Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. ICS 2013: 399-408 - [c29]Sreeram Potluri, Devendar Bureddy, Hao Wang, Hari Subramoni, Dhabaleswar K. Panda:
Extending OpenSHMEM for GPU Computing. IPDPS 2013: 1001-1012 - [c28]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. Panda:
High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand. IPDPS Workshops 2013: 1908-1917 - [c27]Sreeram Potluri, Devendar Bureddy, Khaled Hamidouche, Akshay Venkatesh, Krishna Chaitanya Kandalla, Hari Subramoni, Dhabaleswar K. Panda:
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for intel MIC clusters. SC 2013: 54:1-54:11
- 2012
- [c26]Jithin Jose, Hari Subramoni, Krishna Chaitanya Kandalla, Md. Wasi-ur-Rahman, Hao Wang, Sundeep Narravula, Dhabaleswar K. Panda:
Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports. CCGRID 2012: 236-243 - [c25]Krishna Chaitanya Kandalla, Aydin Buluç, Hari Subramoni, Karen Tomko, Jérôme Vienne, Leonid Oliker, Dhabaleswar K. Panda:
Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? CLUSTER Workshops 2012: 222-230 - [c24]Raghunath Rajachandrasekar, Jai Jaswani, Hari Subramoni, Dhabaleswar K. Panda:
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework. CLUSTER 2012: 329-336 - [c23]Hari Subramoni, Jérôme Vienne, Dhabaleswar K. Panda:
A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI. Euro-Par Workshops 2012: 439-450 - [c22]Jérôme Vienne, Jitong Chen, Md. Wasi-ur-Rahman, Nusrat S. Islam, Hari Subramoni, Dhabaleswar K. Panda:
Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems. Hot Interconnects 2012: 48-55 - [c21]Jian Huang, Xiangyong Ouyang, Jithin Jose, Md. Wasi-ur-Rahman, Hao Wang, Miao Luo, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
High-Performance Design of HBase with RDMA over InfiniBand. IPDPS 2012: 774-785 - [c20]Krishna Chaitanya Kandalla, Ulrike Meier Yang, Jeff Keasler, Tzanio V. Kolev, Adam Moody, Hari Subramoni, Karen Tomko, Jérôme Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda:
Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers. IPDPS 2012: 1156-1167 - [c19]S. Pai Raikar, Hari Subramoni, Krishna Chaitanya Kandalla, Jérôme Vienne, Dhabaleswar K. Panda:
Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters. IPDPS Workshops 2012: 1160-1167 - [c18]Md. Wasi-ur-Rahman, Jian Huang, Jithin Jose, Xiangyong Ouyang, Hao Wang, Nusrat S. Islam, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
Understanding the communication characteristics in HBase: What are the fundamental bottlenecks? ISPASS 2012: 122-123 - [c17]Nusrat S. Islam, Md. Wasi-ur-Rahman, Jithin Jose, Raghunath Rajachandrasekar, Hao Wang, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
High performance RDMA-based design of HDFS over InfiniBand. SC 2012: 35 - [c16]Hari Subramoni, Sreeram Potluri, Krishna Chaitanya Kandalla, Bill Barth, Jérôme Vienne, Jeff Keasler, Karen A. Tomko, Karl W. Schulz, Adam Moody, Dhabaleswar K. Panda:
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes. SC 2012: 70
- 2011
- [j3]Sayantan Sur, Sreeram Potluri, Krishna Chaitanya Kandalla, Hari Subramoni, Dhabaleswar K. Panda, Karen Tomko:
Codesign for InfiniBand Clusters. Computer 44(11): 31-36 (2011) - [j2]Krishna Chaitanya Kandalla, Hari Subramoni, Karen A. Tomko, Dmitry Pekurovsky, Sayantan Sur, Dhabaleswar K. Panda:
High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT. Comput. Sci. Res. Dev. 26(3-4): 237-246 (2011) - [c15]Hari Subramoni, Krishna Chaitanya Kandalla, Jérôme Vienne, Sayantan Sur, Bill Barth, Karen A. Tomko, Robert T. McLay, Karl W. Schulz, Dhabaleswar K. Panda:
Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters. CLUSTER 2011: 317-325 - [c14]N. Dandapanthula, Hari Subramoni, Jérôme Vienne, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda, Ron Brightwell:
INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool. Euro-Par Workshops (2) 2011: 166-177 - [c13]Krishna Chaitanya Kandalla, Hari Subramoni, Jérôme Vienne, S. Pai Raikar, Karen Tomko, Sayantan Sur, Dhabaleswar K. Panda:
Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL. Hot Interconnects 2011: 27-34 - [c12]Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan Sur, Dhabaleswar K. Panda:
Memcached Design on High Performance RDMA Capable Interconnects. ICPP 2011: 743-752 - [r1]Dhabaleswar K. Panda, Sayantan Sur, Hari Subramoni, Krishna Chaitanya Kandalla:
Collective Communication, Network Support For. Encyclopedia of Parallel Computing 2011: 327-334
- 2010
- [j1]Hari Subramoni, Fabrizio Petrini, Virat Agarwal, Davide Pasetto:
Intra-Socket and Inter-Socket Communication in Multi-core Systems. IEEE Comput. Archit. Lett. 9(1): 13-16 (2010) - [c11]Hari Subramoni, Ping Lai, Rajkumar Kettimuthu, Dhabaleswar K. Panda:
High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand. CCGRID 2010: 557-564 - [c10]Hari Subramoni, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda:
Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine. Hot Interconnects 2010: 40-49 - [c9]Hari Subramoni, Ping Lai, Sayantan Sur, Dhabaleswar K. Panda:
Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters. ICPP 2010: 462-471 - [c8]Miao Luo, Sreeram Potluri, Ping Lai, Emilio Pasquale Mancini, Hari Subramoni, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda:
High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2. ICPP Workshops 2010: 377-386 - [c7]Krishna Chaitanya Kandalla, Hari Subramoni, Abhinav Vishnu, Dhabaleswar K. Panda:
Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather. IPDPS Workshops 2010: 1-8 - [c6]Hari Subramoni, Fabrizio Petrini, Virat Agarwal, Davide Pasetto:
Streaming, low-latency communication in on-line trading systems. IPDPS Workshops 2010: 1-8 - [p1]Hari Subramoni, Fabrizio Petrini, Virat Agarwal, Davide Pasetto:
High Performance Topology-Aware Communication in Multicore Processors. Scientific Computing with Multicore and Accelerators 2010: 443-460
2000 – 2009
- 2009
- [c5]Hari Subramoni, Ping Lai, Miao Luo, Dhabaleswar K. Panda:
RDMA over Ethernet - A preliminary study. CLUSTER 2009: 1-9 - [c4]Hari Subramoni, Matthew J. Koop, Dhabaleswar K. Panda:
Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms. Hot Interconnects 2009: 112-120 - [c3]Ping Lai, Hari Subramoni, Sundeep Narravula, Amith R. Mamidala, Dhabaleswar K. Panda:
Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand. ICPP 2009: 156-163 - [c2]Krishna Chaitanya Kandalla, Hari Subramoni, Gopalakrishnan Santhanaraman, Matthew J. Koop, Dhabaleswar K. Panda:
Designing multi-leader-based Allgather algorithms for multi-core clusters. IPDPS 2009: 1-8
- 2008
- [c1]Sundeep Narravula, Hari Subramoni, Ping Lai, Ranjit Noronha, Dhabaleswar K. Panda:
Performance of HPC Middleware over InfiniBand WAN. ICPP 2008: 304-311
last updated on 2024-10-23 21:28 CEST by the dblp team
all metadata released as open data under CC0 1.0 license