Ammar Ahmad Awan

A. A. Awan

Person information

affiliation: The Ohio State University, Columbus, OH, USA

2020 – today

2024
[i18]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2401-08671
Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He:
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference. CoRR abs/2401.08671 (2024)
[i17]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2404-14219
Marah I Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat S. Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou:
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. CoRR abs/2404.14219 (2024)
2023
[c34]
  persistent URL:
  - https://dblp.org/rec/conf/ics/SinghRARHB23
Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele:
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. ICS 2023: 203-214
[c33]
  persistent URL:
  - https://dblp.org/rec/conf/ipps/AnthonyARHSASP23
Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. IPDPS 2023: 996-1006
[i16]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2303-06318
Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele:
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training. CoRR abs/2303.06318 (2023)
[i15]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2303-08374
Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023)
[i14]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2308-01320
Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He:
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. CoRR abs/2308.01320 (2023)
[i13]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-14327
Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He:
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention. CoRR abs/2309.14327 (2023)
[i12]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2310-04610
Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan A. Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael W. Irvin, J. Gregory Pauloski, Logan T. Ward, Valérie Hayot-Sasson, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian T. Foster, James J. Davis, Michael E. Papka, Thomas S. Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi A. Hanson, Thomas E. Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton D. Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin M. Aji, Angela Dalton, Michael J. Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens:
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. CoRR abs/2310.04610 (2023)
2022
[c32]
  persistent URL:
  - https://dblp.org/rec/conf/hipc/LiATRH22
Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He:
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. HiPC 2022: 272-281
[c31]
  persistent URL:
  - https://dblp.org/rec/conf/icml/RajbhandariLYZA22
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He:
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. ICML 2022: 18332-18346
[c30]
  persistent URL:
  - https://dblp.org/rec/conf/sc/AminabadiRALLZRSZRH22
Reza Yazdani Aminabadi, Samyam Rajbhandari, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Minjia Zhang, Jeff Rasley, Yuxiong He:
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC 2022: 46:1-46:15
[i11]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2201-05596
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He:
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. CoRR abs/2201.05596 (2022)
[i10]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2207-00032
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He:
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. CoRR abs/2207.00032 (2022)
2021
[c29]
  persistent URL:
  - https://dblp.org/rec/conf/icml/TangGARLLLZH21
Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He:
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. ICML 2021: 10118-10129
[i9]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2102-02888
Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He:
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. CoRR abs/2102.02888 (2021)
[i8]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2104-06069
Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He:
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. CoRR abs/2104.06069 (2021)
[i7]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2109-10465
Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andrés Felipe Cruz-Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla:
Scalable and Efficient MoE Training for Multitask Multilingual Models. CoRR abs/2109.10465 (2021)
2020
[j5]
  persistent URL:
  - https://dblp.org/rec/journals/micro/AwanJCSP20
Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020)
[c28]
  persistent URL:
  - https://dblp.org/rec/conf/ics/ChuKAKSP20
Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12
[c27]
  persistent URL:
  - https://dblp.org/rec/conf/ipps/AnthonyAJSP20
Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023
[c26]
  persistent URL:
  - https://dblp.org/rec/conf/sc/JainAAHASPMP20
Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45
[c25]
  persistent URL:
  - https://dblp.org/rec/conf/supercomputer/AwanJASP20
Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103

2010 – 2019

2019
[j4]
  persistent URL:
  - https://dblp.org/rec/journals/pc/AwanMCSP19
Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019)
[j3]
  persistent URL:
  - https://dblp.org/rec/journals/tpds/ChuLASEP19
Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda:
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019)
[c24]
  persistent URL:
  - https://dblp.org/rec/conf/ccgrid/AwanBCSP19
Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507
[c23]
  persistent URL:
  - https://dblp.org/rec/conf/cluster/JainAASP19
Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. CLUSTER 2019: 1-11
[c22]
  persistent URL:
  - https://dblp.org/rec/conf/hoti/AwanJCSP19
Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53
[c21]
  persistent URL:
  - https://dblp.org/rec/conf/ppopp/PandaAS19
Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni:
High performance distributed deep learning: a beginner's guide. PPoPP 2019: 452-454
[c20]
  persistent URL:
  - https://dblp.org/rec/conf/sc/JainASP19
Arpan Jain, Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. DLS@SC 2019: 76-83
[c19]
  persistent URL:
  - https://dblp.org/rec/conf/sc/ManianCAKS19
Karthik Vadambacheri Manian, Ching-Hsiang Chu, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni:
OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. PMBS@SC 2019: 82-92
[i6]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1911-05146
Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow. CoRR abs/1911.05146 (2019)
2018
[c18]
  persistent URL:
  - https://dblp.org/rec/conf/hipc/AwanCSLP18
Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152
[c17]
  persistent URL:
  - https://dblp.org/rec/conf/pvm/AwanCSP18
Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9
[i5]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1810-11112
Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018)
2017
[c16]
  persistent URL:
  - https://dblp.org/rec/conf/icpp/ChuLASHEP17
Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda:
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170
[c15]
  persistent URL:
  - https://dblp.org/rec/conf/ppopp/AwanHHP17
Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. PPoPP 2017: 193-205
[c14]
  persistent URL:
  - https://dblp.org/rec/conf/sc/AwanSP17
Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC 2017: 8:1-8:8
[i4]
  persistent URL:
  - https://dblp.org/rec/journals/corr/AwanCSP17
Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017)
2016
[j2]
  persistent URL:
  - https://dblp.org/rec/journals/pc/HamidoucheVASCP16
Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016)
[c13]
  persistent URL:
  - https://dblp.org/rec/conf/ccgrid/ChuHVAP16
Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Dhabaleswar K. Panda:
CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters. CCGrid 2016: 726-735
[c12]
  persistent URL:
  - https://dblp.org/rec/conf/hipc/HamidoucheAVP16
Khaled Hamidouche, Ammar Ahmad Awan, Akshay Venkatesh, Dhabaleswar K. Panda:
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC. HiPC 2016: 52-61
[c11]
  persistent URL:
  - https://dblp.org/rec/conf/pvm/AwanHVP16
A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Dhabaleswar K. Panda:
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning. EuroMPI 2016: 15-22
2015
[c10]
  persistent URL:
  - https://dblp.org/rec/conf/cluster/HamidoucheVASCP15
Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. CLUSTER 2015: 78-87
[c9]
  persistent URL:
  - https://dblp.org/rec/conf/ipps/0003SPAP15
Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Ammar Ahmad Awan, Dhabaleswar K. Panda:
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI. IPDPS Workshops 2015: 235-244
[c8]
  persistent URL:
  - https://dblp.org/rec/conf/openshmem/AwanHCP15
A. A. Awan, Khaled Hamidouche, Ching-Hsiang Chu, Dhabaleswar K. Panda:
A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X. OpenSHMEM 2015: 69-86
[c7]
  persistent URL:
  - https://dblp.org/rec/conf/pvm/AwanHVPSP15
A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Jonathan L. Perkins, Hari Subramoni, Dhabaleswar K. Panda:
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks. EuroMPI 2015: 9:1-9:10
[c6]
  persistent URL:
  - https://dblp.org/rec/conf/supercomputer/SubramoniAHPVCT15
Hari Subramoni, Ammar Ahmad Awan, Khaled Hamidouche, Dmitry Pekurovsky, Akshay Venkatesh, Sourav Chakraborty, Karen Tomko, Dhabaleswar K. Panda:
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters. ISC 2015: 434-453
2013
[j1]
  persistent URL:
  - https://dblp.org/rec/journals/tjs/PervezAKLH13
Zeeshan Pervez, Ammar Ahmad Awan, Asad Masood Khattak, Sungyoung Lee, Eui-Nam Huh:
Privacy-aware searching with oblivious term matching for cloud storage. J. Supercomput. 63(2): 538-560 (2013)
[c5]
  persistent URL:
  - https://dblp.org/rec/conf/bwcca/AmjadJHAR13
N. Amjad, Nadeem Javaid, Arsalan Haider, A. A. Awan, M. Rahman:
DREEM-ME: Distributed Regional Energy Efficient Multi-hop Routing Protocol Based on Maximum Energy in WSNs. BWCCA 2013: 43-48
[c4]
  persistent URL:
  - https://dblp.org/rec/conf/bwcca/HaiderJAAKK13
Arsalan Haider, Nadeem Javaid, N. Amjad, A. A. Awan, Abid Khan, Nasir Khan:
REECH-ME: Regional Energy Efficient Cluster Heads Based on Maximum Energy Routing Protocol for WSNs. BWCCA 2013: 88-92
[c3]
  persistent URL:
  - https://dblp.org/rec/conf/ccgrid/AwanAHSL13
Ammar Ahmad Awan, Muhammad Bilal Amin, Shujaat Hussain, Aamir Shafi, Sungyoung Lee:
An MPI-IO Compliant Java Based Parallel I/O Library. CCGRID 2013: 174-175
[i3]
  persistent URL:
  - https://dblp.org/rec/journals/corr/HaiderJAAKK13
Arsalan Haider, Nadeem Javaid, N. Amjad, A. A. Awan, Abid Khan, Nasir Khan:
REECH-ME: Regional Energy Efficient Cluster Heads based on Maximum Energy Routing Protocol for WSNs. CoRR abs/1307.7052 (2013)
[i2]
  persistent URL:
  - https://dblp.org/rec/journals/corr/AmjadJHAR13
N. Amjad, Nadeem Javaid, Arsalan Haider, A. A. Awan, M. Rahman:
DREEM-ME: Distributed Regional Energy Efficient Multi-hop Routing Protocol based on Maximum Energy in WSNs. CoRR abs/1307.7075 (2013)
[i1]
  persistent URL:
  - https://dblp.org/rec/journals/corr/RehmanJHAAQKQ13
Obaid Ur Rehman, Nadeem Javaid, Arsalan Haider, N. Amjad, A. A. Awan, M. Qamar, Zahoor Ali Khan, U. Qasim:
An Energy Efficient Decoding Scheme for Wireless Body Area Sensor Networks. CoRR abs/1309.4374 (2013)
2012
[c2]
  persistent URL:
  - https://dblp.org/rec/conf/icuimc/AminKAL12
Muhammad Bilal Amin, Wajahat Ali Khan, Ammar Ahmad Awan, Sungyoung Lee:
Intercloud message exchange middleware. ICUIMC 2012: 79:1-79:7
[c1]
  persistent URL:
  - https://dblp.org/rec/conf/pdcat/AwanASL12
Ammar Ahmad Awan, Muhammad Sohaib Ayub, Aamir Shafi, Sungyoung Lee:
Towards Efficient Support for Parallel I/O in Java HPC. PDCAT 2012: 137-143
