default search action
Mostofa Patwary
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c16]Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Liu, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Data, Data Everywhere: A Guide for Pretraining Dataset Construction. EMNLP 2024: 10671-10695 - [c15]Jiaxuan You, Mingjie Liu, Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
LLM-Evolve: Evaluation for LLM's Evolving Capability on Benchmarks. EMNLP 2024: 16937-16942 - [i23]Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick LeGresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan M. Cohen, Bryan Catanzaro:
Nemotron-4 15B Technical Report. CoRR abs/2402.16819 (2024) - [i22]Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan M. Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu:
Nemotron-4 340B Technical Report. CoRR abs/2406.11704 (2024) - [i21]Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Li, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Data, Data Everywhere: A Guide for Pretraining Dataset Construction. CoRR abs/2407.06380 (2024) - [i20]Jupinder Parmar, Sanjeev Satheesh, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models. CoRR abs/2407.07263 (2024) - [i19]Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov:
Compact Language Models via Pruning and Knowledge Distillation. CoRR abs/2407.14679 (2024) - [i18]Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov:
LLM Pruning and Distillation in Practice: The Minitron Approach. CoRR abs/2408.11796 (2024) - [i17]Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, Sanjeev Satheesh, Eric Nyberg, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs. CoRR abs/2410.12881 (2024) - 2023
- [c14]Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro:
Context Generation Improves Open Domain Question Answering. EACL (Findings) 2023: 781-796 - [c13]Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Adding Instructions during Pretraining: Effective way of Controlling Toxicity in Language Models. EACL 2023: 2628-2643 - [i16]Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models. CoRR abs/2302.07388 (2023) - 2022
- [c12]Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro:
Multi-Stage Prompting for Knowledgeable Dialogue Generation. ACL (Findings) 2022: 1317-1337 - [c11]Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro:
Evaluating Parameter Efficient Learning for Generation. EMNLP 2022: 4824-4833 - [c10]Mostofa Patwary:
Keynote Talk 2 Training Large Language Models: Challenges and Opportunities. IPDPS Workshops 2022: 1245 - [c9]Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro:
Factuality Enhanced Language Models for Open-Ended Text Generation. NeurIPS 2022 - [c8]Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro:
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models. NeurIPS 2022 - [i15]Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zheng, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro:
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. CoRR abs/2201.11990 (2022) - [i14]Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro:
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models. CoRR abs/2202.04173 (2022) - [i13]Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro:
Multi-Stage Prompting for Knowledgeable Dialogue Generation. CoRR abs/2203.08745 (2022) - [i12]Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Factuality Enhanced Language Models for Open-Ended Text Generation. CoRR abs/2206.04624 (2022) - [i11]Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro:
Context Generation Improves Open Domain Question Answering. CoRR abs/2210.06349 (2022) - [i10]Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro:
Evaluating Parameter Efficient Learning for Generation. CoRR abs/2210.13673 (2022) - 2021
- [c7]Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L. Hamilton, Bryan Catanzaro:
End-to-End Training of Neural Retrievers for Open-Domain Question Answering. ACL/IJCNLP (1) 2021: 6648-6662 - [c6]Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia:
Efficient large-scale language model training on GPU clusters using megatron-LM. SC 2021: 58 - [i9]Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L. Hamilton, Bryan Catanzaro:
End-to-End Training of Neural Retrievers for Open-Domain Question Answering. CoRR abs/2101.00408 (2021) - [i8]Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia:
Efficient Large-Scale Language Model Training on GPU Clusters. CoRR abs/2104.04473 (2021) - 2020
- [c5]Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro:
Large Scale Multi-Actor Generative Dialog Modeling. ACL 2020: 66-84 - [c4]Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro:
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models. EMNLP (1) 2020: 2831-2845 - [c3]Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani:
BioMegatron: Larger Biomedical Domain Language Model. EMNLP (1) 2020: 4700-4706 - [c2]Raul Puri, Ryan Spring, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro:
Training Question Answering Models From Synthetic Data. EMNLP (1) 2020: 5811-5826 - [i7]Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro:
Training Question Answering Models From Synthetic Data. CoRR abs/2002.09599 (2020) - [i6]Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro:
Large Scale Multi-Actor Generative Dialog Modeling. CoRR abs/2005.06114 (2020) - [i5]Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, Bryan Catanzaro:
MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models. CoRR abs/2010.00840 (2020) - [i4]Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani:
BioMegatron: Larger Biomedical Domain Language Model. CoRR abs/2010.06060 (2020) - [i3]Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro:
Local Knowledge Powered Conversational Agents. CoRR abs/2010.10150 (2020)
2010 – 2019
- 2019
- [c1]Adam Rupe, Prabhat, James P. Crutchfield, Nalini Kumar, Vladislav Epifanov, Karthik Kashinath, Oleksandr Pavlyk, Frank Schlimbach, Mostofa Patwary, Sergey Maidanov, Victor W. Lee:
DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems. MLHPC@SC 2019: 75-87 - [i2]Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro:
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. CoRR abs/1909.08053 (2019) - [i1]Adam Rupe, Nalini Kumar, Vladislav Epifanov, Karthik Kashinath, Oleksandr Pavlyk, Frank Schlimbach, Mostofa Patwary, Sergey Maidanov, Victor W. Lee, Prabhat, James P. Crutchfield:
DisCo: Physics-Based Unsupervised Discovery of Coherent Structures in Spatiotemporal Systems. CoRR abs/1909.11822 (2019)
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-11-25 23:39 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint