


Shinji Watanabe 0001
Person information
- affiliation: Carnegie Mellon University, Pittsburgh, PA, USA
- affiliation (former): Johns Hopkins University, Baltimore, MD, USA
- affiliation (2012 - 2017): Mitsubishi Electric Research Laboratories, Cambridge, MA, USA
- affiliation (2001 - 2011): NTT Communication Science Laboratories, Kyoto, Japan
- affiliation (PhD 2006): Waseda University, Tokyo, Japan
Other persons with the same name
- Shinji Watanabe 0002 — Kanagawa University, Department of Electrical Engineering, Yokohama, Japan
- Shinji Watanabe 0003 — Osaka Prefecture University, School of Knowledge and Information Systems, Sakai, Japan
- Shinji Watanabe 0004 — Renesas Electronics Corporation, Kawasaki, Japan
- Shinji Watanabe 0005 — Nintendo Co.,Ltd, Kyoto, Japan
- Shinji Watanabe 0006 — Gifu National College of Technology, Motosu-gun, Gifu-ken, Japan
- Shinji Watanabe 0007 — University of Miyazaki, Miyazaki, Japan
2020 – today
- 2024
- [j61]Xuankai Chang, Shinji Watanabe, Marc Delcroix, Tsubasa Ochiai, Wangyou Zhang, Yanmin Qian:
Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing]. IEEE Signal Process. Mag. 41(6): 39-50 (2024)
- [j60]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024)
- [j59]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024)
- [j58]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-Varying Controls for Music Generation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2692-2703 (2024)
- [j57]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2884-2899 (2024)
- [c417]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Yuexian Zou, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. AAAI 2024: 23802-23804
- [c416]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. ACL (1) 2024: 568-582 - [c415]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. ACL (1) 2024: 10192-10209 - [c414]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. ACL (Findings) 2024: 11923-11938 - [c413]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. EMNLP (Industry Track) 2024: 440-451 - [c412]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. EMNLP 2024: 10205-10224 - [c411]Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong-Qiu Wang, Bao-Cai Yin, Jia Pan:
Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge. ICASSP Workshops 2024: 123-124 - [c410]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-Weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. ICASSP 2024: 316-320 - [c409]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhongqiu Wang, Shinji Watanabe:
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. ICASSP 2024: 446-450 - [c408]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. ICASSP Workshops 2024: 570-574 - [c407]Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe:
Understanding Probe Behaviors Through Variational Bounds of Mutual Information. ICASSP 2024: 5655-5659 - [c406]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974 - [c405]Salvador Medina, Sarah L. Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, Iain A. Matthews:
PhISANet: Phonetically Informed Speech Animation Network. ICASSP 2024: 8225-8229 - [c404]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper. ICASSP 2024: 10471-10475 - [c403]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. ICASSP 2024: 10641-10645
- [c402]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search. ICASSP 2024: 10896-10900 - [c401]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2024: 11156-11160 - [c400]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. ICASSP 2024: 11481-11485 - [c399]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR with Label Context. ICASSP 2024: 11681-11685 - [c398]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model. ICASSP 2024: 11741-11745
- [c397]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. ICASSP 2024: 11831-11835 - [c396]Samuele Cornell, Jee-Weon Jung, Shinji Watanabe, Stefano Squartini:
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition. ICASSP 2024: 11856-11860 - [c395]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. ICASSP 2024: 11941-11945 - [c394]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. ICASSP 2024: 11971-11975 - [c393]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. ICASSP 2024: 12006-12010 - [c392]Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. ICASSP 2024: 12071-12075 - [c391]Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee:
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech. ICASSP 2024: 12136-12140 - [c390]Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe, Joon Son Chung:
VoxMM: Rich Transcription of Conversations in the Wild. ICASSP 2024: 12551-12555 - [c389]William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. ICASSP 2024: 13066-13070 - [c388]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330 - [c387]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. IJCAI 2024: 5171-5180
- [c386]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. ACM Multimedia 2024: 11279-11281 - [c385]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. NAACL-HLT 2024: 2754-2774 - [i336]Jee-weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: towards generalizable speech summarization using synthetic labels from large language model. CoRR abs/2401.06806 (2024) - [i335]Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Improving ASR Contextual Biasing with Guided Attention. CoRR abs/2401.08835 (2024)
- [i334]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. CoRR abs/2401.10449 (2024) - [i333]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe:
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor. CoRR abs/2401.12473 (2024) - [i332]Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. CoRR abs/2401.14271 (2024)
- [i331]Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. CoRR abs/2401.16658 (2024) - [i330]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024) - [i329]Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe:
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. CoRR abs/2401.17230 (2024) - [i328]Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe:
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2. CoRR abs/2401.17619 (2024) - [i327]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024) - [i326]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald:
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? CoRR abs/2402.00340 (2024) - [i325]Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj:
Evaluating and Improving Continual Learning in Spoken Language Understanding. CoRR abs/2402.10427 (2024) - [i324]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. CoRR abs/2402.12654 (2024) - [i323]Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro:
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024) - [i322]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. CoRR abs/2403.13169 (2024) - [i321]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. CoRR abs/2404.09385 (2024) - [i320]Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Dynamic Vocabulary. CoRR abs/2405.13344 (2024) - [i319]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. CoRR abs/2405.13514 (2024) - [i318]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. CoRR abs/2405.20402 (2024) - [i317]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
YODAS: Youtube-Oriented Dataset for Audio and Speech. CoRR abs/2406.00899 (2024)
- [i316]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. CoRR abs/2406.02560 (2024) - [i315]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe:
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders. CoRR abs/2406.02950 (2024) - [i314]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. CoRR abs/2406.04269 (2024) - [i313]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024) - [i312]Jee-weon Jung, Xin Wang, Nicholas W. D. Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, Joon Son Chung:
To what extent can ASV systems naturally defend against spoofing attacks? CoRR abs/2406.05339 (2024) - [i311]Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann:
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. CoRR abs/2406.06185 (2024) - [i310]Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin:
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. CoRR abs/2406.07725 (2024) - [i309]Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe:
Neural Blind Source Separation and Diarization for Distant Speech Recognition. CoRR abs/2406.08396 (2024) - [i308]Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe:
Self-Supervised Speech Representations are More Phonetic than Semantic. CoRR abs/2406.08619 (2024) - [i307]Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. CoRR abs/2406.08641 (2024) - [i306]Yifeng Yu, Jiatong Shi, Yuning Wu, Shinji Watanabe:
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. CoRR abs/2406.08761 (2024) - [i305]Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. CoRR abs/2406.09282 (2024) - [i304]Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu:
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. CoRR abs/2406.09345 (2024) - [i303]Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Y. Sun, Shinji Watanabe:
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. CoRR abs/2406.09869 (2024) - [i302]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. CoRR abs/2406.10083 (2024) - [i301]Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. CoRR abs/2406.12317 (2024) - [i300]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. CoRR abs/2406.12611 (2024) - [i299]Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian:
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. CoRR abs/2406.13471 (2024) - [i298]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Streaming End-to-end Speech Recognition. CoRR abs/2406.16107 (2024) - [i297]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. CoRR abs/2406.16120 (2024) - [i296]Hye-jin Shim, Md. Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen:
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing. CoRR abs/2406.17246 (2024) - [i295]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024) - [i294]Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. CoRR abs/2407.03718 (2024) - [i293]Samuele Cornell, Taejin Park, Steve Huang, Christoph Böddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola García, Shinji Watanabe:
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization. CoRR abs/2407.16447 (2024) - [i292]Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe:
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data. CoRR abs/2408.00624 (2024) - [i291]Xi Xu, Siqi Ouyang, Brian Yan, Patrick Fernandes, William Chen, Lei Li, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2024 Simultaneous Speech Translation System. CoRR abs/2408.07452 (2024) - [i290]Samuele Cornell, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe:
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition. CoRR abs/2408.09215 (2024) - [i289]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. CoRR abs/2409.07226 (2024) - [i288]Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe:
Text-To-Speech Synthesis In The Wild. CoRR abs/2409.08711 (2024) - [i287]Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. CoRR abs/2409.09506 (2024) - [i286]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. CoRR abs/2409.09785 (2024) - [i285]Li-Wei Chen, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Alexander Rudnicky, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald, Zakaria Aldeneh:
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models. CoRR abs/2409.10788 (2024) - [i284]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald:
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels. CoRR abs/2409.10791 (2024) - [i283]Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe:
Task Arithmetic for Language Expansion in Speech Translation. CoRR abs/2409.11274 (2024) - [i282]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. CoRR abs/2409.12370 (2024) - [i281]Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu:
Preference Alignment Improves Language Model-Based TTS. CoRR abs/2409.12403 (2024) - [i280]Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James R. Glass, Shinji Watanabe, Hung-yi Lee:
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models. CoRR abs/2409.14085 (2024) - [i279]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens. CoRR abs/2409.15732 (2024) - [i278]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. CoRR abs/2409.15897 (2024) - [i277]Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas W. D. Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe:
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild. CoRR abs/2409.17285 (2024) - [i276]Brian Yan, Vineel Pratap, Shinji Watanabe, Michael Auli:
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking. CoRR abs/2409.18428 (2024) - [i275]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. CoRR abs/2410.03007 (2024) - [i274]Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. CoRR abs/2410.17485 (2024) - [i273]Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondrej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubinski, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John P. McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Sara Papi, Peter Polák, Adam Pospísil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Marco Turchi, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos:
Findings of the IWSLT 2024 Evaluation Campaign. CoRR abs/2411.05088 (2024) - [i272]Chien-yu Huang, Wei-Chih Chen, Shu-Wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, Hung-yi Lee:
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks. CoRR abs/2411.05361 (2024) - [i271]Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, Hung-yi Lee:
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition. CoRR abs/2411.18107 (2024) - [i270]Pengcheng Guo, Xuankai Chang, Hang Lv, Shinji Watanabe, Lei Xie:
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR. CoRR abs/2412.05589 (2024) - [i269]Peter Wu, Bohan Yu, Kevin Scheck, Alan W. Black, Aditi S. Krishnapriyan, Irene Y. Chen, Tanja Schultz, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Multimodal Articulatory Representations. CoRR abs/2412.13387 (2024) - [i268]Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, Shinji Watanabe:
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music. CoRR abs/2412.17667 (2024) - [i267]Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization. CoRR abs/2412.19005 (2024)
- 2023
- [j56]Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data. Comput. Speech Lang. 77: 101410 (2023)
- [j55]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023)
- [j54]Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. IEEE ACM Trans. Audio Speech Lang. Process. 31: 397-410 (2023)
- [j53]Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE ACM Trans. Audio Speech Lang. Process. 31: 706-720 (2023)
- [j52]Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao:
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. IEEE ACM Trans. Audio Speech Lang. Process. 31: 2738-2750 (2023)
- [j51]Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3112-3126 (2023)
- [j50]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3221-3236 (2023)
- [c384]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. AAAI 2023: 12644-12652 - [c383]Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411 - [c382]Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S. Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. ACL (1) 2023: 8906-8937 - [c381]Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino:
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. ACL (1) 2023: 15655-15680 - [c380]Tuan Vu Ho, Shota Horiguchi, Shinji Watanabe, Paola García, Takashi Sumiyoshi:
Synthetic Data Augmentation for ASR with Domain Filtering. APSIPA ASC 2023: 1760-1765
- [c379]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8 - [c378]Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:
LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. ASRU 2023: 1-6 - [c377]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. ASRU 2023: 1-9 - [c376]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe:
Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. ASRU 2023: 1-8 - [c375]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
Yodas: Youtube-Oriented Dataset for Audio and Speech. ASRU 2023: 1-8 - [c374]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8 - [c373]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. ASRU 2023: 1-6 - [c372]Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. ASRU 2023: 1-8 - [c371]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. ASRU 2023: 1-8 - [c370]Yusuke Shinohara, Shinji Watanabe:
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. ASRU 2023: 1-7 - [c369]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. ASRU 2023: 1-8 - [c368]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. ASRU 2023: 1-6 - [c367]Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. EACL 2023: 1615-1631 - [c366]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. ICASSP 2023: 1-5 - [c365]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. ICASSP 2023: 1-2 - [c364]Dan Berrebbi, Brian Yan, Shinji Watanabe:
Avoid Overthinking in Self-Supervised Models for Speech Recognition. ICASSP 2023: 1-5 - [c363]Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. ICASSP 2023: 1-2 - [c362]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. ICASSP 2023: 1-5 - [c361]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5 - [c360]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono, Stefano Squartini:
Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge. ICASSP 2023: 1-2 - [c359]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge. ICASSP 2023: 1-2 - [c358]Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe:
Streaming Joint Speech Recognition and Disfluency Detection. ICASSP 2023: 1-5 - [c357]Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur:
Euro: Espnet Unsupervised ASR Open-Source Toolkit. ICASSP 2023: 1-5 - [c356]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
Intermpl: Momentum Pseudo-Labeling With Intermediate CTC Loss. ICASSP 2023: 1-5 - [c355]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder. ICASSP 2023: 1-5 - [c354]Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe:
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. ICASSP 2023: 1-5 - [c353]Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung:
In Search of Strong Embedding Extractors for Speaker Diarisation. ICASSP 2023: 1-5 - [c352]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe:
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. ICASSP 2023: 1-5 - [c351]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge. ICASSP 2023: 1-2 - [c350]Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization. ICASSP 2023: 1-5 - [c349]Takashi Maekaku, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model. ICASSP 2023: 1-5 - [c348]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5 - [c347]Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe:
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation. ICASSP 2023: 1-5 - [c346]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding. ICASSP 2023: 1-5 - [c345]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition. ICASSP 2023: 1-5 - [c344]Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García, Shinji Watanabe, Ann Lee, Hung-Yi Lee:
Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR. ICASSP 2023: 1-5 - [c343]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-To-Speech Translation with Multiple TTS Targets. ICASSP 2023: 1-5 - [c342]Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2023: 1-5 - [c341]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GRIDNET: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. ICASSP 2023: 1-5 - [c340]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated full- and sub-band Modeling. ICASSP 2023: 1-5 - [c339]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition. ICASSP 2023: 1-5 - [c338]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. ICASSP 2023: 1-5 - [c337]Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Jeong Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi:
Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. ICASSP 2023: 1-5 - [c336]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-Blank Transducers for Speech Recognition. ICASSP 2023: 1-5 - [c335]Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe:
Towards Zero-Shot Code-Switched Speech Recognition. ICASSP 2023: 1-5 - [c334]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Paaploss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c333]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c332]Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks. ICLR 2023 - [c331]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. ICML 2023: 38462-38484 - [c330]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187 - [c329]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. INTERSPEECH 2023: 62-66 - [c328]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400 - [c327]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
Tensor decomposition for minimization of E2E SLU model toward on-device processing. INTERSPEECH 2023: 710-714 - [c326]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. INTERSPEECH 2023: 720-724 - [c325]Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. INTERSPEECH 2023: 884-888 - [c324]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. INTERSPEECH 2023: 1369-1373 - [c323]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. INTERSPEECH 2023: 1399-1403 - [c322]Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. INTERSPEECH 2023: 1454-1458 - [c321]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. INTERSPEECH 2023: 1528-1532 - [c320]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. INTERSPEECH 2023: 2208-2212 - [c319]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolution. INTERSPEECH 2023: 3287-3291 - [c318]Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. INTERSPEECH 2023: 3312-3316 - [c317]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. INTERSPEECH 2023: 3979-3983 - [c316]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408 - [c315]Yui Sudo, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. INTERSPEECH 2023: 4479-4483 - [c314]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. INTERSPEECH 2023: 4968-4972 - [c313]Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W. Black, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from MRI-Based Articulatory Representations. INTERSPEECH 2023: 5132-5136 - [c312]Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondrej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John P. McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, Rodolfo Zevallos:
Findings of the IWSLT 2023 Evaluation Campaign. IWSLT@ACL 2023: 1-61 - [c311]Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240 - [c310]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. NeurIPS 2023 - [c309]Taiqi He, Lindia Tjuatja, Nathaniel R. Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, Lori S. Levin:
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. SIGMORPHON 2023: 209-216 - [c308]Georgios Karakasidis, Nathaniel R. Robinson, Yaroslav Getman, Atieno Ogayo, Ragheb Al-Ghezi, Ananya Ayasi, Shinji Watanabe, David R. Mortensen, Mikko Kurimo:
Multilingual TTS Accent Impressions for Accented ASR. TSD 2023: 317-327 - [c307]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. WASPAA 2023: 1-5 - [d1]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023 - [i266]Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023) - [i265]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023) - [i264]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. CoRR abs/2302.04215 (2023) - [i263]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. CoRR abs/2302.06774 (2023) - [i262]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono:
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge. CoRR abs/2302.07928 (2023) - [i261]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08088 (2023) - [i260]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08095 (2023) - [i259]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023) - [i258]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. CoRR abs/2302.14132 (2023) - [i257]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023) - [i256]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. CoRR abs/2303.06326 (2023) - [i255]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. CoRR abs/2303.07624 (2023) - [i254]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-to-Speech Translation with Multiple TTS Targets. CoRR abs/2304.04618 (2023) - [i253]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. CoRR abs/2304.06795 (2023) - [i252]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. CoRR abs/2304.08707 (2023) - [i251]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. CoRR abs/2304.12995 (2023) - [i250]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. CoRR abs/2305.00926 (2023) - [i249]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge. CoRR abs/2305.01194 (2023) - [i248]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge. CoRR abs/2305.01620 (2023) - [i247]Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-Yi Lee:
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation. CoRR abs/2305.07455 (2023) - [i246]Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei-Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. CoRR abs/2305.10615 (2023) - [i245]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. CoRR abs/2305.11073 (2023) - [i244]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023) - [i243]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. CoRR abs/2305.13331 (2023) - [i242]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. CoRR abs/2305.17651 (2023) - [i241]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. CoRR abs/2305.18108 (2023) - [i240]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. CoRR abs/2305.20054 (2023) - [i239]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolutions. CoRR abs/2306.01084 (2023) - [i238]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023) - [i237]Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola García, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur:
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios. CoRR abs/2306.13734 (2023) - [i236]Roshan S. Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. CoRR abs/2307.08217 (2023) - [i235]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. CoRR abs/2307.11005 (2023) - [i234]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. CoRR abs/2307.12231 (2023) - [i233]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. CoRR abs/2307.12767 (2023) - [i232]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. CoRR abs/2308.10107 (2023) - [i231]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. CoRR abs/2309.07937 (2023) - [i230]Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao:
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. CoRR abs/2309.08348 (2023) - [i229]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023) - [i228]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model. CoRR abs/2309.08535 (2023) - [i227]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation. CoRR abs/2309.08876 (2023) - [i226]Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee:
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. CoRR abs/2309.09510 (2023) - [i225]Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023) - [i224]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR With Label Context. CoRR abs/2309.10926 (2023) - [i223]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. CoRR abs/2309.11379 (2023) - [i222]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023) - [i221]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. CoRR abs/2309.14922 (2023) - [i220]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023) - [i219]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed M. Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech collage: code-switched audio generation by collaging monolingual corpora. CoRR abs/2309.15674 (2023) - [i218]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. CoRR abs/2309.15686 (2023) - [i217]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. CoRR abs/2309.15800 (2023) - [i216]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. CoRR abs/2309.15826 (2023) - [i215]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation. CoRR abs/2309.17352 (2023) - [i214]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR abs/2309.17384 (2023) - [i213]Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng:
UniAudio: An Audio Foundation Model Toward Universal Audio Generation. CoRR abs/2310.00704 (2023) - [i212]Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini:
One model to rule them all? Towards End-to-End Joint Speaker Diarization and Speech Recognition. CoRR abs/2310.01688 (2023) - [i211]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network. CoRR abs/2310.02973 (2023) - [i210]Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe:
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios. CoRR abs/2310.03938 (2023) - [i209]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model. CoRR abs/2310.03975 (2023) - [i208]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond. CoRR abs/2310.05513 (2023) - [i207]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. CoRR abs/2310.08277 (2023) - [i206]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis:
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. CoRR abs/2310.17864 (2023) - [i205]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-varying Controls for Music Generation. CoRR abs/2311.07069 (2023) - [i204]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR. CoRR abs/2312.09582 (2023) - [i203]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-aware Fine-tuning of Self-supervised Speech Models. CoRR abs/2312.09895 (2023) - [i202]Kwanghee Choi, Jee-weon Jung, Shinji Watanabe:
Understanding Probe Behaviors through Variational Bounds of Mutual Information. CoRR abs/2312.10019 (2023)
- 2022
- [j49]Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic speech recognition by end-to-end, modular systems and human. Comput. Speech Lang. 71: 101272 (2022) - [j48]Zili Huang, Marc Delcroix, Leibny Paola García-Perera, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur:
Joint speaker diarization and speech recognition based on region proposal networks. Comput. Speech Lang. 72: 101316 (2022) - [j47]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022) - [j46]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. Comput. Speech Lang. 73: 101327 (2022) - [j45]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75: 101360 (2022) - [j44]Jing Shi
, Xuankai Chang, Shinji Watanabe
, Bo Xu:
Train from scratch: Single-stage joint training of speech separation and recognition. Comput. Speech Lang. 76: 101387 (2022)
- [j43]Hung-Yi Lee, Shinji Watanabe, Karen Livescu, Abdelrahman Mohamed, Tara N. Sainath:
Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1174-1178 (2022)
- [j42]Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. IEEE J. Sel. Top. Signal Process. 16(6): 1179-1210 (2022)
- [j41]Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction. IEEE Signal Process. Lett. 29: 1422-1426 (2022)
- [j40]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractors for End-to-End Neural Diarization. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1493-1507 (2022)
- [j39]Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022)
- [c306]Xinjian Li, Florian Metze, David R. Mortensen, Shinji Watanabe, Alan W. Black:
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. ACL (Findings) 2022: 2106-2115
- [c305]Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. ACL (1) 2022: 8479-8492
- [c304]Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. EMNLP (Findings) 2022: 5419-5429
- [c303]Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. EMNLP (Findings) 2022: 5486-5503
- [c302]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022: 6237-6241
- [c301]Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu:
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. ICASSP 2022: 6412-6416
- [c300]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556
- [c299]Motoi Omachi, Yuya Fujita, Shinji Watanabe, Tianzi Wang:
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing. ICASSP 2022: 6772-6776
- [c298]Zili Huang, Shinji Watanabe, Shu-Wen Yang, Paola García, Sanjeev Khudanpur:
Investigating Self-Supervised Learning for Speech Enhancement and Separation. ICASSP 2022: 6837-6841
- [c297]Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair:
Torchaudio: Building Blocks for Audio and Speech Processing. ICASSP 2022: 6982-6986
- [c296]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion. ICASSP 2022: 7107-7111
- [c295]Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. ICASSP 2022: 7167-7171
- [c294]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-Based Supervision. ICASSP 2022: 7212-7216
- [c293]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. ICASSP 2022: 7322-7326
- [c292]Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-To-End Neural Diarization with Distributed Microphones. ICASSP 2022: 7332-7336
- [c291]Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. ICASSP 2022: 7402-7406
- [c290]Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. ICASSP 2022: 7872-7876
- [c289]Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. ICASSP 2022: 7892-7896
- [c288]Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR. ICASSP 2022: 8287-8291
- [c287]Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models. ICASSP 2022: 8522-8526
- [c286]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205
- [c285]Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results. ICASSP 2022: 9266-9270
- [c284]Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. ICML 2022: 17627-17643
- [c283]Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20
- [c282]Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. INTERSPEECH 2022: 779-783
- [c281]Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. INTERSPEECH 2022: 1071-1075
- [c280]Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao:
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1111-1115
- [c279]Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. INTERSPEECH 2022: 1656-1660
- [c278]Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. INTERSPEECH 2022: 1746-1750
- [c277]Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan:
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1766-1770
- [c276]Yusuke Shinohara, Shinji Watanabe:
Minimum latency training of sequence transducers for streaming end-to-end speech recognition. INTERSPEECH 2022: 2098-2102
- [c275]Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola García-Perera, Yohei Kawaguchi:
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. INTERSPEECH 2022: 2218-2222
- [c274]Muqiao Yang, Ian R. Lane, Shinji Watanabe:
Online Continual Learning of End-to-End Speech Recognition Models. INTERSPEECH 2022: 2668-2672
- [c273]Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. INTERSPEECH 2022: 2953-2957
- [c272]Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe:
Two-Pass Low Latency End-to-End Spoken Language Understanding. INTERSPEECH 2022: 3478-3482
- [c271]Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. INTERSPEECH 2022: 3533-3537
- [c270]Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? INTERSPEECH 2022: 3538-3542
- [c269]Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. INTERSPEECH 2022: 3819-3823
- [c268]Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, Shinji Watanabe:
Residual Language Model for End-to-end Speech Recognition. INTERSPEECH 2022: 3899-3903
- [c267]Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. INTERSPEECH 2022: 4272-4276
- [c266]Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. INTERSPEECH 2022: 4277-4281
- [c265]Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. INTERSPEECH 2022: 4441-4445
- [c264]Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, Shinji Watanabe:
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. INTERSPEECH 2022: 4641-4645
- [c263]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. INTERSPEECH 2022: 4885-4889
- [c262]Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. INTERSPEECH 2022: 4965-4969
- [c261]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462
- [c260]Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondrej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vera Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Miguel Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe:
Findings of the IWSLT 2022 Evaluation Campaign. IWSLT@ACL 2022: 98-157
- [c259]Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2022 Dialect Speech Translation System. IWSLT@ACL 2022: 298-307
- [c258]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
Phone Inventories and Recognition for Every Language. LREC 2022: 1061-1067
- [c257]Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. SLT 2022: 84-91
- [c256]Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. SLT 2022: 260-265
- [c255]Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. SLT 2022: 406-413
- [c254]Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487
- [c253]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. SLT 2022: 496-501
- [c252]Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. SLT 2022: 620-625
- [c251]Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. SLT 2022: 1096-1103
- [c250]Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. SLT 2022: 1128-1135
- [i201]Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe:
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. CoRR abs/2201.05420 (2022)
- [i200]Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models. CoRR abs/2201.10103 (2022)
- [i199]Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR. CoRR abs/2201.10190 (2022)
- [i198]Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. CoRR abs/2202.01405 (2022)
- [i197]Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. CoRR abs/2202.05256 (2022)
- [i196]Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi:
Acoustic Event Detection with Classifier Chains. CoRR abs/2202.08470 (2022)
- [i195]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. CoRR abs/2202.12298 (2022)
- [i194]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. CoRR abs/2203.00232 (2022)
- [i193]Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse H. Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk:
HEAR 2021: Holistic Evaluation of Audio Representations. CoRR abs/2203.03022 (2022)
- [i192]Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. CoRR abs/2203.06849 (2022)
- [i191]Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. CoRR abs/2203.16868 (2022)
- [i190]Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. CoRR abs/2203.17001 (2022)
- [i189]Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. CoRR abs/2203.17068 (2022)
- [i188]Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. CoRR abs/2204.00176 (2022)
- [i187]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-speaker ASR with Independent Vector Analysis. CoRR abs/2204.00218 (2022)
- [i186]Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. CoRR abs/2204.00540 (2022)
- [i185]Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. CoRR abs/2204.02470 (2022)
- [i184]Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction. CoRR abs/2204.07566 (2022)
- [i183]Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. CoRR abs/2204.08920 (2022)
- [i182]Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency. CoRR abs/2204.09911 (2022)
- [i181]Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Jeong Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi:
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. CoRR abs/2205.01086 (2022)
- [i180]Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis. CoRR abs/2205.04029 (2022)
- [i179]Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. CoRR abs/2205.10643 (2022)
- [i178]Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers. CoRR abs/2206.02432 (2022)
- [i177]Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. CoRR abs/2206.03318 (2022)
- [i176]Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe:
Residual Language Model for End-to-end Speech Recognition. CoRR abs/2206.07430 (2022)
- [i175]Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. CoRR abs/2207.00237 (2022)
- [i174]Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. CoRR abs/2207.02971 (2022)
- [i173]Muqiao Yang, Ian R. Lane, Shinji Watanabe:
Online Continual Learning of End-to-End Speech Recognition Models. CoRR abs/2207.05071 (2022)
- [i172]Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe:
Two-Pass Low Latency End-to-End Spoken Language Understanding. CoRR abs/2207.06670 (2022)
- [i171]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. CoRR abs/2207.09514 (2022)
- [i170]Nathaniel R. Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? CoRR abs/2207.09889 (2022)
- [i169]Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. CoRR abs/2208.01818 (2022)
- [i168]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. CoRR abs/2209.02842 (2022)
- [i167]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. CoRR abs/2209.03952 (2022)
- [i166]Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. CoRR abs/2209.06337 (2022)
- [i165]Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced merging for speech recognition. CoRR abs/2210.00077 (2022)
- [i164]Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. CoRR abs/2210.03459 (2022)
- [i163]Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. CoRR abs/2210.05200 (2022)
- [i162]Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. CoRR abs/2210.07189 (2022)
- [i161]Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks. CoRR abs/2210.07499 (2022)
- [i160]Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. CoRR abs/2210.08634 (2022)
- [i159]Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. CoRR abs/2210.10742 (2022)
- [i158]Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe:
Large-scale learning of generalised representations for speaker recognition. CoRR abs/2210.10985 (2022)
- [i157]Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung:
In search of strong embedding extractors for speaker diarisation. CoRR abs/2210.14682 (2022)
- [i156]Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. CoRR abs/2210.15734 (2022)
- [i155]Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization. CoRR abs/2210.16498 (2022)
- [i154]Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. CoRR abs/2210.16663 (2022)
- [i153]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder. CoRR abs/2211.00792 (2022)
- [i152]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss. CoRR abs/2211.00795 (2022)
- [i151]Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe:
Towards Zero-Shot Code-Switched Speech Recognition. CoRR abs/2211.01458 (2022)
- [i150]Yusuke Shinohara, Shinji Watanabe:
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition. CoRR abs/2211.02333 (2022)
- [i149]Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García, Shinji Watanabe, Ann Lee, Hung-yi Lee:
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR. CoRR abs/2211.03025 (2022)
- [i148]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-blank Transducers for Speech Recognition. CoRR abs/2211.03541 (2022)
- [i147]Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. CoRR abs/2211.05869 (2022)
- [i146]Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe:
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation. CoRR abs/2211.05967 (2022)
- [i145]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units. CoRR abs/2211.06535 (2022)
- [i144]Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe:
Streaming Joint Speech Recognition and Disfluency Detection. CoRR abs/2211.08726 (2022)
- [i143]Dan Berrebbi, Brian Yan, Shinji Watanabe:
Avoid Overthinking in Self-Supervised Models for Speech Recognition. CoRR abs/2211.08989 (2022)
- [i142]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. CoRR abs/2211.12433 (2022)
- [i141]Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur:
EURO: ESPnet Unsupervised ASR Open-source Toolkit. CoRR abs/2211.17196 (2022)
- [i140]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
SpeechLMScore: Evaluating speech generation using speech language model. CoRR abs/2212.04559 (2022)
- [i139]Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino:
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. CoRR abs/2212.08055 (2022)
- [i138]Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Context-aware Fine-tuning of Self-supervised Speech Models. CoRR abs/2212.08542 (2022)
- [i137]Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S. Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe:
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. CoRR abs/2212.10525 (2022)
- [i136]Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. CoRR abs/2212.10818 (2022)
- 2021
- [j38]Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani:
Far-Field Automatic Speech Recognition. Proc. IEEE 109(2): 124-148 (2021)
- [j37]Nanxin Chen, Shinji Watanabe, Jesús Villalba, Piotr Zelasko, Najim Dehak:
Non-Autoregressive Transformer for Speech Recognition. IEEE Signal Process. Lett. 28: 121-125 (2021)
- [c249]Yen-Ju Lu, Yu Tsao, Shinji Watanabe:
A Study on Speech Enhancement Based on Diffusion Probabilistic Model. APSIPA ASC 2021: 659-666
- [c248]Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency:
Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks. APSIPA ASC 2021: 841-848
- [c247]Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe:
A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. ASRU 2021: 16-23
- [c246]Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe:
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. ASRU 2021: 47-54
- [c245]Shota Horiguchi, Shinji Watanabe, Paola García, Yawen Xue, Yuki Takashima, Yohei Kawaguchi:
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors. ASRU 2021: 98-105
- [c244]Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-Wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe:
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition. ASRU 2021: 228-235
- [c243]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Attention-Based Multi-Hypothesis Fusion for Speech Summarization. ASRU 2021: 487-494
- [c242]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS Based Voice Conversion. ASRU 2021: 642-649
- [c241]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
Conferencingspeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing. ASRU 2021: 679-686
- [c240]Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe:
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. ASRU 2021: 922-929
- [c239]Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W. Black:
Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity. ASRU 2021: 1050-1057
- [c238]Chaitanya Prasad Narisetty, Tomoki Hayashi, Ryunosuke Ishizaki, Shinji Watanabe, Kazuya Takeda:
Leveraging State-of-the-art ASR Techniques to Audio Captioning. DCASE 2021: 160-164
- [c237]Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe:
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec. EACL 2021: 1134-1145
- [c236]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. ICASSP 2021: 5739-5743 - [c235]Matthew Maciejewski, Jing Shi, Shinji Watanabe
, Sanjeev Khudanpur:
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step. ICASSP 2021: 5774-5778 - [c234]Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe
, Kun Wei, Wangyou Zhang, Yuekai Zhang:
Recent Developments on ESPnet Toolkit Boosted By Conformer. ICASSP 2021: 5874-5878 - [c233]Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition. ICASSP 2021: 6214-6218 - [c232]Jaesong Lee, Shinji Watanabe
:
Intermediate Loss Regularization for CTC-Based Speech Recognition. ICASSP 2021: 6224-6228 - [c231]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe
, Ramón Fernandez Astudillo, Jan Honza Cernocký:
EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition. ICASSP 2021: 6753-6757 - [c230]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. ICASSP 2021: 6898-6902 - [c229]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe
, Meng Yu, Dong Yu:
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation. ICASSP 2021: 6908-6912 - [c228]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe
, John R. Hershey:
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. ICASSP 2021: 7183-7187 - [c227]Shota Horiguchi, Paola García
, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu:
End-To-End Speaker Diarization as Post-Processing. ICASSP 2021: 7188-7192 - [c226]Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara
, Shinji Watanabe:
ORTHROS: Non-Autoregressive End-to-End Speech Translation with Dual-Decoder. ICASSP 2021: 7503-7507 - [c225]Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi:
Improved Mask-CTC for Non-Autoregressive End-to-End ASR. ICASSP 2021: 8363-8367 - [c224]Aswin Shanmugam Subramanian
, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu:
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization. ICASSP 2021: 8433-8437 - [c223]Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe:
Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios. Interspeech 2021: 301-305 - [c222]Tatsuya Komatsu, Shinji Watanabe
, Koichi Miyazaki, Tomoki Hayashi:
Acoustic Event Detection with Classifier Chains. Interspeech 2021: 601-605 - [c221]Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe
, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB: Speech Processing Universal PERformance Benchmark. Interspeech 2021: 1194-1198 - [c220]Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe
, Alan W. Black:
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding. Interspeech 2021: 1264-1268 - [c219]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe
, Georg Kucsko:
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition. Interspeech 2021: 1434-1438 - [c218]Katerina Zmolíková, Marc Delcroix, Desh Raj, Shinji Watanabe, Jan Cernocký:
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics. Interspeech 2021: 1464-1468 - [c217]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe
, Alexander I. Rudnicky:
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021. Interspeech 2021: 1564-1568 - [c216]Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe
:
Multi-Mode Transformer Transducer with Stochastic Future Context. Interspeech 2021: 1827-1831 - [c215]Brian Yan, Siddharth Dalmia, David R. Mortensen
, Florian Metze, Shinji Watanabe:
Differentiable Allophone Graphs for Language-Universal Speech Recognition. Interspeech 2021: 2471-2475 - [c214]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita
, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Recording. Interspeech 2021: 3036-3040 - [c213]Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe
, Leibny Paola García-Perera, Kenji Nagamatsu:
Semi-Supervised Training with Pseudo-Labeling for End-To-End Neural Diarization. Interspeech 2021: 3096-3100 - [c212]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe
, Leibny Paola García-Perera, Kenji Nagamatsu:
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers. Interspeech 2021: 3116-3120 - [c211]Suwon Shon, Pablo Brusco, Jing Pan, Kyu Jeong Han, Shinji Watanabe
:
Leveraging Pre-Trained Language Model for Speech Sentiment Analysis. Interspeech 2021: 3420-3424 - [c210]Matthew Maciejewski, Shinji Watanabe
, Sanjeev Khudanpur:
Speaker Verification-Based Evaluation of Single-Channel Speech Separation. Interspeech 2021: 3520-3524 - [c209]Maokui He, Desh Raj
, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe:
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker. Interspeech 2021: 3555-3559 - [c208]Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe
, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan:
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio. Interspeech 2021: 3670-3674 - [c207]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. Interspeech 2021: 3720-3724 - [c206]Yuya Fujita, Tianzi Wang, Shinji Watanabe
, Motoi Omachi:
Toward Streaming ASR with Non-Autoregressive Insertion-Based Model. Interspeech 2021: 3740-3744 - [c205]Jaesong Lee, Jingu Kang, Shinji Watanabe
:
Layer Pruning on Demand with Intermediate CTC. Interspeech 2021: 3745-3749 - [c204]Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe
:
Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models. Interspeech 2021: 3755-3759 - [c203]Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe:
ESPnet-ST IWSLT 2021 Offline Speech Translation System. IWSLT 2021: 100-109 - [c202]Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda:
Self-Guided Curriculum Learning for Neural Machine Translation. IWSLT 2021: 206-214 - [c201]Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner:
End-to-end ASR to jointly predict transcriptions and linguistic annotations. NAACL-HLT 2021: 1861-1871 - [c200]Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe:
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. NAACL-HLT 2021: 1872-1881 - [c199]Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe:
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. NAACL-HLT 2021: 1882-1896 - [c198]Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse H. Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk:
HEAR: Holistic Evaluation of Audio Representations. NeurIPS (Competition and Demos) 2021: 125-145 - [c197]Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe
:
Streaming Transformer ASR with Blockwise Synchronous Beam Search. SLT 2021: 22-29 - [c196]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. SLT 2021: 785-792 - [c195]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe
, Paola García, Kenji Nagamatsu:
Online End-To-End Neural Diarization with Speaker-Tracing Buffer. SLT 2021: 841-848 - [c194]Yuki Takashima, Yusuke Fujita, Shinji Watanabe
, Shota Horiguchi, Paola García, Kenji Nagamatsu:
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. SLT 2021: 849-856 - [c193]Chenda Li, Yi Luo, Cong Han, Jinyu Li
, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen:
Dual-Path RNN for Long Recording Speech Separation. SLT 2021: 865-872 - [c192]Desh Raj
, Leibny Paola García-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur:
DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs. SLT 2021: 881-888 - [c191]Desh Raj
, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904 - [c190]Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj
, Shinji Watanabe, Zhuo Chen, John R. Hershey:
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. SLT 2021: 905-911 - [c189]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe
, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. WASPAA 2021: 146-150 - [i135]Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic Speech Recognition by End-to-End, Modular Systems and Human. CoRR abs/2101.08454 (2021) - [i134]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers. CoRR abs/2101.08473 (2021) - [i133]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A Review of Speaker Diarization: Recent Advances with Deep Learning. CoRR abs/2101.09624 (2021) - [i132]Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe:
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec. CoRR abs/2101.10877 (2021) - [i131]Shota Horiguchi, Nelson Yalta
, Paola García, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur:
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap. CoRR abs/2102.01363 (2021) - [i130]Jaesong Lee, Shinji Watanabe:
Intermediate Loss Regularization for CTC-based Speech Recognition. CoRR abs/2102.03216 (2021) - [i129]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition. CoRR abs/2102.07955 (2021) - [i128]Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition. CoRR abs/2102.09168 (2021) - [i127]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. CoRR abs/2102.11525 (2021) - [i126]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. CoRR abs/2102.11634 (2021) - [i125]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing. CoRR abs/2104.00960 (2021) - [i124]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko:
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition. CoRR abs/2104.02014 (2021) - [i123]Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe:
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. CoRR abs/2104.06457 (2021) - [i122]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Ramón Fernandez Astudillo, Jan Honza Cernocký:
EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition. CoRR abs/2104.07474 (2021) - [i121]Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe:
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. CoRR abs/2105.00573 (2021) - [i120]Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB: Speech processing Universal PERformance Benchmark. CoRR abs/2105.01051 (2021) - [i119]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. CoRR abs/2105.02096 (2021) - [i118]Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda:
Self-Guided Curriculum Learning for Neural Machine Translation. CoRR abs/2105.04475 (2021) - [i117]Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe:
Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios. CoRR abs/2106.03419 (2021) - [i116]Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu:
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. CoRR abs/2106.04078 (2021) - [i115]Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization. CoRR abs/2106.04764 (2021) - [i114]Suwon Shon, Pablo Brusco, Jing Pan, Kyu Jeong Han, Shinji Watanabe:
Leveraging Pre-trained Language Model for Speech Sentiment Analysis. CoRR abs/2106.06598 (2021) - [i113]Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan:
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio. CoRR abs/2106.06909 (2021) - [i112]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. CoRR abs/2106.08595 (2021) - [i111]Jaesong Lee, Jingu Kang, Shinji Watanabe:
Layer Pruning on Demand with Intermediate CTC. CoRR abs/2106.09216 (2021) - [i110]Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
Multi-mode Transformer Transducer with Stochastic Future Context. CoRR abs/2106.09760 (2021) - [i109]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization. CoRR abs/2106.10654 (2021) - [i108]Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W. Black:
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding. CoRR abs/2106.15065 (2021) - [i107]Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe:
ESPnet-ST IWSLT 2021 Offline Speech Translation System. CoRR abs/2107.00636 (2021) - [i106]Shota Horiguchi, Shinji Watanabe, Paola García, Yawen Xue, Yuki Takashima, Yohei Kawaguchi:
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors. CoRR abs/2107.01545 (2021) - [i105]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, Alexander I. Rudnicky:
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021. CoRR abs/2107.05899 (2021) - [i104]Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models. CoRR abs/2107.09428 (2021) - [i103]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS based Voice Conversion. CoRR abs/2107.09477 (2021) - [i102]Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe:
Differentiable Allophone Graphs for Language-Universal Speech Recognition. CoRR abs/2107.11628 (2021) - [i101]Yen-Ju Lu, Yu Tsao, Shinji Watanabe:
A Study on Speech Enhancement Based on Diffusion Probabilistic Model. CoRR abs/2107.11876 (2021) - [i100]Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring. CoRR abs/2109.04411 (2021) - [i99]Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe:
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. CoRR abs/2109.12804 (2021) - [i98]Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-Wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe:
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition. CoRR abs/2110.04590 (2021) - [i97]Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-to-End Neural Diarization with Distributed Microphones. CoRR abs/2110.04694 (2021) - [i96]Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe:
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. CoRR abs/2110.05249 (2021) - [i95]Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. CoRR abs/2110.05571 (2021) - [i94]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations. CoRR abs/2110.06280 (2021) - [i93]Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe:
ESPnet2-TTS: Extending the Edge of TTS Research. CoRR abs/2110.07840 (2021) - [i92]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. CoRR abs/2110.14139 (2021) - [i91]Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi:
TorchAudio: Building Blocks for Audio and Speech Processing. CoRR abs/2110.15018 (2021) - [i90]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-based Supervision. CoRR abs/2111.01272 (2021) - [i89]Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W. Black:
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity. CoRR abs/2111.01326 (2021) - [i88]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Attention-based Multi-hypothesis Fusion for Speech Summarization. CoRR abs/2111.08201 (2021) - [i87]Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet. CoRR abs/2111.14706 (2021) - [i86]Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu:
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. CoRR abs/2111.15016 (2021) - [i85]Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, Shinji Watanabe:
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification. CoRR abs/2112.09323 (2021) - [i84]Jing Shi, Xuankai Chang, Tomoki Hayashi, Yen-Ju Lu, Shinji Watanabe, Bo Xu:
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem. CoRR abs/2112.09382 (2021) - 2020
- [j36]Ruizhi Li
, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 646-655 (2020) - [j35]Wangyou Zhang
, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1385-1394 (2020) - [c188]Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta
, Tomoki Hayashi, Shinji Watanabe:
ESPnet-ST: All-in-One Speech Translation Toolkit. ACL (demo) 2020: 302-311 - [c187]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. Blizzard Challenge / Voice Conversion Challenge 2020 - [c186]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation. DCASE 2020: 100-104 - [c185]Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe
, Tomoki Toda, Kazuya Takeda:
Weakly-Supervised Sound Event Detection with Self-Attention. ICASSP 2020: 66-70 - [c184]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
:
End-To-End Multi-Speaker Speech Recognition With Transformer. ICASSP 2020: 6134-6138 - [c183]Zili Huang, Shinji Watanabe
, Yusuke Fujita, Paola García, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur:
Speaker Diarization with Region Proposal Network. ICASSP 2020: 6514-6518 - [c182]Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe
:
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection. ICASSP 2020: 6999-7003 - [c181]Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe
, Hynek Hermansky:
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition. ICASSP 2020: 7014-7018 - [c180]Yuya Fujita, Aswin Shanmugam Subramanian
, Motoi Omachi, Shinji Watanabe:
Attention-Based ASR with Lightweight and Dynamic Convolutions. ICASSP 2020: 7034-7038 - [c179]Aswin Shanmugam Subramanian
, Chao Weng, Meng Yu, Shi-Xiong Zhang, Yong Xu, Shinji Watanabe, Dong Yu:
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives. ICASSP 2020: 7299-7303 - [c178]Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe
:
Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models. ICASSP 2020: 7634-7638 - [c177]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe
, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. ICASSP 2020: 7654-7658 - [c176]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu:
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. INTERSPEECH 2020: 269-273 - [c175]Wangyou Zhang, Aswin Shanmugam Subramanian
, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. INTERSPEECH 2020: 324-328 - [c174]Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe
, Bo Xu:
Speaker-Conditional Chain Model for Speech Separation and Extraction. INTERSPEECH 2020: 2707-2711 - [c173]Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe
, Najim Dehak:
Learning Speaker Embedding from Text-to-Speech. INTERSPEECH 2020: 3256-3260 - [c172]Xuankai Chang, Aswin Shanmugam Subramanian
, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, Motoi Omachi:
End-to-End ASR with Adaptive Span Self-Attention. INTERSPEECH 2020: 3595-3599 - [c171]Yosuke Higuchi, Shinji Watanabe
, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi:
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict. INTERSPEECH 2020: 3655-3659 - [c170]Yuya Fujita, Shinji Watanabe
, Motoi Omachi, Xuankai Chang:
Insertion-Based Modeling for End-to-End Automatic Speech Recognition. INTERSPEECH 2020: 3660-3664 - [c169]Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie:
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals. NeurIPS 2020 - [p9]Takahiro Shinozaki, Shinji Watanabe
, Kevin Duh:
Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms. Deep Neural Evolution 2020: 97-129 - [i83]Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe:
End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection. CoRR abs/2002.00551 (2020) - [i82]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
End-to-End Multi-speaker Speech Recognition with Transformer. CoRR abs/2002.03921 (2020) - [i81]Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola García, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur:
Speaker Diarization with Region Proposal Network. CoRR abs/2002.06220 (2020) - [i80]Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu:
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification. CoRR abs/2003.02966 (2020) - [i79]Shinji Watanabe, Michael I. Mandel, Jon Barker, Emmanuel Vincent:
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings. CoRR abs/2004.09249 (2020) - [i78]Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe:
ESPnet-ST: All-in-One Speech Translation Toolkit. CoRR abs/2004.10234 (2020) - [i77]Tomoki Hayashi, Shinji Watanabe:
DiscreTalk: Text-to-Speech as a Machine Translation Problem. CoRR abs/2005.05525 (2020) - [i76]Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi:
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict. CoRR abs/2005.08700 (2020) - [i75]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu:
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. CoRR abs/2005.09921 (2020) - [i74]Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. CoRR abs/2005.10479 (2020) - [i73]Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang:
Insertion-Based Modeling for End-to-End Automatic Speech Recognition. CoRR abs/2005.13211 (2020) - [i72]Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu:
Neural Speaker Diarization with Speaker-Wise Chain Rule. CoRR abs/2006.01796 (2020) - [i71]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu:
Online End-to-End Neural Diarization with Speaker-Tracing Buffer. CoRR abs/2006.02616 (2020) - [i70]Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Zelasko, Paola García, Shinji Watanabe, Sanjeev Khudanpur:
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge. CoRR abs/2006.07898 (2020) - [i69]Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu:
Speaker-Conditional Chain Model for Speech Separation and Extraction. CoRR abs/2006.14149 (2020) - [i68]Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie:
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals. CoRR abs/2006.14150 (2020) - [i67]Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe:
Streaming Transformer ASR with Blockwise Synchronous Inference. CoRR abs/2006.14941 (2020) - [i66]Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung:
Augmentation adversarial training for unsupervised speaker recognition. CoRR abs/2007.12085 (2020) - [i65]Wen-Chin Huang, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda:
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS. CoRR abs/2010.02434 (2020) - [i64]Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe, Najim Dehak:
Learning Speaker Embedding from Text-to-Speech. CoRR abs/2010.11221 (2020) - [i63]Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
Training Noisy Single-Channel Speech Separation With Noisy Oracle Sources: A Large Gap and A Small Step. CoRR abs/2010.12430 (2020) - [i62]Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder. CoRR abs/2010.13047 (2020) - [i61]Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi:
Improved Mask-CTC for Non-Autoregressive End-to-End ASR. CoRR abs/2010.13270 (2020) - [i60]Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang:
Recent Developments on ESPnet Toolkit Boosted by Conformer. CoRR abs/2010.13956 (2020) - [i59]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu:
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization. CoRR abs/2011.00091 (2020) - [i58]Desh Raj, Leibny Paola García-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur:
DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs. CoRR abs/2011.01997 (2020) - [i57]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020) - [i56]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration. CoRR abs/2011.03706 (2020) - [i55]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation. CoRR abs/2011.13393 (2020) - [i54]Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Shinji Watanabe, Reinhold Haeb-Umbach:
Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation. CoRR abs/2011.15003 (2020) - [i53]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording. CoRR abs/2012.09727 (2020) - [i52]Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu:
End-to-End Speaker Diarization as Post-Processing. CoRR abs/2012.10055 (2020) - [i51]Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang:
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. CoRR abs/2012.13006 (2020)
2010 – 2019
- 2019
- [j34]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
Phasebook and Friends: Leveraging Discrete Representations for Source Separation. IEEE J. Sel. Top. Signal Process. 13(2): 370-382 (2019) - [j33]Shinji Watanabe, Shoko Araki, Michiel Bacchiani, Reinhold Haeb-Umbach, Michael L. Seltzer:
Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition. IEEE J. Sel. Top. Signal Process. 13(4): 785-786 (2019) - [j32]Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden:
Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques. IEEE Signal Process. Mag. 36(6): 111-124 (2019) - [j31]Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh:
Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 27(1): 77-88 (2019) - [c168]Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. ASRU 2019: 31-38 - [c167]Yiming Wang, Sanjeev Khudanpur, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe:
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. ASRU 2019: 136-143 - [c166]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition. ASRU 2019: 237-244 - [c165]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-Attention. ASRU 2019: 296-303 - [c164]Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe:
Transformer ASR with Contextual Block Processing. ASRU 2019: 427-433 - [c163]Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto:
A Comparative Study on Transformer vs RNN in Speech Applications. ASRU 2019: 449-456 - [c162]Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
Multilingual End-to-End Speech Translation. ASRU 2019: 570-577 - [c161]Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata:
CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments. EUSIPCO 2019: 1-5 - [c160]Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali:
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection. ICASSP 2019: 36-40 - [c159]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
The Phasebook: Building Complex Masks via Discrete Representations for Source Separation. ICASSP 2019: 66-70 - [c158]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký:
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR. ICASSP 2019: 5646-5650 - [c157]Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe:
Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion. ICASSP 2019: 6096-6100 - [c156]Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders. ICASSP 2019: 6166-6170 - [c155]Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba, Najim Dehak:
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition. ICASSP 2019: 6191-6195 - [c154]Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe:
End-to-end Monaural Multi-speaker ASR System without Pretraining. ICASSP 2019: 6256-6260 - [c153]Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux:
Cycle-consistency Training for End-to-end Speech Recognition. ICASSP 2019: 6271-6275 - [c152]Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe:
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches. ICASSP 2019: 6630-6634 - [c151]Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur:
Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System. ICASSP 2019: 6665-6669 - [c150]Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Stream Attention-based Multi-array End-to-end Speech Recognition. ICASSP 2019: 7105-7109 - [c149]Hainan Xu, Shuoyang Ding, Shinji Watanabe:
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling. ICASSP 2019: 7110-7114 - [c148]Ashish Arora, Paola García, Shinji Watanabe, Vimal Manohar, Yiwen Shao, Sanjeev Khudanpur, Chun-Chieh Chang, Babak Rekabdar, Bagher BabaAli, Daniel Povey, David Etter, Desh Raj, Hossein Hadian, Jan Trmal:
Using ASR Methods for OCR. ICDAR 2019: 663-668 - [c147]Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata:
Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation. IJCNN 2019: 1-8 - [c146]Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. INTERSPEECH 2019: 236-240 - [c145]Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani:
End-to-End SpeakerBeam for Single Channel Target Speech Recognition. INTERSPEECH 2019: 451-455 - [c144]Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration. INTERSPEECH 2019: 1408-1412 - [c143]Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur:
Speaker Recognition Benchmark Using the CHiME-5 Corpus. INTERSPEECH 2019: 1506-1510 - [c142]Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Cernocký:
Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems. INTERSPEECH 2019: 2220-2224 - [c141]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
End-to-End Multilingual Multi-Speaker Speech Recognition. INTERSPEECH 2019: 3755-3759 - [c140]Murali Karthick Baskar, Shinji Watanabe, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký:
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text. INTERSPEECH 2019: 3790-3794 - [c139]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux:
Vectorized Beam Search for CTC-Attention-Based Speech Recognition. INTERSPEECH 2019: 3825-3829 - [c138]Laureano Moro-Velázquez, Jaejin Cho, Shinji Watanabe, Mark A. Hasegawa-Johnson, Odette Scharenborg, Heejin Kim, Najim Dehak:
Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease. INTERSPEECH 2019: 3875-3879 - [c137]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. INTERSPEECH 2019: 4300-4304 - [c136]Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur:
Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings. INTERSPEECH 2019: 4375-4379 - [c135]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu:
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis. INTERSPEECH 2019: 4430-4434 - [c134]Hirofumi Inaguma, Shun Kiyono, Nelson Enrique Yalta Soplin, Jun Suzuki, Kevin Duh, Shinji Watanabe:
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper. IWSLT 2019 - [c133]Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky:
Massively Multilingual Adversarial Speech Recognition. NAACL-HLT (1) 2019: 96-108 - [c132]Matthew Maciejewski, Gregory Sell, Yusuke Fujita, Leibny Paola García-Perera, Shinji Watanabe, Sanjeev Khudanpur:
Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains. WASPAA 2019: 165-169 - [c131]Aswin Shanmugam Subramanian, Xiaofei Wang, Murali Karthick Baskar, Shinji Watanabe, Toru Taniguchi, Dung T. Tran, Yuya Fujita:
Speech Enhancement Using End-to-End Speech Recognition Objectives. WASPAA 2019: 234-238 - [c130]Toru Taniguchi, Aswin Shanmugam Subramanian, Xiaofei Wang, Dung T. Tran, Yuya Fujita, Shinji Watanabe:
Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition. WASPAA 2019: 293-297 - [i50]Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky:
Massively Multilingual Adversarial Speech Recognition. CoRR abs/1904.02210 (2019) - [i49]Aswin Shanmugam Subramanian, Xiaofei Wang, Shinji Watanabe, Toru Taniguchi, Dung T. Tran, Yuya Fujita:
Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR. CoRR abs/1904.09049 (2019) - [i48]Murali Karthick Baskar, Shinji Watanabe, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký:
Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text. CoRR abs/1905.01152 (2019) - [i47]Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. CoRR abs/1906.08041 (2019) - [i46]Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. CoRR abs/1906.10876 (2019) - [i45]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. CoRR abs/1909.05952 (2019) - [i44]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-attention. CoRR abs/1909.06247 (2019) - [i43]Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang:
A Comparative Study on Transformer vs RNN in Speech Applications. CoRR abs/1909.06317 (2019) - [i42]Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. CoRR abs/1909.08103 (2019) - [i41]Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur:
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit. CoRR abs/1909.08723 (2019) - [i40]Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
Multilingual End-to-End Speech Translation. CoRR abs/1910.00254 (2019) - [i39]Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe:
MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition. CoRR abs/1910.06522 (2019) - [i38]Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe:
Transformer ASR with Contextual Block Processing. CoRR abs/1910.07204 (2019) - [i37]Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky:
A practical two-stage training strategy for multi-stream end-to-end speech recognition. CoRR abs/1910.10671 (2019) - [i36]Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan:
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit. CoRR abs/1910.10909 (2019) - [i35]Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe:
Towards Online End-to-end Transformer Automatic Speech Recognition. CoRR abs/1910.11871 (2019) - [i34]Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak:
Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition. CoRR abs/1911.04908 (2019)
- 2018
- [c129]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-End System for Multi-speaker Speech Recognition. ACL (1) 2018: 2620-2630 - [c128]Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey:
End-to-End Multi-Speaker Speech Recognition. ICASSP 2018: 4819-4823 - [c127]Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech. ICASSP 2018: 4919-4923 - [c126]Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John R. Hershey:
Speaker Adaptation for Multichannel End-to-End Speech Recognition. ICASSP 2018: 6707-6711 - [c125]Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa, Marc Delcroix:
Semi-Supervised End-to-End Speech Recognition. INTERSPEECH 2018: 2-6 - [c124]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. INTERSPEECH 2018: 801-805 - [c123]Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal:
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines. INTERSPEECH 2018: 1561-1565 - [c122]Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe:
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline. INTERSPEECH 2018: 1571-1575 - [c121]Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan, Najim Dehak:
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification. INTERSPEECH 2018: 1823-1827 - [c120]Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai:
ESPnet: End-to-End Speech Processing Toolkit. INTERSPEECH 2018: 2207-2211 - [c119]Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe:
Multi-Modal Data Augmentation for End-to-end ASR. INTERSPEECH 2018: 2394-2398 - [c118]Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, Tomohiro Nakatani:
Auxiliary Feature Based Adaptation of End-to-end ASR Systems. INTERSPEECH 2018: 2444-2448 - [c117]Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, Sanjeev Khudanpur:
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge. INTERSPEECH 2018: 2808-2812 - [c116]Aswin Shanmugam Subramanian, Szu-Jui Chen, Shinji Watanabe:
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement. INTERSPEECH 2018: 3249-3253 - [c115]Hirofumi Inaguma, Xuan Zhang, Zhiqi Wang, Adithya Renduchintala, Shinji Watanabe, Kevin Duh:
The JHU/KyotoU Speech Translation System for IWSLT 2018. IWSLT 2018: 153-159 - [c114]Takaaki Hori, Jaejin Cho, Shinji Watanabe:
End-to-end Speech Recognition With Word-Based RNN Language Models. SLT 2018: 389-396 - [c113]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for end-to-end ASR. SLT 2018: 426-433 - [c112]Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori:
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. SLT 2018: 521-527 - [c111]Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur:
Low-Resource Contextual Topic Identification on Speech. SLT 2018: 656-663 - [i33]Aswin Shanmugam Subramanian, Szu-Jui Chen, Shinji Watanabe:
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement. CoRR abs/1803.10013 (2018) - [i32]Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe:
Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline. CoRR abs/1803.10109 (2018) - [i31]Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe:
Multi-Modal Data Augmentation for End-to-end ASR. CoRR abs/1803.10299 (2018) - [i30]Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal:
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines. CoRR abs/1803.10609 (2018) - [i29]Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai:
ESPnet: End-to-End Speech Processing Toolkit. CoRR abs/1804.00015 (2018) - [i28]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda:
Multi-Head Decoder for End-to-End Speech Recognition. CoRR abs/1804.08050 (2018) - [i27]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-end System for Multi-speaker Speech Recognition. CoRR abs/1805.05826 (2018) - [i26]Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata:
Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation. CoRR abs/1807.01126 (2018) - [i25]Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur:
Low-Resource Contextual Topic Identification on Speech. CoRR abs/1807.06204 (2018) - [i24]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for End-to-End ASR. CoRR abs/1807.10893 (2018) - [i23]Takaaki Hori, Jaejin Cho, Shinji Watanabe:
End-to-end Speech Recognition with Word-based RNN Language Models. CoRR abs/1808.02608 (2018) - [i22]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
Phasebook and Friends: Leveraging Discrete Representations for Source Separation. CoRR abs/1810.01395 (2018) - [i21]Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Reddy Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori:
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling. CoRR abs/1810.03459 (2018) - [i20]Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux:
Cycle-consistency training for end-to-end speech recognition. CoRR abs/1811.01690 (2018) - [i19]Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe:
End-to-End Monaural Multi-speaker ASR System without Pretraining. CoRR abs/1811.02062 (2018) - [i18]Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe:
Transfer learning of language-independent end-to-end ASR with language model fusion. CoRR abs/1811.02134 (2018) - [i17]Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba, Najim Dehak:
Language model integration based on memory control for sequence to sequence speech recognition. CoRR abs/1811.02162 (2018) - [i16]Matthew Maciejewski, Gregory Sell, Leibny Paola García-Perera, Shinji Watanabe, Sanjeev Khudanpur:
Building Corpora for Single-Channel Speech Separation Across Multiple Domains. CoRR abs/1811.02641 (2018) - [i15]Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata:
CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments. CoRR abs/1811.02735 (2018) - [i14]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký:
Promising Accurate Prefix Boosting for sequence-to-sequence ASR. CoRR abs/1811.02770 (2018) - [i13]Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Honza Cernocký:
Analysis of Multilingual Sequence-to-Sequence speech recognition systems. CoRR abs/1811.03451 (2018) - [i12]Sandeep Kothinti, Keisuke Imoto, Debmalya Chakrabarty, Gregory Sell, Shinji Watanabe, Mounya Elhilali:
Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection. CoRR abs/1811.04048 (2018) - [i11]Hainan Xu, Shuoyang Ding, Shinji Watanabe:
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling. CoRR abs/1811.04284 (2018) - [i10]Hiroshi Seki, Takaaki Hori, Shinji Watanabe:
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition. CoRR abs/1811.04568 (2018) - [i9]Ruizhi Li, Xiaofei Wang, Sri Harish Reddy Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Multi-encoder multi-resolution framework for end-to-end speech recognition. CoRR abs/1811.04897 (2018) - [i8]Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Stream attention-based multi-array end-to-end speech recognition. CoRR abs/1811.04903 (2018) - [i7]Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur:
Low Resource Multi-modal Data Augmentation for End-to-end ASR. CoRR abs/1812.03919 (2018)
- 2017
- [j30]Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe:
Multi-microphone speech recognition in everyday environments. Comput. Speech Lang. 46: 386-387 (2017) - [j29]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. Comput. Speech Lang. 46: 401-418 (2017) - [j28]Emmanuel Vincent, Shinji Watanabe, Aditya Arie Nugraha, Jon Barker, Ricard Marxer:
An analysis of environment, microphone and data simulation mismatches in robust speech recognition. Comput. Speech Lang. 46: 535-557 (2017) - [j27]Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe:
The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes. Comput. Speech Lang. 46: 605-626 (2017) - [j26]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones. J. Inf. Process. 25: 407-416 (2017) - [j25]Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, Tomoki Hayashi:
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. IEEE J. Sel. Top. Signal Process. 11(8): 1240-1253 (2017) - [j24]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey, Xiong Xiao:
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming. IEEE J. Sel. Top. Signal Process. 11(8): 1274-1288 (2017) - [j23]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Duration-Controlled LSTM for Polyphonic Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process. 25(11): 2059-2070 (2017) - [c110]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Joint CTC/attention decoding for end-to-end speech recognition. ACL (1) 2017: 518-529 - [c109]Shinji Watanabe, Takaaki Hori, John R. Hershey:
Language independent end-to-end architecture for joint language identification and speech recognition. ASRU 2017: 265-271 - [c108]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. ASRU 2017: 287-293 - [c107]Hayato Shibata, Taku Kato, Takahiro Shinozaki, Shinji Watanabe:
Composite embedding systems for ZeroSpeech2017 Track1. ASRU 2017: 747-753 - [c106]Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan:
Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition. ICASSP 2017: 271-275 - [c105]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection. ICASSP 2017: 766-770 - [c104]Suyoun Kim, Takaaki Hori, Shinji Watanabe:
Joint CTC-attention based end-to-end speech recognition using multi-task learning. ICASSP 2017: 4835-4839 - [c103]Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
Student-teacher network learning with enhanced features. ICASSP 2017: 5275-5279 - [c102]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. ICML 2017: 2632-2641 - [c101]Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. INTERSPEECH 2017: 949-953 - [c100]Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux:
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information. INTERSPEECH 2017: 2461-2465 - [c99]Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig:
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text. INTERSPEECH 2017: 2546-2550 - [c98]Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri:
Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR. MLSP 2017: 1-6 - [p8]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
Preliminaries. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 3-17 - [p7]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Michael I. Mandel, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Dong Yu:
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 79-104 - [p6]John R. Hershey, Jonathan Le Roux, Shinji Watanabe, Scott Wisdom, Zhuo Chen, Yusuf Ziya Isik:
Novel Deep Architectures in Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 135-164 - [p5]Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux:
Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 165-186 - [p4]Martin Karafiát, Karel Veselý, Katerina Zmolíková, Marc Delcroix, Shinji Watanabe, Lukás Burget, Jan Honza Cernocký, Igor Szöke:
Training Data Augmentation and Data Selection. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 245-260 - [p3]Jon P. Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe:
The CHiME Challenges: Robust Speech Recognition in Everyday Environments. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 327-344 - [p2]Shinji Watanabe, Takaaki Hori, Yajie Miao, Marc Delcroix, Florian Metze, John R. Hershey:
Toolkits for Robust Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 369-382 - [e2]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
New Era for Robust Speech Recognition, Exploiting Deep Learning. Springer 2017, ISBN 978-3-319-64679-4 - [e1]Naonori Ueda, Shinji Watanabe, Tomoko Matsui, Jen-Tzung Chien, Jan Larsen:
27th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2017, Tokyo, Japan, September 25-28, 2017. IEEE 2017, ISBN 978-1-5090-6341-3 - [i6]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. CoRR abs/1703.04783 (2017) - [i5]Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. CoRR abs/1706.02737 (2017) - [i4]Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan:
Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition. CoRR abs/1711.08016 (2017)
- 2016
- [c97]Xiong Xiao, Shinji Watanabe, Eng Siong Chng, Haizhou Li:
Beamforming networks using spatial covariance features for far-field speech recognition. APSIPA 2016: 1-6 - [c96]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection. DCASE 2016: 35-39 - [c95]Toshiaki Koike-Akino, Ruhi Mahajan, Tim K. Marks, Ye Wang, Shinji Watanabe, Oncel Tuzel, Philip V. Orlik:
High-accuracy user identification using EEG biometrics. EMBC 2016: 854-858 - [c94]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. ICASSP 2016: 31-35 - [c93]Scott Wisdom, John R. Hershey, Jonathan Le Roux, Shinji Watanabe:
Deep unfolding for multichannel source separation. ICASSP 2016: 121-125 - [c92]Karel Veselý, Shinji Watanabe, Katerina Zmolíková, Martin Karafiát, Lukás Burget, Jan Honza Cernocký:
Sequence summarizing neural network for speaker adaptation. ICASSP 2016: 5315-5319 - [c91]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael I. Mandel, Dong Yu:
Deep beamforming networks for multi-channel speech recognition. ICASSP 2016: 5745-5749 - [c90]Takaaki Hori, Chiori Hori, Shinji Watanabe, John R. Hershey:
Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. ICASSP 2016: 5990-5994 - [c89]Chiori Hori, Shinji Watanabe, Takaaki Hori, Bret A. Harsham, John R. Hershey, Yusuke Koji, Youichi Fujii, Yuki Furumoto:
Driver confusion status detection using recurrent neural networks. ICME 2016: 1-6 - [c88]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. INTERSPEECH 2016: 545-549 - [c87]Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux:
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks. INTERSPEECH 2016: 1981-1985 - [c86]Katerina Zmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukás Burget, Jan Cernocký:
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training. INTERSPEECH 2016: 2354-2358 - [c85]Chiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey:
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs. INTERSPEECH 2016: 3236-3240 - [c84]Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa:
Dialog state tracking with attention-based sequence-to-sequence learning. SLT 2016: 552-558 - [c83]Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, Takaaki Hori, Kevin Duh:
Automated structure discovery and parameter tuning of neural network language model based on evolution strategy. SLT 2016: 665-671 - [i3]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation using Deep Clustering. CoRR abs/1607.02173 (2016) - [i2]Suyoun Kim, Takaaki Hori, Shinji Watanabe:
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning. CoRR abs/1609.06773 (2016)
- 2015
- [b1]Shinji Watanabe
, Jen-Tzung Chien
:
Bayesian Speech and Language Processing. Cambridge University Press 2015, ISBN 9781107295360 - [j22]Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe
:
Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments. EURASIP J. Adv. Signal Process. 2015: 52 (2015) - [c82]Hiroki Kanagawa, Yuuki Tachioka, Shinji Watanabe
, Jun Ishii:
Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN. APSIPA 2015: 86-92 - [c81]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe
:
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition. ASRU 2015: 475-481 - [c80]Jon Barker, Ricard Marxer
, Emmanuel Vincent, Shinji Watanabe
:
The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines. ASRU 2015: 504-511 - [c79]Roger Hsiao, Jeff Z. Ma, William Hartmann, Martin Karafiát
, Frantisek Grézl, Lukás Burget
, Igor Szöke, Jan Cernocký
, Shinji Watanabe
, Zhuo Chen, Sri Harish Reddy Mallidi, Hynek Hermansky
, Stavros Tsakalidis, Richard M. Schwartz:
Robust speech recognition in unknown reverberant and noisy conditions. ASRU 2015: 533-538 - [c78]Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe
, Kevin Duh:
Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy. ASRU 2015: 610-616 - [c77]Felix Weninger, Hakan Erdogan, Shinji Watanabe
, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn W. Schuller
:
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR. LVA/ICA 2015: 91-99 - [c76]Hakan Erdogan, John R. Hershey, Shinji Watanabe
, Jonathan Le Roux:
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. ICASSP 2015: 708-712 - [c75]Takahiro Shinozaki, Shinji Watanabe
:
Structure discovery of deep neural network based on evolutionary algorithms. ICASSP 2015: 4979-4983 - [c74]Yuuki Tachioka, Shinji Watanabe
:
Discriminative method for recurrent neural network language models. ICASSP 2015: 5386-5390 - [c73]Ramón Fernandez Astudillo, Shinji Watanabe, Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview. INTERSPEECH 2015 - [c72]Yi Luan, Shinji Watanabe, Bret Harsham:
Efficient learning for spoken language understanding tasks with word embedding based pre-training. INTERSPEECH 2015: 1398-1402 - [c71]Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey:
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks. INTERSPEECH 2015: 3274-3278 - [c70]Yuuki Tachioka, Shinji Watanabe:
Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features. INTERSPEECH 2015: 3541-3545 - [c69]Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa:
Uncertainty propagation through deep neural networks. INTERSPEECH 2015: 3561-3565 - [i1]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. CoRR abs/1508.04306 (2015)
- 2014
- [j21]Shinji Watanabe
, Atsushi Nakamura, Biing-Hwang Fred Juang:
Structural Bayesian Linear Regression for Hidden Markov Models. J. Signal Process. Syst. 74(3): 341-358 (2014) - [c68]Yuuki Tachioka, Shinji Watanabe
, Jonathan Le Roux, John R. Hershey:
Sequence discriminative training for low-rank deep neural networks. GlobalSIP 2014: 572-576 - [c67]Yuuki Tachioka, Tomohiro Narita, Shinji Watanabe
, Jonathan Le Roux:
Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments. HSCMA 2014: 162-166 - [c66]Shinji Watanabe
, Jonathan Le Roux:
Black box optimization for automatic speech recognition. ICASSP 2014: 3256-3260 - [c65]Hao Tang, Shinji Watanabe
, Tim K. Marks, John R. Hershey:
Log-linear dialog manager. ICASSP 2014: 4092-4096 - [c64]Felix Weninger, Shinji Watanabe
, Yuuki Tachioka, Björn W. Schuller
:
Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition. ICASSP 2014: 4623-4627 - [c63]Chao Weng, Dong Yu, Shinji Watanabe
, Biing-Hwang Fred Juang:
Recurrent deep neural networks for robust speech recognition. ICASSP 2014: 5532-5536 - [c62]Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji:
Cost-level integration of statistical and rule-based dialog managers. INTERSPEECH 2014: 323-327 - [c61]Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe:
Discriminative NMF and its application to single-channel source separation. INTERSPEECH 2014: 865-869 - [c60]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Sequential maximum mutual information linear discriminant analysis for speech recognition. INTERSPEECH 2014: 2415-2419
- 2013
- [j20]Marc Delcroix
, Shinji Watanabe
, Tomohiro Nakatani, Atsushi Nakamura:
Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer. Comput. Speech Lang. 27(1): 350-368 (2013) - [j19]Marc Delcroix
, Keisuke Kinoshita
, Tomohiro Nakatani, Shoko Araki
, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe
, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, Atsushi Nakamura:
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds. Comput. Speech Lang. 27(3): 851-873 (2013) - [j18]Takuya Maekawa, Shinji Watanabe
:
Training data selection with user's physical characteristics data for acceleration-based activity modeling. Pers. Ubiquitous Comput. 17(3): 451-463 (2013) - [j17]Tomoharu Iwata, Shinji Watanabe
:
Influence relation estimation based on lexical entrainment in conversation. Speech Commun. 55(2): 329-339 (2013) - [j16]Seong-Jun Hahm, Shinji Watanabe
, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:
Prior-shared feature and model space speaker adaptation by consistently employing map estimation. Speech Commun. 55(3): 415-431 (2013) - [j15]Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe
, Nobuaki Minematsu, Keikichi Hirose:
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting. IEEE Trans. Speech Audio Process. 21(10): 2172-2181 (2013) - [c59]Yuuki Tachioka, Shinji Watanabe
, Jonathan Le Roux, John R. Hershey:
A generalized discriminative training framework for system combination. ASRU 2013: 43-48 - [c58]Emmanuel Vincent, Jon Barker
, Shinji Watanabe
, Jonathan Le Roux, Francesco Nesta, Marco Matassoni:
The second 'CHiME' speech separation and recognition challenge: An overview of challenge systems and outcomes. ASRU 2013: 162-167 - [c57]Emmanuel Vincent, Jon Barker
, Shinji Watanabe
, Jonathan Le Roux, Francesco Nesta, Marco Matassoni:
The second 'chime' speech separation and recognition challenge: Datasets, tasks and baselines. ICASSP 2013: 126-130 - [c56]Yuuki Tachioka, Shinji Watanabe
, John R. Hershey:
Effectiveness of discriminative training and feature transformation for reverberated and noisy speech. ICASSP 2013: 6935-6939 - [c55]Shinji Watanabe
, John R. Hershey:
Stereo-based feature enhancement using dictionary learning. ICASSP 2013: 7073-7077 - [c54]Koichiro Yoshino, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Statistical Dialogue Management using Intention Dependency Graph. IJCNLP 2013: 962-966 - [c53]Yuuki Tachioka, Shinji Watanabe:
Discriminative training of acoustic models for system combination. INTERSPEECH 2013: 2355-2359 - [c52]Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe
, Atsushi Nakamura, Tetsunori Kobayashi:
Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data. MLSP 2013: 1-6 - [c51]Jonathan Le Roux, Shinji Watanabe
, John R. Hershey:
Ensemble learning for speech enhancement. WASPAA 2013: 1-4
- 2012
- [j14]Masakiyo Fujimoto, Shinji Watanabe
, Tomohiro Nakatani:
Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection. Speech Commun. 54(2): 229-244 (2012) - [j13]Takaaki Hori, Shoko Araki
, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe
, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami
, Keisuke Kinoshita
, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato
:
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. IEEE Trans. Speech Audio Process. 20(2): 499-513 (2012) - [j12]Daisuke Saito, Shinji Watanabe
, Atsushi Nakamura, Nobuaki Minematsu:
Statistical Voice Conversion Based on Noisy Channel Model. IEEE Trans. Speech Audio Process. 20(6): 1784-1794 (2012) - [j11]Yotaro Kubo, Shinji Watanabe
, Takaaki Hori, Atsushi Nakamura:
Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition. IEEE Trans. Speech Audio Process. 20(8): 2240-2251 (2012) - [c50]Yotaro Kubo, Shinji Watanabe
, Atsushi Nakamura, Simon Wiesler, Ralf Schlüter
, Hermann Ney:
Basis vector orthogonalization for an improved kernel gradient matching pursuit method. ICASSP 2012: 1909-1912 - [c49]Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe
, Nobuaki Minematsu, Keikichi Hirose:
MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments. ICASSP 2012: 4109-4112 - [c48]Yotaro Kubo, Shinji Watanabe
, Atsushi Nakamura:
Decoding network optimization using minimum transition error training. ICASSP 2012: 4197-4200 - [c47]Shinji Watanabe
, Yotaro Kubo, Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Bag Of ARCS: New representation of speech segment features based on finite state machines. ICASSP 2012: 4201-4204 - [c46]Masakiyo Fujimoto, Shinji Watanabe
, Tomohiro Nakatani:
Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation. ICASSP 2012: 4713-4716 - [c45]Marc Delcroix
, Atsunori Ogawa, Shinji Watanabe
, Tomohiro Nakatani, Atsushi Nakamura:
Discriminative feature transforms using differenced maximum mutual information. ICASSP 2012: 4753-4756 - [c44]Roland Roller, Shinji Watanabe
, Tomoharu Iwata:
Effect of dialog acts on word use in polylogue. ICASSP 2012: 4969-4972 - [c43]Ekapol Chuangsuwanich, Shinji Watanabe
, Takaaki Hori, Tomoharu Iwata, James R. Glass:
Handling uncertain observations in unsupervised topic-mixture language model adaptation. ICASSP 2012: 5033-5036 - [c42]Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe
, Tetsunori Kobayashi:
Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering. ICASSP 2012: 5253-5256 - [c41]Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model. INTERSPEECH 2012: 2166-2169
- 2011
- [j10]Shinji Watanabe
, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, Yasuo Ariki:
Topic tracking language model for speech recognition. Comput. Speech Lang. 25(2): 440-461 (2011) - [c40]Yotaro Kubo, Simon Wiesler, Ralf Schlüter
, Hermann Ney, Shinji Watanabe
, Atsushi Nakamura, Tetsunori Kobayashi:
Subspace pursuit method for kernel-log-linear models. ICASSP 2011: 4500-4503 - [c39]Shinji Watanabe
, Daichi Mochihashi, Takaaki Hori, Atsushi Nakamura:
Gibbs sampling based Multi-scale Mixture Model for speaker clustering. ICASSP 2011: 4524-4527 - [c38]Daisuke Saito, Shinji Watanabe
, Atsushi Nakamura, Nobuaki Minematsu:
High accurate model-integration-based voice conversion using dynamic features and model structure optimization. ICASSP 2011: 4576-4579 - [c37]Masakiyo Fujimoto, Shinji Watanabe
, Tomohiro Nakatani:
Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition. ICASSP 2011: 4816-4819 - [c36]Tomoharu Iwata, Shinji Watanabe
, Hiroshi Sawada:
Fashion Coordinates Recommender System Using Photographs from Fashion Magazines. IJCAI 2011: 2262-2267 - [c35]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
A Robust Estimation Method of Noise Mixture Model for Noise Suppression. INTERSPEECH 2011: 697-700 - [c34]Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang:
Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution. INTERSPEECH 2011: 1081-1084 - [c33]Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi:
Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model. INTERSPEECH 2011: 2905-2908 - [c32]Tomoharu Iwata, Shinji Watanabe:
Learning Influences from Word Use in Polylogue. INTERSPEECH 2011: 3089-3092 - [c31]Takuya Maekawa, Shinji Watanabe
:
Unsupervised Activity Recognition with User's Physical Characteristics Data. ISWC 2011: 89-96 - [c30]Shinji Watanabe
, Atsushi Nakamura, Biing-Hwang Juang:
Bayesian linear regression for Hidden Markov Model based on optimizing variational bounds. MLSP 2011: 1-6 - [p1]Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani:
Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing. Robust Speech Recognition of Uncertain or Missing Data 2011: 225-255
- 2010
- [j9]Yotaro Kubo, Shinji Watanabe
, Atsushi Nakamura, Erik McDermott, Tetsunori Kobayashi:
A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification. IEEE J. Sel. Top. Signal Process. 4(6): 974-984 (2010) - [j8]David Cournapeau, Shinji Watanabe
, Atsushi Nakamura, Tatsuya Kawahara
:
Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection. IEEE J. Sel. Top. Signal Process. 4(6): 1071-1083 (2010) - [j7]Shinji Watanabe
, Atsushi Nakamura:
Predictor-Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale. IEEE Trans. Speech Audio Process. 18(2): 395-406 (2010) - [c29]Hideyuki Watanabe, Shigeru Katagiri, Kouta Yamada, Erik McDermott, Atsushi Nakamura, Shinji Watanabe
, Miho Ohsaki:
Minimum Error Classification with geometric margin control. ICASSP 2010: 2170-2173 - [c28]David Cournapeau, Shinji Watanabe
, Atsushi Nakamura, Tatsuya Kawahara
:
Using online model comparison in the Variational Bayes framework for online unsupervised Voice Activity Detection. ICASSP 2010: 4462-4465 - [c27]Erik McDermott, Shinji Watanabe
, Atsushi Nakamura:
Discriminative training based on an integrated view of MPE and MMI in margin and error space. ICASSP 2010: 4894-4897 - [c26]Shinji Watanabe
, Takaaki Hori, Erik McDermott, Atsushi Nakamura:
A discriminative model for continuous speech recognition based on Weighted Finite State Transducers. ICASSP 2010: 4922-4925 - [c25]Takaaki Hori, Shinji Watanabe
, Atsushi Nakamura:
Search error risk minimization in Viterbi beam search for speech recognition. ICASSP 2010: 4934-4937 - [c24]Kazuo Aoyama, Shinji Watanabe
, Hiroshi Sawada, Yasuhiro Minami, Naonori Ueda, Kazumi Saito:
Fast similarity search on a large speech data set with neighborhood graph indexing. ICASSP 2010: 5358-5361 - [c23]Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. INTERSPEECH 2010: 346-349 - [c22]Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:
Probabilistic integration of joint density model and speaker model for voice conversion. INTERSPEECH 2010: 1728-1731 - [c21]Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Improvements of search error risk minimization in viterbi beam search for speech recognition. INTERSPEECH 2010: 1962-1965 - [c20]Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination. INTERSPEECH 2010: 2954-2957 - [c19]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization. INTERSPEECH 2010: 3102-3105 - [c18]Shinji Watanabe
, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, Yasuo Ariki:
Application of topic tracking model to language model adaptation and meeting analysis. SLT 2010: 378-383 - [c17]Takaaki Hori, Shoko Araki
, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe
, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita
, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato
:
Real-time meeting recognition and understanding using distant microphones and omni-directional camera. SLT 2010: 424-429
2000 – 2009
- 2009
- [j6]Marc Delcroix
, Tomohiro Nakatani, Shinji Watanabe
:
Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing. IEEE Trans. Speech Audio Process. 17(2): 324-334 (2009) - [c16]Atsushi Nakamura, Erik McDermott, Shinji Watanabe
, Shigeru Katagiri:
A unified view for discriminative objective functions based on negative exponential of difference measure between strings. ICASSP 2009: 1633-1636 - [c15]Shinji Watanabe
, Atsushi Nakamura:
On-line adaptation and Bayesian detection of environmental changes based on a macroscopic time evolution system. ICASSP 2009: 4373-4376 - [c14]Tomoharu Iwata, Shinji Watanabe, Takeshi Yamada, Naonori Ueda:
Topic Tracking Model for Analyzing Consumer Purchase Behavior. IJCAI 2009: 1427-1432 - [c13]Erik McDermott, Shinji Watanabe, Atsushi Nakamura:
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training. INTERSPEECH 2009: 224-227 - [c12]Yosuke Izumi, Kenta Nishiki, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama:
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment. INTERSPEECH 2009: 1955-1958
- 2008
- [c11]Marc Delcroix
, Tomohiro Nakatani, Shinji Watanabe
:
Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer. ICASSP 2008: 4073-4076 - [c10]Shinji Watanabe
, Atsushi Nakamura:
A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches. ICASSP 2008: 4285-4288
- 2007
- [c9]Shinji Watanabe
, Atsushi Nakamura:
Incremental Adaptation Based on a Macroscopic Time Evolution System. ICASSP (4) 2007: 769-772
- 2006
- [j5]Atsushi Nakamura, Shinji Watanabe, Takaaki Hori, Erik McDermott, Shigeru Katagiri:
Advanced computational models and learning theories for spoken language processing. IEEE Comput. Intell. Mag. 1(2): 5-9 (2006) - [j4]Shinji Watanabe
, Atsushi Nakamura:
Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework. IEICE Trans. Inf. Syst. 89-D(3): 970-980 (2006) - [j3]Shinji Watanabe
, Atsushi Sako, Atsushi Nakamura:
Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 14(3): 855-872 (2006) - [c8]Shinji Watanabe, Atsushi Nakamura:
Acoustic Model Adaptation Based on Coarse/Fine Training of Transfer Vectors Using Directional Statistics. ICASSP (1) 2006: 1005-1008
- 2005
- [j2]Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Selection of Shared-State Hidden Markov Model Structure Using Bayesian Criterion. IEICE Trans. Inf. Syst. 88-D(1): 1-9 (2005) - [c7]Shinji Watanabe, Atsushi Nakamura:
Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition. INTERSPEECH 2005: 1105-1108
- 2004
- [j1]Shinji Watanabe
, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Variational bayesian estimation and clustering for speech recognition. IEEE Trans. Speech Audio Process. 12(4): 365-381 (2004) - [c6]Parham Zolfaghari, Shinji Watanabe, Atsushi Nakamura, Shigeru Katagiri:
Bayesian modelling of the speech spectrum using mixture of Gaussians. ICASSP (1) 2004: 553-556 - [c5]Shinji Watanabe, Atsushi Sako, Atsushi Nakamura:
Automatic determination of acoustic model topology using variational Bayesian estimation and clustering. ICASSP (1) 2004: 813-816 - [c4]Shinji Watanabe:
Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task. INTERSPEECH 2004: 2933-2936
- 2003
- [c3]Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Application of variational Bayesian estimation and clustering to acoustic model adaptation. ICASSP (1) 2003: 568-571
- 2002
- [c2]Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Constructing shared-state hidden Markov models based on a Bayesian approach. INTERSPEECH 2002: 2669-2672 - [c1]Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda:
Application of Variational Bayesian Approach to Speech Recognition. NIPS 2002: 1237-1244
last updated on 2025-02-20 20:47 CET by the dblp team
all metadata released as open data under CC0 1.0 license