Takaaki Hori
Books and Theses
- 2013
- [b1]Takaaki Hori, Atsushi Nakamura:
Speech Recognition Algorithms Based on Weighted Finite-State Transducers. Synthesis Lectures on Speech and Audio Processing, Morgan & Claypool Publishers 2013, ISBN 9781608454730
Journal Articles
- 2024
- [j24]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024)
- 2022
- [j23]Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori:
Momentum Pseudo-Labeling: Semi-Supervised ASR With Continuously Improving Pseudo-Labels. IEEE J. Sel. Top. Signal Process. 16(6): 1424-1438 (2022)
- 2020
- [j22]Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 646-655 (2020)
- 2019
- [j21]Takaaki Hori, Wen Wang, Yusuke Koji, Chiori Hori, Bret Harsham, John R. Hershey:
Adversarial training and decoding strategies for end-to-end neural conversation models. Comput. Speech Lang. 54: 122-139 (2019)
- [j20]Chiori Hori, Julien Perez, Ryuichiro Higashinaka, Takaaki Hori, Y-Lan Boureau, Michimasa Inaba, Yuiko Tsunomori, Tetsuro Takahashi, Koichiro Yoshino, Seokhwan Kim:
Overview of the sixth dialog system technology challenge: DSTC6. Comput. Speech Lang. 55: 1-25 (2019)
- 2017
- [j19]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. Comput. Speech Lang. 46: 401-418 (2017)
- [j18]Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, Tomoki Hayashi:
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. IEEE J. Sel. Top. Signal Process. 11(8): 1240-1253 (2017)
- [j17]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey, Xiong Xiao:
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming. IEEE J. Sel. Top. Signal Process. 11(8): 1274-1288 (2017)
- [j16]Atsunori Ogawa, Takaaki Hori:
Error detection and accuracy estimation in automatic speech recognition using deep bidirectional recurrent neural networks. Speech Commun. 89: 70-83 (2017)
- [j15]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Duration-Controlled LSTM for Polyphonic Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process. 25(11): 2059-2070 (2017)
- 2016
- [j14]Atsunori Ogawa, Takaaki Hori, Atsushi Nakamura:
Estimating Speech Recognition Accuracy Based on Error Type Classification. IEEE ACM Trans. Audio Speech Lang. Process. 24(12): 2400-2413 (2016)
- 2015
- [j13]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Strategies for distant speech recognition in reverberant environments. EURASIP J. Adv. Signal Process. 2015: 60 (2015)
- 2013
- [j12]Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, Atsushi Nakamura:
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds. Comput. Speech Lang. 27(3): 851-873 (2013)
- [j11]Seong-Jun Hahm, Shinji Watanabe, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:
Prior-shared feature and model space speaker adaptation by consistently employing map estimation. Speech Commun. 55(3): 415-431 (2013)
- 2012
- [j10]Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito:
Model Shrinkage for Discriminative Language Models. IEICE Trans. Inf. Syst. 95-D(5): 1465-1474 (2012)
- [j9]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Efficient training of discriminative language models by sample selection. Speech Commun. 54(6): 791-800 (2012)
- [j8]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. IEEE Trans. Speech Audio Process. 20(2): 499-513 (2012)
- [j7]Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito:
Round-Robin Duel Discriminative Language Models. IEEE Trans. Speech Audio Process. 20(4): 1244-1255 (2012)
- [j6]Yotaro Kubo, Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition. IEEE Trans. Speech Audio Process. 20(8): 2240-2251 (2012)
- 2011
- [j5]Shinji Watanabe, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, Yasuo Ariki:
Topic tracking language model for speech recognition. Comput. Speech Lang. 25(2): 440-461 (2011)
- 2010
- [j4]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Improved Sequential Dependency Analysis Integrating Labeling-Based Sentence Boundary Detection. IEICE Trans. Inf. Syst. 93-D(5): 1272-1281 (2010)
- 2008
- [j3]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Sequential dependency analysis for online spontaneous speech processing. Speech Commun. 50(7): 616-625 (2008)
- 2007
- [j2]Takaaki Hori, Chiori Hori, Yasuhiro Minami, Atsushi Nakamura:
Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition. IEEE Trans. Speech Audio Process. 15(4): 1352-1365 (2007)
- 2006
- [j1]Atsushi Nakamura, Shinji Watanabe, Takaaki Hori, Erik McDermott, Shigeru Katagiri:
Advanced computational models and learning theories for spoken language processing. IEEE Comput. Intell. Mag. 1(2): 5-9 (2006)
Conference and Workshop Papers
- 2023
- [c108]Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang:
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition. ICASSP 2023: 1-5
- 2022
- [c107]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-Based Supervision. ICASSP 2022: 7212-7216
- [c106]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. ICASSP 2022: 7322-7326
- [c105]Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori:
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy. ICASSP 2022: 7672-7676
- [c104]Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori:
Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning. ICASSP 2022: 7732-7736
- [c103]Chiori Hori, Takaaki Hori, Jonathan Le Roux:
Low-Latency Online Streaming VideoQA Using Audio-Visual Transformers. INTERSPEECH 2022: 4511-4515
- 2021
- [c102]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Capturing Multi-Resolution Context by Dilated Self-Attention. ICASSP 2021: 5869-5873
- [c101]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification. ICASSP 2021: 6548-6552
- [c100]Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training. ICASSP 2021: 6553-6557
- [c99]Chiori Hori, Takaaki Hori, Jonathan Le Roux:
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers. Interspeech 2021: 586-590
- [c98]Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori:
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition. Interspeech 2021: 726-730
- [c97]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition. Interspeech 2021: 1822-1826
- [c96]Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux:
Advanced Long-Context End-to-End Speech Recognition Using Context-Expanded Transformers. Interspeech 2021: 2097-2101
- 2020
- [c95]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Streaming Automatic Speech Recognition with the Transformer Model. ICASSP 2020: 6074-6078
- [c94]Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unsupervised Speaker Adaptation Using Attention-Based Speaker Memory for End-to-End ASR. ICASSP 2020: 7384-7388
- [c93]Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux:
All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection. INTERSPEECH 2020: 3112-3116
- [c92]Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux:
Transformer-Based Long-Context End-to-End Speech Recognition. INTERSPEECH 2020: 5011-5015
- 2019
- [c91]Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto:
A Comparative Study on Transformer vs RNN in Speech Applications. ASRU 2019: 449-456
- [c90]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models. ASRU 2019: 936-943
- [c89]Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata:
CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments*. EUSIPCO 2019: 1-5
- [c88]Chiori Hori, Huda AlAmri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh:
End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features. ICASSP 2019: 2352-2356
- [c87]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký:
Promising Accurate Prefix Boosting for Sequence-to-sequence ASR. ICASSP 2019: 5646-5650
- [c86]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Triggered Attention for End-to-end Speech Recognition. ICASSP 2019: 5666-5670
- [c85]Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba, Najim Dehak:
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition. ICASSP 2019: 6191-6195
- [c84]Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux:
Cycle-consistency Training for End-to-end Speech Recognition. ICASSP 2019: 6271-6275
- [c83]Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Stream Attention-based Multi-array End-to-end Speech Recognition. ICASSP 2019: 7105-7109
- [c82]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition. INTERSPEECH 2019: 76-80
- [c81]Chiori Hori, Anoop Cherian, Tim K. Marks, Takaaki Hori:
Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog. INTERSPEECH 2019: 1886-1890
- [c80]Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Cernocký:
Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems. INTERSPEECH 2019: 2220-2224
- [c79]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
End-to-End Multilingual Multi-Speaker Speech Recognition. INTERSPEECH 2019: 3755-3759
- [c78]Murali Karthick Baskar, Shinji Watanabe, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký:
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text. INTERSPEECH 2019: 3790-3794
- [c77]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux:
Vectorized Beam Search for CTC-Attention-Based Speech Recognition. INTERSPEECH 2019: 3825-3829
- 2018
- [c76]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-End System for Multi-speaker Speech Recognition. ACL (1) 2018: 2620-2630
- [c75]Chiori Hori, Takaaki Hori, Gordon Wichern, Jue Wang, Teng-Yok Lee, Anoop Cherian, Tim K. Marks:
Multimodal Attention for Fusion of Audio and Spatiotemporal Features for Video Description. CVPR Workshops 2018: 2528-2531
- [c74]Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey:
End-to-End Multi-Speaker Speech Recognition. ICASSP 2018: 4819-4823
- [c73]Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech. ICASSP 2018: 4919-4923
- [c72]Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John R. Hershey:
Speaker Adaptation for Multichannel End-to-End Speech Recognition. ICASSP 2018: 6707-6711
- [c71]Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai:
ESPnet: End-to-End Speech Processing Toolkit. INTERSPEECH 2018: 2207-2211
- [c70]Takaaki Hori, Jaejin Cho, Shinji Watanabe:
End-to-end Speech Recognition With Word-Based Rnn Language Models. SLT 2018: 389-396
- [c69]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for end-to-end ASR. SLT 2018: 426-433
- [c68]Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori:
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling. SLT 2018: 521-527
- 2017
- [c67]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Joint CTC/attention decoding for end-to-end speech recognition. ACL (1) 2017: 518-529
- [c66]Shinji Watanabe, Takaaki Hori, John R. Hershey:
Language independent end-to-end architecture for joint language identification and speech recognition. ASRU 2017: 265-271
- [c65]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. ASRU 2017: 287-293
- [c64]Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey:
Early and late integration of audio features for automatic video description. ASRU 2017: 430-436
- [c63]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection. ICASSP 2017: 766-770
- [c62]Suyoun Kim, Takaaki Hori, Shinji Watanabe:
Joint CTC-attention based end-to-end speech recognition using multi-task learning. ICASSP 2017: 4835-4839
- [c61]Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
Student-teacher network learning with enhanced features. ICASSP 2017: 5275-5279
- [c60]Chiori Hori, Takaaki Hori, Teng-Yok Lee, Ziming Zhang, Bret Harsham, John R. Hershey, Tim K. Marks, Kazuhiro Sumi:
Attention-Based Multimodal Fusion for Video Description. ICCV 2017: 4203-4212
- [c59]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. ICML 2017: 2632-2641
- [c58]Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. INTERSPEECH 2017: 949-953
- 2016
- [c57]Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection. DCASE 2016: 35-39
- [c56]Takaaki Hori, Chiori Hori, Shinji Watanabe, John R. Hershey:
Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. ICASSP 2016: 5990-5994
- [c55]Chiori Hori, Shinji Watanabe, Takaaki Hori, Bret A. Harsham, John R. Hershey, Yusuke Koji, Youichi Fujii, Yuki Furumoto:
Driver confusion status detection using recurrent neural networks. ICME 2016: 1-6
- [c54]Chiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey:
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs. INTERSPEECH 2016: 3236-3240
- [c53]Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa:
Dialog state tracking with attention-based sequence-to-sequence learning. SLT 2016: 552-558
- [c52]Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, Takaaki Hori, Kevin Duh:
Automated structure discovery and parameter tuning of neural network language model based on evolution strategy. SLT 2016: 665-671
- 2015
- [c51]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition. ASRU 2015: 475-481
- [c50]Atsunori Ogawa, Takaaki Hori:
ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks. ICASSP 2015: 4370-4374
- [c49]Marc Delcroix, Keisuke Kinoshita, Takaaki Hori, Tomohiro Nakatani:
Context adaptive deep neural networks for fast acoustic model adaptation. ICASSP 2015: 4535-4539
- [c48]Quoc Truong Do, Satoshi Nakamura, Marc Delcroix, Takaaki Hori:
WFST-based structural classification integrating dnn acoustic features and RNN language features for speech recognition. ICASSP 2015: 4959-4963
- [c47]Kazuo Aoyama, Atsunori Ogawa, Takashi Hattori, Takaaki Hori:
Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection. ICASSP 2015: 5216-5220
- [c46]Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi:
Multiscale recurrent neural network based language model. INTERSPEECH 2015: 2366-2370
- 2014
- [c45]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition. GlobalSIP 2014: 522-526
- [c44]Atsunori Ogawa, Keisuke Kinoshita, Takaaki Hori, Tomohiro Nakatani, Atsushi Nakamura:
Fast segment search for corpus-based speech enhancement based on speech recognition technology. ICASSP 2014: 1557-1561
- [c43]Takaaki Hori, Yotaro Kubo, Atsushi Nakamura:
Real-time one-pass decoding with recurrent neural network language model for speech recognition. ICASSP 2014: 6364-6368
- [c42]Kazuo Aoyama, Atsunori Ogawa, Takashi Hattori, Takaaki Hori, Atsushi Nakamura:
Zero-resource spoken term detection using hierarchical graph-based similarity search. ICASSP 2014: 7093-7097
- [c41]Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura:
Restructuring output layers of deep neural networks using minimum risk parameter clustering. INTERSPEECH 2014: 1068-1072
- 2013
- [c40]Atsunori Ogawa, Takaaki Hori, Atsushi Nakamura:
Discriminative recognition rate estimation for N-best list and its application to N-best rescoring. ICASSP 2013: 6832-6836
- [c39]Tomohiro Nakatani, Mehrez Souden, Shoko Araki, Takuya Yoshioka, Takaaki Hori, Atsunori Ogawa:
Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition. ICASSP 2013: 7249-7253
- [c38]Yotaro Kubo, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition based on WFST structured classifiers and deep bottleneck features. ICASSP 2013: 7629-7633
- [c37]Seong-Jun Hahm, Atsunori Ogawa, Marc Delcroix, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:
Feature space variational Bayesian linear regression and its combination with model space VBLR. ICASSP 2013: 7898-7902
- [c36]Kazuo Aoyama, Atsunori Ogawa, Takashi Hattori, Takaaki Hori, Atsushi Nakamura:
Graph index based query-by-example search on a large speech data set. ICASSP 2013: 8520-8524
- [c35]Yotaro Kubo, Takaaki Hori, Atsushi Nakamura:
A method for structure estimation of weighted finite-state transducers and its application to grapheme-to-phoneme conversion. INTERSPEECH 2013: 647-651
- [c34]Takanobu Oba, Atsunori Ogawa, Takaaki Hori, Hirokazu Masataki, Atsushi Nakamura:
Unsupervised discriminative language modeling using error rate estimator. INTERSPEECH 2013: 1223-1227
- 2012
- [c33]Shinji Watanabe, Yotaro Kubo, Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Bag Of ARCS: New representation of speech segment features based on finite state machines. ICASSP 2012: 4201-4204
- [c32]Atsunori Ogawa, Takaaki Hori, Atsushi Nakamura:
Error type classification and word accuracy estimation using alignment features from word confusion network. ICASSP 2012: 4925-4928
- [c31]Ekapol Chuangsuwanich, Shinji Watanabe, Takaaki Hori, Tomoharu Iwata, James R. Glass:
Handling uncertain observations in unsupervised topic-mixture language model adaptation. ICASSP 2012: 5033-5036
- [c30]Takanobu Oba, Takaaki Hori, Atsushi Nakamura, Akinori Ito:
Spoken document retrieval by discriminative modeling in a high dimensional feature space. ICASSP 2012: 5153-5156
- [c29]Seong-Jun Hahm, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:
Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space. INTERSPEECH 2012: 803-806
- [c28]Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi:
Efficient Beam Width Control to Suppress Excessive Speech Recognition Computation Time Based on Prior Score Range Normalization. INTERSPEECH 2012: 1011-1014
- [c27]Yotaro Kubo, Takaaki Hori, Atsushi Nakamura:
Integrating Deep Neural Networks into Structural Classification Approach based on Weighted Finite-State Transducers. INTERSPEECH 2012: 2594-2597
- [c26]Atsunori Ogawa, Takaaki Hori, Atsushi Nakamura:
Recognition rate estimation based on word alignment network and discriminative error type classification. SLT 2012: 113-118
- [c25]Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi:
Efficient prior and incremental beam width control to suppress excessive speech recognition time based on score range estimation. SLT 2012: 125-130
- 2011
- [c24]Shinji Watanabe, Daichi Mochihashi, Takaaki Hori, Atsushi Nakamura:
Gibbs sampling based Multi-scale Mixture Model for speaker clustering. ICASSP 2011: 4524-4527
- [c23]Takanobu Oba, Takaaki Hori, Akinori Ito, Atsushi Nakamura:
Round-robin duel discriminative language models in one-pass decoding with on-the-fly error correction. ICASSP 2011: 5588-5591
- 2010
- [c22]Shinji Watanabe, Takaaki Hori, Erik McDermott, Atsushi Nakamura:
A discriminative model for continuous speech recognition based on Weighted Finite State Transducers. ICASSP 2010: 4922-4925
- [c21]Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Search error risk minimization in Viterbi beam search for speech recognition. ICASSP 2010: 4934-4937
- [c20]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
A comparative study on methods of Weighted language model training for reranking lvcsr N-best hypotheses. ICASSP 2010: 5126-5129
- [c19]Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. INTERSPEECH 2010: 346-349
- [c18]Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Improvements of search error risk minimization in viterbi beam search for speech recognition. INTERSPEECH 2010: 1962-1965
- [c17]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Round-robin discrimination model for reranking ASR hypotheses. INTERSPEECH 2010: 2446-2449
- [c16]Shinji Watanabe, Tomoharu Iwata, Takaaki Hori, Atsushi Sako, Yasuo Ariki:
Application of topic tracking model to language model adaptation and meeting analysis. SLT 2010: 378-383
- [c15]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Real-time meeting recognition and understanding using distant microphones and omni-directional camera. SLT 2010: 424-429
- 2007
- [c14]Takaaki Hori, I. Lee Hetherington, Timothy J. Hazen, James R. Glass:
Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks. ICASSP (4) 2007: 73-76
- [c13]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition. INTERSPEECH 2007: 1753-1756
- 2006
- [c12]Takaaki Hori, Atsushi Nakamura:
An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech. ICASSP (1) 2006: 973-976
- [c11]Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking. INTERSPEECH 2006
- 2005
- [c10]Mike Schuster, Takaaki Hori:
Efficient Generation of high-order context-dependent Weighted Finite State Transducers for Speech Recognition. ICASSP (1) 2005: 201-204
- [c9]Takaaki Hori, Atsushi Nakamura:
Generalized fast on-the-fly composition algorithm for WFST-based speech recognition. INTERSPEECH 2005: 557-560
- [c8]Mike Schuster, Takaaki Hori, Atsushi Nakamura:
Experiments with probabilistic principal component analysis in LVCSR. INTERSPEECH 2005: 1685-1688
- 2004
- [c7]Takaaki Hori, Chiori Hori, Yasuhiro Minami:
Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous speech recognition. INTERSPEECH 2004: 289-292
- 2003
- [c6]Chiori Hori, Takaaki Hori, Hajime Tsukada, Hideki Isozaki, Yutaka Sasaki, Eisaku Maeda:
Spoken Interactive ODQA System: SPIQA. ACL (Companion) 2003: 153-156
- [c5]Takaaki Hori, Daniel Willett, Yasuhiro Minami:
Language model adaptation using WFST-based speaking-style translation. ICASSP (1) 2003: 228-231
- [c4]Chiori Hori, Takaaki Hori, Hideki Isozaki, Eisaku Maeda, Shigeru Katagiri, Sadaoki Furui:
Deriving disambiguous queries in a spoken interactive ODQA system. ICASSP (1) 2003: 624-627
- [c3]Takaaki Hori, Chiori Hori, Yasuhiro Minami:
Speech summarization using weighted finite-state transducers. INTERSPEECH 2003: 2817-2820
- [c2]Chiori Hori, Takaaki Hori, Sadaoki Furui:
Evaluation method for automatic speech summarization. INTERSPEECH 2003: 2825-2828
- 2001
- [c1]Takaaki Hori, Yoshiaki Noda, Shoichi Matsunaga:
Improved phoneme-history-dependent search for large-vocabulary continuous-speech recognition. INTERSPEECH 2001: 1809-1813
Parts in Books or Collections
- 2017
- [p1]Shinji Watanabe, Takaaki Hori, Yajie Miao, Marc Delcroix, Florian Metze, John R. Hershey:
Toolkits for Robust Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 369-382
Informal and Other Publications
- 2023
- [i39]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023)
- 2022
- [i38]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. CoRR abs/2203.00232 (2022)
- [i37]Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang:
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition. CoRR abs/2211.01438 (2022)
- 2021
- [i36]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Capturing Multi-Resolution Context by Dilated Self-Attention. CoRR abs/2104.02858 (2021)
- [i35]Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux:
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers. CoRR abs/2104.09426 (2021)
- [i34]Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori:
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition. CoRR abs/2106.08922 (2021)
- [i33]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition. CoRR abs/2107.01269 (2021)
- [i32]Chiori Hori, Takaaki Hori, Jonathan Le Roux:
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers. CoRR abs/2108.02147 (2021)
- [i31]Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori:
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy. CoRR abs/2110.04948 (2021)
- [i30]Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori:
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning. CoRR abs/2110.06894 (2021)
- [i29]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-based Supervision. CoRR abs/2111.01272 (2021)
- 2020
- [i28]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Streaming automatic speech recognition with the transformer model. CoRR abs/2001.02674 (2020)
- [i27]Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR. CoRR abs/2002.06165 (2020)
- [i26]Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux:
Multi-Pass Transformer for Machine Translation. CoRR abs/2009.11382 (2020)
- [i25]Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Semi-Supervised Speech Recognition via Graph-based Temporal Classification. CoRR abs/2010.15653 (2020)
- [i24]Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training. CoRR abs/2011.13439 (2020)
- [i23]Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, Yosuke Higuchi, Takaaki Hori, Wen-Chin Huang, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang:
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. CoRR abs/2012.13006 (2020)
- 2019
- [i22]Murali Karthick Baskar, Shinji Watanabe, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký:
Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text. CoRR abs/1905.01152 (2019)
- [i21]Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. CoRR abs/1906.08041 (2019)
- [i20]Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang:
A Comparative Study on Transformer vs RNN in Speech Applications. CoRR abs/1909.06317 (2019)
- 2018
- [i19]Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai:
ESPnet: End-to-End Speech Processing Toolkit. CoRR abs/1804.00015 (2018)
- [i18]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-end System for Multi-speaker Speech Recognition. CoRR abs/1805.05826 (2018)
- [i17]Chiori Hori, Huda AlAmri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh:
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features. CoRR abs/1806.08409 (2018)
- [i16]Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori, Ramón Fernandez Astudillo, Kazuya Takeda:
Back-Translation-Style Data Augmentation for End-to-End ASR. CoRR abs/1807.10893 (2018)
- [i15]Takaaki Hori, Jaejin Cho, Shinji Watanabe:
End-to-end Speech Recognition with Word-based RNN Language Models. CoRR abs/1808.02608 (2018)
- [i14]Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Reddy Mallidi, Nelson Yalta, Martin Karafiát, Shinji Watanabe, Takaaki Hori:
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling. CoRR abs/1810.03459 (2018)
- [i13]Takaaki Hori, Ramón Fernandez Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux:
Cycle-consistency training for end-to-end speech recognition. CoRR abs/1811.01690 (2018)
- [i12]Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesús Villalba, Najim Dehak:
Language model integration based on memory control for sequence to sequence speech recognition. CoRR abs/1811.02162 (2018)
- [i11]Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata:
CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments. CoRR abs/1811.02735 (2018)
- [i10]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Martin Karafiát, Takaaki Hori, Jan Honza Cernocký:
Promising Accurate Prefix Boosting for sequence-to-sequence ASR. CoRR abs/1811.02770 (2018)
- [i9]Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Honza Cernocký:
Analysis of Multilingual Sequence-to-Sequence speech recognition systems. CoRR abs/1811.03451 (2018)
- [i8]Hiroshi Seki, Takaaki Hori, Shinji Watanabe:
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition. CoRR abs/1811.04568 (2018)
- [i7]Ruizhi Li, Xiaofei Wang, Sri Harish Reddy Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Multi-encoder multi-resolution framework for end-to-end speech recognition. CoRR abs/1811.04897 (2018)
- [i6]Xiaofei Wang, Ruizhi Li, Sri Harish Mallidi, Takaaki Hori, Shinji Watanabe, Hynek Hermansky:
Stream attention-based multi-array end-to-end speech recognition. CoRR abs/1811.04903 (2018)
- 2017
- [i5]Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks:
Attention-Based Multimodal Fusion for Video Description. CoRR abs/1701.03126 (2017)
- [i4]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. CoRR abs/1703.04783 (2017)
- [i3]Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. CoRR abs/1706.02737 (2017)
- [i2]Chiori Hori, Takaaki Hori:
End-to-end Conversation Modeling Track in DSTC6. CoRR abs/1706.07440 (2017)
- 2016
- [i1]Suyoun Kim, Takaaki Hori, Shinji Watanabe:
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning. CoRR abs/1609.06773 (2016)