ASRU 2017: Okinawa, Japan
- 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017. IEEE 2017, ISBN 978-1-5090-4788-8
- Emre Yilmaz, Julien van Hout, Horacio Franco: Noise-robust exemplar matching for rescoring query-by-example search. 1-7
- Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani: Learning speaker representation for neural network based multichannel speaker extraction. 8-15
- Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. 16-23
- Anjali Menon, Chanwoo Kim, Umpei Kurokawa, Richard M. Stern: Binaural processing for robust recognition of degraded speech. 24-31
- Shoko Araki, Nobutaka Ono, Keisuke Kinoshita, Marc Delcroix: Meeting recognition with asynchronous distributed microphone array. 32-39
- Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani: Adversarial training for data-driven speech enhancement without parallel corpus. 40-47
- Julien van Hout, Vikramjit Mitra, Horacio Franco, Chris Bartels, Dimitra Vergyri: Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features. 48-54
- Keisuke Nakamura, Randy Gomez: Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array. 55-62
- Hagen Soltau, Hank Liao, Hasim Sak: Reducing the computational complexity for whole word models. 63-68
- Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu: Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. 69-76
- Matthew Gibson, Gary Cook, Puming Zhan: Semi-supervised training strategies for deep neural networks. 77-83
- Jeremy Heng Meng Wong, Mark J. F. Gales: Multi-task ensembles with teacher-student training. 84-90
- Emre Yilmaz, Mitchell McLaren, Henk van den Heuvel, David A. van Leeuwen: Language diarization for semi-supervised bilingual acoustic model training. 91-96
- Xie Chen, X. Liu, Anton Ragni, Y. Wang, Mark J. F. Gales: Future word contexts in neural network language models. 97-103
- Qi Liu, Yanmin Qian, Kai Yu: Future vector enhanced LSTM language model for LVCSR. 104-110
- Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong: Acoustic-to-word model without OOV. 111-117
- Timo Lohrenz, Tim Fingscheidt: Turbo fusion of magnitude and phase information for DNN-based phoneme recognition. 118-125
- Takashi Masuko: Computational cost reduction of long short-term memory based on simultaneous compression of input and hidden state. 126-133
- Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks. 134-140
- Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals: WERD: Using social text spelling variants for evaluating dialectal speech recognition. 141-148
- Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo: Character-based units for unlimited vocabulary continuous speech recognition. 149-156
- Jian Kang, Wei-Qiang Zhang, Jia Liu: Gated convolutional networks based hybrid acoustic models for low resource speech recognition. 157-164
- Shankar Kumar, Michael Nirschl, Daniel Niels Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix X. Yu: Lattice rescoring strategies for long short term memory language models in speech recognition. 165-172
- Zhongdi Qu, Parisa Haghani, Eugene Weinstein, Pedro J. Moreno: Syllable-based acoustic modeling with CTC-SMBR-LSTM. 173-177
- Adnan Haider, Philip C. Woodland: Sequence training of DNN acoustic models with natural gradient. 178-184
- Karan Nathwani, Emmanuel Vincent, Irina Illina: Consistent DNN uncertainty training and decoding for robust ASR. 185-192
- Kanishka Rao, Hasim Sak, Rohit Prabhavalkar: Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. 193-199
- Lahiru Samarakoon, Brian Mak: Unsupervised adaptation of student DNNs learned from teacher RNNs for improved ASR performance. 200-205
- Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, Zhenyao Zhu: Exploring neural transducers for end-to-end speech recognition. 206-213
- Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong: Unsupervised adaptation with domain separation networks for robust speech recognition. 214-221
- Sheng Li, Xugang Lu, Peng Shen, Ryoichi Takashima, Tatsuya Kawahara, Hisashi Kawai: Incremental training and constructing the very deep convolutional residual network acoustic models. 222-227
- David Rybach, Michael Riley, Johan Schalkwyk: On lattice generation for large vocabulary speech recognition. 228-235
- Joanna Rownicka, Steve Renals, Peter Bell: Simplifying very deep convolutional neural network architectures for robust speech recognition. 236-243
- Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy: Language modeling with highway LSTM. 244-251
- Ken'ichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal: Direct modeling of raw audio with DNNs for wake word detection. 252-257
- Khe Chai Sim, Arun Narayanan, Tom Bagby, Tara N. Sainath, Michiel Bacchiani: Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow. 258-264
- Shinji Watanabe, Takaaki Hori, John R. Hershey: Language independent end-to-end architecture for joint language identification and speech recognition. 265-271
- Assaf Hurwitz Michaely, Xuedong Zhang, Gabor Simko, Carolina Parada, Petar S. Aleksic: Keyword spotting for Google Assistant using contextual speech recognition. 272-278
- Pegah Ghahremani, Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur: Investigation of transfer learning for ASR using LF-MMI trained neural networks. 279-286
- Takaaki Hori, Shinji Watanabe, John R. Hershey: Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. 287-293
- Bin Wang, Zhijian Ou: Language modeling with neural trans-dimensional random fields. 294-300
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Listening while speaking: Speech chain by deep learning. 301-308
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Attention-based Wav2Text with feature transfer learning. 309-315
- Ahmed Ali, Stephan Vogel, Steve Renals: Speech recognition challenge in the wild: Arabic MGB-3. 316-322
- Ewan Dunbar, Xuan-Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux: The zero resource speech challenge 2017. 323-330
- Kei Sawada, Keiichi Tokuda, Simon King, Alan W. Black: The Blizzard machine learning challenge 2017. 331-337
- Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo: Aalto system for the 2017 Arabic multi-genre broadcast challenge. 338-345
- Vimal Manohar, Daniel Povey, Sanjeev Khudanpur: JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning. 346-352
- Maryam Najafian, Wei-Ning Hsu, Ahmed Ali, James R. Glass: Automatic speech recognition of Arabic multi-genre broadcast media. 353-359
- Ahmet Emin Bulut, Qian Zhang, Chunlei Zhang, Fahimeh Bahmaninezhad, John H. L. Hansen: UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech. 360-367
- Karel Veselý, Murali Karthick Baskar, Mireia Díez, Karel Benes: MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data. 368-373
- Suwon Shon, Ahmed Ali, James R. Glass: MIT-QCRI Arabic dialect identification system for the 2017 multi-genre broadcast challenge. 374-380
- Shun-Po Chuang, Chia-Hung Wan, Pang-Chi Huang, Chi-Yu Yang, Hung-yi Lee: Seeing and hearing too: Audio representation for video captioning. 381-388
- Bowen Shi, Karen Livescu: Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition. 389-396
- Andrey Malinin, Kate M. Knill, Mark J. F. Gales: A hierarchical attention based model for off-topic spontaneous spoken response detection. 397-403
- Youssef Oualil, Dietrich Klakow, György Szaszák, Ajay Srinivasamurthy, Hartmut Helmke, Petr Motlícek: A context-aware speech recognition and understanding system for air traffic control domain. 404-408
- Tuka Alhanai, Rhoda Au, James R. Glass: Spoken language biomarkers for detecting cognitive impairment. 409-416
- Markus Müller, Sebastian Stüker, Alex Waibel: DBLSTM based multilingual articulatory feature extraction for language documentation. 417-423
- Kenneth Leidal, David Harwath, James R. Glass: Learning modality-invariant representations for speech and images. 424-429
- Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey: Early and late integration of audio features for automatic video description. 430-436
- Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong: Cracking the cocktail party problem by multi-beam deep attractor network. 437-444
- Hoon Chung, Yun-Kyung Lee, Jeon Gue Park: Ground truth estimation of spoken English fluency score using decorrelation penalized low-rank matrix factorization. 445-449
- Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain: Exploring the use of acoustic embeddings in neural machine translation. 450-457
- Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier: Unwritten languages demand attention too! Word discovery with encoder-decoder models. 458-465
- Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen: Neural relevance-aware query modeling for spoken document retrieval. 466-473
- Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw: Streaming small-footprint keyword spotting using sequence-to-sequence models. 474-481
- Bing Liu, Ian R. Lane: Iterative policy learning in end-to-end trainable task-oriented neural dialog models. 482-489
- Miroslav Vodolán, Filip Jurcícek: Denotation extraction for interactive learning in dialogue systems. 490-496
- Pin-Jung Chen, I-Hung Hsu, Yi Yao Huang, Hung-yi Lee: Mitigating the impact of speech recognition errors on chatbot using sequence-to-sequence model. 497-503
- Titouan Parcollet, Mohamed Morchid, Georges Linarès: Deep quaternion neural networks for spoken language understanding. 504-511
- Imran A. Sheikh, Dominique Fohr, Irina Illina: Topic segmentation in ASR transcripts using bidirectional RNNs for change detection. 512-518
- Komei Sugiura, Hisashi Kawai: Grounded language understanding for manipulation instructions using GAN-based classification. 519-524
- Emiru Tsunoo, Ondrej Klejch, Peter Bell, Steve Renals: Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. 525-532
- Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee: Personalized word representations carrying personalized semantics learned from social network posts. 533-540
- Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya: Speaker-sensitive dual memory networks for multi-turn slot tagging. 541-546
- Young-Bum Kim, Sungjin Lee, Karl Stratos: ONENET: Joint domain, intent, slot prediction for spoken language understanding. 547-553
- Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen: Dynamic time-aware attention to speaker roles and contexts for spoken language understanding. 554-560
- Abhinav Rastogi, Dilek Hakkani-Tür, Larry P. Heck: Scalable multi-domain dialogue state tracking. 561-568
- Yao Qian, Rutuja Ubale, Vikram Ramanarayanan, Patrick L. Lange, David Suendermann-Oeft, Keelan Evanini, Eugene Tsuprun: Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system. 569-576
- Ning Gao, Gregory Sell, Douglas W. Oard, Mark Dredze: Leveraging side information for speaker identification with the Enron conversational telephone speech collection. 577-583
- Chunlei Zhang, Kazuhito Koishida: End-to-end text-independent speaker verification with flexibility in utterance duration. 584-590
- Lea Schonherr, Steffen Zeiler, Dorothea Kolossa: Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 591-598
- Jen-Tzung Chien, Kang-Ting Peng: Adversarial manifold learning for speaker recognition. 599-605
- Yao Qian, Keelan Evanini, Patrick L. Lange, Robert A. Pugh, Rutuja Ubale, Frank K. Soong: Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora. 606-613
- Ziqiang Shi, Liu Liu, Mengjiao Wang, Rujie Liu: Multi-view (Joint) probability linear discrimination analysis for J-vector based text dependent speaker verification. 614-620
- Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy: Leveraging native language speech for accent identification using deep Siamese networks. 621-628
- Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, Michael T. Johnson: Comparison of multiple features and modeling methods for text-dependent speaker verification. 629-636
- Rachel Rakov, Andrew Rosenberg: Investigating native and non-native English classification and transfer effects using Legendre polynomial coefficient clustering. 637-643
- Pallavi Baljekar, Sai Krishna Rallabandi, Alan W. Black: The CMU entry to Blizzard machine learning challenge. 644-649
- Ya-Jun Hu, Li-Juan Liu, Chuang Ding, Zhen-Hua Ling, Li-Rong Dai: The USTC system for Blizzard machine learning challenge 2017-ES2. 650-656
- Li-Juan Liu, Chuang Ding, Ya-Jun Hu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Si Wei: The iFLYTEK system for Blizzard machine learning challenge 2017-ES1. 657-664
- Axel H. Ng, Kyle Gorman, Richard Sproat: Minimally supervised written-to-spoken text normalization. 665-670
- Eunwoo Song, Frank K. Soong, Hong-Goo Kang: Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems. 671-676
- Berrak Sisman, Haizhou Li, Kay Chen Tan: Sparse representation of phonetic features for voice conversion with and without parallel data. 677-684
- Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li: Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. 685-691
- Kévin Vythelingum, Yannick Estève, Olivier Rosec: Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context. 692-697
- Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai: Subband WaveNet with overlapped single-sideband filterbanks. 698-704
- Moquan Wan, Gilles Degottex, Mark J. F. Gales: Integrated speaker-adaptive speech synthesis. 705-711
- Tomoki Hayashi, Akira Tamamori, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda: An investigation of multi-speaker training for WaveNet vocoder. 712-718
- Herman Kamper, Karen Livescu, Sharon Goldwater: An embedded segmental K-means model for unsupervised segmentation and clustering of speech. 719-726
- Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li: Multilingual bottle-neck feature learning from untranscribed speech. 727-733
- Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li: Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation. 734-739
- Michael Heck, Sakriani Sakti, Satoshi Nakamura: Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to ZeroSpeech 2017. 740-746
- Hayato Shibata, Taku Kato, Takahiro Shinozaki, Shinji Watanabe: Composite embedding systems for ZeroSpeech2017 Track1. 747-753
- T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy: Deep learning methods for unsupervised acoustic modeling - Leap submission to ZeroSpeech challenge 2017. 754-761
- T. K. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy, V. Susheela Devi: Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions. 762-768