INTERSPEECH 2006: Pittsburgh, PA, USA
- Ninth International Conference on Spoken Language Processing (INTERSPEECH-ICSLP 2006), Pittsburgh, PA, USA, September 17-21, 2006. ISCA 2006
Language Modeling for Spoken Dialog Systems
- Matthew Purver, Florin Ratiu, Lawrence Cavedon:
Robust interpretation in dialogue by combining confidence scores with contextual features. - Hui Ye, Steve J. Young:
A clustering approach to semantic decoding. - Teruhisa Misu, Tatsuya Kawahara:
A bootstrapping approach for developing language model of new spoken dialogue systems by selecting web texts. - Axel Horndasch, Elmar Nöth, Anton Batliner, Volker Warnke:
Phoneme-to-grapheme mapping for spoken inquiries to the semantic web. - Karl Weilhammer, Matthew N. Stuttle, Steve J. Young:
Bootstrapping language models for dialogue systems. - Junlan Feng:
Question answering with discriminative learning algorithms.
Feature Enhancement for Robust ASR
- Patrick Kenny, Vishwa Gupta, Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel:
Feature normalization using smoothed mixture transformations. - Chia-Hsin Hsieh, Chung-Hsien Wu, Jun-Yu Lin:
Stochastic vector mapping-based feature enhancement using prior model and environment adaptation for noisy speech recognition. - Babak Nasersharif, Ahmad Akbari:
A framework for robust MFCC feature extraction using SNR-dependent compression of enhanced mel filter bank energies. - Friedrich Faubel, Matthias Wölfel:
Coupling particle filters with automatic speech recognition for speech feature enhancement. - Chang-Wen Hsu, Lin-Shan Lee:
Extension and further analysis of higher order cepstral moment normalization (HOCMN) for robust features in speech recognition. - Md. Babul Islam, Hiroshi Matsumoto, Kazumasa Yamamoto:
An improved mel-wiener filter for mel-LPC based speech recognition.
Dialog and Discourse
- Lluís F. Hurtado, David Griol, Encarna Segarra, Emilio Sanchis:
A stochastic approach for dialog management based on neural networks. - Mihai Rotaru, Diane J. Litman:
Discourse structure and speech recognition problems. - Satanjeev Banerjee, Alexander I. Rudnicky:
A texttiling based approach to topic boundary detection in meetings. - Stefan Schulz, Hilko Donker:
A user-centered development of an intuitive dialog control for speech-controlled music selection in cars. - Antoine Raux, Dan Bohus, Brian Langner, Alan W. Black, Maxine Eskénazi:
Doing research on a deployed spoken dialogue system: one year of let's go! experience. - Jackson Liscombe, Jennifer J. Venditti, Julia Hirschberg:
Detecting question-bearing turns in spoken tutorial dialogues.
The Speech Separation Challenge
- Soundararajan Srinivasan, Yang Shao, Zhaozhang Jin, DeLiang Wang:
A computational auditory scene analysis system for robust speech recognition. - Runqiang Han, Pei Zhao, Qin Gao, Zhiping Zhang, Hao Wu, Xihong Wu:
CASA based speech separation for robust speech recognition. - Mark R. Every, Philip J. B. Jackson:
Enhancement of harmonic content of speech based on a dynamic programming pitch tracking algorithm. - Jon Barker, André Coy, Ning Ma, Martin Cooke:
Recent advances in speech fragment decoding techniques. - Tuomas Virtanen:
Speech recognition using factorial hidden Markov models for separation in the feature space. - Ji Ming, Timothy J. Hazen, James R. Glass:
Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation. - Trausti T. Kristjansson, John R. Hershey, Peder A. Olsen, Steven J. Rennie, Ramesh A. Gopinath:
Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system. - Om Deshmukh, Carol Y. Espy-Wilson:
Modified phase opponency based solution to the speech separation challenge.
Multilingual and Multi-Accent Processing
- Jonas Lööf, Maximilian Bisani, Christian Gollan, Georg Heigold, Björn Hoffmeister, Christian Plahl, Ralf Schlüter, Hermann Ney:
The 2006 RWTH parliamentary speeches transcription system. - Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean Paul Haton:
Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints. - Joyce Y. C. Chan, P. C. Ching, Tan Lee, Houwei Cao:
Automatic speech recognition of Cantonese-English code-mixing utterances. - M. Zimmerman, Dilek Hakkani-Tür, James G. Fung, Nikki Mirghafori, Luke R. Gottlieb, Elizabeth Shriberg, Yang Liu:
The ICSI+ multilingual sentence segmentation system. - Yan Ming Cheng, Changxue Ma, Lynette Melnar:
Cross-language evaluation of voice-to-phoneme conversions for voice-tag application in embedded platforms. - Huanliang Wang, Yao Qian, Frank K. Soong, Jian-Lai Zhou, Jiqing Han:
A multi-space distribution (MSD) approach to speech recognition of tonal languages. - Viet Bac Le, Laurent Besacier:
Comparison of acoustic modeling techniques for Vietnamese and Khmer ASR. - Yi Liu, Pascale Fung:
Multi-accent Chinese speech recognition. - Seyed Ghorshi, Saeed Vaseghi, Qin Yan:
Comparative analysis of formants of British, American and Australian accents. - Linquan Liu, Thomas Fang Zheng, Wenhu Wu:
Automatic initial/final generation for dialectal Chinese speech recognition. - Ruhi Sarikaya, Ossama Emam, Imed Zitouni, Yuqing Gao:
Maximum entropy modeling for diacritization of Arabic text. - Slavomír Lihan, Jozef Juhár, Anton Cizmar:
Comparison of Slovak and Czech speech recognition based on grapheme and phoneme acoustic models.
Corpora, Annotation, and Assessment Metrics I, II
- Rhys James Jones, Ambrose Choy, Briony Williams:
Integrating Festival and Windows. - Cosmin Munteanu, Gerald Penn, Ronald Baecker, Elaine G. Toms, David James:
Measuring the acceptable word error rate of machine-generated webcast transcripts. - Goshu Nagino, Makoto Shozakai:
Analyzing reusability of speech corpus based on statistical multidimensional scaling method. - Susan Fitt, Korin Richmond:
Redundancy and productivity in the speech technology lexicon - can we do better? - Takeshi Yamada, Masakazu Kumakura, Nobuhiko Kitawaki:
Word intelligibility estimation of noise-reduced speech. - Christoph Draxler:
Exploring the unknown - collecting 1000 speakers over the internet for the ph@ttsessionz database of adolescent speakers. - Timothy Murphy, Dorel Picovici, Abdulhussain E. Mahdi:
A new single-ended measure for assessment of speech quality. - Ailbhe Ní Chasaide, John Wogan, Brian Ó Raghallaigh, Áine Ní Bhriain, Eric Zoerner, Harald Berthelsen, Christer Gobl:
Speech technology for minority languages: the case of Irish (Gaelic). - Francisco José Fraga, Carlos Alberto Ynoguti, André Godoi Chiovato:
Further investigations on the relationship between objective measures of speech quality and speech recognition rates in noisy environments. - Volodya Grancharov, David Yuheng Zhao, Jonas Lindblom, W. Bastiaan Kleijn:
Non-intrusive speech quality assessment with low computational complexity. - Min-Siong Liang, Ren-Yuan Lyu, Yuang-Chin Chiang:
Using speech recognition technique for constructing a phonetically transcribed Taiwanese (Min-Nan) text corpus. - Andrej Zgank, Tomaz Rotovnik, Matej Grasic, Marko Kos, Damjan Vlaj, Zdravko Kacic:
SloParl - Slovenian parliamentary speech and text corpus for large vocabulary continuous speech recognition. - Siew Leng Toh, Fan Yang, Peter A. Heeman:
An annotation scheme for agreement analysis. - Hitoshi Aoki, Atsuko Kurashima, Akira Takahashi:
Conversational quality estimation model for wideband IP-telephony services. - Kelley Kilanski, Jonathan Malkin, Xiao Li, Richard Wright, Jeff A. Bilmes:
The vocal joystick data collection effort and vowel corpus. - Dmitry Sityaev, Katherine M. Knill, Tina Burrows:
Comparison of the ITU-T P.85 standard to other methods for the evaluation of text-to-speech systems. - Peter A. Heeman, Andy McMillin, J. Scott Yaruss:
An annotation scheme for complex disfluencies. - Christophe Van Bael, Lou Boves, Henk van den Heuvel, Helmer Strik:
Automatic phonetic transcription of large speech corpora: a comparative study. - Yongmei Shi, Lina Zhou:
Examining knowledge sources for human error correction.
Speech Coding
- Joon-Hyuk Chang, Woohyung Lim, Nam Soo Kim:
Signal modification incorporating perceptual weighting filter. - Jani Nurminen:
Enhanced dynamic codebook reordering for advanced quantizer structures. - Chang-Heon Lee, Sung-Kyo Jung, Thomas Eriksson, Won-Suk Jun, Hong-Goo Kang:
An efficient segment-based speech compression technique for hand-held TTS systems. - V. Ramasubramanian, D. Harish:
A unified unit-selection framework for ultra low bit-rate speech coding. - Jes Thyssen, Juin-Hwey Chen:
Efficient VQ techniques and general noise shaping in noise feedback coding. - Yasheng Qian, Wei-Shou Hsu, Peter Kabal:
Classified comfort noise generation for efficient voice transmission. - Balázs Kövesi, Dominique Massaloux, David Virette, Julien Bensa:
Integration of a CELP coder in the ARDOR universal sound codec. - Saikat Chatterjee, T. V. Sreenivas:
Two stage transform vector quantization of LSFs for wideband speech coding. - Saikat Chatterjee, T. V. Sreenivas:
Comparison of prediction based LSF quantization methods using split VQ. - Konrad Hofbauer, Gernot Kubin:
High-rate data embedding in unvoiced speech. - Kyle D. Anderson, Philippe Gournay:
Pitch resynchronization while recovering from a late frame in a predictive speech decoder.
Speech Enhancement I, II
- Suhadi Suhadi, Sorel Stan, Tim Fingscheidt:
A novel environment-dependent speech enhancement method with optimized memory footprint. - Esfandiar Zavarehei, Saeed Vaseghi, Qin Yan:
Weighted codebook mapping for noisy speech enhancement using harmonic-noise model. - Jesper Jensen, Richard C. Hendriks, Jan S. Erkelens, Richard Heusdens:
MMSE estimation of complex-valued discrete Fourier coefficients with generalized gamma priors. - Amarnag Subramanya, Michael L. Seltzer, Alex Acero:
Automatic removal of typed keystrokes from speech signals. - Erhard Rank, Gernot Kubin:
Lattice LP filtering for noise reduction in speech signals. - Om Deshmukh, Carol Y. Espy-Wilson:
Speech enhancement using modified phase opponency model. - Wen Jin, Michael S. Scordilis:
Single channel speech enhancement by frequency domain constrained optimization and temporal masking. - Jong Won Shin, Seung Yeol Lee, Hwan Sik Yun, Nam Soo Kim:
Speech enhancement based on residual noise shaping. - Hannu Pulakka, Laura Laaksonen, Paavo Alku:
Quality improvement of telephone speech by artificial bandwidth expansion - listening tests in three languages. - Benjamin J. Shannon, Kuldip K. Paliwal:
Role of phase estimation in speech enhancement. - Benjamin J. Shannon, Kuldip K. Paliwal, Climent Nadeu:
Speech enhancement based on spectral estimation from higher-lag autocorrelation. - Nitish Krishnamurthy, John H. L. Hansen:
Noise update modeling for speech enhancement: when do we do enough? - A. Shahina, B. Yegnanarayana:
Mapping neural networks for bandwidth extension of narrowband speech. - Amit Das, John H. L. Hansen:
Decision directed constrained iterative speech enhancement. - Takahiro Murakami, Yoshihisa Ishida:
Adaptive filtering for attenuating musical noise caused by spectral subtraction. - Yi Hu, Philipos C. Loizou:
Evaluation of objective measures for speech enhancement. - Myung-Suk Song, Chang-Heon Lee, Hong-Goo Kang:
Performance analysis of various single channel speech enhancement algorithms for automatic speech recognition.
ASR Other I, II
- Gilles Boulianne, Jean-Francois Beaumont, Maryse Boisvert, Julie Brousseau, Patrick Cardinal, Claude Chapdelaine, Michel Comeau, Pierre Ouellet, Frédéric Osterrath:
Computer-assisted closed-captioning of live TV broadcasts in French. - Mohamed Afify, Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Laurent Besacier, Yuqing Gao:
On the use of morphological analysis for dialectal Arabic speech recognition. - Isabel Trancoso, Ricardo Nunes, Luís Neves, Céu Viana, Helena Moniz, Diamantino Caseiro, Ana Isabel Mata:
Recognition of classroom lectures in European Portuguese. - Thomas Pellegrini, Lori Lamel:
Investigating automatic decomposition for ASR in less represented languages. - Abdillahi Nimaan, Pascal Nocera, Jean-François Bonastre:
Automatic transcription of Somali language. - Özgür Çetin, Elizabeth Shriberg:
Analysis of overlaps in meetings by dialog factors, hot spots, speakers, and collection site: insights for automatic speech recognition. - Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation. - Wooil Kim, John H. L. Hansen:
Missing-feature reconstruction for band-limited speech recognition in spoken document retrieval. - Hahn Koo, Yan Ming Cheng:
Incremental learning of MAP context-dependent edit operations for spoken phone number recognition in an embedded platform. - Yasunari Obuchi, Nobuo Hataoka:
Development and evaluation of speech database in automotive environments for practical speech recognition systems. - Dong Yu, Yun-Cheng Ju, Alex Acero:
An effective and efficient utterance verification technology using word n-gram filler models. - J. M. Górriz, Javier Ramírez, Carlos García Puntonet, José C. Segura:
An efficient bispectrum phase entropy-based algorithm for VAD. - Petr Cerva, Jan Nouza, Jan Silovský:
Two-step unsupervised speaker adaptation based on speaker and gender recognition and HMM combination. - Satoshi Nakamura, Masakiyo Fujimoto, Kazuya Takeda:
CENSREC2: corpus and evaluation environments for in car continuous digit speech recognition. - Cheng-Tao Chu, Yun-Hsuan Sung, Yuan Zhao, Daniel Jurafsky:
Detection of word fragments in Mandarin telephone conversation. - Qiang Huo, Wei Li:
A DTW-based dissimilarity measure for left-to-right hidden Markov models and its application to word confusability analysis. - Angel M. Gomez, Juan J. Ramos-Muñoz, Antonio M. Peinado, Victoria E. Sánchez:
Multi-flow block interleaving applied to distributed speech recognition over IP networks. - Edward C. Lin, Kai Yu, Rob A. Rutenbar, Tsuhan Chen:
Moving speech recognition from software to silicon: the in silico vox project. - Chengyuan Ma, Yu Tsao, Chin-Hui Lee:
A study on detection based automatic speech recognition. - Rahul Chitturi, Mark Hasegawa-Johnson:
Novel time domain multi-class SVMs for landmark detection.
Modeling Prosodic Features
- Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan:
Combining acoustic, lexical, and syntactic evidence for automatic unsupervised prosody labeling. - Andrew Rosenberg, Julia Hirschberg:
On the correlation between energy and pitch accent in read English speech. - Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu:
Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses. - Tomás Dubeda:
Prosodic boundaries in Czech: an experiment based on delexicalized speech. - Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao:
Totally data-driven intonation prediction model using a novel F0 contour parametric representation. - Laura Dilley, Mara Breen, Marti Bolivar, John Kraemer, Edward Gibson:
A comparison of inter-transcriber reliability for two systems of prosodic annotation: RaP (rhythm and pitch) and ToBI (tones and break indices).
Spoken Information Retrieval
- Issac Alphonso, Shuangyu Chang:
Saliency parsing for automated directory assistance. - Kohei Iwata, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
Open-vocabulary spoken document retrieval based on new subword models and subword phonetic similarity. - Xiang Li, Ea-Ee Jan, Cheng Wu, David M. Lubensky:
Improved topic classification over maximum entropy model using k-norm based new objectives. - Yi-Cheng Pan, Jia-Yu Chen, Yen-shin Lee, Yi-Sheng Fu, Lin-Shan Lee:
Efficient interactive retrieval of spoken documents with key terms ranked by reinforcement learning. - Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki:
Discriminative named entity recognition of speech data using speech recognition confidence. - Ville T. Turunen, Mikko Kurimo:
Using latent semantic indexing for morph-based spoken document retrieval.
Front-End Methods for ASR
- Ralf Schlüter, András Zolnay, Hermann Ney:
Feature combination using linear discriminant analysis and its pitfalls. - Fabio Valente, Hynek Hermansky:
Discriminant linear processing of time-frequency plane. - Esmeralda Uraga, Thomas Hain:
Automatic speech recognition experiments with articulatory data. - Frederik Stouten, Jean-Pierre Martens:
Speech recognition with phonological features: some issues to attend. - Matthias Wölfel, Christian Fügen, Shajith Ikbal, John W. McDonough:
Multi-source far-distance microphone selection and combination for automatic transcription of lectures. - Colin Breithaupt, Rainer Martin:
Statistical analysis and performance of DFT domain noise reduction filters for robust speech recognition. - Luz García, José C. Segura, M. Carmen Benítez, Javier Ramírez, Ángel de la Torre:
Normalization of the inter-frame information using smoothing filtering. - Muhammad Ghulam, Junsei Horikawa, Tsuneo Nitta:
Comparative study on contributions of pitch-synchronization and peak-amplitude towards robustness issue of ASR. - Yasuo Ariki, Shunsuke Kato, Tetsuya Takiguchi:
Phoneme recognition based on fisher weight map to higher-order local auto-correlation. - Hynek Boril, Petr Fousek, Petr Pollák:
Data-driven design of front-end filter bank for Lombard speech recognition. - Andrej Ljolje:
Optimization of class weights for LDA feature transformations. - Janne Pylkkönen:
LDA based feature estimation methods for LVCSR. - Gholamreza Farahani, Seyed Mohammad Ahadi, Mohammad Mehdi Homayounpour:
Robust feature extraction based on spectral peaks of group delay and autocorrelation function and phase domain analysis. - Sankaran Panchapagesan:
Frequency warping by linear transformation of standard MFCC.
Language and Dialect Recognition
- Ana Lilia Reyes-Herrera, Luis Villaseñor Pineda, Manuel Montes-y-Gómez:
Automatic language identification using wavelets. - Josef G. Bauer, Ekaterina Timoshenko:
Minimum classification error training of hidden Markov models for acoustic language identification. - Ekaterina Timoshenko, Josef G. Bauer:
Unsupervised adaptation for acoustic language identification. - S. V. Basavaraja, T. V. Sreenivas:
Low complexity LID using pruned pattern tables of LZW. - Xi Yang, Lu-Feng Zhai, Man-Hung Siu, Herbert Gish:
Improved language identification using support vector machines for language modeling. - Chi-Yueh Lin, Hsiao-Chuan Wang:
Fusion of phonotactic and prosodic knowledge for language identification. - Haizhou Li, Bin Ma, Rong Tong:
Vector-based spoken language recognition using output coding. - Víctor G. Guijarrubia, M. Inés Torres:
Basque-Spanish language identification using phone-based methods. - Ayako Ikeno, John H. L. Hansen:
The role of prosody in the perception of US native English accents. - Bianca Vieru-Dimulescu, Philippe Boula de Mareüil:
Perceptual identification and phonetic analysis of 6 foreign accents in French. - Rongqing Huang, John H. L. Hansen:
Unsupervised Spanish dialect classification.
Spoken Dialog Systems I, II
- Petra Gieselmann, Alex Waibel:
Dynamic extension of a grammar-based dialogue system: constructing an all-recipes knowing robot. - Alexander Gruenstein, Stephanie Seneff, Chao Wang:
Scalable and portable web-based multimodal dialogue interaction with geographical databases. - Chantal Ackermann, Marion Libossek:
System- versus user-initiative dialog strategy for driver information systems. - Filip Krsmanovic, Curtis Spencer, Daniel Jurafsky, Andrew Y. Ng:
Have we met? MDP based speaker ID for robot dialogue. - Rob van Son, Wieneke Wesseling, Louis C. W. Pols:
Prominent words as anchors for TRP projection. - Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, Hiroshi Shimodaira:
Learning multi-goal dialogue strategies using reinforcement learning with reduced state-action spaces. - Jörg Mayer, Ekaterina Jasinskaja, Ulrike Kölsch:
Pitch range and pause duration as markers of discourse hierarchy: perception experiments. - Antonio Roque, Anton Leuski, Vivek Kumar Rangarajan Sridhar, Susan Robinson, Ashish Vaswani, Shrikanth S. Narayanan, David R. Traum:
Radiobot-CFF: a spoken dialogue system for military training. - Shinya Yamada, Toshihiko Itoh, Kenji Araki:
Is voice quality enough? - study on how the situation and user's awareness influence the utterance features. - Jozef Juhár, Stanislav Ondás, Anton Cizmar, Milan Rusko, Gregor Rozinaj, Roman Jarina:
Development of Slovak GALAXY/VoiceXML based spoken language dialogue system to retrieve information from the internet. - Lars Degerstedt, Arne Jönsson:
LINTest: a development tool for testing dialogue systems. - Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino:
A user simulator based on VoiceXML for evaluation of spoken dialog systems. - Kristiina Jokinen, Topi Hurtig:
User expectations and real experience on a multimodal interactive system. - Felix Burkhardt, Jitendra Ajmera, Roman Englert, Joachim Stegmann, Winslow Burleson:
Detecting anger in automated voice portal dialogs. - Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen:
Evaluation of a spoken dialogue system with usability tests and long-term pilot studies: similarities and differences. - Fuliang Weng, Sebastian Varges, Badri Raghunathan, Florin Ratiu, Heather Pon-Barry, Brian Lathrop, Qi Zhang, Harry Bratt, Tobias Scheideck, Kui Xu, Matthew Purver, Rohit Mishra, Annie Lien, Madhuri Raya, Stanley Peters, Yao Meng, J. Russell, Lawrence Cavedon, Elizabeth Shriberg, Hauke Schmidt, R. Prieto:
CHAT: a conversational helper for automotive tasks. - Kallirroi Georgila, James Henderson, Oliver Lemon:
User simulation for spoken dialogue systems: learning and evaluation.
Speaker Characterization and Recognition I-IV
- Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, Ruei-Chuan Chang:
Improving the characterization of the alternative hypothesis via kernel discriminant analysis for likelihood ratio-based speaker verification. - Zhenchun Lei, Yingchun Yang, Zhaohui Wu:
A discriminative method for speaker verification using the difference information. - Nicolas Scheffer, Jean-François Bonastre:
A multiclass framework for speaker verification within an acoustic event sequence system. - Bin Ma, Donglai Zhu, Rong Tong, Haizhou Li:
Speaker cluster based GMM tokenization for speaker recognition. - Claudio Garretón, Néstor Becerra Yoma, Carlos Molina, Fernando Huenupán:
Intra-speaker variability compensation in speaker verification with limited enrolling data. - Girija Chetty, Michael Wagner:
Speaking faces for face-voice speaker identity verification. - Kishore Prahallad, Varanasi Sudhakar, Veluru Ranganatham, Krishna M. Bharat, S. Roy Debashish:
Significance of formants from difference spectrum for speaker identification. - Maider Zamalloa, Germán Bordel, Luis Javier Rodríguez, Mikel Peñagarikano, Juan Pedro Uribe:
Using genetic algorithms to weight acoustic features for speaker recognition. - Michael T. Padilla, Thomas F. Quatieri, Douglas A. Reynolds:
Missing feature theory with soft spectral subtraction for speaker verification. - Leena Mary, B. Yegnanarayana:
Prosodic features for speaker verification. - Ming Liu, Thomas S. Huang:
Unsupervised learning of HMM topology for text-dependent speaker verification. - Jan Anguita, Javier Hernando:
On the use of Jacobian adaptation in real speaker verification applications. - Ming Liu, Huazhong Ning, Thomas S. Huang, Zhengyou Zhang:
A novel framework of text-independent speaker verification based on utterance transform and iterative cohort modeling. - Vinod Prakash, John H. L. Hansen:
A cohort - UBM approach to mitigate data sparseness for in-set/out-of-set speaker recognition. - Vaishnevi S. Varadarajan, John H. L. Hansen:
Analysis of lombard effect under different types and levels of noise with application to in-set speaker ID systems. - Alan McCree:
Reducing speech coding distortion for speaker identification. - Tsuneo Kato, Hisashi Kawai:
A text-prompted distributed speaker verification system implemented on a cellular phone and a mobile terminal. - Srikanth Vishnubhotla, Carol Y. Espy-Wilson:
Automatic detection of irregular phonation in continuous speech. - V. Ramasubramanian, Deepak Vijaywargiay, Kumar V. Praveen:
Highly noise robust text-dependent speaker recognition based on hypothesized wiener filtering. - Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting. - Andreas Stergiou, Aristodemos Pnevmatikakis, Lazaros C. Polymenakos:
Enhancing the performance of a GMM-based speaker identification system in a multi-microphone setup. - Chris Longworth, Mark J. F. Gales:
Discriminative adaptation for speaker verification. - Andrew O. Hatch, Sachin S. Kajarekar, Andreas Stolcke:
Within-class covariance normalization for SVM-based speaker recognition. - Carol Y. Espy-Wilson, Sandeep Manocha, Srikanth Vishnubhotla:
A new set of features for text-independent speaker identification. - Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt:
Detection of a third speaker in telephone conversations. - Konstantin Biatov, Joachim Köhler:
Improvement speaker clustering using global similarity features. - Narayanaswamy Balakrishnan, Rashmi Gangadharaiah, Richard M. Stern:
Voting for two speaker segmentation. - Alexandre Preti, Jean-François Bonastre:
Unsupervised model adaptation for speaker verification. - Rong Zheng, Shuwu Zhang, Bo Xu:
A quality measure method using Gaussian mixture models and divergence measure for speaker identification. - Yushi Zhang, Waleed H. Abdulla:
Gammatone auditory filterbank and independent component analysis for speaker identification. - Wei Wu, Thomas Fang Zheng, Ming-Xing Xu, Huanjun Bao:
Study on speaker verification on emotional speech. - M. Farrús, Ainara Garde, Pascual Ejarque, Jordi Luque, Javier Hernando:
On the fusion of prosody, voice spectrum and face features for multimodal person verification. - Tarun Pruthi, Carol Y. Espy-Wilson:
An MRI based study of the acoustic effects of sinus cavities and its application to speaker recognition. - Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker verification with non-audible murmur segments. - Christian A. Müller:
Automatic recognition of speakers' age and gender on the basis of empirical studies. - E. J. S. Fox, J. D. Roberts, Mohammed Bennamoun:
Text-independent speaker identification in birds. - Ilyas Potamitis, Todor Ganchev, Nikos Fakotakis:
Automatic acoustic identification of insects inspired by the speaker recognition paradigm.
System Combination
- Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee:
A study on lattice rescoring with knowledge scores for automatic speech recognition. - Sebastian Stüker, Christian Fügen, Susanne Burger, Matthias Wölfel:
Cross-system adaptation and combination for continuous speech recognition: the influence of phoneme set and acoustic front-end. - Catherine Breslin, Mark J. F. Gales:
Generating complementary systems for speech recognition. - Rong Zhang, Alexander I. Rudnicky:
Investigations of issues for using multiple acoustic models to improve continuous speech recognition. - I-Fan Chen, Lin-Shan Lee:
A new framework for system combination based on integrated hypothesis space. - Björn Hoffmeister, Tobias Klein, Ralf Schlüter, Hermann Ney:
Frame based system combination and a comparison with weighted ROVER and CNC.
Interpreting Prosodic Variation
- Jiahong Yuan, Mark Liberman, Christopher Cieri:
Towards an integrated understanding of speaking rate in conversation. - Minh-Quang Vu, Do Dat Tran, Eric Castelli:
Prosody of interrogative and affirmative sentences in Vietnamese: analysis and perceptive results. - Jennifer J. Venditti, Julia Hirschberg, Jackson Liscombe:
Intonational cues to student questions in tutoring dialogs. - Emiel Krahmer, Marc Swerts:
Testing the effect of audiovisual cues to prominence via a reaction-time experiment. - Agustín Gravano, Julia Hirschberg:
Effect of genre, speaker, and word class on the realization of given and new information. - Martti Vainio, Juhani Järvikivi, Stefan Werner:
Word order and tonal shape in the production of focus in short Finnish utterances.
Articulatory Modeling
- Bernd J. Kröger, Peter Birkholz, Jim Kannampuzha, Christiane Neuschaefer-Rube:
Modeling sensory-to-motor mappings using neural nets and a 3d articulatory speech synthesizer. - Julie Fontecave, Frédéric Berthommier:
Semi-automatic extraction of vocal tract movements from cineradiographic data. - Szu-Chen Stan Jou, Tanja Schultz, Matthias Walliczek, Florian Kraft, Alex Waibel:
Towards continuous speech recognition using surface electromyography. - Korin Richmond:
A trajectory mixture density network for the acoustic-articulatory inversion mapping. - Florian Metze:
Articulatory features for "meeting" speech recognition. - Zdenek Krnoul, Milos Zelezný, Ludek Müller, Jakub Kanis:
Training of coarticulation models using dominance functions and visual unit selection methods for audio-visual speech synthesis.
Acoustic Modeling I - Training and Topologies
- Le Zhang, Steve Renals:
Phone recognition analysis for trajectory HMM. - Joseph Keshet, Shai Shalev-Shwartz, Samy Bengio, Yoram Singer, Dan Chazan:
Discriminative kernel-based phoneme sequence recognition. - Jeremy Morris, Eric Fosler-Lussier:
Combining phonetic attributes using conditional random fields. - T. Nagarajan, Douglas D. O'Shaughnessy:
Discriminative MLE training using a product of Gaussian likelihoods. - Hao-Zheng Li, Douglas D. O'Shaughnessy:
State-level variable modeling for phoneme classification. - Xiaolong Li, Li Deng, Dong Yu, Alex Acero:
A time-synchronous phonetic decoder for a long-contextual-Span hidden trajectory model. - Marta Casar, José A. R. Fonollosa:
Analysis of HMM temporal evolution for automatic speech recognition and utterance verification. - Min Tang, Aravind Ganapathiraju:
Improvements to bucket box intersection algorithm for fast GMM computation in embedded speech recognition systems. - Konstantin Markov, Satoshi Nakamura:
Forward-backwards training of hybrid HMM/BN acoustic models. - Dirk Gehrig, Thomas Schaaf:
A comparative study of Gaussian selection methods in large vocabulary continuous speech recognition. - Soo-Young Suk, Seong-Jun Hahm, Ho-Youl Jung, Hyun-Yeol Chung:
A successive state and mixture splitting for optimizing the size of models in speech recognition. - Valentin Ion, Reinhold Haeb-Umbach:
Improved source modeling and predictive classification for channel robust speech recognition.
Acoustic Signal Segmentation and Classification
- Marco Kühne, Roberto Togneri:
Automatic English stop consonants classification using wavelet analysis and hidden Markov models. - Tingyao Wu, Dirk Van Compernolle, Jacques Duchateau, Hugo Van hamme:
Single frame selection for phoneme classification. - Sorin Dusan, Lawrence R. Rabiner:
On the relation between maximum spectral transition positions and phone boundaries. - T. Yingthawornsuk, H. Kaymaz Keskinpala, Daniel J. France, D. Mitchell Wilkes, Richard G. Shiavi, Ronald M. Salomon:
Objective estimation of suicidal risk using vocal output characteristics. - E. Didiot, Irina Illina, Odile Mella, Dominique Fohr, Jean Paul Haton:
A wavelet-based parameterization for speech/music segmentation. - Goshu Nagino, Makoto Shozakai:
Distance measure between Gaussian distributions for discriminating speaking styles. - Franz Pernkopf, Tuan Van Pham:
Bayesian networks for phonetic classification using time-scale features. - Nicole Beringer:
Fast and effective retraining on contrastive vocal characteristics with bidirectional long short-term memory nets. - Ning Ma, Phil D. Green, André Coy:
Exploiting dendritic autocorrelogram structure to identify spectro-temporal regions dominated by a single sound source. - Pairote Leelaphattarakij, Proadpran Punyabukkana, Atiwong Suchato:
Locating phone boundaries from acoustic discontinuities using a two-staged approach. - Qiang Fu, Biing-Hwang Juang:
Investigation on rescoring using minimum verification error (MVE) detectors. - Qiang Fu, Antonio Moreno-Daniel, Biing-Hwang Juang, Jian-Lai Zhou, Frank K. Soong:
Generalization of the minimum classification error (MCE) training based on maximizing generalized posterior probability (GPP). - Michael A. Carlin, Brett Y. Smolenski, Stanley J. Wenndt:
Unsupervised detection of whispered speech in the presence of normal phonation. - Xavier Anguera, Chuck Wooters, Javier Hernando:
Friends and enemies: a novel initialization for speaker diarization.
Linguistics, Phonology, and Phonetics I, II
- Kushan Surana, Janet Slifka:
Acoustic cues for the classification of regular and irregular phonation. - Rattima Nitisaroj:
Realizations and representations of Thai tones in monomoraic syllables. - Irene Jacobi, Louis C. W. Pols, Jan Stroop:
Measuring and comparing vowel qualities in a Dutch spontaneous speech corpus. - Aijun Li, Qiang Fang, Ziyu Xiong:
Phonetic research on accented Chinese in three dialectal regions: Shanghai, Wuhan and Xiamen. - Chi Zhang, Ji Wu, Xi Xiao, Zuoying Wang:
Pronunciation variation modeling for Mandarin with accent. - Kuniko Y. Nielsen:
Specificity and generalizability of spontaneous phonetic imitation. - Christophe Van Bael, Hans van Halteren:
On the sufficiency of automatic phonetic transcriptions for pronunciation variation research. - Abe Kazemzadeh, Joseph Tepperman, Jorge F. Silva, Hong You, Sungbok Lee, Abeer Alwan, Shrikanth S. Narayanan:
Automatic detection of voice onset time contrasts for use in pronunciation assessment. - Hiroko Hirano, Goh Kawai, Keikichi Hirose, Nobuaki Minematsu:
Unfilled pauses in Japanese sentences read aloud by non-native learners. - Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, Hitoshi Isahara:
Detection of quotations and inserted clauses and its application to dependency structure analysis in spontaneous Japanese. - Chun-Han Tseng, Chia-Ping Chen:
Chinese input method based on reduced Mandarin phonetic alphabet. - Yoshimi Suzuki, Fumiyo Fukumoto:
Thesaurus expansion using similar word pairs from patent documents. - Patrick Schone:
Low-resource autodiacritization of abjads for speech keyword search. - Susan R. Hertz:
A model of the regularities underlying speaker variation: evidence from hybrid synthesis. - Augustin Speyer:
Pauses as a tool to ensure rhythmic wellformedness. - Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Shusaku Miwa, Nobuaki Minematsu:
Factors affecting speakers' choice of fillers in Japanese presentations. - Marelie H. Davel, Etienne Barnard:
Developing consistent pronunciation models for phonemic variants. - Jinsik Lee, Seungwon Kim, Gary Geunbae Lee:
Grapheme-to-phoneme conversion using automatically extracted associative rules for Korean TTS system. - Paisarn Charoenpornsawat, Tanja Schultz:
Example-based grapheme-to-phoneme conversion for Thai.
Speech Translation
- Jason Riesa, Behrang Mohit, Kevin Knight, Daniel Marcu:
Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources. - Sameer Maskey, Bowen Zhou, Yuqing Gao:
A phrase-level machine translation approach for disfluency detection using weighted finite state transducers. - Jonghoon Lee, Donghyeon Lee, Gary Geunbae Lee:
Improving phrase-based Korean-English statistical machine translation. - David Stallard, Fred Choi, Kriste Krstovski, Prem Natarajan, Rohit Prasad, Shirin Saleem:
A hybrid phrase-based/statistical speech translation system. - Chao Wang, Stephanie Seneff:
High-quality speech translation in the flight domain. - Roger Hsiao, Ashish Venugopal, Thilo Köhler, Ying Zhang, Paisarn Charoenpornsawat, Andreas Zollmann, Stephan Vogel, Alan W. Black, Tanja Schultz, Alex Waibel:
Optimizing components for handheld two-way speech translation for an English-Iraqi Arabic system.
Acoustic Modeling II - Adaptation
- Armin Sehr, Marcus Zeller, Walter Kellermann:
Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain. - Xin Lei, Jon Hamaker, Xiaodong He:
Robust feature space adaptation for telephony speech recognition. - Nattanun Thatphithakkul, Boontee Kruatrachue, Chai Wutiwiwatchai, Sanparith Marukatat, Vataya Boonpiam:
A simulated-data adaptation technique for robust speech recognition. - Hans-Günter Hirsch, Harald Finster:
A new HMM adaptation approach for the case of a hands-free speech input in reverberant rooms. - Yu Tsao, Chin-Hui Lee:
A vector space approach to environment modeling for robust speech recognition. - Jen-Tzung Chien, Chuan-Wei Ting:
Subspace modeling and selection for noisy speech recognition.
Emotional Speech and Speaker State
- Björn W. Schuller, Niels Köhler, Ronald Müller, Gerhard Rigoll:
Recognition of interest in human conversational speech. - Hua Ai, Diane J. Litman, Katherine Forbes-Riley, Mihai Rotaru, Joel R. Tetreault, Amruta Purandare:
Using system and user performance features to improve emotion detection in spoken tutoring dialogs. - Laurence Devillers, Laurence Vidrascu:
Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. - Janneke Wilting, Emiel Krahmer, Marc Swerts:
Real vs. acted emotional speech. - Daniel Neiberg, Kjell Elenius, Kornel Laskowski:
Emotion recognition in spontaneous speech using GMMs. - Frank Enos, Stefan Benus, Robin L. Cautin, Martin Graciarena, Julia Hirschberg, Elizabeth Shriberg:
Personality factors in human deception detection: comparing human to machine performance.
Speech and Language in Education
- Leen Cleuren, Jacques Duchateau, Alain Sips, Pol Ghesquière, Hugo Van hamme:
Developing an automatic assessment tool for children's oral reading. - Christopher J. Waple, Yasushi Tsubota, Masatake Dantsuji, Tatsuya Kawahara:
Prototyping a CALL system for students of Japanese using dynamic diagram generation and interactive hints. - Dominic W. Massaro, Ying Liu, Trevor H. Chen, Charles Perfetti:
A multilingual embodied conversational agent for tutoring speech and language learning. - Michael Heilman, Kevyn Collins-Thompson, Jamie Callan, Maxine Eskénazi:
Classroom success of an intelligent tutoring system for lexical practice and reading comprehension. - Sarah E. Petersen, Mari Ostendorf:
Assessing the reading level of web pages. - Jack Mostow:
Is ASR accurate enough for automated reading tutors, and how can we tell? - Chiharu Tsurutani, Yutaka Yamauchi, Nobuaki Minematsu, Dean Luo, Kazutaka Maruyama, Keikichi Hirose:
Development of a program for self assessment of Japanese pronunciation by English learners. - Joseph Tepperman, Jorge F. Silva, Abe Kazemzadeh, Hong You, Sungbok Lee, Abeer Alwan, Shrikanth S. Narayanan:
Pronunciation verification of children's speech for automatic literacy assessment. - Sherif Mahdy Abdou, Salah Eldeen Hamid, Mohsen A. Rashwan, Abdurrahman Samir, Ossama Abdel-Hamid, Mostafa Shahin, Waleed Nazih:
Computer aided pronunciation learning system using speech recognition techniques.
Speech Perception I, II
- Bryce E. Lobdell, Jont B. Allen:
An information theoretic tool for investigating speech perception. - Geoffrey Stewart Morrison:
An adaptive sampling procedure for speech perception experiments. - Navin Viswanathan, James S. Magnuson, Carol A. Fowler:
Disentangling gestural and auditory contrast accounts of compensation for coarticulation. - Michael C. W. Yip:
The role of positional probability in the segmentation of Cantonese speech. - Shahina Haque, Tomio Takara:
Nasality perception of vowels in different language background. - Nao Hodoshima, Dawn M. Behne, Takayuki Arai:
Steady-state suppression in reverberation: a comparison of native and nonnative speech perception. - Akiyo Joto:
Effect of dynamic information of formants on discrimination of English vowels in consonantal contexts by Japanese listeners. - Yue Wang, Dawn M. Behne, Haisheng Jiang, Chad Danyluck:
Native and nonnative audio-visual perception of English fricatives in quiet and cafe-noise backgrounds. - Sven Grawunder, Ines Bose, Birgit Hertha, Franziska Trauselt, Lutz Christian Anders:
Perceptive and acoustic measurement of average speaking pitch of female and male speakers in German radio news. - Peter F. Assmann, Sophia Dembling, Terrance M. Nearey:
Effects of frequency shifts on perceived naturalness and gender information in speech. - Hitomi Tohyama, Shigeki Matsubara:
Influence of pause length on listeners' impressions in simultaneous interpretation. - Iris-Corinna Schwarz, Denis Burnham:
New measures to chart toddlers' speech perception and language development: a test of the lexical restructuring hypothesis. - Ángel de la Torre, Cristina Roldán, Manuel Sainz:
Perception of fundamental frequency in cochlear implant patients. - Sarah C. Creel, Delphine Dahan, Daniel Swingley:
Effects of featural similarity and overlap position on lexical confusions and overt similarity judgments. - Hansjörg Mixdorff, Yu Hu:
Word structure and tone perception in Mandarin. - Cécile Woehrling, Philippe Boula de Mareüil:
Identification of regional accents in French: perception and categorization. - Sandeep Phatak, Jont B. Allen:
Consonant and vowel confusions in speech-weighted noise. - Mirjam Broersma:
Accident - execute: increased activation in nonnative listening. - Kirstin Scholz, Marcel Wältermann, Lu Huo, Alexander Raake, Sebastian Möller, Ulrich Heute:
Estimation of the quality dimension "directness/frequency content" for the instrumental assessment of speech quality.
Speech Production, Physiology, and Pathology I, II
- Mark Pluymaekers, Mirjam Ernestus, R. Harald Baayen:
Effects of word frequency on the acoustic durations of affixes. - Xiaochuan Niu, Alexander Kain, Jan P. H. van Santen:
A noninvasive, low-cost device to study the velopharyngeal port during speech and some preliminary results. - Noureddine Aboutabit, Denis Beautemps, Laurent Besacier:
Characterization of cued speech vowels from the inner lip contour. - Christer Gobl:
Modelling aspiration noise during phonation using the LF voice source model. - Jianguo Wei, Xugang Lu, Jianwu Dang:
A simulation based parameter optimization for a coarticulation model. - Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Multivariate analysis of frame-based acoustic cues of dysperiodicities in connected speech. - Tom Kovacs, Donald S. Finan:
Effects of midline tongue piercing on spectral centroid frequencies of sibilants. - P. Vijayalakshmi, M. Ramasubba Reddy, Douglas D. O'Shaughnessy:
Assessment of articulatory sub-systems of dysarthric speech using an isolated-style phoneme recognition system. - Donald S. Finan, Carol A. Boliek:
Respiratory/laryngeal interactions during sustained vowel production in children. - H. Timothy Bunnell, James B. Polikoff:
Acoustic characterization of children with speech delay. - Oscar Saz, Antonio Miguel, Eduardo Lleida, Alfonso Ortega, Luis Buera:
Study of time and frequency variability in pathological speech and error reduction methods for automatic speech recognition. - Markus Iseli, Yen-Liang Shue, Melissa A. Epstein, Patricia A. Keating, Jody Kreiman, Abeer Alwan:
Voice source correlates of prosodic features in American English: a pilot study. - Louis ten Bosch, R. Harald Baayen, Mirjam Ernestus:
On speech variation and word type differentiation by articulatory feature representations. - Sungbok Lee, Erik Bresch, Jason Adams, Abe Kazemzadeh, Shrikanth S. Narayanan:
A study of emotional speech articulation using a fast magnetic resonance imaging technique. - Hedvig Kjellström, Olov Engwall, Olle Bälter:
Reconstructing tongue movements from audio and video. - Gang Feng, Cyril Kotenkoff:
New considerations for vowel nasalization based on separate mouth-nose recording. - Maeva Garnier, Lucie Bailly, Marion Dohen, Pauline Welby, Hélène Loevenbruck:
An acoustic and articulatory study of Lombard speech: global effects on the utterance.
Formant Estimation
- Laurence Cnockaert, Jean Schoentgen, Pascal Auzou, Canan Ozsancak, Francis Grenez:
Tracking of involuntary formant frequency variations and application to parkinsonian speech. - Luis Weruaga, Amar Al-Khayat:
All-pole model estimation of vocal tract on the frequency domain. - Jonathan Darch, Ben Milner:
HMM-based MAP prediction of voiced and unvoiced formant frequencies from noisy MFCC vectors. - Joseph M. Anand, S. Guruprasad, B. Yegnanarayana:
Extracting formants from short segments of speech using group delay functions. - I. Yücel Özbek, Mübeccel Demirekler:
Tracking of visible vocal tract resonances (VVTR) based on Kalman filtering. - Salma Chaari, Kaïs Ouni, Noureddine Ellouze:
Wavelet ridge track interpretation in terms of formants.
Language Processing Beyond and Below the Word-Level
- Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arisoy, Murat Saraclar:
Unsupervised segmentation of words into morphemes - morpho challenge 2005 application to automatic speech recognition. - Ebru Arisoy, Murat Saraclar:
Lattice extension and rescoring based approaches for LVCSR of Turkish. - Catherine Kobus, Géraldine Damnati, Lionel Delphin-Poulat, Renato de Mori:
Exploiting semantic relations for a spoken language understanding application. - Yuya Akita, Masahiro Saikou, Hiroaki Nanjo, Tatsuya Kawahara:
Sentence boundary detection of spontaneous Japanese using statistical language model and support vector machines. - Sami Virpioja, Mikko Kurimo:
Compact n-gram models by incremental growing and clustering of histories. - Nathalie Camelin, Géraldine Damnati, Frédéric Béchet, Renato de Mori:
Opinion mining in a telephone survey corpus.
Robustness and Adaptation for ASR
- Antonio M. Peinado, Angel M. Gomez, Victoria E. Sánchez, José L. Pérez-Córdoba, Antonio J. Rubio:
An integrated solution for error concealment in DSR systems over wireless channels. - Angel M. Gomez, Antonio M. Peinado, Victoria E. Sánchez, José L. Carmona, Antonio J. Rubio:
Interleaving and MMSE estimation with VQ replicas for distributed speech recognition over lossy packet networks. - Gang Chen, Hesham Tolba, Douglas D. O'Shaughnessy:
Noise-robust speech recognition of conversational telephone speech. - Shingo Kuroiwa, Satoru Tsuge, Fuji Ren:
Lost speech reconstruction method using speech recognition based on missing feature theory and HMM-based speech synthesis. - Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Speaker adaptation using evolutionary-based linear transform. - Jingying Wang, Zuoying Wang:
A speaker adaptation algorithm using principal curves in noisy environments. - Constance Clarke, Daniel Jurafsky:
Limitations of MLLR adaptation with Spanish-accented English: an error analysis. - Hank Liao, Mark J. F. Gales:
Issues with uncertainty decoding for noise robust speech recognition. - Haitian Xu, Luca Rigazio, David Kryze:
Vector Taylor series based joint uncertainty decoding. - Qiang Huo, Donglai Zhu:
A maximum likelihood training approach to irrelevant variability compensation based on piecewise linear transformations. - Arindam Mandal, Mari Ostendorf, Andreas Stolcke:
Speaker clustered regression-class trees for MLLR adaptation. - Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
Robust speech recognition over mobile networks using combined weighted Viterbi decoding and subvector based error concealment. - Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura:
Speaker adaptation of trajectory HMMs using feature-space MLLR. - Daniel Povey, George Saon:
Feature and model space speaker adaptation with full covariance Gaussians.
Multimodal, Translation and Information Retrieval
- Adrià de Gispert, José B. Mariño:
Linguistic tuple segmentation in n-gram-based statistical machine translation. - Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking. - Srinivas Bangalore, Patrick Haffner, Stephan Kanthak:
Sequence classification for machine translation. - Yoshiaki Itoh, Takayuki Otake, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
Two-stage vocabulary-free spoken document retrieval - subword identification and re-recognition of the identified sections. - Mihai Surdeanu, David Dominguez-Sal, Pere Comas:
Design and performance analysis of a factoid question answering system for spontaneous speech transcriptions. - Toshiyuki Takezawa, Tohru Shimizu:
Performance improvement of dialog speech translation by rejecting unreliable utterances. - Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Cross-lingual dialog model for speech to speech translation. - Murat Akbacak, John H. L. Hansen:
A robust fusion method for multilingual spoken document retrieval systems employing tiered resources. - Weizhong Zhu, Bowen Zhou, Charles Prosser, Pavel Krbec, Yuqing Gao:
Recent advances of IBM's handheld speech translation system. - Svetlana Stenchikova, Dilek Hakkani-Tür, Gökhan Tür:
QASR: question answering using semantic roles for speech interface. - Jan Frederik Maas, Britta Wrede, Gerhard Sagerer:
Towards a multimodal topic tracking system for a mobile robot. - Edward C. Kaiser, Paulo Barthelmess:
Edge-splitting in a cumulative multimodal system, for a no-wait temporal threshold on information fusion, combined with an under-specified display. - Pui-Yu Hui, Helen M. Meng:
Joint interpretation of input speech and pen gestures for multimodal human-computer interaction.
Advances in Acoustic Segmentation
- David Cournapeau, Tatsuya Kawahara, Kenji Mase, Tomoji Toriyama:
Voice activity detector based on enhanced cumulant of LPC residual and on-line EM algorithm. - David Huggins-Daines, Alexander I. Rudnicky:
A constrained Baum-Welch algorithm for improved phoneme segmentation and efficient training. - Fabio Valente:
Infinite models for speaker clustering. - John Dines, Jithendra Vepa, Thomas Hain:
The segmentation of multi-channel meeting recordings for automatic speech recognition. - Jen-Wei Kuo, Hsin-Min Wang:
Minimum boundary error training for automatic phonetic segmentation. - William Schuler, Timothy A. Miller, Stephen T. Wu, Andrew Exley:
Dynamic evidence models in a DBN phone recognizer.
Acoustic Modeling III - LVCSR
- Bhuvana Ramabhadran, Olivier Siohan, Lidia Mangu, Geoffrey Zweig, Martin Westphal, Henrik Schulz, Alvaro Soneiro:
The IBM 2006 speech transcription system for European parliamentary speeches. - Christian Fügen, Matthias Wölfel, John W. McDonough, Shajith Ikbal, Florian Kraft, Kornel Laskowski, Mari Ostendorf, Sebastian Stüker, Ken'ichi Kumatani:
Advances in lecture recognition: the ISL RT-06s evaluation system. - Mei-Yuh Hwang, Xin Lei, Wen Wang, Takahiro Shinozaki:
Investigation on Mandarin broadcast news speech recognition. - Xin Lei, Man-Hung Siu, Mei-Yuh Hwang, Mari Ostendorf, Tan Lee:
Improved tone modeling for Mandarin broadcast news speech recognition. - Jui-Ting Huang, Lin-Shan Lee:
Prosodic modeling in large vocabulary Mandarin speech recognition. - Ying Sun, Daniel Willett, Raymond Brueckner, Rainer Gruhn, Dirk Bühler:
Experiments on Chinese speech recognition with tonal models and pitch estimation using the Mandarin speecon data.
Speech and Visual Processing
- Jonas Beskow, Björn Granström, David House:
Visual correlates to prominence in several expressive modes. - Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts:
How auditory and visual prosody is used in end-of-utterance detection. - Marc Swerts, Emiel Krahmer:
The importance of different facial areas for signalling visual prominence. - Josef Chaloupka:
Visual speech segmentation and speaker recognition for transcription of TV news. - Guillermo Cortés, Luz García, M. Carmen Benítez, José C. Segura:
HMM-based continuous sign language recognition using a fast optical flow parameterization of visual information. - Xu Shao, Jon Barker:
Audio-visual speech recognition in the presence of a competing speaker.
Text-to-Speech I, II
- Volker Strom, Robert A. J. Clark, Simon King:
Expressive prosody for unit-selection speech synthesis. - Rolf Carlson, Kjell Gustafson, Eva Strangert:
Cues for hesitation in speech synthesis. - Francesc Alías, Joan Claudi Socoró, Xavier Sevillano, Ignasi Iriondo Sanz, Xavier Gonzalvo:
Multi-domain text-to-speech synthesis by automatic text classification. - Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao:
Phrase break prediction using logistic generalized linear model. - Robert A. J. Clark, Simon King:
Joint prosodic and segmental unit selection speech synthesis. - Yeon-Jun Kim, Ann K. Syrdal, Alistair Conkie, Marc C. Beutnagel:
Phonetically enriched labeling in unit selection TTS synthesis. - Jerome R. Bellegarda:
Further developments in LSM-based boundary training for unit selection TTS. - Takashi Nose, Junichi Yamagishi, Takao Kobayashi:
A style control technique for speech synthesis using multiple regression HSMM. - Katsumi Ogata, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi:
Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis. - Ossama Abdel-Hamid, Sherif Mahdy Abdou, Mohsen A. Rashwan:
Improving Arabic HMM based speech synthesis quality. - Mohammad Mehdi Homayounpour, Majid Namnabat:
Farsbayan: a unit selection based Farsi speech synthesizer. - Tadesse Anberbir, Tomio Takara:
Amharic speech synthesis using cepstral method with stress generation rule. - Ausdang Thangthai, Chatchawarn Hansakunbuntheung, Rungkarn Siricharoenchai, Chai Wutiwiwatchai:
Automatic syllable-pattern induction in statistical Thai text-to-phone transcription. - H. J. Oosthuizen, S. T. Phihlela, Madimetja Jonas D. Manamela:
Development of prototype text-to-speech systems for Northern Sotho. - Jia-Li You, Yining Chen, Min Chu, Yong Zhao, Jin-Lin Wang:
Identify language origin of personal names with normalized appearance number of web pages. - Christian Weiss, Wolfgang Hess:
Conditional random fields for hierarchical segment selection in text-to-speech synthesis. - Aleksandra Krul, Géraldine Damnati, François Yvon, Thierry Moudenc:
Corpus design based on the Kullback-Leibler divergence for text-to-speech synthesis application. - Zhen-Hua Ling, Ren-Hua Wang:
HMM-based unit selection using frame sized speech segments. - Paul Taylor:
The target cost formulation in unit selection speech synthesis. - Daniel Tihelka, Jindrich Matousek:
Unit selection and its relation to symbolic prosody: a new approach. - Yi-Jian Wu, Wu Guo, Ren-Hua Wang:
Minimum generation error criterion for tree-based clustering of context dependent HMMs. - Heng Kang, Wenju Liu:
Selective-LPC based representation of STRAIGHT spectrum and its applications in spectral smoothing. - Matthias Jilka, Bernd Möbius:
Towards a comprehensive investigation of factors relevant to peak alignment using a unit selection corpus. - Robert J. Utama, Ann K. Syrdal, Alistair Conkie:
Six approaches to limited domain concatenative speech synthesis. - Volker Fischer, Siegfried Kunzmann:
From pre-recorded prompts to corporate voices: on the migration of interactive voice response applications. - Seung Seop Park, Jong Won Shin, Nam Soo Kim:
Automatic speech segmentation with multiple statistical models. - Kimmo Pärssinen, Marko Moberg:
Evaluation of perceptual quality of control point reduction in rule-based synthesis. - Geert Coorman:
Segment connection networks for corpus-based speech synthesis.
Special Populations - Learners, Aged, Challenged
- Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:
Observations of the spoken language acquisition process based on a multimodal infant behavior corpus. - Ellen Marklund, Francisco Lacerda:
Infants' ability to extract verbs from continuous speech. - Ricardo Augusto Hoffmann Bion, Paola Escudero, Andréia S. Rauber, Barbara O. Baptista:
Category formation and the role of spectral quality in the perception and production of English front vowels. - Ranka Bijeljac-Babic, Christelle Dodane, Sabine Metta, Claire Gerard:
Productions in bilinguism, early foreign language learning and monolinguism: a prosodic comparison. - Yukari Hirata, Elizabeth Whitehurst, Emily Cullings, Jacob Whiton, Carol Glenn:
Training native English speakers to identify Japanese vowel length with fast rate sentences. - Jiang-Chun Chen, Wei-Tang Hsu, Jyh-Shing Roger Jang, Ren-Yuan Lyu, Yuang-Chin Chiang:
Formant-based English vowel assessment for Chinese in Taiwan. - Jörg Metzner, Marcel Schmittfull, Karl Schnell:
Substitute sounds for ventriloquism and speech disorders. - Si Wei, Qing-Sheng Liu, Yu Hu, Ren-Hua Wang:
Automatic Mandarin pronunciation scoring for native learners with dialect accent. - Kengo Fujita, Tsuneo Kato, Hisashi Kawai:
Quick individual fitting methods of simplified hearing compensation for elderly people. - Xiao Li, Jonathan Malkin, Susumu Harada, Jeff A. Bilmes, Richard Wright, James A. Landay:
An online adaptive filtering algorithm for the vocal joystick. - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. - Rubén San Segundo, Roberto Barra-Chicote, Luis Fernando D'Haro, Juan Manuel Montero, Ricardo de Córdoba, Javier Ferreiros:
A Spanish speech to sign language translation system for assisting deaf-mute people. - Eeva Klintfors, Francisco Lacerda:
Potential relevance of audio-visual integration in mammals for computational modeling. - C. Anton Rytting:
Finding the gaps: applying a connectionist model of word segmentation to noisy phone-recognized speech data.
Robust ASR
- Shizhen Wang, Xiaodong Cui, Abeer Alwan:
Rapid speaker adaptation using regression-tree based spectral peak alignment. - Chanwoo Kim, Yu-Hsiang Bosco Chiu, Richard M. Stern:
Physiologically-motivated synchrony-based processing for robust automatic speech recognition. - Matthias Walliczek, Florian Kraft, Szu-Chen Stan Jou, Tanja Schultz, Alex Waibel:
Sub-word unit based non-audible speech recognition using surface electromyography. - Jesús Vicente-Peña, Fernando Díaz-de-María, W. Bastiaan Kleijn:
Individual on-line variance adaptation of frequency filtered parameters for robust ASR. - Bing Zhang, Spyros Matsoukas, Richard M. Schwartz:
Recent progress on the discriminative region-dependent transform for speech feature extraction. - Jan Rademacher, Matthias Wächter, Alfred Mertins:
Improved warping-invariant features for automatic speech recognition.
Speech Summarization
- Ani Nenkova:
Summarization evaluation for text and speech: issues and approaches. - Xiaodan Zhu, Gerald Penn:
Summarization of spontaneous conversations. - Pierre Chatain, Edward W. D. Whittaker, Joanna Mrozinski, Sadaoki Furui:
Perplexity based linguistic model adaptation for speech summarisation. - Lin-Shan Lee, Sheng-yi Kong, Yi-Cheng Pan, Yi-Sheng Fu, Yu-tsun Huang:
Multi-layered summarization of spoken document archives by information extraction and semantic structuring. - Sameer Maskey, Julia Hirschberg:
Soundbite detection in broadcast news domain. - Gabriel Murray, Steve Renals:
Dialogue act compression via pitch contour preservation.
Acoustic Modeling IV
- Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi:
Manifold HLDA and its application to robust speech recognition. - Luis Buera, Eduardo Lleida, Juan Arturo Nolazco-Flores, Antonio Miguel, Alfonso Ortega:
Time-dependent cross-probability model for multi-environment model based LInear normalization. - Daniel Povey:
SPAM and full covariance for speech recognition. - Sakriani Sakti, Konstantin Markov, Satoshi Nakamura:
The use of Bayesian network for incorporating accent, gender and wide-context dependency information. - Yu Wang, Eric Fosler-Lussier:
Integrating phonetic boundary discrimination explicitly into HMM systems. - Zhimin Xie, Partha Niyogi:
Robust acoustic-based syllable detection. - Lei He, Jie Hao:
A tone recognition framework for continuous Mandarin speech. - Annika Hämäläinen, Louis ten Bosch, Lou Boves:
Pronunciation variant-based multi-path HMMs for syllables. - Junho Park, Hanseok Ko:
A new state-dependent phonetic tied-mixture model with head-body-tail structured HMM for real-time continuous phoneme recognition system. - Andrej Zgank, Zdravko Kacic:
Conversion from phoneme based to grapheme based acoustic models for speech recognition. - Bong-Wan Kim, Dae-Lim Choi, Yongnam Um, Yong-Ju Lee:
Phone vector DHMM to decode a phone recognizer's output. - T. Nagarajan, P. Vijayalakshmi, Douglas D. O'Shaughnessy:
Combining multiple-sized sub-word units in a speech recognition system using baseform selection. - Antonio Miguel, Eduardo Lleida, Alfons Juan, Luis Buera, Alfonso Ortega, Oscar Saz:
Local transformation models for speech recognition.
Large Vocabulary Speech Recognition
- Toru Imai, Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma:
Online speech detection and dual-gender speech recognition for captioning broadcast news. - Timothy J. Hazen:
Automatic alignment and error correction of human generated transcripts for long speech recordings. - Shuangyu Chang:
Improving speech recognition accuracy with multi-confidence thresholding. - Christophe Servan, Christian Raymond, Frédéric Béchet, Pascal Nocera:
Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA. - Shilei Huang, Xiang Xie, Jingming Kuang:
Improving the performance of out-of-vocabulary word rejection by using support vector machines. - Kris Demuynck, Dirk Van Compernolle, Hugo Van hamme:
Robust phone lattice decoding. - Benjamin Lecouteux, Georges Linarès, Pascal Nocera, Jean-François Bonastre:
Imperfect transcript driven speech recognition. - Jian Xue, Rusheng Hu, Yunxin Zhao:
New improvements in decoding speed and latency for automatic captioning. - Shirin Saleem, Rohit Prasad, Prem Natarajan:
Colloquial Iraqi ASR for speech translation. - Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda:
Reducing computation on parallel decoding using frame-wise confidence scores. - Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard:
Posterior based keyword spotting with a priori thresholds. - Zhengyu Zhou, Helen M. Meng, Wai Kit Lo:
A multi-pass error detection and correction framework for Mandarin LVCSR. - Jan Nouza, Jindrich Zdánský, Petr Cerva, Jan Kolorenc:
Continual on-line monitoring of Czech spoken broadcast programs.
Speech/Noise/Music Segmentation
- Shilei Zhang, Hongchen Jiang, Shuwu Zhang, Bo Xu:
Fast SVM training based on the choice of effective samples for audio classification. - Joerg Schmalenstroeer, Reinhold Haeb-Umbach:
Online speaker change detection by combining BIC with microphone array beamforming. - Javier Ramírez, Pablo Yélamos, J. M. Górriz, José C. Segura, Luz García:
Speech/non-speech discrimination combining advanced feature extraction and SVM learning. - Safaa Jarifi, Dominique Pastor, Olivier Rosec:
Cooperation between global and local methods for the automatic segmentation of speech synthesis corpora. - Martin Heckmann, Marco Moebus, Frank Joublin, Christian Goerick:
Speaker independent voiced-unvoiced detection evaluated in different speaking styles. - Xavier Anguera, Chuck Wooters, José M. Pardo:
Robust speaker diarization for meetings: ICSI RT06s evaluation system. - André Coy, Jon Barker:
A multipitch tracker for monaural speech segmentation. - Rahul Chitturi, Mark Hasegawa-Johnson:
Novel entropy based moving average refiners for HMM landmarks. - Gibak Kim, Nam Ik Cho:
Two-microphone voice activity detection in the presence of coherent interference. - Tor André Myrvoll, Tomoko Matsui:
On a greedy learning algorithm for dPLRM with applications to phonetic feature detection.
Pitch Estimation
- Elliot Moore II, Juan F. Torres:
Improving glottal waveform estimation through rank-based glottal quality assessment. - Francesc Alías, Carlos Monzo, Joan Claudi Socoró:
A pitch marks filtering algorithm based on restricted dynamic programming. - Nicolas Malyska, Thomas F. Quatieri:
Analysis of nonmodal phonation using minimum entropy deconvolution. - Tomoyasu Nakano, Masataka Goto, Yuzuru Hiraga:
An automatic singing skill evaluation method for unknown melodies using pitch interval accuracy and vibrato features. - Stephen A. Zahorian, Princy Dikshit, Hongbing Hu:
A spectral-temporal method for pitch tracking. - M. Shahidur Rahman, Hirobumi Tanaka, Tetsuya Shimamura:
Pitch determination using aligned AMDF.
Acoustic Modeling V - Novel Approaches
- Yan Han, Lou Boves:
Syllable-length path mixture hidden Markov models with trajectory clustering for continuous speech recognition. - Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. - Christophe Lévy, Georges Linarès, Jean-François Bonastre:
GMM-based acoustic modeling for embedded speech recognition. - Mathias De Wachter, Kris Demuynck, Dirk Van Compernolle:
Boosting HMM performance with a memory upgrade. - Yunbin Deng, Xiaokun Li, Chiman Kwan, Roger Xu, Bhiksha Raj, Richard M. Stern, David Williamson:
An integrated approach to improve speech recognition rate for non-native speakers. - Rusheng Hu, Yunxin Zhao:
Bayesian decision tree state tying for conversational speech recognition.
Corpus-Based Synthesis
- Barry Kirkpatrick, Darragh O'Brien, Ronan Scaife:
Feature extraction for spectral continuity measures in concatenative speech synthesis. - Shinsuke Sakai, Tatsuya Kawahara:
Decision tree-based training of probabilistic concatenation models for corpus-based speech synthesis. - Yong Zhao, Di Peng, Lijuan Wang, Min Chu, Yining Chen, Peng Yu, Jun Guo:
Constructing stylistic synthesis databases from audio books. - Alistair Conkie, Ann K. Syrdal:
Expanding phonetic coverage in unit selection synthesis through unit substitution from a donor voice. - Paul Taylor:
Unifying unit selection and hidden Markov model speech synthesis. - Alan W. Black:
CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling.
Spoken Dialog Technology R&D
- Verena Rieser, Oliver Lemon:
Cluster-based user simulations for learning dialogue strategies. - Charles Lewis, Giuseppe Di Fabbrizio:
Prompt selection with reinforcement learning in an AT&T call routing application. - Silke Goronzy, Raquel Mochales, Nicole Beringer:
Developing speech dialogs for multimodal HMIs using finite state machines. - Norbert Pfleger, Jan Schehl:
Development of advanced dialog systems with PATE. - Rajah Annamalai Subramanian, Philip R. Cohen:
A joint intention-based dialogue engine. - Sebastian Möller, Roman Englert, Klaus-Peter Engelbrecht, Verena Vanessa Hafner, Anthony Jameson, Antti Oulasvirta, Alexander Raake, Norbert Reithinger:
Memo: towards automatic usability evaluation of spoken dialogue services by user error simulations.
Modeling Speaker Emotional State
- Brett Matthews, Raimo Bakis, Ellen Eide:
Synthesizing breathiness in natural speech with sinusoidal modelling. - Mauro Nicolao, Carlo Drioli, Piero Cosi:
Voice GMM modelling for FESTIVAL/MBROLA emotive TTS synthesis. - João P. Cabral, Luís C. Oliveira:
Emovoice: a system to generate emotions in speech. - Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen M. Meng:
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar. - Hongwu Yang, Helen M. Meng, Lianhong Cai:
Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis. - Sheng Zhang, P. C. Ching, Fanrang Kong:
Automatic emotion recognition of speech signal in Mandarin. - Yi-Hao Kao, Lin-Shan Lee:
Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. - Björn W. Schuller, Gerhard Rigoll:
Timing levels in segment-based speech emotion recognition. - Ryuichi Nisimura, Souji Omae, Hideki Kawahara, Toshio Irino:
Analyzing dialogue data for real-world emotional speech classification. - Cecilia Ovesdotter Alm, Xavier Llorà:
Evolving emotional prosody. - Xin Luo, Qian-Jie Fu, John J. Galvin III:
Vocal emotion recognition with cochlear implants. - Shoichi Matsunaga, S. Sakaguchi, Masaru Yamashita, Sueharu Miyahara, S. Nishitani, Kazuyuki Shinohara:
Emotion detection in infants' cries based on a maximum likelihood approach. - Joseph Tepperman, David R. Traum, Shrikanth S. Narayanan:
"yeah right": sarcasm recognition for spoken dialogue systems. - Rohit Kumar, Carolyn P. Rosé, Diane J. Litman:
Identification of confusion and surprise in spoken dialog using prosodic features. - Tin Lay Nwe, Haizhou Li, Minghui Dong:
Analysis and detection of speech under sleep deprivation. - Ioana Vasilescu, Martine Adda-Decker:
Language, gender, speaking style and language proficiency as factors influencing the autonomous vocalic filler production in spontaneous speech.
Language Modeling and ASR Applications
- Caroline Lavecchia, Kamel Smaïli, Jean Paul Haton:
How to handle gender and number agreement in statistical language models? - Oscar Chan, Roberto Togneri:
Prosodic features for a maximum entropy language model. - Shinsuke Mori:
Language model adaptation with a word list and a raw corpus. - Pascal Wiggers, Léon J. M. Rothkrantz:
Topic-based language modeling with dynamic Bayesian networks. - Hirofumi Yamamoto, Gen-ichiro Kikui, Satoshi Nakamura, Yoshinori Sagisaka:
Speech recognition of foreign out-of-vocabulary words using a hierarchical language model. - Xinhui Hu, Hirofumi Yamamoto, Gen-ichiro Kikui, Yoshinori Sagisaka:
Language modeling of Chinese personal names based on character units for continuous Chinese speech recognition. - A. Lakshmi, Hema A. Murthy:
A syllable based continuous speech recognizer for Tamil. - Monika Woszczyna, Paisarn Charoenpornsawat, Tanja Schultz:
Spontaneous Thai speech recognition. - Matteo Gerosa, Diego Giuliani, Shrikanth S. Narayanan:
Acoustic analysis and automatic recognition of spontaneous children's speech. - Keith Vertanen:
Speech and speech recognition during dictation corrections. - Lubos Smídl, Josef V. Psutka:
Comparison of keyword spotting methods for searching in speech. - Mithun Balakrishna, Cyril Cerovic, Dan I. Moldovan, Ellis Cave:
Automatic generation of statistical language models for interactive voice response applications. - Yun-Cheng Ju, Ye-Yi Wang, Alex Acero:
Call analysis with classification using speech and non-speech features.
Spoken Language Understanding
- Wei-Lin Wu, Ruzhan Lu, Hui Liu, Feng Gao:
A spoken language understanding approach using successive learners. - Osamuyimen Stewart, Juan M. Huerta, Ea-Ee Jan, Cheng Wu, Xiang Li, David M. Lubensky:
Conversational help desk: vague callers and context switch. - Sophie Rosset, Olivier Galibert, Gabriel Illouz, Aurélien Max:
Integrating spoken dialog and question answering: the Ritel project. - Thomas Prommer, Hartwig Holzapfel, Alex Waibel:
Rapid simulation-driven reinforcement learning of multimodal dialog strategies in human-robot interaction. - Gregory Aist, James F. Allen, Ellen Campana, Lucian Galescu, Carlos Gómez Gallo, Scott C. Stoness, Mary D. Swift, Michael K. Tanenhaus:
Software architectures for incremental understanding of human speech. - Florian Schiel, Christoph Draxler, Marion Libossek:
Lingua machinae - an unorthodox proposal. - Heather Pon-Barry, Fuliang Weng, Sebastian Varges:
Evaluation of content presentation strategies for an in-car spoken dialogue system. - Vaibhava Goel, Ramesh A. Gopinath:
On designing context sensitive language models for spoken dialog systems. - Yang Liu:
Using SVM and error-correcting codes for multiclass dialog act classification in meeting corpus. - Hartwig Holzapfel, Alex Waibel:
A multilingual expectations model for contextual utterances in mixed-initiative spoken dialogue. - Yuichiro Fukubayashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Dynamic help generation by estimating user's mental model in spoken dialogue systems. - Dinoj Surendran, Gina-Anne Levow:
Dialog act tagging with support vector machines and hidden Markov models.
Segmentation and VAD
- Ángel de la Torre, Javier Ramírez, M. Carmen Benítez, José C. Segura, Luz García, Antonio J. Rubio:
Noise robust model-based voice activity detection. - Yu Shi, Frank K. Soong, Jian-Lai Zhou:
Auto-segmentation based VAD for robust ASR. - Kofi Boakye, Andreas Stolcke:
Improved speech activity detection using cross-channel features for recognition of multiparty meetings. - Yusuke Kida, Tatsuya Kawahara:
Evaluation of voice activity detection by combining multiple features with weight adaptation. - Keansub Lee, Daniel P. W. Ellis:
Voice activity detection in personal audio recordings using autocorrelogram compensation. - Ryan Rifkin, Nima Mesgarani:
Discriminating speech and non-speech with regularized least squares.
Technologies for Specific Populations: Learners and Challenged
- John Lee, Stephanie Seneff:
Automatic grammar correction for second-language learners. - Ambra Neri, Catia Cucchiarini, Helmer Strik:
ASR-based corrective feedback on pronunciation: does it really work? - Minghui Dong, Haizhou Li, Tin Lay Nwe:
Evaluating prosody of Mandarin speech for language learning. - Isabel Trancoso, Carlos Duarte, António Joaquim Serralheiro, Diamantino Caseiro, Luís Carriço, Céu Viana:
Spoken language technologies applied to digital talking books. - Akemi Iida, Jun Ito, Shimpei Kajima, Tsutomu Sugawara:
Building an English speech synthesis system from a Japanese ALS patient's voice. - Alexey Karpov, Andrey Ronzhin, Alexandre Cadiou:
Multi-modal system ICANDO: intellectual computer assistant for disabled operators.
The Prosody of Turn-Taking and Dialog Acts
- Gabriel Skantze, David House, Jens Edlund:
User responses to prosodic variation in fragmentary grounding utterances in dialog. - Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Analysis of prosodic and linguistic cues of phrase finals for turn-taking and dialog acts. - David Schlangen:
From reaction to prediction: experiments with computational models of turn-taking. - Jáchym Kolár, Elizabeth Shriberg, Yang Liu:
On speaker-specific prosodic models for automatic dialog act segmentation of multi-party meetings. - Nigel G. Ward, Yaffa Al Bayyari:
A case study in the identification of prosodic cues to turn-taking: back-channeling in Arabic. - Jens Edlund, Mattias Heldner:
/nailon/ - software for online analysis of prosody.
Multichannel Speech Enhancement/Speech Perception
- Junfeng Li, Masato Akagi, Yôiti Suzuki:
Improved hybrid microphone array post-filter by integrating a robust speech absence probability estimator for speech enhancement. - Timo Gerkmann, Rainer Martin:
Soft decision combining for dual channel noise reduction. - Guo Chen, Vijay Parsa:
An improved affine projection algorithm based crosstalk resistant adaptive noise canceller. - Stamatios Lefkimmiatis, Dimitrios Dimitriadis, Petros Maragos:
An optimum microphone array post-filter for speech applications. - Federico Flego, Maurizio Omologo:
Multi-microphone periodicity function for robust F0 estimation in real noisy and reverberant environments. - Hamid Reza Abutalebi, Majid Pourahmadi, Masoud Reza Aghabozorgi:
A new dual-microphone speech enhancement method for oriented noises. - Andrew Lovitt, Jont B. Allen:
50 years late: repeating Miller-Nicely 1955. - Shuichi Sakamoto, Tadahiro Yoshikawa, Shigeaki Amano, Yôiti Suzuki, Tadahisa Kondo:
New 20-word lists for word intelligibility test in Japanese. - Guoping Li, Mark E. Lutman:
Sparseness and speech perception in noise. - Wei Ming Liu, John S. D. Mason, Nicholas W. D. Evans, Keith A. Jellyman:
An assessment of automatic speech recognition as speech intelligibility estimation in the context of additive noise. - Marcel Wältermann, Kirstin Scholz, Alexander Raake, Ulrich Heute, Sebastian Möller:
Underlying quality dimensions of modern telephone connections. - Guo Chen, Vijay Parsa, Susan Scollie:
An ERB loudness pattern based objective speech quality measure.
Diarization in ASR
- Huazhong Ning, Ming Liu, Hao Tang, Thomas S. Huang:
A spectral clustering approach to speaker diarization. - Jindrich Zdánský:
BINSEG: an efficient speaker-based segmentation technique. - Ascensión Gallardo-Antolín, Xavier Anguera, Chuck Wooters:
Multi-stream speaker diarization systems for the meetings domain. - Carla Lopes, Fernando Perdigão:
Improved performance evaluation of speech event detectors. - José M. Pardo, Xavier Anguera, Chuck Wooters:
Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences. - Tuan Van Pham, Gernot Kubin:
Low-complexity and efficient classification of voiced/unvoiced/silence for noisy environments.
Language Model Adaptation, Refinement, and Evaluation
- Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino:
Unsupervised language model adaptation based on automatic text collection from WWW. - Yik-Cheung Tam, Tanja Schultz:
Unsupervised language model adaptation using latent semantic marginals. - David Mrva, Philip C. Woodland:
Unsupervised language model adaptation for Mandarin broadcast conversation transcription. - Dietrich Klakow:
Language model adaptation for tiny adaptation corpora. - Andrej Ljolje:
Pronunciation dependent language models.
Voice Morphing
- Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang:
Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix format. - Chung-Han Lee, Chung-Hsien Wu:
Map-based adaptation for speech conversion using adaptation data selection and non-parallel training. - Jani Nurminen, Jilei Tian, Victor Popa:
Novel method for data clustering and mode selection with application in voice conversion. - David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Julia Hirschberg:
Text-independent cross-language voice conversion. - Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. - Mikihiro Nakagiri, Tomoki Toda, Hideki Kashioka, Kiyohiro Shikano:
Improving body transmitted unvoiced speech with statistical voice conversion. - Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
An HMM-based singing voice synthesis system. - Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda:
Voice conversion based on mixtures of factor analyzers. - Jilei Tian, Jani Nurminen, Victor Popa:
Efficient Gaussian mixture model evaluation in voice conversion. - Yuji Nakano, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi:
Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis. - Zhiwei Shuang, Raimo Bakis, Slava Shechtman, Dan Chazan, Yong Qin:
Frequency warping based on mapping formant parameters. - Cheng-Yuan Lin, Jyh-Shing Roger Jang:
Automatic phonetic segmentation by using a SPM-based approach for a Mandarin singing voice corpus. - Partha Lal:
A comparison of singing evaluation algorithms.
Prosody
- Raymond W. M. Ng, Tan Lee, Wentao Gu:
Towards automatic parameter extraction of command-response model for Cantonese. - Francisco Campillo Díaz, Jan P. H. van Santen, Eduardo Rodríguez Banga:
A model for the f0 reset in corpus-based intonation approaches. - Gérard Bailly, Jan Gorisch:
Generating German intonation with a trainable prosodic model. - Seungwon Kim, Jinsik Lee, Byeongchang Kim, Gary Geunbae Lee:
Incorporating second-order information into two-step major phrase break prediction for Korean. - Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao:
Totally data-driven duration modeling based on generalized linear model for Mandarin TTS. - Özlem Öztürk, Tolga Çiloglu:
Segmental duration modeling in Turkish. - Rogier C. van Dalen, Pascal Wiggers, Léon J. M. Rothkrantz:
Lexical stress in continuous speech recognition. - Siwei Wang, Gina-Anne Levow:
Improving tone recognition with combined frequency and amplitude modelling. - Che-Kuang Lin, Lin-Shan Lee:
Latent prosodic modeling (LPM) for speech with applications in recognizing spontaneous Mandarin speech with disfluencies. - Keikichi Hirose, Hui Hu, Xiaodong Wang, Nobuaki Minematsu:
Tone recognition of continuous speech of standard Chinese using neural network and tone nucleus model. - Thamar Solorio, Olac Fuentes, Nigel G. Ward, Yaffa Al Bayyari:
Prosodic feature generation for back-channel prediction. - Wieneke Wesseling, Rob van Son, Louis C. W. Pols:
On the sufficiency and redundancy of pitch for TRP projection.
Discriminative Training
- Matthew Gibson, Thomas Hain:
Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition. - Jun Du, Peng Liu, Frank K. Soong, Jian-Lai Zhou, Ren-Hua Wang:
Minimum divergence based discriminative training. - Xinwei Li, Hui Jiang:
Solving large margin estimation of HMMs via semidefinite programming. - Dong Yu, Li Deng, Xiaodong He, Alex Acero:
Use of incrementally regulated discriminative margins in MCE training for speech recognition. - Jinyu Li, Ming Yuan, Chin-Hui Lee:
Soft margin estimation of hidden Markov model parameters. - Ye-Yi Wang, Alex Acero:
Discriminative models for spoken language understanding.
Speech Synthesis
- Guillaume Gibert, Gérard Bailly, Frédéric Elisei:
Evaluating a virtual speech cuer. - Laura Mayfield Tomokiyo, Kay Peterson, Alan W. Black, Kevin A. Lenzo:
Intelligibility of machine translation output in speech synthesis. - Makoto Tachibana, Takashi Nose, Junichi Yamagishi, Takao Kobayashi:
A technique for controlling voice quality of synthetic speech using multiple regression HSMM. - Tatyana Polyakova, Antonio Bonafonte:
Learning from errors in grapheme-to-phoneme conversion. - Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano:
Eigenvoice conversion based on Gaussian mixture model. - Brian Langner, Rohit Kumar, Arthur Chan, Lingyun Gu, Alan W. Black:
Generating time-constrained audio presentations of structured information.
Multimodal Processing
- Fawaz Alsaade, Aladdin M. Ariyaeeinia, L. Meng, Amit S. Malegaonkar:
Multimodal authentication using qualitative support vector machines. - Vassilis Pitsikalis, Athanassios Katsamanis, George Papandreou, Petros Maragos:
Adaptive multimodal fusion by uncertainty compensation. - Debra M. Hardison:
Effects of familiarity with faces and voices on second-language speech processing: components of memory traces. - Satoshi Tamura, Koji Hashimoto, Jiong Zhu, Satoru Hayamizu, Hirotsugu Asai, Hideki Tanahashi, Makoto Kanagawa:
Automatic metadata generation and video editing based on speech and image recognition for medical education contents. - Ibrahim Almajai, Ben Milner, Jonathan Darch:
Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise. - Oxana Govokhina, Gérard Bailly, Gaspard Breton, Paul C. Bagshaw:
TDA: a new trainable trajectory formation system for facial animation.
Speech Analysis
- Giorgio Biagetti, Paolo Crippa, Claudio Turchetti:
Modeling of speech signals based on Bessel-like orthogonal transform. - Pamornpol Jinachitra:
Glottal closure and opening detection for flexible parametric voice coding. - Jan Trmal, Jan Vanek, Ludek Müller, Jan Zelinka:
Independent components for acoustic modeling. - Daryush D. Mehta, Thomas F. Quatieri:
Pitch-scale modification using the modulated aspiration noise source. - Tony Ezzat, Jake V. Bouvrie, Tomaso A. Poggio:
Max-Gabor analysis and synthesis of spectrograms. - Pedro J. Quintana-Morales, Juan L. Navarro-Mesa, Antonio G. Ravelo-García, Fernando D. Lorenzo-García:
Monitoring of the natural voice variations in open and closed phases with frequency warped ARMA modeling. - Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono, Shigeki Sagayama:
Speech analyzer using a joint estimation model of spectral envelope and fine structure. - Andrew Errity, John McKenna:
An investigation of manifold learning for speech analysis. - Jake V. Bouvrie, Tony Ezzat:
An incremental algorithm for signal reconstruction from short-time Fourier transform magnitude. - Toru Takahashi, Masashi Nishi, Toshio Irino, Hideki Kawahara:
Automatic assignment of anchoring points on vowel templates for defining correspondence between time-frequency representations of speech samples. - S. Prasad, Sundararajan Srinivasan, M. Pannuri, Georgios Y. Lazarou, Joseph Picone:
Nonlinear dynamical invariants for speech recognition.
Advances in Noisy ASR
- Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition. - Sébastien Demange, Christophe Cerisara, Jean Paul Haton:
Missing data mask models with global frequency and temporal constraints. - Hemant Misra, Jithendra Vepa, Hervé Bourlard:
Multi-stream ASR: an oracle perspective. - Koji Iwano, Kaname Kojima, Sadaoki Furui:
A weight estimation method using LDA for multi-band speech recognition. - Chang-Wen Hsu, Lin-Shan Lee:
Powered cepstral normalization (p-CN) for robust features in speech recognition. - Pei Ding, Lei He, Xiang Yan, Jie Hao:
Robust automatic speech recognition for accented Mandarin in car environments. - Xugang Lu, Masashi Unoki, Masato Akagi:
A robust feature extraction based on the MTF concept for speech recognition in reverberant environment. - Young Joon Kim, Woohyung Lim, Nam Soo Kim:
Clean speech feature estimation based on soft spectral masking. - Mansoor Vali, Seyyed Ali Seyyed Salehi, Kazem Karimi:
Robust speech recognition by modifying clean and telephone feature vectors using bidirectional neural network. - Chung-fu Tai, Jeih-Weih Hung:
Silence energy normalization for robust speech recognition in additive noise environment. - Maarten Van Segbroeck, Hugo Van hamme:
Handling convolutional noise in missing data automatic speech recognition. - Norihide Kitaoka, Souta Hamaguchi, Seiichi Nakagawa:
Noisy speech recognition based on selection of multiple noise suppression methods using noise GMMs. - Guillermo Aradilla, Jithendra Vepa, Hervé Bourlard:
Using posterior-based features in template matching for speech recognition. - Yasunari Obuchi, Nobuo Hataoka:
Hypothesis-based feature combination of multiple speech inputs for robust speech recognition in automotive environments.
Source Separation and Localization
- Zbynek Koldovský, Jan Nouza, Jan Kolorenc:
Continuous time-frequency masking method for blind speech separation with adaptive choice of threshold parameter using ICA. - Yanxue Liang, Ichiro Hagiwara:
Multistage convolutive blind source separation for speech mixture. - Futoshi Asano, Jun Ogata:
Detection and separation of speech events in meeting recordings. - Alberto Abad, Carlos Segura, Dusan Macho, Javier Hernando, Climent Nadeu:
Audio person tracking in a smart-room environment. - Tobias Gehrig, Ulrich Klee, John W. McDonough, Shajith Ikbal, Matthias Wölfel, Christian Fügen:
Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters. - Martin Heckmann, Tobias Rodemann, Björn Schölling, Frank Joublin, Christian Goerick:
Modeling the precedence effect for binaural sound source localization in noisy and echoic environments. - Fotios Talantzis, Anthony G. Constantinides, Lazaros C. Polymenakos:
Using a differential microphone array to estimate the direction of arrival of two acoustic sources. - Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer:
Speaker localization based on oriented global coherence field. - Mohammad H. Radfar, Richard M. Dansereau, Abolghasem Sayadiyan:
Performance evaluation of three features for model-based single channel speech separation problem. - Mikkel N. Schmidt, Rasmus Kongsgaard Olsson:
Single-channel speech separation using sparse non-negative matrix factorization. - Rong Hu, Yunxin Zhao:
Adaptive speech enhancement for speech separation in diffuse noise. - Hagai Thomas Attias:
A probabilistic graphical model for microphone array source separation using rich pre-trained source models. - Erik Visser:
Geometrically constrained permutation-free source separation in an undercomplete speech unmixing scenario. - Dirk Olszewski, Klaus Linhard:
Highly directional multi-beam audio loudspeaker.