EUROSPEECH/INTERSPEECH 2003: Geneva, Switzerland
8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - INTERSPEECH 2003, Geneva, Switzerland, September 1-4, 2003. ISCA 2003
Plenary Talks
- Kenneth Ward Church:
Speech and language processing: where have we been and where are we going? 1-4 - Birger Kollmeier:
Auditory principles in speech processing - do computers need silicon ears? 5-8
Aurora Noise Robustness on SMALL Vocabulary Databases
- Kaisheng Yao, Erik M. Visser, Oh-Wook Kwon, Te-Won Lee:
A speech processing front-end with eigenspace normalization for robust speech recognition in noisy automobile environments. 9-12 - Yiu-Pong Lai, Man-Hung Siu:
Maximum likelihood normalization for robust speech recognition. 13-16 - Veronique Stouten, Hugo Van hamme, Kris Demuynck, Patrick Wambacq:
Robust speech recognition using model-based feature enhancement. 17-20 - Jian Wu, Qiang Huo:
Several HKU approaches for robust speech recognition and their evaluation on Aurora connected digit recognition tasks. 21-24 - Yadong Wang, Jesse Hansen, Gopi Krishna Allu, Ramdas Kumaresan:
Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database. 25-28 - Akira Sasou, Futoshi Asano, Kazuyo Tanaka, Satoshi Nakamura:
Adaptation of acoustic model using the gain-adapted HMM decomposition method. 29-32
ISCA Special Interest Group Session: "Hot Topics" in Speech Science and Technology
- Jean-François Bonastre, Frédéric Bimbot, Louis-Jean Boë, Joseph P. Campbell, Douglas A. Reynolds, Ivan Magrin-Chagnolleau:
Person authentication by voice: a need for caution. 33-36 - Gérard Bailly, Nick Campbell, Bernd Möbius:
ISCA special session: hot topics in speech synthesis. 37-40 - Béatrice de Gelder:
Perceiving emotions by ear and by eye. 41-44 - Steven Greenberg:
Strategies for automatic multi-tier annotation of spoken language corpora. 45-48 - Lin-Shan Lee, Yuan Ho, Jia-fu Chen, Shun-Chuan Chen:
Why is the special structure of the language important for Chinese spoken language processing? - examples on spoken document retrieval, segmentation and summarization. 49-52
Speech Signal Processing 1-4
- Luis Weruaga, Marián Képesi:
Speech analysis with the short-time chirp transform. 53-56 - Ixone Arroabarren, Alfonso Carlosena:
Glottal spectrum based inverse filtering. 57-60 - G. V. Kiran, Thippur V. Sreenivas:
A novel method of analysing and comparing responses of hearing aid algorithms using auditory time-frequency representation. 61-64 - Kuldip K. Paliwal, Bishnu S. Atal:
Frequency-related representation of speech. 65-68 - Vikas C. Raykar, Ramani Duraiswami, B. Yegnanarayana, S. R. Mahadeva Prasanna:
Tracking a moving speaker using excitation source information. 69-72 - Li Deng, Issam Bazzi, Alex Acero:
Tracking vocal tract resonances using an analytical nonlinear predictor and a target-guided temporal constraint. 73-76 - Khosrow Lashkari, Toshio Miki:
Optimization of the CELP model in the LSP domain. 1709-1712 - Ben Gillett, Simon King:
Transforming voice quality. 1713-1716 - Yusuke Hioka, Nozomu Hamada:
DOA estimation of speech signal using equilateral-triangular microphone array. 1717-1720 - Ilyas Potamitis, George Tremoulis, Nikos Fakotakis, George Kokkinakis:
Multi-array fusion for beamforming and localization of moving speakers. 1721-1724 - Xu Shao, Ben P. Milner, Stephen J. Cox:
Integrated pitch and MFCC extraction for speech reconstruction and speech recognition applications. 1725-1728 - Lasse Laaksonen, Sakari Himanen, Ari Heikkinen, Jani Nurminen:
Exploiting time warping in AMR-NB and AMR-WB speech coders. 1729-1732 - Stephan Grashey:
A new approach to voice activity detection based on self-organizing maps. 1733-1736 - Yoshinori Shiga, Simon King:
Estimating the spectral envelope of voiced speech using multi-frame analysis. 1737-1740 - Essa Jafer, Abdulhussain E. Mahdi:
Adaptive noise estimation using second generation and perceptual wavelet transforms. 1741-1744 - Julien Bourgeois:
A clustering approach to on-line audio source separation. 1745-1748 - Yoshinori Shiga, Simon King:
Estimation of voice source and vocal tract characteristics based on multi-frame analysis. 1749-1752 - Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel:
A new method for pitch prediction from spectral envelope and its application in voice conversion. 1753-1756 - Marco Orlandi, Alfiero Santarelli, Daniele Falavigna:
Maximum likelihood endpoint detection with time-domain features. 1757-1760 - Ixone Arroabarren, Alfonso Carlosena:
Unified analysis of glottal source spectrum. 1761-1764 - Aïcha Bouzid, Noureddine Ellouze:
Local regularity analysis at glottal opening and closure instants in electroglottogram signal using wavelet transform modulus maxima. 2837-2840 - Martin Schafföner, Marcel Katz, Sven E. Krüger, Andreas Wendemuth:
Improved robustness of automatic speech recognition using a new class definition in linear discriminant analysis. 2841-2844 - Oytun Türk, Levent M. Arslan:
Voice conversion methods for vocal tract and pitch contour modification. 2845-2848 - Olaf Schreiner:
Modulation spectrum for pitch and speech pause detection. 2849-2852 - Dimitrios Dimitriadis, Petros Maragos:
Robust energy demodulation based on continuous models with application to speech recognition. 2853-2856 - Jong Uk Kim, Sang-Gyun Kim, Chang D. Yoo:
A robust and sensitive word boundary decision algorithm. 2857-2860 - Seongho Seo, Dalwon Jang, Sunil Lee, Chang D. Yoo:
A novel transcoding algorithm for SMV and G.723.1 speech coders via direct parameter transformation. 2861-2864 - Dalwon Jang, Seongho Seo, Sunil Lee, Chang D. Yoo:
A novel rate selection algorithm for transcoding CELP-type codec and SMV. 2865-2868 - Gary Choy, David Hermann, Robert L. Brennan, Todd Schneider, Hamid Sheikhzadeh, Etienne Cornu:
Subband-based acoustic shock limiting algorithm on a low-resource DSP system. 2869-2872 - Patricia A. Pelle, Matias L. Capeletto:
Pitch estimation using phase locked loops. 2873-2876 - Dhany Arifianto, Takao Kobayashi:
Performance evaluation of IFAS-based fundamental frequency estimator in noisy environment. 2877-2880 - Hans Kruschke, Michael Lenz:
Estimation of the parameters of the quantitative intonation model with continuous wavelet analysis. 2881-2884 - Francisco Romero Rodriguez, Wei Ming Liu, Nicholas W. D. Evans, John S. D. Mason:
Morphological filtering of speech spectrograms in the context of additive noise. 2885-2888 - Guillaume Lathoud, Iain McCowan, Darren Moore:
Segmenting multiple concurrent speakers using microphone arrays. 2889-2892 - T. Nagarajan, Hema A. Murthy, Rajesh M. Hegde:
Segmentation of speech into syllable-like units. 2893-2896 - Massimo Petrillo, Francesco Cutugno:
A syllable segmentation algorithm for English and Italian. 2913-2916 - Ashish Verma, Arun Kumar:
Modeling speaking rate for voice fonts. 2917-2920 - Jouni Pohjalainen:
A new HMM-based approach to broad phonetic classification of speech. 2921-2924 - Xin Zhong, Mark A. Clements, Sung Lim:
Acoustic change detection and segment clustering of two-way telephone conversations. 2925-2928 - David N. Levin:
Blind normalization of speech from different channels. 2929-2932 - Aparna Gurijala, John R. Deller Jr.:
Speech watermarking by parametric embedding with an l_(infinity) fidelity criterion. 2933-2936
Phonology and Phonetics I
- Shu-Chuan Tseng:
Features of contracted syllables of spontaneous Mandarin. 77-80 - K. Samudravijaya:
Durational characteristics of Hindi stop consonants. 81-84 - Toshiko Isei-Jaakkola:
Quantity comparison of Japanese and Finnish in various word structures. 85-88 - Mary Baltazani:
Broad focus across sentence types in Greek. 89-92 - Chatchawarn Hansakunbuntheung, Virongrong Tesprasit, Rungkarn Siricharoenchai, Yoshinori Sagisaka:
Analysis and modeling of syllable duration for Thai speech synthesis. 93-96 - Aoju Chen:
Reaction time as an indicator of discrete intonational contrasts in English. 97-100 - Dafydd Gibbon:
Corpus-based syntax-prosody tree matching. 761-764 - D. W. Ying, W. Gao, W. Q. Wang:
A new approach to segment and detect syllables from high-speed speech. 765-768 - R. J. J. H. van Son, Louis C. W. Pols:
Information structure and efficiency in speech production. 769-772 - Anna Corazza, Louis ten Bosch:
Learning rule ranking by dynamic construction of context-free grammars using AND/OR graphs. 773-776 - Elena Zvonik, Fred Cummins:
The effect of surrounding phrase lengths on pause duration. 777-780 - Shigeki Okawa, Katsuhiko Shirai:
Statistical estimation of phoneme's most stable point based on universal constraint. 781-784 - Nicole Beringer:
Independent automatic segmentation by self-learning categorial pronunciation rules. 785-788 - Bettina Braun, D. Robert Ladd:
Prosodic correlates of contrastive and non-contrastive themes in German. 789-792 - Yiya Chen:
Accentual lengthening in standard Chinese: evidence from four-syllable constituents. 793-796 - Supphanat Kanokphara:
Syllable structure based phonetic units for context-dependent continuous Thai speech recognition. 797-800 - Fang Hu:
An acoustic phonetic analysis of diphthongs in Ningbo Chinese. 801-804 - Takashi Otake, Yoko Sakamoto:
Latent ability to manipulate phonemes by Japanese preliterates in roman alphabet. 805-808 - Hartmut R. Pfitzinger:
The /i/-/a/-/u/-ness of spoken vowels. 809-812
Topics in Prosody and Emotional Speech
- Ben Gillett, Simon King:
Transforming F0 contours. 101-104 - Norman D. Cook, Takeshi Fujisawa, Kazuaki Takami:
Evaluation of the affect of speech intonation using a model of the perception of interval dissonance and harmonic tension. 105-108 - Wen-Hsing Lai, Yih-Ru Wang, Sin-Horng Chen:
A new pitch modeling approach for Mandarin speech. 109-112 - Panagiotis Zervas, Manolis Maragoudakis, Nikos Fakotakis, George Kokkinakis:
Bayesian induction of intonational phrase breaks. 113-116 - Thibaut Ehrette, Noël Chateau, Christophe d'Alessandro, Valérie Maffiolo:
Predicting the perceptive judgment of voices in a telecom context: selection of acoustic parameters. 117-120 - Sven L. Mattys:
Stress-based speech segmentation revisited. 121-124 - Oh-Wook Kwon, Kwokleung Chan, Jiucang Hao, Te-Won Lee:
Emotion recognition by speech signals. 125-128 - Fabio Tamburini:
Automatic prosodic prominence detection in speech using acoustic features: an unsupervised system. 129-132 - Vladimir Hozjan, Zdravko Kacic:
Improved emotion recognition with large set of statistical features. 133-136 - Patavee Charnvivit, Nuttakorn Thubthong, Ekkarit Maneenoi, Sudaporn Luksaneeyanawin, Somchai Jitapunkul:
Recognition of intonation patterns in Thai utterance. 137-140 - Keikichi Hirose, Yusuke Furuyama, Shuichi Narusawa, Nobuaki Minematsu, Hiroya Fujisaki:
Use of linguistic information for automatic extraction of f_0 contour generation process model parameters. 141-144 - Marion Dohen, Hélène Loevenbruck, Marie-Agnès Cathiard, Jean-Luc Schwartz:
Potential audiovisual correlates of contrastive focus in French. 145-148 - Toshie Hatano, Yasuo Horiuchi, Akira Ichikawa:
How does human segment the speech by prosody? 149-152 - Brenton D. Walker, Bradley C. Lackey, Jennifer S. Muller, Patrick John Schone:
Language-reconfigurable universal phone recognition. 153-156 - Chul Min Lee, Shrikanth S. Narayanan:
Emotion recognition using a data-driven fuzzy inference system. 157-160 - Noriko Suzuki, Yohei Yabuta, Yugo Takeuchi, Yasuhiro Katagiri:
Effects of voice prosody by computers on human behaviors. 161-164 - Oliver Jokisch, Marco Kühne:
An investigation of intensity patterns for German. 165-168 - João Paulo Ramos Teixeira, Diamantino Freitas:
Segmental durations predicted with a neural network. 169-172 - Takumi Yamashita, Yoshinori Sagisaka:
Generation and perception of f_0 markedness in conversational speech with adverbs expressing degrees. 173-176 - Hansjörg Mixdorff, Nguyen Hung Bach, Hiroya Fujisaki, Chi Mai Luong:
Quantitative analysis and synthesis of syllabic tones in Vietnamese. 177-180 - Shinya Kiriyama, Yoshifumi Mitsuta, Yuta Hosokawa, Yoshikazu Hashimoto, Toshihiko Itoh, Shigeyoshi Kitazawa:
Japanese prosodic labeling support system utilizing linguistic information. 181-184 - Véronique Aubergé, Nicolas Audibert, Albert Rilliard:
Why and how to control the authentic emotional speech corpora. 185-188 - Laurence Devillers, Ioana Vasilescu:
Prosodic cues for emotion characterization in real-life spoken dialogs. 189-192
Language Modeling, Discourse and Dialog
- Joseph Polifroni, Grace Chung, Stephanie Seneff:
Towards the automatic generation of mixed-initiative dialogue systems from web content. 193-196 - Edward Filisko, Stephanie Seneff:
A context resolution server for the galaxy conversational systems. 197-200 - Hilda Hardy, Kirk Baker, Hélène Bonneau-Maynard, Laurence Devillers, Sophie Rosset, Tomek Strzalkowski:
Semantic and dialogic annotation for automated multilingual customer service. 201-204 - H. B. M. Nicholson, Ellen Gurman Bard, Anne H. Anderson, María L. Flecha-García, David Kenicer, Lucy Smallwood, Jim Mullin, Robin J. Lickley, Yiya Chen:
Disfluency under feedback and time-pressure. 205-208 - Peter A. Heeman, Fan Yang, Susan E. Strayer:
Control in task-oriented dialogues. 209-212 - Kevin McTait, Martine Adda-Decker:
The 300k LIMSI German broadcast news transcription system. 213-216 - Jilei Tian, Janne Suontausta, Juha Häkkinen:
Weighted entropy training for the decision tree based text-to-phoneme mapping. 217-220 - Yoshihiko Ogawa, Hirofumi Yamamoto, Yoshinori Sagisaka, Gen-ichiro Kikui:
Word class modeling for speech recognition with out-of-task words using a hierarchical language model. 221-224 - Roeland Ordelman, Arjan van Hessen, Franciska de Jong:
Compound decomposition in Dutch large vocabulary speech recognition. 225-228 - Guergana K. Savova, Joan Bachenko:
Designing for errors: similarities and differences of disfluency rates and prosodic characteristics across domains. 229-232 - Mirjam Wester:
Syllable classification using articulatory-acoustic features. 233-236 - Imed Zitouni, Olivier Siohan, Chin-Hui Lee:
Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition. 237-240 - Sergio Barrachina, Juan Miguel Vilar:
Incremental and iterative monolingual clustering algorithms. 241-244 - Anand Venkataraman, Wen Wang:
Techniques for effective vocabulary selection. 245-248 - Lucian Galescu:
Recognition of out-of-vocabulary words with sub-lexical language models. 249-252 - Hélène Bonneau-Maynard, Sophie Rosset:
A semantic representation for spoken dialogs. 253-256 - Martine Adda-Decker:
A corpus-based decompounding algorithm for German lexical modeling in LVCSR. 257-260 - Kyong-Nim Lee, Minhwa Chung:
Modeling cross-morpheme pronunciation variations for Korean large vocabulary continuous speech recognition. 261-264
Speech Synthesis: Unit Selection 1, 2
- Yi Zhou, Yiqing Zu:
Unit selection based on voice recognition. 265-268 - Jun Xu, Thomas Choy, Minghui Dong, Cuntai Guan, Haizhou Li:
On unit analysis for Cantonese corpus-based TTS. 269-272 - Tanya Lambert, Andrew P. Breen, Barry Eggleton, Stephen J. Cox, Ben P. Milner:
Unit selection in concatenative TTS synthesis systems based on mel filter bank amplitudes and phonetic context. 273-276 - Baris Bozkurt, Özlem Öztürk, Thierry Dutoit:
Text design for TTS speech corpus building using a modified greedy selection. 277-280 - Seung Seop Park, Chong Kyu Kim, Nam Soo Kim:
Discriminative weight training for unit-selection based speech synthesis. 281-284 - Peter Rutten, Justin Fackrell:
The application of interactive speech unit selection in TTS systems. 285-288 - Francisco Campillo Díaz, Eduardo Rodríguez Banga:
On the design of cost functions for unit-selection speech synthesis. 289-292 - Jithendra Vepa, Simon King:
Kalman-filter based join cost for unit-selection speech synthesis. 293-296 - Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki:
Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations. 297-300 - Jindrich Matousek, Daniel Tihelka, Josef Psutka:
Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction. 301-304 - Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen, Sen-Chia Chang:
Automatic speech segmentation and verification for concatenative synthesis. 305-308 - Sérgio Paulo, Luís C. Oliveira:
DTW-based phonetic alignment using multiple acoustic features. 309-312 - John Kominek, Christina L. Bennett, Alan W. Black:
Evaluating and correcting phoneme segmentation for unit selection synthesis. 313-316 - Esther Klabbers, Jan P. H. van Santen:
Control and prediction of the impact of pitch modification on synthetic speech quality. 317-320 - Matthew P. Aylett, Justin Fackrell, Peter Rutten:
My voice, your prosody: sharing a speaker specific prosody model across speakers in unit selection TTS. 321-324 - Virongrong Tesprasit, Paisarn Charoenpornsawat, Virach Sornlertlamvanich:
Learning phrase break detection in Thai text-to-speech. 325-328 - Alexander Kain, Jan P. H. van Santen:
A speech model of acoustic inventories based on asynchronous interpolation. 329-332 - Keikichi Hirose, Takayuki Ono, Nobuaki Minematsu:
Corpus-based synthesis of fundamental frequency contours of Japanese using automatically-generated prosodic corpus and generation process model. 333-336 - S. Prahallad Kishore, Alan W. Black:
Unit size in unit selection speech synthesis. 1317-1320 - Antje Schweitzer, Norbert Braunschweiler, Tanja Klankert, Bernd Möbius, Bettina Säuberlich:
Restricted unlimited domain synthesis. 1321-1324 - Hélène François, Olivier Boëffard:
Evaluation of units selection criteria in corpus-based speech synthesis. 1325-1328 - Michael Pucher, Friedrich Neubarth, Erhard Rank, Georg Niklfeld, Qi Guan:
Combining non-uniform unit selection with diphone based synthesis. 1329-1332 - Francesc Alías, Xavier Llorà:
Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis. 1333-1336 - Ove Andersen, Charles Hoequist:
Keeping rare events rare. 1337-1340
Aurora Noise Robustness on LARGE Vocabulary Databases
- Naveen Parihar, Joseph Picone:
Analysis of the Aurora large vocabulary evaluations. 337-340 - Florian Hilger, Hermann Ney:
Evaluation of quantile based histogram equalization with filter combination on the Aurora 3 and 4 databases. 341-344 - Luca Rigazio, Patrick Nguyen, David Kryze, Jean-Claude Junqua:
Large vocabulary noise robustness on Aurora4. 345-348 - Veronique Stouten, Hugo Van hamme, Jacques Duchateau, Patrick Wambacq:
Evaluation of model-based feature enhancement on the AURORA-4 task. 349-352 - José C. Segura, Javier Ramírez, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:
Improved feature extraction based on spectral noise reduction and nonlinear feature normalization. 353-356 - Young Joon Kim, Hyun Woo Kim, Woohyung Lim, Nam Soo Kim:
Feature compensation technique for robust speech recognition in noisy environments. 357-360
Multilingual Speech-to-Speech Translation
- Hermann Ney:
The statistical approach to machine translation and a roadmap for speech translation. 361-364 - Yuqing Gao:
Coupling vs. unifying: modeling techniques for speech-to-speech translation. 365-368 - Alex Waibel, Ahmed Badran, Alan W. Black, Robert E. Frederking, Donna Gates, Alon Lavie, Lori S. Levin, Kevin A. Lenzo, Laura Mayfield Tomokiyo, Jürgen Reichert, Tanja Schultz, Dorcas Wallace, Monika Woszczyna, Jing Zhang:
Speechalator: two-way speech-to-speech translation on a consumer PDA. 369-372 - Horacio Franco, Jing Zheng, Kristin Precoda, Federico Cesari, Victor Abrash, Dimitra Vergyri, Anand Venkataraman, Harry Bratt, Colleen Richey, Ace Sarich:
Development of phrase translation systems for handheld computers: from concept to field. 373-376 - Marcello Federico:
Evaluation frameworks for speech translation technologies. 377-380 - Gen-ichiro Kikui, Eiichiro Sumita, Toshiyuki Takezawa, Seiichi Yamamoto:
Creating corpora for speech-to-speech translation. 381-384
Prosody
- Nobuaki Minematsu, Bungo Matsuoka, Keikichi Hirose:
Prosodic analysis and modeling of the NAGAUTA singing to synthesize its prosodic patterns from the standard notation. 385-388 - Davood Gharavian, Seyed Mohammad Ahadi:
Statistical evaluation of the influence of stress on pitch frequency and phoneme durations in Farsi language. 389-392 - Ken Chen, Sarah Borys, Mark Hasegawa-Johnson, Jennifer Cole:
Prosody dependent speech recognition with explicit duration modelling at intonational phrase boundaries. 393-396 - João Paulo Ramos Teixeira, Diamantino Freitas, Hiroya Fujisaki:
Prediction of Fujisaki model's phrase commands. 397-400 - Makiko Muto, Yoshinori Sagisaka, Takuro Naito, Daiju Maeki, Aki Kondo, Katsuhiko Shirai:
Corpus-based modeling of naturalness estimation in timing control for non-native speech. 401-404 - Carlos Toshinori Ishi, Parham Mokhtari, Nick Campbell:
Perceptually-related acoustic-prosodic features of phrase finals in spontaneous speech. 405-408
Language Modeling
- David Langlois, Kamel Smaïli, Jean Paul Haton:
Efficient linear combination for distant n-gram models. 409-412 - Ahmad Emami:
Improving a connectionist based syntactical language model. 413-416 - Mikio Nakano, Timothy J. Hazen:
Using untranscribed user utterances for improving language models based on confidence scoring. 417-420 - Pi-Chuan Chang, Shuo-Peng Liao, Lin-Shan Lee:
Improved Chinese broadcast news transcription by language modeling with temporally consistent training corpora and iterative phrase extraction. 421-424 - Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh:
Language model adaptation using word clustering. 425-428 - Ian R. Lane, Tatsuya Kawahara, Tomoko Matsui, Satoshi Nakamura:
Hierarchical topic classification for dialog speech recognition based on language model switching. 429-432
Speech Modeling and Features 1-4
- Paavo Alku, Tom Bäckström:
Linear predictive method with low-frequency emphasis. 433-436 - Pratibha Jain, Hynek Hermansky:
Beyond a single critical-band in TRAP based ASR. 437-440 - Fabio Valente, Christian Wellekens:
Variational Bayesian GMM for speech recognition. 441-444 - Yamato Wada, Masahide Sugiyama:
Time alignment for scenario and sounds with voice, music and BGM. 445-448 - Phu Chien Nguyen, Masato Akagi:
Efficient quantization of speech excitation parameters using temporal decomposition. 449-452 - Robert van Kommer, Béat Hirsbrunner:
Distributed genetic algorithm to discover a wavelet packet best basis for speech recognition. 453-456 - Chao-Shih Huang, Chin-Hui Lee, Hsiao-Chuan Wang:
New model-based HMM distances with applications to run-time ASR error estimation and model tuning. 457-460 - Tokihiko Kaburagi, Koji Kawai:
Analysis of voice source characteristics using a constrained polynomial model. 461-464 - Jinfu Ni, Hisashi Kawai:
Tone pattern discrimination combining parametric modeling and maximum likelihood estimation. 465-468 - Stuart N. Wrigley, Guy J. Brown, Vincent Wan, Steve Renals:
Feature selection for the classification of crosstalk in multi-channel audio. 469-472 - Jingwei Liu:
A DTW-based DAG technique for speech and speaker feature analysis. 473-476 - Panu Somervuo, Barry Y. Chen, Qifeng Zhu:
Feature transformations and combinations for improving ASR performance. 477-480 - Chiu-yu Tseng:
On the role of intonation in the organization of Mandarin Chinese speech prosody. 481-484 - Yuichi Ohkawa, Akihiro Yoshida, Motoyuki Suzuki, Akinori Ito, Shozo Makino:
An optimized multi-duration HMM for spontaneous speech recognition. 485-488 - Hyoung-Gook Kim, Edgar Berdahl, Nicolas Moreau, Thomas Sikora:
Speaker recognition using MPEG-7 descriptors. 489-492 - Wolfgang Macherey, Hermann Ney:
A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition. 493-496 - András Zolnay, Ralf Schlüter, Hermann Ney:
Extraction methods of voicing feature for robust speech recognition. 497-500 - Luca Armani, Marco Matassoni, Maurizio Omologo, Piergiorgio Svaizer:
Use of a CSP-based voice activity detector for distant-talking ASR. 501-504 - Mohamed Kamal Omar, Mark Hasegawa-Johnson:
Maximum conditional mutual information projection for speech recognition. 505-508 - Dafydd Gibbon, Ulrike Gut, Benjamin Hell, Karin Looks, Alexandra Thies, Thorsten Trippel:
A computational model of arm gestures in conversation. 813-816 - Vassilis Pitsikalis, Iasonas Kokkinos, Petros Maragos:
Nonlinear analysis of speech signals: generalized dimensions and Lyapunov exponents. 817-820 - Petr Motlícek, Jan Cernocký:
Time-domain based temporal processing with application of orthogonal transformations. 821-824 - Petr Schwarz, Pavel Matejka, Jan Cernocký:
Recognition of phoneme strings using TRAP technique. 825-828 - Tibor Fegyó, Péter Mihajlik, Péter Tatai:
Comparative study on Hungarian acoustic model sets and training methods. 829-832 - Alain de Cheveigné, Alexis Baskind:
F_0 estimation of one or several voices. 833-836 - Sunil Sivadas, Hynek Hermansky:
In search of target class definition in tandem feature extraction. 837-840 - André Gustavo Adami, Hynek Hermansky:
Segmentation of speech for speaker and language recognition. 841-844 - Xiang Li, Richard M. Stern:
Feature generation based on maximum classification probability for improved speech recognition. 845-848 - Kaisheng Yao, Kuldip K. Paliwal, Te-Won Lee:
Speech recognition with a generative factor analyzed hidden Markov model. 849-852 - Barry Y. Chen, Shuangyu Chang, Sunil Sivadas:
Learning discriminative temporal patterns in speech: development of novel TRAPS-like classifiers. 853-856 - Patricia Scanlon, Daniel P. W. Ellis, Richard B. Reilly:
Using mutual information to design class-specific phone recognizers. 857-860 - Helenca Duxans, Antonio Bonafonte:
Estimation of GMM in voice conversion including unaligned data. 861-864 - Keiichi Tokuda, Heiga Zen, Tadashi Kitamura:
Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features. 865-868 - Hermann Bauerecker, Climent Nadeu, Jaume Padrell:
On the advantage of frequency-filtering features for speech recognition with variable sampling frequencies: experiments with SpeechDat-Car databases. 869-872 - Hansjörg Mixdorff, Hiroya Fujisaki, Gao Peng Chen, Yu Hu:
Towards the automatic extraction of Fujisaki model parameters for Mandarin. 873-876 - S. S. Airey, Mark J. F. Gales:
Product of Gaussians as a distributed representation for speech recognition. 877-880 - Davor Petrinovic:
Harmonic weighting for all-pole modeling of the voiced speech. 881-884 - Nobuyuki Nishizawa, Keikichi Hirose, Nobuaki Minematsu:
Estimation of resonant characteristics based on AR-HMM modeling and spectral envelope conversion of vowel sounds. 885-888 - Hynek Hermansky, Pratibha Jain:
Band-independent speech-event categories for TRAP based ASR. 1013-1016 - Frantisek Grézl, Hynek Hermansky:
Local averaging and differentiating of spectral plane for TRAP-based ASR. 1017-1020 - Matthias Wölfel, John W. McDonough, Alex Waibel:
Minimum variance distortionless response on a warped frequency scale. 1021-1024 - Xuechuan Wang, Douglas D. O'Shaughnessy:
Improving the efficiency of automatic speech recognition by feature transformation and dimensionality reduction. 1025-1028 - Jan Stadermann, Gerhard Rigoll:
Distributed speech recognition on the WSJ task. 1029-1032 - Sebastian Stüker, Florian Metze, Tanja Schultz, Alex Waibel:
Integrating multilingual articulatory features into speech recognition. 1033-1036 - Bojan Petek:
Locus equations determination using the SpeechDat(II). 2301-2304 - Michael Emonts, Deryle Lonsdale:
A memory-based approach to Cantonese tone recognition. 2305-2308 - David Escudero Mancebo, Valentín Cardeñoso-Payo, Antonio Bonafonte:
Experimental evaluation of the relevance of prosodic features in Spanish using machine learning techniques. 2309-2312 - Tomohiro Nakatani, Toshio Irino, Parham Zolfaghari:
Dominance spectrum based v/UV classification and f_0 estimation. 2313-2316 - Hiroya Fujisaki, Shuichi Narusawa, Sumio Ohno, Diamantino Freitas:
Analysis and modeling of f_0 contours of Portuguese utterances based on the command-response model. 2317-2320 - Philip J. B. Jackson, David M. Moreno, Martin J. Russell, Javier Hernando:
Covariation and weighting of harmonically decomposed streams for ASR. 2321-2324
Speech Enhancement 1, 2
- Panikos Heracleous, Satoshi Nakamura, Kiyohiro Shikano:
A semi-blind source separation method for hands-free speech recognition of multiple talkers. 509-512 - Leonid G. Krasny, Ali S. Khayrallah:
Influence of the waveguide propagation on the antenna performance in a car cabin. 513-516 - Ilyas Potamitis, George Tremoulis, Nikos Fakotakis:
Multi-speaker DOA tracking using interactive multiple models and probabilistic data association. 517-520 - Ching-Ta Lu, Hsiao-Chuan Wang:
Speech enhancement using weighting function based on the variance of wavelet coefficients. 521-524 - Ilyas Potamitis, Eran Fishler:
Microphone array voice activity detection and noise suppression using wideband generalized likelihood ratio. 525-528 - Zoran Saric, Slobodan Jovicic:
Adaptive beamforming in room with reverberation. 529-532 - Gwo-hwa Ju, Lin-Shan Lee:
Perceptually-constrained generalized singular value decomposition-based approach for enhancing speech corrupted by colored noise. 533-536 - Hiroaki Yamajo, Hiroshi Saruwatari, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Blind separation and deconvolution for convolutive mixture of speech using SIMO-model-based ICA and multichannel inverse filtering. 537-540 - D. G. Raza, C. F. Chan:
Quality enhancement of CELP coded speech by using an MFCC based Gaussian mixture model. 541-544 - Hyoung-Gook Kim, Markus Schwab, Nicolas Moreau, Thomas Sikora:
Enhancement of noisy speech for noise robust front-end and speech reconstruction at back-end of DSR system. 545-548 - Jianqiang Wei, Limin Du, Zhaoli Yan, Hui Zeng:
Improved Kalman filter-based speech enhancement. 549-552 - Toshio Irino, Roy D. Patterson, Hideki Kawahara:
Speech segregation based on fundamental event information using an auditory vocoder. 553-556 - Zhaoli Yan, Limin Du, Jianqiang Wei, Hui Zeng:
Time delay estimation based on hearing characteristic. 557-560 - Mikhail Stolbov, Serguei Koval, Mikhail Khitrov:
Parametric multi-band automatic gain control for noisy speech enhancement. 561-564 - Bernd Iser, Gerhard Schmidt:
Neural networks versus codebooks in an application for bandwidth extension of speech signals. 565-568 - Essa Jafer, Abdulhussain E. Mahdi:
Wavelet-based perceptual speech enhancement using adaptive threshold estimation. 569-572 - Ilyas Potamitis, Nikos Fakotakis, George Kokkinakis:
A trainable speech enhancement technique based on mixture models for speech and noise. 573-576 - Qiang Fu, Eric A. Wan:
Perceptual wavelet adaptive denoising of speech. 577-580 - B. Yegnanarayana, S. R. Mahadeva Prasanna, Mathew Magimai-Doss:
Enhancement of speech in multispeaker environment. 581-584 - Mitsunori Mizumachi, Satoshi Nakamura:
Noise reduction using paired-microphones on non-equally-spaced microphone arrangement. 585-588 - Nao Hodoshima, Takayuki Arai, Tsuyoshi Inoue, Keisuke Kinoshita, Akiko Kusumoto:
Improving speech intelligibility by steady-state suppression as pre-processing in small to medium sized halls. 1365-1368 - Chen-Long Lee, Ya-Ru Yang, Wen-Whei Chang, Yuan-Chuan Chiang:
Enhancement of hearing-impaired Mandarin speech. 1369-1372 - Agustín Álvarez, Victor Nieto Lluis, Pedro Gómez Vilda, Rafael Martínez:
Speech enhancement for a car environment using LP residual signal and spectral subtraction. 1373-1376 - Gwo-hwa Ju, Lin-Shan Lee:
Speech enhancement and improved recognition accuracy by integrating wavelet transform and spectral subtraction algorithm. 1377-1380 - Gaël Mahé, André Gilloire:
Multi-referenced correction of the voice timbre distortions in telephone networks. 1381-1384 - J. J. Lee, J. H. Lee, K. Y. Lee:
Efficient speech enhancement based on left-right HMM with state sequence detection using LRT. 1385-1388 - H. Gnaba, Monia Turki-Hadj Alouane, Meriem Jaïdane-Saïdane, Pascal Scalart:
Introduction of the CELP structure of the GSM coder in the acoustic echo canceller for the GSM network. 1389-1392 - David Sodoyer, Laurent Girin, Christian Jutten, Jean-Luc Schwartz:
Extracting an AV speech source from a mixture of signals. 1393-1396 - Henning Puder:
Speech enhancement for hands-free car phones by adaptive compensation of harmonic engine noise components. 1397-1400 - Zhaorong Hou, Ying Jia:
Enhance low-frequency suppression of GSC beamforming. 1401-1404 - Sriram Srinivasan, Jonas Samuelsson, W. Bastiaan Kleijn:
Speech enhancement using a-priori information. 1405-1408 - John Hogden, Patrick Valdez, Shigeru Katagiri, Erik McDermott:
Blind inversion of multidimensional functions for speech enhancement. 1409-1412 - Hamid Reza Abutalebi, Hamid Sheikhzadeh, Robert L. Brennan, George H. Freeman:
Convergence improvement for oversampled subband adaptive noise and echo cancellation. 1413-1416 - Masashi Unoki, Keigo Sakata, Masato Akagi:
A speech dereverberation method based on the MTF concept. 1417-1420 - Sang-Gyun Kim, Jong Uk Kim, Chang D. Yoo:
Accuracy improved double-talk detector based on state transition diagram. 1421-1424 - Ajay Natarajan, John H. L. Hansen, Kathryn Hoberg Arehart, Jessica Rossi-Katz:
Perceptual based speech enhancement for normal-hearing and hearing-impaired individuals. 1425-1428 - Alfonso Ortega, Eduardo Lleida, Enrique Masgrau:
Residual echo power estimation for speech reinforcement systems in vehicles. 1429-1432 - Yasheng Qian, Peter Kabal:
Dual-mode wideband speech recovery from narrowband speech. 1433-1436 - Khaldoon Al-Naimi, Christian Sturt, Ahmet M. Kondoz:
A robust noise and echo canceller. 1437-1440 - Johannes Nix, Michael Kleinschmidt, Volker Hohmann:
Computational auditory scene analysis by using statistics of high-dimensional speech dynamics and sound source direction. 1441-1444
Spoken Dialog Systems 1, 2
- Silke M. Witt, Jason D. Williams:
Two studies of open vs. directed dialog strategies in spoken dialog systems. 589-592 - Ian M. O'Neill, Philip Hanna, Xingkun Liu, Michael F. McTear:
The Queen's Communicator: an object-oriented dialogue manager. 593-596 - Dan Bohus, Alexander I. Rudnicky:
Ravenclaw: dialog management using hierarchical task decomposition and an expectation agenda. 597-600 - Klaus Macherey, Hermann Ney:
Features for tree based dialogue course management. 601-604 - Francisco Torres, Emilio Sanchis, Encarna Segarra:
Development of a stochastic dialog manager driven by semantics. 605-608 - Masashi Takeuchi, Norihide Kitaoka, Seiichi Nakagawa:
Generation of natural response timing using decision tree based on prosodic and linguistic information. 609-612 - Linda Bell, Joakim Gustafson:
Child and adult speaker adaptation during error resolution in a publicly available spoken dialogue system. 613-616 - Yannick Estève, Christian Raymond, Frédéric Béchet, Renato de Mori:
Conceptual decoding for spoken dialog systems. 617-620 - Huei-Ming Wang, Yi-Chung Lin:
Sentence verification in spoken dialogue system. 621-624 - Norihide Kitaoka, Naoko Kakutani, Seiichi Nakagawa:
Detection and recognition of correction utterance in spontaneously spoken dialog. 625-628 - Chaitanya Ekanadham, Juan M. Huerta:
Topic-specific parser design in an air travel natural language understanding application. 629-632 - Stephen J. Cox, Gavin C. Cawley:
The use of confidence measures in vector based call-routing. 633-636 - Frédéric Béchet, Giuseppe Riccardi, Dilek Z. Hakkani-Tür:
Multi-channel sentence classification for spoken dialogue language modeling. 637-640 - Stephanie Seneff, Chao Wang, Timothy J. Hazen:
Automatic induction of n-gram language models from a natural language grammar. 641-644 - David Vilar, María José Castro, Emilio Sanchis:
Connectionist classification and specific stochastic models in the understanding process of a dialogue system. 645-648 - Johan Boye, Mats Wirén:
Robust parsing of utterances in negotiative dialogue. 649-652 - Chung-Hsien Wu, Gwo-Lang Yan:
Flexible speech act identification of spontaneous speech with disfluency. 653-656 - Kohji Dohsaka, Norihito Yasuda, Kiyoaki Aikawa:
Efficient spoken dialogue control depending on the speech recognition rate and system's database. 657-660 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:
Robust speech understanding based on expected discourse plan. 661-664 - Toshihiro Isobe, Shoji Hayakawa, Hiroya Murao, Tatsuji Mizutani, Kazuya Takeda, Fumitada Itakura:
A study on domain recognition of spoken dialogue systems. 1889-1892 - Wei He, Honglian Li, Baozong Yuan:
Domain adaptation augmented by state-dependence in spoken dialog systems. 1893-1896 - Thomas Portele, Silke Goronzy, Martin C. Emele, Andreas Kellner, Sunna Torge, Jürgen te Vrugt:
SmartKom-Home - an advanced multi-modal interface to home entertainment. 1897-1900 - Yunbiao Xu, Fengying Di, Masahiro Araki, Yasuhisa Niimi:
Methods to improve its portability of a spoken dialog system both on task domains and languages. 1901-1904 - Tibor Fegyó, Péter Mihajlik, Máté Szarvas, Péter Tatai, Gábor Tatai:
Voxenter™ - intelligent voice enabled call center for Hungarian. 1905-1908 - Qiang Huang, Stephen J. Cox:
Automatic call-routing without transcriptions. 1909-1912 - Markku Turunen, Jaakko Hakulinen:
Jaspis² - an architecture for supporting distributed spoken dialogues. 1913-1916 - Janez Zibert, Sanda Martincic-Ipsic, Melita Hajdinjak, Ivo Ipsic, France Mihelic:
Development of a bilingual spoken dialog system for weather information retrieval. 1917-1920 - James Allen, David Attwater, Peter J. Durston, Mark Farrell:
Improving "How May I Help You?" systems using the output of recognition lattices. 1921-1924 - Marco Andorno, Luciano Fissore, Pietro Laface, Mario Nigra, Cosmin Popovici, Franco Ravera, Claudio Vair:
Incremental learning of new user formulations in automatic directory assistance. 1925-1928 - Julie Baca, Feng Zheng, Hualin Gao, Joseph Picone:
Dialog systems for automotive environments. 1929-1932 - João Paulo Neto, Nuno J. Mamede, Renato Cassaca, Luís C. Oliveira:
The development of a multi-purpose spoken dialogue system. 1933-1936 - Silke Goronzy, Zica Valsan, Martin C. Emele, Juergen Schimanowski:
The dynamic, multi-lingual lexicon in SmartKom. 1937-1940 - Ryuichiro Higashinaka, Noboru Miyazaki, Mikio Nakano, Kiyoaki Aikawa:
Evaluating discourse understanding in spoken dialogue systems. 1941-1944 - Lars Bo Larsen:
Assessment of spoken dialogue system usability - what are we really measuring? 1945-1948 - Paula M. T. Smeele, Juliette A. J. S. Waals:
Evaluation of a speech-driven telephone information service using the PARADISE framework: a closer look at subjective measures. 1949-1952 - Sebastian Möller, Janto Skowronek:
Quantifying the impact of system characteristics on perceived quality dimensions of a spoken dialogue service. 1953-1956 - Ganesh N. Ramaswamy, Ran D. Zilca, Oleg Alecksandrovich:
A programmable policy manager for conversational biometrics. 1957-1960 - Timothy J. Hazen, Douglas A. Jones, Alex Park, Linda C. Kukolich, Douglas A. Reynolds:
Integration of speaker recognition into conversational spoken dialogue systems. 1961-1964
Robust Speech Recognition - Noise Compensation
- Yasunari Obuchi, Richard M. Stern:
Normalization of time-derivative parameters using histogram equalization. 665-668 - Zhipeng Zhang, Kiyotaka Otsuji, Sadaoki Furui:
Tree-structured noise-adapted HMM modeling for piecewise linear-transformation-based adaptation. 669-672 - Donglai Zhu, Satoshi Nakamura, Kuldip K. Paliwal, Ren-Hua Wang:
Maximum likelihood sub-band weighting for robust speech recognition. 673-676 - Wooil Kim, Sungjoo Ahn, Hanseok Ko:
Feature compensation scheme based on parallel combined mixture model. 677-680 - Jasha Droppo, Li Deng, Alex Acero:
A comparison of three non-linear observation models for noisy speech features. 681-684 - Khalid Daoudi, Murat Deviren:
A new supervised-predictive compensation scheme for noisy speech recognition. 685-688
Forensic Speaker Recognition
- Andrzej Drygajlo, Didier Meuwly, Anil Alexander:
Statistical methods and Bayesian interpretation of evidence in forensic automatic speaker recognition. 689-692 - Joaquin Gonzalez-Rodriguez, Daniel Garcia-Romero, Marta Garcia-Gomar, Daniel Ramos, Javier Ortega-Garcia:
Robust likelihood ratio estimation in Bayesian forensic speaker recognition. 693-696 - Hirotaka Nakasone:
Automated speaker recognition in real world conditions: controlling the uncontrollable. 697-700 - Beat Pfister, René Beutler:
Estimating the weight of evidence in forensic speaker verification. 701-704 - Stefan G. Gfrörer:
Auditory-instrumental forensic speaker recognition. 705-708 - Jose H. Kerstholt, E. J. M. Jansen, A. G. van Amelsvoort, A. P. A. Broeders:
Earwitness line-ups: effects of speech duration, retention interval and acoustic environment on identification accuracy. 709-712
Emotion in Speech
- Noam Amir, Shirley Ziv, Rachel Cohen:
Characteristics of authentic anger in Hebrew speech. 713-716 - Tapio Seppänen, Eero Väyrynen, Juhani Toivanen:
Prosody-based classification of emotions in spoken Finnish. 717-720 - Mandar A. Rahurkar, John H. L. Hansen:
Frequency distribution based weighted sub-band approach for classification of emotional/stressful content in speech. 721-724 - Jackson Liscombe, Jennifer J. Venditti, Julia Hirschberg:
Classifying subject ratings of emotional speech using acoustic features. 725-728 - Sherif M. Yacoub, Steven J. Simske, Xiaofan Lin, John Burns:
Recognition of emotions in interactive voice response systems. 729-732 - Anton Batliner, Viktor Zeißler, Carmen Frank, Johann Adelhardt, Rui Ping Shi, Elmar Nöth:
We are not amused - but how do you know? user states in a multi-modal dialogue system. 733-736
Dialog System User and Domain Modeling
- Niels Ole Bernsen:
On-line user modelling in a mobile spoken dialogue system. 737-740 - Botond Pakucs:
Towards dynamic multi-domain dialogue processing. 741-744 - Kazunori Komatani, Shinichi Ueno, Tatsuya Kawahara, Hiroshi G. Okuno:
User modeling in spoken dialogue systems for flexible guidance generation. 745-748 - Stephanie Seneff, Grace Chung, Chao Wang:
Empowering end users to personalize dialogue systems through spoken interaction. 749-752 - Antoine Raux, Brian Langner, Alan W. Black, Maxine Eskénazi:
Let's Go: improving spoken dialog systems for the elderly and non-natives. 753-756 - Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen:
Agents for integrated tutoring in spoken dialogue systems. 757-760
Topics in Speech Recognition and Segmentation
- Taeyoon Kim, Hanseok Ko:
Utterance verification under distributed detection and fusion framework. 889-892 - Simon Ka-Lung Ho, Brian Mak:
Joint estimation of thresholds in a bi-threshold verification problem. 893-896 - Samir Nefti, Olivier Boëffard, Thierry Moudenc:
Confidence measures for phonetic segmentation of continuous speech. 897-900 - Pascal Wiggers, Léon J. M. Rothkrantz:
Using confidence measures and domain knowledge to improve speech recognition. 901-904 - Kishan Thambiratnam, Sridha Sridharan:
Isolated word verification using cohort word-level verification. 905-908 - Wing-Hei Au, Man-Hung Siu:
A new approach to minimize utterance verification error rate for a specific operating point. 909-912 - Binfeng Yan, Rui Guo, Xiaoyan Zhu:
Continuous speech recognition and verification based on a combination score. 913-916 - Tibor Fábián, Robert Lieb, Günther Ruske, Matthias Thomae:
Impact of word graph density on the quality of posterior probability based confidence measures. 917-920 - Panikos Heracleous, Tohru Shimizu:
An efficient keyword spotting technique using a complementary language for filler models training. 921-924 - Michael Levit, Hiyan Alshawi, Allen L. Gorin, Elmar Nöth:
Context-sensitive evaluation and correction of phone recognition output. 925-928 - Yonggang Deng, Milind Mahajan, Alex Acero:
Estimating speech recognition error rate without acoustic test data. 929-932 - Maximilian Bisani, Hermann Ney:
Multigram-based grapheme-to-phoneme conversion for LVCSR. 933-936 - René Beutler, Beat Pfister:
Integrating statistical and rule-based knowledge for continuous German speech recognition. 937-940 - An Vandecatseye, Jean-Pierre Martens:
A fast, accurate and stream-based speaker segmentation and clustering algorithm. 941-944 - Shih-Sian Cheng, Hsin-Min Wang:
A sequential metric-based audio segmentation method via the Bayesian information criterion. 945-948 - Amit Srivastava, Francis Kubala:
Sentence boundary detection in Arabic speech. 949-952 - Martin Franz, Bhuvana Ramabhadran, Todd Ward, Michael Picheny:
Automated transcription and topic segmentation of large spoken archives. 953-956 - Yang Liu, Elizabeth Shriberg, Andreas Stolcke:
Automatic disfluency identification in conversational speech using multiple knowledge sources. 957-960 - Natsuo Yamamoto, Jun Ogata, Yasuo Ariki:
Topic segmentation and retrieval system for lecture videos based on spontaneous speech recognition. 961-964
Robust Speech Recognition - Acoustic Modeling
- Konstantin Markov, Jianwu Dang, Yosuke Iizuka, Satoshi Nakamura:
Hybrid HMM/BN ASR system integrating spectrum and articulatory features. 965-968 - Georg Stemmer, Viktor Zeißler, Christian Hacker, Elmar Nöth, Heinrich Niemann:
Context-dependent output densities for hidden Markov models in speech recognition. 969-972 - Takahiro Shinozaki, Sadaoki Furui:
Time adjustable mixture weights for speaking rate fluctuation. 973-976 - Jian Wu, Qiang Huo:
A switching linear Gaussian hidden Markov model and its application to nonstationary noise compensation for robust speech recognition. 977-980 - Vivek Tyagi, Iain McCowan, Hervé Bourlard, Hemant Misra:
On factorizing spectral dynamics for robust speech recognition. 981-984 - Chuan Jia, Peng Ding, Bo Xu:
Joint model and feature based compensation for robust speech recognition under non-stationary noise environments. 985-988
Advanced Machine Learning Algorithms for Speech and Language Processing
- Corinna Cortes, Patrick Haffner, Mehryar Mohri:
Weighted automata kernels - general framework and algorithms. 989-992 - Yasemin Altun, Thomas Hofmann:
Large margin methods for label sequence learning. 993-996 - Gunnar Rätsch:
Robust multi-class boosting. 997-1000 - Lawrence K. Saul, Fei Sha, Daniel D. Lee:
Statistical signal processing with nonnegativity constraints. 1001-1004 - Ashutosh Garg, Manfred K. Warmuth:
Inline updates for HMMs. 1005-1008 - Sam T. Roweis:
Factorial models and refiltering for speech separation and denoising. 1009-1012
Multi-Modal Spoken Language Processing
- Alexandra Klein, Harald Trost:
Using corpus-based methods for spoken access to news texts on the web. 1037-1040 - Douglas Brungart, Brian D. Simpson, Alexander J. Kordik:
Cross-modal informational masking due to mismatched audio cues in a speechreading task. 1041-1044 - Frédéric Berthommier:
Audiovisual speech enhancement based on the association between speech envelope and video features. 1045-1048 - Rainer Wasinger, Christoph Stahl, Antonio Krüger:
Robust speech interaction in a mobile environment through the use of multiple and different media input types. 1049-1052 - Rogier Woltjer, Wah Jin Tan, Fang Chen:
Speech-based, manual-visual, and multi-modal interaction with an in-car computer - evaluation of a pilot study. 1053-1056 - Plamen J. Prodanov, Andrzej Drygajlo:
Bayesian networks for spoken dialogue management in multimodal systems of tour-guide robots. 1057-1060
Speech Coding and Transmission
- Wai C. Chu, Toshio Miki:
Optimization of window and LSF interpolation factor for the ITU-T G.729 speech coding standard. 1061-1064 - Joon-Hyuk Chang, Jong Won Shin, Nam Soo Kim:
Likelihood ratio test with complex Laplacian model for voice activity detection. 1065-1068 - Jani Nurminen:
Multi-mode quantization of adjacent speech parameters using a low-complexity prediction scheme. 1069-1072 - Ulpu Sinervo, Jani Nurminen, Ari Heikkinen, Jukka Saarinen:
Multi-mode matrix quantizer for low bit rate LSF quantization. 1073-1076 - Frank Mertz, Hervé Taddei, Imre Varga, Peter Vary:
Voicing controlled frame loss concealment for adaptive multi-rate (AMR) speech frames in voice-over-IP. 1077-1080 - Marja Lahdekorpi, Jani Nurminen, Ari Heikkinen, Jukka Saarinen:
Perceptual irrelevancy removal in narrowband speech coding. 1081-1084 - Charles du Jeu, Maurice Charbit, Gérard Chollet:
Very-low-rate speech compression by indexation of polyphones. 1085-1088 - Victoria E. Sánchez, Antonio M. Peinado, Angel M. Gomez, José L. Pérez-Córdoba:
Entropy-optimized channel error mitigation with application to speech recognition over wireless. 1089-1092 - Venkatesh Krishnan, David V. Anderson:
Robust jointly optimized multistage vector quantization for speech coding. 1093-1096 - Harald Pobloth, Renat Vafin, W. Bastiaan Kleijn:
Polar quantization of sinusoids from speech signal blocks. 1097-1100 - Sung-Wan Yoon, Jin-Kyu Choi, Hong-Goo Kang, Dae Hee Youn:
Transcoding algorithm for G.723.1 and AMR speech coders: for interoperability between VoIP and mobile networks. 1101-1104 - Davorka Petrinovic, Davor Petrinovic:
Quality-complexity trade-off in predictive LSF quantization. 1105-1108 - Kei Kikuiri, Nobuhiko Naka, Tomoyuki Ohya:
Variable bit rate control with trellis diagram approximation. 1109-1112 - Naveen Srinivasamurthy, Antonio Ortega, Shrikanth S. Narayanan:
Towards optimal encoding for classification with applications to distributed speech recognition. 1113-1116 - Mohammed Raad, Ian S. Burnett, Alfred Mertins:
Multi-rate extension of the scalable to lossless PSPIHT audio coder. 1117-1120 - Turaj Zakizadeh Shabestary, Per Hedelin, Fredrik Nordén:
Entropy constrained quantization of LSP parameters. 1121-1124
Speech Recognition - Search and Lexicon Modeling
- Akio Kobayashi, Franz Josef Och, Hermann Ney:
Named entity extraction from Japanese broadcast news. 1125-1128 - Young-Hee Park, Dong-Hoon Ahn, Minhwa Chung:
Morpheme-based lexical modeling for Korean broadcast news transcription. 1129-1132 - Mathias De Wachter, Kris Demuynck, Dirk Van Compernolle, Patrick Wambacq:
Data driven example based continuous speech recognition. 1133-1136 - Sergey Astrov, Bernt Andrassy:
Large vocabulary speaker independent isolated word recognition for embedded systems. 1137-1140 - Alexander Seward:
Low-latency incremental speech transcription in the SYNFACE project. 1141-1144 - Stephan Kanthak, Hermann Ney:
Multilingual acoustic modeling using graphemes. 1145-1148 - Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa:
A cross-media retrieval system for lecture videos. 1149-1152 - Atsushi Fujii, Katunobu Itou:
Building a test collection for speech-driven web retrieval. 1153-1156 - Miroslav Novak, Diego Ruiz:
Confidence measure driven scalable two-pass recognition strategy for large list grammars. 1157-1160 - Sherif M. Abdou, Michael S. Scordilis:
An efficient, fast matching approach using posterior probability estimates in speech recognition. 1161-1164 - Kadri Hacioglu, Bryan L. Pellom, Tolga Çiloglu, Özlem Öztürk, Mikko Kurimo, Mathias Creutz:
On lexicon creation for Turkish LVCSR. 1165-1168 - Stanley F. Chen:
Compiling large-context phonetic decision trees into finite-state transducers. 1169-1172 - Sameer Maskey, Julia Hirschberg:
Automatic summarization of broadcast news using structural features. 1173-1176 - Yonghong Yan, Chengyi Zheng, Jianping Zhang, Jielin Pan, Jiang Han, Jian Liu:
A dynamic cross-reference pruning strategy for multiple feature fusion at decoder run time. 1177-1180 - Paul Lamere, Philip Kwok, William Walker, Evandro B. Gouvêa, Rita Singh, Bhiksha Raj, Peter Wolf:
Design of the CMU Sphinx-4 decoder. 1181-1184 - Onur Cilingir, Mübeccel Demirekler:
A new decoder design for large vocabulary Turkish speech recognition. 1185-1188
Speech Technology Applications
- Phil D. Green, James Carmichael, Athanassios Hatzis, Pam Enderby, Mark S. Hawley, Mark Parker:
Automatic speech recognition with sparse training data for dysarthric speakers. 1189-1192 - Akira Inoue, Takayoshi Mikami, Yoichi Yamashita:
Prediction of sentence importance for speech summarization using prosodic parameters. 1193-1196 - Chong-kai Wang, Ren-Yuan Lyu, Yuang-Chin Chiang:
An automatic singing transcription system with multilingual singing lyric recognizer and robust melody tracker. 1197-1200 - Masataka Goto, Yukihiro Omoto, Katunobu Itou, Tetsunori Kobayashi:
Speech shift: direct speech-input-mode switching through intentional control of voice pitch. 1201-1204 - Masahiko Matsushita, Hiromitsu Nishizaki, Takehito Utsuro, Yasuhiro Kodama, Seiichi Nakagawa:
Evaluating multiple LVCSR model combination in NTCIR-3 speech-driven web retrieval task. 1205-1208 - Kuansan Wang:
Semantic object synchronous understanding in SALT for highly interactive user interface. 1209-1212 - Jan Kneissler, Anne K. Kienappel, Dietrich Klakow:
Information retrieval based call classification. 1213-1216 - Martha A. Larson, Stefan Eickeler:
Using syllable-based indexing features and language models to improve German spoken document retrieval. 1217-1220 - Shiva Sundaram, Shrikanth S. Narayanan:
An empirical text transformation method for spontaneous speech synthesizers. 1221-1224 - Yilmaz Gul, Aladdin M. Ariyaeeinia, Oliver Dewhirst:
A new approach to reducing alarm noise in speech. 1225-1228 - Dong Yu, Kuansan Wang, Milind Mahajan, Peter Mau, Alex Acero:
Improved name recognition with user modeling. 1229-1232 - Ziad Al Bawab, Ivo Locher, Jianxia Xue, Abeer Alwan:
Speech recognition over Bluetooth wireless channels. 1233-1236 - Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi:
Speech starter: noise-robust endpoint detection by using filled pauses. 1237-1240 - Gilles Boulianne, Jean-Francois Beaumont, Patrick Cardinal, Michel Comeau, Pierre Ouellet, Pierre Dumouchel:
Automatic segmentation of film dialogues into phonemes and graphemes. 1241-1244 - Julie Brousseau, Jean-Francois Beaumont, Gilles Boulianne, Patrick Cardinal, Claude Chapdelaine, Michel Comeau, Frédéric Osterrath, Pierre Ouellet:
Automated closed-captioning of live TV broadcast news in French. 1245-1248 - E. E. Jan, Benoît Maison, Lidia Mangu, Geoffrey Zweig:
Automatic construction of unique signatures and confusable sets for natural language directory assistance applications. 1249-1252 - Helen M. Meng, Yuk-Chi Li, Tien Ying Fung, Man Cheuk Ho, Chi-Kin Keung, Tin Hang Lo, Wai Kit Lo, P. C. Ching:
Recent enhancements in CU VOCAL for Chinese TTS-enabled applications. 1253-1256 - Isabel Trancoso, João Paulo Neto, Hugo Meinedo, Rui Amaral:
Evaluation of an alert system for selective dissemination of broadcast news. 1257-1260 - Udar Mittal, James P. Ashley, Edgardo M. Cruz-Zeno:
Low complexity joint optimization of excitation parameters in analysis-by-synthesis speech coding. 1261-1264 - James Horlock, Simon King:
Named entity extraction from word lattices. 1265-1268 - William Belfield, Herbert Gish:
A topic classification system based on parametric trajectory mixture models. 1269-1272
Robust Speech Recognition - Front-end Processing
- Kaisheng Yao, Kuldip K. Paliwal, Satoshi Nakamura:
Model based noisy speech recognition with environment parameters estimated by noise adaptive speech recognition with prior. 1273-1276 - Michael L. Seltzer, Jasha Droppo, Alex Acero:
A harmonic-model-based front end for robust speech recognition. 1277-1280 - Umit H. Yapanel, John H. L. Hansen:
A new perspective on feature extraction for robust in-vehicle speech recognition. 1281-1284 - Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi:
Speech recognition of double talk using SAFIA-based audio segregation. 1285-1288 - Xianxian Zhang, John H. L. Hansen:
CFA-BF: a novel combined fixed/adaptive beamforming for robust speech recognition in real car environments. 1289-1292 - Gerasimos Potamianos, Chalapathy Neti:
Audio-visual speech recognition in challenging environments. 1293-1296
Spoken Language Processing for e-Inclusion
- Inger Karlsson, Andrew Faulkner, Giampiero Salvi:
SYNFACE - a talking face telephone. 1297-1300 - Bostjan Vesnicer, Janez Zibert, Simon Dobrisek, Nikola Pavesic, France Mihelic:
A voice-driven web browser for blind people. 1301-1304 - Christian A. Müller, Frank Wittig, Jörg Baus:
Exploiting speech for recognizing elderly users to respond to their special needs. 1305-1308 - Alan F. Newell:
Spoken language and e-inclusion. 1309-1312 - Georg Stemmer, Christian Hacker, Stefan Steidl, Elmar Nöth:
Acoustic normalization of children's speech. 1313-1316
Language and Accent Identification
- Alvin F. Martin, Mark A. Przybocki:
NIST 2003 language recognition evaluation. 1341-1344 - Elliot Singer, Pedro A. Torres-Carrasquillo, Terry P. Gleason, William M. Campbell, Douglas A. Reynolds:
Acoustic, phonetic, and discriminative approaches to automatic language identification. 1345-1348 - Stanley F. Chen, Benoît Maison:
Using place name data to train language identification models. 1349-1352 - Pongtep Angkititrakul, John H. L. Hansen:
Use of trajectory models for automatic accent classification. 1353-1356 - V. Ramasubramanian, A. K. V. Sai Jayram, T. V. Sreenivas:
Language identification using parallel sub-word recognition - an ergodic HMM equivalence. 1357-1360 - Mohamed Faouzi BenZeghiba, Hervé Bourlard:
On the combination of speech and speaker recognition. 1361-1364
Speech Recognition - Adaptation 1, 2
- Michael Pitz, Hermann Ney:
Vocal tract normalization as linear transformation of MFCC. 1445-1448 - Zhirong Wang, Tanja Schultz:
Non-native spontaneous speech recognition through polyphone decision tree specialization. 1449-1452 - Yasuo Ariki, Takeru Shigemori, Tsuyoshi Kaneko, Jun Ogata, Masakiyo Fujimoto:
Live speech recognition in sports games by adaptation of acoustic model and language model. 1453-1456 - Se-Jin Oh, Kwang-Dong Kim, Duk-Gyoo Roh, Woo-Chang Sung, Hyun-Yeol Chung:
Speaker adaptation using regression classes generated by phonetic decision tree-based successive state splitting. 1457-1460 - Jiun Kim, Jaeho Chung:
Reduction of dimension of HMM parameters using ICA and PCA in MLLR framework for speaker adaptation. 1461-1464 - Huayun Zhang, Bo Xu:
Geometric constrained maximum likelihood linear regression on Mandarin dialect adaptation. 1465-1468 - Tomoyosi Akiba, Katunobu Itou, Atsushi Fujii:
Adapting language models for frequent fixed phrases by emphasizing n-gram subsets. 1469-1472 - Anne K. Kienappel:
Learning intra-speaker model parameter correlations from many short speaker segments. 1473-1476 - Patgi Kam, Tan Lee, Frank K. Soong:
Modeling Cantonese pronunciation variation by acoustic model refinement. 1477-1480 - Jong Se Park, Hwa Jeon Song, Hyung Soon Kim:
Performance improvement of rapid speaker adaptation based on eigenvoice and bias compensation. 1481-1484 - Xiaoshan Fang, Jianfeng Gao, Jianfeng Li, Huanye Sheng:
Training data optimization for language model adaptation. 1485-1488 - Stefanie Aalburg, Harald Höge:
Approaches to foreign-accented speaker-independent speech recognition. 1489-1492 - Shingo Yamade, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments. 1493-1496 - Fabrice Lauri, Irina Illina, Dominique Fohr, Filipp Korkmazsky:
Using genetic algorithms for rapid speaker adaptation. 1497-1500 - Vincent Barreaud, Irina Illina, Dominique Fohr, Filipp Korkmazsky:
Structural state-based frame synchronous compensation. 1501-1504 - Aaron D. Lawson, David M. Harris, John J. Grieco:
Effect of foreign accent on speech recognition in the NATO N-4 corpus. 1505-1508 - Jon P. Nedel, Richard M. Stern:
Duration normalization and hypothesis combination for improved spontaneous speech recognition. 1509-1512 - Wu Chou, Xiaodong He:
Maximum a posteriori linear regression (MAPLR) variance adaptation for continuous density HMMs. 1513-1516 - Tor André Myrvoll, Frank K. Soong:
On divergence based clustering of normal distributions and its application to HMM adaptation. 1517-1520 - Sreeram V. Balakrishnan:
Fast incremental adaptation using maximum likelihood regression and stochastic gradient descent. 1521-1524 - Scott Axelrod, Vaibhava Goel, Brian Kingsbury, Karthik Visweswariah, Ramesh A. Gopinath:
Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices. 1613-1616 - Gyucheol Jang, Minho Jin, Chang D. Yoo:
Speaker adaptation based on confidence-weighted training. 1617-1620 - Alberto Abad, Climent Nadeu, Javier Hernando, Jaume Padrell:
Jacobian adaptation based on the frequency-filtered spectral energies. 1621-1624 - Driss Matrouf, Olivier Bellot, Pascal Nocera, Georges Linarès, Jean-François Bonastre:
Structural linear model-space transformations for speaker adaptation. 1625-1628 - Xiaodong He, Wu Chou:
Minimum classification error (MCE) model adaptation of continuous density HMMs. 1629-1632 - Asela Gunawardana, Alex Acero:
Adapting acoustic models to new domains and conditions using untranscribed data. 1633-1636
Speech Resources and Standards
- Mahmood Bijankhan, Javad Sheykhzadegan, Mahmood R. Roohani, Rahman Zarrintare, Seyyed Z. Ghasemi, Mohammad E. Ghasedi:
TFARSDAT - the telephone Farsi speech database. 1525-1528 - Elviira Hartikainen, Giulio Maltese, Asunción Moreno, Shaunie Shammass, Ute Ziegenhain:
Large lexica for speech-to-speech translation: from specification to creation. 1529-1532 - Kemal Oflazer, Sharon Inkelas:
A pronunciation lexicon for Turkish based on two-level morphology. 1533-1536 - Hong Zheng, Yiqing Lu:
Using both global and local hidden Markov models for automatic speech unit segmentation. 1537-1540 - Henk van den Heuvel, Khalid Choukri, Harald Höge, Bente Maegaard, Jan Odijk, Valérie Mapelli:
Quality control of language resources at ELRA. 1541-1544 - Christophe Van Bael, Diana Binnenpoorte, Helmer Strik, Henk van den Heuvel:
Validation of phonetic transcriptions based on recognition performance. 1545-1548 - Inmaculada Hernáez, Iker Luengo, Eva Navas, Maria Luisa Zubizarreta, Iñaki Gaminde, Jon Sánchez:
The Basque SpeechDat(II) database: a description and first test recognition results. 1549-1552 - Jens Maase, Diane Hirschfeld, Uwe Koloska, Timo Westfeld, Jörg Helbig:
Towards an evaluation standard for speech control concepts in real-world scenarios. 1553-1556 - Christoph Draxler:
OrienTel: recording telephone speech of Turkish speakers in Germany. 1557-1560 - Gerhard Backfried, Roser Jaquemot Caldes:
Spanish broadcast news transcription. 1561-1564 - Vassilios Digalakis, Dimitris Oikonomidis, Dimitris Pratsolis, Nikos Tsourakis, Christos Vosnidis, Nikos Chatzichrisafis, Vassilios Diakoloukas:
Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system. 1565-1568 - Philippe Daubias, Paul Deléglise:
The LIUM-AVS database : a corpus to test lip segmentation and speechreading systems in natural conditions. 1569-1572 - Özgül Salor, Bryan L. Pellom, Mübeccel Demirekler:
Implementation and evaluation of a text-to-speech synthesis system for Turkish. 1573-1576 - Jáchym Kolár, Jan Romportl, Josef Psutka:
The Czech speech and prosody database both for ASR and TTS purposes. 1577-1580 - Itsuki Kishida, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki:
Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis. 1581-1584 - Douglas A. Jones, Florian Wolf, Edward Gibson, Elliott Williams, Evelina Fedorenko, Douglas A. Reynolds, Marc A. Zissman:
Measuring the readability of automatic speech-to-text transcripts. 1585-1588 - Nadia Mana, Susanne Burger, Roldano Cattoni, Laurent Besacier, Victoria MacLaren, John W. McDonough, Florian Metze:
The NESPOLE! VoIP multilingual corpora in tourism and medical domains. 1589-1592 - David Conejero, Jesús Giménez, Victoria Arranz, Antonio Bonafonte, Neus Pascual, Núria Castell, Asunción Moreno:
Lexica and corpora for speech-to-speech translation: a trilingual approach. 1593-1596 - Christopher Cieri, David Miller, Kevin Walker:
From switchboard to fisher: telephone collection protocols, their uses and yields. 1597-1600 - Einar Meister, Jürgen Lasn, Lya Meister:
Development of the estonian speechdat-like database. 1601-1604 - António Joaquim Serralheiro, Isabel Trancoso, Diamantino Caseiro, Teresa Chambel, Luís Carriço, Nuno Guimarães:
Towards a repository of digital talking books. 1605-1608 - Stephanie M. Strassel, David Miller, Kevin Walker, Christopher Cieri:
Shared resources for robust speech-to-text technology. 1609-1612
Towards Synthesizing Expressive Speech
- Nick Campbell:
Towards synthesising expressive speech; designing and collecting expressive speech data. 1637-1640 - Tanja Bänziger, Michel Morel, Klaus R. Scherer:
Is there an emotion signature in intonational patterns? and can it be used in synthesis? 1641-1644 - Ellen Eide, Raimo Bakis, Wael Hamza, John F. Pitrelli:
Multilayered extensions to the speech synthesis markup language for describing expressiveness. 1645-1648 - Alan W. Black:
Unit selection and emotional speech. 1649-1652 - Christophe d'Alessandro, Boris Doval:
Voice quality modification for emotional speech synthesis. 1653-1656 - Jan P. H. van Santen, Lois M. Black, Gilead Cohen, Alexander Kain, Esther Klabbers, Taniya Mishra, Jacques de Villiers, Xiaochuan Niu:
Applications of computer generated expressive speech for communication disorders. 1657-1660
Speaker Verification
- David A. van Leeuwen:
Speaker verification systems and security considerations. 1661-1664 - Matthieu Hébert, Larry P. Heck:
Phonetic class-based speaker verification. 1665-1668 - Suhadi Suhadi, Sorel Stan, Tim Fingscheidt, Christophe Beaugeant:
An evaluation of VTS and IMM for speaker verification in noise. 1669-1672 - Todor Ganchev, Dimitris K. Tasoulis, Michael N. Vrahatis, Nikos Fakotakis:
Locally recurrent probabilistic neural network for text-independent speaker verification. 1673-1676 - Stan Z. Li, Dong Zhang, Chengyuan Ma, Heung-Yeung Shum, Eric Chang:
Learning to boost GMM based speaker verification. 1677-1680 - Eric W. M. Yu, Man-Wai Mak, Chin-Hung Sit, Sun-Yuan Kung:
Speaker verification based on G.729 and G.723.1 coder parameters and handset mismatch compensation. 1681-1684
Dialog System Generation
- Stephen Whittaker, Marilyn A. Walker, Preetam Maloor:
Should I tell all?: an experiment on conciseness in spoken dialogue. 1685-1688 - Helen M. Meng, Wing Lin Yip, Oi Yan Mok, Shuk Fong Chan:
Natural language response generation in mixed-initiative dialogs using task goals and dialog acts. 1689-1692 - Keikichi Hirose, Junji Tago, Nobuaki Minematsu:
Speech generation from concept for realizing conversation with an agent in a virtual room. 1693-1696 - Marilyn A. Walker, Rashmi Prasad, Amanda Stent:
A trainable generator for recommendations in multimodal dialog. 1697-1700 - Tatsuya Kawahara, Ryosuke Ito, Kazunori Komatani:
Spoken dialogue system for queries on appliance manuals using hierarchical confirmation strategy. 1701-1704 - Dalina Kallulli:
SAG: a procedural tactical generator for dialog systems. 1705-1708
Robust Speech Recognition 1-4
- Yu Luo, Limin Du:
A hidden Markov model-based missing data imputation approach. 1765-1768 - Takeshi Yamada, Jiro Okada, Kazuya Takeda, Norihide Kitaoka, Masakiyo Fujimoto, Shingo Kuroiwa, Kazumasa Yamamoto, Takanobu Nishiura, Mitsunori Mizumachi, Satoshi Nakamura:
Integration of noise reduction algorithms for Aurora2 task. 1769-1772 - Rita Singh, Manfred K. Warmuth, Bhiksha Raj, Paul Lamere:
Classification with free energy at raised temperatures. 1773-1776 - Pei Ding, Bertram E. Shi, Pascale Fung, Zhigang Cao:
Flooring the observation probability for robust ASR in impulsive noise. 1777-1780 - Masakiyo Fujimoto, Yasuo Ariki:
Combination of temporal domain SVD based speech enhancement and GMM based speech estimation for ASR in noise - evaluation on the AURORA2 task. 1781-1784 - Petr Fousek, Petr Pollák:
Additive noise and channel distortion-robust parametrization tool - performance evaluation on Aurora 2 & 3. 1785-1788 - Stéphane Dupont, Christophe Ris:
Robust feature extraction and acoustic modeling at Multitel: experiments on the Aurora databases. 1789-1792 - Bojan Kotnik, Zdravko Kacic, Bogomir Horvat:
Noise robust speech parameterization based on joint wavelet packet decomposition and autoregressive modeling. 1793-1796 - Christophe Couvreur, Oren Gedge, Klaus Linhard, Shaunie Shammass, Johan Vantieghem:
Database adaptation for ASR in cross-environmental conditions in the SPEECON project. 1797-1800 - Petr Motlícek, Jan Cernocký:
Autoregressive modeling based feature extraction for Aurora3 DSR task. 1801-1804 - Edmondo Trentin, Marco Matassoni, Marco Gori:
Evaluation on the Aurora 2 database of acoustic models that are less noise-sensitive. 1805-1808 - Javier Macías Guarasa, J. Ordonez, Juan Manuel Montero, Javier Ferreiros, Ricardo de Córdoba, Luis Fernando D'Haro:
Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition. 1809-1812 - Shahla Parveen, Phil D. Green:
Multitask learning in connectionist robust ASR using recurrent neural networks. 1813-1816 - Hemant Misra, Andrew C. Morris:
Confusion matrix based entropy correction in multi-stream combination. 1817-1820 - Huayun Zhang, Zhaobing Han, Bo Xu:
Dynamic channel compensation based on maximum a posteriori estimation. 2137-2140 - Laura Docío Fernández, David Gelbart, Nelson Morgan:
Far-field ASR on inexpensive microphones. 2141-2144 - Satoru Tsuge, Shingo Kuroiwa, Kenji Kita:
Evaluation of ETSI advanced DSR front-end and bias removal method on the Japanese newspaper article sentences speech corpus. 2145-2148 - Chng Chin Soon, Bernt Andrassy, Josef G. Bauer, Günther Ruske:
Environment adaptive control of noise reduction parameters for improved robustness of ASR. 2149-2152 - Yuki Denda, Takanobu Nishiura, Hideki Kawahara:
Speech enhancement with microphone array and Fourier / wavelet spectral subtraction in real noisy environments. 2153-2156 - Takanobu Nishiura, Satoshi Nakamura, Kazuhiro Miki, Kiyohiro Shikano:
Environmental sound source identification based on hidden Markov model for robust speech recognition. 2157-2160 - Peter Jancovic, Münevver Köküer, Fionn Murtagh:
High-likelihood model based on reliability statistics for robust combination of features: application to noisy speech recognition. 2161-2164 - Cenk Demiroglu, David V. Anderson:
Noise robust digit recognition with missing frames. 2165-2168 - Xiaodong Cui, Alexis Bernard, Abeer Alwan:
A noise-robust ASR back-end technique based on weighted Viterbi recognition. 2169-2172 - Muhammad Ghulam, Takashi Fukuda, Tsuneo Nitta:
Voice quality normalization in an utterance for robust ASR. 2173-2176 - Murat Akbacak, John H. L. Hansen:
Environmental sniffing: robust digit recognition for an in-vehicle environment. 2177-2180 - Tai-Hwei Hwang:
Energy contour extraction for in-car speech recognition. 2181-2184 - Takashi Fukuda, Tsuneo Nitta:
Noise-robust ASR by using distinctive phonetic features approximated with logarithmic normal distribution of HMM. 2185-2188 - Takashi Fukuda, Tsuneo Nitta:
Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors. 2189-2192 - Néstor Becerra Yoma, Ivan Brito, Jorge F. Silva:
Language model accuracy and uncertainty in noise cancelling in the stochastic weighted Viterbi algorithm. 2193-2196 - Koen Eneman, Jacques Duchateau, Marc Moonen, Dirk Van Compernolle, Hugo Van hamme:
Assessment of dereverberation algorithms for large vocabulary speech recognition systems. 2689-2692 - Ben P. Milner, Alastair Bruce James:
Analysis and compensation of packet loss in distributed speech recognition using interleaving. 2693-2696 - Ben P. Milner:
Non-linear compression of feature vectors using transform coding and non-uniform bit allocation. 2697-2700 - Jen-Tzung Chien, Sadaoki Furui:
Predictive hidden Markov model selection for decision tree state tying. 2701-2704 - Kazuhiro Nakadai, Daisuke Matsuura, Hiroshi G. Okuno, Hiroshi Tsujino:
Three simultaneous speech recognition by integration of active audition and face recognition for humanoid. 2705-2708 - Katsuhisa Fujinaga, Hiroaki Kokubo, Hirofumi Yamamoto, Gen-ichiro Kikui, Hiroshi Shimodaira:
Mis-recognized utterance detection using multiple language models generated by clustered sentences. 2709-2712 - Hui Sun, Guoliang Zhang, Fang Zheng, Mingxing Xu:
Using word confidence measure for OOV words detection in a spontaneous spoken dialog system. 2713-2716 - Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura:
Speech recognition using EMG; mime speech recognition. 2717-2720 - Takatoshi Jitsuhiro, Tomoko Matsui, Satoshi Nakamura:
Automatic generation of non-uniform context-dependent HMM topologies based on the MDL criterion. 2721-2724 - Norihide Kitaoka, Masahisa Shingu, Seiichi Nakagawa:
Comparison of effects of acoustic and language knowledge on spontaneous speech perception/recognition between human and automatic speech recognizer. 2725-2728 - Genevieve Gorrell:
Using statistical language modelling to identify new vocabulary in a grammar-based speech recognition system. 2729-2732 - Angel M. Gomez, Antonio M. Peinado, Victoria E. Sánchez, Antonio J. Rubio:
A source model mitigation technique for distributed speech recognition over lossy packet channels. 2733-2736 - Martin J. Russell, Philip J. B. Jackson:
The effect of an intermediate articulatory layer on the performance of a segmental HMM. 2737-2740 - Yi Liu, Pascale Fung:
Automatic phone set extension with confidence measure for spontaneous speech. 2741-2744 - Roberto Paredes, Alberto Sanchís, Enrique Vidal, Alfons Juan:
Utterance verification using an optimized k-nearest neighbour classifier. 2745-2748 - Guokang Fu, Ta-Hsin Li:
A segment-based algorithm of speech enhancement for robust speech recognition. 3029-3032 - Roberto Gemello, Franco Mana, Dario Albesano, Renato de Mori:
Robust multiple resolution analysis for automatic speech recognition. 3033-3036 - Mohamed Afify:
An accurate noise compensation algorithm in the log-spectral domain for robust speech recognition. 3037-3040 - Javier Ramírez, José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:
A new adaptive long-term spectral estimation voice activity detector. 3041-3044 - Michael J. Carey:
Robust speech recognition using non-linear spectral smoothing. 3045-3048 - Cailian Miao, Yangsheng Wang:
A novel use of residual noise model for modified PMC. 3049-3052 - Christophe Cerisara, Irina Illina:
Robust speech recognition to non-stationary noise based on model-driven approaches. 3053-3056 - Christophe Cerisara:
Towards missing data recognition with cepstral features. 3057-3060 - Hemmo Haverinen, Imre Kiss:
On-line parametric histogram equalization techniques for noise robust embedded speech recognition. 3061-3064 - An-Tze Yu, Hsiao-Chuan Wang:
Compensation of channel distortion in line spectrum frequency domain. 3065-3068 - Arnaud Martin, Laurent Mauuary:
Voicing parameter and energy based speech/non-speech detection for speech recognition in adverse conditions. 3069-3072 - Hugo Van hamme:
Two correction models for likelihoods in robust speech recognition using missing feature theory. 3073-3076 - J. Sujatha, K. R. Prasanna Kumar, K. R. Ramakrishnan, N. Balakrishnan:
Spectral maxima representation for robust automatic speech recognition. 3077-3080 - Toshiki Endo, Shingo Kuroiwa, Satoshi Nakamura:
Missing feature theory applied to robust speech recognition over IP network. 3081-3084 - Hesham Tolba, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments. 3085-3088 - Hugo Van hamme:
Robust speech recognition using missing feature theory in the cepstral or LDA domain. 3089-3092 - Yuan-Fu Liao, Jeng-Shien Lin, Wei-Ho Tsai:
Bandwidth mismatch compensation for robust speech recognition. 3093-3096 - Robert W. Morris, Jon A. Arrowood, Mark A. Clements:
Markov chain Monte Carlo methods for noise robust feature extraction using the autoregressive model. 3097-3100 - Joan Marí Hilario, Fritz Class:
A comparative study of some discriminative feature reduction algorithms on the AURORA 2000 and the DaimlerChrysler in-car ASR tasks. 3101-3104
Speech Recognition - Large Vocabulary 1, 2
- Josef Psutka, Pavel Ircing, Josef V. Psutka, Vlasta Radová, William J. Byrne, Jan Hajic, Jirí Mírovský, Samuel Gustman:
Large vocabulary ASR for spontaneous Czech in the MALACH project. 1821-1824 - Giuseppe Riccardi, Dilek Hakkani-Tür:
Active and unsupervised learning for automatic speech recognition. 1825-1828 - Umit H. Yapanel, Satya Dharanipragada, John H. L. Hansen:
Perceptual MVDR-based cepstral coefficients (PMCCs) for high accuracy speech recognition. 1829-1832 - Sheng Gao, Chin-Hui Lee:
A discriminative decision tree learning approach to acoustic modeling. 1833-1836 - Patrick Nguyen, Luca Rigazio, Jean-Claude Junqua:
Large corpus experiments for broadcast news recognition. 1837-1840 - Somchai Jitapunkul, Ekkarit Maneenoi, Visarut Ahkuputra, Sudaporn Luksaneeyanawin:
Performance evaluation of phonotactic and contextual onset-rhyme models for speech recognition of Thai language. 1841-1844 - Yao Qian, Tan Lee, Yujia Li:
Overlapped di-tone modeling for tone recognition in continuous Cantonese speech. 1845-1848 - Masafumi Nishida, Tatsuya Kawahara:
Speaker model selection using Bayesian information criterion for speaker indexing and speaker adaptation. 1849-1852 - Janienke Sturm, Judith M. Kessens, Mirjam Wester, Febe de Wet, Eric Sanders, Helmer Strik:
Automatic transcription of football commentaries in the MUMIS project. 1853-1856 - S. Douglas Peters:
On the limits of cluster-based acoustic modeling. 1857-1860 - Dau-Cheng Lyu, Min-Siong Liang, Yuang-Chin Chiang, Chun-Nan Hsu, Ren-Yuan Lyu:
Large vocabulary Taiwanese (Min-Nan) speech recognition using tone features and statistical pronunciation modeling. 1861-1864 - Pierre L. Dognin, Amro El-Jaroudi:
A new spectral transformation for speaker normalization. 1865-1868 - Hua Yu, Tanja Schultz:
Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition. 1869-1872 - Pavel Ircing, Josef Psutka:
Fitting class-based language models into weighted finite-state transducer framework. 1873-1876 - Fabrice Lefèvre, Jean-Luc Gauvain, Lori Lamel:
Multi-source training and adaptation for generic speech recognition. 1877-1880 - Brian Kingsbury, Lidia Mangu, George Saon, Geoffrey Zweig, Scott Axelrod, Vaibhava Goel, Karthik Visweswariah, Michael Picheny:
Toward domain-independent conversational speech recognition. 1881-1884 - Rong Zhang, Alexander I. Rudnicky:
Comparative study of boosting and non-boosting training for constructing ensembles of acoustic models. 1885-1888 - Peng Ding, Zhenbiao Chen, Sheng Hu, Shuwu Zhang, Bo Xu:
Discriminative optimization of large vocabulary Mandarin conversational speech recognition system. 1965-1968 - Johan Schalkwyk, I. Lee Hetherington, Ezra Story:
Speech recognition with dynamic grammars using finite-state transducers. 1969-1972 - Kris Demuynck, Tom Laureys, Dirk Van Compernolle, Hugo Van hamme:
FLaVoR: a flexible architecture for LVCSR. 1973-1976 - George Saon, Geoffrey Zweig, Brian Kingsbury, Lidia Mangu, Upendra V. Chaudhari:
An architecture for rapid decoding of large vocabulary conversational speech. 1977-1980 - Daniel Povey, Mark J. F. Gales, Do Yeong Kim, Philip C. Woodland:
MMI-MAP and MPE-MAP for acoustic model adaptation. 1981-1984 - Vlasios Doumpiotis, Stavros Tsakalidis, William J. Byrne:
Lattice segmentation and minimum Bayes risk discriminative training. 1985-1988
Robust Methods in Processing of Natural Language Dialogues
- Klaus Zechner:
Spoken language condensation in the 21st century. 1989-1992 - Sadaoki Furui:
Robust methods in automatic speech recognition and understanding. 1993-1998 - Rodolfo Delmonte:
Parsing spontaneous speech. 1999-2004
Speaker Identification
- Douglas A. Reynolds:
Model compression for GMM based speaker recognition systems. 2005-2008 - Jean-François Bonastre, Philippe Morin, Jean-Claude Junqua:
Gaussian dynamic warping (GDW) method applied to text-dependent speaker detection and verification. 2013-2016 - Luciana Ferrer, Harry Bratt, Venkata Ramana Rao Gadde, Sachin S. Kajarekar, Elizabeth Shriberg, M. Kemal Sönmez, Andreas Stolcke, Anand Venkataraman:
Modeling duration patterns for speaker recognition. 2017-2020 - Simon Lucey, Tsuhan Chen:
Improved speaker verification through probabilistic subspace adaptation. 2021-2024 - Peng Yu, Frank Seide, Chengyuan Ma, Eric Chang:
An improved model-based speaker segmentation system. 2025-2028
Speech Synthesis: Miscellaneous 1, 2
- Jerome R. Bellegarda:
A latent analogy framework for grapheme-to-phoneme conversion. 2029-2032 - Stanley F. Chen:
Conditional and joint models for grapheme-to-phoneme conversion. 2033-2036 - Beat Pfister, Harald Romsdorfer:
Mixed-lingual text analysis for polyglot TTS synthesis. 2037-2040 - Jason Y. Zhang, Alan W. Black, Richard Sproat:
Identifying speakers in children's stories for speech synthesis. 2041-2044 - Catherine J. Stevens, Nicole Lees, Julie Vonwiller:
Experimental tools to evaluate intelligibility of text-to-speech (TTS) synthesis: effects of voice gender and signal quality. 2045-2048 - Laura Mayfield Tomokiyo, Alan W. Black, Kevin A. Lenzo:
Arabic in my hand: small-footprint synthesis of Egyptian Arabic. 2049-2052 - Christina L. Bennett, Alan W. Black:
Using acoustic models to choose pronunciation variations for synthetic voices. 2937-2940 - Qin Yan, Saeed Vaseghi, Ching-Hsiang Ho, Dimitrios Rentzos, Emir Turajlic:
Comparative analysis and synthesis of formant trajectories of British and broad Australian accents. 2941-2944 - Miguel Arjona Ramírez:
Cycle extraction for perfect reconstruction and rate scalability. 2945-2948 - António J. S. Teixeira, Luis M. T. Jesus, Roberto Martinez:
Adding fricatives to the Portuguese articulatory synthesiser. 2949-2952 - Ignasi Iriondo Sanz, Francesc Alías, Javier Sanchis, Javier Melenchón:
A hybrid method oriented to concatenative text-to-speech synthesis. 2953-2956 - Yong Zhao, Min Chu, Hu Peng, Eric Chang:
Custom-tailoring TTS voice font - keeping the naturalness when reducing database size. 2957-2960
Speech Perception
- Soundararajan Srinivasan, DeLiang Wang:
Schema-based modeling of phonemic restoration. 2053-2056 - Hisao Kuwabara:
Perception of voice-individuality for distortions of resonance/source characteristics and waveforms. 2057-2060 - Tsutomu Sato:
The perceptual cues of a high level pitch-accent pattern in Japanese: pitch-accent patterns and duration. 2061-2064 - Mamoru Iwaki, Norio Nakamura:
Illusory continuity of intermittent pure tone in binaural listening and its dependency on interaural time difference. 2065-2068 - Nobuaki Minematsu, Changchen Guo, Keikichi Hirose:
CART-based factor analysis of intelligibility reduction in Japanese English. 2069-2072 - László Tóth, András Kocsor:
Harmonic alternatives to sine-wave speech. 2073-2076 - Dorel Picovici, Abdulhussain E. Mahdi:
Non-intrusive assessment of perceptual speech quality using a self-organising map. 2077-2080 - Sophie Dufour, Ronald Peereman:
Inhibitory priming effect in auditory word recognition: the role of the phonological mismatch length between primes and targets. 2081-2084 - Odette Scharenborg, Louis ten Bosch, Lou Boves:
Recognising 'real-life' speech with SpeM: a speech-based computational model of human speech recognition. 2085-2088 - Judith Rosenhouse, Liat Kishon-Rabin:
The effect of speech rate and noise on bilinguals' speech perception: the case of native speakers of Arabic in Israel. 2089-2092 - Oytun Türk, Levent M. Arslan:
Subjective evaluations for perception of speaker identity through acoustic feature transplantations. 2093-2096 - Odette Scharenborg, James M. McQueen, Louis ten Bosch, Dennis Norris:
Modelling human speech recognition using automatic speech recognition paradigms in SpeM. 2097-2100 - Mutsumi Saito, Kimio Shiraishi, Kimitoshi Fukudome:
The effect of amplitude compression on wide band telephone speech for hearing-impaired elderly people. 2101-2104 - Takashi Otake, Miki Komatsu:
Word activation model by Japanese school children without knowledge of Roman alphabet. 2105-2108 - Sue Harding, Georg F. Meyer:
Multi-resolution auditory scene analysis: robust speech recognition using pattern-matching from a noisy signal. 2109-2112 - Hisami Matsui, Hideki Kawahara:
Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system. 2113-2116 - Kuldip K. Paliwal, Leigh D. Alsteris:
Usefulness of phase spectrum in human speech perception. 2117-2120 - Shinichi Tokuma:
Perception of English lexical stress by English and Japanese speakers: effect of duration and "realistic" intensity change. 2121-2124 - Pauline Welby:
French intonational rises and their role in speech segmentation. 2125-2128 - Won Tokuma:
Physical and perceptual configurations of Japanese fricatives from multidimensional scaling analyses. 2129-2132 - Ching-Pong Au:
An acquisition model of speech perception with considerations of temporal information. 2133-2136
Multi-Modal Processing and Speech Interface Design
- Ilyas Potamitis, Kallirroi Georgila, Nikos Fakotakis, George K. Kokkinakis:
An integrated system for smart-home control of appliances based on remote speech interaction. 2197-2200 - Jianhong Jin, Martin J. Russell, Michael J. Carey, James Chapman, Harvey Lloyd-Thomas, Graham Tattersall:
A spoken language interface to an electronic programme guide. 2201-2204 - Luís Seabra Lopes, António J. S. Teixeira, Mário Rodrigues, Diogo Gomes, Cláudio Teixeira, Liliana da Silva Ferreira, Pedro Filipe Soares, João Girão, Nuno Sénica:
Towards a personal robot with language interface. 2205-2208 - Jason D. Williams, Andrew T. Shaw, Lawrence Piano, Michael Abt:
Preference, perception, and task completion of open, menu-based, and directed prompts for call routing: a case study. 2209-2212 - Athanassios Hatzis, Phil D. Green, James Carmichael, Stuart P. Cunningham, Rebecca Palmer, Mark Parker, Peter O'Neill:
An integrated toolkit deploying speech technology for computer based speech training with application to dysarthric speakers. 2213-2216 - Bernhard Suhm:
Towards best practices for speech user interface design. 2217-2220 - David Stallard, John Makhoul, Fred Choi, Ehry MacRostie, Premkumar Natarajan, Richard M. Schwartz, Bushra Zawaydeh:
Design and evaluation of a limited two-way speech translator. 2221-2224 - Sorin Dusan, Gregory J. Gadbois, James L. Flanagan:
Multimodal interaction on PDAs integrating speech and pen inputs. 2225-2228 - Petra Gieselmann, Matthias Denecke:
Towards multimodal interaction with an intelligent room. 2229-2232 - Roberto Pieraccini, Krishna Dayanidhi, Jonathan Bloom, Jean-Gui Dahan, Michael Phillips, Bryan R. Goodman, K. Venkatesh Prasad:
A multimodal conversational interface for a concept vehicle. 2233-2236 - Ling Ma, Dan J. Smith, Ben P. Milner:
Context awareness using environmental noise classification. 2237-2240 - Tatsuya Shiraishi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Simple designing methods of corpus-based visual speech synthesis. 2241-2244 - Janienke Sturm, Ilse Bakx, Bert Cranen, Jacques M. B. Terken:
Comparing the usability of a user driven and a mixed initiative multimodal dialogue system for train timetable information. 2245-2248 - Dominic W. Massaro, Joanna Light:
Read my tongue movements: bimodal learning to perceive and produce non-native speech /r/ and /l/. 2249-2252 - Jesus F. Guitarte Perez, Klaus Lukas, Alejandro F. Frangi:
Low resource lip finding and tracking algorithm for embedded devices. 2253-2256 - Futoshi Asano, Yoichi Motomura, Hideki Asoh, Takashi Yoshimura, Naoyuki Ichimura, Kiyoshi Yamamoto, Nobuhiko Kitawaki, Satoshi Nakamura:
Detection and separation of speech segment using audio and video information fusion. 2257-2260 - Olov Engwall, Jonas Beskow:
Resynthesis of 3D tongue movements from facial data. 2261-2264 - Thorsten Trippel, Felix Sasaki, Benjamin Hell, Dafydd Gibbon:
Acquiring lexical information from multilevel temporal annotations. 2265-2268 - Piero Cosi, Andrea Fusaro, Graziano Tisato:
LUCIA: a new Italian talking-head based on a modified Cohen-Massaro's labial coarticulation model. 2269-2272 - Niloy Mukherjee, Deb Roy:
A visual context-aware multimodal system for spoken language processing. 2273-2276
Speech Recognition - Language Modeling
- Juan P. Piantanida, Claudio Estienne:
Maximum entropy good-turing estimator for language modeling. 2277-2280 - Xiaolong Li, Yunxin Zhao:
Exploiting order-preserving perfect hashing to speedup n-gram language model lookahead. 2281-2284 - Dimitris Oikonomidis, Vassilios Digalakis:
Stem-based maximum entropy language models for inflectional languages. 2285-2288 - Pavel Krbec, Petr Podveský, Jan Hajic:
Combination of a hidden tag model and a traditional n-gram model: a case study in Czech speech recognition. 2289-2292 - Vesa Siivola, Teemu Hirsimäki, Mathias Creutz, Mikko Kurimo:
Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner. 2293-2296 - Máté Szarvas, Sadaoki Furui:
Evaluation of the stochastic morphosyntactic language model on a one million word Hungarian dictation task. 2297-2300
Feature Analysis and Cross-Language Processing of Chinese Spoken Language
- Lin-Shan Lee, Shun-Chuan Chen:
Automatic title generation for Chinese spoken documents considering the special structure of the language. 2325-2328 - Bo Xu, Shuwu Zhang, Chengqing Zong:
Statistical speech-to-speech translation with multilingual speech recognition and bilingual-chunk parsing. 2329-2332 - Limin Du, Boxing Chen:
Automatic extraction of bilingual chunk lexicon for spoken language translation. 2333-2336 - Wai Kit Lo, Yuk-Chi Li, Gina-Anne Levow, Hsin-Min Wang, Helen M. Meng:
Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval. 2337-2340 - Chiu-yu Tseng:
Mandarin speech prosody: issues, pitfalls and directions. 2341-2344 - Aijun Li, Xia Wang:
A contrastive investigation of standard Mandarin and accented Mandarin. 2345-2348 - Jianhua Tao:
Emotion control of Chinese speech synthesis in natural environment. 2349-2352
Speech Production and Physiology
- Alexander S. Leonov, Victor N. Sorokin:
Optimality criteria in inverse problems for tongue-jaw interaction. 2353-2356 - Koji Sasaki, Nobuhiro Miki, Yoshikazu Miyanaga:
FEM analysis based on 3-D time-varying vocal tract shape. 2357-2360 - Jianwu Dang, Kiyoshi Honda:
Consideration of muscle co-contraction in a physiological articulatory model. 2361-2364 - Claudia Manfredi, Giorgio Peretti:
Robust techniques for pre- and post-surgical voice analysis. 2365-2368 - Karl Schnell, Arild Lacroix:
Analysis of lossy vocal tract models for speech production. 2369-2372 - Beatrice Fung-Wah Khioe:
Temporal properties of the nasals and nasalization in Cantonese. 2373-2376 - Frédéric Bettens, Francis Grenez, Jean Schoentgen:
Estimation of vocal noise in running speech by means of bi-directional double linear prediction. 2377-2380 - Abdulhussain E. Mahdi:
Visualisation of the vocal tract based on estimation of vocal area functions and formant frequencies. 2381-2384 - Denisse Sciamarella, Christophe d'Alessandro:
Reproducing laryngeal mechanisms with a two-mass model. 2385-2388 - Milan Bostik, Milan Sigmund:
Methods for estimation of glottal pulses waveforms exciting voiced speech. 2389-2392 - Zhaoyan Zhang, Carol Y. Espy-Wilson, Mark Tiede:
Acoustic modeling of American English lateral approximants. 2393-2396 - Sayoko Takano, Kiyoshi Honda, Shinobu Masaki, Yasuhiro Shimada, Ichiro Fujimoto:
Translation and rotation of the cricothyroid joint revealed by phonation-synchronized high-resolution MRI. 2397-2400
Speech Synthesis: Voice Conversion and Miscellaneous Topics
- Hiromichi Kawanami, Yohei Iwami, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
GMM-based voice conversion applied to emotional speech synthesis. 2401-2404 - Dimitrios Rentzos, Saeed Vaseghi, Qin Yan, Ching-Hsiang Ho, Emir Turajlic:
Probability models of formant parameters for voice conversion. 2405-2408 - Hui Ye, Steve J. Young:
Perceptually weighted linear transformations for voice conversion. 2409-2412 - Yining Chen, Min Chu, Eric Chang, Jia Liu, Runsheng Liu:
Voice conversion with smoothed GMM and MAP adaptation. 2413-2416 - Özgül Salor, Mübeccel Demirekler, Bryan L. Pellom:
A system for voice conversion based on adaptive filtering and line spectral frequency distance optimization for text-to-speech synthesis. 2417-2420 - Hiroki Mori, Hideki Kasuya:
Speaker conversion in ARX-based source-formant type speech synthesis. 2421-2424 - Andrew P. Breen, Steve Minnis, Barry Eggleton:
Implementing an SSML compliant concatenative TTS system. 2425-2428 - Zhenglai Gu, Hiroki Mori, Hideki Kasuya:
Acoustic variations of focused disyllabic words in Mandarin Chinese: analysis, synthesis and perception. 2429-2432 - Pedro J. Quintana-Morales, Juan L. Navarro-Mesa:
An approach to common acoustical pole and zero modeling of consecutive periods of voiced speech. 2433-2436 - Huiqun Deng, Michael P. Beddoes, Rabab Kreidieh Ward, Murray Hodgson:
Estimating the vocal-tract area function and the derivative of the glottal wave from a speech signal. 2437-2440 - Parham Zolfaghari, Tomohiro Nakatani, Toshio Irino, Hideki Kawahara, Fumitada Itakura:
Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis. 2441-2444 - Matti Karjalainen:
Mixed physical modeling techniques applied to speech production. 2445-2448 - Sascha Fagel, Walter F. Sendlmeier:
An expandable web-based audiovisual text-to-speech synthesis system. 2449-2452 - P. Nikleczy, Gábor Olaszy:
A reconstruction of Farkas Kempelen's speaking machine. 2453-2456 - Wentao Gu, Keikichi Hirose:
Acoustic model selection and voice quality assessment for HMM-based Mandarin speech synthesis. 2457-2460 - Junichi Yamagishi, Koji Onishi, Takashi Masuko, Takao Kobayashi:
Modeling of various speaking styles and emotions for HMM-based speech synthesis. 2461-2464 - Ranniery Maia, Heiga Zen, Keiichi Tokuda, Tadashi Kitamura, Fernando Gil Vianna Resende Jr.:
Towards the development of a Brazilian Portuguese text-to-speech system based on HMM. 2465-2468 - Paul Vozila, Jeff Adams, Yuliya Lobacheva, Ryan Thomas:
Grapheme to phoneme conversion and dictionary verification using graphonemes. 2469-2472 - Justin Fackrell, Wojciech Skut, Kathrine Hammervold:
Improving the accuracy of pronunciation prediction for unit selection TTS. 2473-2476 - Taniya Mishra, Esther Klabbers, Jan P. H. van Santen:
Detection of list-type sentences. 2477-2480
Acoustic Modelling 1, 2
- Ramon Prieto, Jing Jiang, Chi-Ho Choi:
A new pitch synchronous time domain phoneme recognizer using component analysis and pitch clustering. 2481-2484 - Hiroaki Kojima, Kazuyo Tanaka:
Mixed-lingual spoken word recognition by using VQ codebook sequences of variable length segments. 2485-2488 - Tommi Lahti, Olli Viikki, Marcel Vasilache:
Low memory acoustic models for HMM based speech recognition. 2489-2492 - José A. R. Fonollosa:
Nearest-neighbor search algorithms based on subcodebook selection and its application to speech recognition. 2493-2496 - Mohamed Kamal Omar, Mark Hasegawa-Johnson:
Non-linear maximum likelihood feature transformation for speech recognition. 2497-2500 - Soo-Young Suk, Ho-Youl Jung, Hyun-Yeol Chung:
Automatic generation of context-independent variable parameter models using successive state and mixture splitting. 2501-2504 - Andrej Zgank, Zdravko Kacic, Bogomir Horvat:
Data driven generation of broad classes for decision tree construction in acoustic modeling. 2505-2508 - Peder A. Olsen, Satya Dharanipragada:
An efficient integrated gender detection scheme and time mediated averaging of gender dependent acoustic models. 2509-2512 - Jun Ogata, Yasuo Ariki:
Syllable-based acoustic modeling for Japanese spontaneous speech recognition. 2513-2516 - Özgür Çetin, Mari Ostendorf:
Cross-stream observation dependencies for multi-stream speech recognition. 2517-2520 - Brian Kan-Wing Mak, Kin-Wah Chan:
Pruning transitions in a hidden Markov model with optimal brain surgeon. 2521-2524 - Mathew Magimai-Doss, Todd A. Stephenson, Hervé Bourlard:
Using pitch frequency information in speech recognition. 2525-2528 - Karen Livescu, James R. Glass, Jeff A. Bilmes:
Hidden feature models for speech recognition using dynamic Bayesian networks. 2529-2532 - Wei Hu, Yimin Zhang, Qian Diao, Shan Huang:
An efficient Viterbi algorithm on DBNs. 2533-2536 - Li Zhang, William H. Edmondson:
Speech recognition based on syllable recovery. 2537-2540 - Tarek Abu-Amer, Julie Carson-Berndsen:
HARTFEX: a multi-dimensional system of HMM based recognisers for articulatory features extraction. 2541-2544 - Benoît Maison:
Automatic baseform generation from acoustic data. 2545-2548 - Thurid Spiess, Britta Wrede, Gernot A. Fink, Franz Kummert:
Data-driven pronunciation modeling for ASR using acoustic subword units. 2549-2552 - Vincent Vanhoucke, Ananth Sankar:
Variable length mixtures of inverse covariances. 2605-2608 - Christoph Neukirchen:
Semi-tied full deviation matrices for Laplacian density models. 2609-2612 - Karthik Visweswariah, Scott Axelrod, Ramesh A. Gopinath:
Acoustic modeling with mixtures of subspace constrained exponential models. 2613-2616 - Vaibhava Goel, Scott Axelrod, Ramesh A. Gopinath, Peder A. Olsen, Karthik Visweswariah:
Discriminative estimation of subspace precision and mean (SPAM) models. 2617-2620 - Shinichi Yoshizawa, Kiyohiro Shikano:
Model-integration rapid training based on maximum likelihood for speech recognition. 2621-2624 - Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura:
On the use of kernel PCA for feature extraction in speech recognition. 2625-2628
Time is of the Essence - Dynamic Approaches to Spoken Language
- Steven Greenberg:
Time is of the essence - dynamic approaches to spoken language. 2553-2556 - Ken W. Grant, Steven Greenberg:
Spectro-temporal interactions in auditory and auditory-visual speech processing. 2557-2560 - David Poeppel:
Brain imaging correlates of temporal quantization in spoken language. 2561-2564 - Elliot Saltzman:
Temporal aspects of articulatory control. 2565-2568 - Brigitte Zellner Keller:
The temporal organisation of speech as gauged by speech synthesis. 2569-2572 - Michael Kleinschmidt:
Localized spectro-temporal features for automatic speech recognition. 2573-2576 - Les E. Atlas:
Modulation spectral filtering of speech. 2577-2580
Topics in Speech Recognition
- Roger K. Moore:
A comparison of the data requirements of automatic speech recognition systems and human listeners. 2581-2584 - Min Tang, Stephanie Seneff, Victor W. Zue:
Modeling linguistic features in speech recognition. 2585-2588 - Bhuvana Ramabhadran, Jing Huang, Upendra V. Chaudhari, Giridharan Iyengar, Harriet J. Nock:
Impact of audio segmentation and segment clustering on automated transcription accuracy of large spoken archives. 2589-2592 - Françoise Beaufays, Ananth Sankar, Shaun Williams, Mitch Weintraub:
Learning linguistically valid pronunciations from acoustic data. 2593-2596 - Nobuaki Minematsu, Koichi Osaki, Keikichi Hirose:
Improvement of non-native speech recognition by effectively modeling frequently observed pronunciation habits. 2597-2600 - Yoshitaka Nakajima, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell:
Non-audible murmur recognition. 2601-2604
Speaker and Language Recognition
- Yassine Mami, Delphine Charlet:
Speaker modeling from selected neighbors applied to speaker recognition. 2629-2632 - Elisabeth Zetterholm, Kirk P. H. Sullivan, James Green, Erik J. Eriksson, Jan van Doorn, Peter E. Czigler:
Who knows Carl Bildt? - and what if you don't? 2633-2636 - Carlos Vivaracho-Pascual, Javier Ortega-Garcia, Luis Alonso Romero, Q. Isaac Moro-Sancho:
Improving the competitiveness of discriminant neural networks in speaker verification. 2637-2640 - Tomi Kinnunen, Ville Hautamäki, Pasi Fränti:
On the fusion of dissimilarity-based classifiers for speaker identification. 2641-2644 - Ji Ming, Darryl Stewart, Philip Hanna, Pat Corr, Francis Jack Smith, Saeed Vaseghi:
Robust speaker identification using posterior union models. 2645-2648 - Ran D. Zilca, Jirí Navrátil, Ganesh N. Ramaswamy:
"syncpitch": a pseudo pitch synchronous algorithm for speaker recognition. 2649-2652 - Soonil Kwon, Shrikanth S. Narayanan:
A method for on-line speaker indexing using generic reference models. 2653-2656 - Mohamed Mihoubi, Gilles Boulianne, Pierre Dumouchel:
Discriminative training and maximum likelihood detector for speaker identification. 2657-2660 - Sachin S. Kajarekar, André Gustavo Adami, Hynek Hermansky:
Novel approaches for one- and two-speaker detection. 2661-2664 - Joseph P. Campbell, Douglas A. Reynolds, Robert B. Dunn:
Fusing high- and low-level features for speaker recognition. 2665-2668 - P. Sivakumaran, J. Fortuna, Aladdin M. Ariyaeeinia:
Score normalisation applied to open-set, text-independent speaker identification. 2669-2672 - Mijail Arcienega, Andrzej Drygajlo:
On the number of Gaussian components in a mixture: an application to speaker verification tasks. 2673-2676 - Giampiero Salvi:
Using accent information in ASR models for Swedish. 2677-2680 - Hideharu Nakajima, Masaaki Nagata, Hisako Asano, Masanobu Abe:
Estimating Japanese word accent from syllable sequence using support vector machine. 2681-2684 - Ricardo de Córdoba, G. Prime, Javier Macías Guarasa, Juan Manuel Montero, Javier Ferreiros, José Manuel Pardo:
PPRLM optimization for language identification in air traffic control tasks. 2685-2688
Spoken Language Understanding and Translation
- Hsin-Hsi Chen:
Spoken cross-language access to image collection via captions. 2749-2752 - Salma Jamoussi, Kamel Smaïli, Jean Paul Haton:
Understanding process for speech recognition. 2753-2756 - Toshiyuki Takezawa, Gen-ichiro Kikui:
Collecting machine-translation-aided bilingual dialogues for corpus-based speech translation. 2757-2760 - Chai Wutiwiwatchai, Sadaoki Furui:
Combination of finite state automata and neural network for spoken language understanding. 2761-2764 - James Horlock, Simon King:
Discriminative methods for improving named entity extraction on speech data. 2765-2768 - Liang Gu, Yuqing Gao, Michael Picheny:
Improving statistical natural concept generation in interlingua-based speech-to-speech translation. 2769-2772 - Jérôme Goulian, Jean-Yves Antoine, Franck Poirier:
How NLP techniques can improve speech understanding: ROMUS - a robust chunk based message understanding system using link grammars. 2773-2776 - Ciprian Chelba, Alex Acero:
Discriminative training of n-gram classifiers for speech and text routing. 2777-2780 - Matthias Honal, Tanja Schultz:
Correction of disfluencies in spontaneous speech using a noisy-channel approach. 2781-2784 - Konstantinos Koumpis, Steve Renals:
Multi-class extractive voicemail summarization. 2785-2788 - Gökhan Tür, Mazin G. Rahim, Dilek Hakkani-Tür:
Active labeling for spoken language understanding. 2789-2792 - Gökhan Tür, Dilek Hakkani-Tür:
Exploiting unlabeled utterances for spoken language understanding. 2793-2796 - Fu-Hua Liu, Yuqing Gao, Liang Gu, Michael Picheny:
Noise robustness in speech to speech translation. 2797-2800 - Kai-Chung Siu, Helen M. Meng, Chin-Chung Wong:
Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars. 2801-2804 - Britta Wrede, Elizabeth Shriberg:
Spotting "hot spots" in meetings: human judgments and prosodic cues. 2805-2808 - Ye-Yi Wang, Alex Acero:
Combination of CFG and n-gram modeling in semantic grammar learning. 2809-2812 - Shun-Chuan Chen, Lin-Shan Lee:
Automatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach. 2813-2816 - Takaaki Hori, Chiori Hori, Yasuhiro Minami:
Speech summarization using weighted finite-state transducers. 2817-2820 - Yun-Tien Lee, Shun-Chuan Chen, Lin-Shan Lee:
Cross domain Chinese speech understanding and answering based on named-entity extraction. 2821-2824 - Chiori Hori, Takaaki Hori, Sadaoki Furui:
Evaluation method for automatic speech summarization. 2825-2828 - Li Li, Feng Liu, Wu Chou:
An information theoretic approach for using word cluster information in natural language call routing. 2829-2832 - Sreenivasa Sista, Amit Srivastava, Francis Kubala, Richard M. Schwartz:
Unsupervised topic discovery applied to segmentation of news transcriptions. 2833-2836
Towards a Roadmap for Speech Technology
- Paul Heisterkamp:
"do not attempt to light with match!": some thoughts on progress and research goals in spoken dialog systems. 2897-2900 - Björn Granström, David House:
Multimodality and speech technology: verbal and non-verbal communication in talking agents. 2901-2904 - Ronald A. Cole:
Roadmaps, journeys and destinations: speculations on the future of speech technology research. 2905-2908 - Roger K. Moore:
Spoken language output: realising the vision. 2909-2912
Speaker Recognition and Verification
- Patrick Kenny, Mohamed Mihoubi, Pierre Dumouchel:
New MAP estimators for speaker recognition. 2961-2964 - Pedro J. Moreno, Purdy Ho:
A new SVM approach to speaker identification and verification using probabilistic distance kernels. 2965-2968 - Ming-Cheung Cheung, Man-Wai Mak, Sun-Yuan Kung:
Adaptive decision fusion for multi-sample speaker verification over GSM networks. 2969-2972 - Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung:
Environment adaptation for robust speaker verification. 2973-2976 - Yaniv Zigel, Arnon Cohen:
On cohort selection for speaker verification. 2977-2980 - Chakib Tadj, A. Benlahouar:
Speaker characterization using principal component analysis and wavelet transform for speaker verification. 2981-2984 - Yuya Akita, Tatsuya Kawahara:
Unsupervised speaker indexing using anchor models and automatic transcription of discussions. 2985-2988 - Klaus R. Scherer, Didier Grandjean, Tom Johnstone, Gudrun Klasmeyer, Tanja Bänziger:
A statistical approach to assessing speech and voice variability in speaker verification. 2989-2992 - Wei-Ho Tsai, Hsin-Min Wang, Dwight Rodgers:
Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal. 2993-2996 - Michele Vescovi, Mauro Cettolo, Romeo Rizzi:
A DP algorithm for speaker change detection. 2997-3000 - Itshak Lapidot:
SOM as likelihood estimator for speaker clustering. 3001-3004 - Nobuaki Minematsu, Keita Yamauchi, Keikichi Hirose:
Automatic estimation of perceptual age using speaker modeling techniques. 3005-3008 - Ryan Rifkin:
Speaker recognition using local models. 3009-3012 - Robbie Vogt, Jason W. Pelecanos, Sridha Sridharan:
Dependence of GMM adaptation on feature post-processing for speaker recognition. 3013-3016 - Seiichi Nakagawa, Wei Zhang:
Text-independent speaker recognition by speaker-specific GMM and speaker adapted syllable-based HMM. 3017-3020 - Ales Padrta, Vlasta Radová:
On the amount of speech data necessary for successful speaker identification. 3021-3024 - Ulrich Türk, Florian Schiel:
Speaker verification based on the German veridat database. 3025-3028
Multi-Lingual Spoken Language Processing
- Volker Fischer, Eric Janke, Siegfried Kunzmann:
Recent progress in the decoding of non-native speech with multilingual acoustic models. 3105-3108 - Wei-Chih Kuo, Li-Feng Lin, Yih-Ru Wang, Sin-Horng Chen:
An NN-based approach to prosodic information generation for synthesizing English words embedded in Chinese text. 3109-3112 - Shoichi Matsunaga, Atsunori Ogawa, Yoshikazu Yamaguchi, Akihiro Imamura:
Speaker adaptation for non-native speakers using bilingual English lexicon and acoustic models. 3113-3116 - Viet Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli:
Using the web for fast language model construction in minority languages. 3117-3120 - Yan Ming Cheng, Chen Liu, Yuanjun Wei, Lynette Melnar, Changxue Ma:
An approach to multilingual acoustic modeling for portable devices. 3121-3124 - Terrence Martin, Torbjørn Svendsen, Sridha Sridharan:
Cross-lingual pronunciation modelling for Indonesian speech recognition. 3125-3128 - Woosung Kim, Sanjeev Khudanpur:
Language model adaptation using cross-lingual information. 3129-3132 - Eddie Wong, Terrence Martin, Torbjørn Svendsen, Sridha Sridharan:
Multilingual phone clustering for recognition of spontaneous Indonesian speech utilising pronunciation modelling techniques. 3133-3136 - Naveen Srinivasamurthy, Shrikanth S. Narayanan:
Language-adaptive Persian speech recognition. 3137-3140 - Mirjam Killer, Sebastian Stüker, Tanja Schultz:
Grapheme based speech recognition. 3141-3144
Interdisciplinary
- Valery A. Petrushin:
Learning Chinese tones. 3145-3148 - Keikichi Hirose, Frédéric Gendrin, Nobuaki Minematsu:
A pronunciation training system for Japanese lexical accents with corrective feedback in learner's voice. 3149-3152 - Taro Mouri, Keikichi Hirose, Nobuaki Minematsu:
Considerations on vowel durations for Japanese CALL system. 3153-3156 - Hiroaki Kato, Masumi Nukinay, Hideki Kawahara, Reiko Akahane-Yamada:
Influence of recording equipment on the identification of second language phoneme contrasts. 3157-3160 - Yik-Cheung Tam, Jack Mostow, Joseph E. Beck, Satanjeev Banerjee:
Training a confidence measure for a reading tutor that listens. 3161-3164 - Satanjeev Banerjee, Joseph E. Beck, Jack Mostow:
Evaluating the effect of predicting oral reading miscues. 3165-3168 - Miroslav Holada, Jan Nouza:
VISPER II - enhanced version of the educational software for speech processing courses. 3169-3172 - Meirong Lu, Kazuyuki Takagi, Kazuhiko Ozeki:
The use of multiple pause information in dependency structure analysis of spoken Japanese sentences. 3173-3176 - Kazuyuki Takagi, Mamiko Okimoto, Yoshio Ogawa, Kazuhiko Ozeki:
A neural network approach to dependency analysis of Japanese sentences using prosodic information. 3177-3180 - Hisako Asano, Masaaki Nagata, Masanobu Abe:
Say-as classification for alphabetic words in Japanese texts. 3181-3184 - Kazushi Ishihara, Yasushi Tsubota, Hiroshi G. Okuno:
Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure. 3185-3188 - Heiga Zen, Keiichi Tokuda, Tadashi Kitamura:
Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling. 3189-3192 - Seiichi Nakagawa, Kazumasa Mori, Naoki Nakamura:
A statistical method of evaluating pronunciation proficiency for English words spoken by Japanese. 3193-3196