ICSLP 1996: Philadelphia, PA, USA
- The 4th International Conference on Spoken Language Processing, Philadelphia, PA, USA, October 3-6, 1996. ISCA 1996
Plenary Lectures
- Anne Cutler: The comparative study of spoken-language processing. 1
- James L. Flanagan: Natural communication with machines - progress and challenge. 2522
Large Vocabulary
- Zhishun Li, Michel Héon, Douglas D. O'Shaughnessy: New developments in the INRS continuous speech recognition system. 2-5
- Lori Lamel, Gilles Adda: On designing pronunciation lexicons for large vocabulary, continuous speech recognition. 6-9
- Pablo Fetter, Frédéric Dandurand, Peter Regel-Brietzmann: Word graph rescoring using confidence measures. 10-13
- Xavier L. Aubert, Peter Beyerlein, Meinhard Ullrich: A bottom-up approach for handling unseen triphones in large vocabulary continuous speech recognition. 14-17
- V. Valtchev, Philip C. Woodland, Steve J. Young: Discriminative optimisation of large vocabulary recognition systems. 18-21
- Tatsuo Matsuoka, Katsutoshi Ohtsuki, Takeshi Mori, Sadaoki Furui, Katsuhiko Shirai: Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus. 22-25
- David M. Carter, Jaan Kaja, Leonardo Neumeyer, Manny Rayner, Fuliang Weng, Mats Wirén: Handling compound nouns in a Swedish speech-understanding system. 26-29
- Javier Macías Guarasa, Ascensión Gallardo-Antolín, Javier Ferreiros, José Manuel Pardo, Luis Villarrubia Grande: Initial evaluation of a preselection module for a flexible large vocabulary speech recognition system in. 30-33
Multimodal ASR (Face and Lips)
- Mamoun Alissali, Paul Deléglise, Alexandrina Rogozan: Asynchronous integration of visual information in an automatic speech recognition system. 34-37
- Iain A. Matthews, J. Andrew Bangham, Stephen J. Cox: Audiovisual speech recognition using multiscale nonlinear image decomposition. 38-41
- Qin Su, Peter L. Silsbee: Robust audiovisual integration using semicontinuous hidden Markov models. 42-45
- Richard P. Schumeyer, Kenneth E. Barner: The effect of visual information on word initial consonant perception of dysarthric speech. 46-49
- Devi Chandramohan, Peter L. Silsbee: A multiple deformable template approach for visual speech recognition. 50-53
- Piero Cosi, Emanuela Magno Caldognetto, Franco Ferrero, M. Dugatto, Kyriaki Vagges: Speaker independent bimodal phonetic recognition experiments. 54-57
- Juergen Luettin, Neil A. Thacker, Steve W. Beet: Speechreading using shape and intensity information. 58-61
- Juergen Luettin, Neil A. Thacker, Steve W. Beet: Speaker identification by lipreading. 62-65
Perception of Words
- David W. Gow Jr., Janis Melvold, Sharon Manuel: How word onsets drive lexical access and segmentation: evidence from acoustics, phonology and processing. 66-69
- David van Kuijk, Peter Wittenburg, Ton Dijkstra: RAW: a real-speech model for human word recognition. 70-73
- Mehdi Meftah, Sami Boudelaa: How facilitatory can lexical information be during word recognition? Evidence from Moroccan Arabic. 74-77
- Alette P. Haveman: Effects of frequency on the auditory perception of open- versus closed-class words. 78-81
- Michael S. Vitevitch, Paul A. Luce, Jan Charles-Luce, David Kemmerer: Phonotactic and metrical influences on adult ratings of spoken nonsense words. 82-85
- Edward T. Auer, Lynne E. Bernstein: Lipreading supplemented by voice fundamental frequency: to what extent does the addition of voicing increase lexical uniqueness for the lipreader? 86-89
- Saskia te Riele, Sieb G. Nooteboom, Hugo Quené: Strategies used in rhyme-monitoring. 90-93
- Wilma van Donselaar, Cecile T. L. Kuijpers, Anne Cutler: How do Dutch listeners process words with epenthetic schwa? 94-97
Phonetics, Transcription, and Analysis
- Patrick Juola, Philip Zimmermann: Whole-word phonetic distances and the PGPfone alphabet. 98-101
- Shuping Ran, J. Bruce Millar, Phil Rose: Automatic vowel quality description using a variable mapping to an eight cardinal vowel reference set. 102-105
- Andreas Kipp, Maria-Barbara Wesenick, Florian Schiel: Automatic detection and segmentation of pronunciation variants in German speech corpora. 106-109
- Stephanie Seneff, Raymond Lau, Helen M. Meng: ANGIE: a new framework for speech analysis based on morpho-phonological modelling. 110-113
- Byunggon Yang: Perceptual contrast in the Korean and English vowel system normalized. 114-117
- Yong-Ju Lee, Sook-Hyang Lee: On phonetic characteristics of pause in the Korean read speech. 118-120
- Sami Boudelaa, Mehdi Meftah: Cross-language effects of lexical stress in word recognition: the case of Arabic-English bilinguals. 121-124
- Maria-Barbara Wesenick: Automatic generation of German pronunciation variants. 125-128
- Maria-Barbara Wesenick, Andreas Kipp: Estimating the quality of phonetic transcriptions and segmentations of speech signals. 129-132
- Bojan Petek, Rastislav Sustarsic, Smiljana Komar: An acoustic analysis of contemporary vowels of the standard Slovenian language. 133-136
- Sandrine Robbe, Anne Bonneau, Sylvie Coste, Yves Laprie: Using decision trees to construct optimal acoustic cues. 137-140
- Donna Erickson, Osamu Fujimura: Maximum jaw displacement in contrastive emphasis. 141-144
- Rebecca Herman, Mary E. Beckman, Kiyoshi Honda: Subglottal pressure and final lowering in English. 145-148
- Cecile T. L. Kuijpers, Wilma van Donselaar, Anne Cutler: Phonological variation: epenthesis and deletion of schwa in Dutch. 149-152
Spoken Language Processing for Special Populations
- James J. Mahshie: Feedback considerations for speech training systems. 153-156
- Anne-Marie Öster: Clinical applications of computer-based speech training for children with hearing impairment. 157-160
- Valérie Hazan, Andrew Simpson: Enhancing information-rich regions of natural VCV and sentence materials presented in noise. 161-164
- Valérie Hazan, Alan Adlard: Speech perceptual abilities of children with specific reading difficulty (dyslexia). 165-168
- Larry D. Paarmann, Michael K. Wynne: Bimodal perception of spectrum compressed speech. 169-172
- Dragana Barac-Cikoja, Sally Revoile: Effect of sentential context on syllabic stress perception by hearing-impaired listeners. 173-175
- Martin J. Russell, Catherine Brown, Adrian Skilling, Robert W. Series, Julie L. Wallace, Bill Bohnam, Paul Barker: Applications of automatic speech recognition to speech and language development in young children. 176-179
- D. R. Campbell: Sub-band adaptive speech enhancement for hearing aids. 180-183
- Thomas Portele, Jürgen Krämer: Adapting a TTS system to a reading machine for the blind. 184-187
Dialogue Special Sessions
- Katsuhiko Shirai: Modeling of spoken dialogue with and without visual information. 188-191
- Stephanie Seneff, David Goddeau, Christine Pao, Joseph Polifroni: Multimodal discourse modelling in a multi-user multi-domain environment. 192-195
- Kenji Kita, Yoshikazu Fukui, Masaaki Nagata, Tsuyoshi Morimoto: Automatic acquisition of probabilistic dialogue models. 196-199
- Paul Heisterkamp, Scott McGlashan: Units of dialogue management: an example. 200-203
- Sharon L. Oviatt, Robert VanGent: Error resolution during multimodal human-computer interaction. 204-207
- Ramesh R. Sarukkai, Dana H. Ballard: Improved spontaneous dialogue recognition using dialogue and utterance triggers by adaptive probability boosting. 208-211
- Kai Hübener, Uwe Jost, Henrik Heine: Speech recognition for spontaneously spoken German dialogues. 212-215
- Paul Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, Jacqueline C. Kowtko: Using prosodic information to constrain language models for spoken dialogue. 216-219
- Peter A. Heeman, Kyung-ho Loken-Kim, James F. Allen: Combining the detection and correction of speech repairs. 362-365
- Yuji Sagawa, Wataru Sugimoto, Noboru Ohnishi: Generating spontaneous elliptical utterance. 366-369
- Gösta Bruce, Marcus Filipsson, Johan Frid, Björn Granström, Kjell Gustafson, Merle Horne, David House, Birgitta Lastow, Paul Touati: Developing the modelling of Swedish prosody in spontaneous dialogue. 370-373
- Shimei Pan, Kathleen R. McKeown: Spoken language generation in a multimedia system. 374-377
- Keikichi Hirose, Mayumi Sakata, Hiromichi Kawanami: Synthesizing dialogue speech of Japanese based on the quantitative analysis of prosodic features. 378-381
- Shuichi Tanaka, Shu Nakazato, Keiichiro Hoashi, Katsuhiko Shirai: Spoken dialogue interface in a dual task situation. 382-385
- Yasuhisa Niimi, Yutaka Kobayashi: A dialogue control strategy based on the reliability of speech recognition. 534-537
- Alexander I. Rudnicky, Stephen Reed, Eric H. Thayer: Speechwear: a mobile speech system. 538-541
- Helen M. Meng, Senis Busayapongchai, James R. Glass, David Goddeau, I. Lee Hetherington, Edward Hurley, Christine Pao, Joseph Polifroni, Stephanie Seneff, Victor Zue: WHEELS: a conversational system in the automobile classifieds domain. 542-545
- M. David Sadek, Alexandre Ferrieux, A. Cozannet, Philippe Bretier, Franck Panaget, J. Simonin: Effective human-computer cooperative spoken dialogue: the AGS demonstrator. 546-549
- Samir Bennacef, Laurence Devillers, Sophie Rosset, Lori Lamel: Dialog in the RAILTEL telephone-based system. 550-553
- Alon Lavie, Lori S. Levin, Yan Qu, Alex Waibel, Donna Gates, Marsal Gavaldà, Laura Mayfield, Maite Taboada: Dialogue processing in a conversational speech translation system. 554-557
Language Modeling
- Thomas Niesler, Philip C. Woodland: Combination of word-based and category-based language models. 220-223
- Francisco J. Valverde-Albacete, José Manuel Pardo: A multi-level lexical-semantics based language model design for guided integrated continuous speech recognition. 224-227
- Florian Gallwitz, Elmar Nöth, Heinrich Niemann: A category based approach for recognition of out-of-vocabulary words. 228-231
- Kristie Seymore, Ronald Rosenfeld: Scalable backoff language models. 232-235
- Rukmini Iyer, Mari Ostendorf: Modeling long distance dependence in language: topic mixtures vs. dynamic cache models. 236-239
- Marcello Federico: Bayesian estimation methods for n-gram language model adaptation. 240-243
- Man-Hung Siu, Mari Ostendorf: Modeling disfluencies in conversational speech. 386-389
- John Miller, Fil Alleva: Evaluation of a language model using a clustered model backoff. 390-393
- Antonio Bonafonte, José B. Mariño: Language modeling using x-grams. 394-397
- Klaus Ries, Finn Dag Buø, Alex Waibel: Class phrase models for language modelling. 398-401
- Petra Geutner: Introducing linguistic constraints into statistical language modeling. 402-405
- Jianying Hu, William Turin, Michael K. Brown: Language modeling with stochastic automata. 406-409
Feature Extraction for Speech Recognition
- Don X. Sun: Feature dimension reduction using reduced-rank maximum likelihood estimation for hidden Markov models. 244-247
- Kai Hübener: Using multi-level segmentation coefficients to improve HMM speech recognition. 248-251
- Thomas Eisele, Reinhold Haeb-Umbach, Detlev Langmann: A comparative study of linear feature transformation techniques for automatic speech recognition. 252-255
- Ben Milner: Inclusion of temporal information into features for speech recognition. 256-259
- Hubert Wassner, Gérard Chollet: New cepstral representation using wavelet analysis and spectral transformation for robust speech recognition. 260-263
- Christopher John Long, Sekharajit Datta: Wavelet based feature extraction for phoneme recognition. 264-267
- Andrzej Drygajlo: New fast wavelet packet transform algorithms for frame synchronized speech processing. 410-413
- Srinivasan Umesh, Leon Cohen, Nenad Marinovic, Douglas J. Nelson: Frequency-warping in speech. 414-417
- Daisuke Kobayashi, Shoji Kajita, Kazuya Takeda, Fumitada Itakura: Extracting speech features from human speech-like noise. 418-421
- Shoji Kajita, Kazuya Takeda, Fumitada Itakura: Subband-crosscorrelation analysis for robust speech recognition. 422-425
- Hervé Bourlard, Stéphane Dupont: A new ASR approach based on independent processing and recombination of partial frequency bands. 426-429
- Climent Nadeu, José B. Mariño, Javier Hernando, Albino Nogueiras: Frequency and time filtering of filter-bank energies for HMM speech recognition. 430-433
Speech Production - Measurement and Modeling
- Yves Laprie, Marie-Odile Berger: Extraction of tongue contours in x-ray images with minimal user interaction. 268-271
- Didier Demolin, Thierry Metens, Alain Soquet: Three-dimensional measurement of the vocal tract by MRI. 272-275
- Philip Gleason, Betty Tuller, J. A. Scott Kelso: Syllable affiliation of final consonant clusters undergoes a phase transition over speaking rates. 276-278
- Arthur Lobo, Michael H. O'Malley: Towards a biomechanical model of the larynx. 279-282
- Yann Morlec, Gérard Bailly, Véronique Aubergé: Generating intonation by superposing gestures. 283-286
- Hideki Kawahara, Hiroko Kato, J. C. Williams: Effects of auditory feedback on F0 trajectory generation. 287-290
Speech Coding / HMMs and NNs in ASR
- Ian S. Burnett, John J. Parry: On the effects of accent and language on low rate speech coders. 291-294
- Jeng-Shyang Pan, Fergus R. McInnes, Mervyn A. Jack: VQ codevector index assignment using genetic algorithms for noisy channels. 295-298
- Gavin C. Cawley: An improved vector quantization algorithm for speech transmission over noisy channels. 299-301
- C. Murgia, Gang Feng, Alain Le Guyader, Catherine Quinquis: Very low delay and high quality coding of 20 Hz-15 kHz speech signals at 64 kbit/s. 302-305
- Carlos M. Ribeiro, Isabel Trancoso: Application of speaker modification techniques to phonetic vocoding. 306-309
- Tadashi Yonezaki, Kiyohiro Shikano: Entropy coded vector quantization with hidden Markov models. 310-313
- Minoru Kohata: An application of recurrent neural networks to low bit rate speech coding. 314-317
- Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai: CELP coding system based on mel-generalized cepstral analysis. 318-321
- Cheung-Fat Chan, Wai-Kwong Hui: Wideband re-synthesis of narrowband CELP-coded speech using multiband excitation model. 322-325
- Takuya Koizumi, Mikio Mori, Shuji Taniguchi, Mitsutoshi Maruya: Recurrent neural networks for phoneme recognition. 326-329
- M. A. Mokhtar, A. Zein-el-Abddin: A model for the acoustic phonetic structure of Arabic language using a single ergodic hidden Markov model. 330-333
- Yifan Gong, Irina Illina, Jean Paul Haton: Modelling long term variability information in mixture stochastic trajectory framework. 334-337
- Thierry Moudenc, Robert Sokol, Guy Mercier: Segmental phonetic features recognition by means of neural-fuzzy networks and integration in an n-best solutions post-processing. 338-341
- Irina Illina, Yifan Gong: Stochastic trajectory model with state-mixture for continuous speech recognition. 342-345
- Hermann Hild, Alex Waibel: Recognition of spelled names over the telephone. 346-349
- Gilles Boulianne, Patrick Kenny: Optimal tying of HMM mixture densities using decision trees. 350-353
- Hwan Jin Choi, Yung-Hwan Oh: Speech recognition using an enhanced FVQ based on a codeword dependent distribution normalization and codeword weighting by fuzzy objective function. 354-357
- Mikko Kurimo, Panu Somervuo: Using the self-organizing map to speed up the probability density estimation for speech recognition with mixture density HMMs. 358-361
Vowels
- Carrie E. Lang, John J. Ohala: Temporal cues for vowels and universals of vowel inventories. 434-437
- Ann K. Syrdal: Acoustic variability in spontaneous conversational speech of American English talkers. 438-441
- Raquel Willerman, Patricia K. Kuhl: Cross-language speech perception: Swedish, English, and Spanish speakers' perception of front rounded vowels. 442-445
- John C. L. Ingram, See-Gyoon Park: Inter-language vowel perception and production by Korean and Japanese listeners. 446-449
- Diane Kewley-Port, Reiko Akahane-Yamada, Kiyoaki Aikawa: Intelligibility and acoustic correlates of Japanese accented English vowels. 450-453
- Kiyoko Yoneyama: Segmentation strategies for spoken language recognition: evidence from semi-bilingual Japanese speakers of English. 454-457
NNs and Stochastic Modeling
- Geunbae Lee, Jong-Hyeok Lee, Kyubong Park, Byung-Chang Kim: Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing. 458-461
- Hynek Hermansky, Sangita Tibrewala, Misha Pavel: Towards ASR on partially corrupted speech. 462-465
- Herbert Gish, Kenney Ng: Parametric trajectory models for speech recognition. 466-469
- Kate M. Knill, Mark J. F. Gales, Steve J. Young: Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs. 470-473
- Jesper Högberg, Kåre Sjölander: Cross phone state clustering using lexical stress and context. 474-477
- Eduardo Lleida-Solano, Richard C. Rose: Likelihood ratio decoding and confidence measures for continuous speech recognition. 478-481
- Xiaohui Ma, Yifan Gong, Yuqing Fu, Jiren Lu, Jean Paul Haton: A study on continuous Chinese speech recognition based on stochastic trajectory models. 482-485
- Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka: A proposal for a new algorithm of reference interval-free continuous DP for real-time speech or text retrieval. 486-489
- Akinori Ito, Masaki Kohda: Language modeling by string pattern n-gram for Japanese speech recognition. 490-493
- Reinhard Kneser: Statistical language modeling using a variable context length. 494-497
- Finn Tore Johansen: A comparison of hybrid HMM architectures using global discriminative training. 498-501
- Wei Wei, Etienne Barnard, Mark A. Fanty: Improved probability estimation with neural network models. 502-505
- Ha-Jin Yu, Yung-Hwan Oh: A neural network using acoustic sub-word units for continuous speech recognition. 506-509
- Louis ten Bosch, Roel Smits: On the error criteria in neural networks as a tool for human classification modelling. 510-513
- Gordon Ramsay: A non-linear filtering approach to stochastic training of the articulatory-acoustic mapping using the EM algorithm. 514-517
- Y. P. Yang, John R. Deller Jr.: A tool for automated design of language models. 518-521
- Felix Freitag, Enric Monte: Acoustic-phonetic decoding based on Elman predictive neural networks. 522-525
- Tan Lee, P. C. Ching: On improving discrimination capability of an RNN based recognizer. 526-529
- Yumi Wakita, Jun Kawai, Hitoshi Iida: An evaluation of statistical language modeling for speech recognition using a mixed category of both words and parts-of-speech. 530-533
Neural Models of Speech Processing
- Boris Aleksandrovsky, James Whitson, Gretchen Andes, Gary Lynch, Richard Granger: Novel speech processing mechanism derived from auditory neocortical circuit analysis. 558-561
- Ping Tang, Jean Rouat: Modeling neurons in the anteroventral cochlear nucleus for amplitude modulation (AM) processing: application to speech sound. 562-565
- Halewijn Vereecken, Jean-Pierre Martens: Noise suppression and loudness normalization in an auditory model-based acoustic front-end. 566-569
- James J. Hant, Brian Strope, Abeer Alwan: A psychoacoustic model for the noise masking of voiceless plosive bursts. 570-573
- Martin Hunke, Thomas Holton: Training machine classifiers to match the performance of human listeners in a natural vowel classification task. 574-577
- Kiyoaki Aikawa, Hideki Kawahara, Minoru Tsuzaki: A neural matrix model for active tracking of frequency-modulated tones. 578-581
Utterance Verification and Word Spotting
- Richard C. Rose, Eduardo Lleida-Solano, G. W. Erhart, R. V. Grubbe: A user-configurable system for voice label recognition. 582-585
- Philippe Gelin, Christian Wellekens: Keyword spotting enhancement for video soundtrack indexing. 586-589
- Rachida El Méliani, Douglas D. O'Shaughnessy: New efficient fillers for unlimited word recognition and keyword spotting. 590-593
- Michelle S. Spina, Victor Zue: Automatic transcription of general audio data: preliminary analyses. 594-597
- Francis Kubala, Tasos Anastasakos, Hubert Jin, Long Nguyen, Richard M. Schwartz: Transcribing radio news. 598-601
- Anand R. Setlur, Rafid A. Sukkar, John Jacob: Correcting recognition errors via discriminative utterance verification. 602-605
Acquisition/Learning Training L2 Learners
- Reiko Akahane-Yamada, Yoh'ichi Tohkura, Ann R. Bradlow, David B. Pisoni: Does training in speech perception modify speech production? 606-609
- Motoko Ueyama: Phrase-final lengthening and stress-timed shortening in the speech of native speakers and Japanese learners of English. 610-613
- Nobuko Yamada: Japanese accentuations by foreign students and Japanese speakers of non-Tokyo dialect. 614-617
- J. Kevin Varden, Tsutomu Sato: Devoicing of Japanese vowels by Taiwanese learners of Japanese. 618-621
- Danièle Archambault, Catherine Foucher, Blagovesta Maneva: Fluency and use of segmental dialect features in the acquisition of a second language (French) by English speakers. 622-625
- P. Martland, Sandra P. Whiteside, Steve W. Beet, Ladan Baghai-Ravary: Estimating child and adolescent formant frequency values from adult data. 626-629
Focus, Stress and Accent
- Agaath M. C. Sluijter, Vincent J. van Heuven: Acoustic correlates of linguistic stress and accent in Dutch and American English. 630-633
- Hiroya Fujisaki, Sumio Ohno, Osamu Tomita: On the levels of accentuation in spoken Japanese. 634-637
- Linda Thibault, Marise Ouellet: Tonal distinctions between emphatic stress and pretonic lengthening in Quebec French. 638-641
- Anja (Petzold) Elsner: Distinction between 'normal' focus and 'contrastive/emphatic' focus. 642-645
- Yukihiro Nishinuma, Masako Arai, Takako Ayusawa: Perception of tonal accent by Americans learning Japanese. 646-649
- Elizabeth Shriberg, D. Robert Ladd, Jacques M. B. Terken: Modeling intra-speaker pitch range variation: predicting F0 targets when "speaking up". 650-653
Spoken Language Dialogue and Conversation
- Norbert Reithinger, Ralf Engel, Michael Kipp, Martin Klesen: Predicting dialogue acts for a speech-to-speech translation system. 654-657
- Johannes Müller, Holger Stahl, Manfred K. Lang: Automatic speech translation based on the semantic structure. 658-661
- Lewis M. Norton, Carl Weir, K. W. Scholz, Deborah A. Dahl, Ahmed Bouzid: A methodology for application development for spoken language systems. 662-664
- Stephanie Seneff, Joseph Polifroni: A new restaurant guide conversational system: issues in rapid prototyping for specialized domains. 665-668
- Tadahiko Kumamoto, Akira Ito: Semantic interpretation of a Japanese complex sentence in an advisory dialogue - focused on the postpositional word "KEDO," which works as a conjunction between clauses. 669-672
- Youngkuk Hong, Myoung-Wan Koo, Gijoo Yang: A Korean morphological analyzer for speech translation system. 673-676
- Rolf Carlson, Sheri Hunnicutt: Generic and domain-specific aspects of the Waxholm NLP and dialog modules. 677-680
- Megumi Kameyama, Goh Kawai, Isao Arima: A real-time system for summarizing human-human spontaneous spoken dialogues. 681-684
- Bernd Hildebrandt, Heike Rautenstrauch, Gerhard Sagerer: Evaluation of spoken language understanding and dialogue systems. 685-688
- Kuniko Kakita: Inter-speaker interaction of F0 in dialogs. 689-692
- Hans Brandt-Pook, Gernot A. Fink, Bernd Hildebrandt, Franz Kummert, Gerhard Sagerer: A robust dialogue system for making an appointment. 693-696
- Kazuyuki Takagi, Shuichi Itahashi: Segmentation of spoken dialogue by interjections, disfluent utterances and pauses. 697-700
- David Goddeau, Helen M. Meng, Joseph Polifroni, Stephanie Seneff, Senis Busayapongchai: A form-based dialogue manager for spoken language applications. 701-704
- Steve Whittaker, David Attwater: The design of complex telephony applications using large vocabulary speech technology. 705-708
- Stephen Sutton, David G. Novick, Ronald A. Cole, Pieter J. E. Vermeulen, Jacques de Villiers, Johan Schalkwyk, Mark A. Fanty: Building 10,000 spoken dialogue systems. 709-712
- Yen-Ju Yang, Lee-Feng Chien, Lin-Shan Lee: Speaker intention modeling for large vocabulary Mandarin spoken dialogues. 713-716
- P. E. Kenne, Mary O'Kane: Hybrid language models and spontaneous legal discourse. 717-720
- P. E. Kenne, Mary O'Kane: Topic change and local perplexity in spoken legal dialogue. 721-724
- Jennifer J. Venditti, Marc Swerts: Intonational cues to discourse structure in Japanese. 725-728
- Niels Ole Bernsen, Hans Dybkjær, Laila Dybkjær: Principles for the design of cooperative spoken human-machine dialogue. 729-732
- Karen L. Jenkin, Michael S. Scordilis: Development and comparison of three syllable stress classifiers. 733-736
Speech Disorders
- Donald G. Jamieson, Li Deng, M. Price, Vijay Parsa, J. Till: Interaction of speech disorders with speech coders: effects on speech intelligibility. 737-740
- Maurílio Nunes Vieira, Arnold G. D. Maran, Fergus R. McInnes, Mervyn A. Jack: Detecting arytenoid cartilage misplacement through acoustic and electroglottographic jitter analysis. 741-744
- Maurílio Nunes Vieira, Fergus R. McInnes, Mervyn A. Jack: Robust F0 and jitter estimation in pathological voices. 745-748
- Fabrice Plante, H. Kessler, Barry M. G. Cheetham, J. E. Earis: Speech monitoring of infective laryngitis. 749-752
- Jean Schoentgen, Raoul De Guchteneere: Searching for nonlinear relations in whitened jitter time series. 753-756
- Liliana Gavidia-Ceballos, John H. L. Hansen, James F. Kaiser: Vocal fold pathology assessment using AM autocorrelation analysis of the Teager energy operator. 757-760
- David P. Kuehn: Continuous positive airway pressure (CPAP) in the treatment of hypernasality. 761-763
- Carol Y. Espy-Wilson, Venkatesh R. Chari, Caroline B. Huang: Enhancement of alaryngeal speech by adaptive filtering. 764-767
- Li Deng, Xuemin Shen, Donald G. Jamieson, J. Till: Simulation of disordered speech using a frequency-domain vocal tract model. 768-771
- Yasuo Endo, Hideki Kasuya: A stochastic model of fundamental period perturbation and its application to perception of pathological voice quality. 772-775
- Eric J. Wallen, John H. L. Hansen: A screening test for speech pathology assessment using objective quality measures. 776-779
- Douglas A. Cairns, John H. L. Hansen, James F. Kaiser: Recent advances in hypernasal speech detection using the nonlinear Teager energy operator. 780-783
Vocal Tract Geometry
- Kiyoshi Honda, Shinji Maeda, Michiko Hashi, Jim Dembowski, John R. Westbury: Human palate and related structures: their articulatory consequences. 784-787
- Edward P. Davis, Andrew Douglas, Maureen L. Stone: A continuum mechanics representation of tongue deformation. 788-792
- Philbert Bangayan, Abeer Alwan, Shrikanth S. Narayanan: From MRI and acoustic data to articulatory synthesis: a case study of the lateral approximants in American English. 793-796
- Shrikanth S. Narayanan, Abigail Kaun, Dani Byrd, Peter Ladefoged, Abeer Alwan: Liquids in Tamil. 797-800
- Chang-Sheng Yang, Hideki Kasuya: Speaker individualities of vocal tract shapes of Japanese vowels measured by magnetic resonance images. 949-952
- Samir El-Masri, Xavier Pelorson, Pierre Saguet, Pierre Badin: Vocal tract acoustics using the transmission line matrix (TLM) method. 953-956
- Gérard Bailly: Building sensori-motor prototypes from audiovisual exemplars. 957-960
- Mats Båvegård, Gunnar Fant: Parameterized VT area function inversion. 961-964
- Jianwu Dang, Kiyoshi Honda: An improved vocal tract model of vowel production implementing piriform resonance and transvelar nasal coupling. 965-968
- C. Simon Blackburn, Steve J. Young: Pseudo-articulatory speech synthesis for recognition using automatic feature extraction from x-ray data. 969-972
Prosody in ASR and Segmentation
- Sharon L. Oviatt, Gina-Anne Levow, Margaret MacEachern, Karen Kuhn: Modeling hyperarticulate speech during human-computer error resolution. 801-804
- Siripong Potisuk, Mary P. Harper, Jackson T. Gandour: Using stress to disambiguate spoken Thai sentences containing syntactic ambiguity. 805-808
- Hung-Yun Hsieh, Ren-Yuan Lyu, Lin-Shan Lee: Use of prosodic information to integrate acoustic and linguistic knowledge in continuous Mandarin speech recognition with very large vocabulary. 809-812
- G. V. Ramana Rao, J. Srichand: Word boundary detection using pitch variations. 813-816
- Atsuhiro Sakurai, Keikichi Hirose: Detection of phrase boundaries in Japanese by low-pass filtering of fundamental frequency contours. 817-820
- Vincent Pagel, Noelle Carbonell, Yves Laprie: A new method for speech delexicalization, and its application to the perception of French prosody. 821-824
Acquisition and Learning by Machine
- Udo Bub: Task adaptation for dialogues via telephone lines. 825-828
- Ronald A. Cole, Yonghong Yan, Troy Bailey: The influence of bigram constraints on word recognition by humans: implications for computer speech recognition. 829-832
- Tetsunori Kobayashi: ALICE: acquisition of language in conversational environment - an approach to weakly supervised training of spoken language system for language porting. 833-836
- Takashi Yoshimura, Satoru Hayamizu, Hiroshi Ohmura, Kazuyo Tanaka: Pitch pattern clustering of user utterances in human-machine dialogue. 837-840
- Juan-Carlos Amengual, Enrique Vidal, José-Miguel Benedí: Simplifying language through error-correcting decoding. 841-844
- Mauro Cettolo, Anna Corazza, Renato de Mori: A mixed approach to speech understanding. 845-848
Dialogue Systems
- Jean-Luc Gauvain, Jean-Jacques Gangolf, Lori Lamel: Speech recognition for an information kiosk. 849-852
- Helmer Strik, Albert Russel, Henk van den Heuvel, Catia Cucchiarini, Lou Boves: Localizing an automatic inquiry system for public transport information. 853-856
- Stephen M. Marcus, Deborah W. Brown, Randy G. Goldberg, Max S. Schoeffler, William R. Wetzel, Richard R. Rosinski: Prompt constrained natural language - evolving the next generation of telephony services. 857-860
- Tatsuya Kawahara, Chin-Hui Lee, Biing-Hwang Juang: Key-phrase detection and verification for flexible speech understanding. 861-864
- Bernhard Suhm, Brad A. Myers, Alex Waibel: Interactive recovery from speech recognition errors in speech user interfaces. 865-868
- Sunil Issar: Estimation of language models for new spoken language applications. 869-872
Speech Enhancement and Robust Processing
- Xuemin Shen, Li Deng, Anisa Yasmin: H-infinity filtering for speech enhancement. 873-876
- Saeed Vaseghi, Ben P. Milner: A comparative analysis of channel-robust features and channel equalization methods for speech recognition. 877-880
- Jia-Lin Shen, Wen-Liang Hwang, Lin-Shan Lee: Robust speech recognition features based on temporal trajectory filtering of frequency band spectrum. 881-884
- Kevin Power: Durational modelling for improved connected digit recognition. 885-888
- Carlos Avendaño, Hynek Hermansky: Study on the dereverberation of speech based on temporal envelope filtering. 889-892
- Thorsten Brants: Estimating Markov model structures. 893-896
- Eric K. Ringger, James F. Allen: A fertility channel model for post-correction of continuous speech recognition. 897-900
- Hiroshi Yasukawa: Restoration of wide band signal from telephone speech using linear prediction error processing. 901-904
- Hiroshi Matsumoto, Noboru Naitoh: Smoothed spectral subtraction for a frequency-weighted HMM in noisy speech recognition. 905-908
- William S. Woods, Martin Hansen, Thomas Wittkop, Birger Kollmeier: A simple architecture for using multiple cues in sound separation. 909-912
- Bojan Petek, Ove Andersen, Paul Dalsgaard: On the robust automatic segmentation of spontaneous speech. 913-916
- C. G. Miglietta, Chafic Mokbel, Denis Jouvet, Jean Monné: Bayesian adaptation of speech recognizers to field speech data. 917-920
- A. J. Darlington, D. J. Campbell: Sub-band adaptive filtering applied to speech enhancement. 921-924
- John P. Openshaw, John S. Mason: Noise robust estimate of speech dynamics for speaker recognition. 925-928
- Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez: Overview of speech enhancement techniques for automatic speaker recognition. 929-932
- Naomi Harte, Saeed Vaseghi, Ben P. Milner: Dynamic features for segmental speech recognition. 933-936
- Takuya Koizumi, Mikio Mori, Shuji Taniguchi: Speech recognition based on a model of human auditory system. 937-940
- Josep M. Salavedra, Enrique Masgrau: APVQ encoder applied to wideband speech coding. 941-944
- Jin Zhou, Yair Shoham, Ali N. Akansu: Simple fast vector quantization of the line spectral frequencies. 945-948
Speaker Adaptation and Normalization I
- Tomoko Matsui, Sadaoki Furui:
N-best-based instantaneous speaker adaptation method for speech recognition. 973-976 - Claude Montacié, Marie-José Caraty, Claude Barras:
Mixture splitting technique and temporal control in an HMM-based recognition system. 977-980 - Lei Yao, Dong Yu, Taiyi Huang:
A unified spectral transformation adaptation approach for robust speech recognition. 981-984 - Qiang Huo, Chin-Hui Lee:
On-line adaptive learning of the correlated continuous density hidden Markov models for speech recognition. 985-988 - Nikko Ström:
Speaker adaptation by modeling the speaker variation in a continuous speech recognition system. 989-992 - Yasuo Ariki, Shigeaki Tagashira:
An enquiring system of unknown words in TV news by spontaneous repetition (application of speaker normalization by speaker subspace projection). 993-996 - Jinsong Zhang, Beiqian Dai, Changfu Wang, HingKeung Kwan, Keikichi Hirose:
Adaptive recognition method based on posterior use of distribution pattern of output probabilities. 1129-1132 - Philip C. Woodland, David Pye, Mark J. F. Gales:
Iterative unsupervised adaptation using maximum likelihood linear regression. 1133-1136 - Tasos Anastasakos, John W. McDonough, Richard M. Schwartz, John Makhoul:
A compact model for speaker-adaptive training. 1137-1140 - Shigeru Homma, Jun-ichi Takahashi, Shigeki Sagayama:
Iterative unsupervised speaker adaptation for batch dictation. 1141-1144 - Daniel C. Burnett, Mark A. Fanty:
Rapid unsupervised adaptation to children's speech on a connected-digit task. 1145-1148 - Jun Ishii, Masahiro Tonomura, Shoichi Matsunaga:
Speaker adaptation using tree structured shared-state HMMs. 1149-1152
Spoken Language and NLP
- Richard M. Schwartz, Scott Miller, David Stallard, John Makhoul:
Language understanding using hidden understanding models. 997-1000 - Allen L. Gorin:
Processing of semantic information in fluently spoken language. 1001-1004 - Andreas Stolcke, Elizabeth Shriberg:
Automatic linguistic segmentation of conversational speech. 1005-1008 - Manuela Boros, Wieland Eckert, Florian Gallwitz, Günther Görz, Gerhard Hanrieder, Heinrich Niemann:
Towards understanding spontaneous speech: word accuracy vs. concept accuracy. 1009-1012 - Wolfgang Minker, Samir Bennacef, Jean-Luc Gauvain:
A stochastic case frame approach for natural language understanding. 1013-1016 - Frank Seide, Bernhard Rueber, Andreas Kellner:
Improving speech understanding by incorporating database constraints and dialogue history. 1017-1020 - Finn Dag Buø, Alex Waibel:
Learning to parse spontaneous speech. 1153-1156 - Jean-Yves Antoine:
Spontaneous speech and natural language processing ALPES: a robust semantic-led parser. 1157-1160 - Jorge Alvarez-Cercadillo, F. Javier Caminero-Gil, Carlos Crespo-Casas, Daniel Tapias Merino:
The natural language processing module for a voice assisted operator at Telefónica I+D. 1161-1164 - André Berton, Pablo Fetter, Peter Regel-Brietzmann:
Compound words in large-vocabulary German speech recognition systems. 1165-1168 - Anton Batliner, Anke Feldhaus, Stefan Geißler, Tibor Kiss, Ralf Kompe, Elmar Nöth:
Prosody, empty categories and parsing - a success story. 1169-1172 - B. Srinivas:
"Almost parsing" technique for language modeling. 1173-1176
Spoken Discourse Analysis/Synthesis
- Tetsuro Chino, Hiroyuki Tsuboi:
A new discourse structure model for spontaneous spoken dialogue. 1021-1024 - David Duff, Barbara Gates, Susann LuperFoy:
An architecture for spoken dialogue management. 1025-1028 - Monique E. van Donzel, Florien J. Koopmans-van Beinum:
Pausing strategies in discourse in Dutch. 1029-1032 - Marc Swerts, Anne Wichmann, Robbert-Jan Beun:
Filled pauses as markers of discourse structure. 1033-1036 - Cheol-jae Seong, Minsoo Hahn:
The prosodic analysis of Korean dialogue speech - through a comparative study with read speech. 1037-1040 - Mary O'Kane, P. E. Kenne:
Changing the topic: how long does it take? 1041-1044
Acoustic Modeling
- Christian-Michael Westendorf, Jens Jelitto:
Learning pronunciation dictionary from speech data. 1045-1048 - Ariane Lazaridès, Yves Normandin, Roland Kuhn:
Improving decision trees for acoustic modeling. 1053-1056 - Gongjun Li, Taiyi Huang:
An improved training algorithm in HMM-based speech recognition. 1057-1060 - Ji Ming, Peter O'Boyle, John G. McMahon, Francis Jack Smith:
Speech recognition using a strong correlation assumption for the instantaneous spectra. 1061-1064 - Pau Pachès-Leal, Climent Nadeu:
On parameter filtering in continuous subword-unit-based speech recognition. 1065-1068 - Shigeki Okawa, Katsuhiko Shirai:
Estimation of statistical phoneme center considering phonemic environments. 1069-1072 - Xue Wang, Louis ten Bosch, Louis C. W. Pols:
Integration of context-dependent durational knowledge into HMM-based speech recognition. 1073-1076 - Toshiaki Fukada, Michiel Bacchiani, Kuldip K. Paliwal, Yoshinori Sagisaka:
Speech recognition based on acoustically derived segment units. 1077-1080 - Rivarol Vergin, Azarshid Farhat, Douglas D. O'Shaughnessy:
Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male/female classification. 1081-1084 - Tae-Young Yang, Won-Ho Shin, Weon-Goo Kim, Dae Hee Youn:
A codebook adaptation algorithm for SCHMM using formant distribution. 1085-1088 - Jacques Simonin, S. Bodin, Denis Jouvet, Katarina Bartkova:
Parameter tying for flexible speech recognition. 1089-1092 - Tsuneo Nitta, Shin'ichi Tanaka, Yasuyuki Masai, Hiroshi Matsuura:
Word-spotting based on inter-word and intra-word diphone models. 1093-1096 - Antonio Bonafonte, Josep Vidal, Albino Nogueiras:
Duration modeling with expanded HMM applied to speech recognition. 1097-1100 - Ricardo de Córdoba, José Manuel Pardo:
Different strategies for distribution clustering using discrete, semicontinuous and continuous HMMs in CSR. 1101-1104 - Ilija Zeljkovic, Shrikanth S. Narayanan:
Improved HMM phone and triphone models for realtime ASR telephony applications. 1105-1108 - Yasuhiro Minami, Sadaoki Furui:
Improved extended HMM composition by incorporating power variance. 1109-1112 - Gordon Ramsay, Li Deng:
Optimal filtering and smoothing for speech recognition using a stochastic target model. 1113-1116 - Zhihong Hu, Johan Schalkwyk, Etienne Barnard, Ronald A. Cole:
Speech recognition using syllable-like units. 1117-1120 - Jean-Claude Junqua, Lorenzo Vassallo:
Context modeling and clustering in continuous speech recognition. 2262-2265 - Li Deng, Jim Jian-Xiong Wu:
Hierarchical partition of the articulatory state space for overlapping-feature based speech recognition. 2266-2269 - Olivier Oppizzi, David Fournier, Philippe Gilles, Henri Meloni:
A fuzzy acoustic-phonetic decoder for speech recognition. 2270-2273 - Katrin Kirchhoff:
Syllable-level desynchronisation of phonetic features for speech recognition. 2274-2276 - James R. Glass, Jane W. Chang, Michael K. McCandless:
A probabilistic framework for feature-based speech recognition. 2277-2280 - Jim Jian-Xiong Wu, Li Deng, Jacky Chan:
Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese. 2281-2284
Physics and Simulation of the Vocal Tract
- Cecil H. Coker, Michael H. Krane, B. Y. Reis, R. A. Kubli:
Search for unexplored effects in speech production. 1121-1124 - Pierre Badin, Christian Abry:
Articulatory synthesis from x-rays and inversion for an adaptive speech robot. 1125-1128 - Hisayoshi Suzuki, Takayoshi Nakai, Hiroshi Sakakibara:
Analysis of acoustic properties of the nasal tract using 3-d FEM. 1285-1288 - Johan Liljencrants:
Experiments with analysis by synthesis of glottal airflow. 1289-1292
Duration and Rhythm
- Marise Ouellet, Benoît Tardif:
From segmental duration properties to rhythmic structure: a study of interactions between high and low level. 1177-1180 - Xue Wang, Louis C. W. Pols, Louis ten Bosch:
Analysis of context-dependent segmental duration for automatic speech recognition. 1181-1184 - Delphine Dahan:
The role of the rhythmic groups in the segmentation of continuous French speech. 1185-1188 - Zita McRobbie-Utasi:
The implications of temporal patterns for the prosody of boundary signaling in connected speech. 1189-1192 - Hyunbok Lee, Cheol-jae Seong:
Experimental phonetic study of the syllable duration of Korean with respect to the positional effect. 1193-1196 - Dik J. Hermes:
Timing of pitch movements and accentuation of syllables. 1197-1200
Acoustic Analysis
- Goangshiuan S. Ying, Leah H. Jamieson, Carl D. Mitchell:
A probabilistic approach to AMDF pitch detection. 1201-1204 - Alain Soquet, Véronique Lecuit, Thierry Metens, Didier Demolin:
From sagittal cut to area function: an RMI investigation. 1205-1208 - Léonard Janer, Juan José Bonet, Eduardo Lleida-Solano:
Pitch detection and voiced/unvoiced decision algorithm based on wavelet transforms. 1209-1212 - Yannis Stylianou:
Decomposition of speech signals into a deterministic and a stochastic part. 1213-1216 - Cheol-Woo Jo, Ho-Gyun Bang, William A. Ainsworth:
Improved glottal closure instant detector based on linear prediction and standard pitch concept. 1217-1220 - Xihong Wang, Stephen A. Zahorian, Stefan Auberg:
Analysis of speech segments using variable spectral/temporal resolution. 1221-1224 - Brian Eberman, William Goldenthal:
Time-based clustering for phonetic segmentation. 1225-1228 - Parham Zolfaghari, Tony Robinson:
Formant analysis using mixtures of Gaussians. 1229-1232 - Hywel B. Richards, John S. Mason, Melvyn J. Hunt, John S. Bridle:
Deriving articulatory representations from speech with various excitation modes. 1233-1236 - Manish Sharma, Richard J. Mammone:
"blind" speech segmentation: automatic segmentation of speech without linguistic knowledge. 1237-1240 - Hiroshi Ohmura, Kazuyo Tanaka:
Speech synthesis using a nonlinear energy damping model for the vocal folds vibration effect. 1241-1244 - Munehiro Namba, Hiroyuki Kamata, Yoshihisa Ishida:
Neural networks learning with L1 criteria and its efficiency in linear prediction of speech signals. 1245-1248 - Anna Esposito, Eugène C. Ezin, M. Ceccarelli:
Preprocessing and neural classification of English stop consonants [b, d, g, p, t, k]. 1249-1252 - K. S. Ananthakrishnan:
A comparison of modified k-means(MKM) and NN based real time adaptive clustering algorithms for articulatory space codebook formation. 1253-1256 - Wen Ding, Hideki Kasuya:
A novel approach to the estimation of voice source and vocal tract parameters from speech signals. 1257-1260 - Hartmut R. Pfitzinger, Susanne Burger, Sebastian Heid:
Syllable detection in read and spontaneous speech. 1261-1264 - Kuansan Wang, Chin-Hui Lee, Biing-Hwang Juang:
Maximum likelihood learning of auditory feature maps for stationary vowels. 1265-1268 - Antonio Bonafonte, Albino Nogueiras, Antonio Rodriguez-Garrido:
Explicit segmentation of speech using Gaussian models. 1269-1272 - E. Mousset, William A. Ainsworth, José A. R. Fonollosa:
A comparison of several recent methods of fundamental frequency and voicing decision estimation. 1273-1276 - Toshihiko Abe, Takao Kobayashi, Satoshi Imai:
Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency. 1277-1280 - Asunción Moreno, Miquel Rutllán:
Integrated polispectrum on speech recognition. 1281-1284
Speech Recognition Using HMMs and NNs
- Joao P. Neto, Ciro Martins, Luís B. Almeida:
An incremental speaker-adaptation technique for hybrid HMM-MLP recognizer. 1293-1296 - Youngjoo Suh, Youngjik Lee:
Phoneme segmentation of continuous speech using multi-layer perceptron. 1297-1300 - Jeff A. Bilmes, Nelson Morgan, Su-Lin Wu, Hervé Bourlard:
Stochastic perceptual speech models with durational dependence. 1301-1304 - Gary D. Cook, Anthony J. Robinson:
Boosting the performance of connectionist large vocabulary speech recognition. 1305-1308 - Nicolas Pican, Dominique Fohr, Jean-François Mari:
HMMs and OWE neural network for continuous speech recognition. 1309-1312 - Steve R. Waterhouse, Dan J. Kershaw, Tony Robinson:
Smoothed local adaptation of connectionist systems. 1313-1316
Adverse Environments and Multiple Microphones
- Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano:
Robust speech recognition with speaker localization by a microphone array. 1317-1320 - Ea-Ee Jan, James L. Flanagan:
Sound source localization in reverberant environments using an outlier elimination algorithm. 1321-1324 - Dan J. Kershaw, Tony Robinson, Steve Renals:
The 1995 abbot LVCSR system for multiple unknown microphones. 1325-1328 - Diego Giuliani, Maurizio Omologo, Piergiorgio Svaizer:
Experiments of speech recognition in a noisy and reverberant environment using a microphone array and HMM. 1329-1332 - Joaquin Gonzalez-Rodriguez, Javier Ortega-Garcia, César Martin, Luis Hernández:
Increasing robustness in GMM speaker recognition systems for noisy and reverberant speech with low complexity microphone arrays. 1333-1336 - Kuan-Chieh Yen, Yunxin Zhao:
Robust automatic speech recognition using a multi-channel signal separation front-end. 1337-1340
Prosodic Synthesis in Dialogue
- Anders Lindström, Ivan Bretan, Mats Ljungqvist:
Prosody generation in text-to-speech conversion using dependency graphs. 1341-1344 - Hisako Asano, Hisashi Ohara, Yoshifumi Ooyama:
Extraction method of non-restrictive modification in Japanese as a marked factor of prosody. 1345-1348 - Scott Prevost:
Modeling contrast in the generation and synthesis of spoken language. 1349-1352 - Hajime Tsukada:
A left-to-right processing model of pausing in Japanese based on limited syntactic information. 1353-1356 - Dimitrios Galanis, Vassilios Darsinos, George Kokkinakis:
Modeling of intonation bearing emphasis for TTS-synthesis of Greek dialogues. 1357-1360 - Barbara Heuft, Thomas Portele:
Synthesizing prosody: a prominence-based approach. 1361-1364
Speech Synthesis
- Richard Sproat:
Multilingual text analysis for text-to-speech synthesis. 1365-1368 - Yoshifumi Ooyama, Hisako Asano, Koji Matsuoka:
Spoken-style explanation generator for Japanese kanji using a text-to-speech system. 1369-1372 - Ken-ichi Magata, Tomoki Hamagami, Mitsuo Komura:
A method for estimating prosodic symbol from text for Japanese text-to-speech synthesis. 1373-1376 - Eduardo López Gonzalo, Jose M. Rodriguez-Garcia:
Statistical methods in data-driven modeling of Spanish prosody for text to speech. 1377-1380 - Jung-Chul Lee, Youngjik Lee, Sanghun Kim, Minsoo Hahn:
Intonation processing for TTS using stylization and neural network learning method. 1381-1384 - Alan W. Black, Andrew J. Hunt:
Generating F0 contours from ToBI labels using linear regression. 1385-1388 - Wern-Jun Wang, Shaw-Hwa Hwang, Sin-Horng Chen:
The broad study of homograph disambiguity for Mandarin speech synthesis. 1389-1392 - Thierry Dutoit, Vincent Pagel, Nicolas Pierret, F. Bataille, Olivier van der Vrecken:
The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes. 1393-1396 - Makoto Hashimoto, Norio Higuchi:
Training data selection for voice conversion using speaker selection and vector field smoothing. 1397-1400 - Ki-Seung Lee, Dae Hee Youn, Il-Whan Cha:
A new voice transformation method based on both linear and nonlinear prediction analysis. 1401-1404 - Geneviève Baudoin, Yannis Stylianou:
On the transformation of the speech spectrum for voice conversion. 1405-1408 - Cristina Delogu, Andrea Paoloni, Susanna Ragazzini, Paola Ridolfi:
Spectral analysis of synthetic speech and natural speech with noise over the telephone line. 1409-1412 - Weizhong Zhu, Hideki Kasuya:
A new speech synthesis system based on the ARX speech production model. 1413-1416 - Geraldo Lino de Campos, Evandro B. Gouvêa:
Speech synthesis using the CELP algorithm. 1417-1420 - Shaw-Hwa Hwang, Sin-Horng Chen, Yih-Ru Wang:
A Mandarin text-to-speech system. 1421-1424 - Mike D. Edgington, A. Lowry:
Residual-based speech modification algorithms for text-to-speech synthesis. 1425-1428 - Per Olav Heggtveit:
A generalized LR parser for text-to-speech synthesis. 1429-1432 - Mat P. Pollard, Barry M. G. Cheetham, Colin C. Goodyear, Mike D. Edgington, A. Lowry:
Enhanced shape-invariant pitch and time-scale modification for concatenative speech synthesis. 1433-1436 - Yasuhiko Arai, Ryo Mochizuki, Hirofumi Nishimura, Takashi Honda:
An excitation synchronous pitch waveform extraction method and its application to the VCV-concatenation synthesis of Japanese spoken words. 1437-1440 - Ren-Hua Wang, Qingfeng Liu, Difei Tang:
A new Chinese text-to-speech system with high naturalness. 1441-1444 - Ansgar Rinscheid:
Voice conversion based on topological feature maps and time-variant filtering. 1445-1448
Instructional Technology for Spoken Language
- Yoram Meron, Keikichi Hirose:
Language training system utilizing speech modification. 1449-1452 - Donald G. Jamieson, K. Yu:
Perception of English /r/ and /l/ speech contrasts by native Korean listeners with extensive English-language experience. 1453-1456 - Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price:
Automatic text-independent pronunciation scoring of foreign language student speech. 1457-1460 - Antônio Simoes:
Assessing the contribution of instructional technology in the teaching of pronunciation. 1461-1464 - Maxine Eskénazi:
Detection of foreign speakers' pronunciation errors for second language training - preliminary results. 1465-1468 - Hansjörg Mixdorff:
Foreign accent in intonation patterns - a contrastive study applying a quantitative model of the F0 contour. 1469-1472 - Duncan J. Markham, Yasuko Nagano-Madsen:
Input modality effects in foreign accent. 1473-1476
Multimodal Spoken Language Processing
- Lynne E. Bernstein, Christian Benoît:
For speech perception by humans or machines, three senses are better than one. 1477-1480 - Kaoru Sekiyama, Yoh'ichi Tohkura, Michio Umeda:
A few factors which affect the degree of incorporating lip-read information into speech perception. 1481-1484 - Eric Vatikiotis-Bateson, Kevin G. Munhall, Y. Kasahara, Frederique Garcia, Hani Yehia:
Characterizing audiovisual information during speech. 1485-1488 - Charlotte M. Reed:
The implications of the tadoma method of speechreading for spoken language processing. 1489-1492 - Ruth Campbell:
Seeing speech in space and time: psychological and neurological findings. 1493-1496 - Kerry P. Green:
Studies of the McGurk effect: implications for theories of speech perception. 1652-1655 - N. Michael Brooke:
Using the visual component in automatic speech recognition. 1656-1659 - Robert E. Remez:
Perceptual organization of speech in one and several modalities: common functions, common resources. 1660-1663 - David B. Pisoni, Helena M. Saldaña, Sonya M. Sheffert:
Multi-modal encoding of speech in memory: a first report. 1664-1667
Prosody - Phonological/Phonetic Measures
- Volker Strom, Christina Widera:
What's in the "pure" prosody? 1497-1500 - Marc Swerts, Eva Strangert, Mattias Heldner:
F0 declination in read-aloud and spontaneous speech. 1501-1504 - Yeon-Jun Kim, Yung-Hwan Oh:
Prediction of prosodic phrase boundaries considering variable speaking rate. 1505-1508 - Yoichi Yamashita, Riichiro Mizoguchi:
Prediction of F0 parameter of contextualized utterances in dialogue. 1509-1512 - Veronika Makarova, J. Matsui:
The production and perception of potentially ambiguous intonation contours by speakers of Russian and Japanese. 1513-1516 - Robert Eklund:
What is invariant and what is optional in the realization of a FOCUSED word? a cross-dialectal study of Swedish sentences with moving focus. 1517-1520
Phonetics and Perception
- Christine H. Shadle, Sheila J. Mair:
Quantifying spectral characteristics of fricatives. 1521-1524 - Natasha Warner:
Acoustic characteristics of ejectives in Ingush. 1525-1528 - R. J. J. H. van Son, Louis C. W. Pols:
An acoustic profile of consonant reduction. 1529-1532 - Danièle Archambault, Blagovesta Maneva:
Devoicing in post-vocalic Canadian-French obstruents. 1533-1536 - Alexander L. Francis, Howard C. Nusbaum:
Paying attention to speaking rate. 1537-1540 - Irene Appelbaum:
The lack of invariance problem and the goal of speech perception. 1541-1544
Language Acquisition
- Jean E. Andruski, Patricia K. Kuhl:
The acoustic structure of vowels in mothers' speech to infants and adults. 1545-1548 - Chris J. Clement, Florien J. Koopmans-van Beinum, Louis C. W. Pols:
Acoustical characteristics of sound production of deaf and normally hearing infants. 1549-1552 - John Kingston, Christine Bartels, José Benkí, Deanna Moore, Jeremy Rice, Rachel Thorburn, Neil Macmillan:
Learning non-native vowel categories. - Pierre A. Hallé, Toshisada Deguchi, Yuji Tamekawa, Benedicte de Boysson-Bardies, Shigeru Kiritani:
Word recognition by Japanese infants. 1557-1560 - Peter W. Jusczyk:
Investigations of the word segmentation abilities of infants. 1561-1564 - Akiko Hayashi, Yuji Tamekawa, Toshisada Deguchi, Shigeru Kiritani:
Developmental change in perception of clause boundaries by 6- and 10-month-old Japanese infants. 1565-1568
Production and Prosody Posters
- Paavo Alku, Erkki Vilkman:
A frequency domain method for parametrization of the voice source. 1569-1572 - Krzysztof Marasek:
Glottal correlates of the word stress and the tense/lax opposition in German. 1573-1576 - Suzanne Boyce, Carol Y. Espy-Wilson:
Coarticulatory stability in American English /r/. 1577-1580 - Shinobu Masaki, Reiko Akahane-Yamada, Mark K. Tiede, Yasuhiro Shimada, Ichiro Fujimoto:
An MRI-based analysis of the English /r/ and /l/ articulations. 1581-1584 - David van Kuijk:
Does lexical stress or metrical stress better predict word boundaries in Dutch? 1585-1588 - Alan Wrench, Alan D. McIntosh, William J. Hardcastle:
Optopalatograph (OPG): a new apparatus for speech production analysis. 1589-1592 - René Carré:
Prediction of vowel systems using a deductive approach. 1593-1596 - Sheila J. Mair, Celia Scully, Christine H. Shadle:
Distinctions between [t] and [tch] using electropalatography data. 1597-1600 - Michiko Hashi, Raymond D. Kent, John R. Westbury, Mary J. Lindstrom:
Relating formants and articulation in intelligibility test words. 1601-1604 - Imad Znagui, Mohamed Yeou:
The role of coarticulation in the perception of vowel quality in modern standard Arabic. 1605-1608 - Simon Arnfield, Wilf Jones:
Updating the Reading EPG. 1609-1611 - Goangshiuan S. Ying, Leah H. Jamieson, Ruxin Chen, Carl D. Mitchell:
Lexical stress detection on stress-minimal word pairs. 1612-1615 - Jing Wang:
An acoustic study of the interaction between stressed and unstressed syllables in spoken Mandarin. 1616-1619 - Nobuaki Minematsu, Seiichi Nakagawa:
Automatic detection of accent nuclei at the head of words for speech recognition. 1620-1623 - Fu-Chiang Chou, Chiu-yu Tseng, Lin-Shan Lee:
Automatic generation of prosodic structure for high quality Mandarin speech synthesis. 1624-1627 - Tomoki Hamagami, Ken-ichi Magata, Mitsuo Komura:
A study on Japanese prosodic pattern and its modeling in restricted speech. 1628-1631 - Steve Hoskins:
A phonetic study of focus in intransitive verb sentences. 1632-1635 - Stefan Rapp:
Goethe for prosody. 1636-1639 - K. A. Straub:
Prosodic cues in syntactically ambiguous strings; an interactive speech planning mechanism. 1640-1643 - Jinfu Ni, Ren-Hua Wang, Deyu Xia:
A functional model for generation of the local components of F0 contours in Chinese. 1644-1647 - Marie Fellbaum:
The acquisition of voiceless stops in the interlanguage of second language learners of English and Spanish. 1648-1651
User-Machine Interfaces
- Brian Mellor, Chris Baber, C. Tunley:
Evaluating automatic speech recognition as a component of a multi-input device human-computer interface. 1668-1671 - Andrew Life, Ian Salter, Jean-Noël Temem, Franck Bernard, Sophie Rosset, Samir Bennacef, Lori Lamel:
Data collection for the MASK kiosk: WOz vs prototype system. 1672-1675 - Murat Karaorman, Ted H. Applebaum, Tatsuro Itoh, Mitsuru Endo, Yoshio Ohno, Masakatsu Hoshimi, Takahiro Kamai, Kenji Matsui, Kazue Hata, Steve Pearson, Jean-Claude Junqua:
An experimental Japanese/English interpreting video phone system. 1676-1679 - Sara Basson, Stephen Springer, Cynthia Fong, Hong C. Leung, Edward Man, Michele Olson, John F. Pitrelli, Ranvir Singh, Suk Wong:
User participation and compliance in speech automated telecommunications applications. 1680-1683 - Samuel Bayer:
Embedding speech in web interfaces. 1684-1687 - Toshihiro Isobe, Masatoshi Morishima, Fuminori Yoshitani, Nobuo Koizumi, Ken'ya Murakami:
Voice-activated home banking system and its field trial. 1688-1691
TTS Systems and Rules
- Sangho Lee, Yung-Hwan Oh:
A text analyzer for Korean text-to-speech systems. 1692-1695 - Helen E. Karn:
Design and evaluation of a phonological phrase parser for Spanish text-to-speech. 1696-1699 - Ove Andersen, Roland Kuhn, Ariane Lazaridès, Paul Dalsgaard, Jürgen Haas, Elmar Nöth:
Comparison of two tree-structured approaches for grapheme-to-phoneme conversion. 1700-1703 - Martin J. Adamson, Robert I. Damper:
A recurrent network that learns to pronounce English text. 1704-1707 - Eleonora Cavalcante Albano, Agnaldo Antonio Moreira:
Archisegment-based letter-to-phone conversion for concatenative speech synthesis in Portuguese. 1708-1711 - Yuki Yoshida, Shin'ya Nakajima, Kazuo Hakoda, Tomohisa Hirokawa:
A new method of generating speech synthesis units based on phonological knowledge and clustering technique. 1712-1715
Prosody and Labeling
- Martine Grice, Matthias Reyelt, Ralf Benzmüller, Jörg Mayer, Anton Batliner:
Consistency in transcription and labelling of German intonation with GToBI. 1716-1719 - Anton Batliner, Ralf Kompe, Andreas Kießling, Heinrich Niemann, Elmar Nöth:
Syntactic-prosodic labeling of large spontaneous speech data-bases. 1720-1723 - Florien J. Koopmans-van Beinum, Monique E. van Donzel:
Relationship between discourse structure and dynamic speech rate. 1724-1727 - Nigel Ward:
Using prosodic clues to decide when to produce back-channel utterances. 1728-1731 - Marion Mast, Ralf Kompe, Stefan Harbeck, Andreas Kießling, Heinrich Niemann, Elmar Nöth, Ernst Günter Schukat-Talamazzini, Volker Warnke:
Dialog act classification with the help of prosody. 1732-1735 - David van Kuijk, Henk van den Heuvel, Lou Boves:
Using lexical stress in continuous speech recognition for Dutch. 1736-1739
Speaker/Language Identification and Verification
- Karsten Kumpf, Robin W. King:
Automatic accent classification of foreign accented Australian English speech. 1740-1743 - Filipp Korkmazskiy, Biing-Hwang Juang:
Discriminative adaptation for speaker verification. 1744-1747 - Verna Stockmal, D. Muljani, Zinny S. Bond:
Perceptual features of unknown foreign languages as revealed by multi-dimensional scaling. 1748-1751 - Kin Yu, John S. Mason:
On-line incremental adaptation for speaker verification using maximum likelihood estimates of CDHMM parameters. 1752-1755 - Dominique Genoud, Frédéric Bimbot, Guillaume Gravier, Gérard Chollet:
Combining methods to improve speaker verification decision. 1756-1759 - Cesar Martín del Alamo, J. Álvarez, Celinda de la Torre, F. J. Poyatos, Luis Hernández:
Incremental speaker adaptation with minimum error discriminative training for speaker identification. 1760-1763 - Konstantin P. Markov, Seiichi Nakagawa:
Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models. 1764-1767 - Ann E. Thymé-Gobbel, Sandra E. Hutchins:
On using prosodic cues in automatic language identification. 1768-1771 - Tadashi Kitamura, Shinsai Takei:
Speaker recognition model using two-dimensional mel-cepstrum and predictive neural network. 1772-1775 - HingKeung Kwan, Keikichi Hirose:
Unknown language rejection in language identification system. 1776-1779 - James Hieronymus, Shubha Kadambe:
Spoken language identification using large vocabulary speech recognition. 1780-1783 - Carlos Teixeira, Isabel Trancoso, António Joaquim Serralheiro:
Accent identification. 1784-1787 - Sarel van Vuuren:
Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch. 1788-1791 - Xue Yang, J. Bruce Millar, Iain MacLeod:
On the sources of inter- and intra-speaker variability in the acoustic dynamics of speech. 1792-1795 - Kay M. Berkling, Etienne Barnard:
Language identification with inaccurate string matching. 1796-1799 - Michael J. Carey, Eluned S. Parris, Harvey Lloyd-Thomas, Stephen J. Bennett:
Robust prosodic features for speaker identification. 1800-1803 - Enric Monte, Javier Hernando Pericas, Xavier Miró, A. Adolf:
Text independent speaker identification on noisy environments by means of self organizing maps. 1804-1807 - Paul Dalsgaard, Ove Andersen, Hanne Hesselager, Bojan Petek:
Language identification using language-dependent phonemes and language-independent speech units. 1808-1811
Emotion in Recognition and Synthesis
- Klaus R. Scherer:
Adding the affective dimension: a new look in speech analysis and synthesis. 1811 - John J. Ohala:
Ethological theory and the expression of emotion in the voice. 1812-1815 - Iain R. Murray, John L. Arnott:
Synthesizing emotions in speech: is it time to get excited? 1816-1819 - Frank Dellaert, Thomas Polzin, Alex Waibel:
Recognizing emotion in speech. 1970-1973 - Barbara Heuft, Thomas Portele, Monika Rauth:
Emotions in time domain synthesis. 1974-1977 - Simon Arnfield:
Word class driven synthesis of prosodic annotations. 1978-1980 - Michael Banbrook, Steve McLaughlin:
Dynamical modelling of vowel sounds as a synthesis tool. 1981-1984 - Tom Johnstone:
Emotional speech elicited using computer games. 1985-1988 - Roddy Cowie, Ellen Douglas-Cowie:
Automatic statistical analysis of the signal and prosodic signs of emotion in speech. 1989-1992
Stochastic Techniques in Robust Speech Recognition
- Chin-Hui Lee, Biing-Hwang Juang, Wu Chou, J. J. Molina-Perez:
A study on task-independent subword selection and modeling for speech recognition. 1820-1823 - Mazin G. Rahim, Chin-Hui Lee:
Simultaneous ANN feature and HMM recognizer design using string-based minimum classification error (MCE) training. 1824-1827 - Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen:
Quantizing mixture-weights in a tied-mixture HMM. 1828-1831 - Mark J. F. Gales, David Pye, Philip C. Woodland:
Variance compensation within the MLLR framework for robust speech recognition and speaker adaptation. 1832-1835 - Arun C. Surendran, Chin-Hui Lee, Mazin G. Rahim:
Maximum-likelihood stochastic matching approach to non-linear equalization for robust speech recognition. 1836-1839 - Jen-Tzung Chien, Hsiao-Chuan Wang, Lee-Min Lee:
Estimation of channel bias for telephone speech recognition. 1840-1843
Prosodic Synthesis in Text to Speech
- M. E. Johnson:
Synthesis of English intonation using explicit models of reading and spontaneous speech. 1844-1847 - Merle Horne, Marcus Filipsson:
Implementation and evaluation of a model for synthesis of Swedish intonation. 1848-1851 - Nobuyuki Katae, Shinta Kimura:
Natural prosody generation for domain specific text-to-speech systems. 1852-1855 - Mark Tatham, Eric Lewis:
Improving text-to-speech synthesis. 1856-1859 - Sahar E. Bou-Ghazale, John H. L. Hansen:
Synthesis of stressed speech from isolated neutral speech using HMM-based models. 1860-1863 - Ales Dobnikar:
Modeling segment intonation for Slovene TTS system. 1864-1867
Dialogue Events
- Elizabeth Shriberg, Andreas Stolcke:
Word predictability after hesitations: a corpus-based study. 1868-1871 - Li-chiung Yang:
Interruptions and intonation. 1872-1875 - Robin J. Lickley, Ellen Gurman Bard:
On not recognizing disfluencies in dialogue. 1876-1879 - Philip N. Garner, Sue Browning, Roger K. Moore, Martin J. Russell:
A theory of word frequencies and its application to dialogue move recognition. 1880-1883 - David R. Traum, Peter A. Heeman:
Utterance units and grounding in spoken dialogue. 1884-1887 - David G. Novick, Brian Hansen, Karen Ward:
Coordinating turn-taking with gaze. 1888-1891
Databases and Tools
- Peter Roach, Simon Arnfield, William J. Barry, J. Baltova, Marian Boldea, Adrian Fourcin, Wiktor Gonet, Ryszard Gubrynowicz, E. Hallum, Lori Lamel, Krzysztof Marasek, Alain Marchal, Einar Meister, Klára Vicsi:
BABEL: an eastern european multi-language database. 1892-1893 - Ren-Hua Wang, Deyu Xia, Jinfu Ni, Bicheng Liu:
USTC95 - a putonghua corpus. 1894-1897 - Edward Hurley, Joseph Polifroni, James R. Glass:
Telephone data collection using the world wide web. 1898-1901 - M. Falcone, A. Gallo:
The "SIVA" speech database for speaker verification: description and evaluation. 1902-1905 - Christoph Draxler:
A multi-level description of date expressions in German telephone speech. 1906-1909 - Robert H. Halstead Jr., Ben Serridge, Jean-Manuel Van Thong, William Goldenthal:
Viterbi search visualization using vista: a generic performance visualization tool. 1910-1913 - Toomas Altosaar, Matti Karjalainen, Martti Vainio:
A multilingual phonetic representation and analysis system for different speech databases. 1914-1917 - Detlev Langmann, Reinhold Haeb-Umbach, Lou Boves, Els den Os:
FRESCO: the French telephone speech data collection - part of the european Speechdat(m) project. 1918-1921 - Johannes Müller, Holger Stahl, Manfred K. Lang:
Predicting the out-of-vocabulary rate and the required vocabulary size for speech processing applications. 1922-1925 - Nathalie Parlangeau, Alain Marchal:
AMULET: automatic MUltisensor speech labelling and event tracking: study of the spatio-temporal correlations in voiceless plosive production. 1926-1929 - Minsoo Hahn, Sanghun Kim, Jung-Chul Lee, Yong-Ju Lee:
Constructing multi-level speech database for spontaneous speech processing. 1930-1933 - Marian Boldea, Alin Doroga, Tiberiu Dumitrescu, Maria Pescaru:
Preliminaries to a romanian speech database. 1934-1937 - Klaus J. Kohler:
Labelled data bank of spoken standard German - the Kiel corpus of read/spontaneous speech. 1938-1941 - I. Lee Hetherington, Michael K. McCandless:
SAPPHIRE: an extensible speech analysis and recognition tool based on tcl/tk. 1942-1945 - Jiro Kiyama, Yoshiaki Itoh, Ryuichi Oka:
Automatic detection of topic boundaries and keywords in arbitrary speech using incremental reference interval-free continuous DP. 1946-1949 - Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee:
Very-large-vocabulary Mandarin voice message file retrieval using speech queries. 1950-1953 - Håkan Melin:
Gandalf - a Swedish telephone speaker verification database. 1954-1957 - Ellen Gurman Bard, Catherine Sotillo, Anne H. Anderson, M. M. Taylor:
The DCIEM map task corpus: spontaneous dialogue under sleep deprivation and drug treatment. 1958-1961 - Xavier Menéndez-Pidal, James B. Polikoff, Shirley M. Peters, Jennie E. Leonzio, H. Timothy Bunnell:
The nemours database of dysarthric speech. 1962-1965 - Jean Hennebert, Dijana Petrovska-Delacrétaz:
POST: parallel object-oriented speech toolkit. 1966-1969
Robust Speech Processing
- Xiaoyu Zhang, Richard J. Mammone:
Channel and noise normalization using affine transformed cepstrum. 1993-1996 - Tom Claes, Fei Xie, Dirk Van Compernolle:
Spectral estimation and normalisation for robust speech recognition. 1997-2000 - Wu Chou, Nambi Seshadri, Mazin G. Rahim:
Trellis encoded vector quantization for robust speech recognition. 2001-2004 - Brian Mak, Etienne Barnard:
Phone clustering using the bhattacharyya distance. 2005-2008 - Atsushi Wakao, Kazuya Takeda, Fumitada Itakura:
Variability of lombard effects under different noise conditions. 2009-2012 - Sang-Mun Chi, Yung-Hwan Oh:
Lombard effect compensation and noise suppression for noisy Lombard speech recognition. 2013-2016
Dialects and Speaking Styles
- A. W. F. Huggins, Yogen Patel:
The use of shibboleth words for automatically classifying speakers by dialect. 2017-2020 - Ikuo Kudo, Takao Nakama, Tomoko Watanabe, Reiko Kameyama:
Data collection of Japanese dialects and its influence into speech recognition. 2021-2024 - David R. Miller, James Trischitta:
Statistical dialect classification based on mean phonetic features. 2025-2027 - Knut Kvale:
Norwegian numerals: a challenge to automatic speech recognition. 2028-2031 - Celinda de la Torre, F. Javier Caminero-Gil, Jorge Alvarez-Cercadillo, Cesar Martín del Alamo, Luis A. Hernández Gómez:
Evaluation of the telefónica i+d natural numbers recognizer over different dialects of Spanish from Spain and America. 2032-2035
Production and Perception of Prosody
- Fred Cummins, Robert F. Port:
Rhythmic constraints on English stress timing. 2036-2039 - Irene Vogel, Steve Hoskins:
On the interaction of clash, focus and phonological phrasing. 2040-2043 - Gunnar Fant, Anita Kruckenberg:
On the quantal nature of speech timing. 2044-2047 - David House:
Differential perception of tonal contours through the syllable. 2048-2051 - Martti Vainio, Toomas Altosaar:
Pitch, loudness, and segmental duration correlates: towards a model for the phonetic aspects of finnish prosody. 2052-2055 - Nobuaki Minematsu, Seiichi Nakagawa, Keikichi Hirose:
Prosodic manipulation system of speech material for perceptual experiments. 2056-2059
Topics in ASR and Search
- Joerg P. Ueberla, I. R. Gransden:
Clustered language models with context-equivalent states. 2060-2062 - Yuji Yonezawa, Masato Akagi:
Modeling of contextual effects and its application to word spotting. 2063-2066 - Jochen Junkawitsch, L. Neubauer, Harald Höge, Günther Ruske:
A new keyword spotting algorithm with pre-calculated optimal thresholds. 2067-2070 - Roxane Lacouture, Yves Normandin:
Detection of ambiguous portions of signal corresponding to OOV words or misrecognized portions of input. 2071-2074 - Fabio Brugnara, Marcello Federico:
Techniques for approximating a trigram language model. 2075-2078 - Keizaburo Takagi, Koichi Shinoda, Hiroaki Hattori, Takao Watanabe:
Unsupervised and incremental speaker adaptation under adverse environmental conditions. 2079-2082 - Hugo Van hamme, Filip Van Aelten:
An adaptive-beam pruning technique for continuous speech recognition. 2083-2086 - Carlos Avendaño, Sarel van Vuuren, Hynek Hermansky:
Data based filter design for RASTA-like channel normalization in ASR. 2087-2090 - Stefan Ortmanns, Hermann Ney, Frank Seide, Ingo Lindam:
A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition. 2091-2094 - Stefan Ortmanns, Hermann Ney, Andreas Eiden:
Language-model look-ahead for large vocabulary speech recognition. 2095-2098 - Jean-Luc Husson, Yves Laprie:
A new search algorithm in segmentation lattices of speech signals. 2099-2102 - Tomokazu Yamada, Shigeki Sagayama:
LR-parser-driven viterbi search with hypotheses merging mechanism using context-dependent phone models. 2103-2106 - Jan Nouza:
Discrete-utterance recognition with a fast match based on total data reduction. 2107-2110 - F. Javier Caminero-Gil, Celinda de la Torre, Luis Villarrubia, Cesar Martín del Alamo, Lis Hernández:
On-line garbage modeling with discriminant analysis for utterance verification. 2111-2114 - Paul Placeway, John D. Lafferty:
Cheating with imperfect transcripts. 2115-2118 - Naoto Iwahashi:
Novel training method for classifiers used in speaker adaptation. 2119-2122 - Katsuki Minamino:
Large vocabulary word recognition based on a graph-structured dictionary. 2123-2126 - Bach-Hiep Tran, Frank Seide, Volker Steinbiss:
A word graph based n-best search in continuous speech recognition. 2127-2130 - David M. Goblirsch:
Viterbi beam search with layered bigrams. 2131-2134 - Eric R. Buhrke, Wu Chou, Qiru Zhou:
A wave decoder for continuous speech recognition. 2135-2138 - Eric Thelen:
Long term on-line speaker adaptation for large vocabulary dictation. 2139-2142 - Gerhard Sagerer, Heike Rautenstrauch, Gernot A. Fink, Bernd Hildebrandt, A. Jusek, Franz Kummert:
Incremental generation of word graphs. 2143-2146 - Irina Illina, Yifan Gong:
Improvement in n-best search for continuous speech recognition. 2147-2150 - Antonio Bonafonte, José B. Mariño, Albino Nogueiras:
Sethos: the UPC speech understanding system. 2151-2154 - Pietro Laface, Luciano Fissore, A. Maro, Franco Ravera:
Segmental search for continuous speech recognition. 2155-2158
Multimodal Dialogue/HCI
- Andrew P. Breen, E. Bowers, W. Welsh:
An investigation into the generation of mouth shapes for a talking head. 2159-2162 - Bertrand Le Goff, Christian Benoît:
A text-to-audiovisual-speech synthesizer for French. 2163-2166 - Yuri Iwano, Shioya Kageyama, Emi Morikawa, Shu Nakazato, Katsuhiko Shirai:
Analysis of head movements and its role in spoken dialogue. 2167-2170 - Satoru Hayamizu, Osamu Hasegawa, Katunobu Itou, Katsuhiko Sakaue, Kazuyo Tanaka, Shigeki Nagaya, Masayuki Nakazawa, T. Endoh, Fumio Togawa, Kenji Sakamoto, Kazuhiko Yamamoto:
RWC multimodal database for interactions by integration of spoken language and visual information. 2171-2174 - Christian Cavé, Isabelle Guaïtella, Roxane Bertrand, Serge Santi, Françoise Harlay, Robert Espesser:
About the relationship between eyebrow movements and F0 variations. 2175-2178 - Laurel Fais, Kyung-ho Loken-Kim, Tsuyoshi Morimoto:
How many words is a picture really worth? 2179-2182 - A. Lagana, F. Lavagetto, A. Storace:
Visual synthesis of source acoustic speech through kohonen neural networks. 2183-2186 - Helena M. Saldaña, David B. Pisoni, Jennifer M. Fellowes, Robert E. Remez:
Audio-visual speech perception without speech cues. 2187-2190
Multilingual Speech Processing
- Jim Barnett, Andrés Corrada, G. Gao, Larry Gillick, Yoshiko Ito, Steve Lowe, Linda Manganaro, Barbara Peskin:
Multilingual speech recognition at dragon systems. 2191-2194 - Joachim Köhler:
Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds. 2195-2198 - Atsushi Nakamura, Shoichi Matsunaga, Tohru Shimizu, Masahiro Tonomura, Yoshinori Sagisaka:
Japanese speech databases for robust speech recognition. 2199-2202 - Lori Lamel, Martine Adda-Decker, Jean-Luc Gauvain, Gilles Adda:
Spoken language processing in a multilingual context. 2203-2206 - Victor Zue, Stephanie Seneff, Joseph Polifroni, Helen M. Meng, James R. Glass:
Multilingual human-computer interactions: from information access to language learning. 2207-2210 - Ulla Ackermann, Bianca Angelini, Fabio Brugnara, Marcello Federico, Diego Giuliani, Roberto Gretter, Gianni Lazzari, Heinrich Niemann:
Speedata: multilingual spoken data entry. 2211-2214 - Hiyan Alshawi:
Head automata for speech translation. 2360-2363 - Ye-Yi Wang, John D. Lafferty, Alex Waibel:
Word clustering with parallel spoken language corpora. 2364-2367 - Jae-Woo Yang, Youngjik Lee:
Toward translating Korean speech into other languages. 2368-2370 - Thomas Bub, Johannes Schwinn:
VERBMOBIL: the evolution of a complex large speech-to-speech translation system. 2371-2374 - Alon Lavie, Alex Waibel, Lori S. Levin, Donna Gates, Marsal Gavaldà, Torsten Zeppenfeld, Puming Zhan, Oren Glickman:
Translation of conversational speech with JANUS-II. 2375-2378
Acoustics in Synthesis
- William H. Edmondson, Jon P. Iles, Dorota J. Iskra:
Pseudo-articulatory representations in speech synthesis and recognition. 2215-2218 - David R. Williams:
Synthesis of initial (/s/-) stop-liquid clusters using HLsyn. 2219-2222 - Chilin Shih:
Synthesis of trill. 2223-2226 - Wai Kit Lo, P. C. Ching:
Phone-based speech synthesis with neural network and articulatory control. 2227-2230 - P. Martland, Sandra P. Whiteside, Steve W. Beet, Ladan Baghai-Ravary:
Analysis of ten vowel sounds across gender and regional/cultural accent. 2231-2234 - Masanobu Abe:
Speech morphing by gradually changing spectrum parameter and fundamental frequency. 2235-2238
Pitch and Rate
- Edouard Geoffrois:
The multi-lag-window method for robust extended-range F0 determination. 2239-2242 - Kenneth E. Barner:
Nonlinear estimation of DEGG signals with applications to speech pitch detection. 2243-2246 - John A. Maidment, María Luisa García Lecumberri:
Pitch analysis methods for cross-speaker comparison. 2247-2249 - Steve W. Beet, Ladan Baghai-Ravary:
Continuous adaptation of linear models with impulsive excitation. 2250-2253 - Sumio Ohno, Masamichi Fukumiya, Hiroya Fujisaki:
Quantitative analysis of the local speech rate and its application to speech synthesis. 2254-2257 - Jan P. Verhasselt, Jean-Pierre Martens:
A fast and reliable rate of speech detector. 2258-2261
General ASR Posters
- Puming Zhan, Klaus Ries, Marsal Gavaldà, Donna Gates, Alon Lavie, Alex Waibel:
JANUS-II: towards spontaneous Spanish speech recognition. 2285-2288 - Kris Demuynck, Jacques Duchateau, Dirk Van Compernolle:
Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch. 2289-2292 - Andrei Constantinescu, Olivier Bornet, Gilles Caloz, Gérard Chollet:
Validating different flexible vocabulary approaches on the Swiss French Polyphone and Polyvar databases. 2293-2296 - Néstor Becerra Yoma, Fergus R. McInnes, Mervyn A. Jack:
Use of a reliability coefficient in noise cancelling by neural net and weighted matching algorithms. 2297-2300 - Kazuhiko Ozeki:
Likelihood normalization using an ergodic HMM for continuous speech recognition. 2301-2304 - Laurence Candille, Henri Meloni:
Dynamic control of a production model. 2305-2308 - Hiroaki Hattori, Eiko Yamada:
Speech recognition using sub-word units dependent on phonetic contexts of both training and recognition vocabularies. 2309-2312 - Bruno Jacob, Christine Sénac:
Hidden Markov models merging acoustic and articulatory information to automatic speech recognition. 2313-2315 - Mats Blomberg, Kjell Elenius:
Creation of unseen triphones from diphones and monophones using a speech production approach. 2316-2319 - Bo Xu, Bing Ma, Shuwu Zhang, Fei Qu, Taiyi Huang:
Speaker-independent dictation of Chinese speech with 32k vocabulary. 2320-2323 - Jason J. Humphries, Philip C. Woodland, David J. B. Pearce:
Using accent-specific pronunciation modelling for robust speech recognition. 2324-2327 - Tilo Sloboda, Alex Waibel:
Dictionary learning for spontaneous speech recognition. 2328-2331 - Johan de Veth, Lou Boves:
Comparison of channel normalisation techniques for automatic speech recognition over the phone. 2332-2335 - Manuel A. Leandro, José Manuel Pardo:
Anchor point detection for continuous speech recognition in Spanish: the spotting of phonetic events. 2336-2339 - Bhiksha Raj, Evandro Bacci Gouvêa, Pedro J. Moreno, Richard M. Stern:
Cepstral compensation by polynomial approximation for environment-independent speech recognition. 2340-2343 - B. T. Lilly, Kuldip K. Paliwal:
Effect of speech coders on speech recognition performance. 2344-2347 - Léonard Janer, Josep Martí, Climent Nadeu, Eduardo Lleida-Solano:
Wavelet transforms for non-uniform speech recognition systems. 2348-2351 - Tsuyoshi Usagawa, Markus Bodden, Klaus Rateitschek:
A binaural model as a front-end for isolated word recognition. 2352-2355 - Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
A new speech enhancement: speech stream segregation. 2356-2359
Data-based Synthesis
- Andrew Slater, John Coleman:
Non-segmental analysis and synthesis based on a speech database. 2379-2382 - Ralf Benzmüller, William J. Barry:
Microsegment synthesis - economic principles in a low-cost solution. 2383-2386 - Xuedong Huang, Alex Acero, J. Adcock, Hsiao-Wuen Hon, John Goldsmith, Jingsong Liu, Mike Plumpe:
Whistler: a trainable text-to-speech system. 2387-2390 - Thomas Portele, Karlheinz Stöber, Horst Meyer, Wolfgang Hess:
Generation of multiple synthesis inventories by a bootstrapping procedure. 2391-2394 - Bernd Möbius, Jan P. H. van Santen:
Modeling segmental duration in German text-to-speech synthesis. 2395-2398 - Nick Campbell:
Autolabelling Japanese ToBI. 2399-2402
Speaker Identification and Verification
- Sarangarajan Parthasarathy, Aaron E. Rosenberg:
General phrase speaker verification using sub-word background models and likelihood-ratio scoring. 2403-2406 - Jin'ichi Murakami, Masahide Sugiyama, Hideyuki Watanabe:
Unknown-multiple signal source clustering problem using ergodic HMM and applied to speaker classification. 2407-2410 - Jean-Luc Le Floch, Claude Montacié, Marie-José Caraty:
GMM and ARVM cooperation and competition for text-independent speaker recognition on telephone speech. 2411-2414 - Qiguang Lin, Ea-Ee Jan, ChiWei Che, Dong-Suk Yuk, James L. Flanagan:
Selective use of the speech spectrum and a VQGMM method for speaker identification. 2415-2418 - Michael Newman, Larry Gillick, Yoshiko Ito, Don McAllaster, Barbara Peskin:
Speaker verification through large vocabulary continuous speech recognition. 2419-2422 - Andrea Paoloni, Susanna Ragazzini, Giacomo Ravaioli:
Predictive neural networks in text independent speaker verification: an evaluation on the SIVA database. 2423-2426
Acoustic Phonetics
- Nisheeth Shrotriya, Rajesh Verma, Sunil K. Gupta, S. S. Agrawal:
Durational characteristics of hindi consonant clusters. 2427-2430 - Beng T. Tan, Minyue Fu, Andrew Spray, Phillip Dermody:
The use of wavelet transforms in phoneme recognition. 2431-2434 - Hisao Kuwabara:
Acoustic properties of phonemes in continuous speech for different speaking rate. 2435-2438 - Hiroya Fujisaki, Sumio Ohno:
Prosodic parameterization of spoken Japanese based on a model of the generation process of F0 contours. 2439-2442 - Arman Maghbouleh:
A logistic regression model for detecting prominences. 2443-2445 - Beat Pfister:
High-quality prosodic modification of speech signals. 2446-2449
Perception of Vowels and Consonants
- Jialu Zhang:
On the syllable structures of Chinese relating to speech recognition. 2450-2453 - Takashi Otake, Kiyoko Yoneyama:
Can a moraic nasal occur word-initially in Japanese? 2454-2457 - Winifred Strange, Reiko Akahane-Yamada, B. H. Fitzgerald, Rieko Kubo:
Perceptual assimilation of american English vowels by Japanese listeners. 2458-2461 - Winifred Strange, Ocke-Schwen Bohn, S. A. Trent, M. C. McNair, K. C. Bielec:
Context and speaker effects in the perceptual assimilation of German vowels by american listeners. 2462-2465 - Mohamed Zahid:
Examination of a perceptual non-native speech contrast: pharyngealized/non-pharyngealized discrimination by French-speaking adults. 2466-2469 - Roel Smits:
Context-dependent relevance of burst and transitions for perceived place in stops: it's in production, not perception. 2470-2473 - Ryoji Baba, Kaori Omuro, Hiromitsu Miyazono, Tsuyoshi Usagawa, Masahiko Higuchi:
The perception of morae in long vowels comparison among Japanese, Korean and English speakers. 2474-2477 - Robin J. Lickley:
Juncture cues to disfluency. 2478-2481 - James R. Sawusch:
Effects of duration and formant movement on vowel perception. 2482-2485 - Neeraj Deshmukh, Richard Duncan, Aravind Ganapathiraju, Joseph Picone:
Benchmarking human performance for continuous speech recognition. 2486-2489 - Takayuki Arai, Misha Pavel, Hynek Hermansky, Carlos Avendaño:
Intelligibility of speech with filtered time trajectories of spectral envelopes. 2490-2493 - Douglas H. Whalen, Sonya M. Sheffert:
Perceptual use of vowel and speaker information in breath sounds. 2494-2497 - Philippe Mousty, Monique Radeau, Ronald Peereman, Paul Bertelson:
The role of neighborhood relative frequency in spoken word recognition. 2498-2501 - James M. McQueen, Mark A. Pitt:
Transitional probability and phoneme monitoring. 2502-2505 - Anne Bonneau:
Identification of vowel features from French stop bursts. 2506-2509 - Zinny S. Bond, Thomas J. Moore, Beverley Gable:
Listening in a second language. 2510-2513 - Denis Burnham, Elizabeth Francis, Di Webster, Sudaporn Luksaneeyanawin, Chayada Attapaiboon, Francisco Lacerda, Peter Keller:
Perception of lexical tone across languages: evidence for a linguistic mode of processing. 2514-2517 - James S. Magnuson, Reiko Akahane-Yamada:
Acoustic correlates to the effects of talker variability on the perception of English /r/ and /l/ by Japanese listeners. 2518-2521