EUROSPEECH/INTERSPEECH 2001: Aalborg, Denmark
- Paul Dalsgaard, Børge Lindberg, Henrik Benner, Zheng-Hua Tan:
EUROSPEECH 2001 Scandinavia, 7th European Conference on Speech Communication and Technology, 2nd INTERSPEECH Event, Aalborg, Denmark, September 3-7, 2001. ISCA 2001
Keynotes
- Louis C. W. Pols:
Acquiring and implementing phonetic knowledge. - Yrjö Neuvo:
Mobile future. - Susan E. Brennan:
How visual co-presence and joint attention shape speaking.
What do Industry and Universities Expect from Each Other? (Special Session)
- Ilkka Niiniluoto:
Universities and industry: marriage or co-operation between independent partners? 11-12 - Yrjö Neuvo:
Considerations on what industry expects from universities. 13-14 - Gary W. Strong:
A perspective on industry/university relationships in the US. 15-16 - Khalid Choukri:
ELRA contribution to bridge the gap between industry and academia. 17-18
Linguistic Modelling: Language Model Compression
- Giulio Maltese, Paolo Bravetti, H. Crépy, B. J. Grainger, M. Herzog, Francisco Palou:
Combining word- and class-based language models: a comparative study in several languages using automatic and manual word-clustering techniques. 21-24 - Shuntaro Isogai, Katsuhiko Shirai, Hirofumi Yamamoto, Yoshinori Sagisaka:
Multi-class composite n-gram language model using multiple word clusters and word successions. 25-28 - Imed Zitouni, Kamel Smaïli, Jean Paul Haton:
Statistical language model based on a hierarchical approach: MCnv. 29-32 - Edward W. D. Whittaker, Bhiksha Raj:
Quantization-based language model compression. 33-36
Speech Production: Voice Source
- Gerrit Bloothooft, Mieke van Wijck, Peter Pabon:
Relations between vocal registers in voice breaks. 39-42 - Gordon Ramsay:
A quasi-one-dimensional model of aerodynamic and acoustic flow in the time-varying vocal tract: source and excitation mechanisms. 43-46 - Nathalie Henrich, Christophe d'Alessandro, Boris Doval:
Spectral correlates of voice open quotient and glottal flow asymmetry: theory, limits and experimental data. 47-50 - Federico Avanzini, Paavo Alku, Matti Karjalainen:
One-delayed-mass model for efficient synthesis of glottal flow. 51-54
Speech Recognition and Understanding: Pronunciation and Subword Units
- Fang Zheng, Zhanjiang Song, Pascale Fung, William Byrne:
Modeling pronunciation variation using context-dependent weighting and b/s refined acoustic modeling. 57-60 - Issam Bazzi, James R. Glass:
Learning units for domain-independent out-of-vocabulary word modelling. 61-64 - Hideharu Nakajima, Izumi Hirano, Yoshinori Sagisaka, Katsuhiko Shirai:
Pronunciation variant analysis using speaking style parallel corpus. 65-68 - Jan Kneissler, Dietrich Klakow:
Speech recognition for huge vocabularies by using optimized sub-word units. 69-72 - Kyung-Tak Lee, Christian Wellekens:
Dynamic lexicon using phonetic features. 1413-1416 - Ute Ziegenhain, Josef G. Bauer:
Triphone tying techniques combining a-priori rules and data driven methods. 1417-1420 - Louis ten Bosch, Nick Cremelie:
Pronunciation modeling and lexical adaptation in midsize vocabulary ASR. 1421-1424 - Yi Liu, Pascale Fung:
Estimating pronunciation variations from acoustic likelihood score for HMM reconstruction. 1425-1428 - Maximilian Bisani, Hermann Ney:
Breadth-first search for finding the optimal phonetic transcription from multiple utterances. 1429-1432 - Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann:
Improved data-driven generation of pronunciation dictionaries using an adapted word list. 1433-1436 - Karen Livescu, James R. Glass:
Segment-based recognition on the phonebook task: initial results and observations on duration modeling. 1437-1440 - Søren Kamaric Riis, Morten With Pedersen, Kåre Jean Jensen:
Multilingual text-to-phoneme mapping. 1441-1444 - Ming-Yi Tsai, Fu-Chiang Chou, Lin-Shan Lee:
Pronunciation variation analysis with respect to various linguistic levels and contextual conditions for Mandarin Chinese. 1445-1448 - Laura Mayfield Tomokiyo:
Hypothesis-driven accent discrimination. 1449-1452 - Changxue Ma, Mark A. Randolph:
An approach to automatic phonetic baseform generation based on Bayesian networks. 1453-1457 - Hauke Schramm, Peter Beyerlein:
Towards discriminative lexicon optimization. 1457-1460 - Xiaodong He, Yunxin Zhao:
Model complexity optimization for nonnative English speakers. 1461-1464 - Tibor Fegyó, Péter Mihajlik, Péter Tatai, Géza Gordos:
Pronunciation modeling in hungarian number recognition. 1465-1468
Phonetics and Phonology: Prosody and Others
- Marc Swerts, Hanne Kloots, Steven Gillis, Georges De Schutter:
Factors affecting schwa-insertion in final consonant clusters in standard dutch. 75-78 - Leah Hitchcock, Steven Greenberg:
Vowel height is intimately associated with stress accent in spontaneous american English discourse. 79-82 - Dafydd Gibbon:
Finite state prosodic analysis of african corpus resources. 83-86 - Marc Schröder, Roddy Cowie, Ellen Douglas-Cowie, Machiel Westerdijk, Stan C. A. M. Gielen:
Acoustic correlates of emotion dimensions in view of speech synthesis. 87-90 - Hanny den Ouden, Jacques M. B. Terken:
Measuring pitch range. 91-94 - Dafydd Gibbon, Ulrike Gut:
Measuring speech rhythm. 95-98 - Mariapaola D'Imperio:
Tonal alignment, scaling and slope in Italian question and statement tunes. 99-102 - Antti Iivonen:
Pragmatic temporal voice range profile as a tool in the research of speech styles. 103-106 - Wooil Kim, Taeyun Kim, Sungjoo Ahn, Hanseok Ko:
Model based stress decision method. 107-110 - Torbjørn Nordgård, Arne Kjell Foldvik:
Reduction of alternative pronunciations in the norwegian computational lexicon norkompleks. 111-114 - Gorka Elordieta, José Ignacio Hualde:
The role of duration as a correlate of accent in lekeitio basque. 115-118 - Victoria Johansson, Merle Horne, Sven Strömqvist:
Word final aspiration as a phrase boundary cue: data from spontaneous Swedish discourse. 119-122 - Xipeng Shen, Bo Xu:
Study and auto-detection of stress based on tonal pitch range in Mandarin. 123-126 - Noam Amir, Ori Kerret, Dimitry Karlinski:
Classifying emotions in speech: a comparison of methods. 127-130
Speech Perception: First and Second Language Learning
- Dawn M. Behne, Peter E. Czigler, Kirk P. H. Sullivan:
Development of vowel quantity perception in late childhood. 133-136 - Byunggon Yang:
A study on the production-perception link of English vowels produced by native and non-native speakers. 137-140 - Takashi Otake, Yuka Yamaguchi:
Japanese can be aware of syllables and morae: evidence from Japanese-English bilingual children. 141-144 - Daniel E. Callan, Keiichi Tajima, Akiko E. Callan, Reiko Akahane-Yamada, Shinobu Masaki:
Neural processes underlying perceptual learning of a difficult second language phonetic contrast. 145-148 - Masahiko Komatsu, Kazuya Mori, Takayuki Arai, Yuji Murahara:
Human language identification with reduced segmental information: comparison between monolinguals and bilinguals. 149-152
Speech Perception: Miscellaneous
- Santiago Fernández, Sergio Feijóo:
Coarticulatory effects in perception. 155-158 - Sue Harding, Georg F. Meyer:
A case for multi-resolution auditory scene analysis. 159-162 - Lucie Ménard, Jean-Luc Schwartz, Louis-Jean Boë, Sonia Kandel, Nathalie Vallée:
Perceptual identification and normalization of synthesized French vowels from birth to adulthood. 163-166 - Lucie Ménard, Louis-Jean Boë:
Perceptual categorization of maximal vowel spaces from birth to adulthood simulated by an articulatory model. 167-170 - Maxine Eskénazi, Alan W. Black:
A study on speech over the telephone and aging. 171-174 - Marcia Chen, Abeer Alwan:
On the perception of voicing for plosives in noise. 175-178 - Jintao Jiang, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein:
Predicting visual consonant perception from physical measures. 179-182 - William A. Ainsworth, T. Cervera:
Effects of noise adaptation on the perception of voiced plosives in isolated syllables. 371-374 - Makoto Hiroshige, Kenji Araki, Koji Tochinai:
On differential limen of word-based local speech rate variation in Japanese expressed by duration ratio. 375-378 - Wan Tokuma:
A multidimensional scaling study of fricatives: a comparison of perceptual and physical dimensions. 379-382 - Marc Swerts, Emiel Krahmer:
Reconstructing dialogue history. 383-386 - David House, Jonas Beskow, Björn Granström:
Timing and interaction of visual cues for prominence in audiovisual speech perception. 387-390 - Masahiko Komatsu, Shinichi Tokuma, Won Tokuma, Takayuki Arai:
Modelling the perceptual identification of Japanese consonants from LPC cepstral distances. 391-394 - Denis Burnham, Valter Ciocca, Stephanie Stokes:
Auditory-visual perception of lexical tone. 395-398 - Anders Eriksson, Gunilla C. Thunberg, Hartmut Traunmüller:
Syllable prominence: a matter of vocal effort, phonetic distinctness and top-down processing. 399-402 - Hansjörg Mixdorff, Christina Widera:
Perceived prominence in terms of a linguistically motivated quantitative intonation model. 403-406 - Sarah Hawkins, Noël Nguyen:
Perception of coda voicing from properties of the onset and nucleus of 'led' and 'let'. 407-410 - Lee Lin, Eliathamby Ambikairajah, W. Harvey Holmes:
Auditory filter bank design using masking curves. 411-414 - Dashtseren Erdenebat, Shigeyoshi Kitazawa, Tatsuya Kitamura:
A new feature driven cochlear implant speech processing strategy. 415-418
Noise Robust Recognition: Frontend and Compensation Algorithms (Special Session)
- Qifeng Zhu, Markus Iseli, Xiaodong Cui, Abeer Alwan:
Noise robust feature extraction for ASR using the Aurora 2 database. 185-188 - Daniel P. W. Ellis, Manuel J. Reyes Gomez:
Investigations into tandem acoustic modeling for the Aurora task. 189-192 - Bernt Andrassy, Damjan Vlaj, Christophe Beaugeant:
Recognition performance of the siemens front-end with and without frame dropping on the Aurora 2 database. 193-196 - Bojan Kotnik, Zdravko Kacic, Bogomir Horvat:
A multiconditional robust front-end feature extraction with a noise reduction procedure based on improved spectral subtraction algorithm. 197-200 - Johan de Veth, Laurent Mauuary, Bernhard Noé, Febe de Wet, Jürgen Sienel, Lou Boves, Denis Jouvet:
Feature vector selection to improve ASR robustness in noisy conditions. 201-204 - Dusan Macho, Climent Nadeu:
Comparison of spectral derivative parameters for robust speech recognition. 205-208 - Umit H. Yapanel, John H. L. Hansen, Ruhi Sarikaya, Bryan L. Pellom:
Robust digit recognition in noise: an evaluation using the AURORA corpus. 209-212 - Jon Barker, Martin Cooke, Phil D. Green:
Robust ASR based on clean speech models: an evaluation of missing data techniques for connected digit recognition in noise. 213-217 - Jasha Droppo, Li Deng, Alex Acero:
Evaluation of the SPLICE algorithm on the Aurora2 database. 217-220 - José C. Segura, Ángel de la Torre, M. Carmen Benítez, Antonio M. Peinado:
Model-based compensation of the additive noise for continuous speech recognition: experiments using the Aurora II database and tasks. 221-224 - Andrew C. Morris, Astrid Hagen, Hervé Bourlard:
MAP combination of multi-stream HMM or HMM/ANN experts. 225-228 - Bojan Jarc, Rudolf Babic:
Second order statistics spectrum estimation method for robust speech recognition. 229-232 - Kaisheng Yao, Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura:
Feature extraction and model-based noise compensation for noisy speech recognition evaluated on AURORA 2 task. 233-236
Linguistic Modelling: Language Model Adaptation
- Marcello Federico, Nicola Bertoldi:
Broadcast news LM adaptation using contemporary texts. 239-242 - Mirjam Sepesy Maucec, Zdravko Kacic:
Topic detection for language model adaptation of highly-inflected languages by using a fuzzy comparison function. 243-246 - Kallirroi Georgila, Nikos Fakotakis, George K. Kokkinakis:
Efficient stochastic finite-state networks for language modelling in spoken dialogue systems. 247-250 - Karthik Visweswariah, Harry Printz:
Language models conditioned on dialog state. 251-254 - Langzhou Chen, Jean-Luc Gauvain, Lori Lamel, Gilles Adda, Martine Adda-Decker:
Using information retrieval methods for language model adaptation. 255-258
Speech Production: Articulation
- Olov Engwall:
Making the tongue model talk: merging MRI & EMA measurements. 261-264 - Inger Moen, Hanne Gram Simonsen, Morten Huseby, John Grue:
The relationship between intraoral air pressure and tongue/palate contact during the articulation of norwegian /t/ and /d/. 265-268 - Ahmed M. Elgendy, Louis C. W. Pols:
Mechanical versus perceptual constraints as determinants of articulatory strategy. 269-272 - Bryan Gick, Ian Wilson:
Pre-liquid excrescent schwa: what happens when vocalic targets conflict. 273-276 - Slim Ouni, Yves Laprie:
Exploring the null space of the acoustic-to-articulatory inversion using a hypercube codebook. 277-280
Speech Recognition and Understanding: Topic Detection and Information Retrieval
- M. W. Theunissen, Konrad Scheffler, Johan A. du Preez:
Phoneme-based topic spotting on the switchboard corpus. 283-286 - Martin Franz, J. Scott McCarley, Todd Ward, Wei-Jing Zhu:
Topic styles in IR and TDT: effect on system behavior. 287-290 - Geoffrey Zweig, Jing Huang, Mukund Padmanabhan:
Extracting caller information from voicemail. 291-294 - Hong-Kwang Jeff Kuo, Chin-Hui Lee:
A portability study on natural language call steering. 295-298 - Berlin Chen, Hsin-Min Wang, Lin-Shan Lee:
Improved spoken document retrieval by exploring extra acoustic and linguistic cues. 299-302
Phonetics and Phonology: Segmentals and Synthesis
- Kimiko Tsukada:
Native vs non-native production of English vowels in spontaneous speech: an acoustic phonetic study. 305-308 - Silke Goronzy, Marina Sahakyan, Wolfgang Wokurek:
Is non-native pronunciation modelling necessary? 309-312 - Yves Laprie, Anne Bonneau:
Burst segmentation and evaluation of acoustic cues. 313-316 - Theodor Granser, Sylvia Moosmüller:
The schwa in albanian. 317-320 - Simone Ashby, Julie Carson-Berndsen, Gina Joue:
A testbed for developing multilingual phonotactic descriptions. 321-324 - Wing-Nga Fung, Sze-Lok Lau:
A physiological analysis of nasals and nasalization in Chinese. 325-328 - Robert E. Donovan:
A component by component listening test analysis of the IBM trainable speech synthesis system. 329-332 - Shimei Pan, Kathleen R. McKeown, Julia Hirschberg:
Semantic abnormality and its realization in spoken language. 333-336 - Nick Campbell:
TALKING FOREIGN - concatenative speech synthesis and the language barrier. 337-340 - Christian Jensen:
Schwa-assimilation in danish synthetic speech. 341-344 - Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi:
Text-to-speech synthesis with arbitrary speaker's voice from average voice. 345-348 - Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. 349-352 - Min Tang, Chao Wang, Stephanie Seneff:
Voice transformations: from speech synthesis to mammalian vocalizations. 353-356 - Juana M. Gutiérrez-Arriola, Juan Manuel Montero, José A. Vallejo, Ricardo de Córdoba, Rubén San Segundo, José Manuel Pardo:
A new multi-speaker formant synthesizer that applies voice conversion techniques. 357-360 - Mikiko Mashimo, Tomoki Toda, Kiyohiro Shikano, Nick Campbell:
Evaluation of cross-language voice conversion based on GMM and straight. 361-364 - Rachel Coulston:
Ejective reduction in chaha is conditioned by more than prosodic position. 365-368
Noise Robust Recognition: Frontend (Special Session)
- Hong Kook Kim, Richard C. Rose, Hong-Goo Kang:
Acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments. 421-424 - Yan Ming Cheng, Dusan Macho, Yuanjun Wei, Douglas Ealey, Holly Kelleher, David Pearce, William Kushner, Tenkasi Ramabadran:
A robust front-end algorithm for distributed speech recognition. 425-428 - M. Carmen Benítez, Lukás Burget, Barry Y. Chen, Stéphane Dupont, Harinath Garudadri, Hynek Hermansky, Pratibha Jain, Sachin S. Kajarekar, Nelson Morgan, Sunil Sivadas:
Robust ASR front-end using spectral-based and discriminant features: experiments on the Aurora tasks. 429-432 - Bernhard Noé, Jürgen Sienel, Denis Jouvet, Laurent Mauuary, Johan de Veth, Lou Boves, Febe de Wet:
Noise reduction for noise robust feature extraction for distributed speech recognition. 433-436 - Douglas Ealey, Holly Kelleher, David Pearce:
Harmonic tunnelling: tracking non-stationary noises during speech. 437-440
Linguistic Modelling: Semantic Modelling
- David Carter, Ian Gransden:
Resource-limited sentence boundary detection. 443-446 - Andrew N. Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee:
Metrics for measuring domain independence of semantic classes. 447-450 - Xiaolong Mou, Stephanie Seneff, Victor Zue:
Context-dependent probabilistic hierarchical sublexical modelling using finite state transducers. 451-454 - Jerome R. Bellegarda, Kim E. A. Silverman:
Data-driven semantic inference for unconstrained desktop command and control. 455-458 - Martin Jansche:
Information extraction via heuristics for a movie showtime query system. 459-462
Speech Perception: Recognition and Intelligibility
- Takashi Otake, Anne Cutler:
Recognition of (almost) spoken words: evidence from word play in Japanese. 465-468 - Vincent Colotte, Yves Laprie, Anne Bonneau:
Perceptual experiments on enhanced and slowed down speech sentences for second language acquisition. 469-473 - Steven Greenberg, Takayuki Arai:
The relation between speech intelligibility and the complex modulation spectrum. 473-476 - Olivier Crouzet, William A. Ainsworth:
Envelope information in speech processing: acoustic-phonetic analysis vs. auditory figure-ground segregation. 477-480 - Patti Adank, Roeland van Hout, Roel Smits:
A comparison between human vowel normalization strategies and acoustic vowel transformation techniques. 481-484
Speech Recognition and Understanding: LVCSR
- Pavel Ircing, Pavel Krbec, Jan Hajic, Josef Psutka, Sanjeev Khudanpur, Frederick Jelinek, William Byrne:
On large vocabulary continuous speech recognition of highly inflectional language - czech. 487-490 - Takahiro Shinozaki, Chiori Hori, Sadaoki Furui:
Towards automatic transcription of spontaneous presentations. 491-494 - Olivier Siohan, Akio Ando, Mohamed Afify, Hui Jiang, Chin-Hui Lee, Qi Li, Feng Liu, Kazuo Onoe, Frank K. Soong, Qiru Zhou:
A real-time Japanese broadcast news closed-captioning system. 495-498 - Peter Beyerlein, Xavier L. Aubert, Matthew Harris, Carsten Meyer, Hauke Schramm:
Investigations on conversational speech recognition. 499-502 - Yuqing Gao, Hakan Erdogan, Yongxin Li, Vaibhava Goel, Michael Picheny:
Recent advances in speech recognition system for IBM DARPA communicator. 503-506 - Daniel Willett, Erik McDermott, Yasuhiro Minami, Shigeru Katagiri:
Time and memory efficient viterbi decoding for LVCSR using a precompiled search network. 847-850 - Feng Liu, Mohamed Afify, Hui Jiang, Olivier Siohan:
A new verification-based fast match approach to large vocabulary speech recognition. 851-854 - Seiichi Nakagawa, Yukihisa Horibe:
A fast calculation method in LVCSRS by time-skipping and clustering of probability density distributions. 855-858 - Shinichi Homma, Akio Kobayashi, Shoei Sato, Toru Imai, Akio Ando:
Speech recognition of Japanese news commentary. 859-862
Speech Synthesis: Systems and Prosody
- Piero Cosi, Fabio Tesser, Roberto Gretter, Cinzia Avesani, Mike Macon:
Festival speaks Italian! 509-512 - Alex I. C. Monaghan, Mahmoud Kassaei, Mark Luckin, Mariscela Amador-Hernandez, Andrew Lowry, Daniel Faulkner, Fred Sannier:
Multilingual TTS for computer telephony: the aculab approach. 513-516 - Géza Kiss, Géza Németh, Gábor Olaszy, Géza Gordos:
A flexible multilingual TTS development and speech research tool. 517-520 - Esther Klabbers, Karlheinz Stöber, Raymond N. J. Veldhuis, Petra Wagner, Stefan Breuer:
Speech synthesis development made easy: the bonn open synthesis system. 521-525 - Gábor Olaszy, Géza Németh, Péter Olaszi:
Automatic prosody generation - a model for hungarian. 525-528 - Olga van Herwijnen, Jacques M. B. Terken:
Evaluation of PROS-3 for the assignment of prosodic structure, compared to assignment by human experts. 529-532 - Yoichi Yamashita, Tomoyoshi Ishida:
Stochastic F0 contour model based on the clustering of F0 shapes of a syntactic unit. 533-536 - Xuejing Sun, Ted H. Applebaum:
Intonational phrase break prediction using decision tree and n-gram model. 537-540 - A. Zaki, A. Rajouani, Mohamed Najim:
Synthesizing intonation of standard arabic language. 541-545 - Dawei Xu, Hiroki Mori, Hideki Kasuya:
Invariance of relative F0 change field of Chinese disyllabic words. 545-548 - Achim F. Müller, Rüdiger Hoffmann:
Accent label prediction by time delay neural networks using gating clusters. 549-553 - Peter Juel Henrichsen:
Transformation-based learning of danish stress assignment. 553-556 - Stefan Baumann, Jürgen Trouvain:
On the prosody of German telephone numbers. 557-560 - Marc Schröder:
Emotional speech synthesis: a review. 561-564 - Kjell Gustafson, David House:
Fun or boring? a web-based evaluation of expressive synthesis for children. 565-568
Speech Recognition and Understanding: Articulatory and Perceptual Approaches to ASR
- Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura:
Sub-band based additive noise removal for robust speech recognition. 571-574 - Yik-Cheung Tam, Brian Kan-Wing Mak:
Development of an asynchronous multi-band system for continuous speech recognition. 575-578 - Peter Jancovic, Ji Ming:
A multi-band approach based on the probabilistic union model and frequency-filtering features for robust speech recognition. 579-582 - Liang Gu, Kenneth Rose:
Split-band perceptual harmonic cepstral coefficients as acoustic features for speech recognition. 583-586 - Astrid Hagen, Hervé Bourlard:
Error correcting posterior combination for robust multi-band speech recognition. 587-590 - Bojana Gajic, Kuldip K. Paliwal:
Robust parameters for speech recognition based on subband spectral centroid histograms. 591-594 - William H. Edmondson, Li Zhang:
Pseudo-articulatory representations and the recognition of syllable patterns in speech. 595-598 - Joe Frankel, Simon King:
ASR - articulatory speech recognition. 599-602 - Jeff Z. Ma, Li Deng:
Efficient decoding strategy for conversational speech recognition using state-space models for vocal-tract-resonance dynamics. 603-606 - Katrin Weber, Samy Bengio, Hervé Bourlard:
HMM2- extraction of formant structures and their use for robust ASR. 607-610 - Xiaoqing Yu, Wanggen Wan, Daniel Pak-Kong Lun:
Auditory model based speech recognition in noisy environment. 611-614 - Sascha Wendt, Gernot A. Fink, Franz Kummert:
Forward masking for increased robustness in automatic speech recognition. 615-618 - Qi Li, Frank K. Soong, Olivier Siohan:
An auditory system-based feature for robust speech recognition. 619-622
Noise Robust Recognition: Robust Systems - What Helps? (Special Session)
- Markus Lieb, Alexander Fischer:
Experiments with the philips continuous ASR system on the AURORA noisy digits database. 625-628 - George Saon, Juan M. Huerta, Ea-Ee Jan:
Robust digit recognition in noisy environments: the IBM Aurora 2 system. 629-632 - Mohamed Afify, Hui Jiang, Filipp Korkmazskiy, Chin-Hui Lee, Qi Li, Olivier Siohan, Frank K. Soong, Arun C. Surendran:
Evaluating the Aurora connected digit recognition task - a bell labs approach. 633-636
Phonetics and Phonology: Segmentals
- Cécile Fougeron, Jean-Philippe Goldman, Ulrich H. Frauenfelder:
Liaison and schwa deletion in French: an effect of lexical frequency and competition? 639-642 - Eric Zee, Wai-Sum Lee:
An acoustical analysis of the vowels in beijing Mandarin. 643-646 - Véronique Delvaux, Alain Soquet:
Discriminant analysis of nasal vs. oral vowels in French: comparison between different parametric representations. 647-650 - Didier Demolin, Véronique Delvaux:
Whispery voiced nasal stops in rwanda. 651-654
Speech Production: Prosody
- Gunnar Fant, Anita Kruckenberg, Johan Liljencrants, Antonis Botinis:
Prominence correlates: a study of Swedish. 657-660 - Sumio Ohno, Hiroya Fujisaki:
Quantitative analysis of the effects of emphasis upon prosodic features of speech. 661-664 - Grzegorz Dogil, Bernd Möbius:
Towards a model of target oriented production of prosody. 665-668 - Chilin Shih, Greg Kochanski:
Prosody control for speaking and singing styles. 669-672 - Greg Kochanski, Chilin Shih:
Automated modeling of Chinese intonation in continuous speech. 911-914 - Johan Frid:
Prediction of intonation patterns of accented words in a corpus of read Swedish news through pitch contour stylization. 915-918 - Paavo Alku, Juha Vintturi, Erkki Vilkman:
The use of fundamental frequency raising as a strategy for increasing vocal intensity in soft, normal, and loud phonation. 919-922 - Antonis Botinis, Marios Fourakis, Robert Bannert:
Prosodic interactions on segmental durations in Greek. 923-926 - Min Chu, Yongqiang Feng:
Study on factors influencing durations of syllables in Mandarin. 927-930 - Sofia Gustafson-Capková, Beáta Megyesi:
A comparative study of pauses in dialogues and read speech. 931-934 - Keiichi Takamaru, Makoto Hiroshige, Kenji Araki, Koji Tochinai:
Detecting Japanese local speech rate deceleration in spontaneous conversational speech using a variable threshold. 935-938 - Niels Reinholt Petersen:
Modelling fundamental frequency in first post-tonic syllables in danish sentences. 939-942 - Michelina Savino:
Non-finality and pre-finality in bari Italian intonation: a preliminary account. 943-946 - Hansjörg Mixdorff, Oliver Jokisch:
Building an integrated prosodic model of German. 947-950 - Omar A. G. Ibrahim, Salwa H. El-Ramly, Nemat S. Abdel Kader:
A model of F0 contour for arabic affirmative and interrogative sentences. 951-954 - Caroline L. Smith, Lisa A. Hogan:
Variation in final lengthening as a function of topic structure. 955-958 - Olga van Herwijnen, Jacques M. B. Terken:
Do speakers realize the prosodic structure they say they do? 959-962 - Marija Tabain, Guillaume Rolland, Christophe Savariaux:
Coarticulatory effects at prosodic boundaries: some acoustic results. 963-966 - Plínio A. Barbosa:
Generating duration from a cognitively plausible model of rhythm production. 967-970
Speech Recognition and Understanding: Acoustic Modelling - I
- Matthew N. Stuttle, Mark J. F. Gales:
A mixture of Gaussians front end for speech recognition. 675-678 - Jing Zheng, John Butzberger, Horacio Franco, Andreas Stolcke:
Improved maximum mutual information estimation training of continuous density HMMs. 679-682 - Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua:
Maximum-likelihood training of a bipartite acoustic model for speech recognition. 683-686 - Ruhi Sarikaya, John H. L. Hansen:
Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition. 687-690 - Ellen Eide:
Distinctive features for use in an automatic speech recognition system. 1613-1616 - Jiyong Zhang, Fang Zheng, Jing Li, Chunhua Luo, Guoliang Zhang:
Improved context-dependent acoustic modeling for continuous Chinese speech recognition. 1617-1620 - Jacques Duchateau, Kris Demuynck, Dirk Van Compernolle, Patrick Wambacq:
Class definition in discriminant feature analysis. 1621-1624 - José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:
Feature extraction from time-frequency matrices for robust speech recognition. 1625-1628 - Peng Yu, Zuoying Wang:
Using spatial correlation information in speech recognition. 1629-1632 - Josef G. Bauer:
On the choice of classes in MCE based discriminative HMM-training for speech recognizers used in the telephone environment. 1633-1636 - Joseph Keshet, Dan Chazan, Ben-Zion Bobrovsky:
Plosive spotting with margin classifiers. 1637-1640 - Fabio Brugnara:
Model agglomeration for context-dependent acoustic modeling. 1641-1644 - Michael Levit, Allen L. Gorin, Jeremy H. Wright:
Multipass algorithm for acquisition of salient acoustic morphemes. 1645-1648 - Tadashi Emori, Koichi Shinoda:
Rapid vocal tract length normalization using maximum likelihood estimation. 1649-1652 - Kozo Okuda, Tomoko Matsui, Satoshi Nakamura:
Towards the creation of acoustic models for stressed Japanese speech. 1653-1656 - Akira Baba, Shinichi Yoshizawa, Miichi Yamada, Akinobu Lee, Kiyohiro Shikano:
Elderly acoustic model for large vocabulary continuous speech recognition. 1657-1660 - Jinsong Zhang, Shuwu Zhang, Yoshinori Sagisaka, Satoshi Nakamura:
A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. 1661-1664 - Luis Javier Rodríguez, Inés Torres, Amparo Varona:
Evaluation of sublexical and lexical models of acoustic disfluencies for spontaneous speech recognition in Spanish. 1665-1668 - Murat Deviren, Khalid Daoudi:
Structural learning of dynamic Bayesian networks in speech recognition. 1669-1672
Linguistic Modelling: Language Models
- Shigehiko Onishi, Hirofumi Yamamoto, Yoshinori Sagisaka:
Structured language model for class identification of out-of-vocabulary words arising from multiple wordclasses. 693-696 - Takatoshi Jitsuhiro, Hirofumi Yamamoto, Setsuo Yamada, Yoshinori Sagisaka:
New language models using phrase structures extracted from parse trees. 697-700 - Elvira I. Sicilia-Garcia, Ji Ming, Francis Jack Smith:
Triggering individual word domains in n-gram language models. 701-704 - Tomoyosi Akiba, Katunobu Itou:
A structured statistical language model conditioned by arbitrarily abstracted grammatical categories based on GLR parsing. 705-708 - Atsushi Matsui, Hiroyuki Segi, Akio Kobayashi, Toru Imai, Akio Ando:
Speech recognition of broadcast sports news. 709-712 - Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh:
Improvement of a structured language model: arbori-context tree. 713-716 - Woosung Kim, Sanjeev Khudanpur, Jun Wu:
Smoothing issues in the structured language model. 717-720 - Xipeng Shen, Bo Xu:
The study of the effect of training set on statistical language modeling. 721-724 - Yannick Estève, Frédéric Béchet, Alexis Nasr, Renato de Mori:
Stochastic finite state automata language model triggered by dialogue states. 725-728 - Manny Rayner, John Dowding, Beth Ann Hockey:
A baseline method for compiling typed unification grammars into context free language models. 729-732 - Edward W. D. Whittaker, Bhiksha Raj:
Comparison of width-wise and length-wise language model compression. 733-736 - Vesa Siivola, Mikko Kurimo, Krista Lagus:
Large vocabulary statistical language modeling for continuous speech recognition in finnish. 737-740 - Ramón López-Cózar, Diego H. Milone:
A new technique based on augmented language models to improve the performance of spoken dialogue systems. 741-744 - Kazuyuki Takagi, Kazuhiko Ozeki:
Pause information for dependency analysis of read Japanese sentences. 1041-1044 - Berlin Chen, Hsin-Min Wang, Lin-Shan Lee:
An HMM/n-gram-based linguistic processing approach for Mandarin spoken document retrieval. 1045-1048 - Yi-Chung Lin, Huei-Ming Wang:
Probabilistic concept verification for language understanding in spoken dialogue systems. 1049-1052 - M. Oguzhan Külekci, Mehmed Özkan:
Turkish word segmentation using morphological analyzer. 1053-1056 - Pongthai Tarsaku, Virach Sornlertlamvanich, Rachod Thongprasirt:
Thai grapheme-to-phoneme using probabilistic GLR parser. 1057-1060 - Philippe Blache, Daniel Hirst:
Aligning prosody and syntax in property grammars. 1061-1064 - Melissa Barkat, Ioana Vasilescu:
From perceptual designs to linguistic typology and automatic language identification: overview and perspectives. 1065-1068 - Susan Fitt:
Morphological approaches for an English pronunciation lexicon. 1069-1072 - Gina Joue, Julie Carson-Berndsen:
An embodiment paradigm for speech recognition systems. 1073-1076 - Kui Xu, Fuliang Weng, Helen M. Meng, Po-Chui Luk:
Multi-parser architecture for query processing. 1077-1080 - Yi-Chia Chen, Yi-Chung Lin:
Two-stage probabilistic approach to text segmentation. 1081-1084 - Roeland Ordelman, Arjan van Hessen, Franciska de Jong:
Lexicon optimization for Dutch speech recognition in spoken document retrieval. 1085-1088 - Tom Brøndsted:
Evaluation of recent speech grammar standardization efforts. 1089-1092
Speaker Recognition: Identification, Verification and Tracking. Speech Recognition and Understanding: Language Identification
- Douglas Brungart, Kimberly R. Scott, Brian D. Simpson:
The influence of vocal effort on human speaker identification. 747-750 - Robert Faltlhauser, Günther Ruske:
Improving speaker recognition using phonetically structured Gaussian mixture models. 751-754 - Conrad Sanderson, Kuldip K. Paliwal:
Information fusion for robust speaker verification. 755-758 - Takayuki Satoh, Takashi Masuko, Takao Kobayashi, Keiichi Tokuda:
A robust speaker verification system against imposture using an HMM-based speech synthesis system. 759-762 - Arun C. Surendran:
Sequential decisions for faster and more flexible verification. 763-766 - Wei-Ho Tsai, Y. C. Chu, Chao-Shih Huang, Wen-Whei Chang:
Background learning of speaker voices for text-independent speaker identification. 767-770 - Wei-Ho Tsai, Wen-Whei Chang, Chao-Shih Huang:
Explicit exploitation of stochastic characteristics of test utterance for text-independent speaker identification. 771-774 - Chai Wutiwiwatchai, Varin Achariyakulporn, Sawit Kasuriya:
Improvement of speaker verification for Thai language. 775-778 - Javier Rodríguez Saeta, Christian Koechling, Javier Hernando:
Speaker identification for car infotainment applications. 779-782 - Holger Schalk, Herbert Reininger, Stephan Euler:
A system for text dependent speaker verification - field trial evaluation and simulation results. 783-786 - Alvin F. Martin, Mark A. Przybocki:
Speaker recognition in a multi-speaker environment. 787-790 - Zhijian Ou, Zuoying Wang:
A new DP-like speaker clustering algorithm. 791-794 - P. Sivakumaran, J. Fortuna, Aladdin M. Ariyaeeinia:
On the use of the Bayesian information criterion in multiple speaker detection. 795-798 - Laurent Benarousse, Edouard Geoffrois:
Preliminary experiments on language identification using broadcast news recordings. 799-802 - Katrin Kirchhoff, Sonia Parandekar:
Multi-stream statistical n-gram modeling with application to automatic language identification. 803-806
Phonetics and Phonology: Prominence and Timing
- Barbertje M. Streefkerk, Louis C. W. Pols, Louis ten Bosch:
Up to what level can acoustical and textual features predict prominence. 811-814 - Hyunsong Chung, Mark A. Huckvale:
Linguistic factors affecting timing in Korean with application to speech synthesis. 815-818 - Felix Schaeffler:
Measuring rhythmic deviation in second language speech. 819-822 - Ian Maddieson:
Good timing: place-dependent voice onset time in ejective stops. 823-826
Speech Synthesis: Concatenation
- Hélène François, Olivier Boëffard:
Design of an optimal continuous speech database for text-to-speech synthesis considered as a set covering problem. 829-832 - Christos Vosnidis, Vassilios Digalakis:
Use of clustering information for coarticulation compensation in speech synthesis by word concatenation. 833-836 - Maria Founda, George Tambouratzis, Aimilios Chalamandaris, George Carayannis:
Reducing spectral mismatches in concatenative speech synthesis via systematic database enrichment. 837-840 - Attila Ferencz, Sung-Woo Choi, Ho-Eun Song, Myoung-Wan Koo:
Hansori 2001 - corpus-based implementation of the Korean hansori text-to-speech synthesizer. 841-844 - William J. Barry, Claus Nielsen, Ove Andersen:
Must diphone synthesis be so unnatural? 975-978 - Ann K. Syrdal:
Phonetic effects on listener detection of vowel concatenation. 979-982 - Olivier Boëffard:
Variable-length acoustic units inference for text-to-speech synthesis. 983-986 - Ivan Bulyko, Mari Ostendorf:
Unit selection for speech synthesis using splicing costs with weighted finite state transducers. 987-990 - Ka Man Law, Tan Lee, Wai H. Lau:
Cantonese text-to-speech synthesis using sub-syllable units. 991-994
Speech Recognition and Understanding: Noise Robustness
- Febe de Wet, Bert Cranen, Johan de Veth, Lou Boves:
A comparison of LPC and FFT-based acoustic features for noise robust ASR. 865-868 - Miichi Yamada, Akira Baba, Shinichi Yoshizawa, Yuichiro Mera, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Unsupervised noisy environment adaptation algorithm using MLLR and speaker selection. 869-872 - Zekeriya Tufekci, John N. Gowdy, Sabri Gurbuz, Eric K. Patterson:
Applying parallel model compensation with mel-frequency discrete wavelet coefficients for noise-robust speech recognition. 873-876 - Tai-Hwei Hwang, Kuo-Hwei Yuo, Hsiao-Chuan Wang:
Linear interpolation of cepstral variance for noisy speech recognition. 877-880 - Hiroshi Matsumoto, Akihiko Shimizu, Kazumasa Yamamoto:
Evaluation of a generalized dynamic cepstrum in distant speech recognition. 881-884 - Arnaud Martin, Géraldine Damnati, Laurent Mauuary:
Robust speech/non-speech detection using LDA applied to MFCC for continuous speech recognition. 885-888 - Edmondo Trentin, Marco Gori:
Toward noise-tolerant acoustic models. 889-892 - Nicholas W. D. Evans, John S. D. Mason:
Noise estimation without explicit speech, non-speech detection: a comparison of mean, modal and median based approaches. 893-896 - Rathi Chengalvarayan:
Evaluation of front-end features and noise compensation methods for robust Mandarin speech recognition. 897-900 - Brendan J. Frey, Li Deng, Alex Acero, Trausti T. Kristjansson:
ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition. 901-904 - John H. L. Hansen, Ruhi Sarikaya, Umit H. Yapanel, Bryan L. Pellom:
Robust speech recognition in noise: an evaluation using the SPINE corpus. 905-908 - Man-Hung Siu, Yu-Chung Chan:
Robust speech recognition against packet loss. 1095-1098 - Masaki Naito, Shingo Kuroiwa, Tsuneo Kato, Tohru Shimizu, Norio Higuchi:
Rapid CODEC adaptation for cellular phone speech recognition. 1099-1102 - Ascensión Gallardo-Antolín, Carmen Peláez-Moreno, Fernando Díaz-de-María:
A robust front-end for ASR over IP and GSM networks: an integrated scenario. 1103-1106 - Philippe Renevey, Rolf Vetter, Jens Krauss:
Robust speech recognition using missing feature theory and vector quantization. 1107-1110 - Ji Ming, Peter Jancovic, Philip Hanna, Darryl Stewart:
Modeling the mixtures of known noise and unknown unexpected noise for robust speech recognition. 1111-1114 - Takayoshi Kawamura, Kazuya Takeda, Fumitada Itakura:
Robust speech recognition based on selective use of missing frequency band HMMs. 1115-1118 - Ikuyo Masuda-Katsuse:
A new method for speech recognition in the presence of non-stationary, unpredictable and high-level noise. 1119-1122 - Bojan Kotnik, Zdravko Kacic, Bogomir Horvat:
A computational efficient real time noise robust speech recognition based on improved spectral subtraction method. 1123-1126 - Damjan Vlaj, Zdravko Kacic, Bogomir Horvat:
The use of noisy frame elimination and frequency spectrum magnitude reduction in noise robust speech recognition. 1127-1130 - Jen-Tzung Chien:
Combined linear regression adaptation and Bayesian predictive classification for robust speech recognition. 1131-1134 - Florian Hilger, Hermann Ney:
Quantile based histogram equalization for noise robust speech recognition. 1135-1138 - Kaisheng Yao, Kuldip K. Paliwal, Satoshi Nakamura:
Sequential noise compensation by a sequential Kullback proximal algorithm. 1139-1142
Signal Analysis: Microphone Arrays & Source Localisation
- Athanasios Koutras, Evangelos Dermatas, George K. Kokkinakis:
Blind speech separation of moving speakers using hybrid neural networks. 997-1000 - Wolfgang Herbordt, Herbert Buchner, Walter Kellermann:
Computationally efficient frequency-domain combination of acoustic echo cancellation and robust adaptive beamforming. 1001-1004 - Michael L. Seltzer, Bhiksha Raj:
Calibration of microphone arrays for improved speech recognition. 1005-1008 - Athanasios Koutras, Evangelos Dermatas, George K. Kokkinakis:
Improving simultaneous speech recognition in real room environments using overdetermined blind source separation. 1009-1012 - Futoshi Asano, Masataka Goto, Katunobu Itou, Hideki Asoh:
Real-time sound source localization and separation system and its application to automatic speech recognition. 1013-1016
Speech Recognition and Understanding: Audio-Visual Processing
- Joohun Lee, Jin Young Kim:
An efficient lipreading method using the symmetry of lip. 1019-1022 - Martin Heckmann, Thorsten Wild, Frédéric Berthommier, Kristian Kroschel:
Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition. 1023-1026 - Gerasimos Potamianos, Chalapathy Neti, Giridharan Iyengar, Eric Helmuth:
Large-vocabulary audio-visual speech recognition by machines and humans. 1027-1030 - Philippe Daubias, Paul Deléglise:
Evaluation of an automatically obtained shape and appearance model for automatic audio visual speech recognition. 1031-1034 - Catherine Pelachaud, Emanuela Magno Caldognetto, Claudio Zmarich, Piero Cosi:
An approach to an Italian talking head. 1035-1038
SIGshow (Special Session)
- Jean-François Bonastre, Ivan Magrin-Chagnolleau, Stephan Euler, François Pellegrino, Régine André-Obrecht, John S. D. Mason, Frédéric Bimbot:
SPeaker and language characterization (spLC): a special interest group (SIG) of ISCA. 1145-1148 - Nick Campbell, Wolfgang Hess, Bernd Möbius, Jan P. H. van Santen:
The ISCA special interest group on speech synthesis. 1149-1152 - Dominic W. Massaro:
Auditory visual speech processing. 1153-1156 - Laila Dybkjær:
SIGdial - special interest group on discourse and dialogue. 1345-1348 - Philippe Delcloque:
Integrating speech technology in language learning: an overview of the activities of inSTIL. 1349-1352 - Climent Nadeu, Donncha Cróinín, Bojan Petek, Kepa Sarasola, Briony Williams:
ISCA SALTMIL SIG: speech and language technology for minority languages. 1353-1356
Speech Synthesis: Prosody
- Weijun Chen, Fuzong Lin, Jianmin Li, Bo Zhang:
Training prosodic phrasing rules for Chinese TTS systems. 1159-1162 - Per Olav Heggtveit, Jon Emil Natvig:
Intonation modelling with a lexicon of natural F0 contours. 1163-1166 - Kim E. A. Silverman, Jerome R. Bellegarda, Kevin A. Lenzo:
Smooth contour estimation in data-driven pitch modelling. 1167-1170 - Takashi Saito, Masaharu Sakamoto:
Generating F0 contours by statistical manipulation of natural F0 shapes. 1171-1174 - Julia Hirschberg, Owen Rambow:
Learning prosodic features using a tree representation. 1175-1178
Applications: Multimodal Applications
- Sabri Gurbuz, Eric K. Patterson, Zekeriya Tufekci, John N. Gowdy:
Lip-reading from parametric lip contours for audio-visual speech recognition. 1181-1184 - Simon Lucey, Sridha Sridharan, Vinod Chandran:
An investigation of HMM classifier combination strategies for improved audio-visual speech recognition. 1185-1188 - Niels Ole Bernsen, Laila Dybkjær:
Combining multi-party speech and text exchanges over the internet. 1189-1192 - Kazuhiro Nakadai, Ken-ichi Hidai, Hiroshi G. Okuno, Hiroaki Kitano:
Real-time multiple speaker tracking by multi-modal integration for mobile robots. 1193-1196 - Tsuneo Nitta, Kouichi Katsurada, Hirobumi Yamada, Yusaku Nakamura, Satoshi Kobayashi:
XISL: an attempt to separate multimodal interactions from XML contents. 1197-1200
Speech Recognition and Understanding: Speaker Adaptation
- Asela Gunawardana, William Byrne:
Discriminative speaker adaptation with conditional maximum likelihood linear regression. 1203-1206 - Patrick Kenny, Gilles Boulianne, Pierre Dumouchel:
What is the best type of prior distribution for EMAP speaker adaptation? 1207-1210 - Yoon Kim:
Maximum-likelihood affine cepstral filtering (MLACF) technique for speaker normalization. 1211-1214 - Bowen Zhou, John H. L. Hansen:
A novel algorithm for rapid speaker adaptation based on structural maximum likelihood eigenspace mapping. 1215-1218 - Shinichi Yoshizawa, Akira Baba, Kanako Matsunami, Yuichiro Mera, Miichi Yamada, Akinobu Lee, Kiyohiro Shikano:
Evaluation on unsupervised speaker adaptation based on sufficient HMM statistics of selected speakers. 1219-1222
Speech Recognition and Understanding: Adaptation
- Lei Jia, Bo Xu:
A novel target-driven MLLR adaptation algorithm with multi-layer structure. 1225-1228 - Frank Wallhoff, Daniel Willett, Gerhard Rigoll:
Scaled likelihood linear regression for hidden Markov model adaptation. 1229-1232 - Tor André Myrvoll, Kuldip K. Paliwal, Torbjørn Svendsen:
Fast adaptation using constrained affine transformations with hierarchical priors. 1233-1236 - Xiaoxing Liu, Baosheng Yuan, Yonghong Yan:
A context adaptation approach for building context dependent models in LVCSR. 1237-1240 - Fabrice Lefèvre, Jean-Luc Gauvain, Lori Lamel:
Improving genericity for task-independent speech recognition. 1241-1244 - Driss Matrouf, Olivier Bellot, Pascal Nocera, Georges Linarès, Jean-François Bonastre:
A posteriori and a priori transformations for speaker adaptation in large vocabulary speech recognition systems. 1245-1248 - Darryl W. Purnell, Elizabeth C. Botha:
Bayesian methods for HMM speech recognition with limited training data. 1249-1252 - Kwok-Man Wong, Brian Kan-Wing Mak:
Rapid speaker adaptation using MLLR and subspace regression classes. 1253-1256 - Néstor Becerra Yoma, Jorge F. Silva:
Speaker adaptation of output probabilities and state duration distributions for speech recognition. 1257-1260 - Jian Wu, Eric Chang:
Cohorts based custom models for rapid speaker and dialect adaptation. 1261-1264 - Marcel Vasilache, Olli Viikki:
Speaker adaptation of quantized parameter HMMs. 1265-1268 - Yu Tsao, Shang-Ming Lee, Fu-Chiang Chou, Lin-Shan Lee:
Segmental eigenvoice for rapid speaker adaptation. 1269-1272 - Narada D. Warakagoda, Magne Hallstein Johnsen:
Speaker adaptation in an ASR system based on nonlinear dynamical systems. 1273-1276
Dialogue Systems: Project Descriptions
- Ricardo de Córdoba, Rubén San Segundo, Juan Manuel Montero, José Colás, Javier Ferreiros, Javier Macías Guarasa, José Manuel Pardo:
An interactive directory assistance service for Spanish with large-vocabulary recognition. 1279-1282 - Yunbiao Xu, Masahiro Araki, Yasuhisa Niimi:
A multilingual-supporting dialog system using a common dialog controller. 1283-1286 - Tomás Nouza, Jan Nouza:
Graphic platform for designing and developing practical voice interaction systems. 1287-1290 - Laurent Besacier, Hervé Blanchon, Yannick Fouquet, Jean-Philippe Guilbaud, Stéphane Helme, Sylviane Mazenot, Daniel Moraru, Dominique Vaufreydaz:
Speech translation for French in the NESPOLE! European project. 1291-1294 - Marianne Hickey, Paul St John Brittan:
Lessons from the development of a conversational interface. 1295-1298 - Julia Hirschberg, Michiel Bacchiani, Donald Hindle, Philip L. Isenhour, Aaron E. Rosenberg, Litza A. Stark, Larry Stead, Steve Whittaker, Gary Zamchick:
SCANMail: browsing and searching speech data by content. 1299-1302 - Wai Kit Lo, Patrick Schone, Helen M. Meng:
Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system. 1303-1306 - Shih-Chieh Chien, Sen-Chia Chang:
Compact word graph in spoken dialogue system. 1307-1310 - Munehiko Sasajima, Takehide Yano, Taishi Shimomori, Tatsuya Uehara:
MINOS-II: a prototype car navigation system with mixed initiative turn taking dialogue. 1311-1314 - Shinya Kiriyama, Keikichi Hirose, Nobuaki Minematsu:
Use of topic knowledge in spoken dialogue information retrieval system for academic documents. 1315-1318 - Kazunori Komatani, Katsuaki Tanaka, Hiroaki Kashima, Tatsuya Kawahara:
Domain-independent spoken dialogue platform using key-phrase spotting based on combined language model. 1319-1322 - Peter J. Durston, Mark Farrell, David Attwater, James Allen, Hong-Kwang Jeff Kuo, Mohamed Afify, Eric Fosler-Lussier, Chin-Hui Lee:
OASIS natural language call steering trial. 1323-1326 - Ivano Azzini, Daniele Falavigna, Roberto Gretter, Giordano Lanzola, Marco Orlandi:
First steps toward an adaptive spoken dialogue system in medical domain. 1327-1330 - Mikio Nakano, Yasuhiro Minami, Stephanie Seneff, Timothy J. Hazen, D. Scott Cyphers, James R. Glass, Joseph Polifroni, Victor Zue:
Mokusei: a telephone-based Japanese conversational system in the weather domain. 1331-1334 - James R. Glass, Eugene Weinstein:
Speechbuilder: facilitating spoken dialogue system development. 1335-1338 - Mazin G. Rahim, Giuseppe Di Fabbrizio, Candace A. Kamm, Marilyn A. Walker, A. Pokrovsky, P. Ruscitti, Esther Levin, Sungbok Lee, Ann K. Syrdal, K. Schlosser:
Voice-IF: a mixed-initiative spoken dialogue system for AT&T conference services. 1339-1342 - Wolfgang Wahlster, Norbert Reithinger, Anselm Blocher:
SmartKom: multimodal communication with a life-like character. 1547-1550 - Helen M. Meng, Shuk Fong Chan, Yee Fong Wong, Cheong Chat Chan, Yiu Wing Wong, Tien Ying Fung, Wai Ching Tsui, Ke Chen, Lan Wang, Ting-Yao Wu, Xiaolong Li, Tan Lee, Wing Nin Choi, P. C. Ching, Huisheng Chi:
ISIS: a learning system with combined interaction and delegation dialogs. 1551-1554 - Ye-Yi Wang:
Robust language understanding in MiPad. 1555-1558 - Oliver Lemon, Anne Bracy, Alexander Gruenstein, Stanley Peters:
The WITAS multi-modal dialogue system I. 1559-1562 - Stefanie Shriver, Roni Rosenfeld, Xiaojin Zhu, Arthur R. Toth, Alexander I. Rudnicky, Markus D. Flückiger:
Universalizing speech: notes from the USI project. 1563-1566
Dialogue Systems: Resources
- Elizabeth Shriberg, Andreas Stolcke, Don Baron:
Observations on overlap: findings and implications for automatic processing of multi-party conversation. 1359-1362 - Jennifer L. Beckham, Giuseppe Di Fabbrizio, Nils Klarlund:
Towards SMIL as a foundation for multimodal, multimedia applications. 1363-1366 - Michael Kipp:
ANVIL - a generic annotation tool for multimodal dialogue. 1367-1370 - Marilyn A. Walker, John S. Aberdeen, Julie E. Boland, Elizabeth Owen Bratt, John S. Garofolo, Lynette Hirschman, Audrey N. Le, Sungbok Lee, Shrikanth S. Narayanan, Kishore Papineni, Bryan L. Pellom, Joseph Polifroni, Alexandros Potamianos, P. Prabhu, Alexander I. Rudnicky, Gregory A. Sanders, Stephanie Seneff, David Stallard, Steve Whittaker:
DARPA communicator dialog travel planning systems: the june 2000 data collection. 1371-1374
Speaker Recognition: Features and Transforms
- Chao Huang, Tao Chen, Stan Z. Li, Eric Chang, Jian-Lai Zhou:
Analysis of speaker variability. 1377-1380 - Masafumi Nishida, Yasuo Ariki:
Speaker recognition by separating phonetic space and speaker space. 1381-1384 - Nick J.-C. Wang, Wei-Ho Tsai, Lin-Shan Lee:
Eigen-MLLR coefficients as new feature parameters for speaker identification. 1385-1388
Speech Perception: Prosody
- Johanneke Caspers:
Testing the perceptual relevance of syntactic completion and melodic configuration for turn-taking in Dutch. 1395-1398 - Toni C. M. Rietveld, Patricia Vermillion:
Cues for perceived pitch register. 1399-1402 - Aoju Chen, Toni C. M. Rietveld, Carlos Gussenhoven:
Language-specific effects of pitch range on the perception of universal intonational meaning. 1403-1406 - Esther Janse:
Comparing word-level intelligibility after linear vs. non-linear time-compression. 1407-1410
Speech Production: Miscellaneous
- Florien J. Koopmans-van Beinum, Chris J. Clement, Ineke Van den Dikkenberg-Pot:
AMSTIVOC (AMsterdam system for transcription of infant VOCalizations) applied to utterances of deaf and normally hearing infants. 1471-1474 - Olov Engwall:
Using linguopalatal contact patterns to tune a 3D tongue model. 1475-1478 - Tokihiko Kaburagi, Masaaki Honda:
Electromagnetic articulograph (EMA) based on a nonparametric representation of the magnetic field. 1479-1482 - António J. S. Teixeira, Francisco A. C. Vaz:
European Portuguese nasal vowels: an EMMA study. 1483-1486 - Susanne Fuchs, Pascal Perrier, Christine Mooshammer:
The role of the palate in tongue kinematics: an experimental assessment in v sequences from EPG and EMMA data. 1487-1490 - Matthew P. Aylett:
Modelling care of articulation with HMMs is dangerous. 1491-1494 - Peter J. Murphy:
Spectral tilt as a perturbation-free measurement of noise levels in voice signals. 1495-1498 - Jean Schoentgen:
Estimation of the modulation frequency and modulation depth of the fundamental frequency owing to vocal micro-tremor of the voice source signal. 1499-1502 - Ralph van Dinther, Raymond N. J. Veldhuis, Armin Kohlrausch:
The perceptual relevance of glottal-pulse parameter variations. 1503-1506 - Marcel Ogner, Zdravko Kacic:
Speaker normalization based on test to reference speaker mapping. 1507-1510 - Michel Pitermann, Kevin G. Munhall:
A face-to-muscle inversion of a biomechanical face model for audiovisual and motor control research. 1511-1514 - Allan J. South:
A model of vowel production under positive pressure breathing. 1515-1518 - Adam Podhorski, Marek Czepulonis:
Helium speech normalisation by codebook mapping. 1519-1523
Existing and Future Corpora: Next Generation Speech Resources (Special Session)
- Nick Campbell:
Building a corpus of natural speech - and tools for the processing of expressive speech. 1525-1528 - Daan Broeder, Hennie Brugman, Peter Wittenburg:
Aspects of modern multi-modal/multi-media corpora exploitation environments. 1529-1532 - Tony Bigbee, Dan Loehr, Lisa Harper:
Emerging requirements for multi-modal annotation and analysis tools. 1533-1536 - Toomas Altosaar, Matti Karjalainen, Martti Vainio:
Three-dimensional modelling of speech corpora: added value through visualisation. 1537-1540 - Ulrich Türk:
The technical processing in SmartKom data collection: a case study. 1541-1544
Signal Analysis: Speech Processing in Car Environments
- Marco Matassoni, Maurizio Omologo, Piergiorgio Svaizer:
Use of real and contaminated speech for training of a hands-free in-car speech recognizer. 1569-1572 - Jay P. Plucienkowski, John H. L. Hansen, Pongtep Angkititrakul:
Combined front-end signal processing for in-vehicle speech systems. 1573-1576 - Sid-Ahmed Selouani, Hesham Tolba, Douglas D. O'Shaughnessy:
Robust automatic speech recognition in low-SNR car environments by the application of a connectionist subspace-based approach to the mel-based cepstral coefficients. 1577-1580 - Andreas Korthauer:
Recognition of spelled city names in automotive environments. 1581-1584 - Eduardo Lleida, Enrique Masgrau, Alfonso Ortega:
Acoustic echo control and noise reduction for cabin car communication. 1585-1588
Speech Recognition and Understanding: Finite State Transducers for ASR
- Timothy J. Hazen, I. Lee Hetherington, Alex Park:
FST-based recognition techniques for multi-lingual and multi-domain spontaneous speech. 1591-1594 - Gilles Boulianne, Pierre Ouellet, Pierre Dumouchel:
A transducer approach to word graph generation. 1595-1598 - I. Lee Hetherington:
An efficient implementation of phonological rules using finite-state transducers. 1599-1602 - Mehryar Mohri, Michael Riley:
A weight pushing algorithm for large vocabulary speech recognition. 1603-1606 - Alexander Seward:
Transducer optimizations for tight-coupled decoding. 1607-1612
Resources, Assessment and Standards: Assessment Tools & Methodology
- Sander J. van Wijngaarden, Paula M. T. Smeele, Herman J. M. Steeneken:
A new method for testing communication efficiency and user acceptability of speech communication channels. 1675-1678 - Catia Cucchiarini, Diana Binnenpoorte, Simo M. A. Goddijn:
Phonetic transcriptions in the Spoken Dutch Corpus: how to combine efficiency and good transcription quality. 1679-1682 - Ben Hutchinson:
A functional approach to speech recognition evaluation. 1683-1686 - Sebastian Möller, Jens Berger:
Instrumental derivation of equipment impairment factors for describing telephone speech codec degradations. 1687-1690 - Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano:
Julius - an open source real-time large vocabulary recognition engine. 1691-1694 - Doroteo Torre Toledano, Luis A. Hernández Gómez:
Local refinement of phonetic boundaries: a general framework and its application using different transition models. 1695-1698 - Thorsten Ludwig, Ulrich Heute:
Detection of digital transmission systems for voice quality measurements. 1699-1702 - Eric Lewis, Mark Tatham:
Automatic segmentation of recorded speech into syllables for speech synthesis. 1703-1706 - João Paulo Ramos Teixeira, Diamantino Freitas, Daniela Braga, Maria João Barros, Vagner Latsch:
Phonetic events from the labeling the European Portuguese database for speech synthesis, FEUP/IPBDB. 1707-1710 - Samir Nefti, Olivier Boëffard:
Acoustical and topological experiments for an HMM-based speech segmentation system. 1711-1714 - Qiru Zhou, Jinsong Zheng, Chin-Hui Lee:
TclBLASR: an automatic speech recognition extension for Tcl. 1715-1718
Existing and Future Corpora: Automated Analysis of Speech Resources (Special Session)
- Judith M. Kessens, Helmer Strik:
Lower WERs do not guarantee better transcriptions. 1721-1724 - Shuangyu Chang, Steven Greenberg, Mirjam Wester:
An elitist approach to articulatory-acoustic feature classification. 1725-1728 - Mirjam Wester, Steven Greenberg, Shuangyu Chang:
A Dutch treatment of an elitist approach to articulatory-acoustic feature classification. 1729-1732
Dialogue Systems: Dialogue Systems and Generation
- Michel Galley, Eric Fosler-Lussier, Alexandros Potamianos:
Hybrid natural language generation for spoken dialogue systems. 1735-1738 - Nicholas J. Cook, Ian D. Benest:
The generation of speech for a search guide. 1739-1742 - Masahiro Araki, Tasuku Ono, Kiyoshi Ueda, Takuya Nishimoto, Yasuhisa Niimi:
An automatic dialogue system generator from the internet information contents. 1743-1746 - Monica Rogati, Marilyn A. Walker, Owen Rambow:
Training a sentence planner for spoken dialog: the impact of syntactic and planning features. 1747-1750
Speaker Recognition: Alternative Trends in Verification
- Carlos E. Vivaracho, Javier Ortega-Garcia, Luis Alonso, Q. Isaac Moro:
A comparative study of MLP-based artificial neural networks in text-independent speaker verification against GMM-based systems. 1753-1756 - Shai Fine, Jirí Navrátil, Ramesh A. Gopinath:
Enhancing GMM scores using SVM "hints". 1757-1760 - Jamal Kharroubi, Dijana Petrovska-Delacrétaz, Gérard Chollet:
Combining GMM's with support vector machines for text-independent speaker verification. 1761-1764 - Yong Gu, Trevor Thomas:
A text-independent speaker verification system using support vector machines classifier. 1765-1768 - Robert P. Stapert, John S. D. Mason:
A segmental mixture model for speaker recognition. 2509-2512 - Raphaël Blouet, Frédéric Bimbot:
Tree based score computation for speaker verification. 2513-2516 - Walter D. Andrews, Mary A. Kohler, Joseph P. Campbell:
Phonetic speaker recognition. 2517-2520 - George R. Doddington:
Speaker recognition based on idiolectal differences between speakers. 2521-2524
Speech Recognition and Understanding: Speech Understanding
- Chiori Hori, Sadaoki Furui:
Advances in automatic speech summarization. 1771-1774 - Kadri Hacioglu, Wayne H. Ward:
A word graph interface for a flexible concept based speech understanding framework. 1775-1778 - Sylvia Knight, Genevieve Gorrell, Manny Rayner, David Milward, Rob Koeling, Ian Lewin:
Comparing grammar-based and robust approaches to speech understanding: a case study. 1779-1782 - Sherif M. Abdou, Michael S. Scordilis:
Integrating multiple knowledge sources for improved speech understanding. 1783-1786
Speech Recognition and Understanding: Algorithms and Architectures
- Zeev Litichever, Dan Chazan:
Classification of transition sounds with application to automatic speech recognition. 1789-1792 - Avi Faizakov, Arnon Cohen, Tzur Vaich:
Gaussian subtraction (GS) algorithms for word spotting in continuous speech. 1793-1796 - Michael L. Shire:
Relating frame accuracy with word error in hybrid ANN-HMM ASR. 1797-1800 - Guoliang Zhang, Fang Zheng, Wenhu Wu:
A two-layer lexical tree based beam search in continuous Chinese speech recognition. 1801-1804 - Yoshiaki Itoh, Kazuyo Tanaka:
Automatic labeling and digesting for lecture speech utilizing repeated speech by shift CDP. 1805-1808 - Takaaki Hori, Yoshiaki Noda, Shoichi Matsunaga:
Improved phoneme-history-dependent search for large-vocabulary continuous-speech recognition. 1809-1812 - Josef Psutka, Ludek Müller, Josef V. Psutka:
Comparison of MFCC and PLP parameterizations in the speaker independent continuous speech recognition task. 1813-1816 - Ernest Pusateri, Jean-Manuel Van Thong:
N-best list generation using word and phoneme recognition fusion. 1817-1820 - Dong-Hoon Ahn, Minhwa Chung:
A one pass semi-dynamic network decoder based on language model network. 1821-1824 - Wolfgang Macherey, Daniel Keysers, Jörg Dahmen, Hermann Ney:
Improving automatic speech recognition using tangent distance. 1825-1828 - Ananlada Chotimongkol, Alexander I. Rudnicky:
N-best speech hypotheses reordering using linear regression. 1829-1832 - Sabine Deligne, Ellen Eide, Ramesh A. Gopinath, Dimitri Kanevsky, Benoît Maison, Peder A. Olsen, Harry Printz, Jan Sedivý:
Low-resource hidden Markov model speech recognition. 1833-1836 - Hans-Günter Hirsch, K. Hellwig, Stefan Dobler:
Speech recognition at multiple sampling rates. 1837-1840 - Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama:
Support vector machine with dynamic time-alignment kernel for speech recognition. 1841-1844 - Naveen Srinivasamurthy, Antonio Ortega, Shrikanth S. Narayanan:
Efficient scalable speech compression for scalable speech recognition. 1845-1848
Signal Analysis: Speech Enhancement and Noise Processing
- Jan Stadermann, V. Stahl, G. Rose:
Voice activity detection in noisy environments. 1851-1854 - Hamid Sheikhzadeh, Hamid Reza Abutalebi:
An improved wavelet-based speech enhancement system. 1855-1858 - Tenkasi Ramabadran, Jeff Meunier, Mark A. Jasiuk, Bill Kushner:
Enhancing distributed speech recognition with back- end speech reconstruction. 1859-1862 - Jiri Tihelka, Pavel Sovka:
Implementation effective one-channel noise reduction system. 1863-1866 - Hyoung-Gook Kim, Klaus Obermayer, Mathias Bode, Dietmar Ruwisch:
Efficient speech enhancement by diffusive gain factors (DGF). 1867-1870 - Gaël Mahé, André Gilloire:
Correction of the voice timbre distortions on telephone network. 1871-1874 - Yunjung Lee, Joohun Lee, Ki Yong Lee, Katsuhiko Shirai:
Speech enhancement based on IMM with NPHMM. 1875-1878 - Masakiyo Fujimoto, Yasuo Ariki:
Speech recognition under musical environments using Kalman filter and iterative MLLR adaptation. 1879-1882 - Rolf Vetter, Philippe Renevey, Jens Krauss:
Dual channel speech enhancement using coherence function and MDL-based subspace approach in bark domain. 1883-1886 - Philippe Renevey, Andrzej Drygajlo:
Entropy based voice activity detection in very noisy conditions. 1887-1890 - Stefan Karnebäck:
Discrimination between speech and music based on a low frequency modulation feature. 1891-1894 - Yiou-Wen Cheng, Lin-Shan Lee:
Credibility proof for speech content and speaker verification by fragile watermarking with consecutive frame-based processing. 1895-1898 - Ilyas Potamitis, Nikos Fakotakis, George K. Kokkinakis:
Map estimation for on-line noise compensation of time trajectories of spectral coefficients. 1899-1902 - Hagai Attias, Li Deng, Alex Acero, John C. Platt:
A new method for speech denoising and robust speech recognition using probabilistic models for clean speech and for noise. 1903-1906
Speech Synthesis: Grapheme-to-Phoneme Conversion
- Anne K. Kienappel, Reinhard Kneser:
Designing very compact decision trees for grapheme-to-phoneme transcription. 1911-1914 - Franco Mana, Paolo Massimino, Alberto Pacchiotti:
Using machine learning techniques for grapheme to phoneme transcription. 1915-1918 - Ariadna Font Llitjós, Alan W. Black:
Knowledge of language origin improves pronunciation accuracy of proper names. 1919-1922 - Philippe Boula de Mareüil, Franck Floricic:
On the pronunciation of acronyms in French and in Italian. 1923-1926
Signal Analysis: Speech Enhancement
- Vladimir I. Shin, Doh-Suk Kim, Moo Young Kim, Jeongsu Kim:
Enhancement of noisy speech by using improved global soft decision. 1929-1932 - Israel Cohen:
Enhancement of speech using Bark-scaled wavelet packet decomposition. 1933-1936 - Mohammed Bahoura, Jean Rouat:
A new approach for wavelet speech enhancement. 1937-1940 - Sukhyun Yoon, Chang D. Yoo:
Speech/noise-dominant decision for speech enhancement. 1941-1944
Speech Recognition and Understanding: Discriminative Training
- Fan Wang, Fang Zheng, Wenhu Wu:
An MCE based classification tree using hierarchical feature-weighting in speech recognition. 1947-1950 - Jian-Lai Zhou, Eric Chang, Chao Huang:
Selective MCE training strategy in Mandarin speech recognition. 1951-1954 - Chung-Hsien Wu, Gwo-Lang Yan:
Discriminative disfluency modeling for spontaneous speech recognition. 1955-1958 - Jeih-Weih Hung, Hsin-Min Wang, Lin-Shan Lee:
Comparative analysis for data-driven temporal filters obtained via principal component analysis (PCA) and linear discriminant analysis (LDA) in speech recognition. 1959-1962
Speech Coding: Advances in Speech Coding
- Ari Heikkinen, Vesa T. Ruoppila, Samuli Pietilä:
Coding method for successive pitch periods. 1965-1968 - Jani Nurminen, Ari Heikkinen, Jukka Saarinen:
Objective evaluation of methods for quantization of variable-dimension spectral vectors in WI speech coding. 1969-1972 - Harald Pobloth, W. Bastiaan Kleijn:
Squared error as a measure of phase distortion. 1973-1976 - Marcos Faúndez-Zanuy:
Non-linear predictive vector quantization of speech. 1977-1980 - Nilantha Katugampala, Ahmet M. Kondoz:
A variable rate hybrid coder based on a synchronized harmonic excitation. 1981-1984 - Meau Shin Ho, Derek J. Molyneux, Barry M. G. Cheetham:
A hybrid sub-band sinusoidal coding scheme. 1985-1988 - Jason Lukasiak, Ian S. Burnett, Christian H. Ritz:
Low rate speech coding incorporating simultaneously masked spectrally weighted linear prediction. 1989-1992 - Hossein Najaf-Zadeh, Peter Kabal:
Narrowband perceptual audio coding: enhancements for speech. 1993-1996 - Bruno Bessette, Roch Lefebvre, Redwan Salami, Milan Jelinek, Janne Vainio, J. Rotola-Pukkila, Hannu Mikkola, Kari Järvinen:
Techniques for high-quality ACELP coding of wideband speech. 1997-2000 - Sílvia Pujalte, Asunción Moreno:
Wideband ACELP at 16 kb/s with multi-band excitation. 2001-2004 - Seung Won Lee, Keun-Sung Bae:
Wideband speech coding algorithm with application of discrete wavelet transform to upper band. 2005-2008 - S. Satheesh, T. V. Sreenivas:
A switched DPCM/subband coder for pre-echo reduction. 2009-2012 - Çagri Özgenc Etemoglu, Vladimir Cuperman:
A generalized multistage VQ approach for spectral magnitude quantization. 2013-2016 - Sung-Kyo Jung, Young-Cheol Park, Sung-Wan Yoon, Kyung-Tae Kim, Dae Hee Youn:
Efficient implementation of ITU-T G.723.1 speech coder for multichannel voice transmission and storage. 2017-2020
Resources, Assessment and Standards: Corpora
- John H. L. Hansen, Pongtep Angkititrakul, Jay P. Plucienkowski, Stephen Gallant, Umit H. Yapanel, Bryan L. Pellom, Wayne H. Ward, Ronald A. Cole:
"CU-move": analysis & corpus development for interactive in-vehicle speech systems. 2023-2026 - Nobuo Kawaguchi, Shigeki Matsubara, Kazuya Takeda, Fumitada Itakura:
Multimedia data collection of in-car speech communication. 2027-2030 - Peter A. Heeman, David Cole, Andrew Cronk:
The U.S. SpeechDat-Car data collection. 2031-2034 - Géza Németh, Csaba Zainkó:
Word unit based multilingual comparative analysis of text corpora. 2035-2038 - Gerhard Backfried, Robert Hecht, Sabine Loots, Norbert Pfannerer, Jürgen Riedler, Christian Schiefer:
Creating a European English broadcast news transcription corpus and system. 2039-2042 - Susanne Burger, Laurent Besacier, Paolo Coletti, Florian Metze, Céline Morel:
The NESPOLE! VoIP dialogue database. 2043-2046 - Jindrich Matousek, Josef Psutka, Jiri Kruta:
Design of speech corpus for text-to-speech synthesis. 2047-2050 - R. J. J. H. van Son, Diana Binnenpoorte, Henk van den Heuvel, Louis C. W. Pols:
The IFA corpus: a phonemically segmented Dutch "open source" speech database. 2051-2054 - Philippa H. Louw, Justus C. Roux, Elizabeth C. Botha:
African speech technology (AST) telephone speech databases: corpus design and contents. 2055-2058 - Henk van den Heuvel, Jérôme Boudy, Zsolt Bakcsi, Jan Cernocký, Valery Galunov, Julia Kochanina, Wojciech Majewski, Petr Pollák, Milan Rusko, Jerzy Sadowski, Piotr Staroniewicz, Herbert S. Tropf:
SpeechDat-E: five Eastern European speech databases for voice-operated teleservices completed. 2059-2062 - Dafydd Gibbon, Thorsten Trippel, Serge Sharoff:
Concordancing for parallel spoken language corpora. 2063-2066 - Josef Psutka, Vlasta Radová, Ludek Müller, Jindrich Matousek, Pavel Ircing, David Graff:
Large broadcast news and read speech corpora of spoken Czech. 2067-2070 - Serge A. Yablonsky:
Development of Russian lexical databases, corpora and supporting tools for speech products. 2071-2074 - Stavroula-Evita Fotinea, George Tambouratzis, George Carayannis:
Constructing a segment database for Greek time domain speech synthesis. 2075-2078
Resources, Assessment and Standards: Assessment Methodology
- Kate S. Hone, Robert Graham:
Subjective assessment of speech-system interface usability. 2083-2086 - Min Chu, Hu Peng:
An objective measure for estimating MOS of synthesized speech. 2087-2090 - Helmer Strik, Catia Cucchiarini, Judith M. Kessens:
Comparing the performance of two CSRs: how to determine the significance level of the differences. 2091-2094 - Ryuta Terashima, Hiroyuki Hoshino, Toshihiro Wakita:
Prediction of low recognition rate words for isolated word recognition system. 2095-2098 - Robert Batusek:
An objective measure for assessment of the concatenative TTS segment inventories. 2099-2102
Speech Recognition and Understanding: Confidence Measures
- Rong Zhang, Alexander I. Rudnicky:
Word level confidence annotation using combinations of features. 2105-2108 - Pedro J. Moreno, Beth Logan, Bhiksha Raj:
A boosting approach for confidence scoring. 2109-2112 - Delphine Charlet, Guy Mercier, Denis Jouvet:
On combining confidence measures for improved rejection of incorrect data. 2113-2116 - David D. Palmer, Mari Ostendorf:
Improved word confidence estimation using long range features. 2117-2120 - Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Dan Bohus, Alexander I. Rudnicky:
Is this conversation on track? 2121
Speech Recognition and Understanding: Language Modelling
- Ryuichi Nisimura, Kumiko Komatsu, Yuka Kuroda, Kentaro Nagatomo, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Automatic n-gram language model creation from web resources. 2127-2130 - Diamantino Caseiro, Isabel Trancoso:
On integrating the lexicon with the language model. 2131-2134 - Amparo Varona, Inés Torres:
Back-off smoothing evaluation over syntactic language models. 2135-2138 - Genqing Wu, Fang Zheng, Ling Jin, Wenhu Wu:
An online incremental language model adaptation method. 2139-2142 - Christer Samuelsson, James Hieronymus:
Using boosting and POS word graph tagging to improve speech recognition. 2143-2146
Dialogue Systems: Techniques and Strategies
- Pengju Yan, Fang Zheng, Mingxing Xu:
Robust parsing in spoken dialogue systems. 2149-2152 - Yinfei Huang, Fang Zheng, Yi Su, Fang Li, Wenhu Wu:
A theme structure method for the ellipsis resolution. 2153-2156 - Martin Haase, Werner Kriechbaum, Gregor Möhler, Gerhard Stenzel:
Deriving document structure from prosodic cues. 2157-2160 - Yi Su, Fang Zheng, Yinfei Huang:
Design of a semantic parser with support to ellipsis resolution in a Chinese spoken language dialogue system. 2161-2164 - Rubén San Segundo, Juan Manuel Montero, José Colás, Juana M. Gutiérrez, J. M. Ramos, José Manuel Pardo:
Methodology for dialogue design in telephone-based spoken dialogue systems: a Spanish train information system. 2165-2168 - Bo Zhang, Qingsheng Cai, Jianfeng Mao, Eric Chang, Baining Guo:
Spoken dialogue management as planning and acting under uncertainty. 2169-2172 - Yosuke Matsusaka, Shinya Fujie, Tetsunori Kobayashi:
Modeling of conversational strategy for the robot participating in the group conversation. 2173-2176 - Jacques M. B. Terken, Saskia te Riele:
Supporting the construction of a user model in speech-only interfaces by adding multi-modality. 2177-2180 - Shu-Chuan Tseng:
A word- and turn-oriented approach to exploring the structure of Mandarin dialogues. 2181-2184 - Yasuhisa Niimi, Tomoki Oku, Takuya Nishimoto, Masahiro Araki:
A rule based approach to extraction of topics and dialog acts in a spoken dialog system. 2185-2188 - Markku Turunen, Jaakko Hakulinen:
Agent-based error handling in spoken dialogue systems. 2189-2192 - Lars Degerstedt, Arne Jönsson:
Iterative implementation of dialogue system modules. 2193-2196 - Daniela Oppermann, Florian Schiel, Silke Steininger, Nicole Beringer:
Off-talk - a problem for human-machine-interaction? 2197-2200 - Jana Schwarz, Václav Matousek:
Automatic analysis of real dialogues and generating of training corpora. 2201-2204 - Klaus Macherey, Franz Josef Och, Hermann Ney:
Natural language understanding using statistical machine translation. 2205-2208 - Jianping Zhang, Wayne H. Ward, Bryan L. Pellom, Xiuyang Yu, Kadri Hacioglu:
Improvements in audio processing and language modeling in the CU communicator. 2209-2212 - Augustine Tsai, Andrew N. Pargellis, Chin-Hui Lee, Joseph P. Olive:
Dialogue session management using VoiceXML. 2213-2216 - Egbert Ammicht, Alexandros Potamianos, Eric Fosler-Lussier:
Ambiguity representation and resolution in spoken dialogue systems. 2217-2220 - Cosmin Popovici, Marco Andorno, Pietro Laface, Luciano Fissore, Mario Nigra, Claudio Vair:
Learning of user formulations for business listings in automatic directory assistance. 2325-2328 - D. Louloudis, Anastasios Tsopanoglou, Nikos Fakotakis, George K. Kokkinakis:
Mathematical modeling of spoken human - machine dialogues including erroneous confirmations. 2329-2332 - Ian Lewin:
Limited enquiry negotiation dialogues. 2333-2336 - Stephen Cox, Ben Shahshahani:
A comparison of some different techniques for vector based call-routing. 2337-2340 - Georg Niklfeld, Robert Finan, Michael Pucher:
Architecture for adaptive multimodal dialog systems based on VoiceXML. 2341-2344
Speech Synthesis: Miscellaneous
- Minoru Tsuzaki:
Feature extraction by auditory modeling for unit selection in concatenative speech synthesis. 2223-2226 - Minkyu Lee:
Perceptual cost functions for unit searching in large corpus-based text-to-speech. 2227-2230 - Sanghun Kim, Youngjik Lee, Keikichi Hirose:
Pruning of redundant synthesis instances based on weighted vector quantization. 2231-2234 - Susan Fitt:
Using real words for recording diphones. 2235-2238 - John Dines, Sridha Sridharan, Miles Moody:
Application of the trended hidden Markov model to speech synthesis. 2239-2242 - Stefano Sandri, Enrico Zovato:
Two features to check phonetic transcriptions in text to speech systems. 2243-2246 - Gerasimos Xydas, Georgios Kouroupetroglou:
Text-to-speech scripting interface for appropriate vocalisation of e-texts. 2247-2250 - Matej Rojc, Zdravko Kacic:
Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems. 2251-2254 - Keikichi Hirose, Masaya Eto, Nobuaki Minematsu, Atsuhiro Sakurai:
Corpus-based synthesis of fundamental frequency contours based on a generation process model. 2255-2258 - Zbynek Tychtl, Josef Psutka:
Corpus-based database of residual excitations used for speech reconstruction from MFCCs. 2259-2262 - Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura:
Mixed excitation for HMM-based speech synthesis. 2263-2266 - Takahiro Ohtsuka, Hideki Kasuya:
Aperiodicity control in ARX-based speech analysis-synthesis method. 2267-2270 - Matti Karjalainen, Tuomas Paatero:
Generalized source-filter structures for speech synthesis. 2271-2274 - Mikolaj Wypych:
The speech synthesis environment and parametric modeling of coarticulation. 2275-2278
Integration of Phonetic Knowledge in Speech Technology: Experiments and Experiences (Special Session)
- Julie Carson-Berndsen, Michael Walsh:
Defining constraints for multilinear speech processing. 2281-2284 - Anton Batliner, Bernd Möbius, Gregor Möhler, Antje Schweitzer, Elmar Nöth:
Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground. 2285-2288 - Heidi Christensen, Børge Lindberg, Ove Andersen:
Introducing phonetically motivated information into ASR. 2289-2292 - Guillaume Gravier, François Yvon, Bruno Jacob, Frédéric Bimbot:
Integrating contextual phonological rules in a large vocabulary decoder. 2293-2296 - Moisés Pastor-i-Gadea, Francisco Casacuberta:
Automatic learning of finite state automata for pronunciation modeling. 2297-2300
Speech Coding: Wideband Speech Coding
- J. Rotola-Pukkila, Janne Vainio, Hannu Mikkola, Kari Järvinen, Bruno Bessette, Roch Lefebvre, Redwan Salami, Milan Jelinek:
AMR wideband codec - leap in mobile communication voice quality. 2303-2306 - Maria Farrugia, Ahmet M. Kondoz:
Combined speech and audio coding with bit rate and bandwidth scalability. 2307-2310 - Márk Fék, Annamária R. Várkonyi-Kóczy, Jean-Marc Boucher:
Joint speech and audio coding combining sinusoidal modeling and wavelet packets. 2311-2314 - Christian H. Ritz, Ian S. Burnett:
Temporal decomposition: a promising approach to low rate wideband speech compression. 2315-2318 - Stéphane Ragot, Hassan Lahdili, Roch Lefebvre:
Wideband LSF quantization by generalized Voronoi codes. 2319-2322
Speech Recognition and Understanding: Robust ASR
- Luca Rigazio, Patrick Nguyen, David Kryze, Jean-Claude Junqua:
Separating speaker and environment variabilities for improved recognition in non-stationary conditions. 2347-2350 - Richard C. Rose, Hong Kook Kim, Donald Hindle:
Robust speech recognition techniques applied to a speech in noise task. 2351-2355 - Mohamed Afify, Olivier Siohan, Chin-Hui Lee:
Minimax classification with parametric neighborhoods for noisy speech recognition. 2355-2358 - Mukund Padmanabhan, Satya Dharanipragada:
Maximum likelihood non-linear transformation for environment adaptation in speech recognition. 2359-2362 - Jari Juhani Turunen, Damjan Vlaj:
A study of speech coding parameters in speech recognition. 2363-2366
Applications: Miscellaneous Applications
- Carmen García-Mateo, Laura Docío Fernández, Antonio Cardenal López:
Some practical considerations in the deployment of a wireless-communication interactive voice response system. 2369-2372 - Aaron E. Rosenberg, Julia Hirschberg, Michiel Bacchiani, Sarangarajan Parthasarathy, Philip L. Isenhour, Larry Stead:
Caller identification for the SCANMail voicemail browser. 2373-2376 - Konstantinos Koumpis, Steve Renals, Mahesan Niranjan:
Extractive summarization of voicemail using lexical and prosodic feature subset selection. 2377-2380 - Odette Scharenborg, Janienke Sturm, Lou Boves:
Business listings in automatic directory assistance. 2381-2384 - Moisés Pastor-i-Gadea, Alberto Sanchís, Francisco Casacuberta, Enrique Vidal:
Eutrans: a speech-to-speech translator prototype. 2385-2388 - Florian Metze, John W. McDonough, Hagen Soltau:
Speech recognition over netmeeting connections. 2389-2392 - Juan Carlos Díaz Martín, Juan-Luis García Zapata, José Manuel Rodríguez García, José F. Álvarez Salgado, Pablo Espada Bueno, Pedro Gómez Vilda:
DIARCA: a component approach to voice recognition. 2393-2396 - Y. J. Kyung, J. O. Jung, S. M. Sohn, H. J. Chun, S. Y. Moon, M. H. Kim, W. H. Sull:
The mvprotek: m-commerce voice verification system. 2397-2400 - Norman Alm, Mamoru Iwabuchi, Peter N. Andreasen, Kenryu Nakamura, Iain R. Murray:
Real-time multilingual communication by means of prestored conversational units. 2401-2404 - Iain R. Murray, John L. Arnott, Norman Alm, Richard Dye, Gillian Harper:
Writing script-based dialogues for AAC. 2405-2409 - Akemi Iida, Yosuke Sakurada, Nick Campbell, Michiaki Yasumura:
Communication aid for non-vocal people using corpus-based concatenative speech synthesis. 2409-2412 - Noriko Suzuki, Kazuhiko Kakehi, Yugo Takeuchi, Michio Okada:
Social effects on vocal rate with echoic mimicry using prosody-only voice. 2413-2416 - Eric Castelli, Dan Istrate:
Everyday life sounds and speech analysis for a medical telemonitoring system. 2417-2420 - Christoph Draxler, Klaus Bengler, Cristina Olaverri-Monreal:
Speaking while driving - preliminary results on spellings in the German SpeechDat-Car database. 2421-2424
Signal Analysis: Pitch and Speech Analysis
- Dan Chazan, Meir Tzur, Ron Hoory, Gilad Cohen:
Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals. 2427-2430 - Islam Shdaifat, Rolf-Rainer Grigat, Stefan Lütgert:
Viseme recognition using multiple feature matching. 2431-2434 - Annemie Van Hirtum, Daniel Berckmans:
The fundamental frequency of cough by autocorrelation analysis. 2435-2438 - Yuichi Ishimoto, Masashi Unoki, Masato Akagi:
A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency. 2439-2442 - Akira Sasou, Kazuyo Tanaka:
Robust LP analysis using glottal source HMM with application to high-pitched and noise corrupted speech. 2443-2446 - Yong-Soo Choi, Dae Hee Youn:
Fast harmonic estimation using a low resolution pitch for low bit rate harmonic coding. 2447-2450 - Alain de Cheveigné, Hideki Kawahara:
Comparative evaluation of F0 estimation algorithms. 2451-2454 - Carlos Toshinori Ishi, Nobuaki Minematsu, Ryuji Nishide, Keikichi Hirose:
Identification of accent and intonation in sentences for CALL systems. 2455-2458 - Hideki Kawahara, Parham Zolfaghari:
Systematic F0 glitches around nasal-vowel transitions. 2459-2462 - Jacek C. Wojdel, Léon J. M. Rothkrantz:
Using aerial and geometric features in automatic lip-reading. 2463-2466 - Karl Schnell, Arild Lacroix:
Inverse filtering of tube models with frequency dependent tube terminations. 2467-2470 - Kaïs Ouni, Zied Lachiri, Noureddine Ellouze:
Formant estimation using gammachirp filterbank. 2471-2474 - Ilyas Potamitis, Nikos Fakotakis:
Autoregressive time-frequency interpolation in the context of missing data theory for impulsive noise compensation. 2475-2478 - Davor Petrinovic, Vladimir Cuperman:
Analysis of the voiced speech using the generalized Fourier transform with quadratic phase. 2479-2482
Integration of Phonetic Knowledge in Speech Technology: Is Phonetic Knowledge any use? Panel discussion (Special Session)
- Steven Greenberg:
From here to utility - melding phonetic insight with speech technology. 2485-2488
Speech Coding: Speech Transmission Systems
- Sang-Wook Park, Young-Cheol Park, Dae Hee Youn:
Speech quality measure for VoIP using wavelet-based Bark coherence function. 2491-2494 - Sander J. van Wijngaarden, Herman J. M. Steeneken:
A proposed method for measuring language dependency of narrow band voice coders. 2495-2498 - Sung-Wan Yoon, Sung-Kyo Jung, Young-Cheol Park, Dae Hee Youn:
An efficient transcoding algorithm for G.723.1 and G.729A speech coders. 2499-2502 - José L. Pérez-Córdoba, Antonio J. Rubio, Antonio M. Peinado, Ángel de la Torre:
Joint source-channel coding for low bit-rate coding of LSP parameters. 2503-2506
Speech Recognition and Understanding: Rhythm and Timing in ASR
- Britta Wrede, Gernot A. Fink, Gerhard Sagerer:
An investigation of modelling aspects for rate-dependent speech recognition. 2527-2530 - Hiroaki Nanjo, Kazuomi Kato, Tatsuya Kawahara:
Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition. 2531-2534 - Tibor Fábián, Thilo Pfau, Günther Ruske:
Analysis of n-best output hypotheses for fast speech in large vocabulary continuous speech recognition. 2535-2538 - Jérôme Farinas, François Pellegrino:
Automatic rhythm modeling for language identification. 2539-2542
Speech Recognition and Understanding: Confidence Measures and OOV
- Yaxin Zhang, Raymond Lee, Anton Madievski:
Confidence measure (CM) estimation for large vocabulary speaker-independent continuous speech recognition system. 2545-2548 - Yasuhiro Kodama, Takehito Utsuro, Hiromitsu Nishizaki, Seiichi Nakagawa:
Experimental evaluation on confidence of agreement among multiple Japanese LVCSR models. 2549-2552 - Rubén San Segundo, Javier Macías Guarasa, Javier Ferreiros, P. Martín, José Manuel Pardo:
Detection of recognition errors and out of the spelling dictionary names in a spelled name recognizer for Spanish. 2553-2556 - Erhan Mengusoglu, Christophe Ris:
Use of acoustic prior information for confidence measure in ASR applications. 2557-2560 - Luciana Ferrer, Claudio Estienne:
Improving performance of a keyword spotting system by using a new confidence measure. 2561-2564 - Beng Tiong Tan, Yong Gu, Trevor Thomas:
Word level confidence measures using n-best sub-hypotheses likelihood ratio. 2565-2568 - Vaibhava Goel, Shankar Kumar, William Byrne:
Confidence based lattice segmentation and minimum Bayes-risk decoding. 2569-2572 - Hui Jiang, Frank K. Soong, Chin-Hui Lee:
A data selection strategy for utterance verification in continuous speech recognition. 2573-2576 - Jun Ogata, Yasuo Ariki:
Improved speech recognition using iterative decoding based on confidence measures. 2577-2580 - Thomas Schaaf:
Detection of OOV words using generalized word models and a semantic class language model. 2581-2584 - Gies Bouwman, Janienke Sturm, Lou Boves:
Effects of OOV rates on keyphrase rejection schemes. 2585-2588
Signal Analysis: Source Localisation and Beam Forming
- José-Luis Sánchez-Bote, Joaquin Gonzalez-Rodriguez, Danilo Simon-Zorita:
A new auditory based microphone array and objective evaluation using e-RASTI. 2591-2594 - Shoko Araki, Shoji Makino, Ryo Mukai, Hiroshi Saruwatari:
Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers. 2595-2598 - Ryo Mukai, Shoko Araki, Shoji Makino:
Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment. 2599-2602 - Hiroshi Saruwatari, Toshiya Kawamura, Kiyohiro Shikano:
Blind source separation for speech based on fast-convergence algorithm with ICA and beamforming. 2603-2606 - Mitsunori Mizumachi, Satoshi Nakamura:
Noise reduction using paired-microphones for both far-field and near-field sound sources. 2607-2610 - Takanobu Nishiura, Satoshi Nakamura, Kiyohiro Shikano:
Statistical sound source identification in a real acoustic environment for robust speech recognition using a microphone array. 2611-2614 - Agustín Álvarez Marquina, Pedro Gómez Vilda, Rafael Martínez-Olalla, Victor Nieto Lluis, María Victoria Rodellar Biarge:
Speech enhancement and source separation based on binaural negative beamforming. 2615-2618 - Pedro Gómez Vilda, Agustín Álvarez Marquina, Victor Nieto Lluis, María Victoria Rodellar Biarge, Rafael Martínez-Olalla:
Multiple source separation in the frequency domain using negative beamforming. 2619-2622 - Rainer Martin, Alexey Petrovsky, Thomas Lotter:
Planar superdirective microphone arrays for speech acquisition in the car. 2623-2626 - Tomi Kinnunen, Ismo Kärkkäinen, Pasi Fränti:
Is speech data clustered? - statistical analysis of cepstral features. 2627-2630 - George Nokas, Evangelos Dermatas, George K. Kokkinakis:
Maximum likelihood adaptation for distant speech recognition of stationary and moving speakers in reverberant environments. 2631-2634 - Laurent Couvreur, Christophe Ris, Christophe Couvreur:
Model-based blind estimation of reverberation time: application to robust ASR in reverberant environments. 2635-2638 - Yasunori Momomura, Kenji Okada, Takayuki Arai, Noboru Kanedera, Yuji Murahara:
Using the modulation complex wavelet transform for feature extraction in automatic speech recognition. 2639-2642 - Hiroshi G. Okuno, Kazuhiro Nakadai, Tino Lourens, Hiroaki Kitano:
Separating three simultaneous speeches with two microphones by integrating auditory and visual processing. 2643-2646
Signal Analysis: Speech Features and Modelling
- Keiichi Funaki:
A time-varying complex AR speech analysis based on GLS and ELS method. 2649-2652 - Michael Pitz, Sirko Molau, Ralf Schlüter, Hermann Ney:
Vocal tract normalization equals linear transformation in cepstral space. 2653-2656 - An-Tze Yu, Hsiao-Chuan Wang:
An algorithm for finding line spectrum frequencies of added speech signals and its application to robust speech recognition. 2657-2660 - Gilles Gonon, Silvio Montrésor, Marc Baudry:
Improved entropic gain for speech signals analysis/synthesis based on an adaptive time-frequency segmentation scheme. 2661-2664
Speech Recognition and Understanding: Kids, Toys and Emotions
- Helmut Lucke, Masanori Omote:
Automatic word acquisition from continuous speech. 2667-2670 - Qun Li, Martin J. Russell:
Why is automatic recognition of children's speech difficult? 2671-2674 - Sudha Arunachalam, Dylan Gould, Elaine Andersen, Dani Byrd, Shrikanth S. Narayanan:
Politeness and frustration language in child-machine interactions. 2675-2678 - Albino Nogueiras, Asunción Moreno, Antonio Bonafonte, José B. Mariño:
Speech emotion recognition using hidden Markov models. 2679-2682
Applications: Media Applications
- Aseel Ibrahim, Jonas Lundberg, Jenny Johansson:
Speech enhanced remote control for media terminal. 2685-2688 - Rui Amaral, Thibault Langlois, Hugo Meinedo, João Paulo Neto, Nuno Souto, Isabel Trancoso:
The development of a Portuguese version of a media watch system. 2689-2692 - Matthew Roach, John S. D. Mason:
Classification of video genre using audio. 2693-2696 - Yasuo Horiuchi, Akira Ichikawa:
Prosody in finger braille and teletext receiver for finger braille. 2697-2702
Speech Recognition and Understanding: Distributed Speech Recognition
- Alexis Bernard, Abeer Alwan:
Joint channel decoding - Viterbi recognition for wireless applications. 2703-2706 - Antonio M. Peinado, Victoria E. Sánchez, José C. Segura, José L. Pérez-Córdoba:
MMSE-based channel error mitigation for distributed speech recognition. 2707-2710 - Jan Stadermann, Ralf Meermeier, Gerhard Rigoll:
Distributed speech recognition using traditional and hybrid modeling techniques. 2711-2714 - Eve A. Riskin, Constantinos Boulis, Scott Otterson, Mari Ostendorf:
Graceful degradation of speech recognition performance over lossy packet networks. 2715-2718
Speech Recognition and Understanding: Prosody and Cross-Language in ASR
- Tanja Schultz, Alex Waibel:
Experiments on cross-language acoustic modeling. 2721-2724 - Andrej Zgank, Bojan Imperl, Finn Tore Johansen, Zdravko Kacic, Bogomir Horvat:
Crosslingual speech recognition with multilingual acoustic models based on agglomerative and tree-based triphone clustering. 2725-2729 - Mikko Harju, Petri Salmela, Jussi Leppänen, Olli Viikki, Jukka Saarinen:
Comparing parameter tying methods for multilingual acoustic modelling. 2729-2732 - Rathinavelu Chengalvarayan:
Accent-independent universal HMM-based speech recognizer for American, Australian and British English. 2733-2736 - Fang Chen, Jonas Sääv:
The effect of time stress on automatic speech recognition accuracy when using second language. 2737-2740 - Yiu Wing Wong, Eric Chang:
The effect of pitch and lexical tone on different Mandarin speech recognition tasks. 2741-2744 - Georg Stemmer, Elmar Nöth, Heinrich Niemann:
Acoustic modeling of foreign words in a German speech recognition system. 2745-2748 - Kai-Chung Siu, Helen M. Meng:
Semi-automatic grammar induction for bi-directional English-Chinese machine translation. 2749-2752 - Patavee Charnvivit, Somchai Jitapunkul, Visarut Ahkuputra, Ekkarit Maneenoi, Umavasee Thathong, Boonchai Thampanitchawong:
F0 feature extraction by polynomial regression function for monosyllabic Thai tone recognition. 2753-2756 - Ji-Hwan Kim, Philip C. Woodland:
The use of prosody in a combined system for punctuation generation and speech recognition. 2757-2760 - Chao Wang, Stephanie Seneff:
Lexical stress modeling for improved speech recognition of spontaneous telephone speech in the Jupiter domain. 2761-2765 - Todd A. Stephenson, Mathew Magimai-Doss, Hervé Bourlard:
Modeling auxiliary information in Bayesian network based ASR. 2765-2768 - Feili Chen, Eric Chang:
A new dynamic HMM model for speech recognition. 2769-2772 - Wern-Jun Wang, Chun-Jen Lee, Eng-Fong Huang, Sin-Horng Chen:
Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model. 2773-2776 - Andrej Iskra, Bojan Petek, Tom Brøndsted:
Recognition of Slovenian speech: within and cross-language experiments on monophones using the SpeechDat(II). 2777-2780 - Anton Batliner, Jan Buckow, Richard Huber, Volker Warnke, Elmar Nöth, Heinrich Niemann:
Boiling down prosody for the classification of boundaries and accents in German and English. 2781-2784
Education: Education and Training
- Andrzej Drygajlo, Gary Garcia Molina:
JavaSpeakerRecognition - interactive workbench for visualizing speaker recognition concepts on the WWW. 2787-2790 - Takayuki Arai, Nobuyuki Usuki, Yuji Murahara:
Prototype of a vocal-tract model for vowel production designed for education in speech science. 2791-2794 - Martin Cooke, María Luisa García Lecumberri, John A. Maidment:
A tool for automatic feedback on phonemic transcription. 2795-2798 - Eric Chang, Yu Shi, Jian-Lai Zhou, Chao Huang:
Speech lab in a box: a Mandarin speech toolbox to jumpstart speech related research. 2799-2802 - John H. A. L. de Jong, Jared Bernstein:
Relating PhonePass overall scores to the Council of Europe framework level descriptors. 2803-2806 - Klára Vicsi, Peter Roach, Anne-Marie Öster, Zdravko Kacic, Ferenc Csatári, Anna Sfakianaki, R. Veronik, Géza Gordos:
A multilingual, multimodal, speech training system, SPECO. 2807-2810 - Naoki Nakamura, Nobuaki Minematsu, Seiichi Nakagawa:
Instantaneous estimation of accentuation habits for Japanese students to learn English pronunciation. 2811-2814 - Takashi Tanaka, Kazumasa Mori, Satoshi Kobayashi, Seiichi Nakagawa:
Automatic construction of CALL system from TV news program with captions. 2815-2818
Speaker Recognition: Features and Robustness
- Mijail Arcienega, Andrzej Drygajlo:
Pitch-dependent GMMs for text-independent speaker recognition systems. 2821-2825 - Hassan Ezzaidi, Jean Rouat, Douglas D. O'Shaughnessy:
Towards combining pitch and MFCC for speaker recognition systems. 2825-2828 - Yu-Jin Kim, Hea-Kyoung Jung, Jae-Ho Chung:
Formant-broadened CMS using peak-picking in LOG spectrum. 2829-2832 - Daniel J. Mashao, N. Tinyiko Baloyi:
Improvements in the speaker identification rate using feature-sets. 2833-2836 - Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura:
Minimum classification error training for speaker identification using Gaussian mixture models based on multi-space probability distribution. 2837-2840 - Yadong Wu, Zhizhu Li:
Speaker recognition based on feature space trace. 2841-2844 - Néstor Becerra Yoma, Miguel Villar Fernandez:
Additive and convolutional noise canceling in speaker verification using a stochastic weighted Viterbi algorithm. 2845-2848 - Kenichi Yoshida, Kazuyuki Takagi, Kazuhiko Ozeki:
A multi-SNR subband model for speaker identification under noisy environments. 2849-2852