INTERSPEECH 2007: Antwerp, Belgium
- 8th Annual Conference of the International Speech Communication Association, INTERSPEECH 2007, Antwerp, Belgium, August 27-31, 2007. ISCA 2007
Keynotes 1-4
- Victor Zue:
On organic interfaces. 1-8
- Sophie K. Scott:
The neural basis of speech perception - a view from functional imaging. 9-13
- Alex Waibel, Keni Bernardin, Matthias Wölfel:
Computer-supported human-human multilingual communication. 14-21
- Pierre-Yves Oudeyer:
Self-organization in the evolution of shared systems of speech sounds: a computational study. 22-29
Discriminative and Large Margin Techniques in Acoustic Modeling
- Jinyu Li, Chin-Hui Lee:
Soft margin feature extraction for automatic speech recognition. 30-33
- Yan Yin, Hui Jiang:
A fast optimization method for large margin estimation of HMMs based on second order cone programming. 34-37
- Hao-Zheng Li, Douglas D. O'Shaughnessy:
Frame margin probability discriminative training algorithm for noisy speech recognition. 38-41
- Fabio Valente, Jithendra Vepa, Christian Plahl, Christian Gollan, Hynek Hermansky, Ralf Schlüter:
Hierarchical neural networks feature extraction for LVCSR system. 42-45
- Peder A. Olsen, John R. Hershey:
Bhattacharyya error and divergence using variational importance sampling. 46-49
- Tingyao Wu, Jacques Duchateau, Dirk Van Compernolle:
Phoneme dependent frame selection preference. 50-53
Speech Production I, II
- Xinhui Zhou, Carol Y. Espy-Wilson, Mark Tiede, Suzanne Boyce:
An articulatory and acoustic study of "retroflex" and "bunched" American English rhotic sound based on MRI. 54-57
- Paula Martins, Inês Carbone, Augusto Silva, António J. S. Teixeira:
An MRI study of European Portuguese nasals. 58-61
- Sayoko Takano, Hiroki Matsuzaki, Kunitoshi Motoki:
A four-cube FEM model of the extrinsic and intrinsic tongue muscles to simulate the production of vowel /i/. 62-65
- Juan F. Torres, Elliot Moore:
Performance evaluation of glottal quality measures from the perspective of vocal tract filter consistency. 66-69
- Veena D. Singampalli, Philip J. B. Jackson:
Statistical identification of critical, dependent and redundant articulators. 70-73
- Chao Qin, Miguel Á. Carreira-Perpiñán:
An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping. 74-77
Phonetic Segmentation and Classification I, II
- Peter Karsmakers, Kristiaan Pelckmans, Johan A. K. Suykens, Hugo Van hamme:
Fixed-size kernel logistic regression for phoneme classification. 78-81
- Seung Seop Park, Jong Won Shin, Jong Kyu Kim, Nam Soo Kim:
A multiple-model based framework for automatic speech segmentation. 82-85
- Aren Jansen, Partha Niyogi:
Semi-supervised learning of speech sounds. 86-89
- Abhinav Parate, Ashish Verma, Jayanta Basak:
Evaluation of syllable stress using single class classifier. 90-93
- Mohammad Nurul Huda, Muhammad Ghulam, Junsei Horikawa, Tsuneo Nitta:
Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks. 94-97
- Jean-Philippe Goldman, Mathieu Avanzi, Anne-Catherine Simon, Anne Lacheret, Antoine Auchlin:
A methodology for the automatic detection of perceived prominent syllables in spoken French. 98-101
Discourse, Dialog and Conversation
- Hiroki Mori, Hideki Kasuya:
Voice source and vocal tract variations as cues to emotional states perceived from expressive conversational speech. 102-105
- Fan Yang, Peter A. Heeman:
Exploring initiative strategies using computer simulation. 106-109
- Chiu-yu Tseng, Zhao-yu Su:
From one base form to multiple output styles - predicting stylistic dynamics of discourse prosody. 110-113
- Claudia Crocco, Renata Savy:
Topic in dialogue: prosodic and syntactic features. 114-117
- Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Shusaku Miwa, Nobuaki Minematsu:
Features of pauses and conjunctions at syntactic and discourse boundaries in Japanese monologues. 118-121
Spoken Dialog Systems I, II
- Craig Wootton, Michael F. McTear, Terry Anderson:
Utilizing online content as domain knowledge in a multi-domain dynamic dialogue system. 122-125
- Boris W. van Schooten, Sophie Rosset, Olivier Galibert, Aurélien Max, Rieks op den Akker, Gabriel Illouz:
Handling speech input in the ritel QA dialogue system. 126-129
- Woosung Kim:
Online call quality monitoring for automating agent-based call centers. 130-133
- Sebastian Möller, Klaus-Peter Engelbrecht, Antti Oulasvirta:
Analysis of communication failures for spoken dialogue systems. 134-137
- Sandra Mann, André Berton, Ute Ehrlich:
How to access audio files of large data bases using in-car speech dialogue systems. 138-141
- Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno:
Analyzing temporal transition of real user's behaviors in a spoken dialogue system. 142-145
- J. Sherwani, Dong Yu, Tim Paek, Mary Czerwinski, Yun-Cheng Ju, Alex Acero:
Voicepedia: towards speech-based access to unstructured information. 146-149
- Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan:
Exploiting prosodic features for dialog act tagging in a discriminative modeling framework. 150-153
- Hua Ai, Antonio Roque, Anton Leuski, David R. Traum:
Using information state to improve dialogue move identification in a spoken dialogue system. 154-157
- Shiu-Wah Chu, Ian M. O'Neill, Philip Hanna:
Using multiple strategies to manage spoken dialogue. 158-161
- Marcelo Quinderé, Luís Seabra Lopes, António J. S. Teixeira:
An information state based dialogue manager for a mobile robot. 162-165
Accent and Language Identification I, II
- Josef G. Bauer, Bernt Andrassy, Ekaterina Timoshenko:
Discriminative optimization of language adapted HMMs for a language identification system based on parallel phoneme recognizers. 166-169
- Khe Chai Sim, Haizhou Li:
Fusion of contrastive acoustic models for parallel phonotactic spoken language identification. 170-173
- Liang Wang, Eliathamby Ambikairajah, Eric H. C. Choi:
Multi-layer Kohonen self-organizing feature map for language identification. 174-177
- Bo Yin, Eliathamby Ambikairajah, Fang Chen:
Hierarchical language identification based on automatic language clustering. 178-181
- Ekaterina Timoshenko, Harald Höge:
Using speech rhythm for acoustic language identification. 182-185
- Kakeung Wong, Man-Hung Siu, Brian Mak:
A model-based estimation of phonotactic language verification performance. 186-189
- Mike Rosner, Paulseph-John Farrugia:
A tagging algorithm for mixed language identification in a noisy domain. 190-193
- Doroteo T. Toledano, Javier Gonzalez-Dominguez, Alejandro Abejón-Gonzalez, Danilo Spada, Ismael Mateos-Garcia, Joaquin Gonzalez-Rodriguez:
Improved language recognition using better phonetic decoders and fusion with MFCC and SDC features. 194-197
Education and Training
- Daniel Bolaños, Wayne H. Ward, Sarel van Vuuren, Javier Garrido Salas:
Syllable lattices as a basis for a children's speech reading tracker. 198-201
- Fuping Pan, Qingwei Zhao, Yonghong Yan:
Mandarin vowel pronunciation quality evaluation by using formant pattern recognition. 202-205
- Matthew Black, Joseph Tepperman, Sungbok Lee, Patti Price, Shrikanth S. Narayanan:
Automatic detection and classification of disfluent reading miscues in young children's speech for the purpose of assessment. 206-209
- Nobuaki Minematsu, K. Kamata, Satoshi Asakawa, Takehiko Makino, Tazuko Nishimura, Keikichi Hirose:
Structural assessment of language learners' pronunciation. 210-213
- Abdurrahman Samir, Sherif Mahdy Abdou, Ahmed Husien Khalil, Mohsen A. Rashwan:
Enhancing usability of CAPL system for Qur'an recitation learning. 214-217
- Febe de Wet, Christa van der Walt, Thomas Niesler:
Automatic large-scale oral language proficiency assessment. 218-221
Robust ASR I, II
- Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita:
Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation. 222-225
- Agustín Álvarez Marquina, Rafael Martínez, Pedro Gómez, Victor Nieto Lluis, V. Rodellar:
A robust mel-scale subband voice activity detector for a car platform. 226-229
- Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki:
Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio. 230-233
- A. M. Toh, Roberto Togneri, Sven Nordholm:
Feature and distribution normalization schemes for statistical mismatch reduction in reverberant speech recognition. 234-237
- Matthew Gibson, Thomas Hain:
Temporal masking for unsupervised minimum Bayes risk speaker adaptation. 238-241
- Tsung-hsueh Hsieh, Jeih-Weih Hung:
Speech feature compensation based on pseudo stereo codebooks for robust speech recognition in additive noise environments. 242-245
- Dimitrios Dimitriadis, Petros Maragos, Stamatios Lefkimmiatis:
Multiband, multisensor robust features for noisy speech recognition. 246-249
- Akira Sasou, Hiroaki Kojima:
Noise robust speech recognition for voice driven wheelchair. 250-253
Adaptation in ASR I, II
- Yun Tang, Richard C. Rose:
Clustered maximum likelihood linear basis for rapid speaker adaptation. 254-257
- Wen Xuan Teng, Guillaume Gravier, Frédéric Bimbot, Frédéric Soufflet:
Rapid speaker adaptation by reference model interpolation. 258-261
- Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. 262-265
- Brian Kan-Wing Mak, Roger Wend-Huu Hsiao:
Robustness of several kernel-based fast adaptation methods on noisy LVCSR. 266-269
- Janne Pylkkönen:
Estimating VTLN warping factors by distribution matching. 270-273
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang:
Frequency domain correspondence for speaker normalization. 274-277
- Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Unsupervised training of adaptation rate using Q-learning in large vocabulary continuous speech recognition. 278-281
- Martin Karafiát, Lukás Burget, Jan Cernocký, Thomas Hain:
Application of CMLLR in narrow band wide band adapted systems. 282-285
- Christophe Lévy, Georges Linarès, Jean-François Bonastre:
Fast adaptation of GMM-based compact models. 286-289
Speaker Verification & Identification I-IV
- Zahi N. Karam, William M. Campbell:
A new kernel for SVM MLLR based speaker recognition. 290-293
- Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen:
A GMM-based probabilistic sequence kernel for speaker verification. 294-297
- Hagai Aronowitz:
Speaker recognition using kernel-PCA and intersession variability modeling. 298-301
- Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel:
Linear and non linear kernel GMM supervector machines for speaker verification. 302-305
- Ignacio López-Moreno, Ismael Mateos-Garcia, Daniel Ramos, Joaquin Gonzalez-Rodriguez:
Support vector regression for speaker verification. 306-309
- Chris Longworth, Mark J. F. Gales:
Derivative and parametric kernels for speaker verification. 310-313
Spoken Data Retrieval I, II
- David R. H. Miller, Michael Kleber, Chia-Lin Kao, Owen Kimball, Thomas Colthurst, Stephen A. Lowe, Richard M. Schwartz, Herbert Gish:
Rapid and accurate spoken term detection. 314-317
- Yi-Cheng Pan, Hung-lin Chang, Berlin Chen, Lin-Shan Lee:
Subword-based position specific posterior lattices (s-PSPL) for indexing speech information. 318-321
- Andreas Merkel, Dietrich Klakow:
Improved methods for language model based question classification. 322-325
- Tomoyosi Akiba, Hirofumi Tsujimura:
Error-tolerant question answering for spoken documents. 326-329
- Dilek Hakkani-Tür, Gökhan Tür, Michael Levit:
Exploiting information extraction annotations for document retrieval in distillation tasks. 330-333
- Kishan Thambiratnam, Frank Seide:
Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis. 334-337
Accent and Language Identification I, II
- David A. van Leeuwen, Khiet P. Truong:
An open-set detection evaluation methodology applied to language and emotion recognition. 338-341
- Xi Yang, Man-Hung Siu, Herbert Gish, Brian Mak:
Boosting with anti-models for automatic language identification. 342-345
- Fabio Castaldo, Daniele Colibro, Emanuele Dalmasso, Pietro Laface, Claudio Vair:
Acoustic language identification using fast discriminative training. 346-349
- Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan:
Spoken language identification using score vector modeling and support vector machine. 350-353
- Ricardo de Córdoba, Luis Fernando D'Haro, Fernando Fernández Martínez, Javier Macías Guarasa, Javier Ferreiros:
Language identification based on n-gram frequency ranking. 354-357
- Wade Shen, Douglas A. Reynolds:
Improving phonotactic language recognition with acoustic adaptation. 358-361
Speech Perception I, II
- Michael C. W. Yip:
Spoken word recognition of Chinese homophones: a further investigation. 362-365
- Maria K. Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, David Owens:
The role of outer hair cell function in the perception of synthetic versus natural speech. 366-369
- Akiko Kusumoto, Alexander Kain, John-Paul Hosom, Jan P. H. van Santen:
Hybridizing conversational and clear speech. 370-373
- Sophie Dufour, Ulrich H. Frauenfelder:
Neighborhood density and neighborhood frequency effects in French spoken word recognition. 374-377
- Toshio Irino, Yoshie Aoki, Yoshie Hayashi, Hideki Kawahara, Roy D. Patterson:
Discrimination and recognition of scaled word sounds. 378-381
- László Tóth:
Benchmarking human performance on the acoustic and linguistic subtasks of ASR systems. 382-385
- Lin Yang, Jianping Zhang, Yonghong Yan:
Contributions of temporal fine structure cues to Chinese speech recognition in cochlear implant simulation. 386-389
- Xihong Wu, Jing Chen, Zhigang Yang, Qiang Huang, Mengyuan Wang, Liang Li:
Effect of number of masking talkers on speech-on-speech masking in Chinese. 390-393
- Odile Bagou, Sophie Dufour, Cécile Fougeron, Alain Content, Ulrich H. Frauenfelder:
Do different boundary types induce subtle acoustic cues to which French listeners are sensitive? 394-397
- Svante Stadler, Arne Leijon, Björn Hagerman:
An information theoretic approach to predict speech intelligibility for listeners with normal and impaired hearing. 398-401
- Travis Wade, Bernd Möbius:
Speaking rate effects in a landmark-based phonetic exemplar model. 402-405
- Kazumi Maniwa, Allard Jongman, Travis Wade:
Acoustic correlates of intelligibility enhancements in clearly produced fricatives. 406-409
- Tim Jürgens, Thomas Brand, Birger Kollmeier:
Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model. 410-413
- Ayako Ikeno, John H. L. Hansen:
Lombard speech impact on perceptual speaker recognition. 414-417
- Huiwen Goy, Kathleen Pichora-Fuller, Pascal van Lieshout, Gurjit Singh, Bruce Schneider:
Effect of within- and between-talker variability on word identification in noise by younger and older adults. 418-421
- H. Timothy Bunnell, N. Carolyn Schanen, Linda D. Vallino, Thierry G. Morlet, James B. Polikoff, Jennette D. Driscoll, James T. Mantell:
Speech perception in children with speech sound disorder. 422-425
- Huan Wang, Werner Hemmert:
Speech coding and information processing by auditory neurons. 426-429
- Annie C. Gilbert, Victor J. Boucher:
What do listeners attend to in hearing prosodic structures? investigating the human speech-parser using short-term recall. 430-433
Prosody: Prosodic Structure
- Yosuke Igarashi:
Pitch pattern alternation in Goshogawara Japanese: evidence for a prosodic phrase above the domain for downstep. 434-437
- Irina Nesterenko, Pavel A. Skrelin:
Some evidence on the phonetics and phonology of prosodic phrasing in Russian. 438-441
- Jan Volín, Radek Skarnitzl:
Temporal downtrends in Czech read speech. 442-445
- Hyongsil Cho, Daniel Hirst:
Empirical evidence for prosodic phrasing: pauses as linguistic annotation in Korean read speech. 446-449
- Markus Dreyer, Izhak Shafran:
Exploiting prosody for PCFGs with latent annotations. 450-453
- Qin Shi, Danning Jiang, Fanping Meng, Yong Qin:
Combining length distribution model with decision tree in prosodic phrase prediction. 454-457
- Li-chiung Yang:
Duration and pauses as boundary-markers in speech: a cross-linguistic study. 458-461
Prosodic Modeling I, II
- Jian Yu, Lixing Huang, Jianhua Tao, Xia Wang:
Modeling incompletion phenomenon in Mandarin dialog prosody. 462-465
- Anne Tamm, Kálmán Abari, Gábor Olaszy:
Accent assignment algorithm in Hungarian, based on syntactic analysis. 466-469
- Cheng-Yuan Lin, Pei-Chi Jao, Jyh-Shing Roger Jang:
An effective initial/final duration prediction method for corpus-based singing voice synthesis of Mandarin Chinese. 470-473
- Géza Németh, Márk Fék, Tamás Gábor Csapó:
Increasing prosodic variability of text-to-speech synthesizers. 474-477
- Damien Lolive, Nelly Barbot, Olivier Boëffard:
Unsupervised HMM classification of F0 curves. 478-481
- Ian Read, Stephen Cox:
Automatic pitch accent prediction for text-to-speech synthesis. 482-485
- Xinqiang Ni, Yining Chen, Frank K. Soong, Min Chu, Ping Zhang:
An unsupervised approach to automatic prosodic annotation. 486-489
- Zeynep Inanoglu, Steve J. Young:
A system for transforming the emotion in speech: combining data-driven conversion techniques for prosody and voice quality. 490-493
- Chen-Yu Chiang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen:
An automatic prosody labeling method for Mandarin speech. 494-497
Speech Analysis
- Koby Crammer:
A conservative aggressive subspace tracker. 498-501
- Mattias Nilsson, W. Bastiaan Kleijn:
Mutual information and the speech signal. 502-505
- Tony Ezzat, Jake V. Bouvrie, Tomaso A. Poggio:
Spectro-temporal analysis of speech using 2-d Gabor filters. 506-509
- Tomas Dekens, Mike Demol, Werner Verhelst, Piet Verhoeve:
A comparative study of speech rate estimation techniques. 510-513
- Tiago H. Falk, Hua Yuan, Wai-Yip Chan:
Spectro-temporal processing for blind estimation of reverberation time and single-ended quality measurement of reverberant speech. 514-517
Spectral Analysis, Formants and Vocal Tract Models
- Toon van Waterschoot, Marc Moonen:
Linear prediction of audio signals. 518-521
- Carlo Magi, Tom Bäckström, Paavo Alku:
Stabilised weighted linear prediction - a robust all-pole method for speech processing. 522-525
- Daniel Rudoy, Daniel N. Spendley, Patrick J. Wolfe:
Conditionally linear Gaussian models for estimating vocal tract resonances. 526-529
- Karl Schnell, Arild Lacroix:
Time-varying pre-emphasis and inverse filtering of speech. 530-533
- Joachim Thiemann, Peter Kabal:
Reconstructing audio signals from modified non-coherent Hilbert envelopes. 534-537
- Binh Phu Nguyen, Masato Akagi:
A flexible spectral modification method based on temporal decomposition and Gaussian mixture model. 538-541
- Jonathan Darch, Ben Milner:
A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application. 542-545
- Huiqun Deng, Douglas D. O'Shaughnessy:
Effect of incomplete glottal closures on estimates of glottal waves via inverse filtering of vowel sounds. 546-549
- Kaustubh Kalgaonkar, Mark A. Clements:
Vocal tract and area function estimation with both lip and glottal losses. 550-553
- S. Guruprasad, B. Yegnanarayana, K. Sri Rama Murty:
Detection of instants of glottal closure using characteristics of excitation source. 554-557
- Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:
A comparative evaluation of the zeros of z transform representation for voice source estimation. 558-561
Speech and Audio Processing for Intelligent Environments
- Aki Härmä:
Ambient telephony: scenarios and research challenges. 562-565
- Yasunari Obuchi, Akio Amano:
Always listening to you: creating exhaustive audio database in home environments. 566-569
- Joerg Schmalenstroeer, Reinhold Haeb-Umbach:
Joint speaker segmentation, localization and identification for streaming audio. 570-573
- Yan-Chen Lu, Martin Cooke, Heidi Christensen:
Active binaural distance estimation for dynamic sources. 574-577
- Bengt J. Borgström, Abeer Alwan:
A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition. 578-581
- Matthias Wölfel:
Channel selection by class separability measures for automatic transcriptions on distant microphones. 582-585
- Danny Wyatt, Tanzeem Choudhury, Jeff A. Bilmes:
Conversation detection and speaker segmentation in privacy-sensitive situated speech data. 586-589
- Alberto Abad, Carlos Segura, Climent Nadeu, Javier Hernando:
Audio-based approaches to head orientation estimation in a smart-room. 590-593
- Valentin Ion, Reinhold Haeb-Umbach:
Multi-resolution soft features for channel-robust distributed speech recognition. 594-597
Language Modeling I, II
- Yi Su, Frederick Jelinek, Sanjeev Khudanpur:
Large-scale random forest language models for speech recognition. 598-601
- Yuya Akita, Yusuke Nemoto, Tatsuya Kawahara:
PLSA-based topic detection in meetings for adaptation of lexicon and language model. 602-605
- Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki:
Language modeling using PLSA-based topic HMM. 606-609
- Yi-Cheng Pan, Lin-Shan Lee:
Lexicon adaptation with reduced character error (LARCE) - a new direction in Chinese language modeling. 610-613
- Meng-Sung Wu, Jen-Tzung Chien:
Minimum rank error training for language modeling. 614-617
- Wen Wang, Andreas Stolcke:
Integrating MAP, marginals, and unsupervised language model adaptation. 618-621
Prosody Production and Perception
- Sasha Calhoun:
Predicting focus through prominence structure. 622-625
- Murtaza Bulut, Sungbok Lee, Shrikanth S. Narayanan:
Analysis of emotional speech prosody in terms of part of speech tags. 626-629
- Fang Liu, Yi Xu:
The neutral tone in question intonation in Mandarin. 630-633
- Amélie Rochet-Capellan, Jean-Luc Schwartz, Rafael Laboissière, Arturo Galvàn:
Pointing to a target while naming it with /pata/ or /tapa/: the effect of consonants and stress position on jaw-finger coordination. 634-637
- Øydis Hide, Steven Gillis, Paul Govaerts:
Suprasegmental aspects of pre-lexical speech in cochlear implanted children. 638-641
- Oliver Niebuhr:
Categorical perception in intonation: a matter of signal dynamics? 642-645
Multimodal Speech Recognition
- Noureddine Aboutabit, Denis Beautemps, Jeanne Clarke, Laurent Besacier:
A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case. 646-649
- Patrick Lucey, Gerasimos Potamianos, Sridha Sridharan:
A unified approach to multi-pose audio-visual ASR. 650-653
- Rowan Seymour, Darryl Stewart, Ji Ming:
Audio-visual integration for robust speech recognition using maximum weighted stream posteriors. 654-657
- Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone:
Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips. 658-661
- Bo Zhu, Timothy J. Hazen, James R. Glass:
Multimodal speech recognition with ultrasonic sensors. 662-665
- David Dean, Patrick Lucey, Sridha Sridharan, Tim Wark:
Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition. 666-669
Speech and Other Modalities
- Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Analysis of head motions and speech in spoken dialogue. 670-673
- Lars Bo Larsen, Kasper Løvborg Jensen, Søren Larsen, Morten Højfeldt Rasmussen:
A paradigm for mobile speech-centric services. 674-677
- Pavel Campr, Marek Hrúz, Milos Zelezný:
Design and recording of Czech sign language corpus for automatic sign language recognition. 678-681
- Jens Edlund, Jonas Beskow:
Pushy versus meek - using avatars to influence turn-taking behaviour. 682-685
- Michael Wand, Szu-Chen Stan Jou, Tanja Schultz:
Wavelet-based front-end for electromyographic speech recognition. 686-689
- Gaëlle Ferré, Roxane Bertrand, Philippe Blache, Robert Espesser, Stéphane Rauzy:
Intensive gestures in French and their multimodal correlates. 690-693
- Slim Ouni, Kaïs Ouni:
Aspects of visual speech in Arabic. 694-697
- Denis Burnham, Jessica Reynolds, Guillaume Vignali, Sandra Bollwerk, Caroline Jones:
Rigid vs non-rigid face and head motion in phone and tone perception. 698-701
Multimodal/Multimedia Signal Processing
- Hedvig Kjellström, Olov Engwall, Sherif Mahdy Abdou, Olle Bälter:
Audio-visual phoneme classification for pronunciation training applications. 702-705
- Katja Grauwinkel, Britta Dewitt, Sascha Fagel:
Visual information and redundancy conveyed by internal articulator dynamics in synthetic audiovisual speech. 706-709
- Wei Zhou, Zengfu Wang:
A speech rate related lip movement model for speech animation. 710-713
- Guanyong Wu, Jie Zhu:
An extension 2DPCA based visual feature extraction method for audio-visual speech recognition. 714-717
- Soo-Jong Lee, Jun Park, Eung-Kyeu Kim:
Preventing an external acoustic noise from being misrecognized as a speech recognition object by confirming the lip movement image signal. 718-721
- Gregor Hofer, Hiroshi Shimodaira:
Automatic head motion prediction from speech data. 722-725
- Yuki Denda, Takanobu Nishiura, Yoichi Yamashita:
Omnidirectional audio-visual talker localizer with dynamic feature fusion based on validity and reliability criteria. 726-729
- Nick Campbell, Damien Douxchamps:
Processing image and audio information for recognising discourse participation status through features of face and voice. 730-733
Speaker Verification & Identification I-IV
- José R. Calvo, Rafael Fernández, Gabriel Hernández:
Application of shifted delta cepstral features in speaker verification. 734-737
- Luciana Ferrer, M. Kemal Sönmez, Elizabeth Shriberg:
A smoothing kernel for spatially related features and its application to speaker verification. 738-741
- Delphine Charlet, Mikaël Collet, Frédéric Bimbot:
VZ-norm: an extension of z-norm to the multivariate case for anchor model based speaker verification. 742-745
- Howard Lei, Nikki Mirghafori:
Word-conditioned HMM supervectors for speaker recognition. 746-749
- Wei-Ho Tsai:
Speaker clustering using direct maximization of a BIC-based score. 750-753
- Alexandre Preti, Jean-François Bonastre, Driss Matrouf, François Capman, Bertrand Ravera:
Confidence measure based unsupervised target model adaptation for speaker verification. 754-757
- Huanjun Bao, Ming-Xing Xu, Thomas Fang Zheng:
Emotion attribute projection for speaker recognition on emotional speech. 758-761
- Shi-Xiong Zhang, Man-Wai Mak, Helen M. Meng:
High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling. 762-765
- T. Yingthawornsuk, H. Kaymaz Keskinpala, D. Mitchell Wilkes, Richard G. Shiavi, Ronald M. Salomon:
Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech. 766-769
- Claudio Garretón, Néstor Becerra Yoma, Fernando Huenupán, Carlos Molina:
On comparing and combining intra-speaker variability compensation and unsupervised model adaptation in speaker verification. 770-773
- Xianyu Zhao, Yuan Dong, Hao Yang, Jian Zhao, Liang Lu, Haila Wang:
Comparison of two kinds of speaker location representation for SVM-based speaker verification. 774-777
- Mireia Farrús, Javier Hernando, Pascual Ejarque:
Jitter and shimmer measurements for speaker recognition. 778-781
- Zhenyu Shan, Yingchun Yang, Ruizhi Ye:
Natural-emotion GMM transformation algorithm for emotional speaker recognition. 782-785
- Ivy H. Tseng, Olivier Verscheure, Deepak S. Turaga, Upendra V. Chaudhari:
Optimized one-bit quantization for adapted GMM-based speaker verification. 786-789
- Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan:
A comparison of session variability compensation techniques for SVM-based speaker recognition. 790-793
- Benoit G. B. Fauve, Nicholas W. D. Evans, Neil Pearson, Jean-François Bonastre, John S. D. Mason:
Influence of task duration in text-independent speaker verification. 794-797
Speech Enhancement
- Kamil K. Wójcicki, Stephen So, Kuldip K. Paliwal:
The effect of the additivity assumption on time and frequency domain Wiener filtering for speech enhancement. 798-801
- Junfeng Li, Shuichi Sakamoto, Satoshi Hongo, Masato Akagi, Yôiti Suzuki:
Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement. 802-805
- Amit Das, John H. L. Hansen:
Class constrained ROVER based speech enhancement. 806-809
- Erhan Deger, Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan:
EMD based soft-thresholding for speech enhancement. 810-813
- Adam Borowicz, Alexander A. Petrovsky:
An approximate solution for perceptually constrained signal subspace speech enhancement method. 814-817
- Tim Fingscheidt, Suhadi Suhadi:
Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo. 818-821
- Anis Ben Aicha, Sofia Ben Jebara:
Perceptual musical noise reduction using critical bands tonality coefficients and masking thresholds. 822-825
- Dirk Mauler, Anil M. Nagathil, Rainer Martin:
On optimal estimation of compressed speech for hearing aids. 826-829
- Richard C. Hendriks, Jesper Jensen, Richard Heusdens:
DFT domain subspace based noise tracking for speech enhancement. 830-833
- Nitish Krishnamurthy, John H. L. Hansen:
Noise tracking for speech systems in adverse environments. 834-837
- Abderrahman Essebbar, Tristan Poinsard:
Speech enhancement using multi-reference noise reduction in a vehicle environment. 838-841
- Ernst Warsitz, Reinhold Haeb-Umbach, Dang Hai Tran Vu:
Blind adaptive principal eigenvector beamforming for acoustical source separation. 842-845
- Zbynek Koldovský, Petr Tichavský:
Time-domain blind audio source separation using advanced ICA methods. 846-849
- Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Model-based speech separation with single-microphone input. 850-853
- Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi:
Multi-step linear prediction based speech dereverberation in noisy reverberant environment. 854-857
- Seung Yeol Lee, Jong Won Shin, Hwan Sik Yun, Nam Soo Kim:
A statistical model based post-filtering algorithm for residual echo suppression. 858-861
- Xiaoshan Huang, Xiaoqun Zhao:
An optimal speech enhancement under speech uncertainty probability and masking property of auditory system. 862-865
Structure-based and Template-based Automatic Speech Recognition
- Viktoria Maier, Roger K. Moore:
Temporal episodic memory model: an evolution of Minerva2. 866-869
- Gianpaolo Coro, Francesco Cutugno, Fulvio Caropreso:
Speech recognition with factorial-HMM syllabic acoustic models. 870-873
- Mathias De Wachter, Kris Demuynck, Patrick Wambacq, Dirk Van Compernolle:
Evaluating acoustic distance measures for template based recognition. 874-877
- Yan Han, Lou Boves:
Hierarchical acoustic modeling based on random-effects regression for automatic speech recognition. 878-881
- Annika Hämäläinen, Louis ten Bosch, Lou Boves:
Construction and analysis of multiple paths in syllable models. 882-885
- Carol Y. Espy-Wilson, Tarun Pruthi, Amit Juneja, Om Deshmukh:
Landmark-based approach to speech recognition: an alternative to HMMs. 886-889
- Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose:
Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics. 890-893
- Roberto Togneri, Li Deng:
A structured speech model parameterized by recursive dynamics and neural networks. 894-897
- Li Deng, Helmer Strik:
Structure-based and template-based automatic speech recognition - comparing parametric and non-parametric approaches. 898-901
- David Grangier, Samy Bengio:
Learning the inter-frame distance for discriminative template-based keyword detection. 902-905
- Dong Yu, Li Deng, Alex Acero:
Handling phonetic context and speaker variation in a structure-based speech recognizer. 906-909
Robust ASR Against Noise and Reverberation
- Maarten Van Segbroeck, Hugo Van hamme:
Vector-quantization based mask estimation for missing data automatic speech recognition. 910-913
- Sébastien Demange, Christophe Cerisara, Jean Paul Haton:
Accurate marginalization range for missing data recognition. 914-917
- Marco Kühne, Roberto Togneri, Sven Nordholm:
Smooth soft mel-spectrographic masks based on blind sparse source separation. 918-921
- Jonathan Laidler, Martin Cooke, Neil D. Lawrence:
Model-driven detection of clean speech patches in noise. 922-925
- Richard M. Stern, Evandro B. Gouvêa, Govindarajan Thattai:
"polyaural" array processing for automatic speech recognition in degraded environments. 926-929
- Nicolás Morales, Liang Gu, Yuqing Gao:
Adding noise to improve noise robustness in speech recognition. 930-933
Language Resources and Tools
- Eric Fosler-Lussier, Laura Dilley, Na'im R. Tyson, Mark A. Pitt:
The Buckeye corpus of speech: updates and enhancements. 934-937 - Nora Barroso, Aitzol Ezeiza, N. Gilisagasti, Karmele López de Ipiña, A. López, Juan Miguel López:
Development of multimodal resources for multilingual information retrieval in the Basque context. 938-941 - Reva Schwartz, Wade Shen, Joseph P. Campbell, Shelley Paget, Julie Vonwiller, Dominique Estival, Christopher Cieri:
Construction of a phonotactic dialect corpus using semiautomatic annotation. 942-945 - Slim Abdennadher, Mohamed Aly, Dirk Bühler, Wolfgang Minker, Johannes Pittermann:
BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management. 946-949 - Christopher Cieri, Linda Corson, David Graff, Kevin Walker:
Resources for new research directions in speaker recognition: the Mixer 3, 4 and 5 corpora. 950-953 - Peter A. Heeman, Andy McMillin, J. Scott Yaruss:
Intercoder reliability in annotating complex disfluencies. 954-957
Single-channel Speech Enhancement
- Mohammad H. Radfar, Richard M. Dansereau:
Single channel speech separation using maximum a posteriori estimation. 958-961 - Suhadi Suhadi, Tim Fingscheidt:
Speech enhancement with improved a posteriori SNR computation. 962-965 - Thang Tat Vu, Germine Seide, Masashi Unoki, Masato Akagi:
Method of LP-based blind restoration for improving intelligibility of bone-conducted speech. 966-969 - Tiago H. Falk, Svante Stadler, W. Bastiaan Kleijn, Wai-Yip Chan:
Noise suppression based on extending a speech-dominated modulation band. 970-973 - Amin Haji Abolhassani, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy, Mohamed Faouzi Harkat:
Speech enhancement using PCA and variance of the reconstruction error model identification. 974-977 - Jong Won Shin, Woohyung Lim, June Sig Sung, Nam Soo Kim:
Speech reinforcement based on partial specific loudness. 978-981
Phonetics and Phonology
- Tamara Rathcke, Jonathan Harrington:
The phonetics and phonology of high and low tones in two falling f0-contours in standard German. 982-985 - Tina John, Jonathan Harrington:
Temporal alignment of creaky voice in neutralised realisations of an underlying, post-nasal voicing contrast in German. 986-989 - Mike Demol, Werner Verhelst, Piet Verhoeve:
The duration of speech pauses in a multilingual environment. 990-993 - Dafydd Gibbon, Jolanta Bachan, Grazyna Demenko:
Syllable timing patterns in Polish: results from annotation mining. 994-997 - Constandinos Kalimeris, Stelios Bakamidis:
Minimal pairs and functional loads of sound contrasts obtained from a list of Modern Greek words. 998-1001 - Daan Wissing:
More on acoustic correlates of stress. 1002-1005 - Cécile Woehrling, Philippe Boula de Mareüil:
Comparing Praat and Snack formant measurements on two large corpora of northern and southern French. 1006-1009 - William J. Barry, Bistra Andreeva, Ingmar Steiner:
The phonetic exponency of phrasal accentuation in French and German. 1010-1013 - Christiana Christodoulou:
Phonetic geminates in Cypriot Greek: the case of voiceless plosives. 1014-1017 - Darcie Williams, François Poiré:
Predicting vowel duration in spontaneous Canadian French speech. 1018-1021 - Ivan Chow, François Poiré:
Rhotic variation and schwa epenthesis in Windsor French. 1022-1025 - Audrey Bürki, Cécile Fougeron, Cédric Gendrot:
On the categorical nature of the process involved in schwa elision in French. 1026-1029 - Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang:
Exploring tonal variations via context-dependent tone models. 1030-1033 - Philippe Martin, Jun Li:
Acoustic analysis of the neutral tone in Mandarin. 1034-1037 - Rerrario Shui-Ching Ho, Yoshinori Sagisaka:
F0 analysis of perceptual distance among Cantonese level tones. 1038-1041
Robust ASR I, II
- Yu Hu, Qiang Huo:
Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions. 1042-1045 - Luis Buera, Antonio Miguel, Eduardo Lleida, Oscar Saz, Alfonso Ortega:
On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition. 1046-1049 - Yu Tsao, Chin-Hui Lee:
An ensemble modeling approach to joint characterization of speaker and speaking environments. 1050-1053 - Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Cluster-based polynomial-fit histogram equalization (CPHEQ) for robust speech recognition. 1054-1057 - Pedro M. Martinez, José C. Segura, Luz García:
Robust distributed speech recognition using histogram equalization and correlation information. 1058-1061 - Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui:
Predictive minimum Bayes risk classification for robust speech recognition. 1062-1065 - Ning Ma, Jon Barker, Phil D. Green:
Applying word duration constraints by using unrolled HMMs. 1066-1069 - Xiong Xiao, Engsiong Chng, Haizhou Li:
Evaluating the temporal structure normalisation technique on the Aurora-4 task. 1070-1073 - Hynek Boril, Petr Fousek, Harald Höge:
Two-stage system for robust neutral/Lombard speech recognition. 1074-1077 - Takatoshi Jitsuhiro, Tomoji Toriyama, Kiyoshi Kogure:
Noise suppression using search strategy with multi-model compositions. 1078-1081 - Takanobu Nishiura, Yoshiki Hirano, Yuki Denda, Masato Nakayama:
Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria. 1082-1085 - Stefan Windmann, Reinhold Haeb-Umbach:
An approach to iterative speech feature enhancement and recognition. 1086-1089 - Jeih-Weih Hung:
Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition. 1090-1093 - Rico Petrick, Kevin Lohde, Matthias Wolff, Rüdiger Hoffmann:
The harming part of room acoustics in automatic speech recognition. 1094-1097 - Yuan-Fu Liao, Yh-Her Yang, Chi-Hui Hsu, Cheng-Chang Lee, Jing-Teng Zeng:
A reference model weighting-based method for robust speech recognition. 1098-1101 - Babak Nasersharif, Ahmad Akbari, Mohammad Mehdi Homayounpour:
Mel sub-band filtering and compression for robust speech recognition. 1102-1105
Features for ASR
- Chang-Wen Hsu, Lin-Shan Lee:
Extended powered cepstral normalization (p-CN) with range equalization for robust features in speech recognition. 1106-1109 - Makoto Sakai, Norihide Kitaoka, Seiichi Nakagawa:
Selection of optimal dimensionality reduction method using Chernoff bound for segmental unit input HMM. 1110-1113 - Vivek Tyagi:
Fepstrum: an improved modulation spectrum for ASR. 1114-1117 - Dusan Macho:
Narrowband to wideband feature expansion for robust multilingual ASR. 1118-1121 - Weifeng Li, Hervé Bourlard:
Non-linear spectral contrast stretching for in-car speech recognition. 1122-1125 - Xiao-Bing Li, Douglas D. O'Shaughnessy:
Clustering-based two-dimensional linear discriminant analysis for speech recognition. 1126-1129 - Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai:
A study on temporal features derived by analytic signal. 1130-1133 - Stephen A. Zahorian, Tara Singh, Hongbing Hu:
Dimensionality reduction of speech features using nonlinear principal components analysis. 1134-1137 - D. Rama Sanand, D. Dinesh Kumar, Srinivasan Umesh:
Linear transformation approach to VTLN using dynamic frequency warping. 1138-1141 - Vladimir Fabregas Surigué de Alencar, Abraham Alcaim:
Features interpolation domain for distributed speech recognition and performance for ITU-T G.723.1 CODEC. 1142-1145 - Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi:
Dynamic integration of multiple feature streams for robust real-time LVCSR. 1146-1149 - Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi:
PCA-based feature extraction for fluctuation in speaking style of articulation disorders. 1150-1153 - Fabio Valente, Jithendra Vepa, Hynek Hermansky:
Multi-stream features combination based on Dempster-Shafer rule for LVCSR system. 1154-1157 - Natasha Singh-Miller, Michael Collins, Timothy J. Hazen:
Dimensionality reduction for speech recognition using neighborhood components analysis. 1158-1161 - Dan Su, Xihong Wu, Huisheng Chi:
Probabilistic latent speaker analysis for large vocabulary speech recognition. 1162-1165 - S. R. Mahadeva Prasanna, Hynek Hermansky:
MRASTA and PLP in automatic speech recognition. 1166-1169
Objective Assessment of Voice and Speech Quality
- Markus Brückl:
Women's vocal aging: a longitudinal approach. 1170-1173 - Laurence Cnockaert, Jean Schoentgen, Canan Ozsancak, Pascal Auzou, Francis Grenez:
Effect of intensive voice therapy on vocal tremor for Parkinson speakers. 1174-1177 - Ali Alpan, Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Assessment of vocal dysperiodicities in connected disordered speech. 1178-1181 - Anne-Maria Laukkanen, Jaromír Horácek, Pavel Svancara, Elina Lehtinen:
Effects of FE modelled consequences of tonsillectomy on perceptual evaluation of voice. 1182-1185 - Irma Verdonck-de Leeuw, Louis ten Bosch, Li Ying Chao, Rico N. P. M. Rinkel, Pepijn A. Borggreven, Lou Boves, C. René Leemans:
Speech quality after major surgery of the oral cavity and oropharynx with microvascular soft tissue reconstruction. 1186-1189 - Christel G. de Bruijn, Sandra P. Whiteside:
Voice fatigue and use of speech recognition: a study of voice quality ratings. 1190-1193 - Jean-François Bonastre, Corinne Fredouille, Alain Ghio, Antoine Giovanni, Gilles Pouchoulin, Joana Revis, Bernard Teston, P. Yu:
Complementary approaches for voice disorder assessment. 1194-1197 - Gilles Pouchoulin, Corinne Fredouille, Jean-François Bonastre, Alain Ghio, Antoine Giovanni:
Frequency study for the characterization of the dysphonic voices. 1198-1201 - Victor J. Boucher:
Acoustic correlates of laryngeal-muscle fatigue: findings for a phonometric prevention of acquired voice pathologies. 1202-1205 - Andreas K. Maier, Maria Schuster, Anton Batliner, Elmar Nöth, Emeka Nkenke:
Automatic scoring of the intelligibility in patients with cancer of the oral cavity. 1206-1209 - Jacques Duchateau, Leen Cleuren, Hugo Van hamme, Pol Ghesquière:
Automatic assessment of children's reading level. 1210-1213 - Carlos A. Ferrer-Riesgo, María Esperanza Hernández-Díaz, Eduardo González-Moreira:
Using waveform matching techniques in the measurement of shimmer in voiced signals. 1214-1217 - Rubén Fraile, Juan Ignacio Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Pedro Gómez Vilda:
Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection. 1218-1221 - Claudia Manfredi, Leonardo Bocchi, Giovanna Cantarella, Giorgio Peretti, Gabriele Guidi, Vincenzo Mezzatesta:
Objective parameters from videokymographic images: a user-friendly interface. 1222-1225
Speaker Verification & Identification I-IV
- Elizabeth Shriberg, Luciana Ferrer:
A text-constrained prosodic system for speaker verification. 1226-1229 - Asmaa El Hannani, Dijana Petrovska-Delacrétaz:
Fusing acoustic, phonetic and data-driven systems for text-independent speaker verification. 1230-1233 - Najim Dehak, Patrick Kenny, Pierre Dumouchel:
Continuous prosodic features and formant modeling with joint factor analysis for speaker verification. 1234-1237 - Claudio Vair, Daniele Colibro, Fabio Castaldo, Emanuele Dalmasso, Pietro Laface:
Loquendo - Politecnico di Torino's 2006 NIST speaker recognition evaluation system. 1238-1241 - Driss Matrouf, Nicolas Scheffer, Benoit G. B. Fauve, Jean-François Bonastre:
A straightforward and efficient implementation of the factor analysis model for speaker verification. 1242-1245 - Timothy J. Hazen, Daniel Schultz:
Multi-modal user authentication from video for mobile or variable-environment applications. 1246-1249
Discourse, Dialog and Emotion Expression
- David House:
Integrating audio and visual cues for speaker friendliness in multimodal speech synthesis. 1250-1253 - Wieneke Wesseling, R. J. J. H. van Son, Louis C. W. Pols:
The influence of masking words on the prediction of TRPs in a shadowed dialog. 1254-1257 - Kornel Laskowski, Susanne Burger:
Analysis of the occurrence of laughter in meetings. 1258-1261 - Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts:
Incremental perception of acted and real emotional speech. 1262-1265 - David Schlangen, Raquel Fernández:
Speaking through a noisy channel - experiments on inducing clarification behaviour in human-human dialogue. 1266-1269 - Christophe d'Alessandro, Albert Rilliard, Sylvain Le Beux:
Computerized chironomy: evaluation of hand-controlled intonation reiteration. 1270-1273
Prosodic Modeling I, II
- Keikichi Hirose, Keiko Ochi, Nobuaki Minematsu:
Corpus-based generation of prosodic features from text based on generation process model. 1274-1277 - Jilei Tian, Jani Nurminen, Imre Kiss:
Novel eigenpitch-based prosody model for text-to-speech synthesis. 1278-1281 - Volker Strom, Ani Nenkova, Robert A. J. Clark, Yolanda Vazquez-Alvarez, Jason M. Brenier, Simon King, Dan Jurafsky:
Modelling prominence and emphasis improves unit-selection synthesis. 1282-1285 - Seiya Takada, Yuji Yagi, Keikichi Hirose, Nobuaki Minematsu:
A framework of reply speech generation for concept-to-speech conversion in spoken dialogue systems. 1286-1289 - Thorsten Stocksmeier, Stefan Kopp, Dafydd Gibbon:
Synthesis of prosodic attitudinal variants in German backchannel ja. 1290-1293 - Ke Li, Yoko Greenberg, Yoshinori Sagisaka:
Inter-language prosodic style modification experiment using word impression vector for communicative speech generation. 1294-1297
Resource Acquisition and Preparation; Resource and System Evaluation
- Ivan Habernal, Miloslav Konopík:
JAAE: the Java abstract annotation editor. 1298-1301 - Goshu Nagino, Makoto Shozakai, Kiyohiro Shikano:
How to judge reusability of existing speech corpora for target task by utilizing statistical multidimensional scaling. 1302-1305 - Peter Rutten:
Feasibility of constructing an expressive speech corpus from television soap opera dialogue. 1306-1309 - Rosemary Orr, Bernat González i Llinares, Françoise Petersen, Helge Hüttenrauch, Martin Böcker, Michael Tate:
Collection of empirical data for standardization of generic vocabularies in speech driven ICT devices and services. 1310-1313 - Antonio Marcos Selmini, Fábio Violaro:
Acoustic-phonetic features for refining the explicit speech segmentation. 1314-1317 - Benjamin Lecouteux, Georges Linarès, Frédéric Beaugendre, Pascal Nocera:
Text island spotting in large speech databases. 1318-1321 - Tim Paek, Yun-Cheng Ju, Christopher Meek:
People watcher: a game for eliciting human-transcribed data for automated directory assistance. 1322-1325 - Andrew L. Kun, Tim Paek, Zeljko Medenica:
The effect of speech interface accuracy on driving performance. 1326-1329 - Hua Zhang, Lijuan Wang, Frank K. Soong, Wenju Liu:
Context constrained-generalized posterior probability for verifying phone transcriptions. 1330-1333 - Pongtep Angkititrakul, DongGu Kwak, SangJo Choi, JeongHee Kim, Anh PhucPhan, Amardeep Sathyanarayana, John H. L. Hansen:
Getting start with UTDrive: driver-behavior modeling and assessment of distraction for in-vehicle speech systems. 1334-1337 - BalaKrishna Kolluru, Yoshihiko Gotoh:
Relative evaluation of informativeness in machine generated summaries. 1338-1341 - Toshiyuki Takezawa, Masahide Mizushima, Tohru Shimizu, Gen-ichiro Kikui:
A method for evaluating task-oriented spoken dialog translation systems based on communication efficiency. 1342-1345 - Charlotte van Hooijdonk, Edwin Commandeur, Reinier Cozijn, Emiel Krahmer, Erwin Marsi:
Using eye movements for online evaluation of speech synthesis. 1346-1349 - Jian Li, Dmitry Sityaev, Jie Hao:
Sentence level intelligibility evaluation for Mandarin text-to-speech systems using semantically unpredictable sentences. 1350-1353 - Judith M. Kessens, David A. van Leeuwen:
N-best: the Northern- and Southern-Dutch benchmark evaluation of speech recognition technology. 1354-1357 - Trym Holter, Svein Sørsdal:
A MAP based approach to adaptive speech intelligibility measurements. 1358-1361 - Sirinoot Boonsuk, Proadpran Punyabukkana, Atiwong Suchato:
Phone boundary detection using selective refinements and context-dependent acoustic features. 1362-1365
Speech Production I, II
- Sorin Dusan:
Vocal tract length during speech production. 1366-1369 - Nobuhiro Miki, Kyohei Hayashi:
Approximation method of subglottal system using ARMA filter. 1370-1373 - Asterios Toutios, Konstantinos G. Margaritis:
Enhancing acoustic-to-EPG mapping with lip position information. 1374-1377 - Tokihiko Kaburagi, Yosuke Tanabe:
A model of glottal flow incorporating viscous-inviscid interaction. 1378-1381 - Kilian G. Seeber:
Thinking outside the cube: modeling language processing tasks in a multiple resource paradigm. 1382-1385 - Julien Cisonni, Annemie Van Hirtum, Jan Willems, Xavier Pelorson:
Experimental validation of direct and inverse glottal flow models for unsteady flow conditions. 1386-1389 - Hideyuki Nomura, Tetsuo Funada:
Effect of unsteady glottal flow on the speech production process. 1390-1393 - Katrin Schneider, Bernd Möbius:
Word stress correlates in spontaneous child-directed speech in German. 1394-1397 - Michael Aron, Nicolas Ferveur, Erwan Kerrien, Marie-Odile Berger, Yves Laprie:
Acquisition and synchronization of multimodal articulatory data. 1398-1401 - Vincent Robert, Yves Laprie, Anne Bonneau:
A phonetic concatenative approach of labial coarticulation. 1402-1405 - Aseel Turkmani, Adrian Hilton, Philip J. B. Jackson, James D. Edge:
Visual analysis of lip coarticulation in VCV utterances. 1406-1409 - Matti Airas, Paavo Alku:
Comparison of multiple voice source parameters in different phonation types. 1410-1413 - Monja A. Knoll, Lisa Scharrer:
Acoustic and affective comparisons of natural and imaginary infant-, foreigner- and adult-directed speech. 1414-1417 - André Araújo, Luis M. T. Jesus, Isabel M. Costa:
Vowel production in two occlusal classes. 1418-1421 - Rajesh Khatiwada:
Nepalese retroflex stops: a static palatography study of inter- and intra-speaker variability. 1422-1425 - Charles A. Lamoureux, Victor J. Boucher:
Effects of testosterone levels on temporal and intonational aspects of speech: more exploratory data. 1426-1428
ASR: New Paradigms
- Tien Ping Tan, Laurent Besacier:
Modeling context and language variation for non-native speech recognition. 1429-1432 - Xufang Zhao, Douglas D. O'Shaughnessy:
An evaluation of cross-language adaptation and native speech training for rapid HMM construction based on very limited training data. 1433-1436 - Konstantin Markov, Satoshi Nakamura:
Never-ending learning with dynamic hidden Markov network. 1437-1440 - Catherine Breslin, Mark J. F. Gales:
Building multiple complementary systems using directed decision trees. 1441-1444 - Hiroaki Nanjo, Yuichi Oku, Takehiko Yoshimi:
Automatic speech recognition framework for multilingual audio contents. 1445-1448 - Ghazi Bouselmi, Dominique Fohr, Irina Illina:
Combined acoustic and pronunciation modelling for non-native speech recognition. 1449-1452 - Tadashi Emori, Yoshifumi Onishi, Koichi Shinoda:
Automatic estimation of scaling factors among probabilistic models in speech recognition. 1453-1456 - Emilian Stoimenov, John W. McDonough:
Memory efficient modeling of polyphone context with weighted finite-state transducers. 1457-1460 - Valeriy Pylypenko:
Extra large vocabulary continuous speech recognition algorithm based on information retrieval. 1461-1464 - I. Lee Hetherington:
PocketSUMMIT: small-footprint continuous speech recognition. 1465-1468 - Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. 1469-1472 - Chengyuan Ma, Chin-Hui Lee:
A study on word detector design and knowledge-based pruning and rescoring. 1473-1476 - Thomas Colthurst, Tresi Arvizo, Chia-Lin Kao, Owen Kimball, Stephen A. Lowe, David R. H. Miller, Jim Van Sciver:
Parameter tuning for fast speech recognition. 1477-1480 - Louis ten Bosch, Bert Cranen:
A computational model for unsupervised word discovery. 1481-1484 - Bernd T. Meyer, Matthias Wächter, Thomas Brand, Birger Kollmeier:
Phoneme confusions in human and automatic speech recognition. 1485-1488 - Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa:
Construction of spoken language model including fillers using filler prediction model. 1489-1492 - Raghunandan Kumaran, Jeff A. Bilmes, Katrin Kirchhoff:
Attention shift decoding for conversational speech recognition. 1493-1496
Speech and Language Technology for Less-resourced Languages
- Péter Mihajlik, Tibor Fegyó, Zoltán Tüske, Pavel Ircing:
A morpho-graphemic approach for the recognition of spontaneous speech in agglutinative languages - like Hungarian. 1497-1500 - Mei Yang, Jing Zheng, Andreas Kathol:
A semi-supervised learning approach for morpheme segmentation for an Arabic dialect. 1501-1504 - Gerhard B. Van Huyssteen, Martin J. Puttkammer:
Accelerating the annotation of lexical data for less-resourced languages. 1505-1508 - Christoph Draxler:
On web-based creation of speech resources for less-resourced languages. 1509-1512 - Miroslav Martinovic, Srdjdan Vesic, Goran Rakic:
Building an information retrieval system for Serbian - challenges and solutions. 1513-1516 - Guy De Pauw, Peter Waiganjo Wagacha:
Bootstrapping morphological analysis of Gĩkũyũ using unsupervised maximum entropy learning. 1517-1520 - Jerneja Zganec-Gros, Stanislav Gruden:
The voiceTRAN machine translation system. 1521-1524 - Sérgio Paulo, Luís C. Oliveira:
MuLAS: a framework for automatically building multi-tier corpora. 1525-1528 - Jacquelijn Ringersma, Marc Kemps-Snijders:
Creating multimedia dictionaries of endangered languages using LEXUS. 1529-1532 - Hrafn Loftsson, Eiríkur Rögnvaldsson:
IceNLP: a natural language processing toolkit for Icelandic. 1533-1536 - Marius Peche, Marelie H. Davel, Etienne Barnard:
Phonotactic spoken language identification with limited training data. 1537-1540 - Solomon Teferra Abate, Wolfgang Menzel:
Automatic speech recognition for an under-resourced language - Amharic. 1541-1544 - Abdillahi Nimaan, Pascal Nocera, Frédéric Béchet, Jean-François Bonastre:
Information retrieval strategies for accessing African audio corpora. 1545-1548 - Vesa Siivola, Mathias Creutz, Mikko Kurimo:
Morfessor and variKN machine learning tools for speech and language technology. 1549-1552 - Markpong Jongtaveesataporn, Issara Thienlikit, Chai Wutiwiwatchai, Sadaoki Furui:
Towards better language modeling for Thai LVCSR. 1553-1556
Adaptation in ASR I, II
- Jonas Lööf, Ralf Schlüter, Hermann Ney:
Efficient estimation of speaker-specific projecting feature transforms. 1557-1560 - Mohamed Kamal Omar:
Regularized feature-based maximum likelihood linear regression for speech recognition. 1561-1564 - Santiago Omar Caballero Morales, Stephen J. Cox:
Modelling confusion matrices to improve speech recognition accuracy, with an application to dysarthric speech. 1565-1568 - Qiang Huo, Wei Li:
An active approach to speaker and task adaptation based on automatic analysis of vocabulary confusability. 1569-1572 - Jing Zheng, Andreas Stolcke:
fMPE-MAP: improved discriminative adaptation for modeling new domains. 1573-1576 - Timothy J. Hazen, Erik McDermott:
Discriminative MCE-based speaker adaptation of acoustic models for a spoken lecture processing task. 1577-1580
Speech Perception I, II
- Douglas Brungart, Nandini Iyer:
Time-compressed speech perception with speech and noise maskers. 1581-1584 - Anne Cutler, Martin Cooke, María Luisa García Lecumberri, Dennis Pasveer:
L2 consonant identification in noise: cross-language comparisons. 1585-1588 - Jennifer T. Le, Catherine T. Best, Michael D. Tyler, Christian Kroos:
Effects of non-native dialects on spoken word recognition. 1589-1592 - Julien Meyer, Fanny Meunier, Laure Dentel:
Identification of natural whistled vowels by non-whistlers. 1593-1596 - Alexandra Jesse, James M. McQueen:
Prelexical adjustments to speaker idiosyncrasies: are they position-specific? 1597-1600 - Holger Mitterer:
Top-down effects on compensation for coarticulation are not replicable. 1601-1604
Spoken Language Understanding
- Christian Raymond, Giuseppe Riccardi:
Generative and discriminative algorithms for spoken language understanding. 1605-1608 - Elias Iosif, Alexandros Potamianos:
A soft-clustering algorithm for automatic induction of semantic classes. 1609-1612 - Agustín Gravano, Stefan Benus, Julia Hirschberg, Shira Mitchell, Ilia Vovsha:
Classification of discourse functions of affirmative words in spoken dialogue. 1613-1616 - Bogdan Minescu, Géraldine Damnati, Frédéric Béchet, Renato de Mori:
Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy. 1617-1620 - Jáchym Kolár, Yang Liu, Elizabeth Shriberg:
Speaker adaptation of language models for automatic dialog act segmentation of meetings. 1621-1624 - Amparo Albalate, Dimitar Dimitrov, Roberto Pieraccini:
Unsupervised categorisation approaches for technical support automated agents. 1625-1628
Pitch Extraction I, II
- Michael Wohlmayr, Marián Képesi:
Joint position-pitch extraction from multichannel audio. 1629-1632 - Hyun Soo Kim:
Morphological pre-processing technique and its applications on speech signal. 1633-1636 - Patricia A. Pelle, Claudio Estienne:
A pitch extraction system based on phase locked loops and consensus decision. 1637-1640 - Milan Legát, Jindrich Matousek, Daniel Tihelka:
A robust multi-phase pitch-mark detection algorithm. 1641-1644 - Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan:
Pitch estimation of noisy speech signals using empirical mode decomposition. 1645-1648 - Daniel Hirst, Hyongsil Cho, Sunhee Kim, Hyunji Yu:
Evaluating two versions of the Momel pitch modelling algorithm on a corpus of read speech in Korean. 1649-1652 - Hussein Hussein, Oliver Jokisch:
Hybrid electroglottograph and speech signal based algorithm for pitch marking. 1653-1656
Speech Coding and Transmission
- Saikat Chatterjee, Thippur V. Sreenivas:
Normalized two stage SVQ for minimum complexity wide-band LSF quantization. 1657-1660 - Peng Zhang, Changchun Bao:
A novel 2kb/s waveform interpolation speech coder based on non-negative matrix factorization. 1661-1664 - Ahmed Ismail, Yasser Dakroury, Hazem M. Abbas:
A novel energy distribution comparison approach for robust speech spectrum vector quantization. 1665-1668 - Ahmed Ismail, Yasser Dakroury, Hazem M. Abbas:
Novel low-band phase representation for low bit-rate speech coding. 1669-1672 - Chun-Feng Wu, Cheng-Lung Lee, Wen-Whei Chang:
Perceptual-based playout mechanisms for multi-stream voice over IP networks. 1673-1676 - Robert Zopf, Jes Thyssen, Juin-Hwey Chen:
Time-warping and re-phasing in packet loss concealment. 1677-1680 - Yannis Agiomyrgiannakis, Yannis Stylianou:
The harmonic model codec (HMC) framework for VoIP. 1681-1684 - Yannis Agiomyrgiannakis, Yannis Stylianou:
Bit-erasure channel decoding for GMM-based multiple description coding. 1685-1688 - Hua Yuan, Tiago H. Falk, Wai-Yip Chan:
Degradation-classification assisted single-ended quality measurement of speech. 1689-1692 - Alexander Raake, Sascha Spors, Jens Ahrens, Jitendra Ajmera:
Concept and evaluation of a downward-compatible system for spatial teleconferencing using automatic speaker clustering. 1693-1696 - Min-Ki Lee, Kyung-Tae Kim, Hong-Goo Kang, Dae Hee Youn:
Speech quality estimation using packet loss effects in CELP-type speech coders. 1697-1700 - Masahiro Oshikiri, Hiroyuki Ehara, Toshiyuki Morii, Tomofumi Yamanashi, Kaoru Satoh, Koji Yoshida:
An 8-32 kbit/s scalable wideband coder extended with MDCT-based bandwidth extension on top of a 6.8 kbit/s narrowband CELP coder. 1701-1704
Topics in Acoustic Modeling
- Robert Wielgat, Tomasz P. Zielinski, Pawel Swietojanski, Piotr Zoladz, Daniel Król, Tomasz Wozniak, Stanislaw Grabias:
Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation. 1705-1708 - Kai Yu, Mark J. F. Gales, Philip C. Woodland:
Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio. 1709-1712 - Hao Wu, Xihong Wu:
Context dependent syllable acoustic model for continuous Chinese speech recognition. 1713-1716 - Dimitris Oikonomidis, Vassilios Diakoloukas, Vassilios Digalakis:
A sub-optimal Viterbi-like search for linear dynamic models classification. 1717-1720 - Georg Heigold, Ralf Schlüter, Hermann Ney:
On the equivalence of Gaussian HMM and Gaussian HMM-like hidden conditional random fields. 1721-1724 - Stefano Scanzio, Pietro Laface, Roberto Gemello, Franco Mana:
Speeding-up neural network training using sentence and frame selection. 1725-1728 - Linquan Liu, Thomas Fang Zheng, Makoto Akabane, Ruxin Chen, Wenhu Wu:
Using a small development set to build a robust dialectal Chinese speech recognizer. 1729-1732
Confidence Measures (and Related Topics)
- Carlos Molina, Néstor Becerra Yoma, Fernando Huenupán, Claudio Garretón:
Unsupervised re-scoring of observation probability in Viterbi based on reinforcement learning by using confidence measure and HMM neighborhood. 1733-1736 - Shiuan-Sung Lin, François Yvon:
Optimization on decoding graphs by discriminative training. 1737-1740 - Stéphane Huet, Guillaume Gravier, Pascale Sébillot:
Morphosyntactic processing of n-best lists for improved recognition and confidence measure computation. 1741-1744 - Xiang Li, Juan M. Huerta:
How predictable is ASR confidence in dialog applications? 1745-1748 - Alexandre Allauzen:
Error detection in confusion network. 1749-1752 - Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition. 1753-1756 - Hamed Ketabdar, Mirko Hannemann, Hynek Hermansky:
Detection of out-of-vocabulary words in posterior based ASR. 1757-1760
Grapheme-to-Phoneme Conversion
- Daniela Braga, Luís Pinto Coelho, Fernando Gil Vianna Resende Jr.:
Homograph ambiguity resolution in front-end design for Portuguese TTS systems. 1761-1764 - Ghinwa F. Choueiter, Stephanie Seneff, James R. Glass:
New word acquisition using subword modeling. 1765-1768 - Samuel Thomas, Ashish Verma:
Language identification of person names using CF-IOF based weighing function. 1769-1772 - Henk van den Heuvel, Jean-Pierre Martens, Nanneke Konings:
G2P conversion of names: what can we do (better)? 1773-1776 - Ausdang Thangthai, Chai Wutiwiwatchai, Anocha Rugchatjaroen, Sittipong Saychum:
A learning method for Thai phonetization of English words. 1777-1780 - Steffen Werner, Rüdiger Hoffmann:
Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech. 1781-1784 - Nikos Tsourakis, Vassilios Digalakis:
A generic methodology of converting transliterated text to phonetic strings - case study: Greeklish. 1785-1788 - Rita Singh, Evandro B. Gouvêa, Bhiksha Raj:
Probabilistic deduction of symbol mappings for extension of lexicons. 1789-1792
Lexical and Prosodic Modeling
- Sergey Astrov, Joachim Hofer, Harald Höge:
Use of syllable center detection for improved duration modeling in Chinese Mandarin connected digits recognition. 1793-1796 - Thomas Pellegrini, Lori Lamel:
Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language. 1797-1800 - Sheng Qiang, Yao Qian, Frank K. Soong, Congfu Xu:
Robust F0 modeling for Mandarin speech recognition in noise. 1801-1804 - Dino Seppi, Daniele Falavigna, Georg Stemmer, Roberto Gretter:
Word duration modeling for word graph rescoring in LVCSR. 1805-1808 - Fabio Tamburini, Petra Wagner:
On automatic prominence detection for German. 1809-1812 - Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan:
Prosody-enriched lattices for improved syllable recognition. 1813-1816 - Joel Pinto, Andrew Lovitt, Hynek Hermansky:
Exploiting phoneme similarities in hybrid HMM-ANN keyword spotting. 1817-1820 - C. E. Liu, Kishan Thambiratnam, Frank Seide:
Online vocabulary adaptation using limited adaptation data. 1821-1824
Speech Recognition by Automatic Attribute Transcription
- Chin-Hui Lee, Mark A. Clements, Sorin Dusan, Eric Fosler-Lussier, Keith Johnson, Biing-Hwang Juang, Lawrence R. Rabiner:
An overview on automatic speech attribute transcription (ASAT). 1825-1828 - Ilana Bromberg, Qian Qian, Jun Hou, Jinyu Li, Chengyuan Ma, Brett Matthews, Antonio Moreno-Daniel, Jeremy Morris, Sabato Marco Siniscalchi, Yu Tsao, Yu Wang:
Detection-based ASR in the automatic speech attribute transcription project. 1829-1832 - Chi-Yueh Lin, Hsiao-Chuan Wang:
Attribute-based Mandarin speech recognition using conditional random fields. 1833-1836 - Helmer Strik, Khiet P. Truong, Febe de Wet, Catia Cucchiarini:
Comparing classifiers for pronunciation error detection. 1837-1840 - Jarek Krajewski, Bernd J. Kröger:
Using prosodic and spectral characteristics for sleepiness detection. 1841-1844 - Brian M. Ore, Raymond E. Slyh:
Score fusion for articulatory feature detection. 1845-1848
Speaker Diarization
- Scott Otterson:
Improved location features for meeting speaker diarization. 1849-1852 - Kyu Jeong Han, Shrikanth S. Narayanan:
A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system. 1853-1856 - Marijn Huijbregts, Chuck Wooters:
The blame game: performance analysis of speaker diarization system components. 1857-1860 - Hagai Aronowitz:
Trainable speaker diarization. 1861-1864 - Jing Huang, Etienne Marcheret, Karthik Visweswariah:
Improving speaker diarization for CHIL lecture meetings. 1865-1868 - Viet Bac Le, Odile Mella, Dominique Fohr:
Speaker diarization using normalized cross likelihood ratio. 1869-1872
First and Second Language Learning
- Wai-Sum Lee:
Tone production by the speakers of different age-and-gender groups. 1873-1876 - Nan Xu, Denis Burnham, Christine Kitamura:
Vowels and tones in infant directed speech: hyperarticulation for both, but different developmental patterns. 1877-1880 - Eon-Suk Ko:
Acquisition of vowel duration in children speaking American English. 1881-1884 - Hiroko Hirano, Keikichi Hirose, Goh Kawai, Wentao Gu, Nobuaki Minematsu:
F0 models show Chinese speakers of Japanese insert intonational boundaries and drop pitch. 1885-1888 - Paola Escudero, Jelle Kastelein, Klara A. Weiand, R. J. J. H. van Son:
Formal modelling of L1 and L2 perceptual learning: computational linguistics versus machine learning. 1889-1892 - Mirjam Broersma:
Kettle hinders cat, shadow does not hinder shed: activation of 'almost embedded' words in nonnative listening. 1893-1896
Speech Synthesis I, II
- Sacha Krstulovic, Anna Hunecke, Marc Schröder:
An HMM-based speech synthesis system applied to German and its adaptation to a limited set of expressive football announcements. 1897-1900 - Liang Gu, Wei Zhang, Lazkin Tahir, Yuqing Gao:
Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems. 1901-1904 - Wu Liu, Dezhi Huang, Yuan Dong, Xinnian Mao, Haila Wang:
A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis. 1905-1908 - Ranniery Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
A trainable excitation model for HMM-based speech synthesis. 1909-1912 - Jochen Steigner, Marc Schröder:
Cross-language phonemisation in German text-to-speech synthesis. 1913-1916 - Ryuki Tachibana, Tohru Nagano, Gakuto Kurata, Masafumi Nishimura, Noboru Babaguchi:
Preliminary experiments toward automatic generation of new TTS voices from recorded speech alone. 1917-1920
Phonetic Segmentation and Classification I, II
- Xiaochuan Niu, Jan P. H. van Santen:
Dual-channel acoustic detection of nasalization states. 1921-1924 - Tarun Pruthi, Carol Y. Espy-Wilson:
Acoustic parameters for the automatic detection of vowel nasalization. 1925-1928 - Jun Hou, Lawrence R. Rabiner, Sorin Dusan:
On the use of time-delay neural networks for highly accurate classification of stop consonants. 1929-1932 - Ladan Golipour, Douglas D. O'Shaughnessy:
A new approach for phoneme segmentation of speech signals. 1933-1936 - Veronique Stouten, Kris Demuynck, Hugo Van hamme:
Automatically learning the units of speech by non-negative matrix factorisation. 1937-1940 - Ozlem Kalinli, Shrikanth S. Narayanan:
A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech. 1941-1944 - Sung Jun An, Young-Ik Kim, Rhee Man Kil:
Zero-crossing-based ratio masking for sound segregation. 1945-1948 - Satomi Tanaka, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka:
Event detection of speech signals based on auditory processing with a dynamic compressive gammachirp filterbank. 1949-1952 - Odette Scharenborg, Mirjam Ernestus, Vincent Wan:
Segmentation of speech: child's play? 1953-1956 - Andrew Errity, John McKenna, Barry Kirkpatrick:
Dimensionality reduction methods applied to both magnitude and phase derived features. 1957-1960
Voice Conversion and Modification
- Zdenek Hanzlícek, Jindrich Matousek:
F0 transformation within the voice conversion framework. 1961-1964 - Daniel Erro, Asunción Moreno:
Weighted frequency warping for voice conversion. 1965-1968 - Daniel Erro, Asunción Moreno:
Frame alignment method for cross-lingual voice conversion. 1969-1972 - Jani Nurminen, Jilei Tian, Victor Popa:
Voicing level control with application in voice conversion. 1973-1976 - Winston S. Percybrooks, Elliot Moore:
New algorithm for LPC residual estimation from LSF vectors for a voice conversion system. 1977-1980 - Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. 1981-1984 - Petko Nikolov Petkov, W. Bastiaan Kleijn:
Improving the phase vocoder approach to pitch-shifting. 1985-1988 - Larbi Mesbahi, Vincent Barreaud, Olivier Boëffard:
Comparing GMM-based speech transformation systems. 1989-1992
Speaker Verification & Identification I-IV
- Michael Gerber, René Beutler, Beat Pfister:
Quasi text-independent speaker-verification based on pattern matching. 1993-1996 - Yosef A. Solewicz, Moshe Koppel:
Virtual fusion for speaker recognition. 1997-2000 - Yi-Hsiang Chao, Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang, Ruei-Chuan Chang:
Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification. 2001-2004 - Seiichi Nakagawa, Kouhei Asakawa, Longbiao Wang:
Speaker recognition by combining MFCC and phase information. 2005-2008 - Sandeep Manocha, Carol Y. Espy-Wilson:
A semi-automatic approach for speaker mining of tapped telephone conversations. 2009-2012 - Hao Yang, Yuan Dong, Xianyu Zhao, Jian Zhao, Liang Lu, Haila Wang:
Cluster adaptive training weights as features in SVM-based speaker verification. 2013-2016 - Hideki Okamoto, Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Study on speaker verification with non-audible murmur segments. 2017-2020 - Xugang Lu, Jianwu Dang:
Dimension reduction for speaker identification based on mutual information. 2021-2024 - Jonas Lindh, Anders Eriksson:
Robustness of long time measures of fundamental frequency. 2025-2028 - Vinod Prakash, John H. L. Hansen:
Score distribution scaling for speaker recognition. 2029-2032 - Andrew C. Morris, Jacques C. Koreman, B. Ly-Van, Harin Sellahewa, Sabah Jassim, R. Llarena Gómez:
Global features for rapid identity verification with dynamic biometric data. 2033-2036 - Tuan Van Pham, Michael Neffe, Gernot Kubin:
Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments. 2037-2040 - Fernando Huenupán, Néstor Becerra Yoma, Carlos Molina, Claudio Garretón:
Speaker verification with multiple classifier fusion using Bayes based confidence measure. 2041-2044 - Girija Chetty, Michael Wagner:
Audiovisual speaker identity verification based on lip motion features. 2045-2048 - Gökhan Tür, Elizabeth Shriberg, Andreas Stolcke, Sachin S. Kajarekar:
Duration and pronunciation conditioned lexical modeling for speaker verification. 2049-2052 - Jean-François Bonastre, Driss Matrouf, Corinne Fredouille:
Artificial impostor voice transformation effects on false acceptance rates. 2053-2056
Improved Acoustic Modeling for ASR
- Jen-Wei Kuo, Hung-Yi Lo, Hsin-Min Wang:
Improved HMM/SVM methods for automatic phoneme segmentation. 2057-2060 - Takahiro Shinozaki, Tatsuya Kawahara:
Gaussian mixture optimization for HMM based on efficient cross-validation. 2061-2064 - Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
Model-space MLLR for trajectory HMMs. 2065-2068 - Hamed Ketabdar, Hervé Bourlard:
In-context phone posteriors as complementary features for tandem ASR. 2069-2072 - Qian Qian, Xiaodong He, Li Deng:
Phone-discriminating minimum classification error (p-MCE) training for phonetic recognition. 2073-2076 - Lori Lamel, Abdelkhalek Messaoudi, Jean-Luc Gauvain:
Improved acoustic modeling for transcribing Arabic broadcast data. 2077-2080 - Erik McDermott, Atsushi Nakamura:
String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task. 2081-2084 - Byung Ok Kang, Ho-Young Jung, Yunkeun Lee:
Discriminative noise adaptive training approach for an environment migration. 2085-2088 - Jia-Yu Chen, Peder A. Olsen, John R. Hershey:
Word confusability - measuring hidden Markov model similarity. 2089-2092 - Thomas Deselaers, Georg Heigold, Hermann Ney:
Speech recognition with state-based nearest neighbour classifiers. 2093-2096 - Remco Teunen, Masami Akamine:
HMM-based speech recognition using decision trees instead of GMMs. 2097-2100 - Christian Gollan, Stefan Hahn, Ralf Schlüter, Hermann Ney:
An improved method for unsupervised training of LVCSR systems. 2101-2104 - Mohamed Kamal Omar:
A variational approach to robust maximum likelihood estimation for speech recognition. 2105-2108 - Kai Yu, Rob A. Rutenbar:
Generating small, accurate acoustic models with a modified Bayesian information criterion. 2109-2112 - Peter Bell, Simon King:
Sparse Gaussian graphical models for speech recognition. 2113-2116 - Sakriani Sakti, Konstantin Markov, Satoshi Nakamura:
An HMM acoustic model incorporating various additional knowledge sources. 2117-2120 - Matti Varjokallio, Mikko Kurimo:
Comparison of subspace methods for Gaussian mixture models in speech recognition. 2121-2124
Multilingualism in Speech and Language Processing
- Tanja Schultz, Alan W. Black, Sameer Badaskar, Matthew Hornyak, John Kominek:
SPICE: web-based tools for rapid language adaptation in speech processing systems. 2125-2128 - Filip Deprez, Jan Odijk, Jan De Moortel:
Introduction to multilingual corpus-based concatenative speech synthesis. 2129-2132 - Frederik Stouten, Jean-Pierre Martens:
Recognition of foreign names spoken by native speakers. 2133-2136 - Ricardo de Córdoba, Luis Fernando D'Haro, Fernando Fernández Martínez, Juan Manuel Montero, Roberto Barra-Chicote:
Language identification using several sources of information with a multiple-Gaussian classifier. 2137-2140 - Carmen del Solar, Guillermo Pérez García, Eva Florencio, David Moral, Gabriel Amores Carredano, Pilar Manchón Portillo:
Dynamic language change in MIMUS. 2141-2144
Systems for LVCSR and Rich Transcription I, II
- Jonas Lööf, Christian Gollan, Stefan Hahn, Georg Heigold, Björn Hoffmeister, Christian Plahl, David Rybach, Ralf Schlüter, Hermann Ney:
The RWTH 2007 TC-STAR evaluation system for European English and Spanish. 2145-2148 - Chin-Wei Eugene Koh, Hanwu Sun, Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma, Engsiong Chng, Haizhou Li, Susanto Rahardja:
Using direction of arrival estimate and acoustic feature information in speaker diarization. 2149-2152 - Fernando Batista, Diamantino Caseiro, Nuno J. Mamede, Isabel Trancoso:
Recovering punctuation marks for automatic speech recognition. 2153-2156 - Jui-Feng Yeh, Chung-Hsien Wu, Wei-Yen Wu:
Disfluency correction of spontaneous speech using conditional random fields with variable-length features. 2157-2160 - Jing Huang, Etienne Marcheret, Karthik Visweswariah, Vit Libal, Gerasimos Potamianos:
Detection, diarization, and transcription of far-field lecture speech. 2161-2164 - Timothy J. Hazen, Brennan Sherry, Mark Adler:
Speech-based annotation and retrieval of digital photographs. 2165-2168
Language Learning and Assessment
- Joseph Tepperman, Abe Kazemzadeh, Shrikanth S. Narayanan:
A text-free approach to assessing nonnative intonation. 2169-2172 - John Lee, Stephanie Seneff:
Automatic generation of cloze items for prepositions. 2173-2176 - Christopher J. Waple, Hongcui Wang, Tatsuya Kawahara, Yasushi Tsubota, Masatake Dantsuji:
Evaluating and optimizing Japanese tutor system featuring dynamic question generation and interactive guidance. 2177-2180 - Catia Cucchiarini, Ambra Neri, Febe de Wet, Helmer Strik:
ASR-based pronunciation training: scoring accuracy and pedagogical effectiveness of a system for Dutch L2 learners. 2181-2184 - Joseph Tepperman, Matthew Black, Patti Price, Sungbok Lee, Abe Kazemzadeh, Matteo Gerosa, Margaret Heritage, Abeer Alwan, Shrikanth S. Narayanan:
A Bayesian network classifier for word-level reading assessment. 2185-2188
Multimodal Interaction: Analysis and Technology
- Hartwig Holzapfel, Alex Waibel:
Behavior models for learning and receptionist dialogs. 2189-2192 - Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen, Aleksi Melto, Topi Hurtig:
Design of a rich multimodal interface for mobile spoken route guidance. 2193-2196 - Mariët Theune, Dennis Hofs, Marco van Kessel:
The virtual guide: a direction giving embodied conversational agent. 2197-2200 - Sudeep Gandhe, David R. Traum:
Creating spoken dialogue characters from corpora without annotations. 2201-2204 - Pui-Yu Hui, Zhengyu Zhou, Helen M. Meng:
Complementarity and redundancy in multimodal user inputs with speech and pen gestures. 2205-2208 - Linda Bell, Joakim Gustafson:
Children's convergence in referring expressions to graphical objects in a speech-enabled computer game. 2209-2212
Emotion
- Hiromi Kawatsu, Sumio Ohno:
An analysis of individual differences in the f0 contour and the duration of anger utterances at several degrees. 2213-2216 - Yoshiko Arimoto, Sumio Ohno, Hitoshi Iida:
Acoustic features of anger utterances during natural dialog. 2217-2220 - Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg, Wisam Dakka:
Comparing American and Palestinian perceptions of charisma using acoustic-prosodic and lexical analysis. 2221-2224 - Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan:
Using neutral speech models for emotional speech analysis. 2225-2228 - N. Satoh, Katsuya Yamauchi, Shoichi Matsunaga, Masaru Yamashita, R. Nakagawa, Kazuyuki Shinohara:
Emotion clustering using the results of subjective opinion tests for emotion recognition in infants' cries. 2229-2232 - Roberto Barra-Chicote, Juan Manuel Montero, Javier Macías Guarasa, Juana M. Gutiérrez-Arriola, Javier Ferreiros, José Manuel Pardo:
On the limitations of voice conversion techniques in emotion identification tasks. 2233-2236 - Kate Dupuis, Kathleen Pichora-Fuller:
Use of lexical and affective prosodic cues to emotion by younger and older adults. 2237-2240 - Purnima Gupta, Nitendra Rajput:
Two-stream emotion recognition for call center monitoring. 2241-2244 - Ioulia Grichkovtsova, Anne Lacheret, Michel Morel:
The role of intonation and voice quality in the affective speech perception. 2245-2248 - Bogdan Vlasenko, Björn W. Schuller, Andreas Wendemuth, Gerhard Rigoll:
Combining frame and turn-level information for robust recognition of emotions within speech. 2249-2252
Speakers: Expression, Emotion and Personality Recognition
- Björn W. Schuller, Anton Batliner, Dino Seppi, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loïc Kessous, Vered Aharonson:
The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. 2253-2256 - Minh-Quang Vu, Laurent Besacier, Eric Castelli:
Automatic question detection: prosodic-lexical features and crosslingual experiments. 2257-2260 - Makoto Tachibana, Keigo Kawashima, Junichi Yamagishi, Takao Kobayashi:
Performance evaluation of HMM-based style classification with a small amount of training data. 2261-2264 - Khiet P. Truong, David A. van Leeuwen:
Visualizing acoustic similarities between emotions in speech: an acoustic map of emotions. 2265-2268 - Hao Hu, Ming-Xing Xu, Wei Wu:
Fusion of global statistical and segmental spectral features for speech emotion recognition. 2269-2272 - Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
Group delay features for emotion detection. 2273-2276 - Christian A. Müller, Felix Burkhardt:
Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age. 2277-2280 - Frank Enos, Elizabeth Shriberg, Martin Graciarena, Julia Hirschberg, Andreas Stolcke:
Detecting deception using critical segments. 2281-2284 - Takashi Nose, Yoichi Kato, Takao Kobayashi:
Style estimation of speech based on multiple regression hidden semi-Markov model. 2285-2288 - Chi Zhang, John H. L. Hansen:
Analysis and classification of speech mode: whispered through shouted. 2289-2292
First Language, Second Language, Cross-language
- Melissa Bettoni-Techio, Andréia S. Rauber, Rosana Denise Koerich:
Perception and production of word-final alveolar stops by Brazilian Portuguese learners of English. 2293-2296 - Denise Cristina Kluge, Andréia S. Rauber, Mara Silvia Reis, Ricardo Augusto Hoffmann Bion:
The relationship between the perception and production of English nasal codas by Brazilian learners of English. 2297-2300 - Takafumi Utashiro, Goh Kawai:
CALL courseware for learning reactive tokens in face-to-face dialogs. 2301-2304 - Shinya Kiriyama, Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Naofumi Otani, Hiroaki Horiuchi, Yoichi Takebayashi, Shigeyoshi Kitazawa:
The developmental analysis of demonstrative expression skills utilizing a multimodal infant behavior corpus. 2305-2308 - Elena E. Lyakso, Olga V. Frolova:
Russian vowels system acoustic features development in ontogenesis. 2309-2312 - Petra van Alphen, Elise de Bree, Paula Fikkert, Frank Wijnen:
The role of metrical stress in comprehension and production in Dutch children at-risk of dyslexia. 2313-2316 - Seiichi Nakagawa, Kei Ohta:
A statistical method of evaluating pronunciation proficiency for presentation in English. 2317-2320 - Akiyo Joto, Yoshiki Nagase, Seiya Funatsu:
The intelligibility and its relations to acoustic characteristics of English /s/ and /esh/ produced by native speakers of Japanese. 2321-2324 - Martijn Goudbeek, Daniel Swingley, Keith R. Kluender:
The limits of multidimensional category learning. 2325-2328 - Maria Uther, James Uther, Panos Athanasopoulos, Pushpendra Singh, Reiko Akahane-Yamada:
Mobile adaptive CALL (MAC): a lightweight speech-based intervention for mobile language learners. 2329-2332 - Catherine T. Best, Pierre A. Hallé, Jennifer S. Pardo:
English and French speakers' perception of voicing distinctions in non-native lateral consonant syllable onsets. 2333-2336 - Francisco Lacerda, Lisa Gustavsson:
Predicting the consequences of vocalizations in early infancy. 2337-2340 - David Weenink, Guangqin Chen, Zongyan Chen, Stefan de Konink, Dennis Vierkant, Eveline van Hagen, R. J. J. H. van Son:
Learning tone distinctions for Mandarin Chinese. 2341-2344 - Catherine Lai, Kyle Gorman, Jiahong Yuan, Mark Y. Liberman:
Perception of disfluency: language differences and listener bias. 2345-2348
Language Modeling I, II
- Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui, Haruo Yokota:
Dynamic language model adaptation using presentation slides for lecture speech recognition. 2349-2352 - Cosmin Munteanu, Gerald Penn, Ronald Baecker:
Web-based language modelling for automatic lecture transcription. 2353-2356 - Tanel Alumäe, Toomas Kirt:
LSA-based language model adaptation for highly inflected languages. 2357-2360 - Aaron Heidel, Hung-An Chang, Lin-Shan Lee:
Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm. 2361-2364 - Sibel Yaman, Jen-Tzung Chien, Chin-Hui Lee:
Structural Bayesian language modeling and adaptation. 2365-2368 - Ciro Martins, António J. S. Teixeira, João Paulo Neto:
Vocabulary selection for a broadcast news transcription system using a morpho-syntactic approach. 2369-2372 - Nguyen Bach, Mohamed Noamany, Ian R. Lane, Tanja Schultz:
Handling OOV words in Arabic ASR via flexible morphological constraints. 2373-2376 - Raquel Justo, M. Inés Torres:
Phrases in category-based language models for Spanish and Basque ASR. 2377-2380 - Ebru Arisoy, Hasim Sak, Murat Saraclar:
Language modeling for automatic Turkish broadcast news transcription. 2381-2384
Spoken Data Retrieval I, II
- Roy Wallace, Robbie Vogt, Sridha Sridharan:
A phonetic search approach to the 2006 NIST spoken term detection evaluation. 2385-2388 - Yoshiaki Itoh, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
An integration method of retrieval results using plural subword models for vocabulary-free spoken document retrieval. 2389-2392 - Dimitra Vergyri, Izhak Shafran, Andreas Stolcke, Venkata Ramana Rao Gadde, Murat Akbacak, Brian Roark, Wen Wang:
The SRI/OGI 2006 spoken term detection system. 2393-2396 - Masataka Goto, Jun Ogata, Kouichirou Eto:
PodCastle: a web 2.0 approach to speech recognition research. 2397-2400 - Nathalie Camelin, Frédéric Béchet, Géraldine Damnati, Renato de Mori:
Speech mining in noisy audio message corpus. 2401-2404 - Jian Shao, Qingwei Zhao, Pengyuan Zhang, Zhaojie Liu, Yonghong Yan:
A fast fuzzy keyword spotting algorithm based on syllable confusion network. 2405-2408 - Wooil Kim, John H. L. Hansen:
Advances in SpeechFind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR. 2409-2412 - Benoît Favre, Jean-François Bonastre, Patrice Bellot:
An interactive timeline for speech database browsing. 2413-2416
Novel Techniques for the NATO Non-native Air-traffic Control and HIWIRE Cockpit Databases
- Stéphane Pigeon, Wade Shen, Aaron D. Lawson, David A. van Leeuwen:
Design and characterization of the non-native military air traffic communications database (nnMATC). 2417-2420 - Wade Shen, Douglas A. Reynolds:
A comparison of speaker clustering and speech recognition techniques for air situational awareness. 2421-2424 - Dimitrios Dimitriadis, José C. Segura, Luz García, Alexandros Potamianos, Petros Maragos, Vassilis Pitsikalis:
Advanced front-end for robust speech recognition in extremely adverse environments. 2425-2428 - Roberto Gemello, Franco Mana, Stefano Scanzio:
Experiments on HIWIRE database using denoising and adaptation with a hybrid HMM-ANN model. 2429-2432 - Brett Y. Smolenski:
Detection and removal of switching noise in push-to-talk and voice operated exchange communications systems. 2433-2436 - Luis Buera, Antonio Miguel, Oscar Saz, Eduardo Lleida, Alfonso Ortega:
Evaluation of the combined use of MEMLIN and MLLR on the non-native adaptation task of HIWIRE project database. 2437-2440
Systems for Spoken Language Translation I, II
- Daniel Déchelotte, Holger Schwenk, Gilles Adda, Jean-Luc Gauvain:
Improved machine translation of speech-to-text outputs. 2441-2444 - Shirin Saleem, Krishna Subramanian, Rohit Prasad, David Stallard, Chia-Lin Kao, Prem Natarajan, Raid Suleiman:
Improvements in machine translation for English/Iraqi speech translation. 2445-2448 - Evgeny Matusov, Dustin Hillard, Mathew Magimai-Doss, Dilek Hakkani-Tür, Mari Ostendorf, Hermann Ney:
Improving speech translation with automatic boundary prediction. 2449-2452 - Roldano Cattoni, Nicola Bertoldi, Marcello Federico:
Punctuating confusion networks for speech translation. 2453-2456 - Aarthi M. Reddy, Richard C. Rose, Alain Désilets:
Integration of ASR and machine translation models in a document translation task. 2457-2460 - Yik-Cheung Tam, Tanja Schultz:
Bilingual LSA-based translation lexicon adaptation for spoken language translation. 2461-2464
Articulatory Features
- Korin Richmond:
A multitask learning perspective on acoustic-articulatory inversion. 2465-2468 - Chao Qin, Miguel Á. Carreira-Perpiñán:
A comparison of acoustic features for articulatory inversion. 2469-2472 - Odette Scharenborg, Vincent Wan:
Can unquantised articulatory feature continuums be modelled? 2473-2476 - Milind S. Shah, Prem C. Pandey:
Estimation of place of articulation in stop consonants for visual feedback. 2477-2480 - Blaise Potard, Yves Laprie:
Compact representations of the articulatory-to-acoustic mapping. 2481-2484 - Joe Frankel, Mathew Magimai-Doss, Simon King, Karen Livescu, Özgür Çetin:
Articulatory feature classifiers trained on 2000 hours of telephone speech. 2485-2488
Wideband Speech Processing
- Amr H. Nour-Eldin, Peter Kabal:
Objective analysis of the effect of memory inclusion on bandwidth extension of narrowband speech. 2489-2492 - Bernd Geiser, Hervé Taddei, Peter Vary:
Artificial bandwidth extension without side information for ITU-T G.729.1. 2493-2496 - Hannu Pulakka, Paavo Alku, Laura Laaksonen, Päivi Valve:
The effect of highband harmonic structure in the artificial bandwidth expansion of telephone speech. 2497-2500 - Shingo Kuroiwa, Masashi Takashina, Satoru Tsuge, Fuji Ren:
Artificial bandwidth extension for speech signals using speech recognition. 2501-2504 - Driss Guerchi, Tamer Rabie, Abdelrhani Louzi:
Voicing-based codebook in low-rate wideband CELP coding. 2505-2508 - Ethan Robert Duni, Bhaskar D. Rao:
Performance of speaker-dependent wideband speech coding. 2509-2512
Accessibility Issues
- Philippe Dreuw, David Rybach, Thomas Deselaers, Morteza Zahedi, Hermann Ney:
Speech recognition techniques for a sign language recognition system. 2513-2516 - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. 2517-2520 - Petr Cerva, Jan Nouza:
Design and development of voice controlled aids for motor-handicapped persons. 2521-2524 - Kouichi Katsurada, Yuji Okuma, Makoto Yano, Yurie Iribe, Tsuneo Nitta:
Management of static/dynamic properties in a multimodal interaction system. 2525-2528 - Rubén San Segundo, Alicia Pérez, Daniel Ortiz, Luis Fernando D'Haro, M. Inés Torres, Francisco Casacuberta:
Evaluation of alternatives on speech to sign language translation. 2529-2532 - Géza Németh, Gábor Olaszy, Mátyás Bartalis, Géza Kiss, Csaba Zainkó, Péter Mihajlik:
Speech based drug information system for aged and visually impaired persons. 2533-2536 - Waldo Nogueira, Tamás Harczos, Bernd Edler, Jörn Ostermann, Andreas Büchner:
Automatic speech recognition with a cochlear implant front-end. 2537-2540 - Soo-Young Suk, Hiroaki Kojima:
Voice activated powered wheelchair with non-voice rejection algorithm. 2541-2544 - Laurianne Sitbon, Patrice Bellot, Philippe Blache:
Phonetic based sentence level rewriting of questions typed by dyslexic spellers in an information retrieval context. 2545-2548
New Application Areas
- André Berton, Peter Regel-Brietzmann, Hans Ulrich Block, Stefanie Schachtl, Manfred Gehrke:
How to integrate speech-operated internet information dialogs into a car. 2549-2552 - James R. Glass, Timothy J. Hazen, D. Scott Cyphers, Igor Malioutov, David Huynh, Regina Barzilay:
Recent progress in the MIT spoken lecture processing project. 2553-2556 - Philipp Fischer, Andreas Österle, André Berton, Peter Regel-Brietzmann:
How to personalize speech applications for web-based information in a car. 2557-2560 - Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Topic estimation with domain extensibility for guiding user's out-of-grammar utterances in multi-domain spoken dialogue systems. 2561-2564 - Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa:
Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system. 2565-2568 - Satoshi Tamura, Kunihiko Takamatsu, Shinji Ogura, Satoru Hayamizu:
GEMSIS - a novel application of speech recognition to emergency and disaster medicine. 2569-2572 - Rachel Coulston, Esther Klabbers, Jacques de Villiers, John-Paul Hosom:
Application of speech technology in a home based assessment kiosk for early detection of Alzheimer's disease. 2573-2576 - Olga Vybornova, Monica Gemo, Ronald Moncarey, Benoît Macq:
Ontology-based multimodal high level fusion involving natural language analysis for aged people home care application. 2577-2580
Story Segmentation
- Shing-kai Chan, Lei Xie, Helen M. Meng:
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation. 2581-2584 - James G. Fung, Dilek Hakkani-Tür, Mathew Magimai-Doss, Elizabeth Shriberg, Sébastien Cuendet, Nikki Mirghafori:
Cross-linguistic analysis of prosodic features for sentence segmentation. 2585-2588 - Andrew Rosenberg, Mehrbod Sharifi, Julia Hirschberg:
Varying input segmentation for story boundary detection in English, Arabic and Mandarin broadcast news. 2589-2592 - BalaKrishna Kolluru, Yoshihiko Gotoh:
Speaker role based structural classification of broadcast news stories. 2593-2596
Systems for LVCSR and Rich Transcription I, II
- Ümit Güz, Sébastien Cuendet, Dilek Hakkani-Tür, Gökhan Tür:
Co-training using prosodic and lexical information for sentence segmentation. 2597-2600 - Yannick Estève, Sylvain Meignier, Paul Deléglise, Julie Mauclair:
Extracting true speaker identities from transcriptions. 2601-2604 - Rong Fu, Ian D. Benest:
An improved speaker diarization system. 2605-2608 - Sebastian Stüker, Christian Fügen, Florian Kraft, Matthias Wölfel:
The ISL 2007 English speech transcription system for European Parliament speeches. 2609-2612 - Mei-Yuh Hwang, Wen Wang, Xin Lei, Jing Zheng, Özgür Çetin, Gang Peng:
Advances in Mandarin broadcast speech recognition. 2613-2616 - Jun Ogata, Masataka Goto, Kouichirou Eto:
Automatic transcription for a web 2.0 service to search podcasts. 2617-2620
Prosody: Production
- Matthias Jilka, Bernd Möbius:
The influence of vowel quality features on peak alignment. 2621-2624 - Yen-Liang Shue, Markus Iseli, Nanette Veilleux, Abeer Alwan:
Pitch accent versus lexical stress: quantifying acoustic measures related to the voice source. 2625-2628 - Stefan Benus, Agustín Gravano, Julia Hirschberg:
Prosody, emotions, and... 'whatever'. 2629-2632 - Wentao Gu, Rerrario Shui-Ching Ho, Tan Lee:
Modeling tones in Hakka on the basis of the command-response model. 2633-2636 - Gerrit Kentner:
Length, ordering preference and intonational phrasing: evidence from pauses. 2637-2640 - Jörg Peters, Judith Hanssen, Carlos Gussenhoven:
Alignment of the second low target in Dutch falling-rising pitch contours. 2641-2644 - Helena Moniz, Ana Isabel Mata, Céu Viana:
On filled-pauses and prolongations in European Portuguese. 2645-2648
Prosody: Perception
- Michael Olsberg, Yi Xu, Jeremy Green:
Dependence of tone perception on syllable perception. 2649-2652 - Ralf Winkler:
Testing the relevance of speech rate, pitch and a glottal chink for the perception of age in synthesized speech using formant synthesis. 2653-2656 - Tamás Böhm, Stefanie Shattuck-Hufnagel:
Utterance-final glottalization as a cue for familiar speaker recognition. 2657-2660 - Chun-Fang Huang, Masato Akagi:
A rule-based speech morphing for verifying a expressive speech perception model. 2661-2664 - Elina Helander, Jani Nurminen:
On the importance of pure prosody in the perception of speaker identity. 2665-2668 - Shi-Han Chen, Chih-Chung Kuo:
Perceptual relevance of pitch contours of Mandarin tones and its efficacy in prosody generation of speech synthesis. 2669-2672 - Hiromitsu Nishizaki, Mitsuhiro Somiya, Kenji Kobayashi, Yoshihiro Sekiguchi:
The effect of filled pauses in a lecture speech on impressive evaluation of listeners. 2673-2676 - Yujia Li, Tan Lee:
Perceptual equivalence of approximated Cantonese tone contours. 2677-2680 - Suleman Shahid, Emiel Krahmer, Marc Swerts:
Audiovisual emotional speech of game playing children: effects of age and culture. 2681-2684
Machine Learning for Spoken Dialog Systems
- Oliver Lemon, Olivier Pietquin:
Machine learning for spoken dialogue systems. 2685-2688 - Verena Rieser, Oliver Lemon:
Learning dialogue strategies for interactive database search. 2689-2692 - Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, Hiroshi Shimodaira:
Hierarchical dialogue optimization using semi-Markov decision processes. 2693-2696 - Hua Ai, Diane J. Litman:
Knowledge consistent user simulations for dialog systems. 2697-2700 - Hsu-Chih Wu, Stephanie Seneff:
Reducing recognition error rate based on context relationships among dialogue turns. 2701-2704 - Teruhisa Misu, Tatsuya Kawahara:
Bayes risk-based optimization of dialogue management for document retrieval system with speech interface. 2705-2708
Spoken Dialog Systems I, II
- Dong Yu, Yun-Cheng Ju, Ye-Yi Wang, Geoffrey Zweig, Alex Acero:
Automated directory assistance system - from theory to practice. 2709-2712 - Geoffrey Zweig, Patrick Nguyen, Yun-Cheng Ju, Ye-Yi Wang, Dong Yu, Alex Acero:
The voice-rate dialog system for consumer ratings. 2713-2716 - Andi Winterboer, Jiang Hu, Johanna D. Moore, Clifford Nass:
The influence of user tailoring and cognitive load on user performance in spoken dialogue systems. 2717-2720 - Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Geoffrey Zweig, Alex Acero:
Confidence measures for voice search applications. 2721-2724 - Ryuichiro Higashinaka, Kohji Dohsaka, Shigeaki Amano, Hideki Isozaki:
Effects of quiz-style information presentation on user understanding. 2725-2728 - Hong-Kwang Jeff Kuo, Vaibhava Goel:
A data visualization and analysis method for natural language call routing system design. 2729-2732
Phonetics
- Christiane Ulbrich, Horst Ulbrich:
Realisations and alternations in German /r/-realisation. 2733-2736 - Christopher S. Doty, Kaori Idemaru, Susan G. Guion:
Singleton and geminate stops in Finnish - acoustic correlates. 2737-2740 - Christophe Van Bael, R. Harald Baayen, Helmer Strik:
Segment deletion in spontaneous speech: a corpus study using mixed effects models with crossed random effects. 2741-2744 - Hongying Zheng, Peter W. M. Tsang, William S.-Y. Wang:
Categorical perception of Cantonese tones in context: a cross-linguistic study. 2745-2748 - Yiya Chen, Jiahong Yuan:
A corpus study of the 3rd tone sandhi in standard Chinese. 2749-2752 - Jonathan Harrington, Sallyanne Palethorpe, Catherine I. Watson:
Age-related changes in fundamental frequency and formants: a longitudinal study of four speakers. 2753-2756
Pitch Extraction I, II
- Jasha Droppo, Alex Acero:
A fine pitch model for speech. 2757-2760 - Prasanta Kumar Ghosh, Antonio Ortega, Shrikanth S. Narayanan:
Pitch period estimation using multipulse model and wavelet transform. 2761-2764 - Martin Heckmann, Frank Joublin, Christian Goerick:
Combining rate and place information for robust pitch extraction. 2765-2768 - Heidi Christensen, Ning Ma, Stuart N. Wrigley, Jon Barker:
Integrating pitch and localisation cues at a speech fragment level. 2769-2772 - Jean-Sylvain Liénard, François Signol, Claude Barras:
Speech fundamental frequency estimation using the alternate comb. 2773-2776 - Andrew Rosenberg, Julia Hirschberg:
Detecting pitch accent using pitch-corrected energy-based predictors. 2777-2780
Spoken Language Understanding and Summarization
- Jian Zhang, Ricky Ho Yin Chan, Pascale Fung, Lu Cao:
A comparative study on speech summarization of broadcast news and lecture speech. 2781-2784 - Gabriel Murray, Steve Renals:
Towards online speech summarization. 2785-2788 - Tomoyuki Yamagata, Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki:
System request detection in conversation based on acoustic and speaker alternation features. 2789-2792 - Michael Levit, Elizabeth Boschee, Marjorie Freedman:
Selecting on-topic sentences from natural language corpora. 2793-2796 - Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee:
A semi-supervised method for efficient construction of statistical spoken language understanding resources. 2797-2800 - Yasuhisa Fujii, Norihide Kitaoka, Seiichi Nakagawa:
Automatic extraction of cue phrases for important sentences in lecture speech and automatic lecture speech summarization. 2801-2804 - Yi-Ting Chen, Hsuan-Sheng Chiu, Hsin-Min Wang, Berlin Chen:
A unified probabilistic generative framework for extractive spoken document summarization. 2805-2808 - Matthieu Hébert:
Generic class-based statistical language models for robust speech understanding in directed dialog applications. 2809-2812 - Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Alex Acero:
Robust location understanding in spoken dialog systems using intersections. 2813-2816
Systems for Spoken Language Translation I, II
- David Stallard, Fred Choi, Chia-Lin Kao, Kriste Krstovski, Premkumar Natarajan, Rohit Prasad, Shirin Saleem, Krishna Subramanian:
The BBN 2007 displayless English/Iraqi speech-to-speech translation system. 2817-2820 - Ruhi Sarikaya, Yonggang Deng, Yuqing Gao:
Context dependent word modeling for statistical machine translation using part-of-speech tags. 2821-2824 - Darren Scott Appling, Nick Campbell:
Translating conversational speech to standard linguistic form. 2825-2828 - Caroline Lavecchia, Kamel Smaïli, David Langlois, Jean Paul Haton:
Using inter-lingual triggers for machine translation. 2829-2832 - Daniele Falavigna, Nicola Bertoldi, Fabio Brugnara, Roldano Cattoni, Mauro Cettolo, Boxing Chen, Marcello Federico, Diego Giuliani, Roberto Gretter, Deepa Gupta, Dino Seppi:
The IRST English-Spanish translation system for European Parliament speeches. 2833-2836 - Christian Fügen, Muntsin Kolss:
The influence of utterance chunking on machine translation performance. 2837-2840 - Kristin Precoda, Jing Zheng, Dimitra Vergyri, Horacio Franco, Colleen Richey, Andreas Kathol, Sachin S. Kajarekar:
Iraqcomm: a next generation translation system. 2841-2844 - Sharath Rao, Ian R. Lane, Tanja Schultz:
Optimizing sentence segmentation for spoken language translation. 2845-2848
Speech Synthesis I, II
- Suphattharachai Chomphan, Takao Kobayashi:
Implementation and evaluation of an HMM-based Thai speech synthesis system. 2849-2852 - Davide Bonardo, Enrico Zovato:
Speech synthesis enhancement in noisy environments. 2853-2856 - Helmut Schmid, Bernd Möbius, Julia Weidenkaff:
Tagging syllable boundaries with joint n-gram models. 2857-2860 - Jun Xu, Dezhi Huang, Yongxin Wang, Yuan Dong, Lianhong Cai, Haila Wang:
Hierarchical non-uniform unit selection based on prosodic structure. 2861-2864 - Peter Birkholz:
Control of an articulatory speech synthesizer based on dynamic approximation of spatial articulatory targets. 2865-2868 - Nobuyuki Nishizawa, Hisashi Kawai:
A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis. 2869-2872 - Guntram Strecha, Matthias Eichner, Rüdiger Hoffmann:
Line cepstral quefrencies and their use for acoustic inventory coding. 2873-2876 - Peter Cahill, Daniel Aioanei, Julie Carson-Berndsen:
Articulatory acoustic feature applications in speech synthesis. 2877-2880 - Aleksandra Krul, Géraldine Damnati, François Yvon, Cédric Boidin, Thierry Moudenc:
Approaches for adaptive database reduction for text-to-speech synthesis. 2881-2884 - Richard Tzong-Han Tsai, Hsi-Chuan Hung, Hong-Jie Dai, Wen-Lian Hsu:
Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts. 2885-2888 - Barry Kirkpatrick, Darragh O'Brien, Ronan Scaife, Andrew Errity:
On the role of spectral dynamics in unit selection speech synthesis. 2889-2892 - Brian Langner, Alan W. Black:
ugloss: a framework for improving spoken language generation understandability. 2893-2896 - Karl Schnell, Arild Lacroix:
Combination of LSF and pole based parameter interpolation for model-based diphone concatenation. 2897-2900 - Kishore Prahallad, Arthur R. Toth, Alan W. Black:
Automatic building of synthetic voices from large multi-paragraph speech databases. 2901-2904 - Ascensión Gallardo-Antolín, Roberto Barra-Chicote, Marc Schröder, Sacha Krstulovic, Juan Manuel Montero:
Automatic phonetic segmentation of Spanish emotional speech. 2905-2908 - Dacheng Lin, Yong Zhao, Frank K. Soong, Min Chu, Jieyu Zhao:
Iterative unit selection with unnatural prosody detection. 2909-2912
Voice Activity Detection and Sound Classification
- Maria E. Markaki, Michael Wohlmayr, Yannis Stylianou:
Speech-nonspeech discrimination using the information bottleneck method and spectro-temporal modulation index. 2913-2916 - Keun Won Jang, Dong Kook Kim, Joon-Hyuk Chang:
A uniformly most powerful test for statistical model-based voice activity detection. 2917-2920 - John Dines, Jithendra Vepa:
Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics. 2921-2924 - Marijn Huijbregts, Chuck Wooters, Roeland Ordelman:
Filtering the unknown: speech activity detection in heterogeneous video collections. 2925-2928 - Abhijeet Sangwan, Nitish Krishnamurthy, John H. L. Hansen:
Environmentally aware voice activity detector. 2929-2932 - Masakiyo Fujimoto, Kentaro Ishizuka:
Noise robust voice activity detection based on switching Kalman filter. 2933-2936 - Q-Haing Jo, Yun-Sik Park, Kye-Hwan Lee, Ji-Hyun Song, Joon-Hyuk Chang:
Voice activity detection based on support vector machine using effective feature vectors. 2937-2940 - K. Sri Rama Murty, B. Yegnanarayana, S. Guruprasad:
Voice activity detection in degraded speech using excitation source information. 2941-2944 - David Cournapeau, Tatsuya Kawahara:
Evaluation of real-time voice activity detection based on high order statistics. 2945-2948 - Yanmeng Guo, Qian Qian, Yonghong Yan:
Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection. 2949-2952 - Corinne Fredouille, Nicholas W. D. Evans:
The influence of speech activity detection and overlap on speaker diarization for meeting room recordings. 2953-2956 - Gibak Kim, Nam Ik Cho:
Voice activity detection using the phase vector in microphone array. 2957-2960 - Federico Flego, Christian Zieger, Maurizio Omologo:
Adaptive weighting of microphone arrays for distant-talking F0 and voiced/unvoiced estimation. 2961-2964 - A. Sreenivasa Murthy, S. Chandra Sekhar, Thippur V. Sreenivas:
Robust and high-resolution voiced/unvoiced classification in noisy speech using a signal smoothness criterion. 2965-2968 - Tara N. Sainath, Victor Zue, Dimitri Kanevsky:
Audio classification using extended Baum-Welch transformations. 2969-2972 - Mary Tai Knox, Nikki Mirghafori:
Automatic laughter detection using neural networks. 2973-2976 - Gang Peng, Mei-Yuh Hwang, Mari Ostendorf:
Automatic acoustic segmentation for speech recognition on broadcast recordings. 2977-2980
Unreviewed Papers for Special Sessions
- Peter Birkholz:
Articulatory synthesis of singing. 4001-4004 - Takeshi Saitou, Masataka Goto, Masashi Unoki, Masato Akagi:
Vocal conversion from speaking voice to singing voice using STRAIGHT. 4005-4006 - Axel Röbel, Joshua Fineberg:
Speech to chant transformation with the phase vocoder. 4007-4008 - Hideki Kenmochi, Hayato Ohshita:
VOCALOID - commercial singing synthesizer based on sample concatenation. 4009-4010 - Nicolas D'Alessandro, Thierry Dutoit:
RAMCESS/handsketch: a multi-representation framework for realtime and expressive singing synthesis. 4011-4012 - Sten Ternström, Johan Sundberg:
Formant-based synthesis of singing. 4013-4014 - Han Sloetjes, Albert Russel, Alexander Klassmann:
ELAN: a free and open-source multimedia annotation tool. 4015-4016 - Jozsef Szakos, Ulrike Glavitsch:
Speechindexer in action: managing endangered Formosan languages. 4017-4019 - Tohru Ifukube, Yasuyuki Shimizu:
A portable record player for wax cylinders using a laser-beam reflection method. 4020