27th TSD 2024: Brno, Czech Republic - Part II
- Elmar Nöth, Ales Horák, Petr Sojka:
Text, Speech, and Dialogue - 27th International Conference, TSD 2024, Brno, Czech Republic, September 9-13, 2024, Proceedings, Part II. Lecture Notes in Computer Science 15049, Springer 2024, ISBN 978-3-031-70565-6
Speech
- Gokul Srinivasagan, Munir Georges:
Retrieval Augmented Spoken Language Generation for Transport Domain. 3-12
- Sven Aller, Mark Fishel:
Adapting Audiovisual Speech Synthesis to Estonian. 13-23
- Dosti Aziz, Dávid Sztahó:
Dysphonia Diagnosis Using Self-supervised Speech Models in Mono and Cross-Lingual Settings. 24-35
- Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlícek, Lukás Vladar:
Sentences vs Phrases in Neural Speech Synthesis. 36-45
- Jan Lehecka, Zdenek Hanzlícek, Jindrich Matousek, Daniel Tihelka:
Zero-Shot vs. Few-Shot Multi-speaker TTS Using Pre-trained Czech SpeechT5 Model. 46-57
- Mohammed Hamzah Abed, Dávid Sztahó:
Deep Speaker Embeddings for Speaker Verification of Children. 58-69
- Kwok Chin Yuen, Jia Qi Yip, Eng Siong Chng:
Improved Alignment for Score Combination of RNN-T and CTC Decoder for Online Decoding. 70-80
- Erfan A. Shams, Julie Carson-Berndsen:
Attention to Phonetics: A Visually Informed Explanation of Speech Transformers. 81-93
- Lukás Vladar, Jindrich Matousek:
Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis. 94-104
- Santiago Andres Moreno-Acevedo, Juan Camilo Vásquez-Correa, Juan M. Martín-Doñas, Aitor Álvarez:
Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning. 105-117
- Zdenek Hanzlícek:
Data Alignment and Duration Modelling in VITS. 118-129
- Ilaria Manfredi:
Multiword Expressions Resources for Italian: Presenting a Manually Annotated Spoken Corpus. 130-138
- David Portes, Ales Horák:
Generating High-Quality F0 Embeddings Using the Vector-Quantized Variational Autoencoder. 139-148
- Abner Hernandez, Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Seung Hee Yang, Juan Rafael Orozco-Arroyave, Andreas K. Maier:
Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation. 149-160
- Mala J. B., S. M. Alex Raj, Rajeev Rajan:
X-Vector-Based Speaker Diarization Using Bi-LSTM and Interim Voting-Driven Post-processing. 161-173
- Thibault Bañeras Roux, Mickael Rouvier, Jane Wottawa, Richard Dufour:
A Paradigm for Interpreting Metrics and Measuring Error Severity in Automatic Speech Recognition. 174-183
- Maros Jakubec, Roman Jarina, Eva Lieskovska, Peter Kasak, Michal Spisiak:
Enhancing Speech Emotion Recognition Using Transfer Learning from Speaker Embeddings. 184-195
Dialogue
- Lucas Druart, Valentin Vielzeuf, Yannick Estève:
Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets. 199-209
- Kwan Yeung Wong, Korris Fu-Lai Chung:
PiCo-VITS: Leveraging Pitch Contours for Fine-Grained Emotional Speech Synthesis. 210-221
- Daniel Ortega, Steven Söhnel, Ngoc Thang Vu:
Improving and Understanding Clarifying Question Generation in Conversational Search. 222-235
- Duygu Altinok:
Explainable Multimodal Fusion for Dementia Detection From Text and Speech. 236-251
- Diego Alexander Lopez-Santander, Cristian David Ríos-Urrego, Christian Bergler, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Robust Classification of Parkinson's Speech: an Approximation to a Scenario With Non-controlled Acoustic Conditions. 252-262
- Ondrej Sotolár, Jaromír Plhák, David Smahel:
Leveraging Conceptual Similarities to Enhance Modeling of Factors Affecting Adolescents' Well-Being. 263-274
- Ankit Kumar, Munir Georges:
Joint-Average Mean and Variance Feature Matching (JAMVFM) Semi-supervised GAN with Additional-Objective Training Function for Intent Detection. 275-287
- Niko Kleer, Leon Weyand, Michael Feld, Klaus Berberich:
Capturing Task-Related Information for Text-Based Grasp Classification Using Fine-Tuned Embeddings. 288-299
- Julian Wolter, Niko Kleer, Michael Feld:
StepDP: A Step Towards Expressive and Pervasive Dialogue Platforms. 300-312
- Jeferson David Gallo-Aristizábal, Daniel Escobar-Grisales, Cristian David Ríos-Urrego, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Automatic Classification of Parkinson's Disease Using Wav2vec Embeddings at Phoneme, Syllable, and Word Levels. 313-323