26th SPECOM 2024: Belgrade, Serbia - Part I
- Alexey Karpov, Vlado Delic:
Speech and Computer - 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25-28, 2024, Proceedings, Part I. Lecture Notes in Computer Science 15299, Springer 2025, ISBN 978-3-031-77960-2
Invited Papers
- Ivan Kraljevski, Frank Duckhorn, Daniel Sobe, Constanze Tschöpe, Matthias Wolff:
Preserving Language Heritage Through Speech Technology: The Case of Upper Sorbian. 3-22
- Milan Secujski, Branislav M. Popovic, Darko Pekar, Niksa Jakovljevic, Edvin Pakoci, Sinisa Suzic, Tijana V. Nosek, Nikola Simic, Vuk Stanojev, Vlado Delic:
Retrospective and Perspectives of TTS & STT Technology Development and Implementation for South Slavic Under-Resourced Languages. 23-42
Automatic Speech Recognition
- Yue Luo, Péter Mihajlik:
Comparison of Well and Lower-Resourced Self-training in ASR. 45-56
- Irina S. Kipyatkova, Ildar Kagirov, Mikhail Dolgushin, Alexandra Rodionova:
Towards a Livvi-Karelian End-to-End ASR System. 57-68
- Vishwa Gupta:
Advances in OpenASR21 Evaluation with Increased Temporal Resolution for Speech Self-supervised Learning Models. 69-81
- Sergei Katkov, Antonio Liotta, Alessandro Vietti:
Benchmarking Whisper Under Diverse Audio Transformations and Real-Time Constraints. 82-91
- Ahmet Gunduz, Yunsu Kim, Kamer Ali Yuksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, Hassan Sawaf:
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost. 92-103
- Manuel Torralbo, Ariane Méndez, Maia Agirre, Arantza del Pozo:
Pre-training and Adverse Audio Samples for Data-Efficient Wake Word Detection. 104-118
- Pranav Karande, Balaram Sarkar, Chandresh Kumar Maurya:
Cross-Lingual Summarization of Speech-to-Speech Translation: A Baseline. 119-133
Speech and Language Resources
- Nikola Ljubesic, Peter Rupnik, Danijel Korzinek:
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings. 137-150
- Tatiana Y. Sherstinova, Irina Petrova:
ESC Corpus of Spoken Russian: Everyday Student Conversations Captured Through Continuous Speech Recording in Natural Communicative Environments. 151-162
- Denis Ivanko, Dmitry Ryumin, Alexandr Axyonov, Alexey M. Kashevnik, Alexey Karpov:
OpenAV: Bilingual Dataset for Audio-Visual Voice Control of a Computer for Hand Disabled People. 163-173
- Velka Popova, Dimitar Popov:
Bulgarian Speech Resources in the CHILDES System. 174-186
- Natalia Bogdanova-Beglarian, Olga Blinova, Maria Khokhlova, Tatiana Y. Sherstinova, Tatiana I. Popova:
Multiword Units in Russian Everyday Speech: Empirical Classification and Corpus-Based Studies. 187-200
- Rodmonga Potapova, Vsevolod Potapov, Ekaterina Karimova, Leonid Motovskikh, Nikolay Bobrov:
Neurophysiological Correlates of Textual Modulation in Visual Stimuli: An Experimental Study of Russian and English Memes. 201-215
Speech Synthesis and Perception
- Tijana V. Nosek, Sinisa Suzic, Milan Secujski, Vuk Stanojev, Darko Pekar, Vlado Delic:
End-to-End Speech Synthesis for the Serbian Language Based on Tacotron. 219-229
- Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh:
ChildTinyTalks (CTT): A Benchmark Dataset and Baseline for Expressive Child Speech Synthesis. 230-240
- Anna Borzykh, Tatiana Shevchenko:
Multidimensional Rhythm: Comparing Rhythmic Properties of Australian and New Zealand Monologues. 241-250
- Anastasia Ananeva, Uliana E. Kochetkova:
Influence of Linguistic and Sociolinguistic Factors on Speech Rate Perception. 251-264
- Daria Guseva, Olga Mitrofanova, Mikhail Dolgushin:
Human and Machine Keyphrase Perception in Russian Text and Speech. 265-280
- Elena E. Lyakso, Olga V. Frolova, Anton Matveev, Aleksandr Nikolaev, Ruban Nersisson:
Assessment of Children's Ability to Manifest Emotions in Facial Expressions, Voice and Speech by Humans, Automatic, and on a Likert Scale. 281-294
Speech Processing for Medicine
- Gábor Gosztolya, László Tóth, Veronika Svindt, Judit Bóna, Ildikó Hoffmann:
Investigating the Utility of wav2vec 2.0 Hidden Layers for Detecting Multiple Sclerosis. 297-308
- Danila Mamontov, Sebastian Zepf, Alexey Karpov, Wolfgang Minker:
Cross-Cultural Automatic Depression Detection Based on Audio Signals. 309-323
- Lokesh Kumar, Kumar Kaustubh, S. R. Mahadeva Prasanna:
Depression Classification Using Token Merging-Based Speech Spectrotemporal Transformer. 324-335
- Mary Idamkina, Andrea Corradini:
Detecting Depression from Audio Data. 336-351
- Dosti Aziz, Dávid Sztahó:
Binary and Multiclass Classification of Dysphonia Using Whisper Encoder and One-Dimensional Convolutional Neural Network. 352-366
- German Egle, Dariya Novokhrestova, Svetlana Tomilina, Evgeny Kostyuchenko:
Approach to Assessing the Quality of Syllable Pronunciation by Patients in the Process of Speech Rehabilitation Based on Comparison with Healthy Speakers. 367-376
- Philipp L. Harnisch, Daniel Schuhmann, Stefan Hillmann:
A Comparative Study for Contextualized Spoken Answer Classification in German Medical Questionnaires. 377-391