12th SSW 2023: Grenoble, France
- Gérard Bailly, Thomas Hueber, Damien Lolive, Nicolas Obin, Olivier Perrotin:
12th ISCA Speech Synthesis Workshop, SSW 2023, Grenoble, France, August 26-28, 2023. ISCA 2023
Orals 1: TTS input
- Gérard Bailly, Martin Lenglet, Olivier Perrotin, Esther Klabbers:
Advocating for text input in multi-speaker text-to-speech systems. 1-7
- Jason Fong, Hao Tang, Simon King:
Spell4TTS: Acoustically-informed spellings for improving text-to-speech pronunciations. 8-13
- Marcel Granero Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman:
A Comparative Analysis of Pretrained Language Models for Text-to-Speech. 14-20
- Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers:
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection. 21-26
Orals 2: Evaluation
- Lev Finkelstein, Joshua Camp, Rob Clark:
Importance of Human Factors in Text-To-Speech Evaluations. 27-33
- Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner:
Re-examining the quality dimensions of synthetic speech. 34-40
- Ambika Kirkland, Shivam Mehta, Harm Lameris, Gustav Eje Henter, Éva Székely, Joakim Gustafson:
Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation. 41-47
- Ondrej Plátek, Ondrej Dusek:
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module. 48-54
Orals 3: Beyond text-to-speech
- Johannes A. Louw:
Cross-lingual transfer using phonological features for resource-scarce text-to-speech. 55-61
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. 62-68
- Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely:
Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS. 69-74
- Johannah O'Mahony, Catherine Lai, Simon King:
Synthesising turn-taking cues using natural conversational data. 75-80
Orals 4: Voice conversion
- Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, Sebastian Stober:
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings. 81-87
- Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko:
PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. 88-93
- Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari:
Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. 94-99
- Anton Kashkin, Ivan Karpukhin, Svyatoslav Shishkin:
HiFi-VC: High Quality ASR-based Voice Conversion. 100-105
Orals 5: Expressivity, emotion and styles
- Daria Diatlova, Vitalii Shutov:
EmoSpeech: guiding FastSpeech2 towards Emotional Text to Speech. 106-112
- Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova:
Controllable Emphasis with zero data for text-to-speech. 113-119
- Martin Lenglet, Olivier Perrotin, Gérard Bailly:
Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control. 120-126
- Sofoklis Kakouros, Juraj Simko, Martti Vainio, Antti Suni:
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody. 127-133
Orals 6: Long form, multimodal & multi-speaker TTS
- Adriana Stan, Johannah O'Mahony:
An analysis on the effects of speaker embedding choice in non auto-regressive TTS. 134-138
- Weicheng Zhang, Cheng-chieh Yeh, Will Beckman, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky:
Audiobook synthesis with long-form neural text-to-speech. 139-143
- Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour:
Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling. 144-149
- Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter:
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. 150-156
Posters SSW
- Haolin Chen, Philip N. Garner:
Diffusion Transformer for Adaptive Text-to-Speech. 157-162
- Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely:
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. 163-169
- David Guennec, Lily Wadoux, Aghilas Sini, Nelly Barbot, Damien Lolive:
Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus. 170-176
- Ravi Shankar, Archana Venkataraman:
Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping. 177-183
- Jarod Duret, Yannick Estève, Titouan Parcollet:
Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data. 184-190
- Kishor Kayyar Lakshminarayana, Christian Dittmar, Nicola Pia, Emanuël A. P. Habets:
Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests. 191-196
- Sajad Shirali-Shahreza, Gerald Penn:
Better Replacement for TTS Naturalness Evaluation. 197-203
- Mikey Elmers, Éva Székely:
The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures. 204-210
- Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, Keiichi Tokuda:
SPTK4: An Open-Source Software Toolkit for Speech Signal Processing. 211-217
- Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark:
FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms. 218-224
- Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo-Trueba:
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications. 225-229
- Ibrahim Ibrahimov, Gábor Gosztolya, Tamás Gábor Csapó:
Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis. 230-235
Late breaking reports (not peer reviewed)
- Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh:
Universal Approach to Multilingual Multispeaker Child Speech Synthesis. 236-237
- Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti:
Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility. 238-239
- Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin:
Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models. 240-241
- Zhu Li, Xiyuan Gao, Shekhar Nayak, Matt Coler:
SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios. 242-243
- Nicholas Sanders, Korin Richmond:
Recovering Discrete Prosody Inputs via Invert-Classify. 244-245
- Atli Thor Sigurgeirsson, Simon King:
Using a Large Language Model to Control Speaking Style for Expressive TTS. 246-247
- Emmett Strickland, Dana Aubakirova, Dorin Doncenco, Diego Torres, Marc Evrard:
NaijaTTS: A pitch-controllable TTS model for Nigerian Pidgin. 248-249