


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 31
Volume 31, 2023
- Mrinmoy Bhattacharjee, S. R. M. Prasanna, Prithwijit Guha:
Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning. 1-10
- Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Hiroshi Ishiguro:
Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks. 11-24
- Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu:
Integrating Lattice-Free MMI Into End-to-End Speech Recognition. 25-38
- Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman:
A Diffeomorphic Flow-Based Variational Framework for Multi-Speaker Emotion Conversion. 39-53
- Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao:
Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features. 54-70
- Xiaoyi Qin, Danwei Cai, Ming Li:
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios. 71-85
- Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha:
Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation. 86-95
- Li Li, Hirokazu Kameoka, Shoji Makino:
FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures. 96-110
- Jie Wang, Yan Yang, Keyu Liu, Zhiping Zhu, Xiaorong Liu:
M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER. 111-120
- Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Yasunori Ohishi, Shoko Araki:
SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning. 121-136
- Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino:
BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations. 137-151
- Yingrui Xu, Hao Liu, Jingguo Ge, Xiaodan Zhang, Jingyuan Hu, Yulei Wu, Honglei Lv, Hongbin Shi, Wei Zhou:
Mining Weak Relations Between Reviews for Opinion Spam Detection. 152-162
- Yoshiki Masuyama, Kohei Yatabe, Kento Nagatomo, Yasuhiro Oikawa:
Online Phase Reconstruction via DNN-Based Phase Differences Estimation. 163-176
- Jiang Liu, Donghong Ji, Jingye Li, Dongdong Xie, Chong Teng, Liang Zhao, Fei Li:
TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags. 177-187
- Zhe Hu, Zhiwei Cao, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu:
Controllable Dialogue Generation With Disentangled Multi-Grained Style Specification and Attribute Consistency Reward. 188-199
- Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard:
Interpreting Deep Representations of Phonetic Features via Neuro-Based Concept Detector: Application to Speech Disorders Due to Head and Neck Cancer. 200-214
- Jie Zhang, Rui Tao, Jun Du, Li-Rong Dai:
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks. 215-228
- Xianke Wang, Bowen Tian, Weiming Yang, Wei Xu, Wenqing Cheng:
MusicYOLO: A Vision-Based Framework for Automatic Singing Transcription. 229-241
- Yuanyuan Liu, Mittapalle Kiran Reddy, Nelly Penttilä, Tiina Ihalainen, Paavo Alku, Okko Räsänen:
Automatic Assessment of Parkinson's Disease Using Speech Representations of Phonation and Articulation. 242-255
- David Südholt, Alec Wright, Cumhur Erkut, Vesa Välimäki:
Pruning Deep Neural Network Models of Guitar Distortion Effects. 256-264
- Fangkai Jiao, Yangyang Guo, Minlie Huang, Liqiang Nie:
Enhanced Multi-Domain Dialogue State Tracker With Second-Order Slot Interactions. 265-276
- Hui Tian, Yiqin Qiu, Wojciech Mazurczyk, Haizhou Li, Zhenxing Qian:
STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams. 277-289
- Gopendra Vikram Singh, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya:
EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations. 290-300
- De Hu, Huaiwen Zhang, Feilong Bao, Rui Wang:
Distributed Sampling Rate Offset Estimation Over Acoustic Sensor Networks Based on Asynchronous Network Newton Optimization. 301-312
- David Diaz-Guerra, Antonio Miguel, José Ramón Beltrán:
Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs. 313-321
- Peiming Guo, Shen Huang, Peijie Jiang, Yueheng Sun, Meishan Zhang, Min Zhang:
Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer. 322-332
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Joint Online Estimation of Early and Late Residual Echo PSD for Residual Echo Suppression. 333-344
- Guangzhi Sun, Chao Zhang, Philip C. Woodland:
Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator. 345-354
- Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis:
Meta-AF: Meta-Learning for Adaptive Filters. 355-370
- Yingwen Fu, Nankai Lin, Boyu Chen, Ziyu Yang, Shengyi Jiang:
Cross-Lingual Named Entity Recognition for Heterogenous Languages. 371-382
- Jun-You Wang, Jyh-Shing Roger Jang:
Training a Singing Transcription Model Using Connectionist Temporal Classification Loss and Cross-Entropy Loss. 383-396
- Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. 397-410
- Yu Li, Bojie Hu, Jian Liu, Yufeng Chen, Jinan Xu:
A Neighborhood Re-Ranking Model With Relation Constraint for Knowledge Graph Completion. 411-425
- Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, Giulia Venturi:
On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors. 426-438
- Rong Xiao, Yu Wan, Baosong Yang, Haibo Zhang, Huajin Tang, Derek F. Wong, Boxing Chen:
Towards Energy-Preserving Natural Language Understanding With Spiking Neural Networks. 439-447
- Juan Zhao, Tianrui Zong, Yong Xiang, Longxiang Gao, Guang Hua, Keshav Sood, Yushu Zhang:
SSVS-SSVD Based Desynchronization Attacks Resilient Watermarking Method for Stereo Signals. 448-461
- Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, Haizhou Li:
A Time-Frequency Attention Module for Neural Speech Enhancement. 462-475
- Binhong Xie, Yu Li, Hongyan Zhao, Lihu Pan, Enhui Wang:
A Cross-Attention Fusion Based Graph Convolution Auto-Encoder for Open Relation Extraction. 476-485
- Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang:
Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification. 486-499
- Xinglin Lyu, Junhui Li, Min Zhang, Chenchen Ding, Hideki Tanaka, Masao Utiyama:
Refining History for Future-Aware Neural Machine Translation. 500-512
- Mou Wang, Junqi Chen, Xiao-Lei Zhang, Susanto Rahardja:
End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus. 513-524
- Asier López-Zorrilla, María Inés Torres, Heriberto Cuayáhuitl:
Audio Embedding-Aware Dialogue Policy Learning. 525-538
- Xichen Shang, Chuxin Chen, Zipeng Chen, Qianli Ma:
Modularized Mutuality Network for Emotion-Cause Pair Extraction. 539-549
- Xinyuan Qian, Zhengdong Wang, Jiadong Wang, Guohui Guan, Haizhou Li:
Audio-Visual Cross-Attention Network for Robotic Speaker Tracking. 550-562
- Kristina Tesch, Timo Gerkmann:
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement. 563-575
- Thilo von Neumann, Keisuke Kinoshita, Christoph Böddeker, Marc Delcroix, Reinhold Haeb-Umbach:
Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria. 576-589
- Davide Albertini, Alberto Bernardini, Federico Borra, Fabio Antonacci, Augusto Sarti:
Two-Stage Beamforming With Arbitrary Planar Arrays of Differential Microphone Array Units. 590-602
- Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai:
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization. 603-618
- Yingying Xiao, Shanmou Chen, Qiangqiang Zhang, Dongyuan Lin, Minglin Shen, Junhui Qian, Shiyuan Wang:
Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control. 619-632
- Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Javier Tejedor:
Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing. 633-642
- Bin Gu, Wu Guo, Jie Zhang:
Memory Storable Network Based Feature Aggregation for Speaker Representation Learning. 643-655
- Takumi Abe, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Amplitude Matching for Multizone Sound Field Control. 656-669
- Mahdi Barhoush, Ahmed Hallawa, Arne Peine, Lukas Martin, Anke Schmeink:
Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning. 670-683
- Herman Kamper:
Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring. 684-694
- Changheng Li, Jorge Martínez, Richard Christian Hendriks:
Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario. 695-705
- Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. 706-720
- Ling He, Jia Fu, Yuanyuan Li, Xi Xiong, Jing Zhang:
WNSA-Net: An Axial-Attention-Based Network for Schizophrenia Detection Using Wideband and Narrowband Spectrograms. 721-733
- Anusha Prakash, Hema A. Murthy:
Exploring the Role of Language Families for Building Indic Speech Synthesisers. 734-747
- Mahdin Rohmatillah, Jen-Tzung Chien:
Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy. 748-761
- Shahram Ghorbani, John H. L. Hansen:
Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech. 762-774
- Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du:
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing. 775-788
- Nicki Holighaus, Günther Koliander, Clara Hollomey, Friedrich Pillichshammer:
Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation. 789-801
- Weiwei Lin, Man-Wai Mak:
Robust Speaker Verification Using Deep Weight Space Ensemble. 802-812
- Lin Zhang, Xin Wang, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi:
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. 813-825
- Jie Mei, Yufan Wang, Xinhui Tu, Ming Dong, Tingting He:
Incorporating BERT With Probability-Aware Gate for Spoken Language Understanding. 826-834
- Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking. 835-848
- Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu:
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation. 849-862
- Naotake Masuda, Daisuke Saito:
Improving Semi-Supervised Differentiable Synthesizer Sound Matching for Practical Applications. 863-875
- Erfan Loweimi, Zhengjun Yue, Peter Bell, Steve Renals, Zoran Cvetkovic:
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform. 876-890
- Bengt J. Borgström:
A Generative Approach to Condition-Aware Score Calibration for Speaker Verification. 891-901
- Irene Martín-Morató, Annamaria Mesaros:
Strong Labeling of Sound Events Using Crowdsourced Weak Labels and Annotator Competence Estimation. 902-914
- Wenzhao Zhu, Lei Luo, Jinwei Sun, Mads Græsbøll Christensen:
A New Virtual Tracking Sub-Algorithm Based Hybrid Active Control System for Narrowband Noise With Impulsive Interference. 915-926
- Thomas Deppisch, Sebastià V. Amengual Garí, Paul Calamia, Jens Ahrens:
Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses. 927-942
- Eloi Moliner, Vesa Välimäki:
BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks. 943-956
- Martin Jälmby, Filip Elvander, Toon van Waterschoot:
Low-Rank Room Impulse Response Estimation. 957-969
- Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng:
Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems. 970-984
- De Hu, Qintuya Si, Rui Liu, Feilong Bao:
Distributed Sensor Selection for Speech Enhancement With Acoustic Sensor Networks. 985-999
- Yingke Zhu, Brian Mak:
Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. 1000-1012
- Yuying Li, Yuchen Liu, Donald S. Williamson:
A Composite T60 Regression and Classification Approach for Speech Dereverberation. 1013-1023
- Hanyi Zhang, Longbiao Wang, Kong Aik Lee, Meng Liu, Jianwu Dang, Helen Meng:
Meta-Generalization for Domain-Invariant Speaker Verification. 1024-1036
- Shutong Niu, Jun Du, Lei Sun, Yu Hu, Chin-Hui Lee:
QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization. 1037-1049
- Boyang Lyu, Chunxiao Fan, Yue Ming, Panzi Zhao, Nannan Hu:
En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition. 1050-1062
- Yang Liu, Haoqin Sun, Wenbo Guan, Yuqi Xia, Yongwei Li, Masashi Unoki, Zhen Zhao:
A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition. 1063-1074
- Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Weiqiang Zhang:
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning. 1075-1086
- Wei-Cheng Lin, Carlos Busso:
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion. 1087-1099
- Achyut Mani Tripathi, Om Jee Pandey:
Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification. 1100-1113
- Hao Zhang, Ashutosh Pandey, DeLiang Wang:
Low-Latency Active Noise Control Using Attentive Recurrent Network. 1114-1123
- Avital Bross, Sharon Gannot:
Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization. 1124-1140
- Guimin Hu, Yi Zhao, Guangming Lu:
Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction. 1141-1152
- Reza Mohsenipour, Daniel Massicotte, Wei-Ping Zhu:
PI Control of Loudspeakers Based on Linear Fractional Order Model. 1153-1162
- Tim Lübeck, Johannes M. Arend, Christoph Pörschmann:
Spatial Upsampling of Sparse Spherical Microphone Array Signals. 1163-1174
- Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu:
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems. 1175-1190
- Hongsheng Zhang, Jizhang Gan, Ting Liu, Kui Huang, Hong Yang:
Coefficients-Switched Normalized Least-Mean-Squares Adaption in Echo Canceler of Sparse-Echo-Path. 1191-1199
- Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello:
Learning Speech Emotion Representations in the Quaternion Domain. 1200-1212
- Jiaqi Bai, Ze Yang, Jian Yang, Hongcheng Guo, Zhoujun Li:
KINet: Incorporating Relevant Facts Into Knowledge-Grounded Dialog Generation. 1213-1222
- Haiquan Zhao, Yuan Gao, Yingying Zhu:
Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation. 1223-1233
- Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li:
PoE: A Panel of Experts for Generalized Automatic Dialogue Assessment. 1234-1250
- Qing Wang, Jun Du, Huaxin Wu, Jia Pan, Feng Ma, Chin-Hui Lee:
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection. 1251-1264
- Yingwen Fu, Nankai Lin, Xiaohui Yu, Shengyi Jiang:
Self-Training With Double Selectors for Low-Resource Named Entity Recognition. 1265-1275
- Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau:
Unsupervised Music Source Separation Using Differentiable Parametric Source Models. 1276-1289
- Yinggang Liu, Hong Fu, Ying Wei, Hanbing Zhang:
Sound Event Classification Based on Frequency-Energy Feature Representation and Two-Stage Data Dimension Reduction. 1290-1304
- Ege Erdem, Zoran Cvetkovic, Hüseyin Hacihabiboglu:
3D Perceptual Soundfield Reconstruction via Virtual Microphone Synthesis. 1305-1317
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Xiaoyi Shen:
A Frequency-Domain Output-Constrained Active Noise Control Algorithm Based on an Intuitive Circulant Convolutional Penalty Factor. 1318-1332
- Muhammed Zahid Ozturk, Chenshu Wu, Beibei Wang, Min Wu, K. J. Ray Liu:
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System. 1333-1347
- Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha:
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection. 1348-1359
- Ashutosh Pandey, DeLiang Wang:
Attentive Training: A New Training Framework for Speech Enhancement. 1360-1370
- Hirofumi Inaguma, Tatsuya Kawahara:
Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition. 1371-1385
- Mittapalle Kiran Reddy, Paavo Alku:
Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech. 1386-1396
- Shunsuke Kita, Yoshinobu Kajikawa:
Sound Source Localization Inside a Structure Under Semi-Supervised Conditions. 1397-1408
- Guowei Wu, Shipei Liu, Xiaoya Fan:
The Power of Fragmentation: A Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation. 1409-1420
- Xueqin Luo, Gongping Huang, Jilu Jin, Jingdong Chen, Jacob Benesty, Wen Zhang, Mengyao Zhu, Chunjian Li:
Design of Maximum Directivity Beamformers With Linear Acoustic Vector Sensor Arrays. 1421-1435
- Ruchao Fan, Wei Chu, Peng Chang, Abeer Alwan:
A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition. 1436-1448
- Tianyou Li, Siyuan Lian, Sipei Zhao, Jing Lu, Ian S. Burnett:
Distributed Active Noise Control Based on an Augmented Diffusion FxLMS Algorithm. 1449-1463
- Jiayuan Xie, Wenhao Fang, Qingbao Huang, Yi Cai, Tao Wang:
Enhancing Paraphrase Question Generation With Prior Knowledge. 1464-1475
- Chen Chen, Hansheng Hong, Jie Guo, Bin Song:
Inter-Intra Modal Representation Augmentation With Trimodal Collaborative Disentanglement Network for Multimodal Sentiment Analysis. 1476-1488
- Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li:
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. 1489-1498
- Xin Wu, Yi Cai, Zetao Lian, Ho-fung Leung, Tao Wang:
Generating Natural Language From Logic Expressions With Structural Representation. 1499-1510
- Yi Li, Yang Sun, Wenwu Wang, Syed Mohsen Naqvi:
U-Shaped Transformer With Frequency-Band Aware Attention for Speech Enhancement. 1511-1521
- Christian Antoñanzas, Miguel Ferrer, Maria de Diego, Alberto González:
Remote Microphone Technique for Active Noise Control Over Distributed Networks. 1522-1535
- Yi Zhu, Abhishek Tiwari, João Monteiro, Shruti Rajendra Kshirsagar, Tiago H. Falk:
COVID-19 Detection via Fusion of Modulation Spectrum and Linear Prediction Speech Features. 1536-1549
- Jijie Li, Kai Shuang, Jinyu Guo, Zengyi Shi, Hongman Wang:
Enhancing Semantic Relation Classification With Shortest Dependency Path Reasoning. 1550-1560
- Mao-Kui He, Jun Du, Qing-Feng Liu, Chin-Hui Lee:
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding. 1561-1573
- Longting Xu, Jichen Yang, Chang Huai You, Xinyuan Qian, Daiyu Huang:
Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection. 1574-1586
- Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann:
Integrating Uncertainty Into Neural Network-Based Speech Enhancement. 1587-1600
- Libo Qin, Xiao Xu, Lehan Wang, Yue Zhang, Wanxiang Che:
Modularized Pre-Training for End-to-End Task-Oriented Dialogue. 1601-1610
- Hanlei Zhang, Hua Xu, Shaojie Zhao, Qianrui Zhou:
Learning Discriminative Representations and Decision Boundaries for Open Intent Detection. 1611-1623
- Guangsheng Bao, Yue Zhang:
A General Contextualized Rewriting Framework for Text Summarization. 1624-1635
- Christoph Kirsch, Stephan Dieter Ewert:
A Universal Filter Approximation of Edge Diffraction for Geometrical Acoustics. 1636-1651
- Peyman Goli, Steven van de Par:
Deep Learning-Based Speech Specific Source Localization by Using Binaural and Monaural Microphone Arrays in Hearing Aids. 1652-1666
- Nguyen Binh Thien, Yukoh Wakabayashi, Kenta Iwai, Takanobu Nishiura:
Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood. 1667-1680
- Srikanth Raj Chetupalli, Emanuël A. P. Habets:
Speaker Counting and Separation From Single-Channel Noisy Mixtures. 1681-1692
- Guangyan Zhang, Ying Qin, Wenjie Zhang, Jialun Wu, Mei Li, Yutao Gai, Feijun Jiang, Tan Lee:
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre. 1693-1705
- Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li:
Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs. 1706-1719
- Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu:
Diffsound: Discrete Diffusion Model for Text-to-Sound Generation. 1720-1733
- Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel Rudolph van Niekerk, Anqi Xu, Yi Xu:
Artificial Vocal Learning Guided by Phoneme Recognition and Visual Information. 1734-1744
- Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang:
Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning. 1745-1757
- Wenbin Jiang, Kai Yu:
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking. 1758-1770
- Shu'ang Li, Xuming Hu, Li Lin, Aiwei Liu, Lijie Wen, Philip S. Yu:
A Multi-Level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference. 1771-1783
- Xiaoqing Zheng:
Building Conventional "Experts" With a Dialogue Logic Programming Language. 1784-1796
- Haitao Lin, Junnan Zhu, Lu Xiang, Feifei Zhai, Yu Zhou, Jiajun Zhang, Chengqing Zong:
Topic-Oriented Dialogue Summarization. 1797-1810
- Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng:
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS. 1811-1824
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification. 1825-1838
- Ria Ghosh, John H. L. Hansen:
Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform. 1839-1850
- Aolong Zhou, Wen Zhang, Guojun Xu, Xiaoyong Li, Kefeng Deng, Junqiang Song:
DBSA-Net: Dual Branch Self-Attention Network for Underwater Acoustic Signal Denoising. 1851-1865
- Weiwei Lin, Man-Wai Mak:
Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation. 1866-1876
- Andrea Galassi, Marco Lippi, Paolo Torroni:
Multi-Task Attentive Residual Networks for Argument Mining. 1877-1892
- Yi Luo, Jianwei Yu:
Music Source Separation With Band-Split RNN. 1893-1901
- Keisuke Matsubara, Takuma Okamoto, Ryoichi Takashima, Tetsuya Takiguchi, Tomoki Toda, Hisashi Kawai:
Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder. 1902-1915
- Yi Zhou, Zhizheng Wu, Xiaohai Tian, Haizhou Li:
Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents. 1916-1926
- Qiu-Shi Zhu, Jie Zhang, Ziqiang Zhang, Li-Rong Dai:
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition. 1927-1939
- Siqi Sun, Korin Richmond, Hao Tang:
Improving Seq2Seq TTS Frontends With Transcribed Speech Audio. 1940-1952
- Shih-Lun Wu, Yi-Hsuan Yang:
MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE. 1953-1967
- Xiaoxue Gao, Chitralekha Gupta, Haizhou Li:
PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music. 1968-1981
- Zhicheng Lian, Haonan Cheng, Jiawan Zhang:
PQG-A2SA: Performance Quantification Guided Audio-to-Score Alignment for Orchestral Music. 1982-1992
- Jingen Ni, Ningning Zhang, Haofen Li:
Sparsity-Promoting Affine Projection Algorithm With Periodically-Updated Gain Matrix and Its Performance Analysis. 1993-2003
- Orchisama Das, Sebastian J. Schlecht, Enzo De Sena:
Grouped Feedback Delay Networks With Frequency-Dependent Coupling. 2004-2015
- Xudong Zhao, Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of 2D and 3D Differential Microphone Arrays With a Multistage Framework. 2016-2031
- Jia-Hao Hsu, Jeremy Chang, Min-Hsueh Kuo, Chung-Hsien Wu:
Empathetic Response Generation Based on Plug-and-Play Mechanism With Empathy Perturbation. 2032-2042
- Aditya Dutt, Paul D. Gader:
Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using 1D CNN LSTM Networks. 2043-2054
- Arturo Morales, Juan I. Yuz, Juan P. Cortés, Javier G. Fontanet, Matías Zañartu:
Glottal Airflow Estimation Using Neck Surface Acceleration and Low-Order Kalman Smoothing. 2055-2066
- Yuya Hosoda, Arata Kawamura, Youji Iiguni:
Complex-Domain Pitch Estimation Algorithm for Narrowband Speech Signals. 2067-2078
- Zhidong Liu, Junhui Li, Muhua Zhu:
Alleviating Exposure Bias for Neural Machine Translation via Contextual Augmentation and Self Distillation. 2079-2089
- Hanan Beit-On, Tom Shlomo, Boaz Rafaely:
Weighted Frequency Smoothing for Enhanced Speaker Localization. 2090-2099
- Shan Gao, Xihong Wu, Tianshu Qu:
A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment. 2100-2110
- Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu:
Latent-Domain Predictive Neural Speech Coding. 2111-2123
- Shumin Deng, Jiacheng Yang, Hongbin Ye, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang, Huajun Chen, Ningyu Zhang:
LOGEN: Few-Shot Logical Knowledge-Conditioned Text Generation With Self-Training. 2124-2133
- Yuanzhi Liu, Min He, Qingqing Yang, Gwanggil Jeon:
An Unsupervised Framework With Attention Mechanism and Embedding Perturbed Encoder for Non-Parallel Text Sentiment Style Transfer. 2134-2144
- Yang Ai, Zhen-Hua Ling:
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra. 2145-2157
- Fei Zhao, Zhen Wu, Liang He, Xin-Yu Dai:
Label-Correction Capsule Network for Hierarchical Text Classification. 2158-2168
- Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin, Mirko Bronzi:
Exploring Self-Attention Mechanisms for Speech Separation. 2169-2180
- Chenggang Zhang, Jinjiang Liu, Hao Li, Xueliang Zhang:
Neural Multi-Channel and Multi-Microphone Acoustic Echo Cancellation. 2181-2192
- Zheng Liu, Xin Kang, Fuji Ren:
Dual-TBNet: Improving the Robustness of Speech Features via Dual-Transformer-BiLSTM for Speech Emotion Recognition. 2193-2203
- Sandro Cumani, Salvatore Sarni:
The Distributions of Uncalibrated Speaker Verification Scores: A Generative Model for Domain Mismatch and Trial-Dependent Calibration. 2204-2219
- Xi Ai, Bin Fang:
Cross-Modal Language Modeling in Multi-Motion-Informed Context for Lip Reading. 2220-2232
- Andreas Jonas Fuglsig, Jesper Jensen, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard:
Minimum Processing Near-End Listening Enhancement. 2233-2245
- Zhiwen Xie, Runjie Zhu, Jin Liu, Guangyou Zhou, Jimmy Xiangji Huang:
TARGAT: A Time-Aware Relational Graph Attention Model for Temporal Knowledge Graph Embedding. 2246-2258
- Cuilian Zhang, Derek F. Wong, Eddy Sio Kei Lei, Runzhe Zhan, Lidia S. Chao:
Obscurity-Quantified Curriculum Learning for Machine Translation Evaluation. 2259-2271
- Yaxin Liu, Yan Zhou, Ziming Li, Junlin Wang, Wei Zhou, Songlin Hu:
HIM: An End-to-End Hierarchical Interaction Model for Aspect Sentiment Triplet Extraction. 2272-2285
- Yukoh Wakabayashi, Kouei Yamaoka, Nobutaka Ono:
Sound Field Interpolation for Rotation-Invariant Multichannel Array Signal Processing. 2286-2298
- Jesper Kjær Nielsen, Mads Græsbøll Christensen, Jesper Bünsow Boldt:
An Analysis of Traditional Noise Power Spectral Density Estimators Based on the Gaussian Stochastic Volatility Model. 2299-2313
- Karen Gissell Rosero Jacome, Felipe Leonel Grijalva, Bruno Sanches Masiero:
Sound Events Localization and Detection Using Bio-Inspired Gammatone Filters and Temporal Convolutional Neural Networks. 2314-2324
- Lin Yuan, Guoheng Huang, Fenghuan Li, Xiaochen Yuan, Chi-Man Pun, Guo Zhong:
RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition. 2325-2337
- Samuel Poirot, Stefan Bilbao, Mitsuko Aramaki, Sølvi Ystad, Richard Kronland-Martinet:
A Perceptually Evaluated Signal Model: Collisions Between a Vibrating Object and an Obstacle. 2338-2350
- Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann:
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models. 2351-2364
- Siarhei Y. Barysenka, Vasili I. Vorobiov:
SNR-Based Inter-Component Phase Estimation Using Bi-Phase Prior Statistics for Single-Channel Speech Enhancement. 2365-2381
- Jiandian Zeng, Jiantao Zhou, Caishi Huang:
Exploring Semantic Relations for Social Media Sentiment Analysis. 2382-2394
- Fotios Drakopoulos, Sarah Verhulst:
A Neural-Network Framework for the Design of Individualised Hearing-Loss Compensation. 2395-2409
- Xinbei Ma, Zhuosheng Zhang, Hai Zhao:
Enhanced Speaker-Aware Multi-Party Multi-Turn Dialogue Comprehension. 2410-2423
- Tianrui Wang, Weibin Zhu, Yingying Gao, Shilei Zhang, Junlan Feng:
Harmonic Attention for Monaural Speech Enhancement. 2424-2436
- Lei Lei, Guoshun Yuan, Hongjiang Yu, Dewei Kong, Yuefeng He:
Multilingual Customized Keyword Spotting Using Similar-Pair Contrastive Learning. 2437-2447
- Shaokai Li, Peng Song, Wenming Zheng:
Multi-Source Discriminant Subspace Alignment for Cross-Domain Speech Emotion Recognition. 2448-2460
- Yeqing Ren, Haipeng Peng, Lixiang Li, Xiaopeng Xue, Yang Lan, Yixian Yang:
Generalized Voice Spoofing Detection via Integral Knowledge Amalgamation. 2461-2475
- Xing Chen, Jie Wang, Xiao-Lei Zhang, Weiqiang Zhang, Kunde Yang:
LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification. 2476-2490
- Benjamin Yen, Yameizhen Li, Yusuke Hioka:
Rotor Noise-Aware Noise Covariance Matrix Estimation for Unmanned Aerial Vehicle Audition. 2491-2506
- Xuechen Liu, Xin Wang, Md. Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas W. D. Evans, Andreas Nautsch, Kong Aik Lee:
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild. 2507-2522
- Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matthew Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour:
AudioLM: A Language Modeling Approach to Audio Generation. 2523-2533
- Xingfeng Li, Xiaohan Shi, Desheng Hu, Yongwei Li, Qingchen Zhang, Zhengxia Wang, Masashi Unoki, Masato Akagi:
Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition. 2534-2547
- Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu:
Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE. 2548-2557
- Arsalan Malik, Nipun Agarwal, Harshavardhan Settibhaktini, Ananthakrishna Chintanpalli:
Predicting Level-Dependent Changes in Concurrent Vowel Scores Using the 2D-CNN Models. 2558-2566
- Michael Krause, Meinard Müller:
Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings. 2567-2578
- Julie Meyer, Sebastian Prepelita, Ali Khajeh-Saeed, Michael Smirnov, Pablo Hoffmann:
Verification on Head-Related Transfer Functions of a Snowman Model Simulated Using the Finite-Difference Time-Domain Method. 2579-2591
- Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux:
Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks. 2592-2605
- Hailong Cao, Liguo Li, Conghui Zhu, Muyun Yang, Tiejun Zhao:
Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction. 2606-2615
- Lin Xiao, Pengyu Xu, Mingyang Song, Huafeng Liu, Liping Jing, Xiangliang Zhang:
Triple Alliance Prototype Orthotist Network for Long-Tailed Multi-Label Text Classification. 2616-2628
- Juhua Liu, Qihuang Zhong, Liang Ding, Hua Jin, Bo Du, Dacheng Tao:
Unified Instance and Knowledge Alignment Pretraining for Aspect-Based Sentiment Analysis. 2629-2642
- Yiming Zhang, Hong Yu, Ruoyi Du, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma, Yuan Dong:
ACTUAL: Audio Captioning With Caption Feature Space Regularization. 2643-2657
- Jakob Abeßer, Sascha Grollmisch, Meinard Müller:
How Robust are Audio Embeddings for Polyphonic Sound Event Tagging? 2658-2667
- Wei Xia, John H. L. Hansen:
Attention and DCT Based Global Context Modeling for Text-Independent Speaker Recognition. 2668-2679
- Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation. 2680-2694
- Ben Liu
, Jun Wang
, Guanyuan Yu
, Shaolei Chen:
CUPVC: A Constraint-Based Unsupervised Prosody Transfer for Improving Telephone Banking Services. 2695-2706 - Guinan Li
, Jiajun Deng
, Mengzhe Geng
, Zengrui Jin
, Tianzi Wang, Shujie Hu
, Mingyu Cui, Helen Meng, Xunying Liu
:
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition. 2707-2723 - Jean-Marie Lemercier
, Julius Richter
, Simon Welker
, Timo Gerkmann
:
StoRM: A Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation. 2724-2737 - Yen-Ju Lu
, Chia-Yu Chang, Cheng Yu
, Ching-Feng Liu, Jeih-weih Hung
, Shinji Watanabe
, Yu Tsao
:
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. 2738-2750 - Sungjae Kim
, Yewon Kim, Jewoo Jun, Injung Kim
:
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer That Controls Emotional Intensity. 2751-2764 - Xinxin Su
, Zhen Huang
, Yunxiang Zhao
, Yifan Chen, Yong Dou, Hengyue Pan:
Recent Trends in Deep Learning Based Textual Emotion Cause Extraction. 2765-2786 - Junyu Lu
, Hongfei Lin
, Xiaokun Zhang
, Zhaoqing Li, Tongyue Zhang, Linlin Zong
, Fenglong Ma
, Bo Xu
:
Hate Speech Detection via Dual Contrastive Learning. 2787-2795 - Diego Marques do Carmo, Ricardo Augusto Borsoi
, Márcio Holsbach Costa
:
Closed-Form Solution to the Multichannel Wiener Filter With Interaural Level Difference Preservation. 2796-2811 - Ya-Jie Zhang
, Chao Zhang
, Wei Song, Zhengchen Zhang, Youzheng Wu, Xiaodong He
:
Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis. 2812-2823 - Ching-Yu Chiu
, Meinard Müller
, Matthew E. P. Davies
, Alvin Wen-Yu Su, Yi-Hsuan Yang
:
Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music. 2824-2835 - Feng Chen
, Ke Ma, Yapeng Mao, Desen Yang, Yi Zhang
, Jie Shi, Shiqi Mo
, Chenyang Gui, Song Li
:
A Novel Method to Design Steerable Differential Beamformer Using Linear Acoustics Vector Sensor Array. 2836-2849 - Tianyu Huang
, Weisheng Dong
, Fangfang Wu
, Xin Li
, Guangming Shi
:
Uncertainty-Driven Knowledge Distillation for Language Model Compression. 2850-2858 - Roberto Andrés Vasco Carofilis
, Enrique Alegre
, Eduardo Fidalgo
, Laura Fernández-Robles
:
Improvement of Accent Classification Models Through Grad-Transfer From Spectrograms and Gradient-Weighted Class Activation Mapping. 2859-2871 - Jacob Hollebon
, Filippo Maria Fazi
:
Higher-Order Stereophony. 2872-2885 - Jeremy Heng Meng Wong
, Huayun Zhang
, Nancy F. Chen
:
Modelling Inter-Rater Uncertainty in Spoken Language Assessment. 2886-2898 - Qinghua Zheng
, Yuefei Wu
, Guangtao Wang
, Yanping Chen
, Wei Wu
, Zai Zhang
, Bin Shi
, Bo Dong
:
Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition. 2899-2909 - Dongyuan Shi
, Woon-Seng Gan
, Bhan Lam
, Zhengding Luo
, Xiaoyi Shen
:
Transferable Latent of CNN-Based Selective Fixed-Filter Active Noise Control. 2910-2921 - Dorian Desblancs
, Vincent Lostanlen
, Romain Hennequin
:
Zero-Note Samba: Self-Supervised Beat Tracking. 2922-2934 - Nankai Lin
, Yingwen Fu
, Xiaotian Lin
, Dong Zhou
, Aimin Yang
, Shengyi Jiang
:
CL-XABSA: Contrastive Learning for Cross-Lingual Aspect-Based Sentiment Analysis. 2935-2946 - Hanmeng Liu
, Jian Liu
, Leyang Cui
, Zhiyang Teng
, Nan Duan
, Ming Zhou
, Yue Zhang
:
LogiQA 2.0 - An Improved Dataset for Logical Reasoning in Natural Language Understanding. 2947-2962 - Jiangyan Yi
, Jianhua Tao
, Ruibo Fu
, Tao Wang
, Chu Yuan Zhang
, Chenglong Wang
:
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings. 2963-2973 - Ji Won Yoon
, Hyung Yong Kim
, Hyeonseung Lee
, Sunghwan Ahn
, Nam Soo Kim
:
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models. 2974-2987 - Sufeng Duan
, Hai Zhao
, Dongdong Zhang
:
Syntax-Aware Data Augmentation for Neural Machine Translation. 2988-2999 - Tongzheng Liu
, Zhihua Lu
, João Paulo C. L. da Costa
, Tai Fei
:
A Hybrid Reverberation Model and Its Application to Joint Speech Dereverberation and Separation. 3000-3014 - Junjun Guo
, Junjie Ye
, Yan Xiang
, Zhengtao Yu
:
Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation. 3015-3026 - Qian Tao
, Zhihao Xiong, Bocheng Han
, Xiaoyang Fan
, Lusi Li
:
A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces. 3027-3041 - Jilu Jin
, Jacob Benesty
, Jingdong Chen
, Gongping Huang
:
Differential Beamforming From a Geometric Perspective. 3042-3054 - Alberto Palomo-Alonso
, David Casillas-Pérez
, Silvia Jiménez-Fernández
, José Antonio Portilla-Figueras
, Sancho Salcedo-Sanz
:
A Flexible Architecture Using Temporal, Spatial and Semantic Correlation-Based Algorithms for Story Segmentation of Broadcast News. 3055-3069 - Bolaji Yusuf
, Jan Cernocký
, Murat Saraçlar
:
End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. 3070-3080 - Adrian Herzog
, Srikanth Raj Chetupalli
, Emanuël A. P. Habets
:
AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction. 3081-3094 - Po-Chun Hsu
, Da-Rong Liu
, Andy T. Liu
, Hung-yi Lee
:
Parallel Synthesis for Autoregressive Speech Generation. 3095-3111 - Siddharth Dalmia
, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe
, Florian Metze
, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. 3112-3126 - Tom Gajecki
, Waldo Nogueira
:
Deep Latent Fusion Layers for Binaural Speech Enhancement. 3127-3138 - Huawen Feng
, Zhenxi Lin
, Qianli Ma
:
Perturbation-Based Self-Supervised Attention for Attention Bias in Text Classification. 3139-3151 - Jiaxin Zhong
, Tao Zhuang
, Mengtong Li
, Ray Kirby
, Mahmoud Karimi
, Jing Lu
, Dong Zhang
:
Sidelobe Suppression for a Steerable Parametric Source Using the Sparse Random Array Technique. 3152-3161 - Yan Fang
, Wei Lu
, Xiaodong Liu
, Witold Pedrycz
, Qi Lang
, Jianhua Yang
:
CircularE: A Complex Space Circular Correlation Relational Model for Link Prediction in Knowledge Graph Embedding. 3162-3175 - Jie Zhang
, Rui Tao, Jun Du
, Li-Rong Dai
:
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction. 3176-3189 - Haozhou Li
, Qinke Peng
, Xu Mou
, Ying Wang
, Zeyuan Zeng
, Muhammad Fiaz Bashir
:
Abstractive Financial News Summarization via Transformer-BiLSTM Encoder and Graph Attention-Based Decoder. 3190-3205 - Weitao Yuan
, Shengbei Wang
, Jianming Wang
, Masashi Unoki
, Wenwu Wang
:
Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation. 3206-3220 - Zhong-Qiu Wang
, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe
:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. 3221-3236 - Marvin Tammen
, Simon Doclo
:
Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement. 3237-3248 - Yi Lin
, Qingyang Wang
, Xincheng Yu
, Zichen Zhang
, Dongyue Guo
, Jizhe Zhou
:
Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and a Contrastive Learning Approach. 3249-3262 - Diego Caviedes-Nozal
, Efren Fernandez-Grande
:
Spatio-Temporal Bayesian Regression for Room Impulse Response Reconstruction With Spherical Waves. 3263-3277 - Xinyu Hu
, Xiaojun Wan
:
RST Discourse Parsing as Text-to-Text Generation. 3278-3289 - Shun Lei
, Yixuan Zhou
, Liyang Chen
, Zhiyong Wu
, Xixin Wu
, Shiyin Kang
, Helen Meng:
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis. 3290-3303 - Pedro Izquierdo Lehmann
, Rodrigo F. Cádiz
, Carlos A. Sing-Long
:
Towards Maximizing a Perceptual Sweet Spot for Spatial Sound With Loudspeakers. 3304-3319 - Han Zhu
, Dongji Gao
, Gaofeng Cheng
, Daniel Povey
, Pengyuan Zhang
, Yonghong Yan
:
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition. 3320-3330 - Junqing Zhang
, Liming Shi
, Mads Græsbøll Christensen
, Wen Zhang
, Lijun Zhang
, Jingdong Chen
:
CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints. 3331-3345 - Erfan Loweimi
, Andrea Carmantini
, Peter Bell
, Steve Renals
, Zoran Cvetkovic
:
Phonetic Error Analysis Beyond Phone Error Rate. 3346-3361 - Runxuan Yang
, Yuyang Peng
, Xiaolin Hu
:
A Fast High-Fidelity Source-Filter Vocoder With Lightweight Neural Modules. 3362-3373 - Yuxiang Zhang
, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang
, Pengyuan Zhang
:
The Impact of Silence on Speech Anti-Spoofing. 3374-3389 - Philippe Gonzalez
, Tommy Sonne Alstrøm
, Tobias May
:
Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments. 3390-3403 - Ziyi Xu
, Ziyue Zhao
, Tim Fingscheidt
:
Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN. 3404-3417 - Tao Li
, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li
, Qiao Tian, Yuping Wang, Lei Xie
:
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin. 3418-3430 - Xuexin Xu
, Liang Shi
, Xunquan Chen
, Pingyuan Lin
, Jie Lian
, Jinhui Chen
, Zhihong Zhang
, Edwin R. Hancock
:
Any-to-Any Voice Conversion With Multi-Layer Speaker Adaptation and Content Supervision. 3431-3445 - Chenpeng Du
, Yiwei Guo
, Xie Chen
, Kai Yu
:
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature. 3446-3456 - Yash Kumar Atri
, Vikram Goyal
, Tanmoy Chakraborty
:
Multi-Document Summarization Using Selective Attention Span and Reinforcement Learning. 3457-3467 - Maochun Huang
, Chunmei Qing
, Junpeng Tan
, Xiangmin Xu
:
Context-Based Adaptive Multimodal Fusion Network for Continuous Frame-Level Sentiment Prediction. 3468-3477 - Sebastian J. Schlecht
, Jon Fagerström
, Vesa Välimäki
:
Decorrelation in Feedback Delay Networks. 3478-3487 - Jinliang Lu
, Jiajun Zhang
:
Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts. 3488-3498 - Julien Hauret
, Thomas Joubaud
, Véronique Zimpfer
, Eric Bavu
:
Configurable EBEN: Extreme Bandwidth Extension Network to Enhance Body-Conducted Speech Capture. 3499-3512 - Wanli Peng
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Text Steganalysis Based on Hierarchical Supervised Learning and Dual Attention Mechanism. 3513-3526 - Lin Xu
, Qixian Zhou
, Jinlan Fu
, See-Kiong Ng
:
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations. 3527-3536 - Vincent W. Neo
, Christine Evers
, Stephan Weiss
, Patrick A. Naylor
:
Signal Compaction Using Polynomial EVD for Spherical Array Processing With Applications. 3537-3549 - Gerald Enzner
, Svantje Voit:
Hybrid-Frequency-Resolution Adaptive Kalman Filter for Online Identification of Long Acoustic Responses With Low Input-Output Latency. 3550-3563 - Shang Gao
, Maoshen Jia
, Dingding Yao
, Jing Wang
:
Multi-Source Localization Using Optimized Time-Frequency Representation and Sparsity Component Analysis. 3564-3578 - He Qi
, Mingjie Gao
, Ka Fai Cedric Yiu
, Sven Nordholm
:
Distributed Microphone Array Localization Problem via SDP-SOCP Method. 3579-3588 - Hiroshi Sawada
, Rintaro Ikeshita
, Keisuke Kinoshita
, Tomohiro Nakatani
:
Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation. 3589-3602 - Hongyang Chang
, Hongfei Xu
, Josef van Genabith
, Deyi Xiong
, Hongying Zan
:
JoinER-BART: Joint Entity and Relation Extraction With Constrained Decoding, Representation Reuse and Fusion. 3603-3616 - Xinqi Huang
, Yingsong Li
, Yuriy V. Zakharov
, Yongchun Miao
, Zhixiang Huang
:
Squared Sine Adaptive Algorithm and Its Performance Analysis. 3617-3628 - Andong Li
, Guochen Yu
, Chengshi Zheng
, Wenzhe Liu
, Xiaodong Li
:
A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem. 3629-3646 - Bin Gu
, Jie Zhang
, Wu Guo
:
A Dynamic Convolution Framework for Session-Independent Speaker Embedding Learning. 3647-3658 - Daojian Zeng
, Chao Zhao
, Chao Jiang
, Jianling Zhu
, Jianhua Dai
:
Document-Level Relation Extraction With Context Guided Mention Integration and Inter-Pair Reasoning. 3659-3666 - Lu Li
, Maoshen Jia
, Jing Wang
, Ruiyuan Cao
:
Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection. 3667-3680 - Xiaoxiao Miao
, Xin Wang
, Erica Cooper
, Junichi Yamagishi
, Natalia A. Tomashenko
:
Speaker Anonymization Using Orthogonal Householder Neural Network. 3681-3695 - Zhengshan Xue
, Xiaolei Zhang, Tingxun Shi
, Deyi Xiong
:
DetTrans: A Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously. 3696-3705 - Chang Liu
, Zhen-Hua Ling
, Ling-Hui Chen
:
Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations. 3706-3716 - Reo Yoneyama
, Yi-Chiao Wu
, Tomoki Toda
:
High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks. 3717-3729 - Stefan Thaleiser
, Gerald Enzner
:
Binaural-Projection Multichannel Wiener Filter for Cue-Preserving Binaural Speech Enhancement. 3730-3745 - Yixin Wang
, Wei Wei
, Xiangming Gu
, Xiaohong Guan
, Ye Wang
:
Disentangled Adversarial Domain Adaptation for Phonation Mode Detection in Singing and Speech. 3746-3759 - Yixuan Zhang
, Heming Wang
, DeLiang Wang
:
$F0$ Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. 3760-3770 - Zhengdao Zhao
, Yuhua Wang
, Guang Shen
, Yuezhu Xu
, Jiayuan Zhang
:
TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition. 3771-3782 - Johannes M. Arend
, Christoph Pörschmann
, Stefan Weinzierl
, Fabian Brinkmann
:
Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions. 3783-3799 - Desh Raj
, Daniel Povey
, Sanjeev Khudanpur
:
SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition. 3800-3813 - Jiaming An
, Zixiang Ding
, Ke Li
, Rui Xia
:
Global-View and Speaker-Aware Emotion Cause Extraction in Conversations. 3814-3823 - Yuqin Lin
, Longbiao Wang
, Yanbing Yang
, Jianwu Dang
:
CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition. 3824-3836 - Rémi Blandin
, Simon Stone
, Angélique Remacle
, Vincent Didone, Peter Birkholz
:
A Comparative Study of 3D and 1D Acoustic Simulations of the Higher Frequencies of Speech. 3837-3847 - Qing Wang
, Jixun Yao
, Li Zhang
, Pengcheng Guo
, Lei Xie
:
Timbre-Reserved Adversarial Attack in Speaker Identification. 3848-3858 - Yachao Li
, Junhui Li
, Jing Jiang
, Shimin Tao, Hao Yang
, Min Zhang:
P-Transformer: Towards Better Document-to-Document Neural Machine Translation. 3859-3870 - Chao Xie
, Tomoki Toda
:
Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition. 3871-3882 - Zhichao Wang
, Xinsheng Wang
, Qicong Xie, Tao Li
, Lei Xie
, Qiao Tian, Yuping Wang:
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling. 3883-3895 - Yilin Zhao
, Hai Zhao
, Sufeng Duan
:
Multi-Grained Evidence Inference for Multi-Choice Reading Comprehension. 3896-3907 - Ye-Qian Du
, Jie Zhang
, Xin Fang, Ming-Hui Wu, Zhouwang Yang
:
A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition. 3908-3921 - Changheng Li
, Richard C. Hendriks
:
Alternating Least-Squares-Based Microphone Array Parameter Estimation for a Single-Source Reverberant and Noisy Acoustic Scenario. 3922-3934 - Kun Zhou
, Yuanhang Zhou, Wayne Xin Zhao
, Ji-Rong Wen
:
Learning to Perturb for Contrastive Learning of Unsupervised Sentence Representations. 3935-3944 - Georg Götz
, Sebastian J. Schlecht
, Ville Pulkki
:
Common-Slope Modeling of Late Reverberation. 3945-3957 - Guanhua Chen
, Runzhe Zhan
, Derek F. Wong
, Lidia S. Chao
:
Multi-Level Curriculum Learning for Multi-Turn Dialogue Generation. 3958-3967 - Yun-Yen Chuang
, Hung-Min Hsu
, Kevin Lin
, Ray-I Chang
, Hung-Yi Lee
:
MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks. 3968-3980 - Chuxuan Tong
, Xi Zheng
, Jianhua Li
, Xingjun Ma
, Longxiang Gao
, Yong Xiang
:
Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition. 3981-3992 - Xixin Wu
, Hui Lu
, Kun Li
, Zhiyong Wu
, Xunying Liu
, Helen Meng
:
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms. 3993-4003 - Ante Wang
, Linfeng Song
, Lifeng Jin
, Junfeng Yao
, Haitao Mi, Chen Lin
, Jinsong Su
, Dong Yu
:
D$^{2}$PSG: Multi-Party Dialogue Discourse Parsing as Sequence Generation. 4004-4013 - Nan Gao
, Yongjian Wang
, Peng Chen
, Jijun Tang
:
Boosting Short Text Classification by Solving the OOV Problem. 4014-4024
