Zhiyong Wu 0001
Person information
- unicode name: 吴志勇
- affiliation (PhD): Tsinghua University, Joint Research Center for Media Sciences, Beijing, China
- affiliation: Chinese University of Hong Kong, Hong Kong
Other persons with the same name
- Zhiyong Wu — disambiguation page
- Zhiyong Wu 0002 — Hohai University, College of Hydrology and Water Resources, Nanjing, China
- Zhiyong Wu 0003 — Shanghai AI Laboratory (and 1 more)
- Zhiyong Wu 0004 — University of Science and Technology of China, School of Computer Science and Technology, Hefei, China
- Zhiyong Wu 0005 — Anhui Polytechnic University, School of Mathematics and Physics, Wuhu, China
- Zhiyong Wu 0006 — Nanjing University of Posts and Telecommunications, College of Automation, China
- Zhiyong Wu 0007 — Army Engineering University, Institute of Command and Control Engineering, Nanjing, China
- Zhiyong Wu 0008 — Chinese Academy of Sciences, Changchun Institute of Optics, Fine Mechanics and Physics, China
- Zhiyong Wu 0009 — Shantou Central Hospital, Departments of Oncology Surgery, Shantou, China
- Zhiyong Wu 0010 — Tsinghua University, KLISS, BNRist, School of Software, Beijing, China
- Zhiyong Wu 0011 — Shanghai Artificial Intelligence Laboratory, China
2020 – today
- 2024
- [j13] Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang: Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. IEEE ACM Trans. Audio Speech Lang. Process. 32: 517-528 (2024)
- [c199] Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu: Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations. AAAI 2024: 301-309
- [c198] Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng: SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes. AAAI 2024: 15267-15275
- [c197] Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shi-Xiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu: SECap: Speech Emotion Captioning with Large Language Model. AAAI 2024: 19323-19331
- [c196] Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu: Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model. CVPR 2024: 2263-2273
- [c195] Yaxin Liu, Xiaomei Nie, Zhiyong Wu: Collaboration of Digital Human Gestures and Teaching Materials for Enhanced Integration in MOOC Teaching Scenarios. HCI (59) 2024: 169-175
- [c194] Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge. ICASSP Workshops 2024: 71-72
- [c193] Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng: Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation. ICASSP 2024: 941-945
- [c192] Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng: Consistent and Relevant: Rethink the Query Embedding in General Sound Separation. ICASSP 2024: 961-965
- [c191] Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng: SCNet: Sparse Compression Network for Music Source Separation. ICASSP 2024: 1276-1280
- [c190] Xingda Li, Fan Zhuo, Dan Luo, Jun Chen, Shiyin Kang, Zhiyong Wu, Tao Jiang, Yang Li, Han Fang, Yahui Zhou: Generating Stereophonic Music with Single-Stage Language Models. ICASSP 2024: 1471-1475
- [c189] Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu: FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness. ICASSP 2024: 7945-7949
- [c188] Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng: Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information. ICASSP 2024: 8185-8189
- [c187] Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, Zonghong Dai, Helen Meng: Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models. ICASSP 2024: 8296-8300
- [c186] Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng: Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations. ICASSP 2024: 11141-11145
- [c185] Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng: Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis. ICASSP 2024: 12316-12320
- [c184] Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng: Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction. ICASSP 2024: 12341-12345
- [c183] Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng: Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion. ICASSP 2024: 12577-12581
- [c182] Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng: Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts. ICASSP 2024: 12662-12666
- [c181] Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang: Hydraformer: One Encoder for All Subsampling Rates. ICME 2024: 1-6
- [c180] Ming Cheng, Shun Lei, Dongyang Dai, Zhiyong Wu, Dading Chong: NRAdapt: Noise-Robust Adaptive Text to Speech Using Untranscribed Data. IJCNN 2024: 1-8
- [c179] Rui Niu, Zhiyong Wu, Changhe Song: Representation Space Maintenance: Against Forgetting in Continual Learning. IJCNN 2024: 1-7
- [c178] Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia: VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. ACM Multimedia 2024: 554-563
- [c177] Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu: SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description. ACM Multimedia 2024: 1255-1264
- [c176] Yunrui Cai, Runchuan Ye, Jingran Xie, Yixuan Zhou, Yaoxun Xu, Zhiyong Wu: Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup. MRAC@MM 2024: 93-97
- [c175] Yaoxun Xu, Yixuan Zhou, Yunrui Cai, Jingran Xie, Runchuan Ye, Zhiyong Wu: Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering. MRAC@MM 2024: 104-109
- [i101] Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu: Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness. CoRR abs/2401.03476 (2024)
- [i100] Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng: Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation. CoRR abs/2401.07532 (2024)
- [i99] Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng: Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction. CoRR abs/2401.17796 (2024)
- [i98] Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng: Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information. CoRR abs/2403.05834 (2024)
- [i97] Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu: Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model. CoRR abs/2404.01862 (2024)
- [i96] Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge. CoRR abs/2404.16619 (2024)
- [i95] Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng: CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction. CoRR abs/2406.08336 (2024)
- [i94] Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng: Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models. CoRR abs/2407.13509 (2024)
- [i93] Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang: HydraFormer: One Encoder For All Subsampling Rates. CoRR abs/2408.04325 (2024)
- [i92] Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu: SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description. CoRR abs/2408.13608 (2024)
- [i91] Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang: MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement. CoRR abs/2408.14211 (2024)
- [i90] Yinghao Ma, Anders Øland, Anton Ragni, Bleiz Macsen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang: Foundation Models for Music: A Survey. CoRR abs/2408.14340 (2024)
- [i89] Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia: VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. CoRR abs/2408.15676 (2024)
- [i88] Yaoxun Xu, Shi-Xiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu: Comparing Discrete and Continuous Space LLMs for Speech Recognition. CoRR abs/2409.00800 (2024)
- [i87] Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng: SongCreator: Lyrics-based Universal Song Generation. CoRR abs/2409.06029 (2024)
- [i86] Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu: RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion. CoRR abs/2409.06237 (2024)
- [i85] Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu: An End-to-End Approach for Chord-Conditioned Song Generation. CoRR abs/2409.06307 (2024)
- [i84] Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu: Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis. CoRR abs/2409.08628 (2024)
- [i83] Yuanyuan Wang, Hangting Chen, Dongchao Yang, Zhiyong Wu, Helen Meng, Xixin Wu: AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions. CoRR abs/2409.12560 (2024)
- [i82] Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Shun Lei, Zhiwei Lin, Zhiyong Wu: MuCodec: Ultra Low-Bitrate Music Codec. CoRR abs/2409.13216 (2024)
- 2023
- [j12] Xingwei Liang, Lu Zhang, Zhiyong Wu, Ruifeng Xu: Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios. IEEE Signal Process. Lett. 30: 1697-1701 (2023)
- [j11] Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng: MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3290-3303 (2023)
- [j10] Xixin Wu, Hui Lu, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng: Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3993-4003 (2023)
- [c174] Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia: What Does Your Face Sound Like? 3D Face Shape towards Voice. AAAI 2023: 13905-13913
- [c173] Yunrui Cai, Changhe Song, Boshi Tang, Dongyang Dai, Zhiyong Wu, Helen Meng: Robust Representation Learning for Speech Emotion Recognition with Moment Exchange. APSIPA ASC 2023: 1002-1007
- [c172] Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang: QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. CVPR 2023: 2321-2330
- [c171] Weihong Bao, Liyang Chen, Chaoyong Zhou, Sicheng Yang, Zhiyong Wu: Wavsyncswap: End-To-End Portrait-Customized Audio-Driven Talking Face Generation. ICASSP 2023: 1-5
- [c170] Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng: Inter-Subnet: Speech Enhancement with Subband Interaction. ICASSP 2023: 1-5
- [c169] Jun Chen, Yupeng Shi, Wenzhe Liu, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu, Shidong Shang, Chengshi Zheng: Gesper: A Unified Framework for General Speech Restoration. ICASSP 2023: 1-2
- [c168] Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu: LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech. ICASSP 2023: 1-5
- [c167] Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng: Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis. ICASSP 2023: 1-5
- [c166] Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, Helen Meng: Av-Sepformer: Cross-Attention Sepformer for Audio-Visual Target Speaker Extraction. ICASSP 2023: 1-5
- [c165] Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu: TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty. ICASSP 2023: 1-5
- [c164] Weinan Tong, Jiaxu Zhu, Jun Chen, Zhiyong Wu, Shiyin Kang, Helen Meng: TFCnet: Time-Frequency Domain Corrector for Speech Separation. ICASSP 2023: 1-5
- [c163] Zilin Wang, Peng Liu, Jun Chen, Sipan Li, Jinfeng Bai, Gang He, Zhiyong Wu, Helen Meng: A Synthetic Corpus Generation Method for Neural Vocoder Training. ICASSP 2023: 1-5
- [c162] Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng: DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification. ICASSP 2023: 1-5
- [c161] Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng: CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition. ICASSP 2023: 1-5
- [c160] Yujie Yang, Kun Zhang, Zhiyong Wu, Helen Meng: Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting. ICASSP 2023: 1-5
- [c159] Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng: Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training. ICASSP 2023: 1-5
- [c158] Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng: GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network. ICASSP 2023: 1-5
- [c157] Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao: VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer. ICCV (Workshops) 2023: 2969-2979
- [c156] Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng: Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion. ICME 2023: 1691-1696
- [c155] Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng: SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias. ICME 2023: 1703-1708
- [c154] Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai: The DiffuseStyleGesture+ entry to the GENEA Challenge 2023. ICMI 2023: 779-785
- [c153] Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao: DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. IJCAI 2023: 5860-5868
- [c152] Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng: Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. INTERSPEECH 2023: 1334-1338
- [c151] Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu: ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. INTERSPEECH 2023: 1648-1652
- [c150] Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang: Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information. INTERSPEECH 2023: 2488-2492
- [c149] Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng: SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge. INTERSPEECH 2023: 3272-3276
- [c148] Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. INTERSPEECH 2023: 3377-3381
- [c147] Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu: MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. INTERSPEECH 2023: 4034-4038
- [c146] Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu: Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction. INTERSPEECH 2023: 4044-4048
- [c145] Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng: Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model. INTERSPEECH 2023: 4858-4862
- [c144] Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia: Prosody Modeling with 3D Visual Information for Expressive Video Dubbing. INTERSPEECH 2023: 4863-4867
- [c143] Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai: UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. ACM Multimedia 2023: 1033-1044
- [c142] Hui Lu, Xixin Wu, Zhiyong Wu, Helen Meng: SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. ACM Multimedia 2023: 2829-2837
- [c141] Yunrui Cai, Jingran Xie, Boshi Tang, Yuanyuan Wang, Jun Chen, Haiwei Xue, Zhiyong Wu: First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition. MRAC@MM 2023: 13-20
- [i81] Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng: Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis. CoRR abs/2304.06359 (2023)
- [i80] Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng: CB-Conformer: Contextual biasing Conformer for biased word recognition. CoRR abs/2304.09607 (2023)
- [i79] Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng: GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network. CoRR abs/2304.12704 (2023)
- [i78] Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao: DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. CoRR abs/2305.04919 (2023)
- [i77] Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang: Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing. CoRR abs/2305.05203 (2023)
- [i76] Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, Yannan Wang, Shidong Shang, Helen Meng: Inter-SubNet: Speech Enhancement with Subband Interaction. CoRR abs/2305.05599 (2023)
- [i75] Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng: Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion. CoRR abs/2305.09167 (2023)
- [i74] Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu: ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. CoRR abs/2305.10649 (2023)
- [i73] Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang: QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. CoRR abs/2305.11094 (2023)
- [i72] Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng: Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model. CoRR abs/2305.16749 (2023)
- [i71] Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu: Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction. CoRR abs/2306.08454 (2023)
- [i70] Jiuxin Lin, Xinyu Cai, Heinrich Dinkel, Jun Chen, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Zhiyong Wu, Yujun Wang, Helen Meng: AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction. CoRR abs/2306.14170 (2023)
- [i69] Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang: Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information. CoRR abs/2306.16241 (2023)
- [i68] Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu: MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. CoRR abs/2306.16250 (2023)
- [i67] Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng: MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis. CoRR abs/2307.16012 (2023)
- [i66] Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao: VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer. CoRR abs/2308.04830 (2023)
- [i65] Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai: The DiffuseStyleGesture+ entry to the GENEA Challenge 2023. CoRR abs/2308.13879 (2023)
- [i64] Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng: CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis. CoRR abs/2308.16021 (2023)
- [i63] Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu: LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech. CoRR abs/2308.16569 (2023)
- [i62] Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng: Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information. CoRR abs/2308.16577 (2023)
- [i61] Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. CoRR abs/2308.16593 (2023)
- [i60] Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information. CoRR abs/2308.16836 (2023)
- [i59] Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng: Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training. CoRR abs/2309.00284 (2023)
- [i58] Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen M. Meng: SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge. CoRR abs/2309.01437 (2023)
- [i57] Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen M. Meng: Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. CoRR abs/2309.02459 (2023)
- [i56] Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai: UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. CoRR abs/2309.07051 (2023)
- [i55] Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng: SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias. CoRR abs/2309.07803 (2023)
- [i54] Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang: A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis. CoRR abs/2309.11849 (2023)
- [i53] Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng: Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts. CoRR abs/2309.11977 (2023)
- [i52] Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang: AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation. CoRR abs/2310.07236 (2023)
- [i51] Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng: DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification. CoRR abs/2310.12111 (2023)
- [i50] Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng: Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion. CoRR abs/2312.04919 (2023)
- [i49] Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang: Stable Score Distillation for High-Quality 3D Generation. CoRR abs/2312.09305 (2023)
- [i48] Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shi-Xiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu: SECap: Speech Emotion Captioning with Large Language Model. CoRR abs/2312.10381 (2023)
- [i47] Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu: Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations. CoRR abs/2312.11442 (2023)
- [i46] Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng: SimCalib: Graph Neural Network Calibration based on Similarity between Nodes. CoRR abs/2312.11858 (2023)
- [i45] Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng: StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis. CoRR abs/2312.12181 (2023)
- [i44] Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng: Consistent and Relevant: Rethink the Query Embedding in General Sound Separation. CoRR abs/2312.15463 (2023)
- [i43] Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, Zonghong Dai, Helen M. Meng: Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models. CoRR abs/2312.15567 (2023)
- 2022
- [j9] Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee: Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning. IEEE ACM Trans. Audio Speech Lang. Process. 30: 202-217 (2022)
- [c140] Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng: Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis. COLING 2022: 7193-7202
- [c139] Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee: Adversarial Sample Detection for Speaker Verification by Neural Vocoders. ICASSP 2022: 236-240
- [c138] Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu: An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings. ICASSP 2022: 6827-6831
- [c137] Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng: Neural Architecture Search for Speech Emotion Recognition. ICASSP 2022: 6902-6906
- [c136] Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng: Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion. ICASSP 2022: 7022-7026
- [c135] Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng: An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer. ICASSP 2022: 7122-7126
- [c134] Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao: Transformer-S2A: Robust and Efficient Speech-to-Animation. ICASSP 2022: 7247-7251
- [c133] Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng: A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction. ICASSP 2022: 7602-7606
- [c132] Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng: FullSubNet+: Channel Attention Fullsubnet with Complex Spectrograms for Speech Enhancement. ICASSP 2022: 7857-7861
- [c131] Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su: Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling. ICASSP 2022: 7917-7921
- [c130] Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis. ICASSP 2022: 7922-7926
- [c129] Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang: Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism. ICASSP 2022: 8007-8011
- [c128] Yulan Chen, Zhiyong Wu, Zheyan Shen, Jia Jia: Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation. ICIP 2022: 856-860
- [c127] Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao: The ReprGesture entry to the GENEA Challenge 2022. ICMI 2022: 758-763
- [c126] Zhihan Yang, Zhiyong Wu, Jia Jia: Speaker Characteristics Guided Speech Synthesis. IJCNN 2022: 1-8
- [c125] Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng: MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification. INTERSPEECH 2022: 306-310
- [c124] Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng: Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information. INTERSPEECH 2022: 426-430
- [c123] Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng: Speech Enhancement with Fullband-Subband Cross-Attention Network. INTERSPEECH 2022: 976-980
- [c122] Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng: Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion. INTERSPEECH 2022: 2553-2557
- [c121] Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng: Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis. INTERSPEECH 2022: 2573-2577
- [c120] Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information. INTERSPEECH 2022: 4292-4296
- [c119] Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng: Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis. INTERSPEECH 2022: 5518-5522
- [c118] Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng: Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis. INTERSPEECH 2022: 5523-5527
- [c117] Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng: Towards Cross-speaker Reading Style Transfer on Audiobook Dataset. INTERSPEECH 2022: 5528-5532
- [c116] Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng: CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis. INTERSPEECH 2022: 5533-5537
- [c115] Xueyuan Chen, Qiaochu Huang, Xixin Wu, Zhiyong Wu, Helen Meng: HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis. ISCSLP 2022: 86-90
- [c114] Chenyi Li, Zhiyong Wu, Wei Rao, Yannan Wang, Helen Meng: Boosting the Performance of SpEx+ by Attention and Contextual Mechanism. ISCSLP 2022: 135-139
- [c113] Jingbei Li, Yi Meng, Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang: Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks. ACM Multimedia 2022: 5811-5820
- [c112] Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-yi Lee, Helen Meng: Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion. Odyssey 2022: 92-99
- [c111] Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng: Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE. SLT 2022: 814-821
- [i42] Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng:
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement. CoRR abs/2203.12188 (2022) - [i41]Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis. CoRR abs/2203.12201 (2022) - [i40]Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng:
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion. CoRR abs/2203.12813 (2022) - [i39]Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-Yi Lee, Helen Meng:
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification. CoRR abs/2203.15249 (2022) - [i38]Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism. CoRR abs/2203.16838 (2022) - [i37]Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng:
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction. CoRR abs/2203.16922 (2022) - [i36]Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng:
Neural Architecture Search for Speech Emotion Recognition. CoRR abs/2203.16928 (2022) - [i35]Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng:
An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer. CoRR abs/2203.16954 (2022) - [i34]Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng:
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis. CoRR abs/2204.00990 (2022) - [i33]Shun Lei, Yixuan Zhou, Liyang Chen, Jiankun Hu, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis. CoRR abs/2204.02743 (2022) - [i32]Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-yi Lee, Helen Meng:
Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion. CoRR abs/2206.09131 (2022) - [i31]Bin Su, Shaoguang Mao, Frank K. Soong, Zhiyong Wu:
Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives. CoRR abs/2207.02454 (2022) - [i30]Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng:
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset. CoRR abs/2208.05359 (2022) - [i29]Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng:
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion. CoRR abs/2208.08757 (2022) - [i28]Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao:
The ReprGesture entry to the GENEA Challenge 2022. CoRR abs/2208.12133 (2022) - [i27]Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng:
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE. CoRR abs/2210.13771 (2022) - [i26]Xingchen Song, Di Wu, Binbin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu:
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition. CoRR abs/2210.17079 (2022) - [i25]Xingchen Song, Di Wu, Zhiyong Wu, Binbin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu:
TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty. CoRR abs/2211.00522 (2022) - [i24]Jun Chen, Wei Rao, Zilin Wang, Zhiyong Wu, Yannan Wang, Tao Yu, Shidong Shang, Helen Meng:
Speech Enhancement with Fullband-Subband Cross-Attention Network. CoRR abs/2211.05432 (2022) - 2021
- [j8]Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng:
Exemplar-Based Emotive Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 29: 874-886 (2021) - [j7]Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng:
Speech Emotion Recognition Using Sequential Capsule Networks. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3280-3291 (2021) - [c110]Suping Zhou, Jia Jia, Zhiyong Wu, Zhihan Yang, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang:
Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach. AAAI 2021: 6039-6047 - [c109]Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng:
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams. APSIPA ASC 2021: 1433-1437 - [c108]Aolan Sun, Jianzong Wang, Ning Cheng, Methawee Tantrawenith, Zhiyong Wu, Helen Meng, Edward Xiao, Jing Xiao:
Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples. ASRU 2021: 946-953 - [c107]Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu:
PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback. CHI 2021: 676:1-676:14 - [c106]Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang:
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding. EMNLP (1) 2021: 3226-3237 - [c105]Xiong Cai, Dongyang Dai, Zhiyong Wu, Xiang Li, Jingbei Li, Helen Meng:
Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition. ICASSP 2021: 5734-5738 - [c104]Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen M. Meng:
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input. ICASSP 2021: 5894-5898 - [c103]Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen M. Meng:
Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal. ICASSP 2021: 6064-6068 - [c102]Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee:
Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models. ICASSP 2021: 6718-6722 - [c101]Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu:
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples. ICASSP 2021: 7748-7752 - [c100]Jie Wang, Yuren You, Feng Liu, Deyi Tuo, Shiyin Kang, Zhiyong Wu, Helen Meng:
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2voc Challenge 2020. ICASSP 2021: 8608-8612 - [c99]Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng:
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion. Interspeech 2021: 846-850 - [c98]Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng:
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis. Interspeech 2021: 3775-3779 - [c97]Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-yi Lee:
Voting for the Right Answer: Adversarial Defense for Speaker Verification. Interspeech 2021: 4294-4298 - [c96]Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng:
Towards Multi-Scale Style Control for Expressive Speech Synthesis. Interspeech 2021: 4673-4677 - [c95]Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng:
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network. ISCSLP 2021: 1-5 - [c94]Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng:
Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis. SLT 2021: 410-414 - [i23]Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Helen Meng:
Adversarially learning disentangled speech representations for robust multi-factor voice conversion. CoRR abs/2102.00184 (2021) - [i22]Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee:
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models. CoRR abs/2102.07047 (2021) - [i21]Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen M. Meng:
Towards Multi-Scale Style Control for Expressive Speech Synthesis. CoRR abs/2104.03521 (2021) - [i20]Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Helen Meng:
Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech. CoRR abs/2104.06835 (2021) - [i19]Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu:
PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback. CoRR abs/2105.05182 (2021) - [i18]Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee:
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning. CoRR abs/2106.00273 (2021) - [i17]Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su:
Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis. CoRR abs/2106.06233 (2021) - [i16]Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-yi Lee:
Voting for the right answer: Adversarial defense for speaker verification. CoRR abs/2106.07868 (2021) - [i15]Haibin Wu, Po-Chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-yi Lee:
Spotting adversarial samples for speaker verification by neural vocoders. CoRR abs/2107.00309 (2021) - [i14]Hui Lu, Zhiyong Wu, Xixin Wu, Xu Li, Shiyin Kang, Xunying Liu, Helen Meng:
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis. CoRR abs/2107.03298 (2021) - [i13]Yingmei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang:
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding. CoRR abs/2109.01583 (2021) - [i12]Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu:
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings. CoRR abs/2110.07274 (2021) - [i11]Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao:
Transformer-S2A: Robust and Efficient Speech-to-Animation. CoRR abs/2111.09771 (2021) - 2020
- [c93]Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng:
End-To-End Accent Conversion Without Using Native Utterances. ICASSP 2020: 6289-6293 - [c92]Yuewen Cao, Songxiang Liu, Xixin Wu, Shiyin Kang, Peng Liu, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng:
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora. ICASSP 2020: 7619-7623 - [c91]Michael Lao BanTeng, Zhiyong Wu:
Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition. ICPR 2020: 3799-3806 - [c90]Yingmei Guo, Zhiyong Wu, Mingxing Xu:
FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues. AACL/IJCNLP 2020: 37-43 - [c89]Xingcheng Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng:
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition. INTERSPEECH 2020: 581-585 - [c88]Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song:
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting. INTERSPEECH 2020: 2567-2571 - [c87]Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng:
Enhancing Monotonicity for Robust Autoregressive Transformer TTS. INTERSPEECH 2020: 3181-3185 - [c86]Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng:
Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks. INTERSPEECH 2020: 3765-3769 - [i10]Dongyang Dai, Li Chen, Yuping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang:
Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement. CoRR abs/2005.12531 (2020) - [i9]Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng:
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams. CoRR abs/2006.11610 (2020) - [i8]Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu:
Improving pronunciation assessment via ordinal regression with anchored reference samples. CoRR abs/2010.13339 (2020) - [i7]Xiong Cai, Dongyang Dai, Zhiyong Wu, Xiang Li, Jingbei Li, Helen Meng:
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition. CoRR abs/2010.13350 (2020) - [i6]Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng:
Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input. CoRR abs/2010.15025 (2020) - [i5]Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen M. Meng:
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal. CoRR abs/2012.06971 (2020) - [i4]Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng:
Unsupervised Cross-Lingual Speech Emotion Recognition Using DomainAdversarial Neural Network. CoRR abs/2012.11174 (2020)
2010 – 2019
- 2019
- [c85]Yingmei Guo, Mingxing Xu, Zhiyong Wu, Jianming Wu, Bin Su:
Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection. ACII Workshops 2019: 1-5 - [c84]Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng:
Prosodic Structure Prediction using Deep Self-attention Neural Network. APSIPA 2019: 320-324 - [c83]Liangqi Liu, Zhiyong Wu, Runnan Li, Jia Jia, Helen Meng:
Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection. APSIPA 2019: 922-926 - [c82]Yao Du, Zhiyong Wu, Shiyin Kang, Dan Su, Dong Yu, Helen Meng:
Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network. APSIPA 2019: 1234-1238 - [c81]Kun Zhang, Zhiyong Wu, Jia Jia, Helen M. Meng, Binheng Song:
Query-by-Example Spoken Term Detection using Attentive Pooling Networks. APSIPA 2019: 1267-1272 - [c80]Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, Helen Meng:
Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition. ICASSP 2019: 6675-6679 - [c79]Xixin Wu, Songxiang Liu, Yuewen Cao, Xu Li, Jianwei Yu, Dongyang Dai, Xi Ma, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng:
Speech Emotion Recognition Using Capsule Networks. ICASSP 2019: 6695-6699 - [c78]Hui Lu, Zhiyong Wu, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng:
A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams. ICASSP 2019: 6810-6814 - [c77]Yuewen Cao, Xixin Wu, Songxiang Liu, Jianwei Yu, Xu Li, Zhiyong Wu, Xunying Liu, Helen Meng:
End-to-end Code-switched TTS with Mix of Monolingual Recordings. ICASSP 2019: 6935-6939 - [c76]Mu Wang, Xixin Wu, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Guangzhi Li, Dan Su, Dong Yu, Helen Meng:
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis. ICASSP 2019: 7060-7064 - [c75]Dongyang Dai, Zhiyong Wu, Runnan Li, Xixin Wu, Jia Jia, Helen Meng:
Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition. ICASSP 2019: 7405-7409 - [c74]Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong:
NN-based Ordinal Regression for Assessing Fluency of ESL Speech. ICASSP 2019: 7420-7424 - [c73]Yulan Chen, Jia Jia, Zhiyong Wu:
Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network. ICMI 2019: 302-309 - [c72]Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng:
Towards Discriminative Representation Learning for Speech Emotion Recognition. IJCAI 2019: 5060-5066 - [c71]Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng:
One-Shot Voice Conversion with Global Speaker Embeddings. INTERSPEECH 2019: 669-673 - [c70]Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng:
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT. INTERSPEECH 2019: 2090-2094 - [c69]Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng:
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis. INTERSPEECH 2019: 4494-4498 - [i3]Xingcheng Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng:
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks. CoRR abs/1910.10387 (2019) - 2018
- [j6]Kun Li, Shaoguang Mao, Xu Li, Zhiyong Wu, Helen Meng:
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks. Speech Commun. 96: 28-36 (2018) - [c68]Jingbei Li, Zhiyong Wu, Runnan Li, Mingxing Xu, Kehua Lei, Lianhong Cai:
Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning. AIMS 2018: 16-28 - [c67]Ziwei Zhu, Zhiyong Wu, Runnan Li, Yishuang Ning, Helen Meng:
Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices. AIMS 2018: 55-66 - [c66]Runnan Li, Zhiyong Wu, Yuchen Huang, Jia Jia, Helen Meng, Lianhong Cai:
Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMS for Expressive Speech Synthesis. ICASSP 2018: 5129-5133 - [c65]Xixin Wu, Lifa Sun, Shiyin Kang, Songxiang Liu, Zhiyong Wu, Xunying Liu, Helen Meng:
Feature Based Adaptation for Speaking Style Synthesis. ICASSP 2018: 5304-5308 - [c64]Shaoguang Mao, Xu Li, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng:
Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis. ICASSP 2018: 6244-6248 - [c63]Shaoguang Mao, Zhiyong Wu, Runnan Li, Xu Li, Helen Meng, Lianhong Cai:
Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech. ICASSP 2018: 6254-6258 - [c62]Shaoguang Mao, Zhiyong Wu, Xu Li, Runnan Li, Xixin Wu, Helen Meng:
Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech. ICME 2018: 1-6 - [c61]Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai:
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection. INTERSPEECH 2018: 102-106 - [c60]Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng:
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method. INTERSPEECH 2018: 317-321 - [c59]Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng:
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis. INTERSPEECH 2018: 3072-3076 - [c58]Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai:
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms. INTERSPEECH 2018: 3683-3687 - [c57]Mu Wang, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng:
Speech Super-Resolution Using Parallel WaveNet. ISCSLP 2018: 260-264 - [c56]Runnan Li, Zhiyong Wu, Jia Jia, Jingbei Li, Wei Chen, Helen Meng:
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs. ACM Multimedia 2018: 136-144 - 2017
- [c55]Yishuang Ning, Jia Jia, Zhiyong Wu, Runnan Li, Yongsheng An, Yanfeng Wang, Helen M. Meng:
Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems. AAAI 2017: 161-167 - [c54]Runnan Li, Zhiyong Wu, Xunying Liu, Helen M. Meng, Lianhong Cai:
Multi-task learning of structured output layer bidirectional LSTMS for speech synthesis. ICASSP 2017: 5510-5514 - [c53]Yishuang Ning, Zhiyong Wu, Runnan Li, Jia Jia, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data. ICASSP 2017: 5615-5619 - [c52]Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai:
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. INTERSPEECH 2017: 779-783 - [c51]Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai:
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space. INTERSPEECH 2017: 1238-1242 - [c50]Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai:
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion. INTERSPEECH 2017: 3409-3413 - [c49]Song Tang, Zhiyong Wu, Kang Chen:
Movie Recommendation via BLSTM. MMM (2) 2017: 269-279 - 2016
- [c48]Quanjie Yu, Peng Liu, Zhiyong Wu, Shiyin Kang, Helen Meng, Lianhong Cai:
Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages. ICASSP 2016: 5545-5549 - [c47]Xinyu Lan, Xu Li, Yishuang Ning, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai:
Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar. ICASSP 2016: 5550-5554 - [c46]Yaodong Tang, Yuchen Huang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai:
Question detection from acoustic features using recurrent neural network with gated recurrent unit. ICASSP 2016: 6125-6129 - [c45]Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Recognizing stances in Mandarin social ideological debates with text and acoustic features. ICME Workshops 2016: 1-6 - [c44]Haishu Xianyu, Mingxing Xu, Zhiyong Wu, Lianhong Cai:
Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data. ICME 2016: 1-6 - [c43]Yaodong Tang, Zhiyong Wu, Helen M. Meng, Mingxing Xu, Lianhong Cai:
Analysis on Gated Recurrent Unit Based Question Detection Approach. INTERSPEECH 2016: 735-739 - [c42]Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition. INTERSPEECH 2016: 1392-1396 - [c41]Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis. INTERSPEECH 2016: 1472-1476 - [c40]Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data. INTERSPEECH 2016: 1477-1481 - [c39]Runnan Li, Zhiyong Wu, Helen M. Meng, Lianhong Cai:
DBLSTM-based multi-task learning for pitch transformation in voice conversion. ISCSLP 2016: 1-5 - [c38]Leye Wei, Xin Jin, Zhiyong Wu:
3D modeling based on multiple Unmanned Aerial Vehicles with the optimal paths. ISPACS 2016: 1-6 - [c37]Yiqi Jiang, Xin Jin, Zhiyong Wu:
Video Inpainting Based on Joint Gradient and Noise Minimization. PCM (1) 2016: 407-417 - [c36]Leye Wei, Xin Jin, Zhiyong Wu, Lei Zhang:
A Real-Time Gesture-Based Unmanned Aerial Vehicle Control System. PCM (1) 2016: 529-539 - [i2]Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition. CoRR abs/1611.05675 (2016) - 2015
- [j5]Zhiyong Wu, Kai Zhao, Xixin Wu, Xinyu Lan, Helen Meng:
Acoustic to articulatory mapping with deep neural network. Multim. Tools Appl. 74(22): 9889-9907 (2015) - [j4]Zhiyong Wu, Yishuang Ning, Xiao Zang, Jia Jia, Fanbo Meng, Helen Meng, Lianhong Cai:
Generating emphatic speech with hidden Markov model for expressive speech synthesis. Multim. Tools Appl. 74(22): 9909-9925 (2015) - [c35]Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, Lianhong Cai, Helen M. Meng:
Understanding speaking styles of internet speech data with LSTM and low-resource training. ACII 2015: 815-820 - [c34]Peng Liu, Quanjie Yu, Zhiyong Wu, Shiyin Kang, Helen M. Meng, Lianhong Cai:
A deep recurrent approach for acoustic-to-articulatory inversion. ICASSP 2015: 4450-4454 - [c33]Yishuang Ning, Zhiyong Wu, Jia Jia, Fanbo Meng, Helen M. Meng, Lianhong Cai:
HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training. ICASSP 2015: 4934-4938 - [c32]Qi Lyu, Zhiyong Wu, Jun Zhu, Helen Meng:
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation. IJCAI 2015: 4138-4139 - [c31]Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen M. Meng, Jia Jia, Lianhong Cai:
Using tilt for automatic emphasis detection with Bayesian networks. INTERSPEECH 2015: 578-582 - [c30]Qi Lyu, Zhiyong Wu, Jun Zhu:
Polyphonic Music Modelling with LSTM-RTRBM. ACM Multimedia 2015: 991-994 - 2014
- [j3]Jia Jia, Zhiyong Wu, Shen Zhang, Helen M. Meng, Lianhong Cai:
Head and facial gestures synthesis using PAD model for an expressive talking avatar. Multim. Tools Appl. 73(1): 439-461 (2014) - [j2]Fanbo Meng, Zhiyong Wu, Jia Jia, Helen M. Meng, Lianhong Cai:
Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training. Multim. Tools Appl. 73(1): 463-489 (2014) - [c29]Yuchao Fan, Mingxing Xu, Zhiyong Wu, Lianhong Cai:
Automatic Emotion Variation Detection in continuous speech. APSIPA 2014: 1-5 - [c28]Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai:
Learning dynamic features with neural networks for phoneme recognition. ICASSP 2014: 2524-2528 - [c27]Xin Zheng, Zhiyong Wu, Helen Meng, Lianhong Cai:
Contrastive auto-encoder for phoneme recognition. ICASSP 2014: 2529-2533 - [c26]Xiao Zang, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai:
Using conditional random fields to predict focus word pair in spontaneous spoken English. INTERSPEECH 2014: 756-760 - [c25]Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao:
Multi-channel speech enhancement using sparse coding on local time-frequency structures. INTERSPEECH 2014: 2824-2827 - [c24]Xixin Wu, Zhiyong Wu, Jia Jia, Helen M. Meng, Lianhong Cai, Weifeng Li:
Automatic speech data clustering with human perception based weighted distance. ISCSLP 2014: 216-220 - 2013
- [c23]Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai:
Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition. APSIPA 2013: 1-4 - [c22]Mingming Zhang, Weifeng Li, Longbiao Wang, Jianguo Wei, Zhiyong Wu, Qingmin Liao:
Sparse coding for sound event classification. APSIPA 2013: 1-5 - [c21]Mingming Zhang, Weifeng Li, Longbiao Wang, Jianguo Wei, Zhiyong Wu, Qingmin Liao:
Frequency-domain dereverberation on speech signal using surround retinex. APSIPA 2013: 1-5 - [c20]Kai Zhao, Zhiyong Wu, Lianhong Cai:
A real-time speech driven talking avatar based on deep neural network. APSIPA 2013: 1-4 - [c19]Xin Zheng, Zhiyong Wu, Binbin Shen, Helen M. Meng, Lianhong Cai:
Investigation of tandem deep belief network approach for phoneme recognition. ICASSP 2013: 7586-7590 - [i1]Xin Zheng, Zhiyong Wu, Helen M. Meng, Weifeng Li, Lianhong Cai:
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition. CoRR abs/1309.6176 (2013) - 2012
- [c18]Jia Jia, Xiaohui Wang, Zhiyong Wu, Lianhong Cai, Helen M. Meng:
Modeling the correlation between modality semantics and facial expressions. APSIPA 2012: 1-10 - [c17]Fanbo Meng, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai:
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data. INTERSPEECH 2012: 466-469 - [c16]Tao Jiang, Zhiyong Wu, Jia Jia, Lianhong Cai:
Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis. ISCSLP 2012: 64-68 - [c15]Chunrong Li, Zhiyong Wu, Fanbo Meng, Helen M. Meng, Lianhong Cai:
Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis. ISCSLP 2012: 93-97 - [c14]Xixin Wu, Zhiyong Wu, Jia Jia, Lianhong Cai:
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers. ISCSLP 2012: 363-367 - [c13]Jianbo Jiang, Zhiyong Wu, Mingxing Xu, Jia Jia, Lianhong Cai:
Comparison of adaptation methods for GMM-SVM based speech emotion recognition. SLT 2012: 269-273 - 2011
- [c12]Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai:
Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis. INTERSPEECH 2011: 2165-2168 - 2010
- [c11]Quansheng Duan, Shiyin Kang, Zhiyong Wu, Lianhong Cai, Zhiwei Shuang, Yong Qin:
Comparison of Syllable/Phone HMM Based Mandarin TTS. ICPR 2010: 4496-4499 - [c10]Zhiyong Wu, Lianhong Cai, Helen M. Meng:
Modeling prosody patterns for Chinese expressive text-to-speech synthesis. ISCSLP 2010: 148-152 - [p1]Shen Zhang, Zhiyong Wu, Helen M. Meng, Lianhong Cai:
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar. Modeling Machine Emotions for Realizing Intelligence 2010: 109-132
2000 – 2009
- 2009
- [j1]Zhiyong Wu, Helen M. Meng, Hongwu Yang, Lianhong Cai:
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System. IEEE Trans. Speech Audio Process. 17(8): 1567-1576 (2009) - 2008
- [c9]Honglei Cong, Zhiyong Wu, Lianhong Cai, Helen M. Meng:
A New Prosodic Strength Calculation Method for Prosody Reduction Modeling. ISCSLP 2008: 53-56 - [c8]Zhiyong Wu, Jiying Wu, Helen M. Meng:
The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes. ISCSLP 2008: 370-373 - 2007
- [c7]Shen Zhang, Zhiyong Wu, Helen M. Meng, Lianhong Cai:
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar. ACII 2007: 24-35 - [c6]Shen Zhang, Zhiyong Wu, Helen M. Meng, Lianhong Cai:
Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar. ICASSP (4) 2007: 837-840 - 2006
- [c5]Zhiyong Wu, Lianhong Cai, Helen M. Meng:
Multi-level Fusion of Audio and Visual Features for Speaker Identification. ICB 2006: 493-499 - [c4]Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen M. Meng:
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar. INTERSPEECH 2006 - [c3]Zhiyong Wu, Helen M. Meng, Hui Ning, Sam C. Tse:
A Corpus-Based Approach for Cooperative Response Generation in a Dialog System. ISCSLP (Selected Papers) 2006: 614-626 - [c2]Hongwu Yang, Helen M. Meng, Zhiyong Wu, Lianhong Cai:
Modelling the Global acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis. SLT 2006: 138-141 - 2000
- [c1]Zhiyong Wu, Lianhong Cai, Tongchun Zhou:
Research on dynamic characters of Chinese pitch contours. INTERSPEECH 2000: 686-689
last updated on 2024-12-19 23:09 CET by the dblp team
all metadata released as open data under CC0 1.0 license