Yuxuan Wang 0002
Person information
- affiliation: ByteDance AI Lab, Mountain View, CA, USA
- affiliation: Google, Mountain View, CA, USA
- affiliation (former, PhD): Ohio State University, Columbus, OH, USA
Other persons with the same name
- Yuxuan Wang (aka: Yu-Xuan Wang) — disambiguation page
- Yuxuan Wang 0001 — Harbin Institute of Technology, School of Computer Science and Technology, China
2020 – today
- 2024
- [j15] Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing. IEEE ACM Trans. Audio Speech Lang. Process. 32: 517-528 (2024)
- [j14] Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley:
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2871-2883 (2024)
- [c51] Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang:
Audio Prompt Tuning for Universal Sound Separation. ICASSP 2024: 1446-1450
- [c50] Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, Yuanyuan Huo, Yuxuan Wang:
A Unified Front-End Framework for English Text-to-Speech Synthesis. ICASSP 2024: 10181-10185
- [c49] Qianqian Dong, Zhiying Huang, Qi Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang:
PolyVoice: Language Models for Speech to Speech Translation. ICLR 2024
- [c48] Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. ICML 2024
- [c47] Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song:
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models. IJCAI 2024: 5835-5843
- [i55] Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma:
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing. CoRR abs/2404.06674 (2024)
- [i54] Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
Can Large Language Models Understand Spatial Audio? CoRR abs/2406.07914 (2024)
- [i53] Junyi Ao, Yuancheng Wang, Xiaohai Tian, Dekun Chen, Jun Zhang, Lu Lu, Yuxuan Wang, Haizhou Li, Zhizheng Wu:
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words. CoRR abs/2406.13340 (2024)
- [i52] Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. CoRR abs/2406.15704 (2024)
- [i51] Van Tung Pham, Yist Y. Lin, Tao Han, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang:
A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR. CoRR abs/2406.17272 (2024)
- [i50] Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang:
Improving Audio Generation with Visual Enhanced Caption. CoRR abs/2407.04416 (2024)
- [i49] Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, Xiaoyang Li, Zeyang Li, Zehua Lin, Rui Liu, Shouda Liu, Lu Lu, Yizhou Lu, Jingting Ma, Shengtao Ma, Yulin Pei, Chen Shen, Tian Tan, Xiaogang Tian, Ming Tu, Bo Wang, Hao Wang, Yuping Wang, Yuxuan Wang, Hanzhang Xia, Rui Xia, Shuangyi Xie, Hongmin Xu, Meng Yang, Bihong Zhang, Jun Zhang, Wanyi Zhang, Yang Zhang, Yawei Zhang, Yijie Zheng, Ming Zou:
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition. CoRR abs/2407.04675 (2024)
- [i48] Minglun Han, Ye Bai, Chen Shen, Youjia Huang, Mingkun Huang, Zehua Lin, Linhao Dong, Lu Lu, Yuxuan Wang:
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training. CoRR abs/2409.08680 (2024)
- 2023
- [c46] Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang:
Streaming Voice Conversion via Intermediate Bottleneck Features and Non-Streaming Teacher Guidance. ICASSP 2023: 1-5
- [c45] Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition. INTERSPEECH 2023: 481-485
- [c44] Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Language-universal Phonetic Encoder for Low-resource Speech Recognition. INTERSPEECH 2023: 1429-1433
- [c43] Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang:
Efficient Neural Music Generation. NeurIPS 2023
- [i47] Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition. CoRR abs/2301.00066 (2023)
- [i46] Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing. CoRR abs/2305.05203 (2023)
- [i45] Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Yuanyuan Huo, Yuping Wang, Yuxuan Wang:
a unified front-end framework for english text-to-speech synthesis. CoRR abs/2305.10666 (2023)
- [i44] Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition. CoRR abs/2305.11569 (2023)
- [i43] Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Language-universal phonetic encoder for low-resource speech recognition. CoRR abs/2305.11576 (2023)
- [i42] Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang:
Efficient Neural Music Generation. CoRR abs/2305.15719 (2023)
- [i41] Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang:
PolyVoice: Language Models for Speech to Speech Translation. CoRR abs/2306.02982 (2023)
- [i40] Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang:
Separate Anything You Describe. CoRR abs/2308.05037 (2023)
- [i39] Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley:
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining. CoRR abs/2308.05734 (2023)
- [i38] Bing Han, Junyu Dai, Xuchen Song, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian:
InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models. CoRR abs/2308.14360 (2023)
- [i37] Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang:
Audio Prompt Tuning for Universal Sound Separation. CoRR abs/2311.18399 (2023)
- 2022
- [j13] Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang:
GiantMIDI-Piano: A Large-Scale MIDI Dataset for Classical Piano Music. Trans. Int. Soc. Music. Inf. Retr. 5(1): 87-98 (2022)
- [c42] Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism. ICASSP 2022: 8007-8011
- [c41] Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang:
Cloning One's Voice Using Very Limited Data in the Wild. ICASSP 2022: 8322-8326
- [c40] Xiaofeng Shu, Yanjie Chen, Chuxiang Shang, Yan Zhao, Chengshuai Zhao, Yehang Zhu, Chuanzeng Huang, Yuxuan Wang:
Non-intrusive Speech Quality Assessment with a Multi-Task Learning based Subband Adaptive Attention Temporal Convolutional Neural Network. INTERSPEECH 2022: 3298-3302
- [c39] Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang:
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration. INTERSPEECH 2022: 4232-4236
- [c38] Jingbei Li, Yi Meng, Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks. ACM Multimedia 2022: 5811-5820
- [i36] Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang:
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism. CoRR abs/2203.16838 (2022)
- [i35] Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang:
VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration. CoRR abs/2204.05841 (2022)
- [i34] Zhengxi Liu, Qiao Tian, Chenxu Hu, Xudong Liu, Menglin Wu, Yuping Wang, Hang Zhao, Yuxuan Wang:
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech. CoRR abs/2207.06088 (2022)
- [i33] Qiuqiang Kong, Shilei Liu, Junjie Shi, Xuzhou Ye, Yin Cao, Qiaoxi Zhu, Yong Xu, Yuxuan Wang:
Neural Sound Field Decomposition with Super-resolution of Sound Direction. CoRR abs/2210.12345 (2022)
- [i32] Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang:
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance. CoRR abs/2210.15158 (2022)
- 2021
- [j12] Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang:
High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3707-3717 (2021)
- [c37] Jiawen Huang, Ju-Chiang Wang, Jordan B. L. Smith, Xuchen Song, Yuxuan Wang:
Modeling the Compatibility of Stem Tracks to Generate Music Mashups. AAAI 2021: 187-195
- [c36] Ju-Chiang Wang, Jordan B. L. Smith, Jitong Chen, Xuchen Song, Yuxuan Wang:
Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-Task Learning. ICASSP 2021: 566-570
- [c35] Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang:
Speech Enhancement with Weakly Labelled Data from AudioSet. Interspeech 2021: 191-195
- [c34] Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma:
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. ISCSLP 2021: 1-5
- [c33] Keunwoo Choi, Yuxuan Wang:
Listen, Read, and Identify: Multimodal Singing Language Identification of Music. ISMIR 2021: 121-127
- [c32] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang:
Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation. ISMIR 2021: 342-349
- [c31] Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao:
Neural Dubber: Dubbing for Videos According to Scripts. NeurIPS 2021: 16582-16595
- [i31] Xuchen Song, Qiuqiang Kong, Xingjian Du, Yuxuan Wang:
CatNet: music source separation system with mix-audio augmentation. CoRR abs/2102.09966 (2021)
- [i30] Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang:
Speech enhancement with weakly labelled data from AudioSet. CoRR abs/2102.09971 (2021)
- [i29] Keunwoo Choi, Yuxuan Wang:
Listen, Read, and Identify: Multimodal Singing Language Identification of Music. CoRR abs/2103.01893 (2021)
- [i28] Jiawen Huang, Ju-Chiang Wang, Jordan B. L. Smith, Xuchen Song, Yuxuan Wang:
Modeling the Compatibility of Stem Tracks to Generate Music Mashups. CoRR abs/2103.14208 (2021)
- [i27] Ju-Chiang Wang, Jordan B. L. Smith, Jitong Chen, Xuchen Song, Yuxuan Wang:
Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-task Learning. CoRR abs/2103.14253 (2021)
- [i26] Xiaofeng Shu, Yehang Zhu, Yanjie Chen, Li Chen, Haohe Liu, Chuanzeng Huang, Yuxuan Wang:
Joint Echo Cancellation and Noise Suppression based on Cascaded Magnitude and Complex Mask Estimation. CoRR abs/2107.09298 (2021)
- [i25] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, Yuxuan Wang:
Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation. CoRR abs/2109.05418 (2021)
- [i24] Haohe Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang:
VoiceFixer: Toward General Speech Restoration With Neural Vocoder. CoRR abs/2109.13731 (2021)
- [i23] Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang:
Cloning one's voice using very limited data in the wild. CoRR abs/2110.03347 (2021)
- [i22] Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao:
Neural Dubber: Dubbing for Silent Videos According to Scripts. CoRR abs/2110.08243 (2021)
- 2020
- [j11] Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise. IEEE Signal Process. Lett. 27: 1730-1734 (2020)
- [j10] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley:
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2880-2894 (2020)
- [c30] Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yuping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei Li:
Xiaomingbot: A Multilingual Robot News Reporter. ACL (demo) 2020: 1-8
- [c29] Zishun Feng, Ming Tu, Rui Xia, Yuxuan Wang, Ashok K. Krishnamurthy:
Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos. IEEE BigData 2020: 5671-5672
- [c28] Qiuqiang Kong, Yuxuan Wang, Xuchen Song, Yin Cao, Wenwu Wang, Mark D. Plumbley:
Source Separation with Weakly Labelled Data: an Approach to Computational Auditory Scene Analysis. ICASSP 2020: 101-105
- [c27] Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang:
A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis. ICASSP 2020: 6689-6693
- [c26] Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma:
A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin. ICASSP 2020: 6694-6698
- [i21] Qiuqiang Kong, Yuxuan Wang, Xuchen Song, Yin Cao, Wenwu Wang, Mark D. Plumbley:
Source separation with weakly labelled data: An approach to computational auditory scene analysis. CoRR abs/2002.02065 (2020)
- [i20] Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma:
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. CoRR abs/2004.11012 (2020)
- [i19] Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise. CoRR abs/2004.13595 (2020)
- [i18] Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma:
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech. CoRR abs/2005.09271 (2020)
- [i17] Dongyang Dai, Li Chen, Yuping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang:
Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement. CoRR abs/2005.12531 (2020)
- [i16] Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yuping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei Li:
Xiaomingbot: A Multilingual Robot News Reporter. CoRR abs/2007.08005 (2020)
- [i15] Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang:
High-resolution Piano Transcription with Pedals by Regressing Onsets and Offsets Times. CoRR abs/2010.01815 (2020)
- [i14] Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang:
GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music. CoRR abs/2010.07061 (2020)
- [i13] Qiuqiang Kong, Keunwoo Choi, Yuxuan Wang:
Large-Scale MIDI-based Composer Classification. CoRR abs/2010.14805 (2020)
2010 – 2019
- 2019
- [c25] Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie:
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. ASRU 2019: 184-191
- [c24] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Yu-An Chung, Yuxuan Wang, Yonghui Wu, James R. Glass:
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization. ICASSP 2019: 5901-5905
- [c23] Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, R. J. Skerry-Ryan:
Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis. ICASSP 2019: 6940-6944
- [c22] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang:
Hierarchical Generative Modeling for Controllable Speech Synthesis. ICLR (Poster) 2019
- [i12] Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang:
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis. CoRR abs/1911.04111 (2019)
- [i11] Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma:
A hybrid text normalization system using multi-head self-attention for mandarin. CoRR abs/1911.04128 (2019)
- [i10] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley:
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. CoRR abs/1912.10211 (2019)
- 2018
- [c21] Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu:
Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions. ICASSP 2018: 4779-4783
- [c20] R. J. Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous:
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. ICML 2018: 4700-4709
- [c19] Yuxuan Wang, Daisy Stanton, Yu Zhang, R. J. Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Ye Jia, Fei Ren, Rif A. Saurous:
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. ICML 2018: 5167-5176
- [c18] Daisy Stanton, Yuxuan Wang, R. J. Skerry-Ryan:
Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis. SLT 2018: 595-602
- [i9] Yuxuan Wang, Daisy Stanton, Yu Zhang, R. J. Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous:
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. CoRR abs/1803.09017 (2018)
- [i8] R. J. Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous:
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. CoRR abs/1803.09047 (2018)
- [i7] Daisy Stanton, Yuxuan Wang, R. J. Skerry-Ryan:
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis. CoRR abs/1808.01410 (2018)
- [i6] Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, R. J. Skerry-Ryan:
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis. CoRR abs/1808.10128 (2018)
- [i5] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang:
Hierarchical Generative Modeling for Controllable Speech Synthesis. CoRR abs/1810.07217 (2018)
- 2017
- [c17] Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous:
Trainable frontend for robust and far-field keyword spotting. ICASSP 2017: 5670-5674
- [c16] Yuxuan Wang, R. J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous:
Tacotron: Towards End-to-End Speech Synthesis. INTERSPEECH 2017: 4006-4010
- [i4] Yuxuan Wang, R. J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous:
Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model. CoRR abs/1703.10135 (2017)
- [i3] Yuxuan Wang, R. J. Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous:
Uncovering Latent Style Factors for Expressive Speech Synthesis. CoRR abs/1711.00520 (2017)
- [i2] Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, R. J. Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu:
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. CoRR abs/1712.05884 (2017)
- 2016
- [j9] Jitong Chen, Yuxuan Wang, DeLiang Wang:
Noise perturbation for supervised speech separation. Speech Commun. 78: 1-10 (2016)
- [j8] Donald S. Williamson, Yuxuan Wang, DeLiang Wang:
Complex Ratio Masking for Monaural Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 24(3): 483-492 (2016)
- [c15] Donald S. Williamson, Yuxuan Wang, DeLiang Wang:
Complex ratio masking for joint enhancement of magnitude and phase. ICASSP 2016: 5220-5224
- [i1] Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. Saurous:
Trainable Frontend For Robust and Far-Field Keyword Spotting. CoRR abs/1607.05666 (2016)
- 2015
- [j7] Kun Han, Yuxuan Wang, DeLiang Wang, William S. Woods, Ivo Merks, Tao Zhang:
Learning Spectral Mapping for Speech Dereverberation and Denoising. IEEE ACM Trans. Audio Speech Lang. Process. 23(6): 982-992 (2015)
- [j6] Xiaojia Zhao, Yuxuan Wang, DeLiang Wang:
Cochannel Speaker Identification in Anechoic and Reverberant Conditions. IEEE ACM Trans. Audio Speech Lang. Process. 23(11): 1727-1736 (2015)
- [c14] Jitong Chen, Yuxuan Wang, DeLiang Wang:
Noise Perturbation Improves Supervised Speech Separation. LVA/ICA 2015: 83-90
- [c13] Yuxuan Wang, DeLiang Wang:
A deep neural network for time-domain signal reconstruction. ICASSP 2015: 4390-4394
- [c12] Xiaojia Zhao, Yuxuan Wang, DeLiang Wang:
Deep neural networks for cochannel speaker identification. ICASSP 2015: 4824-4828
- [c11] Donald S. Williamson, Yuxuan Wang, DeLiang Wang:
Deep neural networks for estimating speech model activations. ICASSP 2015: 5113-5117
- 2014
- [j5] Xiaojia Zhao, Yuxuan Wang, DeLiang Wang:
Robust Speaker Identification in Noisy and Reverberant Conditions. IEEE ACM Trans. Audio Speech Lang. Process. 22(4): 836-845 (2014)
- [j4] Yuxuan Wang, Arun Narayanan, DeLiang Wang:
On training targets for supervised speech separation. IEEE ACM Trans. Audio Speech Lang. Process. 22(12): 1849-1858 (2014)
- [j3] Jitong Chen, Yuxuan Wang, DeLiang Wang:
A feature study for classification-based speech separation at low signal-to-noise ratios. IEEE ACM Trans. Audio Speech Lang. Process. 22(12): 1993-2002 (2014)
- [c10] Xiaojia Zhao, Yuxuan Wang, DeLiang Wang:
Robust speaker identification in noisy and reverberant conditions. ICASSP 2014: 3997-4001
- [c9] Kun Han, Yuxuan Wang, DeLiang Wang:
Learning spectral mapping for speech dereverberation. ICASSP 2014: 4628-4632
- [c8] Yuxuan Wang, DeLiang Wang:
A structure-preserving training target for supervised speech separation. ICASSP 2014: 6107-6111
- [c7] Donald S. Williamson, Yuxuan Wang, DeLiang Wang:
A two-stage approach for improving the perceptual quality of separated speech. ICASSP 2014: 7034-7038
- [c6] Jitong Chen, Yuxuan Wang, DeLiang Wang:
A feature study for classification-based speech separation at very low signal-to-noise ratio. ICASSP 2014: 7039-7043
- 2013
- [j2] Yuxuan Wang, Kun Han, DeLiang Wang:
Exploring Monaural Features for Classification-Based Speech Segregation. IEEE Trans. Speech Audio Process. 21(2): 270-279 (2013)
- [j1] Yuxuan Wang, DeLiang Wang:
Towards Scaling Up Classification-Based Speech Separation. IEEE Trans. Speech Audio Process. 21(7): 1381-1390 (2013)
- [c5] Donald S. Williamson, Yuxuan Wang, DeLiang Wang:
A sparse representation approach for perceptual quality improvement of separated speech. ICASSP 2013: 7015-7019
- [c4] Yuxuan Wang, DeLiang Wang:
Feature denoising for speech separation in unknown noisy environments. ICASSP 2013: 7472-7476
- 2012
- [c3] Yuxuan Wang, DeLiang Wang:
Boosting Classification Based Speech Separation Using Temporal Dynamics. INTERSPEECH 2012: 1528-1531
- [c2] Yuxuan Wang, Kun Han, DeLiang Wang:
Acoustic Features for Classification Based Speech Separation. INTERSPEECH 2012: 1532-1535
- [c1] Yuxuan Wang, DeLiang Wang:
Cocktail Party Processing via Structured Prediction. NIPS 2012: 224-232
last updated on 2024-11-19 21:44 CET by the dblp team
all metadata released as open data under CC0 1.0 license