Lei Xie 0001
Person information
- affiliation: Northwestern Polytechnical University, School of Computer Science, Xi'an, China
- affiliation (2006 - 2007): The Chinese University of Hong Kong, Department of Systems Engineering and Engineering Management, Hong Kong
- affiliation (2004 - 2006): City University of Hong Kong, School of Creative Media, Hong Kong
- affiliation (PhD 2004): Northwestern Polytechnical University, Xi'an, China
- affiliation (2001 - 2002): Vrije Universiteit Brussel, Department of Electronics and Information Processing, Belgium
Other persons with the same name
- Lei Xie — disambiguation page
- Lei Xie 0002 — Xi'an Jiaotong University, China
- Lei Xie 0003 — Zhejiang University, College of Information Science and Electronic Engineering, Hangzhou, China
- Lei Xie 0004 — Nanjing University, State Key Laboratory for Novel Software Technology, China
- Lei Xie 0005 — Delft University of Technology, Laboratory of Computer Engineering, The Netherlands
- Lei Xie 0006 — City University of New York, Department of Computer Science, Hunter College, NY, USA (and 1 more)
- Lei Xie 0007 — Zhejiang University, State Key Laboratory of Industrial Control Technology, Hangzhou, China (and 2 more)
- Lei Xie 0008 — Air Force Engineering University, Institute of Aeronautics Engineering, Department of weapon science and technology, China
- Lei Xie 0009 — Hong Kong University of Science and Technology, Department of Electronic and Computer Science, Hong Kong (and 1 more)
2020 – today
- 2024
- [j74]Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie:
Whisper-SV: Adapting Whisper for low-data-resource speaker verification. Speech Commun. 163: 103103 (2024) - [j73]Bingshen Mu, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:
MMGER: Multi-Modal and Multi-Granularity Generative Error Correction With LLM for Joint Accent and Speech Recognition. IEEE Signal Process. Lett. 31: 1940-1944 (2024) - [j72]Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie:
Distil-DCCRN: A Small-Footprint DCCRN Leveraging Feature-Based Knowledge Distillation in Speech Enhancement. IEEE Signal Process. Lett. 31: 2075-2079 (2024) - [j71]Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang:
StreamVoice+: Evolving Into End-to-End Streaming Zero-Shot Voice Conversion. IEEE Signal Process. Lett. 31: 3000-3004 (2024) - [j70]Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie:
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 32: 459-470 (2024) - [j69]Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie:
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1506-1518 (2024) - [j68]Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie:
Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2432-2444 (2024) - [j67]Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang:
Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2926-2937 (2024) - [j66]Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie:
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-Assisted Matrix. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2944-2956 (2024) - [j65]Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie:
U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4026-4035 (2024) - [c277]Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang:
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion. ACL (1) 2024: 7328-7338 - [c276]Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
Bs-Plcnet: Band-Split Packet Loss Concealment Network with Multi-Task Learning Framework and Multi-Discriminators. ICASSP Workshops 2024: 23-24 - [c275]Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie:
An Audio-Quality-Based Multi-Strategy Approach For Target Speaker Extraction in the Misp 2023 Challenge. ICASSP Workshops 2024: 27-28 - [c274]Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
Rad-Net: A Repairing and Denoising Network for Speech Signal Improvement. ICASSP Workshops 2024: 49-50 - [c273]He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li:
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. ICASSP Workshops 2024: 63-64 - [c272]He Wang, Pengcheng Guo, Pan Zhou, Lei Xie:
MLCA-AVSR: Multi-Layer Cross Attention Fusion Based Audio-Visual Speech Recognition. ICASSP 2024: 8150-8154 - [c271]Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie:
Promptvc: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts. ICASSP 2024: 10571-10575 - [c270]Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi:
Dualvc 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion. ICASSP 2024: 11106-11110 - [c269]Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie:
Automatic Channel Selection and Spatial Feature Integration for Multi-Channel Speech Recognition Across Various Array Topologies. ICASSP 2024: 11396-11400 - [c268]Ziqian Wang, Xinfa Zhu, Zihan Zhang, Yuanjun Lv, Ning Jiang, Guoqing Zhao, Lei Xie:
SELM: Speech Enhancement using Discrete Tokens and Language Models. ICASSP 2024: 11561-11565 - [c267]He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie:
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder. ICME Workshops 2024: 1-6 - [c266]Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie:
SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition. ICME 2024: 1-6 - [c265]Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie:
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-Supervised Contrastive Learning. ICME 2024: 1-6 - [c264]Xinfa Zhu, Wenjie Tian, Xinsheng Wang, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie:
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis. ACM Multimedia 2024: 7513-7522 - [c263]Zhixian Zhao, Haifeng Chen, Xi Li, Dongmei Jiang, Lei Xie:
Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment. MRAC@MM 2024: 67-71 - [i195]Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie:
E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models. CoRR abs/2401.00475 (2024) - [i194]He Wang, Pengcheng Guo, Pan Zhou, Lei Xie:
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition. CoRR abs/2401.03424 (2024) - [i193]He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li:
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge. CoRR abs/2401.03473 (2024) - [i192]Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators. CoRR abs/2401.03687 (2024) - [i191]Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie:
An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge. CoRR abs/2401.03697 (2024) - [i190]Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement. CoRR abs/2401.04389 (2024) - [i189]He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie:
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023. CoRR abs/2401.06788 (2024) - [i188]He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie:
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder. CoRR abs/2404.05466 (2024) - [i187]Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie:
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets. CoRR abs/2405.02132 (2024) - [i186]Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition. CoRR abs/2405.03152 (2024) - [i185]Yuepeng Jiang, Tao Li, Fengyu Yang, Lei Xie, Meng Meng, Yujun Wang:
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling. CoRR abs/2406.05681 (2024) - [i184]Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li:
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection. CoRR abs/2406.07256 (2024) - [i183]Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie:
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention. CoRR abs/2406.07498 (2024) - [i182]Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie:
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter. CoRR abs/2406.08196 (2024) - [i181]Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie:
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy. CoRR abs/2406.09844 (2024) - [i180]Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie:
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study. CoRR abs/2406.18862 (2024) - [i179]Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie:
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification. CoRR abs/2407.10048 (2024) - [i178]He Wang, Lei Xie:
The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024. CoRR abs/2408.02369 (2024) - [i177]Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie:
Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement. CoRR abs/2408.04267 (2024) - [i176]Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie:
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition. CoRR abs/2408.09491 (2024) - [i175]Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie:
Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper. CoRR abs/2408.10680 (2024) - [i174]Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie:
Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation. CoRR abs/2408.15474 (2024) - [i173]Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, Binbin Zhang, Bin Jia:
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge. CoRR abs/2409.05430 (2024) - [i172]Shuiyun Liu, Yuxiang Kong, Pengcheng Guo, Weiji Zhuang, Peng Gao, Yujun Wang, Lei Xie:
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge. CoRR abs/2409.10076 (2024) - [i171]Hongfei Xue, Wei Ren, Xuelong Geng, Kun Wei, Longhao Li, Qijie Shao, Linju Yang, Kai Diao, Lei Xie:
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text. CoRR abs/2409.11214 (2024) - [i170]Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie, Lei Ma, Jianjun Zhao:
Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling. CoRR abs/2410.01350 (2024) - [i169]Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, Eng Siong Chng:
NTU-NPU System for Voice Privacy 2024 Challenge. CoRR abs/2410.02371 (2024) - [i168]Dake Guo, Jixun Yao, Xinfa Zhu, Kangxiang Xia, Zhao Guo, Ziyu Zhang, Yao Wang, Jie Liu, Lei Xie:
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge. CoRR abs/2410.23815 (2024) - [i167]Kangxiang Xia, Dake Guo, Jixun Yao, Liumeng Xue, Hanzhao Li, Shuai Wang, Zhao Guo, Lei Xie, Qingqing Zhang, Lei Luo, Minghui Dong, Peng Sun:
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings. CoRR abs/2411.00064 (2024) - [i166]Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma:
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM. CoRR abs/2411.00774 (2024)
- 2023
- [j64]Xiang Hao, Chenglin Xu, Lei Xie:
Neural speech enhancement with unsupervised pre-training and mixture training. Neural Networks 158: 216-227 (2023) - [j63]Zhenglei Wei, Huan Zhou, Fei Cen, Lei Xie, Wenqiang Zhu, Peng Zhang, Qinzhi Hao:
A novel evolutionary algorithm inspired from triangle search and its applications on parameters identification of photovoltaic models. Soft Comput. 27(20): 14835-14860 (2023) - [j62]Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang:
LM-VC: Zero-Shot Voice Conversion via Speech Generation Based on Language Models. IEEE Signal Process. Lett. 30: 1157-1161 (2023) - [j61]Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie:
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3418-3430 (2023) - [j60]Qing Wang, Jixun Yao, Li Zhang, Pengcheng Guo, Lei Xie:
Timbre-Reserved Adversarial Attack in Speaker Identification. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3848-3858 (2023) - [j59]Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang:
MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3883-3895 (2023) - [j58]Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha:
Look&listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement. IEEE Trans. Multim. 25: 5800-5812 (2023) - [j57]Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Odette Scharenborg:
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Persons. IEEE Trans. Multim. 25: 6717-6728 (2023) - [c262]Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su:
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. AAAI 2023: 13025-13033 - [c261]Wenjiang Chi, Xiaoqin Feng, Liumeng Xue, Yunlin Chen, Lei Xie, Zhifei Li:
Multi-granularity Semantic and Acoustic Stress Prediction for Expressive TTS. APSIPA ASC 2023: 2409-2415 - [c260]Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition. ASRU 2023: 1-7 - [c259]Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie:
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS. ASRU 2023: 1-7 - [c258]Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie:
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. ASRU 2023: 1-8 - [c257]Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie:
Sa-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR. ASRU 2023: 1-7 - [c256]Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie:
Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis. ASRU 2023: 1-8 - [c255]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. ASRU 2023: 1-8 - [c254]Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie:
Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation. ASRU 2023: 1-8 - [c253]Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie:
Vits-Based Singing Voice Conversion Leveraging Whisper and Multi-Scale F0 Modeling. ASRU 2023: 1-8 - [c252]Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie:
MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement. ASRU 2023: 1-8 - [c251]Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li:
Promptspeaker: Speaker Generation Based on Text Descriptions. ASRU 2023: 1-7 - [c250]Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie:
An Exploration of Task-Decoupling on Two-Stage Neural Post Filter for Real-Time Personalized Acoustic Echo Cancellation. ASRU 2023: 1-7 - [c249]Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie:
U2-KWS: Unified Two-Pass Open-Vocabulary Keyword Spotting with Keyword Bias. ASRU 2023: 1-8 - [c248]Yuepeng Jiang, Kun Song, Fengyu Yang, Lei Xie, Meng Meng, Yu Ji, Yujun Wang:
The Xiaomi-ASLP Text-to-speech System for Blizzard Challenge 2023. Blizzard Challenge 2023 - [c247]Ziqian Wang, Qing Wang, Jixun Yao, Lei Xie:
The NPU-ASLP System for Deepfake Algorithm Recognition in ADD 2023 Challenge. DADA@IJCAI 2023: 64-69 - [c246]Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie:
Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge. ICASSP 2023: 1-2 - [c245]Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi:
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features. ICASSP 2023: 1-5 - [c244]Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai:
DSPGAN: A Gan-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP. ICASSP 2023: 1-5 - [c243]Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang:
Delivering Speaking Style in Low-Resource Voice Conversion with Multi-Factor Constraints. ICASSP 2023: 1-5 - [c242]Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan:
Wekws: A Production First Small-Footprint End-to-End Keyword Spotting Toolkit. ICASSP 2023: 1-5 - [c241]Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie:
The NPU-Elevoc Personalized Speech Enhancement System for Icassp2023 DNS Challenge. ICASSP 2023: 1-2 - [c240]Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie:
Preserving Background Sound in Noise-Robust Voice Conversion Via Multi-Task Learning. ICASSP 2023: 1-5 - [c239]Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu:
Distinguishable Speaker Anonymization Based on Formant and Fundamental Frequency Scaling. ICASSP 2023: 1-5 - [c238]Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie, Yingying Gao, Shilei Zhang, Junlan Feng:
VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting. ICASSP 2023: 1-5 - [c237]Li Zhang, Qing Wang, Hongji Wang, Yue Li, Wei Rao, Yannan Wang, Lei Xie:
Distance-Based Weight Transfer for Fine-Tuning From Near-Field to Far-Field Speaker Verification. ICASSP 2023: 1-5 - [c236]Zihan Zhang, Shimin Zhang, Mingshuai Liu, Yanhong Leng, Zhe Han, Li Chen, Lei Xie:
Two-Step Band-Split Neural Network Approach For Full-Band Residual Echo Suppression. ICASSP 2023: 1-2 - [c235]Xinfa Zhu, Yi Lei, Kun Song, Yongmao Zhang, Tao Li, Lei Xie:
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling. ICASSP 2023: 1-5 - [c234]Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma:
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. INTERSPEECH 2023: 42-46 - [c233]Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu:
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition. INTERSPEECH 2023: 216-220 - [c232]Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie:
DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting. INTERSPEECH 2023: 929-933 - [c231]Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie:
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition. INTERSPEECH 2023: 1668-1672 - [c230]Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi:
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding. INTERSPEECH 2023: 2063-2067 - [c229]Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie:
Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer. INTERSPEECH 2023: 3257-3261 - [c228]Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie:
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR. INTERSPEECH 2023: 3487-3491 - [c227]Qing Wang, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie:
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification. INTERSPEECH 2023: 3994-3998 - [c226]Yongmao Zhang, Heyang Xue, Hanzhao Li, Lei Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong:
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer. INTERSPEECH 2023: 4444-4448 - [c225]Guanghou Liu, Yongmao Zhang, Yi Lei, Yunlin Chen, Rui Wang, Lei Xie, Zhifei Li:
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions. INTERSPEECH 2023: 4888-4892 - [c224]Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie:
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network. INTERSPEECH 2023: 4933-4937 - [c223]Kun Song, Yi Lei, Peikun Chen, Yiqing Cao, Kun Wei, Yongmao Zhang, Lei Xie, Ning Jiang, Guoqing Zhao:
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task. IWSLT@ACL 2023: 311-320 - [i165]Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie:
Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer. CoRR abs/2301.06735 (2023) - [i164]Ao Zhang, He Wang, Pengcheng Guo, Yihui Fu, Lei Xie, Yingying Gao, Shilei Zhang, Junlan Feng:
VE-KWS: Visual Modality Enhanced End-to-End Keyword Spotting. CoRR abs/2302.13523 (2023) - [i163]Li Zhang, Qing Wang, Hongji Wang, Yue Li, Wei Rao, Yannan Wang, Lei Xie:
Distance-based Weight Transfer from Near-field to Far-field Speaker Verification. CoRR abs/2303.00264 (2023) - [i162]Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie:
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge. CoRR abs/2303.07621 (2023) - [i161]Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang:
Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion. CoRR abs/2305.07204 (2023) - [i160]Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie:
DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting. CoRR abs/2305.12331 (2023) - [i159]Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi:
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding. CoRR abs/2305.12425 (2023) - [i158]Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie:
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network. CoRR abs/2305.12493 (2023) - [i157]Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie:
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR. CoRR abs/2305.13716 (2023) - [i156]Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma:
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. CoRR abs/2305.17732 (2023) - [i155]Qing Wang, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie:
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification. CoRR abs/2305.19020 (2023) - [i154]Guanghou Liu, Yongmao Zhang, Yi Lei, Yunlin Chen, Rui Wang, Zhifei Li, Lei Xie:
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions. CoRR abs/2305.19522 (2023) - [i153]Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie:
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition. CoRR abs/2306.00804 (2023) - [i152]Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang:
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models. CoRR abs/2306.10521 (2023) - [i151]Kun Song, Yi Lei, Peikun Chen, Yiqing Cao, Kun Wei, Yongmao Zhang, Lei Xie, Ning Jiang, Guoqing Zhao:
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task. CoRR abs/2307.04630 (2023) - [i150]Li Zhang, Huan Zhao, Yue Li, Bowen Pang, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie:
The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022. CoRR abs/2307.15400 (2023) - [i149]Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie:
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech - A Study between English and Mandarin. CoRR abs/2309.00883 (2023) - [i148]Qing Wang, Jixun Yao, Li Zhang, Pengcheng Guo, Lei Xie:
Timbre-reserved Adversarial Attack in Speaker Identification. CoRR abs/2309.00929 (2023) - [i147]Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang:
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling. CoRR abs/2309.01142 (2023) - [i146]Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, Jingjing Yin, Hongbin Zhou, Heng Lu, Lei Xie:
PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts. CoRR abs/2309.09262 (2023) - [i145]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR. CoRR abs/2309.13573 (2023) - [i144]Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie:
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS. CoRR abs/2309.13907 (2023) - [i143]Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi:
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion. CoRR abs/2309.15496 (2023) - [i142]Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Lei Xie, Jie Liu:
SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition. CoRR abs/2309.16937 (2023) - [i141]Peikun Chen, Fan Yu, Yuhao Liang, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie:
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition. CoRR abs/2310.02629 (2023) - [i140]Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie:
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis. CoRR abs/2310.03963 (2023) - [i139]Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie:
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning. CoRR abs/2310.04004 (2023) - [i138]Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie:
MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement. CoRR abs/2310.04369 (2023) - [i137]Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie:
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. CoRR abs/2310.04657 (2023) - [i136]Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie:
An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation. CoRR abs/2310.04715 (2023) - [i135]Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie:
SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR. CoRR abs/2310.04863 (2023) - [i134]Yongmao Zhang, Guanghou Liu, Yi Lei, Yunlin Chen, Hao Yin, Lei Xie, Zhifei Li:
PromptSpeaker: Speaker Generation Based on Text Descriptions. CoRR abs/2310.05001 (2023) - [i133]Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie:
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation. CoRR abs/2310.05051 (2023) - [i132]Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie:
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation. CoRR abs/2310.07246 (2023) - [i131]Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie:
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation. CoRR abs/2310.14278 (2023) - [i130]Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie:
Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning. CoRR abs/2310.17101 (2023) - [i129]Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie:
Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition. CoRR abs/2311.07062 (2023) - [i128]Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie:
SponTTS: modeling and transferring spontaneous style for TTS. CoRR abs/2311.07179 (2023) - [i127]Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie:
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization. CoRR abs/2312.04131 (2023) - [i126]Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie:
Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies. CoRR abs/2312.09746 (2023) - [i125]Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie:
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias. CoRR abs/2312.09760 (2023) - [i124]Linhan Ma, Yongmao Zhang, Xinfa Zhu, Yi Lei, Ziqian Ning, Pengcheng Zhu, Lei Xie:
Accent-VITS: accent transfer for end-to-end TTS. CoRR abs/2312.16850 (2023)
- 2022
- [j56]Hongqiang Du, Lei Xie, Haizhou Li:
Noise-robust voice conversion with domain adversarial training. Neural Networks 148: 74-84 (2022) - [j55]Chenggang Mi, Lei Xie, Yanning Zhang:
Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing. Neural Networks 148: 194-205 (2022) - [j54]Jingyong Hou, Lei Xie, Shilei Zhang:
Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution. Neural Networks 150: 28-42 (2022) - [j53]Yi Lei, Shan Yang, Xinfa Zhu, Lei Xie, Dan Su:
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis. IEEE Signal Process. Lett. 29: 1948-1952 (2022) - [j52]Xiaochun An, Frank K. Soong, Lei Xie:
Disentangling Style and Speaker Attributes for TTS Style Transfer. IEEE ACM Trans. Audio Speech Lang. Process. 30: 646-658 (2022) - [j51]Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie:
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 30: 853-864 (2022) - [j50]Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Lei Xie:
Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1448-1460 (2022) - [j49]Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie:
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2854-2864 (2022) - [c222]Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu:
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge. ICASSP 2022: 6167-6171 - [c221]Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng:
WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition. ICASSP 2022: 6182-6186 - [c220]Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma:
Conversational Speech Recognition by Learning Conversation-Level Characteristics. ICASSP 2022: 6752-6756 - [c219]Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
One-Shot Voice Conversion For Style Transfer Based On Speaker Adaptation. ICASSP 2022: 6792-6796 - [c218]Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis. ICASSP 2022: 7237-7241 - [c217]Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie:
Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation. ICASSP 2022: 7417-7421 - [c216]Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu:
S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement. ICASSP 2022: 7767-7771 - [c215]Shimin Zhang, Ziteng Wang, Jiayao Sun, Yihui Fu, Biao Tian, Qiang Fu, Lei Xie:
Multi-Task Deep Residual Echo Suppression with Echo-Aware Loss. ICASSP 2022: 9127-9131 - [c214]Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu:
Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge. ICASSP 2022: 9156-9160 - [c213]Yukai Ju, Wei Rao, Xiaopeng Yan, Yihui Fu, Shubo Lv, Luyao Cheng, Yannan Wang, Lei Xie, Shidong Shang:
TEA-PSE: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2022 DNS Challenge. ICASSP 2022: 9291-9295 - [c212]Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie:
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings. INTERSPEECH 2022: 560-564 - [c211]Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma:
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR. INTERSPEECH 2022: 1016-1020 - [c210]Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu:
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit. INTERSPEECH 2022: 1661-1665 - [c209]Zhanheng Yang, Sining Sun, Jin Li, Xiaoming Zhang, Xiong Wang, Long Ma, Lei Xie:
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer. INTERSPEECH 2022: 1681-1685 - [c208]Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan:
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset. INTERSPEECH 2022: 1736-1740 - [c207]Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie:
Personalized Acoustic Echo Cancellation for Full-duplex Communications. INTERSPEECH 2022: 2518-2522 - [c206]Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie:
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers. INTERSPEECH 2022: 2548-2552 - [c205]Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su:
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion. INTERSPEECH 2022: 2563-2567 - [c204]Zhanheng Yang, Hang Lv, Xiong Wang, Ao Zhang, Lei Xie:
Minimizing Sequential Confusion Error in Speech Command Recognition. INTERSPEECH 2022: 3193-3197 - [c203]Qijie Shao, Jinghao Yan, Jian Kang, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie:
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition. INTERSPEECH 2022: 3719-3723 - [c202]Yu Wang, Xinsheng Wang, Pengcheng Zhu, Jie Wu, Hanzhao Li, Heyang Xue, Yongmao Zhang, Lei Xie, Mengxiao Bi:
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis. INTERSPEECH 2022: 4242-4246 - [c201]Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher. INTERSPEECH 2022: 4267-4271 - [c200]Li Zhang, Yue Li, Huan Zhao, Qing Wang, Lei Xie:
Backend Ensemble for Speaker Verification and Spoofing Countermeasure. INTERSPEECH 2022: 4381-4385 - [c199]Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Mingqi Jiang, Lei Xie:
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis. INTERSPEECH 2022: 5498-5502 - [c198]Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Multi-speaker Multi-style Text-to-speech Synthesis with Single-speaker Single-style Training Data Scenarios. ISCSLP 2022: 66-70 - [c197]Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu:
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS. ISCSLP 2022: 71-75 - [c196]Yongmao Zhang, Zhichao Wang, Peiji Yang, Hongshen Sun, Zhisheng Wang, Lei Xie:
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents. ISCSLP 2022: 76-80 - [c195]Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su:
End-to-End Voice Conversion with Information Perturbation. ISCSLP 2022: 91-95 - [c194]Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su:
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation. ISCSLP 2022: 319-323 - [c193]Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan:
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. ISCSLP 2022: 488-492 - [c192]Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie:
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge. ISCSLP 2022: 502-506 - [c191]Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu:
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results. ISCSLP 2022: 507-511 - [c190]Yuhao Liang, Peikun Chen, Fan Yu, Xinfa Zhu, Tianyi Xu, Yingying Gao, Lei Xie:
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge. ISCSLP 2022: 532-536 - [c189]Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, Yuxiao Lin, Lei Xie:
MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario. SLT 2022: 144-151 - [c188]Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang:
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. SLT 2022: 436-443 - [c187]Yukai Ju, Shimin Zhang, Wei Rao, Yannan Wang, Tao Yu, Lei Xie, Shidong Shang:
TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. SLT 2022: 472-479 - [i123]Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li:
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion. CoRR abs/2201.00269 (2022) - [i122]Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie:
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis. CoRR abs/2201.06460 (2022) - [i121]Yu Wang, Xinsheng Wang, Pengcheng Zhu, Jie Wu, Hanzhao Li, Heyang Xue, Yongmao Zhang, Lei Xie, Mengxiao Bi:
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis. CoRR abs/2201.07429 (2022) - [i120]Xiaochun An, Frank K. Soong, Lei Xie:
Disentangling Style and Speaker Attributes for TTS Style Transfer. CoRR abs/2201.09472 (2022) - [i119]Hongqiang Du, Lei Xie, Haizhou Li:
Noise-robust voice conversion with domain adversarial training. CoRR abs/2201.10693 (2022) - [i118]Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, Siqi Zheng, Weilong Huang, Lei Xie, Zheng-Hua Tan, DeLiang Wang, Yanmin Qian, Kong Aik Lee, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu:
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge. CoRR abs/2202.03647 (2022) - [i117]Shimin Zhang, Ziteng Wang, Jiayao Sun, Yihui Fu, Biao Tian, Qiang Fu, Lei Xie:
Multi-Task Deep Residual Echo Suppression with Echo-aware Loss. CoRR abs/2202.06850 (2022) - [i116]Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma:
Conversational Speech Recognition By Learning Conversation-level Characteristics. CoRR abs/2202.07855 (2022) - [i115]Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha:
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement. CoRR abs/2203.02216 (2022) - [i114]Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang:
Audio-visual speech separation based on joint feature representation with cross-modal attention. CoRR abs/2203.02655 (2022) - [i113]Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha:
Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild. CoRR abs/2203.03984 (2022) - [i112]Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang:
An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection. CoRR abs/2203.05178 (2022) - [i111]Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu:
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit. CoRR abs/2203.15455 (2022) - [i110]Heyang Xue, Xinsheng Wang, Yongmao Zhang, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher. CoRR abs/2203.16408 (2022) - [i109]Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie:
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings. CoRR abs/2203.16834 (2022) - [i108]Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan:
Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset. CoRR abs/2203.16844 (2022) - [i107]Qijie Shao, Jinghao Yan, Jian Kang, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie:
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition. CoRR abs/2204.03398 (2022) - [i106]Shimin Zhang, Ziteng Wang, Yukai Ju, Yihui Fu, Yueyue Na, Qiang Fu, Lei Xie:
Personalized Acoustic Echo Cancellation for Full-duplex Communications. CoRR abs/2205.15195 (2022) - [i105]Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su:
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation. CoRR abs/2206.00208 (2022) - [i104]Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su:
End-to-End Voice Conversion with Information Perturbation. CoRR abs/2206.07569 (2022) - [i103]Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie:
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers. CoRR abs/2207.00756 (2022) - [i102]Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma:
Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR. CoRR abs/2207.01039 (2022) - [i101]Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Mingqi Jiang, Lei Xie:
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis. CoRR abs/2207.01198 (2022) - [i100]Zhanheng Yang, Hang Lv, Xiong Wang, Ao Zhang, Lei Xie:
Minimizing Sequential Confusion Error in Speech Command Recognition. CoRR abs/2207.01261 (2022) - [i99]Zhanheng Yang, Sining Sun, Jin Li, Xiaoming Zhang, Xiong Wang, Long Ma, Lei Xie:
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer. CoRR abs/2207.01267 (2022) - [i98]Li Zhang, Yue Li, Huan Zhao, Qing Wang, Lei Xie:
Backend Ensemble for Speaker Verification and Spoofing Countermeasure. CoRR abs/2207.01802 (2022) - [i97]Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su:
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion. CoRR abs/2207.01832 (2022) - [i96]Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan:
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines. CoRR abs/2208.08042 (2022) - [i95]Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie:
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS. CoRR abs/2209.06484 (2022) - [i94]Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie:
NWPU-ASLP System for the VoicePrivacy 2022 Challenge. CoRR abs/2209.11969 (2022) - [i93]Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, Yuxiao Lin, Lei Xie:
MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario. CoRR abs/2210.05265 (2022) - [i92]Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang:
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. CoRR abs/2210.08802 (2022) - [i91]Yuhao Liang, Peikun Chen, Fan Yu, Xinfa Zhu, Tianyi Xu, Lei Xie:
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge. CoRR abs/2210.14448 (2022) - [i90]Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie:
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge. CoRR abs/2210.14653 (2022) - [i89]Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan:
WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit. CoRR abs/2210.16743 (2022) - [i88]Yongmao Zhang, Zhichao Wang, Peiji Yang, Hongshen Sun, Zhisheng Wang, Lei Xie:
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents. CoRR abs/2210.17305 (2022) - [i87]Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu:
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS. CoRR abs/2210.17349 (2022) - [i86]Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai:
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP. CoRR abs/2211.01087 (2022) - [i85]Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen, Xin Xu:
The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results. CoRR abs/2211.01585 (2022) - [i84]Yongmao Zhang, Heyang Xue, Hanzhao Li, Lei Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong:
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer. CoRR abs/2211.02903 (2022) - [i83]Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie:
Preserving background sound in noise-robust voice conversion via multi-task learning. CoRR abs/2211.03036 (2022) - [i82]Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu:
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling. CoRR abs/2211.03038 (2022) - [i81]Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi:
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features. CoRR abs/2211.04710 (2022) - [i80]Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang:
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints. CoRR abs/2211.08857 (2022) - [i79]Xinfa Zhu, Yi Lei, Kun Song, Yongmao Zhang, Tao Li, Lei Xie:
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling. CoRR abs/2211.10568 (2022) - [i78]Zhuoyuan Yao, Shuo Ren, Sanyuan Chen, Ziyang Ma, Pengcheng Guo, Lei Xie:
TESSP: Text-Enhanced Self-Supervised Speech Pre-training. CoRR abs/2211.13443 (2022) - [i77]Yue Li, Li Zhang, Namin Wang, Jie Liu, Lei Xie:
MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages. CoRR abs/2211.16694 (2022) - [i76]Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su:
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. CoRR abs/2212.01546 (2022)
- 2021
- [j48]Liumeng Xue, Shifeng Pan, Lei He, Lei Xie, Frank K. Soong:
Cycle consistent network for end-to-end style transfer TTS training. Neural Networks 140: 223-236 (2021) - [j47]Xiaochun An, Frank K. Soong, Shan Yang, Lei Xie:
Effective and direct control of neural TTS prosody by removing interactions between different attributes. Neural Networks 143: 250-260 (2021) - [j46]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Factorized WaveNet for voice conversion with limited data. Speech Commun. 130: 45-54 (2021) - [j45]Hang Lv, Daniel Povey, Mahsa Yarmohammadi, Ke Li, Yiming Wang, Lei Xie, Sanjeev Khudanpur:
LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder With Exact Lattice Generation. IEEE Signal Process. Lett. 28: 703-707 (2021) - [c186]Qijie Shao, Jingyong Hou, Yanxin Hu, Qing Wang, Lei Xie, Xin Lei:
Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. APSIPA ASC 2021: 672-678 - [c185]Chenggang Mi, Shaolin Zhu, Yi Fan, Lei Xie:
Incorporating Typological Features into Language Selection for Multilingual Neural Machine Translation. APWeb/WAIM (1) 2021: 348-357 - [c184]Li Zhang, Qing Wang, Lei Xie:
Duality Temporal-Channel-Frequency Attention Enhanced Speaker Representation Learning. ASRU 2021: 206-213 - [c183]Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang:
Boundary and Context Aware Training for CIF-Based Non-Autoregressive End-to-End ASR. ASRU 2021: 328-334 - [c182]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing. ASRU 2021: 679-686 - [c181]Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
Wake Word Detection with Streaming Transformers. ICASSP 2021: 5864-5868 - [c180]Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition. ICASSP 2021: 6019-6023 - [c179]Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, Lei Xie:
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods. ICASSP 2021: 6918-6922 - [c178]Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu:
The Multi-Speaker Multi-Style Voice Cloning Challenge 2021. ICASSP 2021: 8613-8617 - [c177]Xian Shi, Pan Zhou, Wei Chen, Lei Xie:
Efficient Gradient-Based Neural Architecture Search For End-to-End ASR. ICMI Companion 2021: 91-96 - [c176]Zhiwei Chen, Weizhao Yang, Jinrong Li, Jiale Wang, Shuai Li, Ziwen Wang, Lei Xie:
A Web-Based Longitudinal Mental Health Monitoring System. ICMI Companion 2021: 121-125 - [c175]Yi Chen, Shan Yang, Na Hu, Lei Xie, Dan Su:
TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN. ICMI Companion 2021: 126-130 - [c174]Heyang Xue, Xiao Zhang, Jie Wu, Jian Luan, Yujun Wang, Lei Xie:
Noise Robust Singing Voice Synthesis Using Gaussian Mixture Variational Autoencoder. ICMI Companion 2021: 131-136 - [c173]Dongyan Huang, Björn W. Schuller, Jianhua Tao, Lei Xie, Jie Yang:
ASMMC21: The 6th International Workshop on Affective Social Multimedia Computing. ICMI 2021: 864-867 - [c172]Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li:
Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion. Interspeech 2021: 831-835 - [c171]Li Zhang, Qing Wang, Kong Aik Lee, Lei Xie, Haizhou Li:
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification. Interspeech 2021: 1094-1098 - [c170]Jian Cong, Shan Yang, Lei Xie, Dan Su:
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis. Interspeech 2021: 2182-2186 - [c169]Shubo Lv, Yanxin Hu, Shimin Zhang, Lei Xie:
DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement. Interspeech 2021: 2816-2820 - [c168]Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen:
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario. Interspeech 2021: 3665-3669 - [c167]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. Interspeech 2021: 3720-3724 - [c166]Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei:
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit. Interspeech 2021: 4054-4058 - [c165]Xiong Wang, Sining Sun, Lei Xie, Long Ma:
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition. Interspeech 2021: 4578-4582 - [c164]Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su:
Controllable Context-Aware Conversational Speech Synthesis. Interspeech 2021: 4658-4662 - [c163]Xiaochun An, Frank K. Soong, Lei Xie:
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS. Interspeech 2021: 4688-4692 - [c162]Shimin Zhang, Yuxiang Kong, Shubo Lv, Yanxin Hu, Lei Xie:
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement. Interspeech 2021: 4758-4762 - [c161]Tao Li, Shan Yang, Liumeng Xue, Lei Xie:
Controllable Emotion Transfer For End-to-End Speech Synthesis. ISCSLP 2021: 1-5 - [c160]Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li:
Accent and Speaker Disentanglement in Many-to-many Voice Conversion. ISCSLP 2021: 1-5 - [c159]Qing Wang, Wei Rao, Pengcheng Guo, Lei Xie:
Adversarial Training for Multi-domain Speaker Recognition. ISCSLP 2021: 1-5 - [c158]Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie:
Context-aware RNNLM Rescoring for Conversational Speech Recognition. ISCSLP 2021: 1-5 - [c157]Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie:
Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter. SLT 2021: 15-21 - [c156]Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie:
Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition. SLT 2021: 75-81 - [c155]Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie:
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet. SLT 2021: 104-110 - [c154]Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agents. SLT 2021: 403-409 - [c153]Yi Lei, Shan Yang, Lei Xie:
Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. SLT 2021: 423-430 - [c152]Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie:
Multi-Band MelGAN: Faster Waveform Generation For High-Quality Text-To-Speech. SLT 2021: 492-498 - [c151]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity. SLT 2021: 507-513 - [c150]Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li:
Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher. SLT 2021: 522-529 - [c149]Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie:
DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation. SLT 2021: 857-864 - [c148]Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlícek, Jean-Marc Odobez:
IEEE SLT 2021 Alpha-Mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines. SLT 2021: 1101-1108 - [c147]Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao:
The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines. SLT 2021: 1117-1123 - [i75]Binbin Zhang, Di Wu, Chao Yang, Xiaoyu Chen, Zhendong Peng, Xiangming Wang, Zhuoyuan Yao, Xiong Wang, Fan Yu, Lei Xie, Xin Lei:
WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit. CoRR abs/2102.01547 (2021) - [i74]Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
Wake Word Detection with Streaming Transformers. CoRR abs/2102.04488 (2021) - [i73]Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, Lei Xie:
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods. CoRR abs/2102.10233 (2021) - [i72]Jingyong Hou, Li Zhang, Yihui Fu, Qing Wang, Zhanheng Yang, Qijie Shao, Lei Xie:
The NPU System for the 2020 Personalized Voice Trigger Challenge. CoRR abs/2102.13552 (2021) - [i71]Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
An Asynchronous WFST-Based Decoder For Automatic Speech Recognition. CoRR abs/2103.09063 (2021) - [i70]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing. CoRR abs/2104.00960 (2021) - [i69]Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen:
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario. CoRR abs/2104.03603 (2021) - [i68]Fan Yu, Haoneng Luo, Pengcheng Guo, Yuhao Liang, Zhuoyuan Yao, Lei Xie, Yingying Gao, Leijing Hou, Shilei Zhang:
Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR. CoRR abs/2104.04702 (2021) - [i67]Shimin Zhang, Yuxiang Kong, Shubo Lv, Yanxin Hu, Lei Xie:
F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement. CoRR abs/2106.07577 (2021) - [i66]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. CoRR abs/2106.08595 (2021) - [i65]Shubo Lv, Yanxin Hu, Shimin Zhang, Lei Xie:
DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement. CoRR abs/2106.08672 (2021) - [i64]Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li:
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion. CoRR abs/2106.08741 (2021) - [i63]Xiong Wang, Sining Sun, Lei Xie, Long Ma:
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition. CoRR abs/2106.09236 (2021) - [i62]Li Zhang, Qing Wang, Kong Aik Lee, Lei Xie, Haizhou Li:
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification. CoRR abs/2106.09320 (2021) - [i61]Xiaochun An, Frank K. Soong, Lei Xie:
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS. CoRR abs/2106.10003 (2021) - [i60]Hongqiang Du, Lei Xie:
Improving robustness of one-shot voice conversion with deep discriminative speaker encoder. CoRR abs/2106.10406 (2021) - [i59]Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su:
Controllable Context-aware Conversational Speech Synthesis. CoRR abs/2106.10828 (2021) - [i58]Jian Cong, Shan Yang, Lei Xie, Dan Su:
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis. CoRR abs/2106.10831 (2021) - [i57]Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Odette Scharenborg:
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person. CoRR abs/2108.04325 (2021) - [i56]Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Lei Xie:
Controllable cross-speaker emotion transfer for end-to-end speech synthesis. CoRR abs/2109.06733 (2021) - [i55]Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng:
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition. CoRR abs/2110.03370 (2021) - [i54]Li Zhang, Qing Wang, Lei Xie:
Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning. CoRR abs/2110.06565 (2021) - [i53]Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu:
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge. CoRR abs/2110.07393 (2021) - [i52]Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis. CoRR abs/2110.08813 (2021) - [i51]Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie:
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation. CoRR abs/2111.06015 (2021) - [i50]Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu:
S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement. CoRR abs/2111.08387 (2021) - [i49]Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi:
One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation. CoRR abs/2111.12277 (2021) - [i48]Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios. CoRR abs/2112.12743 (2021)
- 2020
- [j44]Shan Yang, Heng Lu, Shiyin Kang, Liumeng Xue, Jinba Xiao, Dan Su, Lei Xie, Dong Yu:
On the localness modeling for the self-attention based end-to-end speech synthesis. Neural Networks 125: 121-130 (2020) - [j43]Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise. IEEE Signal Process. Lett. 27: 1730-1734 (2020) - [j42]Chenggang Mi, Lei Xie, Yanning Zhang:
Loanword Identification in Low-Resource Languages with Minimal Supervision. ACM Trans. Asian Low Resour. Lang. Inf. Process. 19(3): 43:1-43:22 (2020) - [j41]Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma:
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1988-2000 (2020) - [j40]Chenggang Mi, Lei Xie, Yanning Zhang:
Improving Adversarial Neural Machine Translation for Morphologically Rich Language. IEEE Trans. Emerg. Top. Comput. Intell. 4(4): 417-426 (2020) - [c146]Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie, Haizhou Li:
The NUS & NWPU system for Voice Conversion Challenge 2020. Blizzard Challenge / Voice Conversion Challenge 2020 - [c145]Xiang Hao, Chenglin Xu, Nana Hou, Lei Xie, Eng Siong Chng, Haizhou Li:
Time-Domain Neural Network Approach for Speech Bandwidth Extension. ICASSP 2020: 866-870 - [c144]Jingyong Hou, Yangyang Shi, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie:
Mining Effective Negative Training Samples for Keyword Spotting. ICASSP 2020: 7444-7448 - [c143]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Effective Wavenet Adaptation for Voice Conversion with Limited Data. ICASSP 2020: 7779-7783 - [c142]Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-End Architecture of Online Multi-Channel Speech Separation. INTERSPEECH 2020: 81-85 - [c141]Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. INTERSPEECH 2020: 811-815 - [c140]Haohe Liu, Lei Xie, Jian Wu, Geng Yang:
Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music. INTERSPEECH 2020: 1241-1245 - [c139]Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie:
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition. INTERSPEECH 2020: 2142-2146 - [c138]Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie:
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. INTERSPEECH 2020: 2472-2476 - [c137]Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie:
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. INTERSPEECH 2020: 3436-3440 - [c136]Li Zhang, Jian Wu, Lei Xie:
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge. INTERSPEECH 2020: 3471-3475 - [c135]Qing Wang, Pengcheng Guo, Lei Xie:
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition. INTERSPEECH 2020: 4228-4232 - [c134]Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
Wake Word Detection with Alignment-Free Lattice-Free MMI. INTERSPEECH 2020: 4258-4262 - [c133]Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie:
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals. NeurIPS 2020 - [i47]Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise. CoRR abs/2004.13595 (2020) - [i46]Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie:
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech. CoRR abs/2005.05106 (2020) - [i45]Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
Wake Word Detection with Alignment-Free Lattice-Free MMI. CoRR abs/2005.08347 (2020) - [i44]Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agent. CoRR abs/2005.10438 (2020) - [i43]Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie:
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition. CoRR abs/2005.10463 (2020) - [i42]Qing Wang, Pengcheng Guo, Lei Xie:
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition. CoRR abs/2005.10637 (2020) - [i41]Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie:
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition. CoRR abs/2006.01712 (2020) - [i40]Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie:
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals. CoRR abs/2006.14150 (2020) - [i39]Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie:
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. CoRR abs/2008.00264 (2020) - [i38]Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie:
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. CoRR abs/2008.00613 (2020) - [i37]Li Zhang, Jian Wu, Lei Xie:
NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge. CoRR abs/2008.03521 (2020) - [i36]Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. CoRR abs/2008.04265 (2020) - [i35]Haohe Liu, Lei Xie, Jian Wu, Geng Yang:
Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music. CoRR abs/2008.05216 (2020) - [i34]Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-end Architecture of Online Multi-channel Speech Separation. CoRR abs/2009.03141 (2020) - [i33]Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie:
DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation. CoRR abs/2011.02131 (2020) - [i32]Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlícek, Jean-Marc Odobez:
IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines. CoRR abs/2011.02198 (2020) - [i31]Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao:
The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines. CoRR abs/2011.06724 (2020) - [i30]Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li:
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher. CoRR abs/2011.08467 (2020) - [i29]Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie:
Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter. CoRR abs/2011.08469 (2020) - [i28]Yi Lei, Shan Yang, Lei Xie:
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. CoRR abs/2011.08477 (2020) - [i27]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Optimizing voice conversion network with cycle consistency loss of speaker identity. CoRR abs/2011.08548 (2020) - [i26]Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li:
Accent and Speaker Disentanglement in Many-to-many Voice Conversion. CoRR abs/2011.08609 (2020) - [i25]Qing Wang, Wei Rao, Pengcheng Guo, Lei Xie:
Adversarial Training for Multi-domain Speaker Recognition. CoRR abs/2011.08623 (2020) - [i24]Tao Li, Shan Yang, Liumeng Xue, Lei Xie:
Controllable Emotion Transfer For End-to-End Speech Synthesis. CoRR abs/2011.08679 (2020) - [i23]Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie:
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet. CoRR abs/2011.09081 (2020) - [i22]Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie:
Context-aware RNNLM Rescoring for Conversational Speech Recognition. CoRR abs/2011.09301 (2020) - [i21]Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu:
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training. CoRR abs/2012.01837 (2020) - [i20]Binbin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei:
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition. CoRR abs/2012.05481 (2020)
2010 – 2019
- 2019
- [j39]Xiaolian Zhu, Yuchao Zhang, Shan Yang, Liumeng Xue, Lei Xie:
Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis. IEEE Access 7: 65955-65964 (2019) - [j38]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma:
Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context. IEEE Access 7: 67656-67665 (2019) - [j37]Jingyong Hou, Yangyang Shi, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie:
Region Proposal Network Based Small-Footprint Keyword Spotting. IEEE Signal Process. Lett. 26(10): 1471-1475 (2019) - [j36]Sining Sun, Pengcheng Guo, Lei Xie, Mei-Yuh Hwang:
Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 27(11): 1826-1838 (2019) - [c132]Sining Sun, Shuran Zhou, Mei-Yuh Hwang, Lei Xie, Qin Li, Xin Lei:
Multiple fixed beamformers with a spacial Wiener-form postfilter for far-field speech recognition. APSIPA 2019: 633-637 - [c131]Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie:
Exploring RNN-Transducer for Chinese speech recognition. APSIPA 2019: 1364-1369 - [c130]Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, Sanjeev Khudanpur:
Incremental Lattice Determinization for WFST Decoders. ASRU 2019: 1-7 - [c129]Yiming Wang, Sanjeev Khudanpur, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe:
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. ASRU 2019: 136-143 - [c128]Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
WaveNet Factorization with Singular Value Decomposition for Voice Conversion. ASRU 2019: 152-159 - [c127]Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie:
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. ASRU 2019: 184-191 - [c126]Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie:
Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis. ASRU 2019: 192-199 - [c125]Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie:
Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias. ASRU 2019: 208-213 - [c124]Xiong Wang, Sining Sun, Lei Xie:
Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting. ASRU 2019: 607-612 - [c123]Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie:
Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings. ASRU 2019: 613-620 - [c122]Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Time Domain Audio Visual Speech Separation. ASRU 2019: 667-673 - [c121]Bing Liu, Yunlin Chen, Hao Yin, Yongqiang Li, Xin Lei, Lei Xie:
The Mobvoi Text-To-Speech System for Blizzard Challenge 2019. Blizzard Challenge 2019 - [c120]Shan Yang, Wenshuo Ge, Fengyu Yang, Xinyong Zhou, Fanbo Meng, Kai Liu, Lei Xie:
SZ-NPU Team's Entry to Blizzard Challenge 2019. Blizzard Challenge 2019 - [c119]Ke Wang, Frank K. Soong, Lei Xie:
A Pitch-aware Approach to Single-channel Speech Separation. ICASSP 2019: 296-300 - [c118]Changhao Shan, Chao Weng, Guangsen Wang, Dan Su, Min Luo, Dong Yu, Lei Xie:
Component Fusion: Learning Replaceable Language Model Component for End-to-end Speech Recognition System. ICASSP 2019: 5631-5635 - [c117]Changhao Shan, Chao Weng, Guangsen Wang, Dan Su, Min Luo, Dong Yu, Lei Xie:
Investigating End-to-end Speech Recognition for Mandarin-english Code-switching. ICASSP 2019: 6056-6060 - [c116]Xiong Wang, Sining Sun, Changhao Shan, Jingyong Hou, Lei Xie, Shen Li, Xin Lei:
Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting. ICASSP 2019: 6366-6370 - [c115]Shiliang Zhang, Ming Lei, Bin Ma, Lei Xie:
Robust Audio-visual Speech Recognition Using Bimodal Dfsmn with Multi-condition Training and Dropout Regularization. ICASSP 2019: 6570-6574 - [c114]Xiang Hao, Changhao Shan, Yong Xu, Sining Sun, Lei Xie:
An Attention-based Neural Network Approach for Single Channel Speech Enhancement. ICASSP 2019: 6895-6899 - [c113]Shan Yang, Heng Lu, Shiying Kang, Lei Xie, Dong Yu:
Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis. ICASSP 2019: 6910-6914 - [c112]Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie:
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech. ICASSP 2019: 8122-8126 - [c111]Yougen Yuan, Wei Tang, Minhao Fan, Yue Cao, Peng Zhang, Lei Xie:
Deep Audio-visual System for Closed-set Word-level Speech Recognition. ICMI 2019: 540-545 - [c110]Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Improved Speaker-Dependent Separation for CHiME-5 Challenge. INTERSPEECH 2019: 466-470 - [c109]Pengcheng Guo, Sining Sun, Lei Xie:
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition. INTERSPEECH 2019: 749-753 - [c108]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-Based End-to-End TTS Training Algorithm. INTERSPEECH 2019: 1288-1292 - [c107]Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu:
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data. INTERSPEECH 2019: 2060-2064 - [c106]Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie:
Towards Language-Universal Mandarin-English Speech Recognition. INTERSPEECH 2019: 2170-2174 - [c105]Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H. L. Hansen:
Adversarial Regularization for End-to-End Robust Speaker Verification. INTERSPEECH 2019: 4010-4014 - [c104]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. INTERSPEECH 2019: 4460-4464 - [i19]Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Time Domain Audio Visual Speech Separation. CoRR abs/1904.03760 (2019) - [i18]Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Improved Speaker-Dependent Separation for CHiME-5 Challenge. CoRR abs/1904.03792 (2019) - [i17]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. CoRR abs/1904.04764 (2019) - [i16]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-based End-to-End TTS Training Algorithm. CoRR abs/1904.04775 (2019) - [i15]Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu:
Building a mixed-lingual neural TTS system with only monolingual data. CoRR abs/1904.06063 (2019) - [i14]Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur:
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit. CoRR abs/1909.08723 (2019)
- 2018
- [j35]Wei Feng, Xuecheng Nie, Yujun Zhang, Lei Xie, Jianwu Dang:
Unsupervised measure of Chinese lexical semantic similarity using correlated graph model for news story segmentation. Neurocomputing 318: 236-247 (2018) - [j34]Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng:
Learning distributed sentence representations for story segmentation. Signal Process. 142: 403-411 (2018) - [j33]Lei Xie, Tan Lee, Man-Wai Mak:
Guest Editorial: Advances in Deep Learning for Speech Processing. J. Signal Process. Syst. 90(7): 959-961 (2018) - [j32]Chenglin Xu, Lei Xie, Xiong Xiao:
A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection. J. Signal Process. Syst. 90(7): 1063-1075 (2018) - [c103]Wei Feng, Lei Xie, Jin Zhang, Yujun Zhang, Yanning Zhang:
Self-validated Story Segmentation of Chinese Broadcast News. BICS 2018: 568-578 - [c102]Jinba Xiao, Shan Yang, Mingyang Zhang, Berrak Sisman, Dongyan Huang, Lei Xie, Minghui Dong, Haizhou Li:
The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018. Blizzard Challenge 2018 - [c101]Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie:
Attention-Based End-to-End Speech Recognition on Voice Search. ICASSP 2018: 4764-4768 - [c100]Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie:
Domain Adversarial Training for Accented Speech Recognition. ICASSP 2018: 4854-4858 - [c99]Qing Wang, Wei Rao, Sining Sun, Lei Xie, Eng Siong Chng, Haizhou Li:
Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition. ICASSP 2018: 4889-4893 - [c98]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li:
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search. INTERSPEECH 2018: 97-101 - [c97]Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie:
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition. INTERSPEECH 2018: 1581-1585 - [c96]Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng:
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition. INTERSPEECH 2018: 1928-1932 - [c95]Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie:
Attention-based End-to-End Models for Small-Footprint Keyword Spotting. INTERSPEECH 2018: 2037-2041 - [c94]Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie:
Training Augmentation with Adversarial Examples for Robust Speech Recognition. INTERSPEECH 2018: 2404-2408 - [c93]Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie:
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model. INTERSPEECH 2018: 2429-2433 - [c92]Jingyong Hou, Wenping Hu, Frank K. Soong, Lei Xie:
A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech. ISCSLP 2018: 111-115 - [c91]Dong-Yan Huang, Sicheng Zhao, Björn W. Schuller, Hongxun Yao, Jianhua Tao, Min Xu, Lei Xie, Qingming Huang, Jie Yang:
ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop. ACM Multimedia 2018: 2120-2121 - [i13]Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie:
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition. CoRR abs/1803.10132 (2018) - [i12]Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie:
Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model. CoRR abs/1803.10146 (2018) - [i11]Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie:
Attention-based End-to-End Models for Small-Footprint Keyword Spotting. CoRR abs/1803.10916 (2018) - [i10]Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie:
Training Augmentation with Adversarial Examples for Robust Speech Recognition. CoRR abs/1806.02782 (2018) - [i9]Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie:
Domain Adversarial Training for Accented Speech Recognition. CoRR abs/1806.02786 (2018) - [i8]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li:
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search. CoRR abs/1806.03621 (2018) - [i7]Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng:
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition. CoRR abs/1806.06200 (2018) - [i6]Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie:
Exploring RNN-Transducer for Chinese Speech Recognition. CoRR abs/1811.05097 (2018)
- 2017
- [j31]Lei Xie, Jhing-Fa Wang:
Introduction to special section on advances of orange technologies. Frontiers Comput. Sci. 11(3): 407 (2017) - [j30]Yougen Yuan, Lei Xie, Zhong-Hua Fu, Ming Xu, Qi Cong:
Sound image externalization for headphone based real-time 3D audio. Frontiers Comput. Sci. 11(3): 419-428 (2017) - [j29]Sining Sun, Binbin Zhang, Lei Xie, Yanning Zhang:
An unsupervised deep domain adaptation approach for robust speech recognition. Neurocomputing 257: 79-87 (2017) - [j28]Lei Xie, Janne Heikkilä, Bo Li:
Media computing and applications for immersive communications: recent advances. J. Ambient Intell. Humaniz. Comput. 8(6): 827-828 (2017) - [j27]Xiangzeng Zhou, Lei Xie, Peng Zhang, Yanning Zhang:
Online object tracking based on BLSTM-RNN with contextual-sequential labeling. J. Ambient Intell. Humaniz. Comput. 8(6): 861-870 (2017) - [j26]Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng:
A hybrid neural network hidden Markov model approach for automatic story segmentation. J. Ambient Intell. Humaniz. Comput. 8(6): 925-936 (2017) - [j25]Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection. IEEE J. Sel. Top. Signal Process. 11(8): 1329-1339 (2017) - [j24]Hongjie Chen, Lei Xie, Cheung-Chi Leung, Xiaoming Lu, Bin Ma, Haizhou Li:
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News. IEEE ACM Trans. Audio Speech Lang. Process. 25(1): 108-119 (2017) - [c90]Jie Yan, Lei Xie, Guangsen Wang, Zhong-Hua Fu:
A segmental DNN/i-vector approach for digit-prompted speaker verification. APSIPA 2017: 1-5 - [c89]Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng:
An end-to-end neural network approach to story segmentation. APSIPA 2017: 171-176 - [c88]Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng:
Topic embedding of sentences for story segmentation. APSIPA 2017: 1602-1607 - [c87]Zhong-Hua Fu, Lei Xie, Peng Li, Jiaen Liang:
Frequency-invariant differential microphone array design in the STFT domain. APSIPA 2017: 1692-1695 - [c86]Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li:
Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. ASRU 2017: 685-691 - [c85]Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Multilingual bottle-neck feature learning from untranscribed speech. ASRU 2017: 727-733 - [c84]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li:
Extracting bottleneck features and word-like pairs from untranscribed speech for feature representation. ASRU 2017: 734-739 - [c83]Yanfeng Lu, Zhengchen Zhang, Chenyu Yang, Huaiping Ming, Xiaolian Zhu, Yuchao Zhang, Shan Yang, Dongyan Huang, Lei Xie, Minghui Dong:
The I2R-NWPU Text-to-Speech System for Blizzard Challenge 2017. Blizzard Challenge 2017 - [c82]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li:
Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection. ICASSP 2017: 5645-5649 - [c81]Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu:
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. INTERSPEECH 2017: 528-532 - [c80]Jie Wu, Dong-Yan Huang, Lei Xie, Haizhou Li:
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion. INTERSPEECH 2017: 3379-3383 - [i5]Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu:
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. CoRR abs/1703.05880 (2017) - [i4]Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li:
Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework. CoRR abs/1707.01670 (2017) - [i3]Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie:
Attention-Based End-to-End Speech Recognition in Mandarin. CoRR abs/1707.07167 (2017)
- 2016
- [j23]Peng Zhang, Tao Zhuo, Lei Xie, Yanning Zhang:
Deformable object tracking with spatiotemporal segmentation in big vision surveillance. Neurocomputing 204: 87-96 (2016) - [j22]Lei Xie, Longbiao Wang, Janne Heikkilä, Peng Zhang:
Guest Editorial: Immersive Audio/Visual Systems. Multim. Tools Appl. 75(9): 5047-5053 (2016) - [j21]Bo Fan, Lei Xie, Shan Yang, Lijuan Wang, Frank K. Soong:
A deep bidirectional LSTM approach for video-realistic talking head. Multim. Tools Appl. 75(9): 5287-5309 (2016) - [j20]Peng Zhang, Tao Zhuo, Yanning Zhang, Lei Xie, Dapeng Tao:
Real-time tracking-by-learning with high-order regularization fusion for big video abstraction. Signal Process. 124: 246-258 (2016) - [c79]Pengyun Guang, Zhonghua Fu, Lei Xie, Wenjie Zhao:
Study on near-field crosstalk cancellation based on least square algorithm. APSIPA 2016: 1-5 - [c78]Zhen Wei, Zhizheng Wu, Lei Xie:
Predicting articulatory movement from text using deep architecture with stacked bottleneck features. APSIPA 2016: 1-6 - [c77]Jie Wu, Zhizheng Wu, Lei Xie:
On the use of I-vectors and average voice model for voice conversion without parallel data. APSIPA 2016: 1-6 - [c76]Shan Yang, Zhizheng Wu, Lei Xie:
On the training of DNN-based average voice model for speech synthesis. APSIPA 2016: 1-6 - [c75]Zhengchen Zhang, Mei Li, Yuchao Zhang, Weini Zhang, Yang Liu, Shan Yang, Yanfeng Lu, Van Tung Pham, Lei Xie, Minghui Dong:
The I2R-NWPU-NTU Text-to-Speech System at Blizzard Challenge 2016. Blizzard Challenge 2016 - [c74]Huaiping Ming, Dong-Yan Huang, Lei Xie, Shaofei Zhang, Minghui Dong, Haizhou Li:
Exemplar-based sparse representation of timbre and prosody for voice conversion. ICASSP 2016: 5175-5179 - [c73]Haihua Xu, Jingyong Hou, Xiong Xiao, Van Tung Pham, Cheung-Chi Leung, Lei Wang, Van Hai Do, Hang Lv, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li:
Approximate search of audio queries by using DTW with phone time boundary and data augmentation. ICASSP 2016: 6030-6034 - [c72]Bihong Zhang, Lei Xie, Yougen Yuan, Huaiping Ming, Dong-Yan Huang, Mingli Song:
Deep neural network derived bottleneck features for accurate audio classification. ICME Workshops 2016: 1-6 - [c71]Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information. INTERSPEECH 2016: 788-792 - [c70]Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection. INTERSPEECH 2016: 923-927 - [c69]Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li:
A DNN-HMM Approach to Story Segmentation. INTERSPEECH 2016: 1527-1531 - [c68]Huaiping Ming, Dong-Yan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li:
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion. INTERSPEECH 2016: 2453-2457 - [c67]Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li:
Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis. INTERSPEECH 2016: 3703-3707 - [c66]Jingyong Hou, Lei Xie, Zhonghua Fu:
Investigating neural network based query-by-example keyword spotting approach for personalized wake-up word detection in Mandarin Chinese. ISCSLP 2016: 1-5 - [c65]Lei Wang, Chongjia Ni, Cheung-Chi Leung, Changhuai You, Lei Xie, Haihua Xu, Xiong Xiao, Tin Lay Nwe, Eng Siong Chng, Bin Ma, Haizhou Li:
The NNI Vietnamese Speech Recognition System for MediaEval 2016. MediaEval 2016 - [c64]Dong-Yan Huang, Lei Xie, Yvonne Siu Wa Lee, Jie Wu, Huaiping Ming, Xiaohai Tian, Shaofei Zhang, Chuang Ding, Mei Li, Nguyen Quy Hy, Minghui Dong, Haizhou Li:
An Automatic Voice Conversion Evaluation Strategy Based on Perceptual Background Noise Distortion and Speaker Similarity. SSW 2016: 44-51 - [c63]Mei Li, Zhizheng Wu, Lei Xie:
On the impact of phoneme alignment in DNN-based speech synthesis. SSW 2016: 196-201
- 2015
- [j19]Lei Xie, Jia Jia, Helen M. Meng, Zhigang Deng, Lijuan Wang:
Expressive talking avatar synthesis and animation. Multim. Tools Appl. 74(22): 9845-9848 (2015) - [j18]Chuang Ding, Lei Xie, Pengcheng Zhu:
Head motion synthesis from speech using deep neural networks. Multim. Tools Appl. 74(22): 9871-9888 (2015) - [j17]Lei Xie, Jia Zeng, Zhi-Qiang Liu:
Topic modeling in multimedia: algorithms and applications. Soft Comput. 19(1): 1-2 (2015) - [j16]Hongjie Chen, Lei Xie, Wei Feng, Lilei Zheng, Yanning Zhang:
Topic segmentation on spoken documents using self-validated acoustic cuts. Soft Comput. 19(1): 47-59 (2015) - [j15]Wei Feng, Xuefei Yin, Yifeng Zhang, Lei Xie:
NestDE: generic parameters tuning for automatic story segmentation. Soft Comput. 19(1): 61-70 (2015) - [j14]Peng Zhang, Liang Wang, Wei Huang, Lei Xie, Guang Chen:
Multiple pedestrian tracking based on couple-states Markov chain with semantic topic learning for video surveillance. Soft Comput. 19(1): 85-97 (2015) - [j13]Xiangzeng Zhou, Lei Xie, Qiang Huang, Stephen J. Cox, Yanning Zhang:
Tennis Ball Tracking Using a Two-Layered Data Association Approach. IEEE Trans. Multim. 17(2): 145-156 (2015) - [c62]Huaiping Ming, Dong-Yan Huang, Minghui Dong, Haizhou Li, Lei Xie, Shaofei Zhang:
Fundamental frequency modeling using wavelets for emotional voice conversion. ACII 2015: 804-809 - [c61]Jia Yu, Lei Xie, Xiong Xiao, Eng Siong Chng, Haizhou Li:
A density peak clustering approach to unsupervised acoustic subword units discovery. APSIPA 2015: 178-183 - [c60]Shaofei Zhang, Dong-Yan Huang, Lei Xie, Eng Siong Chng, Haizhou Li, Minghui Dong:
Non-negative matrix factorization using stable alternating direction method of multipliers for source separation. APSIPA 2015: 222-228 - [c59]Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong:
A waveform representation framework for high-quality statistical parametric speech synthesis. APSIPA 2015: 530-536 - [c58]Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu:
Automatic prosody prediction for Chinese speech synthesis using BLSTM-RNN and embedding features. ASRU 2015: 98-102 - [c57]Bo Fan, Lijuan Wang, Frank K. Soong, Lei Xie:
Photo-real talking head with deep bidirectional LSTM. ICASSP 2015: 4884-4888 - [c56]Haihua Xu, Peng Yang, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Jia Yu, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Engsiong Chng, Haizhou Li:
Language independent query-by-example spoken term detection using N-best phone sequences and partial matching. ICASSP 2015: 5191-5195 - [c55]Shaofei Zhang, Dong-Yan Huang, Lei Xie, Engsiong Chng, Haizhou Li, Minghui Dong:
Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation. INTERSPEECH 2015: 1498-1502 - [c54]Pengcheng Zhu, Lei Xie, Yunlin Chen:
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings. INTERSPEECH 2015: 2192-2196 - [c53]Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study. INTERSPEECH 2015: 3189-3193 - [c52]Chuang Ding, Pengcheng Zhu, Lei Xie:
BLSTM neural networks for speech driven head motion synthesis. INTERSPEECH 2015: 3345-3349 - [c51]Huaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li, Minghui Dong:
An alternating optimization approach for phase retrieval. INTERSPEECH 2015: 3426-3430 - [c50]Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, Haihua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Engsiong Chng, Haizhou Li:
The NNI Query-by-Example System for MediaEval 2015. MediaEval 2015 - [c49]Xiangzeng Zhou, Lei Xie, Peng Zhang, Yanning Zhang:
Online Object Tracking Based on CNN with Metropolis-Hasting Re-Sampling. ACM Multimedia 2015: 1163-1166 - [i2]Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong:
A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis. CoRR abs/1510.01443 (2015) - [i1]Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu:
Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features. CoRR abs/1511.00360 (2015) - 2014
- [j12]Lei Xie, Zhigang Deng, Stephen J. Cox:
Multimodal joint information processing in human machine interaction: recent advances. Multim. Tools Appl. 73(1): 267-271 (2014) - [j11]Lei Xie, Naicai Sun, Bo Fan:
A statistical parametric approach to video-realistic text-driven talking avatar. Multim. Tools Appl. 73(1): 377-396 (2014) - [c48]Guangpu Huang, Chenglin Xu, Xiong Xiao, Lei Xie, Chng Eng Siong, Haizhou Li:
Multi-view features in a DNN-CRF model for improved sentence unit detection on English broadcast news. APSIPA 2014: 1-9 - [c47]Jiamei Wei, Ercheng Pei, Dongmei Jiang, Hichem Sahli, Lei Xie, Zhonghua Fu:
Multimodal continuous affect recognition based on LSTM and multiple kernel learning. APSIPA 2014: 1-4 - [c46]Chenglin Xu, Lei Xie, Zhonghua Fu:
Sentence boundary detection in Chinese broadcast news using conditional random fields and prosodic features. ChinaSIP 2014: 37-41 - [c45]Huaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li:
Learning optimal features for music transcription. ChinaSIP 2014: 105-109 - [c44]Chao Yang, Lei Xie, Xiangzeng Zhou:
Unsupervised broadcast news story segmentation using distance dependent Chinese restaurant processes. ICASSP 2014: 4062-4066 - [c43]Xiangzeng Zhou, Lei Xie, Peng Zhang, Yanning Zhang:
An ensemble of deep neural networks for object tracking. ICIP 2014: 843-847 - [c42]Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection. INTERSPEECH 2014: 1722-1726 - [c41]Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu:
Speech-driven head motion synthesis using neural networks. INTERSPEECH 2014: 2303-2307 - [c40]Zhong-Hua Fu, Lei Xie:
Stereo acoustic echo suppression using widely linear filtering in the frequency domain. INTERSPEECH 2014: 2809-2813 - [c39]Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Engsiong Chng, Haizhou Li:
A deep neural network approach for sentence boundary detection in broadcast news. INTERSPEECH 2014: 2887-2891 - [c38]Zhong-Hua Fu, Lei Xie, Hang Lv:
Experimental study on dereverberation and noise reduction for distant speech recognition. ISCSLP 2014: 393-397 - [c37]Shaofei Zhang, Lei Xie, Zhong-Hua Fu, Yougen Yuan:
A hybrid virtual bass system with improved phase vocoder and high efficiency. ISCSLP 2014: 401-405 - [c36]Peng Yang, Haihua Xu, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Jia Yu, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Chng Eng Siong, Haizhou Li:
The NNI Query-by-Example System for MediaEval 2014. MediaEval 2014 - [e1]Haizhou Li, Helen M. Meng, Bin Ma, Engsiong Chng, Lei Xie:
15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014, Singapore, September 14-18, 2014. ISCA 2014 - 2013
- [c35]Xiaoming Lu, Lei Xie, Cheung-Chi Leung, Bin Ma, Haizhou Li:
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions. ACL (2) 2013: 190-195 - [c34]Jianwei Niu, Lei Xie, Lei Jia, Na Hu:
Context-dependent deep neural networks for commercial Mandarin speech recognition applications. APSIPA 2013: 1-5 - [c33]Ling Tang, Zhong-Hua Fu, Lei Xie:
Numerical calculation of the head-related transfer functions with Chinese dummy head. APSIPA 2013: 1-4 - [c32]Xiangzeng Zhou, Qiang Huang, Lei Xie, Stephen J. Cox:
A two layered data association approach for ball tracking. ICASSP 2013: 2317-2321 - [c31]Xuecheng Nie, Wei Feng, Liang Wan, Lei Xie:
Measuring semantic similarity by contextual word connections in Chinese news story segmentation. ICASSP 2013: 8312-8316 - [c30]Xiaoming Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Broadcast news story segmentation using latent topics on data manifold. ICASSP 2013: 8465-8469 - [c29]Peng Yang, Lei Xie, Qiao Luan, Wei Feng:
A tighter lower bound estimate for dynamic time warping. ICASSP 2013: 8525-8529 - 2012
- [j10]Xiaoxuan Wang, Lei Xie, Mimi Lu, Bin Ma, Engsiong Chng, Haizhou Li:
Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features. IEICE Trans. Inf. Syst. 95-D(5): 1206-1215 (2012) - [j9]Lei Xie, Lilei Zheng, Zihan Liu, Yanning Zhang:
Laplacian Eigenmaps for Automatic Story Segmentation of Broadcast News. IEEE Trans. Speech Audio Process. 20(1): 276-289 (2012) - [c28]Qiang Huang, Stephen J. Cox, Xiangzeng Zhou, Lei Xie:
Detection of ball hits in a tennis game using audio and visual information. APSIPA 2012: 1-10 - [c27]Lilei Zheng, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Acoustic TextTiling for story segmentation of spoken documents. ICASSP 2012: 5121-5124 - [c26]Wei Feng, Xuecheng Nie, Liang Wan, Lei Xie, Jianmin Jiang:
Lexical Story Co-Segmentation of Chinese Broadcast News. INTERSPEECH 2012: 2286-2289 - [c25]Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li:
Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis. INTERSPEECH 2012: 2374-2377 - [c24]Yali Zhao, Lei Xie, Zhonghua Fu:
Mask Estimation and Refinement for MFT-based Robust Speaker Verification. INTERSPEECH 2012: 2654-2657 - 2011
- [j8]Lei Xie, Yulian Yang, Zhi-Qiang Liu:
On the effectiveness of subwords for lexical cohesion based story segmentation of Chinese broadcast news. Inf. Sci. 181(13): 2873-2891 (2011) - [j7]Lei Xie, Zhong-Hua Fu, Wei Feng, Yong Luo:
Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news. Multim. Syst. 17(2): 101-112 (2011) - [c23]Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation. INTERSPEECH 2011: 1109-1112 - 2010
- [j6]Jia Zeng, Wei Feng, Lei Xie, Zhi-Qiang Liu:
Cascade Markov random fields for stroke extraction of Chinese characters. Inf. Sci. 180(2): 301-311 (2010) - [j5]Yaodong Ni, Lei Xie, Zhi-Qiang Liu:
Minimizing the expected complete influence time of a social network. Inf. Sci. 180(13): 2514-2527 (2010) - [c22]Zihan Liu, Lei Xie, Wei Feng:
Maximum lexical cohesion for fine-grained news story segmentation. INTERSPEECH 2010: 1301-1304 - [c21]Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:
Phoneme lattice based texttiling towards multilingual story segmentation. INTERSPEECH 2010: 1305-1308 - [c20]Zhong-Hua Fu, Lei Xie, Dongmei Jiang:
Dual-microphone noise reduction based on semi-blind DUET. ISCSLP 2010: 33-37 - [c19]Mimi Lu, Lei Xie, Zhong-Hua Fu, Dongmei Jiang, Yanning Zhang:
Multi-modal feature integration for story boundary detection in broadcast news. ISCSLP 2010: 420-425 - [c18]Lei Xie, Wenhuai Zhao, Xiangzeng Zhou, Xiaohai Tian, Bingfeng Li, Naicai Sun, Yali Zhao, Yanning Zhang:
Speech and Auditory Interfaces for Ubiquitous, Immersive and Personalized Applications. UIC/ATC Workshops 2010: 503-505
2000 – 2009
- 2009
- [j4]Wei Feng, Lei Xie, Jia Zeng, Zhi-Qiang Liu:
Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models. J. Vis. Lang. Comput. 20(3): 188-195 (2009) - [c17]Wei Feng, Lei Xie, Zhi-Qiang Liu:
Multicue Graph Mincut for Image Segmentation. ACCV (2) 2009: 707-717 - [c16]Jin Zhang, Lei Xie, Wei Feng, Yanning Zhang:
A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News. AIRS 2009: 136-148 - [c15]Zhong-Hua Fu, Jhing-Fa Wang, Lei Xie:
Noise robust features for speech/music discrimination in real-time telecommunication. ICME 2009: 574-577 - 2008
- [j3]Jia Zeng, Lei Xie, Zhi-Qiang Liu:
Type-2 fuzzy Gaussian mixture models. Pattern Recognit. 41(12): 3636-3643 (2008) - [c14]Lei Xie, Jia Zeng, Wei Feng:
Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News. AIRS 2008: 345-355 - [c13]Lei Xie, Xi Tan:
A Heuristic Approach to Caption Enhancement for Effective Video OCR. ICIC (1) 2008: 347-355 - [c12]Yulian Yang, Lei Xie:
Subword Latent Semantic Analysis for Texttiling-Based Automatic Story Segmentation of Chinese Broadcast News. ISCSLP 2008: 358-361 - [c11]Lei Xie, Yulian Yang, Jia Zeng:
Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News. PCM 2008: 248-258 - 2007
- [j2]Lei Xie, Zhi-Qiang Liu:
A coupled HMM approach to video-realistic speech animation. Pattern Recognit. 40(8): 2325-2340 (2007) - [j1]Lei Xie, Zhi-Qiang Liu:
Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling. IEEE Trans. Multim. 9(3): 500-510 (2007) - [c10]Shing-kai Chan, Lei Xie, Helen M. Meng:
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation. INTERSPEECH 2007: 2581-2584 - [c9]Lei Xie, Chuan Liu, Helen Meng:
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News. HLT-NAACL (Short Papers) 2007: 193-196 - 2006
- [c8]Lei Xie, Zhi-Qiang Liu:
An Articulatory Approach to Video-Realistic Mouth Animation. ICASSP (1) 2006: 593-596 - [c7]Lei Xie, Zhi-Qiang Liu:
Speech Animation Using Coupled Hidden Markov Models. ICPR (1) 2006: 1128-1131 - [c6]Lei Xie, Helen Meng, Zhi-Qiang Liu:
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. ISCSLP (Selected Papers) 2006: 627-639 - [c5]Yi Wang, Lei Xie, Zhi-Qiang Liu, Li-Zhu Zhou:
Supervised Learning of Motion Style for Real-time Synthesis of 3D Character Animations. SMC 2006: 4321-4325 - [c4]Lei Xie, Yi Wang, Zhi-Qiang Liu:
Lip Assistant: Visualize Speech for Hearing Impaired People in Multimedia Services. SMC 2006: 4331-4336 - [c3]Yi Wang, Lei Xie, Zhi-Qiang Liu, Li-Zhu Zhou:
The SOMN-HMM Model and Its Application to Automatic Synthesis of 3D Character Animations. SMC 2006: 4948-4952 - [c2]Yi Wang, Li-Zhu Zhou, Jianhua Feng, Lei Xie, Chun Yuan:
2D/3D Web Visualization on Mobile Devices. WISE 2006: 536-547 - 2005
- [c1]Lei Xie, Zhi-Qiang Liu:
Multi-stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition. ICMLC 2005: 994-1004
last updated on 2024-12-18 19:21 CET by the dblp team
all metadata released as open data under CC0 1.0 license