Li-Rong Dai 0001
Person information
- affiliation: University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China
Other persons with the same name
- Li-Rong Dai (aka: Lirong Dai) — disambiguation page
- Li-Rong Dai 0002 (aka: Lirong Dai 0002) — Seattle University, USA (and 1 more)
2020 – today
- 2024
- [j60]Li-Rong Dai, Luqi Gong, Zhulin An, Yongjun Xu, Boyu Diao:
Sketch-fusion: A gradient compression method with multi-layer fusion for communication-efficient distributed training. J. Parallel Distributed Comput. 185: 104811 (2024) - [j59]Jian Zhu, Zhangmin Huang, Lei Liu, Chang Tang, Li-Rong Dai:
Boosted Curriculum Multi-View Hashing for Multimedia Retrieval. IEEE Signal Process. Lett. 31: 2065-2069 (2024) - [j58]Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, Li-Rong Dai, Jinyu Li, Furu Wei:
SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2177-2187 (2024) - [j57]Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Li-Rong Dai, Daxin Jiang, Jinyu Li, Furu Wei:
VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning. IEEE Trans. Multim. 26: 1055-1064 (2024) - [j56]Dianhai Yu, Liang Shen, Hongxiang Hao, Weibao Gong, HuaChao Wu, Jiang Bian, Lirong Dai, Haoyi Xiong:
MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services. IEEE Trans. Serv. Comput. 17(5): 2626-2639 (2024) - [c276]Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai:
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation. AAAI 2024: 19768-19776 - [c275]Yichi Wang, Jie Zhang, Shihao Chen, Weitai Zhang, Zhongyi Ye, Xinyuan Zhou, Lirong Dai:
A Study of Multichannel Spatiotemporal Features and Knowledge Distillation on Robust Target Speaker Extraction. ICASSP 2024: 431-435 - [c274]Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang, Liping Chen, Lirong Dai:
Sifisinger: A High-Fidelity End-to-End Singing Voice Synthesizer Based on Source-Filter Model. ICASSP 2024: 11126-11130 - [c273]Shihao Chen, Liping Chen, Jie Zhang, Kong-Aik Lee, Zhenhua Ling, Lirong Dai:
Adversarial Speech for Voice Privacy Protection from Personalized Speech Generation. ICASSP 2024: 11411-11415 - [c272]Weitai Zhang, Hanyi Zhang, Chenxuan Liu, Zhongyi Ye, Xinyuan Zhou, Chao Lin, Lirong Dai:
Pre-Trained Acoustic-and-Textual Modeling for End-To-End Speech-To-Text Translation. ICASSP 2024: 11451-11455 - [c271]Shifu Xiong, Li-Rong Dai:
Exploring Semi-Supervised, Subcategory Classification and Subwords Alignment for Visual Wake Word Spotting. ICME Workshops 2024: 1-6 - [i47]Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai:
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation. CoRR abs/2401.03468 (2024) - [i46]Shihao Chen, Liping Chen, Jie Zhang, Kong-Aik Lee, Zhenhua Ling, Lirong Dai:
Adversarial speech for voice privacy protection from Personalized Speech generation. CoRR abs/2401.11857 (2024) - [i45]Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai:
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance. CoRR abs/2406.05325 (2024) - [i44]Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Li-Rong Dai:
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation. CoRR abs/2408.12354 (2024) - [i43]Jianwei Cui, Yu Gu, Chao Weng, Jie Zhang, Liping Chen, Li-Rong Dai:
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model. CoRR abs/2410.12536 (2024) - 2023
- [j55]Hongyu Yang, Yiyuan Xie, Tingting Song, Ye Su, Bocheng Liu, Junxiong Chai, Xiao Jiang, Lirong Dai, Jing Pang:
Universal wavelength reuse mechanism for optical networks-on-chip based on a cooperative game. J. Opt. Commun. Netw. 15(6): 367-380 (2023) - [j54]Jie Zhang, Rui Tao, Jun Du, Li-Rong Dai:
Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks. IEEE ACM Trans. Audio Speech Lang. Process. 31: 215-228 (2023) - [j53]Qiu-Shi Zhu, Jie Zhang, Ziqiang Zhang, Li-Rong Dai:
A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1927-1939 (2023) - [j52]Jie Zhang, Rui Tao, Jun Du, Li-Rong Dai:
SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3176-3189 (2023) - [c270]Pan Deng, Jie Zhang, Xinyuan Zhou, Zhongyi Ye, Weitai Zhang, Jianwei Cui, Lirong Dai:
Learning Semantic Information from Machine Translation to Improve Speech-to-Text Translation. APSIPA ASC 2023: 954-959 - [c269]Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai:
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings. APSIPA ASC 2023: 1943-1948 - [c268]Hang-Rui Hu, Yan Song, Jian-Tao Zhang, Li-Rong Dai, Ian McLoughlin, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue:
Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification. ICASSP 2023: 1-5 - [c267]Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu:
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer. ICASSP 2023: 1-5 - [c266]Haitao Xu, Liangfa Wei, Jie Zhang, Jianming Yang, Yannan Wang, Tian Gao, Xin Fang, Li-Rong Dai:
A Multi-Scale Feature Aggregation Based Lightweight Network for Audio-Visual Speech Enhancement. ICASSP 2023: 1-5 - [c265]Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin:
Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection. ICASSP 2023: 1-5 - [c264]Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shujie Liu, Yu-Chen Hu, Li-Rong Dai:
Robust Data2VEC: Noise-Robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning. ICASSP 2023: 1-5 - [c263]Shan He, Haonan He, Shuo Yang, Xiaoyan Wu, Pengcheng Xia, Bing Yin, Cong Liu, Lirong Dai, Chang Xu:
Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation. ICCV 2023: 14146-14156 - [c262]Jiajia Wu, Anni Li, Kun Zhao, Zhengyan Yang, Bing Yin, Cong Liu, Li-Rong Dai:
A Multimodal Text Block Segmentation Framework for Photo Translation. ICIG (3) 2023: 116-127 - [c261]Jiajia Wu, Kun Zhao, Zhengyan Yang, Bing Yin, Cong Liu, Li-Rong Dai:
End-to-End Multilingual Text Recognition Based on Byte Modeling. ICIG (3) 2023: 128-137 - [c260]Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, Li-Rong Dai:
Vision-Language Adaptive Mutual Decoder for OOV-STR. ICIG (2) 2023: 175-186 - [c259]Xiao-Min Zeng, Yan Song, Ian McLoughlin, Lin Liu, Li-Rong Dai:
Robust Prototype Learning for Anomalous Sound Detection. INTERSPEECH 2023: 261-265 - [c258]Kang Li, Yan Song, Ian McLoughlin, Lin Liu, Jin Li, Li-Rong Dai:
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection. INTERSPEECH 2023: 291-295 - [c257]Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
CASA-ASR: Context-Aware Speaker-Attributed ASR. INTERSPEECH 2023: 411-415 - [c256]Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction. INTERSPEECH 2023: 5047-5051 - [c255]Jingyuan Wang, Jie Zhang, Li-Rong Dai:
Real-Time Causal Spectro-Temporal Voice Activity Detection Based on Convolutional Encoding and Residual Decoding. INTERSPEECH 2023: 5062-5066 - [c254]Pan Deng, Shihao Chen, Weitai Zhang, Jie Zhang, Lirong Dai:
The USTC's Dialect Speech Translation System for IWSLT 2023. IWSLT@ACL 2023: 102-112 - [c253]Xinyuan Zhou, Jianwei Cui, Zhongyi Ye, Yichi Wang, Luzhen Xu, Hanyi Zhang, Weitai Zhang, Lirong Dai:
Submission of USTC's System for the IWSLT 2023 - Offline Speech Translation Track. IWSLT@ACL 2023: 194-201 - [c252]Jinshui Hu, Hao Wu, Mingjun Chen, Chenyu Liu, Jiajia Wu, Shi Yin, Baocai Yin, Bing Yin, Cong Liu, Jun Du, Lirong Dai:
Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder. ACM Multimedia 2023: 8114-8124 - [c251]Jie Zhang, Rui Tao, Li-Rong Dai:
A Speech Distortion Weighted Single-Channel Wiener Filter Based STFT-Domain Noise Reduction. SSP 2023: 527-531 - [i42]Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu:
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer. CoRR abs/2303.03689 (2023) - [i41]Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin:
Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection. CoRR abs/2305.12111 (2023) - [i40]Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction. CoRR abs/2305.12450 (2023) - [i39]Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
CASA-ASR: Context-Aware Speaker-Attributed ASR. CoRR abs/2305.12459 (2023) - [i38]Qiushi Zhu, Yu Gu, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang:
Rep2wav: Noise Robust text-to-speech Using self-supervised representations. CoRR abs/2308.14553 (2023) - 2022
- [j51]Zi-qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Ian McLoughlin, Li-Rong Dai:
Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition. Circuits Syst. Signal Process. 41(12): 6827-6843 (2022) - [j50]Jiajia Wu, Jun Du, Fengren Wang, Chen Yang, Xinzhe Jiang, Jinshui Hu, Bing Yin, Jianshu Zhang, Lirong Dai:
A multimodal attention fusion network with a dynamic vocabulary for TextVQA. Pattern Recognit. 122: 108214 (2022) - [j49]Jie Zhang, Guanghui Zhang, Li-Rong Dai:
Frequency-Invariant Sensor Selection for MVDR Beamforming in Wireless Acoustic Sensor Networks. IEEE Trans. Wirel. Commun. 21(12): 10648-10661 (2022) - [c250]Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei:
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training. EMNLP 2022: 1663-1676 - [c249]Han Chen, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu:
Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift. ICASSP 2022: 471-475 - [c248]Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai:
Supervised and Self-Supervised Pretraining Based Covid-19 Detection Using Acoustic Breathing/Cough/Speech Signals. ICASSP 2022: 561-565 - [c247]Qiu-Shi Zhu, Jie Zhang, Zi-qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition. ICASSP 2022: 3174-3178 - [c246]Xing-Yu Chen, Jie Zhang, Li-Rong Dai:
Reference Microphone Selection and Low-Rank Approximation Based Multichannel Wiener Filter with Application to Speech Recognition. ICASSP 2022: 4963-4967 - [c245]Hang-Rui Hu, Yan Song, Ying Liu, Li-Rong Dai, Ian McLoughlin, Lin Liu:
Domain Robust Deep Embedding Learning for Speaker Recognition. ICASSP 2022: 7182-7186 - [c244]Yuxuan Xi, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu:
Frontend Attributes Disentanglement for Speech Emotion Recognition. ICASSP 2022: 7712-7716 - [c243]Ziqiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Lirong Dai:
Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition. ICIP 2022: 1346-1350 - [c242]Ao-Ran Gan, Jie Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
An Experimental Comparison between Low-Resource Semi-Supervised and High-Resource Supervised Automatic Speech Recognition Models. ICME 2022: 1-6 - [c241]Jiajia Wu, Jinshui Hu, Mingjun Chen, Lirong Dai, Xuejing Niu, Ning Wang:
Structural String Decoder for Handwritten Mathematical Expression Recognition. ICPR 2022: 3246-3251 - [c240]Hai-tao Xu, Jie Zhang, Li-Rong Dai:
Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition. INTERSPEECH 2022: 1963-1967 - [c239]Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Lirong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang:
A Complementary Joint Training Approach Using Unpaired Speech and Text. INTERSPEECH 2022: 2613-2617 - [c238]Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei:
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. INTERSPEECH 2022: 2658-2662 - [c237]Hang-Rui Hu, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu:
Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification. INTERSPEECH 2022: 3689-3693 - [c236]Guolong Zhong, Hongyu Song, Ruoyu Wang, Lei Sun, Diyuan Liu, Jia Pan, Xin Fang, Jun Du, Jie Zhang, Lirong Dai:
External Text Based Data Augmentation for Low-Resource Speech Recognition in the Constrained Condition of OpenASR21 Challenge. INTERSPEECH 2022: 4860-4864 - [c235]Weitai Zhang, Zhongyi Ye, Haitao Tang, Xiaoxi Li, Xinyuan Zhou, Jing Yang, Jianwei Cui, Dan Liu, Junhua Liu, Lirong Dai:
The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022. IWSLT@ACL 2022: 198-207 - [i37]Qiu-Shi Zhu, Jie Zhang, Zi-qiang Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition. CoRR abs/2201.08930 (2022) - [i36]Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai:
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals. CoRR abs/2201.08934 (2022) - [i35]Zi-qiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition. CoRR abs/2202.07428 (2022) - [i34]Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei:
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. CoRR abs/2203.17113 (2022) - [i33]Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang:
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition. CoRR abs/2204.02023 (2022) - [i32]Qiu-Shi Zhu, Jie Zhang, Zi-qiang Zhang, Li-Rong Dai:
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR. CoRR abs/2205.13293 (2022) - [i31]Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei:
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training. CoRR abs/2210.03730 (2022) - [i30]Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shujie Liu, Yu-Chen Hu, Lirong Dai:
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning. CoRR abs/2210.15324 (2022) - [i29]Qiu-Shi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei:
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning. CoRR abs/2211.11275 (2022) - 2021
- [j48]Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee:
Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement. Neural Networks 143: 171-182 (2021) - [j47]Jie Zhang, Huawei Chen, Li-Rong Dai, Richard Christian Hendriks:
A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 29: 671-683 (2021) - [j46]Jie Zhang, Jun Du, Li-Rong Dai:
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1220-1232 (2021) - [j45]Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai:
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 29: 2643-2655 (2021) - [j44]Jian Tang, Jie Zhang, Yan Song, Ian McLoughlin, Li-Rong Dai:
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR. IEEE ACM Trans. Audio Speech Lang. Process. 29: 2816-2828 (2021) - [j43]Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Lirong Dai:
SRD: A Tree Structure Based Decoder for Online Handwritten Mathematical Expression Recognition. IEEE Trans. Multim. 23: 2471-2480 (2021) - [c234]Jing-Xuan Zhang, Korin Richmond, Zhen-Hua Ling, Lirong Dai:
TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis. AAAI 2021: 14402-14410 - [c233]Xu Zheng, Yan Song, Ian McLoughlin, Lin Liu, Li-Rong Dai:
An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection. ICASSP 2021: 356-360 - [c232]Ying Liu, Yan Song, Ian McLoughlin, Lin Liu, Li-Rong Dai:
An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification. ICASSP 2021: 6683-6687 - [c231]Xu Zheng, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu:
An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection. Interspeech 2021: 556-560 - [c230]Hui Wang, Lin Liu, Yan Song, Lei Fang, Ian McLoughlin, Li-Rong Dai:
A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification. Interspeech 2021: 1499-1503 - [c229]Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee:
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries. Interspeech 2021: 3001-3005 - [c228]Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai:
UnitNet-Based Hybrid Speech Synthesis. Interspeech 2021: 4119-4123 - [c227]Qiu-Shi Zhu, Jie Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
An Improved Wav2Vec 2.0 Pre-Training Approach Using Enhanced Local Dependency Modeling for Speech Recognition. Interspeech 2021: 4334-4338 - [c226]Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai:
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021. IWSLT 2021: 30-38 - [i28]Zi-qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Li-Rong Dai:
XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition. CoRR abs/2103.08207 (2021) - [i27]Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, Lirong Dai:
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021. CoRR abs/2107.00279 (2021) - 2020
- [j42]Jian Tang, Junfeng Hou, Yan Song, Li-Rong Dai, Ian McLoughlin:
Effective Exploitation of Posterior Information for Attention-Based Speech Recognition. IEEE Access 8: 108988-108999 (2020) - [j41]Junfeng Hou, Wu Guo, Yan Song, Li-Rong Dai:
Segment boundary detection directed attention for online end-to-end speech recognition. EURASIP J. Audio Speech Music. Process. 2020(1): 3 (2020) - [j40]Jianshu Zhang, Jun Du, Lirong Dai:
Radical analysis network for learning hierarchies of Chinese characters. Pattern Recognit. 103: 107305 (2020) - [j39]Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai:
Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis. ACM Trans. Asian Low Resour. Lang. Inf. Process. 19(3): 38:1-38:14 (2020) - [j38]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations. IEEE ACM Trans. Audio Speech Lang. Process. 28: 540-552 (2020) - [c225]Liangfa Wei, Jie Zhang, Junfeng Hou, Lirong Dai:
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition. APSIPA 2020: 638-643 - [c224]Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment. Blizzard Challenge / Voice Conversion Challenge 2020 - [c223]Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai:
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer. Blizzard Challenge / Voice Conversion Challenge 2020 - [c222]Jie Yan, Yan Song, Li-Rong Dai, Ian McLoughlin:
Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection. ICASSP 2020: 326-330 - [c221]Hui Wang, Yan Song, Zengxi Li, Ian McLoughlin, Li-Rong Dai:
An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation. ICASSP 2020: 6379-6383 - [c220]Bin Gu, Wu Guo, Lirong Dai, Jun Du:
An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales. ICASSP 2020: 6814-6818 - [c219]Fenglin Ding, Wu Guo, Lirong Dai, Jun Du:
Attention-Based Gated Scaling Adaptive Acoustic Model for CTC-Based Speech Recognition. ICASSP 2020: 7404-7408 - [c218]Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai:
Extracting Unit Embeddings Using Sequence-To-Sequence Acoustic Models for Unit Selection Speech Synthesis. ICASSP 2020: 7659-7663 - [c217]Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Lirong Dai:
A Tree-Structured Decoder for Image-to-Markup Generation. ICML 2020: 11076-11085 - [c216]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning. INTERSPEECH 2020: 771-775 - [c215]Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu:
An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. INTERSPEECH 2020: 841-845 - [c214]Ying Liu, Yan Song, Yiheng Jiang, Ian McLoughlin, Lin Liu, Li-Rong Dai:
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions. INTERSPEECH 2020: 3007-3011 - [c213]Zi-qiang Zhang, Yan Song, Jian-Shu Zhang, Ian McLoughlin, Li-Rong Dai:
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution. INTERSPEECH 2020: 3580-3584 - [i26]Fenglin Ding, Wu Guo, Lirong Dai, Jun Du:
Attentive batch normalization for lstm-based acoustic modeling of speech recognition. CoRR abs/2001.00129 (2020) - [i25]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning. CoRR abs/2008.02371 (2020) - [i24]Liangfa Wei, Jie Zhang, Junfeng Hou, Lirong Dai:
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition. CoRR abs/2008.02686 (2020) - [i23]Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai:
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer. CoRR abs/2009.01475 (2020) - [i22]Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee:
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement. CoRR abs/2009.09561 (2020) - [i21]Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin:
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention. CoRR abs/2012.14360 (2020)
2010 – 2019
- 2019
- [j37]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Li-Rong Dai:
Sequence-to-Sequence Acoustic Modeling for Voice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 27(3): 631-644 (2019) - [j36]Zengxi Li, Yan Song, Li-Rong Dai, Ian McLoughlin:
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 27(4): 692-703 (2019) - [j35]Jianshu Zhang, Jun Du, Lirong Dai:
Track, Attend, and Parse (TAP): An End-to-End Framework for Online Handwritten Mathematical Expression Recognition. IEEE Trans. Multim. 21(1): 221-233 (2019) - [c212]Yuxuan Xi, Pengcheng Li, Yan Song, Yiheng Jiang, Lirong Dai:
Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters. APSIPA 2019: 513-518 - [c211]Peng-Fei Wu, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Hong-Chuan Wu, Lirong Dai:
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training. APSIPA 2019: 623-627 - [c210]Jingyi Xu, Junfeng Hou, Yan Song, Wu Guo, Lirong Dai:
Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition. APSIPA 2019: 844-849 - [c209]Rui Na, Junfeng Hou, Wu Guo, Yan Song, Lirong Dai:
Learning Adaptive Downsampling Encoding for Online End-to-End Speech Recognition. APSIPA 2019: 850-854 - [c208]Yiheng Jiang, Yan Song, Jie Yan, Lirong Dai, Ian McLoughlin:
Triplet-Center Loss Based Deep Embedding Learning Method for Speaker Verification. APSIPA 2019: 1625-1629 - [c207]Yuan Jiang, Ya-Jun Hu, Li-Juan Liu, Hong-Chuan Wu, Zhi-Kun Wang, Yang Ai, Zhen-Hua Ling, Li-Rong Dai:
The USTC System for Blizzard Challenge 2019. Blizzard Challenge 2019 - [c206]Jie Yan, Yan Song, Wu Guo, Li-Rong Dai, Ian McLoughlin, Liang Chen:
A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification. ICASSP 2019: 755-759 - [c205]Jing-Xuan Zhang, Zhen-Hua Ling, Yuan Jiang, Li-Juan Liu, Chen Liang, Li-Rong Dai:
Improving Sequence-to-sequence Voice Conversion by Adding Text-supervision. ICASSP 2019: 6785-6789 - [c204]Qinhui Lei, Hang Chen, Junfeng Hou, Liang Chen, Lirong Dai:
Deep Neural Network Based Regression Approach for Acoustic Echo Cancellation. ICMSSP 2019: 94-98 - [c203]Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai:
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System. INTERSPEECH 2019: 361-365 - [c202]Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification. INTERSPEECH 2019: 1158-1162 - [c201]Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification. INTERSPEECH 2019: 1168-1172 - [c200]Jia-Xiang Chen, Zhen-Hua Ling, Li-Rong Dai:
A Chinese Dataset for Identifying Speakers in Novels. INTERSPEECH 2019: 1561-1565 - [c199]Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai:
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling. INTERSPEECH 2019: 2593-2597 - [c198]Yiheng Jiang, Yan Song, Ian McLoughlin, Zhifu Gao, Li-Rong Dai:
An Effective Deep Embedding Learning Architecture for Speaker Verification. INTERSPEECH 2019: 4040-4044 - [c197]Zhi Chen, Wu Guo, Li-Rong Dai, Zhen-Hua Ling, Jun Du:
Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels. INTERSPEECH 2019: 4225-4229 - [i20]Lanhua You, Wu Guo, Lirong Dai, Jun Du:
Deep Neural Network Embedding Learning with High-Order Statistics for Text-Independent Speaker Verification. CoRR abs/1903.12058 (2019) - [i19]Lanhua You, Wu Guo, Lirong Dai, Jun Du:
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification. CoRR abs/1903.12092 (2019) - [i18]Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai:
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling. CoRR abs/1906.08977 (2019) - [i17]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations. CoRR abs/1906.10508 (2019) - [i16]Peng-Fei Wu, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Hong-Chuan Wu, Li-Rong Dai:
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training. CoRR abs/1906.10859 (2019) - 2018
- [j34]Zengxi Li, Li-Rong Dai, Yan Song, Ian McLoughlin:
A Conditional Generative Model for Speech Enhancement. Circuits Syst. Signal Process. 37(11): 5005-5022 (2018) - [j33]Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Articulatory-to-acoustic conversion using BLSTM-RNNs with augmented input representation. Speech Commun. 99: 161-172 (2018) - [j32]Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Statistical Parametric Speech Synthesis Using Generalized Distillation Framework. IEEE Signal Process. Lett. 25(5): 695-699 (2018) - [j31]Ma Jin, Yan Song, Ian McLoughlin, Li-Rong Dai:
LID-Senones and Their Statistics for Language Identification. IEEE ACM Trans. Audio Speech Lang. Process. 26(1): 171-183 (2018) - [j30]Zhen-Hua Ling, Yang Ai, Yu Gu, Li-Rong Dai:
Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension. IEEE ACM Trans. Audio Speech Lang. Process. 26(5): 883-894 (2018) - [j29]Qing Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures. IEEE ACM Trans. Audio Speech Lang. Process. 26(7): 1181-1193 (2018) - [j28]Junhua Liu, Zhen-Hua Ling, Si Wei, Guoping Hu, Li-Rong Dai:
Improving the Decoding Efficiency of Deep Neural Network Acoustic Models by Cluster-Based Senone Selection. J. Signal Process. Syst. 90(7): 999-1011 (2018) - [c196]Yaming Liu, Jian Tang, Yan Song, Lirong Dai:
A Capsule based Approach for Polyphonic Sound Event Detection. APSIPA 2018: 1853-1857 - [c195]Yuan Jiang, Xiao Zhou, Chuang Ding, Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai:
The USTC System for Blizzard Challenge 2018. Blizzard Challenge 2018 - [c194]Zengxi Li, Yan Song, Li-Rong Dai, Ian McLoughlin:
Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation. ICASSP 2018: 681-685 - [c193]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Forward Attention in Sequence-To-Sequence Acoustic Modeling for Speech Synthesis. ICASSP 2018: 4789-4793 - [c192]Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Densely Connected Progressive Learning for LSTM-Based Speech Enhancement. ICASSP 2018: 5054-5058 - [c191]Shiliang Zhang, Ming Lei, Zhijie Yan, Lirong Dai:
Deep-FSMN for Large Vocabulary Continuous Speech Recognition. ICASSP 2018: 5869-5873 - [c190]Peixin Chen, Wu Guo, Lirong Dai, Zhenhua Ling:
Pseudo-Supervised Approach for Text Clustering Based on Consensus Analysis. ICASSP 2018: 6184-6188 - [c189]Jianshu Zhang, Yixing Zhu, Jun Du, Lirong Dai:
Radical Analysis Network for Zero-Shot Learning in Printed Chinese Character Recognition. ICME 2018: 1-6 - [c188]Jianshu Zhang, Jun Du, Lirong Dai:
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition. ICPR 2018: 2245-2250 - [c187]Jianshu Zhang, Yixing Zhu, Jun Du, Lirong Dai:
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition. ICPR 2018: 3681-3686 - [c186]Jian Tang, Yan Song, Lirong Dai, Ian McLoughlin:
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition. INTERSPEECH 2018: 1783-1787 - [c185]Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Li-Rong Dai:
WaveNet Vocoder with Limited Training Data for Voice Conversion. INTERSPEECH 2018: 1983-1987 - [c184]Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou, Li-Rong Dai:
Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis. INTERSPEECH 2018: 2509-2513 - [c183]Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai:
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition. INTERSPEECH 2018: 3087-3091 - [c182]Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai:
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification. INTERSPEECH 2018: 3578-3582 - [c181]Qing Wang, Jun Du, Li Chai, Li-Rong Dai, Chin-Hui Lee:
A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network. ISCSLP 2018: 295-299 - [i15]Jianshu Zhang, Jun Du, Lirong Dai:
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition. CoRR abs/1801.03530 (2018) - [i14]Zhen-Hua Ling, Yang Ai, Yu Gu, Li-Rong Dai:
Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension. CoRR abs/1801.07910 (2018) - [i13]Jianshu Zhang, Yixing Zhu, Jun Du, Lirong Dai:
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition. CoRR abs/1801.10109 (2018) - [i12]Shiliang Zhang, Ming Lei, Zhijie Yan, Lirong Dai:
Deep-FSMN for Large Vocabulary Continuous Speech Recognition. CoRR abs/1803.05030 (2018) - [i11]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis. CoRR abs/1807.06736 (2018) - [i10]Yaming Liu, Jian Tang, Yan Song, Lirong Dai:
A Capsule based Approach for Polyphonic Sound Event Detection. CoRR abs/1807.07436 (2018) - [i9]Jing-Xuan Zhang, Zhen-Hua Ling, Li-Juan Liu, Yuan Jiang, Li-Rong Dai:
Sequence-to-Sequence Acoustic Modeling for Voice Conversion. CoRR abs/1810.06865 (2018) - [i8]Jing-Xuan Zhang, Zhen-Hua Ling, Yuan Jiang, Li-Juan Liu, Chen Liang, Li-Rong Dai:
Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision. CoRR abs/1811.08111 (2018) - 2017
- [j27]Yanhui Tu, Jun Du, Qing Wang, Xiao Bao, Li-Rong Dai, Chin-Hui Lee:
An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech. Comput. Speech Lang. 46: 517-534 (2017) - [j26]Yonghong Tian, Xilin Chen, Hongkai Xiong, Hong-Liang Li, Li-Rong Dai, Jing Chen, Junliang Xing, Jing Chen, Xihong Wu, Weiming Hu, Yu Hu, Tiejun Huang, Wen Gao:
Towards human-like and transhuman perception in AI 2.0: a review. Frontiers Inf. Technol. Electron. Eng. 18(1): 58-67 (2017) - [j25]Jianshu Zhang, Jun Du, Shiliang Zhang, Dan Liu, Yulong Hu, Jin-Shui Hu, Si Wei, Li-Rong Dai:
Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recognit. 71: 196-206 (2017) - [j24]Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments. Speech Commun. 95: 28-39 (2017) - [j23]Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu:
Nonrecurrent Neural Structure for Long-Term Dependence. IEEE ACM Trans. Audio Speech Lang. Process. 25(4): 871-884 (2017) - [j22]Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks. IEEE ACM Trans. Audio Speech Lang. Process. 25(7): 1535-1546 (2017) - [c180]Junfeng Hou, Shiliang Zhang, Li-Rong Dai, Hui Jiang:
Feedforward sequential memory networks based encoder-decoder model for machine translation. APSIPA 2017: 622-625 - [c179]Huang Chen, Shiliang Zhang, Junfeng Hou, Lirong Dai:
Learning the number of nodes in DNNs with activation mask. APSIPA 2017: 1218-1221 - [c178]Shumin An, Zhenhua Ling, Lirong Dai:
Emotional statistical parametric speech synthesis using LSTM-RNNs. APSIPA 2017: 1613-1616 - [c177]Ya-Jun Hu, Li-Juan Liu, Chuang Ding, Zhen-Hua Ling, Li-Rong Dai:
The USTC system for blizzard machine learning challenge 2017-ES2. ASRU 2017: 650-656 - [c176]Ya-Jun Hu, Chuang Ding, Li-Juan Liu, Zhen-Hua Ling, Li-Rong Dai:
The USTC System for Blizzard Challenge 2017. Blizzard Challenge 2017 - [c175]Qing Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Joint noise and mask aware training for DNN-based speech enhancement with sub-band features. HSCMA 2017: 101-105 - [c174]Lei Sun, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Multiple-target deep learning for LSTM-RNN based speech enhancement. HSCMA 2017: 136-140 - [c173]Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai:
Extracting structural spectral features using what-where auto-encoders for statistical parametric speech synthesis. ICASSP 2017: 4915-4919 - [c172]Liping Chen, Kong-Aik Lee, Bin Ma, Long Ma, Haizhou Li, Li-Rong Dai:
Adaptation of PLDA for multi-source text-independent speaker verification. ICASSP 2017: 5380-5384 - [c171]Jianshu Zhang, Jun Du, Lirong Dai:
A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition. ICDAR 2017: 902-907 - [c170]Xiao Bao, Tian Gao, Jun Du, Li-Rong Dai:
An investigation of high-resolution modeling units of deep neural networks for acoustic scene classification. IJCNN 2017: 3028-3035 - [c169]Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation. INTERSPEECH 2017: 1178-1182 - [c168]Ma Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai:
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling. INTERSPEECH 2017: 2571-2575 - [c167]Junfeng Hou, Shiliang Zhang, Li-Rong Dai:
Gaussian Prediction Based Attention for Online End-to-End Speech Recognition. INTERSPEECH 2017: 3692-3696 - [i7]Junbei Zhang, Xiaodan Zhu, Qian Chen, Li-Rong Dai, Si Wei, Hui Jiang:
Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering. CoRR abs/1703.04617 (2017) - [i6]Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee:
Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement. CoRR abs/1703.07172 (2017) - [i5]Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai:
RAN: Radical analysis networks for zero-shot learning of Chinese characters. CoRR abs/1711.01889 (2017) - [i4]Jianshu Zhang, Jun Du, Li-Rong Dai:
A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition. CoRR abs/1712.03991 (2017) - 2016
- [j21]Xin Wang, Zhen-Hua Ling, Li-Rong Dai:
Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering. Comput. Speech Lang. 38: 46-67 (2016) - [j20]Tian Gao, Jun Du, Yong Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee:
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition. EURASIP J. Adv. Signal Process. 2016: 86 (2016) - [j19]Shiliang Zhang, Hui Jiang, Li-Rong Dai:
Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Learn Neural Networks. J. Mach. Learn. Res. 17: 37:1-37:33 (2016) - [j18]Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai:
Modeling F0 trajectories in hierarchically structured deep neural networks. Speech Commun. 76: 82-92 (2016) - [j17]Jun Du, Yanhui Tu, Li-Rong Dai, Chin-Hui Lee:
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks. IEEE ACM Trans. Audio Speech Lang. Process. 24(8): 1424-1437 (2016) - [j16]Shaofei Xue, Hui Jiang, Li-Rong Dai, Qingfeng Liu:
Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition. J. Signal Process. Syst. 82(2): 175-185 (2016) - [j15]Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Exploration of Local Variability in Text-Independent Speaker Verification. J. Signal Process. Syst. 82(2): 217-228 (2016) - [c166]Qing Wang, Jun Du, Li-Rong Dai:
Boosting DNN-based speech enhancement via explicit transformations. APSIPA 2016: 1-4 - [c165]Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Unsupervised single-channel speech separation via deep neural network for different gender mixtures. APSIPA 2016: 1-4 - [c164]Ling-Hui Chen, Yuan Jiang, Ming Zhou, Zhen-Hua Ling, Li-Rong Dai:
The USTC System for Blizzard Challenge 2016. Blizzard Challenge 2016 - [c163]Zengxi Li, Yan Song, Ian McLoughlin, Li-Rong Dai:
Compact convolutional neural network transfer learning for small-scale image classification. ICASSP 2016: 2737-2741 - [c162]Xiang Yin, Zhen-Hua Ling, Ya-Jun Hu, Li-Rong Dai:
Modeling spectral envelopes using deep conditional restricted Boltzmann machines for statistical parametric speech synthesis. ICASSP 2016: 5125-5129 - [c161]Zhiying Huang, Jian Tang, Shaofei Xue, Li-Rong Dai:
Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code. ICASSP 2016: 5305-5309 - [c160]Liping Chen, Kong-Aik Lee, Eng Siong Chng, Bin Ma, Haizhou Li, Li-Rong Dai:
Content-aware local variability vector for speaker verification with short utterance. ICASSP 2016: 5485-5489 - [c159]Ya-Jun Hu, Zhen-Hua Ling, Li-Rong Dai:
Deep belief network-based post-filtering for statistical parametric speech synthesis. ICASSP 2016: 5510-5514 - [c158]Zhen-Hua Ling, Xiao-Hui Sun, Li-Rong Dai, Yu Hu:
Modulation spectrum compensation for HMM-based speech synthesis using line spectral pairs. ICASSP 2016: 5595-5599 - [c157]Yu Gu, Zhen-Hua Ling, Li-Rong Dai:
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. INTERSPEECH 2016: 297-301 - [c156]Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks. INTERSPEECH 2016: 1502-1506 - [c155]Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai:
The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion. INTERSPEECH 2016: 1642-1646 - [c154]Jianshu Zhang, Jian Tang, Li-Rong Dai:
RNN-BLSTM Based Multi-Pitch Estimation. INTERSPEECH 2016: 1785-1789 - [c153]Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai:
Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition. INTERSPEECH 2016: 3389-3393 - [c152]Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai:
Future Context Attention for Unidirectional LSTM Based Acoustic Model. INTERSPEECH 2016: 3394-3398 - [c151]Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee:
SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement. INTERSPEECH 2016: 3713-3717 - [c150]Nana Fan, Jun Du, Li-Rong Dai:
A regression approach to binaural speech segregation via deep neural network. ISCSLP 2016: 1-5 - [c149]Junfeng Hou, Shiliang Zhang, Li-Rong Dai:
Learning FOFE based FNN-LMs with noise contrastive estimation and part-of-speech features. ISCSLP 2016: 1-5 - [c148]Zhiying Huang, Shaofei Xue, Zhijie Yan, Li-Rong Dai:
Unsupervised speaker adaptation of BLSTM-RNN for LVCSR based on speaker code. ISCSLP 2016: 1-5 - [c147]Junhua Liu, Zhen-Hua Ling, Si Wei, Guoping Hu, Li-Rong Dai:
Cluster-based senone selection for the efficient calculation of deep neural network acoustic models. ISCSLP 2016: 1-5 - [c146]Mengjie Qian, Ian McLoughlin, Wu Guo, Li-Rong Dai:
Mismatched training data enhancement for automatic recognition of children's speech using DNN-HMM. ISCSLP 2016: 1-5 - [c145]Yanhui Tu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition. ISCSLP 2016: 1-5 - [c144]Shaofei Xue, Zhijie Yan, Zhiying Huang, Li-Rong Dai:
Rapid speaker adaptation based on D-code extracted from BLSTM-RNN in LVCSR. ISCSLP 2016: 1-5 - [c143]Junbei Zhang, Junfeng Hou, Shiliang Zhang, Li-Rong Dai:
USTC at NTCIR-12 STC Task. NTCIR 2016 - [c142]Yan Song, Ruilian Cui, Ian McLoughlin, Li-Rong Dai:
Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification. Odyssey 2016: 140-145 - [c141]Ma Jin, Yan Song, Ian McLoughlin, Li-Rong Dai, Zhongfu Ye:
LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification. Odyssey 2016: 210-216 - [c140]Yan Song, Xinhai Hong, Ian McLoughlin, Li-Rong Dai:
Image classification with CNN-based Fisher vector coding. VCIP 2016: 1-4 - 2015
- [j14]Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai:
Statistical parametric speech synthesis using a hidden trajectory model. Speech Commun. 72: 149-159 (2015) - [j13]Liping Chen, Kong-Aik Lee, Li-Rong Dai, Haizhou Li:
Quasi-Factorial Prior for i-vector Extraction. IEEE Signal Process. Lett. 22(12): 2484-2488 (2015) - [j12]Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Regression Approach to Speech Enhancement Based on Deep Neural Networks. IEEE ACM Trans. Audio Speech Lang. Process. 23(1): 7-19 (2015) - [j11]Pan Zhou, Hui Jiang, Li-Rong Dai, Yu Hu, Qingfeng Liu:
State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 23(4): 631-642 (2015) - [c139]Shiliang Zhang, Hui Jiang, Mingbin Xu, Junfeng Hou, Li-Rong Dai:
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models. ACL (2) 2015: 495-500 - [c138]Jun Du, Qing Wang, Yanhui Tu, Xiao Bao, Li-Rong Dai, Chin-Hui Lee:
An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework. ASRU 2015: 430-435 - [c137]Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Lip movement generation using restricted Boltzmann machines for visual speech synthesis. ChinaSIP 2015: 606-610 - [c136]Tian Gao, Jun Du, Li Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee:
A unified speaker-dependent speech separation and enhancement system based on deep neural networks. ChinaSIP 2015: 687-691 - [c135]Tian Gao, Jun Du, Yong Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee:
Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments. LVA/ICA 2015: 75-82 - [c134]Yanhui Tu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition. ICASSP 2015: 61-65 - [c133]Yan Song, Ruilian Cui, Xinhai Hong, Ian McLoughlin, Jiong Shi, Li-Rong Dai:
Improved language identification using deep bottleneck network. ICASSP 2015: 4200-4204 - [c132]Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Joint training of front-end and back-end deep neural networks for robust speech recognition. ICASSP 2015: 4375-4379 - [c131]Shaofei Xue, Hui Jiang, Li-Rong Dai, Qingfeng Liu:
Unsupervised speaker adaptation of deep neural network based on the combination of speaker codes and singular value decomposition for speech recognition. ICASSP 2015: 4555-4559 - [c130]Li-Juan Liu, Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai:
Spectral conversion using deep neural networks trained with multi-source speakers. ICASSP 2015: 4849-4853 - [c129]Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Channel adaptation of PLDA for text-independent speaker verification. ICASSP 2015: 5251-5255 - [c128]Jun Du, Jian-Fang Zhai, Jin-Shui Hu, Bo Zhu, Si Wei, Li-Rong Dai:
Writer adaptive feature extraction based on convolutional neural networks for online handwritten Chinese character recognition. ICDAR 2015: 841-845 - [c127]Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Phone-centric local variability vector for text-constrained speaker verification. INTERSPEECH 2015: 229-233 - [c126]Yan Song, Xinhai Hong, Bing Jiang, Ruilian Cui, Ian McLoughlin, Li-Rong Dai:
Deep bottleneck network based i-vector representation for language identification. INTERSPEECH 2015: 398-402 - [c125]Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification. INTERSPEECH 2015: 992-996 - [c124]Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee:
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement. INTERSPEECH 2015: 1508-1512 - [c123]Qian Chen, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai:
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions. INTERSPEECH 2015: 1581-1585 - [c122]Qing Wang, Jun Du, Xiao Bao, Zi-Rui Wang, Li-Rong Dai, Chin-Hui Lee:
A universal VAD based on jointly trained deep neural networks. INTERSPEECH 2015: 2282-2286 - [c121]Shiliang Zhang, Hui Jiang, Si Wei, Li-Rong Dai:
Rectified linear neural networks with tied-scalar regularization for LVCSR. INTERSPEECH 2015: 2635-2639 - [c120]Yan Song, Ian McLoughlin, Li-Rong Dai:
Deep Bottleneck Feature for Image Classification. ICMR 2015: 491-494 - [i3]Shiliang Zhang, Hui Jiang, Mingbin Xu, Junfeng Hou, Li-Rong Dai:
A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models. CoRR abs/1505.01504 (2015) - [i2]Shiliang Zhang, Hui Jiang, Si Wei, Li-Rong Dai:
Feedforward Sequential Memory Neural Networks without Recurrent Feedback. CoRR abs/1510.02693 (2015) - [i1]Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu:
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency. CoRR abs/1512.08301 (2015) - 2014
- [j10]Chen-Yu Yang, Zhen-Hua Ling, Li-Rong Dai:
Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs. IEICE Trans. Inf. Syst. 97-D(6): 1449-1460 (2014) - [j9]Xian-Jun Xia, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai:
HMM-based unit selection speech synthesis using log likelihood ratios derived from perceptual data. Speech Commun. 63: 27-37 (2014) - [j8]Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
An Experimental Study on Speech Enhancement Based on Deep Neural Networks. IEEE Signal Process. Lett. 21(1): 65-68 (2014) - [j7]Shaofei Xue, Ossama Abdel-Hamid, Hui Jiang, Li-Rong Dai, Qingfeng Liu:
Fast adaptation of deep neural network based on discriminant codes for speech recognition. IEEE ACM Trans. Audio Speech Lang. Process. 22(12): 1713-1725 (2014) - [j6]Ling-Hui Chen, Zhen-Hua Ling, Li-Juan Liu, Li-Rong Dai:
Voice conversion using deep neural networks with layer-wise generative training. IEEE ACM Trans. Audio Speech Lang. Process. 22(12): 1859-1872 (2014) - [c119]Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Global variance equalization for improving deep neural network based speech enhancement. ChinaSIP 2014: 71-75 - [c118]Yan Song, Wu Guo, Li-Rong Dai, Ian Vince McLoughlin:
A spectral based visual matching method for image classification. ICAILP 2014: 666-670 - [c117]Jun Du, Li-Rong Dai, Qiang Huo:
Synthesized stereo mapping via deep neural networks for noisy speech recognition. ICASSP 2014: 1764-1768 - [c116]Xiang Yin, Zhen-Hua Ling, Li-Rong Dai:
Spectral modeling using neural autoregressive distribution estimators for statistical parametric speech synthesis. ICASSP 2014: 3824-3828 - [c115]Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Minimum divergence estimation of speaker prior in multi-session PLDA scoring. ICASSP 2014: 4007-4011 - [c114]Diyuan Liu, Si Wei, Wu Guo, Yebo Bao, Shifu Xiong, Li-Rong Dai:
Lattice based optimization of bottleneck feature extractor with linear transformation. ICASSP 2014: 5617-5621 - [c113]Pan Zhou, Li-Rong Dai, Hui Jiang:
Sequence training of multiple deep neural networks for better performance and faster training speed. ICASSP 2014: 5627-5631 - [c112]Shaofei Xue, Ossama Abdel-Hamid, Hui Jiang, Li-Rong Dai:
Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code. ICASSP 2014: 6339-6343 - [c111]Shiliang Zhang, Yebo Bao, Pan Zhou, Hui Jiang, Li-Rong Dai:
Improving deep neural networks for LVCSR using dropout and shrinking structure. ICASSP 2014: 6849-6853 - [c110]Li-Juan Liu, Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai:
Using bidirectional associative memories for joint spectral envelope modeling in voice conversion. ICASSP 2014: 7884-7888 - [c109]Jun Du, Jin-Shui Hu, Bo Zhu, Si Wei, Li-Rong Dai:
Writer Adaptation Using Bottleneck Features and Discriminative Linear Regression for Online Handwritten Chinese Character Recognition. ICFHR 2014: 311-316 - [c108]Jun Du, Jin-Shui Hu, Bo Zhu, Si Wei, Li-Rong Dai:
A Study of Designing Compact Classifiers Using Deep Neural Networks for Online Handwritten Chinese Character Recognition. ICPR 2014: 2950-2955 - [c107]Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee:
Robust speech recognition with speech enhanced deep neural networks. INTERSPEECH 2014: 616-620 - [c106]Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai:
Formant-controlled speech synthesis using hidden trajectory model. INTERSPEECH 2014: 1529-1533 - [c105]Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai:
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree. INTERSPEECH 2014: 2273-2277 - [c104]Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai:
Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes. INTERSPEECH 2014: 2313-2317 - [c103]Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Dynamic noise aware training for speech enhancement based on deep neural networks. INTERSPEECH 2014: 2670-2674 - [c102]Xin Wang, Zhen-Hua Ling, Li-Rong Dai:
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis. INTERSPEECH 2014: 2942-2946 - [c101]Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai:
Task-aware deep bottleneck features for spoken language identification. INTERSPEECH 2014: 3012-3016 - [c100]Shaofei Xue, Hui Jiang, Li-Rong Dai:
Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition. ISCSLP 2014: 1-5 - [c99]Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Local variability vector for text-independent speaker verification. ISCSLP 2014: 54-58 - [c98]Changqing Kong, Shaofei Xue, Jianqing Gao, Wu Guo, Li-Rong Dai, Hui Jiang:
Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes. ISCSLP 2014: 83-87 - [c97]Bing Jiang, Yan Song, Si Wei, Meng-Ge Wang, Ian McLoughlin, Li-Rong Dai:
Performance evaluation of deep bottleneck features for spoken language identification. ISCSLP 2014: 143-147 - [c96]Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors. ISCSLP 2014: 158-162 - [c95]Yu-Sheng Sun, Zhen-Hua Ling, Xiang Yin, Li-Rong Dai:
Integrating global variance of log power spectrum derived from LSPs into MGE training for HMM-based parametric speech synthesis. ISCSLP 2014: 201-205 - [c94]Yanhui Tu, Jun Du, Yong Xu, Li-Rong Dai, Chin-Hui Lee:
Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers. ISCSLP 2014: 250-254 - [c93]Li Gao, Zhen-Hua Ling, Ling-Hui Chen, Li-Rong Dai:
Improving F0 prediction using bidirectional associative memories and syllable-level F0 features for HMM-based Mandarin speech synthesis. ISCSLP 2014: 275-279 - [c92]Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Cross-language transfer learning for deep neural network based speech enhancement. ISCSLP 2014: 336-340 - [c91]Kong Aik Lee, Bin Ma, Haizhou Li, Liping Chen, Wu Guo, Li-Rong Dai:
Local Variability Modeling for Text-Independent Speaker Verification. Odyssey 2014: 54-59 - 2013
- [c90]Ling-Hui Chen, Zhen-Hua Ling, Yuan Jiang, Yang Song, Xian-Jun Xia, Yi-Qing Zu, Run-Qiang Yan, Li-Rong Dai:
The USTC System for Blizzard Challenge 2013. Blizzard Challenge 2013 - [c89]Pan Zhou, Cong Liu, Qingfeng Liu, Li-Rong Dai, Hui Jiang:
A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition. ICASSP 2013: 6650-6654 - [c88]Chen-Yu Yang, Zhen-Hua Ling, Li-Rong Dai:
Unsupervised prosodic phrase boundary labeling of Mandarin speech synthesis database using context-dependent HMM. ICASSP 2013: 6875-6879 - [c87]Yebo Bao, Hui Jiang, Li-Rong Dai, Cong Liu:
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition. ICASSP 2013: 6980-6984 - [c86]Meng-Ge Wang, Yan Song, Bing Jiang, Li-Rong Dai, Ian McLoughlin:
Exemplar based language recognition method for short-duration speech segments. ICASSP 2013: 7354-7358 - [c85]LianWu Chen, Wu Guo, Yan Song, Li-Rong Dai:
Phoneme variation based synthesized speech discrimination for speaker verification. ICASSP 2013: 7874-7877 - [c84]Ling-Hui Chen, Zhen-Hua Ling, Yan Song, Li-Rong Dai:
Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion. INTERSPEECH 2013: 3052-3056 - 2012
- [j5]Zhen-Hua Ling, Li-Rong Dai:
Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis. IEEE Trans. Speech Audio Process. 20(5): 1492-1502 (2012) - [c83]Zhen-Hua Ling, Xian-Jun Xia, Yang Song, Chen-Yu Yang, Ling-Hui Chen, Li-Rong Dai:
The USTC System for Blizzard Challenge 2012. Blizzard Challenge 2012 - [c82]Xiang Yin, Zhen-Hua Ling, Ming Lei, Li-Rong Dai:
Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis. INTERSPEECH 2012: 1147-1150 - [c81]Bing Jiang, Yan Song, Wu Guo, Li-Rong Dai:
Exemplar-Based Sparse Representation for Language Recognition on I-Vectors. INTERSPEECH 2012: 2057-2060 - [c80]Xin Wang, Zhen-Hua Ling, Li-Rong Dai:
Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis. ISCSLP 2012: 84-87 - [c79]Yong Xu, Wu Guo, Shan Su, Li-Rong Dai:
Spoken term detection for OOV terms based on triphone confusion matrix. ISCSLP 2012: 98-102 - [c78]Xian-Jun Xia, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai:
Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech. ISCSLP 2012: 160-164 - [c77]Kui Wu, Yan Song, Wu Guo, Li-Rong Dai:
Intra-conversation intra-speaker variability compensation for speaker clustering. ISCSLP 2012: 330-334 - [c76]Yong Xu, Wu Guo, Li-Rong Dai:
A hybrid fragment / syllable-based system for improved OOV term detection. ISCSLP 2012: 378-382 - 2011
- [j4]Cong Liu, Yu Hu, Li-Rong Dai, Hui Jiang:
Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 19(8): 2474-2485 (2011) - [c75]Yan Song, Jinhui Tang, Xia Li, Qi Tian, Li-Rong Dai:
Effective image representation based on bi-layer visual codebook. ACPR 2011: 224-228 - [c74]Ling-Hui Chen, Chen-Yu Yang, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai, Yu Hu, Ren-Hua Wang:
The USTC System for Blizzard Challenge 2011. Blizzard Challenge 2011 - [c73]Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model. ICASSP 2011: 4520-4523 - [c72]Ming Lei, Zhen-Hua Ling, Li-Rong Dai:
Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis. ICASSP 2011: 4712-4715 - [c71]Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai:
Factored covariance modeling for text-independent speaker verification. ICASSP 2011: 4856-4859 - [c70]Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai:
Non-parallel training for voice conversion based on FT-GMM. ICASSP 2011: 5116-5119 - [c69]Heng Lu, Zhen-Hua Ling, Li-Rong Dai, Ren-Hua Wang:
Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score. ICASSP 2011: 5352-5355 - [c68]Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model. INTERSPEECH 2011: 373-376 - [c67]Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai:
Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis. INTERSPEECH 2011: 1801-1804 - [c66]Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai:
Formant-Controlled HMM-Based Speech Synthesis. INTERSPEECH 2011: 2777-2780 - 2010
- [j3]Heng Lu, Zhen-Hua Ling, Li-Rong Dai, Ren-Hua Wang:
Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis. Int. J. Comput. Linguistics Chin. Lang. Process. 15(1) (2010) - [c65]Yuan Jiang, Zhen-Hua Ling, Ming Lei, Cheng-Cheng Wang, Heng Lu, Yu Hu, Li-Rong Dai, Ren-Hua Wang:
The USTC System for Blizzard Challenge 2010. Blizzard Challenge 2010 - [c64]Ming Lei, Zhen-Hua Ling, Li-Rong Dai:
Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis. ICASSP 2010: 4230-4233 - [c63]Wu Guo, Zhao Zhang, Yanhua Long, Li-Rong Dai:
N-gram nearest neighbor algorithm for voice password system. ICASSP 2010: 4438-4441 - [c62]Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang:
HMM-based pseudo-clean speech synthesis for splice algorithm. ICASSP 2010: 4570-4573 - [c61]Cong Liu, Yu Hu, Hui Jiang, Li-Rong Dai:
A bounded trust region optimization for discriminative training of HMMs in speech recognition. ICASSP 2010: 4914-4917 - [c60]Yan Song, Qi Tian, Mengyue Wang, Heng Liu, Li-Rong Dai:
Multiple instance learning using visual phrases for object classification. ICME 2010: 649-654 - [c59]Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang:
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier. INTERSPEECH 2010: 162-165 - [c58]Zhen-Hua Ling, Yu Hu, Li-Rong Dai:
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis. INTERSPEECH 2010: 825-828 - [c57]Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai:
The estimation and kernel metric of spectral correlation for text-independent speaker verification. INTERSPEECH 2010: 1065-1068 - [c56]Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo:
Effects of the phonological relevance in speaker verification. INTERSPEECH 2010: 2130-2133 - [c55]Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:
A hierarchical F0 modeling method for HMM-based speech synthesis. INTERSPEECH 2010: 2170-2173 - [c54]Tian-Yi Zhao, Zhen-Hua Ling, Ming Lei, Li-Rong Dai, Qingfeng Liu:
Minimum generation error training for HMM-based prediction of articulatory movements. ISCSLP 2010: 99-102 - [c53]Zhen-Hua Ling, Zhiguo Wang, Li-Rong Dai:
Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis. ISCSLP 2010: 144-147 - [c52]Ying Xu, Yan Song, Yanhua Long, Hai-Bing Zhong, Li-Rong Dai:
The description of iFlyTek Speech Lab system for NIST2009 Language Recognition Evaluation. ISCSLP 2010: 157-161 - [c51]Eryu Wang, Wu Guo, Li-Rong Dai, Kong-Aik Lee, Bin Ma, Haizhou Li:
Factor analysis based spatial correlation modeling for speaker verification. ISCSLP 2010: 166-170 - [c50]Zhiguo Wang, Cong Liu, Hai-Kun Wang, Yu Hu, Li-Rong Dai:
Phonetic clustering based confidence measure for embedded speech recognition. ISCSLP 2010: 186-189 - [c49]Yanhua Long, Li-Rong Dai, Eryu Wang, Bin Ma, Wu Guo:
Non-negative matrix factorization based discriminative features for speaker verification. ISCSLP 2010: 291-295 - [c48]LianWu Chen, Wu Guo, Li-Rong Dai:
Speaker verification against synthetic speech. ISCSLP 2010: 309-312 - [c47]Ling-Hui Chen, Zhen-Hua Ling, Wu Guo, Li-Rong Dai:
GMM-based voice conversion with explicit modelling on feature transform. ISCSLP 2010: 364-368 - [c46]Chen-Yu Yang, Zhen-Hua Ling, Heng Lu, Wu Guo, Li-Rong Dai:
Automatic phrase boundary labeling for Mandarin TTS corpus using context-dependent HMM. ISCSLP 2010: 374-377
2000 – 2009
- 2009
- [j2]Meng Wang, Xian-Sheng Hua, Tao Mei, Richang Hong, Guo-Jun Qi, Yan Song, Li-Rong Dai:
Semi-supervised kernel density estimation for video annotation. Comput. Vis. Image Underst. 113(3): 384-396 (2009) - [c45]Heng Lu, Zhen-Hua Ling, Ming Lei, Cheng-Cheng Wang, Huan-huan Zhao, Ling-Hui Chen, Yu Hu, Li-Rong Dai, Ren-Hua Wang:
The USTC System for Blizzard Challenge 2009. Blizzard Challenge 2009 - [c44]Heng Lu, Yi-Jian Wu, Keiichi Tokuda, Li-Rong Dai, Ren-Hua Wang:
Full covariance state duration modeling for HMM-based speech synthesis. ICASSP 2009: 4033-4036 - [c43]Haizhou Li, Bin Ma, Kong-Aik Lee, Hanwu Sun, Donglai Zhu, Khe Chai Sim, Changhuai You, Rong Tong, Ismo Kärkkäinen, Chien-Lin Huang, Vladimir Pervouchine, Wu Guo, Yijie Li, Li-Rong Dai, Mohaddeseh Nosratighods, Tharmarajah Thiruvaran, Julien Epps, Eliathamby Ambikairajah, Chng Eng Siong, Tanja Schultz, Qin Jin:
The I4U system in NIST 2008 speaker recognition evaluation. ICASSP 2009: 4201-4204 - [c42]Wu Guo, Yanhua Long, Yijie Li, Lei Pan, Eryu Wang, Li-Rong Dai:
iFLY system for the NIST 2008 speaker recognition evaluation. ICASSP 2009: 4209-4212 - [c41]Yanhua Long, Bin Ma, Haizhou Li, Wu Guo, Chng Eng Siong, Li-Rong Dai:
Exploiting prosodic information for Speaker Recognition. ICASSP 2009: 4225-4228 - [c40]Yan Song, Li-Rong Dai, Ren-Hua Wang:
An automatic language identification method based on subspace analysis. ICME 2009: 598-601 - [c39]Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai:
Asynchronous F0 and spectrum modeling for HMM-based speech synthesis. INTERSPEECH 2009: 404-407 - 2008
- [c38]Zhen-Hua Ling, Heng Lu, Guoping Hu, Li-Rong Dai, Ren-Hua Wang:
The USTC System for Blizzard Challenge 2008. Blizzard Challenge 2008 - [c37]Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang, Li-Rong Dai:
Minimum generation error linear regression based model adaptation for HMM-based speech synthesis. ICASSP 2008: 3953-3956 - [c36]Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang, Li-Rong Dai:
Minimum generation error criterion considering global/local variance for HMM-based speech synthesis. ICASSP 2008: 4621-4624 - [c35]Bo Zhu, Zhi-Jie Yan, Yu Hu, Zhiguo Wang, Li-Rong Dai, Ren-Hua Wang:
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and Map. ISCSLP 2008: 93-96 - [c34]Cheng-Cheng Wang, Zhen-Hua Ling, Bu-Fan Zhang, Li-Rong Dai:
Multi-Layer F0 Modeling for HMM-Based Speech Synthesis. ISCSLP 2008: 129-132 - [c33]Heng Lu, Zhen-Hua Ling, Si Wei, Yu Hu, Li-Rong Dai, Ren-Hua Wang:
Heteronym Verification for Mandarin Speech Synthesis. ISCSLP 2008: 137-140 - [c32]Wu Guo, Li-Rong Dai, Ren-Hua Wang:
Double Gauss Based Unsupervised Score Normalization in Speaker Verification. ISCSLP 2008: 165-168 - [c31]Cong Liu, Yu Hu, Xiong-Guo Lei, Zhiguo Wang, Li-Rong Dai, Ren-Hua Wang:
Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion. ISCSLP 2008: 229-232 - [c30]Yanhua Long, Wu Guo, Li-Rong Dai:
Interfusing the Confused Region Score of Speaker Verification Systems. ISCSLP 2008: 314-317 - [c29]Eryu Wang, Wu Guo, Li-Rong Dai:
Parallel Phone Recognizer based MLLR Speaker Recognition. ISCSLP 2008: 318-321 - [c28]Yan Song, Li-Rong Dai:
A Sample and Feature Selection Scheme for GMM-SVM Based Language Recognition. ISCSLP 2008: 326-329 - [c27]Xu Bing, Yan Song, Li-Rong Dai:
The Adaptation Schemes In PR-SVM Based Language Recognition. ISCSLP 2008: 334-337 - 2007
- [j1]Meng Wang, Xian-Sheng Hua, Tao Mei, Jinhui Tang, Guo-Jun Qi, Yan Song, Li-Rong Dai:
Interactive Video Annotation by Multi-Concept Multi-Modality Active Learning. Int. J. Semantic Comput. 1(4): 459-477 (2007) - [c26]Zhen-Hua Ling, Long Qin, Heng Lu, Yu Gao, Li-Rong Dai, Ren-Hua Wang, Yuan Jiang, Zhi-Wei Zhao, Jin-Hui Yang, Jie Chen, Guo-Ping Hu:
The USTC and iflytek speech synthesis systems for Blizzard Challenge 2007. Blizzard Challenge 2007 - [c25]Wu Guo, Lei Pan, Ren-Hua Wang, Li-Rong Dai:
Angle of Models Distance as Test Algorithm in Speaker Verification. FSKD (4) 2007: 231-234 - [c24]Meng Wang, Xian-Sheng Hua, Yan Song, Li-Rong Dai, Ren-Hua Wang:
An Interactive Video Annotation Framework with Multiple Modalities. ICASSP (1) 2007: 957-960 - [c23]Meng Wang, Xian-Sheng Hua, Yan Song, Richang Hong, Li-Rong Dai:
Lazy Learning Based Efficient Video Annotation. ICME 2007: 607-610 - [c22]Meng Wang, Xian-Sheng Hua, Xun Yuan, Yan Song, Li-Rong Dai:
Multi-Graph Semi-Supervised Learning for Video Semantic Feature Extraction. ICME 2007: 1978-1981 - [c21]Meng Wang, Tao Mei, Xun Yuan, Yan Song, Li-Rong Dai:
Video annotation by graph-based learning with neighborhood similarity. ACM Multimedia 2007: 325-328 - [c20]Meng Wang, Xian-Sheng Hua, Xun Yuan, Yan Song, Li-Rong Dai:
Optimizing multi-graph learning: towards a unified video annotation scheme. ACM Multimedia 2007: 862-871 - [c19]Meng Wang, Xian-Sheng Hua, Yan Song, Wei Lai, Li-Rong Dai, Ren-Hua Wang:
An Efficient Automatic Video Shot Size Annotation Scheme. MMM (1) 2007: 649-658 - [c18]Meng Wang, Xian-Sheng Hua, Yan Song, Jinhui Tang, Li-Rong Dai:
Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation. ICSC 2007: 321-328 - 2006
- [c17]Guo-Jun Qi, Yan Song, Xian-Sheng Hua, Hong-Jiang Zhang, Li-Rong Dai:
Video Annotation by Active Learning and Cluster Tuning. CVPR Workshops 2006: 114 - [c16]Yan Song, Xian-Sheng Hua, Li-Rong Dai, Meng Wang, Ren-Hua Wang:
An Automatic Video Semantic Annotation Scheme Based on Combination of Complementary Predictors. ICASSP (5) 2006: 501-504 - [c15]Meng Wang, Xian-Sheng Hua, Yan Song, Li-Rong Dai, HongJiang Zhang:
Semi-Supervised Kernel Regression. ICDM 2006: 1130-1135 - [c14]Yan Song, Guo-Jun Qi, Xian-Sheng Hua, Li-Rong Dai, Ren-Hua Wang:
Video Annotation by Active Learning and Semi-Supervised Ensembling. ICME 2006: 933-936 - [c13]Meng Wang, Xian-Sheng Hua, Li-Rong Dai, Yan Song:
Enhanced Semi-Supervised Learning for Automatic Video Annotation. ICME 2006: 1485-1488 - [c12]Meng Wang, Xian-Sheng Hua, Yan Song, Li-Rong Dai, Shipeng Li:
Automatic video annotation based on co-adaptation and label correction. ISCAS 2006 - [c11]Wu Guo, Renhua Wang, Lirong Dai:
Feature Extraction and Test Algorithm for Speaker Verification. ISCSLP 2006 - [c10]Feng Zhang, Yan Song, Lirong Dai, Ren-Hua Wang:
Two-layer Distance Scheme in Matching Engine for Query by Humming System. ISCSLP 2006 - [c9]Yan Song, Xian-Sheng Hua, Guo-Jun Qi, Li-Rong Dai, Meng Wang, HongJiang Zhang:
Efficient semantic annotation method for indexing large personal video database. Multimedia Information Retrieval 2006: 289-296 - 2005
- [c8]Long Qin, Gao Peng Chen, Zhen-Hua Ling, Li-Rong Dai:
An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion. ICASSP (1) 2005: 21-24 - [c7]Jianfeng Li, Guoping Hu, Ren-Hua Wang, Li-Rong Dai:
Sliding Window Smoothing For Maximum Entropy Based Intonational Phrase Prediction In Chinese. ICASSP (1) 2005: 285-288 - [c6]Yan Song, Xian-Sheng Hua, Li-Rong Dai, Meng Wang:
Semi-automatic video annotation based on active learning with multiple complementary predictors. Multimedia Information Retrieval 2005: 97-104 - 2004
- [c5]Jin-Yu Li, Bo Liu, Ren-Hua Wang, Li-Rong Dai:
A complexity reduction of ETSI advanced front-end for DSR. ICASSP (1) 2004: 61-64 - [c4]Wei Lai, Xiaodong Gu, Ren-Hua Wang, Li-Rong Dai, HongJiang Zhang:
A region based multiple frame-rate tradeoff of video streaming. ICIP 2004: 2067-2070 - [c3]Xiao-Bing Li, Li-Rong Dai, Ren-Hua Wang:
MCE-based training of subspace distribution clustering HMM. ISCSLP 2004: 113-116 - [c2]Bo Liu, Li-Rong Dai, Jin-Yu Li, Ren-Hua Wang:
Double Gaussian based feature normalization for robust speech recognition. ISCSLP 2004: 253-256 - [c1]Wei Lai, Xiaodong Gu, Ren-Hua Wang, Li-Rong Dai, HongJiang Zhang:
Perceptual Video Streaming by Adaptive Spatial-temporal Scalability. PCM (2) 2004: 431-438
last updated on 2025-01-21 00:10 CET by the dblp team
all metadata released as open data under CC0 1.0 license