Kai Yu 0004
Person information
- affiliation: Shanghai Jiao Tong University, Computer Science and Engineering Department, China
- affiliation (PhD 2006): Cambridge University, Engineering Department, UK
Other persons with the same name
- Kai Yu — disambiguation page
- Kai Yu 0001 — Baidu Inc., Institute of Deep Learning, Beijing, China (and 3 more)
- Kai Yu 0002 — Royal Institute of Technology, Stockholm, Sweden
- Kai Yu 0003 — University of Minnesota, Department of Biomedical Engineering, Minneapolis, MN, USA
- Kai Yu 0005 — Zhejiang University, State Key Laboratory of Industrial Control Technology, Hangzhou, China
- Kai Yu 0006 — Beijing Normal University, School of Geography, China (and 1 more)
- Kai Yu 0007 — Hohai University, College of Oceanography, Nanjing, China (and 2 more)
- Kai Yu 0008 — Guangdong University of Technology, School of Information Engineering, School of Integrated Circuits, China (and 1 more)
- Kai Yu 0009 — Soochow University, School of Electronics and Information Engineering, Jiangsu, China (and 1 more)
- Kai Yu 0010 — Nanjing University, School of Electronic Science and Engineering, China
- Kai Yu 0011 — Sun Yat-Sen University, Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, China
- Kai Yu 0012 — Chinese Academy of Sciences, Shanghai Institute of Microsystem and Information Technology, China
- Kai Yu 0013 — Beihang University, School of Computer Science and Engineering, State Key Laboratory of Software Development Environment, China
- Kai Yu 0014 — Nanjing University of Aeronautics and Astronautics, College of Electronic and Information Engineering, China
- Kai Yu 0015 — Shandong University of Science and Technology, College of Mining and Safety Engineering, Qingdao, China
- Kai Yu 0016 — Intel Corporation, Hillsboro, OR, USA (and 1 more)
- Kai Yu 0017 — Nankai University, Chern Institute of Mathematics and LPMC, Tianjin, China
- Kai Yu 0018 — Hangzhou Normal University, Department of Information Science and Technology, China (and 2 more)
2020 – today
- 2024
- [j42]Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu:
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 95-112 (2024) - [j41]Wenbin Jiang, Kai Yu, Fei Wen:
Unsupervised Speech Enhancement Using Optimal Transport and Speech Presence Probability. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4445-4455 (2024) - [j40]Zheng Liang, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4810-4821 (2024) - [j39]Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. IEEE Trans. Multim. 26: 11126-11138 (2024) - [c224]Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu:
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. AAAI 2024: 17924-17932 - [c223]Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu:
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. AAAI 2024: 19053-19061 - [c222]Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu:
Sparsity-Accelerated Training for Large Language Models. ACL (Findings) 2024: 14696-14707 - [c221]Ruiyang Zhou, Lu Chen, Kai Yu:
Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks. LREC/COLING 2024: 9340-9351 - [c220]Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu:
Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind. LREC/COLING 2024: 11794-11812 - [c219]Yang Han, Yiming Wang, Rui Wang, Lu Chen, Kai Yu:
AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference. EMNLP (Findings) 2024: 8506-8522 - [c218]Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhancing Audio Generation Diversity with Visual Information. ICASSP 2024: 866-870 - [c217]Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu:
A Detailed Audio-Text Data Simulation Pipeline Using Single-Event Sounds. ICASSP 2024: 1091-1095 - [c216]Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu:
DiffDub: Person-Generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-Encoder. ICASSP 2024: 3630-3634 - [c215]Pingyue Zhang, Mengyue Wu, Kai Yu:
Semantic-Enhanced Supervised Contrastive Learning. ICASSP 2024: 6030-6034 - [c214]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. ICASSP 2024: 10401-10405 - [c213]Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching. ICASSP 2024: 11121-11125 - [c212]Sen Liu, Yiwei Guo, Xie Chen, Kai Yu:
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations. ICASSP 2024: 11521-11525 - [c211]Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Acoustic BPE for Speech Generation with Discrete Tokens. ICASSP 2024: 11746-11750 - [c210]Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu:
A Birgat Model for Multi-Intent Spoken Language Understanding with Hierarchical Semantic Frames. ICASSP 2024: 12251-12255 - [c209]Junjie Li, Yiwei Guo, Xie Chen, Kai Yu:
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. ICASSP 2024: 12296-12300 - [c208]Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu:
Evolving Subnetwork Training for Large Language Models. ICML 2024 - [c207]Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. ACM Multimedia 2024: 6696-6705 - [c206]Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu:
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions. NAACL-HLT 2024: 6487-6508 - [c205]Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu:
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding. WSDM 2024: 864-872 - [i119]Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. CoRR abs/2401.02584 (2024) - [i118]Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu:
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech. CoRR abs/2401.14321 (2024) - [i117]Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu:
ChemDFM: Dialogue Foundation Model for Chemistry. CoRR abs/2401.14818 (2024) - [i116]Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu:
MULTI: Multimodal Understanding Leaderboard with Text and Images. CoRR abs/2402.03173 (2024) - [i115]Yiming Ai, Zhiwei He, Ziyin Zhang, Wenhong Zhu, Hongkun Hao, Kai Yu, Lingjun Chen, Rui Wang:
Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality. CoRR abs/2402.14679 (2024) - [i114]Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu:
A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames. CoRR abs/2402.18258 (2024) - [i113]Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu:
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding. CoRR abs/2402.18262 (2024) - [i112]Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhancing Audio Generation Diversity with Visual Information. CoRR abs/2403.01278 (2024) - [i111]Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen:
ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary. CoRR abs/2403.02574 (2024) - [i110]Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu:
A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds. CoRR abs/2403.04594 (2024) - [i109]Hongshen Xu, Zichen Zhu, Da Ma, Situo Zhang, Shuai Fan, Lu Chen, Kai Yu:
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback. CoRR abs/2403.18349 (2024) - [i108]Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu:
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind. CoRR abs/2404.04748 (2024) - [i107]Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. CoRR abs/2404.06079 (2024) - [i106]Sen Liu, Yiwei Guo, Xie Chen, Kai Yu:
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations. CoRR abs/2404.14946 (2024) - [i105]Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu:
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech. CoRR abs/2404.19723 (2024) - [i104]Hanchong Zhang, Ruisheng Cao, Hongshen Xu, Lu Chen, Kai Yu:
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions. CoRR abs/2405.02712 (2024) - [i103]Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. CoRR abs/2405.03121 (2024) - [i102]Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu:
Sparsity-Accelerated Training for Large Language Models. CoRR abs/2406.01392 (2024) - [i101]Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu:
Evolving Subnetwork Training for Large Language Models. CoRR abs/2406.06962 (2024) - [i100]Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu:
FakeSound: Deepfake General Audio Detection. CoRR abs/2406.08052 (2024) - [i99]Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. CoRR abs/2406.11546 (2024) - [i98]Bohan Li, Feiyu Shen, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu:
On the Effectiveness of Acoustic BPE in Decoder-Only TTS. CoRR abs/2407.03892 (2024) - [i97]Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu:
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? CoRR abs/2407.10956 (2024) - [i96]Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu:
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation. CoRR abs/2407.13198 (2024) - [i95]Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu:
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders. CoRR abs/2409.01995 (2024) - [i94]Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu:
ChemDFM-X: Towards Large Multimodal Model for Chemistry. CoRR abs/2409.13194 (2024) - [i93]Liangtai Sun, Danyu Luo, Da Ma, Zihan Zhao, Baocai Chen, Zhennan Shen, Su Zhu, Lu Chen, Xin Chen, Kai Yu:
SciDFM: A Large Language Model with Mixture-of-Experts for Science. CoRR abs/2409.18412 (2024) - [i92]Yang Han, Yiming Wang, Rui Wang, Lu Chen, Kai Yu:
AlignSum: Data Pyramid Hierarchical Fine-tuning for Aligning with Human Summarization Preference. CoRR abs/2410.00409 (2024) - [i91]Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. CoRR abs/2410.06885 (2024) - [i90]Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. CoRR abs/2410.09503 (2024) - [i89]Zichen Zhu, Hao Tang, Yansi Li, Kunyao Lan, Yixuan Jiang, Hao Zhou, Yixiao Wang, Situo Zhang, Liangtai Sun, Lu Chen, Kai Yu:
MobA: A Two-Level Agent System for Efficient Mobile Task Automation. CoRR abs/2410.13757 (2024) - [i88]Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu:
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec. CoRR abs/2410.15764 (2024)
- 2023
- [j38]Ruisheng Cao, Lu Chen, Jieyu Li, Hanchong Zhang, Hongshen Xu, Wangyou Zhang, Kai Yu:
A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL. IEEE Trans. Pattern Anal. Mach. Intell. 45(11): 13796-13813 (2023) - [j37]Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu:
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue. Trans. Assoc. Comput. Linguistics 11: 68-84 (2023) - [j36]Wenbin Jiang, Kai Yu:
Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1758-1770 (2023) - [j35]Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu:
Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3446-3456 (2023) - [c204]Sheng Jiang, Su Zhu, Ruisheng Cao, Qingliang Miao, Kai Yu:
SPM: A Split-Parsing Method for Joint Multi-Intent Detection and Slot Filling. ACL (industry) 2023: 668-675 - [c203]Jieyu Li, Lu Chen, Ruisheng Cao, Su Zhu, Hongshen Xu, Zhi Chen, Hanchong Zhang, Kai Yu:
Exploring Schema Generalizability of Text-to-SQL. ACL (Findings) 2023: 1344-1360 - [c202]Yiming Ai, Zhiwei He, Kai Yu, Rui Wang:
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation. ACL (2) 2023: 1930-1941 - [c201]Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu:
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset. ACL (Findings) 2023: 6970-6983 - [c200]Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu:
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. EMNLP (Findings) 2023: 3501-3532 - [c199]Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Kai Yu, Xie Chen:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. ICASSP 2023: 1-5 - [c198]Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu:
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. ICASSP 2023: 1-2 - [c197]Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Emodiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. ICASSP 2023: 1-5 - [c196]Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu:
Diverse and Vivid Sound Generation from Text Descriptions. ICASSP 2023: 1-5 - [c195]Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu:
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge. ICASSP 2023: 1-2 - [c194]Zhijun Liu, Yiwei Guo, Kai Yu:
DiffVoice: Text-to-Speech with Latent Diffusion. ICASSP 2023: 1-5 - [c193]Xuenan Xu, Mengyue Wu, Kai Yu:
Investigating Pooling Strategies and Loss Functions for Weakly-Supervised Text-to-Audio Grounding via Contrastive Learning. ICASSP Workshops 2023: 1-5 - [c192]Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. INTERSPEECH 2023: 616-620 - [c191]Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation. INTERSPEECH 2023: 919-923 - [c190]Pingyue Zhang, Mengyue Wu, Kai Yu:
ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection. INTERSPEECH 2023: 2998-3002 - [c189]Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu:
UnSE: Unsupervised Speech Enhancement Using Optimal Transport. INTERSPEECH 2023: 4029-4033 - [c188]Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhance Temporal Relations in Audio Captioning with Sound Event Detection. INTERSPEECH 2023: 4179-4183 - [c187]Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. ACM Multimedia 2023: 4281-4289 - [c186]Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu:
Large Language Models Are Semi-Parametric Reinforcement Learning Agents. NeurIPS 2023 - [i87]Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Xie Chen, Kai Yu:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. CoRR abs/2303.05322 (2023) - [i86]Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. CoRR abs/2303.17550 (2023) - [i85]Zhijun Liu, Yiwei Guo, Kai Yu:
DiffVoice: Text-to-Speech with Latent Diffusion. CoRR abs/2304.11750 (2023) - [i84]Chenpeng Du, Yiwei Guo, Feiyu Shen, Kai Yu:
Multi-Speaker Multi-Lingual VQTTS System for LIMMITS 2023 Challenge. CoRR abs/2304.13121 (2023) - [i83]Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu:
Diverse and Vivid Sound Generation from Text Descriptions. CoRR abs/2305.01980 (2023) - [i82]Danyang Zhang, Lu Chen, Kai Yu:
Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction. CoRR abs/2305.08144 (2023) - [i81]Yiming Ai, Zhiwei He, Kai Yu, Rui Wang:
TeCS: A Dataset and Benchmark for Tense Consistency of Machine Translation. CoRR abs/2305.13740 (2023) - [i80]Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu:
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset. CoRR abs/2305.15891 (2023) - [i79]Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhance Temporal Relations in Audio Captioning with Sound Event Detection. CoRR abs/2306.01533 (2023) - [i78]Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu:
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding. CoRR abs/2306.07547 (2023) - [i77]Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu:
Large Language Model Is Semi-Parametric Reinforcement Learning Agent. CoRR abs/2306.07929 (2023) - [i76]Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation. CoRR abs/2306.08588 (2023) - [i75]Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Improving Audio Caption Fluency with Automatic Error Correction. CoRR abs/2306.10090 (2023) - [i74]Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. CoRR abs/2306.14145 (2023) - [i73]Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu:
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. CoRR abs/2308.13149 (2023) - [i72]Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. CoRR abs/2309.05027 (2023) - [i71]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR abs/2309.07377 (2023) - [i70]Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
Acoustic BPE for Speech Generation with Discrete Tokens. CoRR abs/2310.14580 (2023) - [i69]Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu:
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. CoRR abs/2310.17342 (2023) - [i68]Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu:
ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL. CoRR abs/2310.18662 (2023) - [i67]Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu:
Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations. CoRR abs/2311.01260 (2023) - [i66]Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu:
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder. CoRR abs/2311.01811 (2023) - [i65]Junjie Li, Yiwei Guo, Xie Chen, Kai Yu:
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. CoRR abs/2312.08676 (2023)
- 2022
- [j34]Bo Chen, Zhihang Xu, Kai Yu:
Data augmentation based non-parallel voice conversion with frame-level speaker disentangler. Speech Commun. 136: 14-22 (2022) - [j33]Chenpeng Du, Kai Yu:
Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 30: 190-201 (2022) - [j32]Bo Chen, Chenpeng Du, Kai Yu:
Neural Fusion for Voice Cloning. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1993-2001 (2022) - [c185]Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu:
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat. EMNLP 2022: 2438-2459 - [c184]Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu:
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. EMNLP 2022: 6699-6712 - [c183]Zhi Chen, Bei Chen, Lu Chen, Kai Yu, Jian-Guang Lou:
AdapterShare: Task Correlation Modeling with Adapter Differentiation. EMNLP 2022: 10645-10651 - [c182]Wenbin Jiang, Zhijun Liu, Kai Yu, Fei Wen:
Speech Enhancement with Neural Homomorphic Synthesis. ICASSP 2022: 376-380 - [c181]Guangwei Li, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
Category-Adapted Sound Event Enhancement with Weakly Labeled Data. ICASSP 2022: 851-855 - [c180]Xuenan Xu, Mengyue Wu, Kai Yu:
Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition. ICASSP 2022: 971-975 - [c179]Guangwei Li, Xuenan Xu, Mengyue Wu, Kai Yu:
Navigating Audio-Visual Event Detection Across Mismatched Modalities. ICASSP 2022: 1975-1979 - [c178]Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu:
Audio-Text Retrieval in Context. ICASSP 2022: 4793-4797 - [c177]Lingfeng Dai, Lu Chen, Zhikai Zhou, Kai Yu:
LatticeBART: Lattice-to-Lattice Pre-Training for Speech Recognition. ICASSP 2022: 6112-6116 - [c176]Wen Wu, Mengyue Wu, Kai Yu:
Climate and Weather: Inspecting Depression Detection via Emotion Recognition. ICASSP 2022: 6262-6266 - [c175]Yu Xi, Tian Tan, Wangyou Zhang, Baochen Yang, Kai Yu:
Text Adaptive Detection for Customizable Keyword Spotting. ICASSP 2022: 6652-6656 - [c174]Yiwei Guo, Chenpeng Du, Kai Yu:
Unsupervised Word-Level Prosody Tagging for Controllable Speech Synthesis. ICASSP 2022: 7597-7601 - [c173]Wenbin Jiang, Tao Liu, Kai Yu:
Efficient Speech Enhancement with Neural Homomorphic Synthesis. INTERSPEECH 2022: 986-990 - [c172]Tao Liu, Shuai Fan, Xu Xiang, Hongbo Song, Shaoxiong Lin, Jiaqi Sun, Tianyuan Han, Siyuan Chen, Binwei Yao, Sen Liu, Yifei Wu, Yanmin Qian, Kai Yu:
MSDWild: Multi-modal Speaker Diarization Dataset in the Wild. INTERSPEECH 2022: 1476-1480 - [c171]Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu:
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature. INTERSPEECH 2022: 1596-1600 - [c170]Tao Liu, Xu Xiang, Zhengyang Chen, Bing Han, Kai Yu, Yanmin Qian:
The X-Lance Speaker Diarization System for the Conversational Short-phrase Speaker Diarization Challenge 2022. ISCSLP 2022: 498-501 - [c169]Qinpei Zhu, Renshou Wu, Guangfeng Liu, Xinyu Zhu, Xingyu Chen, Yang Zhou, Qingliang Miao, Rui Wang, Kai Yu:
The AISP-SJTU Simultaneous Translation System for IWSLT 2022. IWSLT@ACL 2022: 208-215 - [c168]Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu:
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages. NAACL-HLT 2022: 1808-1821 - [c167]Zhi Chen, Lu Chen, Bei Chen, Libo Qin, Yuncong Liu, Su Zhu, Jian-Guang Lou, Kai Yu:
UniDU: Towards A Unified Generative Dialogue Understanding Framework. SIGDIAL 2022: 442-455 - [c166]Guangfeng Liu, Qinpei Zhu, Xingyu Chen, Renjie Feng, Jianxin Ren, Renshou Wu, Qingliang Miao, Rui Wang, Kai Yu:
The AISP-SJTU Translation System for WMT 2022. WMT 2022: 310-317 - [i64]Yiwei Guo, Chenpeng Du, Kai Yu:
Unsupervised word-level prosody tagging for controllable speech synthesis. CoRR abs/2202.07200 (2022) - [i63]Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu:
Audio-text Retrieval in Context. CoRR abs/2203.13645 (2022) - [i62]Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu:
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature. CoRR abs/2204.00768 (2022) - [i61]Zhi Chen, Lu Chen, Bei Chen, Libo Qin, Yuncong Liu, Su Zhu, Jian-Guang Lou, Kai Yu:
UniDU: Towards A Unified Generative Dialogue Understanding Framework. CoRR abs/2204.04637 (2022) - [i60]Wen Wu, Mengyue Wu, Kai Yu:
Climate and Weather: Inspecting Depression Detection via Emotion Recognition. CoRR abs/2204.14099 (2022) - [i59]Xuenan Xu, Mengyue Wu, Kai Yu:
A Comprehensive Survey of Automated Audio Captioning. CoRR abs/2205.05357 (2022) - [i58]Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu:
TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages. CoRR abs/2205.06435 (2022) - [i57]Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu:
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. CoRR abs/2205.11029 (2022) - [i56]Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu:
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat. CoRR abs/2205.11764 (2022) - [i55]Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Jian-Guang Lou, Kai Yu:
DialogZoo: Large-Scale Dialog-Oriented Task Learning. CoRR abs/2205.12662 (2022) - [i54]Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu:
OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue. CoRR abs/2209.04595 (2022) - [i53]Tao Liu, Kai Yu:
BER: Balanced Error Rate For Speaker Diarization. CoRR abs/2211.04304 (2022) - [i52]Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance. CoRR abs/2211.09496 (2022)
- 2021
- [j31]Heinrich Dinkel, Mengyue Wu, Kai Yu:
Towards Duration Robust Weakly Supervised Sound Event Detection. IEEE ACM Trans. Audio Speech Lang. Process. 29: 887-900 (2021) - [j30]Heinrich Dinkel, Shuai Wang, Xuenan Xu, Mengyue Wu, Kai Yu:
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1542-1555 (2021) - [c165]Boer Lyu, Lu Chen, Su Zhu, Kai Yu:
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching. AAAI 2021: 13498-13506 - [c164]Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, Kai Yu:
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations. ACL/IJCNLP (1) 2021: 2541-2555 - [c163]Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu, Kai Yu:
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL. ACL/IJCNLP (Findings) 2021: 3063-3074 - [c162]Xingyu Chen, Zihan Zhao, Lu Chen, Jiabao Ji, Danyang Zhang, Ao Luo, Yuxuan Xiong, Kai Yu:
WebSRC: A Dataset for Web-Based Structural Reading Comprehension. EMNLP (1) 2021: 4173-4185 - [c161]Boer Lyu, Lu Chen, Kai Yu:
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction. EMNLP (Findings) 2021: 4549-4555 - [c160]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events. ICASSP 2021: 606-610 - [c159]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Zeyu Xie, Kai Yu:
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning. ICASSP 2021: 905-909 - [c158]Chenpeng Du, Bing Han, Shuai Wang, Yanmin Qian, Kai Yu:
SynAug: Synthesis-Based Data Augmentation for Text-Dependent Speaker Verification. ICASSP 2021: 5844-5848 - [c157]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
A Lightweight Framework for Online Voice Activity Detection in the Wild. Interspeech 2021: 371-375 - [c156]Lingfeng Dai, Qi Liu, Kai Yu:
Class-Based Neural Network Language Model for Second-Pass Rescoring in ASR. Interspeech 2021: 2022-2026 - [c155]Chenpeng Du, Kai Yu:
Rich Prosody Diversity Modelling with Phone-Level Mixture Density Network. Interspeech 2021: 3136-3140 - [c154]Shuai Wang, Yexin Yang, Yanmin Qian, Kai Yu:
Revisiting the Statistics Pooling Layer in Deep Speaker Embedding Learning. ISCSLP 2021: 1-5 - [c153]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
Audio Caption in a Car Setting with a Sentence-Level Loss. ISCSLP 2021: 1-5 - [c152]Pingyue Zhang, Mengyue Wu, Heinrich Dinkel, Kai Yu:
DEPA: Self-Supervised Audio Embedding for Depression Detection. ACM Multimedia 2021: 135-143 - [c151]Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, Kai Yu:
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser. NAACL-HLT 2021: 5567-5577 - [c150]Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu:
Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF. NLPCC (1) 2021: 505-516 - [c149]Yao Zhao, Lu Chen, Kai Yu:
Relation-Aware Multi-hop Reasoning for Visual Dialog. NLPCC (1) 2021: 810-821 - [i51]Heinrich Dinkel, Mengyue Wu, Kai Yu:
Towards duration robust weakly supervised sound event detection. CoRR abs/2101.07687 (2021) - [i50]Lu Chen, Xingyu Chen, Zihan Zhao, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu:
WebSRC: A Dataset for Web-Based Structural Reading Comprehension. CoRR abs/2101.09465 (2021) - [i49]Chenpeng Du, Kai Yu:
Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis. CoRR abs/2102.00851 (2021) - [i48]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Zeyu Xie, Kai Yu:
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning. CoRR abs/2102.11457 (2021) - [i47]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events. CoRR abs/2102.11474 (2021) - [i46]Boer Lyu, Lu Chen, Su Zhu, Kai Yu:
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching. CoRR abs/2102.12671 (2021) - [i45]Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, Kai Yu:
ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser. CoRR abs/2104.04689 (2021) - [i44]Heinrich Dinkel, Shuai Wang, Xuenan Xu, Mengyue Wu, Kai Yu:
Voice activity detection in the wild: A data-driven approach using teacher-student training. CoRR abs/2105.04065 (2021) - [i43]Chenpeng Du, Kai Yu:
Diverse and Controllable Speech Synthesis with GMM-Based Phone-Level Prosody Modelling. CoRR abs/2105.13086 (2021) - [i42]Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, Kai Yu:
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations. CoRR abs/2106.01093 (2021) - [i41]Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu, Kai Yu:
Decoupled Dialogue Modeling and Semantic Parsing for Multi-Turn Text-to-SQL. CoRR abs/2106.02282 (2021) - [i40]Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu:
Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF. CoRR abs/2112.04999 (2021)
- 2020
- [j29]Fei Wu, Cewu Lu, Mingjie Zhu, Hao Chen, Jun Zhu, Kai Yu, Lei Li, Ming Li, Qianfeng Chen, Xi Li, Xudong Cao, Zhongyuan Wang, Zhengjun Zha, Yueting Zhuang, Yunhe Pan:
Towards a new generation of artificial intelligence in China. Nat. Mach. Intell. 2(6): 312-316 (2020) - [j28]Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1440-1451 (2020) - [j27]Su Zhu, Ruisheng Cao, Kai Yu:
Dual Learning for Semi-Supervised Natural Language Understanding. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1936-1947 (2020) - [j26]Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2174-2183 (2020) - [j25]Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu:
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2400-2411 (2020) - [j24]Kai Yu, Rao Ma, Kaiyu Shi, Qi Liu:
Neural Network Language Model Compression With Product Quantization and Soft Binarization. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2438-2449 (2020) - [j23]Shuai Wang, Yexin Yang, Zhanghao Wu, Yanmin Qian, Kai Yu:
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2598-2609 (2020) - [c148]Lu Chen, Boer Lv, Chi Wang, Su Zhu, Bowen Tan, Kai Yu:
Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks. AAAI 2020: 7521-7528 - [c147]Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu:
Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders. AAAI 2020: 9668-9675 - [c146]Yanbin Zhao, Lu Chen, Zhi Chen, Ruisheng Cao, Su Zhu, Kai Yu:
Line Graph Enhanced AMR-to-Text Generation with Mix-Order Graph Attention Networks. ACL 2020: 732-741 - [c145]Lu Chen, Yanbin Zhao, Boer Lyu, Lesheng Jin, Zhi Chen, Su Zhu, Kai Yu:
Neural Graph Matching Networks for Chinese Short Text Matching. ACL 2020: 6152-6158 - [c144]Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen, Kai Yu:
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing. ACL 2020: 6806-6817 - [c143]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning. DCASE 2020: 225-229 - [c142]Su Zhu, Jieyu Li, Lu Chen, Kai Yu:
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking. EMNLP (Findings) 2020: 766-781 - [c141]Heinrich Dinkel, Kai Yu:
Duration Robust Weakly Supervised Sound Event Detection. ICASSP 2020: 311-315 - [c140]Yexin Yang, Shuai Wang, Xun Gong, Yanmin Qian, Kai Yu:
Text Adaptation for Speaker Verification with Speaker-Text Factorized Embeddings. ICASSP 2020: 6454-6458 - [c139]Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu:
Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training. ICASSP 2020: 6574-6578 - [c138]Shuai Wang, Johan Rohdin, Oldrich Plchot, Lukás Burget, Kai Yu, Jan Cernocký:
Investigation of Specaugment for Deep Speaker Embedding Learning. ICASSP 2020: 7139-7143 - [c137]Chenpeng Du, Kai Yu:
Speaker Augmentation for Low Resource Speech Recognition. ICASSP 2020: 7719-7723 - [c136]Rao Ma, Hao Li, Qi Liu, Lu Chen, Kai Yu:
Neural Lattice Search for Speech Recognition. ICASSP 2020: 7794-7798 - [c135]Jieyu Li, Su Zhu, Kai Yu:
A Hierarchical Tracker for Multi-Domain Dialogue State Tracking. ICASSP 2020: 8014-8018 - [c134]Rao Ma, Lesheng Jin, Qi Liu, Lu Chen, Kai Yu:
Addressing the Polysemy Problem in Language Modeling with Attentional Multi-Sense Embeddings. ICASSP 2020: 8129-8133 - [c133]Han Zhao, Weihao Cui, Quan Chen, Jingwen Leng, Kai Yu, Deze Zeng, Chao Li, Minyi Guo:
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs. ICDCS 2020: 853-863 - [c132]Zhijun Liu, Kuan Chen, Kai Yu:
Neural Homomorphic Vocoder. INTERSPEECH 2020: 240-244 - [c131]Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu:
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding. INTERSPEECH 2020: 871-875 - [c130]Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu:
Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection. INTERSPEECH 2020: 1086-1090 - [c129]Yefei Chen, Heinrich Dinkel, Mengyue Wu, Kai Yu:
Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection. INTERSPEECH 2020: 3665-3669 - [c128]Zihan Xu, Zhi Chen, Lu Chen, Su Zhu, Kai Yu:
Memory Attention Neural Network for Multi-domain Dialogue State Tracking. NLPCC (1) 2020: 41-52 - [c127]Chen Liu, Su Zhu, Lu Chen, Kai Yu:
Robust Spoken Language Understanding with RL-Based Value Error Recovery. NLPCC (1) 2020: 78-90 - [c126]Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu:
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models. NLPCC (1) 2020: 359-371 - [i39]Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. CoRR abs/2003.09831 (2020) - [i38]Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu:
GPVAD: Towards noise robust voice activity detection via weakly supervised sound event detection. CoRR abs/2003.12222 (2020) - [i37]Su Zhu, Jieyu Li, Lu Chen, Kai Yu:
Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking. CoRR abs/2004.03386 (2020) - [i36]Su Zhu, Ruisheng Cao, Kai Yu:
Dual Learning for Semi-Supervised Natural Language Understanding. CoRR abs/2004.12299 (2020) - [i35]Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu:
Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders. CoRR abs/2004.14693 (2020) - [i34]Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu:
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding. CoRR abs/2005.11640 (2020) - [i33]Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen, Kai Yu:
Unsupervised Dual Paraphrasing for Two-stage Semantic Parsing. CoRR abs/2005.13485 (2020) - [i32]Heinrich Dinkel, Nanxin Chen, Yanmin Qian, Kai Yu:
End-to-end spoofing detection with raw waveform CLDNNs. CoRR abs/2007.13060 (2020) - [i31]Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model. CoRR abs/2008.00953 (2020) - [i30]Qi Liu, Tian Tan, Kai Yu:
An Investigation on Deep Learning with Beta Stabilizer. CoRR abs/2008.01173 (2020) - [i29]Qi Liu, Yanmin Qian, Kai Yu:
Future Vector Enhanced LSTM Language Model for LVCSR. CoRR abs/2008.01832 (2020) - [i28]Chen Liu, Su Zhu, Lu Chen, Kai Yu:
Robust Spoken Language Understanding with RL-based Value Error Recovery. CoRR abs/2009.03095 (2020) - [i27]Su Zhu, Ruisheng Cao, Lu Chen, Kai Yu:
Vector Projection Network for Few-shot Slot Tagging in Natural Language Understanding. CoRR abs/2009.09568 (2020) - [i26]Yefei Chen, Shuai Wang, Yanmin Qian, Kai Yu:
End-to-End Speaker-Dependent Voice Activity Detection. CoRR abs/2009.09906 (2020) - [i25]Zhi Chen, Lu Chen, Xiang Zhou, Kai Yu:
Deep Reinforcement Learning for On-line Dialogue State Tracking. CoRR abs/2009.10321 (2020) - [i24]Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu:
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. CoRR abs/2009.10326 (2020) - [i23]Zhi Chen, Xiaoyuan Liu, Lu Chen, Kai Yu:
Structured Hierarchical Dialogue Policy with Graph Neural Networks. CoRR abs/2009.10355 (2020) - [i22]Zhi Chen, Lu Chen, Yanbin Zhao, Su Zhu, Kai Yu:
Dual Learning for Dialogue State Tracking. CoRR abs/2009.10430 (2020) - [i21]Zhi Chen, Lu Chen, Zihan Xu, Yanbin Zhao, Su Zhu, Kai Yu:
CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking. CoRR abs/2009.10435 (2020) - [i20]Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu:
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models. CoRR abs/2010.07109 (2020)
2010 – 2019
- 2019
- [j22]Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu:
AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning. IEEE ACM Trans. Audio Speech Lang. Process. 27(9): 1378-1391 (2019) - [j21]Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu:
Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification. IEEE ACM Trans. Audio Speech Lang. Process. 27(11): 1686-1696 (2019) - [c125]Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, Kai Yu:
Semantic Parsing with Dual Learning. ACL (1) 2019: 51-64 - [c124]Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu:
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition. APSIPA 2019: 1652-1656 - [c123]Rao Ma, Qi Liu, Kai Yu:
Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training. ASRU 2019: 62-69 - [c122]Mingkun Huang, Yizhou Lu, Lan Wang, Yanmin Qian, Kai Yu:
Exploring Model Units and Training Strategies for End-to-End Speech Recognition. ASRU 2019: 524-531 - [c121]Bo Chen, Kuan Chen, Zhijun Liu, Zhihang Xu, Songze Wu, Chenpeng Du, Muyang Li, Sijun Li, Kai Yu:
SJTU Entry in Blizzard Challenge 2019. Blizzard Challenge 2019 - [c120]Zijian Zhao, Su Zhu, Kai Yu:
Data Augmentation with Atomic Templates for Spoken Language Understanding. EMNLP/IJCNLP (1) 2019: 3635-3641 - [c119]Mengyue Wu, Heinrich Dinkel, Kai Yu:
Audio Caption: Listen and Tell. ICASSP 2019: 830-834 - [c118]Shuai Wang, Yexin Yang, Tianzhe Wang, Yanmin Qian, Kai Yu:
Knowledge Distillation for Small Foot-print Deep Speaker Embedding. ICASSP 2019: 6021-6025 - [c117]Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe:
End-to-end Monaural Multi-speaker ASR System without Pretraining. ICASSP 2019: 6256-6260 - [c116]Zijian Zhao, Su Zhu, Kai Yu:
A Hierarchical Decoding Model for Spoken Language Understanding from Unaligned Data. ICASSP 2019: 7305-7309 - [c115]Su Zhu, Zijian Zhao, Tiejun Zhao, Chengqing Zong, Kai Yu:
CATSLU: The 1st Chinese Audio-Textual Spoken Language Understanding Challenge. ICMI 2019: 521-525 - [c114]Hao Li, Chen Liu, Su Zhu, Kai Yu:
Robust Spoken Language Understanding with Acoustic and Domain Knowledge. ICMI 2019: 531-535 - [c113]Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu:
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge. INTERSPEECH 2019: 1038-1042 - [c112]Shuai Wang, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu, Jan Cernocký:
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction. INTERSPEECH 2019: 1148-1152 - [c111]Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu:
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification. INTERSPEECH 2019: 1163-1167 - [c110]Jiaqi Guo, Yongbin You, Yanmin Qian, Kai Yu:
Joint Decoding of CTC Based Systems for Speech Recognition. INTERSPEECH 2019: 2205-2209 - [c109]Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu:
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training. INTERSPEECH 2019: 2938-2942 - [c108]Juncheng Cao, Hai Zhao, Kai Yu:
Cross Aggregation of Multi-head Attention for Neural Machine Translation. NLPCC (1) 2019: 380-392 - [e2]Wen Gao, Helen Mei-Ling Meng, Matthew A. Turk, Susan R. Fussell, Björn W. Schuller, Yale Song, Kai Yu:
International Conference on Multimodal Interaction, ICMI 2019, Suzhou, China, October 14-18, 2019. ACM 2019, ISBN 978-1-4503-6860-5 - [i19]Mengyue Wu, Heinrich Dinkel, Kai Yu:
Audio Caption: Listen and Tell. CoRR abs/1902.09254 (2019) - [i18]Heinrich Dinkel, Kai Yu:
Duration robust sound event detection. CoRR abs/1904.03841 (2019) - [i17]Zijian Zhao, Su Zhu, Kai Yu:
A Hierarchical Decoding Model For Spoken Language Understanding From Unaligned Data. CoRR abs/1904.04498 (2019) - [i16]Heinrich Dinkel, Mengyue Wu, Kai Yu:
Text-based Depression Detection: What Triggers An Alert. CoRR abs/1904.05154 (2019) - [i15]Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu:
AgentGraph: Towards Universal Dialogue Management with Structured Deep Reinforcement Learning. CoRR abs/1905.11259 (2019) - [i14]Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu:
What does a Car-ssette tape tell? CoRR abs/1905.13448 (2019) - [i13]Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu:
Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition. CoRR abs/1906.07317 (2019) - [i12]Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, Kai Yu:
Semantic Parsing with Dual Learning. CoRR abs/1907.05343 (2019) - [i11]Zijian Zhao, Su Zhu, Kai Yu:
Data Augmentation with Atomic Templates for Spoken Language Understanding. CoRR abs/1908.10770 (2019) - [i10]Heinrich Dinkel, Pingyue Zhang, Mengyue Wu, Kai Yu:
Depa: Self-supervised audio embedding for depression detection. CoRR abs/1910.13028 (2019)
- 2018
- [j20]Zhehuai Chen, Yanmin Qian, Kai Yu:
Sequence discriminative training for deep learning based acoustic keyword spotting. Speech Commun. 102: 100-111 (2018) - [j19]Kai Yu, Zijian Zhao, Xueyang Wu, Hongtao Lin, Xuan Liu:
Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation. IEEE ACM Trans. Audio Speech Lang. Process. 26(8): 1359-1368 (2018) - [j18]Tian Tan, Yanmin Qian, Hu Hu, Ying Zhou, Wen Ding, Kai Yu:
Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 26(8): 1393-1405 (2018) - [j17]Heinrich Dinkel, Yanmin Qian, Kai Yu:
Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection. IEEE ACM Trans. Audio Speech Lang. Process. 26(11): 2002-2014 (2018) - [c107]Lu Chen, Bowen Tan, Sishan Long, Kai Yu:
Structured Dialogue Policy with Graph Neural Networks. COLING 2018: 1257-1268 - [c106]Liliang Ren, Kaige Xie, Lu Chen, Kai Yu:
Towards Universal Dialogue State Tracking. EMNLP 2018: 2780-2786 - [c105]Zhehuai Chen, Qi Liu, Hao Li, Kai Yu:
On Modular Training of Neural Acoustics-to-Word Model for LVCSR. ICASSP 2018: 4754-4758 - [c104]Shuai Wang, Yanmin Qian, Kai Yu:
Focal Kl-Divergence Based Dilated Convolutional Neural Networks for Co-Channel Speaker Identification. ICASSP 2018: 5339-5343 - [c103]Ouyu Lan, Su Zhu, Kai Yu:
Semi-Supervised Training Using Adversarial Multi-Task Learning for Spoken Language Understanding. ICASSP 2018: 6049-6053 - [c102]Lu Chen, Cheng Chang, Zhi Chen, Bowen Tan, Milica Gasic, Kai Yu:
Policy Adaptation for Deep Reinforcement Learning-Based Dialogue Management. ICASSP 2018: 6074-6078 - [c101]Su Zhu, Ouyu Lan, Kai Yu:
Robust Spoken Language Understanding with Unsupervised ASR-Error Adaptation. ICASSP 2018: 6179-6183 - [c100]Bo Huang, Ya Zhang, Kai Yu:
MLN: Moment localization Network and Samples Selection for Moment Retrieval. ICVIP 2018: 165-170 - [c99]Zili Huang, Shuai Wang, Kai Yu:
Angular Softmax for Short-Duration Text-independent Speaker Verification. INTERSPEECH 2018: 3623-3627 - [c98]Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian, Kai Yu:
Knowledge Distillation for Sequence Model. INTERSPEECH 2018: 3703-3707 - [c97]Shuai Wang, Heinrich Dinkel, Yanmin Qian, Kai Yu:
Covariance Based Deep Feature for Text-Dependent Speaker Verification. IScIDE 2018: 231-242 - [c96]Huifeng Zhang, Su Zhu, Shuai Fan, Kai Yu:
Joint Spoken Language Understanding and Domain Adaptive Language Modeling. IScIDE 2018: 311-324 - [c95]Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu:
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition. ISCSLP 2018: 195-199 - [c94]Yexin Yang, Shuai Wang, Man Sun, Yanmin Qian, Kai Yu:
Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification. ISCSLP 2018: 205-209 - [c93]Xuan Liu, Di Cao, Kai Yu:
Binarized LSTM Language Model. NAACL-HLT 2018: 2113-2121 - [c92]Kaige Xie, Cheng Chang, Liliang Ren, Lu Chen, Kai Yu:
Cost-Sensitive Active Learning for Dialogue State Tracking. SIGDIAL Conference 2018: 209-213 - [c91]Su Zhu, Kai Yu:
Concept Transfer Learning for Adaptive Language Understanding. SIGDIAL Conference 2018: 391-399 - [e1]Yuxin Peng, Kai Yu, Jiwen Lu, Xingpeng Jiang:
Intelligence Science and Big Data Engineering - 8th International Conference, IScIDE 2018, Lanzhou, China, August 18-19, 2018, Revised Selected Papers. Lecture Notes in Computer Science 11266, Springer 2018, ISBN 978-3-030-02697-4 - [i9]Zhehuai Chen, Qi Liu, Hao Li, Kai Yu:
On Modular Training of Neural Acoustics-to-Word Model for LVCSR. CoRR abs/1803.01090 (2018) - [i8]Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu:
Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition. CoRR abs/1805.01344 (2018) - [i7]Zhehuai Chen, Yanmin Qian, Kai Yu:
Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting. CoRR abs/1808.00639 (2018) - [i6]Liliang Ren, Kaige Xie, Lu Chen, Kai Yu:
Towards Universal Dialogue State Tracking. CoRR abs/1810.09587 (2018) - [i5]Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe:
End-to-End Monaural Multi-speaker ASR System without Pretraining. CoRR abs/1811.02062 (2018)
- 2017
- [j16]Zhehuai Chen, Yimeng Zhuang, Yanmin Qian, Kai Yu:
Phone Synchronous Speech Recognition With CTC Lattices. IEEE ACM Trans. Audio Speech Lang. Process. 25(1): 86-97 (2017) - [c90]Qi Liu, Yanmin Qian, Kai Yu:
Future vector enhanced LSTM language model for LVCSR. ASRU 2017: 104-110 - [c89]Yue Wu, Tianxing He, Zhehuai Chen, Yanmin Qian, Kai Yu:
Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR. CCL 2017: 398-410 - [c88]Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou, Kai Yu:
On-line Dialogue Policy Learning with Companion Teaching. EACL (2) 2017: 198-204 - [c87]Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou, Kai Yu:
Affordable On-line Dialogue Policy Learning. EMNLP 2017: 2200-2209 - [c86]Lu Chen, Xiang Zhou, Cheng Chang, Runzhe Yang, Kai Yu:
Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning. EMNLP 2017: 2454-2464 - [c85]Zhehuai Chen, Yimeng Zhuang, Kai Yu:
Confidence measures for CTC-based phone synchronous decoding. ICASSP 2017: 4850-4854 - [c84]Heinrich Dinkel, Nanxin Chen, Yanmin Qian, Kai Yu:
End-to-end spoofing detection with raw waveform CLDNNS. ICASSP 2017: 4860-4864 - [c83]Su Zhu, Kai Yu:
Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding. ICASSP 2017: 5675-5679 - [c82]Heinrich Dinkel, Yanmin Qian, Kai Yu:
Small-footprint convolutional neural network for spoofing detection. IJCNN 2017: 3086-3091 - [c81]Xu Xiang, Yanmin Qian, Kai Yu:
Binary Deep Neural Networks for Speech Recognition. INTERSPEECH 2017: 533-537 - [c80]Bo Chen, Tianling Bian, Kai Yu:
Discrete Duration Model for Speech Synthesis. INTERSPEECH 2017: 789-793 - [c79]Shuai Wang, Yanmin Qian, Kai Yu:
What Does the Speaker Embedding Encode? INTERSPEECH 2017: 1497-1501 - [c78]Zhehuai Chen, Yanmin Qian, Kai Yu:
A Unified Confidence Measure Framework Using Auxiliary Normalization Graph. IScIDE 2017: 123-133 - [c77]Di Cao, Kai Yu:
Deep Attentive Structured Language Model Based on LSTM. IScIDE 2017: 169-180 - [c76]Xuan Liu, Xueyang Wu, Ruinian Chen, Zijian Zhao, Hongtao Lin, Kai Yu:
splab at the NTCIR-13 STC-2 Task. NTCIR 2017 - [i4]Su Zhu, Kai Yu:
Concept Transfer Learning for Adaptive Language Understanding. CoRR abs/1706.00927 (2017)
- 2016
- [j15]Kai Sun, Qizhe Xie, Kai Yu:
Recurrent Polynomial Network for Dialogue State Tracking. Dialogue Discourse 7(3): 65-88 (2016) - [j14]Kai Yu, Lu Chen, Kai Sun, Qizhe Xie, Su Zhu:
Evolvable dialogue state tracking for statistical dialogue management. Frontiers Comput. Sci. 10(2): 201-215 (2016) - [j13]Yanmin Qian, Nanxin Chen, Kai Yu:
Deep features for automatic spoofing detection. Speech Commun. 85: 43-52 (2016) - [j12]Tian Tan, Yanmin Qian, Kai Yu:
Cluster Adaptive Training for Deep Neural Network Based Acoustic Model. IEEE ACM Trans. Audio Speech Lang. Process. 24(3): 459-468 (2016) - [j11]Yanmin Qian, Mengxiao Bi, Tian Tan, Kai Yu:
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 24(12): 2263-2276 (2016) - [c75]Maofan Yin, Sunil Sivadas, Kai Yu, Bin Ma:
Discriminatively trained joint speaker and environment representations for adaptation of deep neural network acoustic models. ICASSP 2016: 5065-5069 - [c74]Sibo Tong, Hao Gu, Kai Yu:
A comparative study of robustness of deep learning approaches for VAD. ICASSP 2016: 5695-5699 - [c73]Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu:
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. INTERSPEECH 2016: 938-942 - [c72]Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu:
Phone Synchronous Decoding with CTC Lattice. INTERSPEECH 2016: 1923-1927 - [c71]Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu:
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues. INTERSPEECH 2016: 2060-2064 - [c70]Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu:
On training bi-directional neural network language model with noise contrastive estimation. ISCSLP 2016: 1-5 - [c69]Xueyang Wu, Su Zhu, Yue Wu, Kai Yu:
Rich punctuations prediction using large-scale deep learning. ISCSLP 2016: 1-5 - [c68]Da Zheng, Zhehuai Chen, Yue Wu, Kai Yu:
Directed automatic speech transcription error correction using bidirectional LSTM. ISCSLP 2016: 1-5 - [c67]Yimeng Zhuang, Sibo Tong, Maofan Yin, Yanmin Qian, Kai Yu:
Multi-task joint-learning for robust voice activity detection. ISCSLP 2016: 1-5 - [c66]Ke Wu, Xuan Liu, Kai Yu:
The splab at the NTCIR-12 Short Text Conversation Task. NTCIR 2016 - [i3]Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu:
On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation. CoRR abs/1602.06064 (2016) - [i2]Su Zhu, Kai Yu:
Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken Language Understanding. CoRR abs/1608.02097 (2016)
- 2015
- [j10]Yuan Liu, Yanmin Qian, Nanxin Chen, Tianfan Fu, Ya Zhang, Kai Yu:
Deep feature for text-dependent speaker verification. Speech Commun. 73: 1-13 (2015) - [j9]Kai Yu, Kai Sun, Lu Chen, Su Zhu:
Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking. IEEE ACM Trans. Audio Speech Lang. Process. 23(12): 2177-2188 (2015) - [c65]Yanmin Qian, Maofan Yin, Yongbin You, Kai Yu:
Multi-task joint-learning of deep neural networks for robust speech recognition. ASRU 2015: 310-316 - [c64]Yongbin You, Yanmin Qian, Kai Yu:
Local trajectory based speech enhancement for robust speech recognition with deep neural network. ChinaSIP 2015: 5-9 - [c63]Yongbin You, Yanmin Qian, Tianxing He, Kai Yu:
An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition. ChinaSIP 2015: 30-34 - [c62]Tian Tan, Yanmin Qian, Maofan Yin, Yimeng Zhuang, Kai Yu:
Cluster adaptive training for deep neural network. ICASSP 2015: 4325-4329 - [c61]Suliang Bu, Yunxin Zhao, Yanmin Qian, Kai Yu:
A novel static parameter calculation method for model compensation. ICASSP 2015: 4510-4514 - [c60]Tianxing He, Xu Xiang, Yanmin Qian, Kai Yu:
Recurrent neural network language model with structured word embeddings for speech recognition. ICASSP 2015: 5396-5400 - [c59]Yanmin Qian, Tianxing He, Wei Deng, Kai Yu:
Automatic model redundancy reduction for fast back-propagation for deep neural networks in speech recognition. IJCNN 2015: 1-6 - [c58]Nanxin Chen, Yanmin Qian, Kai Yu:
Multi-task learning for text-dependent speaker verification. INTERSPEECH 2015: 185-189 - [c57]Nanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu:
Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge. INTERSPEECH 2015: 2097-2101 - [c56]Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu:
An investigation of context clustering for statistical speech synthesis with deep neural network. INTERSPEECH 2015: 2212-2216 - [c55]Mengxiao Bi, Yanmin Qian, Kai Yu:
Very deep convolutional neural networks for LVCSR. INTERSPEECH 2015: 3259-3263 - [c54]Wengong Jin, Tianxing He, Yanmin Qian, Kai Yu:
Paragraph vector based topic model for language model adaptation. INTERSPEECH 2015: 3516-3520 - [c53]Qizhe Xie, Kai Sun, Su Zhu, Lu Chen, Kai Yu:
Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers. SIGDIAL Conference 2015: 295-304 - [i1]Kai Sun, Qizhe Xie, Kai Yu:
Recurrent Polynomial Network for Dialogue State Tracking. CoRR abs/1507.03934 (2015)
- 2014
- [c52]Wei Deng, Yanmin Qian, Yuchen Fan, Tianfan Fu, Kai Yu:
Stochastic data sweeping for fast DNN training. ICASSP 2014: 240-244 - [c51]Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan, Kai Yu:
Reshaping deep neural network for fast decoding by node-pruning. ICASSP 2014: 245-249 - [c50]Suliang Bu, Yanmin Qian, Khe Chai Sim, Yongbin You, Kai Yu:
Second order vector taylor series based robust speech recognition. ICASSP 2014: 1769-1773 - [c49]Yuan Liu, Tianfan Fu, Yuchen Fan, Yanmin Qian, Kai Yu:
Speaker verification with deep features. IJCNN 2014: 747-753 - [c48]Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu:
Tandem deep features for text-dependent speaker verification. INTERSPEECH 2014: 1327-1331 - [c47]Suliang Bu, Yanmin Qian, Kai Yu:
A novel dynamic parameters calculation approach for model compensation. INTERSPEECH 2014: 2744-2748 - [c46]Jianwei Niu, Yanmin Qian, Kai Yu:
Acoustic emotion recognition using deep neural network. ISCSLP 2014: 128-132 - [c45]Kai Sun, Lu Chen, Su Zhu, Kai Yu:
The SJTU System for Dialog State Tracking Challenge 2. SIGDIAL Conference 2014: 318-326 - [c44]Kai Sun, Lu Chen, Su Zhu, Kai Yu:
A generalized rule based tracker for dialogue state tracking. SLT 2014: 330-335 - [c43]Su Zhu, Lu Chen, Kai Sun, Da Zheng, Kai Yu:
Semantic parser enhancement for dialogue domain extension with little data. SLT 2014: 336-341
- 2013
- [c42]Yanmin Qian, Kai Yu, Jia Liu:
Combination of data borrowing strategies for low-resource LVCSR. ASRU 2013: 404-409 - [c41]Peilu Wang, Ruihua Sun, Hai Zhao, Kai Yu:
A New Word Language Model Evaluation Metric for Character Based Languages. CCL 2013: 315-324
- 2012
- [j8]Jason D. Williams, Kai Yu, Brahim Chaib-draa, Oliver Lemon, Roberto Pieraccini, Olivier Pietquin, Pascal Poupart, Steve J. Young:
Introduction to the Issue on Advances in Spoken Dialogue Systems and Mobile Interface. IEEE J. Sel. Top. Signal Process. 6(8): 889-890 (2012) - [c40]Khe Chai Sim, Shengdong Zhao, Kai Yu, Hank Liao:
ICMI'12 grand challenge: haptic voice recognition. ICMI 2012: 363-370 - [c39]Hainan Xu, Yuchen Fan, Kai Yu:
Development of the 2012 SJTU HVR system. ICMI 2012: 539-544 - [c38]Milica Gasic, Pirros Tsiakoulis, Matthew Henderson, Blaise Thomson, Kai Yu, Eli Tzirkel, Steve J. Young:
The Effect of Cognitive Load on a Statistical Dialogue System. SIGDIAL Conference 2012: 74-78 - [c37]Matthew Henderson, Milica Gasic, Blaise Thomson, Pirros Tsiakoulis, Kai Yu, Steve J. Young:
Discriminative spoken language understanding using word confusion networks. SLT 2012: 176-181
- 2011
- [j7]Kai Yu, Heiga Zen, François Mairesse, Steve J. Young:
Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis. Speech Commun. 53(6): 914-923 (2011) - [j6]Kai Yu, Steve J. Young:
Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis. IEEE Trans. Speech Audio Process. 19(5): 1071-1079 (2011) - [c36]Milica Gasic, Filip Jurcícek, Blaise Thomson, Kai Yu, Steve J. Young:
On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. ASRU 2011: 312-317 - [c35]Kai Yu, Steve J. Young:
Joint modelling of voicing label and continuous F0 for HMM based speech synthesis. ICASSP 2011: 4572-4575 - [c34]Filip Jurcícek, Simon Keizer, Milica Gasic, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk. INTERSPEECH 2011: 3061-3064 - [c33]Alan W. Black, Susanne Burger, Alistair Conkie, Helen Wright Hastie, Simon Keizer, Oliver Lemon, Nicolas Merigaud, Gabriel Parent, Gabriel Schubiner, Blaise Thomson, Jason D. Williams, Kai Yu, Steve J. Young, Maxine Eskénazi:
Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results. SIGDIAL Conference 2011: 2-7
- 2010
- [j5]Steve J. Young, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson, Kai Yu:
The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management. Comput. Speech Lang. 24(2): 150-174 (2010) - [j4]Kai Yu, Mark J. F. Gales, Lan Wang, Philip C. Woodland:
Unsupervised training and directed manual transcription for LVCSR. Speech Commun. 52(7-8): 652-663 (2010) - [c32]François Mairesse, Milica Gasic, Filip Jurcícek, Simon Keizer, Blaise Thomson, Kai Yu, Steve J. Young:
Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning. ACL 2010: 1552-1561 - [c31]Kai Yu, François Mairesse, Steve J. Young:
Word-level emphasis modelling in HMM-based speech synthesis. ICASSP 2010: 4238-4241 - [c30]Mark J. F. Gales, Kai Yu:
Canonical state models for automatic speech recognition. INTERSPEECH 2010: 58-61 - [c29]Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve J. Young:
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems. INTERSPEECH 2010: 90-93 - [c28]Kai Yu, Heiga Zen, François Mairesse, Steve J. Young:
Context adaptive training with factorized decision trees for HMM-based speech synthesis. INTERSPEECH 2010: 414-417 - [c27]Simon Keizer, Milica Gasic, Filip Jurcícek, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Parameter estimation for agenda-based user simulation. SIGDIAL Conference 2010: 116-123 - [c26]Milica Gasic, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers. SIGDIAL Conference 2010: 201-204 - [c25]Blaise Thomson, Filip Jurcícek, Milica Gasic, Simon Keizer, François Mairesse, Kai Yu, Steve J. Young:
Parameter learning for POMDP spoken dialogue models. SLT 2010: 271-276 - [c24]Blaise Thomson, Kai Yu, Simon Keizer, Milica Gasic, Filip Jurcícek, François Mairesse, Steve J. Young:
Bayesian dialogue system for the Let's Go Spoken Dialogue Challenge. SLT 2010: 460-465 - [c23]Kai Yu, Blaise Thomson, Steve J. Young:
From discontinuous to continuous F0 modelling in HMM-based speech synthesis. SSW 2010: 94-99
2000 – 2009
- 2009
- [j3]Kai Yu, Mark J. F. Gales, Philip C. Woodland:
Unsupervised Adaptation With Discriminative Mapping Transforms. IEEE Trans. Speech Audio Process. 17(4): 714-723 (2009) - [c22]Milica Gasic, Fabrice Lefèvre, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Back-off action selection in summary space-based POMDP dialogue systems. ASRU 2009: 456-461 - [c21]Kai Yu, Tomoki Toda, Milica Gasic, Simon Keizer, François Mairesse, Blaise Thomson, Steve J. Young:
Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis. ICASSP 2009: 3773-3776 - [c20]François Mairesse, Milica Gasic, Filip Jurcícek, Simon Keizer, Blaise Thomson, Kai Yu, Steve J. Young:
Spoken language understanding from unaligned data using discriminative classification models. ICASSP 2009: 4749-4752 - [c19]Filip Jurcícek, Milica Gasic, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Transformation-based learning for semantic parsing. INTERSPEECH 2009: 2719-2722 - [c18]Fabrice Lefèvre, Milica Gasic, Filip Jurcícek, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-Based Dialogue Systems. SIGDIAL Conference 2009: 272-275
- 2008
- [c17]Kai Yu, Mark J. F. Gales, Philip C. Woodland:
Unsupervised discriminative adaptation using discriminative mapping transforms. ICASSP 2008: 4273-4276
- [c16]Blaise Thomson, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Kai Yu, Steve J. Young:
User study of the Bayesian update of dialogue state approach to dialogue management. INTERSPEECH 2008: 483-486
- [c15]Blaise Thomson, Kai Yu, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Steve J. Young:
Evaluating semantic-level confidence scores with multiple hypotheses. INTERSPEECH 2008: 1153-1156
- [c14]Chandra Kant Raut, Kai Yu, Mark J. F. Gales:
Adaptive training using discriminative mapping transforms. INTERSPEECH 2008: 1697-1700
- [c13]Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Blaise Thomson, Kai Yu, Steve J. Young:
Training and Evaluation of the HIS POMDP Dialogue System in Noise. SIGDIAL Workshop 2008: 112-119
- [c12]Simon Keizer, Milica Gasic, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Modelling user behaviour in the HIS-POMDP dialogue manager. SLT 2008: 121-124
- 2007
- [j2]Kai Yu, Mark J. F. Gales:
Bayesian Adaptive Inference and Adaptive Training. IEEE Trans. Speech Audio Process. 15(6): 1932-1943 (2007)
- [c11]Mark J. F. Gales, Frank Diehl, Chandra Kant Raut, Marcus Tomalin, Philip C. Woodland, Kai Yu:
Development of a phonetic system for large vocabulary Arabic speech recognition. ASRU 2007: 24-29
- [c10]Xunying Liu, William J. Byrne, Mark J. F. Gales, Adrià de Gispert, Marcus Tomalin, Philip C. Woodland, Kai Yu:
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation. ASRU 2007: 153-158
- [c9]Marcus Tomalin, Mark J. F. Gales, X. Andrew Liu, Khe Chai Sim, Rohit Sinha, Lan Wang, Philip C. Woodland, Kai Yu:
Improving Speech Transcription for Mandarin-English Translation. ICASSP (4) 2007: 97-100
- [c8]Mark J. F. Gales, Xunying Liu, Rohit Sinha, Philip C. Woodland, Kai Yu, Spyros Matsoukas, Tim Ng, Kham Nguyen, Long Nguyen, Jean-Luc Gauvain, Lori Lamel, Abdelkhalek Messaoudi:
Speech Recognition System Combination for Machine Translation. ICASSP (4) 2007: 1277-1280
- [c7]Kai Yu, Mark J. F. Gales, Philip C. Woodland:
Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio. INTERSPEECH 2007: 1709-1712
- 2006
- [j1]Kai Yu, Mark J. F. Gales:
Discriminative cluster adaptive training. IEEE Trans. Speech Audio Process. 14(5): 1694-1703 (2006)
- [c6]Kai Yu, Mark J. F. Gales:
Incremental Adaptation using Bayesian Inference. ICASSP (1) 2006: 217-220
- 2005
- [c5]Gunnar Evermann, Ho Yin Chan, Mark J. F. Gales, Bin Jia, David Mrva, Philip C. Woodland, Kai Yu:
Training LVCSR Systems on Thousands of Hours of Data. ICASSP (1) 2005: 209-212
- [c4]Mark J. F. Gales, Bin Jia, X. Andrew Liu, Khe Chai Sim, Philip C. Woodland, Kai Yu:
Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System. ICASSP (1) 2005: 841-844
- [c3]Xunying Liu, Mark J. F. Gales, Khe Chai Sim, Kai Yu:
Investigation of Acoustic Modeling Techniques for LVCSR Systems. ICASSP (1) 2005: 849-852
- 2004
- [c2]Kai Yu, Mark J. F. Gales:
Adaptive training using structured transforms. ICASSP (1) 2004: 317-320
- [c1]Sue Tranter, Kai Yu, Gunnar Evermann, Philip C. Woodland:
Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech. ICASSP (1) 2004: 753-756
last updated on 2024-12-15 02:17 CET by the dblp team
all metadata released as open data under CC0 1.0 license