Zejun Ma
Journal Articles
- 2024
- [j5] Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Zejun Ma, Jiakai Wang, Jie Luo, Xianglong Liu:
BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance. IEEE Trans. Neural Networks Learn. Syst. 35(8): 10674-10686 (2024)
- 2023
- [j4] Huidong Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Ke Chen, Junbin Gao:
Graph contrastive learning with implicit augmentations. Neural Networks 163: 156-164 (2023)
- [j3] Pengfei Wei, Thanh Vinh Vo, Xinghua Qu, Yew Soon Ong, Zejun Ma:
Transfer Kernel Learning for Multi-Source Transfer Gaussian Process Regression. IEEE Trans. Pattern Anal. Mach. Intell. 45(3): 3862-3876 (2023)
- [j2] Pengfei Wei, Yiping Ke, Yew-Soon Ong, Zejun Ma:
Adaptive Transfer Kernel Learning for Transfer Gaussian Process Regression. IEEE Trans. Pattern Anal. Mach. Intell. 45(6): 7142-7156 (2023)
- 2022
- [j1] Zhiyun Fan, Linhao Dong, Meng Cai, Zejun Ma, Bo Xu:
Sequence-Level Speaker Change Detection With Difference-Based Continuous Integrate-and-Fire. IEEE Signal Process. Lett. 29: 1551-1554 (2022)
Conference and Workshop Papers
- 2024
- [c73] Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma:
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR. ICASSP 2024: 9986-9990
- [c72] Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Lu Lu, Zejun Ma:
Extending Multilingual ASR to New Languages Using Supplementary Encoder and Decoder Components. ICASSP 2024: 10586-10590
- [c71] Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Extending Large Language Models for Speech and Audio Captioning. ICASSP 2024: 11236-11240
- [c70] Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Connecting Speech Encoder and Large Language Model for ASR. ICASSP 2024: 12637-12641
- [c69] Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao:
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis. ICLR 2024
- [c68] Qianqian Dong, Zhiying Huang, Qi Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang:
PolyVoice: Language Models for Speech to Speech Translation. ICLR 2024
- [c67] Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
SALMONN: Towards Generic Hearing Abilities for Large Language Models. ICLR 2024
- [c66] Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao:
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis. ICLR 2024
- [c65] Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. ICML 2024
- 2023
- [c64] Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma:
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training. ACL (Findings) 2023: 8894-8907
- [c63] Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer. ASRU 2023: 1-8
- [c62] Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma:
Bytecover3: Accurate Cover Song Identification On Short Queries. ICASSP 2023: 1-5
- [c61] Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee:
Leveraging Phone-Level Linguistic-Acoustic Similarity For Utterance-Level Pronunciation Scoring. ICASSP 2023: 1-5
- [c60] Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee:
An ASR-Free Fluency Scoring Approach with Self-Supervised Learning. ICASSP 2023: 1-5
- [c59] Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma:
Internal Language Model Estimation Based Adaptive Language Model Fusion for Domain Adaptation. ICASSP 2023: 1-5
- [c58] Chunfeng Wang, Peisong Huang, Yuxiang Zou, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma:
LiteG2P: A Fast, Light and High Accuracy Model for Grapheme-to-Phoneme Conversion. ICASSP 2023: 1-5
- [c57] Zhi Li, Pengfei Wei, Xiang Yin, Zejun Ma, Alex C. Kot:
Virtual Try-On with Pose-Garment Keypoints Guided Inpainting. ICCV 2023: 22731-22740
- [c56] Zejun Ma, Hong Jiang, Huangxu Ge, Huajie Zhang, Mengshi Zhao, Ting Wang, Hongwu Bai:
Dynamics Analysis of Large-Scale Transmission Tower-Line Coupled System under Measured Typhoon Load. ICITEE 2023: 90-96
- [c55] Xinghua Qu, Xiang Yin, Pengfei Wei, Lu Lu, Zejun Ma:
AudioQR: Deep Neural Audio Watermarks For QR Code. IJCAI 2023: 6192-6200
- [c54] Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma:
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. INTERSPEECH 2023: 42-46
- [c53] Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer. INTERSPEECH 2023: 386-390
- [c52] Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma:
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition. INTERSPEECH 2023: 904-908
- [c51] Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma:
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring. INTERSPEECH 2023: 949-953
- [c50] Shuju Shi, Kaiqi Fu, Yiwei Gu, Xiaohai Tian, Shaojun Gao, Wei Li, Zejun Ma:
Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment. INTERSPEECH 2023: 954-958
- [c49] Zhipeng Chen, Haihua Xu, Yerbolat Khassanov, Yi He, Lu Lu, Zejun Ma, Ji Wu:
Knowledge Distillation Approach for Efficient Internal Language Model Estimation. INTERSPEECH 2023: 1339-1343
- [c48] Pengfei Wei, Xiang Yin, Chunfeng Wang, Zhonghao Li, Xinghua Qu, Zhiqiang Xu, Zejun Ma:
S2CD: Self-heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion. INTERSPEECH 2023: 2288-2292
- [c47] Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma:
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition. INTERSPEECH 2023: 2908-2912
- [c46] Zhiyun Fan, Linhao Dong, Chen Shen, Zhenlin Liang, Jun Zhang, Lu Lu, Zejun Ma:
Language-specific Boundary Learning for Improving Mandarin-English Code-switching Speech Recognition. INTERSPEECH 2023: 3322-3326
- [c45] Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma:
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech. INTERSPEECH 2023: 5486-5490
- [c44] Yuchen Liu, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma, Qin Jin:
Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation. ACM Multimedia 2023: 5966-5974
- [c43] Xinghua Qu, Hongyang Liu, Zhu Sun, Xiang Yin, Yew Soon Ong, Lu Lu, Zejun Ma:
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions and Prospects. SIGIR 2023: 2701-2711
- 2022
- [c42] Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov:
Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data. AAAI 2022: 4441-4449
- [c41] Pengfei Wei, Xinghua Qu, Wen Song, Zejun Ma:
Dynamic Transfer Gaussian Process Regression. CIKM 2022: 2118-2127
- [c40] Hang Zhao, Chen Zhang, Bilei Zhu, Zejun Ma, Kejun Zhang:
S3T: Self-Supervised Pre-Training with Swin Transformer For Music Classification. ICASSP 2022: 606-610
- [c39] Xingjian Du, Ke Chen, Zijie Wang, Bilei Zhu, Zejun Ma:
Bytecover2: Towards Dimensionality Reduction of Latent Embedding for Efficient Cover Song Identification. ICASSP 2022: 616-620
- [c38] Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov:
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection. ICASSP 2022: 646-650
- [c37] Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma:
Towards Using Clothes Style Transfer for Scenario-Aware Person Video Generation. ICASSP 2022: 1745-1749
- [c36] Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma:
Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks. ICASSP 2022: 6882-6886
- [c35] Shaoshi Ling, Chen Shen, Meng Cai, Zejun Ma:
Improving Pseudo-Label Training For End-To-End Speech Recognition Using Gradient Mask. ICASSP 2022: 8397-8401
- [c34] Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu:
Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection. ICASSP 2022: 8532-8536
- [c33] Chen Shen, Yi Liu, Wenzhi Fan, Bin Wang, Shixue Wen, Yao Tian, Jun Zhang, Jingsheng Yang, Zejun Ma:
The Volcspeech System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge. ICASSP 2022: 9176-9180
- [c32] Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Yao Tian, Zejun Ma, Jie Luo, Xianglong Liu:
BiFSMN: Binary Neural Network for Keyword Spotting. IJCAI 2022: 4346-4352
- [c31] Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang:
Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR. INTERSPEECH 2022: 1666-1670
- [c30] Junfeng Hou, Jinkun Chen, Wanyu Li, Yufeng Tang, Jun Zhang, Zejun Ma:
Bring dialogue-context into RNN-T for streaming ASR. INTERSPEECH 2022: 2048-2052
- [c29] Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu:
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire. INTERSPEECH 2022: 3749-3753
- [c28] Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma:
Towards high-fidelity singing voice conversion with acoustic reference and contrastive predictive coding. INTERSPEECH 2022: 4287-4291
- [c27] Kaiqi Fu, Shaojun Gao, Xiaohai Tian, Wei Li, Zejun Ma:
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring. INTERSPEECH 2022: 4337-4341
- [c26] Xiaohai Tian, Kaiqi Fu, Shaojun Gao, Yiwei Gu, Kai Wang, Wei Li, Zejun Ma:
A Transfer and Multi-Task Learning based Approach for MOS Prediction. INTERSPEECH 2022: 5438-5442
- [c25] Xingjian Du, Huidong Liang, Yuan Wan, Yuheng Lin, Ke Chen, Bilei Zhu, Zejun Ma:
Latent feature augmentation for chorus detection. ISMIR 2022: 240-247
- [c24] Xinghua Qu, Yew Soon Ong, Abhishek Gupta, Pengfei Wei, Zhu Sun, Zejun Ma:
Importance Prioritized Policy Distillation. KDD 2022: 1420-1429
- [c23] Xinghua Qu, Pengfei Wei, Mingyong Gao, Zhu Sun, Yew Soon Ong, Zejun Ma:
Synthesising Audio Adversarial Examples for Automatic Speech Recognition. KDD 2022: 1430-1440
- [c22] Xiaoheng Sun, Xia Liang, Qiqi He, Bilei Zhu, Zejun Ma:
GIO: A Timbre-informed Approach for Pitch Tracking in Highly Noisy Environments. ICMR 2022: 480-488
- [c21] Yu Lin, Zhecheng An, Peihao Wu, Zejun Ma:
Improving Contextual Representation with Gloss Regularized Pre-training. NAACL-HLT (Findings) 2022: 907-920
- 2021
- [c20] Yongwei Gao, Xingjian Du, Bilei Zhu, Xiaoheng Sun, Wei Li, Zejun Ma:
An Hrnet-Blstm Model With Two-Stage Training For Singing Melody Extraction. ICASSP 2021: 56-60
- [c19] Xingjian Du, Bilei Zhu, Qiuqiang Kong, Zejun Ma:
Singing Melody Extraction from Polyphonic Music based on Spectral Correlation Modeling. ICASSP 2021: 241-245
- [c18] Xingjian Du, Zhesong Yu, Bilei Zhu, Xiaoou Chen, Zejun Ma:
Bytecover: Cover Song Identification Via Multi-Loss Training. ICASSP 2021: 551-555
- [c17] Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren:
Rule-Embedded Network for Audio-Visual Voice Activity Detection in Live Musical Video Streams. ICASSP 2021: 4165-4169
- [c16] Yao Tian, Haitao Yao, Meng Cai, Yaming Liu, Zejun Ma:
Improving RNN Transducer Modeling for Small-Footprint Keyword Spotting. ICASSP 2021: 5624-5628
- [c15] Junjie Pan, Lin Wu, Xiang Yin, Pengfei Wu, Chenchang Xu, Zejun Ma:
A Chapter-Wise Understanding System for Text-To-Speech in Chinese Novels. ICASSP 2021: 6069-6073
- [c14] Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma:
PPG-Based Singing Voice Conversion with Adversarial Representation Learning. ICASSP 2021: 7073-7077
- [c13] Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren:
Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams. Interspeech 2021: 321-325
- [c12] Lu Huang, Jingyu Sun, Yufeng Tang, Junfeng Hou, Jinkun Chen, Jun Zhang, Zejun Ma:
HMM-Free Encoder Pre-Training for Streaming RNN Transducer. Interspeech 2021: 1797-1801
- [c11] Xianzhao Chen, Hao Ni, Yi He, Kang Wang, Zejun Ma, Zongxia Xie:
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition. Interspeech 2021: 2571-2575
- [c10] Yuxiang Zou, Shichao Liu, Xiang Yin, Haopeng Lin, Chunfeng Wang, Haoyu Zhang, Zejun Ma:
Fine-Grained Prosody Modeling in Neural Speech Synthesis Using ToBI Representation. Interspeech 2021: 3146-3150
- [c9] Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma:
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. ISCSLP 2021: 1-5
- [c8] Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma:
Towards Realistic Visual Dubbing with Heterogeneous Sources. ACM Multimedia 2021: 1739-1747
- 2020
- [c7] Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang:
A Unified Sequence-to-Sequence Front-End Model for Mandarin Text-to-Speech Synthesis. ICASSP 2020: 6689-6693
- [c6] Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma:
A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin. ICASSP 2020: 6694-6698
- 2019
- [c5] Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie:
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. ASRU 2019: 184-191
- 2012
- [c4] Zejun Ma, Xiaorui Wang, Bo Xu:
Unsupervised training of subspace gaussian mixture models for conversational telephone speech recognition. ICASSP 2012: 4829-4832
- 2011
- [c3] Zejun Ma, Xiaorui Wang, Bo Xu:
An Empirical Study of Multilingual Spoken Term Detection. INTERSPEECH 2011: 1921-1924
- [c2] Zejun Ma, Xiaorui Wang, Bo Xu:
Fusing Multiple Confidence Measures for Chinese Spoken Term Detection. INTERSPEECH 2011: 1925-1928
- 2010
- [c1] Zejun Ma, Li Song, Cheng Zhi, Libo Yang:
Distributed link-aware rate allocation for R-D optimal multiple video streaming over wireless networks. WCSP 2010: 1-6
Informal and Other Publications
- 2024
- [i67] Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao:
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis. CoRR abs/2401.08503 (2024)
- [i66] Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma:
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning. CoRR abs/2402.07485 (2024)
- [i65] Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma:
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR. CoRR abs/2403.02010 (2024)
- [i64] Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
Can Large Language Models Understand Spatial Audio? CoRR abs/2406.07914 (2024)
- [i63] Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang:
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models. CoRR abs/2406.15704 (2024)
- [i62] Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li:
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models. CoRR abs/2407.07895 (2024)
- 2023
- [i61] Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee:
Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring. CoRR abs/2302.10444 (2023)
- [i60] Chunfeng Wang, Peisong Huang, Yuxiang Zou, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma:
LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion. CoRR abs/2303.01086 (2023)
- [i59] Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma:
ByteCover3: Accurate Cover Song Identification on Short Queries. CoRR abs/2303.11692 (2023)
- [i58] Xinnian Liang, Bing Wang, Hui Huang, Shuangzhi Wu, Peihao Wu, Lu Lu, Zejun Ma, Zhoujun Li:
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System. CoRR abs/2304.13343 (2023)
- [i57] Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao:
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation. CoRR abs/2305.00787 (2023)
- [i56] Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma:
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring. CoRR abs/2305.11438 (2023)
- [i55] Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma:
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training. CoRR abs/2305.17499 (2023)
- [i54] Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma:
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. CoRR abs/2305.17732 (2023)
- [i53] Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao:
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation. CoRR abs/2305.18474 (2023)
- [i52] Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang:
PolyVoice: Language Models for Speech to Speech Translation. CoRR abs/2306.02982 (2023)
- [i51] Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao:
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis. CoRR abs/2306.03504 (2023)
- [i50] Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao:
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias. CoRR abs/2306.03509 (2023)
- [i49] Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer. CoRR abs/2306.04076 (2023)
- [i48] Zhiyun Fan, Linhao Dong, Chen Shen, Zhenlin Liang, Jun Zhang, Lu Lu, Zejun Ma:
Language-specific Acoustic Boundary Learning for Mandarin-English Code-switching Speech Recognition. CoRR abs/2306.05279 (2023)
- [i47] Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma:
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition. CoRR abs/2306.07949 (2023)
- [i46] Xinghua Qu, Hongyang Liu, Zhu Sun, Xiang Yin, Yew Soon Ong, Lu Lu, Zejun Ma:
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects. CoRR abs/2306.08219 (2023)
- [i45] Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma:
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech. CoRR abs/2306.15304 (2023)
- [i44] Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Chen Zhang, Zhenhui Ye, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao:
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts. CoRR abs/2307.07218 (2023)
- [i43] Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Connecting Speech Encoder and Large Language Model for ASR. CoRR abs/2309.13963 (2023)
- [i42] Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models. CoRR abs/2310.05863 (2023)
- [i41] Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang:
SALMONN: Towards Generic Hearing Abilities for Large Language Models. CoRR abs/2310.13289 (2023)
- [i40] Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer. CoRR abs/2311.08966 (2023)
- 2022
- [i39] Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma:
Towards Realistic Visual Dubbing with Heterogeneous Sources. CoRR abs/2201.06260 (2022)
- [i38] Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang:
Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR. CoRR abs/2201.11627 (2022)
- [i37] Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu:
Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection. CoRR abs/2201.12806 (2022)
- [i36] Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov:
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection. CoRR abs/2202.00874 (2022)
- [i35] Chen Shen, Yi Liu, Wenzhi Fan, Bin Wang, Shixue Wen, Yao Tian, Jun Zhang, Jingsheng Yang, Zejun Ma:
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge. CoRR abs/2202.04261 (2022)
- [i34] Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Yao Tian, Zejun Ma, Jie Luo, Xianglong Liu:
BiFSMN: Binary Neural Network for Keyword Spotting. CoRR abs/2202.06483 (2022)
- [i33] Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang:
S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification. CoRR abs/2202.10139 (2022)
- [i32] Kaiqi Fu, Shaojun Gao, Kai Wang, Wei Li, Xiaohai Tian, Zejun Ma:
Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information. CoRR abs/2203.01826 (2022)
- [i31] Yizhou Lu, Mingkun Huang, Xinghua Qu, Pengfei Wei, Zejun Ma:
Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks. CoRR abs/2203.04583 (2022)
- [i30] Yu Lin, Zhecheng An, Peihao Wu, Zejun Ma:
Improving Contextual Representation with Gloss Regularized Pre-training. CoRR abs/2205.06603 (2022)
- [i29] Wudi Bao, Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma:
A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation. CoRR abs/2206.04922 (2022)
- [i28] Zhiyun Fan, Linhao Dong, Meng Cai, Zejun Ma, Bo Xu:
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire. CoRR abs/2206.13110 (2022)
- [i27] Pengfei Wei, Lingdong Kong, Xinghua Qu, Xiang Yin, Zhiqiang Xu, Jing Jiang, Zejun Ma:
Unsupervised Video Domain Adaptation: A Disentanglement Perspective. CoRR abs/2208.07365 (2022)
- [i26] Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Yist Y. Lin, Tao Han, Tze Yuan Chong, Yi He, Zejun Ma:
Improving short-video speech recognition using random utterance concatenation. CoRR abs/2210.15876 (2022)
- [i25] Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma:
Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation. CoRR abs/2211.00968 (2022)
- [i24] Huidong Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Ke Chen, Junbin Gao:
Graph Contrastive Learning with Implicit Augmentations. CoRR abs/2211.03710 (2022)
- [i23] Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Zejun Ma, Jiakai Wang, Jie Luo, Xianglong Liu:
BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance. CoRR abs/2211.06987 (2022)
- [i22] Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu:
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire. CoRR abs/2211.09381 (2022)
- [i21] Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma:
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features. CoRR abs/2212.05805 (2022)
- 2021
- [i20] Lu Huang, Jingyu Sun, Yufeng Tang, Junfeng Hou, Jinkun Chen, Jun Zhang, Zejun Ma:
HMM-Free Encoder Pre-Training for Streaming RNN Transducer. CoRR abs/2104.10764 (2021)
- [i19] Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren:
Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams. CoRR abs/2106.11411 (2021)
- [i18] Shaoshi Ling, Chen Shen, Meng Cai, Zejun Ma:
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask. CoRR abs/2110.04056 (2021)
- [i17] Pengfei Wu, Junjie Pan, Chenchang Xu, Junhui Zhang, Lin Wu, Xiang Yin, Zejun Ma:
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech. CoRR abs/2110.04153 (2021)
- [i16] Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma:
Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding. CoRR abs/2110.04754 (2021)
- [i15] Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma:
Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation. CoRR abs/2110.11894 (2021)
- [i14] Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov:
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data. CoRR abs/2112.07891 (2021)
- 2020
- [i13] Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma:
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. CoRR abs/2004.11012 (2020)
- [i12] Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma:
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech. CoRR abs/2005.09271 (2020)
- [i11] Zhesong Yu, Xingjian Du, Bilei Zhu, Zejun Ma:
Contrastive Unsupervised Learning for Audio Fingerprinting. CoRR abs/2010.13540 (2020)
- [i10] Xingjian Du, Zhesong Yu, Bilei Zhu, Xiaoou Chen, Zejun Ma:
ByteCover: Cover Song Identification via Multi-Loss Training. CoRR abs/2010.14022 (2020)
- [i9] Yuanbo Hou, Yi Deng, Bilei Zhu, Zejun Ma, Dick Botteldooren:
Rule-embedded network for audio-visual voice activity detection in live musical video streams. CoRR abs/2010.14168 (2020)
- [i8] Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma:
PPG-based singing voice conversion with adversarial representation learning. CoRR abs/2010.14804 (2020)
- [i7] Mingkun Huang, Meng Cai, Jun Zhang, Yang Zhang, Yongbin You, Yi He, Zejun Ma:
Dynamic latency speech recognition with asynchronous revision. CoRR abs/2011.01570 (2020)
- [i6] Mingkun Huang, Jun Zhang, Meng Cai, Yang Zhang, Jiali Yao, Yongbin You, Yi He, Zejun Ma:
Improving RNN transducer with normalized jointer network. CoRR abs/2011.01576 (2020)
- 2019
- [i5] Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang:
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis. CoRR abs/1911.04111 (2019)
- [i4] Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma:
A hybrid text normalization system using multi-head self-attention for mandarin. CoRR abs/1911.04128 (2019)
- 2017
- [i3] Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei:
Exponential Moving Average Model in Parallel Speech Recognition Training. CoRR abs/1703.01024 (2017)
- [i2] Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang:
Deep LSTM for Large Vocabulary Continuous Speech Recognition. CoRR abs/1703.07090 (2017)
- [i1] Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei:
Frame Stacking and Retaining for Recurrent Neural Network Acoustic Model. CoRR abs/1705.05992 (2017)
last updated on 2024-10-07 22:05 CEST by the dblp team
all metadata released as open data under CC0 1.0 license