Frank K. Soong
Person information
- affiliation: Microsoft Research Asia, Beijing, China
- affiliation: Chinese University of Hong Kong (CUHK), Department of Systems Engineering and Engineering Management, Hong Kong
- affiliation: Bell Labs Research, Murray Hill, NJ, USA
- affiliation (PhD): Stanford University, Department of Electrical Engineering, CA, USA
2020 – today
- 2024
- [j49]Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Sheng Zhao, Tao Qin, Frank K. Soong, Tie-Yan Liu:
NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality. IEEE Trans. Pattern Anal. Mach. Intell. 46(6): 4234-4245 (2024) - 2023
- [j48]Haohan Guo, Fenglong Xie, Xixin Wu, Frank K. Soong, Helen Meng:
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1811-1824 (2023) - [c254]Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee:
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading. INTERSPEECH 2023: 4883-4887 - [i25]Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee:
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading. CoRR abs/2307.00782 (2023) - 2022
- [j47]Xiaochun An, Frank K. Soong, Lei Xie:
Disentangling Style and Speaker Attributes for TTS Style Transfer. IEEE ACM Trans. Audio Speech Lang. Process. 30: 646-658 (2022) - [j46]Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie:
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2854-2864 (2022) - [c253]Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien:
A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation. ICASSP 2022: 6807-6811 - [c252]Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu:
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings. ICASSP 2022: 6827-6831 - [c251]Yujia Xiao, Xi Wang, Lei He, Frank K. Soong:
Improving Fastspeech TTS with Efficient Self-Attention and Compact Feed-Forward Network. ICASSP 2022: 7472-7476 - [c250]Mutian He, Jingzhou Yang, Lei He, Frank K. Soong:
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge. INTERSPEECH 2022: 441-445 - [c249]Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng:
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. INTERSPEECH 2022: 1611-1615 - [i24]Xiaochun An, Frank K. Soong, Lei Xie:
Disentangling Style and Speaker Attributes for TTS Style Transfer. CoRR abs/2201.09472 (2022) - [i23]Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank K. Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu:
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality. CoRR abs/2205.04421 (2022) - [i22]Bin Su, Shaoguang Mao, Frank K. Soong, Zhiyong Wu:
Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives. CoRR abs/2207.02454 (2022) - [i21]Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie:
ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS. CoRR abs/2209.06484 (2022) - [i20]Haohan Guo, Feng-Long Xie, Frank K. Soong, Xixin Wu, Helen Meng:
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS. CoRR abs/2209.10887 (2022) - 2021
- [j45]Liumeng Xue, Shifeng Pan, Lei He, Lei Xie, Frank K. Soong:
Cycle consistent network for end-to-end style transfer TTS training. Neural Networks 140: 223-236 (2021) - [j44]Xiaochun An, Frank K. Soong, Shan Yang, Lei Xie:
Effective and direct control of neural TTS prosody by removing interactions between different attributes. Neural Networks 143: 250-260 (2021) - [c248]Yichong Leng, Xu Tan, Sheng Zhao, Frank K. Soong, Xiang-Yang Li, Tao Qin:
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network. ICASSP 2021: 391-395 - [c247]Feng-Long Xie, Xinhui Li, Wen-Chao Su, Li Lu, Frank K. Soong:
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time. ICASSP 2021: 5704-5708 - [c246]Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He:
Speech Bert Embedding for Improving Prosody in Neural TTS. ICASSP 2021: 6563-6567 - [c245]Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu:
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples. ICASSP 2021: 7748-7752 - [c244]Xiaochun An, Frank K. Soong, Lei Xie:
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS. Interspeech 2021: 4688-4692 - [c243]Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agents. SLT 2021: 403-409 - [i19]Yichong Leng, Xu Tan, Sheng Zhao, Frank K. Soong, Xiangyang Li, Tao Qin:
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network. CoRR abs/2103.00110 (2021) - [i18]Mutian He, Jingzhou Yang, Lei He, Frank K. Soong:
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis. CoRR abs/2103.03541 (2021) - [i17]Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He:
Speech BERT Embedding For Improving Prosody in Neural TTS. CoRR abs/2106.04312 (2021) - [i16]Xiaochun An, Frank K. Soong, Lei Xie:
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS. CoRR abs/2106.10003 (2021) - [i15]Xu Tan, Tao Qin, Frank K. Soong, Tie-Yan Liu:
A Survey on Neural Speech Synthesis. CoRR abs/2106.15561 (2021) - [i14]Wenxuan Ye, Shaoguang Mao, Frank K. Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu:
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings. CoRR abs/2110.07274 (2021) - [i13]Mutian He, Jingzhou Yang, Lei He, Frank K. Soong:
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge. CoRR abs/2110.09698 (2021) - 2020
- [j43]Yao Qian, Rutuja Ubale, Patrick L. Lange, Keelan Evanini, Vikram Ramanarayanan, Frank K. Soong:
Spoken Language Understanding of Human-Machine Conversations for Language Learning Applications. J. Signal Process. Syst. 92(8): 805-817 (2020) - [c242]Min-Jae Hwang, Frank K. Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang:
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis. APSIPA 2020: 810-814 - [c241]Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong:
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS. ICASSP 2020: 6704-6708 - [c240]Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank K. Soong, Hong-Goo Kang:
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network. ICASSP 2020: 7219-7223 - [c239]Feng-Long Xie, Xinhui Li, Bo Liu, Yibin Zheng, Li Meng, Li Lu, Frank K. Soong:
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data. ICASSP 2020: 7754-7758 - [c238]Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li:
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music. INTERSPEECH 2020: 1236-1240 - [c237]Yang Cui, Xi Wang, Lei He, Frank K. Soong:
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis. INTERSPEECH 2020: 3555-3559 - [c236]Liping Chen, Kong-Aik Lee, Lei He, Frank K. Soong:
On Early-stop Clustering for Speaker Diarization. Odyssey 2020: 110-116 - [i12]Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agent. CoRR abs/2005.10438 (2020) - [i11]Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li:
Transfer Learning for Improving Singing-voice Detection in Polyphonic Instrumental Music. CoRR abs/2008.04658 (2020) - [i10]Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu:
Improving pronunciation assessment via ordinal regression with anchored reference samples. CoRR abs/2010.13339 (2020) - [i9]Xi Wang, Huaiping Ming, Lei He, Frank K. Soong:
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis. CoRR abs/2011.08480 (2020)
2010 – 2019
- 2019
- [j42]Feng-Long Xie, Frank K. Soong, Haifeng Li:
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data. Speech Commun. 106: 57-67 (2019) - [c235]Ke Wang, Frank K. Soong, Lei Xie:
A Pitch-aware Approach to Single-channel Speech Separation. ICASSP 2019: 296-300 - [c234]Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong:
NN-based Ordinal Regression for Assessing Fluency of ESL Speech. ICASSP 2019: 7420-7424 - [c233]Jingyong Hou, Pengcheng Guo, Sining Sun, Frank K. Soong, Wenping Hu, Lei Xie:
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech. ICASSP 2019: 8122-8126 - [c232]Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao:
Forward-Backward Decoding for Regularizing End-to-End TTS. INTERSPEECH 2019: 1283-1287 - [c231]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-Based End-to-End TTS Training Algorithm. INTERSPEECH 2019: 1288-1292 - [c230]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. INTERSPEECH 2019: 4460-4464 - [i8]Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong:
Feature reinforcement with word embedding and parsing information in neural TTS. CoRR abs/1901.00707 (2019) - [i7]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. CoRR abs/1904.04764 (2019) - [i6]Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-based End-to-End TTS Training Algorithm. CoRR abs/1904.04775 (2019) - [i5]Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao:
Forward-Backward Decoding for Regularizing End-to-End TTS. CoRR abs/1907.09006 (2019) - 2018
- [c229]Liping Chen, Yong Zhao, Shi-Xiong Zhang, Jie Li, Guoli Ye, Frank K. Soong:
Exploring Sequential Characteristics in Speaker Bottleneck Feature for Text-Dependent Speaker Verification. ICASSP 2018: 5364-5368 - [c228]Yujia Xiao, Frank K. Soong, Wenping Hu:
Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment. INTERSPEECH 2018: 1631-1635 - [c227]Yang Cui, Xi Wang, Lei He, Frank K. Soong:
A New Glottal Neural Vocoder for Speech Synthesis. INTERSPEECH 2018: 2017-2021 - [c226]Feng-Long Xie, Frank K. Soong, Xi Wang, Lei He, Haifeng Li:
Frame Selection in SI-DNN Phonetic Space with WaveNet Vocoder for Voice Conversion without Parallel Training Data. ISCSLP 2018: 56-60 - [c225]Jingyong Hou, Wenping Hu, Frank K. Soong, Lei Xie:
A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech. ISCSLP 2018: 111-115 - [c224]Yao Qian, Rutuja Ubale, Patrick L. Lange, Keelan Evanini, Frank K. Soong:
From Speech Signals to Semantics - Tagging Performance at Acoustic, Phonetic and Word Levels. ISCSLP 2018: 280-284 - [i4]Min-Jae Hwang, Frank K. Soong, Feng-Long Xie, Xi Wang, Hong-Goo Kang:
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis. CoRR abs/1811.11913 (2018) - [i3]Yan Deng, Lei He, Frank K. Soong:
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice. CoRR abs/1812.05253 (2018) - 2017
- [j41]Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems. IEEE ACM Trans. Audio Speech Lang. Process. 25(11): 2152-2161 (2017) - [c223]Yao Qian, Keelan Evanini, Patrick L. Lange, Robert A. Pugh, Rutuja Ubale, Frank K. Soong:
Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora. ASRU 2017: 606-613 - [c222]Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems. ASRU 2017: 671-676 - [c221]Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng:
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances. INTERSPEECH 2017: 1507-1511 - [c220]Yujia Xiao, Frank K. Soong:
Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. INTERSPEECH 2017: 1755-1759 - [c219]Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong:
Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech. INTERSPEECH 2017: 2586-2590 - 2016
- [j40]Bo Fan, Lei Xie, Shan Yang, Lijuan Wang, Frank K. Soong:
A deep bidirectional LSTM approach for video-realistic talking head. Multim. Tools Appl. 75(9): 5287-5309 (2016) - [j39]Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai:
Modeling F0 trajectories in hierarchically structured deep neural networks. Speech Commun. 76: 82-92 (2016) - [j38]Linlin Wang, Jun Wang, Lantian Li, Thomas Fang Zheng, Frank K. Soong:
Improving speaker verification performance against long-term speaker variability. Speech Commun. 79: 14-29 (2016) - [j37]Xiaojun Qian, Helen M. Meng, Frank K. Soong:
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training. IEEE ACM Trans. Audio Speech Lang. Process. 24(6): 1020-1028 (2016) - [c218]Wenping Hu, Frank K. Soong:
KL-divergence based mispronunciation detection via DNN and decision tree in the phonetic space. APSIPA 2016: 1-6 - [c217]Yuchen Fan, Yao Qian, Frank K. Soong, Lei He:
Unsupervised speaker adaptation for DNN-based TTS synthesis. ICASSP 2016: 5135-5139 - [c216]Feng-Long Xie, Frank K. Soong, Haifeng Li:
A KL divergence and DNN approach to cross-lingual TTS. ICASSP 2016: 5515-5519 - [c215]Yuchen Fan, Yao Qian, Frank K. Soong, Lei He:
Speaker and language factorization in DNN-based TTS synthesis. ICASSP 2016: 5540-5544 - [c214]Feng-Long Xie, Frank K. Soong, Haifeng Li:
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences. INTERSPEECH 2016: 287-291 - [c213]Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis. INTERSPEECH 2016: 2253-2257 - [c212]Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao:
Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network. HLT-NAACL 2016: 527-533 - 2015
- [j36]Lijuan Wang, Frank K. Soong:
HMM trajectory-guided sample selection for photo-realistic talking head. Multim. Tools Appl. 74(22): 9849-9869 (2015) - [j35]Wenping Hu, Yao Qian, Frank K. Soong, Yong Wang:
Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers. Speech Commun. 67: 154-166 (2015) - [c211]Xiaojun Qian, Helen M. Meng, Frank K. Soong:
A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training. APSIPA 2015: 384-387 - [c210]Frank K. Soong, Lijuan Wang:
From text-to-speech (TTS) to talking head - a machine learning approach to A/V speech modeling and rendering. AVSP 2015 - [c209]Yuchen Fan, Yao Qian, Frank K. Soong, Lei He:
Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis. ICASSP 2015: 4475-4479 - [c208]Hao Wang, Frank K. Soong, Helen Meng:
A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS. ICASSP 2015: 4874-4878 - [c207]Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao:
Word embedding for recurrent neural network based TTS synthesis. ICASSP 2015: 4879-4883 - [c206]Bo Fan, Lijuan Wang, Frank K. Soong, Lei Xie:
Photo-real talking head with deep bidirectional LSTM. ICASSP 2015: 4884-4888 - [c205]Yuchen Fan, Yao Qian, Frank K. Soong, Lei He:
Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis. INTERSPEECH 2015: 864-868 - [c204]Wenping Hu, Yao Qian, Frank K. Soong:
An improved DNN-based approach to mispronunciation detection and diagnosis of L2 learners' speech. SLaTE 2015: 71-76 - [i2]Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao:
Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network. CoRR abs/1510.06168 (2015) - [i1]Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao:
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding. CoRR abs/1511.00215 (2015) - 2014
- [c203]Jun Wang, Dong Wang, Ziwei Zhu, Thomas Fang Zheng, Frank K. Soong:
Discriminative scoring for speaker recognition based on I-vectors. APSIPA 2014: 1-5 - [c202]Wenping Hu, Yao Qian, Frank K. Soong:
A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training. ICASSP 2014: 3206-3210 - [c201]Yao Qian, Yuchen Fan, Wenping Hu, Frank K. Soong:
On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis. ICASSP 2014: 3829-3833 - [c200]Hyunson Seo, Hong-Goo Kang, Frank K. Soong:
A maximum a Posterior-based reconstruction approach to speech bandwidth expansion in noise. ICASSP 2014: 6087-6091 - [c199]Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. Soong:
TTS synthesis with bidirectional LSTM based recurrent neural networks. INTERSPEECH 2014: 1964-1968 - [c198]Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai:
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree. INTERSPEECH 2014: 2273-2277 - [c197]Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. Soong, Haifeng Li:
Sequence error (SE) minimization training of neural network for voice conversion. INTERSPEECH 2014: 2283-2287 - [c196]Feng-Long Xie, Yao Qian, Frank K. Soong, Haifeng Li:
Pitch transformation in neural network based voice conversion. ISCSLP 2014: 197-200 - [c195]Wenping Hu, Yao Qian, Frank K. Soong:
A new Neural Network based logistic regression classifier for improving mispronunciation detection of L2 language learners. ISCSLP 2014: 245-249 - 2013
- [j34]Yao Qian, Frank K. Soong, Zhi-Jie Yan:
A Unified Trajectory Tiling Approach to High Quality Speech Rendering. IEEE Trans. Speech Audio Process. 21(2): 280-290 (2013) - [c194]Yao Qian, Frank K. Soong, Xiaobo Zhou, Yundi Qian, Xiaotian Zhang:
A fast table lookup based, statistical model driven non-uniform unit selection TTS. ICASSP 2013: 7957-7961 - [c193]JeeSok Lee, Frank K. Soong, Hong-Goo Kang:
A source-filter based adaptive harmonic model and its application to speech prosody modification. INTERSPEECH 2013: 39-43 - [c192]Wenping Hu, Yao Qian, Frank K. Soong:
A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL). INTERSPEECH 2013: 1886-1890 - [c191]Xinjian Zhang, Lijuan Wang, Gang Li, Frank Seide, Frank K. Soong:
A new language independent, photo-realistic talking head driven by voice only. INTERSPEECH 2013: 2743-2747 - [c190]Chaoyang Wang, Lijuan Wang, Yasuyuki Matsushita, Bojun Huang, Magnetro Chen, Frank K. Soong:
Binocular photometric stereo acquisition and reconstruction for 3d talking head applications. INTERSPEECH 2013: 2748-2752 - 2012
- [j33]Lijuan Wang, Yao Qian, Matthew R. Scott, Gang Chen, Frank K. Soong:
Computer-Assisted Audiovisual Language Learning. Computer 45(6): 38-47 (2012) - [c189]Lijuan Wang, Frank K. Soong:
High quality lips animation with speech and captured facial action unit as A/V input. APSIPA 2012: 1-4 - [c188]Yi-Jian Wu, Frank K. Soong:
Modeling pitch trajectory by hierarchical HMM with minimum generation error training. ICASSP 2012: 4017-4020 - [c187]Wei Han, Lijuan Wang, Frank K. Soong, Bo Yuan:
Improved minimum converted trajectory error training for real-time speech-to-lips conversion. ICASSP 2012: 4513-4516 - [c186]Lijuan Wang, Wei Han, Frank K. Soong:
High quality lip-sync animation for 3D photo-realistic talking head. ICASSP 2012: 4529-4532 - [c185]Dongwen Ying, Xugang Lu, Junfeng Li, Yonghong Yan, Jianwu Dang, Frank K. Soong:
Noise estimation using a constrained sequential HMM in log-spectral domain. ICASSP 2012: 4553-4556 - [c184]Linfang Wang, Lijuan Wang, Yan Teng, Zhe Geng, Frank K. Soong:
Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability. INTERSPEECH 2012: 627-630 - [c183]Xiaojun Qian, Helen M. Meng, Frank K. Soong:
The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training. INTERSPEECH 2012: 775-778 - [c182]Ji He, Yao Qian, Frank K. Soong, Sheng Zhao:
Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS. INTERSPEECH 2012: 963-966 - [c181]Meng Yu, Frank K. Soong:
Constrained Multichannel Speech Dereverberation. INTERSPEECH 2012: 1938-1941 - [c180]Feng-Long Xie, Yi-Jian Wu, Frank K. Soong:
Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS. ISCSLP 2012: 60-63 - [c179]Yao Qian, Frank K. Soong:
A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation. ISCSLP 2012: 165-169 - [c178]Xiaotian Zhang, Yao Qian, Hai Zhao, Frank K. Soong:
Break index labeling of mandarin text via syntactic-to-prosodic tree mapping. ISCSLP 2012: 256-260 - [c177]Wenping Hu, Yao Qian, Frank K. Soong:
Pitch accent detection and prediction with DCT features and CRF model. ISCSLP 2012: 266-270 - [c176]Darren Edge, Kai-Yin Cheng, Michael Whitney, Yao Qian, Zhijie Yan, Frank K. Soong:
Tip tap tones: mobile microtraining of mandarin sounds. Mobile HCI (Companion) 2012: 215-216 - [c175]Darren Edge, Kai-Yin Cheng, Michael Whitney, Yao Qian, Zhijie Yan, Frank K. Soong:
Tip tap tones: mobile microtraining of mandarin sounds. Mobile HCI 2012: 427-430 - 2011
- [j32]Yao Qian, Zhizheng Wu, Boyang Gao, Frank K. Soong:
Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units. IEEE Trans. Speech Audio Process. 19(6): 1702-1710 (2011) - [j31]Dongwen Ying, Yonghong Yan, Jianwu Dang, Frank K. Soong:
Voice Activity Detection Based on an Unsupervised Learning Framework. IEEE ACM Trans. Audio Speech Lang. Process. 19(8): 2624-2633 (2011) - [c174]King Keung Wu, Lijuan Wang, Frank K. Soong, Yeung Yam:
A Sparse and Low-rank approach to efficient face alignment for photo-real talking head synthesis. ICASSP 2011: 1397-1400 - [c173]Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model. ICASSP 2011: 4520-4523 - [c172]Aki Kunikoshi, Yao Qian, Frank K. Soong, Nobuaki Minematsu:
Improved F0 modeling and generation in voice conversion. ICASSP 2011: 4568-4571 - [c171]Lijuan Wang, Yi-Jian Wu, Xiaodan Zhuang, Frank K. Soong:
Synthesizing visual speech trajectory with minimum generation error. ICASSP 2011: 4580-4583 - [c170]Yao Qian, Ji Xu, Frank K. Soong:
A frame mapping based HMM approach to cross-lingual voice transformation. ICASSP 2011: 5120-5123 - [c169]Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model. INTERSPEECH 2011: 373-376 - [c168]Xiaojun Qian, Helen M. Meng, Frank K. Soong:
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT). INTERSPEECH 2011: 865-868 - [c167]Bo Peng, Yao Qian, Frank K. Soong, Bo Zhang:
A New Phonetic Candidate Generator for Improving Search Query Efficiency. INTERSPEECH 2011: 1117-1120 - [c166]Lijuan Wang, Wei Han, Frank K. Soong, Qiang Huo:
Text Driven 3D Photo-Realistic Talking Head. INTERSPEECH 2011: 3307-3308 - 2010
- [c165]Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Guoliang Zhang, Lijuan Wang:
An HMM Trajectory Tiling (HTT) Approach to High Quality TTS - Microsoft Entry to Blizzard Challenge 2010. Blizzard Challenge 2010 - [c164]Yu Zhang, Zhi-Jie Yan, Frank K. Soong:
Cross-validation based decision tree clustering for HMM-based TTS. ICASSP 2010: 4602-4605 - [c163]Qingqing Zhang, Frank K. Soong, Yao Qian, Zhijie Yan, Jielin Pan, Yonghong Yan:
Improved modeling for F0 generation and V/U decision in HMM-based TTS. ICASSP 2010: 4606-4609 - [c162]Zhi-Jie Yan, Yao Qian, Frank K. Soong:
Rich-context Unit Selection (RUS) approach to high quality TTS. ICASSP 2010: 4798-4801 - [c161]Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong:
An HMM trajectory tiling (HTT) approach to high quality TTS. INTERSPEECH 2010: 422-425 - [c160]Yining Chen, Zhi-Jie Yan, Frank K. Soong:
A perceptual study of acceleration parameters in HMM-based TTS. INTERSPEECH 2010: 426-429 - [c159]Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong:
Synthesizing photo-real talking head via trajectory-guided sample selection. INTERSPEECH 2010: 446-449 - [c158]Xiaojun Qian, Frank K. Soong, Helen M. Meng:
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT). INTERSPEECH 2010: 757-760 - [c157]Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang:
Formant-based frequency warping for improving speaker adaptation in HMM TTS. INTERSPEECH 2010: 817-820 - [c156]Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion. INTERSPEECH 2010: 1736-1739 - [c155]Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:
A hierarchical F0 modeling method for HMM-based speech synthesis. INTERSPEECH 2010: 2170-2173 - [c154]Xiaojun Qian, Helen M. Meng, Frank K. Soong:
Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT). ISCSLP 2010: 84-88 - [c153]Lijuan Wang, Wei Han, Xiaojun Qian, Frank K. Soong:
Rendering a personalized photo-real talking head from short video footage. ISCSLP 2010: 129-134 - [c152]Yao Qian, Zhizheng Wu, Xuezhe Ma, Frank K. Soong:
Automatic prosody prediction and detection with Conditional Random Field (CRF) models. ISCSLP 2010: 135-138 - [c151]Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong:
Photo-real lips synthesis with trajectory-guided sample selection. SSW 2010: 217-222
2000 – 2009
- 2009
- [j30]Yao Qian, Frank K. Soong:
A Multi-Space Distribution (MSD) and two-stream tone modeling approach to Mandarin speech recognition. Speech Commun. 51(12): 1169-1179 (2009) - [j29]Peng Liu, Frank K. Soong:
A Quadratic Optimization Approach to Discriminative Training of CDHMMs. IEEE Signal Process. Lett. 16(3): 149-152 (2009) - [j28]Peng Liu, Frank K. Soong:
Graph-Based Partial Hypothesis Fusion for Pen-Aided Speech Input. IEEE Trans. Speech Audio Process. 17(3): 478-485 (2009) - [j27]Yao Qian, Hui Liang, Frank K. Soong:
A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin-English) TTS. IEEE Trans. Speech Audio Process. 17(6): 1231-1239 (2009) - [c150]Lijuan Wang, Wei Han, Xiaojun Qian, Frank K. Soong:
HMM-based motion trajectory generation for speech animation synthesis. AVSP 2009: 170 - [c149]Yao Qian, Zhizheng Wu, Frank K. Soong:
Improved prosody generation by maximizing joint likelihood of state and longer units. ICASSP 2009: 3781-3784 - [c148]Yu Zhang, Peng Liu, Jen-Tzung Chien, Frank K. Soong:
An evidence framework for Bayesian learning of continuous-density hidden Markov models. ICASSP 2009: 3857-3860 - [c147]Yining Chen, Yang Jiao, Yao Qian, Frank K. Soong:
State mapping for cross-language speaker adaptation in TTS. ICASSP 2009: 4273-4276 - [c146]Yuqiang Chen, Chao Huang, Frank K. Soong:
Improving mispronunciation detection using machine learning. ICASSP 2009: 4865-4868 - [c145]Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu:
A minimum v/u error approach to F0 generation in HMM-based TTS. INTERSPEECH 2009: 408-411 - [c144]Siu Wa Lee, Frank K. Soong, Tan Lee:
Model-based speech separation: identifying transcription using orthogonality. INTERSPEECH 2009: 1343-1346 - [c143]Zhi-Jie Yan, Yao Qian, Frank K. Soong:
Rich context modeling for high quality HMM-based TTS. INTERSPEECH 2009: 1755-1758 - [c142]Lijuan Wang, Shenghao Qin, Frank K. Soong:
Auto-checking speech transcriptions by multiple template constrained posterior. INTERSPEECH 2009: 1831-1834 - 2008
- [j26]Yao Qian, Frank K. Soong, Tan Lee:
Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR. Comput. Speech Lang. 22(4): 360-373 (2008) - [j25]Peng Liu, Cong Liu, Hui Jiang, Frank K. Soong, Ren-Hua Wang:
A Constrained Line Search Optimization Method for Discriminative Training of HMMs. IEEE Trans. Speech Audio Process. 16(5): 900-909 (2008) - [j24]Jia-Li You, Yining Chen, Min Chu, Frank K. Soong, Jin-Lin Wang:
Identifying Language Origin of Named Entity With Multiple Information Sources. IEEE Trans. Speech Audio Process. 16(6): 1077-1086 (2008) - [c141]Zhen Xuan Luo, Yu Shi, Frank K. Soong:
Symbol graph based discriminative training and rescoring for improved math symbol recognition. ICASSP 2008: 1953-1956 - [c140]Peng Liu, Lei Ma, Frank K. Soong:
Prefix tree based auto-completion for convenient bi-modal chinese character input. ICASSP 2008: 4465-4468 - [c139]Hui Liang, Yao Qian, Frank K. Soong, Gongshen Liu:
A cross-language state mapping approach to bilingual (Mandarin-English) TTS. ICASSP 2008: 4641-4644 - [c138]Yining Chen, Peng Liu, Jia-Li You, Frank K. Soong:
Discriminative training for improving letter-to-sound conversion performance. ICASSP 2008: 4649-4652 - [c137]Jia-Li You, Yining Chen, Frank K. Soong, Jin-Lin Wang:
Improving letter-to-sound conversion performance with automatically generated new words. ICASSP 2008: 4653-4656 - [c136]Lijuan Wang, Tao Hu, Frank K. Soong:
Template constrained posterior for verifying phone transcriptions. ICASSP 2008: 4681-4684 - [c135]Feng Zhang, Chao Huang, Frank K. Soong, Min Chu, Ren-Hua Wang:
Automatic mispronunciation detection for Mandarin. ICASSP 2008: 5077-5080 - [c134]Peng Liu, Lei Ma, Frank K. Soong:
Radical based fine trajectory HMMs of online handwritten characters. ICPR 2008: 1-4 - [c133]Yu Shi, Frank K. Soong:
A symbol graph based handwritten math expression recognition. ICPR 2008: 1-4 - [c132]Peng Liu, Frank K. Soong:
An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs. INTERSPEECH 2008: 281-284 - [c131]Yu Shi, Frank Seide, Frank K. Soong:
GPU-accelerated Gaussian clustering for fMPE discriminative training. INTERSPEECH 2008: 944-947 - [c130]Yu Ting Yeung, Yao Qian, Tan Lee, Frank K. Soong:
Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech. INTERSPEECH 2008: 1133-1136 - [c129]Yao Qian, Hui Liang, Frank K. Soong:
Generating natural F0 trajectory with additive trees. INTERSPEECH 2008: 2126-2129 - [c128]Boyang Gao, Yao Qian, Zhizheng Wu, Frank K. Soong:
Duration refinement by jointly optimizing state and longer unit likelihood. INTERSPEECH 2008: 2266-2269 - [c127]Lijuan Wang, Xiaojun Qian, Lei Ma, Yao Qian, Yining Chen, Frank K. Soong:
A real-time text to audio-visual speech synthesis system. INTERSPEECH 2008: 2338-2341 - [c126]Chao Huang, Feng Zhang, Frank K. Soong, Min Chu:
Mispronunciation detection for Mandarin Chinese. INTERSPEECH 2008: 2655-2658 - [c125]Lijuan Wang, Tao Hu, Peng Liu, Frank K. Soong:
Efficient handwriting correction of speech recognition errors with template constrained posterior (TCP). INTERSPEECH 2008: 2659-2662 - [c124]Yao Qian, Houwei Cao, Frank K. Soong:
HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis. ISCSLP 2008: 13-16 - [c123]Chao Huang, Feng Zhang, Frank K. Soong:
Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption. ISCSLP 2008: 37-40 - [c122]Zhizheng Wu, Yao Qian, Frank K. Soong, Bo Zhang:
Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech. ISCSLP 2008: 121-124 - [c121]Siu Wa Lee, Frank K. Soong, P. C. Ching, Tan Lee:
Pitch Tracking for Model-Based Speech Separation. ISCSLP 2008: 145-148 - 2007
- [j23]Jun Du, Peng Liu, Frank K. Soong, Jian-Lai Zhou, Ren-Hua Wang:
Performance of Discriminative HMM Training in Noise. Int. J. Comput. Linguistics Chin. Lang. Process. 12(3) (2007) - [j22]Chen Yang, Frank K. Soong, Tan Lee:
Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR. IEEE Trans. Speech Audio Process. 15(3): 1087-1097 (2007) - [j21]Wei Wu, Thomas Fang Zheng, Mingxing Xu, Frank K. Soong:
A Cohort-Based Speaker Model Synthesis for Mismatched Channels in Speaker Verification. IEEE Trans. Speech Audio Process. 15(6): 1893-1903 (2007) - [j20]Minho Jin, Frank K. Soong, Chang Dong Yoo:
A Syllable Lattice Approach to Speaker Verification. IEEE Trans. Speech Audio Process. 15(8): 2476-2484 (2007) - [c120]Peng Liu, Cong Liu, Hui Jiang, Frank K. Soong, Ren-Hua Wang:
A constrained line search approach to general discriminative HMM training. ASRU 2007: 290-295 - [c119]Min Chu, Yusheng Li, Xin Zou, Frank K. Soong:
Enrich Web Applications with Voice Internet Persona Text-to-Speech for Anyone, Anywhere. HCI (3) 2007: 40-49 - [c118]Peng Liu, Frank K. Soong, Jian-Lai Zhou:
Divergence-Based Similarity Measure for Spoken Document Retrieval. ICASSP (4) 2007: 89-92 - [c117]Jing Zheng, Chao Huang, Min Chu, Frank K. Soong, Weiping Ye:
Generalized Segment Posterior Probability for Automatic Mandarin Pronunciation Evaluation. ICASSP (4) 2007: 201-204 - [c116]Cong Liu, Peng Liu, Hui Jiang, Frank K. Soong, Ren-Hua Wang:
A Constrained Line Search Optimization for Discriminative Training in Speech Recognition. ICASSP (4) 2007: 329-332 - [c115]Zhi-Jie Yan, Frank K. Soong, Ren-Hua Wang:
Word Graph Based Feature Enhancement for Noisy Speech Recognition. ICASSP (4) 2007: 373-376 - [c114]Yi-Jian Wu, Ren-Hua Wang, Frank K. Soong:
Full HMM Training for Minimizing Generation Error in Synthesis. ICASSP (4) 2007: 517-520 - [c113]Jun Du, Peng Liu, Hui Jiang, Frank K. Soong, Ren-Hua Wang:
A New Minimum Divergence Approach to Discriminative Training. ICASSP (4) 2007: 677-680 - [c112]Yanlu Xie, Yu Shi, Frank K. Soong, Beiqian Dai:
A Segmentation Posterior Based Endpointing Algorithm. ICASSP (4) 2007: 813-816 - [c111]Xinqiang Ni, Yining Chen, Min Chu, Frank K. Soong, Yong Zhao, Ping Zhang:
Agreement Learning for Automatic Accent Annotation. ICASSP (4) 2007: 829-832 - [c110]Yu Zhang, Peng Liu, Frank K. Soong:
Minimum Error Discriminative Training for Radical-Based Online Chinese Handwriting Recognition. ICDAR 2007: 53-57 - [c109]Lei Ma, Frank K. Soong, Peng Liu, Yi-Jian Wu:
A MSD-HMM Approach to Pen Trajectory Modeling for Online Handwriting Recognition. ICDAR 2007: 128-132 - [c108]Yu Shi, HaiYang Li, Frank K. Soong:
A Unified Framework for Symbol Segmentation and Recognition of Handwritten Mathematical Expressions. ICDAR 2007: 854-858 - [c107]Xinqiang Ni, Yining Chen, Frank K. Soong, Min Chu, Ping Zhang:
An unsupervised approach to automatic prosodic annotation. INTERSPEECH 2007: 486-489 - [c106]Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Model-based speech separation with single-microphone input. INTERSPEECH 2007: 850-853 - [c105]Hua Zhang, Lijuan Wang, Frank K. Soong, Wenju Liu:
Context constrained-generalized posterior probability for verifying phone transcriptions. INTERSPEECH 2007: 1330-1333 - [c104]Sheng Qiang, Yao Qian, Frank K. Soong, Congfu Xu:
Robust F0 modeling for Mandarin speech recognition in noise. INTERSPEECH 2007: 1801-1804 - [c103]Dacheng Lin, Yong Zhao, Frank K. Soong, Min Chu, Jieyu Zhao:
Iterative unit selection with unnatural prosody detection. INTERSPEECH 2007: 2909-2912 - [c102]Lijuan Wang, Min Chu, Yaya Peng, Yong Zhao, Frank K. Soong:
Perceptual annotation of expressive speech. SSW 2007: 46-51 - [c101]Hui Liang, Yao Qian, Frank K. Soong:
An HMM-based bilingual (Mandarin-English) TTS. SSW 2007: 137-142 - [c100]Yong Zhao, Chengsuo Zhang, Frank K. Soong, Min Chu, Xi Xiao:
Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis. SSW 2007: 206-210 - 2006
- [j19]Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Jian-Lai Zhou, Zhigang Cao:
Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units. IEICE Trans. Inf. Syst. 89-D(3): 1082-1091 (2006) - [j18]Tan Lee, Patgi Kam, Frank K. Soong:
Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition. Int. J. Comput. Linguistics Chin. Lang. Process. 11(1) (2006) - [j17]Zhenyu Xiong, Thomas Fang Zheng, Zhanjiang Song, Frank K. Soong, Wenhu Wu:
A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification. Speech Commun. 48(10): 1273-1282 (2006) - [c99]Min Chu, Yining Chen, Yong Zhao, Yusheng Li, Frank K. Soong:
A Study on How Human Annotations Benefit the TTS Voice. Blizzard Challenge 2006 - [c98]Chao Huang, Yingchun Huang, Frank K. Soong, Jianlai Zhou:
Weighted Likelihood Ratio (WLR) Hidden Markov Model for Noisy Speech Recognition. ICASSP (1) 2006: 37-40 - [c97]Yao Qian, Frank K. Soong, Tan Lee:
Tone-Enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR. ICASSP (1) 2006: 133-136 - [c96]Zhengyu Zhou, Jianfeng Gao, Frank K. Soong, Helen Meng:
A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization. ICASSP (1) 2006: 141-144 - [c95]Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
An Iterative Trajectory Regeneration Algorithm for Separating Mixed Speech Sources. ICASSP (1) 2006: 157-160 - [c94]Xi Zhou, Ye Tian, Jian-Lai Zhou, Frank K. Soong, Beiqian Dai:
Improved Chinese Character Input by Merging Speech and Handwriting Recognition Hypotheses. ICASSP (1) 2006: 609-612 - [c93]Yu Shi, Frank K. Soong, Jian-Lai Zhou:
Auto-Segmentation Based Partitioning and Clustering Approach to Robust Endpointing. ICASSP (1) 2006: 793-796 - [c92]Minho Jin, Frank K. Soong, Chang D. Yoo:
Syllable Lattice Based Re-Scoring For Speaker Verification. ICASSP (1) 2006: 921-924 - [c91]Peng Liu, Frank K. Soong:
Word graph based speech recognition error correction by handwriting input. ICMI 2006: 339-346 - [c90]Jun Du, Peng Liu, Frank K. Soong, Jian-Lai Zhou, Ren-Hua Wang:
Minimum divergence based discriminative training. INTERSPEECH 2006 - [c89]Qiang Fu, Antonio Moreno-Daniel, Biing-Hwang Juang, Jian-Lai Zhou, Frank K. Soong:
Generalization of the minimum classification error (MCE) training based on maximizing generalized posterior probability (GPP). INTERSPEECH 2006 - [c88]Yu Shi, Frank K. Soong, Jian-Lai Zhou:
Auto-segmentation based VAD for robust ASR. INTERSPEECH 2006 - [c87]Huanliang Wang, Yao Qian, Frank K. Soong, Jian-Lai Zhou, Jiqing Han:
A multi-space distribution (MSD) approach to speech recognition of tonal languages. INTERSPEECH 2006 - [c86]Yu Shi, Frank K. Soong, Jian-Lai Zhou:
Integrating Hypotheses of Multiple Recognizers for Improving Mandarin LVCSR Performance. ISCSLP 2006 - [c85]Zhijie Yan, Peng Liu, Jun Du, Frank K. Soong, Renhua Wang:
Training Discriminative HMM by Optimal Allocation of Gaussian Kernels. ISCSLP 2006 - [c84]Dongwen Ying, Yu Shi, Frank K. Soong, Jianwu Dang, Xugang Lu:
A Robust Voice Activity Detection Based on Noise Eigenspace Projection. ISCSLP (Selected Papers) 2006: 76-86 - [c83]Yao Qian, Frank K. Soong, Yining Chen, Min Chu:
An HMM-Based Mandarin Chinese Text-To-Speech System. ISCSLP (Selected Papers) 2006: 223-232 - [c82]Peng Liu, Jian-Lai Zhou, Frank K. Soong:
Non-uniform Kernel Allocation Based Parsimonious HMM. ISCSLP (Selected Papers) 2006: 294-302 - [c81]Zhi-Jie Yan, Jian-Lai Zhou, Frank K. Soong, Ren-Hua Wang:
Signal Trajectory Based Noise Compensation for Robust Speech Recognition. ISCSLP (Selected Papers) 2006: 335-345 - [c80]Jun Du, Peng Liu, Frank K. Soong, Jian-Lai Zhou, Ren-Hua Wang:
Noisy Speech Recognition Performance of Discriminative HMMs. ISCSLP (Selected Papers) 2006: 358-369 - [c79]Huanliang Wang, Yao Qian, Frank K. Soong, Jian-Lai Zhou, Jiqing Han:
Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models. ISCSLP (Selected Papers) 2006: 445-453 - [c78]Li Zhang, Chao Huang, Min Chu, Frank K. Soong, Xianda Zhang, Yudong Chen:
Automatic Detection of Tone Mispronunciation in Mandarin. ISCSLP (Selected Papers) 2006: 590-601 - [c77]Min Chu, Yong Zhao, Yining Chen, Lijuan Wang, Frank K. Soong:
The Paradigm for Creating Multi-lingual Text-To-Speech Voice Databases. ISCSLP (Selected Papers) 2006: 736-747 - 2005
- [j16]Tomoko Matsui, Frank K. Soong, Biing-Hwang Juang:
Verification of Multi-Class Recognition Decision: A Classification Approach. IEICE Trans. Inf. Syst. 88-D(3): 455-462 (2005) - [j15]Hui Jiang, Frank K. Soong, Chin-Hui Lee:
A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification. IEEE Trans. Speech Audio Process. 13(5-2): 945-955 (2005) - [c76]Wai Kit Lo, Frank K. Soong:
Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences. ICASSP (1) 2005: 85-88 - [c75]Chen Yang, Frank K. Soong, Tan Lee:
Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR. ICASSP (1) 2005: 241-244 - [c74]Xiao-Bing Li, Frank K. Soong, Tor André Myrvoll, Ren-Hua Wang:
Optimal Clustering and Non-Uniform Allocation of Gaussian Kernels in Scalar Dimension for HMM Compression. ICASSP (1) 2005: 669-672 - [c73]Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input. INTERSPEECH 2005: 309-312 - [c72]Peng Liu, Ye Tian, Jian-Lai Zhou, Frank K. Soong:
Background model based posterior probability for measuring confidence. INTERSPEECH 2005: 1465-1468 - [c71]Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Zhigang Cao:
Phonetic transcription verification with generalized posterior probability. INTERSPEECH 2005: 1949-1952 - [c70]Yong Zhao, Lijuan Wang, Min Chu, Frank K. Soong, Zhigang Cao:
Refining phoneme segmentations using speaker-adaptive context dependent boundary models. INTERSPEECH 2005: 2557-2560 - 2004
- [c69]Ruiqiang Zhang, Gen-ichiro Kikui, Hirofumi Yamamoto, Frank K. Soong, Taro Watanabe, Wai Kit Lo:
A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech recognition and Machine Translation. COLING 2004 - [c68]Frank K. Soong, Wai Kit Lo, Satoshi Nakamura:
Optimal acoustic and language model weights for minimizing word verification errors. INTERSPEECH 2004: 441-444 - [c67]Ruiqiang Zhang, Gen-ichiro Kikui, Hirofumi Yamamoto, Frank K. Soong, Taro Watanabe, Eiichiro Sumita, Wai Kit Lo:
Improved spoken language translation using n-best speech recognition hypotheses. INTERSPEECH 2004: 1629-1632 - [c66]Wai Kit Lo, Frank K. Soong, Satoshi Nakamura:
Robust verification of recognized words in noise. INTERSPEECH 2004: 1665-1668 - [c65]Yao Qian, Tan Lee, Frank K. Soong:
Tone information as a confidence measure for improving Cantonese LVCSR. INTERSPEECH 2004: 1965-1968 - [c64]Wai Kit Lo, Frank K. Soong, Satoshi Nakamura:
Generalized posterior probability for minimizing verification errors at subword, word and sentence levels. ISCSLP 2004: 13-16 - [c63]Chen Yang, Frank K. Soong, Tan Lee:
On noise robustness of dynamic and static features for continuous Cantonese digit recognition. ISCSLP 2004: 277-280 - 2003
- [c62]Tor André Myrvoll, Frank K. Soong:
Optimal clustering of multivariate normal distributions using divergence and its application to HMM adaptation. ICASSP (1) 2003: 552-555 - [c61]Florian Hilger, Hermann Ney, Olivier Siohan, Frank K. Soong:
Combining neighboring filter channels to improve quantile based histogram equalization. ICASSP (1) 2003: 640-643 - [c60]Patgi Kam, Tan Lee, Frank K. Soong:
Modeling Cantonese pronunciation variation by acoustic model refinement. INTERSPEECH 2003: 1477-1480 - [c59]Tor André Myrvoll, Frank K. Soong:
On divergence based clustering of normal distributions and its application to HMM adaptation. INTERSPEECH 2003: 1517-1520 - 2002
- [c58]Hui Jiang, Olivier Siohan, Frank K. Soong, Chin-Hui Lee:
A dynamic in-search discriminative training approach for large vocabulary speech recognition. ICASSP 2002: 113-116 - [c57]Tomoko Matsui, Frank K. Soong, Biing-Hwang Juang:
Classifier design for verification of multi-class recognition decision. ICASSP 2002: 117-120 - [c56]Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong:
Bell labs approach to Aurora evaluation on connected digit recognition. INTERSPEECH 2002: 229-232 - [c55]Jingdong Chen, Yiteng Huang, Qi Li, Frank K. Soong:
Recognition of noisy speech using normalized moments. INTERSPEECH 2002: 2441-2444 - 2001
- [c54]Hui Jiang, Frank K. Soong, Chin-Hui Lee:
Hierarchical stochastic feature matching for robust speech recognition. ICASSP 2001: 217-220 - [c53]Olivier Siohan, Akio Ando, Mohamed Afify, Hui Jiang, Chin-Hui Lee, Qi Li, Feng Liu, Kazuo Onoe, Frank K. Soong, Qiru Zhou:
A real-time Japanese broadcast news closed-captioning system. INTERSPEECH 2001: 495-498 - [c52]Qi Li, Frank K. Soong, Olivier Siohan:
An auditory system-based feature for robust speech recognition. INTERSPEECH 2001: 619-622 - [c51]Mohamed Afify, Hui Jiang, Filipp Korkmazskiy, Chin-Hui Lee, Qi Li, Olivier Siohan, Frank K. Soong, Arun C. Surendran:
Evaluating the Aurora connected digit recognition task - a bell labs approach. INTERSPEECH 2001: 633-636 - [c50]Hui Jiang, Frank K. Soong, Chin-Hui Lee:
A data selection strategy for utterance verification in continuous speech recognition. INTERSPEECH 2001: 2573-2576 - 2000
- [c49]Frank K. Soong, Eric A. Woudenberg:
Hands-free human-machine dialogue - corpora, technology and evaluation. INTERSPEECH 2000: 41-44 - [c48]Qi Li, Frank K. Soong, Olivier Siohan:
A high-performance auditory feature for robust speech recognition. INTERSPEECH 2000: 51-54
1990 – 1999
- 1999
- [j14]Qi Li, Biing-Hwang Juang, Chin-Hui Lee, Qiru Zhou, Frank K. Soong:
Recent advancements in automatic speaker authentication. IEEE Robotics Autom. Mag. 6(1): 24-34 (1999) - [c47]Jae H. Kim, Raziel Haimi-Cohen, Frank K. Soong:
Hidden Markov models with divergence based vector quantized variances. ICASSP 1999: 125-128 - [c46]Eric A. Woudenberg, Frank K. Soong, Biing-Hwang Juang:
A block least squares approach to acoustic echo cancellation. ICASSP 1999: 869-872 - 1998
- [c45]Sunil K. Gupta, Frank K. Soong:
Improved utterance rejection using length dependent thresholds. ICSLP 1998 - 1997
- [c44]Filipp Korkmazskiy, Biing-Hwang Juang, Frank K. Soong:
Generalized mixture of HMMs for continuous speech recognition. ICASSP 1997: 1443-1446 - 1996
- [c43]Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen:
High-accuracy connected digit recognition for mobile applications. ICASSP 1996: 57-60 - [c42]Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen:
Quantizing mixture-weights in a tied-mixture HMM. ICSLP 1996: 1828-1831 - 1995
- [c41]Chi-Shi Liu, Hsiao-Chuan Wang, Frank K. Soong, Chao-Shih Huang:
An orthogonal polynomial representation of speech signals and its probabilistic model for text independent speaker verification. ICASSP 1995: 345-348 - [c40]Jung-Kuei Chen, Lin-Shan Lee, Frank K. Soong:
Large vocabulary, word-based Mandarin dictation system. EUROSPEECH 1995: 285-288 - [c39]Torbjørn Svendsen, Frank K. Soong, Heiko Purnhagen:
Optimizing baseforms for HMM-based speech recognition. EUROSPEECH 1995: 783-787 - 1994
- [j13]Eng-Fong Huang, Frank K. Soong, Hsiao-Chuan Wang:
The use of tree-trellis search for large-vocabulary Mandarin polysyllabic word speech recognition. Comput. Speech Lang. 8(1): 39-50 (1994) - [j12]Wu Chou, Chin-Hui Lee, Biing-Hwang Juang, Frank K. Soong:
A Minimum Error Rate Pattern Recognition Approach to Speech Recognition. Int. J. Pattern Recognit. Artif. Intell. 8(1): 5-31 (1994) - [j11]Jung-Kuei Chen, Frank K. Soong:
An N-best candidates-based discriminative training for speech recognition applications. IEEE Trans. Speech Audio Process. 2(1): 206-216 (1994) - [j10]Eng-Fong Huang, Hsiao-Chuan Wang, Frank K. Soong:
A fast algorithm for large vocabulary keyword spotting application. IEEE Trans. Speech Audio Process. 2(3): 449-452 (1994) - [c38]Jung-Kuei Chen, Frank K. Soong, Lin-Shan Lee:
Large vocabulary word recognition based on tree-trellis search. ICASSP (2) 1994: 137-140 - [c37]Jung-Kuei Chen, Frank K. Soong:
Discriminative training of high performance speech recognizer using N best candidates. ICASSP (1) 1994: 625-628 - [c36]Aaron E. Rosenberg, Chin-Hui Lee, Frank K. Soong:
Cepstral channel normalization techniques for HMM-based speaker verification. ICSLP 1994: 1835-1838 - 1993
- [j9]Frank K. Soong, Biing-Hwang Juang:
Optimal quantization of LSP parameters. IEEE Trans. Speech Audio Process. 1(1): 15-24 (1993) - 1992
- [c35]Belle L. Tseng, Frank K. Soong, Aaron E. Rosenberg:
Continuous probabilistic acoustic map for speaker recognition. ICASSP 1992: 161-164 - [c34]Kouichi Yamaguchi, Shigeki Sagayama, Kenji Kita, Frank K. Soong:
Continuous mixture HMM-LR using the a* algorithm for continuous speech recognition. ICSLP 1992: 301-304 - [c33]Aaron E. Rosenberg, Joel DeLong, Chin-Hui Lee, Biing-Hwang Juang, Frank K. Soong:
The use of cohort normalized scores for speaker verification. ICSLP 1992: 599-602 - 1991
- [c32]Frank K. Soong, Eng-Fong Huang:
A tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition. ICASSP 1991: 705-708 - 1990
- [c31]Frank K. Soong, Biing-Hwang Juang:
Optimal quantization of LSP parameters using delayed decisions. ICASSP 1990: 185-188 - [c30]Aaron E. Rosenberg, Chin-Hui Lee, Frank K. Soong:
Sub-word unit talker verification using hidden Markov models. ICASSP 1990: 269-272 - [c29]Biing-Hwang Juang, Frank K. Soong:
Speaker recognition based on source coding approaches. ICASSP 1990: 613-616 - [c28]Eng-Fong Huang, Frank K. Soong:
A probabilistic acoustic map based discriminative HMM training. ICASSP 1990: 693-696 - [c27]S. A. Euler, Biing-Hwang Juang, Chin-Hui Lee, Frank K. Soong:
Statistical segmentation and word modeling techniques in isolated word recognition. ICASSP 1990: 745-748 - [c26]Aaron E. Rosenberg, Chin-Hui Lee, Frank K. Soong, Maureen A. McGee:
Experiments in automatic talker verification using sub-word unit hidden Markov models. ICSLP 1990: 141-144 - [c25]Frank K. Soong, Eng-Fong Huang:
A tree-trellis based fast search for finding the n best sentence hypotheses in continuous speech recognition. ICSLP 1990: 709-712 - [c24]Frank K. Soong, Eng-Fong Huang:
A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition. HLT 1990
1980 – 1989
- 1989
- [j8]Lawrence R. Rabiner, Jay G. Wilpon, Frank K. Soong:
High performance connected digit recognition using hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 37(8): 1214-1225 (1989) - [c23]Frank K. Soong:
A phonetically labeled acoustic segment (PLAS) approach to speech analysis-synthesis. ICASSP 1989: 584-587 - [c22]Chin-Hui Lee, Biing-Hwang Juang, Frank K. Soong, Lawrence R. Rabiner:
Word recognition using whole word and subword models. ICASSP 1989: 683-686 - 1988
- [j7]Frank K. Soong, Man Mohan Sondhi:
A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise. IEEE Trans. Acoust. Speech Signal Process. 36(1): 41-48 (1988) - [j6]Frank K. Soong, Aaron E. Rosenberg:
On the use of instantaneous and transitional spectral information in speaker recognition. IEEE Trans. Acoust. Speech Signal Process. 36(6): 871-879 (1988) - [c21]Lawrence R. Rabiner, Jay G. Wilpon, Frank K. Soong:
High performance connected digit recognition, using hidden Markov models. ICASSP 1988: 119-122 - [c20]Frank K. Soong, Biing-Hwang Juang:
Optimal quantization of LSP parameters [speech coding]. ICASSP 1988: 394-397 - [c19]Chin-Hui Lee, Frank K. Soong, Biing-Hwang Juang:
A segment model based approach to speech recognition. ICASSP 1988: 501-541 - 1987
- [c18]Torbjørn Svendsen, Frank K. Soong:
On the automatic segmentation of speech signals. ICASSP 1987: 77-80 - [c17]Frank K. Soong, M. Mohan Sondhi:
A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise. ICASSP 1987: 625-628 - [c16]Frank K. Soong:
A training procedure for a segment-based-network approach to isolated word recognition. ICASSP 1987: 693-696 - 1986
- [c15]Aaron E. Rosenberg, Frank K. Soong:
Evaluation of a vector quantization talker recognition system in text independent and text dependent modes. ICASSP 1986: 873-876 - [c14]Frank K. Soong, Aaron E. Rosenberg:
On the use of instantaneous and transitional spectral information in speaker recognition. ICASSP 1986: 877-880 - [c13]Frank K. Soong, Richard V. Cox, Nikil S. Jayant:
A high quality subband speech coder with backward adaptive predictor and optimal time-frequency bit assignment. ICASSP 1986: 2387-2390 - 1985
- [j5]A. F. Bergh, Frank K. Soong, Lawrence R. Rabiner:
Incorporation of temporal structure into a vector-quantization-based preprocessor for speaker-independent, isolated-word recognition. AT&T Tech. J. 64(5): 1047-1063 (1985) - [j4]Lawrence R. Rabiner, Frank K. Soong:
Single-frame vowel recognition using vector quantization with several distance measures. AT&T Tech. J. 64(10): 2319-2330 (1985) - [j3]N. Nocerino, Frank K. Soong, Lawrence R. Rabiner, Dennis H. Klatt:
Comparative study of several distortion measures for speech recognition. Speech Commun. 4(4): 317-331 (1985) - [j2]Kuk-Chin Pan, Frank K. Soong, Lawrence R. Rabiner:
A vector-quantization-based preprocessor for speaker-independent isolated word recognition. IEEE Trans. Acoust. Speech Signal Process. 33(3): 546-560 (1985) - [c12]N. Nocerino, Frank K. Soong, Lawrence R. Rabiner, Dennis H. Klatt:
Comparative study of several distortion measures for speech recognition. ICASSP 1985: 25-28 - [c11]Frank K. Soong, Aaron E. Rosenberg, Lawrence R. Rabiner, Biing-Hwang Juang:
A vector quantization approach to speaker recognition. ICASSP 1985: 387-390 - [c10]Kuk-Chin Pan, Frank K. Soong, Lawrence R. Rabiner, A. F. Bergh:
An efficient vector-quantization preprocessor for speaker independent isolated word recognition. ICASSP 1985: 874-877 - [c9]Frank K. Soong, Richard V. Cox, Nikil S. Jayant:
Subband coding of speech using backward adaptive prediction and bit allocation. ICASSP 1985: 1672-1675 - 1984
- [j1]Lawrence R. Rabiner, Kuk-Chin Pan, Frank K. Soong:
On the performance of isolated word speech recognizers using vector quantization and temporal energy contours. AT&T Bell Lab. Tech. J. 63(7): 1245-1260 (1984) - [c8]Jean-Sylvain Liénard, Frank K. Soong:
On the use of transient information in speech recognition. ICASSP 1984: 9-12 - [c7]Frank K. Soong, Biing-Hwang Juang:
Line spectrum pair (LSP) and speech data compression. ICASSP 1984: 37-40 - 1982
- [c6]Frank K. Soong, Allen M. Peterson:
On the high resolution and unbiased frequency estimates of sinusoids in white noise-A new adaptive approach. ICASSP 1982: 1362-1366 - [c5]Frank K. Soong, Allen M. Peterson:
Fast least-squares (LS) in the voice echo cancellation application. ICASSP 1982: 1398-1403 - 1981
- [c4]Frank K. Soong, S. Shankar Narayan, Allen M. Peterson:
On the asymptotic behavior of a complex adaptive line enhancer (CALE). ICASSP 1981: 287-292 - 1980
- [c3]Frank K. Soong, Allen M. Peterson:
Fast spectral estimation of speech signal in analytic form. ICASSP 1980: 158-161
1970 – 1979
- 1978
- [c2]Leland B. Jackson, Frank K. Soong:
Observations on linear estimation. ICASSP 1978: 203-207 - [c1]Leland B. Jackson, Donald W. Tufts, Frank K. Soong, Rahul M. Rao:
Frequency estimation by linear prediction. ICASSP 1978: 352-356
last updated on 2024-10-22 21:13 CEST by the dblp team
all metadata released as open data under CC0 1.0 license