Olivier Siohan
2020 – today
- 2024
- [c68] Otavio Braga, Wei Xia, Keith Johnson, Alice Chuang, Yunfan Ye, Olivier Siohan, Tuan Anh Nguyen:
Large Scale Self-Supervised Pretraining for Active Speaker Detection. ICASSP 2024: 10036-10040
- [c67] Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shahy, Olivier Siohan:
Conformer is All You Need for Visual Speech Recognition. ICASSP 2024: 10136-10140
- 2023
- [c66] Oscar Chang, Dongseong Hwang, Olivier Siohan:
Revisiting the Entropy Semiring for Neural Speech Recognition. ICLR 2023
- [c65] Richard Rose, Oscar Chang, Olivier Siohan:
Cascaded encoders for fine-tuning ASR models on overlapped speech. INTERSPEECH 2023: 3457-3461
- [i13] Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Parag Shah, Olivier Siohan:
Conformers are All You Need for Visual Speech Recogntion. CoRR abs/2302.10915 (2023)
- [i12] Richard Rose, Oscar Chang, Olivier Siohan:
Cascaded encoders for fine-tuning ASR models on overlapped speech. CoRR abs/2306.16398 (2023)
- [i11] Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan:
Audio-visual fine-tuning of audio-only ASR models. CoRR abs/2312.09369 (2023)
- [i10] Oscar Chang, Dongseong Hwang, Olivier Siohan:
Revisiting the Entropy Semiring for Neural Speech Recognition. CoRR abs/2312.10087 (2023)
- [i9] Oscar Chang, Otavio Braga, Hank Liao, Dmitriy Serdyuk, Olivier Siohan:
On Robustness to Missing Video for Audiovisual Speech Recognition. CoRR abs/2312.10088 (2023)
- 2022
- [j11] Oscar Chang, Otavio Braga, Hank Liao, Dmitriy Serdyuk, Olivier Siohan:
On Robustness to Missing Video for Audiovisual Speech Recognition. Trans. Mach. Learn. Res. 2022 (2022)
- [c64] Otavio Braga, Olivier Siohan:
Best of Both Worlds: Multi-Task Audio-Visual Automatic Speech Recognition and Active Speaker Detection. ICASSP 2022: 6047-6051
- [c63] Richard Rose, Olivier Siohan:
End-to-End multi-talker audio-visual ASR using an active speaker attention module. INTERSPEECH 2022: 2828-2832
- [c62] Dmitriy Serdyuk, Otavio Braga, Olivier Siohan:
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Muti-Person Video. INTERSPEECH 2022: 2833-2837
- [i8] Dmitriy Serdyuk, Otavio Braga, Olivier Siohan:
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition. CoRR abs/2201.10439 (2022)
- [i7] Richard Rose, Olivier Siohan:
End-to-end multi-talker audio-visual ASR using an active speaker attention module. CoRR abs/2204.00652 (2022)
- [i6] Otavio Braga, Olivier Siohan:
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection. CoRR abs/2205.05206 (2022)
- [i5] Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao:
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition. CoRR abs/2205.05586 (2022)
- [i4] Otavio Braga, Olivier Siohan:
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection. CoRR abs/2205.05684 (2022)
- 2021
- [c61] Dmitriy Serdyuk, Otavio Braga, Olivier Siohan:
Audio-Visual Speech Recognition is Worth $32\times 32\times 8$ Voxels. ASRU 2021: 796-802
- [c60] Kishan Sachdeva, Joshua Maynez, Olivier Siohan:
Action Item Detection in Meetings Using Pretrained Transformers. ASRU 2021: 861-868
- [c59] Otavio Braga, Olivier Siohan:
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection. ICASSP 2021: 6863-6867
- [c58] Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao:
Bridging the Gap Between Streaming and Non-Streaming ASR Systems by Distilling Ensembles of CTC and RNN-T Models. Interspeech 2021: 1807-1811
- [c57] Richard Rose, Olivier Siohan, Anshuman Tripathi, Otavio Braga:
End-to-End Audio-Visual Speech Recognition for Overlapping Speech. Interspeech 2021: 3016-3020
- [i3] Thibault Doutre, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Olivier Siohan, Liangliang Cao:
Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models. CoRR abs/2104.14346 (2021)
- [i2] Dmitriy Serdyuk, Otavio Braga, Olivier Siohan:
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels. CoRR abs/2109.09536 (2021)
- 2020
- [c56] Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao:
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition. ICASSP 2020: 6994-6998
2010 – 2019
- 2019
- [c55] Takaki Makino, Hank Liao, Yannis M. Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan:
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition. ASRU 2019: 905-912
- [i1] Takaki Makino, Hank Liao, Yannis M. Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan:
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition. CoRR abs/1911.04890 (2019)
- 2017
- [c54] Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon:
Acoustic Modeling for Google Home. INTERSPEECH 2017: 399-403
- [c53] Olivier Siohan:
CTC Training of Multi-Phone Acoustic Models for Speech Recognition. INTERSPEECH 2017: 709-713
- [c52] Tara N. Sainath, Vijayaditya Peddinti, Olivier Siohan, Arun Narayanan:
Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training. INTERSPEECH 2017: 3542-3546
- 2016
- [c51] Olivier Siohan:
Sequence training of multi-task acoustic models using meta-state labels. ICASSP 2016: 5425-5429
- [c50] Victor Soto, Olivier Siohan, Mohamed Elfeky, Pedro J. Moreno:
Selection and combination of hypotheses for dialectal speech recognition. ICASSP 2016: 5845-5849
- [c49] Mortaza Doulaty, Richard Rose, Olivier Siohan:
Automatic optimization of data perturbation distributions for multi-style training in speech recognition. SLT 2016: 21-27
- 2015
- [c48] Olivier Siohan, David Rybach:
Multitask learning and system combination for automatic speech recognition. ASRU 2015: 589-595
- [c47] Yanbo Xu, Olivier Siohan, David Simcha, Sanjiv Kumar, Hank Liao:
Exemplar-based large vocabulary speech recognition using k-nearest neighbors. ICASSP 2015: 5167-5171
- [c46] Hank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew W. Senior, Françoise Beaufays, Michiel Bacchiani:
Large vocabulary automatic speech recognition for children. INTERSPEECH 2015: 1611-1615
- 2014
- [c45] Olivier Siohan:
Training data selection based on context-dependent state matching. ICASSP 2014: 3316-3319
- [c44] Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan:
A big data approach to acoustic model training corpus selection. INTERSPEECH 2014: 2083-2087
- 2013
- [c43] Olivier Siohan, Michiel Bacchiani:
ivector-based acoustic data selection. INTERSPEECH 2013: 657-661
- 2010
- [c42] Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan:
Decision tree state clustering with word and syllable features. INTERSPEECH 2010: 2958-2961
2000 – 2009
- 2009
- [c41] Christopher Alberti, Michiel Bacchiani, Ari Bezman, Ciprian Chelba, Anastassia Drofa, Hank Liao, Pedro J. Moreno, Ted Power, Arnaud Sahuguet, Maria Shugrina, Olivier Siohan:
An audio indexing system for election video material. ICASSP 2009: 4873-4876
- 2007
- [j10] Mohamed Afify, Olivier Siohan:
Comments on Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space. IEEE Trans. Speech Audio Process. 15(5): 1731-1732 (2007)
- [c40] Bhuvana Ramabhadran, Olivier Siohan, Abhinav Sethy:
The IBM 2007 speech transcription system for European parliamentary speeches. ASRU 2007: 472-477
- [c39] Mohamed Afify, Olivier Siohan, Ruhi Sarikaya:
Gaussian Mixture Language Models for Speech Recognition. ICASSP (4) 2007: 29-32
- [c38] Jonathan Mamou, Bhuvana Ramabhadran, Olivier Siohan:
Vocabulary independent spoken term detection. SIGIR 2007: 615-622
- 2006
- [c37] Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, Brian Kingsbury:
Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy. ICASSP (1) 2006: 589-592
- [c36] Bhuvana Ramabhadran, Olivier Siohan, Lidia Mangu, Geoffrey Zweig, Martin Westphal, Henrik Schulz, Alvaro Soneiro:
The IBM 2006 speech transcription system for european parliamentary speeches. INTERSPEECH 2006
- [c35] Jing Huang, Martin Westphal, Stanley F. Chen, Olivier Siohan, Daniel Povey, Vit Libal, Alvaro Soneiro, Henrik Schulz, Thomas Ross, Gerasimos Potamianos:
The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings. MLMI 2006: 432-443
- [c34] Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu, Brian Kingsbury:
Automated Quality Monitoring for Call Centers using Speech and NLP Technologies. HLT-NAACL 2006
- 2005
- [j9] Mohamed Afify, Feng Liu, Hui Jiang, Olivier Siohan:
A new verification-based fast-match for large vocabulary continuous speech recognition. IEEE Trans. Speech Audio Process. 13(4): 546-553 (2005)
- [c33] Olivier Siohan, Bhuvana Ramabhadran, Brian Kingsbury:
Contructing Ensembles of ASR Systems Using Randomized Decision Trees. ICASSP (1) 2005: 197-200
- [c32] Olivier Siohan, Michiel Bacchiani:
Fast vocabulary-independent audio search using path-based graph indexing. INTERSPEECH 2005: 53-56
- 2004
- [j8] Mohamed Afify, Olivier Siohan:
Sequential estimation with optimal forgetting for robust speech recognition. IEEE Trans. Speech Audio Process. 12(1): 19-26 (2004)
- [c31] Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig:
Use of metadata to improve recognition of spontaneous speech and named entities. INTERSPEECH 2004: 381-384
- [c30] Olivier Siohan, Bhuvana Ramabhadran, Geoffrey Zweig:
Speech recognition error analysis on the English MALACH corpus. INTERSPEECH 2004: 413-416
- 2003
- [j7] Hong-Kwang Jeff Kuo, Olivier Siohan, Joseph P. Olive:
Advances in natural language call routing. Bell Labs Tech. J. 7(3): 155-170 (2003)
- [c29] Florian Hilger, Hermann Ney, Olivier Siohan, Frank K. Soong:
Combining neighboring filter channels to improve quantile based histogram equalization. ICASSP (1) 2003: 640-643
- [c28] Imed Zitouni, Olivier Siohan, Chin-Hui Lee:
Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition. INTERSPEECH 2003: 237-240
- 2002
- [j6] Olivier Siohan, Tor André Myrvoll, Chin-Hui Lee:
Structural maximum a posteriori linear regression for fast HMM adaptation. Comput. Speech Lang. 16(1): 5-24 (2002)
- [j5] Mohamed Afify, Olivier Siohan, Chin-Hui Lee:
Upper and lower bounds on the mean of noisy speech: application to minimax classification. IEEE Trans. Speech Audio Process. 10(2): 79-88 (2002)
- [c27] Hui Jiang, Olivier Siohan, Frank K. Soong, Chin-Hui Lee:
A dynamic in-search discriminative training approach for large vocabulary speech recognition. ICASSP 2002: 113-116
- [c26] Benoit Launay, Olivier Siohan, Arun C. Surendran, Chin-Hui Lee:
Towards knowledge-based features for HMM based large vocabulary automatic speech recognition. ICASSP 2002: 817-820
- [c25] Mohamed Afify, Olivier Siohan:
A discriminative training criterion and an associated EM learning algorithm. ICASSP 2002: 1065-1068
- [c24] Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong:
Bell labs approach to Aurora evaluation on connected digit recognition. INTERSPEECH 2002: 229-232
- [c23] Imed Zitouni, Olivier Siohan, Hong-Kwang Jeff Kuo, Chin-Hui Lee:
Backoff hierarchical class n-gram language modelling for automatic speech recognition systems. INTERSPEECH 2002: 885-888
- 2001
- [j4] Olivier Siohan, Cristina Chesta, Chin-Hui Lee:
Joint maximum a posteriori adaptation of transformation and HMM parameters. IEEE Trans. Speech Audio Process. 9(4): 417-428 (2001)
- [c22] Mohamed Afify, Olivier Siohan:
Sequential noise estimation with optimal forgetting for robust speech recognition. ICASSP 2001: 229-232
- [c21] Olivier Siohan, Akio Ando, Mohamed Afify, Hui Jiang, Chin-Hui Lee, Qi Li, Feng Liu, Kazuo Onoe, Frank K. Soong, Qiru Zhou:
A real-time Japanese broadcast news closed-captioning system. INTERSPEECH 2001: 495-498
- [c20] Qi Li, Frank K. Soong, Olivier Siohan:
An auditory system-based feature for robust speech recognition. INTERSPEECH 2001: 619-622
- [c19] Mohamed Afify, Hui Jiang, Filipp Korkmazskiy, Chin-Hui Lee, Qi Li, Olivier Siohan, Frank K. Soong, Arun C. Surendran:
Evaluating the Aurora connected digit recognition task - a bell labs approach. INTERSPEECH 2001: 633-636
- [c18] Feng Liu, Mohamed Afify, Hui Jiang, Olivier Siohan:
A new verification-based fast match approach to large vocabulary speech recognition. INTERSPEECH 2001: 851-854
- [c17] Mohamed Afify, Olivier Siohan, Chin-Hui Lee:
Minimax classification with parametric neighborhoods for noisy speech recognition. INTERSPEECH 2001: 2355-2358
- 2000
- [j3] Aaron E. Rosenberg, Olivier Siohan, Sarangarajan Parthasarathy:
Small group speaker identification with common password phrases. Speech Commun. 31(2-3): 131-140 (2000)
- [c16] Olivier Siohan, Cristina Chesta, Chin-Hui Lee:
Joint maximum a posteriori estimation of transformation and hidden Markov model parameters. ICASSP 2000: 965-968
- [c15] Partha Niyogi, Jean-Benoît Pierrot, Olivier Siohan:
Multiple classifiers by constrained minimization. ICASSP 2000: 3462-3465
- [c14] Qi Li, Frank K. Soong, Olivier Siohan:
A high-performance auditory feature for robust speech recognition. INTERSPEECH 2000: 51-54
- [c13] Tor André Myrvoll, Olivier Siohan, Chin-Hui Lee, Wu Chou:
Structural maximum a-posteriori linear regression for unsupervised speaker adaptation. INTERSPEECH 2000: 540-543
- [c12] Wu Chou, Olivier Siohan, Tor André Myrvoll, Chin-Hui Lee:
Extended maximum a posterior linear regression (EMAPLR) model adaptation for speech recognition. INTERSPEECH 2000: 616-619
- [c11] Mohamed Afify, Olivier Siohan:
Constrained maximum likelihood linear regression for speaker adaptation. INTERSPEECH 2000: 861-864
1990 – 1999
- 1999
- [c10] Olivier Siohan, Chin-Hui Lee, Arun C. Surendran, Qi Li:
Background model design for flexible and portable speaker verification systems. ICASSP 1999: 825-828
- [c9] Cristina Chesta, Olivier Siohan, Chin-Hui Lee:
Maximum a posteriori linear regression for hidden Markov model adaptation. EUROSPEECH 1999: 211-214
- 1998
- [c8] Aaron E. Rosenberg, Olivier Siohan, Sarangarajan Parthasarathy:
Speaker verification using minimum verification error training. ICASSP 1998: 105-108
- [c7] Olivier Siohan, Aaron E. Rosenberg, Sarangarajan Parthasarathy:
Speaker identification using minimum classification error training. ICASSP 1998: 109-112
- 1997
- [j2] Olivier Siohan, Chin-Hui Lee:
Iterative noise and channel estimation under the stochastic matching algorithm framework. IEEE Signal Process. Lett. 4(11): 304-306 (1997)
- 1996
- [j1] Olivier Siohan, Yifan Gong, Jean Paul Haton:
Comparative experiments of several adaptation approaches to noisy speech recognition using stochastic trajectory models. Speech Commun. 18(4): 335-352 (1996)
- [c6] Olivier Siohan, Yifan Gong:
A semi-continuous stochastic trajectory model for phoneme-based continuous speech recognition. ICASSP 1996: 471-474
- 1995
- [b1] Olivier Siohan:
Reconnaissance automatique de la parole continue en environnement bruité : application à des modèles stochastiques de trajectoires. (Continuous speech recognition in a noisy environment : application to stochastic trajectory models). Henri Poincaré University, Nancy, France, 1995
- [c5] Olivier Siohan:
On the robustness of linear discriminant analysis as a preprocessing step for noisy speech recognition. ICASSP 1995: 125-128
- [c4] Olivier Siohan, Yifan Gong, Jean Paul Haton:
Noise adaptation using linear regression for continuous noisy speech recognition. EUROSPEECH 1995: 465-468
- 1994
- [c3] Olivier Siohan, Yifan Gong, Jean Paul Haton:
A comparison of three noisy speech recognition approaches. ICSLP 1994: 1031-1034
- 1993
- [c2] Olivier Siohan, Yifan Gong, Jean Paul Haton:
A Bayesian approach to phone duration adaptation for lombard speech recognition. EUROSPEECH 1993: 1639-1642
- 1992
- [c1] Yifan Gong, Olivier Siohan, Jean Paul Haton:
Minimization of speech alignment error by iterative transformation for speaker adaptation. ICSLP 1992: 377-380
last updated on 2024-09-20 00:40 CEST by the dblp team
all metadata released as open data under CC0 1.0 license