Tomohiro Nakatani
2020 – today
- 2024
- [j58] Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Shoji Makino: DOA-informed switching independent vector extraction and beamforming for speech enhancement in underdetermined situations. EURASIP J. Audio Speech Music. Process. 2024(1): 52 (2024)
- [j57] Reinhold Haeb-Umbach, Tomohiro Nakatani, Marc Delcroix, Christoph Boeddeker, Tsubasa Ochiai: Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering [Special Issue On Model-Based and Data-Driven Audio Signal Processing]. IEEE Signal Process. Mag. 41(6): 12-23 (2024)
- [j56] Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino: Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1157-1172 (2024)
- [j55] Rintaro Ikeshita, Tomohiro Nakatani: Geometrically-Regularized Fast Independent Vector Extraction by Pure Majorization-Minimization. IEEE Trans. Signal Process. 72: 1560-1575 (2024)
- [c238] Rino Kimura, Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki, Tetsuya Ueda, Shoji Makino: Diffusion Model-Based MIMO Speech Denoising and Dereverberation. ICASSP Workshops 2024: 455-459
- [c237] Hao Shi, Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Ensemble Inference for Diffusion Model-Based Speech Enhancement. ICASSP Workshops 2024: 735-739
- [c236] Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino: Neural Network-Based Virtual Microphone Estimation with Virtual Microphone and Beamformer-Level Multi-Task Loss. ICASSP 2024: 11021-11025
- [c235] Tomohiro Nakatani, Naoyuki Kamo, Marc Delcroix, Shoko Araki: Multi-Stream Diffusion Model for Probabilistic Integration of Model-Based and Data-Driven Speech Enhancement. IWAENC 2024: 65-69
- [c234] Shinya Furunaga, Hiroshi Sawada, Rintaro Ikeshita, Tomohiro Nakatani, Shoji Makino: Accurate Delayed Source Model for Multi-Frame Full-Rank Spatial Covariance Analysis. IWAENC 2024: 170-174
- [c233] Carlos Hernandez-Olivan, Marc Delcroix, Tsubasa Ochiai, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki: Interaural Time Difference Loss for Binaural Target Sound Extraction. IWAENC 2024: 210-214
- [i41] Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo: Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers. CoRR abs/2402.03058 (2024)
- [i40] Carlos Hernandez-Olivan, Marc Delcroix, Tsubasa Ochiai, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki: Interaural time difference loss for binaural target sound extraction. CoRR abs/2408.00344 (2024)
- [i39] Carlos Hernandez-Olivan, Marc Delcroix, Tsubasa Ochiai, Daisuke Niizumi, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki: SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model. CoRR abs/2409.12528 (2024)
- 2023
- [j54] Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking. IEEE ACM Trans. Audio Speech Lang. Process. 31: 835-848 (2023)
- [j53] Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani: Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3589-3602 (2023)
- [c232] Ning Guo, Tomohiro Nakatani, Shoko Araki, Takehiro Moriya: Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers. APSIPA ASC 2023: 1042-1049
- [c231] Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Shoji Makino: Spatially-Regularized Switching Independent Vector Analysis. APSIPA ASC 2023: 2024-2030
- [c230] Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani: NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction. EUSIPCO 2023: 925-929
- [c229] Taishi Nakashima, Rintaro Ikeshita, Nobutaka Ono, Shoko Araki, Tomohiro Nakatani: Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis. ICASSP 2023: 1-5
- [c228] Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani: Target Speech Extraction with Conditional Diffusion Model. INTERSPEECH 2023: 176-180
- [c227] Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino: Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. INTERSPEECH 2023: 2503-2507
- [c226] Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki: Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. INTERSPEECH 2023: 3477-3481
- [i38] Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki: Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. CoRR abs/2305.13580 (2023)
- [i37] Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani: NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction. CoRR abs/2306.12820 (2023)
- [i36] Ning Guo, Tomohiro Nakatani, Shoko Araki, Takehiro Moriya: Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers. CoRR abs/2306.17317 (2023)
- [i35] Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani: Target Speech Extraction with Conditional Diffusion Model. CoRR abs/2308.03987 (2023)
- 2022
- [j52] Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki: Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1032-1047 (2022)
- [j51] Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian: End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022)
- [c225] Rintaro Ikeshita, Tomohiro Nakatani: ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis. EUSIPCO 2022: 65-69
- [c224] Naoyuki Kamo, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani: Importance of Switch Optimization Criterion in Switching WPE Dereverberation. ICASSP 2022: 176-180
- [c223] Hiroshi Sawada, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani: Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environments. ICASSP 2022: 496-500
- [c222] Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani: Listen only to me! How well can target speech extraction handle false alarms? INTERSPEECH 2022: 216-220
- [i34] Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani: Subjective intelligibility of speech sounds enhanced by ideal ratio mask via crowdsourced remote experiments with effective data screening. CoRR abs/2203.16760 (2022)
- [i33] Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani: Listen only to me! How well can target speech extraction handle false alarms? CoRR abs/2204.04811 (2022)
- [i32] Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking. CoRR abs/2205.03568 (2022)
- 2021
- [j50] Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani: Far-Field Automatic Speech Recognition. Proc. IEEE 109(2): 124-148 (2021)
- [j49] Rintaro Ikeshita, Naoyuki Kamo, Tomohiro Nakatani: Blind Signal Dereverberation Based on Mixture of Weighted Prediction Error Models. IEEE Signal Process. Lett. 28: 399-403 (2021)
- [j48] Rintaro Ikeshita, Tomohiro Nakatani: Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation. IEEE Signal Process. Lett. 28: 972-976 (2021)
- [j47] Rintaro Ikeshita, Keisuke Kinoshita, Naoyuki Kamo, Tomohiro Nakatani: Online Speech Dereverberation Using Mixture of Multichannel Linear Prediction Models. IEEE Signal Process. Lett. 28: 1580-1584 (2021)
- [j46] Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani: A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1950-1965 (2021)
- [j45] Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki: Block Coordinate Descent Algorithms for Auxiliary-Function-Based Independent Vector Extraction. IEEE Trans. Signal Process. 69: 3252-3267 (2021)
- [c221] Tomohiro Nakatani, Rintaro Ikeshita, Naoyuki Kamo, Keisuke Kinoshita, Shoko Araki, Hiroshi Sawada: Switching Convolutional Beamformer. EUSIPCO 2021: 266-270
- [c220] Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani: Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation. EUSIPCO 2021: 326-330
- [c219] Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino: Low Latency Online Source Separation and Noise Reduction Based on Joint Optimization with Dereverberation. EUSIPCO 2021: 1000-1004
- [c218] Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino: Low Latency Online Blind Source Separation Based on Joint Optimization with Blind Dereverberation. ICASSP 2021: 506-510
- [c217] Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura: Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain. ICASSP 2021: 4705-4709
- [c216] Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani: Speaker Activity Driven Neural Speech Extraction. ICASSP 2021: 6099-6103
- [c215] Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki: Neural Network-Based Virtual Microphone Estimator. ICASSP 2021: 6114-6118
- [c214] Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki: Blind and Neural Network-Guided Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation. ICASSP 2021: 6129-6133
- [c213] Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. ICASSP 2021: 6898-6902
- [c212] Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach: Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation. ICASSP 2021: 8428-8432
- [c211] Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani: Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility. Interspeech 2021: 181-185
- [c210] Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa: PILOT: Introducing Transformers for Probabilistic Sound Event Localization. Interspeech 2021: 2117-2121
- [c209] Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Multimodal Attention Fusion for Target Speaker Extraction. SLT 2021: 778-784
- [c208] Katerina Zmolíková, Marc Delcroix, Lukás Burget, Tomohiro Nakatani, Jan Honza Cernocký: Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. SLT 2021: 889-896
- [i31] Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki: Neural Network-based Virtual Microphone Estimator. CoRR abs/2101.04315 (2021)
- [i30] Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani: Speaker activity driven neural speech extraction. CoRR abs/2101.05516 (2021)
- [i29] Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani: A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter. CoRR abs/2101.08563 (2021)
- [i28] Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki: Multimodal Attention Fusion for Target Speaker Extraction. CoRR abs/2102.01326 (2021)
- [i27] Rintaro Ikeshita, Tomohiro Nakatani: Independent Vector Extraction for Joint Blind Source Separation and Dereverberation. CoRR abs/2102.04696 (2021)
- [i26] Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. CoRR abs/2102.11525 (2021)
- [i25] Julio Wissing, Benedikt T. Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura: Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain. CoRR abs/2102.11588 (2021)
- [i24] Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa: Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization. CoRR abs/2103.00417 (2021)
- [i23] Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani: Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility. CoRR abs/2104.10001 (2021)
- [i22] Christopher Schymura, Benedikt T. Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa: PILOT: Introducing Transformers for Probabilistic Sound Event Localization. CoRR abs/2106.03903 (2021)
- [i21] Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani: Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation. CoRR abs/2106.05529 (2021)
- [i20] Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki: Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation. CoRR abs/2108.01836 (2021)
- [i19] Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki: Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithm. CoRR abs/2111.10574 (2021)
- 2020
- [j44] Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani: GEDI: Gammachirp envelope distortion index for predicting intelligibility of enhanced speech. Speech Commun. 123: 43-58 (2020)
- [j43] Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach: Jointly Optimal Denoising, Dereverberation, and Source Separation. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2267-2282 (2020)
- [c207] Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa: Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization. EUSIPCO 2020: 231-235
- [c206] Hiroshi Sawada, Rintaro Ikeshita, Tomohiro Nakatani: Experimental Analysis of EM and MU Algorithms for Optimizing Full-rank Spatial Covariance Model. EUSIPCO 2020: 885-889
- [c205] Christoph Böddeker, Tomohiro Nakatani, Keisuke Kinoshita, Reinhold Haeb-Umbach: Jointly Optimal Dereverberation and Beamforming. ICASSP 2020: 216-220
- [c204] Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani: Tackling Real Noisy Reverberant Meetings with All-Neural Source Separation, Counting, and Diarization System. ICASSP 2020: 381-385
- [c203] Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa: A Dynamic Stream Weight Backprop Kalman Filter for Audiovisual Speaker Tracking. ICASSP 2020: 581-585
- [c202] Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki: Overdetermined Independent Vector Analysis. ICASSP 2020: 591-595
- [c201] Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani: Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's T Distribution. ICASSP 2020: 681-685
- [c200] Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki: Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam. ICASSP 2020: 691-695
- [c199] Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki: Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer. ICASSP 2020: 6384-6388
- [c198] Tomohiro Nakatani, Riki Takahashi, Tsubasa Ochiai, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Shoko Araki: DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation. ICASSP 2020: 6399-6403
- [c197] Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: End-to-End Training of Time Domain Audio Separation and Recognition. ICASSP 2020: 7004-7008
- [c196] Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani: Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network. ICASSP 2020: 7009-7013
- [c195] Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki: Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation. INTERSPEECH 2020: 91-95
- [c194] Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino: Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System. INTERSPEECH 2020: 1156-1160
- [c193] Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation. INTERSPEECH 2020: 2652-2656
- [c192] Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR. INTERSPEECH 2020: 3097-3101
- [c191] Ali Aroudi, Marc Delcroix, Tomohiro Nakatani, Keisuke Kinoshita, Shoko Araki, Simon Doclo: Cognitive-Driven Convolutional Beamforming Using EEG-Based Auditory Attention Decoding. MLSP 2020: 1-6
- [i18] Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki: Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam. CoRR abs/2001.08378 (2020)
- [i17] Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani: Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution. CoRR abs/2002.08582 (2020)
- [i16] Rintaro Ikeshita, Tomohiro Nakatani, Shoko Araki: Overdetermined independent vector analysis. CoRR abs/2003.02458 (2020)
- [i15] Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani: Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system. CoRR abs/2003.03987 (2020)
- [i14] Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani: Improving noise robust automatic speech recognition with single-channel time-domain enhancement network. CoRR abs/2003.03998 (2020)
- [i13] Ali Aroudi, Marc Delcroix, Tomohiro Nakatani, Keisuke Kinoshita, Shoko Araki, Simon Doclo: Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding. CoRR abs/2005.04669 (2020)
- [i12] Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach: Jointly optimal denoising, dereverberation, and source separation. CoRR abs/2005.09843 (2020)
- [i11] Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR. CoRR abs/2006.02786 (2020)
- [i10] Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation. CoRR abs/2006.13579 (2020)
- [i9] Christoph Böddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Shinji Watanabe, Reinhold Haeb-Umbach: Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation. CoRR abs/2011.15003 (2020)
2010 – 2019
- 2019
- [j42] Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani: Feature Based Domain Adaptation for Neural Network Language Models with Factorised Hidden Layers. IEICE Trans. Inf. Syst. 102-D(3): 598-608 (2019)
- [j41] Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Tomohiro Nakatani, Lukás Burget, Jan Cernocký: SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures. IEEE J. Sel. Top. Signal Process. 13(4): 800-814 (2019)
- [j40] Tomohiro Nakatani, Keisuke Kinoshita: A Unified Convolutional Beamformer for Simultaneous Denoising and Dereverberation. IEEE Signal Process. Lett. 26(6): 903-907 (2019)
- [j39] Reinhold Haeb-Umbach, Shinji Watanabe, Tomohiro Nakatani, Michiel Bacchiani, Björn Hoffmeister, Michael L. Seltzer, Heiga Zen, Mehrez Souden: Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques. IEEE Signal Process. Mag. 36(6): 111-124 (2019)
- [c190] Rintaro Ikeshita, Nobutaka Ito, Tomohiro Nakatani, Hiroshi Sawada: A Unifying Framework for Blind Source Separation Based on A Joint Diagonalizability Constraint. EUSIPCO 2019: 1-5
- [c189] Tomohiro Nakatani, Keisuke Kinoshita: Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation. EUSIPCO 2019: 1-5
- [c188] Hiroshi Sawada, Rintaro Ikeshita, Nobutaka Ito, Tomohiro Nakatani: Computational Acceleration and Smart Initialization of Full-Rank Spatial Covariance Analysis. EUSIPCO 2019: 1-5
- [c187] Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach: All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis. ICASSP 2019: 91-95
- [c186] Nobutaka Ito, Tomohiro Nakatani: FastMNMF: Joint Diagonalization Based Accelerated Algorithms for Multichannel Nonnegative Matrix Factorization. ICASSP 2019: 371-375
- [c185] Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani: Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders. ICASSP 2019: 6166-6170
- [c184] Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, Tomohiro Nakatani: Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online ASR. ICASSP 2019: 6655-6659
- [c183] Yuki Kubo, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Shoko Araki: Mask-based MVDR Beamformer for Noisy Multisource Environments: Introduction of Time-varying Spatial Covariance Model. ICASSP 2019: 6855-6859
- [c182] Marc Delcroix, Katerina Zmolíková, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani: Compact Network for Speakerbeam Target Speaker Extraction. ICASSP 2019: 6965-6969
- [c181] Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani: A Unified Framework for Neural Speech Separation and Extraction. ICASSP 2019: 6975-6979
- [c180] Atsunori Ogawa, Tsutomu Hirao, Tomohiro Nakatani, Masaaki Nagata: ILP-based Compressive Speech Summarization with Content Word Coverage Maximization and Its Oracle Performance Analysis. ICASSP 2019: 7190-7194
- [c179] Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani: A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models. ICASSP 2019: 7250-7254
- [c178] Tomohiro Nakatani, Keisuke Kinoshita: Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer. INTERSPEECH 2019: 111-115
- [c177] Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani: End-to-End SpeakerBeam for Single Channel Target Speech Recognition. INTERSPEECH 2019: 451-455
- [c176] Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani: Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration. INTERSPEECH 2019: 1408-1412
- [c175] Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani: Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues. INTERSPEECH 2019: 2718-2722
- [c174] Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani: Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders. INTERSPEECH 2019: 3900-3904
- [c173] Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino: Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System. INTERSPEECH 2019: 4275-4279
- [c172] Tomohiro Nakatani, Keisuke Kinoshita, Rintaro Ikeshita, Hiroshi Sawada, Shoko Araki: Simultaneous Denoising, Dereverberation, and Source Separation Using a Unified Convolutional Beamformer. WASPAA 2019: 224-228
- [c171] Rintaro Ikeshita, Nobutaka Ito, Tomohiro Nakatani, Hiroshi Sawada: Independent Low-Rank Matrix Analysis with Decorrelation Learning. WASPAA 2019: 288-292
- [i8] Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach: All-neural online source separation, counting, and diarization for meeting analysis. CoRR abs/1902.07881 (2019)
- [i7] Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani: GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech. CoRR abs/1904.02096 (2019)
- [i6] Tomohiro Nakatani, Keisuke Kinoshita: Maximum likelihood convolutional beamformer for simultaneous denoising and dereverberation. CoRR abs/1908.02710 (2019)
- [i5] Christoph Böddeker, Tomohiro Nakatani, Keisuke Kinoshita, Reinhold Haeb-Umbach: Jointly optimal dereverberation and beamforming. CoRR abs/1910.13707 (2019)
- [i4] Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Böddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach: End-to-end training of time domain audio separation and recognition. CoRR abs/1912.08462 (2019)
- 2018
- [j38]Satoru Emura, Shoko Araki, Tomohiro Nakatani, Noboru Harada:
Distortionless Beamforming Optimized With ℓ1-Norm Minimization. IEEE Signal Process. Lett. 25(7): 936-940 (2018) - [j37]Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Christian Huemmer, Tomohiro Nakatani:
Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation. IEEE ACM Trans. Audio Speech Lang. Process. 26(5): 895-908 (2018) - [c170]Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs. APSIPA 2018: 1692-1696 - [c169]Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani:
Factorised Hidden Layer Based Domain Adaptation for Recurrent Neural Network Language Models. APSIPA 2018: 1940-1944 - [c168]Nobutaka Ito, Christopher Schymura, Shoko Araki, Tomohiro Nakatani:
Noisy cGMM: Complex Gaussian Mixture Model with Non-Sparse Noise Model for Joint Source Separation and Denoising. EUSIPCO 2018: 1662-1666 - [c167]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
FastFCA: Joint Diagonalization Based Acceleration of Audio Source Separation Using a Full-Rank Spatial Covariance Model. EUSIPCO 2018: 1667-1671 - [c166]Nobutaka Ito, Tomohiro Nakatani:
Multiplicative Updates and Joint Diagonalization Based Acceleration for Under-Determined BSS Using a Full-Rank Spatial Covariance Model. GlobalSIP 2018: 231-235 - [c165]Juan Azcarreta, Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Permutation-Free Cgmm: Complex Gaussian Mixture Model with Inverse Wishart Mixture Model Based Spatial Prior for Permutation-Free Source Separation and Source Counting. ICASSP 2018: 51-55 - [c164]Takuya Higuchi, Keisuke Kinoshita, Nobutaka Ito, Shigeki Karita, Tomohiro Nakatani:
Frame-by-Frame Closed-Form Update for Mask-Based Adaptive MVDR Beamforming. ICASSP 2018: 531-535 - [c163]Nobutaka Ito, Takashi Makino, Shoko Araki, Tomohiro Nakatani:
Maximum-Likelihood Online Speaker Diarization in Noisy Meetings Based on Categorical Mixture Model and Probabilistic Spatial Dictionary. ICASSP 2018: 546-550 - [c162]Lukas Drude, Takuya Higuchi, Keisuke Kinoshita, Tomohiro Nakatani, Reinhold Haeb-Umbach:
Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source Separation. ICASSP 2018: 691-695 - [c161]Keisuke Kinoshita, Lukas Drude, Marc Delcroix, Tomohiro Nakatani:
Listening to Each Speaker One by One with Recurrent Selective Hearing Networks. ICASSP 2018: 5064-5068 - [c160]Marc Delcroix, Katerina Zmolíková, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani:
Single Channel Target Speaker Extraction and Recognition with Speaker Beam. ICASSP 2018: 5554-5558 - [c159]Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani:
Sequence Training of Encoder-Decoder Model Using Policy Gradient for End-to-End Speech Recognition. ICASSP 2018: 5839-5843 - [c158]Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani:
Rescoring N-Best Speech Recognition List Based on One-on-One Hypothesis Comparison Using Encoder-Classifier Model. ICASSP 2018: 6099-6103 - [c157]Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Tomohiro Nakatani, Jan Cernocký:
Optimization of Speaker-Aware Multichannel Speech Extraction with ASR Criterion. ICASSP 2018: 6702-6706 - [c156]Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech. INTERSPEECH 2018: 1863-1867 - [c155]Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, Tomohiro Nakatani:
Auxiliary Feature Based Adaptation of End-to-end ASR Systems. INTERSPEECH 2018: 2444-2448 - [c154]Lukas Drude, Christoph Böddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation. INTERSPEECH 2018: 3043-3047 - [c153]Yutaro Matsui, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Nobutaka Ito, Shoko Araki, Shoji Makino:
Online Integration of DNN-Based and Spatial Clustering-Based Mask Estimation for Robust MVDR Beamforming. IWAENC 2018: 71-75 - [c152]Nobutaka Ito, Tomohiro Nakatani:
FastFCA-AS: Joint Diagonalization Based Acceleration of Full-Rank Spatial Covariance Analysis for Separating any Number of Sources. IWAENC 2018: 151-155 - [c151]Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, Tomohiro Nakatani:
Frame-Online DNN-WPE Dereverberation. IWAENC 2018: 466-470 - [i3]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
FastFCA: A Joint Diagonalization Based Fast Algorithm for Audio Source Separation Using A Full-Rank Spatial Covariance Model. CoRR abs/1805.06572 (2018) - [i2]Nobutaka Ito, Tomohiro Nakatani:
FastFCA-AS: Joint Diagonalization Based Acceleration of Full-Rank Spatial Covariance Analysis for Separating Any Number of Sources. CoRR abs/1805.09498 (2018) - [i1]Tomohiro Nakatani, Keisuke Kinoshita:
A unified convolutional beamformer for simultaneous denoising and dereverberation. CoRR abs/1812.08400 (2018) - 2017
- [j36]Tomoko Kawase, Kenta Niwa, Masakiyo Fujimoto, Kazunori Kobayashi, Shoko Araki, Tomohiro Nakatani:
Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 100-A(5): 1127-1136 (2017) - [j35]Takuya Higuchi, Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Tomohiro Nakatani:
Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR. IEEE ACM Trans. Audio Speech Lang. Process. 25(4): 780-793 (2017) - [c150]Michael Hentschel, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani, Yuji Matsumoto:
Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs. APSIPA 2017: 618-621 - [c149]Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani:
Learning speaker representation for neural network based multichannel speaker extraction. ASRU 2017: 8-15 - [c148]Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Adversarial training for data-driven speech enhancement without parallel corpus. ASRU 2017: 40-47 - [c147]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Data-driven and physical model-based designs of probabilistic spatial dictionary for online meeting diarization and adaptive beamforming. EUSIPCO 2017: 1165-1169 - [c146]Shoko Araki, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Takuya Higuchi, Takuya Yoshioka, Dung T. Tran, Shigeki Karita, Tomohiro Nakatani:
Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming. HSCMA 2017: 16-20 - [c145]Keisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Takuya Higuchi, Tomohiro Nakatani:
Deep mixture density network for statistical model-based feature enhancement. ICASSP 2017: 251-255 - [c144]Tomohiro Nakatani, Nobutaka Ito, Takuya Higuchi, Shoko Araki, Keisuke Kinoshita:
Integrating DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming. ICASSP 2017: 286-290 - [c143]Nobutaka Ito, Shoko Araki, Marc Delcroix, Tomohiro Nakatani:
Probabilistic spatial dictionary based online adaptive beamforming for meeting recognition in noisy and reverberant environments. ICASSP 2017: 681-685 - [c142]Christian Huemmer, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Walter Kellermann:
Online environmental adaptation of CNN-based acoustic models using spatial diffuseness features. ICASSP 2017: 4875-4879 - [c141]Takuya Higuchi, Takuya Yoshioka, Keisuke Kinoshita, Tomohiro Nakatani:
Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion. ICASSP 2017: 5170-5174 - [c140]Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Taichi Asami, Shigeru Katagiri, Tomohiro Nakatani:
Cumulative moving averaged bottleneck speaker vectors for online speaker adaptation of CNN-based acoustic models. ICASSP 2017: 5175-5179 - [c139]Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Christian Huemmer, Tomohiro Nakatani:
Feedback connection for deep neural network-based acoustic modeling. ICASSP 2017: 5240-5244 - [c138]Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani:
Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. INTERSPEECH 2017: 384-388 - [c137]Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani:
Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources. INTERSPEECH 2017: 1183-1187 - [c136]Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani:
Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling. INTERSPEECH 2017: 1596-1600 - [c135]Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani:
Forward-Backward Convolutional LSTM for Acoustic Modeling. INTERSPEECH 2017: 1601-1605 - [c134]Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search. INTERSPEECH 2017: 1963-1967 - [c133]Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani:
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures. INTERSPEECH 2017: 2655-2659 - [c132]Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio. INTERSPEECH 2017: 2949-2953 - [c131]Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling. INTERSPEECH 2017: 3852-3856 - [p4]Marc Delcroix, Takuya Yoshioka, Nobutaka Ito, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani:
Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 21-49 - [p3]Keisuke Kinoshita, Marc Delcroix, Sharon Gannot, Emanuël A. P. Habets, Reinhold Haeb-Umbach, Walter Kellermann, Volker Leutnant, Roland Maas, Tomohiro Nakatani, Bhiksha Raj, Armin Sehr, Takuya Yoshioka:
The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 345-354 - 2016
- [j34]Marc Delcroix, Atsunori Ogawa, Seong-Jun Hahm, Tomohiro Nakatani, Atsushi Nakamura:
Differenced maximum mutual information criterion for robust unsupervised acoustic model adaptation. Comput. Speech Lang. 36: 24-41 (2016) - [j33]Keisuke Kinoshita, Marc Delcroix, Sharon Gannot, Emanuël A. P. Habets, Reinhold Haeb-Umbach, Walter Kellermann, Volker Leutnant, Roland Maas, Tomohiro Nakatani, Bhiksha Raj, Armin Sehr, Takuya Yoshioka:
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research. EURASIP J. Adv. Signal Process. 2016: 7 (2016) - [c130]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Complex angular central Gaussian mixture model for directional statistics in mask-based microphone array signal processing. EUSIPCO 2016: 1153-1157 - [c129]Naoki Murata, Hirokazu Kameoka, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, Shoichi Koyama, Hiroshi Saruwatari:
Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution. EUSIPCO 2016: 1648-1652 - [c128]Shoko Araki, Masahiro Okada, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani:
Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition. ICASSP 2016: 385-389 - [c127]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Modeling audio directional statistics using a complex bingham mixture model for blind source extraction from diffuse noise. ICASSP 2016: 465-468 - [c126]Tomoko Kawase, Kenta Niwa, Masakiyo Fujimoto, Noriyoshi Kamado, Kazunori Kobayashi, Shoko Araki, Tomohiro Nakatani:
Real-time integration of statistical model-based speech enhancement with unsupervised noise PSD estimation using microphone array. ICASSP 2016: 604-608 - [c125]Takuya Higuchi, Nobutaka Ito, Takuya Yoshioka, Tomohiro Nakatani:
Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise. ICASSP 2016: 5210-5214 - [c124]Marc Delcroix, Keisuke Kinoshita, Chengzhu Yu, Atsunori Ogawa, Takuya Yoshioka, Tomohiro Nakatani:
Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions. ICASSP 2016: 5270-5274 - [c123]Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani:
Noise robust speech recognition using recent developments in neural networks for computer vision. ICASSP 2016: 5730-5734 - [c122]Hendrik Meutzner, Shoko Araki, Masakiyo Fujimoto, Tomohiro Nakatani:
A generative-discriminative hybrid approach to multi-channel noise reduction for robust automatic speech recognition. ICASSP 2016: 5740-5744 - [c121]Masakiyo Fujimoto, Tomohiro Nakatani:
Multi-pass feature enhancement based on generative-discriminative hybrid approach for noise robust speech recognition. ICASSP 2016: 5750-5754 - [c120]Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani:
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models. INTERSPEECH 2016: 1573-1577 - [c119]Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Speech Intelligibility Prediction Based on the Envelope Power Spectrum Model with the Dynamic Compressive Gammachirp Auditory Filterbank. INTERSPEECH 2016: 2885-2889 - [c118]Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda:
Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement. INTERSPEECH 2016: 3733-3737 - [c117]Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani:
Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion. INTERSPEECH 2016: 3808-3812 - [c116]Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions. INTERSPEECH 2016: 3813-3817 - [c115]Mahmoud Fakhry, Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Modeling audio directional statistics using a probabilistic spatial dictionary for speaker diarization in real meetings. IWAENC 2016: 1-5 - [c114]Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani:
Sparseness-based multichannel nonnegative matrix factorization for blind source separation. IWAENC 2016: 1-5 - 2015
- [j32]Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani:
Exploiting spectro-temporal locality in deep learning based acoustic event detection. EURASIP J. Audio Speech Music. Process. 2015: 26 (2015) - [j31]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Strategies for distant speech recognition in reverberant environments. EURASIP J. Adv. Signal Process. 2015: 60 (2015) - [j30]Miquel Espi, Masakiyo Fujimoto, Tomohiro Nakatani:
Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning. IEICE Trans. Inf. Syst. 98-D(10): 1799-1807 (2015) - [j29]Nobutaka Ito, Emmanuel Vincent, Tomohiro Nakatani, Nobutaka Ono, Shoko Araki, Shigeki Sagayama:
Blind Suppression of Nonstationary Diffuse Acoustic Noise Based on Spatial Covariance Matrix Decomposition. J. Signal Process. Syst. 79(2): 145-157 (2015) - [c113]Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Chengzhu Yu, Wojciech J. Fabian, Miquel Espi, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani:
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices. ASRU 2015: 436-443 - [c112]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Permutation-free clustering of relative transfer function features for blind source separation. EUSIPCO 2015: 409-413 - [c111]Shoko Araki, Tomoki Hayashi, Marc Delcroix, Masakiyo Fujimoto, Kazuya Takeda, Tomohiro Nakatani:
Exploring multi-channel features for denoising-autoencoder-based speech enhancement. ICASSP 2015: 116-120 - [c110]Keisuke Kinoshita, Tomohiro Nakatani:
Modeling inter-node acoustic dependencies with Restricted Boltzmann Machine for distributed microphone array based BSS. ICASSP 2015: 464-468 - [c109]Takuya Yoshioka, Shigeki Karita, Tomohiro Nakatani:
Far-field speech recognition using CNN-DNN-HMM with convolution in time. ICASSP 2015: 4360-4364 - [c108]Marc Delcroix, Keisuke Kinoshita, Takaaki Hori, Tomohiro Nakatani:
Context adaptive deep neural networks for fast acoustic model adaptation. ICASSP 2015: 4535-4539 - [c107]Masakiyo Fujimoto, Tomohiro Nakatani:
Feature enhancement based on generative-discriminative hybrid approach with GMMs and DNNs for noise robust speech recognition. ICASSP 2015: 5019-5023 - [c106]Keisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Text-informed speech enhancement with deep neural networks. INTERSPEECH 2015: 1760-1764 - [c105]Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen:
Robust i-vector extraction for neural network adaptation in noisy environment. INTERSPEECH 2015: 2854-2857 - [c104]Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani:
Feature extraction strategies in deep learning based acoustic event detection. INTERSPEECH 2015: 2922-2926 - 2014
- [j28]Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Location Feature Integration for Clustering-Based Speech Separation in Distributed Microphone Arrays. IEEE ACM Trans. Audio Speech Lang. Process. 22(2): 354-367 (2014) - [c103]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition. GlobalSIP 2014: 522-526 - [c102]Miquel Espi, Masakiyo Fujimoto, Yotaro Kubo, Tomohiro Nakatani:
Spectrogram patch based acoustic event detection and classification in speech overlapping conditions. HSCMA 2014: 117-121 - [c101]Atsunori Ogawa, Keisuke Kinoshita, Takaaki Hori, Tomohiro Nakatani, Atsushi Nakamura:
Fast segment search for corpus-based speech enhancement based on speech recognition technology. ICASSP 2014: 1557-1561 - [c100]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Probabilistic integration of diffuse noise suppression and dereverberation. ICASSP 2014: 5167-5171 - [c99]Masakiyo Fujimoto, Yotaro Kubo, Tomohiro Nakatani:
Unsupervised non-parametric Bayesian modeling of non-stationary noise for model-based noise suppression. ICASSP 2014: 5562-5566 - [c98]Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Tomohiro Nakatani:
Relaxed disjointness based clustering for joint blind source separation and dereverberation. IWAENC 2014: 268-272 - 2013
- [j27]Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani, Atsushi Nakamura:
Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer. Comput. Speech Lang. 27(1): 350-368 (2013) - [j26]Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, Atsushi Nakamura:
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds. Comput. Speech Lang. 27(3): 851-873 (2013) - [j25]Mehrez Souden, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Hiroshi Sawada:
A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction. IEEE Trans. Speech Audio Process. 21(9): 1913-1928 (2013) - [j24]Takuya Yoshioka, Tomohiro Nakatani:
Noise Model Transfer: Novel Approach to Robustness Against Nonstationary Noise. IEEE Trans. Speech Audio Process. 21(10): 2182-2192 (2013) - [j23]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Masakiyo Fujimoto:
Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 21(12): 2516-2531 (2013) - [c97]Takuya Yoshioka, Tomohiro Nakatani:
Dereverberation for reverberation-robust microphone arrays. EUSIPCO 2013: 1-5 - [c96]Mehrez Souden, Keisuke Kinoshita, Tomohiro Nakatani:
An integration of source location cues for speech clustering in distributed microphone arrays. ICASSP 2013: 111-115 - [c95]Nobutaka Ito, Shoko Araki, Tomohiro Nakatani:
Permutation-free convolutive blind source separation via full-band clustering based on frequency-independent source presence priors. ICASSP 2013: 3238-3242 - [c94]Takuya Yoshioka, Tomohiro Nakatani:
Noise model transfer using affine transformation with application to large vocabulary reverberant speech recognition. ICASSP 2013: 7058-7062 - [c93]Tomohiro Nakatani, Mehrez Souden, Shoko Araki, Takuya Yoshioka, Takaaki Hori, Atsunori Ogawa:
Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition. ICASSP 2013: 7249-7253 - [c92]Marc Delcroix, Atsunori Ogawa, Seong-Jun Hahm, Tomohiro Nakatani, Atsushi Nakamura:
Unsupervised discriminative adaptation using differenced maximum mutual information based linear regression. ICASSP 2013: 7888-7892 - [c91]Roland Maas, Walter Kellermann, Armin Sehr, Takuya Yoshioka, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani:
Formulation of the REMOS concept from an uncertainty decoding perspective. DSP 2013: 1-6 - [c90]Keisuke Kinoshita, Mehrez Souden, Tomohiro Nakatani:
Blind source separation using spatially distributed microphones based on microphone-location dependent source activities. INTERSPEECH 2013: 822-826 - [c89]Masakiyo Fujimoto, Tomohiro Nakatani:
Model-based noise suppression using unsupervised estimation of hidden Markov model for non-stationary noise. INTERSPEECH 2013: 2982-2986 - [c88]Marc Delcroix, Yotaro Kubo, Tomohiro Nakatani, Atsushi Nakamura:
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling? INTERSPEECH 2013: 2992-2996 - [c87]Yasufumi Uezu, Keisuke Kinoshita, Mehrez Souden, Tomohiro Nakatani:
On the robustness of distributed EM based BSS in asynchronous distributed microphone array scenarios. INTERSPEECH 2013: 3298-3302 - [c86]Armin Sehr, Takuya Yoshioka, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Roland Maas, Walter Kellermann:
Conditional emission densities for combining speech enhancement and recognition systems. INTERSPEECH 2013: 3502-3506 - [c85]Keisuke Kinoshita, Tomohiro Nakatani:
Microphone-location dependent mask estimation for BSS using spatially distributed asynchronous microphones. ISPACS 2013: 326-331 - [c84]Ingrid Jafari, Nobutaka Ito, Mehrez Souden, Shoko Araki, Tomohiro Nakatani:
Source number estimation based on clustering of speech activity sequences for microphone array processing. MLSP 2013: 1-6 - [c83]Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Armin Sehr, Walter Kellermann, Roland Maas:
The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech. WASPAA 2013: 1-4 - 2012
- [j22]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection. Speech Commun. 54(2): 229-244 (2012) - [j21]Mehrez Souden, Marc Delcroix, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani:
Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective. IEEE Signal Process. Lett. 19(8): 495-498 (2012) - [j20]Takuya Yoshioka, Armin Sehr, Marc Delcroix, Keisuke Kinoshita, Roland Maas, Tomohiro Nakatani, Walter Kellermann:
Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition. IEEE Signal Process. Mag. 29(6): 114-126 (2012) - [j19]Katsuhiko Ishiguro, Takeshi Yamada, Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada:
Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information. IEEE Trans. Speech Audio Process. 20(2): 447-460 (2012) - [j18]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. IEEE Trans. Speech Audio Process. 20(2): 499-513 (2012) - [j17]Takuya Yoshioka, Tomohiro Nakatani:
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening. IEEE Trans. Speech Audio Process. 20(10): 2707-2720 (2012) - [c82]Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura:
New analytical calculation and estimation of TDOA for underdetermined BSS in noisy environments. APSIPA 2012: 1-6 - [c81]Takuya Yoshioka, Armin Sehr, Marc Delcroix, Keisuke Kinoshita, Roland Maas, Tomohiro Nakatani, Walter Kellermann:
Survey on approaches to speech recognition in reverberant environments. APSIPA 2012: 1-4 - [c80]Mehrez Souden, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani, Hiroshi Sawada:
A multichannel MMSE-based framework for joint blind source separation and noise reduction. ICASSP 2012: 109-112 - [c79]Yasuaki Iwata, Tomohiro Nakatani:
Introduction of speech log-spectral priors into dereverberation based on Itakura-Saito distance minimization. ICASSP 2012: 245-248 - [c78]Shoko Araki, Tomohiro Nakatani:
Sparse vector factorization for underdetermined BSS using wrapped-phase GMM and source log-spectral prior. ICASSP 2012: 265-268 - [c77]Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura:
New analytical update rule for TDOA inference for underdetermined BSS in noisy environments. ICASSP 2012: 269-272 - [c76]Tomohiro Nakatani, Takuya Yoshioka, Shoko Araki, Marc Delcroix, Masakiyo Fujimoto:
LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise. ICASSP 2012: 4029-4032 - [c75]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation. ICASSP 2012: 4713-4716 - [c74]Marc Delcroix, Atsunori Ogawa, Shinji Watanabe, Tomohiro Nakatani, Atsushi Nakamura:
Discriminative feature transforms using differenced maximum mutual information. ICASSP 2012: 4753-4756 - [c73]Takuya Yoshioka, Emmanuel Ternon, Tomohiro Nakatani:
Time-varying residual noise feature model estimation for multi-microphone speech recognition. ICASSP 2012: 4913-4916 - [c72]Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani:
Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise. INTERSPEECH 2012: 1926-1929 - [c71]Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani, Atsushi Nakamura:
Dynamic variance adaptation using differenced maximum mutual information. MLSLP 2012: 9-12 - [c70]Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Distributed microphone array processing for speech source separation with classifier fusion. MLSP 2012: 1-6 - 2011
- [j16]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi, Hiroshi G. Okuno:
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization. IEEE Trans. Speech Audio Process. 19(1): 69-84 (2011) - [c69]Shoko Araki, Tomohiro Nakatani:
Hybrid approach for multichannel source separation combining time-frequency mask with multi-channel Wiener filter. ICASSP 2011: 225-228 - [c68]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Joint unsupervised learning of hidden Markov source models and source location models for multichannel source separation. ICASSP 2011: 237-240 - [c67]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition. ICASSP 2011: 4816-4819 - [c66]Takuya Yoshioka, Tomohiro Nakatani:
Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation. ICASSP 2011: 5064-5067 - [c65]Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani:
Single Channel Dereverberation Using Example-Based Speech Enhancement with Uncertainty Decoding Technique. INTERSPEECH 2011: 197-200 - [c64]Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
A Multichannel Feature-Based Processing for Robust Speech Recognition. INTERSPEECH 2011: 689-692 - [c63]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
A Robust Estimation Method of Noise Mixture Model for Noise Suppression. INTERSPEECH 2011: 697-700 - [c62]Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto:
Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR. INTERSPEECH 2011: 1785-1788 - [p2]Marc Delcroix, Shinji Watanabe, Tomohiro Nakatani:
Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing. Robust Speech Recognition of Uncertain or Missing Data 2011: 225-255 - 2010
- [j15]Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki:
Noise robust voice activity detection based on periodic to aperiodic component ratio. Speech Commun. 52(1): 41-60 (2010) - [j14]Tomohiro Nakatani, Walter Kellermann, Patrick A. Naylor, Masato Miyoshi, Biing-Hwang Juang:
Introduction to the Special Issue on Processing Reverberant Speech: Methodologies and Applications. IEEE Trans. Speech Audio Process. 18(7): 1673-1675 (2010) - [j13]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction. IEEE Trans. Speech Audio Process. 18(7): 1717-1731 (2010) - [c61]Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada:
Simultaneous clustering of mixing and spectral model parameters for blind sparse source separation. ICASSP 2010: 5-8 - [c60]Tomohiro Nakatani, Shoko Araki:
Single channel source separation based on sparse source observation model with harmonic constraint. ICASSP 2010: 13-16 - [c59]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Blind upmix of stereo music signals using multi-step linear prediction based reverberation extraction. ICASSP 2010: 49-52 - [c58]Naoki Yasuraoka, Takuya Yoshioka, Tomohiro Nakatani, Atsushi Nakamura, Hiroshi G. Okuno:
Music dereverberation using harmonic structure source model and Wiener filter. ICASSP 2010: 53-56 - [c57]Takuya Yoshioka, Tomohiro Nakatani, Hiroshi G. Okuno:
Noisy speech enhancement based on prior knowledge about spectral envelope and harmonic structure. ICASSP 2010: 4270-4273 - [c56]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model. INTERSPEECH 2010: 2766-2769 - [c55]Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization. INTERSPEECH 2010: 3102-3105 - [c54]Yumi Ansa, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Takeshi Yamada, Atsushi Nakamura, Nobuhiko Kitawaki:
Cepstral smoothing of separated signals for underdetermined speech separation. ISCAS 2010: 2506-2509 - [c53]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Real-time meeting recognition and understanding using distant microphones and omni-directional camera. SLT 2010: 424-429 - [p1]Masato Miyoshi, Marc Delcroix, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani, Takafumi Hikichi:
Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information. Speech Dereverberation 2010: 271-310
2000 – 2009
- 2009
- [j12]Shigeaki Amano, Tadahisa Kondo, Kazumi Kato, Tomohiro Nakatani:
Development of Japanese infant speech database from longitudinal recordings. Speech Commun. 51(6): 510-520 (2009) - [j11]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
Integrated Speech Enhancement Method Using Noise Suppression and Dereverberation. IEEE Trans. Speech Audio Process. 17(2): 231-246 (2009) - [j10]Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe:
Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing. IEEE Trans. Speech Audio Process. 17(2): 324-334 (2009) - [j9]Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi:
Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction. IEEE Trans. Speech Audio Process. 17(4): 534-545 (2009) - [c52]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
Fast algorithm for conditional separation and dereverberation. EUSIPCO 2009: 1432-1436 - [c51]Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino:
Blind sparse source separation for unknown number of sources using Gaussian mixture model fitting with Dirichlet prior. ICASSP 2009: 33-36 - [c50]Hirokazu Kameoka, Tomohiro Nakatani, Takuya Yoshioka:
Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms. ICASSP 2009: 45-48 - [c49]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Real-time speech enhancement in noisy reverberant multi-talker environments based on a location-independent room acoustics model. ICASSP 2009: 137-140 - [c48]Takuya Yoshioka, Hideyuki Tachibana, Tomohiro Nakatani, Masato Miyoshi:
Adaptive dereverberation of speech signals with speaker-position change detection. ICASSP 2009: 3733-3736 - [c47]Kentaro Ishizuka, Shoko Araki, Kazuhiro Otsuka, Tomohiro Nakatani, Masakiyo Fujimoto:
A speaker diarization method based on the probabilistic fusion of audio-visual location information. ICMI 2009: 55-62 - [c46]Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino:
Stereo Source Separation and Source Counting with MAP Estimation with Dirichlet Prior Considering Spatial Aliasing Problem. ICA 2009: 742-750 - [c45]Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani:
A study of mutual front-end processing method based on statistical model for noise robust speech recognition. INTERSPEECH 2009: 1235-1238 - [c44]Takuya Yoshioka, Hirokazu Kameoka, Tomohiro Nakatani, Hiroshi G. Okuno:
Statistical models for speech dereverberation. WASPAA 2009: 145-148 - [c43]Katsuhiko Ishiguro, Takeshi Yamada, Shoko Araki, Tomohiro Nakatani:
A probabilistic speaker clustering for DOA-based diarization. WASPAA 2009: 241-244 - 2008
- [j8]Tomohiro Nakatani, Shigeaki Amano, Toshio Irino, Kentaro Ishizuka, Tadahisa Kondo:
A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments. Speech Commun. 50(3): 203-214 (2008) - [j7]Tomohiro Nakatani, Biing-Hwang Juang, Takuya Yoshioka, Keisuke Kinoshita, Marc Delcroix, Masato Miyoshi:
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model. IEEE Trans. Speech Audio Process. 16(8): 1512-1527 (2008) - [c42]Masato Miyoshi, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani:
Principles and applications of dereverberation for noisy and reverberant audio signals. ACSCC 2008: 793-796 - [c41]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
An integrated method for blind separation and dereverberation of convolutive audio mixtures. EUSIPCO 2008: 1-5 - [c40]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Blind speech dereverberation with multi-channel linear prediction based on short-time Fourier transform representation. ICASSP 2008: 85-88 - [c39]Marc Delcroix, Tomohiro Nakatani, Shinji Watanabe:
Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer. ICASSP 2008: 4073-4076 - [c38]Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani:
A voice activity detection based on the adaptive integration of multiple speech features and a signal decision scheme. ICASSP 2008: 4441-4444 - [c37]Takuya Yoshioka, Tomohiro Nakatani, Takafumi Hikichi, Masato Miyoshi:
Maximum likelihood approach to speech enhancement for noisy reverberant signals. ICASSP 2008: 4585-4588 - [c36]Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani:
Study of integration of statistical model-based voice activity detection and noise suppression. INTERSPEECH 2008: 2008-2011 - [c35]Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomohiro Nakatani, Reinhold Orglmeister, Shoji Makino:
Missing feature speech recognition in a meeting situation with maximum SNR beamforming. ISCAS 2008: 3218-3221 - 2007
- [j6]Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi:
Harmonicity-Based Blind Dereverberation for Single-Channel Speech Signals. IEEE Trans. Speech Audio Process. 15(1): 80-95 (2007) - [c34]Tomohiro Nakatani, Biing-Hwang Juang, Takafumi Hikichi, Takuya Yoshioka, Keisuke Kinoshita, Marc Delcroix, Masato Miyoshi:
Study on Speech Dereverberation with Autocorrelation Codebook. ICASSP (1) 2007: 193-196 - [c33]Juan E. Rubio, Kentaro Ishizuka, Hiroshi Sawada, Shoko Araki, Tomohiro Nakatani, Masakiyo Fujimoto:
Two-Microphone Voice Activity Detection Based on the Homogeneity of the Direction of Arrival Estimates. ICASSP (4) 2007: 385-388 - [c32]Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki:
Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio. INTERSPEECH 2007: 230-233 - [c31]Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi:
Multi-step linear prediction based speech dereverberation in noisy reverberant environment. INTERSPEECH 2007: 854-857 - [c30]Tomohiro Nakatani, Takafumi Hikichi, Keisuke Kinoshita, Takuya Yoshioka, Marc Delcroix, Masato Miyoshi, Biing-Hwang Juang:
Robust blind dereverberation of speech signals based on characteristics of short-time speech segments. ISCAS 2007: 2986-2989 - [c29]Biing-Hwang Juang, Tomohiro Nakatani:
Joint Source-Channel Modeling and Estimation for Speech Dereverberation. ISCAS 2007: 2990-2993 - 2006
- [j5]Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita:
Blind dereverberation of monaural speech signals based on harmonic structure. Syst. Comput. Jpn. 37(6): 1-12 (2006) - [j4]Kentaro Ishizuka, Tomohiro Nakatani:
A feature extraction method using subband based periodicity and aperiodicity decomposition with noise robust frontend processing for automatic speech recognition. Speech Commun. 48(11): 1447-1457 (2006) - [c28]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Spectral Subtraction Steered by Multi-Step Forward Linear Prediction For Single Channel Speech Dereverberation. ICASSP (1) 2006: 817-820 - [c27]Tomohiro Nakatani, Biing-Hwang Juang, Keisuke Kinoshita, Masato Miyoshi:
Speech Dereverberation Based on Probabilistic Models of Source and Room Acoustics. ICASSP (1) 2006: 821-824 - [c26]Kentaro Ishizuka, Tomohiro Nakatani:
Study of noise robust voice activity detection based on periodic component to aperiodic component ratio. SAPA@INTERSPEECH 2006: 65-70 - 2005
- [j3]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Harmonicity Based Dereverberation for Improving Automatic Speech Recognition Performance and Speech Intelligibility. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(7): 1724-1731 (2005) - [c25]Kentaro Ishizuka, Hiroko Kato Solvang, Tomohiro Nakatani:
Speech Signal Analysis with Exponential Autoregressive Model. ICASSP (1) 2005: 225-228 - [c24]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Fast Estimation of a Precise Dereverberation Filter based on Speech Harmonicity. ICASSP (1) 2005: 1073-1076 - [c23]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Efficient blind dereverberation framework for automatic speech recognition. INTERSPEECH 2005: 3145-3148 - 2004
- [c22]Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi, Parham Zolfaghari:
Harmonicity based blind dereverberation with time warping. SAPA@INTERSPEECH 2004: 53 - [c21]Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi, Parham Zolfaghari:
Harmonicity based monaural speech dereverberation with time warping and F0 adaptive window. INTERSPEECH 2004: 873-876 - [c20]Kentaro Ishizuka, Noboru Miyazaki, Tomohiro Nakatani, Yasuhiro Minami:
Improvement in robustness of speech feature extraction method using sub-band based periodicity and aperiodicity decomposition. INTERSPEECH 2004: 937-940 - [c19]Kazushi Ishihara, Yuya Hattori, Tomohiro Nakatani, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Disambiguation in determining phonemes of sound-imitation words for environmental sound recognition. INTERSPEECH 2004: 1485-1488 - [c18]Shigeaki Amano, Tomohiro Nakatani, Tadahisa Kondo:
Developmental changes in voiced-segment ratio for Japanese infants and parents. INTERSPEECH 2004: 1853-1856 - [c17]Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Improving automatic speech recognition performance and speech intelligibility with harmonicity based dereverberation. INTERSPEECH 2004: 2649-2652 - [c16]Kazushi Ishihara, Tomohiro Nakatani, Tetsuya Ogata, Hiroshi G. Okuno:
Automatic Sound-Imitation Word Recognition from Environmental Sounds Focusing on Ambiguity Problem in Determining Phonemes. PRICAI 2004: 909-918 - 2003
- [c15]Tomohiro Nakatani, Masato Miyoshi:
Blind dereverberation of single channel speech signal based on harmonic structure. ICASSP (1) 2003: 92-95 - [c14]Tomohiro Nakatani, Toshio Irino, Parham Zolfaghari:
Dominance spectrum based V/UV classification and F0 estimation. INTERSPEECH 2003: 2313-2316 - [c13]Parham Zolfaghari, Tomohiro Nakatani, Toshio Irino, Hideki Kawahara, Fumitada Itakura:
Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis. INTERSPEECH 2003: 2441-2444 - [c12]Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita:
One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals. NIPS 2003: 1417-1424 - 2002
- [c11]Tomohiro Nakatani, Toshio Irino:
Robust fundamental frequency estimation against background noise and spectral distortion. INTERSPEECH 2002: 1733-1736 - [c10]Toshio Irino, Yasuhiro Minami, Tomohiro Nakatani, Minoru Tsuzaki, H. Tagawa:
Evaluation of a speech recognition / generation method based on HMM and STRAIGHT. INTERSPEECH 2002: 2545-2548
1990 – 1999
- 1999
- [j2]Tomohiro Nakatani, Hiroshi G. Okuno:
Harmonic sound stream segregation using localization and its application to speech stream segregation. Speech Commun. 27(3-4): 209-222 (1999) - [j1]Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
Listening to two simultaneous speeches. Speech Commun. 27(3-4): 299-310 (1999) - 1998
- [c9]Tomohiro Nakatani, Hiroshi G. Okuno:
Sound Ontology for Computational Auditory Scene Analysis. AAAI/IAAI 1998: 1004-1010 - 1997
- [c8]Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
Understanding Three Simultaneous Speeches. IJCAI (1) 1997: 30-35 - 1996
- [c7]Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously. AAAI/IAAI, Vol. 2 1996: 1082-1089 - [c6]Tomohiro Nakatani, Masataka Goto, Hiroshi G. Okuno:
Localization by harmonic structure and its application to harmonic sound stream segregation. ICASSP 1996: 653-656 - [c5]Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata:
A new speech enhancement: speech stream segregation. ICSLP 1996: 2356-2359 - 1995
- [c4]Tomohiro Nakatani, Takeshi Kawabata, Hiroshi G. Okuno:
A computational model of sound stream segregation with multi-agent paradigm. ICASSP 1995: 2671-2674 - [c3]Tomohiro Nakatani, Hiroshi G. Okuno, Takeshi Kawabata:
Residue-Driven Architecture for Computational Auditory Scene Analysis. IJCAI 1995: 165-174 - 1994
- [c2]Tomohiro Nakatani, Hiroshi G. Okuno, Takeshi Kawabata:
Auditory Stream Segregation in Auditory Scene Analysis with a Multi-Agent System. AAAI 1994: 100-107 - [c1]Tomohiro Nakatani, Takeshi Kawabata, Hiroshi G. Okuno:
Unified architecture for auditory scene analysis and spoken language processing. ICSLP 1994: 1403-1406
last updated on 2025-01-27 00:49 CET by the dblp team
all metadata released as open data under CC0 1.0 license