Minsu Kim 0001
Person information
- affiliation: Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Integrated Vision and Language Laboratory, Daejeon, South Korea
- affiliation: Yonsei University, School of Electrical and Electronic Engineering, Seoul, South Korea
Other persons with the same name
- Minsu Kim (aka: Min-su Kim, Min-Su Kim) — disambiguation page
- Minsu Kim 0002 — Samsung Electronics Company Ltd, Network Business, Suwon, South Korea (and 1 more)
- Minsu Kim 0003 — Virginia Tech, Bradley Department of Electrical and Computer Engineering, Arlington, VA, USA
- Minsu Kim 0004 — Korea Advanced Institute of Science and Technology (KAIST), Department of Electrical Engineering, Daejeon, South Korea
2020 – today
- 2024
- [j4] Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro: Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3934-3946 (2024)
- [j3] Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro: AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model. IEEE Trans. Multim. 26: 6462-6474 (2024)
- [c25] Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation. ACL (1) 2024: 16334-16348
- [c24] Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro: AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation. CVPR 2024: 27315-27327
- [c23] Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing. EMNLP (Findings) 2024: 11391-11406
- [c22] Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro: Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation. ICASSP 2024: 4325-4329
- [c21] Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974
- [c20] Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro: Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models. ICASSP 2024: 8065-8069
- [c19] Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro: Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper. ICASSP 2024: 10471-10475
- [c18] Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro: Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation. ACM Multimedia 2024: 1311-1320
- [i27] Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro: Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units. CoRR abs/2401.09802 (2024)
- [i26] Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing. CoRR abs/2402.15151 (2024)
- [i25] Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro: TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024)
- [i24] Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation. CoRR abs/2406.07867 (2024)
- 2023
- [c17] Minsu Kim, Chae Won Kim, Yong Man Ro: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video. AAAI 2023: 8273-8281
- [c16] Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring. CVPR 2023: 18783-18794
- [c15] Minsu Kim, Joanna Hong, Yong Man Ro: Lip-to-Speech Synthesis in the Wild with Multi-Task Learning. ICASSP 2023: 1-5
- [c14] Jeong Hun Yeo, Minsu Kim, Yong Man Ro: Multi-Temporal Lip-Audio Memory for Visual Speech Recognition. ICASSP 2023: 1-5
- [c13] Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro: Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge. ICCV 2023: 15313-15325
- [c12] Jeongsoo Choi, Minsu Kim, Yong Man Ro: Intelligible Lip-to-Speech Synthesis with Speech Units. INTERSPEECH 2023: 4349-4353
- [i23] Minsu Kim, Hyung-Il Kim, Yong Man Ro: Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition. CoRR abs/2302.08102 (2023)
- [i22] Minsu Kim, Joanna Hong, Yong Man Ro: Lip-to-Speech Synthesis in the Wild with Multi-task Learning. CoRR abs/2302.08841 (2023)
- [i21] Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro: Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring. CoRR abs/2303.08536 (2023)
- [i20] Minsu Kim, Chae Won Kim, Yong Man Ro: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video. CoRR abs/2303.08670 (2023)
- [i19] Jeong Hun Yeo, Minsu Kim, Yong Man Ro: Multi-Temporal Lip-Audio Memory for Visual Speech Recognition. CoRR abs/2305.04542 (2023)
- [i18] Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro: Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation. CoRR abs/2305.19556 (2023)
- [i17] Jeongsoo Choi, Minsu Kim, Yong Man Ro: Intelligible Lip-to-Speech Synthesis with Speech Units. CoRR abs/2305.19603 (2023)
- [i16] Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro: Reprogramming Audio-driven Talking Face Synthesis into Text-driven. CoRR abs/2306.16003 (2023)
- [i15] Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro: Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation. CoRR abs/2308.01831 (2023)
- [i14] Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro: AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model. CoRR abs/2308.07593 (2023)
- [i13] Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro: Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge. CoRR abs/2308.09311 (2023)
- [i12] Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro: Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023)
- [i11] Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro: Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model. CoRR abs/2309.08535 (2023)
- [i10] Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro: DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion. CoRR abs/2310.05934 (2023)
- [i9] Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro: AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation. CoRR abs/2312.02512 (2023)
- 2022
- [j2] Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro: CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition. IEEE Trans. Multim. 24: 4342-4355 (2022)
- [c11] Minsu Kim, Jeong Hun Yeo, Yong Man Ro: Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading. AAAI 2022: 1174-1182
- [c10] Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro: SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory. AAAI 2022: 2062-2070
- [c9] Joanna Hong, Minsu Kim, Yong Man Ro: VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. ECCV (36) 2022: 452-468
- [c8] Minsu Kim, Hyunjun Kim, Yong Man Ro: Speaker-Adaptive Lip Reading with User-Dependent Padding. ECCV (36) 2022: 576-593
- [c7] Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro: Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition. INTERSPEECH 2022: 2838-2842
- [i8] Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video. CoRR abs/2204.01265 (2022)
- [i7] Minsu Kim, Jeong Hun Yeo, Yong Man Ro: Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading. CoRR abs/2204.01725 (2022)
- [i6] Minsu Kim, Joanna Hong, Yong Man Ro: Lip to Speech Synthesis with Visual Context Attentional GAN. CoRR abs/2204.01726 (2022)
- [i5] Joanna Hong, Minsu Kim, Yong Man Ro: VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. CoRR abs/2206.07458 (2022)
- [i4] Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro: Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition. CoRR abs/2207.06020 (2022)
- [i3] Minsu Kim, Hyunjun Kim, Yong Man Ro: Speaker-adaptive Lip Reading with User-dependent Padding. CoRR abs/2208.04498 (2022)
- [i2] Minsu Kim, Youngjoon Yu, Sungjune Park, Yong Man Ro: Meta Input: How to Leverage Off-the-Shelf Deep Neural Networks. CoRR abs/2210.13186 (2022)
- [i1] Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro: SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory. CoRR abs/2211.00924 (2022)
- 2021
- [j1] Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro: Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3654-3667 (2021)
- [c6] Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro: Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video. ICCV 2021: 296-306
- [c5] Junho Kim, Minsu Kim, Yong Man Ro: Interpretation of Lesional Detection via Counterfactual Generation. ICIP 2021: 96-100
- [c4] Minsu Kim, Joanna Hong, Yong Man Ro: Lip to Speech Synthesis with Visual Context Attentional GAN. NeurIPS 2021: 2758-2770
- 2020
- [c3] Minsu Kim, Hong Joo Lee, Sangmin Lee, Yong Man Ro: Robust Video Facial Authentication With Unsupervised Mode Disentanglement. ICIP 2020: 1321-1325
- [c2] Junho Kim, Minsu Kim, Jung Uk Kim, Hong Joo Lee, Sangmin Lee, Joanna Hong, Yong Man Ro: Learning Style Correlation for Elaborate Few-Shot Classification. ICIP 2020: 1791-1795
- [c1] Minsu Kim, Joanna Hong, Junho Kim, Hong Joo Lee, Yong Man Ro: Unsupervised Disentangling of Viewpoint and Residues Variations by Substituting Representations for Robust Face Recognition. ICPR 2020: 8952-8959
last updated on 2024-12-26 01:55 CET by the dblp team