MMM 2025, Nara, Japan - Part IV
- Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki:
MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part IV. Lecture Notes in Computer Science 15523, Springer 2025, ISBN 978-981-96-2070-8
Regular Papers
- Congjian Lu, Shuwang Zhou, Ke Shan, Hongkuan Zhang, Zhaoyang Liu:
SES-Net: Multi-dimensional Spot-Edge-Surface Network for Nuclei Segmentation. 3-15
- Zhuowei Chen, Mengqi Huang, Nan Chen, Zhendong Mao:
Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation. 16-29
- Yishan Lv, Jing Luo, Boyuan Ju, Xinyu Yang:
Small Tunes Transformer: Exploring Macro and Micro-level Hierarchies for Skeleton-Conditioned Melody Generation. 30-43
- Yongliang Zhang, Jing Liu:
SMG-Diff: Adversarial Attack Method Based on Semantic Mask-Guided Diffusion. 44-57
- Ding-Chi Chang, Shiou-Chi Li, Jen-Wei Huang:
SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models. 58-70
- Hui Zhao, Na Qi, Qing Zhu, Xiumin Lin:
SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction. 71-84
- Nikhil Sharma, Changchang Sun, Zhenghao Zhao, Anne Hee Hiong Ngu, Hugo Latapie, Yan Yan:
SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition. 85-99
- Zhiyi Fang, Yi Qian, Xiyue Dai:
Structural Information-Guided Fine-Grained Texture Image Inpainting. 100-113
- Lingyi Lu, Xin Xu, Xiao Wang:
Style Separation and Content Recovery for Generalizable Sketch Re-identification and a New Benchmark. 114-127
- Daniele Bonatto, Sarah Fachada, Jaime Sancho, Eduardo Juárez, Gauthier Lafruit, Mehrdad Teratani:
Synchronization and Calibration of Video Sequences Acquired Using Multiple Plenoptic 2.0 Cameras. 128-140
- Zihao Suo, Shanliang Pan:
Target-Oriented Dynamic Denosing Curriculum Learning for Multimodel Stance Detection. 141-154
- Yizhou Li, Zihua Liu, Yusuke Monno, Masatoshi Okutomi:
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration. 155-169
- Shuhei Yamamoto, Noriko Kando:
Temporal Closeness for Enhanced Cross-Modal Retrieval of Sensor and Image Data. 170-183
- Bjørn Aslak Juliussen:
The Right to an Explanation Under the GDPR and the AI Act. 184-197
- Joshua Springer, Gylfi Þór Guðmundsson, Marcel Kyas:
Toward Appearance-Based Autonomous Landing Site Identification for Multirotor Drones in Unstructured Environments. 198-211
- Saumya Yadav, Élise Lincker, Caroline Huron, Stéphanie Martin, Camille Guinaudeau, Shin'ichi Satoh, Jainendra Shukla:
Towards Inclusive Education: Multimodal Classification of Textbook Images for Accessibility. 212-225
- Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide:
Towards Visual Storytelling by Understanding Narrative Context Through Scene-Graphs. 226-239
- Li Yao, Qianni Huang, Yan Wan:
TPS-YOLO: The Efficient Tiny Person Detection Network Based on Improved YOLOv8 and Model Pruning. 240-252
- Junjian Chen, Xuan Yang:
Uncertainty-Guided Joint Semi-supervised Segmentation and Registration of Cardiac Images. 253-267
- Qian Cao, Ruihua Song, Xu Chen:
Understanding the Roles of Visual Modality in Multimodal Dialogue: An Empirical Study. 268-282
- Sotirios Papadopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Ioannis Kompatsiaris, Ioannis Patras:
Vision-Language Pretraining for Variable-Shot Image Classification. 283-297
- Yu Li, Zhenping Xie:
Visual Anomaly Detection on Topological Connectivity Under Improved YOLOv8. 298-310
- Takamasa Terada, Masahiro Toyoura:
Wavelet Integrated Convolutional Neural Network for ECG Signal Denoising. 311-324
- Feng Li, Jiusong Luo, Wanjun Xia:
WavFusion: Towards Wav2vec 2.0 Multimodal Speech Emotion Recognition. 325-336
- Weijie Wu, Jun Li, Zhijian Wu, Jianhua Xu:
Zero-Shot Sketch-Based Image Retrieval with Hybrid Information Fusion and Sample Relationship Modeling. 337-350
Special Session: ExpertSUM: Special Session on Expert-Level Text Summarization from Fine-Grained Multimedia Analytics
- Hikaru Tanabe, Keiji Yanai:
CalorieVoL: Integrating Volumetric Context Into Multimodal Large Language Models for Image-Based Calorie Estimation. 353-365
- Takumi Fukuzawa, Kensho Hara, Hirokatsu Kataoka, Toru Tamaki:
Can Masking Background and Object Reduce Static Bias for Zero-Shot Action Recognition? 366-379
Special Session: MLLMA: Special Session on Multimodal Large Language Models and Applications
- Su Li, Liang Wang, Jianye Wang, Ziheng Zhang, Junjun Zhang, Lei Zhang:
Enhanced Anomaly Detection in 3D Motion Through Language-Inspired Occlusion-Aware Modeling. 383-397
- Khanh-An C. Quan, Camille Guinaudeau, Shin'ichi Satoh:
Evaluating VQA Models' Consistency in the Scientific Domain. 398-412
- Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Evangelos Kanoulas:
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models. 413-427
- Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide:
Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models. 428-441
- Wei Wei, Bingkun Zhang, Yibing Wang:
TACST: Time-Aware Transformer for Robust Speech Emotion Recognition. 442-453
- Wei Wei, Bingkun Zhang, Yibing Wang:
TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion. 454-467