ICMR 2024: Phuket, Thailand
- Cathal Gurrin, Rachada Kongkachandra, Klaus Schoeffmann, Duc-Tien Dang-Nguyen, Luca Rossetto, Shin'ichi Satoh, Liting Zhou:
Proceedings of the 2024 International Conference on Multimedia Retrieval, ICMR 2024, Phuket, Thailand, June 10-14, 2024. ACM 2024
Regular Long Papers
- Xinzhe Ni, Yong Liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang:
Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition. 1-10 - Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan:
BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval. 11-19 - Yang Xu, Yifan Feng, Lin Bie:
Triadic Elastic Structure Representation for Open-Set Incremental 3D Object Retrieval. 20-28 - Stephan Repp, Ernst Georg Haffner:
Dynamic Segmentation for Efficient Retrieval of Podcasts: The Repping Algorithm. 29-36 - Zhaoxin Fan, Fengxin Li, Hongyan Liu, Jun He, Xiaoyong Du:
PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos. 37-45 - Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li:
Progressive Multi-modal Conditional Prompt Tuning. 46-54 - Zhaoxin Fan, Zhenbo Song, Zhicheng Wang, Jian Xu, Kejian Wu, Hongyan Liu, Jun He:
ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation. 55-63 - Yunfeng Yu, Longlong Lin, Qiyu Liu, Zeli Wang, Xi Ou, Tao Jia:
GSD-GNN: Generalizable and Scalable Algorithms for Decoupled Graph Neural Networks. 64-72 - Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan:
Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank. 73-82 - Hua Gao, Chenchen Hu, Guang Han, Jiafa Mao, Wei Huang, Kaiyuan Wan:
HashNeck is a Boosting Tool for Deep Learning to Hashing. 83-91 - Di Wang, Feng Yan, Yifeng Wang, Lin Zhao, Xiao Liang, Haodi Zhong, Ronghua Zhang:
Fine-grained Semantics-aware Representation Learning for Text-based Person Retrieval. 92-100 - Guangzhe Zhao, Yanan Liu, Xueping Wang, Feihu Yan:
CMFF-Face: Attention-Based Cross-Modal Feature Fusion for High-Quality Audio-Driven Talking Face Generation. 101-110 - Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu:
Learning from Reduced Labels for Long-Tailed Data. 111-119 - Tianyi Wang, Shenghua Zhong:
Fingerprinting in EEG Model IP Protection Using Diffusion Model. 120-128 - Weixing Liu, Shenghua Zhong:
MarginFinger: Controlling Generated Fingerprint Distance to Classification boundary Using Conditional GANs. 129-136 - Chuang Zhao, Hefei Ling, Shijie Lu, Yuxuan Shi, Jiazhong Chen, Ping Li:
Improve Deep Hashing with Language Guidance for Unsupervised Image Retrieval. 137-145 - Yue Yang, Liangjun Ke:
Exploiting Degradation Prior for Personalized Federated Learning in Real-World Image Super-Resolution. 146-154 - Hui Liu, Xiaojun Wan:
QAVidCap: Enhancing Video Captioning through Question Answering Techniques. 155-164 - Fanlei Meng, Xiangru Chen, Yuan Cao:
Targeted Universal Adversarial Attack on Deep Hash Networks. 165-174 - Feifei Fu, Yizhao Gao, Zhiwu Lu:
Enhancing Class-Incremental Learning for Image Classification via Bidirectional Transport and Selective Momentum. 175-183 - Mingzhe Yu, Yunshan Ma, Lei Wu, Kai Cheng, Xue Li, Lei Meng, Tat-Seng Chua:
Smart Fitting Room: A One-stop Framework for Matching-aware Virtual Try-On. 184-192 - Mingyue Li, Yuting Zhu, Ruizhong Du, Chunfu Jia:
Secure Verification Encrypted Image Retrieval Scheme with Addition Homomorphic Bitmap Index. 193-201 - Xingquan Cai, Haoyu Zhang, Shanshan He, Haoyu Song, Haiyan Sun:
A Novel Auxiliary Task Framework in 3D Human Pose Estimation for Opera Videos. 202-210 - Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu:
Anchor-aware Deep Metric Learning for Audio-visual Retrieval. 211-219 - Jiaao Yu, Yunlai Ding, Junyu Dong, Yuezun Li:
Dynamic Soft Labeling for Visual Semantic Embedding. 220-228 - Feifei Xu, Ziheng Yu:
Navigating Style Variations in Scene Text Image Super-Resolution through Multi-Scale Perception. 229-238 - Depei Liu, Hongjie Fan, Junfei Liu:
ExpoGenius: Robust Personalized Human Image Generation using Diffusion Model for Exposure Variation and Pose Transfer. 239-247 - Xudong Ru, Haichuan Zhao, Xingce Wang, Zhongke Wu, Shaolong Liu, Yi-Cheng Zhu, Alejandro F. Frangi:
Vector-Aware Anisotropic Gauge Equivariant Mesh Convolution Network for 3D Aneurysm Detection. 248-256 - Junming Wang, Yi Shi:
NeurNCD: Novel Class Discovery via Implicit Neural Representation. 257-265 - Lin Bie, Siqi Li, Kai Cheng:
Image-to-Point Registration via Cross-Modality Correspondence Retrieval. 266-274 - Lilong Wen, Xiu Tang, Dongxiang Zhang:
TWIST: Text-only Weakly Supervised Scene Text Spotting Using Pseudo Labels. 275-284 - Xintao Jiao, Jiansheng Chen, Jiale Liu:
A Graph Convolution Network with a POS-aware Filter and Context Enhancement Mechanism for Event Detection. 285-292 - Florian Spiess, Nicolas Scharowski, Ariane Haller, Zgjim Memeti, Heiko Schuldt, Florian Brühlmann:
Bringing Video Browsing to Virtual Reality: Empirical Evaluation of a Novel Multimedia Drawer. 293-301 - Changgu Chen, Yang Li, Jian Zhang, Jiali Liu, Changbo Wang:
Generative Data Augmentation with Liveness Information Preserving for Face Anti-Spoofing. 302-310 - Lucas Joos, Bastian Jäckl, Daniel A. Keim, Maximilian T. Fischer, Ladislav Peska, Jakub Lokoc:
Known-Item Search in Video: An Eye Tracking-Based Study. 311-319 - Huixia Ben, Shuo Wang, Meng Wang, Richang Hong:
Pseudo Content Hallucination for Unpaired Image Captioning. 320-329 - Haiyang Zheng, Ruilin Zhang, Hongpeng Wang:
Deep Image Clustering Based on Curriculum Learning and Density Information. 330-338 - Jiaxin Li, Zhihan Yu, Guibo Luo, Yuesheng Zhu:
CodeDetector: Revealing Forgery Traces with Codebook for Generalized Deepfake Detection. 339-347 - Zeli Wang, Jian Li, Shuyin Xia, Longlong Lin, Guoyin Wang:
Text Adversarial Defense via Granular-Ball Sample Enhancement. 348-356 - Zeli Wang, Tuo Zhang, Shuyin Xia, Longlong Lin, Guoyin Wang:
GBRAIN: Combating Textual Label Noise by Granular-ball based Robust Training. 357-365 - Wei Tang, Yuanyi Wang:
Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation. 366-375 - Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang:
Retrieval-Augmented Audio Deepfake Detection. 376-384 - Yongcheng Zhang, Lingou Kong, Sheng Tian, Hao Fei, Changpeng Xiang, Huan Wang, Xiaomei Wei:
Multi-view Counterfactual Contrastive Learning for Fact-checking Fake News Detection. 385-393 - Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng:
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement. 394-403 - Albatool Wazzan, Imtiaz Ahmad, Stephen MacNeil, Richard Souvenir:
Context or Clutter? Efficiently Matching Objects Across Scenes. 404-413 - Tianpeng Zhang, Xuesong Jiang:
A Lightweight Surface Defect Segmentation Network with External Semantics and High-frequency Information. 414-422 - Zhenghao Zhao, Hao Tang, Joy Wan, Yan Yan:
Monocular Expressive 3D Human Reconstruction of Multiple People. 423-432 - Mei Yu, Xiaoxi Zhou, Mankun Zhao, Tianyi Xu, Yue Zhao, Ruiguo Yu, Xuewei Li:
A Causal View for Multi-Interest User Modeling in News Recommendation. 433-441 - Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou:
Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection. 442-450 - Yichen Yan, Xingjian He, Sihan Chen, Jing Liu:
Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation. 451-459 - Thao-Nhu Nguyen, Zongyao Li, Satoshi Yamazaki, Jianquan Liu, Cathal Gurrin:
A Parallel Transformer Framework for Video Moment Retrieval. 460-468 - Pengfei Wei, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng, Qingpeng Wen:
VEC-MNER: Hybrid Transformer with Visual-Enhanced Cross-Modal Multi-level Interaction for Multimodal NER. 469-477 - Weiwei Zhou, Guoqiang Xiao, Michael S. Lew, Song Wu:
Causal Inference-based Few-Shot Class-Incremental Learning. 478-487 - Zixin Tang, Haihui Fan, Xiaoyan Gu, Yang Li, Bo Li, Xin Wang:
ELSEIR: A Privacy-Preserving Large-Scale Image Retrieval Framework for Outsourced Data Sharing. 488-496 - Yijing Zhao, Yuchao Xia, Yi Ding, Yumeng Liu, Shuai Liu, Hongan Wang:
S2F-Net: Shared-Specific Fusion Network for Infrared and Visible Image Fusion. 497-505 - Gullal S. Cheema, Judi Arafat, Chiao-I Tseng, John A. Bateman, Ralph Ewerth, Eric Müller-Budack:
Identification of Speaker Roles and Situation Types in News Videos. 506-514 - Tianwei Chen, Noa Garcia, Liangzhi Li, Yuta Nakashima:
Retrieving Emotional Stimuli in Artworks. 515-523 - Pengfei Wei, Zhaokang Huang, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng:
CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction. 524-532 - Chenxiao Liu, Zheyong Xie, Sirui Zhao, Jin Zhou, Tong Xu, Minglei Li, Enhong Chen:
Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation. 533-542 - Zhirui Kuai, Yulu Zhou, Qi Xie, Li Kuang:
Multi-Source Augmentation and Composite Prompts for Visual Recognition with Missing Modality. 543-551 - Xiangyu Liu, Yanlei Shang, Yong Chen:
TriMPL: Masked Multi-Prompt Learning with Knowledge Mixing for Vision-Language Few-shot Learning. 552-560 - Zhongnian Li, Peng Ying, Meng Wei, Tongfeng Sun, Xinzheng Xu:
Prompt Expending for Single Positive Multi-Label Learning with Global Unannotated Categories. 561-569 - Yaqun Fang, Yi Shi, Jia Bei, Tongwei Ren:
Semantic-guided RGB-Thermal Crowd Counting with Segment Anything Model. 570-578 - Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang:
Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning. 579-588 - Zhenyu Xie, Huanyu He, Gui Zou, Jie Wu, Guoliang Liu, Jun Zhao, Yingxue Wang, Hui Lin, Weiyao Lin:
Visibility-guided Human Body Reconstruction from Uncalibrated Multi-view Cameras. 589-598 - Yilin Li, Tszyin Guo, Ying Qiao, Zitong Bo, Hongan Wang:
FEST: A Multi-way Framework with Enhanced Spatial-Temporal Modeling for Traffic Forecasting. 599-607 - Yuchen Niu, Min Zhu, Zhihua Wei:
SamCap: Energy-based Controllable Image Captioning by Gradient-Based Sampling. 608-617 - Zhuoyuan Wei, Xun Jiang, Zheng Wang, Fumin Shen, Xing Xu:
PTAN: Principal Token-aware Adjacent Network for Compositional Temporal Grounding. 618-627 - Chao Ye, Qian Wang, Lanfang Dong:
A Hybrid Few-Shot Image Classification Framework Combining Gaussian Modeling and Label Propagation. 628-637 - Shizhou Huang, Bo Xu, Changqun Li, Jiabo Ye, Xin Lin:
A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis. 638-646 - Zhikai Hu, Yiu-ming Cheung, Yonggang Zhang, Peiying Zhang, Pui-ling Tang:
Component-Level Oracle Bone Inscription Retrieval. 647-656 - Nico Hezel, Kai Uwe Barthel, Konstantin Schall, Klaus Jung:
An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval. 657-665 - Siqi Wei, Bin Wu:
Intra and Inter-modality Incongruity Modeling and Adversarial Contrastive Learning for Multimodal Fake News Detection. 666-674 - Kaixing Yang, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan:
CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit. 675-683 - Yuwen Yang, Yuxiang Lu, Suizhi Huang, Shalayiding Sirejiding, Hongtao Lu, Yue Ding:
Federated Multi-Task Learning on Non-IID Data Silos: An Experimental Study. 684-693 - Xiaoqian Liang, Jianji Wang, Yuanliang Lu, Xubin Duan, Xichun Liu, Nanning Zheng:
Refracting Once is Enough: Neural Radiance Fields for Novel-View Synthesis of Real Refractive Objects. 694-703 - Bo Li, You Wu, Zhixin Li:
Team HUGE: Image-Text Matching via Hierarchical and Unified Graph Enhancing. 704-712 - Peijia Chen, Ke Qi, Xi Tao, Wenhao Xu, Jingdong Zhang:
MFVG: A Visual Grounding Network with Multi-scale Fusion. 713-721 - Zhijian Wu, Wenhui Liu, Dingjiang Huang:
When Handcrafted Filter Meets CNN: A Lightweight Conv-Filter Mixer Network for Efficient Image Super-Resolution. 722-730 - Dahuang Liu, Jiuxiang You, Guobo Xie, Lap-Kei Lee, Fu Lee Wang, Zhenguo Yang:
Modality-specific and -shared Contrastive Learning for Sentiment Analysis. 731-739 - Zhuohua Li, Ruyun Wang, Fuqing Zhu, Jizhong Han, Songlin Hu:
Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification. 740-748 - Xuanhao Qi, Min Zhi, Yanjun Yin, Ping Ping, Yuening Zhang:
SFAM: Lightweight Spectrum Unreferenced Attention Network. 749-757 - Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou:
FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations. 758-766 - Weipeng Yang, Hongxia Gao, Wenbin Zou, Tongtong Liu, Shasha Huang, Jianliang Ma:
Low-Light Image Enhancement via Weighted Low-Rank Tensor Regularized Retinex Model. 767-775 - Lai Wei, Shanshan Song:
Multi-view Subspace Clustering via An Adaptive Consensus Graph Filter. 776-784 - Ruihai Wu, Yourong Zhang, Yu Qi, Andy Guanhong Chen, Hao Dong:
Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns. 785-794 - Xigang Bao, Mengyuan Tian, Luyao Wang, Zhiyuan Zha, Biao Qin:
Contrastive Pre-training with Multi-level Alignment for Grounded Multimodal Named Entity Recognition. 795-803 - Jian Yang, Weize Quan, Zhen Shen, Dong-Ming Yan, Huaiyu Wu:
Neural Parametric Human Hand Modeling with Point Cloud Representation. 804-813 - Yi Li, Qingmeng Zhu, Changwen Zheng, Jiangmeng Li:
MSI: Multi-modal Recommendation via Superfluous Semantics Discarding and Interaction Preserving. 814-823 - Chao He, Hongxi Wei:
HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval. 824-832 - Lisong Ou, Zhixin Li:
Modeling Multi-Task Joint Training of Aggregate Networks for Multi-Modal Sarcasm Detection. 833-841 - Ziyu Gong, Chengcheng Mai, Yihua Huang:
ML2MG-VLCR: A Multimodal LLM Guided Zero-shot Method for Visio-linguistic Compositional Reasoning with Autoregressive Generative Language Model. 842-850 - Ziqing Deng, Zhihui Lai, Yujuan Ding, Heng Kong, Xu Wu:
Deep Scaling Factor Quantization Network for Large-scale Image Retrieval. 851-859 - Yan Wang, Yawen Zeng, Junjie Liang, Xiaofen Xing, Jin Xu, Xiangmin Xu:
RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation. 860-868 - Runlai Hao, Jinlong Li, Qiuju Chen, Huanhuan Chen:
DualStyle3D: Real-time Exemplar-based Artistic Portrait View Synthesis Based on Radiance Field. 869-877 - Jiancheng Huang, Mingfu Yan, Yifan Liu, Shifeng Chen:
SBCR: Stochasticity Beats Content Restriction Problem in Training and Tuning Free Image Editing. 878-887 - Shenghao Liu, Yuqin Lan, Xianjun Deng, Lingzhi Yi, Chenlu Zhu, Laurence T. Yang, Jong Hyuk Park:
TrustGo: Trust Mining and Multi-semantic Regularization in Social Recommendation. 888-896 - Beiqi Liu, Fuqing Duan, Junli Zhao:
SkeletonFormer: Point Cloud Completion with Dynamic Selective Skeleton Points. 897-905 - Chen Huang, Zhijun Fan, Kui Xiao, Yan Zhang, Shihui Wang, Jianhua Song, Wei Wu, Chao Liu:
Research on Epilepsy Classification Model Based on Variational Mode Quadratic Decomposition. 906-914 - Xukun Zhou, Zhenbo Song, Jun He, Hongyan Liu, Zhaoxin Fan:
STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation. 915-924 - Anrui Wang, Libo Weng, Fei Gao:
BFIDet: A YOLOv7-improved Vehicle and Pedestrian Detector via Balancing Feature Integration. 925-933 - Chun-Yen Chen, Mei-Chen Yeh:
Self-Supervised Multi-Label Classification with Global Context and Local Attention. 934-942 - Tianlong Zhang, Jing Lv, Ming Yang:
Semi-Parametric Style Transfer with Multi-Perspective Feature Fusion and Information-Guided Alignment. 943-950 - Kontawat Wisetpaitoon, Sattaya Singkul, Theerat Sakdejayont, Tawunrat Chalothorn:
End-to-End Thai Text-to-Speech with Linguistic Unit. 951-959 - Linhao Zhou, Sheng-Hua Zhong, Zhijiao Xiao:
Discovering Multi-Relational Integration for Knowledge Tracing with Retentive Networks. 960-968 - Qin Jiang, Qinglin Wang, Lihua Chi, Wentao Ma, Feng Li, Jie Liu:
DeepEnhancer: Temporally Consistent Focal Transformer for Comprehensive Video Enhancement. 969-977 - Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, Evangelos Kanoulas:
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models. 978-987 - Yitong Xing, Guoqiang Xiao, Michael S. Lew, Song Wu:
Lifelong Visible-Infrared Person Re-Identification via a Tri-Token Transformer with a Query-Key Mechanism. 988-997 - Wenzhuo Li, Yinghui Wang, Wei Li, Liangyi Huang, Kamoliddin Shukurov, Mingfeng Wang:
Wireless Capsule Endoscope Low-light Image Enhancement with Balanced Brightness and Saturation. 998-1005 - Sohail Ahmed Khan, Duc-Tien Dang-Nguyen:
CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection. 1006-1015 - Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu:
RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory. 1016-1024 - Yongkang Ding, Anqi Wang, Liyan Zhang:
Multidimensional Semantic Disentanglement Network for Clothes-Changing Person Re-Identification. 1025-1033 - Yuting Mei, Linli Yao, Qin Jin:
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos. 1034-1042 - Ali Abdari, Alex Falcon, Giuseppe Serra:
AdOCTeRA: Adaptive Optimization Constraints for improved Text-guided Retrieval of Apartments. 1043-1050 - Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng:
G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning. 1051-1060 - Minyang Xu, Yunzhong Lou, Weijian Ma, Xueyang Li, Xiangdong Zhou:
Parametric CAD Primitive Retrieval via Multi-Modal Fusion and Deep Hashing. 1061-1069 - Lai Wei, Mingyuan Xi:
Subspace Clustering with A Hybrid Adaptive Graph Filter. 1070-1078
Regular Short Papers
- Cencen Liu, Dongyang Zhang, Ke Qin:
Knowledge Distillation for Single Image Super-Resolution via Contrastive Learning. 1079-1083 - Yuhang Zheng, Zhen Wang, Long Chen:
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning. 1084-1088 - Shuyang Zhang, Liangwu Wei, Qingyu Wang, Yuntao Wei, Yanzhi Song:
CLCP: Realtime Text-Image Retrieval for Retailing via Pre-trained Clustering and Priority Queue. 1089-1093 - Mengzhu Yu, Zhenjun Tang, Huijiang Zhuang, Xiaoping Liang, Zhixin Li, Xianquan Zhang:
Robust Video Hashing with Non-negative Tensor Factorization for Copy Detection. 1094-1098 - Yihua Chen, Xiaoping Liang, Mengzhu Yu, Zhenjun Tang:
Unifying Pictorial and Textual Features for Screen Content Image Quality Evaluation. 1099-1103 - Mingyong Li, Zongwei Zhao, Xiaolong Jiang, Zheng Jiang:
CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. 1104-1109 - Peihao Li, Jie Huang, Shuaishuai Zhang, Chunyang Qi:
Proactive Privacy and Intellectual Property Protection of Multimedia Retrieval Models in Edge Intelligence. 1110-1114 - Ruonan Zhang, Xiaohang Liu, Ge Li, Thomas H. Li, Pengjun Zhao:
Sketch-aided Interactive Fusion Point Cloud Place Recognition. 1115-1119 - Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou:
TIM: Temporal Interaction Model in Notification System. 1120-1124 - Quan Li, Xike Xie, Chao Wang, Jiali Weng:
Local Deep Learning Quantization for Approximate Nearest Neighbor Search. 1125-1129 - Pengfei Zhou, Fangxiang Feng, Xiaojie Wang:
DiffHarmony: Latent Diffusion Model Meets Image Harmonization. 1130-1134 - Haoran Tong, Xinyan Liu, Guorong Li, Laiyun Qing:
Directly Locating Actions in Video with Single Frame Annotation. 1135-1139 - Ruoxi Sun, Xinyu Yang, Cong Qian, Chenyu Zhu, Wei Sui, Zeyd Boukhers, Cong Yang:
YawnNet: A Visual-Centric Approach for Yawning Detection. 1140-1144 - Eisaku Yoshikawa, Keishi Tajima:
Content-Based Exclusion Queries in Keyword-Based Image Retrieval. 1145-1149 - Zhikang Zhang, Zhongjie Zhu, Yongqiang Bai, Ming Wang, Zhijing Yu:
Octree-Retention Fusion: A High-Performance Context Model for Point Cloud Geometry Compression. 1150-1154 - Zhuo Lei, Qiang Yu, Lidan Shou, Shengquan Li, Yunqing Mao:
A GAN based Video Summarization Method with Representation Loss. 1155-1159 - Sherzod Hakimov, Gullal S. Cheema:
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict. 1160-1164 - Minh-Son Dao, Koji Zettsu:
Near-Miss Accident Prediction on the Edge: A Real-Time System for Safer Driving. 1165-1169 - Qinghua Sun, Jia Cui, Zhenyu Gu:
Extending CLIP for Text-to-font Retrieval. 1170-1174 - Xitie Zhang, Suping Wu:
CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. 1175-1179 - Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang:
PiCoGen: Generate Piano Covers with a Two-stage Approach. 1180-1184 - Yueying Feng, Fan Ma, Wang Lin, Chang Yao, Jingyuan Chen, Yi Yang:
FedPAM: Federated Personalized Augmentation Model for Text-to-Image Retrieval. 1185-1189
Brave New Ideas Papers
- Lorin Sweeney, Graham Healy, Alan F. Smeaton:
Reconciling the Rift Between Recognition and Recall: Insights from a Video Memorability Drawing Experiment. 1190-1198 - Kai Uwe Barthel, Florian Tim Barthel, Peter Eisert, Nico Hezel, Konstantin Schall:
Creating Sorted Grid Layouts with Gradient-based Optimization. 1199-1206 - Christian Limberg, Zhe Zhang:
Mapping the Audio Landscape for Innovative Music Sample Generation. 1207-1213
Doctoral Symposium Papers
- Jia-Hong Huang:
Multi-modal Video Summarization. 1214-1218 - Maria Eirini Pegia:
Multimodality in Media Retrieval. 1219-1223
Reproducibility Track Papers
- Shuiying Liao, Yujuan Ding, P. Y. Mok, Qiushi Huang, Jialun Cao:
Reproducibility Companion Paper: Recommendation of Mix-and-Match Clothing by Modeling Indirect Personal Compatibility. 1224-1227 - Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng:
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. 1228-1231 - Fan Yu, Beibei Zhang, Yaqun Fang, Jia Bei, Tongwei Ren, Jiyi Li, Luca Rossetto:
Reproducibility Companion Paper of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style". 1232-1235
Technical Demonstrations
- Luca Rossetto:
OpenLifelogCam - A Low-Cost Open-Source Wearable Camera Platform. 1236-1240 - Panumate Chetprayoon, Sakol Tasanangam, Gayatri Tirumalasetty, Thanatwit Angsarawanee, Paveen Virameteekul, Wadeepas Lertwatanawanich, Theerat Sakdejayont:
CarAI: Car Inspection with Artificial Intelligence. 1241-1245 - Kuo-Yu Liu, Ting-Yu Guo, Ta-Shan Pan, Ping-Yi Tung, Yi-Rou Lin:
AI Batting Buddy: A Computational and Kinematic Approach for Enhancing Batting Performance and Analysis in Baseball. 1246-1250 - Supatta Viriyavisuthisakul, Parinya Sanguansat, Toshihiko Yamasaki:
A Web Demo Interface for Super-Resolution Reconstruction with Parametric Regularization Loss. 1251-1254 - Quang-Linh Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin:
MemoriLens: a Low-cost Lifelog Camera Using Raspberry Pi Zero. 1255-1259 - Maria Eirini Pegia, Dimitris Georgalis, Nick Pantelidis, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris:
3DMSE: An Interactive 3D Media Search Engine. 1260-1264 - Daniel D. Braghis, Haiming Liu:
Conversational Image Search: A Sketch-based Approach. 1265-1269 - Wang Xia, Guodao Sun, Zihao Zhu, Pan Liang, Sujia Zhu, Yiming Wu, Haoran Liang, Ronghua Liang:
RE-IDVIS: Person Re-Identification System based on Interactive Visualization. 1270-1274
Challenge Papers
- Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pål Halvorsen, Anh-Duy Tran, Minh-Son Dao, Minh-Triet Tran:
Overview of the Grand Challenge on Detecting Cheapfakes at ACM ICMR 2024. 1275-1281 - Hoa-Vien Vo-Hoang, Long-Khanh Pham, Minh-Son Dao:
Detecting Out-of-Context Media with LLaMa-Adapter V2 and RoBERTa: An Effective Method for Cheapfakes Detection. 1282-1287 - Long-Khanh Pham, Hoa-Vien Vo-Hoang, Anh-Duy Tran:
A Generative Adaptive Context Learning Framework for Large Language Models in Cheapfake Detection. 1288-1293 - Anh-Thu Le, Minh-Dat Nguyen, Minh-Son Dao, Anh-Duy Tran, Duc-Tien Dang-Nguyen:
TeGA: A Text-Guided Generative-based Approach in Cheapfake Detection. 1294-1299 - Van-Loc Nguyen, Bao-Tin Nguyen, Thanh-Son Nguyen, Duc-Tien Dang-Nguyen, Minh-Triet Tran:
A Unified Network for Detecting Out-Of-Context Information Using Generative Synthetic Data. 1300-1305 - Dang Vu, Minh-Nhat Nguyen, Quoc-Trung Nguyen:
Enhancing Cheapfake Detection: An Approach Using Prompt Engineering and Interleaved Text-Image Model. 1306-1311 - Jangwon Seo, Hyo-Seok Hwang, Jiyoung Lee, Minhyeok Lee, Wonsuk Kim, Junhee Seok:
A Multi-Stage Deep Learning Approach Incorporating Text-Image and Image-Image Comparisons for Cheapfake Detection. 1312-1316
Invited Talks Abstracts
- Alan F. Smeaton:
The LLM Wrecking Ball: Are We About to Lose Decades of Work in Multimedia because of MM-LLMs? 1317 - Yi-Ping Phoebe Chen:
Diversity in Multimedia. 1318
Tutorial Abstracts
- Frank Sommers, Alisa Kongthon, Sarawoot Kongyoung:
Fine-Tuning Large Language Models for Private Document Retrieval: A Tutorial. 1319-1320 - Vinh Dang, Thanh-Son Nguyen, Minh-Triet Tran, Duc-Tien Dang-Nguyen:
Detecting Misinformation in Photos Utilizing Reverse Image Search. 1321-1323 - Maria Pegia, Sotiris Diplaris, Stefanos Vrochidis, Heiko Schuldt, Florian Spiess, Rahel Arnold, Werner Bailer:
Multimedia Retrieval in and for XR. 1324-1325 - Shiqi Wang, Xinfeng Zhang:
Compact Visual Data Representation for Multimedia Search and Analytics. 1326-1327
Workshop Abstracts
- Tai Tan Mai, Quang-Linh Tran, Ly-Duyen Tran, Tu V. Ninh, Duc-Tien Dang-Nguyen, Cathal Gurrin:
The First ACM Workshop on AI-Powered Question Answering Systems for Multimedia. 1328-1329 - Mahasak Ketcham, Kanyalag Phodong, Patiyuth Pramkeaw, Worawut Yimyam, Narumol Chumuang, Pokpong Songmuang, Thittaporn Ganokratanaa:
AI-SIPM 2024: International Workshop on Artificial Intelligence for Signal, Image Processing and Multimedia. 1330-1331 - Minh-Son Dao, Michael Alexander Riegler, Duc-Tien Dang-Nguyen, Hanh-Nhi Tran, Rage Uday Kiran, Takahiro Komamizu:
ICDAR 24: Intelligent Cross-Data Analysis and Retrieval. 1332-1333 - Cathal Gurrin, Liting Zhou, Graham Healy, Werner Bailer, Duc-Tien Dang-Nguyen, Steve Hodges, Björn Þór Jónsson, Jakub Lokoc, Luca Rossetto, Minh-Triet Tran, Klaus Schöffmann:
Introduction to the Seventh Annual Lifelog Search Challenge, LSC'24. 1334-1335 - Zhedong Zheng, Yaxiong Wang, Xuelin Qian, Zhun Zhong, Zheng Wang, Liang Zheng:
MORE'24 Multimedia Object Re-ID: Advancements, Challenges, and Opportunities. 1336-1338 - Cristian Lucian Stanciu, Bogdan Ionescu, Luca Cuccovillo, Symeon Papadopoulos, Giorgos Kordopatis-Zilos, Adrian Popescu, Roberto Caldelli:
MAD '24 Workshop: Multimedia AI against Disinformation. 1339-1341 - Marc A. Kastner, Gullal S. Cheema, Sherzod Hakimov, Noa Garcia:
MUWS 2024: The 3rd International Workshop on Multimodal Human Understanding for the Web and Social Media. 1342-1344 - Hui Wang, Josef Kittler, Mark J. F. Gales, Rob Cooper, Maurice D. Mulvenna, Wing W. Y. Ng, Yang Hua, Richard Gault, Abbas Haider, Guanfeng Wu:
MVRMLM 2024: Multimodal Video Retrieval and Multimodal Language Modelling. 1345-1346 - Hongzhang Mu, Shuili Zhang, Hongbo Xu:
A Knowledge-Driven Approach to Enhance Topic Modeling with Multi-Modal Representation Learning. 1347-1355