MMAsia 2024: Auckland, New Zealand
- Ruili Wang, Zhiyong Wang, Jiaying Liu, Alberto Del Bimbo, Jun Zhou, Anup Basu, Min Xu:
Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024, Auckland, New Zealand, December 3-6, 2024. ACM 2024, ISBN 979-8-4007-1273-9
Full Papers
- Chen-Hsiu Huang, Ja-Ling Wu:
SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation. 1:1-1:7
- Bingyang Cui, Yujie Zhang, Qi Yang, Yiling Xu:
MS-GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via multi-scale Geodesic Patch Similarity. 2:1-2:7
- Haowei Lou, Hye-Young Paik, Wen Hu, Lina Yao:
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech. 3:1-3:7
- Jian Ma, Bin Zhu, Kun Li, Dima Damen:
Active Object Segmentation: A New Modality for Egocentric Action Recognition. 4:1-4:7
- Xiaoyi Han, Yanfei Wu, Nan Pu, Zunlei Feng, Qifei Zhang, Yijun Bei, Lechao Cheng:
Fire and Smoke Detection with Burning Intensity Representation. 5:1-5:8
- Xianbin Hu, Wei Wu, Zhu Li:
RandommaskFormer: Light Weight Remote Sensing Scene Classification with Masked Transformer. 6:1-6:7
- Tailin Yang, Wei Wu, Zhu Li, Rui Zhou:
Multi-Frame Sparse Convolutional Learning for Point Cloud Color Denoising. 7:1-7:7
- Yiran Chen, Haoran Liu, Mingzhe Liu, Yanhua Liu, Ruili Wang, Peng Li:
Moving Object Tracking based on Kernel and Random-coupled Neural Network. 8:1-8:6
- Zihuang Wu, Xinyu Xiong, Ying Chen, Siying Li, Hua Chen:
MoE-Polyp: Shifting More Attention to Small Polyp Segmentation via Mixture-of-Experts. 9:1
- Zhiyuan Wang, Cong Yang, Yulu Zhang, Zeyd Boukhers, Wei Sui, Yi Ji, Chunping Liu:
Transition in Focus of Prediction Tasks for Skeleton Graph Component Detection with Transformer. 10:1-10:7
- Chenqiu Zhao, Guanfang Dong, Anup Basu:
Accelerating Inference of Networks in the Frequency Domain. 11:1-11:7
- Qi Yang, Kaifa Yang, Yuke Xing, Yiling Xu, Zhu Li:
A Benchmark for Gaussian Splatting Compression and Quality Assessment Study. 12:1-12:8
- Zichuan Huang, Yifan Li, Shuai Yang, Jiaying Liu:
CoolColor: Text-guided COherent OLd film COLORization. 13:1-13:7
- Takara Taniguchi, Ryosuke Furuta:
Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga. 14:1-14:8
- Shuwei Peng, Xu Zhang, Aiwen Jiang, Changhong Liu, Jihua Ye:
Low-Light Image Enhancement via FourierTMamba: A Hybrid Frequency-Spatial Approach. 15:1-15:8
- Xin Li, Feng Xu, Yao Tong, Fan Liu, Yiwei Fang, Xin Lyu, Jun Zhou:
FreqFormer: A Frequency Transformer for Semantic Segmentation of Remote Sensing Images. 16:1-16:8
- Ze Kun Wang, Zhan Jun Si:
Adaptive Both homo- and hetero-Feature Integration for Multimodal Emotion Recognition. 17:1
- Ruikun Zhang, Hao Yang, Yan Yang, Ying Fu, Liyuan Pan:
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset. 18:1
- Guan-Yu Wu, Wei-Ta Chu:
Incremental Few-Shot Object Detection by Leveraging External Information from Large Multimodal Models. 19:1-19:7
- Yuang Liu, Dacheng Liao, Mengshi Qi, Liang Liu, Huadong Ma:
RoboFormer: A Robust Multi-Modal Transformer for 3D Object Detection in Autonomous Driving. 20:1-20:7
- Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang:
LoopAnimate: Loopable Salient Object Animation. 21:1-21:8
- Yuankang Pan, Zhaoquan Yuan, Xiao Wu, Zechao Li, Changsheng Xu:
TMM-CLIP: Task-guided Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning. 22:1-22:7
- Long H. Nguyen, Nhat Truong Pham, Mustaqeem Khan, Alice Othmani, Abdulmotaleb El-Saddik:
HuBERT-CLAP: Contrastive Learning-Based Multimodal Emotion Recognition using Self-Alignment Approach. 23:1-23:6
- Tianchen Zhou, Jiateng Liu, Yue Jin, Li Yao:
MicroMamba: State Space Model with Partitioned Window Scan for Micro-Expression Recognition. 24:1-24:7
- Yang Yi, Dasith de Silva Edirimuni, Ye Zhu, Shang Gao, Zhiyong Wang, Antonio Robles-Kelly, Xuequan Lu:
Point Cloud Normal Estimation via Representation Learning on Height Maps. 25:1-25:7
- Jieqiong Zhou, Guoqing Zhang, Yuhui Zheng, Fuguo Zhang:
Local Feature-Emphasizing Transformer for Cloth-Changing Person Re-identification. 26:1
- Chao Tan, Sheng Li, Yang Cao, Zhao Ren, Tanja Schultz:
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition. 27:1-27:7
- Faisal Ahmed, Justin Rozeboom, Hanran Song, Chenqiu Zhao, Anup Basu:
LMoW: A Latent Random Variable Model for Unconditional Human Motion Generation. 28:1-28:8
- Yujia Xu, Deyu Pan, Ling Ding:
A method for detecting hands off the steering wheel. 29:1-29:6
- Luhao Zhu, Xiangwei Kong, Runsen Li, Guodong Guo:
Where You See Is What You Know: A Visual-Semantic Conceptual Explainer. 30:1-30:7
- Shogo Yonezawa, Yukinobu Taniguchi, Go Irie:
Bivariate Mixup for 2D Contact Point Localization with Piezoelectric Microphone Array. 31:1-31:7
- Jinheng Zhou, Wu Liu, Guang Yang, He Zhao, Feiniu Yuan:
Prompting Industrial Anomaly Segment with Large Vision-Language Models. 32:1
- Yi-Chen Li, Chih-Fan Hsu, Jian-Kai Wang, Chung-Chi Tsai, Cheng-Hsin Hsu:
MAFS: Modality-Aware Federated Semi-Supervised Learning with Selective Data Sharing Specified by Individual Clients. 33:1-33:8
- Yiran Song, Qianyu Zhou, Kun Hu, Lizhuang Ma, Xuequan Lu:
CFRL: Coarse-Fine Decoupled Representation Learning For Long-Tailed Recognition. 34:1-34:7
- Tao Jiang, Feng Hou, Yi Wang:
Multimodal Energy Prompting for Video Salient Object Detection. 35:1-35:8
- Yingkai He, Zhen Zhang, Jing Xiao:
A Multi-scale Framework towards Human-Machine Friendly Remote Sensing Image Coding. 36:1-36:6
- Mingwei Cao, Fengna Wang, Dengdi Sun, Haifeng Zhao:
BCS-NeRF: Bundle Cross-Sensing Neural Radiance Fields. 37:1-37:8
- Zhenzhen Hu, Xin Guan, Jia Li, Zijie Song, Richang Hong:
Dual-Stream Keyframe Enhancement for Video Question Answering. 38:1-38:7
- Junchao Ge, Huafeng Li, Yafei Zhang:
Robust discriminative and modal-consistent feature learning for fine-grained sketch-based image retrieval. 39:1-39:8
- Ying Hu, Chenyi Zhuang, Pan Gao:
DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer. 40:1
- Mary Pilataki, Matthias Mauch, Simon Dixon:
Pitch-aware generative pretraining improves multi-pitch estimation with scarce data. 41:1-41:8
- Qianyu Li, Bingcai Chen, Jiaxing Tian, Ruolan Liu:
FA-UNext: A Feedback Attention-based MLP Network for Medical Image Segmentation. 42:1-42:7
- Jiaqi Chen, Yan Yang, Shizhuo Deng, Da Teng, Liyuan Pan:
SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition. 43:1-43:8
- Tingting Yao, Yuan Gao, Zihao Feng, Qing Hu, Zhiyong Wang:
Underwater Image Enhancement via Domain Adaptive Transfer Learning and Hybrid Reinforcement Model. 44:1-44:7
- Yuhang Zhang, Cuixin Yang, Muxin Liao, Shishun Tian, Wenbin Zou, Chen Xu:
Layout Relationship Decoupling Framework for Multi-target Domain Adaptative Semantic Segmentation. 45:1-45:7
- Haoyuan Zhang, Xiangyu Zhu, Qu Tang, Zhaoxiang Zhang, Zhen Lei:
STODINE: Decompose video to Object-centric Spatial-Temporal Slots for physical reasoning. 46:1-46:7
- Ziqiang Liu, Gongwei Fang, Wentong Wang, Qiang Liu:
Multimodal Sign Language Knowledge Graph and Representation: Text Video KeyFrames and Motion Trajectories. 47:1-47:7
- JieYing Liu:
The Quantification of Emotional Expressions and Perceptions of Vocal Vibrato in Basic Emotion: Commercial Operatic Singing Recordings. 48:1-48:7
- Zhengjie Lu, Jinjia Peng, Huibing Wang, Qingxuan Shi, Bin Wang:
HSMnet: Hybrid Sampling and Matching Network for DETR-based Person Search. 49:1-49:7
- Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide:
Action Selection Learning for Multi-label Multi-view Action Recognition. 50:1
- Sangni Xu, Hao Xiong, Qiuxia Wu, Zhihui Wang, Shlomo Berkovsky, Zhiyong Wang:
Fast Online Adaptation of Visual SLAM via Variational Information Transfer and Preservation. 51:1-51:7
- Son Duy Dao, Hengcan Shi, Dinh Q. Phung, Jianfei Cai:
CA-OVS: Cluster and Adapt Mask Proposals for Open-Vocabulary Semantic Segmentation. 52:1-52:8
- Shuangping Chen, Huijin Wang, Shun Long, Jieyun Bai, Jianmei Jiang:
Ultrasound Video Segmentation of Pubic Symphysis and Fetal Head for Angle of Progression Measurement. 53:1-53:8
- Pu Li, Yibiao Zhao, Xiaobai Liu:
Policy-driven Auto-Augmentation with Distillment Rewards for Scene Text Recognition. 54:1-54:8
- Li Jiao, Lihong Cao, Tian Wang:
Prompt-based Continual Learning for Extending Pretrained CLIP Models' Knowledge. 55:1-55:8
- Xinhao Zhong, Siyu Jiao, Yao Zhao, Yunchao Wei:
Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection. 56:1-56:7
- Junjiang Liu, Dandan Sun, Hailun Xia, Jiangtao Bai, Xinyue Fan:
FeedMatch: Evolves for Semi-Supervised Multimedia Classification from Student Feedback. 57:1-57:6
- Huilin Chen:
MFNet: Mixed Feature Network for Enhancing Facial Emotion Recognition on the Small-Scale Dataset. 58:1-58:7
- Zhiyi Mo, Guangtong Zhang, Jian Nong, Bineng Zhong, Zhi Li:
Dual-stream Multi-modal Interactive Vision-language Tracking. 59:1-59:7
- Yangyuan Chen, Zhizhong Ma, Mingjing Wang, Mingzhe Liu:
Advancing Music Emotion Recognition: A Transformer Encoder-Based Approach. 60:1-60:5
- Jie Wang, Huilin Chen, Wandong Xue, Dongming Chen, Dongqi Wang:
A Multi-angle Text Recognition Algorithm. 61:1-61:7
- Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai:
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation. 62:1-62:8
- Yuanyuan Shi, Yunan Li, Huizhou Chen, Siyu Liang, Qiguang Miao:
CISampler: Correlated Information Guided Frame Sampling for Gesture Recognition in Video. 63:1-63:8
- Meng Shen, Yake Wei, Jianxiong (Terry) Yin, Deepu Rajan, Di Hu, Simon See:
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning. 64:1-64:8
- Huajie Tan, Guoqing Xiang, Xiaodong Xie, Huizhu Jia:
Joint Frame-Level and Block-Level Rate-Perception Optimized Preprocessing for Video Coding. 65:1
- Yongjian Liu, Shunwei Zhang, Jinyu Xu, Jiachen Li, Yanchun Ma, Qing Xie:
Dlpp-Net: Degradation Location Prior Prediction Network for Image Restoration. 66:1-66:8
- Fengqi Li, Mengchao Guo, Renxuan Xiong, Donglei Yang, Yi Wang, Fengqiang Xu:
MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining. 67:1
- Yichen Ouyang, Jiayi Ye, Wenhao Chai, Dapeng Tao, Yibing Zhan, Gaoang Wang:
An Efficient Multi-prior Hybrid Approach for Consistent 3D Generation from Single Images. 68:1
- Minghui Wang, Zixu Wang, Hongbin Xu, Kun Hu, Zhiyong Wang, Wenxiong Kang:
T2QRM: Text-Driven Quadruped Robot Motion Generation. 69:1-69:7
- Yanming Chen, Ziyu Liu, Xiangjian He:
MambaVesselNet: A Hybrid CNN-Mamba Architecture for 3D Cerebrovascular Segmentation. 70:1-70:7
- Hongyu An, Xinfeng Zhang, Shijie Zhao, Li Zhang:
FATO: Frequency Attention Transformer for Omnidirectional Image Super-Resolution. 71:1-71:7
- Kai Zhang, Xia Yuan, Shuntong Chen, Di Hu, Chunxia Zhao:
Multi-Modality Semantic-Shared Cross-View Ground-to-Aerial Localization. 72:1-72:7
- Sicheng Liu, Lintao Wang, Xiaogang Zhu, Xuequan Lu, Zhiyong Wang, Kun Hu:
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization. 73:1-73:7
- Song Huang, Ziming Zeng, Min Li, Jianping Wang:
Unified Multi-view Clustering based on Joint Multi-Structure Representation Learning. 74:1-74:7
- Cheng-Kang Tan, Wei-Ta Chu:
CS-HOI: Human Object Interaction Detection Enhanced by Common Sense. 75:1-75:7
- Zhiqian Dong, Sheng Yang, Peng Zhou:
Dual-Enhanced Disentangled Multi-View Clustering. 76:1-76:7
- Rim El Filali, Soufiane Jdaba, Ronghui Xie, Ran Shi, Tong Qiao, Pan Qiaodong, Ting Wu:
S2FB IoU: Improving Boundary-based Object-Centric Image Segmentation Quality Evaluation. 77:1-77:7
- Yu Wei, Yi Wang, Shijun Yan, Tianzhu Wang, Zhihan Wang, Weirong Sun, Yu Zhao, Xinwei Xue:
CSUNet: Contour-Sensitive Underwater Salient Object Detection. 78:1-78:7
- Yachao He, Li Liu, Huaxiang Zhang, Dongmei Liu, Hongzhen Li:
A Unified Contrastive Framework with Multi-Granularity Fusion for Text-to-Image Generation. 79:1-79:7
- Jingxuan Chen:
GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video. 80:1-80:7
- Zhaojun Guo, Junqiang Huang, Guobiao Li, Wanli Peng, Xinpeng Zhang, Zhenxing Qian, Sheng Li:
Emotion-Aware and Efficient Meme Sticker Dialogue Generation. 81:1
- Xiaocong Zhou, Fan Liu, Chuanyi Zhang, Feifan Li, Wenwen Cai, Jun Zhou:
Feature-weighted Multi-stage Bayesian Prototype for Few-shot Classification. 82:1-82:7
- Jiale Wang, Xueliang Liu, Yuling Su:
A Robust Few-shot Learning Framework via Dual-branch Adversarial Noise Pretraining. 83:1-83:8
- Zheng-Xian Keh, Lai-Kuan Wong, Yuen Peng Loh, Ke Gu, Weisi Lin:
KBY-Net: A Dual Learning Framework for Improving Object Detection in Rainy Weather Conditions. 84:1-84:7
- Muhammad Saad Shakeel, Kun Liu, Xiaochuan Liao, Wenxiong Kang:
MRGait: A Multi-range feature learning framework for Cross-View Gait Recognition. 85:1-85:7
- Shun Katada, Kazunori Komatani:
Personalized Sentiment Estimation Based on Recall and Resting Ratio of Frontal EEG. 86:1-86:7
- Wenyu Shao, Hongbo Liu:
TCFusion: A Three-branch Cross-domain Fusion Network for Infrared and Visible Images. 87:1
- Jianhua Zhao, Xue Jun Li, Peter Han Joo Chong:
HFS-HNeRV: High-Frequency Spectrum Hybrid Neural Representation for Videos. 88:1-88:7
- Zichen Zhu, Zhongze Tang, Amir Nassereldine, Jinjun Xiong, Sheng Wei:
OpenVideoWalls: an Open-Source System for Building Video Walls with Recycling Heterogeneous Displays. 89:1-89:7
- Yingkai He, Zhen Zhang, Liang Liao, Jing Xiao:
Latent Variables Coding for Perceptual Image Compression. 90:1-90:7
- Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma:
Following in the Footsteps: Predicting Human Trajectories Using Motion Pattern Memory. 91:1-91:7
- Liqun Shan, Rujun Zhang, Sai Venkatesh Chilukoti, Xingli Zhang, Insup Lee, Xiali Hei:
IdentityKD: Identity-wise Cross-modal Knowledge Distillation for Person Recognition via mmWave Radar Sensors. 92:1-92:7
- Xingang Wang, Mengyi Wang, Hai Cui, Yijia Zhang:
Efficient Low-Dimensional Representation Via Manifold Learning-Based Model for Multimodal Sentiment Analysis. 93:1-93:7
- Yixin Zhang, Yoko Yamakata, Keishi Tajima:
Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions. 94:1-94:7
- Fei Xiang, Hongbo Liu, Ruili Wang, Junjie Hou, Xingang Wang:
DCEPNet: Dual-Channel Emotional Perception Network for Speech Emotion Recognition. 95:1
- Dongming Chen, Mingshuo Nie, Zhengping Sun, Huilin Chen, Dongqi Wang:
An Information Cascade Prediction Algorithm Based on Time Series. 96:1
- Chengxi Lei, Satwinder Singh, Feng Hou, Ruili Wang:
Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR. 97:1-97:7
- Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu:
ViCo: Engaging Video Comment Generation with Human Preference Rewards. 98:1
- Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang:
A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion. 99:1-99:7
- Raj Jaiswal, Avinash Anand, Rajiv Ratn Shah:
Advancing Multimodal LLMs: A Focus on Geometry Problem Solving Reasoning and Sequential Scoring. 100:1-100:7
- Haipeng Li, Guangcun Wei, Haochen Xu, Boyan Guo:
DocPointer: A parameter-efficient Pointer Network for Key Information Extraction. 101:1-101:7
- Yani Chen, Jiaxiang E, Kaiyu Nie, Xiaoxia Nie, Ruili Wang:
Development of a Chinese Synonym Library: Enhancing Clinical Terminology Standardization and Interoperability. 102:1-102:7
- Chen Wang, Feng Hou, Yi Wang, Ruili Wang:
Structured Bipartite Graph Ensemble Clustering. 103:1-103:7
- Yuan Gao, Feng Hou, Ruili Wang:
Incorporating Pre-ordering Representations for Low-resource Neural Machine Translation. 104:1-104:7
- Wenhao Gao, Zhenbo Song, Zhenyuan Zhang, Jianfeng Lu:
On the Robustness of Deep Face Inpainting: An Adversarial Perspective. 105:1-105:7
- Ying Qiao, Aoxuan Chen, Xiang Li, Jinfei Gao:
Variational Stochastic Multiple Auto-Encoder For Multimodal Recommendation. 106:1-106:7
- Jiazhen Zhang, Kun Li, Yanyan Wei, Fei Wang, Wei Qian, Jinxing Zhou, Dan Guo:
Repetitive Action Counting with Feature Interaction Enhancement and Adaptive Gate Fusion. 107:1-107:7
- Xiong Zeng, Min Jiang, Ronghua Huang:
Multi-stage Image Deraining based on Pre-trained Diffusion Model. 108:1-108:7
- Tomoya Sugihara, Shuntaro Masuda, Ling Xiao, Toshihiko Yamasaki:
Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video. 109:1
- Qingjin Wei, Xiaozhuo Li, Dinglu Liu, Zhiwu Liao:
MFTAnet: Two-step Aggregation Net of Multiscale Features for Pneumoconiosis Screening. 110:1-110:7
- Huijie Zhang, Xiaobai Liu:
Focal Diffusion Process for Object-Aware 3D LiDAR Generation. 111:1-111:7
- Longyun Dong, Yuanrong Xu, Jianping Zhong, Zhaobo Qi, Weigang Zhang:
Improving Sequential DeepFake Detection with Local information enhancement. 112:1
- Pingyi Huo, Ajay Narayanan Sridhar, Md Fahim Faysal Khan, Kiwan Maeng, Vijaykrishnan Narayanan:
QoS-Diff: Adaptive Auto-tuning Framework for Low-latency Diffusion Model Inference. 113:1-113:7
- Cui Xu, Laiyun Qing:
Point-Supervised Temporal Action Detection with Label Supplementation Based on Transformer. 114:1-114:7
- Xu Gu, Xihua Wang, Chuhao Jin, Ruihua Song:
ScaMo: Towards Text to Video Storyboard Generation Using Scale and Movement of Shots. 115:1-115:8
- Guohuan Gao, Gang Zhang, Xiangyang Xu:
ADP3D: Adaptive Point Selection for Efficient Multi-frame 3D Object Detection. 116:1-116:7
- Shanshan Yao, Tian Li:
Multi-domain Acoustic Feature Fusion for Speaker Recognition. 117:1-117:6
- Zhitong Zhu, Jing Yu, Keke Gai, Jiamin Zhuang, Gaopeng Gou, Gang Xiong:
Flexible Semantic Watermarking for Robust Diffusion Model Detection and Tracing. 118:1-118:7
- Shan Wan, Wu Liu, Yijun Liu, Feiniu Yuan, Chunli Meng:
Watermarking Vision-Language Models. 119:1
- Jinwei Li, Yongkang Cheng, Yonghe Zhang, Pengcheng Wang:
Hierarchical Part-Attention Networks for 3D Human Reconstruction. 120:1
Short Papers and Demo Papers
- Hairui Yang, Ning Wang, Zhihui Wang, Lei Wang:
Sketch-based 3D Model Retrieval with Cross-Modal Representation. 121:1-121:5
- Daidou Guo, Chuan Qin:
PCMark-NAS: Lightweight Print-Camera Resilient Watermarking Networks via Neural Architecture Search. 122:1-122:5
- Daidou Guo, Ching-Chun Chang, Cheng SenMao, Chuan Qin:
Highly Fault-Tolerant Discrete Lattice Information Coding Method for Screen-Shooting Scenarios. 123:1-123:5
- Yu Song, Xiaohui Yang, Rongping Huang, Haifeng Bai, Lili Yang:
CSCCap: Plugging Sparse Coding in Zero-Shot Image Captioning. 124:1-124:5
- Zuyi Pei, Baoli Sun, Zhihui Wang, Haojie Li:
Fine-grained Video Semantic Distillation for Video-Text Retrieval. 125:1-125:5
- Zihao Tang, Xinyi Wang, Mariano Cabezas, Arkiev D'Souza, Michael Barnett, Fernando Calamante, Weidong Cai, Chenyu Wang:
Fibre Population-guided Pre-training for 3D Spatial Super-Resolution on Multimodal Brain Diffusion MR Imaging. 126:1
- Mingzhe Zhang, Laura J. Ferris, Lin Yue, Miao Xu:
Emotionally Guided Symbolic Music Generation Using Diffusion Models: The AGE-DM Approach. 127:1-127:5
- Wei-Lun Huang, Shao-Hung Wu, Hung-Chang Huang, Min-Chun Hu, Tse-Yu Pan:
Description-Driven Audiovisual Embedding Space Learning for Enhanced Movie Understanding. 128:1-128:5
- Chenxi Niu, Ziyu Liu, Xiangjian He:
SS-FS CSA: Self-Supervised and Fully Supervised Integration for 3D Cerebrovascular Segmentation. 129:1-129:5
- Hao Zhang, Xingning Dong, Jinfei Gao, Liang Hao, Pei Shen, Tian Gan:
MBC-ATA: Maximum Binary Classification and Anchor-based Triplet Augmentation for Unbiased Scene Graph Generation. 130:1-130:5
- Tianqi Wei, Zhi Chen, Xin Yu:
Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild. 131:1-131:3