Abstract: Cross-lingual image captioning is a challenging task that requires addressing both cross-lingual and cross-modal obstacles in multimedia analysis. The crucial issue in this task is to model ...
Abstract: News image captioning involves generating descriptive and informative captions for news images by utilizing news article context. This task aims to capture detailed information, including ...
We are excited to release a new video-text benchmark and extensible code for multi-shot video understanding. Our updated 134k version of the dataset includes detailed long summaries for 134k videos and ...