With the development of the Internet, social media, mobile apps, and other digital communication technologies, the world has entered the era of multimedia big data. Millions of multimedia items, including images, text, audio, and video, are uploaded to social platforms every day. For artificial intelligence to better understand the world around us, it is essential to teach machines to understand multimodal messages. Multimodal machine learning, which aims to build models that can process and relate information from different modalities, is a vibrant field of increasing importance and extraordinary potential. In this promising area, extensive efforts have been dedicated to seamlessly unifying computer vision and natural language processing in tasks such as multimedia content recognition (e.g., multimodal affect recognition), matching (e.g., cross-modal retrieval), description (e.g., image captioning), indexing (e.g., multimedia event detection), summarization (e.g., video summarization), reasoning (e.g., visual question answering), and so on. Although deep learning-based methods have made fruitful progress, performance on the above tasks is still far from users’ expectations, because heterogeneous data poses several well-known challenges:
(1) how to represent and summarize multimodal data;
(2) how to identify and construct the connections and interactions between data of different modalities;
(3) how to learn and infer adequate knowledge from multimodal data;
(4) how to translate data or knowledge from one modality to another; and
(5) how to understand and evaluate the heterogeneity in multimodal datasets.
Submission Guidelines:
Authors should prepare their manuscripts according to the Instructions for Authors available from the Multimedia Systems website. Authors should submit through the Multimedia Systems online submission site and select “S.I. - Multi-modal Transformers” when they reach the “Article Type” step in the submission process. Submitted papers should present original, unpublished work relevant to the topics of the special issue. All submitted papers will be evaluated by at least three independent reviewers on the basis of relevance, significance of contribution, technical quality, scholarship, and quality of presentation. It is the policy of the journal that no submission, or substantially overlapping submission, be published or be under review at another journal or conference at any time during the review process. Final decisions on all papers are made by the Editor-in-Chief.