Koutras et al., 2018 - Google Patents

Exploring cnn-based architectures for multimodal salient event detection in videos

Koutras et al., 2018

Document ID: 14210410902962915782
Author: Koutras P; Zlatinsi A; Maragos P
Publication year: 2018
Publication venue: 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP)

External Links

Cited by

Snippet

Nowadays, multimodal attention plays a significant role in many machine-based understanding applications, computer vision and robotic applications, such as action recognition or summarization. In this paper, we present our approach to the problem of …

Continue reading at n12safonidi.cs.ece.ntua.gr (PDF) (other versions)

238000001514 detection method 0 title abstract description 23

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30029—Querying by filtering; by personalisation, e.g. querying making use of user profiles
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30817—Information retrieval; Database structures therefor; File system structures therefor of video data using information manually generated or using information not derived from the video content, e.g. time and location information, usage information, user ratings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/102—Programmed access in sequence to addressed parts of tracks of operating record carriers
- G11B27/105—Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Ghose et al.	2020	Autofoley: Artificial synthesis of synchronized sound tracks for silent videos with deep learning
Tian et al.	2020	Unified multisensory perception: Weakly-supervised audio-visual video parsing
Ramaswamy et al.	2020	See the sound, hear the pixels
Tiwari et al.	2021	A survey of recent work on video summarization: approaches and techniques
Chung et al.	2016	Out of time: automated lip sync in the wild
Zlatintsi et al.	2017	COGNIMUSE: A multimodal video database annotated with saliency, events, semantics and emotion with application to summarization
Surís et al.	2018	Cross-modal embeddings for video and audio retrieval
US8135221B2 (en)	2012-03-13	Video concept classification using audio-visual atoms
Vijayakumar et al.	2012	A study on video data mining
CN108307229B (en)	2023-12-22	Video and audio data processing method and device
Chen et al.	2022	A comprehensive survey on video saliency detection with auditory information: The audio-visual consistency perceptual is the key!
Acar et al.	2014	Understanding affective content of music videos through learned representations
Koutras et al.	2018	Exploring cnn-based architectures for multimodal salient event detection in videos
Stappen et al.	2020	Muse 2020 challenge and workshop: Multimodal sentiment analysis, emotion-target engagement and trustworthiness detection in real-life media: Emotional car reviews in-the-wild
Kächele et al.	2016	Revisiting the EmotiW challenge: how wild is it really? Classification of human emotions in movie snippets based on multiple features
Lian et al.	2018	Investigation of multimodal features, classifiers and fusion methods for emotion recognition
Li et al.	2014	What's making that sound?
Pantazis et al.	2020	Salsum: Saliency-based video summarization using generative adversarial networks
Zhang et al.	2022	AI video editing: A survey
Rani et al.	2020	Automatic Video Classification: A Review.
Acar et al.	2015	Fusion of learned multi-modal representations and dense trajectories for emotional analysis in videos
Ayadi et al.	2023	Deep neural network architectures for audio emotion recognition performed on song and speech modalities
Stappen et al.	2020	MuSe 2020--The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop
Chisholm et al.	2015	Audio-based affect detection in web videos
Javed et al.	2022	Multimodal summarization: A concise review