Im et al., 2024 - Google Patents
Local feature‐based video captioning with multiple classifier and CARU‐attentionIm et al., 2024
View PDF- Document ID
- 8177686742581801334
- Author
- Im S
- Chan K
- Publication year
- Publication venue
- IET Image Processing
External Links
Snippet
Video captioning aims to identify multiple objects and their behaviours in a video event and generate captions for the current scene. This task aims to generate a detailed description of the current video in real‐time using natural language, which requires deep learning to …
- 238000000034 method 0 abstract description 91
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/32—Aligning or centering of the image pick-up or image-field
- G06K9/3233—Determination of region of interest
- G06K9/325—Detection of text region in scene imagery, real life image or Web pages, e.g. licenses plates, captions on TV images
- G06K9/3258—Scene text, e.g. street name
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00711—Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
- G06K9/00718—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00624—Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
- G06K9/00771—Recognising scenes under surveillance, e.g. with Markovian modelling of scene activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6288—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
- G06K9/6292—Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion of classification results, e.g. of classification results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00268—Feature extraction; Face representation
- G06K9/00281—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6201—Matching; Proximity measures
- G06K9/6202—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00362—Recognising human body or animal bodies, e.g. vehicle occupant, pedestrian; Recognising body parts, e.g. hand
- G06K9/00369—Recognition of whole body, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Xu et al. | Dual-stream recurrent neural network for video captioning | |
| Yang et al. | Video captioning by adversarial LSTM | |
| Liu et al. | Online data organizer: micro-video categorization by structure-guided multimodal dictionary learning | |
| Chen et al. | Less is more: Picking informative frames for video captioning | |
| Amaresh et al. | Video captioning using deep learning: an overview of methods, datasets and metrics | |
| Tu et al. | I 2 Transformer: Intra-and inter-relation embedding transformer for TV show captioning | |
| CN115293348A (en) | Pre-training method and device for multi-mode feature extraction network | |
| Sun et al. | Two‐Level Multimodal Fusion for Sentiment Analysis in Public Security | |
| Liu et al. | A fine-grained spatial-temporal attention model for video captioning | |
| Li et al. | A deep reinforcement learning framework for Identifying funny scenes in movies | |
| Gao et al. | A hierarchical recurrent approach to predict scene graphs from a visual‐attention‐oriented perspective | |
| Li et al. | Social context-aware person search in videos via multi-modal cues | |
| Xu et al. | A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition | |
| Deng et al. | ContentCTR: Frame-level live streaming click-through rate prediction with multimodal transformer | |
| Liu et al. | A multimodal approach for multiple-relation extraction in videos | |
| Huang et al. | A Survey on Deep Learning for Group-Level Emotion Recognition | |
| Varma et al. | RETRACTED: An efficient deep learning‐based video captioning framework using multi‐modal features | |
| Kamath et al. | Multimodal LLMs | |
| Yu et al. | CAGNet: a context-aware graph neural network for detecting social relationships in videos | |
| Lee et al. | DVC‐Net: A deep neural network model for dense video captioning | |
| Im et al. | Local feature‐based video captioning with multiple classifier and CARU‐attention | |
| Mehta et al. | Open-domain trending hashtag recommendation for videos | |
| Song et al. | Hierarchical LSTMs with adaptive attention for visual captioning | |
| Wang et al. | Improvement of continuous emotion recognition of temporal convolutional networks with incomplete labels | |
| Su et al. | Recurrent unit augmented memory network for video summarisation |