Qi et al., 2021 - Google Patents
Video captioning via a symmetric bidirectional decoder (Qi et al., 2021)
- Document ID: 8351702066120743786
- Authors: Qi S; Yang L
- Publication year: 2021
- Publication venue: IET Computer Vision
Snippet
The dominant video captioning methods employ the attentional encoder–decoder architecture, where the decoder is an autoregressive structure that generates sentences from left‐to‐right. However, these methods generally suffer from the exposure bias issue and …
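The snippet describes the standard setup this work builds on: an attentional encoder–decoder whose decoder generates the caption autoregressively from left to right, trained with teacher forcing, which produces the exposure bias mismatch at inference time. The sketch below is a minimal illustration of that baseline left-to-right decoder, not the paper's symmetric bidirectional decoder; the class name, dimensions, and the pooled `video_feat` input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LeftToRightDecoder(nn.Module):
    """Minimal autoregressive caption decoder (illustrative, not Qi et al.'s model)."""

    def __init__(self, vocab_size=1000, emb_dim=256, hid_dim=512, video_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim + video_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, video_feat, targets=None, max_len=20, bos_id=1):
        # video_feat: (batch, video_dim) pooled video representation (assumed shape).
        batch = video_feat.size(0)
        h = video_feat.new_zeros(batch, self.rnn.hidden_size)
        tok = torch.full((batch,), bos_id, dtype=torch.long, device=video_feat.device)
        logits = []
        steps = targets.size(1) if targets is not None else max_len
        for t in range(steps):
            x = torch.cat([self.embed(tok), video_feat], dim=-1)
            h = self.rnn(x, h)
            step_logits = self.out(h)
            logits.append(step_logits)
            if targets is not None:
                # Training: teacher forcing conditions step t+1 on the ground-truth word.
                tok = targets[:, t]
            else:
                # Inference: the model must consume its own greedy prediction, so early
                # mistakes compound -- this train/test mismatch is the exposure bias.
                tok = step_logits.argmax(dim=-1)
        return torch.stack(logits, dim=1)  # (batch, steps, vocab_size)
```

A cross-entropy loss over these logits against `targets` would train the teacher-forced path; the bidirectional decoding proposed in the cited paper is aimed at mitigating the exposure bias this left-to-right setup incurs.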
Concepts
| Name | Sections | Count |
|---|---|---|
| bidirectional | title, abstract, description | 8 |
Classifications
- G06F17/2705—Parsing (automatic analysis of natural language data)
- G06F17/30784—Information retrieval of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/3061—Information retrieval of unstructured textual data
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters (text processing)
- G06K9/6217—Design or setup of recognition systems and techniques; extraction of features in feature space; clustering techniques; blind source separation
- G06N99/005—Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06K9/46—Extraction of features or characteristics of the image
- G06N5/00—Computer systems utilising knowledge based models
- G10L15/08—Speech classification or search
Similar Documents
| Publication | Title |
|---|---|
| Wang et al. | An overview of image caption generation methods |
| Yang et al. | Video captioning by adversarial LSTM |
| Gabeur et al. | Multi-modal transformer for video retrieval |
| Abdar et al. | A review of deep learning for video captioning |
| Dilawari et al. | ASoVS: abstractive summarization of video sequences |
| Liu et al. | Uamner: uncertainty-aware multimodal named entity recognition in social media posts |
| Khan et al. | A deep neural framework for image caption generation using GRU-based attention mechanism |
| Jain et al. | RETRACTED ARTICLE: Video captioning: a review of theory, techniques and practices |
| CN113392265B (en) | Multimedia processing method, device and equipment |
| Amaresh et al. | Video captioning using deep learning: an overview of methods, datasets and metrics |
| Tu et al. | I2Transformer: Intra- and inter-relation embedding transformer for TV show captioning |
| Sun et al. | Video question answering: a survey of models and datasets |
| Wang et al. | LLM-Enhanced multimodal detection of fake news |
| Piergiovanni et al. | Video question answering with iterative video-text co-tokenization |
| Liu et al. | A fine-grained spatial-temporal attention model for video captioning |
| He et al. | Deep learning in natural language generation from images |
| Rodriguez et al. | How important is motion in sign language translation? |
| Liu et al. | A multimodal approach for multiple-relation extraction in videos |
| Wang et al. | RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction |
| Hussain et al. | Low-resource MobileBERT for emotion recognition in imbalanced text datasets mitigating challenges with limited resources |
| Xue et al. | Continuous sign language recognition based on hierarchical memory sequence network |
| Lee et al. | DVC‐Net: A deep neural network model for dense video captioning |
| Bhalekar et al. | Generation of image captions using VGG and ResNet CNN models cascaded with RNN approach |
| CN116069926A (en) | Text classification model training method and device, text classification method and device |
| Qi et al. | Video captioning via a symmetric bidirectional decoder |