Qi et al., 2021 - Google Patents
Video captioning via a symmetric bidirectional decoder (Qi et al., 2021)
- Document ID: 8351702066120743786
- Authors: Qi S; Yang L
- Publication year: 2021
- Publication venue: IET Computer Vision
Snippet
The dominant video captioning methods employ the attentional encoder–decoder architecture, where the decoder is an autoregressive structure that generates sentences from left‐to‐right. However, these methods generally suffer from the exposure bias issue and …
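The snippet describes the standard setup this work builds on: an attentional encoder–decoder whose decoder generates the caption autoregressively from left to right, trained with teacher forcing, which produces the exposure bias mismatch at inference time. The sketch below is a minimal illustration of that baseline left-to-right decoder, not the paper's symmetric bidirectional decoder; the class name, dimensions, and the pooled `video_feat` input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LeftToRightDecoder(nn.Module):
    """Minimal autoregressive caption decoder (illustrative, not Qi et al.'s model)."""

    def __init__(self, vocab_size=1000, emb_dim=256, hid_dim=512, video_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRUCell(emb_dim + video_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, video_feat, targets=None, max_len=20, bos_id=1):
        # video_feat: (batch, video_dim) pooled video representation (assumed shape).
        batch = video_feat.size(0)
        h = video_feat.new_zeros(batch, self.rnn.hidden_size)
        tok = torch.full((batch,), bos_id, dtype=torch.long, device=video_feat.device)
        logits = []
        steps = targets.size(1) if targets is not None else max_len
        for t in range(steps):
            x = torch.cat([self.embed(tok), video_feat], dim=-1)
            h = self.rnn(x, h)
            step_logits = self.out(h)
            logits.append(step_logits)
            if targets is not None:
                # Training: teacher forcing conditions step t+1 on the ground-truth word.
                tok = targets[:, t]
            else:
                # Inference: the model must consume its own greedy prediction, so early
                # mistakes compound -- this train/test mismatch is the exposure bias.
                tok = step_logits.argmax(dim=-1)
        return torch.stack(logits, dim=1)  # (batch, steps, vocab_size)
```

A cross-entropy loss over these logits against `targets` would train the teacher-forced path; the bidirectional decoding proposed in the cited paper is aimed at mitigating the exposure bias this left-to-right setup incurs.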
Concepts
| Name | Sections | Count |
|---|---|---|
| bidirectional | title, abstract, description | 8 |
Classifications
- G06F17/2705—Parsing (automatic analysis of natural language data)
- G06F17/30784—Information retrieval of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/3061—Information retrieval of unstructured textual data
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters (text processing)
- G06K9/6217—Design or setup of recognition systems and techniques; extraction of features in feature space; clustering techniques; blind source separation
- G06N99/005—Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06K9/46—Extraction of features or characteristics of the image
- G06N5/00—Computer systems utilising knowledge based models
- G10L15/08—Speech classification or search
Similar Documents
| Publication | Title |
|---|---|
| Wang et al. | An overview of image caption generation methods |
| Yang et al. | Video captioning by adversarial LSTM |
| Gabeur et al. | Multi-modal transformer for video retrieval |
| Abdar et al. | A review of deep learning for video captioning |
| Dilawari et al. | ASoVS: abstractive summarization of video sequences |
| Liu et al. | Uamner: uncertainty-aware multimodal named entity recognition in social media posts |
| Khan et al. | A deep neural framework for image caption generation using GRU-based attention mechanism |
| Jain et al. | RETRACTED ARTICLE: Video captioning: a review of theory, techniques and practices |
| CN113392265B (en) | Multimedia processing method, device and equipment |
| Amaresh et al. | Video captioning using deep learning: an overview of methods, datasets and metrics |
| Tu et al. | I2Transformer: Intra- and inter-relation embedding transformer for TV show captioning |
| Sun et al. | Video question answering: a survey of models and datasets |
| Wang et al. | LLM-Enhanced multimodal detection of fake news |
| Piergiovanni et al. | Video question answering with iterative video-text co-tokenization |
| Liu et al. | A fine-grained spatial-temporal attention model for video captioning |
| He et al. | Deep learning in natural language generation from images |
| Rodriguez et al. | How important is motion in sign language translation? |
| Liu et al. | A multimodal approach for multiple-relation extraction in videos |
| Wang et al. | RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction |
| Hussain et al. | Low-resource MobileBERT for emotion recognition in imbalanced text datasets mitigating challenges with limited resources |
| Xue et al. | Continuous sign language recognition based on hierarchical memory sequence network |
| Lee et al. | DVC‐Net: A deep neural network model for dense video captioning |
| Bhalekar et al. | Generation of image captions using VGG and ResNet CNN models cascaded with RNN approach |
| CN116069926A (en) | Text classification model training method and device, text classification method and device |
| Qi et al. | Video captioning via a symmetric bidirectional decoder |