Qi et al., 2021 - Google Patents

Video captioning via a symmetric bidirectional decoder

Document ID
8351702066120743786
Author
Qi S
Yang L
Publication year
2021
Publication venue
IET Computer Vision

Snippet

The dominant video captioning methods employ the attentional encoder–decoder architecture, where the decoder is an autoregressive structure that generates sentences from left‐to‐right. However, these methods generally suffer from the exposure bias issue and …
Continue reading at ietresearch.onlinelibrary.wiley.com

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search

Similar Documents

Publication Publication Date Title
Wang et al. An overview of image caption generation methods
Yang et al. Video captioning by adversarial LSTM
Gabeur et al. Multi-modal transformer for video retrieval
Abdar et al. A review of deep learning for video captioning
Dilawari et al. ASoVS: abstractive summarization of video sequences
Liu et al. UAMNer: uncertainty-aware multimodal named entity recognition in social media posts
Khan et al. A deep neural framework for image caption generation using GRU-based attention mechanism
Jain et al. RETRACTED ARTICLE: Video captioning: a review of theory, techniques and practices
CN113392265B (en) Multimedia processing method, device and equipment
Amaresh et al. Video captioning using deep learning: an overview of methods, datasets and metrics
Tu et al. I2Transformer: Intra- and inter-relation embedding transformer for TV show captioning
Sun et al. Video question answering: a survey of models and datasets
Wang et al. LLM-Enhanced multimodal detection of fake news
Piergiovanni et al. Video question answering with iterative video-text co-tokenization
Liu et al. A fine-grained spatial-temporal attention model for video captioning
He et al. Deep learning in natural language generation from images
Rodriguez et al. How important is motion in sign language translation?
Liu et al. A multimodal approach for multiple-relation extraction in videos
Wang et al. RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction
Hussain et al. Low-resource MobileBERT for emotion recognition in imbalanced text datasets mitigating challenges with limited resources
Xue et al. Continuous sign language recognition based on hierarchical memory sequence network
Lee et al. DVC‐Net: A deep neural network model for dense video captioning
Bhalekar et al. Generation of image captions using VGG and ResNet CNN models cascaded with RNN approach
CN116069926A (en) Text classification model training method and device, text classification method and device
Qi et al. Video captioning via a symmetric bidirectional decoder