Wei et al., 2023 - Google Patents

Mseva: A system for multimodal short videos emotion visual analysis

Wei et al., 2023

Document ID: 12736360048191603041
Author: Wei Q; Zhou Y; Xiao L; Zhang Y
Publication year: 2023
Publication venue: arXiv preprint arXiv:2312.04279

External Links

Cited by

Snippet

YouTube Shorts, a new section launched by YouTube in 2021, is a direct competitor to short video platforms like TikTok. It reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different …

Continue reading at arxiv.org (PDF) (other versions)

230000008451 emotion 0 title abstract description 121

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30796—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using original textual content or text extracted from visual content or transcript of audio data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30029—Querying by filtering; by personalisation, e.g. querying making use of user profiles
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce, e.g. shopping or e-commerce
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information

Similar Documents

Publication	Publication Date	Title
Pont-Tuset et al.	2020	Connecting vision and language with localized narratives
Wang et al.	2020	An overview of image caption generation methods
Gao et al.	2021	Hierarchical representation network with auxiliary tasks for video captioning and video question answering
Labatut et al.	2019	Extraction and analysis of fictional character networks: A survey
Li et al.	2019	Know more say less: Image captioning based on scene graphs
Ou et al.	2020	Multimodal local-global attention network for affective video content analysis
Yang et al.	2018	Video captioning by adversarial LSTM
CN108986186B (en)	2023-05-05	Method and system for converting text into video
JP2023062173A (en)	2023-05-02	Video generation method and apparatus of the same, and neural network training method and apparatus of the same
CN111798879A (en)	2020-10-20	Method and apparatus for generating video
CN110991290A (en)	2020-04-10	Video description method based on semantic guidance and memory mechanism
Amaresh et al.	2019	Video captioning using deep learning: an overview of methods, datasets and metrics
Huddar et al.	2020	Attention-based word-level contextual feature extraction and cross-modality fusion for sentiment analysis and emotion classification
CN116977992A (en)	2023-10-31	Text information recognition method, device, computer equipment and storage medium
CN114677165B (en)	2025-11-25	Contextual online advertising delivery methods, devices, servers, and storage media
Wu et al.	2025	Multimodal emotion recognition in conversations: A survey of methods, trends, challenges and prospects
Mei et al.	2020	Vision and language: from visual perception to content creation
Braun et al.	2022	Automating audio description
CN113901263B (en)	2022-08-19	Label generation method and device for video material
Wei et al.	2023	Mseva: A system for multimodal short videos emotion visual analysis
Yousif et al.	2025	Semantic-based temporal attention network for Arabic Video Captioning
Vayadande et al.	2024	The Rise of AI‐Generated News Videos: A Detailed Review
Reiter et al.	2022	Exploring text recombination for automatic narrative level detection
Morato et al.	2022	A summarization approach to evaluating audio captioning
Wei et al.	2024	MEAS: Multimodal Emotion Analysis System for Short Videos on Social Media Platforms