Wei et al., 2023 - Google Patents
Mseva: A system for multimodal short videos emotion visual analysisWei et al., 2023
View PDF- Document ID
- 12736360048191603041
- Author
- Wei Q
- Zhou Y
- Xiao L
- Zhang Y
- Publication year
- Publication venue
- arXiv preprint arXiv:2312.04279
External Links
Snippet
YouTube Shorts, a new section launched by YouTube in 2021, is a direct competitor to short video platforms like TikTok. It reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different …
- 230000008451 emotion 0 title abstract description 121
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30796—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30038—Querying based on information manually generated or based on information not derived from the media content, e.g. tags, keywords, comments, usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G06F17/30023—Querying
- G06F17/30029—Querying by filtering; by personalisation, e.g. querying making use of user profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce, e.g. shopping or e-commerce
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Pont-Tuset et al. | Connecting vision and language with localized narratives | |
| Wang et al. | An overview of image caption generation methods | |
| Gao et al. | Hierarchical representation network with auxiliary tasks for video captioning and video question answering | |
| Labatut et al. | Extraction and analysis of fictional character networks: A survey | |
| Li et al. | Know more say less: Image captioning based on scene graphs | |
| Ou et al. | Multimodal local-global attention network for affective video content analysis | |
| Yang et al. | Video captioning by adversarial LSTM | |
| CN108986186B (en) | Method and system for converting text into video | |
| JP2023062173A (en) | Video generation method and apparatus of the same, and neural network training method and apparatus of the same | |
| CN111798879A (en) | Method and apparatus for generating video | |
| CN110991290A (en) | Video description method based on semantic guidance and memory mechanism | |
| Amaresh et al. | Video captioning using deep learning: an overview of methods, datasets and metrics | |
| Huddar et al. | Attention-based word-level contextual feature extraction and cross-modality fusion for sentiment analysis and emotion classification | |
| CN116977992A (en) | Text information recognition method, device, computer equipment and storage medium | |
| CN114677165B (en) | Contextual online advertising delivery methods, devices, servers, and storage media | |
| Wu et al. | Multimodal emotion recognition in conversations: A survey of methods, trends, challenges and prospects | |
| Mei et al. | Vision and language: from visual perception to content creation | |
| Braun et al. | Automating audio description | |
| CN113901263B (en) | Label generation method and device for video material | |
| Wei et al. | Mseva: A system for multimodal short videos emotion visual analysis | |
| Yousif et al. | Semantic-based temporal attention network for Arabic Video Captioning | |
| Vayadande et al. | The Rise of AI‐Generated News Videos: A Detailed Review | |
| Reiter et al. | Exploring text recombination for automatic narrative level detection | |
| Morato et al. | A summarization approach to evaluating audio captioning | |
| Wei et al. | MEAS: Multimodal Emotion Analysis System for Short Videos on Social Media Platforms |