
CN116597362A - Method, device, electronic equipment and medium for identifying hotspot video segments

Info

Publication number: CN116597362A
Application number: CN202310636583.6A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: text, video, video segment, bullet screen, initial
Inventors: 舒畅, 陈又新
Current and original assignee: Ping An Technology Shenzhen Co Ltd
Legal status: Pending

Classifications

    • G06V 20/40 - Scenes; scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/635 - Overlay text, e.g. embedded captions in a TV program
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 - Fusion of extracted features at the preprocessing, feature extraction or classification level
    • G06F 16/35 - Information retrieval of unstructured textual data; clustering; classification
    • G06F 16/353 - Classification into predefined classes
    • G06F 40/216 - Natural language analysis; parsing using statistical methods
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to artificial intelligence technology and discloses a method for identifying hotspot video segments, comprising: extracting, with a pre-trained image-text matching model, the video features of each video segment and the text features of each bullet screen text; sequentially selecting one bullet screen text as the text to be matched, and taking the initial video segment corresponding to the text to be matched together with the video segments in a preset adjacent range as the set of video segments to be matched; calculating the similarity between the text features of the text to be matched and the video features corresponding to the set of video segments to be matched, and selecting the video segments that satisfy a preset similarity condition as the matching video segments of the text to be matched; classifying the bullet screen texts after image-text matching, calculating the hotspot degree of the video segments in each category according to the number of bullet screen texts in that category, and determining hotspot videos according to the hotspot degree. The present invention also proposes an apparatus, an electronic device and a medium for identifying hotspot video segments. The present invention can improve the accuracy of identifying hotspot video segments in medical videos.

Description

Hot spot video clip identification method and device, electronic equipment and medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for identifying a hotspot video clip, an electronic device, and a computer readable storage medium.
Background
With the development of network technology, medical publicity and popularization in the medical field, such as health education, medical science popularization and medical tips, is increasingly carried out for ordinary users in the form of recorded videos, and the traditional non-interactive viewing mode can no longer satisfy user demands. To enhance interaction with users, apps that publish medical videos improve the user experience through a bullet screen (danmaku) function, and can also adjust their video release strategy in time according to the information fed back by users, thereby guaranteeing the quantity and quality of released hotspot videos.
Traditional hotspot video clip extraction depends heavily on manual work and therefore carries a certain subjectivity and limitation: a video clip that a particular editor personally considers worth sharing may not be accepted by the general audience, and may even convey the editor's own negative emotions or questionable values, seriously affecting the environment of self-media platforms. Manual extraction also tends to leave hotspot video segments insufficiently mined.
Current techniques for automatically identifying hotspot videos with a machine mainly comprise detection based on the number of bullet screens in a video against a threshold, and detection based on the emotion polarity of bullet screen text.
The detection method based on the number of bullet screens and a threshold cannot satisfy users who want to select hotspot segments according to content and emotional tendency.
The bullet screen emotion polarity detection technique computes the overall emotion intensity of each video segment by constructing an emotion dictionary, inspects the rate of change of emotion intensity between adjacent video segments, and identifies hotspot video segments from these change-rate features; it can satisfy users who search for hotspot segments by emotional tendency and keywords. However, this technique does not consider the delay of bullet screen text: there is a time lag between a bullet screen comment and the video time point or period that actually resonated with the user, so mining hotspot video segments from bullet screen text alone introduces a certain error, and the identification accuracy of hotspot video segments needs further improvement.
Disclosure of Invention
The invention provides a method, a device, electronic equipment and a computer readable storage medium for identifying hot spot video clips, which mainly aim to improve the accuracy of identifying hot spot video clips.
In order to achieve the above object, the present invention provides a method for identifying a hotspot video clip, including:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determining hot spot videos according to the hot spot degree.
Optionally, the extracting the video feature of each video segment in the initial video segment set includes:
carrying out framing processing on each video segment in the initial video segment set in sequence to obtain a video frame set corresponding to each video segment;
Extracting pixel values of three RGB channels of each video frame in the video frame set;
generating a pixel point matrix of the corresponding video frame by using the extracted pixel values;
and fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segments.
Optionally, the extracting text features of each barrage text in the initial barrage text set includes:
sequentially carrying out word segmentation processing on each barrage text to obtain a plurality of text words;
selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and the adjacent text word of the target word in a preset neighborhood range of the target word;
constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
respectively converting the text word segmentation into word vectors, and splicing the word vectors into vector matrixes;
Performing product operation by using the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
and extracting text features of the barrage text from the text vector matrix.
Optionally, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the text word segments one by one as a target word segment, and calculating a key value of the target word segment according to a word vector of the target word segment and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words, in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
Optionally, the classifying of the bullet screen texts in the image-text matched video bullet screen set includes:
sequentially selecting one video segment as a target video segment in the video barrage set with the matched image and text;
calculating a preset classification feature vector of each barrage text in the target video segment;
clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
And calculating the probability value between the classification feature vector of each clustering center and a preset classification label, and selecting the classification label corresponding to the probability value larger than the preset probability threshold as the classification of the target video segment.
Optionally, the calculating the hot spot degree of the video segment in each category according to the number of the bullet screen texts in the corresponding category, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category;
taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree;
and selecting video segments in a preset ordering range as hot spot videos.
Optionally, the acquiring the initial video segment set and the initial barrage text set corresponding to the initial video segment set includes:
acquiring a long video with bullet screen data;
dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set;
and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
In order to solve the above problems, the present invention further provides a hotspot video clip identification apparatus, which includes:
the image-text object acquisition module is used for acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
the image-text feature extraction module is used for extracting the video feature of each video segment in the initial video segment set and the text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module is used for sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
and the image-text classification and hot spot statistics module is used for classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determining hot spot videos according to the hot spot degree.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
And the processor executes the program stored in the memory to realize the hot spot video clip identification method.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned hotspot video clip identification method.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Drawings
Fig. 1 is a flowchart of a method for identifying a hot spot video clip according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a detailed implementation of one of the steps in the method for identifying hot video clips according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another step in the method for identifying hot video clips according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating another step in the method for identifying hot video clips according to an embodiment of the present application;
FIG. 5 is a functional block diagram of a hot spot video clip identification apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device for implementing the hotspot video clip identification method according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a hot spot video clip identification method. The execution subject of the hotspot video clip identification method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the hotspot video clip identification method may be performed by software or hardware installed on a terminal device or a server device, where the software may be a blockchain platform. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of a method for identifying a hot spot video clip according to an embodiment of the present invention is shown. In this embodiment, the method for identifying a hotspot video clip includes:
s1, acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
in the embodiment of the invention, a medical guidance video is taken as an example for explanation. The initial video segment set is composed of a plurality of medical guidance videos with different or specified lengths, such as a simple wound dressing video, a home blood pressure measurement guidance video, a cervical spondylosis prevention exercise video and the like. The initial bullet screen text set refers to a set of bullet screen texts generated in the playing process of each video segment in the initial video segment set.
In the embodiment of the invention, the initial video segment set and the initial bullet screen text set can be acquired from a designated storage area, for example, from a database, a blockchain, a network cache or another designated storage area, using computer statements with a data acquisition function (Java statements, Python statements, and the like).
In detail, the acquiring the initial video segment set and the initial bullet screen text set corresponding to the initial video segment set includes: acquiring a long video with bullet screen data; dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set; and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
Illustratively, let $V_1, V_2, V_3, \dots, V_N$ denote $N$ long videos with bullet screen data, each with a different duration. With a preset unit length of $L$ seconds, $v_i = \{v_i^1, v_i^2, \dots, v_i^{n_i}\}$ $(i = 1, 2, \dots, N)$ denotes the $n_i$ video segments obtained from the $i$-th long video.
In the embodiment of the invention, the preprocessing includes operations such as text de-duplication, removal of useless characters, and filtering of emoticon symbols.
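As a concrete illustration of this acquisition step, the following minimal Python sketch groups raw bullet screen records into segments of the preset unit duration and applies the preprocessing operations just listed; the record fields (`time`, `text`), the regular expressions, and the unit duration value are illustrative assumptions rather than details disclosed by the embodiment.

```python
import re

SEGMENT_SECONDS = 5  # preset unit duration L; the value is an assumption

def assign_segments(danmaku, segment_seconds=SEGMENT_SECONDS):
    """Group raw bullet screen records {'time': float, 'text': str} by segment index."""
    segments = {}
    for item in danmaku:
        j = int(item["time"] // segment_seconds)  # index of the j-th video segment
        segments.setdefault(j, []).append(item["text"])
    return segments

_EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def preprocess(texts):
    """De-duplicate, remove useless characters and filter emoticons, per the description."""
    seen, cleaned = set(), []
    for t in texts:
        t = _EMOJI.sub("", t)                              # filter emoticon symbols
        t = re.sub(r"[^\w\u4e00-\u9fff，。！？]", "", t)    # remove useless characters
        if t and t not in seen:                            # text de-duplication
            seen.add(t)
            cleaned.append(t)
    return cleaned
```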
Illustratively, the bullet screen texts contained in the $j$-th video segment of the $i$-th long video $v_i$ can be denoted $T_{ij}$ $(i = 1, 2, \dots, N;\ j = 1, 2, \dots, n_i)$.
It will be appreciated that, because the generation time of a bullet screen text lags the video playing time to some extent, a bullet screen text that appears in a given video segment may actually refer to the segment before it, or to several preceding segments. Consequently, a certain time delay exists between the initial bullet screen text set and the initial video segment set, and the two are not perfectly matched.
S2, extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
In the embodiment of the invention, the pre-trained image-text matching model is a neural network model built on an attention mechanism. The image-text matching model is trained with a given video segment training set and bullet screen text training set, so that it can find the optimal delay parameter of the bullet screen text relative to the video segments.
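The embodiment does not disclose the architecture of the matching model beyond the attention mechanism and the learned delay behavior, so the following PyTorch sketch should be read as one plausible arrangement, an attention scorer with a learnable bias over candidate delays; every layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class ImageTextMatcher(nn.Module):
    """Hypothetical attention-based image-text matcher; details are assumptions."""
    def __init__(self, dim=256, max_delay=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # learnable scores over candidate delays 0..max_delay (bullet screen lag)
        self.delay_logits = nn.Parameter(torch.zeros(max_delay + 1))

    def forward(self, text_feat, segment_feats):
        # text_feat: (B, 1, dim); segment_feats: (B, K, dim), K candidate segments
        attended, _ = self.attn(text_feat, segment_feats, segment_feats)
        sim = torch.einsum("bkd,bod->bk", segment_feats, attended)
        # bias matching toward typical bullet screen delays (requires K <= max_delay + 1)
        return sim + self.delay_logits[: sim.size(1)]
```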
It can be understood that each video segment is composed of an unequal number of video frames, each video frame can be regarded as a static image, typically, the image is composed of R, G, B three channels, each channel can be regarded as a pixel matrix, and the pixel matrix can be used to represent the features of the corresponding image, so that the video features of the corresponding video segment can be generated according to the features of the image.
In detail, referring to fig. 2, the extracting the video feature of each video segment in the initial video segment set includes:
s21, sequentially carrying out framing treatment on each video segment in the initial video segment set to obtain a video frame set corresponding to each video segment;
s22, extracting pixel values of RGB three channels of each video frame in the video frame set;
s23, generating a pixel point matrix of a corresponding video frame by using the extracted pixel values;
S24, fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segment.
In an embodiment of the present invention, the following formula may be used to fuse the pixel point matrix of each video frame:

$$v_i^j = \sigma\Big(W \cdot \sum_{n}\sum_{x} \big[R_{n,x};\ G_{n,x};\ B_{n,x}\big] + b\Big)$$

wherein $v_i^j$ represents the video feature corresponding to the $j$-th video segment in the $i$-th long video, $R_{n,x}$, $G_{n,x}$ and $B_{n,x}$ respectively represent the pixel matrices of the RGB channels of the $x$-th frame in the $n$-th second, $W$ represents the parameters of the convolution transformation, and $b$ represents the corresponding bias term.
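For illustration, the framing and fusion steps can be sketched as follows, assuming OpenCV for frame extraction and a single linear transform standing in for the convolutional fusion; the sampling rate, the shapes of `W` and `b`, and the tanh nonlinearity are assumptions, since the embodiment only names $W$ and $b$.

```python
import cv2
import numpy as np

def video_frames(path, fps_sample=1):
    """Frame the video segment, sampling roughly fps_sample frames per second."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    step = max(int(fps // fps_sample), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # OpenCV returns BGR; reorder to an RGB pixel point matrix in [0, 1]
            frames.append(frame[:, :, ::-1].astype(np.float32) / 255.0)
        idx += 1
    cap.release()
    return frames

def fuse_frames(frames, W, b):
    """Fuse per-frame pixel matrices into one video feature (illustrative form).

    W: (d, H*W*3) learned transform standing in for the convolution; b: (d,) bias.
    """
    stacked = np.mean(np.stack(frames), axis=0)   # average over the frame set
    return np.tanh(W @ stacked.reshape(-1) + b)   # transform W plus bias term b
```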
In the embodiment of the invention, because each barrage text is composed of natural language, analyzing it directly would occupy a large amount of computing resources and make analysis inefficient; therefore, the barrage text can be converted into a text vector matrix, so that the barrage text expressed in natural language is converted into numerical form.
In the embodiment of the invention, methods such as GloVe (Global Vectors for Word Representation) or an Embedding Layer can be adopted to convert each barrage text into a text vector matrix. Further, after the conversion, feature extraction can be performed on the text vector matrix to obtain the text features of the barrage text, where the text features include but are not limited to the text scene, the text topic and the text keywords.
In one embodiment of the present invention, referring to fig. 3, the extracting text features of each barrage text in the initial barrage text set includes:
s21, sequentially performing word segmentation processing on each barrage text to obtain a plurality of text words;
s22, selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and the adjacent text word of the target word in a preset neighborhood range of the target word;
s23, constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
s24, respectively converting the text word segmentation into word vectors, and splicing the word vectors into vector matrixes;
s25, performing product operation by using the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
s26, extracting text features of the barrage text from the text vector matrix.
In detail, the bullet screen text may be subjected to word segmentation processing by using a preset standard dictionary, so as to obtain a plurality of text word segments, where the standard dictionary includes a plurality of standard word segments.
For example, substrings of different lengths of the barrage text are searched in the standard dictionary, and if a standard word identical to such a substring is found, the found standard word can be determined to be a text word of the barrage text.
Illustratively, a co-occurrence matrix of the following form may be constructed using the co-occurrence counts corresponding to each text word:

$$X = \begin{bmatrix} X_{1,1} & \cdots & X_{1,n} \\ \vdots & \ddots & \vdots \\ X_{n,1} & \cdots & X_{n,n} \end{bmatrix}$$

wherein $X_{i,j}$ is the number of co-occurrences of the text word $i$ and its adjacent text word $j$ in the barrage text.
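The co-occurrence construction and the matrix product can be sketched as follows; the window size and the embedding lookup table are assumptions, and any pretrained GloVe vectors could serve as `embeddings`.

```python
import numpy as np

def cooccurrence_matrix(tokens, window=2):
    """Count co-occurrences of each text word with neighbours in a preset window."""
    vocab = {w: i for i, w in enumerate(dict.fromkeys(tokens))}
    X = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                X[vocab[w], vocab[tokens[j]]] += 1
    return X, vocab

def text_vector_matrix(tokens, embeddings, window=2):
    """Product of the co-occurrence matrix and the stacked word vectors."""
    X, vocab = cooccurrence_matrix(tokens, window)
    V = np.stack([embeddings[w] for w in vocab])  # one word vector per row
    return X @ V                                   # the text vector matrix
```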
In detail, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the text word segments one by one as a target word segment, and calculating a key value of the target word segment according to a word vector of the target word segment and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words, in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
In detail, since the barrage text contains a large number of text words, but not every text word is a feature of the barrage text, the text words need to be screened, one of the text words is selected from the text words one by one as a target word, and a key value of the target word is calculated according to a word vector of the target word and the text vector matrix, so that feature words which are representative of the barrage text are screened according to the key value, and the text feature of the barrage text is obtained.
Specifically, the calculating the key value of the target word according to the word vector of the target word and the text vector matrix includes:
calculating the key value of the target word by using the following key value algorithm:

$$K = \frac{\vec{t}^{\,T} W}{\lVert W \rVert}$$

wherein $K$ is the key value, $W$ is the text vector matrix, $T$ is the matrix transpose symbol, $\lVert \cdot \rVert$ is the modulus symbol, and $\vec{t}$ is the word vector of the target word.
In the embodiment of the invention, a preset number of text words are selected from the plurality of text words as feature words, in descending order of the key value of each text word.
For example, if the plurality of text words includes text word A, text word B and text word C, and the preset number is 2, then the two words with the largest key values, say text word A and text word B, are selected as feature words, and their word vectors are spliced to obtain the text feature of the barrage text.
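Putting the key-value screening into code, under the assumption that the reconstructed score above (a dot product against the text vector matrix, normalized by its modulus) is the intended formula:

```python
import numpy as np

def text_features(tokens, embeddings, W, k=2):
    """Score each text word against the text vector matrix W; splice the top-k vectors."""
    scores = {}
    for w in dict.fromkeys(tokens):
        v = embeddings[w]
        scores[w] = float(v @ W.sum(axis=0)) / np.linalg.norm(W)  # key value
    top = sorted(scores, key=scores.get, reverse=True)[:k]        # descending key value
    return np.concatenate([embeddings[w] for w in top])           # spliced text feature
```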
S3, sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
It will be appreciated that a bullet screen text usually carries a timestamp relative to the video playing time, which identifies the moment at which the user published a bullet screen for a video segment. However, because users take unequal amounts of time to edit bullet screen text and network transmission speeds vary, the bullet screen text does not exactly match the corresponding initial video segment; a bullet screen text may actually relate to a video segment preceding or following the initial video segment.
In the embodiment of the present invention, the preset adjacent range may be set according to the actual situation. For example, the video segments within the preset adjacent range corresponding to the $j$-th initial video segment include the $(j-E)$-th to the $(j+E)$-th video segments, where $0 < E \le j$ and $E$ can be a natural number such as 1, 2 or 3; that is, the video segments $v_i^{j-E}$ to $v_i^{j+E}$ form the set of video segments to be matched.
S4, calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched pictures and texts;
In the embodiment of the invention, the similarity between the text characteristics of the text to be matched and the video characteristics of each video segment in the video segment set to be matched can be calculated by using a preset activation function.
In the embodiment of the present invention, the similarity may be calculated using the following activation function:

$$s\big(T_{ij}, v_i^{k}\big) = \frac{\exp\big(t_{ij}^{T} f_i^{k}\big)}{\sum_{k'=j-E}^{j+E} \exp\big(t_{ij}^{T} f_i^{k'}\big)}, \qquad k = j-E, \dots, j+E$$

wherein $t_{ij}$ represents the text feature of the barrage text $T_{ij}$ corresponding to the $j$-th video segment of the $i$-th long video, $f_i^{k}$ represents the video feature of the $k$-th candidate video segment, and the $(j-E)$-th to $(j+E)$-th video segments of the $i$-th long video $(0 < E \le j)$ constitute the $2E+1$ candidate video segments.
In the embodiment of the present invention, the preset similarity condition may be set according to actual situations, for example, the preset similarity condition may be that when the similarity between the text feature of a text to be matched and the video feature of a video segment to be matched is greater than or equal to a preset similarity threshold, a matching relationship exists between the corresponding text to be matched and the video segment to be matched.
In the embodiment of the invention, within the image-text matched video bullet screen set, the barrage texts contained in each video segment are aligned with their corresponding video segments, achieving the image-text alignment effect.
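A sketch of S3 and S4 together: for each bullet screen text, the candidate window $[j-E, j+E]$ is scored with a softmax over dot products, and the best-scoring segment is accepted if it clears a threshold. The values of `E` and `threshold`, and the fallback to the initial segment, are assumptions.

```python
import numpy as np

def match_text_to_segment(text_feat, video_feats, j, E=2, threshold=0.3):
    """Align one bullet screen text with the best segment in its candidate window."""
    lo, hi = max(0, j - E), min(len(video_feats) - 1, j + E)
    cand = np.stack(video_feats[lo : hi + 1])
    logits = cand @ text_feat                      # dot-product similarity
    sims = np.exp(logits) / np.exp(logits).sum()   # softmax over the window
    best = int(np.argmax(sims))
    if sims[best] >= threshold:                    # preset similarity condition
        return lo + best                           # index of the matching segment
    return j                                       # fall back to the initial segment
```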
S5, classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determining hot spot videos according to the hot spot degree.
In the embodiment of the invention, the classification of the corresponding matched video segments can be completed by carrying out cluster analysis on the barrage text. The classification may be emotion polarity classification, including but not limited to positive, neutral, negative, etc., and in practical application, the corresponding classification setting may be performed according to the needs of hotspot video selection.
In detail, referring to fig. 4, the classifying of the bullet screen texts in the image-text matched video bullet screen set includes:
s51, sequentially selecting one video segment from the image-text matched video bullet screen set as a target video segment;
s52, calculating a preset classification feature vector of each barrage text in the target video segment;
s53, clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
s54, calculating probability values between the classification feature vectors of each clustering center and preset classification labels, and selecting the classification labels corresponding to the probability values larger than a preset probability threshold as the classification of the target video segment.
In the embodiment of the invention, the classifying feature vector of each barrage text can be calculated by adopting methods such as Glove (Global Vectors for Word Representation, global word vector), embedding Layer and the like.
In the embodiment of the present invention, a preset activation function may be used to calculate a probability value between the classification feature vector of each cluster center and a preset classification label, where the activation function includes, but is not limited to, a softmax activation function, a sigmoid activation function, and a relu activation function, and the preset classification label includes, but is not limited to, positive, negative, and neutral.
In one embodiment of the present invention, the probability value may be calculated using the following activation function:

$$p(a \mid x) = \frac{\exp\big(w_a^{T} x\big)}{\sum_{a'=1}^{A} \exp\big(w_{a'}^{T} x\big)}$$

wherein $p(a \mid x)$ is the probability between the classification feature vector $x$ of a cluster center and the classification label $a$, $w_a$ is the weight vector of the classification label $a$, $T$ is the transpose operation symbol, $\exp$ is the exponential operation symbol, and $A$ is the number of preset classification labels.
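A sketch of s51 to s54 using scikit-learn's KMeans for the clustering and a softmax over per-label weight vectors for the probability; the label set, the weight matrix `label_weights`, and both thresholds are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

LABELS = ["positive", "neutral", "negative"]  # preset classification labels (assumed)

def classify_segment(feature_vectors, label_weights, n_clusters=3, p_threshold=0.5):
    """Cluster bullet screen feature vectors, then label each cluster centre by softmax.

    label_weights: (len(LABELS), d) matrix, one weight vector w_a per label.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.stack(feature_vectors))
    chosen = set()
    for centre in km.cluster_centers_:
        logits = label_weights @ centre
        probs = np.exp(logits) / np.exp(logits).sum()
        for a, p in zip(LABELS, probs):
            if p > p_threshold:                    # preset probability threshold
                chosen.add(a)
    return chosen  # classification label(s) of the target video segment
```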
Further, the calculating the hot spot degree of the video segment in the corresponding category according to the number of the bullet screen texts in each category, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category; taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree; and selecting video segments in a preset ordering range as hot spot videos.
Illustratively, the top three video segments in the ranking are selected as hotspot videos.
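The final statistics step reduces to counting and ranking, for example as below; the `(segment_index, label)` pair representation is an assumption, and `top_k=3` mirrors the example above.

```python
from collections import Counter

def hotspot_videos(matched, category, top_k=3):
    """matched: list of (segment_index, label) pairs after image-text alignment."""
    counts = Counter(seg for seg, lab in matched if lab == category)
    ranking = counts.most_common()                 # hot spot degree, descending
    return [seg for seg, _ in ranking[:top_k]]     # segments in the preset ranking range
```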
In the embodiment of the invention, after each barrage text has been matched and aligned with its corresponding video segment, the number of barrage texts in each video segment is not counted directly. Instead, the barrage texts in each video segment are first classified, which also has the effect of labeling each video segment; the number of barrage texts across all video segments in the same classification is then counted, which improves the accuracy of selecting hotspot videos.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Fig. 5 is a functional block diagram of a hotspot video clip identification apparatus according to an embodiment of the present invention.
The hotspot video clip identification apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the hotspot video clip identification apparatus 100 includes: the system comprises a picture and text object acquisition module 101, a picture and text feature extraction module 102, a picture and text alignment module 103 and a picture and text classification and hot spot statistics module 104. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the image-text object obtaining module 101 is configured to obtain an initial video segment set and an initial bullet screen text set corresponding to the initial video segment set;
the image-text feature extraction module 102 is configured to extract a video feature of each video segment in the initial video segment set and a text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module 103 is configured to sequentially select one bullet screen text from the initial bullet screen text set as a text to be matched, and use an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
The image-text classification and hot spot statistics module 104 is configured to classify the bullet screen texts in the image-text matched video bullet screen set, calculate the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determine the hot spot videos according to the hot spot degree.
In detail, each module in the hotspot video clip identification apparatus 100 in the embodiment of the present invention adopts the same technical means as the hotspot video clip identification method described in fig. 1 to 4, and can produce the same technical effects, which are not described herein.
Fig. 6 is a schematic structural diagram of an electronic device for implementing a method for identifying hot video clips according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a hot spot video clip identification program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of a hotspot video clip identification program, etc., but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., a hot spot video clip recognition program, etc.) stored in the memory 11, and calling data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 6 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The hotspot video clip identification program stored in the memory 11 of the electronic device 1 is a combination of instructions which, when executed in the processor 10, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
Sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determining hot spot videos according to the hot spot degree.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in each classification according to the number of bullet screen texts in that classification, and determining hot spot videos according to the hot spot degree.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1.一种热点视频片段识别方法,其特征在于,所述方法包括:1. A method for identifying trending video clips, characterized in that the method comprises: 获取初始视频段集及所述初始视频段集对应的初始弹幕文本集;Obtain the initial video segment set and the corresponding initial bullet screen text set; 利用预先训练好的图文匹配模型,提取所述初始视频段集中每个视频段的视频特征及所述初始弹幕文本集中每个弹幕文本的文本特征;Using a pre-trained image-text matching model, extract the video features of each video segment in the initial video segment set and the text features of each bullet screen text in the initial bullet screen text set; 从所述初始弹幕文本集中,依次选取一个弹幕文本作为待匹配文本,将所述待匹配文本对应的初始视频段及所述初始视频段对应的预设相邻范围内的视频段作为待匹配视频段集;From the initial set of bullet screen texts, one bullet screen text is selected as the text to be matched, and the initial video segment corresponding to the text to be matched and the video segments within the preset adjacent range corresponding to the initial video segment are taken as the set of video segments to be matched. 计算所述待匹配文本的文本特征与所述待匹配视频段集对应的视频特征之间的相似度,从所述待匹配视频段集中,选择满足预设相似度条件的视频段作为所述待匹配文本的匹配视频段,得到图文匹配的视频弹幕集;Calculate the similarity between the text features of the text to be matched and the video features corresponding to the set of video segments to be matched. Select video segments that meet the preset similarity conditions from the set of video segments to be matched as the matching video segments of the text to be matched, and obtain the set of video bullet comments with image-text matching. 对所述图文匹配的视频弹幕集中弹幕文本进行分类,根据每种分类中弹幕文本的数量的多少计算对应分类中视频段的热点程度,根据所述热点程度确定热点视频。The bullet screen text in the video bullet screen set of the image-text matching is classified. The popularity degree of the video segment in the corresponding category is calculated according to the number of bullet screen texts in each category. The popular videos are determined according to the popularity degree. 2.如权利要求1所述的热点视频片段识别方法,其特征在于,所述提取所述初始视频段集中每个视频段的视频特征,包括:2. The hotspot video segment identification method as described in claim 1, characterized in that, extracting the video features of each video segment in the initial video segment set includes: 依次对所述初始视频段集中的每个视频段进行分帧处理,得到每个所述视频段对应的视频帧集合。Each video segment in the initial video segment set is sequentially processed into frames to obtain a set of video frames corresponding to each video segment. 提取所述视频帧集合中每个视频帧的RGB三个通道的像素值;Extract the pixel values of the RGB three channels of each video frame in the video frame set; 利用提取到的像素值生成对应视频帧的像素点矩阵;The extracted pixel values are used to generate a pixel matrix for the corresponding video frame; 融合所述视频帧集合中每个视频帧对应的像素点矩阵,得到对应视频段的视频特征。By fusing the pixel matrix corresponding to each video frame in the video frame set, the video features of the corresponding video segment are obtained. 3.如权利要求1所述的热点视频片段识别方法,其特征在于,所述提取所述初始弹幕文本集中每个弹幕文本的文本特征,包括:3. The method for identifying trending video segments as described in claim 1, characterized in that, extracting the text features of each bullet screen text in the initial bullet screen text set includes: 依次对每个所述弹幕文本进行分词处理,得到多个文本分词;Each of the bullet screen texts is segmented sequentially to obtain multiple text segments; 从所述多个文本分词中逐个选取其中一个文本分词为目标分词,并统计所述目标分词和所述目标分词的相邻文本分词在所述目标分词的预设邻域范围内共同出现的共现次数;Select one text segment from the multiple text segmentations as the target segment, and count the number of times the target segment and its neighboring text segments co-occur within a preset neighborhood of the target segment. 
3. The method for identifying hotspot video segments according to claim 1, characterized in that extracting the text features of each bullet screen text in the initial bullet screen text set comprises:
performing word segmentation on each bullet screen text in turn to obtain a plurality of text tokens;
selecting, one by one, a text token from the plurality of text tokens as a target token, and counting the co-occurrences of the target token and its adjacent text tokens within a preset neighborhood of the target token;
constructing a co-occurrence matrix from the co-occurrence counts corresponding to each text token;
converting the plurality of text tokens into word vectors respectively, and concatenating the word vectors into a vector matrix;
multiplying the co-occurrence matrix by the vector matrix to obtain a text vector matrix;
extracting the text features of the bullet screen text from the text vector matrix.

4. The method for identifying hotspot video segments according to claim 3, characterized in that extracting the text features of the bullet screen text from the text vector matrix comprises:
selecting, one by one, a text token from the plurality of text tokens as a target token, and calculating a key value of the target token according to the word vector of the target token and the text vector matrix;
selecting, in descending order of key value, a preset number of text tokens from the plurality of text tokens as feature tokens;
concatenating the word vectors of the feature tokens to obtain the text features of the bullet screen text.

5. The method for identifying hotspot video segments according to claim 1, characterized in that classifying the bullet screen texts in the image-text matched video bullet screen set comprises:
selecting, in turn, one video segment from the image-text matched video bullet screen set as a target video segment;
calculating a preset classification feature vector for each bullet screen text in the target video segment;
clustering the bullet screen texts in the target video segment according to the classification feature vectors to obtain cluster centers;
calculating the probability value between the classification feature vector of each cluster center and preset classification labels, and selecting the classification labels whose probability values are greater than a preset probability threshold as the classification of the target video segment.

6. The method for identifying hotspot video segments according to claim 1, characterized in that calculating the hotspot degree of the video segments in each category according to the number of bullet screen texts in that category, and determining hotspot videos according to the hotspot degree, comprises:
counting, among all video segments of the same category, the number of bullet screen texts corresponding to each video segment;
taking the counted number as the hotspot degree of the corresponding video segment, and ranking all video segments of the same category according to the hotspot degree;
selecting the video segments within a preset ranking range as hotspot videos.
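Claims 3 and 4 together define a co-occurrence-based text feature. A minimal sketch under stated assumptions: "embed" is any pretrained word-embedding lookup, the neighborhood radius is illustrative, and the key value is computed here as the dot product between a token's word vector and its row of the text vector matrix, one plausible reading of claim 4 rather than a formula fixed by the claims:

    import numpy as np

    def bullet_text_feature(tokens, embed, window=2, top_k=4):
        """tokens: segmented bullet screen text (list of str);
        embed: dict mapping each token to a d-dimensional word vector."""
        vocab = sorted(set(tokens))
        idx = {t: i for i, t in enumerate(vocab)}
        cooc = np.zeros((len(vocab), len(vocab)))
        for i, t in enumerate(tokens):              # each token as the target
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:                          # co-occurrence in the window
                    cooc[idx[t], idx[tokens[j]]] += 1
        V = np.stack([embed[t] for t in vocab])     # vector matrix, n x d
        T = cooc @ V                                # text vector matrix (claim 3)
        keys = np.sum(T * V, axis=1)                # key value per token (claim 4)
        top = np.argsort(keys)[::-1][:top_k]        # feature tokens, by key value
        return np.concatenate([V[i] for i in top])  # concatenated text feature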
7. The method for identifying hotspot video segments according to claim 1, characterized in that obtaining the initial video segment set and the initial bullet screen text set corresponding to the initial video segment set comprises:
obtaining a long video carrying bullet screen data;
dividing the long video into video segments of equal length according to a preset unit duration to obtain the initial video segment set;
obtaining the bullet screen data contained in each video segment of the initial video segment set, and preprocessing each piece of bullet screen data to obtain the initial bullet screen text set corresponding to the initial video segment set.

8. A device for identifying hotspot video segments, characterized in that the device comprises:
an image-text object obtaining module, configured to obtain an initial video segment set and an initial bullet screen text set corresponding to the initial video segment set;
an image-text feature extraction module, configured to extract, by using a pre-trained image-text matching model, the video features of each video segment in the initial video segment set and the text features of each bullet screen text in the initial bullet screen text set;
an image-text alignment module, configured to select, from the initial bullet screen text set, one bullet screen text at a time as a text to be matched, take the initial video segment corresponding to the text to be matched and the video segments within a preset adjacent range of that initial video segment as a set of video segments to be matched, calculate the similarity between the text features of the text to be matched and the video features corresponding to the set of video segments to be matched, and select, from the set of video segments to be matched, the video segments satisfying a preset similarity condition as the matching video segments of the text to be matched, so as to obtain an image-text matched video bullet screen set;
an image-text classification and hotspot statistics module, configured to classify the bullet screen texts in the image-text matched video bullet screen set, calculate the hotspot degree of the video segments in each category according to the number of bullet screen texts in that category, and determine hotspot videos according to the hotspot degree.

9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the method for identifying hotspot video segments according to any one of claims 1 to 7.
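The alignment step of claim 1 operates on claim 7's equal-length segments: each bullet screen text is compared against its own segment and the segments within a preset adjacent range. A sketch, assuming the image-text matching model projects both modalities into a shared vector space; cosine similarity and a fixed threshold stand in for the unspecified "preset similarity condition":

    import numpy as np

    def match_bullet_to_segments(text_vec, seg_feats, seg_id, radius=1, thresh=0.35):
        """seg_feats: per-segment feature vectors in temporal order;
        seg_id: index of the segment the bullet screen text was posted in."""
        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        lo = max(0, seg_id - radius)                  # preset adjacent range
        hi = min(len(seg_feats), seg_id + radius + 1)
        scored = [(i, cosine(text_vec, seg_feats[i])) for i in range(lo, hi)]
        return [i for i, s in scored if s >= thresh]  # matching video segments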
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method for identifying hotspot video segments according to any one of claims 1 to 7 is implemented.
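For claim 6, the hotspot degree of a segment is simply the number of matched bullet screen texts it accumulates within one category. A closing sketch, with the "preset ranking range" illustrated as a top-N cut-off:

    from collections import Counter, defaultdict

    def hotspot_segments(matched, top_n=3):
        """matched: iterable of (segment_id, category) pairs, one pair per
        image-text matched bullet screen text."""
        per_category = defaultdict(Counter)
        for seg_id, category in matched:
            per_category[category][seg_id] += 1      # hotspot degree = count
        return {cat: [seg for seg, _ in counts.most_common(top_n)]
                for cat, counts in per_category.items()}

For example, matched = [(3, "funny"), (3, "funny"), (7, "funny")] ranks segment 3 above segment 7 in the "funny" category.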
CN202310636583.6A 2023-05-31 2023-05-31 Method, device, electronic equipment and medium for identifying hotspot video segments Pending CN116597362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310636583.6A CN116597362A (en) 2023-05-31 2023-05-31 Method, device, electronic equipment and medium for identifying hotspot video segments

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310636583.6A CN116597362A (en) 2023-05-31 2023-05-31 Method, device, electronic equipment and medium for identifying hotspot video segments

Publications (1)

Publication Number Publication Date
CN116597362A (en) 2023-08-15

Family

ID=87599034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310636583.6A Pending CN116597362A (en) 2023-05-31 2023-05-31 Method, device, electronic equipment and medium for identifying hotspot video segments

Country Status (1)

Country Link
CN (1) CN116597362A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290542A (en) * 2023-09-26 2023-12-26 京东方科技集团股份有限公司 Video question and answer method, computer equipment and storage medium
CN118608204A (en) * 2024-07-18 2024-09-06 金数信息科技(苏州)有限公司 A real-time effect tracking method for advertising based on multi-objective optimization algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248258A (en) * 2019-07-18 2019-09-17 腾讯科技(深圳)有限公司 Recommended method, device, storage medium and the computer equipment of video clip
CN113420556A (en) * 2021-07-23 2021-09-21 平安科技(深圳)有限公司 Multi-mode signal based emotion recognition method, device, equipment and storage medium
CN113420723A (en) * 2021-07-21 2021-09-21 北京有竹居网络技术有限公司 Method and device for acquiring video hotspot, readable medium and electronic equipment
CN114339362A (en) * 2021-12-08 2022-04-12 腾讯科技(深圳)有限公司 Video bullet screen matching method and device, computer equipment and storage medium
CN114979620A (en) * 2022-03-31 2022-08-30 北京邮电大学 Video bright spot segment detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110837579B (en) Video classification method, apparatus, computer and readable storage medium
CN113051356B (en) Open relation extraction method and device, electronic equipment and storage medium
CN113157927B (en) Text classification method, device, electronic device and readable storage medium
WO2022141861A1 (en) Emotion classification method and apparatus, electronic device, and storage medium
CN113704623B (en) Data recommendation method, device, equipment and storage medium
CN113360768A (en) Product recommendation method, device and equipment based on user portrait and storage medium
CN113707302B (en) Service recommendation method, device, equipment and storage medium based on associated information
CN114398557B (en) Information recommendation method and device based on double images, electronic equipment and storage medium
CN112380859A (en) Public opinion information recommendation method and device, electronic equipment and computer storage medium
CN117390173B (en) Massive resume screening method for semantic similarity matching
CN114677526A (en) Image classification method, device, equipment and medium
CN112328833B (en) Label processing method, device and computer readable storage medium
CN113742592A (en) Public opinion information pushing method, device, equipment and storage medium
CN112364068A (en) Course label generation method, device, equipment and medium
CN115525761A (en) Method, device, equipment and storage medium for article keyword screening category
CN116450829A (en) Medical text classification method, device, equipment and medium
CN113886708A (en) Product recommendation method, device, equipment and storage medium based on user information
CN116719904A (en) Information query methods, devices, equipment and storage media based on the combination of images and text
CN114386392B (en) Document generation method, device, equipment and storage medium
CN113592606B (en) Product recommendation method, device, equipment and storage medium based on multiple decisions
CN116521867B (en) Text clustering method and device, electronic equipment and storage medium
CN116644315B (en) Visual metaphor mining method, device, electronic equipment and medium
CN116597362A (en) Method, device, electronic equipment and medium for identifying hotspot video segments
CN113407843B (en) User portrait generation method, device, electronic device and computer storage medium
CN112861750B (en) Video extraction method, device, equipment and medium based on inflection point detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination