Disclosure of Invention
The invention provides a hotspot video clip identification method, apparatus, electronic device and computer-readable storage medium, which mainly aim to improve the accuracy of identifying hotspot video clips.
In order to achieve the above object, the present invention provides a method for identifying a hotspot video clip, including:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
Optionally, the extracting the video feature of each video segment in the initial video segment set includes:
carrying out framing processing on each video segment in the initial video segment set in sequence to obtain a video frame set corresponding to each video segment;
extracting pixel values of the three RGB channels of each video frame in the video frame set;
generating a pixel point matrix of the corresponding video frame by using the extracted pixel values;
and fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segments.
Optionally, the extracting text features of each barrage text in the initial barrage text set includes:
sequentially carrying out word segmentation processing on each barrage text to obtain a plurality of text words;
selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and the adjacent text word of the target word in a preset neighborhood range of the target word;
constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
respectively converting the text words into word vectors, and splicing the word vectors into a vector matrix;
performing a product operation on the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
and extracting text features of the barrage text from the text vector matrix.
Optionally, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the plurality of text words one by one as a target word, and calculating a key value of the target word according to the word vector of the target word and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
Optionally, the classifying the barrage texts in the image-text matched video barrage set includes:
sequentially selecting one video segment from the image-text matched video barrage set as a target video segment;
calculating a preset classification feature vector of each barrage text in the target video segment;
clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
and calculating a probability value between the classification feature vector of each clustering center and a preset classification label, and selecting the classification label corresponding to a probability value larger than a preset probability threshold as the classification of the target video segment.
Optionally, the calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category;
taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree;
and selecting video segments in a preset ordering range as hot spot videos.
Optionally, the acquiring the initial video segment set and the initial barrage text set corresponding to the initial video segment set includes:
acquiring a long video with bullet screen data;
dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set;
and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
In order to solve the above problems, the present invention further provides a hotspot video clip identification apparatus, which includes:
the image-text object acquisition module is used for acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
the image-text feature extraction module is used for extracting the video feature of each video segment in the initial video segment set and the text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module is used for sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
and the image-text classification and hot spot statistics module is used for classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor that executes the at least one computer program stored in the memory to implement the hotspot video clip identification method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned hotspot video clip identification method.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a hotspot video clip identification method. The execution subject of the hotspot video clip identification method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the hotspot video clip identification method may be performed by software or hardware installed on a terminal device or a server device, where the software may be a blockchain platform. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of a method for identifying a hot spot video clip according to an embodiment of the present invention is shown. In this embodiment, the method for identifying a hotspot video clip includes:
s1, acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
in the embodiment of the invention, a medical guidance video is taken as an example for explanation. The initial video segment set is composed of a plurality of medical guidance videos with different or specified lengths, such as a simple wound dressing video, a home blood pressure measurement guidance video, a cervical spondylosis prevention exercise video and the like. The initial bullet screen text set refers to a set of bullet screen texts generated in the playing process of each video segment in the initial video segment set.
In the embodiment of the invention, the initial video segment set and the initial bullet screen text set can be acquired from a designated storage area, for example, from a database, a blockchain, a network cache or the like, through computer statements with a data acquisition function (Java statements, Python statements and the like).
In detail, the acquiring the initial video segment set and the initial bullet screen text set corresponding to the initial video segment set includes: acquiring a long video with bullet screen data; dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set; and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
Illustratively, $V_1, V_2, V_3, \dots, V_N$ respectively represent N long videos with bullet screen data, the duration of each long video being different. With a preset unit length of L seconds, $V_i^1, V_i^2, \dots, V_i^{n_i}$ ($i = 1, 2, \dots, N$) represent the $n_i$ video segments obtained from the i-th long video.
In the embodiment of the invention, the preprocessing comprises operations such as text de-duplication, useless character removal, expression symbol filtering and the like.
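As a non-limiting sketch in Python, the splitting and preprocessing might look as follows; the cleaning rules, function names and the ceiling-division segment count are assumptions for illustration, not a prescribed implementation:

```python
import re

def preprocess_danmaku(raw_texts):
    """De-duplicate bullet screen texts, strip useless characters and filter
    emoticons; the concrete cleaning rules here are assumed for the sketch."""
    seen, cleaned = set(), []
    for text in raw_texts:
        # Keep word characters (incl. CJK) and basic punctuation; drop emoji etc.
        text = re.sub(r"[^\w,.!?，。！？ ]", "", text).strip()
        if text and text not in seen:  # remove duplicate texts
            seen.add(text)
            cleaned.append(text)
    return cleaned

def split_into_segments(duration_s, unit_s, timed_texts):
    """Assign each (timestamp, text) bullet screen to its L-second segment."""
    n = -(-int(duration_s) // unit_s)  # number of equal-length segments (ceiling)
    segments = [[] for _ in range(n)]
    for t, text in timed_texts:
        segments[min(int(t // unit_s), n - 1)].append(text)
    return segments

segments = split_into_segments(35, 10, [(3.0, "great demo"), (3.0, "great demo"), (31.5, "replay!")])
print([preprocess_danmaku(texts) for texts in segments])  # [['great demo'], [], [], ['replay!']]
```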
Illustratively, the bullet screen texts contained in the j-th video segment of the i-th long video ($i = 1, 2, \dots, N$; $j = 1, 2, \dots, n_i$) can respectively be represented as $T_i^j$.
It will be appreciated that, because the generation time of a bullet screen text is delayed relative to the video playing time, a bullet screen text may appear in the j-th video segment while the content it actually expresses refers to the preceding video segment or the preceding several video segments; therefore, a certain time delay exists between the initial barrage text set and the initial video segment set, and the two are not completely matched.
S2, extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
In the embodiment of the invention, the pre-trained graph-text matching model is a neural network model created based on an attention mechanism. And training the image-text matching model by using a given video segment training set and a barrage text training set, so that the image-text matching model can find the optimal barrage text delay time parameter relative to the video segment.
It can be understood that each video segment is composed of an unequal number of video frames, each video frame can be regarded as a static image, typically, the image is composed of R, G, B three channels, each channel can be regarded as a pixel matrix, and the pixel matrix can be used to represent the features of the corresponding image, so that the video features of the corresponding video segment can be generated according to the features of the image.
In detail, referring to fig. 3, the extracting the video feature of each video segment in the initial video segment set includes:
S21, sequentially carrying out framing processing on each video segment in the initial video segment set to obtain a video frame set corresponding to each video segment;
S22, extracting pixel values of the three RGB channels of each video frame in the video frame set;
S23, generating a pixel point matrix of the corresponding video frame by using the extracted pixel values;
S24, fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segment.
In an embodiment of the present invention, the following formula may be used to fuse the pixel point matrix of each video frame:
$$F_i^j = W \cdot \left[ R_{n,x} \,;\, G_{n,x} \,;\, B_{n,x} \right] + b$$

wherein $F_i^j$ represents the video feature corresponding to the j-th video segment in the i-th long video; $R_{n,x}$, $G_{n,x}$ and $B_{n,x}$ respectively denote the pixel matrices of the R, G and B channels of the x-th frame in the n-th second; W represents the parameters of the convolution transformation, and b represents the corresponding bias term.
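A minimal numerical sketch of this fusion, assuming a single learned linear projection W and bias b stand in for the convolution transformation (the shapes and random values are illustrative; in practice W and b would come from the pre-trained image-text matching model):

```python
import numpy as np

rng = np.random.default_rng(0)

def video_feature(frames, W, b):
    """Fuse the per-frame RGB pixel matrices of one video segment into a
    single feature vector via a learned linear transform plus bias."""
    per_frame = frames.reshape(len(frames), -1)  # flatten each frame's R, G, B pixels
    pooled = per_frame.mean(axis=0)              # fuse the frames of the segment
    return W @ pooled + b                        # convolution-style projection + bias

frames = rng.integers(0, 256, size=(8, 4, 4, 3)).astype(np.float32)  # toy 8-frame segment
W = rng.normal(size=(16, 4 * 4 * 3)).astype(np.float32)              # hypothetical learned weights
b = np.zeros(16, dtype=np.float32)                                   # hypothetical bias term
print(video_feature(frames, W, b).shape)                             # (16,) video feature
```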
In the embodiment of the invention, since each barrage text is composed of natural language, if the barrage text is directly analyzed, a large amount of calculation resources are occupied, and the analysis efficiency is low, therefore, the barrage text can be converted into a text vector matrix, and further the barrage text expressed by the natural language is converted into a numerical form.
In the embodiment of the invention, methods such as GloVe (Global Vectors for Word Representation) and an Embedding Layer can be adopted to convert each barrage text into a text vector matrix; further, after converting the barrage text into the text vector matrix, feature extraction can be performed on the text vector matrix to obtain text features of the barrage text, where the text features include but are not limited to text scenes, text topics and text keywords.
In one embodiment of the present invention, referring to fig. 3, the extracting text features of each barrage text in the initial barrage text set includes:
S21, sequentially performing word segmentation processing on each barrage text to obtain a plurality of text words;
S22, selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and each adjacent text word within a preset neighborhood range of the target word;
S23, constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
S24, respectively converting the text words into word vectors, and splicing the word vectors into a vector matrix;
S25, performing a product operation on the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
S26, extracting text features of the barrage text from the text vector matrix.
In detail, the bullet screen text may be subjected to word segmentation processing by using a preset standard dictionary, so as to obtain a plurality of text word segments, where the standard dictionary includes a plurality of standard word segments.
For example, segments of the barrage text of different lengths are searched in the standard dictionary, and if a standard word identical to a segment of the barrage text is found, the found standard word can be determined to be a text word of the barrage text.
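A toy sketch of such dictionary-based word segmentation using forward maximum matching; the dictionary contents and the maximum word length are hypothetical (a production system could instead use a segmentation library such as jieba):

```python
def segment_text(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest substring
    that appears in the standard dictionary, falling back to a single char."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

dictionary = {"量血压", "血压", "教学", "视频"}    # toy standard dictionary
print(segment_text("量血压教学视频", dictionary))  # ['量血压', '教学', '视频']
```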
Illustratively, the co-occurrence matrix shown below may be constructed using the co-occurrence times corresponding to each text word:

$$X = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,m} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1} & X_{m,2} & \cdots & X_{m,m} \end{bmatrix}$$

wherein $X_{i,j}$ is the number of co-occurrences of text word i and its adjacent text word j in the barrage text, and m is the number of text words.
In detail, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the plurality of text words one by one as a target word, and calculating a key value of the target word according to the word vector of the target word and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
In detail, since the barrage text contains a large number of text words, but not every text word is a feature of the barrage text, the text words need to be screened: one text word is selected from the plurality of text words one by one as the target word, and the key value of the target word is calculated according to the word vector of the target word and the text vector matrix, so that feature words representative of the barrage text are screened out according to the key values to obtain the text features of the barrage text.
Specifically, the calculating the key value of the target word according to the word vector of the target word and the text vector matrix includes:
calculating the key value of the target word by using the following key value algorithm:
$$K = \frac{\left| w_t^{T} W \right|}{\lVert w_t \rVert \, \lVert W \rVert}$$

wherein K is the key value, W is the text vector matrix, T is the matrix transposition symbol, $\lVert \cdot \rVert$ is the modulo symbol, and $w_t$ is the word vector of the target word.
In the embodiment of the invention, a preset number of text words are selected from the plurality of text words as feature words in descending order of the key value of each text word.
For example, if the plurality of text words includes text word A, text word B and other text words, and the preset number is 2, then text word A and text word B, having the largest key values, are selected as feature words, and the word vectors of text word A and text word B are spliced to obtain the text feature of the barrage text.
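The screening of feature words can be sketched as follows; the key value is implemented as the normalised product from the formula above, and the word vectors and matrix shapes are illustrative assumptions:

```python
import numpy as np

def key_value(word_vec, text_matrix):
    """K = |w^T W| / (||w|| * ||W||): normalised product of the target word's
    vector with the text vector matrix."""
    num = np.linalg.norm(word_vec @ text_matrix)
    return float(num / (np.linalg.norm(word_vec) * np.linalg.norm(text_matrix)))

def text_feature(word_vectors, text_matrix, k=2):
    """Keep the k text words with the largest key values as feature words and
    splice their word vectors into the text feature."""
    ranked = sorted(word_vectors,
                    key=lambda w: key_value(word_vectors[w], text_matrix),
                    reverse=True)
    return np.concatenate([word_vectors[w] for w in ranked[:k]])

rng = np.random.default_rng(1)
word_vectors = {w: rng.normal(size=4) for w in ["A", "B", "C"]}  # toy word vectors
text_matrix = rng.normal(size=(4, 3))                            # toy text vector matrix
print(text_feature(word_vectors, text_matrix, k=2).shape)        # (8,): two spliced vectors
```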
S3, sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
It will be appreciated that a bullet screen text typically contains a timestamp relative to the video playing time, which identifies the moment of a video segment at which the user published the bullet screen; however, because users take different amounts of time to edit bullet screen texts, network communication speeds vary, and so on, the bullet screen text does not exactly match the corresponding initial video segment. A bullet screen text may instead be associated with a video segment preceding or following the initial video segment.
In the embodiment of the present invention, the preset adjacent range may be set according to the actual situation. For example, the video segments within the preset adjacent range corresponding to the j-th initial video segment include the (j-ε)-th ($0 < \varepsilon \le j$) to the (j+ε)-th video segments, where ε may be a natural number such as 1, 2 or 3; that is, the video segments from $V_i^{j-\varepsilon}$ to $V_i^{j+\varepsilon}$ form the set of video segments to be matched.
S4, calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched pictures and texts;
In the embodiment of the invention, the similarity between the text characteristics of the text to be matched and the video characteristics of each video segment in the video segment set to be matched can be calculated by using a preset activation function.
In the embodiment of the present invention, the similarity may be calculated using the following activation function:
$$\operatorname{sim}\left(t_i^j, F_i^k\right) = \frac{\exp\left(t_i^j \cdot F_i^k\right)}{\sum_{m=j-\varepsilon}^{j+\varepsilon} \exp\left(t_i^j \cdot F_i^m\right)}, \qquad k = j-\varepsilon, \dots, j+\varepsilon$$

wherein $t_i^j$ represents the text feature of the barrage text corresponding to the j-th video segment of the i-th long video, and $F_i^{j-\varepsilon}, \dots, F_i^{j+\varepsilon}$ ($0 < \varepsilon \le j$) represent the video features of the $2\varepsilon + 1$ video segments from the (j-ε)-th to the (j+ε)-th video segment of the i-th long video.
In the embodiment of the present invention, the preset similarity condition may be set according to actual situations, for example, the preset similarity condition may be that when the similarity between the text feature of a text to be matched and the video feature of a video segment to be matched is greater than or equal to a preset similarity threshold, a matching relationship exists between the corresponding text to be matched and the video segment to be matched.
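Putting steps S3 and S4 together, a sketch of the candidate-window construction plus softmax matching might read as follows (the dot-product score and the 0.5 threshold are assumptions for the sketch):

```python
import numpy as np

def match_text_to_segment(text_feat, video_feats, j, eps=1, threshold=0.5):
    """Softmax similarity between one bullet screen text feature and the video
    features of segments j-eps .. j+eps; return candidates meeting the threshold."""
    lo, hi = max(0, j - eps), min(len(video_feats), j + eps + 1)
    logits = np.array([text_feat @ video_feats[k] for k in range(lo, hi)])
    sims = np.exp(logits - logits.max())
    sims /= sims.sum()  # softmax over the (up to) 2*eps+1 candidate segments
    return [lo + k for k, s in enumerate(sims) if s >= threshold]

rng = np.random.default_rng(2)
video_feats = rng.normal(size=(5, 8))                   # toy features for 5 segments
text_feat = video_feats[2] + 0.1 * rng.normal(size=8)   # a text close to segment 2
print(match_text_to_segment(text_feat, video_feats, j=3))  # likely [2]: delayed text re-aligned
```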
In the embodiment of the invention, in the image-text matched video barrage set, the barrage texts contained in each video segment and the corresponding video segment achieve the image-text alignment effect.
S5, classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of bullet screen texts in each classification, and determining hot spot videos according to the hot spot degree.
In the embodiment of the invention, the classification of the corresponding matched video segments can be completed by performing cluster analysis on the barrage texts. The classification may be an emotion polarity classification, including but not limited to positive, neutral and negative; in practical application, the classifications may be set according to the needs of hotspot video selection.
In detail, referring to fig. 4, the classifying the bullet screen texts in the image-text matched video bullet screen set includes:
S51, sequentially selecting one video segment from the image-text matched video bullet screen set as a target video segment;
s52, calculating a preset classification feature vector of each barrage text in the target video segment;
s53, clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
s54, calculating probability values between the classification feature vectors of each clustering center and preset classification labels, and selecting the classification labels corresponding to the probability values larger than a preset probability threshold as the classification of the target video segment.
In the embodiment of the invention, the classifying feature vector of each barrage text can be calculated by adopting methods such as Glove (Global Vectors for Word Representation, global word vector), embedding Layer and the like.
In the embodiment of the present invention, a preset activation function may be used to calculate a probability value between the classification feature vector of each cluster center and a preset classification label, where the activation function includes, but is not limited to, a softmax activation function, a sigmoid activation function, and a relu activation function, and the preset classification label includes, but is not limited to, positive, negative, and neutral.
In one embodiment of the present invention, the probability value may be calculated using the following activation function:
$$p(a \mid x) = \frac{\exp\left(w_a^{T} x\right)}{\sum_{a'=1}^{|A|} \exp\left(w_{a'}^{T} x\right)}$$

wherein $p(a \mid x)$ is the probability between the classification feature vector x of the cluster center and the classification label a, $w_a$ is the weight vector of the classification label a, T is the transposition symbol, exp is the exponential operation, and |A| is the number of preset classification labels.
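A sketch of S51-S54, using scikit-learn's KMeans for the clustering step and the softmax above for the label probabilities; the label weight matrix is a hypothetical stand-in for a trained classifier:

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_segment(danmaku_vecs, label_weights, labels, p_threshold=0.5, n_clusters=2):
    """Cluster one segment's classification feature vectors, score each cluster
    centre against every label with a softmax, and keep labels above threshold."""
    centers = KMeans(n_clusters=n_clusters, n_init=10).fit(danmaku_vecs).cluster_centers_
    chosen = set()
    for x in centers:
        logits = label_weights @ x           # w_a^T x for each classification label a
        p = np.exp(logits - logits.max())
        p /= p.sum()                         # softmax over the |A| labels
        chosen.update(labels[a] for a in np.flatnonzero(p > p_threshold))
    return chosen

rng = np.random.default_rng(3)
vecs = rng.normal(size=(10, 4))              # toy vectors for 10 bullet screen texts
weights = rng.normal(size=(3, 4))            # hypothetical trained label weights
print(classify_segment(vecs, weights, ["positive", "neutral", "negative"]))
```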
Further, the calculating the hot spot degree of the video segment in the corresponding category according to the number of the bullet screen texts in each category, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category; taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree; and selecting video segments in a preset ordering range as hot spot videos.
Illustratively, the top three video segments in the ranking are selected as hotspot videos.
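A sketch of this counting and ranking with hypothetical segment identifiers; the hotspot degree of a segment is simply its bullet screen count within the class:

```python
from collections import Counter

def hotspot_videos(segment_labels, segment_danmaku, label, top_n=3):
    """Within one classification, rank segments by bullet screen count (the
    hotspot degree) and return the top-n segments as hotspot videos."""
    degrees = Counter({seg: len(texts)
                       for seg, texts in segment_danmaku.items()
                       if segment_labels.get(seg) == label})
    return [seg for seg, _ in degrees.most_common(top_n)]

labels = {"seg1": "positive", "seg2": "positive", "seg3": "positive"}
danmaku = {"seg1": ["a"] * 12, "seg2": ["b"] * 40, "seg3": ["c"] * 7}
print(hotspot_videos(labels, danmaku, "positive"))  # ['seg2', 'seg1', 'seg3']
```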
In the embodiment of the invention, after the matching and alignment of each barrage text with the corresponding video segment are completed, the number of barrage texts in each video segment is not counted directly; instead, the barrage texts in each video segment are classified first, which achieves the effect of labeling each video segment, and then the number of barrage texts in all video segments of the same classification is counted, thereby improving the accuracy of selecting hotspot videos.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Fig. 5 is a functional block diagram of a hotspot video clip identification apparatus according to an embodiment of the present invention.
The hotspot video clip identification apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the hotspot video clip identification apparatus 100 includes: the system comprises a picture and text object acquisition module 101, a picture and text feature extraction module 102, a picture and text alignment module 103 and a picture and text classification and hot spot statistics module 104. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the image-text object obtaining module 101 is configured to obtain an initial video segment set and an initial bullet screen text set corresponding to the initial video segment set;
the image-text feature extraction module 102 is configured to extract a video feature of each video segment in the initial video segment set and a text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module 103 is configured to sequentially select one bullet screen text from the initial bullet screen text set as a text to be matched, and use an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
The image-text classification and hot spot statistics module 104 is configured to classify the bullet screen texts in the image-text matched video bullet screen set, calculate the hot spot degree of the video segments in the corresponding classification according to the number of bullet screen texts in each classification, and determine the hot spot videos according to the hot spot degree.
In detail, each module in the hotspot video clip identification apparatus 100 in the embodiment of the present invention adopts the same technical means as the hotspot video clip identification method described in fig. 1 to 4, and can produce the same technical effects, which are not described herein.
Fig. 6 is a schematic structural diagram of an electronic device for implementing a method for identifying hot video clips according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a hot spot video clip identification program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as the code of the hotspot video clip identification program, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., a hot spot video clip recognition program, etc.) stored in the memory 11, and calling data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 6 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The hotspot video clip identification program stored in the memory 11 of the electronic device 1 is a combination of instructions which, when executed in the processor 10, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names only and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.