Disclosure of Invention
The invention provides a hotspot video clip identification method, apparatus, electronic device and computer-readable storage medium, which mainly aim to improve the accuracy of identifying hotspot video clips.
In order to achieve the above object, the present invention provides a method for identifying a hotspot video clip, including:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
Optionally, the extracting the video feature of each video segment in the initial video segment set includes:
carrying out framing processing on each video segment in the initial video segment set in sequence to obtain a video frame set corresponding to each video segment;
extracting pixel values of the three RGB channels of each video frame in the video frame set;
generating a pixel point matrix of the corresponding video frame by using the extracted pixel values;
and fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segments.
Optionally, the extracting text features of each barrage text in the initial barrage text set includes:
sequentially carrying out word segmentation processing on each barrage text to obtain a plurality of text words;
selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and the adjacent text word of the target word in a preset neighborhood range of the target word;
constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
respectively converting the text words into word vectors, and splicing the word vectors into a vector matrix;
performing a product operation on the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
and extracting text features of the barrage text from the text vector matrix.
Optionally, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the plurality of text words one by one as a target word, and calculating a key value of the target word according to the word vector of the target word and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
Optionally, the classifying the barrage texts in the image-text matched video barrage set includes:
sequentially selecting one video segment from the image-text matched video barrage set as a target video segment;
calculating a preset classification feature vector of each barrage text in the target video segment;
clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
and calculating a probability value between the classification feature vector of each clustering center and a preset classification label, and selecting the classification label corresponding to a probability value larger than a preset probability threshold as the classification of the target video segment.
Optionally, the calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category;
taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree;
and selecting video segments in a preset ordering range as hot spot videos.
Optionally, the acquiring the initial video segment set and the initial barrage text set corresponding to the initial video segment set includes:
acquiring a long video with bullet screen data;
dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set;
and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
In order to solve the above problems, the present invention further provides a hotspot video clip identification apparatus, which includes:
the image-text object acquisition module is used for acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
the image-text feature extraction module is used for extracting the video feature of each video segment in the initial video segment set and the text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module is used for sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
and the image-text classification and hot spot statistics module is used for classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
a memory storing at least one computer program; and
a processor that executes the at least one computer program stored in the memory to implement the hotspot video clip identification method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned hotspot video clip identification method.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a hotspot video clip identification method. The execution subject of the hotspot video clip identification method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the hotspot video clip identification method may be performed by software or hardware installed on a terminal device or a server device, where the software may be a blockchain platform. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flowchart of a method for identifying a hot spot video clip according to an embodiment of the present invention is shown. In this embodiment, the method for identifying a hotspot video clip includes:
s1, acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
in the embodiment of the invention, a medical guidance video is taken as an example for explanation. The initial video segment set is composed of a plurality of medical guidance videos with different or specified lengths, such as a simple wound dressing video, a home blood pressure measurement guidance video, a cervical spondylosis prevention exercise video and the like. The initial bullet screen text set refers to a set of bullet screen texts generated in the playing process of each video segment in the initial video segment set.
In the embodiment of the invention, the initial video segment set and the initial bullet screen text set can be acquired from a designated storage area, for example, from a database, a blockchain, a network cache or the like, through computer statements with a data acquisition function (Java statements, Python statements and the like).
In detail, the acquiring the initial video segment set and the initial bullet screen text set corresponding to the initial video segment set includes: acquiring a long video with bullet screen data; dividing the long video into video segments with equal length according to a preset unit duration to obtain the initial video segment set; and acquiring bullet screen data contained in each video segment in the initial video segment set, and preprocessing each piece of bullet screen data to obtain an initial bullet screen text set corresponding to the initial video segment set.
Illustratively, $V_1, V_2, V_3, \dots, V_N$ respectively represent N long videos with bullet screen data, the duration of each long video being different. With a preset unit length of L seconds, $V_i^1, V_i^2, \dots, V_i^{n_i}$ ($i = 1, 2, \dots, N$) represent the $n_i$ video segments obtained from the i-th long video.
In the embodiment of the invention, the preprocessing comprises operations such as text de-duplication, useless character removal, expression symbol filtering and the like.
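As a non-limiting sketch in Python, the splitting and preprocessing might look as follows; the cleaning rules, function names and the ceiling-division segment count are assumptions for illustration, not a prescribed implementation:

```python
import re

def preprocess_danmaku(raw_texts):
    """De-duplicate bullet screen texts, strip useless characters and filter
    emoticons; the concrete cleaning rules here are assumed for the sketch."""
    seen, cleaned = set(), []
    for text in raw_texts:
        # Keep word characters (incl. CJK) and basic punctuation; drop emoji etc.
        text = re.sub(r"[^\w,.!?，。！？ ]", "", text).strip()
        if text and text not in seen:  # remove duplicate texts
            seen.add(text)
            cleaned.append(text)
    return cleaned

def split_into_segments(duration_s, unit_s, timed_texts):
    """Assign each (timestamp, text) bullet screen to its L-second segment."""
    n = -(-int(duration_s) // unit_s)  # number of equal-length segments (ceiling)
    segments = [[] for _ in range(n)]
    for t, text in timed_texts:
        segments[min(int(t // unit_s), n - 1)].append(text)
    return segments

segments = split_into_segments(35, 10, [(3.0, "great demo"), (3.0, "great demo"), (31.5, "replay!")])
print([preprocess_danmaku(texts) for texts in segments])  # [['great demo'], [], [], ['replay!']]
```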
Illustratively, the bullet screen texts contained in the j-th video segment of the i-th long video ($i = 1, 2, \dots, N$; $j = 1, 2, \dots, n_i$) can respectively be represented as $T_i^j$.
It will be appreciated that, because the generation time of a bullet screen text is delayed relative to the video playing time, a bullet screen text may appear in the j-th video segment while the content it actually expresses refers to the preceding video segment or the preceding several video segments; therefore, a certain time delay exists between the initial barrage text set and the initial video segment set, and the two are not completely matched.
S2, extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
In the embodiment of the invention, the pre-trained graph-text matching model is a neural network model created based on an attention mechanism. And training the image-text matching model by using a given video segment training set and a barrage text training set, so that the image-text matching model can find the optimal barrage text delay time parameter relative to the video segment.
It can be understood that each video segment is composed of an unequal number of video frames, each video frame can be regarded as a static image, typically, the image is composed of R, G, B three channels, each channel can be regarded as a pixel matrix, and the pixel matrix can be used to represent the features of the corresponding image, so that the video features of the corresponding video segment can be generated according to the features of the image.
In detail, referring to fig. 3, the extracting the video feature of each video segment in the initial video segment set includes:
S21, sequentially carrying out framing processing on each video segment in the initial video segment set to obtain a video frame set corresponding to each video segment;
S22, extracting pixel values of the three RGB channels of each video frame in the video frame set;
S23, generating a pixel point matrix of the corresponding video frame by using the extracted pixel values;
S24, fusing pixel point matrixes corresponding to each video frame in the video frame set to obtain video features of the corresponding video segment.
In an embodiment of the present invention, the following formula may be used to fuse the pixel point matrix of each video frame:
$$F_i^j = W \cdot \left[ R_{n,x} \,;\, G_{n,x} \,;\, B_{n,x} \right] + b$$

wherein $F_i^j$ represents the video feature corresponding to the j-th video segment in the i-th long video; $R_{n,x}$, $G_{n,x}$ and $B_{n,x}$ respectively denote the pixel matrices of the R, G and B channels of the x-th frame in the n-th second; W represents the parameters of the convolution transformation, and b represents the corresponding bias term.
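A minimal numerical sketch of this fusion, assuming a single learned linear projection W and bias b stand in for the convolution transformation (the shapes and random values are illustrative; in practice W and b would come from the pre-trained image-text matching model):

```python
import numpy as np

rng = np.random.default_rng(0)

def video_feature(frames, W, b):
    """Fuse the per-frame RGB pixel matrices of one video segment into a
    single feature vector via a learned linear transform plus bias."""
    per_frame = frames.reshape(len(frames), -1)  # flatten each frame's R, G, B pixels
    pooled = per_frame.mean(axis=0)              # fuse the frames of the segment
    return W @ pooled + b                        # convolution-style projection + bias

frames = rng.integers(0, 256, size=(8, 4, 4, 3)).astype(np.float32)  # toy 8-frame segment
W = rng.normal(size=(16, 4 * 4 * 3)).astype(np.float32)              # hypothetical learned weights
b = np.zeros(16, dtype=np.float32)                                   # hypothetical bias term
print(video_feature(frames, W, b).shape)                             # (16,) video feature
```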
In the embodiment of the invention, since each barrage text is composed of natural language, if the barrage text is directly analyzed, a large amount of calculation resources are occupied, and the analysis efficiency is low, therefore, the barrage text can be converted into a text vector matrix, and further the barrage text expressed by the natural language is converted into a numerical form.
In the embodiment of the invention, methods such as GloVe (Global Vectors for Word Representation) and an Embedding Layer can be adopted to convert each barrage text into a text vector matrix; further, after converting the barrage text into the text vector matrix, feature extraction can be performed on the text vector matrix to obtain text features of the barrage text, where the text features include but are not limited to text scenes, text topics and text keywords.
In one embodiment of the present invention, referring to fig. 3, the extracting text features of each barrage text in the initial barrage text set includes:
S21, sequentially performing word segmentation processing on each barrage text to obtain a plurality of text words;
S22, selecting one text word from the plurality of text words one by one as a target word, and counting the co-occurrence times of the target word and each adjacent text word within a preset neighborhood range of the target word;
S23, constructing a co-occurrence matrix by using the co-occurrence times corresponding to each text word;
S24, respectively converting the text words into word vectors, and splicing the word vectors into a vector matrix;
S25, performing a product operation on the co-occurrence matrix and the vector matrix to obtain a text vector matrix;
S26, extracting text features of the barrage text from the text vector matrix.
In detail, the bullet screen text may be subjected to word segmentation processing by using a preset standard dictionary, so as to obtain a plurality of text word segments, where the standard dictionary includes a plurality of standard word segments.
For example, segments of the barrage text of different lengths are searched in the standard dictionary, and if a standard word identical to a segment of the barrage text is found, the found standard word can be determined to be a text word of the barrage text.
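A toy sketch of such dictionary-based word segmentation using forward maximum matching; the dictionary contents and the maximum word length are hypothetical (a production system could instead use a segmentation library such as jieba):

```python
def segment_text(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest substring
    that appears in the standard dictionary, falling back to a single char."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

dictionary = {"量血压", "血压", "教学", "视频"}    # toy standard dictionary
print(segment_text("量血压教学视频", dictionary))  # ['量血压', '教学', '视频']
```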
Illustratively, the co-occurrence matrix shown below may be constructed using the co-occurrence times corresponding to each text word:

$$X = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,m} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1} & X_{m,2} & \cdots & X_{m,m} \end{bmatrix}$$

wherein $X_{i,j}$ is the number of co-occurrences of text word i and its adjacent text word j in the barrage text, and m is the number of text words.
In detail, the extracting the text feature of the barrage text from the text vector matrix includes:
selecting one text word from the plurality of text words one by one as a target word, and calculating a key value of the target word according to the word vector of the target word and the text vector matrix;
selecting a preset number of text words from the plurality of text words as feature words in descending order of key value;
and splicing the word vectors of the feature words to obtain the text features of the barrage text.
In detail, since the barrage text contains a large number of text words, but not every text word is a feature of the barrage text, the text words need to be screened: one text word is selected from the plurality of text words one by one as the target word, and the key value of the target word is calculated according to the word vector of the target word and the text vector matrix, so that feature words representative of the barrage text are screened out according to the key values to obtain the text features of the barrage text.
Specifically, the calculating the key value of the target word according to the word vector of the target word and the text vector matrix includes:
calculating the key value of the target word by using the following key value algorithm:
$$K = \frac{\left| w_t^{T} W \right|}{\lVert w_t \rVert \, \lVert W \rVert}$$

wherein K is the key value, W is the text vector matrix, T is the matrix transposition symbol, $\lVert \cdot \rVert$ is the modulo symbol, and $w_t$ is the word vector of the target word.
In the embodiment of the invention, a preset number of text words are selected from the plurality of text words as feature words in descending order of the key value of each text word.
For example, if the plurality of text words includes text word A, text word B and other text words, and the preset number is 2, then text word A and text word B, having the largest key values, are selected as feature words, and the word vectors of text word A and text word B are spliced to obtain the text feature of the barrage text.
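The screening of feature words can be sketched as follows; the key value is implemented as the normalised product from the formula above, and the word vectors and matrix shapes are illustrative assumptions:

```python
import numpy as np

def key_value(word_vec, text_matrix):
    """K = |w^T W| / (||w|| * ||W||): normalised product of the target word's
    vector with the text vector matrix."""
    num = np.linalg.norm(word_vec @ text_matrix)
    return float(num / (np.linalg.norm(word_vec) * np.linalg.norm(text_matrix)))

def text_feature(word_vectors, text_matrix, k=2):
    """Keep the k text words with the largest key values as feature words and
    splice their word vectors into the text feature."""
    ranked = sorted(word_vectors,
                    key=lambda w: key_value(word_vectors[w], text_matrix),
                    reverse=True)
    return np.concatenate([word_vectors[w] for w in ranked[:k]])

rng = np.random.default_rng(1)
word_vectors = {w: rng.normal(size=4) for w in ["A", "B", "C"]}  # toy word vectors
text_matrix = rng.normal(size=(4, 3))                            # toy text vector matrix
print(text_feature(word_vectors, text_matrix, k=2).shape)        # (8,): two spliced vectors
```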
S3, sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
It will be appreciated that a bullet screen text typically contains a timestamp relative to the video playing time, which identifies the moment of a video segment at which the user published the bullet screen; however, because users take different amounts of time to edit bullet screen texts, network communication speeds vary, and so on, the bullet screen text does not exactly match the corresponding initial video segment. A bullet screen text may instead be associated with a video segment preceding or following the initial video segment.
In the embodiment of the present invention, the preset adjacent range may be set according to the actual situation. For example, the video segments within the preset adjacent range corresponding to the j-th initial video segment include the (j-ε)-th ($0 < \varepsilon \le j$) to the (j+ε)-th video segments, where ε may be a natural number such as 1, 2 or 3; that is, the video segments from $V_i^{j-\varepsilon}$ to $V_i^{j+\varepsilon}$ form the set of video segments to be matched.
S4, calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched pictures and texts;
In the embodiment of the invention, the similarity between the text characteristics of the text to be matched and the video characteristics of each video segment in the video segment set to be matched can be calculated by using a preset activation function.
In the embodiment of the present invention, the similarity may be calculated using the following activation function:
$$\operatorname{sim}\left(t_i^j, F_i^k\right) = \frac{\exp\left(t_i^j \cdot F_i^k\right)}{\sum_{m=j-\varepsilon}^{j+\varepsilon} \exp\left(t_i^j \cdot F_i^m\right)}, \qquad k = j-\varepsilon, \dots, j+\varepsilon$$

wherein $t_i^j$ represents the text feature of the barrage text corresponding to the j-th video segment of the i-th long video, and $F_i^{j-\varepsilon}, \dots, F_i^{j+\varepsilon}$ ($0 < \varepsilon \le j$) represent the video features of the $2\varepsilon + 1$ video segments from the (j-ε)-th to the (j+ε)-th video segment of the i-th long video.
In the embodiment of the present invention, the preset similarity condition may be set according to actual situations, for example, the preset similarity condition may be that when the similarity between the text feature of a text to be matched and the video feature of a video segment to be matched is greater than or equal to a preset similarity threshold, a matching relationship exists between the corresponding text to be matched and the video segment to be matched.
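Putting steps S3 and S4 together, a sketch of the candidate-window construction plus softmax matching might read as follows (the dot-product score and the 0.5 threshold are assumptions for the sketch):

```python
import numpy as np

def match_text_to_segment(text_feat, video_feats, j, eps=1, threshold=0.5):
    """Softmax similarity between one bullet screen text feature and the video
    features of segments j-eps .. j+eps; return candidates meeting the threshold."""
    lo, hi = max(0, j - eps), min(len(video_feats), j + eps + 1)
    logits = np.array([text_feat @ video_feats[k] for k in range(lo, hi)])
    sims = np.exp(logits - logits.max())
    sims /= sims.sum()  # softmax over the (up to) 2*eps+1 candidate segments
    return [lo + k for k, s in enumerate(sims) if s >= threshold]

rng = np.random.default_rng(2)
video_feats = rng.normal(size=(5, 8))                   # toy features for 5 segments
text_feat = video_feats[2] + 0.1 * rng.normal(size=8)   # a text close to segment 2
print(match_text_to_segment(text_feat, video_feats, j=3))  # likely [2]: delayed text re-aligned
```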
In the embodiment of the invention, in the image-text matched video barrage set, the barrage texts contained in each video segment and the corresponding video segment achieve the image-text alignment effect.
S5, classifying the bullet screen texts in the image-text matched video bullet screen set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of bullet screen texts in each classification, and determining hot spot videos according to the hot spot degree.
In the embodiment of the invention, the classification of the corresponding matched video segments can be completed by performing cluster analysis on the barrage texts. The classification may be an emotion polarity classification, including but not limited to positive, neutral and negative; in practical application, the classifications may be set according to the needs of hotspot video selection.
In detail, referring to fig. 4, the classifying the bullet screen texts in the image-text matched video bullet screen set includes:
S51, sequentially selecting one video segment from the image-text matched video bullet screen set as a target video segment;
s52, calculating a preset classification feature vector of each barrage text in the target video segment;
s53, clustering bullet screen texts in the target video segment according to the classification feature vector to obtain a clustering center;
s54, calculating probability values between the classification feature vectors of each clustering center and preset classification labels, and selecting the classification labels corresponding to the probability values larger than a preset probability threshold as the classification of the target video segment.
In the embodiment of the invention, the classifying feature vector of each barrage text can be calculated by adopting methods such as Glove (Global Vectors for Word Representation, global word vector), embedding Layer and the like.
In the embodiment of the present invention, a preset activation function may be used to calculate a probability value between the classification feature vector of each cluster center and a preset classification label, where the activation function includes, but is not limited to, a softmax activation function, a sigmoid activation function, and a relu activation function, and the preset classification label includes, but is not limited to, positive, negative, and neutral.
In one embodiment of the present invention, the probability value may be calculated using the following activation function:
$$p(a \mid x) = \frac{\exp\left(w_a^{T} x\right)}{\sum_{a'=1}^{|A|} \exp\left(w_{a'}^{T} x\right)}$$

wherein $p(a \mid x)$ is the probability between the classification feature vector x of the cluster center and the classification label a, $w_a$ is the weight vector of the classification label a, T is the transposition symbol, exp is the exponential operation, and |A| is the number of preset classification labels.
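A sketch of S51-S54, using scikit-learn's KMeans for the clustering step and the softmax above for the label probabilities; the label weight matrix is a hypothetical stand-in for a trained classifier:

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_segment(danmaku_vecs, label_weights, labels, p_threshold=0.5, n_clusters=2):
    """Cluster one segment's classification feature vectors, score each cluster
    centre against every label with a softmax, and keep labels above threshold."""
    centers = KMeans(n_clusters=n_clusters, n_init=10).fit(danmaku_vecs).cluster_centers_
    chosen = set()
    for x in centers:
        logits = label_weights @ x           # w_a^T x for each classification label a
        p = np.exp(logits - logits.max())
        p /= p.sum()                         # softmax over the |A| labels
        chosen.update(labels[a] for a in np.flatnonzero(p > p_threshold))
    return chosen

rng = np.random.default_rng(3)
vecs = rng.normal(size=(10, 4))              # toy vectors for 10 bullet screen texts
weights = rng.normal(size=(3, 4))            # hypothetical trained label weights
print(classify_segment(vecs, weights, ["positive", "neutral", "negative"]))
```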
Further, the calculating the hot spot degree of the video segment in the corresponding category according to the number of the bullet screen texts in each category, and determining the hot spot video according to the hot spot degree includes:
counting the number of barrage texts corresponding to each video segment in all video segments in the same category; taking the counted number as the hot spot degree of the corresponding video segments, and sequencing all the video segments of the same class according to the hot spot degree; and selecting video segments in a preset ordering range as hot spot videos.
Illustratively, the top three video segments in the ranking are selected as hotspot videos.
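A sketch of this counting and ranking with hypothetical segment identifiers; the hotspot degree of a segment is simply its bullet screen count within the class:

```python
from collections import Counter

def hotspot_videos(segment_labels, segment_danmaku, label, top_n=3):
    """Within one classification, rank segments by bullet screen count (the
    hotspot degree) and return the top-n segments as hotspot videos."""
    degrees = Counter({seg: len(texts)
                       for seg, texts in segment_danmaku.items()
                       if segment_labels.get(seg) == label})
    return [seg for seg, _ in degrees.most_common(top_n)]

labels = {"seg1": "positive", "seg2": "positive", "seg3": "positive"}
danmaku = {"seg1": ["a"] * 12, "seg2": ["b"] * 40, "seg3": ["c"] * 7}
print(hotspot_videos(labels, danmaku, "positive"))  # ['seg2', 'seg1', 'seg3']
```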
In the embodiment of the invention, after the matching and alignment of each barrage text with the corresponding video segment are completed, the number of barrage texts in each video segment is not counted directly; instead, the barrage texts in each video segment are classified first, which achieves the effect of labeling each video segment, and then the number of barrage texts in all video segments of the same classification is counted, thereby improving the accuracy of selecting hotspot videos.
According to the embodiment of the invention, the video characteristics of each video segment in the initial video segment set and the text characteristics of each barrage text in the initial barrage text set are extracted by using a pre-trained image-text matching model, the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched is calculated, and the video segment meeting the preset similarity condition is selected as the matching video segment of the text to be matched, so that alignment of barrage text and video segment is realized, the accuracy of classifying video segments by using barrage text and calculating the hot spot degree is improved, and the accuracy of identifying hot spot video segments is further improved.
Fig. 5 is a functional block diagram of a hotspot video clip identification apparatus according to an embodiment of the present invention.
The hotspot video clip identification apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the hotspot video clip identification apparatus 100 includes: the system comprises a picture and text object acquisition module 101, a picture and text feature extraction module 102, a picture and text alignment module 103 and a picture and text classification and hot spot statistics module 104. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the image-text object obtaining module 101 is configured to obtain an initial video segment set and an initial bullet screen text set corresponding to the initial video segment set;
the image-text feature extraction module 102 is configured to extract a video feature of each video segment in the initial video segment set and a text feature of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
the image-text alignment module 103 is configured to sequentially select one bullet screen text from the initial bullet screen text set as a text to be matched, and use an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched; calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
The image-text classification and hot spot statistics module 104 is configured to classify the bullet screen texts in the image-text matched video bullet screen set, calculate the hot spot degree of the video segments in the corresponding classification according to the number of bullet screen texts in each classification, and determine the hot spot videos according to the hot spot degree.
In detail, each module in the hotspot video clip identification apparatus 100 in the embodiment of the present invention adopts the same technical means as the hotspot video clip identification method described in fig. 1 to 4, and can produce the same technical effects, which are not described herein.
Fig. 6 is a schematic structural diagram of an electronic device for implementing a method for identifying hot video clips according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a hot spot video clip identification program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as the code of the hotspot video clip identification program, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device 1 and processes data by running or executing programs or modules (e.g., a hot spot video clip recognition program, etc.) stored in the memory 11, and calling data stored in the memory 11.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 6 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The hotspot video clip identification program stored in the memory 11 of the electronic device 1 is a combination of instructions which, when executed in the processor 10, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:
acquiring an initial video segment set and an initial barrage text set corresponding to the initial video segment set;
extracting video characteristics of each video segment in the initial video segment set and text characteristics of each barrage text in the initial barrage text set by using a pre-trained image-text matching model;
sequentially selecting one barrage text from the initial barrage text set as a text to be matched, and taking an initial video segment corresponding to the text to be matched and a video segment in a preset adjacent range corresponding to the initial video segment as a video segment set to be matched;
calculating the similarity between the text characteristics of the text to be matched and the video characteristics corresponding to the video segment set to be matched, and selecting video segments meeting the preset similarity condition from the video segment set to be matched as the matched video segments of the text to be matched to obtain a video bullet screen set with matched images and texts;
classifying the barrage texts in the image-text matched video barrage set, calculating the hot spot degree of the video segments in the corresponding classification according to the number of barrage texts in each classification, and determining hot spot videos according to the hot spot degree.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names only and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present application without departing from the spirit and scope of the technical solution of the present application.