CN110647933B - Video classification method and device - Google Patents
Video classification method and device
- Publication number
- CN110647933B CN110647933B CN201910894922.4A CN201910894922A CN110647933B CN 110647933 B CN110647933 B CN 110647933B CN 201910894922 A CN201910894922 A CN 201910894922A CN 110647933 B CN110647933 B CN 110647933B
- Authority
- CN
- China
- Prior art keywords
- classification
- frame image
- classification result
- video
- weight
- Prior art date
- Legal status
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
The disclosure relates to the technical field of video, and in particular to a video classification method and device. The method improves classification speed while preserving classification accuracy, and comprises the following steps: extracting a plurality of frame images from a video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images; performing resolution compression on the frame images in each frame image set to obtain a corresponding motion map; performing video type recognition to obtain a first classification result set with a corresponding first weight set and a second classification result set with a corresponding second weight set; and finally merging the sets and screening out the classification result whose weight meets a preset condition as the target classification result. Classifying the video according to the obtained motion maps and reference frame images preserves the automatic training capability of the classifier, reduces system overhead, guarantees the accuracy of the target classification result, improves the accuracy and effectiveness of video type recognition, and increases the classification speed.
Description
Technical Field
The disclosure relates to the technical field of videos, and in particular relates to a video classification method and device.
Background
With the rapid development of fourth-generation (4G) and fifth-generation (5G) mobile communication networks and the popularization of wearable photographing devices, more and more videos are appearing on network platforms. On a short-video platform such as Kuaishou, tens of millions of videos are uploaded every day, and the content conveyed by a video can be better understood by utilizing the multi-frame sequence information of the video.
At present, algorithms that classify videos based on the two-stream method generally use two kinds of information, Red-Green-Blue (RGB) images and optical flow: after obtaining the classification result of the RGB images and the classification result of the optical flow separately, the two classification results are weighted to produce the classification result of the video.
However, a video classification algorithm based on RGB images and optical flow must compute the motion information between adjacent frames, usually via optical flow. Although optical flow performs notably well in video classification, computing it is time-consuming, which makes such algorithms ill-suited to industrial video classification: it reduces classification speed and degrades the processing efficiency of the system.

Therefore, there is a need to design a video classification method that solves the above problems.
Disclosure of Invention
The purpose of the present disclosure is to provide a video classification method and apparatus that effectively improve classification speed while ensuring classification accuracy.
According to a first aspect of an embodiment of the present disclosure, there is provided a method for classifying video, including:
extracting a plurality of frame images from a specified time period in a video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images, wherein each frame image set comprises at least two frame images;
respectively carrying out resolution compression on the frame images in each frame image set, and obtaining a motion map corresponding to each frame image set based on the time-sequence correspondence;
performing video type recognition based on the obtained motion maps to obtain a first classification result set and a corresponding first weight set, and performing video type recognition based on the obtained reference frame images to obtain a second classification result set and a corresponding second weight set, wherein each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
and merging the first classification result set, the first weight set, the second classification result set and the second weight set, and screening out, from the merged result, the classification result whose weight meets a preset condition as the target classification result.
Optionally, extracting a plurality of frame images from a specified time period in the video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images, including:
dividing the specified time period in the video to be classified into a plurality of time intervals;
for each time interval, the following operations are performed:
extracting a plurality of frame images from a time interval to form a frame image set corresponding to the time interval;
and extracting one frame image from the time interval to serve as a reference frame image corresponding to the time interval.
Optionally, performing resolution compression on the frame image in any one of the frame image sets includes:
determining an original pixel matrix of each frame image in the arbitrary frame image set;
determining, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix;
and modifying the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix.
Optionally, merging the first classification result set, the first weight set, the second classification result set and the second weight set, and screening classification results with weights meeting preset conditions from the merged results as target classification results, including:
determining the same classification result and different classification results contained in the first classification result set and the second classification result set;
performing the following operation for each group of identical classification results: combining and averaging the weights corresponding to that group's classification result in the first weight set and the second weight set respectively, to obtain the latest weight corresponding to that classification result;
combining the same classification result and the corresponding latest weight of each group, and the different classification results and the corresponding weights to obtain a combined result;
and screening the classification result with the weight meeting the preset condition from the combined result to be used as the target classification result.
Optionally, screening the classification result with the weight meeting the preset condition from the combined result as the target classification result includes:
according to the obtained merging result, comparing weights corresponding to all classification results in the merging result to obtain a comparison result;
and determining the maximum value of the weights according to the comparison result, and taking the video type corresponding to the maximum value of the weights as a target classification result.
According to a second aspect of embodiments of the present disclosure, there is provided a video classification apparatus, including:
the extraction unit is used for extracting a plurality of frame images from a specified time period in the video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images, wherein each frame image set comprises at least two frame images;
the compression unit is used for respectively carrying out resolution compression on the frame images in each frame image set and obtaining a motion map corresponding to each frame image set based on the time-sequence correspondence;
the identification unit is used for carrying out video type identification based on each obtained motion map to obtain a first classification result set and a corresponding first weight set, and carrying out video type identification based on each obtained reference frame image to obtain a second classification result set and a corresponding second weight set, wherein each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
the processing unit is used for merging the first classification result set, the first weight set, the second classification result set and the second weight set, and screening classification results with weights meeting preset conditions from the merging results to serve as target classification results.
Optionally, a plurality of frame images are extracted from a specified time period in the video to be classified, a plurality of frame image sets and a plurality of reference frame images are obtained, and the extracting unit is used for:
dividing the specified time period in the video to be classified into a plurality of time intervals;
for each time interval, the following operations are performed:
extracting a plurality of frame images from a time interval to form a frame image set corresponding to the time interval;
and extracting one frame image from the time interval to serve as a reference frame image corresponding to the time interval.
Optionally, the resolution compression is performed on the frame images in any one of the frame image sets, and the compression unit is configured to:
determining an original pixel matrix of each frame image in the arbitrary frame image set;
determining, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix;
and modifying the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix.
Optionally, the first classification result set, the first weight set, the second classification result set and the second weight set are combined, and classification results with weights meeting preset conditions are screened out from the combined results to be used as target classification results, and the processing unit is used for:
determining the same classification result and different classification results contained in the first classification result set and the second classification result set;
performing the following operation for each group of identical classification results: combining and averaging the weights corresponding to that group's classification result in the first weight set and the second weight set respectively, to obtain the latest weight corresponding to that classification result;
combining the same classification result and the corresponding latest weight of each group, and the different classification results and the corresponding weights to obtain a combined result;
and screening the classification result with the weight meeting the preset condition from the combined result to be used as the target classification result.
Optionally, a classification result with weight meeting a preset condition is selected from the combined result as the target classification result, and the processing unit is configured to:
according to the obtained merging result, comparing weights corresponding to all classification results in the merging result to obtain a comparison result;
and determining the maximum value of the weights according to the comparison result, and taking the video type corresponding to the maximum value of the weights as a target classification result.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video classification method described in any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of a video classification apparatus, enable an electronic device to perform the video classification method described in any one of the above.
In summary, in the embodiments of the present disclosure, a plurality of frame images are extracted from a video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images; resolution compression is performed on the frame images in each frame image set to obtain a corresponding motion map; video type recognition is then performed to obtain a first classification result set with a corresponding first weight set and a second classification result set with a corresponding second weight set; finally, the sets are merged, and the classification result whose weight meets a preset condition is screened out as the target classification result. Classifying the video according to the obtained motion maps and reference frame images preserves the automatic training capability of the classifier, reduces system overhead, guarantees the accuracy of the target classification result, improves the accuracy and effectiveness of video type recognition, and increases the classification speed.
Drawings
FIG. 1 is a schematic diagram of a training process of a classification model in an embodiment of the disclosure;
FIG. 2 is a schematic diagram of a video classification flow in an embodiment of the disclosure;
FIG. 3 is a functional schematic diagram of a classification processing device according to an embodiment of the disclosure;
fig. 4 is a functional structural schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, and not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
In the embodiment of the disclosure, referring to fig. 1, before classifying a video, training a classification model is required, and the detailed flow is as follows:
step 100: the classification processing device acquires a sample video and determines a classification result of the sample video.
Step 110: the classification processing device extracts a plurality of sample frame images from a specified time period in the sample video to obtain a plurality of sample frame image sets and a plurality of sample reference frame images, wherein each sample frame image set comprises at least two sample frame images.
Specifically, in the embodiment of the present disclosure, after determining a sample video to be classified, the classification processing device divides the specified time period in the sample video into a plurality of time intervals, and performs the following operations for each time interval, respectively: extracting a plurality of sample frame images from one time interval to form a sample frame image set corresponding to the one time interval, and extracting one sample frame image from the one time interval to serve as a sample reference frame image corresponding to the one time interval.
For example, frames are extracted within the first 16 seconds of a sample video. With a system-preset time interval of 1 second, the sample video is sampled at an interval of 1 second and 8 frames per interval; that is, for each second, the corresponding 8 frames are extracted from the sample video to form a sample frame image set, finally yielding 16 sample frame image sets, each comprising 8 sample frame images. In addition, the sample video is also sampled at an interval of 1 second and 1 frame per interval; that is, for each second, 1 corresponding frame is extracted from the sample video as a sample reference frame image. This frame may or may not be one of the 8 frames extracted previously. Finally, 16 sample reference frame images are obtained.
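For illustration, here is a minimal Python sketch of this windowed sampling, assuming OpenCV-style decoding; the function name `sample_frames`, the even spacing of frames within each interval, and the random choice of the reference frame are illustrative assumptions, since the patent does not fix how frames are picked inside an interval:

```python
import random
import cv2  # pip install opencv-python

def sample_frames(video_path, num_intervals=16, interval_sec=1.0, frames_per_set=8):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_sets, reference_frames = [], []
    for i in range(num_intervals):
        start = int(i * interval_sec * fps)
        stop = int((i + 1) * interval_sec * fps)
        # Spread the set's frame indices evenly across the interval.
        idxs = [start + k * (stop - start) // frames_per_set for k in range(frames_per_set)]
        frames = []
        for idx in idxs:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        frame_sets.append(frames)
        # One reference frame per interval; as the text notes, it may or may
        # not coincide with a frame already in the set.
        cap.set(cv2.CAP_PROP_POS_FRAMES, random.randrange(start, stop))
        ok, ref = cap.read()
        if ok:
            reference_frames.append(ref)
    cap.release()
    return frame_sets, reference_frames
```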
Step 120: the classification processing device respectively performs resolution compression on the sample frame images in each sample frame image set, and obtains a sample map corresponding to each sample frame image set based on the time sequence corresponding relation.
Specifically, in the embodiment of the present disclosure, taking an arbitrary sample frame image set as an example, the classification processing device may perform compression processing on the sample frame image set: it determines the original pixel matrix of each sample frame image in the set, determines, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each sample frame image and the resolution corresponding to a predetermined pixel matrix, and modifies the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix. A corresponding sample motion map is then obtained.
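A short sketch of this compression step, under the assumption that the system's preset correspondence is a lookup from the original frame size to a predetermined target size; the table contents and the 224x224 default below are illustrative, not taken from the patent:

```python
import cv2

# Hypothetical preset table: original (height, width) -> target (height, width).
RESOLUTION_TABLE = {
    (1080, 1920): (224, 224),
    (720, 1280): (224, 224),
}
DEFAULT_TARGET = (224, 224)

def compress_frame_set(frame_set):
    compressed = []
    for frame in frame_set:
        # Look up the predetermined resolution for this original pixel matrix.
        target_h, target_w = RESOLUTION_TABLE.get(frame.shape[:2], DEFAULT_TARGET)
        # Replace the original pixel matrix with one at the preset resolution.
        compressed.append(cv2.resize(frame, (target_w, target_h)))
    return compressed
```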
In the embodiment of the disclosure, after resolution compression, the classification processing device may obtain the sample motion map in the following manner:
taking any sample frame image set (hereinafter referred to as sample frame image set X) as an example, the classification processing means may finally obtain the pixel matrix ρ of the sample map using the following formula 0 :
Wherein ρ is 0 For a matrix of pixels of a sample map, M 0 For the total number of sample frame images in the sample frame image set X,t is in a sample frame image set 0 Zhang Yangben pixel matrix corresponding to frame image, +.>T is in a sample frame image set 0 Zhang Yangben frame image.
For example, when sample frame image set X includes 4 sample frame images whose pixel matrices are I1, I2, I3 and I4 and whose weight values are α1, α2, α3 and α4 respectively, i.e. M0 = 4, α1 = -3, α2 = -1, α3 = 1, α4 = 3, a corresponding sample motion map is generated based on these weight values, and the pixel matrix of the sample motion map corresponding to sample frame image set X is obtained as ρ1 = -3·I1 - I2 + I3 + 3·I4.
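Formula (1) can be expressed directly in code. The general weight rule alpha_t = 2t - M0 - 1 used below is an assumption that reproduces the example weights (-3, -1, 1, 3); the patent itself only states the example values:

```python
import numpy as np

def motion_map(frame_set):
    m = len(frame_set)
    rho = np.zeros(frame_set[0].shape, dtype=np.float32)
    for t, frame in enumerate(frame_set, start=1):  # t runs from 1 to M0
        alpha = 2 * t - m - 1  # assumed general rule matching the example
        rho += alpha * frame.astype(np.float32)
    return rho  # pixel matrix of the motion map

# With 4 frames this yields rho = -3*I1 - I2 + I3 + 3*I4, as in the example.
```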
Step 130: the classification processing device adopts a 3D convolution network (Informated 3D ConvNet, I3D) to carry out model training based on the corresponding relation between the obtained sample moving picture and the classification result of the sample video and the corresponding relation between the sample reference frame image and the classification result of the sample video, so as to obtain a corresponding classification model.
Specifically, in the embodiment of the present disclosure, after a user uploads a sample video, by executing steps 100 to 120 the classification processing device obtains the sample motion maps and sample reference frame images, determines the correspondence between the sample motion maps and the classification result of the sample video and the correspondence between the sample reference frame images and the classification result of the sample video, and then obtains a corresponding classification model after model training with the 3D convolutional network.
For example, after a user uploads video 1, the classification processing device determines that the classification result of this sample video is dancing; the device then performs frame extraction and resolution compression on a specified time period of video 1 to obtain the sample motion maps and sample reference frame images.
For example, the classification processing device extracts frames from video 1 at a time interval of 1 second and 8 frames per interval to obtain 16 sample frame image sets, each including 8 sample frame images; after compressing the sample frame images in each set, 16 sample motion maps corresponding to the 16 sample frame image sets are obtained. Meanwhile, it extracts frames from video 1 at a time interval of 1 second and 1 frame per interval to obtain 16 sample reference frame images. Then, the 16 sample motion maps are input into the initial classification model to obtain a corresponding classification result set = {singing, dancing, lecturing}; since the classification result of the sample video is dancing, the parameters of the classification model are adjusted so that the weight of dancing becomes higher at the next output. Similarly, the 16 sample reference frame images are also input into an initial classification model to obtain a corresponding classification result set, and the parameters of that classification model are then adjusted according to the classification result of the sample video so that the weight of dancing becomes higher.
Furthermore, by adopting the same approach, the classification processing device can obtain massive sample data, that is, determine the correspondence between each sample motion map and the classification result of each sample video and the correspondence between each sample reference frame image and the classification result of each sample video, and then train on this massive sample data; optionally, a video type recognition algorithm may be adopted for model training.
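A hedged PyTorch-style sketch of this two-branch training step; `motion_model` (an I3D-like 3D ConvNet over the stacked motion maps) and `ref_model` (a per-frame classifier over the reference frames) are stand-ins, and the cross-entropy loss and joint update are assumptions, since the patent names I3D but gives no loss or optimization details:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def train_step(motion_model, ref_model, motion_clip, ref_frames, label, optimizer):
    # motion_clip: (1, C, 16, H, W) - the 16 motion maps stacked along time
    # ref_frames:  (16, C, H, W)    - the 16 sample reference frame images
    # label:       (1,)             - index of the true class, e.g. "dancing"
    optimizer.zero_grad()
    loss = criterion(motion_model(motion_clip), label)               # motion-map branch
    loss = loss + criterion(ref_model(ref_frames), label.expand(16)) # reference branch
    loss.backward()  # raises the weight the model assigns to the true class
    optimizer.step()
    return loss.item()
```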
Therefore, in the subsequent video type identification process, the trained classification model can be directly adopted for classification, the automatic training performance of the system can be improved, the classification speed is accelerated, the processing efficiency of the system is improved, the processing time is further shortened, the accuracy of a classification result is ensured, and the purpose of accurate classification is achieved.
Referring to fig. 2, in the embodiment of the disclosure, after training of the classification model is completed, the obtained classification model may be used to perform video type recognition on different videos, and the detailed flow is as follows:
step 200: the classification processing means determines the video to be classified.
Specifically, in the embodiment of the present disclosure, after receiving a video uploaded by a user, the classification processing device may determine a video to be classified, and then start a process of identifying a video type of the video.
Step 210: the classification processing device extracts a plurality of frame images from a specified time period in the video to be classified, and obtains a plurality of frame image sets and a plurality of reference frame images, wherein each frame image set comprises at least two frame images.
Specifically, in the embodiment of the present disclosure, after determining a video to be classified, the classification processing device divides the specified time period in the video to be classified into a plurality of time intervals, and performs the following operations for each time interval: and extracting a plurality of frame images from one time interval to form a frame image set corresponding to the one time interval, and extracting one frame image from the one time interval to serve as a reference frame image corresponding to the one time interval.
For example, frames are extracted within the first 60 seconds of the video to be classified. With a system-preset time interval of 5 seconds, the video is sampled at an interval of 5 seconds and 4 frames per interval; that is, for every 5 seconds, the corresponding 4 frames are extracted from the video to form a frame image set, finally yielding 12 frame image sets, each comprising 4 frame images. In addition, the video is also sampled at an interval of 5 seconds and 1 frame per interval; that is, for every 5 seconds, 1 corresponding frame is extracted from the video as a reference frame image. This frame may or may not be one of the 4 frames extracted previously. Finally, 12 reference frame images are obtained.
Therefore, in the embodiment of the disclosure, according to the extracted plurality of frame images, a plurality of corresponding frame image sets and a plurality of reference frame images are obtained, so that the utilization rate of the video to be classified is effectively improved, and the reliability of the frame image sets and the reference frame images is ensured.
Step 220: the classification processing device respectively performs resolution compression on the frame images in each frame image set, and obtains a corresponding moving picture of each frame image set based on the time sequence corresponding relation.
Specifically, in the embodiment of the present disclosure, taking any one frame image set as an example, the classification processing device may perform compression processing on the frame image set: it determines the original pixel matrix of each frame image in the set, determines, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix, and modifies the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix. A corresponding motion map is then obtained.
In this way, a correct motion map is guaranteed: the compression process is simplified while compression failure is avoided, and the situation of obtaining an erroneous motion map is prevented.
In addition, when a motion map is obtained through resolution compression, a pixel matrix corresponding to the motion map may be obtained according to formula (1).
Step 230: the classification processing device performs video type recognition based on the obtained images, obtains a first classification result set and a corresponding first weight set, and performs video type recognition based on the obtained reference frame images, and obtains a second classification result set and a corresponding second weight set, wherein one weight in the first weight set corresponds to one classification result in the first classification result, and one weight in the second weight set corresponds to one classification result in the second classification result.
Specifically, in the embodiment of the present disclosure, after the classification model is obtained, the classification processing device may use the obtained classification model to perform video type recognition on the video to be classified, so as to obtain a corresponding classification result set and a corresponding weight set.
For example, the classification processing device extracts frames from the video at a time interval of 5 seconds and 4 frames per interval to obtain 12 motion maps corresponding to the 12 frame image sets, and extracts frames from the video at a time interval of 5 seconds and 1 frame per interval to obtain 12 reference frame images. Based on the trained classification model, it inputs the 12 motion maps into the classification model to obtain a first classification result set = {singing, dancing} and a corresponding first weight set = {30%, 70%}, wherein the classification result corresponding to 30% is singing and the classification result corresponding to 70% is dancing; it inputs the 12 reference frame images into the classification model to obtain a second classification result set = {dancing, lecturing} and a corresponding second weight set = {80%, 20%}, wherein the classification result corresponding to 80% is dancing and the classification result corresponding to 20% is lecturing.
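A sketch of how such result and weight sets could be produced with the trained model: average its softmax output over the 12 motion maps (or the 12 reference frame images) and keep the classes with non-negligible weight; the 5% cut-off is an illustrative assumption:

```python
import torch

def classify(model, inputs, class_names, min_weight=0.05):
    with torch.no_grad():
        # Average class probabilities over all 12 inputs.
        probs = torch.softmax(model(inputs), dim=1).mean(dim=0)
    kept = [(name, p) for name, p in zip(class_names, probs.tolist()) if p >= min_weight]
    names = [name for name, _ in kept]
    weights = [p for _, p in kept]
    return names, weights  # e.g. (["singing", "dancing"], [0.30, 0.70])
```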
Therefore, in the embodiment of the present disclosure, video type recognition is performed according to the obtained motion maps and reference frame images. Compared with classification methods in the related art that compute motion information via optical flow, the classification method in the embodiment of the present disclosure is better suited to industry and requires only a shorter processing time.

Step 240: The classification processing device merges the first classification result set, the first weight set, the second classification result set and the second weight set, and screens out, from the merged result, the classification result whose weight meets a preset condition as the target classification result.
Specifically, in the embodiment of the present disclosure, when step 240 is performed, the classification processing apparatus may specifically perform the following operations:
step A: the classification processing means determines the same classification result and different classification results contained in the first classification result set and the second classification result set.
For example, when the first classification result set = {singing, dancing} and the second classification result set = {dancing, lecturing}, the classification processing means determines that the same classification result is dancing, and the different classification results are singing and lecturing.
Step B: The classification processing means performs the following operation for each group of identical classification results: the weights corresponding to that classification result in the first weight set and the second weight set are combined and averaged to obtain the latest weight corresponding to that classification result.
For example, when the same classification result is dancing, the classification processing means determines that the weight 1 of dancing in the first weight set is 70%, and the weight 2 of dancing in the second weight set is 80%, then the 70% and 80% are combined to obtain an average value of 75%, that is, the latest weight corresponding to dancing is 75%.
Step C: the classification processing device combines the same classification results and the corresponding latest weights of each group, and the different classification results and the corresponding weights to obtain a combined result.
For example, the classification processing device combines singing, dancing and lecturing with their corresponding weights of 30%, 75% and 20% respectively, obtaining a merged classification result set = {singing, dancing, lecturing} and a merged weight set = {30%, 75%, 20%}.
Step D: and the classification processing device screens out classification results with weight meeting preset conditions from the combined results as the target classification results.
Specifically, in the embodiment of the disclosure, the classification processing device compares the weights corresponding to the respective classification results in the merged result to obtain a comparison result, then determines the maximum weight according to the comparison result, and takes the video type corresponding to the maximum weight as the target classification result.
For example, when the classification processing apparatus obtains the merged result = {singing, dancing, lecturing}, it compares the corresponding weights 30%, 75% and 20%; since 75% > 30% > 20%, the maximum weight is determined to be 75%, and the video type corresponding to 75%, namely dancing, is taken as the target classification result; that is, dancing is determined to be the type of the video to be classified.
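Steps A to D reduce to a small merging routine; a minimal sketch, assuming weights are held as class-to-weight mappings:

```python
def merge_and_pick(results1, weights1, results2, weights2):
    w1 = dict(zip(results1, weights1))
    w2 = dict(zip(results2, weights2))
    merged = {}
    for cls in w1.keys() | w2.keys():
        if cls in w1 and cls in w2:
            merged[cls] = (w1[cls] + w2[cls]) / 2   # same result: average the weights
        else:
            merged[cls] = w1.get(cls, w2.get(cls))  # different result: keep its weight
    target = max(merged, key=merged.get)            # maximum weight -> target result
    return target, merged

# merge_and_pick(["singing", "dancing"], [0.30, 0.70],
#                ["dancing", "lecturing"], [0.80, 0.20])
# -> ("dancing", {"singing": 0.30, "dancing": 0.75, "lecturing": 0.20})
```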
In this way, the embodiment of the disclosure performs finer processing on each classification result set and the corresponding weight set, and can improve the processing efficiency of the system under the condition of ensuring the accuracy of the combined result, thereby improving the calculation speed, obtaining an accurate target classification result, meeting the high-speed requirement of users on video classification, and improving the user experience.
In an embodiment of the present disclosure, referring to fig. 3, a classification processing apparatus at least includes: an extraction unit 300, a compression unit 310, an identification unit 320, and a processing unit 330, wherein,
an extracting unit 300, configured to extract a plurality of frame images from a specified time period in a video to be classified, and obtain a plurality of frame image sets and a plurality of reference frame images, where each frame image set includes at least two frame images;
a compression unit 310, configured to perform resolution compression on the frame images in each frame image set respectively, and obtain a motion map corresponding to each frame image set based on the time-sequence correspondence;
a recognition unit 320, configured to perform video type recognition based on each obtained motion map to obtain a first classification result set and a corresponding first weight set, and perform video type recognition based on each obtained reference frame image to obtain a second classification result set and a corresponding second weight set, where each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
the processing unit 330 is configured to combine the first classification result set with the first weight set, and the second classification result set with the second weight set, and screen the classification result with the weight meeting the preset condition from the combined result as a target classification result.
The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be described in detail here.
Referring to fig. 4, an embodiment of the disclosure provides a schematic structural diagram of an electronic device. As shown, the electronic device may include: a processor 401, a memory 402, a transceiver 403, and a bus interface 404.
The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations. The transceiver 403 is used to receive and transmit data under the control of the processor 401. The processor 401 is configured to read the computer instructions in the memory 402, and execute:
extracting a plurality of frame images from a specified time period in a video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images, wherein each frame image set comprises at least two frame images;
respectively carrying out resolution compression on the frame images in each frame image set, and obtaining a motion map corresponding to each frame image set based on the time-sequence correspondence;
performing video type recognition based on the obtained motion maps to obtain a first classification result set and a corresponding first weight set, and performing video type recognition based on the obtained reference frame images to obtain a second classification result set and a corresponding second weight set, wherein each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
and merging the first classification result set, the first weight set, the second classification result set and the second weight set, and screening classification results with weights meeting preset conditions from the merging results as target classification results.
Optionally, a plurality of frame images are extracted from the video to be classified within a specified time period, and a plurality of frame image sets and a plurality of reference frame images are obtained, and the processor 401 is configured to:
dividing the specified time period in the video to be classified into a plurality of time intervals;
for each time interval, the following operations are performed:
extracting a plurality of frame images from a time interval to form a frame image set corresponding to the time interval;
and extracting one frame image from the time interval to serve as a reference frame image corresponding to the time interval.
Optionally, the processor 401 is configured to perform resolution compression on a frame image in any one of the set of frame images, where:
determining an original pixel matrix of each frame image in the arbitrary frame image set;
determining, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix;
and modifying the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix.
Optionally, the first classification result set, the first weight set, the second classification result set and the second weight set are combined, and classification results with weights meeting preset conditions are screened from the combined results as target classification results, and the processor 401 is configured to:
determining the same classification result and different classification results contained in the first classification result set and the second classification result set;
performing the following operation for each group of identical classification results: combining and averaging the weights corresponding to that group's classification result in the first weight set and the second weight set respectively, to obtain the latest weight corresponding to that classification result;
combining the same classification result and the corresponding latest weight of each group, and the different classification results and the corresponding weights to obtain a combined result;
and screening the classification result with the weight meeting the preset condition from the combined result to be used as the target classification result.
Optionally, a classification result with a weight meeting a preset condition is selected from the combined result as the target classification result, and the processor 401 is configured to:
according to the obtained merging result, comparing weights corresponding to all classification results in the merging result to obtain a comparison result;
and determining the maximum value of the weights according to the comparison result, and taking the video type corresponding to the maximum value of the weights as a target classification result.
The bus architecture may include any number of interconnecting buses and bridges, and in particular one or more processors, represented by the processor 401, and various circuits of memory, represented by the memory 402, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides an interface. The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations.
Based on the same inventive concept, the embodiments of the present disclosure provide a storage medium storing computer-executable instructions for causing the computer to perform the method performed by the video classification apparatus in the above embodiments.
In summary, in the embodiment of the present invention, the classification processing device first extracts a plurality of frame images from a specified time period in the video to be classified to obtain a plurality of frame image sets and a plurality of reference frame images; performs resolution compression on the frame images in each frame image set respectively to obtain a motion map corresponding to each frame image set; performs video type recognition to obtain a first classification result set with a corresponding first weight set and a second classification result set with a corresponding second weight set; merges the first classification result set, the first weight set, the second classification result set and the second weight set; and screens out the classification result whose weight meets a preset condition as the target classification result. In this way, the classification processing device classifies the video according to the obtained motion maps and reference frame images, which preserves the automatic training capability of classification, improves the autonomy of system classification, reduces system cost and saves human resources. Compared with a classification method based on single frame images, compressing a plurality of frame images of the video into motion maps and then classifying the motion maps can greatly improve classification efficiency. In addition, screening the target classification result out of the final merged result guarantees its accuracy, improves the accuracy and effectiveness of video type recognition, and accelerates classification.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims and the equivalents thereof, the present invention is also intended to include such modifications and variations.
Claims (6)
1. A method of classifying video, comprising:
dividing a specified time period in the video to be classified into a plurality of time intervals, and respectively aiming at each time interval, executing the following operations: extracting a plurality of frame images from a time interval to form a frame image set corresponding to the time interval; extracting a frame image from the time interval to serve as a reference frame image corresponding to the time interval; each frame image set comprises at least two frame images;
respectively carrying out resolution compression on the obtained frame images in each frame image set, and carrying out weighted summation on the pixel matrices respectively corresponding to the frame images contained in each frame image set, based on the time-sequence correspondence and the weight values respectively corresponding to the frame images contained in each frame image set, to obtain a motion map corresponding to each frame image set;
performing video type recognition based on the obtained motion maps to obtain a first classification result set and a corresponding first weight set, and performing video type recognition based on the obtained reference frame images to obtain a second classification result set and a corresponding second weight set, wherein each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
determining the same classification result and different classification results contained in the first classification result set and the second classification result set;
the following is performed for each set of identical classification results: combining and averaging a group of the same classification results in the weights corresponding to the first weight set and the second weight set respectively to obtain the latest weight corresponding to the same classification result;
combining the same classification result and the corresponding latest weight of each group, and the different classification results and the corresponding weights to obtain a combined result;
and comparing weights corresponding to all the classification results in the combined result according to the obtained combined result, obtaining a comparison result, determining a maximum value of the weights according to the comparison result, and taking the video type corresponding to the maximum value of the weights as a target classification result.
2. The method of classifying video according to claim 1, wherein the resolution compression of the frame images in any one of the frame image sets includes:
determining an original pixel matrix of each frame image in the arbitrary frame image set;
determining, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix;
and modifying the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix.
3. A video classification apparatus, comprising:
the extraction unit is used for dividing a specified time period in the video to be classified into a plurality of time intervals and respectively executing the following operations for each time interval: extracting a plurality of frame images from a time interval to form a frame image set corresponding to the time interval; extracting a frame image from the time interval to serve as a reference frame image corresponding to the time interval; each frame image set comprises at least two frame images;
the compression unit is used for respectively carrying out resolution compression on the obtained frame images in each frame image set, and carrying out weighted summation on the pixel matrices respectively corresponding to the frame images contained in each frame image set, based on the time-sequence correspondence and the weight values respectively corresponding to the frame images contained in each frame image set, to obtain a motion map corresponding to each frame image set;
the identification unit is used for carrying out video type identification based on each obtained motion map to obtain a first classification result set and a corresponding first weight set, and carrying out video type identification based on each obtained reference frame image to obtain a second classification result set and a corresponding second weight set, wherein each weight in the first weight set corresponds to one classification result in the first classification result set, and each weight in the second weight set corresponds to one classification result in the second classification result set;
a processing unit, configured to determine the same classification result and different classification results contained in the first classification result set and the second classification result set; the following is performed for each set of identical classification results: combining and averaging a group of the same classification results in the weights corresponding to the first weight set and the second weight set respectively to obtain the latest weight corresponding to the same classification result; combining the same classification result and the corresponding latest weight of each group, and the different classification results and the corresponding weights to obtain a combined result; and comparing weights corresponding to all the classification results in the combined result according to the obtained combined result, obtaining a comparison result, determining a maximum value of the weights in all the weights according to the comparison result, and taking the video type corresponding to the maximum value of the weights as a target classification result.
4. A video classification apparatus according to claim 3, wherein the frame images in any one of the sets of frame images are subjected to resolution compression, and the compression unit is configured to:
determining an original pixel matrix of each frame image in the arbitrary frame image set;
determining, according to the correspondence between pixel matrices and resolutions preset by the system, the resolution corresponding to the original pixel matrix of each frame image and the resolution corresponding to a predetermined pixel matrix;
and modifying the resolution corresponding to the original pixel matrix to the resolution corresponding to the predetermined pixel matrix.
5. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of classifying video according to any of claims 1 to 2.
6. A storage medium, characterized in that instructions in the storage medium, when executed by a processor of a video classification apparatus, enable an electronic device to perform the video classification method of any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910894922.4A CN110647933B (en) | 2019-09-20 | 2019-09-20 | Video classification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910894922.4A CN110647933B (en) | 2019-09-20 | 2019-09-20 | Video classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110647933A CN110647933A (en) | 2020-01-03 |
CN110647933B true CN110647933B (en) | 2023-06-20 |
Family
ID=69010962
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910894922.4A Active CN110647933B (en) | 2019-09-20 | 2019-09-20 | Video classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110647933B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108038420A (en) * | 2017-11-21 | 2018-05-15 | 华中科技大学 | Human behavior recognition method based on deep video |
CN109145712A (en) * | 2018-06-28 | 2019-01-04 | 南京邮电大学 | GIF short-video emotion recognition method and system fusing text information |
CN110059662A (en) * | 2019-04-26 | 2019-07-26 | 山东大学 | A deep video behavior recognition method and system |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6266442B1 (en) * | 1998-10-23 | 2001-07-24 | Facet Technology Corp. | Method and apparatus for identifying objects depicted in a videostream |
US20080089591A1 (en) * | 2006-10-11 | 2008-04-17 | Hui Zhou | Method And Apparatus For Automatic Image Categorization |
US9569696B1 (en) * | 2015-08-12 | 2017-02-14 | Yahoo! Inc. | Media content analysis system and method |
CN106779073B (en) * | 2016-12-27 | 2019-05-31 | 西安石油大学 | Media information classification method and device based on deep neural network |
CN107194419A (en) * | 2017-05-10 | 2017-09-22 | 百度在线网络技术(北京)有限公司 | Video classification methods and device, computer equipment and computer-readable recording medium |
CN108229529A (en) * | 2017-08-18 | 2018-06-29 | 北京市商汤科技开发有限公司 | Combining classifiers sorting technique and device, electronic equipment, storage medium |
CN108777815B (en) * | 2018-06-08 | 2021-04-23 | Oppo广东移动通信有限公司 | Video processing method and apparatus, electronic device, computer-readable storage medium |
CN109034012A (en) * | 2018-07-09 | 2018-12-18 | 四川大学 | First person gesture identification method based on dynamic image and video sequence |
CN109145840B (en) * | 2018-08-29 | 2022-06-24 | 北京字节跳动网络技术有限公司 | Video scene classification method, device, equipment and storage medium |
CN109862391B (en) * | 2019-03-18 | 2021-10-19 | 网易(杭州)网络有限公司 | Video classification method, medium, device and computing equipment |
- 2019-09-20: Application CN201910894922.4A filed in China; granted as patent CN110647933B (active).
Non-Patent Citations (3)
Title |
---|
Carreira J et al., "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, full text. *
Zhi Hongxin, "Research on Video Classification Technology Based on Deep Learning", China Master's Theses Full-text Database, Information Science & Technology, No. 1, 2019, full text. *
Fan Lulu, "Research on Content-Based Video Retrieval and Classification Methods", China Master's Theses Full-text Database, Information Science & Technology, No. 4, 2015, full text. *
Also Published As
Publication number | Publication date |
---|---|
CN110647933A (en) | 2020-01-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |