
CN114266778B - Video processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114266778B
Authority
CN
China
Prior art keywords
video frame, target, frame sequence, feature, sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111580218.5A
Other languages
Chinese (zh)
Other versions
CN114266778A (en)
Inventor
胡可飞
邓帆
朱思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eswin Computing Technology Co Ltd
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eswin Computing Technology Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202111580218.5A
Publication of CN114266778A
Application granted
Publication of CN114266778B
Active legal status
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract


The embodiments of the present application disclose a video processing method, apparatus, device, and storage medium. The method includes: performing image segmentation processing on an initial video frame sequence to determine an image segmentation result; determining, based on the image segmentation result, a target sub-video frame sequence in the initial video frame sequence, where the target sub-video frame sequence is composed of consecutive video frames that include a target object and its frame number is smaller than that of the initial video frame sequence; and generating, based on the target sub-video frame sequence, a target video frame sequence in which each video frame includes the target object and whose frame number is greater than that of the target sub-video frame sequence. With the embodiments of the present application, a target video frame sequence in which each video frame includes the target object can be generated from a segment of the initial video frame sequence composed of consecutive video frames that include the target object, which gives high applicability.

Description

Video processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a video processing method, apparatus, device, and storage medium.
Background
The unsupervised video object segmentation (VOS) problem requires identifying and locating the major objects, such as animals and persons, in each video frame of a video frame sequence without any additional input, and tracking the same object across different video frames. The video frame sequences mostly come from video clips such as film and television dramas, sports, dance, and street footage, and these varied scenes introduce problems such as shot switching, object occlusion, rapid motion, and objects appearing or disappearing midway.
Existing object segmentation methods mainly determine the target in the first video frame of a video frame sequence and then determine the target in subsequent video frames based on the target in the first video frame. With such methods, errors accumulate as the number of frames increases, so the target object in later video frames is poorly recognized and tracked.
Disclosure of Invention
The embodiments of the present application provide a video processing method that can generate, based on a target sub-video frame sequence formed by a segment of consecutive video frames including a target object within an initial video frame sequence, a target video frame sequence in which each video frame includes the target object, and therefore has high applicability.
In a first aspect, an embodiment of the present application provides a video processing method, including:
performing image segmentation processing on the initial video frame sequence, and determining an image segmentation result;
Determining a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result, wherein the target sub-video frame sequence consists of continuous video frames comprising a target object, and the frame number of the target sub-video frame sequence is smaller than that of the initial video frame sequence;
generating a target video frame sequence based on the target sub-video frame sequence, wherein each video frame in the target video frame sequence comprises the target object, and the frame number of the target video frame sequence is larger than the frame number of the target sub-video frame sequence.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
The image processing module is used for carrying out image segmentation processing on the initial video frame sequence and determining an image segmentation result;
a sequence determining module, configured to determine a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result, where the target sub-video frame sequence is composed of continuous video frames including a target object, and a frame number of the target sub-video frame sequence is smaller than a frame number of the initial video frame sequence;
And the sequence generating module is used for generating a target video frame sequence based on the target sub-video frame sequence, wherein each video frame in the target video frame sequence comprises the target object, and the frame number of the target video frame sequence is larger than that of the target sub-video frame sequence.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;
The memory is used for storing a computer program;
the processor is configured to execute the video processing method provided by the embodiment of the application when the computer program is called.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program that is executed by a processor to implement the video processing method provided by the embodiments of the present application.
In the embodiments of the present application, a segment of the initial video frame sequence, the target sub-video frame sequence, can be determined based on the image segmentation result corresponding to the initial video frame sequence, and each video frame in the target sub-video frame sequence includes the target object. A target video frame sequence that also includes the target object can then be determined based on the target sub-video frame sequence; it likewise consists of consecutive video frames including the target object but has more video frames than the target sub-video frame sequence, so the target object is tracked across most video frames in the initial video frame sequence, which gives high applicability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a video processing method according to an embodiment of the present application;
FIG. 2 is a schematic view of a scenario in which mask features are optimized according to an embodiment of the present application;
FIG. 3 is a schematic view of a scenario for determining a predicted target mask feature provided by an embodiment of the present application;
FIG. 4 is a schematic view of a scenario featuring a determination of attention as provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart of a video processing method according to an embodiment of the present application. As shown in fig. 1, the video processing method provided by the embodiment of the present application may include the following steps:
and S11, performing image segmentation processing on the initial video frame sequence, and determining an image segmentation result.
In some possible embodiments, the initial video frame sequence may be a video frame sequence corresponding to any video segment such as a movie, a sports video, etc., and may be specifically determined based on actual application scene requirements, which is not limited herein.
Specifically, when the image segmentation processing is performed on the initial video frame sequence, the image segmentation processing may be performed on each video frame in the initial video frame sequence, so as to obtain an image segmentation result corresponding to each video frame in the initial video frame sequence.
When each video frame in the initial video frame sequence is subjected to image segmentation processing, initial image characteristics corresponding to the video frame can be determined, and then an image segmentation result corresponding to the video frame is obtained based on the initial image characteristics corresponding to the video frame.
When performing the image segmentation processing on the initial video frame sequence, the image segmentation processing may be performed directly on each video frame in the initial video frame sequence based on an image segmentation algorithm, for example a SOLOv algorithm; the specific choice may be determined based on the actual application scene requirements, and is not limited herein.
In some possible embodiments, the image segmentation result corresponding to the initial video frame sequence may include mask features for a plurality of objects included in each video frame in the initial video frame sequence. For each video frame, the image segmentation result corresponding to the video frame may further include an object included in the video frame determined based on each mask feature corresponding to the video frame.
For example, the image segmentation result corresponding to the initial video frame sequence includes a mask feature corresponding to each video frame, and an object, such as a person, an animal, or the like, included in each video frame determined based on the mask feature corresponding to each video frame.
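For illustration only, the following minimal Python sketch shows one way such an image segmentation result could be represented in code: one object-id-to-mask mapping per video frame. The segment_frame function is a hypothetical placeholder for whatever instance segmentation model is used; it is not part of the patent.

```python
from typing import Dict, List
import numpy as np

def segment_frame(frame: np.ndarray) -> Dict[int, np.ndarray]:
    """Hypothetical per-frame instance segmenter.

    Returns a mapping from object id to a binary mask of shape (H, W).
    In practice this would be a SOLO-style or similar segmentation network.
    """
    raise NotImplementedError("plug in a real segmentation model here")

def segment_sequence(frames: List[np.ndarray]) -> List[Dict[int, np.ndarray]]:
    """Image segmentation result: one {object_id: mask} dict per video frame."""
    return [segment_frame(frame) for frame in frames]
```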
Step S12, determining a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result.
In some possible implementations, the target sub-video frame sequence is composed of consecutive video frames including the target object, and the number of frames of the target sub-video frame sequence is less than the number of frames of the initial video frame sequence. That is, the target sub-video frame sequence is a video frame sequence segment in which each video frame in the initial video frame sequence includes a target object, and the target object is any one of the objects corresponding to each video frame in the initial video frame sequence.
Specifically, all objects included in each video frame in the initial video frame sequence may be determined based on the image segmentation result corresponding to the initial video frame sequence, and the first object and the second object may be further determined therefrom.
The first object is an object included in each video frame in the initial video frame sequence, and the second object is an object in the initial video frame sequence, where at least one video frame does not include the object.
Further, for each second object, since at least one video frame in the initial video frame sequence does not include the second object, at least one sub video frame sequence corresponding to the second object may be determined from the initial video frame sequence, each sub video frame sequence being composed of consecutive video frames including the second object.
For example, the initial video frame sequence has 100 frames, of which the 33rd, 34th and 55th video frames do not include the second object; then the three video frame sequences formed by the 1st to 32nd, 35th to 54th and 56th to 100th video frames of the initial video frame sequence may be determined as sub-video frame sequences corresponding to the second object.
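The splitting in this example can be illustrated with a short helper that, given the indices of the frames containing an object, returns the maximal runs of consecutive frames; this is an illustrative sketch (1-based frame indices assumed), not part of the patent.

```python
from typing import List, Tuple

def consecutive_sub_sequences(frame_indices: List[int]) -> List[Tuple[int, int]]:
    """Split sorted frame indices into maximal runs of consecutive frames,
    returned as inclusive (start, end) pairs."""
    runs = []
    for idx in sorted(frame_indices):
        if runs and idx == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], idx)      # extend the current run
        else:
            runs.append((idx, idx))            # start a new run
    return runs

# Frames 1..100 minus frames 33, 34 and 55, which do not contain the second object.
frames_with_object = [i for i in range(1, 101) if i not in (33, 34, 55)]
print(consecutive_sub_sequences(frames_with_object))   # [(1, 32), (35, 54), (56, 100)]
# Choosing the longest run as the target sub-video frame sequence:
print(max(consecutive_sub_sequences(frames_with_object), key=lambda r: r[1] - r[0]))  # (56, 100)
```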
Further, any second object may be determined as a target object, and any sub-video frame sequence in the at least one sub-video frame sequence corresponding to the second object may be determined as a target sub-video frame sequence. For example, a sub-video frame sequence having the largest number of frames (i.e., the longest sequence) among at least one sub-video frame sequence corresponding to the second object may be determined as the target sub-video frame sequence.
The specific determination mode of the target object and the specific mode of determining the target sub-video frame sequence from at least one sub-video frame sequence corresponding to the target object may be determined based on the actual application scene requirement, which is not limited herein.
Alternatively, in determining the target sub-video frame sequence in the initial video frame sequence, at least one sub-video frame sequence of the initial video frame sequence corresponding to each object may be determined by a tracking algorithm based on the image segmentation result of the initial video frame sequence. That is, for each object, at least one sub-video frame sequence consisting of consecutive video frames comprising the object may be determined from the initial video frame sequence based on a tracking algorithm.
The tracking algorithm may be the Simple Online and Realtime Tracking (SORT) algorithm or another algorithm, which is not limited herein.
Further, for each object, if at least one sub-video frame sequence including the object is determined from the initial video frame sequence and the number of frames of any one sub-video frame sequence is smaller than the number of frames of the initial video frame sequence, the object may be determined as a second object.
Further, any second object may be determined as a target object, and any sub-video frame sequence in the at least one sub-video frame sequence corresponding to the second object may be determined as a target sub-video frame sequence. For example, a sub-video frame sequence having the largest number of frames (i.e., the longest sequence) among at least one sub-video frame sequence corresponding to the second object may be determined as the target sub-video frame sequence.
Optionally, the target object may be determined from objects included in each video frame in the initial video frame sequence, for example, any object is determined as the target object. Further, a mask feature of a target object is determined from mask features of a plurality of objects included in each video frame in the initial video frame sequence (for convenience of description, the mask feature of the target object is hereinafter referred to as a target mask feature), and a target sub-video frame sequence in the initial video frame sequence is determined based on the mask feature of the target object.
For example, the video frames corresponding to the target mask features may be determined from the initial video frame sequence, at least one sub-video frame sequence may be obtained based on the video frames corresponding to the target mask features, and any one of these sub-video frame sequences may be determined as the target sub-video frame sequence in the initial video frame sequence.
If the frame number of the target sub-video frame sequence determined from the initial video frame sequence is the same as that of the initial video frame sequence, the target object may be re-determined from the other objects, and a target sub-video frame sequence consisting of consecutive video frames including the new target object may be determined in the manner described above.
In some possible embodiments, after determining the mask features of the objects included in each video frame in the initial video frame sequence, each mask feature may be further optimized to further improve the segmentation accuracy of the mask feature and optimize the edge details of the mask feature to improve the integrity and accuracy of the mask feature.
Specifically, for each mask feature, the video frame corresponding to the mask feature may be determined, and the object corresponding to the mask feature may be determined from that video frame. Further, the image feature of the object can be determined and fused with the mask feature to obtain a target fusion feature, so that the optimized mask feature corresponding to the mask feature can be obtained based on the target fusion feature. After determining the optimized mask feature for each mask feature, a target sub-video frame sequence in the initial video frame sequence may be determined based on the optimized mask features in any of the manners described above.
As an example, a target object is first determined from objects included in each video frame in the initial video frame sequence, and a mask feature (hereinafter referred to as a target mask feature for convenience of description) of the target object is determined from each mask feature.
For each target mask feature, an image feature of a target object (hereinafter referred to as a third image feature for convenience of description) included in a video frame corresponding to the target mask feature may be determined, and an optimized mask feature corresponding to the target mask feature may be determined based on the target mask feature and the third image feature.
After determining the optimized mask features corresponding to each target mask feature, a target sub-video frame sequence in the initial video frame sequence may be determined based on the optimized mask features corresponding to the target object.
The optimized mask feature corresponding to any mask feature (e.g., any target mask feature) may be determined based on a neural network model; for example, the edge details of the mask feature may be optimized based on a RefineNet network model. The selection of the specific neural network model may be determined based on the actual application scene requirements, which is not limited herein.
Referring to fig. 2, fig. 2 is a schematic view of a scenario in which mask features are optimized according to an embodiment of the present application. The video frame shown in fig. 2 is any video frame in the initial video frame sequence, and the building in the video frame is the target object. On this basis, the image feature of the target object and the target mask feature of the target object may be fused, the fused feature may be input into the RefineNet network model, and the optimized mask feature corresponding to the target mask feature is finally obtained based on the network model.
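For illustration, the sketch below concatenates an image feature map of the target object with its coarse mask feature and passes the result through a small convolutional refinement head. It is only a schematic stand-in for a RefineNet-style model; the channel sizes and layer layout are assumptions.

```python
import torch
import torch.nn as nn

class MaskRefiner(nn.Module):
    """Schematic refinement head: fuse image features with a coarse mask and
    predict an optimized (sharper-edged) mask. Channel sizes are illustrative."""
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_channels + 1, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),   # logits of the optimized mask
        )

    def forward(self, image_feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, C, H, W), mask: (B, 1, H, W) coarse mask feature
        fused = torch.cat([image_feat, mask], dim=1)   # target fusion feature
        return torch.sigmoid(self.refine(fused))       # optimized mask in [0, 1]

refiner = MaskRefiner()
optimized = refiner(torch.randn(1, 64, 96, 96), torch.rand(1, 1, 96, 96))
print(optimized.shape)   # torch.Size([1, 1, 96, 96])
```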
And step S13, generating a target video frame sequence based on the target sub-video frame sequence.
In some possible embodiments, based on the target sub-video frame sequence, a target video frame sequence having a greater number of frames than the target sub-video frame sequence may be generated, i.e., a target video frame sequence having a longer sequence length. Wherein each video frame in the target video frame sequence comprises a target object, i.e. the target video frame sequence consists of consecutive video frames comprising the target object.
In particular, the target mask features of the target object may be sequentially determined for other video frames in the initial video frame sequence than the target sub-video frame sequence based on the target mask features of the target object included in the plurality of video frames in the target sub-video frame sequence.
The target mask feature corresponding to each video frame after the target sub-video frame sequence is determined based on the target mask feature corresponding to the previous video frame of the video frame, and the target mask feature corresponding to the first video frame (hereinafter referred to as the first video frame for convenience of description) after the target sub-video frame sequence is determined based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence.
The target mask feature corresponding to each video frame before the target sub-video frame sequence is determined based on the target mask feature corresponding to the next video frame of the video frame, and the target mask feature corresponding to the last video frame before the target sub-video frame sequence is determined based on the target mask feature corresponding to the first video frame in the target sub-video frame sequence.
For example, the initial video frame sequence includes 20 frames, and the target sub-video frame sequence is the video frame sequence formed by the 3rd to 18th video frames, each of which includes the target object. Then the target mask feature of the target object corresponding to the 19th video frame of the initial video frame sequence (the first video frame after the target sub-video frame sequence) may be determined based on the target mask feature corresponding to the last video frame of the target sub-video frame sequence (i.e., the 18th video frame of the initial video frame sequence), and the target mask feature corresponding to the 20th video frame may in turn be determined based on the target mask feature corresponding to the 19th video frame.
Similarly, the target mask feature of the target object corresponding to the 2nd video frame of the initial video frame sequence (the last video frame before the target sub-video frame sequence) may be determined based on the target mask feature of the target object included in the first video frame of the target sub-video frame sequence (i.e., the 3rd video frame of the initial video frame sequence), and the target mask feature corresponding to the 1st video frame may in turn be determined based on the target mask feature corresponding to the 2nd video frame.
In this manner, based on the target mask feature of the target object included in the first video frame of the target sub-video frame sequence, the target mask features of the target object corresponding to the preceding video frames of the initial video frame sequence can be predicted forward, and based on the target mask feature of the target object included in the last video frame of the target sub-video frame sequence, the target mask features corresponding to the following video frames can be predicted backward.
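The forward/backward propagation described above can be organized as in the following sketch; predict_mask is a hypothetical placeholder for the per-frame prediction detailed in the following paragraphs (which in the actual method also draws on third video frames from the target sub-video frame sequence), and the 0-based indexing is an assumption of the sketch.

```python
from typing import Callable, Dict, List
import numpy as np

def propagate_masks(
    frames: List[np.ndarray],
    sub_masks: Dict[int, np.ndarray],   # frame index -> target mask within the sub-sequence
    sub_start: int,
    sub_end: int,                       # inclusive bounds of the target sub-video frame sequence
    predict_mask: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> Dict[int, np.ndarray]:
    """Extend the mask feature sequence to the whole initial video frame sequence:
    frames after the sub-sequence are predicted from their previous frame,
    frames before it are predicted from their next frame."""
    masks = dict(sub_masks)
    for i in range(sub_end + 1, len(frames)):        # later frames (backward prediction)
        masks[i] = predict_mask(frames[i], masks[i - 1])
    for i in range(sub_start - 1, -1, -1):            # earlier frames (forward prediction)
        masks[i] = predict_mask(frames[i], masks[i + 1])
    return masks
```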
Further, a mask feature sequence may be generated based on the target mask features corresponding to the video frames in the target sub-video frame sequence and the target mask features corresponding to the other video frames in the initial video frame sequence. In this mask feature sequence, the mask features corresponding to the video frames in the target sub-video frame sequence are obtained by the image segmentation processing, while the mask features corresponding to the other video frames are obtained by prediction starting from the mask feature corresponding to the first or last video frame of the target sub-video frame sequence.
For any video frame except for the target sub-video frame sequence in the initial video frame sequence, in the case that the image segmentation processing does not determine the target mask characteristics of the target object in the video frame or in the case that the target mask characteristics of the target object in the video frame are omitted when the target sub-video frame sequence is determined based on an algorithm such as SORT, the target mask characteristics of the video frame corresponding to the target object can be predicted based on the above mode.
Further, based on the determined mask feature sequence, a target video frame sequence may be generated in which each video frame includes the target object and whose frame number is greater than that of the target sub-video frame sequence. For example, if the target sub-video frame sequence is a sequence of 20 consecutive frames including the target object, a target video frame sequence of more than 20 consecutive frames including the target object may be generated based on the above-described mask feature sequence. Since the target video frame sequence is determined based on the mask feature sequence corresponding to the target object, target tracking of the target object in the initial video frame sequence can be achieved based on the generated target video frame sequence.
The target object included in any video frame in the target video frame sequence may be determined based on a target mask feature corresponding to the video frame in the mask feature sequence.
In some possible embodiments, different second objects in the second objects included in each video frame in the initial video frame sequence may be determined as target objects, and a target video frame sequence corresponding to the different second objects may be obtained, so as to implement target tracking on each second object in the initial video frame sequence, and predict the second object in the video frame in the initial video frame sequence that does not include the second object.
The first object is an object included in every video frame in the initial video frame sequence, so target tracking of the first object in the initial video frame sequence can be realized directly based on the first object included in each video frame of the initial video frame sequence, or based on the mask features corresponding to the first object.
In some possible embodiments, when determining the target mask feature of the target object for any video frame in the initial video frame sequence other than the target sub-video frame sequence, the target mask feature corresponding to that video frame may be determined based on the adjacent video frame whose target mask feature has already been determined and that target mask feature, together with at least one video frame in the target sub-video frame sequence (hereinafter referred to as a third video frame for convenience of description) and the target mask feature of the target object included in each third video frame.
When it is determined that any two other video frames except the target sub-video frame sequence in the initial video frame sequence correspond to the target mask feature of the target object, at least one third video frame selected from the target sub-video frame sequence corresponding to the two video frames may be completely identical or partially identical or completely different, and may be specifically determined based on the actual application scene requirement, which is not limited herein.
When determining that any one of the other video frames except the target sub-video frame sequence in the initial video frame sequence corresponds to the target mask feature of the target object, any one of the third video frames selected from the target sub-video frame sequence corresponding to the video frame is any one of the video frames in the target sub-video frame sequence, which can be determined specifically based on the actual application scene requirement, and is not limited herein.
When determining the target mask feature of the target object for the last video frame before the target sub-video frame sequence (the second video frame) or the first video frame after the target sub-video frame sequence (the first video frame), the last video frame in the target sub-video frame sequence may be included in the at least one third video frame selected for the first video frame, and the first video frame in the target sub-video frame sequence may be included in the at least one third video frame selected for the second video frame; this may be determined based on the actual application scene requirements, and is not limited herein.
Taking the first video frame (i.e., the first video frame) after the target sub-video frame sequence in the initial video frame sequence as an example, any one or more video frames in the target sub-video frame sequence may be determined to be third video frames, and the first video frame may be determined to correspond to the target mask feature of the target object based on the last video frame in the target sub-video frame sequence, the target mask feature of the target object included in the last video frame in the target sub-video frame sequence, each third video frame, and the target mask feature of the target object included in each third video frame.
Specifically, a predicted target mask feature of the first video frame corresponding to the target object may be determined based on the last video frame in the target sub-video frame sequence, the target mask feature of the target object included in that last video frame, each third video frame, and the target mask feature of the target object included in each third video frame, and the predicted target mask feature may be determined as the target mask feature of the first video frame corresponding to the target object.
Optionally, after determining the predicted target mask feature of the target object for the first video frame, an intersection ratio (i.e., intersection-over-union, IoU) of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature may be determined. That is, the intersection and the union of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature are determined, and the ratio of the intersection to the union is computed.
If the intersection ratio of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature is smaller than a preset threshold, the difference between the two is large, and the predicted target mask feature can be determined as the target mask feature of the target object included in the first video frame.
If the intersection ratio of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature is greater than or equal to a preset threshold, the difference between the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature is smaller, and at this time, the target mask feature corresponding to the last video frame in the target sub-video frame sequence can be determined as the target mask feature corresponding to the first video frame.
Based on the above manner, the target mask characteristics corresponding to the finally determined first video frame can be made to be closer to the actual mask characteristics of the target object included in the first video frame. The specific value of the preset threshold may be determined based on the actual application scene requirement, for example, may be 0.9, and the like, which is not limited herein.
It should be specifically noted that, for any video frame in the initial video frame sequence other than the target sub-video frame sequence, after determining the predicted target mask feature corresponding to the video frame, the intersection ratio of the target mask feature of the previous or next video frame to the predicted target mask feature of the video frame may be determined, so as to determine the target mask feature of the video frame corresponding to the target object based on the intersection ratio.
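A minimal sketch of this intersection-ratio check on binary masks is given below; the threshold value of 0.9 is only the example value mentioned above, not a fixed choice.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum()) / union if union else 1.0

def choose_mask(prev_mask: np.ndarray, predicted_mask: np.ndarray,
                threshold: float = 0.9) -> np.ndarray:
    """If the predicted mask differs a lot from the previous frame's mask
    (IoU below the threshold), keep the prediction; otherwise reuse the
    previous frame's mask, as described above."""
    return predicted_mask if mask_iou(prev_mask, predicted_mask) < threshold else prev_mask
```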
In some possible implementations, when determining the predicted target mask feature of the target object for the first video frame based on the last video frame in the target sub-video frame sequence and its target mask feature, and on each third video frame and its target mask feature, an attention feature may first be determined based on each third video frame and the target mask feature of the target object it includes, together with the last video frame in the target sub-video frame sequence and the target mask feature of the target object it includes; the predicted target mask feature of the first video frame corresponding to the target object is then determined based on the attention feature and the features of the last video frame in the target sub-video frame sequence.
Specifically, for each third video frame, an image feature (hereinafter referred to as a first image feature for convenience of description) and a context feature (hereinafter referred to as a first context feature for convenience of description) corresponding to the third video frame may be determined based on the third video frame and its corresponding object mask feature.
For each third video frame, the target object included in the third video frame may be replaced with the corresponding target mask feature to obtain a new third video frame, and feature processing is then performed on the new third video frame to obtain the first image feature and the first context feature corresponding to the third video frame. On this basis, the first image feature and the first context feature corresponding to each third video frame can be obtained.
Further, the first image features corresponding to each third video frame are fused to obtain a fused image feature (hereinafter referred to as a first fused image feature for convenience of description). For example, the feature values of the first image features corresponding to the third video frames in each channel may be fused, or the feature values of the first image features in each channel may be averaged to obtain a first fused image feature, which is not limited herein.
Similarly, the first context features corresponding to the third video frames can be fused to obtain fused context features. For example, the first context features corresponding to each third video frame may be fused or subjected to mean processing, to obtain fused context features.
Further, Gaussian blur processing may be performed on the target mask feature corresponding to the last video frame in the target sub-video frame sequence to obtain a blur mask feature, and feature processing may be performed on the last video frame in the target sub-video frame sequence to obtain an image feature (hereinafter referred to as a second image feature for convenience of description) and a context feature (hereinafter referred to as a second context feature for convenience of description) corresponding to that video frame. The attention feature is then determined based on the first fused image feature, the fused context feature, the blur mask feature, the second image feature, and the second context feature.
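For the Gaussian blur step, a minimal sketch (the sigma value is an assumption for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_mask_feature(mask: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Gaussian-blur a binary target mask so it becomes a soft prior around
    the object location in the last frame of the target sub-video frame sequence."""
    return gaussian_filter(mask.astype(np.float32), sigma=sigma)
```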
The first fused image feature, the fused context feature, the blur mask feature, the second image feature, and the second context feature may be input into an attention network, through which the attention feature is ultimately obtained.
After determining the attention feature, feature processing may be performed on the attention feature and the second image feature to obtain a prediction mask feature of the first video frame corresponding to the target object.
With reference to fig. 3, fig. 3 is a schematic view of a scenario for determining a predicted target mask feature according to an embodiment of the present application. Fig. 3 illustrates determining the predicted target mask feature of the target object for the first video frame after the target sub-video frame sequence in the initial video frame sequence, where the target object is the person in the video frame.
In fig. 3, two third video frames are determined from the target sub-video frame sequence. After the target object (the person) included in each third video frame is replaced with the corresponding target mask feature, each third video frame is encoded to obtain the first image features k1 and k2 and the first context features v1 and v2 corresponding to the third video frames; k1 and k2 are then fused to obtain the first fused image feature km, and v1 and v2 are fused to obtain the fused context feature vm.
And simultaneously, for the last video frame in the target sub-video frame sequence, encoding the video frame to obtain a second image characteristic kq and a second context characteristic vq corresponding to the last video frame in the target sub-video frame sequence. And carrying out Gaussian blur processing on the target mask feature corresponding to the last video frame in the target sub-video frame sequence to obtain a blur mask feature p.
Further, the first fused image feature km, the fused context feature vm, the blur mask feature p, the second image feature kq and the second context feature vq are input into an attention network to obtain the attention feature y.
The second image feature kq and the attention feature y are input to a decoder, resulting in a predicted target mask feature of the first video frame corresponding to the target object.
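Putting the fig. 3 flow together, the sketch below shows the overall shape of the computation. The encoder, attention network, decoder and blur operator are hypothetical callables standing in for the networks discussed above; the averaging of per-frame keys and values follows the fusion step described earlier and is one possible choice.

```python
import torch

def predict_target_mask(third_frames, third_masks, last_frame, last_mask,
                        encoder, attention_net, decoder, blur):
    """Schematic flow (all modules are assumed callables):
      memory frames -> (k1, v1), (k2, v2), ... -> fused km, vm
      query frame   -> kq, vq; blurred previous mask -> p
      attention(km, vm, p, kq, vq) -> y; decoder(kq, y) -> predicted mask."""
    keys, values = [], []
    for frame, mask in zip(third_frames, third_masks):
        k, v = encoder(frame, mask)           # first image / context features
        keys.append(k)
        values.append(v)
    km = torch.stack(keys).mean(dim=0)        # first fused image feature
    vm = torch.stack(values).mean(dim=0)      # fused context feature
    kq, vq = encoder(last_frame, None)        # second image / context features
    p = blur(last_mask)                       # blur mask feature
    y = attention_net(km, vm, p, kq, vq)      # attention feature
    return decoder(kq, y)                     # predicted target mask feature
```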
After determining the attention feature, the attention feature can be sampled in parallel by atrous convolutions with different sampling rates based on Atrous Spatial Pyramid Pooling (ASPP), and the results at the multiple scales are fused to obtain the processed attention feature. The processed attention feature and the second image feature can then be subjected to feature processing to obtain the predicted target mask feature of the first video frame corresponding to the target object.
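A minimal ASPP-style sketch with assumed dilation rates, applying parallel atrous convolutions to a feature map and fusing the results:

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Parallel atrous (dilated) convolutions at several sampling rates,
    fused by a 1x1 convolution. Rates and channel sizes are illustrative."""
    def __init__(self, channels: int = 64, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

aspp = SimpleASPP()
print(aspp(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```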
The attention network may be a motion-guided Space-Time Memory (STM) network, or may be another neural network, which is not limited herein.
In some possible embodiments, when the first fused image feature, the fused context feature, the blur mask feature, the second image feature, and the second context feature are input into the attention network and the attention feature is finally obtained through the attention network, the blur mask feature may first be further processed based on the second image feature, so that the blur mask feature incorporates the relevant information of the last video frame in the target sub-video frame sequence, yielding the processed blur mask feature.
Specifically, the second image feature and the blur mask feature can be fused to obtain a target fusion feature, the bias parameter and the weight parameter corresponding to the blur mask feature are obtained based on the target fusion feature, and the blur mask feature is then processed based on the bias parameter and the weight parameter to obtain the processed blur mask feature. For example, the target fusion feature may be processed by different convolution layers and activation functions, respectively, to obtain the corresponding weight parameter and bias parameter.
Further, the first fused image feature and the second image feature may be fused to obtain a corresponding fused image feature (hereinafter referred to as a second fused image feature for convenience of description), so that the attention feature is determined based on the second fused image feature, the second context feature, the processed blur mask feature, and the fused context feature.
With reference to fig. 4, fig. 4 is a schematic view of a scenario for determining attention features provided by an embodiment of the present application. Wherein fig. 4 is a network structure diagram of the attention network shown in fig. 3. After the first fused image feature km, the fused context feature vm, the blur mask feature p, the second image feature kq, and the second context feature vq are obtained based on fig. 3, the second image feature kq and the blur mask feature p may be fused to obtain the target fused feature.
The target fusion feature is processed through one convolution layer and an activation function (Sigmoid) to obtain the weight parameter w corresponding to the blur mask feature p, and through another convolution layer and a Sigmoid activation function to obtain the bias parameter b; the convolution layers used to determine w and b are different. After obtaining the weight parameter w and the bias parameter b, a dot product operation may be performed on the blur mask feature p and the weight parameter w, and the bias parameter b is added to the result to obtain the processed mask feature p'.
Further, a vector product of the second image feature kq and the first fused image feature km may be determined and processed by an activation function (softmax) to obtain the second fused image feature. A dot multiplication operation is then performed on the second fused image feature and the processed mask feature p', a vector product of the operation result and the fused context feature vm is determined, and the vector product, the second context feature and the second fused image feature are further fused to obtain the attention feature.
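The fig. 4 computation can be written out roughly as below. The tensor shapes (keys flattened over spatial positions, memory frames concatenated along that axis), the concatenation-based target fusion, and the simplified final fusion (read feature with the query context feature only) are assumptions made for this sketch rather than the patent's exact layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGuidedAttention(nn.Module):
    """Schematic read step: gate a blurred mask with parameters derived from the
    query key, weight the query/memory affinity by the gated mask, and read from
    the memory context features. Shapes and the final fusion are illustrative."""
    def __init__(self, key_dim: int = 64, val_dim: int = 128):
        super().__init__()
        self.weight_conv = nn.Conv1d(key_dim + 1, 1, kernel_size=1)   # -> weight w
        self.bias_conv = nn.Conv1d(key_dim + 1, 1, kernel_size=1)     # -> bias b
        self.fuse = nn.Linear(2 * val_dim, val_dim)

    def forward(self, km, vm, p, kq, vq):
        # km: (C, Nm), vm: (Cv, Nm) memory key/value; kq: (C, Nq), vq: (Cv, Nq);
        # p: (Nq,) blur mask feature over the query frame's spatial positions.
        fusion = torch.cat([kq, p.unsqueeze(0)], dim=0).unsqueeze(0)   # target fusion feature
        w = torch.sigmoid(self.weight_conv(fusion)).squeeze()          # weight parameter w
        b = torch.sigmoid(self.bias_conv(fusion)).squeeze()            # bias parameter b
        p_proc = p * w + b                                             # processed mask p'
        affinity = F.softmax(kq.t() @ km, dim=-1)                      # second fused image feature, (Nq, Nm)
        read = (affinity * p_proc.unsqueeze(1)) @ vm.t()               # read from memory, (Nq, Cv)
        return self.fuse(torch.cat([read, vq.t()], dim=-1))            # attention feature, (Nq, Cv)

attn = MaskGuidedAttention()
km, vm = torch.randn(64, 200), torch.randn(128, 200)
kq, vq, p = torch.randn(64, 100), torch.randn(128, 100), torch.rand(100)
print(attn(km, vm, p, kq, vq).shape)   # torch.Size([100, 128])
```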
It should be specifically noted that, the implementation manner of determining the predicted target mask feature of the first video frame and determining the target mask feature of the first video frame may be applicable to other video frames in the initial video frame sequence except for the target sub-video frame sequence, which is not described herein.
In some possible embodiments, in the process of determining the target mask features corresponding to other video frames in the initial video frame sequence except for the target sub-video frame sequence, after determining the target mask feature corresponding to one video frame, a mask feature sequence may be generated based on the target mask feature corresponding to the video frame in the target sub-video frame sequence and the newly determined target mask feature, so as to generate a new video frame sequence based on the mask feature sequence.
That is, in the process of forward-predicting the target mask features of the target object for the video frames before the target sub-video frame sequence based on the target mask feature of the target object included in the first video frame of the target sub-video frame sequence, and backward-predicting the target mask features for the video frames after the target sub-video frame sequence based on the target mask feature of the target object included in the last video frame of the target sub-video frame sequence, each time a new target mask feature is predicted, a target video frame sequence whose frame number is greater than that of the target sub-video frame sequence may be determined from the video frames corresponding to the target mask features determined so far.
In this case, since the target object may correspond to a plurality of sub-video frame sequences, and the target sub-video frame sequence is one of the plurality of sub-video frame sequences corresponding to the target object, in the above-described process, there may be a plurality of sub-video frame sequences composed of consecutive video frames including the target object. At this time, if there are a preset number of overlapping video frames in the two sub-video frame sequences, a new video frame sequence may be regenerated based on the two sub-video frame sequences, so as to avoid repeatedly predicting the target mask features of the target objects included in the other sub-video frame sequences.
That is, if a first video frame sequence and a second video frame sequence composed of consecutive video frames including a target object are obtained, for example, the target sub-video frame sequence or a new video frame sequence generated based on the target sub-video frame sequence is regarded as the first video frame sequence, and any other sub-video frame sequence corresponding to the target object is regarded as the second video frame sequence. If there are a preset number of video frames overlapping with each other in the first video frame sequence and the second video frame sequence, it may be determined that the first video frame sequence and the second video frame sequence include the same target object, and there are video frames having the same partial frame numbers, and at this time, a third video frame sequence may be generated based on the first video frame sequence and the second video frame sequence.
The overlapping preset number of video frames can be regarded as video frames that have the same frame numbers in the initial video frame sequence and include the same target object; the preset number may be determined based on the actual application scene requirements, which is not limited herein.
Further, if the frame number of the third video frame sequence is smaller than that of the initial video frame sequence, this indicates that the target mask features corresponding to the other video frames in the initial video frame sequence outside the third video frame sequence have not yet been determined. In that case, based on the target mask feature of the target object included in the first video frame of the third video frame sequence, the target mask features of the target object for the other video frames can be predicted forward, and based on the target mask feature of the target object included in the last video frame of the third video frame sequence, the target mask features for the other video frames can be predicted backward. This process is repeated until no two video frame sequences with the preset number of overlapping video frames exist; the target mask features corresponding to the remaining individual video frames can then be determined based on the first and/or last video frame of the finally obtained video frame sequence, the final video frame sequence is obtained based on the final mask feature sequence, and this final video frame sequence is determined as the target object tracking video frame sequence corresponding to the initial video frame sequence.
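A sketch of the overlap check and merge on inclusive frame-index ranges; the preset number of overlapping frames is an assumed parameter:

```python
from typing import Optional, Tuple

def merge_if_overlapping(seq_a: Tuple[int, int], seq_b: Tuple[int, int],
                         min_overlap: int = 3) -> Optional[Tuple[int, int]]:
    """If two sub-sequences (inclusive frame ranges) containing the same target
    object share at least `min_overlap` frames, merge them into a third sequence
    covering both; otherwise return None."""
    overlap = min(seq_a[1], seq_b[1]) - max(seq_a[0], seq_b[0]) + 1
    if overlap >= min_overlap:
        return (min(seq_a[0], seq_b[0]), max(seq_a[1], seq_b[1]))
    return None

# Consistent with the example discussed below: the grown first sequence reaches
# frame 12, the second sequence spans frames 10-19, so frames 10-12 overlap.
print(merge_if_overlapping((1, 12), (10, 19)))   # (1, 19)
```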
For example, the initial video frame sequence includes 20 frames, and the sub-video frame sequences corresponding to the target object are a continuous video frame sequence including the target object composed of the 3rd to 8th video frames, and a continuous video frame sequence including the target object composed of the 10th to 19th video frames.
Assuming that the former sub-video frame sequence is determined as the target sub-video frame sequence, the target mask features corresponding to the 1st and 2nd video frames can be obtained by continuous prediction starting from the mask feature corresponding to the 3rd video frame of the target sub-video frame sequence, and the target mask features corresponding to the 9th and several subsequent video frames can be obtained by continuous prediction starting from the mask feature corresponding to the 8th video frame of the target sub-video frame sequence. As the prediction of target mask features proceeds, a target mask feature sequence can be obtained in real time and a target video frame sequence can be generated in real time.
When the first video frame sequence (the new video frame sequence grown from the target sub-video frame sequence) overlaps the sub-video frame sequence corresponding to the 10th to 19th video frames of the initial video frame sequence (the second video frame sequence) by the preset number of video frames, for example when the last video frame of the first video frame sequence is the 12th video frame of the initial video frame sequence so that the 10th to 12th video frames belong to both sequences and each include the target object, a third video frame sequence can be generated based on the first video frame sequence and the second video frame sequence. This third video frame sequence is composed of the 1st to 19th video frames of the initial video frame sequence, each of which includes the target object.
Further, based on the target mask feature of the last video frame in the third video frame sequence, the target mask feature corresponding to the 20th video frame can be determined, a final target video frame sequence is then obtained based on the newly obtained mask feature sequence, and this final video frame sequence is determined as the target object tracking video frame sequence corresponding to the initial video frame sequence, thereby achieving target tracking of the target object in the initial video frame sequence.
Based on the above implementations, the object tracking video frame sequence corresponding to each object in the initial video frame sequence can be obtained, so that target tracking of every object in the initial video frame sequence can be realized. Meanwhile, for each video frame in the initial video frame sequence (hereinafter referred to as a target video frame for convenience of description), the video frames corresponding to that target video frame in the object tracking video frame sequences of the individual objects can be fused to obtain a fused video frame corresponding to the target video frame.
The fused video frame includes all objects in the initial video frame sequence, so the content of each video frame in the initial video frame sequence can be modified and the clarity of the target object in each video frame can be improved. Arranging the fused video frames according to the frame numbers of the corresponding target video frames yields a fused video frame sequence, namely the optimized video frame sequence corresponding to the initial video frame sequence, so that every object in the initial video frame sequence can be target-tracked based on the optimized video frame sequence.
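As a final illustration of assembling the optimized video frame sequence, the sketch below overlays, for each frame index, the masks of all tracked objects onto the original frame; the particular overlay scheme is an assumption for illustration only.

```python
from typing import Dict, List
import numpy as np

def fuse_tracking_sequences(
    frames: List[np.ndarray],                        # initial video frame sequence, (H, W, 3) each
    object_masks: Dict[int, Dict[int, np.ndarray]],  # object id -> {frame index -> binary mask}
) -> List[np.ndarray]:
    """For every target video frame, fuse the corresponding frames of all object
    tracking sequences by marking each object's mask region, yielding the
    optimized (fused) video frame sequence."""
    fused = []
    for i, frame in enumerate(frames):
        out = frame.astype(np.float32).copy()
        for obj_id, masks in object_masks.items():
            if i in masks:
                region = masks[i].astype(bool)
                # simple illustrative overlay: brighten the object's region
                out[region] = 0.5 * out[region] + 127.5
        fused.append(out.astype(np.uint8))
    return fused
```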
In the embodiments of the present application, by determining the mask features of the target object for the video frames in the initial video frame sequence other than the target sub-video frame sequence, a target video frame sequence in which every frame includes the target object can be determined based on the mask feature sequence even when the image segmentation process fails to determine the target mask feature of the target object in some video frames, or when the target object in some video frames is missed during tracking based on a target tracking algorithm. Target tracking of the target object in the initial video frame sequence is thereby realized, the accuracy and continuity of the target tracking are improved, and the applicability is high.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing device provided by the embodiment of the application comprises:
an image processing module 51, configured to perform image segmentation processing on the initial video frame sequence, and determine an image segmentation result;
A sequence determining module 52, configured to determine a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result, where the target sub-video frame sequence is composed of continuous video frames including a target object, and a frame number of the target sub-video frame sequence is smaller than a frame number of the initial video frame sequence;
The sequence generating module 53 is configured to generate a target video frame sequence based on the target sub-video frame sequence, where each video frame in the target video frame sequence includes the target object, and a frame number of the target video frame sequence is greater than a frame number of the target sub-video frame sequence.
In some possible embodiments, the image segmentation result includes mask features of a plurality of objects included in each video frame in the initial video frame sequence;
the sequence determining module 52 is configured to:
Determining a target mask feature of a target object from mask features of a plurality of objects included in each video frame in the initial video frame sequence;
and determining a target sub-video frame sequence in the initial video frame sequence based on the target mask characteristics of the target object.
In some possible embodiments, the sequence generating module 53 is configured to:
Sequentially determining that other video frames in the initial video frame sequence except the target sub-video frame sequence correspond to the target mask features of the target object based on the target mask features of the target object included in the plurality of video frames in the target sub-video frame sequence;
Wherein the target mask feature corresponding to each video frame following the target sub-video frame sequence is determined based on the target mask feature corresponding to the previous video frame of the video frame, and the target mask feature corresponding to the first video frame is determined based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence, and the first video frame is the first video frame following the target sub-video frame sequence;
The target mask feature corresponding to each video frame before the target sub-video frame sequence is determined based on the target mask feature corresponding to the next video frame of the video frame, and the target mask feature corresponding to the second video frame is determined based on the target mask feature corresponding to the first video frame in the target sub-video frame sequence, and the second video frame is the last video frame before the target sub-video frame sequence;
Generating a mask feature sequence based on the target mask features corresponding to each video frame in the target sub-video frame sequence and the target mask features corresponding to other video frames in the initial video frame sequence except the target sub-video frame sequence;
And generating a target video frame sequence based on each video frame corresponding to the mask feature sequence.
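As a small illustration of the last two steps, the per-frame target mask features can be ordered into a mask feature sequence and then used to produce the target video frame sequence; the `overlay` rendering below is only one possible use of the mask sequence and is not prescribed by this application.

```python
import numpy as np

def assemble_target_sequence(initial_frames, masks_by_index):
    """masks_by_index: dict mapping frame index -> binary target mask (H x W).
    Returns the target video frame sequence with the target object highlighted."""
    # Order the per-frame target mask features into the mask feature sequence
    mask_sequence = [masks_by_index[i] for i in range(len(initial_frames))]

    def overlay(frame, mask, dim_factor=0.4):
        # Dim everything outside the target mask so the target object stands out
        out = frame.astype(np.float32)
        out[mask == 0] *= dim_factor
        return out.astype(frame.dtype)

    return [overlay(f, m) for f, m in zip(initial_frames, mask_sequence)]
```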
In some possible embodiments, the sequence generating module 53 is configured to:
determining at least one third video frame in the target sub-video frame sequence, wherein each third video frame is any video frame in the target sub-video frame sequence;
determining a predicted target mask feature corresponding to the first video frame based on the last video frame in the target sub-video frame sequence and its corresponding target mask feature, and on each third video frame and its corresponding target mask feature;
and determining the target mask feature corresponding to the first video frame based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature.
In some possible embodiments, the sequence generating module 53 is configured to:
for each third video frame, determining a first image feature and a first context feature corresponding to the third video frame based on the third video frame and the corresponding target mask feature;
fusing the first image features to obtain first fused image features, and fusing the first context features to obtain fused context features;
performing Gaussian blur processing on the target mask feature corresponding to the last video frame in the target sub-video frame sequence to obtain a blurred mask feature;
determining a second image feature and a second context feature corresponding to the last video frame in the target sub-video frame sequence;
determining an attention feature based on the first fused image feature, the fused context feature, the blurred mask feature, the second context feature, and the second image feature;
and determining the predicted target mask feature corresponding to the first video frame based on the attention feature and the second image feature.
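One way to realize this attention step, loosely following memory-read attention as used in video object segmentation, is sketched below; the exact feature extractors, the `decoder`, and the way the blurred mask enters the computation are assumptions for the example, not the disclosed network. The blurred mask itself can be produced with any standard Gaussian filter.

```python
import torch

def predict_mask(mem_feat, mem_ctx, query_feat, query_ctx, blurred_mask, decoder):
    """Sketch of the attention-based prediction, assuming:
      mem_feat:     first fused image feature of the third video frames, (C, H, W)
      mem_ctx:      fused first context feature (the values), (Cv, H, W)
      query_feat:   second image feature of the last frame in the sub-sequence, (C, H, W)
      query_ctx:    second context feature of that frame, (Cv, H, W)
      blurred_mask: Gaussian-blurred target mask of that frame, (1, H, W)
      decoder:      any module mapping the concatenated features to a mask logit map."""
    C, H, W = query_feat.shape
    # Let the blurred previous mask bias the query towards the expected target region
    query = query_feat * (1.0 + blurred_mask)

    # Dot-product attention between query positions and memory positions
    att = torch.softmax(query.reshape(C, -1).T @ mem_feat.reshape(C, -1) / C ** 0.5, dim=-1)
    read = (att @ mem_ctx.reshape(mem_ctx.shape[0], -1).T).T.reshape(-1, H, W)

    # The attention feature plus the query-side features are decoded into the
    # predicted target mask for the first video frame after the sub-sequence
    attention_feature = torch.cat([read, query_ctx], dim=0)
    logits = decoder(torch.cat([attention_feature, query_feat], dim=0).unsqueeze(0))
    return torch.sigmoid(logits)[0, 0]
```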
In some possible embodiments, the sequence generating module 53 is configured to:
processing the blurred mask feature based on the second image feature to obtain a processed blurred mask feature;
determining a second fused image feature based on the second image feature and the first fused image feature;
and determining an attention feature based on the second fused image feature, the second context feature, the processed blurred mask feature, and the fused context feature.
In some possible embodiments, the sequence generating module 53 is configured to:
fusing the second image feature and the blurred mask feature to obtain a fused target feature;
obtaining a bias parameter and a weight parameter corresponding to the blurred mask feature based on the fused target feature;
and obtaining the processed blurred mask feature based on the bias parameter, the weight parameter, and the blurred mask feature.
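The bias-and-weight treatment of the blurred mask can be read as a feature modulation; a minimal sketch under that reading is given below, with the small convolutional heads being assumptions of the example rather than the disclosed structure.

```python
import torch
import torch.nn as nn

class MaskModulation(nn.Module):
    """Illustrative modulation of the blurred mask feature by the second image feature:
    the two are fused, the fused feature yields a per-position weight and bias, and the
    blurred mask feature is rescaled and shifted accordingly."""
    def __init__(self, feat_channels, mask_channels=1, hidden=64):
        super().__init__()
        self.fuse = nn.Conv2d(feat_channels + mask_channels, hidden, 3, padding=1)
        self.to_weight = nn.Conv2d(hidden, mask_channels, 1)
        self.to_bias = nn.Conv2d(hidden, mask_channels, 1)

    def forward(self, image_feat, blurred_mask):
        fused = torch.relu(self.fuse(torch.cat([image_feat, blurred_mask], dim=1)))
        weight = torch.sigmoid(self.to_weight(fused))   # weight parameter
        bias = self.to_bias(fused)                      # bias parameter
        return weight * blurred_mask + bias             # processed blurred mask feature
```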
In some possible embodiments, the sequence generating module 53 is configured to:
determining an intersection-over-union ratio of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature;
if the intersection-over-union ratio is smaller than a preset threshold, determining the predicted target mask feature as the target mask feature corresponding to the first video frame;
and if the intersection-over-union ratio is greater than or equal to the preset threshold, determining the target mask feature corresponding to the last video frame in the target sub-video frame sequence as the target mask feature corresponding to the first video frame.
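This decision rule can be written directly; the threshold value in the sketch below is illustrative, since the text only requires a preset threshold.

```python
import numpy as np

def choose_mask(last_mask, predicted_mask, iou_threshold=0.5):
    """Select the target mask for the first video frame after the sub-sequence:
    if the predicted mask and the last known mask overlap little (IoU below the
    threshold), the prediction is adopted; otherwise the last known mask is reused."""
    intersection = np.logical_and(last_mask, predicted_mask).sum()
    union = np.logical_or(last_mask, predicted_mask).sum()
    iou = intersection / union if union > 0 else 0.0
    return predicted_mask if iou < iou_threshold else last_mask
```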
In some possible embodiments, the sequence determining module 52 is configured to:
For each target mask feature, determining a third image feature of a target object included in the video frame corresponding to the target mask feature, and determining an optimized mask feature corresponding to the target mask feature based on the third image feature and the target mask feature;
and determining the target sub-video frame sequence in the initial video frame sequence based on each of the optimized mask features.
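A rough sketch of such a mask optimization step is given below; the backbone and refinement head are placeholders chosen for the example, and only illustrate the idea of combining a region-specific image feature (the third image feature) with the raw target mask feature.

```python
import torch
import torch.nn as nn

class MaskRefiner(nn.Module):
    """Illustrative refinement: compute an image feature restricted to the target
    region and combine it with the coarse target mask to produce an optimized mask."""
    def __init__(self, in_channels=3, feat_channels=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(feat_channels + 1, 1, 3, padding=1)

    def forward(self, frame, coarse_mask):
        # Third image feature: image features gated by the coarse target mask
        region_feat = self.backbone(frame) * coarse_mask
        refined = torch.sigmoid(self.head(torch.cat([region_feat, coarse_mask], dim=1)))
        return refined  # optimized mask feature
```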
In some possible embodiments, the sequence determining module 52 is further configured to:
if a first video frame sequence and a second video frame sequence each composed of continuous video frames including the target object are obtained, and the first video frame sequence and the second video frame sequence overlap by a preset number of video frames, a third target sub-video frame sequence is generated based on the first video frame sequence and the second video frame sequence, the third target sub-video frame sequence being composed of continuous video frames including the target object.
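For the overlap handling just described, a minimal sketch is shown below; representing each sequence by its (start, end) frame indices and the overlap count of 3 are assumptions made for the example.

```python
def merge_overlapping_sequences(first_seq, second_seq, preset_overlap=3):
    """first_seq / second_seq: (start_index, end_index) of two runs of consecutive
    frames that contain the target object. If they share at least `preset_overlap`
    frames, a single merged third target sub-video frame sequence is returned."""
    s1, e1 = first_seq
    s2, e2 = second_seq
    overlap = min(e1, e2) - max(s1, s2) + 1
    if overlap >= preset_overlap:
        return (min(s1, s2), max(e1, e2))  # third target sub-video frame sequence
    return None  # not enough overlap; keep the two sequences separate
```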
In a specific implementation, the video processing apparatus may perform, through its built-in functional modules, the implementations provided by the steps in fig. 1; for details, reference may be made to the implementations provided by those steps, which are not repeated here.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 600 in this embodiment may include a processor 601, a network interface 604, and a memory 605, and may further include a user interface 603 and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between these components. The user interface 603 may include a display (Display) and a keyboard (Keyboard), and may optionally further include a standard wired interface and a wireless interface. The network interface 604 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 605 may be a high-speed RAM memory or a non-volatile memory (NVM), for example at least one disk memory. The memory 605 may optionally also be at least one storage device located remotely from the processor 601. As shown in fig. 6, the memory 605, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the electronic device 600 shown in fig. 6, the network interface 604 may provide network communication functions, the user interface 603 mainly serves as an interface for user input, and the processor 601 may be used to invoke the device control application stored in the memory 605 to implement:
performing image segmentation processing on the initial video frame sequence, and determining an image segmentation result;
Determining a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result, wherein the target sub-video frame sequence consists of continuous video frames comprising a target object, and the frame number of the target sub-video frame sequence is smaller than that of the initial video frame sequence;
generating a target video frame sequence based on the target sub-video frame sequence, wherein each video frame in the target video frame sequence comprises the target object, and the frame number of the target video frame sequence is larger than the frame number of the target sub-video frame sequence.
In some possible embodiments, the image segmentation result includes mask features of a plurality of objects included in each video frame in the initial video frame sequence;
the processor 601 is configured to:
Determining a target mask feature of a target object from mask features of a plurality of objects included in each video frame in the initial video frame sequence;
and determining a target sub-video frame sequence in the initial video frame sequence based on the target mask feature of the target object.
In some possible embodiments, the processor 601 is configured to:
sequentially determining, based on the target mask features of the target object included in the plurality of video frames in the target sub-video frame sequence, the target mask features of the target object corresponding to the other video frames in the initial video frame sequence besides the target sub-video frame sequence;
wherein the target mask feature corresponding to each video frame following the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame preceding it; the target mask feature corresponding to a first video frame is determined based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence, and the first video frame is the first video frame following the target sub-video frame sequence;
the target mask feature corresponding to each video frame preceding the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame following it; the target mask feature corresponding to a second video frame is determined based on the target mask feature corresponding to the first video frame in the target sub-video frame sequence, and the second video frame is the last video frame preceding the target sub-video frame sequence;
Generating a mask feature sequence based on the target mask features corresponding to each video frame in the target sub-video frame sequence and the target mask features corresponding to other video frames in the initial video frame sequence except the target sub-video frame sequence;
And generating a target video frame sequence based on each video frame corresponding to the mask feature sequence.
In some possible embodiments, the processor 601 is configured to:
determining at least one third video frame in the target sub-video frame sequence, wherein each third video frame is any video frame in the target sub-video frame sequence;
determining a predicted target mask feature corresponding to the first video frame based on the last video frame in the target sub-video frame sequence and its corresponding target mask feature, and on each third video frame and its corresponding target mask feature;
and determining the target mask feature corresponding to the first video frame based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature.
In some possible embodiments, the processor 601 is configured to:
for each third video frame, determining a first image feature and a first context feature corresponding to the third video frame based on the third video frame and the corresponding target mask feature;
fusing the first image features to obtain first fused image features, and fusing the first context features to obtain fused context features;
performing Gaussian blur processing on the target mask feature corresponding to the last video frame in the target sub-video frame sequence to obtain a blurred mask feature;
determining a second image feature and a second context feature corresponding to the last video frame in the target sub-video frame sequence;
determining an attention feature based on the first fused image feature, the fused context feature, the blurred mask feature, the second context feature, and the second image feature;
and determining the predicted target mask feature corresponding to the first video frame based on the attention feature and the second image feature.
In some possible embodiments, the processor 601 is configured to:
processing the blurred mask feature based on the second image feature to obtain a processed blurred mask feature;
determining a second fused image feature based on the second image feature and the first fused image feature;
and determining an attention feature based on the second fused image feature, the second context feature, the processed blurred mask feature, and the fused context feature.
In some possible embodiments, the processor 601 is configured to:
fusing the second image feature and the blurred mask feature to obtain a fused target feature;
obtaining a bias parameter and a weight parameter corresponding to the blurred mask feature based on the fused target feature;
and obtaining the processed blurred mask feature based on the bias parameter, the weight parameter, and the blurred mask feature.
In some possible embodiments, the processor 601 is configured to:
determining an intersection-over-union ratio of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature;
if the intersection-over-union ratio is smaller than a preset threshold, determining the predicted target mask feature as the target mask feature corresponding to the first video frame;
and if the intersection-over-union ratio is greater than or equal to the preset threshold, determining the target mask feature corresponding to the last video frame in the target sub-video frame sequence as the target mask feature corresponding to the first video frame.
In some possible embodiments, the processor 601 is configured to:
For each target mask feature, determining a third image feature of a target object included in the video frame corresponding to the target mask feature, and determining an optimized mask feature corresponding to the target mask feature based on the third image feature and the target mask feature;
and determining the target sub-video frame sequence in the initial video frame sequence based on each of the optimized mask features.
In some possible embodiments, the above processor 601 is further configured to:
if a first video frame sequence and a second video frame sequence each composed of continuous video frames including the target object are obtained, and the first video frame sequence and the second video frame sequence overlap by a preset number of video frames, a third target sub-video frame sequence is generated based on the first video frame sequence and the second video frame sequence, the third target sub-video frame sequence being composed of continuous video frames including the target object.
It should be appreciated that in some possible embodiments, the processor 601 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include a read-only memory and a random access memory, and provides instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory. For example, the memory may also store information about the device type.
In a specific implementation, the electronic device 600 may perform, through its built-in functional modules, the implementations provided by the steps in fig. 1; for details, reference may be made to the implementations provided by those steps, which are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program that is executed by a processor to implement the method provided by the steps in fig. 1; for details, reference may be made to the implementations provided by those steps, which are not repeated here.
The computer-readable storage medium may be an internal storage unit of the video processing apparatus or the electronic device provided in any of the foregoing embodiments, for example a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may also include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application provides a computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the method provided by the steps of the embodiment shown in fig. 1.
The terms first, second and the like in the claims and in the description and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to the list of steps or elements but may, alternatively, include other steps or elements not listed or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two; the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (12)

1. A video processing method, characterized in that the method comprises: performing image segmentation processing on an initial video frame sequence to determine an image segmentation result; based on the image segmentation result, determining a target sub-video frame sequence in the initial video frame sequence, wherein the target sub-video frame sequence is composed of continuous video frames including a target object, and the number of frames of the target sub-video frame sequence is less than the number of frames of the initial video frame sequence; based on target mask features of the target object included in multiple video frames in the target sub-video frame sequence, sequentially determining target mask features of the target object corresponding to the other video frames in the initial video frame sequence besides the target sub-video frame sequence; wherein the target mask feature corresponding to each video frame after the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame preceding it, the target mask feature corresponding to a first video frame is determined based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence, and the first video frame is the first video frame after the target sub-video frame sequence; the target mask feature corresponding to each video frame before the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame following it, the target mask feature corresponding to a second video frame is determined based on the target mask feature corresponding to the first video frame in the target sub-video frame sequence, and the second video frame is the last video frame before the target sub-video frame sequence; generating a mask feature sequence based on the target mask features corresponding to the video frames in the target sub-video frame sequence and the target mask features corresponding to the other video frames in the initial video frame sequence besides the target sub-video frame sequence; and generating a target video frame sequence based on the video frames corresponding to the mask feature sequence, wherein each video frame in the target video frame sequence includes the target object, and the number of frames of the target video frame sequence is greater than the number of frames of the target sub-video frame sequence.

2. The method according to claim 1, characterized in that the image segmentation result comprises mask features of multiple objects included in each video frame in the initial video frame sequence; and determining the target sub-video frame sequence in the initial video frame sequence based on the image segmentation result comprises: determining a target mask feature of the target object from the mask features of the multiple objects included in each video frame in the initial video frame sequence; and determining the target sub-video frame sequence in the initial video frame sequence based on the target mask feature of the target object.

3. The method according to claim 1, characterized in that determining the target mask feature corresponding to the first video frame based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence comprises: determining at least one third video frame in the target sub-video frame sequence, each third video frame being any video frame in the target sub-video frame sequence; determining a predicted target mask feature corresponding to the first video frame based on the last video frame in the target sub-video frame sequence and its corresponding target mask feature, and on each third video frame and its corresponding target mask feature; and determining the target mask feature corresponding to the first video frame based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature.

4. The method according to claim 3, characterized in that determining the predicted target mask feature corresponding to the first video frame based on the last video frame in the target sub-video frame sequence and its corresponding target mask feature, and on each third video frame and its corresponding target mask feature, comprises: for each third video frame, determining a first image feature and a first context feature corresponding to the third video frame based on the third video frame and its corresponding target mask feature; fusing the first image features to obtain a first fused image feature, and fusing the first context features to obtain a fused context feature; performing Gaussian blur processing on the target mask feature corresponding to the last video frame in the target sub-video frame sequence to obtain a blurred mask feature; determining a second image feature and a second context feature corresponding to the last video frame in the target sub-video frame sequence; determining an attention feature based on the first fused image feature, the fused context feature, the blurred mask feature, the second context feature, and the second image feature; and determining the predicted target mask feature corresponding to the first video frame based on the attention feature and the second image feature.

5. The method according to claim 4, characterized in that determining the attention feature based on the first fused image feature, the fused context feature, the blurred mask feature, the second context feature, and the second image feature comprises: processing the blurred mask feature based on the second image feature to obtain a processed blurred mask feature; determining a second fused image feature based on the second image feature and the first fused image feature; and determining the attention feature based on the second fused image feature, the second context feature, the processed blurred mask feature, and the fused context feature.

6. The method according to claim 5, characterized in that processing the blurred mask feature based on the second image feature to obtain the processed blurred mask feature comprises: fusing the second image feature and the blurred mask feature to obtain a fused target feature; obtaining a bias parameter and a weight parameter corresponding to the blurred mask feature based on the fused target feature; and obtaining the processed blurred mask feature based on the bias parameter, the weight parameter, and the blurred mask feature.

7. The method according to claim 3, characterized in that determining the target mask feature corresponding to the first video frame based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature comprises: determining an intersection-over-union ratio of the target mask feature corresponding to the last video frame in the target sub-video frame sequence and the predicted target mask feature; if the intersection-over-union ratio is less than a preset threshold, determining the predicted target mask feature as the target mask feature corresponding to the first video frame; and if the intersection-over-union ratio is greater than or equal to the preset threshold, determining the target mask feature corresponding to the last video frame in the target sub-video frame sequence as the target mask feature corresponding to the first video frame.

8. The method according to claim 2, characterized in that determining the target sub-video frame sequence in the initial video frame sequence based on the target mask feature of the target object comprises: for each target mask feature, determining a third image feature of the target object included in the video frame corresponding to the target mask feature, and determining an optimized mask feature corresponding to the target mask feature based on the third image feature and the target mask feature; and determining the target sub-video frame sequence in the initial video frame sequence based on each optimized mask feature.

9. The method according to claim 1, characterized in that the method further comprises: if a first video frame sequence and a second video frame sequence each composed of continuous video frames including the target object are obtained, and the first video frame sequence and the second video frame sequence overlap by a preset number of video frames, generating a third target sub-video frame sequence based on the first video frame sequence and the second video frame sequence, the third target sub-video frame sequence being composed of continuous video frames including the target object.

10. A video processing apparatus, characterized in that the apparatus comprises: an image processing module, configured to perform image segmentation processing on an initial video frame sequence and determine an image segmentation result; a sequence determining module, configured to determine a target sub-video frame sequence in the initial video frame sequence based on the image segmentation result, wherein the target sub-video frame sequence is composed of continuous video frames including a target object, and the number of frames of the target sub-video frame sequence is less than the number of frames of the initial video frame sequence; a sequence generating module, configured to sequentially determine, based on target mask features of the target object included in multiple video frames in the target sub-video frame sequence, target mask features of the target object corresponding to the other video frames in the initial video frame sequence besides the target sub-video frame sequence; wherein the target mask feature corresponding to each video frame after the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame preceding it, the target mask feature corresponding to a first video frame is determined based on the target mask feature corresponding to the last video frame in the target sub-video frame sequence, and the first video frame is the first video frame after the target sub-video frame sequence; the target mask feature corresponding to each video frame before the target sub-video frame sequence is determined based on the target mask feature corresponding to the video frame following it, the target mask feature corresponding to a second video frame is determined based on the target mask feature corresponding to the first video frame in the target sub-video frame sequence, and the second video frame is the last video frame before the target sub-video frame sequence; the sequence generating module is configured to generate a mask feature sequence based on the target mask features corresponding to the video frames in the target sub-video frame sequence and the target mask features corresponding to the other video frames in the initial video frame sequence besides the target sub-video frame sequence; and the sequence generating module is configured to generate a target video frame sequence based on the video frames corresponding to the mask feature sequence, wherein each video frame in the target video frame sequence includes the target object, and the number of frames of the target video frame sequence is greater than the number of frames of the target sub-video frame sequence.

11. An electronic device, characterized by comprising a processor and a memory, the processor and the memory being connected to each other, wherein the memory is configured to store a computer program, and the processor is configured to execute the method according to any one of claims 1 to 9 when the computer program is invoked.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 9.
GR01 Patent grant