
CN114445750A - Video object segmentation method, device, storage medium and program product - Google Patents

Video object segmentation method, device, storage medium and program product

Info

Publication number
CN114445750A
CN114445750A (application CN202210108563.7A)
Authority
CN
China
Prior art keywords
image
video
target
feature
features
Prior art date
Legal status
Pending
Application number
CN202210108563.7A
Other languages
Chinese (zh)
Inventor
胡立
张邦
潘攀
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202210108563.7A
Publication of CN114445750A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a video object segmentation method, device, storage medium and program product. The method includes: determining a video to be processed and a target object to be segmented in the video; and processing, through a pre-built object segmentation model and based on a recurrent feature, each frame of the video to be processed in sequence, segmenting the target object from each frame. The object segmentation model includes a recurrent feature update model configured to update the recurrent feature according to the currently processed image during video processing. The recurrent feature is thus dynamically updated by the model: it fuses information from the frames already processed while avoiding feature stacking, and its size does not grow as the video progresses. This reduces memory consumption, prevents long videos from degrading segmentation quality due to insufficient memory, and lowers the cost of the video object segmentation algorithm.

Description

Video object segmentation method, device, storage medium and program product

Technical Field

The present application relates to the technical field of video processing, and in particular to a video object segmentation method, device, storage medium and program product.

Background

Video object segmentation determines the region occupied by a target object in each frame of a video, and plays an important role in scenarios such as object tracking and video matting.

During video object segmentation, information from already segmented frames can be stored to guide the segmentation of subsequent frames. The drawback of this scheme is that, as video processing proceeds, the stored information keeps stacking up in memory and occupies a large amount of space; moreover, for long videos, limited memory may be unable to support ever-growing feature storage, which degrades the segmentation quality.

Summary of the Invention

The main purpose of the embodiments of the present application is to provide a video object segmentation method, device, storage medium and program product, so as to reduce the cost of video object segmentation and improve the segmentation quality.

In a first aspect, an embodiment of the present application provides a video object segmentation method, including:

determining a video to be processed and a target object to be segmented in the video;

processing, through a pre-built object segmentation model and based on a recurrent feature, each frame of the video to be processed in sequence, and segmenting the target object from each frame; wherein the object segmentation model includes a recurrent feature update model configured to update the recurrent feature according to the currently processed image during video processing.

Optionally, the object segmentation model further includes an encoder, a feature reading module and a decoder;

wherein, when any frame of the video is processed, the encoder is configured to extract the image feature corresponding to the frame;

the feature reading module is configured to compute association information between the image feature and the current recurrent feature;

the decoder is configured to obtain the object segmentation result of the frame according to the association information.

Optionally, there is at least one target object, and the method further includes:

obtaining, according to the segmentation result of the video, a target video of each target object; wherein the target video of a target object is either a video obtained by removing the background around the target object from the video to be processed, or a video composed of the frames of the video to be processed in which the target object appears;

outputting the target video of each target object and/or at least some of the images in the target video.

Optionally, outputting the target video of each target object and/or at least some of the images in the target video includes:

displaying the target video and/or at least some of the images in the target video;

generating a promotion video and/or promotion image of the target object according to the user's editing operations on the target video and/or at least some of the images in the target video.

Optionally, the processing of an image in the video includes:

obtaining an image to be processed in the video and the current recurrent feature, and extracting the image feature corresponding to the image; wherein the recurrent feature is a feature determined from the images of the video that have already been processed;

determining the object segmentation result of the image to be processed according to the image feature and the recurrent feature, and inputting the image feature and the recurrent feature into the recurrent feature update model to obtain an updated recurrent feature, so that the next image to be processed in the video is processed according to the updated recurrent feature.

Optionally, determining the video to be processed and the target object to be segmented in the video includes:

obtaining the video to be processed, and determining at least one reference image in the video and the object segmentation result corresponding to each reference image;

the method further includes: determining an initial recurrent feature according to the at least one reference image and the corresponding object segmentation result;

wherein the object segmentation result of a reference image is determined either by the user annotating the target object in the reference image, or by automatically annotating the reference image according to a preset image of the target object;

the images to be processed in the video are the images other than the reference image; the initial recurrent feature is used to segment the first image to be processed.

Optionally, the method further includes at least one of the following:

displaying the reference image of the video so that the user can mark the location of the target object on the reference image, and determining the object segmentation result of the reference image according to the user's annotation;

performing semantic segmentation on the reference image of the video to determine at least one object in the reference image, and comparing the image region corresponding to each of the at least one object with a preset image of the target object to obtain the object segmentation result of the reference image;

performing semantic segmentation on the reference image of the video to determine at least one object in the reference image, displaying the image region corresponding to each of the at least one object, obtaining the target object selected by the user from the at least one object, and determining the object segmentation result according to the image region corresponding to the target object.

Optionally, inputting the image feature and the recurrent feature into the recurrent feature update model to obtain the updated recurrent feature includes:

inputting the image feature and the recurrent feature into a refinement module to obtain a refinement result; wherein the refinement module is configured to fuse the image feature and the recurrent feature;

inputting the refinement result into an enhancement module to obtain an enhancement result; wherein the enhancement module is configured to perform a pooling operation on the refinement result;

adding the refinement result and the enhancement result, and inputting the sum into a compression module to obtain an updated recurrent feature of a preset length.

Optionally, the refinement module includes a downsampling layer, a convolutional layer, a matrix computation unit and a normalization layer; inputting the image feature and the recurrent feature into the refinement module to obtain the refinement result includes:

downsampling the image feature and the recurrent feature through the downsampling layer, and inputting the downsampled image feature and recurrent feature into the convolutional layer respectively, to obtain a first feature matrix corresponding to the image feature and a second feature matrix corresponding to the recurrent feature;

performing a matrix correlation operation on the first feature matrix and the second feature matrix through the matrix computation unit, and inputting the result of the operation into the normalization layer to obtain the refinement result.
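
The clauses above only name the building blocks. As a minimal PyTorch-style sketch of such a refine, enhance and compress update, where the concrete choices (1x1 convolutions, softmax as the normalization layer, average pooling as the enhancement, a 3x3 convolution as the compression module, and the channel count) are illustrative assumptions rather than details given by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentFeatureUpdate(nn.Module):
    """Refine, enhance and compress update of a fixed-size recurrent feature."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.down = nn.AvgPool2d(kernel_size=2)            # downsampling layer
        self.proj_img = nn.Conv2d(channels, channels, 1)   # conv for the image feature
        self.proj_mem = nn.Conv2d(channels, channels, 1)   # conv for the recurrent feature
        self.compress = nn.Conv2d(channels, channels, 3, padding=1)  # compression module

    def forward(self, image_feat: torch.Tensor, recurrent_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = image_feat.shape                             # both inputs: (B, C, h, w)
        q = self.proj_img(self.down(image_feat)).flatten(2)      # first feature matrix
        k = self.proj_mem(self.down(recurrent_feat)).flatten(2)  # second feature matrix
        # Matrix correlation between the two matrices, normalized by softmax.
        affinity = torch.softmax(q.transpose(1, 2) @ k, dim=-1)
        refined = (k @ affinity.transpose(1, 2)).view(b, c, h // 2, w // 2)
        refined = F.interpolate(refined, size=(h, w), mode="bilinear", align_corners=False)
        # Enhancement: a pooling operation over the refinement result.
        enhanced = F.adaptive_avg_pool2d(refined, 1).expand_as(refined)
        # Compressing the sum keeps the recurrent feature at a preset, fixed size.
        return self.compress(refined + enhanced)
```

Because the output has the same shape as the input recurrent feature, the stored state does not grow no matter how many frames have been processed.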

In a second aspect, an embodiment of the present application further provides a video object segmentation method, including:

determining a video to be processed and a target commodity to be segmented in the video;

processing, through a pre-built object segmentation model and based on a recurrent feature, each frame of the video to be processed in sequence, and segmenting the target commodity from each frame; wherein the object segmentation model includes a recurrent feature update model configured to update the recurrent feature according to the currently processed image during video processing;

generating a target video and/or a target image of the target commodity according to the segmentation result of the video.

Optionally, the target image is a frame of the video to be processed in which the target object appears, or an image obtained by removing the background from such a frame; the method further includes:

obtaining filter conditions input by the user, and filtering the target images according to the filter conditions;

updating the filtered images into a material library, where the material library is used to produce promotion videos and/or promotion images of the target commodity.

In a third aspect, an embodiment of the present application provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the electronic device to perform the method of any one of the above aspects.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of any one of the above aspects.

In a fifth aspect, an embodiment of the present application provides a computer program product, including a computer program which, when executed by a processor, implements the method of any one of the above aspects.

The video object segmentation method, device, storage medium and program product provided by the present application determine a video to be processed and a target object to be segmented in the video, and process each frame of the video in sequence through a pre-built object segmentation model based on a recurrent feature, segmenting the target object from each frame. The object segmentation model includes a recurrent feature update model that updates the recurrent feature according to the currently processed image during video processing. The recurrent feature is thus dynamically updated by the model: it fuses information from the frames already processed while avoiding feature stacking, and its size does not grow as the video progresses. This reduces memory consumption, prevents long videos from degrading segmentation quality due to insufficient memory, and lowers the cost of the video object segmentation algorithm.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.

FIG. 1 is a schematic diagram of an application of object segmentation provided by an embodiment of the present application;

FIG. 2 is a diagram of an application scenario provided by the present application;

FIG. 3 is a schematic flowchart of a video object segmentation method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the principle of determining an object segmentation result provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of updating the recurrent feature according to a step size provided by an embodiment of the present application;

FIG. 6 is an application flowchart of an object segmentation result provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the principle of updating the recurrent feature provided by an embodiment of the present application;

FIG. 8 is a schematic flowchart of another video object segmentation method provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Specific embodiments of the present application have been shown by the above drawings and will be described in more detail hereinafter. These drawings and written descriptions are not intended to limit the scope of the concepts of the present application in any way, but rather to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description of the Embodiments

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application.

First, the terms involved in the present application are explained:

Semi-supervised Video Object Segmentation (VOS): for a video, an initial annotation of one or more objects to be segmented is provided in the first frame, and in subsequent frames the algorithm segments the objects provided in the first frame. Semi-supervised video object segmentation can complete the segmentation of the entire video with only a small amount of interaction, reducing manual effort and improving efficiency.

Neural network model: a complex network system formed by a large number of interconnected neurons, widely used in image processing and other fields.

The embodiments of the present application can be applied to any scenario that requires object segmentation of a video. For example, in video editing, film and television post-production, and video conferencing, precise, pixel-level segmentation of objects in a video is often needed. The user can refine the object segmentation result of the first frame through interactive clicks, and the algorithm can then segment that target object in all subsequent frames according to the video and the segmentation result of the target object in the first frame.

Optionally, in the commodity field, object segmentation can be performed on a video to segment out the target commodity, which, after background removal, can be used for material production and the like.

FIG. 1 is a schematic diagram of an application of object segmentation provided by an embodiment of the present application. As shown in FIG. 1, the object segmentation method can segment the target commodity, for example a cup, from each frame of a video. After the segmentation result is obtained, a background can be added to the target commodity and the video can be edited into a processed video, for example an advertising video of the target commodity or another type of commodity video, enabling the rapid production of various commodity videos.

Optionally, the embodiments of the present application are particularly applicable to object segmentation of live-streaming video. For example, in a virtual-host live-streaming scenario, the virtual host can introduce a target commodity, forming a video stream showing the target commodity that is transmitted to the user terminal for viewing. Object segmentation can be performed on the real-time video stream during the live broadcast, or on the video after the live broadcast ends. According to the segmentation result, the frames in which the target commodity appears and its position in each frame can be determined, so that video and image material of the target commodity can be obtained from the live video for later use.

In the field of video conferencing, a camera can capture the participants to obtain a video stream, and the object segmentation method can segment the participants out of the video stream and blur the background or add a virtual background, forming a processed video stream that is displayed to the other participants. This effectively protects user privacy and meets users' personalized needs.

In the field of short videos, the object segmentation method can segment the target object in a video as material for video production. For example, after a user records a dance video indoors, the target person can be segmented out of the video and a scene such as a pastoral scene or a party scene can be added, so that the user can choose a suitable scene according to the dance style and movements and improve the production quality of the short video.

In the process of performing object segmentation on a video, in order to achieve a better segmentation result, each frame can be segmented using the frame itself together with historical information, where the historical information is information corresponding to frames of the video that have already been segmented. As the video progresses, however, the historical information keeps growing, which affects the segmentation of the video.

In view of this, the present application provides a video object segmentation method in which a recurrent feature is maintained to assist object segmentation. As video processing progresses, the recurrent feature is used to process the current image and obtain its object segmentation result, and a model fuses the recurrent feature with the current image feature to obtain an updated recurrent feature for use in subsequent image processing.

FIG. 2 is a diagram of an application scenario provided by the present application. As shown in FIG. 2, the user terminal can upload a captured video containing the target object to a server, and the server can perform object segmentation on the video based on an object segmentation model, which may include a recurrent feature update model and a segmentation module. When each frame is processed, the current recurrent feature is fetched from a storage unit; the image feature and the current recurrent feature are input into the segmentation module to obtain the segmentation result of the frame, and the image feature and the recurrent feature are also input into the recurrent feature update model to obtain an updated recurrent feature, which is written back to the storage unit and used when the next frame is processed. After all frames of the video have been processed, the server can feed the segmentation result of the video back to the user terminal.

By continuously fusing the currently processed image feature with the recurrent feature through the recurrent feature update model, an updated recurrent feature is obtained and the recurrent feature changes dynamically over time. The recurrent feature thus contains the information of the frames already processed while feature stacking is effectively avoided: its size depends on the output dimension of the model and does not grow as the video progresses. This reduces memory consumption, prevents long videos from degrading segmentation quality due to insufficient memory, lowers the cost of the video object segmentation algorithm, and improves the segmentation result.

Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and the features in the embodiments may be combined with each other as long as they do not conflict. In addition, the order of steps in the following method embodiments is only an example and is not strictly limited.

FIG. 3 is a schematic flowchart of a video object segmentation method provided by an embodiment of the present application. The method in this embodiment can be executed by any device with an image processing function, such as a server or a user terminal. As shown in FIG. 3, the method may include:

Step 301: Determine the video to be processed and the target object to be segmented in the video.

The video may be of any type, such as a long video or a short video. The video may contain at least one object, and the embodiments of the present application can be used to segment the target object from the video.

Optionally, the target object may be set according to actual needs; for example, it may be a target commodity or a person. There may be one or more target objects. The target object may be determined by the user or automatically by the device.

Step 302: Through a pre-built object segmentation model and based on a recurrent feature, process each frame of the video to be processed in sequence and segment the target object from each frame; the object segmentation model includes a recurrent feature update model configured to update the recurrent feature according to the currently processed image during video processing.

The video may include multiple frames; when the video is segmented, each frame can be processed in turn according to the recurrent feature.

Optionally, the recurrent feature stores the features of the frames that have already been processed and serves as the historical feature when each frame is segmented, assisting the segmentation of the current frame; moreover, the recurrent feature is continuously and cyclically updated while the frames of the video are processed. The recurrent feature may be stored in memory or in another storage device.

Optionally, the recurrent feature may specifically be a Recurrent Dynamic Embedding (RDE), an embedded feature that is continuously updated and changes dynamically while the frames of the video are processed. The recurrent feature can be obtained through the recurrent feature update model, which may be a machine learning model, for example a neural network model.

When any frame is processed, the current recurrent feature can be obtained and used to process the frame to obtain the object segmentation result, which indicates the location of the target object so that the target object can be segmented from the frame.

In one example, the object segmentation result may include a label for each pixel of the image, indicating whether the pixel belongs to the target object. For example, the label may take a value between 0 and 1: a label of 0 means the pixel does not belong to the target object, while a label of 1 means it does. By labeling each pixel of the image with 0 or 1, the pixels contained in the target object can be determined, thereby achieving pixel-level segmentation of the target object.

In another example, the object segmentation result may identify multiple target objects in the image, with different numbers representing different target objects. For example, when the pixels are labeled, the numbers 1, 2 and 3 represent target object 1, target object 2 and target object 3 respectively, and the number 0 represents the background region of no interest.
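
As an illustration (the array values below are hypothetical, not taken from the patent), such a multi-object label map can be stored as a single integer mask and split into per-object binary masks:

```python
import numpy as np

# Hypothetical 4x4 label map: 0 = background, 1/2/3 = target objects 1 to 3.
label_map = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 0],
    [3, 0, 0, 0],
])

# One binary mask per target object, enabling pixel-level segmentation of each.
object_masks = {obj_id: (label_map == obj_id).astype(np.uint8)
                for obj_id in np.unique(label_map) if obj_id != 0}
print(object_masks[1])  # pixel-level mask of target object 1
```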

In this embodiment, after the recurrent feature is read from storage, updating the recurrent feature and determining the object segmentation result may be performed simultaneously or one after the other; this embodiment does not limit their order. Note that, regardless of the order, the object segmentation result can be determined using the recurrent feature as it was before the update.

In summary, the video object segmentation method provided by this embodiment determines the video to be processed and the target object to be segmented in the video, and, through a pre-built object segmentation model, processes each frame of the video in sequence based on a recurrent feature and segments the target object from each frame. The object segmentation model includes a recurrent feature update model that updates the recurrent feature according to the currently processed image during video processing. The recurrent feature is thus dynamically updated by the model: it fuses information from the frames already processed while avoiding feature stacking, and its size does not grow as the video progresses. This reduces memory consumption, prevents long videos from degrading segmentation quality due to insufficient memory, and lowers the cost of the video object segmentation algorithm.

In one or more embodiments of the present application, optionally, the processing of an image in the video may include: obtaining the image to be processed and the current recurrent feature, and extracting the image feature corresponding to the image, where the recurrent feature is determined from the frames of the video that have already been processed; determining the object segmentation result of the image according to the image feature and the recurrent feature; and inputting the image feature and the recurrent feature into the recurrent feature update model to obtain an updated recurrent feature, which is then used to process the next frame of the video.
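
A minimal sketch of this per-frame procedure is given below; the names encoder, read, decoder and update_model are placeholders for the components described in this application, not identifiers used by it:

```python
def segment_video(frames, recurrent_feat, encoder, read, decoder, update_model):
    """Segment each frame in turn while maintaining a fixed-size recurrent feature."""
    results = []
    for frame in frames:                      # frames: iterable of (1, 3, H, W) tensors
        image_feat = encoder(frame)           # (1, C, h, w) image feature
        # Segmentation uses the recurrent feature as it is before the update.
        association = read(image_feat, recurrent_feat)
        results.append(decoder(association))  # segmentation result for this frame
        # Fuse the current image feature into the recurrent feature for the next frame.
        recurrent_feat = update_model(image_feat, recurrent_feat)
    return results, recurrent_feat
```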

Specifically, when any frame is processed, the corresponding image feature can be obtained from the frame; optionally, the image feature can be extracted by an encoder. After the image feature is obtained, the image feature and the recurrent feature can be input into a deep learning model to obtain the object segmentation result. Alternatively, operations such as matrix operations can be performed on the image feature and the recurrent feature to obtain the segmentation result, or a deep learning model can be combined with matrix operations to jointly determine the segmentation result.

In addition to determining the object segmentation result, the image feature is also used to update the recurrent feature: optionally, the image feature and the current recurrent feature are input into the recurrent feature update model to obtain the updated recurrent feature.

In practice, the frames of the video can be segmented in sequence: for each frame, the segmentation result is determined from the current image feature and the recurrent feature, the recurrent feature is updated, and the updated recurrent feature is used when the next frame is processed.

In other optional implementations, the above object segmentation algorithm may be applied to only some frames of the video. For example, only some frames may be processed by the object segmentation algorithm, while the segmentation results of the remaining frames are input by the user or obtained in other ways, or the remaining frames are simply not segmented.

In summary, the video object segmentation method provided by this embodiment obtains the image to be processed and the current recurrent feature, and extracts the image feature corresponding to the image, where the recurrent feature is determined from the frames of the video that have already been processed; the segmentation result of the image is determined from the image feature and the recurrent feature, and the image feature and the recurrent feature are input into the recurrent feature update model to obtain an updated recurrent feature with which the next frame of the video is processed. As video processing progresses, the model thus continuously updates the recurrent feature with the available image features, so that the recurrent feature fuses the information of the processed frames and is updated quickly and accurately, improving the efficiency and accuracy of the update.

In other optional implementations, the image itself and the recurrent feature may be input into the recurrent feature update model to obtain the updated recurrent feature.

In one or more embodiments of the present application, optionally, the object segmentation model further includes an encoder (Encoder), a feature reading module (Read) and a decoder (Decoder). When any frame of the video is processed, the encoder extracts the image feature corresponding to the frame, the feature reading module computes the association information between the image feature and the current recurrent feature, and the decoder obtains the object segmentation result of the frame from the association information.

Correspondingly, determining the object segmentation result of the frame to be processed according to the image feature and the recurrent feature may include: inputting the image feature and the recurrent feature into the feature reading module to obtain their association information, and inputting the association information into the decoder to obtain the object segmentation result of the frame.

FIG. 4 is a schematic diagram of the principle of determining an object segmentation result provided by an embodiment of the present application. As shown in FIG. 4, each frame is input into the encoder to obtain the corresponding image feature; the image feature and the recurrent feature are input into the feature reading module for computation, and the computation result is then input into the decoder to obtain the object segmentation result.

Optionally, the sizes of the image and of the inputs and outputs of each model can be set according to actual needs. For example, the image to be processed may be an RGB image containing H*W*3 data; after being input into the encoder it yields an h*w*c image feature, and the recurrent feature may have the same size as the image feature. Both are input into the feature reading module and, after computation, into the decoder, which produces an H*W object segmentation result.

The feature reading module fuses the image feature and the recurrent feature to obtain the association information. Optionally, both features can be represented as matrices, and the feature reading module can perform operations such as matrix multiplication and/or matrix addition on them to achieve the fusion.
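
As a rough illustration of such a fusion, under the assumption that an attention-style matrix product followed by concatenation is used (the text only states that matrix multiplication and/or addition may be applied, so the exact operations below are not prescribed by it):

```python
import torch

def read(image_feat: torch.Tensor, recurrent_feat: torch.Tensor) -> torch.Tensor:
    """Fuse a (B, C, h, w) image feature with a same-sized recurrent feature."""
    b, c, h, w = image_feat.shape
    q = image_feat.flatten(2)                                  # (B, C, h*w)
    m = recurrent_feat.flatten(2)                              # (B, C, h*w)
    affinity = torch.softmax(q.transpose(1, 2) @ m, dim=-1)    # matrix multiplication
    fused = (m @ affinity.transpose(1, 2)).view(b, c, h, w)    # read from the recurrent feature
    return torch.cat([image_feat, fused], dim=1)               # association info for the decoder
```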

In the scheme shown in FIG. 4, the historical information is maintained by a single recurrent feature: a historical frame is passed through the encoder to obtain its image feature, which, together with the previous recurrent feature, is input into the recurrent feature update model to obtain a new recurrent feature. When the current frame is to be segmented, its feature is obtained by encoding and input, together with the recurrent feature, into the feature reading module, and the decoder finally produces the segmentation result.

In summary, the encoder, feature reading module and decoder extract the image feature, fuse it with the recurrent feature, and decode the fused feature to obtain the object segmentation result, so that the recurrent feature effectively guides the segmentation of each frame and the efficiency and accuracy of image segmentation are improved.

In an optional implementation, a step size can be set, and the recurrent feature is updated only when the step-size requirement is met. Optionally, inputting the image feature and the recurrent feature into the recurrent feature update model to obtain the updated recurrent feature may include: judging whether the current counter equals the step size; if it does, inputting the image feature and the recurrent feature into the recurrent feature update model to obtain the updated recurrent feature and resetting the counter to 0; if it does not, incrementing the counter by 1 and skipping the update of the recurrent feature.
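
A small sketch of this counter logic (update_model is the same illustrative placeholder as above):

```python
def maybe_update(counter, step_size, image_feat, recurrent_feat, update_model):
    """Update the recurrent feature only once every step_size frames."""
    if counter == step_size:
        recurrent_feat = update_model(image_feat, recurrent_feat)  # update, then...
        counter = 0                                                # ...reset the counter
    else:
        counter += 1                                               # skip the update this frame
    return counter, recurrent_feat
```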

FIG. 5 is a schematic diagram of updating the recurrent feature according to a step size provided by an embodiment of the present application. As shown in FIG. 5, the step size can be set to θ. The recurrent feature update model may specifically be a SAM (Spatio-temporal Aggregation Module), which aggregates the image feature and the recurrent feature in the spatio-temporal domain.

Optionally, inputting the image feature of frame t-nθ and the recurrent feature corresponding to frame t-nθ into the SAM yields an updated recurrent feature. As video processing proceeds, inputting the image feature of frame t-θ and the recurrent feature corresponding to frame t-θ into the SAM yields the recurrent feature corresponding to frame t, i.e. the recurrent feature used when determining the segmentation result of frame t.

For example, the initial recurrent feature can be determined from the image feature of the first frame and the segmentation result annotated by the user, and the segmentation result of the second frame is then determined from the image feature of the second frame and the initial recurrent feature.

Assuming a step size of 3, the recurrent feature is not updated at the second frame because the step-size requirement is not yet met. The third frame uses the current recurrent feature, i.e. the initial one, together with the image feature of the third frame to obtain its segmentation result, and the recurrent feature is still not updated.

When the fourth frame is processed, the step-size requirement is met: the segmentation result of the fourth frame is determined from the initial recurrent feature and the image feature of the fourth frame, and the recurrent feature is updated according to the image feature of the fourth frame. The updated recurrent feature is then used to process the fifth frame and, together with the image feature of the fifth frame, yields its segmentation result.

In summary, the step size adjusts how frequently the recurrent feature is updated, so that the object segmentation algorithm balances efficiency and accuracy. The step size can also be input by the user: when real-time performance is critical, a larger step size can be set, and otherwise a smaller one, so that the user can adjust the update frequency of the recurrent feature according to the scenario, meeting the needs of different scenarios and making the algorithm more flexible.

FIG. 6 is an application flowchart of an object segmentation result provided by an embodiment of the present application. As shown in FIG. 6, after the segmentation result is obtained, the following operations can also be performed.

Step 601: Obtain the target video of each target object according to the segmentation result of the video, where the target video of a target object is either a video obtained by removing the background around the target object from the video to be processed, or a video composed of the frames of the video to be processed in which the target object appears.

There may be at least one target object to be segmented. Each target object has a corresponding segmentation result, from which a target video consisting of multiple frames can be extracted.

In one example, the extracted frames may be the frames of the original video in which the target object appears. For example, if the original video has 1000 frames and the target object appears in 600 of them, those 600 frames can be extracted to form the target video. If the frames containing the target object are consecutive, one target video is formed; if not, one or more target videos can be formed. For example, if the target object appears in frames 1 to 200, one target video can be formed; if it disappears after frame 200 and reappears in frames 301 to 700, another target video can be formed.
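
For illustration, the frames in which a target object appears can be grouped into consecutive clips as follows; the frame indices mirror the hypothetical 1-200 / 301-700 example above:

```python
def split_into_clips(frame_indices):
    """Group sorted indices of frames containing the object into consecutive clips."""
    clips = []
    for idx in frame_indices:
        if clips and idx == clips[-1][-1] + 1:
            clips[-1].append(idx)   # extend the current consecutive clip
        else:
            clips.append([idx])     # a gap starts a new clip (a new target video)
    return clips

clips = split_into_clips(list(range(1, 201)) + list(range(301, 701)))
print([(c[0], c[-1]) for c in clips])   # [(1, 200), (301, 700)]
```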

By extracting the frames in which each target object appears, the target video of each target object can be extracted quickly and accurately from the original video, realizing object-based video splitting.

In another example, the original video can be matted according to the segmentation result to obtain a target video with the background removed; only the target object appears in the target video and no other objects do, so video matting for the target object can be carried out quickly.

Step 602: Output the target video of each target object and/or at least some of the images in the target video.

Optionally, the target video and/or at least some of its images can be displayed to the user for viewing or editing, and functions such as downloading the target video and/or at least some of its images can also be provided. Alternatively, the target video and/or at least some of its images can be uploaded to the cloud for storage.

In an optional implementation, outputting the target video of each target object and/or at least some of its images may include: displaying the target video and/or at least some of its images; and generating a promotion video and/or promotion image of the target object according to the user's editing operations on the target video and/or at least some of its images.

Optionally, the target video can be played directly to the user, or its frames can be displayed statically; the user can cut, resize, beautify or add text to the target video or images to generate a promotion video or promotion image of the target object.

For example, in the commodity field, after the object segmentation algorithm determines the frames of the original video in which the target commodity appears, those frames can be extracted to form the target video, or the original video can be matted to obtain the target video. The target video can serve as a promotion video or image of the target commodity, such as a promotional video, advertising video or poster, and be published to an e-commerce platform or other platforms; or the target video can take the form of a short video published to a short-video platform. In addition, the user can edit the target video or images and publish the edited video.

By displaying the target video or at least some of its images and generating the promotion video or promotion image of the target object according to the user's editing operations, the segmentation result provided by the algorithm can assist the user in producing promotion material for the target object, improving the efficiency of material production and the user experience.

In another optional implementation, after the target video is obtained, the target video or at least some of its images can be added to a material library, either directly or after being filtered or edited by the user.

Optionally, filter conditions input by the user can be obtained, the target video and/or target images can be filtered according to these conditions, and the filtered videos or images can be updated into the material library, which is used to produce promotion videos and/or promotion images of the target object. The target images here refer to the images in the target video.

Optionally, the filter conditions can be set according to actual needs and may include, but are not limited to: images in which the target object is fully visible, images in which the target object is shown from the front, images in which the target object is in a particular form, and video clips in which the target object is in a certain changing state.

For example, in the commodity field, the filter conditions may be: commodity introduction images, images showing the complete commodity, images with the packaging intact, images with the packaging removed, images with part of the packaging removed, video clips of the commodity in use, unboxing clips, and so on. Selection among the target videos or target images can be made according to the filter conditions input by the user, and the images that meet the conditions can be identified through simple comparison or a machine learning model.

Alternatively, after the target video or target images are obtained, the user can edit them and store the edited videos or images in the material library.

Optionally, the identifier and attribute information of the target object can be stored in the material library together with the edited or filtered videos and images. In practice, the user can input query conditions, such as the identifier of the target object, to quickly find the corresponding videos or images for producing short videos, posters, promotional materials, advertising videos and the like of the target object.

In summary, by matting the original video or extracting frames from it according to the segmentation result, the target video of each target object is obtained, and the target video of each target object or at least some of its images is output, which improves the efficiency and accuracy of producing videos or images of the target object and meets the needs of different usage scenarios.

在本申请的一个或多个实施例中,可选的,确定待处理的视频以及所述视频中待分割的目标对象,包括:获取待处理的视频,并确定所述视频中的至少一帧基准图像及各基准图像对应的目标分割结果。In one or more embodiments of the present application, optionally, determining the video to be processed and the target object to be segmented in the video includes: acquiring the video to be processed, and determining at least one frame in the video The reference image and the target segmentation result corresponding to each reference image.

相应的,还可以根据所述至少一帧基准图像及对应的目标分割结果,确定初始的循环特征。Correspondingly, the initial cycle feature may also be determined according to the at least one frame of reference image and the corresponding target segmentation result.

可选的,视频可以通过多种方法获取,例如,由用户拍摄、读取本地视频、从服务器获取等。所述基准图像用于确定初始的循环特征,基准图像可以是视频中的任意一帧或多帧,例如,可以是视频中的第一帧。Optionally, the video can be obtained through various methods, for example, shooting by a user, reading a local video, obtaining from a server, and so on. The reference image is used to determine the initial cycle feature, and the reference image may be any one frame or multiple frames in the video, for example, may be the first frame in the video.

其中,所述基准图像的目标分割结果是通过用户在所述基准图像中对目标对象进行标注确定的,或者,是根据目标对象的预设图像对所述基准图像进行自动标注确定的。所述待处理的图像为所述视频中除所述基准图像以外的其它图像,对应的目标分割结果用于从图像中分割出目标对象;所述初始的循环特征用于对第一帧待处理的图像进行分割。The target segmentation result of the reference image is determined by marking the target object in the reference image by the user, or automatically marking the reference image according to a preset image of the target object. The to-be-processed image is an image other than the reference image in the video, and the corresponding target segmentation result is used to segment the target object from the image; the initial loop feature is used for the first frame to be processed image for segmentation.

在一可选的实现方式中,可以使用半监督视频目标分割方法,在获取到视频后,可以通过由用户来对基准图像进行标注,确定其中目标对象所在的位置。In an optional implementation manner, a semi-supervised video object segmentation method can be used, and after the video is acquired, the user can mark the reference image to determine the location of the target object.

在另一可选的实现方式中,可以根据目标对象的预设图像对所述基准图像进行自动标注。其中,目标对象的预设图像可以是所述视频以外的其它图像,尤其可以是能够准确地对目标对象进行展示的典型图像。例如,对于目标商品来说,预设图像可以是目标商品在商品详情页面的主图。In another optional implementation manner, the reference image may be automatically annotated according to a preset image of the target object. Wherein, the preset image of the target object may be other images than the video, in particular, it may be a typical image that can accurately display the target object. For example, for a target product, the preset image may be the main image of the target product on the product details page.

在确定基准图像的目标分割结果后,可以根据基准图像及目标分割结果确定初始的循环特征,通过循环特征指导其它图像的分割。After the target segmentation result of the reference image is determined, the initial cycle feature can be determined according to the reference image and the target segmentation result, and the segmentation of other images can be guided by the cycle feature.

在实际应用中，可以先由用户或者根据预设图像对视频中的第一帧图像进行标注，确定第一帧图像的目标分割结果，根据第一帧图像及目标分割结果，生成初始的循环特征，再对第二帧图像进行处理，按照前述实施例提供的方法确定第二帧的目标分割结果，并更新循环特征，再根据更新后的循环特征处理第三帧的图像，以此类推，直至视频中的全部图像被处理完。In practical applications, the first frame of the video can first be annotated by the user or according to the preset image to determine the target segmentation result of the first frame; the initial loop feature is generated according to the first frame and its target segmentation result; the second frame is then processed, its target segmentation result is determined according to the method provided in the foregoing embodiments, and the loop feature is updated; the third frame is then processed according to the updated loop feature, and so on, until all images in the video have been processed.
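The following is a minimal sketch, in Python with PyTorch tensors, of the frame-by-frame workflow described above. The helper callables second_encoder, segment_frame, and update_loop_feature are hypothetical stand-ins for the second encoder, the per-frame segmentation step, and the loop feature update model; they are assumptions for illustration only, not the patented design.

    import torch

    def segment_video(frames, first_frame_mask, second_encoder, segment_frame, update_loop_feature):
        """frames: list of (3, H, W) tensors; first_frame_mask: (1, H, W) tensor annotated for the first frame."""
        # Build the initial loop feature from the annotated reference frame and its mask.
        reference_input = torch.cat([frames[0], first_frame_mask], dim=0).unsqueeze(0)
        loop_feature = second_encoder(reference_input)

        masks = [first_frame_mask]
        for frame in frames[1:]:
            # Predict the segmentation of the current frame from the loop feature,
            # then fold the current frame back into the fixed-size loop feature.
            mask, image_feature = segment_frame(frame.unsqueeze(0), loop_feature)
            loop_feature = update_loop_feature(image_feature, loop_feature)
            masks.append(mask)
        return masks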

或者,可以先由用户或者根据预设图像对视频中的最后一帧图像进行标注,算法可以从后往前依次对剩余的各个图像进行分割。或者,可以对视频中的中间帧图像进行标注,算法从中间帧向前后两个方向依次进行图像的分割。Alternatively, the last frame of images in the video can be marked by the user or according to a preset image, and the algorithm can segment the remaining images in sequence from the back to the front. Alternatively, the intermediate frame images in the video can be marked, and the algorithm sequentially divides the images from the intermediate frame forward and backward directions.

综上，通过先由用户或者根据预设图像对视频中的至少一帧基准图像进行标注，确定基准图像的目标分割结果，并根据至少一帧基准图像及各基准图像对应的目标分割结果，确定初始的循环特征，能够使初始的循环特征参考更加准确的目标分割结果，指导预测后续图像的目标分割结果，提高后续图像的分割准确性。To sum up, by first annotating at least one reference image in the video, either by the user or according to the preset image, to determine the target segmentation result of each reference image, and then determining the initial loop feature according to the at least one reference image and its corresponding target segmentation result, the initial loop feature can draw on a more accurate target segmentation result, guide the prediction of the target segmentation results of subsequent images, and improve the segmentation accuracy of those images.

本申请的一个或多个实施例中,可选的,可以具体使用如下的标注方式实现对基准图像的标注。In one or more embodiments of the present application, optionally, the following labeling methods may be used to realize labeling of the reference image.

在第一种可选的标注方式中,可以展示所述视频中的基准图像,以供用户在所述基准图像上标注目标对象所在位置,并根据用户的标注确定所述基准图像的目标分割结果。In the first optional labeling method, the reference image in the video can be displayed, so that the user can label the location of the target object on the reference image, and determine the target segmentation result of the reference image according to the user's labeling .

在一示例中,将基准图像展示给用户后,用户可以在基准图像中描绘出目标对象的轮廓,从而可以确定目标对象在基准图像中的精确位置。In an example, after the reference image is presented to the user, the user can draw the outline of the target object in the reference image, so that the precise position of the target object in the reference image can be determined.

另一示例中,用户可以使用图像编辑软件对基准图像中的目标对象进行标注,算法可以从目标编辑软件获取目标对象的位置。In another example, the user can use image editing software to mark the target object in the reference image, and the algorithm can obtain the position of the target object from the target editing software.

在本实现方式中,由用户对基准图像进行标注,能够提高标注的准确性,从而提升视频目标分割的准确性。In this implementation manner, the user annotates the reference image, which can improve the accuracy of the annotation, thereby improving the accuracy of video target segmentation.

在第二种可选的标注方式中，可以对所述视频中的基准图像进行语义分割，确定所述基准图像中的至少一个对象；将所述至少一个对象对应的图像区域分别与目标对象的预设图像进行比对，得到所述基准图像的目标分割结果。In the second optional annotation method, semantic segmentation can be performed on the reference image in the video to determine at least one object in the reference image; the image region corresponding to each of the at least one object is then compared with the preset image of the target object to obtain the target segmentation result of the reference image.

示例性地,可以确定每一对象对应的图像区域与预设图像的相似度,其中,相似度最高的对象可以确定为基准图像中的目标对象。Exemplarily, the similarity between the image area corresponding to each object and the preset image may be determined, wherein the object with the highest similarity may be determined as the target object in the reference image.

可选的,在确定相似度最高的对象后,可以高亮该对象所在的区域显示给用户,由用户确认该对象是否属于目标对象,并在用户确认后生成基准图像的目标分割结果。Optionally, after the object with the highest similarity is determined, the region where the object is located can be highlighted and displayed to the user, the user confirms whether the object belongs to the target object, and after the user confirms, the target segmentation result of the reference image is generated.

在本实现方式中,通过目标对象的预设图像对基准图像进行标注,能够减少用户的工作量,提高用户体验。In this implementation manner, the reference image is marked by the preset image of the target object, which can reduce the workload of the user and improve the user experience.
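As a hedged illustration of the comparison step, the sketch below embeds each candidate object region and the preset image with some feature extractor (the embed callable is an assumption) and scores them by cosine similarity; the region with the highest score is taken as the most likely target object.

    import torch.nn.functional as F

    def pick_target_object(object_regions, preset_image, embed):
        """object_regions: list of (3, H, W) crops; preset_image: (3, H, W); embed maps a (1, 3, H, W) batch to a (1, D) vector."""
        preset_vec = embed(preset_image.unsqueeze(0))              # (1, D)
        similarities = []
        for region in object_regions:
            region_vec = embed(region.unsqueeze(0))                # (1, D)
            similarities.append(F.cosine_similarity(region_vec, preset_vec).item())
        best = max(range(len(similarities)), key=lambda i: similarities[i])
        return best, similarities[best]                            # index and score of the most likely target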

在第三种可选的标注方式中,可以对所述视频中的基准图像进行语义分割,确定所述基准图像中的至少一个对象;展示所述至少一个对象对应的图像区域,获取用户从所述至少一个对象中选择的目标对象,并根据目标对象对应的图像区域确定目标分割结果。In a third optional labeling method, the reference image in the video can be semantically segmented to determine at least one object in the reference image; the image area corresponding to the at least one object is displayed, and the user's The target object selected from the at least one object is described, and the target segmentation result is determined according to the image area corresponding to the target object.

示例性地,可以将不同对象所在的图像区域设置为不同的颜色,或者,可以通过线条描绘出各个对象的轮廓,用户可以直接点击某一对象所在的图像区域,该对象可以作为目标对象,此外,用户还可以对目标对象的轮廓进行调整。Exemplarily, the image areas where different objects are located can be set to different colors, or the outline of each object can be drawn through lines, the user can directly click on the image area where an object is located, and the object can be used as the target object. , the user can also adjust the outline of the target object.

在本实现方式中,用户直接通过点击操作可以实现对基准图像的标注,无需用户描绘基准图像的轮廓,提升基准图像的标注效率,兼顾标注准确性和用户体验度。In this implementation manner, the user can directly mark the reference image through a click operation, without requiring the user to draw the outline of the reference image, thereby improving the labeling efficiency of the reference image, and taking into account the labeling accuracy and user experience.

上述几种标注方式可以单独使用,也可以结合使用。例如,对部分基准图像采用第一种方式,对其余部分基准图像采用第二种方式,或者,对同一基准图像,可以分别采用第二种方式和第三种方式进行标注,并对两种方式的结果进行融合,确定最终的目标分割结果。The above several labeling methods can be used alone or in combination. For example, the first method is used for some reference images, and the second method is used for the rest of the reference images, or the same reference image can be marked with the second method and the third method respectively, and the two methods The results are fused to determine the final target segmentation result.

在本申请的一个或多个实施例中，可选的，还可以针对视频中的每一图像，进行如下操作：对所述图像进行语义分割，确定图像中的至少一个对象，并将各对象对应的图像区域与目标对象的预设图像进行比对，确定各个对象与目标对象的相似度，将其中的最高相似度作为所述图像的相似度。In one or more embodiments of the present application, optionally, the following operations may also be performed for each image in the video: semantically segment the image to determine at least one object in the image, compare the image region corresponding to each object with the preset image of the target object to determine the similarity between each object and the target object, and take the highest of these similarities as the similarity of the image.

从所述视频的各帧图像中,选出相似度最高的一个或多个图像作为基准图像,根据基准图像中与目标对象相似度最高的对象确定目标分割结果。From each frame of images in the video, one or more images with the highest similarity are selected as the reference image, and the target segmentation result is determined according to the object with the highest similarity with the target object in the reference image.

示例性地，第一帧图像中包含3个对象，与目标对象的相似度分别为10%、1%、80%，则可以取其中的最高相似度即80%作为第一帧图像与目标对象的相似度。同理，可以确定其它各帧图像与目标对象的相似度，假设中间某一帧图像的相似度最高为90%，则可以以该图像作为基准图像。Exemplarily, if the first frame contains three objects whose similarities to the target object are 10%, 1%, and 80%, respectively, the highest of these, namely 80%, can be taken as the similarity between the first frame and the target object. Similarly, the similarity between each of the other frames and the target object can be determined; assuming a certain intermediate frame has the highest similarity of 90%, that image can be used as the reference image.

综上,可以将视频中各帧图像分别与目标对象进行对比,确定相似度最高的作为基准图像,能够在不依赖用户的前提下,提高初始标注的准确性。In summary, each frame image in the video can be compared with the target object, and the one with the highest similarity can be determined as the reference image, which can improve the accuracy of the initial annotation without relying on the user.
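Under the same assumptions, selecting the reference frame reduces to taking, for each frame, the maximum object-to-preset similarity and then picking the frame with the highest score, for example:

    def pick_reference_frame(per_frame_object_similarities):
        """per_frame_object_similarities: list of lists, e.g. [[0.10, 0.01, 0.80], [0.20, 0.90], ...]."""
        frame_scores = [max(sims) if sims else 0.0 for sims in per_frame_object_similarities]
        best_frame = max(range(len(frame_scores)), key=lambda i: frame_scores[i])
        return best_frame, frame_scores[best_frame]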

在本申请的一个或多个实施例中，可选的，在依赖用户进行初始标注的场景下，可以计算视频中每一帧图像与前一帧图像的相似度，若相邻两帧图像的相似度低于预设的相似度阈值，可以将所述相邻两帧图像中的后一帧图像作为基准图像，展示给用户进行标注。或者，可以从视频中选出相似度最低的一组或多组相邻图像，并将相邻图像中的后一帧图像作为基准图像，展示给用户进行标注。In one or more embodiments of the present application, optionally, in a scenario that relies on the user for initial annotation, the similarity between each frame of the video and its previous frame may be calculated; if the similarity between two adjacent frames is lower than a preset similarity threshold, the later of the two adjacent frames may be used as a reference image and displayed to the user for annotation. Alternatively, one or more pairs of adjacent frames with the lowest similarity may be selected from the video, and the later frame of each pair may be used as a reference image and displayed to the user for annotation.

综上,在通过用户对初始图像进行标注时,可以从视频中选出与前一帧图像的相似度较低的图像作为基准图像,由于基准图像相对于前一帧图像的变化较大,因此通过算法对基准图像进行分割的效果略差于通过算法对其它图像进行分割的效果,由用户对基准图像进行标注,可以减少因视频场景突变导致目标分割效果较差的情况,进一步提高视频目标分割的准确性。To sum up, when the user annotates the initial image, the image with low similarity to the previous frame image can be selected from the video as the reference image. The effect of segmenting the reference image by the algorithm is slightly worse than that of other images segmented by the algorithm. The user annotates the reference image, which can reduce the situation that the target segmentation effect is poor due to the sudden change of the video scene, and further improve the video target segmentation. accuracy.

在本申请的一个或多个实施例中，可选的，根据所述至少一帧基准图像及对应的目标分割结果，确定初始的循环特征，包括：将所述至少一帧基准图像及对应的目标分割结果进行拼接，并将拼接结果输入到编码器，得到初始的循环特征。In one or more embodiments of the present application, optionally, determining the initial loop feature according to the at least one reference image and the corresponding target segmentation result includes: concatenating the at least one reference image with the corresponding target segmentation result, and inputting the concatenation result into an encoder to obtain the initial loop feature.

示例性地,在只有一帧基准图像时,可以将基准图像和对应的目标分割结果进行拼接,并输入到编码器中,得到初始的循环特征。在有多帧基准图像时,可以将每一帧的基准图像和目标分割结果依次进行拼接,输入到编码器中,得到初始的循环特征。Exemplarily, when there is only one frame of reference image, the reference image and the corresponding target segmentation result can be spliced and input into the encoder to obtain the initial cyclic feature. When there are multiple frames of reference images, the reference image of each frame and the target segmentation result can be spliced in turn, and input into the encoder to obtain the initial loop feature.

可选的,所述编码器可以是输入维度固定的编码器,也可以是能够接受不同输入维度的编码器,从而可以根据非固定数量的基准图像确定固定大小的初始循环特征。Optionally, the encoder may be an encoder with a fixed input dimension, or an encoder capable of accepting different input dimensions, so that an initial loop feature of a fixed size can be determined according to a non-fixed number of reference images.

所述编码器可以通过卷积神经网络等来实现,所述编码器可以与循环特征更新模型、分割模块等一起训练。The encoder can be implemented by a convolutional neural network or the like, and the encoder can be trained with a recurrent feature update model, segmentation module, and the like.

其中,提取图像特征的编码器与确定初始循环特征的编码器可以不同。可选的,提取图像特征的编码器可以记为第一编码器,确定初始循环特征的编码器可以记为第二编码器。第一编码器和第二编码器的作用不同,但是都可以和其它模块一起进行训练。Wherein, the encoder for extracting image features and the encoder for determining initial cyclic features may be different. Optionally, the encoder that extracts image features may be denoted as the first encoder, and the encoder that determines the initial cyclic features may be denoted as the second encoder. The role of the first encoder and the second encoder are different, but both can be trained together with other modules.
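One way to read the description above, purely as an assumption, is a second encoder that concatenates each reference image with its mask along the channel dimension, encodes each pair independently, and averages the per-frame features, so that the initial loop feature has a fixed size regardless of how many reference images are provided:

    import torch
    import torch.nn as nn

    class SecondEncoder(nn.Module):
        def __init__(self, in_channels=4, feature_channels=256):
            super().__init__()
            # Illustrative backbone; the actual encoder could be any convolutional network.
            self.backbone = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(128, feature_channels, kernel_size=3, stride=2, padding=1),
            )

        def forward(self, reference_images, reference_masks):
            """reference_images: (N, 3, H, W); reference_masks: (N, 1, H, W)."""
            spliced = torch.cat([reference_images, reference_masks], dim=1)   # (N, 4, H, W)
            features = self.backbone(spliced)                                 # (N, C, H/8, W/8)
            return features.mean(dim=0, keepdim=True)                         # fixed-size initial loop feature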

综上，通过将所述至少一帧基准图像及对应的目标分割结果进行拼接，并将拼接结果输入到编码器，能够提取基准图像的特征和目标分割结果的特征，并根据提取到的特征确定初始循环特征，使得循环特征能够保留基准图像的关键信息，提升初始循环特征的准确性，提高视频的整体分割效果。To sum up, by concatenating the at least one reference image with the corresponding target segmentation result and inputting the concatenation result into the encoder, the features of the reference image and of the target segmentation result can be extracted, and the initial loop feature can be determined from the extracted features, so that the loop feature retains the key information of the reference image, which improves the accuracy of the initial loop feature and the overall segmentation effect on the video.

在其它可选的实现方式中，在基准图像有多帧时，可以从中选择其中一帧确定初始循环特征，其余的基准图像可以不用于计算初始循环特征，而是直接利用基准图像的目标分割结果，在通过算法进行视频目标分割的过程中，跳过处理这些基准图像，提高视频目标分割的效率。In other optional implementations, when there are multiple reference images, one of them can be selected to determine the initial loop feature, and the remaining reference images need not be used to compute it; instead, their target segmentation results can be used directly, and these reference images are skipped during the algorithmic video object segmentation, which improves the efficiency of video object segmentation.

图7为本申请实施例提供的一种更新循环特征的原理示意图。如图7所示，循环特征更新模块可以包括：提炼(Extracting)模块、增强(Enhancing)模块、压缩(Squeezing)模块。将所述图像特征和循环特征输入到循环特征更新模型，得到更新后的循环特征，可以包括：将所述图像特征和循环特征输入到提炼模块，得到提炼结果；其中，所述提炼模块用于对所述图像特征和循环特征进行融合；将所述提炼结果输入到增强模块，得到增强结果；其中，所述增强模块用于对提炼结果进行池化操作；将所述提炼结果与增强结果相加后输入到压缩模块，得到更新后的预设长度的循环特征。FIG. 7 is a schematic diagram of the principle of updating the loop feature provided by an embodiment of the present application. As shown in FIG. 7, the loop feature update model may include an Extracting module, an Enhancing module, and a Squeezing module. Inputting the image feature and the loop feature into the loop feature update model to obtain the updated loop feature may include: inputting the image feature and the loop feature into the Extracting module to obtain a refined result, where the Extracting module is used to fuse the image feature with the loop feature; inputting the refined result into the Enhancing module to obtain an enhanced result, where the Enhancing module is used to perform a pooling operation on the refined result; and adding the refined result to the enhanced result and inputting the sum into the Squeezing module to obtain an updated loop feature of a preset length.

可选的，所述循环特征可以是预设长度的特征，并且，在视频目标分割过程中保持所述预设长度。具体来说，在更新循环特征时，可以将图像特征和当前的循环特征进行拼接后输入到提炼模块，提炼模块可以对输入的特征进行融合，再经过增强模块实现特征增强，最后将提炼的特征和增强的特征相加，并通过压缩模块压缩到预设长度，得到更新后的循环特征。Optionally, the loop feature may be a feature of a preset length, and this preset length is maintained throughout the video object segmentation process. Specifically, when the loop feature is updated, the image feature and the current loop feature can be concatenated and input into the Extracting module, which fuses the input features; the Enhancing module then performs feature enhancement; finally, the refined feature and the enhanced feature are added together and compressed to the preset length by the Squeezing module to obtain the updated loop feature.

综上，在视频目标分割中，通过提炼模块、增强模块和压缩模块，可以生成轻量且大小恒定的循环特征作为历史信息进行存储和指导实现目标分割，不论处理多少帧，循环特征的尺寸不变，满足轻量化需求。To sum up, in video object segmentation, the Extracting, Enhancing, and Squeezing modules can generate a lightweight loop feature of constant size that is stored as historical information and guides object segmentation; no matter how many frames are processed, the size of the loop feature does not change, which meets the requirement for a lightweight design.
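A hedged sketch of the update path is shown below. The module names mirror the Extracting / Enhancing / Squeezing structure described above, but the concrete layers (a fusion convolution, a global-pooling branch, and a 1x1 compression convolution) are illustrative assumptions rather than the patented design.

    import torch
    import torch.nn as nn

    class LoopFeatureUpdater(nn.Module):
        def __init__(self, channels=256):
            super().__init__()
            # Extracting: fuse the image feature with the current loop feature.
            self.extracting = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            # Enhancing: a pooling branch over the refined feature.
            self.enhancing = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            # Squeezing: compress back to the preset, constant feature size.
            self.squeezing = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, image_feature, loop_feature):
            fused = self.extracting(torch.cat([image_feature, loop_feature], dim=1))
            enhanced = self.enhancing(fused)               # (B, C, 1, 1), broadcast when added
            return self.squeezing(fused + enhanced)        # same size as the incoming loop feature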

在本申请的一个或多个实施例中,可选的,所述提炼模块可以用于对图像的特征和循环特征的特征进行相乘或相加等运算,实现图像特征和循环特征的融合。In one or more embodiments of the present application, optionally, the extraction module may be used to perform operations such as multiplying or adding features of the image and features of the cycle to realize the fusion of the features of the image and the cycle.

可选的，所述提炼模块可以包括下采样层、卷积层、矩阵计算单元、归一化层。将所述图像特征和循环特征输入到提炼模块，得到提炼结果，包括：通过下采样层对所述图像特征和循环特征进行下采样，并将下采样后的图像特征和循环特征分别输入到卷积层，得到图像特征对应的第一特征矩阵和循环特征对应的第二特征矩阵；通过矩阵计算单元对所述第一特征矩阵和第二特征矩阵进行矩阵相关运算，并将运算结果输入到归一化层，得到提炼结果。Optionally, the Extracting module may include a down-sampling layer, a convolutional layer, a matrix computation unit, and a normalization layer. Inputting the image feature and the loop feature into the Extracting module to obtain a refined result includes: down-sampling the image feature and the loop feature through the down-sampling layer, and inputting the down-sampled image feature and loop feature into the convolutional layer to obtain a first feature matrix corresponding to the image feature and a second feature matrix corresponding to the loop feature; and performing a matrix correlation operation on the first feature matrix and the second feature matrix through the matrix computation unit, and inputting the operation result into the normalization layer to obtain the refined result.

具体的，对图像特征和循环特征进行下采样后，分别输入到卷积层，卷积层可以提取图像特征对应的矩阵，还可以提取循环特征对应的矩阵，这两个矩阵代表了图像特征和循环特征的更深层的特征，对两个矩阵进行矩阵相关运算例如矩阵加法和/或矩阵乘法等，可以实现图像特征和循环特征的融合，再进行归一化操作，得到提炼结果。Specifically, after the image feature and the loop feature are down-sampled, they are input into the convolutional layer separately; the convolutional layer can extract a matrix corresponding to the image feature and a matrix corresponding to the loop feature, and these two matrices represent deeper-level features of the image feature and the loop feature. Performing matrix correlation operations on the two matrices, such as matrix addition and/or matrix multiplication, can fuse the image feature with the loop feature, and a normalization operation is then performed to obtain the refined result.
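One possible, attention-style reading of this matrix correlation, offered only as an assumption, is sketched below: both features are down-sampled, projected with 1x1 convolutions into the first and second feature matrices, correlated position by position with a matrix product, and normalized with softmax.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExtractingCorrelation(nn.Module):
        def __init__(self, channels=256, key_channels=64):
            super().__init__()
            self.down = nn.AvgPool2d(kernel_size=2)                  # down-sampling layer
            self.image_proj = nn.Conv2d(channels, key_channels, 1)   # convolution for the image feature
            self.loop_proj = nn.Conv2d(channels, key_channels, 1)    # convolution for the loop feature

        def forward(self, image_feature, loop_feature):
            img = self.image_proj(self.down(image_feature))          # first feature matrix, (B, K, h, w)
            rec = self.loop_proj(self.down(loop_feature))            # second feature matrix, (B, K, h, w)
            # Matrix correlation: dot products between all spatial positions of the two matrices.
            affinity = torch.bmm(img.flatten(2).transpose(1, 2), rec.flatten(2))   # (B, h*w, h*w)
            return F.softmax(affinity, dim=-1)                       # normalization layer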

在一实现方式中,所述增强模块具体可以用于对提炼结果进行金字塔池化。可选的,所述增强模块可以包括金字塔池化层,例如ASPP(atrous spatial pyramid pooling,空洞空间金字塔池化),能够更好地对提炼结果进行金字塔池化,得到固定大小的增强结果。In an implementation manner, the enhancement module may specifically be used to perform pyramid pooling on the refining result. Optionally, the enhancement module may include a pyramid pooling layer, such as ASPP (atrous spatial pyramid pooling, atrous spatial pyramid pooling), which can better perform pyramid pooling on the extraction result to obtain an enhancement result of a fixed size.

综上，提炼模块通过矩阵相关计算能够增强时空域上的特征关联，更好地实现图像特征和循环特征的融合，增强模块通过ASPP可以获取不同尺度的感受野，在空间和时间上有不同的特征筛选能力，从而进一步提升视频的目标分割效果。To sum up, the Extracting module can strengthen feature correlations in the spatio-temporal domain through matrix correlation computation and better fuse the image feature with the loop feature, while the Enhancing module can obtain receptive fields of different scales through ASPP, providing different feature-screening capabilities in space and time, which further improves the object segmentation effect on the video.

在本申请的一个或多个实施例中,可选的,可以先通过训练样本对模型进行训练,并利用训练好的模型对待处理的视频进行目标分割。In one or more embodiments of the present application, optionally, the model can be trained by using training samples first, and the trained model can be used to perform target segmentation on the video to be processed.

在一可选实现方式中，可以获取训练样本，所述训练样本包括训练视频以及训练视频中各图像对应的目标分割结果。在训练过程中，从所述训练视频中选择至少一帧基准图像，所述训练视频中除所述至少一帧基准图像以外的其它图像为待处理的图像；对所述至少一帧基准图像的目标分割结果进行扰动，得到带噪声的目标分割结果，并根据所述带噪声的目标分割结果确定初始的循环特征。对于所述训练视频中任一待处理的图像，执行如下操作：基于当前的循环特征预测所述图像的目标分割结果，以根据预测结果以及训练样本中的目标分割结果更新所述循环特征更新模型的参数；对所述图像对应的图像特征进行扰动，得到带有噪声的图像特征，并根据所述带有噪声的图像特征更新循环特征，以根据更新后的循环特征处理所述训练视频中下一帧待处理的图像。In an optional implementation, training samples may be obtained, where a training sample includes a training video and the target segmentation result corresponding to each image in the training video. During training, at least one reference image is selected from the training video, and the images in the training video other than the at least one reference image are the images to be processed; the target segmentation result of the at least one reference image is perturbed to obtain a noisy target segmentation result, and the initial loop feature is determined according to the noisy target segmentation result. For any image to be processed in the training video, the following operations are performed: predicting the target segmentation result of the image based on the current loop feature, so as to update the parameters of the loop feature update model according to the prediction result and the target segmentation result in the training sample; and perturbing the image feature corresponding to the image to obtain a noisy image feature, and updating the loop feature according to the noisy image feature, so as to process the next image to be processed in the training video according to the updated loop feature.

可选的,在训练过程中,可以通过待处理的图像对应的预测结果和实际结果确定损失值,根据损失值更新模型参数以使预测结果接近于实际结果。其中,预测结果是指通过算法预测得到的目标分割结果,实际结果是训练样本中图像对应的目标分割结果。Optionally, in the training process, the loss value may be determined according to the prediction result corresponding to the image to be processed and the actual result, and the model parameters are updated according to the loss value so that the prediction result is close to the actual result. The prediction result refers to the target segmentation result predicted by the algorithm, and the actual result is the target segmentation result corresponding to the image in the training sample.

为了提高算法的抗噪声能力,在训练时可以增加扰动以更新循环特征。以第一帧图像为基准图像为例,可以对第一帧图像的目标分割结果进行扰动,得到带有噪声的目标分割结果并构造初始的循环特征。在处理后续的图像时,可以对图像特征进行扰动,并根据带有噪声的图像特征更新循环特征。In order to improve the anti-noise ability of the algorithm, perturbation can be added to update the recurrent features during training. Taking the first frame image as the reference image as an example, the target segmentation result of the first frame image can be perturbed to obtain the target segmentation result with noise and construct the initial loop feature. When processing subsequent images, the image features can be perturbed and the recurrent features can be updated based on the noisy image features.

可选的,可以设置步长为2,则在处理第2、4、6帧时不增加扰动,在处理第3、5、7帧时可以增加扰动并更新循环特征。Optionally, the step size can be set to 2, then the disturbance is not added when processing the 2nd, 4th, and 6th frames, and the disturbance can be added and the cyclic feature can be updated when processing the 3rd, 5th, and 7th frames.
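The noise-injection schedule can be sketched roughly as follows. The helpers add_noise, model.init_loop_feature, model.segment, and model.update are hypothetical; perturbing every other processed frame is only one illustrative choice consistent with the step-size example above.

    def train_on_video(model, frames, gt_masks, optimizer, add_noise, loss_fn, step=2):
        # Initial loop feature built from a perturbed (noisy) first-frame mask.
        loop_feature = model.init_loop_feature(frames[0], add_noise(gt_masks[0]))
        for index in range(1, len(frames)):
            prediction, image_feature = model.segment(frames[index], loop_feature)
            loss = loss_fn(prediction, gt_masks[index])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Perturb the image feature on alternating frames so the model learns
            # to tolerate noise accumulated in the stored loop feature.
            if index % step == 0:
                image_feature = add_noise(image_feature)
            loop_feature = model.update(image_feature.detach(), loop_feature.detach())
        return loop_feature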

在模型使用时，对待处理的视频是不断进行分割处理的，由于历史的分割结果不一定正确，可能会引入很多噪声，导致视频分割结果不佳，因此，在训练时，通过构造干扰输入，可以保证模型学习到克服特征存储中的噪声问题，使模型具有更好的抗噪声能力，在实际应用时克服视频序列带来的噪声，提升目标分割的效果和性能。When the model is used, the video to be processed is segmented continuously; because the historical segmentation results are not necessarily correct, a lot of noise may be introduced, leading to poor video segmentation results. Therefore, by constructing perturbed inputs during training, the model can be made to learn to overcome the noise problem in the stored features, giving it better noise resistance, so that in practical applications it overcomes the noise introduced by the video sequence and improves the effect and performance of object segmentation.

可选的,目标分割算法使用的各个模块例如第一编码器、第二编码器、解码器、循环特征更新模型等可以是一起训练的。Optionally, each module used by the target segmentation algorithm, such as the first encoder, the second encoder, the decoder, the cyclic feature update model, etc., may be trained together.

在其他可选的实现方式中,也可以根据实际需要对上述模型的结构进行调整,增加、减少、替换某些模块。例如,特征读取模块和解码器可以合并为一个模块,循环特征更新模型可以是其它结构的卷积神经网络模型等等。In other optional implementation manners, the structure of the above model may also be adjusted according to actual needs, and some modules may be added, reduced or replaced. For example, the feature reading module and the decoder can be combined into one module, the recurrent feature update model can be a convolutional neural network model of other structures, and so on.

图8为本申请实施例提供的另一种视频目标分割方法的流程示意图。如图8所示,所述方法可以包括:FIG. 8 is a schematic flowchart of another video object segmentation method provided by an embodiment of the present application. As shown in Figure 8, the method may include:

步骤801、确定待处理的视频以及所述视频中待分割的目标商品。Step 801: Determine the video to be processed and the target commodity to be segmented in the video.

可选的,待处理视频可以是目标商品对应的视频。所述目标商品对应的视频可以是由用户拍摄的,或者,可以是从其它设备获取到的目标商品对应的视频,例如,可以是从服务器获取到的目标商品在商品详情页面上的视频。Optionally, the video to be processed may be a video corresponding to the target product. The video corresponding to the target product may be shot by the user, or may be a video corresponding to the target product obtained from other devices, for example, a video of the target product on the product details page obtained from the server.

其中,所述商品详情页面可以是电商平台提供的用于展示商品的页面,页面上可以展示商品的名称、价格、规格、评论、详情等,还可以展示商品对应的视频。The product details page may be a page provided by an e-commerce platform for displaying products, and the page may display the name, price, specifications, comments, details, etc. of the product, and may also display the video corresponding to the product.

步骤802、通过预先构建的目标分割模型，基于循环特征依次对所述视频中待处理的各帧图像进行处理，从各帧图像中分割出目标商品；其中，所述目标分割模型包括循环特征更新模型，用于在视频处理过程中，根据当前处理的图像对所述循环特征进行更新。Step 802: Sequentially process each frame to be processed in the video based on the loop feature through a pre-built target segmentation model, and segment the target commodity from each frame; where the target segmentation model includes a loop feature update model, which is used to update the loop feature according to the currently processed image during video processing.

可选的，对于所述视频中待处理的各个图像可以依次进行如下处理：根据当前待处理的图像对应的图像特征与循环特征，确定所述图像的目标分割结果，并将所述图像特征和循环特征输入到循环特征更新模型，得到更新后的循环特征，以根据更新后的循环特征处理所述视频中下一待处理的图像；其中，所述目标分割结果用于表征所述目标商品在图像中的位置。Optionally, each image to be processed in the video may be processed in turn as follows: determine the target segmentation result of the image according to the image feature and the loop feature corresponding to the current image to be processed, and input the image feature and the loop feature into the loop feature update model to obtain an updated loop feature, so as to process the next image to be processed in the video according to the updated loop feature; where the target segmentation result is used to characterize the position of the target commodity in the image.

可选的,可以参照前述任一实施例,将其中的目标对象替换为目标商品,对视频中的图像进行处理,得到各个图像的目标分割结果。Optionally, by referring to any of the foregoing embodiments, the target object therein may be replaced with a target commodity, and the images in the video may be processed to obtain target segmentation results of each image.

步骤803、根据所述视频的目标分割结果,生成目标商品的目标视频和/或目标图像。Step 803: Generate a target video and/or a target image of the target product according to the target segmentation result of the video.

可选的,所述目标图像可以为所述待处理的视频中所述目标对象所在的图像,或者,在所述目标对象所在的图像中去除背景后得到的图像;所述目标视频可以是由目标图像构成的视频。Optionally, the target image may be the image where the target object is located in the video to be processed, or an image obtained after removing the background from the image where the target object is located; the target video may be an image obtained by A video composed of target images.

可选的,可以根据目标分割结果对各个图像进行抠图,提取出各个图像中的目标商品,形成目标图像或目标视频,该图像或视频不包含原视频中的背景,可以作为视频制作素材提供给商家进行编辑,或者,可以从原视频中提取出现目标商品的图像或视频片段,作为目标图像或目标视频。Optionally, each image can be cut out according to the target segmentation result, and the target product in each image can be extracted to form a target image or target video. The image or video does not contain the background in the original video and can be provided as a video production material. Editing for merchants, or extracting images or video clips in which the target product appears from the original video, as the target image or target video.

可选的，还可以获取用户输入的筛选条件，根据所述筛选条件对所述目标图像进行筛选；将筛选得到的图像更新到素材库，所述素材库用于制作目标商品的推广视频和/或推广图像。Optionally, filtering conditions input by the user may also be obtained, and the target images are filtered according to the filtering conditions; the filtered images are updated to a material library, and the material library is used to produce promotional videos and/or promotional images of the target commodity.

通过基于用户的需求对生成的目标图像进行筛选,并将筛选后的图像用作商品推广的素材,能够从原视频中精准地提取出商品素材,满足用户的个性化需求。By screening the generated target images based on the needs of users, and using the screened images as materials for product promotion, the product materials can be accurately extracted from the original video to meet the personalized needs of users.

在一可选的实现方式中,还可以展示所述视频中的基准图像,以供用户在所述基准图像上标注目标商品所在位置,并根据用户的标注确定所述基准图像的目标分割结果;和/或,获取目标商品在商品详情页面对应的主图,并对所述视频中的基准图像进行语义分割,确定所述基准图像中的至少一个对象,将所述至少一个对象对应的图像区域分别与目标商品的主图进行比对,得到所述基准图像的目标分割结果;其中,所述基准图像及对应的目标分割结果用于确定初始的循环特征。In an optional implementation manner, the reference image in the video can also be displayed, so that the user can mark the location of the target product on the reference image, and determine the target segmentation result of the reference image according to the user's label; And/or, obtain the main image corresponding to the target product on the product detail page, perform semantic segmentation on the reference image in the video, determine at least one object in the reference image, and divide the image area corresponding to the at least one object. Comparing with the main image of the target product respectively, the target segmentation result of the reference image is obtained; wherein, the reference image and the corresponding target segmentation result are used to determine the initial cycle feature.

示例性地,可以将第一帧图像作为基准图像,由用户在第一帧图像中标注目标商品的位置,或者,可以从商品详情页面获取目标商品的主图,将主图与第一帧图像中的各个对象进行比较,确定目标商品,从而得到目标分割结果,用于构造初始循环特征。Exemplarily, the first frame image can be used as the reference image, and the user can mark the position of the target product in the first frame image, or the main image of the target product can be obtained from the product details page, and the main image and the first frame image can be compared. Compare each object in the target product to determine the target product, so as to obtain the target segmentation result, which is used to construct the initial cycle feature.

综上，本实施例提供的视频目标分割方法，可以确定待处理的视频以及所述视频中待分割的目标商品，通过预先构建的目标分割模型，基于循环特征依次对所述视频中待处理的各帧图像进行处理，从各帧图像中分割出目标商品，其中，所述目标分割模型包括循环特征更新模型，用于在视频处理过程中，根据当前处理的图像对所述循环特征进行更新，根据所述视频的目标分割结果，生成目标商品的目标视频和/或目标图像，由于循环特征可以不随着视频的推进而逐渐增大，有效减少了内存占用，避免长视频因内存不够而影响分割效果，降低了视频目标分割算法的成本，提高了商品视频的目标分割效果，提升商品视频或图像的制作效率和准确性。To sum up, the video object segmentation method provided by this embodiment can determine the video to be processed and the target commodity to be segmented in the video, sequentially process each frame to be processed in the video based on the loop feature through a pre-built target segmentation model, and segment the target commodity from each frame, where the target segmentation model includes a loop feature update model used to update the loop feature according to the currently processed image during video processing; a target video and/or target image of the target commodity is then generated according to the target segmentation result of the video. Since the loop feature does not have to grow as the video progresses, memory usage is effectively reduced, long videos are prevented from suffering degraded segmentation due to insufficient memory, the cost of the video object segmentation algorithm is lowered, the object segmentation effect on commodity videos is improved, and the efficiency and accuracy of producing commodity videos or images are increased.

在视频会议领域,本申请实施例还提供一种视频目标分割方法,包括:In the field of video conferencing, an embodiment of the present application also provides a video target segmentation method, including:

确定待处理的会议视频以及目标参会人员;Determine pending meeting videos and target participants;

通过预先构建的目标分割模型，基于循环特征依次对所述会议视频中待处理的各帧图像进行处理，从各帧图像中分割出目标参会人员；其中，所述目标分割模型包括循环特征更新模型，用于在视频处理过程中，根据当前处理的图像对所述循环特征进行更新；sequentially processing each frame to be processed in the conference video based on the loop feature through a pre-built target segmentation model, and segmenting the target participant from each frame; where the target segmentation model includes a loop feature update model, which is used to update the loop feature according to the currently processed image during video processing;

根据所述目标分割结果,对所述会议视频的背景进行虚化或添加虚拟背景。According to the target segmentation result, the background of the conference video is blurred or a virtual background is added.

示例性地，视频会议的目标参会人员使用的设备或服务器，可以将进行背景虚化或添加虚拟背景后的视频发送给其他参会人员使用的设备以将视频播放给其他参会人员，能够有效隐藏用户所在环境，提高会议效率，保护用户隐私。可选的，虚拟背景可以由用户设置，满足用户的个性化需求。Exemplarily, the device or server used by the target participant in the video conference can send the video with the background blurred or with a virtual background added to the devices used by the other participants, so that the video is played to them; this effectively hides the user's surroundings, improves meeting efficiency, and protects user privacy. Optionally, the virtual background can be set by the user to meet the user's personalized needs.
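A toy compositing step consistent with this use case is shown below; it assumes mask is the per-pixel probability that a pixel belongs to the target participant (1 for the person, 0 for the background), which is what the target segmentation result provides.

    def apply_virtual_background(frame, mask, virtual_background):
        """frame and virtual_background: (3, H, W) tensors in [0, 1]; mask: (1, H, W) tensor in [0, 1]."""
        # Keep the person from the original frame and fill the rest with the virtual background.
        return mask * frame + (1.0 - mask) * virtual_background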

在短视频领域,本申请实施例还提供一种视频目标分割方法,包括:In the field of short videos, an embodiment of the present application also provides a video target segmentation method, including:

确定待处理的长视频以及所述视频中待分割的目标人物;Determine the long video to be processed and the target person to be segmented in the video;

通过预先构建的目标分割模型,基于循环特征依次对所述长视频中待处理的各帧图像进行处理,从各帧图像中分割出目标人物;其中,所述目标分割模型包括循环特征更新模型,用于在视频处理过程中,根据当前处理的图像对所述循环特征进行更新;Through the pre-built target segmentation model, each frame of images to be processed in the long video is sequentially processed based on the cyclic feature, and the target person is segmented from each frame of image; wherein, the target segmentation model includes a cyclic feature update model, for updating the loop feature according to the currently processed image during the video processing;

根据所述目标分割结果,生成目标人物对应的短视频。According to the target segmentation result, a short video corresponding to the target person is generated.

其中,所述目标人物可以是用户指定的人物,也可以是满足一定要求的人物,例如,视频中做出某些动作的人物。The target character may be a character designated by the user, or may be a character that meets certain requirements, for example, a character who performs certain actions in the video.

示例性地,在一段舞蹈视频中,可以将跳舞的人物从视频中分割出来,用作素材来制作短视频或者其它类型的视频,能够有效提高短视频的制作效果。Exemplarily, in a dance video, dancing characters can be segmented from the video and used as materials to produce a short video or other types of videos, which can effectively improve the production effect of the short video.

在本申请各实施例中,方法的执行主体可以根据实际需要来设置。示例性地,可以由终端设备执行,或者,可以由服务器执行,或者,部分步骤由终端设备执行,部分步骤由服务器执行。In each embodiment of the present application, the execution body of the method may be set according to actual needs. Exemplarily, it may be performed by a terminal device, or may be performed by a server, or some steps may be performed by a terminal device, and some steps may be performed by a server.

对应于上述视频目标分割方法,本申请实施例还提供一种视频目标分割装置,所述装置可以包括:Corresponding to the above video object segmentation method, the embodiment of the present application further provides a video object segmentation device, and the device may include:

确定模块,用于确定待处理的视频以及所述视频中待分割的目标对象;A determination module, used to determine the video to be processed and the target object to be segmented in the video;

处理模块,用于通过预先构建的目标分割模型,基于循环特征依次对所述视频中待处理的各帧图像进行处理,从各帧图像中分割出目标对象;其中,所述目标分割模型包括循环特征更新模型,用于在视频处理过程中,根据当前处理的图像对所述循环特征进行更新。The processing module is used for sequentially processing each frame of images to be processed in the video based on the cycle feature through a pre-built target segmentation model, and segmenting the target object from each frame image; wherein, the target segmentation model includes a cycle The feature update model is used to update the loop feature according to the currently processed image during the video processing.

在本申请的一个或多个实施例中,可选的,所述目标分割模型还包括:编码器、特征读取模块、解码器;In one or more embodiments of the present application, optionally, the target segmentation model further includes: an encoder, a feature reading module, and a decoder;

其中,在处理视频中的任意一帧图像时,所述编码器用于提取所述图像对应的图像特征;Wherein, when processing any frame of image in the video, the encoder is used to extract the image feature corresponding to the image;

所述特征读取模块用于计算得到所述图像特征和当前的循环特征的关联信息;The feature reading module is used to calculate the associated information of the image feature and the current loop feature;

所述解码器用于根据所述关联信息,得到所述图像的目标分割结果。The decoder is configured to obtain a target segmentation result of the image according to the associated information.

在本申请的一个或多个实施例中,可选的,所述目标对象的数量为至少一个,所述处理模块还用于:In one or more embodiments of the present application, optionally, the number of the target object is at least one, and the processing module is further configured to:

根据所述视频的目标分割结果,得到各个目标对象的目标视频;其中,所述目标对象的目标视频为在所述待处理的视频中去除所述目标对象的背景后得到的视频,或者,所述目标视频由所述待处理的视频中目标对象所在的多帧图像构成;According to the target segmentation result of the video, the target video of each target object is obtained; wherein, the target video of the target object is the video obtained after removing the background of the target object from the video to be processed, or, the The target video is composed of multiple frames of images where the target object is located in the video to be processed;

输出各个目标对象的目标视频和/或目标视频中的至少部分图像。A target video of each target object and/or at least part of images in the target video is output.

在本申请的一个或多个实施例中,可选的,所述处理模块在输出各个目标对象的目标视频和/或目标视频中的至少部分图像时,具体用于:In one or more embodiments of the present application, optionally, when outputting the target video of each target object and/or at least part of the image in the target video, the processing module is specifically configured to:

展示所述目标视频和/或所述目标视频中的至少部分图像;displaying the target video and/or at least part of the images in the target video;

根据用户对所述目标视频和/或所述目标视频中的至少部分图像的编辑操作,生成所述目标对象的推广视频和/或推广图像。According to the user's editing operation on the target video and/or at least part of the images in the target video, a promotion video and/or promotion image of the target object is generated.

在本申请的一个或多个实施例中,可选的,所述处理模块对所述视频中图像的处理操作包括:In one or more embodiments of the present application, optionally, the processing operations performed by the processing module on the images in the video include:

获取视频中待处理的图像以及当前的循环特征,并提取所述图像对应的图像特征;其中,所述循环特征为根据所述视频中已处理的图像.确定的特征;Obtain the image to be processed in the video and the current loop feature, and extract the image feature corresponding to the image; wherein, the loop feature is a feature determined according to the processed image in the video;

根据所述图像特征与循环特征，确定所述待处理的图像的目标分割结果，并将所述图像特征和循环特征输入到循环特征更新模型，得到更新后的循环特征，以根据更新后的循环特征处理所述视频中下一帧待处理的图像。determining the target segmentation result of the image to be processed according to the image feature and the loop feature, and inputting the image feature and the loop feature into the loop feature update model to obtain an updated loop feature, so as to process the next image to be processed in the video according to the updated loop feature.

在本申请的一个或多个实施例中,可选的,所述确定模块具体用于:In one or more embodiments of the present application, optionally, the determining module is specifically configured to:

获取待处理的视频,并确定所述视频中的至少一帧基准图像及各基准图像对应的目标分割结果;Obtain the video to be processed, and determine at least one frame of reference image in the video and the target segmentation result corresponding to each reference image;

所述方法还包括:根据所述至少一帧基准图像及对应的目标分割结果,确定初始的循环特征;The method further includes: determining an initial loop feature according to the at least one frame of reference image and the corresponding target segmentation result;

其中,所述基准图像的目标分割结果是通过用户在所述基准图像中对目标对象进行标注确定的,或者,是根据目标对象的预设图像对所述基准图像进行自动标注确定的;Wherein, the target segmentation result of the reference image is determined by annotating the target object in the reference image by the user, or is determined by automatically labeling the reference image according to a preset image of the target object;

所述视频中待处理的图像为除所述基准图像以外的其它图像;所述初始的循环特征用于对第一帧待处理的图像进行分割。The images to be processed in the video are other images except the reference image; the initial cycle feature is used to segment the images to be processed in the first frame.

在本申请的一个或多个实施例中,可选的,所述确定模块还用于执行下述至少一项:In one or more embodiments of the present application, optionally, the determining module is further configured to perform at least one of the following:

展示所述视频中的基准图像,以供用户在所述基准图像上标注目标对象所在位置,并根据用户的标注确定所述基准图像的目标分割结果;Displaying the reference image in the video, so that the user can mark the location of the target object on the reference image, and determine the target segmentation result of the reference image according to the user's label;

对所述视频中的基准图像进行语义分割,确定所述基准图像中的至少一个对象;将所述至少一个对象对应的图像区域分别与目标对象的预设图像进行比对,得到所述基准图像的目标分割结果;Semantically segment the reference image in the video to determine at least one object in the reference image; and compare the image area corresponding to the at least one object with the preset image of the target object to obtain the reference image The target segmentation result;

对所述视频中的基准图像进行语义分割,确定所述基准图像中的至少一个对象;展示所述至少一个对象对应的图像区域,获取用户从所述至少一个对象中选择的目标对象,并根据目标对象对应的图像区域确定目标分割结果。Semantically segment the reference image in the video to determine at least one object in the reference image; display the image area corresponding to the at least one object, obtain the target object selected by the user from the at least one object, and perform The image area corresponding to the target object determines the target segmentation result.

在本申请的一个或多个实施例中,可选的,所述处理模块在将所述图像特征和循环特征输入到循环特征更新模型,得到更新后的循环特征时,具体用于:In one or more embodiments of the present application, optionally, when the processing module inputs the image feature and the cyclic feature into the cyclic feature update model to obtain the updated cyclic feature, the processing module is specifically used for:

将所述图像特征和循环特征输入到提炼模块,得到提炼结果;其中,所述提炼模块用于对所述图像特征和循环特征进行融合;Inputting the image features and the cyclic features into a refining module to obtain a refining result; wherein the refining module is used to fuse the image features and the cyclic features;

将所述提炼结果输入到增强模块,得到增强结果;其中,所述增强模块用于对提炼结果进行池化操作;Inputting the refining result to an enhancement module to obtain an enhancement result; wherein, the enhancement module is used to perform a pooling operation on the refining result;

将所述提炼结果与增强结果相加后输入到压缩模块,得到更新后的预设长度的循环特征。The refined result and the enhancement result are added and input to the compression module to obtain the updated cyclic feature of the preset length.

在本申请的一个或多个实施例中,可选的,所述提炼模块包括下采样层、卷积层、矩阵计算单元、归一化层;将所述图像特征和循环特征输入到提炼模块,得到提炼结果,包括:In one or more embodiments of the present application, optionally, the refining module includes a downsampling layer, a convolution layer, a matrix computing unit, and a normalization layer; input the image features and cyclic features into the refining module , get the refined results, including:

通过下采样层对所述图像特征和循环特征进行下采样,并将下采样后的图像特征和循环特征分别输入到卷积层,得到图像特征对应的第一特征矩阵和循环特征对应的第二特征矩阵;The image features and cyclic features are down-sampled through the down-sampling layer, and the down-sampled image features and cyclic features are respectively input to the convolution layer to obtain a first feature matrix corresponding to the image features and a second feature matrix corresponding to the cyclic features. feature matrix;

通过矩阵计算单元对所述第一特征矩阵和第二特征矩阵进行矩阵相关运算,并将运算结果输入到归一化层,得到提炼结果。本申请实施例提供的视频目标分割装置,可用于执行上述图1至图7所示实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。A matrix correlation operation is performed on the first feature matrix and the second feature matrix by a matrix computing unit, and the operation result is input to the normalization layer to obtain a refined result. The video object segmentation apparatus provided in the embodiments of the present application can be used to implement the technical solutions of the embodiments shown in FIG. 1 to FIG. 7 , and the implementation principles and technical effects thereof are similar, and details are not described herein again in this embodiment.

本申请实施例还提供另一种视频目标分割装置,所述装置可以包括:The embodiment of the present application further provides another video object segmentation apparatus, and the apparatus may include:

目标商品确定模块,用于确定待处理的视频以及所述视频中待分割的目标商品;a target commodity determination module, used to determine the video to be processed and the target commodity to be segmented in the video;

商品视频处理模块,用于通过预先构建的目标分割模型,基于循环特征依次对所述视频中待处理的各帧图像进行处理,从各帧图像中分割出目标商品;其中,所述目标分割模型包括循环特征更新模型,用于在视频处理过程中,根据当前处理的图像对所述循环特征进行更新;The commodity video processing module is used to sequentially process each frame of images to be processed in the video based on the cycle feature through a pre-built target segmentation model, and segment the target commodity from each frame of image; wherein, the target segmentation model Including a cyclic feature update model, used for updating the cyclic feature according to the currently processed image in the video processing process;

生成模块,用于根据所述视频的目标分割结果,生成目标商品的目标视频和/或目标图像。A generating module, configured to generate a target video and/or a target image of the target product according to the target segmentation result of the video.

在本申请的一个或多个实施例中,可选的,所述目标图像为所述待处理的视频中所述目标对象所在的图像,或者,在所述目标对象所在的图像中去除背景后得到的图像;所述生成模块还用于:In one or more embodiments of the present application, optionally, the target image is the image where the target object is located in the video to be processed, or, after removing the background from the image where the target object is located the resulting image; the generation module is also used to:

获取用户输入的筛选条件,根据所述筛选条件对所述目标图像进行筛选;Obtain the filter conditions input by the user, and filter the target image according to the filter conditions;

将筛选得到的图像更新到素材库,所述素材库用于制作目标商品的推广视频和/或推广图像。本申请实施例提供的视频目标分割装置,可用于执行上述图8所示实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。The filtered images are updated to the material library, where the material library is used to make promotional videos and/or promotional images of the target product. The video object segmentation apparatus provided in the embodiment of the present application can be used to implement the technical solution of the embodiment shown in FIG. 8 , and the implementation principle and technical effect thereof are similar, and are not repeated here in this embodiment.

图9为本申请实施例提供的一种电子设备的结构示意图。如图9所示,本实施例的电子设备可以包括:FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 9 , the electronic device of this embodiment may include:

至少一个处理器901;以及at least one processor 901; and

与所述至少一个处理器通信连接的存储器902;a memory 902 in communication with the at least one processor;

其中,所述存储器902存储有可被所述至少一个处理器901执行的指令,所述指令被所述至少一个处理器901执行,以使所述电子设备执行如上述任一实施例所述的方法。Wherein, the memory 902 stores instructions that can be executed by the at least one processor 901, and the instructions are executed by the at least one processor 901, so that the electronic device executes any of the foregoing embodiments. method.

可选地,存储器902既可以是独立的,也可以跟处理器901集成在一起。Optionally, the memory 902 may be independent or integrated with the processor 901 .

本实施例提供的电子设备的实现原理和技术效果可以参见前述各实施例,此处不再赘述。For the implementation principle and technical effect of the electronic device provided in this embodiment, reference may be made to the foregoing embodiments, and details are not repeated here.

本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现前述任一实施例所述的方法。Embodiments of the present application further provide a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the method described in any of the foregoing embodiments is implemented.

本申请实施例还提供一种计算机程序产品,包括计算机程序,该计算机程序被处理器执行时实现前述任一实施例所述的方法。Embodiments of the present application further provide a computer program product, including a computer program, which implements the method described in any of the foregoing embodiments when the computer program is executed by a processor.

在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are only illustrative. For example, the division of the modules is only a logical function division. In actual implementation, there may be other division methods. For example, multiple modules may be combined or integrated. to another system, or some features can be ignored, or not implemented.

上述以软件功能模块的形式实现的集成的模块,可以存储在一个计算机可读取存储介质中。上述软件功能模块存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器执行本申请各个实施例所述方法的部分步骤。The above-mentioned integrated modules implemented in the form of software functional modules may be stored in a computer-readable storage medium. The above-mentioned software function modules are stored in a storage medium, and include several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some steps of the methods described in the various embodiments of the present application.

应理解,上述处理器可以是中央处理单元(Central Processing Unit,简称CPU),还可以是其它通用处理器、数字信号处理器(Digital Signal Processor,简称DSP)、专用集成电路(Application Specific Integrated Circuit,简称ASIC)等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合申请所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。存储器可能包含高速RAM存储器,也可能还包括非易失性存储NVM,例如至少一个磁盘存储器,还可以为U盘、移动硬盘、只读存储器、磁盘或光盘等。It should be understood that the above-mentioned processor may be a central processing unit (Central Processing Unit, CPU for short), and may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuit (Application Specific Integrated Circuit, Referred to as ASIC) and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the application can be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The memory may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one magnetic disk memory, and may also be a U disk, a removable hard disk, a read-only memory, a magnetic disk or an optical disk, and the like.

上述存储介质可以是由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。存储介质可以是通用或专用计算机能够存取的任何可用介质。The above-mentioned storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Except programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. A storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于专用集成电路(Application Specific Integrated Circuits,简称ASIC)中。当然,处理器和存储介质也可以作为分立组件存在于电子设备或主控设备中。An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium can also be an integral part of the processor. The processor and the storage medium may be located in application specific integrated circuits (Application Specific Integrated Circuits, ASIC for short). Of course, the processor and the storage medium may also exist in the electronic device or the host device as discrete components.

需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or apparatus that includes the element.

上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of this application.

The above are only preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (14)

1. A method for segmenting video objects, comprising:
determining a video to be processed and a target object to be segmented in the video;
sequentially processing, through a pre-constructed target segmentation model and based on a cyclic feature, each frame of image to be processed in the video, and segmenting the target object from each frame of image; wherein the target segmentation model comprises a cyclic feature update model configured to update the cyclic feature according to the currently processed image during video processing.
2. The method of claim 1, wherein the target segmentation model further comprises an encoder, a feature reading module and a decoder;
when any frame of image in the video is processed, the encoder is configured to extract an image feature corresponding to the image;
the feature reading module is configured to compute association information between the image feature and the current cyclic feature;
and the decoder is configured to obtain a target segmentation result of the image according to the association information.
3. The method of claim 1, wherein the number of target objects is at least one, and the method further comprises:
obtaining a target video of each target object according to the target segmentation result of the video; wherein the target video of a target object is a video obtained by removing the background of the target object from the video to be processed, or is composed of the frames of the video to be processed in which the target object appears;
and outputting the target video of each target object and/or at least part of the images in the target video.
4. The method according to claim 3, wherein outputting the target video of each target object and/or at least part of the images in the target video comprises:
displaying the target video and/or at least part of the images in the target video;
and generating a promotional video and/or a promotional image of the target object according to a user's editing operation on the target video and/or at least part of the images in the target video.
5. The method according to any of claims 1-4, wherein processing an image in the video comprises:
acquiring an image to be processed in the video and the current cyclic feature, and extracting an image feature corresponding to the image; wherein the cyclic feature is a feature determined from the images of the video that have already been processed;
and determining a target segmentation result of the image to be processed according to the image feature and the cyclic feature, inputting the image feature and the cyclic feature into the cyclic feature update model to obtain an updated cyclic feature, and processing the next frame of image to be processed in the video according to the updated cyclic feature.
6. The method of claim 5, wherein determining a video to be processed and a target object to be segmented in the video comprises:
acquiring a video to be processed, and determining at least one frame of reference image in the video and a target segmentation result corresponding to each reference image;
the method further comprises: determining an initial cyclic feature according to the at least one frame of reference image and the corresponding target segmentation result;
wherein the target segmentation result of the reference image is determined by a user labeling the target object in the reference image, or by automatically labeling the reference image according to a preset image of the target object;
the images to be processed in the video are the images other than the reference image, and the initial cyclic feature is used to segment the first frame of image to be processed.
7. The method of claim 6, further comprising at least one of:
displaying a reference image in the video so that a user marks the position of the target object on the reference image, and determining a target segmentation result of the reference image according to the user's marking;
performing semantic segmentation on a reference image in the video to determine at least one object in the reference image, and comparing the image region corresponding to each of the at least one object with a preset image of the target object to obtain a target segmentation result of the reference image;
and performing semantic segmentation on a reference image in the video to determine at least one object in the reference image, displaying the image region corresponding to the at least one object, acquiring a target object selected by the user from the at least one object, and determining a target segmentation result according to the image region corresponding to the target object.
8. The method of claim 5, wherein inputting the image feature and the cyclic feature into the cyclic feature update model to obtain an updated cyclic feature comprises:
inputting the image feature and the cyclic feature into a refinement module to obtain a refined result; the refinement module is configured to fuse the image feature and the cyclic feature;
inputting the refined result into an enhancement module to obtain an enhanced result; the enhancement module is configured to perform a pooling operation on the refined result;
and adding the refined result and the enhanced result, and inputting the sum into a compression module to obtain an updated cyclic feature of a preset length.
9. The method of claim 8, wherein the refinement module comprises a downsampling layer, a convolutional layer, a matrix computation unit and a normalization layer; and inputting the image feature and the cyclic feature into the refinement module to obtain a refined result comprises:
downsampling the image feature and the cyclic feature through the downsampling layer, and inputting the downsampled image feature and the downsampled cyclic feature into the convolutional layer respectively, to obtain a first feature matrix corresponding to the image feature and a second feature matrix corresponding to the cyclic feature;
and performing a matrix correlation operation on the first feature matrix and the second feature matrix through the matrix computation unit, and inputting the operation result into the normalization layer to obtain the refined result.
10. A method for segmenting video objects, comprising:
determining a video to be processed and a target commodity to be segmented in the video;
sequentially processing, through a pre-constructed target segmentation model and based on a cyclic feature, each frame of image to be processed in the video, and segmenting the target commodity from each frame of image; wherein the target segmentation model comprises a cyclic feature update model configured to update the cyclic feature according to the currently processed image during video processing;
and generating a target video and/or a target image of the target commodity according to the target segmentation result of the video.
11. The method according to claim 10, wherein the target image is an image of the target commodity in the video to be processed, or an image obtained by removing the background from that image; and the method further comprises:
acquiring a screening condition input by a user, and screening the target images according to the screening condition;
and updating the screened images to a material library, wherein the material library is used for producing a promotional video and/or a promotional image of the target commodity.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the method of any of claims 1-11.
13. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-11.
14. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-11.
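
For readability, the following sketch illustrates how the frame-by-frame inference loop described in claims 1, 2 and 5 might be organized. It is an editor's illustration written in PyTorch-style Python, not the patented implementation: all module internals, tensor shapes and names (Encoder, FeatureReader, Decoder, CyclicFeatureUpdater, the 1/4-resolution features, the 256-channel width, and so on) are placeholder assumptions rather than details taken from the disclosure.

```python
# Illustrative sketch only (editor's addition, not part of the disclosure).
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Stand-in for the encoder of claim 2; a real backbone would go here."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=4, stride=4)

    def forward(self, frame):
        return self.conv(frame)  # image feature at 1/4 resolution


class FeatureReader(nn.Module):
    """Stand-in for the feature reading module: associates the image feature
    with the current cyclic feature (here simply concatenation + 1x1 conv)."""
    def __init__(self, dim=256):
        super().__init__()
        self.mix = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, image_feature, cyclic_feature):
        return self.mix(torch.cat([image_feature, cyclic_feature], dim=1))


class Decoder(nn.Module):
    """Stand-in for the decoder: turns the association info into a mask."""
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Conv2d(dim, 1, kernel_size=1)
        self.up = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

    def forward(self, association):
        return torch.sigmoid(self.up(self.head(association)))


class CyclicFeatureUpdater(nn.Module):
    """Stand-in for the cyclic feature update model of claim 1: the output keeps
    a fixed size, so memory does not grow as more frames are processed."""
    def __init__(self, dim=256):
        super().__init__()
        self.update = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, image_feature, cyclic_feature):
        return self.update(torch.cat([image_feature, cyclic_feature], dim=1))


class TargetSegmentationModel(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = Encoder(dim)
        self.reader = FeatureReader(dim)
        self.decoder = Decoder(dim)
        self.updater = CyclicFeatureUpdater(dim)

    def segment_video(self, frames, initial_cyclic_feature):
        """Frame-by-frame processing based on the cyclic feature (claims 1 and 5)."""
        cyclic_feature = initial_cyclic_feature
        masks = []
        for frame in frames:
            image_feature = self.encoder(frame)                           # extract image feature
            association = self.reader(image_feature, cyclic_feature)      # associate with cyclic feature
            masks.append(self.decoder(association))                       # per-frame segmentation result
            cyclic_feature = self.updater(image_feature, cyclic_feature)  # update for the next frame
        return masks


if __name__ == "__main__":
    model = TargetSegmentationModel()
    frames = [torch.randn(1, 3, 256, 256) for _ in range(4)]  # dummy 4-frame video
    init_cyc = torch.zeros(1, 256, 64, 64)                    # e.g. built from a labeled reference frame
    print(model.segment_video(frames, init_cyc)[0].shape)     # torch.Size([1, 1, 256, 256])
```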
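
Claims 8 and 9 describe the internal structure of the cyclic feature update model: a refinement module (downsampling, convolution, matrix correlation, normalization), an enhancement module (pooling), a residual addition, and a compression module that keeps the updated cyclic feature at a preset length. The standalone sketch below is again only an illustration under assumptions made by the editor: average pooling, softmax as the normalization layer, a 1x1 convolution as the compression module, and the final projection of the normalized correlation back onto a spatial feature map are all editor choices, not details from the disclosure.

```python
# Illustrative sketch only (editor's addition, not part of the disclosure).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinementModule(nn.Module):
    """Downsampling + convolution + matrix correlation + normalization (claim 9)."""
    def __init__(self, dim=256):
        super().__init__()
        self.down = nn.AvgPool2d(kernel_size=2)              # downsampling layer (assumed average pooling)
        self.proj_img = nn.Conv2d(dim, dim, kernel_size=1)   # convolution for the image feature
        self.proj_cyc = nn.Conv2d(dim, dim, kernel_size=1)   # convolution for the cyclic feature

    def forward(self, image_feature, cyclic_feature):
        img = self.proj_img(self.down(image_feature))        # first feature matrix
        cyc = self.proj_cyc(self.down(cyclic_feature))       # second feature matrix
        b, c, h, w = img.shape
        img = img.flatten(2)                                 # (B, C, HW_img)
        cyc = cyc.flatten(2)                                 # (B, C, HW_cyc)
        corr = torch.bmm(img.transpose(1, 2), cyc)           # matrix correlation operation
        attn = F.softmax(corr, dim=-1)                       # normalization layer (softmax assumed)
        refined = torch.bmm(cyc, attn.transpose(1, 2))       # project back to spatial layout (editor assumption)
        return refined.view(b, c, h, w)


class CyclicFeatureUpdateModel(nn.Module):
    """Refinement -> enhancement (pooling) -> residual add -> compression (claim 8)."""
    def __init__(self, dim=256):
        super().__init__()
        self.refine = RefinementModule(dim)
        self.enhance = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)  # enhancement module (pooling)
        self.compress = nn.Conv2d(dim, dim, kernel_size=1)               # compression module

    def forward(self, image_feature, cyclic_feature):
        refined = self.refine(image_feature, cyclic_feature)
        enhanced = self.enhance(refined)                      # pooling on the refined result
        return self.compress(refined + enhanced)              # updated cyclic feature of fixed size


if __name__ == "__main__":
    updater = CyclicFeatureUpdateModel(dim=256)
    img_feat = torch.randn(1, 256, 64, 64)
    cyc_feat = torch.randn(1, 256, 64, 64)
    print(updater(img_feat, cyc_feat).shape)                  # torch.Size([1, 256, 32, 32])
```

In this sketch the compressed output has a fixed channel count and spatial size, which is what allows the cyclic feature to be fed back frame after frame without growing.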
CN202210108563.7A 2022-01-28 2022-01-28 Video object segmentation method, device, storage medium and program product Pending CN114445750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210108563.7A CN114445750A (en) 2022-01-28 2022-01-28 Video object segmentation method, device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210108563.7A CN114445750A (en) 2022-01-28 2022-01-28 Video object segmentation method, device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN114445750A true CN114445750A (en) 2022-05-06

Family

ID=81371856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210108563.7A Pending CN114445750A (en) 2022-01-28 2022-01-28 Video object segmentation method, device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN114445750A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100046830A1 (en) * 2008-08-22 2010-02-25 Jue Wang Automatic Video Image Segmentation
CN110188754A (en) * 2019-05-29 2019-08-30 腾讯科技(深圳)有限公司 Image segmentation method and device, model training method and device
CN112818955A (en) * 2021-03-19 2021-05-18 北京市商汤科技开发有限公司 Image segmentation method and device, computer equipment and storage medium
CN113506316A (en) * 2021-05-27 2021-10-15 北京迈格威科技有限公司 Method and device for segmenting video object and network model training method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINGXING LI et al.: "Recurrent Dynamic Embedding for Video Object Segmentation", CVPR 2022, 27 September 2022 (2022-09-27), pages 1332-1341 *
YAO Rui et al.: "Video object segmentation with spatio-temporal graph convolutional network and attention mechanism", Journal of Image and Graphics, vol. 26, no. 10, 16 October 2021 (2021-10-16), pages 2376-2387 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294506A (en) * 2022-10-09 2022-11-04 深圳比特微电子科技有限公司 Video highlight detection method and device
CN115294506B (en) * 2022-10-09 2022-12-09 深圳比特微电子科技有限公司 Video highlight detection method and device
CN119007204A (en) * 2024-07-24 2024-11-22 广州逸虎网络科技有限公司 Efficiency improving method, system, equipment and medium for acquiring and downloading advertisement materials
CN119007204B (en) * 2024-07-24 2025-07-15 广州逸虎网络科技有限公司 Efficiency improving method, system, equipment and medium for acquiring and downloading advertisement materials

Similar Documents

Publication Publication Date Title
JP7645917B2 (en) A technique for capturing and editing dynamic depth images
CN111738243B (en) Method, device and equipment for selecting face image and storage medium
CN116324878A (en) Segmentation for Image Effects
CN108090497A (en) Video classification methods, device, storage medium and electronic equipment
CN111491187A (en) Video recommendation method, device, equipment and storage medium
CN112954450A (en) Video processing method and device, electronic equipment and storage medium
CN109063776B (en) Image re-recognition network training method and device and image re-recognition method and device
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
JP7213291B2 (en) Method and apparatus for generating images
CN108830787A (en) The method, apparatus and electronic equipment of anamorphose
CN114372931A (en) A target object blurring method, device, storage medium and electronic device
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114445750A (en) Video object segmentation method, device, storage medium and program product
US20180182105A1 (en) Method and system for sharing-oriented personalized route planning via a customizable multimedia approach
CN110210480B (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN114372172A (en) Method and device for generating video cover image, computer equipment and storage medium
CN112700462B (en) Image segmentation method, device, electronic device and storage medium
CN110363814A (en) A kind of method for processing video frequency, device, electronic device and storage medium
CN112950641B (en) Image processing method and device, computer readable storage medium and electronic equipment
CN111046232B (en) Video classification method, device and system
CN118115624A (en) Image layering generation system, method and device based on stable diffusion model
CN111652024B (en) Face display and live broadcast method and device, electronic equipment and storage medium
CN117593801A (en) Biological attack detection method and device
CN114332488B (en) Target tracking method and system integrating salient information and multi-granularity context features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination