CN108111911A - Video data real-time processing method and device based on adaptive tracking frame segmentation

Info

Publication number: CN108111911A
Application number: CN201711423802.3A
Authority: CN
Inventors: 赵鑫; 邱学侃; 颜水成
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: 360 Technology Group Co Ltd
Priority date: 2017-12-25
Filing date: 2017-12-25
Publication date: 2018-06-01
Anticipated expiration: 2037-12-25
Also published as: CN108111911B

Abstract

The invention discloses a method, device, computing device and computer storage medium for real-time processing of video data based on adaptive tracking frame segmentation. The method includes: acquiring the t-th frame image, which contains a specific object, and the tracking frame corresponding to the (t-1)-th frame image; adjusting the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image; performing scene segmentation on a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image; determining the second foreground image of the t-th frame image according to the segmentation result; adding personalized special effects according to the second foreground image to obtain the processed t-th frame image; overwriting the t-th frame image with the processed t-th frame image to obtain processed video data; and displaying the processed video data. This technical solution allows personalized special effects to be added to frame images more accurately and quickly.

Description

Video data real-time processing method and device based on adaptive tracking frame segmentation

Technical field

The present invention relates to the technical field of image processing, and in particular to a method, device, computing device and computer storage medium for real-time processing of video data based on adaptive tracking frame segmentation.

Background

In the prior art, when a user needs to apply personalized processing to a video, such as replacing the background or adding special effects, image segmentation methods are often used to perform scene segmentation on the frame images of the video; image segmentation methods based on deep learning can achieve pixel-level segmentation. However, existing image segmentation methods must perform scene segmentation on the entire content of a frame image, so the amount of data to process is large and processing efficiency is low. In addition, existing image segmentation methods do not take into account the proportion of the frame image occupied by the foreground image. When that proportion is small, pixels that actually belong to the edge of the foreground image are easily classified as background, so the resulting segmentation has low accuracy and poor quality. The image segmentation approaches of the prior art therefore suffer from a large data processing load, low processing efficiency and low segmentation accuracy. As a result, the segmentation results cannot be used to add personalized special effects to the frame images of a video well and accurately, and the processed video data displays poorly.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a method, device, computing device and computer storage medium for real-time processing of video data based on adaptive tracking frame segmentation that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a method for real-time processing of video data based on adaptive tracking frame segmentation is provided. The method is used to process the groups of frame images obtained by dividing a video every n frames. For one of these groups of frame images, the method includes:

Acquiring the t-th frame image, which contains a specific object, in the group of frame images, and the tracking frame corresponding to the (t-1)-th frame image, where t is greater than 1 and the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

Adjusting the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, and performing scene segmentation on a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image;

Determining the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image;

Adding personalized special effects according to the second foreground image to obtain the processed t-th frame image;

Overwriting the t-th frame image with the processed t-th frame image to obtain processed video data;

Displaying the processed video data.

Further, adding personalized special effects according to the second foreground image to obtain the processed t-th frame image further includes:

Extracting key information of a region to be processed from the second foreground image;

Drawing an effect map according to the key information;

Fusing the effect map, the second foreground image and a preset background image to obtain the processed t-th frame image; or fusing the effect map, the second foreground image and a second background image determined according to the segmentation result corresponding to the t-th frame image to obtain the processed t-th frame image.
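As an illustration of the fusion step, the following Python sketch composites three layers pixel by pixel. The layer order (effect map on top, then the second foreground image, then the background), the use of `None` for transparent effect pixels, grayscale pixel values and the function name `fuse_layers` are all assumptions made for this example; the text does not specify how the fusion is carried out.

```python
from typing import List, Optional

Image = List[List[int]]                # grayscale pixels, for brevity
Mask = List[List[int]]                 # 1 = second foreground image, 0 = background
Effect = List[List[Optional[int]]]     # None marks a transparent effect pixel


def fuse_layers(effect: Effect, frame: Image, mask: Mask, background: Image) -> Image:
    """Composite effect map over foreground over background (assumed layer order)."""
    fused: Image = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            if effect[y][x] is not None:
                out_row.append(effect[y][x])      # effect map wins where it is opaque
            elif mask[y][x]:
                out_row.append(pixel)             # keep the foreground pixel of the frame
            else:
                out_row.append(background[y][x])  # otherwise show the background
        fused.append(out_row)
    return fused
```

Passing a preset image as `background` corresponds to the first alternative; passing the frame's own background pixels corresponds to the second.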

Further, the key information is key point information, and drawing the effect map according to the key information further includes:

Looking up a base effect map corresponding to the key point information, or acquiring a base effect map specified by the user;

Calculating, according to the key point information, position information between at least two key points that have a symmetrical relationship;

Processing the base effect map according to the position information to obtain the effect map.
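One way to read "position information between two symmetrical key points" is as a distance and an angle, which can then drive scaling and rotation of the base effect map. The sketch below works under that assumption; the function name `sticker_transform`, the `reference_distance` parameter (the key point spacing the base effect map was designed for) and the choice of scale, angle and midpoint as outputs are hypothetical and not taken from the text.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def sticker_transform(
    left_point: Point, right_point: Point, reference_distance: float
) -> Tuple[float, float, Point]:
    """Return (scale, rotation in degrees, anchor point) for a base effect map.

    left_point and right_point are two symmetrical key points, for example
    the two eyes. reference_distance is the spacing the base map assumes.
    """
    if reference_distance <= 0:
        raise ValueError("reference_distance must be positive")
    dx = right_point[0] - left_point[0]
    dy = right_point[1] - left_point[1]
    distance = math.hypot(dx, dy)
    scale = distance / reference_distance        # enlarge or shrink the base map
    angle = math.degrees(math.atan2(dy, dx))     # tilt of the line joining the points
    anchor = ((left_point[0] + right_point[0]) / 2,
              (left_point[1] + right_point[1]) / 2)
    return scale, angle, anchor
```

For key points at (100, 100) and (160, 100) and a reference distance of 30, the base effect map would be doubled in size, left unrotated, and anchored at (130, 100).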

Further, adding personalized special effects according to the second foreground image to obtain the processed t-th frame image further includes:

Extracting key information of a region to be recognized from the second foreground image;

Recognizing the posture of the specific object according to the key information to obtain a posture recognition result for the specific object;

Determining, according to the posture recognition result of the specific object, the corresponding effect processing command to be responded to for the t-th frame image, and obtaining the processed t-th frame image.

Further, determining, according to the posture recognition result of the specific object, the corresponding effect processing command to be responded to for the t-th frame image, and obtaining the processed t-th frame image further includes:

Determining the corresponding effect processing command to be responded to for the t-th frame image according to the posture recognition result of the specific object and the information, contained in the t-th frame image, about interaction with an interactive object, and obtaining the processed t-th frame image.

Further, the effect processing command to be responded to includes an effect map processing command, a stylization processing command, a brightness processing command, a lighting processing command and/or a tone processing command.
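The mapping from a posture recognition result, optionally combined with interaction information, to an effect processing command can be pictured as a table lookup. The sketch below is only illustrative: the posture labels, the interaction label, the table contents and the fallback rule are invented for the example, and only the five command categories come from the text.

```python
from typing import Dict, Optional, Tuple

# Command categories named in the text.
EFFECT_COMMANDS = ("effect_map", "stylization", "brightness", "lighting", "tone")

# Hypothetical lookup table: (posture, interaction) -> command.
COMMAND_TABLE: Dict[Tuple[str, Optional[str]], str] = {
    ("heart_hands", None): "effect_map",
    ("arms_raised", None): "brightness",
    ("arms_raised", "pointing_at_viewer"): "lighting",
}


def effect_command(posture: str, interaction: Optional[str] = None) -> Optional[str]:
    """Pick a command for a posture, preferring an interaction-specific entry."""
    specific = COMMAND_TABLE.get((posture, interaction))
    if specific is not None:
        return specific
    # Fall back to the posture-only entry; None means no command applies.
    return COMMAND_TABLE.get((posture, None))
```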

Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image further includes:

Performing recognition processing on the t-th frame image to determine the first foreground image for the specific object in the t-th frame image;

Applying the tracking frame corresponding to the (t-1)-th frame image to the t-th frame image;

Adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image.

Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image further includes:

Calculating the proportion of the pixels in the tracking frame corresponding to the (t-1)-th frame image that belong to the first foreground image in the t-th frame image, and taking this proportion as the first foreground pixel ratio of the t-th frame image;

Acquiring the second foreground pixel ratio of the (t-1)-th frame image, where the second foreground pixel ratio of the (t-1)-th frame image is the proportion of the pixels in the tracking frame corresponding to the (t-1)-th frame image that belong to the first foreground image in the (t-1)-th frame image;

Calculating the difference between the first foreground pixel ratio of the t-th frame image and the second foreground pixel ratio of the (t-1)-th frame image;

Judging whether the difference exceeds a preset difference threshold, and if so, adjusting the size of the tracking frame corresponding to the (t-1)-th frame image according to the difference.
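The ratio comparison above can be sketched in Python as follows. The text only says that the size is adjusted "according to the difference", so the specific rule used here (compare the absolute difference with the threshold, then scale the box about its center by `1 + difference`) is an assumption, as are the function names and the binary-mask representation of the first foreground image.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]           # 1 = pixel of the first foreground image, 0 = other
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def foreground_ratio(mask: Mask, box: Box) -> float:
    """Share of the pixels inside the box that belong to the foreground."""
    left, top, right, bottom = box
    area = (right - left) * (bottom - top)
    if area <= 0:
        return 0.0
    foreground = sum(mask[y][x] for y in range(top, bottom) for x in range(left, right))
    return foreground / area


def resize_by_ratio(box: Box, ratio_t: float, ratio_prev: float,
                    threshold: float, width: int, height: int) -> Box:
    """Grow or shrink the box about its center when the ratios differ enough."""
    difference = ratio_t - ratio_prev
    if abs(difference) <= threshold:
        return box                              # change too small, keep the box
    scale = max(1.0 + difference, 0.1)          # assumed rule, floor avoids a zero-size box
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    # Round outwards and clamp to the image bounds.
    return (max(math.floor(cx - half_w), 0), max(math.floor(cy - half_h), 0),
            min(math.ceil(cx + half_w), width), min(math.ceil(cy + half_h), height))
```

Under this rule a larger foreground share in frame t (the object has grown within the box) enlarges the tracking frame, and a smaller share shrinks it.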

Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image further includes:

Calculating the distance from the first foreground image in the t-th frame image to each border of the tracking frame corresponding to the (t-1)-th frame image;

Adjusting the size of the tracking frame corresponding to the (t-1)-th frame image according to the distances and a preset distance threshold.
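A minimal sketch of the distance-based adjustment follows. It assumes that the first foreground image is summarized by its bounding box and that each border of the tracking frame is moved so that its gap to the foreground equals the preset distance threshold; the text does not state the exact rule, and the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def border_distances(foreground_box: Box, tracking_box: Box) -> Tuple[int, int, int, int]:
    """Gaps between the foreground and the left, top, right and bottom borders."""
    fl, ft, fr, fb = foreground_box
    tl, tt, tr, tb = tracking_box
    return (fl - tl, ft - tt, tr - fr, tb - fb)


def resize_by_distance(foreground_box: Box, tracking_box: Box,
                       threshold: int, width: int, height: int) -> Box:
    """Move each border so that its gap to the foreground equals the threshold."""
    gaps = border_distances(foreground_box, tracking_box)
    if all(gap == threshold for gap in gaps):
        return tracking_box                     # already at the preset distance
    fl, ft, fr, fb = foreground_box
    # Clamp to the image so the box never leaves the frame image.
    return (max(fl - threshold, 0), max(ft - threshold, 0),
            min(fr + threshold, width), min(fb + threshold, height))
```

A border that is too close to the foreground is pushed outwards and a border that is too far away is pulled inwards, so the box both grows and shrinks as needed.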

Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image further includes:

Determining the position of the center point of the first foreground image in the t-th frame image according to the first foreground image in the t-th frame image;

Adjusting the position of the tracking frame corresponding to the (t-1)-th frame image according to the center point position of the first foreground image in the t-th frame image, so that the center point of the tracking frame corresponding to the (t-1)-th frame image coincides with the center point of the first foreground image in the t-th frame image.
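The position adjustment can be sketched as a translation of the box. In this example the center point of the first foreground image is taken to be the centroid of its pixels (each pixel contributing its own center at `x + 0.5`, `y + 0.5`), which is an assumption, since the text does not define the center point; the box is shifted by whole pixels, so the two centers coincide up to rounding.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # 1 = pixel of the first foreground image
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def foreground_center(mask: Mask) -> Optional[Tuple[float, float]]:
    """Centroid of the foreground pixels, or None if there are none."""
    sum_x, sum_y, count = 0.0, 0.0, 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x + 0.5
                sum_y += y + 0.5
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def recenter_box(box: Box, center: Tuple[float, float]) -> Box:
    """Translate the box so that its center lands on the given point."""
    left, top, right, bottom = box
    dx = round(center[0] - (left + right) / 2)
    dy = round(center[1] - (top + bottom) / 2)
    return (left + dx, top + dy, right + dx, bottom + dy)
```

This sketch does not clamp the shifted box to the image bounds; a real implementation would need to decide how to handle a foreground near the edge of the frame image.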

Further, performing scene segmentation on a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image further includes:

Extracting the image to be segmented from a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image;

Performing scene segmentation on the image to be segmented to obtain the segmentation result corresponding to the image to be segmented;

Obtaining the segmentation result corresponding to the t-th frame image from the segmentation result corresponding to the image to be segmented.

Further, extracting the image to be segmented from a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image further includes:

Extracting from the t-th frame image the image inside the tracking frame corresponding to the t-th frame image, and taking the extracted image as the image to be segmented.

Further, performing scene segmentation on the image to be segmented to obtain the segmentation result corresponding to the image to be segmented further includes:

Inputting the image to be segmented into a scene segmentation network to obtain the segmentation result corresponding to the image to be segmented.
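The three steps (crop, segment, map back) can be sketched as below. The scene segmentation network is represented by an arbitrary callable, and `bright_pixels` is a trivial stand-in used only to make the example runnable. Treating every pixel outside the tracking frame as background when building the full-frame result is an assumption, since the text does not say how the partial result is turned into the result for the whole frame.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, for brevity
Mask = List[List[int]]           # 1 = foreground, 0 = background
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Extract the image inside the tracking frame (the image to be segmented)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def segment_in_box(image: Image, box: Box, segment: Callable[[Image], Mask]) -> Mask:
    """Segment only the cropped area and paste the result into a full-size mask."""
    left, top, _, _ = box
    patch_mask = segment(crop(image, box))
    full_mask = [[0] * len(row) for row in image]   # outside the box: background
    for dy, mask_row in enumerate(patch_mask):
        for dx, value in enumerate(mask_row):
            full_mask[top + dy][left + dx] = value
    return full_mask


def bright_pixels(patch: Image) -> Mask:
    """Stand-in for a scene segmentation network: bright pixels are foreground."""
    return [[1 if value > 100 else 0 for value in row] for row in patch]
```

Because `segment` only ever sees the cropped patch, its cost depends on the size of the tracking frame rather than on the size of the whole frame image.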

Further, displaying the processed video data further includes: displaying the processed video data in real time;

The method further includes: uploading the processed video data to a cloud server.

Further, uploading the processed video data to the cloud server further includes:

Uploading the processed video data to a cloud video platform server, so that the cloud video platform server can present the video data on a cloud video platform.

Further, uploading the processed video data to the cloud server further includes:

Uploading the processed video data to a cloud live-streaming server, so that the cloud live-streaming server can push the video data to viewer clients in real time.

Further, uploading the processed video data to the cloud server further includes:

Uploading the processed video data to a cloud official-account server, so that the cloud official-account server can push the video data to the clients following the official account.

According to another aspect of the present invention, a video traversal processing device based on adaptive tracking frame segmentation is provided. The device is used to process the groups of frame images obtained by dividing a video every n frames, and includes:

An acquisition module, adapted to acquire the t-th frame image, which contains a specific object, in a group of frame images, and the tracking frame corresponding to the (t-1)-th frame image, where t is greater than 1 and the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

A segmentation module, adapted to adjust the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, and to perform scene segmentation on a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image;

A determination module, adapted to determine the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image;

A processing module, adapted to add personalized special effects according to the second foreground image to obtain the processed t-th frame image;

An overlay module, adapted to overwrite the t-th frame image with the processed t-th frame image to obtain processed video data;

A display module, adapted to display the processed video data.

Further, the processing module is further adapted to:

Further, the key information is key point information, and the processing module is further adapted to:

Further, the segmentation module is further adapted to:

Extract the image to be segmented from a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image;

Further, the display module is further adapted to display the processed video data in real time;

The device further includes an upload module, adapted to upload the processed video data to a cloud server.

Further, the upload module is further adapted to:

According to yet another aspect of the present invention, a computing device is provided, including a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus;

The memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above method for real-time processing of video data based on adaptive tracking frame segmentation.

According to still another aspect of the present invention, a computer storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the operations corresponding to the above method for real-time processing of video data based on adaptive tracking frame segmentation.

According to the technical solution provided by the present invention, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained from the tracking frame corresponding to the (t-1)-th frame image, and that tracking frame is used to perform scene segmentation on the t-th frame image. The segmentation result corresponding to the t-th frame image can thus be obtained quickly and accurately, which effectively improves the accuracy of image scene segmentation. Compared with the prior art, in which scene segmentation is performed on the entire content of a frame image, the present invention performs scene segmentation on only a partial area of the frame image, which effectively reduces the amount of data processed for image scene segmentation, improves processing efficiency and optimizes the way image scene segmentation is carried out. Based on the resulting segmentation, personalized special effects can be added to frame images more accurately and quickly, which improves the display effect of the video data.

The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:

FIG. 1 shows a schematic flowchart of a method for real-time processing of video data based on adaptive tracking frame segmentation according to an embodiment of the present invention;

FIG. 2 shows a schematic flowchart of a method for real-time processing of video data based on adaptive tracking frame segmentation according to another embodiment of the present invention;

FIG. 3 shows a structural block diagram of a device for real-time processing of video data based on adaptive tracking frame segmentation according to an embodiment of the present invention;

FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.

The present invention provides a method for real-time processing of video data based on adaptive tracking frame segmentation. During video shooting or recording, the number of specific objects being shot or recorded may change, for example because the objects move. Taking a human body as the specific object, the number of human bodies shot or recorded may increase or decrease. To perform scene segmentation on the frame images of a video quickly and accurately, the method processes the groups of frame images obtained by dividing the video every n frames and, for each group, obtains the tracking frame corresponding to the t-th frame image from the tracking frame corresponding to the (t-1)-th frame image and uses that tracking frame to perform scene segmentation on the t-th frame image. In the present invention, the foreground image may contain only the specific object, and the background image is the part of the frame image other than the foreground image. To distinguish the foreground image in a frame image before segmentation from the foreground image in the frame image after segmentation, the former is called the first foreground image and the latter the second foreground image. Likewise, the background image in a frame image before segmentation is called the first background image, and the background image in the frame image after segmentation is called the second background image. The tracking frame may be a rectangular box that encloses the first foreground image in a frame image so as to track the specific object in the frame image. Those skilled in the art can set n according to actual needs, which is not limited here. n may be a fixed preset value; for example, when n is 20, the frame images of the video are divided every 20 frames to obtain the groups of frame images, and the method processes each of the resulting groups.
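The grouping step can be illustrated with a short Python sketch. It assumes that "dividing every n frames" means cutting the frame sequence into consecutive groups of n frames, with a shorter final group if the total is not a multiple of n; the function name `split_into_groups` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(frames: Sequence[T], n: int) -> List[List[T]]:
    """Divide a frame sequence into consecutive groups of at most n frames."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [list(frames[i:i + n]) for i in range(0, len(frames), n)]
```

With n set to 20, a 50-frame clip would yield groups of 20, 20 and 10 frames, and the tracking frame is re-initialized from a segmentation result at the first frame of each group.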

FIG. 1 shows a schematic flowchart of a method for real-time processing of video data based on adaptive tracking frame segmentation according to an embodiment of the present invention. The method is used to process the groups of frame images obtained by dividing a video every n frames. As shown in FIG. 1, for one of these groups of frame images, the method includes the following steps:

Step S100: acquire the t-th frame image, which contains a specific object, in a group of frame images, and the tracking frame corresponding to the (t-1)-th frame image.

The frame images contain a specific object, which may be a human body, a vehicle, or the like. Those skilled in the art can set the specific object according to actual needs, which is not limited here. When scene segmentation needs to be performed on the t-th frame image in a group of frame images, where t is greater than 1, the t-th frame image and the tracking frame corresponding to the (t-1)-th frame image are acquired in step S100. The tracking frame corresponding to the (t-1)-th frame image completely encloses the first foreground image in the (t-1)-th frame image. Specifically, the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image.
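One plausible way to derive the tracking frame for the 1st frame image from its segmentation result is to take the bounding box of the foreground pixels. The sketch below assumes the segmentation result is available as a binary mask; the optional `margin` parameter and the function name `box_from_mask` are additions for illustration and do not come from the text.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # 1 = foreground pixel, 0 = background pixel
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def box_from_mask(mask: Mask, margin: int = 0) -> Optional[Box]:
    """Smallest rectangle enclosing all foreground pixels, padded by margin."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None                              # no foreground in this frame
    height, width = len(mask), len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    # Pad the box and clamp it to the image bounds.
    return (max(min(cols) - margin, 0), max(min(rows) - margin, 0),
            min(max(cols) + 1 + margin, width), min(max(rows) + 1 + margin, height))
```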

Step S101: adjust the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, and perform scene segmentation on a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image.

When a tracking frame is used to track the first foreground image, the tracking frame needs to be adjusted for every frame image. For the t-th frame image, the size and position of the tracking frame corresponding to the (t-1)-th frame image can be adjusted so that the adjusted tracking frame fits the t-th frame image, which yields the tracking frame corresponding to the t-th frame image. Because the tracking frame corresponding to the t-th frame image encloses the first foreground image in the t-th frame image, scene segmentation can be performed on a partial area of the t-th frame image according to that tracking frame to obtain the segmentation result corresponding to the t-th frame image. For example, scene segmentation may be performed on the area of the t-th frame image enclosed by the tracking frame corresponding to the t-th frame image. Compared with the prior art, in which scene segmentation is performed on the entire content of a frame image, the present invention performs scene segmentation on only a partial area of the frame image, which effectively reduces the amount of data processed for image scene segmentation and improves processing efficiency.

Step S102: determine the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image.

The segmentation result corresponding to the t-th frame image shows clearly which pixels in the t-th frame image belong to the second foreground image and which belong to the second background image, so the second foreground image of the t-th frame image can be determined.
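A minimal sketch of this step, assuming the segmentation result is a per-pixel binary mask: the frame is split into a foreground layer and a background layer, with `None` marking pixels that do not belong to the layer. The representation and the function name `split_by_mask` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Mask = List[List[int]]                 # 1 = second foreground image, 0 = second background image
Layer = List[List[Optional[int]]]      # None = pixel not part of this layer


def split_by_mask(frame: Image, mask: Mask) -> Tuple[Layer, Layer]:
    """Return (second foreground image, second background image) as sparse layers."""
    foreground: Layer = [
        [pixel if keep else None for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]
    background: Layer = [
        [None if keep else pixel for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]
    return foreground, background
```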

Step S103: add personalized special effects according to the second foreground image to obtain the processed t-th frame image.

After the second foreground image has been determined, personalized special effects can be added according to it to obtain the processed t-th frame image. Those skilled in the art can set the personalized special effects according to actual needs, which is not limited here. For example, an effect map may be added at the edge of the specific object according to the second foreground image; the effect map may be static or dynamic. Specifically, when the specific object is a human body, the effect map may be, for example, flames, bouncing musical notes or waves; when the specific object is a human head, the effect map may be, for example, a hair crown or wiggling ears. The effect map is set according to the implementation, which is not limited here.

Step S104: overwrite the t-th frame image with the processed t-th frame image to obtain processed video data.

The processed t-th frame image directly overwrites the original t-th frame image, which directly yields the processed video data. At the same time, the user who is recording can directly see the processed t-th frame image.

Step S105: display the processed video data.

Once the processed t-th frame image is obtained, it directly overwrites the original t-th frame image. Overwriting is fast and generally completes within 1/24 of a second. Because the overwriting takes relatively little time, the user's eye does not noticeably perceive it; that is, the eye does not perceive the process in which the original t-th frame image in the video data is overwritten. When the processed video data is subsequently displayed, it is therefore as if the processed video data were displayed in real time while the video data is being shot and/or recorded and/or played, and the user does not perceive any overwriting of frame images in the video data.

According to the video data real-time processing method based on adaptive tracking frame segmentation provided in this embodiment, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained from the tracking frame corresponding to the (t-1)-th frame image, and this tracking frame is used to perform scene segmentation on the t-th frame image. The segmentation result corresponding to the t-th frame image can thus be obtained quickly and accurately, which effectively improves the accuracy of image scene segmentation. Compared with the prior art, in which scene segmentation is performed on the entire content of a frame image, the present invention performs scene segmentation only on a partial area of the frame image, which effectively reduces the amount of data processed during image scene segmentation, improves processing efficiency, and optimizes the image scene segmentation approach. In addition, based on the obtained segmentation result, personalized special effects can be added to the frame image more accurately and quickly, which improves the display effect of the video data.

Fig. 2 shows a schematic flowchart of a video data real-time processing method based on adaptive tracking frame segmentation according to another embodiment of the present invention. The method is used to process each group of frame images obtained by dividing the video every n frames. As shown in Fig. 2, for one group of frame images, the method includes the following steps:

Step S200, acquiring the t-th frame image containing a specific object in a group of frame images, and the tracking frame corresponding to the (t-1)-th frame image.

Here t is greater than 1. For example, when t is 2, step S200 acquires the 2nd frame image containing the specific object in the group of frame images and the tracking frame corresponding to the 1st frame image; specifically, the tracking frame corresponding to the 1st frame image is determined from the segmentation result corresponding to the 1st frame image. When t is 3, step S200 acquires the 3rd frame image containing the specific object in the group of frame images and the tracking frame corresponding to the 2nd frame image, where the tracking frame corresponding to the 2nd frame image was obtained by adjusting the tracking frame corresponding to the 1st frame image during scene segmentation of the 2nd frame image.
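The text states that the tracking frame for the 1st frame image is determined from that image's segmentation result, but does not say how. One plausible rule, sketched below purely as an assumption, is to take the tightest rectangle around all foreground pixels. The names `Mask`, `Box`, `box_from_mask` and the optional `margin` padding are hypothetical and do not come from the patent.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # 1 = foreground pixel, 0 = background pixel
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def box_from_mask(mask: Mask, margin: int = 0) -> Optional[Box]:
    """Tightest rectangle around all foreground pixels, padded by `margin`.

    Assumed rule for illustration only; the patent does not specify one.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None  # no foreground pixel was found in this frame
    height, width = len(mask), len(mask[0])
    left = max(min(cols) - margin, 0)
    top = max(min(rows) - margin, 0)
    right = min(max(cols) + 1 + margin, width)
    bottom = min(max(rows) + 1 + margin, height)
    return (left, top, right, bottom)
```

For a mask whose foreground occupies columns 2 to 3 and rows 1 to 2, the function returns `(2, 1, 4, 3)`; with `margin=1` each side moves outwards by one pixel, clamped to the image bounds.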

Step S201, performing recognition processing on the t-th frame image to determine the first foreground image for the specific object in the t-th frame image, applying the tracking frame corresponding to the (t-1)-th frame image to the t-th frame image, and adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image.

Specifically, existing image processing tools such as AE (Adobe After Effects) and NUKE (The Foundry Nuke) can be used to perform recognition processing on the t-th frame image and identify which pixels in the t-th frame image belong to the first foreground image, thereby determining the first foreground image for the specific object in the t-th frame image. After the first foreground image is determined, the tracking frame corresponding to the (t-1)-th frame image can be placed on the t-th frame image so that it can be adjusted according to the first foreground image in the t-th frame image, thereby obtaining the tracking frame corresponding to the t-th frame image.

Specifically, the proportion of pixels belonging to the first foreground image in the t-th frame image among all pixels inside the tracking frame corresponding to the (t-1)-th frame image can be calculated, and this proportion is taken as the first foreground pixel ratio of the t-th frame image. Next, the second foreground pixel ratio of the (t-1)-th frame image is obtained, where the second foreground pixel ratio of the (t-1)-th frame image is the proportion of pixels belonging to the first foreground image in the (t-1)-th frame image among all pixels inside the tracking frame corresponding to the (t-1)-th frame image. Then the difference between the first foreground pixel ratio of the t-th frame image and the second foreground pixel ratio of the (t-1)-th frame image is calculated, and it is determined whether the difference exceeds a preset difference threshold. If the difference exceeds the preset difference threshold, the tracking frame corresponding to the (t-1)-th frame image does not match the first foreground image in the t-th frame image, and the size of the tracking frame corresponding to the (t-1)-th frame image is adjusted according to the difference. If the difference does not exceed the preset difference threshold, the size of the tracking frame corresponding to the (t-1)-th frame image need not be adjusted. Those skilled in the art may set the preset difference threshold according to actual needs, which is not limited here.
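The ratio comparison described above can be sketched as follows. This is a minimal illustration, assuming the first foreground image is available as a binary mask and the tracking frame as an axis-aligned rectangle. The names `foreground_ratio` and `needs_resize` are hypothetical, and taking the absolute difference of the two ratios is one reasonable reading of "difference value".

```python
from typing import List, Tuple

Mask = List[List[int]]            # 1 = foreground pixel, 0 = background pixel
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def foreground_ratio(mask: Mask, box: Box) -> float:
    """Share of the pixels inside `box` that belong to the foreground mask."""
    left, top, right, bottom = box
    area = (right - left) * (bottom - top)
    if area <= 0:
        raise ValueError("tracking frame must have a positive area")
    foreground = sum(
        1
        for y in range(top, bottom)
        for x in range(left, right)
        if mask[y][x]
    )
    return foreground / area


def needs_resize(ratio_t: float, ratio_prev: float, threshold: float) -> bool:
    """True when the two foreground pixel ratios differ by more than the threshold."""
    return abs(ratio_t - ratio_prev) > threshold
```

With a preset difference threshold of 0.1, ratios of 0.9 and 0.7 trigger an adjustment, while 0.72 and 0.7 leave the tracking frame unchanged.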

Suppose that, after the tracking frame corresponding to the (t-1)-th frame image is applied to the t-th frame image, the tracking frame fully encloses the first foreground image in the t-th frame image, but the difference between the first foreground pixel ratio of the t-th frame image and the second foreground pixel ratio of the (t-1)-th frame image exceeds the preset difference threshold. This indicates that, for the first foreground image in the t-th frame image, the tracking frame corresponding to the (t-1)-th frame image may be too large or too small, so its size needs to be adjusted. For example, when the first foreground pixel ratio of the t-th frame image is 0.9, the second foreground pixel ratio of the (t-1)-th frame image is 0.7, and the difference between the two ratios exceeds the preset difference threshold, the tracking frame corresponding to the (t-1)-th frame image can be adaptively enlarged according to the difference. As another example, when the first foreground pixel ratio of the t-th frame image is 0.5, the second foreground pixel ratio of the (t-1)-th frame image is 0.7, and the difference between the two ratios exceeds the preset difference threshold, the tracking frame corresponding to the (t-1)-th frame image can be adaptively reduced according to the difference.
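The patent says only that the tracking frame is enlarged or reduced "according to the difference"; it does not give a formula. The sketch below uses one possible rule, stated here as an assumption: scale each side by the square root of the ratio quotient, so that the area changes by the factor that would restore the previous foreground pixel ratio if the foreground stays fully inside the frame. `resize_box` and its parameters are hypothetical names.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def resize_box(box: Box, ratio_t: float, ratio_prev: float,
               frame_w: int, frame_h: int) -> Box:
    """Scale `box` about its centre (assumed rule, not taken from the patent).

    ratio_t > ratio_prev: the foreground fills more of the box, so enlarge it.
    ratio_t < ratio_prev: the foreground fills less of the box, so shrink it.
    """
    if ratio_t <= 0 or ratio_prev <= 0:
        raise ValueError("foreground pixel ratios must be positive")
    left, top, right, bottom = box
    scale = math.sqrt(ratio_t / ratio_prev)   # per-side factor; area scales by ratio_t / ratio_prev
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    # Clamp the result to the frame image bounds.
    return (
        max(int(round(cx - half_w)), 0),
        max(int(round(cy - half_h)), 0),
        min(int(round(cx + half_w)), frame_w),
        min(int(round(cy + half_h)), frame_h),
    )
```

With the figures from the text, a ratio pair of 0.9 and 0.7 yields a per-side factor of about 1.13 (enlargement), and 0.5 and 0.7 yields about 0.85 (reduction).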

Optionally, the distance from the first foreground image in the t-th frame image to each border of the tracking frame corresponding to the (t-1)-th frame image is calculated, and the size of the tracking frame corresponding to the (t-1)-th frame image is adjusted according to the calculated distances and a preset distance threshold. Those skilled in the art may set the preset distance threshold according to actual needs, which is not limited here. For example, if a calculated distance is smaller than the preset distance threshold, the tracking frame corresponding to the (t-1)-th frame image can be adaptively enlarged so that the distance from the first foreground image in the t-th frame image to each border of the tracking frame meets the preset distance threshold. As another example, if a calculated distance is greater than the preset distance threshold, the tracking frame corresponding to the (t-1)-th frame image can be adaptively reduced so that the distance from the first foreground image in the t-th frame image to each border of the tracking frame meets the preset distance threshold.
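The distance-based alternative can be illustrated with a short sketch. It assumes that the extent of the first foreground image is summarized as a rectangle and that "meets the preset distance threshold" means each border ends up exactly `target_gap` pixels from the foreground; both are simplifying assumptions, and `border_gaps` and `adjust_by_gap` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def border_gaps(fg_box: Box, track_box: Box) -> Tuple[int, int, int, int]:
    """Distance from the foreground extent to the left, top, right and bottom borders."""
    fl, ft, fr, fb = fg_box
    tl, tt, tr, tb = track_box
    return (fl - tl, ft - tt, tr - fr, tb - fb)


def adjust_by_gap(fg_box: Box, track_box: Box, target_gap: int,
                  frame_w: int, frame_h: int) -> Box:
    """Move each border so its gap to the foreground equals `target_gap`.

    A gap below the target pushes the border outwards (enlarge); a gap above
    the target pulls it inwards (shrink). The result is clamped to the frame.
    """
    gl, gt, gr, gb = border_gaps(fg_box, track_box)
    tl, tt, tr, tb = track_box
    left = tl - (target_gap - gl)
    top = tt - (target_gap - gt)
    right = tr + (target_gap - gr)
    bottom = tb + (target_gap - gb)
    return (max(left, 0), max(top, 0), min(right, frame_w), min(bottom, frame_h))
```

A tracking frame that hugs the foreground with a gap of 2 pixels is enlarged to a gap of 5, while one that leaves a gap of 30 pixels is reduced to the same 5-pixel gap.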

In addition, the center point position of the first foreground image in the t-th frame image can be determined from the first foreground image in the t-th frame image. According to this center point position, the position of the tracking frame corresponding to the (t-1)-th frame image is adjusted so that the center point of the tracking frame coincides with the center point of the first foreground image in the t-th frame image, which places the first foreground image in the middle of the tracking frame.
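The re-centering step might look like the sketch below. It assumes the center point of the first foreground image is the centroid of its mask pixels (the patent does not say whether a centroid or the center of the bounding rectangle is meant), and the names `mask_centroid` and `recentre_box` are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]            # 1 = foreground pixel, 0 = background pixel
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def mask_centroid(mask: Mask) -> Tuple[float, float]:
    """Centroid (x, y) of all foreground pixels in the mask."""
    points = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no foreground pixels")
    n = len(points)
    # +0.5 because pixel (x, y) covers the unit square starting at (x, y).
    cx = sum(x for x, _ in points) / n + 0.5
    cy = sum(y for _, y in points) / n + 0.5
    return (cx, cy)


def recentre_box(box: Box, centre: Tuple[float, float]) -> Box:
    """Translate `box` (size unchanged) so that its centre lies on `centre`."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    new_left = int(round(centre[0] - width / 2))
    new_top = int(round(centre[1] - height / 2))
    return (new_left, new_top, new_left + width, new_top + height)
```

The translated rectangle is not clamped here; a real implementation would probably also keep it inside the frame image.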

Step S202, extracting the image to be segmented from a partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image.

Specifically, the image inside the tracking frame corresponding to the t-th frame image can be extracted from the t-th frame image and taken as the image to be segmented. Because the tracking frame corresponding to the t-th frame image fully encloses the first foreground image in the t-th frame image, all pixels of the t-th frame image that lie outside the tracking frame belong to the second background image. Therefore, once the tracking frame corresponding to the t-th frame image is obtained, only the image inside it needs to undergo scene segmentation, which effectively reduces the amount of data processed during image scene segmentation and improves processing efficiency.
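Extracting the image to be segmented amounts to cropping the frame image to the tracking frame. A minimal sketch, assuming the frame is stored as a list of pixel rows and the tracking frame as `(left, top, right, bottom)` with exclusive right and bottom edges (`crop_to_box` is a hypothetical name):

```python
from typing import List, Tuple, TypeVar

Pixel = TypeVar("Pixel")
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def crop_to_box(frame: List[List[Pixel]], box: Box) -> List[List[Pixel]]:
    """Return the part of `frame` that lies inside the tracking frame `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]
```

Only the returned crop is passed on to scene segmentation; everything outside the tracking frame is treated as background without being processed.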

Step S203, performing scene segmentation processing on the image to be segmented to obtain a segmentation result corresponding to the image to be segmented.

Because the tracking frame corresponding to the t-th frame image fully encloses the first foreground image in the t-th frame image, the pixels of the t-th frame image that lie outside the tracking frame can be determined to belong to the second background image without performing scene segmentation on them. Scene segmentation therefore needs to be performed only on the extracted image to be segmented.

A deep learning method can be used to perform scene segmentation on the image to be segmented. Deep learning is a branch of machine learning based on learning representations of data. An observation (for example, an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn a task from examples. A deep-learning-based segmentation method, for example a scene segmentation network obtained by deep learning, can be used to perform scene segmentation on the image to be segmented and obtain the corresponding segmentation result. From the segmentation result, it can be determined which pixels in the image to be segmented belong to the second foreground image and which belong to the second background image.

Specifically, the image to be segmented can be input into the scene segmentation network to obtain the corresponding segmentation result. In the prior art, so that the scene segmentation network can process the input image, the image must first be resized to a preset size, for example 320×240 pixels. Images are typically 1280×720 pixels, so they must first be resized to 320×240 pixels before scene segmentation is performed on the resized image. However, when a scene segmentation network is used to segment frame images in a video and the first foreground image occupies only a small proportion of the frame image, for example 0.2, the prior art still shrinks the whole frame image before segmenting it. During segmentation, pixels that actually lie at the edge of the second foreground image are then easily classified as the second background image, so the segmentation result has low accuracy and poor quality.

In the technical solution provided by the present invention, by contrast, the image inside the tracking frame corresponding to the t-th frame image is extracted from the t-th frame image and taken as the image to be segmented, and scene segmentation is then performed on that image. When the first foreground image occupies a small proportion of the t-th frame image, the extracted image to be segmented is much smaller than the t-th frame image. Compared with a whole frame image resized to the preset size, an image to be segmented that is resized to the preset size therefore preserves the foreground image information more effectively, and the resulting segmentation is more accurate.
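A rough back-of-the-envelope calculation shows why cropping first helps. The sketch below estimates how many of the 320×240 network input pixels end up covering the foreground in each case. The function name and the example tracking frame coverage of 0.4 are assumptions for illustration, and the estimate ignores aspect-ratio distortion and interpolation effects.

```python
from typing import Tuple


def foreground_pixels_after_resize(fg_fraction: float, box_fraction: float,
                                   net_w: int = 320, net_h: int = 240) -> Tuple[float, float]:
    """Approximate foreground pixel counts at the network input resolution.

    fg_fraction:  share of the whole frame occupied by the foreground (e.g. 0.2)
    box_fraction: share of the whole frame occupied by the tracking frame
    Returns (count when the whole frame is resized, count when only the crop is resized).
    """
    if not 0 < fg_fraction <= box_fraction <= 1:
        raise ValueError("expected 0 < fg_fraction <= box_fraction <= 1")
    net_pixels = net_w * net_h
    whole_frame = fg_fraction * net_pixels
    cropped = (fg_fraction / box_fraction) * net_pixels
    return whole_frame, cropped
```

For a foreground that occupies 0.2 of the frame and a tracking frame that covers 0.4 of it, resizing the whole frame leaves about 15,360 foreground pixels, whereas resizing only the crop leaves about 38,400, which is 2.5 times as many.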

Step S204, obtaining the segmentation result corresponding to the t-th frame image according to the segmentation result corresponding to the image to be segmented.

The image to be segmented is the image inside the tracking frame corresponding to the t-th frame image. Its segmentation result clearly identifies which pixels in the image to be segmented belong to the second foreground image and which belong to the second background image, and all pixels of the t-th frame image outside the tracking frame belong to the second background image. The segmentation result corresponding to the t-th frame image can therefore be obtained conveniently and quickly from the segmentation result corresponding to the image to be segmented, so that it is clear which pixels in the t-th frame image belong to the second foreground image and which belong to the second background image. Compared with the prior art, in which scene segmentation is performed on the entire content of a frame image, the present invention performs scene segmentation only on the image to be segmented that is extracted from the frame image, which effectively reduces the amount of data processed during image scene segmentation and improves processing efficiency.
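Step S204 can be pictured as pasting the crop's mask back into an otherwise all-background mask of the full frame. The sketch below assumes the crop mask has already been resized back to the tracking frame's pixel dimensions; `paste_mask` is a hypothetical name.

```python
from typing import List, Tuple

Mask = List[List[int]]            # 1 = second foreground image, 0 = second background image
Box = Tuple[int, int, int, int]   # (left, top, right, bottom); right/bottom exclusive


def paste_mask(crop_mask: Mask, box: Box, frame_w: int, frame_h: int) -> Mask:
    """Build the full-frame mask: background everywhere except inside `box`."""
    left, top, right, bottom = box
    if len(crop_mask) != bottom - top or any(len(row) != right - left for row in crop_mask):
        raise ValueError("crop mask size must match the tracking frame size")
    full = [[0] * frame_w for _ in range(frame_h)]
    for dy, row in enumerate(crop_mask):
        full[top + dy][left:right] = row
    return full
```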

Step S205, determining the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image.

Step S206, adding a personalized special effect according to the second foreground image to obtain the processed t-th frame image.

In one specific implementation, key information of the area to be processed can be extracted from the second foreground image, and an effect map is drawn according to the key information. The key information may be key point information, key area information, and/or key line information. The embodiments of the present invention are described using key point information as an example, but the key information is not limited to key point information. Using key point information improves the speed and efficiency of drawing the effect map: the effect map can be drawn directly from the key point information without further calculation, analysis, or other complex operations on the key information. Key point information is also easy to extract and can be extracted accurately, which makes the drawn effect map more precise. Specifically, key point information at the edge of the area to be processed can be extracted from the second foreground image. Those skilled in the art may set the area to be processed according to actual needs, which is not limited here.

To draw effect maps conveniently and quickly, many basic effect maps can be drawn in advance. When an effect map is to be drawn, the corresponding basic effect map is first looked up and then processed, so that the effect map is obtained quickly. The basic effect maps may include different clothing effect maps, decoration effect maps, texture effect maps, and so on. For example, decoration effect maps may be flames, bouncing musical notes, waves, hair crowns, wiggling ears, and the like. In addition, to make the basic effect maps easier to manage, an effect map library can be established and the basic effect maps stored in it.

Specifically, taking key point information as an example, after the key point information of the area to be processed is extracted from the second foreground image, the basic effect map corresponding to the key point information can be looked up. Then, according to the key point information, position information between at least two key points that have a symmetrical relationship is calculated, and the basic effect map is processed according to the position information to obtain the effect map. In this way the effect map can be drawn precisely. The method can automatically look up the basic effect map corresponding to the extracted key point information in the effect map library. In practice, to make the method easier to use and to better meet users' individual needs, the basic effect maps in the effect map library can also be shown to the user, who can choose a basic effect map according to personal preference; in that case, the method obtains the basic effect map specified by the user.
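The patent does not say what "position information" is computed or how the basic effect map is "processed". A common interpretation, sketched below as an assumption, is to derive a scale factor from the distance between two symmetric key points, a rotation from the angle of the line joining them, and an anchor from their midpoint. `effect_map_transform` and `base_width` are hypothetical names.

```python
import math
from typing import Dict, Tuple, Union

Point = Tuple[float, float]   # (x, y) in image coordinates


def effect_map_transform(p_left: Point, p_right: Point,
                         base_width: float) -> Dict[str, Union[float, Point]]:
    """Scale, rotation and anchor for a basic effect map (assumed interpretation).

    p_left, p_right: two key points with a symmetrical relationship,
                     for example the two edges of the head
    base_width:      width in pixels that the basic effect map was drawn for
    """
    if base_width <= 0:
        raise ValueError("base_width must be positive")
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    distance = math.hypot(dx, dy)
    return {
        "scale": distance / base_width,               # resize factor for the map
        "angle_deg": math.degrees(math.atan2(dy, dx)),  # tilt of the key point line
        "anchor": ((p_left[0] + p_right[0]) / 2, (p_left[1] + p_right[1]) / 2),
    }
```

Two key points 100 pixels apart on a horizontal line, with a basic effect map drawn for a 50-pixel width, give a scale of 2.0, an angle of 0 degrees, and an anchor at their midpoint.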

After the effect map is drawn, the effect map, the second foreground image, and a preset background image can be fused to obtain the processed t-th frame image. Those skilled in the art may set the preset background image according to actual needs, which is not limited here. The preset background image may be a two-dimensional scene background image or a three-dimensional scene background image, such as a three-dimensional seabed scene or a three-dimensional volcano scene. Alternatively, the effect map and the second foreground image can be fused with the second background image determined from the segmentation result corresponding to the t-th frame image (that is, the original background of the t-th frame image) to obtain the processed t-th frame image.
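The patent does not define the fusion operation. One simple possibility, shown here as an assumption, is per-pixel compositing: take the foreground pixel where the segmentation mask marks foreground and the chosen background pixel elsewhere, then alpha-blend the effect map on top. All names in the sketch are hypothetical, and all inputs are assumed to have the same dimensions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each 0..255
Image = List[List[Pixel]]
Mask = List[List[int]]            # 1 = second foreground image, 0 = background
Alpha = List[List[float]]         # effect map opacity per pixel, 0.0..1.0


def fuse(foreground: Image, fg_mask: Mask, background: Image,
         effect_map: Image, effect_alpha: Alpha) -> Image:
    """Composite foreground over background, then blend the effect map on top."""
    fused: Image = []
    for y, mask_row in enumerate(fg_mask):
        out_row = []
        for x, is_foreground in enumerate(mask_row):
            base = foreground[y][x] if is_foreground else background[y][x]
            a = effect_alpha[y][x]
            overlay = effect_map[y][x]
            blended = (
                round(a * overlay[0] + (1 - a) * base[0]),
                round(a * overlay[1] + (1 - a) * base[1]),
                round(a * overlay[2] + (1 - a) * base[2]),
            )
            out_row.append(blended)
        fused.append(out_row)
    return fused
```

Passing a preset scene as `background` replaces the original background, while passing the original second background image keeps it, matching the two alternatives described above.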

Optionally, in another specific implementation, key information of the area to be recognized can be extracted from the second foreground image, and the pose of the specific object is then recognized according to the key information to obtain a pose recognition result for the specific object. According to the pose recognition result, the corresponding effect processing command to be applied to the t-th frame image is determined, and the processed t-th frame image is obtained.

When recognizing the pose of the specific object, the key information can be matched against preset pose key information to obtain the pose recognition result. Alternatively, a trained pose recognition network can be used to recognize the pose of the specific object; because the network has been trained, the pose recognition result can be obtained conveniently and quickly. After the pose recognition result of the specific object is obtained, the corresponding effect processing command to be applied to the t-th frame image is determined according to that result. Specifically, pose recognition results may include facial poses of different shapes, hand gestures, leg movements, whole-body poses, and so on. Depending on the pose recognition result and the application scenario (the scene in which the video data is captured and the scenario in which it is used), one or more corresponding effect processing commands can be determined for each pose recognition result. The same pose recognition result may map to different effect processing commands in different application scenarios, and different pose recognition results may map to the same effect processing command in the same application scenario. For a single pose recognition result, the determined effect processing command may contain one or more processing commands. This is set according to the implementation and is not limited here. After the effect processing command is determined, it is executed: the t-th frame image is processed according to the command, and the processed t-th frame image is obtained.

The effect processing command may include, for example, effect map processing commands, stylization commands, brightness commands, lighting commands, and tone commands. A single effect processing command may include several of these at once, so that the processed t-th frame image looks more realistic and more coherent overall.

For example, when a user is taking a selfie, live streaming, or recording a short video, if the recognized pose is a hand-heart gesture, the effect processing command determined for the t-th frame image may be a command to add a heart-shaped effect map to the t-th frame image; the heart-shaped effect map may be static or dynamic. If the recognized pose is both hands placed under the head to form a flower shape, the effect processing command determined for the t-th frame image may include a command to add a sunflower effect map at the head, a stylization command that changes the style of the t-th frame image to a pastoral style, a lighting command that processes the lighting of the t-th frame image (a sunny-day lighting effect), and so on.
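The mapping from a pose recognition result and an application scenario to one or more effect processing commands can be pictured as a lookup table. The sketch below is illustrative only: the scenario names, pose labels, and command strings are invented placeholders, not identifiers from the patent.

```python
from typing import Dict, List, Tuple

# (application scenario, pose recognition result) -> effect processing commands.
# All keys and command names are hypothetical placeholders.
EFFECT_COMMANDS: Dict[Tuple[str, str], List[str]] = {
    ("selfie", "hand_heart"): ["add_heart_effect_map"],
    ("selfie", "flower_pose"): [
        "add_sunflower_effect_map",
        "apply_pastoral_style",
        "apply_sunny_lighting",
    ],
    # The same pose may map to different commands in another scenario.
    ("live_stream", "hand_heart"): ["add_animated_heart_effect_map"],
}


def commands_for(scenario: str, pose: str) -> List[str]:
    """Return the commands for this scenario and pose, or an empty list if none apply."""
    return list(EFFECT_COMMANDS.get((scenario, pose), []))
```

Keying the table on both scenario and pose reflects the statement that one pose can trigger different commands in different scenarios, and a list value allows several commands per pose.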

Optionally, the corresponding effect processing command to be applied to the t-th frame image can also be determined according to both the pose recognition result of the specific object and the interaction information with an interacting party contained in the t-th frame image, and the processed t-th frame image is obtained.

For example, during a live stream, the t-th frame image contains not only the user (the specific object) but also interaction information from an interacting party (such as a viewer of the live stream). If a viewer sends the user an ice cream, an ice cream appears on the t-th frame image. Combined with this interaction information, when the recognized pose is the user making an eating gesture, the effect processing command is determined to be removing the original ice cream effect map and adding an effect map of the ice cream with a bite taken out. The t-th frame image is then processed according to this command, which increases interaction with viewers and attracts more viewers to the live stream.

Step S207, overwriting the original t-th frame image with the processed t-th frame image to obtain processed video data.

Step S208, displaying the processed video data.

Once the processed video data is obtained, it can be displayed in real time, and the user can directly see its display effect.

Step S209, uploading the processed video data to a cloud server.

The processed video data can be uploaded directly to a cloud server. Specifically, it can be uploaded to one or more cloud video platform servers, such as those of iQiyi, Youku, or Kuai Video, so that the cloud video platform server can present the video data on the cloud video platform. Alternatively, the processed video data can be uploaded to a cloud live streaming server; when a viewer connects to the cloud live streaming server to watch, the server pushes the video data to the viewer's client in real time. Alternatively, the processed video data can be uploaded to a cloud official account server; when users follow the official account, the server pushes the video data to the followers' clients. Further, the cloud official account server can push video data that matches each follower's viewing habits to that follower's client.

According to the video data real-time processing method based on adaptive tracking frame segmentation provided in this embodiment, for each group of frame images, the tracking frame corresponding to the (t-1)-th frame image is adjusted according to the first foreground image in the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, and this tracking frame is used to extract the image to be segmented. From the segmentation result corresponding to the image to be segmented, the segmentation result corresponding to the t-th frame image can be obtained quickly and accurately, which effectively improves the accuracy of image scene segmentation. Compared with the prior art, in which scene segmentation is performed on the entire content of a frame image, the present invention performs scene segmentation only on the image to be segmented that is extracted from the frame image, which effectively reduces the amount of data processed during image scene segmentation, improves processing efficiency, and optimizes the image scene segmentation approach. Based on the obtained segmentation result, personalized special effects can be added to the frame image more accurately and quickly, which improves the display effect of the video data. In addition, the segmentation result allows poses to be recognized more precisely and the effect processing command to be determined quickly and accurately for processing the frame image, which optimizes the way video data is processed.

Fig. 3 shows a structural block diagram of a real-time video data processing device based on adaptive tracking frame segmentation according to an embodiment of the present invention. The device is used to process each group of frame images obtained by dividing a video every n frames. As shown in Fig. 3, the device includes: an acquisition module 310, a segmentation module 320, a determination module 330, a processing module 340, an overlay module 350, and a display module 360.

The acquisition module 310 is adapted to: acquire the t-th frame image, which contains a specific object, in a group of frame images, and the tracking frame corresponding to the (t-1)-th frame image.

Here t is greater than 1, and the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image.
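As an illustration of how a tracking frame for the 1st frame image might be derived from its segmentation result, the following minimal sketch takes the tight bounding rectangle of the foreground pixels. The text only states that the tracking frame is determined from the segmentation result; the bounding-rectangle rule, the `margin` parameter, the binary-mask representation, and the function name `bounding_frame` are assumptions made for this example.

```python
def bounding_frame(mask, margin=0):
    """Return a tracking frame (x0, y0, x1, y1) around the foreground of a binary mask.

    x1 and y1 are exclusive. Returns None when the mask has no foreground pixel.
    The bounding-rectangle rule and the margin are illustrative assumptions.
    """
    height, width = len(mask), len(mask[0])
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows:
        return None
    # Expand by the margin, but never beyond the image borders.
    x0 = max(min(cols) - margin, 0)
    y0 = max(min(rows) - margin, 0)
    x1 = min(max(cols) + 1 + margin, width)
    y1 = min(max(rows) + 1 + margin, height)
    return (x0, y0, x1, y1)


first_frame_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_frame(first_frame_mask))  # (1, 1, 4, 3)
```

The same `(x0, y0, x1, y1)` convention, with exclusive right and bottom edges, keeps the width and height of a frame equal to `x1 - x0` and `y1 - y0`.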

The segmentation module 320 is adapted to: adjust the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image; and, according to the tracking frame corresponding to the t-th frame image, perform scene segmentation on a partial region of the t-th frame image to obtain the segmentation result corresponding to the t-th frame image.

Optionally, the segmentation module 320 is further adapted to: perform recognition processing on the t-th frame image to determine the first foreground image for the specific object in the t-th frame image; apply the tracking frame corresponding to the (t-1)-th frame image to the t-th frame image; and adjust the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image.

Specifically, the segmentation module 320 is further adapted to: calculate the proportion of the pixels belonging to the first foreground image in the t-th frame image among all pixels in the tracking frame corresponding to the (t-1)-th frame image, and determine this proportion as the first foreground pixel ratio of the t-th frame image; obtain the second foreground pixel ratio of the (t-1)-th frame image, where the second foreground pixel ratio of the (t-1)-th frame image is the proportion of the pixels belonging to the first foreground image in the (t-1)-th frame image among all pixels in the tracking frame corresponding to the (t-1)-th frame image; calculate the difference between the first foreground pixel ratio of the t-th frame image and the second foreground pixel ratio of the (t-1)-th frame image; determine whether the difference exceeds a preset difference threshold; and, if so, adjust the size of the tracking frame corresponding to the (t-1)-th frame image according to the difference.
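The ratio-based size adjustment can be sketched as follows. The text specifies the two ratios, their difference, and the threshold test, but not the exact resizing rule. The rule used here is therefore an assumption: the threshold is compared against the absolute difference, the frame is scaled about its center by `1 + gain * difference`, and it grows when the foreground fills more of the frame. The names `foreground_ratio`, `resize_by_ratio`, `threshold`, and `gain` are hypothetical.

```python
def foreground_ratio(mask, frame):
    """Share of foreground pixels among all pixels inside frame (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = frame
    total = (x1 - x0) * (y1 - y0)
    inside = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / total


def resize_by_ratio(frame, ratio_t, ratio_prev, threshold=0.1, gain=1.0):
    """Scale the frame about its center when the ratio difference exceeds the threshold.

    Assumed rule: scale = 1 + gain * (ratio_t - ratio_prev), so the frame grows
    when the foreground occupies more of it and shrinks when it occupies less.
    """
    diff = ratio_t - ratio_prev
    if abs(diff) <= threshold:
        return frame  # difference within tolerance: keep the previous size
    x0, y0, x1, y1 = frame
    scale = 1.0 + gain * diff
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * scale / 2
    half_h = (y1 - y0) * scale / 2
    return (round(cx - half_w), round(cy - half_h),
            round(cx + half_w), round(cy + half_h))


print(resize_by_ratio((2, 2, 8, 8), ratio_t=0.75, ratio_prev=0.5))  # (1, 1, 9, 9)
```

Clamping the resized frame to the image borders is omitted here for brevity; a real implementation would need it.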

The segmentation module 320 is further adapted to: calculate the distance from the first foreground image in the t-th frame image to each border of the tracking frame corresponding to the (t-1)-th frame image; and adjust the size of the tracking frame corresponding to the (t-1)-th frame image according to these distances and a preset distance threshold.
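A minimal sketch of the distance-based adjustment follows. The text says only that the size is adjusted according to the border distances and a preset distance threshold. This example assumes one possible rule: each border is moved so that its gap to the foreground equals the threshold `gap`. It also assumes a non-empty binary mask; the function names are hypothetical.

```python
def foreground_extent(mask):
    """Tight extent (x0, y0, x1, y1) of the foreground; assumes at least one foreground pixel."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def border_distances(mask, frame):
    """Distances (left, top, right, bottom) from the foreground to the frame borders.

    A negative value means the foreground extends beyond that border.
    """
    fx0, fy0, fx1, fy1 = foreground_extent(mask)
    x0, y0, x1, y1 = frame
    return (fx0 - x0, fy0 - y0, x1 - fx1, y1 - fy1)


def resize_by_distance(mask, frame, gap=2):
    """Move every border so its distance to the foreground equals `gap` (assumed rule)."""
    height, width = len(mask), len(mask[0])
    left, top, right, bottom = border_distances(mask, frame)
    x0, y0, x1, y1 = frame
    x0 = max(x0 + (left - gap), 0)
    y0 = max(y0 + (top - gap), 0)
    x1 = min(x1 - (right - gap), width)
    y1 = min(y1 - (bottom - gap), height)
    return (x0, y0, x1, y1)


# 10x10 mask whose foreground covers columns 4-5 and rows 3-6.
demo_mask = [[1 if 4 <= x <= 5 and 3 <= y <= 6 else 0 for x in range(10)]
             for y in range(10)]
print(border_distances(demo_mask, (0, 0, 10, 10)))    # (4, 3, 4, 3)
print(resize_by_distance(demo_mask, (0, 0, 10, 10)))  # (2, 1, 8, 9)
```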

The segmentation module 320 is further adapted to: determine the center point position of the first foreground image in the t-th frame image according to the first foreground image in the t-th frame image; and adjust the position of the tracking frame corresponding to the (t-1)-th frame image according to that center point position, so that the center point of the tracking frame corresponding to the (t-1)-th frame image coincides with the center point of the first foreground image in the t-th frame image.
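The position adjustment amounts to translating the tracking frame, as the sketch below shows. The text does not define how the center point of the foreground is computed. This example assumes the centroid of the foreground pixels, with pixel (x, y) centered at (x + 0.5, y + 0.5); the function names are hypothetical.

```python
def foreground_center(mask):
    """Centroid of the foreground pixels, or None if the mask is empty (assumed definition)."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x + 0.5  # use pixel centers, not pixel corners
                sum_y += y + 0.5
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def recenter(frame, center):
    """Translate frame (x0, y0, x1, y1) so its center lands on `center`; size is unchanged."""
    x0, y0, x1, y1 = frame
    dx = round(center[0] - (x0 + x1) / 2)
    dy = round(center[1] - (y0 + y1) / 2)
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


# 8x8 mask with a 2x2 foreground block at columns 4-5, rows 4-5.
block_mask = [[1 if 4 <= x <= 5 and 4 <= y <= 5 else 0 for x in range(8)]
              for y in range(8)]
print(foreground_center(block_mask))                        # (5.0, 5.0)
print(recenter((0, 0, 4, 4), foreground_center(block_mask)))  # (3, 3, 7, 7)
```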

Optionally, the segmentation module 320 is further adapted to: extract the image to be segmented from a partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image; perform scene segmentation on the image to be segmented to obtain the segmentation result corresponding to the image to be segmented; and obtain the segmentation result corresponding to the t-th frame image from the segmentation result corresponding to the image to be segmented.

The segmentation module 320 is further adapted to: extract, from the t-th frame image, the image inside the tracking frame corresponding to the t-th frame image, and determine the extracted image as the image to be segmented.

The segmentation module 320 is further adapted to: input the image to be segmented into a scene segmentation network to obtain the segmentation result corresponding to the image to be segmented.
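The crop, segment, and map-back flow described above can be sketched as follows. The scene segmentation network is replaced by a stand-in callable (`threshold_segmenter`, a simple brightness threshold) purely so that the example runs. Treating every pixel outside the tracking frame as background when building the full-frame result is also an assumption, as are all function names.

```python
def crop(image, frame):
    """Extract the image inside frame (x0, y0, x1, y1); this is the image to be segmented."""
    x0, y0, x1, y1 = frame
    return [row[x0:x1] for row in image[y0:y1]]


def segment_frame(image, frame, segment_fn):
    """Segment only the cropped region, then paste the result into a full-size mask.

    Pixels outside the tracking frame are marked as background (assumption).
    """
    patch_mask = segment_fn(crop(image, frame))
    height, width = len(image), len(image[0])
    full_mask = [[0] * width for _ in range(height)]
    x0, y0, _, _ = frame
    for dy, row in enumerate(patch_mask):
        for dx, value in enumerate(row):
            full_mask[y0 + dy][x0 + dx] = value
    return full_mask


def threshold_segmenter(patch, level=128):
    """Stand-in for the scene segmentation network: bright pixels count as foreground."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in patch]


gray = [
    [255, 0,   0, 0],
    [0,   0, 200, 0],
    [0,   0,   0, 0],
    [0,   0,   0, 0],
]
result = segment_frame(gray, (1, 1, 3, 3), threshold_segmenter)
print(result[1][2], result[0][0])  # 1 0
```

Only the 2x2 patch is passed to the segmenter here, which is where the reduction in processed data comes from; the bright pixel at the top-left lies outside the tracking frame and is therefore never examined.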

The determination module 330 is adapted to: determine the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image.

The processing module 340 is adapted to: add personalized special effects according to the second foreground image to obtain the processed t-th frame image.

Optionally, the processing module 340 is further adapted to: extract key information of the region to be processed from the second foreground image; draw an effect map according to the key information; and fuse the effect map, the second foreground image, and a preset background image to obtain the processed t-th frame image, or alternatively fuse the effect map, the second foreground image, and a second background image determined according to the segmentation result corresponding to the t-th frame image to obtain the processed t-th frame image.
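One simple way to picture the fusion step is per-pixel compositing, sketched below. The text does not specify the fusion algorithm. This example assumes a binary foreground mask selects between foreground and background, and that the effect map carries a per-pixel opacity in [0, 1] and is alpha-blended on top. The function name `fuse` and the `effect_alpha` input are hypothetical.

```python
def fuse(background, foreground, mask, effect, effect_alpha):
    """Composite background, foreground, and effect map into one image.

    All inputs are equally sized 2-D lists; pixels are (r, g, b) tuples,
    mask entries are 0/1, and effect_alpha entries lie in [0, 1].
    """
    fused = []
    for y, bg_row in enumerate(background):
        fused_row = []
        for x, bg_pixel in enumerate(bg_row):
            # Foreground pixels replace the background where the mask is set.
            base = foreground[y][x] if mask[y][x] else bg_pixel
            alpha = effect_alpha[y][x]
            # Blend the effect map over the base pixel, channel by channel.
            fused_row.append(tuple(
                round(alpha * e + (1 - alpha) * b)
                for e, b in zip(effect[y][x], base)
            ))
        fused.append(fused_row)
    return fused


bg = [[(0, 0, 0), (0, 0, 0)]]
fg = [[(100, 100, 100), (100, 100, 100)]]
fx = [[(200, 200, 200), (200, 200, 200)]]
print(fuse(bg, fg, [[1, 0]], fx, [[0.5, 0.0]]))
# [[(150, 150, 150), (0, 0, 0)]]
```

Passing a preset background or the second background image as `background` covers both alternatives named in the text.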

The key information may specifically be key point information, key region information, and/or key line information. The embodiments of the present invention are described taking key point information as the key information. The processing module 340 is further adapted to: look up the basic effect map corresponding to the key point information, or alternatively obtain a basic effect map specified by the user; calculate, according to the key point information, the position information between at least two key points that have a symmetric relationship; and process the basic effect map according to the position information to obtain the effect map.
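To illustrate how the position information between two symmetric key points might drive the processing of a basic effect map, the sketch below derives a scale, a rotation angle, and an anchor point from a pair of points such as the two eyes. The choice of these three quantities, the `width_factor` parameter, and the function name are assumptions; the text states only that position information is computed and the basic effect map is processed accordingly.

```python
import math


def placement_from_symmetric_points(p_left, p_right, base_width, width_factor=2.0):
    """Derive placement parameters for a basic effect map from two symmetric key points.

    p_left and p_right are (x, y) pixel coordinates. The map is scaled so that its
    width equals width_factor times the distance between the points (assumed rule).
    """
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    distance = math.hypot(dx, dy)
    return {
        "scale": distance * width_factor / base_width,
        "angle_deg": math.degrees(math.atan2(dy, dx)),  # tilt of the line joining the points
        "center": ((p_left[0] + p_right[0]) / 2, (p_left[1] + p_right[1]) / 2),
    }


params = placement_from_symmetric_points((100, 100), (160, 100), base_width=240)
print(params["scale"], params["center"])  # 0.5 (130.0, 100.0)
```

The returned parameters would then be handed to whatever image library performs the actual scaling and rotation of the basic effect map.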

Optionally, the processing module 340 is further adapted to: extract key information of the region to be recognized from the second foreground image; recognize the pose of the specific object according to the key information to obtain a pose recognition result for the specific object; and, according to the pose recognition result for the specific object, determine the corresponding effect processing command to be responded to for the t-th frame image, and obtain the processed t-th frame image. The effect processing command to be responded to includes an effect map processing command, a stylization processing command, a brightness processing command, a lighting processing command, and/or a tone processing command.

Optionally, the processing module 340 is further adapted to: determine the corresponding effect processing command to be responded to for the t-th frame image according to the pose recognition result for the specific object and the interaction information with an interactive object contained in the t-th frame image, and obtain the processed t-th frame image.
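The mapping from a pose recognition result (optionally combined with interaction information) to an effect processing command can be pictured as a lookup table, as in this minimal sketch. The pose labels, the interaction cue, and the pairing of poses with commands are all hypothetical; only the five command categories come from the text.

```python
# Hypothetical rules: (pose label, interaction cue) -> effect processing command.
# The command names mirror the categories listed in the text.
COMMAND_TABLE = {
    ("hands_heart", None): "effect_map",
    ("hands_heart", "viewer_gift"): "stylization",
    ("thumbs_up", None): "brightness",
    ("palm_open", None): "lighting",
    ("fist", None): "tone",
}


def select_command(pose, interaction=None):
    """Pick the command for a pose, preferring a rule that also matches the interaction.

    Returns None when no rule applies, in which case the frame is left unprocessed.
    """
    command = COMMAND_TABLE.get((pose, interaction))
    if command is None:
        command = COMMAND_TABLE.get((pose, None))  # fall back to the pose alone
    return command


print(select_command("hands_heart"))                 # effect_map
print(select_command("hands_heart", "viewer_gift"))  # stylization
```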

The overlay module 350 is adapted to: overwrite the t-th frame image with the processed t-th frame image to obtain the processed video data.

The display module 360 is adapted to: display the processed video data.

After obtaining the processed video data, the display module 360 can display it in real time, so that the user can directly see the display effect of the processed video data.

The device may further include an upload module 370, adapted to upload the processed video data to a cloud server.

The upload module 370 may upload the processed video data directly to a cloud server. Specifically, the upload module 370 may upload the processed video data to one or more cloud video platform servers, such as those of iQiyi, Youku, or Kuai Video, so that the cloud video platform server can display the video data on the cloud video platform. Alternatively, the upload module 370 may upload the processed video data to a cloud live-streaming server; when a user at a live viewing terminal enters the cloud live-streaming server to watch, the server can push the video data to that viewer's client in real time. Alternatively, the upload module 370 may upload the processed video data to a cloud official-account server; when a user follows the official account, the server pushes the video data to the follower's client. Further, the cloud official-account server may also push video data that matches user habits to the follower clients, based on the viewing habits of the users who follow the official account.

According to the real-time video data processing device based on adaptive tracking frame segmentation provided in this embodiment, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained from the tracking frame corresponding to the (t-1)-th frame image, and that tracking frame is used to perform scene segmentation on the t-th frame image. The segmentation result corresponding to the t-th frame image can thus be obtained quickly and accurately, which effectively improves the accuracy of image scene segmentation. Compared with the prior art, which performs scene segmentation on the entire content of a frame image, the present invention performs scene segmentation only on a partial region of the frame image. This effectively reduces the amount of data processed in image scene segmentation, improves processing efficiency, and optimizes the image scene segmentation approach. Moreover, based on the obtained segmentation result, personalized special effects can be added to the frame image more accurately and quickly, which improves the display effect of the video data.

The present invention also provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the executable instruction can execute the real-time video data processing method based on adaptive tracking frame segmentation in any of the above method embodiments.

Fig. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.

As shown in Fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.

Wherein:

The processor 402, the communications interface 404, and the memory 406 communicate with one another through the communication bus 408.

The communications interface 404 is used to communicate with network elements of other devices, such as clients or other servers.

The processor 402 is used to execute a program 410, and may specifically execute the relevant steps in the above embodiments of the real-time video data processing method based on adaptive tracking frame segmentation.

Specifically, the program 410 may include program code, and the program code includes computer operation instructions.

The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.

The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.

The program 410 may specifically be used to cause the processor 402 to execute the real-time video data processing method based on adaptive tracking frame segmentation in any of the above method embodiments. For the specific implementation of each step in the program 410, refer to the corresponding descriptions of the corresponding steps and units in the above embodiments of real-time video data processing based on adaptive tracking frame segmentation; they are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions in the foregoing method embodiments.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such a system is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is intended to disclose the best mode of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline this disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of the present invention.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims

1. A real-time video data processing method based on adaptive tracking frame segmentation, the method being used to process each group of frame images obtained by dividing a video every n frames, wherein, for one group of frame images, the method comprises:

acquiring the t-th frame image, which contains a specific object, in the group of frame images, and the tracking frame corresponding to the (t-1)-th frame image, wherein t is greater than 1, and the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

adjusting the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image; and performing scene segmentation on a partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image;

determining the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image;

adding personalized special effects according to the second foreground image to obtain the processed t-th frame image;

overwriting the t-th frame image with the processed t-th frame image to obtain processed video data; and

displaying the processed video data.

2. The method according to claim 1, wherein adding personalized special effects according to the second foreground image to obtain the processed t-th frame image further comprises:

extracting key information of the region to be processed from the second foreground image;

drawing an effect map according to the key information; and

fusing the effect map, the second foreground image, and a preset background image to obtain the processed t-th frame image; or fusing the effect map, the second foreground image, and a second background image determined according to the segmentation result corresponding to the t-th frame image to obtain the processed t-th frame image.

3. The method according to claim 1 or 2, wherein the key information is key point information, and drawing the effect map according to the key information further comprises:

looking up the basic effect map corresponding to the key point information, or obtaining a basic effect map specified by the user;

calculating, according to the key point information, the position information between at least two key points that have a symmetric relationship; and

processing the basic effect map according to the position information to obtain the effect map.

4. The method according to any one of claims 1-3, wherein adding personalized special effects according to the second foreground image to obtain the processed t-th frame image further comprises:

extracting key information of the region to be recognized from the second foreground image;

recognizing the pose of the specific object according to the key information to obtain a pose recognition result for the specific object; and

determining, according to the pose recognition result for the specific object, the corresponding effect processing command to be responded to for the t-th frame image, and obtaining the processed t-th frame image.

5. The method according to any one of claims 1-4, wherein determining, according to the pose recognition result for the specific object, the corresponding effect processing command to be responded to for the t-th frame image and obtaining the processed t-th frame image further comprises:

determining the corresponding effect processing command to be responded to for the t-th frame image according to the pose recognition result for the specific object and the interaction information with an interactive object contained in the t-th frame image, and obtaining the processed t-th frame image.

6. The method according to any one of claims 1-5, wherein the effect processing command to be responded to includes an effect map processing command, a stylization processing command, a brightness processing command, a lighting processing command, and/or a tone processing command.

7. The method according to any one of claims 1-6, wherein adjusting the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image further comprises:

performing recognition processing on the t-th frame image to determine the first foreground image for the specific object in the t-th frame image;

applying the tracking frame corresponding to the (t-1)-th frame image to the t-th frame image; and

adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image.

8. A real-time video data processing device based on adaptive tracking frame segmentation, the device being used to process each group of frame images obtained by dividing a video every n frames, the device comprising:

an acquisition module, adapted to acquire the t-th frame image, which contains a specific object, in a group of frame images, and the tracking frame corresponding to the (t-1)-th frame image, wherein t is greater than 1, and the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

a segmentation module, adapted to adjust the tracking frame corresponding to the (t-1)-th frame image according to the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, and to perform scene segmentation on a partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image;

a determination module, adapted to determine the second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image;

a processing module, adapted to add personalized special effects according to the second foreground image to obtain the processed t-th frame image;

an overlay module, adapted to overwrite the t-th frame image with the processed t-th frame image to obtain processed video data; and

a display module, adapted to display the processed video data.

9. A computing device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus;

the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the real-time video data processing method based on adaptive tracking frame segmentation according to any one of claims 1-7.

10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the operations corresponding to the real-time video data processing method based on adaptive tracking frame segmentation according to any one of claims 1-7.