CN110602424A - Video processing method and electronic equipment - Google Patents

Info

Publication number
CN110602424A
CN110602424A
Authority
CN
China
Prior art keywords
audio data
data
processing
subject object
pixels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910803481.2A
Other languages
Chinese (zh)
Inventor
沈军行
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN201910803481.2A
Publication of CN110602424A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor
    • H04N5/92 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N5/9201 Transformation of the television signal for recording involving the multiplexing of an additional signal and the video signal
    • H04N5/9202 Transformation of the television signal for recording involving the multiplexing of an additional signal and the video signal, the additional signal being a sound signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the present invention discloses a video processing method and an electronic device. The video processing method includes: acquiring first image data and first audio data of first video data; performing focus processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one piece of second image data; performing focus processing on the audio data matching at least one subject object in the first audio data through a preset voice separation network to obtain at least one piece of second audio data; and encoding and compressing the second image data and the second audio data to obtain second video data. Embodiments of the present invention make it possible to perform focus processing on the image and audio of each subject object.

Description

Video processing method and electronic device

Technical field

Embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to a video processing method and an electronic device.

Background art

At present, image and audio separation technologies are widely used; after an image or an audio signal has been separated, focus processing can be applied to the separated image or audio to achieve focusing on that image or audio.

However, such approaches simply focus the image or the audio as a whole; they neither account for the fact that the image and audio corresponding to each subject object in a video may differ, nor focus the image and audio of each subject object separately.

Summary of the invention

Embodiments of the present invention provide a video processing method and an electronic device to solve the problem that the image and audio corresponding to each subject object cannot be focused separately.

To solve the above technical problem, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a video processing method, the method including:

acquiring first image data and first audio data of first video data;

performing focus processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one piece of second image data;

performing focus processing on the audio data matching at least one subject object in the first audio data through a preset voice separation network to obtain at least one piece of second audio data; and

encoding and compressing the second image data and the second audio data to obtain second video data.

In a second aspect, an embodiment of the present invention provides an electronic device, the electronic device including:

an acquisition module, configured to acquire the first image data and the first audio data of the first video data;

a first focusing module, configured to perform focus processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one piece of second image data;

a second focusing module, configured to perform focus processing on the audio data matching at least one subject object in the first audio data through a preset voice separation network to obtain at least one piece of second audio data; and

an encoding module, configured to encode and compress the second image data and the second audio data to obtain the second video data.

In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the video processing method described in the first aspect.

In a fourth aspect, an embodiment of the present invention further provides an electronic device, including:

a touch screen, where the touch screen includes a touch-sensitive surface and a display;

one or more processors;

one or more memories;

one or more sensors;

and one or more computer programs, where the one or more computer programs are stored in the one or more memories and include instructions that, when executed by the electronic device, cause the electronic device to perform the steps of the video processing method described in the first aspect.

In a fifth aspect, an embodiment of the present invention further provides a non-transitory computer storage medium storing a computer program, where the computer program, when executed by a computing device, implements the steps of the video processing method described in the first aspect.

In a sixth aspect, an embodiment of the present invention further provides a computer program product which, when run on a computer, causes the computer to execute the video processing method described in the first aspect.

In embodiments of the present invention, focus processing is performed, through a preset object separation network, on pixels of at least one subject object in the first image data of the electronic device to obtain at least one piece of second image data; and focus processing is performed, through a preset voice separation network, on the audio data matching at least one subject object in the first audio data of the electronic device to obtain at least one piece of second audio data, thereby enabling focus processing on the image data and audio data of each subject object.

Brief description of the drawings

Fig. 1 is a flowchart of a video processing method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of multi-person focusing provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of video processing provided by an embodiment of the present invention;

Fig. 4 is another schematic diagram of video processing provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of an electronic device provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of an electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a video processing method provided by an embodiment of the present invention. As shown in Fig. 1, the video processing method may include:

Step 101: acquire the first image data and the first audio data of the first video data;

Step 102: through a preset object separation network, perform focus processing on pixels of at least one subject object in the first image data to obtain at least one piece of second image data;

Step 103: through a preset voice separation network, perform focus processing on the audio data matching at least one subject object in the first audio data to obtain at least one piece of second audio data;

Step 104: encode and compress the second image data and the second audio data to obtain the second video data.
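The four steps above can be sketched as a minimal pipeline. The patent does not specify implementations for the separation networks or the codec, so `object_separation`, `voice_separation`, and `encode` below are hypothetical placeholders passed in as callables:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Video:
    image_data: List  # frames of the first image data
    audio_data: List  # samples/frames of the first audio data

def process_video(first_video: Video,
                  object_separation: Callable,
                  voice_separation: Callable,
                  encode: Callable):
    # Step 101: acquire the first image data and the first audio data
    images, audio = first_video.image_data, first_video.audio_data
    # Step 102: focus the subject-object pixels in each frame
    second_images = [object_separation(frame) for frame in images]
    # Step 103: focus the audio matching the subject object(s)
    second_audio = voice_separation(audio)
    # Step 104: encode and compress into the second video data
    return encode(second_images, second_audio)
```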

In embodiments of the present invention, focus processing is performed, through a preset object separation network, on pixels of at least one subject object in the first image data of the electronic device to obtain at least one piece of second image data; and focus processing is performed, through a preset voice separation network, on the audio data matching at least one subject object in the first audio data of the electronic device to obtain at least one piece of second audio data; focus processing on the image data and audio data of each subject object can thereby be realized.

In the embodiments of the present invention, the first image data in step 101 is the image data in the first video data, and the first audio data in step 101 is the audio data in the first video data. The first image data may be captured for the first video data by a camera or another image acquisition device, and the first audio data may be captured for the first video data by a microphone or another audio acquisition device.

In the embodiments of the present invention, performing focus processing on the pixels of at least one subject object in step 102 includes:

identifying pixels of non-subject objects in each frame of the image data based on the pixels of the at least one subject object; and

performing Gaussian filtering on the pixels of the non-subject objects based on a predetermined Gaussian filtering coefficient; or

performing grayscale processing on the pixels of the non-subject objects based on a predetermined grayscale processing coefficient.

In the embodiments of the present invention, according to the pixels of at least one subject object, the pixels of non-subject objects are identified in each frame of image, and the non-subject pixels are blurred or the subject pixels retain their color, thereby achieving focus processing on the subject object's pixels. Focus processing on pixels means displaying the subject object's pixels prominently while weakening the display of non-subject pixels. Blurring applies a Gaussian filter to the non-subject pixels: Ib' = GaussBlur(Ib, alpha). Color retention keeps the original color of the subject object's pixels while converting the non-subject pixels to grayscale: Ib' = Gray(Ib, alpha). Here, Ib denotes the pixels of the non-subject objects, Ib' the second image data, and alpha an adjustment parameter.
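One possible realization of Ib' = GaussBlur(Ib, alpha) and Ib' = Gray(Ib, alpha) is sketched below in Python with NumPy. The patent does not fix a filter implementation, so the separable Gaussian, the luminance weights, and the use of alpha as blur strength / blend factor are illustrative assumptions:

```python
import numpy as np

def gauss_kernel(sigma, radius=None):
    # 1-D Gaussian kernel, normalized to sum to 1
    radius = radius or max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def gauss_blur(channel, alpha):
    # Separable Gaussian blur of one 2-D channel; alpha controls the strength
    k = gauss_kernel(alpha)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def gray(img_rgb, alpha):
    # Blend an RGB image toward its luminance; alpha = 1 gives full grayscale
    lum = img_rgb @ np.array([0.299, 0.587, 0.114])
    return (1 - alpha) * img_rgb + alpha * lum[..., None]

def focus(frame_rgb, subject_mask, alpha, mode="gray"):
    # Subject pixels stay untouched; only the non-subject pixels Ib are processed
    out = frame_rgb.copy()
    if mode == "gray":
        processed = gray(frame_rgb, alpha)
    else:  # blur each color channel separately
        processed = np.stack(
            [gauss_blur(frame_rgb[..., c], alpha) for c in range(3)], axis=-1)
    out[~subject_mask] = processed[~subject_mask]
    return out
```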

The alpha parameter may be a fixed value or a variable value. For example, when the first video data is already-recorded video data, the alpha parameter is fixed; it is generally set before recording, and after the video has been recorded its value cannot be changed. When the first video data is video data being recorded, the value of the alpha parameter can be changed according to the user's settings, making the second video data better match the user's needs and improving the user experience.

In the embodiments of the present invention, after the pixels of the non-subject objects are identified in the first image data, the video processing method further includes adjusting the predetermined Gaussian filtering coefficient or the predetermined grayscale processing coefficient based on the acquired image brightness of the non-subject objects, which specifically includes:

acquiring the image brightness of the non-subject objects; and

adjusting the predetermined Gaussian filtering coefficient or the predetermined grayscale processing coefficient according to the image brightness.

It should be noted that if the image brightness of the non-subject objects is high, the predetermined Gaussian filtering coefficient or the predetermined grayscale processing coefficient can be increased to darken the image of the non-subject objects, thereby strengthening the focus on the subject object's pixels.
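The patent only states that a brighter background should yield a larger coefficient; the linear ramp and the `gain` parameter below are assumptions for illustration:

```python
import numpy as np

def adjust_coefficient(non_subject_pixels, base_alpha, gain=0.5):
    # Mean luminance of the non-subject region, assumed normalized to [0, 1]
    brightness = float(np.mean(non_subject_pixels))
    # Brighter backgrounds get a larger processing coefficient, clamped to 1.0
    return min(1.0, base_alpha + gain * brightness)
```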

In the embodiments of the present invention, performing focus processing on the audio data of at least one subject object in the first audio data in step 103 includes:

identifying the audio data of non-subject objects in the first audio data based on the audio data of the at least one subject object; and

attenuating the audio data of the non-subject objects based on a preset attenuation coefficient.

In the embodiments of the present invention, according to the audio data of at least one subject object, the audio data of non-subject objects is identified in each audio frame and suppressed by attenuation, thereby achieving focus processing on the subject object's audio data. Focus processing on audio data means attenuating the audio data of the non-subject objects so that the subject object's audio data stands out. The attenuation is Ab' = beta * Ab, where beta is an attenuation coefficient between 0 and 1; full suppression corresponds to beta = 0.

Likewise, the beta parameter may be a fixed value or a variable value. For example, when the first video data is already-recorded video data, the beta parameter is fixed; it is generally set before recording, and after the video has been recorded its value cannot be changed. When the first video data is video data being recorded, the value of the beta parameter can be changed according to the user's settings, making the second video data better match the user's needs and improving the user experience.

In the embodiments of the present invention, performing focus processing on the audio data of at least one subject object in step 103 may also include:

replacing the audio data of the subject object with preset audio data.

In the embodiments of the present invention, the way of focusing the subject object's audio data is not limited to attenuation suppression or replacement with preset audio data. Other ways, such as virtual sound, are also covered by the protection scope of the embodiments of the present invention, as long as they make the subject object's audio data stand out relative to that of the non-subject objects; details are not repeated here.

In the embodiments of the present invention, after the first image data and the first audio data of the first video data are acquired, the video processing method further includes:

determining target pixels of each subject object based on a user's selection input on at least one subject object;

determining target audio data matching each subject object; and

establishing a mapping relationship between the target pixels of each subject object and the target audio data matching that subject object, and storing the mapping relationship in the second video data.

Here, the target pixels are all pixels of the subject object, and the target audio data is all audio data of the subject object.

In the process of separating the subject objects, based on the user's selection input on at least one subject object, the pixels of a plurality of subject objects, for example I0, I1, ..., In, are segmented from the first image data, and the audio data matching each subject object, for example A0, A1, ..., An, is determined, where I0 and A0 correspond to the same subject object, I1 and A1 correspond to the same subject object, ..., and In and An correspond to the same subject object. A mapping relationship between the pixels and the audio data of the same subject object is established and stored in the second video data. The audio data matching a subject object refers to the audio data produced by that subject object or the audio data corresponding to it.
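The pairing of Ik with Ak can be sketched as a simple index-keyed mapping. The concrete storage format inside the second video data is not specified by the patent, so the `SubjectEntry` record and its optional weight fields (see the importance coefficients discussed below) are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class SubjectEntry:
    pixels: object            # Ik: the subject object's pixel data / mask
    audio: object             # Ak: the matching audio data
    image_weight: float = 1.0 # optional first weight value
    audio_weight: float = 1.0 # optional second weight value

def build_mapping(pixel_list, audio_list):
    # Ik and Ak at the same index belong to the same subject object
    assert len(pixel_list) == len(audio_list)
    return {k: SubjectEntry(p, a)
            for k, (p, a) in enumerate(zip(pixel_list, audio_list))}
```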

After the mapping relationship is established, different importance coefficients (that is, weight values) can also be set for different subject objects; for example, different first weight values for the pixels of different subject objects, and different second weight values for the audio data of different subject objects. Once set, the weight values are stored together with the mapping relationship in the second video data. Setting different weight values for multiple subject objects can further enrich the visual effect during later playback and improve the user experience.

In the embodiments of the present invention, determining the target audio data matching each subject object includes:

screening, from at least one piece of pre-stored audio data, the target audio data matching the audio features of each subject object;

or determining the audio data selected by the user as the target audio data matching each subject object.

It should be noted that in the first way of determining the target audio data matching a subject object, the identification is mainly performed by the electronic device: the audio data of each subject object is stored in advance and then screened. Because the audio features (for example, the voiceprint) of each subject object differ, voiceprint recognition can be performed for each, and the target audio data matching the audio features of each subject object is screened out. In the second way, the user's selection is determined as the target audio data matching each subject object.

In the embodiments of the present invention, by matching the audio data to the subject object, the pixels and the audio data of the same subject object are associated, which makes it convenient to subsequently process the pixels and the audio data of that subject object at the same time, further enriching the visual effect during later playback and improving the user experience.

In the embodiments of the present invention, the selection input includes at least one of the following: a single tap, a double tap, a long press, and the like.

In the embodiments of the present invention, after focus processing is performed on the pixels of at least one subject object in the first image data through the preset object separation network to obtain at least one piece of second image data, the video processing method further includes:

when at least two of the at least one subject object are selected, performing Gaussian filtering or grayscale processing on the target pixels of the at least two subject objects based on the first weight values; and

playing the second image data that has undergone the Gaussian filtering or grayscale processing;

where the preset Gaussian filtering coefficient or the preset grayscale processing coefficient of each of the at least one subject object corresponds to a different first weight value; and/or,

in the embodiments of the present invention, after focus processing is performed on the audio data matching at least one subject object in the first audio data through the preset voice separation network to obtain at least one piece of second audio data, the video processing method further includes:

when at least two of the at least one subject object are selected, mixing the target audio data of the at least two subject objects based on the second weight values; and

播放经过混音处理的第二音频数据;Play the second audio data after audio mixing;

其中,至少一个主体对象中的每一个主体对象的预设衰减系数对应不同的第二权重值。Wherein, the preset attenuation coefficients of each of the at least one main object correspond to different second weight values.

In the embodiment of the present invention, the user directly taps to select a certain subject object. After that subject object is identified, the pixels of the non-subject objects other than the subject object's pixels Ix (the pixels of one subject object among I0, I1, ..., In) are blurred or decolorized, and at the same time the audio data of the non-subject objects other than the audio data Ax matching the subject object (the audio data matching one subject object among A0, A1, ..., An) is suppressed. When the user selects multiple subject objects, different importance coefficients, for example c0, c1, c2, ..., cn, can be set for the multiple subject objects. For the pixels of the subject objects, the degree of blurring or of color retention of the different subject objects can be controlled based on the importance coefficients; for the audio data of the subject objects, weighted mixing can be performed based on the importance coefficients (A = c0*A0 + c1*A1 + c2*A2 + ... + cn*An).
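The weighted mixing of the matched audio tracks described above can be sketched as follows (a minimal NumPy sketch; the function name and the representation of audio frames as float arrays are illustrative assumptions, not part of the embodiment):

```python
import numpy as np

def focus_mix(audio_tracks, coeffs):
    # Weighted mix A = c0*A0 + c1*A1 + ... + cn*An of the separated
    # per-subject audio tracks, with the importance coefficients as weights.
    assert len(audio_tracks) == len(coeffs)
    mixed = np.zeros_like(audio_tracks[0], dtype=np.float64)
    for track, c in zip(audio_tracks, coeffs):
        mixed += c * np.asarray(track, dtype=np.float64)
    return mixed
```

For example, with two subjects weighted 0.8 and 0.2, each output sample is 0.8 of the first track's sample plus 0.2 of the second's.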

It should be noted that, for video data in the process of being recorded, different importance coefficients can be set for different subject objects during recording, which facilitates processing the second video data in real time for the subject object selected by the user. Taking pixels as an example, when the user selects the pixels Ix of a subject object I, Gaussian filter processing or grayscale processing is performed on the pixels other than Ix (including the pixels of the non-subject objects corresponding to the subject object I, the pixels of subject objects other than I, and the pixels of the non-subject objects corresponding to subject objects other than I) and the result is played.

For already-recorded video data, different importance coefficients need to be set for the different subject objects before recording is completed, and are stored in the second video data once set. Then, during subsequent playback of the second video data, when the user selects the pixels Ix of a subject object I, Gaussian filter processing or grayscale processing is performed on the pixels other than Ix in the second video data and the result is played.

The embodiment of the present invention can introduce importance parameters for different subject objects, which facilitates processing different subject objects to different degrees, thereby enriching the user's visual experience and making the video more engaging.

In addition, the mixing processing of the audio data may be performed simultaneously with, or separately from, the Gaussian filter processing or grayscale processing of the image data.

In the embodiment of the present invention, in a case where at least two subject objects among the at least one subject object are selected, the video processing method includes:

Step I: separately collecting the first image data and the first audio data in the first video data;

Step II: applying INet to each frame of image I in the first image data to perform subject object segmentation, segmenting out the pixels I0, I1, ..., In of multiple subject objects; and applying ANet to each frame of audio in the first audio data to perform segmentation, segmenting out the audio data A0, A1, ..., An matching the multiple subject objects;

Step III: establishing a mapping relationship (Ix <-> Ay) between the audio data of a subject object and the pixels of that subject object;

A selection is made on the separated audio waveform, the subject object to which the audio data belongs is determined from the audio data, and that subject object is then selected by tapping it on the screen of the electronic device, thereby establishing the mapping relationship between the pixels and the audio data of the same subject object.

Step IV: saving the mapping relationship (Ix <-> Ay) in the already-recorded video data or in the video data being recorded;

Step V: after the mapping relationship is parsed on the video playback side, the user can directly tap to select an object (for example, a portrait) on the playback interface; the pixels of the non-subject objects other than that portrait's pixels Ix are blurred or decolorized, and at the same time the audio data of the non-subject objects other than the corresponding audio Ax is suppressed and attenuated.

In FIG. 2, if I0 is selected, the pixels of I1 and I2 are blurred or decolorized, and the audio data A1 of I1 and the audio data A2 of I2 are suppressed and attenuated; if I0 and I1 are selected, the degree to which the pixels of I0, I1, and I2 are blurred or decolorized can be controlled according to importance coefficients c0, c1, and c2, and weighted mixing can be performed according to those coefficients, the audio data obtained after suppression and attenuation being A = c0*A0 + c1*A1 + c2*A2.

It should be noted that the user may also select multiple subjects. In this case, importance coefficients (c0, c1, c2, ..., cn) are introduced. For the visual pixels, the coefficients are used to control the degree of blurring or color fading of the different subjects; for the audio data, the importance coefficients can be used for weighted mixing (A = c0*A0 + c1*A1 + c2*A2 + ... + cn*An).

Step VI: recompressing and encoding the second image data I' and the second audio data A' and saving them as new video data.
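The mapping relationship (Ix <-> Ay) and the selection lookup of Steps III through V can be sketched as follows (a hypothetical sketch; the dictionary representation and the function name are illustrative assumptions, not the stored format of the embodiment):

```python
# Hypothetical Ix <-> Ay mapping for the three subjects of FIG. 2:
# each subject's pixel group maps to its matched audio track.
subject_map = {"I0": "A0", "I1": "A1", "I2": "A2"}

def audio_for_selection(selected, mapping):
    # Audio tracks matched to the selected subjects stay focused;
    # all other tracks are candidates for suppression and attenuation.
    focused = [mapping[s] for s in selected]
    suppressed = [a for s, a in mapping.items() if s not in selected]
    return focused, suppressed
```

Selecting I0 alone, for example, keeps A0 focused and marks A1 and A2 for attenuation.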

The embodiment of the present invention introduces interaction on the basis of single-portrait focusing, realizes multi-portrait focusing, and increases extensibility and the enjoyment of operation.

It should be noted that, for already-recorded video data, the first weight value corresponding to the pixels of each subject object and the second weight value corresponding to the audio data of each subject object can be preset before the video data is recorded. During subsequent processing, the blurring of the pixels of the non-subject objects or the color retention of the pixels of the subject objects is controlled according to the first weight values, the suppression and attenuation of the audio data of the subject objects is controlled according to the second weight values, and the processed video is stored; in subsequent playback, the stored processed video data is played. For video data in the process of being recorded, the first weight value corresponding to the pixels of each subject object and the second weight value corresponding to the audio data of each subject object can be set during recording, and the video data being recorded is processed accordingly to facilitate real-time playback.

In the embodiment of the present invention, the subject object separation network includes a Mask R-CNN, and/or the voice separation network includes a Long Short-Term Memory (LSTM) network.

In the embodiment of the present invention, the first video data is video data in the process of being recorded or already-recorded video data.

The video processing method is described in detail below for video data in the process of being recorded and for already-recorded video data, respectively.

In the embodiment of the present invention, in a case where the first video data is video data in the process of being recorded (as shown in FIG. 3), the video processing method includes:

First step: using a camera and a microphone to respectively collect the first image data and the first audio data of the video data being recorded;

Second step: applying INet to each frame of image I (i.e., the i-th frame) in the first image data to perform image segmentation, where the segmented pixels of the subject object are denoted In and the background pixels (i.e., the pixels of the non-subject objects) are denoted Ib;

Third step: blurring the separated pixels of the non-subject objects, or performing color retention on the pixels of the subject object, to obtain the second image data.

Blurring means applying Gaussian filtering to the pixels of the non-subject objects: Ib' = GaussBlur(Ib, alpha). Color retention means keeping the original color of the pixels of the subject object while converting the pixels of the non-subject objects to grayscale: Ib' = Gray(Ib, alpha). Here alpha is an adjustment parameter; for video data being recorded, the blurring or color-retention parameter can be adjusted in real time to change the final effect. The second image data is denoted I'.
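The grayscale branch Ib' = Gray(Ib, alpha) can be sketched as follows (a minimal NumPy sketch; interpreting alpha as the blend weight toward full desaturation is an assumption, and the BT.601 luminance weights are illustrative):

```python
import numpy as np

def gray_with_alpha(background_pixels, alpha):
    # Ib' = Gray(Ib, alpha): blend each non-subject pixel toward its
    # luminance; alpha = 1.0 fully desaturates, alpha = 0.0 keeps the color.
    pixels = np.asarray(background_pixels, dtype=np.float64)  # H x W x 3, RGB
    luma = pixels @ np.array([0.299, 0.587, 0.114])           # BT.601 weights
    gray = np.repeat(luma[..., None], 3, axis=-1)
    return (1.0 - alpha) * pixels + alpha * gray
```

With alpha = 1.0, a pure-red pixel (255, 0, 0) becomes a uniform gray of 0.299 * 255 in all three channels.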

Fourth step: applying ANet to each frame of audio A (i.e., the i-th frame) in the first audio data to separate out the audio data matching the corresponding subject object, where An denotes the audio data matching the subject object and Ab denotes the background audio data (i.e., the audio data of the non-subject objects); the original overall audio data is the superposition of the two, A = An + Ab;

Fifth step: suppressing and attenuating the audio data of the non-subject objects to obtain the second audio data;

where Ab' = beta*Ab, beta being an attenuation coefficient between 0 and 1, with beta = 0 for complete suppression. For video data being recorded, the beta parameter can also be adjusted in real time. The overall audio data after the suppression processing is A' = An + Ab' = An + beta*Ab.
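The suppression step A' = An + beta*Ab can be sketched as follows (a minimal NumPy sketch; the function name and the representation of audio frames as float arrays are assumptions):

```python
import numpy as np

def suppress_background(subject_audio, background_audio, beta):
    # A' = An + beta*Ab: keep the subject track unchanged and attenuate
    # the background track; beta in [0, 1], beta = 0 suppresses it fully.
    an = np.asarray(subject_audio, dtype=np.float64)
    ab = np.asarray(background_audio, dtype=np.float64)
    return an + beta * ab
```

With beta = 0.5, each output sample keeps the full subject signal and half of the background signal.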

Sixth step: separately encoding and compressing the second image data I' and the second audio data A';

Seventh step: transmitting the focused compressed video and compressed audio as a real-time stream over the network; this real-time stream is the video data being recorded, for example live-streamed video data.

The present invention effectively combines image segmentation with audio data separation, achieving focusing in both the image and audio dimensions of video data during recording.

In the embodiment of the present invention, in a case where the first video data is already-recorded video data (as shown in FIG. 4), the video processing method includes:

Step 1: decoding and demultiplexing the already-recorded video data to obtain the first image data and the first audio data respectively;

Step 2: applying INet to each frame of image I (i.e., the i-th frame) in the first image data to perform image segmentation, where the segmented pixels of the subject object are denoted In and the background pixels are denoted Ib;

Step 3: blurring or performing color-retention processing on the separated pixels of the non-subject objects to obtain the second image data;

Blurring means applying Gaussian filtering to the pixels of the non-subject objects: Ib' = GaussBlur(Ib, alpha). Color retention means keeping the original color of the pixels of the subject object while converting the pixels of the non-subject objects to grayscale: Ib' = Gray(Ib, alpha). Unlike for video data being recorded, alpha can only be preset once; once the recorded video data has been generated, it can no longer be modified. The second image data is denoted I';

Step 4: applying ANet to each frame of audio A (i.e., the i-th frame) in the first audio data to perform segmentation, where An denotes the audio data matching the subject object and Ab denotes the background audio data (i.e., the audio data of the non-subject objects); the original overall audio data is the superposition of the two, A = An + Ab;

Step 5: suppressing and attenuating the audio data of the non-subject objects to obtain the second audio data;

The suppression and attenuation is Ab' = beta*Ab, beta being an attenuation coefficient between 0 and 1, with beta = 0 for complete suppression. The overall audio data after the suppression and attenuation processing is A' = An + Ab' = An + beta*Ab;

Step 6: recompressing and encoding the second image data I' and the second audio data A' and saving them as the second video data; this second video data can be shared online as a short video.

The present invention effectively combines AI image segmentation with AI audio data separation, achieving focus processing in both the image and audio dimensions of already-recorded video data.

In the embodiment of the present invention, the subject object includes, but is not limited to, a person, an animal, a cartoon character, a cartoon animal, and the like.

FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 5, the electronic device 50 includes:

an acquisition module 501, configured to acquire first image data and first audio data in first video data;

a first focusing module 502, configured to perform focus processing, through a preset object separation network, on the pixels of at least one subject object in the first image data to obtain at least one piece of second image data;

a second focusing module 503, configured to perform focus processing, through a preset voice separation network, on the audio data matching the at least one subject object in the first audio data to obtain at least one piece of second audio data;

an encoding module 504, configured to perform encoding and compression processing on the second image data and the second audio data to obtain second video data.

In the embodiment of the present invention, focus processing is performed, through a preset object separation network, on the pixels of at least one subject object in the first image data of the electronic device to obtain at least one piece of second image data; and focus processing is performed, through a preset voice separation network, on the audio data matching the at least one subject object in the first audio data of the electronic device to obtain at least one piece of second audio data, so that focus processing of the image data and the audio data of each subject object can be achieved.

Optionally, the first focusing module 502 is further configured to:

identify, based on the pixels of the at least one subject object, the pixels of the non-subject objects from the first image data;

perform Gaussian filter processing on the pixels of the non-subject objects based on a preset Gaussian filter processing coefficient, or perform grayscale processing on the pixels of the non-subject objects based on a preset grayscale processing coefficient.

In the embodiment of the present invention, by performing Gaussian filter processing or grayscale processing on the pixels of the non-subject objects, focus processing of the pixels of the subject object is achieved.

Optionally, the electronic device further includes:

the acquisition module, further configured to acquire the image brightness of the non-subject objects;

an adjustment module, configured to adjust the preset Gaussian filter processing coefficient or the preset grayscale processing coefficient according to the image brightness.

The embodiment of the present invention can flexibly and dynamically adjust the preset Gaussian filter processing coefficient or the preset grayscale processing coefficient based on the image brightness of the pixels of the non-subject objects, thereby dynamically adjusting the focusing effect.

Optionally, the second focusing module 503 is further configured to:

identify, based on the audio data of the at least one subject object, the audio data of the non-subject objects from the first audio data;

perform attenuation processing on the audio data of the non-subject objects based on a preset attenuation coefficient.

In the embodiment of the present invention, by performing attenuation processing on the audio data of the non-subject objects, focus processing of the audio data of the subject object is achieved.

Optionally, the second focusing module 503 is further configured to:

replace the audio data of the subject object with preset audio data.

In the embodiment of the present invention, the way of performing focus processing on the audio data of the subject object is not limited to suppression and attenuation, replacement with preset audio data, and the like; virtual sound is another example. Any approach that makes the audio data of the subject object stand out relative to the audio data of the non-subject objects falls within the scope of protection of the embodiments of the present invention, and details are not repeated here.

Optionally, the electronic device further includes:

a determining module, configured to determine the target pixels of each subject object based on the user's selection input on the at least one subject object;

the determining module, further configured to determine the target audio data matching each subject object;

an establishing module, configured to establish a mapping relationship between the target pixels of each subject object and the target audio data matching that subject object;

a storage module, configured to store the mapping relationship in the second video data.

In the embodiment of the present invention, by establishing the mapping relationship between the pixels and the audio data of each subject object, focus processing can be performed on the pixels and the audio data of each subject object.

Optionally, the determining module is further configured to:

filter, from at least one piece of pre-stored audio data, the target audio data matching the audio features of each subject object;

or determine the audio data selected by the user as the target audio data matching each subject object.

In the embodiment of the present invention, by matching audio data to a subject object, the pixels of each subject object can be associated with audio data, which facilitates subsequent simultaneous processing of that subject object's pixels and audio data, so that during later playback the visual effect presented to the user can be further enriched and the user experience improved.

Optionally, the electronic device further includes:

a processing module, configured to perform Gaussian filter processing or grayscale processing, based on first weight values, on the target pixels of at least two subject objects in a case where the at least two subject objects among the at least one subject object are selected;

a playback module, configured to play the second image data that has undergone the Gaussian filter processing or the grayscale processing;

wherein the preset Gaussian filter processing coefficient or the preset grayscale processing coefficient of each subject object among the at least one subject object corresponds to a different first weight value.

The embodiment of the present invention can introduce importance parameters for different subject objects, which facilitates performing Gaussian filter processing or grayscale processing of different degrees on the pixels of different subject objects, thereby enriching the user's visual experience and making the video more engaging.

Optionally, the electronic device further includes: a processing module, configured to perform mixing processing, based on second weight values, on the target audio data of at least two subject objects in a case where the at least two subject objects among the at least one subject object are selected; and a playback module, configured to play the second audio data that has undergone the mixing processing; wherein the preset attenuation coefficient of each subject object among the at least one subject object corresponds to a different second weight value.

The embodiment of the present invention can introduce importance parameters for different subject objects, which facilitates performing attenuation processing of different degrees on the audio data matching different subject objects, thereby enriching the user experience and making the video more engaging.

Optionally, the first video data is video data in the process of being recorded or already-recorded video data.

Optionally, the subject object separation network includes a Mask R-CNN, and/or the voice separation network includes a Long Short-Term Memory (LSTM) network.

The electronic device provided by the embodiment of the present invention can implement each process implemented by the electronic device in the method embodiment of FIG. 1; to avoid repetition, details are not repeated here.

In the embodiment of the present invention, focus processing is performed, through a preset object separation network, on the pixels of at least one subject object in the first image data of the electronic device to obtain at least one piece of second image data; and focus processing is performed, through a preset voice separation network, on the audio data matching the at least one subject object in the first audio data of the electronic device to obtain at least one piece of second audio data, so that focus processing of the image and the audio in the first video data can be achieved.

FIG. 6 is a schematic diagram of the hardware structure of an electronic device implementing various embodiments of the present invention. The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components. Those skilled in the art will understand that the electronic device structure shown in FIG. 6 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, or combine certain components, or arrange the components differently. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.

The processor 110 is configured to: perform focus processing, through a preset object separation network, on the pixels of at least one subject object in the first image data to obtain at least one piece of second image data;

perform focus processing, through a preset voice separation network, on the audio data matching the at least one subject object in the first audio data to obtain at least one piece of second audio data; and

perform encoding and compression processing on the second image data and the second audio data to obtain second video data.

In the embodiment of the present invention, focus processing is performed, through a preset object separation network, on the pixels of at least one subject object in the first image data of the electronic device to obtain at least one piece of second image data; and focus processing is performed, through a preset voice separation network, on the audio data matching the at least one subject object in the first audio data of the electronic device to obtain at least one piece of second audio data, so that focus processing of the image data and the audio data of each subject object can be achieved.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 101 may be used for receiving and sending signals in the process of sending and receiving information or during a call. Specifically, downlink data from a base station is received and then handed to the processor 110 for processing; in addition, uplink data is sent to the base station. Generally, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 may also communicate with the network and other devices through a wireless communication system.

The electronic device provides users with wireless broadband Internet access through the network module 102, for example helping users send and receive e-mail, browse web pages, and access streaming media.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102, or stored in the memory 109, into an audio signal and output it as sound. Moreover, the audio output unit 103 may also provide audio output related to a specific function performed by the electronic device 100 (for example, a call signal reception sound or a message reception sound). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.

The input unit 104 is configured to receive audio or video signals. The input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the network module 102. The microphone 1042 may receive sound and is capable of processing such sound into audio data. In the case of a phone call mode, the processed audio data may be converted into a format that can be sent via the radio frequency unit 101 to a mobile communication base station for output.

电子设备100还包括至少一种传感器105,比如光传感器、运动传感器以及其他传感器。具体地,光传感器包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板1061的亮度,接近传感器可在电子设备100移动到耳边时,关闭显示面板1061和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别电子设备姿态(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;传感器105还可以包括指纹传感器、压力传感器、虹膜传感器、分子传感器、陀螺仪、气压计、湿度计、温度计、红外线传感器等,在此不再赘述。The electronic device 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and the / or backlighting. As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when it is still, and can be used to identify the posture of electronic equipment (such as horizontal and vertical screen switching, related games) , magnetometer posture calibration), vibration recognition-related functions (such as pedometer, knocking), etc.; the sensor 105 can also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, Infrared sensors, etc., will not be repeated here.

The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

The user input unit 107 may be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the electronic device. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 1071 with a finger, a stylus, or any other suitable object or accessory). The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 110, and receives and executes commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1071, the user input unit 107 may also include other input devices 1072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, and a joystick; these are not described here.

Further, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in FIG. 6 the touch panel 1071 and the display panel 1061 are shown as two independent components implementing the input and output functions of the electronic device, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions; this is not limited here.
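The touch path described above (touch detection device, then touch controller producing contact coordinates, then processor-side event typing) can be sketched as follows. All names and the event categories here are hypothetical; the patent does not specify them.

```python
# Illustrative stand-ins for the touch pipeline described in the text.

def touch_controller(raw_signal):
    """Convert a raw (row, col) detection into contact coordinates."""
    row, col = raw_signal
    return {"x": col, "y": row}

def classify_touch_event(coords, duration_ms):
    """Minimal stand-in for the processor's touch-event-type decision."""
    if duration_ms >= 500:
        return ("long_press", coords)
    return ("tap", coords)

# A short contact at panel position row=120, col=80 becomes a tap event.
event = classify_touch_event(touch_controller((120, 80)), duration_ms=30)
```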

The interface unit 108 is an interface for connecting an external device to the electronic device 100. For example, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements within the electronic device 100, or to transfer data between the electronic device 100 and an external device.

The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through the use of the electronic device (such as audio data and a phone book). In addition, the memory 109 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 110 is the control center of the electronic device. It connects the various parts of the entire electronic device using various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 109 and invoking data stored in the memory 109, thereby monitoring the electronic device as a whole. The processor 110 may include one or more processing units. Preferably, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110.

The electronic device 100 may further include a power supply 111 (such as a battery) that supplies power to each component. Preferably, the power supply 111 may be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.

In addition, the electronic device 100 includes some functional modules that are not shown, which are not described here.

Preferably, an embodiment of the present invention further provides an electronic device, including a processor 110, a memory 109, and a computer program stored in the memory 109 and executable on the processor 110. When executed by the processor 110, the computer program implements each process of the foregoing video processing method embodiments and can achieve the same technical effects; to avoid repetition, details are not repeated here.

An embodiment of the present invention further provides an electronic device, including:

a touch screen, wherein the touch screen includes a touch-sensitive surface and a display screen;

one or more processors 110;

one or more memories 109;

one or more sensors;

and one or more computer programs, wherein the one or more computer programs are stored in the one or more memories, and the one or more computer programs include instructions which, when executed by the electronic device, cause the electronic device to perform each process of the foregoing video processing method embodiments and achieve the same technical effects; to avoid repetition, details are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing video processing method embodiments and can achieve the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

An embodiment of the present invention further provides a computer non-transitory storage medium storing a computer program. When the computer program is executed by a computing device, each process of the foregoing video processing method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.

An embodiment of the present invention further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform each process of the foregoing video processing method embodiments and can achieve the same technical effects; to avoid repetition, details are not repeated here.

It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be an electronic device, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific implementations described above. The specific implementations described above are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (14)

1. A video processing method, comprising: acquiring first image data and first audio data of first video data; performing focus processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one piece of second image data; performing focus processing on audio data matching the at least one subject object in the first audio data through a preset speech separation network to obtain at least one piece of second audio data; and encoding and compressing the second image data and the second audio data to obtain second video data.

2. The method according to claim 1, wherein performing focus processing on the pixels of the at least one subject object in the first image data comprises: identifying pixels of a non-subject object from the first image data based on the pixels of the at least one subject object; and performing Gaussian filtering on the pixels of the non-subject object based on a preset Gaussian filtering coefficient, or performing grayscale processing on the pixels of the non-subject object based on a preset grayscale processing coefficient.

3. The method according to claim 2, further comprising, after identifying the pixels of the non-subject object from the first image data: acquiring an image brightness of the non-subject object; and adjusting the preset Gaussian filtering coefficient or the preset grayscale processing coefficient according to the image brightness.

4. The method according to claim 1, wherein performing focus processing on the audio data of the at least one subject object in the first audio data comprises: identifying audio data of a non-subject object from the first audio data based on the audio data of the at least one subject object; and attenuating the audio data of the non-subject object based on a preset attenuation coefficient.

5. The method according to claim 1, wherein performing focus processing on the audio data of the at least one subject object comprises: replacing the audio data of the subject object with preset audio data.

6. The method according to claim 1, further comprising, after acquiring the first image data and the first audio data of the first video data: determining a target pixel of each subject object based on a user's selection input on the at least one subject object; determining target audio data matching each subject object; and establishing a mapping relationship between the target pixels of each subject object and the target audio data matching that subject object, and storing the mapping relationship in the second video data.

7. The method according to claim 6, wherein determining the target audio data matching each subject object comprises: screening, from at least one piece of pre-stored audio data, target audio data matching audio features of each subject object; or determining audio data selected by the user as the target audio data matching each subject object.
8. The method according to claim 2, further comprising, after obtaining the at least one piece of second image data: in a case where at least two of the at least one subject object are selected, performing Gaussian filtering or grayscale processing on target pixels of the at least two subject objects based on first weight values; and playing the second image data after the Gaussian filtering or grayscale processing; wherein the preset Gaussian filtering coefficient or the preset grayscale processing coefficient of each of the at least one subject object corresponds to a different first weight value.

9. The method according to claim 4, further comprising, after obtaining the at least one piece of second audio data: in a case where at least two of the at least one subject object are selected, mixing target audio data of the at least two subject objects based on second weight values; and playing the mixed second audio data; wherein the preset attenuation coefficient of each of the at least one subject object corresponds to a different second weight value.

10. The method according to claim 1, wherein the first video data is video data being recorded or video data that has been recorded.

11. The method according to claim 1, wherein the object separation network comprises a Mask R-CNN.

12. The method according to claim 1, wherein the speech separation network comprises a long short-term memory (LSTM) network.

13. An electronic device, comprising: an acquisition module configured to acquire first image data and first audio data of first video data; a first focusing module configured to perform focus processing on pixels of at least one subject object in the first image data through a preset object separation network to obtain at least one piece of second image data; a second focusing module configured to perform focus processing on audio data matching the at least one subject object in the first audio data through a preset speech separation network to obtain at least one piece of second audio data; and an encoding module configured to encode and compress the second image data and the second audio data to obtain second video data.

14. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video processing method according to any one of claims 1 to 12.
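The focusing steps in the claims above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the object separation network of claim 11 and the speech separation network of claim 12 are replaced by precomputed inputs (a per-pixel boolean subject mask and per-source sample lists), only the grayscale branch of claim 2 is shown, and all names and coefficient values are hypothetical rather than the patent's implementation.

```python
# Illustrative sketches of the claimed focus processing (not the patent's code).

def focus_pixels(pixels, subject_mask, gray_coeff=0.6):
    """Claim 2 (grayscale branch): keep subject pixels, de-emphasize the rest.

    pixels: list of (r, g, b) tuples; subject_mask: list of bools of the same
    length; gray_coeff stands in for the preset grayscale processing coefficient.
    """
    out = []
    for (r, g, b), is_subject in zip(pixels, subject_mask):
        if is_subject:
            out.append((r, g, b))                       # subject stays in focus
        else:
            gray = int(((r + g + b) / 3) * gray_coeff)  # attenuated grayscale
            out.append((gray, gray, gray))
    return out

def attenuate_non_subject(samples, is_subject_flags, attenuation=0.25):
    """Claim 4: scale down audio samples not matched to a subject object."""
    return [s if keep else s * attenuation
            for s, keep in zip(samples, is_subject_flags)]

def mix_subjects(tracks, weights):
    """Claim 9: weighted mix of equal-length per-subject audio tracks."""
    length = len(tracks[0])
    return [sum(w * t[i] for w, t in zip(weights, tracks))
            for i in range(length)]
```

Claim 3's brightness-based adjustment would then amount to recomputing `gray_coeff` from the measured brightness of the non-subject region before calling `focus_pixels`.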
CN201910803481.2A 2019-08-28 2019-08-28 Video processing method and electronic equipment Pending CN110602424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910803481.2A CN110602424A (en) 2019-08-28 2019-08-28 Video processing method and electronic equipment


Publications (1)

Publication Number Publication Date
CN110602424A true CN110602424A (en) 2019-12-20

Family

ID=68856197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910803481.2A Pending CN110602424A (en) 2019-08-28 2019-08-28 Video processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN110602424A (en)


Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007266967A (en) * 2006-03-28 2007-10-11 Yamaha Corp Sound image localizer and multichannel audio reproduction device
CN101132839A (en) * 2005-05-05 2008-02-27 索尼计算机娱乐公司 Selective sound source monitoring combined with computer interactive processing
CN101563698A (en) * 2005-09-16 2009-10-21 富利克索尔股份有限公司 personalized video
WO2012142323A1 (en) * 2011-04-12 2012-10-18 Captimo, Inc. Method and system for gesture based searching
CN103516894A (en) * 2012-06-25 2014-01-15 Lg电子株式会社 Mobile terminal and audio zooming method thereof
CN105075237A (en) * 2013-02-28 2015-11-18 索尼公司 Image processing apparatus, image processing method, and program
WO2016082199A1 (en) * 2014-11-28 2016-06-02 华为技术有限公司 Method for recording sound of image-recorded object and mobile terminal
US20160224545A1 (en) * 2007-12-20 2016-08-04 Porto Technology, Llc System And Method For Generating Dynamically Filtered Content Results, Including For Audio And/Or Video Channels
CN105989845A (en) * 2015-02-25 2016-10-05 杜比实验室特许公司 Video content assisted audio object extraction
CN107230187A (en) * 2016-03-25 2017-10-03 北京三星通信技术研究有限公司 Method and device for multimedia information processing
US20170289495A1 (en) * 2014-09-12 2017-10-05 International Business Machines Corporation Sound source selection for aural interest
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN108369816A (en) * 2015-11-11 2018-08-03 微软技术许可有限责任公司 Apparatus and method for creating video clips from omnidirectional video
CN109313904A (en) * 2016-05-30 2019-02-05 索尼公司 Video/audio processing apparatus, video/audio processing method, and program
CN109983786A (en) * 2016-11-25 2019-07-05 索尼公司 Reproduction device, reproduction method, information processing device, information processing method, and program
CN110648612A (en) * 2018-06-26 2020-01-03 乐金显示有限公司 Display device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021180046A1 (en) * 2020-03-13 2021-09-16 华为技术有限公司 Image color retention method and device
EP4109879A4 (en) * 2020-03-13 2023-10-04 Huawei Technologies Co., Ltd. METHOD AND DEVICE FOR IMAGE COLOR RETENTION
CN112165591A (en) * 2020-09-30 2021-01-01 联想(北京)有限公司 Audio data processing method and device and electronic equipment
CN112165591B (en) * 2020-09-30 2022-05-31 联想(北京)有限公司 Audio data processing method and device and electronic equipment
CN112235715A (en) * 2020-10-15 2021-01-15 中国电子科技集团公司第五十四研究所 A multifunctional terminal for real-time route planning in an unknown environment
CN112423081A (en) * 2020-11-09 2021-02-26 腾讯科技(深圳)有限公司 Video data processing method, device and equipment and readable storage medium
CN112584225A (en) * 2020-12-03 2021-03-30 维沃移动通信有限公司 Video recording processing method, video playing control method and electronic equipment
CN115881161A (en) * 2021-09-29 2023-03-31 华为技术有限公司 Speech processing method, model training method and electronic device

Similar Documents

Publication Publication Date Title
CN110602424A (en) Video processing method and electronic equipment
CN110781899B (en) Image processing method and electronic device
CN107230192B (en) Image processing method, apparatus, computer-readable storage medium, and mobile terminal
CN108234882B (en) Image blurring method and mobile terminal
CN111708503B (en) Screen projection control method, device and computer readable storage medium
CN107566739B (en) A camera method and mobile terminal
CN109361867B (en) Filter processing method and mobile terminal
CN107707827A (en) A kind of high-dynamics image image pickup method and mobile terminal
CN107730460B (en) Image processing method and mobile terminal
CN111182211B (en) Shooting method, image processing method, and electronic device
CN110177296A (en) A kind of video broadcasting method and mobile terminal
CN107635110A (en) A video screenshot method and terminal
CN109104578B (en) An image processing method and mobile terminal
CN110766610A (en) A kind of super-resolution image reconstruction method and electronic device
CN107895352A (en) A kind of image processing method and mobile terminal
CN108682040A (en) A kind of sketch image generation method, terminal and computer readable storage medium
CN107820022A (en) A kind of photographic method and mobile terminal
CN108198127A (en) A kind of image processing method, device and mobile terminal
CN110490897A (en) Imitate the method and electronic equipment that video generates
CN110602384A (en) Exposure control method and electronic device
CN110769186A (en) A video call method, first electronic device and second electronic device
CN109005314B (en) Image processing method and terminal
CN111246053A (en) Image processing method and electronic device
CN110930372B (en) An image processing method, electronic device, and computer-readable storage medium
CN110766606B (en) Image processing method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220