CN115396728B

CN115396728B - Method, device, electronic device and medium for determining video playback speed

Info

Publication number: CN115396728B
Application number: CN202210994766.0A
Authority: CN
Inventors: 沈晓磊
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2024-08-27
Anticipated expiration: 2042-08-18
Also published as: CN115396728A

Abstract

The present application discloses a method, device, electronic device and medium for determining video playback speed, which belongs to the field of image technology. The method includes: obtaining first video content feature information corresponding to N video clips in a first video, where N is a positive integer; respectively splicing the user behavior feature information of the target user with the first video content feature information corresponding to the N video clips to obtain the video feature information corresponding to each of the N video clips; determining the first playback speed corresponding to each video clip based on the video feature information corresponding to each video clip; and playing the first video based on the first playback speed.

Description

Method, device, electronic device and medium for determining video playback speed

Technical Field

The present application belongs to the field of artificial intelligence technology, and specifically relates to a method, device, electronic device and medium for determining a video playback speed.

Background Art

The number of videos on the Internet keeps growing, and video content is becoming richer. Because users are interested in different video clips to different degrees while watching a video, the players currently on the market all provide a variable-speed playback function to meet needs such as fast browsing or slow viewing.

However, the variable-speed playback function of current players requires the user to repeatedly and actively trigger adjustment of the playback speed while watching. This makes the overall speed adjustment process cumbersome: the user must keep adjusting the playback speed manually, which degrades the viewing experience.

Summary of the invention

The purpose of the embodiments of the present application is to provide a method, device, electronic device and medium for determining a video playback speed, which enable the electronic device to adaptively adjust the playback speed of the current video clip while the user is watching a video.

In a first aspect, an embodiment of the present application provides a method for determining a video playback speed, the method comprising: obtaining first video content feature information corresponding to N video clips in a first video, where N is a positive integer; splicing user behavior feature information of a target user with the first video content feature information corresponding to each of the N video clips to obtain video feature information corresponding to each of the N video clips; determining a first playback speed corresponding to each video clip based on the video feature information corresponding to that video clip; and playing the first video based on the first playback speed.

In a second aspect, an embodiment of the present application provides a device for determining a video playback speed, the device comprising an acquisition module and a processing module. The acquisition module is used to acquire N video clips in a first video, where N is a positive integer. The acquisition module is also used to acquire video feature information corresponding to each of the N video clips; the video feature information corresponding to any video clip includes user behavior feature information of a target user and video content feature information of that video clip. The processing module is used to determine a first playback speed corresponding to each video clip based on the video feature information corresponding to that video clip. The processing module is also used to apply the first playback speed corresponding to each video clip to the respective video clip.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, wherein the memory stores a program or instructions executable on the processor, and when the program or instructions are executed by the processor, the steps of the method described in the first aspect are implemented.

In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instructions are stored, and when the program or instructions are executed by a processor, the steps of the method described in the first aspect are implemented.

In a fifth aspect, an embodiment of the present application provides a chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instructions to implement the method described in the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product, which is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.

In the embodiments of the present application, first video content feature information corresponding to N video clips in a first video is obtained, where N is a positive integer; user behavior feature information of a target user is spliced with the first video content feature information corresponding to each of the N video clips to obtain video feature information corresponding to each of the N video clips; a first playback speed corresponding to each video clip is determined based on the video feature information corresponding to that video clip; and the first video is then played based on the first playback speed. In this way, by integrating the user behavior feature information of the target user into the video content feature information of each video clip, the electronic device can set the playback speed of each clip in a personalized manner according to both the user behavior feature information and the video content feature information, so that the user does not need to adjust the playback speed manually.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a first schematic flowchart of a method for determining a video playback speed provided by an embodiment of the present application;

FIG. 2 is a first schematic model diagram of a method for determining a video playback speed provided by an embodiment of the present application;

FIG. 3 is a second schematic model diagram of a method for determining a video playback speed provided by an embodiment of the present application;

FIG. 4 is a third schematic model diagram of a method for determining a video playback speed provided by an embodiment of the present application;

FIG. 5 is a second schematic flowchart of a method for determining a video playback speed provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device for determining a video playback speed provided by an embodiment of the present application;

FIG. 7 is a first schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application;

FIG. 8 is a second schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the scope of protection of the present application.

The terms "first", "second", etc. in the specification and claims of this application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here. The objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object can be one or more. In addition, "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

The method, device, electronic device and readable storage medium for determining a video playback speed provided by the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.

The method for determining a video playback speed provided in the embodiments of the present application can be applied to video viewing scenarios.

The number of videos on the Internet keeps growing, and video content is becoming richer. Because users are interested in different video clips to different degrees while watching a video, the players currently on the market all provide a variable-speed playback function to meet needs such as fast browsing or slow viewing. However, the variable-speed playback function of current players requires the user to repeatedly and actively trigger adjustment of the playback speed while watching. This makes the overall speed adjustment process cumbersome: the user must keep adjusting the playback speed manually, which degrades the viewing experience.

For example, the variable-speed playback function of current players offers two adjustment methods. In the first, the user actively taps to select a fixed value (for example, 0.5x/0.75x/1.0x/1.5x/2.0x), and the current video clip is played at the speed corresponding to that value. In the second, the user long-presses the screen, which triggers a fixed value (for example, 3.0x) for the duration of the long press, and the current video clip is played at the speed corresponding to that value.

It can be seen that current schemes for determining the playback speed depend too heavily on active user input. Both methods have an obvious defect: each user input can only set one fixed playback speed, and the speed cannot be adjusted automatically and promptly for different clips in the video.

In summary, how to determine the playback speed of the current video clip without manual triggering by the user during viewing is a technical problem that this application urgently needs to solve.

In the method for determining a video playback speed provided in the embodiments of the present application, first video content feature information corresponding to N video clips in a first video is obtained, where N is a positive integer; user behavior feature information of a target user is spliced with the first video content feature information corresponding to each of the N video clips to obtain video feature information corresponding to each of the N video clips; a first playback speed corresponding to each video clip is determined based on the video feature information corresponding to that video clip; and the first video is then played based on the first playback speed. In this way, by integrating the user behavior feature information of the target user into the video content feature information of each video clip, the electronic device can set the playback speed of each clip in a personalized manner according to both the user behavior feature information and the video content feature information, so that the user does not need to adjust the playback speed manually.

The method for determining a video playback speed provided in the embodiments of the present application may be executed by a device for determining a video playback speed, and this device may be an electronic device or a functional module in the electronic device.

An embodiment of the present application provides a method for determining a video playback speed. FIG. 1 shows a flowchart of the method, which can be applied to an electronic device. As shown in FIG. 1, the method for determining a video playback speed provided by the embodiment of the present application may include the following steps 201 to 204.

Step 201: Obtain first video content feature information corresponding to N video clips in a first video.

Here, N is a positive integer.

In the embodiment of the present application, the first video may be a video in any application on the electronic device, or a video stored locally on the electronic device.

In the embodiment of the present application, the electronic device may split the first video, regularly or randomly, into N video clips according to a predetermined number of clips.

In the embodiment of the present application, the electronic device may input the acquired first video into a video segmentation model for segmentation processing, thereby obtaining N video clips.

In the embodiment of the present application, the video segmentation model is an already trained video segmentation model.

Exemplarily, after the electronic device inputs the first video of duration T into the video segmentation model, it can sample N_f = T/I frames according to the set video sampling interval I, and group the N_f frames so that each clip contains M frames, obtaining N_s = N_f/M video clips.
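Exemplarily, the segmentation arithmetic above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function name `split_into_clips`, the use of millisecond timestamps in place of decoded frames, and the handling of remainders (floor division, with a trailing partial clip dropped) are all assumptions that the text does not specify.

```python
def split_into_clips(duration_ms, interval_ms, frames_per_clip):
    """Group sampled frame timestamps (ms) into clips of frames_per_clip frames."""
    # N_f = T / I sampled frames (floor division is an assumption).
    n_frames = duration_ms // interval_ms
    timestamps = [i * interval_ms for i in range(n_frames)]
    # N_s = N_f / M clips; a trailing partial clip is dropped (assumption).
    n_clips = n_frames // frames_per_clip
    return [
        timestamps[c * frames_per_clip:(c + 1) * frames_per_clip]
        for c in range(n_clips)
    ]


# T = 60 s, I = 500 ms, M = 16 frames per clip (illustrative values only).
clips = split_into_clips(duration_ms=60_000, interval_ms=500, frames_per_clip=16)
```

With these illustrative values, N_f = 120 sampled frames are grouped into N_s = 7 complete clips of 16 frames each.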

It should be noted that each frame in the resulting video clips is an RGB image of size H×W×3 obtained through Resize and Crop processing.

Optionally, in the embodiment of the present application, the above step 201 of "obtaining first video content feature information corresponding to N video clips in the first video" includes the following steps 201a to 201d:

Step 201a: For one video clip among the N video clips, divide the video images in that video clip to obtain X image blocks.

Step 201b: Obtain image feature information corresponding to the X image blocks.

Step 201c: Input the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction, to obtain second video content feature information corresponding to the X image blocks.

Step 201d: Perform feature fusion on the second video content feature information corresponding to the X image blocks to obtain the first video content feature information.

Exemplarily, one image block corresponds to one piece of first key video content feature information.

Exemplarily, as shown in FIG. 2, the video content feature extraction model M_video can be composed of a basic structure repeated N times. The basic structure consists of a multi-head attention module (Multi-Head Attention), a residual and normalization module (Add&Norm), and a feedforward network module (Feed Forward), where the Multi-Head Attention consists of 12 self-attention modules (Self-Attention).

Exemplarily, the electronic device may input the N video clips into the video content feature extraction model M_video, use M_video to extract the second video content feature information of each video clip from the video content information of the N video clips, and perform feature fusion on the second video content feature information to finally obtain the first video content feature information.

Exemplarily, the electronic device may first divide each frame in one of the N video clips into X image blocks of a preset size and flatten these X image blocks into vectors of equal length, thereby obtaining the image feature information corresponding to the X image blocks. These vectors are then linearly transformed by a fully connected layer to obtain a vector F_video-in of length L_video-in corresponding to the video content information in that video clip.

Optionally, in the embodiment of the present application, the above step 201c of "inputting the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction to obtain the second video content feature information corresponding to the X image blocks" includes the following steps 201c1 to 201c3:

Step 201c1: Based on the multi-head attention module, perform video content feature extraction on the image feature information corresponding to the X image blocks to obtain X pieces of first key video content feature information corresponding to the X image blocks.

Step 201c2: Based on the residual and normalization module, calculate the mean and standard deviation corresponding to the X pieces of first key video content feature information, and based on the mean and standard deviation, obtain X pieces of third video content feature information corresponding to the X pieces of first key video content feature information.

Step 201c3: Based on the feedforward network module, fuse all feature information within each of the X pieces of third video content feature information to obtain X pieces of second video content feature information.

Specifically, the image feature information corresponding to the X image blocks is first input into the Multi-Head Attention, and each of its 12 Self-Attention modules computes weights from the correlations between features using the formula

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

thereby extracting different first key video content feature information. Here, Q (query), K (key) and V (value) are all vectors obtained by remapping the input image feature information through an FC layer, and d_k denotes the length of the vector.
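Exemplarily, a single Self-Attention module can be sketched as follows, assuming the standard scaled dot-product form in which scores Q·K are divided by the square root of d_k, passed through softmax, and used to weight V. This is a sketch for illustration only: Q, K and V are passed in directly, whereas the embodiment derives them from the input through FC layers, and the combination of 12 heads is omitted. The function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(q[0])  # length of each query/key vector
    out = []
    for q_i in q:
        # Correlation of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


# Two image blocks with identical (zero) queries and keys: the attention
# weights are uniform, so each output row is the average of the value rows.
attended = self_attention(
    q=[[0.0, 0.0], [0.0, 0.0]],
    k=[[0.0, 0.0], [0.0, 0.0]],
    v=[[1.0, 2.0], [3.0, 4.0]],
)
```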

Next, these different pieces of first key video content feature information are input into the Add&Norm module, which calculates their mean and standard deviation and uses them to compute the third video content feature information. The Add operation is a shortcut (residual) connection: it reuses the earlier features and alleviates the vanishing-gradient problem during training, improving the effect of the model. Norm is a layer normalization (LayerNorm) operation, which normalizes the features to keep their distribution consistent.
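Exemplarily, the Add&Norm step for a single feature vector can be sketched as follows. This is a minimal sketch, not the embodiment's implementation: the learnable scale and shift that layer normalization usually carries are omitted, and the small constant `eps` is an assumption added to avoid division by zero.

```python
import math


def add_and_norm(x, sublayer_out, eps=1e-5):
    """Residual add followed by layer normalization over one feature vector."""
    # Add: the shortcut reuses the input features of the sublayer.
    summed = [a + b for a, b in zip(x, sublayer_out)]
    # Norm: normalize with the mean and standard deviation of the vector.
    mean = sum(summed) / len(summed)
    var = sum((s - mean) ** 2 for s in summed) / len(summed)
    return [(s - mean) / math.sqrt(var + eps) for s in summed]


# Illustrative values only.
normalized = add_and_norm([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
```

After this step the vector has approximately zero mean and unit variance, which is the consistency of feature distribution that the Norm operation is meant to provide.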

Then, the third video content feature information is input into the Feed Forward module, where it passes through an FC layer and a ReLU activation function, followed by another FC layer. This fuses all feature information within each of the X pieces of third video content feature information, yielding X pieces of second video content feature information that describe the content more fully.
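Exemplarily, the Feed Forward path (FC layer, ReLU, FC layer) can be sketched for one feature vector as follows. The weights and layer sizes below are toy values chosen for illustration, and the helper names are hypothetical; a trained model would supply its own parameters.

```python
def linear(vec, weight, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def relu(vec):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in vec]


def feed_forward(vec, w1, b1, w2, b2):
    """FC -> ReLU -> FC, mixing all entries of the input vector."""
    return linear(relu(linear(vec, w1, b1)), w2, b2)


# Toy parameters: 2 inputs -> 3 hidden units -> 2 outputs.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]
b2 = [0.5, 0.0]
fused = feed_forward([1.0, -2.0], w1, b1, w2, b2)
```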

Example 2: Building on Example 1, each RGB frame in the N video clips is divided into blocks of size P×P, yielding blocks of size P×P×3. Each block is then flattened into a vector of length P×P×3, and a fully connected layer applies a linear transformation to obtain a vector F_video-in of length L_video-in. The electronic device then inputs the vector F_video-in into the video content feature extraction model M_video; after the feature information of the video content is extracted, video content feature information F_video-out of length L_video-out is obtained, where L_video-out = L_user-out.
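Exemplarily, the block division, flattening and linear transformation described in Example 2 can be sketched as follows. This is a sketch under assumptions: the frame is represented as nested lists, H and W are assumed to be multiples of P, the blocks are taken in row-major order, and the projection weights are toy values. The function names `patchify` and `linear` are hypothetical.

```python
def patchify(frame, patch):
    """Split an H x W x 3 frame (nested lists) into flattened P*P*3 vectors."""
    height, width = len(frame), len(frame[0])
    # Assumption: H and W are multiples of P.
    assert height % patch == 0 and width % patch == 0
    vectors = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(frame[y][x])  # append the R, G, B values
            vectors.append(vec)
    return vectors


def linear(vec, weight, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


# A 4 x 4 frame whose pixel at (y, x) is [y, x, 0], split with P = 2.
frame = [[[y, x, 0] for x in range(4)] for y in range(4)]
vectors = patchify(frame, patch=2)

# Toy projection from length P*P*3 = 12 down to length 3.
weight = [[0.5] * 12 for _ in range(3)]
bias = [0.0, 0.0, 0.0]
projected = linear(vectors[0], weight, bias)
```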

Step 202: Splice the user behavior feature information of the target user with the first video content feature information corresponding to each of the N video clips to obtain video feature information corresponding to each of the N video clips.

In the embodiment of the present application, the electronic device uses a Concatenate function to splice the video content feature information F_video-out corresponding to each video clip with the user behavior feature information F_user-out at corresponding positions, finally obtaining the video feature information corresponding to each of the N video clips.
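Exemplarily, the splicing in step 202 can be sketched as follows. The text does not define exactly how "corresponding positions" are aligned, so this sketch assumes the simplest reading: the same user vector is appended end to end to the content vector of every clip. The function name and the feature values are hypothetical.

```python
def build_video_features(clip_features, user_feature):
    """Append the user behavior vector to each clip's content vector."""
    return [list(clip) + list(user_feature) for clip in clip_features]


# Two clips with toy F_video-out vectors and one toy F_user-out vector.
f_video_out = [[0.1, 0.2], [0.3, 0.4]]
f_user_out = [0.9, 0.8]
video_features = build_video_features(f_video_out, f_user_out)
```

Each clip thus receives its own content features plus the same user features, which is what lets a single downstream model produce a per-clip, per-user playback speed.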

In the embodiment of the present application, the video feature information corresponding to any video clip includes the user behavior feature information of the target user and the video content feature information of that video clip, where the video clip is any one of the N video clips.

In the embodiment of the present application, both the user behavior feature information and the video content feature information can be represented as feature vectors.

In the embodiment of the present application, the user behavior feature information of the target user includes user basic feature information F_user-base and learnable user attribute feature information F_user-laern.

In the embodiment of the present application, the user basic feature information F_user-base may be feature information that reflects the user's gender, age, favorite film and television genres, and similar information.

In a possible embodiment, the learnable user attribute feature information F_user-laern may be updated according to user behavior parameters.

In a possible embodiment, for the user basic feature F_user-base, after acquiring user attribute data (UserData), the electronic device may encode the user attribute data and then, based on the encoded user attribute data and a first formula, calculate the user basic feature F_user-base of length L_user-base.

Exemplarily, the user attribute data may be as shown in Table 1 below.

Table 1

Exemplarily, when encoding the user attribute data, the electronic device typically converts each collected user attribute into a corresponding attribute feature vector.

Exemplarily, the electronic device converts the user attribute data into user attribute feature vectors as follows: the optional values of each attribute are shown in Table 1; the length of each attribute's feature vector equals the number of its optional values; and if the i-th value of attribute a is selected, then V_a,i = 1, otherwise V_a,i = 0.
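Exemplarily, the conversion rule above (V_a,i = 1 for the selected value, 0 otherwise) can be sketched as follows. The attribute names and optional values in `ATTRIBUTE_OPTIONS` are hypothetical stand-ins, since the contents of Table 1 are not reproduced in this text; the function names are likewise assumptions.

```python
# Hypothetical stand-in for the optional values listed in Table 1.
ATTRIBUTE_OPTIONS = {
    "gender": ["female", "male"],
    "age_group": ["under_18", "18_to_35", "over_35"],
    "favorite_genre": ["comedy", "drama", "documentary"],
}


def encode_attribute(name, value):
    """One entry per optional value: 1 if selected, otherwise 0."""
    return [1 if option == value else 0 for option in ATTRIBUTE_OPTIONS[name]]


def encode_user(user_data):
    """Encode every attribute and join the per-attribute vectors."""
    vector = []
    for name in ATTRIBUTE_OPTIONS:
        vector.extend(encode_attribute(name, user_data[name]))
    return vector


encoded = encode_user(
    {"gender": "male", "age_group": "18_to_35", "favorite_genre": "drama"}
)
```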

Exemplarily, after encoding the user attribute data, the electronic device represents it as a vector V and normalizes each attribute using the first formula, thereby obtaining a data distribution suitable as neural network input. All normalized vectors are then encoded again to obtain the user basic feature F_user-base of length L_user-base.

Exemplarily, the first formula is:

$$V_{norm} = \frac{V - V_{mean}}{V_{std}}$$

where V_norm is the normalized attribute feature;

V_mean is the attribute mean computed from the training data;

V_std is the attribute standard deviation computed from the training data.
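Exemplarily, assuming the first formula is the usual standardization V_norm = (V - V_mean) / V_std, as the definitions of V_mean and V_std suggest, the normalization can be sketched as follows. The constant `eps` is an assumption added to guard against a zero standard deviation, and the statistics below are toy values rather than values computed from any training data.

```python
def normalize(v, v_mean, v_std, eps=1e-8):
    """Element-wise standardization with precomputed training statistics."""
    return [(x - m) / (s + eps) for x, m, s in zip(v, v_mean, v_std)]


# Toy one-hot attribute vector with toy per-entry statistics.
v_norm = normalize([1.0, 0.0], v_mean=[0.5, 0.5], v_std=[0.5, 0.5])
```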

In the embodiment of the present application, the user behavior feature information is extracted by a user behavior feature extraction model M_user based on the user basic feature information F_user-base and the learnable user attribute feature information F_user-laern.

In a possible embodiment, the electronic device splices the user basic feature F_user-base and the learnable user attribute feature F_user-laern using Concatenate to obtain F_user-in.

Exemplarily, after the electronic device inputs F_user-in into the user behavior feature extraction model M_user, the model extracts user behavior feature information from F_user-in to obtain user behavior feature information F_user-out of length L_user-out.

Example 1: As shown in FIG. 3, after the electronic device inputs F_user-in into the user behavior feature extraction model M_user, a linear transformation layer (Fully Connected, FC) changes the dimension of F_user-in, and a normalization layer (BatchNorm) and an activation function layer (such as the ReLU activation function) then convert the input into a vector F_user-hidden of length L_user-hidden. This vector then passes through another FC layer, BatchNorm normalization and ReLU activation function, and the resulting user behavior feature information F_user-out of size L_user-out is output.
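Exemplarily, the two-stage structure in Example 1 (FC, BatchNorm, ReLU, applied twice) can be sketched for a single input vector as follows. This sketch assumes inference-time behavior, in which BatchNorm normalizes with stored running statistics; the learnable scale and shift are omitted, all parameters are toy values, and the names `block` and `user_model` are hypothetical.

```python
import math


def linear(vec, weight, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def batch_norm(vec, running_mean, running_var, eps=1e-5):
    """Inference-time BatchNorm using stored running statistics."""
    return [
        (x - m) / math.sqrt(v + eps)
        for x, m, v in zip(vec, running_mean, running_var)
    ]


def relu(vec):
    return [max(0.0, v) for v in vec]


def block(vec, layer):
    """One FC -> BatchNorm -> ReLU stage."""
    hidden = linear(vec, layer["weight"], layer["bias"])
    return relu(batch_norm(hidden, layer["mean"], layer["var"]))


def user_model(f_user_in, layer1, layer2):
    """F_user-in -> F_user-hidden -> F_user-out."""
    f_user_hidden = block(f_user_in, layer1)
    return block(f_user_hidden, layer2)


# Toy parameters: 3 inputs -> 2 hidden units -> 1 output.
layer1 = {"weight": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "bias": [0.0, 0.0],
          "mean": [0.0, 0.0], "var": [1.0, 1.0]}
layer2 = {"weight": [[1.0, 1.0]], "bias": [0.0], "mean": [0.0], "var": [1.0]}
f_user_out = user_model([2.0, -1.0, 5.0], layer1, layer2)
```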

步骤203、基于上述每个视频片段对应的视频特征信息，确定该每个视频片段对应的第一播放倍速。Step 203: Determine a first playback speed corresponding to each video segment based on the video feature information corresponding to each video segment.

步骤204、基于上述第一播放倍速播放上述第一视频。Step 204: Play the first video based on the first playback speed.

在本申请实施例中，可以将上述每个视频片段对应的第一播放倍速分别适用于各自对应的视频片段。In the embodiment of the present application, the first playback speed corresponding to each of the above-mentioned video clips can be applied to the corresponding video clips respectively.

In an embodiment of the present application, the electronic device applies the determined first playback speed to each video clip, so that the playback speed is adjusted automatically according to the first playback speed while the video is played.

Example 3: for the N_i-th video segment S_Ni, assume that the original number of frames is N (here N denotes the frame count of the segment, not the number of video clips). If the determined first playback speed is V, N/V frames are uniformly sampled from the original N frames to achieve a playback speed of V.
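Exemplarily, the uniform sampling of Example 3 can be sketched as follows. The function name, the rounding of N/V to an integer, and the repetition of frames when V is less than 1 are assumptions made for illustration; the present application only states that N/V frames are uniformly sampled.

```python
def sample_frames(frames, speed):
    """Uniformly pick about len(frames) / speed frames to emulate `speed`x playback."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(frames)
    if n == 0:
        return []
    target = max(1, round(n / speed))  # number of frames kept (assumed rounding)
    # The k-th output frame is taken from position k * speed in the original clip.
    return [frames[min(n - 1, int(k * speed))] for k in range(target)]


fast = sample_frames(list(range(10)), 2.0)  # every second frame
slow = sample_frames(list(range(4)), 0.5)   # each frame shown twice
```

With a speed of 2.0 every second frame is kept, while with a speed of 0.5 each frame appears twice, which lengthens the segment.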

可选地，在本申请实施例中，上述步骤203“基于上述每个视频片段对应的视频特征信息，确定该每个视频片段对应的第一播放倍速”的过程中，具体地包括以下步骤203a和步骤203b：Optionally, in the embodiment of the present application, the process of the above step 203 of "determining the first playback speed corresponding to each video clip based on the video feature information corresponding to each video clip" specifically includes the following steps 203a and 203b:

步骤203a、将上述每个视频片段对应的视频特征信息输入视频播放速度估计模型，输出上述每个视频片段对应的播放倍速预测参数。Step 203a: input the video feature information corresponding to each of the above video clips into a video playback speed estimation model, and output a playback speed prediction parameter corresponding to each of the above video clips.

Exemplarily, the above video playback speed estimation model M_speed includes an attention mechanism module, SEModules.

可选地，在本申请实施例中，上述步骤203a的过程中，包括以下步骤203a1和步骤203a2：Optionally, in the embodiment of the present application, the process of step 203a includes the following steps 203a1 and 203a2:

Step 203a1: for one video clip among the above N video clips, input the video feature information corresponding to that video clip into the attention mechanism module to obtain the key video feature information corresponding to that video clip.

Step 203a2: obtain the playback speed information that has a mapping relationship with the above key video feature information, and obtain the playback speed prediction parameter corresponding to the video clip based on that playback speed information and the key video feature information.

Exemplarily, as shown in FIG. 4, for the i-th video clip, the electronic device inputs the video feature information F_speed-in corresponding to the i-th video clip into the video playback speed estimation model M_speed. The attention module SEModules first extracts the key video feature information and outputs F_speed-se; F_speed-se then passes through an FC layer, BatchNorm, and a ReLU activation function to obtain F_speed-hidden, and through a further FC layer to obtain F_speed-out; finally, a Softmax function outputs the playback speed prediction parameter P_i corresponding to the i-th video clip, where i = 1, 2, ..., N.

Further exemplarily, first, the user behavior feature information and the first video content feature information corresponding to the i-th video clip are concatenated using a Concatenate operation to obtain the video feature information F_speed-in with shape (Batch Size, L_video-out + L_user-out).

Next, the video feature information F_speed-in is input into SEModules, which further fuses the user behavior feature information and the first video content feature information contained in F_speed-in and assigns a different weight to the feature of each channel to indicate its importance, thereby extracting the key video feature information corresponding to the i-th video clip. Specifically, let the input of the SEModule be F_speed-in, with shape (Batch Size, L_video-out + L_user-out). F_speed-in passes through two FC layers and one Sigmoid layer to generate a feature weight coefficient W with values in the range 0 to 1 and shape (Batch Size, L_video-out + L_user-out). Finally, F_speed-in and W are multiplied element-wise to obtain the weighted F_speed-in, that is, the key video feature information F_speed-se, whose shape is still (Batch Size, L_video-out + L_user-out).
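Exemplarily, the channel weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the reduced hidden width, the ReLU placed between the two FC layers (usual in squeeze-and-excitation blocks but not stated in the present application), the random weights, and all function names are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Fully connected layer: x is (batch, in_dim), weight is (out_dim, in_dim)."""
    return [[sum(w * v for w, v in zip(row_w, row_x)) + b
             for row_w, b in zip(weight, bias)] for row_x in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def se_module(f_speed_in, fc1, fc2):
    """Return (F_speed-se, W): the reweighted features and the weights in (0, 1)."""
    # Assumption: a ReLU separates the two FC layers.
    hidden = [[max(0.0, v) for v in row] for row in linear(f_speed_in, *fc1)]
    weights = [[sigmoid(v) for v in row] for row in linear(hidden, *fc2)]
    # Element-wise product keeps the shape (batch, L_video-out + L_user-out).
    f_speed_se = [[x * w for x, w in zip(row_x, row_w)]
                  for row_x, row_w in zip(f_speed_in, weights)]
    return f_speed_se, weights


rng = random.Random(0)
DIM, REDUCED = 6, 3  # DIM stands for L_video-out + L_user-out; sizes are illustrative
fc1 = ([[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(REDUCED)],
       [0.0] * REDUCED)
fc2 = ([[rng.uniform(-0.5, 0.5) for _ in range(REDUCED)] for _ in range(DIM)],
       [0.0] * DIM)
f_speed_in = [[rng.random() for _ in range(DIM)] for _ in range(2)]
f_speed_se, w = se_module(f_speed_in, fc1, fc2)
```

The output has the same shape as the input, and each feature is scaled by its own weight, so channels the module considers unimportant are attenuated rather than removed.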

Then, the key video feature information F_speed-se is input into an FC layer, BatchNorm, and a ReLU activation function, which map it to the playback speed dimension and yield the abstract feature F_speed-hidden used for estimating the playback speed. A further FC layer then produces the playback speed information F_speed-out that has a mapping relationship with the key video feature information. Finally, F_speed-out is passed through the Softmax function to obtain the prediction parameter of the playback speed of the i-th clip.
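Exemplarily, the final Softmax step can be sketched as follows. The example scores in F_speed-out and the choice of X = 5 are assumptions for illustration; subtracting the maximum before exponentiating is a common numerical safeguard and is not taken from the present application.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


f_speed_out = [0.2, -1.0, 1.5, 0.3, -0.4]  # illustrative scores for X = 5 preset speeds
p_i = softmax(f_speed_out)                  # playback speed prediction parameter P_i
```

The result is a vector of X non-negative values that sum to 1, so each element can be read as the probability of one preset playback speed.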

It should be noted that the i-th video clip is any one of the N video clips. In other words, each of the N video clips can be processed by the video playback speed estimation model in the same way as the i-th video clip, which is not repeated here.

步骤203b、分别基于上述每个视频片段对应的播放倍速预测参数，从X个预置播放倍速中，确定上述每个视频片段对应的第一播放倍速。Step 203b: Determine a first playback speed corresponding to each video segment from X preset playback speeds based on the playback speed prediction parameters corresponding to each video segment.

其中，X为正整数。Wherein, X is a positive integer.

示例性地，上述X个预置播放倍速可以为各类播放视频的应用中所预存的播放倍速，例如，0.5x/0.75x/1.0x/1.5x/2.0x。Exemplarily, the X preset playback speeds may be playback speeds pre-stored in various video playback applications, for example, 0.5x/0.75x/1.0x/1.5x/2.0x.

需要说明的是，不同应用中的预置播放倍速可以相同也可以不同。例如，若上述第一视频为第一应用中的视频，则上述X个预置播放倍速为该第一应用对应的预置播放倍速。It should be noted that the preset playback speeds in different applications may be the same or different. For example, if the first video is a video in a first application, the X preset playback speeds are the preset playback speeds corresponding to the first application.

示例性地，每个视频片段对应的播放倍速预测参数用于指示至少一个预置播放倍速。如此，电子设备可以一个视频片段对应的该播放倍速预测参数所指示的预置播放倍速中选择一个预置播放倍速作为该视频片段的第一播放倍速。Exemplarily, the playback speed prediction parameter corresponding to each video segment is used to indicate at least one preset playback speed. Thus, the electronic device can select a preset playback speed from the preset playback speeds indicated by the playback speed prediction parameter corresponding to a video segment as the first playback speed of the video segment.

Exemplarily, the above X preset playback speeds may also be user-defined. Specifically, given a minimum playback speed C_min, a maximum playback speed C_max, and a number X of selectable playback speeds, the playback speed of the i-th level can be expressed as C_i, and the set of selectable playback speeds is represented as C = {C_i}.
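Purely as an illustration of how a set C could be built from C_min, C_max and X, the sketch below assumes evenly spaced levels. The even spacing, the function name, and the zero-based indexing are assumptions and are not the expression used in the present application.

```python
def preset_speeds(c_min, c_max, x):
    """Build x playback speeds from c_min to c_max (assumed to be evenly spaced)."""
    if x < 1 or c_min > c_max:
        raise ValueError("need x >= 1 and c_min <= c_max")
    if x == 1:
        return [c_min]
    step = (c_max - c_min) / (x - 1)
    return [c_min + i * step for i in range(x)]


c = preset_speeds(0.5, 2.0, 4)  # [0.5, 1.0, 1.5, 2.0]
```

Other spacings, such as a hand-picked list like 0.5x/0.75x/1.0x/1.5x/2.0x, fit the same role, since the later steps only need an ordered set of X speeds.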

可选地，在本申请实施例中，上述N个视频片段中的任一视频片段对应的播放倍速预测参数包括：X个概率值。Optionally, in the embodiment of the present application, the playback speed prediction parameter corresponding to any video segment among the above-mentioned N video segments includes: X probability values.

示例性地，X个概率值对应X个预置播放倍速。Exemplarily, X probability values correspond to X preset playback speeds.

Exemplarily, each of the X probability values represents the probability that the playback speed of the video segment is the preset playback speed corresponding to that probability value.

Optionally, in the embodiment of the present application, the process of the above step 203b of "determining the first playback speed corresponding to each video clip from X preset playback speeds based on the playback speed prediction parameter corresponding to each video clip" specifically includes the following step 203b1:

步骤203b1、将上述任一视频片段对应的X个概率值中的最大概率值对应的预置播放倍速中，确定为上述任一视频片段对应的第一播放倍速。Step 203b1: Determine the preset playback speed corresponding to the maximum probability value among the X probability values corresponding to any one of the video clips as the first playback speed corresponding to any one of the video clips.

示例性地，上述播放倍速预测参数可以是长度为X的向量。Exemplarily, the playback speed prediction parameter may be a vector of length X.

示例性地，针对第i个视频片段，电子设备可以将该第i个视频片段对应的播放倍速预测参数向量中最大元素(即，该向量中概率值最大的元素)所对应的预置播放倍速作为该第i个视频片段的对应的第一播放倍速。Exemplarily, for the i-th video clip, the electronic device may use the preset playback speed corresponding to the maximum element in the playback speed prediction parameter vector corresponding to the i-th video clip (i.e., the element with the largest probability value in the vector) as the corresponding first playback speed of the i-th video clip.

示例性地，电子设备可以采用第二公式从上述X个预置播放倍速中选择上述任一视频片段对应的X个概率值中的最大概率值对应的预置播放倍速。Exemplarily, the electronic device may use the second formula to select, from the X preset playback speeds, a preset playback speed corresponding to the maximum probability value among the X probability values corresponding to any one of the video clips.

示例性地，第二公式为： Exemplarily, the second formula is:

Wherein, V is the above first playback speed;

C_i is the i-th level preset playback speed in the above set C;

P_j is the vector corresponding to the position of the largest element.
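Exemplarily, the selection in step 203b1 can be sketched as follows. The function name and the example values are assumptions; the sketch only shows the idea of taking the preset playback speed at the position of the largest probability, and ties are resolved in favor of the first maximum.

```python
def select_speed(probabilities, preset_speeds):
    """Return the preset speed whose probability is the largest."""
    if not preset_speeds or len(probabilities) != len(preset_speeds):
        raise ValueError("expected one probability per preset speed")
    j = max(range(len(probabilities)), key=probabilities.__getitem__)
    return preset_speeds[j]


c = [0.5, 0.75, 1.0, 1.5, 2.0]         # X = 5 preset playback speeds
p = [0.05, 0.10, 0.60, 0.20, 0.05]     # prediction parameter for one clip
v = select_speed(p, c)                 # first playback speed for this clip
```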

In this way, the playback speed corresponding to each video clip in the video is calculated from the extracted video content features and is adaptively applied to the current video, so that the user does not need to manually adjust the playback speed while watching the video.

可选地，在本申请实施例中，上述步骤204“将上述每个视频片段对应的第一播放倍速分别适用于各自对应的视频片段”之后，本申请提供的视频播放倍速的确定方法还包括以下步骤301和步骤302：Optionally, in the embodiment of the present application, after the above step 204 of "applying the first playback speed corresponding to each of the above video segments to the corresponding video segments respectively", the method for determining the video playback speed provided by the present application further includes the following steps 301 and 302:

步骤301、在检测到用户调节第一视频片段的视频播放速度的第一输入之后，将该第一视频片段对应的播放倍速预测参数和该第一输入对应的实际视频播放速度作为第一训练样本。Step 301: after detecting a first input of a user adjusting a video playback speed of a first video clip, use a playback speed prediction parameter corresponding to the first video clip and an actual video playback speed corresponding to the first input as a first training sample.

步骤302、采用该第一训练样本对上述视频播放速度估计模型进行训练，得到训练后的上述视频播放速度估计模型。Step 302: Use the first training sample to train the video playback speed estimation model to obtain the trained video playback speed estimation model.

示例性地，上述第一视频片段为上述N个视频片段中的其中之一。Exemplarily, the first video clip is one of the N video clips.

示例性地，上述第一输入可以包括：用户对显示屏的触控输入，或者，用户输入的语音指令，或者，用户输入的特定手势，具体的可以根据实际使用需求确定，本发明实施例不作限定。Exemplarily, the first input may include: a touch input of a user to a display screen, or a voice command input by a user, or a specific gesture input by a user, which may be determined according to actual usage requirements and is not limited in the embodiments of the present invention.

示例性地，上述的第一输入为调节当前视频播放倍速的输入。Exemplarily, the first input is an input for adjusting the playback speed of the current video.

示例性地，上述第一输入可以包括：在用户不满意第一视频片段的视频播放速度时，对视频界面的第一输入。Exemplarily, the first input may include: a first input to the video interface when the user is not satisfied with the video playback speed of the first video clip.

示例性地，电子设备可以基于第三公式处理第一训练样本，并以处理后的训练样本对上述视频播放速度估计模型进行训练，得到训练后的上述视频播放速度估计模型。Exemplarily, the electronic device may process the first training sample based on the third formula, and train the video playback speed estimation model using the processed training sample to obtain the trained video playback speed estimation model.

示例性地，上述第三公式可以为交叉熵损失函数。Exemplarily, the third formula above may be a cross entropy loss function.

Specifically, the third formula is: L = -log P_i[V_i].

Wherein, P_i is the playback speed prediction parameter corresponding to the first video clip;

V_i is the actual video playback speed corresponding to the first input.
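Exemplarily, the third formula can be evaluated as sketched below. The sketch assumes that V_i is used as an index into P_i by mapping the actual playback speed to the nearest preset level; this mapping, the small epsilon guard, and the function names are assumptions for illustration.

```python
import math


def nearest_level(actual_speed, preset_speeds):
    """Index of the preset speed closest to the speed the user chose (assumed mapping)."""
    return min(range(len(preset_speeds)),
               key=lambda i: abs(preset_speeds[i] - actual_speed))


def cross_entropy_loss(p_i, v_index, eps=1e-12):
    """L = -log P_i[V_i]; eps guards against log(0)."""
    return -math.log(max(p_i[v_index], eps))


preset = [0.5, 0.75, 1.0, 1.5, 2.0]
p_i = [0.05, 0.10, 0.60, 0.20, 0.05]   # model predicted 1.0x as most likely
v_i = nearest_level(1.5, preset)       # the user switched to 1.5x
loss = cross_entropy_loss(p_i, v_i)
```

The loss is larger when the model assigned a low probability to the speed the user actually chose, which is what drives the update in step 302.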

Optionally, in the embodiment of the present application, the method for determining the video playback speed provided by the present application further includes the following step 401:

步骤401、在检测到目标用户在预定时间段内调节视频播放速度的行为满足第一条件的情况下，基于该目标用户的第一信息，更新该目标用户的行为特征信息。Step 401: When it is detected that the behavior of the target user adjusting the video playback speed within a predetermined time period satisfies a first condition, the behavior characteristic information of the target user is updated based on the first information of the target user.

示例性地，上述第一信息包括：上述预定时间段内该目标用户调节视频播放速度的行为对应的行为参数，以及该目标用户所调节视频的视频信息。Exemplarily, the first information includes: behavior parameters corresponding to the target user's behavior of adjusting the video playback speed within the predetermined time period, and video information of the video adjusted by the target user.

示例性地，上述预定时间段可以为用户自行设置，也可以为电子设备自定义的。Exemplarily, the above-mentioned predetermined time period may be set by the user himself or customized by the electronic device.

示例性地，上述第一条件包括以下至少之一：Exemplarily, the first condition includes at least one of the following:

在预定时间段内用户主动调节第一视频的视频播放倍速次数达到第一阈值(如，在预定时间段内用户调节某一视频片段的视频播放倍速的次数达到阈值A)；The number of times that the user actively adjusts the video playback speed of the first video within the predetermined time period reaches a first threshold (for example, the number of times that the user adjusts the video playback speed of a certain video segment within the predetermined time period reaches threshold A);

在预定时间段内用户调节第一视频的视频播放倍速的调节时长达到第二阈值。The adjustment duration of the video playback speed of the first video by the user within the predetermined time period reaches a second threshold.
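Exemplarily, checking the first condition can be sketched as follows. The event representation (a timestamp paired with an adjustment duration, both in seconds), the interpretation of the adjustment duration as a total summed over the window, and all names are assumptions for illustration.

```python
def first_condition_met(events, window_start, window_end,
                        count_threshold, duration_threshold):
    """events: list of (timestamp_s, adjustment_duration_s) for manual speed changes."""
    in_window = [(t, d) for t, d in events if window_start <= t <= window_end]
    count_ok = len(in_window) >= count_threshold                      # first threshold
    duration_ok = sum(d for _, d in in_window) >= duration_threshold  # second threshold
    return count_ok or duration_ok  # "at least one of" the two conditions


events = [(10, 2.0), (20, 3.0), (500, 1.0)]
triggered = first_condition_met(events, 0, 100, count_threshold=2, duration_threshold=60)
```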

Exemplarily, the electronic device detects the speed playback behavior that the user actively triggers while watching a video, together with the content of the corresponding video clip, and continuously updates the learnable user attribute feature information F_user-laern, which then serves as the latest learnable user attribute feature information for subsequent viewing. When F_user-laern is updated, only F_user-laern is treated as a learnable parameter; the remaining model parameters and F_user-base are fixed, and training still uses the cross-entropy loss function. The update is performed when the first condition is met within the predetermined time while the user watches a video, for example when the number of times the user actively adjusts the video playback speed reaches a predetermined threshold.
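Exemplarily, the idea of updating only F_user-laern while everything else stays fixed can be sketched as follows. A single linear layer followed by Softmax stands in for the full models purely for illustration; this stand-in, the learning rate, the plain gradient step, and all names are assumptions and do not describe the models of the present application.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weight, bias, f_user_base, f_user_learn, f_video):
    """Stand-in model: concatenate the features, apply one linear layer, then Softmax."""
    x = f_user_base + f_user_learn + f_video
    scores = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]
    return softmax(scores)


def update_user_feature(weight, bias, f_user_base, f_user_learn, f_video, target, lr=0.5):
    """One cross-entropy gradient step on F_user-laern only; all else is frozen."""
    p = predict(weight, bias, f_user_base, f_user_learn, f_video)
    # Gradient of -log p[target] with respect to the scores is p - one_hot(target).
    d_scores = [pc - (1.0 if c == target else 0.0) for c, pc in enumerate(p)]
    offset = len(f_user_base)  # position of F_user-laern inside the concatenation
    grad = [sum(ds * row[offset + k] for ds, row in zip(d_scores, weight))
            for k in range(len(f_user_learn))]
    return [v - lr * g for v, g in zip(f_user_learn, grad)]


weight = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]  # frozen model parameters
bias = [0.0, 0.0]
f_base, f_learn, f_video = [1.0], [0.0], [1.0]
new_learn = update_user_feature(weight, bias, f_base, f_learn, f_video, target=0)
```

After the step, the stand-in model assigns a higher probability to the speed level the user actually chose, while `weight`, `bias` and `f_base` are left untouched.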

如此，可以通过不断的训练，保证在电子设备提供的模型所推荐的播放倍速已不满足用户需求时，可以及时触发训练，更新为最新的模型，从而更大程度减少用户观看视频时主动触发调节视频播放倍速的行为。In this way, through continuous training, it can be ensured that when the playback speed recommended by the model provided by the electronic device no longer meets the user's needs, training can be triggered in time and updated to the latest model, thereby greatly reducing the user's behavior of actively triggering the adjustment of the video playback speed when watching the video.

The method for determining the video playback speed provided in the present application is described below by way of an embodiment.

In this embodiment, a personalized and adaptive video playback speed adjustment function is designed based on user behavior feature information and video content feature information, so that the user does not need to manually adjust the playback speed while watching a video. The constructed user portrait helps identify the types of video clips that the user is interested in, and the understanding of the video content helps locate the content that the user is interested in. Combining the two allows the video playback rate expected by the user to be estimated adaptively and adjusted in time. Specifically, as shown in Figure 5, the following steps S1 to S6 are included:

步骤S1：电子设备获取用户画像及原始视频内容信息。Step S1: The electronic device obtains the user portrait and original video content information.

步骤S2：电子设备通过编码器将用户画像编码成向量形式，并基于用户行为特征抽取模型提取用户行为特征信息向量；同时，将步骤S1中获取的原始视频进行分段，得到N个视频片段。Step S2: The electronic device encodes the user portrait into a vector form through an encoder, and extracts the user behavior feature information vector based on the user behavior feature extraction model; at the same time, the original video obtained in step S1 is segmented to obtain N video clips.

Step S3: The electronic device inputs each video clip obtained in step S2 into the video content feature extraction model, which analyzes and encodes it to obtain the video content feature information vector corresponding to each video clip.

Step S4: The electronic device inputs the user behavior feature information vector from step S2 and the video content feature information vector from step S3 into the playback speed estimation model to obtain the playback speed corresponding to each video clip.

步骤S5：电子设备将每个视频片段对应的播放速度应用至每个视频片段中。Step S5: the electronic device applies the playback speed corresponding to each video segment to each video segment.

步骤S6：电子设备根据用户主动对视频播放速度调节的交互，更新步骤S1中的用户画像，即更新用户行为特征信息。Step S6: The electronic device updates the user portrait in step S1, that is, updates the user behavior feature information, based on the user's active interaction in adjusting the video playback speed.

如此，使用用户行为特征抽取模型和视频内容特征抽取模型分别得到用户和视频内容的抽象表征，并通过播放速度估计模型建立抽象表征与视频播放倍速之间的关系，实现无需用户交互的视频倍速播放功能。In this way, the user behavior feature extraction model and the video content feature extraction model are used to obtain abstract representations of users and video content respectively, and the relationship between the abstract representation and the video playback speed is established through the playback speed estimation model, thereby realizing the video speed playback function without user interaction.
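Exemplarily, the data flow of steps S3 to S5 can be sketched as follows. The two callables stand in for the video content feature extraction model and the playback speed estimation model; they, the toy rules inside them, and all names are hypothetical and only show how the pieces connect.

```python
def plan_playback(user_vector, clips, encode_clip, estimate_speed):
    """Return one (clip, speed) pair per clip, mirroring steps S3 to S5."""
    plan = []
    for clip in clips:
        video_vector = encode_clip(clip)                    # S3: per-clip content features
        features = list(user_vector) + list(video_vector)   # concatenate user and video features
        plan.append((clip, estimate_speed(features)))       # S4: estimated speed for this clip
    return plan                                             # S5: the player applies each speed


def toy_encode_clip(clip):
    """Hypothetical encoder: uses the length of the clip label as its only feature."""
    return [float(len(clip))]


def toy_estimate_speed(features):
    """Hypothetical estimator: a fixed rule in place of a trained model."""
    return 2.0 if sum(features) > 10 else 1.0


plan = plan_playback([1.0, 0.5], ["intro", "credits-roll"],
                     toy_encode_clip, toy_estimate_speed)
```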

需要说明的是，本申请实施例提供的视频播放倍速的确定方法，执行主体可以为视频播放倍速的确定装置，或者电子设备，还可以为电子设备中的功能模块或实体。本申请实施例中以视频播放倍速的确定装置执行视频播放倍速的确定方法为例，说明本申请实施例提供的视频播放倍速的确定装置。It should be noted that the method for determining the video playback speed provided in the embodiment of the present application can be performed by a device for determining the video playback speed, or an electronic device, or a functional module or entity in the electronic device. In the embodiment of the present application, the method for determining the video playback speed performed by the device for determining the video playback speed is taken as an example to illustrate the device for determining the video playback speed provided in the embodiment of the present application.

图6示出了本申请实施例中涉及的视频播放倍速的确定装置的一种可能的结构示意图。如图6所示，该视频播放倍速的确定装置700可以包括：获取模块701、处理模块702和播放模块703：该获取模块701，用于获取第一视频中的N个视频片段对应的第一视频内容特征信息，N为正整数；该处理模块702，用于将目标用户的用户行为特征信息与上述N个视频片段对应的第一视频内容特征信息分别进行拼接，得到该N个视频片段中的每个视频片段对应的视频特征信息；该处理模块702，用于基于上述每个视频片段对应的视频特征信息，确定该每个视频片段对应的第一播放倍速；该播放模块703，用于基于该第一播放倍速播放上述第一视频。Fig. 6 shows a possible structural diagram of a device for determining the video playback speed involved in an embodiment of the present application. As shown in Fig. 6, the device 700 for determining the video playback speed may include: an acquisition module 701, a processing module 702 and a playback module 703: the acquisition module 701 is used to acquire the first video content feature information corresponding to N video clips in the first video, where N is a positive integer; the processing module 702 is used to respectively splice the user behavior feature information of the target user with the first video content feature information corresponding to the above N video clips to obtain the video feature information corresponding to each video clip in the N video clips; the processing module 702 is used to determine the first playback speed corresponding to each video clip based on the video feature information corresponding to each video clip; the playback module 703 is used to play the above first video based on the first playback speed.

可选地，在本申请实施例中，上述处理模块702，具体用于针对上述N个视频片段中的其中一帧视频片段，将该视频片段中的视频图像进行划分，得到X个图像块；上述获取模块701，还用于获取上述X个图像块对应的图像特征信息；上述处理模块702，具体用于将上述X个图像块对应的图像特征信息输入视频内容特征提取模型进行特征提取，得到该X个图像块对应的第二视频内容特征信息；上述处理模块702，具体用于将上述X个图像块对应的第二视频内容特征信息进行特征融合，得到上述第一视频内容特征信息。Optionally, in an embodiment of the present application, the processing module 702 is specifically used to divide the video image in one of the N video segments into X image blocks; the acquisition module 701 is also used to obtain image feature information corresponding to the X image blocks; the processing module 702 is specifically used to input the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction to obtain second video content feature information corresponding to the X image blocks; the processing module 702 is specifically used to perform feature fusion on the second video content feature information corresponding to the X image blocks to obtain the first video content feature information.

Optionally, in an embodiment of the present application, the processing module 702 is specifically used to: after inputting the image feature information corresponding to the X image blocks into the video content feature extraction model, perform, based on the multi-head attention module, video content feature extraction on the image feature information corresponding to the X image blocks to obtain X pieces of first key video content feature information corresponding to the X image blocks; calculate, based on the residual and normalization module, the mean and standard deviation corresponding to the X pieces of first key video content feature information, and obtain, based on the mean and standard deviation, X pieces of third video content feature information corresponding to the X pieces of first key video content feature information; and fuse, based on the feed-forward module, all feature information in each of the X pieces of third video content feature information to obtain X pieces of second video content feature information; wherein one image block corresponds to one piece of first key video content feature information.

可选地，在本申请实施例中，上述处理模块702，具体用于：将上述每个视频片段对应的视频特征信息输入视频播放速度估计模型，输出该每个视频片段对应的播放倍速预测参数；分别基于该每个视频片段对应的播放倍速预测参数，从X个预置播放倍速中，确定上述每个视频片段对应的第一播放倍速；其中，X为正整数。Optionally, in an embodiment of the present application, the processing module 702 is specifically used to: input the video feature information corresponding to each of the video clips into a video playback speed estimation model, and output a playback speed prediction parameter corresponding to each video clip; based on the playback speed prediction parameter corresponding to each video clip, determine the first playback speed corresponding to each of the video clips from X preset playback speeds; wherein X is a positive integer.

Optionally, in an embodiment of the present application, the processing module 702 is specifically used to: for one video clip among the N video clips, input the video feature information corresponding to that video clip into the attention mechanism module to obtain the key video feature information corresponding to that video clip; obtain the playback speed information that has a mapping relationship with the key video feature information, and obtain the playback speed prediction parameter corresponding to the video clip based on that playback speed information and the key video feature information.

在本申请实施例提供的视频播放倍速的确定装置中，获取第一视频中的N个视频片段对应的第一视频内容特征信息，N为正整数；并将目标用户的用户行为特征信息与上述N个视频片段对应的第一视频内容特征信息分别进行拼接，得到上述N个视频片段中的每个视频片段对应的视频特征信息；基于上述每个视频片段对应的视频特征信息，确定上述每个视频片段对应的第一播放倍速；进而基于该第一播放倍速播放所述第一视频。如此，通过将目标用户的用户行为特征信息融入每一视频片段的视频内容特征信息，使得电子设备能够按照用户行为特征信息和视频内容特征信息来个性化的设置每个片段对应的视频播放倍速，从而无需用户手动调整视频播放倍速。In the apparatus for determining the video playback speed provided in the embodiment of the present application, the first video content feature information corresponding to the N video clips in the first video is obtained, where N is a positive integer; and the user behavior feature information of the target user is respectively spliced with the first video content feature information corresponding to the above N video clips to obtain the video feature information corresponding to each of the above N video clips; based on the video feature information corresponding to each of the above video clips, the first playback speed corresponding to each of the above video clips is determined; and then the first video is played based on the first playback speed. In this way, by integrating the user behavior feature information of the target user into the video content feature information of each video clip, the electronic device can set the video playback speed corresponding to each clip in a personalized manner according to the user behavior feature information and the video content feature information, thereby eliminating the need for the user to manually adjust the video playback speed.

本申请实施例中的视频播放倍速的确定装置可以是电子设备，也可以是电子设备中的部件，例如集成电路或芯片。该电子设备可以是终端，也可以为除终端之外的其他设备。示例性的，电子设备可以为手机、平板电脑、笔记本电脑、掌上电脑、车载电子设备、移动上网装置(Mobile Internet Device，MID)、增强现实(augmented reality，AR)/虚拟现实(virtual reality，VR)设备、机器人、可穿戴设备、超级移动个人计算机(ultra-mobilepersonal computer，UMPC)、上网本或者个人数字助理(personal digital assistant，PDA)等，还可以为服务器、网络附属存储器(Network Attached Storage，NAS)、个人计算机(personal computer，PC)、电视机(television，TV)、柜员机或者自助机等，本申请实施例不作具体限定。The device for determining the video playback speed in the embodiment of the present application can be an electronic device, or a component in the electronic device, such as an integrated circuit or a chip. The electronic device can be a terminal, or other devices other than a terminal. Exemplarily, the electronic device can be a mobile phone, a tablet computer, a laptop computer, a PDA, a car-mounted electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), etc., and can also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (personal computer, PC), a television (television, TV), a teller machine or a self-service machine, etc., which is not specifically limited in the embodiment of the present application.

本申请实施例中的视频播放倍速的确定装置可以为具有操作系统的装置。该操作系统可以为安卓(Android)操作系统，可以为ios操作系统，还可以为其他可能的操作系统，本申请实施例不作具体限定。The device for determining the video playback speed in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiment of the present application.

本申请实施例提供的视频播放倍速的确定装置能够实现图1至图6的方法实施例实现的各个过程，为避免重复，这里不再赘述。The apparatus for determining the video playback speed provided in the embodiment of the present application can implement each process implemented in the method embodiments of Figures 1 to 6, and will not be described again here to avoid repetition.

Optionally, as shown in Figure 7, an embodiment of the present application further provides an electronic device 800, including a processor 801 and a memory 802. The memory 802 stores a program or instructions executable on the processor 801. When executed by the processor 801, the program or instructions implement each step of the above embodiments of the method for determining the video playback speed and achieve the same technical effect, which is not repeated here to avoid repetition.

需要说明的是，本申请实施例中的电子设备包括上述所述的移动电子设备和非移动电子设备。It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices mentioned above.

图8为实现本申请实施例的一种电子设备的硬件结构示意图。FIG8 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.

该电子设备100包括但不限于：射频单元101、网络模块102、音频输出单元103、输入单元104、传感器105、显示单元106、用户输入单元107、接口单元108、存储器109、以及处理器110等部件。The electronic device 100 includes but is not limited to components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.

本领域技术人员可以理解，电子设备100还可以包括给各个部件供电的电源(比如电池)，电源可以通过电源管理系统与处理器110逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图8中示出的电子设备结构并不构成对电子设备的限定，电子设备可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置，在此不再赘述。Those skilled in the art can understand that the electronic device 100 can also include a power supply (such as a battery) for supplying power to each component, and the power supply can be logically connected to the processor 110 through a power management system, so that the power management system can manage charging, discharging, and power consumption. The electronic device structure shown in FIG8 does not constitute a limitation on the electronic device, and the electronic device can include more or fewer components than shown in the figure, or combine certain components, or arrange components differently, which will not be described in detail here.

The processor 110 is configured to obtain first video content feature information corresponding to N video clips in a first video, where N is a positive integer. The processor 110 is configured to splice the user behavior feature information of a target user with the first video content feature information corresponding to each of the N video clips, to obtain video feature information corresponding to each of the N video clips. The processor 110 is configured to determine, based on the video feature information corresponding to each video clip, a first playback speed corresponding to that video clip. The processor 110 is configured to play the first video based on the first playback speed.
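
The splicing step can be illustrated with a short sketch. It assumes that each clip's content features and the user's behavior features are plain numeric vectors and that "splicing" means concatenation with the content features first; the source fixes neither the vector layout nor the order. The function name `splice_features` and the sample values are hypothetical.

```python
from typing import List


def splice_features(user_features: List[float],
                    clip_features: List[List[float]]) -> List[List[float]]:
    """Append the same user behavior vector to every clip's content vector.

    Assumption: splicing is plain concatenation, content features first.
    """
    return [clip + user_features for clip in clip_features]


# N = 3 clips, each with a 2-dimensional content feature vector (made-up values).
clip_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# One behavior vector for the target user (made-up values).
user_features = [1.0, 0.0]

video_features = splice_features(user_features, clip_features)
```

Each of the N resulting vectors carries both the clip content and the same user profile, which is what allows the later speed decision to differ between users for the same clip.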

Optionally, in an embodiment of the present application, the processor 110 is specifically configured to, for one of the N video clips, divide the video image in the video clip to obtain X image blocks. The processor 110 is further configured to obtain image feature information corresponding to the X image blocks. The processor 110 is specifically configured to input the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction, to obtain second video content feature information corresponding to the X image blocks. The processor 110 is specifically configured to perform feature fusion on the second video content feature information corresponding to the X image blocks, to obtain the first video content feature information.
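
A minimal sketch of the block division and the final fusion follows. It assumes a single-channel image stored as a list of rows, square non-overlapping blocks, and element-wise averaging as the fusion operation; the source specifies none of these. The feature extraction model that sits between the two steps is omitted, so the flattened block pixels stand in for the per-block features. The names `split_into_blocks` and `fuse_mean` are hypothetical.

```python
from typing import List

Image = List[List[float]]


def split_into_blocks(image: Image, block: int) -> List[List[float]]:
    """Split an image into non-overlapping block x block tiles, each flattened."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            blocks.append([image[row][col]
                           for row in range(top, top + block)
                           for col in range(left, left + block)])
    return blocks


def fuse_mean(features: List[List[float]]) -> List[float]:
    """Fuse per-block feature vectors by element-wise averaging (an assumption)."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


# A 4 x 4 test image whose pixel values are 0.0 .. 15.0 in row-major order.
image = [[float(row * 4 + col) for col in range(4)] for row in range(4)]
blocks = split_into_blocks(image, 2)  # X = 4 blocks of 4 pixels each
fused = fuse_mean(blocks)             # one vector standing in for the clip-level feature
```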

Optionally, in an embodiment of the present application, the processor 110 is specifically configured to: after the image feature information corresponding to the X image blocks is input into the video content feature extraction model, perform, based on a multi-head attention module, video content feature extraction on the image feature information corresponding to the X image blocks, to obtain X first key video content feature information corresponding to the X image blocks; calculate, based on a residual and normalization module, the mean and standard deviation corresponding to the X first key video content feature information, and obtain, based on the mean and standard deviation, X third video content feature information corresponding to the X first key video content feature information; and fuse, based on a feed-forward module, all feature information within each of the X third video content feature information, to obtain X second video content feature information. One image block corresponds to one first key video content feature information.
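
The source does not give the normalization formula. The sketch below assumes that the residual and normalization module behaves like the "Add & Norm" step of a standard Transformer encoder: the module input is added to the attention output, and the sum is normalized with its own mean and standard deviation. Learnable scale and shift parameters are left out. The name `add_and_norm` and the constant `eps` are hypothetical.

```python
import math
from typing import List


def add_and_norm(inputs: List[float],
                 sublayer_out: List[float],
                 eps: float = 1e-5) -> List[float]:
    """Residual connection followed by normalization over the feature dimension.

    Assumption: standard Transformer-style Add & Norm without learnable
    scale and shift; eps guards against division by zero.
    """
    summed = [x + y for x, y in zip(inputs, sublayer_out)]
    mean = sum(summed) / len(summed)
    variance = sum((value - mean) ** 2 for value in summed) / len(summed)
    std = math.sqrt(variance + eps)
    return [(value - mean) / std for value in summed]


# One block's input features and the matching attention output (made-up values).
normalized = add_and_norm([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

After this step each block's feature vector has approximately zero mean and unit variance, which keeps the inputs to the feed-forward module on a comparable scale across blocks.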

Optionally, in an embodiment of the present application, the processor 110 is specifically configured to: input the video feature information corresponding to each video clip into a video playback speed estimation model, and output a playback speed prediction parameter corresponding to each video clip; and determine, based on the playback speed prediction parameter corresponding to each video clip, the first playback speed corresponding to that video clip from among X preset playback speeds, where X is a positive integer.
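
One way to read the selection step is as a classification over the preset speeds. The sketch below assumes that the playback speed prediction parameter is a vector with one score per preset, that the scores are turned into probabilities with a softmax, and that the most probable preset is chosen. The source does not state the form of the parameter, and the preset values in `PRESET_SPEEDS` as well as the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical preset playback speeds; the source does not list actual values.
PRESET_SPEEDS = (0.5, 1.0, 1.5, 2.0)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def pick_speed(scores: Sequence[float],
               presets: Sequence[float] = PRESET_SPEEDS) -> float:
    """Return the preset speed with the highest predicted probability."""
    if len(scores) != len(presets):
        raise ValueError("expected one score per preset speed")
    probabilities = softmax(scores)
    best = max(range(len(presets)), key=probabilities.__getitem__)
    return presets[best]


# Made-up prediction parameters for two clips.
clip_scores = [[0.1, 2.0, 0.3, -1.0], [0.0, 0.2, 0.1, 3.0]]
speeds = [pick_speed(scores) for scores in clip_scores]
```

With these sample scores the first clip is assigned 1.0x and the second 2.0x. Restricting the output to a fixed set of presets keeps the result compatible with the speed options a player already exposes.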

Optionally, in an embodiment of the present application, the processor 110 is specifically configured to: for one of the N video clips, input the video feature information corresponding to the video clip into an attention mechanism module, to obtain key video feature information corresponding to the video clip; and obtain playback speed information that has a mapping relationship with the key video feature information, and obtain, based on the playback speed information and the key video feature information, the playback speed prediction parameter corresponding to the video clip.

In the electronic device provided in the embodiments of the present application, the electronic device obtains first video content feature information corresponding to N video clips in a first video, where N is a positive integer; splices the user behavior feature information of a target user with the first video content feature information corresponding to each of the N video clips, to obtain video feature information corresponding to each of the N video clips; determines, based on the video feature information corresponding to each video clip, a first playback speed corresponding to that video clip; and then plays the first video based on the first playback speed. Because the user behavior feature information of the target user is merged into the video content feature information of every video clip, the electronic device can set a personalized playback speed for each clip according to both the user behavior feature information and the video content feature information, so the user does not need to adjust the playback speed manually.

It should be understood that, in the embodiments of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 106 may include a display panel 1061, which may be configured as a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick. Details are not repeated here.

The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playback function or an image playback function). In addition, the memory 109 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DR RAM). The memory 109 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

The processor 110 may include one or more processing units. Optionally, the processor 110 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface, and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 110.

An embodiment of the present application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement the processes of the above embodiments of the method for determining a video playback speed and achieve the same technical effects. To avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

An embodiment of the present application further provides a chip including a processor and a communication interface coupled to the processor. The processor is configured to run a program or instructions to implement the processes of the above embodiments of the method for determining a video playback speed and achieve the same technical effects. To avoid repetition, details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.

An embodiment of the present application provides a computer program product stored in a storage medium. The program product is executed by at least one processor to implement the processes of the above embodiments of the method for determining a video playback speed and achieve the same technical effects. To avoid repetition, details are not repeated here.

It should be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.

From the above description of the implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are merely illustrative rather than restrictive. Guided by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application or the scope protected by the claims, and all of these fall within the protection of the present application.

Claims

1. A method for determining a video playback speed, characterized in that the method comprises:

Obtaining first video content feature information corresponding to N video segments in the first video, wherein the N video segments are: N video segments synthesized according to a preset number of segment frames from video frames collected at fixed intervals of the first video, where N is a positive integer;

Splicing user behavior feature information of a target user with the first video content feature information corresponding to each of the N video clips to obtain video feature information corresponding to each of the N video clips;

Determining a first playback speed corresponding to each video clip based on the video feature information corresponding to each video clip;

Playing the first video based on the first playback speed;

The obtaining first video content feature information corresponding to N video segments in the first video includes:

For one of the N video segments, dividing each frame of video image in the video segment to obtain X image blocks;

Obtaining image feature information corresponding to the X image blocks;

Inputting the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction to obtain second video content feature information corresponding to the X image blocks;

Performing feature fusion on the second video content feature information corresponding to the X image blocks to obtain the first video content feature information;

The video content feature extraction model includes a multi-head attention module, a residual and normalization module, and a feed-forward module, and the inputting of the image feature information corresponding to the X image blocks into the video content feature extraction model for feature extraction to obtain the second video content feature information corresponding to the X image blocks includes:

After inputting the image feature information corresponding to the X image blocks into the video content feature extraction model, based on the multi-head attention module, performing video content feature extraction on the image feature information corresponding to the X image blocks to obtain X first key video content feature information corresponding to the X image blocks;

Based on the residual and normalization module, calculating the mean and standard deviation corresponding to the X first key video content feature information, and based on the mean and standard deviation, obtaining X third video content feature information corresponding to the X first key video content feature information;

Based on the feed-forward module, respectively fusing all feature information in each of the X third video content feature information to obtain X second video content feature information;

Wherein one image block corresponds to one first key video content feature information.

2. The method according to claim 1, characterized in that the step of determining the first playback speed corresponding to each video segment based on the video feature information corresponding to each video segment comprises:

Inputting the video feature information corresponding to each video clip into a video playback speed estimation model, and outputting a playback speed prediction parameter corresponding to each video clip;

Based on the playback speed prediction parameters corresponding to each video segment, determine the first playback speed corresponding to each video segment from Y preset playback speeds;

Wherein, Y is a positive integer.

3. The method according to claim 2, characterized in that the step of inputting the video feature information corresponding to each video segment into a video playback speed estimation model and outputting the playback speed prediction parameter corresponding to each video segment comprises:

For one of the N video clips, inputting video feature information corresponding to the video clip into an attention mechanism module to obtain key video feature information corresponding to the video clip;

Obtaining playback speed information that has a mapping relationship with the key video feature information, and obtaining, based on the playback speed information and the key video feature information, the playback speed prediction parameter corresponding to the video clip.

4. A device for determining a video playback speed, characterized in that the device for determining a video playback speed comprises: an acquisition module, a processing module and a playback module:

The acquisition module is used to acquire first video content feature information corresponding to N video segments in the first video, where the N video segments are: N video segments synthesized according to a preset number of segment frames from video frames collected at fixed intervals of the first video, where N is a positive integer;

The processing module is used to respectively splice the user behavior feature information of the target user with the first video content feature information corresponding to the N video clips acquired by the acquisition module to obtain video feature information corresponding to each of the N video clips;

The processing module is used to determine the first playback speed corresponding to each video segment based on the video feature information corresponding to each video segment;

The playback module is configured to play the first video based on the first playback speed;

The processing module is specifically configured to divide each frame of a video image in one of the N video segments into X image blocks;

The acquisition module is further used to acquire image feature information corresponding to the X image blocks;

The processing module is specifically configured to input the image feature information corresponding to the X image blocks into a video content feature extraction model for feature extraction, so as to obtain second video content feature information corresponding to the X image blocks;

The processing module is specifically used to perform feature fusion on the second video content feature information corresponding to the X image blocks to obtain the first video content feature information;

The video content feature extraction model includes a multi-head attention module, a residual and normalization module, and a feed-forward module;

The processing module is specifically used for:

5. The device according to claim 4, characterized in that

The processing module is specifically used for:

Wherein, Y is a positive integer.

6. The device according to claim 5, characterized in that

The processing module is specifically used for:

7. An electronic device, characterized in that it comprises a processor, a memory, and a program or instruction stored in the memory and executable on the processor, wherein when the program or instruction is executed by the processor, the steps of the method for determining the video playback speed as described in any one of claims 1 to 3 are implemented.

8. A readable storage medium, characterized in that a program or instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method for determining the video playback speed as described in any one of claims 1 to 3 are implemented.