HK40045493B

HK40045493B - Media information processing method, device, electronic equipment and storage medium

Info

Publication number: HK40045493B
Application number: HK42021035908.9A
Authority: HK
Inventors: 康善同
Original assignee: 腾讯科技（深圳）有限公司
Filing date: 2021-08-02
Publication date: 2023-09-08

Description

Media information processing method, apparatus, electronic device, and storage medium

Technical Field

This application relates to fields such as artificial intelligence, cloud technology, and big data, and in particular to a media information processing method, apparatus, electronic device, and storage medium.

Background

Internet media information includes, for example, internet advertising. By product form, internet advertising falls into two types: contract advertising and auction advertising.

Because the two product forms differ, the media platform needs a different delivery strategy for each. For contract advertising, the most important goal is guaranteed delivery: the number of ads delivered must reach the booked quantity, neither more nor fewer. For auction advertising, the media platform's delivery goal is to maximize platform revenue.

The different delivery goals of contract advertising and performance (auction) advertising therefore call for different delivery strategies. For many media platforms today, the same ad slot can show a user either a contract ad or an auction ad, so how to display contract ads and auction ads reasonably has become a technical problem that urgently needs to be solved.

Summary of the Invention

Embodiments of this application provide a media information processing method, apparatus, electronic device, and storage medium that make the playback of media information more reasonable by processing the media information in a candidate media information set.

In one aspect, embodiments of this application provide a media information processing method, the method comprising:

obtaining target data, for the previous time period, of each piece of first media information in a candidate media information set, wherein the target data includes at least one of traffic prediction information and playback-related data, the playback-related data includes at least one of a historical playback volume in the previous time period, playback competition information of the media information, or a target playback volume, and the candidate media information set includes at least one piece of first media information of a first type and at least one piece of second media information of a second type;

for each piece of first media information, determining a first playback evaluation value of the first media information for the current time period based on the target data of the first media information for the previous time period;

obtaining a second playback evaluation value of each piece of second media information for the current time period, wherein the playback evaluation value of a piece of first media information or second media information represents the probability that the media information is promoted; and

determining, from the candidate media information set, the media information to be played in the current time period according to the first playback evaluation value of each piece of first media information and the second playback evaluation value of each piece of second media information.
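The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Candidate`, `select_media`, and `score_first`, the dictionary layout of the target data, and the choice to pick the highest-valued candidates by sorting are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """One entry of the candidate media information set (hypothetical layout)."""
    item_id: str
    media_type: str                                  # "first" or "second"
    target_data: Dict[str, float] = field(default_factory=dict)  # previous-period data, first type only
    second_score: float = 0.0                        # current-period value, second type only


def select_media(
    candidates: List[Candidate],
    score_first: Callable[[Dict[str, float]], float],
    top_k: int = 1,
) -> List[str]:
    """Rank first-type and second-type candidates together and return the top_k ids."""
    scored = []
    for c in candidates:
        if c.media_type == "first":
            # First playback evaluation value, derived from previous-period target data.
            value = score_first(c.target_data)
        else:
            # Second playback evaluation value, obtained as given.
            value = c.second_score
        scored.append((value, c.item_id))
    # Python's sort is stable, so candidates with equal values keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:top_k]]
```

Passing the scoring rule in as `score_first` keeps the sketch neutral about how the first playback evaluation value is computed, whether by a hand-written rule or by a trained model.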

In another aspect, embodiments of this application provide a media information processing apparatus, the apparatus comprising:

a target data acquisition module, configured to obtain target data of each piece of first media information in a candidate media information set, wherein the target data includes at least one of traffic prediction information and playback-related data, the playback-related data includes at least one of a historical playback volume in the previous time period, playback competition information of the media information, or a target playback volume, and the candidate media information set includes at least one piece of first media information of a first type and at least one piece of second media information of a second type;

a playback evaluation value processing module, configured to determine, for each piece of first media information, a first playback evaluation value of the first media information for the current time period based on the target data of the first media information;

the playback evaluation value processing module being further configured to obtain a second playback evaluation value of each piece of second media information, wherein the playback evaluation value of a piece of first media information or second media information represents the probability that the media information is promoted; and

a to-be-played media information processing module, configured to determine, from the candidate media information set, the media information to be played in the current time period according to the first playback evaluation value of each piece of first media information and the second playback evaluation value of each piece of second media information.

In an optional embodiment, the playback competition information includes at least one of the following for each piece of media information in the candidate media information set:

the click-through rate in the previous time period;

the conversion rate in the previous time period;

the exposure rate in the previous time period;

the playback evaluation value in the previous time period.

In an optional embodiment, the playback evaluation value processing module is configured to:

adjust, according to the target data, the historical playback evaluation value of the first media information for the previous time period to obtain the first playback evaluation value of the first media information for the current time period.
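One way to picture this adjustment is a simple proportional feedback rule, sketched below. The rule is an assumption chosen for illustration and is not taken from this application, which elsewhere describes obtaining the value through a trained model; the function name `adjust_score` and the parameters `step` and `floor` are hypothetical.

```python
def adjust_score(
    prev_score: float,
    prev_plays: float,
    target_plays: float,
    step: float = 0.5,
    floor: float = 1e-6,
) -> float:
    """Raise the evaluation value when delivery lagged the target, lower it when ahead.

    Assumed rule: scale the previous value by (1 + step * relative shortfall).
    """
    if target_plays <= 0:
        # No delivery target for the period: leave the value unchanged.
        return prev_score
    # Positive when the previous period under-delivered, negative when it over-delivered.
    relative_shortfall = (target_plays - prev_plays) / target_plays
    return max(floor, prev_score * (1.0 + step * relative_shortfall))
```

With the defaults, a first media information item that reached only half of its target in the previous period has its value raised by 25 percent, and one that overshot by half has it lowered by 25 percent.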

In an optional embodiment, the to-be-played media information processing module is further configured to:

determine the probability that each piece of first media information is played in the current time period;

wherein determining, from the candidate media information set, the media information to be played in the current time period according to the first playback evaluation value of each piece of first media information and the second playback evaluation value of each piece of second media information includes:

determining, from the candidate media information, the media information to be played in the current time period based on the first playback evaluation value of each piece of first media information, the probability of being played, and the second playback evaluation value of each piece of second media information;

for each piece of first media information, obtaining display evaluation data of the first media information for the current time period, wherein the display evaluation data is information that affects the probability that the first media information is played; and

for each piece of first media information, determining the probability that the first media information is played in the current time period according to the display evaluation data of the first media information.
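The text does not say how the probability of being played is computed from the display evaluation data, or how it is combined with the first playback evaluation value. The sketch below assumes, purely for illustration, a logistic function over numeric features and a multiplicative combination; `play_probability`, `effective_first_score`, the feature names, and the weights are all hypothetical.

```python
import math
from typing import Dict


def play_probability(
    features: Dict[str, float],
    weights: Dict[str, float],
    bias: float = 0.0,
) -> float:
    """Map display evaluation data (numeric features) to a probability in [0, 1]."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def effective_first_score(first_score: float, probability: float) -> float:
    """Assumed combination: weight the first playback evaluation value by the probability."""
    return first_score * probability
```

Under this assumption, a first media information item with evaluation value 2.0 and probability 0.5 would compete against the second media information with an effective value of 1.0.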

In an optional embodiment, for any piece of first media information, the display evaluation data includes at least one of the following:

profile data of the target user corresponding to the current terminal device;

device-related information of the current terminal device;

attribute information of the first media information;

current time information corresponding to the terminal device;

current location information corresponding to the terminal device;

behavior statistics of the target user on the current terminal device.

In an optional embodiment, determining, for each piece of first media information, the first playback evaluation value of the first media information for the current time period based on its target data for the previous time period is implemented by a media information evaluation model, and the media information evaluation model is trained by a training module as follows:

obtaining a training sample set, wherein each training sample in the training sample set includes sample target data, for an initial time period, of a piece of third media information of the first type;

inputting the sample target data of each piece of third media information for the initial time period into an initial information evaluation model to obtain a predicted playback evaluation value of each piece of third media information for the time period following the initial time period;

for each piece of third media information, determining a first predicted evaluation effect representation value of the third media information based on its sample target data for the initial time period and its predicted playback evaluation value for the following time period;

determining a first total training loss of the information evaluation model based on the first predicted evaluation effect representation value of each piece of third media information; and

repeatedly training the information evaluation model based on the training samples and the first total training loss until the total training loss satisfies a preset first training end condition, thereby obtaining the media information evaluation model.

In an optional embodiment, each training sample further includes the actual playback evaluation value of the third media information for the time period following the initial time period, the actual evaluation effect representation value for that following time period, and sample target data for a first time period, wherein the first time period is the time period after the following time period. For each piece of third media information, determining the first predicted evaluation effect representation value of the third media information based on its sample target data for the initial time period and its predicted playback evaluation value is implemented by an effect evaluation model. The effect evaluation model is trained by the training module, which is configured to:

input the sample target data of each piece of third media information for the first time period into the initial information evaluation model to obtain a second predicted playback evaluation value of each piece of third media information for the first time period;

input the sample target data of each piece of third media information for the first time period and the corresponding second predicted playback evaluation value into an initial effect evaluation model to obtain a second predicted evaluation effect representation value of each piece of third media information;

for each piece of third media information, determine a first evaluation effect representation value of the third media information based on the actual evaluation effect representation value and the second predicted evaluation effect representation value, and obtain, through the effect evaluation model, a second evaluation effect representation value of the third media information based on its sample target data for the initial time period and its actual playback evaluation value for the time period following the initial time period;

determine a second total training loss of the effect evaluation model based on the first evaluation effect representation value and the second evaluation effect representation value of each piece of third media information; and

repeatedly train the effect evaluation model based on the training samples and the second total training loss until a preset second training end condition is satisfied.

In an optional embodiment, each training sample further includes a playback effect evaluation parameter, for the time period following the initial time period, of at least one piece of fourth media information of the second type corresponding to the third media information. For any piece of third media information, the actual evaluation effect representation value is obtained by the training module as follows:

obtaining the target playback volume of the third media information and its played volume in the time period following the initial time period;

determining a playback effect evaluation parameter of the third media information according to the target playback volume and the played volume; and

determining the actual evaluation effect representation value according to the playback effect evaluation parameter of the third media information and the playback effect evaluation parameters of the pieces of fourth media information corresponding to the third media information.
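The three steps above leave the exact formulas open. The sketch below shows one possible instantiation and is an assumption throughout: it scores delivery by the negative relative deviation from the target, and combines it with the fourth media information parameters by a weighted sum. The names `delivery_parameter`, `actual_effect_value`, and `delivery_weight` are hypothetical.

```python
from typing import Sequence


def delivery_parameter(target_plays: float, played: float) -> float:
    """Assumed parameter: 0 when the target is met exactly, negative otherwise.

    Over-delivery and under-delivery are penalized symmetrically.
    """
    if target_plays <= 0:
        raise ValueError("target_plays must be positive")
    return -abs(target_plays - played) / target_plays


def actual_effect_value(
    target_plays: float,
    played: float,
    fourth_params: Sequence[float],
    delivery_weight: float = 1.0,
) -> float:
    """Assumed combination: weighted delivery parameter plus the sum of the
    playback effect evaluation parameters of the corresponding fourth media information."""
    return delivery_weight * delivery_parameter(target_plays, played) + sum(fourth_params)
```

A symmetric penalty reflects the requirement that contract delivery be neither more nor fewer than the booked quantity; other shapes, such as penalizing shortfall more heavily, would fit the same three steps.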

In another aspect, embodiments of this application provide an electronic device comprising a processor and a memory that are connected to each other; the memory is configured to store a computer program; and the processor is configured to perform, when invoking the computer program, the method provided by any possible implementation of the media information processing method described above.

In another aspect, embodiments of this application provide a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the method provided by any possible implementation of the media information processing method described above.

In another aspect, embodiments of this application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the method provided by any possible implementation of the media information processing method described above.

The beneficial effects of the embodiments of this application are as follows:

In the embodiments of this application, target data for the previous time period is obtained for each piece of first media information in the candidate media information set. The target data includes at least one of traffic prediction information and playback-related data, and the playback-related data includes at least one of a historical playback volume in the previous time period, competition information of the media information, or a target playback volume. For each piece of first media information, a first playback evaluation value for the current time period can be determined from its target data for the previous time period. Combined with the obtained second playback evaluation value of each piece of second media information for the current time period, the pieces of first and second media information in the candidate set can then be ranked to determine the media information to be played in the current time period. In this way, the target data is used effectively to determine the playback evaluation value of the first media information, and the first and second playback evaluation values are used to determine the media information to be played, which makes the display of that media information more reasonable.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of this application more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below illustrate only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic structural diagram of the framework of a mixed ad ranking system provided in an embodiment of this application;

Figure 2 is a schematic diagram of an application environment of a media information processing method provided in an embodiment of this application;

Figure 3 is a schematic flowchart of a media information processing method provided in an embodiment of this application;

Figure 4a is a schematic diagram of playing the media information to be played through a target display interface according to an embodiment of this application;

Figure 4b is another schematic diagram of playing the media information to be played through a target display interface according to an embodiment of this application;

Figure 5 is a schematic structural diagram of an optional distributed system 100 applied to a blockchain system according to an embodiment of the present invention;

Figure 6 is a schematic diagram of an optional block structure provided in an embodiment of the present invention;

Figure 7 is a schematic structural diagram of a media information processing apparatus provided in an embodiment of this application;

Figure 8 is a schematic structural diagram of an electronic device provided in an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.

At least part of the media information processing method provided in the embodiments of this application involves machine learning and related areas of artificial intelligence, as well as several areas of cloud technology, such as cloud computing and cloud services, and data computation and processing in the field of big data.

Artificial Intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology within computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating and interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is an interdisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in every area of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Cloud technology is a hosting technology that unifies hardware, software, network, and other resources within a wide area network or local area network to compute, store, process, and share data. The media information processing method provided in the embodiments of this application can be implemented based on cloud computing, which is part of cloud technology.

Cloud computing means obtaining the required resources over a network in an on-demand, easily scalable manner. It is the product of the development and convergence of traditional computer and network technologies such as Grid Computing, Distributed Computing, Parallel Computing, Utility Computing, Network Storage Technologies, Virtualization, and Load Balance.

Artificial intelligence cloud services are also commonly called AIaaS (AI as a Service). This is currently a mainstream way of delivering artificial intelligence platforms. Specifically, an AIaaS platform splits several common types of AI services and offers them in the cloud as standalone or packaged services, such as processing resource conversion requests.

Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that requires new processing models to provide stronger decision-making, insight discovery, and process optimization. With the arrival of the cloud era, big data has attracted more and more attention. Big data requires special technologies to implement the media information processing method provided in this embodiment effectively. Technologies suited to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, and the cloud computing described above.

The embodiments of this application do not limit the specific type of the media information involved (the first media information and the second media information). The media information may be media information produced to promote a product, and the method can be applied in scenarios where the playback count of product 1 (the first media information) must meet a guaranteed quantity while the overall revenue from product 2 (the second media information) is to be maximized. The product may be a media product, such as a game product, a film and television product, a beauty product, a home furnishing product, an apparel product, or a daily-necessities product. The product may be promoted through advertising, for example through contract advertising and auction advertising, played on smart devices or on electronic display screens in shopping malls, along roads, and so on.

The smart devices include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The specific device can be determined by the requirements of the actual application scenario and is not limited here.

To describe the technical solution of this application more clearly, this example assumes that products are promoted through contract advertising and auction advertising, so that contract ads and auction ads must be ranked together. Product 1 is played as contract advertising and product 2 is played as auction advertising. When contract ads and auction ads are played, the goal is the best overall result across two objectives: guaranteed delivery for contract advertising, and maximum revenue per thousand impressions (effective cost per mille, eCPM) for performance advertising. Here, contract advertising is the first media information of the first type, and performance advertising is the second media information of the second type.

In the current internet advertising market, internet advertising falls into two types by product form: contract advertising and auction advertising.

Contract advertising, also called display advertising, means delivering a predetermined number of ads at a predetermined price within a predetermined time period. It is the earliest way online advertising was sold: the media platform and the advertiser agree that the advertiser's ads will be shown in fixed ad slots during a given period, and settlement is by display time (Cost Per Time, CPT). Guaranteed Delivery evolved later: the media platform and the advertiser agree that a certain number of the advertiser's ads will be shown to certain users in certain ad slots during a given period, and settlement is by the cost of one thousand ad impressions (Cost Per Mille, CPM). If the media platform delivers more ads than the advertiser booked, the excess is not billed; if it delivers fewer than the advertiser booked, it must pay corresponding financial compensation.
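The guaranteed-delivery settlement rule just described can be illustrated with a short calculation. The sketch is an assumption in its details: the text says only that a shortfall requires "corresponding" compensation, so the per-thousand compensation rate `penalty_cpm` and the function name `settle_guaranteed_delivery` are hypothetical.

```python
from typing import Tuple


def settle_guaranteed_delivery(
    booked: int,
    delivered: int,
    cpm: float,
    penalty_cpm: float,
) -> Tuple[float, float]:
    """Return (amount billed to the advertiser, compensation owed by the platform)."""
    # Impressions beyond the booked quantity are not billed.
    billable = min(delivered, booked)
    # Impressions missing from the booked quantity trigger compensation.
    shortfall = max(booked - delivered, 0)
    billed = billable / 1000.0 * cpm
    compensation = shortfall / 1000.0 * penalty_cpm  # assumed compensation rate
    return billed, compensation
```

For a booking of 10,000 impressions at a CPM of 20, delivering 12,000 impressions still bills 200, while delivering 8,000 bills 160 and, at an assumed compensation rate of 30 per thousand, owes 60. Both over-delivery and under-delivery cost the platform, which is why delivery should be neither more nor fewer than the booked quantity.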

Auction advertising, also known as performance advertising, is advertising that is delivered selectively according to advertisers' bids. Advertisers set their own bids and are billed by results. Common billing methods include cost per click (CPC) and cost per action (CPA); in recent years, optimized cost per click (oCPC) and optimized cost per action (oCPA) have also emerged. Unlike contract advertising, the media and the performance advertiser do not agree on a delivery volume. Performance advertisers compete for the media's traffic by bidding, and a common auction mechanism is the generalized second-price (GSP) auction.

Because contract ads and auction ads are different products, the media must adopt different delivery strategies for each. For contract ads, the most important goal is guaranteed volume: the delivered volume must reach the booked volume, neither more nor less, while metrics such as click-through rate (CTR) and conversion rate (click value rate, CVR) are also taken into account. For performance ads, the media's goal is to maximize platform revenue, that is, to maximize overall eCPM.

The different delivery goals of contract ads and performance ads call for different delivery strategies. Today, many media can show either a contract ad or an auction ad to a user in the same ad slot. Ad delivery systems therefore often introduce a dedicated module, contract/auction mixed ranking, that decides whether to show a contract ad or an auction ad, and which one.

The terms used in this application are explained as follows:

Advertiser: a party that wants to advertise its own brand or products.

Media (publisher): the carrier that provides ad placements, for example news clients, browsers, short-video platforms, film and television platforms, and instant messaging applications.

Agency: in essence an intermediary that helps advertisers find media ad slots and helps media find advertisers.

Audience: the people who "consume" ads, that is, consumers or users.

Click-through rate (CTR): a common term in internet advertising. It is the click arrival rate of an online ad (image, text, keyword, ranking, or video ad), that is, the ad's actual number of clicks divided by its number of impressions.

Conversion rate (click value rate, CVR): a metric for the effectiveness of CPA advertising; in short, the rate at which users who click an ad become validly activated, registered, or even paying users. CVR = (conversions / clicks) × 100%. "Conversion" needs an explicit definition, which varies from party to party; it is generally the standard by which Party A (usually the advertiser) evaluates Party B (usually the channel). If that standard is a user registered with a valid mobile phone number, then CVR = number of phone-number-registered users brought by the channel / number of clicks brought by the channel. In general, 0 ≤ CVR ≤ 100%.

Predicted click-through rate (pCTR): an estimate, made before an ad is shown in a given situation, of the probability that the ad will be clicked.

Effective cost per mille (eCPM): a metric by which the media measures the efficiency of its ad inventory; it is the ad revenue the media earns per thousand ad impressions. For the media, the larger the value the better. eCPM = CPC × CTR × 1000.

Cost per mille (CPM): the cost per thousand ad impressions. Because the cost of a single impression is very small, the industry convention is to charge per 1,000 impressions. Ads billed this way are mostly for brand display and product launches.

Cost per click (CPC): billing by the number of times an ad is clicked. Keyword bidding and feed ads mostly use this model.

Optimized cost per click (oCPC): billed by CPC. It uses a more scientific and accurate conversion-rate estimation mechanism to help advertisers obtain more high-quality traffic while raising the conversion completion rate. Starting from the advertiser's bid, the system uses multi-dimensional, real-time feedback and a large body of accumulated historical data to adjust the bid dynamically according to the estimated conversion rate and the competitive environment, thereby optimizing ad ranking, helping advertisers win the most suitable traffic, and lowering conversion cost.

Cost per action (CPA): billing by user feedback actions, typically registration (registration cost). It also includes cost per download (CPD) and cost per install (CPI).

Optimized cost per action (oCPA): in essence still paid by CPA. When the advertiser selects a specific optimization goal in the delivery flow (for example, mobile app activation or placing an order on a website), provides the average price it is willing to pay for that goal, and reports performance data promptly and accurately, a conversion estimation model estimates in real time the conversion value of each click for the advertiser and bids automatically, with the final charge made per click. The conversion estimation model is also continuously and automatically optimized using the advertiser's conversion data.

Cost per sale (CPS): billing by sales, a direct form of performance marketing.

Cost per time (CPT): billing for display over a period of time, typically 1 day, 1 week, or 1 month. Buying out an ad slot for a period at a fixed price is regarded as the most worry-free way to advertise. For most platforms, CPT is the fastest and most effective way to earn revenue.

Traffic can be understood as page views (PV), that is, the number of page views or hits; it measures the visits to a website or web page. Specifically, PV is the number of pages of a website, or the number of times a given page, viewed by all visitors within 24 hours (00:00 to 24:00). PV counts page refreshes: each refresh counts as one PV. It is measured as follows: the browser sends a request to the web server; on receiving it, the server sends the corresponding page to the browser, which produces one PV. As long as the page for that request has been sent to the browser, it counts as one PV, whether or not the page has fully opened (finished downloading).
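The CTR, CVR, and eCPM definitions in the glossary above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names and the zero-denominator handling are illustrative assumptions, not part of the application.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    # Assumption: report 0.0 when there are no impressions.
    return clicks / impressions if impressions else 0.0


def cvr(conversions: int, clicks: int) -> float:
    """Conversion rate: conversions divided by clicks (a value in [0, 1])."""
    # Assumption: report 0.0 when there are no clicks.
    return conversions / clicks if clicks else 0.0


def ecpm(cpc: float, ctr_value: float) -> float:
    """Effective cost per mille: eCPM = CPC * CTR * 1000."""
    return cpc * ctr_value * 1000


if __name__ == "__main__":
    rate = ctr(clicks=20, impressions=1000)
    print(rate, cvr(conversions=5, clicks=20), ecpm(cpc=0.5, ctr_value=rate))
```

With a CPC of 0.5 and a CTR of 0.02, a thousand impressions earn the media about 10 units of revenue.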

To rank contract ads and performance ads together, this example applies game-theoretic ideas and reinforcement learning: it uses the delivery status of contract ads against their guaranteed volume, the eCPM distribution of performance ads, and the inventory distribution of contract ads (that is, the playback volume not yet completed) to give each contract ad a reasonable bid, thereby achieving the optimal goal of guaranteeing contract-ad volume while maximizing the overall eCPM of performance ads.

As an optional implementation, Figure 1 shows a schematic framework of an ad mixed-ranking system provided by this application. As shown in Figure 1, the system may include an offline part and an online part. The offline part mainly performs model training and traffic prediction. The online part mainly uses the model's prediction function, together with real-time online fine ranking and exposure data, to give the bids for contract ads.

As shown in Figure 1, the offline part of the ad mixed-ranking system may include a traffic prediction module, a simulator, and a model training module; the online part may include a model prediction module, an ad mixed-ranking module, and a user terminal.

Taking one day of playback (that is, a one-day period) as an example, contract ads and performance ads are mixed-ranked and then played as follows:

Step S1: divide the delivery period of the contract ad into fine-grained time slices (for example 5 min, 10 min, or 15 min).

Step S2: at the start of each time slice, use reinforcement learning (RL) to give the contract ad's bid, based on the playback in the previous time slice (the historical playback volume of the previous period), the competition in the mixed-ranking stage (the media information playback competition information described above), and the traffic prediction information of the contract ad. The bid is the playback evaluation value described above; it can also be understood as a score for the contract ad that determines how likely the ad is finally to be played.

Step S3: observe for one time slice and record the information produced in the slice just ended when the ad competed at that slice's bid, such as its playback and the mixed-ranking competition.

Step S4: repeat steps S2 and S3 until the contract ad's playback reaches the booked volume or the delivery period ends.
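Steps S1 to S4 form a simple control loop. The sketch below shows its shape only; `run_campaign`, `policy`, `observe_slice`, and the dictionary keys are hypothetical names, and the RL model and the ad system are replaced by plain callables.

```python
from typing import Callable, Dict, List


def run_campaign(
    booked_volume: int,
    num_slices: int,
    policy: Callable[[Dict[str, float]], float],
    observe_slice: Callable[[float], Dict[str, float]],
) -> List[float]:
    """Give one bid per time slice until the booked volume is met or the period ends."""
    bids: List[float] = []
    played = 0
    last_obs: Dict[str, float] = {"plays": 0.0, "win_rate": 0.0}
    for _ in range(num_slices):           # S1: the period is already cut into slices
        if played >= booked_volume:       # S4: stop once the booked volume is reached
            break
        # S2: the state combines the last slice's observation with the remaining volume.
        state = dict(last_obs, remaining=float(booked_volume - played))
        bid = policy(state)
        bids.append(bid)
        last_obs = observe_slice(bid)     # S3: record what this slice produced
        played += int(last_obs["plays"])
    return bids
```

In a real system `policy` would be the trained RL model and `observe_slice` would read the playback and competition logs of the slice that has just ended.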

Note that when contract ads and performance ads are mixed-ranked, how many and which contract ads are to be played on the day is known in advance, whereas the performance ads to be played are arranged according to the current playback situation, with different performance ads played in different periods.

To explain each module's role more clearly, the modules are introduced one by one below:

The traffic prediction module estimates the traffic of a contract ad. Its predictions include the ad's traffic distribution over time, the contract-ad competition on that traffic (number of contract ads, CTR distribution, and so on), and the performance-ad competition on that traffic (number of performance ads, eCPM distribution, and so on).

To estimate a contract ad's traffic distribution over time, the traffic prediction module can use a traffic estimation algorithm, which predicts the traffic distribution of a given audience of a given application in a given region at a given time from that audience's historical traffic data. For example, if a contract ad is targeted at male traffic in region B of video application A from January 2 to January 4, 2020, the algorithm estimates the amount of that traffic for those dates from the daily distribution of male traffic in region B on video application A over a past period.

A typical traffic estimation algorithm can use a machine learning model, with the following flow:

Step 1: collect advertisers' targeting conditions and count the daily traffic under each targeting condition.

The targeting conditions can be set according to the needs of the actual scenario and are not limited here. For example, a targeting condition can select traffic from particular regions (such as first-tier cities), traffic from particular users (such as China Telecom, China Mobile, or China Unicom subscribers), traffic from a particular population (such as people aged 20 to 40), and so on.

Step 2: train a machine learning model with the advertiser's targeting conditions as features and the corresponding traffic volume as the label.

Step 3: for an advertiser that has newly placed an order, use the model trained in Step 2 to estimate the traffic distribution over its delivery period.
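The three steps above can be sketched as follows. The application does not name a model, so a per-condition historical mean stands in for the machine learning model here; the `Targeting` tuple layout and both function names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical feature layout: (application, region, audience).
Targeting = Tuple[str, str, str]


def train_traffic_model(history: List[Tuple[Targeting, int]]) -> Dict[Targeting, float]:
    """Steps 1-2: group daily traffic by targeting condition and fit one mean per condition."""
    by_condition: Dict[Targeting, List[int]] = defaultdict(list)
    for condition, daily_traffic in history:
        by_condition[condition].append(daily_traffic)
    return {condition: mean(values) for condition, values in by_condition.items()}


def predict_traffic(model: Dict[Targeting, float], condition: Targeting, num_days: int) -> List[float]:
    """Step 3: estimate per-day traffic for a new order over its delivery period."""
    # Assumption: an unseen targeting condition is predicted as zero traffic.
    per_day = model.get(condition, 0.0)
    return [per_day] * num_days
```

A production model would take richer features (weekday, holidays, trend) and generalize to unseen targeting conditions; the mean baseline only illustrates the feature-to-label setup.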

The simulator simulates the mixed-ranking stage and the display stage of advertising, reproducing the real mixed-ranking process. It contains two sub-modules. The mixed-ranking simulation module simulates the online logic of the mixed-ranking stage (ranking rules, the maximum number and total duration of ads playable in an ad slot, same-advertiser filtering, and so on); it can simulate how a contract ad fares in mixed ranking at a given bid within a time slice, including the number and rate of contract-ad wins and the number, rate, and winning eCPM distribution of performance-ad wins. The display simulation module is a statistical machine learning model whose input features are user features (age, gender, education level, interests, shopping behavior data, recent high-attention browsing behavior data, and so on), ad attributes (ad ID, industry, duration, and so on), context (time, location, network, user device, and so on), and user behavior statistics (number of videos watched and ads seen that day, and so on); its output is the probability that the ad can be played.

The online logic of the mixed-ranking stage can be simulated by the algorithm of the mixed-ranking simulation module, whose input and output are as follows:

Input: the pre-roll ad ranking log for a 5-minute time slice (for example, the target data of the first media information in the previous period), contract ad A, and the new bid new_bid (for example, the first playback evaluation value) given for that ad (for example, the first media information) by the reinforcement learning model (that is, the media information evaluation model).

Here, a pre-roll (patch) ad refers to a communication platform that conveys brand and product information to target consumers in a short time through media such as compact discs (CD), video compact discs (VCD), and digital versatile discs (DVD), or through forms such as packaging posters, via distributors covering the national market and some overseas markets; it is also called an "accompanying ad".

Output: the number of times the contract ad wins in this 5-minute time slice, together with information such as the actual competition.

The algorithm of the mixed-ranking simulation module is as follows:

Step 1: collect the ranking traffic X in which contract ad A participated during the 5-minute time slice.

Step 2: for each request x in the ranking traffic X:

a. Compute the new score new_score of contract ad A on x; in general, new_score = new_bid * pctr * pcvr.

b. Initialize the winning ad queue Res to empty.

c. For each position in the pre-roll ad break:

1) Sort the ads that can be shown in this position by score from high to low; denote the resulting queue L.

2) For each ad l in L: if the total duration of l plus the ads already in Res does not exceed the total duration allowed by the ad slot, and l does not share an advertiser with any ad in Res, add l to Res; otherwise move on to the next ad.

d. Check whether ad A is in Res, and record whether it won together with data such as the average score of the competing queue.

Step 3: aggregate the data and give statistics such as ad A's win rate and the competitive environment.

The winning ad queue Res is the queue of ads that will finally be played.

Specifically, suppose the opening ad slot of a video application can play 8 pre-roll ads with a total duration of 150 seconds. The 8 pre-roll ads to be played can be determined as follows:

The duration of each pre-roll ad can be set in advance. For any position, the ads that can be shown in that position are sorted by score from high to low into queue L, and the highest-scoring ad, say ad 1, is selected. If the total duration of ad 1 plus the ads in Res does not exceed the total duration allowed by the ad slot, and ad 1 does not share an advertiser with any ad in Res, ad 1 is taken as one of the ads in Res.
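The per-request part of the simulation (steps b to d) can be sketched as a greedy fill of the ad break. The `Ad` fields, `fill_break`, and `win_rate` are illustrative names, and the sketch assumes that each position takes the first eligible ad in score order, which is one reading of step c.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ad:
    ad_id: str
    advertiser: str
    duration: int   # seconds
    score: float    # for contract ad A: new_bid * pctr * pcvr


def fill_break(candidates_per_position: List[List[Ad]], max_total_duration: int) -> List[Ad]:
    """Build the winning queue Res for one request."""
    res: List[Ad] = []
    for candidates in candidates_per_position:
        # Queue L: candidates for this position, highest score first.
        queue = sorted(candidates, key=lambda ad: ad.score, reverse=True)
        for ad in queue:
            used = sum(a.duration for a in res)
            fits = used + ad.duration <= max_total_duration
            new_advertiser = all(a.advertiser != ad.advertiser for a in res)
            if fits and new_advertiser:
                res.append(ad)
                break  # assumption: one winner per position
    return res


def win_rate(requests: List[List[List[Ad]]], ad_id: str, max_total_duration: int) -> float:
    """Step 3: share of requests in which the given ad ends up in Res."""
    if not requests:
        return 0.0
    wins = sum(
        any(a.ad_id == ad_id for a in fill_break(request, max_total_duration))
        for request in requests
    )
    return wins / len(requests)
```

Replaying the same logged requests with different values of new_bid gives the win counts the simulator reports for a time slice.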

Model training module: delivering a contract ad is a typical delayed-reward problem, because the ad's true guaranteed-volume outcome is known only when its playback period ends. The problem can be modeled and solved with reinforcement learning (RL). The RL model can be, for example, a Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), or Asynchronous Advantage Actor-Critic (A3C) model; no limitation is imposed here.

The State, Action, and Reward of the reinforcement learning model are designed as follows:

State: the state has three parts: the competitive-environment state of the mixed-ranking stage, the traffic prediction state, and the delivery state. The competitive-environment state can include the number of contract ads taking part in mixed ranking, the bid distribution of performance ads, the bid distribution of other contract ads, win rates, CTR/CVR distributions, and so on. The traffic prediction state can include the predicted inventory distribution for the contract ad, the eCPM distribution of performance ads on the corresponding traffic, the distribution of the number of competing contract ads, and so on. The delivery state can include the contract ad's booked volume, the number of impressions already delivered, and so on. For the specific information contained in the target data of the first media information in the previous period and the sample target data of the third media information in the initial period described earlier, and in the initial State and terminal State described later, refer to the information in this State.

Action: the bid of the contract ad. This Action is the predicted playback evaluation value described earlier.

Reward: the Reward is a piecewise function with three parts. The true evaluation effect characterization value of the third media information in the period following the initial period, described earlier, can be computed in the same way as this Reward.

In each time slice (for example, the period following the initial period described earlier), if the contract ad's total completed playback volume up to the current time slice has not yet reached the booked volume (for example, the target playback volume of the third media information), then:

Reward = contract-ad playback volume + alpha * contract-ad effect + beta * performance-ad eCPM.

If the contract ad's playback volume has already reached the booked volume, then:

Reward = -contract-ad playback volume + alpha * contract-ad effect + beta * performance-ad eCPM.

In these formulas, the contract-ad playback volume can correspond to the played volume of the third media information in the period following the initial period, the contract-ad effect can correspond to the playback effect evaluation parameter of the third media information in that period, and the performance-ad eCPM can correspond to the playback effect evaluation parameter of the fourth media information in that period.

At the end of the contract ad's delivery:

Reward = -abs(actual playback volume of the contract ad - booked playback volume of the contract ad).

Here, abs denotes the absolute value function. Contract-ad effect refers to metrics of the contract ad such as CTR and CVR. alpha and beta take values in the range 0 to 1 and are weights of the simulator; they are not limited here.

For example, suppose contract ad A has booked 1,000,000 plays for today and has been played 900,000 times by 18:00. In the 18:00-18:05 time slice it takes part in mixed ranking 300,000 times and is played 100,000 times, with an average CTR of 0.02; on the 200,000 requests where it loses to other ads in that slice, the winning performance ads have an average eCPM of 10 yuan.

Then for the 18:00-18:05 time slice, Reward = 100,000 + alpha * 100,000 * 0.02 + beta * 200,000 * 10.

Next, in the 18:05-18:10 time slice the contract ad is again played 100,000 times, and everything else is the same as in the 18:00-18:05 slice: the CTR, the number of requests lost, and the average eCPM of the winning performance ads are unchanged.

Because the booked volume of 1,000,000 was already reached at 18:05 (900,000 + 100,000), the playback term is now negative, so for the 18:05-18:10 time slice, Reward = -100,000 + alpha * 100,000 * 0.02 + beta * 200,000 * 10.
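The piecewise Reward and the worked example can be reproduced with a short sketch. The function and parameter names are illustrative; following the example, the contract-ad effect is taken as plays × CTR and the performance-ad term as lost requests × average winning eCPM, and the values of alpha and beta below are arbitrary.

```python
def slice_reward(
    played_before: int,
    plays: int,
    booked: int,
    ctr: float,
    lost_requests: int,
    avg_winner_ecpm: float,
    alpha: float,
    beta: float,
) -> float:
    """Per-slice Reward: the volume term flips sign once the booked volume is reached."""
    contract_effect = plays * ctr
    performance_term = lost_requests * avg_winner_ecpm
    # Assumption: "reached" is judged on the volume played before this slice starts.
    volume_term = plays if played_before < booked else -plays
    return volume_term + alpha * contract_effect + beta * performance_term


def final_reward(actual: int, booked: int) -> float:
    """Reward at the end of delivery: penalize any deviation from the booked volume."""
    return -abs(actual - booked)


if __name__ == "__main__":
    # 18:00-18:05: 900,000 played so far, booked 1,000,000.
    print(slice_reward(900_000, 100_000, 1_000_000, 0.02, 200_000, 10, alpha=0.5, beta=0.1))
    # 18:05-18:10: the booked volume has been reached.
    print(slice_reward(1_000_000, 100_000, 1_000_000, 0.02, 200_000, 10, alpha=0.5, beta=0.1))
    print(final_reward(1_100_000, 1_000_000))
```

The sign flip is what discourages the model from bidding aggressively after the guarantee has been met, while the final term penalizes under-delivery and over-delivery equally.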

Overall, the main idea of training is as follows. There are two models: one predicts the Action (the media information evaluation model described herein), denoted model A, and one predicts the Reward (the effect evaluation model, or critic model, described herein), denoted model B. First, model A predicts the Action for the current time slice from the State obtained for the previous time slice. The current State and this predicted Action are then fed to model B to obtain the predicted Reward. Then the current State and the real Action are fed to model B to obtain the real Reward.

Model B's optimization objective is to bring the predicted Reward ever closer to the real Reward. Model A's optimization objective is that, after it outputs an Action and the Action is fed to model B, the resulting predicted Reward should be as close as possible to the real Reward. As training proceeds, model B's predictions become more and more accurate, so model A's objective amounts to making the Reward obtained after it outputs an Action as close as possible to the real Reward.

The overall model training flow is as follows:

First, obtain the training sample set. The offline training flow of the game-theory-based RL mixed-ranking model is as follows:

Step 1: for each contract ad, obtain its information, including the booked volume (for example, the target playback volume of the third media information) and the booked period.

Step 2: divide the booked period into time slices (for example, at 5-minute granularity).

Step 3: for each time slice, perform the following steps:

Step a: collect the initial State of the current time slice.

Step b: input the collected State into the RL model to obtain the predicted Action. This predicted Action can correspond to the real playback evaluation value of the period following the initial period described herein.

Step c: input the predicted Action, together with the contract ad's mixed-ranking log for the current time slice, into the simulator to obtain the Reward, the end flag done, and the subsequent state, that is, the terminal State.

Step d: save <initial State, Action, Reward, terminal State> as one sample (that is, store each time slice's initial State, predicted Action, Reward, and terminal State as one sample); if the number of samples reaches a certain threshold, train the RL model (the training process is described below).

Step e: if the end flag done is true, return to Step 1.

Step f: if the end flag done is false, return to Step a.

Step 4: when all contract ads have been processed, end the flow.

当前时间片的初始State，即为当前时间片的初始状态，也就是上一个时间片终止时的状态。该时间片的终止State是该时间片结束时对应的state(即当前时间片的下一个时间片的初始State)。The initial state of the current time slice is the initial state of the current time slice, which is the state at the end of the previous time slice. The ending state of the time slice is the state corresponding to the end of the current time slice (i.e., the initial state of the next time slice).

需要说明的是，该当前时间片可对应于文中所描述的初始时段的下一时段。<初始State，Action，Reward，终止State>中的初始State可与文中描述的初始时段的样本目标数据相对应，Action可与文中描述的初始时段的下一时段的真实播放评估值相对应，Reward可与文中描述的初始时段的下一时段的真实评估效果表征值相对应，终止State可与文中描述的初始时段的第一时段的样本目标数据，其中，上述第一时段为下一时段的下一时段。It should be noted that the current time slice can correspond to the next time segment after the initial time segment described in the text. In <Initial State, Action, Reward, Terminating State>, the Initial State corresponds to the sample target data of the initial time segment described in the text; the Action corresponds to the actual playback evaluation value of the next time segment after the initial time segment described in the text; the Reward corresponds to the actual evaluation performance representation value of the next time segment after the initial time segment described in the text; and the Terminating State corresponds to the sample target data of the first time segment of the initial time segment described in the text, where the first time segment is the time segment after the next time segment.

Optionally, when storing samples, for each contract advertisement and each time slice, the State at the end of the previous time slice, the Action predicted for the current time slice, the Reward obtained from the simulator for the current time slice, and the State of the next time slice (i.e., the <Initial State, Action, Reward, Terminating State> described above) may be stored as one sample. All samples stored in this way are denoted as sample A, which is the set of initially stored samples.

需要说明的是，在对RL模型进行训练时，样本A中的数据均作为真实的数据。It should be noted that when training the RL model, all data in sample A are used as real data.

其中，上述步骤a-步骤c是基于合约广告的真实的历史数据所产生的多条样本的过程，目的是为了丰富样本，所产生的样本均作为真实的样本进行RL模型的训练，通过步骤a-步骤c可以产生该合约广告在每一个时间片的样本，当该合约广告播放完成、且产生的样本超过一定阈值时，即可通过产生的样本进行RL模型的训练。Steps a-c above are a process of generating multiple samples based on real historical data of the contract advertisement. The purpose is to enrich the sample. All generated samples are used as real samples for training the RL model. Through steps a-c, samples of the contract advertisement in each time slice can be generated. When the contract advertisement finishes playing and the generated samples exceed a certain threshold, the generated samples can be used to train the RL model.
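The sample-collection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: `Transition`, `SampleBuffer`, the threshold of 3, and the toy State are hypothetical, and `train_fn` is only a stand-in for the RL model training routine.

```python
from collections import namedtuple

# One stored sample: <initial State, Action, Reward, terminating State>.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class SampleBuffer:
    """Hypothetical buffer that triggers training once enough samples exist."""

    def __init__(self, threshold, train_fn):
        self.threshold = threshold  # assumed value; the text only says "a certain threshold"
        self.train_fn = train_fn    # stand-in for the RL model training routine
        self.samples = []

    def add(self, state, action, reward, next_state):
        """Store one time slice's sample; train when the threshold is reached."""
        self.samples.append(Transition(state, action, reward, next_state))
        if len(self.samples) >= self.threshold:
            self.train_fn(list(self.samples))
            return True
        return False


# Usage: the terminating State of one slice is the initial State of the next.
trained_on = []
buffer = SampleBuffer(threshold=3, train_fn=trained_on.append)
state = (0, 0.0)  # toy State: (time-slice index, fraction of booked volume delivered)
for t in range(3):
    next_state = (t + 1, state[1] + 0.1)
    buffer.add(state, action=1.0, reward=0.5, next_state=next_state)
    state = next_state
```

In this sketch training fires on the third stored sample; a real system would likely also decide whether to clear or subsample the buffer after training, which the text does not specify.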

Specifically, the RL model training process is as follows (taking the classic actor-critic configuration, written here as Action-critic, as an example):

步骤1:训练critic模型：Step 1: Train the critic model:

步骤a，获取样本<初始State，Action，Reward，终止State>；Step a, obtain the sample <Initial State, Action, Reward, Terminating State>;

步骤b，将终止State输入到Action模型中，获得Action_Next；Step b: Input the termination State into the Action model to obtain Action_Next;

步骤c，将<终止State，Action_Next>输入到critic模型中，获得Reward_Next；Step c: Input <Termination State, Action_Next> into the critic model to obtain Reward_Next;

步骤d，计算Final_Reward1＝Reward+lamda*Reward_Next；Step d: Calculate Final_Reward1 = Reward + lambda * Reward_Next;

Step e: Input <Initial State, Action> into the critic model to obtain Final_Reward2;

步骤f，训练critic模型，输入为<初始State，Action>，损失loss为Final_Reward1和Final_Reward2的差值；Step f: Train the critic model with input <initial State, Action> and loss = the difference between Final_Reward1 and Final_Reward2;

When training the critic model, lamda takes a value in the range 0 to 1 and is the decay weight of the long-term Reward (i.e., Reward_Next); its specific value is not limited here.
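Written compactly, the critic training steps amount to a one-step temporal-difference update. The notation below is an assumption introduced for illustration: $s$, $a$, $r$, $s'$ stand for the initial State, Action, Reward, and terminating State of a sample, $\pi$ for the Action model, $Q$ for the critic model, and $\lambda$ for lamda. The text only states that the loss is "the difference" between the two values; the squared form is one common choice and is not confirmed by the source.

$$
\begin{aligned}
\text{Action\_Next} &= \pi(s'), \\
\text{Reward\_Next} &= Q\bigl(s', \pi(s')\bigr), \\
\text{Final\_Reward1} &= r + \lambda\, Q\bigl(s', \pi(s')\bigr), \qquad 0 \le \lambda \le 1, \\
\text{Final\_Reward2} &= Q(s, a), \\
L_{\text{critic}} &= \bigl(\text{Final\_Reward1} - \text{Final\_Reward2}\bigr)^2 .
\end{aligned}
$$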

步骤2:训练Action模型：Step 2: Train the Action model:

步骤a：将<初始State>输入到Action模型中，得到Action_New；Step a: Input the <Initial State> into the Action model to obtain Action_New;

Step b: Input <Initial State, Action_New> into the critic model to obtain Final_Reward3;

步骤c：训练Action模型，输入为<初始State>，损失loss为-Final_Reward3。Step c: Train the Action model with the input <Initial State> and the loss -Final_Reward3.

The long-term Reward values (i.e., Reward_Next, Final_Reward2, and Final_Reward3) are obtained from the Rewards of all time slices.

需要说明的是，critic模型为用于预测Reward的模型(即效果评估模型)，Action模型为用于预测Action的模型(即媒体信息评估模型)。It should be noted that the critic model is used to predict rewards (i.e., the effect evaluation model), while the action model is used to predict actions (i.e., the media information evaluation model).

Here, the Initial State in <Initial State, Action_New> may correspond to the sample target data of the initial period described herein; Action_New may correspond to the predicted playback evaluation value of the period following the initial period; and Final_Reward3 may correspond to the first predicted evaluation effect representation value of the third media information. The sum of all -Final_Reward3 values is the first total training loss.

<终止State，Action_Next>中的Action_Next可与文中描述的第三媒体信息在第一时段的第二预测播放评估值相对应，Reward_Next可与文中描述的第三媒体信息对应的第二预测评估效果表征值相对应，Final_Reward1可与文中描述的第一评估效果表征值相对应，Final_Reward2可与文中描述的第二评估效果表征值相对应。Final_Reward1和Final_Reward2的差值即为第一评估效果表征值和第二评估效果表征值的差值。In the `<Termination State, Action_Next>` section, `Action_Next` corresponds to the second predicted playback evaluation value of the third media information described in the text during the first time period; `Reward_Next` corresponds to the second predicted evaluation effect representation value of the third media information described in the text; `Final_Reward1` corresponds to the first evaluation effect representation value described in the text; and `Final_Reward2` corresponds to the second evaluation effect representation value described in the text. The difference between `Final_Reward1` and `Final_Reward2` is the difference between the first and second evaluation effect representation values.

When training the critic model, the goal is to make the loss, i.e., the difference between Final_Reward1 and Final_Reward2, progressively smaller, so that the critic model predicts the Reward more and more accurately. When training the Action model, the goal is to make the loss -Final_Reward3 progressively smaller (i.e., to make Final_Reward3 as large as possible), so that the Actions predicted by the Action model become progressively better.
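The two training steps can be illustrated with a deliberately small sketch. Everything here is an assumption made for readability: State and Action are scalars, both models are linear, the critic loss is the squared difference, gradients are written by hand, and `LAMDA` and the learning rate are arbitrary values. A production system would more likely use neural networks and an automatic-differentiation library.

```python
# Minimal actor-critic sketch with scalar State and Action and linear models.
# All names, the linear forms, and the learning rates are illustrative assumptions.

LAMDA = 0.9  # decay weight of the long-term Reward, assumed value in [0, 1]


def actor(w, state):
    """Action model: predicts an Action (e.g. a bid) from a State."""
    return w[0] * state + w[1]


def critic(c, state, action):
    """Critic model: predicts the long-term Reward of taking `action` in `state`."""
    return c[0] * state + c[1] * action + c[2]


def critic_step(c, w, sample, lr=0.01):
    """One gradient step on (Final_Reward1 - Final_Reward2)^2, target held fixed."""
    state, action, reward, next_state = sample
    action_next = actor(w, next_state)                    # step b
    reward_next = critic(c, next_state, action_next)      # step c
    final_reward1 = reward + LAMDA * reward_next          # step d
    final_reward2 = critic(c, state, action)              # step e
    err = final_reward2 - final_reward1
    grads = (2 * err * state, 2 * err * action, 2 * err)  # step f
    new_c = tuple(ci - lr * g for ci, g in zip(c, grads))
    return new_c, final_reward1


def actor_step(c, w, state, lr=0.01):
    """One gradient step on the loss -Final_Reward3 = -critic(state, actor(state))."""
    # For the linear models above: d(-Q)/dw0 = -c[1] * state and d(-Q)/dw1 = -c[1].
    grads = (-c[1] * state, -c[1])
    return tuple(wi - lr * g for wi, g in zip(w, grads))


c, w = (0.0, 1.0, 0.0), (0.5, 0.0)
sample = (1.0, 0.5, 1.0, 2.0)  # <initial State, Action, Reward, terminating State>
new_c, target = critic_step(c, w, sample)
new_w = actor_step(c, w, state=1.0)
```

After one critic step the prediction for the sample moves toward Final_Reward1, and after one actor step the critic's value for the new Action is higher. With a linear critic Final_Reward3 is unbounded, so in practice the Action would typically be clipped to a valid bid range; that detail is not covered by the text.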

According to one aspect of the embodiments of the present invention, a media information processing method is provided. Optionally, the method may be applied, without limitation, in the application environment shown in Figure 2, for example in a media information processing system that may include, without limitation, a user terminal 101, a network 102, and a server 103. The user terminal 101 communicates with the server 103 through the network 102 and can send Internet requests to the server 103 over the network. An Internet application runs on the user terminal 101 and provides Internet services to the user, such as watching videos, browsing web pages, searching for information, and playing games; within the application, the user can view inserted media information such as advertisements. The Internet application may be a web application, an application program (APP), or the like. The user terminal 101 includes a human-computer interaction screen 1011, a processor 1012, and a memory 1013. The human-computer interaction screen 1011 is used by the user to initiate Internet requests and to view the media information to be played. The processor 1012 handles operations related to the user's Internet request. The memory 1013 stores data related to the Internet request. The server 103 includes a database 1031 and a processing engine 1032.

As shown in Figure 2, the media information processing method of this application determines the media information to be played through the following steps S1-S6:

S1，用户通过用户终端101中的互联网应用发起互联网请求，并通过网络102向服务器103发送互联网请求。S1, the user initiates an Internet request through the Internet application in the user terminal 101 and sends the Internet request to the server 103 through the network 102.

S2，服务器103接收到互联网请求后，响应该互联网请求，如向用户终端101返回该互联网请求对应的视频等。同时，服务器103中的处理引擎1032获取候选媒体信息集合中的各第一媒体信息在上一时段的目标数据，上述目标数据包括流量预测信息和播放相关数据中的至少一项，上述播放相关数据包括上一时段的历史播放量、媒体信息的播放竞争信息或目标播放量中的至少一项，上述候选媒体信息集合包括至少一个第一类型的第一媒体信息和至少一个第二类型的第二媒体信息。其中，服务器103中的数据库1031用户存储目标数据。S2, after receiving an internet request, server 103 responds to the internet request, such as returning the video corresponding to the internet request to user terminal 101. Simultaneously, the processing engine 1032 in server 103 obtains the target data of each first media information in the candidate media information set for the previous time period. The target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of historical playback volume in the previous time period, playback competition information of the media information, or target playback volume. The candidate media information set includes at least one first type of first media information and at least one second type of second media information. The database 1031 in server 103 stores the target data.

S3，对于每一上述第一媒体信息，服务器103中的处理引擎1032基于上述第一媒体信息在上一时段的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值。S3, for each of the aforementioned first media information, the processing engine 1032 in the server 103 determines the first playback evaluation value of the aforementioned first media information in the current time period based on the target data of the aforementioned first media information in the previous time period.

S4，服务器103的处理引擎1032获取各上述第二媒体信息在当前时段的第二播放评估值。S4, the processing engine 1032 of server 103 obtains the second playback evaluation value of each of the above-mentioned second media information in the current time period.

S5，服务器103中的处理引擎1032根据各上述第一媒体信息对应的第一播放评估值以及各上述第二媒体信息的对应的第二播放评估值，从上述候选媒体信息集合中确定出当前时段的待播放媒体信息。并将该待播放媒体信息通过网络102发送给用户终端101。S5, the processing engine 1032 in server 103 determines the media information to be played in the current time period from the candidate media information set based on the first playback evaluation value corresponding to each of the first media information and the second playback evaluation value corresponding to each of the second media information. The media information to be played is then sent to user terminal 101 via network 102.

S6，用户终端101在接收到待播放媒体信息时，在用户终端101的人机交互屏幕1011上播放该待播放媒体信息。S6, when the user terminal 101 receives the media information to be played, it plays the media information to be played on the human-computer interaction screen 1011 of the user terminal 101.

可理解，上述仅为一种示例，本实施例在此不作限定。It is understood that the above is only one example, and this embodiment is not limited here.

其中，服务器可以是独立的物理服务器，也可以是多个物理服务器构成的服务器集群或者分布式系统，还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network，内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器或服务器集群。上述网络可以包括但不限于：有线网络，无线网络，其中，该有线网络包括：局域网、城域网和广域网，该无线网络包括：蓝牙、Wi-Fi及其他实现无线通信的网络。用户终端可以是智能手机(如Android手机、iOS手机等)、平板电脑、笔记本电脑、数字广播接收器、MID(Mobile InternetDevices，移动互联网设备)、PDA(个人数字助理)、台式计算机、车载终端(例如车载导航终端)、智能音箱、智能手表等，用户终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接，但并不局限于此。具体也可基于实际应用场景需求确定，在此不作限定。The server can be a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server or server cluster providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The network can include, but is not limited to, wired networks and wireless networks. Wired networks include local area networks (LANs), metropolitan area networks (MANs), and wide area networks (WANs). Wireless networks include Bluetooth, Wi-Fi, and other networks that enable wireless communication. User terminals can be smartphones (such as Android phones, iOS phones, etc.), tablets, laptops, digital radio receivers, MIDs (Mobile Internet Devices), PDAs (Personal Digital Assistants), desktop computers, in-vehicle terminals (such as in-vehicle navigation terminals), smart speakers, smartwatches, etc. User terminals and servers can be directly or indirectly connected via wired or wireless communication, but are not limited to these methods. Specific details can be determined based on the actual application scenario requirements and are not limited here.

参见图3，图3是本申请实施例提供的一种媒体信息处理方法的流程示意图，该方法可以由任一电子设备执行，如可以是服务器或者用户终端，也可以是用户终端和服务器交互完成，可选的，可以由服务器执行，如图3所示本申请实施例提供的媒体信息处理方法包括如下步骤：Referring to Figure 3, which is a flowchart illustrating a media information processing method provided in an embodiment of this application, the method can be executed by any electronic device, such as a server or a user terminal, or it can be completed through interaction between the user terminal and the server. Optionally, it can be executed by the server. As shown in Figure 3, the media information processing method provided in this embodiment of the application includes the following steps:

S301、获取候选媒体信息集合中的各第一媒体信息在上一时段的目标数据，上述目标数据包括流量预测信息和播放相关数据中的至少一项，上述播放相关数据包括上一时段的历史播放量、媒体信息的播放竞争信息或目标播放量中的至少一项，上述候选媒体信息集合包括至少一个第一类型的第一媒体信息和至少一个第二类型的第二媒体信息求。S301. Obtain the target data of each first media information in the candidate media information set in the previous time period. The target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of the historical playback volume in the previous time period, playback competition information of media information, or target playback volume. The candidate media information set includes at least one first type of first media information and at least one second type of second media information.

S302，对于每一上述第一媒体信息，基于上述第一媒体信息在上一时段的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值。S302, for each of the aforementioned first media information, based on the target data of the aforementioned first media information in the previous time period, determine the first playback evaluation value of the aforementioned first media information in the current time period.

S303，获取各上述第二媒体信息在当前时段的第二播放评估值，其中，对于上述第一媒体信息或第二媒体信息对应的播放评估值，播放评估值表征了媒体信息的被推广概率；S303, obtain the second playback evaluation value of each of the above-mentioned second media information in the current time period, wherein, for the playback evaluation value corresponding to the above-mentioned first media information or second media information, the playback evaluation value represents the probability of the media information being promoted;

S304，根据各上述第一媒体信息对应的第一播放评估值以及各上述第二媒体信息的对应的第二播放评估值，从上述候选媒体信息集合中确定出当前时段的待播放媒体信息。S304, based on the first playback evaluation value corresponding to each of the aforementioned first media information and the second playback evaluation value corresponding to each of the aforementioned second media information, determine the media information to be played in the current time period from the aforementioned candidate media information set.

可选的，候选媒体信息集合中包括至少一个第一类型的第一媒体信息(如合约广告)，以及至少一个第二类型的第二媒体信息(如效果广告)。Optionally, the candidate media information set includes at least one first type of first media information (such as contract advertising) and at least one second type of second media information (such as performance advertising).

对于每个第一媒体信息对应的目标数据，该目标数据包括流量预测信息和播放相关数据中的至少一项，播放相关数据包括上一时段的历史播放量、媒体信息播放竞争信息或目标播放量中的至少一项。For each target data corresponding to the first media information, the target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of the historical playback volume of the previous time period, media information playback competition information, or target playback volume.

The traffic prediction information is the traffic predicted for the first media information in the current period and in each period after it. It may be region-specific, for example the predicted traffic from users in first-tier cities; this is not limited here.

上述历史播放量可以为上一时段该第一媒体信息的历史播放量，如该第一媒体信息在上一时段播放了10w(10万)次。目标播放量即该第一媒体信息预先设定好的播放量，如100w。The historical play count mentioned above can refer to the historical play count of the first media information in the previous time period, such as the first media information being played 100,000 times in the previous time period. The target play count is the pre-set play count of the first media information, such as 1 million.

在一种可选的实施例中，上述播放竞争信息包括以下至少一项：In one optional embodiment, the aforementioned playback competition information includes at least one of the following:

The historical click-through rate of each first media information in the previous period;

The historical click-through rate of each second media information in the previous period;

The conversion rate of each first media information in the previous period;

The conversion rate of each second media information in the previous period;

The exposure rate of each first media information in the previous period;

The exposure rate of each second media information in the previous period;

The playback evaluation value of each first media information in the previous period;

The playback evaluation value of each second media information in the previous period.

Optionally, the exposure rate characterizes the win rate of the first media information and is determined by its exposure in the previous period. The playback evaluation value is the bid of the first media information in the previous period; the bid can also be understood as a score assigned to the contract advertisement, i.e., its quality score.

基于获取到的目标数据，可以确定出当前时段段内该第一媒体信息的第一播放评估值。然后，获取当前时间段内参与此次混合排序竞争的各第二媒体信息的第二播放评估值，按照第一播放评估值和第二播放评估值，从候选媒体集合中确定出要当前时间段要播放的待播放媒体信息，并播放。Based on the acquired target data, the first playback evaluation value of the first media information within the current time period can be determined. Then, the second playback evaluation values of each second media information participating in this mixed ranking competition within the current time period are acquired. According to the first and second playback evaluation values, the media information to be played in the current time period is determined from the candidate media set and played.
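The selection step can be sketched as a joint ranking of both media types. This is a simplified illustration that assumes the playback evaluation value is the only ranking factor; the function name, the ids, and the numeric values are hypothetical.

```python
def select_media_to_play(first_scores, second_scores, slots=1):
    """Rank first-type and second-type media together by playback evaluation value.

    first_scores / second_scores map a media id to its evaluation value (bid)
    for the current time period. Returns the ids of the top `slots` candidates.
    """
    candidates = [(score, media_id) for media_id, score in first_scores.items()]
    candidates += [(score, media_id) for media_id, score in second_scores.items()]
    # Highest evaluation value first; ties broken by id for a deterministic order.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [media_id for _, media_id in candidates[:slots]]


contract_bids = {"contract_1": 3.2, "contract_2": 1.5}  # first playback evaluation values
performance_bids = {"performance_1": 2.8}               # second playback evaluation values
winners = select_media_to_play(contract_bids, performance_bids, slots=2)
```

Here `winners` contains `contract_1` followed by `performance_1`, since their values are the two highest in the combined candidate set.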

举例来说，以在某热播剧的开头的广告位上播放合约广告为例进行说明，需要在该广告位上对10个合约广告在一天(即00:00-24:00)的时间内完成这10个合约广告各自对应的预订播放量，例如，合约广告1的预订播放量为100w、合约广告1的预订播放量为200w……合约广告10的预订播放量为150w。For example, let's take playing contract ads in the ad slot at the beginning of a popular TV series as an example. The goal is to achieve the reserved number of views for each of the 10 contract ads in that ad slot within one day (i.e., 00:00-24:00). For example, the reserved number of views for contract ad 1 is 1 million, the reserved number of views for contract ad 2 is 2 million, and so on, and the reserved number of views for contract ad 10 is 1.5 million.

以合约广告1为例进行说明，具体步骤如下：Taking Contract Ad 1 as an example, the specific steps are as follows:

步骤1，将这一天的投放周期划分为细粒度的时间片，如5min。Step 1: Divide the day's delivery cycle into fine-grained time slices, such as 5 minutes.

步骤2，在晚上的时间片内，观看该热播剧的人数较少，对应的流量也较少，可以设置出价较低，或者为0。Step 2: During the evening time slot, fewer people watch the popular drama, resulting in less traffic. Therefore, you can set a lower bid or even 0.

步骤3，在白天的时间片内，如9:00-9:05这个时间片开始，获取上一个时间片该合约广告1的播放量(如5w)(即上一时间段的历史播放量)、混排阶段的竞争情况(即除了合约广告1的其他9个合约广告的竞争情况、参与该时间片的广告播放竞争的其他效果广告的竞争情况)(即上述媒体信息播放竞争信息)、以及该合约广告的对应的流量预测信息(如12:30-13:00流量较多，20:00-22:30的流量较多等)。假设由这些信息得出，该合约广告1在9:00-9:05胜率较高，库存的播放量还剩85w，在9:00-9:05这个时间片之后还有较大的流量产生，那么，在9:00-9:05时间片可以仍然保持8:55-9:00这个时间片的出价进行该9:00-9:05时间片的出价，并且，引入能够产生较大收益的至少一个效果广告。或者，相对于8:55-9:00这个时间片，可以提高9:00-9:05这个时间片的出价，并且，引入能够产生较大收益的1个或多个效果广告。Step 3: During the daytime time slot, such as 9:00-9:05, obtain the play count of the contract ad 1 in the previous time slot (e.g., 50,000) (i.e., the historical play count of the previous time slot), the competition situation in the mixed playback stage (i.e., the competition situation of the other 9 contract ads besides contract ad 1, and the competition situation of other performance ads participating in the ad playback competition in this time slot) (i.e., the above media information playback competition information), and the corresponding traffic prediction information of the contract ad (e.g., more traffic from 12:30-13:00, more traffic from 20:00-22:30, etc.). Assuming that based on this information, Ad 1 has a high win rate between 9:00 and 9:05, has 850,000 remaining plays, and will generate significant traffic after the 9:00-9:05 time slot, then the bid for the 9:00-9:05 time slot can remain the same as the bid for the 8:55-9:00 time slot, and at least one performance ad that can generate substantial revenue should be introduced. Alternatively, compared to the 8:55-9:00 time slot, the bid for the 9:00-9:05 time slot can be increased, and one or more performance ads that can generate substantial revenue should be introduced.

当确定好该合约广告1的出价之后，该合约广告1会在该9:00-9:05时间片内与其他9个合约广告以及其他1个或多个效果广告进行竞争，在竞争时，除了考虑出价因素，还会考虑参与竞争的广告的时长、广告的类型、广告位最大可播广告个数/时长限制、同广告主过滤逻辑(即不能在同一个时间段的同一个广告位播放同一个广告主的广告)等因素，对竞争成功的广告进行排序，并对竞争成功的广告按照排序之后的顺序在上述热播剧开头的广告位进行播放。Once the bid for Contract Ad 1 is determined, Contract Ad 1 will compete with 9 other Contract Ads and 1 or more other performance Ads during the 9:00-9:05 time slot. During the competition, in addition to the bid, factors such as the duration of the competing ads, the type of ads, the maximum number/duration limit of ads that can be played in the ad slot, and the same advertiser filtering logic (i.e., ads from the same advertiser cannot be played in the same ad slot during the same time slot) will be considered. The successful bidders will be ranked and played in the ad slot at the beginning of the aforementioned popular drama in the order of their ranking.

步骤4，按照上述方式，对每个时间片的广告进行出价、排序、播放，直至达到预定播放量或者达到这一天的投放周期。Step 4: Following the above method, bid, sort, and play the ads for each time slot until the predetermined number of plays is reached or the campaign period for that day is completed.
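The competition within one time slice, including the slot limits and same-advertiser filtering mentioned above, can be sketched as a greedy fill. The greedy strategy, the `Ad` fields, and all example values are assumptions for illustration; the text does not specify how these factors are combined.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    advertiser: str
    bid: float     # playback evaluation value for this time slice
    duration: int  # seconds


def fill_ad_slot(ads, max_ads, max_seconds):
    """Greedily pick ads by descending bid, subject to the slot's limits.

    Skips an ad if its advertiser is already in the slot (same-advertiser
    filtering) or if it would exceed the total-duration limit; stops once
    the maximum number of ads is reached.
    """
    chosen, used_seconds, advertisers = [], 0, set()
    for ad in sorted(ads, key=lambda a: a.bid, reverse=True):
        if len(chosen) >= max_ads:
            break
        if ad.advertiser in advertisers:
            continue
        if used_seconds + ad.duration > max_seconds:
            continue
        chosen.append(ad)
        used_seconds += ad.duration
        advertisers.add(ad.advertiser)
    return chosen


ads = [
    Ad("contract_1", "brand_a", bid=5.0, duration=30),
    Ad("performance_2", "brand_a", bid=4.0, duration=15),  # same advertiser as contract_1
    Ad("contract_3", "brand_b", bid=3.0, duration=60),
    Ad("performance_5", "brand_c", bid=2.0, duration=45),  # would exceed the 120 s limit
    Ad("contract_4", "brand_d", bid=1.0, duration=30),
]
playlist = [ad.ad_id for ad in fill_ad_slot(ads, max_ads=4, max_seconds=120)]
```

With these inputs the slot is filled with `contract_1`, `contract_3`, and `contract_4`, using exactly 120 seconds; the other two ads are dropped by the advertiser filter and the duration limit respectively.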

As shown in Figure 4a, suppose that the media information to be played, determined as described above, consists of the winning contract ads 1, 3, 4, and 5 and the winning performance ads 2 and 5. These are played in rotation in the ad slot at the beginning of the popular drama. In the target display interface shown in Figure 4a, the total duration of contract ads 1, 3, 4, and 5 and performance ads 2 and 5 is 120 seconds. The interface is currently playing performance ad 5 (the "Car A, your best choice!" shown in the figure); after performance ad 5 finishes, contract ads 1, 3, 4, and 5 and performance ad 2 can be played.

As shown in Figure 4b, suppose that the media information to be played, determined as described above, consists of the winning contract ads 1, 3, 4, and 5 and the winning performance ads 2 and 5. These are played in rotation in the ad slot at the beginning of the popular drama. In the target display interface shown in Figure 4b, contract ad 1, contract ad 4, and performance ad 2 are currently being played; after they finish, contract ad 3, contract ad 5, and performance ad 5 can be played.

In other words, one or more advertisements can be played simultaneously in the target playback interface; this is not limited here. The examples in Figures 4a and 4b use an ad slot within a video for illustration; in practical applications, the specific form of the ad slot is not limited.

在本申请实施例中，获取候选媒体信息集合中的各第一媒体信息在上一时段的目标数据，该目标数据包括流量预测信息和播放相关数据中的至少一项，播放相关数据包括上一时段的历史播放量、媒体信息的竞争信息或目标播放量中的至少一项，对于每一第一媒体信息来说，可以基于获取到的在上一时段的目标数据，确定出该第一媒体信息在当前时段的第一播放评估值，然后结合获取到的各第二媒体信息在当前时段的第二播放评估值，可以对候选媒体信息集合中的各第一媒体信息和各第二媒体信息进行排序，确定当前时段的待播放媒体信息。采用上述方式，可以依据目标数据确定各第一媒体信息的第一播放评估值，并依据各第一媒体信息的第一播放评估值和第二播放评估值来确定要在当前时段播放的待播放媒体信息，能够有效利用目标数据确定第一媒体信息的播放评估值，并利用播放评估值确定待播放的媒体信息，提高了展示待播放媒体信息的合理性。In this embodiment, target data for each first media information in the candidate media information set in the previous time period is obtained. This target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of historical playback volume in the previous time period, media information competition information, or target playback volume. For each first media information, a first playback evaluation value for the current time period can be determined based on the obtained target data from the previous time period. Then, combined with the obtained second playback evaluation values for each second media information in the current time period, the first and second media information in the candidate media information set can be sorted to determine the media information to be played in the current time period. Using this method, the first playback evaluation value for each first media information can be determined based on the target data, and the media information to be played in the current time period can be determined based on the first and second playback evaluation values. This effectively utilizes target data to determine the playback evaluation value of the first media information and uses the playback evaluation value to determine the media information to be played, improving the rationality of displaying the media information to be played.

在一种可选的实施例中，上述播放竞争信息包括上一时段的播放评估值，对于任一上述第一媒体信息，上述基于上述第一媒体信息的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值，包括：In an optional embodiment, the playback competition information includes the playback evaluation value of the previous time period. For any of the first media information, determining the first playback evaluation value of the first media information in the current time period based on the target data of the first media information includes:

根据上述目标数据，对上述第一媒体信息上一时段的播放评估值进行调整，得到上述第一媒体信息在当前时段的第一播放评估值。Based on the aforementioned target data, the playback evaluation value of the aforementioned first media information in the previous time period is adjusted to obtain the first playback evaluation value of the aforementioned first media information in the current time period.

可选的，在获取到目标数据后，可以根据目标数据对第一媒体信息在上一时段的历史播放评估值进行调整，得到该第一媒体信息在当前时段的第一播放评估值。Optionally, after obtaining the target data, the historical playback evaluation value of the first media information in the previous time period can be adjusted based on the target data to obtain the first playback evaluation value of the first media information in the current time period.

Specifically, the first playback evaluation value of the first media information in the current period can be obtained by jointly considering: the bid, exposure rate, click-through rate, and conversion rate of the first media information in the previous period; the playback evaluation values of the second media information; the playback evaluation values of the other first media information; the traffic prediction information of the first media information; the play volume of the first media information that has not yet been delivered; the eCPM distribution of the second media information on the corresponding traffic; and the distribution of the number of competing first media information.

Taking advertising as an example, the following factors can be considered together. Competitive-environment factors in the mixed-ranking stage: the number of contract ads participating in mixed ranking, the bid distribution of performance ads, the bid distribution of other contract ads, win rate, CTR/CVR distribution, etc. Traffic-prediction factors: the contract ad's play volume that has not yet been delivered, the eCPM distribution of performance ads on the corresponding traffic, the distribution of the number of competing contract ads, etc. Delivery-status factors: the booked volume of the contract ad, the number of impressions already delivered, etc.

For example, if the target play volume of the first media information has not yet been reached in the current period and the first media information performed well in the previous period (for instance, its click-through rate and/or conversion rate was relatively high and met the set requirement), the historical playback evaluation value of the previous period can be kept unchanged, or lowered slightly, to obtain the first playback evaluation value for the current period.

If the target play volume has not yet been reached and the first media information performed poorly in the previous period (for instance, its click-through rate and/or conversion rate was relatively low and did not meet the set requirement), the historical playback evaluation value of the previous period can be raised to obtain the first playback evaluation value for the current period.

If the target play volume has not yet been reached, the first media information performed well in the previous period, and the traffic prediction information indicates that substantial traffic will arrive after the current period, the historical playback evaluation value of the previous period can be kept unchanged, or lowered slightly, to obtain the first playback evaluation value for the current period.

If the target play volume has not yet been reached, the first media information performed poorly in the previous period, and the traffic prediction information indicates that no substantial traffic will arrive after the current period, the historical playback evaluation value of the previous period can be raised to obtain the first playback evaluation value for the current period.

The playback evaluation values of media information other than the first media information can also be taken into account. If those values are relatively high (that is, the competing bids are high), competition is strong. In that case, if only a small part of the first media information's playback volume remains to be delivered, the playback evaluation value for the current time period can be lowered relative to the historical playback evaluation value of the previous time period; if a large part still remains to be delivered, the playback evaluation value for the current time period can be raised relative to the historical playback evaluation value of the previous time period.

可理解，上述仅为几种可能的示例，在实际应用中，会综合考虑各种会对播放量以及ecpm值产生影响的各种因素，来调整当前时段的播放评估值(即出价)，本实施例在此不作任何限定。It is understood that the above are only a few possible examples. In practical applications, various factors that may affect the number of plays and the eCPM value will be comprehensively considered to adjust the play evaluation value (i.e., bid) for the current time period. This embodiment does not impose any limitations here.
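The example rules above can be sketched in code. This is only an illustrative sketch, not the method of the application: the `PeriodStats` fields, the function name `adjust_bid`, and the relative step size `step` are all hypothetical, and the source gives no concrete adjustment amounts. Cases that the examples do not cover simply leave the bid unchanged here.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical summary of one first-type media item after the previous period."""
    target_reached: bool       # has the target playback volume been met?
    performed_well: bool       # did CTR and/or conversion rate meet the set requirement?
    heavy_traffic_ahead: bool  # does the traffic forecast predict large traffic later?


def adjust_bid(prev_bid: float, stats: PeriodStats, step: float = 0.1) -> float:
    """Return the playback evaluation value (bid) for the current period.

    `step` is an assumed relative adjustment; the source does not specify one.
    """
    if stats.target_reached:
        # Not covered by the examples in the text; left unchanged in this sketch.
        return prev_bid
    if not stats.performed_well:
        # Target unmet and poor performance: raise the historical value.
        return prev_bid * (1 + step)
    if stats.heavy_traffic_ahead:
        # Good performance and more traffic expected: hold or lower slightly.
        return prev_bid * (1 - step / 2)
    # Good performance but no large traffic expected: not covered; unchanged here.
    return prev_bid
```

In practice, as the text notes, many more factors would be weighed together, which is why the application later delegates this decision to a learned model rather than fixed rules.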

在一种可能的实施例中，上述方法还包括：In one possible embodiment, the above method further includes:

可选的，在实际应用中，在确定待播放媒体信息时，除了考虑播放评估值(即出价)这个因素，还需要考虑媒体信息的被播放概率。也就是说，通过综合考虑各第一媒体信息的第一播放评估值、被播放概率以及各第二媒体信息的第二播放评估值，从各候选媒体信息中确定出当前时段对应的待播放媒体信息。Optionally, in practical applications, when determining the media information to be played, in addition to considering the playback evaluation value (i.e., the bid), it is also necessary to consider the probability of the media information being played. That is, by comprehensively considering the first playback evaluation value and the probability of being played for each first media information, as well as the second playback evaluation value for each second media information, the media information to be played for the current time period is determined from the candidate media information.

在一种可选的实施例中，上述确定各上述第一媒体信息在当前时段的被播放概率，包括：In an optional embodiment, determining the probability of each of the first media information being played in the current time period includes:

Optionally, the display evaluation data may be any relevant information that can affect the probability of the first media information being played, and the probability of the first media information being played in the current time period can be determined from the display evaluation data.

Optionally, the profile data of the target user corresponding to the current terminal device can be understood as the user's age, gender, education level, interests, and so on. The device-related information of the current terminal device can be understood as the network conditions of the current terminal device and the type of user device, such as a fifth-generation mobile networks (5G) phone, a fourth-generation mobile communication technology (4G) phone, a third-generation (3G) phone, or a second-generation (2G) phone, together with the product brand of each type of phone. The display evaluation data includes the current time information corresponding to the terminal device (for example, 3 o'clock at night or 10 o'clock in the morning) and the current location information corresponding to the terminal device (for example, Beijing, Shanghai, or Qinghai-Tibet). The behavior statistics of the target user on the current terminal device can be understood as statistical features of the user's behavior (for example, the number of videos watched that day, the number of advertisements viewed, and the time spent using the terminal device).

The probability of the first media information being played can be determined from the display evaluation information.

通过本申请实施例，可以综合考虑播放评估值和被播放概率的因素，来确定待播放媒体信息，提高了确定待播放媒体信息的准确性。The embodiments of this application can comprehensively consider factors such as playback evaluation value and probability of being played to determine the media information to be played, thereby improving the accuracy of determining the media information to be played.
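One way to picture this combination is the sketch below. It is an assumption-laden illustration: the source does not state how the playback evaluation value and the probability of being played are combined, so multiplying them for first-type items is a choice made here, and `Candidate`, `pick_to_play`, and all field names are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    media_id: str
    kind: str               # "first" (e.g. contract ad) or "second" (e.g. auction ad)
    bid: float              # playback evaluation value
    play_prob: float = 1.0  # probability of being played; only used for first-type items


def pick_to_play(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the highest score.

    Assumption: a first-type item is scored as bid * play_prob, while a
    second-type item is scored by its bid alone.
    """
    def score(c: Candidate) -> float:
        return c.bid * c.play_prob if c.kind == "first" else c.bid

    return max(candidates, key=score)
```

For example, a first-type item with bid 12.0 and play probability 0.5 scores 6.0 under this assumption and loses to a second-type item with bid 8.0.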

在一种可选的实施例中，上述对于每一上述第一媒体信息，基于上述第一媒体信息在上一时段的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值，是通过媒体信息评估模型实现的，上述媒体信息评估模型是通过以下方式训练得到的：In an optional embodiment, the determination of the first playback evaluation value of each of the first media information in the current time period based on the target data of the first media information in the previous time period is achieved through a media information evaluation model, which is trained in the following manner:

可选的，上述预测播放评估值是通过初始信息评估模型得到的，该预测播放评估值表征了上述第三媒体信息的被推广概率。Optionally, the predicted playback evaluation value is obtained through an initial information evaluation model, and the predicted playback evaluation value represents the probability of the aforementioned third media information being promoted.

确定第一媒体信息的第一播放评估值是通过媒体信息评估模型实现的，该媒体信息评估模型是通过以下方式训练得到的：The first playback evaluation value of the first media information is determined through a media information evaluation model, which is trained in the following way:

以广告为例，通过真实的历史数据，如前一个月的精排日志(即广告的相关历史播放数据的记录)，作为训练样本集。或者，还可以通过前文中的描述，将按照样本A的形式存储的样本集作为训练样本集，在此不作任何限定。Taking advertising as an example, real historical data, such as the previous month's detailed scheduling logs (i.e., records of the relevant historical playback data of the advertisement), can be used as the training sample set. Alternatively, as described above, the sample set stored in the form of sample A can also be used as the training sample set; no restrictions are imposed here.

对于该媒体信息评估模型的具体训练过程可参考前文中训练Action模型的具体过程，在此不再赘述。For the specific training process of this media information evaluation model, please refer to the specific process of training the Action model mentioned above, which will not be repeated here.

In this embodiment of the application, reinforcement learning is used: a bid (Action) is determined from the state information (State), and a reward mechanism (Reward) then judges whether that bid is reasonable, so that the model keeps learning until it can produce reasonable bids, at which point training is complete. The media information to be played is then determined through the model's automatic bidding. This effectively addresses the problem, when contract ads and performance ads are ranked together, of guaranteeing the delivery volume of contract ads while maximizing the eCPM of performance ads, which optimizes the mixed-ranking system and improves the rationality of the mixed ranking.

In an optional embodiment, each of the above training samples further includes the real playback evaluation value of the third media information in the time period following the initial time period, the real evaluation effect representation value for that following time period, and the sample target data of a first time period defined relative to the initial time period, where the first time period is the time period immediately after that following time period;

其中，对于每个上述第三媒体信息，基于上述第三媒体信息在初始时段的样本目标数据和预测播放评估值，确定上述第三媒体信息对应的第一预测评估效果表征值，是通过效果评估模型实现的，上述效果评估模型是通过以下方式训练得到的：Specifically, for each of the aforementioned third media information, the first predicted evaluation effect representation value corresponding to the aforementioned third media information is determined based on the sample target data and predicted playback evaluation value of the aforementioned third media information in the initial time period. This is achieved through an effect evaluation model, which is trained in the following manner:

Optionally, for the specific training process of the effect evaluation model, refer to the process of training the critic model described above; it is not repeated here.
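As a rough illustration of how a critic-style effect evaluation model is commonly trained, the sketch below computes a one-step temporal-difference (TD) target and error. This is a generic actor-critic convention and not a formula taken from the application: the discount factor `gamma`, the function names, and the mapping of the arguments onto the sample fields (the real evaluation effect representation value as `reward`, the critic's estimate for the following state as `next_q`, the first predicted evaluation effect representation value as `q_pred`) are all assumptions.

```python
def td_target(reward: float, next_q: float, gamma: float = 0.9) -> float:
    """One-step TD target: observed reward plus discounted next-state estimate.

    `gamma` is an assumed discount factor; the source does not specify one.
    """
    return reward + gamma * next_q


def td_error(q_pred: float, reward: float, next_q: float, gamma: float = 0.9) -> float:
    """Difference between the TD target and the critic's current prediction.

    A critic would typically be updated to shrink this error, for example by
    minimizing its square over a batch of training samples.
    """
    return td_target(reward, next_q, gamma) - q_pred
```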

在一种可选的实施例中，每个上述训练样本还包括上述第三媒体信息所对应的至少一个第二类型的第四媒体信息在上述初始时段的下一时段的播放效果评估参量，对于任一上述第三媒体信息，上述真实评估效果表征值是通过以下方式得到的：In an optional embodiment, each of the above-mentioned training samples further includes a playback effect evaluation parameter for at least one second-type fourth media information corresponding to the above-mentioned third media information in the next time period after the above-mentioned initial time period. For any of the above-mentioned third media information, the above-mentioned true evaluation effect representation value is obtained in the following manner:

可选的，对于上述真实评估效果表征值的具体计算方式，真实评估效果表征值的计算方式可参考文中奖赏Reward的计算方式，在此不再赘述。Optionally, for the specific calculation method of the above-mentioned true evaluation effect characterization value, please refer to the calculation method of the reward in the text, which will not be repeated here.

Each electronic device involved in the embodiments of this application (including but not limited to user terminals and terminal devices) may be a node of a blockchain system, the servers involved may be blockchain servers, and the data that needs to be stored in the embodiments of this application may also be stored in blockchain nodes. The media information processing system involved in the embodiments of this application may be a distributed system formed by a client and multiple blockchain nodes (computing devices of any form in the access network, such as servers and user terminals) connected through network communication.

Taking the case where the distributed system is a blockchain system as an example, refer to Figure 5, which is a schematic structural diagram of an optional distributed system 100 applied to a blockchain system according to an embodiment of the present invention. The system is formed by multiple nodes 200 (computing devices of any form in the access network, such as servers and user terminals) and a client 300. The nodes form a peer-to-peer (P2P) network, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, can join and become a node; a node includes a hardware layer, an intermediate layer, an operating system layer, and an application layer.

参见图5示出的区块链系统中各节点的功能，涉及的功能包括：Refer to Figure 5 for the functions of each node in the blockchain system, which include:

1)路由，节点具有的基本功能，用于支持节点之间的通信。1) Routing: A basic function of nodes used to support communication between nodes.

节点除具有路由功能外，还可以具有以下功能：In addition to routing capabilities, nodes can also have the following functions:

2)应用，用于部署在区块链中，根据实际业务需求而实现特定业务，记录实现功能相关的数据形成记录数据，在记录数据中携带数字签名以表示任务数据的来源，将记录数据发送到区块链系统中的其他节点，供其他节点在验证记录数据来源以及完整性成功时，将记录数据添加到临时区块中。2) Applications are deployed in the blockchain to implement specific business needs. They record data related to the implementation of functions to form record data, carry digital signatures in the record data to indicate the source of the task data, and send the record data to other nodes in the blockchain system. When other nodes successfully verify the source and integrity of the record data, they add the record data to a temporary block.

例如，应用实现的业务包括：For example, the business logic implemented by the application includes:

2.1) Wallet: provides functions for electronic currency transactions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, they store the transaction record data in a temporary block of the blockchain as a response acknowledging that the transaction is valid). The wallet also supports querying the electronic currency remaining at an electronic currency address;

2.2)共享账本，用于提供账目数据的存储、查询和修改等操作的功能，将对账目数据的操作的记录数据发送到区块链系统中的其他节点，其他节点验证有效后，作为承认账目数据有效的响应，将记录数据存入临时区块中，还可以向发起操作的节点发送确认。2.2) Shared ledger, used to provide functions such as storage, query and modification of ledger data. It sends the record data of the operation on the ledger data to other nodes in the blockchain system. After the other nodes verify the validity, as a response to acknowledge the validity of the ledger data, they store the record data in a temporary block. They can also send confirmation to the node that initiated the operation.

2.3)智能合约，计算机化的协议，可以执行某个合约的条款，通过部署在共享账本上的用于在满足一定条件时而执行的代码实现，根据实际的业务需求代码用于完成自动化的交易，例如查询买家所购买商品的物流状态，在买家签收货物后将买家的电子货币转移到商户的地址；当然，智能合约不仅限于执行用于交易的合约，还可以执行对接收的信息进行处理的合约。2.3) Smart contracts are computerized protocols that can execute the terms of a contract. They are implemented through code deployed on a shared ledger that executes when certain conditions are met. Based on actual business needs, the code is used to complete automated transactions, such as querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Of course, smart contracts are not limited to executing contracts for transactions; they can also execute contracts for processing received information.

3)区块链，包括一系列按照产生的先后时间顺序相互接续的区块(Block)，新区块一旦加入到区块链中就不会再被移除，区块中记录了区块链系统中节点提交的记录数据。3) A blockchain consists of a series of blocks that are sequentially generated. Once a new block is added to the blockchain, it will not be removed. The blocks contain the data submitted by the nodes in the blockchain system.

参见图6，图6是本发明实施例提供的一种可选的区块结构(Block Structure)的示意图，每个区块中包括本区块存储交易记录的哈希值(本区块的哈希值)、以及前一区块的哈希值，各区块通过哈希值连接形成区块链。另外，区块中还可以包括有区块生成时的时间戳等信息。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了相关的信息，用于验证其信息的有效性(防伪)和生成下一个区块。Referring to Figure 6, which is a schematic diagram of an optional block structure provided by an embodiment of the present invention, each block includes the hash value of the transaction records stored in this block (the hash value of this block) and the hash value of the previous block. The blocks are connected through their hash values to form a blockchain. Additionally, the block may also include information such as a timestamp when it was generated. A blockchain is essentially a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains relevant information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
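The linking of blocks by hash values can be illustrated with a minimal sketch. The field names, the use of SHA-256, and the choice to hash the records together with the previous hash and timestamp are assumptions made for this example; the text only states that a block holds the hash of its stored records, the previous block's hash, and optionally a timestamp.

```python
import hashlib
import json
import time


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(records: list[str], prev_hash: str) -> dict:
    """Create a block that stores its records, the previous hash, and its own hash."""
    body = {"records": records, "prev_hash": prev_hash, "timestamp": time.time()}
    return {**body, "hash": _digest(body)}


def chain_is_valid(chain: list[dict]) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("records", "prev_hash", "timestamp")}
        if _digest(body) != block["hash"]:
            return False  # block contents were altered after hashing
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False  # link to the previous block is broken
    return True
```

Because each block's hash depends on the previous block's hash, altering the records of any block invalidates that block and every link after it, which is the anti-counterfeiting property the text refers to.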

参见图7，图7是本申请实施例提供的一种媒体信息处理装置的结构示意图。本申请实施例提供的媒体信息处理装置1包括：Referring to Figure 7, Figure 7 is a schematic diagram of the structure of a media information processing device provided in an embodiment of this application. The media information processing device 1 provided in this embodiment of the application includes:

目标数据获取模块11，用于获取候选媒体信息集合中的各第一媒体信息在上一时段的目标数据，上述目标数据包括流量预测信息和播放相关数据中的至少一项，上述播放相关数据包括上一时段的历史播放量、媒体信息的播放竞争信息或目标播放量中的至少一项，上述候选媒体信息集合包括至少一个第一类型的第一媒体信息和至少一个第二类型的第二媒体信息；The target data acquisition module 11 is used to acquire the target data of each first media information in the candidate media information set in the previous time period. The target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of historical playback volume in the previous time period, playback competition information of media information, or target playback volume. The candidate media information set includes at least one first type of first media information and at least one second type of second media information.

播放评估值处理模块12，用于对于每一上述第一媒体信息，基于上述第一媒体信息在上一时段的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值；The playback evaluation value processing module 12 is used to determine the first playback evaluation value of the first media information in the current time period based on the target data of the first media information in the previous time period for each of the first media information.

上述播放评估值处理模块12，用于获取各上述第二媒体信息在当前时段的第二播放评估值，其中，对于上述第一媒体信息或第二媒体信息对应的播放评估值，播放评估值表征了媒体信息的被推广概率；The playback evaluation value processing module 12 is used to obtain the second playback evaluation value of each of the above-mentioned second media information in the current time period, wherein, for the playback evaluation value corresponding to the above-mentioned first media information or second media information, the playback evaluation value represents the probability of the media information being promoted;

The to-be-played media information processing module 13 is used to determine the media information to be played in the current time period from the candidate media information set, based on the first playback evaluation value corresponding to each of the first media information and the second playback evaluation value corresponding to each of the second media information.

上述播放竞争信息包括上述候选媒体信息集合的各媒体信息的以下信息中的至少一项：The aforementioned playback competition information includes at least one of the following pieces of information from each of the candidate media information sets:

上一时段的点击率；Click-through rate in the previous time period;

上一时段的转化率；Conversion rate in the previous period;

上一时段的曝光率；Exposure rate in the previous period;

在一种可选的实施例中，对于每一上述第一媒体信息，基于上述第一媒体信息在上一时段的目标数据，确定上述第一媒体信息在当前时段的第一播放评估值，是通过媒体信息评估模型实现的，上述装置还包括训练模块，上述训练模块用于训练媒体信息评估模型，上述媒体信息评估模型是通过以下方式训练得到的：In an optional embodiment, for each of the aforementioned first media information, determining the first playback evaluation value of the first media information in the current time period based on the target data of the first media information in the previous time period is achieved through a media information evaluation model. The device further includes a training module, which is used to train the media information evaluation model. The media information evaluation model is trained in the following manner:

具体实现中，上述媒体信息处理装置1可通过其内置的各个功能模块执行如上述图3中各个步骤所提供的实现方式，具体可参见上述各个步骤所提供的实现方式，在此不再赘述。In practice, the media information processing device 1 can execute the implementation methods provided by each step in Figure 3 above through its built-in functional modules. For details, please refer to the implementation methods provided by each step above, which will not be repeated here.

在本申请实施例中，获取候选媒体信息集合中的各第一媒体信息在上一时段的目标数据，该目标数据包括流量预测信息和播放相关数据中的至少一项，播放相关数据包括上一时段的历史播放量、媒体信息的竞争信息或目标播放量中的至少一项，对于每一第一媒体信息来说，可以基于获取到的上一时段的目标数据，确定出该第一媒体信息在当前时段的第一播放评估值，然后结合获取到的各第二媒体信息在当前时段的第二播放评估值，可以对候选媒体信息集合中的各第一媒体信息和各第二媒体信息进行排序，确定当前时段的待播放媒体信息。采用上述方式，可以依据目标数据确定各第一媒体信息的第一播放评估值，并依据各第一媒体信息的第一播放评估值和第二播放评估值来确定要在当前时段播放的待播放媒体信息，能够有效利用目标数据确定第一媒体信息的播放评估值，并利用播放评估值确定待播放的媒体信息，提高了展示待播放媒体信息的合理性。In this embodiment, target data for each first media information in the candidate media information set in the previous time period is obtained. This target data includes at least one of traffic prediction information and playback-related data. The playback-related data includes at least one of historical playback volume in the previous time period, media information competition information, or target playback volume. For each first media information, a first playback evaluation value for the current time period can be determined based on the obtained target data from the previous time period. Then, combined with the obtained second playback evaluation values for each second media information in the current time period, the first and second media information in the candidate media information set can be sorted to determine the media information to be played in the current time period. Using this method, the first playback evaluation value for each first media information can be determined based on the target data, and the media information to be played in the current time period can be determined based on the first and second playback evaluation values. This effectively utilizes target data to determine the playback evaluation value of the first media information and uses the playback evaluation value to determine the media information to be played, improving the rationality of displaying the media information to be played.
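The overall flow summarized above (target data, then first playback evaluation values, then a joint ranking with the second playback evaluation values) might be sketched as follows. Everything named here is hypothetical: `rank_candidates`, the `TargetData` feature dictionary, and `toy_evaluate`, which is a stand-in for the media information evaluation model rather than the trained model itself. The sketch also assumes that media identifiers are unique across both types.

```python
from typing import Callable, Mapping

TargetData = Mapping[str, float]  # hypothetical per-item features from the previous period


def toy_evaluate(td: TargetData) -> float:
    """Stand-in evaluator: raise the previous bid by 10% while the target is unmet."""
    factor = 1.1 if td["played"] < td["target"] else 1.0
    return td["prev_bid"] * factor


def rank_candidates(
    first_target_data: Mapping[str, TargetData],
    second_bids: Mapping[str, float],
    evaluate: Callable[[TargetData], float] = toy_evaluate,
) -> list[tuple[str, float]]:
    """Return (media_id, playback evaluation value) pairs, highest value first."""
    # Determine the first playback evaluation value of each first-type item.
    first_bids = {mid: evaluate(td) for mid, td in first_target_data.items()}
    # Merge with the second playback evaluation values and sort jointly.
    merged = {**first_bids, **second_bids}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

The head of the returned list would be the media information to be played in the current time period; swapping `toy_evaluate` for a learned model leaves the ranking step unchanged.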

上文主要介绍说明了执行主体为硬件，来实施本申请中的媒体信息处理方法，但是本申请的媒体信息处理方法的执行主体并不仅限于硬件，本申请中的媒体信息处理方法的执行主体还可以为软件，上述媒体信息处理装置可以是运行于计算机设备中的一个计算机程序(包括程序代码)，例如，该媒体信息处理装置为一个应用软件；该装置可以用于执行本申请实施例提供的方法中的相应步骤。The above mainly describes that the execution subject is hardware to implement the media information processing method in this application. However, the execution subject of the media information processing method in this application is not limited to hardware. The execution subject of the media information processing method in this application can also be software. The aforementioned media information processing device can be a computer program (including program code) running on a computer device. For example, the media information processing device is an application software. The device can be used to execute the corresponding steps in the method provided in the embodiments of this application.

在一些实施例中，本发明实施例提供的媒体信息处理装置可以采用软硬件结合的方式实现，作为示例，本发明实施例提供的媒体信息处理装置可以是采用硬件译码处理器形式的处理器，其被编程以执行本发明实施例提供的媒体信息处理方法，例如，硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(ASIC，Application SpecificIntegrated Circuit)、DSP、可编程逻辑器件(PLD，Programmable Logic Device)、复杂可编程逻辑器件(CPLD，Complex Programmable Logic Device)、现场可编程门阵列(FPGA，Field-Programmable Gate Array)或其他电子元件。In some embodiments, the media information processing apparatus provided in this invention can be implemented using a combination of hardware and software. As an example, the media information processing apparatus provided in this invention can be a processor in the form of a hardware decoding processor, which is programmed to execute the media information processing method provided in this invention. For example, the processor in the form of a hardware decoding processor can be one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

In other embodiments, the media information processing apparatus provided in the embodiments of the present invention can be implemented in software. The media information processing apparatus 1 shown in Figure 7 may be software in the form of a program, a plug-in, or the like, and includes a series of modules: the target data acquisition module 11, the playback evaluation value processing module 12, and the to-be-played media information processing module 13. The target data acquisition module 11, the playback evaluation value processing module 12, and the to-be-played media information processing module 13 are used to implement the media information processing method provided in the embodiments of the present invention.

Referring to Figure 8, Figure 8 is a schematic structural diagram of an electronic device provided in an embodiment of this application. As shown in Figure 8, the electronic device 1000 in this embodiment may include a processor 1001, a network interface 1004, and a memory 1005. In addition, the electronic device 1000 may include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display and a keyboard, and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located away from the processor 1001. As shown in Figure 8, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.

在图8所示的电子设备1000中，网络接口1004可提供网络通讯功能；而用户接口1003主要用于为用户提供输入的接口；而处理器1001可以用于调用存储器1005中存储的计算机程序。In the electronic device 1000 shown in Figure 8, the network interface 1004 provides network communication functions; the user interface 1003 is mainly used to provide an input interface for users; and the processor 1001 can be used to call computer programs stored in the memory 1005.

应当理解，在一些可行的实施方式中，上述处理器1001可以是中央处理单元(central processing unit，CPU)，该处理器还可以是其他通用处理器、数字信号处理器(digital signal processor，DSP)、专用集成电路(application specific integratedcircuit，ASIC)、现成可编程门阵列(field-programmable gate array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。该存储器可以包括只读存储器和随机存取存储器，并向处理器提供指令和数据。存储器的一部分还可以包括非易失性随机存取存储器。例如，存储器还可以存储设备类型的信息。It should be understood that in some feasible implementations, the processor 1001 described above may be a central processing unit (CPU), which may also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

具体实现中，上述电子设备1000可通过其内置的各个功能模块执行如上述图3中各个步骤所提供的实现方式，具体可参见上述各个步骤所提供的实现方式，在此不再赘述。In practice, the aforementioned electronic device 1000 can execute the implementation methods provided in the steps shown in Figure 3 above through its built-in functional modules. For details, please refer to the implementation methods provided in the steps above, which will not be repeated here.

An embodiment of this application further provides a computer-readable storage medium storing a computer program that is executed by a processor to implement the method provided by the steps in Figure 3. For details, refer to the implementations provided by the steps above, which are not repeated here.

The above computer-readable storage medium may be an internal storage unit of the media information processing apparatus provided in any of the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the electronic device. The computer-readable storage medium may also include a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is about to be output.

An embodiment of this application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. The processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the method provided by any of the possible implementations of Figure 3 above.

本申请的权利要求书和说明书及附图中的术语“第一”、“第二”等是用于区别不同对象，而不是用于描述特定顺序。此外，术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。在本文中提及“实施例”意味着，结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置展示该短语并不一定均是指相同的实施例，也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是，本文所描述的实施例可以与其它实施例相结合。在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合，并且包括这些组合。The terms "first," "second," etc., used in the claims, description, and drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that comprises a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses. References to "embodiment" herein mean that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The presentation of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in the description and appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及算法步骤，能够以电子硬件、计算机软件或者二者的结合来实现，为了清楚地说明硬件和软件的可互换性，在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Those skilled in the art can implement the described functions using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

The foregoing discloses merely preferred embodiments of this application and is not intended to limit the scope of rights of this application. Equivalent variations made in accordance with the claims of this application therefore still fall within the scope covered by this application.

Claims

1. A media information processing method, characterized in that it includes:

Obtain target data, for the previous time period, of each piece of first media information in a candidate media information set, wherein the target data includes at least one of traffic prediction information and playback-related data, the playback-related data includes at least one of the historical playback volume in the previous time period, playback competition information of media information, or a target playback volume, and the candidate media information set includes at least one piece of first media information of a first type and at least one piece of second media information of a second type;

For each piece of the first media information, based on the target data of the first media information in the previous time period, determine the first playback evaluation value of the first media information in the current time period;

Obtain the second playback evaluation value of each piece of second media information in the current time period, wherein the playback evaluation value corresponding to a piece of first media information or second media information represents the probability of that media information being promoted;

Based on the first playback evaluation value corresponding to each of the first media information and the second playback evaluation value corresponding to each of the second media information, the media information to be played in the current time period is determined from the candidate media information set.
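
For illustration only, the final selection step of claim 1 can be sketched in Python as follows. The claim does not fix a selection rule; this sketch assumes that the candidate with the highest playback evaluation value is played. The names `Candidate`, `media_type`, and `select_media` are hypothetical and do not come from this application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    media_id: str
    media_type: str    # "first" or "second" (hypothetical labels)
    evaluation: float  # playback evaluation value for the current time period


def select_media(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the highest playback evaluation value.

    Assumption: first-type and second-type candidates compete on the
    same scale, and the highest value wins.
    """
    if not candidates:
        raise ValueError("candidate set is empty")
    return max(candidates, key=lambda c: c.evaluation)


candidates = [
    Candidate("contract-1", "first", 4.2),
    Candidate("auction-1", "second", 5.0),
    Candidate("auction-2", "second", 3.1),
]
chosen = select_media(candidates)
```

Under this assumption, raising the first playback evaluation value of a piece of first media information makes it more likely to beat the second-type candidates in the current time period.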

2. The method according to claim 1, wherein the playback competition information includes at least one of the following items of information for each piece of media information in the candidate media information set:

Click-through rate in the previous time period;

Conversion rate in the previous time period;

Exposure rate in the previous time period;

The playback evaluation value of the previous time period.

3. The method according to claim 2, wherein the playback competition information includes the playback evaluation value of the previous time period, and for any first media information, determining the first playback evaluation value of the first media information in the current time period based on the target data of the first media information includes:

Based on the target data, the playback evaluation value of the first media information in the previous time period is adjusted to obtain the first playback evaluation value of the first media information in the current time period.
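
A minimal sketch of one possible adjustment follows, in Python. The claim does not specify an adjustment formula; the proportional rule below, the `step` parameter, and the function name `adjust_evaluation` are assumptions made purely for illustration. The sketch raises the previous value when the historical playback volume is below the target playback volume and lowers it when the target has been exceeded.

```python
def adjust_evaluation(prev_value: float,
                      historical_plays: float,
                      target_plays: float,
                      step: float = 0.1) -> float:
    """Adjust the previous playback evaluation value (hypothetical rule).

    shortfall > 0 means under-delivery, so the value is raised;
    shortfall < 0 means over-delivery, so the value is lowered.
    """
    if target_plays <= 0:
        raise ValueError("target_plays must be positive")
    shortfall = (target_plays - historical_plays) / target_plays
    # Clamp so that a single period cannot change the value by more than `step`.
    shortfall = max(-1.0, min(1.0, shortfall))
    return prev_value * (1.0 + step * shortfall)


raised = adjust_evaluation(10.0, historical_plays=50, target_plays=100, step=0.2)
lowered = adjust_evaluation(10.0, historical_plays=150, target_plays=100, step=0.2)
```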

4. The method according to claim 1, characterized in that the method further comprises:

Determine the probability of each of the first media information being played in the current time period;

The step of determining the media information to be played in the current time period from the candidate media information set based on the first playback evaluation value corresponding to each of the first media information and the second playback evaluation value corresponding to each of the second media information includes:

Based on the first playback evaluation value corresponding to each of the first media information, the probability of being played, and the second playback evaluation value corresponding to each of the second media information, the media information to be played in the current time period is determined from the candidate media information set.
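
One way to combine the three inputs of claim 4 is sketched below in Python. The claim does not state how the probability is used; this sketch assumes that each piece of first media information enters the comparison only if a random draw falls below its probability of being played, after which the highest evaluation value wins. The function name, the dictionary layout, and the injectable `rng` are hypothetical, and media identifiers are assumed to be unique across both types.

```python
import random
from typing import Dict, Optional, Tuple


def choose_with_probability(
    first: Dict[str, Tuple[float, float]],  # id -> (evaluation value, play probability)
    second: Dict[str, float],               # id -> evaluation value
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the id of the media information to play, or None if no candidate remains."""
    rng = rng or random.Random()
    pool = dict(second)
    for media_id, (evaluation, probability) in first.items():
        # Assumed gating rule: admit the first-type candidate with its play probability.
        if rng.random() < probability:
            pool[media_id] = evaluation
    if not pool:
        return None
    return max(pool, key=lambda media_id: pool[media_id])


always = choose_with_probability({"contract-1": (9.0, 1.0)}, {"auction-1": 5.0})
never = choose_with_probability({"contract-1": (9.0, 0.0)}, {"auction-1": 5.0})
```

With a probability of 1.0 the first-type candidate always competes, and with 0.0 it never does, so the probability acts as a throttle on how often first media information can be selected.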

5. The method according to claim 4, wherein determining the probability of each of the first media information being played in the current time period includes:

For each piece of the first media information, obtain the display evaluation data corresponding to the current time period for the first media information, wherein the display evaluation data refers to information that affects the probability of the first media information being played;

For each piece of the first media information, the probability of the first media information being played in the current time period is determined based on the display evaluation data corresponding to the first media information.

6. The method according to claim 5, wherein for any of the first media information, the display evaluation data includes at least one of the following:

Profile data of the target user corresponding to the current terminal device;

The device information of the current terminal device;

The attribute information of the first media information;

The current time information corresponding to the terminal device;

The current location information corresponding to the terminal device;

Behavioral statistics of the target user on the current terminal device.

7. The method according to any one of claims 1 to 6, characterized in that, for each piece of first media information, determining the first playback evaluation value of the first media information in the current time period based on the target data of the first media information in the previous time period is achieved through a media information evaluation model, wherein the media information evaluation model is trained in the following manner:

Obtain a training sample set, wherein each training sample in the training sample set includes sample target data, for an initial time period, of a piece of third media information of the first type;

The sample target data of each of the third media information in the initial time period are input into the initial information evaluation model to obtain the predicted playback evaluation value of each of the third media information in the next time period after the initial time period;

For each of the third media information, based on the sample target data of the third media information in the initial time period and the predicted playback evaluation value in the next time period, the first predicted evaluation effect characterization value corresponding to the third media information is determined;

Based on the first predicted evaluation effect characterization value corresponding to each of the third media information, the first total training loss corresponding to the initial information evaluation model is determined;

The initial information evaluation model is repeatedly trained based on each training sample and the first total training loss until the first total training loss meets the preset first training termination condition, thereby obtaining the media information evaluation model.

8. The method according to claim 7, wherein each training sample further includes the actual playback evaluation value of the third media information in the next time period after the initial time period, the actual evaluation effect characterization value in the next time period after the initial time period, and the sample target data of a first time period, wherein the first time period is the time period following the next time period after the initial time period;

For each piece of third media information, determining the first predicted evaluation effect characterization value corresponding to the third media information based on the sample target data of the third media information in the initial time period and the predicted playback evaluation value in the next time period is achieved through an effect evaluation model, wherein the effect evaluation model is trained in the following manner:

The sample target data of each of the third media information in the first time period are input into the initial information evaluation model to obtain the second predicted playback evaluation value of each of the third media information in the first time period;

The sample target data of each of the third media information in the first time period and the corresponding second predicted playback evaluation value are input into the initial effect evaluation model to obtain the second predicted evaluation effect characterization value corresponding to each of the third media information;

For each of the third media information, based on the actual evaluation effect characterization value and the second predicted evaluation effect characterization value, a first evaluation effect characterization value corresponding to the third media information is determined, and, based on the sample target data of the third media information in the initial time period and the actual playback evaluation value in the next time period after the initial time period, a second evaluation effect characterization value corresponding to the third media information is obtained through the effect evaluation model;

Based on the first and second evaluation effect characterization values corresponding to each of the third media information, the second total training loss corresponding to the effect evaluation model is determined;

Based on each training sample and the second total training loss, the effect evaluation model is repeatedly trained until the preset second training termination condition is met.

9. The method according to claim 8, wherein each training sample further includes a playback effect evaluation parameter, in the next time period after the initial time period, of at least one piece of fourth media information of the second type corresponding to the third media information, and for any third media information, the actual evaluation effect characterization value is obtained in the following manner:

Obtain the target playback count of the third media information and the playback count of the third media information in the next time period after the initial time period;

Based on the target playback count and the playback count, determine the playback effect evaluation parameter of the third media information;

The actual evaluation effect characterization value is determined based on the playback effect evaluation parameter of the third media information and the playback effect evaluation parameter of each piece of fourth media information corresponding to the third media information.
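
The computation in claim 9 can be illustrated with the following Python sketch. Neither formula is given in the claim: the completion-based parameter, the weighted average, the `weight` parameter, and both function names are assumptions chosen only to show how a target playback count, an actual playback count, and the parameters of the corresponding fourth media information could be combined into a single value.

```python
from typing import Sequence


def playback_effect_parameter(target_count: int, actual_count: int) -> float:
    """Hypothetical parameter: 1.0 when the target is met exactly,
    decreasing with both under-delivery and over-delivery."""
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    return 1.0 - abs(actual_count - target_count) / target_count


def actual_effect_value(third_param: float,
                        fourth_params: Sequence[float],
                        weight: float = 0.5) -> float:
    """Hypothetical combination: weighted average of the third media
    information's parameter and the mean parameter of its fourth media information."""
    fourth_mean = sum(fourth_params) / len(fourth_params) if fourth_params else 0.0
    return weight * third_param + (1.0 - weight) * fourth_mean


param = playback_effect_parameter(target_count=100, actual_count=80)
value = actual_effect_value(1.0, [0.5, 0.5])
```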

10. A media information processing apparatus, characterized in that the apparatus comprises:

The target data acquisition module is used to acquire target data, for the previous time period, of each piece of first media information in a candidate media information set, wherein the target data includes at least one of traffic prediction information and playback-related data, the playback-related data includes at least one of the historical playback volume in the previous time period, playback competition information of media information, or a target playback volume, and the candidate media information set includes at least one piece of first media information of a first type and at least one piece of second media information of a second type;

The playback evaluation value processing module is used to determine, for each piece of first media information, the first playback evaluation value of the first media information in the current time period based on the target data of the first media information in the previous time period;

The playback evaluation value processing module is further used to obtain the second playback evaluation value of each piece of second media information in the current time period, wherein the playback evaluation value corresponding to a piece of first media information or second media information represents the probability of that media information being promoted;

The media information processing module is used to determine the media information to be played in the current time period from the candidate media information set based on the first playback evaluation value corresponding to each of the first media information and the second playback evaluation value corresponding to each of the second media information.

11. An electronic device, characterized in that it comprises a processor and a memory, the processor and the memory being interconnected;

The memory is used to store computer programs;

The processor is configured to perform the method as described in any one of claims 1 to 9 when the computer program is invoked.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being executed by a processor to implement the method according to any one of claims 1 to 9.