CN109660598B - Cache replacement method and system for transient data of Internet of things - Google Patents
- Publication number
- CN109660598B CN201811370683.4A
- Authority
- CN
- China
- Prior art keywords
- data
- cache
- state
- edge cache
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a cache replacement method and system for transient data of the Internet of Things (IoT). A deep reinforcement learning method is used to learn a caching policy for transient data: popularity trends are mined from the data request history of the edge cache node and combined with the transient data information to form the input of the deep reinforcement learning model; the immediate reward is set to the negative of the comprehensive communication cost; a critic network learns the value function while an actor network learns the policy function; and through repeated cache replacement operations the caching policy is learned adaptively. This solves the problem of inefficient transient data caching in edge cache nodes with limited storage resources. By incorporating both the freshness of IoT data and the consumption of communication resources into a comprehensive communication cost, the method minimizes the long-term comprehensive cost of acquiring IoT data, offloads network traffic to the network edge, and reduces latency, thereby alleviating to some extent the large delay and heavy communication resource consumption involved in transmitting massive transient IoT data.
Description
Technical Field
The invention belongs to the field of wireless communication, and more particularly relates to a cache replacement method and system for transient data of the Internet of Things.
Background Art
With the rapid development and wide application of the Internet of Things (IoT) in fields such as intelligent transportation, smart grid, smart home, and industrial automation, massive IoT data traffic puts enormous pressure on today's communication networks. A common way to address this is to add an edge caching mechanism to the IoT: the idle storage resources of network edge cache nodes are used to cache popular data, so that requesters can obtain data directly from the corresponding edge cache node instead of from the data source, avoiding a large amount of unnecessary end-to-end communication. Edge caching in IoT systems can offload network traffic, reduce network latency, and provide better quality of service and user experience. Because the storage capacity of an edge cache node is generally limited, an efficient cache replacement policy can raise the cache hit rate, use the cache space more efficiently, and offload more network traffic. Many IoT applications also place timeliness requirements on data: only data within a certain validity period is usable, so the freshness of cached data is an important criterion in cache replacement. An edge cache replacement policy for IoT systems therefore needs to consider both the popularity and the freshness of cached data in order to meet the caching requirements of IoT scenarios.
Traditional cache replacement methods such as first-in first-out (FIFO), least recently used (LRU), and least frequently used (LFU) have low caching efficiency because they ignore content popularity trends and the distribution of user requests. Existing cache replacement strategies for edge caches include the following. E. Bastug et al. exploit user-data correlation and predict data popularity through collaborative filtering; their policy greedily caches popular data at the start until the edge cache node's storage is exhausted, and then performs cache replacement according to the predicted popularity. P. Blasco et al. solve the data placement problem in small base stations with a knapsack formulation, in which data popularity is estimated from the request rate each data item receives. J. Song et al. introduce a multi-armed bandit (MAB) method that couples data popularity estimation with the data caching process. S. Tanzil et al. estimate data popularity with a neural network and compute cache placement and sizing via mixed integer linear programming, but the popularity prediction stage of that method relies on keyword and category information of video content, which is not available for general IoT data.
The above methods focus mainly on edge caching of non-transient data, where the edge cache node estimates data popularity to decide whether to cache and what to replace. On the one hand, because they assume that data popularity and user requests follow specific distributions (e.g., a Poisson distribution), they cannot adapt to scenarios where popularity and request distributions change rapidly; on the other hand, they address only non-transient data and do not consider the timeliness of transient data.
SUMMARY OF THE INVENTION
In view of the defects of the prior art, the purpose of the present invention is to solve the technical problems of large delay and heavy communication resource consumption in the transmission of massive transient IoT data, and of inefficient transient data caching in edge cache nodes under limited storage resources.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a cache replacement method for transient data of the Internet of Things, applied when the cache space of the current edge cache node is full. The method includes the following steps:
S1. The edge cache node receives a new transient data item request sent by a user;
S2. Determine whether the content of the requested transient data item is in the cache of the edge cache node; if so, go to step S3, otherwise go to step S6;
S3. Determine whether the requested transient data item is fresh or expired; if fresh, go to step S4; if expired, go to step S5;
S4. Read the data directly from the buffer of the edge cache node and forward it to the user;
S5. The edge cache node forwards the user request to the data source, reads the new data from the data source, replaces the expired data in the buffer of the edge cache node with the new data, and forwards the new data to the user;
S6. The edge cache node forwards the user request to the data source, reads the new data from the data source, uses deep reinforcement learning to select the data to be replaced in the buffer of the edge cache node, replaces it with the new data, and forwards the new data to the user (a sketch of this request-handling flow is given after the steps).
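The following is a minimal sketch of the S1-S6 request-handling flow. The class and function names (EdgeCache, fetch_from_source, select_victim) are illustrative assumptions, not part of the patent, and the DRL-based victim selection of step S6 is left as a pluggable stub:

```python
import time

class EdgeCache:
    """Minimal sketch of the S1-S6 flow. Names are illustrative; the
    DRL-based victim selection (step S6) is a pluggable stub."""

    def __init__(self, capacity, fetch_from_source, select_victim):
        self.capacity = capacity            # max number of cached items (I)
        self.items = {}                     # CID -> (data, t_gen, T_life)
        self.fetch = fetch_from_source      # callable: CID -> (data, t_gen, T_life)
        self.select_victim = select_victim  # callable: (cache, CID) -> CID to evict, or None

    def is_fresh(self, cid):
        _, t_gen, t_life = self.items[cid]
        return time.time() - t_gen <= t_life   # t_age(d) <= T_life(d)

    def handle_request(self, cid):           # S1: a request for CID arrives
        if cid in self.items:                 # S2: content is cached
            if self.is_fresh(cid):            # S3/S4: fresh -> serve from cache
                return self.items[cid][0]
            fresh = self.fetch(cid)           # S5: expired -> refresh in place
            self.items[cid] = fresh
            return fresh[0]
        fresh = self.fetch(cid)               # S6: miss -> fetch from source
        if len(self.items) >= self.capacity:
            victim = self.select_victim(self.items, cid)
            if victim is not None:            # None corresponds to action a_0 (no replacement)
                del self.items[victim]
                self.items[cid] = fresh
        else:
            self.items[cid] = fresh           # initial greedy fill until the cache is full
        return fresh[0]
```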
Specifically, in step S2, if $f_k \in F_k$, the content of the requested data item is in the cache of the edge cache node; if $f_k \notin F_k$, the content of the requested data item is not in the cache of the edge cache node. Here, $f_k$ is the unique content identifier (CID) of the data content corresponding to data item request $k$, and $F_k$ is the set of CIDs of the data items cached in the edge cache node when request $k$ arrives.
Specifically, in step S3, if $t_{age}(p(f_k)) \le T_{life}(p(f_k))$, the requested data item is fresh; if $t_{age}(p(f_k)) > T_{life}(p(f_k))$, the requested data item is expired. Here, $f_k$ is the CID corresponding to data item request $k$, $p(\cdot)$ is the mapping function from a requested CID to the cached data item, $T_{life}(\cdot)$ is the effective lifetime of a data item, and $t_{age}(\cdot)$ is the age of a data item.
Specifically, using deep reinforcement learning in step S6 to select the data to be replaced in the buffer of the edge cache node includes:
1) At time $n$, observe the state information of the edge cache node to obtain the state $s_n$;
2) Select a caching action $a_n$ according to the caching policy $\pi(a_n|s_n)$ and execute it;
3) After executing the caching action $a_n$, compute the immediate reward $r_n$; the state of the edge cache node changes from $s_n$ to $s_{n+1}$;
4) Feed the immediate reward $r_n$ back to the edge cache node, use the state transition $\langle s_n, a_n, r_n, s_{n+1}\rangle$ as a training sample to train the actor-critic networks of the deep reinforcement learning model, and repeat the above process.
Specifically, the immediate reward $r_n$ is computed as

$$r_n = -\sum_{k \in Req_n} C(d_k)$$

where $Req_n$ is the set of all data requests received by the edge cache node between the execution of caching action $a_n$ and the next caching action $a_{n+1}$, and $C(d_k)$ is the comprehensive cost of acquiring data $d_k$.
Specifically, the comprehensive cost $C(d_k)$ is computed as

$$C(d_k) = \alpha \cdot c(d_k) + (1-\alpha) \cdot l(d_k)$$

with the communication cost

$$c(d_k) = \begin{cases} c_1, & f_k \in F_k \text{ and } t_{age}(p(f_k)) \le T_{life}(p(f_k)) \\ c_2, & \text{otherwise} \end{cases}$$

and the data timeliness cost $l(d_k) = t_{age}(d_k)/T_{life}(d_k)$, i.e., one minus the data freshness. Here $\alpha \in [0,1]$ is a trade-off coefficient, $c_1$ is the communication overhead of obtaining data directly from the edge cache node, $c_2$ is the communication overhead of obtaining data from the data source, $c_1 < c_2$, and both $c_1$ and $c_2$ are positive constants; $f_k$ is the CID corresponding to data item request $k$, $F_k$ is the set of CIDs of the data items cached in the edge cache node when request $k$ arrives, $p(\cdot)$ is the mapping function from a requested CID to the cached data item, $T_{life}(\cdot)$ is the effective lifetime of a data item, and $t_{age}(\cdot)$ is the age of a data item.
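As an illustration, a direct transcription of this cost model into Python might look as follows; the constants ($c_1 = 1$, $c_2 = 10$, $\alpha = 0.5$) are hypothetical example values, and this is a sketch rather than the patented implementation:

```python
def timeliness_cost(t_age: float, t_life: float) -> float:
    """l(d_k): age-to-lifetime ratio, i.e. one minus the data freshness."""
    return t_age / t_life

def communication_cost(cache_hit_fresh: bool, c1: float = 1.0, c2: float = 10.0) -> float:
    """c(d_k): c1 for a fresh cache hit, c2 when the source must be contacted (c1 < c2)."""
    return c1 if cache_hit_fresh else c2

def comprehensive_cost(cache_hit_fresh: bool, t_age: float, t_life: float,
                       alpha: float = 0.5) -> float:
    """C(d_k) = alpha * c(d_k) + (1 - alpha) * l(d_k)."""
    return (alpha * communication_cost(cache_hit_fresh)
            + (1 - alpha) * timeliness_cost(t_age, t_life))
```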
Specifically, in the deep reinforcement learning model, the actor network parameters $\theta$ are updated by gradient ascent as

$$\theta \leftarrow \theta + \lambda \nabla_\theta \log \pi(a_n|s_n;\theta) \, A(s_n, a_n)$$

where $\lambda$ is the learning rate of the actor network, $\nabla$ is the gradient operator, the policy $\pi(a_n|s_n;\theta)$ is the probability of selecting cache replacement action $a_n$ in state $s_n$, $A(s_n,a_n) = r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)$ is the advantage function, $\gamma \in [0,1]$ is the discount factor, and $V(s;\theta_v)$ is the state-value function;

the critic network parameters $\theta_v$ are updated by gradient descent as

$$\theta_v \leftarrow \theta_v - \lambda' \nabla_{\theta_v} \big(r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)\big)^2$$

where $\lambda'$ is the learning rate of the critic network.
In another embodiment, the actor network parameters $\theta$ of the deep reinforcement learning model are updated by gradient ascent with an entropy regularization term:

$$\theta \leftarrow \theta + \lambda \big[\nabla_\theta \log \pi(a_n|s_n;\theta) \, A(s_n, a_n) + \beta \nabla_\theta H(\pi(\cdot|s_n;\theta))\big]$$

where $\lambda$ is the learning rate of the actor network, $\nabla$ is the gradient operator, the policy $\pi(a_n|s_n;\theta)$ is the probability of selecting cache replacement action $a_n$ in state $s_n$, $A(s_n,a_n) = r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)$ is the advantage function, $\gamma \in [0,1]$ is the discount factor, $V(s;\theta_v)$ is the state-value function, $H(\pi(\cdot|s_n;\theta))$ is the policy entropy of the action distribution output by policy $\pi_\theta$ in state $s_n$, and $\beta$ is the exploration coefficient;

the critic network parameters $\theta_v$ are updated by gradient descent as

$$\theta_v \leftarrow \theta_v - \lambda' \nabla_{\theta_v} \big(r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)\big)^2$$

where $\lambda'$ is the learning rate of the critic network.
In a second aspect, an embodiment of the present invention provides a cache replacement system for transient data of the Internet of Things, applied when the cache space of the current edge cache node is full. The system includes: a state judgment module, a reading module, a request forwarding module, and a cache replacement module.
The state judgment module is used to judge the state of the transient data requested by the user, where the states are: State 1, the content of the requested transient data item is in the cache of the edge cache node and the item is fresh; State 2, the content of the requested transient data item is in the cache of the edge cache node and the item is expired; State 3, the content of the requested transient data item is not in the cache of the edge cache node.
The reading module is used, when the state judgment module determines State 1, to read the data directly from the buffer of the edge cache node and forward it to the user.
The request forwarding module is used, when the state judgment module determines State 2 or State 3, to forward the user request from the edge cache node to the data source and read the new data from the data source.
The cache replacement module is used, when the state judgment module determines State 2, to replace the expired data in the buffer of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the state judgment module determines State 3, to use deep reinforcement learning to select the data to be replaced in the buffer of the edge cache node, replace it with the new data read by the request forwarding module, and forward the new data to the user.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the cache replacement method described in the first aspect is implemented.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
1. The present invention considers the effective lifetime of transient data and incorporates both the freshness of IoT data and the consumption of communication resources into a comprehensive communication cost, which yields the objective of the IoT transient data caching policy: minimizing the long-term comprehensive cost of acquiring IoT data. By caching transient data in edge cache nodes, network traffic can be offloaded to the network edge and latency reduced, alleviating to some extent the large delay and heavy communication resource consumption in the transmission of massive transient IoT data.
2. The present invention uses deep reinforcement learning to learn the caching policy for transient data. Specifically, the cache replacement problem is modeled as a Markov process; popularity trend information is mined from the data request history of the edge cache node and combined with the lifetime and freshness information of the transient data as the environment-state input of the deep reinforcement learning model. The short-term reward is set to the negative of the comprehensive communication cost. A critic network learns the value function and an actor network learns the policy function; through repeated cache replacement operations the caching policy is learned adaptively and the long-term reward is maximized, thereby minimizing the long-term comprehensive cost of acquiring transient data in the IoT and solving the problem of inefficient transient data caching in edge cache nodes under limited storage resources.
Description of Drawings
FIG. 1 is a flowchart of a cache replacement method for transient data of the Internet of Things provided by an embodiment of the present invention.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.
First, some terms used in the present invention are explained.
An edge cache node is a network node with caching capability close to the user side in the Internet of Things.
Data freshness equals the ratio of the data's remaining valid time to its effective lifetime; the larger the ratio, the more timely the data and the higher its freshness.
Data timeliness refers to the time interval and efficiency from the moment the data source generates the data to the moment the user obtains it; the shorter the interval, the stronger the timeliness.
Data popularity indicates how popular a data item is, i.e., the number of times it is requested within a certain period; the higher the count, the higher the popularity.
Transient data is data that carries a timeliness requirement.
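To make these term definitions concrete, the following is a small sketch of a transient data item with the age and freshness quantities defined above; the class name and field layout are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class TransientDataItem:
    cid: str          # unique content identifier (CID)
    t_gen: float      # generation time t_gen(d)
    t_life: float     # effective lifetime T_life(d)
    payload: bytes = b""

    def age(self, now: float) -> float:
        """t_age(d) = now - t_gen(d)."""
        return now - self.t_gen

    def freshness(self, now: float) -> float:
        """Remaining valid time over lifetime; positive while the item is fresh."""
        return (self.t_life - self.age(now)) / self.t_life

    def is_fresh(self, now: float) -> bool:
        return self.age(now) <= self.t_life
```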
The concept of the present invention is as follows. First, to study the comprehensive cost of transient data caching, the comprehensive cost of acquiring IoT data is divided into two parts: a communication cost (including bandwidth consumption, delay, etc.) and a data timeliness cost. The goal of cache replacement in IoT edge cache nodes is to minimize the long-term comprehensive cost of acquiring data, i.e., to consider both costs at once. Then the cache replacement problem is modeled as a Markov process, and a caching policy based on Deep Reinforcement Learning (DRL) is constructed: according to the data request history of the IoT over a period of time and the current cache state, the caching policy is learned automatically so as to minimize the long-term comprehensive cost of acquiring IoT data.
As shown in FIG. 1, a cache replacement method for transient data of the Internet of Things, applied when the cache space of the current edge cache node is full, includes the following steps:
S1. The edge cache node receives a new transient data item request sent by a user;
S2. Determine whether the content of the requested transient data item is in the cache of the edge cache node; if so, go to step S3, otherwise go to step S6;
S3. Determine whether the requested transient data item is fresh or expired; if fresh, go to step S4; if expired, go to step S5;
S4. Read the data directly from the buffer of the edge cache node and forward it to the user;
S5. The edge cache node forwards the user request to the data source, reads the new data from the data source, replaces the expired data in the buffer of the edge cache node with the new data, and forwards the new data to the user;
S6. The edge cache node forwards the user request to the data source, reads the new data from the data source, uses deep reinforcement learning to select the data to be replaced in the buffer of the edge cache node, replaces it with the new data, and forwards the new data to the user.
Step S1. The edge cache node receives a new transient data item request sent by a user.
A data item $d$ in the Internet of Things is uniquely identified by a CID (Content ID), and each data item contains two fields: the generation time $t_{gen}(d)$ and the effective lifetime $T_{life}(d)$. At time $t$, the age of data item $d$ is $t_{age}(d) = t - t_{gen}(d)$. If the age of data item $d$ is less than its effective lifetime, i.e., $t_{age}(d) < T_{life}(d)$, the data item is said to be fresh, i.e., within its effective lifetime; otherwise, the data item is said to be expired.
A data item request from an IoT client is denoted $k$, the CID of the requested content is $f_k$, and the request arrives at time $t_k$. At time $t_k$, the set of data items cached in the IoT edge cache node is denoted $D_k = \{d_k^1, d_k^2, \dots, d_k^I\}$, and the set of CIDs of the cached data items is $F_k = \{f_k^1, f_k^2, \dots, f_k^I\}$, where $I$ is the maximum number of data items the edge cache node can hold. A mapping function $p(\cdot)$ associates the CID of the requested content with the cached data item, i.e., $p(f_k^i) = d_k^i$.
When data item request $k$ arrives, the IoT edge cache node first checks whether a fresh cached data item with CID $f_k$ exists in the cache. Three cases are considered:
Case 1: $f_k \in F_k$ and $t_{age}(p(f_k)) \le T_{life}(p(f_k))$, i.e., the requested content is in the cache and the cached item is fresh, satisfying the timeliness requirement. The edge cache node therefore directly returns the cached data item $p(f_k)$ to the requester.
Case 2: $f_k \in F_k$ and $t_{age}(p(f_k)) > T_{life}(p(f_k))$, i.e., the requested content is in the cache but the cached item has expired. The edge cache node therefore fetches new data from the data source, returns it to the requester, and replaces the expired data in the cache with the newly fetched data.
Case 3: $f_k \notin F_k$, i.e., the requested content is not in the cache. The edge cache node therefore fetches new data from the data source and returns it to the requester, while deep reinforcement learning selects the data to be replaced in the buffer of the edge cache node, which is then replaced by the new data.
From the analysis of the above three cases, after a user issues request $k$, the returned data item $d_k$ can be expressed as

$$d_k = \begin{cases} p(f_k), & f_k \in F_k \text{ and } t_{age}(p(f_k)) \le T_{life}(p(f_k)) \\ d_{new}, & \text{otherwise} \end{cases}$$

where $d_{new}$ denotes the fresh data item fetched from the data source, $f_k$ is the CID corresponding to data item request $k$, $p(\cdot)$ is the mapping function from a requested CID to the cached data item, $F_k$ is the set of CIDs of the data items cached in the edge cache node when request $k$ arrives, $t_{age}(d)$ is the age of data item $d$, and $T_{life}(d)$ is the effective lifetime of data item $d$.
Step S2. Determine whether the content of the requested transient data item is in the cache of the edge cache node; if so, go to step S3, otherwise go to step S6.
$f_k \in F_k$ indicates that the content of the requested data item is in the cache of the edge cache node; $f_k \notin F_k$ indicates that it is not.
Step S3. Determine whether the requested transient data item is fresh or expired; if fresh, go to step S4; if expired, go to step S5.
$t_{age}(p(f_k)) \le T_{life}(p(f_k))$ indicates that the requested data item is fresh; $t_{age}(p(f_k)) > T_{life}(p(f_k))$ indicates that it is expired.
As for the cache replacement policy of the edge cache node, initially the node greedily caches all arriving data items until the cache space is full. Once the cache is full: in Case 1 no replacement is needed; in Case 2, since the cached data corresponding to the current request is known to be expired, it is simply replaced with the newly fetched data; in Case 3, new data arrives and the cache replacement method must decide whether to replace an already-cached data item with the new item and, if so, which cached item to replace.
Specifically, the caching action given by the policy for data item $d_k$ is denoted $a_k$, with action space $A = \{a_0, a_1, \dots, a_I\}$. $a_k = a_0$ means no cache replacement is performed; $a_k = a_i$ ($1 \le i \le I$) means the cached item at position $i$ of the buffer is replaced with the fresh data item $d_k$ fetched from the data source. After the buffer executes caching action $a_k$, $D_k$ and $F_k$ become $D_{k+1}$ and $F_{k+1}$.
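A minimal sketch of how an action index might be applied to the buffer follows; the list-based buffer representation and function name are illustrative assumptions:

```python
def apply_action(buffer: list, action: int, new_item) -> list:
    """Apply caching action a_k: action 0 keeps the buffer unchanged (a_0);
    action i (1 <= i <= I) replaces the item at buffer position i with new_item."""
    if action == 0:
        return buffer                    # a_0: do not cache the new item
    buffer = list(buffer)
    buffer[action - 1] = new_item        # a_i: replace position i (1-indexed)
    return buffer
```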
Step S6. The edge cache node forwards the user request to the data source, reads the new data from the data source, uses deep reinforcement learning to select the data to be replaced in the buffer of the edge cache node, replaces it with the new data, and forwards the new data to the user.
Step S6 corresponds to Case 3. Deep reinforcement learning is used to select the data to be replaced in the buffer of the edge cache node, and the comprehensive cost $C(d_k)$ of acquiring transient data item $d_k$ is defined as follows.
The comprehensive cost $C(d_k)$ of acquiring transient data item $d_k$ is divided into two parts: a communication cost $c(d_k)$ and a data timeliness cost $l(d_k)$.
The communication cost $c(d_k)$ is computed as

$$c(d_k) = \begin{cases} c_1, & f_k \in F_k \text{ and } t_{age}(p(f_k)) \le T_{life}(p(f_k)) \\ c_2, & \text{otherwise} \end{cases}$$

where $c_1$ is the communication overhead of obtaining data directly from the edge cache node, $c_2$ is the communication overhead of obtaining data from the data source, $c_1 < c_2$, and both $c_1$ and $c_2$ are positive constants.
The data timeliness cost $l(d_k)$ is computed as $l(d_k) = t_{age}(d_k)/T_{life}(d_k)$, i.e., one minus the data freshness.
The comprehensive cost $C(d_k)$ is computed as

$$C(d_k) = \alpha \cdot c(d_k) + (1-\alpha) \cdot l(d_k)$$

where $\alpha \in [0,1]$ is a trade-off coefficient weighting the importance of the two costs: a larger $\alpha$ indicates the user cares more about communication overhead, otherwise the user cares more about data timeliness.
To optimize the comprehensive cost of acquiring IoT data, the cache replacement problem is modeled as a Markov process. Cases 1 and 2 above follow deterministic rules, so only the cache replacement action in Case 3 needs to be optimized.
The Markov process problem is defined by $\{S, A, M(s_{n+1}|s_n, a_n), R(s_n, a_n)\}$, where $S$ is the state set of the edge cache node of the IoT system and $s_n$ is the state of the edge cache node at time $n$; $A$ is the action set of the cache replacement policy and $a_n$ is the caching action at time $n$; $M(s_{n+1}|s_n, a_n)$ is the state transition probability matrix for the edge cache node's state moving from $s_n$ to $s_{n+1}$ after executing action $a_n$; and $R(s_n, a_n)$ is the immediate reward function, i.e., the system's reward feedback after executing action $a_n$ in state $s_n$. The whole cache replacement process can therefore be expressed as:
1) At time $n$, the edge cache node observes the system state information and obtains the system state $s_n \in S$ at time $n$.
2) The edge cache node selects a caching action $a_n$ according to the caching policy $\pi(a_n|s_n)$ and executes it.
3) After executing caching action $a_n$, the system returns an immediate reward $r_n = R(s_n, a_n)$, and the system state transitions from $s_n$ to $s_{n+1}$.
4) The immediate reward $r_n$ is fed back to the edge cache node, and the state transition $\langle s_n, a_n, r_n, s_{n+1}\rangle$ is added to the experience pool of the deep reinforcement learning model as a training sample for training the actor-critic networks; the above process is repeated.
Here the caching policy $\pi(a_n|s_n)$ denotes the probability of selecting cache replacement action $a_n$ in state $s_n$, abbreviated as $\pi$.
The long-term accumulated reward of the whole process is denoted

$$R_n = \sum_{j=0}^{\infty} \gamma^j r_{n+j}$$

where $\gamma \in [0,1]$ is the discount factor, which determines how strongly future rewards influence the current caching decision. The goal of the present invention is therefore to find the optimal caching policy $\pi^*$ that maximizes the expected long-term cumulative reward in all states: $\pi^* = \arg\max_\pi E[R_n|\pi]$, where $E[R_n|\pi]$ is the expectation of the long-term cumulative reward $R_n$ under caching policy $\pi$.
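As a small illustration of the discounted return $R_n$, the following sketch computes it over a finite reward trace (a truncation assumed for demonstration; the sum in the definition above is infinite):

```python
def discounted_return(rewards, gamma=0.9):
    """R_n = sum over j of gamma^j * r_{n+j}, truncated to a finite reward trace."""
    total = 0.0
    for j, r in enumerate(rewards):
        total += (gamma ** j) * r
    return total

# Example: rewards are negative comprehensive costs, so less negative is better.
print(discounted_return([-2.0, -1.0, -1.5], gamma=0.9))  # -2.0 - 0.9 - 1.215 = -4.115
```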
To measure the quality of a caching policy $\pi$, the value function $V^\pi(s_n) = E[R_n|s_n; \pi]$ is defined as the expectation of the long-term accumulated reward $R_n$ under policy $\pi$ in state $s_n$. The optimal value function in state $s_n$ can be expressed as $V^*(s_n) = \max_\pi V^\pi(s_n)$.
By the Markov property, substituting $V^\pi(s_n)$ into the Bellman equation gives

$$V^\pi(s_n) = \sum_{a_n} \pi(a_n|s_n) \sum_{s_{n+1}, r_n} p(s_{n+1}, r_n|s_n, a_n)\big[r_n + \gamma V^\pi(s_{n+1})\big]$$

where $p(s_{n+1}, r_n|s_n, a_n)$ is the probability of transitioning to state $s_{n+1}$ and receiving the immediate system reward $r_n$ after executing action $a_n$ in state $s_n$, and $\gamma \in [0,1]$ is the discount factor. This equation gives the iterative computation of the value function in the Markov process.
If the state transition probability matrix $M(s_{n+1}|s_n, a_n)$ of the Markov process were given, the optimal policy could be solved by dynamic programming. In the edge cache replacement process, however, the state transition probability matrix is unknown. The present invention therefore uses deep reinforcement learning to intelligently mine information from historical data and uses deep neural networks to learn and fit the state-value function, thereby obtaining the optimal cache replacement policy.
The present invention uses a deep reinforcement learning method to adaptively learn an efficient caching policy. The actor-critic deep reinforcement learning method is taken as an example to introduce the technical details of the present invention, but the invention is not limited to the actor-critic method. Specifically:
Input: while the edge cache node continuously processes user requests, when Case 3 occurs, i.e., the content of the requested data item is not in the cache, the edge cache node forwards the request and obtains the requested data item $d_n$ from the data source. At this point the caching-policy agent observes the current environment state $s_n = (x_n^0, x_n^1, \dots, x_n^I)$ as the input of the deep neural network, where $x_n^0$ is the feature vector of the currently requested data item $d_n$ and $x_n^i$ ($1 \le i \le I$) is the feature vector of the $i$-th data item in the buffer of the edge cache node. The feature vector records, for each of the past $J$ groups of requests, the number of requests for the data item's content, which reflects the item's popularity; it also records the effective lifetime of the data item's content and its freshness, where freshness equals the ratio of the content's remaining valid time to its effective lifetime. Besides this request information, the input may also include scenario information, edge network information, and the like.
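A sketch of building such a state input is shown below; the feature layout ($J$ request-count bins, lifetime, freshness) follows the description above, while the function names and the fixed-size NumPy representation are illustrative assumptions:

```python
import numpy as np

def item_features(request_counts, t_life, remaining_time, J=8):
    """Feature vector of one data item: request counts over the past J request
    groups (popularity trend), effective lifetime, and freshness."""
    counts = np.zeros(J)
    recent = request_counts[-J:]
    counts[:len(recent)] = recent
    freshness = max(remaining_time, 0.0) / t_life
    return np.concatenate([counts, [t_life, freshness]])

def build_state(requested_item, cached_items, J=8):
    """State s_n = (x_n^0, x_n^1, ..., x_n^I): the requested item first, then
    the feature vectors of the I cached items."""
    rows = [item_features(*requested_item, J=J)]
    rows += [item_features(*it, J=J) for it in cached_items]
    return np.stack(rows)
```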
Policy: the present invention represents the replacement policy $\pi_\theta(a_n|s_n)$, also written $\pi(a_n|s_n;\theta)$ and abbreviated $\pi_\theta$, by an actor network with parameters $\theta$. The input of the actor network is the state information $s_n$ of the edge cache node at time $n$, and its output is the probability of selecting each caching action. Deep reinforcement learning finally selects the cache replacement action according to the policy $\pi_\theta(a_n|s_n)$, i.e., it executes the replacement action $a_n$ on the buffer position drawn according to these action probabilities. For example, suppose the edge cache node has 3 buffer positions (labeled 1, 2, 3, with 0 denoting no replacement) and the actor network outputs the probabilities (0.1, 0.1, 0.7, 0.1); the replacement position is then selected according to these probabilities, so the output is most likely $a_2$, i.e., the item in buffer position 2 is replaced. The goal of the actor network is to output cache replacement actions according to the learned caching policy so as to maximize the expected long-term cumulative reward $E[R_n|\pi_\theta]$.
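The action-sampling step of this example might look like the following sketch (NumPy used for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.1, 0.7, 0.1])   # actor output over {a_0, a_1, a_2, a_3}
action = rng.choice(len(probs), p=probs)  # most likely 2, i.e. replace buffer slot 2
print(action)
```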
State-value function: the present invention represents the state-value function $V(s_n;\theta_v)$ by a critic network with parameters $\theta_v$. The input of the critic network is the state information $s_n$ of the edge cache node at time $n$, and its output is the value of that state under the current policy $\pi_\theta$. The goal of the critic network is to estimate the value of state $s_n$ under policy $\pi_\theta$ as accurately as possible.
Policy training: each time a cache replacement action $a_n$ is executed, the system feeds back an immediate reward $r_n$ to the intelligent caching-policy decision module:

$$r_n = -\sum_{k \in Req_n} C(d_k)$$

where $Req_n$ is the set of all data requests received by the edge cache node between the execution of caching action $a_n$ and the next caching action $a_{n+1}$, and $C(d_k)$ is the comprehensive cost of acquiring data $d_k$. Since the goal is to minimize the long-term comprehensive cost of acquiring data, a minus sign is placed in front of the cost.
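The reward computation is then a direct accumulation over the requests served in the interval; the sketch below reuses the hypothetical comprehensive_cost helper from the earlier cost example:

```python
def immediate_reward(served_requests, alpha=0.5):
    """r_n = -sum over Req_n of C(d_k); each served request is a tuple
    (cache_hit_fresh, t_age, t_life) as in the cost sketch above."""
    return -sum(comprehensive_cost(hit, t_age, t_life, alpha)
                for hit, t_age, t_life in served_requests)
```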
According to the Bellman equation and the state transition trajectory of the Markov process, the expectation of the long-term total reward can be estimated, so the network parameters can be learned by gradient updates. The gradient of the expected total reward with respect to the actor network parameters $\theta$ can be computed as

$$\nabla_\theta E[R_n|\pi_\theta] = E\big[\nabla_\theta \log \pi(a_n|s_n;\theta) \, A(s_n, a_n)\big]$$

To maximize the expected total reward, the actor network parameters $\theta$ are updated by gradient ascent as

$$\theta \leftarrow \theta + \lambda \nabla_\theta \log \pi(a_n|s_n;\theta) \, A(s_n, a_n)$$

where $\lambda$ is the learning rate of the actor network, which can be adjusted according to the actual situation, and $\nabla$ is the gradient operator; the advantage function $A(s_n, a_n) = r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)$ measures how good it is to select action $a_n$ in state $s_n$.
The critic network can be trained by temporal-difference learning, with the loss function set to the squared error between the critic network output $V(s_n;\theta_v)$ and the target value $r_n + \gamma V(s_{n+1};\theta_v)$. The critic network parameters $\theta_v$ are updated by gradient descent as

$$\theta_v \leftarrow \theta_v - \lambda' \nabla_{\theta_v} \big(r_n + \gamma V(s_{n+1};\theta_v) - V(s_n;\theta_v)\big)^2$$

where $\lambda'$ is the learning rate of the critic network, which can be adjusted according to the actual situation, and $\nabla$ is the gradient operator.
To address the exploration-exploitation dilemma in reinforcement learning, where "exploitation" means taking the currently learned optimal action and "exploration" attempts to fully explore the action space by taking currently non-optimal actions, and to prevent the learned policy from falling into a local optimum, the present technical solution adds the entropy of the policy (the action probability distribution output by the actor network) as a regularization term to the update of the actor network parameters $\theta$, thereby encouraging exploration:

$$\theta \leftarrow \theta + \lambda \big[\nabla_\theta \log \pi(a_n|s_n;\theta) \, A(s_n, a_n) + \beta \nabla_\theta H(\pi(\cdot|s_n;\theta))\big]$$

where $H(\pi(\cdot|s_n;\theta))$ is the policy entropy of the action space output by policy $\pi_\theta$ in state $s_n$, $\beta$ is the exploration coefficient, and $\nabla$ is the gradient operator. Specifically,

$$H(\pi(\cdot|s_n;\theta)) = -\sum_{a \in A} \pi(a|s_n;\theta) \log \pi(a|s_n;\theta)$$
Through the gradient ascent method, $\theta$ is updated in the direction of increasing entropy, encouraging the exploration process. The exploration coefficient $\beta$ is a positive number that balances the degrees of exploration and exploitation; a larger $\beta$ encourages more exploration, and it can be adjusted as needed in a specific implementation.
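Putting the actor, critic, and entropy term together, one advantage actor-critic update step could be sketched as follows (PyTorch used for illustration; the network interfaces, 1-D action distribution, hyperparameters, and variable names are assumptions, not the patented implementation):

```python
import torch
import torch.nn.functional as F

def a2c_update(actor, critic, opt_actor, opt_critic,
               s_n, a_n, r_n, s_next, gamma=0.9, beta=0.01):
    """One actor-critic update from a transition <s_n, a_n, r_n, s_{n+1}>."""
    v_n = critic(s_n)                                  # V(s_n; theta_v)
    with torch.no_grad():
        target = r_n + gamma * critic(s_next)          # TD target
    advantage = (target - v_n).detach()                # A(s_n, a_n)

    # Critic: gradient descent on the squared TD error.
    critic_loss = F.mse_loss(v_n, target)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor: gradient ascent on log pi * A plus the entropy bonus,
    # implemented as descent on the negated objective.
    probs = actor(s_n)                                 # pi(.|s_n; theta), 1-D over actions
    log_probs = torch.log(probs + 1e-8)
    entropy = -(probs * log_probs).sum()               # H(pi(.|s_n; theta))
    actor_loss = -(log_probs[a_n] * advantage + beta * entropy)
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```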
The caching policy supports both online learning and offline learning. With online learning, the policy can be deployed directly on the edge cache node, and the network parameters are updated periodically from the IoT request data processed by the node to learn the caching policy. With offline learning, the caching policy is pre-trained offline and then deployed to the edge cache node, where it remains unchanged.
The present invention provides a cache replacement system for transient data of the Internet of Things, the system including: a state judgment module, a reading module, a request forwarding module, and a cache replacement module.
The state judgment module is used to judge the state of the transient data requested by the user, where the states are: State 1, the content of the requested transient data item is in the cache of the edge cache node and the item is fresh; State 2, the content of the requested transient data item is in the cache of the edge cache node and the item is expired; State 3, the content of the requested transient data item is not in the cache of the edge cache node.
The reading module is used, when the state judgment module determines State 1, to read the data directly from the buffer of the edge cache node and forward it to the user.
The request forwarding module is used, when the state judgment module determines State 2 or State 3, to forward the user request from the edge cache node to the data source and read the new data from the data source.
The cache replacement module is used, when the state judgment module determines State 2, to replace the expired data in the buffer of the edge cache node with the new data read by the request forwarding module and forward the new data to the user; and, when the state judgment module determines State 3, to use deep reinforcement learning to select the data to be replaced in the buffer of the edge cache node, replace it with the new data read by the request forwarding module, and forward the new data to the user.
The system further includes a training module for collecting, each time a cache replacement action is executed, the cache replacement action, the immediate reward brought by the replacement action, and the states of the edge cache node before and after the replacement, and for training the network parameters of the deep reinforcement learning model in the cache replacement module based on these.
The above are only preferred embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811370683.4A CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109660598A CN109660598A (en) | 2019-04-19 |
CN109660598B true CN109660598B (en) | 2020-05-19 |
Family
ID=66111253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811370683.4A Active CN109660598B (en) | 2018-11-17 | 2018-11-17 | Cache replacement method and system for transient data of Internet of things |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109660598B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456647B (en) * | 2019-07-02 | 2020-11-27 | 珠海格力电器股份有限公司 | Intelligent household control method and intelligent household control device |
CN113055721B (en) * | 2019-12-27 | 2022-12-09 | 中国移动通信集团山东有限公司 | Video content distribution method and device, storage medium and computer equipment |
CN111277666B (en) * | 2020-02-21 | 2021-06-01 | 南京邮电大学 | Online collaborative caching method based on freshness |
CN111292001B (en) * | 2020-02-24 | 2023-06-02 | 清华大学深圳国际研究生院 | Combined decision method and device based on reinforcement learning |
WO2021253168A1 (en) * | 2020-06-15 | 2021-12-23 | Alibaba Group Holding Limited | Managing data stored in a cache using a reinforcement learning agent |
CN113630742B (en) * | 2020-08-05 | 2023-02-17 | 北京航空航天大学 | Mobile edge cache replacement method adopting request rate and dynamic property of information source issued content |
CN113038616B (en) * | 2021-03-16 | 2022-06-03 | 电子科技大学 | Frequency spectrum resource management and allocation method based on federal learning |
CN113115368B (en) * | 2021-04-02 | 2022-08-05 | 南京邮电大学 | Base station cache replacement method, system and storage medium based on deep reinforcement learning |
CN113115362B (en) * | 2021-04-16 | 2023-04-07 | 三峡大学 | Cooperative edge caching method and device |
CN113395333B (en) * | 2021-05-31 | 2022-03-25 | 电子科技大学 | Multi-edge base station joint cache replacement method based on intelligent agent depth reinforcement learning |
CN113438315B (en) * | 2021-07-02 | 2023-04-21 | 中山大学 | Optimization method of Internet of Things information freshness based on dual-network deep reinforcement learning |
CN113676513B (en) * | 2021-07-15 | 2022-07-01 | 东北大学 | Intra-network cache optimization method driven by deep reinforcement learning |
CN114153619A (en) * | 2021-12-15 | 2022-03-08 | 国网信息通信产业集团有限公司 | Edge calculation caching method and related device for power internet of things |
CN114170560B (en) * | 2022-02-08 | 2022-05-20 | 深圳大学 | A multi-device edge video analysis system based on deep reinforcement learning |
CN115914388A (en) * | 2022-12-14 | 2023-04-04 | 广东信通通信有限公司 | Resource data fresh-keeping method based on monitoring data acquisition |
CN118311878B (en) * | 2024-06-06 | 2024-08-13 | 吉林大学 | Intelligent motion control method based on SAC reinforcement learning algorithm |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7921259B2 (en) * | 2007-09-07 | 2011-04-05 | Edgecast Networks, Inc. | Content network global replacement policy |
CN106452919B (en) * | 2016-11-24 | 2019-10-25 | 浪潮集团有限公司 | A Fog Node Optimization Method Based on Fuzzy Theory |
CN106888270B (en) * | 2017-03-30 | 2020-06-23 | 网宿科技股份有限公司 | Method and system for back-to-source routing and scheduling |
- 2018
- 2018-11-17 CN CN201811370683.4A patent/CN109660598B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102291447A (en) * | 2011-08-05 | 2011-12-21 | 中国电信股份有限公司 | Content distribution network load scheduling method and system |
CN107479829A (en) * | 2017-08-03 | 2017-12-15 | 杭州铭师堂教育科技发展有限公司 | A kind of Redis cluster mass datas based on message queue quickly clear up system and method |
Also Published As
Publication number | Publication date |
---|---|
CN109660598A (en) | 2019-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109660598B (en) | Cache replacement method and system for transient data of Internet of things | |
Zhu et al. | Caching transient data for Internet of Things: A deep reinforcement learning approach | |
CN113114756B (en) | Video cache updating method for self-adaptive code rate selection in mobile edge calculation | |
CN110213627B (en) | Streaming media cache allocation method based on multi-cell user mobility | |
He et al. | QoE-driven content-centric caching with deep reinforcement learning in edge-enabled IoT | |
Zhang et al. | Joint optimization of cooperative edge caching and radio resource allocation in 5G-enabled massive IoT networks | |
CN113115368B (en) | Base station cache replacement method, system and storage medium based on deep reinforcement learning | |
CN109729507B (en) | D2D cooperative caching method based on incentive mechanism | |
CN118102386B (en) | Service caching and task unloading combined optimization method and system in D2D auxiliary MEC network | |
CN114786200B (en) | Data intelligent caching method based on cooperative sensing | |
Chen et al. | Joint task and computing resource allocation in distributed edge computing systems via multi-agent deep reinforcement learning | |
CN112752308B (en) | Mobile prediction wireless edge caching method based on deep reinforcement learning | |
Qiu et al. | OA-cache: Oracle approximation-based cache replacement at the network edge | |
WO2012176924A1 (en) | Information processing device, information processing system, information processing method and program | |
Wu et al. | Proactive caching with distributed deep reinforcement learning in 6g cloud-edge collaboration computing | |
CN118317374A (en) | Mobile perception collaborative caching method | |
Yuan et al. | A joint caching and offloading strategy using reinforcement learning for multi-access edge computing users | |
CN112631750B (en) | Compressed sensing-based predictive online scheduling and hybrid task deployment method for cloud data centers | |
CN116017740B (en) | A method for deploying edge network resources in a dynamic communication environment | |
CN114885303B (en) | A resource collaborative scheduling method and system for edge cloud vehicle networking | |
CN106973088A (en) | A kind of buffering updating method and network of the joint LRU and LFU based on shift in position | |
Zhang et al. | A joint coverage constrained task offloading and resource allocation method in mec | |
Niknia et al. | Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching | |
Gao et al. | Multi-view learning based edge storage management strategy | |
Li et al. | Cache allocation policy based on user preference using reinforcement learning in mobile edge computing |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |