CN109325266B

CN109325266B - Response time distribution prediction method for online cloud service

Info

Publication number: CN109325266B
Application number: CN201810997628.1A
Authority: CN
Inventors: 纪梓潼; 赵来平; 曹悦; 李克秋
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2023-11-10
Anticipated expiration: 2038-08-29
Also published as: CN109325266A

Abstract

The invention relates to the technical field of cloud computing and aims to solve the problem of predicting the response time distribution of online cloud services under real workloads, thereby providing guidance for deploying cloud platform infrastructure and applications. To this end, the proposed response time distribution prediction method for online cloud services combines an access load prediction method with a tail latency analysis method to predict the response time distribution of online cloud services, with the access load prediction result serving as the input parameter of the tail latency analysis. The method is mainly applied in cloud computing scenarios.

Description

Response time distribution prediction method for online cloud services

Technical field

The present invention relates to the field of cloud computing technology, in particular to performance prediction for software-defined cloud platform services, and specifically to a response time distribution prediction method for online cloud services.

Background art

In a shared cloud computing environment, multiple applications running on the same physical machine compete for physical resources (such as CPU, cache, and memory bandwidth), which degrades performance: latency may reach 600 times that observed when a single application runs alone on the physical machine. To overcome the performance interference caused by resource contention and to guarantee quality of service, service providers invest heavily in hardware as well as in operation and maintenance staff.

"A service with a poor experience is a service that does not exist": for users, an excessively long response time from an online cloud service has nearly the same impact as the service being unavailable. Reducing the tail latency of online cloud services to improve user experience is therefore an important challenge for cloud computing services. Analyzing the operating characteristics of each cloud service to discover its performance bottlenecks is key to improving user experience; however, because cloud infrastructure architectures are complex and applications are deployed at very large scale, effective analysis models and methods for online cloud services are still scarce. In addition, most existing work focuses on average performance metrics, such as average response time and average resource utilization, while few studies model and analyze the latency distribution of online cloud services. It is therefore necessary to design an effective analysis model for online cloud services.

Summary of the invention

To overcome the shortcomings of the prior art, the present invention aims to solve the problem of predicting the response time distribution of online cloud services under real workloads, thereby providing guidance for deploying cloud platform infrastructure and applications. To this end, the technical solution adopted by the present invention is a response time distribution prediction method for online cloud services that combines an access load prediction method with a tail latency analysis method to predict the response time distribution of online cloud services, where the access load prediction result serves as the input parameter of the tail latency analysis.

Access load prediction forecasts the load intensity with the Winters seasonal exponential smoothing algorithm. The Winters linear and seasonal exponential smoothing method uses three equations, each smoothing one of the three components of the model (the stationary level, the trend, and the seasonal component) and each containing its own parameter. The three equations are as follows:

$$b_t = \gamma\,(S_t - S_{t-1}) + (1-\gamma)\,b_{t-1}, \qquad 0<\gamma<1 \qquad (1\text{-}2)$$

where S_t is the single exponential smoothing value for period t+1, x_t is the latest observation, b_t is the correction term for the linear trend, α, β, and γ are smoothing coefficients, the trend value is expressed as the difference between two adjacent smoothing values, L is the length of the season, and I_t is the seasonal correction coefficient for period t. The forecast for period t+m is:

$$F_{t+m} = (S_t + b_t\,m)\,I_{t+m-L}, \qquad 0<\beta<1 \qquad (1\text{-}4)$$
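The Winters method above can be sketched in Python as follows. Only the trend equation (1-2) and the forecast (1-4) survive in this text, so the level and seasonal updates below use the standard multiplicative Holt-Winters form, S_t = αx_t/I_{t−L} + (1−α)(S_{t−1} + b_{t−1}) and I_t = βx_t/S_t + (1−β)I_{t−L}. That form, the initialization from the first two seasons, and the function name `holt_winters_multiplicative` are assumptions, not the patent's own code.

```python
from typing import List, Tuple

def holt_winters_multiplicative(x: List[float], L: int, alpha: float, beta: float,
                                gamma: float, horizon: int) -> Tuple[List[float], List[float]]:
    """Return one-step-ahead fitted values for x[L:] and forecasts F_{t+1..t+horizon}."""
    if len(x) < 2 * L:
        raise ValueError("need at least two full seasons of data")
    first = sum(x[:L]) / L
    second = sum(x[L:2 * L]) / L
    S = first                              # level (assumed initialisation)
    b = (second - first) / L               # trend: average per-period change between seasons
    I = [x[k] / first for k in range(L)]   # I[k] holds the latest seasonal index for phase k
    fitted: List[float] = []
    for t in range(L, len(x)):
        k = t % L
        fitted.append((S + b) * I[k])      # forecast of x_t made at t-1, i.e. (1-4) with m = 1
        S_prev = S
        S = alpha * x[t] / I[k] + (1 - alpha) * (S_prev + b)   # level update (assumed form)
        b = gamma * (S - S_prev) + (1 - gamma) * b             # trend update, equation (1-2)
        I[k] = beta * x[t] / S + (1 - beta) * I[k]             # seasonal update (assumed form)
    n = len(x)
    # Equation (1-4): F_{t+m} = (S_t + b_t m) I_{t+m-L}, with t = n - 1
    forecasts = [(S + b * m) * I[(n - 1 + m) % L] for m in range(1, horizon + 1)]
    return fitted, forecasts
```

For hourly visit counts with a daily cycle one would call it with L = 24; the fitted values are one-step-ahead forecasts, which is what the MSE criterion compares against the observations.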

The values of α, β, and γ are determined by trial and error so as to minimize the mean squared error (MSE), which yields the model. The MSE is expressed as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(x_t - F_t\right)^2$$

where n is the number of observations and F_t is the fitted value for period t.

For the peak and trough periods of visit volume, the two periods are forecast separately; their arrival rates, λ₁ and λ₂ respectively, are used as the input parameters of the tail latency analysis.
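The split into a peak rate λ₁ and a trough rate λ₂ could look like the following sketch, which classifies hourly forecasts by weekday versus weekend (the split used in the NASA example of this description). The text does not say how one rate is extracted per period; taking the mean forecast of each group is an assumption, and `peak_trough_rates` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence, Tuple

HOURS_PER_DAY = 24

def peak_trough_rates(hourly_forecast: Sequence[float],
                      start_weekday: int = 0) -> Tuple[float, float]:
    """Return (lambda_1, lambda_2): mean forecast rate over weekdays and over weekends.

    start_weekday is the weekday of the first forecast hour (0 = Monday ... 6 = Sunday),
    an assumed convention.
    """
    peak, trough = [], []
    for h, rate in enumerate(hourly_forecast):
        day = (start_weekday + h // HOURS_PER_DAY) % 7
        (peak if day < 5 else trough).append(rate)   # Mon-Fri peak, Sat-Sun trough
    if not peak or not trough:
        raise ValueError("forecast must cover at least one weekday and one weekend day")
    return mean(peak), mean(trough)
```

A median or high percentile of each group would be an equally plausible choice if the SRN should be sized for busier hours.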

The tail latency analysis is based on a stochastic reward net (SRN). Specifically, the different arrival rates at the peak and trough obtained in the previous stage are modeled separately to explore the cloud service response time distribution under different access intensities. Computing the response time distribution takes three steps: (1) divide the total operating time t into n time slices; (2) obtain the response time distribution within each time slice, taking system failures into account; (3) obtain the overall response time distribution across all time slices, derive the tail latency at each percentile, and from it compute the experience availability.
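Steps (1) and (3) can be illustrated as below. How the per-slice distributions are combined is not spelled out in the text; this sketch assumes equal-length slices and an overall CDF that is a mixture in which each slice is weighted by its expected number of requests (arrival rate times slice length). The function names are hypothetical.

```python
from typing import Callable, List, Sequence

CDF = Callable[[float], float]

def time_slices(total_time: float, n: int) -> List[float]:
    """Step (1): divide the total operating time into n slices (equal lengths assumed)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [total_time / n] * n

def overall_cdf(slice_cdfs: Sequence[CDF], slice_rates: Sequence[float],
                slice_lengths: Sequence[float], t: float) -> float:
    """Step (3): mix the per-slice response time CDFs from step (2), weighting each
    slice by its expected number of requests (arrival rate x slice length)."""
    weights = [lam * dt for lam, dt in zip(slice_rates, slice_lengths)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one slice must receive requests")
    return sum(w * F(t) for w, F in zip(weights, slice_cdfs)) / total
```

Weighting by request volume means a busy peak slice dominates the overall distribution, which matches the per-request view of user experience; weighting by time alone would be the alternative.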

In a concrete implementation, the tail latency analysis first solves the SRN model with the SPNP tool to obtain the steady-state probabilities of the model, which serve as initial-state probabilities; it then applies the tagged customer model to obtain the response time distribution from each initial state; finally, through data processing and calculation, it obtains the overall response time distribution of user requests. The data processing and calculation proceed as follows, starting from the initial state of the system:

$$T = [m, n, n', i, j, j', a, b, b', p, q, q'] \qquad (1)$$

$$m\in[0,\dots,M_s],\quad n\in[0,\dots,M_{db}],\quad n'\in[0,\dots,M_{ca}],$$
$$i\in[0,\dots,N_s],\quad j\in[0,\dots,N_{db}],\quad j'\in[0,\dots,N_{ca}],$$
$$a\in[0,\dots,N_s],\quad b\in[0,\dots,N_{db}],\quad b'\in[0,\dots,N_{ca}],$$
$$p\in[0,\dots,N_s],\quad q\in[0,\dots,N_{db}],\quad q'\in[0,\dots,N_{ca}],$$
$$i+a+p = N_s,\quad j+b+q = N_{db},\quad j'+b'+q' = N_{ca} \qquad (2)$$

where m, n, and n' are the current waiting-queue lengths of the search, database, and cache layers; i, j, and j' are the current service-queue lengths of the search, database, and cache layers; a, b, and b' are the numbers of currently failed connections in the search, database, and cache layers; p, q, and q' are the numbers of currently idle, healthy connections in the search, database, and cache layers; M_s, M_db, and M_ca are the maximum waiting-queue lengths of the search, database, and cache layers; and N_s, N_db, and N_ca are the maximum numbers of connections supported by the search, database, and cache layers. π_x denotes the probability that the system is in steady state x; its value is obtained from the SRN model, and the probabilities over all states sum to 1.
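To make the state space of (1)-(2) concrete, the sketch below enumerates every index combination the ranges allow. This is a superset of the markings actually reachable in the SRN (reachability rules, such as requests queueing only when no idle connection exists, prune it further), and the function names are hypothetical. Even with every limit set to 1 there are already 216 combinations.

```python
from itertools import product
from typing import Iterator, List, Tuple

def layer_splits(N: int) -> List[Tuple[int, int, int]]:
    """All (busy, failed, idle) connection counts of one layer that sum to N, as in (2)."""
    return [(busy, failed, N - busy - failed)
            for busy in range(N + 1) for failed in range(N + 1 - busy)]

def enumerate_states(M_s: int, M_db: int, M_ca: int,
                     N_s: int, N_db: int, N_ca: int) -> Iterator[Tuple[int, ...]]:
    """Yield T = (m, n, n', i, j, j', a, b, b', p, q, q') for every combination allowed by (1)-(2)."""
    for m, n, n2 in product(range(M_s + 1), range(M_db + 1), range(M_ca + 1)):
        for s, d, c in product(layer_splits(N_s), layer_splits(N_db), layer_splits(N_ca)):
            # s, d, c are (busy, failed, idle) for the search, database and cache layers
            yield (m, n, n2, s[0], d[0], c[0], s[1], d[1], c[1], s[2], d[2], c[2])
```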

In the tagged customer model, a non-empty place P_fin means that the tagged customer's request has been processed; the absorbing state of the tagged customer is therefore defined by P_fin being non-empty,

where the reward function r_x(t) indicates whether the absorbing state has been reached at time t.

Solving for the reward function r_x(t) gives R_x(t), the probability that the tagged customer's request, starting from initial state x, has been absorbed by time t. The probability R(t) that request processing has completed by time t, i.e., the cumulative distribution of the system response time, then follows by weighting over the initial states: $R(t) = \sum_x \pi_x R_x(t)$.
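The weighting of R_x(t) by the steady-state probabilities π_x can be sketched as follows. The dictionary `pi` and the callback `R_x(x, t)` are hypothetical stand-ins for the SPNP steady-state results and the tagged customer model outputs.

```python
import math
from typing import Callable, Dict, Hashable

def response_time_cdf(pi: Dict[Hashable, float],
                      R_x: Callable[[Hashable, float], float], t: float) -> float:
    """R(t) = sum over initial states x of pi_x * R_x(t)."""
    if not math.isclose(sum(pi.values()), 1.0, abs_tol=1e-9):
        raise ValueError("steady-state probabilities must sum to 1")
    return sum(p * R_x(x, t) for x, p in pi.items())
```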

From R(t) it can be determined whether, in the current time slice, the latency at the γ-th percentile exceeds τ; the experience availability EA of the system is then computed directly with formula (6),

where one indicator denotes high-tail latency, i.e., the latency at the γ-th percentile is at most τ, TL(γ) ≤ τ, and the other denotes low-tail latency, i.e., the latency at the γ-th percentile exceeds τ, TL(γ) > τ.
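Reading TL(γ) off R(t) and computing EA can be sketched as below. Formula (6) is not reproduced in this text, so `experience_availability` assumes EA is the time-weighted fraction of slices whose γ-th percentile latency satisfies TL(γ) ≤ τ; treat that definition, and the bisection-based `tail_latency` helper, as illustrative.

```python
import math
from typing import Callable, Optional, Sequence

CDF = Callable[[float], float]

def tail_latency(cdf: CDF, gamma: float, tol: float = 1e-9) -> float:
    """TL(gamma): smallest t with cdf(t) >= gamma, by bisection (cdf assumed non-decreasing)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    hi = 1.0
    while cdf(hi) < gamma:          # widen the bracket until it contains the quantile
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("cdf never reaches gamma")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) >= gamma:
            hi = mid
        else:
            lo = mid
    return hi

def experience_availability(slice_cdfs: Sequence[CDF], gamma: float, tau: float,
                            durations: Optional[Sequence[float]] = None) -> float:
    """Assumed EA: share of operating time whose slice satisfies TL(gamma) <= tau."""
    if durations is None:
        durations = [1.0] * len(slice_cdfs)
    met = sum(d for F, d in zip(slice_cdfs, durations) if tail_latency(F, gamma) <= tau)
    return met / sum(durations)

if __name__ == "__main__":
    exp_cdf = lambda rate: (lambda t: 1.0 - math.exp(-rate * t))
    print(round(tail_latency(exp_cdf(1.0), 0.99), 4))   # 99th percentile of Exp(1)
```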

The method further includes an experimental verification step: an online search service test platform is built, and a testing tool sends requests following a specified distribution to simulate user requests and test the service capacity of the search engine. The performance indicators of the service obtained from testing are used as the theoretical values of the model, and the theoretical response time distribution is then obtained by combining the outputs of the access load prediction method and the tail latency analysis method.

Features and beneficial effects of the present invention:

The present invention establishes a complete and accurate evaluation model for online cloud services: it maps the real load intensity, hardware performance, and concurrency to the corresponding arrival rate, failure rate, and number of connections, derives the cumulative distribution function of response time, and thereby obtains the tail latency of online cloud services, characterizing their user experience quickly and accurately.

Description of drawings:

Figure 1 is the operation flow chart of the present invention;

Figure 2 shows the general architecture of an online search service;

Figure 3 shows the stochastic reward net model of the online search service;

Figure 4 shows the tagged customer model of the online search service;

Figure 5 shows the cumulative distribution function of response time as the maximum number of connections varies;

In Figure 5, panel (a) uses a request arrival rate of 100 and panel (b) a request arrival rate of 900.

Figure 6 shows the exponential smoothing forecast of one week of visits.

Detailed description of the embodiments

Building on conventional service modeling and analysis, the proposed prediction method combines an access load prediction method with a tail latency analysis method to accurately predict the response time distribution of online cloud services. We apply the method to an online search service and add to the model the node failure behavior and the acceleration provided by cache servers, making it more consistent with how current cloud services are deployed and run. Experimental results show that the invention analyzes the tail latency of online cloud services accurately and effectively, providing a reference for guaranteeing user experience.

The present invention proposes a performance prediction method for online cloud services based on time series forecasting and a stochastic reward net. It evaluates the response time of software applications in a specific hardware environment and accurately characterizes tail latency at percentiles such as the 90th and 99th.

The proposed cloud service response time distribution prediction method, based on time series and a stochastic reward net, consists of two modules: load intensity prediction and response time distribution prediction.

(1) Load intensity prediction based on the Winters seasonal exponential smoothing algorithm

The Winters seasonal exponential smoothing algorithm can model time series with seasonal variation. User access to online cloud services follows seasonal patterns: for example, taking the 24 hours of a day as one cycle (season), most user visits are concentrated in the daytime, so server load is concentrated during the day and servers are often idle at night.

The Winters linear and seasonal exponential smoothing method uses three equations, each smoothing one of the three components of the model (the stationary level, the trend, and the seasonal component) and each containing its own parameter. The three equations are as follows:

$$b_t = \gamma\,(S_t - S_{t-1}) + (1-\gamma)\,b_{t-1}, \qquad 0<\gamma<1 \qquad (1\text{-}2)$$

where S_t is the single exponential smoothing value for period t+1, x_t is the latest observation, b_t is the correction term for the linear trend, the trend value is expressed as the difference between two adjacent smoothing values, L is the length of the season, and I is the seasonal correction coefficient.

For period t+m, the forecast is:

$$F_{t+m} = (S_t + b_t\,m)\,I_{t+m-L}, \qquad 0<\beta<1 \qquad (1\text{-}4)$$

The values of α, β, and γ are determined by trial and error so as to minimize the mean squared error (MSE), which yields the model. The MSE is expressed as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(x_t - F_t\right)^2$$

where n is the number of observations and F_t is the fitted value for period t.

To illustrate the method, the model is applied here to one month of visit data from the NASA website, which serves as the input data set for a time series analysis of the access arrival rate. The forecast is shown in Figure 6.

In this experiment the forecast achieves an MSE of 0.19047, with α, β, and γ approximately equal to 0.27, 0.0, and 0.0. The method can forecast the regular access rate of an online cloud service (denoted λ). As the figure shows, visits are heavier on weekdays (Monday to Friday), which are treated as the peak period, and lighter on Saturday and Sunday, which are treated as the trough period. The two periods are forecast separately, giving arrival rates λ₁ and λ₂, which serve as input parameters of the SRN model in the second module.

(2) Response time prediction based on the Stochastic Reward Net (SRN)

Given the predicted load arrival rates of the cloud computing service in different time periods, the present invention further provides a tail latency prediction method that combines a stochastic reward net with a tagged customer model to accurately model the user-perceived response time distribution. Because the SRN model only accepts as input a Poisson arrival stream with a single rate parameter, the different arrival rates at the peak and trough from the previous stage are modeled separately to explore the cloud service response time distribution under different access intensities. Computing the response time distribution takes three steps: (1) divide the total operating time t into n time slices; (2) obtain the response time distribution within each time slice, taking system failures into account; (3) obtain the overall response time distribution across all time slices, derive the tail latency at each percentile, and from it compute the experience availability.

In a concrete implementation, the present invention first solves the SRN model with a tool such as SPNP (Stochastic Petri Net Package) to obtain the steady-state probabilities of the model, which serve as initial-state probabilities. It then applies the tagged customer model to obtain the response time distribution from each initial state and, through data processing and calculation, obtains the overall response time distribution of user requests. The data processing and calculation proceed as follows.

Assume the initial state of the system is:

$$T = [m, n, n', i, j, j', a, b, b', p, q, q'] \qquad (1)$$

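$$m\in[0,\dots,M_s],\quad n\in[0,\dots,M_{db}],\quad n'\in[0,\dots,M_{ca}],$$
$$i\in[0,\dots,N_s],\quad j\in[0,\dots,N_{db}],\quad j'\in[0,\dots,N_{ca}],$$
$$a\in[0,\dots,N_s],\quad b\in[0,\dots,N_{db}],\quad b'\in[0,\dots,N_{ca}],$$
$$p\in[0,\dots,N_s],\quad q\in[0,\dots,N_{db}],\quad q'\in[0,\dots,N_{ca}],$$
$$i+a+p = N_s,\quad j+b+q = N_{db},\quad j'+b'+q' = N_{ca} \qquad (2)$$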

where m, n, and n' are the current waiting-queue lengths of the search, database, and cache layers; i, j, and j' are the current service-queue lengths of the search, database, and cache layers; a, b, and b' are the numbers of currently failed connections in the search, database, and cache layers; p, q, and q' are the numbers of currently idle, healthy connections in the search, database, and cache layers; M_s and M_db are the maximum waiting-queue lengths of the search and database layers; and N_s, N_db, and N_ca are the maximum numbers of connections supported by the search, database, and cache layers.

Let π_x be the probability that the system is in steady state x. The value of π_x can be obtained from the SRN model, and the probabilities over all states sum to 1.
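As a rough illustration of what the steady-state solution provides, the sketch below computes π for a single layer reduced to a birth-death chain: Poisson arrivals at rate λ, blocking when the waiting queue is full, and the marking-dependent service rate min(K, N)·μ. It ignores failures, the cache, and the interaction between layers, so it is not the patent's SRN and does not use SPNP; `single_layer_steady_state` is a hypothetical name.

```python
from typing import List

def single_layer_steady_state(lam: float, mu: float, connections: int,
                              max_wait: int) -> List[float]:
    """Steady-state probabilities pi[k] of k requests (waiting + in service) in one layer.

    Arrivals at rate lam are blocked once the waiting queue is full; the service rate
    in state k is min(k, connections) * mu, mirroring the K*mu rate of the SRN.
    """
    capacity = connections + max_wait
    weights = [1.0]
    for k in range(1, capacity + 1):
        # Birth-death balance: pi[k] * min(k, c) * mu = pi[k-1] * lam
        weights.append(weights[-1] * lam / (min(k, connections) * mu))
    total = sum(weights)
    return [w / total for w in weights]
```

For one connection and no waiting room this reduces to π₀ = 1/(1 + λ/μ), the classic M/M/1/1 result.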

In the tagged customer model, a non-empty place P_fin means that the tagged customer's request has been processed. The absorbing state of the tagged customer is therefore defined by P_fin being non-empty,

where the reward function r_x(t) indicates whether the absorbing state has been reached at time t.
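R_x(t) is the probability that the tagged request has reached the absorbing state by time t, i.e., the expected value of the reward r_x(t). The sketch below computes such a probability by uniformization on a small absorbing continuous-time Markov chain. The chain is a hypothetical simplification (one FCFS layer with c connections, k requests ahead of the tagged one, no failures, no cache), not the tagged customer SRN of Figure 4, and all names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def absorption_cdf(Q: Matrix, p0: List[float], t: float, absorbing: int,
                   eps: float = 1e-12, max_terms: int = 100_000) -> float:
    """P(chain is in state `absorbing` at time t) for generator Q, via uniformization."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))
    if q == 0.0 or t == 0.0:
        return p0[absorbing]
    # Uniformized discrete-time chain P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    v = list(p0)
    weight = math.exp(-q * t)          # Poisson(q t) probability of zero jumps
    covered = weight
    result = weight * v[absorbing]
    k = 0
    while 1.0 - covered > eps and k < max_terms:
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        k += 1
        weight *= q * t / k            # Poisson probability of exactly k jumps
        covered += weight
        result += weight * v[absorbing]
    return result

def tagged_customer_chain(k_ahead: int, c: int, mu: float) -> Tuple[Matrix, int]:
    """Generator for one tagged request behind k_ahead others in an FCFS layer with
    c connections; returns (Q, index of the absorbing 'done' state)."""
    d = max(0, k_ahead - c + 1)        # departures needed before a connection frees up
    n = d + 2                          # states 0..d-1 waiting, d in service, d+1 done
    Q = [[0.0] * n for _ in range(n)]
    for s in range(d):
        Q[s][s] = -c * mu              # all c connections busy: next departure at rate c*mu
        Q[s][s + 1] = c * mu
    Q[d][d] = -mu                      # tagged request in service
    Q[d][d + 1] = mu
    return Q, d + 1

def tagged_response_cdf(k_ahead: int, c: int, mu: float, t: float) -> float:
    """R_x(t) for the initial state x summarised by (k_ahead, c)."""
    Q, done = tagged_customer_chain(k_ahead, c, mu)
    p0 = [1.0] + [0.0] * (len(Q) - 1)
    return absorption_cdf(Q, p0, t, done)
```

As a sanity check, with fewer requests ahead than connections the result reduces to the exponential CDF 1 − e^{−μt}, and with one request ahead of a single connection it is the Erlang-2 CDF 1 − e^{−μt}(1 + μt).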

Solving for the reward function r_x(t) gives R_x(t), the probability that the tagged customer's request, starting from initial state x, has been absorbed by time t. The probability R(t) that request processing has completed by time t, i.e., the cumulative distribution of system response time, can then be computed as $R(t) = \sum_x \pi_x R_x(t)$.

From R(t), it is easy to determine whether, in the current time slice, the latency at the γ-th percentile exceeds τ. The experience availability EA of the system is then computed directly with formula (6),

where one indicator denotes high-tail latency, i.e., the latency at the γ-th percentile is at most τ, TL(γ) ≤ τ, and the other denotes low-tail latency, i.e., the latency at the γ-th percentile exceeds τ, TL(γ) > τ.

The specific implementations, structures, features, and effects of the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

1. General architecture of the online search service:

The present invention builds its model on the general architecture of online search services, shown in Figure 2. An online search service consists of three main components: a web crawler, an index builder, and a search component. The web crawler fetches accessible web pages from the Internet. The index builder associates the keywords found on these pages with the page names to generate index information, which is stored in a database and used to answer user search queries. The search component accepts a user's search request, performs text analysis, and generates a list of search results by searching the index information stored in the database. In this process, the search component also weights each page in the list according to the information in the index and returns the final search results to the user.

The search component of the online search service comprises four layers: a search layer, a database layer, and two load-balancing layers. The present invention ignores the two load-balancing layers because their processing time is much smaller than that of the search and database layers. The database layer is divided into a memory-based cache database and a disk-based database. Depending on the number of users, each layer can deploy multiple server instances running in different virtual machines. Each instance, whether a search instance or a database instance, has a thread pool for accepting requests. When a search request arrives, it first waits in the search queue. The load balancer reads the search queue and dispatches the request to a particular search-layer instance for text analysis, and that instance assigns the request a connection thread from its thread pool. Once a connection thread is assigned a request, it remains occupied and is not released until the user receives the search results. After text analysis extracts the keywords, the request is forwarded to a database instance. The database layer first decides whether to read from the cache based on whether cached data exist and whether they have expired: if valid data are found in the cache, they are read from the cache and returned; otherwise the disk-based database is queried. Using configured weights for accessing the cache and the database (for example, 0.95 and 0.05), a thread is assigned to a specific database instance to search for all web pages containing the keywords. Finally, the response editor in the search instance builds the list of web pages and sends the search results to the user.
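The cache and database weights act as routing probabilities. A minimal sketch, assuming mean service times of 1/`mu_cache` for the cache and 1/`mu_db` for the disk-based database (both hypothetical rates), shows the resulting mean database-layer service time and a random routing decision; the function names are illustrative.

```python
import random

def expected_db_service_time(w_cache: float, mu_cache: float, mu_db: float) -> float:
    """Mean database-layer service time when a fraction w_cache of lookups hit the cache."""
    if not 0.0 <= w_cache <= 1.0:
        raise ValueError("w_cache must be a probability")
    return w_cache / mu_cache + (1.0 - w_cache) / mu_db

def route_lookup(rng: random.Random, w_cache: float) -> str:
    """Pick the backend for one lookup: 'cache' with probability w_cache, else 'disk'."""
    return "cache" if rng.random() < w_cache else "disk"
```

With weights 0.95 and 0.05 and, say, rates of 100 and 10 lookups per second, the mean is 0.95/100 + 0.05/10 = 0.0145 s.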

2. Stochastic reward net model of the online search service:

The present invention obtains the initial-state probabilities of the system from the stochastic reward net model of the online search service, shown in Figure 3. Timed transition T_a represents the arrival of requests to the system. Places P_sw and P_dbw represent the waiting queues of the search layer and database layer respectively, and T_a is enabled only when the search-layer waiting queue is shorter than its maximum length. When T_a (t_trans) fires, a token is added to place P_sw (P_dbw); that is, a request joins the search layer (database layer) and waits to be served there. Places P_s and P_db hold the numbers of idle resources, i.e., idle connections, in the search and database layers. If a token is in P_sw and P_s contains at least one token, the token moves from P_sw to P_sp and P_s loses one token; that is, a request moves from the waiting queue to the service queue and consumes one idle connection. Places P_sp and P_dbp represent the service queues of the search and database layers. The symbol # indicates that a transition's actual firing rate is marking-dependent: the actual firing rate of timed transition T_sp is Kμ_s, where K is the number of tokens in P_sp. After passing through the search-layer service queue, a token moves to place P_trans, meaning the request has been served and has left the search layer. Immediate transition t_trans moves requests from the search layer to the database layer. Immediate transition t_o represents a request being rejected by the system and is enabled only when the database-layer waiting queue is full; otherwise the request moves to P_dbw. As in the search layer, requests in the database layer move to the service queue only when a resource is idle. Finally, when transition T_dbp fires, a token is removed from P_dbp and the connections it occupied in the search and database layers are both released, meaning the request has completed the entire search process and left the system.
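The token game for the search layer can be sketched with a marking stored as a dictionary from place names to token counts. Only three behaviors from the description are shown: T_a adds to P_sw while the waiting queue is below M_s, a request moves from P_sw to P_sp by consuming a token of P_s, and T_sp fires at the marking-dependent rate Kμ_s. The name of the P_sw-to-P_sp transition is not given in the text, so `fire_start_service` and the other function names are hypothetical.

```python
from typing import Dict

Marking = Dict[str, int]

def fire_arrival(m: Marking, M_s: int) -> bool:
    """Timed transition T_a: add a request to P_sw if the search waiting queue is not full."""
    if m["P_sw"] >= M_s:
        return False
    m["P_sw"] += 1
    return True

def fire_start_service(m: Marking) -> bool:
    """Move one request P_sw -> P_sp, consuming one idle connection token from P_s."""
    if m["P_sw"] < 1 or m["P_s"] < 1:
        return False
    m["P_sw"] -= 1
    m["P_s"] -= 1
    m["P_sp"] += 1
    return True

def rate_T_sp(m: Marking, mu_s: float) -> float:
    """Marking-dependent (#) firing rate of T_sp: K * mu_s with K = #P_sp."""
    return m["P_sp"] * mu_s
```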

In addition, the stochastic reward net model proposed by the present invention can describe fault occurrence and fault recovery in the system. As shown in Figure 3, states P_sdown and P_dbdown represent the numbers of failed connections in the search layer and the database layer respectively. If a search instance fails, both the connections that the instance is serving in state P_sp and its idle connections in state P_s fail. Transition T_sd represents the failure process of the search layer. The jagged lines in the figure represent the transfer of multiple tokens at once. Let I_s and I_db denote the numbers of instances in the search layer and the database layer respectively. When one search-layer instance fails, 1/I_s of the tokens in state P_sp and 1/I_s of the tokens in state P_s are transferred to state P_sdown simultaneously; that is, all connections of the failed instance are moved there to await recovery. The delayed transition T_sr represents the recovery of a failed instance. Once T_sr fires, all connections of the failed instance move from state P_sdown to state P_s, i.e., the failed instance is restored as normal idle resources. Failure and recovery in the database layer work in the same way as in the search layer. Note that because the search-layer and database-layer resources held by a request are released only when the request leaves the database layer, a database-layer failure not only moves the failed database instance's connections but also releases the search-layer connection resources those requests occupied. The specific guard functions are shown in Table 1.
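The failure and recovery transitions can be sketched the same way. This is a hedged illustration rather than the patent's model definition: the dict-based marking and the function names are hypothetical, "1/I_s of the tokens" is interpreted with floor division when the count is not divisible, and recovery is assumed to return one instance's worth of connections (N_s/I_s) to P_s.

```python
from __future__ import annotations


def fail_search_instance(marking: dict[str, int], i_s: int) -> int:
    """T_sd: move 1/I_s of the tokens in P_sp and in P_s to P_sdown at once
    (the multi-token 'jagged arc' transfer). Returns the tokens moved."""
    busy = marking["P_sp"] // i_s
    idle = marking["P_s"] // i_s
    marking["P_sp"] -= busy
    marking["P_s"] -= idle
    marking["P_sdown"] += busy + idle
    return busy + idle


def recover_search_instance(marking: dict[str, int], per_instance: int) -> int:
    """T_sr: return one instance's connections from P_sdown to the idle pool P_s."""
    moved = min(per_instance, marking["P_sdown"])
    marking["P_sdown"] -= moved
    marking["P_s"] += moved
    return moved


def fail_db_instance(marking: dict[str, int], i_db: int) -> int:
    """Database-layer analogue of T_sd. Requests in service at the failed
    database instance also release the search-layer connections they hold,
    so those tokens go back to P_s."""
    busy = marking["P_dbp"] // i_db
    idle = marking["P_db"] // i_db
    marking["P_dbp"] -= busy
    marking["P_db"] -= idle
    marking["P_dbdown"] += busy + idle
    marking["P_s"] += busy
    return busy + idle


m = {"P_sp": 4, "P_s": 2, "P_sdown": 0}
fail_search_instance(m, i_s=2)
print(m)   # {'P_sp': 2, 'P_s': 1, 'P_sdown': 3}
recover_search_instance(m, per_instance=3)
print(m)   # {'P_sp': 2, 'P_s': 4, 'P_sdown': 0}
```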

Guard    Guard function
g1       #P_sw < M_s ? 1 : 0
g2       #P_dbw ≥ M_db ? 1 : 0
g3       #P_dbw < M_db ? 1 : 0

Table 1
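Table 1's guards can be written as predicates over the marking that return 1 (enabled) or 0 (disabled). In this sketch the pairing of guards with transitions is inferred from the surrounding text rather than stated in the table (g1 presumably gates admission into the search waiting queue P_sw, g2 the rejection transition, g3 the transfer t_trans), and the dict-based marking is an assumed representation.

```python
from __future__ import annotations

from typing import Callable, Dict

Marking = Dict[str, int]
Guard = Callable[[Marking], int]


def table1_guards(m_s: int, m_db: int) -> dict[str, Guard]:
    """Guard functions of Table 1; '#P' is the token count of place P."""
    return {
        # presumably gates admission: room left in the search waiting queue
        "g1": lambda mk: 1 if mk["P_sw"] < m_s else 0,
        # database waiting queue full: the rejection transition may fire
        "g2": lambda mk: 1 if mk["P_dbw"] >= m_db else 0,
        # database waiting queue not full: t_trans may fire
        "g3": lambda mk: 1 if mk["P_dbw"] < m_db else 0,
    }


guards = table1_guards(m_s=2, m_db=3)
state = {"P_sw": 2, "P_dbw": 3}
print({name: g(state) for name, g in guards.items()})  # {'g1': 0, 'g2': 1, 'g3': 0}
```

Since g2 and g3 are complementary, exactly one of the rejection and t_trans transitions is enabled for any marking, matching the either/or routing described for the search layer.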

3. The tagged-customer model of the online search service is as follows:

The present invention uses the tagged-customer model to obtain the response time distribution for each initial state. The tagged-customer model of the online search service is shown in Figure 4. It is similar to the stochastic reward net model. Note that state P_csw represents the arrival queue of a single tagged customer, with a queue length of 1. Let m, n, n′, i, j, j′, a, b, b′, p, q and q′ be the numbers of tokens in states P_sw, P_dbw, P_caw, P_sp, P_dbp, P_cap, P_sdown, P_dbdown, P_cachedown, P_s, P_db and P_ca respectively in the initial state, i.e., the state of the system just before the tagged customer's request arrives. In other words, when the tagged customer's request arrives, the search layer (cache layer, database layer) has m (n′, n) requests in its waiting queue, i (j′, j) requests in its service queue, a (b′, b) failed connections, and p (q′, q) idle connections available. The transient transition t_csw fires only when state P_sw is empty and P_s is not empty. Likewise, the transient transition t_cdbw fires only when state P_dbw is empty and P_db is not empty. The specific guard functions are shown in Table 2.

Guard    Guard function
g1       #P_dbw ≥ M_db ? 1 : 0
g2       #P_dbw < M_db ? 1 : 0
g3       #P_s > 0 and #P_sw == 0 ? 1 : 0
g4       #P_db > 0 and #P_dbw == 0 ? 1 : 0
g5       #P_dbw ≥ M_db ? 1 : 0
g6       #P_dbw < M_db ? 1 : 0

Table 2
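Table 2 translates the same way. The sketch below encodes g3 and g4 (the conditions under which t_csw and t_cdbw let the tagged customer enter a service queue: the corresponding waiting queue is empty and an idle connection exists) alongside the database-queue capacity guards. The dict-based marking and the helper names are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable, Dict

Marking = Dict[str, int]


def table2_guards(m_db: int) -> dict[str, Callable[[Marking], int]]:
    """Guard functions of Table 2 as marking predicates returning 1 or 0."""

    def db_queue_full(mk: Marking) -> int:
        return 1 if mk["P_dbw"] >= m_db else 0

    def db_queue_has_room(mk: Marking) -> int:
        return 1 if mk["P_dbw"] < m_db else 0

    def tagged_enters_search(mk: Marking) -> int:
        # t_csw: search waiting queue empty and an idle search connection exists
        return 1 if mk["P_s"] > 0 and mk["P_sw"] == 0 else 0

    def tagged_enters_db(mk: Marking) -> int:
        # t_cdbw: database waiting queue empty and an idle database connection exists
        return 1 if mk["P_db"] > 0 and mk["P_dbw"] == 0 else 0

    return {"g1": db_queue_full, "g2": db_queue_has_room,
            "g3": tagged_enters_search, "g4": tagged_enters_db,
            "g5": db_queue_full, "g6": db_queue_has_room}


initial = {"P_s": 1, "P_sw": 0, "P_db": 0, "P_dbw": 0}
g = table2_guards(m_db=4)
print(g["g3"](initial), g["g4"](initial))   # 1 0
```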

4. Response time cumulative distribution function:

The present invention uses tools such as SPNP to solve the stochastic reward net model and obtain model values for the distribution of user-request response times. For experimental validation, the invention takes the Apache Solr search engine as an example and builds an online search service testbed: testing tools such as JMeter send requests that follow a specific distribution (e.g., an exponential distribution), simulating users issuing requests, in order to test the service capability of the search engine. The performance metrics measured in these tests (e.g., service time and queuing time) are used as the theoretical values of the model, and the theoretical response time distribution is then obtained from the model's output. Figure 5 shows the model values and experimental values of the response time distribution under varying request arrival rates, verifying the correctness of the model. The user experience of the cloud service can also be evaluated by analyzing the tail latency of the online search service at each percentile.
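On the experimental side, requests follow a specific distribution such as the exponential, and measured latencies are compared with the model's CDF. The Python sketch below shows both halves using only the standard library: generating request timestamps with exponential inter-arrival gaps (a Poisson arrival process), and turning measured latencies into an empirical CDF and percentile tail latencies. It does not drive JMeter or Solr; the rate, seed and the nearest-rank percentile definition are illustrative assumptions.

```python
from __future__ import annotations

import math
import random
from bisect import bisect_right


def poisson_arrival_times(rate: float, n: int, seed: int = 0) -> list[float]:
    """Timestamps of n requests whose inter-arrival gaps are Exp(rate)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times


def empirical_cdf(latencies: list[float]):
    """Return F(t) = fraction of measured latencies <= t."""
    data = sorted(latencies)
    return lambda t: bisect_right(data, t) / len(data)


def nearest_rank_percentile(latencies: list[float], q: float) -> float:
    """Tail latency at quantile q (0 < q <= 1) by the nearest-rank method."""
    data = sorted(latencies)
    rank = max(1, math.ceil(q * len(data)))
    return data[rank - 1]


arrivals = poisson_arrival_times(rate=50.0, n=1000)   # about 50 requests/s
print(len(arrivals), arrivals[-1] > arrivals[0])      # 1000 True
```

Plotting `empirical_cdf` of the measured latencies against the model's R(t) at the same arrival rate gives the kind of model-versus-experiment comparison that Figure 5 reports.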

Claims

1. A response time distribution prediction method for online cloud services, characterized in that an access load prediction method and a tail delay analysis method are combined to predict the response time distribution of online cloud services, the access load prediction results serving as the input parameters of the tail delay analysis; the access load is forecast by load intensity prediction based on Winters' seasonal exponential smoothing algorithm. Winters' linear and seasonal exponential smoothing method uses three equations, each of which smooths one of the three components of the model (stationary, trend and seasonal) and contains an associated parameter. The three equations are as follows:

$$b_t = \gamma\,(S_t - S_{t-1}) + (1-\gamma)\,b_{t-1}, \quad 0<\gamma<1 \qquad (1\text{-}2)$$

Here, S_t is the exponentially smoothed value for period t+1, x_t is the latest observation, b_t is the correction term for the linear trend, and α, β and γ are smoothing coefficients; the trend value is represented by the difference between two adjacent smoothed values. L is the season length, and I_t is the seasonal correction coefficient for period t. The forecast for period t+m is expressed as:

$$F_{t+m} = (S_t + b_t\,m)\,I_{t-L+m}, \quad 0<\beta<1 \qquad (1\text{-}4)$$
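Because equations (1-1) and (1-3) did not survive extraction, the Python sketch below fills them in with the standard multiplicative Holt-Winters recursions (level S_t = α x_t / I_{t-L} + (1−α)(S_{t−1} + b_{t−1}), seasonal I_t = β x_t / S_t + (1−β) I_{t−L}). That choice and the simple first-two-seasons initialization are assumptions, not details given in the patent; the trend update uses the difference of adjacent smoothed values, and the forecast follows (1-4).

```python
from __future__ import annotations


def holt_winters(x: list[float], season: int, alpha: float, beta: float,
                 gamma: float) -> tuple[float, float, list[float], list[float]]:
    """Multiplicative Holt-Winters smoothing over observations x.

    Returns (S_t, b_t, seasonal indices I_0..I_t, one-step-ahead forecasts
    for periods season..len(x)-1)."""
    L = season
    if len(x) < 2 * L:
        raise ValueError("need at least two full seasons of data")
    first, second = sum(x[:L]) / L, sum(x[L:2 * L]) / L
    level, trend = first, (second - first) / L      # initial S and b (assumed)
    seasonal = [v / first for v in x[:L]]           # initial I_0..I_{L-1} (assumed)
    forecasts = []
    for t in range(L, len(x)):
        forecasts.append((level + trend) * seasonal[t - L])   # eq. (1-4) with m = 1
        prev_level = level
        level = alpha * x[t] / seasonal[t - L] + (1 - alpha) * (level + trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * trend   # trend update
        seasonal.append(beta * x[t] / level + (1 - beta) * seasonal[t - L])
    return level, trend, seasonal, forecasts


def forecast(level: float, trend: float, seasonal: list[float], season: int,
             m: int) -> float:
    """Eq. (1-4): F_{t+m} = (S_t + b_t * m) * I_{t-L+m}, valid for 1 <= m <= L."""
    if not 1 <= m <= season:
        raise ValueError("m must be between 1 and the season length")
    t = len(seasonal) - 1                           # index of the last observed period
    return (level + trend * m) * seasonal[t - season + m]


counts = [1.0, 2.0, 1.0, 2.0] * 3                   # toy series with season length 2
S, b, I, fitted = holt_winters(counts, season=2, alpha=0.5, beta=0.3, gamma=0.2)
print(round(forecast(S, b, I, season=2, m=1), 6))   # 1.0
```

The seasonal indices are stored by period index, so I_{t−L+m} is simply `seasonal[t - season + m]`; this keeps the code a direct transcription of the formula.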

The values of α, β and γ are determined through repeated trials so as to minimize the mean squared error (MSE), which yields the model. The mean squared error is expressed as follows:

The peak and trough periods of access volume are predicted separately, giving arrival rates λ₁ and λ₂ respectively, which are used as the input parameters of the tail delay analysis.
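The tuning of α, β and γ by repeated trials can be automated as a coarse grid search. This Python sketch assumes MSE = (1/n) Σ (x_t − F_t)² over the one-step-ahead forecasts, since the patent's MSE formula was lost in extraction, and re-includes a compact standard multiplicative Holt-Winters recursion so that it runs on its own. The 0.1 grid spacing and the `arrival_rates` helper (converting forecast counts per interval into λ₁ and λ₂) are hypothetical choices.

```python
from __future__ import annotations

import itertools


def one_step_forecasts(x, L, alpha, beta, gamma):
    """Compact multiplicative Holt-Winters; returns forecasts for x[L:]."""
    first, second = sum(x[:L]) / L, sum(x[L:2 * L]) / L
    level, trend = first, (second - first) / L
    seasonal = [v / first for v in x[:L]]
    out = []
    for t in range(L, len(x)):
        out.append((level + trend) * seasonal[t - L])
        prev = level
        level = alpha * x[t] / seasonal[t - L] + (1 - alpha) * (level + trend)
        trend = gamma * (level - prev) + (1 - gamma) * trend
        seasonal.append(beta * x[t] / level + (1 - beta) * seasonal[t - L])
    return out


def mse(actual, predicted):
    """Mean squared error between observations and forecasts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def tune(x, L, grid=None):
    """Return (mse, alpha, beta, gamma) with the smallest in-sample MSE."""
    grid = grid or [k / 10 for k in range(1, 10)]   # values strictly inside (0, 1)
    best = None
    for a, b, g in itertools.product(grid, repeat=3):
        err = mse(x[L:], one_step_forecasts(x, L, a, b, g))
        if best is None or err < best[0]:
            best = (err, a, b, g)
    return best


def arrival_rates(peak_count, trough_count, interval_s):
    """Hypothetical conversion of forecast request counts per interval into
    the peak and trough arrival rates lambda_1, lambda_2 (requests/second)."""
    return peak_count / interval_s, trough_count / interval_s


series = [10, 20, 12, 22, 14, 24, 16, 26]           # toy counts, season length 2
err, a, b, g = tune(series, L=2)
print(err, a, b, g)
```

An exhaustive 9 × 9 × 9 grid is cheap for short series; a finer grid or a numerical optimizer could replace it without changing the MSE objective.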

2. The response time distribution prediction method for online cloud services as claimed in claim 1, characterized in that the tail delay analysis is based on a stochastic reward net. Specifically, the different arrival rates at the peaks and troughs obtained in the previous stage are modeled separately to explore the response time distribution of the cloud service under different access intensities. Calculating the response time distribution requires three steps:

(1) divide the total operation time t into n time slices;
(2) in each time slice, obtain the corresponding response time distribution, taking system failures into account;
(3) from the response time distributions of all time slices, obtain the overall response time distribution and, from it, the tail delay at each percentile, which yields the experience availability.
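The three steps can be sketched end to end in Python, with everything model-specific replaced by a stand-in. Each slice's response-time CDF is taken to be the M/M/1 FCFS sojourn-time distribution 1 − e^{−(μ−λ)t}, a textbook closed form used only for illustration (it is not the patent's stochastic reward net and omits the failure behaviour step (2) calls for). Slices are assumed to be of equal length, so the overall CDF is their plain average, and the per-slice arrival rates are hypothetical peak and trough values.

```python
from __future__ import annotations

import math
from typing import Callable

CDF = Callable[[float], float]


def mm1_response_cdf(lam: float, mu: float) -> CDF:
    """Stand-in slice CDF: M/M/1 FCFS response time ~ Exp(mu - lam)."""
    if lam >= mu:
        raise ValueError("arrival rate must be below service rate")
    return lambda t: 1.0 - math.exp(-(mu - lam) * t) if t > 0 else 0.0


def overall_cdf(slice_cdfs: list[CDF]) -> CDF:
    """Step (3): with equal-length slices the overall CDF is the mean."""
    return lambda t: sum(F(t) for F in slice_cdfs) / len(slice_cdfs)


def tail_latency(F: CDF, q: float, tol: float = 1e-9) -> float:
    """Smallest t with F(t) >= q (0 < q < 1), by bisection on a monotone CDF."""
    if not 0 < q < 1:
        raise ValueError("q must lie in (0, 1)")
    lo, hi = 0.0, 1.0
    while F(hi) < q:          # grow the bracket until it contains the quantile
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= q:
            hi = mid
        else:
            lo = mid
    return hi


# Step (1): the run is divided into n = 4 equal slices with their own arrival rates.
mu = 100.0                                   # requests/s per server (assumed)
slice_rates = [80.0, 80.0, 30.0, 30.0]       # hypothetical peak, peak, trough, trough
# Step (2): one response-time CDF per slice (stand-in model).
F = overall_cdf([mm1_response_cdf(lam, mu) for lam in slice_rates])
print(round(tail_latency(F, 0.99) * 1000, 1), "ms at p99")
```

Swapping `mm1_response_cdf` for the per-slice output of the stochastic reward net and tagged-customer model leaves steps (1) and (3) unchanged.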

3. The response time distribution prediction method for online cloud services as claimed in claim 2, characterized in that, in a specific implementation, the tail delay analysis first uses the SPNP tool to solve the stochastic reward net model and obtain the initial-state probabilities of the model, and then uses the tagged-customer model to find the response time distribution for each initial state; after data processing and calculation, the overall response time distribution of user requests is obtained. The data processing and calculation proceed as follows. The initial state of the system is:

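$$T = [m, n, n', i, j, j', a, b, b', p, q, q'] \qquad (1)$$

$$m \in \{0,\dots,M_s\},\quad n \in \{0,\dots,M_{db}\},\quad n' \in \{0,\dots,M_{ca}\},$$

$$i \in \{0,\dots,N_s\},\quad j \in \{0,\dots,N_{db}\},\quad j' \in \{0,\dots,N_{ca}\},$$

$$a \in \{0,\dots,N_s\},\quad b \in \{0,\dots,N_{db}\},\quad b' \in \{0,\dots,N_{ca}\},$$

$$p \in \{0,\dots,N_s\},\quad q \in \{0,\dots,N_{db}\},\quad q' \in \{0,\dots,N_{ca}\},$$

$$i + a + p = N_s,\quad j + b + q = N_{db},\quad j' + b' + q' = N_{ca} \qquad (2)$$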
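Formulas (1) and (2) define a finite set of candidate initial states. A minimal Python sketch that enumerates them, as an illustration only: it yields every tuple in the order T = [m, n, n′, i, j, j′, a, b, b′, p, q, q′] that satisfies the range and conservation constraints. The states actually reachable in the stochastic reward net may be a subset of this set, and each state's probability π_x would still come from solving the model.

```python
from __future__ import annotations

import itertools
from typing import Iterator


def connection_splits(n_conn: int) -> list[tuple[int, int, int]]:
    """All (busy, failed, idle) triples with busy + failed + idle == n_conn."""
    return [(busy, failed, n_conn - busy - failed)
            for busy in range(n_conn + 1)
            for failed in range(n_conn + 1 - busy)]


def initial_states(M_s: int, M_db: int, M_ca: int,
                   N_s: int, N_db: int, N_ca: int) -> Iterator[tuple[int, ...]]:
    """Yield T = (m, n, n', i, j, j', a, b, b', p, q, q') satisfying (1)-(2)."""
    queues = itertools.product(range(M_s + 1), range(M_db + 1), range(M_ca + 1))
    for (m, n, n2) in queues:
        for (i, a, p), (j, b, q), (j2, b2, q2) in itertools.product(
                connection_splits(N_s), connection_splits(N_db), connection_splits(N_ca)):
            yield (m, n, n2, i, j, j2, a, b, b2, p, q, q2)


count = sum(1 for _ in initial_states(1, 0, 0, 2, 1, 1))
print(count)   # 2 * 6 * 3 * 3 = 108
```

Each layer with N connections contributes C(N+2, 2) (busy, failed, idle) splits, so the candidate set grows quickly with the connection limits; this is why the model is solved numerically with a tool such as SPNP rather than by hand.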

Here, m, n and n′ are the current waiting-queue lengths of the search, database and cache layers; i, j and j′ are the current service-queue lengths of the search, database and cache layers; a, b and b′ are the numbers of currently failed connections in the search, database and cache layers; and p, q and q′ are the numbers of currently idle, healthy connections in the search, database and cache layers. M_s and M_db are the maximum waiting-queue lengths of the search and database layers (M_ca likewise bounds the cache-layer waiting queue), and N_s, N_db and N_ca are the maximum numbers of connections supported by the search, database and cache layers. π_x is the probability that the system is in steady state x; the value of π_x is obtained through the stochastic reward net model, and

Meanwhile, in the tagged-customer model, a non-empty state P_fin means that the tagged customer's request has been processed. The absorbing state of the tagged customer is therefore defined as follows,

where the reward function r_x(t) indicates whether the absorbing state has been reached at time t;

By solving for the reward function r_x(t), we obtain the probability R_x(t) that the tagged customer's request has been absorbed by time t under initial state x; this gives the cumulative distribution of the system response time,

From R(t), we can determine whether the delay at the γth percentile exceeds τ in the current time slice, and then calculate the experience availability EA of the system directly with formula (6),

where the high-tail delay term corresponds to the case in which the delay at the γth percentile is less than or equal to τ, i.e., TL(γ) ≤ τ, and the low-tail delay term corresponds to the case in which the delay at the γth percentile is greater than τ, i.e., TL(γ) > τ.
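Because the formulas for r_x(t), R(t) and EA were lost in extraction, the following Python sketch fills them in with assumptions, stated plainly: r_x(t) is the indicator that P_fin is non-empty, R(t) is taken to be the law-of-total-probability mixture Σ_x π_x R_x(t) of the per-initial-state absorption CDFs, and EA is taken to be the fraction of time slices whose γth-percentile latency satisfies TL(γ) ≤ τ (the high-tail case). The steady-state probabilities and exponential R_x in the example are placeholders for the outputs of the stochastic reward net and tagged-customer models.

```python
from __future__ import annotations

import math
from typing import Callable, Hashable, Mapping

CDF = Callable[[float], float]


def reward(marking: Mapping[str, int]) -> int:
    """r_x(t): 1 once the tagged customer has reached P_fin (absorbed), else 0."""
    return 1 if marking.get("P_fin", 0) > 0 else 0


def response_time_cdf(pi: Mapping[Hashable, float],
                      r_x: Mapping[Hashable, CDF]) -> CDF:
    """Assumed form of R(t): sum over initial states x of pi_x * R_x(t)."""
    if abs(sum(pi.values()) - 1.0) > 1e-9:
        raise ValueError("steady-state probabilities must sum to 1")
    return lambda t: sum(p * r_x[x](t) for x, p in pi.items())


def experience_availability(tail_latencies: list[float], tau: float) -> float:
    """Assumed EA: share of time slices with TL(gamma) <= tau."""
    return sum(1 for tl in tail_latencies if tl <= tau) / len(tail_latencies)


pi = {"x1": 0.25, "x2": 0.75}                    # placeholder steady-state probabilities
R_x = {"x1": lambda t: 1 - math.exp(-1.0 * t),   # placeholder absorption CDFs
       "x2": lambda t: 1 - math.exp(-2.0 * t)}
R = response_time_cdf(pi, R_x)
print(round(R(1.0), 4))
print(experience_availability([0.12, 0.30, 0.18, 0.45], tau=0.25))   # 0.5
```

Under this reading EA lies in [0, 1] and rises as more slices keep their γth-percentile latency within the threshold τ.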

4. The response time distribution prediction method for online cloud services as claimed in claim 1, further comprising an experimental verification step: building an online search service experimental platform; using testing tools to send requests that follow specific distributions, simulating users sending requests, to test the service capability of the search engine; using the performance indicators of the service obtained from the test as the theoretical values of the model; and then combining the output results of the access load prediction method and the tail delay analysis method to obtain the theoretical value of the response time distribution.