JP2018136681A

JP2018136681A - Performance management program, performance management method, and management device

Info

Publication number: JP2018136681A
Application number: JP2017030013A
Authority: JP
Inventors: 浩一尾上; Koichi Onoue
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-02-21
Filing date: 2017-02-21
Publication date: 2018-08-30

Abstract

[Problem] To make it possible to identify the process that is causing performance deterioration.
[Solution] A management device 10 determines whether performance information indicating the performance of a service 1, which is provided by linking a plurality of processes, satisfies a performance requirement indicating the performance required of the service 1. When the performance information does not satisfy the performance requirement, the management device 10 acquires first state information indicating the operation state of each of the plurality of processes in the most recent predetermined period. Further, based on the first state information and on second state information 11a, which indicates the operation state of each of the plurality of processes when the performance of the service 1 satisfies the performance requirement, the management device 10 calculates, for each of the plurality of processes, the difference in operation state between when the performance requirement is satisfied and when it is not. The management device 10 then determines, based on the difference in operation state of each of the plurality of processes, the process that is causing the performance deterioration of the service 1.
[Selected drawing] FIG. 1

Description

The present invention relates to a performance management program, a performance management method, and a management device.

Cloud computing technology makes it easy to provide users, over a network, with the amount of computer resources they want. One form of cloud computing is PaaS (Platform as a Service), which provides users, via a network, with a platform environment for running application software (hereinafter referred to as applications).

A service using PaaS can be built, for example, on the design philosophy known as microservice architecture. In a microservice architecture, the software that provides one service is created as a set of small applications called components. Because one service is provided by combining multiple components, processing capacity can be increased component by component. Thus, when the processing load of a certain component becomes excessive, only that component's processing capacity needs to be increased, and the other components need not be changed.

The execution unit of a component is called a container. To increase the processing capacity of a component, the administrator, for example, increases the number of containers for that component (scale-out). Because service performance can be adjusted by increasing or decreasing the number of containers, system resources can be used efficiently. A PaaS system that uses such containers is called a container-based PaaS platform.

As a technology for using resources efficiently, there is, for example, a resource management system that can raise resource utilization efficiency in response to changing conditions. As a technology for managing components, there is, for example, a technique that monitors a component of an application program, and performs processing based on the monitoring result, without impairing the productivity of that component.

International Publication No. 2015/049789
JP 2009-116618 A

The administrator of a cloud computing system adjusts the performance of the components that implement a service as needed so that the quality of the service is maintained. For example, the administrator defines, as a performance requirement, a maximum latency for providing the service; when the latency of the service exceeds that maximum, the administrator increases the processing capacity for executing the components used to provide the service.

However, knowing only that the service latency has exceeded the maximum does not reveal which of the multiple components used by the service is the cause of the performance deterioration. In PaaS in particular, components are created by PaaS users, and the system administrator cannot know the specific processing each component performs. It is therefore difficult for the system administrator to accurately identify the component that is causing the performance deterioration.

Note that the difficulty of identifying the process causing performance deterioration is not limited to services built according to a microservice architecture; the same problem arises whenever the performance of a service provided by linking multiple processes is adjusted.

In one aspect, an object of the present invention is to make it possible to identify the process that is causing performance deterioration.

In one proposal, a performance management program that causes a computer to execute the following processing is provided.
Based on the performance management program, the computer acquires performance information indicating the performance of a service provided by linking a plurality of processes. Next, the computer determines whether the performance information satisfies a performance requirement indicating the performance required of the service. When the performance information does not satisfy the performance requirement, the computer acquires first state information indicating the operation state of each of the plurality of processes in the most recent predetermined period. Next, based on the first state information and on second state information indicating the operation state of each of the plurality of processes when the performance of the service satisfies the performance requirement, the computer calculates, for each of the plurality of processes, the difference in operation state between when the performance requirement is satisfied and when it is not. The computer then determines, based on the difference in operation state of each of the plurality of processes, the process that is causing the performance deterioration of the service.

According to one aspect, the process that is causing performance deterioration can be identified.

A diagram illustrating a configuration example of the system according to the first embodiment.
A diagram illustrating a system configuration example of the second embodiment.
A diagram illustrating a configuration example of the hardware of the management server used in the present embodiment.
A diagram illustrating the concept of a microservice architecture.
A block diagram illustrating the functions that the gateway and the management server have for performance adjustment.
A diagram illustrating an example of the information stored in the latency storage unit.
A diagram illustrating an example of the information stored in the service information storage unit.
A diagram illustrating an example of the information stored in the metric information storage unit.
A diagram illustrating an example of the information stored in the normal-time behavior storage unit.
A diagram illustrating an example of the information stored in the resource information storage unit.
A block diagram illustrating the functions of the performance adjustment engine.
A diagram illustrating an example of the performance requirement determination process.
A diagram illustrating an example of calculating the behavior of a container.
A diagram illustrating an example of calculating the behavior of a server.
A diagram illustrating an example of weighting percentile values.
A diagram illustrating an example of calculating the factor degree.
A diagram illustrating an example of estimating the factor component.
A diagram illustrating an example of determining the sign of the server factor degree.
A diagram illustrating an example of container placement.
A diagram illustrating an example of a performance adjustment result.
A flowchart illustrating an example of the procedure of the performance adjustment process.
The first half of a flowchart illustrating an example of the procedure of the performance adjustment process in the third embodiment.
A flowchart illustrating an example of the procedure of the scale-in process.
The latter half of a flowchart illustrating an example of the procedure of the performance adjustment process in the third embodiment.

Hereinafter, the embodiments will be described with reference to the drawings. The embodiments can be combined with one another to the extent that no contradiction arises.
[First Embodiment]
First, the first embodiment will be described.

FIG. 1 is a diagram illustrating a configuration example of the system according to the first embodiment. A service 1, which is provided by having a plurality of processes ("process a", "process b", and "process c") operate in cooperation, is implemented on a plurality of servers 2 to 4. For example, "process a" is executed on the server 2, "process c" is executed on the server 3, and "process b" is executed on the server 4.

For example, a request for the service 1 from a terminal device 5 is input to the server 2. The server 2 then executes "process a". In the course of executing "process a", the server 2 transmits a processing request for "process b" to the server 4. The server 4 then executes "process b". In the course of executing "process b", the server 4 transmits a processing request for "process c" to the server 3. The server 3 then executes "process c" and transmits the result of "process c" to the server 4. The server 4 uses the result of "process c" to carry out "process b" and transmits the result of "process b" to the server 2. The server 2 uses the result of "process b" to carry out "process a" and transmits the result of "process a" to the terminal device 5 as the response to the request from the terminal device 5.

The management device 10 manages the service 1 provided by the servers 2 to 4. For example, the management device 10 adjusts the performance of the service 1. Specifically, when the performance of the service 1 deteriorates, the management device 10 identifies the process that is causing the deterioration. The management device 10 then controls the processes executed by the servers 2 to 4 so that the performance deterioration is resolved.

The difficulty of identifying the process that causes the performance deterioration of the service 1 is described here. As illustrated in FIG. 1, when the service 1 is provided by linking a plurality of processes, knowing only that the performance of the service 1 has deteriorated does not reveal which component is the cause.

One conceivable approach is to define a performance requirement for each component. However, it is difficult to judge accurately what requirement each component must meet for the service as a whole to satisfy its performance requirement. For example, to keep the service latency within 100 milliseconds, it is difficult to decide precisely what values of CPU (Central Processing Unit) usage rate, memory usage rate, disk I/O rate, and so on are appropriate for each component. Moreover, when a component has been created by a user of the service, the administrator cannot know the specific processing it performs, and it is difficult to define performance requirements for a component without knowing what it does.

The management device 10 therefore implements a performance management method that accurately identifies the process causing performance deterioration based on the operation state of the processes on the servers 2 to 4. To this end, the management device 10 includes a storage unit 11 and a processing unit 12 as described below. The storage unit 11 is, for example, a memory or a storage device of the management device 10. The processing unit 12 is, for example, one or more processors of the management device 10. The processing performed by the processing unit 12 can be realized, for example, by having a processor execute a performance management program that describes the procedure of that processing.

The storage unit 11 stores second state information 11a, which indicates the operation state of each of the plurality of processes when the performance of the service 1, provided by linking the plurality of processes, satisfies the performance requirement indicating the performance required of the service 1. The second state information 11a comprises multiple kinds of information, such as the CPU usage rate of each process and the memory I/O rate while each process is executing. Information indicating such an operation state is called a metric. The storage unit 11 may also store, as the second state information 11a, values obtained by applying statistical processing to the various metrics. For example, a percentile value of the CPU usage rate of each process can be used as the second state information 11a.

The processing unit 12 acquires performance information indicating the performance of the service 1. For example, the processing unit 12 monitors the communication between the terminal device 5 and the server 2 and acquires the time from request to response (latency). Based on, for example, the latencies of a plurality of requests, the processing unit 12 calculates a performance index value such as Apdex. Apdex is described later.

The processing unit 12 determines whether the acquired performance information satisfies the performance requirement. For example, suppose the performance requirement specifies that Apdex be 0.8 or higher. In this case, the processing unit 12 determines whether the Apdex value calculated from the acquired performance information is 0.8 or higher.
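The requirement check described above can be sketched as follows. This is a minimal illustration that assumes the commonly used Apdex definition (requests at or below a threshold T count as satisfied, requests at or below 4T count as tolerating at half weight). The function names, the 100 ms threshold, and the sample values are assumptions for illustration and are not taken from this document.

```python
def apdex(latencies_ms, threshold_ms):
    """Apdex under the common definition (assumed here, not quoted from the text).

    satisfied:  latency <= T
    tolerating: T < latency <= 4T (counted at half weight)
    frustrated: latency > 4T (counted as zero)
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    satisfied = sum(1 for x in latencies_ms if x <= threshold_ms)
    tolerating = sum(1 for x in latencies_ms if threshold_ms < x <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)


def meets_requirement(latencies_ms, threshold_ms=100.0, min_apdex=0.8):
    """True when the Apdex of the samples is at least the required value."""
    return apdex(latencies_ms, threshold_ms) >= min_apdex
```

With a hypothetical 100 ms threshold, the samples 50, 80, 150, and 500 ms contain two satisfied requests and one tolerating request, so the index is (2 + 0.5) / 4 = 0.625 and the 0.8 requirement is not met.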

When the performance information does not satisfy the performance requirement, the processing unit 12 acquires, from the servers 2 to 4, first state information indicating the operation state of each of the plurality of processes in the most recent predetermined period. For example, the processing unit 12 acquires metric values such as the CPU usage rate of each process and the memory I/O rate while each process is executing.

Based on the acquired first state information and the second state information 11a, the processing unit 12 calculates, for each of the plurality of processes, the difference in operation state between when the performance requirement is satisfied and when it is not. For example, the processing unit 12 calculates, from the acquired first state information, a representative value (for example, a percentile value) of the metric values in the most recent predetermined period. The processing unit 12 then calculates the difference between the representative value calculated from the first state information and the representative value calculated from the second state information 11a.

The processing unit 12 then determines, based on the difference in operation state of each of the plurality of processes, the process that is causing the performance deterioration of the service 1. For example, the processing unit 12 determines that the process with the largest difference in operation state is the process causing the deterioration.
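A minimal sketch of this comparison follows. It assumes a single metric per process (for example, CPU usage rate samples), a nearest-rank percentile as the representative value, and the absolute difference as the measure of change. The function names and the choice of the 90th percentile are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def find_factor_process(normal, recent, p=90):
    """Compare each process's representative value in the normal state
    (second state information) with the recent one (first state information).

    normal, recent: dict mapping process name -> list of metric samples.
    Returns the process with the largest absolute difference and all differences.
    """
    diffs = {
        name: abs(percentile(recent[name], p) - percentile(normal[name], p))
        for name in normal
    }
    culprit = max(diffs, key=diffs.get)
    return culprit, diffs
```

For example, if the samples of "process b" rise sharply in the recent period while those of "process a" and "process c" stay close to their normal values, `find_factor_process` returns "process b".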

The processing unit 12 further decides how to respond to the performance deterioration based on the difference in operation state of the factor process, that is, the process determined to be the cause, and carries out the response it has decided on. For example, the processing unit 12 scales out the factor process.

In this way, it is possible to determine which of the processes used to provide the service 1 is causing the performance deterioration. As a result, performance deterioration of the service 1 can be dealt with quickly. In addition, there is no need to set a performance requirement for each metric of each process, which reduces the burden of managing the system.

The processing unit 12 can also improve the accuracy of the second state information 11a by updating it as appropriate. For example, when the performance information of the service 1 satisfies the performance requirement, the processing unit 12 acquires third state information indicating the operation state of each of the plurality of processes in the most recent predetermined period. The processing unit 12 then updates the second state information 11a based on the acquired third state information. For example, based on the third state information of a plurality of periods, the processing unit 12 reflects the operation state more strongly in the updated second state information 11a the closer the period of the third state information is to the present. By updating the second state information 11a with the latest information and giving greater weight to newer information, highly accurate second state information 11a that reflects the recent operating conditions of the system can be generated.
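One way to weight newer periods more heavily is a weighted mean whose weights decay with age. The sketch below is only an assumption about how such weighting could look; the geometric decay, the `decay` parameter, and the function name are hypothetical and are not specified in this passage.

```python
def update_normal_value(period_values, decay=0.5):
    """Weighted mean of per-period representative values, ordered oldest first.

    The newest period gets weight 1, the one before it `decay`, then
    decay**2, and so on, so recent behavior dominates the updated value.
    """
    if not period_values:
        raise ValueError("at least one period is required")
    n = len(period_values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * v for w, v in zip(weights, period_values))
    return weighted_sum / sum(weights)
```

With per-period values 10, 20, and 40 (oldest first) and a decay of 0.5, the weights are 0.25, 0.5, and 1, giving (2.5 + 10 + 40) / 1.75 = 30, which lies closer to the newest value than the plain mean does.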

As the second state information 11a, the storage unit 11 may store, for example, a second representative value, which is a predetermined representative value of second resource information indicating the time-series change in the operating status of the resources used by each of the plurality of processes while the performance of the service 1 satisfies the performance requirement. In this case, the processing unit 12 acquires, as the first state information, first resource information indicating the time-series change in the operating status of the resources used by each of the plurality of processes in the most recent predetermined period, and calculates the predetermined representative value of the first resource information as a first representative value. The processing unit 12 then calculates, for each of the plurality of processes, the difference between the first representative value and the second representative value, and determines that the process with the largest difference is the factor process, that is, the cause of the performance deterioration. Expressing the operating status of resources as representative values in this way makes the difference in operation state easy to quantify. As a result, a process whose operation state changed greatly before and after the performance deterioration of the service 1 can be identified easily.

When values of multiple kinds of metrics (CPU usage rate, memory I/O rate, and so on) are acquired as the first state information and the second state information 11a, the processing unit 12 calculates the difference in representative values for each metric type. The processing unit 12 can also calculate multiple kinds of representative values (for example, the 50th, 90th, and 99th percentiles) for a single metric. In this case, the processing unit 12 calculates, for each process, the difference between representative values of the same kind for the same metric in the first state information and the second state information 11a. Then, for each metric type (for example, CPU usage rate) of each process, the processing unit 12 sums the differences between the representative values (for example, their absolute values) and uses the total as the difference in operation state for that metric type of that process. The processing unit 12 may also separately calculate the differences for representative values that increased when performance deteriorated (called the "positive factor degree" in the second embodiment) and the differences for representative values that decreased when performance deteriorated (called the "negative factor degree" in the second embodiment).
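The separate totals for increases and decreases can be sketched as follows for one process and one metric type. The nearest-rank percentile, the three percentiles used, and the function names are assumptions for illustration. The detailed factor degree calculation belongs to the second embodiment and is not reproduced here.

```python
import math

PERCENTILES = (50, 90, 99)  # representative values compared, as in the example above


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def factor_degrees(normal_samples, recent_samples):
    """Return (positive, negative) totals for one process and one metric type.

    positive: sum of increases of each percentile at the time of deterioration
    negative: sum of decreases (as positive magnitudes)
    """
    positive = 0.0
    negative = 0.0
    for p in PERCENTILES:
        delta = percentile(recent_samples, p) - percentile(normal_samples, p)
        if delta > 0:
            positive += delta
        else:
            negative += -delta
    return positive, negative
```

Keeping the two totals separate preserves the direction of the change, which a single sum of absolute values would hide.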

As a response to the performance deterioration of the service 1, the processing unit 12 can, for example, scale out the factor process. When the performance of the service 1 has deteriorated because of processes other than the factor process running on the server that currently executes the factor process, the processing unit 12 can instead change the server that executes the factor process. For example, when the first state information of the factor process (the state at the time of performance deterioration) indicates an operation state with a heavier load than its second state information 11a (the state at normal times), the processing unit 12 scales out the factor process. When the second state information 11a of the factor process indicates an operation state with a heavier load than its first state information, the processing unit 12 changes the server that executes the factor process. This prevents unnecessary scale-outs.
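The choice between the two responses can be sketched as a simple comparison of the factor process's load in the two states. The function name, the string return values, and the handling of equal loads are assumptions for illustration.

```python
def choose_action(normal_load, degraded_load):
    """Pick a response for the factor process.

    normal_load:   representative load while the requirement was satisfied
    degraded_load: representative load in the most recent period, when the
                   requirement was not satisfied

    A heavier load at the time of deterioration suggests the process itself
    needs more capacity, so it is scaled out. Otherwise the slowdown is
    attributed to other work on the same server, so the process is moved.
    Equal loads are treated as the second case (an assumption of this sketch).
    """
    if degraded_load > normal_load:
        return "scale_out"
    return "change_server"
```

For example, `choose_action(40.0, 85.0)` selects a scale-out, while `choose_action(40.0, 15.0)` selects a server change.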

Furthermore, after changing the server that executes the factor process and scaling out at the same time, the processing unit 12 may scale in once it has confirmed that the scale-out was excessive. For example, the processing unit 12 first stops execution of the factor process on the first server, which is currently executing it, and has each of a plurality of second servers, different from the first server, execute the factor process. Then, when the processing load on the plurality of second servers for executing the factor process after this response is at or below a predetermined value, the processing unit 12 stops execution of the factor process on some of the second servers. This quickly resolves the performance deterioration of the service 1 while avoiding wasteful resource consumption.
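The scale-in check after the combined response can be sketched as follows. The passage does not say how the load is aggregated or how many servers are released, so this sketch assumes that every second server must be at or below the limit and that one server, the least loaded, is released at a time. All names are hypothetical.

```python
def pick_scale_in_targets(load_by_server, max_load, keep_at_least=1):
    """Return the servers on which the factor process can be stopped.

    load_by_server: dict mapping server name -> load of the factor process
                    measured after the move and scale-out
    max_load:       the predetermined value; scale-in is considered only
                    when every server is at or below it (assumption)
    keep_at_least:  minimum number of servers that keep running the process
    """
    if len(load_by_server) <= keep_at_least:
        return []
    if max(load_by_server.values()) > max_load:
        return []
    least_loaded = min(load_by_server, key=load_by_server.get)
    return [least_loaded]
```

Releasing one server per check is a conservative choice: the load can be measured again afterwards before any further scale-in.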

[Second Embodiment]
Next, the second embodiment will be described. The second embodiment is a computer system that, in the operational management of a PaaS built on a microservice architecture, can accurately determine which component has become overloaded when the latency of a service exceeds its maximum value.

FIG. 2 is a diagram illustrating a system configuration example of the second embodiment. A plurality of terminal devices 31, 32, ... are connected to a cloud computing system 40 via a network 20. The cloud computing system 40 provides PaaS-based services to the plurality of terminal devices 31, 32, ....

The cloud computing system 40 includes a gateway 41, a management server 100, and a plurality of servers 42 to 44. The gateway 41 is connected to the network 20 and accepts requests from the plurality of terminal devices 31, 32, .... The management server 100 is connected to the gateway 41 and to the plurality of servers 42 to 44, and manages the servers 42 to 44. The servers 42 to 44 provide information processing services in response to requests from the terminal devices 31, 32, ....

FIG. 3 is a diagram illustrating a configuration example of the hardware of the management server used in the present embodiment. The management server 100 as a whole is controlled by a processor 101. A memory 102 and a plurality of peripheral devices are connected to the processor 101 via a bus 109. The processor 101 may be a multiprocessor. The processor 101 is, for example, a CPU, an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor). At least some of the functions realized by the processor 101 executing a program may instead be realized by an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).

The memory 102 is used as the main storage device of the management server 100. The memory 102 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the processor 101. The memory 102 also stores various data needed for processing by the processor 101. A volatile semiconductor storage device such as a RAM (Random Access Memory) is used as the memory 102, for example.

The peripheral devices connected to the bus 109 include a storage device 103, a graphic processing device 104, an input interface 105, an optical drive device 106, a device connection interface 107, and a network interface 108.

The storage device 103 electrically or magnetically writes data to and reads data from a built-in recording medium. The storage device 103 is used as the auxiliary storage device of the computer. The storage device 103 stores the OS program, application programs, and various data. An HDD (Hard Disk Drive) or an SSD (Solid State Drive), for example, can be used as the storage device 103.

A monitor 21 is connected to the graphic processing device 104. The graphic processing device 104 displays images on the screen of the monitor 21 in accordance with instructions from the processor 101. Examples of the monitor 21 include a display device using a CRT (Cathode Ray Tube) and a liquid crystal display device.

A keyboard 22 and a mouse 23 are connected to the input interface 105. The input interface 105 forwards signals sent from the keyboard 22 and the mouse 23 to the processor 101. The mouse 23 is one example of a pointing device, and other pointing devices can be used instead. Other pointing devices include a touch panel, a tablet, a touch pad, and a trackball.

The optical drive device 106 reads data recorded on an optical disc 24 using laser light or the like. The optical disc 24 is a portable recording medium on which data is recorded so as to be readable by reflection of light. Examples of the optical disc 24 include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable) / RW (ReWritable).

The device connection interface 107 is a communication interface for connecting peripheral devices to the management server 100. For example, a memory device 25 and a memory reader/writer 26 can be connected to the device connection interface 107. The memory device 25 is a recording medium equipped with a function for communicating with the device connection interface 107. The memory reader/writer 26 is a device that writes data to a memory card 27 or reads data from the memory card 27. The memory card 27 is a card-type recording medium.

The network interface 108 is connected to the network 20. The network interface 108 exchanges data with other computers or communication devices via the network 20.

The hardware configuration described above makes it possible to implement the processing functions of the management server 100 in the second embodiment. The terminal devices 31, 32, ..., the gateway 41, and the servers 42 to 44 can also be implemented with hardware similar to that of the management server 100. The management device 10 described in the first embodiment can likewise be implemented with hardware similar to that of the management server 100 shown in FIG. 3.

The management server 100 implements the processing functions of the second embodiment by, for example, executing a program recorded on a computer-readable recording medium. A program describing the processing to be executed by the management server 100 can be recorded on various recording media. For example, the program can be stored in the storage device 103. The processor 101 loads at least part of the program in the storage device 103 into the memory 102 and executes it. The program can also be recorded on a portable recording medium such as the optical disc 24, the memory device 25, or the memory card 27. A program stored on a portable recording medium becomes executable after being installed in the storage device 103, for example under the control of the processor 101. The processor 101 can also read a program directly from a portable recording medium and execute it.

In the second embodiment, the software that provides services is implemented on the servers 42 to 44 based on a microservice architecture.

FIG. 4 is a diagram illustrating the concept of the microservice architecture. A service 50 provided to users is realized using a plurality of components 51 to 53. For example, the component 51 is software that executes presentation-layer processing, the component 52 is software that executes logic-layer processing, and the component 53 is software that executes data-layer processing.

The components 51 to 53 are each executed on one or more of the servers 42 to 44. A container is the processing function that is built on the servers 42 to 44 by executing the components 51 to 53. In the second embodiment, a container is denoted "C_xy". The subscript "x" is the identification number of the component that includes the container (component number). The subscript "y" is the identification number of the container within that component (container number).

As described above, in the microservice architecture, the software for providing one service 50 is created as a set of small components 51 to 53. The components 51 to 53 are loosely coupled. Loose coupling means that the ties among the components 51 to 53 are relatively weak and that each component is highly independent. Because the components 51 to 53 are loosely coupled, adding a new component or extending some of the components requires few changes to the other components.

The components 51 to 53 of a service created in accordance with the microservice architecture are executed by containers. Components and containers have a one-to-many relationship.

The performance requirement for the service 50 provided to users can be expressed in terms of latency, for example. The system administrator therefore prepares components 51 to 53 with enough processing capacity to achieve the latency required of the service 50. The processing capacity of the components 51 to 53 can be adjusted by increasing or decreasing the number of containers that execute them.

Here, it is easy for the administrator to define the performance requirement for the service 50. In contrast, it is difficult for the administrator to determine how many resources should be allocated to each component so that the latency required of the service 50 is met. In the second embodiment, therefore, the management server 100 detects a component with insufficient performance and adds containers that execute that component, thereby allocating resources to components in a way that satisfies the performance requirement for the service 50.

FIG. 5 is a block diagram illustrating the functions that the gateway and the management server have for performance adjustment. The gateway 41 includes a latency measurement unit 41a and a latency storage unit 41b. The latency measurement unit 41a measures the time from receiving a request from one of the terminal devices 31, 32, ... until the response to that request is transmitted to that terminal device. The latency measurement unit 41a stores the measured time in the latency storage unit 41b as the latency of the service corresponding to the request. The latency storage unit 41b stores the latencies measured by the latency measurement unit 41a.

The management server 100 includes a service information storage unit 110, a metric information storage unit 120, a normal behavior storage unit 130, a resource information storage unit 140, and a performance adjustment engine 150. The service information storage unit 110 stores information about the services provided. The metric information storage unit 120 stores information (metrics) about how the servers 42 to 44 and the containers are using resources. The normal behavior storage unit 130 stores information indicating the behavior of each of the containers and each of the servers during normal operation. The resource information storage unit 140 stores information about the resources used on the servers 42 to 44. The performance adjustment engine 150 adjusts performance on a per-component basis using the information stored in the service information storage unit 110, the metric information storage unit 120, the normal behavior storage unit 130, and the resource information storage unit 140.

In the following description, deploying on a server a container that executes the processing of a component is referred to as container placement. Specifically, container placement is the process of installing on the server a program for executing the component and starting a process that executes the component's processing based on that program. When a container has been deployed on a server, the container is said to be placed on that server.

In the example of FIG. 5, a plurality of containers belonging to different components are placed on each of the servers 42 to 44. For example, containers C₁₁, C₂₂, and C₃₁ are placed on the server 42.

The information stored in the service information storage unit 110, the metric information storage unit 120, the normal behavior storage unit 130, and the resource information storage unit 140 is described in detail below with reference to FIGS. 6 to 10.

FIG. 6 is a diagram illustrating an example of the information stored in the latency storage unit. The latency storage unit 41b stores, for example, a latency management table 41c. The latency management table 41c has columns for timestamp, request ID, service name, and latency.

The timestamp column holds the date and time at which the latency was measured. The request ID column holds the identification information (request ID) of the request whose latency was measured. The service name column holds the name of the service (service name) corresponding to that request. The latency column holds the measured latency.
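As a rough illustration of the latency measurement unit 41a and the latency management table 41c, the following Python sketch times a request handler and appends one record per request. The names `LatencyRecord`, `LatencyStore`, and `handle_with_latency`, and the use of milliseconds, are assumptions made for illustration and do not appear in the embodiment.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class LatencyRecord:
    """One row of a latency management table (hypothetical layout)."""
    timestamp: datetime   # date and time at which the latency was measured
    request_id: str       # identifier of the measured request
    service_name: str     # service that the request was addressed to
    latency_ms: float     # measured latency (unit is an assumption)


@dataclass
class LatencyStore:
    """In-memory stand-in for the latency storage unit."""
    records: List[LatencyRecord] = field(default_factory=list)

    def add(self, record: LatencyRecord) -> None:
        self.records.append(record)


def handle_with_latency(store: LatencyStore, request_id: str,
                        service_name: str, handler: Callable[[], T]) -> T:
    """Run the handler and record the time from receipt to response."""
    start = time.perf_counter()
    response = handler()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    store.add(LatencyRecord(datetime.now(timezone.utc), request_id,
                            service_name, elapsed_ms))
    return response
```

In this sketch the handler stands in for forwarding the request to the servers and waiting for the response; a real gateway would measure across network I/O rather than a local function call.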

FIG. 7 is a diagram illustrating an example of the information stored in the service information storage unit. The service information storage unit 110 stores, for example, a service management table 111. The service management table 111 has columns for service name, Apdex (Application Performance Index), Satisfied Time, and component name. The service name column holds the name of the service being provided (service name). The Apdex column holds the performance requirement for the corresponding service, expressed as an Apdex value. Apdex is an index of user satisfaction with latency. The Satisfied Time column holds the maximum latency value (T) at which users of the corresponding service are expected to be satisfied. The component name column holds the names of the components used to provide the service.

Apdex is now described in detail. Apdex is an index standardized by "The Alliance" and is calculated by the following formula.
Apdex = ((satisfied counts) + (tolerating counts) / 2) / (total counts)
"Satisfied counts" is the number of requests whose latency is T or less. That is, "satisfied counts" is the number of requests for which a latency that satisfies the user was obtained.

"Tolerating counts" is the number of requests whose latency is T or more and 4 × T or less. That is, "tolerating counts" is the number of requests for which the latency did not satisfy the user but was still tolerable.

The number of requests whose latency is greater than 4 × T is called "frustrated". "Frustrated" is the number of requests whose latency left the user dissatisfied.

In the second embodiment, if the Apdex value calculated from the latencies of a service is equal to or greater than the Apdex value set as the performance requirement, the performance requirement is judged to be satisfied. Conversely, if the Apdex value calculated from the latencies of the service is less than the Apdex value set as the performance requirement, the performance requirement is judged not to be satisfied.
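The Apdex calculation and the requirement check described above can be sketched in Python as follows. The function names `apdex` and `meets_requirement` are hypothetical. A latency exactly equal to T falls into both ranges as worded above; counting it as satisfied is an assumption of this sketch.

```python
from typing import Iterable


def apdex(latencies: Iterable[float], t: float) -> float:
    """Return the Apdex value for the given latencies and Satisfied Time t."""
    satisfied = 0
    tolerating = 0
    total = 0
    for latency in latencies:
        total += 1
        if latency <= t:
            # Assumption: a latency exactly equal to T counts as satisfied.
            satisfied += 1
        elif latency <= 4 * t:
            tolerating += 1
        # Anything above 4 * T is "frustrated" and adds only to the total.
    if total == 0:
        raise ValueError("no latencies were measured in the period")
    return (satisfied + tolerating / 2) / total


def meets_requirement(latencies: Iterable[float], t: float,
                      required_apdex: float) -> bool:
    """True if the measured Apdex is at or above the required Apdex value."""
    return apdex(latencies, t) >= required_apdex
```

For example, with T = 0.5 and latencies of 0.1, 0.2, 0.6, and 3.0, two requests are satisfied, one is tolerating, and one is frustrated, so the Apdex value is (2 + 1/2) / 4 = 0.625.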

FIG. 8 is a diagram illustrating an example of the information stored in the metric information storage unit. The metric information storage unit 120 stores, for example, a metric management table 121. The metric management table 121 has columns for timestamp, server/container name, metric type, and value. The timestamp column holds the date and time at which the metric value was measured. The server/container name column holds the name of the server or container on which the metric value was measured. The metric type column holds the type of the measured metric (metric type). The value column holds the measured metric value.

FIG. 9 is a diagram illustrating an example of the information stored in the normal behavior storage unit. The normal behavior storage unit 130 stores, for example, a plurality of container behavior management tables 131a, 131b, ..., one for each behavior measurement cycle, and a plurality of server behavior management tables 132a, 132b, ..., one for each behavior measurement cycle.

The container behavior management tables 131a, 131b, ... are each associated with one measurement cycle of container behavior. Each of these tables has columns for container, metric type, percentile type, percentile value, and weighted percentile value. The container column holds the name of the container whose behavior is measured (container name). The metric type column holds the type of metric whose behavior was measured. The percentile type column holds the type of percentile to be obtained for the metric values. For example, the 50th percentile, the 90th percentile, and the 99th percentile are set as percentile types. The percentile value column holds the value of the percentile indicated by the percentile type for the corresponding metric. The weighted percentile value column holds a weighted percentile value for each metric of the container, based on the metric values of the past several cycles. The weighted percentile value is described in detail later (see FIG. 15).

The server behavior management tables 132a, 132b, ... are each associated with one measurement cycle of server behavior. Each of these tables has columns for server, metric type, percentile type, percentile value, and weighted percentile value. The server column holds the name of the server whose behavior is measured (server name). The metric type column holds the type of metric whose behavior was measured. The percentile type column holds the type of percentile to be obtained for the metric values. For example, the 50th percentile, the 90th percentile, and the 99th percentile are set as percentile types. The percentile value column holds the value of the percentile indicated by the percentile type for the corresponding server. The weighted percentile value column holds a weighted percentile value for each metric of the server, based on the metric values of the past several cycles.

A percentile is a kind of representative value in statistics. When a plurality of data items are arranged in order of magnitude, the p-th percentile is the value x (x is a real number) such that the proportion of data smaller than x is p% or less (p is a real number from 0 to 100) and the proportion of data larger than x is (100 − p)%. The p-th percentile is also referred to as the p-th centile.
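One way to obtain a value that fits this definition is the nearest-rank method, sketched below in Python. The choice of nearest rank is an assumption of this sketch, since the embodiment does not state which percentile convention is used, and the function name `percentile` is hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Rank of the smallest item with at least p% of the data at or below it.
    # Multiplying before dividing avoids rounding errors such as
    # 0.9 * 100 evaluating to slightly more than 90.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With this convention the returned value is always one of the measured data items: at most p% of the data is smaller than it and at most (100 − p)% is larger.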

FIG. 10 is a diagram illustrating an example of the information stored in the resource information storage unit. The resource information storage unit 140 stores, for example, a container placement management table 141, a server resource management table 142, and a container resource management table 143.

The container placement management table 141 is a data table for managing how containers are placed on the servers 42 to 44. The container placement management table 141 has columns for server name and container name. The server name column holds the name of a server on which containers are deployed (server name). The container name column holds the names of the containers deployed on the corresponding server (container names).

The server resource management table 142 is a data table for managing the amount of free resources on the servers 42 to 44. The server resource management table 142 has columns for server name and remaining resource amount. The server name column holds the name of a server used to provide services (server name). The remaining resource amount column holds the amount of free resources on the corresponding server (remaining resource amount) for each resource type. In the example of FIG. 9, remaining resource amounts are set for CPU, memory, and network.

The container resource management table 143 is a data table for managing the amount of resources used by a container of each component. The container resource management table 143 has columns for component and container resource usage. The component column holds the name of a component used to provide services (component name). The container resource usage column holds the amount of resources used by a container of the corresponding component for each resource type. In the example of FIG. 9, container resource usage is set for CPU, memory, and network.
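The three tables held by the resource information storage unit 140 can be pictured as simple mappings, as in the Python sketch below. All names, units, and numeric values are hypothetical placeholders. The helper `candidate_servers` illustrates one plausible use of the tables, namely finding servers whose remaining resources cover one more container of a component; this section does not describe that check, so it should be read as an assumption.

```python
from typing import Dict, List

Resources = Dict[str, float]  # resource type -> amount (units are assumed)

# Container placement management table: server name -> container names.
container_placement: Dict[str, List[str]] = {
    "Server1": ["C11", "C22", "C31"],
    "Server2": ["C12", "C21"],
}

# Server resource management table: server name -> remaining resources.
server_resources: Dict[str, Resources] = {
    "Server1": {"cpu": 0.5, "memory": 1.0, "network": 50.0},
    "Server2": {"cpu": 2.0, "memory": 4.0, "network": 200.0},
}

# Container resource management table: component -> usage per container.
container_resources: Dict[str, Resources] = {
    "Component1": {"cpu": 1.0, "memory": 2.0, "network": 100.0},
}


def can_host(remaining: Resources, required: Resources) -> bool:
    """True if every required resource type fits in the remaining amount."""
    return all(remaining.get(kind, 0.0) >= amount
               for kind, amount in required.items())


def candidate_servers(servers: Dict[str, Resources],
                      per_container: Dict[str, Resources],
                      component: str) -> List[str]:
    """List servers that could accept one more container of the component."""
    required = per_container[component]
    return [name for name, remaining in servers.items()
            if can_host(remaining, required)]
```

Calling `candidate_servers(server_resources, container_resources, "Component1")` with the placeholder values above selects only the server whose remaining CPU, memory, and network all cover the per-container usage.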

Next, the performance adjustment engine 150 is described in detail.

FIG. 11 is a block diagram illustrating the functions of the performance adjustment engine. The performance adjustment engine 150 includes a service management unit 151, a metric information collection unit 152, a latency inspection unit 153, a behavior calculation unit 154, an abnormality factor estimation unit 155, and a container placement control unit 156.

The service management unit 151 manages the configuration and performance requirements of services. The metric information collection unit 152 periodically collects metric values from the servers 42 to 44 and stores them in the metric information storage unit 120. The latency inspection unit 153 checks whether the latency of a service satisfies its performance requirement. The behavior calculation unit 154 calculates the behavior of the containers and the servers in the normal state and in the abnormal state. The behavior calculation unit 154 stores the normal-state behavior in the normal behavior storage unit 130. The abnormality factor estimation unit 155 estimates the component (factor component) that is causing the abnormality in a service whose latency does not satisfy the performance requirement. The container placement control unit 156 scales out the factor component or changes the placement of the containers that execute the factor component.

The lines connecting the elements shown in FIG. 11 represent only some of the communication paths, and communication paths other than those illustrated can also be set up. The function of each element shown in FIG. 11 can be implemented, for example, by causing a computer to execute a program module corresponding to that element.

Next, the process by which the performance adjustment engine 150 determines whether each service satisfies its performance requirement is described.

FIG. 12 is a diagram illustrating an example of the performance requirement determination process. The service management unit 151 registers an Apdex value in the service information storage unit 110 as the performance requirement of the service 50 in accordance with input from the administrator. For example, the service management unit 151 accepts an Apdex value and a Satisfied Time (T) entered by the administrator. The service management unit 151 then stores the entered Apdex value and Satisfied Time (T) in the service management table 111 in association with the service name of the service 50.

The latency inspection unit 153 periodically collects from the gateway 41 the latencies of requests made to the service 50 within the most recent predetermined period. The latency of the service is the difference between the time at which the gateway 41 received a request issued by the terminal device 31 and the time at which the gateway 41 transmitted the response to the terminal device 31. The latency inspection unit 153 calculates the Apdex value for the predetermined period from the collected latencies. If the calculated Apdex value is equal to or greater than the Apdex value specified as the performance requirement, the latency inspection unit 153 determines that the performance requirement is satisfied. If the calculated Apdex value is less than the Apdex value specified as the performance requirement, the latency inspection unit 153 determines that the performance requirement is not satisfied.

Next, the metric information collection unit 152 collects metric information for the containers and the servers and stores it in the metric information storage unit 120. The collected metric information includes, for example, the CPU usage rate, the memory I/O rate and number of page faults, the disk (file system) I/O rate, and the network transmission and reception rates. Based on the collected metric information, the behavior calculation unit 154 calculates the behavior of the containers and the servers in the most recent predetermined period.

FIG. 13 is a diagram illustrating an example of calculating the behavior of a container. In the example of FIG. 13, the behavior of container C₁₁ is calculated. The behavior calculation unit 154 extracts from the metric information storage unit 120 the records whose container name is "C₁₁". Next, the behavior calculation unit 154 groups the extracted records by metric type. The behavior calculation unit 154 then normalizes the values (metric values) set in the records of each metric type to the range 0 to 100 and generates a frequency distribution. For example, the behavior calculation unit 154 normalizes each metric so that its theoretical maximum value becomes 100. Finally, based on the frequency distribution, the behavior calculation unit 154 calculates the 50th percentile value, the 90th percentile value, and the 99th percentile value for each metric type.
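The steps above (extract, group by metric type, normalize to 0 to 100, build a frequency distribution, read off percentiles) can be sketched in Python as follows. The record layout, the function names, the use of 101 integer bins, and the nearest-rank percentile rule are all assumptions of this sketch rather than details given in the embodiment.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence


def normalize(value: float, theoretical_max: float) -> float:
    """Scale a metric value so that its theoretical maximum becomes 100."""
    return min(100.0, max(0.0, 100.0 * value / theoretical_max))


def percentile_from_histogram(histogram: Sequence[int], p: float) -> int:
    """Nearest-rank percentile read from a frequency distribution."""
    total = sum(histogram)
    rank = max(1, math.ceil(p * total / 100))
    cumulative = 0
    for bin_index, count in enumerate(histogram):
        cumulative += count
        if cumulative >= rank:
            return bin_index
    raise ValueError("histogram is empty")


def compute_behavior(records: Iterable[Mapping[str, object]], name: str,
                     theoretical_max: Mapping[str, float],
                     percentiles: Sequence[int] = (50, 90, 99),
                     ) -> Dict[str, Dict[int, int]]:
    """Return {metric type: {percentile type: percentile value}} for one
    container or server, identified by name."""
    histograms: Dict[str, List[int]] = defaultdict(lambda: [0] * 101)
    for record in records:
        if record["name"] != name:
            continue  # keep only the records of the requested target
        metric = str(record["metric_type"])
        scaled = normalize(float(record["value"]), theoretical_max[metric])
        histograms[metric][int(round(scaled))] += 1
    return {metric: {p: percentile_from_histogram(hist, p)
                     for p in percentiles}
            for metric, hist in histograms.items()}
```

Because the function filters only on the name field, the same sketch would apply to a server by passing the server name instead of a container name.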

The behavior calculation unit 154 calculates the behavior of all containers that execute the components of the service 50. When the latency checking unit 153 has determined that the performance requirement of the service 50 is satisfied, the behavior calculation unit 154 creates a container behavior management table 131a for the most recent cycle and stores that container behavior management table 131a in the normal behavior storage unit 130.

FIG. 14 is a diagram illustrating an example of calculating the behavior of a server. In the example of FIG. 14, the behavior of the server 42, whose server name is “server 1”, is calculated. The behavior calculation unit 154 extracts, from the metric information storage unit 120, the records whose server name is “server 1”. Next, the behavior calculation unit 154 classifies the extracted records by metric type. The behavior calculation unit 154 then normalizes the values (metric values) set in the records of the same metric type so that they fall in the range 0 to 100, and generates a frequency distribution. Finally, based on the frequency distribution, the behavior calculation unit 154 calculates the 50th percentile value, the 90th percentile value, and the 99th percentile value for each metric type.

The behavior calculation unit 154 calculates the behavior of all the servers 42 to 44. When the latency checking unit 153 has determined that the performance requirement of the service 50 is satisfied, the behavior calculation unit 154 creates a server behavior management table 132a for the most recent cycle and stores that server behavior management table 132a in the normal behavior storage unit 130.

When the latency checking unit 153 determines that the performance requirement of the service 50 is not satisfied, the behavior calculation unit 154 sends the calculated percentile values of the containers and the servers to the abnormality factor estimation unit 155 as information indicating the behavior at the time of abnormality. The abnormality factor estimation unit 155 then compares the behavior at the time of abnormality with the behavior at the normal time and estimates the component that is causing the latency of the service to deteriorate.

For example, the abnormality factor estimation unit 155 acquires, from the normal behavior storage unit 130, the percentile values for each metric of the containers for the n most recent cycles (n is an integer equal to or greater than 1). The abnormality factor estimation unit 155 then determines the normal behavior of each metric based on the acquired percentile values. In doing so, the abnormality factor estimation unit 155 weights the percentile values according to the age of the cycle from which they were acquired, so that behavior in cycles closer to the present is treated as closer to the future behavior.

FIG. 15 is a diagram illustrating an example of weighting the percentile values. In the example illustrated in FIG. 15, the normal-time percentile values for three cycles, from cycle t to cycle t+2, have been acquired. The abnormality factor estimation unit 155 sets the weight of the percentile values of the latest cycle t+2 to “3”. The abnormality factor estimation unit 155 sets the weight of the percentile values of the immediately preceding cycle t+1 to “2”. Furthermore, the abnormality factor estimation unit 155 sets the weight of the percentile values of cycle t, two cycles before the latest, to “1”.

In this way, the abnormality factor estimation unit 155 gives a larger weight to the percentile values of cycles closer to the present and calculates, for each metric, the percentile values over the period of n cycles (the weighted percentile values). For example, the weighted percentile values are calculated as follows.

Suppose that the following data are obtained as the normal-time percentile values. S1 is the data set of the latest cycle, S2 is the data set of the cycle immediately before S1, and S3 is the data set of the cycle immediately before S2.
S1: {1, 2}
S2: {3, 4}
S3: {5, 6}
In this example the data values are simplified to make the weighting process easier to follow. When the weighted percentile values for S1, S2, and S3 are obtained, the number of data items in each set is multiplied by its weight. For example, let the weights for the sets S1, S2, and S3 be “3”, “2”, and “1”, respectively. In this case the sets S1, S2, and S3 are replaced with the following sets.
S1′ = S1 × 3: {1, 1, 1, 2, 2, 2}
S2′ = S2 × 2: {3, 3, 4, 4}
S3′ = S3 × 1: {5, 6}
The set S1′ is the set S1 tripled; that is, S1′ is three copies of S1 merged into one set. The set S2′ is the set S2 doubled; that is, S2′ is two copies of S2 merged into one set. The set S3′ is identical to the set S3. The abnormality factor estimation unit 155 merges the sets S1′, S2′, and S3′ into one set and sorts the data in ascending order. In other words, for the set of each cycle, the abnormality factor estimation unit 155 generates as many copies of that set as its weight, merges the generated sets into one, and sorts the data in ascending order. The sort yields the following set S.
S = {1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6}
The abnormality factor estimation unit 155 takes the percentile values obtained from this set S as the weighted percentile values. The 50th percentile is then “2”, and the 90th percentile is “4”.
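The replication-based weighting above can be sketched in Python as follows. The function name and argument layout are assumptions for illustration. The description does not state how a percentile is read off the sorted set; the sketch assumes the rank floor(P × N / 100), which reproduces the values “2” and “4” given for the example.

```python
def weighted_percentiles(cycle_sets, weights, percentiles=(50, 90)):
    """Replicate each cycle's data by its weight, pool, sort, and read percentiles.

    cycle_sets: list of per-cycle value lists, newest first (S1, S2, S3, ...).
    weights: positive integer weight per cycle, in the same order.
    """
    pooled = []
    for values, weight in zip(cycle_sets, weights):
        pooled.extend(list(values) * weight)  # weight copies of the same set
    pooled.sort()
    n = len(pooled)
    result = {}
    for p in percentiles:
        rank = min(n, max(1, (p * n) // 100))  # assumed percentile definition
        result[p] = pooled[rank - 1]
    return pooled, result


pooled, result = weighted_percentiles([[1, 2], [3, 4], [5, 6]], [3, 2, 1])
print(pooled)  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6]
print(result)  # {50: 2, 90: 4}
```

Replicating the data is simple but grows memory with the weights; for large weights, an equivalent approach would accumulate weighted counts per value instead.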

The abnormality factor estimation unit 155 compares, for each metric type, the normal-time weighted percentile values with the latest percentile values indicating the behavior at the time of abnormality, and obtains a factor degree for that metric type. For example, the abnormality factor estimation unit 155 obtains a positive factor degree and a negative factor degree as the factor degrees.

FIG. 16 is a diagram illustrating an example of calculating the factor degrees. In the example of FIG. 16, the weighted percentile values indicating the normal-time behavior are “15” for the 50th percentile, “71” for the 90th percentile, and “90” for the 99th percentile. The latest percentile values indicating the behavior at the time of abnormality are “6” for the 50th percentile, “92” for the 90th percentile, and “98” for the 99th percentile.

Here, the positive factor degree and the negative factor degree are defined as follows.
・Positive factor degree F₊ = Σ {(increment of P for a P-th percentile whose value increases) × (difference in percentile value)}
・Negative factor degree F₋ = Σ {(increment of P for a P-th percentile whose value decreases) × (difference in percentile value)}
P is a number indicating the percentile type; for the 50th percentile, P = 50. A P-th percentile whose value increases is a percentile type for which the percentile value at the time of abnormality is larger than the normal-time percentile value. A P-th percentile whose value decreases is a percentile type for which the normal-time percentile value is larger than the percentile value at the time of abnormality.

The increment of P for a P-th percentile is, when the percentile types are arranged in ascending order of P, the increase in P from the immediately preceding percentile type. The example of FIG. 16 has the 50th, 90th, and 99th percentiles. In that case, the increment of P for the 50th percentile is “50”. The increment of P for the 90th percentile is “40” (90 − 50). The increment of P for the 99th percentile is “9” (99 − 90).

When the latency of the service does not satisfy the performance requirement, if the load on a container or server is higher than usual, its metric values concentrate at high values and the positive factor degree becomes high. Conversely, when the latency of the service does not satisfy the performance requirement, if the load on a container or server is lower than usual, its metric values concentrate at low values and the negative factor degree becomes high. If the latency of the service does not satisfy the performance requirement and yet the negative factor degree of a container or server is higher than its positive factor degree, it can be judged that performance is being degraded by a factor other than that container or server.

In the example shown in FIG. 16, the factor degrees are as follows.
・Positive factor degree F₊ = (90 − 50) × (92 − 71) + (99 − 90) × (98 − 90) = 912
・Negative factor degree F₋ = 50 × (15 − 6) = 450
The abnormality factor estimation unit 155 performs this factor degree calculation for each metric type. The abnormality factor estimation unit 155 then estimates that the component executed by the container from which the maximum factor degree was calculated is the factor component, that is, the cause of the abnormality.
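The factor degree definitions can be checked against the FIG. 16 numbers with a short Python sketch. The function name and the dictionary layout (percentile type P mapped to percentile value) are assumptions for illustration.

```python
def factor_degrees(normal, abnormal):
    """Return (positive, negative) factor degrees for one metric type.

    normal, abnormal: dicts mapping percentile type P (e.g. 50, 90, 99) to the
    normal-time weighted percentile value and the abnormal-time percentile value.
    """
    positive = 0
    negative = 0
    previous_p = 0
    for p in sorted(normal):
        increment = p - previous_p          # increment of P from the previous type
        diff = abnormal[p] - normal[p]      # > 0: value increased, < 0: decreased
        if diff > 0:
            positive += increment * diff
        elif diff < 0:
            negative += increment * (-diff)
        previous_p = p
    return positive, negative


# Values from the FIG. 16 example.
normal = {50: 15, 90: 71, 99: 90}
abnormal = {50: 6, 90: 92, 99: 98}
positive, negative = factor_degrees(normal, abnormal)
print(positive, negative)  # 912 450
```

The 50th percentile decreased (15 to 6), so it contributes 50 × 9 to the negative factor degree, while the 90th and 99th percentiles increased and contribute 40 × 21 + 9 × 8 to the positive factor degree.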

FIG. 17 is a diagram illustrating an example of estimating the factor component. As shown in FIG. 17, a positive factor degree and a negative factor degree are calculated for each metric type of every container. The abnormality factor estimation unit 155 extracts the maximum factor degree from the calculated factor degrees. In the example of FIG. 17, the positive factor degree for the CPU usage rate of the container C₁₁ is the maximum. The abnormality factor estimation unit 155 estimates that the component (component name “component 1”) executed in the container C₁₁, from which the extracted factor degree was calculated, is the factor component. At this time, the abnormality factor estimation unit 155 takes the metric type “CPU usage rate”, which corresponds to the maximum factor degree, as the factor metric. The abnormality factor estimation unit 155 also sets the container factor degree sign, which indicates whether the maximum factor degree is a positive or a negative factor degree, to “positive”.

Furthermore, the abnormality factor estimation unit 155 acquires, from the container placement management table 141, the server name of the server on which the container that yielded the maximum factor degree is deployed. The abnormality factor estimation unit 155 takes the acquired server name as the server name of the container operating server. In the example of FIG. 17, the container operating server is “server 1”.
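The selection of the factor component, factor metric, container factor degree sign, and container operating server can be sketched as below. All names and all numeric values here are hypothetical and chosen only so that the outcome matches the FIG. 17 narrative (C₁₁, CPU usage rate, positive, “server 1”); the two lookup dictionaries stand in for the container placement information.

```python
def estimate_factor_component(factor_table, component_of, server_of):
    """Pick the largest factor degree over all containers, metrics, and signs.

    factor_table: (container, metric_type) -> (positive, negative) factor degrees.
    component_of: container -> component name (hypothetical lookup).
    server_of: container -> server name (hypothetical lookup).
    """
    best = None
    for (container, metric), (positive, negative) in factor_table.items():
        for sign, degree in (("positive", positive), ("negative", negative)):
            if best is None or degree > best["degree"]:
                best = {"degree": degree, "container": container,
                        "factor_metric": metric, "container_sign": sign}
    best["factor_component"] = component_of[best["container"]]
    best["container_server"] = server_of[best["container"]]
    return best


# Illustrative values only; they are not taken from FIG. 17.
factor_table = {
    ("C11", "CPU usage rate"): (912, 450),
    ("C11", "memory I/O rate"): (120, 80),
    ("C21", "CPU usage rate"): (300, 40),
}
component_of = {"C11": "component 1", "C21": "component 2"}
server_of = {"C11": "server 1", "C21": "server 2"}
estimate = estimate_factor_component(factor_table, component_of, server_of)
```

Ties are resolved in favor of the first entry encountered; the description does not say how ties should be handled.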

The abnormality factor estimation unit 155 also calculates the factor degree for each metric type of the servers. The abnormality factor estimation unit 155 then compares the positive factor degree with the negative factor degree for each metric type of a server. If the positive factor degree is equal to or greater than the negative factor degree, the abnormality factor estimation unit 155 sets the factor degree sign of that metric type to “positive”. If the positive factor degree is less than the negative factor degree, the abnormality factor estimation unit 155 sets the factor degree sign of that metric type to “negative”.

The abnormality factor estimation unit 155 then takes the factor degree sign of the factor metric of the container operating server as the server factor degree sign.
FIG. 18 is a diagram illustrating an example of determining the server factor degree sign. In the example of FIG. 18, the factor degree sign of the factor metric “CPU usage rate” of the container operating server “server 1” is “positive”, so the server factor degree sign is “positive”.

The factor degree of a server can be calculated by the same procedure as for a container, but for a server it is sufficient to know the factor degree sign of each metric type. Therefore, for example, the factor degree of a metric type may be calculated by the following formula without separating the positive factor degree from the negative factor degree.
・Factor degree F = Σ {(increment of P for the P-th percentile) × (difference in percentile value)}
Here the difference in percentile value is the value obtained by subtracting the normal-time percentile value from the percentile value at the time of abnormality, so that an increase counts as positive. If the factor degree F calculated in this way is 0 or more, the factor degree sign is “positive”. If the factor degree F is negative, the factor degree sign is “negative”.
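The single-formula variant for servers can be sketched as follows. The sketch takes the difference as the abnormal-time value minus the normal-time value; this direction is an assumption chosen so that the result agrees with comparing the positive and negative factor degrees (F = F₊ − F₋, “positive” when F₊ ≥ F₋). The function name and the reuse of the FIG. 16 numbers as server data are illustrative only.

```python
def signed_factor_degree(normal, abnormal):
    """Return (F, sign) where F = sum(increment of P * (abnormal - normal))."""
    total = 0
    previous_p = 0
    for p in sorted(normal):
        total += (p - previous_p) * (abnormal[p] - normal[p])
        previous_p = p
    sign = "positive" if total >= 0 else "negative"
    return total, sign


# Reusing the FIG. 16 percentile values purely as sample input.
f_value, f_sign = signed_factor_degree({50: 15, 90: 71, 99: 90},
                                       {50: 6, 90: 92, 99: 98})
print(f_value, f_sign)  # 462 positive
```

With these inputs F equals 912 − 450 = 462, which matches the sign obtained by comparing the two separately computed factor degrees.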

When the abnormality factor estimation unit 155 has determined the factor component, the factor metric, the sign of the maximum factor degree (the container factor degree sign), and the server factor degree sign, the container placement control unit 156 performs performance improvement processing, such as adding a container or changing the placement destination of a container, so as to improve the latency.

For example, when the container factor degree sign is positive, the container placement control unit 156 determines that the resources of the factor component are insufficient and scales out the factor component. When the factor degree of the factor component is negative and the server factor degree sign is “positive”, the container placement control unit 156 determines that the performance of the factor component has dropped because components other than the factor component are placing a heavy load on the resources. In this case, the container placement control unit 156 relocates the container. Relocating a container is the process of changing the server on which the container runs to a different server.

The amount of resources used by a component's container may be specified in advance. In this case, when scaling out or relocating a component, the container placement control unit 156 treats the servers that can accommodate the container as placement destination candidates. When there are multiple candidate servers, the container placement control unit 156 assumes in turn that the container is deployed on each candidate and selects, as the placement destination, the candidate whose minimum remaining resource amount is the largest.

FIG. 19 is a diagram illustrating an example of container placement. In the example of FIG. 19, the factor component is “component 1” and the container factor degree sign is “positive”. In this case, the container placement control unit 156 scales out “component 1”.

At this time, the container placement control unit 156 refers to the server resource management table 142 and checks the remaining resource amount of each server. In the example of FIG. 19, the remaining resource amounts of “server 1” are CPU “50”, memory “30”, and network “40”. The remaining resource amounts of “server 2” are CPU “30”, memory “50”, and network “60”.

The container placement control unit 156 also refers to the container resource management table 143 and checks the amount of resources used by each container of the factor component. In the example of FIG. 19, a container of the factor component “component 1” uses CPU “10”, memory “20”, and network “10”.

Assume here that the only servers with enough remaining resources to host a container of “component 1” are the server 42, whose server name is “server 1”, and the server 43, whose server name is “server 2”. In this case, the server 42 and the server 43 are the placement destination candidates.

If the container is placed on the server 42 (“server 1”), the remaining resource amounts become CPU “40”, memory “10”, and network “30”. If the container is placed on the server 43 (“server 2”), the remaining resource amounts become CPU “20”, memory “30”, and network “50”. In this case, the minimum remaining resource amount of the server 42 (“server 1”) is “10”, for memory. By contrast, the minimum remaining resource amount of the server 43 (“server 2”) is “20”, for CPU.

The container placement control unit 156 selects, as the placement destination, the server 43 (“server 2”), whose minimum remaining resource amount is the largest. As the scale-out process, the container placement control unit 156 then places on the server 43 a container C₁₃ for executing “component 1”.
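The placement rule (choose the candidate whose smallest remaining resource after placement is largest) can be sketched in Python using the FIG. 19 figures. The function name and the dictionary layout are assumptions for illustration.

```python
def choose_placement(servers, demand):
    """Return (server_name, min_remaining) maximizing the minimum remaining resource.

    servers: server name -> {resource: remaining amount}.
    demand: {resource: amount used by one container of the component}.
    Servers that cannot accommodate the container are skipped.
    """
    best_name, best_min = None, None
    for name, free in servers.items():
        after = {res: free[res] - demand[res] for res in demand}
        smallest = min(after.values())
        if smallest < 0:
            continue  # not a placement destination candidate
        if best_min is None or smallest > best_min:
            best_name, best_min = name, smallest
    return best_name, best_min


# Remaining resources and container demand from the FIG. 19 example.
servers = {
    "server 1": {"cpu": 50, "memory": 30, "network": 40},
    "server 2": {"cpu": 30, "memory": 50, "network": 60},
}
demand = {"cpu": 10, "memory": 20, "network": 10}
placement = choose_placement(servers, demand)
print(placement)  # ('server 2', 20)
```

After placement, “server 1” would be left with a minimum of 10 (memory) and “server 2” with a minimum of 20 (CPU), so “server 2” is chosen.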

The container placement control unit 156 continues the performance adjustment until the Apdex value reaches the target value. When the Apdex value reaches the target value, the container placement control unit 156 ends the performance adjustment.

FIG. 20 is a diagram illustrating an example of a performance adjustment result. In the example of FIG. 20, the target for the Apdex value is 0.8 or more. The Apdex value was “0.75” before the performance adjustment and has improved to “0.83” as a result of the adjustment.

Next, the procedure of the performance adjustment process will be described in detail.
FIG. 21 is a flowchart illustrating an example of the procedure of the performance adjustment process. The process shown in FIG. 21 adjusts the performance of a single service. When performance adjustment is performed for multiple services, the process shown in FIG. 21 is executed for each of them. The process shown in FIG. 21 is described below in order of step number.

[Step S101] When, for example, an administrator inputs an instruction to start the performance adjustment process for a service, the performance adjustment engine 150 initializes the variable R, which indicates the number of repetitions, to “0”.

[Step S102] The latency checking unit 153 acquires the service information on the service subject to performance adjustment and the latency of that service. For example, the latency checking unit 153 acquires the service information from the service information storage unit 110. The acquired service information includes the Apdex value specified as the performance requirement and the Satisfied Time (T) used to calculate Apdex. The latency checking unit 153 also acquires, from the latency storage unit 41b of the gateway 41, the latencies of requests to the service subject to performance adjustment that were measured within the most recent predetermined period.

[Step S103] The latency checking unit 153 calculates the Apdex of the service based on the latencies of the multiple requests.
[Step S104] The latency checking unit 153 determines whether the Apdex value calculated in step S103 satisfies the performance requirement. For example, the latency checking unit 153 determines that the performance requirement is satisfied if the calculated Apdex value is equal to or greater than the Apdex value specified as the performance requirement. If the performance requirement is satisfied, the latency checking unit 153 advances the process to step S105. If the performance requirement is not satisfied, the latency checking unit 153 advances the process to step S107.
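Steps S103 and S104 can be sketched as follows. The sketch assumes the standard Apdex definition, in which a request is “satisfied” if its latency is at most T, “tolerating” if it is above T and at most 4T, and the score is (satisfied + tolerating / 2) / total; how this embodiment defines Apdex is given elsewhere in the description and takes precedence. The function names and the sample latencies are illustrative.

```python
def apdex(latencies, satisfied_time):
    """Compute the Apdex score for a non-empty list of request latencies."""
    satisfied = sum(1 for x in latencies if x <= satisfied_time)
    tolerating = sum(1 for x in latencies
                     if satisfied_time < x <= 4 * satisfied_time)
    return (satisfied + tolerating / 2) / len(latencies)


def meets_requirement(latencies, satisfied_time, target):
    """Step S104: the requirement is met when the score is at least the target."""
    return apdex(latencies, satisfied_time) >= target


# Hypothetical latencies in seconds with T = 0.5 s:
# three satisfied, one tolerating (0.6 s), one frustrated (2.5 s).
sample = [0.1, 0.2, 0.3, 0.6, 2.5]
score = apdex(sample, 0.5)
ok = meets_requirement(sample, 0.5, 0.8)
```

With these sample values the score is (3 + 0.5) / 5 = 0.7, so a target of 0.8 is not met and processing would continue at step S107.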

[Step S105] The behavior calculation unit 154 calculates the normal-time behavior of the containers and the servers and saves it in the normal behavior storage unit 130. For example, the behavior calculation unit 154 acquires, from the metric information storage unit 120, the metric values of the containers and the servers for the most recent predetermined period, and calculates the percentile values for multiple percentile types. The behavior calculation unit 154 then stores, in the normal behavior storage unit 130, a container behavior management table in which the percentile values of a container are set, as information indicating the normal-time behavior of that container. The behavior calculation unit 154 likewise stores, in the normal behavior storage unit 130, a server behavior management table in which the percentile values of a server are set, as information indicating the normal-time behavior of that server.

[Step S106] The performance adjustment engine 150 resets the variable R, which indicates the number of repetitions, to “0”. The performance adjustment engine 150 then advances the process to step S102.

[Step S107] The behavior calculation unit 154 calculates the behavior of the containers and the servers at the time of abnormality. For example, the behavior calculation unit 154 acquires, from the metric information storage unit 120, the metric values of the containers and the servers for the most recent predetermined period, and calculates the percentile values for multiple percentile types. The percentile values calculated for each container are the information indicating the behavior of that container at the time of abnormality. Likewise, the percentile values calculated for each server are the information indicating the behavior of that server at the time of abnormality.

[Step S108] The abnormality factor estimation unit 155 calculates, for each metric type, the difference between the normal-time behavior and the abnormal-time behavior of the containers that execute the components used to provide the service subject to performance adjustment. For example, the abnormality factor estimation unit 155 acquires the weighted percentile values from the normal behavior storage unit 130. The abnormality factor estimation unit 155 then compares the weighted percentile values indicating the normal-time behavior with the percentile values indicating the abnormal-time behavior calculated in step S107, and calculates the positive factor degree and the negative factor degree for each metric type.

[Step S109] The abnormality factor estimation unit 155 estimates the factor component based on the calculation result of step S108. For example, the abnormality factor estimation unit 155 extracts the largest factor degree from among the positive and negative factor degrees of all metric types. The abnormality factor estimation unit 155 then estimates that the component executed in the container from which the extracted factor degree was calculated is the factor component.

[Step S110] The performance adjustment engine 150 determines whether the value of the variable R, which indicates the number of repetitions, has reached a threshold X (X is an integer equal to or greater than 1). If the number of repetitions has reached the threshold X, the performance adjustment engine 150 abandons the performance adjustment and ends the process. If the number of repetitions has not reached the threshold X, the container placement control unit 156 advances the process to step S111.

[Step S111] The container placement control unit 156 determines whether the sign of the factor degree extracted in step S109 (the container factor degree sign) is positive. If it is a positive factor degree, the container placement control unit 156 advances the process to step S112. If it is a negative factor degree, the container placement control unit 156 advances the process to step S113.

[Step S112] The container placement control unit 156 scales out the factor component. That is, the container placement control unit 156 places an additional container that executes the factor component on one of the servers. For example, the container placement control unit 156 places the container on the server that, among the servers able to host the container, has the most free resources after placement. The container placement control unit 156 then advances the process to step S115.

[Step S113] The container placement control unit 156 determines whether the server factor degree sign is positive. If the server factor degree sign is positive, the container placement control unit 156 advances the process to step S114. If the server factor degree sign is negative, the container placement control unit 156 abandons the performance adjustment and ends the process.

[Step S114] The container placement control unit 156 changes the placement of a container. That is, it moves the container from which the factor degree extracted in step S109 was calculated from its current server to another server.

[Step S115] The performance adjustment engine 150 increments the value of the variable R, which counts the repetitions, by 1 and advances the process to step S102.
In this way, for a service that does not satisfy its performance requirement, it is possible to determine appropriately which component is the bottleneck and to adjust performance so that the processing capacity of that component improves. As a result, even if no performance requirement is defined for each component, the capability of a component is expanded automatically when its performance is insufficient. This reduces, for example, the operation and management cost of the system. In addition, because the performance of components is adjusted automatically, developers need not be concerned with the performance a component delivers while developing it, which reduces development cost.
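The branching in steps S110 to S114 can be illustrated with a minimal Python sketch. The names `Action`, `decide_action`, and `pick_server` are hypothetical and do not appear in the embodiment. The sketch also assumes that the required resource amount of the new container is the same on every server, so that the server with the most free resources after placement is simply the feasible server with the most free resources now.

```python
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    GIVE_UP = "give_up"      # abandon performance adjustment
    SCALE_OUT = "scale_out"  # step S112
    RELOCATE = "relocate"    # step S114


def decide_action(repetitions: int, threshold_x: int,
                  container_sign_positive: bool,
                  server_sign_positive: bool) -> Action:
    """Return the action chosen by steps S110 to S114 (illustrative only)."""
    # S110: stop once the repetition count R has reached the threshold X.
    if repetitions >= threshold_x:
        return Action.GIVE_UP
    # S111: a positive container factor degree sign leads to scale-out (S112).
    if container_sign_positive:
        return Action.SCALE_OUT
    # S113: a positive server factor degree sign leads to relocation (S114).
    if server_sign_positive:
        return Action.RELOCATE
    # S113: both signs negative, so performance adjustment is abandoned.
    return Action.GIVE_UP


def pick_server(free_resources: Dict[str, float], demand: float) -> Optional[str]:
    """S112 example policy: the feasible server with the most free resources."""
    feasible = {s: f for s, f in free_resources.items() if f >= demand}
    if not feasible:
        return None
    return max(feasible, key=feasible.get)
```

In this sketch the caller is expected to increment the repetition count (step S115) after a scale-out and then re-evaluate the service.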

In the second embodiment, the component that is causing the latency deterioration is determined based on the difference between the behavior of each container in the normal state and in the abnormal state. This makes it possible to determine appropriately which component is the cause of the latency deterioration.

Moreover, in the second embodiment, a percentile value is obtained from the frequency distribution of each metric, so that the state represented by the frequency distribution is replaced with a numerical value that is easy to compare. As a result, the difference in behavior between the normal state and the abnormal state can be quantified, and the container with the largest difference in behavior can easily be identified from among a plurality of containers.
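As a minimal sketch of this idea, the following Python code reduces a frequency distribution to a percentile value and picks the container whose value moved the most. The histogram layout (metric value mapped to observation count), the function names, and the use of an absolute difference are assumptions made for illustration; the embodiment's own factor degree calculation is defined elsewhere in the description.

```python
from typing import Dict


def percentile_from_histogram(histogram: Dict[float, int], p: float) -> float:
    """Smallest metric value whose cumulative count reaches p percent of the total."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = total * p / 100.0
    cumulative = 0
    for value in sorted(histogram):
        cumulative += histogram[value]
        if cumulative >= target:
            return value
    return max(histogram)


def most_deviating_container(normal: Dict[str, float],
                             abnormal: Dict[str, float]) -> str:
    """Container with the largest gap between normal and abnormal percentile values."""
    return max(normal, key=lambda name: abs(abnormal[name] - normal[name]))
```

For example, with CPU usage histograms per container, one percentile value per container for each of the two states is enough to rank the containers by how far their behavior has shifted.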

Furthermore, in the second embodiment, weighted percentile values are used so that recent states are reflected more strongly in the normal-state behavior. This allows the normal behavior to be calculated correctly. In a cloud computing system, the system configuration changes frequently, for example when servers or software are added. The normal behavior of a container or server in the distant past may therefore differ greatly from its recent normal behavior. Conversely, if only the behavior over a short recent period were treated as the normal behavior, a special factor that occurred during that period (for example, a server failure) would be reflected in it, and it would be inaccurate as a description of normal behavior. The performance adjustment engine 150 therefore calculates the normal behavior from the behavior over a reasonably long period while weighting recent normal behavior more heavily. As a result, the accuracy of the normal behavior improves.
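The effect of such weighting can be sketched as follows. This passage does not state the weighting scheme, so the exponential decay factor `decay`, the function names, and the histogram layout (metric value mapped to observation count, one histogram per past normal period, most recent first) are assumptions chosen only for illustration.

```python
from typing import Dict, List


def merge_weighted(histories: List[Dict[float, int]],
                   decay: float = 0.5) -> Dict[float, float]:
    """Merge per-period histograms; histories[0] is the most recent period."""
    merged: Dict[float, float] = {}
    for age, histogram in enumerate(histories):
        weight = decay ** age  # older periods contribute less (assumed scheme)
        for value, count in histogram.items():
            merged[value] = merged.get(value, 0.0) + weight * count
    return merged


def weighted_percentile(histories: List[Dict[float, int]], p: float,
                        decay: float = 0.5) -> float:
    """Percentile of the recency-weighted frequency distribution."""
    merged = merge_weighted(histories, decay)
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("no observations")
    target = total * p / 100.0
    cumulative = 0.0
    for value in sorted(merged):
        cumulative += merged[value]
        if cumulative >= target:
            return value
    return max(merged)
```

With `decay=1.0` every period counts equally, while smaller values pull the result toward the most recent normal period without discarding the older data entirely.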

In the second embodiment, if the sign of the factor degree of the container that is the cause of the performance deterioration (the container factor degree sign) is positive, the performance adjustment engine 150 scales out the component corresponding to that container; if the container factor degree sign is negative, it changes the placement of the container. When the container factor degree sign is negative, the performance of the container may have deteriorated not because of a problem in the container itself but because of a problem in the server on which the container is deployed (for example, overload caused by the execution of other software). The performance adjustment engine 150 therefore changes the placement of the container, moving it from the server that has some problem to another server so that the container can deliver its proper performance. This prevents excessive resource consumption caused by needless scale-out.

[Third Embodiment]
Next, a third embodiment will be described. In the third embodiment, scale-in is performed after scale-out whenever scale-in is possible.

The main objective is to satisfy the performance requirement, but it is also important to achieve this with as few resources as possible. A simple scale-out increases resource consumption, and resources that are not actually needed may end up being used. In the third embodiment, therefore, the performance adjustment engine 150 performs scale-in after scale-out where possible, in order to suppress unnecessary growth in resource usage.

Specifically, when there are two servers whose load is smaller than that of the server on which the container of the factor component is running, the performance adjustment engine 150 deletes the currently running container and runs containers on the two lightly loaded servers. This is a scale-out that adds two containers and removes one (hereinafter, an add-two-remove-one scale-out). If the total load of the component (the sum of the loads of its containers) after this scale-out is smaller than the total load in the normal state, the performance adjustment engine 150 selects the server with the smallest load among the servers on which the containers are running and deletes the container on the selected server. As a result, performance is adjusted so as to satisfy the performance requirement without increasing the number of containers.
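The choice of target servers for this scale-out can be sketched in Python as follows. The function name and the load map are hypothetical, and picking the two least loaded servers when more than two qualify is an assumption; the text above only requires that two servers with a smaller load exist.

```python
from typing import Dict, List, Optional


def plan_add_two_remove_one(server_load: Dict[str, float],
                            current: str) -> Optional[List[str]]:
    """Return the two servers that should receive new containers, or None.

    The container on `current` is meant to be deleted once the two new
    containers are running on the returned servers.
    """
    lighter = sorted(
        (s for s in server_load
         if s != current and server_load[s] < server_load[current]),
        key=lambda s: server_load[s],
    )
    if len(lighter) < 2:
        return None  # fewer than two less-loaded servers: scale-out not applicable
    return lighter[:2]
```

Calling `plan_add_two_remove_one({"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.95}, "a")` returns `["b", "c"]` in this sketch, because `d` carries a higher load than the current server.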

The procedure of the performance adjustment process in the third embodiment is described in detail below with reference to FIGS. 22 to 24.
FIG. 22 is the first half of a flowchart showing an example of the procedure of the performance adjustment process in the third embodiment. Of the steps shown in FIG. 22, steps S201 to S204 and steps S206 to S210 are the same as steps S101 to S109 of the second embodiment shown in FIG. 21. Step S205, which differs, is as follows.

[Step S205] If it is determined in step S204 that the performance requirement is satisfied, the container placement control unit 156 performs the scale-in process. When the scale-in process ends, the container placement control unit 156 advances the process to step S206.

FIG. 23 is a flowchart showing an example of the procedure of the scale-in process. The process shown in FIG. 23 is described below in order of step number.
[Step S221] The container placement control unit 156 determines whether the value of the flag "SCALE_FLAG", which indicates that an add-two-remove-one scale-out (adding two containers and removing one) has been performed, is "true". The initial value of the flag "SCALE_FLAG" is "false", and the flag is updated to "true" after an add-two-remove-one scale-out is performed. If the value of the flag "SCALE_FLAG" is "true", the container placement control unit 156 advances the process to step S222. If the value is not "true", it ends the scale-in process.

[Step S222] The container placement control unit 156 determines whether the total load of the factor component after the add-two-remove-one scale-out (adding two containers and removing one) is less than or equal to the total load in the normal state. The total load of the factor component is, for example, the sum, over the containers executing that component, of the latest metric values of the metric type that had the largest factor degree at the time of the scale-out. The total load in the normal state is, for example, the sum, over the containers executing the factor component, of the metric values during past normal operation for that same metric type. If the total load of the factor component is less than or equal to the total load in the normal state, the container placement control unit 156 advances the process to step S223. If the total load of the factor component is greater than the total load in the normal state, it advances the process to step S224.

[Step S223] The container placement control unit 156 scales in the factor component. That is, it deletes one of the containers executing the factor component from its server. The scale-in process then ends.

[Step S224] The container placement control unit 156 sets the flag "SCALE_FLAG" to "false". The scale-in process then ends.
The scale-in process is performed in this way.
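The scale-in process of steps S221 to S224 can be summarized in a short Python sketch. The `ScaleState` class, its `containers` counter, and the function name are hypothetical; the two load values are assumed to have been computed beforehand as described in step S222.

```python
from dataclasses import dataclass


@dataclass
class ScaleState:
    scale_flag: bool = False  # corresponds to SCALE_FLAG, initially "false"
    containers: int = 1       # containers currently executing the factor component


def scale_in_step(state: ScaleState, current_total_load: float,
                  normal_total_load: float) -> bool:
    """Run one pass of the scale-in process; return True if a container was removed."""
    # S221: nothing to do unless an add-two-remove-one scale-out has been performed.
    if not state.scale_flag:
        return False
    # S222: compare the component's total load with its normal-state total load.
    if current_total_load <= normal_total_load:
        # S223: scale in by deleting one container of the factor component.
        state.containers -= 1
        return True
    # S224: the load is still above normal, so clear the flag.
    state.scale_flag = False
    return False
```

As in the flowchart description, the sketch leaves the flag unchanged after a successful scale-in and clears it only in step S224.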

FIG. 24 is the second half of the flowchart showing an example of the procedure of the performance adjustment process in the third embodiment. The process shown in FIG. 24 is described below in order of step number.
[Step S231] The container placement control unit 156 determines whether the value of the flag "SCALE_FLAG" is "true". If the value of the flag "SCALE_FLAG" is "true", the container placement control unit 156 advances the process to step S232. If the value is not "true", it advances the process to step S233.

[Step S232] The container placement control unit 156 performs a scale-out that increases the number of containers executing the factor component by one. The container placement control unit 156 then advances the process to step S241.

[Step S233] The performance adjustment engine 150 determines whether the value of the variable R, which counts the repetitions, has reached the threshold X. If the repetition count has reached the threshold X, the performance adjustment engine 150 abandons performance adjustment and ends the process. If the repetition count has not reached the threshold X, the container placement control unit 156 advances the process to step S234.

[Step S234] The container placement control unit 156 determines whether the sign of the factor degree extracted in step S210 (the container factor degree sign) is positive. If the factor degree is positive, the container placement control unit 156 advances the process to step S237. If the factor degree is negative, it advances the process to step S235.

[Step S235] The container placement control unit 156 determines whether the server factor degree sign is positive. If the server factor degree sign is positive, the container placement control unit 156 advances the process to step S236. If the server factor degree sign is negative, it abandons performance adjustment and ends the process.

[Step S236] The container placement control unit 156 changes the placement of a container. That is, it moves the container from which the factor degree extracted in step S210 was calculated from its current server to another server.

[Step S237] The container placement control unit 156 determines whether the server factor degree sign is positive. If the server factor degree sign is positive, the container placement control unit 156 advances the process to step S238. If the server factor degree sign is negative, it advances the process to step S240.

[Step S238] The container placement control unit 156 performs an add-two-remove-one scale-out (adding two containers and removing one).
[Step S239] The container placement control unit 156 sets the flag "SCALE_FLAG" to "true". The container placement control unit 156 then advances the process to step S241.

[Step S240] The container placement control unit 156 performs a scale-out that adds one container.
[Step S241] The performance adjustment engine 150 increments the value of the variable R, which counts the repetitions, by 1 and advances the process to step S202 (see FIG. 22).

In this way, after an add-two-remove-one scale-out (adding two containers and removing one), scale-in can be performed if it is possible. As a result, resources are not consumed needlessly and can be used effectively.
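The decision logic of steps S231 to S240 can be condensed into the following Python sketch. The enum `AdjustAction` and the function `decide_third_embodiment` are hypothetical names; the function returns the chosen action together with the value SCALE_FLAG would hold afterwards, and it leaves the increment of R (step S241) to the caller.

```python
from enum import Enum
from typing import Tuple


class AdjustAction(Enum):
    GIVE_UP = "give_up"                          # abandon performance adjustment
    ADD_ONE = "add_one"                          # S232 or S240
    ADD_TWO_REMOVE_ONE = "add_two_remove_one"    # S238
    RELOCATE = "relocate"                        # S236


def decide_third_embodiment(scale_flag: bool, repetitions: int, threshold_x: int,
                            container_sign_positive: bool,
                            server_sign_positive: bool) -> Tuple[AdjustAction, bool]:
    """Return (action, new SCALE_FLAG value) for one pass through FIG. 24."""
    # S231 -> S232: an add-two-remove-one scale-out was already done; add one container.
    if scale_flag:
        return AdjustAction.ADD_ONE, scale_flag
    # S233: give up once the repetition count R has reached the threshold X.
    if repetitions >= threshold_x:
        return AdjustAction.GIVE_UP, scale_flag
    # S234: negative container factor degree sign -> check the server (S235).
    if not container_sign_positive:
        if server_sign_positive:
            return AdjustAction.RELOCATE, scale_flag  # S236
        return AdjustAction.GIVE_UP, scale_flag
    # S237: positive container sign; the server sign selects the kind of scale-out.
    if server_sign_positive:
        return AdjustAction.ADD_TWO_REMOVE_ONE, True  # S238, S239
    return AdjustAction.ADD_ONE, scale_flag           # S240
```

Returning the flag alongside the action keeps the function free of side effects, which makes each branch of the flowchart easy to check in isolation.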

[Other Embodiments]
In the second and third embodiments, a positive factor degree and a negative factor degree are calculated for each container. Alternatively, for example, the sum of the positive factor degree and the negative factor degree may be used as the factor degree of the container.

In the second and third embodiments, a percentile value is used as the representative value of the resource metric information, but another representative value such as the mean or the median may be used instead.
Embodiments have been illustrated above, but the configuration of each unit shown in the embodiments can be replaced with another configuration having a similar function. Any other components or steps may also be added. Furthermore, any two or more of the configurations (features) of the embodiments described above may be combined.

DESCRIPTION OF SYMBOLS
1 Service
2 to 4 Servers
5 Terminal device
10 Management device
11 Storage unit
11a Second state information
12 Processing unit

Claims

A performance management program that causes a computer to execute a process comprising:
acquiring performance information indicating the performance of a service provided by linking a plurality of processes;
determining whether the performance information satisfies a performance requirement indicating the performance required of the service;
when the performance information does not satisfy the performance requirement, acquiring first state information indicating the operating state of each of the plurality of processes in the most recent predetermined period;
calculating, for each of the plurality of processes, the difference in operating state between when the performance requirement is satisfied and when it is not satisfied, based on the first state information and on second state information indicating the operating state of each of the plurality of processes when the performance of the service satisfies the performance requirement; and
determining, based on the difference in operating state of each of the plurality of processes, the process that is a factor in the performance deterioration of the service.

The performance management program according to claim 1, further causing the computer to execute a process comprising:
when the performance information satisfies the performance requirement, acquiring third state information indicating the operating state of each of the plurality of processes in the most recent predetermined period, and updating the second state information based on the third state information.

The performance management program according to claim 2, wherein, in updating the second state information, based on the third state information of a plurality of periods, the operating state indicated by the third state information of a period closer to the present is reflected more strongly in the updated second state information.

The performance management program according to any one of claims 1 to 3, wherein:
the second state information is a second representative value, which is a predetermined representative value of second resource information indicating a time-series change in the operating status of the resources used by each of the plurality of processes when the performance of the service satisfies the performance requirement;
in acquiring the first state information, a predetermined representative value of first resource information indicating a time-series change in the operating status of the resources used by each of the plurality of processes in the most recent predetermined period is used as a first representative value; and
in calculating the difference in operating state, the difference between the first representative value and the second representative value is calculated for each of the plurality of processes.

The performance management program according to claim 1, further causing the computer to execute a process comprising:
determining a countermeasure method for the performance deterioration based on the difference in operating state of the factor process determined to be the performance deterioration factor; and
carrying out a countermeasure according to the determined countermeasure method.

In the determination of the coping method, when the second state information of the factor processing represents an operating state having a load greater than the first state information of the factor processing, the factor processing is used as the coping method. If the first state information of the factor processing indicates an operating state having a larger load than the second state information of the factor processing, the factor processing is performed as the coping method. Decide to change the server running
The performance management program according to claim 5.

The performance management program according to claim 5, wherein, in determining the countermeasure method, it is decided to stop execution of the factor process on a first server that is currently executing the factor process and to execute the factor process on each of a plurality of second servers different from the first server,
the program further causing the computer to execute a process comprising:
when the processing load with which the plurality of second servers execute the factor process, after the countermeasure according to the determined countermeasure method has been carried out, is less than or equal to a predetermined value, stopping execution of the factor process on some of the plurality of second servers.

A performance management method in which a computer:
acquires performance information indicating the performance of a service provided by linking a plurality of processes;
determines whether the performance information satisfies a performance requirement indicating the performance required of the service;
when the performance information does not satisfy the performance requirement, acquires first state information indicating the operating state of each of the plurality of processes in the most recent predetermined period;
calculates, for each of the plurality of processes, the difference in operating state between when the performance requirement is satisfied and when it is not satisfied, based on the first state information and on second state information indicating the operating state of each of the plurality of processes when the performance of the service satisfies the performance requirement; and
determines, based on the difference in operating state of each of the plurality of processes, the process that is a factor in the performance deterioration of the service.

A management device comprising:
a storage unit that stores second state information indicating the operating state of each of a plurality of processes when the performance of a service provided by linking the plurality of processes satisfies a performance requirement indicating the performance required of the service; and
a processing unit that acquires performance information indicating the performance of the service, determines whether the performance information satisfies the performance requirement, acquires, when the performance information does not satisfy the performance requirement, first state information indicating the operating state of each of the plurality of processes in the most recent predetermined period, calculates, for each of the plurality of processes, the difference in operating state between when the performance requirement is satisfied and when it is not satisfied based on the first state information and the second state information, and determines, based on the difference in operating state of each of the plurality of processes, the process that is a factor in the performance deterioration of the service.