WO2015092873A1

WO2015092873A1 - Information processing system and information processing method

Info

Publication number: WO2015092873A1
Application number: PCT/JP2013/083818
Authority: WO
Inventors: 仁史藪崎; 洋中越; 耕一村山; 崇利加藤
Original assignee: Hitachi, Ltd. (株式会社日立製作所)
Priority date: 2013-12-18
Filing date: 2013-12-18
Publication date: 2015-06-25

Abstract

Provided are an information processing system and an information processing method that improve overall system performance in a large-scale system by dynamically varying the arrangement of applications and data. In a system that comprises a plurality of terminal devices and a plurality of computers, applications and data are experimentally arranged in different computers and response time is measured, the result is compared with the response time in the original arrangement and evaluated, and the arrangement exhibiting the smallest response time is adopted. Response time is improved in a stepwise manner by repeatedly performing experimental arrangement and evaluation.

Description

Information processing system and information processing method

The present invention relates to an information processing system and an information processing method for improving performance by changing the arrangement configuration of a system including a plurality of computers.

Patent Document 1 describes a technique for collectively managing servers in distributed data centers. It discloses a data center management device that operates as follows. When the device receives, from a user terminal, information indicating the data to be migrated and the conditions for the migration destination, it identifies, from among a plurality of data centers, candidate data centers whose data center information (indicating their characteristics) satisfies the migration-destination conditions according to a predetermined criterion, and transmits the candidates to the user terminal. When it then receives, from the user terminal, the selection of a migration destination data center from among the candidates, it receives the data to be migrated from the migration source data center and transmits that data to the migration destination data center.

Patent Document 1: JP 2012-093992 A

Patent Document 1 does not disclose updating the data center information that indicates the characteristics of each data center managed by the data center management device. Because the data center information in Patent Document 1 is static information such as location, cost, and SLA, collecting and updating, in real time, dynamic information such as response time that varies over time is not considered. As a result, depending on the operation and load status of the data centers, the selected migration destination data center may fail to satisfy the conditions requested by the user.

In recent years, cloud computing services are increasingly provided by linking multiple data centers spread over a wide area spanning Japan and other countries. In the future, services are also expected to be provided by cloud computing that uses, in addition to conventional large-scale data centers, a huge number of sites of various scales, such as container-type data centers, small data centers of about one rack, and servers at corporate sites.

In such a large-scale, complex system environment, changing the arrangement of data and applications to improve quality metrics such as response time, while taking into account the constantly changing operation and load status of the data centers, poses challenges that do not arise in small systems. For example, if a management server such as the data center management device of Patent Document 1 collects measurement information from the computers at every site and decides the arrangement of applications and data from the analysis results, the volume of information to be collected increases the network load and the collection and analysis time. In addition, because determining the arrangement of data and applications takes time, the operation and load status may change between the time the measurements were taken and the time the data and applications are actually moved based on that decision.

The present invention has been made in view of the above circumstances. Its object is to improve the performance of the entire system by dynamically changing the data arrangement, even in a large-scale system in which it is difficult to collect and analyze measurement information indicating the operation and load status of each computer constituting the system.

To achieve the above object, the computer system in this embodiment places applications and data on different computers on a trial basis, measures the response time, compares it with the response time in the original configuration, and adopts the arrangement with the smaller response time. By repeating this trial placement and evaluation, the response time is improved step by step.

Specifically, the invention is a computer system in which a plurality of terminal devices and a plurality of computers are connected via a network. Each of one or more first computers among the plurality of computers holds a program having the same function in an executable state. One of the first computers copies the program to one or more second computers, which are other computers among the plurality of computers, and the second computers set the copied program to an executable state. The first and second computers execute the program in response to requests from the plurality of terminal devices. Each of the plurality of terminal devices measures the response time of its request. Any one of the first and second computers receives, from the plurality of terminal devices, the response times together with information on the computer that processed each request, identifies a relatively long response time among them, and issues an instruction to stop the program being executed by the computer that processed the request with the identified response time.

According to the present invention, in a large-scale distributed system, it is possible to dynamically change the arrangement configuration of data and applications so that the performance of the entire system is improved.

FIG. 1 illustrates an outline of the present embodiment.
FIG. 2 illustrates the configuration of the distributed processing system.
FIG. 3 illustrates the configuration of a data processing system.
FIG. 4 illustrates the status information of an application.
FIG. 5 illustrates the status information of an execution base.
FIG. 6 illustrates the status information of a data processing program.
FIG. 7 illustrates the status information and task information of a management function.
FIG. 8 illustrates the flow in which a data processing program placed on a distant execution base is placed on an execution base near a terminal.
FIG. 9 illustrates the flow until a management function determines the execution bases on which to newly operate or stop a data processing program and executes that operation or stop.
FIG. 10 illustrates the flow until a management function determines its management group and executes a task.

A schematic example of the present invention is shown in FIGS. 1(a) to 1(d). In the system of the present embodiment, applications and data are placed on different execution bases on a trial basis, the response time is measured and compared with the response time in the original arrangement, and the arrangement with the smaller response time is adopted. By repeating this trial placement and evaluation, the response time is improved step by step.

∙ The response time may be improved by copying an application or data to multiple execution platforms so that each terminal accesses the nearest copy. However, as the number of copies increases, so does the cost of computing resources. Therefore, when the number of replicas reaches a specific number, copies on unnecessary execution platforms are stopped. In addition, performing trial placements on many execution platforms at the same time makes it possible to find a configuration with a small response time in fewer trials and to recover quickly when responsiveness deteriorates, but it increases the cost of the trials. Therefore, the number of trial placements is kept within a specific range according to the situation.

For example, in the initial configuration (FIG. 1(a)), the sub-application (Sub App1) and data (Data1) placed in the EU data center (DC1) are copied on a trial basis to the Asian data center (DC2) (FIG. 1(b)). After copying, processing is executed by both sub-applications, and when the response time of Sub App1 in DC2 is shorter than that in DC1, Sub App1 in DC1 is stopped (FIG. 1(c)). The trial placement and evaluation process is then repeated in the same manner for Sub App1 in DC2 (FIG. 1(d)); here, Sub App1 in DC2 is copied to DC3.
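The trial placement and evaluation cycle of FIG. 1, together with the replica limit described above, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function `trial_step`, the `measure` callback, the `max_replicas` parameter, and the latency figures for DC1 to DC3 are all hypothetical, and each site's response time is reduced to a single measured value.

```python
from typing import Callable, List


def trial_step(placements: List[str], candidate: str,
               measure: Callable[[str], float], max_replicas: int) -> List[str]:
    """One round of trial placement and evaluation (hypothetical helper).

    placements:   sites currently running the sub-application
    candidate:    site to which the sub-application and data are copied on trial
    measure:      returns the observed response time (ms) for a site
    max_replicas: number of copies kept; slower copies beyond it are stopped
    """
    # Trial placement: add the candidate site if it is not already in use.
    trial = placements if candidate in placements else placements + [candidate]
    # Evaluation: measure the response time at every site running a copy.
    times = {site: measure(site) for site in trial}
    # Adoption: keep the fastest copies and stop the rest.
    ranked = sorted(trial, key=lambda site: times[site])
    return ranked[:max_replicas]


if __name__ == "__main__":
    # Assumed response times (ms) seen by terminals, loosely following FIG. 1.
    latency = {"DC1": 200.0, "DC2": 80.0, "DC3": 50.0}
    state = ["DC1"]                                           # FIG. 1(a)
    state = trial_step(state, "DC2", latency.__getitem__, 1)  # FIG. 1(b), 1(c)
    print(state)  # ['DC2']
    state = trial_step(state, "DC3", latency.__getitem__, 1)  # FIG. 1(d)
    print(state)  # ['DC3']
```

Sorting by measured time and truncating to `max_replicas` is only one simple way to express "adopt the arrangement with the smallest response time"; a real system would also weigh the trial and running costs mentioned above.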

Note that the system in the present embodiment is better suited to applications that do not require strong consistency than to applications that do, such as transactional applications. In the present embodiment, applications and data are temporarily copied to different locations by trial and error. For an application requiring strong consistency, data updated during replication would therefore have to be synchronized among multiple locations. Further, in this embodiment, the system at a location judged unnecessary is stopped, and any updated data must be merged before it is stopped. From the viewpoint of ease of implementation, applications that tolerate reading old data (that is, weakly consistent applications) are suitable for the system in this embodiment.

An outline of the configuration of the distributed processing system in this embodiment is shown in FIG. 2. The information processing system of this embodiment includes a network 110, data processing systems 100 (100-1 to 100-n), a management computer 130, and terminals 120 (120-1 to 120-n). The network 110 is a WAN (Wide Area Network) or a LAN (Local Area Network) and may be a virtual network.

The data processing system 100 processes data in response to access from the terminals 120. It includes a processor 200, a main storage device 300, a communication interface (I/F) 400, and an external storage device 500. A data processing program 630, an execution base 650, a location server 620, and a management function 640 are stored in the main storage device 300 or the external storage device 500. Although the data processing system 100 is shown as a single computer in FIG. 3, it may be a system composed of a plurality of computers. Further, it may be multi-tenant, accommodating different applications, or single-tenant, accommodating a single application.

The execution platform 650 is middleware, including an OS (Operating System), for running the data processing program 630. The location server 620 is a name resolution system, such as DNS (Domain Name System) or a global name service, and holds a name resolution table consisting of pairs of URLs and IP addresses. The data processing program 630 is, for example, an application program such as an SNS (Social Networking Service) or e-commerce application, or a program that measures and controls the state of servers, storage, or networks. The data processing program 630 may also be one of the programs making up an application, such as an application plug-in or a web front end.

The location server 620 manages the locations of the execution bases 650 on which data processing programs 630 and data are stored. In response to inquiries from the terminals 120, the data processing programs 630, and the management functions 640, it returns the location of the execution platform 650 on which the requested data processing program 630 or data is stored. Since the location server 620 is equivalent to existing technologies such as a global name service or DNS, a detailed description is omitted. The management function 640 will be described later with reference to FIG. 3.

The management computer 130 is a management device that manages the data processing systems 100 and includes an input device, an operation screen, a processor, a storage device, and the like used by the administrator of the data processing systems 100. The management computer 130 generates and manages application status information 1000, execution platform status information 1100, and data processing program status information 1200, and notifies the data processing systems 100 of them. So that the administrator can change these settings, the operation screen of the management computer 130 displays this information together with setting, change, and delete buttons.

In the present embodiment, an example in which the data processing system 100 has an autonomous distributed control architecture with a management function 640 is mainly described. However, the management function 640 may be located in the management computer 130 instead of in the data processing system 100, or in both. Further, the management computer 130 may reside on one or more execution platforms 650.

The terminal 120 is a device that generates and uses data, such as a smartphone, tablet, notebook PC, construction machine, medical diagnostic device, smart meter, agricultural machine, car, elevator, or escalator. The terminal may include a management function 640 for managing the data processing program 630 on the terminal and its response time. The terminal also has an execution platform for requesting the next data processing, which works in one of two ways. It either requests the same or similar data processing from a plurality of data processing systems 100 and sends the next request to the data processing program 630 that responded with a short response time, or it notifies the location server 620 of the response time from its request until the data processing system 100 responded and sends the next request to the data processing program 630 that the location server 620 designates based on the notified response time and other information.

FIG. 3 shows the hardware configuration of the data processing system 100 and the logical configuration of the management function. The management function 640 includes a management calculation unit 642 and a management information storage unit 644. The management calculation unit 642 includes an application quality evaluation function 6410, an execution history analysis function 6420, an execution base management function 6430, a communication function 6440, and a data processing program management function 6460. The application quality evaluation function 6410 evaluates a response time, which is a time from when the terminal 120 requests data processing when using an application until a response is returned. For the measurement and evaluation, for example, an existing method such as collecting and averaging response times measured by a plurality of terminals can be applied, and thus detailed description is omitted.
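The averaging approach mentioned above can be illustrated with a short Python sketch. It assumes, hypothetically, that each terminal reports a tuple of (data processing program, execution base, response time in ms); the function name and report format are illustrative only and are not defined in this document.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Report = Tuple[str, str, float]  # (program id, execution base id, response time ms)


def average_response_times(reports: Iterable[Report]) -> Dict[Tuple[str, str], float]:
    """Mean response time per (data processing program, execution base) pair."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for program, base, response_ms in reports:
        buckets[(program, base)].append(response_ms)
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    reports = [("630-1", "650-1", 120.0), ("630-1", "650-1", 80.0),
               ("630-1", "650-2", 40.0)]
    print(average_response_times(reports))
```

A plain mean is the simplest choice; percentiles or a moving average would be equally valid "existing methods" if outliers or trends matter.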

The data processing program management function 6460 checks the requirements of the data processing program 630 and operates or stops the data processing program 630. When a parameter such as a configuration file must be changed so that the program can run on a different type of execution base, it changes that parameter. It also identifies the management group to which the data processing program 630 belongs. The execution base management function 6430 calculates the control cycle for operating or stopping data processing programs 630 and the number of data processing programs 630 to operate or stop, and identifies the execution bases on which to operate or stop them. The communication function 6440 exchanges information between management functions. When there are multiple management function roles, the task management function 6450 determines which role its own management function takes on and executes it. The execution history analysis function 6420 derives, from history information on response times observed when data processing programs 630 were operated or stopped in the past, information that assists in or determines the choice of execution bases on which to operate or stop the data processing program 630.
The application/terminal information storage unit 6470 holds the application state information 1000. The execution base group information storage unit 6475 holds the execution base state information 1100. The execution history information storage unit 6480 holds past execution history information, such as changes in response time when data processing programs 630 were moved in the past and changes in response time when failures occurred or the access load from terminals suddenly increased. The management group information storage unit 6485 holds the status information of the management functions, the list of tasks the management functions carry out, and the like. The data processing program information storage unit 6490 holds the data processing program status information 1200, the data processing program requirements, and the like. Each type of information is described in detail below.

Application state information 1000 is illustrated in FIG. 4. The application state information indicates the conditions required by the application, such as an SLA (Service Level Agreement), and its current state. This information is referred to when controlling the operation and stopping of data processing programs 630 in consideration of the state and characteristics of the application. The application state information includes, for example, the application, data processing program, placement execution base, response time, required response time, allowable quality degradation time, operating cost, related terminal group, and average terminal position. Here, the application is the identifier of the application, the data processing program is the identifier of a data processing program 630 constituting the application, and the placement execution base is the identifier of the execution base on which that data processing program 630 is placed.

The response time is the time from when a terminal requests data processing from the data processing program on the same placement execution base until the response is returned to the terminal; it is not the processing delay within the server. The required response time is the response time the application demands (that is, a target or constraint). The quality degradation time is the period from when the response time stops meeting the required response time until it meets it again, and the allowable quality degradation time is the longest quality degradation time that is tolerated.
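To make the quality degradation time concrete, the following Python sketch computes it from a series of response time samples and checks it against the allowable quality degradation time. The sample format, the units (seconds for timestamps, milliseconds for response times), and the function names are assumptions; a sample is treated as violating the requirement when its response time exceeds the required response time.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in s, response time in ms), sorted by time


def longest_degradation(samples: List[Sample], required_ms: float) -> float:
    """Longest period from the first violating sample to the next compliant one."""
    longest = 0.0
    start = None  # timestamp at which the current degradation began
    for t, response_ms in samples:
        if response_ms > required_ms:
            if start is None:
                start = t
        elif start is not None:
            longest = max(longest, t - start)
            start = None
    if start is not None:
        # Still degraded at the last sample: count up to that sample.
        longest = max(longest, samples[-1][0] - start)
    return longest


def within_allowance(samples: List[Sample], required_ms: float,
                     allowable_s: float) -> bool:
    """True if no degradation period exceeds the allowable quality degradation time."""
    return longest_degradation(samples, required_ms) <= allowable_s
```

For example, with a required response time of 100 ms, samples `[(0, 50), (10, 150), (20, 160), (30, 40)]` give a degradation time of 20 s.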

The operating cost is the cost of operation. It may be managed by dividing it into an initial cost, incurred when operation starts, and a running cost, incurred continuously. The initial cost includes the communication cost of deploying an application or data on an execution platform and the cost of writing to storage. The running cost includes, for example, the instance usage cost and the communication cost of exchanging data with terminals.
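The split of operating cost into an initial cost and a running cost can be expressed as a small data structure. In this sketch the field names, the per-hour billing unit, and the idea of projecting the total cost over a number of hours are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperatingCost:
    """Operating cost of one placement, split into initial and running parts."""
    deploy_transfer: float    # initial: communication cost of deploying app/data
    storage_write: float      # initial: cost of writing data to storage
    instance_per_hour: float  # running: instance usage cost
    traffic_per_hour: float   # running: communication with terminals

    @property
    def initial(self) -> float:
        return self.deploy_transfer + self.storage_write

    @property
    def running_per_hour(self) -> float:
        return self.instance_per_hour + self.traffic_per_hour

    def total(self, hours: float) -> float:
        """Projected cost of keeping the placement for the given number of hours."""
        return self.initial + self.running_per_hour * hours
```

Separating the two parts matters for trial placements: a short-lived trial copy pays almost only the initial cost, while a copy that is kept accumulates running cost.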

The related terminal group is the set of terminals that use the application. The set is defined, for example, by a customer segment such as the region where the terminals are located, language, user age, or importance to the company providing the application. A related terminal group may be specified for each data processing program 630. The average terminal position indicates the average physical position of the related terminal group or its logical position on the network.

Execution base state information 1100 is illustrated in FIG. 5. It indicates the characteristics of each execution base 650 and of the execution base group to which it belongs, and it is referred to when controlling the operation and stopping of data processing programs 630 in consideration of the state and characteristics of the execution bases and execution base groups. The execution base state information includes, for example, the execution base, execution base characteristics, execution base group, and execution base group characteristics. Here, the execution base and execution base group are the identifier of the execution base 650 and the identifier of the execution base group to which it belongs. The execution base characteristics include, for example, the type of execution base, the operator providing the execution base 650, the charging model of the service provided by the execution base 650, and the location of the execution base. The execution base group characteristics include, for example, the degree of dispersion of the positions of the execution bases in the group, the control cycle at which data processing programs 630 are operated and stopped, and the response time improvement expected from activating a new data processing program 630 in the group.

FIG. 6 illustrates the data processing program status information 1200. It indicates the system configuration and operation status of each data processing program 630 and is referred to when controlling the operation and stopping of data processing programs 630 in consideration of that configuration and status. The status information 1200 includes, for example, the data processing program, data processing program attribute, management group, operation/stop flag, reference-destination data processing program, reference-source data processing program, and execution base. Here, the data processing program is the identifier of the data processing program 630.

The data processing program attribute indicates an attribute of the data processing program 630, for example whether it is a Web front server, an App server, or a DB server, or whether it requires a high level of data consistency. The management group is the identifier of the management group to which the data processing program 630 belongs. The operation/stop flag indicates whether the data processing program 630 can be newly operated on an arbitrary execution platform 650 and whether it can be stopped. The reference-destination data processing program is the identifier of another data processing program 630 that the data processing program 1210 refers to. The reference-source data processing program is another data processing program 630 that refers to the data processing program 1210. The execution base is the identifier of the execution base on which the data processing program 1210 operates.
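Because the reference-source field is the inverse of the reference-destination field, it can be derived rather than maintained by hand. The following Python sketch models one row of the data processing program status information 1200 and fills in the reference sources; the class, its field names, and the attribute strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProgramStatus:
    """One row of the data processing program status information (illustrative)."""
    program: str
    attribute: str               # e.g. "Web front server", "App server", "DB server"
    management_group: str
    can_operate_or_stop: bool    # operation/stop flag
    references: List[str] = field(default_factory=list)     # reference destinations
    referenced_by: List[str] = field(default_factory=list)  # reference sources
    execution_bases: List[str] = field(default_factory=list)


def fill_referenced_by(table: Dict[str, ProgramStatus]) -> None:
    """Recompute every reference-source list from the reference-destination lists."""
    for status in table.values():
        status.referenced_by.clear()
    for status in table.values():
        for target in status.references:
            if target in table:
                table[target].referenced_by.append(status.program)
```

Deriving the inverse relation keeps the two columns consistent when a program's references change.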

In addition, the data processing program management function holds, for each data processing program 630, the requirements of that program. These requirements relate to computing resources such as CPU, memory, storage capacity, and communication bandwidth; SLAs such as response time, availability, and failure recovery time; KPIs (Key Performance Indicators) such as PV (page views), service sales, ROI (Return On Investment, that is, sales relative to the costs of cloud use), and customer satisfaction; and the costs associated with cloud use.

FIG. 7(a) shows the status information of the management functions. It indicates the status of each management function 640 and holds the information the management function 640 needs to control the operation and stopping of data processing programs 630. The management function status information includes, for example, the management function, management group, task, operating status, placement execution base, and managed data processing programs. The management group is as described above. The management function is the identifier of the management function 640. The task is the identifier of a task the management function 640 carries out; tasks include, for example, transmitting information between management groups, analyzing the status within the management group, and requesting computing resources. The operating status is the operating status of the management function. The managed data processing programs are the identifiers of the data processing programs managed by the management function. The placement execution base is the identifier of the execution base (or data processing system) on which the management function is placed.

FIG. 7(b) shows the task information of the management functions. The management function 640 refers to the task information when determining which task to take on. The task information is set uniquely for each management group and includes items such as the task name and priority. Each management function 640 can execute any task; for example, it executes the highest-priority task that it can run with its available computing resources and that is not assigned to another management function 640. Alternatively, a task may be fixedly assigned to a management function 640.
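The task selection rule described above can be sketched as follows. The single CPU figure standing in for "available computing resources", the convention that a larger number means higher priority, and the example task names are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Task:
    name: str
    priority: int         # assumption: larger value = higher priority
    cpu_required: float   # hypothetical resource requirement (cores)


def choose_task(tasks: Iterable[Task], assigned: Set[str],
                cpu_available: float) -> Optional[Task]:
    """Highest-priority task that fits the resources and no other function has taken."""
    runnable = [t for t in tasks
                if t.name not in assigned and t.cpu_required <= cpu_available]
    return max(runnable, key=lambda t: t.priority, default=None)


if __name__ == "__main__":
    tasks = [Task("inter-group transfer", 3, 0.5),
             Task("group analysis", 5, 2.0),
             Task("resource request", 1, 0.1)]
    print(choose_task(tasks, assigned=set(), cpu_available=1.0))
```

With only 1.0 core available the analysis task does not fit, so the sketch falls back to the next-highest priority task.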

FIG. 8 is a sequence diagram showing an outline of the operation of the data processing system in the present embodiment. First, a flow of a series of operations will be described, and a specific example of this operation will be described.

In step 810, the management function 640-1 of the data processing system 1 selects the data processing system 2 based on a predetermined condition and makes a program 630-2, identical to the data processing program 630-1, ready to operate on the execution platform 650-2 of the data processing system 2. Alternatively, the management function 640-1 may send a request to the management function 640-2 so that the management function 640-2 makes the data processing program 630-2 operable.

The management function 640-1 also updates the name resolution table managed by the location server 620-1, based on the information it holds on the locations of the execution bases on which the data processing program 630-1 operates. Here, since the program 630-1 (630-2) operates on the execution bases of the data processing systems 1 and 2, the locations of both execution bases are stored in the table. For example, if the name resolution table consists of pairs of URLs and IP addresses, the IP address of the new execution base is added to the row for the URL indicating the data processing program 630-1. The table may be updated whenever the location information held by the management function 640 about the execution bases running the data processing program 630 changes, or at fixed intervals based on a timer. For simplicity, updates to the name resolution table are not described hereafter, but the table is updated at the above timing.
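The name resolution table update can be sketched with a dictionary that maps each URL to the IP addresses of the execution bases running the program. The URL and addresses below are placeholders (the IPs come from documentation ranges), and the helper names are hypothetical; a real deployment would update DNS or a global name service rather than an in-memory dictionary.

```python
from typing import Dict, List

# URL of a data processing program -> IP addresses of the execution bases running it.
NameTable = Dict[str, List[str]]


def add_location(table: NameTable, url: str, ip: str) -> None:
    """Record that another execution base now runs the program behind url."""
    ips = table.setdefault(url, [])
    if ip not in ips:
        ips.append(ip)


def remove_location(table: NameTable, url: str, ip: str) -> None:
    """Forget an execution base whose copy of the program has been stopped."""
    ips = table.get(url)
    if ips is None:
        return
    if ip in ips:
        ips.remove(ip)
    if not ips:
        del table[url]


if __name__ == "__main__":
    table: NameTable = {}
    add_location(table, "http://app.example/dp630-1", "192.0.2.1")     # base 650-1
    add_location(table, "http://app.example/dp630-1", "198.51.100.2")  # base 650-2
    print(table)
```

Keeping several addresses per URL is what lets the location server hand a terminal all candidate execution bases at once.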

The number of data processing programs 630 to operate, the timing of operating them, and the method of determining the execution platform on which to operate them are described after the overview of the operation sequence.

In step 820, when the application platform 720 of the terminal 120 asks the location server 620-1 for the access destinations needed to execute the data processing program 630-1 (630-2), the location server 620-1 notifies the application platform 720 of the access destinations (execution platforms 650-1 and 650-2) on which the data processing program 630-1 (630-2) can be executed. The inquiry is made when the user provides input to the terminal 120-1, periodically, or at a predetermined time. Here, the location server 620 knows all or some of the execution platforms 650 on which the data processing program 630 can operate through information sharing among the management functions 640. Further, the access destinations notified by the location server 620 are one or more execution platforms 650 determined by the management function 640 in steps 3240 and 3250. Since existing technologies such as DNS handle such access destination inquiries, a detailed description is omitted.

In step 830, the application 710 of the terminal 120 requests data processing from the execution platforms 650-1 and 650-2, the access destinations obtained in step 820, identifies the one that returns the result sooner (here, the execution platform 650-2), and continues to request data processing from the execution platform 650-2. The data processing is the processing specified by the application.
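The terminal-side behavior in step 830, sending the same request to every notified access destination and continuing with whichever answers first, can be sketched with Python's asyncio. The request function is a stand-in that sleeps for an assumed latency; in practice it would be an HTTP or RPC call, and both the latency values and the function names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fastest_destination(destinations: List[str],
                              request: Callable[[str], Awaitable[str]]) -> str:
    """Send one request to each destination and return the first one to respond."""
    tasks = {asyncio.create_task(request(d)): d for d in destinations}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # responses from slower destinations are no longer needed
    return tasks[next(iter(done))]


# Hypothetical round-trip times (seconds) for execution platforms 650-1 and 650-2.
ASSUMED_LATENCY = {"650-1": 0.2, "650-2": 0.01}


async def simulated_request(destination: str) -> str:
    await asyncio.sleep(ASSUMED_LATENCY[destination])
    return f"result from {destination}"


if __name__ == "__main__":
    chosen = asyncio.run(fastest_destination(["650-1", "650-2"], simulated_request))
    print(chosen)  # subsequent requests would go only to this platform
```

Cancelling the slower requests avoids wasting resources; a variant could instead let them finish and report their response times to the location server.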

In step 840, as in step 810, the management function 640-2 selects the data processing system 3 based on a predetermined condition and makes a program 630-3, identical to the data processing program 630-2 (630-1), operable on the execution platform 650-3 of that system.

In step 850, the management function 640-2 identifies the management function to communicate with (here, the management function 640-1) and exchanges updated status information with it (in this case, that the data processing program 630-3 is operable). As in step 820, the application platform 720-1 then queries the location server 620-2.

In step 860, when the application platform 720-1 of the terminal 120-1 inquires of the location server 620-1 about the access destination, the location server 620-1 notifies the application platform 720-1 of the access destination.

In step 870, when the application 710-1 requests data processing from the access destination (execution platform 650-1) obtained in step 860, the execution platform 650-1 returns the processing result of the data processing program 630-1.

As described above, the management functions of the data processing systems are configured so that each can autonomously run its own data processing program on the execution bases of other systems. That is, programs with the same function run on various systems, and because the terminal is notified of the systems (execution bases) running the program, the terminal requests processing from all or some of the notified systems.

When the terminal receives the processing results from the systems it sent requests to, it selects a system with a short response time and sends its next request there. Since only programs with short response times continue to be used, the execution destination of the program can be chosen so that the response time decreases gradually. As described in detail later, stopping programs with long response times keeps the number of running copies at a fixed number and limits resource consumption.

Next, the control mechanism in the management function 640 that improves the response time step by step by repeating trial placement and evaluation will be described with reference to FIG.

Each management function in this embodiment belongs to a plurality of management groups and shares information within each of them. Different management functions take on different tasks, and each management function selects its own task autonomously. Examples of tasks include totaling the response times notified to or measured at each data processing system, exchanging information with other management groups, determining the next placement destination by Bayesian estimation, and analyzing the execution history. The methods for determining management groups and tasks are described in detail later.

In step 3205, the execution platform management function 6430 determines the execution platform group to which the data processing program 630 to be managed belongs. Here, an execution platform group is a set of execution platforms 650 that are candidates for newly operating the data processing program 630. Limiting the candidates in this way makes it possible, even in a distributed processing system composed of a large number of execution platforms, to select an execution platform with a good prospect of improving the response time with only a small amount of calculation. An execution platform may belong to a plurality of execution platform groups. Execution platform groups may also be defined in advance by an application developer or a distributed processing system administrator; for this purpose, the GUI of the management computer provides an input field and a setting button for the policy used to determine execution platform groups. Methods for determining execution platform groups are shown below.

The first example of the execution base group determination method determines the execution base group based on the physical location of the execution bases or their logical position on the network, and on the execution base type. Specifically, for example, an execution base group is selected whose member execution bases are of a type different from that of the execution base itself. As a result, even if a failure or a cyber attack occurs at a certain execution base, the possibility that all execution bases stop operating simultaneously can be reduced. In addition, when the data processing program 630 has an affinity for particular execution bases, that is, when its data processing time differs depending on the execution base type, the data processing program 630 can be operated on execution bases with a short data processing time instead of being concentrated on execution bases with a long data processing time. Here, affinity refers to whether the execution base has the performance and functions that match the characteristics of the data processing, for example whether the processing requires large amounts of memory, CPU performance, or I/O performance, or is better suited to a KVS or to an RDB.

In addition, as a method for determining execution base groups from position information, each execution base group may, for example, be given an upper limit on its degree of distribution, and its members are chosen so that their degree of distribution does not exceed that limit. By using multiple execution base groups with different upper limits, an execution base can belong both to a local execution base group, which is a set of execution bases that are physically or logically close to each other on the network, and to a wide-area execution base group, which is a set of widely distributed execution bases. This makes it possible to plan both a local recovery by the local execution base group when the response time deteriorates and a recovery from a globally optimal viewpoint by the wide-area execution base group.
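A rough sketch of grouping by an upper limit on the degree of distribution follows. It assumes, as one of the options named in the first control cycle example, that the degree of distribution is the mean pairwise communication delay between members; the greedy assignment order and the names `distribution_degree` and `group_bases` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def distribution_degree(bases, delay):
    """Mean pairwise communication delay of a set of execution bases.

    delay maps frozenset({a, b}) -> delay between bases a and b.
    A set with fewer than two bases has degree 0.
    """
    pairs = list(combinations(sorted(bases), 2))
    if not pairs:
        return 0.0
    return mean(delay[frozenset(p)] for p in pairs)


def group_bases(bases, delay, upper_limit):
    """Greedily partition bases so each group's degree stays <= upper_limit."""
    groups = []
    for base in bases:
        for group in groups:
            # Join the first existing group that stays within the limit.
            if distribution_degree(group | {base}, delay) <= upper_limit:
                group.add(base)
                break
        else:
            groups.append({base})  # no group fits: start a new one
    return groups
```

Calling `group_bases` once with a small limit and once with a large limit yields, for each execution base, membership in both a local group and a wide-area group.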

The second example of the execution base group determination method is a method of determining based on the management group to which the data processing program 630 and the management function 640 running on the execution base 650 belong. In other words, execution bases on which the data processing programs 630 belonging to the same management group are deployed belong to the same execution base group.

The third example of the execution base group determination method determines the group from the execution history. Specifically, the execution history analysis function estimates, from the past execution history, the response time that would result if the data processing program 630 were newly executed on an execution base. The relevant history includes, for example, the change in response time when an arbitrary data processing program 630 was moved between execution bases whose physical position, or logical position on the network, is similar to that of the execution base where the data processing program 630 is placed; the change in response time when a data processing program 630 with similar characteristics was moved; and the change in response time when a data processing program 630 was moved under a similar environmental change, such as a failure or a sudden increase in access load from terminals. The execution base group management function then assigns each execution base to one or more groups based on the estimated response times.

Returning to the explanation of FIG. When the execution base 650 managed by the management function 640 belongs to a plurality of execution base groups, the processing from step 3210 to step 3260 below is performed for each execution base group. Execution base groups may also be organized hierarchically, with higher-level groups that bundle several execution base groups; in that case, the processing from step 3210 to step 3260 may be performed in units of these bundled groups.

In step 3210, the execution base management function 6430 determines the control cycle for operating and stopping the data processing program 630. Methods for determining the control cycle are exemplified below.

A first example of the control cycle determination method is as follows. By referring to the status information 1100 of the execution bases, the execution bases belonging to the execution base group and their positions are identified. The degree of dispersion is calculated from the position information, for example as the average physical distance or communication delay, and the control cycle is determined from it, for example with a linear function that increases monotonically with the degree of dispersion. Besides the degree of dispersion, information on the operator, the billing model, and the execution base type may also be used. For example, if the execution bases are run by different operators, the network or gateway connecting the operators may become a communication bottleneck, so the control cycle may be increased by a fixed rate or by a fixed amount. Likewise, if the billing model charges for traffic on a pay-as-you-go rather than a flat basis, or if adding execution bases significantly increases fees, the control cycle is increased by a fixed rate or by a fixed amount.
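The first example can be sketched as a linear function with multiplicative adjustments. The base cycle, slope, and 1.5 adjustment factors are illustrative placeholders, and applying the operator and billing adjustments as fixed rates (rather than fixed added amounts) is only one of the two options the text allows.

```python
def control_cycle(dispersion, base_cycle=60.0, slope=2.0,
                  multi_operator=False, pay_as_you_go=False,
                  operator_factor=1.5, billing_factor=1.5):
    """Control cycle in seconds: a monotonically increasing linear function
    of the degree of dispersion, lengthened for cross-operator links or
    metered billing. All default values are illustrative assumptions."""
    cycle = base_cycle + slope * dispersion
    if multi_operator:
        cycle *= operator_factor  # inter-operator gateway may be a bottleneck
    if pay_as_you_go:
        cycle *= billing_factor   # each extra trial placement costs money
    return cycle
```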

A second example of the control cycle calculation method is as follows. Referring to the application status information 1000, the quality degradation allowable time multiplied by a coefficient is set as the control cycle. Alternatively, the response time, the required response time, and the quality degradation allowable time in the application status information 1000 are identified and the difference between the response time and the required response time is calculated; when the expected response time improvement range in the execution base status information 1100 is larger than this difference, the quality degradation allowable time multiplied by a coefficient is set as the control cycle. The operation cost may also be taken into account so that many copies of the data processing program 630 of an application that is costly to operate (for example, to place applications and data for) are not run unnecessarily; for example, the product of the control cycle calculated above and the operation cost is used as the control cycle.

A third example of the control cycle calculation method is as follows. The control cycle obtained by the first or second example is adjusted by one or more of the surplus bandwidth of the network connecting the execution bases, the surplus I/O performance of the execution bases, and the communication cost. For example, the product of the control cycle calculated above, the surplus network bandwidth, the surplus I/O performance, and the communication cost is used as the control cycle.
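The second and third examples can be combined into one hedged sketch. The coefficient value, the choice to return `None` when the expected improvement cannot close the gap (the text does not say what happens in that case), and the treatment of surplus bandwidth, surplus I/O, and communication cost as plain multiplicative factors are assumptions.

```python
from math import prod


def control_cycle_from_quality(allowable_degradation_time, response_time,
                               required_response_time, expected_improvement,
                               coefficient=0.5, operation_cost=1.0,
                               surplus_factors=()):
    """Control cycle from the quality degradation allowable time.

    Returns None (assumed behavior) when the expected improvement range is
    not larger than the gap between current and required response time.
    """
    gap = response_time - required_response_time
    if expected_improvement <= gap:
        return None
    # Second example: coefficient x allowable time, weighted by operation cost.
    cycle = coefficient * allowable_degradation_time * operation_cost
    # Third example: multiply by surplus bandwidth, surplus I/O, and
    # communication cost (prod of an empty tuple is 1).
    return cycle * prod(surplus_factors)
```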

A fourth example of the control cycle calculation method is as follows. In the second and third examples, the expected response time improvement range and the cost are estimated from the past execution history and simulation results. Bayesian estimation, for example, can be used; specifically, the expected value of the improvement range is estimated from the improvement ranges observed under similar conditions in the execution history. Since Bayesian estimation is a common existing technique, a detailed explanation of the calculation is omitted.

Returning to the explanation of FIG. In step 3220, the execution base management function 6430 determines the number of temporary operations and the number of operations of the data processing program 630. Here, the number of temporary operations is the number of execution bases 650 on which the same data processing program 630 is operated on a trial basis. Trial operation means operating the program for a certain period in a state in which it can easily be stopped or deleted if it turns out to be unnecessary; such a state includes, for example, limited operation that permits reading data but not writing data that would later require merging of distributed data. The number of operations is the number of data processing programs 630 that, after the programs operated on a trial basis on the execution bases 650 have been evaluated, continue to operate without being stopped or deleted.

Like the control cycle, the number of temporary operations and the number of operations are determined based on the characteristics of the execution base group and of the execution bases. They may also be determined based on application status information and characteristics; on the surplus bandwidth of the network connecting the execution bases, the surplus I/O performance of the execution bases, and the communication cost; or by Bayesian estimation based on the execution history or simulation results.

In step 3230, the execution base management function 6430 determines the execution bases on which the data processing program 630 is to be made temporarily operable and those on which it is to be stopped. Because step 3205 has already narrowed the candidate execution bases down to execution base groups based on information such as location information, management groups, and application types, in this step execution bases are selected from the execution base group either at random or based on the execution history. In the history-based method, the past execution history and simulation results are used to estimate, from the response times expected when the data processing program is placed on each execution base in the group, the execution base that minimizes the response time. Bayesian estimation, for example, can be used for this estimation.

Specifically, as a premise, it is assumed that an application has been placed on an execution base as a trial and that the average response time of the terminals accessing that execution base has been measured. It is also assumed that the past execution history or a prior simulation provides, as a probability distribution, the response time when the application is placed on the same execution base under similar conditions and the placement configuration that minimizes the response time.

Let x_i (i = 1 to N) be the event that, when the program is placed at site i (one of several physically separated sets of execution bases), the average response time r_i(t) of users connected to site i falls below a value α, and let ¬x_i (i = 1 to N) be the event that r_i(t) exceeds α. Further, let y_k (k = 1 to N) be the event that placement at site k minimizes the response time.

In the t-th placement, the probability P_t(y_k | x_i) that placement at site k minimizes the response time, given that the response time for placement at site i falls below α, is calculated by Equations 1 and 2.

Here, P_t(y_k) is obtained by Bayesian updating from the result of the (t-1)-th trial. Letting i′ be the site used for the (t-1)-th trial placement, P_t(y_k) is calculated by Equations 3 and 4 when the response time there was at or below the value α.

The initial value P_1(y_k) is either 1/N or the value P_∞(y_k) obtained from a prior simulation. The site for the t-th trial placement is the site with the maximum probability P_t(y_k | x_i) or P_t(y_k | ¬x_i).

The probability is calculated in the same way when a plurality of trial placements are performed. As an example, for two trial placements, the probability P_t(y_k | x_i, ¬x_j) that placement at site k minimizes the response time, given that the response time falls below the threshold when the program is placed at site i and exceeds it when the program is placed at site j, is calculated by Equation 5.
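Equations 1 to 5 are not reproduced in this text. Under the standard form of Bayes' theorem, one update in this notation would read P_t(y_k | x_i) = P(x_i | y_k) P_t(y_k) / Σ_m P(x_i | y_m) P_t(y_m), with the posterior serving as the prior for the next trial. The sketch below implements that generic update. The likelihood table `likelihood_below[k][i]` (standing for P(x_i | y_k), assumed to come from the execution history or a prior simulation) and the treatment of several trial results as conditionally independent given y_k are assumptions, not necessarily what the original equations specify.

```python
def bayes_update(prior, likelihood_below, site, below_alpha):
    """Return P(y_k | observation) for every site k after one trial placement.

    prior[k]               -- current P_t(y_k): probability that site k is optimal
    likelihood_below[k][i] -- assumed P(x_i | y_k): probability that the response
                              time at site i falls below alpha when site k is optimal
    site                   -- index i of the site used in this trial placement
    below_alpha            -- True if x_i was observed, False if not-x_i was observed
    """
    unnormalized = []
    for k, p_k in enumerate(prior):
        p_below = likelihood_below[k][site]
        p_obs = p_below if below_alpha else 1.0 - p_below
        unnormalized.append(p_obs * p_k)
    evidence = sum(unnormalized)  # P(x_i) or P(not x_i): the normalizing term
    return [u / evidence for u in unnormalized]


def next_trial_site(posterior):
    """Site index with the maximum posterior probability of being optimal."""
    return max(range(len(posterior)), key=lambda k: posterior[k])
```

Starting from the uniform prior P_1(y_k) = 1/N, calling `bayes_update` once per trial result and then `next_trial_site` gives the site with the maximum posterior probability; applying two updates in a row corresponds to conditioning on two trial placements such as (x_i, ¬x_j).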

Returning to the explanation of FIG.

In step 3240, the data processing program management function 6460 keeps the data processing program 630 temporarily operable, or stops it, according to the number of operations and the number of temporary operations determined in step 3220 and the execution bases determined in step 3230, and measures and evaluates the response time and cost. To make the program operable, it is confirmed whether the necessary resources have been allocated on the execution base 650 where the applications and data are to run, whether the applications and data have been placed there, and whether the necessary settings have been made.

If the necessary computing resources have not been secured, they are secured; if they are managed by another cloud operator, they are requested from that operator. If the application or data has not been deployed, it is deployed. If initial settings are required, the necessary settings are made and the application is activated. If no initial settings are required, it is only checked whether the necessary computing resources have been secured.

In addition, for the terminal 120 to switch its access destination to the execution base 650 on which the application or data operates, the management function 640 that knows which execution base 650 runs the application or data needs to notify the terminal 120 of the access destination. The management function 640 therefore notifies the terminal 120 of the execution base 650 to access through the location server 620: when the terminal 120 makes an inquiry, the location server 620 returns an identifier, such as a URL, that uniquely indicates the application running on the execution base 650. The terminal 120 makes such inquiries, for example, at fixed intervals, when an application built into the terminal 120 starts, or when the user presses a reload button or similar on the terminal.
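A minimal in-memory sketch of the location server's role: management functions register or unregister the identifier of each execution base on which an application is operable, and terminals look up the current list. The class and method names and the example URLs are hypothetical; the document does not specify the location server's interface.

```python
from collections import defaultdict


class LocationServer:
    """Hypothetical registry: application -> identifiers (e.g. URLs) of the
    execution bases on which that application is currently operable."""

    def __init__(self):
        self._destinations = defaultdict(list)

    def register(self, app, url):
        # Called by a management function when a program becomes operable.
        if url not in self._destinations[app]:
            self._destinations[app].append(url)

    def unregister(self, app, url):
        # Called when the program on that execution base is stopped.
        if url in self._destinations[app]:
            self._destinations[app].remove(url)

    def lookup(self, app):
        # Answer a terminal inquiry with every current access destination.
        return list(self._destinations[app])
```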

In the first example of the measurement method, the number of executions of data processing on each execution base is measured instead of the response time itself. Because terminals access the data processing programs 630 placed on multiple execution bases 650 and select the execution base 650 with the shortest response time, the number of executions of the data processing program 630 increases on execution bases 650 near the terminals.

In the second example of the measurement method, the terminal or the management function 640 measures the response time, and the data processing programs 630 placed on execution bases 650 with short response times are kept operable. When the terminal measures the response time, it notifies one or more location servers 620 of the result, and each location server so notified shares the response time information using the information sharing mechanism among the management functions 640 shown in step 3030 of FIG. 10.

In step 3250, the data processing program management function 6460 determines which data processing programs to operate and which to stop based on the evaluation result in step 3240. When the number of executions is measured instead of the response time, a data processing program 630 whose number of executions is larger than that of the programs on the other execution bases 650, or is at or above a predetermined threshold, is kept operable, and data processing programs 630 with few executions may be stopped. When the response time is measured, it is decided to continue the data processing programs 630 running on execution bases 650 with a small average response time and to stop those running on execution bases 650 with a large average response time. When the numbers of executions or response times are compared across multiple execution bases 650, they are shared among the management functions 640 of the different data processing systems 100. The method of sharing information among different management functions 640 is described later in step 3030 of FIG. 10.
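The two decision rules of step 3250 can be sketched as follows. The function names are hypothetical, and using the number of operations from step 3220 as `n_keep` for the response-time rule is an assumption about how the two steps connect.

```python
from statistics import mean


def decide_by_count(exec_counts, threshold):
    """Execution-count rule: keep the base(s) with the most executions and any
    base at or above the threshold; the rest become candidates to stop.

    exec_counts maps an execution base to its number of executions.
    """
    top = max(exec_counts.values())
    keep = {b for b, c in exec_counts.items() if c == top or c >= threshold}
    return keep, set(exec_counts) - keep


def decide_by_response_time(samples, n_keep):
    """Response-time rule: keep the n_keep bases with the smallest average
    response time and stop the others.

    samples maps an execution base to its list of measured response times.
    """
    ranked = sorted(samples, key=lambda b: mean(samples[b]))
    return set(ranked[:n_keep]), set(ranked[n_keep:])
```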

In step 3260, the data processing program management function 6460 maintains or stops the operation of the data processing program determined in step 3250. The procedure for operation or stop in this step is the same as the operation / stop procedure shown in step 3240.

In step 3270, the execution base management function 6430 determines whether the control cycle needs to be changed. If so, the process returns to step 3210; if not, it returns to step 3220. Note that step 3220 need not be executed in the second and subsequent iterations.

By executing the above control in multiple execution base groups, an execution base can belong both to a local execution base group, which is a set of execution bases that are physically or logically close to each other on the network, and to a wide-area execution base group, which is a set of widely distributed execution bases. For example, the control cycle of the local execution base group can be made short and that of the wide-area execution base group long. This enables both a control loop that quickly recovers the response time locally and a control loop that optimizes the whole system based on wide-area evaluation.

Next, the procedure by which the management function 640 determines a management group, exchanges information with the other management functions 640 in the same management group, and determines and executes its own task will be described with reference to FIG. 10.

In step 3010, the data processing program management function 6460 refers to the management information storage unit 644 and identifies the requirements of the data processing programs placed on the execution bases it manages. In step 3020, the data processing program management function 6460 determines the management group to which the data processing program 630 to be managed belongs. Here, a management group is a group of management functions 640 that defines the range within which information related to the status information 1200 of the data processing programs is exchanged among the data processing programs 630 distributed over the plurality of execution bases 650.

The information shared among the management functions in the same group is used when the management function 640 determines, for example, which of several data processing programs 630 capable of similar processing should execute data processing requested by a user or analysis processing performed in the background; which of those programs should be operated and which stopped or deleted; where the data handled by the data processing programs 630 should be located; and whether the computing resources of the execution base 650 on which each data processing program 630 runs are in excess or in short supply.

By managing the data processing programs 630 in units of management groups, the management function 640 can limit the parties with which it exchanges information such as the status information 1200 of the data processing programs, so that even when the number of data processing programs 630 and management functions 640 increases, the amount of data exchanged can be kept down. Management group determination methods are exemplified below.

Management group determination methods fall roughly into three categories: methods based on information about the data processing program 630, methods based on information about the management function 640, and methods based on information about the execution base 650. An example of each is given below; the methods may be combined.

The first management group determination method is based on data processing program attributes. Specifically, the data processing programs and their data processing program attributes are identified from the status information 1200 of the data processing programs, and data processing programs with the same attribute are placed in the same management group. When a data processing program has a plurality of attributes, programs with the same combination of attributes may be placed in the same management group. An application developer or a distributed processing system administrator may also define in advance which attributes and attribute combinations form the same management group.

The second management group determination method is a method for determining based on the task of the management function 640. Specifically, the task of the management function 640 is grasped, and the management function 640 having the same task is set as the same management group. When the management function 640 holds a plurality of tasks, the same combination of tasks may be set as the same management group. Note that an application developer or a distributed processing system administrator may predefine tasks and combinations of tasks for the same management group.
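Both attribute-based and task-based grouping amount to bucketing members by an identical label combination, as in this sketch. Using a `frozenset` makes the combination order-insensitive, which is an assumption about what "the same combination" means; the function name and the example labels are hypothetical.

```python
from collections import defaultdict


def management_groups(members):
    """Group members (data processing programs or management functions) whose
    attribute or task combinations are identical.

    members maps a member name to an iterable of labels (attributes or tasks).
    Returns {frozenset(labels): set of member names}.
    """
    groups = defaultdict(set)
    for name, labels in members.items():
        groups[frozenset(labels)].add(name)
    return dict(groups)
```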

The third management group determination method determines the management group based on the execution base group. For example, the execution base group whose execution base, as recorded in the execution base group information 1100, matches the execution base on which the data processing program is placed according to the application status information 1000 is identified. That execution base group is then entered in the management group column of the row for that data processing program in the status information 1200 of the data processing programs. As a result, management functions 640 that are physically or logically close to each other on the network belong to the same management group, so the communication generated by exchanging information such as the status information 1200 of the data processing programs can be confined to a physically or logically local area.

Returning to the explanation of FIG. In step 3030, the communication function 6440 exchanges, with the management functions in the same management group as the data processing program 630, the status information 1200 of each data processing program, the application status information 1000, the execution base group information 1100, the response time information measured in step 3240, and the like. Because the range of information sharing is limited to the management group, the amount of communication required for information sharing between the management functions 640 can be kept from increasing.

In step 3040, the task management function 6450 determines the task that its management function 640 takes on. Specifically, the task management function 6450 refers to the task list information of the management group to which the management function 640 belongs and, among the tasks for which the number of management functions 640 in charge (the number of task execution management functions) has not reached the number of management functions 640 required (the number of necessary management functions), identifies the task with the highest priority. If the management function 640 is already in charge of a task and the newly identified task has a higher priority than that task, it takes on the new task and gives up the one it was in charge of. At that time, the number of task execution management functions of the newly assigned task is incremented, and that of the previously assigned task is decremented. For simplicity, this description assumes that each management function 640 handles at most one task; however, each management function 640 may change the number of tasks it handles based on the amount of computing resources available to it.
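A single-process sketch of the task selection rule in step 3040, assuming each management function handles at most one task as in the text. The `Task` fields mirror the quantities named above, but the field names, the convention that a larger `priority` value means higher priority, and the direct mutation of shared counters (a real deployment would need to synchronize the task list across the management group) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int      # larger value = higher priority (assumed convention)
    required: int      # number of necessary management functions
    assigned: int = 0  # number of task execution management functions


def choose_task(task_list, current=None):
    """Return the task this management function should be in charge of.

    current -- the Task it is already in charge of, or None.
    """
    open_tasks = [t for t in task_list
                  if t.assigned < t.required and t is not current]
    if not open_tasks:
        return current
    best = max(open_tasks, key=lambda t: t.priority)
    if current is not None and best.priority <= current.priority:
        return current          # keep the task it already has
    best.assigned += 1          # take on the new task ...
    if current is not None:
        current.assigned -= 1   # ... and give up the old one
    return best
```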

When multiple management functions share a task, the task's processing load may be used as a weighting factor. Whether to avoid assigning a task may also be decided based on the execution base type and distribution of the execution base 650 on which the management function 640 is placed, or on the data processing program attributes. For this purpose, an input field and a setting/change button for the determination policy information are displayed on the GUI of the management computer so that the application developer or the distributed processing system administrator can set the policy.

In step 3050, the task management function 6450 executes the task it is in charge of. In step 3060, the data processing program management function 6460 determines whether the management group needs to be changed. If so, the process returns to step 3020; if not, it returns to step 3030. Judgment methods are exemplified below; they may be combined.

The first determination method is a method that is periodically performed based on a timer. That is, the management group is re-determined when a certain time has passed. Here, the period may be a constant value or a dynamically changing value. As a method for calculating the period in the case of dynamic change, there is a method of determining based on the execution history. For example, when the management group is re-determined and the management group is not changed, the cycle is lengthened, and when the management group is changed, the cycle is shortened.
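The history-based period adjustment can be sketched as a multiplicative rule. The doubling/halving factor and the bounds are placeholders chosen for illustration; the text only says the period is lengthened when the group is unchanged and shortened when it changes.

```python
def next_regroup_period(period, group_changed, factor=2.0,
                        min_period=10.0, max_period=3600.0):
    """Lengthen the re-determination period (seconds) when the management
    group did not change, shorten it when it did, and clamp the result to
    [min_period, max_period]. Factor and bounds are illustrative."""
    period = period / factor if group_changed else period * factor
    return max(min_period, min(max_period, period))
```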

The second judgment method is triggered by environmental changes rather than by a timer. For example, it is judged that the management group should be re-determined when the execution base is built on a virtual environment and that virtual environment is migrated, when the response time shown in the application status information 1000 increases, or when the average position of the terminals changes.

With the above configuration and processing, the management group can be changed according to changes in the system environment, so that it is possible to reduce the increase in the amount of communication required for information sharing between the management functions 640.

100: data processing system, 110: network, 120: terminal, 130: management computer, 640: management function, 642: management operation unit, 6410: app quality evaluation function, 6420: execution history analysis function, 6430: execution base management function , 6440: Communication function, 6450: Task management function, 6460: Data processing program management function, 644: Management information storage unit, 6470: Application / terminal information storage unit, 6475: Execution base group information storage unit, 6480: Execution history information Storage unit, 6485: management group information storage unit, 6490: data processing program information storage unit

Claims

A computer system in which a plurality of terminal devices and a plurality of computers are connected via a network,
Each of one or more first computers among the plurality of computers holds a program in an executable state having the same function,
Any computer of the first computer copies the program to one or more second computers that are other computers of the plurality of computers,
The second computer sets the copied program to an executable state,
The first and second computers execute the program in response to requests from the plurality of terminal devices,
Each of the plurality of terminals measures a response time of the request,
A third computer, being any one of the first and second computers, receives from the plurality of terminals the response times of the requests and information on the computers that processed the requests, identifies a relatively long response time, and instructs the computer that processed the request having the identified response time to stop the program it is executing.
The computer system according to claim 1,
wherein the plurality of computers are classified into a plurality of groups, and any one of the first computers copies the program to the second computer belonging to the same group as itself.
The computer system according to claim 2,
wherein the groups are classified according to the type of program that each of the plurality of computers can execute.
The computer system according to claim 1,
wherein any one of the first computers copies the program to the second computer at a predetermined cycle.
The computer system according to claim 1,
Each second computer is classified into either a first group or a second group,
Any one of the first computers copies the program to the second computers belonging to the first group at a first period,
The same computer copies the program to the second computers belonging to the second group at a second period, and
The first period is longer than the second period.
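The periodic copying of the two preceding claims, with a longer period for the first group than for the second, can be pictured as an event schedule. The Python sketch below builds one with the standard `heapq` module; the group labels, the integer time units, and the function name `copy_schedule` are assumptions, and the claims do not specify how the periods are chosen.

```python
import heapq


def copy_schedule(periods: dict[str, int], horizon: int) -> list[tuple[int, str]]:
    """Return the (time, group) copy events that fall at or before `horizon`.

    Each group is first copied to after one full period, then once every period.
    Events at the same time are ordered by group label (tuple comparison in the heap).
    """
    if any(p <= 0 for p in periods.values()):
        raise ValueError("periods must be positive")
    heap = [(p, g) for g, p in periods.items()]
    heapq.heapify(heap)
    events: list[tuple[int, str]] = []
    while heap and heap[0][0] <= horizon:
        t, g = heapq.heappop(heap)
        events.append((t, g))
        # Schedule the next copy to this group one period later.
        heapq.heappush(heap, (t + periods[g], g))
    return events
```

With periods of 60 for the first group and 20 for the second and a horizon of 60, the schedule is `[(20, 'second'), (40, 'second'), (60, 'first'), (60, 'second')]`: the second group receives copies three times as often as the first.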
The computer system according to claim 1,
The plurality of computers are classified into a plurality of groups,
The third computer shares the response times of the requests received from the plurality of terminals, together with the information on the computers that processed those requests, with the other computers in the group to which the third computer belongs.
The computer system according to claim 6,
A role is set for each of the computers belonging to the same group, and
The third computer is determined from among the first and second computers based on the set roles.
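The sharing of response data within a group, and the role-based choice of the third computer, can be sketched as follows. This assumes a hypothetical two-role scheme in which exactly one group member carries the role "aggregator"; the role names, `Node`, `pick_third_computer`, and `share_reports` are illustrative and not taken from the specification.

```python
from dataclasses import dataclass, field

# (id of the computer that processed the request, response time in ms)
Report = tuple[str, float]


@dataclass
class Node:
    node_id: str
    role: str  # hypothetical roles: "aggregator" or "member"
    shared: list[Report] = field(default_factory=list)  # reports received from the third computer


def pick_third_computer(group: list[Node]) -> Node:
    """Determine the third computer from the role settings of the group."""
    for node in group:
        if node.role == "aggregator":
            return node
    raise ValueError("no computer in the group has the aggregator role")


def share_reports(group: list[Node], reports: list[Report]) -> Node:
    """Have the third computer pass the collected reports to every other group member."""
    third = pick_third_computer(group)
    for node in group:
        if node is not third:
            node.shared.extend(reports)
    return third
```

Because the third computer is derived from the role settings rather than fixed, reassigning the "aggregator" role moves the evaluation duty to another member without changing the rest of the sketch.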
A computer system control method in which a plurality of terminal devices and a plurality of computers are connected via a network,
Each of one or more first computers among the plurality of computers holds, in an executable state, a program having the same function,
Any one of the first computers copies the program to one or more second computers, which are other computers among the plurality of computers,
The second computer sets the copied program to an executable state,
The first and second computers execute the program in response to requests from the plurality of terminal devices,
Each of the plurality of terminals measures the response time of each of its requests, and
A third computer, which is any one of the first and second computers, receives from the plurality of terminals the response time of each request together with information identifying the computer that processed the request, identifies a response time that is relatively long, and instructs the computer that processed the request with the identified response time to stop executing the program.
The control method for a computer system according to claim 8,
The plurality of computers are classified into a plurality of groups, and any one of the first computers copies the program to a second computer belonging to the same group as itself.
The control method for a computer system according to claim 9,
The groups are classified according to the type of program that each of the plurality of computers can execute.
The control method for a computer system according to claim 8,
Any one of the first computers copies the program to the second computer at a predetermined interval.
The control method for a computer system according to claim 8,
Each second computer is classified into either a first group or a second group,
Any one of the first computers copies the program to the second computers belonging to the first group at a first period,
The same computer copies the program to the second computers belonging to the second group at a second period, and
The first period is longer than the second period.
The control method for a computer system according to claim 8,
The plurality of computers are classified into a plurality of groups, and
The third computer shares the response times of the requests received from the plurality of terminals, together with the information on the computers that processed those requests, with the other computers in the group to which the third computer belongs.
The control method for a computer system according to claim 13,
A role is set for each of the computers belonging to the same group, and
The third computer is determined from among the first and second computers based on the set roles.