Disclosure of Invention
In a first aspect, an embodiment of the present invention provides an operation and maintenance method applied to a data center, where the data center includes a private cloud node and a public cloud node. The method includes the following steps: the public cloud node receives a plurality of historical data sent by the private cloud node, where each historical data is a comprehensive evaluation value obtained from the service quality evaluation values of the private cloud node in N dimensions, the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; the public cloud node predicts the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; the public cloud node determines that the predicted value meets an alarm rule; and in response to the determination, the public cloud node sends an alarm message to the private cloud node.
In the operation and maintenance method provided by the embodiment of the invention, the private cloud node sends the historical data of the comprehensive evaluation value to the public cloud node, and the public cloud node uses its computing capacity to predict the comprehensive evaluation value, so that early warning and operation and maintenance can be performed on the private cloud node before a fault occurs. Because the public cloud node has stronger computing and storage capacity than the private cloud node, predicting the comprehensive evaluation value of the private cloud node on the public cloud node allows more historical data of the comprehensive evaluation value to be used and larger-scale computation to be performed than a prediction completed on the private cloud node. This improves prediction accuracy and speeds up computation, providing a more efficient and accurate operation and maintenance mode for the data center.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the private cloud node includes a physical device for providing cloud services, and the quality of service of the N dimensions includes a quality of service of the cloud services and a quality of service of the physical device.
Service quality evaluation values of several different dimensions are introduced, so that the service quality of the private cloud node is inspected or monitored from multiple dimensions, such as the services provided by the resources and the working state of the resources that provide those services. As a result, operation and maintenance of the private cloud node are more accurate, and the service quality of the private cloud node is reflected more comprehensively.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the method further includes: the private cloud node obtains first historical data of the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
The comprehensive evaluation value of the service quality is introduced. Based on the multi-dimensional service quality evaluation values, it provides a single, intuitive parameter that reflects the service quality of the private cloud node as a whole. This parameter allows the service quality of the data center to be monitored more comprehensively, at a macroscopic level, and more intuitively, which reduces complexity and improves user experience.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, that the private cloud node obtains the first historical data of the plurality of historical data according to the N-dimensional service quality evaluation values of the private cloud node in the first time period includes: the private cloud node normalizes the service quality evaluation values of the N dimensions in the first time period; and the private cloud node obtains the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the method further includes: the private cloud node acquires N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter represents a comparison value of the service quality evaluation values of any two of the N dimensions; and the private cloud node acquires the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
In a second aspect, an embodiment of the present invention provides an operation and maintenance device for performing operation and maintenance on a data center, where the data center includes a private cloud node and a public cloud node. The operation and maintenance device includes a monitoring module deployed on the private cloud node and configured to: monitor the service quality of the private cloud node in N dimensions; obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; and send a plurality of historical data to a prediction module deployed on the public cloud node, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node. The operation and maintenance device further includes the prediction module deployed on the public cloud node, configured to: receive the plurality of historical data sent by the monitoring module; predict the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; determine that the predicted value meets an alarm rule; and in response to the determination, send an alarm message to the private cloud node.
The monitoring module on the private cloud node sends the historical data of the comprehensive evaluation value to the prediction module on the public cloud node, and the computing capacity of the public cloud node is used to predict the comprehensive evaluation value, so that early warning and operation and maintenance can be performed on the private cloud node before a fault occurs. Because the public cloud node has stronger computing and storage capacity than the private cloud node, predicting the comprehensive evaluation value of the private cloud node on the public cloud node allows more historical data of the comprehensive evaluation value to be used and larger-scale computation to be performed than a prediction completed on the private cloud node, achieving more efficient operation and maintenance with higher accuracy and lower latency.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the private cloud node includes a physical device for providing cloud services, and the quality of service of the N dimensions includes a quality of service of the cloud services and a quality of service of the physical device.
Service quality evaluation values of several different dimensions are introduced, so that the service quality of the private cloud node is inspected or monitored from multiple dimensions, such as the services provided by the resources and the working state of the resources that provide those services. As a result, operation and maintenance of the private cloud node are more accurate, and the service quality of the private cloud node is reflected more comprehensively.
The comprehensive evaluation value of the service quality is introduced. Based on the multi-dimensional service quality evaluation values, it provides a single, intuitive parameter that reflects the service quality of the private cloud node as a whole. This parameter allows the service quality of the data center to be monitored more comprehensively, at a macroscopic level, and more intuitively, which reduces complexity and improves user experience.
With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, that the monitoring module obtains first historical data of the plurality of historical data according to the N-dimensional service quality evaluation values of the private cloud node in a first time period includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the monitoring module is further configured to: acquire N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter represents a comparison value of the service quality evaluation values of any two of the N dimensions; and acquire the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
In a third aspect, an embodiment of the present invention provides a data center, where the data center includes at least one computing device, and the at least one computing device includes a processor and a memory, where the processor executes program instructions in the memory to implement the various methods performed by the public cloud node and the private cloud node in the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer program product and a non-volatile computer-readable storage medium, where the computer program product and the non-volatile computer-readable storage medium contain computer instructions, and a computing device executes the computer instructions to implement various methods in the first aspect of the embodiments of the present invention.
Detailed Description
A data center in an embodiment of the invention is shown as data center 100 in fig. 1. The data center 100 includes resources 110, and based on the resources 110, the data center 100 provides services 120. The services 120 are all deployed on the resources 110. The services 120 include operation and maintenance services 121, computing services, storage services, network services, management services, data services, security services, and the like. The operation and maintenance service 121 is used to operate and maintain the data center 100. The resources 110 include physical resources and/or virtual resources, and include computing resources 111, storage resources 112, network resources 113, operation and maintenance equipment, and the like. The computing resources 111 include computing devices that provide computing capability for the data center 100, including physical computing devices and/or virtual computing devices, e.g., physical servers, or virtual machines or containers running on physical servers. The storage resources 112 include storage devices that provide storage capability for the data center 100, including physical storage devices and/or virtual storage devices, such as storage arrays or virtual storage devices. The network resources 113 include network devices that provide network capability for the data center 100, including physical network devices and/or virtual network devices, such as switches, routers, virtual switches, and virtual routers. In practice, the computing resources 111, storage resources 112, and network resources 113 may be deployed in the data center 100. The computing devices, storage devices, and network devices in these resources may directly provide services to users, and may also support or manage the services provided to users.
A data center 100 in which virtual machines, virtual storage devices, or virtual network devices are deployed is a cloud data center. The cloud data center provides cloud services to users on demand based on the resources 110, and the resources 110 of the cloud data center include physical resources and virtual resources.
The cloud data centers comprise a public cloud data center, a private cloud data center and a hybrid cloud data center.
A public cloud data center is a cloud environment shared for use by several organizations and/or users. In a public cloud data center, the services required by users are provided by an independent, third-party provider, and all users share all resources on the public cloud data center.
A private cloud data center is a data center used exclusively by one organization or one user. In a private cloud data center used exclusively by an organization, all resources of the private cloud data center are shared by members of the organization, and users outside the organization cannot access the services it provides; in a private cloud data center used exclusively by one user, other users cannot access the services it provides. Public cloud data centers provided by third-party providers typically have substantial computing and storage capabilities, and the computing and storage capacity of a private cloud data center is generally weaker than that of a public cloud data center. However, because a private cloud data center is used exclusively by one organization or user, its security is higher.
A hybrid cloud data center combines the advantages of public cloud data centers and private cloud data centers. As shown in fig. 2, hybrid cloud data center 200 includes public cloud node 212 and private cloud node 211. Public cloud node 212 and private cloud node 211 each have computing, storage, and network resources. The services 120 of the hybrid cloud data center 200 are deployed on the public cloud node 212 and the private cloud node 211, and include the operation and maintenance service 121. Public cloud node 212 has powerful computing and storage capabilities, and its resources are shared by several organizations and/or users; the resources of private cloud node 211 are used exclusively by one organization or one user, which provides higher security for that organization or user. Services deployed on public cloud node 212 of hybrid cloud data center 200 often require strong computing or storage capabilities but have relatively low security requirements; services deployed on private cloud node 211 have low requirements on computing or storage capacity but high security requirements.
In the embodiments of the invention, during operation and maintenance of the private cloud node in the hybrid cloud data center, the computing capacity of the public cloud node is used to predict the service quality index of the private cloud node.
The embodiment of the invention provides an operation and maintenance method for a data center. The method can be applied to the hybrid cloud data center 200 to provide the operation and maintenance service 121 for the hybrid cloud data center 200, and may be performed by the operation and maintenance device 300 shown in fig. 3. As shown in fig. 3, the operation and maintenance device 300 is deployed in the hybrid cloud data center 200. Specifically, the operation and maintenance device 300 includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320; the first operation and maintenance unit 310 is deployed in the private cloud node 211 and implemented by the computing, storage, and network resources of the private cloud node 211; the second operation and maintenance unit 320 is deployed in the public cloud node 212 and implemented by the computing, storage, and network resources of the public cloud node 212. The operation and maintenance method is described below with reference to fig. 3 and fig. 4. As shown in fig. 4, the method includes the following steps.
401, the first operation and maintenance unit 310 of the private cloud node 211 acquires multiple sets of historical data of N-dimensional service quality evaluation values of the private cloud node 211, where the N-dimensional service quality evaluation values respectively represent the service quality of the private cloud node 211 in the N dimensions, N is an integer not less than 2, and each set of historical data includes the N-dimensional service quality evaluation values in a time period.
For example, some of the service quality evaluations are shown in table 1. They belong to private cloud service quality, server service quality, storage service quality, and network service quality, and cover different types of service quality evaluation, such as performance, availability, and reliability. Each service quality evaluation represents the service quality of the data center in the corresponding dimension. For example, the response time of the private cloud service is a performance index and represents the service quality of the private cloud node in the dimension of how quickly the private cloud service responds to service requests. The N service quality evaluations in the embodiments of the present invention are not limited to the service quality indexes shown in table 1.
TABLE 1
402, the first operation and maintenance unit 310 of the private cloud node 211 obtains multiple historical data of the comprehensive evaluation value according to the multiple sets of historical data of the N-dimensional service quality evaluation values, where each historical data is a comprehensive evaluation value obtained according to the N-dimensional service quality evaluation values of the private cloud node 211.
Specifically, the first operation and maintenance unit 310 of the private cloud node 211 obtains first historical data of the comprehensive evaluation value according to the N-dimensional service quality evaluation values of the private cloud node 211 in a first time period, where the first historical data is one of the multiple historical data of the comprehensive evaluation value, and the first time period is one of the time periods corresponding to the multiple sets of historical data of the N-dimensional service quality evaluation values.
Generally, after a set of historical data of the N-dimensional service quality evaluation values for one time period is obtained, the historical data of the comprehensive evaluation value for that time period is calculated from that set. An embodiment of the present invention provides a method, shown in fig. 5, for obtaining the historical data of the comprehensive evaluation value from the service quality evaluation values in one time period.
4021, normalizing the service quality evaluation values of N dimensions.
The service quality evaluation values of the N dimensions may have different units. For example, storage mean time between failures and physical server mean time between failures are measured in seconds, whereas storage device availability and physical server availability are usually expressed as percentages. Before the comprehensive evaluation value is obtained, the service quality evaluation values of the N dimensions need to be normalized to eliminate their units.
Specifically, the embodiment of the present invention provides a formula for normalizing the N-dimensional service quality evaluation values. Each service quality evaluation value $x_i$ is normalized according to the following formula to obtain the normalized service quality evaluation value $y_i$:
$$y_i = \frac{x_i - \min}{\max - \min}$$
where i is any integer from 1 to N, $x_i$ is any one of the N-dimensional service quality evaluation values, $y_i$ is the normalized value of $x_i$, min is the smallest of the service quality evaluation values of the N dimensions, and max is the largest of the service quality evaluation values of the N dimensions.
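A minimal Python sketch of the normalization in step 4021, assuming the standard min-max form y_i = (x_i - min) / (max - min) and following the definitions above literally: min and max are taken over the N evaluation values of one time period. The function name `normalize` and the choice to return all zeros when max equals min (where the formula would divide by zero) are assumptions made for illustration.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalize the N service quality evaluation values of one time period.

    y_i = (x_i - min) / (max - min), where min and max are taken over the
    N values passed in, as the text defines them.
    """
    if len(values) < 2:
        raise ValueError("expected N >= 2 service quality evaluation values")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: all values equal, so the formula is undefined; return zeros.
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]


print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```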
4022, following the idea of Multiple Attribute Decision Making (MADM), the normalized service quality evaluation values of the N dimensions are processed according to the weight of each service quality evaluation value to obtain the comprehensive evaluation value. The weight $w_i$ of the service quality evaluation value $x_i$ represents the importance of that evaluation value when the service quality of the data center is evaluated from the service quality evaluation values of the N dimensions. Specifically, the comprehensive evaluation value P is obtained according to the following formula:
$$P = \sum_{i=1}^{N} w_i y_i$$
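Step 4022 combines the normalized values with their weights. Assuming the combination is the weighted sum P = w_1 y_1 + ... + w_N y_N (simple additive weighting, the most common MADM aggregation), it can be sketched as follows. The function name `composite_value`, the length check, and the example weights are illustrative assumptions; the weights themselves are taken as given, for example from the pairwise-comparison method that follows.

```python
from typing import Sequence


def composite_value(normalized: Sequence[float], weights: Sequence[float]) -> float:
    """Comprehensive evaluation value P = sum_i w_i * y_i for one time period."""
    if len(normalized) != len(weights):
        raise ValueError("need exactly one weight per service quality evaluation value")
    return sum(w * y for w, y in zip(weights, normalized))


# Three normalized evaluation values and hypothetical weights.
p = composite_value([0.0, 0.5, 1.0], [0.2, 0.3, 0.5])
print(round(p, 6))  # 0.65
```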
When the weight of each service quality evaluation value is not easily obtained, the embodiment of the present invention provides a method for obtaining the weight of each service quality evaluation value according to importance degree parameters.
An importance degree parameter represents a comparison value of any two of the service quality evaluation values of the N dimensions. The service quality evaluation values of the N dimensions correspond to N x (N-1)/2 importance degree parameters. These parameters are used as elements to construct a judgment matrix A, and the eigenvector W corresponding to the largest eigenvalue of the judgment matrix A is then calculated; W represents the weights of the service quality evaluation values of the N dimensions.
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{pmatrix}$$
where $a_{ij}$ is an importance degree parameter, i and j are integers from 1 to N, and $a_{ij}$ represents the comparison value of the service quality evaluation value $x_i$ with the service quality evaluation value $x_j$. The eigenvector W is a 1 × N matrix whose elements are the weights of the N-dimensional service quality evaluation values, that is, $W = (w_1, w_2, \ldots, w_N)$, where $w_i$ is the weight of the service quality evaluation value $x_i$.
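The weight derivation resembles the Analytic Hierarchy Process (AHP). The sketch below assumes the usual AHP conventions, which the text does not state explicitly: the N x (N-1)/2 parameters fill the upper triangle (a_ij for i < j), the diagonal is 1, and a_ji = 1/a_ij, which is why N x (N-1)/2 parameters are enough to fill an N x N matrix. The eigenvector W belonging to the largest eigenvalue is found by power iteration and scaled so the weights sum to 1 (also an assumption). Function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_judgment_matrix(n: int, params: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Build the N x N judgment matrix A from N*(N-1)/2 importance parameters.

    params[(i, j)] (0-based, i < j) is a_ij. Assumed AHP conventions:
    a_ii = 1 and a_ji = 1 / a_ij.
    """
    if len(params) != n * (n - 1) // 2:
        raise ValueError("expected N*(N-1)/2 importance degree parameters")
    a = [[1.0] * n for _ in range(n)]
    for (i, j), value in params.items():
        if not (0 <= i < j < n) or value <= 0:
            raise ValueError(f"invalid parameter a[{i}][{j}] = {value}")
        a[i][j] = value
        a[j][i] = 1.0 / value
    return a


def principal_eigenvector(a: List[List[float]], max_iter: int = 1000,
                          tol: float = 1e-12) -> List[float]:
    """Power iteration for the eigenvector of the largest eigenvalue, scaled to sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        v = [sum(a[i][k] * w[k] for k in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        if max(abs(x - y) for x, y in zip(v, w)) < tol:
            return v
        w = v
    return w


# x_1 judged 3 times as important as x_2: A = [[1, 3], [1/3, 1]].
weights = principal_eigenvector(build_judgment_matrix(2, {(0, 1): 3.0}))
print([round(x, 4) for x in weights])  # [0.75, 0.25]
```

Power iteration is used only to keep the sketch dependency-free; for a positive reciprocal matrix the dominant eigenvalue is real and positive, so the iteration converges to a strictly positive weight vector.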
Generally, the first operation and maintenance unit of the private cloud node monitors the N-dimensional service quality evaluation values in real time and calculates the comprehensive evaluation value from them in real time. Because storage and computing resources on the private cloud node are limited, once historical data of the comprehensive evaluation value is obtained, the first operation and maintenance unit uploads it to the public cloud node, where it is stored.
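The private cloud node hands its comprehensive evaluation values to the public cloud node instead of keeping a long history locally. A minimal sketch of that hand-off, assuming JSON batches: the record fields (`node_id`, `period_start`, `value`), the batch size, and the class and function names are all hypothetical, since the text specifies neither a message format nor a transport.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CompositeRecord:
    node_id: str        # hypothetical identifier of private cloud node 211
    period_start: int   # start of the time period (assumed: Unix seconds)
    value: float        # comprehensive evaluation value P of that period


class UploadBuffer:
    """Private-cloud side: keep only a small batch locally, then hand it off."""

    def __init__(self, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self._pending: List[CompositeRecord] = []

    def add(self, record: CompositeRecord) -> Optional[str]:
        """Buffer a record; return a JSON payload to upload once the batch is full."""
        self._pending.append(record)
        if len(self._pending) < self.batch_size:
            return None
        payload = json.dumps({"records": [asdict(r) for r in self._pending]})
        self._pending = []  # local copies are dropped after the hand-off
        return payload


def decode_batch(payload: str) -> List[CompositeRecord]:
    """Public-cloud side: turn a received payload back into records for storage."""
    return [CompositeRecord(**item) for item in json.loads(payload)["records"]]


buffer = UploadBuffer(batch_size=2)
buffer.add(CompositeRecord("private-211", 0, 0.82))
payload = buffer.add(CompositeRecord("private-211", 300, 0.79))
print(decode_batch(payload))
```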
403, the second operation and maintenance unit 320 of the public cloud node 212 receives the plurality of historical data of the comprehensive evaluation value sent by the private cloud node 211.
404, the second operation and maintenance unit 320 of the public cloud node 212 predicts the operating condition of the services of the private cloud node 211 based on the plurality of historical data of the comprehensive evaluation value.
Specifically, the second operation and maintenance unit uses the plurality of historical data of the comprehensive evaluation value as a training set to obtain a prediction model of the comprehensive evaluation value. The prediction model can be obtained from this training set by using neural network and deep learning methods. Preferably, the training method is a Recurrent Neural Network (RNN) training method, in particular a Long Short-Term Memory (LSTM) training method. In addition, any other method that derives a prediction model from a training set may be used in the embodiments of the invention.
The second operation and maintenance unit then predicts the comprehensive evaluation value based on the prediction model to obtain a predicted value. The predicted value reflects the trend of the operating condition of the services of the private cloud node.
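The text prefers an RNN or LSTM model, but also allows any method that derives a prediction model from the historical data. Training an LSTM does not fit in a short, dependency-free example, so the sketch below swaps in a much simpler technique, Holt's linear (double) exponential smoothing, to show the shape of the step: a series of historical comprehensive evaluation values goes in and a predicted value comes out. The smoothing factors `alpha` and `beta` and the horizon are arbitrary illustrative defaults, not values from the text.

```python
from typing import Sequence


def holt_forecast(history: Sequence[float], alpha: float = 0.5,
                  beta: float = 0.3, horizon: int = 1) -> float:
    """Forecast the comprehensive evaluation value `horizon` periods ahead.

    Holt's linear exponential smoothing (a stand-in for the LSTM in the text):
    a level and a trend are updated for every historical value, then extrapolated.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    if not (0 < alpha <= 1 and 0 < beta <= 1) or horizon < 1:
        raise ValueError("alpha and beta must be in (0, 1], horizon must be >= 1")
    level = history[0]
    trend = history[1] - history[0]  # initial trend from the first two points
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


# Hypothetical comprehensive evaluation values drifting downwards.
print(round(holt_forecast([0.9, 0.8, 0.7, 0.6]), 6))  # 0.5
```

In a deployment following the text, this function would be replaced by a model trained on the stored history on the public cloud node; the surrounding steps (alarm rule check, alarm message) stay the same.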
405, the second operation and maintenance unit 320 of the public cloud node 212 determines that the predicted value satisfies the alarm rule.
406, in response to the determination, the second operation and maintenance unit 320 of the public cloud node 212 sends an alarm message to the first operation and maintenance unit 310 of the private cloud node 211, so that the first operation and maintenance unit 310 performs operation and maintenance on the private cloud node 211 according to the alarm message.
407, the first operation and maintenance unit 310 of the private cloud node 211 performs operation and maintenance on the private cloud node 211 according to the received alarm message, for example, fault query, troubleshooting, and capacity expansion.
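The text does not define the alarm rule or the content of the alarm message. As a minimal sketch, assuming a single threshold on the predicted comprehensive evaluation value and assuming that a lower value means worse service quality, the check on the public cloud node and the resulting message could look like this. `AlarmRule`, `AlarmMessage`, `check_prediction`, and their fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical alarm rule: a threshold on the predicted value."""
    threshold: float
    alarm_when_below: bool = True  # assumption: a lower P means worse service quality

    def is_met(self, predicted: float) -> bool:
        if self.alarm_when_below:
            return predicted < self.threshold
        return predicted > self.threshold


@dataclass(frozen=True)
class AlarmMessage:
    """Hypothetical content of the alarm message sent to the private cloud node."""
    node_id: str
    predicted_value: float
    threshold: float


def check_prediction(node_id: str, predicted: float,
                     rule: AlarmRule) -> Optional[AlarmMessage]:
    """Return an alarm message if the predicted value meets the rule, otherwise None."""
    if rule.is_met(predicted):
        return AlarmMessage(node_id, predicted, rule.threshold)
    return None


rule = AlarmRule(threshold=0.6)
print(check_prediction("private-211", 0.5, rule))
print(check_prediction("private-211", 0.8, rule))  # None
```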
With this method, a comprehensive index of the data center is obtained, enabling an intuitive, comprehensive, and quantitative evaluation of the service quality of the data center. This improves the operation and maintenance efficiency of the data center and facilitates subsequent operation and maintenance operations such as early warning and fault identification.
The operation and maintenance device 300 in the embodiment of the present invention includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320. As shown in fig. 6, the data center 200 includes a private cloud node 211 and a public cloud node 212. The first operation and maintenance unit 310 includes a monitoring module 311 and a processing module 312; the second operation and maintenance unit 320 includes a prediction module 313. The modules of the first operation and maintenance unit 310 are deployed on the private cloud node 211, and the modules of the second operation and maintenance unit 320 are deployed on the public cloud node 212.
The monitoring module 311 is configured to: monitor the service quality of the private cloud node 211 in N dimensions; obtain a comprehensive evaluation value according to the N-dimensional service quality evaluation values, where the N-dimensional service quality evaluation values respectively represent the service quality of the private cloud node 211 in the N dimensions, and N is an integer not less than 2; and send a plurality of historical data, each of which is a comprehensive evaluation value obtained from the N-dimensional service quality evaluation values of the private cloud node 211, to the prediction module 313 deployed on the public cloud node 212.
The prediction module 313 is configured to: receive the plurality of historical data sent by the monitoring module 311; predict the comprehensive evaluation value of the private cloud node 211 according to the plurality of historical data to obtain a predicted value; determine that the predicted value meets an alarm rule; and in response to the determination, send an alarm message to the processing module 312 of the private cloud node 211.
The processing module 312 is configured to perform operation and maintenance on the private cloud node 211 according to the alarm message.
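The division of work among the monitoring module 311, the prediction module 313, and the processing module 312 can be sketched as three small objects wired together with callbacks. This is an illustration of the data flow only: class and method names, the batch size, the threshold, and the naive two-point extrapolation used as the "prediction model" are hypothetical stand-ins, and in a real deployment the upload and alarm calls would cross the network between the private and public cloud nodes rather than being in-process function calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProcessingModule:
    """Stand-in for processing module 312 (private cloud node)."""
    received: List[str] = field(default_factory=list)

    def on_alarm(self, message: str) -> None:
        # A real module would start fault query, troubleshooting, capacity expansion, etc.
        self.received.append(message)


@dataclass
class PredictionModule:
    """Stand-in for prediction module 313 (public cloud node)."""
    threshold: float
    send_alarm: Callable[[str], None]
    history: List[float] = field(default_factory=list)

    def receive(self, values: List[float]) -> None:
        self.history.extend(values)  # historical data is stored on the public cloud side
        predicted = self.predict()
        if predicted is not None and predicted < self.threshold:  # hypothetical alarm rule
            self.send_alarm(f"predicted value {predicted:.2f} below {self.threshold}")

    def predict(self) -> Optional[float]:
        # Naive two-point linear extrapolation, standing in for a trained model.
        if len(self.history) < 2:
            return None
        return 2 * self.history[-1] - self.history[-2]


@dataclass
class MonitoringModule:
    """Stand-in for monitoring module 311 (private cloud node)."""
    weights: List[float]
    upload: Callable[[List[float]], None]
    batch_size: int = 3
    pending: List[float] = field(default_factory=list)

    def observe(self, normalized: List[float]) -> None:
        # Comprehensive evaluation value of one period: weighted sum of normalized values.
        self.pending.append(sum(w * y for w, y in zip(self.weights, normalized)))
        if len(self.pending) >= self.batch_size:
            self.upload(self.pending)
            self.pending = []


processing = ProcessingModule()
prediction = PredictionModule(threshold=0.5, send_alarm=processing.on_alarm)
monitoring = MonitoringModule(weights=[0.5, 0.5], upload=prediction.receive)
for sample in ([0.9, 0.9], [0.7, 0.7], [0.5, 0.5]):
    monitoring.observe(sample)
print(processing.received)  # ['predicted value 0.30 below 0.5']
```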
Optionally, the private cloud node 211 includes a physical device for providing the service 120 as shown in fig. 2, and the quality of service of the N dimensions includes the quality of service of the service 120 and the quality of service of the physical device.
Optionally, that the monitoring module 311 obtains first historical data of the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node 211 in a first time period includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
Optionally, the monitoring module 311 is further configured to: acquire N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter represents a comparison value of the service quality evaluation values of any two of the N dimensions; and acquire the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
An embodiment of the present application further provides a data center 700 as shown in fig. 7. Data center 700 includes at least one computing device 710 and at least one computing device 720. Data center 700 may be used to implement hybrid cloud data center 200 as shown in fig. 3, where public cloud nodes 212, private cloud nodes 211, and operation and maintenance device 300 in hybrid cloud data center 200 are all deployed on at least one computing device 710 and/or at least one computing device 720. Specifically, the private cloud node 211 is deployed on at least one computing device 710 and the public cloud node 212 is deployed on at least one computing device 720. Correspondingly, the first operation and maintenance unit 310 on the private cloud node 211 is deployed on at least one computing device 710, and the second operation and maintenance unit 320 on the public cloud node 212 is deployed on at least one computing device 720. The computing device 710 may include a processing unit 711 and a communication interface 712, where the processing unit 711 is configured to execute functions defined by an operating system and various software programs running on the computing device, including the functions of the modules in the first operation and maintenance unit 310. The computing device 720 may include a processing unit 721 and a communication interface 722, where the processing unit 721 is configured to execute the functions defined by the operating system and various software programs running on the computing device, including the functions of the modules in the second operation and maintenance unit 320. Communication interface 712 and communication interface 722 are for communicative interaction with other devices, which may be other computing devices, and in particular communication interface 712 and communication interface 722 may be network adapter cards.
Optionally, the computing device 710 may further include an input/output interface 713, which is connected to an input/output device for receiving input information and outputting operation results. The input/output device may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the computing device 710 may also include a secondary storage 714, also commonly referred to as external memory; the storage medium of the secondary storage 714 may be a magnetic medium (e.g., a floppy disk, hard disk, or tape), an optical medium (e.g., an optical disc), a semiconductor medium (e.g., a solid state drive), or the like. The processing unit 711 may be implemented in various ways. For example, the processing unit 711 may include a processor 7112 and a memory 7111, and the processor 7112 executes related operations according to program instructions stored in the memory 7111. The processor 7112 may be a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), may be a single-core or multi-core processor, and may include, for example, a CPU0 and a CPU1. The processing unit 711 may also be implemented by a logic device with built-in processing logic, such as a Field Programmable Gate Array (FPGA) or a Digital Signal Processor (DSP). Moreover, computing device 710 in fig. 7 is merely one example of a computing device; computing device 710 may contain more or fewer components than shown in fig. 7, or have a different arrangement of components.
Likewise, computing device 720 may also include an input/output interface 723 and a secondary storage 724. The processing unit 721 of the computing device 720 may also be implemented in various ways: for example, the processing unit 721 may include a processor 7212 and a memory 7211, with the processor 7212 executing related operations according to program instructions stored in the memory 7211, or the processing unit 721 may be implemented solely by a logic device with built-in processing logic. Computing device 720 may contain more or fewer components than computing device 710, or have a different arrangement of components.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.