CN117667573A

CN117667573A - Cluster operation and maintenance method and device based on AI language model

Info

Publication number: CN117667573A
Application number: CN202311540512.2A
Authority: CN
Inventors: 杨诚
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-11-17
Filing date: 2023-11-17
Publication date: 2024-03-08

Abstract

The disclosure provides a cluster operation and maintenance method based on an AI language model, relates to the technical field of artificial intelligence, and can be applied to the technical field of finance. The method comprises the following steps: responding to the abnormal event trigger of the cluster, and acquiring abnormal event information; invoking an AI language model interface, and inputting the abnormal event information into the AI language model; receiving an abnormal event description text returned by the AI language model; and carrying out cluster operation and maintenance according to the abnormal event description text. The disclosure also provides a cluster operation and maintenance device, equipment, a storage medium and a program product based on the AI language model.

Description

Cluster operation and maintenance method and device based on AI language model

Technical Field

The present disclosure relates to the field of artificial intelligence technology, in particular to the field of cloud computing technology, and more particularly to a cluster operation and maintenance method, apparatus, device, storage medium, and program product based on an AI language model.

Background

In today's large cloud computing systems, Kubernetes is a very popular container orchestration and management platform that can automatically deploy, scale, and manage applications and services. However, because a Kubernetes cluster involves a large number of nodes and services, a wide variety of anomalies may occur, such as node failures and service failures. Handling them places technical demands on operation and maintenance personnel. For novices or persons unfamiliar with Kubernetes clusters, handling these abnormal events is difficult and time-consuming, which reduces operation and maintenance efficiency.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

In view of the foregoing, the present disclosure provides a cluster operation and maintenance method, apparatus, device, storage medium, and program product based on an AI language model, which improves operation and maintenance efficiency.

According to a first aspect of the present disclosure, there is provided a cluster operation and maintenance method based on an AI language model, the method comprising:

responding to the abnormal event trigger of the cluster, and acquiring abnormal event information;

invoking an AI language model interface, and inputting the abnormal event information into the AI language model;

receiving an abnormal event description text returned by the AI language model; and

carrying out cluster operation and maintenance according to the abnormal event description text.

According to an embodiment of the present disclosure, carrying out cluster operation and maintenance according to the abnormal event description text includes:

determining the type of the abnormal event according to the abnormal event description text;

determining a target solution according to the abnormal event type; and

executing the operation and maintenance script corresponding to the target solution to process the abnormal event.

According to an embodiment of the present disclosure, the determining a target solution according to the abnormal event type includes:

invoking an AI language model interface according to the abnormal event type to determine a target solution; or

matching the target solution in a solution library according to the abnormal event type.

According to an embodiment of the disclosure, the invoking the AI language model interface according to the abnormal event type to determine the target solution includes:

invoking an AI language model interface, and inputting the abnormal event type into the AI language model;

comparing the solution received from the AI language model against a solution library, and determining a target solution according to the comparison result.

According to an embodiment of the present disclosure, the determining a target solution according to the comparison result includes:

if it is determined that the solution exists in the solution library, determining the solution as a target solution;

if it is determined that the solution does not exist in the solution library, sending the solution to an operation and maintenance foreground; and

in response to a confirmation instruction of the operation and maintenance foreground, determining the solution as a target solution.

According to an embodiment of the present disclosure, the method further comprises:

responding to a search instruction of the operation and maintenance foreground, and acquiring an abnormal event list in a solution library; and

displaying unprocessed abnormal events and notifying operation and maintenance personnel.

A second aspect of the present disclosure provides a cluster operation and maintenance device based on an AI language model, the device comprising:

the first acquisition module is used for responding to the abnormal event trigger of the cluster and acquiring abnormal event information;

the calling module is used for calling an AI language model interface and inputting the abnormal event information into the AI language model;

the receiving module is used for receiving the abnormal event description text returned by the AI language model; and

the operation and maintenance module is used for carrying out cluster operation and maintenance according to the abnormal event description text.

According to an embodiment of the present disclosure, the operation and maintenance module includes a first determining sub-module, a second determining sub-module and an execution sub-module.

the first determining sub-module is used for determining the type of the abnormal event according to the abnormal event description text;

the second determining sub-module is used for determining a target solution according to the abnormal event type; and

the execution sub-module is used for executing the operation and maintenance script corresponding to the target solution to process the abnormal event.

According to an embodiment of the present disclosure, the second determining sub-module includes a first determining unit and a matching unit.

the first determining unit is used for invoking an AI language model interface according to the abnormal event type to determine a target solution; or

the matching unit is used for matching the target solution in the solution library according to the abnormal event type.

According to an embodiment of the present disclosure, the first determining unit includes a calling subunit, a comparison subunit and a first determining subunit.

the calling subunit is used for calling an AI language model interface and inputting the abnormal event type into the AI language model;

the comparison subunit is used for comparing the solution received from the AI language model against a solution library; and

the first determining subunit is used for determining a target solution according to the comparison result.

According to an embodiment of the present disclosure, the first determining subunit is further configured to: determine the solution as a target solution if it is determined that the solution exists in the solution library; send the solution to an operation and maintenance foreground if it is determined that the solution does not exist in the solution library; and, in response to a confirmation instruction of the operation and maintenance foreground, determine the solution as a target solution.

According to an embodiment of the present disclosure, the apparatus further comprises a second acquisition module and a display module.

the second acquisition module is used for responding to the search instruction of the operation and maintenance foreground and acquiring an abnormal event list in the solution library; and

the display module is used for displaying unprocessed abnormal events and notifying operation and maintenance personnel.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the cluster operation and maintenance method described above.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the cluster operation and maintenance method described above.

A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the cluster operation and maintenance method described above.

According to the cluster operation and maintenance method based on the AI language model provided by the present disclosure, when the cluster triggers an abnormal event, an AI language model interface is called and the abnormal event information is input to the AI language model; cluster operation and maintenance is then carried out according to the abnormal event description text returned by the AI language model. By combining the AI language model with the Kubernetes cluster, embodiments of the disclosure convert abnormal events in the Kubernetes cluster into corresponding text descriptions through natural language processing. This makes the events easier for operation and maintenance personnel to understand and lowers the technical threshold for those personnel. Corresponding solution suggestions are also generated automatically for the personnel to confirm, which improves cluster operation and maintenance efficiency.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a system architecture diagram of an AI language model-based cluster operation and maintenance device in accordance with an embodiment of the disclosure;

FIG. 2 schematically illustrates an application scenario diagram of a cluster operation and maintenance method, apparatus, device, storage medium and program product based on an AI language model in accordance with an embodiment of the disclosure;

FIG. 3 schematically illustrates a flow chart of a cluster operation and maintenance method based on an AI language model provided in accordance with an embodiment of the disclosure;

FIG. 4a schematically illustrates one of the flowcharts of a cluster operation and maintenance operation according to an abnormal event description text provided in accordance with another embodiment of the present disclosure;

FIG. 4b schematically illustrates a second flowchart of a cluster operation and maintenance operation according to the abnormal event description text provided in accordance with another embodiment of the present disclosure;

FIG. 5 schematically illustrates a third flowchart of a cluster operation and maintenance operation according to the abnormal event description text provided in accordance with another embodiment of the present disclosure;

FIG. 6 schematically illustrates a fourth flow chart of a cluster operation and maintenance operation according to the abnormal event description text provided in accordance with another embodiment of the present disclosure;

FIG. 7 schematically illustrates a block diagram of a cluster operation and maintenance device based on an AI language model in accordance with an embodiment of the disclosure;

FIG. 8 schematically illustrates a block diagram of an electronic device adapted to implement an AI language model-based cluster operation and maintenance method, in accordance with an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where expressions such as "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

The terms appearing in the embodiments of the present disclosure will first be explained:

Kubernetes: an open source platform for automatically deploying, scaling and managing containerized applications.

Pod: the smallest workload unit managed by the Kubernetes platform.

Kubelet: the Kubernetes component deployed in the virtual machine that is responsible for managing Pod instances.

Kube-apiserver: the component deployed on the management node that exposes the Kubernetes API.

Etcd: a key-value store database.

In view of the above technical problems, an embodiment of the present disclosure provides a cluster operation and maintenance method based on an AI language model, where the method includes: responding to the abnormal event trigger of the cluster, and acquiring abnormal event information; invoking an AI language model interface, and inputting the abnormal event information into the AI language model; receiving an abnormal event description text returned by the AI language model; and carrying out cluster operation and maintenance according to the abnormal event description text.

Fig. 1 schematically illustrates a system architecture diagram of an AI language model-based cluster operation and maintenance device according to an embodiment of the present disclosure. As shown in fig. 1, the device provided in an embodiment of the present disclosure includes several components: a data acquisition plane, an AI language model controller, a solution library, and an operation and maintenance foreground.

Data acquisition plane: responsible for collecting monitoring indexes, log information and other data from the Kubernetes cluster and storing the data in a designated data store. The monitoring indexes are collected through Prometheus and the events are collected through kube-event. Abnormal rules are configured so that error and warning events are reported, while normal and info events are ignored. By analyzing the preprocessed data, potential anomalies such as node failures, application crashes, and insufficient container resources are detected.

AI language model controller: responsible for the interaction between the abnormal part of the acquired data and the AI language model. It sends the text content of the abnormal information to the AI language model API and specifies the output format of the AI language model.

Solution library: records each of the above exception conditions in a table and marks the exceptions that have been handled as known.

Operation and maintenance foreground: retrieves the solution library, can list all exceptions and the unprocessed exceptions, and notifies the system administrator of the unprocessed exceptions and displays them.

Fig. 2 schematically illustrates an application scenario diagram of a cluster operation and maintenance method, apparatus, device, storage medium and program product based on an AI language model according to an embodiment of the disclosure.

As shown in fig. 2, the application scenario 200 according to this embodiment may include a cluster operation and maintenance scenario. The network 204 is the medium used to provide communication links between the terminal devices 201, 202, 203 and the server 205. The network 204 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.

The user may interact with the server 205 via the network 204 using the terminal devices 201, 202, 203 to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only), may be installed on the terminal devices 201, 202, 203.

The terminal devices 201, 202, 203 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like. The server 205 may be a cluster operation and maintenance server, which may perform the cluster operation and maintenance method provided by the embodiment of the present disclosure. That is, the server obtains abnormal event information in response to an abnormal event trigger of the cluster, invokes an AI language model interface and inputs the abnormal event information into the AI language model, receives the abnormal event description text returned by the AI language model, and carries out cluster operation and maintenance according to that text.

It should be noted that the cluster operation and maintenance method based on the AI language model provided by the embodiment of the disclosure may be generally executed by the server 205. Accordingly, the cluster operation and maintenance device based on the AI language model provided by the embodiments of the present disclosure may be generally disposed in the server 205. The cluster operation and maintenance method based on the AI language model provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 205 and is capable of communicating with the terminal devices 201, 202, 203 and/or the server 205. Accordingly, the cluster operation and maintenance apparatus based on the AI language model provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 205 and is capable of communicating with the terminal devices 201, 202, 203 and/or the server 205.

It should be understood that the number of terminal devices, networks and servers in fig. 2 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

It should be noted that the cluster operation and maintenance method and device based on the AI language model determined in the embodiments of the present disclosure may be used in the technical fields of cloud computing, artificial intelligence and finance, and may also be used in any field other than the financial field; their application field is not limited.

The cluster operation and maintenance method according to the embodiments of the present disclosure will be described in detail below with reference to fig. 3 to 6 based on the system architecture described in fig. 1 and the application scenario described in fig. 2.

Fig. 3 schematically illustrates a flowchart of a cluster operation and maintenance method based on an AI language model according to an embodiment of the disclosure. As shown in fig. 3, the cluster operation and maintenance method of this embodiment includes operations S210 to S240, which may be performed by a server or other computing device.

In operation S210, abnormal event information is acquired in response to an abnormal event trigger of the cluster.

In one example, the monitoring plane collects monitoring indexes, log information and other data from the K8s cluster. The monitoring indexes are collected through Prometheus and the events are collected through kube-event. Abnormal rules are configured so that error and warning events are reported, while normal and info events are ignored. By analyzing the preprocessed data, potential anomalies such as node failures, application crashes, and insufficient container resources are detected. When the monitoring plane determines that an abnormal event exists in the K8s cluster, abnormal event information is acquired; this information includes the time of the abnormal event, the campus (site) where it occurred, the server type, and the like.
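The reporting rule described in this example (report error and warning events, ignore normal and info events) could be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the `ClusterEvent` record, its field names, and the `select_abnormal_events` helper are hypothetical, and the events are assumed to have already been collected and normalized.

```python
from dataclasses import dataclass

# Severities that are reported; "normal" and "info" events are dropped.
REPORTED_SEVERITIES = {"error", "warning"}


@dataclass(frozen=True)
class ClusterEvent:
    """Hypothetical record for one collected cluster event."""
    severity: str     # e.g. "error", "warning", "normal", "info"
    time: str         # when the event occurred
    site: str         # campus (site) where the event occurred
    server_type: str  # type of server involved
    message: str      # raw event text


def select_abnormal_events(events):
    """Keep only events whose severity is configured to be reported."""
    return [e for e in events if e.severity.lower() in REPORTED_SEVERITIES]
```

The severity comparison is case-insensitive here because collectors may differ in capitalization; that choice is also an assumption of the sketch.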

In operation S220, an AI language model interface is called, and the abnormal event information is input to the AI language model.

In operation S230, an abnormal event description text returned from the AI language model is received.

In one example, the AI language model controller handles the interaction between the acquired abnormal event information and the AI language model: it transmits the text content of the abnormal information to the AI language model API and specifies the output format of the AI language model. The AI language model in the embodiments of the present disclosure may be, for example, a commonly used natural language processing model. The AI language model generates descriptive text for the abnormal event so that the operator can better understand and handle it.
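The controller's role in operations S220 and S230 could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the disclosure only says that an output format is specified, so the JSON reply with `event_type` and `description` fields is hypothetical, and `call_model` is a placeholder callable standing in for whichever AI language model API is actually used.

```python
import json
from typing import Callable

# Assumed output format; the disclosure does not state the concrete format.
OUTPUT_FORMAT_INSTRUCTION = (
    "Reply with a JSON object containing exactly two string fields: "
    '"event_type" and "description".'
)


def build_prompt(event_info: dict) -> str:
    """Combine the abnormal event information with the required output format."""
    return (
        "Describe the following Kubernetes abnormal event in plain language.\n"
        f"Event: {json.dumps(event_info, sort_keys=True)}\n"
        f"{OUTPUT_FORMAT_INSTRUCTION}"
    )


def describe_event(event_info: dict, call_model: Callable[[str], str]) -> dict:
    """Send the prompt through call_model and validate the structured reply."""
    reply = call_model(build_prompt(event_info))
    parsed = json.loads(reply)
    if not isinstance(parsed, dict):
        raise ValueError("model reply is not a JSON object")
    missing = {"event_type", "description"} - set(parsed)
    if missing:
        raise ValueError(f"model reply is missing fields: {sorted(missing)}")
    return parsed
```

Passing the model call in as a function keeps the sketch independent of any particular vendor API and makes the format check easy to exercise with a stub.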

In operation S240, cluster operation and maintenance is performed according to the abnormal event description text.

According to an embodiment of the disclosure, an abnormal event list in a solution library is acquired in response to a search instruction of an operation and maintenance foreground, and unprocessed abnormal events are displayed and operation and maintenance personnel are notified.

In one example, the solution library record maintains each exception event, marking exceptions that have been handled as known. The operation and maintenance foreground retrieves the solution library, can list all the exceptions and unprocessed exceptions, and notifies and displays the unprocessed exceptions to a system administrator. The cluster operation and maintenance are carried out according to the abnormal event description text with a specific format output by the AI language model, and the specific process can be seen from the operation shown in fig. 4a to 6.
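The record-keeping and retrieval behaviour of the solution library described in this example can be illustrated with a small in-memory Python sketch. The class and method names (`ExceptionRecord`, `SolutionLibrary`, `mark_known`, `list_unprocessed`) are hypothetical, and a real deployment would presumably use persistent storage rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class ExceptionRecord:
    """Hypothetical entry for one abnormal event kept in the solution library."""
    event_type: str
    description: str
    known: bool = False  # set to True once the exception has been handled


class SolutionLibrary:
    """In-memory stand-in for the solution library."""

    def __init__(self):
        self._records = []

    def record(self, rec: ExceptionRecord) -> None:
        """Store a newly observed exception."""
        self._records.append(rec)

    def mark_known(self, event_type: str) -> None:
        """Mark every record of this type as handled."""
        for rec in self._records:
            if rec.event_type == event_type:
                rec.known = True

    def list_all(self):
        """Everything the foreground can list."""
        return list(self._records)

    def list_unprocessed(self):
        """Exceptions that still need attention from an administrator."""
        return [rec for rec in self._records if not rec.known]
```

In this sketch the foreground would call `list_unprocessed()` when it receives a search instruction and then display and notify on the returned records.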

Fig. 4a, fig. 4b, fig. 5 and fig. 6 schematically illustrate the first, second, third and fourth flowcharts, respectively, of a cluster operation and maintenance operation according to the abnormal event description text provided according to another embodiment of the present disclosure. As shown in fig. 4a, operation S240 includes operations S241 to S243.

In operation S241, an abnormal event type is determined according to the abnormal event description text.

In operation S242, a target solution is determined according to the type of the abnormal event.

In operation S243, an operation and maintenance script corresponding to the target solution is executed to process an abnormal event.
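Operations S241 to S243 form a three-stage chain, which the following Python sketch makes explicit. It is an illustration under assumptions rather than the disclosed implementation: `classify`, `choose_solution` and `run` are injected placeholder callables, and the `scripts` mapping from a solution name to a command is a hypothetical way of associating an operation and maintenance script with a target solution.

```python
from typing import Callable, Mapping, Sequence


def operate_on_description(
    description_text: str,
    classify: Callable[[str], str],
    choose_solution: Callable[[str], str],
    scripts: Mapping[str, Sequence[str]],
    run: Callable[[Sequence[str]], int],
) -> dict:
    """Chain S241, S242 and S243 and report what happened."""
    # S241: derive the abnormal event type from the description text.
    event_type = classify(description_text)
    # S242: determine the target solution for that event type.
    solution = choose_solution(event_type)
    # S243: look up and execute the script that corresponds to the solution.
    command = scripts.get(solution)
    if command is None:
        raise LookupError(f"no operation and maintenance script for {solution!r}")
    exit_code = run(command)
    return {
        "event_type": event_type,
        "solution": solution,
        "succeeded": exit_code == 0,
    }
```

For real execution, `run` could be a thin wrapper such as `lambda cmd: subprocess.run(cmd).returncode`; it is injected here so that the control flow can be checked without touching a live cluster.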

As shown in fig. 4b, operation S242 includes operations S2421 and S2422.

In operation S2421, an AI language model interface is invoked according to the abnormal event type to determine a target solution.

In operation S2422, a target solution is matched in a solution library according to the type of the abnormal event.

In one example, the received abnormal event description text is summarized and cleaned to determine the type of the abnormal event, and a corresponding target solution is then determined according to that type. One way to do this is to call the AI language model API again to generate a solution; in another possible implementation, the target solution may be matched in a solution library according to the type of the abnormal event. Specifically, as shown in fig. 5, operation S2421 includes operations S310 to S330.

In operation S310, an AI language model interface is called, and the abnormal event type is input to the AI language model.

In operation S320, the solution output by the AI language model is compared against the solution library.

In operation S330, a target solution is determined according to the comparison result.

As shown in fig. 6, operation S330 includes operations S331 to S333.

In operation S331, if it is determined that the solution exists in the solution library, it is determined that the solution is a target solution.

In operation S332, if it is determined that the solution does not exist in the solution library, the solution is transmitted to the operation and maintenance foreground.

In operation S333, the solution is determined to be the target solution in response to the confirmation instruction of the operation and maintenance foreground.

In one example, the controller invokes the AI language model API so that the AI language model provides the corresponding solution suggestion in a fixed format, as shown in Table 1 below:

Table 1: Abnormal event description text and solution

In one example, if the controller finds that a solution already exists in the solution library, the solution is executed automatically and the system operator is notified of the repair result. If the exception cannot be found in the solution library, the controller invokes the AI language model to provide a solution suggestion and then sends a notification to the operation and maintenance foreground, waiting for the administrator to confirm whether the solution is viable. In response to a confirmation instruction of the operation and maintenance foreground, the solution is determined to be the target solution. If the operator confirms that the solution is viable, the system stores the solution and automatically matches problems of the same abnormal event type to that solution for processing. According to the method provided by the embodiment of the disclosure, the AI language model is used for natural language processing and generates a simple text description to express the abnormal event, so the operator does not need in-depth knowledge of the indexes and log data of the Kubernetes cluster. The solution provided by the system is also more intuitive and easier to understand, and its feasibility is easier to confirm. In addition, by continuously updating the solution library, the embodiment provides a better reference for subsequent operation and maintenance work and improves cluster operation and maintenance efficiency.
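The decision flow of operations S331 to S333, together with the storing of confirmed solutions, could be sketched in Python as follows. The data layout is an assumption: the library is modelled as a dictionary from abnormal event type to a set of approved solution names, and `confirm` is a hypothetical callable standing in for the confirmation instruction from the operation and maintenance foreground.

```python
from typing import Callable, Dict, Optional, Set


def resolve_target_solution(
    event_type: str,
    proposed: str,
    library: Dict[str, Set[str]],
    confirm: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the target solution, or None if the proposal is not confirmed."""
    # S331: the proposed solution is already in the library, so it is
    # accepted without asking anyone.
    if proposed in library.get(event_type, set()):
        return proposed
    # S332 and S333: otherwise forward it to the foreground and wait for
    # the administrator's decision.
    if confirm(event_type, proposed):
        # Store the confirmed solution so that later events of the same
        # type can be matched automatically.
        library.setdefault(event_type, set()).add(proposed)
        return proposed
    return None
```

Returning `None` for a rejected proposal is a choice of this sketch; the disclosure does not say what happens when the administrator declines a suggestion.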

Based on the cluster operation and maintenance method based on the AI language model, the disclosure also provides a cluster operation and maintenance device based on the AI language model. The device will be described in detail below in connection with fig. 7.

Fig. 7 schematically illustrates a block diagram of a cluster operation and maintenance device based on an AI language model according to an embodiment of the disclosure. As shown in fig. 7, the cluster operation and maintenance device 700 based on the AI language model of this embodiment includes a first acquisition module 710, a calling module 720, a receiving module 730, and an operation and maintenance module 740.

The first acquisition module 710 is configured to obtain abnormal event information in response to an abnormal event trigger of the cluster. In an embodiment, the first acquisition module 710 may be configured to perform the operation S210 described above, which is not repeated here.

The calling module 720 is configured to call an AI language model interface, and input the abnormal event information to the AI language model. In an embodiment, the calling module 720 may be configured to perform the operation S220 described above, which is not repeated here.

The receiving module 730 is configured to receive the abnormal event description text returned by the AI language model. In an embodiment, the receiving module 730 may be configured to perform the operation S230 described above, which is not repeated here.

The operation and maintenance module 740 is configured to perform cluster operation and maintenance according to the abnormal event description text. In an embodiment, the operation and maintenance module 740 may be used to perform the operation S240 described above, which is not repeated here.

The first determining sub-module is used for determining the type of the abnormal event according to the abnormal event description text.

The second determining sub-module is used for determining a target solution according to the abnormal event type.

The first determining unit is used for calling an AI language model interface according to the abnormal event type so as to determine a target solution.

The calling subunit is used for calling an AI language model interface and inputting the abnormal event type into the AI language model.

The comparison subunit is used for comparing the solution received from the AI language model against a solution library.

The second acquisition module is used for responding to the search instruction of the operation and maintenance foreground and acquiring an abnormal event list in the solution library.

According to an embodiment of the present disclosure, any number of the first acquisition module 710, the calling module 720, the receiving module 730, and the operation and maintenance module 740 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the first acquisition module 710, the calling module 720, the receiving module 730, and the operation and maintenance module 740 may be implemented, at least in part, as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); may be implemented in hardware or firmware in any other reasonable way of integrating or packaging circuitry; or may be implemented in any one of software, hardware, and firmware, or in a suitable combination of them. Alternatively, at least one of the first acquisition module 710, the calling module 720, the receiving module 730, and the operation and maintenance module 740 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.

As shown in fig. 8, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 908 including a hard disk or the like; and a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication portion 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage portion 908 as needed.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs that, when executed, implement a cluster operation and maintenance method according to an embodiment of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above, and/or one or more memories other than the ROM 902 and the RAM 903.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the cluster operation and maintenance method based on the AI language model provided by the embodiments of the disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed, and downloaded and installed in the form of a signal on a network medium, via communication portion 909, and/or installed from removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. A cluster operation and maintenance method based on an AI language model, the method comprising:

2. The cluster operation and maintenance method according to claim 1, wherein the performing cluster operation and maintenance according to the abnormal event description text comprises:

determining a target solution according to the abnormal event type; and

3. The cluster operation and maintenance method according to claim 2, wherein the determining a target solution according to the abnormal event type includes:

4. The cluster operation and maintenance method according to claim 3, wherein the invoking the AI language model interface according to the abnormal event type to determine the target solution comprises:

comparing the solutions received from the AI language model in a solution library;

and determining a target solution according to the comparison result.

5. The cluster operation and maintenance method according to claim 4, wherein the determining the target solution according to the comparison result includes:

6. The cluster operation and maintenance method according to any one of claims 1 to 5, further comprising:

and displaying unprocessed abnormal events and notifying operation and maintenance personnel.

7. A cluster operation and maintenance device based on AI language model, the device comprising:

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.

9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-6.

10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.