CN118963968A

CN118963968A - Function management method, device, computer equipment and storage medium

Info

Publication number: CN118963968A
Application number: CN202411227714.6A
Authority: CN
Inventors: 都雯卿; 杨茁
Original assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2024-09-03
Filing date: 2024-09-03
Publication date: 2024-11-15

Abstract

The present invention relates to the field of computer technology and discloses a function management method, device, computer equipment and storage medium. The method comprises: obtaining a first license of an artificial intelligence platform at the current moment, the first license including a first number of identifiers corresponding to a first number of functions of the artificial intelligence platform deployed on a computing node at the current moment; in response to a user's adding operation, obtaining a second license of the artificial intelligence platform, the second license including a second number of identifiers corresponding to a second number of functions of the artificial intelligence platform to be deployed on the computing node; and, if the first number of identifiers is smaller than the second number of identifiers, determining a newly added identifier and deploying the function corresponding to the newly added identifier on the computing node to upgrade the artificial intelligence platform. The present invention can solve the problems that upgrading an artificial intelligence platform is inefficient and consumes research and development costs and testing costs.

Description

Function management method, device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a function management method, a device, a computer device, and a storage medium.

Background

An artificial intelligence platform is an integrated software system or cloud service that supports the development, deployment and management of artificial intelligence applications. Such platforms typically provide a series of tools and services that help developers and data scientists build, train, and deploy machine learning models and other AI solutions. Different customers have different functional requirements, so the artificial intelligence platform must be developed and released as installation packages and products with various functions according to each customer's requirements.

In the related art, the installation package and product corresponding to each function of an artificial intelligence platform must be tested separately, and when a customer requires additional functions, an upgrade package must be developed to upgrade the original installation package. As a result, upgrading the artificial intelligence platform is inefficient and consumes research and development costs and testing costs.

Disclosure of Invention

In view of the above, the present invention provides a function management method, apparatus, computer device and storage medium, so as to solve the problems that upgrading an artificial intelligence platform is inefficient and consumes research and development costs and testing costs.

In a first aspect, the present invention provides a function management method, including: when the node state of a computing node is normal and its remaining resources meet a preset condition, acquiring a first license of the artificial intelligence platform at the current moment, wherein the first license comprises a first number of identifiers corresponding to a first number of functions of the artificial intelligence platform deployed on the computing node at the current moment; in response to an adding operation of a user, acquiring a second license of the artificial intelligence platform, wherein the second license comprises a second number of identifiers corresponding to a second number of functions of the artificial intelligence platform to be deployed on the computing node; and, if the first number of identifiers is smaller than the second number of identifiers, determining a newly added identifier and deploying the function corresponding to the newly added identifier on the computing node so as to upgrade the artificial intelligence platform.

Based on the method of the first aspect, when the node state of the computing node is detected to be normal and the remaining resources meet the preset condition, the first license of the artificial intelligence platform at the current moment can be obtained; in response to the adding operation of the user, the second license of the artificial intelligence platform is obtained; and if the first number of identifiers included in the first license is smaller than the second number of identifiers included in the second license, a newly added identifier is determined and the function corresponding to the newly added identifier is deployed on the computing node so as to upgrade the artificial intelligence platform.

A license comprises the identifiers corresponding to functions of the artificial intelligence platform and indicates which functions of the platform the computing node is allowed to enable or deploy. Because the second license is obtained in response to the adding operation of the user, the user can select a license according to actual requirements and thereby select the functions of the artificial intelligence platform to be deployed on the computing node. Comparing the first number of identifiers included in the first license with the second number of identifiers included in the second license then yields the newly added identifier, which determines the functions of the artificial intelligence platform that need to be added on the computing node. Compared with the related art, which controls the functions of the artificial intelligence platform by developing and testing different installation packages, the method controls the functions through the license, so that the upgrade efficiency of the artificial intelligence platform is improved and the research and development costs and testing costs of separate packages are avoided.

In an alternative embodiment, if the first number of identifiers is greater than the second number of identifiers, a deleted identifier is determined and the function corresponding to the deleted identifier is deleted on the computing node to downgrade the artificial intelligence platform.

Based on the method, the deleted identifier can be determined, and the function corresponding to the deleted identifier is then closed or deleted on the computing node, so that the artificial intelligence platform provides exactly the functions required by the customer.

In an alternative embodiment, before detecting that the node state of the computing node is normal and the remaining resources meet the preset condition, the method further includes: receiving a click operation of the user on a display interface of the artificial intelligence platform and, in response to the click operation, sending a first detection request to the computing node, wherein the first detection request is used for detecting the running state of each of the operating system services of the computing node.

When it is determined according to the first detection request that the running state of each operating system service is normal, a second detection request is sent to the computing node, wherein the second detection request is used for detecting the running state of each of the minimum scheduling units of the computing node. When it is determined according to the second detection request that the running state of each minimum scheduling unit is normal, a third detection request is sent to the computing node, wherein the third detection request is used for detecting the running state of each of the tasks of the computing node. When it is determined according to the third detection request that neither a task being executed nor a task to be executed exists among all the tasks, the node state of the computing node is determined to be normal and the remaining resources of the computing node are obtained. If the remaining resources are greater than or equal to a preset threshold, it is determined that the remaining resources of the computing node meet the preset condition.

Based on the method, the checks are performed in dependency order. Because all the minimum scheduling units of the computing node are built on the operating system services, the running state of each minimum scheduling unit is detected only after the running state of each operating system service is determined to be normal according to the first detection request. Because all tasks of the computing node are executed in the minimum scheduling units, the running state of each task is detected only after the running state of each minimum scheduling unit is determined to be normal according to the second detection request. Because executing a task occupies resources of the computing node, the node state is determined to be normal, and the remaining resources are obtained, only when it is determined according to the third detection request that neither a task being executed nor a task to be executed exists among all the tasks. In this way, it can be detected whether the node state of the computing node is normal and whether the remaining resources of the computing node meet the preset condition.

In an alternative embodiment, the method further comprises: determining that the node state of the computing node is abnormal when it is determined according to the first detection request that the running state of at least one of the operating system services is abnormal; or determining that the node state of the computing node is abnormal when it is determined according to the second detection request that the running state of at least one of the minimum scheduling units is abnormal; or determining that the node state of the computing node is abnormal when it is determined according to the third detection request that at least one task being executed and/or at least one task to be executed exists among all the tasks.

In an alternative embodiment, deploying the function corresponding to the newly added identifier on the computing node includes: acquiring an upgrade script corresponding to the function corresponding to the newly added identifier; sending the upgrade script to the computing node, the computing node being configured to execute at least one command in the upgrade script and to return an execution result for each of the at least one command; receiving the execution result of each of the at least one command fed back by the computing node; and, if the execution result of each command indicates that the command was executed successfully, determining that the function corresponding to the newly added identifier has been successfully deployed on the computing node.

Based on the method, the artificial intelligence platform can be upgraded by executing the commands in the upgrade script, and whether the upgrade succeeded is determined from the execution results of those commands.

In an alternative embodiment, the at least one command includes a first command, and after receiving the execution result of each of the at least one command fed back by the computing node, the method further includes: if the execution result of the first command indicates that the first command reported an error, parsing the execution result of the first command to obtain an error keyword corresponding to the first command; and determining a target policy according to the error keyword and a preset correspondence, and executing the target policy to correct the error in the execution result of the first command, wherein the preset correspondence is the correspondence between the error keyword and the target policy.

Based on the method, the error in the execution result of the first command can be corrected according to the correspondence preset in the artificial intelligence platform, so that the first command is executed successfully and the artificial intelligence platform is upgraded.

In an alternative embodiment, when deploying the function corresponding to the newly added identifier on the computing node fails, the state of the second license is set to a failure state, so that the artificial intelligence platform is restored to the state before the upgrade.

Based on the method, the state of the second license can be set to the failure state when the upgrade of the artificial intelligence platform fails, so that the artificial intelligence platform is restored to the state before the upgrade.

In a second aspect, the present invention provides a function management apparatus comprising: an acquisition module, configured to acquire a first license of the artificial intelligence platform at the current moment when the node state of the computing node is normal and the remaining resources meet the preset condition, wherein the first license comprises a first number of identifiers corresponding to a first number of functions of the artificial intelligence platform deployed on the computing node at the current moment; the acquisition module being further configured to acquire, in response to the adding operation of the user, a second license of the artificial intelligence platform, wherein the second license comprises a second number of identifiers corresponding to a second number of functions of the artificial intelligence platform to be deployed on the computing node; and a processing module, configured to determine the newly added identifier and deploy the function corresponding to the newly added identifier on the computing node to upgrade the artificial intelligence platform if the first number of identifiers is smaller than the second number of identifiers.

In a third aspect, the present invention provides a computer device comprising a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the function management method of the first aspect or any of its corresponding embodiments.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the function management method of the first aspect or any of the embodiments corresponding thereto.

In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the function management method of the first aspect or any of its corresponding embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic architecture diagram of an artificial intelligence platform 10 according to an embodiment of the invention;

FIG. 2 is a flow chart of a method of function management according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a management node detecting node status and remaining resources of a computing node in accordance with an embodiment of the invention;

FIG. 4 is a flow chart of yet another function management method according to an embodiment of the present invention;

FIG. 5 is a block diagram of a function management apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As described in the background, the installation package and product corresponding to each function of an artificial intelligence platform must be tested, and when the functions required by a customer increase, an upgrade package must be developed to upgrade the original installation package. This makes upgrading the artificial intelligence platform inefficient and consumes research and development costs and testing costs.

To solve the above technical problems, the embodiment of the invention provides a function management method that controls the functions of an artificial intelligence platform through a license, so as to improve the upgrade efficiency of the artificial intelligence platform without consuming additional research and development costs and testing costs.

As shown in FIG. 1, FIG. 1 is a schematic architecture diagram of an artificial intelligence platform 10 according to an embodiment of the invention. In FIG. 1, the artificial intelligence platform 10 includes a management node 101, a computing node 102, and a computing node 103.

The management node in the embodiment of the present invention (also referred to as a function management device or Master node), for example the management node 101, may be any device having a computing function and a communication function. For example, the management node may be a server, a cloud device, or the like. The management node manages the computing nodes of the artificial intelligence platform. Alternatively, the management node may itself be a computing node.

The computing node in the embodiment of the invention, for example the computing node 102 or the computing node 103, may be any device having a computing function and a communication function. The computing nodes are used to schedule and execute tasks. The functions included in the artificial intelligence platform may be deployed on the computing nodes.

The artificial intelligence platform 10 shown in FIG. 1 is an example only and is not intended to limit the scope of the present application. Those skilled in the art will appreciate that, in a particular implementation, the artificial intelligence platform 10 may include more management nodes and computing nodes, without limitation.

According to an embodiment of the present invention, there is provided a function management method embodiment, it being noted that the steps shown in the flowcharts of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.

In this embodiment, a function management method is provided, which may be used in the above-mentioned function management device or management node. FIG. 2 is a flowchart of a function management method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

S201: when the node state of the computing node is normal and the remaining resources meet the preset condition, acquire a first license of the artificial intelligence platform at the current moment.

The first license includes a first number of identifiers corresponding to a first number of functions of the artificial intelligence platform deployed on the computing node at the current moment. The first license indicates the first number of functions of the artificial intelligence platform that the computing node is permitted to deploy. Optionally, the first license further includes a validity period of the first license and a state of the first license, and the state of the first license is a valid state.
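As an illustration of the license contents described above, the following minimal Python sketch models a license as a set of function identifiers plus the optional validity period and state. The class, field and identifier names (`License`, `function_ids`, `valid_until`, `"training"`, `"inference"`) are assumptions made for this sketch; the source does not specify a license format.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class LicenseState(Enum):
    VALID = "valid"
    FAILURE = "failure"  # the "failure state" mentioned in the text


@dataclass(frozen=True)
class License:
    # Identifiers of the platform functions the compute node may deploy.
    function_ids: frozenset
    # Optional fields from the text; the field names are hypothetical.
    valid_until: date
    state: LicenseState = LicenseState.VALID

    def is_usable(self, today: date) -> bool:
        """Usable only while in the valid state and within the validity period."""
        return self.state is LicenseState.VALID and today <= self.valid_until


# Hypothetical first license permitting two functions.
first_license = License(
    function_ids=frozenset({"training", "inference"}),
    valid_until=date(2025, 12, 31),
)
print(len(first_license.function_ids), first_license.is_usable(date(2025, 1, 1)))
```

Using a set for the identifiers makes the later comparison between two licenses a simple set operation.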

In the embodiment of the invention, the computing node refers to all of the computing nodes included in the artificial intelligence platform.

In the embodiment of the invention, a normal node state means that the running state of each of the operating system services of the computing node is normal, the running state of each of the minimum scheduling units is normal, and neither a task being executed nor a task to be executed exists among all the tasks.

In the embodiment of the invention, the residual resources are idle resources of the computing node. The remaining resources may include GPU resources, CPU resources, memory resources, and the like.

In an optional implementation, before detecting that the node state of the computing node is normal and the remaining resources meet the preset condition, the management node receives a click operation of the user on the display interface of the artificial intelligence platform and, in response to the click operation, sends a first detection request to the computing node. When it determines according to the first detection request that the running state of each operating system service is normal, it sends a second detection request to the computing node. When it determines according to the second detection request that the running state of each minimum scheduling unit is normal, it sends a third detection request to the computing node. When it determines according to the third detection request that neither a task being executed nor a task to be executed exists among all the tasks, it determines that the node state of the computing node is normal and obtains the remaining resources of the computing node. If the remaining resources are greater than or equal to the preset threshold, it determines that the remaining resources of the computing node meet the preset condition.

The first detection request is used for detecting the running state of each operating system service in all operating system services of the computing node.

In the embodiment of the present invention, the second detection request is used to detect the running state of each of the minimum scheduling units of the computing node. The minimum scheduling unit is also called a pod.

In the embodiment of the invention, the third detection request is used for detecting the running state of each task in all the tasks of the computing node.

FIG. 3 is a schematic diagram of a management node detecting the node state and remaining resources of a computing node according to an embodiment of the present invention. As shown in FIG. 3, the management node first detects the running state of each of the operating system services of the computing node, then detects the running state of each of the minimum scheduling units of the computing node, then detects the running state of each of the tasks of the computing node, and finally detects the remaining resources of the computing node.

It can be understood that if the running state of each of the operating system services of the computing node is normal, the running state of each of the minimum scheduling units is normal, neither a task being executed nor a task to be executed exists among all the tasks, and the remaining resources are greater than or equal to the preset threshold, the management node determines that the health of the computing node meets the standard and that the upgrade condition is satisfied.
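The layered check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `NodeReport` structure, its field names, and the use of free memory in GiB as the "remaining resources" are assumptions introduced here, and the three detection requests are represented by data already collected from the node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NodeReport:
    """Hypothetical snapshot of what the detection requests return."""
    services: Dict[str, bool] = field(default_factory=dict)  # service name -> normal?
    pods: Dict[str, bool] = field(default_factory=dict)      # pod name -> normal?
    running_tasks: List[str] = field(default_factory=list)
    pending_tasks: List[str] = field(default_factory=list)
    free_memory_gib: float = 0.0


def meets_upgrade_condition(report: NodeReport, threshold_gib: float) -> Tuple[bool, str]:
    """Check each layer only if the layer it depends on is healthy."""
    bad_services = [name for name, ok in report.services.items() if not ok]
    if bad_services:
        return False, "abnormal operating system services: " + ", ".join(bad_services)
    bad_pods = [name for name, ok in report.pods.items() if not ok]
    if bad_pods:
        return False, "abnormal pods: " + ", ".join(bad_pods)
    # A running or queued task makes the node state abnormal for upgrade purposes.
    if report.running_tasks or report.pending_tasks:
        return False, "tasks are running or queued"
    if report.free_memory_gib < threshold_gib:
        return False, "remaining resources below threshold"
    return True, "node state normal and remaining resources sufficient"


healthy = NodeReport(services={"nfs": True}, pods={"pod-a": True}, free_memory_gib=64.0)
print(meets_upgrade_condition(healthy, threshold_gib=32.0))
```

Returning a reason string alongside the boolean gives the display interface something to show the user when a check fails.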

In an alternative embodiment, the management node determines that the node state of the computing node is abnormal when it determines according to the first detection request that the running state of at least one of the operating system services is abnormal; or when it determines according to the second detection request that the running state of at least one of the minimum scheduling units is abnormal; or when it determines according to the third detection request that at least one task being executed and/or at least one task to be executed exists among all the tasks.

Optionally, after determining according to the first detection request that the running state of at least one of the operating system services is abnormal, the management node obtains the service name of each abnormal operating system service, and generates and displays prompt information corresponding to the service name on the display interface to prompt the user to repair the at least one operating system service manually. For example, the prompt information corresponding to the service name is "The nfs service is in an abnormal state; please repair it manually!".

Optionally, when it determines according to the second detection request that the running state of at least one of the minimum scheduling units is abnormal, the management node obtains the name of each abnormal minimum scheduling unit, and generates and displays prompt information corresponding to the name on the display interface to prompt the user to repair the at least one minimum scheduling unit manually. For example, the prompt information corresponding to the name of the minimum scheduling unit is "The first pod is in an abnormal state; please repair it manually!".

Optionally, when it determines according to the third detection request that at least one task being executed and/or at least one task to be executed exists among all the tasks, the management node obtains the task name of the at least one task being executed and/or the task name of the at least one task to be executed, and generates and displays prompt information corresponding to the task name on the display interface. For example, the prompt information corresponding to the task name is "A task is running or queued; please wait for the task to finish before operating, or terminate the task manually!".

Optionally, when the management node finds that the remaining resources of the computing node are smaller than the preset threshold, it generates and displays prompt information on the display interface. For example, the prompt information may be "Computing node 102 has insufficient remaining resources; please release memory resources to meet the resource requirements of the upgrade/downgrade".

S202: in response to the adding operation of the user, acquire a second license of the artificial intelligence platform.

The second license comprises a second number of identifiers corresponding to a second number of functions of the artificial intelligence platform to be deployed on the computing node.

In response to the adding operation of the user, the management node acquires the second license of the artificial intelligence platform and parses it to obtain the second number of identifiers corresponding to the second number of functions of the artificial intelligence platform to be deployed on the computing node.

S203: if the first number of identifiers is smaller than the second number of identifiers, determine a newly added identifier and deploy the function corresponding to the newly added identifier on the computing node so as to upgrade the artificial intelligence platform.
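The comparison in S203, together with the downgrade case of the alternative embodiment, can be sketched in a few lines of Python. The source only states that the counts are compared and a newly added (or deleted) identifier is determined; computing that identifier as a set difference is an assumption of this sketch, as are the function name `plan_license_change` and the identifier strings.

```python
from typing import Dict, Iterable


def plan_license_change(first_ids: Iterable[str], second_ids: Iterable[str]) -> Dict[str, object]:
    """Compare two licenses' identifier sets and decide what to deploy or delete."""
    first, second = set(first_ids), set(second_ids)
    if len(first) < len(second):
        # Upgrade: identifiers present only in the second license are newly added.
        return {"action": "upgrade", "ids": sorted(second - first)}
    if len(first) > len(second):
        # Downgrade: identifiers present only in the first license are deleted.
        return {"action": "downgrade", "ids": sorted(first - second)}
    return {"action": "none", "ids": []}


# Hypothetical identifiers for illustration.
print(plan_license_change({"training"}, {"training", "inference", "monitoring"}))
print(plan_license_change({"training", "inference"}, {"training"}))
```

Only the functions named in the returned `ids` list need to be deployed or removed; functions common to both licenses are left untouched.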

In an alternative implementation, the management node obtains an upgrade script corresponding to the function corresponding to the newly added identifier; sends the upgrade script to the computing node, the computing node being configured to execute at least one command in the upgrade script and to return an execution result for each of the at least one command; and receives the execution result of each of the at least one command fed back by the computing node. If the execution result of each command indicates that the command was executed successfully, the function corresponding to the newly added identifier has been successfully deployed on the computing node.
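A minimal sketch of this exchange follows. The command executor is injected as a function so the example runs without a real computing node; `CommandResult`, `run_upgrade_script`, `deployment_succeeded`, the fake executor and the command strings are all hypothetical names introduced for illustration.

```python
from typing import Callable, List, NamedTuple


class CommandResult(NamedTuple):
    command: str
    ok: bool
    output: str


def run_upgrade_script(commands: List[str],
                       execute: Callable[[str], CommandResult]) -> List[CommandResult]:
    """Compute-node side: execute every command and return one result per command."""
    return [execute(cmd) for cmd in commands]


def deployment_succeeded(results: List[CommandResult]) -> bool:
    """Management-node side: success only if every command reported success."""
    return bool(results) and all(r.ok for r in results)


def fake_execute(cmd: str) -> CommandResult:
    # Stand-in for real execution; any command containing "fail" is treated as an error.
    ok = "fail" not in cmd
    return CommandResult(cmd, ok, "done" if ok else "error: simulated failure")


results = run_upgrade_script(["load image", "apply config"], fake_execute)
print(deployment_succeeded(results))
```

Keeping one result per command, rather than a single pass/fail flag, is what lets the management node later inspect the output of the specific command that reported an error.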

It can be appreciated that the installation package of the artificial intelligence platform contains the upgrade scripts and configuration files corresponding to all functions. Before obtaining the upgrade script corresponding to the function corresponding to the newly added identifier, the management node modifies the configuration file parameters according to the newly added identifier and turns on the switch of the corresponding function.
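The switch-flipping step might look like the sketch below. The source does not describe the configuration file format, so a JSON document with a top-level `"functions"` map of boolean switches is assumed purely for illustration, as are the function and identifier names.

```python
import json
from typing import List


def enable_functions(config_text: str, new_ids: List[str]) -> str:
    """Turn on the switch for each newly added identifier and return the new config text."""
    config = json.loads(config_text)
    switches = config.setdefault("functions", {})  # hypothetical config layout
    for function_id in new_ids:
        switches[function_id] = True
    return json.dumps(config, indent=2, sort_keys=True)


original = '{"functions": {"training": true, "inference": false}}'
print(enable_functions(original, ["inference", "monitoring"]))
```

Because the function returns new text instead of editing a file in place, the original configuration remains available as a backup if the deployment later fails.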

In one example, the at least one command includes a first command. After receiving the execution result of each of the at least one command fed back by the computing node, if the execution result of the first command indicates that the first command reported an error, the management node parses the execution result of the first command to obtain an error keyword corresponding to the first command, determines a target policy according to the error keyword and a preset correspondence, and executes the target policy to correct the error in the execution result of the first command.

The preset corresponding relation is the corresponding relation between the error reporting key word and the target strategy.

Optionally, a knowledge base stores correspondences between a plurality of error keywords and policies, and the preset correspondence is one of the correspondences in the knowledge base.

Take an image push/pull command as an example of the first command. After receiving the execution result of the image push/pull command fed back by the computing node, if the execution result indicates that the command reported an error, the management node parses the execution result to obtain the corresponding error keyword, namely an image push/pull abnormality. It then determines, according to the image push/pull abnormality and the preset correspondence, that the target policy is to restart the image service, executes the target policy, and re-executes the image push/pull command.

It can be understood that the knowledge base may not contain a correspondence for every possible error keyword and its solution policy. Therefore, when no target policy can be found in the knowledge base, the management node stores the error keyword in the upgrade log of the artificial intelligence platform and terminates the upgrade operation.
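The lookup and its fallback can be sketched as below. The keyword and policy strings are hypothetical stand-ins for the image-service example, keyword extraction is reduced to a case-insensitive substring match, and the raw error text is logged when nothing matches; the source does not specify how keywords are parsed from the execution result.

```python
from typing import Dict, List, Optional

# Hypothetical knowledge base: error keyword -> remediation policy.
KNOWLEDGE_BASE: Dict[str, str] = {
    "image pull failed": "restart image service",
    "connection refused": "restart network service",
}


def resolve_error(output: str, knowledge_base: Dict[str, str],
                  upgrade_log: List[str]) -> Optional[str]:
    """Return the policy for the first matching keyword, or log the error and return None."""
    lowered = output.lower()
    for keyword, policy in knowledge_base.items():
        if keyword in lowered:
            return policy
    # No known policy: record the error so the upgrade can be terminated and reviewed.
    upgrade_log.append(output)
    return None


log: List[str] = []
print(resolve_error("ERROR: image pull failed for registry", KNOWLEDGE_BASE, log))
print(resolve_error("ERROR: disk quota exceeded", KNOWLEDGE_BASE, log), log)
```

A `None` return is the signal for the caller to stop the upgrade, which matches the termination behavior described when the knowledge base has no matching policy.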

In addition, when the management node cannot determine the target strategy, the user may handle the reported error manually and, after the error has been handled, click a "continue" control to resume upgrading the artificial intelligence platform. Optionally, the user may choose not to continue the upgrade and instead click a "rollback" control to perform a rollback operation, so that the artificial intelligence platform is restored to its state before the upgrade.

In one example, while the management node is deploying the function corresponding to the newly added identifier on the computing node, if the user chooses to stop upgrading the artificial intelligence platform, the state of the second license may be set to an invalid state, so that the artificial intelligence platform is restored to its state before the upgrade.

In one example, when deployment of the function corresponding to the newly added identifier on the computing node fails, the management node sets the state of the second license to an invalid state, so that the artificial intelligence platform is restored to its state before the upgrade.

Optionally, before deploying the function corresponding to the newly added identifier on the computing node, the management node saves a backup configuration file of the artificial intelligence platform and backup data of the related database.

In one example, when deployment of the function corresponding to the newly added identifier on the computing node fails, the management node acquires the stored backup configuration file of the artificial intelligence platform and the backup data of the related database, restores the configuration file to the backup configuration file, and restores the data in the database to the backup data, so that the artificial intelligence platform is restored to its pre-upgrade configuration and the rollback is achieved.
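The backup-and-restore rollback described above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions: `PlatformState`, `deploy_with_rollback`, and the use of dictionaries to stand in for the configuration file and database are hypothetical and are not taken from the embodiment.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PlatformState:
    """Stand-in for the platform's configuration file and related database."""
    config: Dict[str, str] = field(default_factory=dict)
    database: Dict[str, str] = field(default_factory=dict)


def deploy_with_rollback(
    state: PlatformState, deploy: Callable[[PlatformState], None]
) -> bool:
    """Back up the state, attempt the deployment, and restore on failure."""
    # Save the backups before any function is deployed.
    backup_config = copy.deepcopy(state.config)
    backup_database = copy.deepcopy(state.database)
    try:
        deploy(state)
    except Exception:
        # Deployment failed: restore the pre-upgrade configuration and data.
        state.config = backup_config
        state.database = backup_database
        return False
    return True
```

Taking the backups before calling `deploy` means a partially applied deployment can always be undone, whichever step failed.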

In summary, the above method applies to the scenario in which the management node upgrades the artificial intelligence platform. It applies equally to the scenario in which the management node downgrades the artificial intelligence platform.

Optionally, if the first number of identifiers is greater than the second number of identifiers, the management node determines the pruned identifiers and deletes the functions corresponding to the pruned identifiers on the computing node to downgrade the artificial intelligence platform.
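The comparison between the two licenses can be sketched as follows. This is a minimal illustration that assumes each license carries a set of function identifiers and that the comparison is made on the number of identifiers, as the text describes. The function name `plan_license_change` and the identifier strings used with it are hypothetical.

```python
from typing import Set, Tuple


def plan_license_change(
    first_ids: Set[str], second_ids: Set[str]
) -> Tuple[str, Set[str]]:
    """Decide between upgrade and downgrade and return the affected identifiers."""
    if len(first_ids) < len(second_ids):
        # Newly added identifiers: in the second license but not in the first.
        return "upgrade", second_ids - first_ids
    if len(first_ids) > len(second_ids):
        # Pruned identifiers: in the first license but not in the second.
        return "downgrade", first_ids - second_ids
    return "unchanged", set()
```

For example, `plan_license_change({"training"}, {"training", "inference"})` yields an upgrade whose newly added identifier is `"inference"`.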

Based on steps S201 to S203 above, the management node may obtain a first license of the artificial intelligence platform at the current moment when it detects that the node state of the computing node is normal and the remaining resources satisfy the preset condition; obtain a second license of the artificial intelligence platform in response to the adding operation of the user; and, if the first number of identifiers included in the first license is smaller than the second number of identifiers included in the second license, determine a newly added identifier and deploy the function corresponding to the newly added identifier on the computing node so as to upgrade the artificial intelligence platform.

This embodiment provides a further function management method, which may be used in the above function management device or management node. Fig. 4 is a flowchart of the further function management method according to an embodiment of the present invention. As shown in fig. 4, the management node may further perform the following steps:

S401: Respond to the user's click-to-upgrade operation.

S402: Detect whether the node state of the computing node is normal.

S403: If the node state of the computing node is abnormal, output a prompt message, restore the node state based on the prompt message, and execute S401 again.

S404: If the node state of the computing node is normal, detect whether the remaining resources are greater than or equal to a preset threshold.

S405: If the remaining resources are smaller than the preset threshold, release resources and execute S403 again.

S406: If the remaining resources are greater than or equal to the preset threshold, acquire a first license of the artificial intelligence platform at the current moment.

S407: In response to the adding operation of the user, acquire a second license of the artificial intelligence platform.

S408: Determine whether the first number of identifiers is smaller than the second number of identifiers.

S409: If yes, determine the newly added identifier, and deploy the function corresponding to the newly added identifier on the computing node so as to upgrade the artificial intelligence platform.

S410: If not, determine the pruned identifier, and delete the function corresponding to the pruned identifier on the computing node so as to downgrade the artificial intelligence platform.

This embodiment further provides a function management device, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The present embodiment provides a function management apparatus, as shown in fig. 5, including:

The obtaining module 501 is configured to obtain, when it is detected that the node state of the computing node is normal and the remaining resources meet the preset condition, a first license of the artificial intelligence platform at the current moment, where the first license includes a first number identifier corresponding to a first number of functions of the artificial intelligence platform deployed on the computing node at the current moment.

The obtaining module 501 is further configured to obtain, in response to an adding operation by a user, a second license of the artificial intelligence platform, where the second license includes a second number identifier corresponding to a second number of functions of the artificial intelligence platform that need to be deployed on the computing node.

The processing module 502 is configured to determine a newly added identifier if the first number of identifiers is smaller than the second number of identifiers, and deploy a function corresponding to the newly added identifier on the computing node to upgrade the artificial intelligence platform.

In some optional embodiments, the processing module 502 is further configured to determine a pruned identifier if the first number of identifiers is greater than the second number of identifiers, and delete a function corresponding to the pruned identifier on the computing node to downgrade the artificial intelligence platform.

In some optional embodiments, the obtaining module 501 is further configured to receive a click operation of the user on the display interface of the artificial intelligence platform; the processing module 502 is further configured to send, in response to the click operation, a first detection request to the computing node, where the first detection request is used to detect the running state of each operating system service among all operating system services of the computing node; the processing module 502 is further configured to send a second detection request to the computing node when it is determined according to the first detection request that the running state of each operating system service is normal, where the second detection request is used to detect the running state of each minimum scheduling unit among all minimum scheduling units of the computing node; the processing module 502 is further configured to send a third detection request to the computing node when it is determined according to the second detection request that the running state of each minimum scheduling unit is normal, where the third detection request is used to detect the running state of each task among all tasks of the computing node; the processing module 502 is further configured to determine that the node state of the computing node is normal and obtain the remaining resources of the computing node when it is determined according to the third detection request that, among all the tasks, there is neither a task being executed nor a task to be executed; the processing module 502 is further configured to determine that the remaining resources of the computing node satisfy the preset condition if the remaining resources are greater than or equal to a preset threshold.

In some optional embodiments, the processing module 502 is further configured to determine that the node state of the computing node is abnormal when it is determined according to the first detection request that the running state of at least one operating system service among all the operating system services is abnormal; or the processing module 502 is further configured to determine that the node state of the computing node is abnormal when it is determined according to the second detection request that the running state of at least one minimum scheduling unit among all the minimum scheduling units is abnormal; or the processing module 502 is further configured to determine that the node state of the computing node is abnormal when it is determined according to the third detection request that, among all the tasks, there is at least one task being executed and/or at least one task to be executed.
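The three-stage detection and the resource check described above can be sketched as follows. This is a minimal illustration that assumes the responses to the three detection requests have already been collected into plain dictionaries. The function names, the status strings `"running"` and `"pending"`, and the data shapes are hypothetical.

```python
from typing import Dict


def node_state_is_normal(
    os_services: Dict[str, bool],
    scheduling_units: Dict[str, bool],
    task_status: Dict[str, str],
) -> bool:
    """Apply the three checks in order; any failed stage means 'abnormal'."""
    # Stage 1: every operating system service must be running normally.
    if not all(os_services.values()):
        return False
    # Stage 2: every minimum scheduling unit must be running normally.
    if not all(scheduling_units.values()):
        return False
    # Stage 3: no task may be executing or waiting to be executed.
    return not any(s in ("running", "pending") for s in task_status.values())


def resources_meet_condition(remaining: float, threshold: float) -> bool:
    """The preset condition holds when remaining resources reach the threshold."""
    return remaining >= threshold
```

Checking the stages in this order mirrors the text: a later detection request is only meaningful once the earlier one has reported a normal state.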

In some optional embodiments, the processing module 502 is specifically configured to obtain an upgrade script corresponding to a function corresponding to the newly added identifier; the processing module 502 is further specifically configured to send an upgrade script to a computing node, where the computing node is configured to execute at least one command in the upgrade script, and return an execution result of each command in the at least one command; the processing module 502 is further specifically configured to receive an execution result of each command in at least one command fed back by the computing node; the processing module 502 is further specifically configured to determine that the deployment of the function corresponding to the newly added identifier on the computing node is successful if the execution result of each command indicates that each command is executed successfully.
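How the management node might evaluate the execution results fed back by the computing node can be sketched as follows. This is a minimal illustration: the `CommandResult` record, its fields, and the two helper functions are assumptions made for the example, since the text does not specify the format of an execution result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommandResult:
    """Hypothetical record of one command's execution on the computing node."""
    command: str
    succeeded: bool
    output: str = ""


def deployment_succeeded(results: List[CommandResult]) -> bool:
    """Deployment succeeds only if every command in the script succeeded."""
    # The script contains at least one command, so an empty list is a failure.
    return bool(results) and all(r.succeeded for r in results)


def first_failed_command(results: List[CommandResult]) -> Optional[CommandResult]:
    """Return the first command that reported an error, if any."""
    for result in results:
        if not result.succeeded:
            return result
    return None
```

The output of the command returned by `first_failed_command` is what would then be parsed for an error reporting keyword.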

In some optional embodiments, the at least one command includes a first command, and the processing module 502 is further specifically configured to parse an execution result of the first command to obtain an error reporting keyword corresponding to the first command if the execution result of the first command indicates that the first command reports an error; the processing module 502 is further specifically configured to determine a target policy according to the error reporting keyword and a preset corresponding relationship, and execute the target policy to correct an error in an execution result of the first command, where the preset corresponding relationship is a corresponding relationship between the error reporting keyword and the target policy.

In some optional embodiments, the processing module 502 is further configured to set the state of the second license to an invalid state when deployment of the function corresponding to the newly added identifier on the computing node fails, so that the artificial intelligence platform is restored to its state before the upgrade.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The function management device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and a memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.

The embodiment of the present invention also provides a computer device, which has the function management device shown in fig. 5.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 6, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is taken as an example in fig. 6.

The processor 10 may be a central processor, a network processor, or a combination thereof. Wherein the processor 10 may further comprise a hardware integrated circuit. The hardware integrated circuit may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in the above embodiments.

The memory 20 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device further comprises an input device 30 and an output device 40. The processor 10, the memory 20, the input device 30, and the output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6.

The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device. Examples of the input device include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, an auxiliary lighting device (e.g., LEDs), a tactile feedback device (e.g., a vibration motor), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and then stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.

Portions of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present application by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A function management method, characterized in that the method comprises:

When it is detected that the node status of the computing node is normal and the remaining resources meet the preset conditions, a first license of the artificial intelligence platform at the current moment is obtained, wherein the first license includes a first quantity identifier corresponding to a first quantity of functions of the artificial intelligence platform deployed on the computing node at the current moment;

In response to an adding operation by the user, obtaining a second license of the artificial intelligence platform, where the second license includes a second quantity identifier corresponding to a second quantity of functions of the artificial intelligence platform that need to be deployed on the computing node;

If the first quantity identifier is smaller than the second quantity identifier, a newly added identifier is determined, and a function corresponding to the newly added identifier is deployed on the computing node to upgrade the artificial intelligence platform.

2. The method according to claim 1, characterized in that the method further comprises:

If the first quantity identifier is greater than the second quantity identifier, the deleted identifier is determined, and the function corresponding to the deleted identifier is deleted on the computing node to downgrade the artificial intelligence platform.

3. The method according to claim 2, characterized in that before detecting that the node status of the computing node is normal and the remaining resources meet the preset conditions, the method further comprises:

Receiving a click operation by a user on a display interface of the artificial intelligence platform;

In response to the click operation, sending a first detection request to the computing node, the first detection request being used to detect a running status of each operating system service among all operating system services of the computing node;

When it is determined according to the first detection request that the running state of each operating system service is normal, sending a second detection request to the computing node, wherein the second detection request is used to detect the running state of each minimum scheduling unit in all minimum scheduling units of the computing node;

When it is determined according to the second detection request that the running state of each minimum scheduling unit is normal, sending a third detection request to the computing node, wherein the third detection request is used to detect the running state of each task in all tasks of the computing node;

In a case where it is determined according to the third detection request that there is no task being executed or no task to be executed among all the tasks, determining that the node state of the computing node is normal, and acquiring the remaining resources of the computing node;

If the remaining resources are greater than or equal to a preset threshold, it is determined that the remaining resources of the computing node meet the preset condition.

4. The method according to claim 3, characterized in that the method further comprises:

In a case where it is determined according to the first detection request that at least one operating system service among all the operating system services has an abnormal running state, determining that the node state of the computing node is abnormal;

Alternatively, when it is determined according to the second detection request that the operation status of at least one minimum scheduling unit among all the minimum scheduling units is abnormal, determining that the node status of the computing node is abnormal;

Alternatively, when it is determined according to the third detection request that among all the tasks, there is at least one task being executed and/or there is at least one task to be executed, it is determined that the node state of the computing node is abnormal.

5. The method according to claim 1, characterized in that the deploying the function corresponding to the newly added identifier on the computing node comprises:

Obtaining an upgrade script corresponding to the function corresponding to the newly added identifier;

Sending the upgrade script to the computing node, wherein the computing node is configured to execute at least one command in the upgrade script and return the execution result of each command in the at least one command;

receiving an execution result of each command in the at least one command fed back by the computing node;

If the execution result of each command indicates that each command is executed successfully, it is determined that the function corresponding to the newly added identifier is deployed successfully on the computing node.

6. The method according to claim 5, characterized in that the at least one command includes a first command, and after receiving the execution result of each command in the at least one command fed back by the computing node, the method further comprises:

If the execution result of the first command indicates that the first command reports an error, parsing the execution result of the first command to obtain an error reporting keyword corresponding to the first command;

According to the error reporting keyword and the preset corresponding relationship, a target strategy is determined and executed to correct the error in the execution result of the first command, and the preset corresponding relationship is the corresponding relationship between the error reporting keyword and the target strategy.

7. The method according to claim 1, characterized in that the method further comprises:

In the event that the function corresponding to the newly added identifier fails to be deployed on the computing node, the status of the second license is set to an invalid state so that the artificial intelligence platform is restored to the state before the upgrade.

8. A function management device, characterized in that the device comprises:

An acquisition module, configured to acquire a first license of the artificial intelligence platform at a current moment when it is detected that the node status of the computing node is normal and the remaining resources meet a preset condition, wherein the first license includes a first quantity identifier corresponding to a first quantity of functions of the artificial intelligence platform deployed on the computing node at the current moment;

The acquisition module is further configured to acquire a second license of the artificial intelligence platform in response to an adding operation by the user, wherein the second license includes a second quantity identifier corresponding to a second quantity of functions of the artificial intelligence platform that need to be deployed on the computing node;

A processing module, configured to determine a newly added identifier if the first quantity identifier is smaller than the second quantity identifier, and deploy the function corresponding to the newly added identifier on the computing node to upgrade the artificial intelligence platform.

9. A computer device, comprising:

A memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the function management method according to any one of claims 1 to 7 by executing the computer instructions.

10. A computer-readable storage medium, characterized in that computer instructions are stored on the computer-readable storage medium, and the computer instructions are used to enable a computer to execute the function management method according to any one of claims 1 to 7.