CN115277006A - Management method and device for private computing nodes

Info

Publication number: CN115277006A
Application number: CN202210737341.1A
Authority: CN
Inventors: 郭石磊; 胡晓龙; 王磊; 胡东文
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-06-27
Filing date: 2022-06-27
Publication date: 2022-11-01

Abstract

Embodiments of this specification provide a method and a device for managing privacy computing nodes. The method comprises: first, acquiring first state information of a first privacy computing node in a Kubernetes cluster, where the first state information indicates that the node state is abnormal and the first privacy computing node belongs to a first namespace corresponding to a first institution; then, when it is determined based on the first state information that the abnormal node state has lasted for a first preset duration, adding a target taint to the first privacy computing node, where the first preset duration is set for the first institution; and next, in response to the addition of the target taint, scheduling the pods of the services on the first privacy computing node to a second privacy computing node in the Kubernetes cluster that is located under the first namespace, where the second privacy computing node is in a normal state and does not carry the target taint.

Description

Management method and device for private computing nodes

Technical field

One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and a device for managing privacy computing nodes.

Background

Kubernetes is an open-source, industrial-grade container orchestration platform. Its emergence has driven the adoption of popular technologies such as microservice architecture and has made development, operations, and delivery increasingly simple. Although standard Kubernetes already provides many powerful features, it cannot meet the needs of some special application scenarios.

At present, for the business scenario of privacy-preserving secure multi-party computation, there is no Kubernetes node management solution that meets the special requirements of that scenario.

Summary of the invention

One or more embodiments of this specification describe a method and a device for managing privacy computing nodes, which can manage the privacy computing nodes of an institution at a finer granularity, promptly trigger the rescheduling of services on invalid nodes, and safeguard the availability and stability of services.

According to a first aspect, a method for managing privacy computing nodes is provided, comprising: acquiring first state information of a first privacy computing node in a Kubernetes cluster, the first state information indicating that the node state is abnormal, where the first privacy computing node belongs to a first namespace corresponding to a first institution; when it is determined based on the first state information that the abnormal node state has lasted for a first preset duration, adding a target taint to the first privacy computing node, where the first preset duration is set for the first institution; and in response to the addition of the target taint, scheduling the pods of the services on the first privacy computing node to a second privacy computing node located under the first namespace in the Kubernetes cluster, where the second privacy computing node is in a normal state and does not carry the target taint.

In one embodiment, the first state information further includes a first timestamp. In this case, adding the target taint to the first privacy computing node when it is determined based on the first state information that the abnormal node state has lasted for the first preset duration comprises: after the first preset duration has elapsed since the first timestamp, acquiring second state information of the first privacy computing node; and if the second state information still indicates that the node state is abnormal, adding the target taint to the first privacy computing node.

In one embodiment, acquiring the first state information of the first privacy computing node in the Kubernetes cluster comprises: detecting, by listening, a first event in which the state of the first privacy computing node changes, the information of the first event including the first state information. In this case, adding the target taint to the first privacy computing node when it is determined based on the first state information that the abnormal node state has lasted for the first preset duration comprises: if no event in which the state of the first privacy computing node changes to normal is detected within the first preset duration elapsed since the first event, adding the target taint to the first privacy computing node.

In one embodiment, after the target taint is added to the first privacy computing node, the method further comprises: acquiring third state information of the first privacy computing node, the third state information indicating that the node state is normal; and when it is determined based on the third state information that the normal node state has lasted for a second preset duration, removing the target taint.

In a specific embodiment, the third state information further includes a second timestamp. In this case, removing the target taint when it is determined based on the third state information that the normal node state has lasted for the second preset duration comprises: after the second preset duration has elapsed since the second timestamp, acquiring fourth state information of the first privacy computing node; and if the fourth state information still indicates that the node state is normal, removing the target taint.

In another specific embodiment, acquiring the third state information of the first privacy computing node comprises: detecting, by listening, a second event in which the state of the first privacy computing node changes, the information of the second event including the third state information. In this case, removing the target taint when it is determined based on the third state information that the normal node state has lasted for the second preset duration comprises: if no event in which the state of the first privacy computing node changes to abnormal is detected within the second preset duration elapsed since the second event, removing the target taint.

In one embodiment, the Kubernetes cluster includes a plurality of computing nodes. Before the first state information of the first privacy computing node in the Kubernetes cluster is acquired, the method further comprises: filtering out, from the plurality of computing nodes, a plurality of privacy computing nodes that carry a privacy computing label. The plurality of privacy computing nodes belong to different namespaces corresponding to different institutions and include the first privacy computing node and the second privacy computing node.

In one embodiment, the method is executed by a privacy computing node controller. After the pods of the services on the first privacy computing node are scheduled to the second privacy computing node located under the first namespace in the Kubernetes cluster, the method further comprises: in response to a call to a target API provided by the privacy computing node controller, deleting the first privacy computing node from the Kubernetes cluster.

In one embodiment, the first state information further includes a first timestamp. After the pods of the services on the first privacy computing node are scheduled to the second privacy computing node located under the first namespace in the Kubernetes cluster, the method further comprises: after a third preset duration has elapsed since the first timestamp, acquiring fifth state information of the first privacy computing node; and if the fifth state information indicates that the node state is abnormal, deleting the first privacy computing node from the Kubernetes cluster.

According to a second aspect, a device for managing privacy computing nodes is provided, comprising: a state acquisition unit configured to acquire first state information of a first privacy computing node in a Kubernetes cluster, the first state information indicating that the node state is abnormal, where the first privacy computing node belongs to a first namespace corresponding to a first institution; and a taint adding unit configured to add a target taint to the first privacy computing node when it is determined based on the first state information that the abnormal node state has lasted for a first preset duration, so that, in response to the addition of the target taint, the pods of the services on the first privacy computing node are scheduled to a second privacy computing node located under the first namespace in the Kubernetes cluster. The first preset duration is set for the first institution, and the second privacy computing node is in a normal state and does not carry the target taint.

According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.

According to a fourth aspect, a computing device is provided, comprising a memory and a processor, where executable code is stored in the memory and the processor, when executing the executable code, implements the method of the first aspect.

With the method for managing privacy computing nodes disclosed in the embodiments of this specification, the privacy computing nodes of an institution can be managed at a finer granularity: invalid nodes are discovered promptly and given the target taint, which promptly triggers rescheduling of the pods of the services running on them and safeguards the availability and stability of services. Furthermore, invalid privacy computing nodes can be cleaned up and reclaimed, and the target taint can be removed from privacy computing nodes that have returned to a normal and stable state, so that they can be put back into use and the privacy computing nodes are used to full effect.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 shows an example implementation of a management scheme for privacy computing nodes according to one embodiment;

FIG. 2 shows a schematic flowchart of a method for managing privacy computing nodes according to one embodiment;

FIG. 3 shows a schematic structural diagram of a device for managing privacy computing nodes according to one embodiment.

Detailed description

The solutions provided in this specification are described below with reference to the drawings.

As stated above, for the business scenario of privacy-preserving secure multi-party computation, there is no Kubernetes node management solution that meets the special requirements of that scenario. Specifically, because of the particular nature of this business, each institution has its own dedicated set of privacy computing nodes. When some privacy computing nodes become unavailable, these invalid nodes must be discovered promptly and the services on them rescheduled promptly to other available nodes; otherwise service availability is affected.

Based on the above observations and analysis, and in order to manage the privacy computing nodes of each institution at a finer granularity, including promptly discovering and cleaning up invalid nodes, the inventors propose a management scheme for privacy computing nodes that manages the privacy computing nodes of each institution and safeguards the availability and stability of services.

FIG. 1 shows an example implementation of a management scheme for privacy computing nodes according to one embodiment. As shown in FIG. 1, the privacy computing nodes in a Kubernetes cluster are monitored dynamically. When a privacy computing node of some institution (for example, privacy computing node S_A1 of institution A) is found to be in an abnormal state, it is monitored continuously. If the abnormal state is observed to last longer than the abnormal-duration threshold set for that institution, a predetermined taint is added to that node (in FIG. 1 the taint is depicted as an irregularly shaped gray patch), which triggers rescheduling of the pods of the services on the invalid node to other available privacy computing nodes of the same institution (for example, privacy computing node S_A2 of institution A).

The implementation steps of the above scheme are described in detail below with reference to further embodiments.

FIG. 2 shows a schematic flowchart of a method for managing privacy computing nodes according to one embodiment. The method may be executed by any apparatus, server, or device cluster with computing and processing capabilities, for example by a privacy computing node controller. As shown in FIG. 2, the method includes the following steps:

Step S210: acquire first state information of a first privacy computing node in a Kubernetes cluster, the first state information indicating that the node state is abnormal, where the first privacy computing node belongs to a first namespace corresponding to a first institution.

Step S220: when it is determined based on the first state information that the abnormal node state has lasted for a first preset duration, add a target taint to the first privacy computing node, where the first preset duration is set for the first institution.

Step S230: in response to the addition of the target taint, schedule the pods of the services on the first privacy computing node to a second privacy computing node located under the first namespace in the Kubernetes cluster, where the second privacy computing node is in a normal state and does not carry the target taint.

These steps are described in more detail as follows.

First, in step S210, the first state information of the first privacy computing node in the Kubernetes cluster is acquired. Note that the Kubernetes cluster described in the embodiments of this specification contains a plurality of privacy computing nodes, including the first privacy computing node, and the method disclosed here is intended to manage these privacy computing nodes.

The plurality of privacy computing nodes belong to different institutions. In other words, a single privacy computing node is dedicated to one institution and can only be scheduled and used by that institution to execute that institution's computing tasks. On the one hand, in terms of technical implementation, different institutions use different namespaces (Kubernetes namespaces) for resource isolation, so different institutions are represented in Kubernetes as different namespaces. For brevity and to distinguish them in the description, the institution and the namespace to which the first privacy computing node belongs are called the first institution and the first namespace, respectively. On the other hand, the computing tasks executed on privacy computing nodes may be called privacy computing tasks. Typically, a privacy computing task is a computing task that each of several institutions must execute when they carry out secure multi-party computation together. For example, in a federated learning scenario, each institution needs to use the local private data it holds to train a locally deployed model.

Usually, a privacy computing node also carries a privacy computing label that distinguishes it from the non-privacy computing nodes in the Kubernetes cluster. The content of the privacy computing label can be customized by staff, for example defined as "kubernetes.io/nueva-agent". In addition, both privacy computing nodes and non-privacy computing nodes can be implemented as virtual machines or physical machines.

In one embodiment, in a scenario where the Kubernetes cluster also includes ordinary computing nodes, the method may further include, before this step is performed: filtering out, from the plurality of computing nodes in the Kubernetes cluster, the plurality of privacy computing nodes that carry the privacy computing label. The privacy computing nodes located in this way can then be supervised precisely.
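The label-based filtering described above can be illustrated with a minimal Python sketch. It models nodes as plain in-memory objects rather than calling the Kubernetes API; the `Node` class, its `namespace` field, and the helper names are hypothetical, and matching on the mere presence of the label key is an assumption. Only the label string itself comes from the description.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Label key taken from the example in the description.
PRIVACY_LABEL = "kubernetes.io/nueva-agent"


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""
    name: str
    namespace: str  # namespace of the owning institution (assumed field)
    labels: dict = field(default_factory=dict)


def filter_privacy_nodes(nodes):
    """Keep only the nodes that carry the privacy computing label."""
    return [n for n in nodes if PRIVACY_LABEL in n.labels]


def group_by_namespace(nodes):
    """Group node names by the institution namespace they belong to."""
    grouped = defaultdict(list)
    for n in nodes:
        grouped[n.namespace].append(n.name)
    return dict(grouped)
```

Grouping by namespace afterwards reflects the statement that the filtered nodes belong to different namespaces for different institutions.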

The above mainly introduces the plurality of privacy computing nodes, including the first privacy computing node.

To implement this step, in one embodiment, the state information of each of the plurality of privacy computing nodes may be acquired, and this state information shows that several (meaning one or more) of the privacy computing nodes are in an abnormal (or invalid) state, for example the node state is Not Ready. The first privacy computing node may then be any one of those several privacy computing nodes. By way of example, the batch acquisition of the state information of multiple privacy computing nodes can be implemented with the Kubernetes list mechanism.

In another embodiment, an event in which the state of one of the plurality of privacy computing nodes changes is detected by listening. The first privacy computing node may then be that node; the event may be called the first event, and the post-change state information included in the first event is called the first state information, which indicates that the node state has changed to abnormal. By way of example, listening for the first event can be implemented with the Kubernetes list-watch mechanism.

In this way, the states of the plurality of privacy computing nodes can be monitored, and first state information indicating that the first privacy computing node among them is in an abnormal state is obtained.
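The two acquisition paths (a batch snapshot and a stream of change events) can be sketched in Python as follows. This is only an illustration over plain data: it does not use the Kubernetes list or list-watch APIs, and the function names, the tuple layout of an event, and the state string `"NotReady"` used in the usage example are assumptions.

```python
READY = "Ready"  # any other state is treated as abnormal in this sketch


def abnormal_nodes_from_list(snapshot):
    """Batch path: snapshot maps node name -> current state."""
    return sorted(name for name, state in snapshot.items() if state != READY)


def first_events_from_watch(events):
    """Event path: events are (timestamp, node_name, new_state) tuples.

    Returns the (timestamp, node_name) pairs of events whose new state
    is abnormal, i.e. the candidates for a "first event".
    """
    return [(ts, name) for ts, name, new_state in events if new_state != READY]
```

For example, `abnormal_nodes_from_list({"s-a1": "NotReady", "s-a2": "Ready"})` selects only `s-a1`, and the event path keeps the timestamp so that the duration check in step S220 has a starting point.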

Then, in step S220, when it is determined based on the first state information that the abnormal node state has lasted for the first preset duration, the target taint is added to the first privacy computing node. Note that the first preset duration is set for the first institution. By way of example, this setting can be made through a related API provided by the privacy computing node controller: an upper-layer platform calls this API and passes in the institution name and the corresponding duration, which completes the management-duration setting for that specific institution. Configuring different node management durations for different institutions in this way meets the actual needs of each institution and allows invalid privacy computing nodes to be managed per institution at a finer granularity.
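A per-institution duration setting of this kind could look like the following Python sketch. The description does not specify the controller's API, so the class name, the method names, and the fallback default duration are all hypothetical.

```python
from datetime import timedelta


class DurationConfig:
    """Hypothetical store of per-institution preset durations."""

    def __init__(self, default):
        # The fallback default is an assumption; the description only
        # says that the duration is set per institution.
        self._default = default
        self._per_institution = {}

    def set_duration(self, institution, duration):
        """Called by the upper-layer platform with a name and a duration."""
        if duration <= timedelta(0):
            raise ValueError("duration must be positive")
        self._per_institution[institution] = duration

    def duration_for(self, institution):
        """Return the duration configured for an institution."""
        return self._per_institution.get(institution, self._default)
```

An institution with strict availability needs could be given a short duration such as 30 seconds, while others keep a longer one.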

After the first state information is acquired, how long the abnormal state of the first privacy computing node has lasted can be determined by querying or by listening. Specifically, the first state information corresponds to a first timestamp; that is, at the moment given by the first timestamp, the state of the first privacy computing node is abnormal. In one implementation, the first state information is contained in the information of the first event, in which case the first timestamp also corresponds to the moment at which the first event occurred.

In one embodiment, after the first preset duration has elapsed since the first timestamp, second state information of the first privacy computing node can be queried in real time. For example, the current state information of the first privacy computing node can be queried by its name, such as through a get call; alternatively, the first privacy computing node can be located in the Kubernetes list of resources and its state information obtained from there. If the second state information still indicates that the node state is abnormal, the target taint is added to the first privacy computing node; otherwise, the target taint is not added. The target taint can be defined by staff, for example as nueva.alipay.com/agent=NoExecute.
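The query-based check can be sketched in Python as below. The sketch keeps taints in an in-memory set and receives the state lookup as a callable; `Node`, `recheck_and_taint`, and `get_state` are hypothetical names, and timestamps are plain seconds. Only the taint string is taken from the description.

```python
from dataclasses import dataclass, field

# Example taint from the description.
TARGET_TAINT = "nueva.alipay.com/agent=NoExecute"


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a privacy computing node."""
    name: str
    taints: set = field(default_factory=set)


def recheck_and_taint(node, first_timestamp, preset_duration, now, get_state):
    """Add the target taint if the node is still abnormal after the wait.

    get_state is a caller-supplied lookup (node name -> state string)
    standing in for a real-time query of the second state information.
    Returns True when the taint was added.
    """
    if now - first_timestamp < preset_duration:
        return False  # the first preset duration has not elapsed yet
    if get_state(node.name) != "Ready":
        node.taints.add(TARGET_TAINT)
        return True
    return False  # node recovered, so no taint is added
```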

In another embodiment, if no event in which the state of the first privacy computing node changes to normal is detected within the first preset duration elapsed since the first timestamp, the state of the first privacy computing node can be judged to be still abnormal, and the target taint is therefore added to it. If an event in which the state of the first privacy computing node changes to normal is detected before the first preset duration is reached, the target taint does not need to be added.
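The listening-based variant can be sketched as a small tracker that remembers when each node became abnormal and forgets it again when a recovery event arrives. This is a hedged illustration, not the controller's actual logic; the class and method names are hypothetical and timestamps are plain seconds.

```python
class AbnormalTracker:
    """Tracks how long nodes have been abnormal, driven by state events."""

    def __init__(self):
        self._abnormal_since = {}  # node name -> timestamp of first event

    def on_event(self, node, state, timestamp):
        """Feed one state-change event into the tracker."""
        if state == "Ready":
            # Recovery within the window cancels the pending taint.
            self._abnormal_since.pop(node, None)
        else:
            # Keep the earliest abnormal timestamp as the first event.
            self._abnormal_since.setdefault(node, timestamp)

    def nodes_to_taint(self, now, preset_duration):
        """Nodes that stayed abnormal for at least the preset duration."""
        return sorted(
            name
            for name, since in self._abnormal_since.items()
            if now - since >= preset_duration
        )
```

A node that flaps back to `Ready` before the duration expires never appears in `nodes_to_taint`, which matches the case where no taint is needed.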

The target taint indicates that the corresponding node cannot run the pods of any service. A privacy computing node to which the target taint has been added may be called an invalid privacy computing node.

After that, in step S230, in response to the addition of the target taint, the pods of all services on the first privacy computing node are scheduled to available privacy computing nodes of the same institution (corresponding to the same namespace). An available privacy computing node is in a normal state (for example, the node state is Ready) and does not carry the target taint. This ensures that the pod services run normally and stably. An available privacy computing node that is scheduled to run these pod services is also referred to herein as a second privacy computing node.

In one implementation, the decision to schedule the pods of all services on the first privacy computing node to one or more second privacy computing nodes can be based on the resources that the pods on the first privacy computing node need and on the resource usage, such as remaining memory, of the available privacy computing nodes in the first namespace. In one example, all pods are scheduled to the same second privacy computing node; in another example, the pods are divided into two groups that are scheduled to two second privacy computing nodes. In another implementation, where the first privacy computing node hosts multiple pods running several services, those pods can be scheduled to multiple second privacy computing nodes. Spreading the pods that run these services across nodes in this way effectively safeguards the stability of the services.
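As a rough illustration of resource-aware placement, the following Python sketch assigns each pod to the eligible node with the most free memory. The greedy rule is an assumption chosen for brevity, not the policy of the Kubernetes scheduler or of the described system; all class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field

TARGET_TAINT = "nueva.alipay.com/agent=NoExecute"  # example from the description


@dataclass
class Node:
    name: str
    namespace: str
    state: str
    free_memory_mib: int
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    memory_mib: int


def reschedule(pods, failed_node, nodes):
    """Greedily place pods on available nodes of the same namespace.

    Returns {pod name: node name or None}; None means no node had room.
    Mutates free_memory_mib on the chosen nodes.
    """
    candidates = [
        n for n in nodes
        if n.name != failed_node.name
        and n.namespace == failed_node.namespace
        and n.state == "Ready"
        and TARGET_TAINT not in n.taints
    ]
    placement = {}
    # Place the largest pods first so they are not starved of space.
    for pod in sorted(pods, key=lambda p: p.memory_mib, reverse=True):
        fitting = [n for n in candidates if n.free_memory_mib >= pod.memory_mib]
        if not fitting:
            placement[pod.name] = None
            continue
        target = max(fitting, key=lambda n: n.free_memory_mib)
        target.free_memory_mib -= pod.memory_mib
        placement[pod.name] = target.name
    return placement
```

Because each placement reduces the chosen node's free memory, pods tend to spread across nodes, which loosely mirrors the dispersed deployment mentioned above.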

Note that step S230 may be executed by the default Kubernetes scheduler or by a customized scheduler.

In this way, the pods of the services on an invalid privacy computing node are scheduled promptly, which safeguards service availability.

The above describes the discovery of invalid privacy computing nodes and the scheduling of the pods of the services on them.

According to an embodiment of another aspect, a privacy computing node can become invalid for many reasons. In some cases, for example when network jitter makes a node invalid, the privacy computing node returns to a normal state once the network recovers. The above method can therefore also be designed to remove the target taint from a privacy computing node that has returned to normal, making the node available again so that it can be put back into use.

Accordingly, after step S220, the method may further include: acquiring third state information of the first privacy computing node, the third state information indicating that the node state is normal; and when it is determined based on the third state information that the normal node state has lasted for a second preset duration, removing the target taint from the first privacy computing node. In a specific embodiment, the acquired third state information further includes a second timestamp; after the second preset duration has elapsed since the second timestamp, fourth state information of the first privacy computing node is acquired, and if the fourth state information still indicates that the node state is normal, the target taint carried by the first privacy computing node is removed; otherwise it is not removed. In another specific embodiment, the third state information is obtained by detecting a second event in which the state of the first privacy computing node changes; if no event in which the state of the first privacy computing node changes to abnormal is detected within the second preset duration elapsed since the second event, the target taint is removed. Removing the target taint from a first privacy computing node that has returned to a normal state in this way makes the first privacy computing node available again.
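The listening-based removal check can be written as a pure function over the events observed after the second event. This is a minimal sketch under stated assumptions: the function name and argument layout are hypothetical, timestamps are plain seconds, and the decision is modeled as being taken at the end of the second preset duration.

```python
def should_remove_taint(second_event_ts, later_events, second_preset_duration, now):
    """Decide whether the target taint may be removed.

    later_events is a list of (timestamp, state) pairs observed for the
    node after the second event (the change back to normal). The taint
    may be removed once the second preset duration has elapsed and no
    event inside that window reported an abnormal state. Events after
    the window are ignored, since they fall outside this decision.
    """
    deadline = second_event_ts + second_preset_duration
    if now < deadline:
        return False  # still inside the observation window
    return not any(
        state != "Ready" and second_event_ts < ts <= deadline
        for ts, state in later_events
    )
```

Requiring a quiet window before removal keeps a node that is still flapping from being handed back to the scheduler too early.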

According to an embodiment of yet another aspect, once an invalid privacy computing node has been found, it can also be cleaned up so that the computing node is reclaimed. Specifically, the entity executing the method may be called a privacy computing node controller. In one embodiment, this controller may expose an application programming interface (API) externally (for example, to the Kubernetes default scheduler) that supports querying state information of privacy computing nodes and deleting them. Accordingly, in a specific embodiment, after step S220 the method may further include: receiving a first call to a first target API provided by the privacy computing node controller, where the call information includes an identification parameter of the namespace corresponding to a given organization; and, based on the first call, providing state data of the privacy computing nodes under that organization. Exemplarily, the state data may indicate that the first privacy computing node is currently abnormal and how long the invalid state has lasted. In another specific embodiment, after step S220 the method may further include: receiving a second call to a second target API provided by the privacy computing node controller, where the call information includes an identification parameter of the first privacy computing node and a delete operation for that node; and, based on the second call, deleting the first privacy computing node from the Kubernetes cluster.
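The two controller interfaces can be sketched as follows. This is an in-memory illustration under stated assumptions, not the controller's actual API: the class `PrivacyNodeController`, the methods `track`, `query_status` and `delete_node`, and the shape of the returned state data are all hypothetical, and a real controller would delete the node through the Kubernetes API server instead of a local dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeStatus:
    name: str
    namespace: str                          # namespace of the owning organization
    abnormal: bool
    abnormal_since: Optional[float] = None  # when the abnormal state began


class PrivacyNodeController:
    """Hypothetical in-memory stand-in for the controller's two target APIs."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeStatus] = {}

    def track(self, status: NodeStatus) -> None:
        """Start tracking a node (helper for this sketch only)."""
        self._nodes[status.name] = status

    def query_status(self, namespace: str, now: float) -> List[dict]:
        """First target API: state data for the nodes of one organization."""
        result = []
        for node in self._nodes.values():
            if node.namespace != namespace:
                continue
            invalid_for = 0.0
            if node.abnormal and node.abnormal_since is not None:
                invalid_for = now - node.abnormal_since
            result.append(
                {"name": node.name, "abnormal": node.abnormal, "invalid_for": invalid_for}
            )
        return result

    def delete_node(self, node_name: str) -> bool:
        """Second target API: stop tracking a node; True if it was known."""
        return self._nodes.pop(node_name, None) is not None
```

Keying the query on the namespace keeps one organization's state data separate from another's, in line with the per-organization management described above.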

In another embodiment, the privacy computing node controller may be designed with an automatic cleanup mechanism for invalid privacy computing nodes. After step S220, the method may further include: after the target taint mark has been added to the first privacy computing node, deleting the first privacy computing node from the Kubernetes cluster if the subsequently monitored duration of the abnormal node state reaches a corresponding preset duration threshold. In addition, in a specific embodiment, a switch may be provided for the automatic cleanup function to enable or disable it.
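The automatic cleanup decision, including its switch, might look like the sketch below. The function name, the tuple layout of `NodeEntry`, and the parameter names are assumptions made for illustration; the function only selects nodes and leaves the actual deletion to the caller.

```python
from typing import List, Optional, Tuple

# Each entry: (node name, has the target taint, time the abnormal state began or None)
NodeEntry = Tuple[str, bool, Optional[float]]


def select_nodes_for_cleanup(
    nodes: List[NodeEntry],
    now: float,
    cleanup_threshold: float,
    auto_cleanup_enabled: bool,
) -> List[str]:
    """Return names of tainted nodes whose abnormal state reached the threshold."""
    if not auto_cleanup_enabled:  # the switch disables automatic cleanup entirely
        return []
    return [
        name
        for name, tainted, abnormal_since in nodes
        if tainted
        and abnormal_since is not None
        and now - abnormal_since >= cleanup_threshold
    ]
```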

In this way, invalid privacy computing nodes can be cleaned up.

In summary, the management method for privacy computing nodes disclosed in the embodiments of this specification allows the privacy computing nodes under an organization to be managed at a finer granularity: invalid nodes are found promptly and given the target taint mark, which promptly triggers rescheduling of the pods served on them and safeguards the availability and stability of the services. Further, invalid privacy computing nodes can be cleaned up and reclaimed, and the target taint mark can be removed from privacy computing nodes that have returned to a normal and stable state, so that they can be put into use again and the privacy computing nodes are used to full effect.

Corresponding to the above management method, the embodiments of this specification also disclose a management apparatus for privacy computing nodes. FIG. 3 shows a schematic structural diagram of a management apparatus for privacy computing nodes according to one embodiment. As shown in FIG. 3, the apparatus 300 includes the following units:

A state acquisition unit 310, configured to acquire first state information of a first privacy computing node in a Kubernetes cluster, the first state information indicating that the node state is abnormal; the first privacy computing node belongs to a first namespace corresponding to a first organization. A taint adding unit 320, configured to add a target taint mark to the first privacy computing node when it is determined, based on the first state information, that the duration of the abnormal node state reaches a first preset duration, so that, in response to the addition of the target taint mark, the pod served on the first privacy computing node is scheduled to a second privacy computing node in the Kubernetes cluster located under the first namespace; the first preset duration is set for the first organization; the second privacy computing node is in a normal state and does not have the target taint mark.

In one embodiment, the first state information further includes a first timestamp. The taint adding unit 320 is specifically configured to: acquire second state information of the first privacy computing node after the first preset duration has elapsed since the first timestamp; and add the target taint mark to the first privacy computing node if the second state information still indicates that the node state is abnormal.
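The timestamp-based check, with a first preset duration configured per organization, can be sketched as below. The mapping `FIRST_PRESET_DURATION`, its namespace keys and values, the function name, and the state strings `"abnormal"` and `"normal"` are all hypothetical; `fetch_state` stands in for whatever mechanism acquires the second state information.

```python
from typing import Callable, Dict

# Hypothetical per-organization settings, keyed by namespace (seconds).
FIRST_PRESET_DURATION: Dict[str, float] = {"org-a": 30.0, "org-b": 120.0}


def should_add_taint(
    namespace: str,
    first_timestamp: float,
    now: float,
    fetch_state: Callable[[], str],
    durations: Dict[str, float] = FIRST_PRESET_DURATION,
) -> bool:
    """Re-check the node once the organization's preset duration has elapsed."""
    if now - first_timestamp < durations[namespace]:
        return False  # too early: the second state information is not fetched yet
    # Second state information: taint only if the node is still abnormal.
    return fetch_state() == "abnormal"
```

Looking the duration up by namespace is what lets one organization tolerate a short outage while another waits longer before its node is marked.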

In one embodiment, the state acquisition unit 310 is specifically configured to listen for a first event in which the state of the first privacy computing node changes, the information of the first event including the first state information. The taint adding unit 320 is specifically configured to add the target taint mark to the first privacy computing node if, within the first preset duration following the first event, no event is observed in which the state of the first privacy computing node changes to normal.

In one embodiment, the apparatus further includes a service scheduling unit 330, configured to, in response to the addition of the target taint mark, schedule the pod served on the first privacy computing node to a second privacy computing node in the Kubernetes cluster located under the first namespace.

In one embodiment, the apparatus 300 further includes: a state re-acquisition unit 340, configured to acquire third state information of the first privacy computing node, which indicates that the node state is normal; and a taint removal unit 350, configured to remove the target taint mark when it is determined, based on the third state information, that the node state has remained normal for a second preset duration.

In a specific embodiment, the third state information further includes a second timestamp. The taint removal unit 350 is specifically configured to: acquire fourth state information of the first privacy computing node after the second preset duration has elapsed since the second timestamp; and remove the target taint mark if the fourth state information still indicates that the node state is normal.

In a specific embodiment, the state re-acquisition unit 340 is specifically configured to listen for a second event in which the state of the first privacy computing node changes, the information of the second event including the third state information. The taint removal unit 350 is specifically configured to remove the target taint mark if, within the second preset duration following the second event, no event is observed in which the state of the first privacy computing node changes to abnormal.

In one embodiment, the first privacy computing node hosts a plurality of pods running several services, and the service scheduling unit 330 is specifically configured to schedule the plurality of pods correspondingly to a plurality of second privacy computing nodes.
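How several pods could be spread over several second privacy computing nodes is sketched below. The eligibility conditions (same namespace, normal state, no target taint) come from the description; the round-robin assignment, the `Node` type, the taint key, and the function name are assumptions, since the text does not say how pods are matched to nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

TARGET_TAINT = "example.com/privacy-node-invalid"  # hypothetical taint key


@dataclass
class Node:
    name: str
    namespace: str
    normal: bool
    taints: Set[str] = field(default_factory=set)


def plan_rescheduling(pods: List[str], failed: Node, nodes: List[Node]) -> Dict[str, str]:
    """Map each pod of the failed node to an eligible node, round-robin."""
    candidates = [
        n
        for n in nodes
        if n.name != failed.name
        and n.namespace == failed.namespace  # stay inside the organization
        and n.normal
        and TARGET_TAINT not in n.taints
    ]
    if not candidates:
        return {}  # nothing eligible: the pods cannot be placed
    return {pod: candidates[i % len(candidates)].name for i, pod in enumerate(pods)}
```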

In one embodiment, the Kubernetes cluster includes a plurality of computing nodes, and the apparatus 300 further includes a privacy node filtering unit 360, configured to filter out, from the plurality of computing nodes, a plurality of privacy computing nodes carrying a privacy computing label; the plurality of privacy computing nodes belong to different organizations and include the first privacy computing node and the second privacy computing node.
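The filtering step can be illustrated with plain dictionaries standing in for node objects. The label keys `PRIVACY_LABEL` and `NAMESPACE_LABEL`, the `"true"` label value, and the function name are hypothetical; in particular, recording a node's organization namespace as a label is an assumption of this sketch, not something the description specifies.

```python
from collections import defaultdict
from typing import Dict, List

PRIVACY_LABEL = "example.com/privacy-computing"  # hypothetical label key
NAMESPACE_LABEL = "example.com/org-namespace"    # hypothetical label key


def group_privacy_nodes(nodes: List[dict]) -> Dict[str, List[str]]:
    """Keep nodes carrying the privacy computing label, grouped by namespace."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in nodes:
        labels = node.get("labels", {})
        if labels.get(PRIVACY_LABEL) != "true":
            continue  # ordinary computing node: not managed by this method
        groups[labels.get(NAMESPACE_LABEL, "unassigned")].append(node["name"])
    return dict(groups)
```

Grouping by namespace at this stage gives later steps a per-organization view, so a replacement node is only ever looked up within the same group as the failed one.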

In one embodiment, the method is executed by a privacy computing node controller, and the apparatus 300 further includes an invalid node clearing unit 370. In a specific embodiment, the invalid node clearing unit 370 is configured to delete the first privacy computing node from the Kubernetes cluster in response to a call to a target API provided by the privacy computing node controller. In another specific embodiment, the invalid node clearing unit 370 is configured to: acquire fifth state information of the first privacy computing node after a third preset duration has elapsed since the first timestamp; and delete the first privacy computing node from the Kubernetes cluster if the fifth state information indicates that the node state is abnormal.

In summary, the management apparatus for privacy computing nodes disclosed in the embodiments of this specification allows the privacy computing nodes under an organization to be managed at a finer granularity: invalid nodes are found promptly and given the target taint mark, which promptly triggers rescheduling of the pods served on them and safeguards the availability and stability of the services. Further, invalid privacy computing nodes can be cleaned up and reclaimed, and the target taint mark can be removed from privacy computing nodes that have returned to a normal and stable state, so that they can be put into use again and the privacy computing nodes are used to full effect.

According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in conjunction with FIG. 2.

According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor; the memory stores executable code, and the processor, when executing the executable code, implements the method described in conjunction with FIG. 2.

Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method of managing a privacy computing node, comprising:

acquiring first state information of a first privacy computing node in a Kubernetes cluster, wherein the first state information indicates that the node state is abnormal, and the first privacy computing node belongs to a first namespace corresponding to a first organization;

adding a target taint mark to the first privacy computing node when it is determined, based on the first state information, that the duration of the abnormal node state reaches a first preset duration, wherein the first preset duration is set for the first organization;

in response to the addition of the target taint mark, scheduling a pod served on the first privacy computing node to a second privacy computing node in the Kubernetes cluster that is located under the first namespace, wherein the second privacy computing node is in a normal state and does not have the target taint mark.

2. The method of claim 1, wherein the first state information further comprises a first timestamp;

wherein adding a target taint mark to the first privacy computing node when it is determined, based on the first state information, that the duration of the abnormal node state reaches a first preset duration comprises:

acquiring second state information of the first privacy computing node after the first preset duration has elapsed since the first timestamp;

and adding the target taint mark to the first privacy computing node when the second state information still indicates that the node state is abnormal.

3. The method of claim 1, wherein acquiring first state information of a first privacy computing node in a Kubernetes cluster comprises:

listening for a first event in which the state of the first privacy computing node changes, wherein the information of the first event comprises the first state information;

wherein adding a target taint mark to the first privacy computing node when it is determined, based on the first state information, that the duration of the abnormal node state reaches a first preset duration comprises:

adding the target taint mark to the first privacy computing node when, within the first preset duration following the first event, no event is observed in which the state of the first privacy computing node changes to normal.

4. The method of claim 1, wherein after adding a target taint mark to the first privacy computing node, the method further comprises:

acquiring third state information of the first privacy computing node, wherein the third state information indicates that the node state is normal;

and removing the target taint mark when it is determined, based on the third state information, that the duration of the normal node state reaches a second preset duration.

5. The method of claim 4, wherein the third state information further includes a second timestamp;

wherein removing the target taint mark when it is determined, based on the third state information, that the duration of the normal node state reaches a second preset duration comprises:

acquiring fourth state information of the first privacy computing node after the second preset duration has elapsed since the second timestamp;

and removing the target taint mark when the fourth state information still indicates that the node state is normal.

6. The method of claim 4, wherein acquiring third state information of the first privacy computing node comprises:

listening for a second event in which the state of the first privacy computing node changes, wherein the information of the second event comprises the third state information;

wherein removing the target taint mark when it is determined, based on the third state information, that the duration of the normal node state reaches the second preset duration comprises:

removing the target taint mark when, within the second preset duration following the second event, no event is observed in which the state of the first privacy computing node changes to abnormal.

7. The method of claim 1, wherein the Kubernetes cluster includes a plurality of computing nodes; before acquiring first state information of a first privacy computing node in a Kubernetes cluster, the method further comprises:

filtering out, from the plurality of computing nodes, a plurality of privacy computing nodes carrying a privacy computing label; the plurality of privacy computing nodes belong to different namespaces corresponding to different organizations, and comprise the first privacy computing node and the second privacy computing node.

8. The method of claim 1, wherein the method is performed by a privacy computing node controller; and wherein, after scheduling a pod served on the first privacy computing node to a second privacy computing node in the Kubernetes cluster that is located under the first namespace, the method further comprises:

deleting the first privacy computing node from the Kubernetes cluster in response to a call to a target API provided by the privacy computing node controller.

9. The method of claim 1, wherein the first state information further comprises a first timestamp; after scheduling a pod served on the first privacy computing node to a second privacy computing node in the Kubernetes cluster that is located under the first namespace, the method further comprises:

acquiring fifth state information of the first privacy computing node after a third preset duration has elapsed since the first timestamp;

deleting the first privacy computing node from the Kubernetes cluster if the fifth state information indicates that the node state is abnormal.

10. An apparatus for managing privacy computing nodes, comprising:

a state acquisition unit, configured to acquire first state information of a first privacy computing node in a Kubernetes cluster, wherein the first state information indicates that the node state is abnormal, and the first privacy computing node belongs to a first namespace corresponding to a first organization;

a taint adding unit, configured to add a target taint mark to the first privacy computing node when it is determined, based on the first state information, that the duration of the abnormal node state reaches a first preset duration, so that, in response to the addition of the target taint mark, a pod served on the first privacy computing node is scheduled to a second privacy computing node located under the first namespace in the Kubernetes cluster; the first preset duration is set for the first organization; the second privacy computing node is in a normal state and does not have the target taint mark.

11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any one of claims 1-9.

12. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-9.