HK40089547A - Method, apparatus, device, and storage medium for determining number of copies - Google Patents
Method, apparatus, device, and storage medium for determining number of copies
- Publication number
- HK40089547A
- Authority
- HK
- Hong Kong
- Prior art keywords
- target
- cluster
- replicas
- replica
- node
Description
Technical Field
Embodiments of this application relate to the field of data processing technology, and in particular to a method, apparatus, device, and storage medium for determining a number of replicas.
Background
With the rapid development of technologies such as cloud computing and distributed storage, Kubernetes cluster architectures have become a major research direction in Internet communication technology because of their low cost, high efficiency, and lightweight testing. Kubernetes, as a container cluster management tool, is mainly used to manage compute-intensive, large-scale task applications in cloud computing environments. To provide the computing resources that a container cluster requires, host nodes usually have to be added to the cluster. Because a single cluster can manage only a limited number of compute nodes, new clusters must be added continually so that, when one cluster fails, workloads can be switched quickly to a healthy cluster. This keeps the container clusters highly available and meets their resource requirements.
However, the cluster to deploy to is currently chosen by the application itself, and the application does not know the actual resource usage of each cluster before the multi-cluster scheduler performs scheduling. In practice, some clusters may have ample remaining resources while others are short of resources. If an application chooses a resource-constrained cluster and requests too many replicas there, that cluster lacks the resources to create them and the replicas cannot be produced. The resource requirements of the container cluster are then not met, and the resource load across clusters becomes unbalanced.
Summary of the Invention
Embodiments of this application provide a method, apparatus, device, and storage medium for determining a number of replicas. The total number of first target replicas of a first cluster is obtained from the first target replica quantity of each first target node selected from the first cluster. The total number of first target replicas that the first cluster can support is therefore known accurately before the multi-cluster scheduler performs scheduling, which prevents the multi-cluster scheduler from assigning more replicas than that total to the first cluster for production and thereby balances the resource load across clusters.
In one aspect, an embodiment of this application provides a method for determining a number of replicas, including:
obtaining replica node configuration information and replica resource configuration information of a first cluster, the first cluster including several nodes;
filtering the several nodes of the first cluster according to the replica node configuration information to obtain at least one first target node;
obtaining the used resource amount and the total resource amount of the smallest cluster scheduling units in each first target node;
calculating the remaining resource amount of each first target node from the used resource amount and the total resource amount;
calculating, according to a quantity calculation strategy, on the remaining resource amount and each resource amount in the replica resource configuration information to obtain the first target replica quantity of each first target node;
summing the first target replica quantities of the first target nodes to obtain the total number of first target replicas of the first cluster.
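The calculation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation claimed by the application: the `NodeResources` type, the resource names, and the function names are hypothetical, and the quantity calculation strategy is assumed to be "integer division per resource, then take the minimum", in line with the ratio-and-minimum design described later in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeResources:
    """Used and total resources of the pods on one first target node (hypothetical model)."""
    used: Dict[str, int]   # e.g. {"cpu_millicores": 1500, "memory_mib": 2048}
    total: Dict[str, int]


def replicas_for_node(node: NodeResources, request: Dict[str, int]) -> int:
    """Largest replica count that fits on the node for every requested resource."""
    counts = []
    for name, per_replica in request.items():
        if per_replica <= 0:
            continue  # assumption: a zero request does not constrain the node
        remaining = node.total.get(name, 0) - node.used.get(name, 0)
        counts.append(max(remaining, 0) // per_replica)
    # Assumption: the scarcest resource decides; no constraint at all yields 0.
    return min(counts) if counts else 0


def replicas_for_cluster(nodes: List[NodeResources], request: Dict[str, int]) -> int:
    """Sum the per-node replica quantities to get the cluster total."""
    return sum(replicas_for_node(node, request) for node in nodes)


# Replica resource configuration: each replica needs 0.5 CPU core and 1 GiB of memory.
request = {"cpu_millicores": 500, "memory_mib": 1024}
node_a = NodeResources(used={"cpu_millicores": 1500, "memory_mib": 2048},
                       total={"cpu_millicores": 4000, "memory_mib": 8192})
node_b = NodeResources(used={"cpu_millicores": 1800, "memory_mib": 1024},
                       total={"cpu_millicores": 2000, "memory_mib": 4096})

print(replicas_for_node(node_a, request))               # CPU allows 5, memory allows 6
print(replicas_for_cluster([node_a, node_b], request))  # node_b has no CPU left for a replica
```

Taking the minimum across resources reflects that a replica needs all of its requested resources at once, so the scarcest resource on a node bounds how many replicas that node can host.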
In another aspect, this application provides an apparatus for determining a number of replicas, including:
an obtaining unit, configured to obtain replica node configuration information and replica resource configuration information of a first cluster, the first cluster including several nodes;
a processing unit, configured to filter the several nodes of the first cluster according to the replica node configuration information to obtain at least one first target node;
the obtaining unit being further configured to obtain the used resource amount and the total resource amount of the smallest cluster scheduling units in each first target node;
the processing unit being further configured to calculate the remaining resource amount of each first target node from the used resource amount and the total resource amount;
the processing unit being further configured to calculate, according to a quantity calculation strategy, on the remaining resource amount and each resource amount in the replica resource configuration information to obtain the first target replica quantity of each first target node;
the processing unit being further configured to sum the first target replica quantities of the first target nodes to obtain the total number of first target replicas of the first cluster.
In one possible design, in an implementation of another aspect of the embodiments of this application, the processing unit may specifically be configured to:
select first candidate nodes from the several nodes of the first cluster according to selector configuration data;
select second candidate nodes from the first candidate nodes according to affinity configuration data;
select third candidate nodes from the second candidate nodes according to taint and toleration configuration data;
use the nodes among the third candidate nodes that satisfy a scheduling condition as the first target nodes.
In one possible design, in an implementation of another aspect of the embodiments of this application, the processing unit may specifically be configured to:
obtain node status information of the third candidate nodes;
if the node status information indicates a runnable and schedulable state, use the third candidate node as a first target node.
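The staged filtering described in the two designs above can be sketched as follows. This is a deliberately simplified model for illustration: the `Node` and `ReplicaNodeConfig` types are hypothetical, affinity is reduced to "label value must be in an allowed set", and taints are reduced to bare keys. Real Kubernetes affinity rules and taints are richer (operators, effects), and none of this is confirmed as the application's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    taints: Set[str] = field(default_factory=set)  # simplified: taint keys only
    ready: bool = True         # "runnable" in the text above
    schedulable: bool = True


@dataclass
class ReplicaNodeConfig:
    selector: Dict[str, str] = field(default_factory=dict)       # label must equal value
    affinity: Dict[str, Set[str]] = field(default_factory=dict)  # label must be in set
    tolerations: Set[str] = field(default_factory=set)           # tolerated taint keys


def select_target_nodes(nodes: List[Node], cfg: ReplicaNodeConfig) -> List[Node]:
    # Stage 1: selector configuration data -> first candidate nodes.
    first = [n for n in nodes
             if all(n.labels.get(k) == v for k, v in cfg.selector.items())]
    # Stage 2: affinity configuration data -> second candidate nodes.
    second = [n for n in first
              if all(n.labels.get(k) in allowed for k, allowed in cfg.affinity.items())]
    # Stage 3: every taint on the node must be tolerated -> third candidate nodes.
    third = [n for n in second if n.taints <= cfg.tolerations]
    # Stage 4: keep only nodes that are runnable and schedulable.
    return [n for n in third if n.ready and n.schedulable]


nodes = [
    Node("n1", {"zone": "a", "disk": "ssd"}),
    Node("n2", {"zone": "b", "disk": "ssd"}),                        # fails affinity
    Node("n3", {"zone": "a", "disk": "hdd"}),                        # fails selector
    Node("n4", {"zone": "a", "disk": "ssd"}, taints={"dedicated"}),  # untolerated taint
    Node("n5", {"zone": "a", "disk": "ssd"}, schedulable=False),     # not schedulable
]
cfg = ReplicaNodeConfig(selector={"disk": "ssd"},
                        affinity={"zone": {"a"}},
                        tolerations={"gpu"})
targets = select_target_nodes(nodes, cfg)
print([n.name for n in targets])
```

Running the stages in this order means each later check only sees nodes that passed the earlier ones, which mirrors the first, second, and third candidate sets in the text.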
In one possible design, in an implementation of another aspect of the embodiments of this application,
the obtaining unit is further configured to obtain the number of running smallest cluster scheduling units and the total number of smallest cluster scheduling units in each first target node;
the processing unit may specifically be configured to calculate, according to the quantity calculation strategy, on the remaining resource amount, each resource amount in the replica resource configuration information, the running number, and the total number to obtain the first target replica quantity of each first target node.
In one possible design, in an implementation of another aspect of the embodiments of this application, the processing unit may specifically be configured to:
calculate the ratio between the remaining resource amount and each resource amount in the replica resource configuration information to obtain at least one first replica quantity;
calculate the ratio between the running number and the total number to obtain a second replica quantity;
select the smallest value among the at least one first replica quantity and the second replica quantity as the first target replica quantity.
In one possible design, in an implementation of another aspect of the embodiments of this application,
the obtaining unit is further configured to obtain the total number of second target replicas of a second cluster among at least two clusters;
the processing unit is further configured to calculate the ratio between the total number of first target replicas and the total number of second target replicas to obtain replica allocation weights for the at least two clusters.
In one possible design, in an implementation of another aspect of the embodiments of this application,
a receiving unit is configured to receive a cluster scheduling request, the cluster scheduling request carrying the total number of replicas to be produced;
the processing unit is further configured to divide, according to the replica allocation weights, the total number of replicas to be produced into a first number of replicas to be produced by the first cluster and a second number of replicas to be produced by the second cluster;
the processing unit is further configured to assign the first number and the second number of replicas to be produced to the first cluster and the second cluster, respectively, for replica production.
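The weighting and division described in the two designs above can be sketched as follows. The function name `split_replicas` and the cluster names are hypothetical. The text does not say how fractional shares are rounded, so this sketch assumes a largest-remainder rule so that the parts always add up to the requested total. It also does not cap a share at the cluster's own replica total, which a real scheduler would presumably need to do.

```python
from typing import Dict


def split_replicas(total_pending: int, capacity: Dict[str, int]) -> Dict[str, int]:
    """Divide total_pending replicas across clusters in proportion to each
    cluster's target replica total (used here as the allocation weight)."""
    total_capacity = sum(capacity.values())
    if total_capacity == 0:
        return {name: 0 for name in capacity}
    # Integer part of each proportional share.
    shares = {name: total_pending * cap // total_capacity
              for name, cap in capacity.items()}
    leftover = total_pending - sum(shares.values())
    # Assumption: hand the remaining replicas to the largest fractional parts.
    by_remainder = sorted(
        capacity,
        key=lambda name: (total_pending * capacity[name]) % total_capacity,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# The first cluster can host 30 more replicas and the second 20, a weight ratio of 3:2.
capacity = {"cluster-1": 30, "cluster-2": 20}
print(split_replicas(10, capacity))
print(split_replicas(7, capacity))
```

With a 3:2 ratio, 10 replicas divide evenly into 6 and 4. For 7 replicas the exact shares are 4.2 and 2.8, so the second cluster receives the one leftover replica.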
In another aspect, this application provides a computer device, including a memory, a transceiver, a processor, and a bus system;
the memory is configured to store a program;
the processor is configured to implement the methods of the foregoing aspects when executing the program in the memory;
the bus system is configured to connect the memory and the processor so that the memory and the processor can communicate.
In another aspect, this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods of the foregoing aspects.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
By obtaining the replica node configuration information and the replica resource configuration information of the first cluster, at least one first target node can be selected from the several nodes of the first cluster according to the replica node configuration information. The used resource amount and the total resource amount of the smallest cluster scheduling units in each first target node can then be obtained, and the remaining resource amount of each first target node calculated from them. Next, according to the quantity calculation strategy, the remaining resource amount and each resource amount in the replica resource configuration information are used to calculate the first target replica quantity of each first target node, and these quantities are summed to obtain the total number of first target replicas of the first cluster. In this way, the total number of first target replicas that the first cluster can support is known accurately before the multi-cluster scheduler performs scheduling, which prevents the multi-cluster scheduler from assigning more replicas than that total to the first cluster for production and thereby balances the resource load across clusters.
Brief Description of the Drawings
Figure 1 is a schematic architecture diagram of a replica data control system in an embodiment of this application;
Figure 2 is a flowchart of one embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 3 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 4 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 5 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 6 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 7 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 8 is a flowchart of another embodiment of the method for determining the number of replicas in an embodiment of this application;
Figure 9 is a flowchart illustrating the principle of the method for determining the number of replicas in an embodiment of this application;
Figure 10 is a schematic diagram of one embodiment of the apparatus for determining the number of replicas in an embodiment of this application;
Figure 11 is a schematic diagram of one embodiment of the computer device in an embodiment of this application.
Detailed Description
Embodiments of this application provide a method, apparatus, device, and storage medium for determining a number of replicas. The total number of first target replicas of a first cluster is obtained from the first target replica quantity of each first target node selected from the first cluster. The total number of first target replicas that the first cluster can support is therefore known accurately before the multi-cluster scheduler performs scheduling, which prevents the multi-cluster scheduler from assigning more replicas than that total to the first cluster for production and thereby balances the resource load across clusters.
The terms "first", "second", "third", "fourth", and the like (if present) in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of this application described herein can, for example, be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "corresponding to", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
With the rapid development of information technology, cloud technology has gradually entered every aspect of daily life. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, and application technology that are applied on the basis of the cloud computing business model. These technologies can form resource pools that are used on demand, flexibly and conveniently. Cloud computing technology will become an important foundation. The backend services of technical network systems, such as video websites, image websites, and portal websites, require large amounts of computing and storage resources. As the Internet industry develops and its applications spread, every item may in the future carry its own identifier that must be transmitted to a backend system for logical processing. Data at different levels will be processed separately, and all kinds of industry data will need strong system support, which can only be provided through cloud computing.
Cloud security is a general term for the security software, hardware, users, institutions, and secure cloud platforms that are applied on the basis of the cloud computing business model. Cloud security combines emerging technologies and concepts such as parallel processing, grid computing, and behavior-based detection of unknown viruses. A large mesh of clients monitors software behavior on the network for anomalies, collects the latest information about Trojans and malicious programs on the Internet, and sends it to the server for automatic analysis and processing. The server then distributes the solutions for the viruses and Trojans to every client. The method for determining the number of replicas provided in the embodiments of this application can be implemented using cloud computing technology and cloud security technology.
Some of the concepts involved in the embodiments of this application are introduced below.
1. Workload
A workload is a type of application that can contain multiple replica instances.
2. Replica
A replica is the instance unit of a workload; each replica instance is an independent container.
3. Kubernetes
Kubernetes is a production-grade, large-scale container orchestration and scheduling system used for the automated deployment, scaling, and operation and maintenance of container clusters.
4. Container
A container generally refers to a Docker container. Container isolation decouples a service from the host machine, so the service does not depend on the host to run and the two do not affect each other. Docker containers are very lightweight. Kubernetes is responsible for managing all the Docker containers in a service, including creating, running, restarting, and deleting them.
5. Pod
A Pod is a set of containers and is the smallest unit of cluster scheduling. One Pod represents one replica of a workload.
It should be understood that the method for determining the number of replicas provided in this application can be applied in fields such as cloud technology, artificial intelligence, and intelligent transportation, in scenarios such as maintaining cluster load balance through replica scheduling. As one example, determining the number of replicas of each cluster completes one round of scheduling of the clusters' workloads. As another example, determining the number of replicas of each cluster helps the cluster scheduler know the maximum number of replicas that each cluster can produce. As yet another example, determining the number of replicas of each cluster allows a corresponding replica production task to be assigned to each cluster.
To solve the above problem, this application proposes a method for determining the number of replicas. The method is applied to the replica data control system shown in Figure 1, which is a schematic architecture diagram of the replica data control system in an embodiment of this application. As shown in Figure 1, the server obtains the replica node configuration information and the replica resource configuration information of the first cluster and selects at least one first target node from the several nodes of the first cluster according to the replica node configuration information. The server obtains the used resource amount and the total resource amount of the smallest cluster scheduling units in each first target node and calculates the remaining resource amount of each first target node from them. Then, according to the quantity calculation strategy, the remaining resource amount and each resource amount in the replica resource configuration information are used to calculate the first target replica quantity of each first target node, and these quantities are summed to obtain the total number of first target replicas of the first cluster. In this way, the total number of first target replicas that the first cluster can support is known accurately before the multi-cluster scheduler performs scheduling, which prevents the multi-cluster scheduler from assigning more replicas than that total to the first cluster for production and thereby balances the resource load across clusters.
It can be understood that Figure 1 shows only one type of terminal device. In an actual scenario, more types of terminal devices may take part in the data processing, including but not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, and in-vehicle terminals. The specific number and types depend on the actual scenario and are not limited here. In addition, Figure 1 shows one server, but in an actual scenario multiple servers may take part, especially in scenarios involving multi-model training interaction. The number of servers depends on the actual scenario and is not limited here.
It should be noted that in this embodiment the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, and they may be connected to form a blockchain network, which is not limited in this application.
To solve the above problem, this application proposes a method for determining the number of replicas. The method is generally executed by a server or a terminal device, and accordingly the apparatus for determining the number of replicas is generally provided in the server or the terminal device.
It can be understood that, in the method, apparatus, device, and storage medium for determining the number of replicas disclosed in this application, multiple servers or terminal devices may form a blockchain, with each server or terminal device being a node on the blockchain. In practical applications, data may need to be shared between nodes in the blockchain, and each node may store replica data, node data, and the like.
下面将对本申请中副本数量的确定方法进行介绍,请参阅图2,本申请实施例中副本数量的确定方法一个实施例包括:The method for determining the number of copies in this application will be described below. Please refer to Figure 2. One embodiment of the method for determining the number of copies in this application includes:
在步骤S101中,获取第一集群的副本节点配置信息以及副本资源配置信息,第一集群包括若干个节点;In step S101, the replica node configuration information and replica resource configuration information of the first cluster are obtained. The first cluster includes several nodes.
在本实施例中,当需要调度集群进行副本生产时,可以先获取预调度集群即第一集群的副本节点配置信息以及副本资源配置信息,以使后续可以根据副本节点配置信息以及副本资源配置信息进行资源调度,以维护集群的负载均衡。In this embodiment, when it is necessary to schedule the cluster to produce replicas, the replica node configuration information and replica resource configuration information of the pre-scheduled cluster, i.e. the first cluster, can be obtained first, so that subsequent resource scheduling can be performed based on the replica node configuration information and replica resource configuration information to maintain the load balance of the cluster.
其中,第一集群指的是包含有若干个节点的集群,第一集群具体可以表现为运行在容器集群管理工具中的集群。容器集群管理工具具体可以表现为Kubernetes,用于容器集群的自动化部署、扩容以及管理,还可以采用其他集群管理工具,此处不作具体限制。副本节点配置信息指的是对第一集群中的节点的调度需求,具体可以表现为副本节点的选择器需求、亲和性需求以及污点和容忍度需求等,还可以表现为其他节点需求,此处不作具体限制。副本资源配置信息指的是第一集群中的节点需要生产的副本的资源需求,具体可以表现为CPU核心数需求、内存大小需求以及扩展资源需求等,还可以表现为其他资源需求,此处不作具体限制。The first cluster refers to a cluster containing several nodes, which can specifically be a cluster running within a container cluster management tool. This tool can be Kubernetes, used for automated deployment, scaling, and management of container clusters; other cluster management tools can also be used, without specific limitations. Replica node configuration information refers to the scheduling requirements for nodes in the first cluster, specifically including selector requirements, affinity requirements, taint and tolerance requirements, etc., or other node requirements, without specific limitations. Replica resource configuration information refers to the resource requirements for replicas produced by the nodes in the first cluster, specifically including CPU core count requirements, memory size requirements, and expansion resource requirements, or other resource requirements, without specific limitations.
具体地,如图9所示,目标对象可以预先通过终端设备将目标对象预调度使用的集群即第一集群,以及预调度集群进行工作负载时,对预调度集群的副本节点需求即副本节点配置信息和节点生产副本的预消耗的副本资源需求即副本资源配置信息上传至平台数据库,当目标对象需要调度集群进行副本生产时,可以根据目标对象的对象标识,从数据库中获取到与对象标识相对应的预调度集群以及预调度集群相对应的副本节点配置信息和副本资源配置信息,其中,对象标识对象身份标识码(identity,ID),用于指示目标对象,可以具体表现为整数(int)型的数字串,也可以具体表现为字符串等,可以理解的是,在本实施例以及后续的实施例中目标对象均指的是使用的终端设备的用户;或者,还可以是当目标对象需要调度集群进行副本生产时,可以通过终端设备上安装的客户端执行集群调度操作,则客户端可以接收目标对象对想要调度的目标集群的选择操作,并根据目标对象的选择操作生成集群的副本估算指令的输入界面,以获取到目标对象输入的集群的副本估算指令,如图9所示,使得服务器可以根据接收到的集群的副本估算指令中的集群标识或集群名称索引至目标集群即第一集群,具体可以是在若干个集群中遍历与所述集群标识或者集群名称一致的集群,还可以是通过其他方式,此处不作具体限制,并从数据库中读取与该集群标识或集群名称相对应的副本节点配置信息以及副本资源配置信息,还可以是通过其他方式获取副本节点配置信息以及副本资源配置信息,此处不作具体限制。Specifically, as shown in Figure 9, the target object can pre-upload the cluster it pre-schedules to use (i.e., the first cluster), the replica node requirements (i.e., replica node configuration information), and the replica resource requirements (i.e., replica resource configuration information) for the node to produce replicas to the platform database via the terminal device. When the target object needs to schedule the cluster for replica production, it can retrieve the pre-scheduled cluster, the corresponding replica node configuration information, and the replica resource configuration information from the database based on the target object's object identifier. The object identifier (identity ID) indicates the target object and can be represented as an integer (int) string or a string, etc. It is understood that in this embodiment and subsequent embodiments, the target object refers to the user of the terminal device used; or Alternatively, when the target object needs to schedule a cluster for replica production, the cluster scheduling operation can be performed through a client installed on the terminal device. 
The client can receive the target object's selection operation for the target cluster to be scheduled, and generate an input interface for the cluster replica estimation instruction based on the target object's selection operation, so as to obtain the cluster replica estimation instruction input by the target object, as shown in Figure 9. This allows the server to index to the target cluster, i.e., the first cluster, based on the cluster identifier or cluster name in the received cluster replica estimation instruction. Specifically, this can be done by traversing several clusters that are consistent with the cluster identifier or cluster name, or by other methods, which are not specifically limited here. The server can also read the replica node configuration information and replica resource configuration information corresponding to the cluster identifier or cluster name from the database, or obtain the replica node configuration information and replica resource configuration information by other methods, which are not specifically limited here.
It is understood that when the server receives the replica node configuration information and replica resource configuration information uploaded by the target object, it may convert them into a preset data storage format so that they can be read quickly later.
In step S102, several nodes of the first cluster are filtered according to the replica node configuration information to obtain at least one first target node;
In this embodiment, after the replica node configuration information of the first cluster is obtained, at least one first target node that meets the requirements in that information can be selected from the nodes of the first cluster, so that the first target node can subsequently be scheduled for replica production.
Specifically, as shown in Figure 9, after the replica node configuration information of the first cluster is obtained, the nodes of the first cluster can be pre-selected according to it. The selector requirements, affinity requirements, and taint and toleration requirements contained in the replica node configuration information are each matched against the node labels of every node. A node that satisfies all of these requirements is a node in the first cluster that can be scheduled for replica production, that is, a first target node. A node that does not fully satisfy them cannot be scheduled for replica production and may be filtered out or ignored, or handled in another way, which is not specifically limited here. A node label (Label) is an identifying attribute of a Kubernetes object that is meaningful to the target object; it can also be used to describe metadata or to filter results in command queries. A node label may take the form of a key-value pair attached to a Kubernetes object, for example `kubectl label nodes <node-name> <label-key>=<label-value>`, or another label form, which is not specifically limited here.
In step S103, the used resource amount and the total resource amount of the minimum cluster scheduling unit in each first target node are obtained;
In this embodiment, after at least one first target node in the first cluster is obtained, the used resource amount and the total resource amount of the minimum cluster scheduling unit in each first target node can be obtained, so that the maximum number of replicas each first target node can produce can subsequently be estimated from them.
Here, each first target node may contain at least one minimum cluster scheduling unit. A minimum cluster scheduling unit is a Pod, that is, a group of containers; one Pod can represent one replica of a workload. The used resource amount of the minimum cluster scheduling unit refers to the resource requirement of every Pod currently running on the first target node. The total resource amount refers to the pre-configured total resource specification of the first target node.
It is understood that a Pod can serve as the unit of vertical application integration, supporting co-located, co-managed programs. For example, Pods can be used for content management systems, file and data loaders, and local caches; for log and checkpoint backup, compression, rotation, and snapshotting; for data change watchers, log tailers, logging and monitoring adapters, and event publishers; as proxies, bridges, or adapters; or for controlling, managing, configuring, or updating applications. Pods may also have other uses, which are not specifically limited here. In general, an individual Pod does not run multiple instances of the same application.
Specifically, as shown in Figure 9, after at least one first target node in the first cluster is obtained, a thread corresponding to each first target node can be invoked. The thread first obtains the minimum cluster scheduling units (Pods) of that first target node and identifies those that are already running, denoted p1, p2, ..., pn. It then obtains the resource requirement of each running Pod, that is, the used resource amount of each minimum cluster scheduling unit, together with the resource specification of the node, that is, the total resource amount of the first target node.
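The per-node collection described above can be sketched as follows. This is a minimal illustration, not the method's actual implementation: the `PODS` records, the field names `node`, `phase`, and `requests`, and the helper names are hypothetical, and a real system would query the cluster rather than an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory pod records (assumption); a real system would query the cluster.
PODS = [
    {"name": "p1", "node": "node-a", "phase": "Running",
     "requests": {"cpu": 3, "memory_gb": 5}},
    {"name": "p2", "node": "node-a", "phase": "Running",
     "requests": {"cpu": 1, "memory_gb": 6, "gpu": 4}},
    {"name": "p3", "node": "node-a", "phase": "Pending",
     "requests": {"cpu": 2, "memory_gb": 2}},
    {"name": "p4", "node": "node-b", "phase": "Running",
     "requests": {"cpu": 2, "memory_gb": 4}},
]


def running_pod_requests(node_name, pods=PODS):
    """Return the resource requirements of the Pods already running on one node."""
    return [p["requests"] for p in pods
            if p["node"] == node_name and p["phase"] == "Running"]


def collect_usage(node_names):
    """Gather per-node usage with one worker thread per first target node."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_names))) as pool:
        results = list(pool.map(running_pod_requests, node_names))
    return dict(zip(node_names, results))


usage = collect_usage(["node-a", "node-b"])
print(len(usage["node-a"]))  # number of running Pods found on node-a
```

Only Pods in the running state are counted, so the pending Pod on node-a does not contribute to the used resource amount.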
For example, suppose the total resource amount of a first target node A in the first cluster is 8 CPU cores, 16 GB of memory, 8 extended resources (such as GPUs), and a maximum of 4 supported Pods.
For example, suppose a minimum cluster scheduling unit Pod A1 on first target node A uses 3 CPU cores and 5 GB of memory, and another minimum cluster scheduling unit Pod A2 on first target node A uses 1 CPU core, 6 GB of memory, and 4 extended resources (such as GPUs).
In step S104, the remaining resource amount of each first target node is calculated from the used resource amount and the total resource amount;
In this embodiment, after the used resource amount and the total resource amount of the minimum cluster scheduling unit in each first target node are obtained, the remaining resource amount of each first target node can be calculated from them, so that the first target replica quantity of each first target node can subsequently be estimated from its remaining resource amount.
Specifically, as shown in Figure 9, after the used resource amount and the total resource amount of the minimum cluster scheduling unit in each first target node are obtained, the thread corresponding to each first target node can be invoked to compute their difference. Using the following formula (1), the resource requirement of each Pod, that is, the used resource amount of each minimum cluster scheduling unit, is subtracted item by item from the total resource amount of the first target node to obtain the remaining resource amount of that node:
rl = rs - (rp1 + rp2 + ... + rpn) (1);
Where rs denotes the total resource amount of each first target node, rp1 to rpn denote the used resource amounts of the minimum cluster scheduling units on that node, rl denotes the remaining resource amount of each first target node, and n is the number of Pods on each first target node.
For example, suppose the total resource amount of first target node A in the first cluster is 8 CPU cores, 16 GB of memory, 8 extended resources (such as GPUs), and a maximum of 4 supported Pods; Pod A1 on node A uses 3 CPU cores and 5 GB of memory; and Pod A2 on node A uses 1 CPU core, 6 GB of memory, and 4 extended resources (such as GPUs). Formula (1) then gives the remaining resource amount of first target node A as 4 CPU cores, 5 GB of memory, 4 extended resources (such as GPUs), and 2 supported Pods.
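The node A example can be reproduced with a short sketch of the item-by-item subtraction. The `Resources` class and its field names are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Illustrative resource vector (assumed field names)."""
    cpu: int = 0        # CPU cores
    memory_gb: int = 0  # memory in GB
    gpu: int = 0        # extended resources, e.g. GPUs

    def __sub__(self, other):
        return Resources(self.cpu - other.cpu,
                         self.memory_gb - other.memory_gb,
                         self.gpu - other.gpu)


def remaining_resources(total, running_pods):
    """Subtract each running Pod's usage from the node total, item by item."""
    remaining = total
    for used in running_pods:
        remaining = remaining - used
    return remaining


node_a_total = Resources(cpu=8, memory_gb=16, gpu=8)
pod_a1 = Resources(cpu=3, memory_gb=5)
pod_a2 = Resources(cpu=1, memory_gb=6, gpu=4)

node_a_remaining = remaining_resources(node_a_total, [pod_a1, pod_a2])
print(node_a_remaining)  # Resources(cpu=4, memory_gb=5, gpu=4)
```

The remaining Pod capacity (4 supported Pods minus 2 running Pods) is a count rather than a resource vector, so it is left out of this sketch.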
In step S105, the remaining resource amount and each resource amount in the replica resource configuration information are evaluated according to a quantity calculation strategy to obtain the first target replica quantity of each first target node;
In this embodiment, after the remaining resource amount of each first target node is obtained, the remaining resource amount and each resource amount in the replica resource configuration information can be evaluated according to the quantity calculation strategy to obtain the first target replica quantity of each first target node.
Here, the quantity calculation strategy may take the form of a quantity calculation rule, a constraint, a function expression, or a quantity calculation formula, or another calculation strategy, which is not specifically limited here.
Specifically, as shown in Figure 9, after the replica resource configuration information and the remaining resource amount of each first target node are obtained, the ratio between the remaining resource amount and each resource amount in the replica resource configuration information can be calculated according to the quantity calculation strategy, for example the following formula (2), to obtain the maximum number of replicas that the remaining resources of the first target node can support, that is, the first target replica quantity:
Where a1 denotes the first target replica quantity of each first target node, rd denotes one resource amount in the replica resource configuration information, and rl denotes the remaining resource amount of each first target node.
For example, suppose the replica resource requirement of the input workload R, that is, the replica resource configuration information, is 2 CPU cores and 3 GB of memory, and the remaining resource amount of first target node A is 4 CPU cores, 5 GB of memory, 4 extended resources (such as GPUs), and 2 supported Pods. By formula (2), the ratio of the remaining CPU cores to the CPU cores required per replica is 4 / 2 = 2, and the ratio of the remaining memory to the memory required per replica is 5 / 3, which rounds down to 1. The smallest of these ratios, 1, is taken as the first target replica quantity.
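A minimal sketch of this ratio calculation, assuming (as in the workload R example) that each ratio is rounded down and the smallest ratio is taken. The function name and dictionary keys are hypothetical.

```python
def max_replicas(remaining, per_replica):
    """Largest replica count that fits every requested resource.

    remaining and per_replica map resource names to amounts; resources the
    replica does not request (or requests as 0) impose no limit.
    """
    counts = []
    for name, need in per_replica.items():
        if need <= 0:
            continue  # nothing requested for this resource
        counts.append(remaining.get(name, 0) // need)  # ratio, rounded down
    if not counts:
        raise ValueError("replica requests no resources; count is unbounded")
    return max(0, min(counts))


node_a_remaining = {"cpu": 4, "memory_gb": 5, "gpu": 4}
workload_r = {"cpu": 2, "memory_gb": 3}

a1 = max_replicas(node_a_remaining, workload_r)
print(a1)  # 1, because memory (5 // 3 = 1) is the tighter limit than CPU (4 // 2 = 2)
```

Taking the minimum across resources reflects that a replica needs all of its requested resources at once, so the scarcest resource determines how many replicas fit.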
In step S106, the first target replica quantities of the first target nodes are aggregated to obtain the total number of first target replicas of the first cluster.
In this embodiment, after the first target replica quantity of each first target node is obtained, these quantities can be aggregated to obtain the total number of first target replicas of the first cluster. The number of replicas that the first cluster can support is thus known accurately before the multi-cluster scheduler schedules, which helps keep the cluster resource load balanced.
Specifically, as shown in Figure 9, after the first target replica quantity of each first target node is obtained, these quantities can be aggregated to obtain the total number of first target replicas of the first cluster. The aggregation may be a simple sum of the first target replica quantities, or another aggregation method such as a weighted sum, which is not specifically limited here.
For example, suppose the first target replica quantity is 1 for first target node A, 2 for first target node B, and 1 for first target node C. Summing the first target replica quantities of the first target nodes gives a total of 4 first target replicas for the first cluster.
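The plain-sum aggregation in this example can be sketched as below; the function name is illustrative, and a weighted sum would be an alternative that the text leaves open.

```python
def cluster_replica_total(per_node_counts):
    """Sum the first target replica quantity over all first target nodes."""
    return sum(per_node_counts.values())


per_node = {"A": 1, "B": 2, "C": 1}
total = cluster_replica_total(per_node)
print(total)  # 4
```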
The embodiments of this application provide a method for determining the number of replicas. With the above approach, multiple first target nodes are selected from the first cluster according to the obtained replica node configuration information, and the first target replica quantity of each first target node is calculated from the replica resource configuration information, yielding the total number of first target replicas of the first cluster. The number of replicas that the first cluster can support is therefore known accurately before the multi-cluster scheduler schedules, which prevents the multi-cluster scheduler from assigning the first cluster more replicas than the total number of first target replicas and thereby achieves cluster resource load balancing.
Optionally, on the basis of the embodiment corresponding to Figure 2 above, in another optional embodiment of the method for determining the number of replicas provided in this application, as shown in Figure 3, the replica node configuration information includes selector configuration data, affinity configuration data, and taint and toleration configuration data;
Filtering several nodes of the first cluster according to the replica node configuration information to obtain at least one first target node includes:
In step S301, first candidate nodes are selected from the several nodes of the first cluster according to the selector configuration data;
In step S302, second candidate nodes are selected from the first candidate nodes according to the affinity configuration data;
In step S303, third candidate nodes are selected from the second candidate nodes according to the taint and toleration configuration data;
In step S304, the nodes among the third candidate nodes that satisfy the scheduling condition are taken as the first target nodes.
In this embodiment, after the replica node configuration information of the first cluster is obtained, first candidate nodes can be selected from the nodes of the first cluster according to the selector configuration data in that information; second candidate nodes can then be selected from the first candidate nodes according to the affinity configuration data; and third candidate nodes can be selected from the second candidate nodes according to the taint and toleration configuration data. The third candidate nodes that satisfy the scheduling condition are then taken as the first target nodes, so that the first target nodes can subsequently be scheduled for replica production.
Here, the selector configuration data is used to perform label selection through a node selector, so as to obtain the node labels that satisfy the node selector's expression. Label selection may be equality-based, set-based, or expressed through `matchLabels` or `matchExpressions`, or performed in another manner, which is not specifically limited here. Equality-based selection can use an expression such as "disktype=ssd&&disksize=big", where the comparison operator may be =, ==, or !=; = and == are equivalent, and multiple requirements (for example on several labels) can be combined with the && operator. Set-based selection supports the three operators in, notin, and exists. `matchLabels` is a map of {key, value} pairs; a single {key, value} entry in the `matchLabels` map is equivalent to an element of `matchExpressions` whose key field is "key", whose operator is "In", and whose values array contains only "value". `matchExpressions` is a list of Pod selector requirements; valid operators include In, NotIn, Exists, and DoesNotExist, and for In and NotIn the values set must be non-empty.
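As an illustration of how `matchLabels` and `matchExpressions` could be evaluated against a node's labels, here is a minimal sketch. It is not the Kubernetes implementation; the function names are hypothetical, and it assumes that NotIn and DoesNotExist also match when the key is absent.

```python
def matches_labels(labels, match_labels):
    """Every {key, value} pair in matchLabels must be present on the node."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def matches_expression(labels, expr):
    """Evaluate one matchExpressions element against the node labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def matches_selector(labels, selector):
    """A node matches when matchLabels and all matchExpressions hold."""
    return (matches_labels(labels, selector.get("matchLabels", {}))
            and all(matches_expression(labels, e)
                    for e in selector.get("matchExpressions", [])))


node_labels = {"disktype": "ssd", "disksize": "big"}
selector = {
    "matchLabels": {"disktype": "ssd"},
    "matchExpressions": [
        {"key": "disksize", "operator": "In", "values": ["big", "huge"]},
    ],
}
print(matches_selector(node_labels, selector))  # True
```

The equivalence stated in the text holds in this sketch: `{"disktype": "ssd"}` under `matchLabels` selects the same nodes as the expression with key `disktype`, operator `In`, and values `["ssd"]`.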
Here, the affinity configuration data is similar to the selector configuration data: it constrains which nodes a Pod can be scheduled to according to the labels on the nodes, and can therefore also be used for node filtering. Affinity can be set through the -Affinity field, for example nodeAffinity for node affinity, and anti-affinity through the -AntiAffinity field, for example nodeAntiAffinity. Affinity set expressions can then be configured with the operators supported by the node affinity syntax, such as In, NotIn, Exists, DoesNotExist, Gt, and Lt, to select the node labels that match the affinity set.
Here, the taint and toleration configuration data includes node taints (taint) and tolerations (Tolerations). A node taint repels a particular class of Pods, while a toleration indicates that a Pod can tolerate the taint. In other words, once a taint is added to a node, a Pod will not be scheduled onto that node unless the Pod declares that it tolerates the taint. The system will usually try to avoid scheduling a Pod onto a node that has a taint the Pod cannot tolerate, but this is not mandatory. Kubernetes processes multiple taints and tolerations like a filter: it starts from all of a node's taints and ignores those for which the Pod has a matching toleration; the taints that remain take effect on the Pod.
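The filter-like handling of taints and tolerations can be sketched as follows. This is a simplified illustration rather than the scheduler's code: the helper names are hypothetical, taints and tolerations are plain dictionaries, and only the NoSchedule and NoExecute effects are treated as blocking.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # effects assumed to block scheduling


def tolerates(toleration, taint):
    """Return True if one toleration matches one taint."""
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if not key:
        # An empty key matches every taint key, but only with operator Exists.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(node_taints, pod_tolerations):
    """Start from all taints, ignore tolerated ones, keep the blocking rest."""
    return [t for t in node_taints
            if t.get("effect") in HARD_EFFECTS
            and not any(tolerates(tol, t) for tol in pod_tolerations)]


node_taints = [
    {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"},
    {"key": "maintenance", "value": "true", "effect": "NoExecute"},
    {"key": "noisy", "value": "true", "effect": "PreferNoSchedule"},
]
pod_tolerations = [
    {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"},
]

blocking = untolerated_taints(node_taints, pod_tolerations)
print([t["key"] for t in blocking])  # ['maintenance']
```

A node is kept as a candidate only when the returned list is empty. The PreferNoSchedule taint is not treated as blocking here, matching the remark that avoiding such nodes is a preference rather than a requirement.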
Specifically, as shown in Figure 9, after the replica node configuration information of the first cluster is obtained, the nodes of the first cluster can be filtered layer by layer according to it to obtain at least one first target node. First, according to the replica's selector requirement for nodes, that is, the selector configuration data, the nodes whose labels satisfy the selector are matched from all nodes of the first cluster and taken as first candidate nodes. Next, according to the replica's affinity requirement for nodes, that is, the affinity configuration data, the nodes whose labels satisfy the affinity set expression are matched from the first candidate nodes and taken as second candidate nodes. Then, according to the replica's taint and toleration requirement, that is, the taint and toleration configuration data, the second candidate nodes are traversed and the nodes whose taints match the replica's tolerations are taken as third candidate nodes. Finally, the third candidate nodes that satisfy the scheduling condition, for example being runnable and schedulable, are taken as the first target nodes.
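The layer-by-layer filtering of steps S301 to S304 can be sketched as a chain of predicates applied in order. The node records, field names, and the deliberately simplified predicates below are illustrative assumptions only.

```python
def filter_first_target_nodes(nodes, stages):
    """Apply each filtering stage in order; survivors feed the next stage."""
    candidates = list(nodes)
    for stage in stages:
        candidates = [n for n in candidates if stage(n)]
    return candidates


# Hypothetical node records with simplified fields.
nodes = [
    {"name": "node-a", "labels": {"disktype": "ssd", "zone": "z1"}, "taints": [], "ready": True},
    {"name": "node-b", "labels": {"disktype": "hdd", "zone": "z1"}, "taints": [], "ready": True},
    {"name": "node-c", "labels": {"disktype": "ssd", "zone": "z2"}, "taints": [], "ready": True},
    {"name": "node-d", "labels": {"disktype": "ssd", "zone": "z1"}, "taints": ["dedicated"], "ready": True},
    {"name": "node-e", "labels": {"disktype": "ssd", "zone": "z1"}, "taints": [], "ready": False},
]

tolerated = set()  # the replica tolerates no taints in this example

stages = [
    lambda n: n["labels"].get("disktype") == "ssd",  # S301: selector
    lambda n: n["labels"].get("zone") in {"z1"},     # S302: affinity
    lambda n: set(n["taints"]) <= tolerated,         # S303: taints and tolerations
    lambda n: n["ready"],                            # S304: scheduling condition
]

targets = filter_first_target_nodes(nodes, stages)
print([n["name"] for n in targets])  # ['node-a']
```

Because each stage only sees the survivors of the previous one, the more expensive checks run on progressively fewer nodes.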
Optionally, on the basis of the embodiment corresponding to Figure 3 above, in another optional embodiment of the method for determining the number of replicas provided in this application, as shown in Figure 4, taking the nodes among the third candidate nodes that satisfy the scheduling condition as the first target nodes includes:
In step S401, the node status information of the third candidate nodes is obtained;
In step S402, if the node status information indicates a runnable and schedulable state, the third candidate node is taken as a first target node.
In this embodiment, after the third candidate nodes of the first cluster are obtained, the node status information of each of them can be obtained. If the node status information indicates a runnable and schedulable state, the third candidate node is taken as a first target node. The node status information thus allows the first target nodes that can meet the subsequent replica production needs to be selected quickly from the third candidate nodes, which improves cluster resource scheduling efficiency to a certain extent.
Here, the node status information refers to the current running, scheduling, and resource usage status of each node in the first cluster. It may take the form of a node status label or a node status list, or another node status marker, which is not specifically limited here.
Specifically, after the third candidate nodes of the first cluster are obtained, their node status information is obtained. This may be done by calling a monitoring interface to obtain the node status label monitored in real time for each third candidate node, by reading the node status data of each third candidate node from the node status list of the first cluster, or in another manner, which is not specifically limited here. Then, with the scheduling condition being a runnable and schedulable state, the nodes whose current node status information indicates that state are selected from the third candidate nodes as the first target nodes. This avoids subsequent replica production failures caused by a third candidate node that cannot run or cannot be scheduled, and so helps keep the cluster resource load balanced to a certain extent.
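One way the runnable and schedulable check might look is sketched below. It assumes the node status information is a dictionary shaped like a Kubernetes Node object, where a `Ready` condition with status `"True"` is read as runnable and `spec.unschedulable` marks a cordoned node; the text does not prescribe this representation.

```python
def is_runnable_and_schedulable(node):
    """Runnable: Ready condition is "True". Schedulable: node is not cordoned."""
    conditions = node.get("status", {}).get("conditions", [])
    ready = any(c.get("type") == "Ready" and c.get("status") == "True"
                for c in conditions)
    cordoned = node.get("spec", {}).get("unschedulable", False)
    return ready and not cordoned


# Hypothetical third candidate nodes.
third_candidates = [
    {"metadata": {"name": "node-a"}, "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"}, "spec": {"unschedulable": True},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-c"}, "spec": {},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]

first_targets = [n["metadata"]["name"] for n in third_candidates
                 if is_runnable_and_schedulable(n)]
print(first_targets)  # ['node-a']
```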
Optionally, on the basis of the embodiment corresponding to Figure 2 above, in another optional embodiment of the method for determining the number of replicas provided in this application, as shown in Figure 5, before the remaining resource amount and each resource amount in the replica resource configuration information are evaluated according to the quantity calculation strategy to obtain the first target replica quantity of each first target node, the method further includes step S501, and step S105 includes step S502;
In step S501, the running count and the total count of the minimum cluster scheduling unit in each first target node are obtained;
In step S502, the remaining resource amount, each resource amount in the replica resource configuration information, the running count, and the total count are evaluated according to the quantity calculation strategy to obtain the first target replica quantity of each first target node.
In this embodiment, after at least one first target node in the first cluster is obtained, the running count and the total count of the minimum cluster scheduling unit in each first target node can be obtained. The remaining resource amount, each resource amount in the replica resource configuration information, the running count, and the total count are then evaluated according to the quantity calculation strategy to obtain the first target replica quantity of each first target node. The running count and total count allow the maximum number of replicas each first target node can support to be estimated more closely, so that the total number of first target replicas of the first cluster is obtained more accurately, which improves the resource load balancing of the cluster to a certain extent.
Here, the running count of the minimum cluster scheduling unit in each first target node refers to the number of Pods already running on the first target node; for example, the running count for p1, p2, ..., pn is n. The total count of the minimum cluster scheduling unit in each first target node refers to the number of Pods already running on the first target node plus the number of Pods not yet running.
Specifically, after at least one first target node in the first cluster is obtained, the running count and the total count of the minimum cluster scheduling unit in each first target node can be obtained. The ratio between the remaining resource amount and each resource amount in the replica resource configuration information is then calculated according to the quantity calculation strategy, for example formula (2) above, and the running count and the total count are evaluated according to a quantity calculation strategy such as a difference or a ratio, or another calculation strategy, which is not specifically limited here. From the resulting values, the smallest one, for example, can be selected as the first target replica quantity of each first target node.
Optionally, on the basis of the embodiment corresponding to Figure 5 above, in another optional embodiment of the method for determining the number of replicas provided in this application, as shown in Figure 6, evaluating the remaining resource amount, each resource amount in the replica resource configuration information, the running count, and the total count according to the quantity calculation strategy to obtain the first target replica quantity of each first target node includes:
In step S601, the ratio between the remaining resource amount and each resource amount in the replica resource configuration information is calculated to obtain at least one first replica quantity;
In step S602, the ratio between the running count and the total count is calculated to obtain a second replica quantity;
In step S603, the smallest value among the at least one first replica quantity and the second replica quantity is selected as the first target replica quantity.
In this embodiment, after the running count and the total count of the minimum cluster scheduling unit in each first target node are obtained, the ratio between the remaining resource amount and each resource amount in the replica resource configuration information can be calculated according to the quantity calculation strategy to obtain at least one first replica quantity. Likewise, the ratio between the running count and the total count can be calculated to obtain a second replica quantity. The smallest value among the at least one first replica quantity and the second replica quantity is then selected as the first target replica quantity. Comparing across several dimensions (the remaining resource amount, each resource amount in the replica resource configuration information, the running count, and the total count) gives a more accurate value for the maximum number of replicas each first target node can support, that is, the first target replica quantity, and therefore a more accurate total number of first target replicas for the first cluster. In multi-cluster scheduling, pre-computing the maximum number of available replicas of the workload for each cluster, that is, the total number of first target replicas, prevents a cluster from being assigned so many replicas of the workload that it cannot run them. This improves scheduling precision and, to a certain extent, the resource load balancing of the cluster.
Specifically, after the remaining resource amount, the replica resource configuration information, and the running count and total count of the smallest cluster scheduling unit on each first target node are obtained, these values can be combined according to the quantity calculation strategy. Formula (2) above is used to calculate the ratio between the remaining resource amount and each resource amount in the replica resource configuration information, yielding at least one first replica quantity; this is not repeated here. Similarly, under the quantity calculation strategy, formula (3) below subtracts the number of Pods already running on the first target node (the running count) from the maximum number of Pods that the node supports (the total count), and the number of Pods that the node can still create is taken as the second replica quantity:
$$a_2 = m - n \qquad (3)$$
where $a_2$ is the second replica quantity of each first target node, $m$ is the total count of smallest cluster scheduling units that the node supports, and $n$ is the number of smallest cluster scheduling units already running on the node.
Further, once the at least one first replica quantity and the second replica quantity are obtained, they can be compared with one another according to formula (4) below, and the smallest value is taken as the first target replica quantity:
$$a_3 = \min(a_1, a_2) \qquad (4)$$
where $a_3$ is the first target replica quantity of each first target node, and $a_1$ denotes a first replica quantity.
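The per-node calculation described by formulas (3) and (4) can be illustrated with a short Python sketch. Formula (2) is not reproduced in this passage, so the sketch assumes that each first replica quantity is the floor of the remaining amount divided by the per-replica amount. The function name, the dictionary keys, the integer units (millicores, MiB), and the clamp to zero are illustrative assumptions, not details taken from the application.

```python
def first_target_replicas(remaining, per_replica, max_pods, running_pods):
    """Return a3, the number of replicas one node can still host.

    remaining    -- remaining amount per resource on the node (hypothetical keys)
    per_replica  -- amount of each resource that one replica requests
    max_pods     -- m, the total number of Pods the node supports
    running_pods -- n, the number of Pods already running on the node
    """
    # a1 candidates: one value per resource dimension.
    # Assumption: formula (2) is a floor division of remaining by requested.
    first_counts = [
        remaining[name] // need
        for name, need in per_replica.items()
        if need > 0
    ]
    # a2 = m - n, formula (3)
    second_count = max_pods - running_pods
    # a3 = min(a1, a2), formula (4); the clamp to zero is an added safeguard
    return max(0, min(first_counts + [second_count]))


node_remaining = {"cpu_millicores": 4000, "memory_mib": 8192}
replica_request = {"cpu_millicores": 500, "memory_mib": 2048}
print(first_target_replicas(node_remaining, replica_request, 110, 107))  # 3
```

In the example, CPU would allow 8 replicas and memory 4, but only 3 Pod slots remain, so the Pod count is the limiting dimension.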
Optionally, on the basis of the embodiment corresponding to Figure 2, in another optional embodiment of the method for determining the number of replicas provided by this application, as shown in Figure 7, when there are at least two clusters,
after the first target replica quantities of the first target nodes are summed to obtain the first target replica total of the first cluster, the method further includes:
In step S701, the second target replica total of a second cluster among the at least two clusters is obtained;
In step S702, the ratio between the first target replica total and the second target replica total is calculated to obtain the replica allocation weights of the at least two clusters.
In this embodiment, if at least two clusters currently exist, then after the first target replica total of the first cluster is obtained, the second target replica total of another cluster, such as the second cluster, can be obtained. The ratio between the first target replica total and the second target replica total is then calculated to obtain the replica allocation weights of the at least two clusters. In subsequent multi-cluster scheduling, these weights, derived from the pre-calculated target replica total of each cluster's workload, allow replicas to be distributed to the different clusters for production more precisely and reasonably, which helps maintain load balancing across the clusters to some extent.
Specifically, if at least two clusters currently exist, that is, there may be multiple pre-scheduled clusters, then after the first target replica total of the first cluster is obtained, the second target replica total of another cluster, such as the second cluster, can be obtained in the same way as the first target replica total is obtained in steps S101 to S106; this is not repeated here. It can be understood that "second cluster" is used generically and may stand for a third cluster, a fourth cluster, up to an n-th cluster. Likewise, a third target replica total of the third cluster, a fourth target replica total of the fourth cluster, and so on can be obtained. No specific limitation is imposed here.
Further, after the second target replica total of the second cluster is obtained, the ratio between the first target replica total and the second target replica total can be calculated to obtain the replica allocation weights of the at least two clusters, and these weights are stored in a database. For example, if the first target replica total of the first cluster is 4 and the second target replica total of the second cluster is 3, the ratio between them gives replica allocation weights of 4:3 for the two clusters.
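The 4:3 example can be reproduced with a small Python sketch. The text only states that the weights are the ratio of the target replica totals; reducing the ratio by the greatest common divisor, as done here, is an assumption made for readability, and the function name `allocation_weights` is hypothetical.

```python
from functools import reduce
from math import gcd


def allocation_weights(target_totals):
    """Express per-cluster target replica totals as a reduced integer ratio.

    target_totals -- list of target replica totals, one per cluster
    """
    # gcd(0, x) == x, so starting from 0 also handles an empty list.
    divisor = reduce(gcd, target_totals, 0)
    if divisor == 0:
        # All totals are zero (or the list is empty): nothing to reduce.
        return list(target_totals)
    return [total // divisor for total in target_totals]


print(allocation_weights([4, 3]))  # [4, 3]
print(allocation_weights([8, 6]))  # [4, 3]
```

Because the second cluster is described as standing in for any further cluster, the sketch accepts a list of any length rather than exactly two totals.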
Optionally, on the basis of the embodiment corresponding to Figure 7, in another optional embodiment of the method for determining the number of replicas provided by this application, as shown in Figure 8, after the ratio between the first target replica total and the second target replica total is calculated to obtain the replica allocation weights of the at least two clusters, the method further includes:
In step S801, a cluster scheduling request is received, where the cluster scheduling request carries the total number of replicas to be produced;
In step S802, according to the replica allocation weights, the total number of replicas to be produced is divided into a first to-be-produced replica quantity for the first cluster and a second to-be-produced replica quantity for the second cluster;
In step S803, the first to-be-produced replica quantity and the second to-be-produced replica quantity are assigned to the first cluster and the second cluster, respectively, for replica production.
In this embodiment, after the replica allocation weights of the at least two clusters are obtained, when a cluster scheduling request carrying the total number of replicas to be produced and the identifiers of the pre-scheduled clusters is received, the total number of replicas to be produced can be divided, according to the replica allocation weights, into a first to-be-produced replica quantity for the first cluster and a second to-be-produced replica quantity for the second cluster. The two quantities are then assigned to the first cluster and the second cluster, respectively, for replica production. Replicas are thus distributed to the different clusters more precisely and reasonably, which helps maintain load balancing across the clusters to some extent.
Specifically, after the replica allocation weights of the at least two clusters are obtained, when a cluster scheduling request carrying the total number of replicas to be produced and the identifiers of the pre-scheduled clusters is received, the replica allocation weights of the clusters matching those identifiers can be read from the database. The total number of replicas to be produced is then divided according to these weights into the first to-be-produced replica quantity for the first cluster and the second to-be-produced replica quantity for the second cluster, and the two quantities are assigned to the first cluster and the second cluster, respectively, for replica production, achieving load balancing across the clusters.
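Steps S801 to S803 can be sketched in Python as a weighted split of the requested total. The text does not say how fractional shares are rounded, so the sketch assumes a largest-remainder rule to keep the shares summing exactly to the request; the function name `split_replicas` is hypothetical.

```python
def split_replicas(total_replicas, weights):
    """Divide total_replicas among clusters in proportion to weights.

    Assumption: leftover replicas after flooring each share go to the
    clusters with the largest remainders (largest-remainder rounding).
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("at least one cluster must have a positive weight")

    # Floor of each proportional share, plus the remainder of that division.
    shares = [total_replicas * w // weight_sum for w in weights]
    remainders = [total_replicas * w % weight_sum for w in weights]

    # Hand out the replicas lost to flooring, largest remainder first.
    leftover = total_replicas - sum(shares)
    by_remainder = sorted(
        range(len(weights)), key=lambda i: remainders[i], reverse=True
    )
    for index in by_remainder[:leftover]:
        shares[index] += 1
    return shares


print(split_replicas(14, [4, 3]))  # [8, 6]
print(split_replicas(10, [4, 3]))  # [6, 4]
```

The sketch does not cap a cluster's share at its target replica total; a fuller implementation would presumably also check each share against that per-cluster maximum.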
The apparatus for determining the number of replicas in this application is described in detail below. Referring to Figure 10, which is a schematic diagram of one embodiment of the apparatus for determining the number of replicas, the apparatus 20 for determining the number of replicas includes:
an acquisition unit 201, configured to acquire replica node configuration information and replica resource configuration information of a first cluster, the first cluster including a plurality of nodes;
a processing unit 202, configured to filter the plurality of nodes of the first cluster according to the replica node configuration information to obtain at least one first target node;
the acquisition unit 201 being further configured to acquire the used resource amount and the total resource amount of the smallest cluster scheduling unit on each first target node;
the processing unit 202 being further configured to calculate the remaining resource amount of each first target node from the used resource amount and the total resource amount;
the processing unit 202 being further configured to calculate, according to the quantity calculation strategy, the first target replica quantity of each first target node from the remaining resource amount and each resource amount in the replica resource configuration information;
the processing unit 202 being further configured to sum the first target replica quantities of the first target nodes to obtain the first target replica total of the first cluster.
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application, the processing unit 202 may specifically be configured to:
select first candidate nodes from the plurality of nodes of the first cluster according to selector configuration data;
select second candidate nodes from the first candidate nodes according to affinity configuration data;
select third candidate nodes from the second candidate nodes according to taint and toleration configuration data;
take the nodes among the third candidate nodes that satisfy the scheduling condition as the first target nodes.
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application, the processing unit 202 may specifically be configured to:
acquire node status information of the third candidate nodes;
if the node status information indicates that a third candidate node is both runnable and schedulable, take that third candidate node as a first target node.
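The four-stage filtering performed by the processing unit can be sketched in Python as a chain of list filters. This is a deliberately simplified model and not the Kubernetes scheduler's actual matching semantics: affinity is reduced to required label equality, taints are plain strings, and the node fields `labels`, `taints`, `ready`, and `unschedulable` are hypothetical names chosen for the illustration.

```python
def select_target_nodes(nodes, node_selector, required_labels, tolerations):
    """Filter nodes by selector, affinity, taints, then node status.

    nodes           -- list of dicts with hypothetical keys:
                       name, labels, taints, ready, unschedulable
    node_selector   -- labels that a node must carry (selector stage)
    required_labels -- simplified stand-in for required node affinity
    tolerations     -- set of taint names that the replica tolerates
    """
    def has_labels(node, wanted):
        return all(node["labels"].get(k) == v for k, v in wanted.items())

    # Stage 1: selector configuration data -> first candidate nodes
    first = [n for n in nodes if has_labels(n, node_selector)]
    # Stage 2: affinity configuration data -> second candidate nodes
    second = [n for n in first if has_labels(n, required_labels)]
    # Stage 3: every taint on the node must be tolerated -> third candidates
    third = [n for n in second if all(t in tolerations for t in n["taints"])]
    # Stage 4: keep nodes that are runnable and schedulable
    return [n for n in third if n["ready"] and not n["unschedulable"]]
```

Each stage narrows the previous stage's output, so the cheaper label checks run before the status check, mirroring the order given in the text.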
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application,
the acquisition unit 201 is further configured to acquire the running count and the total count of the smallest cluster scheduling unit on each first target node;
the processing unit 202 may specifically be configured to calculate, according to the quantity calculation strategy, the first target replica quantity of each first target node from the remaining resource amount, each resource amount in the replica resource configuration information, the running count, and the total count.
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application, the processing unit 202 may specifically be configured to:
calculate the ratio between the remaining resource amount and each resource amount in the replica resource configuration information to obtain at least one first replica quantity;
calculate the second replica quantity from the running count and the total count;
select the smallest value among the at least one first replica quantity and the second replica quantity as the first target replica quantity.
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application,
the acquisition unit 201 is further configured to acquire the second target replica total of a second cluster among at least two clusters;
the processing unit 202 is further configured to calculate the ratio between the first target replica total and the second target replica total to obtain the replica allocation weights of the at least two clusters.
Optionally, on the basis of the embodiment corresponding to Figure 10, in another embodiment of the apparatus for determining the number of replicas provided by this application,
a receiving unit 203 is configured to receive a cluster scheduling request, where the cluster scheduling request carries the total number of replicas to be produced;
the processing unit 202 is further configured to divide, according to the replica allocation weights, the total number of replicas to be produced into a first to-be-produced replica quantity for the first cluster and a second to-be-produced replica quantity for the second cluster;
the processing unit 202 is further configured to assign the first to-be-produced replica quantity and the second to-be-produced replica quantity to the first cluster and the second cluster, respectively, for replica production.
Another aspect of this application provides a computer device. As shown in Figure 11, which is a schematic structural diagram of a computer device provided by an embodiment of this application, the computer device 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 310 (for example, one or more processors), a memory 320, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 331 or data 332. The memory 320 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the computer device 300. Further, the central processing unit 310 may be configured to communicate with the storage medium 330 and to execute, on the computer device 300, the series of instruction operations in the storage medium 330.
The computer device 300 may also include one or more power supplies 340, one or more wired or wireless network interfaces 350, one or more input/output interfaces 360, and/or one or more operating systems 333, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The computer device 300 is also configured to perform the steps in the embodiments corresponding to Figures 2 to 8.
Another aspect of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the steps of the methods described in the embodiments shown in Figures 2 to 8.
Another aspect of this application provides a computer program product containing instructions that, when run on a computer or processor, cause the computer or processor to perform the steps of the methods described in the embodiments shown in Figures 2 to 8.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40089547A (en) | 2023-10-20 |