CN116701051A - Node processing method, electronic equipment and computer storage medium - Google Patents
- Publication number
- CN116701051A (application number CN202310526416.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- computing
- sub
- capacity
- partition
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a node processing method, an electronic device and a computer storage medium. The method is applied to a node cluster system comprising a computing master node, computing sub-nodes and a storage node, and comprises the following steps: the computing master node obtains the node capacities of all computing sub-nodes; based on these capacities, the computing master node determines the partition IDs of each computing sub-node and returns them to that sub-node; and each computing sub-node obtains backup data from the storage partitions of the storage node that correspond to its partition IDs, and loads the backup data into its node memory. Each computing node reports its own capacity, and the number of partitions and the partition IDs bound to a node are calculated from that capacity. When capacity needs to be expanded, only the partition IDs bound to each node are recalculated and the data is reloaded; the backup partitions do not need to be rebuilt, which reduces the time required for expansion.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a node processing method, an electronic device, and a computer storage medium.
Background
With the rapid development of technology, large amounts of data are generated every day. Mainstream business services are now mostly deployed in containers, so as data volume grows, a system's ability to process it has to be increased by horizontal scaling (scaling out and in). A typical system loads data into containers for computation, and also backs up the data loaded into container memory so that the original data can be reloaded if a container crashes unexpectedly.
When caching data, most servers follow a consistent hashing approach: the id of each piece of data is hashed, the ip or hostname of every cache node is hashed, and each piece of data is cached on the nearest node clockwise on the hash ring. In existing systems, the backup data is partitioned with the number of partitions equal to the number of computing nodes, and each computing node loads one fixed partition so that it can reload its data if it restarts after a crash; every computing node is assumed to hold the same amount of data. Each piece of data has a unique string id. When a piece of data is added, its partition number is obtained by hashing the id and taking the result modulo the total number of data partitions. A client can therefore quickly locate the partition/node holding a record from its id in order to read or modify it, and because ids are generated randomly, the load is balanced across partitions/nodes.
When the system scales out or in, the backup data must be re-partitioned into as many partitions as there are computing nodes, so that data volume stays balanced across nodes when they restart and reload after scaling, and so that the computing master node can still locate the node/partition from a record's id. Moreover, if the computing nodes have different storage capacities, this scheme cannot keep capacity usage balanced across them.
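The cost of re-partitioning in the prior-art scheme can be illustrated with a small sketch. It assumes, as described above, that the partition count equals the node count and that a record's partition is its hashed id modulo that count; SHA-256 truncated to 64 bits is an assumed stand-in for the unspecified hash, and `partition_of` and `moved_fraction` are hypothetical helpers. The code measures how many records change partition when the cluster grows from 4 to 5 nodes.

```python
import hashlib


def partition_of(record_id: str, num_partitions: int) -> int:
    """hash(id) mod number of partitions; SHA-256 is an assumed stand-in hash."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def moved_fraction(ids: list[str], old_n: int, new_n: int) -> float:
    """Share of records whose partition changes when the partition count
    (equal to the node count in the prior-art scheme) goes from old_n to new_n."""
    moved = sum(1 for i in ids if partition_of(i, old_n) != partition_of(i, new_n))
    return moved / len(ids)


ids = [f"record-{k}" for k in range(10_000)]
print(f"{moved_fraction(ids, 4, 5):.1%} of records move when scaling from 4 to 5 nodes")
```

With a well-mixed hash, a record keeps its partition only when h mod 4 equals h mod 5, which happens exactly when h mod 20 is 0, 1, 2 or 3, so roughly 80% of the records have to move.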
Disclosure of Invention
To solve the above technical problems, the application provides a node processing method, an electronic device and a computer storage medium.
The node processing method provided by the application is applied to a node cluster system comprising a computing master node, computing sub-nodes and a storage node, and comprises the following steps:
the computing master node obtains the node capacities of all computing sub-nodes;
the computing master node determines the partition IDs of each computing sub-node based on the node capacities of all computing sub-nodes, and returns each computing sub-node its partition IDs; and
each computing sub-node obtains backup data from the storage partitions of the storage node corresponding to its partition IDs, and loads the backup data into its node memory.
Obtaining the node capacities of all computing sub-nodes by the computing master node comprises: each computing sub-node sends a registration request to the computing master node, and the computing master node obtains the registration information of the computing sub-node; the computing master node then obtains the node capacities of all computing sub-nodes from the registration information.
Determining the partition IDs of each computing sub-node based on the node capacities of all computing sub-nodes comprises: the computing master node calculates, from the node capacity of each computing sub-node, the number of partitions bound to that sub-node; and the computing master node determines the partition IDs of each computing sub-node based on the node capacities and the numbers of partitions, and returns each sub-node its partition IDs.
Determining and returning the partition IDs based on the node capacities and the numbers of partitions comprises: the computing master node calculates the capacity ratio of each computing sub-node, i.e. its node capacity divided by the total capacity of all computing sub-nodes; the computing master node obtains the total number of partitions on the storage node, derives the partition IDs of each computing sub-node from that total and the capacity ratios, and returns each sub-node its partition IDs.
Deriving the partition IDs from the total number of partitions and the capacity ratios comprises: the computing master node multiplies the total number of partitions on the storage node by each capacity ratio; it rounds each product down to obtain the number of partitions allocated to the corresponding computing sub-node; and it numbers the partitions of the computing sub-nodes based on these counts to obtain the partition IDs of each computing sub-node.
Numbering the partitions based on these counts comprises: the computing master node obtains the number r of storage-node partitions not yet allocated to any computing sub-node; it sorts the computing sub-nodes by capacity and adds 1 to the partition count of each of the r computing sub-nodes with the largest capacities; and it numbers the partitions based on the resulting counts to obtain the partition IDs of each computing sub-node, such that no two nodes share a partition ID.
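Written out, the allocation rule in the two preceding paragraphs reads as follows. The symbols $M$ (number of computing sub-nodes) and $C_i$ (node capacity of sub-node $i$) are introduced here for illustration; $P(i)$, $X(i)$, $N$ and $r$ are the names the detailed description uses for the capacity ratio, the rounded-down partition count, the total number of partitions and the number of unallocated partitions.

$$
P(i) = \frac{C_i}{\sum_{j=1}^{M} C_j}, \qquad X(i) = \left\lfloor N \cdot P(i) \right\rfloor, \qquad r = N - \sum_{i=1}^{M} X(i), \qquad 0 \le r < M
$$

Because the products $N \cdot P(i)$ sum exactly to $N$, $r$ equals the sum of their fractional parts and is therefore smaller than $M$. Each of the $r$ sub-nodes with the largest $C_i$ then receives $X(i) + 1$ partitions, so the final counts sum to $N$. For example, with $N = 10$ and capacities 64, 32 and 32, the ratios are 0.5, 0.25 and 0.25, giving $X = (5, 2, 2)$ and $r = 1$; the largest node ends up with 6 partitions and the other two with 2 each.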
The node processing method further comprises: the computing master node receives a data modification request, determines the first partition corresponding to the request and obtains the first partition ID; the computing master node determines the first computing sub-node corresponding to the request from the node mapping table of the computing sub-nodes and the first partition ID; and the computing master node sends the data modification request to the first computing sub-node.
The node processing method further comprises: the computing master node obtains the capacity usage of each computing sub-node; when the capacity usage of at least one computing sub-node exceeds a first preset threshold, the computing master node obtains a to-be-processed set of computing sub-nodes, namely the sub-nodes whose capacity usage differs from the average capacity usage of all computing sub-nodes by more than a second preset threshold in absolute value; the computing master node selects from this set the second computing sub-node, with the highest capacity usage, and the third computing sub-node, with the lowest capacity usage, and swaps a partition bound to the second computing sub-node with a partition bound to the third computing sub-node; and the computing master node updates the to-be-processed set and repeats these steps until the set is empty.
The to-be-processed set comprises a first to-be-processed set and a second to-be-processed set: the computing sub-nodes in the first set have capacity usage above the average capacity usage, and those in the second set have capacity usage below it.
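The construction of the two to-be-processed sets can be sketched in Python as follows. The function name `pending_sets` and the parameters `t1` and `t2` (standing for the first and second preset thresholds) are hypothetical; usage values are assumed to be fractions of capacity, and the average is assumed to be the plain mean of the per-node usage values, which the text does not pin down.

```python
def pending_sets(usage: dict[str, float], t1: float, t2: float) -> tuple[set[str], set[str]]:
    """Return (above, below): sub-nodes whose usage deviates from the average by more than t2.

    usage: sub-node ip -> capacity usage (fraction of capacity, an assumption).
    t1: first preset threshold, which triggers balancing at all.
    t2: second preset threshold on the absolute deviation from the average.
    """
    if not any(u > t1 for u in usage.values()):
        return set(), set()  # no node above the first threshold: no balancing needed
    average = sum(usage.values()) / len(usage)  # plain mean (assumption)
    above = {ip for ip, u in usage.items() if u - average > t2}  # first to-be-processed set
    below = {ip for ip, u in usage.items() if average - u > t2}  # second to-be-processed set
    return above, below


print(pending_sets({"10.0.0.1": 0.9, "10.0.0.2": 0.5, "10.0.0.3": 0.3}, t1=0.8, t2=0.1))
```

The union of the two returned sets is the to-be-processed set; the highest-usage node of the first set and the lowest-usage node of the second are the candidates for a partition swap.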
Swapping a partition bound to the second computing sub-node with a partition bound to the third computing sub-node comprises: the computing master node obtains the second capacity usage and second capacity of the second computing sub-node, and the third capacity usage and third capacity of the third computing sub-node; the computing master node calculates a third preset threshold from the second capacity usage, the second capacity, the third capacity usage, the third capacity and the average capacity usage; and the computing master node swaps one partition bound to the second computing sub-node with one partition bound to the third computing sub-node, choosing the pair that maximizes the resulting capacity change of the two sub-nodes while keeping that change below the third preset threshold.
To solve the above technical problems, the application provides an electronic device comprising a processor and a memory connected to the processor; the memory stores program instructions, and the processor executes the program instructions to implement the node processing method described above.
To solve the above technical problems, the application further provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the node processing method described above.
Compared with the prior art, the application has the following beneficial effects. The node processing method is applied to a node cluster system comprising a computing master node, computing sub-nodes and a storage node. Each computing node reports its own capacity, the computing master node derives from it the number of partitions and the partition IDs bound to that node, and each computing sub-node then loads backup data from its bound partitions on the storage node into its node memory. When capacity needs to be expanded, only the partition IDs bound to each node are recalculated and the data is reloaded; the backup partitions do not need to be rebuilt, which reduces the time required for expansion.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is a schematic diagram of the node cluster system to which the node processing method provided by the application is applied;
FIG. 2 is a flow chart of a first embodiment of the node processing method provided by the application;
FIG. 3 is a schematic diagram of the data interaction flow of the node processing method provided by the application;
FIG. 4 is a flow chart of a second embodiment of the node processing method provided by the application;
FIG. 5 is a schematic flow chart of partition allocation by the computing master node provided by the application;
FIG. 6 is a flow chart of the sub-steps of step S22 in the second embodiment of the application;
FIG. 7 is a flow chart of a third embodiment of the node processing method provided by the application;
FIG. 8 is a schematic diagram of the general data modification flow of the node processing method provided by the application;
FIG. 9 is a flow chart of a fourth embodiment of the node processing method provided by the application;
FIG. 10 is a schematic diagram of the general load balancing flow of the node processing method provided by the application;
FIG. 11 is a schematic structural diagram of an embodiment of the electronic device provided by the application;
FIG. 12 is a schematic structural diagram of an embodiment of the computer storage medium provided by the application.
Detailed Description
The technical solutions in the embodiments of the application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The node processing method is applied to a node cluster system. FIG. 1 is a schematic structural diagram of the node cluster system to which the node processing method provided by the application is applied. The node cluster system comprises a computing master node, computing sub-nodes and a storage node.
The storage node stores the data requested by clients and is mainly used for backup and disaster recovery. All data on the storage node is partitioned so that each computing sub-node can load only the partitions bound to it; otherwise the computing master node would have to load the entire data set and distribute it to every node, which would be a bottleneck. The storage node also accepts create, delete, update and query requests from the computing master node and applies them to its data. The data on the storage node is stored on HDFS (Hadoop Distributed File System).
The computing sub-nodes hold the data used for computation. In the embodiments of the application there may be multiple computing sub-nodes; in other embodiments there may be only one. Different computing sub-nodes do not store duplicate data.
The computing master node receives clients' create, delete, update and query requests for data, and forwards each request to the corresponding computing sub-node and to the corresponding partition on the storage node.
Clients send create, delete, update and query requests to the computing master node as their needs require.
Referring to fig. 2 and fig. 3, fig. 2 is a flow chart of a first embodiment of a node processing method according to the present application; fig. 3 is a schematic diagram of a data interaction flow of the node processing method provided by the application.
The node processing method is applied to an electronic device, which may be a server, a local terminal, or a system in which a server and a local terminal cooperate. Accordingly, the components of the electronic device, such as units, sub-units, modules and sub-modules, may all be located in the server, all in the local terminal, or be distributed between the server and the local terminal.
Further, the server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing a distributed server, or may be implemented as a single software or software module, which is not specifically limited herein.
As shown in fig. 2, the specific steps are as follows:
Step S11: The computing master node obtains the node capacities of all computing sub-nodes.
In the embodiments of the application, the computing nodes comprise the computing master node and the computing sub-nodes, which load backup data from the storage node when they start. Generally there is only one storage node; its data is written to disk, and the disk's storage capacity far exceeds the amount of data held by the computing nodes.
Specifically, the computing master node manages all computing sub-nodes. The total number of partitions on the storage node and the total number of computing sub-nodes are stored in the configuration file of the computing master node, and the ip of the computing master node is stored in the configuration file of each computing sub-node.
After its process starts, each computing sub-node looks up the ip of the computing master node and registers with it; the registration information contains the ip and the capacity of the computing sub-node.
As shown in FIG. 3, each computing sub-node sends a registration request to the computing master node; the computing master node obtains the registration information of the sub-node and, from the registration information, the node capacities of all computing sub-nodes.
Partitions can be pre-created on the storage node. Each partition logically corresponds to one computing node, and a partition can be understood as a file that stores data; the relationship between partitions and computing nodes is many-to-one. To keep the data stored on each computing node relatively balanced in the end, the data on the storage node should be divided into as many partitions as practical.
Step S12: The computing master node determines the partition IDs of each computing sub-node based on the node capacities of all computing sub-nodes, and returns each sub-node its partition IDs.
Specifically, once all computing sub-nodes have registered, the computing master node assigns each computing node a sequence number ordered by ip, calculates the capacity ratio of each node, calculates from that ratio the number of partitions for each node, and finally assigns partition IDs to each node. The computing master node also keeps a record of all partition IDs bound to each node.
Further, the application provides sub-steps of step S12 for obtaining the partition IDs of the computing nodes. Referring to FIG. 4 and FIG. 5, FIG. 4 is a flow chart of a second embodiment of the node processing method provided by the application, and FIG. 5 is a schematic flow chart of partition allocation by the computing master node provided by the application.
As shown in fig. 4, the specific steps are as follows:
Step S21: The computing master node calculates the number of partitions bound to each computing sub-node from the node capacity of that sub-node.
Specifically, as shown in FIG. 5, the computing master node waits for the computing sub-nodes to register, receives the registration information of each computing sub-node, and extracts from it information such as the node capacity, node ip and node number.
The computing master node then checks whether the number of computing sub-nodes that have actually registered equals the configured number of computing sub-nodes. If it does, the computing master node computes unique node numbers for all computing sub-nodes from the registration information; if not, it keeps waiting for registrations until the two numbers are equal.
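The registration wait in step S21 can be sketched in Python as follows. This is a minimal illustration under assumptions: `Registration`, `MasterRegistry` and their methods are hypothetical names, registrations are keyed by ip so that a repeated registration from the same sub-node does not inflate the count, and node numbers are assigned by sorting ip strings, which is only one possible reading of "a sequence is allocated to each computing node according to ip".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    ip: str
    capacity: int  # node capacity reported by the sub-node (unit is an assumption)


class MasterRegistry:
    def __init__(self, configured_nodes: int):
        # Number of computing sub-nodes listed in the master's configuration file.
        self.configured_nodes = configured_nodes
        self.registered: dict[str, Registration] = {}

    def register(self, reg: Registration) -> bool:
        """Record a registration; return True once all configured sub-nodes have registered."""
        self.registered[reg.ip] = reg  # re-registration from the same ip overwrites, never double-counts
        return self.ready()

    def ready(self) -> bool:
        return len(self.registered) == self.configured_nodes

    def node_numbers(self) -> dict[str, int]:
        """Assign each sub-node a unique number in ip (string) order."""
        if not self.ready():
            raise RuntimeError("still waiting for sub-node registrations")
        return {ip: n for n, ip in enumerate(sorted(self.registered))}


registry = MasterRegistry(configured_nodes=2)
registry.register(Registration(ip="10.0.0.2", capacity=32))
if registry.register(Registration(ip="10.0.0.1", capacity=64)):
    print(registry.node_numbers())
```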
In the embodiments of the application, the computing master node calculates the number of storage-node partitions bound to each computing sub-node according to the capacity of that node for storing data.
Step S22: The computing master node determines the partition IDs of each computing sub-node based on the node capacities and the numbers of partitions, and returns each sub-node its partition IDs.
Specifically, the computing master node derives the partition IDs of each computing sub-node from its node capacity and the number of storage-node partitions bound to it, and sends the partition IDs back to that sub-node.
The application provides a specific embodiment for obtaining the partition IDs of the computing sub-nodes from their node capacities. Referring to FIG. 6, FIG. 6 is a schematic flow chart of the sub-steps of step S22 in the second embodiment provided by the application.
As shown in fig. 6, the specific steps are as follows:
Step S221: The computing master node calculates the capacity ratio of each computing sub-node, i.e. its node capacity divided by the total capacity of all computing sub-nodes.
Specifically, in one embodiment of the application, with continued reference to FIG. 5, the computing master node multiplies the total number of partitions on the storage node by each capacity ratio, rounds each product down to obtain the number of partitions allocated to each computing sub-node, and then numbers the partitions of the computing sub-nodes based on these counts to obtain their partition IDs.
For the i-th computing sub-node with capacity ratio $P(i)$, the total number of partitions $N$ is multiplied by $P(i)$ and rounded down, giving the number of partitions allocated to that sub-node: $X(i) = \lfloor N \cdot P(i) \rfloor$.
In an embodiment of the application, the computing master node obtains the number of storage-node partitions not yet allocated to any computing sub-node, and adds 1 to the partition count of that many computing sub-nodes, taken in descending order of capacity. It then calculates the partition IDs of the computing sub-nodes.
Specifically, the computing master node calculates the number of unallocated partitions $r = N - \sum_{i} X(i)$, gives one additional partition to each of the $r$ computing sub-nodes with the largest capacities, numbers the partitions bound to each node so that no two nodes share a partition ID, and sends each node all of the partition IDs bound to it.
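A minimal Python sketch of steps S221 and S222, assuming capacities are integers in a common unit. `allocate_partitions` is a hypothetical helper; the floor step, the leftover count r and the extra partition for the r largest nodes follow the text, while the tie-breaking rule and the contiguous numbering are assumptions, since the source only requires that no two nodes share a partition ID.

```python
def allocate_partitions(capacities: dict[str, int], total_partitions: int) -> dict[str, list[int]]:
    """Bind storage partitions 0..N-1 to computing sub-nodes in proportion to capacity.

    capacities: sub-node ip -> node capacity reported at registration.
    Returns: sub-node ip -> list of partition IDs bound to that node.
    """
    total_capacity = sum(capacities.values())
    # X(i) = floor(N * P(i)), computed with integer arithmetic to avoid float rounding.
    counts = {ip: total_partitions * cap // total_capacity for ip, cap in capacities.items()}
    # r = N - sum X(i): partitions left over after rounding down (0 <= r < number of nodes).
    r = total_partitions - sum(counts.values())
    # The r largest-capacity nodes each take one extra partition (ties broken by ip, an assumption).
    for ip in sorted(capacities, key=lambda k: (-capacities[k], k))[:r]:
        counts[ip] += 1
    # Hand out contiguous, non-overlapping ID ranges in ip order (numbering scheme is an assumption).
    bindings, next_id = {}, 0
    for ip in sorted(capacities):
        bindings[ip] = list(range(next_id, next_id + counts[ip]))
        next_id += counts[ip]
    return bindings


bindings = allocate_partitions({"10.0.0.1": 64, "10.0.0.2": 32, "10.0.0.3": 32}, total_partitions=10)
print(bindings)
```

For capacities 64, 32 and 32 with N = 10, this prints `{'10.0.0.1': [0, 1, 2, 3, 4, 5], '10.0.0.2': [6, 7], '10.0.0.3': [8, 9]}`. Floor division on integers is used instead of computing the ratio first, because a product such as `100 * (29 / 100)` can evaluate to slightly less than 29 in floating point and lose a partition when rounded down.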
Step S222: The computing master node obtains the total number of partitions on the storage node, determines the partition IDs of each computing sub-node from that total and the capacity ratios, and returns each sub-node its partition IDs.
Specifically, the computing master node derives the partition IDs of each computing sub-node from the total number of partitions and the capacity ratios, and returns them to the corresponding sub-node.
In this way, the backup data on the storage node is partitioned and each node binds a number of partitions that depends on its capacity, which balances capacity usage across nodes while still allowing data to be located efficiently.
Step S13: Each computing sub-node obtains backup data from the storage partitions of the storage node corresponding to its partition IDs, and loads the backup data into its node memory.
The number of data partitions used for backup is relatively large, and partition IDs no longer correspond one-to-one with node numbers; the relationship becomes many-to-one. Each computing node reports its own capacity, and the number of partitions and the partition IDs bound to each node are calculated from that capacity.
Therefore, when capacity needs to be expanded, all nodes can recalculate the partition IDs bound to them and reload their data, without re-partitioning the original backup data, which reduces the time required for expansion.
Further, the application also provides an embodiment for data modification. Referring to FIG. 7 and FIG. 8, FIG. 7 is a flow chart of a third embodiment of the node processing method provided by the application, and FIG. 8 is a schematic diagram of the general data modification flow of the node processing method provided by the application.
As shown in FIG. 7, the specific steps are as follows:
Step S31: The computing master node receives a data modification request, determines the first partition corresponding to the request, and obtains the first partition ID.
A client sends a create, delete, update or query request to the computing master node, which then operates on the data of the corresponding computing sub-node and on the data on the storage node.
When a client adds new data, the computing master node hashes the id of the data and takes the result modulo the total number of partitions N to obtain the partition ID K; the data is then written to the K-th partition on the storage node.
Delete, update and query operations work the same way. The computing master node calculates the partition ID to be operated on, $K = \operatorname{hash}(\mathit{id}) \bmod N$, together with the ip of the computing sub-node to be operated on, $\mathit{ip} = \mathrm{P2N}(K)$, where P2N is the mapping from partition ID to computing sub-node maintained in the computing master node, and then sends the corresponding request to that computing sub-node's ip.
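The routing rule can be sketched as a small Python router. Assumptions: SHA-256 truncated to 64 bits stands in for the unspecified hash function, the P2N table is built by inverting the node-to-partition bindings, and `partition_for`, `Router` and `route` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


def partition_for(record_id: str, total_partitions: int) -> int:
    """K = hash(id) % N, with SHA-256 as an assumed stand-in hash."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_partitions


@dataclass
class Router:
    total_partitions: int
    p2n: dict[int, str]  # P2N: partition ID -> ip of the owning computing sub-node

    @classmethod
    def from_bindings(cls, bindings: dict[str, list[int]]) -> "Router":
        # Invert the node -> partitions bindings into the partition -> node table.
        p2n = {pid: ip for ip, pids in bindings.items() for pid in pids}
        return cls(total_partitions=len(p2n), p2n=p2n)

    def route(self, record_id: str) -> tuple[int, str]:
        """Return (K, ip): the storage partition to touch and the sub-node to forward to."""
        k = partition_for(record_id, self.total_partitions)
        return k, self.p2n[k]


router = Router.from_bindings({"10.0.0.1": [0, 1, 2, 3, 4, 5], "10.0.0.2": [6, 7], "10.0.0.3": [8, 9]})
k, ip = router.route("order-42")
print(f"partition {k} on the storage node, forwarded to {ip}")
```

Because N stays fixed when nodes are added or removed, `partition_for` never changes for an existing record; only the P2N table is rebuilt after the bindings are recalculated.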
Step S32: The computing master node determines the first computing sub-node corresponding to the data modification request from the node mapping table of the computing sub-nodes and the first partition ID.
Specifically, using the partition-to-computing-node mapping table maintained in the computing master node, the computing master node sends the write request to the computing node that owns the K-th partition.
Step S33: The computing master node sends the data modification request to the first computing sub-node.
In this way, when nodes restart, the computing master node calculates the corresponding partition IDs from the capacity of each computing sub-node, so scaling out or in only requires changing the number of computing sub-nodes configured on the computing master node and can be done quickly.
Further, although the probability that new data lands on a given computing node is roughly proportional to that node's share of the total capacity, the data inevitably becomes skewed over time. Node data balancing is therefore also performed as data is added, to keep utilization balanced across the computing nodes.
As shown in fig. 9, the specific steps are as follows:
Step S41: The computing master node obtains the capacity utilization rate of each computing sub-node.
Specifically, as shown in fig. 10, the computing master node counts the number of files in each partition, calculates the total number of files, obtains the partition number information of all computing sub-nodes, periodically calculates the capacity utilization rate of each computing sub-node, and then calculates the average utilization rate across all nodes.
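The bookkeeping in step S41 could be sketched as follows. The text does not give the exact formula, so this assumes that a sub-node's utilization is the file count of its bound partitions divided by its capacity (expressed in files), and that the average is the pooled ratio of total files to total capacity rather than an unweighted mean of per-node rates. The function name is hypothetical.

```python
def capacity_usage(partition_files, p2n, capacities):
    """Return (usage per sub-node, mean usage).

    partition_files: partition number -> file count in that partition
    p2n: partition number -> sub-node name
    capacities: sub-node name -> capacity in files
    """
    used = {node: 0 for node in capacities}
    for part, files in partition_files.items():
        used[p2n[part]] += files  # attribute each partition's files to its owner
    usage = {node: used[node] / capacities[node] for node in capacities}
    # Assumed pooled average: total files over total capacity
    mean_usage = sum(used.values()) / sum(capacities.values())
    return usage, mean_usage


usage, mean = capacity_usage({0: 30, 1: 20, 2: 40},
                             {0: "a", 1: "a", 2: "b"},
                             {"a": 100, "b": 50})
print(usage, mean)
```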
Step S42: When the capacity utilization rate of at least one computing sub-node is greater than a first preset threshold, the computing master node obtains a to-be-processed computing sub-node set.
If the utilization rate of any computing sub-node exceeds the first preset threshold, computing sub-nodes are selected into the to-be-processed computing sub-node set according to a preset rule; if no computing sub-node exceeds the first preset threshold, no load balancing is needed.
Step S43: The computing master node obtains, from the to-be-processed computing sub-node set, the second computing sub-node with the highest capacity utilization rate and the third computing sub-node with the lowest, and exchanges a partition bound to the second computing sub-node with a partition bound to the third computing sub-node.
Specifically, the computing master node identifies, within the to-be-processed computing sub-node set, the third computing sub-node a with the lowest utilization rate and the second computing sub-node b with the highest; their utilization rates are a1 and b1, and their node capacities are a2 and b2, respectively.
A computing sub-node belongs to the to-be-processed computing sub-node set when the absolute value of the difference between its capacity utilization rate and the average capacity utilization rate of all computing sub-nodes is greater than a second preset threshold.
Specifically, the computing master node acquires the second capacity utilization rate b1 and the second capacity b2 of the second computing sub-node, and the third capacity utilization rate a1 and the third capacity a2 of the third computing sub-node.
The computing master node calculates a third preset threshold from b1, b2, a1, a2, and the average capacity utilization rate meanUsage. The third preset threshold bounds both the maximum amount of data that may be added to the third computing sub-node and the maximum amount that may be removed from the second computing sub-node.
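The trigger in step S42 and the membership rule |usage - meanUsage| > second preset threshold can be combined into a small selection function. In this sketch, `t1` and `t2` are placeholders for the first and second preset thresholds, and splitting the result into above-mean and below-mean groups follows the two-set embodiment described in this section; it is an illustration, not the patent's code.

```python
def pending_sets(usage, mean_usage, t1, t2):
    """Return (above, below): to-be-processed sub-nodes above/below the mean.

    usage: sub-node name -> capacity utilization
    t1, t2: stand-ins for the first and second preset thresholds
    """
    # S42: balancing is triggered only if some sub-node exceeds t1
    if not any(u > t1 for u in usage.values()):
        return set(), set()
    # Membership rule: |usage - meanUsage| > t2
    pending = {n for n, u in usage.items() if abs(u - mean_usage) > t2}
    above = {n for n in pending if usage[n] > mean_usage}
    return above, pending - above


print(pending_sets({"a": 0.9, "b": 0.3, "c": 0.6}, 0.6, t1=0.8, t2=0.1))
```

Node "c" sits exactly at the mean, so it is left out even though balancing is triggered by node "a".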
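The maximum exchange amount exchangeMaxNum is the smaller of the maximum data that node a can take on and the maximum data that node b can give up:

$$\mathrm{exchangeMaxNum} = \min\big((\mathrm{meanUsage} + V - a_1)\,a_2,\ (b_1 + V - \mathrm{meanUsage})\,b_2\big)$$

where V is a constant that the text does not define further.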
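The third preset threshold, exchangeMaxNum = min((meanUsage + V - a1) * a2, (b1 + V - meanUsage) * b2), translates directly into code. In this sketch `v` stands for V, whose value the text leaves open, and the function name is hypothetical.

```python
def exchange_max_num(a1, a2, b1, b2, mean_usage, v):
    """Third preset threshold: the most data one exchange may move.

    a1, a2: usage and capacity of the lowest-usage sub-node a
    b1, b2: usage and capacity of the highest-usage sub-node b
    v: the constant V from the formula (value not given in the text)
    """
    gain_a = (mean_usage + v - a1) * a2  # raises a to at most meanUsage + V
    shed_b = (b1 + v - mean_usage) * b2  # lowers b to at least meanUsage - V
    return min(gain_a, shed_b)


print(exchange_max_num(a1=0.3, a2=100, b1=0.9, b2=50, mean_usage=0.6, v=0.05))
```

Multiplying a usage difference by a capacity converts it back into an amount of data, which is why both terms are comparable.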
Further, the computing master node exchanges one partition bound to the second computing sub-node with one partition bound to the third computing sub-node, choosing the pair so that the capacity change of both nodes is as large as possible while remaining smaller than the third preset threshold.
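Choosing which pair of partitions to exchange can be done by brute force over all pairs, as sketched here. Assumptions: a partition's size is its file count; the exchange moves partition p from node b to node a and partition q from a to b, so the net load shifted is size(p) - size(q); and only strictly positive shifts are considered. The text does not say how candidates are searched; exhaustive search is simply the easiest correct choice for small partition counts.

```python
def best_swap(parts_b, parts_a, limit):
    """Pick (p, q) maximizing the net files moved from b to a, below limit.

    parts_b: partition number -> file count, partitions bound to node b
    parts_a: partition number -> file count, partitions bound to node a
    limit: the third preset threshold (exchangeMaxNum)
    Returns ((p, q), delta), or (None, 0) if no admissible pair exists.
    """
    best, best_delta = None, 0
    for p, size_p in parts_b.items():
        for q, size_q in parts_a.items():
            delta = size_p - size_q  # net files leaving b and entering a
            if best_delta < delta < limit:
                best, best_delta = (p, q), delta
    return best, best_delta


print(best_swap({0: 30, 1: 25}, {5: 10, 6: 4}, limit=17.5))
# ((1, 5), 15)
```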
In an embodiment of the present application, the set of computing sub-nodes to be processed includes a first set of computing sub-nodes to be processed and a second set of computing sub-nodes to be processed, wherein a capacity usage rate in the first set of computing sub-nodes to be processed is greater than the average capacity usage rate, and a capacity usage rate in the second set of computing sub-nodes to be processed is less than the average capacity usage rate.
In other embodiments of the present application, the to-be-processed computing sub-node set may consist of a single set. The computing master node exchanges partitions between the computing sub-nodes with the highest and the lowest utilization rate in that set, so that the capacity change of the second and third computing sub-nodes is as large as possible while remaining smaller than the third preset threshold, thereby achieving load balancing.
Step S44: The computing master node updates the to-be-processed computing sub-node set until the set is empty.
Specifically, after each exchange, the computing master node updates the to-be-processed set, updating, removing, or re-adding the computing sub-nodes involved in the exchange, until no computing sub-node remains in the set.
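Putting steps S41 to S44 together, one possible balancing loop for the single-set embodiment is sketched below. The text describes the set update only loosely, so the update rule here (drop nodes now within `t2` of the mean, drop a pair for which no admissible swap exists) and the `max_rounds` guard are assumptions added to guarantee termination; all names are hypothetical.

```python
def rebalance(partition_files, p2n, capacities, t1, t2, v, max_rounds=100):
    """Swap partitions between the highest- and lowest-usage sub-nodes.

    partition_files: partition number -> file count
    p2n: partition number -> sub-node name (mutated in place)
    capacities: sub-node name -> capacity in files
    Returns the list of (p, q) swaps performed.
    """

    def usage_of():
        # S41: per-node usage and pooled mean usage
        used = {n: 0 for n in capacities}
        for part, files in partition_files.items():
            used[p2n[part]] += files
        usage = {n: used[n] / capacities[n] for n in capacities}
        return usage, sum(used.values()) / sum(capacities.values())

    swaps = []
    usage, mean = usage_of()
    # S42: only balance if some node exceeds the first threshold
    if not any(u > t1 for u in usage.values()):
        return swaps
    pending = {n for n, u in usage.items() if abs(u - mean) > t2}

    for _ in range(max_rounds):  # guard against endless re-selection
        if len(pending) < 2:
            break
        b = max(pending, key=usage.get)  # second sub-node: highest usage
        a = min(pending, key=usage.get)  # third sub-node: lowest usage
        if a == b:
            break
        # S43: third preset threshold (exchangeMaxNum)
        limit = min((mean + v - usage[a]) * capacities[a],
                    (usage[b] + v - mean) * capacities[b])
        best, best_delta = None, 0
        parts_a = [k for k, n in p2n.items() if n == a]
        for p in [k for k, n in p2n.items() if n == b]:
            for q in parts_a:
                delta = partition_files[p] - partition_files[q]
                if best_delta < delta < limit:
                    best, best_delta = (p, q), delta
        if best is None:
            pending -= {a, b}  # no admissible swap for this pair
            continue
        p, q = best
        p2n[p], p2n[q] = a, b  # exchange the two bindings
        swaps.append((p, q))
        # S44: swaps keep totals fixed, so the mean is unchanged;
        # keep only nodes still more than t2 away from it
        usage, _ = usage_of()
        pending = {n for n in pending if abs(usage[n] - mean) > t2}
    return swaps


p2n = {0: "b", 1: "b", 2: "a", 3: "a"}
print(rebalance({0: 50, 1: 40, 2: 10, 3: 5}, p2n, {"a": 100, "b": 100},
                t1=0.8, t2=0.1, v=0.05))
# [(0, 2)]
```

In the example, node b starts at 90% and node a at 15%; a single swap of partitions 0 and 2 brings them to 50% and 55%, both within `t2` of the 52.5% mean, so the set empties and the loop stops.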
In this way, even when the computing sub-nodes can hold different amounts of data, the utilization of each storage server stays relatively balanced, avoiding the data skew caused by relying on consistent hashing alone. When data on the computing sub-nodes becomes skewed, the load balancing strategy can quickly rebalance it, alleviating data skew as far as possible.
To implement the above node processing method, the present application further provides an electronic device. Refer to fig. 11, which is a schematic structural diagram of an embodiment of the electronic device provided by the present application.
The electronic device 400 of the present embodiment includes a processor 41, a memory 42, an input-output device 43, and a bus 44.
The processor 41, the memory 42 and the input/output device 43 are respectively connected to the bus 44, and the memory 42 stores program data, and the processor 41 is configured to execute the program data to implement the node processing method according to the above embodiment.
In an embodiment of the present application, the processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capabilities. The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 41 may be any conventional processor.
The present application further provides a computer storage medium, please continue to refer to fig. 12, fig. 12 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application, in which a computer program 51 is stored in the computer storage medium 500, and the computer program 51 is used to implement the node processing method of the above embodiment when being executed by a processor.
When implemented as software functional units and sold or used as a standalone product, embodiments of the present application may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.
Claims (12)
1. A node processing method, applied to a node cluster system comprising a computing master node, computing sub-nodes, and storage nodes, the node processing method comprising:
the computing master node obtains node capacities of all computing sub-nodes;
the computing main node obtains the partition number of each computing sub-node based on the node capacity of all the computing sub-nodes, and returns the partition number of each computing sub-node;
and the computing sub-node acquires backup data from the corresponding storage partition of the storage node based on the partition number of the computing sub-node, and loads the backup data into the node memory of the computing sub-node.
2. The node processing method according to claim 1, wherein,
the computing master node obtains node capacities of all computing sub-nodes, including:
the computing sub-node initiates a registration request to the computing main node, and the computing main node acquires registration information of the computing sub-node;
and the computing master node acquires the node capacity of all the computing sub-nodes according to the registration information.
3. The node processing method according to claim 1 or 2, wherein,
the computing master node obtains the partition number of each computing sub-node based on the node capacity of all the computing sub-nodes, and the method comprises the following steps:
the computing master node computes the partition quantity bound by each computing sub-node according to the node capacity of each computing sub-node;
and the computing master node obtains the partition numbers of each computing sub-node based on the node capacity and the partition quantity, and returns the partition numbers of each computing sub-node.
4. The node processing method according to claim 3, wherein
the computing master node obtaining the partition numbers of each computing sub-node based on the node capacity and the partition quantity, and returning the partition numbers of each computing sub-node, comprises:
the computing master node calculates the ratio of the node capacity of each computing sub-node to the total capacity of all the computing sub-nodes;
the computing master node obtains the total number of the partitions of the storage node, obtains the partition number of each computing sub-node based on the total number of the partitions and the capacity ratio, and returns the partition number of each computing sub-node.
5. The node processing method of claim 4, wherein,
the computing master node obtains the total number of the partitions of the storage node, obtains the partition number of each computing sub-node based on the total number of the partitions and the capacity ratio, and comprises the following steps:
the computing master node obtains the product of the total number of partitions of the storage node and each capacity ratio;
the computing master node rounds each product down to obtain the quantity of partitions allocated to each computing sub-node;
and the computing master node numbers the partitions of the computing sub-nodes based on the partition quantity to obtain the partition numbers of each computing sub-node.
6. The node processing method of claim 5, wherein,
the computing main node numbers the computing sub-nodes based on the partition number, and obtains the partition numbers of the computing sub-nodes, and the method comprises the following steps:
the computing master node obtains the number of storage-node partitions not yet allocated to any computing sub-node;
the computing master node sorts the computing sub-nodes by capacity and adds 1 to the node partition quantity of each of the first computing sub-nodes in that order, the number of such computing sub-nodes being equal to the number of unallocated partitions;
and the computing master node numbers the partitions of the computing sub-nodes based on the node partition quantity to obtain the partition numbers of each computing sub-node, wherein the partition numbers of all the nodes are different.
7. The node processing method according to claim 1, wherein,
the node processing method further comprises the following steps:
the computing master node acquires a data modification request, acquires a first partition corresponding to the data modification request and acquires a first partition number;
the computing master node obtains a first computing sub-node corresponding to the data modification request according to a node mapping table of the computing sub-node and the first partition number;
the computing master node sends the data modification request to the first computing child node.
8. The node processing method according to claim 1, wherein,
the node processing method further comprises the following steps:
the computing master node obtains the capacity utilization rate of each computing sub-node;
when the capacity utilization rate of at least one computing sub-node is larger than a first preset threshold value, the computing main node acquires a to-be-processed computing sub-node set;
the computing master node acquires, from the to-be-processed computing sub-node set, a second computing sub-node with the highest capacity utilization rate and a third computing sub-node with the lowest capacity utilization rate, and exchanges a partition bound to the second computing sub-node with a partition bound to the third computing sub-node;
the computing master node updates the to-be-processed computing sub-node set until the to-be-processed computing sub-node set is empty;
wherein the absolute value of the difference between the capacity utilization rate of each computing sub-node in the to-be-processed computing sub-node set and the average capacity utilization rate of all the computing sub-nodes is greater than a second preset threshold.
9. The node processing method of claim 8, wherein,
the to-be-processed computing sub-node set comprises a first to-be-processed computing sub-node set and a second to-be-processed computing sub-node set, wherein the capacity utilization rate in the first to-be-processed computing sub-node set is larger than the average capacity utilization rate, and the capacity utilization rate in the second to-be-processed computing sub-node set is smaller than the average capacity utilization rate.
10. The node processing method according to claim 8 or 9, wherein,
said swapping a partition bound by said second compute child node with a partition bound by said third compute child node comprises:
the computing master node obtains a second capacity utilization rate and a second capacity of the second computing sub-node, and a third capacity utilization rate and a third capacity of the third computing sub-node;
the computing master node computes a third preset threshold based on the second capacity usage, the second capacity, the third capacity usage, the third capacity, and an average capacity usage; and the computing main node exchanges one partition bound by the second computing sub-node and one partition bound by the third computing sub-node so as to maximize the capacity change of the second computing sub-node and the capacity change of the third computing sub-node, and the change amount is smaller than the third preset threshold value.
11. An electronic device comprising a processor and a memory coupled to the processor, the memory storing program instructions, the processor executing the program instructions to implement the node processing method of any of claims 1-10.
12. A computer readable storage medium storing program instructions which when executed by a processor implement the node processing method of any of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310526416.6A CN116701051A (en) | 2023-05-10 | 2023-05-10 | Node processing method, electronic equipment and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310526416.6A CN116701051A (en) | 2023-05-10 | 2023-05-10 | Node processing method, electronic equipment and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116701051A (en) | 2023-09-05 |
Family ID: 87830283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310526416.6A Pending CN116701051A (en) | 2023-05-10 | 2023-05-10 | Node processing method, electronic equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116701051A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312448A (en) * | 2023-09-26 | 2023-12-29 | 济南浪潮数据技术有限公司 | Cluster data synchronization method, system, storage medium and device |
2023-05-10: application CN202310526416.6A filed in CN (CN116701051A, status: Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112015583B (en) | A method, device and system for data storage | |
CN103136114B (en) | Storage means and memory storage | |
CN102629219B (en) | Reduce in parallel computation frame holds adaptive load balancing method | |
US20230244694A1 (en) | Database system, computer program product, and data processing method | |
CN112948120A (en) | Load balancing method, system, device and storage medium | |
CN110321225B (en) | Load balancing method, metadata server and computer readable storage medium | |
CN111580959B (en) | Data writing method, data writing device, server and storage medium | |
CN112422611B (en) | Virtual bucket storage processing method and system based on distributed object storage | |
CN111563070A (en) | A method and device for storing the result of a Hash algorithm | |
CN113986846B (en) | Data processing method, system, device and storage medium | |
CN114157583B (en) | Reliability-based network resource heuristic mapping method and system | |
CN116701051A (en) | Node processing method, electronic equipment and computer storage medium | |
CN106550006A (en) | Cloud Server resource allocation methods and device | |
KR20180109921A (en) | Data storage and service processing methods and devices | |
CN101963978B (en) | Distributed database management method, device and system | |
CN108153759B (en) | Data transmission method, middle-tier server and system for distributed database | |
CN107948229B (en) | Distributed storage method, device and system | |
CN103905512B (en) | A kind of data processing method and equipment | |
US10606478B2 (en) | High performance hadoop with new generation instances | |
US11507313B2 (en) | Datafall: a policy-driven algorithm for decentralized placement and reorganization of replicated data | |
CN117785952A (en) | Data query method, device, server and medium | |
CN110046040B (en) | Distributed task processing method and system and storage medium | |
CN109787899B (en) | Data partition routing method, device and system | |
CN111443872A (en) | Distributed storage system construction method, device, equipment, medium | |
CN112559022A (en) | Jenkins high-availability system and method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |