
CN119201002A - Data copy storage and access method and device of distributed system - Google Patents


Info

Publication number
CN119201002A
Authority
CN
China
Prior art keywords
physical
physical nodes
group
distributed system
physical node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411466064.0A
Other languages
Chinese (zh)
Inventor
李树毫
付胜博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202411466064.0A priority Critical patent/CN119201002A/en
Publication of CN119201002A publication Critical patent/CN119201002A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 - Organizing or formatting or addressing of data
    • G06F3/0644 - Management of space entities, e.g. partitions, extents, pools

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present disclosure provides a data copy storage method, access method, apparatus, electronic device, computer-readable storage medium, and computer program product for a distributed system, relating to the field of information processing technology, and specifically to artificial intelligence fields such as distributed storage. The storage method includes: determining a weight value for each physical node based on state information of the physical node in the distributed system, where the state information includes the node's load information; combining physical nodes that belong to different logical partitions of the distributed system into a placement group based on their weight values; and storing the multiple shards of a data copy on physical nodes of the same placement group. While ensuring the security of the data copy, the method speeds up system data recovery, improves the system's load balancing capability, and reduces the overall storage cost of the system.

Description

Data copy storage and access method and device of distributed system
Technical Field
The present disclosure relates to the field of information processing technologies, in particular to artificial intelligence technologies such as distributed storage, and more specifically to a data copy storage method, access method, apparatus, electronic device, computer-readable storage medium, and computer program product for a distributed system.
Background
With the development, popularization, and application of computer and information technologies, data has grown explosively. The accumulated data are currently written to and stored in distributed systems, and how to use and process these data effectively while ensuring both the load balance of the distributed system and the security of data copies has become one of the research hot spots for distributed systems.
Disclosure of Invention
The embodiments of the present disclosure provide a data copy storage method, a data copy access method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for a distributed system, which improve the security of data copies and the load balancing capability of the distributed system.
In a first aspect, an embodiment of the present disclosure provides a data copy storage method for a distributed system, including: determining a weight value of a physical node based on state information of the physical node in the distributed system, where the state information includes load information of the physical node; combining physical nodes that belong to different logical partitions of the distributed system into a placement group based on the weight values of the physical nodes; and storing a plurality of shards of a data copy on physical nodes of the same placement group.
In a second aspect, an embodiment of the present disclosure provides a data copy access method for a distributed system, including: constructing an access list based on state information of the physical nodes where a plurality of shards of the data copy are located, where the plurality of shards are stored in a same placement group of the distributed system and the placement group includes physical nodes that belong to different logical partitions of the distributed system; and accessing the plurality of shards of the data copy according to the order of the physical nodes in the access list.
In a third aspect, an embodiment of the disclosure provides an electronic device, including at least one processor, and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the at least one processor to implement a data copy storage method of a distributed system as described in any one of the implementations of the first aspect or a data copy access method of a distributed system as described in any one of the implementations of the second aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement a data copy storage method of a distributed system as described in any one of the implementations of the first aspect or a data copy access method of a distributed system as described in any one of the implementations of the second aspect when executed.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, is capable of implementing a data copy storage method of a distributed system as described in any of the implementations of the first aspect or a data copy access method of a distributed system as described in any of the implementations of the second aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture in which the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for storing copies of data for a distributed system provided by embodiments of the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method for storing copies of data for a distributed system according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of updating the CPU idle rate of a physical node in a data copy storage method of a distributed system under an application scenario according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of yet another embodiment of a method for storing copies of data for a distributed system provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of combining placement groups in a data copy storage method of a distributed system under an application scenario provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the physical node distribution in the placement groups in the application scenario of FIG. 6;
FIG. 8 is a schematic diagram of updating a placement group in a data copy storage method of a distributed system under an application scenario provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of storing multiple shards of a data copy in a data copy storage method of a distributed system under an application scenario provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of storing multiple shards of a data copy in a data copy storage method of a distributed system under another application scenario provided by an embodiment of the present disclosure;
FIG. 11 is a flow chart of one embodiment of a method for accessing data copies of a distributed system provided by embodiments of the present disclosure;
FIG. 12 is a flow chart of another embodiment of a method for accessing data copies of a distributed system provided by an embodiment of the present disclosure;
FIG. 13 is a schematic diagram illustrating one embodiment of a data copy storage device of a distributed system according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram illustrating one embodiment of a data copy access apparatus for a distributed system according to an embodiment of the present disclosure;
Fig. 15 is a block diagram of an electronic device adapted to perform a data copy storage method or an access method of a distributed system according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the relevant laws and regulations, and do not violate public order and good customs.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of a data copy storage method of a distributed system, a data copy access method of a distributed system, an apparatus, an electronic device, and a computer-readable storage medium of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various applications for implementing information communication between the terminal devices 101, 102, 103 and the server 105 may be installed on the terminal devices, for example, a data copy storage application of the distributed system, a data copy access application of the distributed system, an instant messaging application, and the like.
The terminal devices 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like; when they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited herein. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server; when it is software, it may likewise be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited herein.
The server 105 can provide various services through various built-in applications. Taking a data copy storage application of the distributed system as an example, when running this application the server 105 can achieve the following effects: determining the weight value of a physical node based on the state information of the physical node in the distributed system, where the state information includes the load information of the physical node; combining physical nodes that belong to different logical partitions of the distributed system into a placement group based on the weight values; and storing a plurality of shards of the data copy on physical nodes of the same placement group.
It should be noted that the status information of the physical nodes may be stored in advance in the server 105 in various ways, in addition to being acquired from the terminal devices 101, 102, 103 through the network 104. Thus, when the server 105 detects that such data has been stored locally, it may choose to obtain the data directly from the local, in which case the exemplary system architecture 100 may not include the terminal devices 101, 102, 103 and the network 104.
Since determining the state information of physical nodes requires considerable computing resources and computing power, the data copy storage method or the data copy access method of the distributed system provided in the subsequent embodiments of the present disclosure is generally performed by the server 105, which has stronger computing power and more computing resources, and accordingly the data copy storage or access apparatus of the distributed system is also generally provided in the server 105. However, when the terminal devices 101, 102, 103 also have the required computing capability and computing resources, they may perform, through the data copy storage and access applications installed on them, the operations otherwise performed by the server 105 and output the same results. In particular, when multiple terminal devices with different computing capabilities coexist, and the application determines that its terminal device has stronger computing capability and more spare computing resources, the terminal device may perform the above operations, appropriately relieving the computing pressure on the server 105; correspondingly, the data copy storage or access apparatus of the distributed system may also be provided in the terminal devices 101, 102, 103. In this case, the exemplary system architecture 100 may not include the server 105 and the network 104.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of a data copy storing method of a distributed system according to an embodiment of the disclosure, where the flowchart 200 includes the following steps:
step 201, determining a weight value of a physical node based on state information of the physical node in the distributed system, wherein the state information comprises load information of the physical node.
In this embodiment, the execution body of the data copy storage method of the distributed system (for example, the server 105 shown in fig. 1) determines the weight value of a physical node based on the state information of the physical node in the distributed system. In a large-scale cluster scenario, taking a heterogeneous cluster as an example, the storage and computing performance of the physical nodes may differ. Determining the probability that a physical node receives data from its weight value, which in turn reflects its storage capability, can optimize the load balancing capability of the distributed system and reduce the storage cost of the distributed system as a whole.
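The mapping from weight values to the probability of receiving data described above can be illustrated with a short sketch. This is not code from the patent; the function name and the use of Python's `random.choices` for weighted selection are illustrative assumptions.

```python
import random

def pick_node(node_weights, rng):
    """Illustrative weighted selection: a node's chance of receiving a shard
    is proportional to its current weight value (hypothetical helper)."""
    nodes = list(node_weights)
    weights = [node_weights[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

# A node with three times the weight is picked roughly three times as often.
rng = random.Random(42)
draws = [pick_node({"node-a": 3.0, "node-b": 1.0}, rng) for _ in range(10000)]
```

Under this sketch, "node-a" receives about 75% of the shards, matching its share of the total weight.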
Step 202, based on the weight values of the physical nodes, combining physical nodes that belong to different logical partitions of the distributed system into a placement group.
In this embodiment, the execution body uses a rack-aware policy to combine physical nodes that belong to different logical partitions of the distributed system into a placement group, which ensures that the shards of a data copy fall into different logical partitions, such as different racks, switches, machine rooms, or regions, thereby safeguarding the security of the data copy.
Step 203, storing the plurality of shards of the data copy on physical nodes of the same placement group.
In this embodiment, based on step 202, the execution body combines physical nodes from different logical partitions into multiple placement groups and stores the multiple shards of a data copy on the physical nodes of the same placement group. This reduces the probability of losing a data copy when physical nodes in different logical partitions fail. Moreover, when multiple physical nodes fail simultaneously, as long as they do not all belong to the same placement group, the shards of a data copy cannot all be lost.
The data copy storage method of the distributed system described above reduces the probability of losing a data copy when physical nodes in different logical partitions fail. In other words, the method provided by the embodiments of the present disclosure determines the probability that a physical node receives data based on the node's weight value and stores each shard of a data copy in a different logical partition. This ensures the security of the data copy, speeds up system data recovery, optimizes the load balancing capability of the system, and reduces the storage cost of the system as a whole.
With further continued reference to FIG. 3, a flow 300 of another embodiment of a data copy storage method of a distributed system according to the present disclosure is shown. The storage method comprises the following steps:
Step 301, determining a weight value of a physical node based on state information of the physical node in the distributed system, wherein the state information includes load information of the physical node.
In this embodiment, the specific operation of step 301 is described in detail in step 201 in the embodiment shown in fig. 2, and will not be described herein.
Step 302, dynamically updating the state information of the physical node. In this embodiment, the state information of the physical node may be dynamically updated according to the node's multidimensional dynamic load.
In an optional implementation of this embodiment, the load information of a physical node may include a CPU idle rate, a memory idle rate, a disk idle rate, and a network idle rate, which together characterize the node's storage capability when it is completely idle. The weight value of the physical node is determined from its storage capability, and the probability that the node receives data is determined from its weight value, so the load balancing capability of the distributed system can be optimized.
For example, the state information of the physical nodes may be updated based on dynamic changes in network load in the distributed system. In addition, the state information of the physical nodes can be updated according to the dynamic change of the memory load in the distributed system. Optionally, the state information of the physical node may also be updated according to dynamic changes in the computational load in the distributed system. In addition, the state information of the physical nodes can be updated according to the dynamic change of the disk load in the distributed system.
Further, in some embodiments of the present disclosure, dynamically updating the state information of the physical node further includes performing a smoothing process on the CPU idle rate data set, the memory idle rate data set, the disk idle rate data set, and the network idle rate data set during the update period.
The idle rates change in real time, and noise effects there can be substantial; for example, high-frequency fluctuations may cause the distributed system to oscillate. Therefore, to reduce the impact of noise in the real-time idle rates on the weight value of a physical node, adjusting the weight value based on the minimum of the updated CPU, memory, disk, and network idle rates may include: smoothing the data sets of the CPU idle rate, the memory idle rate, the disk idle rate, and the network idle rate over the update period; and determining the minimum of the updated CPU, memory, disk, and network idle rates from the respective smoothed data sets.
Alternatively, the idle rate data set may be smoothed using a simple moving average (SMA) to remove short-term fluctuations, making long-term trends easier to observe.
Fig. 4 is a schematic diagram of updating CPU idle rate of a physical node in a data copy storage method of a distributed system in an application scenario according to an embodiment of the present disclosure.
As shown in fig. 4, taking the update of a physical node's CPU idle rate as an example, the exponentially weighted moving average (EWMA) of the CPU idle rate in the update period can be expressed by formula (1):
EWMA_t = α·x_t + (1 - α)·EWMA_{t-1}   (1)
where EWMA_t denotes the exponentially weighted moving average of the CPU idle rate in the update period at time t, α denotes a smoothing factor taking a value between 0 and 1, x_t denotes the true value of the CPU idle rate at time t, and EWMA_{t-1} denotes the exponentially weighted moving average of the CPU idle rate at time t-1.
Taking as an example an update period whose CPU idle rate data set includes x_1, x_2, x_3, ..., x_t, with the smoothing factor α chosen as 0.3, the calculation process can be expressed as follows:
EWMA_1 = α·x_1 + (1 - α)·EWMA_0
EWMA_2 = α·x_2 + (1 - α)·EWMA_1
EWMA_3 = α·x_3 + (1 - α)·EWMA_2
...
EWMA_t = α·x_t + (1 - α)·EWMA_{t-1}   (2)
After setting the initial value of the CPU idle rate to 50, with data points x_1 = 55, x_2 = 45, x_3 = 50, ..., x_t and smoothing factor α = 0.3, the effect is as shown in fig. 4. Smoothing the idle rate removes short-term fluctuations from the idle rate data set within the update period, which makes the long-term trend easier to observe and reduces the impact of noise in the real-time idle rate on the weight value of the physical node.
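Formula (2) and the worked example above can be checked with a short script. The function name is an illustrative assumption, not part of the patent.

```python
def ewma(values, alpha=0.3, initial=50.0):
    """EWMA_t = alpha * x_t + (1 - alpha) * EWMA_{t-1}, seeded with EWMA_0 = initial."""
    smoothed, prev = [], initial
    for x in values:
        prev = alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

# With x_1 = 55, x_2 = 45, x_3 = 50 and alpha = 0.3 as in the text:
# EWMA_1 = 0.3*55 + 0.7*50.0  = 51.5
# EWMA_2 = 0.3*45 + 0.7*51.5  = 49.55
# EWMA_3 = 0.3*50 + 0.7*49.55 = 49.685
series = ewma([55, 45, 50])
```

Note how each smoothed value moves only a fraction α of the way toward the new measurement, which is what suppresses high-frequency fluctuations.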
Alternatively, in some embodiments of the present disclosure, the state information of a physical node may be updated on a preset period (for example, a second preset period), or updated when the load information of the physical node satisfies a preset condition. By monitoring the load and resource usage of physical nodes and determining the amount of remaining resources, the weight values of the physical nodes can be adjusted dynamically based on their real-time state information, which ensures that physical nodes with larger weight values are not continuously overloaded and that the resources of all physical nodes in the distributed system are fully utilized. Real-time monitoring and inspection can passively trigger changes to the weight values of physical nodes, thereby redistributing the probabilities with which physical nodes receive data.
Step 303, adjusting the weight value of the physical node based on the updated state information of the physical node.
In this embodiment, the state information of a physical node may be dynamically updated according to the node's multidimensional dynamic load, and the weight value of the physical node may be adjusted based on the updated state information. The weight value can be adjusted dynamically according to at least one of the node's capacity, performance, and current load, which ensures a reasonable distribution of the shards of each data copy and reduces the occurrence of overload on physical nodes.
For example, the weight value of a physical node may be affected by dynamic changes in network load, and the product of the node's initial weight value and its network idle rate can characterize the extent of that effect. Likewise, the products of the initial weight value with the memory idle rate, the CPU idle rate, and the disk idle rate characterize the extent to which the weight value is affected by dynamic changes in memory load, computational load, and disk load, respectively.
After determining the above load effects, the value with the greatest influence on the node's weight can be taken as the updated weight value according to the bucket principle. In other words, the weight value of the physical node may be adjusted based on the minimum of the updated CPU idle rate, memory idle rate, disk idle rate, and network idle rate.
By dynamically collecting the state information of physical nodes, such as the CPU idle rate, memory idle rate, and network idle rate, and intelligently adjusting each node's weight value based on this information, finer-grained load balancing can be achieved. The adjustment of a node's weight value is based on the node's actual load.
In other words, the weight value of a physical node may be adjusted based on its dynamic state information. For example, a heavily loaded physical node may be given a lower weight value, while a lightly loaded one may be given a higher weight value. Further, in response to the adjusted weight value of a physical node (for example, a first physical node) falling below a first preset threshold, that node may stop receiving shards of data copies. This dynamic adjustment ensures a reasonable allocation and utilization of resources in the distributed system, and the first preset threshold ensures that qualifying nodes no longer receive a disproportionate share of data, avoiding data skew.
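The bucket-principle adjustment and the first-preset-threshold check described above can be sketched as follows. The function names and the concrete threshold value are illustrative assumptions, not values from the patent.

```python
def adjusted_weight(initial_weight, cpu_idle, mem_idle, disk_idle, net_idle):
    """Bucket principle: the most loaded dimension (smallest idle rate)
    bounds the node's capacity, so it alone scales the initial weight."""
    return initial_weight * min(cpu_idle, mem_idle, disk_idle, net_idle)

def accepts_shards(weight, first_threshold=10.0):
    """A node whose adjusted weight falls below the (hypothetical) first
    preset threshold stops receiving shards, avoiding data skew."""
    return weight >= first_threshold

# A node with 80% CPU idle but only 5% network idle is treated as nearly full:
w = adjusted_weight(100.0, cpu_idle=0.8, mem_idle=0.6, disk_idle=0.9, net_idle=0.05)
```

Here the network dimension dominates, so the node's weight drops to 5.0 and it would stop accepting new shards until its load eases.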
Step 304, based on the updated weight values of the physical nodes, combining physical nodes that belong to different logical partitions of the distributed system into a placement group.
In this embodiment, the specific operation of step 304 is described in detail in the embodiment shown in fig. 2 at step 202, and will not be described herein.
In an optional implementation of this embodiment, the logical partition may include at least one of a rack partition, a switch partition, a machine room partition, and a zone partition. Placing data copies across logical partitions improves the reliability and fault tolerance of data copy storage, reduces the bandwidth consumed by cross-partition data transfers, improves data read/write performance, balances the load among logical partitions, and reduces the risk of data loss caused by the failure of a single logical partition. By selecting physical nodes from different logical partitions to form a placement group according to the division of logical partitions, the system can tolerate the failure of entire logical partitions, up to one fewer than the number of shards in the data copy.
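One way to realize the rack-aware selection described above is sketched below, assuming nodes are grouped by logical partition and the highest-weight node is preferred in each partition. The data layout, tie-breaking rule, and function name are illustrative assumptions.

```python
def build_placement_group(nodes_by_partition, group_size):
    """Pick at most one node per logical partition (rack, switch, machine
    room, or zone) so that each shard lands in a distinct partition."""
    if group_size > len(nodes_by_partition):
        raise ValueError("not enough logical partitions for the group size")
    # Best candidate (highest weight) inside each partition.
    best = [max(nodes, key=lambda n: n[1]) for nodes in nodes_by_partition.values()]
    # Prefer partitions whose best node currently has the most spare capacity.
    best.sort(key=lambda n: n[1], reverse=True)
    return [node_id for node_id, _weight in best[:group_size]]

group = build_placement_group(
    {"rack-1": [("n11", 3.0), ("n12", 5.0)],
     "rack-2": [("n21", 4.0)],
     "rack-3": [("n31", 2.0)]},
    group_size=3,
)
```

Because every member comes from a different partition, losing one whole rack can cost at most one shard of any copy stored in this group.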
In step 305, the multiple shards of the data copy are stored on the physical nodes of the same placement group.
In this embodiment, the specific operation of step 305 is described in detail in step 203 in the embodiment shown in fig. 2, and will not be described herein.
In an alternative implementation of this embodiment, a plurality of different placement-group combination schemes may be formed randomly, where the distribution of some of the physical nodes may differ across the different schemes. On the premise that the data recovery time is guaranteed, this yields an optimal distribution strategy for the shards of the data copies.
With further continued reference to FIG. 5, a flow 400 of another embodiment of a data copy storage method of a distributed system according to the present disclosure is shown. The method comprises the following steps:
step 401, determining a weight value of a physical node based on state information of the physical node in the distributed system, wherein the state information includes load information of the physical node.
In this embodiment, the specific operation of step 401 is described in detail in step 201 in the embodiment shown in fig. 2 or step 301 in the embodiment shown in fig. 3, and will not be described herein.
Step 402: determine the required number of physical nodes in a placement group.
In this embodiment, the number of placement groups may be determined based on the number of shards in the data copy and the number of physical nodes in the distributed system, and the required number of physical nodes in each placement group may be determined based on the number of physical nodes in the distributed system and the number of placement groups. On the premise that the data recovery time is guaranteed, this yields an optimal distribution strategy for the shards of the data copies.
For example, take the number of physical nodes in the distributed system as N and the number of shards in the data copy as R. Multiple rounds of computation may be performed: each round partitions the N physical nodes into N/R placement groups (Copysets) of R nodes each. Each round ensures that the placement groups it generates differ from those generated in the current and all previous rounds; after the placement groups are determined, at least one of them is selected for storing a copy of the data.
In addition, the copies of the data on one disk are scattered over other disks, and the scatter width S characterizes the number of distinct disks holding those copies. The larger the scatter width, the more nodes participate in data recovery and the faster recovery completes; the recovery speed is positively correlated with the scatter width. Thus, with the required recovery speed fixed, the scatter width S needed for data recovery can be calculated. Each round described above increases S by R-1, so the number of rounds P can be expressed as P = S/(R-1), and the number of placement groups K as K = P*(N/R) = S*(N/R)/(R-1).
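The round-based construction above can be sketched as follows (a simplified variant that uses random permutations; the function name and shuffle strategy are illustrative assumptions, not the disclosure's exact algorithm):

```python
import random

def build_copysets(nodes, r, scatter_width):
    """Run P = S/(R-1) rounds; each round shuffles the node list and
    slices the permutation into groups of R, so each round adds R-1
    to every node's scatter width."""
    rounds = scatter_width // (r - 1)
    copysets = set()
    for _ in range(rounds):
        perm = nodes[:]
        random.shuffle(perm)
        for i in range(0, len(perm) - r + 1, r):
            copysets.add(frozenset(perm[i:i + r]))
    return copysets

nodes = [f"Node{i}" for i in range(1, 10)]      # N = 9
groups = build_copysets(nodes, r=3, scatter_width=4)  # P = 2 rounds
# Each round yields N/R = 3 groups; distinct groups total at most P*(N/R) = 6.
assert all(len(g) == 3 for g in groups)
assert 3 <= len(groups) <= 6
```

With random shuffles, two rounds may occasionally repeat a group, which is why the count is bounded above by P*(N/R) rather than exactly equal to it.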
Fig. 6 is a schematic diagram of combining placement groups in the data copy storage method of a distributed system under an application scenario according to an embodiment of the disclosure. Fig. 7 is a schematic diagram of the physical node distribution in the placement groups in the application scenario of fig. 6.
As shown in fig. 6, in this application scenario, the 9 physical nodes (Nodes) in the distributed system may be combined to form multiple placement groups (Copysets). The 9 physical nodes are Node1, Node2, Node3, Node4, Node5, Node6, Node7, Node8, and Node9. In combination scheme one, three placement groups may be formed: Copyset1, Copyset2, and Copyset3. Copyset1 may include Node1, Node7, and Node5; Copyset2 may include Node4, Node6, and Node9; and Copyset3 may include Node2, Node3, and Node8. In addition, based on the required number of physical nodes in a placement group, the physical nodes may be combined into other placement groups, for example under combination scheme two. In combination scheme two, three placement groups may be formed: Copyset4, Copyset5, and Copyset6. Copyset4 may include Node2, Node7, and Node9; Copyset5 may include Node1, Node6, and Node3; and Copyset6 may include Node4, Node5, and Node8.
As shown in fig. 7, through the placement-group combination procedure of fig. 6, physical node Node2 is distributed to placement group Copyset3 under combination scheme one and to placement group Copyset4 under combination scheme two. In other words, the number of placement groups may be determined based on the number of shards in the data copy and the number of physical nodes in the distributed system, and the required number of physical nodes in each placement group may be determined based on the number of physical nodes in the distributed system and the number of placement groups. Having determined the required number of physical nodes per placement group, a plurality of different combination schemes may be formed randomly, in which the distribution of some of the physical nodes differs. On the premise that the data recovery time is guaranteed, this yields an optimal distribution strategy for the shards of the data copies.
In addition, in the case of random placement, if a storage strategy that merges small files into large files is adopted, the size of the large files can be controlled, and thus the number of files on each disk can be controlled. For example, with 100 GB large files, the maximum number of files stored on an 8 TB disk is 8 TB / 100 GB = 80. In this way, the degree to which data copies are scattered, that is, the scatter width required for data recovery, can be controlled.
Step 403: based on the required number of physical nodes in a placement group, combine physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range into the placement group.
In this embodiment, after the required number of physical nodes in a placement group is determined, physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range may be combined into the placement group.
Specifically, by the bucket (weakest-link) principle, the storage capacity of a placement group is limited by the physical node with the smallest storage capacity in the group. Therefore, to ensure that all physical nodes in a placement group can be fully utilized, nodes may be chosen, on the basis that they belong to different logical partitions, so that their weight values fall within a predetermined range, keeping the weights as nearly equal as possible. For example, physical nodes having equal weight values and belonging to different logical partitions are combined into a placement group.
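A minimal sketch of this selection rule, assuming a tuple-based node layout and an illustrative weight tolerance of 0.1 (both assumptions, not values from the disclosure):

```python
def form_group(nodes, r, tolerance=0.1):
    """Pick r nodes from distinct logical partitions whose weights
    all lie within `tolerance` of a candidate base weight.
    `nodes` is a list of (name, partition, weight) tuples."""
    nodes = sorted(nodes, key=lambda n: -n[2])  # prefer high weight
    for _, _, base_w in nodes:
        group, parts = [], set()
        for name, part, w in nodes:
            if part not in parts and abs(w - base_w) <= tolerance:
                group.append(name)
                parts.add(part)
            if len(group) == r:
                return group
    return None

nodes = [("NodeA", "rack1", 0.9), ("NodeB", "rack2", 0.85),
         ("NodeC", "rack3", 0.88), ("NodeD", "rack2", 0.4)]
# NodeD's weight is too far from the others, so it is skipped.
assert form_group(nodes, r=3) == ["NodeA", "NodeC", "NodeB"]
```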
In some embodiments of the present disclosure, to optimize the load balancing capability of the distributed system and reduce the probability of data skew, the weight value of a placement group may be dynamically adjusted. Optionally, this may include adjusting the weight value of the placement group based on a predetermined period (e.g., a first predetermined period), or adjusting it in response to the remaining scatter value of a physical node in the placement group being less than a first predetermined value, where the remaining scatter value characterizes the number of additional placement groups the physical node can still join.
For example, an initial weight value of the placement group may be determined based on the weight values of the physical nodes in it, and then adjusted dynamically on the predetermined period or in response to the remaining scatter value condition above. The weight value of a placement group represents its processing capacity, so a group with a large weight value receives data at a higher rate. Periodically evaluating and dynamically adjusting the weight values of the placement groups optimizes the load balancing capability of the distributed system and reduces the probability of data skew.
In addition, in some embodiments of the present disclosure, combining physical nodes belonging to different logical partitions of the distributed system into placement groups based on their weight values may further include removing, from a placement group, physical nodes whose remaining scatter value is less than 0, and, based on the number of physical nodes remaining in the group after the removal, supplementing the group with physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range.
Specifically, when a physical node joins a placement group, its state information may be initialized: its logical partition flag, e.g. RackID, may be set according to the logical partition it belongs to, and its weight value, e.g. Base Weight, may be determined from its state information. In addition, a maximum scatter value may be set for the physical node according to the capacity plan. The maximum scatter value determines the maximum number of physical nodes that jointly recover the node's data when it fails, and thus characterizes the data recovery speed.
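A sketch of the node-initialization state just described (the field names and dataclass layout are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class PhysicalNode:
    """State initialized when a node joins: partition flag, base
    weight from state information, and a planned scatter budget."""
    name: str
    rack_id: str          # logical partition flag (RackID)
    base_weight: float    # derived from the node's state information
    max_scatter: int      # max peers that help recover this node's data
    remaining_scatter: int = 0

    def __post_init__(self):
        # On joining, the node can still participate in max_scatter groups.
        self.remaining_scatter = self.max_scatter

node = PhysicalNode("NodeA", rack_id="rack1", base_weight=0.9, max_scatter=4)
assert node.remaining_scatter == 4
```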
Fig. 8 is a schematic diagram of updating placement groups in the data copy storage method of a distributed system under an application scenario according to an embodiment of the disclosure.
As shown in fig. 8, in step 4031, when physical node information is added to the min-heap (for any one of the placement groups), the heap may be ordered by remaining scatter value, preferentially returning the physical nodes with the largest remaining scatter values. This improves the resource utilization of the distributed system by, as far as possible, forming the most idle physical nodes into new placement groups for data placement.
In addition, in step 4032, the physical node with the smallest remaining scatter value may be taken from the heap, and other physical nodes are sought to form a new placement group with it. While looking for other physical nodes in step 4033, their RackIDs may be required to be unequal, which ensures they come from different logical partitions, tolerating failures at the logical partition level. Furthermore, the remaining scatter values of these other physical nodes must be greater than zero. In addition, their weight values may be required to lie within a predetermined range of the weight value of the node with the smallest remaining scatter value; that is, the weight values of the nodes forming the group should be as nearly equal as possible. By the bucket principle, the storage capacity of a placement group is limited by the node with the smallest storage capacity, so to ensure that all nodes in the group can be fully utilized, the weight values of its nodes may be constrained to a predetermined range.
When condition 4034 is satisfied, that is, the number of other physical nodes found plus the node taken from the heap equals the number of shards in the data copy, step 4035 may be executed: the search of step 4033 ends, and the remaining scatter value of each physical node in the new placement group is decremented by 1. Thereafter, step 4036 may be executed to add the nodes of the newly combined placement group back to the min-heap.
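The heap-driven loop of steps 4031 through 4036 can be sketched as follows (a simplified single-pass version; the data layout, tolerance, and function name are illustrative assumptions):

```python
import heapq

def next_group(nodes, r, tolerance=0.1):
    """nodes: dict name -> (rack_id, weight, remaining_scatter).
    Take the node with the smallest remaining scatter value (4032),
    then add peers from other racks with close weights (4033) until
    the group has r nodes (4034), decrementing each member's
    remaining scatter value on success (4035)."""
    # Min-heap keyed by remaining scatter value (step 4031).
    heap = [(rem, name) for name, (_, _, rem) in nodes.items() if rem > 0]
    heapq.heapify(heap)
    if not heap:
        return None
    _, seed = heapq.heappop(heap)
    seed_rack, seed_w, _ = nodes[seed]
    group, racks = [seed], {seed_rack}
    # Prefer the most idle peers: largest remaining scatter first.
    for name in sorted(nodes, key=lambda n: -nodes[n][2]):
        rack, w, rem = nodes[name]
        if name != seed and rack not in racks and rem > 0 \
                and abs(w - seed_w) <= tolerance:
            group.append(name)
            racks.add(rack)
        if len(group) == r:
            for n in group:  # each member used up one scatter slot
                rack, w, rem = nodes[n]
                nodes[n] = (rack, w, rem - 1)
            return group
    return None

nodes = {"A": ("r1", 0.9, 1), "B": ("r2", 0.9, 2), "C": ("r3", 0.9, 2)}
assert next_group(nodes, r=3) == ["A", "B", "C"]
assert nodes["A"] == ("r1", 0.9, 0)  # A's scatter budget is now spent
```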
The placement groups provided by the embodiments of the disclosure can be scaled horizontally and dynamically: placement groups can be deleted and recombined while the combinations of unrelated placement groups remain unchanged. For example, in the process of seeking other physical nodes to form a new placement group around the node with the smallest remaining scatter value in the min-heap, only the placement groups related to that node need be deleted and reorganized.
The remaining scatter value of physical nodes already combined into placement groups can be obtained by reverse calculation. During horizontal scale-in, only the placement groups related to the physical node being deleted need be cleared; based on the number of physical nodes remaining in those groups after the removal, physical nodes belonging to different logical partitions and with weight values within the predetermined range are supplemented into the groups. During horizontal scale-out, the remaining scatter values of all physical nodes can be obtained by reverse calculation, the physical nodes whose remaining scatter values meet the requirement are brought into the computation, the qualifying physical nodes are determined, and those nodes are combined into new placement groups.
At step 404, the multiple shards of the data copy are stored on the physical nodes of the same placement group.
In this embodiment, the specific operation of step 404 is described in detail in step 203 in the embodiment shown in fig. 2 or step 304 in the embodiment shown in fig. 3, and will not be described herein.
In addition, fig. 9 is a schematic diagram of storing multiple shards of a data copy in the data copy storage method of a distributed system under an application scenario according to an embodiment of the disclosure.
As shown in FIG. 9, in an alternative implementation of the present embodiment, physical nodes belonging to different logical partitions of the distributed system are combined into placement groups based on their weight values, and multiple shards of a data copy are stored on the physical nodes of the same placement group. This covers fault-domain failures at various levels, realizes dynamic planning of placement groups, and supports dynamic horizontal scaling of placement groups; in addition, introducing weight values for physical nodes and for placement groups supports heterogeneous-cluster application scenarios.
Specifically, physical nodes belonging to different logical partitions of the distributed system may be combined into placement groups according to the partition division. For example, the distributed system shown in FIG. 9 may include three logical partitions: logical partition 1, logical partition 2, and logical partition 3, with physical node NodeA from logical partition 1, physical nodes NodeB and NodeD from logical partition 2, and physical nodes NodeC and NodeE from logical partition 3. Based on the weight values, physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range are combined into placement groups. For example, NodeA, NodeB, and NodeC may be combined into placement group 1, and NodeA, NodeD, and NodeE into placement group 2. The shards of the data copy may include shards 1, 2, 3, 4, and 5; shard 2, for example, may be stored on NodeA, NodeB, and NodeC of placement group 1, and shard 4 on NodeA, NodeD, and NodeE of placement group 2. Thus, all shards of the data copy are lost only if NodeA, NodeB, and NodeC fail simultaneously, or NodeA, NodeD, and NodeE do. Since the physical nodes in a placement group come from different logical partitions, the system can tolerate the failure of any number of logical partitions smaller than the number of shards of the data copy, greatly reducing the risk of data loss when physical nodes in different logical partitions fail.
Furthermore, for the distribution of data copies, the overall capacity of a placement group is limited by its weakest physical node, so in this example the probability of data landing on placement group 1 is greater.
Fig. 10 is a schematic diagram of storing multiple shards of a data copy in the data copy storage method of a distributed system under another application scenario according to an embodiment of the disclosure.
As shown in fig. 10, in some embodiments of the present disclosure, the data landing rate of a placement group with a large weight value is high. Optionally, the data landing rate P is proportional to the placement group weight value CW and inversely proportional to the greatest common divisor GCD(CW1, CW2, ..., CWn) of the weight values of all physical nodes in the placement group. Specifically, this relationship can be expressed by formula (3):
P = CW / GCD(CW1, CW2, ..., CWn)    (3)
where P is the data landing rate, CW is the weight value of the placement group, and GCD(CW1, CW2, ..., CWn) is the greatest common divisor of the weight values of all physical nodes in the placement group.
The Shuffle shown in fig. 10 refers to the process of repartitioning the shards of the data copy. A slot refers to a data structure used to store a shard of the data copy. Optionally, slots are allocated according to the data landing rate P, so that the slot attributes carry the features of the placement group (Copyset). When selecting the placement of data and distributing data-shard slots, the placement group corresponding to a data shard can be determined, and the data landing rate P determines the number of slots corresponding to each placement group; for example, Copyset1 corresponds to P1 = 3 slots, Copyset2 to P2 = 2 slots, and Copyset3 to P3 = 1 slot. All Copyset slots are then shuffled, and slots are taken in order for the incoming data shards, guaranteeing that all slots are allocated with equal opportunity and making the probability control of the distribution method more accurate.
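A sketch of slot allocation and shuffling (treating placement-group weight values as integers and taking their collective GCD, a simplifying assumption; function names are illustrative):

```python
import math
import random

def build_slots(group_weights):
    """Each group gets weight // gcd slots (in the spirit of formula
    (3)); the slot list is then shuffled so shards draw groups with
    probability proportional to group weight."""
    g = math.gcd(*group_weights.values())
    slots = [grp for grp, w in group_weights.items() for _ in range(w // g)]
    random.shuffle(slots)
    return slots

weights = {"Copyset1": 3, "Copyset2": 2, "Copyset3": 1}
slots = build_slots(weights)
assert len(slots) == 6
assert slots.count("Copyset1") == 3 and slots.count("Copyset3") == 1

# Incoming shards take slots in order:
shards = [f"shard{i}" for i in range(1, 7)]
assignment = dict(zip(shards, slots))
```

Note that `math.gcd` accepts multiple arguments from Python 3.9 onward.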
Therefore, the data copy storage method of the distributed system provided by the embodiments of the disclosure determines the probability that a physical node bears data based on the node's weight value, and stores each shard of the data copy in a different logical partition, thereby ensuring data copy security, improving the speed of system data recovery, optimizing the load balancing capability of the system, and reducing the overall storage cost of the system.
Referring to fig. 11, fig. 11 is a flowchart of a method for accessing a data copy of a distributed system according to an embodiment of the disclosure, where the flowchart 500 includes the following steps:
Step 501: construct an access list based on the state information of the physical nodes where the multiple shards of a data copy are located, where the shards are stored in the same placement group of the distributed system and the placement group includes physical nodes belonging to different logical partitions of the distributed system.
Since the features described above for the data copy storage method of the distributed system may apply, in whole or in part, to step 501 ("the shards are stored in the same placement group of the distributed system, and the placement group includes physical nodes belonging to different logical partitions"), related or similar content will not be described again. Those skilled in the art will appreciate that the shards accessed by the data copy access method of the distributed system are stored on the physical nodes described in the data copy storage method, so the characteristics, implementation principles, and technical effects of step 501 may be similar to those of the storage method.
In some alternative implementations of the present embodiment, in the access list, the physical nodes of shards in the same logical partition as the accessing client are arranged before the physical nodes of shards in a different logical partition. In addition, when the shards of the same-region data copy are unavailable, the shards of the cross-region data copy are selected, ensuring high availability of the data.
Further, in some embodiments of the present disclosure, the physical nodes in the access list may be arranged in ascending order of their number of waiting requests. Alternatively, they may be arranged in descending order of their historical access success rate. In addition, they may be arranged in ascending order of their historical access latency.
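The ordering policies above can be sketched as one composite sort key (the field names and function name are illustrative assumptions):

```python
def build_access_list(nodes, client_partition):
    """nodes: list of dicts with 'name', 'partition', 'waiting',
    'success_rate', 'latency_ms'. Same-partition nodes come first;
    within that, fewer waiting requests, higher success rate, and
    lower latency win, in that priority order."""
    return sorted(nodes, key=lambda n: (
        n["partition"] != client_partition,  # False (0) sorts first
        n["waiting"],
        -n["success_rate"],
        n["latency_ms"],
    ))

nodes = [
    {"name": "NodeB", "partition": "z2", "waiting": 1,
     "success_rate": 0.99, "latency_ms": 30},
    {"name": "NodeA", "partition": "z1", "waiting": 5,
     "success_rate": 0.95, "latency_ms": 12},
    {"name": "NodeC", "partition": "z1", "waiting": 2,
     "success_rate": 0.97, "latency_ms": 15},
]
order = [n["name"] for n in build_access_list(nodes, client_partition="z1")]
assert order == ["NodeC", "NodeA", "NodeB"]
```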
Step 502: access the multiple shards based on the order of the physical nodes in the access list.
In this embodiment, by creating the access list, the user can access the shards sequentially in the order of the physical nodes in the access list. When an access exception occurs, the physical nodes in the access list can be selected in turn for retry.
With further continued reference to FIG. 12, a flow 600 of another embodiment of a data copy access method of a distributed system according to the present disclosure is shown. The process 600 includes the steps of:
Step 601: construct an access list based on the state information of the physical nodes where the multiple shards of the data copy are located, where the shards are stored in the same placement group of the distributed system and the placement group includes physical nodes belonging to different logical partitions of the distributed system.
In this embodiment, the specific operation of step 601 is described in detail in step 501 in the embodiment shown in fig. 11, and will not be described herein.
Step 602: update the order of the physical nodes in the access list.
In this embodiment, before the shards are accessed in the order of the physical nodes in the access list, that order may be updated.
Specifically, the placement group holds the multiple shards of the data copy, so by accessing the placement group, the shards stored on its physical nodes can be obtained. By establishing the access list, the user can access the nodes sequentially in the listed order, and when an access exception occurs, the nodes can be selected in turn for retry. The order of the physical nodes in the access list therefore determines the client's access order. Optionally, that order may be rearranged while accessing the shards of the data copy, to ensure that the best access service is provided and that, in the event of an access failure, the access operation is resumed and completed again as soon as possible.
In addition, when reordering the physical nodes in the access list according to an access policy to optimize access performance, the ordering may follow the priorities of the following policies:
For example, updating the order of the physical nodes in the access list may include arranging the physical nodes of shards in the same logical partition as the accessing client before those of shards in a different logical partition. Preferring shards of the data copy in the same logical partition as the request source reduces network latency and cross-region data transmission costs; when the shards of the same-region data copy are unavailable, the shards of the cross-region data copy are selected to ensure high availability of the data.
Further, in some embodiments of the present disclosure, updating the order may include arranging the physical nodes in ascending order of their number of waiting requests. Preferring shards on nodes with fewer waiting requests reduces access latency and balances the load in the distributed system.
Alternatively, updating the order may include arranging the physical nodes in descending order of their historical access success rate. Preferring shards with a higher historical access success rate improves access reliability.
In addition, updating the order may include arranging the physical nodes in ascending order of their historical access latency. Preferring shards with lower historical access latency reduces access response time.
In addition, updating the order of the physical nodes in the access list may further include balancing the access requests across the placement groups as far as possible, reducing the probability of an access bottleneck caused by accessing only a single placement group. Take, for example, a distributed system containing three placement groups A, B, and C that store shards of the same data copy. In the initial state, the access list visits the physical nodes of placement groups A, B, and C in turn. If accesses to the physical nodes of placement group A fail during the access process, the order of the physical nodes in the access list can be rearranged while accessing the shards of the data copy, so that the best access service is provided and the access operation is resumed and completed as soon as possible despite the failure. For example, by visiting the physical nodes of placement groups B, C, and A in turn, the shards of the data copy can be obtained from the other placement groups as soon as possible, reducing the impact of the access failure.
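The failover reordering in this example can be sketched as follows (the function name is an illustrative assumption):

```python
from collections import deque

def rotate_on_failure(access_order, failed_group):
    """Move the failed placement group to the back of the access
    list so that healthy groups are tried first on the next access."""
    order = deque(access_order)
    order.remove(failed_group)
    order.append(failed_group)
    return list(order)

# After group A fails, the client visits B, C, and A in turn:
assert rotate_on_failure(["A", "B", "C"], "A") == ["B", "C", "A"]
```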
Step 603: access the multiple shards based on the order of the physical nodes in the access list.
In this embodiment, the specific operation of step 603 is described in detail in step 502 in the embodiment shown in fig. 11, and will not be described herein.
With further reference to FIG. 13, the present disclosure provides an embodiment of a data copy storage apparatus of a distributed system; this apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus is particularly applicable to a variety of electronic devices.
As shown in fig. 13, the data copy storage apparatus 700 of the distributed system of the present embodiment may include a weight determining module 701, a placement group acquiring module 702, and a data storage module 703. The weight determining module 701 is configured to determine the weight value of a physical node based on the state information of the physical node in the distributed system, where the state information includes the load information of the physical node; the placement group acquiring module 702 is configured to combine physical nodes belonging to different logical partitions of the distributed system into a placement group based on the weight values of the physical nodes; and the data storage module 703 is configured to store the multiple shards of a data copy on the physical nodes of the same placement group.
In this embodiment, for the specific processing of the weight determining module 701, the placement group acquiring module 702, and the data storage module 703 of the data copy storage apparatus 700 and the technical effects thereof, reference may be made to the descriptions of steps 201-203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some alternative implementations of the present embodiment, the placement group acquiring module 702 includes a combining unit configured to combine, based on the required number of physical nodes in a placement group, physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range into the placement group.
In some optional implementations of this embodiment, the placement group acquiring module 702 further includes a first number obtaining unit configured to determine the number of placement groups based on the number of shards and the number of physical nodes in the distributed system, and a second number obtaining unit configured to determine the required number of physical nodes in a placement group based on the number of physical nodes in the distributed system and the number of placement groups.
In some alternative implementations of the present embodiment, the data storage module 703 may include an initial value obtaining unit configured to determine an initial weight value of a placement group based on the weight values of the physical nodes in it, a placement group weight value adjusting unit configured to dynamically adjust the weight value of the placement group, and a placement group determining unit configured to store the multiple shards to the placement group with the largest dynamically adjusted weight value.
In some optional implementations of this embodiment, the placement group weight value adjusting unit may include a first response subunit configured to dynamically adjust the weight value of the placement group based on a first predetermined period, or a second response subunit configured to dynamically adjust the weight value of the placement group in response to the remaining scatter value of a physical node in the group being less than a first predetermined value, where the remaining scatter value characterizes the number of additional placement groups the physical node can still join.
In some alternative implementations of the present embodiment, the placement group acquiring module 702 may further include a removal unit configured to remove physical nodes whose remaining scatter value is less than 0 from a placement group, and a supplementing unit configured to supplement, based on the number of physical nodes remaining in the placement group after the removal, physical nodes that belong to different logical partitions and whose weight values fall within a predetermined range into the placement group.
In some alternative implementations of the present embodiment, the logical partitions may include at least one of a rack partition, a switch partition, a machine room partition, and a zone partition.
In some alternative implementations of the present embodiment, the weight determining module 701 may include an updating unit configured to dynamically update the state information of the physical node, and a physical node weight value adjusting unit configured to adjust the weight value of the physical node based on the updated state information of the physical node.
In some alternative implementations of the present embodiment, the load information of the physical node includes a CPU idle rate, a memory idle rate, a disk idle rate, and a network idle rate.
In some optional implementations of this embodiment, the physical node weight value adjustment unit may include a selection subunit configured to adjust the weight value of the physical node based on a minimum value of the updated CPU idle rate, memory idle rate, disk idle rate, and network idle rate.
In some optional implementations of the present embodiment, the physical node weight adjustment unit may further include a smoothing subunit configured to perform smoothing processing on the CPU idle rate data set, the memory idle rate data set, the disk idle rate data set, and the network idle rate data set in the update period, and a minimum value determination subunit configured to determine a minimum value of the updated CPU idle rate, the memory idle rate, the disk idle rate, and the network idle rate based on the smoothed CPU idle rate data set, the memory idle rate data set, the disk idle rate data set, and the network idle rate data set, respectively.
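A minimal sketch of the smoothing and minimum-value subunits: a moving average stands in for the unspecified smoothing method, and the node weight is taken directly as the smallest of the four smoothed idle rates. Both choices are assumptions of this example.

```python
def smooth(samples, window=3):
    # Moving average over the data set collected in one update period.
    # The embodiment leaves the smoothing method open.
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        out.append(sum(samples[lo:i + 1]) / (i + 1 - lo))
    return out

def node_weight(cpu, mem, disk, net):
    # The weight value follows the tightest resource: the minimum of the
    # four smoothed idle rates (CPU, memory, disk, network).
    return min(smooth(cpu)[-1], smooth(mem)[-1],
               smooth(disk)[-1], smooth(net)[-1])
```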
In some alternative implementations of the present embodiment, the updating unit may include a period response subunit configured to update the state information of the physical node based on the second predetermined period, or a preset condition response subunit configured to update the state information of the physical node based on the load information of the physical node satisfying a preset condition.
In some alternative implementations of the present embodiment, the data copy storage apparatus 700 of the distributed system further includes a stop receiving module configured to cause the first physical node to stop receiving fragments in response to the adjusted weight value of the first physical node being below a first predetermined threshold.
With further reference to fig. 14, the present disclosure provides one embodiment of a data copy access apparatus of a distributed system, which corresponds to the method embodiment shown in fig. 11, and which is particularly applicable to various electronic devices.
As shown in fig. 14, the data copy accessing apparatus 800 of the distributed system of the present embodiment may include an access list construction module 801 configured to construct an access list based on state information of physical nodes where a plurality of slices of a data copy are located, where the plurality of slices are stored in the same homing group of the distributed system, the homing group includes physical nodes that belong to different logical partitions in the distributed system, and an access module 802 configured to access the plurality of slices based on an arrangement order of the physical nodes in the access list.
In this embodiment, the specific processing of each module in the data copy accessing apparatus 800 of the distributed system and the technical effects thereof may refer to the description of steps 501-502 in the corresponding embodiment of fig. 11, which is not repeated herein.
In some optional implementations of this embodiment, the data copy accessing apparatus 800 of the distributed system further includes an updating module configured to update the arrangement order of the physical nodes in the access list before accessing the plurality of fragments based on the arrangement order of the physical nodes in the access list.
In some alternative implementations of the present embodiment, the update module includes a first update unit configured to rank, in the access list, physical nodes of a shard that is in the same logical partition as the access client before physical nodes of a shard that is in a different logical partition than the access client.
In some optional implementations of this embodiment, the update module further includes a second update unit configured to arrange the physical nodes in the access list in an ascending order according to a size of a waiting request number of the physical nodes.
In some optional implementations of this embodiment, the update module further includes a third update unit configured to rank the physical nodes in the access list in descending order according to a magnitude of a historical access success rate of the physical nodes.
In some alternative implementations of the present embodiment, the update module further includes a fourth update unit configured to arrange the physical nodes in the access list in ascending order according to the size of the historical access delay value of the physical nodes.
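Taken together, the four updating units order the access list by partition locality, pending-request count, historical success rate, and historical access delay. One way to express this is a single composite sort key; combining the criteria into one key, and the dictionary field names, are assumptions of this sketch, since the embodiment describes the criteria as separate units.

```python
def order_access_list(nodes, client_partition):
    # Sort by: same logical partition as the client first, then fewer
    # pending requests, then higher historical success rate, then lower
    # historical access delay.  Python's sort is stable, so earlier key
    # components dominate later ones.
    return sorted(
        nodes,
        key=lambda n: (
            n["partition"] != client_partition,  # False (0) sorts first
            n["pending"],                        # ascending
            -n["success_rate"],                  # descending
            n["latency_ms"],                     # ascending
        ),
    )
```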
According to an embodiment of the present disclosure, there is further provided an electronic device including at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions, when executed, enabling the at least one processor to implement the data copy storage method of the distributed system or the data copy access method of the distributed system described in any of the above embodiments.
According to an embodiment of the present disclosure, there is further provided a readable storage medium storing computer instructions that, when executed, cause a computer to implement the data copy storage method of the distributed system or the data copy access method of the distributed system described in any of the above embodiments.
According to an embodiment of the present disclosure, there is also provided a computer program product including a computer program that, when executed by a processor, implements the data copy storage method of the distributed system or the data copy access method of the distributed system described in any of the above embodiments.
Fig. 15 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 15, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in the device 900 are connected to the I/O interface 905, including an input unit 906, such as a keyboard, mouse, etc., an output unit 907, such as various types of displays, speakers, etc., a storage unit 908, such as a magnetic disk, optical disk, etc., and a communication unit 909, such as a network card, modem, wireless communication transceiver, etc. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, such as the data copy storage method of a distributed system or the data copy access method of a distributed system. For example, in some embodiments, the data copy storage method of a distributed system or the data copy access method of a distributed system may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the data copy storage method of the distributed system or the data copy access method of the distributed system described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the data copy storage method of the distributed system or the data copy access method of the distributed system in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
According to the technical solution of the embodiments of the present disclosure, the probability of losing a data copy due to failures of physical nodes under different logical partitions can be reduced. Specifically, physical nodes from different logical partitions are combined into a plurality of homing groups, and the plurality of fragments of a data copy are stored on the physical nodes of the same homing group; when a plurality of physical nodes fail at the same time, all fragments of a data copy can be lost only if the failed nodes all belong to one homing group. In other words, the data copy storage method of the distributed system provided by the embodiments of the present disclosure determines the probability that a physical node accepts data based on the weight value of the physical node, and stores the fragments of a data copy across different logical partitions. This ensures the safety of the data copy, speeds up system data recovery, improves the load balancing capability of the system, and reduces the storage cost of the whole system.
Further, according to the technical solution of the embodiments of the present disclosure, the homing group holds the plurality of fragments of a data copy, so the data fragments stored on its physical nodes can be obtained by accessing the homing group. By establishing an access list, a user can access the fragments in sequence according to the order of the physical nodes in the access list. When an access exception occurs, the physical nodes in the access list can be tried in turn as retries.
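The sequential access-with-retry behavior described above can be sketched as a small helper that walks the ordered access list and falls through to the next physical node on an access exception; the `fetch` callback and the final error type are illustrative, not part of the disclosed method.

```python
def read_copy(access_list, fetch):
    # Try physical nodes in access-list order; on an access exception,
    # retry with the next node in the list.
    last_err = None
    for node in access_list:
        try:
            return fetch(node)
        except Exception as err:  # access exception: move on to the next node
            last_err = err
    raise RuntimeError("all replicas failed") from last_err
```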
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (41)

1. A method of data copy storage for a distributed system, comprising:
Determining a weight value of a physical node in the distributed system based on state information of the physical node, wherein the state information comprises load information of the physical node;
Based on the weight values of the physical nodes, combining physical nodes which belong to different logical partitions in the distributed system into a homing group;
and storing the plurality of fragments of the data copy in the physical nodes of the same homing group.
2. The storage method of claim 1, wherein combining physical nodes belonging to different logical partitions in the distributed system into a homing group based on the weight values of the physical nodes comprises:
and based on the required number of physical nodes in the homing group, combining the physical nodes which belong to different logical partitions and have weight values distributed in a predetermined range into the homing group.
3. The storage method of claim 2, wherein the required number of physical nodes in the homing group is determined based on:
Determining the number of homing groups based on the number of the plurality of fragments and the number of physical nodes in the distributed system;
and determining the required number of physical nodes in the homing group based on the number of physical nodes in the distributed system and the number of homing groups.
4. The storage method of claim 1, wherein storing the plurality of fragments of the data copy at physical nodes of a same homing group comprises:
determining an initial value of the weight value of the homing group based on the weight values of a plurality of the physical nodes in the homing group;
Dynamically adjusting the weight value of the homing group;
and storing the plurality of fragments into the homing group with the largest dynamically adjusted weight value.
5. The storage method of claim 4, wherein dynamically adjusting the weight value of the homing group comprises:
dynamically adjusting the weight value of the homing group based on a first predetermined period, or
And dynamically adjusting the weight value of the homing group based on the remaining hash value of the physical node in the homing group being smaller than a first predetermined value, wherein the remaining hash value characterizes the number of times the physical node remains available to participate in a homing group.
6. The storage method of claim 5, wherein combining physical nodes belonging to different logical partitions in the distributed system into a homing group based on the weight values of the physical nodes further comprises:
rejecting physical nodes with the remaining hash value smaller than 0 from the homing group;
And supplementing physical nodes which belong to different logical partitions and have weight values distributed in a predetermined range into the homing group after the rejection processing, based on the remaining number of physical nodes in the homing group after the rejection processing is performed.
7. The storage method of claim 1, wherein the logical partition comprises at least one of a rack partition, a switch partition, a machine room partition, and a zone partition.
8. The storage method of claim 1, further comprising:
dynamically updating the state information of the physical node;
And adjusting the weight value of the physical node based on the updated state information of the physical node.
9. The storage method of claim 8, wherein the load information of the physical node comprises a CPU idle rate, a memory idle rate, a disk idle rate, and a network idle rate.
10. The storage method of claim 9, wherein adjusting the weight value of the physical node based on the updated state information of the physical node comprises:
And adjusting the weight value of the physical node based on the minimum value of the updated CPU idle rate, the updated memory idle rate, the updated disk idle rate and the updated network idle rate.
11. The storage method of claim 10, further comprising:
Performing smoothing processing on the data set of the CPU idle rate, the data set of the memory idle rate, the data set of the disk idle rate and the data set of the network idle rate in an updating period;
and determining the minimum value of the updated CPU idle rate, memory idle rate, disk idle rate and network idle rate based on the smoothed CPU idle rate data set, the smoothed memory idle rate data set, the smoothed disk idle rate data set and the smoothed network idle rate data set, respectively.
12. The storage method of claim 8, wherein dynamically updating the state information of the physical node comprises:
updating state information of the physical node based on a second predetermined period, or
And updating the state information of the physical node based on the load information of the physical node meeting a preset condition.
13. The storage method of claim 8, wherein the method further comprises:
and stopping receiving the fragments by the first physical node in response to the adjusted weight value of the first physical node being lower than a first preset threshold.
14. A method for accessing a data copy of a distributed system, comprising:
Constructing an access list based on state information of physical nodes where a plurality of fragments of the data copy are located, wherein the plurality of fragments are stored in the same homing group of the distributed system, and the homing group comprises physical nodes belonging to different logical partitions in the distributed system;
and accessing a plurality of fragments based on the arrangement sequence of the physical nodes in the access list.
15. The access method of claim 14, further comprising:
and updating the arrangement sequence of the physical nodes in the access list before accessing a plurality of fragments based on the arrangement sequence of the physical nodes in the access list.
16. The access method of claim 15, wherein updating the order of the physical nodes in the access list comprises:
in the access list, physical nodes of the shards located in the same logical partition as the access client are arranged before physical nodes of the shards located in a different logical partition than the access client.
17. The access method of claim 15, wherein updating the order of the physical nodes in the access list further comprises:
And according to the size of the waiting request number of the physical nodes, the physical nodes in the access list are arranged in an ascending order.
18. The access method of claim 15, wherein updating the order of the physical nodes in the access list further comprises:
And arranging the physical nodes in the access list in a descending order according to the historical access success rate of the physical nodes.
19. The access method of claim 15, wherein updating the order of the physical nodes in the access list further comprises:
And according to the size of the historical access delay value of the physical nodes, the physical nodes in the access list are arranged in an ascending order.
20. A data copy storage device of a distributed system, comprising:
A weight determination module configured to determine a weight value of a physical node in the distributed system based on state information of the physical node, wherein the state information includes load information of the physical node;
the system comprises a physical node acquisition module, a grouping group acquisition module and a grouping group acquisition module, wherein the physical node acquisition module is configured to combine physical nodes which belong to different logic partitions in the distributed system into a grouping group based on the weight value of the physical node;
and the data storage module is configured to store the plurality of fragments of the data copy in physical nodes of the same collocation group.
21. The storage device of claim 20, wherein the homing group obtaining module comprises:
and the combining unit is configured to combine the physical nodes which belong to different logical partitions and have weight values distributed in a predetermined range into the homing group based on the required number of physical nodes in the homing group.
22. The storage device of claim 21, wherein the homing group obtaining module further comprises:
a first number obtaining unit configured to determine the number of homing groups based on the number of the plurality of fragments and the number of physical nodes in the distributed system;
and a second number obtaining unit configured to determine the required number of physical nodes in the homing group based on the number of physical nodes in the distributed system and the number of homing groups.
23. The storage device of claim 20, wherein the data storage module comprises:
An initial value calculation unit configured to determine an initial value of the weight values of the homing group based on the weight values of the plurality of physical nodes in the homing group;
a homing group weight value adjustment unit configured to dynamically adjust the weight value of the homing group;
And a homing group determining unit configured to store the plurality of fragments into the homing group with the largest dynamically adjusted weight value.
24. The storage device according to claim 23, wherein the homing group weight adjustment unit includes:
A first response subunit configured to dynamically adjust the weight value of the homing group based on a first predetermined period, or
And a second response subunit configured to dynamically adjust a weight value of the homing group based on a remaining hash value of the physical node in the homing group being less than a first predetermined value, wherein the remaining hash value characterizes a number of times the physical node remains available to participate in the homing group.
25. The storage device of claim 24, wherein the homing group obtaining module comprises:
A rejecting unit configured to reject physical nodes in the homing group for which the remaining hash value is less than 0;
And the supplementing unit is configured to supplement physical nodes which belong to different logical partitions and have weight values distributed in a predetermined range into the homing group after the rejection processing, based on the remaining number of physical nodes in the homing group after the rejection processing is performed.
26. The storage device of claim 20, wherein the logical partition comprises at least one of a rack partition, a switch partition, a machine room partition, and a zone partition.
27. The storage device of claim 20, wherein the weight determination module comprises:
An updating unit configured to dynamically update state information of the physical node;
And the physical node weight value adjusting unit is configured to adjust the weight value of the physical node based on the updated state information of the physical node.
28. The storage device of claim 27, wherein the load information of the physical node comprises a CPU idle rate, a memory idle rate, a disk idle rate, and a network idle rate.
29. The storage device of claim 28, wherein the physical node weight adjustment unit comprises:
And the selecting subunit is configured to adjust the weight value of the physical node based on the minimum value of the updated CPU idle rate, the updated memory idle rate, the updated disk idle rate and the updated network idle rate.
30. The storage device of claim 29, wherein the physical node weight adjustment unit further comprises:
a smoothing processing subunit configured to perform smoothing processing on the CPU-idle-rate data set, the memory-idle-rate data set, the disk-idle-rate data set, and the network-idle-rate data set in an update period;
A minimum value determining subunit configured to determine the minimum value of the updated CPU idle rate, the memory idle rate, the disk idle rate, and the network idle rate based on the smoothed CPU idle rate data set, the memory idle rate data set, the disk idle rate data set, and the network idle rate data set, respectively.
31. The storage device of claim 27, wherein the update unit comprises:
A period response subunit configured to update the state information of the physical node based on a second predetermined period, or
And the preset condition response subunit is configured to update the state information of the physical node based on the fact that the load information of the physical node meets a preset condition.
32. The storage device of claim 27, further comprising:
And the stopping receiving module is configured to stop receiving the fragments by the first physical node in response to the adjusted weight value of the first physical node being lower than a first preset threshold.
33. A data copy access apparatus of a distributed system, comprising:
an access list construction module configured to construct an access list based on state information of physical nodes where a plurality of fragments of the data copy are located, wherein the plurality of fragments are stored in the same homing group of the distributed system, and the homing group includes physical nodes belonging to different logical partitions in the distributed system;
and the access module is configured to access the fragments based on the arrangement sequence of the physical nodes in the access list.
34. The access device of claim 33, further comprising:
and the updating module is configured to update the arrangement sequence of the physical nodes in the access list before accessing the plurality of fragments based on the arrangement sequence of the physical nodes in the access list.
35. The access device of claim 34, wherein the update module comprises:
a first updating unit configured to arrange physical nodes of the shards located in the same logical partition as the access client before physical nodes of the shards located in a different logical partition as the access client in the access list.
36. The access device of claim 34, wherein the update module further comprises:
and the second updating unit is configured to arrange the physical nodes in the access list in an ascending order according to the size of the waiting request number of the physical nodes.
37. The access device of claim 34, wherein the update module further comprises:
And a third updating unit configured to arrange the physical nodes in the access list in descending order according to the magnitude of the historical access success rate of the physical nodes.
38. The access device of claim 34, wherein the update module further comprises:
and a fourth updating unit configured to arrange the physical nodes in the access list in ascending order according to the size of the historical access delay value of the physical nodes.
39. An electronic device, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data copy storage method of the distributed system of any one of claims 1-13 or the data copy access method of the distributed system of any one of claims 14-19.
40. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the data copy storage method of the distributed system of any one of claims 1-13 or the data copy access method of the distributed system of any one of claims 14-19.
41. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the data copy storage method of a distributed system according to any of claims 1-13 or the data copy access method of a distributed system according to any of claims 14-19.
CN202411466064.0A 2024-10-18 2024-10-18 Data copy storage and access method and device of distributed system Pending CN119201002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411466064.0A CN119201002A (en) 2024-10-18 2024-10-18 Data copy storage and access method and device of distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411466064.0A CN119201002A (en) 2024-10-18 2024-10-18 Data copy storage and access method and device of distributed system

Publications (1)

Publication Number Publication Date
CN119201002A true CN119201002A (en) 2024-12-27

Family

ID=94063863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411466064.0A Pending CN119201002A (en) 2024-10-18 2024-10-18 Data copy storage and access method and device of distributed system

Country Status (1)

Country Link
CN (1) CN119201002A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120848812A (en) * 2025-09-22 2025-10-28 浪潮云信息技术股份公司 A distributed storage system expansion and contraction method, device, equipment and medium


Similar Documents

Publication Publication Date Title
CN110134495B (en) A container cross-host online migration method, storage medium and terminal device
US9645756B2 (en) Optimization of in-memory data grid placement
US9954758B2 (en) Virtual network function resource allocation and management system
US20150127649A1 (en) Efficient implementations for mapreduce systems
CN116057507A (en) Storage-level load balancing
CN109451540A (en) A kind of resource allocation methods and equipment of network slice
WO2017020742A1 (en) Load balancing method and device
CN115604269A (en) Load balancing method and device of server, electronic equipment and storage medium
CN112118314A (en) Load balancing method and device
CN114035906B (en) Virtual machine migration method and device, electronic equipment and storage medium
CN117785465A (en) Resource scheduling method, device, equipment and storage medium
WO2024239865A1 (en) Hot migration method for virtual machine, and related device
US9462521B2 (en) Data center network provisioning method and system thereof
CN105450784B (en) Apparatus and method for allocating consumer nodes to messages in MQ
CN119254780A (en) Large model processing method, device, equipment and medium based on distributed cache
CN119201002A (en) Data copy storage and access method and device of distributed system
CN105471893A (en) Distributed equivalent data stream connection method
CN105635285B (en) A kind of VM migration scheduling method based on state aware
CN115373862A (en) Dynamic resource scheduling method, system and storage medium based on data center
US8918555B1 (en) Adaptive and prioritized replication scheduling in storage clusters
CN112887407B (en) Job flow control method and device for distributed cluster
WO2016065198A1 (en) High performance hadoop with new generation instances
US10594620B1 (en) Bit vector analysis for resource placement in a distributed system
CN119440726A (en) Resource optimization method, device, electronic device and medium for heterogeneous computing cluster
Fazul et al. PRBP: A prioritized replica balancing policy for HDFS balancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination