Load balancing method based on object storage devices
Technical field
The invention belongs to the technical field of computer storage and specifically relates to a load balancing method based on object storage devices.
Background art
With the rapid development of computer and network communication technology, the volume of information stored worldwide is growing by more than 30% per year, and the growth rate of data stored on hard disks is as high as 114%. Faced with this explosive growth in data volume and with users' demands on the capacity, security, scalability and availability of storage systems, traditional direct attached storage (DAS, Direct Attached Storage) has proved inadequate, which makes network storage the inevitable trend in the storage field. The main network storage architectures today are network attached storage (NAS, Network Attached Storage) and the storage area network (SAN, Storage Area Network). Although NAS and SAN solve many problems of direct attached storage, each has its own limitations and cannot fully satisfy the requirements of storage technology development. Object-based storage (OBS, Object-Based Storage) is a data-centric network storage model that uses the object as the basic transfer unit and separates data storage from metadata management. It overcomes the data path bottleneck common in NAS and the file sharing restrictions of SAN, performs better in security, platform independence, availability and scalability, and may become the standard for next-generation network storage.
An object-based storage system (OBSS, Object-Based Storage System) combines the object interface with intelligent storage devices and can reach petabyte-scale capacity. As the storage system grows and is used more heavily, load imbalance between storage nodes inevitably appears and becomes a bottleneck for the whole system; in severe cases it can cause overall system performance to drop sharply.
Summary of the invention
The invention provides a load balancing method based on object storage devices. Its purpose is to distribute the system load evenly across the storage nodes through rational scheduling of the I/O load and migration of hotspot data, so as to give full play to the performance of each high-performance storage device node.
The load balancing method based on object storage devices of the present invention comprises the following steps, in order:
(1) active load detection step: each device node actively detects its own load and sends its load factor to the metadata server;
(2) device load statistics step: the load of each device node is calculated by a normalization method, and the device nodes are sorted by load;
(3) object migration and replica management step: objects on device nodes whose load value exceeds the migration threshold are treated as hotspot objects and are migrated and placed under replica management;
(4) object attribute extension step: the attribute pages of the OSD SCSI protocol standard are extended with a custom load attribute page that predefines five attribute items (load value, temperature, primary copy flag, replica information and load weights), so that the load information and migration information needed for each device node are saved in the form of object attributes;
(5) I/O request processing step: I/O requests are scheduled among the device nodes according to the distribution information of objects, so that the load of the device nodes is balanced.
The load balancing method based on object storage devices, characterized in that the active load detection step comprises the following processes, in order:
(1) each device node reads the system queue length and the disk, CPU, memory and network load information from the /proc virtual file system;
(2) the load factor of each device node is calculated;
(3) the load factor of each device node is evaluated: if load factor ≤ 30, delay 10 seconds; if 30 < load factor ≤ 60, delay 30 seconds; if load factor > 60, delay 60 seconds;
(4) the system queue length and the disk, CPU, memory and network load information are read again from the /proc virtual file system;
(5) the load factor of each device node is sent to the metadata server; return to process (1).
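The polling schedule in processes (1) to (5) above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the three callables passed to `detection_cycle` (a load reader, a sender and a sleep function) are hypothetical hooks introduced here so the logic can be exercised without a real device node.

```python
def polling_delay_seconds(load_factor: float) -> int:
    """Delay before the next sampling, using the thresholds of process (3)."""
    if load_factor <= 30:
        return 10
    if load_factor <= 60:
        return 30
    return 60


def detection_cycle(read_load_factor, send_to_mds, sleep):
    """One pass through processes (1)-(5).

    All three arguments are hypothetical callables supplied by the caller:
    read_load_factor() returns the current load factor (0-100),
    send_to_mds(value) reports it to the metadata server,
    sleep(seconds) waits for the chosen delay.
    """
    load_factor = read_load_factor()            # processes (1) and (2)
    sleep(polling_delay_seconds(load_factor))   # process (3)
    load_factor = read_load_factor()            # process (4)
    send_to_mds(load_factor)                    # process (5)
    return load_factor
```

On a real node, `sleep` would typically be `time.sleep` and the cycle would be called in an endless loop; a busy node therefore reports less often than an idle one.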
The load balancing method based on object storage devices, characterized in that the device load statistics step comprises the following processes, in order:
(1) define the load factor: the load factor LOAD of each device node is defined as:
$$\mathrm{LOAD} = W_1 L_{rql} + W_2 L_{disk} + W_3 L_{cpu} + W_4 L_{mem} + W_5 L_{net}$$
with normalized terms, where $L_{rql}$, $L_{disk}$, $L_{cpu}$, $L_{mem}$ and $L_{net}$ are respectively the system queue length, disk load, CPU load, memory load and network load of the object storage device, $W_1$, $W_2$, $W_3$, $W_4$, $W_5$ are the corresponding weights, and $\sum_{i=1}^{5} W_i = 1$;
(2) set the initial load weights: the initial values of $W_i$ are set according to the load values obtained by running a comprehensive load test on each device node with Iozone, a standard file system benchmark;
(3) determine the state of each individual load item;
(4) correct the load weights $W_i$ according to the load information: if an individual load value is greater than 80, or less than 20, in five consecutive detections, its corresponding weight is increased, or respectively reduced, by 10%; otherwise the weight is unchanged;
(5) calculate the load factor of each device node;
(6) send the load factor of each device node to the metadata server;
(7) the metadata server sorts the device nodes by load factor and maintains a device queue in ascending order; return to process (3).
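The weight correction of process (4) can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the weights are renormalized after each correction so that they still sum to 1, and the five-sample history of a load item is cleared once it has triggered a correction. The class and constant names are hypothetical.

```python
from collections import deque

HIGH, LOW = 80.0, 20.0   # individual load thresholds from process (4)
WINDOW = 5               # consecutive detections required
STEP = 0.10              # 10% increase or reduction


class WeightCorrector:
    """Tracks the last five values of each load item and adjusts its weight."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.history = [deque(maxlen=WINDOW) for _ in self.weights]

    def observe(self, loads):
        """Record one detection (one value per load item) and return the weights."""
        for i, load in enumerate(loads):
            recent = self.history[i]
            recent.append(load)
            if len(recent) < WINDOW:
                continue
            if all(v > HIGH for v in recent):
                self.weights[i] *= 1 + STEP
                recent.clear()   # assumption: start a fresh run after correcting
            elif all(v < LOW for v in recent):
                self.weights[i] *= 1 - STEP
                recent.clear()
        # Assumption: renormalize so the weights keep summing to 1.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        return self.weights
```

Without the renormalization the sum of the weights would drift away from 1 and LOAD would no longer be bounded by 100; other ways of restoring the constraint are equally compatible with the description.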
The load balancing method based on object storage devices, characterized in that the object migration and replica management step comprises the following processes, in order:
(1) set the migration threshold, which is the critical condition for migration;
(2) detect the load: detect the load of each device node and compute the mean of its last ten loads;
(3) evaluate the load mean: for each device node, if the mean of its last ten loads is greater than the migration threshold, go to process (4); otherwise go to process (5);
(4) the metadata server selects, from the neighbouring device nodes, the one with the smallest load factor as the migration target and migrates the object to it; after the migration completes, the metadata server updates the replica information;
(5) if the mean of the last ten loads is less than half of the threshold, delete the replica and update the object metadata information; otherwise go to process (2).
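The decision in processes (2) to (5) can be sketched as a sliding window over the last ten load samples. The names `MigrationMonitor` and `pick_target` are hypothetical, and returning "wait" until ten samples have been collected is an assumption of this sketch; the text does not say what happens before the window is full.

```python
from collections import deque
from statistics import fmean


class MigrationMonitor:
    """Keeps the last ten load samples of one device node."""

    def __init__(self, threshold: float, window: int = 10):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def record(self, load: float) -> str:
        """Add a sample and return the action suggested by the window mean."""
        self.samples.append(load)
        if len(self.samples) < self.samples.maxlen:
            return "wait"                 # assumption: no decision on a partial window
        mean = fmean(self.samples)
        if mean > self.threshold:
            return "migrate"              # process (4)
        if mean < self.threshold / 2:
            return "delete_replica"       # process (5)
        return "keep"


def pick_target(neighbour_loads: dict) -> str:
    """Choose the neighbouring node with the smallest load factor."""
    return min(neighbour_loads, key=neighbour_loads.get)
```

For example, `pick_target({"osd1": 40, "osd2": 15})` selects `"osd2"` as the migration target.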
The load balancing method based on object storage devices, characterized in that the object attribute extension step comprises the following processes, in order:
(1) determine the extension mode: the temporary attribute extension mode is used;
(2) specify the attribute page number of each device node: the attribute page numbers range from C000 0000h to EFFF FFFFh, i.e. with D = C000 0000h, D+5h denotes C000 0005h;
(3) define the load attribute page structure: define the five attribute items load value, temperature, primary copy flag, replica information and load weights.
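The page number arithmetic in process (2) is plain hexadecimal offsetting from the base D. The short sketch below shows it; the constant and function names are hypothetical, and the range check uses the bounds quoted in process (2).

```python
LOAD_PAGE_BASE = 0xC0000000   # D = C000 0000h
LOAD_PAGE_LAST = 0xEFFFFFFF   # upper bound quoted in process (2)


def page_number(offset: int) -> int:
    """Return D + offset, rejecting values outside the quoted range."""
    page = LOAD_PAGE_BASE + offset
    if not LOAD_PAGE_BASE <= page <= LOAD_PAGE_LAST:
        raise ValueError("page number outside the load attribute page range")
    return page


def format_osd_hex(value: int) -> str:
    """Format a 32-bit value in the 'XXXX XXXXh' style used in the text."""
    return f"{value >> 16:04X} {value & 0xFFFF:04X}h"
```

With these definitions, `format_osd_hex(page_number(5))` reproduces the example in the text, `C000 0005h`.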
The load balancing method based on object storage devices, characterized in that the I/O request processing step comprises the following processes, in order:
(1) determine the type of the user request: for a read request, carry out process (2); for a write request, process (3); for an update request, process (4);
(2) determine whether the requested object has replicas; if so, select the replica on the most lightly loaded node and read from it, then go to process (5); otherwise read directly from the object, then go to process (5);
(3) determine the number of fragments N according to the file size; the metadata server selects the N device nodes with the smallest load factors in the system to perform the write operation; go to process (5);
(4) after the primary copy information of the object is determined, the update is applied to the primary copy; once it completes, all replicas of the object are updated, and the next process is carried out;
(5) update the metadata information and the object attribute information.
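The node selection in processes (2) to (4) can be sketched with three small helpers. All names are hypothetical, and `loads` stands for a mapping from node name to load factor as held by the metadata server. The text does not state whether the node holding the primary copy is also a candidate for reads; this sketch assumes that only the replica nodes are compared.

```python
import heapq


def select_read_node(primary_node, replica_nodes, loads):
    """Process (2): read from the least loaded replica, else from the object itself."""
    if not replica_nodes:
        return primary_node
    return min(replica_nodes, key=lambda node: loads[node])


def select_write_nodes(loads, fragments):
    """Process (3): pick the N nodes with the smallest load factors."""
    return heapq.nsmallest(fragments, loads, key=loads.get)


def update_order(primary_node, replica_nodes):
    """Process (4): the primary copy is updated first, then every replica."""
    return [primary_node, *replica_nodes]
```

For instance, with `loads = {"osd1": 55, "osd2": 12, "osd3": 30, "osd4": 80}`, a write split into two fragments goes to `osd2` and `osd3`, and a read of an object replicated on `osd3` and `osd4` is served from `osd3`.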
The present invention is applicable to large-scale object-based storage systems and has the following characteristics:
(1) the object storage devices actively collect each load item and send the load value;
(2) the weight of each load item is based on measured data and can be corrected dynamically;
(3) by extending the object attributes, the load value attribute, temperature attribute, primary copy flag and replica information attribute provide decision information;
(4) load detection and attribute extension provide the decision basis for the metadata server to optimize the scheduling of I/O operations;
(5) load migration of hotspot data is carried out according to dynamic load information.
The invention not only balances the system load but is equally applicable to applications that need dynamic device selection, such as choosing backup locations for important data and choosing backup nodes for the metadata server.
Brief description of the drawings
Fig. 1 is a flow block diagram of the present invention;
Fig. 2 is a flow chart of the active load detection step of the present invention;
Fig. 3 is a flow chart of the load statistics step of the present invention;
Fig. 4 is a flow chart of the object migration and replica management step of the present invention;
Fig. 5 is a flow chart of the object attribute extension step of the present invention;
Fig. 6 is a schematic diagram of the extended attribute page structure of the present invention;
Fig. 7 is a flow chart of the I/O request processing step of the present invention.
Detailed description of embodiments
The present invention is described in more detail below with reference to the drawings and embodiments.
Fig. 1 is a flow block diagram of the present invention. The invention comprises: (1) an active load detection step; (2) a device load statistics step; (3) an object migration and replica management step; (4) an object attribute extension step; (5) an I/O request processing step.
Fig. 2 is a flow chart of the active load detection step of the present invention. Load information is read from the /proc virtual file system and the load factor is calculated. The load factor of each device node is then evaluated: a high load factor indicates that the device is busy, so load statistics should be collected less often; a low one indicates that the device is idle, so they can be collected more often. Accordingly, if load factor ≤ 30, the delay is 10 seconds; if 30 < load factor ≤ 60, the delay is 30 seconds; if load factor > 60, the delay is 60 seconds. After the delay, the system queue length and the disk, CPU, memory and network load information are read again from the /proc virtual file system, the load factor of each device node is sent to the metadata server, the delay is determined again from the load factor, and the cycle repeats.
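As an illustration of reading one load item from /proc, the sketch below derives a memory load on the 0 to 100 scale from text in the format of `/proc/meminfo`. The text does not say which /proc files are read or how each load item is computed, so the choice of `/proc/meminfo`, the use of the `MemTotal` and `MemAvailable` fields, and the function name are all assumptions of this sketch.

```python
def memory_load_percent(meminfo_text: str) -> float:
    """Return used memory as a percentage, from /proc/meminfo-style text.

    Assumption: 'used' means MemTotal minus MemAvailable.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])   # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

On a Linux node the input would come from `open("/proc/meminfo").read()`; the other load items (queue length, disk, CPU, network) would need analogous readers.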
Fig. 3 is a flow chart of the load statistics step of the present invention. The load factor LOAD of a storage device is defined as:
$$\mathrm{LOAD} = W_1 L_{rql} + W_2 L_{disk} + W_3 L_{cpu} + W_4 L_{mem} + W_5 L_{net}$$
with normalized terms, where $L_{rql}$, $L_{disk}$, $L_{cpu}$, $L_{mem}$ and $L_{net}$ are respectively the system queue length, disk load, CPU load, memory load and network load of the object storage device, $W_1$, $W_2$, $W_3$, $W_4$, $W_5$ are the corresponding weights, and $\sum_{i=1}^{5} W_i = 1$. When the load value is calculated, each of the five loads is converted to a percentage on a scale of 100, so it follows from $\sum_{i=1}^{5} W_i = 1$ that the load value LOAD is greater than or equal to 0 and less than or equal to 100. A comprehensive load test of each device node with Iozone gave a mean system request queue length of 3.280 with a maximum of 7.806; mean utilization of CPU, memory and network bandwidth of 33.0781%, 58.0078% and 24.4531% respectively; and a mean disk I/O throughput of 46.1016 MB/s with a mean utilization of 76.836%. With $W_1$ chosen as 0.4, this yields a generally applicable set of initial load weights [0.4, 0.2406, 0.1032, 0.1812, 0.075]. After the state of each individual load item is determined, the load weights $W_i$ are corrected according to the load information: if the value of a load item is greater than 80, or less than 20, in five consecutive detections, its corresponding weight is increased, or respectively reduced, by 10%; otherwise the weight is unchanged. The load factor of each device node is then calculated and sent to the metadata server, which sorts the device nodes by load factor and maintains a device queue in ascending order.
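The weighted sum and the ascending device queue can be sketched as follows, using the initial weights quoted above. The function names are hypothetical, and the inputs are assumed to be already converted to the 0 to 100 scale, since the text does not give the conversion used for each load item.

```python
# Initial weights from the text, in the order queue, disk, CPU, memory, network.
INITIAL_WEIGHTS = (0.4, 0.2406, 0.1032, 0.1812, 0.075)


def load_factor(loads, weights=INITIAL_WEIGHTS):
    """Weighted sum of five loads, each already expressed on a 0-100 scale."""
    if len(loads) != len(weights):
        raise ValueError("one weight is required per load item")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    if any(not 0 <= value <= 100 for value in loads):
        raise ValueError("loads must be normalized to 0-100")
    return sum(w * value for w, value in zip(weights, loads))


def ascending_queue(node_factors):
    """Order node names by load factor, smallest first, as the metadata server does."""
    return sorted(node_factors, key=node_factors.get)
```

Because the weights sum to 1, a node whose five loads are all at 50 has a load factor of 50, and the result can never leave the 0 to 100 range.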
Fig. 4 is a flow chart of the object migration and replica management step of the present invention. A hotspot object is defined as an object on which read/write operations are performed frequently; frequent reads and writes of an object inevitably raise the load of the storage device node holding it. A migration threshold, the critical condition for migration, is set for each device node. The load of each device node is detected and recorded, and the mean of its last ten loads is calculated. If this mean is greater than the migration threshold, object migration is carried out: the metadata server selects, from the neighbouring device nodes, the one with the smallest load factor as the migration target and migrates the object to it. After the migration completes, the metadata server updates the replica information. If the mean of the last ten loads is less than half of the threshold, the replica is deleted and the object metadata information is updated.
A hotspot object is defined as an object on which read/write operations are performed frequently, and the threshold is the critical condition for migration. The temperature measures how frequently an object is operated on: each read/write operation on the object increases the temperature by 1, and if no read/write operation occurs within 1 minute, the temperature decreases by 1. The temperature is checked, and if it is greater than the migration threshold, the metadata server selects the node with the smallest load factor from the neighbouring device nodes as the migration target and migrates the object to it; after the migration completes, the metadata server updates the replica information. For a read operation, the read request is directed to the object replica on the lightly loaded node; for a write operation, all replicas must be updated. The temperature is checked again, and if it is less than half of the threshold, the replica is deleted and the metadata information is updated.
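The temperature bookkeeping described above can be sketched as a small counter. The class name is hypothetical; timestamps are passed in explicitly (in seconds) so the behaviour is deterministic, and flooring the temperature at 0 is an assumption of this sketch.

```python
class ObjectTemperature:
    """Access temperature of one object: +1 per read/write, -1 per idle minute."""

    IDLE_SECONDS = 60

    def __init__(self, now: float = 0.0):
        self.value = 0
        self.idle_since = now

    def cool(self, now: float) -> int:
        """Subtract 1 for every full minute without a read/write operation."""
        idle_minutes = int((now - self.idle_since) // self.IDLE_SECONDS)
        if idle_minutes > 0:
            # Assumption: the temperature never drops below 0.
            self.value = max(0, self.value - idle_minutes)
            self.idle_since += idle_minutes * self.IDLE_SECONDS
        return self.value

    def access(self, now: float) -> int:
        """Record one read/write operation at time `now`."""
        self.cool(now)
        self.value += 1
        self.idle_since = now
        return self.value
```

The metadata server would compare `cool(now)` against the migration threshold to decide on migration, and against half of the threshold to decide on replica deletion.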
Fig. 5 is a flow chart of the object attribute extension step of the present invention. Its function is to save the load information and migration information needed by the invention in the form of object attributes. In the OSD SCSI protocol standard, an object's attributes are described by a number of attribute pages, each of which consists of a number of attribute items. An attribute page is identified by its page number (Page Number) and an attribute item by its attribute number (Attribute Number), so a specific attribute item is indexed by the pair (Page Number, Attribute Number). Object attributes can be divided into permanent and temporary attributes according to how long they persist. Each attribute item in the load attribute page of this invention reflects the access characteristics of an object over a period of time and is therefore suited to description as a temporary attribute. The page numbers of the load attribute page range from C000 0000h to C000 FFFFh, i.e. with D = C000 0000h, D+5h denotes C000 0005h.
Fig. 6 shows the load attribute page structure of the present invention. The attribute pages of the OSD SCSI protocol standard are extended with a custom load attribute page. According to the needs of the invention, five attribute items are predefined: load value, temperature, primary copy flag, replica information and load weights. The remaining attribute items are reserved for future expansion.
The load value attribute item stores the load factor obtained in the device load statistics step of the invention. It is described by integer data and occupies four bytes.
The temperature attribute item stores the temperature as integer data occupying four bytes. A high temperature indicates frequent operations. Because hotspot data arises mainly from large numbers of concurrent reads, the temperature can be dispersed by a replication strategy: once the temperature exceeds the set threshold, a replica of the object is migrated to a lightly loaded node, and if the temperature is still high after one migration, a second migration is carried out.
The primary copy flag attribute item indicates whether the object is the primary copy or a replica. Because it holds only a single yes/no decision, it is stored as character data and occupies one byte. Write operations are performed only on the primary copy; the replicas are updated after the write completes.
The replica information attribute item stores whether the object has replicas, how many replicas exist and similar information, and occupies 20 bytes. For replicas created because of hotspot data, if the temperature of the data falls to a certain threshold after a period of time, the hotspot can be considered to have disappeared according to the principle of locality, and the replicas can be deleted.
The load weights attribute item stores the weights of the five loads measured by the invention. Each weight is described by one floating-point variable occupying four bytes, so the five weights occupy 20 bytes, and a further 20 bytes are reserved for expansion.
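The byte sizes given for the five attribute items can be checked with a packed layout. The sketch below uses Python's `struct` module; the field order, the big-endian byte order, the `P`/`R` encoding of the primary copy flag and the function names are assumptions made for illustration, since the text specifies only the size and type of each item.

```python
import struct

# load value (4) + temperature (4) + primary flag (1) + replica info (20)
# + five float weights (20) + reserved (20) = 69 bytes, no padding.
LOAD_PAGE = struct.Struct(">iic20s5f20s")


def pack_load_page(load_value, temperature, is_primary, replica_info, weights):
    """Serialize the five attribute items; replica_info is raw bytes (<= 20)."""
    flag = b"P" if is_primary else b"R"   # assumption: one-character encoding
    return LOAD_PAGE.pack(load_value, temperature, flag, replica_info, *weights, b"")


def unpack_load_page(raw):
    """Inverse of pack_load_page; weights come back as 32-bit floats."""
    load_value, temperature, flag, replica_info, *rest = LOAD_PAGE.unpack(raw)
    return {
        "load_value": load_value,
        "temperature": temperature,
        "is_primary": flag == b"P",
        "replica_info": replica_info,
        "weights": tuple(rest[:5]),
    }
```

Packing the weights as four-byte floats means values such as 0.2406 are stored only to single precision, which is ample for weights that are adjusted in 10% steps.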
Fig. 7 is a flow chart of the I/O request processing step of the present invention. In an object storage system, hotspot data is produced almost entirely by large numbers of concurrent read operations. Once multiple replicas are kept in the system, the read and write loads can be separated: read operations can be carried out on any replica, whereas update operations can be applied only to the primary copy. While an update is in progress, all read operations must be suspended; reads can resume only after the update of the primary copy has completed and the replicas have been updated. Important data may already have multiple replicas because of reliability requirements, and the existence of many replicas provides a good basis for load balancing and for the use of object attribute information. When an I/O request arrives, its type is determined first; requests are divided into read requests, write requests and update requests. For a read request, it is determined whether replicas exist. If they do, the metadata server uses the current load records to find the least loaded of the object storage devices holding a replica and directs the I/O request to that device; otherwise the load state is not considered and the object is read directly from the storage device that holds it. For a write request, the metadata server first determines the number of fragments N of the object from the file size, then uses the current and historical load records to select the N least loaded device nodes from all device nodes as the storage nodes for the object, and stores the object there. For an update request, the update can be applied only to the primary copy, and the replicas are updated after the update of the primary copy has completed. After all operations have completed, the metadata and the object attribute information are updated.