CN119248196A - Data storage method and device
- Publication number: CN119248196A (application CN202411375140.7A)
- Authority: CN (China)
- Prior art keywords: data, storage, pool, hierarchical, object data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F3/0604 (Improving or facilitating administration, e.g. storage management)
- G06F3/0608 (Saving storage space on storage systems)
- G06F3/0644 (Management of space entities, e.g. partitions, extents, pools)
- G06F3/067 (Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS])
All four codes fall under G06F3/06 (digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers) and G06F3/0601 (interfaces specially adapted for storage systems).
Abstract
An embodiment of the application provides a data storage method and device. The method includes: obtaining an object data set stored in a storage bucket; when at least one first object data in the object data set satisfies a first data processing condition, aggregating the at least one first object data into first aggregate data and flushing the first aggregate data down to a first storage pool, where the hierarchical flush pools associated with the bucket include the first storage pool; and when at least one second object data in the object data set satisfies a second data processing condition, aggregating the at least one second object data into second aggregate data and flushing the second aggregate data down to a second storage pool, where the hierarchical flush pools associated with the bucket include the second storage pool. The application addresses the problem of low data storage efficiency and thereby achieves the effect of improving data storage efficiency.
Description
Technical Field
The embodiment of the application relates to the field of computers, in particular to a data storage method and device.
Background
In data storage scenarios, aggregated data is usually flushed down to a single fixed storage pool. Once that fixed hierarchical flush pool is full, the aggregated data has nowhere to be written; the only options are to expand the capacity of that storage pool or to create a new storage bucket bound to a new hierarchical flush pool. As a result, data storage efficiency is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a data storage method and device, which at least solve the problem of low data storage efficiency in the related art.
According to one embodiment of the application, a data storage method is provided, comprising: obtaining an object data set stored in a storage bucket; when at least one first object data in the object data set meets a first data processing condition, aggregating the at least one first object data into first aggregate data and flushing the first aggregate data down to a first storage pool, wherein the hierarchical flush pools associated with the storage bucket include the first storage pool; and when at least one second object data in the object data set meets a second data processing condition, aggregating the at least one second object data into second aggregate data and flushing the second aggregate data down to a second storage pool, wherein the hierarchical flush pools associated with the storage bucket include the second storage pool.
According to another embodiment of the present application, a data storage device is provided, comprising: an obtaining unit configured to obtain an object data set stored in a storage bucket; a first flushing unit configured to, when at least one first object data in the object data set satisfies a first data processing condition, aggregate the at least one first object data into first aggregate data and flush the first aggregate data down to a first storage pool, where the hierarchical flush pools associated with the storage bucket include the first storage pool; and a second flushing unit configured to, when at least one second object data in the object data set satisfies a second data processing condition, aggregate the at least one second object data into second aggregate data and flush the second aggregate data down to a second storage pool, where the hierarchical flush pools associated with the storage bucket include the second storage pool.
In an exemplary embodiment, the first flushing unit includes: an aggregation module configured to aggregate the at least one first object data into the first aggregate data; a determining module configured to determine the storage pool with the highest priority from the hierarchical flush pools associated with the storage bucket; and a first flushing module configured to flush the first aggregate data down to the storage pool with the highest priority.
In an exemplary embodiment, the determining module includes: a first obtaining submodule configured to obtain the creation time of each hierarchical flush pool associated with the storage bucket; and a first determining submodule configured to determine, as the storage pool with the highest priority, the hierarchical flush pool that was created earliest among those associated with the storage bucket and whose remaining capacity is greater than or equal to the capacity required to store the first aggregate data.
In an exemplary embodiment, the determining module includes: a second obtaining submodule configured to obtain the remaining capacity of each hierarchical flush pool associated with the storage bucket; and a second determining submodule configured to determine, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining capacity among those associated with the storage bucket.
In an exemplary embodiment, the determining module includes: a third obtaining submodule configured to obtain the remaining capacity ratio of each hierarchical flush pool associated with the storage bucket; and a third determining submodule configured to determine, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining capacity ratio among those associated with the storage bucket.
In an exemplary embodiment, the second flushing unit includes: a second flushing module configured to aggregate the at least one second object data into the second aggregate data and flush the second aggregate data down to the second storage pool when the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required by the second aggregate data, and the remaining capacity of the second storage pool is larger than the capacity required by the second aggregate data; or a creating module configured to create the second storage pool, aggregate the at least one second object data into the second aggregate data, and flush the second aggregate data down to the second storage pool when the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required by the second aggregate data, and the remaining capacity of every hierarchical flush pool associated with the storage bucket is smaller than the capacity required by the second aggregate data.
In an exemplary embodiment, the device further comprises: an allocation unit configured to, after the object data set stored in the storage bucket is acquired, assign a hierarchical flag to object data in the object data set, where the hierarchical flag indicates that the data is allowed to be stored in the hierarchical flush pools associated with the storage bucket; an aggregation unit configured to aggregate at least one first object data into the first aggregate data when the at least one first object data carries the hierarchical flag; and a flushing unit configured to flush the first aggregate data down to the first storage pool when the number of aggregated object data in the object data set is greater than or equal to a first preset threshold, or the data amount of the first aggregate data is greater than or equal to a second preset threshold.
In an exemplary embodiment, the device further comprises: a storage unit configured to, after the object data set stored in the storage bucket is acquired, store the object data of the object data set into a fast storage pool when the amount of object data in the object data set is greater than or equal to a third preset threshold; a third flushing unit configured to aggregate at least one first object data stored in the fast storage pool into the first aggregate data and flush it down to a first slow storage pool when the at least one first object data satisfies the first data processing condition, where the hierarchical flush pools associated with the storage bucket include the first slow storage pool; and a fourth flushing unit configured to aggregate at least one second object data stored in the fast storage pool into the second aggregate data and flush it down to a second slow storage pool when the at least one second object data satisfies the second data processing condition, where the hierarchical flush pools associated with the storage bucket include the second slow storage pool.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the application, there is also provided an electronic device comprising a memory having stored therein a computer program, and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the application, storage space can be better managed and utilized, and data processing can be made more targeted and flexible, so that the overall efficiency of data storage is significantly improved. Specifically, by setting the first data processing condition and the second data processing condition, data can be handled flexibly according to the characteristics of different object data. This means there is no need to expand or migrate only after a storage pool is full; instead, data meeting a specific condition can be flushed down to the corresponding storage pool in real time according to the actual state of the data.
Furthermore, the application achieves fine-grained management of data by classifying object data according to processing conditions and flushing each class down to a different storage pool (e.g., the first storage pool and the second storage pool). This strategy not only optimizes the use of storage space but also makes data processing more targeted, because each category of data can be deposited into the storage pool best suited to its characteristics.
Furthermore, the application dynamically allocates storage space and flushes data down to different hierarchical storage pools according to the data processing conditions, avoiding the efficiency bottleneck caused in conventional approaches when a single storage pool becomes full, thereby solving the problem of low data storage efficiency and achieving the technical effect of improving data storage efficiency.
Drawings
FIG. 1 is a schematic illustration of an application environment for a data storage method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data storage method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a data storage method according to an embodiment of the application;
FIG. 4 is a schematic diagram of a data storage method according to an embodiment of the application;
FIG. 5 is a schematic diagram of a data storage method according to an embodiment of the application;
FIG. 6 is a schematic diagram of a data storage method according to an embodiment of the application;
Fig. 7 is a block diagram of a data storage device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed on a server device or a similar computing device. Taking execution on a server device as an example, fig. 1 is a block diagram of the hardware structure of a server device for the data storage method according to an embodiment of the present application. As shown in fig. 1, the server device may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing means such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and may further include a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those of ordinary skill in the art that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the server device; for example, the server device may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a data storage method in an embodiment of the present application, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located with respect to the processor 102, which may be connected to the server device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a server device. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a data storage method is provided, fig. 2 is a flowchart of the data storage method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
Step S202, acquiring an object data set stored in a storage bucket;
In one exemplary embodiment, "acquiring an object data set stored in a storage bucket" refers to retrieving all object data stored in a particular bucket and treating that data as a set. In data storage and processing, buckets generally serve as containers for data and may hold various types of object data, such as files, pictures, and videos. Acquiring the data set is the first step of the data processing flow and provides the basis for subsequent operations such as data analysis, aggregation, and flushing down.
As a further example, suppose a cloud storage service in which a user creates a bucket named "photo backup" for storing personal photos. When the user wants to batch-process or analyze the photos, the first step is to acquire the object data set stored in the bucket, that is, to retrieve all photo data in the "photo backup" bucket and form a photo data set.
Step S204, in a case where at least one first object data in the object data set meets a first data processing condition, aggregating the at least one first object data into first aggregate data and flushing the first aggregate data down to a first storage pool, where the hierarchical flush pools associated with the storage bucket include the first storage pool;
In one exemplary embodiment, specific steps in a data processing flow are described. First, it identifies a subset of data satisfying a specific condition (called "first data processing condition") from a previously acquired set of object data, i.e. "at least one first object data". It then aggregates these data satisfying the condition into a new data set, called "first aggregate data". Finally, the aggregated dataset is flushed (i.e., written or transferred) to a designated storage pool, here "first storage pool", which is one of a plurality of hierarchical flush pools previously associated with the bucket.
The first object data refers to single or multiple data objects which are screened from the object data set and meet the first data processing condition.
The first data processing condition is a rule or criteria for screening specific data in the object data set.
And the first aggregation data is a new data set formed by aggregating all the first object data meeting the first data processing condition.
Flushing down refers to the act of moving data from its current processing location to another storage location, typically for backup, archiving, or further processing.
The first storage pool is one of the hierarchical flush pools associated with the storage bucket and is used for storing the first aggregate data.
As a further example, suppose an e-commerce platform needs to handle a large amount of user transaction data. The platform first obtains the object data set stored in a bucket, which may be the users' purchase records. Next, the platform sets a first data processing condition, for example filtering out all records whose transaction amount exceeds 1000 yuan. The transaction records that satisfy this condition (i.e., the first object data) are then aggregated together to form the first aggregate data. Finally, the aggregate data is flushed down to the first storage pool, which may be a high-performance database, for further data analysis and report generation.
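A data processing condition such as the one in the example above can be thought of as a filter predicate over the object data set. The following is a minimal sketch of that idea, following the e-commerce example; the record fields and the 1000-yuan threshold are illustrative assumptions and not part of the patent text.

```python
from typing import Callable, Dict, List

TransactionRecord = Dict[str, float]

def first_condition(record: TransactionRecord) -> bool:
    """A 'first data processing condition': records whose amount exceeds 1000 yuan."""
    return record.get("amount", 0.0) > 1000.0

def select_first_object_data(records: List[TransactionRecord],
                             condition: Callable[[TransactionRecord], bool]) -> List[TransactionRecord]:
    """Screens the object data set and keeps only records meeting the condition."""
    return [r for r in records if condition(r)]

records = [{"amount": 1500.0}, {"amount": 300.0}, {"amount": 2200.0}]
first_object_data = select_first_object_data(records, first_condition)  # keeps two records
```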
In step S206, in a case where at least one second object data in the object data set meets a second data processing condition, aggregating the at least one second object data into second aggregate data and flushing the second aggregate data down to a second storage pool, where the hierarchical flush pools associated with the storage bucket include the second storage pool.
One exemplary embodiment describes a data processing procedure, in particular how object data in a bucket is processed according to specific conditions. Here, "second object data" refers to the data in the object data set that satisfies the "second data processing condition". When such data satisfies the condition, it is aggregated into "second aggregate data" and flushed down (i.e., transferred or saved) into the "second storage pool". This second storage pool is one of a plurality of hierarchical flush pools associated with the storage bucket.
Wherein the second object data is one or more data objects in the object data set that fulfill a specific condition, i.e. a second data processing condition.
And the second data processing condition is a specific rule or standard for screening the second object data from the object data set.
And second aggregate data, which is a data set formed by combining all second object data meeting second data processing conditions.
And a second storage pool, one of the hierarchical flush pools associated with the storage bucket, for storing the second aggregate data.
As a further example, suppose a video editing application is processing video files uploaded by users. These files are stored in a bucket, and the application sets different processing conditions to optimize the video processing flow. The second data processing condition may concern video resolution, for example screening out all videos with a resolution higher than 4K. These high-resolution videos (i.e., the second object data) are then aggregated together to form the second aggregate data, which is flushed down to a second storage pool dedicated to storing high-quality videos.
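Tying the two examples together, steps S202 through S206 amount to grouping the objects in a bucket by the condition each one satisfies, aggregating each group, and flushing each aggregate down to the storage pool tied to that condition. The sketch below illustrates that flow under stated assumptions; the class and function names (StoragePool, flush_down, store) are illustrative and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class StoragePool:
    name: str
    objects: List[bytes] = field(default_factory=list)

    def flush_down(self, aggregate: bytes) -> None:
        """Persists one aggregate object into this pool."""
        self.objects.append(aggregate)

def aggregate(objs: List[bytes]) -> bytes:
    """Aggregates several small objects into one large object (simple concatenation here)."""
    return b"".join(objs)

def store(object_data_set: List[bytes],
          routes: List[Tuple[Callable[[bytes], bool], StoragePool]]) -> None:
    """For each (condition, pool) route, aggregate the matching objects and flush them down."""
    for condition, pool in routes:
        matched = [o for o in object_data_set if condition(o)]
        if matched:
            pool.flush_down(aggregate(matched))

# Example: small objects routed to pool_1, large objects to pool_2.
pool_1, pool_2 = StoragePool("pool_1"), StoragePool("pool_2")
store([b"a" * 10, b"b" * 10_000],
      [(lambda o: len(o) < 1024, pool_1), (lambda o: len(o) >= 1024, pool_2)])
```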
Through the above steps, storage space can be better managed and utilized, and data processing can be made more targeted and flexible, so that the overall efficiency of data storage is significantly improved. Specifically, by setting the first data processing condition and the second data processing condition, data can be handled flexibly according to the characteristics of different object data. This means there is no need to expand or migrate only after a storage pool is full; instead, data meeting a specific condition can be flushed down to the corresponding storage pool in real time according to the actual state of the data.
Furthermore, the application achieves fine-grained management of data by classifying object data according to processing conditions and flushing each class down to a different storage pool (e.g., the first storage pool and the second storage pool). This strategy not only optimizes the use of storage space but also makes data processing more targeted, because each category of data can be deposited into the storage pool best suited to its characteristics.
Furthermore, the application dynamically allocates storage space and flushes data down to different hierarchical storage pools according to the data processing conditions, avoiding the efficiency bottleneck caused in conventional approaches when a single storage pool becomes full, thereby solving the problem of low data storage efficiency and achieving the technical effect of improving data storage efficiency.
The main execution body of the above steps may be a server, a terminal, or the like, but is not limited thereto.
The execution sequence of step S206 and step S204 may be interchanged, i.e. step S206 may be executed first and then step S204 may be executed.
As an alternative, aggregating the at least one first object data into first aggregate data and flushing the first aggregate data down to a first storage pool includes:
S1-1, aggregating the at least one first object data into the first aggregate data;
S1-2, determining the storage pool with the highest priority from the hierarchical flush pools associated with the storage bucket;
S1-3, flushing the first aggregate data down to the storage pool with the highest priority.
In an alternative embodiment, aggregation refers to the process of combining a plurality of individual data objects into a larger data set according to certain rules or conditions.
Flushing down refers to the act of moving data from one processing level to another, typically based on specific logic or rules.
The storage pool with the highest priority is the pool that, among the multiple hierarchical flush pools associated with the bucket, is determined according to a preset priority rule as the first (or primary) pool to receive data.
It should be noted that this embodiment illustrates a sub-step of data processing, in particular how data satisfying a specific condition is aggregated and flushed down. First, the at least one first object data satisfying the first data processing condition is aggregated into the first aggregate data. Next, the system identifies the highest-priority storage pool from the multiple hierarchical flush pools associated with the storage bucket. Finally, the aggregated data (i.e., the first aggregate data) is flushed down to that highest-priority storage pool.
This approach has the advantage of flexibility and efficiency. By aggregating data that satisfies certain conditions, the system can more accurately locate and analyze critical information. At the same time, by flushing the data down to the highest-priority storage pool, the system can ensure that the data can be accessed and processed quickly and efficiently when needed. This hierarchical storage and flush-down strategy also helps balance storage cost against data access performance.
As a further example, in an intelligent logistics system, a large amount of transportation data is collected and stored in a storage bucket. The system needs to process this data periodically to optimize transportation routes. When the system identifies first object data that satisfies the first data processing condition (e.g., orders transported over a distance of 1000 km), it aggregates these data into first aggregate data. Next, the system examines the multiple hierarchical flush pools associated with the bucket to determine which pool has the highest priority (possibly based on storage speed, cost, or other factors). Once determined, the system flushes the first aggregate data down into this highest-priority storage pool for further analysis and optimization.
Through this embodiment of the application, efficient and accurate data processing can be achieved. The first object data satisfying the first data processing condition is effectively aggregated into the first aggregate data and flushed down to the highest-priority storage pool. This not only improves the efficiency of data processing but also ensures that critical data can be prioritized and utilized when required. This processing approach also helps optimize the use of storage resources, reduce storage cost, and improve the performance and reliability of the overall system.
As an alternative, determining the storage pool with the highest priority from the hierarchical flush pools associated with the storage bucket includes:
S2-1, acquiring the creation time of each hierarchical flush pool associated with the storage bucket;
S2-2, determining, as the storage pool with the highest priority, the hierarchical flush pool that was created earliest among those associated with the storage bucket and whose remaining capacity is greater than or equal to the capacity required to store the first aggregate data.
In an alternative embodiment, the creation time refers to the point in time at which a hierarchical flush pool was created or initialized.
The remaining capacity refers to the size of the storage space currently available in a hierarchical flush pool.
The storage pool with the highest priority is the pool that the system, after weighing the two factors of creation time and remaining capacity, selects to receive and store data first.
It should be noted that this embodiment describes a specific data processing step, namely a method for determining the highest-priority storage pool from the multiple hierarchical flush pools associated with a storage bucket. In this process, the system first obtains the creation time of each hierarchical flush pool and then decides based on these times and on the remaining capacity of each pool. Specifically, the system selects the hierarchical flush pool that was created earliest and still has enough remaining capacity to store the first aggregate data, and determines it to be the highest-priority storage pool.
This method of prioritizing has several advantages in practice. First, it helps balance the utilization of storage resources, avoiding situations where some storage pools reach their capacity limit prematurely while others sit idle. Second, by giving priority to storage pools created earlier, it achieves a degree of age-based management, allowing earlier-stored data to migrate naturally toward lower-cost storage tiers. Finally, this approach also helps simplify the storage management policy and reduces the need for human intervention.
As a further example, suppose a large enterprise has multiple hierarchical flush pools for data storage, managed by creation time and capacity. When the enterprise needs to process a new batch of aggregate data, the system determines which storage pool the data should be flushed to according to the method described above. For example, the system may find an early-created storage pool A whose total capacity is not the largest, but whose current remaining capacity is sufficient to store the new batch of aggregate data, while other, later-created storage pools may have insufficient remaining capacity because of recent frequent use. In this case, the system determines storage pool A to be the highest-priority storage pool and flushes the new aggregate data down to it.
By the embodiment of the application, the storage pool with the highest priority can be determined in an efficient and automatic mode. This not only improves the efficiency of data processing, but also ensures that data can be reasonably allocated into different storage tiers. Meanwhile, the priority determining method based on the creation time and the residual capacity is also beneficial to optimizing the storage cost and the management complexity, so that greater economic benefit and operation convenience are brought to enterprises.
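A minimal sketch of this creation-time policy (S2-1/S2-2) follows, assuming a simple PoolStatus record; the field names are illustrative and not defined by the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PoolStatus:
    name: str
    created_at: float      # creation timestamp; smaller means created earlier
    total_bytes: int
    used_bytes: int

    @property
    def remaining_bytes(self) -> int:
        return self.total_bytes - self.used_bytes

def pick_by_creation_time(pools: List[PoolStatus], required_bytes: int) -> Optional[PoolStatus]:
    """Returns the earliest-created pool that can hold the aggregate, or None if all are too full."""
    candidates = [p for p in pools if p.remaining_bytes >= required_bytes]
    return min(candidates, key=lambda p: p.created_at) if candidates else None
```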
As an alternative, determining the storage pool with the highest priority from the hierarchical flush pools associated with the storage bucket includes:
S3-1, acquiring the remaining capacity of each hierarchical flush pool associated with the storage bucket;
S3-2, determining, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining capacity among those associated with the storage bucket.
In an alternative embodiment, the remaining capacity refers to the storage space that is currently unoccupied by the hierarchical flush pool, i.e., the capacity that is available for storing new data.
The highest priority storage pool refers to the hierarchical flush pool with the greatest remaining capacity, which is used preferentially to receive new data.
It should be noted that this embodiment describes how to determine the highest-priority storage pool from the multiple hierarchical flush pools associated with a storage bucket. Specifically, the remaining capacity of each hierarchical flush pool is obtained first, and the selection is then made based on those remaining capacities: the system selects the hierarchical flush pool with the largest remaining capacity and determines it to be the highest-priority storage pool.
This method of determining priority based on remaining capacity is very practical in data storage and management. It can help the system maximize the utilization of limited storage resources, avoiding situations where some storage pools fill prematurely, while other storage pools have a lot of space unused. In addition, this approach can also dynamically adapt to changes in storage requirements, as the remaining capacity of each storage pool changes as data is written and deleted.
As a further example, consider a cloud storage service in which user data is stored in multiple hierarchical flush pools that are ranked according to factors such as performance, cost, and data access frequency. When the service needs to select a storage pool for a user's data, it checks the remaining capacity of each pool. For example, if there are three pools A, B, and C whose remaining capacities are 100 GB, 200 GB, and 50 GB respectively, then according to the method described, pool B (which has the greatest remaining capacity) is determined to be the highest-priority storage pool, and new data is stored there preferentially.
By the embodiment of the application, the storage pool with the highest priority, namely the storage pool with the largest residual capacity, can be automatically and efficiently determined. This ensures that new data can be quickly stored in a storage pool with sufficient space, thereby improving the efficiency of data processing and the utilization of storage resources. Meanwhile, the method simplifies the storage management process, reduces the condition of needing manual intervention, and brings greater flexibility and expandability for the storage system.
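A minimal sketch of this remaining-capacity policy (S3-1/S3-2) follows, mirroring the A/B/C example above; the pool names and capacities are the illustrative figures from that example.

```python
from typing import Dict

def pick_by_remaining_capacity(remaining: Dict[str, int]) -> str:
    """remaining maps pool name -> free bytes; returns the pool with the most free space."""
    return max(remaining, key=remaining.get)

pools = {"A": 100 * 2**30, "B": 200 * 2**30, "C": 50 * 2**30}
assert pick_by_remaining_capacity(pools) == "B"
```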
As an alternative, determining the storage pool with the highest priority from the hierarchical flush pools associated with the storage bucket includes:
S4-1, acquiring the remaining capacity ratio of each hierarchical flush pool associated with the storage bucket;
S4-2, determining, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining capacity ratio among those associated with the storage bucket.
In an alternative embodiment, the remaining capacity ratio refers to the ratio between the remaining capacity of a hierarchical flush pool and its total capacity. This ratio reflects how free the storage pool currently is, i.e., the proportion of its space that is still available.
The storage pool with the highest priority refers to the hierarchical flush pool with the largest remaining capacity ratio. The system uses this pool preferentially for data storage because it has relatively more space available.
It should be noted that this embodiment describes how to determine the highest-priority storage pool from the multiple hierarchical flush pools associated with a storage bucket. Specifically, the process involves obtaining the remaining capacity ratio of each hierarchical flush pool and deciding, based on this ratio, which pool has the highest priority: the hierarchical flush pool with the largest remaining capacity ratio is selected as the highest-priority storage pool.
Using the remaining capacity ratio to determine priority, rather than just the absolute remaining capacity, helps the system utilize the various storage pools more evenly. This is particularly important when the total capacities of the storage pools differ significantly, because it ensures that even a small-capacity storage pool may be used preferentially whenever its remaining capacity ratio is high, thereby avoiding premature saturation of large-capacity storage pools and the idleness of small-capacity ones.
As a further example, suppose a data management system has three hierarchical flush pools: pool X, pool Y, and pool Z. Their total capacities are 1000 GB, 1500 GB, and 2000 GB respectively, while their current remaining capacities are 400 GB, 600 GB, and 800 GB. Calculating the remaining capacity ratios gives 40% for pool X (400/1000), 40% for pool Y (600/1500), and 40% for pool Z (800/2000). Although the three pools differ in remaining capacity, they have the same remaining capacity ratio. In practice, however, the ratios will usually differ, for example 40% for pool X, 30% for pool Y, and 50% for pool Z. In that case, according to the described method, pool Z (with the largest remaining capacity ratio) is determined to be the highest-priority storage pool.
Through this embodiment of the application, the hierarchical flush pool with the currently largest remaining capacity ratio can be effectively identified and determined as the highest-priority storage pool. This not only increases the efficiency and flexibility of data storage, but also helps achieve an optimal allocation of storage resources and sustainable long-term management. At the same time, the method reduces the need for manual intervention and makes the storage management process more automated and intelligent.
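A minimal sketch of this remaining-capacity-ratio policy (S4-1/S4-2) follows, using the X/Y/Z ratios from the example above (40%, 30%, 50%); the absolute remaining capacities chosen to produce those ratios are assumptions for illustration.

```python
from typing import Dict, Tuple

def pick_by_remaining_ratio(pools: Dict[str, Tuple[int, int]]) -> str:
    """pools maps pool name -> (remaining_gb, total_gb); returns the pool with the
    largest remaining/total ratio."""
    return max(pools, key=lambda name: pools[name][0] / pools[name][1])

pools = {"X": (400, 1000), "Y": (450, 1500), "Z": (1000, 2000)}
assert pick_by_remaining_ratio(pools) == "Z"   # ratios: 40% vs 30% vs 50%
```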
As an alternative, aggregating at least one second object data in the object data set into second aggregate data and flushing the second aggregate data down to a second storage pool, in a case where the at least one second object data satisfies a second data processing condition, includes:
S5-1, aggregating the at least one second object data into the second aggregate data and flushing the second aggregate data down to the second storage pool when the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required by the second aggregate data, and the remaining capacity of the second storage pool is larger than the capacity required by the second aggregate data; or,
S5-2, creating the second storage pool when the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required by the second aggregate data, and the remaining capacity of every hierarchical flush pool associated with the storage bucket is smaller than the capacity required by the second aggregate data;
S5-3, aggregating the at least one second object data into the second aggregate data and flushing the second aggregate data down to the second storage pool.
In an alternative embodiment, the second object data refers to data objects in the object data set that satisfy a particular "second data processing condition".
Second aggregated data, a data set aggregated from at least one 'second object data' satisfying 'second data processing conditions'.
The second storage pool, the storage pool for storing the second aggregate data, may be pre-existing or newly created when needed.
It should be noted that this embodiment describes how, under specific conditions, the at least one second object data satisfying the second data processing condition is aggregated into the second aggregate data, and how the aggregate data is flushed down to an appropriate storage pool depending on the remaining capacity of the storage pools. Specifically, the method involves two main scenarios: first, when the first storage pool lacks capacity but the second storage pool has enough, the data is aggregated and flushed directly down to the second storage pool; second, when none of the existing storage pools has enough capacity, the second storage pool is created first, and the data aggregation and flush-down operations are performed afterwards.
Such data processing and storage strategies are very useful for managing dynamically changing data sets, especially in scenarios where flexible adjustments are required depending on data characteristics and storage resource availability. The method can ensure that the data is effectively stored, can help to optimize the use of storage resources, and avoids resource waste or data loss.
Further by way of example, consider optionally a video processing system in which "second object data" may refer to video files having a resolution above a certain threshold. When these files reach a certain number or total size, the system needs to aggregate them (e.g., package compression) for storage. If the current primary storage pool (first storage pool) does not have sufficient space to store the aggregated data, but another backup storage pool (second storage pool) has sufficient space, the system may flush the data down to the backup storage pool. If all the storage pools do not have sufficient space, the system may automatically create a new storage pool (the second storage pool) and flush the data down into it.
By the embodiment of the application, when 'second object data' meets specific processing conditions, the data can be efficiently aggregated, and the appropriate storage pool can be intelligently selected or created to store the aggregated data according to the residual capacity condition of the storage pool. The method not only improves the efficiency of data processing and the utilization rate of storage resources, but also enhances the flexibility and the expandability of the system, so that the system can better cope with the changing data storage requirements.
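A minimal sketch of the fallback behaviour in S5-1 through S5-3 follows: the aggregate goes to any existing pool with room, and a new pool is created only when every existing hierarchical flush pool is too full. The write and create_pool callbacks are hypothetical hooks standing in for whatever the storage cluster actually provides.

```python
from typing import Callable, Dict

def flush_second_aggregate(data: bytes,
                           pools: Dict[str, int],                # pool name -> remaining bytes
                           write: Callable[[str, bytes], None],  # writes an aggregate into a named pool
                           create_pool: Callable[[], str]) -> str:
    """Flushes the aggregate to an existing pool with room; otherwise creates a new pool first."""
    for name, remaining in pools.items():
        if remaining >= len(data):
            write(name, data)          # S5-1: an existing pool still has enough space
            return name
    new_pool = create_pool()           # S5-2: every existing hierarchical flush pool is too full
    write(new_pool, data)              # S5-3: flush the second aggregate data into the new pool
    return new_pool
```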
As an alternative, after acquiring the object data set stored in the bucket, the method further includes:
S6-1, assigning a hierarchical flag to object data in the object data set, where the hierarchical flag indicates that the data is allowed to be stored in the hierarchical flush pools associated with the storage bucket;
S6-2, aggregating at least one first object data into the first aggregate data in a case where the at least one first object data carries the hierarchical flag;
S6-3, flushing the first aggregate data down to the first storage pool in a case where the number of aggregated object data in the object data set is greater than or equal to a first preset threshold, or the data amount of the first aggregate data is greater than or equal to a second preset threshold.
In an alternative embodiment, the hierarchical flag is an identifier indicating whether the object data is allowed to be stored in the hierarchical flush pools associated with the bucket.
The first object data refers to the data in the object data set that carries the hierarchical flag.
The first aggregate data is the data set formed by aggregating at least one first object data carrying the hierarchical flag.
The first preset threshold and the second preset threshold are used, respectively, to judge whether the number of aggregated object data and the data amount of the first aggregate data have reached the standard for flushing down to the storage pool.
It should be noted that this embodiment describes a series of operations performed after the object data set stored in the bucket has been acquired. First, the system assigns hierarchical flags to the object data, indicating which data may be stored in the hierarchical flush pools associated with the bucket. Then, when at least one first object data carries such a hierarchical flag, the system aggregates that data into the first aggregate data. Finally, the system flushes the first aggregate data down to the first storage pool once a condition is met (e.g., the number of aggregated objects or the data amount of the first aggregate data reaches a preset threshold).
This flag-based data processing strategy can be widely applied in scenarios that need to manage large amounts of data efficiently and flexibly, such as big data analysis, cloud computing, and the Internet of Things. By assigning different hierarchical flags to data, the system can intelligently select storage locations based on the characteristics of the data (e.g., importance, access frequency, processing requirements), thereby optimizing the use of storage resources and the efficiency of data processing.
As a further example, suppose a cloud computing platform needs to handle a large number of files uploaded by users. The platform first assigns hierarchical flags to these files (the object data) to distinguish which files may be stored in a high-performance storage pool and which in a lower-cost one. When a user uploads a batch of files (the first object data) carrying a specific hierarchical flag, the platform aggregates the files into a compressed package (the first aggregate data). When the number or total size of the compressed packages reaches the preset threshold, the platform automatically flushes them down to the corresponding tier's storage pool to optimize storage efficiency and cost.
Through this embodiment of the application, efficient management and intelligent storage of object data in a storage bucket can be achieved. The use of hierarchical flags allows the system to allocate storage flexibly based on data characteristics and storage requirements, while the aggregation and flush-down operations help reduce wasted storage and increase data processing speed. Overall, this approach improves the efficiency of data management and the performance of the storage system while reducing storage cost.
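A minimal sketch of S6-1 through S6-3 follows: only objects carrying the hierarchical flag are aggregated, and the aggregate is flushed once either the object-count threshold or the size threshold is reached. The concrete threshold values and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ObjectData:
    payload: bytes
    hierarchical_flag: bool = False   # set when the object may enter the hierarchical flush pools

COUNT_THRESHOLD = 64                  # first preset threshold: number of aggregated objects (assumed)
SIZE_THRESHOLD = 4 * 2**20            # second preset threshold: bytes in the aggregate (assumed)

def maybe_flush(object_data_set: List[ObjectData], flush: Callable[[bytes], None]) -> None:
    """Aggregates flagged objects and flushes the aggregate once either threshold is met."""
    flagged = [o for o in object_data_set if o.hierarchical_flag]
    aggregate = b"".join(o.payload for o in flagged)
    if len(flagged) >= COUNT_THRESHOLD or len(aggregate) >= SIZE_THRESHOLD:
        flush(aggregate)              # flush down to the first storage pool
```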
As an alternative, after acquiring the object data set stored in the bucket, the method further includes:
S7-1, storing the object data of the object data set into a fast storage pool in a case where the amount of object data in the object data set is greater than or equal to a third preset threshold;
S7-2, aggregating at least one first object data stored in the fast storage pool into first aggregate data and flushing the first aggregate data down to a first slow storage pool in a case where the at least one first object data satisfies the first data processing condition, where the hierarchical flush pools associated with the storage bucket include the first slow storage pool;
S7-3, aggregating at least one second object data stored in the fast storage pool into second aggregate data and flushing the second aggregate data down to a second slow storage pool in a case where the at least one second object data satisfies the second data processing condition, where the hierarchical flush pools associated with the storage bucket include the second slow storage pool.
In an alternative embodiment, the third preset threshold is a preset data size criterion, and when the data size in the object data set reaches or exceeds the threshold, the subsequent data storage and processing flow is triggered.
A fast storage pool is a high-performance storage space used for temporarily storing a large amount of object data so that data can be processed and aggregated quickly.
First data processing conditions and second data processing conditions, specific data processing criteria or rules, for determining which data should be aggregated and flushed down to the corresponding slow storage pool.
A first slow storage pool and a second slow storage pool, which may provide lower access speeds than fast storage pools, but typically have greater storage capacity for long-term storage of aggregated data.
It should be noted that this embodiment describes a data processing flow in which, after the object data set in the bucket has been acquired, the data is stored into different storage pools and the corresponding aggregation and flush-down operations are performed according to the data amount and the specific processing conditions. Specifically, when the amount of object data reaches a certain threshold, the data is first stored in a fast storage pool. The data is then aggregated from the fast storage pool and flushed down to different slow storage pools depending on which data processing conditions it satisfies.
The hierarchical storage and processing strategy is not only suitable for an e-commerce platform, but also can be widely applied to the fields of finance, the Internet of things, big data analysis and the like. By reasonably setting the data processing conditions and the classification of the storage pool, the limited storage resources can be more effectively managed and utilized, and meanwhile, the efficiency and the accuracy of data processing are improved.
As a further example, consider an e-commerce platform that generates large amounts of transaction data rapidly during peak periods of user activity. When the data amount reaches the preset threshold (the third preset threshold), the system quickly stores the data in a high-performance storage pool (the fast storage pool). The system then classifies the data according to different processing conditions (e.g., transaction type, amount). Data meeting the first data processing condition (e.g., high-value transactions) is aggregated and flushed down to the first slow storage pool, while data meeting the second data processing condition (e.g., ordinary transactions) is flushed down to the second slow storage pool.
Through this embodiment of the application, rapid storage and efficient processing of a large amount of data can be achieved. The hierarchical storage strategy ensures that data is allocated to different storage pools according to its importance and processing requirements, thereby optimizing the utilization of storage resources. At the same time, through the aggregation and flush-down operations, the system can reduce the storage of redundant data and improve the efficiency and accuracy of data processing. Overall, this method helps improve data storage and processing performance, reduces storage cost, and meets the requirements of complex data processing scenarios.
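A minimal sketch of the staging flow in S7-1 through S7-3 follows: once the bucket holds enough objects they are staged in a fast pool, then aggregated and flushed down to slow pools according to the condition each object satisfies. The threshold value, callback names, and the use of simple byte strings are assumptions for illustration.

```python
from typing import Callable, List

THIRD_THRESHOLD = 1000                      # assumed object count that triggers staging

def stage_and_flush(object_data_set: List[bytes],
                    fast_pool: List[bytes],
                    is_first: Callable[[bytes], bool],
                    flush_first_slow: Callable[[bytes], None],
                    flush_second_slow: Callable[[bytes], None]) -> None:
    """Stages objects into the fast pool, then aggregates and flushes them to slow pools."""
    if len(object_data_set) < THIRD_THRESHOLD:
        return
    fast_pool.extend(object_data_set)       # S7-1: stage into the fast storage pool
    first = b"".join(o for o in fast_pool if is_first(o))
    second = b"".join(o for o in fast_pool if not is_first(o))
    if first:
        flush_first_slow(first)             # S7-2: first aggregate data -> first slow storage pool
    if second:
        flush_second_slow(second)           # S7-3: second aggregate data -> second slow storage pool
```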
As an alternative, for ease of understanding, the above data storage method is applied to a distributed object storage scenario, i.e., a distributed storage scenario for unstructured data objects. Current distributed object storage systems implement small-object aggregation for the common case of massive numbers of small objects: the small objects are first written to a high-performance storage medium (NVMe) to improve the read/write performance of the storage system, then a background process aggregates many small objects into one large object and flushes it down to a relatively cheap storage medium (SSD), which reduces the number of RADOS objects and greatly accelerates cluster rebuild. However, in current implementations the aggregated data can only be flushed down to a single fixed storage pool: multiple hierarchical flush pools are not supported, the pool cannot be selected manually, a suitable flush pool cannot be matched automatically, and load balancing across multiple data pools is not supported. Once the only hierarchical flush pool is full, the aggregated data has nowhere to go; the only options are to expand the capacity of that storage pool or to create a new storage bucket bound to a new hierarchical flush pool, which gives customers a poor experience.
Of these two ways of expanding capacity, the first way expands the original data pool, which triggers cluster data reconstruction; when the data pool contains a large number of objects, the reconstruction takes correspondingly longer and front-end services are affected while it runs. The second way does not trigger cluster reconstruction, but under the current mechanism a storage bucket has exactly one hierarchical flush pool that cannot be changed, so an old bucket cannot continue flushing aggregate data down to a new hierarchical flush pool and therefore cannot continue to be used. When the cluster is expanded by creating a new hierarchical flush pool, a new storage bucket must first be created and bound to it, so the customer's front-end services must be re-planned and adapted, which causes great inconvenience and seriously affects the user experience.
Based on this, the present embodiment provides a hierarchical aggregation scheme built on multiple hierarchical flush pools in a distributed object storage system. A user can create multiple hierarchical flush pools and designate an aggregate-data flush-down policy, so that aggregate flushing is balanced automatically: the optimal hierarchical flush pool is selected automatically for data persistence according to the policy the user chose, and aggregated objects can still be accessed correctly. With this scheme, when the space of a hierarchical flush pool is insufficient, the user can create a new storage pool and set a hierarchical aggregation flush-down policy to expand the hierarchical aggregation data pools, achieving hierarchical aggregation for a single bucket across multiple pools. The pools can be balanced automatically according to the policy, which avoids the risk of a single flush pool filling up, effectively eliminates the data reconstruction and service switching that a single flush pool would cause, and removes the impact of capacity-expansion operations on customers after an on-site hierarchical flush pool becomes full. Seamless switching between old and new hierarchical flush pools is achieved, greatly improving the usability and ease of use of the product, enriching the functions of distributed object storage, reducing operation and maintenance costs, giving customers a better experience, and greatly enhancing the market competitiveness of the distributed object storage product.
In an optional embodiment, a hierarchical-aggregation scheme based on multiple hierarchical flush pools in a distributed object storage system is proposed. A user may create multiple hierarchical flush pools and specify an aggregated-data flush policy, so that flushing of aggregated data is balanced automatically, the optimal hierarchical flush pool is selected for data persistence according to the policy the user chose, and the aggregated objects remain correctly accessible. With this scheme, when storage-pool space is insufficient, the user can create a new hierarchical flush pool and set a flush policy, and the aggregated data in the bucket is then flushed evenly across the multiple hierarchical flush pools. The function comprises the following modules:
Hierarchical flush pool creation and marking: hierarchical flush pools are given dedicated labels. When a hierarchical flush pool is created, a label "obj.aggdata" and a flush priority "obj.agglevel" are added to the storage pool, indicating that the pool is used for hierarchical aggregation and specifying its flush priority.
Hierarchical flush policy setting and query module: responsible for setting, persisting and querying the policy. Once set, the hierarchical flush policy is recorded on the bucket as metadata, for use by the aggregation thread when it flushes aggregated data.
Specifically, the hierarchical-aggregation flush policy includes at least one of the following (an illustrative sketch of selecting a pool under these four strategies follows the module descriptions below):
1. By set priority: a flush priority is specified when each hierarchical flush pool is created. Under this policy, data is written according to pool priority, with higher-priority flush pools filled first.
2. By creation time: pools are used in the order in which they were created. Under this policy, aggregated data is first flushed to the earliest-created hierarchical flush data pool; once that pool is full, data is written to the second hierarchical flush pool, and so on.
3. Remaining capacity first: under this policy, aggregated data is preferentially flushed to the hierarchical flush pool with the largest remaining capacity.
4. Remaining capacity ratio first: under this policy, aggregated data is preferentially flushed to the hierarchical flush pool with the largest ratio of remaining capacity.
Hierarchical flush pool usage update and maintenance module: responsible for periodically obtaining the usage of each hierarchical flush pool (remaining capacity, remaining-capacity ratio, creation time and set priority) and updating the record kept in memory.
Policy selection module: responsible for selecting a hierarchical flush pool according to the policy chosen by the user and the current usage of the hierarchical flush pools. Specifically, the object storage service creates a dedicated thread (the hierarchical flush pool information maintenance module) that periodically refreshes the pools' usage so that the policy selection module is ready to choose a flush pool for aggregation. Every 3 seconds this thread obtains the water level of each hierarchical flush pool, calculates each pool's usage ratio, and maintains a hierarchical flush pool usage table in memory that records each pool's set priority, creation time, capacity usage and remaining-capacity ratio.
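By way of an illustrative sketch only (the FlushPoolInfo record, its field names and the select_flush_pool helper below are assumptions, not the embodiment's actual interfaces), the four flush policies listed above could rank candidate pools as follows:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FlushPoolInfo:
    """Hypothetical per-pool record kept in the in-memory usage table."""
    name: str
    priority: int            # value of the "obj.agglevel" label
    created_at: float        # creation timestamp
    capacity_bytes: int
    remaining_bytes: int

    @property
    def remaining_ratio(self) -> float:
        return self.remaining_bytes / self.capacity_bytes if self.capacity_bytes else 0.0

def select_flush_pool(pools: List[FlushPoolInfo], policy: str,
                      required_bytes: int) -> Optional[FlushPoolInfo]:
    # Only consider pools that can still hold the aggregated object.
    candidates = [p for p in pools if p.remaining_bytes >= required_bytes]
    if not candidates:
        return None
    if policy == "priority":            # 1. highest set priority first
        return max(candidates, key=lambda p: p.priority)
    if policy == "creation_time":       # 2. earliest-created pool first
        return min(candidates, key=lambda p: p.created_at)
    if policy == "remaining_capacity":  # 3. largest remaining capacity first
        return max(candidates, key=lambda p: p.remaining_bytes)
    if policy == "remaining_ratio":     # 4. largest remaining-capacity ratio first
        return max(candidates, key=lambda p: p.remaining_ratio)
    raise ValueError(f"unknown flush policy: {policy}")
```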
After hierarchical aggregation is enabled, small objects written through the front-end service are first written into a fast pool (NVMe medium), and a hierarchical-aggregation mark is recorded on the bucket shard for each small object. A hierarchical-aggregation background processing thread enumerates the hierarchical-aggregation marks on the bucket shards and aggregates the written objects. After aggregation, the aggregated data is flushed down to a cold pool: the policy selection module is invoked, it automatically selects a matching hierarchical flush pool according to the per-pool information table kept in memory, and the hierarchical aggregation thread persists the aggregated data to the selected hierarchical flush pool. After the data has been flushed, the data portion of each original object is deleted, and aggregation attribute information is recorded in the metadata of each small object involved in the aggregation; it records the real storage location of the data, comprising the aggregation flush pool, the oid of the aggregated large object, the offset and the object size, for use when the object is downloaded.
When an aggregated object is downloaded, the system first checks whether the aggregation attribute exists. If it does, the real storage location of the data (aggregation flush pool, oid of the aggregated large object, offset and object size) is found from the recorded aggregation information, and the object data is read accordingly.
It should be noted that when a hierarchical flush pool is constructed, this embodiment labels it with "obj.aggdata" and sets the flush priority "obj.agglevel". These labels specify the special purpose of the storage pool, namely providing the flush service for hierarchically aggregated data, and specify its flush priority. This arrangement allows the hierarchical flush pool information update and maintenance module to manage the pools more efficiently.
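As an illustrative sketch only (the StoragePool structure and the creation helper below are hypothetical, not part of this embodiment's interfaces), the labelling of a hierarchical flush pool at creation time could look like:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class StoragePool:
    """Hypothetical in-memory view of a storage pool and its labels."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0
    labels: Dict[str, str] = field(default_factory=dict)

def create_tier_flush_pool(name: str, capacity_bytes: int, agg_level: int) -> StoragePool:
    # Mark the pool as a hierarchical-aggregation flush pool ("obj.aggdata")
    # and record its flush priority ("obj.agglevel"), as described above.
    pool = StoragePool(name=name, capacity_bytes=capacity_bytes)
    pool.labels["obj.aggdata"] = "true"
    pool.labels["obj.agglevel"] = str(agg_level)
    return pool
```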
Through an intuitive visual interface, the user can easily set, save and query the hierarchical-aggregation flush policy. The embodiment supports four flexible policy options: flush to the higher-priority pool first, prefer the earlier-created pool, prefer the pool with the larger remaining capacity, and prefer the pool with the higher remaining-capacity ratio.
The background system periodically obtains the latest state of the hierarchical flush pools from the OSDs (object storage devices), including the remaining capacity, remaining-capacity ratio, creation time and set priority. This information is updated into the in-memory record table in real time, ensuring that the data is accurate and up to date.
After the small objects have been aggregated, the flush policy selection module selects the optimal flush pool for persisting the aggregated data according to the hierarchical flush pool information record table in memory. This allows the hierarchically aggregated data of a single bucket to be distributed flexibly across multiple pools.
By parsing the aggregation attribute information recorded in the object header, the system can quickly locate key information such as the flush pool where the object data resides, the oid (object identifier) of the aggregated large object, the offset and the size. Based on this information, the data can be read and downloaded efficiently.
Further by way of example, an alternative hierarchical-aggregation flush policy setting flow is shown in FIG. 3 and comprises the following steps:
A1. Setting the hierarchical-aggregation flush policy:
A1-1. Set the hierarchical-aggregation flush policy on the client interface:
The user selects multiple hierarchical flush pools through the management interface; these pools are used for subsequent data processing.
After selecting the hierarchical flush pools, the user further sets a flush policy (for example set priority first or remaining-capacity ratio first) to meet different data-processing requirements.
A1-2. After the setting request is received, the metadata information is assembled and recorded into the bucket metadata:
After receiving the user's setting request, the system assembles the selected hierarchical flush pools and the flush policy into metadata information.
This metadata information is then recorded into the metadata of the corresponding bucket in preparation for subsequent storage and queries.
A1-3. The assembled metadata information is persisted into the storage pool:
The metadata information is persisted into the metadata pool, ensuring that it is preserved long term and remains accessible.
A2. Querying the hierarchical-aggregation flush policy:
A2-1. The client interface queries the hierarchical-aggregation flush policy:
The user issues a query for the hierarchical-aggregation flush policy through the management interface to obtain the currently configured policy information.
A2-2. After the request is received, the corresponding metadata information is read from the storage pool:
After receiving the query request, the system retrieves the metadata information matching the user's request from the metadata pool.
A2-3. The result is parsed and returned to the client:
The retrieved metadata information is parsed, presented in a user-friendly form and returned to the client for the user to inspect or act on further (a compact sketch of this set/query flow follows these steps).
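As an illustrative sketch only, assuming a hypothetical key-value bucket-metadata store (the helper names, the metadata key and the JSON encoding below are assumptions, not the embodiment's actual interfaces), steps A1 and A2 could be expressed as:

```python
import json
from typing import Dict, List

# Hypothetical bucket-metadata store: bucket name -> metadata key -> value.
BUCKET_METADATA: Dict[str, Dict[str, str]] = {}

POLICY_KEY = "hier_agg.flush_policy"  # illustrative metadata key

def set_flush_policy(bucket: str, pools: List[str], policy: str) -> None:
    # A1: assemble the selected flush pools and policy, record them on the bucket,
    # and persist them (here the dict stands in for the metadata pool).
    meta = json.dumps({"flush_pools": pools, "policy": policy})
    BUCKET_METADATA.setdefault(bucket, {})[POLICY_KEY] = meta

def get_flush_policy(bucket: str) -> Dict:
    # A2: read the metadata back, parse it and return it to the caller.
    raw = BUCKET_METADATA.get(bucket, {}).get(POLICY_KEY)
    return json.loads(raw) if raw else {}
```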
In an alternative embodiment, the updating and maintenance of the hierarchical-aggregation flush pool information is shown in FIG. 4; the specific steps are as follows (a sketch of this maintenance loop follows the steps):
B1. A background thread is started, automatically triggering the information update flow.
B2. The usage (water level) of each hierarchical flush pool is obtained periodically from the OSDs (object storage devices), ensuring the data is current and accurate.
B3. Based on the obtained data, the remaining capacity and remaining-capacity ratio of each hierarchical flush pool are calculated, providing the key metrics for resource management.
B4. The hierarchical flush pool information table kept in memory is updated or filled in: the calculated remaining capacity, remaining-capacity ratio, original priority and creation time are all written into the table, ensuring the information is complete and up to date.
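A sketch of such a maintenance thread, reusing the hypothetical FlushPoolInfo record from the earlier sketch (the stats callback, locking and table layout are illustrative assumptions; only the 3-second refresh interval and the recorded fields come from this embodiment):

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Dict, List

class FlushPoolUsageTable:
    """In-memory usage table refreshed by a background thread every 3 seconds."""

    def __init__(self, fetch_stats: Callable[[], List[FlushPoolInfo]],
                 interval_s: float = 3.0):
        self._fetch_stats = fetch_stats   # e.g. queries the OSDs for pool water levels
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._table: Dict[str, FlushPoolInfo] = {}

    def start(self) -> None:
        # B1: start the background thread that drives the update flow.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            stats = self._fetch_stats()       # B2: obtain per-pool water levels
            with self._lock:
                # B3/B4: remaining capacity and ratio are carried by FlushPoolInfo;
                # the table is replaced wholesale so readers see a consistent snapshot.
                self._table = {p.name: p for p in stats}
            time.sleep(self._interval_s)

    def snapshot(self) -> List[FlushPoolInfo]:
        with self._lock:
            return list(self._table.values())
```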
In an alternative embodiment, the hierarchical-aggregation flush policy selection is shown in FIG. 5; the specific steps are as follows (a condensed code sketch follows the steps):
C1. The hierarchical-aggregation background processing thread is started, marking the start of the whole hierarchical-aggregation flow.
C2. The system enumerates all buckets in the storage system, ensuring none is missed.
C3. The flow then enters a loop that processes the enumerated buckets one by one, ensuring each bucket is handled correctly.
C4. For each bucket, the system enumerates the hierarchical-aggregation marks recorded on the bucket shards; each mark corresponds to one small object. Through this step the system identifies which small objects need to be aggregated.
C5. According to preset conditions (by default, 1024 small objects or 4 MB of aggregated data), the system aggregates multiple small objects into one large object to improve storage efficiency and access speed.
C6. After aggregation, the system calls the hierarchical-aggregation flush policy selection module to apply the balancing policy and select the most suitable hierarchical flush pool. This step is key to distributing the data evenly and improving the overall performance of the system.
C7. For subsequent data management and garbage collection, the system records bitmap information, which is essential to the garbage collection module.
C8. The system persists the aggregated data to the selected hierarchical flush pool, ensuring the data is safe and reliable.
C9. After the data has been persisted, the system deletes the data portion of the original small objects to release storage space. At the same time, aggregation information is recorded in the head metadata of each original small object, including the actual hierarchical flush pool agg_data_pool, the offset, the object size and the oid of the aggregated large object. This information is used later when the object is downloaded.
C10. Finally, the system deletes the hierarchical-aggregation marks, marking the end of this pass of the hierarchical-aggregation flush policy selection flow.
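A condensed sketch of steps C4–C10, assuming a select-style hook supplied by the policy selection module and a hypothetical persistence hook (the Bucket and SmallObject structures, the hook signatures, and all names other than agg_data_pool, the offset, the size and the aggregate oid are illustrative assumptions; the 1024-object and 4 MB defaults are as stated above):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_OBJECTS = 1024               # default aggregation trigger: object count (C5)
MAX_BYTES = 4 * 1024 * 1024      # default aggregation trigger: 4 MB of data (C5)

@dataclass
class SmallObject:
    oid: str
    data: bytes

@dataclass
class Bucket:
    """Hypothetical bucket state: pending marks, object data and head metadata."""
    agg_marks: List[str] = field(default_factory=list)       # C4: one mark per small object
    objects: Dict[str, SmallObject] = field(default_factory=dict)
    head_metadata: Dict[str, dict] = field(default_factory=dict)

def aggregate_bucket(bucket: Bucket,
                     pick_pool: Callable[[int], str],
                     write_large_object: Callable[[str, bytes], str]) -> None:
    """One pass of steps C4-C10. pick_pool(required_bytes) returns the pool name
    chosen by the policy selection module (C6); write_large_object(pool, blob)
    persists the aggregate and returns its oid (C8). Both are assumed hooks."""
    batch: List[SmallObject] = []
    size = 0
    for oid in list(bucket.agg_marks):                        # C4: enumerate marks
        batch.append(bucket.objects[oid])
        size += len(bucket.objects[oid].data)
        if len(batch) >= MAX_OBJECTS or size >= MAX_BYTES:    # C5: aggregation trigger
            _flush(bucket, batch, pick_pool, write_large_object)
            batch, size = [], 0
    if batch:
        _flush(bucket, batch, pick_pool, write_large_object)

def _flush(bucket: Bucket, batch: List[SmallObject],
           pick_pool: Callable[[int], str],
           write_large_object: Callable[[str, bytes], str]) -> None:
    blob = b"".join(o.data for o in batch)
    pool = pick_pool(len(blob))                  # C6: policy-based pool selection
    # C7: bitmap information for garbage collection would be recorded here (omitted).
    agg_oid = write_large_object(pool, blob)     # C8: persist the aggregated large object
    offset = 0
    for o in batch:
        # C9: keep head metadata pointing into the aggregate, drop the original data.
        bucket.head_metadata[o.oid] = {"agg_data_pool": pool, "agg_oid": agg_oid,
                                       "offset": offset, "size": len(o.data)}
        offset += len(o.data)
        bucket.objects[o.oid].data = b""         # C9: original data portion deleted
        bucket.agg_marks.remove(o.oid)           # C10: clear the aggregation mark
```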
In an alternative embodiment, the download flow is shown in FIG. 6; the specific steps are as follows (a minimal read-path sketch follows the steps):
D1. The server receives an object download request from the client and first obtains the object's metadata.
D2. The system then checks whether the object metadata contains aggregation attribute information. If the attribute does not exist, the data is processed and returned according to the original download flow.
D3. If the aggregation attribute exists, the flow enters the aggregated-data processing stage. The real storage location of the object data is first determined from the aggregation information, i.e. the storage pool agg_data_pool where the aggregated large object resides. Then the key information is obtained: the unique identifier oid of the aggregated large object, the data offset and the data size.
D4. The system then reads the corresponding data fragments from the storage pool and assembles them. This may involve splicing and verifying the data to ensure its integrity and accuracy.
D5. Finally, the assembled complete data is returned to the client, completing the download flow.
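A minimal sketch of the D1–D5 read path; read_plain and read_slice are assumed hooks standing in for the ordinary download path and for reading a range of the aggregated large object, and apart from agg_data_pool, the offset and the size, the key names are illustrative:

```python
def download_object(head_metadata: dict, read_plain, read_slice) -> bytes:
    # D1 has already fetched head_metadata for the requested object.
    # D2: no aggregation attribute -> serve the object via the ordinary download flow.
    if "agg_data_pool" not in head_metadata:
        return read_plain()
    # D3: locate the aggregated large object from the recorded attributes.
    pool = head_metadata["agg_data_pool"]
    agg_oid = head_metadata["agg_oid"]
    offset, size = head_metadata["offset"], head_metadata["size"]
    # D4/D5: read exactly this object's slice out of the aggregate and return it.
    return read_slice(pool, agg_oid, offset, size)
```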
Through the hierarchical-aggregation scheme based on multiple hierarchical flush pools provided by this embodiment, a user can create multiple hierarchical flush pools and specify an aggregated-data flush policy; flushing of aggregated data is balanced automatically, the optimal hierarchical flush pool is selected for data persistence according to the policy the user chose, and the aggregated objects remain correctly accessible. With this scheme, when hierarchical flush pool space runs low, the user can create a new storage pool and set a hierarchical-aggregation balancing policy to expand the hierarchical-aggregation data pools. A single bucket can thus aggregate across multiple pools, balancing is performed automatically according to the policy, the risk of a single flush pool filling up and the data reconstruction and service switching it would cause are avoided, the impact of expansion operations on customers after an existing hierarchical flush pool is full is effectively eliminated, and switching between old and new hierarchical flush pools is seamless. This greatly improves the usability of the product, enriches the functionality of distributed object storage, reduces operation and maintenance costs, gives customers a better experience, and strengthens the market competitiveness of distributed object storage products.
From the description of the above embodiments, it will be clear to a person skilled in the art that the methods of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, though in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods of the various embodiments of the present application.
In this embodiment, a data storage device is further provided. The device is used to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 7 is a block diagram of a data storage device according to an embodiment of the present application. As shown in FIG. 7, the device includes:
an obtaining unit 702, configured to obtain an object data set stored in a bucket;
a first flushing unit 704, configured to, in a case where at least one first object data in the object data set satisfies a first data processing condition, aggregate the at least one first object data into first aggregated data and flush the first aggregated data down to a first storage pool, where the hierarchical flush pools associated with the bucket include the first storage pool; and
a second flushing unit 706, configured to, in a case where at least one second object data in the object data set satisfies a second data processing condition, aggregate the at least one second object data into second aggregated data and flush the second aggregated data down to a second storage pool, where the hierarchical flush pools associated with the bucket include the second storage pool.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the first flushing unit 704 includes:
an aggregation module, configured to aggregate the at least one first object data into first aggregated data;
a determining module, configured to determine the storage pool with the highest priority among the hierarchical flush pools associated with the bucket; and
a first flushing module, configured to flush the first aggregated data down to the storage pool with the highest priority.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the determining module includes:
a first obtaining sub-module, configured to obtain the creation time corresponding to each hierarchical flush pool associated with the bucket; and
a first determining sub-module, configured to determine, as the storage pool with the highest priority, the hierarchical flush pool that, among the hierarchical flush pools associated with the bucket, was created earliest and whose remaining capacity is greater than or equal to the capacity required to store the first aggregated data.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the determining module includes:
a second obtaining sub-module, configured to obtain the remaining capacity corresponding to each hierarchical flush pool associated with the bucket; and
a second determining sub-module, configured to determine, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining capacity among the hierarchical flush pools associated with the bucket.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the determining module includes:
a third obtaining sub-module, configured to obtain the remaining-capacity ratio corresponding to each hierarchical flush pool associated with the bucket; and
a third determining sub-module, configured to determine, as the storage pool with the highest priority, the hierarchical flush pool with the largest remaining-capacity ratio among the hierarchical flush pools associated with the bucket.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the second flushing unit 706 includes (a sketch of this fallback logic follows this unit's description):
a second flushing module, configured to, in a case where the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required for the second aggregated data, and the remaining capacity of the second storage pool is larger than the capacity required for the second aggregated data, aggregate the at least one second object data into second aggregated data and flush the second aggregated data down into the second storage pool; or
a creating module, configured to create the second storage pool in a case where the at least one second object data satisfies the second data processing condition, the remaining capacity of the first storage pool is smaller than the capacity required for the second aggregated data, and the remaining capacity of each hierarchical flush pool associated with the bucket is smaller than the capacity required for the second aggregated data; and
a third flushing module, configured to aggregate the at least one second object data into second aggregated data and flush the second aggregated data down into the second storage pool.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
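As an illustrative sketch of this fallback only (the Pool record and the create_pool factory are assumptions, not the device's actual interfaces), the choice between the second flushing module and the creating module could look like:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Pool:
    name: str
    remaining_bytes: int

def choose_flush_target(first: Pool, associated: List[Pool], required: int,
                        create_pool: Callable[[int], Pool]) -> Pool:
    # If the first storage pool still has room, the second flushing path is not needed.
    if first.remaining_bytes >= required:
        return first
    # Second flushing module: flush to an associated pool that can hold the data.
    for pool in associated:
        if pool.remaining_bytes >= required:
            return pool
    # Creating module: every associated hierarchical flush pool is too full,
    # so a second storage pool is created (create_pool is a hypothetical factory).
    return create_pool(required)
```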
As an alternative, the device further includes:
an allocating unit, configured to, after the object data set stored in the bucket is obtained, allocate hierarchical marks to the object data in the object data set, where a hierarchical mark indicates that the object data is allowed to be stored in the hierarchical flush pools associated with the bucket;
an aggregation unit, configured to, after the object data set stored in the bucket is obtained, aggregate the at least one first object data into first aggregated data in a case where the at least one first object data carries a hierarchical mark; and
the first flushing unit 704 is configured to, after the object data set stored in the bucket is obtained, flush the first aggregated data down to the first storage pool in a case where the number of objects to be aggregated in the object data set is greater than or equal to a first preset threshold, or the data amount of the first aggregated data is greater than or equal to a second preset threshold.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
As an alternative, the device further includes:
a storage unit, configured to, after the object data set stored in the bucket is obtained, store the object data in the object data set into a fast storage pool in a case where the amount of object data in the object data set is greater than or equal to a third preset threshold;
the second flushing unit 706 is configured to, after the object data set stored in the bucket is obtained, aggregate at least one first object data stored in the fast storage pool into first aggregated data and flush the first aggregated data down to a first slow storage pool in a case where the at least one first object data stored in the fast storage pool satisfies the first data processing condition, where the hierarchical flush pools associated with the bucket include the first slow storage pool; and
a third flushing unit, configured to, after the object data set stored in the bucket is obtained, aggregate at least one second object data stored in the fast storage pool into second aggregated data and flush the second aggregated data down into a second slow storage pool in a case where the at least one second object data stored in the fast storage pool satisfies the second data processing condition, where the hierarchical flush pools associated with the bucket include the second slow storage pool.
For specific implementations, reference may be made to the examples described for the above data storage method; details are not repeated here.
It should be noted that each of the above virtual devices (modules, units, sub-modules, sub-units, components, etc.) may be implemented by software or by hardware. In the latter case this may be achieved by, but is not limited to, placing the above virtual devices in the same processor, or distributing them across different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations; they are not repeated here.
It will be appreciated by those skilled in the art that the virtual devices or steps of the application described above may be implemented on a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices; they may be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be performed in a different order than described here; alternatively, they may be fabricated as individual integrated-circuit modules, or multiple modules or steps among them may be fabricated as a single integrated-circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present application should be included in the protection scope of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411375140.7A CN119248196A (en) | 2024-09-29 | 2024-09-29 | Data storage method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119248196A (en) | 2025-01-03
Family
ID=94027512
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119248196A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||