[go: up one dir, main page]

CN119002817A - Data storage optimization method and device based on persistent cache and persistent memory - Google Patents

Data storage optimization method and device based on persistent cache and persistent memory Download PDF

Info

Publication number
CN119002817A
CN119002817A CN202411147652.8A CN202411147652A CN119002817A CN 119002817 A CN119002817 A CN 119002817A CN 202411147652 A CN202411147652 A CN 202411147652A CN 119002817 A CN119002817 A CN 119002817A
Authority
CN
China
Prior art keywords
target address
database record
data
target
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411147652.8A
Other languages
Chinese (zh)
Inventor
陈康
季智成
武永卫
章明星
姜进磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202411147652.8A priority Critical patent/CN119002817A/en
Publication of CN119002817A publication Critical patent/CN119002817A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本公开涉及一种基于持久缓存和持久内存的数据存储优化方法及装置。该方法包括:响应于针对持久内存的访问行为,确定与访问行为对应的目标地址的访问频率信息,访问频率信息表示该目标地址被访问的频率;响应于目标地址的访问频率信息满足预设条件且访问行为是写入操作,对目标数据执行刷写操作,刷写操作包括将目标数据由缓存写入持久内存的目标地址的操作和内存屏障操作。根据本申请实施例,可以实现针对写入PM的数据选择性地执行刷写操作,在保证数据持久性和正确性的同时减少对持久内存的写入次数,使应用持久内存的数据库系统在动态变化的负载下仍能保持稳定的性能,从而提升了数据库系统的性能和吞吐能力。

The present disclosure relates to a data storage optimization method and device based on persistent cache and persistent memory. The method includes: in response to the access behavior to the persistent memory, determining the access frequency information of the target address corresponding to the access behavior, the access frequency information indicates the frequency of the target address being accessed; in response to the access frequency information of the target address meeting the preset conditions and the access behavior being a write operation, performing a flush operation on the target data, the flush operation including an operation of writing the target data from the cache to the target address of the persistent memory and a memory barrier operation. According to the embodiment of the present application, it is possible to selectively perform a flush operation on the data written to the PM, while ensuring the persistence and correctness of the data and reducing the number of writes to the persistent memory, so that the database system using the persistent memory can still maintain stable performance under dynamically changing loads, thereby improving the performance and throughput of the database system.

Description

Data storage optimization method and device based on persistent cache and persistent memory
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to a data storage optimization method and device based on persistent cache and persistent memory.
Background
Database systems may provide data storage, management, and access functions, supporting a variety of data processing and applications. Conventional databases typically store data in persistent disk storage. Meanwhile, in order to improve the data access speed and the system performance, the database also utilizes the memory (Dynamic Random Access Memory, DRAM) to buffer partial data or execute computation-intensive operation, so that the query request of a user can be responded quickly, and more efficient data retrieval and processing capacity can be provided. With the advent of persistent Memory (Persist Memory, PM), also known as Non-Volatile Memory (NVM), the Memory has been developed to have the characteristics of large capacity, persistence, low latency, and byte-addressable, and to have the advantages of both disk and DRAM, thus providing new opportunities for building a database online transaction (Online Transaction Processing, OLTP) engine.
Persistent memory can significantly reduce programming complexity and a programmer can safely remove all instructions associated with a flush while maintaining program correctness. However, in a database system with persistent memory, due to the mismatch in granularity between the cache line (typically 64B) of the central processing unit (Central Processing Unit, CPU) and the PM write operation (typically 256B), when the cache line is passively evicted to the PM at random time, if the flush instruction is simply deleted, the problem of write amplification occurs, and when data spanning multiple consecutive cache lines is written, the cache lines can be written into the PM at the same time by using the flush instruction, i.e. multiple cache lines can be merged into one write; if the write-through instruction is not used, the cache lines are evicted to the PM at random time, and the cache lines can be written into the PM only by writing operation for a plurality of times, so that the write-in times of the PM are increased, and extra permanent memory write-in is caused, so that the performance of the database system of the permanent memory is reduced, and the throughput capacity of the database system of the permanent memory is affected.
Disclosure of Invention
In view of this, the present disclosure proposes a data storage optimization method and apparatus based on persistent cache and persistent memory.
According to one aspect of the present disclosure, a data storage optimization method based on persistent cache and persistent memory is provided. The method comprises the following steps:
determining access frequency information of a target address corresponding to the access behavior in response to the access behavior for the persistent memory, the access frequency information representing the frequency with which the target address is accessed;
in response to the access frequency information of the target address meeting a preset condition and the access behavior being a write operation, performing a flush operation on the target data, the flush operation including an operation of writing the target data from the cache to the target address of the persistent memory and a memory barrier operation.
In one possible implementation, the access frequency information of the target address is used to indicate that the data stored in the target address is cold data or hot data, and determining, in response to the access behavior for the persistent memory, the access frequency information of the target address corresponding to the access behavior includes:
Searching a database record associated with the target address in a database record table in response to the access behavior for the target address in the persistent memory;
Determining that the data stored in the target address is hot data in response to the database record associated with the target address, otherwise determining that the data stored in the target address is cold data; wherein the access frequency for hot data is higher than the access frequency for cold data.
In one possible implementation, the preset condition includes that the access frequency information indicates that the data stored at the destination address is cold data.
In one possible implementation, a database record table includes one or more hash buckets, each including one or more database records therein, the database records including addresses in persistent memory, and determining access frequency information for a target address corresponding to an access behavior in response to the access behavior for the persistent memory, including:
Determining a target hash bucket where a database record associated with the target address is located according to the hash value of the target address;
and searching a database record associated with the target address in the target hash bucket.
In one possible implementation, the database record table includes access frequencies for each database record, the method further comprising:
in response to there being no database record associated with the target address, one of the database records in the database record table is replaced with the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address comprises:
searching a database record with the access frequency of 0 in a target hash bucket in response to the fact that the database record associated with the target address does not exist;
And in response to the database record with the access frequency of 0 in the target hash bucket, replacing the database record with the access frequency of 0 with the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address, further comprises:
and in response to the fact that the database records with the access frequency of 0 do not exist in the target hash bucket, reducing the access frequency of each database record in the target hash bucket, and re-executing the steps of searching the database records with the access frequency of 0 in the target hash bucket and then until the database records with the access frequency of 0 are searched in the target hash bucket.
In one possible implementation, the method further includes:
In response to the presence of the database record associated with the target address, increasing the frequency of access to the database record associated with the target address.
According to another aspect of the present disclosure, a data storage optimization apparatus based on persistent cache and persistent memory is provided. The device comprises:
The first statistics module is used for responding to the access behavior aiming at the persistent memory, determining the access frequency information of the target address corresponding to the access behavior, wherein the access frequency information represents the frequency of the target address being accessed;
And the refreshing module is used for responding to the condition that the access frequency information of the target address meets the preset condition and the access behavior is a writing operation, and executing the refreshing operation on the target data, wherein the refreshing operation comprises the operation of writing the target data into the target address of the persistent memory from the cache and the memory barrier operation.
In one possible implementation, the access frequency information of the target address is used to indicate that the data stored in the target address is cold data or hot data, and the first statistics module is used to:
Searching a database record associated with the target address in a database record table in response to the access behavior for the target address in the persistent memory;
Determining that the data stored in the target address is hot data in response to the database record associated with the target address, otherwise determining that the data stored in the target address is cold data; wherein the access frequency for hot data is higher than the access frequency for cold data.
In one possible implementation, the preset condition includes that the access frequency information indicates that the data stored at the destination address is cold data.
In one possible implementation, the database record table includes one or more hash buckets, each of the hash buckets including one or more database records, the database records including addresses in persistent memory, a first statistics module for:
Determining a target hash bucket where a database record associated with the target address is located according to the hash value of the target address;
and searching a database record associated with the target address in the target hash bucket.
In one possible implementation, the database record table includes access frequencies for each database record, the apparatus further comprising:
And a replacement module for replacing one database record in the database record table with the database record associated with the target address in response to the absence of the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address comprises:
searching a database record with the access frequency of 0 in a target hash bucket in response to the fact that the database record associated with the target address does not exist;
And in response to the database record with the access frequency of 0 in the target hash bucket, replacing the database record with the access frequency of 0 with the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address, further comprises:
and in response to the fact that the database records with the access frequency of 0 do not exist in the target hash bucket, reducing the access frequency of each database record in the target hash bucket, and re-executing the steps of searching the database records with the access frequency of 0 in the target hash bucket and then until the database records with the access frequency of 0 are searched in the target hash bucket.
In one possible implementation, the apparatus further includes:
and the second statistical module is used for increasing the access frequency of the database records associated with the target address in response to the database records associated with the target address.
According to another aspect of the present disclosure, there is provided a data storage optimization apparatus based on persistent cache and persistent memory, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
According to the embodiment of the application, the access frequency information of the target address corresponding to the access behavior is determined in response to the access behavior aiming at the persistent memory, the access frequency information indicates the accessed frequency of the target address, the target data is subjected to the refreshing operation in response to the access frequency information of the target address meeting the preset condition and the access behavior being the writing operation, different refreshing strategies can be selected according to the accessed frequency, the refreshing operation can be selectively carried out on the data written into PM, the data durability and the correctness are ensured, the writing times of the persistent memory are reduced, and the database system applying the persistent memory can still maintain stable performance under the dynamically changed load, so that the performance and the throughput capacity of the database system are improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of an application scenario according to an embodiment of the application.
FIG. 2 shows a flow chart of a data storage optimization method according to an embodiment of the application.
FIG. 3 shows a schematic diagram of a database system according to an embodiment of the application.
FIG. 4 is a schematic overall flow diagram of a data storage optimization method according to an embodiment of the application.
Fig. 5 shows an effect diagram of a data storage optimization method according to an embodiment of the present application.
FIG. 6 shows a block diagram of a data storage optimization device according to an embodiment of the application.
FIG. 7 is a block diagram illustrating an apparatus 1900 for data storage optimization, according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Database systems may provide data storage, management, and access functions, supporting a variety of data processing and applications. Conventional databases typically store data in persistent disk storage. Meanwhile, in order to improve the data access speed and the system performance, the database also utilizes the memory (Dynamic Random Access Memory, DRAM) to buffer partial data or execute computation-intensive operation, so that the query request of a user can be responded quickly, and more efficient data retrieval and processing capacity can be provided. With the advent of persistent Memory (Persist Memory, PM), also known as Non-Volatile Memory (NVM), the Memory has been developed to have the characteristics of large capacity, persistence, low latency, and byte-addressable, and to have the advantages of both disk and DRAM, thus providing new opportunities for building a database online transaction (Online Transaction Processing, OLTP) engine.
Persistent memory can significantly reduce programming complexity and a programmer can safely remove all instructions associated with a flush while maintaining program correctness. However, in a database system with persistent memory, due to the mismatch in granularity between the cache line (typically 64B) of the central processing unit (Central Processing Unit, CPU) and the PM write operation (typically 256B), when the cache line is passively evicted to the PM at random time, if the flush instruction is simply deleted, the problem of write amplification occurs, and when data spanning multiple consecutive cache lines is written, the cache lines can be written into the PM at the same time by using the flush instruction, i.e. multiple cache lines can be merged into one write; if the write-through instruction is not used, the cache lines are evicted to the PM at random time, and the cache lines can be written into the PM only by writing operation for a plurality of times, so that the write-in times of the PM are increased, and extra permanent memory write-in is caused, so that the performance of the database system of the permanent memory is reduced, and the throughput capacity of the database system of the permanent memory is affected.
In view of the above, the application provides a data storage optimization method and device based on persistent cache and persistent memory. According to the method, the access frequency information of the target address corresponding to the access behavior is determined in response to the access behavior aiming at the persistent memory, the access frequency information indicates the accessed frequency of the target address, the target data is subjected to the refreshing operation in response to the access frequency information of the target address meeting the preset condition and the access behavior being the writing operation, different refreshing strategies can be selected according to the accessed frequency, the refreshing operation can be selectively carried out on the data written into the PM, the data persistence and the correctness are ensured, the writing times of the persistent memory are reduced, the database system applying the persistent memory can still maintain stable performance under the dynamically changed load, and therefore the performance and the throughput capacity of the database system are improved.
Fig. 1 shows a schematic diagram of an application scenario according to an embodiment of the application. As shown in fig. 1, the present application may be applied to a memory optimization scenario where a Persistent Memory (PM) cooperates with a CPU Cache (CPU Cache), where the CPU Cache may include a plurality of Cache lines (such as Cache lines corresponding to Updates and Other in the figure), and in this scenario, a user may write or read data, and if the data written or read by the user does not hit in the CPU Cache, an access behavior for the PM may be generated, that is, the data may be written into the PM or read from the PM. Where data (e.g., a cache line corresponding to Updates in the figure) is written to PM, it may be selected according to a method of an embodiment of the present application to use a flush instruction or not to use a flush instruction when writing a cache line to PM, where multiple cache lines may be consolidated to write to PM when using a flush instruction, and where multiple cache lines are written to PM by multiple write operations when not using a flush instruction.
The application can be used for terminal equipment or a server. The terminal device may be any one or more of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), and a vehicle-mounted device. The embodiment of the present disclosure is not particularly limited to a specific type of terminal device, and may have a wired or wireless communication function.
The server may be located at a local or cloud end, may be a physical device, may also be a virtual device, such as a virtual machine, a container, or the like, and has a wireless communication function, where the wireless communication function may be provided on a chip (system) or other parts or components of the server. The wireless communication function may be realized by mobile communication technologies such as 2G/3G/4G/5G, wi-Fi, bluetooth, frequency modulation (frequency modulation, FM), digital radio, satellite communication, and the like. Communication may also be via a wired connection to enable interaction with other devices.
FIG. 2 shows a flow chart of a data storage optimization method according to an embodiment of the application. The method can be used for the terminal equipment or the server, as shown in fig. 2, and comprises the following steps:
in step S201, in response to the access behavior for the persistent memory, access frequency information of the target address corresponding to the access behavior is determined.
The access behavior for persistent memory may be due to a miss of target data in the CPU cache. The access behavior for persistent memory may be writing data to a target address of the PM or reading data in the target address of the PM. Writing the target data to the PM may be writing the target data to the PM by a CPU cache.
The access frequency information may indicate a frequency with which the target address is accessed, and is used to indicate whether the data stored in the target address is cold data or hot data. The access frequency of the hot data is higher than that of the cold data, the hot data is frequently accessed, and the cold data is not frequently accessed. Because the CPU cache is usually prioritized for replacing the least recently Used (LEAST RECENTLY Used, LRU) data when performing the cache replacement algorithm, hot data more easily resides in the CPU cache but is not replaced.
According to the method and the device, the access frequency of the PM address is recorded through the database record table, so that the access frequency information of the target address can be obtained according to the database record table. Referring to FIG. 3, a schematic diagram of a database system is shown, according to an embodiment of the application. As shown in fig. 3, the data in the embodiment of the present application may be data stored in the persistent memory in units of records (such as data records in the figure), and a database record table may be stored in the DRAM to maintain the access frequency of the address in the PM. The data records maintained in the DRAM by each thread of the CPU are isolated from each other, and each piece of content in the database record table is respectively associated with a record in the PM.
In order to make statistics of cold and hot data as light as possible, the database record table is split into a plurality of sub-tables, wherein the database record is divided into a plurality of hash buckets through a hash algorithm, and each hash bucket corresponds to one sub-table. The hash processing can be performed on the addresses of each piece of data in the PM in advance to obtain hash values of each address, database records corresponding to the data are divided into different hash buckets according to the hash values, for example, the first n bits of the hash values are used as indexes of the hash buckets, and the database records with the same indexes are divided into the same hash bucket.
Referring to the example of fig. 3, the database record table may include one or more hash buckets, each having a corresponding hash bucket number, each hash bucket including one or more database records therein. The database record includes addresses (data record addresses) in persistent memory, such as 0x0000, 0x0036 in the figure, for pointing to data stored in the PM; the database record table may further include an access frequency of each database record for indicating the number of times the corresponding address is accessed.
This step S201 includes:
Searching a database record associated with the target address in a database record table in response to the access behavior for the target address in the persistent memory; and in response to the database record associated with the target address, determining that the data stored by the target address is hot data, and otherwise, determining that the data stored by the target address is cold data.
The database record table may include a database record associated with the data in the PM, and if there is a database record associated with the target address in the database record table, the target address is once accessed, and the data stored at the target address may be considered to be hot data.
In response to an access behavior for a target address in persistent memory, in looking up a database record associated with the target address in a database record table, it is possible to:
Determining a target hash bucket where a database record associated with the target address is located according to the hash value of the target address; and searching a database record associated with the target address in the target hash bucket.
The first n bits of the hash value of the target address may be used as an index of the hash bucket, a target hash bucket associated with the target address may be found first, and then the target address may be compared with the data record address in the target hash bucket to determine whether a database record associated with the target address exists in the target hash bucket.
By dividing the database record into a plurality of hash buckets, the cost during searching can be reduced, the resources can be saved, and the access speed can be increased.
If the database record associated with the target address is found in the target hash bucket, the data stored in the target address can be considered to be hot data, and the method can further include:
In response to the presence of the database record associated with the target address, increasing the frequency of access to the database record associated with the target address.
The frequency of accesses in the database record associated with the target address may be increased by 1. At this time, if the access behavior is a read data behavior, the data stored in the target address may be read out from the PM; if the access behavior is a write data behavior, the target data may be evicted (evict) from the cache to the PM according to writing the target data to a target address in the PM.
If the database record associated with the target address is not found in the target hash bucket, the data stored in the target address can be considered to be cold data and can be added into the database record table, and the method can further comprise:
in response to there being no database record associated with the target address, one of the database records in the database record table is replaced with the database record associated with the target address.
The replaced database record may be the least recently used item, corresponding to the least frequently accessed database record.
Wherein, can:
Searching a database record with the access frequency of 0 in a target hash bucket in response to the fact that the database record associated with the target address does not exist; and in response to the database record with the access frequency of 0 in the target hash bucket, replacing the database record with the access frequency of 0 with the database record associated with the target address.
And in response to the fact that the database records with the access frequency of 0 do not exist in the target hash bucket, reducing the access frequency of each database record in the target hash bucket, and re-executing the steps of searching the database records with the access frequency of 0 in the target hash bucket and then until the database records with the access frequency of 0 are searched in the target hash bucket.
In the program implementation, a global pointer (pointer) may be used to record the end position of each replacement process, when no database record with the access frequency of 0 is found after one round of searching, the pointer may be redirected to the first database record in the target hash bucket, the access frequency of each database record is subtracted by 1 in sequence, then each database record in the target hash bucket is traversed again, whether there is a database record with the access frequency of 0 is searched, the above process is executed circularly until there is a database record with the access frequency of 0, and the database record is replaced by the database record associated with the target address.
If the access behavior is a read data behavior, the data stored at the target address may be read from the PM and the process may be ended.
In step S202, in response to the access frequency information of the target address satisfying the preset condition and the access behavior being a write operation, a swiping operation is performed on the target data.
When the access behavior is a write operation, the target data may be data to be written to the target address.
The flushing operations may include operations to write target data from the cache to the target address of persistent memory (clwb instructions) and memory barrier operations (sfence instructions). The clwb instruction may not invalidate the cache line when target data in the cache line is written to the PM, keeping the target data consistent between the PM and the cache. The sfence instruction may be used to ensure that the write operation before it has been completed and saved to the PM to avoid data consistency issues.
The preset condition may include that the access frequency information indicates that the data stored in the target address is cold data. The durability of the written target data can be ensured through clwb instructions and the sfence instructions, namely the target data in PM cannot disappear after power failure. The application does not add a brushing operation to the hot data, but simply evicts the hot data from the cache (evict) to the PM, and the part of the hot data in the PM disappears after power is off, because the hot data is more likely to be frequently replaced due to higher access frequency, and thus the durability of the hot data is not required to be ensured.
According to the embodiment of the application, the access frequency information of the target address corresponding to the access behavior is determined in response to the access behavior aiming at the persistent memory, the access frequency information indicates the accessed frequency of the target address, the target data is subjected to the refreshing operation in response to the access frequency information of the target address meeting the preset condition and the access behavior being the writing operation, different refreshing strategies can be selected according to the accessed frequency, the refreshing operation can be selectively carried out on the data written into PM, the data durability and the correctness are ensured, the writing times of the persistent memory are reduced, and the database system applying the persistent memory can still maintain stable performance under the dynamically changed load, so that the performance and the throughput capacity of the database system are improved.
FIG. 4 is a schematic overall flow diagram of a data storage optimization method according to an embodiment of the application.
As shown in fig. 4, the overall flow includes:
S1: executing S2 when accessing the target address of the PM;
The target address of the accessing PM may be writing data into the target address or reading data stored by the target address.
S2: carrying out hash processing on the target address, determining a hash bucket corresponding to the target address according to the hash value, and executing S3;
S3: searching a database record related to the target address in the hash bucket, executing S3.1 if the record is found, otherwise executing S3.2;
S3.1: sequentially adding 1 to the access frequency of the database records in the hash bucket, and ending the flow;
S3.2: searching a database record with the access frequency of 0 in the hash bucket, if so, executing S3.3, otherwise, executing S3.4;
s3.3: selecting a database record with the access frequency of 0, replacing the database record with the database record associated with the target address, and executing S3.3.1;
s3.4: sequentially subtracting 1 from the access frequency of the database records in the hash bucket, and returning to execute S3.2;
s3.3.1: judging whether the access behavior is a writing operation or not, if yes, executing S3.3.2, otherwise ending the flow;
s3.3.2: executing a brushing instruction on the written data, and ending the flow.
Fig. 5 shows an effect diagram of a data storage optimization method according to an embodiment of the present application. As shown in fig. 5, a schematic diagram of the effect of all the swiping, all the non-swiping, and selective data swiping (the method of the present application) is given, wherein two graphs show the key performance indicators (i.e., throughput rate) of the database system as a function of the number of computing threads.
The left graph shows the effect of the load without hot spot data, namely, the data access is evenly distributed (YCSB-AUniform), and the problem of write amplification caused by different access granularities of CPU cache and PM in a data system adopting all non-brush write strategies can be seen, so that the throughput rate is lower than that of the other two data systems by at most 38.1%.
The right graph shows the effect of the existence of hot spot data in the load, namely, the data access follows the Zipfan distribution (YCSB-AZipfan) of θ=0.99, and it can be seen that the adoption of the all-brush write strategy can lead to the hot data to be needlessly written into PM (particulate matter), the throughput rate of the method is 39.0% lower than that of the selective brush write strategy adopted by the method, and the problem of write amplification caused by different access granularities of CPU cache and PM is solved by the data system adopting the all-non-brush write strategy, and the throughput rate is lower than that of the method by at most 38.1%.
FIG. 6 shows a block diagram of a data storage optimization device according to an embodiment of the application. As shown in fig. 6, the apparatus may include:
A first statistics module 601, configured to determine, in response to an access behavior for a persistent memory, access frequency information of a target address corresponding to the access behavior, where the access frequency information indicates a frequency at which the target address is accessed;
The flushing module 602 is configured to perform a flushing operation on the target data in response to the access frequency information of the target address meeting a preset condition and the access behavior being a write operation, where the flushing operation includes an operation of writing the target data from the cache to the target address of the persistent memory and a memory barrier operation.
In one possible implementation, the access frequency information of the target address is used to indicate that the data stored in the target address is cold data or hot data, and the first statistics module 601 is configured to:
Searching a database record associated with the target address in a database record table in response to the access behavior for the target address in the persistent memory;
Determining that the data stored in the target address is hot data in response to the database record associated with the target address, otherwise determining that the data stored in the target address is cold data; wherein the access frequency for hot data is higher than the access frequency for cold data.
In one possible implementation, the preset condition includes that the access frequency information indicates that the data stored at the destination address is cold data.
In one possible implementation, the database record table includes one or more hash buckets, each of which includes one or more database records, the database records including addresses in persistent memory, and the first statistics module 601 is configured to:
Determining a target hash bucket where a database record associated with the target address is located according to the hash value of the target address;
and searching a database record associated with the target address in the target hash bucket.
In one possible implementation, the database record table includes access frequencies for each database record, the apparatus further comprising:
And a replacement module for replacing one database record in the database record table with the database record associated with the target address in response to the absence of the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address comprises:
searching a database record with the access frequency of 0 in a target hash bucket in response to the fact that the database record associated with the target address does not exist;
And in response to the database record with the access frequency of 0 in the target hash bucket, replacing the database record with the access frequency of 0 with the database record associated with the target address.
In one possible implementation, in response to the absence of a database record associated with the target address, replacing one of the database records in the database record table with the database record associated with the target address, further comprises:
and in response to the fact that the database records with the access frequency of 0 do not exist in the target hash bucket, reducing the access frequency of each database record in the target hash bucket, and re-executing the steps of searching the database records with the access frequency of 0 in the target hash bucket and then until the database records with the access frequency of 0 are searched in the target hash bucket.
In one possible implementation, the apparatus further includes:
and the second statistical module is used for increasing the access frequency of the database records associated with the target address in response to the database records associated with the target address.
According to the embodiment of the application, the access frequency information of the target address corresponding to the access behavior is determined in response to the access behavior aiming at the persistent memory, the access frequency information indicates the accessed frequency of the target address, the target data is subjected to the refreshing operation in response to the access frequency information of the target address meeting the preset condition and the access behavior being the writing operation, different refreshing strategies can be selected according to the accessed frequency, the refreshing operation can be selectively carried out on the data written into PM, the data durability and the correctness are ensured, the writing times of the persistent memory are reduced, and the database system applying the persistent memory can still maintain stable performance under the dynamically changed load, so that the performance and the throughput capacity of the database system are improved.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides a data storage optimization device based on persistent cache and persistent memory, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
FIG. 7 is a block diagram illustrating an apparatus 1900 for data storage optimization, according to an example embodiment. For example, the apparatus 1900 may be provided as a server or terminal device. Referring to fig. 7, the apparatus 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by the processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output interface 1958 (I/O interface). The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server TM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of apparatus 1900 to perform the above-described methods.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), digital Versatile Disks (DVD), memory sticks, floppy disks, mechanical coding devices, punch cards or in-groove structures such as punch cards or grooves having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as SMALLTALK, C ++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A method for optimizing data storage based on persistent cache and persistent memory, the method comprising:
Determining access frequency information of a target address corresponding to access behaviors of a persistent memory in response to the access behaviors, wherein the access frequency information represents the frequency of the target address being accessed;
and in response to the access frequency information of the target address meeting a preset condition and the access behavior being a write operation, performing a flush operation on the target data, wherein the flush operation comprises an operation of writing the target data into the target address of the persistent memory by a cache and a memory barrier operation.
2. The method of claim 1, wherein the access frequency information of the target address is used to indicate that the data stored by the target address is cold data or hot data, and wherein the determining, in response to the access behavior for the persistent memory, the access frequency information of the target address corresponding to the access behavior includes:
Searching a database record associated with the target address in a database record table in response to the access behavior for the target address in the persistent memory;
Determining that the data stored by the target address is hot data in response to the database record associated with the target address, otherwise determining that the data stored by the target address is cold data; wherein the access frequency for hot data is higher than the access frequency for cold data.
3. The method of claim 2, wherein the preset condition includes the access frequency information indicating that the data stored at the target address is cold data.
4. The method of claim 2, wherein the database record table includes one or more hash buckets, each hash bucket including one or more database records therein, the database records including addresses in persistent memory, wherein the determining access frequency information for a target address corresponding to an access behavior for persistent memory in response to the access behavior comprises:
Determining a target hash bucket where a database record associated with the target address is located according to the hash value of the target address;
and searching a database record associated with the target address in the target hash bucket.
5. The method of claim 2, wherein the database record table includes access frequencies for each database record, the method further comprising:
In response to there being no database record associated with the target address, replacing one database record in the database record table with a database record associated with the target address.
6. The method of claim 5, wherein the replacing one of the database records in the database record table with the database record associated with the target address in response to the absence of the database record associated with the target address comprises:
searching a database record with the access frequency of 0 in a target hash bucket in response to the fact that the database record associated with the target address does not exist;
and in response to the database record with the access frequency of 0 in the target hash bucket, replacing the database record with the access frequency of 0 with the database record associated with the target address.
7. The method of claim 6, wherein in response to the absence of a database record associated with a target address, replacing one of the database records in the database record table with the database record associated with the target address further comprises:
and in response to the fact that the database records with the access frequency of 0 do not exist in the target hash bucket, reducing the access frequency of each database record in the target hash bucket, and re-executing the steps of searching the database records with the access frequency of 0 in the target hash bucket and then until the database records with the access frequency of 0 are searched in the target hash bucket.
8. The method according to claim 2, wherein the method further comprises:
In response to the presence of the database record associated with the target address, increasing the frequency of access to the database record associated with the target address.
9. A data storage optimization apparatus based on persistent cache and persistent memory, the apparatus comprising:
The first statistics module is used for responding to the access behavior aiming at the persistent memory, determining the access frequency information of the target address corresponding to the access behavior, wherein the access frequency information represents the frequency of the accessed target address;
And the refreshing module is used for executing refreshing operation on the target data in response to the access frequency information of the target address meeting a preset condition and the access behavior being a writing operation, wherein the refreshing operation comprises the operation of writing the target data into the target address of the persistent memory from a cache and a memory barrier operation.
10. A data storage optimization apparatus based on persistent cache and persistent memory, comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to implement the method of any one of claims 1 to 8 when executing the instructions stored by the memory.
11. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 8.
CN202411147652.8A 2024-08-20 2024-08-20 Data storage optimization method and device based on persistent cache and persistent memory Pending CN119002817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411147652.8A CN119002817A (en) 2024-08-20 2024-08-20 Data storage optimization method and device based on persistent cache and persistent memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411147652.8A CN119002817A (en) 2024-08-20 2024-08-20 Data storage optimization method and device based on persistent cache and persistent memory

Publications (1)

Publication Number Publication Date
CN119002817A true CN119002817A (en) 2024-11-22

Family

ID=93493201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411147652.8A Pending CN119002817A (en) 2024-08-20 2024-08-20 Data storage optimization method and device based on persistent cache and persistent memory

Country Status (1)

Country Link
CN (1) CN119002817A (en)

Similar Documents

Publication Publication Date Title
US10831654B2 (en) Cache management using multiple cache history lists
CN108268219B (en) Method and device for processing IO (input/output) request
US10802755B2 (en) Method and manager for managing storage system
CN110109915B (en) Method, apparatus and computer program product for managing hash tables
CN110737399B (en) Method, apparatus and computer program product for managing a storage system
US9733991B2 (en) Deferred re-MRU operations to reduce lock contention
CN114490445B (en) Fast cache tracking supporting aggressive prefetching
US11593268B2 (en) Method, electronic device and computer program product for managing cache
US8935481B2 (en) Apparatus system and method for providing raw data in a level-two cache
US11113195B2 (en) Method, device and computer program product for cache-based index mapping and data access
US20180157593A1 (en) Value cache in a computing system
US11204891B2 (en) Identifying partial update for tape file system
US9965397B2 (en) Fast read in write-back cached memory
CN104573112A (en) Page query method and data processing node for OLTP cluster database
US11194720B2 (en) Reducing index operations in a cache
US11520818B2 (en) Method, apparatus and computer program product for managing metadata of storage object
CN119473162A (en) Cache management method and device, cache, electronic device and storage medium
CN114528229A (en) Cache data access method and device and electronic equipment
CN112764662B (en) Method, apparatus and computer program product for storage management
CN107748649B (en) Method and device for caching data
US11803469B2 (en) Storing data in a log-structured format in a two-tier storage system
US20140115246A1 (en) Apparatus, system and method for managing empty blocks in a cache
CN108984117B (en) Data reading and writing method, medium and equipment
CN119002817A (en) Data storage optimization method and device based on persistent cache and persistent memory
CN108959517B (en) File management method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination