
CN120803363A - Method and system for hierarchical memory and hierarchical storage based on NVM - Google Patents

Method and system for hierarchical memory and hierarchical storage based on NVM

Info

Publication number
CN120803363A
CN120803363A (application CN202510945875.7A)
Authority
CN
China
Prior art keywords
memory
page
nvm
access
pages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510945875.7A
Other languages
Chinese (zh)
Inventor
陈海波
蔡文俊
董明凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University filed Critical Shanghai Jiao Tong University
Priority to CN202510945875.7A priority Critical patent/CN120803363A/en
Publication of CN120803363A publication Critical patent/CN120803363A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0685Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract


The present invention provides a method and system for tiered memory and tiered storage based on NVM, including step S1, identifying the access characteristics of memory pages, classifying the memory pages into persistent cold pages, access cold pages, persistent hot pages, and access hot pages, and initializing the partitioning of the NVM storage space based on the classification; step S2, dynamically adjusting the distribution of memory pages between dynamic random access memory (DRAM) and NVM by tracking the access heat and persistent heat of the memory pages; and step S3, persisting the persistent hot pages to NVM in the background. The present invention combines a tiered memory system with a tiered storage system, introduces a new migration judgment standard, and intelligently migrates the corresponding page to NVM or DRAM by predicting whether the page frequently requires persistence operations and the access heat of the page, thereby reducing the system's overhead during persistence operations and memory access.

Description

Method and system for hierarchical memory and hierarchical storage based on NVM
Technical Field
The invention relates to the field of storage systems, and in particular to a method and system for hierarchical memory and hierarchical storage based on NVM.
Background
With the explosive growth of data volumes and diversification of applications, modern storage systems face comprehensive challenges of performance, capacity, and cost. In order to reduce storage costs while ensuring high performance, researchers have proposed layered storage systems based on different storage media. By integrating the characteristic advantages of various storage media, an efficient storage architecture is constructed.
Dynamic Random Access Memory (DRAM) is the fastest-access storage medium in computing today: it has very low access latency and a very high data-transfer rate, and is often used to cache frequently accessed hot data, thereby accelerating applications. However, DRAM is limited in capacity and expensive, making it difficult to meet the storage requirements of large-scale data.
Non-volatile memory (NVM), such as Intel Optane, is an emerging storage medium positioned between DRAM and the solid-state drive (SSD): it combines access speeds approaching those of DRAM with non-volatility (data is retained after power loss). The capacity of NVM is typically larger than that of DRAM, so it often serves as an intermediate tier of a tiered storage system for holding warm data, effectively relieving the capacity pressure on DRAM. However, the cost and power consumption of NVM are relatively high.
SSDs are flash-based non-volatile storage devices with larger capacity and lower cost per unit of storage. Although their access speed is inferior to that of DRAM and NVM, their excellent capacity scalability makes them an ideal choice for the underlying storage medium of a tiered storage system.
Hard Disk Drives (HDDs) are traditional storage devices with a lower cost per GB than SSDs but significantly slower access speeds, especially for random access. The HDD is therefore suitable as the lowest layer of a hierarchical storage system, holding rarely accessed cold data such as backups, history logs, and archive files. By virtue of its large capacity and low cost, the HDD can markedly reduce the total cost of ownership (TCO) of the storage system, and with a reasonable data-tiering strategy the impact of its slow random access on system performance can be avoided.
The core idea of the hierarchical storage system is to dynamically allocate data to different storage layers through a reasonable data layering strategy according to the access frequency, performance requirements and cost factors of the data. Specifically, the DRAM stores the most active data, the NVM stores the sub-hot data, and the SSD and HDD are used to store the cold data, so as to maximize the utilization efficiency of the storage resources while ensuring the overall performance of the system, and reduce the Total Cost of Ownership (TCO). In addition, the system is also required to be provided with an intelligent data migration mechanism, and the distribution of data in each storage layer is dynamically adjusted according to the data access mode, so that the optimal balance of performance and cost is realized.
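To make the tiering idea concrete, here is a toy policy sketch in Python; the function name `assign_tier` and the frequency thresholds are invented purely for illustration and are not part of the invention:

```python
def assign_tier(accesses_per_day: int) -> str:
    """Toy tier-assignment policy: hotter data goes to faster media.
    Thresholds are illustrative placeholders, not values from the patent."""
    if accesses_per_day >= 10_000:
        return "DRAM"   # most active data
    if accesses_per_day >= 100:
        return "NVM"    # sub-hot data
    if accesses_per_day >= 1:
        return "SSD"    # cool data
    return "HDD"        # cold data: backups, logs, archives
```

A real system would of course also weigh capacity pressure and migration cost, which is exactly the gap the invention's two-dimensional heat evaluation addresses.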
While prior studies have extensively researched hierarchical memory systems consisting of DRAM and NVM, and hierarchical storage systems consisting of NVM and SSD, research on combining the two is relatively scarce. The main reason is that in the traditional computer architecture the memory system and the storage system are mutually independent; to fuse them, the implicit semantics within the memory system must be mined in depth, and it must be accurately judged which data pages belong in the memory system and which should reside in the storage system, so that data pages can migrate efficiently between the two.
A search of the patent literature finds the invention patent with publication number CN109213422A, which discloses a tiered storage method, device, and computer-readable storage medium. The method monitors the frequency with which the CPU reads data in memory; if the target storage area is judged not to be the data's original storage area, the data is migrated to the target storage area when the memory releases it. That patent does not combine NVM/DRAM tiering and has no persistence-driven migration mechanism: it migrates data solely on read frequency without considering persistence heat, so its single migration strategy cannot reduce persistence and access overhead, and its application scenarios and functionality are insufficient.
In summary, in view of the above-mentioned problems in the prior art, research on a method and a system for hierarchical memory and hierarchical storage based on NVM is a critical task to be solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and system for hierarchical memory and hierarchical storage based on NVM.
The method for promoting the intelligent cooperative work of the layered memory and the layered storage system provided by the invention comprises the following steps:
Step S1, identifying access characteristics of a memory page, classifying the memory page into a persistent cold page, a memory access cold page, a persistent hot page and a memory access hot page, and initializing partition of a storage space of the NVM based on classification;
Step S2, dynamically adjusting the distribution of the memory pages between the DRAM and the NVM by tracking the access heat and the persistence heat of the memory pages;
step S3, persisting the persisted hot page to the NVM in the background.
Preferably, step S1 comprises the following sub-steps:
Step S1.1, check the validity of the superblock; if its content is invalid, allocate and initialize the log area and complete the initialization of the NVM by writing the superblock. At this point the NVM is not yet mounted as system memory;
S1.2, mounting the space except the direct persistent file page, the super block and the log area in the NVM as a system memory through a memory hot plug mechanism of an operating system;
S1.3, recording the access heat of each memory page by using an LRU linked list, and counting the persistence times of each memory page in preset time by using a hash table;
Step S1.4, if a memory page is in the active list of the LRU linked list in two consecutive samplings, it is judged an access hot page; if its persistence count exceeds a preset threshold, it is judged a persistent hot page; a memory page meeting neither hot-page condition is judged a cold page.
Preferably, step S1.4 includes the system providing an operating-system interface that allows the user to manually adjust the hot/cold attributes and placement of memory pages.
Preferably, step S2 comprises the following sub-steps:
step S2.1, after the memory mount of the NVM is completed, enabling a memory access heat tracking module, a persistence heat tracking module and a background migration thread in the kernel through an operating system interface, and creating a stacked file system;
Step S2.2, when a thread accesses a memory page, updating the position of the memory page in the LRU linked list to count the memory access heat degree, and when fsync persistence operation is executed, recording the range information of the persisted memory page to count the persistence heat degree;
Step S2.3, if the memory page is determined to be required to be migrated and the memory page is located in the DRAM, adding the memory page into a specific linked list of the kernel, marking a target storage position, and migrating the memory page to a target medium one by one in the background through a kernel thread;
Step S2.4, if the memory page is an access hot page but not a persistent hot page and is currently located in the NVM, it is migrated to the DRAM.
Preferably, in step S2.1, the access heat tracking module is realized by modifying the LRU linked list updating function, wherein the persistence heat tracking module is positioned in fsync operation processing logic and background thread refreshing logic of the file system;
and the background migration thread writes the persistent file pages and the direct persistent file pages on the NVM back to the SSD or the HDD according to the content of the log area of the NVM, and performs migration between the DRAM and the NVM.
Preferably, in step S2.4, when the remaining capacity of the DRAM falls below a lower threshold, cold pages are randomly selected from the inactive list of the LRU linked list and migrated to NVM until the remaining capacity of the DRAM is restored above an upper threshold; only then is the memory page migrated.
Preferably, step S3 comprises the following sub-steps:
Step S3.1, if the memory page is an access cold page and a persistent hot page, migrate it to the NVM; if it is both an access hot page and a persistent hot page, decide how to handle it by comparing the access strength with the persistence strength:
if the access strength is higher than the persistence strength, the memory page is kept in the DRAM; if it is a dirty page, it is persisted to the NVM, the page table is updated to mark it clean, and the correspondence between the file and the memory page is recorded in the index area of the NVM;
if the access strength is lower than the persistence strength, the memory page is migrated to the NVM and the correspondence is recorded;
Step S3.2, when a file is read through the stacked file system, the memory page to be read is first obtained from the underlying file system, and a matching memory page is then searched for in the log area of the NVM; if a match exists, the memory page in the NVM is mapped into the page table, otherwise the memory page is read from the SSD and mapped.
Preferably, in step S3.1, the access strength is obtained by collecting page-address statistics of L3 cache misses through the Precise Event-Based Sampling (PEBS) facility of Intel CPUs;
the persistence strength is obtained by counting the number of pages persisted through fsync system calls;
when comparing the access strength with the persistence strength, the latency and throughput figures of DRAM and NVM are combined to compute the latency-cost ratio of access versus persistence, and the latency increments are then computed from the access count and persistence count within the preset time window. If the latency increment of DRAM is larger, the persistence strength is judged higher; if the latency increment of NVM is larger, the access strength is judged higher.
Preferably, step S3 further comprises:
And receiving the persistence heat or access heat parameters of the memory pages set by the user through an operating system interface so as to adjust the placement positions and maintenance strategies of the memory pages.
The invention also provides a system for promoting intelligent cooperative work of the layered memory and the layered storage system, which comprises:
a module M1, which identifies the access characteristics of memory pages, classifies them into persistent cold pages, access cold pages, persistent hot pages, and access hot pages, and initializes the partitioning of the NVM storage space based on the classification;
the module M2 dynamically adjusts the distribution of the memory pages between the DRAM and the NVM by tracking the access heat and the persistence heat of the memory pages;
Module M3 persists the persisted hot page in the background onto NVM.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention combines tiered memory and tiered storage based on NVM, using a single NVM device simultaneously as both memory and storage, improving the utilization of the NVM.
2. The invention intelligently achieves zero-overhead conversion between the memory and storage systems, avoiding the copy from the memory portion of the NVM to the storage portion (or in the opposite direction) when a memory page is persisted, markedly improving system performance.
3. The invention can be used based on any existing file system, does not need to modify the code level of the file system, and has good compatibility and usability.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a memory layout in an NVM in an embodiment of the present invention;
FIG. 2 is a diagram of all possible page types on different media in an embodiment of the invention;
FIG. 3 is a schematic diagram of a process of starting a hierarchical memory and a hierarchical storage system after power-on in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The invention provides a method and system for hierarchical memory and hierarchical storage based on NVM, comprising: step S1, identifying the access characteristics of memory pages, classifying them into persistent cold pages, access cold pages, persistent hot pages, and access hot pages, and initializing the partitioning of the NVM storage space based on this classification; step S2, dynamically adjusting the distribution of memory pages between dynamic random access memory (DRAM) and NVM by tracking the access heat and persistence heat of the memory pages; and step S3, persisting the persistent hot pages to the NVM in the background. The invention combines a tiered memory system with a tiered storage system, introduces a new migration criterion, and intelligently migrates pages into NVM or DRAM by predicting whether a page will frequently require persistence operations and how high its access heat is, reducing the system's overhead during persistence operations and memory access.
The invention adopts a log structure and an index mechanism on the NVM and explicitly divides it into a persistent area and a non-persistent area, fully exploiting the high bandwidth of NVM as memory expansion while using its persistence to commit data to durable media quickly. After a crash and restart, this design can quickly recover the persisted data in the NVM through the index information and guarantee data consistency. Second, a memory-page migration strategy based on two-dimensional heat evaluation is designed: by jointly analyzing the two indicators of access heat and persistence heat, the optimal placement of a memory page between DRAM and NVM is determined intelligently, optimizing the overall performance of the storage system.
Example 1:
The embodiment provides a method for promoting intelligent collaborative work of a layered memory and a layered storage system, which comprises the following steps:
Step S1, when the non-volatile memory NVM is mounted for the first time, identify the access characteristics of the memory pages, classify them into persistent cold pages, access cold pages, persistent hot pages, and access hot pages, and initialize the partitioning of the NVM storage space based on this classification;
In the embodiment, the persistent cold page is a memory page with the persistent operation heat lower than a preset threshold, the access cold page is a memory page with the access heat lower than the preset threshold, the persistent hot page is a memory page with the persistent operation heat higher than the preset threshold, and the access hot page is a memory page with the access heat higher than the preset threshold.
FIG. 1 is a schematic diagram of a memory layout in an NVM in an embodiment of the present invention, and FIG. 2 is a diagram illustrating all possible page types on different media in an embodiment of the present invention.
As shown in fig. 1,2, the memory space of the NVM is divided into a plurality of functional areas including superblocks, log areas, persistent file pages, direct persistent file pages, etc.
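As a rough illustration of this layout, the following sketch computes byte ranges for the functional areas; the `NvmLayout` structure, the page size, and all region sizes are assumptions made for the example, not values taken from the patent:

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size in bytes


@dataclass
class NvmLayout:
    """Illustrative partitioning of an NVM device into the areas of FIG. 1."""
    total_bytes: int
    log_bytes: int = 64 << 20      # assumed log-area size (64 MiB)
    direct_bytes: int = 256 << 20  # assumed direct-persistent-file-page area

    def regions(self) -> dict:
        # The superblock occupies the first page; everything after the fixed
        # areas is what step S1.2 hot-plugs into system memory.
        sb_end = PAGE
        log_end = sb_end + self.log_bytes
        direct_end = log_end + self.direct_bytes
        return {
            "superblock": (0, sb_end),
            "log": (sb_end, log_end),
            "direct_persistent": (log_end, direct_end),
            "mountable_memory": (direct_end, self.total_bytes),
        }
```

For an 8 GiB device, `NvmLayout(8 << 30).regions()["mountable_memory"]` would cover everything beyond the fixed areas, which is the portion mounted as system memory.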
FIG. 3 is a schematic diagram of a process of starting a hierarchical memory and a hierarchical storage system after power-on in an embodiment of the present invention.
As shown in fig. 3, step S1 includes the following sub-steps:
Step S1.1, check the validity of the superblock; if its content is invalid, allocate and initialize the log area and complete the initialization of the NVM by writing the superblock. At this point the NVM is not yet mounted as system memory;
Step S1.2, the space except the direct persistent file page, the superblock and the log area in the NVM is mounted as the system memory through the memory hot plug mechanism of the operating system, so that the system can place the page on the NVM.
S1.3, recording the access heat of each memory page by using an LRU linked list, and counting the persistence times of each memory page in preset time by using a hash table;
in this embodiment, the persistence number is the number of calls fsync.
Step S1.4, if a memory page is in the active list of the LRU linked list in two consecutive samplings, it is judged an access hot page; if its persistence count exceeds a preset threshold, it is judged a persistent hot page; a memory page meeting neither hot-page condition is judged a cold page.
Further, step S1.4 includes providing an operating-system interface that allows the user to manually adjust the hot/cold attributes and placement of memory pages.
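The classification rule of steps S1.3 and S1.4 can be sketched as follows; the function name and the persistence threshold are illustrative assumptions:

```python
def classify_page(active_samples, fsync_count, persist_threshold=4):
    """Classify a page along the two heat dimensions of step S1.4.

    active_samples: recent LRU samplings (True = page was on the active list).
    fsync_count: persistence count for the page within the preset window
                 (tracked by the hash table of step S1.3).
    persist_threshold: assumed preset threshold from step S1.4.
    """
    # Access-hot: on the active list in two consecutive samplings.
    access_hot = len(active_samples) >= 2 and all(active_samples[-2:])
    # Persist-hot: persistence count exceeds the preset threshold.
    persist_hot = fsync_count > persist_threshold
    return {
        "access": "hot" if access_hot else "cold",
        "persist": "hot" if persist_hot else "cold",
    }
```

The four resulting combinations correspond exactly to the four page types named in step S1 (persistent cold/hot page, access cold/hot page).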
Step S2, dynamically adjusting the distribution of the memory pages between the DRAM and the NVM by tracking the access heat and the persistence heat of the memory pages.
Specifically, step S2 includes the following sub-steps:
And S2.1, after the memory mount of the NVM is completed, enabling a memory access heat tracking module, a persistence heat tracking module and a background migration thread in the kernel through an operating system interface, and creating a stacked file system.
In the embodiment, the access heat tracking module is realized by modifying an LRU linked list updating function, and the persistence heat tracking module is positioned in fsync operation processing logic and background thread refreshing logic of the file system.
And the background migration thread writes the persistent file pages and the direct persistent file pages on the NVM back to the SSD or the HDD according to the content of the log area of the NVM, and performs migration between the DRAM and the NVM.
Because the persistent file pages residing in the NVM and the file content on the underlying block device (SSD/HDD) may be inconsistent, a stacked file system is employed.
Step S2.2, when a thread accesses a memory page, its position in the LRU linked list is updated to count the access heat; when an fsync persistence operation is executed, the range information of the persisted memory pages is recorded to count the persistence heat. Whether a memory page needs to be migrated is then judged from its current location, access heat, and persistence heat.
Step S2.3, if the memory page is determined to be required to be migrated and the memory page is located in the DRAM, adding the memory page into a specific linked list of the kernel, marking a target storage position, and migrating the memory page to a target medium one by one in the background through a kernel thread;
Step S2.4, if the memory page is an access hot page but not a persistent hot page and is currently located in the NVM, it is migrated to the DRAM.
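The placement rules of steps S2.3 and S2.4, together with the sink rule of step S3.1, can be sketched as a small decision function; the function name and string encodings are illustrative, and the both-hot case is left to the strength comparison of step S3.1:

```python
def migration_target(location, access_hot, persist_hot):
    """Return the medium a page should migrate to, or None to stay put.

    location: "DRAM" or "NVM" (the page's current medium).
    access_hot / persist_hot: the two heat flags from step S1.4.
    """
    if location == "NVM" and access_hot and not persist_hot:
        return "DRAM"  # S2.4: access-hot, non-persistent pages rise to DRAM
    if location == "DRAM" and persist_hot and not access_hot:
        return "NVM"   # S3.1: access-cold, persist-hot pages sink to NVM
    # Pages hot (or cold) in both dimensions need the finer-grained
    # access-strength vs. persistence-strength comparison of step S3.1.
    return None
```

In the kernel, a page selected for migration would be queued on the specific linked list of step S2.3 and moved by the background kernel thread, rather than moved synchronously.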
In this embodiment, when the remaining capacity of the DRAM falls below a lower threshold, cold pages are randomly selected from the inactive list of the LRU linked list and migrated to the NVM until the remaining capacity of the DRAM is restored above an upper threshold, after which memory-page migration proceeds. This avoids a system crash due to insufficient memory and ensures that frequently accessed memory pages are preferentially retained in the DRAM.
Step S3, persisting the persisted hot page to the NVM in the background.
Specifically, step S3 includes the following sub-steps:
Step S3.1, if the memory page is an access cold page and a persistent hot page, migrate it to the NVM; if it is both an access hot page and a persistent hot page, decide how to handle it by comparing the access strength with the persistence strength:
If the access strength is higher than the persistence strength, the memory page is kept in the DRAM; if it is a dirty page, it is persisted to the NVM, the page table is updated to mark it clean, and the correspondence between the file and the memory page is recorded in the index area of the NVM. This exploits the access-speed advantage of DRAM and reduces the persistence overhead of the memory page through a speculative strategy.
If the access strength is lower than the persistence strength, the memory page is migrated to the NVM and the correspondence is recorded. Subsequent persistence operations are then skipped entirely: the memory page is accessed and modified directly through load/store instructions, and the modifications are persisted in place on the NVM, reducing persistence overhead.
Further, in step S3.1, the access strength is obtained by sampling the page addresses of L3 cache misses through the Precise Event-Based Sampling (PEBS) facility of Intel CPUs, yielding the number of accesses to the current memory page within the preset time window.
The persistence strength is obtained by counting the number of pages that are being persisted through fsync system calls.
When comparing the access strength with the persistence strength, the latency and throughput figures of DRAM and NVM are combined to compute the latency-cost ratio of access versus persistence, and the latency increments are then computed from the access count and persistence count within the preset time window. If the latency increment of DRAM is larger, the persistence strength is judged higher; if the latency increment of NVM is larger, the access strength is judged higher.
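One plausible reading of this comparison is sketched below. The latency constants are invented placeholders (real figures depend on the hardware), and modelling each fsync of a DRAM-resident page as a full page-sized write to NVM is a simplifying assumption:

```python
# Assumed illustrative device latencies in nanoseconds; not measured values.
DRAM_READ_NS = 80
NVM_READ_NS = 300
NVM_WRITE_NS = 500


def stronger_dimension(access_count, persist_count,
                       page_bytes=4096, cacheline=64):
    """Estimate the extra delay each placement would accrue over the
    sampling window and report which dimension dominates (step S3.1)."""
    # Keeping the page in DRAM: every fsync must copy the page to NVM,
    # cacheline by cacheline.
    dram_increment = persist_count * (page_bytes // cacheline) * NVM_WRITE_NS
    # Keeping the page in NVM: every access pays the NVM/DRAM latency gap.
    nvm_increment = access_count * (NVM_READ_NS - DRAM_READ_NS)
    # Larger DRAM increment -> persistence dominates -> the page should sink
    # to NVM; larger NVM increment -> access dominates -> keep it in DRAM.
    return "persistence" if dram_increment > nvm_increment else "access"
```

Under these assumed numbers, a page read a million times but fsynced once is access-dominated, while a page fsynced a hundred times but read only ten times is persistence-dominated.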
Step S3.2, when a file is read through the stacked file system, the memory page to be read is first obtained from the underlying file system, and a matching memory page is then searched for in the log area of the NVM; if a match exists, the memory page in the NVM is mapped into the page table, otherwise the memory page is read from the SSD and mapped.
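The read path of step S3.2 can be sketched as follows; the dictionaries standing in for the NVM log area and the lower file system, and the function name, are illustrative stand-ins for kernel structures:

```python
def read_page(path, index, nvm_log, lower_fs):
    """Sketch of the stacked-file-system read path of step S3.2: prefer the
    newer copy recorded in the NVM log area, fall back to the lower file
    system on SSD.

    nvm_log:  dict mapping (path, page_index) -> page bytes (NVM log area).
    lower_fs: dict mapping (path, page_index) -> page bytes (SSD-backed FS).
    """
    page = nvm_log.get((path, index))
    if page is not None:
        return page, "NVM"   # matched page found in the NVM log area: map it
    return lower_fs[(path, index)], "SSD"  # otherwise read from the SSD
```

In the real system the "map it" step would install the NVM page into the page table instead of returning bytes, so the application transparently sees the freshest copy.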
Further, step S3 further includes:
And receiving the persistence heat or access heat parameters of the memory pages set by the user through an operating system interface so as to adjust the placement positions and maintenance strategies of the memory pages.
Example 2:
The invention also provides a system for promoting intelligent cooperative work of the tiered memory and tiered storage systems. The system may be implemented by executing the flow steps of the corresponding method; that is, those skilled in the art may understand the method for promoting intelligent cooperative work of the tiered memory and tiered storage systems as a preferred embodiment of the system.
Specifically, the system for promoting intelligent cooperative work of the layered memory and the layered storage system comprises:
a module M1, which identifies the access characteristics of memory pages, classifies the memory pages into persistent cold pages, access cold pages, persistent hot pages and access hot pages, and performs an initial partitioning of the storage space of the NVM based on the classification;
the module M2 dynamically adjusts the distribution of the memory pages between the DRAM and the NVM by tracking the access heat and the persistence heat of the memory pages;
Module M3 persists the persistent hot pages to the NVM in the background.
Specifically, the module M1 comprises the following sub-modules:
Module M1.1 checks the validity of the superblock; if the superblock content is invalid, it allocates and initializes the log area and completes the initialization of the NVM by setting the superblock, without mounting the NVM into system memory.
Module M1.2 mounts the space in the NVM other than the direct persistent file pages, the superblock and the log area as system memory through the memory hot-plug mechanism of the operating system, so that the system can place pages on the NVM.
Module M1.3 records the access heat of each memory page using the LRU linked list, and counts the number of persistence operations of each memory page within a preset time through a hash table;
in this embodiment, the persistence count is the number of fsync calls.
Module M1.4: if a memory page is on the active list of the LRU linked list in two consecutive samplings, it is judged to be an access hot page; if the persistence count of a memory page exceeds a preset threshold, it is judged to be a persistent hot page; memory pages that satisfy neither hot-page condition are judged to be cold pages.
Further, in module M1.4, the system provides an operating-system interface that allows users to manually adjust the hot/cold attributes and placement of memory pages.
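The classification rule of modules M1.3 and M1.4 can be sketched as follows. The sampling history is modeled as a sequence of booleans (whether the page was on the LRU active list at each sampling round), and the persistence threshold is an assumed tunable, not a value given in the patent:

```python
def classify_page(active_samples, persist_count, persist_threshold=4):
    """Classify a page from its LRU sampling history and fsync count.

    active_samples: per-sample booleans, True if the page was on the LRU
    active list at that sampling round.
    """
    # Access-hot: on the active list in the last two consecutive samples.
    access_hot = len(active_samples) >= 2 and all(active_samples[-2:])
    # Persist-hot: persisted more often than the threshold in the window.
    persist_hot = persist_count > persist_threshold
    if access_hot and persist_hot:
        return "access-hot+persist-hot"
    if access_hot:
        return "access-hot"
    if persist_hot:
        return "persist-hot"
    return "cold"   # satisfies neither hot-page condition
```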
Specifically, module M2 includes the following sub-modules:
Module M2.1: after the memory mount of the NVM is completed, the access-heat tracking module, the persistence-heat tracking module and the background migration thread in the kernel are enabled through an operating-system interface, and a stacked file system is created.
In this embodiment, the access-heat tracking module is implemented by modifying the LRU linked-list update function, and the persistence-heat tracking module resides in the fsync handling logic and the background flush logic of the file system.
The background migration thread writes the persistent file pages and direct persistent file pages on the NVM back to the SSD or the HDD according to the content of the NVM log area, and performs the migration of memory pages between the DRAM and the NVM.
Because persistent file pages residing in the NVM may be inconsistent with the file content on the underlying block device (SSD or HDD), a stacked file system is employed.
Module M2.2: when a thread accesses a memory page, the position of the page in the LRU linked list is updated to record access heat; when an fsync persistence operation is executed, the range information of the persisted pages is recorded to record persistence heat; the current location of the page, its access heat and its persistence heat are then combined to judge whether the page needs to be migrated.
Module M2.3: if it is determined that the memory page needs to be migrated and the page is located in the DRAM, the page is added to a dedicated kernel linked list with its target storage location marked, and a kernel thread migrates such pages one by one to the target medium in the background; if the memory page is located in the NVM and its persistence count within the preset time is below the preset threshold, the page is treated as a cold page and persisted to the SSD.
Module M2.4 migrates the memory page to the DRAM if the page is an access hot page but not a persistent hot page and is currently located in the NVM.
In this embodiment, when the remaining capacity of the DRAM falls below a lower threshold, cold pages are randomly selected from the inactive list of the LRU linked list and migrated to the NVM until the remaining capacity of the DRAM recovers above an upper threshold, after which the migration of memory pages proceeds; this avoids system crashes due to insufficient memory and ensures that frequently accessed pages are preferentially retained in the DRAM.
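The watermark-based eviction in this embodiment might look like the following sketch. The single-page accounting (each demotion frees exactly one frame) and the deterministic random seed are simplifying assumptions for illustration:

```python
import random

def evict_until_high_watermark(free_dram, low_mark, high_mark,
                               inactive_list, seed=0):
    """Demote cold pages from the LRU inactive list to NVM when DRAM is low.

    Returns (new_free_dram, demoted_pages). Each demotion is assumed to
    free exactly one DRAM page frame.
    """
    demoted = []
    if free_dram >= low_mark:
        return free_dram, demoted         # above the low watermark: no-op
    rng = random.Random(seed)             # deterministic pick for the sketch
    while free_dram < high_mark and inactive_list:
        # Randomly select a cold page from the inactive list, as described.
        victim = inactive_list.pop(rng.randrange(len(inactive_list)))
        demoted.append(victim)            # migrate the victim page to NVM
        free_dram += 1                    # its DRAM frame becomes free
    return free_dram, demoted
```

Note that demotion continues until the upper watermark rather than the lower one, which gives hysteresis and avoids thrashing around a single threshold.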
Specifically, module M3 includes the following sub-modules:
Module M3.1: if the memory page is an access cold page and a persistent hot page, the page is migrated to the NVM; if the page is both an access hot page and a persistent hot page, the handling is decided by comparing the access intensity with the persistence intensity:
If the access intensity is higher than the persistence intensity, the page is kept in the DRAM; if the page is dirty, it is persisted to the NVM and the page table is updated to mark it as a clean page, while the correspondence between the file and the page is recorded in the index area of the NVM. This exploits the access-speed advantage of the DRAM and reduces the persistence overhead of the page through a speculative strategy.
If the access intensity is lower than the persistence intensity, the page is migrated to the NVM and the correspondence is recorded; subsequent persistence operations are then skipped entirely, since the page is accessed and modified directly through load/store instructions and the modifications persist in place on the NVM, reducing persistence overhead.
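The decision rules for pages that are both access-hot and persist-hot can be sketched as follows; the action names are illustrative labels for this sketch, not kernel APIs:

```python
def place_hot_hot_page(access_dominates: bool, is_dirty: bool):
    """Return the ordered actions for an access-hot + persist-hot page."""
    if access_dominates:
        actions = ["keep-in-dram"]
        if is_dirty:
            # Speculative strategy: persist the dirty copy to NVM, mark the
            # PTE clean, and record the file<->page mapping in the NVM index.
            actions += ["persist-to-nvm", "mark-clean", "record-index"]
        return actions
    # Persistence dominates: move the page to NVM, record the mapping, and
    # skip later fsyncs -- stores to the page now persist in place.
    return ["migrate-to-nvm", "record-index", "skip-fsync"]
```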
Further, in module M3.1, the access intensity is obtained by collecting page-address statistics of L3 cache misses through the Precise Event-Based Sampling (PEBS) function of an Intel CPU, yielding the number of accesses to the current memory page within a preset time.
The persistence intensity is obtained by counting the number of pages persisted through fsync system calls.
When comparing the access intensity with the persistence intensity, the latency data and throughput of the DRAM and the NVM are combined to compute the latency-cost ratio of memory accesses versus persistence operations; latency increments are then estimated from the numbers of memory accesses and persistence operations within the preset time. If the latency increment of the DRAM is larger, the persistence intensity is judged to be higher; if the latency increment of the NVM is larger, the access intensity is judged to be higher.
Module M3.2: when a file is read through the stacked file system, the memory page to be read is first obtained from the underlying file system, and the log area of the NVM is then searched for a matching memory page; if one exists, the memory page in the NVM is mapped into the page table, otherwise the page is read from the SSD and mapped.
In summary, the invention provides a method and a system for promoting intelligent cooperative work of a layered memory and a layered storage system. Compared with common layered-memory and layered-storage methods and systems, it identifies the semantics of pages in the two systems more intelligently, supports moving pages between the two systems without extra overhead, and improves, to a certain extent, the memory-access and persistence performance of systems that use NVM.
Those skilled in the art will appreciate that, in addition to being implemented as pure computer-readable program code, the system provided by the invention and its devices, modules and units can be implemented entirely by logically programming the method steps in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices, modules and units can be regarded as a hardware component, and the devices, modules and units realizing the various functions can be regarded both as structures within the hardware component and as software modules for implementing the method.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (10)

1. A method for promoting intelligent cooperative work of a tiered memory and a tiered storage system, characterized by comprising the following steps:
step S1, identifying access characteristics of memory pages, classifying the memory pages into persistent cold pages, access cold pages, persistent hot pages and access hot pages, and performing an initial partitioning of the storage space of the NVM based on the classification;
step S2, dynamically adjusting the distribution of the memory pages between a dynamic random access memory (DRAM) and the NVM by tracking the access heat and the persistence heat of the memory pages;
step S3, persisting the persistent hot pages to the NVM in the background.
2. The method according to claim 1, characterized in that step S1 comprises the following sub-steps:
step S1.1, checking the validity of a superblock; if the superblock content is invalid, allocating and initializing a log area and completing the initialization of the NVM by setting the superblock, without mounting the NVM into system memory; if the superblock content is valid, locating persistent file pages according to the index information of the log area and writing the persistent file pages to the file system of a solid-state drive (SSD) or a hard disk drive (HDD) through ordinary write requests;
step S1.2, mounting the space in the NVM other than the direct persistent file pages, the superblock and the log area as system memory through the memory hot-plug mechanism of the operating system;
step S1.3, recording the access heat of each memory page using an LRU linked list, and counting the number of persistence operations of each memory page within a preset time through a hash table;
step S1.4, if a memory page is on the active list of the LRU linked list in two consecutive samplings, judging it to be an access hot page; if the persistence count of a memory page exceeds a preset threshold, judging it to be a persistent hot page; judging memory pages that satisfy neither hot-page condition to be cold pages.
3. The method according to claim 2, characterized in that step S1.4 comprises: the system providing an operating-system interface that allows users to manually adjust the hot/cold attributes and placement of memory pages.
4. The method according to claim 1, characterized in that step S2 comprises the following sub-steps:
step S2.1, after the memory mount of the NVM is completed, enabling the access-heat tracking module, the persistence-heat tracking module and the background migration thread in the kernel through an operating-system interface, and creating a stacked file system;
step S2.2, when a thread accesses a memory page, updating the position of the memory page in the LRU linked list to record access heat; when an fsync persistence operation is executed, recording the range information of the persisted memory pages to record persistence heat; and judging whether the memory page needs to be migrated based on its current location, access heat and persistence heat;
step S2.3, if it is determined that the memory page needs to be migrated and the memory page is located in the DRAM, adding the memory page to a dedicated kernel linked list with its target storage location marked, and migrating pages one by one to the target medium in the background through a kernel thread; if the memory page is located in the NVM and its persistence count within the preset time is below the preset threshold, treating the memory page as a cold page and persisting it to the SSD;
step S2.4, if the memory page is an access hot page but not a persistent hot page and is currently located in the NVM, migrating the memory page to the DRAM.
5. The method according to claim 4, characterized in that in step S2.1, the access-heat tracking module is implemented by modifying the LRU linked-list update function, and the persistence-heat tracking module resides in the fsync handling logic and the background flush logic of the file system;
the background migration thread writes the persistent file pages and direct persistent file pages on the NVM back to the SSD or the HDD according to the content of the NVM log area, and performs the migration of memory pages between the DRAM and the NVM.
6. The method according to claim 4, characterized in that in step S2.4, when the remaining capacity of the DRAM falls below a lower threshold, cold pages are randomly selected from the inactive list of the LRU linked list and migrated to the NVM until the remaining capacity of the DRAM recovers above an upper threshold, after which the migration of the memory pages is performed.
7. The method according to claim 1, characterized in that step S3 comprises the following sub-steps:
step S3.1, if the memory page is an access cold page and a persistent hot page, migrating the memory page to the NVM; if the memory page is both an access hot page and a persistent hot page, deciding the handling according to a comparison of access intensity and persistence intensity:
if the access intensity is higher than the persistence intensity, keeping the memory page in the DRAM; if the memory page is a dirty page, persisting it to the NVM, updating the page table to mark it as a clean page, and recording the correspondence between the file and the memory page in the index area of the NVM;
if the access intensity is lower than the persistence intensity, migrating the memory page to the NVM and recording the correspondence;
step S3.2, when a file is read through the stacked file system, first obtaining the memory page to be read from the underlying file system, then searching the log area of the NVM for a matching memory page; if one exists, mapping the memory page in the NVM into the page table; otherwise reading the page from the SSD and mapping it.
8. The method according to claim 7, characterized in that in step S3.1, the access intensity is obtained by collecting page-address statistics of L3 cache misses through the Precise Event-Based Sampling (PEBS) function of an Intel CPU;
the persistence intensity is obtained by counting the number of pages persisted through fsync system calls;
when comparing the access intensity with the persistence intensity, the latency data and throughput of the DRAM and the NVM are combined to compute the latency-cost ratio of memory accesses versus persistence operations, and latency increments are then estimated from the numbers of memory accesses and persistence operations within the preset time; if the latency increment of the DRAM is larger, the persistence intensity is judged to be higher; if the latency increment of the NVM is larger, the access intensity is judged to be higher.
9. The method according to claim 1, characterized in that step S3 further comprises:
receiving, through an operating-system interface, the persistence-heat or access-heat parameters of the memory pages set by the user, so as to adjust the placement and maintenance strategy of the memory pages.
10. A system for promoting intelligent cooperative work of a tiered memory and a tiered storage system, characterized by comprising:
a module M1, which identifies the access characteristics of memory pages, classifies the memory pages into persistent cold pages, access cold pages, persistent hot pages and access hot pages, and performs an initial partitioning of the storage space of the NVM based on the classification;
a module M2, which dynamically adjusts the distribution of the memory pages between a dynamic random access memory (DRAM) and the NVM by tracking the access heat and the persistence heat of the memory pages;
a module M3, which persists the persistent hot pages to the NVM in the background.
CN202510945875.7A 2025-07-09 2025-07-09 Method and system for hierarchical memory and hierarchical storage based on NVM Pending CN120803363A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510945875.7A CN120803363A (en) 2025-07-09 2025-07-09 Method and system for hierarchical memory and hierarchical storage based on NVM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510945875.7A CN120803363A (en) 2025-07-09 2025-07-09 Method and system for hierarchical memory and hierarchical storage based on NVM

Publications (1)

Publication Number Publication Date
CN120803363A true CN120803363A (en) 2025-10-17

Family

ID=97313222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510945875.7A Pending CN120803363A (en) 2025-07-09 2025-07-09 Method and system for hierarchical memory and hierarchical storage based on NVM

Country Status (1)

Country Link
CN (1) CN120803363A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination