
WO2015156758A1 - Method and apparatus of cache promotion between server and storage system - Google Patents

Method and apparatus of cache promotion between server and storage system

Info

Publication number
WO2015156758A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
server
segment
storage
backup information
Prior art date
Application number
PCT/US2014/033152
Other languages
French (fr)
Inventor
Akio Nakajima
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/US2014/033152 priority Critical patent/WO2015156758A1/en
Publication of WO2015156758A1 publication Critical patent/WO2015156758A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0871 - Allocation or management of cache space
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 - Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 - Saving, restoring, recovering or retrying
    • G06F 11/1415 - Saving, restoring, recovering or retrying at system level
    • G06F 11/1441 - Resetting or repowering
    • G06F 11/1446 - Point-in-time backing up or restoration of persistent data
    • G06F 11/1479 - Generic software techniques for error detection or fault masking
    • G06F 11/1482 - Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F 11/1484 - Generic software techniques for error detection or fault masking by means of middleware or OS functionality involving virtual machines
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/202 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F 11/2023 - Failover techniques
    • G06F 11/203 - Failover techniques using migration
    • G06F 11/2046 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share persistent storage
    • G06F 2212/00 - Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 - Providing a specific technical effect
    • G06F 2212/1032 - Reliability improvement, data loss prevention, degraded operation etc
    • G06F 2212/15 - Use in a specific computing environment
    • G06F 2212/152 - Virtualized environment, e.g. logically partitioned system
    • G06F 2212/22 - Employing cache memory using specific memory technology
    • G06F 2212/222 - Non-volatile memory
    • G06F 2212/26 - Using a specific storage system architecture
    • G06F 2212/264 - Remote server
    • G06F 2212/31 - Providing disk cache in a specific location of a storage system
    • G06F 2212/311 - In host system

Definitions

  • FIG. 13 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the second embodiment.
  • the computer system of this embodiment includes multiple hosts 2a, 2b that execute server cache programs and a storage system 1 that has shared backup information from the multiple hosts.
  • the second host 2b updates the shared backup information based on the virtual machine table 60.
  • the second host 2b can take over the LBA and threshold values from the backup information into its cache directory.
  • the hypervisor checks the virtual machine table 60 and re-migrates the VM to the first host 2a, since the first host 2a has promoted cache data. This means that the restore procedure of the host cache program of the first host 2a is executed to restore the cache directory from the storage system (see FIG. 11).
  • FIG. 14 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the third embodiment.
  • the computer system of this embodiment includes multiple hosts 2a, 2b that execute a shared server cache program and a storage system 1 that has shared backup information from the multiple hosts.
  • the second host 2b updates the shared backup information based on the virtual machine table 60.
  • the second host 2b can take over the LBA and threshold values from the backup information into its cache directory.
  • the hypervisor checks the virtual machine table 60 and re-migrates the VM to the first host 2a.
  • FIGS. 1, 13, and 14 are purely exemplary of information systems in which the present invention may be implemented, and the invention is not limited to a particular hardware configuration.
  • the computers and storage systems implementing the invention can also have known I/O devices (e.g., CD and DVD drives, floppy disk drives, hard drives, etc.) which can store and read the modules, programs and data structures used to implement the above-described invention.
  • These modules, programs and data structures can be encoded on such computer-readable media.
  • the data structures of the invention can be stored on computer-readable media independently of one or more computer-readable media on which reside the programs used in the invention.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include local area networks, wide area networks, e.g., the Internet, wireless networks, storage area networks, and the like.
  • the methods When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system for cache promotion comprises: a server which includes a nonvolatile persistent memory having a cache data area that has a plurality of cache segments, and a server processor managing cache directory information which indicates whether each of the cache segments is valid or not, wherein a cache segment is valid when the cache segment is allocated for caching data; and a storage which includes a storage memory storing backup information of the cache directory information, a plurality of storage devices, and a storage processor configuring one or more logical units by storage space of the plurality of storage devices. When the server recovers from being down, the server processor is configured to obtain the backup information of the cache directory information from the storage, and, when the backup information indicates that a particular cache segment is invalid, to invalidate the particular cache segment in the cache directory information.

Description

METHOD AND APPARATUS OF CACHE PROMOTION BETWEEN SERVER AND STORAGE SYSTEM
BACKGROUND OF THE INVENTION
[0001] The present invention relates generally to storage systems and, more particularly, to cache promotion between server cache and storage system.
[0002] Server cache technology currently exists. When the server cache feature caches read or write data in the server, the server does not need to access the storage system if the requested storage volume data is already stored in the server cache. According to current technology, the server cache requires a promotion process, which is a warm-up process of storing storage data into the server cache. Also, the server cache typically uses nonvolatile memory such as flash memory. The server cache further requires a re-promotion process when the server cache program is restarted, because other physical server(s) may execute write IO to the storage volume while a virtual machine is migrated to the other physical server(s).
BRIEF SUMMARY OF THE INVENTION
[0003] Exemplary embodiments of the invention provide a cache promotion method between server cache and storage system. According to one embodiment, the server has a cache program, a cache directory, backup information and cache data. The storage system has backup information and manages the backup information. When the server sends a server cache directory backup command, the storage system saves the cache directory backup. When the server sends a write command, the storage system invalidates the backup information corresponding to the LUN/LBA (Logical Unit Number/Logical Block Addressing) segments of the write command. When the server cache program is restarted, due to a reboot after the server has been down or due to maintenance, the server sends a restore command, the storage system returns the backup information, and the server then merges the valid entry directory information. The server cache program does not invalidate the entire server cache data and does not execute a re-promotion (warm-up) process to a volume, since the storage system keeps the latest backup information when other server(s) send write commands to the same volume.
[0004] In accordance with an aspect of the present invention, a system comprises: a server which includes a nonvolatile persistent memory having a cache data area that has a plurality of cache segments, and a server processor managing cache directory information which indicates whether each of the plurality of cache segments is valid or not, wherein a cache segment is valid when the cache segment is allocated for caching data; and a storage which includes a storage memory storing backup information of the cache directory information, a plurality of storage devices, and a storage processor configuring one or more logical units by storage space of the plurality of storage devices. When the server recovers from being down, the server processor is configured to obtain the backup information of the cache directory information from the storage, and, when the backup information indicates that a particular cache segment is invalid, to invalidate the particular cache segment in the cache directory information.
[0005] In some embodiments, the server processor is configured, after invalidating the particular cache segment in the cache directory information, to delete server cache data from the particular cache segment and to delete a corresponding server cache address of the particular cache segment from the cache directory information. There are a plurality of servers. The storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers. When a first server is shut down, a virtual machine is migrated from the first server to a second server. When the first server recovers after being shut down, the virtual machine is re-migrated from the second server to the first server, and the first server obtains the shared backup information of the cache directory information from the storage.
[0006] In specific embodiments, a plurality of servers execute a shared server cache program to manage the cache directory information and the backup information. The storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers. When a first server is shut down, a virtual machine is migrated from the first server to a second server. When the first server recovers after being shut down, the first server obtains the shared backup information of the cache directory information from the storage via the shared server cache program.
[0007] In some embodiments, the server processor is configured, in executing a read command, to: search for valid cache segments in the cache data area which correspond to segments of the read command; when there is a valid cache segment corresponding to a segment of the read command, read data from the valid cache segment of the cache data area; when there is no valid cache segment corresponding to a segment of the read command, read data for the segment from the storage; and concatenate data read from the cache data area and data read from the storage to form an entire data of the read command. The server processor is configured, in executing the read command, to: when reading data for the segment from the storage having no corresponding valid cache segment, update an access counter representing a number of access to the storage for the segment by incrementing the access counter by one; and when the access counter exceeds a preset threshold, allocate a new cache segment from the cache data area for storing the data to be read for the segment from the storage, and update server backup information of the cache directory information. The server processor is configured to send the updated server backup information to the storage to update the backup information stored in the storage.
[0008] In specific embodiments, the server processor is configured, in executing a write command, to: search the cache directory information for cache segments in the cache data area which correspond to segments of the write command; when there is a cache segment corresponding to a segment of the write command, update an access counter representing a number of access to the storage for the segment by incrementing the access counter by one, and when the access counter exceeds a preset threshold, allocate the corresponding cache segment to the segment of the write command if the cache segment has not been allocated so as to validate the cache segment, and write data of the segment to the cache segment of the cache data area and update server backup information of the cache directory information; when there is no cache segment corresponding to a segment of the write command, create a cache segment in the cache directory information corresponding to the segment of the write command, initialize an access counter for the cache segment to one, and update server backup information of the cache directory information; and send the write command to the storage for execution.
[0009] In some embodiments, the storage processor is configured, upon receiving the write command, to: invalidate the backup information corresponding to the segments of the write command; and write data of the write command to the storage space. The server processor is configured to send the updated server backup information to the storage to update the backup information stored in the storage.
[0010] Another aspect of the invention is directed to a method for caching data between a server and a storage. The server includes a nonvolatile persistent memory having a cache data area that has a plurality of cache segments, and a server processor managing cache directory information which indicates whether each of the plurality of cache segments is valid or not, wherein a cache segment is valid when the cache segment is allocated for caching data. The storage includes a storage memory storing backup information of the cache directory information, a plurality of storage devices, and a storage processor configuring one or more logical units by storage space of the plurality of storage devices. The method comprises: when the server recovers from being down, obtaining the backup information of the cache directory information from the storage, and, when the backup information indicates that a particular cache segment is invalid, invalidating the particular cache segment in the cache directory information.
[0011] These and other features and advantages of the present invention will become apparent to those of ordinary skill in the art in view of the following detailed description of the specific embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the first embodiment.
[0013] FIG. 2a illustrates an example of the host memory.
[0014] FIG. 2b illustrates an example of the host nonvolatile persistent memory.
[0015] FIG. 3 illustrates an example of the storage memory.
[0016] FIG. 4 illustrates an example of the cache directory.
[0017] FIG. 5 illustrates an example of the least recently used list.
[0018] FIG. 6 illustrates an example of the virtual machine table.
[0019] FIG. 7 illustrates an example of the directory backup information.
[0020] FIG. 8 shows an example of a flow diagram illustrating a process of the host cache program for executing a read command.
[0021] FIG. 9 shows an example of a flow diagram illustrating a process of the host cache program for executing a write command.
[0022] FIG. 10 shows an example of a flow diagram illustrating a process of the host cache program for executing a host cache directory backup process.
[0023] FIG. 11 shows an example of a flow diagram illustrating a process of the host cache program for executing a process to restore cache directory from the storage system.
[0024] FIG. 12 shows an example of a flow diagram illustrating a process of the storage program for executing a host cache directory backup process.
[0025] FIG. 13 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the second embodiment.
[0026] FIG. 14 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the third embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0027] In the following detailed description of the invention, reference is made to the accompanying drawings which form a part of the disclosure, and in which are shown by way of illustration, and not of limitation, exemplary embodiments by which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. Further, it should be noted that while the detailed description provides various exemplary embodiments, as described below and as illustrated in the drawings, the present invention is not limited to the embodiments described and illustrated herein, but can extend to other embodiments, as would be known or as would become known to those skilled in the art. Reference in the specification to "one embodiment," "this embodiment," or "these embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and the appearances of these phrases in various places in the specification are not necessarily all referring to the same embodiment. Additionally, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that these specific details may not all be needed to practice the present invention. In other circumstances, well-known structures, materials, circuits, processes and interfaces have not been described in detail, and/or may be illustrated in block diagram form, so as to not unnecessarily obscure the present invention.
[0028] Furthermore, some portions of the detailed description that follow are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to most effectively convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In the present invention, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals or instructions capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, instructions, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," "displaying," or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
[0029] The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer-readable storage medium including non-transitory medium, such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of media suitable for storing electronic information. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs and modules in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0030] Exemplary embodiments of the invention, as will be described in greater detail below, provide apparatuses, methods and computer programs for cache promotion between server cache and storage system.
[0031] First Embodiment
[0032] FIG. 1 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the first embodiment. The computer system includes a storage system 1 and a host/server computer 2. The storage system includes a host interface for connecting to the host computer 2, a CPU, cache memory 30, a disk interface, and HDDs (Hard Disk Drives) 4. The host interface, CPU, cache memory 30, and disk interface are connected via a bus interface, such as PCI, DDR, SCSI, or the like. A logical unit (LU) 3 is a logical data store which is created by multiple HDDs 4. The host computer 2 includes a disk interface for connecting to the storage system 1 or to an internal HDD, a CPU, nonvolatile persistent memory 20b, and volatile memory 20a, which are connected via a bus interface such as PCI, DDR, SCSI, or the like.
[0033] FIG. 2a illustrates an example of the host memory 20a. The volatile host memory 20a includes a server program 21. The host program 21 includes virtual machine 22, hypervisor 23, and cache program 24.
[0034] FIG. 2b illustrates an example of the host nonvolatile persistent memory 20b. The host nonvolatile persistent memory 20b includes cache directory 40 (see FIG. 4), LRU (Least Recently Used) list 50 (see FIG. 5), cache data area 25, and virtual machine table 60 (see FIG. 6). In a different embodiment, the cache directory 40 may be stored in the host memory 20a instead of the host nonvolatile persistent memory 20b.
[0035] FIG. 3 illustrates an example of the storage memory 30. The storage cache memory 30 includes a storage program 31 and cache directory backup information 70 (see FIG. 7). The storage program 31 includes IO program 32 and server cache directory maintenance program 33. The IO program 32 executes IO commands from the host(s). The server cache directory maintenance program 33 executes backup or restores cache directory information to recover server side cache data and directory information.
[0036] FIG. 4 illustrates an example of the cache directory 40. The cache directory 40 contains Logical Unit Number (LUN) field 41, Logical Block Address (LBA) field 42, valid flag 43, server cache address (SCA) field 44, and access counter field 45. The LUN 41 and LBA 42 entries identify the storage location in the LU 3. The valid flag 43 is a cache validate/invalidate flag. The SCA 44 is the location in the NV (nonvolatile) persistent cache memory 20b corresponding to the LUN 41 and LBA 42. The access counter 45 is the frequency of read or write commands to the corresponding LBA.
[0037] FIG. 5 illustrates an example of the least recently used list 50. The LRU list contains a server cache address list 51. The SCA list 51 stores a list of SCA 44 entries and is used when the cache data area has insufficient capacity. At the top of the LRU list 50 is the least recently used entry (i.e., the oldest access of the cache data), and at the bottom of the list is the most recently used entry (i.e., the latest read/write access of the cache data). When the cache program 24 detects insufficient capacity of cache space in the cache data area 25, the cache program 24 chooses the entry at the top of the LRU list 50 to create free space. When the host issues a read or write command, the cache program 24 searches for the SCA entry and moves the existing entry, or inserts a new entry, to the bottom of the LRU list 50 (latest access). An SCA entry that is most recently accessed is thus deleted from its current position and re-inserted at the bottom of the LRU list 50.
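As an illustration of these two structures, the following is a minimal Python sketch of a cache directory entry and of the LRU list behavior described above. The class names, the use of an OrderedDict, and the default values are assumptions made for the example, not part of the patent.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

@dataclass
class DirectoryEntry:
    """One row of the cache directory 40 (illustrative layout)."""
    lun: int                    # LUN field 41
    lba: int                    # LBA field 42 (first block of the cache segment)
    valid: bool = False         # valid flag 43: True once a cache segment is allocated
    sca: Optional[int] = None   # server cache address (SCA) field 44 in NV memory 20b
    access_counter: int = 0     # access counter field 45: read/write frequency

class LRUList:
    """LRU list 50: least recently used SCA at the top, most recent at the bottom."""
    def __init__(self):
        self._order = OrderedDict()   # SCA -> None; insertion order tracks recency

    def touch(self, sca):
        # Delete the SCA entry if present and re-insert it at the bottom (latest access).
        self._order.pop(sca, None)
        self._order[sca] = None

    def evict(self):
        # When cache space is insufficient, choose the entry at the top to create free space.
        if not self._order:
            return None
        sca, _ = self._order.popitem(last=False)
        return sca
```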
[0038] FIG. 6 illustrates an example of the virtual machine table 60. The virtual machine table 60 contains virtual machine UUID 61 and used LUN 62. When the hypervisor migrates the virtual machine (VM), the hypervisor refers to the virtual machine table 60 to use promoted cache data from the storage system 1.
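A small sketch of how the virtual machine table 60 might be consulted follows; the dictionary layout, the placeholder UUID, and the helper function are illustrative assumptions building on the DirectoryEntry sketch above.

```python
# Virtual machine table 60: VM UUID 61 -> list of used LUNs 62 (illustrative layout).
vm_table = {
    "vm-uuid-placeholder": [0, 1],   # hypothetical VM that uses LUN 0 and LUN 1
}

def luns_with_promoted_cache(vm_uuid, directory_entries):
    """Return the VM's LUNs for which this host already holds valid (promoted)
    cache segments, so the hypervisor can prefer a host with a warm cache."""
    used = set(vm_table.get(vm_uuid, []))
    return {e.lun for e in directory_entries if e.valid and e.lun in used}
```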
[0039] FIG. 7 illustrates an example of the directory backup information 70. The directory backup information 70 contains Logical Unit Number (LUN) field 71, Logical Block Address (LBA) field 72, valid flag 73, and server cache address (SCA) field 74. The directory backup information can be implemented as a bitmap table with a valid flag for each cache segment. The SCA field 74 is needed if the cache directory 40 is stored in the host memory 20a, but is not needed if the cache directory 40 is stored in the host nonvolatile persistent memory 20b.
[0040] FIG. 8 shows an example of a flow diagram 800 illustrating a process of the host cache program for executing a read command. In step S801, the server program 21 issues a read command to a LUN of the storage system 1. The server cache program 24 gets the read command and proceeds to step S802. In step S802, the server cache program searches the server cache directory 40. If the search for the server cache is a hit, the next step is S806; if the search is a miss, the next step is S803. In step S803 (miss), the server cache program creates a read process of the corresponding segment to the storage system, then creates an entry for the segment in the cache directory and updates the access counter in the access counter field 45 (initialized to one for the first access). In step S804, the server cache program checks the access counter. If the threshold for the access counter is exceeded (YES), the next step is S805; if the threshold is not exceeded (NO), the next step is S807. In step S805, the server cache program 24 allocates space (e.g., a cache segment) from the cache data area 25, and then updates the cache directory 40 and the server backup information (the server backup information is stored in the server and is sent to the storage to update the backup information stored in the storage, as shown in FIG. 10). The next step is S807. In step S806 (hit), the server cache program 24 reads data from the cache data area 25 at the corresponding SCA 44. In the next step S807, if the data size to be read according to the read command is larger than the cache segment and a next segment exists, the next step is S802; if the next segment does not exist, the next step is S808. In step S808, the server cache program 24 executes the read processes to the storage system 1 that were created in step S803. When the data read from the storage system has allocated space in the cache data area 25, the server cache program stores the data to the allocated space in the cache data area 25. The server cache program 24 concatenates the storage read data and the cache read data, and then returns the entire data to the server program 21.
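The read flow above can be summarized in a short Python sketch. The helper objects (directory, cache_area, storage, lru, backup), the segment size, and the promotion threshold are assumptions made for the example, and segment alignment is ignored for brevity.

```python
SEGMENT_BLOCKS = 8        # assumed number of LBAs per cache segment
PROMOTION_THRESHOLD = 3   # assumed access-counter threshold for promotion

def handle_read(lun, lba, num_blocks, directory, cache_area, storage, lru, backup):
    """Illustrative version of the FIG. 8 flow (S801-S808)."""
    cached_parts, miss_segments = [], []
    for seg_lba in range(lba, lba + num_blocks, SEGMENT_BLOCKS):
        entry = directory.lookup(lun, seg_lba)                          # S802: search
        if entry is not None and entry.valid:                           # hit
            cached_parts.append((seg_lba, cache_area.read(entry.sca)))  # S806
            lru.touch(entry.sca)
        else:                                                           # miss
            miss_segments.append(seg_lba)                               # S803: storage read
            entry = entry or directory.create(lun, seg_lba)
            entry.access_counter += 1
            if entry.access_counter > PROMOTION_THRESHOLD:              # S804
                entry.sca = cache_area.allocate()                       # S805: promote
                entry.valid = True
                backup.mark_valid(lun, seg_lba)
    storage_parts = []
    for seg_lba in miss_segments:                                       # S808
        data = storage.read(lun, seg_lba, SEGMENT_BLOCKS)
        entry = directory.lookup(lun, seg_lba)
        if entry.valid:                       # newly promoted segment: fill the cache
            cache_area.write(entry.sca, data)
        storage_parts.append((seg_lba, data))
    # Concatenate cache reads and storage reads into the entire data of the command.
    return b"".join(d for _, d in sorted(cached_parts + storage_parts))
```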
[0041] FIG. 9 shows an example of a flow diagram 900 illustrating a process of the host cache program for executing a write command. In step S901, the server program 21 issues a write command to a LUN of the storage system 1. The server cache program 24 gets the write command and write data, and proceeds to step S902. In step S902, the server cache program searches the server cache directory 40. If the search for the server cache is a hit (YES), the next step is S903; if the search is a miss (NO), the next step is S907. In step S903, the server cache program 24 updates the access counter in the access counter field 45 of the corresponding segment to the storage system, and the next step is S904. In step S904, the server cache program checks the access counter. If the threshold for the access counter is exceeded (YES), the next step is S905; if the threshold is not exceeded (NO), the next step is S908. It is noted that when the search for the server cache is a hit (YES) in step S903, the cache directory entry (LUN/LBA) exists, but the valid flag can be 0 (SCA not allocated) or 1 (SCA allocated). In step S905, if the cache directory entry does not have an allocated SCA (valid flag is 0), the server program allocates an SCA (from the cache data area) for that entry of the corresponding LUN and LBA (e.g., that particular cache segment). In step S906, the server cache program stores the write data to the cache data area 25. The server cache program updates the cache directory 40 and the server backup information. The next step is S908. In step S907, the server creates a cache directory entry (without allocating an SCA to the space in the cache data area corresponding to the created cache directory entry), initializes the access counter to the initial value of one, and updates the server backup information. In the next step S908, if the data size to be written according to the write command is larger than the cache segment and a next segment exists, the next step is S902; if the next segment does not exist, the next step is S909. In step S909, the server cache program 24 executes the write process to the storage system 1. Then the server cache program 24 returns the write status to the server program 21.
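For comparison, a matching sketch of the write flow is given below; as before, the helper objects and the threshold are assumptions, and each element of `segments` is taken to be a (segment LBA, segment data) pair. The per-segment loop corresponds to step S908.

```python
PROMOTION_THRESHOLD = 3   # assumed access-counter threshold for promotion

def handle_write(lun, segments, directory, cache_area, storage, backup):
    """Illustrative version of the FIG. 9 flow (S901-S909)."""
    for seg_lba, seg_data in segments:
        entry = directory.lookup(lun, seg_lba)               # S902: search directory
        if entry is not None:                                # hit
            entry.access_counter += 1                        # S903
            if entry.access_counter > PROMOTION_THRESHOLD:   # S904
                if not entry.valid:                          # S905: allocate an SCA
                    entry.sca = cache_area.allocate()
                    entry.valid = True
                cache_area.write(entry.sca, seg_data)        # S906: cache the write data
                backup.mark_valid(lun, seg_lba)
        else:                                                # S907: miss
            entry = directory.create(lun, seg_lba)           # no SCA allocated yet
            entry.access_counter = 1
            backup.record_entry(lun, seg_lba)
    storage.write(lun, segments)                             # S909: write to the storage
```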
[0042] FIG. 10 shows an example of a flow diagram 1000 illustrating a process of the host cache program for executing a host cache directory backup process. In step S1001, the server cache program 24 checks the backup process timer. If the timer has expired (YES), the next step is S1002; if the timer has not expired (NO), the process ends. In step S1002, the server cache program sends server backup information to the storage system 1 asynchronously. The storage system receives and uses the server backup information to update the backup information stored therein. In step S1003, the server cache program restarts the backup process timer.
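A minimal sketch of this timer-driven backup follows. The interval value and the export/put interfaces are assumptions made only for illustration.

import threading

BACKUP_INTERVAL_SEC = 60.0     # assumed backup timer period

def run_backup_timer(cache_dir, storage):
    def on_timer():                                               # S1001: timer expired
        storage.put_backup_information(                           # S1002: asynchronous push
            cache_dir.export_backup_information())
        run_backup_timer(cache_dir, storage)                      # S1003: restart the timer
    timer = threading.Timer(BACKUP_INTERVAL_SEC, on_timer)
    timer.daemon = True
    timer.start()
    return timer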
[0043] FIG. 11 shows an example of a flow diagram 1100 illustrating a process of the host cache program for executing a process to restore the cache directory from the storage system. In step S1101, the server cache program
24 is restarted when the physical server is rebooted after being down. In step
S1102, the server cache program gets backup information from the storage system 1. In step S1103, if the server cache program gets the backup information successfully (YES), the next step is S1106; if not (NO), the next step is S1104. In step S1104, the server cache program invalidates, in the cache data area 25, the server cache data of all LUNs stored in the storage system. In the next step S1105, the server cache program deletes the corresponding SCA 44 from the cache directory 40 and invalidates the corresponding entry, and the process ends. In step S1106, the server cache program 24 invalidates the entry in the cache directory 40 when the corresponding LUN/LBA field (41/42) is invalid. That is, if the valid flag of the corresponding LUN/LBA field (a particular cache segment) of the cache directory backup information 70 is 0, then the valid flag of the host cache directory 40 for the entry having the same LUN/LBA (that particular cache segment) is set to 0. In the next step S1107, the server cache program deletes the server cache data of the corresponding SCA 44 from the cache data area 25 if the valid flag is 0. In the next step S1108, the server cache program deletes the corresponding SCA 44 from the cache directory 40 if the valid flag is 0. In the next step S1109, the server cache program increments to the next LBA segment. If the next segment exists (YES), the next step is S1106; if all LBAs are checked (NO), the next step is S1110. In step S1110, the server cache program checks the next LUN. If the next LUN exists in the cache backup information (YES), the next step is S1106; if the next LUN does not exist (NO), the process ends.
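The restore procedure of FIG. 11 may be sketched as follows, again using hypothetical directory, cache data area, and storage interfaces. The handling of a failed restore (invalidate everything) and of invalid backup entries (invalidate the entry, free the cached data, and drop the SCA) mirrors steps S1103 through S1108; the per-segment backup object and its fields are assumptions.

def restore_cache_directory(cache_dir, cache_data, storage):
    backup = storage.get_backup_information()                    # S1102
    if backup is None:                                            # S1103: restore failed
        for entry in list(cache_dir.all_entries()):               # S1104 / S1105
            if entry.valid:
                cache_data.free_segment(entry.sca)
            cache_dir.remove(entry)
        return

    for lun in backup.luns():                                     # outer loop via S1110
        for seg in backup.segments(lun):                          # inner loop via S1109
            if seg.valid:
                continue                                          # keep promoted segments
            entry = cache_dir.lookup(lun, seg.lba)                # S1106: invalidate entry
            if entry is None:
                continue
            entry.valid = False
            if entry.sca is not None:
                cache_data.free_segment(entry.sca)                # S1107: drop cached data
                entry.sca = None                                  # S1108: drop the SCA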
[0044] FIG. 12 shows an example of a flow diagram 1200 illustrating a process of the storage program for executing a host cache directory backup process. In step S1201, the storage program 31 receives a write command from the host system 2 which executed a backup command. In the next step
S1202, the storage program invalidates the backup information corresponding to LUN/LBA segments of the write command. In the next step S1203, if the data size of data to be written according to the write command is larger than the cache segment and the next segment exists, the next step is S1202; if the next segment does not exist, the process ends.
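The storage-side invalidation of FIG. 12 reduces to a short loop over the written segments. The backup-information interface and segment size are assumptions carried over from the earlier sketches.

def on_host_write(backup_info, lun, lba, length):                 # S1201: write command received
    seg_lba = (lba // SEGMENT_SIZE) * SEGMENT_SIZE
    while seg_lba < lba + length:                                 # loop via S1203
        backup_info.invalidate(lun, seg_lba)                      # S1202: invalidate backup entry
        seg_lba += SEGMENT_SIZE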
[0045] When the server cache program restarts, it does not invalidate the entire server cache data, and it does not have to execute a re-promotion (warm-up) process for a volume, since the storage system keeps the latest backup information even when other server(s) send write processes to the same volume.
[0046] Second Embodiment
[0047] FIG. 13 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the second embodiment. The computer system of this embodiment includes multiple hosts 2a, 2b that execute server cache programs and a storage system 1 that has shared backup information from the multiple hosts. When a VM is migrated from one host 2a to another host 2b due to the shutdown of the first host 2a, the second host 2b updates the shared backup information based on the virtual machine table 60. In this step, the second host 2b can take over the LBA and threshold values into its cache directory from the backup information. When the first host 2a reboots, the hypervisor checks the virtual machine table 60 and re-migrates the VM to the first host 2a, since the first host 2a has promoted cache data. This means that the restore procedure of the host cache program of the first host 2a is executed to restore the cache directory from the storage system (see FIG. 11). [0048] Third Embodiment
[0049] FIG. 14 illustrates an example of a hardware configuration of a computer system in which the method and apparatus of the invention may be applied according to the third embodiment. The computer system of this embodiment includes multiple hosts 2a, 2b that execute a shared server cache program and a storage system 1 that has shared backup information from the multiple hosts. When a VM is migrated from one host 2a to another host 2b due to the shutdown of the first host 2a, the second host 2b updates the shared backup information based on the virtual machine table 60. In this step, the second host 2b can take over the LBA and threshold values into its cache directory from the backup information. When the first host 2a reboots and gets the latest backup information from the storage system, the hypervisor checks the virtual machine table 60 and re-migrates the VM to the first host 2a.
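The takeover step described for the second and third embodiments might look roughly like the following sketch, under the added assumptions that the shared backup information carries per-segment access-counter values and that the virtual machine table 60 can map a VM to its LUNs; neither assumption is stated in the specification, and all names are illustrative.

def take_over_from_shared_backup(vm_id, vm_table, cache_dir, storage):
    backup = storage.get_backup_information()
    for lun in vm_table.luns_of(vm_id):                           # virtual machine table 60 lookup (assumed)
        for seg in backup.segments(lun):
            if not seg.valid:
                continue
            entry = cache_dir.create_or_get(lun, seg.lba)         # take over the LBA
            entry.access_counter = seg.access_counter             # take over counter/threshold state
            entry.valid = False                                   # cached data itself is re-promoted later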
[0050] Of course, the system configurations illustrated in FIGS. 1, 13, and 14 are purely exemplary of information systems in which the present invention may be implemented, and the invention is not limited to a particular hardware configuration. The computers and storage systems implementing the invention can also have known I/O devices (e.g., CD and DVD drives, floppy disk drives, hard drives, etc.) which can store and read the modules, programs and data structures used to implement the above-described invention. These modules, programs and data structures can be encoded on such computer-readable media. For example, the data structures of the invention can be stored on computer-readable media independently of one or more computer-readable media on which reside the programs used in the invention. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include local area networks, wide area networks, e.g., the Internet, wireless networks, storage area networks, and the like.
[0051] In the description, numerous details are set forth for purposes of explanation in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that not all of these specific details are required in order to practice the present invention. It is also noted that the invention may be described as a process, which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged.
[0052] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of embodiments of the invention may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out embodiments of the invention.
Furthermore, some embodiments of the invention may be performed solely in hardware, whereas other embodiments may be performed solely in software.
Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0053] From the foregoing, it will be apparent that the invention provides methods, apparatuses and programs stored on computer readable media for cache promotion between server cache and storage system.
Additionally, while specific embodiments have been illustrated and described in this specification, those of ordinary skill in the art appreciate that any arrangement that is calculated to achieve the same purpose may be substituted for the specific embodiments disclosed. This disclosure is intended to cover any and all adaptations or variations of the present invention, and it is to be understood that the terms used in the following claims should not be construed to limit the invention to the specific
embodiments disclosed in the specification. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with the established doctrines of claim interpretation, along with the full range of equivalents to which such claims are entitled.

Claims

WHAT IS CLAIMED IS:
1. A system comprising:
a server which includes a nonvolatile persistent memory having a cache data area that has a plurality of cache segments, and a server processor managing cache directory information which indicates whether each of the plurality of cache segments is valid or not, wherein a cache segment is valid when the cache segment is allocated for caching data; and a storage which includes a storage memory storing backup information of the cache directory information, a plurality of storage devices, and a storage processor configuring one or more logical units by storage space of the plurality of storage devices;
wherein when the server recovers from being down, the server processor is configured to obtain the backup information of the cache directory information from the storage, and, when the backup information indicates that a particular cache segment is invalid, to invalidate the particular cache segment in the cache directory information.
2. The system according to claim 1,
wherein the server processor is configured, after invalidating the particular cache segment in the cache directory information, to delete server cache data from the particular cache segment and to delete a corresponding server cache address of the particular cache segment from the cache directory information.
3. The system according to claim 1, comprising a plurality of servers, wherein the storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers;
wherein when a first server is shut down, a virtual machine is migrated from the first server to a second server; and
wherein when the first server recovers after being shut down, the virtual machine is re-migrated from the second server to the first server, and the first server obtains the shared backup information of the cache directory information from the storage.
4. The system according to claim 1, comprising a plurality of servers, wherein the plurality of servers execute a shared server cache program to manage the cache directory information and the backup information;
wherein the storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers;
wherein when a first server is shut down, a virtual machine is migrated from the first server to a second server; and
wherein when the first server recovers after being shut down, the first server obtains the shared backup information of the cache directory information from the storage via the shared server cache program.
5. The system according to claim 1, wherein the server processor is configured, in executing a read command, to: search for valid cache segments in the cache data area which correspond to segments of the read command;
when there is a valid cache segment corresponding to a segment of the read command, read data from the valid cache segment of the cache data area;
when there is no valid cache segment corresponding to a segment of the read command, read data for the segment from the storage; and
concatenate data read from the cache data area and data read from the storage to form an entire data of the read command.
6. The system according to claim 5, wherein the server processor is configured, in executing the read command, to:
when reading data for the segment from the storage having no corresponding valid cache segment, update an access counter representing a number of access to the storage for the segment by incrementing the access counter by one; and
when the access counter exceeds a preset threshold, allocate a new cache segment from the cache data area for storing the data to be read for the segment from the storage, and update server backup information of the cache directory information.
7. The system according to claim 6, wherein the server processor is configured to:
send the updated server backup information to the storage to update the backup information stored in the storage.
8. The system according to claim 1, wherein the server processor is configured, in executing a write command, to:
search the cache directory information for cache segments in the cache data area which correspond to segments of the write command;
when there is a cache segment corresponding to a segment of the write command, update an access counter representing a number of access to the storage for the segment by incrementing the access counter by one, and when the access counter exceeds a preset threshold, allocate the corresponding cache segment to the segment of the write command if the cache segment has not been allocated so as to validate the cache segment, and write data of the segment to the cache segment of the cache data area and update server backup information of the cache directory information; when there is no cache segment corresponding to a segment of the write command, create a cache segment in the cache directory information corresponding to the segment of the write command, initialize an access counter for the cache segment to one, and update server backup information of the cache directory information; and
send the write command to the storage for execution.
9. The system according to claim 8, wherein the storage processor is configured, upon receiving the write command, to:
invalidate the backup information corresponding to the segments of the write command; and
write data of the write command to the storage space.
10. The system according to claim 8, wherein the server processor is configured to:
send the updated server backup information to the storage to update the backup information stored in the storage.
11. A method for caching data between a server and a storage, the server including a nonvolatile persistent memory having a cache data area that has a plurality of cache segments, and a server processor managing cache directory information which indicates whether each of the plurality of cache segments is valid or not, wherein a cache segment is valid when the cache segment is allocated for caching data; the storage including a storage memory storing backup information of the cache directory information, a plurality of storage devices, and a storage processor configuring one or more logical units by storage space of the plurality of storage devices; the method comprising:
when the server recovers from being down, obtaining the backup information of the cache directory information from the storage, and, when the backup information indicates that a particular cache segment is invalid, invalidating the particular cache segment in the cache directory information.
12. The method according to claim 11, further comprising:
after invalidating the particular cache segment in the cache directory information, deleting server cache data from the particular cache segment and deleting a corresponding server cache address of the particular cache segment from the cache directory information.
13. The method according to claim 11, wherein a plurality of servers are provided, wherein the storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers, the method further comprising:
when a first server is shut down, migrating a virtual machine from the first server to a second server; and
when the first server recovers after being shut down, re-migrating the virtual machine from the second server to the first server, and obtaining, by the first server, the shared backup information of the cache directory information from the storage.
14. The method according to claim 11, wherein a plurality of servers are provided, wherein the plurality of servers execute a shared server cache program to manage the cache directory information and the backup information, wherein the storage memory stores shared backup information of the cache directory information from the plurality of servers, and the shared backup information is shared by the plurality of servers, the method further comprising:
when a first server is shut down, migrating a virtual machine from the first server to a second server; and when the first server recovers after being shut down, obtaining, by the first server, the shared backup information of the cache directory information from the storage via the shared server cache program.
15. The method according to claim 11, further comprising, in executing a read command:
searching for valid cache segments in the cache data area which correspond to segments of the read command;
when there is a valid cache segment corresponding to a segment of the read command, reading data from the valid cache segment of the cache data area;
when there is no valid cache segment corresponding to a segment of the read command, reading data for the segment from the storage; and
concatenating data read from the cache data area and data read from the storage to form an entire data of the read command.
16. The method according to claim 15, further comprising, in executing the read command:
when reading data for the segment from the storage having no corresponding valid cache segment, updating an access counter representing a number of access to the storage for the segment by incrementing the access counter by one; and
when the access counter exceeds a preset threshold, allocating a new cache segment from the cache data area for storing the data to be read for the segment from the storage, and updating server backup information of the cache directory information.
17. The method according to claim 16, further comprising:
sending the updated server backup information to the storage to update the backup information stored in the storage.
18. The method according to claim 11, further comprising, in executing a write command:
searching the cache directory information for cache segments in the cache data area which correspond to segments of the write command;
when there is a cache segment corresponding to a segment of the write command, updating an access counter representing a number of access to the storage for the segment by incrementing the access counter by one, and when the access counter exceeds a preset threshold, allocating the corresponding cache segment to the segment of the write command if the cache segment has not been allocated so as to validate the cache segment, and writing data of the segment to the cache segment of the cache data area and updating server backup information of the cache directory information; when there is no cache segment corresponding to a segment of the write command, creating a cache segment in the cache directory information corresponding to the segment of the write command, initializing an access counter for the cache segment to one, and updating server backup information of the cache directory information; and
sending the write command to the storage for execution.
19. The method according to claim 18, further comprising, upon receiving the write command by the storage:
invalidating the backup information corresponding to the segments of the write command; and
writing data of the write command to the storage space.
20. The method according to claim 18, further comprising:
sending the updated server backup information to the storage to update the backup information stored in the storage.
PCT/US2014/033152 2014-04-07 2014-04-07 Method and apparatus of cache promotion between server and storage system WO2015156758A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/033152 WO2015156758A1 (en) 2014-04-07 2014-04-07 Method and apparatus of cache promotion between server and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2014/033152 WO2015156758A1 (en) 2014-04-07 2014-04-07 Method and apparatus of cache promotion between server and storage system

Publications (1)

Publication Number Publication Date
WO2015156758A1 true WO2015156758A1 (en) 2015-10-15

Family

ID=54288193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/033152 WO2015156758A1 (en) 2014-04-07 2014-04-07 Method and apparatus of cache promotion between server and storage system

Country Status (1)

Country Link
WO (1) WO2015156758A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372277A (en) * 2018-12-26 2020-07-03 中兴通讯股份有限公司 Data distribution method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172082A1 (en) * 2004-01-30 2005-08-04 Wei Liu Data-aware cache state machine
US20100257269A1 (en) * 2009-04-01 2010-10-07 Vmware, Inc. Method and System for Migrating Processes Between Virtual Machines
US20110153953A1 (en) * 2009-12-23 2011-06-23 Prakash Khemani Systems and methods for managing large cache services in a multi-core system
US20140012936A1 (en) * 2012-07-05 2014-01-09 Hitachi, Ltd. Computer system, cache control method and computer program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172082A1 (en) * 2004-01-30 2005-08-04 Wei Liu Data-aware cache state machine
US20100257269A1 (en) * 2009-04-01 2010-10-07 Vmware, Inc. Method and System for Migrating Processes Between Virtual Machines
US20110153953A1 (en) * 2009-12-23 2011-06-23 Prakash Khemani Systems and methods for managing large cache services in a multi-core system
US20140012936A1 (en) * 2012-07-05 2014-01-09 Hitachi, Ltd. Computer system, cache control method and computer program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111372277A (en) * 2018-12-26 2020-07-03 中兴通讯股份有限公司 Data distribution method, device and storage medium
CN111372277B (en) * 2018-12-26 2023-07-14 南京中兴新软件有限责任公司 Data distribution method, device and storage medium

Similar Documents

Publication Publication Date Title
US8924664B2 (en) Logical object deletion
US9910777B2 (en) Enhanced integrity through atomic writes in cache
US20140258628A1 (en) System, method and computer-readable medium for managing a cache store to achieve improved cache ramp-up across system reboots
US10310980B2 (en) Prefetch command optimization for tiered storage systems
US20170139825A1 (en) Method of improving garbage collection efficiency of flash-oriented file systems using a journaling approach
US9710283B2 (en) System and method for pre-storing small data files into a page-cache and performing reading and writing to the page cache during booting
US9146928B1 (en) Techniques for storing metadata of a filesystem in persistent memory
CN108885589B (en) Approach to Flash-Friendly Cache for CDM Workloads
JP6337902B2 (en) Storage system, node device, cache control method and program
JP5944502B2 (en) Computer system and control method
US9183127B2 (en) Sequential block allocation in a memory
US10860481B2 (en) Data recovery method, data recovery system, and computer program product
JP5801933B2 (en) Solid state drive that caches boot data
US9880761B2 (en) Restorable memory allocator
JP6746747B2 (en) Storage system
US20140372710A1 (en) System and method for recovering from an unexpected shutdown in a write-back caching environment
US9658799B2 (en) Data storage device deferred secure delete
CN109074308A (en) The block conversion table (BTT) of adaptability
US20220067549A1 (en) Method and Apparatus for Increasing the Accuracy of Predicting Future IO Operations on a Storage System
WO2015065312A1 (en) Method and apparatus of data de-duplication for solid state memory
JP5175953B2 (en) Information processing apparatus and cache control method
US20140059291A1 (en) Method for protecting storage device data integrity in an external operating environment
KR102403063B1 (en) Mobile device and management method of mobile device
WO2015156758A1 (en) Method and apparatus of cache promotion between server and storage system
WO2017023339A1 (en) Snapshot storage management

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14888699

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14888699

Country of ref document: EP

Kind code of ref document: A1