
US20180113810A1 - Method and system for efficient hashing optimized for hardware accelerated caching - Google Patents


Info

Publication number
US20180113810A1
US20180113810A1 (Application No. US15/335,025)
Authority
US
United States
Prior art keywords
hash
slot
slots
controller
bucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/335,025
Inventor
Horia Simionescu
Timothy Hoglund
Sridhar Rao Veerla
Panthini Pandit
Gowrisankar RADHAKRISHNAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avago Technologies International Sales Pte Ltd
Original Assignee
Avago Technologies General IP Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avago Technologies General IP Singapore Pte Ltd
Priority to US15/335,025
Assigned to AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HOGLUND, TIMOTHY; RADHAKRISHNAN, GOWRISANKAR; SIMIONESCU, HORIA; PANDIT, PANTHINI; VEERLA, SRIDHAR RAO
Publication of US20180113810A1
Assigned to AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED. MERGER (SEE DOCUMENT FOR DETAILS). Assignors: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.
Assigned to AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED. CORRECTIVE ASSIGNMENT TO CORRECT THE EXECUTION DATE OF THE MERGER AND APPLICATION NOS. 13/237,550 AND 16/103,107 FROM THE MERGER PREVIOUSLY RECORDED ON REEL 047231 FRAME 0369. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER. Assignors: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.


Classifications

    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using pseudo-associative means, e.g. set-associative or hashing
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F12/0871 Allocation or management of cache space
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
    • G06F12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G06F3/065 Replication mechanisms
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • G06F2212/1016 Performance improvement
    • G06F2212/302 In image processor or graphics adapter
    • G06F2212/452 Caching of specific data in cache memory: instruction code
    • G06F2212/604 Details relating to cache allocation
    • G06F2212/621 Coherency control relating to peripheral accessing, e.g. from DMA or I/O device

Definitions

  • the controller 108 and controller cache 140 may be enabled to implement an N-way set associative mapping in a more efficient manner than previous caching systems.
  • the controller 108 and controller cache 140 may be configured to use hashing techniques that are N-way associative. This will effectively control the number of hash collisions, which can help increase processing speed even though different inputs may result in the same hash output.
  • the controller 108 is shown to include the host interface(s) 128 and storage interface(s) 132 .
  • the controller 108 is also shown to include a processor 204 , memory 208 , one or more drivers 212 , and a power source 216 .
  • the processor 204 may include an Integrated Circuit (IC) chip or multiple IC chips, a CPU, a microprocessor, or the like.
  • the processor 204 may be configured to execute instructions in memory 208 that are shown to include a cache management module 224 , hashing instructions 228 , hash search instructions 232 , and a slot collision list 236 .
  • the processor 204 may utilize a set associative hash 220 containing a plurality of hash buckets 240 a -M.
  • the set associative hash 220 is shown to be maintained internally to the controller 108 .
  • set associative hash 220 may be stored and/or maintained external to the controller 108 .
  • the set associative hash 220 may be stored or contained within memory 208 of the controller 108 .
  • the memory 208 may be volatile and/or non-volatile in nature. As indicated above, the memory 208 may include any hardware component or collection of hardware components that are capable of storing instructions and communicating those instructions to the processor 204 for execution. Non-limiting examples of memory 208 include RAM, ROM, flash memory, EEPROM, variants thereof, combinations thereof, and the like.
  • the instructions stored in memory 208 are shown to be different instruction sets, but it should be appreciated that the instructions can be combined into a smaller number of instruction sets without departing from the scope of the present disclosure.
  • the cache management module 224 when executed, enable the processor 204 to receive I/O commands from the host system 104 , determine one or more caching operations that need to be performed in connection with executing the I/O command, and identify which memory module(s) 148 should be used in connection with performing the one or more caching operations.
  • the cache management module 224 may also be configured to invoke the other instruction sets in memory 208 to further perform such caching operations.
  • the cache management module 224 may also be configured to manage the set associative hash 220 and the values assigned to various data fields within the set associative hash 220 .
  • the cache management module 224 may be enabled to update the set associative hash 220 to reflect such utilizations.
  • the cache management module 224 may also be responsible for facilitating communications between the controller 108 and the controller cache 140 .
  • the hashing instructions 228 , when executed, enable the processor 204 to calculate hash output values based on hash inputs.
  • hash inputs used by the hashing instructions 228 to calculate hash output values may include numbers assigned to particular hash buckets 240 a -M, numbers assigned to hash slots within the hash buckets 240 a -M, memory addresses of memory modules 148 , LBAs, storage time, data retrieval time, strip/row numbers, a hash index, a maximum number of bits needed to complete the I/O command, and other numbers derived therefrom.
  • the hashing instructions 228 may implement a hashing function that corresponds to a linear hashing function.
  • Use of a linear hash function can help ensure that hash slots for successive strips/rows of data are stored consecutively (thereby increasing the efficiency with which such data can be later retrieved by the controller 108 ).
  • Use of a linear hash function is advantageous especially in hardware accelerated caching solutions where the hardware thread does not have direct access to the memory module 148 (e.g., SRAM or DRAM) where the hash slots are stored.
  • These hash slots can be loaded when required and stored back when the objective is achieved (e.g., the I/O command is completed). Since an I/O command almost always results in consecutive strips/rows, many slots can be loaded and stored together, thus ensuring that there are not too many unnecessary load and store operations.
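  • As a rough sketch of that batching (the function names, bucket geometry, and DMA stand-in are all assumptions for illustration), a linear hash lets the buckets for an I/O spanning several consecutive rows be fetched with a single contiguous copy:

```c
#include <stdint.h>
#include <string.h>

#define BUCKET_BYTES 32u   /* 4 slots x 8 bytes; an assumed geometry */

/* Stand-in for a DMA transfer from the DRAM hash table into local
 * memory: with a linear hash, buckets for consecutive strips/rows are
 * adjacent, so one load covers every slot the I/O command will touch. */
void load_bucket_range(const uint8_t *dram_hash, uint8_t *local,
                       uint32_t first_bucket, uint32_t num_buckets)
{
    memcpy(local,
           dram_hash + (size_t)first_bucket * BUCKET_BYTES,
           (size_t)num_buckets * BUCKET_BYTES);
}

int main(void)
{
    static uint8_t dram[1024 * BUCKET_BYTES];  /* toy DRAM hash table */
    uint8_t local[4 * BUCKET_BYTES];
    load_bucket_range(dram, local, 100, 4);    /* rows 100..103, one load */
    return 0;
}
```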
  • the transpose operation of the hash function executed by the hashing instructions 228 helps ensure that a region in each virtual disk does not map to the same slot in a hash bucket 240 a -M; otherwise, the slot will have a tendency to be added to the slot collision list 236 .
  • the hash search instructions 232 , when executed, enable the processor 204 to search the set associative hash 220 for hash collisions. As will be described in further detail herein, as the controller 108 executes I/O commands, the controller 108 may utilize the hash search instructions 232 to see if a particular piece or set of data is already stored in the memory module 148 .
  • the hash search instructions 232 may be enabled to utilize the hashing instructions 228 to generate a hash for the data being searched and may compare that hash to hashes stored in the set associative hash 220 . If collisions (e.g., matches) are found, the hash search instructions 232 and/or cache management module 224 may be enabled to update the slot collision list 236 as will be described in further detail herein.
  • the set associative hash 220 is shown to include a plurality of hash buckets 240 a -M linearly arranged.
  • the set associative hash 220 may include up to M hash buckets, where M is an integer number greater than or equal to one.
  • Each hash slot bucket 240 is shown to include a plurality of hash slots 304 a - d . Although the hash slot buckets 240 are shown to include four hash slots 304 a - d , it should be appreciated that a hash slot bucket 240 may be configured to include a greater or lesser number of hash slots. For instance, the hash slot bucket 240 may have two, three, four, five, . . . , ten, twenty, or more hash slots without departing from the scope of the present disclosure.
  • Each hash slot 304 is shown to include a number of fields for storing information in connection with a hash generated for the slot 304 .
  • the fields include, without limitation, a flag field 308 , a Cache Segment ID (CSID) field 312 , an LSB field 316 , and an MSB field 320 .
  • the flag field 308 may contain a number of bits that indicate various details or pieces of information related to the hash slot 304 .
  • the flags 308 may be used to indicate whether data stored in the hash slot 304 continues in another hash slot (e.g., by an appropriate indication in the bit link field 324 ).
  • the flags 308 may indicate the presence of a bad block in one of the LBA ranges that map into the corresponding slot 304 .
  • the CSID field 312 provides an identification of a cache segment.
  • the identification may be provided as an alphanumerical string.
  • adjacent slots may be assigned consecutively-numbered CSIDs.
  • CSIDs may be assigned randomly or pseudo-randomly.
  • the CSID provides a substantially unique (e.g., unique to the data structure 220 ) identifier for a slot 304 . This means that any particular slot 304 will have a different CSID from all other slots 304 in the set associative hash 220 .
  • the combination of the LSB field 316 and MSB field 320 may provide an address range in memory where a hash slot is stored.
  • the identifiers stored in the fields 316 , 320 may identify an LBA of a memory module 148 or an LBA range of the memory module 148 .
  • the fields 316 , 320 may be referred to as a tag field.
  • a 1 byte value in the tag field may indicate that the cache segment frame is used as a hash slot extension, which is shown in FIG. 3B .
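  • Put concretely, the slot layout described above might be rendered in C as below; the field names and the four-slot bucket come from the text, while the field widths, types, and packing are illustrative assumptions:

```c
#include <stdint.h>

/* One hash slot 304, using the field names from the text; the widths,
 * types, and packing are illustrative assumptions. The LSB/MSB pair
 * together may be referred to as the tag, and a reserved 1-byte value
 * in the tag marks a cache segment frame used as a hash slot extension
 * (FIG. 3B, described below). */
typedef struct {
    uint16_t flags;  /* e.g., bit link continuation, bad-block indication */
    uint16_t csid;   /* Cache Segment ID, unique within the hash 220 */
    uint16_t lsb;    /* tag, low half of the strip/row identifier */
    uint16_t msb;    /* tag, high half of the strip/row identifier */
} hash_slot_t;

/* A hash slot bucket 240 holds four slots 304 a-d; an extension frame
 * (described below) holds up to sixteen. */
typedef struct { hash_slot_t slot[4];  } hash_bucket_t;
typedef struct { hash_slot_t slot[16]; } hash_extension_t;
```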
  • each hash slot 304 is provisioned with four potential collision lists.
  • This set associative hashing ensures that the length of the collision list is reduced. For instance, if there are 4 elements that need to be added to the same hash slot bucket 240 , then all 4 are added into the slots 304 a - d and there is no addition to the collision list 236 .
  • if a fifth element must be added, an extension (as shown in FIG. 3B ) that can hold up to 16 slots is created and the last index (e.g., the fourth index) in the hash slot 304 d is updated with the extension ID; the information previously held in that last index is copied into the first index slot in the extension, thus establishing the chain. Now up to 16 more elements can be added to this extension frame without adding a further collision list 236 .
  • linear traversals do not involve loading new memory, which makes the traversal faster, unlike a non-linear traversal (e.g., a linked list) that involves loading the next link and its corresponding memory before proceeding to the next element.
  • Hardware implementations can optimize comparisons in a linear list.
  • the hash bucket 240 preserves the expandability of the chaining, but this feature kicks in only when unusual workloads make the chaining necessary (e.g., there are more than 4 collisions).
  • the hash slot 304 also indicates if the bad block bit is set (e.g., via flag 308 ). If the flag bit is set, it indicates that some or all of the cache segments present in this hash slot 304 have bad blocks, and hence the caching algorithms can take appropriate actions based on the specific use case (for example, diverting the command to a firmware module to take specific actions).
  • each consecutive strip/row can be found in consecutive hash slot buckets. This effectively allows a group of memory accesses to be performed instead of multiple individual memory accesses.
  • a hash slot bucket corresponds to a group of positions in a hash table and a hash slot corresponds to a particular position in the hash table.
  • a particular slot may be empty or contain an entry.
  • the hash search instructions 232 will search the hashes assigned to each slot bucket 240 for matches. These hashes need to be updated as additional slots in the hash slot bucket 240 are used and data is stored in corresponding addresses of cache memory 148 .
  • the method begins when an I/O command is received at the controller 108 from the host system 104 and then the controller 108 invokes the cache management module 224 to search for an empty slot in the set associative hash 220 .
  • the cache management module 224 will set a last slot index equal to 3 (e.g., if there are 4 slots per hash bucket) (step 404 ).
  • the cache management module 224 will set an empty slot index (step 408 ). In this step, the cache management module 224 will find an empty slot by searching hash slot buckets 240 starting at hash slot bucket 0 240 a until the last slot index is reached.
  • the cache manager 224 will check to see if the CSID field of the hash slot bucket is INVALID, which may indicate that the slot bucket is empty.
  • the first hash index is obtained from the I/O command although it can also be received from other sources. This first hash index gives the hash slot bucket which contains four slots. The next check is to see if any of the four slots 304 within the hash slot bucket 240 are empty (step 412 )(e.g., from 0 to 3). If empty, the slot will be used to store the data in connection with the received I/O command.
  • the cache management module 224 will also add the CSID that is assigned to the strip/row associated with the I/O command to the hash slot assigned in the hash slot bucket 240 (step 440 ).
  • the cache management module 224 will also update the appropriate fields of the slot 304 and bucket 240 to indicate that the assigned slot is not part of a slot chain. Thereafter, the method ends at step 444 .
  • the cache management module 224 will check to see if the last slot index in the hash bucket has an extension (step 416 ). If an extension is found (step 420 ), then the cache management module 224 will get the CSID from the last slot and load the CSID into local memory. The cache management module 224 will then start the linear search within the frame/slot extension (step 424 ). The method then reverts back to step 412 to see if any empty slots are found within the extension.
  • the cache management module 224 allocates a new cache frame/extension (step 428 ).
  • the allocated cache frame can be of any size and can be indexable by an identifier.
  • the cache frame may be a 128 byte cache frame.
  • the method further continues with the cache management module 224 copying the last slot into the first slot in the newly-allocated frame (step 432 ). Then the cache management module 224 sets the tag field 316 , 320 to indicate that the newly-allocated frame corresponds to an extension (step 436 ). The cache management module 224 may further set the empty slot index to an appropriate value to indicate that the newly-allocated frame does not have any slots currently allocated therein. The method then proceeds to step 440 .
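  • Pulling the steps of FIG. 4 together, a simplified model is sketched below. The slot layout, the INVALID sentinel, the extension tag value, and the toy frame allocator are assumptions for illustration, not the controller's implementation:

```c
#include <stdint.h>
#include <stdio.h>

#define SLOTS_PER_BUCKET 4
#define SLOTS_PER_EXT    16
#define INVALID_CSID     0xFFFFu  /* assumed sentinel for an empty slot */
#define EXT_TAG          0xEEu    /* assumed tag value marking an extension */

typedef struct { uint8_t tag; uint16_t csid, lsb, msb; } slot_t;

/* Toy allocator standing in for 128-byte cache frame management. */
static slot_t   ext_pool[8][SLOTS_PER_EXT];
static uint16_t next_ext;

static slot_t *alloc_extension(uint16_t *id)
{
    *id = next_ext;
    for (int i = 0; i < SLOTS_PER_EXT; i++)
        ext_pool[next_ext][i].csid = INVALID_CSID;
    return ext_pool[next_ext++];
}

/* Add a CSID to a bucket (FIG. 4): use the first empty slot; if all
 * four are occupied, allocate an extension, copy the last slot into the
 * extension's first slot to establish the chain, and store the new
 * element in the extension. (For brevity, adding into an already
 * chained bucket is not handled here.) */
static void bucket_add(slot_t bucket[SLOTS_PER_BUCKET], uint16_t csid)
{
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        if (bucket[i].csid == INVALID_CSID) { bucket[i].csid = csid; return; }

    uint16_t id;
    slot_t *ext = alloc_extension(&id);
    ext[0] = bucket[SLOTS_PER_BUCKET - 1];   /* copy the last slot over */
    ext[1].csid = csid;                      /* first free extension slot */
    bucket[SLOTS_PER_BUCKET - 1].tag  = EXT_TAG;
    bucket[SLOTS_PER_BUCKET - 1].csid = id;  /* last slot now names the extension */
}

int main(void)
{
    slot_t bucket[SLOTS_PER_BUCKET];
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        bucket[i] = (slot_t){ .tag = 0, .csid = INVALID_CSID };
    for (uint16_t c = 1; c <= 5; c++)        /* fifth add triggers the chain */
        bucket_add(bucket, c);
    printf("slot 3 tag=0x%02X (extension id %u)\n", bucket[3].tag, bucket[3].csid);
    return 0;
}
```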
  • the hash searching method is performed to see if there is a hash hit or miss for a particular I/O command.
  • the method begins by setting the slot index (step 504 ).
  • the slot index is set using a hash function.
  • the hash function used to set the slot index is not a “strong” hash function. Rather, a relatively weak hash function is used for simple manipulation of data (e.g., shift and add functions). This means that faster responses can be achieved, but at the expense of higher collision rates.
  • because the hash slots are grouped into buckets and adjacent LBAs are assigned to consecutive buckets, the benefits of fast I/O processing can be realized without having to worry about the increased risk of collisions.
  • the I/O command can be executed relatively quickly because the data for any I/O command is stored in consecutive memory locations. Moreover, all of the M buckets can be accessed in a single read access rather than having to use multiple accesses. This will be true as long as there are fewer than four collisions for each hash bucket (if each bucket has four slots).
  • the get hash index algorithm computes a hash index (e.g., “HashIndex”) according to the following:
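  • One possible form of such a computation is sketched below. It follows the weak, division-free, shift-and-add style described above, keeping the mapping linear in the strip/row number and adding a per-virtual-disk transpose; the constants and structure are illustrative assumptions rather than the exact formula:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_BUCKETS (1u << 20)   /* M buckets, a power of two (assumed) */

/* Weak, division-free hash: linear in the strip/row number so that
 * successive rows land in successive buckets, offset by a per-virtual-
 * disk transpose so that LBA 0 of every virtual disk hashes to a
 * different bucket. All constants here are illustrative. */
uint32_t get_hash_index(uint32_t strip_row, uint8_t vd_id)
{
    uint32_t transpose = (uint32_t)vd_id << 4;           /* assumed spread */
    return (strip_row + transpose) & (NUM_BUCKETS - 1);  /* mask, no divide */
}

int main(void)
{
    /* consecutive rows of one virtual disk map to consecutive buckets */
    printf("%u %u\n", get_hash_index(100, 2), get_hash_index(101, 2));
    return 0;
}
```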
  • the method proceeds with the cache management module 224 and/or hash search instructions 232 determining the number of slots that need to be loaded to accommodate the I/O command (step 508 ). In some embodiments, this number may be approximated by the cache management module 224 and/or hash search instructions 232 (as will be described in connection with FIG. 6 ).
  • the slots then begin to be loaded (step 512 ). In some embodiments, the slots are loaded in alignment with a burst boundary. The burst boundary can depend upon the slot size, the burst size, and the number of slots included in a slot bucket.
  • the slot loading begins with the first slot (step 516 ).
  • the cache management module 224 and/or hash search instructions 232 will determine whether the number of slots needed (e.g., as estimated or determined in step 508 ) is less than the total number of slots in a bucket (step 520 ). If this query is answered negatively, then the method proceeds by checking if the last slot has an extension (step 536 ). If the last slot has an extension, then the cache management module 224 and/or hash search instructions 232 get the ID of the frame from the last slot index and load that particular frame into local memory for additional searching (step 544 ). The cache management module 224 and/or hash search instructions 232 then update the last slot number and the last slot index (step 548 ). After the slot number and last slot index have been updated, the method proceeds to step 528 .
  • at step 540 , if there is no extension present, then the method proceeds to step 568 ( FIG. 5B ) as will be discussed in further detail herein.
  • the method proceeds by comparing the strip/row from the slot to the strip/row for the requested I/O command (step 524 ). Thereafter, the method proceeds with the cache management module 224 and/or hash search instructions 232 determining if a hash match is found (step 528 ). If not, the method increments to the next slot (step 532 ) and returns to step 520 .
  • the method proceeds until a CSID match is found (step 552 ).
  • Finding a matching CSID may be referred to as finding a hash hit.
  • the required operation is performed in connection with executing the I/O command (step 556 ).
  • the local hash value may be stored in memory of the controller 108 or in the controller cache 140 (step 560 ). Thereafter, the method ends (step 564 ).
  • referring back to step 540 ( FIG. 5A ), if there is no extension present, then the method proceeds with the cache management module 224 and/or hash search instructions 232 performing the required operation associated with a hash miss (step 568 ).
  • the local hash can then be stored in memory if needed (step 560 ) and then the method ends (step 564 ).
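  • In outline, the hit/miss path of FIGS. 5A and 5B reduces to a bounded linear scan that follows at most one extension frame; a compact model, reusing the same illustrative slot layout and tag conventions as above, is sketched below:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define SLOTS_PER_BUCKET 4
#define SLOTS_PER_EXT    16
#define INVALID_CSID     0xFFFFu  /* assumed sentinel for an empty slot */
#define EXT_TAG          0xEEu    /* assumed tag value marking an extension */

typedef struct { uint8_t tag; uint16_t csid, lsb, msb; } slot_t;

/* Search one bucket for a strip/row (FIGS. 5A-5B): compare the tag of
 * each occupied slot; if the last slot chains to an extension frame,
 * continue the linear scan there. Returns the matching slot on a hash
 * hit, or NULL on a hash miss. */
slot_t *bucket_lookup(slot_t bucket[SLOTS_PER_BUCKET],
                      slot_t ext[SLOTS_PER_EXT],   /* loaded on demand */
                      uint16_t lsb, uint16_t msb)
{
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (bucket[i].tag == EXT_TAG)
            break;               /* tail of this bucket lives in the extension */
        if (bucket[i].csid != INVALID_CSID &&
            bucket[i].lsb == lsb && bucket[i].msb == msb)
            return &bucket[i];   /* hash hit */
    }
    if (bucket[SLOTS_PER_BUCKET - 1].tag == EXT_TAG && ext != NULL)
        for (int i = 0; i < SLOTS_PER_EXT; i++)
            if (ext[i].csid != INVALID_CSID &&
                ext[i].lsb == lsb && ext[i].msb == msb)
                return &ext[i];
    return NULL;                 /* hash miss */
}

int main(void)
{
    slot_t bucket[SLOTS_PER_BUCKET] = {
        { .tag = 0, .csid = 7, .lsb = 1, .msb = 0 },
        { .csid = INVALID_CSID }, { .csid = INVALID_CSID },
        { .csid = INVALID_CSID },
    };
    printf("%s\n", bucket_lookup(bucket, NULL, 1, 0) ? "hash hit" : "hash miss");
    return 0;
}
```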
  • Embodiments of the present disclosure contemplate such an approximation method.
  • the method begins by determining the number of strips of the first row (NSFR) (step 604 ).
  • This value of NSFR may be set equal to the row data size minus the Log Arm.
  • the Log Arm is the logical arm in a RAID volume. For a RAID-5 volume of 3 drives, there will be two data logical arms (Log Arm 0 and Log Arm 1) and one parity arm. Data is striped across these logical arms.
  • An I/O command contains a Row Number, a Logical Arm Number, and an Offset in Arm to represent the start LBA for a RAID-5 volume.
  • RowNumber = (startLba/strip size)/number of data arms.
  • Log Arm = (startLba/strip size) % number of data arms.
  • OffsetInArm = startLba % strip size.
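  • As a minimal sketch (names and integer types assumed), the decomposition above can be written as follows; note that when the strip size is a power of two, the divisions and modulos by it reduce to shifts and masks:

```c
#include <stdint.h>
#include <stdio.h>

/* Decompose a starting LBA for a RAID-5 volume into row number, logical
 * arm, and offset within the arm (names and types assumed). */
typedef struct { uint64_t row; uint32_t log_arm; uint64_t offset; } raid_addr_t;

raid_addr_t decompose(uint64_t start_lba, uint64_t strip_size, uint32_t data_arms)
{
    raid_addr_t a;
    a.row     = (start_lba / strip_size) / data_arms;  /* RowNumber   */
    a.log_arm = (start_lba / strip_size) % data_arms;  /* Log Arm     */
    a.offset  =  start_lba % strip_size;               /* OffsetInArm */
    return a;
}

int main(void)
{
    /* 3-drive RAID-5: two data arms (Log Arm 0 and 1) plus parity. */
    raid_addr_t a = decompose(300, 128, 2);
    printf("row=%llu arm=%u offset=%llu\n",
           (unsigned long long)a.row, a.log_arm, (unsigned long long)a.offset);
    return 0;
}
```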
  • the method proceeds by determining whether the NSFR is greater than the number of strips spanned by the I/O command (step 608 ). If the query is answered positively, then the NSFR is adjusted to equal the number of strips spanned by the I/O operation (step 612 ).
  • if the query of step 608 is answered negatively, the method continues by determining the row data size, the number of rows, the number of full rows, and the number of strips spanned in the last row (step 616 ). The method further continues by determining whether the number of strips spanned in the last row is equal to zero (step 620 ). If this query is answered affirmatively, then the method proceeds to setting the number of rows equal to the determined number of rows plus one (step 624 ). However, if the query of step 620 is answered negatively, then the method proceeds by approximating the number of rows spanned by the I/O operation (step 628 ). In some embodiments, the nearest power of two value (NP2) of the row data size of the RAID volume is used to approximate the number of rows spanned by the I/O operation.
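  • A division-free sketch of that approximation is given below; the rounding direction, the helper names, and the +1 adjustment for a partial last row are assumptions, since the text only names the NP2 idea:

```c
#include <stdint.h>
#include <stdio.h>

/* Round down to the nearest power of two (assumed rounding direction). */
static uint64_t np2_floor(uint64_t x)
{
    uint64_t p = 1;
    while ((p << 1) <= x) p <<= 1;
    return p;
}

/* Approximate the number of rows an I/O spans without a divider:
 * replace division by the row data size with a shift by log2(NP2).
 * Rounding the divisor down can only overestimate the row count,
 * which is safe when pre-loading hash slots. */
static uint64_t approx_rows(uint64_t io_blocks, uint64_t row_data_size)
{
    uint64_t np2 = np2_floor(row_data_size);
    unsigned shift = 0;
    while ((1ull << shift) < np2) shift++;
    return (io_blocks >> shift) + 1;   /* +1 covers a partial last row */
}

int main(void)
{
    printf("~%llu rows\n", (unsigned long long)approx_rows(3000, 384));
    return 0;
}
```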

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system and method for data caching are provided. The disclosed method includes organizing a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs, receiving an Input/Output (I/O) request from a host system, determining that cache memory is needed to fulfill the I/O request, and performing a cache lookup in connection with fulfilling the I/O request, where the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This Non-Provisional patent Application claims the benefit of U.S. Provisional Patent Application No. 62/410,752, filed Oct. 20, 2016, the entire disclosure of which is hereby incorporated herein by reference.
  • FIELD OF THE DISCLOSURE
  • The present disclosure is generally directed toward computer memory.
  • BACKGROUND
  • Hashing is one of the basic frameworks utilized in a caching environment. The advantage of hashing for cache lookup is that in the best case the lookup will be O(1) when there are no collisions. But when there are collisions, the average or worst case lookup can be very time consuming because of the memory accesses involved.
  • There is a need for an effective hashing function which can make sure that the hash slots are used optimally. A region in each virtual disk should not map to the same slot; otherwise, it has a tendency to get into the collision list. For example: Logical Block Address (LBA) 0, which contains the master boot record, is frequently accessed. This means that if LBA 0 of every virtual disk maps to the same hash slot, it is more likely that most of these LBAs are frequently accessed and hence lead to the collision list.
  • When caching is implemented in hardware, the load and store operations on memory locations determine the performance that can be achieved by the solution. An effective hash structure organization would be useful so that the load and store operations are minimized. Typically, in a hardware accelerated solution, the division operation is very expensive, and it is not preferred to add division logic to the hardware if such logic can be avoided.
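  • For instance (a generic illustration rather than a technique spelled out by this disclosure), when a divisor is a power of two, division and modulo reduce to a shift and a mask, which is why power-of-two table and strip geometries are attractive in hardware:

```c
#include <assert.h>

int main(void)
{
    unsigned lba = 12345u;
    /* Divisor 1024 = 2^10: divide by shifting, take remainder by masking. */
    assert(lba / 1024u == lba >> 10);
    assert(lba % 1024u == (lba & 1023u));
    return 0;
}
```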
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present disclosure is described in conjunction with the appended figures, which are not necessarily drawn to scale:
  • FIG. 1 is a block diagram depicting a computing system in accordance with at least some embodiments of the present disclosure;
  • FIG. 2 is a block diagram depicting details of an illustrative controller in accordance with at least some embodiments of the present disclosure;
  • FIG. 3A is a block diagram depicting additional details of hash slot buckets and the slots contained therein in accordance with at least some embodiments of the present disclosure;
  • FIG. 3B is a block diagram depicting additional details of a hash extension in accordance with at least some embodiments of the present disclosure;
  • FIG. 4 is a flow diagram depicting a method of adding to a local hash value in accordance with at least some embodiments of the present disclosure;
  • FIG. 5A is a first portion of a flow diagram depicting a hash searching method in accordance with at least some embodiments of the present disclosure;
  • FIG. 5B is a second portion of the flow diagram from FIG. 5A; and
  • FIG. 6 is a flow diagram depicting a method of approximating a number of rows needed to accommodate an Input/Output (I/O) operation in accordance with at least some embodiments of the present disclosure.
  • DETAILED DESCRIPTION
  • The ensuing description provides embodiments only, and is not intended to limit the scope, applicability, or configuration of the claims. Rather, the ensuing description will provide those skilled in the art with an enabling description for implementing the described embodiments, it being understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the appended claims.
  • Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and this disclosure.
  • As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term “and/or” includes any and all combinations of one or more of the associated listed items.
  • As will be discussed in further detail herein, embodiments of the present disclosure present a hashing method for efficient cache lookup that minimizes memory accesses. A new hash with a chaining scheme is introduced with a hash table (e.g., a deep DRAM hash table) that is used for the cache lookup. In some embodiments, multi-level hash functions are used—one level to get to an appropriate hash slot bucket and another level to get the appropriate hash slot within the hash slot bucket in case all slots are occupied.
  • A unique hash structure is also provided that can aid in a set associative hashing scheme. The proposed hash structure also enables a method to add elements to a hash, while ensuring that all slots in a hash slot bucket are equally loaded.
  • Another aspect of the present disclosure is to provide a method for approximating the number of slots to load for an R5/R6 Input/Output (I/O) operation.
  • With reference to FIGS. 1-6, various embodiments of the present disclosure will be described. While many of the examples depicted and described herein will relate to a RAID architecture, it should be appreciated that embodiments of the present disclosure are not so limited. Indeed, aspects of the present disclosure can be used in any type of computing system and/or memory environment. In particular, embodiments of the present disclosure can be used in any type of caching scheme (whether employed by a RAID controller or some other type of device used in a communication system). In particular, hard drives, hard drive controllers (e.g., SCSI controllers, SAS controllers, or RAID controllers) may be configured to implement embodiments of the present disclosure. As another example, network cards or the like having cache memory may also be configured to implement embodiments of the present disclosure.
  • With reference now to FIG. 1, additional details of a computing system 100 capable of implementing hashing methods and various cache lookup techniques will be described in accordance with at least some embodiments of the present disclosure. The computing system 100 is shown to include a host system 104, a controller 108 (e.g., a SCSI controller, a SAS controller, a RAID controller, etc.), and a storage array 112 having a plurality of storage devices 136 a-N therein. The system 100 may utilize any type of data storage architecture. The particular architecture depicted and described herein (e.g., a RAID architecture) should not be construed as limiting embodiments of the present disclosure. If implemented as a RAID architecture, however, it should be appreciated that any type of RAID scheme may be employed (e.g., RAID-0, RAID-1, RAID-2, . . . , RAID-5, RAID-6, etc.).
  • In a RAID-0 (also referred to as a RAID level 0) scheme, data blocks are stored in order across one or more of the storage devices 136 a-N without redundancy. This effectively means that none of the data blocks are copies of another data block and there is no parity block to recover from failure of a storage device 136. A RAID-1 (also referred to as a RAID level 1) scheme, on the other hand, uses one or more of the storage devices 136 a-N to store a data block and an equal number of additional mirror devices for storing copies of a stored data block. Higher level RAID schemes can further segment the data into bits, bytes, or blocks for storage across multiple storage devices 136 a-N. One or more of the storage devices 136 a-N may also be used to store error correction or parity information.
  • A single unit of storage can be spread across multiple devices 136 a-N and such a unit of storage may be referred to as a stripe. A stripe, as used herein and as is well known in the data storage arts, may include the related data written to multiple devices 136 a-N as well as the parity information written to a parity storage device 136 a-N. In a RAID-5 (also referred to as a RAID level 5) scheme, the data being stored is segmented into blocks for storage across multiple devices 136 a-N with a single parity block for each stripe distributed in a particular configuration across the multiple devices 136 a-N. This scheme can be compared to a RAID-6 (also referred to as a RAID level 6) scheme in which dual parity blocks are determined for a stripe and are distributed across each of the multiple devices 136 a-N in the array 112.
  • One of the functions of the controller 108 is to make the multiple storage devices 136 a-N in the array 112 appear to a host system 104 as a single high capacity disk drive. Thus, the controller 108 may be configured to automatically distribute data supplied from the host system 104 across the multiple storage devices 136 a-N (potentially with parity information) without ever exposing the manner in which the data is actually distributed to the host system 104.
  • In the depicted embodiment, the host system 104 is shown to include a processor 116, an interface 120, and memory 124. It should be appreciated that the host system 104 may include additional components without departing from the scope of the present disclosure. The host system 104, in some embodiments, corresponds to a user computer, laptop, workstation, server, collection of servers, or the like. Thus, the host system 104 may or may not be designed to receive input directly from a human user.
  • The processor 116 of the host system 104 may include a microprocessor, central processing unit (CPU), collection of microprocessors, or the like. The memory 124 may be designed to store instructions that enable functionality of the host system 104 when executed by the processor 116. The memory 124 may also store data that is eventually written by the host system 104 to the storage array 112. Further still, the memory 124 may be used to store data that is retrieved from the storage array 112. Illustrative memory 124 devices may include, without limitation, volatile or non-volatile computer memory (e.g., flash memory, RAM, DRAM, ROM, EEPROM, etc.).
  • The interface 120 of the host system 104 enables the host system 104 to communicate with the controller 108 via a host interface 128 of the controller 108. In some embodiments, the interface 120 and host interface(s) 128 may be of a same or similar type (e.g., utilize a common protocol, a common communication medium, etc.) such that commands issued by the host system 104 are receivable at the controller 108 and data retrieved by the controller 108 is transmittable back to the host system 104. The interfaces 120, 128 may correspond to parallel or serial computer interfaces that utilize wired or wireless communication channels. The interfaces 120, 128 may include hardware that enables such wired or wireless communications. The communication protocol used between the host system 104 and the controller 108 may correspond to any type of known host/memory control protocol. Non-limiting examples of protocols that may be used between interfaces 120, 128 include SAS, SATA, SCSI, FibreChannel (FC), iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • The controller 108 may provide the ability to represent the entire storage array 112 to the host system 104 as a single high volume data storage device. Any known mechanism can be used to accomplish this task. The controller 108 may help to manage the storage devices 136 a-N (which can be hard disk drives, solid-state drives, or combinations thereof) so as to operate as a logical unit. In some embodiments, the controller 108 may be physically incorporated into the host device 104 as a Peripheral Component Interconnect (PCI) expansion card (e.g., a PCI Express (PCIe) card) or the like. In such situations, the controller 108 may be referred to as a RAID adapter.
  • The storage devices 136 a-N in the storage array 112 may be of similar types or may be of different types without departing from the scope of the present disclosure. The storage devices 136 a-N may be co-located with one another or may be physically located in different geographical locations. The nature of the storage interface 132 may depend upon the types of storage devices 136 a-N used in the storage array 112 and the desired capabilities of the array 112. The storage interface 132 may correspond to a virtual interface or an actual interface. As with the other interfaces described herein, the storage interface 132 may include serial or parallel interface technologies. Examples of the storage interface 132 include, without limitation, SAS, SATA, SCSI, FC, iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • The controller 108 is shown to have communication capabilities with a controller cache 140. While depicted as being separate from the controller 108, it should be appreciated that the controller cache 140 may be integral to the controller 108, meaning that components of the controller 108 and the controller cache 140 may be contained within a single physical housing or computing unit (e.g., server blade). The controller cache 140 is provided to enable the controller 108 to perform caching operations. The controller 108 may employ caching operations during execution of I/O commands received from the host system 104. Depending upon the nature of the I/O command and the amount of information being processed during the command, the controller 108 may require a large number of cache memory modules 148 or a smaller number of cache memory modules 148. The memory modules 148 may correspond to flash memory, RAM, DDR memory, or some other type of computer memory that is quickly accessible and can be rewritten multiple times. The number of separate memory modules 148 in the controller cache 140 is typically larger than one, although a controller cache 140 may be configured to operate with a single memory module 148 if desired.
  • The cache interface 144 may correspond to any interconnect that enables the controller 108 to access the memory modules 148, temporarily store data thereon, and/or retrieve data stored thereon in connection with performing an I/O command or some other executable command. In some embodiments, the controller cache 140 may be integrated with the controller 108 and may be executed on a CPU chip or placed on a separate chip within the controller 108. In such a scenario, the interface 144 may correspond to a separate bus interconnect within the CPU or traces connecting a chip of the controller cache 140 with a chip executing the processor of the controller 108. In other embodiments, the controller cache 140 may be external to the controller 108 in which case the interface 144 may correspond to a serial or parallel data port.
  • Direct mapping is one type of cache configuration in which each block is mapped to exactly one cache location. Conceptually, this is like rows in a table with three columns: the data block or cache line that contains the actual data fetched and stored, a tag that contains all or part of the address of the fetched data, and a flag bit that connotes the presence of a valid bit of data in the row entry.
  • Fully associative mapping is similar to direct mapping in structure, but allows a block to be mapped to any cache location rather than to a pre-specified cache location (as is the case with direct mapping).
  • Set associative mapping can be viewed as a compromise between direct mapping and fully associative mapping in which each block is mapped to a subset of cache locations. It is sometimes called N-way set associative mapping, which provides for a location in main memory to be cached to any of “N” locations in the L1 cache.
  • As will be discussed in further detail herein, the controller 108 and controller cache 140 may be enabled to implement N-way set associative mapping in a more efficient manner than previous caching systems. In particular, the controller 108 and controller cache 140 may be configured to use hashing techniques that are N-way associative. This will effectively control the number of hash collisions, which can help increase processing speed even though different inputs may result in the same hash output. Although this particular type of hash strategy will be described herein, it should be appreciated that any hash type can be used, including relatively simple hash types that potentially result in the same hash outputs for different hash inputs.
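  • By way of illustration only, the following C sketch shows how an N-way set-associative lookup of the kind referenced above might be organized. The structure names, the table sizes, and the modulo-based set selection are assumptions chosen for clarity, not details of the disclosed controller; direct mapping and fully associative mapping fall out as the WAYS == 1 and NUM_SETS == 1 special cases.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS 1024          /* number of sets (hash buckets) */
    #define WAYS     4             /* the "N" in N-way set associative */

    struct cache_line {
        bool     valid;            /* flag bit: entry holds valid data */
        uint32_t tag;              /* upper address bits of the cached block */
        /* ... the cached data block itself would live here ... */
    };

    static struct cache_line cache[NUM_SETS][WAYS];

    /* Returns true on a cache hit: the block maps to exactly one set,
     * but may occupy any of the N ways within that set. */
    bool lookup(uint32_t block_addr)
    {
        uint32_t set = block_addr % NUM_SETS;
        uint32_t tag = block_addr / NUM_SETS;

        for (int way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;
        return false;
    }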
  • With reference now to FIG. 2, additional details of a controller 108 will be described in accordance with at least some embodiments of the present disclosure. The controller 108 is shown to include the host interface(s) 128 and storage interface(s) 132. The controller 108 is also shown to include a processor 204, memory 208, one or more drivers 212, and a power source 216.
  • The processor 204 may include an Integrated Circuit (IC) chip or multiple IC chips, a CPU, a microprocessor, or the like. The processor 204 may be configured to execute instructions in memory 208 that are shown to include a cache management module 224, hashing instructions 228, hash search instructions 232, and a slot collision list 236. Furthermore, in connection with executing the cache management module 224, the processor 204 may utilize a set associative hash 220 containing a plurality of hash buckets 240 a-M. The set associative hash 220 is shown to be maintained internally to the controller 108. It should be appreciated, however, that some or all of the set associative hash 220 may be stored and/or maintained external to the controller 108. Alternatively or additionally, the set associative hash 220 may be stored or contained within memory 208 of the controller 108.
  • The memory 208 may be volatile and/or non-volatile in nature. As indicated above, the memory 208 may include any hardware component or collection of hardware components that are capable of storing instructions and communicating those instructions to the processor 204 for execution. Non-limiting examples of memory 208 include RAM, ROM, flash memory, EEPROM, variants thereof, combinations thereof, and the like.
  • The instructions stored in memory 208 are shown to be different instruction sets, but it should be appreciated that the instructions can be combined into a smaller number of instruction sets without departing from the scope of the present disclosure. The cache management module 224, when executed, enables the processor 204 to receive I/O commands from the host system 104, determine one or more caching operations that need to be performed in connection with executing the I/O command, and identify which memory module(s) 148 should be used in connection with performing the one or more caching operations. The cache management module 224 may also be configured to invoke the other instruction sets in memory 208 to further perform such caching operations. The cache management module 224 may also be configured to manage the set associative hash 220 and the values assigned to various data fields within the set associative hash 220. In particular, as hash buckets 240 a-M are utilized, the cache management module 224 may be enabled to update the set associative hash 220 to reflect such utilizations. The cache management module 224 may also be responsible for facilitating communications between the controller 108 and the controller cache 140.
  • The hashing instructions 228, when executed, enable the processor 204 to calculate hash output values based on hash inputs. In some embodiments, hash inputs used by the hashing instructions 228 to calculate hash output values may include numbers assigned to particular hash buckets 240 a-M, numbers assigned to hash slots within the hash buckets 240 a-M, memory addresses of memory modules 148, LBAs, storage time, data retrieval time, strip/row numbers, a hash index, a maximum number of bits needed to complete the I/O command, and other numbers derived therefrom. In some embodiments, the hashing instructions 228 may implement a hashing function that corresponds to a linear hashing function. Use of a linear hash function can help ensure that hash slots for successive strips/rows of data are stored consecutively (thereby increasing the efficiency with which such data can later be retrieved by the controller 108). Use of a linear hash function is especially advantageous in hardware accelerated caching solutions where the hardware thread does not have direct access to the memory module 148 (e.g., SRAM or DRAM) where the hash slots are stored. These hash slots can be loaded when required and stored back when the objective is achieved (e.g., the I/O command is completed). Since an I/O command almost always results in consecutive strips/rows, many slots can be loaded and stored together, thus ensuring that there are not too many unnecessary load and store operations. In some embodiments, the transpose operation of the hash function executed by the hashing instructions 228 ensures that the same region in different virtual disks does not map to the same slot in a hash bucket 240 a-M; otherwise, such slots would have a tendency to be added to the slot collision list 236.
  • The hash search instructions 232, when executed, enable the processor 204 to search the set associative hash 220 for hash collisions. As will be described in further detail herein, as the controller 108 executes I/O commands, the controller 108 may utilize the hash search instructions 232 to see if a particular piece or set of data is already stored in the memory module 148. The hash search instructions 232 may be enabled to utilize the hashing instructions 228 to generate a hash for the data being searched and may compare that hash to hashes stored in the set associative hash 220. If collisions (e.g., matches) are found, the hash search instructions 232 and/or cache management module 224 may be enabled to update the slot collision list 236 as will be described in further detail herein.
  • With reference now to FIGS. 3A and 3B, details of a set associative hash 220 used in accordance with embodiments of the present disclosure will be described. The set associative hash 220 is shown to include a plurality of hash buckets 240 a-M linearly arranged. The set associative hash 220 may include up to M hash buckets, where M is an integer number greater than or equal to one.
  • Each hash slot bucket 240 is shown to include a plurality of hash slots 304 a-d. Although the hash slot buckets 240 are shown to include four hash slots 304 a-d, it should be appreciated that a hash slot bucket 240 may be configured to include a greater or lesser number of hash slots. For instance, the hash slot bucket 240 may have two, three, four, five, . . . , ten, twenty, or more hash slots without departing from the scope of the present disclosure.
  • Each hash slot 304 is shown to include a number of fields for storing information in connection with a hash generated for the slot 304. The fields include, without limitation, a flag field 308, a Cache Segment ID (CSID) field 312, an LSB field 316, and an MSB field 320. The flag field 308 may contain a number of bits that indicate various details or pieces of information related to the hash slot 304. For instance, the flags 308 may be used to indicate whether data stored in the hash slot 304 continues in another hash slot (e.g., by an appropriate indication in the bit link field 324). As another non-limiting example, the flags 308 may indicate the presence of a bad block in one of the LBA ranges that map into the corresponding slot 304.
  • The CSID field 312 provides an identification of a cache segment. The identification may be provided as an alphanumerical string. In some embodiments, adjacent slots may be assigned consecutively-numbered CSIDs. In other embodiments, CSIDs may be assigned randomly or pseudo-randomly. In any event, the CSID provides a substantially unique (e.g., unique to the data structure 220) identifier for a slot 304. This means that any particular slot 304 will have a different CSID from all other slots 304 in the set associative hash 220.
  • The combination of the LSB field 316 and MSB field 320 may provide an address range in memory where a hash slot is stored. In particular, the identifiers stored in the fields 316, 320 may identify an LBA of a memory module 148 or an LBA range of the memory module 148. Collectively, the fields 316, 320 may be referred to as a tag field. In some embodiments, a one-byte value in the tag field may indicate that the cache segment frame is used as a hash slot extension, which is shown in FIG. 3B. In some embodiments, each hash bucket 240 is provisioned with four slots 304 that act as a built-in collision list. This effectively ensures that when there is a hash hit and a new element is added to the hash index, the element does not go into the collision list but instead into the next available slot. The hash will only be added to the collision list 236 when all four slots 304 of the hash bucket 240 are filled.
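  • For concreteness, the fields described above can be pictured as the following C structures. The field widths shown are assumptions made for illustration; the actual slot layout, including the link information used for extensions, is defined by the controller hardware.

    #include <stdint.h>

    struct hash_slot {
        uint8_t  flags;    /* flag field 308: continuation, bad block, etc. */
        uint32_t csid;     /* CSID field 312: Cache Segment ID */
        uint16_t lsb;      /* LSB field 316: low part of the tag */
        uint16_t msb;      /* MSB field 320: high part of the tag */
    };

    struct hash_bucket {
        struct hash_slot slots[4];  /* four-way set associative bucket 240 */
    };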
  • This set associative hashing reduces the length of the collision list. For instance, assume that 4 elements need to be added to the same hash slot bucket 240: all 4 are added into the slots 304 a-d and nothing is added to the collision list 236. When a fifth element is to be added, an extension (as shown in FIG. 3B) that can hold up to 16 slots is created, the last index (e.g., the fourth index) in the hash slot 304 d is updated with the extension ID, and the information previously held in that last index is copied into the first slot of the extension, thus establishing the chain. Up to 16 more elements can now be added to this extension frame without adding a further collision list 236. After all 16 slots of the hash extension are filled, if an additional element needs to be added, then one more hash extension frame that can hold up to 16 slots is created and chained to the last slot in the current extension frame (thus creating a chain of hash slot extensions). Accordingly, if 100 elements need to be added and no set associative hashing is employed, the worst case requires 100 non-linear traversals to determine whether there is a hash hit. With this approach, only the four slots 304 in the hash bucket 240 are searched linearly, then one chain is loaded and the 16 slots in the first extension are searched linearly. Continuing in the same way, there is a maximum of 6 extensions, each holding a linear list of 16 elements. Hence, with this approach, the worst case involves 6 non-linear traversals and 100 linear traversals. It should be noted that linear traversals do not involve loading new memory, which makes them faster, unlike a non-linear traversal (e.g., following a linked list), which involves loading the next link and its corresponding memory before proceeding to the next element.
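  • The traversal arithmetic above can be checked with a short calculation. The snippet below, which assumes the 4-slot bucket and 16-slot extension sizes described in this disclosure, reproduces the worst-case figures for 100 colliding elements: 6 extension loads (non-linear traversals) and 100 slot comparisons (linear traversals).

    #include <stdio.h>

    #define BUCKET_SLOTS    4
    #define EXTENSION_SLOTS 16

    int main(void)
    {
        int elements = 100;
        /* elements that overflow the bucket spill into chained extensions */
        int overflow = elements > BUCKET_SLOTS ? elements - BUCKET_SLOTS : 0;
        /* ceiling division: number of 16-slot extensions needed */
        int extensions = (overflow + EXTENSION_SLOTS - 1) / EXTENSION_SLOTS;

        printf("non-linear traversals (extension loads): %d\n", extensions); /* 6 */
        printf("linear traversals (slot comparisons):    %d\n", elements);   /* 100 */
        return 0;
    }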
  • Hardware implementations can optimize comparisons in a linear list. In some embodiments, the hash bucket 240 preserves the extensibility of chaining, but this feature is invoked only when unusual workloads make chaining necessary (e.g., there are more than 4 collisions). The hash slot 304 also indicates whether the bad block bit is set (e.g., via flag 308). If the flag bit is set, it indicates that some or all of the cache segments present in this hash slot 304 have bad blocks, and hence the caching algorithms can take appropriate actions based on the specific use case (for example, diverting the command to a firmware module to take specific actions).
  • As mentioned above, each consecutive strip/row can be found in consecutive hash slot buckets. This effectively allows a group of memory accesses to be performed instead of multiple individual memory accesses. As used herein, a hash slot bucket corresponds to a group of positions in a hash table and a hash slot corresponds to a particular position in the hash table. A particular slot may be empty or contain an entry.
  • With reference now to FIG. 4, a method of adding elements to a hash assigned to a slot bucket 240 will be described in accordance with at least some embodiments of the present disclosure. As can be appreciated, the hash search instructions 232 will search the hashes assigned to each slot bucket 240 for matches. These hashes need to be updated as additional slots in the hash slot bucket 240 are used and data is stored in corresponding addresses of cache memory 148.
  • The method begins when an I/O command is received at the controller 108 from the host system 104, whereupon the controller 108 invokes the cache management module 224 to search for an empty slot in the set associative hash 220. Once initiated, the cache management module 224 will set a last slot index equal to 3 (e.g., if there are 4 slots per hash bucket) (step 404). Then the cache management module 224 will set an empty slot index (step 408). In this step, the cache management module 224 will find an empty slot by searching hash slot buckets 240 starting at hash slot bucket 0 240 a until the last slot index is reached. At each hash bucket, the cache management module 224 will check to see if the CSID field of the hash slot bucket is INVALID, which may indicate that the slot bucket is empty. In some embodiments, the first hash index is obtained from the I/O command, although it can also be received from other sources. This first hash index gives the hash slot bucket, which contains four slots. The next check is to see if any of the four slots 304 within the hash slot bucket 240 are empty (step 412) (e.g., from 0 to 3). If a slot is empty, it will be used to store the data in connection with the received I/O command.
  • If the first slot bucket is empty, then the first available slot within the slot bucket is assigned to the I/O command and the strip/row associated with that slot will be used to store data in connection with the I/O command. The cache management module 224 will also add the CSID that is assigned to the strip/row in connection with the I/O command to the hash slot assigned in the hash slot bucket 240 (step 440). The cache management module 224 will also update the appropriate fields of the slot 304 and bucket 240 to indicate that the assigned slot is not part of a slot chain. Thereafter, the method ends at step 444.
  • Referring back to step 408, if an empty slot is not found, the cache management module 224 will check to see if the last slot index in the hash bucket has an extension (step 416). If an extension is found (step 420), then the cache management module 224 will get the CSID from the last slot and load it into local memory. The cache management module 224 will then start the linear search within the frame/slot extension (step 424). The method then returns to step 412 to see if any empty slots are found within the extension.
  • If no extension is found in step 420, then the cache management module 224 allocates a new cache frame/extension (step 428). In some embodiments, the allocated cache frame can be of any size and can be indexable by an identifier. As a non-limiting example, the cache frame may be a 128 byte cache frame.
  • The method further continues with the cache management module 224 copying the last slot into the first slot in the newly-allocated frame (step 432). Then the cache management module 224 sets the tag field 316, 320 to indicate that the newly-allocated frame corresponds to an extension (step 436). The cache management module 224 may further set the empty slot index to an appropriate value to indicate that the newly-allocated frame does not have any slots currently allocated therein. The method then proceeds to step 440.
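  • The insertion flow of FIG. 4 can be summarized by the following self-contained C sketch. The frame layout, the INVALID_CSID and EXT_MARKER sentinels, and the use of a pointer to model the extension chain are all simplifying assumptions for illustration; they are not the actual firmware data format.

    #include <stdint.h>
    #include <stdlib.h>

    #define BUCKET_SLOTS 4             /* a top-level bucket uses nslots = 4 */
    #define EXT_SLOTS    16
    #define INVALID_CSID 0xFFFFFFFFu   /* empty slot */
    #define EXT_MARKER   0xFFFFFFFEu   /* slot now names an extension frame */

    struct frame {
        uint32_t      csid[EXT_SLOTS]; /* only 4 entries used in a bucket */
        int           nslots;          /* 4 for a bucket, 16 for an extension */
        struct frame *ext;             /* chained extension frame, if any */
    };

    void add_element(struct frame *f, uint32_t new_csid)
    {
        for (;;) {
            /* Steps 408-412: linear scan for an empty slot in this frame. */
            for (int i = 0; i < f->nslots; i++) {
                if (f->csid[i] == INVALID_CSID) {
                    f->csid[i] = new_csid;               /* step 440 */
                    return;                              /* step 444 */
                }
            }
            if (f->ext == NULL) {
                /* Steps 428-436: allocate an extension frame, copy the last
                 * slot into its first slot, and chain it to this frame. */
                struct frame *e = calloc(1, sizeof(*e));
                if (e == NULL)
                    return;   /* allocation failure not handled in this sketch */
                e->nslots = EXT_SLOTS;
                for (int i = 1; i < EXT_SLOTS; i++)
                    e->csid[i] = INVALID_CSID;
                e->csid[0] = f->csid[f->nslots - 1];
                f->csid[f->nslots - 1] = EXT_MARKER;
                f->ext = e;
            }
            f = f->ext;   /* steps 416-424: continue inside the extension */
        }
    }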
  • With reference now to FIGS. 5A and 5B, additional details of a hash searching method will be described in accordance with at least some embodiments of the present disclosure. The hash searching method is performed to see if there is a hash hit or miss for a particular I/O command.
  • The method begins by setting the slot index (step 504). In some embodiments, the slot index is set using a hash function. In more specific embodiments, the hash function used to set the slot index is not a “strong” hash function. Rather, a relatively weak hash function is used for simple manipulation of data (e.g., shift and add functions). This means that faster responses can be achieved, but at the expense of higher collision rates. However, because the hash slots are grouped into buckets and because adjacent LBAs are assigned to consecutive buckets, the benefits of the fast I/O processing can be realized without having to worry about the increased risk of collisions. Furthermore, because embodiments of the present disclosure contemplate using DRAM, the I/O command can be executed relatively quickly because the data for any I/O command is stored in consecutive memory locations. Moreover, all of the M buckets can be accessed in a single read access rather than having to use multiple accesses. This will be true as long as there are fewer than four collisions for each hash bucket (if each bucket has four slots).
  • In some embodiments, the get hash index algorithm computes a hash index (e.g., “HashIndex”) according to the following:
  • Transp(VD = a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_a a_b) = a_b a_a a_9 a_8 a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0
    shift = maxBitsForHS - maxBitsInLdNumber
    flippedVD = (Transp(VD) << shift)
    HashIndex = (Strip_RowNumber + flippedVD) % numberOfHashSlots

    Where a_b a_a a_9 a_8 a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0 is the transposed virtual disk ID or number (e.g., TargetID), where maxBitsForHS is the number of bits required to represent the hash table (e.g., if the hash table size is 1024, the maxBitsForHS value is 10 (2^10 = 1024)), where maxBitsInLdNumber is the number of bits required to represent all of the Virtual Disks (e.g., if the total number of virtual disks supported is 64, then this value is 6, whereas for 2048 the value is 11), where Transp(VD) is the transposed value of the Virtual Disk Number, where shift is the number of bits by which Transp(VD) needs to be shifted, where flippedVD is the shifted Transp(VD), which makes the hash sparse, and where Strip_RowNumber is the strip number present in the I/O command for a RAID 0 or RAID 1 volume, or the row number for RAID 5 or RAID 6 volumes.
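  • A literal C rendering of this computation is shown below. The parameter names follow the pseudocode above; the only added assumption is that the Transp( ) bit reversal operates over maxBitsInLdNumber bits (the number of bits needed to represent all of the virtual disks).

    #include <stdint.h>

    /* Reverse the low `width` bits of v (the Transp() operation). */
    static uint32_t transp(uint32_t v, unsigned width)
    {
        uint32_t r = 0;
        for (unsigned i = 0; i < width; i++)
            if (v & (1u << i))
                r |= 1u << (width - 1 - i);
        return r;
    }

    uint32_t hash_index(uint32_t strip_row_number, uint32_t vd_number,
                        unsigned maxBitsForHS,      /* e.g., 10 for 1024 slots */
                        unsigned maxBitsInLdNumber, /* e.g., 6 for 64 VDs */
                        uint32_t numberOfHashSlots)
    {
        unsigned shift = maxBitsForHS - maxBitsInLdNumber;
        uint32_t flippedVD = transp(vd_number, maxBitsInLdNumber) << shift;
        return (strip_row_number + flippedVD) % numberOfHashSlots;
    }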
  • After the slot index is found, the method proceeds with the cache management module 224 and/or hash search instructions 232 determining the number of slots that need to be loaded to accommodate the I/O command (step 508). In some embodiments, this number may be approximated by the cache management module 224 and/or hash search instructions 232 (as will be described in connection with FIG. 6). The slots then begin to be loaded (step 512). In some embodiments, the slots are loaded in alignment with a burst boundary. The burst boundary can depend upon the slot size, the burst size, and the number of slots included in a slot bucket. The slot loading begins with the first slot (step 516).
  • The cache management module 224 and/or hash search instructions 232 will determine whether the number of slots needed (e.g., as estimated or determined in step 508) is less than the total number of slots in a bucket (step 520). If this query is answered negatively, then the method proceeds by checking if the last slot has an extension (step 536). If the last slot has an extension, then the cache management module 224 and/or hash search instructions 232 get the ID of the frame from the last slot index and load that particular frame into local memory for additional searching (step 544). The cache management module 224 and/or hash search instructions 232 then update the last slot number and the last slot index (step 548). After the slot number and last slot index have been updated, the method proceeds to step 528.
  • Referring back to step 540, if there is no extension present, then the method proceeds to step 568 (FIG. 5B) as will be discussed in further detail herein.
  • Referring back to step 520, if the slot number is less than or equal to the last slot index, then the method proceeds by comparing the strip/row from the slot to the strip/row for the requested I/O command (step 524). Thereafter, the method proceeds with the cache management module 224 and/or hash search instructions 232 determining if a hash match is found (step 528). If not, the method increments to the next slot (step 532) and returns to step 520.
  • With reference now to FIG. 5B, the method proceeds until a CSID match is found (step 552). Finding a matching CSID may be referred to as finding a hash hit. At this point, the required operation is performed in connection with executing the I/O command (step 556). In some embodiments, the local hash value may be stored in memory of the controller 108 or in the controller cache 140 (step 560). Thereafter, the method ends (step 564).
  • Referring back to the inquiry of step 540 (FIG. 5A), if there is no extension present, then the method proceeds with the cache management module 224 and/or hash search instructions 232 performing the required operation associated with a hash miss (step 568). The local hash can then be stored in memory if needed (step 560) and then the method ends (step 564).
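  • The search loop of FIGS. 5A and 5B can be condensed into the following C sketch. The structure names are illustrative, the strip/row number stands in for the full tag comparison, and a return value of 0 stands in for the hash-miss handling of step 568; none of these simplifications reflect the actual hardware interface.

    #include <stddef.h>
    #include <stdint.h>

    struct slot  { uint32_t csid; uint32_t strip_row; };
    struct frame { struct slot slots[16]; int nslots; struct frame *ext; };

    /* Returns the matching CSID on a hash hit (steps 524-552), or 0 on a
     * hash miss once the extension chain is exhausted (steps 540, 568). */
    uint32_t hash_search(struct frame *f, uint32_t strip_row)
    {
        while (f != NULL) {
            for (int i = 0; i < f->nslots; i++)          /* linear search  */
                if (f->slots[i].strip_row == strip_row)  /* steps 524, 528 */
                    return f->slots[i].csid;             /* hash hit       */
            f = f->ext;   /* steps 536-548: load the next extension frame */
        }
        return 0;         /* step 568: hash miss */
    }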
  • With reference now to FIG. 6, a method of approximating a number of rows needed to accommodate an I/O operation will be described in accordance with at least some embodiments of the present disclosure. While for R0/R1 RAID architectures the number of hash slots to be loaded is equal to the number of strips spanned by the I/O command, for R5 the number of hash slots needed for an I/O command corresponds to the number of rows spanned by the I/O command. Finding the number of rows often requires a divider function. Typically, in hardware accelerated solutions, the division operation is very expensive, and it is not preferred to add division logic to the hardware if it can be avoided. Accordingly, for an R5 architecture it may be desirable to provide a method for approximating the number of rows needed to accommodate an I/O operation without requiring a division operation.
  • Embodiments of the present disclosure contemplate such an approximation method. The method begins by determining the number of strips of the first row (NSFR) (step 604). This value of NSFR may be set equal to the row data size minus the Log Arm. The Log Arm is the logical arm in a RAID volume. For a RAID 5 volume of 3 drives, there will be two data logical arms (Log Arm 0 and Log Arm 1) and one parity arm. Data is striped across these logical arms.
  • An I/O command contains a row number, a logical arm number, and an offset in arm to represent the start LBA for a RAID 5 volume: RowNumber = (startLba / stripSize) / numberOfDataArms; LogArm = (startLba / stripSize) % numberOfDataArms; OffsetInArm = startLba % stripSize. The method proceeds by determining whether the NSFR is greater than the number of strips spanned by the I/O command (step 608). If the query is answered positively, then the NSFR is adjusted to equal the number of strips spanned by the I/O operation (step 612). If, however, the query of step 608 is answered negatively, then the method proceeds to determine the row data size, the number of rows, the number of full rows, and the number of strips spanned in the last row (step 616). The method further continues by determining whether the number of strips spanned in the last row is equal to zero (step 620). If this query is answered affirmatively, then the method proceeds to set the number of rows equal to the determined number of rows plus one (step 624). However, if the query of step 620 is answered negatively, then the method proceeds by approximating the number of rows spanned by the I/O operation (step 628). In some embodiments, the nearest power of two value (NP2) of the row data size of the RAID volume is used to approximate the number of rows spanned by the I/O operation.
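  • One possible shift-based realization of this approximation is sketched below in C. The disclosure does not pin down the exact rounding behavior, so the handling of the first and last rows here is an assumption; the essential point is that dividing by the nearest power of two (NP2) of the row size reduces to a shift, avoiding hardware division entirely. Because NP2 may differ from the true row size, the result is only an approximation that a real implementation would pad as needed.

    #include <stdint.h>

    uint32_t approx_rows(uint32_t strips_spanned, uint32_t strips_per_row)
    {
        /* Steps 604-612: strips consumed by the (possibly partial) first row. */
        uint32_t nsfr = (strips_per_row > strips_spanned)
                            ? strips_spanned : strips_per_row;
        uint32_t remaining = strips_spanned - nsfr;

        /* NP2 = smallest power of two >= strips_per_row, as a shift count. */
        unsigned shift = 0;
        while ((1u << shift) < strips_per_row)
            shift++;

        uint32_t rows = 1 + (remaining >> shift);   /* divide via shift */
        if (remaining & ((1u << shift) - 1))
            rows += 1;                              /* count a partial last row */
        return rows;
    }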
  • Specific details were given in the description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
  • While illustrative embodiments of the disclosure have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art.

Claims (20)

What is claimed is:
1. A method of data caching, the method comprising:
organizing a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs;
receiving an Input/Output (I/O) request from a host system;
determining that cache memory is needed to fulfill the I/O request; and
performing a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
2. The method of claim 1, wherein each hash slot bucket comprises a same number of hash slots.
3. The method of claim 2, wherein each hash slot bucket comprises at least four hash slots.
4. The method of claim 1, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
5. The method of claim 1, further comprising:
searching the plurality of hash slots and identifying a hash slot bucket having at least one unoccupied hash slot therein;
adding an element to the at least one unoccupied hash slot contained within the identified hash slot bucket;
adding a CSID of the at least one hash slot to which the element is added; and
setting a next and previous for the CSID of the at least one hash slot to an invalid value.
6. The method of claim 1, further comprising:
determining that an approximation is needed for a number of rows spanned by the I/O request; and
implementing an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.
7. The method of claim 1, further comprising:
setting a slot index to an initial value;
comparing a first hash value assigned to a first hash slot in a hash slot bucket with the hash, wherein the first hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket;
determining that the hash does not match the first hash value;
incrementing the slot index;
comparing a next hash value assigned to a next hash slot in the hash slot bucket with the hash, wherein the next hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket with the exception of the first hash slot and any hash slot already having its hash value compared with the hash; and
repeating the incrementing and comparing of the next hash value with the hash until a match is found between the hash and the next hash value.
8. A system, comprising:
a controller cache comprising one or more memory modules; and
a controller situated between a host system and a data storage device, the controller being configured to temporarily store data in the one or more memory modules of the controller cache in connection with processing Input/Output (I/O) requests received from the host system, the controller further comprising:
a processor; and
memory coupled to the processor, the memory including instructions that, when executed by the processor, enable the controller to perform the following:
organize a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs;
receive an Input/Output (I/O) request from the host system;
determine that the one or more memory modules are needed to fulfill the I/O request; and
perform a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
9. The system of claim 8, wherein each hash slot bucket comprises a same number of hash slots.
10. The system of claim 9, wherein each hash slot bucket comprises at least four hash slots.
11. The system of claim 8, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
12. The system of claim 8, wherein the instructions further enable the controller to:
search the plurality of hash slots and identify a hash slot bucket having at least one unoccupied hash slot therein;
add an element to the at least one unoccupied hash slot contained within the identified hash slot bucket;
add a CSID of the at least one hash slot to which the element is added; and
set a next and previous for the CSID of the at least one hash slot to an invalid value.
13. The system of claim 8, wherein the instructions further enable the controller to:
determine that an approximation is needed for a number of rows spanned by the I/O request; and
implement an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.
14. The system of claim 8, wherein the instructions further enable the controller to:
set a slot index to an initial value;
compare a first hash value assigned to a first hash slot in a hash slot bucket with the hash, wherein the first hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket;
determine that the hash does not match the first hash value;
increment the slot index;
compare a next hash value assigned to a next hash slot in the hash slot bucket with the hash, wherein the next hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket with the exception of the first hash slot and any hash slot already having its hash value compared with the hash; and
repeat the incrementing and comparing of the next hash value with the hash until a match is found between the hash and the next hash value.
15. A controller situated between a host system and a data storage array, the controller being configured to temporarily store data in cache memory in connection with processing Input/Output (I/O) requests received from the host system, the controller comprising:
a processor; and
memory coupled to the processor, the memory including instructions that, when executed by the processor, enable the controller to perform the following:
organize a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs;
receive an Input/Output (I/O) request from the host system;
determine that the cache memory is needed to fulfill the I/O request; and
perform a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
16. The controller of claim 15, wherein each hash slot bucket comprises a same number of hash slots.
17. The controller of claim 16, wherein each hash slot bucket comprises at least four hash slots.
18. The controller of claim 15, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
19. The controller of claim 15, wherein the instructions further enable the controller to:
search the plurality of hash slots and identify a hash slot bucket having at least one unoccupied hash slot therein;
add an element to the at least one unoccupied hash slot contained within the identified hash slot bucket;
add a CSID of the at least one hash slot to which the element is added; and
set a next and previous for the CSID of the at least one hash slot to an invalid value.
20. The controller of claim 15, wherein the instructions further enable the controller to:
determine that an approximation is needed for a number of rows spanned by the I/O request; and
implement an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.