
HK1251377B - Method for determining data in cache memory of cloud storage architecture and cloud storage system using the same - Google Patents

Method for determining data in cache memory of cloud storage architecture and cloud storage system using the same

Info

Publication number
HK1251377B
Authority
HK
Hong Kong
Prior art keywords
time
data
cache
specific
algorithm
Prior art date
Application number
HK18110770.5A
Other languages
Chinese (zh)
Other versions
HK1251377A1 (en
Inventor
陈文贤
谢文杰
黄明仁
Original Assignee
先智云端数据股份有限公司
Filing date
Publication date
Application filed by 先智云端数据股份有限公司
Priority to HK18110770.5A priority Critical patent/HK1251377B/en
Publication of HK1251377A1 publication Critical patent/HK1251377A1/en
Publication of HK1251377B publication Critical patent/HK1251377B/en


Description

Cloud storage device system and method for determining data in the cache memory of its architecture

Technical Field

The present invention relates to the field of cloud storage devices, and in particular to a cloud storage device system and a method for determining data in the cache memory of its architecture.

Background Art

A cloud service system typically tries to deliver its services to clients as quickly as possible in response to their requests. This goal is easy to reach when the number of clients is small. When the number of clients grows sharply, however, response times inevitably vary under the constraints of the system's hardware architecture and network traffic, although they should still stay within a reasonable range. Moreover, if a cloud service competes commercially with other cloud services, then whatever its constraints, the cloud service system should be engineered to respond to client requests in the shortest possible time with limited resources. This is a common problem for cloud system developers, and a suitable solution is widely sought.

In a traditional working environment, as shown in FIG. 1, many client computers 1 connect to a server 4 via the Internet 3. The server 4 is the device that mainly handles client requests; it may perform complex computation or simply access stored data. In the latter case, the stored data may be kept in a cache 5 or an auxiliary memory 6. The number of caches 5 or auxiliary memories 6 is not limited to one and may be whatever the cloud service requires. The server 4, the cache 5, and the auxiliary memory 6 form the architecture of the cloud service system. The cache 5 may be dynamic random access memory (DRAM) or static random access memory (SRAM). The auxiliary memory 6 may be a solid-state drive (SSD), a hard disk drive (HDD), a writable digital versatile disc (DVD), or even magnetic tape. The physical difference between the cache 5 and the auxiliary memory 6 lies in data retention when power is removed. In the cache 5, data is held temporarily while it is needed and is lost on power-off, whereas the auxiliary memory 6 retains data whether or not it is powered. The cache 5 offers fast data access but is volatile, expensive, and small in capacity.

As described above, deciding which data to store in the cache 5 clearly matters: hot data (accessed more often), which most requests need, should be served quickly, while cold data (accessed less often) can be served at a tolerably slower speed. Doing so improves cloud service performance, and on average the response time for all requests from client computers then falls within an acceptable range. Many conventional algorithms are available for deciding which data should be cached (stored in the cache 5), for example the Least Recently Used (LRU) algorithm, the Most Recently Used (MRU) algorithm, the Pseudo-LRU (PLRU) algorithm, the Segmented LRU (SLRU) algorithm, the 2-way set associative algorithm, the Least-Frequently Used (LFU) algorithm, and the Low Inter-reference Recent Set (LIRS) algorithm. These algorithms operate on the recency and frequency of the analyzed data itself, and their results are independent of other data (they are not data-related).

A further class of algorithms is categorized as "data-related" algorithms. They take the original cached data (the result of the traditional cache algorithms above) as target data, obtain data that is "data-related" to it, and cache that data as well. The newly cached data is therefore related to the original cached data to some degree (it has a higher chance of appearing together with the original cached data). All of these algorithms have been found effective for certain workload patterns. However, they all evaluate data that appears in relative time periods rather than absolute time periods. As a result, the data that the algorithms select for caching during a first period (for example, the first 8 hours) will not necessarily be accessed in a second period (for example, the 8 hours that follow). This is easy to understand, because almost all data access is tied to absolute time or to a recurring schedule: computers booting between 8:55 AM and 9:05 AM every morning, a meeting at 2:00 PM every Wednesday, payroll settled every two weeks, an inventory count on the last day of each month, and so on. The timestamp itself is therefore an important and independent factor in deciding what to cache. To date, however, no solution decides what to cache on the basis of timestamps.
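The limitation described above can be illustrated with a minimal LRU sketch. The class, the workload, and the data name `payroll` are hypothetical and not taken from the source; the point is only that a purely recency-based policy can evict an item that is needed again at a recurring clock time.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache keyed by data ID (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def access(self, key):
        """Record an access; return True on a hit, False on a miss."""
        hit = key in self._items
        if hit:
            self._items.move_to_end(key)  # mark as most recently used
        else:
            if len(self._items) >= self.capacity:
                self._items.popitem(last=False)  # evict least recently used
            self._items[key] = True
        return hit


# Hypothetical workload: "payroll" is needed once per cycle at a fixed
# clock time, with unrelated accesses in between.
cache = LRUCache(capacity=2)
cycle = ["payroll", "a", "b", "c"]
workload = cycle * 3
hits = [cache.access(key) for key in workload]
payroll_hits = [h for key, h in zip(workload, hits) if key == "payroll"]
print(payroll_hits)  # [False, False, False]
```

In this toy workload the periodic item misses every time, because the intervening accesses always push it out before its next scheduled use.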

Summary of the Invention

In view of this, the problem that conventional technology offers no caching solution premised on timestamps needs to be addressed. The invention provides a cloud storage device system, and a method for determining data in the cache memory of its architecture, that analyze time-related data accessed over a past period of time to decide which data should be cached, thereby improving the performance of the cloud storage device system.

The present invention provides a method for determining data in a cache memory of a cloud storage device architecture, the method comprising the steps of:

A. recording transactions of a cache memory of a cloud storage device system over a past period of time, wherein each transaction includes a recording time, or a recording time and the cached data accessed during that period of time;

B. specifying a specific time in the future;

C. calculating, based on a reference period, a time-related confidence for each cached data item from the transactions;

D. ranking the time-related confidences; and

E. providing cached data with higher time-related confidence in the cache memory and, when the cache memory runs out of space before the specific time in the future, removing cached data with lower time-related confidence from the cache memory.
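Steps D and E can be sketched as follows, assuming the time-related confidences of step C are already available as a dictionary. The function name `plan_cache`, the one-slot-per-item capacity model, and the choice to skip items with zero confidence are illustrative assumptions, not details given in the source.

```python
def plan_cache(confidences, current_cache, capacity):
    """Rank by confidence (step D), then decide what to load and evict (step E).

    confidences: dict mapping data ID -> time-related confidence (step C).
    current_cache: set of data IDs currently held in the cache memory.
    capacity: number of items the cache can hold (assumption: one slot each).
    Returns (to_load, to_evict) as lists of data IDs.
    """
    # Step D: descending confidence; ties broken by data ID for determinism.
    ranked = sorted(confidences, key=lambda d: (-confidences[d], d))
    # Assumption: items with zero confidence are not worth preloading.
    wanted = [d for d in ranked if confidences[d] > 0][:capacity]

    # Step E: load the wanted items that are not cached yet ...
    to_load = [d for d in wanted if d not in current_cache]
    # ... and evict the lowest-confidence unwanted items only on overflow.
    overflow = len(current_cache) + len(to_load) - capacity
    evictable = sorted(
        (d for d in current_cache if d not in wanted),
        key=lambda d: (confidences.get(d, 0.0), d),
    )
    to_evict = evictable[:max(overflow, 0)]
    return to_load, to_evict


confidences = {"D01": 1.0, "D02": 0.67, "D03": 0.33, "D04": 0.0}
to_load, to_evict = plan_cache(confidences, {"D03", "D04"}, capacity=3)
print(to_load, to_evict)  # ['D01', 'D02'] ['D04']
```

Eviction is triggered only when loading the higher-confidence items would exceed the capacity, which mirrors the condition in step E that removal happens when the cache memory runs out of space.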

In one embodiment, step E may be replaced by step E':

E'. providing cached data with higher time-related confidence, together with data calculated by at least one other cache algorithm, to the cache memory so that the cache memory is fully used before the specific time in the future, wherein a fixed ratio is kept between the cached data with higher time-related confidence and the data calculated by the other cache algorithm.

In one embodiment, the fixed ratio is calculated based on the number of data items or the space the data occupies.
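A minimal sketch of step E' with a ratio based on the number of data items is shown below. The function name, the integer `time_parts:other_parts` form of the ratio, and the rule for skipping duplicates are assumptions made for illustration; a space-based ratio would sum item sizes instead of counting items.

```python
def mix_by_fixed_ratio(time_ranked, other_ranked, capacity,
                       time_parts, other_parts):
    """Fill the cache with a fixed count ratio between two sources.

    time_ranked: data IDs sorted by descending time-related confidence.
    other_ranked: data IDs in the priority order of another cache algorithm.
    time_parts, other_parts: the fixed ratio, e.g. 3 and 2 for 3:2.
    """
    # Slots reserved for time-ranked data, rounded down.
    time_slots = capacity * time_parts // (time_parts + other_parts)
    chosen = list(time_ranked[:time_slots])
    # Fill the remaining slots from the other algorithm, skipping duplicates.
    for data_id in other_ranked:
        if len(chosen) >= capacity:
            break
        if data_id not in chosen:
            chosen.append(data_id)
    return chosen


selection = mix_by_fixed_ratio(
    ["D09", "D01", "D16", "D02"],   # hypothetical time-related ranking
    ["D01", "D14", "D15", "D17"],   # hypothetical ranking from, e.g., LRU
    capacity=5, time_parts=3, other_parts=2,
)
print(selection)  # ['D09', 'D01', 'D16', 'D14', 'D15']
```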

In one embodiment, the specific time includes a specific minute of an hour, a specific hour of a day, a specific day of a week, a specific day of a month, a specific day of a quarter, a specific day of a year, a specific week of a month, a specific week of a quarter, a specific week of a year, or a specific month of a year.

In one embodiment, the transactions are recorded periodically, with a time span between two consecutively recorded transactions.

In one embodiment, the reference period includes within a specific minute of an hour, within a specific hour of a day, or within a specific day of a year.
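One way to map a recording time to the three kinds of reference period listed above is sketched here. The `kind` strings and the `M`/`DOY` label prefixes are hypothetical; only the `H00`, `H01`, ... hour labels correspond to notation used later in the description.

```python
from datetime import datetime


def reference_period(timestamp, kind):
    """Return a label for the reference period that `timestamp` falls in.

    kind: "minute_of_hour", "hour_of_day", or "day_of_year"
    (illustrative names, not from the source).
    """
    if kind == "minute_of_hour":
        return f"M{timestamp.minute:02d}"
    if kind == "hour_of_day":
        return f"H{timestamp.hour:02d}"
    if kind == "day_of_year":
        return f"DOY{timestamp.timetuple().tm_yday:03d}"
    raise ValueError(f"unknown reference period kind: {kind}")


moment = datetime(2024, 1, 1, 0, 30, 18)  # arbitrary example timestamp
labels = [reference_period(moment, k)
          for k in ("minute_of_hour", "hour_of_day", "day_of_year")]
print(labels)  # ['M30', 'H00', 'DOY001']
```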

In one embodiment, the time-related confidence is calculated by the following steps:

C1. calculating a first number, the first number being the number of times the reference period appears in the past period of time;

C2. calculating a second number, the second number being the number of those reference periods in which the target cached data was accessed; and

C3. dividing the second number by the first number.
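Steps C1 to C3 can be sketched as below. The log format (a list of period label and accessed-ID pairs), the sample data, and the convention of returning 0.0 when the reference period never occurs are assumptions for illustration.

```python
def time_related_confidence(transactions, reference_period, target):
    """Compute steps C1 to C3 for one cached data item.

    transactions: list of (period_label, accessed_ids) pairs, one per
        recorded transaction; accessed_ids may be empty.
    """
    # C1: transactions whose recording time falls in the reference period.
    first = sum(1 for period, _ in transactions if period == reference_period)
    # C2: those among them in which the target data was accessed.
    second = sum(
        1
        for period, accessed in transactions
        if period == reference_period and target in accessed
    )
    # C3: the ratio; 0.0 if the period never occurred (assumed convention).
    return second / first if first else 0.0


# Hypothetical log: three transactions in H00 and one in H01.
log = [
    ("H00", {"D01", "D02"}),
    ("H00", {"D01"}),
    ("H00", set()),          # a transaction with a recording time only
    ("H01", {"D02"}),
]
conf_d01 = time_related_confidence(log, "H00", "D01")
conf_d02 = time_related_confidence(log, "H00", "D02")
print(round(conf_d01, 2), round(conf_d02, 2))  # 0.67 0.33
```

Transactions that contain only a recording time still count toward the first number, so idle periods lower the confidence of every item for that reference period.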

In one embodiment, the cache algorithm includes the Least Recently Used (LRU) algorithm, the Most Recently Used (MRU) algorithm, the Pseudo-LRU (PLRU) algorithm, the Random Replacement (RR) algorithm, the Segmented LRU (SLRU) algorithm, the 2-way set associative algorithm, the Least-Frequently Used (LFU) algorithm, the Low Inter-reference Recent Set (LIRS) algorithm, the Adaptive Replacement Cache (ARC) algorithm, the Clock with Adaptive Replacement (CAR) algorithm, the Multi Queue (MQ) algorithm, or a data-related algorithm that takes the result of step D as target data.

In one embodiment, the type of the data includes object, block, or file.

The present invention also provides a cloud storage device system, comprising:

a host for accessing data;

a cache memory, connected to the host, for temporarily storing cached data for fast access;

a transaction recorder, configured in or installed on the cache memory and connected to the host, for: recording transactions of the cache memory over a past period of time, wherein each transaction includes a recording time, or a recording time and the cached data accessed during that period of time; receiving a specific time in the future specified by the host; calculating, based on a reference period, a time-related confidence for each cached data item from the transactions; ranking the time-related confidences; and providing cached data with higher time-related confidence in the cache memory and, when the cache memory runs out of space before the specific time in the future, removing cached data with lower time-related confidence from the cache memory; and

a plurality of auxiliary memories, connected to the host, for storing data in a distributed manner for access.

In one embodiment, the cloud storage device system may instead comprise:

a host for accessing data;

a cache memory, connected to the host, for temporarily storing cached data for fast access;

a transaction recorder, configured in or installed on the cache memory and connected to the host, for: recording transactions of the cache memory over a past period of time, wherein each transaction includes a recording time, or a recording time and the cached data accessed during that period of time; receiving a specific time in the future specified by the host; calculating, based on a reference period, a time-related confidence for each cached data item from the transactions; ranking the time-related confidences; and providing cached data with higher time-related confidence, together with data calculated by at least one other cache algorithm, to the cache memory so that the cache memory is fully used before the specific time in the future, wherein a fixed ratio is kept between the cached data with higher time-related confidence and the data calculated by the other cache algorithm; and

a plurality of auxiliary memories, connected to the host, for storing data in a distributed manner for access.

In one embodiment, the fixed ratio is calculated based on the number of data items or the space the data occupies.

In one embodiment, the specific time includes a specific minute of an hour, a specific hour of a day, a specific day of a week, a specific day of a month, a specific day of a quarter, a specific day of a year, a specific week of a month, a specific week of a quarter, a specific week of a year, or a specific month of a year.

In one embodiment, the transactions are recorded periodically, with a time span between two consecutively recorded transactions.

In one embodiment, the reference period includes within a specific minute of an hour, within a specific hour of a day, or within a specific day of a year.

In one embodiment, the time-related confidence is calculated by the following steps:

C1. calculating a first number, the first number being the number of times the reference period appears in the past period of time;

C2. calculating a second number, the second number being the number of those reference periods in which the target cached data was accessed; and

C3. dividing the second number by the first number.

In one embodiment, the cache algorithm includes the LRU algorithm, the MRU algorithm, the PLRU algorithm, the RR algorithm, the SLRU algorithm, the 2-way set associative algorithm, the LFU algorithm, the LIRS algorithm, the ARC algorithm, the CAR algorithm, the MQ algorithm, or a data-related algorithm that takes data generated by the transaction recorder as target data.

In one embodiment, the type of the data includes object, block, or file.

The beneficial effects of the present invention include at least the following:

In the cloud storage device system and the method for determining data in the cache memory of its architecture described above, the cached data is time-related. When the next relevant time arrives, this data is therefore the most likely to be accessed. It can be stored in the cache memory before that time to improve the performance of the cloud storage device system, which traditional cache algorithms cannot achieve.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of a conventional data access architecture;

FIG. 2 is a schematic structural diagram of a cloud storage device system in one embodiment;

FIG. 3 is a table of transaction records in one embodiment;

FIG. 4 is a flowchart of a method for determining data in the cache memory of a cloud storage device architecture in one embodiment;

FIG. 5 is a table listing the time-related confidences calculated for all cached data in one embodiment;

FIG. 6 is a table listing the time-related confidences calculated for all cached data in another embodiment.

Detailed Description

The present invention is described more specifically with reference to the following embodiments.

To make the objectives, technical solutions, and advantages of the present invention clearer, the cloud storage device system and the method for determining data in the cache memory of its architecture are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

FIG. 2 shows an ideal architecture for practicing one embodiment of the present invention. A cloud storage device system 10 includes a host 101, a cache memory 102, a transaction recorder 103, and a plurality of auxiliary memories 104. The cloud storage device system 10 supports data storage for cloud services and may be partially installed in a server 100, as shown in FIG. 2. The server 100 is hardware that receives requests from client devices such as a personal computer 301, a tablet computer 302, a smartphone 303, or other remote devices connected via the Internet 200. After processing the requests, the server 100 sends the corresponding responses back to the client devices. Each component is described in detail below.

The main job of the host 101 is to perform data access in response to requests from client devices. In practice, the host 101 may be a controller in the server 100. In other embodiments, if the central processing unit (CPU) of the server 100 provides the same functions as that controller, the host 101 refers to the CPU, or even to the server 100 itself. The host 101 is defined by its function rather than its form. The host 101 may also have other functions, such as fetching hot data for caching in the cache memory 102, but these are outside the scope of the present invention.

The cache memory 102 is connected to the host 101 and temporarily stores cached data for fast access. In practice, the cache memory 102 may be any hardware that provides high-speed data access; for example, it may be static random access memory (SRAM). The cache memory 102 may be a standalone module for a large cloud storage device system, and some architectures may embed it in the host 101 (CPU). As with caches in other cloud storage device systems, a default cache algorithm determines which data should be cached in the cache memory 102. In one embodiment, a parallel mechanism is provided that runs alongside the existing cache algorithm for a specific purpose or occasion. This caching mechanism can also take precedence and replace the cached data chosen by the original cache algorithm.

The transaction recorder 103 is an essential component of the cloud storage device system 10. In this embodiment it is a hardware module configured in the cache memory 102. In other embodiments, the transaction recorder 103 may be software installed in the cache memory 102 or in the controller of the host 101. In this embodiment, the transaction recorder 103 is connected to the host 101, and its functions are the features of the present invention: recording transactions of the cache memory 102 over a past period of time, wherein each transaction includes a recording time, or a recording time and the cached data accessed during that period of time; receiving a specific time in the future specified by the host 101; calculating, based on a reference period, a time-related confidence for each cached data item from the transactions; ranking the time-related confidences; and providing cached data with higher time-related confidence to the cache memory 102 and, when the cache memory 102 runs out of space before the specific time in the future, removing cached data with lower time-related confidence from the cache memory 102 (or providing cached data with higher time-related confidence, together with data calculated by at least one other cache algorithm, to the cache memory 102 so that the cache memory 102 is fully used before the specific time in the future). These functions are described below together with the method proposed by the present invention.

Note that the term "time-related confidence" used in the present invention is similar to the confidence defined for association rules. It extends that notion to a confidence value calculated by taking a specific time or period as the target and obtaining the probability that one or more data items were accessed at that time or period in the past history.
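Read in association-rule terms, the time-related confidence can be written as a ratio of counts. The notation below is introduced here for illustration only and does not appear in the source: L is the set of recorded transactions, τ(t) is the recording time of transaction t, A(t) is the set of cached data accessed in t, T is the reference period, and d is the target cached data.

```latex
\[
\mathrm{conf}(T \Rightarrow d)
  = \frac{\bigl|\{\, t \in L : \tau(t) \in T \ \land\ d \in A(t) \,\}\bigr|}
         {\bigl|\{\, t \in L : \tau(t) \in T \,\}\bigr|}
\]
```

The denominator counts how often the reference period occurred in the recorded history, and the numerator counts how many of those occurrences involved an access to d, so the value lies between 0 and 1.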

The auxiliary memories 104 are also connected to the host 101 and store data in a distributed manner for access on client demand. Unlike the cache memory 102, the auxiliary memories 104 have slower input/output, so any data stored in them is served more slowly in response to access requests. Frequently accessed data in the auxiliary memories 104 is copied to the cache memory 102 for caching. In practice, an auxiliary memory 104 may be a solid-state drive (SSD), a hard disk drive (HDD), a writable digital versatile disc (DVD), or even magnetic tape. The configuration of the auxiliary memories 104 depends on the purpose of the cloud storage device system 10 or of the workloads running on it. This embodiment has three auxiliary memories 104; an actual cloud storage device system may have hundreds to thousands of auxiliary memories, or even more.

Certain definitions used in the present invention need to be explained first. FIG. 3 is a table of transaction records used to monitor how data in the cache memory 102 was accessed in the past. The table has a TID column (transaction ID, 0001 to 0024), cached data columns (D01 to D18), reference period columns (H00 to H07), and a recording time column. H00 means the recording time falls between 00:00 and 01:00, H01 means it falls between 01:00 and 02:00, and so on. A "1" at the intersection of a TID and a cached data column means the corresponding cached data was accessed after the previous recording time and before the current recording time. A "1" at the intersection of a TID and a reference period column marks the period in which that transaction's recording time falls. A transaction is a record of which cached data was accessed during the past period of time. In this embodiment, the records (transactions) of the past 8 hours are analyzed. For ease of description, each transaction has a corresponding TID for identification. The transaction recorder 103 records the transactions periodically, with a time span between two consecutively recorded transactions.

In this embodiment, each transaction is recorded 20 minutes after the previous one, so the time span is 20 minutes. In practice, the recording time need not fall exactly on the planned schedule. For example, recording times may fall at 00:30:18, 00:50:17, and so on, not exactly at the 15th second but within a range, because a large piece of data may be being accessed or the transaction recorder 103 may be waiting for a response from a remotely connected cache memory 102. More aggressively, the time span may even be chosen at random, which is also within the scope of the present invention.

Note that the number of transactions can be large, possibly thousands or more, for example when recording for three months with a 10-minute time span. The 24 transactions used here serve only as an illustrative embodiment. The more transactions the transaction recorder 103 holds, the more precisely the data demand at a specific time in the future can be predicted. Of course, not all data cached in the cache memory 102 is accessed within a given interval. As shown in FIG. 3, transaction 0015 has no record of accessed data; it has only its recording time, 04:50:05.
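The schedule behind the 24 example transactions can be checked with a short calculation. The first recording time of 00:10:05 is an inference from the times quoted in this description, the exact 20-minute spacing ignores the jitter the text allows, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta

# Assumptions: first recording at 00:10:05 and an exact 20-minute span.
FIRST_RECORDING = datetime(2000, 1, 1, 0, 10, 5)
TIME_SPAN = timedelta(minutes=20)


def recording_time(tid):
    """Nominal recording time of transaction number `tid` (1-based)."""
    return FIRST_RECORDING + (tid - 1) * TIME_SPAN


def period_label(moment):
    """Hour-of-day reference period, e.g. H00 for 00:00 to 01:00."""
    return f"H{moment.hour:02d}"


schedule = {f"{tid:04d}": recording_time(tid) for tid in range(1, 25)}

# Count how many transactions fall in each reference period.
per_period = {}
for moment in schedule.values():
    label = period_label(moment)
    per_period[label] = per_period.get(label, 0) + 1

print(schedule["0015"].strftime("%H:%M:%S"))  # 04:50:05
print(per_period["H00"], len(per_period))     # 3 8
```

Under these assumptions each of the eight reference periods H00 to H07 contains three transactions, and transaction 0015 lands on the 04:50:05 recording time mentioned above.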

Before the details of how the method determines the data in the cache memory 102 through the cloud storage device system 10 are disclosed, consider the cached data itself. Although 18 cached data items are shown, the number may be greater than 18 depending on the capacity of the cache memory 102. The 18 cached data items are those obtained at 07:50:05 by the method of the present invention and/or by other cache algorithms used by the cloud storage device system 10. The set of cached data used for analysis may change, because the transaction recorder 103 may add new data to the cache memory 102 from one of the auxiliary memories 104 when that data is accessed very often. Other data may have been cached before 03:50:05 and later removed because it was not requested or "expected to be accessed".

FIG. 3 shows the characteristics of the cached data. Cached data D01 is accessed often in the first 3 hours and in the last hour. Cached data D02 is accessed evenly, in every other 20-minute interval. Cached data D03 is accessed evenly, in every third 20-minute interval. Cached data D04 is accessed evenly, between 00:10:05 and 00:30:05, between 02:50:05 and 03:10:05, and between 05:30:05 and 05:50:05. Cached data D05 is accessed between 00:30:05 and 00:50:05 and between 06:10:05 and 06:30:05. Cached data D06 is accessed only between 05:30:05 and 05:50:05. Cached data D07 is accessed evenly, between 00:30:05 and 01:10:05, between 03:10:05 and 03:50:05, and between 06:10:05 and 06:50:05. Cached data D08 is accessed only between 07:10:05 and 07:30:05; it may be the newest data, added after 07:10:05 in anticipation of demand. Cached data D09 is the most frequently accessed, in almost every interval except 04:30:05 to 04:50:05. Cached data D10 is accessed at random. Cached data D11 has no access record. Cached data D12 is accessed evenly, in 40-minute stretches separated by one 20-minute interval. Cached data D13 is accessed at random. Cached data D14 is accessed intensively between 00:50:05 and 04:30:05. Cached data D15 is accessed intensively between 02:50:05 and 06:50:05, except from 04:30:05 to 04:50:05. Cached data D16 has an access pattern similar to that of cached data D01.

Cached data D17 and D18 are both accessed evenly, but D17 receives more requests between 03:50:05 and 04:30:05, and D18 receives more requests between 01:50:05 and 03:10:05.

The main purpose of the present invention is to predict, from historical information, the data that will be requested at a specific time in the future, and to place that data in the cache 102 before the specific time arrives. The method for determining the data in the cache 102 of the cloud storage system 10 has several steps, shown in the flow chart of Figure 4. As described above, the method is executed by the processing content recorder 103. In the first step, S01, the processing content of the cache 102 of the cloud storage system 10 over a past period of time is recorded. Each processing content includes a recording time (for example, processing content ID 0015), or a recording time together with the cached data accessed during the past period of time (8 hours in the example). In the second step, S02, a specific time in the future is specified. The cache 102 receives the specific time in the future from the host 101. In this embodiment, the specific time in the future can be any future time or period. For example, it can be a specific minute of an hour (for every hour), a specific hour of a day (for every day), a specific day of a week (for every week), a specific day of a month (for every month), a specific day of a quarter (for every quarter), a specific day of a year (for every year), a specific week of a month (for every month), a specific week of a quarter (for every quarter), a specific week of a year (for every year), or a specific month of a year (for every year). In this embodiment, the processing content is used to determine which data should be cached before 00:00:00 (H00) on another day.
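Steps S01 and S02 can be pictured with a small data model. The following is only a sketch under stated assumptions: the names `ProcessingRecord`, `record_spans`, and `sub_unit_of_hour` are hypothetical, one record is written per 20-minute time span, and the recording time is taken to be the end of the span. None of these choices is fixed by the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProcessingRecord:
    """One processing content entry (hypothetical layout)."""
    record_id: int
    record_time: datetime            # assumed: end of the recorded time span
    accessed: frozenset = frozenset()  # IDs of cached data accessed in the span


def record_spans(start, span, accesses_per_span):
    """Step S01 (sketch): build one record per time span.

    accesses_per_span is a list of sets of cached-data IDs, one per span.
    A span with no access still yields a record that holds only a time.
    """
    records = []
    for i, accessed in enumerate(accesses_per_span, start=1):
        records.append(ProcessingRecord(i, start + i * span, frozenset(accessed)))
    return records


def sub_unit_of_hour(t, span_minutes=20):
    """Map a time to its sub-time unit (20-minute slot) within the hour."""
    return t.minute // span_minutes
```

With spans that start at 00:10:05, a call such as `record_spans(datetime(2018, 1, 1, 0, 10, 5), timedelta(minutes=20), [{"D01"}, set(), {"D01", "D09"}])` yields three records, the second of which carries a recording time but no accessed data.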

In the third step, S03, a time-related confidence is calculated for each cached data item in the processing content, based on a reference time period. Here the reference time period is "within specific minutes of an hour" (H00, each 20-minute span of the first hour of each day). In other examples, the reference time period may be "within a specific hour of a day" or "within a specific day of a year," depending on the number of recorded time spans. More generally, the reference time period may be "within a specific sub-time unit of a main time unit," for example, within the 24 hours of a day. The time-related confidence is calculated as follows: A. calculate a first number, which is the number of times the reference time period occurred within the past period of time; B. calculate a second number, which is the number of those reference time periods in which the target cached data was accessed; and C. divide the second number by the first number. For this embodiment, Figure 5 lists the calculated time-related confidences for all data. If the specific time in the future is the first minute of 8:00 AM and the reference time period covers all 20-minute spans in the past 8 hours, the results are as shown in Figure 6. As Figures 5 and 6 show, each cached data item has a different time-related confidence relative to the others, depending on the situation.
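The three sub-steps of S03 amount to a ratio: the second number divided by the first number. A minimal sketch follows, assuming the records are available as `(span_index, accessed_ids)` pairs and that the reference time period is expressed as a predicate on the span index. The function and parameter names are hypothetical, and the sample data is invented for illustration; it is not taken from Figures 3, 5, or 6.

```python
def time_related_confidence(records, candidates, in_reference_period):
    """Compute the time-related confidence of each candidate item.

    records: list of (span_index, set_of_accessed_ids) pairs.
    candidates: IDs of the cached data to score.
    in_reference_period: predicate telling whether a span belongs
        to the reference time period (an assumed representation).
    """
    # Sub-step A: occurrences of the reference period in the past period.
    reference = [accessed for span, accessed in records if in_reference_period(span)]
    first_number = len(reference)
    if first_number == 0:
        raise ValueError("reference time period never occurred in the records")

    confidences = {}
    for item in candidates:
        # Sub-step B: reference periods in which the item was accessed.
        second_number = sum(1 for accessed in reference if item in accessed)
        # Sub-step C: divide the second number by the first number.
        confidences[item] = second_number / first_number
    return confidences


# Two hours of 20-minute spans; the reference period is the first span of each hour.
sample = [
    (0, {"D01", "D09"}), (1, {"D09"}), (2, set()),
    (3, {"D09"}), (4, {"D01"}), (5, {"D09"}),
]
scores = time_related_confidence(sample, ["D01", "D09", "D11"], lambda s: s % 3 == 0)
```

In this sample the reference period occurs twice (spans 0 and 3), so `D01` scores 1/2, `D09` scores 2/2, and `D11`, which is never accessed, scores 0.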

In the fourth step, S04, the time-related confidences are ranked. The results for the examples are also shown in Figures 5 and 6, respectively. In the final step, S05, cached data with higher time-related confidences is placed in the cache 102, and when the cache 102 runs out of space before the specific time in the future, cached data with lower time-related confidences is removed from the cache 102. Take Figure 6 as an example. Before 00:00 on another day, perhaps at 11:59:59 PM, all data except D11 is stored in the cache 102 as new cached data to serve access requests after 00:00. D11 is removed because the cache 102 does not have enough space for all 18 items and D11 has a lower time-related confidence than the other data. Eighteen cached items are used for the analysis because one or more cached items have been removed by the cloud storage system 10 due to a low hit rate or other factors, and new data (D08) has been added; the total number of cached items used is 18. The newly cached data in the cache 102 is the data most likely to be requested after 08:00, as determined from the time-related confidences. Note that the data or cached data mentioned above may take the form of objects, blocks, or files.
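Steps S04 and S05 can be sketched as a sort followed by a capacity cut. The code below is an illustration only: `plan_cache_update` is a hypothetical name, capacity is counted in items rather than bytes, and ties are broken by item ID, which the description does not specify.

```python
def plan_cache_update(confidences, current_cache, capacity):
    """Rank items (S04) and decide what to add and remove (S05).

    confidences: dict mapping item ID to time-related confidence.
    current_cache: set of item IDs currently held in the cache.
    capacity: number of items the cache can hold (assumed unit).
    """
    # S04: highest confidence first; ties broken by ID (an assumption).
    ranked = sorted(confidences, key=lambda item: (-confidences[item], item))

    # S05: keep only as many top-ranked items as the cache can hold.
    keep = ranked[:capacity]
    to_add = [item for item in keep if item not in current_cache]
    # Anything cached but outside the kept set is removed, lowest ranks included.
    to_remove = sorted(item for item in current_cache if item not in keep)
    return ranked, to_add, to_remove


ranked, to_add, to_remove = plan_cache_update(
    {"D01": 0.5, "D09": 1.0, "D11": 0.0, "D14": 0.75},
    current_cache={"D01", "D11"},
    capacity=3,
)
```

Here the cache holds three items, so `D09` and `D14` are added, `D01` stays, and `D11`, which has the lowest confidence, is removed. This mirrors the removal of D11 in the Figure 6 example, although the numbers are invented.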

In other embodiments, the last step (S05) may differ, which means that the processing content recorder 103 functions differently than in the previous embodiment. In the modified step, cached data with higher time-related confidences and data calculated by at least one other caching algorithm are both placed in the cache 102, so that the cache 102 is filled before the specific time in the future. A fixed ratio exists between the cached data with higher time-related confidences and the data calculated by the other caching algorithms; the ratio is calculated based on either the number of data items or the space the data occupies. Returning to Figure 6: if the cache 102 is set to hold 20 items, with 60% allotted to data selected by the present method and the remaining 40% to data calculated by other caching algorithms, then the present method supplies D01, D02, D03, D07, D09, D10, D12, D13, D14, D15, D16, and D18, a total of 12 items, and the remaining items are supplied by the other caching algorithms. If the same cached data is proposed by both sources, the next lower-priority data from either the present method or the other caching algorithm may be used to fill the vacancy; the present invention does not limit this choice. In most cases, of course, the cache 102 is designed to cache data according to its capacity rather than the number of items. In terms of the example above, 60% of the capacity of the cache 102 would be reserved for data determined by the present invention and the remaining 40% for data proposed by at least one existing caching algorithm. Such caching algorithms include, but are not limited to, the Least Recently Used (LRU) algorithm, the Most Recently Used (MRU) algorithm, the Pseudo-LRU (PLRU) algorithm, the Random Replacement (RR) algorithm, the Segmented LRU (SLRU) algorithm, the 2-way set associative algorithm, the Least-Frequently Used (LFU) algorithm, the Low Inter-reference Recency Set (LIRS) algorithm, the Adaptive Replacement Cache (ARC) algorithm, the Clock with Adaptive Replacement (CAR) algorithm, the Multi Queue (MQ) algorithm, and the data-related algorithm defined in the background of the invention. Note that if the data-related algorithm is applied, its target data should be the result of the present method: the higher-ranked cached data obtained in step S04 is input to the data-related algorithm as target data to obtain that algorithm's result. In the cloud storage system 10, the processing content recorder 103 generates the target data for use by the data-related algorithm. The data-related algorithm may also be executed by the processing content recorder 103.
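The fixed-ratio variant of S05 can be sketched as a merge of two ranked lists. This is an illustrative sketch, not the claimed procedure: `mix_candidates` is a hypothetical name, the ratio is applied to the item count rather than to occupied space, and duplicates are resolved by taking the next candidate from the other algorithm's list, which is only one of the options the description allows.

```python
def mix_candidates(by_confidence, by_other, slots, ratio):
    """Fill `slots` cache entries from two ranked candidate lists.

    by_confidence: item IDs ranked by time-related confidence (best first).
    by_other: item IDs proposed by another caching algorithm (best first).
    ratio: share of slots reserved for the confidence-ranked items.
    """
    quota = round(slots * ratio)
    chosen = list(by_confidence[:quota])

    # Fill the remaining slots from the other algorithm, skipping duplicates
    # so that the next lower-priority candidate takes the vacant place.
    for item in by_other:
        if len(chosen) >= slots:
            break
        if item not in chosen:
            chosen.append(item)

    # If the other list runs short, fall back to lower-ranked confidence items.
    for item in by_confidence[quota:]:
        if len(chosen) >= slots:
            break
        if item not in chosen:
            chosen.append(item)
    return chosen


mixed = mix_candidates(
    ["D09", "D14", "D01", "D02", "D03"],
    ["D14", "D20", "D21", "D22"],
    slots=5,
    ratio=0.6,
)
```

With five slots and a 60% share, three items come from the confidence ranking; `D14` is proposed by both sources, so the other algorithm's next candidates `D20` and `D21` fill the remaining two slots. A capacity-based ratio would apply the same idea to item sizes instead of counts.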

The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of these features has been described; however, any combination of these features that involves no contradiction should be considered within the scope of this specification.

The embodiments described above illustrate only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and all of these fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Explanation of Symbols

1 Client computer; 3 Internet; 4 Server; 5 Cache; 6 Auxiliary memory; 10 Cloud storage system; 100 Server; 101 Host; 102 Cache; 103 Processing content recorder; 104 Auxiliary memory; 200 Internet; 301 Personal computer; 302 Tablet computer.

Claims (16)

1. A method for determining data in a cache of a cloud storage architecture, characterized in that the method comprises the steps of:
A. recording processing content of a cache of a cloud storage system over a past period of time, wherein each processing content includes a recording time, or a recording time and the cached data accessed during the past period of time, the recording time being the time at which the access to the cached data is recorded;
B. specifying a specific time in the future;
C. calculating, based on a reference time period, a time-related confidence for each cached data from the processing content, the reference time period being a specific sub-time unit selected within a main time unit;
D. ranking the time-related confidences; and
E. before the specific time in the future, adding cached data having higher time-related confidences to the cache and, when the cache is used up before the specific time in the future, removing cached data having lower time-related confidences from the cache,
wherein the time-related confidence is calculated by:
C1. calculating a first number, the first number being the number of times the reference time period occurred within the past period of time;
C2. calculating a second number, the second number being the number of those reference time periods in which the target cached data was accessed; and
C3. dividing the second number by the first number.

2. The method of claim 1, characterized in that the specific time comprises a specific minute of an hour, a specific hour of a day, a specific day of a week, a specific day of a month, a specific day of a quarter, a specific day of a year, a specific week of a month, a specific week of a quarter, a specific week of a year, or a specific month of a year.

3. The method of claim 1, characterized in that step A records the processing content periodically, once every time span.

4. The method of claim 1, characterized in that the reference time period comprises within a specific minute of an hour, within a specific hour of a day, or within a specific day of a year.

5. The method of claim 1, characterized in that the type of the data comprises objects, blocks, or files.

6. A method for determining data in a cache of a cloud storage architecture, characterized in that the method comprises the steps of:
A. recording processing content of a cache of a cloud storage system over a past period of time, wherein each processing content includes a recording time, or a recording time and the cached data accessed during the past period of time, the recording time being the time at which the access to the cached data is recorded;
B. specifying a specific time in the future;
C. calculating, based on a reference time period, a time-related confidence for each cached data from the processing content, the reference time period being a specific sub-time unit selected within a main time unit;
D. ranking the time-related confidences; and
E. before the specific time in the future, adding cached data having higher time-related confidences, together with data calculated by at least one other caching algorithm, to the cache for use by the cache at the specific time in the future, wherein a fixed ratio exists between the cached data having higher time-related confidences and the data calculated by the other caching algorithm,
wherein the time-related confidence is calculated by:
C1. calculating a first number, the first number being the number of times the reference time period occurred within the past period of time;
C2. calculating a second number, the second number being the number of those reference time periods in which the target cached data was accessed; and
C3. dividing the second number by the first number.

7. The method of claim 6, characterized in that the fixed ratio is calculated based on the number of data items or the space occupied by the data.

8. The method of claim 6, characterized in that the caching algorithm comprises an LRU algorithm, an MRU algorithm, a PLRU algorithm, an RR algorithm, an SLRU algorithm, a 2-way set associative algorithm, an LFU algorithm, an LIRS algorithm, an ARC algorithm, a CAR algorithm, an MQ algorithm, or a data-related algorithm that uses the result of step D as target data.

9. A cloud storage system, characterized in that the system comprises:
a host for accessing data;
a cache, connected to the host, for temporarily storing cached data for fast access;
a processing content recorder, configured in or installed on the cache and connected to the host, for: recording processing content of the cache over a past period of time, wherein each processing content includes a recording time, or a recording time and the cached data accessed during the past period of time; receiving a specific time in the future specified by the host; calculating, based on a reference time period, a time-related confidence for each cached data from the processing content; ranking the time-related confidences; and, before the specific time in the future, adding cached data having higher time-related confidences to the cache and, when the cache is used up before the specific time in the future, removing cached data having lower time-related confidences from the cache, wherein the recording time is the time at which the access to the cached data is recorded, and the reference time period is a specific sub-time unit selected within a main time unit; and
a plurality of auxiliary memories, connected to the host, for storing data in a distributed manner for access,
wherein the time-related confidence is calculated by:
C1. calculating a first number, the first number being the number of times the reference time period occurred within the past period of time;
C2. calculating a second number, the second number being the number of those reference time periods in which the target cached data was accessed; and
C3. dividing the second number by the first number.

10. The cloud storage system of claim 9, characterized in that a fixed ratio exists between the cached data having higher time-related confidences and data calculated by other caching algorithms, the fixed ratio being calculated based on the number of data items or the space occupied by the data.

11. The cloud storage system of claim 9, characterized in that the specific time comprises a specific minute of an hour, a specific hour of a day, a specific day of a week, a specific day of a month, a specific day of a quarter, a specific day of a year, a specific week of a month, a specific week of a quarter, a specific week of a year, or a specific month of a year.

12. The cloud storage system of claim 9, characterized in that the processing content recorder records the processing content periodically, once every time span.

13. The cloud storage system of claim 9, characterized in that the reference time period comprises within a specific minute of an hour, within a specific hour of a day, or within a specific day of a year.

14. A cloud storage system, characterized in that the system comprises:
a host for accessing data;
a cache, connected to the host, for temporarily storing cached data for fast access;
a processing content recorder, configured in or installed on the cache and connected to the host, for: recording processing content of the cache over a past period of time, wherein each processing content includes a recording time, or a recording time and the cached data accessed during the past period of time; receiving a specific time in the future specified by the host; calculating, based on a reference time period, a time-related confidence for each cached data from the processing content; ranking the time-related confidences; and, before the specific time in the future, adding cached data having higher time-related confidences, together with data calculated by at least one other caching algorithm, to the cache for use by the cache at the specific time in the future, wherein a fixed ratio exists between the cached data having higher time-related confidences and the data calculated by the other caching algorithm, the recording time is the time at which the access to the cached data is recorded, and the reference time period is a specific sub-time unit selected within a main time unit; and
a plurality of auxiliary memories, connected to the host, for storing data in a distributed manner for access,
wherein the time-related confidence is calculated by:
C1. calculating a first number, the first number being the number of times the reference time period occurred within the past period of time;
C2. calculating a second number, the second number being the number of those reference time periods in which the target cached data was accessed; and
C3. dividing the second number by the first number.

15. The cloud storage system of claim 14, characterized in that the caching algorithm comprises an LRU algorithm, an MRU algorithm, a PLRU algorithm, an RR algorithm, an SLRU algorithm, a 2-way set associative algorithm, an LFU algorithm, an LIRS algorithm, an ARC algorithm, a CAR algorithm, an MQ algorithm, or a data-related algorithm that uses data generated by the processing content recorder as target data.

16. The cloud storage system of claim 14, characterized in that the type of the data comprises objects, blocks, or files.
HK18110770.5A 2018-08-21 Method for determining data in cache memory of cloud storage architecture and cloud storage system using the same HK1251377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
HK18110770.5A HK1251377B (en) 2018-08-21 Method for determining data in cache memory of cloud storage architecture and cloud storage system using the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
HK18110770.5A HK1251377B (en) 2018-08-21 Method for determining data in cache memory of cloud storage architecture and cloud storage system using the same

Publications (2)

Publication Number Publication Date
HK1251377A1 HK1251377A1 (en) 2019-01-25
HK1251377B true HK1251377B (en) 2021-07-09
