CN102024057A - Method and device for building index of mass data record

Info

Publication number: CN102024057A
Application number: CN2010106063580A
Authority: CN
Inventors: 王俊; 程宁; 王冲
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2010-12-24
Filing date: 2010-12-24
Publication date: 2011-04-20
Anticipated expiration: 2030-12-24
Also published as: CN102024057B; WO2012083877A1

Abstract

The invention discloses a method and device for building an index of mass data records. The method comprises: acquiring the current system time when a new file write request message is received; generating an index key for the file according to the current system time and the file identifier of the file that the write request message asks to write; and establishing an association between the index key and the file. The invention enables fast positioning in mass data storage.

Description

Index building method and device for massive data records

Technical Field

The present invention relates to the field of computer and communication technologies, and in particular to a method and device for establishing an index of massive data records.

Background

Current in-memory databases usually use attributes such as the file name or the write time as the index key. For massive data storage, however, indexing in this way is slow, or the index result is not unique. For example, in current IPTV systems, stored media files are accessed through a distributed file system that uses an in-memory database to manage the file system metadata. In that application the system is required to support five million file records and twenty million chunk (CHUNK) records, and existing indexing methods cannot meet the requirement for fast positioning.

Summary of the Invention

The main purpose of the present invention is to provide a method and device for establishing an index of massive data records, so as to solve at least one of the above problems.

According to one aspect of the present invention, a method for establishing an index of massive data records is provided, comprising: obtaining the current system time when a new file write request message is received; generating an index key for the file according to the current system time and the file identifier of the file that the write request message asks to write; and establishing an association between the index key and the file.

The file may be a distributed file.

The file identifier may include a first logical file identifier of the logical file of the distributed file. Generating the index key according to the current system time and the file identifier then includes: Step A: obtaining a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time; Step B: generating the first logical file identifier according to a preset configuration strategy; Step C: combining the first time domain parameter and the first logical file identifier into a lookup key value; Step D: searching the recorded data area for the lookup key value; if the lookup key value cannot be found, or if the first time domain parameter and the first logical file identifier are not identical to the second time domain parameter and the second logical file identifier indicated by the lookup key value that was found, the first logical file identifier is valid, the lookup key value combined from the first time domain parameter and the first logical file identifier is used as the unique ID of the logical file, and the lookup key value is used as the index key.

Alternatively, the file identifier may include a first logical file identifier of the logical file of the distributed file and a first fragment file identifier of a fragment file of the distributed file. Generating the index key according to the current system time and the file identifier then includes: Step A: obtaining a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time; Step B: generating the first logical file identifier and the first fragment file identifier according to a preset configuration strategy; Step C: combining the first time domain parameter, the first logical file identifier and the first fragment file identifier into a lookup key value; Step D: searching the recorded data area for the lookup key value; if the lookup key value cannot be found, or if the first time domain parameter and the first logical file identifier are not identical to the second time domain parameter and the second logical file identifier indicated by the lookup key value that was found, and the first time domain parameter and the first fragment file identifier are not identical to the second time domain parameter and the second fragment file identifier, the combination of the first logical file identifier and the first fragment file identifier is valid, the lookup key value combined from the first time domain parameter, the first logical file identifier and the first fragment file identifier is used as the unique ID of the fragment file, and the lookup key value is used as the index key.

If the first time domain parameter is the same as the second time domain parameter and the first fragment file identifier is the same as the second fragment file identifier, the method further includes: modifying the value of the first fragment file identifier to generate a new fragment file identifier, taking the new fragment file identifier as the first fragment file identifier, and returning to Step C.

Generating the first fragment file identifier may include: obtaining the value of the most recently generated fragment file identifier and increasing it by a specified increment to obtain the first fragment file identifier, where the number of bits occupied by the fragment file identifier is determined by the configuration strategy. Modifying the value of the first fragment file identifier to generate a new fragment file identifier may then include: increasing the value of the first fragment file identifier by the specified increment to obtain the new fragment file identifier.

If the first time domain parameter is the same as the second time domain parameter and the first logical file identifier is the same as the second logical file identifier, the method further includes: modifying the value of the first logical file identifier to generate a new logical file identifier, taking the new logical file identifier as the first logical file identifier, and returning to Step C.

Generating the first logical file identifier may include: obtaining the value of the most recently generated logical file identifier and increasing it by a specified increment to obtain the first logical file identifier, where the number of bits occupied by the logical file identifier is determined by the configuration strategy. Modifying the value of the first logical file identifier to generate a new logical file identifier may then include: increasing the value of the first logical file identifier by the specified increment to obtain the new logical file identifier.

Step A may include: obtaining, according to the total duration, the value of each bit field in the time domain, where the bit fields of the time domain include a year field, an hour field or minute field, and a second field; the total duration is m years, n hours (or minutes) and k seconds; the year field records the value of m, the hour field or minute field records the value of n, and the second field records the value of k, with m, n and k being integers greater than or equal to 0; and mixing the time domain bit by bit according to a configuration strategy to obtain the first time domain parameter. The configuration strategy includes: aligning the number of seconds in a unit hour with the time domain to the same order of magnitude, and moving the bits of the year field and of the hour field or minute field to the low-order positions to obtain the first time domain parameter; or aligning the number of hours or minutes in a unit year with the time domain to the same order of magnitude, and moving the bits of the year field to lower-order positions to obtain the first time domain parameter.

After the association between the index key and the file is established, the method may further include: when concurrent write requests with the same time exceed a preset ratio, modifying or configuring the bits occupied by the logical file identifier and the fragment file identifier of the distributed file, and/or modifying or configuring the configuration strategy used for bitwise mixing to obtain the first time domain parameter, and returning to Step B to regenerate a new index key for the distributed file.

The lookup key value may be synthesized by the folding-and-modulo method.

Establishing the association between the index key and the file may include: requesting an empty record location in the in-memory database that records the file, storing the correspondence between the logical file name of the distributed file and the index key in that record location, and adding the index key to the index data area; and, when storing a fragment file of the distributed file, using the index key as the file name under which the fragment file is actually stored.

According to another aspect of the present invention, a device for establishing an index of massive data records is provided, comprising: an acquisition module, configured to obtain the current system time when a new file write request is received; a generation module, configured to generate the index key of the distributed file according to the current system time and the file identifier of the file that the write request message asks to write; and an establishment module, configured to establish the association between the index key and the file.

The file may be a distributed file, and the file identifier may include a first logical file identifier of the logical file of the distributed file. The generation module then includes: an acquisition submodule, configured to obtain a first time domain parameter according to the total duration elapsed from a predetermined time to the current system time; a generation submodule, configured to generate the first logical file identifier according to a preset configuration strategy; a synthesis submodule, configured to combine the first time domain parameter and the first logical file identifier into a lookup key value; and a lookup submodule, configured to search the recorded data area for the lookup key value and, if the lookup key value cannot be found, or if the first time domain parameter and the first logical file identifier are not identical to the second time domain parameter and the second logical file identifier indicated by the lookup key value that was found, to use the lookup key value as the index key.

The file identifier may further include a first fragment file identifier of a fragment file of the distributed file. In that case the generation submodule is further configured to generate the first fragment file identifier according to the preset configuration strategy; the synthesis submodule is configured to combine the first time domain parameter, the first logical file identifier and the first fragment file identifier into a lookup key value; and the lookup submodule is configured to search the recorded data area for the lookup key value and, if the lookup key value cannot be found, or if the first time domain parameter and the first logical file identifier are not identical to the second time domain parameter and the second logical file identifier indicated by the lookup key value that was found, and the first time domain parameter and the first fragment file identifier are not identical to the second time domain parameter and the second fragment file identifier, to use the lookup key value as the index key.

With the present invention, the write time and the file identifier of the file are combined to form the index key. This solves the prior-art problem of slow positioning in massive data storage and thereby achieves fast positioning.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of a method for establishing an index of massive data records according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of one way of composing an index key according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of another way of composing an index key according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the bit field distribution of the time domain according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the bit field distribution of the time domain after mixing by seconds according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the bit field distribution of the time domain after mixing by hours according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of one bit field distribution of the logical file identifier and the fragment file identifier according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of another bit field distribution of the logical file identifier and the fragment file identifier according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of yet another bit field distribution of the logical file identifier and the fragment file identifier according to an embodiment of the present invention;

Fig. 10 is a flowchart of generating the index key of a distributed file according to an embodiment of the present invention;

Fig. 11 is a schematic structural diagram of a device for establishing an index of massive data records according to an embodiment of the present invention;

Fig. 12 is a schematic structural diagram of the generation module 20 according to an embodiment of the present invention.

Detailed Description

The present invention is described in detail below with reference to the drawings and in combination with the embodiments. Provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.

Fig. 1 is a flowchart of a method for establishing an index of massive data records according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps (step S102 to step S106):

Step S102: when a new file write request message is received, obtain the current system time;

In this embodiment of the present invention, for each new file write request, obtaining the current system time may mean obtaining the total duration elapsed from a predetermined time (for example, 00:00 on January 1, 1970) to the present. This total duration may be recorded as m years, n hours and k seconds, or as m years, n minutes and k seconds.
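The decomposition of the elapsed time into m years, n hours and k seconds can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the names `split_elapsed` and `current_elapsed` are hypothetical, and a fixed 365-day year is assumed because the text does not say how a year is measured.

```python
import time
from typing import Tuple

SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 365 * 24  # assumption: fixed 365-day year, leap days ignored


def split_elapsed(total_seconds: int) -> Tuple[int, int, int]:
    """Split the seconds elapsed since the reference time into (m, n, k).

    m = whole years, n = remaining whole hours, k = remaining seconds.
    """
    if total_seconds < 0:
        raise ValueError("elapsed time must be non-negative")
    total_hours, k = divmod(total_seconds, SECONDS_PER_HOUR)
    m, n = divmod(total_hours, HOURS_PER_YEAR)
    return m, n, k


def current_elapsed() -> Tuple[int, int, int]:
    """Decompose the current system time relative to 1970-01-01 00:00 UTC."""
    return split_elapsed(int(time.time()))
```

With this split, n is always below 8760 and k below 3600, which bounds the width each bit field needs.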

Step S104: generate the index key of the file according to the current system time and the file identifier of the file that the write request message asks to write;

Take a distributed file as an example. When a distributed file is stored as fragment files, the metadata is divided into logical files and fragment files. When generating the index key of the file, the current system time obtained in step S102 can be mixed bit by bit into a time domain parameter, which is then combined with the logical file identifier of the logical file of the distributed file to form the index key, as shown in Fig. 2. Alternatively, the time domain parameter can be combined with both the logical file identifier of the logical file and the fragment file identifier of the fragment file of the distributed file to form the index key, as shown in Fig. 3. For example, the index key can be obtained in either of the following two ways:

Method 1: the index key consists of the time domain parameter and the logical file identifier. It mainly includes the following steps:

Step A: obtain the time domain parameter according to the total duration elapsed from the predetermined time to the current system time;

Step B: generate the logical file identifier according to a preset configuration strategy;

Step C: combine the time domain parameter and the logical file identifier into a lookup key value;

Step D: search the recorded data area for the lookup key value. If the lookup key value cannot be found, or if the time domain parameter and the logical file identifier are not identical to the time domain parameter and the logical file identifier indicated by the lookup key value that was found, the logical file identifier generated in Step B is valid and available. The lookup key value combined from the time domain parameter obtained in Step A and the logical file identifier generated in Step B is then used as the unique ID of the logical file, and the lookup key value is used as the index key.

Method 2: the index key consists of the time domain parameter, the logical file identifier and the fragment file identifier. It mainly includes the following steps:

Step A: as in Method 1, obtain the time domain parameter according to the total duration elapsed from the predetermined time to the current system time;

Step B: generate the logical file identifier and the fragment file identifier according to a preset configuration strategy;

Step C: combine the time domain parameter, the logical file identifier and the fragment file identifier into a lookup key value;

Step D: search the recorded data area for the lookup key value. If the lookup key value cannot be found, or if the time domain parameter from Step A and the logical file identifier from Step B are not identical to the time domain parameter and the logical file identifier indicated by the lookup key value that was found, and the time domain parameter from Step A and the fragment file identifier from Step B are not identical to the time domain parameter and the fragment file identifier indicated by the lookup key value that was found, the combination of the logical file identifier and the fragment file identifier generated in Step B is valid and available. The lookup key value combined from the time domain parameter obtained in Step A and the logical file identifier and fragment file identifier generated in Step B is then used as the unique ID of the fragment file, and the lookup key value is used as the index key.

In Step A of both Method 1 and Method 2, the value of each bit field in the time domain is obtained according to the total duration. As shown in Fig. 4, the time domain may include a year field, an hour field and a second field. For example, if the total duration is m years, n hours and k seconds, the year field records the value of m, the hour field records the value of n, and the second field records the value of k, where m, n and k are integers greater than or equal to 0. The year field may occupy 8 bits, the hour field 13 to 14 bits, and the second field 13 to 14 bits. Alternatively, the total duration may be expressed as m years, n minutes and k seconds, in which case the hour field in Fig. 4 is replaced by a minute field that records the value of n. The time domain can then be mixed bit by bit according to the configuration strategy to obtain a new time domain parameter.
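To make the bit field layout concrete, the sketch below packs m, n and k into a single integer. It is only an illustration: Fig. 4 is not reproduced here, so the ordering (year in the highest bits, then hour, then second) is an assumption, the widths of 8, 14 and 13 bits are picked from the ranges quoted above, and the function names are hypothetical.

```python
from typing import Tuple

# Widths chosen from the ranges given in the text (8, 13-14, 13-14 bits).
YEAR_BITS, HOUR_BITS, SECOND_BITS = 8, 14, 13


def pack_time_field(m: int, n: int, k: int) -> int:
    """Pack (year, hour, second) into one integer; assumed order: year | hour | second."""
    for name, value, bits in (
        ("year", m, YEAR_BITS),
        ("hour", n, HOUR_BITS),
        ("second", k, SECOND_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} value {value} does not fit in {bits} bits")
    return (m << (HOUR_BITS + SECOND_BITS)) | (n << SECOND_BITS) | k


def unpack_time_field(field: int) -> Tuple[int, int, int]:
    """Inverse of pack_time_field: recover (m, n, k)."""
    k = field & ((1 << SECOND_BITS) - 1)
    n = (field >> SECOND_BITS) & ((1 << HOUR_BITS) - 1)
    m = field >> (HOUR_BITS + SECOND_BITS)
    return m, n, k
```

Fourteen bits are used for the hour field because a year holds up to 8760 hours, which does not fit in 13 bits.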

There may be several strategies for bitwise mixing of the time domain. For example, they may include the following two:

(1) Balancing based on the number of seconds in a unit hour:

The number of seconds within a unit hour, a relatively continuously varying quantity, is aligned to the same order of magnitude as the size of the target data set, and the bits above the hour are moved to lower-order positions. Fig. 5 shows the bit field distribution of the time domain after mixing by seconds. Mixing in this way disperses the index keys generated for a large number of files within a short period.

(2) Balancing based on the number of hours or minutes in a unit year:

As above, the number of hours or minutes within a unit year, a relatively continuously varying quantity, is aligned to the same order of magnitude as the size of the target data set, and the bits below the hour are moved to lower-order positions. Fig. 6 shows the bit field distribution of the time domain after mixing by hours. Mixing in this way disperses the index keys generated for a large number of files over a long period.
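One possible reading of the two mixing strategies is sketched below. Figs. 5 and 6 are not available here, so the exact layouts are assumptions: in the "second" strategy the second field is placed in the highest bits with the year and hour fields below it, and in the "hour" strategy the hour field is placed highest with the year field moved below it. The field widths and the name `mix_time_field` are likewise hypothetical.

```python
YEAR_BITS, HOUR_BITS, SECOND_BITS = 8, 14, 13  # assumed widths


def _mask(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits of value."""
    return value & ((1 << bits) - 1)


def mix_time_field(m: int, n: int, k: int, strategy: str) -> int:
    """Reorder the time bit fields so the fastest-changing field sits highest."""
    year = _mask(m, YEAR_BITS)
    hour = _mask(n, HOUR_BITS)
    second = _mask(k, SECOND_BITS)
    if strategy == "second":
        # Assumed layout: second | year | hour
        return (second << (YEAR_BITS + HOUR_BITS)) | (year << HOUR_BITS) | hour
    if strategy == "hour":
        # Assumed layout: hour | year | second
        return (hour << (YEAR_BITS + SECOND_BITS)) | (year << SECOND_BITS) | second
    raise ValueError(f"unknown mixing strategy: {strategy!r}")
```

Under the "second" layout, two requests one second apart differ in the high-order bits of the result, so keys created in quick succession land far apart in the key space.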

In Method 2, if the time domain parameter from Step A and the fragment file identifier from Step B are both the same as the time domain parameter and the fragment file identifier indicated by the lookup key value that was found, the value of the fragment file identifier from Step B must be modified to obtain a new fragment file identifier. The new fragment file identifier is taken as the fragment file identifier of the distributed file, and the procedure returns to Step C to generate a new index key, repeating until an unoccupied fragment file identifier is found.

In both Method 1 and Method 2, if the time domain parameter from Step A and the logical file identifier from Step B are both the same as the time domain parameter and the logical file identifier indicated by the lookup key value that was found, the logical file identifier is already occupied at that time. The logical file identifier is then modified to generate a new logical file identifier, and the procedure returns to Step C to generate a new index key, repeating until a logical file identifier that is unoccupied at that time is found.

In this embodiment of the present invention, for convenience, the logical file identifier and the fragment file identifier may each be maintained as a global incrementing counter. When the logical file identifier and the fragment file identifier are generated in Step B, the value of the most recently generated logical file identifier or fragment file identifier is obtained and increased by a specified increment to give the logical file identifier and the fragment file identifier of the distributed file. When a file identifier has to be modified as described above, it can likewise be increased by the specified increment (for example, 1) to obtain the new file identifier.
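The counter-based generation and the retry on collision can be sketched together for Method 1. This is an illustration under stated assumptions, not the patented implementation: the class `LogicalIdAllocator` is hypothetical, a Python set stands in for the recorded data area, the key is kept as a (time, identifier) tuple instead of a packed value, and the counter wraps around at the configured bit width.

```python
from typing import Set, Tuple


class LogicalIdAllocator:
    """Hands out (time_param, logical_id) pairs that are not yet in use."""

    def __init__(self, id_bits: int = 16, increment: int = 1) -> None:
        self.id_bits = id_bits      # width set by the configuration strategy
        self.increment = increment  # the "specified increment"
        self.last_id = 0            # most recently generated identifier
        self.used: Set[Tuple[int, int]] = set()  # stand-in for the data area

    def _next_id(self, value: int) -> int:
        return (value + self.increment) % (1 << self.id_bits)

    def allocate(self, time_param: int) -> Tuple[int, int]:
        candidate = self._next_id(self.last_id)        # Step B
        for _ in range(1 << self.id_bits):             # bounded retry
            key = (time_param, candidate)              # Step C
            if key not in self.used:                   # Step D: free, so valid
                self.used.add(key)
                self.last_id = candidate
                return key
            candidate = self._next_id(candidate)       # occupied: modify and retry
        raise RuntimeError("no free logical file identifier for this time value")
```

For example, if the pair (5, 1) is already recorded, `allocate(5)` on a fresh allocator skips identifier 1 and returns (5, 2).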

In this embodiment of the present invention, the bits occupied by the logical file identifier and the fragment file identifier are determined by the size of the logical files and of the fragment files. For example, for a small number of large files, the bit field distribution shown in Fig. 7 can be used, in which the fragment file identifier occupies more bits. For a large number of small files, the bit field distribution shown in Fig. 8 can be used, in which the logical file identifier occupies more bits. In the mixed case, where there are both large files and many small files, or where the ratio of large to small files cannot be determined, the bits can simply be divided evenly, that is, using the bit field distribution shown in Fig. 9. Selecting different bit field distributions gives different file identifier increments for files processed at the same point in time, so the scheme can be adapted to different application scenarios.
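The choice between the three distributions can be expressed as a small configuration table. The sketch below is illustrative only: Figs. 7 to 9 are not reproduced here, so the 32-bit total and the 8/24, 24/8 and 16/16 splits are assumed values, and the profile names and `pack_ids` are hypothetical.

```python
# (logical_bits, fragment_bits) per deployment profile; every split sums to an
# assumed total of 32 bits. All values are illustrative, not taken from the figures.
PROFILES = {
    "large_files": (8, 24),   # few large files: more bits for fragment IDs
    "small_files": (24, 8),   # many small files: more bits for logical IDs
    "mixed": (16, 16),        # unknown or mixed workload: even split
}


def pack_ids(logical_id: int, fragment_id: int, profile: str) -> int:
    """Combine both identifiers into one value using the profile's bit split."""
    logical_bits, fragment_bits = PROFILES[profile]
    if not 0 <= logical_id < (1 << logical_bits):
        raise ValueError("logical file identifier out of range for this profile")
    if not 0 <= fragment_id < (1 << fragment_bits):
        raise ValueError("fragment file identifier out of range for this profile")
    return (logical_id << fragment_bits) | fragment_id
```

With the "large_files" profile a single logical file can address up to 2^24 fragments, while only 256 logical identifiers are available per time value.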

In Step C of Methods 1 and 2, the lookup key value may be synthesized by the folding-and-modulo method. For example, in Method 1, the time domain parameter generated in Step A is added to the logical file ID generated in Step B, or the time domain parameter and the logical file ID are concatenated into a 64-bit value, with any missing bits filled with 0, to obtain a keyword. The lookup key value is the result of an arithmetic modulo operation of this keyword by the application scale value. The application scale value is the total number of files that the system is expected to store in the current application scenario.
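The concatenation variant of this step can be sketched as follows. The text does not say how the 64 bits are divided or where the zero padding goes, so this sketch assumes the time domain parameter occupies the upper 32 bits and the logical file ID the lower 32 bits, each zero-extended; the function names are hypothetical.

```python
MASK_32 = (1 << 32) - 1


def make_keyword(time_param: int, logical_id: int) -> int:
    """Concatenate both parts into one 64-bit keyword (assumed 32/32 split)."""
    if not 0 <= time_param <= MASK_32 or not 0 <= logical_id <= MASK_32:
        raise ValueError("each part must fit in 32 bits in this sketch")
    return (time_param << 32) | logical_id


def make_lookup_key(time_param: int, logical_id: int, app_scale: int) -> int:
    """Reduce the keyword modulo the application scale value."""
    if app_scale <= 0:
        raise ValueError("application scale value must be positive")
    return make_keyword(time_param, logical_id) % app_scale
```

Because the modulo maps many keywords to the same lookup key value, a hit in the data area does not by itself mean the identifier is taken, which is why Step D also compares the stored time domain parameter and file identifier.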

Step S106: establish the association between the index key and the file.

For example, for distributed file storage, an empty record location can be requested in the in-memory database, and the correspondence between the logical file name and the index key obtained in step S104 is stored in that record location. At the same time the index key is added to the index data area, so that the index key corresponds to the storage location of the logical file of the distributed file. This completes the association between the index key and the logical file.

Once the logical file is associated with the index key, the physical storage file, that is, the fragment file, uses the index key as its actual stored file name, which realizes the mapping from the logical file name to physical storage. The physical storage file name can be looked up from the logical file name, and the logical file name can be looked up from the physical storage file name, that is, from the index key.
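The two-way lookup between logical file names and stored file names can be sketched with a pair of dictionaries. This is only an illustration: the class `FileIndex` is hypothetical, the dictionaries stand in for the in-memory database and the index data area, and rendering the index key as 16 hexadecimal digits is an assumed naming convention.

```python
from typing import Dict


class FileIndex:
    """Maps logical file names to index keys and back."""

    def __init__(self) -> None:
        self._key_by_name: Dict[str, int] = {}  # stand-in for the record area
        self._name_by_key: Dict[int, str] = {}  # stand-in for the index data area

    @staticmethod
    def storage_name(index_key: int) -> str:
        """Render the index key as the name used for the stored fragment file."""
        return format(index_key, "016x")

    def register(self, logical_name: str, index_key: int) -> str:
        """Record the association and return the stored file name."""
        if logical_name in self._key_by_name or index_key in self._name_by_key:
            raise ValueError("logical name or index key already registered")
        self._key_by_name[logical_name] = index_key
        self._name_by_key[index_key] = logical_name
        return self.storage_name(index_key)

    def storage_name_for(self, logical_name: str) -> str:
        return self.storage_name(self._key_by_name[logical_name])

    def logical_name_for(self, storage_name: str) -> str:
        return self._name_by_key[int(storage_name, 16)]
```

Registering a logical name returns the stored file name, and either name can then be used to recover the other.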

The following takes the generation of a distributed file's index key from its logical file identifier as an example to illustrate how the index key of a distributed file is generated in this embodiment of the present invention.

As shown in Figure 10, in this embodiment the index key of a distributed file can be generated by the following steps:

Step 1001: on receiving a request to write a distributed file, obtain the system time as a total number of seconds;

Step 1002: determine whether the time-domain value is to be generated using the balancing strategy based on the number of seconds in a unit hour or the balancing strategy based on the number of hours in a unit year;

Step 1003: generate the time-domain value using the strategy indicated in step 1002;

Step 1004: determine whether to use the bit-field layout for large files or the bit-field layout for small files, that is, the layout shown in Figure 7 or the layout shown in Figure 8;

Step 1005: generate the logical file ID of the distributed file using the strategy selected in step 1004;

Step 1006: synthesize the time-domain value generated in step 1003 and the logical file ID generated in step 1005 into a new lookup key;

Step 1007: use the lookup key synthesized in step 1006 to check whether a value for that lookup key can be found in the hash entry data area; if so, go to step 1008; otherwise, go to step 1010;

Step 1008: determine whether the time-domain value generated in step 1003 and the logical file ID generated in step 1005 are both identical to the time-domain value and logical file ID indicated by the value found; if so, go to step 1009; otherwise, go to step 1010;

Step 1009: add an increment to the logical file ID generated in step 1005 to obtain a new logical file ID for the distributed file, and return to step 1006;

Step 1010: combine the time-domain value generated in step 1003 with the logical file ID generated in step 1005 as the unique ID of the logical file, and use the currently generated lookup key as the index key of the distributed file.
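The loop formed by steps 1006 to 1010 can be sketched as below. The sketch makes several assumptions that the description does not fix: the hash entry data area is modelled as a dictionary from lookup key to a set of `(time_value, file_id)` pairs, the lookup key is formed by concatenation followed by a modulo, and the names `assign_index_key`, `buckets`, `file_id_bits` and `increment` are hypothetical.

```python
def assign_index_key(time_value, first_file_id, buckets, scale,
                     file_id_bits=28, increment=1):
    """Return (lookup_key, unique_id) for a new logical file.

    buckets maps a lookup key to the set of (time_value, file_id) pairs already
    stored under it; this stands in for the hash entry data area.
    """
    file_id = first_file_id
    while file_id < (1 << file_id_bits):
        # Step 1006: synthesize the lookup key from the time value and file ID.
        lookup_key = ((time_value << file_id_bits) | file_id) % scale
        # Step 1007: look the key up in the hash entry data area.
        bucket = buckets.setdefault(lookup_key, set())
        # Step 1008: the ID is taken only if both time value and file ID match.
        if (time_value, file_id) not in bucket:
            # Step 1010: accept the pair as the unique ID and the key as index key.
            bucket.add((time_value, file_id))
            return lookup_key, (time_value, file_id)
        # Step 1009: bump the file ID and go back to step 1006.
        file_id += increment
    raise OverflowError("no free logical file ID left for this time value")


if __name__ == "__main__":
    table = {}
    print(assign_index_key(7, 0, table, scale=1000, file_id_bits=8))
    print(assign_index_key(7, 0, table, scale=1000, file_id_bits=8))
```

Running the example twice with the same time value and starting ID shows the second request being moved to the next file ID, which is the behaviour step 1009 describes.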

In this embodiment of the present invention, after the index keys of files have been established, the index keys of the stored files can also be modified according to actual application conditions. For example, where IPTV is deployed at multiple sites, data from the running equipment can be analysed to obtain distribution statistics per month, per day and per hour. These statistics are then analysed. The time distribution of writes may be unbalanced, with most write requests overlapping in time, that is, with many write requests arriving within the same second: in statistics over millions of records the peak may reach 400 concurrent write requests in one second, and in statistics over tens of millions of records it may exceed 2,000. If the write requests that are concurrent in time exceed a predetermined ratio (for example, 80%), the choice of bit mixing for the offset becomes critical. Therefore, in this embodiment, the bits occupied by the logical file identifier and the fragment file identifier of a distributed file can be modified or configured, as can the order in which the logical file identifier (and the fragment file identifier) is mixed with the time-domain value, which effectively avoids data conflicts. Similarly, if the statistics are highly correlated with the time domain, the range and order of bit mixing in the time domain (that is, the bitwise mixing strategy of the time domain) can be modified or configured, which likewise avoids data conflicts and improves indexing efficiency. While the modification is in progress the system's index query function is temporarily unavailable and a new index has to be built; once the new index has been created, the system can serve requests more efficiently.
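The check on time-concurrent write requests can be sketched as follows. The description does not define how the ratio is computed, so this sketch assumes one reading: the fraction of write requests whose arrival second is shared with at least one other request. The function names and the 0.8 default (taken from the 80% example) are illustrative.

```python
from collections import Counter


def concurrent_write_ratio(write_times):
    """Fraction of write requests that share their arrival second with another request.

    write_times is an iterable of timestamps in seconds; this definition of the
    ratio is an assumption, not one given in the description.
    """
    per_second = Counter(int(t) for t in write_times)
    total = sum(per_second.values())
    if total == 0:
        return 0.0
    concurrent = sum(count for count in per_second.values() if count > 1)
    return concurrent / total


def needs_reindex(write_times, threshold=0.8):
    """True when the concurrent share exceeds the preset ratio."""
    return concurrent_write_ratio(write_times) > threshold


if __name__ == "__main__":
    sample = [100.1, 100.5, 100.9, 101.2, 102.0, 102.3]
    print(concurrent_write_ratio(sample), needs_reindex(sample))
```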

It should be noted that although this embodiment of the present invention is described using the storage of distributed files as an example, it is not limited to that case. The technical solution provided by this embodiment can also be applied to the storage of other files. For example, if a file does not comprise a logical file and fragment files, its index key can be synthesized directly from the identifier of the file and the time-domain value. The implementation is similar to that for distributed files and is not repeated here.

Figure 11 shows an index building device for massive data records according to an embodiment of the present invention. As shown in Figure 11, the device mainly comprises: an acquisition module 10, configured to obtain the current system time when a new write-file request is received; a generation module 20, configured to generate the index key of the distributed file according to the current system time and the file identifier of the file that the write-file request message requests to write; and an establishment module 30, configured to establish an association between the index key and the file.

For a distributed file, the generation module 20 may generate the index from the first logical file identifier of the distributed file and the current system time. In this case, as shown in Figure 12, the generation module 20 may comprise: an acquisition submodule 210, configured to obtain a first time-domain parameter from the total time that has elapsed between a predetermined time and the current system time; a generation submodule 220, configured to generate the first logical file identifier according to a preset configuration strategy; a synthesis submodule 230, configured to synthesize the first time-domain parameter and the first logical file identifier into a lookup key; and a lookup submodule 240, configured to look up the lookup key in the recorded data area and, if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and second logical file identifier indicated by the lookup key that was found, to use the lookup key as the index key.

The acquisition submodule 210 obtains the value of each bit field of the time domain from the total duration described above. The time domain may comprise a year field, an hour field and a second field. For example, if the total duration is m years, n hours and k seconds, the year field records the value of m, the hour field records the value of n, and the second field records the value of k, where m, n and k are integers greater than or equal to 0. The year field may occupy 8 bits, the hour field 13 to 14 bits, and the second field 13 to 14 bits. Alternatively, the total duration may be expressed as m years, n minutes and k seconds, in which case the hour field in Figure 4 is replaced by a minute field that records the value of n. The time domain can then be mixed bitwise according to the configuration strategy to obtain a new time-domain parameter, using either the balancing strategy based on the number of seconds in a unit hour or the balancing strategy based on the number of hours or minutes in a unit year, both described above.
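The decomposition of the total duration into year, hour and second fields can be sketched as follows. The sketch assumes a fixed 365-day year and picks field widths of 8, 14 and 14 bits from the ranges stated above; the function names are hypothetical, and the bitwise mixing strategies themselves are not reproduced here.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 365 * 24  # assumption: leap years are ignored in this sketch


def split_duration(total_seconds):
    """Split a duration in seconds into (m years, n hours, k seconds)."""
    total_hours, k = divmod(total_seconds, SECONDS_PER_HOUR)
    m, n = divmod(total_hours, HOURS_PER_YEAR)
    return m, n, k


def pack_time_domain(m, n, k, year_bits=8, hour_bits=14, second_bits=14):
    """Pack the three fields as [year | hour | second], before any bit mixing."""
    for value, bits, name in ((m, year_bits, "year"),
                              (n, hour_bits, "hour"),
                              (k, second_bits, "second")):
        if value < 0 or value >> bits:
            raise ValueError(f"{name} field value {value} does not fit in {bits} bits")
    return (m << (hour_bits + second_bits)) | (n << second_bits) | k


if __name__ == "__main__":
    m, n, k = split_duration(HOURS_PER_YEAR * 3600 + 2 * 3600 + 5)
    print(m, n, k, hex(pack_time_domain(m, n, k)))
```

With these widths an 8-bit year field covers 256 years from the predetermined start time, and 14 bits are enough for the 8,760 hours of a 365-day year.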

If the lookup submodule 240 finds in the data area a value equal to the lookup key, and the time-domain parameter obtained by the acquisition submodule 210 and the logical file identifier generated by the generation submodule 220 are both identical to the time-domain parameter and logical file identifier indicated by the value found, then that logical file identifier is already taken for that time. The lookup submodule 240 then triggers the generation submodule 220 to modify the logical file identifier and generate a new one, which is passed to the synthesis submodule 230, triggering it to synthesize a new lookup key.

In this embodiment of the present invention, the generation module 20 may also generate the index from the first logical file identifier and the first fragment file identifier of the distributed file together with the current system time. In this case, the generation submodule 220 is further configured to generate the first fragment file identifier according to a preset configuration strategy; the synthesis submodule 230 is configured to synthesize the first time-domain parameter, the first logical file identifier and the first fragment file identifier into a lookup key; and the lookup submodule 240 is configured to look up the lookup key in the recorded data area and to use the lookup key as the index key if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and second logical file identifier indicated by the lookup key that was found, and the first time-domain parameter and the first fragment file identifier are not exactly the same as the second time-domain parameter and the second fragment file identifier.

Likewise, if the lookup submodule 240 finds in the data area a value equal to the lookup key, and the time-domain parameter obtained by the acquisition submodule 210 and the fragment file identifier generated by the generation submodule 220 are both identical to the time-domain parameter and fragment file identifier indicated by the value found, then that fragment file identifier is already taken for that time. The lookup submodule 240 then triggers the generation submodule 220 to modify the fragment file identifier and generate a new one, which is passed to the synthesis submodule 230, triggering it to synthesize a new lookup key.

When establishing the association between the index key and the file, the establishment module 30 may request an empty record slot in the in-memory database that records the file, store the correspondence between the logical file name of the distributed file and the index key in that slot, and add the index key to the index data area. When the fragment files of the distributed file are stored, the index key is used as the file name under which the fragment files are actually stored. The index key is thereby associated with the file.

The device provided by this embodiment of the present invention may further comprise a detection module, configured to detect whether the write requests that are concurrent in time exceed a preset ratio and, if so, to trigger an update module. The update module is configured to modify or configure the bits occupied by the logical file identifier and the fragment file identifier of the distributed file, and/or to modify or configure the configuration strategy by which the first time-domain parameter is obtained through bitwise mixing, and to trigger the generation module 20 to regenerate a new index key for the distributed file.

As can be seen from the above description, in the embodiments of the present invention the bits of the index key are selected and mixed, different application models are analysed, and the adaptation parameters are modified to optimize indexing efficiency, which ensures that the system can process highly concurrent requests quickly and in a timely manner.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented on general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of several computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described can be performed in an order different from that given here. Alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may be modified and varied in many ways. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An index building method for massive data records, characterized by comprising:

obtaining the current system time when a new write-file request message is received;

generating an index key of the file according to the current system time and the file identifier of the file that the write-file request message requests to write;

establishing an association between the index key and the file.

2. The method according to claim 1, wherein the file is a distributed file.

3. The method according to claim 2, wherein the file identifier comprises a first logical file identifier of a logical file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:

Step A: obtaining a first time-domain parameter from the total time that has elapsed between a predetermined time and the current system time;

Step B: generating the first logical file identifier according to a preset configuration strategy;

Step C: synthesizing the first time-domain parameter and the first logical file identifier into a lookup key;

Step D: looking up the lookup key in the recorded data area; if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and the second logical file identifier indicated by the lookup key that was found, the first logical file identifier is valid, the lookup key synthesized from the first time-domain parameter and the first logical file identifier is taken as the unique ID of the logical file, and the lookup key is used as the index key.

4. The method according to claim 2, wherein the file identifier comprises a first logical file identifier of a logical file of the distributed file and a first fragment file identifier of a fragment file of the distributed file, and generating the index key according to the current system time and the file identifier comprises:

Step B: generating the first logical file identifier and the first fragment file identifier according to a preset configuration strategy;

Step C: synthesizing the first time-domain parameter, the first logical file identifier and the first fragment file identifier into a lookup key;

Step D: looking up the lookup key in the recorded data area; if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and the second logical file identifier indicated by the lookup key that was found, and the first time-domain parameter and the first fragment file identifier are not exactly the same as the second time-domain parameter and the second fragment file identifier, the combination of the first logical file identifier and the first fragment file identifier is valid, the lookup key synthesized from the first time-domain parameter, the first logical file identifier and the first fragment file identifier is taken as the unique ID of the fragment file, and the lookup key is used as the index key.

5. The method according to claim 4, wherein if the first time-domain parameter is identical to the second time-domain parameter and the first fragment file identifier is identical to the second fragment file identifier, the method further comprises:

modifying the value of the first fragment file identifier to generate a new fragment file identifier, using the new fragment file identifier as the first fragment file identifier, and returning to step C.

6. The method of claim 5, wherein,

generating the first fragment file identifier comprises: obtaining the value of the most recently generated fragment file identifier and increasing that value by a specified increment to obtain the first fragment file identifier, wherein the number of bits occupied by the fragment file identifier is determined by the configuration strategy;

Modifying the value of the first fragment file identifier to generate a new fragment file identifier includes: increasing the value of the first fragment file identifier by the specified increment to obtain the new fragment file identifier.

7. The method according to claim 3 or 4, wherein if the first time-domain parameter is identical to the second time-domain parameter and the first logical file identifier is identical to the second logical file identifier, the method further comprises: modifying the value of the first logical file identifier to generate a new logical file identifier, using the new logical file identifier as the first logical file identifier, and returning to step C.

8. The method of claim 7, wherein,

generating the first logical file identifier comprises: obtaining the value of the most recently generated logical file identifier and increasing that value by a specified increment to obtain the first logical file identifier, wherein the number of bits occupied by the logical file identifier is determined by the configuration strategy;

Modifying the value of the first logical file identifier to generate a new logical file identifier includes: increasing the value of the first logical file identifier by the specified increment to obtain the new logical file identifier.

9. The method according to any one of claims 3-6, wherein said step A comprises:

obtaining the value of each bit field of the time domain from the total duration, wherein the bit fields of the time domain comprise a year field, an hour field or minute field, and a second field; the total duration is m years, n hours or minutes, and k seconds; the year field records the value of m, the hour field or minute field records the value of n, and the second field records the value of k; and m, n and k are integers greater than or equal to 0;

obtaining the first time-domain parameter by bitwise mixing of the time domain according to the configuration strategy, wherein the configuration strategy comprises: aligning the number of seconds in a unit hour to the same order of magnitude as the time domain, and shifting the corresponding bits of the year field and of the hour field or minute field to lower positions to obtain the first time-domain parameter; or aligning the number of hours or minutes in a unit year to the same order of magnitude as the time domain, and shifting the corresponding bits of the year field to lower positions to obtain the first time-domain parameter.

10. The method according to claim 9, characterized in that, after the association between the index key and the file is established, the method further comprises:

detecting that the write requests that are concurrent in time exceed a preset ratio;

modifying or configuring the bits occupied by the logical file identifier and the fragment file identifier of the distributed file, and/or modifying or configuring the configuration strategy by which the first time-domain parameter is obtained through bitwise mixing, returning to step B, and regenerating a new index key for the distributed file.

11. The method according to any one of claims 3-6, characterized in that the lookup key is synthesized using the folding-modulo method.

12. The method according to any one of claims 2 to 6, wherein establishing the association between the index key and the file comprises:

requesting an empty record slot in the in-memory database that records the file, storing the correspondence between the logical file name of the distributed file and the index key in the record slot, and adding the index key to the index data area;

when storing the fragment files of the distributed file, using the index key as the file name under which the fragment files are actually stored.

13. An index building device for massive data records, characterized by comprising:

an acquisition module, configured to obtain the current system time when a new write-file request is received;

a generation module, configured to generate an index key of the distributed file according to the current system time and the file identifier of the file that the write-file request message requests to write;

an establishment module, configured to establish an association between the index key and the file.

14. The device according to claim 13, wherein the file is a distributed file; the file identifier comprises a first logical file identifier of a logical file of the distributed file; and the generation module comprises:

an acquisition submodule, configured to obtain a first time-domain parameter from the total time that has elapsed between a predetermined time and the current system time;

a generation submodule, configured to generate the first logical file identifier according to a preset configuration strategy;

a synthesis submodule, configured to synthesize the first time-domain parameter and the first logical file identifier into a lookup key;

a lookup submodule, configured to look up the lookup key in the recorded data area and, if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and the second logical file identifier indicated by the lookup key that was found, to use the lookup key as the index key.

15. The device according to claim 14, wherein the file identifier further comprises a first fragment file identifier of a fragment file of the distributed file;

the generation submodule is further configured to generate the first fragment file identifier according to a preset configuration strategy;

the synthesis submodule is configured to synthesize the first time-domain parameter, the first logical file identifier and the first fragment file identifier into a lookup key;

the lookup submodule is configured to look up the lookup key in the recorded data area and to use the lookup key as the index key if the lookup key cannot be found, or if the first time-domain parameter and the first logical file identifier are not exactly the same as the second time-domain parameter and the second logical file identifier indicated by the lookup key that was found, and the first time-domain parameter and the first fragment file identifier are not exactly the same as the second time-domain parameter and the second fragment file identifier.