CN104679889A - Big data processing-oriented data storage method and device

Info

Publication number: CN104679889A
Application number: CN201510117104.5A
Authority: CN
Inventors: 黄先芝; 徐正礼; 魏金雷
Original assignee: Inspur Group Co Ltd
Current assignee: Inspur Group Co Ltd
Priority date: 2015-03-17
Filing date: 2015-03-17
Publication date: 2015-06-03

Abstract

The invention provides a data storage method and device for big data processing. The method comprises the following steps: setting entity objects and establishing a super archive data table; establishing a correspondence between each entity object and a data row in the super archive data table, and setting the metadata corresponding to each column in the data row; for a current entity object to be stored, looking up the data row corresponding to the current entity object in the super archive data table; and storing all data related to the entity object, according to its attributes, into the found data row, in the columns of that row that correspond to the metadata. This scheme can improve the efficiency of big data storage.

Description

A data storage method and device for big data processing

Technical field

The invention relates to the technical field of network communication, and in particular to a data storage method and device for big data processing.

Background

With the continuous enrichment and improvement of data collection methods, more and more industry data has been accumulated. Data volumes have grown to the big data scale (hundreds of GB, TB, or even PB), which traditional software cannot handle. In big data scenarios, how to store big data has become an important computing problem.

Currently, relational databases can be used to store big data. For example, multiple pieces of related data are stored in different data tables of different databases, and the relationships between the data stored in those tables are recorded so that the pieces of data can be associated with one another.

It can be seen that, in the current practice of using association relationships to store data in different data tables of different databases, the data is stored loosely and the associations must be expressed through a relational database. For big data, storing data loosely and using association relationships to record the data in different tables greatly reduces the efficiency of data storage, and further reduces the efficiency of subsequent lookup and maintenance.

Summary of the invention

The invention provides a data storage method and device for big data processing, which can improve data storage efficiency.

A data storage method for big data processing comprises: setting entity objects and establishing a super archive data table; establishing a correspondence between each entity object and a data row in the super archive data table, and setting the metadata corresponding to each column in the data row;

for a current entity object to be stored, looking up the data row in the super archive data table that corresponds to the current entity object;

storing all data related to the entity object, according to its attributes, into the found data row of the super archive data table, in the columns of that data row that correspond to the metadata.

Establishing the correspondence between each entity object and a data row in the super archive data table includes: establishing a correspondence between each entity object and one data row in the super archive data table;

storing all data related to the entity object into the found data row of the super archive data table includes: storing all data related to the entity object into one row of the super archive data table.

The elements of the super archive data table include any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

When the elements of the super archive data table include column families, a subject domain is set for each column family;

storing all data related to the entity object into the found data row of the super archive data table includes: storing the data that belongs to the same subject domain, among all data related to the entity object, into the column family corresponding to that subject domain, where the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit.

The method further includes:

setting a calculation result storage column in the data row corresponding to each entity object;

using the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and writing the calculation result into the calculation result storage column of the data row corresponding to the first entity object.

The method further includes:

setting up a summary aggregation calculation result storage table;

performing a summary aggregation calculation over the data in all rows of the super archive data table, and writing the calculation result into the summary aggregation calculation result storage table.

A data storage device for big data processing comprises:

a setting unit, configured to set entity objects and establish a super archive data table, and to establish a correspondence between each entity object and a data row in the super archive data table;

a search unit, configured to look up, for a current entity object to be stored, the data row in the super archive data table that corresponds to the current entity object, and to set the metadata corresponding to each column in the data row;

a storage execution unit, configured to store all data related to the entity object, according to its attributes, into the found data row of the super archive data table, in the columns of that data row that correspond to the metadata.

The setting unit is configured to set the elements of the super archive data table, which include any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

When the elements of the super archive data table include column families, the setting unit sets a subject domain for each column family;

the storage execution unit stores the data that belongs to the same subject domain, among all data related to the entity object, into the column family corresponding to that subject domain, where the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit.

The device further includes a first calculation unit, wherein the setting unit is further configured to set a calculation result storage column in the data row corresponding to each entity object, and the first calculation unit is configured to use the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and to write the calculation result into the calculation result storage column of the data row corresponding to the first entity object;

and/or,

the device further includes a second calculation unit, wherein the setting unit is further configured to set up a summary aggregation calculation result storage table, and the second calculation unit is configured to perform a summary aggregation calculation over the data in all rows of the super archive data table and to write the calculation result into the summary aggregation calculation result storage table.

It can be seen that the embodiments of the present invention provide a data storage method and device for big data processing. A super archive data table is set up and organized around entity objects, with one super archive per entity object: all data related to an entity object is stored together in the data row of the super archive data table that corresponds to that entity object. This achieves holographic storage of an entity object's data and improves storage efficiency.

Description of the drawings

Fig. 1 is a flowchart of a data storage method for big data processing in one embodiment of the present invention.

Fig. 2 is a flowchart of a data storage method for big data processing in another embodiment of the present invention.

Fig. 3 is a schematic diagram of the elements of the super archive data model in one embodiment of the present invention.

Fig. 4 is a schematic diagram of the two kinds of MapReduce operations on the super archive data table in one embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a data storage device for big data processing in one embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.

One embodiment of the present invention proposes a data storage method for big data processing. Referring to Fig. 1, the method includes:

Step 101: Set entity objects and establish a super archive data table.

Step 102: Establish a correspondence between each entity object and a data row in the super archive data table, and set the metadata corresponding to each column in the data row.

Step 103: For the current entity object to be stored, look up the data row in the super archive data table that corresponds to the current entity object.

Step 104: Store all data related to the entity object into the found data row of the super archive data table.
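Steps 101 to 104 can be illustrated with a minimal in-memory sketch. This is not the patented implementation: the class name `SuperArchiveTable`, its method names, and the sample values are assumptions made for illustration, and a plain Python dictionary stands in for the real storage engine.

```python
from typing import Any, Dict, List


class SuperArchiveTable:
    """In-memory stand-in for the super archive data table (illustrative only)."""

    def __init__(self, columns: List[str]) -> None:
        # Step 102, second half: one metadata name per column.
        self.columns = list(columns)
        self.row_of: Dict[str, int] = {}       # entity object -> row index
        self.rows: List[Dict[str, Any]] = []   # one dict per data row

    def register(self, entity_id: str) -> int:
        """Step 102, first half: bind an entity object to exactly one data row."""
        if entity_id not in self.row_of:
            self.row_of[entity_id] = len(self.rows)
            self.rows.append({})
        return self.row_of[entity_id]

    def find_row(self, entity_id: str) -> Dict[str, Any]:
        """Step 103: look up the data row of the entity object."""
        return self.rows[self.row_of[entity_id]]

    def store(self, entity_id: str, data: Dict[str, Any]) -> None:
        """Step 104: write each attribute into the column named by its metadata."""
        row = self.find_row(entity_id)
        for attribute, value in data.items():
            if attribute not in self.columns:
                raise KeyError(f"no column with metadata {attribute!r}")
            row[attribute] = value


# Step 101: create the table; the column names and values are made-up examples.
table = SuperArchiveTable(columns=["time", "location", "classification"])
table.register("case A")
table.store("case A", {"time": "2015-03-17", "location": "city X"})
```

Because every attribute of "case A" lands in the same row, reading the entity back is a single row lookup rather than a join across tables.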

It can be seen that this embodiment of the present invention provides a data storage method for big data processing. A super archive data table is set up and organized around entity objects, with one super archive per entity object: all data related to an entity object is stored together in the data row of the super archive data table that corresponds to that entity object. This achieves holographic storage of an entity object's data and improves storage efficiency.

In one embodiment of the present invention, each entity object may correspond to one data row in the super archive data table.

When the super archive data table is set up, in one embodiment of the present invention, the elements of the table include any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

In one embodiment of the present invention, the elements of the super archive data table include column families, and each column family corresponds to one subject domain. In this case:

storing all data related to the entity object into the found data row of the super archive data table includes: storing the data that belongs to the same subject domain, among all data related to the entity object, into the column family corresponding to that subject domain, where the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit. In this way, the data of an entity object that belongs to the same subject can be stored together more effectively.
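The grouping of an entity object's data by subject domain can be sketched as follows. The mapping `SUBJECT_DOMAIN_OF`, the function name `group_by_family`, and the sample values are hypothetical; the source does not specify how the mapping from metadata to subject domain is represented.

```python
from collections import defaultdict
from typing import Any, Dict

# Hypothetical mapping from metadata (column name) to subject domain.
SUBJECT_DOMAIN_OF = {
    "time": "basic_info",
    "location": "basic_info",
    "case_analysis": "analysis",
}


def group_by_family(data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Place every attribute in the column family of its subject domain."""
    families: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for column, value in data.items():
        families[SUBJECT_DOMAIN_OF[column]][column] = value
    return dict(families)


row = group_by_family(
    {"time": "2015-03-17", "location": "city X", "case_analysis": "pending"}
)
```

Here `row` holds one nested dictionary per column family, so all values of one subject domain sit next to each other within the entity's single row.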

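In one embodiment of the present invention, because the data of each entity object is stored together in one data row of the super archive data table, it is convenient to use that data in subsequent calculations for the corresponding entity object. For example:

The first type of calculation processing includes: setting a calculation result storage column in the data row corresponding to each entity object; and using the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and writing that result into the calculation result storage column of that data row.

The second type of calculation processing includes: setting up a summary aggregation calculation result storage table; and performing a summary aggregation calculation over the data in all rows of the super archive data table, and writing the result into the summary aggregation calculation result storage table.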

Another embodiment of the present invention proposes a data storage method for big data processing. Referring to Fig. 2, the method includes:

Step 201: Set entity objects.

Here, an entity object may be, for example, a person, a vehicle, or a case.

Step 202: Establish a super archive data table, establish a correspondence between each entity object and one data row in the super archive data table, and set the metadata corresponding to each column in the data row.

Here, referring to Fig. 3, the elements of the super archive data table may include: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

For example, after this step, the entity object "case A" corresponds to the first data row in the super archive data table, denoted data row 1. Each column of data row 1 corresponds to one piece of metadata; for example, data row 1 includes the metadata "time", "location", "classification", "relationships among persons involved", "target closing time", "case analysis", and so on.

Step 203: Set column families in the data row, where each column family includes one or more pieces of metadata, and set the subject domain corresponding to each column family.

For example, within a data row, the metadata "time" and "location" can be assigned to the same column family, whose subject domain is basic information.
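The data model elements named above can be sketched as plain Python dataclasses. Only the element names come from the source; the fields, the containment relationships between the elements, and the `qualified_columns` helper are assumptions for illustration. The GROUP and MAP column families are omitted because the text does not define their semantics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SupDocColumn:
    name: str  # the metadata that this column carries


@dataclass
class SupDocFamily:
    name: str
    subject_domain: str
    columns: List[SupDocColumn] = field(default_factory=list)


@dataclass
class SupDocKeyComponent:
    name: str


@dataclass
class SupDocKey:
    components: List[SupDocKeyComponent]


@dataclass
class SupDocTable:
    name: str
    key: SupDocKey
    families: List[SupDocFamily] = field(default_factory=list)

    def qualified_columns(self) -> List[str]:
        """List every column as 'family:column' (assumed naming convention)."""
        return [f"{fam.name}:{col.name}" for fam in self.families for col in fam.columns]


# The "time"/"location" example from the text; table and key names are made up.
case_table = SupDocTable(
    name="case_archive",
    key=SupDocKey([SupDocKeyComponent("case_id")]),
    families=[
        SupDocFamily("basic", "basic information",
                     [SupDocColumn("time"), SupDocColumn("location")]),
    ],
)
```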

Step 204: In the data row corresponding to each entity object, set a calculation result storage column.

Step 205: Set up a summary aggregation calculation result storage table.

Once this step has been executed, the setup that precedes the actual storage of big data is complete.

At this point, the elements (that is, the column names) of each data row have also been determined.

In one embodiment of the present invention, a super archive Schema definition language can be designed according to the elements of the super archive data model. The content of a data table definition file is mapped to executable Apache HBase Schema definition statements, and a Shell command and a Web page data modeling tool are implemented, thereby carrying out the processing of the above steps.
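The mapping from a table definition to an HBase Schema statement could look roughly like the sketch below. The source does not give the syntax of its Schema definition language, so the `definition` dictionary is a hypothetical stand-in for a parsed definition file, and `to_hbase_create` is an invented helper. The generated text follows the HBase shell `create` form, which declares column families only.

```python
from typing import Dict, List


def to_hbase_create(table: str, families: Dict[str, List[str]]) -> str:
    """Render an HBase shell `create` statement for the given definition.

    HBase declares only column families when a table is created; the column
    names inside each family stay with the caller as metadata.
    """
    family_specs = ", ".join(f"{{NAME => '{name}'}}" for name in families)
    return f"create '{table}', {family_specs}"


# Hypothetical parsed definition: column family -> predetermined column names.
definition = {"basic": ["time", "location"], "analysis": ["case_analysis"]}
statement = to_hbase_create("case_archive", definition)
```

Because the column names are not part of the HBase statement, a tool of this kind would need to keep them in its own metadata store, which matches the text's emphasis on per-column metadata.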

Step 206: Denote the current entity object to be stored as entity object A, and look up the data row in the super archive data table that corresponds to entity object A.

For example, entity object A corresponds to the second data row.

Step 207: Store all data related to the entity object, according to its attributes, into the found data row of the super archive data table, in the columns of that data row that correspond to the metadata.

Step 208: When storing into the data row, store the data that belongs to the same subject domain, among all data related to the entity object, into the column family corresponding to that subject domain, in the columns of that column family that correspond to the metadata.

In this step, the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit (the data structure of a complex type is defined through JSON).
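One way a column value of either kind might be serialized is sketched below. The encoding rules (UTF-8 text for simple types, JSON text for complex types) and the function names are assumptions; the source states only that complex type structures are defined through JSON.

```python
import json
from typing import Any


def encode_cell(value: Any) -> bytes:
    """Store simple types as UTF-8 text and complex types as JSON text."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False, sort_keys=True).encode("utf-8")
    return str(value).encode("utf-8")


def decode_complex(cell: bytes) -> Any:
    """Parse a cell that is known to hold a JSON-encoded complex value."""
    return json.loads(cell.decode("utf-8"))


# Made-up complex value for a "relationships among persons involved" column.
relations = [{"person": "P1", "role": "witness"}, {"person": "P2", "role": "suspect"}]
cell = encode_cell(relations)
```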

Step 209: When some calculation result is needed for entity object 1, use the existing specified data in the data row corresponding to entity object 1 in the super archive data table to derive calculation result 1 for entity object 1.

Step 210: Write calculation result 1 into the calculation result storage column of the data row corresponding to entity object 1.

Step 211: Perform a summary aggregation calculation over the data in all rows of the super archive data table, and write calculation result 2 into the summary aggregation calculation result storage table.

The two kinds of calculation in steps 209 to 211 can be implemented as MapReduce operations. Referring to Fig. 4, based on the MapReduce computing framework, the first kind is the derivation (MR-derivator) operation: on each individual row of the table, existing data is used to derive new information about the entity object, which is written into a column of that same row. The second kind is the aggregation (MR-aggregator) operation: a summary aggregation calculation is performed over all rows of the table, and the result is written into another data table (another super archive table or a table of a general type).
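The difference between the two operations can be shown with a plain Python sketch that does not use Hadoop. The function names `mr_derivator` and `mr_aggregator` echo the labels in the text, but their signatures, the sample rows, and the choice of counting as the aggregation are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Table = Dict[str, Row]  # row key -> data row


def mr_derivator(table: Table, result_column: str, derive: Callable[[Row], Any]) -> None:
    """Per row: derive a value from existing columns and write it into that row."""
    for row in table.values():
        row[result_column] = derive(row)


def mr_aggregator(table: Table, key_of: Callable[[Row], Any]) -> Dict[Any, int]:
    """Over all rows: count rows per key; the result belongs in a separate table."""
    return dict(Counter(key_of(row) for row in table.values()))


# Made-up rows; day numbers are arbitrary integers.
cases: Table = {
    "case A": {"opened_day": 3, "target_close_day": 10, "classification": "theft"},
    "case B": {"opened_day": 5, "target_close_day": 9, "classification": "theft"},
    "case C": {"opened_day": 1, "target_close_day": 30, "classification": "fraud"},
}

mr_derivator(cases, "days_allowed", lambda r: r["target_close_day"] - r["opened_day"])
summary_table = mr_aggregator(cases, lambda r: r["classification"])
```

The derivation touches only one row at a time, so it maps naturally onto a map-only job, whereas the aggregation needs a grouping step across rows, which corresponds to the reduce phase.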

The process shown in Fig. 2 can be implemented on a NoSQL column store such as Apache HBase, with the column names in each column family determined in advance.

One embodiment of the present invention also proposes a data storage device for big data processing. Referring to Fig. 5, the device includes:

a setting unit 501, configured to set entity objects and establish a super archive data table, and to establish a correspondence between each entity object and a data row in the super archive data table;

a search unit 502, configured to look up, for a current entity object to be stored, the data row in the super archive data table that corresponds to the current entity object;

a storage execution unit 503, configured to store all data related to the entity object into the data row of the super archive data table found by the search unit.

In one embodiment of the present invention, the setting unit is configured to set the elements of the super archive data table, which include any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

In one embodiment of the present invention, when the elements of the super archive data table include column families, the setting unit sets a subject domain for each column family.

In one embodiment of the present invention, the device further includes a first calculation unit, wherein the setting unit is further configured to set a calculation result storage column in the data row corresponding to each entity object, and the first calculation unit is configured to use the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and to write the calculation result into the calculation result storage column of the data row corresponding to the first entity object.

In one embodiment of the present invention, the device further includes a second calculation unit, wherein the setting unit is further configured to set up a summary aggregation calculation result storage table, and the second calculation unit is configured to perform a summary aggregation calculation over the data in all rows of the super archive data table and to write the calculation result into the summary aggregation calculation result storage table.

The embodiments of the present invention have at least the following beneficial effects:

1. A super archive data table is used and organized around entity objects, with one super archive per entity object. All data related to an entity object is stored together in the data row of the super archive data table that corresponds to that entity object, for example in a single row. This achieves holographic storage of an entity object's data and improves storage efficiency.

2. Because all data of an entity object is stored together in the data row corresponding to that entity object in the super archive data table, when all data of the entity object, for example a case (the time and place of the case, the persons involved, the case analysis, vehicle information, and so on), later needs to be shown to a user, a holographic display can be produced very conveniently. There is no need, as in the prior art, to search different data tables of a relational database according to pre-recorded associations between the data. The efficiency of big data lookup and display is therefore improved.

3. Following the idea of "one entity object, one super archive", the data model organizes and aggregates data around the entity object, presents the information related to each entity object from all angles, and supports dynamic extension of attributes and multi-dimensional data processing.

4. Because the metadata corresponding to each column in the data row is set, metadata management of the super archive data table can be implemented, and the super archive Schema information can be presented to users who access the data.

5. Through the two defined basic kinds of calculation processing, for example MapReduce processing, and different arrangements and combinations of the two, pipelined big data processing can be carried out on super archive data sets.

6. The data modeling method and data modeling tool allow a super archive data administrator to conveniently define and extend data table Schemas and to manage versions of the Schema definitions.
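The pipelined arrangement mentioned in item 5 can be sketched as a list of stages applied in order. This is a plain Python illustration under stated assumptions, not a MapReduce job: the stage builders `derive_stage` and `aggregate_stage`, the `run_pipeline` driver, and the vehicle data are all invented for the example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = Dict[str, Row]
Stage = Callable[[Table], Table]


def derive_stage(column: str, fn: Callable[[Row], Any]) -> Stage:
    """Build a stage that adds one derived column to every row."""
    def run(table: Table) -> Table:
        return {key: {**row, column: fn(row)} for key, row in table.items()}
    return run


def aggregate_stage(group: Callable[[Row], str], column: str) -> Stage:
    """Build a stage that sums `column` per group into a new table."""
    def run(table: Table) -> Table:
        out: Table = {}
        for row in table.values():
            bucket = out.setdefault(group(row), {"total": 0})
            bucket["total"] += row[column]
        return out
    return run


def run_pipeline(table: Table, stages: List[Stage]) -> Table:
    """Feed the output table of each stage into the next stage."""
    for stage in stages:
        table = stage(table)
    return table


vehicles: Table = {
    "car 1": {"owner": "P1", "trips": 4},
    "car 2": {"owner": "P1", "trips": 6},
    "car 3": {"owner": "P2", "trips": 1},
}
result = run_pipeline(vehicles, [
    derive_stage("frequent", lambda r: 1 if r["trips"] >= 4 else 0),
    aggregate_stage(lambda r: r["owner"], "frequent"),
])
```

Each stage takes a table and returns a table, so derivation and aggregation stages can be chained in any order that makes sense for the data.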

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A data storage method for big data processing, characterized in that entity objects are set and a super archive data table is established; a correspondence between each entity object and a data row in the super archive data table is established, and the metadata corresponding to each column in the data row is set;

for a current entity object to be stored, the data row in the super archive data table corresponding to the current entity object is looked up;

all data related to the entity object is stored, according to its attributes, into the found data row of the super archive data table, in the columns of that data row that correspond to the metadata.

2. The method according to claim 1, characterized in that establishing the correspondence between each entity object and a data row in the super archive data table comprises: establishing a correspondence between each entity object and one data row in the super archive data table;

storing all data related to the entity object into the found data row of the super archive data table comprises: storing all data related to the entity object into one row of the found super archive data table.

3. The method according to claim 1, characterized in that the elements of the super archive data table comprise any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

4. The method according to claim 3, characterized by further comprising: setting column families in the data row, each column family comprising one or more pieces of the metadata, and setting the subject domain corresponding to each column family;

storing all data related to the entity object into the found data row of the super archive data table comprises: storing the data that belongs to the same subject domain, among all data related to the entity object, at the corresponding metadata in the column family corresponding to that subject domain, where the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit.

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

setting a calculation result storage column in the data row corresponding to each entity object;

using the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and writing the calculation result into the calculation result storage column of the data row corresponding to the first entity object.

6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

setting up a summary aggregation calculation result storage table;

performing a summary aggregation calculation over the data in all rows of the super archive data table, and writing the calculation result into the summary aggregation calculation result storage table.

7. A data storage device for big data processing, characterized by comprising:

a setting unit, configured to set entity objects and establish a super archive data table, to establish a correspondence between each entity object and a data row in the super archive data table, and to set the metadata corresponding to each column in the data row;

a search unit, configured to look up, for a current entity object to be stored, the data row in the super archive data table corresponding to the current entity object;

a storage execution unit, configured to store all data related to the entity object, according to its attributes, into the found data row of the super archive data table, in the columns of that data row that correspond to the metadata.

8. The device according to claim 7, characterized in that the setting unit is configured to set the elements of the super archive data table, which comprise any one or more of: data table (SupDocTable), key (SupDocKey), key component (SupDocKeyComponent), column family (SupDocFamily), GROUP column family (SupDocGroupFamily), MAP column family (SupDocMapFamily), and column (SupDocColumn).

9. The device according to claim 8, characterized in that the setting unit sets column families in the data row, each column family comprising one or more pieces of the metadata, and sets the subject domain corresponding to each column family;

the storage execution unit stores the data that belongs to the same subject domain, among all data related to the entity object, at the corresponding metadata in the column family corresponding to that subject domain, where the data stored in each column of the column family is either a simple data type or a complex data type of an aggregation unit.

10. The device according to any one of claims 7 to 9, characterized in that

the device further comprises a first calculation unit, wherein the setting unit is further configured to set a calculation result storage column in the data row corresponding to each entity object, and the first calculation unit is configured to use the existing specified data in the data row corresponding to a first entity object in the super archive data table to derive a calculation result for the first entity object, and to write the calculation result into the calculation result storage column of the data row corresponding to the first entity object;

and/or,

the device further comprises a second calculation unit, wherein the setting unit is further configured to set up a summary aggregation calculation result storage table, and the second calculation unit is configured to perform a summary aggregation calculation over the data in all rows of the super archive data table and to write the calculation result into the summary aggregation calculation result storage table.