WO2018014267A1

WO2018014267A1 - Method and system for processing massive crowd feature data

Info

Publication number: WO2018014267A1
Application number: PCT/CN2016/090760
Authority: WO
Inventors: 金培银
Original assignee: Donson Times Information Technology Co ltd
Current assignee: Donson Times Information Technology Co ltd
Priority date: 2016-07-20
Filing date: 2016-07-20
Publication date: 2018-01-25
Anticipated expiration: 2019-01-20
Also published as: CN109937413A; CN109937413B

Abstract

A method for processing massive crowd feature data, and a corresponding system. The method comprises: setting serial numbers of a crowd and a feature label (S101); establishing binary feature data corresponding to each feature label according to the serial numbers of the crowd (S102); converting the established binary feature data corresponding to each feature label into hexadecimal feature data (S103); and storing converted hexadecimal data corresponding to each feature label (S104). The method and system improve the processing efficiency of massive crowd feature data and save data storage space.

Description

Method and system for processing massive crowd feature data

Technical field

The present invention relates to the technical field of big data processing, and in particular to a method and system for processing massive crowd feature data.

Background

Existing approaches to processing massive crowd feature data usually store the data in rows and columns, where each individual in the audience, taken in order, corresponds to multiple feature tags marked 0 or 1. Although this approach is simple, tag expressions are complex, changeable, and difficult to implement, and conventional relational row-and-column storage occupies a large amount of storage space. Moreover, when a query expression is evaluated, a tag position index table must first be joined to obtain the corresponding positions before a logical expression can be constructed and the logical operation performed. The query process is complicated and time-consuming, so the processing efficiency for massive crowd feature data is low.

Summary of the invention

In view of this, the present invention provides a method and system for processing massive crowd feature data, which solve the technical problems of low data processing efficiency and large data storage space in the existing processing of massive crowd feature data.

According to an embodiment of the present invention, a method for processing massive crowd feature data is provided, including: setting crowd numbers and feature tags; establishing binary feature data corresponding to each feature tag according to the crowd numbers; converting the established binary feature data corresponding to each feature tag into hexadecimal feature data; and storing the converted hexadecimal feature data corresponding to each feature tag.

Preferably, the method for processing massive crowd feature data further includes: generating a query logical expression according to the feature tags selected for the query; performing a logical operation on the hexadecimal feature data corresponding to the feature tags according to the generated logical expression to obtain a binary string; and counting the number of 1s in the obtained binary string and using the count as the audience size for the queried feature tags.

Preferably, generating the query logical expression according to the feature tags selected for the query includes: generating a logical OR expression according to the feature tags selected for the query; generating a logical AND expression according to the feature tags selected for the query; and combining the generated logical OR expression and logical AND expression into the query logical expression.

Preferably, performing the logical operation on the hexadecimal feature data corresponding to the feature tags according to the generated logical expression to obtain the binary string includes: grouping the generated logical expression; obtaining the hexadecimal feature data corresponding to the feature tags of each group and converting it into long-type data; performing Boolean operations, group by group, on the long-type data corresponding to the feature tags of each group; and converting the results of the Boolean operations of all groups into a binary string.

Preferably, the method for processing massive crowd feature data further includes: sending the calculated audience size for the feature tags to the query end.

According to another embodiment of the present invention, a system for processing massive crowd feature data is provided, including: a setting module, configured to set crowd numbers and feature tags; a feature data establishing module, configured to establish binary feature data corresponding to each feature tag according to the crowd numbers set by the setting module; a conversion module, configured to convert the binary feature data corresponding to each feature tag established by the feature data establishing module into hexadecimal feature data; and a storage module, configured to store the hexadecimal feature data corresponding to each feature tag converted by the conversion module.

Preferably, the system for processing massive crowd feature data further includes: a query logical expression generating module, configured to generate a query logical expression according to the feature tags selected for the query; a logical operation module, configured to perform a logical operation on the hexadecimal feature data corresponding to the feature tags according to the logical expression generated by the query logical expression generating module to obtain a binary string; and a calculation module, configured to count the number of 1s in the binary string obtained by the logical operation module and use the count as the audience size for the queried feature tags.

Preferably, the query logical expression generating module includes: a first generating unit, configured to generate a logical OR expression according to the feature tags selected for the query; a second generating unit, configured to generate a logical AND expression according to the feature tags selected for the query; and a combining unit, configured to combine the logical OR expression and the logical AND expression generated by the first generating unit and the second generating unit into the query logical expression.

Preferably, the logical operation module includes: a grouping unit, configured to group the logical expression generated by the query logical expression generating module; a long-type data conversion unit, configured to obtain the hexadecimal feature data corresponding to the feature tags grouped by the grouping unit and convert it into long-type data; a group operation unit, configured to perform Boolean operations, group by group, on the long-type data corresponding to the feature tags of each group converted by the long-type data conversion unit; and a binary string conversion unit, configured to convert the results of the Boolean operations of all groups of the group operation unit into a binary string.

Preferably, the system for processing massive crowd feature data further includes a sending module, configured to send the audience size for the feature tags calculated by the calculation module to the query end.

The method and system for processing massive crowd feature data provided by the present invention establish binary feature data corresponding to each feature tag according to the crowd numbers, and convert the established binary feature data corresponding to each feature tag into hexadecimal feature data for storage. Organizing the data by feature tag efficiently enables distributed processing, storage, and retrieval of massive crowd feature data, improves the processing efficiency of massive crowd feature data, and saves data storage space.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a method for processing massive crowd feature data according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of binary feature data in an embodiment of the present invention.

FIG. 3 is a schematic diagram of hexadecimal feature data in an embodiment of the present invention.

FIG. 4 is a schematic flowchart of a method for processing massive crowd feature data according to another embodiment of the present invention.

FIG. 5 is a schematic flowchart of generating a query logical expression according to another embodiment of the present invention.

FIG. 6 is a schematic flowchart of performing logical processing on feature data according to another embodiment of the present invention.

FIG. 7 is a schematic structural diagram of a system for processing massive crowd feature data according to yet another embodiment of the present invention.

FIG. 8 is a schematic structural diagram of a system for processing massive crowd feature data according to still another embodiment of the present invention.

FIG. 9 is a schematic structural diagram of a query logical expression generating module according to yet another embodiment of the present invention.

FIG. 10 is a schematic structural diagram of a logical operation module according to yet another embodiment of the present invention.

Detailed description

The technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

In the description of the present invention, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless otherwise expressly specified and limited, the terms "connected" and "connection" are to be understood broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct or indirect through an intermediate medium. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances. In addition, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more.

Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

FIG. 1 is a schematic flowchart of a method for processing massive crowd feature data according to an embodiment of the present invention. As shown in the figure, the method for processing massive crowd feature data includes:

Step S101: Set crowd numbers and feature tags.

In this embodiment, the massive crowd feature data may include identity information such as gender, age, occupation, and place of household registration, or other data. Referring to FIG. 2, after the massive crowd feature data is obtained, crowd numbers and feature tags are set in advance. Each individual in the crowd corresponds to a unique crowd number, and the crowd numbers are arranged in sequence without gaps. The feature tags can be set flexibly according to the characteristics of the feature data and the needs of the actual application, for example male, female, 25-30 years old, bachelor's degree, single, or employee.

Step S102: Establish binary feature data corresponding to each feature tag according to the crowd numbers.

FIG. 2 is a schematic diagram of binary feature data in an embodiment of the present invention. As shown in the figure, the binary feature data is organized and stored by feature tag. Every item of the massive crowd feature data is marked against a feature tag as 0 or 1, where 1 indicates that the individual has the feature of the tag and 0 indicates that the individual does not. Each feature tag thus stores a sequence of 0/1 flags in crowd-number order, forming the audience dot-matrix (bitmap) data of that feature tag as binary data. This takes full advantage of efficient binary operations and enables fast analysis and processing of massive crowd feature data. Organizing the data by feature tag also enables distributed storage and retrieval of massive crowd feature data and supports vertical partitioning, which makes it easy to introduce distributed computing and cluster processing.
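Step S102 can be illustrated with a minimal sketch. The sample individuals, the tag names, and the helper `build_binary_feature_data` are hypothetical and are not taken from the application; the sketch only shows how one 0/1 string per feature tag might be built in crowd-number order.

```python
# Hypothetical sample data: the entry at index i belongs to crowd number i + 1.
people = [
    {"male", "student"},    # crowd number 1
    {"female", "married"},  # crowd number 2
    {"male", "married"},    # crowd number 3
    {"female", "student"},  # crowd number 4
]


def build_binary_feature_data(people, tags):
    """Return {tag: bit string} with one 0/1 character per crowd number, in order."""
    return {
        tag: "".join("1" if tag in person else "0" for person in people)
        for tag in tags
    }


binary_data = build_binary_feature_data(
    people, ["male", "female", "married", "student"]
)
print(binary_data["male"])     # 1010
print(binary_data["married"])  # 0110
```

Because each tag owns its own string, the data for different tags could be stored and processed on different nodes, which is the tag-based partitioning the paragraph above describes.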

Step S103: Convert the established binary feature data corresponding to each feature tag into hexadecimal feature data.

FIG. 3 is a schematic diagram of hexadecimal feature data in an embodiment of the present invention. In this embodiment, according to the rules for conversion between number bases, the established binary feature data corresponding to each feature tag is divided into units of 64 bits and converted into hexadecimal feature data. As shown in the figure, TagCode holds feature tags such as male, woman, married, bachelor, and student, and the hexadecimal feature flags of each feature tag, in crowd-number order, form AudienceBool. Compressing the binary data into hexadecimal data saves storage space for massive crowd feature data.
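The conversion in step S103 might look like the following sketch. The 64-bit unit size comes from the description; the function name `binary_to_hex` and the choice to pad the final unit with trailing zeros are assumptions made for illustration, since the application does not state how a partial unit is handled.

```python
UNIT_BITS = 64  # unit size stated in the description


def binary_to_hex(bits: str) -> str:
    """Convert a 0/1 string to hex text, 16 hex digits per 64-bit unit."""
    # Assumption: pad the tail with zeros up to a multiple of 64 bits.
    padded_len = -(-len(bits) // UNIT_BITS) * UNIT_BITS
    padded = bits.ljust(padded_len, "0")
    units = (padded[i:i + UNIT_BITS] for i in range(0, padded_len, UNIT_BITS))
    return "".join(format(int(unit, 2), "016x") for unit in units)


print(binary_to_hex("1010"))  # a000000000000000
```

Each hexadecimal digit encodes four binary digits, so the stored text is a quarter of the length of the padded 0/1 string.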

Step S104: Store the converted hexadecimal feature data corresponding to each feature tag.

In the method for processing massive crowd feature data of this embodiment, binary feature data corresponding to each feature tag is established according to the crowd numbers, and the established binary feature data corresponding to each feature tag is converted into hexadecimal feature data for storage. Organizing the data by feature tag efficiently enables distributed processing, storage, and retrieval of massive crowd feature data, improves the processing efficiency of massive crowd feature data, and saves data storage space.

FIG. 4 is a schematic flowchart of a method for processing massive crowd feature data according to another embodiment of the present invention. As shown in the figure, the method for processing massive crowd feature data includes:

Step S201: Set crowd numbers and feature tags.

Step S202: Establish binary feature data corresponding to each feature tag according to the crowd numbers.

Step S203: Convert the established binary feature data corresponding to each feature tag into hexadecimal feature data.

Step S204: Store the converted hexadecimal feature data corresponding to each feature tag.

Step S205: Generate a query logical expression according to the feature tags selected for the query.

In this embodiment, when a feature retrieval query needs to be run against the massive crowd feature data stored in the above method embodiment, a query logical expression must first be generated according to the actual retrieval requirement. As shown in FIG. 5, generating the query logical expression according to the feature tags selected for the query includes:

Step S301: Generate a logical OR expression according to the feature tags selected for the query.

Step S302: Generate a logical AND expression according to the feature tags selected for the query.

Step S303: Combine the generated logical OR expression and logical AND expression into the query logical expression.

For example, in this embodiment, a logical OR expression and a logical AND expression are generated according to the feature tags selected for the query and combined into the query logical expression (Tag1||Tag2..Tagn)&&(Tag1||Tag2..Tagn), so that the logical expression for a specific feature query is generated at the client.
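A query expression of the form shown above could be assembled as in the sketch below. The helpers `or_expression` and `query_expression` and the sample tag names are hypothetical; the application only gives the general shape of the expression, so the exact text format here is an assumption.

```python
def or_expression(tags):
    """Join the tags of one group with logical OR, e.g. (male||female)."""
    return "(" + "||".join(tags) + ")"


def query_expression(groups):
    """Join the OR groups with logical AND to form the full query expression."""
    return "&&".join(or_expression(tags) for tags in groups)


expr = query_expression([["male", "female"], ["married", "student"]])
print(expr)  # (male||female)&&(married||student)
```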

Step S206: Perform a logical operation on the hexadecimal feature data corresponding to the feature tags according to the generated logical expression to obtain a binary string.

In this embodiment, referring to FIG. 6, performing the logical operation on the hexadecimal feature data corresponding to the feature tags according to the generated logical expression to obtain the binary string includes:

Step S401: Group the generated logical expression.

Step S402: Obtain the hexadecimal feature data corresponding to the feature tags of each group and convert it into long-type data.

Step S403: Perform Boolean operations, group by group, on the long-type data corresponding to the feature tags of each group.

Step S404: Convert the results of the Boolean operations of all groups into a binary string.

In this embodiment, after the query logical expression is obtained, it is split into the OR groups (Tag1||Tag2||..Tagn) and (Tag1||Tag2||..Tagn). The hexadecimal feature data corresponding to the feature tags is then obtained according to the selected query tags and converted into long-type data, Boolean operations are performed group by group on the long-type data corresponding to the feature tags of each group, and finally the results of the Boolean operations of all groups are converted into a binary string. In this embodiment, before the query operation is performed, the hexadecimal feature data corresponding to each feature tag is converted into long-type data in memory. Long-type data lends itself to Boolean operations, which improves the efficiency of processing massive crowd feature data.
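Steps S401 to S404 can be sketched as follows. Everything named here is an assumption for illustration: the `store` contents are hypothetical sample data for four individuals, the expression text format is assumed, and Python integers stand in for the 64-bit long values mentioned in the description. The sketch ORs the tags inside each group and then ANDs the group results, which is one reading of the expression form given in the text.

```python
from functools import reduce
from operator import and_, or_

UNIT_HEX = 16  # 16 hex digits encode one 64-bit unit

# Hypothetical stored hex data; the leading four bits cover crowd numbers 1-4.
store = {
    "male": "a000000000000000",     # 1010
    "female": "5000000000000000",   # 0101
    "married": "6000000000000000",  # 0110
    "student": "9000000000000000",  # 1001
}


def hex_to_units(hex_data):
    """S402: split hex text into 64-bit integers (stand-ins for long values)."""
    return [int(hex_data[i:i + UNIT_HEX], 16)
            for i in range(0, len(hex_data), UNIT_HEX)]


def split_groups(expression):
    """S401: "(a||b)&&(c||d)" -> [["a", "b"], ["c", "d"]]."""
    return [group.strip("()").split("||") for group in expression.split("&&")]


def combine(op, unit_lists):
    """Apply a bitwise operator unit by unit across equally long lists."""
    return [reduce(op, column) for column in zip(*unit_lists)]


def evaluate(expression, store):
    # S403: OR the tags inside each group, then AND the group results.
    group_results = [
        combine(or_, [hex_to_units(store[tag]) for tag in tags])
        for tags in split_groups(expression)
    ]
    result = combine(and_, group_results)
    # S404: render every 64-bit unit as a 64-character binary string.
    return "".join(format(unit, "064b") for unit in result)


bits = evaluate("(male||married)&&(student||female)", store)
print(bits[:4])  # 1100
```

Working on whole 64-bit units means a single bitwise instruction covers 64 individuals at once, which is the efficiency argument the paragraph above makes for converting to long-type data before the query runs.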

Step S207: Count the number of 1s in the obtained binary string and use the count as the audience size for the queried feature tags.

In this embodiment, counting the number of 1s in the obtained binary string yields the number of individuals in the massive crowd who have the features of the specified feature tags. This enables efficient distributed computation and improves the efficiency of processing and querying massive crowd feature data.
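Counting the 1s in step S207 is a population count over the result string. In this minimal sketch, the function name `audience_size` and the sample result string are hypothetical; the sample assumes any padding bits beyond the last crowd number are 0, so they do not affect the count.

```python
def audience_size(binary_string: str) -> int:
    """Return the number of individuals whose flag is 1 in the result string."""
    return binary_string.count("1")


# Hypothetical query result: crowd numbers 1 and 2 match, the rest do not.
result = "1100" + "0" * 60
size = audience_size(result)
print(size)  # 2
```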

In some embodiments of the present invention, the method for processing massive crowd feature data further includes sending the calculated audience size for the feature tags to the query end, so that the user can obtain the query result directly and quickly at the query end, which improves the efficiency of processing massive crowd feature data.

FIG. 7 is a schematic structural diagram of a system for processing massive crowd feature data according to yet another embodiment of the present invention. As shown in the figure, building on the above method embodiment, the system 100 for processing massive crowd feature data includes a setting module 10, a feature data establishing module 20, a conversion module 30, and a storage module 40.

In this embodiment, the massive crowd feature data may include identity information such as gender, age, occupation, and place of household registration, or other data. After the massive crowd feature data is obtained, the setting module 10 sets crowd numbers and feature tags in advance. Each individual in the crowd corresponds to a unique crowd number, and the crowd numbers are arranged in sequence without gaps. The feature tags can be set flexibly according to the characteristics of the feature data and the needs of the actual application, for example male, female, 25-30 years old, bachelor's degree, single, or employee.

The feature data establishing module 20 is configured to establish binary feature data corresponding to each feature tag according to the crowd numbers set by the setting module 10. The binary feature data is organized and stored by feature tag. Every item of the massive crowd feature data is marked against a feature tag as 0 or 1, where 1 indicates that the individual has the feature of the tag and 0 indicates that the individual does not. Each feature tag established by the feature data establishing module 20 thus stores a sequence of 0/1 flags in crowd-number order, forming the audience dot-matrix (bitmap) data of that feature tag as binary data. This takes full advantage of efficient binary operations and enables fast analysis and processing of massive crowd feature data. Organizing the data by feature tag also enables distributed storage and retrieval of massive crowd feature data and supports vertical partitioning, which makes it easy to introduce distributed computing and cluster processing.

In this embodiment, the conversion module 30 divides the binary feature data corresponding to each feature tag established by the feature data establishing module 20 into units of 64 bits according to the rules for conversion between number bases, and converts it into hexadecimal feature data. For example, TagCode holds feature tags such as male, woman, married, bachelor, and student, and the hexadecimal feature flags of each feature tag, in crowd-number order, form AudienceBool. Compressing the binary data into hexadecimal data saves storage space for massive crowd feature data.

In the system 100 for processing massive crowd feature data of this embodiment, the feature data establishing module 20 establishes binary feature data corresponding to each feature tag according to the crowd numbers, the conversion module 30 converts the established binary feature data corresponding to each feature tag into hexadecimal feature data, and the storage module 40 stores it. Organizing the data by feature tag efficiently enables distributed processing, storage, and retrieval of massive crowd feature data, improves the processing efficiency of massive crowd feature data, and saves data storage space.

FIG. 8 is a schematic structural diagram of a system for processing massive crowd feature data according to still another embodiment of the present invention. As shown in the figure, the system 100 for processing massive crowd feature data includes a setting module 10, a feature data establishing module 20, a conversion module 30, a storage module 40, a query logical expression generating module 50, a logical operation module 60, and a calculation module 70.

In this embodiment, when a feature retrieval query needs to be run against the massive crowd feature data stored in the system 100 for processing massive crowd feature data, the query logical expression generating module 50 generates a query logical expression according to the feature tags required by the actual retrieval query. As shown in FIG. 9, the query logical expression generating module 50 includes a first generating unit 501, a second generating unit 502, and a combining unit 503.

The first generating unit 501 is configured to generate a logical OR expression according to the feature tags selected for the query. The second generating unit 502 is configured to generate a logical AND expression according to the feature tags selected for the query. The combining unit 503 is configured to combine the logical OR expression and the logical AND expression generated by the first generating unit 501 and the second generating unit 502 into the query logical expression.

For example, the first generating unit 501 and the second generating unit 502 generate a logical OR expression and a logical AND expression according to the feature tags selected for the query, and the combining unit 503 combines them into the query logical expression (Tag1||Tag2..Tagn)&&(Tag1||Tag2..Tagn), so that the logical expression for a specific feature query is generated at the client.

在本实施例中，所述逻辑运算模块60用于根据所述查询逻辑表达式生成模块50生成的逻辑表达式对特征标签对应的十六进制特征数据进行逻辑运算而获取二进制字符串。参见图10，所述逻辑运算模块60，包括分组单元601、long类型数据转化单元602、分组运算单元603和二进制字符串转化单元604。In this embodiment, the logic operation module 60 is configured to perform a logical operation on the hexadecimal feature data corresponding to the feature tag according to the logic expression generated by the query logic expression generation module 50 to obtain a binary character. string. Referring to FIG. 10, the logic operation module 60 includes a grouping unit 601, a long type data conversion unit 602, a grouping operation unit 603, and a binary string conversion unit 604.

其中，所述分组单元601，用于对所述查询逻辑表达式生成模块生成的逻辑表达式进行分组；所述long类型数据转化单元602，用于获取所述分组单元601分组的特征标签对应的十六进制特征数据并转化为long类型数据；所述分组运算单元603，用于对所述long类型数据转化单元602转化的每一分组特征标签对应的long类型数据分组进行布尔运算；所述二进制字符串转化单元604，用于将所述分组运算单元603所有分组布尔运算的结果转化为二进制字符串。The grouping unit 601 is configured to group the logical expressions generated by the query logic expression generating module, and the long type data converting unit 602 is configured to acquire the feature tags corresponding to the grouping unit 601 grouping. The hexadecimal feature data is converted into long type data; the grouping operation unit 603 is configured to perform a Boolean operation on the long type data packet corresponding to each group feature tag converted by the long type data conversion unit 602; The binary string conversion unit 604 is configured to convert the result of all grouping Boolean operations of the grouping operation unit 603 into a binary string.

For example, after the query logic expression generation module 50 obtains the query logic expression, the grouping unit 601 splits the expression into the OR operation groups (Tag1||Tag2||..Tagn) and (Tag1||Tag2||..Tagn). The long type data conversion unit 602 then acquires the hexadecimal feature data corresponding to the selected query tags and converts it into long type data, the grouping operation unit 603 performs a Boolean operation on the long type data of the feature tags in each group, and finally the binary string conversion unit 604 converts the results of all the group Boolean operations into a binary string. In this embodiment, before the grouping operation unit 603 performs the query operation, the hexadecimal feature data corresponding to each feature tag is converted into long type data in memory; because Boolean operations can be applied directly to long type data, this improves the efficiency of processing massive population feature data.
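
The grouping, conversion, and Boolean steps in this example can be sketched as follows. This is a minimal Python illustration, not the implementation described in the application: Python's arbitrary-precision integers stand in for long type data, the stored hexadecimal values and the 8-person crowd are made-up sample data, and the names `split_groups`, `evaluate`, and `hex_store` are hypothetical.

```python
def split_groups(expression):
    """Split '(A||B)&&(C||D)' into [['A', 'B'], ['C', 'D']]."""
    return [part.strip("()").split("||") for part in expression.split("&&")]


def evaluate(expression, hex_store, width):
    """Return the matching audience as a binary string of `width` bits."""
    result = (1 << width) - 1          # all ones: identity element for AND
    for tags in split_groups(expression):
        group_bits = 0                 # all zeros: identity element for OR
        for tag in tags:
            group_bits |= int(hex_store[tag], 16)   # hex string -> integer
        result &= group_bits
    return format(result, f"0{width}b")


# Hypothetical stored data for a crowd of 8 people (one bit per person).
hex_store = {"Tag1": "A0", "Tag2": "0C", "Tag3": "F0", "Tag4": "03"}
bits = evaluate("(Tag1||Tag2)&&(Tag3||Tag4)", hex_store, width=8)
print(bits)  # 10100000
```

Here the first group evaluates to 10101100 and the second to 11110011, so only the two people whose bits are set in both groups remain in the result.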

In this embodiment, the calculation module 70 is configured to count the number of 1s in the binary string obtained by the logic operation module 60 and to use that count as the audience number of the queried feature tags. By counting the 1s in the binary string, the calculation module 70 obtains the number of audience members in the massive population who have the features of the specified feature tags, which enables efficient distributed computation and improves the efficiency of processing and querying massive population feature data.
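
Counting the 1s in the result string is a single pass over its characters. A minimal Python sketch, with the hypothetical function name `audience_count` and a made-up sample string:

```python
def audience_count(binary_string):
    """Count the people whose bit is set in the query result."""
    return binary_string.count("1")


print(audience_count("10100000"))  # 2
```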

In some embodiments of the present invention, the system 100 for processing massive population feature data further includes a sending module, configured to send the audience number of the feature tags calculated by the calculation module 70 to the query end, so that the user can obtain the query result directly and quickly at the query end, which improves the efficiency of processing massive population feature data.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the invention is defined by the claims and their equivalents.

Claims

A method for processing massive population feature data, comprising:

Setting a crowd number and feature tags;

Establishing binary feature data corresponding to each feature tag according to the crowd number;

Converting binary feature data corresponding to each feature tag to hexadecimal feature data;

Storing the converted hexadecimal feature data corresponding to each feature tag.

The method for processing massive population feature data according to claim 1, further comprising:

Generating a query logic expression according to the feature tag of the selected query;

Obtaining a binary string by performing a logical operation on the hexadecimal feature data corresponding to the feature tag according to the generated logical expression;

Calculating the number of 1s in the obtained binary string and using the calculated number as the audience number of the queried feature tag.

The method for processing massive population feature data according to claim 2, wherein the generating a query logic expression according to the feature tag of the selected query comprises:

Generating a logical OR operation expression according to the feature tag of the selected query;

Generating logical AND operation expressions based on the feature tags of the selected query;

Combining the generated logical OR operation expression and logical AND operation expression into a query logic expression.

The method for processing massive population feature data according to claim 2, wherein the obtaining a binary string by performing a logical operation on the hexadecimal feature data corresponding to the feature tag according to the generated logical expression comprises:

Grouping the generated logical expression;

Obtaining hexadecimal feature data corresponding to the feature tags of each group and converting it into long type data;

Performing a Boolean operation on the long type data corresponding to the feature tags of each group;

Converting the results of all the group Boolean operations into a binary string.

The method for processing massive population feature data according to claim 2, further comprising:

Sending the calculated audience number of the feature tag to the query end.

A processing system for massive population characteristic data, comprising:

a setting module for setting a crowd number and a feature tag;

a feature data establishing module, configured to establish binary feature data corresponding to each feature tag according to the crowd number set by the setting module;

a conversion module, configured to convert binary feature data corresponding to each feature tag established by the feature data establishing module into hexadecimal feature data;

and a storage module, configured to store the hexadecimal feature data corresponding to each feature tag converted by the conversion module.

The system for processing massive population characteristic data according to claim 6, further comprising:

a query logic expression generation module, configured to generate a query logic expression according to the feature tag of the selected query;

a logic operation module, configured to perform a logical operation on the hexadecimal feature data corresponding to the feature tag according to the logical expression generated by the query logic expression generation module to obtain a binary string;

and a calculation module, configured to calculate the number of 1s in the binary string obtained by the logic operation module and use the calculated number as the audience number of the queried feature tag.

The system for processing massive population feature data according to claim 7, wherein the query logic expression generating module comprises:

a first generating unit, configured to generate a logical OR operation expression according to the feature tag of the selected query;

a second generating unit, configured to generate a logical AND operation expression according to the feature tag of the selected query;

and a synthesizing unit, configured to combine the logical OR operation expression and the logical AND operation expression generated by the first generating unit and the second generating unit into a query logic expression.

The system for processing massive population characteristic data according to claim 7, wherein the logic operation module comprises:

a grouping unit, configured to group the logical expressions generated by the query logic expression generating module;

a long type data conversion unit, configured to acquire the hexadecimal feature data corresponding to the feature tags of each group produced by the grouping unit and convert it into long type data;

a grouping operation unit, configured to perform a Boolean operation on the long type data, converted by the long type data conversion unit, corresponding to the feature tags of each group;

and a binary string conversion unit, configured to convert the results of all the group Boolean operations of the grouping operation unit into a binary string.

The system for processing massive population characteristic data according to claim 7, further comprising a sending module, configured to send the audience number of the feature tag calculated by the calculation module to the query end.