
CN105824811A - Big data analysis method and device - Google Patents


Publication number
CN105824811A
Authority
CN
China
Prior art keywords
data
type
rules
rule
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510001942.6A
Other languages
Chinese (zh)
Other versions
CN105824811B (en
Inventor
黄庆荣
谢志崇
魏建荣
彭家华
郑志欢
林恪
陈钰铖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Fujian Co Ltd
Original Assignee
China Mobile Group Fujian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Fujian Co Ltd
Priority to CN201510001942.6A
Publication of CN105824811A
Application granted
Publication of CN105824811B
Legal status: Active

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An embodiment of the present invention discloses a big data analysis method, comprising: obtaining, based on an input first set of data and an input second set of data, at least two pieces of feature information that satisfy a preset condition, where the first set of data and the second set of data are both data in a first communication network, the first set of data satisfies a first preset rule, and the second set of data satisfies a second preset rule; analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine a first type of rule and a second type of rule; and determining, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in an input third set of data, where the third set of data is data in communication networks other than the first communication network. An embodiment of the present invention also discloses a big data analysis device.

Description

A big data analysis method and device

Technical field

The present invention relates to communication technology, and in particular to a big data analysis method and device.

Background

With the commercial rollout of fourth-generation mobile communication technology (4G, the 4th Generation mobile communication technology), competition among the major operators has become increasingly fierce. Winning back high-value users from other networks and increasing the penetration of 4G terminals play an important role in a mobile operator's development, so identifying high-value users on other networks is critical.

The industry already has methods that analyze and model user behavior to determine user attributes. However, existing methods generally focus on counting the number of users on other networks rather than on identifying those users or identifying their terminal types.

Summary of the invention

To solve the existing technical problem, embodiments of the present invention provide a big data analysis method and device that can determine, according to data rules derived from the operator's own network, target data in other-network data that satisfies a preset rule.

The technical solution of the embodiments of the present invention is implemented as follows. An embodiment of the present invention provides a big data analysis method, the method comprising:

obtaining, based on an input first set of data and an input second set of data, at least two pieces of feature information that satisfy a preset condition; the first set of data and the second set of data are both data in a first communication network; the first set of data satisfies a first preset rule; the second set of data satisfies a second preset rule;

analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine a first type of rule and a second type of rule;

determining, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in an input third set of data; the third set of data is data in communication networks other than the first communication network.

In the above solution, analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine the first type of rule and the second type of rule comprises:

analyzing the first set of data and the second set of data with a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first type of rule;

analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule.

In the above solution, analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule comprises:

analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;

determining, among the N rules, the second type of rule that satisfies a third preset rule.

In the above solution, determining, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in the input third set of data comprises:

analyzing the input third set of data according to the first type of rule and the second type of rule respectively, to obtain first suspected target data and second suspected target data;

determining, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.

In the above solution, the second type of rule includes a first type of sub-rule; the first type of sub-rule satisfies the first preset rule;

correspondingly, analyzing the input third set of data according to the first type of rule and the second type of rule respectively, to obtain the first suspected target data and the second suspected target data, comprises:

analyzing the input third set of data according to the first type of rule to obtain the first suspected target data;

analyzing the input third set of data according to the first type of sub-rule to obtain the second suspected target data.

In the above solution, the second type of rule further includes a second type of sub-rule; the second type of sub-rule satisfies the second preset rule; the method further comprises:

analyzing the first suspected target data and the second suspected target data according to the second type of sub-rule to obtain suspected non-target data;

correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:

determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.

An embodiment of the present invention also provides a big data analysis device, the device comprising:

an acquisition unit, configured to obtain, based on an input first set of data and an input second set of data, at least two pieces of feature information that satisfy a preset condition; the first set of data and the second set of data are both data in a first communication network; the first set of data satisfies a first preset rule; the second set of data satisfies a second preset rule;

an analysis unit, configured to analyze the first set of data and the second set of data according to the at least two pieces of feature information to determine a first type of rule and a second type of rule;

a determination unit, configured to determine, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in an input third set of data; the third set of data is data in communication networks other than the first communication network.

In the above solution, the analysis unit comprises:

a first analysis subunit, configured to analyze the first set of data and the second set of data with a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first type of rule;

a second analysis subunit, configured to analyze the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule.

In the above solution, the second analysis subunit is further configured to analyze the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;

and is further configured to determine, among the N rules, the second type of rule that satisfies a third preset rule.

In the above solution, the determination unit comprises:

a first determination subunit, configured to analyze the input third set of data according to the first type of rule and the second type of rule respectively, to obtain first suspected target data and second suspected target data;

a second determination subunit, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.

In the above solution, the second type of rule includes a first type of sub-rule; the first type of sub-rule satisfies the first preset rule; correspondingly,

the first determination subunit is further configured to analyze the input third set of data according to the first type of rule to obtain the first suspected target data;

and is further configured to analyze the input third set of data according to the first type of sub-rule to obtain the second suspected target data.

In the above solution, the second type of rule further includes a second type of sub-rule; the second type of sub-rule satisfies the second preset rule;

the first determination subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second type of sub-rule to obtain suspected non-target data;

correspondingly, the second determination subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.

The big data analysis method and device provided by the embodiments of the present invention can determine at least two pieces of feature information from the first set of data and the second set of data of the first communication network and, using two different algorithms, determine from that feature information a first type of rule and a second type of rule, one per algorithm. The first type of rule and the second type of rule are then used to analyze a third set of data from communication networks other than the first communication network, so as to determine the target data in the third set of data that satisfies the preset rule. The embodiments of the present invention can therefore determine, according to data rules derived from the operator's own network, target data in other-network data that satisfies the preset rule.

Description of drawings

FIG. 1 is a schematic flowchart of the big data analysis method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of the big data analysis device according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the analysis unit according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the determination unit according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a specific implementation of the big data analysis method according to an embodiment of the present invention.

Detailed description

To give a fuller understanding of the features and technical content of the present invention, its implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.

Embodiment 1

FIG. 1 is a schematic flowchart of the big data analysis method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 101: based on an input first set of data and an input second set of data, obtain at least two pieces of feature information that satisfy a preset condition; the first set of data and the second set of data are both data in a first communication network; the first set of data satisfies a first preset rule; the second set of data satisfies a second preset rule;

In this embodiment, the first preset rule may be a rule that, in the first communication network, the communication device type of the user corresponding to the data belongs to a first type; the second preset rule may be a rule that, in the first communication network, the communication device type of the user corresponding to the data does not belong to the first type. Thus, in the first communication network, the communication device types corresponding to the first set of data are all the first type, and the communication device types corresponding to the second set of data are all not the first type. Because data corresponding to different communication device types follows different characteristic patterns, analyzing the characteristic patterns of the first set of data and of the second set of data makes it possible to determine M pieces of feature information that satisfy the preset condition. Analyzing data based on the M pieces of feature information makes it possible to estimate characteristics such as the communication device type corresponding to the data. Based on this process, embodiments of the present invention can use the feature information from the first communication network to determine, from the large volume of other-network data, the data whose communication device type belongs to the first type, which lays a foundation for big data analysis. Here, M is a positive integer greater than or equal to 2.

In this embodiment, the feature information is specifically key variable indicators that satisfy the preset condition. Different algorithms are applied to analyze the big data in the first communication network, that is, the first set of data and the second set of data, by means of the key variable indicators, which lays a foundation for determining rules from the big data of the first communication network.

In this embodiment, the preset condition includes, but is not limited to: a condition of being greater than or equal to a first number of users, a condition that the communication device type of the communication counterpart is the first type, and so on.
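The selection of key variable indicators in step 101 can be sketched as follows. This is a minimal illustration only: the record layout, the indicator names (`sms_count`, `data_mb`, `mms_count`), and the reading of the "first number of users" condition as "at least `min_users` users have a non-zero value for the indicator" are all assumptions, not details given in the patent.

```python
from typing import Dict, List

Record = Dict[str, float]  # one user's indicator values (hypothetical layout)


def select_features(group1: List[Record], group2: List[Record],
                    min_users: int) -> List[str]:
    """Return indicators that at least `min_users` users actually use.

    Assumed interpretation of the preset condition: an indicator qualifies
    when the number of users with a non-zero value reaches `min_users`.
    """
    counts: Dict[str, int] = {}
    for record in group1 + group2:
        for name, value in record.items():
            if value > 0:
                counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, n in counts.items() if n >= min_users)


# First set: users whose device is the first type; second set: all others.
first_set = [{"sms_count": 2, "data_mb": 900, "mms_count": 0},
             {"sms_count": 1, "data_mb": 750, "mms_count": 0}]
second_set = [{"sms_count": 40, "data_mb": 120, "mms_count": 1},
              {"sms_count": 55, "data_mb": 0, "mms_count": 0}]

features = select_features(first_set, second_set, min_users=3)
print(features)
```

With these made-up records, `mms_count` is dropped because only one user has a non-zero value, leaving two indicators, which meets the requirement of at least two pieces of feature information.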

Step 102: analyze the first set of data and the second set of data according to the at least two pieces of feature information to determine a first type of rule and a second type of rule;

In this embodiment, different algorithms are applied, according to the at least two pieces of feature information determined in the first communication network, to analyze the first set of data and the second set of data, and thereby determine the first type of rule and the second type of rule based on the first communication network.

In practice, data analysis on big data usually applies more than one algorithm in order to improve the accuracy of the result; accordingly, this embodiment also uses two different algorithms to analyze the input first set of data and second set of data.

In the above solution, analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine the first type of rule and the second type of rule comprises:

analyzing the first set of data and the second set of data with a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first type of rule;

analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule.
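One way to read "determining the first type of rule with logistic regression" is to fit a logistic model on the two labeled sets and treat "predicted probability at or above a cutoff" as the rule. The sketch below does this in plain Python with batch gradient descent. The two features (SMS share and data share, scaled to [0, 1]), the learning rate, the epoch count, and the 0.5 cutoff are illustrative assumptions; the patent does not specify them.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, xi)))
            err = p - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def first_type_rule(weights: Sequence[float], bias: float,
                    x: Sequence[float], cutoff: float = 0.5) -> bool:
    """Assumed form of the rule: flag a record when its score reaches the cutoff."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= cutoff


# Hypothetical features per user: [sms_share, data_share].
# Label 1 = first set (first-type device), label 0 = second set.
X = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.70],
     [0.80, 0.20], [0.90, 0.10], [0.70, 0.30]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
predictions = [first_type_rule(weights, bias, x) for x in X]
print(predictions)
```

The fitted rule can then be applied unchanged to records from other networks, provided the same features can be computed there.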

In the above solution, analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule comprises:

analyzing the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;

determining, among the N rules, the second type of rule that satisfies a third preset rule.

In this embodiment, the number of rules determined by the decision tree algorithm, that is, N, varies with the number of pieces of feature information determined in step 101; the value of N is therefore constrained by the number of pieces of feature information.

In this embodiment, the second type of rule is a collective term for all rules among the N rules that satisfy the third preset rule; it does not refer to one specific rule.
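A decision tree yields one rule per root-to-leaf path, so the N rules can be enumerated by walking the tree, and the second type of rule is whatever subset passes the third preset rule. The sketch below uses a small hand-written tree in place of a trained one. The tree contents, the `purity` field, and the choice of "purity of at least 0.8" as the third preset rule are assumptions for illustration; the patent does not define the third preset rule.

```python
from typing import Any, Dict, List, Tuple

# Hand-written stand-in for a trained tree (all values hypothetical).
tree: Dict[str, Any] = {
    "feature": "sms_count", "threshold": 10,
    "left": {"label": "first_type", "purity": 0.92},
    "right": {
        "feature": "data_mb", "threshold": 500,
        "left": {"label": "other_type", "purity": 0.88},
        "right": {"label": "first_type", "purity": 0.61},
    },
}


def extract_rules(node: Dict[str, Any],
                  path: Tuple[str, ...] = ()) -> List[Dict[str, Any]]:
    """Turn every root-to-leaf path into one rule."""
    if "label" in node:  # leaf: emit the conditions collected so far
        return [{"conditions": list(path),
                 "label": node["label"],
                 "purity": node["purity"]}]
    feature, threshold = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + (f"{feature} <= {threshold}",))
            + extract_rules(node["right"], path + (f"{feature} > {threshold}",)))


MIN_PURITY = 0.8  # assumed form of the third preset rule

all_rules = extract_rules(tree)  # the N rules
second_type_rules = [r for r in all_rules if r["purity"] >= MIN_PURITY]

for rule in second_type_rules:
    print(" AND ".join(rule["conditions"]), "->", rule["label"])
```

In this toy tree the surviving rules include one that predicts the first type and one that predicts the other type, which corresponds to the first type of sub-rule and the second type of sub-rule described in the text.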

Step 103: according to the first type of rule and the second type of rule, determine target data that satisfies the first preset rule in an input third set of data; the third set of data is data in communication networks other than the first communication network.

In this embodiment, the first type of rule and the second type of rule determined in the first communication network can be used to determine, from the large volume of data in communication networks other than the first communication network, the target data that satisfies the first preset rule, that is, the target data in other-network data whose user's communication device type belongs to the first type. In this way, target data that satisfies the preset rule is determined in other-network data based on data rules from the operator's own network.

In the above solution, determining, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in the input third set of data comprises:

analyzing the input third set of data according to the first type of rule and the second type of rule respectively, to obtain first suspected target data and second suspected target data;

determining, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.

In this embodiment, the first suspected target data is the data corresponding to the first type of rule, that is, the suspected target data satisfying the first preset rule that the first type of rule identifies in communication networks other than the first communication network; the second suspected target data is the data corresponding to the second type of rule, that is, the suspected target data satisfying the first preset rule that the second type of rule identifies in communication networks other than the first communication network.

In the above solution, the second type of rule includes a first type of sub-rule; the first type of sub-rule satisfies the first preset rule;

correspondingly, analyzing the input third set of data according to the first type of rule and the second type of rule respectively, to obtain the first suspected target data and the second suspected target data, comprises:

analyzing the input third set of data according to the first type of rule to obtain the first suspected target data;

analyzing the input third set of data according to the first type of sub-rule to obtain the second suspected target data.

In this embodiment, because the second type of rule is determined by a decision tree algorithm, it can identify both second suspected target data that satisfies the first preset rule and suspected non-target data that satisfies the second preset rule. That is, the second type of rule includes a first type of sub-rule and a second type of sub-rule: the first type of sub-rule identifies the second suspected target data that satisfies the first preset rule, and the second type of sub-rule identifies the suspected non-target data that satisfies the second preset rule. This embodiment therefore also needs to remove the suspected non-target data from the first suspected target data and the second suspected target data in order to determine the final target data.

In this embodiment, the first type of sub-rule is a rule that satisfies the first preset rule; the second type of sub-rule is a rule that does not satisfy the first preset rule, that is, a rule that satisfies the second preset rule. When the second type of sub-rule is a rule that does not satisfy the first preset rule, the suspected non-target data is a kind of interference data, so the suspected non-target data may also be called interference data.

In the above solution, the second type of rule further includes a second type of sub-rule; the second type of sub-rule satisfies the second preset rule; the method further comprises:

analyzing the first suspected target data and the second suspected target data according to the second type of sub-rule to obtain suspected non-target data;

correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:

determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
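The final combination step can be sketched with set operations. The text states that suspected non-target data is removed from the first and second suspected target data; whether the two suspected sets are merged by union or by intersection is not specified, so the union used here is an assumption, and the user identifiers are made up.

```python
from typing import Iterable, Set


def determine_targets(first_suspected: Iterable[str],
                      second_suspected: Iterable[str],
                      suspected_non_target: Iterable[str]) -> Set[str]:
    """Merge both suspected sets (assumed: union), then drop interference data."""
    candidates = set(first_suspected) | set(second_suspected)
    return candidates - set(suspected_non_target)


first_suspected = {"u1", "u2", "u3"}   # flagged by the first type of rule
second_suspected = {"u2", "u4"}        # flagged by the first type of sub-rule
suspected_non_target = {"u3"}          # flagged by the second type of sub-rule

targets = determine_targets(first_suspected, second_suspected, suspected_non_target)
print(sorted(targets))
```

Replacing `|` with `&` would give the stricter variant in which a record must be flagged by both rule types before it counts as target data.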

To implement the above method, an embodiment of the present invention also provides a big data analysis device. As shown in FIG. 2, the device comprises:

an acquisition unit 21, configured to obtain, based on an input first set of data and an input second set of data, at least two pieces of feature information that satisfy a preset condition; the first set of data and the second set of data are both data in a first communication network; the first set of data satisfies a first preset rule; the second set of data satisfies a second preset rule;

an analysis unit 22, configured to analyze the first set of data and the second set of data according to the at least two pieces of feature information to determine a first type of rule and a second type of rule;

a determination unit 23, configured to determine, according to the first type of rule and the second type of rule, target data that satisfies the first preset rule in an input third set of data; the third set of data is data in communication networks other than the first communication network.

In the above solution, as shown in FIG. 3, the analysis unit 22 comprises:

a first analysis subunit 221, configured to analyze the first set of data and the second set of data with a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first type of rule;

a second analysis subunit 222, configured to analyze the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine the second type of rule.

In the above solution, the second analysis subunit 222 is further configured to analyze the first set of data and the second set of data with a decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;

and is further configured to determine, among the N rules, the second type of rule that satisfies a third preset rule.

上述方案中,如图4所示,所述确定单元23,包括:In the above solution, as shown in FIG. 4, the determining unit 23 includes:

第一确定子单元231,用于分别依据所述第一类规则和第二类规则,对输入的第三组数据进行分析,得到第一疑似目标数据和第二疑似目标数据;The first determination subunit 231 is configured to analyze the input third set of data according to the first type of rules and the second type of rules respectively, to obtain the first suspected target data and the second suspected target data;

第二确定子单元232,用于基于所述第一疑似目标数据和第二疑似目标数据确定出满足所述第一预设规则的目标数据。The second determining subunit 232 is configured to determine target data satisfying the first preset rule based on the first suspected target data and the second suspected target data.

In the above solution, the second-type rule includes a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule; correspondingly,

the first determining subunit 231 is further configured to analyze the input third set of data according to the first-type rule to obtain the first suspected target data;

and is further configured to analyze the input third set of data according to the first-type sub-rule to obtain the second suspected target data.

In the above solution, the second-type rule further includes a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule;

the first determining subunit 231 is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;

correspondingly, the second determining subunit 232 is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.

The acquiring unit 21, the analysis unit 22, and the determining unit 23 may all run on a computer, and may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) located in the computer.

Embodiment 2

The first software, for example the IMESSAGE software, is messaging software built into terminals of the first type that lets users send short messages to one another. It sends short messages directly over GPRS, which saves users of first-type terminals the SMS charges. As a result, first-type terminal users who use the first software may greatly reduce their SMS usage, producing an SMS black hole phenomenon. Based on this SMS black hole phenomenon, this embodiment identifies users in other networks whose terminal type is the first type.

This embodiment mainly uses communication data from the existing business analysis system to analyze the contact behavior of home-network first-type terminal users who use the first software, together with the characteristics of the people in their contact circles. It then identifies data (that is, users) in other networks that exhibit this contact behavior and whose contact circles match these characteristics, and thereby determines the users in other networks whose terminal type is the first type. This supports the operator's efforts to win back high-value customers from other networks and informs its marketing strategy.

Specifically, this embodiment is based on a user contact-circle model. By analyzing habitual features such as the voice contact circles and SMS contact circles of home-network customers who use the first software on first-type terminals, it identifies, among the large number of users in other networks, the group of first-type terminal users, and then estimates the probability that a given user in another network is a first-type terminal user, providing the operator with data of reference value.

FIG. 5 is a schematic flowchart of a specific implementation of the big data analysis method according to an embodiment of the present invention. Before the big data analysis is performed, the first set of data and the second set of data need to be determined. Specifically, a first set of data having a first data amount and a second set of data having the same first data amount are determined in the first communication network, where the user equipment type corresponding to each item of data in the first set is the first type, and the user equipment type corresponding to the second set of data is not the first type. As shown in FIG. 5, the method includes:

Step 501: From the first set of data and the second set of data, select M pieces of feature information by combining feature rules of the contact circles of the users corresponding to each set, feature rules of voice and SMS traffic within the contact circles, feature rules of whether the contacts use first-type terminals, and the like, where M is a positive integer greater than or equal to 2;

Here, the feature information is also referred to as key variable indicators.

Step 502: Using a logistic regression algorithm, analyze the first set of data and the second set of data according to the M pieces of feature information, and fit a first-type rule that satisfies the first preset rule;

Here, the first-type rule may be a logistic regression formula; the first preset rule is the rule that the user terminal type is the first type.
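For orientation, a logistic regression formula over the M pieces of feature information generally takes the form below. The symbols are assumptions introduced here for illustration and do not appear in the source: $x_1,\dots,x_M$ denote the feature values of one user, $\beta_0,\dots,\beta_M$ the fitted coefficients, and $y=1$ the event that the user's terminal is of the first type.

```latex
P(y = 1 \mid x_1, \dots, x_M)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{i=1}^{M} \beta_i x_i)\bigr)}
```

A user whose computed probability is high is more likely to satisfy the first preset rule.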

In this embodiment, analyzing the first set of data and the second set of data and fitting a first-type rule that satisfies the first preset rule includes:

based on the M pieces of feature information, applying the logistic regression algorithm to the first set of data and the second set of data to fit a first-type rule that satisfies the first preset rule.
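The fitting in step 502 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how the regression is fitted, so batch gradient descent is an assumption, and the two features, the sample values, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the per-feature coefficients.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit [intercept, w1..wM] by batch gradient descent on the log-loss."""
    m = len(rows[0])
    n = len(rows)
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

# Hypothetical features per user:
# [share of contacts on first-type terminals, SMS-to-voice ratio]
first_set = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.05], [0.6, 0.15]]    # first-type terminals
second_set = [[0.2, 0.9], [0.1, 0.8], [0.3, 0.7], [0.25, 0.95]]   # other terminals

rows = first_set + second_set
labels = [1] * len(first_set) + [0] * len(second_set)
weights = fit_logistic(rows, labels)  # plays the role of the first-type rule
```

The fitted `weights` can then be passed to `predict_proba` to score users who were not part of the training sets.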

Step 503: Determine the third set of data and, according to the first-type rule, calculate a probability for each item of data in the third set, so as to determine the first suspected target data; the third set of data is the data corresponding to users in other communication networks who communicate with users in the first communication network;

Here, calculating a probability for each item of data in the third set according to the first-type rule, so as to determine the first suspected target data, further includes:

calculating, according to the first-type rule, a probability for each item of data in the third set;

according to data service requirements and the preset number of users corresponding to each logistic regression grade of the logistic regression algorithm, identifying, among the probabilities corresponding to the items of data in the third set, the data whose probability is greater than or equal to a preset threshold, and taking that data as the first suspected target data.
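The thresholding in step 503 can be sketched as below. The coefficient values, user identifiers, feature vectors, and the threshold of 0.7 are all hypothetical; the source only states that a preset threshold is chosen according to service requirements.

```python
import math

def score(weights, features):
    # Logistic regression probability; weights[0] is the intercept.
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def first_suspected(third_set, weights, threshold):
    # Keep the users whose probability is greater than or equal to the threshold.
    return {uid for uid, feats in third_set.items()
            if score(weights, feats) >= threshold}

weights = [-1.0, 4.0, -3.0]  # hypothetical first-type rule
third_set = {                # hypothetical users in other networks
    "u1": [0.9, 0.1],
    "u2": [0.2, 0.8],
    "u3": [0.6, 0.3],
}
suspects = first_suspected(third_set, weights, threshold=0.7)
```

Raising or lowering the threshold trades the size of the suspected group against its expected precision, which is presumably why the source ties it to a preset number of users per grade.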

Step 504: Using the C5 decision tree algorithm, analyze the first set of data and the second set of data according to the M pieces of feature information, and determine m1 rules A and m2 rules B;

Step 505: Filter rules A and rules B according to the number of users and the confidence corresponding to each rule, so as to determine the first-type sub-rule from rules A and the second-type sub-rule from rules B;

Here, the first-type sub-rule satisfies the first preset rule; the second-type sub-rule satisfies the second preset rule; m1 and m2 are positive integers greater than or equal to 1.

Specifically, when the first set of data and the second set of data each contain 100,000 users, the rules in rules A whose confidence is greater than 85% and whose number of users is greater than 20,000 are selected as first-type sub-rules, and the rules in rules B whose confidence is greater than 90% and whose number of users is greater than 18,000 are selected as second-type sub-rules;

In this embodiment, both the first-type sub-rule and the second-type sub-rule belong to the second-type rule.
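The filtering in step 505 can be sketched as follows, using the thresholds given above. The `Rule` type, the rule names, and the sample confidence and user counts are hypothetical; the source does not describe how decision tree rules are represented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    confidence: float  # fraction of covered users that have the predicted class
    users: int         # number of training users covered by the rule

def select_sub_rules(rules, min_confidence, min_users):
    # Both comparisons are strict, matching "greater than" in the text.
    return [r for r in rules
            if r.confidence > min_confidence and r.users > min_users]

# Hypothetical decision tree output for 100,000 users per training set.
rules_a = [Rule("A1", 0.91, 25_000), Rule("A2", 0.88, 12_000), Rule("A3", 0.80, 30_000)]
rules_b = [Rule("B1", 0.93, 19_000), Rule("B2", 0.95, 9_000)]

first_type_sub_rules = select_sub_rules(rules_a, 0.85, 20_000)
second_type_sub_rules = select_sub_rules(rules_b, 0.90, 18_000)
```

Here only `A1` and `B1` pass: `A2` and `B2` cover too few users, and `A3` falls below the confidence threshold.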

Step 506: Analyze the third set of data according to the first-type sub-rule, and determine the second suspected target data;

Step 507: Determine the intersection of the first suspected target data and the second suspected target data as the third suspected target data;

Step 508: Remove from the third suspected target data the data that conforms to the second-type sub-rule, and take the remaining third suspected target data as the target data.
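Steps 507 and 508 amount to a set intersection followed by an exclusion. The sketch below assumes each group of suspected data is a set of user identifiers and that the second-type sub-rule is available as a predicate; both are assumptions made for illustration.

```python
def determine_targets(first_suspected, second_suspected, matches_second_type_sub_rule):
    # Step 507: users flagged by both the regression rule and the tree sub-rule.
    third_suspected = first_suspected & second_suspected
    # Step 508: drop users that the second-type sub-rule marks as non-targets.
    return {u for u in third_suspected if not matches_second_type_sub_rule(u)}

# Hypothetical identifiers of users in other networks.
first = {"u1", "u2", "u3", "u4"}
second = {"u2", "u3", "u4", "u5"}
suspected_non_targets = {"u3"}

targets = determine_targets(first, second, lambda u: u in suspected_non_targets)
```

Requiring agreement between the two models before the exclusion step narrows the result to users that both models flag, at the cost of discarding users flagged by only one of them.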

In this embodiment of the present invention, key variable indicators, that is, feature information, can be determined from the first set of data and the second set of data in the first communication network. The first and second sets of data are then analyzed with the logistic regression algorithm and the decision tree algorithm respectively, to determine the first-type rule corresponding to the logistic regression algorithm and the second-type rule corresponding to the decision tree algorithm, where the second-type rule includes the first-type sub-rule and the second-type sub-rule. Next, the third set of data from other networks is analyzed according to the first-type rule and the first-type sub-rule respectively, to determine the first suspected target data and the second suspected target data. The first-type rule and the first-type sub-rule both satisfy the first preset rule, whereas the second-type sub-rule satisfies the second preset rule. Therefore, after the intersection of the first and second suspected target data is taken as the third suspected target data, the data satisfying the second-type sub-rule, that is, the suspected non-target data, is removed from the third suspected target data to obtain the final target data. The target data is thus the data in other networks that, according to rules learned from home-network data, satisfies the first preset rule.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above are merely implementations of the embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements shall also fall within the scope of protection of the embodiments of the present invention.

Claims (12)

1. A big data analysis method, characterized in that the method comprises:

acquiring, based on an input first set of data and second set of data, at least two pieces of feature information that satisfy a preset condition, wherein the first set of data and the second set of data are both data in a first communication network, the first set of data satisfies a first preset rule, and the second set of data satisfies a second preset rule;

analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine a first-type rule and a second-type rule;

determining, according to the first-type rule and the second-type rule, target data that satisfies the first preset rule in an input third set of data, wherein the third set of data is data in communication networks other than the first communication network.

2. The method according to claim 1, characterized in that analyzing the first set of data and the second set of data according to the at least two pieces of feature information to determine the first-type rule and the second-type rule comprises:

analyzing the first set of data and the second set of data using a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first-type rule;

analyzing the first set of data and the second set of data using a decision tree algorithm, according to the at least two pieces of feature information, to determine the second-type rule.

3. The method according to claim 2, characterized in that analyzing the first set of data and the second set of data using the decision tree algorithm, according to the at least two pieces of feature information, to determine the second-type rule comprises:

analyzing the first set of data and the second set of data using the decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, wherein N is a positive integer greater than or equal to 2;

determining, among the N rules, the second-type rule that satisfies a third preset rule.

4. The method according to any one of claims 1 or 3, characterized in that determining, according to the first-type rule and the second-type rule, the target data that satisfies the first preset rule in the input third set of data comprises:

analyzing the input third set of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;

determining, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.

5. The method according to claim 4, characterized in that the second-type rule comprises a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule;

correspondingly, analyzing the input third set of data according to the first-type rule and the second-type rule respectively, to obtain the first suspected target data and the second suspected target data, comprises:

analyzing the input third set of data according to the first-type rule to obtain the first suspected target data;

analyzing the input third set of data according to the first-type sub-rule to obtain the second suspected target data.

6. The method according to claim 5, characterized in that the second-type rule further comprises a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule; the method further comprises:

analyzing the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;

correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:

determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.

7. A big data analysis device, characterized in that the device comprises:

an acquiring unit, configured to acquire, based on an input first set of data and second set of data, at least two pieces of feature information that satisfy a preset condition, wherein the first set of data and the second set of data are both data in a first communication network, the first set of data satisfies a first preset rule, and the second set of data satisfies a second preset rule;

an analysis unit, configured to analyze the first set of data and the second set of data according to the at least two pieces of feature information to determine a first-type rule and a second-type rule;

a determining unit, configured to determine, according to the first-type rule and the second-type rule, target data that satisfies the first preset rule in an input third set of data, wherein the third set of data is data in communication networks other than the first communication network.

8. The device according to claim 7, characterized in that the analysis unit comprises:

a first analysis subunit, configured to analyze the first set of data and the second set of data using a logistic regression algorithm, according to the at least two pieces of feature information, to determine the first-type rule;

a second analysis subunit, configured to analyze the first set of data and the second set of data using a decision tree algorithm, according to the at least two pieces of feature information, to determine the second-type rule.

9. The device according to claim 8, characterized in that the second analysis subunit is further configured to analyze the first set of data and the second set of data using the decision tree algorithm, according to the at least two pieces of feature information, to determine N rules, wherein N is a positive integer greater than or equal to 2;

and is further configured to determine, among the N rules, the second-type rule that satisfies a third preset rule.

10. The device according to any one of claims 7 to 9, characterized in that the determining unit comprises:

a first determining subunit, configured to analyze the input third set of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;

a second determining subunit, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.

11. The device according to claim 10, characterized in that the second-type rule comprises a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule; correspondingly,

the first determining subunit is further configured to analyze the input third set of data according to the first-type rule to obtain the first suspected target data;

and is further configured to analyze the input third set of data according to the first-type sub-rule to obtain the second suspected target data.

12. The method according to claim 11, characterized in that the second-type rule further comprises a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule;

the first determining subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;

correspondingly, the second determining subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
CN201510001942.6A 2015-01-04 2015-01-04 A big data analysis method and device thereof Active CN105824811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510001942.6A CN105824811B (en) 2015-01-04 2015-01-04 A big data analysis method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510001942.6A CN105824811B (en) 2015-01-04 2015-01-04 A big data analysis method and device thereof

Publications (2)

Publication Number Publication Date
CN105824811A true CN105824811A (en) 2016-08-03
CN105824811B CN105824811B (en) 2019-07-02

Family

ID=56513287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510001942.6A Active CN105824811B (en) 2015-01-04 2015-01-04 A big data analysis method and device thereof

Country Status (1)

Country Link
CN (1) CN105824811B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010049821A1 (en) * 2000-06-02 2001-12-06 Yasushi Ochi Network-utilizing content broadcast system and contest execution system
CN1333612A (en) * 2000-06-19 2002-01-30 阿尔卡塔尔公司 Method for rebooting terminal connected with local area network
CN1647052A (en) * 2002-04-12 2005-07-27 沃达方集团有限公司 Method and system for distribution of encrypted data in a mobile network
CN1698311A (en) * 2003-01-16 2005-11-16 索尼英国有限公司 Video/audio network
US20060225141A1 (en) * 2005-03-30 2006-10-05 Fujitsu Limited Unauthorized access searching method and device
US20080091532A1 (en) * 2006-10-17 2008-04-17 Silverbrook Research Pty Ltd Method of delivering an advertisement from a computer system
US20090282023A1 (en) * 2008-05-12 2009-11-12 Bennett James D Search engine using prior search terms, results and prior interaction to construct current search term results
CN103327063A (en) * 2012-02-14 2013-09-25 谷歌公司 User presence detection and event discovery

Also Published As

Publication number Publication date
CN105824811B (en) 2019-07-02

Similar Documents

Publication Publication Date Title
TWI804575B (en) Method and apparatus, computer readable storage medium, and computing device for identifying high-risk users
CN112711705B (en) Public opinion data processing method, equipment and storage medium
CN108038130A (en) Automatic cleaning method, device, equipment and the storage medium of fictitious users
CN108090567A (en) Power communication system method for diagnosing faults and device
CN111754241B (en) User behavior perception method, device, equipment and medium
CN110033302B (en) Malicious account identification method and device
CN113298121B (en) Message sending method and device based on multi-data source modeling and electronic equipment
CN104866296B (en) Data processing method and device
CN113904943B (en) Account detection method and device, electronic equipment and storage medium
CN110781514A (en) Data privacy protection method
CN104750760A (en) Application software recommending method and device
CN111814064A (en) Method, device, computer equipment and medium for abnormal user processing based on Neo4j
CN107015993B (en) User type identification method and device
CN104320266A (en) Charging method and device under cloud computing operation system
CN104954360A (en) Method and device for blocking shared content
CN103294833A (en) Junk user discovering method based on user following relationships
US20190220924A1 (en) Method and device for determining key variable in model
CN114726565B (en) Threat intelligence sharing method, threat intelligence rating method, system and storage medium
CN110222484A (en) A kind of method for identifying ID, device, electronic equipment and storage medium
CN111242658A (en) Information sharing reward method, device and computer readable storage medium
CN111125193B (en) Method, device, equipment and storage medium for identifying abnormal multimedia comments
CN105808580A (en) Information determination method and equipment based on prior model
CN113779336A (en) User behavior data processing method and device, electronic equipment
CN105824811A (en) Big data analysis method and device
CN105260467B (en) A kind of SMS classified method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant