CN102929896A - Data mining method based on privacy protection - Google Patents
- Publication number
- CN102929896A
- Authority
- CN
- China
- Prior art keywords
- data
- decision tree
- method based
- privacy protection
- mining method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of data mining, and particularly relates to a data mining method based on privacy protection. The method comprises the following steps: transformation of the original real data, in which potentially numeric attribute data are discretized and a transition probability matrix is then set for all attributes; generation of a decision tree, in which statistics are gathered on the transformed data records and the decision tree is generated step by step, recursively, using the transformed training sample data set S, the determined splitting attribute set, the split points and the data subset marks; and generation of classification rules, in which the generated decision tree is pruned to produce the classification rules. The method is applicable to non-character data and to non-uniformly distributed original data, and can also transform label attributes; moreover, a classification tree built from the transformed data set has higher accuracy.
Description
Technical field
The invention belongs to the field of data mining, and relates in particular to a data mining method based on privacy protection.
Background
Today's society is one of information explosion, and the development of the Internet has further intensified the exchange and spread of information. All of this has greatly stimulated the demand for mining useful information from large volumes of data. These data, and the information derived from them, are an asset for every industry, faithfully recording the essential state of business operations. Faced with such volumes of data, however, traditional data analysis methods such as data retrieval and statistical analysis can obtain only surface-level information and cannot reach the deeper, intrinsic information, so managers face the predicament of being rich in data but poor in knowledge. How to mine knowledge useful for business decisions from these data is therefore very important.
Summary of the invention
The invention provides a data mining method based on privacy protection, comprising the following steps:
A transformation step for the original real data: discretize the potentially numeric attribute data, then set a transition probability matrix for all attributes;
A decision tree generation step: on the server side, gather statistics on the transformed data records and, using the transformed training sample data set S, the determined splitting attribute set, the split points and the data subset marks, generate the decision tree step by step, recursively;
A classification rule generation step: prune the generated decision tree to produce the classification rules.
The data mining method based on privacy protection of the present invention is applicable to non-character data and to non-uniformly distributed original data, can also transform label attributes, and the classification tree built from the transformed data set has higher accuracy.
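The transformation step can be illustrated with a minimal sketch. The patent text does not give cut points, matrix values or a sampling procedure, so everything named here (the `age` attribute, `AGE_CUTS`, `AGE_MATRIX`, and the functions `discretize` and `perturb`) is a hypothetical choice made only for illustration. The sketch assumes that row i of the transition probability matrix gives the probabilities with which true category i is reported as each category.

```python
import bisect
import random


def discretize(value, cut_points):
    """Map a numeric value to the index of its interval (cut_points sorted ascending)."""
    return bisect.bisect_right(cut_points, value)


def perturb(index, matrix, rng):
    """Replace category `index` with one drawn from row `index` of the matrix."""
    row = matrix[index]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


# Hypothetical attribute "age" split into three intervals: <30, 30-49, >=50.
AGE_CUTS = [30, 50]

# Hypothetical transition probability matrix (an assumption, not from the patent):
# row i is the distribution of the reported category when the true category is i.
AGE_MATRIX = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [23, 35, 61, 47, 52]
true_bins = [discretize(a, AGE_CUTS) for a in ages]
reported_bins = [perturb(b, AGE_MATRIX, rng) for b in true_bins]
```

Only `reported_bins` would leave the data owner in such a scheme; the true categories stay local.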
Description of the drawings
Fig. 1 is a flow chart of the steps of the data mining method based on privacy protection.
Embodiment
The flow chart of the data mining method based on privacy protection is shown in Fig. 1. The method comprises the following steps:
A transformation step for the original real data: discretize the potentially numeric attribute data, then set a transition probability matrix for all attributes;
A decision tree generation step: on the server side, gather statistics on the transformed data records and, using the transformed training sample data set S, the determined splitting attribute set, the split points and the data subset marks, generate the decision tree step by step, recursively;
A classification rule generation step: prune the generated decision tree to produce the classification rules.
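The server-side statistics step can be sketched as follows. The patent does not state how the statistics of the transformed records are used or which splitting criterion is applied, so this sketch makes two assumptions: the original category counts are estimated by inverting the transition probability matrix, and candidate splitting attributes are compared by information gain. All function names and the example numbers are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is not invertible")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_true_counts(observed, matrix):
    """Estimate original category counts from the counts of transformed records.

    Expected observed[j] = sum_i true[i] * matrix[i][j], so the transposed
    system is solved; negative estimates are clipped to zero.
    """
    transposed = [list(col) for col in zip(*matrix)]
    return [max(0.0, x) for x in solve_linear(transposed, observed)]


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(class_counts_per_value):
    """Gain of a split; class_counts_per_value[v] holds class counts for value v."""
    totals = [sum(col) for col in zip(*class_counts_per_value)]
    n = sum(totals)
    if n == 0:
        return 0.0
    remainder = sum(sum(c) / n * entropy(c) for c in class_counts_per_value)
    return entropy(totals) - remainder


# Hypothetical example: true counts [100, 50, 20] give expected observed counts
# [87, 52, 31] under this matrix; the estimate recovers the true counts.
matrix = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
estimated = estimate_true_counts([87, 52, 31], matrix)
gain = information_gain([[10, 0], [0, 10]])
```

In a recursive tree builder of this kind, the estimated counts for each class would feed `information_gain`, the attribute with the largest gain would be chosen as the splitting attribute, and the procedure would repeat on each data subset.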
Claims (1)
1. A data mining method based on privacy protection, characterized in that it comprises the following steps:
A transformation step for the original real data: discretize the potentially numeric attribute data, then set a transition probability matrix for all attributes;
A decision tree generation step: on the server side, gather statistics on the transformed data records and, using the transformed training sample data set S, the determined splitting attribute set, the split points and the data subset marks, generate the decision tree step by step, recursively;
A classification rule generation step: prune the generated decision tree to produce the classification rules.
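The rule generation step can be sketched in a similar spirit. The patent does not specify the pruning criterion or the rule format, so this sketch assumes a very simple pruning (collapsing a subtree whose leaves all predict the same class) and emits one IF-THEN rule per root-to-leaf path. The nested-dictionary tree layout, the attribute names and the functions `prune` and `extract_rules` are hypothetical.

```python
def prune(node):
    """Collapse any subtree whose branches are all leaves with the same class."""
    if not isinstance(node, dict):
        return node  # a leaf is just a class label
    branches = {value: prune(child) for value, child in node["branches"].items()}
    children = list(branches.values())
    all_leaves = all(not isinstance(c, dict) for c in children)
    if children and all_leaves and len(set(children)) == 1:
        return children[0]
    return {"attribute": node["attribute"], "branches": branches}


def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN classification rule."""
    if not isinstance(node, dict):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        step = conditions + ((node["attribute"], value),)
        rules.extend(extract_rules(child, step))
    return rules


# Hypothetical decision tree over discretized attributes.
tree = {
    "attribute": "age",
    "branches": {
        "<30": {"attribute": "income", "branches": {"low": "no", "high": "no"}},
        "30-49": "yes",
        ">=50": {"attribute": "income", "branches": {"low": "no", "high": "yes"}},
    },
}

pruned = prune(tree)
rules = extract_rules(pruned)
```

Here the `<30` subtree collapses to a single leaf because both of its branches predict the same class, so the pruned tree yields four rules instead of five.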
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102329325A CN102929896A (en) | 2011-08-13 | 2011-08-13 | Data mining method based on privacy protection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102329325A CN102929896A (en) | 2011-08-13 | 2011-08-13 | Data mining method based on privacy protection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102929896A (en) | 2013-02-13 |
Family
ID=47644695
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011102329325A Pending CN102929896A (en) | 2011-08-13 | 2011-08-13 | Data mining method based on privacy protection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102929896A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605749A (en) * | 2013-11-20 | 2014-02-26 | 同济大学 | Privacy protection associated rule data digging method based on multi-parameter interference |
CN104731976A (en) * | 2015-04-14 | 2015-06-24 | 海量云图(北京)数据技术有限公司 | Method for finding and sorting private data in data table |
CN104798043A (en) * | 2014-06-27 | 2015-07-22 | 华为技术有限公司 | Data processing method and computer system |
CN113011484A (en) * | 2021-03-12 | 2021-06-22 | 大商所飞泰测试技术有限公司 | Graphical demand analysis and test case generation method based on classification tree and decision tree |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090248682A1 (en) * | 2008-04-01 | 2009-10-01 | Certona Corporation | System and method for personalized search |
CN201859444U (en) * | 2010-04-07 | 2011-06-08 | 苏州市职业大学 | Data excavation device for privacy protection |
2011-08-13: CN application CN2011102329325A (CN102929896A), status: Pending
Non-Patent Citations (1)
Title |
---|
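Ge Weiping et al.: "Classification mining based on privacy protection" (基于隐私保护的分类挖掘), Journal of Computer Research and Development (《计算机研究与发展》) * |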
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605749A (en) * | 2013-11-20 | 2014-02-26 | 同济大学 | Privacy protection associated rule data digging method based on multi-parameter interference |
CN104798043A (en) * | 2014-06-27 | 2015-07-22 | 华为技术有限公司 | Data processing method and computer system |
WO2015196476A1 (en) * | 2014-06-27 | 2015-12-30 | 华为技术有限公司 | Data processing method and computer system |
US9984336B2 (en) | 2014-06-27 | 2018-05-29 | Huawei Technologies Co., Ltd. | Classification rule sets creation and application to decision making |
CN104731976A (en) * | 2015-04-14 | 2015-06-24 | 海量云图(北京)数据技术有限公司 | Method for finding and sorting private data in data table |
CN104731976B (en) * | 2015-04-14 | 2018-03-30 | 海量云图(北京)数据技术有限公司 | The discovery of private data and sorting technique in tables of data |
CN113011484A (en) * | 2021-03-12 | 2021-06-22 | 大商所飞泰测试技术有限公司 | Graphical demand analysis and test case generation method based on classification tree and decision tree |
CN113011484B (en) * | 2021-03-12 | 2023-12-26 | 大商所飞泰测试技术有限公司 | Graphical demand analysis and test case generation method based on classification tree and judgment tree |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kubina et al. | Use of big data for competitive advantage of company | |
CN103778200B (en) | A kind of message information source abstracting method and its system | |
CN104199972A (en) | Named entity relation extraction and construction method based on deep learning | |
US9928288B2 (en) | Automatic modeling of column and pivot table layout tabular data | |
CN114911870A (en) | Fusion management framework for multi-source heterogeneous industrial data | |
CN106503872A (en) | A kind of business process system construction method based on basic business active set | |
CN102929896A (en) | Data mining method based on privacy protection | |
CN102122280A (en) | Method and system for intelligently extracting content object | |
JP5309543B2 (en) | Information search server, information search method and program | |
CN108280561A (en) | A kind of discrete manufacture mechanical product quality source tracing method based on comentropy and Weighted distance | |
CN120196741A (en) | Text event association method based on large language model | |
CN106557881B (en) | A method for building a business process system based on the execution sequence of business activities | |
CN104636324B (en) | Topic source tracing method and system | |
CN105573972A (en) | Report check formula generation method and apparatus | |
Li et al. | Vandalism detection in OpenStreetMap via user embeddings | |
CN118503429A (en) | Knowledge graph-based rapid classifying method and system for scientific and technological achievements | |
CN109614491B (en) | Further mining method based on mining result of data quality detection rule | |
CN102929888A (en) | Data mining method based on web | |
Qiu et al. | How Much Do Women Build Open Source Infrastructure? | |
CN110955754A (en) | A Model Construction Method for Repeated Call Analysis and Recognition | |
CN117348863B (en) | Low-code development method and device for industrial software, electronic equipment and storage medium | |
Brock Júnior et al. | Development of a logical model for geotechnical databases | |
CN106447165A (en) | A heuristic job classification method and device | |
CN107729346A (en) | A kind of new method for excavating the hidden transition of business procedure | |
CN107480822A (en) | A TrieTree-Based Method Forecasting the Development Dynamics of Listed Enterprises |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130213 |