CN104699761B - A kind of incremental calculation method of minimal functional dependencies - Google Patents
A kind of incremental calculation method of minimal functional dependencies Download PDFInfo
- Publication number
- CN104699761B CN104699761B CN201510072548.1A CN201510072548A CN104699761B CN 104699761 B CN104699761 B CN 104699761B CN 201510072548 A CN201510072548 A CN 201510072548A CN 104699761 B CN104699761 B CN 104699761B
- Authority
- CN
- China
- Prior art keywords
- new
- fds
- tuple
- functional dependencies
- minimal functional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
本发明公开了一种最小函数依赖的增量计算方法,该方法根据关系表变化前的最小非平凡函数依赖集、增量数据集、变化前的关系表的划分信息集,增量检测原有的最小函数依赖是否成立,最后确定关系表变化后的最小非平凡函数依赖集。该方法按照元组的操作类型(增加、删除或修改),进行相应的最小函数依赖的增量计算。由于在实际应用中,数据库变化后,原数据集中的大多数最小函数依赖都是有效的,本发明提出的方法不需要重新计算新数据集的所有最小函数依赖,仅需要计算原最小函数依赖集的新增及删除的最小函数依赖,因此效率较高,且灵活性强,计算结果准确。
The invention discloses an incremental calculation method of minimum function dependence. According to the minimum non-trivial function dependency set before the change of the relationship table, the incremental data set, and the division information set of the relationship table before the change, the method incrementally detects the original Whether the minimum functional dependency of is established, and finally determine the minimum non-trivial functional dependency set after the relationship table changes. According to the operation type (addition, deletion or modification) of the tuple, the method performs the incremental calculation of the corresponding minimum functional dependency. Since in practical application, after the database changes, most of the minimum functional dependencies in the original data set are valid, the method proposed by the present invention does not need to recalculate all the minimum functional dependencies of the new data set, and only needs to calculate the original minimum functional dependency set The minimal functional dependence of adding and deleting, so the efficiency is high, the flexibility is strong, and the calculation results are accurate.
Description
技术领域technical field
本发明涉及计算机数据处理领域,特别涉及一种最小函数依赖的增量计算方法。The invention relates to the field of computer data processing, in particular to an incremental calculation method with minimum function dependence.
背景技术Background technique
随着信息化进程在各行各业的快速推进,海量、动态数据库的涌现使数据分析与处理面临严峻考验。数据库是变化的,数据所反映的知识也是变化的,如何高效、动态地维护数据挖掘所得知识是数据库领域的一个重要课题。近些年来,数据库完整性约束正被广泛应用于数据分析与处理,以及检测不一致数据、提高数据质量等方面。With the rapid advancement of informatization in all walks of life, the emergence of massive and dynamic databases has made data analysis and processing face severe challenges. The database is changing, and the knowledge reflected by the data is also changing. How to efficiently and dynamically maintain the knowledge obtained from data mining is an important topic in the database field. In recent years, database integrity constraints are being widely used in data analysis and processing, as well as detecting inconsistent data and improving data quality.
目前,政府部门、企业等单位的信息系统及数据中心绝大部分以关系数据库管理为核心,为了从中快速、准确地识别出错误、不一致等异常数据需要高效可行的技术与方法支持,数据质量检测工具需求越来越大。从用途方面上考虑,数据质量检测方法适用于:业务数据处理与统计、数据归档、数据仓库维护、数据清洗、数据集成或整合之中。为此,未来几年在政府、事业、企业等单位的信息化建设过程中引入数据质量管理平台,建立各个层次上数据质量检测系统,将成为必然趋势。错误、不一致等异常数据检测方法及相应的检测工具具有很好的产业前景。At present, most of the information systems and data centers of government departments, enterprises and other units use relational database management as the core. In order to quickly and accurately identify abnormal data such as errors and inconsistencies, efficient and feasible technical and method support are needed. Data quality inspection Tools are in increasing demand. From the perspective of usage, the data quality detection method is suitable for: business data processing and statistics, data archiving, data warehouse maintenance, data cleaning, data integration or integration. For this reason, it will become an inevitable trend to introduce data quality management platforms and establish data quality inspection systems at all levels in the informatization construction process of governments, institutions, and enterprises in the next few years. Abnormal data detection methods such as errors and inconsistencies and corresponding detection tools have good industrial prospects.
函数依赖(Functional Dependency,缩写为FD)是关系数据库的重要约束之一,不仅是数据库设计的基础,而且是数据质量监测、控制的依据。虽然函数依赖可以在数据库管理系统作为数据约束定义,但许多情况下,在构建数据库时,没有明确定义函数依赖,或者并非在构造数据库时能够明确制定,需要通过数据集挖掘或计算函数依赖,并且随着数据库的变化还需要对它们进行维护,以保证检测异常数据的准确性。Functional Dependency (FD for short) is one of the important constraints of relational databases, not only the basis of database design, but also the basis of data quality monitoring and control. Although functional dependencies can be defined as data constraints in the database management system, in many cases, when the database is constructed, the functional dependencies are not clearly defined, or cannot be clearly formulated when the database is constructed, and it is necessary to mine or calculate the functional dependencies through data sets, and As the database changes, they need to be maintained to ensure the accuracy of detecting abnormal data.
关于函数依赖集的计算算法已研究多年,最具代表性的两种算法是按照层次的算法(例如Y.Huhtala,J.Karkk ainen等在1999年公开的《TANE:An efficient algorithmfor discovering functional and approximate dependencies》)和深度优先的算法(例如C.M.Wyss,C.Giannella,E.L.Robertson等在2001年公开的《FastFDs:A heuristic-driven,depth-first algorithm for mining functional dependencies from relationinstances-extended for abstract》),随着数据库的变化,函数依赖也可能发生变化,即原有的函数依赖不再成立、新的函数依赖产生。目前计算数据库变化后的函数依赖及最小函数依赖的方式主要有两种:Algorithms for computing functional dependency sets have been studied for many years, and the two most representative algorithms are hierarchical algorithms (for example, "TANE: An efficient algorithm for discovering functional and approximate" published by Y. Huhtala, J. Karkkainen, etc. in 1999 dependencies") and depth-first algorithms (such as "FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relationinstances-extended for abstract" published by C.M.Wyss, C.Giannella, E.L.Robertson, etc. in 2001), As the database changes, the functional dependencies may also change, that is, the original functional dependencies are no longer established, and new functional dependencies are generated. At present, there are two main ways to calculate the functional dependence and the minimum functional dependence after the database change:
(1)重新计算:针对变化后的数据库按照某一函数依赖计算方法重新计算函数依赖,不利用数据集变化的部分(即增加的元组,删除的元组,修改的元组)、原来计算出的最小函数依赖集及划分信息。(1) Recalculation: For the changed database, recalculate the functional dependence according to a certain functional dependence calculation method, without using the changed part of the data set (that is, the added tuple, the deleted tuple, the modified tuple), the original calculation The minimum functional dependency set and partition information obtained.
(2)增量计算:主要根据数据集变化的部分、原有的最小函数依赖集及划分信息,增量检测原有的最小函数依赖是否成立,并计算新增的最小函数依赖。(2) Incremental calculation: mainly based on the changing part of the data set, the original minimum functional dependency set and partition information, incrementally detect whether the original minimum functional dependency is established, and calculate the newly added minimum functional dependency.
目前,针对动态变化的数据库,计算函数依赖集及最小函数依赖的算法均是采用重新计算的方式,如TANE算法和FastFDs算法以及基于它们改进的算法。针对变化的数据库,采用重新计算最小函数依赖方式,重复计算与变化前数据库相同的最小函数依赖,随着数据库的增大,时间效率较低。在实际应用中,随着数据库的变化,大部分最小函数依赖仍然不变或成立,因此,重新计算是没有必要的。At present, for dynamically changing databases, the algorithms for calculating functional dependency sets and minimum functional dependencies are recalculated, such as TANE algorithm, FastFDs algorithm and improved algorithms based on them. For the changed database, the method of recalculating the minimum functional dependency is used to repeatedly calculate the same minimum functional dependency as the database before the change. As the database grows, the time efficiency is low. In practical applications, as the database changes, most of the minimum functional dependencies remain unchanged or hold, so recalculation is unnecessary.
所以,提出一种计算效率高、灵活性强、通过增量计算方式来计算最小函数依赖的方法具有重要的应用价值。Therefore, it is of great application value to propose a method with high computational efficiency, strong flexibility, and incremental calculation to calculate the minimum functional dependence.
发明内容Contents of the invention
本发明的目的在于克服现有技术的缺点与不足,提供一种最小函数依赖的增量计算方法,该方法是采用增量计算的方式,从变化的关系数据集中计算最小函数依赖,计算效率高,且计算结果准确。The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art, and provide a method for incremental calculation of minimum functional dependence. The method adopts an incremental calculation method to calculate minimum functional dependence from changing relational data sets, and has high calculation efficiency. , and the calculation result is accurate.
为了克服现有方法和技术的不足,本发明根据变化数据集、关系表变化前的最小函数依赖集及划分信息,增量检测原有的最小函数依赖是否成立,并计算新增的最小函数依赖。In order to overcome the deficiencies of the existing methods and technologies, the present invention incrementally detects whether the original minimum functional dependence is established according to the change data set, the minimum functional dependence set and the division information before the relational table change, and calculates the newly added minimum functional dependence .
本发明技术方案所涉及的数学定义首先列举如下:The mathematical definitions involved in the technical solution of the present invention are first listed as follows:
函数依赖的定义:令R是一个关系模式,Y∈R,函数依赖X→Y成立的条件是:对于模式为R的关系表r及r中任意两个元组t1和t2,以及所有B∈X,如果t1[B]=t2[B],则t1[Y]=t2[Y]。若Y∈X,X→Y是平凡的。Definition of functional dependencies: Let R be a relational schema, Y∈R, the condition for the establishment of functional dependence X→Y is: for a relational table r with schema R and any two tuples t 1 and t 2 in r, and all B∈X, if t 1 [B]=t 2 [B], then t 1 [Y] = t 2 [Y]. If Y∈X, X→Y is trivial.
最小函数依赖的定义:令R是一个关系模式,Y∈R,函数依赖X→Y成立,若对于任意非空Z→Y不成立,则X→Y是最小函数依赖。Definition of minimal functional dependency: Let R be a relational schema, Y∈R, the functional dependence X→Y holds, if for any non-empty Z→Y is not established, then X→Y is the minimum functional dependence.
本发明仅研究最小非平凡函数依赖的增量计算方法。下文提到的函数依赖均指非平凡函数依赖、最小函数依赖均指最小非平凡函数依赖。The present invention only studies the incremental calculation method of minimum non-trivial function dependence. The functional dependencies mentioned below all refer to non-trivial functional dependencies, and the minimum functional dependencies refer to minimum non-trivial functional dependencies.
等价类的定义:属性集X的某一等价类[t]X是指在给定关系表中,所有与元组t在X上取值对应相等(简称与t在X上取值相等)的元组的集合。Definition of equivalence class: A certain equivalence class [t] X of an attribute set X means that in a given relational table, all the values corresponding to tuple t on X are equal (referred to as equal to t on X) ) set of tuples.
划分的定义:属性集X上的划分是在给定关系表r中,X的所有等价类的集合,记为πX,|πX|表示在πX中等价类的数目。Definition of division: The division on attribute set X is the set of all equivalence classes of X in a given relational table r, denoted as π X , where |π X | represents the number of equivalence classes in π X.
判定函数依赖X→Y是否成立的方法:若|πX|=|πX∪{Y}|,则函数依赖X→Y成立;否则,不成立。本发明按照该方法判定函数依赖是否成立。The method of judging whether the function depends on X→Y is true: if |π X |=|π X∪{Y} |, then the function depends on X→Y; otherwise, it does not hold. According to this method, the present invention judges whether the functional dependence is established.
判定函数依赖X→Y是否为最小函数依赖的方法:函数依赖X→Y成立,且若对于任意Z→Y不成立,则X→Y是最小函数依赖;否则不是最小函数依赖。本发明按照该方法判定函数依赖是否是最小函数依赖。The method of judging whether the functional dependence X→Y is the minimum functional dependence: the functional dependence X→Y is established, and if for any If Z→Y is not established, then X→Y is the minimum functional dependence; otherwise, it is not the minimum functional dependence. According to this method, the present invention determines whether the functional dependence is the minimum functional dependence.
假设给定关系模式为R的关系表r,规定如下标识符:Assuming that the given relational schema is a relational table r of R, the following identifiers are specified:
△r:按给定的周期捕获到变化的增量元组集;△r: Capture the changing set of incremental tuples in a given cycle;
|R|:R中包括的属性个数;|R|: the number of attributes included in R;
rnew:变化后的新关系表;r new : the new relationship table after the change;
Partitions:关系表r变化前的划分信息集;Partitions: the partition information set before the relational table r changes;
Partitions_new:rnew的划分信息集;Partitions_new: r new partition information set;
FDs_old(r):关系表r变化前的最小非平凡函数依赖集;FDs_old(r): the minimum non-trivial functional dependency set before the relational table r changes;
FDs_new(rnew):关系表r变化后的最小非平凡函数依赖集。FDs_new(r new ): The minimum non-trivial functional dependency set after the relational table r changes.
在基于上述数学定义基础上,下面来具体说明本发明一种最小函数依赖的增量计算方法的具体技术方案。On the basis of the above-mentioned mathematical definition, the specific technical solution of an incremental calculation method with minimum functional dependence in the present invention will be described in detail below.
本发明一种最小函数依赖的增量计算方法,已知关系表变化前的最小非平凡函数依赖集、增量元组集、变化前的关系表的划分信息集,首先根据增量元组集中所有元组的操作类型及属性值,修改划分信息集中的划分信息,然后增量检测原有的每条最小非平凡函数依赖在变化后的关系表中是否仍为最小非平凡函数依赖,如果是,则保留在最小非平凡函数依赖集中;如果不是,则从最小非平凡函数依赖集中删除,并计算关系表变化后新增的最小非平凡函数依赖。The present invention is an incremental calculation method of minimum functional dependence. The minimum non-trivial functional dependency set, the incremental tuple set, and the division information set of the relational table before the change are known, and firstly, according to the incremental tuple concentration The operation type and attribute value of all tuples, modify the partition information in the partition information set, and then incrementally check whether each original minimum non-trivial function dependency is still the minimum non-trivial function dependency in the changed relational table, if yes , it will be kept in the minimum non-trivial functional dependency set; if not, it will be deleted from the minimum non-trivial functional dependency set, and the new minimum non-trivial functional dependency after the change of the relationship table will be calculated.
具体步骤如下:Specific steps are as follows:
(1)选择关系模式为R的初始关系表r进行操作,将增量元组记录到△r中,得到变化后的新关系表rnew;(1) Select the initial relational table r whose relational mode is R to operate, record the incremental tuple into △r, and obtain the changed new relational table r new ;
(2)根据增量元组集△r中所有元组操作类型及属性值,修改Partitions中的划分信息;(2) Modify the division information in Partitions according to the operation types and attribute values of all tuples in the incremental tuple set △r;
(3)初始时,令FDs_new(rnew)=FDs_old(r);(3) Initially, let FDs_new(r new )=FDs_old(r);
(4)从△r中读取一条元组tp;(4) Read a tuple t p from △r;
(5)根据元组tp的操作类型,进行最小函数依赖的增量计算,方法如下:(5) According to the operation type of the tuple t p , the incremental calculation of the minimum functional dependence is performed as follows:
(5-1)如果tp是增加的元组,修改FDs_new(rnew)的步骤如下:(5-1) If t p is an added tuple, the steps to modify FDs_new(r new ) are as follows:
(5-1-1)读取FDs_new(rnew)中的一条最小函数依赖f,设f:X→Y;(5-1-1) Read a minimum function dependency f in FDs_new(r new ), let f: X→Y;
(5-1-2)判定函数依赖f在rnew中是否成立,若成立,f仍是最小函数依赖,保留在FDs_new(rnew)中,且转步骤(5-1-4);若不成立,在FDs_new(rnew)中删除f,转步骤(5-1-3);(5-1-2) Determine whether the functional dependence f is established in r new , if it is established, f is still the minimum functional dependence, retained in FDs_new(r new ), and go to step (5-1-4); if not established , delete f in FDs_new(r new ), go to step (5-1-3);
(5-1-3)对于(5-1-2)判定为不再成立的函数依赖:X→Y,判定不在FDs_new(rnew)中的X∪Z→Y在rnew中是否为最小函数依赖,若是,则X∪Z→Y为新增的最小函数依赖,加入FDs_new(rnew)中,其中:“-”是集合的差运算符,且Z非空;(5-1-3) For (5-1-2) judged to be no longer valid functional dependence: X→Y, determine whether X∪Z→Y not in FDs_new(r new ) is the minimum function in r new Dependency, if so, then X∪Z→Y is the newly added minimum functional dependency, which is added to FDs_new(r new ), where: "-" is the difference operator of the set, and Z is not empty;
(5-1-4)判断FDs_new(rnew)中是否还有未读取的最小函数依赖,若是,则重复步骤(5-1-1)~(5-1-3);若否,则进入步骤(6);(5-1-4) Determine whether there are unread minimum functional dependencies in FDs_new(r new ), if so, repeat steps (5-1-1)~(5-1-3); if not, then Go to step (6);
(5-2)如果tp是删除的元组,修改FDs_new(rnew)的步骤如下:(5-2) If t p is a deleted tuple, the steps to modify FDs_new(r new ) are as follows:
(5-2-1)读取FDs_new(rnew)中的一条最小函数依赖f,设f:X→Y,(5-2-1) Read a minimum function dependency f in FDs_new(r new ), let f: X→Y,
(5-2-2)若|X|=1,则f在rnew中仍为最小函数依赖;(5-2-2) If |X|=1, f is still the minimum functional dependency in r new ;
若|X|>1,针对所有且Z非空,判断Z→Y在rnew中是否为最小函数依赖,若是,则将其增加到FDs_new(rnew)中,且在FDs_new(rnew)中删除f;If |X|>1, for all And Z is not empty, judge whether Z→Y is the minimum functional dependency in r new , if so, add it to FDs_new(r new ), and delete f in FDs_new(r new );
若在rnew中不存在Z→Y是最小函数依赖,则判定f在rnew中是否仍为函数依赖,若是,则f在rnew中仍为最小函数依赖;若不是,则在FDs_new(rnew)中删除f;If there is no Z→Y minimum functional dependency in r new , then determine whether f is still a functional dependency in r new , if so, then f is still a minimum functional dependency in r new ; if not, then in FDs_new(r new ) to delete f;
(5-2-3)从R中读取一个属性A,采用以下步骤计算其他新增的最小函数依赖:(5-2-3) Read an attribute A from R, and use the following steps to calculate other newly added minimum functional dependencies:
(5-2-3-1)设且X非空},X为不包含A的非空属性集,L=1;(5-2-3-1) set And X is not empty}, X is a non-empty attribute set that does not contain A, L=1;
(5-2-3-2)若C非空,则否则,转(5-2-4);(5-2-3-2) If C is not empty, then Otherwise, go to (5-2-4);
(5-2-3-3)若C′非空,对于C′中的每一元素X,针对不在FDs_new(rnew)中的X→A,判定其是否是最小函数依赖,若是,则将X→A增加到FDs_new(rnew)中,并在C中删除所有X∪Y元素,其中且 (5-2-3-3) If C' is not empty, for each element X in C', for X→A not in FDs_new(r new ), determine whether it is the minimum functional dependency, and if so, set X→A is added to FDs_new(r new ), and all X∪Y elements are deleted in C, where and
(5-2-3-4)令L加1;(5-2-3-4) Add 1 to L;
(5-2-3-5)如果C非空,且L<|R|-1,则重复步骤(5-2-3-2)~(5-2-3-4);否则,进入步骤(5-2-4);(5-2-3-5) If C is not empty, and L<|R|-1, then repeat steps (5-2-3-2)~(5-2-3-4); otherwise, go to step (5-2-4);
(5-2-4)判定R中是否还有没读取过的属性,若是,重复步骤(5-2-3);若否,进入步骤(5-2-5);(5-2-4) Determine whether there are attributes that have not been read in R, if so, repeat step (5-2-3); if not, enter step (5-2-5);
(5-2-5)判断FDs_new(rnew)中是否还有未读取的最小函数依赖,若是,则重复步骤(5-2-1)~(5-2-4);若否,则进入步骤(6);(5-2-5) Determine whether there are unread minimum functional dependencies in FDs_new(r new ), if so, repeat steps (5-2-1)~(5-2-4); if not, then Go to step (6);
(5-3)如果tp是修改的元组,对于tp中每一被修改的属性A,设tp表示为:(v1,…,(vA-old,vA-new),…,vn),计算FDs_new(rnew)的步骤如下:(5-3) If t p is a modified tuple, for each modified attribute A in t p , let t p be expressed as: (v 1 ,...,(v A -old,v A -new), …,v n ), the steps to calculate FDs_new(r new ) are as follows:
(5-3-1)如果Z=X∪Y},将(v1,…,vA-old,…,vn)作为删除元组,按照步骤(5-2)修改FDs_new(rnew);(5-3-1) If Z=X∪Y}, take (v 1 ,...,v A -old,...,v n ) as a deletion tuple, and modify FDs_new(r new ) according to step (5-2);
(5-3-2)如果A∈{Z|X→Y∈FDs_new(rnew),Z=X∪Y},则先将(v1,…,vA-old,…,vn)作为删除元组处理,按照步骤(5-2)修改FDs_new(rnew);再将(v1,…,vA-new,…,vn)作为增加元组处理,按照步骤(5-1)修改FDs_new(rnew);(5-3-2) If A∈{Z|X→Y∈FDs_new(r new ), Z=X∪Y}, first take (v 1 ,…,v A -old,…,v n ) as To delete a tuple, modify FDs_new(r new ) according to step (5-2); then treat (v 1 ,…,v A -new,…,v n ) as an added tuple, follow step (5-1) Modify FDs_new(r new );
(6)判断△r中是否还存在未读取的元组,若是,重复步骤(4)~(5);若否,进入步骤(7);(6) Determine whether there are unread tuples in △r, if so, repeat steps (4) to (5); if not, go to step (7);
(7)保存FDs_new(rnew)以及rnew的划分信息集Partitions_new;(7) Save the partition information set Partitions_new of FDs_new (r new ) and r new ;
(8)输出FDs_new(rnew)。(8) Output FDs_new(r new ).
所述初始关系表r的最小函数依赖集及划分信息采用已有的最小函数依赖计算算法TANE获得。本发明提出的是关系表变化后增量计算新的最小函数依赖集的方法。The minimum functional dependency set and partition information of the initial relational table r are obtained by using the existing minimum functional dependency calculation algorithm TANE. The invention proposes a method for incrementally calculating a new minimum functional dependency set after a relational table is changed.
本发明与现有技术相比,具有如下优点和有益效果:Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1)本发明方法效率较高。采用增量计算的方式,针对不同类型的元组操作,修改划分信息不需要重新扫描数据库,并且在大多数情况下,利用增量数据、原有的最小函数依赖集及划分信息,计算新增或删除的最小函数依赖也不需要重新扫描数据库。(1) The inventive method has higher efficiency. Incremental calculation is adopted to modify the partition information for different types of tuple operations without rescanning the database, and in most cases, the incremental data, the original minimum functional dependency set and partition information are used to calculate the new Or the minimal functional dependencies removed also don't require rescanning the database.
(2)本发明方法避免了重复计算在修改后的数据集中仍有效的最小函数依赖。在实际应用中,数据库变化后,原数据集中的大多数最小函数依赖都是有效的,本发明提出的方法不需要重新计算新数据集的所有最小函数依赖,仅需要计算原最小函数依赖集的新增及删除的最小函数依赖。(2) The method of the present invention avoids repeated calculation of the minimum functional dependencies that are still valid in the modified data set. In practical applications, after the database changes, most of the minimum functional dependencies in the original data set are valid. The method proposed by the present invention does not need to recalculate all the minimum functional dependencies of the new data set, and only needs to calculate the original minimum functional dependencies. Minimal functional dependencies for adding and removing.
(3)本发明计算结果正确,且灵活性强。目前存在一些近似计算最小函数依赖的方法,不能保证最小函数依赖的正确性,而本发明增量计算的结果与非增量计算的结果一致,且当数据集有变化时,最小函数依赖集可以随时修改,即按照需要、周期及时修改。(3) The calculation result of the present invention is correct and flexible. At present, there are some methods for approximately calculating the minimum functional dependence, which cannot guarantee the correctness of the minimum functional dependence, but the results of the incremental calculation of the present invention are consistent with the results of the non-incremental calculation, and when the data set changes, the minimum functional dependence set can be Modify at any time, that is, modify in time according to needs and cycles.
附图说明Description of drawings
图1是本发明方法的流程图。Figure 1 is a flow chart of the method of the present invention.
具体实施方式detailed description
下面结合实施例及附图对本发明作进一步详细的描述,但本发明的实施方式不限于此。The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
实施例1Example 1
参见图1,本实施例以一个关系模式为R的数据集为例,说明本发明实施方式。Referring to FIG. 1 , this embodiment takes a data set whose relation mode is R as an example to illustrate the implementation of the present invention.
(1)关系模式R={StuID,Name,City,Province},表1是一个R模式的关系表r,包括五条元组(每一行为一条元组);表2是对其的增量操作表△r,其中:OP列表示对r的增量操作类型,Del表示删除,Ins表示增加,Up表示修改,所修改的元组属性项中包括旧值和新值,其形式为:′旧值,新值′。表3是经过表2操作修改后的关系表rnew。注:上述三个表中不同符号代表的属性取值不同,最左边的数字表示给定元组唯一编号。(1) Relational mode R={StuID, Name, City, Province}, Table 1 is a relational table r of R mode, including five tuples (each row is a tuple); Table 2 is its incremental operation Table △r, wherein: OP column represents the type of incremental operation on r, Del represents deletion, Ins represents increase, Up represents modification, and the modified tuple attribute item includes old value and new value, and its form is: 'old value, new value'. Table 3 is the relational table r new modified by the operations in Table 2. Note: Different symbols in the above three tables represent different attribute values, and the leftmost number represents the unique number of a given tuple.
关系表r的最小函数依赖集FDs_old(r)如下:The minimum functional dependency set FDs_old(r) of relational table r is as follows:
FDs_old(r)={{Name,City}→StuID,FDs_old(r)={{Name, City}→StuID,
{Name,Province}→StuID, {Name, Province}→StuID,
{StuID}→Name, {StuID}→Name,
{StuID}→City, {StuID}→City,
{Name,Province}→City, {Name, Province}→City,
{StuID}→Province, {StuID}→Province,
{City}→Province} {City}→Province}
关系表r的部分划分信息如下,其中的数字表示元组唯一编号:The partial division information of the relational table r is as follows, where the numbers represent the unique numbers of the tuples:
π{StuID}={{1},{2},{3},{4},{5}}π {StuID} = {{1},{2},{3},{4},{5}}
π{Name}={{1},{2},{3},{4,5}}π {Name} = {{1},{2},{3},{4,5}}
π{City}={{1},{2,3},{4},{5}}π {City} = {{1},{2,3},{4},{5}}
π{Province}={{1,2,3},{4},{5}}π {Province} = {{1,2,3},{4},{5}}
π{StuID,Name}={{1},{2},{3},{4},{5}}π {StuID,Name} = {{1},{2},{3},{4},{5}}
π{City,Province}={{1},{2,3},{4},{5}}π {City,Province} = {{1},{2,3},{4},{5}}
π{Name,Province}={{1},{2},{3},{4},{5}}π {Name,Province} = {{1},{2},{3},{4},{5}}
π{Name,Province,City}={{1},{2},{3},{4},{5}}π {Name,Province,City} = {{1},{2},{3},{4},{5}}
表1关系表rTable 1 relational table r
表2增量操作表△rTable 2 Incremental operation table △r
表3修改后的关系表rnew Table 3 The modified relationship table r new
(2)依次读取Δr中所有元组操作类型及属性值,修改Partitions中划分信息:(2) Read all tuple operation types and attribute values in Δr in turn, and modify the partition information in Partitions:
读取编号为2的操作元组t2,由于其为删除操作,部分划分信息变化为如下:Read the operation tuple t 2 with the number 2. Since it is a delete operation, part of the division information changes as follows:
π{StuID}={{1},{3},{4},{5}}π {StuID} = {{1},{3},{4},{5}}
π{Name}={{1},{3},{4,5}}π {Name} = {{1},{3},{4,5}}
π{City}={{1},{3},{4},{5}}π {City} = {{1},{3},{4},{5}}
π{Province}={{1,3},{4},{5}}π {Province} = {{1,3},{4},{5}}
π{StuID,Name}={{1},{3},{4},{5}}π {StuID,Name} = {{1},{3},{4},{5}}
π{City,Province}={{1},{3},{4},{5}}π {City,Province} = {{1},{3},{4},{5}}
π{Name,Province}={{1},{3},{4},{5}}π {Name,Province} = {{1},{3},{4},{5}}
π{Name,Province,City}={{1},{3},{4},{5}}π {Name,Province,City} = {{1},{3},{4},{5}}
读取编号为6的操作元组t6,由于其为插入操作,部分划分信息变化为如下:Read the operation tuple t 6 numbered 6. Since it is an insert operation, part of the division information changes as follows:
π{StuID}={{1},{3},{4},{5},{6}}π {StuID} = {{1},{3},{4},{5},{6}}
π{Name}={{1},{3},{4,5},{6}}π {Name} = {{1},{3},{4,5},{6}}
π{City}={{1},{3,6},{4},{5}}π {City} = {{1},{3,6},{4},{5}}
π{Province}={{1,3},{4},{5},{6}}π {Province} = {{1,3},{4},{5},{6}}
π{StuID,Name}={{1},{3},{4},{5},{6}}π {StuID,Name} = {{1},{3},{4},{5},{6}}
π{City,Province}={{1},{3},{4},{5},{6}}π {City,Province} = {{1},{3},{4},{5},{6}}
π{Name,Province}={{1},{3},{4},{5},{6}}π {Name,Province} = {{1},{3},{4},{5},{6}}
π{Name,Province,City}={{1},{3},{4},{5},{6}}π {Name,Province,City} = {{1},{3},{4},{5},{6}}
读取编号为5的操作元组t5,由于其为修改操作,部分划分信息变化为如下:Read the operation tuple t 5 with the number 5. Since it is a modification operation, part of the division information changes as follows:
π{StuID}={{1},{3},{4},{5},{6}}π {StuID} = {{1},{3},{4},{5},{6}}
π{Name}={{1},{3},{4},{5},{6}}π {Name} = {{1},{3},{4},{5},{6}}
π{City}={{1},{3,6},{4},{5}}π {City} = {{1},{3,6},{4},{5}}
π{Province}={{1,3},{4},{5},{6}}π {Province} = {{1,3},{4},{5},{6}}
π{StuID,Name}={{1},{3},{4},{5},{6}}π {StuID,Name} = {{1},{3},{4},{5},{6}}
π{City,Province}={{1},{3},{4},{5},{6}}π {City,Province} = {{1},{3},{4},{5},{6}}
π{Name,Province}={{1},{3},{4},{5},{6}}π {Name,Province} = {{1},{3},{4},{5},{6}}
π{Name,Province,City}={{1},{3},{4},{5},{6}}π {Name,Province,City} = {{1},{3},{4},{5},{6}}
(3)令FDs_new(rnew)=FDs_old(r),从而得到:(3) Make FDs_new(r new )=FDs_old(r), so as to get:
FDs_new(rnew)={{Name,City}→StuID,FDs_new(r new )={{Name,City}→StuID,
{Name,Province}→StuID, {Name, Province}→StuID,
{StuID}→Name, {StuID}→Name,
{StuID}→City, {StuID}→City,
{Name,Province}→City, {Name, Province}→City,
{StuID}→Province, {StuID}→Province,
{City}→Province} {City}→Province}
(4)从△r中读取编号为2的操作元组t2,操作类型为删除,按照如下方法进行最小函数依赖的增量计算:(4) Read the operation tuple t 2 with the number 2 from △r, the operation type is delete, and perform the incremental calculation of the minimum functional dependency as follows:
对于FDs_new(rnew)中的函数依赖:{StuID}→Name,{StuID}→City,{StuID}→Province,{City}→Province,由于|{StuID}|=1,|{City}|=1,所以它们仍为最小函数依赖。For the functional dependencies in FDs_new(r new ): {StuID}→Name,{StuID}→City,{StuID}→Province,{City}→Province, since |{StuID}|=1, |{City}|= 1, so they are still minimally functionally dependent.
对于FDs_new(rnew)中的函数依赖:{Name,City}→StuID,|{Name,City}|>1,由于|π{Name}|=|π{StuID,Name}|,可以判断得到:{Name}→StuID在rnew中成立,且为最小函数依赖,因此,将其加入FDs_new(rnew)中,且在FDs_new(rnew)中删除{Name,City}→StuID,另可以判断{City}→StuID不成立。For the functional dependency in FDs_new(r new ): {Name, City}→StuID,|{Name, City}|>1, Since |π {Name} |=|π {StuID,Name} |, it can be judged that: {Name}→StuID is established in r new , and it is the minimum functional dependency. Therefore, add it to FDs_new(r new ), And delete {Name, City}→StuID in FDs_new(r new ), and it can be judged that {City}→StuID is not established.
对于FDs_new(rnew)中的函数依赖:{Name,Province}→StuID,|{Name,Province}|>1,由于{Name}→StuID已经成立,所以其不再是最小函数依赖,将其从FDs_new(rnew)中删除,另可以判断{Province}→StuID不成立。For the functional dependency in FDs_new(r new ): {Name, Province}→StuID, |{Name, Province}|>1, since {Name}→StuID has been established, it is no longer the minimum functional dependency, and it is changed from FDs_new(r new ), and it can be judged that {Province}→StuID is not established.
对于FDs_new(rnew)中的函数依赖:{Name,Province}→City,|{Name,Province}|>1,由于|π{Name}|=|π{Name,City}|,可以判断得到:{Name}→City在rnew中成立,且为最小函数依赖,因此,将其加入FDs_new(rnew)中,且在FDs_new(rnew)中删除{Name,Province}→City,另可以判断{Province}→City不成立。For functional dependencies in FDs_new(r new ): {Name, Province}→City, |{Name, Province}|>1, Since |π {Name} |=|π {Name,City} |, it can be judged that: {Name}→City is established in r new , and it is the minimum functional dependency. Therefore, add it to FDs_new(r new ), And delete {Name, Province}→City in FDs_new(r new ), and it can be judged that {Province}→City is not established.
此时:at this time:
FDs_new(rnew)={{Name}→StuID,FDs_new(r new )={{Name}→StuID,
{StuID}→Name, {StuID}→Name,
{StuID}→City, {StuID}→City,
{Name}→City, {Name}→City,
{StuID}→Province, {StuID}→Province,
{City}→Province} {City}→Province}
接着,计算其他新增最小函数依赖:从R中依次读取每一属性,例如:读取属性Province,采用以下步骤计算其他新增的最小函数依赖:Next, calculate other newly added minimum functional dependencies: Read each attribute in turn from R, for example: read the attribute Province, and use the following steps to calculate other newly added minimum functional dependencies:
ⅰ)L=1;i) L=1;
ⅱ) ii)
ⅲ)对于C′中的每一元素X,判定X→Province是否是最小函数依赖,可以判断不在FDs_new(rnew)中的{Name}→Province是最小函数依赖,将其增加到FDs_new(rnew)中,并在C中删除{Name},C为空了。ⅲ) For each element X in C′, determine whether X→Province is the minimum functional dependency. It can be judged that {Name}→Province that is not in FDs_new(r new ) is the minimum functional dependency, and add it to FDs_new(r new ), and delete {Name} in C, C is empty.
另外,从R中依次读取StuID、Name属性,还可以判断新增最小函数依赖:{City,Province}→StuID,{City,Province}→Name。In addition, by reading the StuID and Name properties sequentially from R, you can also determine the new minimum functional dependency: {City, Province}→StuID, {City, Province}→Name.
此时:at this time:
FDs_new(rnew)={{Name}→StuID,FDs_new(r new )={{Name}→StuID,
{StuID}→Name, {StuID}→Name,
{StuID}→City, {StuID}→City,
{Name}→City, {Name}→City,
{StuID}→Province, {StuID}→Province,
{City}→Province, {City}→Province,
{Name}→Province, {Name}→Province,
{City,Province}→StuID, {City,Province}→StuID,
{City,Province}→Name} {City, Province}→Name}
(5)从△r中读取编号为6的操作元组t6,操作类型为插入,即增加,按照如下方法进行最小函数依赖的增量计算:(5) Read the operation tuple t 6 with the number 6 from △r, the operation type is insertion, that is, increase, and perform the incremental calculation of the minimum functional dependence according to the following method:
对于FDs_new(rnew)中的每一最小函数依赖,判定其作为函数依赖在rnew中是否成立,若成立,f仍是最小函数依赖,保留在FDs_new(rnew)中。根据修改后的划分信息,|π{City}|≠|π{City,Province}|,所以{City}→Province不再成立,将其从FDs_new(rnew)中删除;其他在FDs_new(rnew)中的最小函数依赖仍有效。For each minimum functional dependency in FDs_new(r new ), determine whether it is established as a functional dependency in r new , and if it is true, f is still the minimum functional dependency and is retained in FDs_new(r new ). According to the modified division information, |π {City} |≠|π {City,Province} |, so {City}→Province is no longer established, and it is deleted from FDs_new(r new ); others are in FDs_new(r new ) in the minimal functional dependencies are still valid.
对于不再成立的函数依赖:{City}→Province,判定不在FDs_new(rnew)中的{City}∪Z→Y在rnew中是否为最小函数依赖,其中,且Z非空,即Z={StuID,Name}。由此,获得新增的最小函数依赖:{StuID,City}→Province,{Name,City}→Province。For the functional dependency that no longer holds true: {City}→Province, determine whether {City}∪Z→Y that is not in FDs_new(r new ) is the minimum functional dependency in r new , where, And Z is not empty, that is, Z={StuID, Name}. Thus, the newly added minimal functional dependencies are obtained: {StuID, City}→Province, {Name, City}→Province.
此时:at this time:
FDs_new(rnew)={{Name}→StuID,FDs_new(r new )={{Name}→StuID,
{StuID}→Name, {StuID}→Name,
{StuID}→City, {StuID}→City,
{Name}→City, {Name}→City,
{StuID}→Province, {StuID}→Province,
{StuID,City}→Province, {StuID,City}→Province,
{Name,City}→Province, {Name, City}→Province,
{Name}→Province, {Name}→Province,
{City,Province}→StuID, {City,Province}→StuID,
{City,Province}→Name} {City, Province}→Name}
(6)从△r中读取编号为5的操作元组t5,操作类型为修改,按照如下方法进行最小函数依赖的增量计算:(6) Read the operation tuple t 5 with the number 5 from △r, the operation type is modification, and perform the incremental calculation of the minimum functional dependency as follows:
该修改操作分解为:删除元组(e4,n3,WH,HB),插入元组(e4,n4,WH,HB),依次按照删除元组的方法和插入元组的方法进行最小函数依赖的增量计算,在此,处理完后,FDs_new(rnew)没有变化。The modification operation is decomposed into: delete tuple (e 4 , n 3 , WH, HB), insert tuple (e 4 , n 4 , WH, HB), follow the method of deleting tuple and inserting tuple in turn Incremental computation of minimum functional dependencies, where FDs_new(r new ) is unchanged after processing.
(7)所有△r中的元组均操作处理完毕,保存FDs_new(rnew)以及rnew的划分信息集Partitions_new,作为下一次增量计算的原有最小函数依赖集以及划分信息。(7) All the tuples in △r have been processed, and FDs_new(r new ) and the partition information set Partitions_new of r new are saved as the original minimum functional dependency set and partition information for the next incremental calculation.
(8)输出FDs_new(rnew)(8) Output FDs_new (r new )
上述实施例为本发明较佳的实施方式,但本发明的实施方式并不受上述实施例的限制,其他的任何未背离本发明的精神实质与原理下所作的改变、修饰、替代、组合、简化,均应为等效的置换方式,都包含在本发明的保护范围之内。The above-mentioned embodiment is a preferred embodiment of the present invention, but the embodiment of the present invention is not limited by the above-mentioned embodiment, and any other changes, modifications, substitutions, combinations, Simplifications should be equivalent replacement methods, and all are included in the protection scope of the present invention.
Claims (2)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510072548.1A CN104699761B (en) | 2015-02-11 | 2015-02-11 | A kind of incremental calculation method of minimal functional dependencies |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510072548.1A CN104699761B (en) | 2015-02-11 | 2015-02-11 | A kind of incremental calculation method of minimal functional dependencies |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN104699761A CN104699761A (en) | 2015-06-10 |
| CN104699761B true CN104699761B (en) | 2017-11-21 |
Family
ID=53346882
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510072548.1A Expired - Fee Related CN104699761B (en) | 2015-02-11 | 2015-02-11 | A kind of incremental calculation method of minimal functional dependencies |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN104699761B (en) |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108595624A (en) * | 2018-04-23 | 2018-09-28 | 南京大学 | A kind of large-scale distributed functional dependence discovery method |
| CN109325062B (en) * | 2018-09-12 | 2020-09-25 | 哈尔滨工业大学 | Data dependency mining method and system based on distributed computation |
| CN111125172A (en) * | 2019-12-26 | 2020-05-08 | 成都康赛信息技术有限公司 | Data validity evaluation method and system based on multi-column relationship |
| CN114741381B (en) * | 2022-04-14 | 2023-04-14 | 郑州轻工业大学 | Data Cleaning Method Based on Association Dependency |
| CN115168504A (en) * | 2022-06-20 | 2022-10-11 | 阿里云计算有限公司 | A method and device for determining functional dependencies |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103077181A (en) * | 2012-11-20 | 2013-05-01 | 深圳市华傲数据技术有限公司 | Method for automatically generating approximate functional dependency rule |
| CN103514259A (en) * | 2013-08-13 | 2014-01-15 | 江苏华大天益电力科技有限公司 | Abnormal data detection and modification method based on numerical value relevance model |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2003006401A (en) * | 2001-06-26 | 2003-01-10 | Hitachi Ltd | Corporate information processing method and design report creation method |
-
2015
- 2015-02-11 CN CN201510072548.1A patent/CN104699761B/en not_active Expired - Fee Related
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103077181A (en) * | 2012-11-20 | 2013-05-01 | 深圳市华傲数据技术有限公司 | Method for automatically generating approximate functional dependency rule |
| CN103514259A (en) * | 2013-08-13 | 2014-01-15 | 江苏华大天益电力科技有限公司 | Abnormal data detection and modification method based on numerical value relevance model |
Non-Patent Citations (1)
| Title |
|---|
| 一种增量发现条件函数依赖的算法;李丁月等;《计算机工程与科学》;20130830;第35卷(第8期);正文第3节 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104699761A (en) | 2015-06-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN104699761B (en) | A kind of incremental calculation method of minimal functional dependencies | |
| CN110175158B (en) | A method and system for extracting log template based on vectorization | |
| Fournier-Viger et al. | VMSP: Efficient vertical mining of maximal sequential patterns | |
| Satyanarayana et al. | A linear-time algorithm for computing K-terminal reliability in series-parallel networks | |
| CN103823823B (en) | Denormalization policy selection method based on Frequent Itemsets Mining Algorithm | |
| CN106294762B (en) | Entity identification method based on learning | |
| CN110389950B (en) | A fast-running method for cleaning big data | |
| US9720986B2 (en) | Method and system for integrating data into a database | |
| CN101561813B (en) | An Analysis Method of String Similarity in Web Environment | |
| AU2018211280B2 (en) | Managing memory and storage space for a data operation | |
| CN100590603C (en) | A method and device for processing log files | |
| CN109325062B (en) | Data dependency mining method and system based on distributed computation | |
| CN107909344A (en) | Workflow logs iterative task recognition methods based on relational matrix | |
| CN106778079A (en) | A kind of DNA sequence dna k mer frequency statistics methods based on MapReduce | |
| WO2024021362A1 (en) | Data verification method and apparatus for traffic replay | |
| CN105868266A (en) | Clustering model based high-dimensional data stream outlier detection method | |
| CN112000848B (en) | A graph data processing method, device, electronic device and storage medium | |
| CN111666468A (en) | Method for searching personalized influence community in social network based on cluster attributes | |
| CN116226103A (en) | A Method of Government Data Quality Detection Based on FPGrowth Algorithm | |
| US8548980B2 (en) | Accelerating queries based on exact knowledge of specific rows satisfying local conditions | |
| CN106407304A (en) | Mutual information-based data discretization and feature selection integrated method and apparatus | |
| CN106294139B (en) | A kind of Detection and Extraction method of repeated fragment in software code | |
| CN114238258B (en) | Database data processing method, device, computer equipment and storage medium | |
| Sun et al. | On the size and recovery of submatrices of ones in a random binary matrix | |
| CN114528311A (en) | Similarity detection method and device for SQL (structured query language) statements |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20171121 Termination date: 20210211 |
|
| CF01 | Termination of patent right due to non-payment of annual fee |