WO2016060546A1 - A system and method for managing a collection of data - Google Patents

A system and method for managing a collection of data

Info

Publication number
WO2016060546A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
confidence value
collection
reference table
identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/MY2015/050121
Other languages
French (fr)
Inventor
Nor Azlinayati ABDUL MANAF
Nurul Aida OSMAN
Khalil BOUZEKRI
Dickson Lukose
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mimos Bhd
Original Assignee
Mimos Bhd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Bhd
Publication of WO2016060546A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management

Definitions

  • the invention relates to data management and in particular, the invention provides a system and method for managing a collection of data in tables.
  • Managing data in a database is known as a key factor for business management.
  • Managing a database requires the step of categorizing data and information in various forms, commonly in table form, then combining the details of data and information to perform analytical processing in order to help users to perform any decision making. For example, a manufacturer produces a new product after analysing customer information such as census data, customer demographics, needs and lifestyles.
  • relational database is commonly used for supporting and managing a large volume of data in an organization or a company.
  • For managing and storing basic and transaction-oriented data and information, a relational database is constructed from multiple tables with values in columns and rows, each table being associated with a primary key. Furthermore, the relational database has a unique characteristic which allows users to focus on a logical view of a data environment by listing the details of data and its relationships. It defines the relationships between the tables and gives users a complete picture of the data and information stored by implementing relational set theory.
  • a relational database generally comprises numerous tables in order to cater for various processes and activities; it hence generates a complex relation between the tables.
  • a course information table in a relational database may consist of a list of course titles, each with a unique value in the table. The unique value in the table can be categorized as the primary key.
  • the course table is also called a reference table.
  • another table may consist of the primary key from the reference table and other data such as student demographics.
  • the table is called a transactional table.
  • the relational database is not able to link the reference table with other transactional tables if the primary key and the reference table are identified incorrectly.
  • the performance of a relational database is affected significantly if the number of tables is large, owing to the relationships among the tables in the database. At the same time, the process becomes more complicated and time consuming. Furthermore, frequent changes of data in the tables will have an adverse impact on the performance. Therefore, table relation identification is crucial for managing a collection of data.
  • US Patent No. 8386529 B2 mentions a method for identifying a reference table automatically by adopting foreign key relationship in a system and then employing a score function on this foreign key relationship.
  • US 8386529 B2 does not teach the method of comparing and ranking the table attributes from a database with the semantic information such as semantic structure based value and instance based value from a knowledge base.
  • US Patent No. 8250101 B2 discloses techniques for mapping and translating reference data in a database system by using the Ontology-Guided Reference.
  • US 8250101 B2 discloses only a method to identify the reference table by using the size of the tables and it may not be able to identify reference tables accurately.
  • a reference table must always be identified correctly when the database is performing queries or analysing result in order to prevent the system from being ineffective.
  • US Patent No. 8583626 B2 teaches a method for identifying reference data tables in an Extract-Transform-Load (ETL) process by identifying a reference table with the aid of a score ranking of each reference table in a process.
  • the method mentioned in US 8583626 B2 only uses table attributes from a database to identify a reference table without analysing other multi-modality approaches such as the semantic structure of the table schema and the content of the database table.
  • a research paper entitled "An Empirical Study of Instance-based Ontology Matching" addresses the best instance-based ontology mapping technique that meets the Semantic Web standards.
  • the research paper does not disclose the technique to identify reference tables accurately and automatically in a database.
  • the present invention aims to provide a system and a method for managing a collection of data that is organized in table form, comprising a knowledge database that applies reasoning capabilities to the collection of data; a list, covering reference tables and non-reference tables, that stores attributes of the collection of data and is used to link the reference table with the knowledge database; and a module that identifies reference tables from the collection of data and configures the collection of data in order to match the data of reference tables with semantic information from the knowledge database.
  • Figure 1 illustrates the system for managing a collection of data, as described in the present invention.
  • Figure 2 illustrates the flowchart of the method for managing a collection of data, as described in the present invention.
  • Figure 3 illustrates the flowchart depicting the steps of identifying reference and non-reference tables with the aid of a key filter, as described in the present invention.
  • Figure 4 illustrates the flowchart depicting the steps of determining first confidence values for each reference table by using transaction information of the collection of data, as described in the present invention.
  • Figure 5 illustrates the flowchart depicting the steps of determining the relationship of the knowledge database with the attributes of reference tables and identifying second and third confidence values for each reference table, as described in the present invention.
  • Figure 6 illustrates the flowchart depicting the steps of obtaining an overall confidence value for each reference table based on the result of each confidence value, as described in the present invention.
  • the present invention relates to a system (1) and method (10) for identifying reference tables for managing a collection of data (2). More particularly, the present invention improves the accuracy of reference table identification in the collection of data (2).
  • the system (1), as illustrated in Figure 1, comprises at least a collection of data (2) that is organized in tables, fields and forms as well as records, at least a knowledge database (3) that applies reasoning capabilities to the collection of data (2), at least a confidence value list (4) that stores attributes of the collection of data (2) as reference, and at least a module (5) that identifies tables having the collection of data (2) and configures the data in order to match the data with semantic information from the knowledge database (3), wherein the system (1) utilizes a key filter that is incorporated with the module (5) to eliminate non-reference tables, and a ranking that records the relationship of attributes from the collection of data (2) with the semantic information from the knowledge database (3) and ranks reference tables from the collection of data (2), in order to improve the accuracy of reference table identification in the collection of data (2).
  • the collection of data (2) is organized in table form and further comprises a series of logically related two-dimensional data.
  • the module (5) in the present invention further identifies the reference table by ranking the tables in the collection of data (2) with a confidence value in order to eliminate non-reference tables.
  • the ranking is incorporated with the module (5), which further rates the effectiveness of relationships with the knowledge database (3) by assigning a confidence value to each relationship.
  • Each relationship is identified by comparing the information of the collection of data (2) with information of the knowledge database (3). It involves a multi-modality comparison, such as semantic structure based and semantic instance based comparison.
  • the present invention further discloses the use of a knowledge database (3), which consists of a centralized repository of data and information submitted by the user.
  • the knowledge database (3) has been utilised to optimise data storage, data analysis, and data reuse as well as transactional properties on data requirements.
  • the knowledge database (3) comprises pre-determined logics, concepts, rules and requirements of the system to deduce new facts or inconsistencies.
  • the figure illustrates a method (10) for managing a collection of data (2) according to the present invention.
  • the method (10) comprises a first step of identifying reference tables and non-reference tables with the aid of a key filter (11).
  • a first confidence value for the reference table is identified based on the transaction information and updates of the reference table from the collection of data (2) (12).
  • the relationship of the knowledge database (3) with the attributes of the reference table is identified prior to the identification of second and third confidence values for the reference table (13).
  • an overall confidence value for the reference table is obtained (14).
  • the overall confidence value of the reference table is obtained by comparing the results of the confidence values identified previously.
  • Figure 3 is a flowchart depicting further steps of identifying reference tables and non-reference tables with the aid of a key filter (11). It begins with the steps of identifying a name (111) and primary keys (112) for each table. Then, the primary keys are recorded and managed in a list named the primary key list (113). Next, if exactly one primary key is found in the table, one or more foreign keys are identified (114). Otherwise, if no primary key or more than one primary key is identified, the table is identified as a non-reference table. The foreign keys are recorded and managed in a list named the foreign key list (115). Subsequently, the table is identified as a reference table if no foreign key is identified in the table (116).
  • the attributes of the table, such as table names, primary keys and foreign keys, are stored in the two lists mentioned previously for identifying a reference table. These two lists are used as a key filter list to identify whether the table is a reference table. Non-reference tables are eliminated from the collection of data (2) after a reference table is identified (117).
  • Figure 4 is a flowchart depicting further steps of determining first confidence value for the reference table by using transaction information of the collection of data (2) (12).
  • the first step is to identify the activities of the reference table (121), followed by identifying the frequency of each update for the reference table (122). Once the frequency has been identified, the total updates for the reference table in a period of time are identified (123). Based on the total updates of the reference table, the percentage of the updates of the reference table is calculated (124). Next, a first confidence value is determined based on the updates of the reference table (125). Lastly, the first confidence value of the reference table is recorded and managed in a list known as the confidence value list (126).
  • a relation of the knowledge database (3) with the attributes of the reference tables is determined to further identify second and third confidence values of the reference table (13) as illustrated in Figure 5.
  • Prior to the determination of the relationship between the attributes of the reference table and the information in the knowledge database (3) (132), the information of the reference table is required to be converted into semantic information for further semantic analysis (131). Then, a relationship between the attributes of the reference table and the information of the knowledge database (3) is identified effectively.
  • the semantic information is utilised for building a relationship among the data. Semantic information further consists of the study of rational, informational content of the collection of data (2).
  • the semantic information is used for building a relationship among the collection of data (2) and linking the data of the reference tables with information from the knowledge database (3) in order to provide a visible and accurate result of data management.
  • a second confidence value of the reference table is determined (133) by comparing its table structure with the information of the knowledge database (3).
  • a third confidence value of the reference table is determined (134) by comparing its table attributes with the semantic instance of the knowledge database (3).
  • the first, second and third confidence values are each recorded and stored in the confidence value list (4) (135).
  • the list is known as the confidence value list (4).
  • Each of the newly determined first, second and third confidence values is compared with the existing confidence values stored in the confidence value list (4), so that an updated list of confidence values is created (136); otherwise, the second and third confidence values determined in steps (133) and (134) are treated as the highest values in the confidence value list.
  • the last step of the method (10) is identifying a reference table by determining an overall confidence value for the reference table (14) as shown in Figure 6.
  • the overall confidence value is determined by obtaining the first, second and third confidence values that have been determined previously.
  • the first, second and third confidence values are each obtainable from the confidence value list (141, 142, 143).
  • the confidence value list (4), which comprises the first, second and third confidence values, is used to rank the reference tables in the collection of data (145). The highest value of each confidence value will be maintained and updated in the confidence value list.
  • the reference table is determined based on the highest value of the overall confidence value.
  • the method (10) discloses an advantage of using the confidence value list (4) that is incorporated with the module (5) in order to identify reference tables based on multi-modality approaches such as semantic structure based and semantic instance based approaches. With the aid of semantic information, the method (10) further illustrates other advantages of determining the relationship of the reference table with the knowledge database (3) and generates the second and third confidence values that further identify the reference table regardless of the size or complexity of the collection of data (2). Moreover, an ontology mapping technique is implemented to compare the semantic information of the knowledge database (3) with attributes of the tables in the collection of data (2).
  • the system (1) and method (10) that are presented herein are able to improve the accuracy of identification of reference tables in a collection of data (2) by comparing multi-modality approaches such as data log, semantic structure of table and content of table. Last but not least, the tables are managed and ranked with minimal user intervention in order to provide a visible and accurate result in managing the collection of data (2).
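
The final ranking step can be sketched in Python as follows. The text states that the overall confidence value is derived from the first, second and third confidence values but does not give a formula, so the arithmetic mean used here is purely an assumption for illustration. The function name `rank_reference_tables` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_reference_tables(
    confidence_value_list: Dict[str, Tuple[float, float, float]]
) -> List[Tuple[str, float]]:
    """Rank candidate tables by an overall confidence value, highest first."""
    overall = {
        # ASSUMPTION: the overall value is the plain mean of the three values.
        name: sum(values) / len(values)
        for name, values in confidence_value_list.items()
    }
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (first, second, third) confidence values per candidate table.
ranking = rank_reference_tables({
    "course": (0.8, 0.6, 0.7),
    "country": (0.9, 0.9, 0.9),
})
best_table = ranking[0][0]  # candidate with the highest overall value
```

A weighted combination would fit the same shape; only the expression inside the dictionary comprehension would change.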

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present invention discloses a system (1) and method (10) for managing a collection of data (2) in table form and for identifying reference tables based on multi-modality approaches. The system (1) and method (10) disclose a module (5) that is incorporated with a key filter to determine reference tables and eliminate non-reference tables, as well as a ranking to rank reference tables, in order to identify reference tables with minimal user intervention.

Description

A SYSTEM AND METHOD FOR MANAGING A COLLECTION OF DATA
TECHNICAL FIELD OF THE INVENTION
The invention relates to data management and in particular, the invention provides a system and method for managing a collection of data in tables.
BACKGROUND OF THE INVENTION
Managing data in a database is known as a key factor for business management. Managing a database requires the step of categorizing data and information in various forms, commonly in table form, then combining the details of data and information to perform analytical processing in order to support users' decision making. For example, a manufacturer produces a new product after analysing customer information such as census data, customer demographics, needs and lifestyles.
Managing data in a database has therefore been, and continues to be, a strong focus worldwide, especially in the business world. In order to minimize data redundancy and errors, improve data integrity and security, and provide data storage, a relational database is commonly used for supporting and managing a large volume of data in an organization or a company.
For managing and storing basic and transaction-oriented data and information, a relational database is constructed from multiple tables with values in columns and rows, each table being associated with a primary key. Furthermore, the relational database has a unique characteristic which allows users to focus on a logical view of a data environment by listing the details of data and its relationships. It defines the relationships between the tables and gives users a complete picture of the data and information stored by implementing relational set theory.
One of the challenges faced in data management is that the data integrity and accuracy of a relational database depend on the keys that are shared and linked across the tables within the relational database. A relational database generally comprises numerous tables in order to cater for various processes and activities; it hence generates complex relations between the tables. For instance, a course information table in a relational database may consist of a list of course titles, each with a unique value in the table. The unique value in the table can be categorized as the primary key. The course table is also called a reference table. On the other hand, another table may consist of the primary key from the reference table and other data such as student demographics. That table is called a transactional table. The relational database is not able to link the reference table with other transactional tables if the primary key and the reference table are identified incorrectly.
Compounding the issue, the performance of a relational database is affected significantly if the number of tables is large, owing to the relationships among the tables in the database. At the same time, the process becomes more complicated and time consuming. Furthermore, frequent changes of data in the tables will have an adverse impact on the performance. Therefore, table relation identification is crucial for managing a collection of data.
In view of the abovementioned data accuracy and performance of database, several methods have been developed to identify reference tables in a database.
US Patent No. 8386529 B2 mentions a method for identifying a reference table automatically by adopting foreign key relationships in a system and then employing a score function on these foreign key relationships. However, US 8386529 B2 does not teach the method of comparing and ranking the table attributes from a database with semantic information such as a semantic structure based value and an instance based value from a knowledge base.
US Patent No. 8250101 B2 discloses techniques for mapping and translating reference data in a database system by using the Ontology-Guided Reference. However, US 8250101 B2 discloses only a method to identify the reference table by using the size of the tables, and it may not be able to identify reference tables accurately. Furthermore, a reference table must always be identified correctly when the database is performing queries or analysing results in order to prevent the system from being ineffective.
US Patent No. 8583626 B2 teaches a method for identifying reference data tables in an Extract-Transform-Load (ETL) process by identifying a reference table with the aid of a score ranking of each reference table in a process. The method mentioned in US 8583626 B2 only uses table attributes from a database to identify a reference table without analysing other multi-modality approaches such as the semantic structure of the table schema and the content of the database table.
A research paper entitled "An Empirical Study of Instance-based Ontology Matching" addresses the best instance-based ontology mapping technique that meets the Semantic Web standards. However, the research paper does not disclose the technique to identify reference tables accurately and automatically in a database.
In terms of data accuracy and performance of database, the existing systems and methods have their limitations. Therefore, it is an aim of the present invention to provide a system and method that manages data based on reference tables and is capable of improving the accuracy of identifying reference tables in a database, more particularly in a relational database.
SUMMARY OF THE PRESENT INVENTION
The present invention aims to provide a system and a method for managing a collection of data that is organized in table form, comprising a knowledge database that applies reasoning capabilities to the collection of data; a list, covering reference tables and non-reference tables, that stores attributes of the collection of data and is used to link the reference table with the knowledge database; and a module that identifies reference tables from the collection of data and configures the collection of data in order to match the data of reference tables with semantic information from the knowledge database.
It is an object of the present invention to provide a system and a method that identifies primary keys and foreign keys in order to eliminate non-reference tables.
It is further an object of the present invention to provide a system and a method that determines first confidence values for each reference table by using transaction information of the collection of data.
It is another object of the present invention to provide a system and a method that determines the relationship of the knowledge database with the attributes of reference tables and further identifies second and third confidence values for each reference table in the collection of data.
It is another object of the present invention to provide a system and a method to obtain an overall confidence value for each reference table based on the result of each confidence value in order to improve the accuracy of reference table identification in the collection of data.
It is another object of the present invention to provide a system and a method that identifies reference tables by using a ranking that records the relationship of attributes from the collection of data with the semantic information from the knowledge database and ranks the reference tables in order to improve the accuracy of reference table identification in the collection of data.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates the system for managing a collection of data, as described in the present invention.
Figure 2 illustrates the flowchart of the method for managing a collection of data, as described in the present invention.
Figure 3 illustrates the flowchart depicting the steps of identifying reference and non-reference tables with the aid of a key filter, as described in the present invention.
Figure 4 illustrates the flowchart depicting the steps of determining first confidence values for each reference table by using transaction information of the collection of data, as described in the present invention.
Figure 5 illustrates the flowchart depicting the steps of determining the relationship of the knowledge database with the attributes of reference tables and identifying second and third confidence values for each reference table, as described in the present invention.
Figure 6 illustrates the flowchart depicting the steps of obtaining an overall confidence value for each reference table based on the result of each confidence value, as described in the present invention.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
The above mentioned and other features and objects of this invention will become more apparent and better understood by reference to the following detailed description. It should be understood that the detailed description made known below is not intended to be exhaustive or limit the invention to the precise form disclosed as the invention may assume various alternative forms. On the contrary, the detailed description covers all the relevant modifications and alterations made to the present invention, unless the claims expressly state otherwise.
As management demand for more analysis of the data in a database increases, the problems described above can be addressed by implementing the present invention.
The present invention relates to a system (1) and method (10) for identifying reference tables for managing a collection of data (2). More particularly, the present invention improves the accuracy of reference table identification in the collection of data (2). The system (1), as illustrated in Figure 1, comprises at least a collection of data (2) that is organized in tables, fields and forms as well as records, at least a knowledge database (3) that applies reasoning capabilities to the collection of data (2), at least a confidence value list (4) that stores attributes of the collection of data (2) as reference, and at least a module (5) that identifies tables having the collection of data (2) and configures the data in order to match the data with semantic information from the knowledge database (3), wherein the system (1) utilizes a key filter that is incorporated with the module (5) to eliminate non-reference tables, and a ranking that records the relationship of attributes from the collection of data (2) with the semantic information from the knowledge database (3) and ranks reference tables from the collection of data (2), in order to improve the accuracy of reference table identification in the collection of data (2).
The collection of data (2) is organized in table form and further comprises a series of logically related two-dimensional data. Moreover, the module (5) in the present invention further identifies the reference table by ranking the tables in the collection of data (2) with a confidence value in order to eliminate non-reference tables. In addition, the ranking is incorporated with the module (5), which further rates the effectiveness of relationships with the knowledge database (3) by assigning a confidence value to each relationship. Each relationship is identified by comparing the information of the collection of data (2) with information of the knowledge database (3). It involves a multi-modality comparison, such as semantic structure based and semantic instance based comparison.
Additionally, the present invention further discloses the use of a knowledge database (3), which consists of a centralized repository of data and information submitted by the user. The knowledge database (3) has been utilised to optimise data storage, data analysis, and data reuse as well as transactional properties on data requirements. The knowledge database (3) comprises pre-determined logics, concepts, rules and requirements of the system to deduce new facts or inconsistencies.
Referring to Figure 2, the figure illustrates a method (10) for managing a collection of data (2) according to the present invention. The method (10) comprises a first step of identifying reference tables and non-reference tables with the aid of a key filter (11). Subsequently, a first confidence value for the reference table is determined based on the transaction information and updates of the reference table from the collection of data (2) (12). Next, the relationship of the knowledge database (3) with the attributes of the reference table is identified, after which second and third confidence values for the reference table are identified (13). Lastly, an overall confidence value for the reference table is obtained (14) by comparing the results of the confidence values identified previously.
Figure 3 is a flowchart depicting further steps of identifying reference tables and non-reference tables with the aid of a key filter (11). It begins with the steps of identifying a name (111) and primary keys (112) for each table. The primary keys are then recorded and managed in a list named the primary key list (113). Next, if exactly one primary key is found in the table, one or more foreign keys are identified (114). Otherwise, if no primary key or more than one primary key is found, the table is identified as a non-reference table. The foreign keys are recorded and managed in a list named the foreign key list (115). Subsequently, the table is identified as a reference table if no foreign key is identified in the table (116). The attributes of the table, such as table names, primary keys and foreign keys, are stored in the two lists mentioned above, which together serve as a key filter list for identifying whether a table is a reference table. Non-reference tables are eliminated from the collection of data (2) after the reference tables are identified (117).
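The key filter of Figure 3 can be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the `Table` class, the function name `key_filter` and the dictionary-based key lists are hypothetical, and it is assumed that the primary and foreign keys of each table are already known from the schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical container for the schema facts used by the key filter.
    name: str
    primary_keys: list[str] = field(default_factory=list)
    foreign_keys: list[str] = field(default_factory=list)


def key_filter(tables):
    """Split tables into reference candidates and non-reference tables."""
    primary_key_list = {}  # table name -> primary keys (step 113)
    foreign_key_list = {}  # table name -> foreign keys (step 115)
    reference, non_reference = [], []
    for table in tables:
        primary_key_list[table.name] = list(table.primary_keys)
        # Step 114: foreign keys are examined only when exactly one
        # primary key is found; otherwise the table is non-reference.
        if len(table.primary_keys) != 1:
            non_reference.append(table.name)
            continue
        foreign_key_list[table.name] = list(table.foreign_keys)
        # Step 116: a table without any foreign key is a reference table.
        if table.foreign_keys:
            non_reference.append(table.name)
        else:
            reference.append(table.name)
    # Step 117: callers keep only `reference` for the later steps.
    return reference, non_reference, primary_key_list, foreign_key_list


tables = [
    Table("country", ["code"]),
    Table("orders", ["id"], ["country_code"]),
    Table("order_item", ["order_id", "line_no"]),
    Table("audit_log"),
]
reference, non_reference, pk_list, fk_list = key_filter(tables)
```

In this sketch only `country` survives the filter, since it has a single primary key and no foreign key.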
Figure 4 is a flowchart depicting further steps of determining the first confidence value for the reference table by using transaction information of the collection of data (2) (12). The first step is to identify the activities of the reference table (121), followed by identifying the frequency of updates of the reference table (122). Once the update frequency has been identified, the total number of updates of the reference table within a period of time is identified (123). Based on the total updates, the percentage of updates of the reference table is calculated (124). Next, a first confidence value is determined based on the updates of the reference table (125). Lastly, the first confidence value of the reference table is recorded and managed in a list known as the confidence value list (126).
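The steps of Figure 4 can be sketched as below. The description does not state how the percentage of updates is mapped to a confidence value, so this sketch assumes that the percentage is a table's share of all logged updates in the period and that a rarely updated table receives a higher confidence (one minus that share). Both the mapping and the function name `first_confidence` are assumptions made for illustration.

```python
from collections import Counter


def first_confidence(update_log, candidates):
    """Return {table name: first confidence value} for candidate tables.

    `update_log` is a list of table names, one entry per update observed
    in the period of interest (a hypothetical form of the transaction log).
    """
    counts = Counter(update_log)      # step 122: update frequency per table
    total = sum(counts.values())      # step 123: total updates in the period
    confidence_value_list = {}
    for name in candidates:
        # Step 124: share of updates (a fraction rather than a percentage).
        share = counts[name] / total if total else 0.0
        # Step 125: ASSUMED mapping, fewer updates give higher confidence.
        confidence_value_list[name] = 1.0 - share
    return confidence_value_list      # step 126: recorded in the list


log = ["orders"] * 8 + ["country"] * 2
first_values = first_confidence(log, ["country", "currency"])
```

A table that never appears in the log, such as `currency` here, receives the maximum value under this assumed mapping.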
After the first confidence value of the reference table is identified in step (12), the relationship of the knowledge database (3) with the attributes of the reference tables is determined in order to identify the second and third confidence values of the reference table (13), as illustrated in Figure 5. Before the relationship between the attributes of the reference table and the information in the knowledge database (3) is determined (132), the information of the reference table must be converted into semantic information for further semantic analysis (131). The relationship between the attributes of the reference table and the information of the knowledge database (3) can then be identified effectively. In the step of identifying the relationship of the reference table with the knowledge database (132), as illustrated in Figure 5, the semantic information is utilised for building a relationship among the data. Semantic information concerns the rational, informational content of the collection of data (2). It focuses on the relation between concepts, words and values and what they stand for, including denotation. It further illustrates the degree of the relation between concepts and reference as well as representation. Thereafter, the semantic information is used for building a relationship among the collection of data (2) and for linking the data of the reference tables with information from the knowledge database (3), in order to provide a visible and accurate result of data management.
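One way to picture the conversion of step (131) is to express a table as subject-predicate-object triples. The description does not prescribe a representation, so the triple form, the predicate names (`type`, `hasAttribute`, `rowOf`) and the function `table_to_triples` below are all hypothetical choices made for this sketch.

```python
def table_to_triples(name, columns, rows):
    """Express a table's structure and content as (subject, predicate, object)."""
    triples = [(name, "type", "Table")]
    # Structure: one triple per attribute (column) of the table.
    for column in columns:
        triples.append((name, "hasAttribute", column))
    # Content: each row becomes a subject whose attributes carry its values.
    for index, row in enumerate(rows):
        subject = f"{name}/{index}"
        triples.append((subject, "rowOf", name))
        for column, value in zip(columns, row):
            triples.append((subject, column, value))
    return triples


triples = table_to_triples("country", ["code", "name"], [("MY", "Malaysia")])
```

Structure triples of this kind could feed a structure-based comparison, and row triples an instance-based comparison, against the knowledge database (3).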
Thereafter, a second confidence value of the reference table is determined (133) by comparing its table structure with the information of the knowledge database (3). Subsequently, a third confidence value of the reference table is determined (134) by comparing its table attributes with the semantic instances of the knowledge database (3). In one embodiment, the first, second and third confidence values are each recorded and stored in the confidence value list (4) (135). Each of the newly determined first, second and third confidence values is compared with the existing confidence values stored in the confidence value list (4), so that an updated list of confidence values is created (136). Otherwise, the second and third confidence values determined in steps (133) and (134) are treated as the highest values in the confidence value list (4).
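The comparisons of steps (133) to (136) can be sketched as follows. The description does not name a similarity measure, so set overlap is used here purely as a stand-in: the functions `second_confidence`, `third_confidence` and `record_highest`, and the shape of the knowledge database inputs, are assumptions made for illustration.

```python
def _norm(items):
    # Case-insensitive comparison of names and values (an assumption).
    return {str(item).strip().lower() for item in items}


def overlap(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = _norm(a), _norm(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def second_confidence(columns, kb_concepts):
    """Structure-based (133): best overlap between the table's columns
    and the attribute set of any concept in the knowledge database."""
    return max((overlap(columns, attrs) for attrs in kb_concepts.values()),
               default=0.0)


def third_confidence(values, kb_instances):
    """Instance-based (134): fraction of table values known as instances."""
    values, known = _norm(values), _norm(kb_instances)
    return len(values & known) / len(values) if values else 0.0


def record_highest(confidence_value_list, table, kind, value):
    """Steps 135-136: keep the highest value seen per table and kind."""
    key = (table, kind)
    confidence_value_list[key] = max(value, confidence_value_list.get(key, value))
    return confidence_value_list


kb_concepts = {"Country": {"code", "name", "continent"}, "Currency": {"iso", "symbol"}}
conf_list = {}
record_highest(conf_list, "country", "second",
               second_confidence(["Code", "Name"], kb_concepts))
record_highest(conf_list, "country", "third",
               third_confidence(["MY", "SG", "XX"], {"MY", "SG", "TH"}))
```

When no earlier value exists for a table, `record_highest` simply stores the new value, which then stands as the highest value in the list.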
The last step of the method (10) is identifying a reference table by determining an overall confidence value for the reference table (14), as shown in Figure 6. The overall confidence value is determined from the first, second and third confidence values that have been determined previously, each of which is obtained from the confidence value list (141, 142, 143). Next, the confidence value list (4), which comprises the first, second and third confidence values, is used to rank the reference tables in the collection of data (145). The highest value of each confidence value is maintained and updated in the confidence value list. Lastly, the reference table is determined based on the highest overall confidence value.
The method (10) has the advantage of using the confidence value list (4), incorporated with the module (5), to identify reference tables based on multi-modality approaches such as semantic structure-based and semantic instance-based comparison. With the aid of semantic information, the method (10) further determines the relationship of the reference table with the knowledge database (3) and generates the second and third confidence values, which further identify the reference table regardless of the size or complexity of the collection of data (2). Moreover, an ontology mapping technique is implemented to compare the semantic information of the knowledge database (3) with the attributes of the tables in the collection of data (2). The system (1) and method (10) presented herein are able to improve the accuracy of identification of reference tables in a collection of data (2) by comparing multi-modality approaches such as the data log, the semantic structure of the table and the content of the table.
Last but not least, the tables are managed and ranked with minimal user intervention in order to provide a visible and accurate result in managing the collection of data (2).
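The final ranking of Figure 6 can be sketched as below. The description says only that the overall confidence value is based on the three individual values, so the unweighted mean used here is an assumption, as are the function names `overall_confidence` and `rank_tables`.

```python
def overall_confidence(first, second, third):
    """Step 144: ASSUMED combination, the unweighted mean of the three values."""
    return (first + second + third) / 3


def rank_tables(confidence_value_list):
    """Step 145: rank tables by overall confidence, highest first.

    `confidence_value_list` maps a table name to its
    (first, second, third) confidence values (steps 141-143).
    """
    scored = {
        name: overall_confidence(*values)
        for name, values in confidence_value_list.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


ranking = rank_tables({
    "country": (0.9, 0.8, 0.7),
    "orders": (0.2, 0.1, 0.3),
})
best_table, best_score = ranking[0]
```

A weighted combination, or taking the maximum of the three values, would fit the description equally well; the mean is chosen only to keep the sketch short.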
The invention described herein is susceptible to variations, modifications and/or additions other than those specifically described and it is to be understood that the invention includes all such variations, modifications and/or additions which fall within the scope of the following claims.

Claims

1. A system (1) for managing a collection of data comprising:
at least a collection of data (2) that is organized in table forms;
at least a knowledge database (3) that applies reasoning capabilities to the collection of data (2);
at least a confidence value list (4) that stores attributes of the collection of data (2);
at least a module (5) that identifies reference table having the collection of data (2) and configures the collection of data (2) in order to match the relationship of the reference table with the knowledge database (3);
wherein the system (1) utilizes a key filter that is incorporated with the module (5) to eliminate non-reference tables, and a ranking that records the relationship of attributes from the collection of data (2) with the semantic information from the knowledge database (3) and to rank reference tables from the collection of data (2), in order to improve the accuracy of reference table identification in the collection of data (2).
2. A system (1) according to claim 1, wherein the collection of data (2) that is organized in table forms comprises a series of logically related two dimensional data.
3. A system (1) according to claim 1, wherein the module (5) identifies the reference table by ranking the tables in a collection data (2) with a confidence value in order to eliminate non-reference tables.
4. A system (1) according to claim 1, wherein the ranking incorporated with the module (5) further rates the effectiveness of relationships with the knowledge database (3) by assigning a confidence value for each relationship.
5. A method (10) for managing a collection of data (2) comprises of the steps of:
identifying reference tables and non-reference tables with the aid of a key filter (11);
determining first confidence value for the reference table by using transaction information of the collection of data (2) (12);
determining the relationship of a knowledge database (3) with the attributes of reference table and further identifying second and third confidence values for the reference table (13); and
obtaining an overall confidence value for the reference table based on the result of each confidence value identified (14).
6. A method (10) according to claim 5, wherein the step of identifying reference and non-reference table with the aid of a key filter (11), comprising the steps of:
identifying a name for each table (111);
identifying primary keys for each table (112);
recording and managing all the primary keys in a list that is named as primary key list (113);
identifying foreign keys for each table if only one primary key is identified in the table (114);
recording and managing all the foreign keys in a list that is named as foreign key list (115);
identifying the table as reference table if no foreign key is identified in the table (116); and
eliminating non reference table in the collection of data (2) (117).
7. A method (10) according to claim 5, wherein the step of determining first confidence values for the reference table by using transaction information of the collection of data (2) (12), comprising the steps of:
identifying activity of the reference table (121);
identifying frequency of the updates of the reference table (122);
identifying total updates of a period of time (123);
identifying percentage of the updates of the reference table (124);
determining a first confidence value of the reference table based on the updates of the reference table (125); and
recording and managing the first confidence value of the reference table in a confidence value list (126).
8. A method (10) according to claim 5, wherein the step of determining the relationship of the knowledge database (3) with the attributes of reference tables and further identifying second and third confidence values for the reference table (13), comprising the steps of:
transforming the information of reference tables into semantic information for further semantic analysis (131);
identifying the relationship between the attributes of reference table with information of the knowledge database (3) (132);
identifying a second confidence value by comparing the structure of the collection of data (2) with information of the knowledge database (3) (133); and
identifying a third confidence value by comparing the attributes of reference table with semantic instance of the knowledge database (3) (134);
storing the second and third confidence value in the confidence value list (135); and
updating the list of the confidence value list (4) with the highest value of each confidence value (136).
9. A method (10) according to claim 5, wherein the step of obtaining an overall confidence value for the reference table based on the result of each confidence value (14), comprising the steps of:
obtaining the first confidence value identified from the confidence value list (141);
obtaining the second confidence value identified from the confidence value list (142);
obtaining the third confidence value identified from the confidence value list (143);
determining the overall confidence value based on the value of each confidence value (144); and
ranking and determining the reference table based on the highest value of the overall confidence value (145).
PCT/MY2015/050121 2014-10-14 2015-10-12 A system and method for managing a collection of data Ceased WO2016060546A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2014002928A MY187722A (en) 2014-10-14 2014-10-14 A system and method for managing a collection of data
MYPI2014002928 2014-10-14

Publications (1)

Publication Number Publication Date
WO2016060546A1 true WO2016060546A1 (en) 2016-04-21

Family

ID=55071113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2015/050121 Ceased WO2016060546A1 (en) 2014-10-14 2015-10-12 A system and method for managing a collection of data

Country Status (2)

Country Link
MY (1) MY187722A (en)
WO (1) WO2016060546A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005027019A2 (en) * 2003-09-10 2005-03-24 Exeros, Inc, A method and apparatus for semantic discovery and mapping between data sources
US8250101B2 (en) 2010-05-27 2012-08-21 International Business Machines Corporation Ontology guided reference data discovery
US8386529B2 (en) 2010-02-21 2013-02-26 Microsoft Corporation Foreign-key detection
US8583626B2 (en) 2012-03-08 2013-11-12 International Business Machines Corporation Method to detect reference data tables in ETL processes
US8631048B1 (en) * 2011-09-19 2014-01-14 Rockwell Collins, Inc. Data alignment system

Also Published As

Publication number Publication date
MY187722A (en) 2021-10-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15820653

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15820653

Country of ref document: EP

Kind code of ref document: A1