WO2012035754A1 - Data integration processing device, system, method, and program
- Publication number
- WO2012035754A1 (PCT/JP2011/005129)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processing method
- integration processing
- integrated
- graph
- integration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Definitions
- The present invention relates to a data integration processing device, a data integration processing system, a data integration processing method, and a data integration processing program for integrating a plurality of graphs.
- As data integration techniques, the systems described in Patent Document 1 and Non-Patent Document 1 are known.
- The directed graph unification device described in Patent Document 1 includes an expression means 13, a merge means 14, and a tag check means 15.
- The directed graph unification device having such a configuration integrates directed graphs as follows.
- The expression means 13 represents the input directed graph as a list of pairs of tags and corresponding partial directed graphs. This list is called a tag list.
- The merge means 14 merges the tag lists corresponding to the two directed graphs.
- The tag check means 15 checks that the partial directed graphs corresponding to tags with the same name in the merged tag list are the same.
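The three steps attributed to Patent Document 1 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Patent Document 1: the dictionary form of the input graph, the edge tuples, and the function names `to_tag_list`, `merge_tag_lists`, and `check_tags` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]            # (source, label, target); an assumed encoding
TagList = List[Tuple[str, FrozenSet[Edge]]]


def to_tag_list(graph: Dict[str, Set[Edge]]) -> TagList:
    """Expression step: represent a graph as (tag, partial graph) pairs."""
    return [(tag, frozenset(edges)) for tag, edges in graph.items()]


def merge_tag_lists(a: TagList, b: TagList) -> TagList:
    """Merge step: combine the tag lists of the two graphs."""
    return a + b


def check_tags(merged: TagList) -> Optional[Dict[str, FrozenSet[Edge]]]:
    """Tag check step: tags with the same name must carry identical partial graphs."""
    seen: Dict[str, FrozenSet[Edge]] = {}
    for tag, sub in merged:
        if tag in seen and seen[tag] != sub:
            return None  # same tag, different partial graph: unification fails
        seen[tag] = sub
    return seen
```

In this sketch a failed check is reported by returning `None`; how the actual device reports a mismatch is not stated in the text.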
- In Non-Patent Document 1, a same-node determination rule between two graphs, defined externally as a map function, is used, and integration is performed while determining whether the graphs coincide by the following algorithm.
- The map M maps blank nodes to blank nodes.
- M(lit) = lit for all RDF literals lit that are nodes of the graph G.
- M(uri) = uri for all RDF URI references uri that are nodes of G.
- The triple (s, p, o) is in G if and only if the triple (M(s), p, M(o)) is in G′.
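The conditions on the map M can be checked mechanically for a given candidate map. The sketch below makes several assumptions that are not in the text: triples are tuples of strings, blank nodes are written with a `_:` prefix, M is required to be one-to-one on blank nodes (as in the usual RDF graph-equivalence definition), and the names `is_blank`, `apply_map`, and `is_equivalence` are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_blank(node: str) -> bool:
    # Assumption for this sketch: blank nodes carry a "_:" prefix.
    return node.startswith("_:")


def apply_map(m: Dict[str, str], node: str) -> str:
    # Literals and URI references map to themselves; only blank nodes are renamed.
    return m[node] if is_blank(node) else node


def is_equivalence(m: Dict[str, str], g: Set[Triple], g2: Set[Triple]) -> bool:
    """Return True if m maps the blank nodes of g one-to-one onto those of g2
    and (s, p, o) is in g exactly when (M(s), p, M(o)) is in g2."""
    blanks_g = {n for s, _, o in g for n in (s, o) if is_blank(n)}
    blanks_g2 = {n for s, _, o in g2 for n in (s, o) if is_blank(n)}
    if set(m) != blanks_g or not all(is_blank(v) for v in m.values()):
        return False
    if set(m.values()) != blanks_g2 or len(set(m.values())) != len(m):
        return False
    mapped = {(apply_map(m, s), p, apply_map(m, o)) for s, p, o in g}
    return mapped == g2
```

This only verifies one candidate map; searching for a suitable map is the expensive part and is not shown.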
- The problem with the techniques described above is that the integration function and the throughput are in a trade-off relationship.
- The integration function provided by the method described in Patent Document 1 is simplified, and the processing is realized with a calculation amount of [number of nodes in the graph]².
- The method of Non-Patent Document 1 requires a calculation amount of ([number of nodes in the graph] × [average number of properties per node])².
- That is, the existing technology has the problem that it cannot achieve both a rich integration function and high throughput.
- Accordingly, an object of the present invention is to provide a data integration processing device, a data integration processing system, a data integration processing method, and a data integration processing program capable of realizing high throughput while keeping the restriction on the provided integration function low.
- A data integration processing device according to the present invention is a data integration processing device that integrates a plurality of graphs, and includes integration processing method selection means that selects an integration processing method used for integration of an input graph group, and integration processing execution means that has a plurality of integration processing methods and integrates the plurality of graphs by executing integration processing according to the integration processing method selected by the integration processing method selection means from among the plurality of integration processing methods. The integration processing method selection means is characterized by selecting, for each node in the input graphs, the integration processing method used for integrating lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- A data integration processing system according to the present invention is a data integration processing system that integrates a plurality of graphs, and includes integration processing method selection means that selects an integration processing method used for integration of an input graph group, and integration processing execution means that has a plurality of integration processing methods and integrates the plurality of graphs by executing integration processing according to the integration processing method selected by the integration processing method selection means from among the plurality of integration processing methods. The integration processing method selection means is characterized by selecting, for each node in the input graphs, the integration processing method used for integrating lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- A data integration processing method according to the present invention is a data integration processing method for integrating a plurality of graphs, in which an integration processing method used for integration of an input graph group is selected, and the plurality of graphs are integrated by executing integration processing according to the integration processing method selected from among a plurality of integration processing methods. The method is characterized in that, for each node in the input graphs, the integration processing method used for integrating lower nodes is selected according to the frequency with which the lower nodes match when the upper nodes match.
- A data integration processing program according to the present invention is a data integration processing program for integrating a plurality of graphs, and causes a computer to execute integration processing method selection processing for selecting an integration processing method used for integration of an input graph group, and integration processing execution processing for integrating the plurality of graphs by executing integration processing according to the integration processing method selected from among a plurality of integration processing methods. The program is characterized by causing the computer to execute, in the integration processing method selection processing, processing for selecting, for each node in the input graphs, the integration processing method used for integrating lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- FIG. 1 is a block diagram schematically illustrating a configuration of a data integration processing system described in Patent Document 1 as a first related technique.
- FIG. It is a functional block diagram which shows the minimum structural example of a data integration processing apparatus.
- FIG. 1 is a functional block diagram showing an example of the overall configuration of the first embodiment of the data integration processing system according to the present invention.
- the data integration processing system includes a data integration processing device 1 and an analysis processing device 2.
- In the present embodiment, a case in which the data integration processing device 1 and the analysis processing device 2 are configured as separate devices will be described.
- the analysis processing device 2 is realized by an information processing device such as a personal computer that operates according to a program.
- the analysis processing device 2 includes a plurality of analysis means (not shown) for analyzing data.
- the data integration processing device 1 is specifically realized by an information processing device such as a personal computer that operates according to a program.
- The data integration processing device 1 includes an analysis unit-specific characteristic storage unit 5, an integration processing method selection rule storage unit 8, an integration processing method selection unit 4, an integration processing execution unit 6, and a characteristic learning unit 7.
- Each means is controlled by an integrated control means (not shown).
- the analysis unit-specific characteristic storage unit 5 stores, for each analysis unit included in the analysis processing device 2, the characteristic information of the partial graph in the graph expressing the analysis result.
- the characteristic storage means 5 by analysis means is realized by a storage device such as an optical disk device or a magnetic disk device.
- the integrated processing method selection rule storage unit 8 stores rule information (for example, a selection rule table 810) indicating rules for selecting an optimal integrated processing method for the graph data.
- the integrated processing method selection rule storage unit 8 is realized by a storage device such as an optical disk device or a magnetic disk device.
- The integrated processing method selection means 4 has a function of receiving, from the analysis processing device 2, the graph group expressing the analysis results and the information on the analysis means group that output the graph group, and of selecting an appropriate integrated processing method based on the characteristic information stored in the characteristic storage means 5 for each analysis means and the rule information stored in the integrated processing method selection rule storage unit 8. More specifically, the integrated processing method selection means 4 is realized by a CPU of an information processing apparatus that operates according to a program.
- the integration processing execution means 6 has a function of integrating the graphs by executing the integration processing according to the integration processing method selected by the integration processing method selection means 4 and transmitting the result to the analysis processing device 2.
- the integrated processing execution means 6 is realized by a CPU of an information processing apparatus that operates according to a program.
- The integrated processing execution means 6 includes a first integrated processing method executing means 9, a second integrated processing method executing means 10, and a third integrated processing method executing means 11 that execute graph integration processing by different methods. In the present embodiment, an example including three different integrated processing method execution means will be described, but the present invention is not limited to this, and any number of two or more may be used. Each integrated processing method execution means is realized by, for example, the CPU of the information processing apparatus executing a process based on an existing graph integration algorithm.
- the characteristic learning unit 7 has a function of receiving the graph group and the information of the analysis unit group that outputs the graph group from the integrated processing method selection unit 4 and updating the information stored in the characteristic storage unit 5 by analysis unit.
- the characteristic learning unit 7 is realized by a CPU of an information processing apparatus that operates according to a program.
- the analysis unit-specific characteristic storage unit 5 stores a coincidence duplication frequency table 510, a contradiction duplication frequency table 520, and a class property appearance frequency table 530.
- the coincidence overlap frequency table 510 has IDs of analysis means provided in the analysis processing apparatus 2 on the vertical axis and the horizontal axis.
- the expression method of the analysis means ID does not need to be limited to numerals, and any expression can be used as long as the analysis means can be uniquely specified, such as an arbitrary character string or URI.
- Each cell in the coincidence duplication frequency table 510 stores a location where coincidence duplication occurs and its frequency in the analysis results output by the two analysis means corresponding to the analysis means IDs on the vertical and horizontal axes of the cell, respectively. That is, for each node in the graph output by each of the two analysis means, when the upper node matches, the location where the lower node matches and the frequency thereof are stored.
- Here, when element data of the analysis results overlaps between two graphs and the information constituting that element data matches, this is called coincidence overlap.
- The location can be expressed using the class name of the node of the graph data to be processed and the property name that is the label of the edge of the graph data.
- the contradiction overlap frequency table 520 has IDs of analysis means provided in the analysis processing device 2 on the vertical axis and the horizontal axis.
- the expression method of the analysis means ID does not need to be limited to numerals, and any expression can be used as long as the analysis means can be uniquely specified, such as an arbitrary character string or URI.
- Each cell in the contradiction duplication frequency table 520 stores the location and frequency of occurrence of contradiction in the analysis results output by the two analysis units corresponding to the cell vertical axis and horizontal axis analysis unit IDs. That is, for each node in the graph output by each of the two analysis means, the location where the upper node matches, but the lower node does not match, and its frequency are stored.
- Here, when element data of the analysis results overlaps between two graphs but the information constituting that element data differs, this is called contradiction overlap.
- The location can be expressed using the node name of the graph data to be processed and the property name that is the label of the edge of the graph data.
- the class property appearance frequency table 530 stores an analysis unit ID 531, a class / property ID 532, and a frequency 533.
- The analysis means ID 531 is used to uniquely identify the analysis means provided in the analysis processing device 2, as in the coincidence duplication frequency table 510 and the contradiction duplication frequency table 520.
- The class / property ID 532 is used to uniquely identify the class or property included in the data in the analysis result graph.
- The expression method of the class / property ID 532 does not need to be limited to English characters, and any expression can be used as long as it can uniquely identify the class or property, such as an arbitrary character string or URI.
- The frequency 533 is the appearance frequency of the class or property specified by the class / property ID 532, calculated with all the classes or properties included in the graphs output as analysis results by the analysis unit specified by the analysis unit ID 531 as the population.
- the expression method of the frequency 533 is not limited to the percentage, and any numerical expression can be used.
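One possible in-memory layout for the three tables is sketched below. It is an assumption-laden illustration: the class name `CharacteristicStore`, the "Class.property" string used for a location, and the choice to treat cell (a, b) and cell (b, a) as the same cell are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Key: (analysis means ID, analysis means ID); value: {location: frequency}.
PairTable = Dict[Tuple[str, str], Dict[str, float]]


@dataclass
class CharacteristicStore:
    coincidence: PairTable = field(default_factory=dict)    # table 510
    contradiction: PairTable = field(default_factory=dict)  # table 520
    # Table 530: (analysis means ID 531, class/property ID 532) -> frequency 533
    appearance: Dict[Tuple[str, str], float] = field(default_factory=dict)

    @staticmethod
    def pair_key(a: str, b: str) -> Tuple[str, str]:
        # Assumption: the tables are symmetric, so the ID pair is stored in sorted order.
        return (a, b) if a <= b else (b, a)

    def coincidence_frequency(self, a: str, b: str, location: str) -> float:
        """Look up one cell of table 510; unknown cells count as frequency 0."""
        return self.coincidence.get(self.pair_key(a, b), {}).get(location, 0.0)

    def contradiction_frequency(self, a: str, b: str, location: str) -> float:
        """Look up one cell of table 520; unknown cells count as frequency 0."""
        return self.contradiction.get(self.pair_key(a, b), {}).get(location, 0.0)
```

Frequencies are stored as plain floats here, which matches the statement that any numerical expression may be used for the frequency 533.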
- the integrated processing method selection rule storage unit 8 stores a selection rule table 810. As illustrated in FIG. 5, the selection rule table 810 stores a selection rule ID 811, a rule 812, and an integrated processing method ID 813.
- the selection rule ID 811 is an ID for uniquely identifying the selection rule, and is the main key of the selection rule table 810.
- the expression method of the selection rule ID 811 need not be limited to numbers, and any expression can be used as long as the expression can uniquely identify the selection rule, such as an arbitrary character string or URI.
- Rule 812 is used to select an integrated processing method.
- the integrated processing method to be used is designated by the integrated processing method ID 813 associated with the rule 812.
- The rule 812 describes, for example, a rule such as “If the coincidence overlap frequency is high and the contradiction overlap frequency is low, select a low-function, high-speed integrated processing method” or “If the coincidence overlap frequency is low and the contradiction overlap frequency is high, select a high-function integrated processing method.”
- Other examples of rules include “If the value based on the coincidence overlap frequency is higher than a predetermined value, select a low-function, high-speed integrated processing method” or “If the value based on the coincidence overlap frequency is lower than the predetermined value, select a high-function, low-speed integrated processing method.”
- A rule may also combine both frequencies, such as “If the value based on the coincidence overlap frequency is higher than a predetermined value and the value based on the contradiction overlap frequency is lower than a predetermined value, select a low-function, high-speed integrated processing method” or “If the value based on the coincidence overlap frequency is lower than the predetermined value and the value based on the contradiction overlap frequency is higher than the predetermined value, select a high-function integrated processing method.”
- For example, the rule shown in FIG. 5 is “If the value obtained by multiplying the coincidence overlap frequency by the appearance frequency described later is higher than a predetermined value, and the value obtained by multiplying the contradiction overlap frequency by the appearance frequency is equal to or less than the predetermined value, select a low-function, high-speed integrated processing method” or “If the value obtained by multiplying the coincidence overlap frequency by the appearance frequency is equal to or less than a predetermined value, and the value obtained by multiplying the contradiction overlap frequency by the appearance frequency is higher than the predetermined value, select a high-function integrated processing method.” Note that the description method of the rule 812 need not be limited to a logical expression, and a description such as a decision tree can also be used.
- the integration processing method ID is used to uniquely identify the integration processing method (specifically, the integration processing method execution means (9 to 11)).
- The expression method of the integrated processing method ID does not need to be limited to a character string, and any expression can be used as long as it can uniquely identify the integrated processing method, such as an arbitrary character string or URI.
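A selection rule table of this kind can be modelled as a list of (rule ID, condition, method ID) rows that is scanned for the first matching condition. The sketch below encodes the two product-based rules described for FIG. 5; the threshold value 0.5, the method IDs `fast_simple` and `slow_full`, and all class and function names are placeholders chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Characteristics:
    coincidence: float    # coincidence overlap frequency
    contradiction: float  # contradiction overlap frequency
    appearance: float     # class/property appearance frequency


@dataclass
class SelectionRule:
    rule_id: str                                  # selection rule ID 811
    condition: Callable[[Characteristics], bool]  # rule 812
    method_id: str                                # integrated processing method ID 813


THRESHOLD = 0.5  # stands in for the "predetermined value"; arbitrary placeholder

RULES: List[SelectionRule] = [
    SelectionRule("1", lambda c: c.coincidence * c.appearance > THRESHOLD
                  and c.contradiction * c.appearance <= THRESHOLD, "fast_simple"),
    SelectionRule("2", lambda c: c.coincidence * c.appearance <= THRESHOLD
                  and c.contradiction * c.appearance > THRESHOLD, "slow_full"),
]


def select_method(c: Characteristics,
                  rules: List[SelectionRule] = RULES) -> Optional[str]:
    """Return the method ID of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(c):
            return rule.method_id
    return None
```

Representing each rule as a callable keeps the table open to other rule forms, such as the decision-tree style mentioned above. What should happen when no rule matches is not described in the text, so the sketch simply returns `None`.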
- the data integration processing system of the first embodiment executes data integration processing Sa, integration processing method selection processing Sb, and characteristic learning processing Sc.
- the data integration processing system executes a series of integration processing in response to a request from the analysis processing device 2, and returns a result after the integration processing.
- the integration processing method selection means 4 selects an optimal integration processing method group for the graph group to be integrated.
- The characteristic learning unit 7 receives the graph group to be integrated and the information of the analysis unit group that output the graph group from the integration processing method selection unit 4, and updates the information stored in the characteristic storage unit 5 by analysis unit.
- FIG. 6 is a flowchart showing an example of the flow of data integration processing executed by the data integration processing system.
- the integration processing method selection means 4 receives a request for data integration processing from the analysis processing device 2 (step Sa1).
- the integrated processing method selection means 4 receives from the analysis processing device 2 an analysis result graph group expressing the analysis result group and an ID group of the analysis means outputting the graph (step Sa2).
- the integrated processing method selection means 4 performs an integrated processing method selection process (step Sb) and selects an integrated processing method ID group. Details of the integrated processing method selection processing (step Sb) will be described later.
- the integrated processing method selection unit 4 outputs the analysis result graph group and the integrated processing method ID group to the integrated processing execution unit 6.
- the integrated processing execution means 6 extracts the integrated processing method ID corresponding to the analysis result graph from the integrated processing method ID group for all the analysis result graphs, and executes the integrated processing method corresponding to the integrated processing method ID. Any one of the means 9 to 11 is caused to execute the integration processing of the two graphs (steps Sa3 to Sa5). Specifically, the integrated processing execution means 6 extracts the integrated processing method ID corresponding to the analysis result graph, and integrates the integrated processing method into any of the integrated processing method executing means 9 to 11 specified by the extracted integrated processing method ID. Output the request. Then, the integrated processing method execution means (any of 9 to 11) executes processing for integrating the two graphs according to the request.
- the integration processing execution means 6 transmits the graph after the integration processing to the analysis processing device 2 (step Sa6).
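The dispatch performed in steps Sa3 to Sa5 can be pictured as a lookup from integrated processing method ID to an executor function. This is a rough sketch under several assumptions: graphs are sets of triples, one method ID is supplied per analysis result graph, the graphs are folded into a running result one at a time, and `union_merge` is only a stand-in for a real integration processing method execution means.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
Executor = Callable[[Graph, Graph], Graph]


def union_merge(a: Graph, b: Graph) -> Graph:
    # Stand-in executor: plain set union of the two graphs.
    return a | b


def integrate_all(graphs: List[Graph], method_ids: List[str],
                  executors: Dict[str, Executor]) -> Graph:
    """Integrate the graphs pairwise, choosing the executor by method ID."""
    if len(graphs) != len(method_ids):
        raise ValueError("one method ID is expected per analysis result graph")
    result: Graph = set()
    for graph, method_id in zip(graphs, method_ids):
        # Look up the execution means for this graph and integrate two graphs.
        result = executors[method_id](result, graph)
    return result
```

For example, `integrate_all([g1, g2], ["m1", "m1"], {"m1": union_merge})` returns the union of `g1` and `g2`.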
- FIG. 7 is a flowchart showing an example of the flow of the integration processing selection process executed by the data integration processing system.
- The integrated processing method selection means 4 extracts, from the characteristic storage means 5 by analysis means, the characteristic information corresponding to all analysis result graphs and the IDs of the analysis means that output the analysis result graphs. Then, based on the extracted characteristic information, the integrated processing method selection unit 4 identifies a rule that matches from the rule information stored in the integrated processing method selection rule storage unit 8 (steps Sb3 and Sb4). Specifically, the integrated processing method selection unit 4 identifies the selection rule ID whose condition described in the rule 812 is satisfied, based on the coincidence duplication frequency, the contradiction duplication frequency, and the appearance frequency included in the extracted characteristic information.
- the integrated processing method selection unit 4 acquires the integrated processing method ID 813 corresponding to the specified rule from the integrated processing method selection rule storage unit 8, and internally holds it (step Sb5). Specifically, the integrated processing method selection unit 4 extracts information indicating the integrated processing method ID 813 corresponding to the specified selection rule ID 811 from the integrated processing method selection rule storage unit 8, and temporarily stores the extracted information in the storage unit.
- The integrated processing method selection unit 4 outputs the integrated processing method ID group to the integrated control unit (step Sb6). Specifically, the integration processing method selection unit 4 outputs information indicating the extracted integration processing method ID 813 to the integration control unit that controls each unit of the data integration processing device 1.
- FIG. 8 is a flowchart showing an example of a characteristic learning process executed by the data integration processing system.
- The characteristic learning unit 7 receives the graph group of the analysis result and the ID group of the analysis unit of the analysis processing apparatus 2 that has output the graph group from the integrated processing method selection unit 4 (step Sc1).
- the characteristic learning means 7 executes the following process for all pairs of combinations of the received graph group (step Sc2).
- the characteristic learning means 7 calculates the coincidence overlap frequency between the analysis result graphs for the graph pair (step Sc3).
- the characteristic learning means 7 calculates the class / property appearance frequency between the analysis result graphs for the graph pair (step Sc4).
- the characteristic learning means 7 calculates the contradiction overlap frequency between the analysis result graphs for the graph pair (step Sc5).
- steps Sc3, Sc4, and Sc5 can be executed in parallel, and the execution order is not limited.
- the characteristic learning unit 7 extracts information indicating the corresponding frequency from the characteristic storage unit 5 for each analysis unit for each frequency calculated in steps Sc3 to Sc5, and obtains a weighted average for each (step Sc6).
- the characteristic learning means 7 stores each frequency value for which the weighted average is obtained in the characteristic-by-analysis characteristic storage means 5 (step Sc7).
- the characteristic learning means 7 ends the learning process when the processes from Step Sc3 to Sc7 are executed for all combinations of graphs (Step Sc2).
- the data integration processing system updates the characteristic information stored in the characteristic storage means 5 by analysis means as needed by executing such characteristic learning processing Sc every predetermined period.
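Steps Sc3, Sc5, and Sc6 can be sketched for a single graph pair as follows. The text does not give formulas, so everything here is an assumed reading: a graph is a set of (upper node, property, lower node) triples, a frequency is computed per property as a share of the (upper node, property) slots present in both graphs, and the weighted average uses a hypothetical `weight` parameter for the new observation.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (upper node, property, lower node)


def overlap_frequencies(g1: Set[Triple], g2: Set[Triple]
                        ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (coincidence, contradiction) frequencies per property."""
    def slots(g: Set[Triple]) -> Dict[Tuple[str, str], Set[str]]:
        out: Dict[Tuple[str, str], Set[str]] = {}
        for s, p, o in g:
            out.setdefault((s, p), set()).add(o)
        return out

    s1, s2 = slots(g1), slots(g2)
    match, total = Counter(), Counter()
    for key in s1.keys() & s2.keys():   # the upper node and property match
        prop = key[1]
        total[prop] += 1
        if s1[key] == s2[key]:          # the lower nodes match as well
            match[prop] += 1
    coincidence = {p: match[p] / n for p, n in total.items()}
    contradiction = {p: 1.0 - coincidence[p] for p in total}
    return coincidence, contradiction


def weighted_update(stored: Dict[str, float], observed: Dict[str, float],
                    weight: float = 0.2) -> Dict[str, float]:
    """Blend stored and observed frequencies; a value missing on one side is
    taken from the other side unchanged."""
    keys = stored.keys() | observed.keys()
    return {k: (1 - weight) * stored.get(k, observed.get(k, 0.0))
               + weight * observed.get(k, stored.get(k, 0.0)) for k in keys}
```

A small `weight` makes the stored characteristics change slowly, which suits periodic re-learning; the appropriate value is not given in the text.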
- The effect of the present embodiment is that the throughput of the integration process is improved without limiting the function, because the integration process is executed using an integration processing method that provides the function required when integrating the two graphs.
- This is because the integrated processing method selecting means 4 selects an appropriate integrated processing method based on the statistical information between the graphs stored in the characteristic storing means 5 for each analyzing means, which prevents advanced processing from being applied indiscriminately to all graphs and reduces wasteful processing.
- FIG. 9 is a functional block diagram showing an example of the overall configuration of the second embodiment of the data integration processing system.
- the data integration processing system of the second embodiment is different from the first embodiment in that it includes a graph dividing unit 12, and the graph dividing unit 12 divides a graph before integration into subgraph groups.
- In FIG. 9, the components other than the graph dividing means 12 are the same as those in the first embodiment. Constituent elements similar to those in the first embodiment are denoted by the same reference numerals as those in FIG. 1, and detailed description thereof is omitted.
- the graph dividing unit 12 receives from the integration processing method selection unit 4 the graph group to be integrated and the analysis unit ID group that has output the graph group. Then, the graph dividing unit 12 extracts a portion showing the same characteristic in the graph as a subgraph based on the characteristic information in the analyzing unit-specific characteristic storage unit 5, and divides the integration target graph group into subgraph groups. Specifically, the graph dividing unit 12 is realized by a CPU of an information processing apparatus that operates according to a program.
- the data integration process Sd, the integration process selection process Sb, and the characteristic learning process Sc are executed as in the first embodiment.
- this embodiment is different from the first embodiment in that the subgraph dividing process Se is executed and the graph dividing unit 12 divides the input graph into subgraphs in the data integration process Sd.
- FIG. 10 is a flowchart showing an example of the flow of data integration processing Sd executed by the data integration processing system.
- the same operation elements as those in the first embodiment are denoted by the same reference numerals as those in FIG. 6, and detailed description thereof is omitted.
- the integration processing method selection means 4 receives a request for data integration processing from the analysis processing device 2 (step Sa1).
- the integrated processing method selection means 4 receives from the analysis processing device 2 an analysis result graph group expressing the analysis result group and an ID group of the analysis means outputting the graph (step Sa2).
- the integrated processing method selection unit 4 outputs the received analysis result graph group and the ID group of the analysis unit that has output the graph to the graph dividing unit 12. Then, the graph dividing unit 12 divides the graph into subgraphs (Step Se). Details of step Se will be described later.
- Since the subsequent processing (steps Sb to Sa6) is the same as the processing in the first embodiment, description thereof is omitted.
- FIG. 11 is a flowchart showing an example of the flow of the subgraph dividing process Se executed by the data integration processing system.
- The graph dividing unit 12 receives the analysis result graph group and the ID group of the analyzing units of the analysis processing apparatus 2 that output the graphs from the integrated processing method selecting unit 4, and performs the following processing on them (step Se1).
- For each class and property included in the graph, the graph dividing unit 12 refers to the coincidence duplication frequency table 510 and the contradiction duplication frequency table 520 in the characteristic storage unit 5, and lists those whose coincidence duplication frequency is high and whose contradiction duplication frequency is low (step Se2).
- the graph dividing unit 12 extracts a subgraph including many class properties having a high coincidence overlap frequency and a low contradiction duplication frequency as a high coincidence subgraph (step Se3).
- For each class and property included in the graph, the graph dividing unit 12 refers to the coincidence duplication frequency table 510 and the contradiction duplication frequency table 520 in the characteristic storage unit 5, and lists those whose coincidence duplication frequency is low and whose contradiction duplication frequency is high (step Se4).
- the graph dividing unit 12 extracts a subgraph including many class properties having a low coincidence overlap frequency and a high contradiction overlap frequency as a high contradiction subgraph (step Se5).
- the graph dividing means 12 extracts data that is not included in either the high coincidence subgraph or the high contradiction subgraph in the graph as a subgraph (step Se6).
- the graph dividing unit 12 outputs the extracted subgraph group to the integrated processing method selecting unit 4 (step Se8).
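The subgraph dividing process Se can be approximated by sorting each edge into one of three groups according to the frequencies recorded for its property. This is a simplification made for illustration: the text speaks of subgraphs that contain many matching classes and properties, whereas the sketch assigns edges one by one, and the `high` and `low` cut-off values and the group names are assumptions.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (node, property, node)


def split_graph(graph: Set[Triple],
                coincidence: Dict[str, float],
                contradiction: Dict[str, float],
                high: float = 0.7,
                low: float = 0.3) -> Dict[str, Set[Triple]]:
    """Divide a graph into high-coincidence, high-contradiction, and other parts."""
    parts: Dict[str, Set[Triple]] = {
        "high_coincidence": set(),    # steps Se2 and Se3
        "high_contradiction": set(),  # steps Se4 and Se5
        "other": set(),               # step Se6
    }
    for triple in graph:
        prop = triple[1]
        c = coincidence.get(prop, 0.0)
        d = contradiction.get(prop, 0.0)
        if c >= high and d <= low:
            parts["high_coincidence"].add(triple)
        elif c <= low and d >= high:
            parts["high_contradiction"].add(triple)
        else:
            parts["other"].add(triple)
    return parts
```

Each resulting part can then be passed to the integrated processing method selection separately, so that the fast method is applied to the high-coincidence part and the more capable method to the high-contradiction part.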
- the effect of this embodiment is that the processing time can be shortened by speeding up the integration process by dividing the graph when the size of the graph increases.
- the graph dividing means 12 can divide the graph into sub-graphs in which data having similar characteristics are collected, so that the integration processing can be efficiently performed according to each integration processing method.
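The division in steps Se2 to Se6 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the flat `(label, value)` node representation, the function name `divide_graph`, and the threshold `high=0.5` are assumptions introduced here, and each node is bucketed by its own class or property rather than by growing a subgraph that "contains many" such classes and properties.

```python
from typing import Dict, List, Tuple

# Hypothetical node representation: (class_or_property, value).
Node = Tuple[str, str]


def divide_graph(
    nodes: List[Node],
    match_freq: Dict[str, float],
    contra_freq: Dict[str, float],
    high: float = 0.5,  # assumed cut-off between "high" and "low" frequency
) -> Dict[str, List[Node]]:
    """Split nodes into high-coincidence, high-contradiction and remaining groups."""
    groups: Dict[str, List[Node]] = {
        "high_coincidence": [],
        "high_contradiction": [],
        "rest": [],
    }
    for node in nodes:
        label = node[0]
        m = match_freq.get(label, 0.0)   # coincidence duplication frequency
        c = contra_freq.get(label, 0.0)  # contradiction duplication frequency
        if m >= high and c < high:
            groups["high_coincidence"].append(node)    # cf. steps Se2-Se3
        elif m < high and c >= high:
            groups["high_contradiction"].append(node)  # cf. steps Se4-Se5
        else:
            groups["rest"].append(node)                # cf. step Se6
    return groups


example = divide_graph(
    [("Person", "Taro"), ("e-mail", "taro@example.com"), ("address", "Tokyo")],
    match_freq={"Person": 0.7, "e-mail": 0.1},
    contra_freq={"e-mail": 0.8},
)
```

Each resulting group can then be handed to the integration processing method best suited to it, which is where the speed-up described above comes from.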
- Embodiment 3. Next, a third embodiment of the present invention will be described.
- Here, the analysis processing apparatus 2 is assumed to include the analysis means described below.
- (1) A customer information search engine that outputs the customer's e-mail address, address, and name corresponding to the entered name.
- The data integration processing device can execute integration processing according to the following integration processing methods.
- First integration processing method: a method that simply integrates two input graphs by regarding nodes having the same node ID and value as the same node.
- The first integration processing method requires scanning for matching nodes between the two graphs. Therefore, if the number of nodes is N, a calculation amount of $N^2$ is required. In the second integration processing method, on the other hand, the model is also scanned in order to resolve the contradictions of inconsistent nodes when integrating the properties of the target node, so a calculation amount of $N^2 \times N^2$ is required.
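As an illustration of the first integration processing method, the sketch below merges two graphs by treating nodes with the same node ID and value as identical, using a pairwise scan whose cost grows as $N^2$. The `Node` type, the flat node-list representation (edges omitted), the function name `integrate_by_id`, and the sample data are assumptions made for this example only.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    node_id: str
    value: str


def integrate_by_id(graph_a: List[Node], graph_b: List[Node]) -> List[Node]:
    """Merge two graphs, regarding nodes with equal ID and value as the same node."""
    merged = list(graph_a)
    for b in graph_b:
        # Pairwise scan of graph_a for every node of graph_b: O(N^2) comparisons.
        is_duplicate = any(a.node_id == b.node_id and a.value == b.value for a in graph_a)
        if not is_duplicate:
            merged.append(b)
    return merged


graph_a = [Node("p1", "Taro"), Node("m1", "taro@example.com")]
graph_b = [Node("p1", "Taro"), Node("m1", "taro@other.example")]
merged = integrate_by_id(graph_a, graph_b)
```

In this sample, the duplicate `p1` node is merged, but the two conflicting `m1` values are both kept. Resolving such contradictions is the extra work that the slower second method takes on.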
- the integrated processing method selection rule storage unit 8 stores a selection rule table 810 shown in FIG. 5 as rule information.
- When the integrated processing method selection means 4 receives the analysis result data of the analysis means (1) and (2) from the analysis processing device 2, it operates as follows.
- Based on the analysis means ID group included in the received analysis result data (in this case, the customer information search engine and the employee information search engine), the integrated processing method selection means 4 extracts the coincidence duplication frequency (Person 70%) and the contradiction duplication frequency (e-mail 80%) from the analysis means-specific characteristic storage means 5.
- Based on the analysis means ID included in the received analysis result data, the integrated processing method selection means 4 also extracts the class/property appearance frequency (Person 33%, e-mail 33%) from the analysis means-specific characteristic storage means 5.
- the integrated processing method selection means 4 determines whether or not the rule matches based on the rule information stored in the integrated processing method selection rule storage means 8 and the extracted frequency information.
- the integrated processing method selection unit 4 determines that the rule matches the rule with the rule ID: 002 (see FIG. 5), and selects the second integrated processing method associated with the rule ID: 002.
- the integrated processing method selection unit 4 outputs an integrated processing method ID for specifying the second integrated processing method to the integrated control unit.
- The data integration processing device 1 provides a function of integrating while solving this contradiction. Therefore, the calculation time (calculation amount) is $N^2 \times N^2$.
- When the integrated processing method selection means 4 receives the analysis result data of the analysis means (1) and (3) from the analysis processing apparatus 2, it operates as follows.
- Based on the analysis means ID group included in the received analysis result data (in this case, the customer information search engine and the person flow line search engine), the integrated processing method selection means 4 extracts the coincidence duplication frequency (Person 100%) and the contradiction duplication frequency (none) from the analysis means-specific characteristic storage means 5.
- Based on the analysis means ID included in the received analysis result data, the integrated processing method selection means 4 extracts the class/property appearance frequency (Person 33%) from the analysis means-specific characteristic storage means 5.
- the integrated processing method selection means 4 determines whether or not the rule matches based on the rule information stored in the integrated processing method selection rule storage means 8 and the extracted frequency information.
- the integrated processing method selection means 4 determines that the rule matches the rule with the rule ID: 001 (see FIG. 5), and selects the first integrated processing method associated with the rule ID: 001.
- the integrated processing method selection unit 4 outputs an integrated processing method ID for specifying the first integrated processing method to the integrated control unit.
- The data integration processing device 1 provides a simple ID matching type integration function. Therefore, the calculation time (calculation amount) is $N^2$.
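The two selection cases above can be sketched as a rule lookup. The contents of the selection rule table 810 (FIG. 5) are not reproduced in this text, so the rule conditions and the threshold `HIGH = 0.5` below are hypothetical, chosen only so that the two worked cases resolve to rule 002 and rule 001 respectively. The names `Frequencies`, `Rule`, and `select_rule` are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

HIGH = 0.5  # assumed threshold; the actual values in table 810 are not given here


@dataclass(frozen=True)
class Frequencies:
    match: Dict[str, float] = field(default_factory=dict)          # coincidence duplication frequency
    contradiction: Dict[str, float] = field(default_factory=dict)  # contradiction duplication frequency


@dataclass(frozen=True)
class Rule:
    rule_id: str
    method: str
    condition: Callable[[Frequencies], bool]


def _peak(freqs: Dict[str, float]) -> float:
    return max(freqs.values(), default=0.0)


# Hypothetical stand-in for the selection rule table 810.
RULES: List[Rule] = [
    Rule("001", "first", lambda f: _peak(f.match) >= HIGH and _peak(f.contradiction) < HIGH),
    Rule("002", "second", lambda f: _peak(f.contradiction) >= HIGH),
]


def select_rule(freq: Frequencies, rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the first rule whose condition holds, or None if no rule matches."""
    for rule in rules:
        if rule.condition(freq):
            return rule
    return None


# Customer + employee search engines: Person 70% match, e-mail 80% contradiction.
case_a = select_rule(Frequencies({"Person": 0.70}, {"e-mail": 0.80}))
# Customer + person flow line search engines: Person 100% match, no contradiction.
case_b = select_rule(Frequencies({"Person": 1.00}, {}))
```

Keeping the rules as data rather than hard-coded branches mirrors the separation between the selection means and the rule storage means described above.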
- The average processing time is $N^2 \times (0.8 + 0.2 \times N^2)$. It is therefore shorter than the average processing time ($N^2 \times N^2$) of a system that uses only the second integration processing method.
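The average can be unpacked as a weighted sum. The reading below is inferred from the coefficients and is not stated explicitly in this text: it assumes the first method (cost $N^2$) is selected for a fraction 0.8 of integrations and the second method (cost $N^2 \times N^2$) for the remaining 0.2.

$$
T_{\mathrm{avg}} = 0.8 \cdot N^2 + 0.2 \cdot (N^2 \times N^2) = N^2 \times (0.8 + 0.2 \times N^2)
$$

$$
\frac{T_{\mathrm{avg}}}{N^2 \times N^2} = \frac{0.8}{N^2} + 0.2 < 1 \quad \text{for } N > 1
$$

Under this assumption the ratio approaches 0.2 as N grows. For N = 10, for example, it is 0.208.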
- the present invention is for improving the integration performance of processing result data in a processing platform that integrates and operates a plurality of function groups.
- FIG. 16 is a block diagram illustrating a minimum configuration example of the data integration processing device.
- The data integration processing device includes, as minimum components, an integration processing method selection means 4 that selects an integration processing method used for integrating an input graph group, and an integration processing execution means 6 that has a plurality of integration processing methods.
- For each node in the input graph, the integration processing method selection means 4 selects the integration processing method used for integrating the lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- The integration processing execution means 6 integrates the plurality of input graphs by executing the integration processing according to the integration processing method selected by the integration processing method selection means 4 from among the plurality of integration processing methods.
- With the data integration processing device having the minimum configuration, the throughput of the integration processing can be improved without limiting functionality, because the integration processing method that provides the functions required for integrating the two graphs is applied.
- The data integration processing device (for example, realized by the data integration processing device 1) integrates a plurality of graphs (for example, analysis result data produced by the analysis means of the analysis processing device 2). It includes an integration processing method selection means (for example, realized by the integration processing method selection means 4) for selecting an integration processing method (for example, the first integration processing method) used for integrating the input graph group, and an integration processing execution means (for example, realized by the integration processing execution means 6) that has a plurality of integration processing methods and integrates the plurality of graphs by executing the integration processing according to the integration processing method selected by the integration processing method selection means. For each node in the input graph, the integration processing method selection means selects the integration processing method used for integrating the lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- For each node in the input graph, the integration processing method selection means may select the integration processing method used for integrating the lower nodes based on the coincidence duplication frequency, which is the frequency with which the lower nodes match when the upper nodes match, and the contradiction duplication frequency, which is the frequency with which the lower nodes exist but do not match.
- The integration processing execution means may be able to execute the integration processing according to either the second integration processing method, which has a high integration function and a low processing speed, or the first integration processing method, which has a low integration function and a high processing speed. In that case, for each node in the input graph, the integration processing method selection means may select the first integration processing method when the frequency with which the lower nodes match when the upper nodes match is high, and the second integration processing method when that frequency is low.
- The integration processing execution means may be able to execute the integration processing according to either a first integration processing method, which has a predetermined integration function and can perform integration processing at a predetermined processing speed, or a second integration processing method, which has a higher integration function than the first integration processing method but a lower processing speed. In that case, for each node in the input graph, the integration processing method selection means may select the first integration processing method when the value based on the frequency with which the lower nodes match when the upper nodes match is higher than a predetermined value, and the second integration processing method when that value is lower than the predetermined value.
- For each node in the input graph, the integration processing method selection means may select the first integration processing method when, with the upper nodes matching, the value based on the frequency with which the lower nodes match is higher than a predetermined value and the value based on the frequency with which they do not match is lower than a predetermined value. It may select the second integration processing method when the value based on the matching frequency is lower than the predetermined value and the value based on the non-matching frequency is higher than the predetermined value.
- The data integration processing device may include a graph dividing means (for example, realized by the graph dividing means 12) that divides the input graph into a plurality of subgraphs. For each node in the input graph, the graph dividing means divides the graph into subgraphs based on the frequency with which the lower nodes match when the upper nodes match. The integration processing method selection means then selects an integration processing method for each subgraph produced by the graph dividing means, and the integration processing execution means executes the integration processing for each of those subgraphs.
- The data integration processing device may include an analysis means-specific characteristic storage means that stores, for each node in previously input graphs, the statistical frequency with which the lower nodes match when the upper nodes match, in association with the analysis means that output the graph. The integration processing method selection means may be configured to extract the statistical frequency from the analysis means-specific characteristic storage means based on the analysis means that output the input graph, and to select the integration processing method used to integrate the graphs based on the extracted statistical frequency.
- The data integration processing device may include a characteristic learning means (for example, realized by the characteristic learning means 7) that calculates, for each node in previously input graphs, the statistical frequency with which the lower nodes match when the upper nodes match, and stores it in the analysis means-specific characteristic storage means. The characteristic learning means may receive, from the integration processing method selection means, information indicating the analysis means that output the input graph, calculate the statistical frequency based on that information, and sequentially update the information stored in the analysis means-specific characteristic storage means.
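The sequential update performed by the characteristic learning means can be pictured with running counters. This is a minimal sketch under assumptions of this example: statistics are keyed by the set of analysis means IDs plus a class or property label, and one observation is recorded each time the upper nodes match. The class name `CharacteristicStore` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Key = Tuple[FrozenSet[str], str]


class CharacteristicStore:
    """Per-analysis-means statistics, updated one observation at a time."""

    def __init__(self) -> None:
        # Counters per key: [upper-node matches, lower matches, lower contradictions].
        self._counts: Dict[Key, List[int]] = defaultdict(lambda: [0, 0, 0])

    def record(self, analyzer_ids: Iterable[str], label: str,
               lower_present: bool, lower_matched: bool) -> None:
        """Record one case in which the upper nodes matched."""
        counts = self._counts[(frozenset(analyzer_ids), label)]
        counts[0] += 1
        if lower_present:
            if lower_matched:
                counts[1] += 1  # lower node exists and matches
            else:
                counts[2] += 1  # lower node exists but does not match

    def frequencies(self, analyzer_ids: Iterable[str], label: str) -> Tuple[float, float]:
        """Return (coincidence frequency, contradiction frequency) for the key."""
        total, matched, contradicted = self._counts.get(
            (frozenset(analyzer_ids), label), [0, 0, 0]
        )
        if total == 0:
            return (0.0, 0.0)
        return (matched / total, contradicted / total)


store = CharacteristicStore()
engines = ["customer_search", "employee_search"]  # hypothetical analysis means IDs
for i in range(10):
    store.record(engines, "Person", lower_present=True, lower_matched=(i < 7))
match_f, contra_f = store.frequencies(engines, "Person")
```

Storing counts rather than ratios lets each new graph update the statistics without revisiting earlier inputs.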
- the program is stored in a storage device or recorded on a computer-readable recording medium.
- The recording medium is a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- the present invention can be applied to applications such as a data integration processing device and a program for realizing a data integration processing device in a computer for improving the throughput of integration processing in a processing platform that integrates a plurality of graph data.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
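Next, a first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a functional block diagram showing an example of the overall configuration of the first embodiment of the data integration processing system according to the present invention. As shown in FIG. 1, the data integration processing system includes a data integration processing device 1 and an analysis processing device 2. This embodiment describes an example in which the data integration processing device 1 and the analysis processing device 2 are configured as separate devices, but the configuration is not limited to this, and they may be configured as a single device.
Next, a second embodiment of the present invention will be described. FIG. 9 is a functional block diagram showing an example of the overall configuration of the second embodiment of the data integration processing system.
Next, a third embodiment of the present invention will be described. Here, the analysis processing device 2 is assumed to include the analysis means listed below.
The integration processing method selection means may be configured to select, for each node in the input graph, the first integration processing method when the value based on the frequency with which the lower nodes match when the upper nodes match is higher than a predetermined value, and to select the second integration processing method when the value based on that frequency is lower than the predetermined value.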
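2 Analysis processing device
4 Integration processing method selection means
5 Analysis means-specific characteristic storage means
6 Integration processing execution means
7 Characteristic learning means
8 Integration means selection rule storage means
9 First integration processing method execution means
10 Second integration processing method execution means
11 Third integration processing method execution means
12 Graph dividing means
510 Coincidence duplication frequency table
520 Contradiction duplication frequency table
530 Class/property appearance frequency table
810 Selection rule table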
Claims (10)
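- A data integration processing device that integrates a plurality of graphs, comprising:
integration processing method selection means for selecting an integration processing method used for integrating an input graph group; and
integration processing execution means that has a plurality of integration processing methods and integrates a plurality of graphs by executing integration processing according to the integration processing method selected by the integration processing method selection means from among the plurality of integration processing methods,
wherein the integration processing method selection means selects, for each node in the input graph, the integration processing method used for integrating the lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- The data integration processing device according to claim 1, wherein the integration processing method selection means selects, for each node in the input graph, the integration processing method used for integrating the lower nodes based on a coincidence duplication frequency, which is the frequency with which the lower nodes match when the upper nodes match, and a contradiction duplication frequency, which is the frequency with which the lower nodes exist but do not match.
- The data integration processing device according to claim 1 or claim 2, wherein the integration processing execution means can execute integration processing according to a second integration processing method having a high integration function and a low processing speed or a first integration processing method having a low integration function and a high processing speed, and
the integration processing method selection means selects, for each node in the input graph, the first integration processing method when the frequency with which the lower nodes match when the upper nodes match is high, and selects the second integration processing method when the frequency is low.
- The data integration processing device according to claim 1 or claim 2, wherein the integration processing execution means can execute integration processing according to a first integration processing method that has a predetermined integration function and can perform integration processing at a predetermined processing speed, or a second integration processing method that has a higher integration function but a lower processing speed than the first integration processing method, and
the integration processing method selection means selects, for each node in the input graph, the first integration processing method when the value based on the frequency with which the lower nodes match when the upper nodes match is higher than a predetermined value, and selects the second integration processing method when the value based on the frequency is lower than the predetermined value.
- The data integration processing device according to any one of claims 1 to 4, further comprising graph dividing means for dividing an input graph into a plurality of subgraphs, wherein
the graph dividing means divides the graph into subgraphs based on, for each node in the input graph, the frequency with which the lower nodes match when the upper nodes match,
the integration processing method selection means selects an integration processing method for each subgraph divided by the graph dividing means, and
the integration processing execution means executes integration processing for each subgraph divided by the graph dividing means.
- The data integration processing device according to any one of claims 1 to 5, further comprising analysis means-specific characteristic storage means that stores, for each node in previously input graphs, a statistical frequency with which the lower nodes match when the upper nodes match, in association with the analysis means that output the graph, wherein
the integration processing method selection means extracts the statistical frequency from the analysis means-specific characteristic storage means based on the analysis means that output the input graph, and selects the integration processing method for integrating the graph based on the extracted statistical frequency.
- The data integration processing device according to claim 6, further comprising characteristic learning means that calculates, for each node in previously input graphs, the statistical frequency with which the lower nodes match when the upper nodes match, and stores it in the analysis means-specific characteristic storage means, wherein
the characteristic learning means receives, from the integration processing method selection means, information indicating the analysis means that output the input graph, calculates the statistical frequency based on the received information, and sequentially updates the information stored in the analysis means-specific characteristic storage means.
- A data integration processing system that integrates a plurality of graphs, including:
integration processing method selection means for selecting an integration processing method used for integrating an input graph group; and
integration processing execution means that has a plurality of integration processing methods and integrates a plurality of graphs by executing integration processing according to the integration processing method selected by the integration processing method selection means from among the plurality of integration processing methods,
wherein the integration processing method selection means selects, for each node in the input graph, the integration processing method used for integrating the lower nodes according to the frequency with which the lower nodes match when the upper nodes match.
- A data integration processing method for integrating a plurality of graphs, comprising:
selecting an integration processing method used for integrating an input graph group; and
integrating a plurality of graphs by executing integration processing according to the selected integration processing method from among a plurality of integration processing methods,
wherein, when the integration processing method is selected, the integration processing method used for integrating the lower nodes is selected, for each node in the input graph, according to the frequency with which the lower nodes match when the upper nodes match.
- A data integration processing program for integrating a plurality of graphs, the program causing a computer to execute:
integration processing method selection processing of selecting an integration processing method used for integrating an input graph group; and
integration processing execution processing of integrating a plurality of graphs by executing integration processing according to the selected integration processing method from among a plurality of integration processing methods,
wherein, in the integration processing method selection processing, the computer is caused to execute processing of selecting, for each node in the input graph, the integration processing method used for integrating the lower nodes according to the frequency with which the lower nodes match when the upper nodes match.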
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/805,398 US8972356B2 (en) | 2010-09-13 | 2011-09-13 | Device, system, method and program for data integration process |
| JP2012533861A JPWO2012035754A1 (ja) | 2010-09-13 | 2011-09-13 | データ統合処理装置、システム、方法及びプログラム |
| CN2011800361432A CN103026358A (zh) | 2010-09-13 | 2011-09-13 | 数据整合处理设备、系统、方法和程序 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2010204210 | 2010-09-13 | ||
| JP2010-204210 | 2010-09-13 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2012035754A1 (ja) | 2012-03-22 |
Family
ID=45831241
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2011/005129 Ceased WO2012035754A1 (ja) | 2010-09-13 | 2011-09-13 | データ統合処理装置、システム、方法及びプログラム |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8972356B2 (ja) |
| JP (1) | JPWO2012035754A1 (ja) |
| CN (1) | CN103026358A (ja) |
| WO (1) | WO2012035754A1 (ja) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2014126883A (ja) * | 2012-12-25 | 2014-07-07 | Nippon Telegr & Teleph Corp <Ntt> | 部分的木構造に応じた適応型再構成装置及び方法及びプログラム |
| JP2024000384A (ja) * | 2022-06-20 | 2024-01-05 | 株式会社日立製作所 | グラフ統合システム及び方法 |
| WO2024134703A1 (ja) * | 2022-12-19 | 2024-06-27 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び、記録媒体 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001014166A (ja) * | 1999-06-29 | 2001-01-19 | Fujitsu Ltd | オントロジー対応付け情報生成装置 |
| JP2005352874A (ja) * | 2004-06-11 | 2005-12-22 | Nippon Telegr & Teleph Corp <Ntt> | 情報検索システム、情報検索装置、情報検索支援装置および情報検索プログラムおよび情報検索支援プログラム |
| JP2008084114A (ja) * | 2006-09-28 | 2008-04-10 | Toshiba Corp | オントロジー統合支援装置、オントロジー統合支援方法及びオントロジー統合支援プログラム |
| WO2008146807A1 (ja) * | 2007-05-31 | 2008-12-04 | Nec Corporation | オントロジ処理装置、オントロジ処理方法、及びオントロジ処理プログラム |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH05204647A (ja) | 1992-01-13 | 1993-08-13 | Nec Corp | 有向グラフの単一化装置 |
| US8150850B2 (en) * | 2008-01-07 | 2012-04-03 | Akiban Technologies, Inc. | Multiple dimensioned database architecture |
| JP5340751B2 (ja) * | 2008-04-22 | 2013-11-13 | 株式会社エヌ・ティ・ティ・ドコモ | 文書処理装置および文書処理方法 |
| US8065302B2 (en) * | 2008-08-27 | 2011-11-22 | Satyam Computer Services Limited | System and method for annotation aggregation |
-
2011
- 2011-09-13 WO PCT/JP2011/005129 patent/WO2012035754A1/ja not_active Ceased
- 2011-09-13 CN CN2011800361432A patent/CN103026358A/zh active Pending
- 2011-09-13 JP JP2012533861A patent/JPWO2012035754A1/ja active Pending
- 2011-09-13 US US13/805,398 patent/US8972356B2/en not_active Expired - Fee Related
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2014126883A (ja) * | 2012-12-25 | 2014-07-07 | Nippon Telegr & Teleph Corp <Ntt> | 部分的木構造に応じた適応型再構成装置及び方法及びプログラム |
| JP2024000384A (ja) * | 2022-06-20 | 2024-01-05 | 株式会社日立製作所 | グラフ統合システム及び方法 |
| JP7807992B2 (ja) | 2022-06-20 | 2026-01-28 | 株式会社日立製作所 | グラフ統合システム及び方法 |
| WO2024134703A1 (ja) * | 2022-12-19 | 2024-06-27 | 日本電気株式会社 | 情報処理装置、情報処理方法、及び、記録媒体 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103026358A (zh) | 2013-04-03 |
| JPWO2012035754A1 (ja) | 2014-01-20 |
| US8972356B2 (en) | 2015-03-03 |
| US20130091095A1 (en) | 2013-04-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR102178295B1 (ko) | 결정 모델 구성 방법 및 장치, 컴퓨터 장치 및 저장 매체 | |
| US11423082B2 (en) | Methods and apparatus for subgraph matching in big data analysis | |
| JP5995409B2 (ja) | コンピュータ解析のためにテキスト文書を表現するためのグラフィカル・モデル | |
| US9563974B2 (en) | Aggregating graph structures | |
| JP4429236B2 (ja) | 分類ルール作成支援方法 | |
| JP2013152656A (ja) | 説明変数の決定のための情報処理装置、情報処理方法及びプログラム | |
| CN101751333A (zh) | 用于支援程序解析的方法、及其计算机程序以及计算机系统 | |
| Guyet et al. | Incremental mining of frequent serial episodes considering multiple occurrences | |
| US9674083B2 (en) | Path calculation order deciding method, program and calculating apparatus | |
| US20130282649A1 (en) | Deterministic finite automation minimization | |
| JPWO2018235841A1 (ja) | グラフ構造解析装置、グラフ構造解析方法、及びプログラム | |
| CN113641654B (zh) | 一种基于实时事件的营销处置规则引擎方法 | |
| WO2012035754A1 (ja) | データ統合処理装置、システム、方法及びプログラム | |
| JP5964781B2 (ja) | 検索装置、検索方法および検索プログラム | |
| JP5790820B2 (ja) | 不整合検出装置、プログラム及び方法、修正支援装置、プログラム及び方法 | |
| US10467530B2 (en) | Searching text via function learning | |
| JPWO2011016281A1 (ja) | ベイジアンネットワーク構造学習のための情報処理装置及びプログラム | |
| JP5206268B2 (ja) | ルール作成プログラム、ルール作成方法及びルール作成装置 | |
| JP2013003611A (ja) | 設計検証方法及びプログラム | |
| CN111046160B (zh) | 交互方法、交互装置以及计算机系统 | |
| JP6135466B2 (ja) | テストケース抽出プログラム、方法及び装置 | |
| CN117391643A (zh) | 一种基于知识图谱的医保单据审核方法及系统 | |
| US20180144043A1 (en) | Creating device, creating method, and non-transitory computer-readable recording medium | |
| US10430104B2 (en) | Distributing data by successive spatial partitionings | |
| KR20150077669A (ko) | 맵리듀스 방식을 이용한 데이터 분석 방법 및 시스템 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 201180036143.2; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11824768; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2012533861; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 13805398; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 11824768; Country of ref document: EP; Kind code of ref document: A1 |