WO2014039884A1 - Modèle de données de graphe basé sur le temps - Google Patents
Modèle de données de graphe basé sur le temps Download PDFInfo
- Publication number
- WO2014039884A1 WO2014039884A1 PCT/US2013/058599 US2013058599W WO2014039884A1 WO 2014039884 A1 WO2014039884 A1 WO 2014039884A1 US 2013058599 W US2013058599 W US 2013058599W WO 2014039884 A1 WO2014039884 A1 WO 2014039884A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- edge
- time
- information
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2477—Temporal data queries
Definitions
- the present disclosure relates to data storage and retrieval, and, in one particular example, to time-based and relationship-based data storage and retrieval.
- Relational databases often store data using related tables and use primary keys and foreign keys to capture associations. These relational databases provide a sliding window for highly normalized data. Normalization may be used to organize the data into a set of related tables to capture associations. As part of normalization, redundant data may be removed from the database and tables may be optimized to store only related data. As tables in the database grow, the sliding window technique is used to remove the oldest records from tables in the database. These removed records are either archived or deleted.
- the system receives a request comprising time-based information. Time-based information of the request is compared with a first time interval, which is associated with at least one node of a graph. The node of the graph is matched based on the time-based information being at least partially within the first time interval. The system returns a result comprising an indicator of the node of the graph, wherein the node of the graph is associated with an entity, and the node of the graph includes an attribute of the entity.
- FIG. 1 illustrates capturing a change to the name attribute of an entity.
- FIG. 2 illustrates capturing a change to the name attribute of an entity.
- FIG. 3 illustrates a first exemplary embodiment of the relationship among various generic nodes.
- FIG. 4 illustrates a second exemplary embodiment of the relationship among various generic nodes.
- FIG. 5 illustrates an exemplary process for retrieving a portion of a graph based on a time point or a time interval.
- FIG. 6 depicts an exemplary computing system.
- a graph database may be structured to provide efficient storage and retrieval of data.
- the graph database includes nodes and edges.
- Each node in the graph database may be assigned an object identity.
- the object identity may be unique to the node.
- the object identity need not be unique to the node.
- Edges represent a relationship between the nodes that they connect. Edges may include information about a relationship type, a direction, the type of nodes being connected, the number of participants between each source and destination, properties and attributes of the edge type, and the like.
- the direction information of the edge is based on whether the edge is directed or undirected. For example, a directed edge has a direction of outgoing or incoming, whereas an undirected edge may not have a direction.
- a node may be associated with an entity.
- an entity may be a person, a user, a group, a content, a computing resource, an activity, an event, or the like.
- a group may be, for example, an organizational unit such as a building, a department, or a company.
- a content may be, for example, a document, an email, an image, or the like.
- the node may represent the entity in the database.
- an entity node may be associated with an employee by the name of "Lisa John.” This node may also include information about the employee, such as her legal name, "Lisa John.”
- a node may also be associated with multiple versions of an entity.
- a change to the employee's legal name from "Lisa John” to "Lisa Smith” may be captured in the information stored in the entity node 100.
- the original state information 102 of the employee may remain stored in the entity node 100 and a complete updated state information 104 of the employee may be stored in the node.
- Each state information for the entity may be associated with a time stamp to identify when the change was made.
- the updated state information 104 may include a differential between the original state information 102 and the updated state.
- a change to an employee's legal name from "Lisa John” to "Lisa Smith” may be captured by adding a node that is connected to generic entity node 200.
- State node 202 may include the original state information of the employee. The original state information may be associated with a time stamp to identify when the information was added to node 202.
- State node 204 may include a complete updated state information of the employee. The complete updated state information may be associated with a time stamp to identify when the information was added to node 204.
- the state nodes 202, 204 may also each have a time interval associated with them that indicates the time during which the state applies.
- the generic entity node 200 may point to the various possible versions of the state information of the employee.
- state node 204 may include a differential between the original state information stored in state node 202 and the updated state.
- FIG. 3 illustrates an exemplary embodiment of the relationship among various generic nodes.
- Edges may exist among generic nodes, and not among versions of state nodes.
- the generic entity nodes 300, 302, and 304 may include original state information and updated state information.
- Edge 306 may connect generic entity node 300 and generic entity node 302.
- Edge 306 may include information indicating that the entity associated with generic entity node 300 works for the entity associated with generic entity node 302.
- edge 308 may connect generic entity node 300 and generic entity node 304.
- Edge 308 may include information indicating that the entity associated with generic entity node 300 works for the entity associated with generic entity node 304. This information may indicate, for example, that the employee Lisa Smith worked for two different managers during two different intervals. More specifically, that Lisa Smith worked for Larry Brown for the period 1/6/2011-6/6/2012 and for John Black for the period 6/7/2012-9/1/2012.
- FIG. 4 illustrates an exemplary embodiment of the relationship among various generic nodes.
- Generic entity node 400 includes the original information about an entity, as well as the changes to the information about the entity.
- the entity may be a person by the name of John.
- Generic entity node 400 may include an ID attribute with a value of 2. The ID may be an object identity. This ID attribute value may uniquely identify generic entity node 400. Alternatively, the ID attribute value may not be unique, but may be used to identify generic entity node 400.
- Generic entity node 400 may include two versions of the state information for the entity represented by the node, storing changes about the entity.
- the 2.1 version of the state information may indicate that the entity's name is John, the entity is associated with the location of Boston, and that the 2.1 version information is effective during the time duration between the dates of 3/1-5/31.
- the Boston location information may indicate, for example, the city in which John was living, the location of the work office John was assigned to, and the like, for the duration from March 1 through May 31 of the current year.
- the effective time duration may include the day, the month, the year, the hour, minutes, seconds, milliseconds, and the like.
- Other measurements of time or duration may also be used, such as academic semesters, fiscal quarters, and the like.
- the 2.2 version of the state information may indicate that the entity's name is John, the entity is associated with the location of Palo Alto, and that the 2.2 version information is effective starting 6/1 and continuing until the current day.
- the Palo Alto location information may indicate, for example, the city in which John was living, the location of the work office John was assigned to, and the like, for the duration starting on June 1 through the current date.
- two versions of state information are stored in a node for an entity: a "Boston" version and a "Palo Alto" version.
- Each version of the state information is associated with the ID attribute value of the node, and each version of the state information includes a time duration during which the version information is effective, or up-to-date.
- Generic entity node 402 may represent an entity with the name "Mobile Telephone Application.”
- the entity may be, for example, a software development project.
- Generic entity node 402 may include an JD attribute value of 1.
- Version 1.1 of the state information for generic entity node 402 may include a timestamp attribute value of 1/1, indicating that the state information was entered or stored on January 1 of the current year, and a name attribute value of "Mobile Telephone Application.”
- Edge 406 may connect generic entity node 400 and generic entity node 402.
- Edge 406 may be a directed edge pointing from generic entity node 400 to generic entity node 402.
- Edge 406 may include information indicating that the entity associated with generic entity node 400 is a participant in the project associated with generic entity node 402. This information may be represented by the type of the edge, in this case
- the edge may also include a label attribute value.
- the label attribute value may indicate, for example, the type of participation.
- the label attribute value is "designer,” indicating that the entity associated with generic entity node 400 was a software development designer with respect to the entity associated with generic entity node 402. In other words, John is a software development designer for the Mobile Telephone Application software development project.
- edge 406 may also have a time attribute value associated with it. The time attribute value may indicate the duration during which the information associated with the edge is applicable.
- Edge 406 includes time attribute value information of 3/1-8/31. This may indicate that John was a software development designer for the Mobile Telephone Application software development project from March 1 to August 31 of the current year.
- Edge 408 may be a directed edge pointing from generic entity node 400 to generic entity node 402.
- Edge 408 may include information indicating that the entity associated with generic entity node 400 is a participant in the project associated with generic entity node 402. This information may be represented by the type of the edge, in this case "participate-in.”
- the edge may also include a label attribute value.
- the label attribute value may indicate, for example, the type of participation. For edge 408, the label attribute value is "manager,” indicating that the entity associated with generic entity node 400 was a project manager with respect to the entity associated with generic entity node 402.
- edge 408 may also have a time attribute value associated with it.
- the time attribute value may indicate the duration during which the information associated with the edge is applicable.
- Edge 408 includes time attribute value information of 9/1-current.
- Generic entity node 400, edge 408, and generic entity node 402 indicate that John began his role as the project manager for the Mobile Telephone Application software development project starting on September 1 of the current year, and that he continues to be a participant as the project manager.
- Generic entity node 404 may represent an entity with the name "Sales Call.” The entity may be, for example, an activity or event that occurred or is scheduled to occur. Generic entity node 404 may include an ID attribute value of 3. This ID attribute value may uniquely identify generic entity node 404 from other nodes in the graph.
- Version 3.1 of the state information for generic entity node 404 may include a timestamp attribute value of 10/1. This timestamp attribute value may indicate that the "Sales Call” event occurred on October 1 of the current year. Version 3.1 of the state information may also include a name attribute value of "Sales Call.”
- Edge 410 may connect generic entity node 400 and generic entity node 404.
- Edge 410 may be a directed edge pointing from generic entity node 400 to generic entity node 404.
- Edge 410 may include information indicating that the entity associated with generic entity node 400 is a participant in the event associated with generic entity node 404. This information may be represented by the type of the edge, in this case
- the edge may also include a label attribute value.
- the label attribute value may indicate, for example, the type of participation.
- the label attribute value is "participant,” indicating that the entity associated with generic entity node 400 was a participant with respect to the entity associated with generic entity node 404. In other words, John was on a sales call that took place on 10/1.
- Edge 410 may also have a time attribute value associated with it. The time attribute value may indicate the duration during which the information associated with the edge is applicable. Edge 410 includes time attribute value information of 10/1 -current.
- FIG. S illustrates an exemplary process for retrieving a portion of a graph based on a time point or a time interval. Retrieving a portion of a graph may be useful to determine the state of the graph, an entity, or an entity and related entities, at a certain point in time or during a certain time interval.
- the system may receive a time-based request.
- the request may include information requesting the state of an entity or a subset of the graph at a particular point-in-time.
- a particular point-in-time may be a specific date, or a specific date and time.
- the request may include information requesting the state of an entity or a subset of the graph during a particular duration of time.
- a particular duration of time may be a range of dates, a range of times, or a range of dates and times.
- the system may access all or a portion of the graph.
- the system may exclude all or some edges from an intermediate result based on the time- based request. For example, edges with an associated time interval that does not intersect the point-in-time from the time-based request may be excluded from the intermediate result. For another example, edges with an associated time interval that does not match the duration of time from the time-based request may be excluded from the intermediate result.
- the system may exclude all or some versions stored in nodes from the intermediate result based on the time-based request. For example, versions of a node that do not intersect a point-in-time from the time-based request or do not match the duration of time from the time-based request may be excluded from the intermediate result.
- the system may exclude all or some nodes that do not include at least one version of a node that has not been excluded. Thus, any node that has had all versions excluded may also be excluded.
- the system may return a result based on the time-based request. The result may be, for example, a subset of the graph or a characteristic of a node or edge.
- the system may include edges, nodes, and version that do match the time-based criteria of the request. These edges, nodes, and versions may be included in an intermediate result.
- the system may include edges that meet the time- based criteria, include node versions that meet the time-based criteria, and include nodes that store at least one node version that has been included for meeting the time-based criteria.
- the system may traverse the graph to determine whether edges and nodes meet the time-based criteria of the request.
- This traversal of the graph may be a loose traversal or a strict traversal.
- a loose traversal may determine a match when the edges and nodes on the path being traversed at least partially meet the time-based criteria of the request. This may not require that there is a single point in time where each of the matched nodes and edges is valid. For example, if the time-based criteria of the request includes a duration from 1/1/2010 to 1/1/2011, the system may match both an edge that has an interval of 3/1/2010 to 4/1/2010 and an edge that has an interval of 7/1/2010 to 8/1/2010, even though they two edges do not overlap at all.
- a strict traversal may determine a match when the edges and nodes on the path being traversed share a single time when they are valid. To determine whether a path is strict, the system may intersect the time window of all the nodes and edges on the path and determine if the results are non-empty.
- Some information in the graph may be classified as data that changes slowly, rather than changing on a time-based, regular schedule.
- the data may never change, change infrequently, or be less likely to change than the data in the graph on average, or change once or less per a specified time period (such as once or less per year, or month, or day).
- birthdates, a work location, place of birth, social security number, or item color may be slow changing.
- techniques related to slow changing dimensions may be used. By identifying data in the graph as slow changing, searching and matching may be performed more efficiently.
- Slow changing data fields in the graph may be classified as one of three types.
- Type 0 data may be data fields that are not changed once the value of the data field is set or stored.
- Type 1 data may be data fields where more recent data overwrites the previous data, or more recent data takes precedence over less recent data. Using this information, queries may be constrained based on slow changing dimension types.
- Type 2 data may be data where a time window is consistent with the traverse path.
- One example of a query related to type 0 data may be querying to find all employees who first worked for the company in California. The first version of the data may be accessed to return a result to this query.
- One example of a query related to type 1 data may be querying to find the current email addresses of all employees who worked on a particular project. In this example, the most recent (and therefore the most likely to be valid) email information should be retrieved.
- One example of a query related to type 2 data may be querying to find all employees who worked on a project in California at a certain point in time.
- a basic set of queries to return edges, nodes, and attributes may be
- the basic set of queries may include:
- An extended set of queries to return edges, nodes, and attributes may also be implemented. Again, these examples of queries need not be implemented verbatim as a query language, but may be abstract forms for queries instead.
- a query is a traversal that begins at a particular node and ends at another particular node.
- FIG. 6 depicts an exemplary computing system 600 configured to perform any one of the above-described processes.
- computing system 600 may include, for example, a processor, memory, storage, and input/output devices (e.g., monitor, keyboard, disk drive, Internet connection, etc.).
- computing system 600 may include circuitry or other specialized hardware for carrying out some or all aspects of the processes.
- computing system 600 may be configured as a system that includes one or more units, each of which is configured to carry out some aspects of the processes either in software, hardware, or some
- FIG. 6 depicts computing system 600 with a number of components that may be used to perform the above-described processes.
- the main system 602 includes a motherboard 604 having an input output (“I/O") section 606, one or more central processing units (“CPU”) 608, and a memory section 610, which may have a flash memory card 612 related to it.
- the I/O section 606 is connected to a display 624, a keyboard 614, a disk storage unit 616, and a media drive unit 618.
- the media drive unit 618 can read/write a computer-readable medium 620, which can contain programs 622 and/or data.
- a non-transitory computer-readable medium can be used to store (e.g., tangibly embody) one or more computer programs for performing any one of the above-described processes by means of a computer.
- the computer program may be written, for example, in a general-purpose programming language (e.g., Pascal, C, C++, Java) or some specialized application-specific language.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201261698521P | 2012-09-07 | 2012-09-07 | |
| US61/698,521 | 2012-09-07 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014039884A1 true WO2014039884A1 (fr) | 2014-03-13 |
Family
ID=50237649
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2013/058599 Ceased WO2014039884A1 (fr) | 2012-09-07 | 2013-09-06 | Modèle de données de graphe basé sur le temps |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2014039884A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9092548B2 (en) | 2013-03-15 | 2015-07-28 | Magnet Systems, Inc. | Time-based graph data model |
| CN112612905A (zh) * | 2020-12-28 | 2021-04-06 | 北京明略软件系统有限公司 | 基于Elasticsearch的数据处理方法、系统、计算机及可读存储介质 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070209074A1 (en) * | 2006-03-04 | 2007-09-06 | Coffman Thayne R | Intelligent intrusion detection system utilizing enhanced graph-matching of network activity with context data |
| US20080216094A1 (en) * | 2001-04-30 | 2008-09-04 | The Commonwealth Of Australia | Event handling system |
| US20080294648A1 (en) * | 2004-11-01 | 2008-11-27 | Sybase, Inc. | Distributed Database System Providing Data and Space Management Methodology |
| US20100174692A1 (en) * | 2007-03-15 | 2010-07-08 | Scott Meyer | Graph store |
| US20100211924A1 (en) * | 2005-07-05 | 2010-08-19 | Microsoft Corporation | Discovering and exploiting relationships in software repositories |
| US20120017207A1 (en) * | 2009-09-30 | 2012-01-19 | Amitt Mahajan | Apparatuses, Methods and Systems for a Social Networking Application Updater |
-
2013
- 2013-09-06 WO PCT/US2013/058599 patent/WO2014039884A1/fr not_active Ceased
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080216094A1 (en) * | 2001-04-30 | 2008-09-04 | The Commonwealth Of Australia | Event handling system |
| US20080294648A1 (en) * | 2004-11-01 | 2008-11-27 | Sybase, Inc. | Distributed Database System Providing Data and Space Management Methodology |
| US20100211924A1 (en) * | 2005-07-05 | 2010-08-19 | Microsoft Corporation | Discovering and exploiting relationships in software repositories |
| US20070209074A1 (en) * | 2006-03-04 | 2007-09-06 | Coffman Thayne R | Intelligent intrusion detection system utilizing enhanced graph-matching of network activity with context data |
| US20100174692A1 (en) * | 2007-03-15 | 2010-07-08 | Scott Meyer | Graph store |
| US20120017207A1 (en) * | 2009-09-30 | 2012-01-19 | Amitt Mahajan | Apparatuses, Methods and Systems for a Social Networking Application Updater |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9092548B2 (en) | 2013-03-15 | 2015-07-28 | Magnet Systems, Inc. | Time-based graph data model |
| CN112612905A (zh) * | 2020-12-28 | 2021-04-06 | 北京明略软件系统有限公司 | 基于Elasticsearch的数据处理方法、系统、计算机及可读存储介质 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9092548B2 (en) | Time-based graph data model | |
| US11281626B2 (en) | Systems and methods for management of data platforms | |
| US8825711B2 (en) | Managing cross-correlated data | |
| US10025904B2 (en) | Systems and methods for managing a master patient index including duplicate record detection | |
| US8725711B2 (en) | Systems and methods for information categorization | |
| US10572461B2 (en) | Systems and methods for managing a master patient index including duplicate record detection | |
| US8150888B2 (en) | Automatic elimination of functional dependencies between columns | |
| US9665632B2 (en) | Managing activities over time in an activity graph | |
| US20210110278A1 (en) | Enterprise knowledge graph | |
| US20160179863A1 (en) | Generating Secured Recommendations for Business Intelligence Enterprise Systems | |
| US10417265B2 (en) | High performance parallel indexing for forensics and electronic discovery | |
| US12423285B2 (en) | Systems and methods for determining dataset intersection | |
| US20140071135A1 (en) | Managing activities over time in an activity graph | |
| CN114090634A (zh) | 一种基于数据仓库的酒店数据管理方法及装置 | |
| US10614091B1 (en) | Warehouse based reporting and operational reporting integration | |
| US20240241986A1 (en) | Method and System for Processing File Metadata | |
| JP6831118B2 (ja) | 就労マッチング装置、就労マッチングプログラム及び就労マッチング方法 | |
| EP3652660B1 (fr) | Systèmes et procédés d'assemblage d'ensembles de données | |
| WO2014039884A1 (fr) | Modèle de données de graphe basé sur le temps | |
| CN118708808A (zh) | 基于大模型的推荐方法、装置、设备以及存储介质 | |
| van Gils | Reference and Master Data Management | |
| JP2013171495A (ja) | データ管理装置、データ管理方法及びデータ管理プログラム | |
| CN105653593A (zh) | 一种基于社交好友的知识产权数据管理系统和共享方法 | |
| Galloway et al. | Digging in the details: A case study in network data mining | |
| Christen | Recent Developments and Research Challenges in Data Linkage |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13835757 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 13835757 Country of ref document: EP Kind code of ref document: A1 |