[go: up one dir, main page]

CN111814006A - Analysis method, device and computer equipment for graph network structure - Google Patents

Analysis method, device and computer equipment for graph network structure Download PDF

Info

Publication number
CN111814006A
CN111814006A CN202010733106.8A CN202010733106A CN111814006A CN 111814006 A CN111814006 A CN 111814006A CN 202010733106 A CN202010733106 A CN 202010733106A CN 111814006 A CN111814006 A CN 111814006A
Authority
CN
China
Prior art keywords
graph
network structure
vertex
edge
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010733106.8A
Other languages
Chinese (zh)
Inventor
刘利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010733106.8A priority Critical patent/CN111814006A/en
Publication of CN111814006A publication Critical patent/CN111814006A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • G06Q10/40

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of artificial intelligence, and discloses an analysis method of a graph network structure, wherein the graph network structure to be analyzed is obtained according to an execution script of a graph analysis task, the graph network structure comprises a vertex set and an edge set, each vertex in the vertex set carries a first type label and a first attribute corresponding to the vertex respectively, each edge in the edge set is formed by connecting two vertexes with an incidence relation, and each edge carries a second type label and a second attribute corresponding to the edge respectively; respectively mapping each vertex and each edge in the graph network structure to a specified data set; acquiring a graph operator corresponding to the graph analysis task according to the execution script; and analyzing and calculating the designated data set corresponding to the graph network structure by utilizing the graph operator to obtain an analysis result. And establishing a graph network structure with extended attributes, converting graph operation into operation of a data set, realizing analysis of a single graph or a graph set, and providing a new function for graph analysis.

Description

图网络结构的分析方法、装置和计算机设备Analysis method, device and computer equipment for graph network structure

技术领域technical field

本申请涉及到人工智能领域,特别是涉及到图网络结构的分析方法、装置和计算机设备。The present application relates to the field of artificial intelligence, in particular to the analysis method, apparatus and computer equipment of graph network structure.

背景技术Background technique

通过图可以更直观地表达事物以及事物之间的联系,但计算机识别图时需要根据图进行建模,以识别图中信息并计算图中各事物之间的关系。图分析在研究和工业中扮演着重要的角色。比如图能够表示社交网络中的社区关系。但关于图的复杂分析问题需要结合多种分析操作,例如对网站进行排名或分析社交网络,图数据模型是执行图算法的先决条件。目前大规模图计算是基于某个数据流框架上的库的实现,如Apache Spark上的GraphX或Apache Flink上的Gelly,结合底层框架提供的通用数据转换运算符执行某个图计算,但不支持数据集,无法解决复杂的分析任务。Graphs can express things and the connections between things more intuitively, but when a computer recognizes a graph, it needs to be modeled according to the graph to identify the information in the graph and calculate the relationship between the things in the graph. Graph analytics plays an important role in research and industry. For example, graphs can represent community relationships in social networks. But complex analytical problems about graphs require a combination of multiple analytical operations, such as ranking websites or analyzing social networks, and a graph data model is a prerequisite for executing graph algorithms. At present, large-scale graph computing is based on the implementation of a library on a data flow framework, such as GraphX on Apache Spark or Gelly on Apache Flink, combined with the general data conversion operators provided by the underlying framework to perform a certain graph computing, but does not support Data sets that cannot solve complex analytical tasks.

发明内容SUMMARY OF THE INVENTION

本申请的主要目的为提供图网络结构的分析方法,旨在解决现有图计算不支持数据集,无法解决复杂的分析任务的技术问题。The main purpose of this application is to provide an analysis method for a graph network structure, aiming to solve the technical problem that the existing graph computing does not support data sets and cannot solve complex analysis tasks.

本申请提出一种图网络结构的分析方法,包括:This application proposes a method for analyzing a graph network structure, including:

根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;Obtain the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the first type label and the first type label corresponding to the vertex respectively. attribute, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges carries the second type label and the second attribute corresponding to the edge respectively;

将所述图网络结构中的各顶点以及各条边分别映射指定数据集;Map each vertex and each edge in the graph network structure to the specified data set respectively;

根据所述执行脚本,获取与所述图分析任务相对应的图运算符;obtaining a graph operator corresponding to the graph analysis task according to the execution script;

利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The specified data set corresponding to the graph network structure is analyzed and calculated by using the graph operator, and an analysis result is obtained.

优选地,所述根据图分析任务的执行脚本,获取待分析的图网络结构的步骤,包括:Preferably, the step of obtaining the graph network structure to be analyzed according to the execution script of the graph analysis task includes:

获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;obtaining the graph element of the graph network structure to be analyzed carried in the execution script of the graph analysis task, wherein the graph element includes a graph header;

根据所述图头从指定数据库中抽取所述图头对应的图头数据集;Extract the image header data set corresponding to the image header from the specified database according to the image header;

判断所述图头数据集是否包括多个对象;judging whether the image header data set includes multiple objects;

若否,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;If not, it is determined that the graph network structure to be analyzed is a logical graph whose graph header is a single object;

从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;Acquiring vertex data and edge data associated with the logic graph from the data set, respectively forming a vertex set and an edge set;

通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。The logic graph is represented by the vertex set and the edge set, and the graph network structure to be analyzed is obtained.

优选地,指定数据集包括Flink数据集,所述根据所述执行脚本,获取与所述图分析任务相对应的图运算符的步骤之后,包括:Preferably, the specified data set includes a Flink data set, and after the step of acquiring the graph operator corresponding to the graph analysis task according to the execution script, the step includes:

获取各所述图运算符的执行顺序;Obtain the execution order of each of the graph operators;

按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;According to the execution order, each of the graph operators is linked into operation logic through the Java programming language;

将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。The Flink operation program is formed by associating the operation logic with the trigger of the Flink trigger program.

优选地,所述图运算符包括排除运算符,利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:Preferably, the graph operator includes an exclusion operator, and the graph operator is used to analyze and calculate the specified data set corresponding to the graph network structure, and the steps of obtaining the analysis result include:

获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;obtaining a first vertex set and a first edge set corresponding to the first logical graph, and obtaining a second vertex set and a second edge set corresponding to the second logical graph;

通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;The identifier of the second logic diagram is extracted from the second logic diagram by the function ID_ONLY to form a data set corresponding to the new id object;

通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集;The specified vertex and the specified edge are filtered by the exclusion operator to obtain the exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first vertex set and not included in the second vertex set , the specified edge is really included in the first edge set;

通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。The exclusion result dataset is sent to all nodes of the Apache Flink framework cluster via Flink broadcast.

优选地,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:Preferably, the graph network structure is a social network graph composed of multiple logical graphs, the graph operator includes a relational operator provided by Flink, and the specified data corresponding to the graph network structure is analyzed by using the graph operator. The steps of analyzing and calculating the set to obtain the analysis result include:

识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;Identifying whether the relational operator selected by the user is to find the largest common subgraph between the logical graphs in the social network graph;

若是,则聚集所述社交网络图中每个所述逻辑图的顶点数量;If so, aggregate the number of vertices of each of the logical graphs in the social network graph;

根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;according to the number of vertices in each of said logical graphs, selecting a specified logical graph higher than the minimum vertex count;

基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。Based on the specified logic graph, the maximum common subgraph between the logic graphs in the social network graph is obtained through a label propagation algorithm.

优选地,所述图预算符至少包括两个,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:Preferably, the graph budget operator includes at least two, and the step of using the graph operator to analyze and calculate the specified data set corresponding to the graph network structure to obtain the analysis result includes:

获取多个图预算符对应的指定Flink程序;Get the specified Flink program corresponding to multiple graph budget symbols;

获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;Obtain the current running node of the specified Flink program, and predict the data set that can perform operations concurrently;

将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;Distribute the concurrently executed data sets to designated machines in the Apache Flink framework cluster;

接收各指定机器分别执行分发的数据集得到的计算结果;Receive the calculation results obtained by each designated machine executing the distributed data set respectively;

将各计算结果汇总得到多个图预算符的分析结果;Summarize the calculation results to obtain the analysis results of multiple graph budget symbols;

将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。The analysis results of the multiple graph budget identifiers are sent to all nodes of the Apache Flink framework cluster through Flink broadcast.

优选地,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,所述将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器的步骤,包括:Preferably, the specified Flink program includes aggregation operations, transformation budgets and pattern matching budgets for a specific logic graph, and the step of distributing the concurrently executed data set to the specified machines in the Apache Flink framework cluster includes:

获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;acquiring the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation;

将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;Distribute the data set corresponding to the specific logic diagram and the aggregation function corresponding to the aggregation operation to the first machine in the cluster, and distribute the data set corresponding to the specific logic diagram and the transformation function corresponding to the transformation operation to the second machine in the cluster. a machine that distributes the data set corresponding to the specific logic diagram and the matching function corresponding to the pattern matching operation to the third machine in the cluster;

控制所述第一机器、所述第二机器和所述第三机器并发运行。The first machine, the second machine and the third machine are controlled to operate concurrently.

本申请还提供了一种图网络结构的分析装置,包括:The application also provides a graph network structure analysis device, including:

第一获取模块,用于根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;The first acquisition module is used to acquire the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the corresponding A first type label and a first attribute, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges respectively carries the second type label and the second attribute corresponding to the edge;

映射模块,用于将所述图网络结构中的各顶点以及各条边分别映射指定数据集;a mapping module, used to map each vertex and each edge in the graph network structure to the specified data set respectively;

第二获取模块,用于根据所述执行脚本,获取与所述图分析任务相对应的图运算符;a second obtaining module, configured to obtain a graph operator corresponding to the graph analysis task according to the execution script;

计算模块,用于利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The calculation module is configured to perform analysis and calculation on the specified data set corresponding to the graph network structure by using the graph operator to obtain an analysis result.

本申请还提供了一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现上述方法的步骤。The present application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.

本申请还提供了一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现上述的方法的步骤。The present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above-mentioned method are implemented.

本申请不仅建立有扩展属性的图网络结构,图表示的语义更丰富,支持多个不同属性的图,且基于有扩展属性的图网络结构,提供声明性和可组合的图运算符,在ApacheFLink上实现了扩展属性的图网络结构,并将图运算转换为数据集的运算,实现分析单个图或图集合,为图形分析提供了新的功能,实现对大规模数据和复杂图形处理的能力。This application not only establishes a graph network structure with extended attributes, but also has richer semantics of graph representation, supports multiple graphs with different attributes, and provides declarative and composable graph operators based on the graph network structure with extended attributes. It implements the graph network structure with extended attributes, and converts graph operations into data set operations to analyze a single graph or graph collection, provide new functions for graph analysis, and realize the ability to process large-scale data and complex graphs.

附图说明Description of drawings

图1本申请一实施例的图网络结构的分析方法流程示意图;1 is a schematic flowchart of a method for analyzing a graph network structure according to an embodiment of the present application;

图2本申请一实施例的图网络结构的分析装置结构示意图;2 is a schematic structural diagram of an apparatus for analyzing a graph network structure according to an embodiment of the present application;

图3本申请一实施例的计算机设备内部结构示意图。FIG. 3 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.

具体实施方式Detailed ways

为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

参照图1,本申请一实施例的图网络结构的分析方法,包括:1 , a method for analyzing a graph network structure according to an embodiment of the present application includes:

S1:根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;S1: Obtain the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the first type label corresponding to the vertex and the The first attribute, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges respectively carries the second type label and the second attribute corresponding to the edge;

S2:将所述图网络结构中的各顶点以及各条边分别映射指定数据集;S2: Map each vertex and each edge in the graph network structure to the specified data set respectively;

S3:根据所述执行脚本,获取与所述图分析任务相对应的图运算符;S3: Obtain a graph operator corresponding to the graph analysis task according to the execution script;

S4:利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。S4: Use the graph operator to analyze and calculate the specified data set corresponding to the graph network structure to obtain an analysis result.

本申请中的图网络结构带有扩展属性,图网络结构由顶点集V和边集E组成,其中顶点集V中包括各类型顶点,由点对应的第一类型标签和第一属性进行描述和表示各顶点。举例地,上述带扩展属性的图网络结构为图社交网络,表示为有向图G=<V,E>,由一组顶点V和一组边

Figure BDA0002604023300000051
组成。比如图网络由顶点集V={v0,…,v9}和边集E={e0,…,e19}组成。顶点可分为多种类型,包括表示人(Person)的顶点、表示论坛(Forum)的顶点和表示兴趣(Tag)的顶点。不同类型的顶点由相应的第一类型标签表示和相应的第一属性进行描述。边描述了顶点之间的关系,也有对应的第二类型标签(如knows)和第二属性进行描述。具有相同类型标签的顶点可能具有不同的属性,例如v0和v1,v1或许包括age属性,但是v0或许没有age属性,这与顶点的类型相关,可通过配置文件配置各顶点的类型标签和属性,以及各边的类型标签和属性,实现了有扩展属性的图网络结构,使网络结构的语义更丰富,可支持多个不同属性的图,信息量更庞大,可表达的关系更加复杂。本申请通过将图网络结构中的各顶点以及各条边分别映射为指定数据类型的指定数据集,比如Flink数据集,以便通过解析指定数据集之间的关系,计算图运算符对应的分析结果,实现对复杂的图网络结构的分析计算,为图形分析提供了新功能领域的应用可能。本申请不仅建立有扩展属性的图网络结构,图表示的语义更丰富,支持多个不同属性的图,且基于有扩展属性的图网络结构,提供声明性和可组合的图运算符,在Apache FLink上实现了扩展属性的图网络结构,并将图运算转换为数据集的运算,实现分析单个图或图集合,为图形分析提供了新的功能,实现对大规模数据和复杂图形处理的能力。The graph network structure in this application has extended attributes, and the graph network structure is composed of a vertex set V and an edge set E, wherein the vertex set V includes various types of vertices, which are described by the first type label and the first attribute corresponding to the point. represents each vertex. For example, the above graph network structure with extended attributes is a graph social network, represented as a directed graph G=<V, E>, consisting of a set of vertices V and a set of edges
Figure BDA0002604023300000051
composition. For example, a graph network consists of a vertex set V={v 0 ,...,v 9 } and an edge set E={e 0 ,...,e 19 }. Vertices can be classified into various types, including vertices representing people (Person), vertices representing forums (Forum), and vertices representing interests (Tag). Different types of vertices are represented by corresponding first type labels and described by corresponding first attributes. Edges describe the relationship between vertices, and also have corresponding second type labels (such as knows) and second attributes to describe. Vertices with the same type label may have different attributes, such as v0 and v1, v1 may include an age attribute, but v0 may not have an age attribute, which is related to the type of vertex, and the type label and attribute of each vertex can be configured through the configuration file. As well as the type labels and attributes of each side, a graph network structure with extended attributes is realized, which makes the semantics of the network structure richer, supports multiple graphs with different attributes, has a larger amount of information, and can express more complex relationships. In this application, each vertex and each edge in the graph network structure is mapped to a specified data set of a specified data type, such as a Flink data set, so as to analyze the relationship between the specified data sets and calculate the analysis results corresponding to the graph operators , to realize the analysis and calculation of the complex graph network structure, and provide the possibility of application in the new functional field for the graph analysis. This application not only establishes a graph network structure with extended attributes, but also has richer semantics of graph representation, supports multiple graphs with different attributes, and provides declarative and composable graph operators based on the graph network structure with extended attributes. FLink implements a graph network structure with extended attributes, and converts graph operations into data set operations to analyze a single graph or graph set, providing new functions for graph analysis, and realizing the ability to process large-scale data and complex graphs .

进一步地,所述根据图分析任务的执行脚本,获取待分析的图网络结构的步骤S1,包括:Further, the step S1 of acquiring the graph network structure to be analyzed according to the execution script of the graph analysis task includes:

S10:获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;S10: Obtain the graph elements of the graph network structure to be analyzed carried in the execution script of the graph analysis task, wherein the graph elements include graph headers;

S11:根据所述图头从指定数据库中抽取所述图头对应的图头数据集;S11: Extract the image header data set corresponding to the image header from the specified database according to the image header;

S12:判断所述图头数据集是否包括多个对象;S12: Determine whether the image header data set includes multiple objects;

S13:若否,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;S13: If not, then determine that the graph network structure to be analyzed is a logical graph whose graph header is a single object;

S14:从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;S14: Obtain vertex data and edge data associated with the logic graph from the data set, and form a vertex set and an edge set respectively correspondingly;

S15:通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。S15: The logic graph is represented by the vertex set and the edge set, and the graph network structure to be analyzed is obtained.

本申请实施例以待分析的图网络结构为逻辑图为例,详细说明从数据库中获取图网络结构的数据集,表示图网络结构的过程。逻辑图是图集合的一个特殊情形,其图头数据集中只包含单个对象,图中所有的数据信息都是相通的。逻辑图也有类型标签(例如Community)和属性,可以通过用特定的度量(例如vertexCount:3)或关于该逻辑图的一般信息(例如interest:Databases)来描述该逻辑图。逻辑图的类型标签和属性可通过显式声明的方式显示在逻辑图中,也可以是图算法的输出或用作后续运算符的输入,例如社区检测或图模式匹配的应用情况。本申请中,预定义的数据库DB=<V,E,L>,由顶点集V={vi}、边集E={ek}和一组逻辑图L={Gm}组成。逻辑图Gm=<Vm,Em>,由顶点集V的子集

Figure BDA0002604023300000061
和边集的子集
Figure BDA0002604023300000062
组成的有序对。图头表示数据与某个逻辑图相关联的关系,如果待分析的图网络结构的图头均只针对一种对象,则整个待分析的图网络结构为数据相通或具有数据关联的逻辑图。本申请的逻辑图带有扩展属性,比如图头的扩展属性包括地址、标签和数据属性,表示为GraphHead:=<Id,Label,Properties>;顶点的扩展属性包括地址、标签、顶点数据属性和图中顶点编号,表示为Vertex:=<Id,Label,Properties,GraphIds>;边的扩展属性包括地址、标签、边起始顶点地址、边终止顶点地址、边数据属性和图中的边编号,表示为Edge:=<Id,Label,SrcId,TrgtId,Properties,GraphIds>。举例地,逻辑图为g0={DataSet<GraphHead>graphHead={<0,′Community′,{′interest′:′Databases′,...}>},DataSet<Vertex>vertices={<0,′Person′,{′name′:′Alice′,...},{0,2}>,<,′Person′,{′name′:′Bob′,...},{0,2}>},DataSet<Edge>edges={<0,′knows′,0,1,{′since′:2014},{0,2}>,<1,′Person′,1,0,{′snce′:2014},{0,2}>}}。The embodiments of the present application take the graph network structure to be analyzed as a logical graph as an example, and describe in detail the process of acquiring the graph network structure data set from the database to represent the graph network structure. A logic graph is a special case of a graph set. The graph header data set contains only a single object, and all the data information in the graph is the same. Logic graphs also have type labels (eg Community) and attributes, which can be described by either specific metrics (eg vertexCount:3) or general information about the logic graph (eg interest:Databases). The type labels and properties of a logic graph can be explicitly declared in the logic graph, and can also be the output of graph algorithms or used as input to subsequent operators, such as the application of community detection or graph pattern matching. In this application, the predefined database DB=<V,E,L> consists of vertex set V={vi } , edge set E={ ek } and a set of logic graph L={G m }. Logical graph Gm=<V m ,E m >, consisting of a subset of vertex set V
Figure BDA0002604023300000061
and a subset of the edge set
Figure BDA0002604023300000062
composed of ordered pairs. The graph header represents the relationship between data and a certain logical graph. If the graph headers of the graph network structure to be analyzed are only for one type of object, the entire graph network structure to be analyzed is a logical graph with interconnected data or data association. The logic graph of this application has extended attributes. For example, the extended attributes of the graph header include address, label and data attributes, which are expressed as GraphHead:=<Id, Label, Properties>; the extended attributes of the vertex include address, label, vertex data attribute and The vertex number in the graph, expressed as Vertex:=<Id,Label,Properties,GraphIds>; the extended attributes of the edge include address, label, edge start vertex address, edge end vertex address, edge data attribute and edge number in the graph, Expressed as Edge:=<Id, Label, SrcId, TrgtId, Properties, GraphIds>. For example, the logic graph is g0={DataSet<GraphHead>graphHead={<0,'Community',{'interest':'Databases',...}>},DataSet<Vertex>vertices={<0,'Person',{'name':'Alice', ...}, {0, 2}>, <, 'Person', {'name': 'Bob', ...}, {0, 2}>},DataSet<Edge>edges={<0,'knows',0,1,{'since':2014},{0,2}>,<1,'Person',1,0,{'snce' : 2014}, {0, 2}>}}.

进一步地,所述指定数据集包括Flink数据集,所述根据所述执行脚本,获取与所述图分析任务相对应的图运算符的步骤S3之后,包括:Further, the specified data set includes a Flink data set, and after the step S3 of acquiring the graph operator corresponding to the graph analysis task according to the execution script, it includes:

S31:获取各所述图运算符的执行顺序;S31: Obtain the execution order of each of the graph operators;

S32:按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;S32: link each of the graph operators into operational logic through the Java programming language according to the execution order;

S33:将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。S33: Associate the operation logic with the trigger of the Flink trigger program to form a Flink operation program.

本申请实施例的图网络结构,为基于Apache Flink框架的图表示,使用三种对象类型来表示带扩展属性的图网络结构的图元素:图头、顶点和边。为表示图网络结构的数据集合,为每种图元素使用专用的Flink数据集。图元素对应的Flink数据集,通过数据映射转换,转为为各自对应的专属Flink数据集。在Apache Flink框架中,需要显式地触发程序才能执行,例如,将结果写入文件或数据库。由于本申请实施例中的图运算符实现的过程,都不包含显式地触发程序的触发器,因此本实施例将多个运算符按照执行脚本中的执行顺序链接成运算逻辑,然后将运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序,使整个运算过程作为一个Flink程序执行。The graph network structure of the embodiment of the present application is a graph representation based on the Apache Flink framework, and three object types are used to represent the graph elements of the graph network structure with extended attributes: graph header, vertex and edge. For datasets representing graph network structures, a dedicated Flink dataset is used for each graph element. The Flink datasets corresponding to the graph elements are converted into their respective dedicated Flink datasets through data mapping conversion. In the Apache Flink framework, programs need to be explicitly triggered to execute, for example, to write results to a file or database. Since none of the processes implemented by the graph operators in the embodiments of the present application include triggers that trigger programs explicitly, in this embodiment, multiple operators are linked into operation logic according to the execution order in the execution script, and then the operation The logic is associated with the trigger of the Flink trigger program to form a Flink operation program, so that the entire operation process is executed as a Flink program.

进一步地,所述图运算符包括排除运算符,利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤S4,包括:Further, the graph operator includes an exclusion operator, and the graph operator is used to analyze and calculate the specified data set corresponding to the graph network structure, and the step S4 of obtaining the analysis result includes:

S41:获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;S41: Obtain the first vertex set and the first edge set corresponding to the first logical graph, and obtain the second vertex set and the second edge set corresponding to the second logical graph;

S42:通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;S42: extract the identifier of the second logic diagram from the second logic diagram through the function ID_ONLY to form a data set corresponding to the new id object;

S43:通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集。S43: Filter the specified vertex and the specified edge by the exclusion operator to obtain the exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first vertex set, and is not included in the second vertex set The vertex set, the specified edge is actually contained in the first edge set.

S44:通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。S44: Send the exclusion result dataset to all nodes of the Apache Flink framework cluster through Flink broadcast.

本申请以对图计算的排除运算符操作,得到转换后的数据集的过程进行说明。排除运算符的实现过程包括,生成的逻辑图由包含在第一逻辑图中但不包含在第二逻辑图中的顶点和边组成。实现的方式是基于逻辑图的顶点和边进行过滤。先通过对第二逻辑图的图头数据集,用映射变换来提取第二逻辑图的标识符。转换时通过调用用户定义的函数ID_ONLY进行参数化,使通过函数ID_ONLY从图头数据集中提取标识符。生成的逻辑图的数据集包含一个新id对象,用于筛选不包含在具有此新id对象的第二逻辑图中的顶点。筛选转换时通过用户定义的函数(NOT_IN_GRAPH_FILTER)作为参数,对第一逻辑图和第二逻辑图的数据集中的每个顶点调用该函数。为了使新id对象适用于过滤filter函数,使用Flink的广播概念将运算后得到的数据集,发送给Apache Flink框架集群所有节点。然后生成的数据集,只包含filter函数计算结果为true的顶点。对于边的过滤具有相同的过程,但边的过滤还需要检测两个半连接状态,即构成边的源顶点和目标顶点只有一端在生成的数据集中的情况需要排除,以确保源顶点和目标顶点均包含在新的顶点集中。根据运算生成的数据集,创建一个新逻辑图,新逻辑图的图头由构造函数创建,包括生成一个新id。上述检测两个半连接状态的过程是指,组成边的源顶点和目标顶点有一端在另一个逻辑图中,即一逻辑图的一条边的部分在另一逻辑图中,需要将重叠的区域排除的情况。The present application describes the process of obtaining the transformed data set by operating the exclusion operator of graph computation. The implementation process of the exclusion operator includes that the generated logic graph is composed of vertices and edges included in the first logic graph but not included in the second logic graph. The way it is implemented is to filter based on the vertices and edges of the logic graph. First, the identifier of the second logic diagram is extracted by mapping the header data set of the second logic diagram. The conversion is parameterized by calling the user-defined function ID_ONLY, so that the identifier is extracted from the header dataset by the function ID_ONLY. The dataset of the resulting logical graph contains a new id object for filtering vertices that are not included in the second logical graph with this new id object. The user-defined function (NOT_IN_GRAPH_FILTER) is passed as a parameter during the filter transformation, and the function is called for each vertex in the data set of the first and second logic graphs. In order to make the new id object suitable for the filter function, the data set obtained after the operation is sent to all nodes of the Apache Flink framework cluster using Flink's broadcasting concept. Then the generated dataset contains only vertices for which the filter function evaluates to true. The same process is used for edge filtering, but the edge filtering also needs to detect two semi-connected states, i.e. the case where only one end of the source and destination vertices that make up the edge is in the generated dataset needs to be excluded to ensure that the source and destination vertices are are included in the new vertex set. According to the data set generated by the operation, a new logic graph is created, and the graph header of the new logic graph is created by the constructor, including generating a new id. The above process of detecting two semi-connected states means that the source vertex and the target vertex of the edge have one end in another logical graph, that is, the part of one edge of one logical graph is in another logical graph, and the overlapping area needs to be divided. excluded situations.

进一步地,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤S4,包括:Further, the graph network structure is a social network graph composed of multiple logical graphs, the graph operator includes a relational operator provided by Flink, and the specified data corresponding to the graph network structure is analyzed by using the graph operator. The step S4 of analyzing and calculating the set to obtain the analysis result includes:

S401:识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;S401: Identify whether the relation operator selected by the user is to find the largest common subgraph between the logic graphs in the social network graph;

S402:若是,则聚集所述社交网络图中每个所述逻辑图的顶点数量;S402: If yes, aggregate the number of vertices of each of the logic graphs in the social network graph;

S402:根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;S402: According to the number of vertices in each of the logical graphs, select a specified logical graph higher than the minimum vertex count;

S403:基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。S403: Based on the specified logic graph, obtain the maximum common subgraph between the logic graphs in the social network graph through a label propagation algorithm.

本申请使用Flink数据集来表示图形,因此图形分析不仅限于预先定义好的运算符,而且还可以使用Flink提供的所有数据运算,如关系操作符、机器学习符或图形处理符。以通过关系操作符,找到社交网络中较大的社区之间的最大公共子图为例,详细说明分析计算过程。图网络结构所表示的对象通常是异构的,例如社交网络图的顶点可以表示用户、组或乐队,而关系可以表示友谊、成员资格或兴趣。即同一类型的实体对象可能具有不同的属性,即不同的用户可能提供不同的属性信息。上述社交网络图包括多个逻辑图,包含一组逻辑图L={G0,G1,G2},每个逻辑图表示社交网络图中的一个社区,比如是特定的兴趣组,例如Databases。每个逻辑图都有一个对应顶点子集和边子集,例如,V(G0)={v0,v1}和E(G0)={e0,e1}。分析逻辑图G0和G2的关系,从社交网络图的三个逻辑图的带扩展属性的社交网络图可得到,逻辑图G0关联到的顶点和边与逻辑图G2共享,可以得知G0和G2顶点和边可能重叠,因为V(G0)∩V(G2)={v0,v1}和E(G0)∩E(G2)={e0,e1}。本实施例通过识别社区并聚集每个社区的顶点数量,选择高于最小顶点计数的社区,即确定选定的较大的社区,最小顶点计数额由用户预先设定。最后应用标签传播算法来找到最大公共子图。上述标签传播算法是基于图的半监督学习方法,过程为从已标记的节点的标签信息,预测未标记的节点的标签信息,利用样本间的关系,建立完全图模型。每个节点标签按相似度传播给相邻节点,在节点传播的每一步,每个节点根据相邻节点的标签来更新自己的标签,与该节点相似度越大,其相邻节点对其标注的影响权值越大,相似节点的标签越趋于一致,其标签就越容易传播。在标签传播过程中,保持已标记的数据的标签不变,使其将标签传给未标注的数据。最终当迭代结束时,相似节点的概率分布趋于相似,可以划分到一类中,找到最大公共子图对应得最大独立集。This application uses Flink datasets to represent graphs, so graph analysis is not limited to pre-defined operators, but can also use all data operations provided by Flink, such as relational operators, machine learning operators or graph processing operators. Taking the relationship operator to find the largest common subgraph among larger communities in a social network as an example, the analysis and calculation process is described in detail. Objects represented by graph network structures are often heterogeneous, for example, vertices of a social network graph can represent users, groups, or bands, while relationships can represent friendships, memberships, or interests. That is, entity objects of the same type may have different attributes, that is, different users may provide different attribute information. The above social network diagram includes multiple logic diagrams, including a set of logic diagrams L={G 0 , G 1 , G 2 }, each logic diagram represents a community in the social network diagram, such as a specific interest group, such as Databases . Each logical graph has a corresponding subset of vertices and edges, eg, V(G0)={v0,v1} and E(G0)={e0,e1}. Analyzing the relationship between the logical graphs G0 and G2, it can be obtained from the social network graph with extended attributes of the three logical graphs of the social network graph. The vertices and edges associated with the logical graph G0 are shared with the logical graph G2, and it can be known that G0 and G2 Vertices and edges may overlap because V(G0)∩V(G2)={v0,v1} and E(G0)∩E(G2)={e0,e1}. In this embodiment, by identifying communities and gathering the number of vertices of each community, a community higher than the minimum vertex count is selected, that is, the selected larger community is determined, and the minimum vertex count amount is preset by the user. Finally a label propagation algorithm is applied to find the maximum common subgraph. The above label propagation algorithm is a graph-based semi-supervised learning method. The process is to predict the label information of unlabeled nodes from the label information of labeled nodes, and use the relationship between samples to build a complete graph model. The label of each node is propagated to adjacent nodes according to the similarity. At each step of node propagation, each node updates its label according to the label of the adjacent node. The greater the similarity with the node, the adjacent node will label it. The larger the influence weight is, the more consistent the labels of similar nodes are, and the easier its labels are to propagate. During label propagation, the label of the labeled data is kept unchanged, so that it passes the label to the unlabeled data. Finally, when the iteration ends, the probability distributions of similar nodes tend to be similar and can be divided into one class to find the largest independent set corresponding to the largest common subgraph.

进一步地,所述图预算符至少包括两个,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤S4,包括:Further, the graph budget operator includes at least two, and the step S4 of using the graph operator to analyze and calculate the specified data set corresponding to the graph network structure to obtain the analysis result includes:

S4001:获取多个图预算符对应的指定Flink程序;S4001: Obtain the specified Flink programs corresponding to multiple graph budget symbols;

S4002:获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;S4002: Obtain the currently running node of the specified Flink program, and predict a data set that can perform operations concurrently;

S4003:将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;S4003: Distribute the concurrently executed data set to the designated machine in the Apache Flink framework cluster;

S4004:接收各指定机器分别执行分发的数据集得到的计算结果;S4004: Receive the calculation result obtained by each designated machine executing the distributed data set respectively;

S4005:将各计算结果汇总得到多个图预算符的分析结果;S4005: Summarize the calculation results to obtain analysis results of multiple graph budget symbols;

S4006:将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。S4006: Send the analysis results of the multiple graph budget identifiers to all nodes of the Apache Flink framework cluster through Flink broadcast.

本申请的Apache Flink框架,对来自流和批处理的数据,支持声明性定义和分布式执行分析,支持的数据包括流式的和批处理的。程序的基本组成是数据集和转换,通过对数据集进行定义和执行的任务进行抽象,即形成数据集和数据集的转换过程。数据集是任意数据对象的集合,转换描述了一个数据集到另一个数据集的数据类型的转换。例如,假设X,Y是数据集,那么转换可以看作函数t:X→Y。举例说,上述转换是映射MAP,对于每个输入的数据对象xi∈X,有且只有一个输出数据对象yi∈Y。比如,转换时归约reduce,则所有输入数据对象都被聚合到一个对象的数据集。关系型数据库中的数据转换,包括join、groupby、project、union和distinct等。本申请为了描述Flink程序应用逻辑,通过用户自定义的函数对数据转换过程进行参数化。Flink程序可以表示为多个链式转换,执行树转化时Flink框架负责程序优化、数据分发和跨机器的集群并行执行。即一个Flink脚本任务可以包括多个链式的数据变换,Flink程序在执行时会一并优化程序、完成数据分发以及在跨机器的集群上并发执行。The Apache Flink framework of this application supports declarative definition and distributed execution analysis of data from streaming and batch processing. The supported data includes streaming and batch processing. The basic components of the program are datasets and transformations, which are abstracted through the tasks of defining and executing datasets, that is, forming datasets and the transformation process of datasets. A dataset is a collection of arbitrary data objects, and a transformation describes the conversion of data types from one dataset to another. For example, assuming X, Y are datasets, then the transformation can be seen as a function t: X→Y. For example, the above transformation is a map MAP, for each input data object x i ∈ X, there is one and only one output data object y i ∈ Y. For example, when reducing reduce when transforming, all input data objects are aggregated into a dataset of objects. Data transformation in relational databases, including join, groupby, project, union, and distinct. In order to describe the Flink program application logic, this application parameterizes the data conversion process through user-defined functions. Flink programs can be represented as multiple chain transformations. When performing tree transformations, the Flink framework is responsible for program optimization, data distribution, and parallel execution across clusters of machines. That is, a Flink script task can include multiple chained data transformations, and the Flink program will optimize the program, complete data distribution, and execute concurrently on clusters across machines during execution.

进一步地,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,所述将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器的步骤S4003,包括:Further, the specified Flink program includes aggregate operations, transformation budgets and pattern matching budgets for a specific logic graph, and the step S4003 of distributing the data set of concurrent execution operations to the specified machines in the Apache Flink framework cluster includes:

S40031:获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;S40031: Obtain the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation;

S40032:将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;S40032: Distribute the data set corresponding to the specific logic diagram and the aggregation function corresponding to the aggregation operation to the first machine in the cluster, and distribute the data set corresponding to the specific logic diagram and the transformation function corresponding to the transformation operation to the cluster. The second machine distributes the data set corresponding to the specific logic diagram and the matching function corresponding to the pattern matching operation to the third machine in the cluster;

S40033:控制所述第一机器、所述第二机器和所述第三机器并发运行。S40033: Control the first machine, the second machine, and the third machine to run concurrently.

本实施例以图运算符同时包括聚合运算符、变换运算符和模式匹配运算符为例,进行详细说明Apache Flink框架集群中的机器并发执行的过程。上述聚合运算符指,将输入的特定逻辑图G映射到输出图G’。通过执行用户事先定义好的聚合函数α∶L→A,L是逻辑图,A是值域,得到输出图G’是输入的特定逻辑图G的具有新属性k的修改版本,使得κ(G’,k)→α(G),κ是映射关系,把逻辑图的属性k映射到一个值。例如顶点计数:alpha=(g=>g.V.count());outGraph=inGraph.agregate(′vertexCount′,alpha)。上述变换运算符对输入的特定逻辑图G,执行用户定义的转换函数γ:L→L,ν:V→V和∈:E→E,得到输出图G′=γ(G),其中V(G′)={ν(v):v∈V(G)},E(G′)={∈(e):e∈E(G)}。上述模式匹配运算符包含两个参数,分别是模式图G*和谓词

Figure BDA0002604023300000101
模式匹配运算应用在特定逻辑图G上,返回包含所有匹配的图集合
Figure BDA0002604023300000102
In this embodiment, the process of concurrent execution by machines in the Apache Flink framework cluster is described in detail by taking the graph operator including the aggregation operator, the transformation operator, and the pattern matching operator as an example. The above aggregation operator refers to mapping the input specific logic graph G to the output graph G'. By executing a user-defined aggregation function α:L→A, where L is the logic graph and A is the value domain, the output graph G' is a modified version of the input specific logic graph G with a new attribute k, such that κ (G ', k)→α(G), κ is the mapping relationship, which maps the attribute k of the logic diagram to a value. For example vertex count: alpha=(g=>gVcount());outGraph=inGraph.agregate('vertexCount', alpha). The above transformation operator performs a user-defined transformation function γ: L→L, ν: V→V and ∈: E→E on the input specific logical graph G, and obtains the output graph G′=γ(G), where V( G')={ν(v):v∈V(G)}, E(G')={∈(e):e∈E(G)}. The above pattern matching operator takes two parameters, the pattern graph G * and the predicate
Figure BDA0002604023300000101
The pattern matching operation is applied to a specific logical graph G and returns a set of graphs containing all matches
Figure BDA0002604023300000102

参照图2,本申请一实施例的图网络结构的分析装置,包括:2, an apparatus for analyzing a graph network structure according to an embodiment of the present application includes:

第一获取模块1,用于根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;The first acquisition module 1 is used to acquire the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the corresponding vertex The first type label and the first attribute of the edge set, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges respectively carries the second type label and the second attribute corresponding to the edge;

映射模块2,用于将所述图网络结构中的各顶点以及各条边分别映射指定数据集;The mapping module 2 is used to map each vertex and each edge in the graph network structure to the specified data set respectively;

第二获取模块3,用于根据所述执行脚本,获取与所述图分析任务相对应的图运算符;A second obtaining module 3, configured to obtain a graph operator corresponding to the graph analysis task according to the execution script;

计算模块4,用于利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The calculation module 4 is configured to analyze and calculate the specified data set corresponding to the graph network structure by using the graph operator to obtain an analysis result.

本申请中的图网络结构带有扩展属性,图网络结构由顶点集V和边集E组成,其中顶点集V中包括各类型顶点,由点对应的第一类型标签和第一属性进行描述和表示各顶点。举例地,上述带扩展属性的图网络结构为图社交网络,表示为有向图G=<V,E>,由一组顶点V和一组边

Figure BDA0002604023300000111
组成。比如图网络由顶点集V={v0,…,v9}和边集E={e0,…,e19}组成。顶点可分为多种类型,包括表示人(Person)的顶点、表示论坛(Forum)的顶点和表示兴趣(Tag)的顶点。不同类型的顶点由相应的第一类型标签表示和相应的第一属性进行描述。边描述了顶点之间的关系,也有对应的第二类型标签(如knows)和第二属性进行描述。具有相同类型标签的顶点可能具有不同的属性,例如v0和v1,v1或许包括age属性,但是v0或许没有age属性,这与顶点的类型相关,可通过配置文件配置各顶点的类型标签和属性,以及各边的类型标签和属性,实现了有扩展属性的图网络结构,使网络结构的语义更丰富,可支持多个不同属性的图,信息量更庞大,可表达的关系更加复杂。本申请通过将图网络结构中的各顶点以及各条边分别映射为指定数据类型的指定数据集,比如Flink数据集,以便通过解析指定数据集之间的关系,计算图运算符对应的分析结果,实现对复杂的图网络结构的分析计算,为图形分析提供了新功能领域的应用可能。本申请不仅建立有扩展属性的图网络结构,图表示的语义更丰富,支持多个不同属性的图,且基于有扩展属性的图网络结构,提供声明性和可组合的图运算符,在Apache FLink上实现了扩展属性的图网络结构,并将图运算转换为数据集的运算,实现分析单个图或图集合,为图形分析提供了新的功能,实现对大规模数据和复杂图形处理的能力。The graph network structure in this application has extended attributes, and the graph network structure is composed of a vertex set V and an edge set E, wherein the vertex set V includes various types of vertices, which are described by the first type label and the first attribute corresponding to the point. represents each vertex. For example, the above graph network structure with extended attributes is a graph social network, represented as a directed graph G=<V, E>, consisting of a set of vertices V and a set of edges
Figure BDA0002604023300000111
composition. For example, a graph network consists of a vertex set V={v 0 ,...,v 9 } and an edge set E={e 0 ,...,e 19 }. Vertices can be classified into various types, including vertices representing people (Person), vertices representing forums (Forum), and vertices representing interests (Tag). Different types of vertices are represented by corresponding first type labels and described by corresponding first attributes. Edges describe the relationship between vertices, and also have corresponding second type labels (such as knows) and second attributes to describe. Vertices with the same type label may have different attributes, such as v0 and v1, v1 may include an age attribute, but v0 may not have an age attribute, which is related to the type of vertex, and the type label and attribute of each vertex can be configured through the configuration file, As well as the type labels and attributes of each side, a graph network structure with extended attributes is realized, which makes the semantics of the network structure richer, supports multiple graphs with different attributes, has a larger amount of information, and can express more complex relationships. In this application, each vertex and each edge in the graph network structure is mapped to a specified data set of a specified data type, such as a Flink data set, so as to analyze the relationship between the specified data sets and calculate the analysis results corresponding to the graph operators , to realize the analysis and calculation of the complex graph network structure, and provide the possibility of application in the new functional field for the graph analysis. This application not only establishes a graph network structure with extended attributes, but also has richer semantics of graph representation, supports multiple graphs with different attributes, and provides declarative and composable graph operators based on the graph network structure with extended attributes. FLink implements a graph network structure with extended attributes, and converts graph operations into data set operations to analyze a single graph or graph set, providing new functions for graph analysis, and realizing the ability to process large-scale data and complex graphs .

进一步地,第一获取模块1,包括:Further, the first acquisition module 1 includes:

第一获取单元,用于获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;a first obtaining unit, configured to obtain the graph elements of the graph network structure to be analyzed carried in the execution script of the graph analysis task, wherein the graph elements include graph headers;

抽取单元,用于根据所述图头从指定数据库中抽取所述图头对应的图头数据集;an extraction unit, used for extracting a picture header data set corresponding to the picture header from a specified database according to the picture header;

判断单元,用于判断所述图头数据集是否包括多个对象;a judging unit for judging whether the image header data set includes multiple objects;

判定单元,用于若未包括多个对象,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;a determining unit, configured to determine that the graph network structure to be analyzed is a logical graph whose graph header is a single object if it does not include multiple objects;

第二获取单元,用于从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;a second acquiring unit, configured to acquire vertex data and edge data associated with the logic graph from the data set, and form a vertex set and an edge set respectively;

第一得到单元,用于通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。The first obtaining unit is used to represent the logic graph through the vertex set and the edge set, and obtain the graph network structure to be analyzed.

本申请实施例以待分析的图网络结构为逻辑图为例,详细说明从数据库中获取图网络结构的数据集,表示图网络结构的过程。逻辑图是图集合的一个特殊情形,其图头数据集中只包含单个对象,图中所有的数据信息都是相通的。逻辑图也有类型标签(例如Community)和属性,可以通过用特定的度量(例如vertexCount:3)或关于该逻辑图的一般信息(例如interest:Databases)来描述该逻辑图。逻辑图的类型标签和属性可通过显式声明的方式显示在逻辑图中,也可以是图算法的输出或用作后续运算符的输入,例如社区检测或图模式匹配的应用情况。本申请中,预定义的数据库DB=<V,E,L>,由顶点集V={vi}、边集E={ek}和一组逻辑图L={Gm}组成。逻辑图Gm=<Vm,Em>,由顶点集V的子集

Figure BDA0002604023300000121
和边集的子集
Figure BDA0002604023300000122
组成的有序对。图头表示数据与某个逻辑图相关联的关系,如果待分析的图网络结构的图头均只针对一种对象,则整个待分析的图网络结构为数据相通或具有数据关联的逻辑图。本申请的逻辑图带有扩展属性,比如图头的扩展属性包括地址、标签和数据属性,表示为GraphHead:=<Id,Label,Properties>;顶点的扩展属性包括地址、标签、顶点数据属性和图中顶点编号,表示为Vertex:<Id,Label,Properties,GraphIds>;边的扩展属性包括地址、标签、边起始顶点地址、边终止顶点地址、边数据属性和图中的边编号,表示为Edge:<Id,Label,SrcId,TrgtId,Properties,GraphIds>。举例地,逻辑图为g0={DataSet<GraphHead>graphHead={<0,′Community′,{′interest′:′Databases′,...}>},DataSet<Vertex>vertices={<0,′Person′,{′name′:′Alice′,...},{0,2}>,<1,′Person′,{′name′:′Bob′,...},{0,2}>},DataSet<Edge>edges={<0,′knows′,0,1,{′since:2014},{0,2}>,<1,′Person′,1,0,{′since′:2014},{0,2}>}}。The embodiments of the present application take the graph network structure to be analyzed as a logical graph as an example, and describe in detail the process of acquiring the graph network structure data set from the database to represent the graph network structure. A logic graph is a special case of a graph set. The graph header data set contains only a single object, and all the data information in the graph is the same. Logic graphs also have type labels (eg Community) and attributes, which can be described by either specific metrics (eg vertexCount:3) or general information about the logic graph (eg interest:Databases). The type labels and properties of a logic graph can be explicitly declared in the logic graph, and can also be the output of graph algorithms or used as input to subsequent operators, such as the application of community detection or graph pattern matching. In this application, the predefined database DB=<V, E, L> consists of a vertex set V={vi } , an edge set E={ ek } and a set of logic graphs L={G m }. Logical graph Gm=<V m , Em >, consisting of a subset of vertex set V
Figure BDA0002604023300000121
and a subset of the edge set
Figure BDA0002604023300000122
composed of ordered pairs. The graph header represents the relationship between data and a certain logical graph. If the graph headers of the graph network structure to be analyzed are only for one type of object, the entire graph network structure to be analyzed is a logical graph with interconnected data or data association. The logic graph of the present application has extended attributes. For example, the extended attributes of the graph header include address, label and data attributes, which are expressed as GraphHead:=<Id, Label, Properties>; the extended attributes of the vertex include address, label, vertex data attribute and The vertex number in the graph, expressed as Vertex: <Id, Label, Properties, GraphIds>; the extended attributes of the edge include address, label, edge start vertex address, edge end vertex address, edge data attribute and the edge number in the graph, indicating For Edge: <Id, Label, SrcId, TrgtId, Properties, GraphIds>. For example, the logic graph is g0={DataSet<GraphHead>graphHead={<0,'Community',{'interest':'Databases',...}>},DataSet<Vertex>vertices={<0,'Person',{'name':'Alice', ...}, {0, 2}>, <1, 'Person', {'name': 'Bob', ...}, {0, 2} >},DataSet<Edge>edges={<0,'knows',0,1,{'since:2014},{0,2}>,<1,'Person',1,0,{'since' : 2014}, {0, 2}>}}.

进一步地,所述指定数据集包括Flink数据集,图网络结构的分析装置,包括:Further, the specified data set includes a Flink data set, an analysis device for a graph network structure, including:

第三获取模块,用于获取各所述图运算符的执行顺序;The third acquisition module is used to acquire the execution order of each of the graph operators;

链接模块,用于按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;a linking module, configured to link each of the graph operators into operational logic through the Java programming language according to the execution sequence;

关联模块,用于将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。The association module is used to associate the operation logic with the trigger of the Flink trigger program to form the Flink operation program.

本申请实施例的图网络结构,为基于Apache Flink框架的图表示,使用三种对象类型来表示带扩展属性的图网络结构的图元素:图头、顶点和边。为表示图网络结构的数据集合,为每种图元素使用专用的Flink数据集。图元素对应的Flink数据集,通过数据映射转换,转为为各自对应的专属Flink数据集。在Apache Flink框架中,需要显式地触发程序才能执行,例如,将结果写入文件或数据库。由于本申请实施例中的图运算符实现的过程,都不包含显式地触发程序的触发器,因此本实施例将多个运算符按照执行脚本中的执行顺序链接成运算逻辑,然后将运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序,使整个运算过程作为一个Flink程序执行。The graph network structure of the embodiment of the present application is a graph representation based on the Apache Flink framework, and three object types are used to represent the graph elements of the graph network structure with extended attributes: graph header, vertex and edge. For datasets representing graph network structures, a dedicated Flink dataset is used for each graph element. The Flink datasets corresponding to the graph elements are converted into their respective dedicated Flink datasets through data mapping conversion. In the Apache Flink framework, programs need to be explicitly triggered to execute, for example, to write results to a file or database. Since none of the processes implemented by the graph operators in the embodiments of the present application include triggers that trigger programs explicitly, in this embodiment, multiple operators are linked into operation logic according to the execution order in the execution script, and then the operation The logic is associated with the trigger of the Flink trigger program to form a Flink operation program, so that the entire operation process is executed as a Flink program.

进一步地,所述图运算符包括排除运算符,计算模块4,包括:Further, the graph operator includes an exclusion operator, and the calculation module 4 includes:

第三获取单元,用于获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;a third obtaining unit, configured to obtain the first vertex set and the first edge set corresponding to the first logical graph, and obtain the second vertex set and the second edge set corresponding to the second logical graph;

提取单元,用于通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;An extraction unit, used for extracting the identifier of the second logic diagram from the second logic diagram through the function ID_ONLY, to form a data set corresponding to the new id object;

筛选单元,用于通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集。A screening unit, configured to filter a specified vertex and a specified edge through the exclusion operator, to obtain an exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first vertex set, and is not included in the the second vertex set, the specified edge is actually contained in the first edge set.

第一发送单元,用于通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。The first sending unit is configured to send the excluded result data set to all nodes of the Apache Flink framework cluster through Flink broadcast.

本申请以对图计算的排除运算符操作,得到转换后的数据集的过程进行说明。排除运算符的实现过程包括,生成的逻辑图由包含在第一逻辑图中但不包含在第二逻辑图中的顶点和边组成。实现的方式是基于逻辑图的顶点和边进行过滤。先通过对第二逻辑图的图头数据集,用映射变换来提取第二逻辑图的标识符。转换时通过调用用户定义的函数ID_ONLY进行参数化,使通过函数ID_ONLY从图头数据集中提取标识符。生成的逻辑图的数据集包含一个新id对象,用于筛选不包含在具有此新id对象的第二逻辑图中的顶点。筛选转换时通过用户定义的函数(NOT_IN_GRAPH_FILTER)作为参数,对第一逻辑图和第二逻辑图的数据集中的每个顶点调用该函数。为了使新id对象适用于过滤filter函数,使用Flink的广播概念将运算后得到的数据集,发送给Apache Flink框架集群所有节点。然后生成的数据集,只包含filter函数计算结果为true的顶点。对于边的过滤具有相同的过程,但边的过滤还需要检测两个半连接状态,即构成边的源顶点和目标顶点只有一端在生成的数据集中的情况需要排除,以确保源顶点和目标顶点均包含在新的顶点集中。根据运算生成的数据集,创建一个新逻辑图,新逻辑图的图头由构造函数创建,包括生成一个新id。上述检测两个半连接状态的过程是指,组成边的源顶点和目标顶点有一端在另一个逻辑图中,即一逻辑图的一条边的部分在另一逻辑图中,需要将重叠的区域排除的情况。The present application describes the process of obtaining the transformed data set by operating the exclusion operator of graph computation. The implementation process of the exclusion operator includes that the generated logic graph is composed of vertices and edges included in the first logic graph but not included in the second logic graph. The way it is implemented is to filter based on the vertices and edges of the logic graph. First, the identifier of the second logic diagram is extracted by mapping the header data set of the second logic diagram. The conversion is parameterized by calling the user-defined function ID_ONLY, so that the identifier is extracted from the header dataset by the function ID_ONLY. The dataset of the resulting logical graph contains a new id object for filtering vertices that are not included in the second logical graph with this new id object. The user-defined function (NOT_IN_GRAPH_FILTER) is passed as a parameter during the filter transformation, and the function is called for each vertex in the data set of the first and second logic graphs. In order to make the new id object suitable for the filter function, the data set obtained after the operation is sent to all nodes of the Apache Flink framework cluster using Flink's broadcasting concept. Then the generated dataset contains only vertices for which the filter function evaluates to true. The same process is used for edge filtering, but edge filtering also requires detection of two semi-connected states, i.e. the case where only one end of the source and destination vertices that make up the edge is in the generated dataset needs to be excluded to ensure that the source and destination vertices are are included in the new vertex set. According to the data set generated by the operation, a new logic graph is created, and the graph header of the new logic graph is created by the constructor, including generating a new id. The above process of detecting two semi-connected states means that the source vertex and the target vertex of the edge have one end in another logical graph, that is, the part of one edge of one logical graph is in another logical graph, and the overlapping area needs to be divided. excluded situations.

进一步地,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,计算模块4,包括:Further, the graph network structure is a social network graph composed of multiple logical graphs, the graph operators include relational operators provided by Flink, and the calculation module 4 includes:

识别单元,用于识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;an identification unit for identifying whether the relation operator selected by the user is to find the largest common subgraph between the logic graphs in the social network graph;

聚集单元,用于若为寻找所述社交网络图中逻辑图之间的最大公共子图,则聚集所述社交网络图中每个所述逻辑图的顶点数量;an aggregation unit, configured to aggregate the number of vertices of each of the logical graphs in the social network graph if the largest common subgraph between the logical graphs in the social network graph is to be found;

选择单元,用于根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;a selection unit for selecting a specified logical graph higher than the minimum vertex count according to the number of vertices in each of the logical graphs;

第二得到单元,用于基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。The second obtaining unit is configured to obtain the maximum common subgraph between the logic graphs in the social network graph through a label propagation algorithm based on the specified logic graph.

本申请使用Flink数据集来表示图形,因此图形分析不仅限于预先定义好的运算符,而且还可以使用Flink提供的所有数据运算,如关系操作符、机器学习符或图形处理符。以通过关系操作符,找到社交网络中较大的社区之间的最大公共子图为例,详细说明分析计算过程。图网络结构所表示的对象通常是异构的,例如社交网络图的顶点可以表示用户、组或乐队,而关系可以表示友谊、成员资格或兴趣。即同一类型的实体对象可能具有不同的属性,即不同的用户可能提供不同的属性信息。上述社交网络图包括多个逻辑图,包含一组逻辑图L={G0,G1,G2},每个逻辑图表示社交网络图中的一个社区,比如是特定的兴趣组,例如Databases。每个逻辑图都有一个对应顶点子集和边子集,例如,V(G0)={v0,v1}和E(G0)={e0,e1}。分析逻辑图G0和G2的关系,从社交网络图的三个逻辑图的带扩展属性的社交网络图可得到,逻辑图G0关联到的顶点和边与逻辑图G2共享,可以得知G0和G2顶点和边可能重叠,因为和。本实施例通过识别社区并聚集每个社区的顶点数量,选择高于最小顶点计数的社区,即确定选定的较大的社区,最小顶点计数额由用户预先设定。最后应用标签传播算法来找到最大公共子图。上述标签传播算法是基于图的半监督学习方法,过程为从已标记的节点的标签信息,预测未标记的节点的标签信息,利用样本间的关系,建立完全图模型。每个节点标签按相似度传播给相邻节点,在节点传播的每一步,每个节点根据相邻节点的标签来更新自己的标签,与该节点相似度越大,其相邻节点对其标注的影响权值越大,相似节点的标签越趋于一致,其标签就越容易传播。在标签传播过程中,保持已标记的数据的标签不变,使其将标签传给未标注的数据。最终当迭代结束时,相似节点的概率分布趋于相似,可以划分到一类中,找到最大公共子图对应得最大独立集。This application uses Flink datasets to represent graphs, so graph analysis is not limited to pre-defined operators, but can also use all data operations provided by Flink, such as relational operators, machine learning operators or graph processing operators. Taking the relationship operator to find the largest common subgraph among larger communities in a social network as an example, the analysis and calculation process is described in detail. Objects represented by graph network structures are often heterogeneous, for example, vertices of a social network graph can represent users, groups, or bands, while relationships can represent friendships, memberships, or interests. That is, entity objects of the same type may have different attributes, that is, different users may provide different attribute information. The above social network diagram includes multiple logic diagrams, including a set of logic diagrams L={G 0 , G 1 , G 2 }, each logic diagram represents a community in the social network diagram, such as a specific interest group, such as Databases . Each logical graph has a corresponding subset of vertices and edges, eg, V(G0)={v0,v1} and E(G0)={e0,e1}. Analyzing the relationship between the logical graphs G0 and G2, it can be obtained from the social network graph with extended attributes of the three logical graphs of the social network graph. The vertices and edges associated with the logical graph G0 are shared with the logical graph G2, and it can be known that G0 and G2 Vertices and edges may overlap because and . In this embodiment, by identifying communities and gathering the number of vertices of each community, a community higher than the minimum vertex count is selected, that is, the selected larger community is determined, and the minimum vertex count amount is preset by the user. Finally a label propagation algorithm is applied to find the maximum common subgraph. The above label propagation algorithm is a graph-based semi-supervised learning method. The process is to predict the label information of unlabeled nodes from the label information of labeled nodes, and use the relationship between samples to build a complete graph model. The label of each node is propagated to adjacent nodes according to the similarity. At each step of node propagation, each node updates its label according to the label of the adjacent node. The greater the similarity with the node, the adjacent node will label it. The larger the influence weight is, the more consistent the labels of similar nodes are, and the easier its labels are to propagate. During label propagation, the label of the labeled data is kept unchanged, so that it passes the label to the unlabeled data. Finally, when the iteration ends, the probability distributions of similar nodes tend to be similar and can be divided into one class to find the largest independent set corresponding to the largest common subgraph.

进一步地,所述图预算符至少包括两个,计算模块4,包括:Further, the graph budget symbol includes at least two, and the calculation module 4 includes:

第四获取单元,用于获取多个图预算符对应的指定Flink程序;The fourth obtaining unit is used to obtain the designated Flink programs corresponding to multiple graph budget symbols;

第五获取单元,用于获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;The fifth acquisition unit is used to acquire the currently running node of the specified Flink program, and predict the data set that can perform operations concurrently;

分发单元,用于将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;The distribution unit is used to distribute the concurrently executed data sets to the designated machines in the Apache Flink framework cluster;

接收单元,用于接收各指定机器分别执行分发的数据集得到的计算结果;a receiving unit, configured to receive the calculation results obtained by the respective designated machines executing the distributed data sets;

汇总单元,用于将各计算结果汇总得到多个图预算符的分析结果;The summary unit is used for summarizing the calculation results to obtain the analysis results of multiple graph budget symbols;

第二发送单元,用于将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。The second sending unit is configured to send the analysis results of the plurality of graph budget identifiers to all nodes of the Apache Flink framework cluster through Flink broadcast.

本申请的Apache Flink框架,对来自流和批处理的数据,支持声明性定义和分布式执行分析,支持的数据包括流式的和批处理的。程序的基本组成是数据集和转换,通过对数据集进行定义和执行的任务进行抽象,即形成数据集和数据集的转换过程。数据集是任意数据对象的集合,转换描述了一个数据集到另一个数据集的数据类型的转换。例如,假设X,Y是数据集,那么转换可以看作函数t:X→Y。举例说,上述转换是映射MAP,对于每个输入的数据对象xi∈X,有且只有一个输出数据对象yi∈Y。比如,转换时归约reduce,则所有输入数据对象都被聚合到一个对象的数据集。关系型数据库中的数据转换,包括join、groupby、project、union和distinct等。本申请为了描述Flink程序应用逻辑,通过用户自定义的函数对数据转换过程进行参数化。Flink程序可以表示为多个链式转换,执行树转化时Flink框架负责程序优化、数据分发和跨机器的集群并行执行。即一个Flink脚本任务可以包括多个链式的数据变换,Flink程序在执行时会一并优化程序、完成数据分发以及在跨机器的集群上并发执行。The Apache Flink framework of this application supports declarative definition and distributed execution analysis of data from streaming and batch processing. The supported data includes streaming and batch processing. The basic components of the program are datasets and transformations, which are abstracted through the tasks of defining and executing datasets, that is, forming datasets and the transformation process of datasets. A dataset is a collection of arbitrary data objects, and a transformation describes the conversion of data types from one dataset to another. For example, assuming X, Y are datasets, then the transformation can be seen as a function t: X→Y. For example, the above transformation is a map MAP, for each input data object x i ∈ X, there is one and only one output data object y i ∈ Y. For example, when reducing reduce when transforming, all input data objects are aggregated into a dataset of objects. Data transformation in relational databases, including join, groupby, project, union, and distinct. In order to describe the Flink program application logic, this application parameterizes the data conversion process through user-defined functions. Flink programs can be represented as multiple chain transformations. When performing tree transformations, the Flink framework is responsible for program optimization, data distribution, and parallel execution across clusters of machines. That is, a Flink script task can include multiple chained data transformations, and the Flink program will optimize the program, complete data distribution, and execute concurrently on clusters across machines during execution.

进一步地,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,分发单元,包括:Further, the specified Flink program includes aggregation operations, transformation budgets and pattern matching budgets for specific logic graphs, and the distribution unit includes:

获取子单元,用于获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;an acquisition subunit for acquiring the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation;

分发子单元,用于将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;A distribution subunit, configured to distribute the data set corresponding to the specific logic diagram and the aggregation function corresponding to the aggregation operation to the first machine in the cluster, and distribute the data set corresponding to the specific logic diagram and the conversion function corresponding to the transformation operation to the second machine in the cluster, and distribute the data set corresponding to the specific logic diagram and the matching function corresponding to the pattern matching operation to the third machine in the cluster;

控制子单元,用于控制所述第一机器、所述第二机器和所述第三机器并发运行。A control subunit for controlling the first machine, the second machine and the third machine to run concurrently.

本实施例以图运算符同时包括聚合运算符、变换运算符和模式匹配运算符为例,进行详细说明Apache Flink框架集群中的机器并发执行的过程。上述聚合运算符指,将输入的特定逻辑图G映射到输出图G’。通过执行用户事先定义好的聚合函数α:L→A,L是逻辑图,A是值域,得到输出图G’是输入的特定逻辑图G的具有新属性k的修改版本,使得κ(G′,k)→α(G),κ是映射关系,把逻辑图的属性k映射到一个值。例如顶点计数:alpha=(g=>g.V.count());outGraph=inGraph.aggregate(′vertexCount′,alpha)。上述变换运算符对输入的特定逻辑图G,执行用户定义的转换函数γ:L→L,v:V→V和∈:E→E,得到输出图G′=γ(G),其中G′={v(v):v∈V(G)},E(G′)={∈(e):e∈E(G)}。上述模式匹配运算符包含两个参数,分别是模式图G*和谓词

Figure BDA0002604023300000171
模式匹配运算应用在特定逻辑图G上,返回包含所有匹配的图集合
Figure BDA0002604023300000172
In this embodiment, the process of concurrent execution by machines in the Apache Flink framework cluster is described in detail by taking the graph operator including the aggregation operator, the transformation operator, and the pattern matching operator as an example. The above aggregation operator refers to mapping the input specific logic graph G to the output graph G'. By executing a user-defined aggregation function α: L→A, where L is the logic graph and A is the value domain, the output graph G' is a modified version of the input specific logic graph G with a new attribute k, such that κ (G ′, k)→α(G), κ is the mapping relationship, which maps the attribute k of the logic diagram to a value. For example, vertex count: alpha=(g=>gVcount());outGraph=inGraph.aggregate('vertexCount', alpha). The above transformation operator performs a user-defined transformation function γ: L→L, v: V→V and ∈: E→E on the input specific logical graph G, and obtains the output graph G′=γ(G), where G′ ={v(v):v∈V(G)}, E(G')={∈(e):e∈E(G)}. The above pattern matching operator takes two parameters, the pattern graph G * and the predicate
Figure BDA0002604023300000171
The pattern matching operation is applied to a specific logical graph G and returns a set of graphs containing all matches
Figure BDA0002604023300000172

参照图3,本申请实施例中还提供一种计算机设备,该计算机设备可以是服务器,其内部结构可以如图3所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设计的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储图网络结构的分析过程需要的所有数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现图网络结构的分析方法。Referring to FIG. 3 , an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3 . The computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer design is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium, an internal memory. The nonvolatile storage medium stores an operating system, a computer program, and a database. The memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used to store all the data required for the analysis process of the graph network structure. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for analyzing a graph network structure.

上述处理器执行上述图网络结构的分析方法,根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;将所述图网络结构中的各顶点以及各条边分别映射指定数据集;根据所述执行脚本,获取与所述图分析任务相对应的图运算符;利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The above-mentioned processor executes the analysis method of the above-mentioned graph network structure, and obtains the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set is respectively Carrying the first type label and the first attribute corresponding to the vertex, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges carries the second type label and the second attribute corresponding to the edge respectively. ; Map each vertex and each edge in the graph network structure to the specified data set respectively; According to the execution script, obtain the graph operator corresponding to the graph analysis task; Utilize the graph operator to the described graph operator The specified data set corresponding to the graph network structure is analyzed and calculated, and the analysis result is obtained.

上述计算机设备,不仅建立有扩展属性的图网络结构,图表示的语义更丰富,支持多个不同属性的图,且基于有扩展属性的图网络结构,提供声明性和可组合的图运算符,在Apache FLink上实现了扩展属性的图网络结构,并将图运算转换为数据集的运算,实现分析单个图或图集合,为图形分析提供了新的功能,实现对大规模数据和复杂图形处理的能力。The above computer equipment not only establishes a graph network structure with extended attributes, but also has richer semantics of graph representation, supports multiple graphs with different attributes, and provides declarative and composable graph operators based on the graph network structure with extended attributes, The graph network structure with extended attributes is implemented on Apache FLink, and graph operations are converted into data set operations to analyze a single graph or graph collection, provide new functions for graph analysis, and realize large-scale data and complex graph processing. Ability.

在一个实施例中,上述处理器根据图分析任务的执行脚本,获取待分析的图网络结构的步骤,包括:获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;根据所述图头从指定数据库中抽取所述图头对应的图头数据集;判断所述图头数据集是否包括多个对象;若否,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。In one embodiment, the step of obtaining the graph network structure to be analyzed by the processor according to the execution script of the graph analysis task includes: obtaining the graph elements of the graph network structure to be analyzed carried in the execution script of the graph analysis task , wherein the picture element includes a picture header; extract the picture header data set corresponding to the picture header from the specified database according to the picture header; determine whether the picture header data set includes multiple objects; if not, determine The graph network structure to be analyzed is a logical graph whose graph header is a single object; vertex data and edge data associated with the logical graph are obtained from the data set, and form a vertex set and an edge set respectively; Sets and edge sets represent the logic graph, and the graph network structure to be analyzed is obtained.

在一个实施例中,指定数据集包括Flink数据集,上述处理器根据所述执行脚本,获取与所述图分析任务相对应的图运算符的步骤之后,包括:获取各所述图运算符的执行顺序;按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。In one embodiment, the specified data set includes a Flink data set, and after the step of obtaining the graph operator corresponding to the graph analysis task according to the execution script, the processor includes: obtaining the graph operator of each graph operator. Execution sequence; according to the execution sequence, each of the graph operators is linked into operation logic through the Java programming language; the operation logic is associated with the trigger of the Flink trigger program to form a Flink operation program.

在一个实施例中,所述图运算符包括排除运算符,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集;通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。In one embodiment, the graph operator includes an exclusion operator, and the processor uses the graph operator to perform analysis and calculation on a specified data set corresponding to the graph network structure, and the step of obtaining an analysis result includes: obtaining a first The first vertex set and the first edge set corresponding to a logical graph are obtained, and the second vertex set and the second edge set corresponding to the second logical graph are obtained; the function ID_ONLY is used to extract the second logical graph from the second logical graph. identifier to form the data set corresponding to the new id object; filter the specified vertex and the specified edge by the exclusion operator, and obtain the exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first The vertex set is not included in the second vertex set, and the specified edge is really included in the first edge set; the excluded result data set is sent to all nodes of the Apache Flink framework cluster through Flink broadcast.

在一个实施例中,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;若是,则聚集所述社交网络图中每个所述逻辑图的顶点数量;根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。In one embodiment, the graph network structure is a social network graph composed of multiple logical graphs, the graph operators include relational operators provided by Flink, and the above-mentioned processor uses the graph operators to analyze the graph network structure. The corresponding designated data set is analyzed and calculated, and the steps of obtaining the analysis result include: identifying whether the relationship operator selected by the user is to find the largest common subgraph between the logic diagrams in the social network diagram; The number of vertices of each of the logical graphs in the social network graph; according to the number of vertices of each of the logical graphs, a specified logical graph higher than the minimum vertex count is selected; based on the specified logical graph, the tag propagation algorithm is used to obtain the The largest common subgraph between logical graphs in a social network graph.

在一个实施例中,所述图预算符至少包括两个,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:获取多个图预算符对应的指定Flink程序;获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;接收各指定机器分别执行分发的数据集得到的计算结果;将各计算结果汇总得到多个图预算符的分析结果;将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。In one embodiment, the graph budget operator includes at least two, and the processor uses the graph operator to perform analysis and calculation on a specified data set corresponding to the graph network structure, and the step of obtaining an analysis result includes: obtaining multiple data sets. The designated Flink program corresponding to each graph budget symbol; obtain the current running node of the designated Flink program, and predict the data set that can execute the operation concurrently; distribute the data set that can execute the operation concurrently to the designated machine in the Apache Flink framework cluster; receive each designated Flink program The machine executes the calculation results obtained from the distributed datasets separately; aggregates the calculation results to obtain the analysis results of multiple graph budget symbols; sends the analysis results of the multiple graph budget symbols to all members of the Apache Flink framework cluster through Flink broadcast node.

在一个实施例中,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,上述处理器将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器的步骤,包括:获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;控制所述第一机器、所述第二机器和所述第三机器并发运行。In one embodiment, the designated Flink program includes aggregation operations, transformation budgets, and pattern matching budgets on a specific logic graph, and the above-mentioned processor distributes the data sets that perform operations concurrently to the designated machines in the Apache Flink framework cluster. Including: acquiring the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation; and the data set corresponding to the specific logic diagram and the aggregation operation. The aggregation function is distributed to the first machine in the cluster, the data set corresponding to the specific logic diagram and the transformation function corresponding to the transformation operation are distributed to the second machine in the cluster, and the data set corresponding to the specific logic diagram is matched with the pattern The matching function corresponding to the operation is distributed to the third machine in the cluster; the first machine, the second machine and the third machine are controlled to run concurrently.

本领域技术人员可以理解,图3中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定。Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.

本申请一实施例还提供一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现图网络结构的分析方法,根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;将所述图网络结构中的各顶点以及各条边分别映射指定数据集;根据所述执行脚本,获取与所述图分析任务相对应的图运算符;利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a method for analyzing a graph network structure is implemented, and a graph to be analyzed is obtained according to an execution script of a graph analysis task. A network structure, wherein the graph network structure includes a vertex set and an edge set, each vertex in the vertex set carries a first type label and a first attribute corresponding to the vertex, and each edge in the edge set is associated with The two vertices of the relationship are connected to form, and each of the edges respectively carries the second type label and the second attribute corresponding to the edge; each vertex and each edge in the graph network structure are mapped to the specified data set respectively; according to the execution A script is used to obtain a graph operator corresponding to the graph analysis task; using the graph operator to perform analysis and calculation on a specified data set corresponding to the graph network structure, an analysis result is obtained.

上述计算机可读存储介质,不仅建立有扩展属性的图网络结构,图表示的语义更丰富,支持多个不同属性的图,且基于有扩展属性的图网络结构,提供声明性和可组合的图运算符,在Apache FLink上实现了扩展属性的图网络结构,并将图运算转换为数据集的运算,实现分析单个图或图集合,为图形分析提供了新的功能,实现对大规模数据和复杂图形处理的能力。The above computer-readable storage medium not only establishes a graph network structure with extended attributes, but also has richer semantics of graph representation, supports multiple graphs with different attributes, and provides declarative and composable graphs based on the graph network structure with extended attributes. The operator implements a graph network structure with extended attributes on Apache FLink, and converts graph operations into data set operations, realizing the analysis of a single graph or graph collection, providing new functions for graph analysis, and realizing large-scale data and graph analysis. The capability of complex graphics processing.

在一个实施例中,上述处理器根据图分析任务的执行脚本,获取待分析的图网络结构的步骤,包括:获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;根据所述图头从指定数据库中抽取所述图头对应的图头数据集;判断所述图头数据集是否包括多个对象;若否,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。In one embodiment, the step of obtaining the graph network structure to be analyzed by the processor according to the execution script of the graph analysis task includes: obtaining the graph elements of the graph network structure to be analyzed carried in the execution script of the graph analysis task , wherein the picture element includes a picture header; extract the picture header data set corresponding to the picture header from the specified database according to the picture header; determine whether the picture header data set includes multiple objects; if not, determine The graph network structure to be analyzed is a logical graph whose graph header is a single object; vertex data and edge data associated with the logical graph are obtained from the data set, and form a vertex set and an edge set respectively; Sets and edge sets represent the logic graph, and the graph network structure to be analyzed is obtained.

在一个实施例中,指定数据集包括Flink数据集,上述处理器根据所述执行脚本,获取与所述图分析任务相对应的图运算符的步骤之后,包括:获取各所述图运算符的执行顺序;按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。In one embodiment, the specified data set includes a Flink data set, and after the step of obtaining the graph operator corresponding to the graph analysis task according to the execution script, the processor includes: obtaining the graph operator of each graph operator. Execution sequence; according to the execution sequence, each of the graph operators is linked into operation logic through the Java programming language; the operation logic is associated with the trigger of the Flink trigger program to form a Flink operation program.

在一个实施例中,所述图运算符包括排除运算符,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集;通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。In one embodiment, the graph operator includes an exclusion operator, and the processor uses the graph operator to perform analysis and calculation on a specified data set corresponding to the graph network structure, and the step of obtaining an analysis result includes: obtaining a first The first vertex set and the first edge set corresponding to a logical graph are obtained, and the second vertex set and the second edge set corresponding to the second logical graph are obtained; the function ID_ONLY is used to extract the second logical graph from the second logical graph. identifier to form the data set corresponding to the new id object; filter the specified vertex and the specified edge by the exclusion operator, and obtain the exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first The vertex set is not included in the second vertex set, and the specified edge is really included in the first edge set; the excluded result data set is sent to all nodes of the Apache Flink framework cluster through Flink broadcast.

在一个实施例中,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;若是,则聚集所述社交网络图中每个所述逻辑图的顶点数量;根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。In one embodiment, the graph network structure is a social network graph composed of multiple logical graphs, the graph operators include relational operators provided by Flink, and the above-mentioned processor uses the graph operators to analyze the graph network structure. The corresponding designated data set is analyzed and calculated, and the steps of obtaining the analysis result include: identifying whether the relationship operator selected by the user is to find the largest common subgraph between the logic diagrams in the social network diagram; The number of vertices of each of the logical graphs in the social network graph; according to the number of vertices of each of the logical graphs, a specified logical graph higher than the minimum vertex count is selected; based on the specified logical graph, the tag propagation algorithm is used to obtain the The largest common subgraph between logical graphs in a social network graph.

在一个实施例中,所述图预算符至少包括两个,上述处理器利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:获取多个图预算符对应的指定Flink程序;获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;接收各指定机器分别执行分发的数据集得到的计算结果;将各计算结果汇总得到多个图预算符的分析结果;将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。In one embodiment, the graph budget operator includes at least two, and the processor uses the graph operator to perform analysis and calculation on a specified data set corresponding to the graph network structure, and the step of obtaining an analysis result includes: obtaining multiple data sets. The designated Flink program corresponding to each graph budget symbol; obtain the current running node of the designated Flink program, and predict the data set that can execute the operation concurrently; distribute the data set that can execute the operation concurrently to the designated machine in the Apache Flink framework cluster; receive each designated Flink program The machine executes the calculation results obtained from the distributed datasets separately; aggregates the calculation results to obtain the analysis results of multiple graph budget symbols; sends the analysis results of the multiple graph budget symbols to all members of the Apache Flink framework cluster through Flink broadcast node.

在一个实施例中,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,上述处理器将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器的步骤,包括:获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;控制所述第一机器、所述第二机器和所述第三机器并发运行。In one embodiment, the designated Flink program includes aggregation operations, transformation budgets, and pattern matching budgets on a specific logic graph, and the above-mentioned processor distributes the data sets that perform operations concurrently to the designated machines in the Apache Flink framework cluster. Including: acquiring the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation; and the data set corresponding to the specific logic diagram and the aggregation operation. The aggregation function is distributed to the first machine in the cluster, the data set corresponding to the specific logic diagram and the transformation function corresponding to the transformation operation are distributed to the second machine in the cluster, and the data set corresponding to the specific logic diagram is matched with the pattern The matching function corresponding to the operation is distributed to the third machine in the cluster; the first machine, the second machine and the third machine are controlled to run concurrently.

本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,上述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的和实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可以包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双速据率SDRAM(SSRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。Those of ordinary skill in the art can understand that all or part of the process in the method of the above-mentioned embodiments can be implemented by instructing the relevant hardware through a computer program, and the above-mentioned computer program can be stored in a non-volatile computer-readable storage medium , when the computer program is executed, it may include the processes of the above-mentioned method embodiments. Wherein, any reference to memory, storage, database or other medium provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double-rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。It should be noted that, herein, the terms "comprising", "comprising" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, device, article or method comprising a series of elements includes not only those elements, It also includes other elements not expressly listed or inherent to such a process, apparatus, article or method. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.

以上所述仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only the preferred embodiments of the present application, and are not intended to limit the scope of the patent of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, or directly or indirectly applied to other related The technical field is similarly included in the scope of patent protection of this application.

Claims (10)

1.一种图网络结构的分析方法,其特征在于,包括:1. an analysis method of graph network structure, is characterized in that, comprises: 根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;Obtain the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the first type label and the first type label corresponding to the vertex respectively. attribute, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges carries the second type label and the second attribute corresponding to the edge respectively; 将所述图网络结构中的各顶点以及各条边分别映射指定数据集;Map each vertex and each edge in the graph network structure to the specified data set respectively; 根据所述执行脚本,获取与所述图分析任务相对应的图运算符;obtaining a graph operator corresponding to the graph analysis task according to the execution script; 利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The specified data set corresponding to the graph network structure is analyzed and calculated by using the graph operator, and an analysis result is obtained. 2.根据权利要求1所述的图网络结构的分析方法,其特征在于,所述根据图分析任务的执行脚本,获取待分析的图网络结构的步骤,包括:2. The method for analyzing a graph network structure according to claim 1, wherein the step of obtaining the graph network structure to be analyzed according to the execution script of the graph analysis task, comprises: 获取图分析任务的执行脚本中携带的所述待分析的图网络结构的图元素,其中,所述图元素包括图头;obtaining the graph element of the graph network structure to be analyzed carried in the execution script of the graph analysis task, wherein the graph element includes a graph header; 根据所述图头从指定数据库中抽取所述图头对应的图头数据集;Extract the image header data set corresponding to the image header from the specified database according to the image header; 判断所述图头数据集是否包括多个对象;judging whether the image header data set includes multiple objects; 若否,则判定所述待分析的图网络结构为图头为单个对象的逻辑图;If not, it is determined that the graph network structure to be analyzed is a logical graph whose graph header is a single object; 从所述数据集中获取与所述逻辑图关联的顶点数据和边数据,分别对应组成顶点集和边集;Acquiring vertex data and edge data associated with the logic graph from the data set, respectively forming a vertex set and an edge set; 通过所述顶点集和边集表示所述逻辑图,得到所述待分析的图网络结构。The logic graph is represented by the vertex set and the edge set, and the graph network structure to be analyzed is obtained. 3.根据权利要求1所述的图网络结构的分析方法,其特征在于,指定数据集包括Flink数据集,所述根据所述执行脚本,获取与所述图分析任务相对应的图运算符的步骤之后,包括:3. The method for analyzing a graph network structure according to claim 1, wherein the specified data set comprises a Flink data set, and according to the execution script, the graph operator corresponding to the graph analysis task is obtained. After the steps, including: 获取各所述图运算符的执行顺序;Obtain the execution order of each of the graph operators; 按照所述执行顺序将各所述图运算符,通过Java编程语言链接为运算逻辑;According to the execution order, each of the graph operators is linked into operation logic through the Java programming language; 将所述运算逻辑与Flink触发程序的触发器关联,形成Flink运算程序。The Flink operation program is formed by associating the operation logic with the trigger of the Flink trigger program. 4.根据权利要求1所述的图网络结构的分析方法,其特征在于,所述图运算符包括排除运算符,利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:4 . The method for analyzing a graph network structure according to claim 1 , wherein the graph operator comprises an exclusion operator, and the graph operator is used to analyze and calculate a specified data set corresponding to the graph network structure. 5 . , the steps to get the analysis result include: 获取第一逻辑图对应的第一顶点集和第一边集,获取第二逻辑图对应的第二顶点集和第二边集;obtaining a first vertex set and a first edge set corresponding to the first logical graph, and obtaining a second vertex set and a second edge set corresponding to the second logical graph; 通过函数ID_ONLY从第二逻辑图中,提取所述第二逻辑图的标识符,形成新id对象对应的数据集;The identifier of the second logic diagram is extracted from the second logic diagram by the function ID_ONLY to form a data set corresponding to the new id object; 通过所述排除运算符筛选指定顶点和指定边,得到所述排除运算符对应的排除结果数据集,其中,所述指定顶点包含于所述第一顶点集,不包含于所述第二顶点集,所述指定边真包含于所述第一边集;The specified vertex and the specified edge are filtered by the exclusion operator to obtain the exclusion result data set corresponding to the exclusion operator, wherein the specified vertex is included in the first vertex set and not included in the second vertex set , the specified edge is really included in the first edge set; 通过Flink广播将所述排除结果数据集发送至Apache Flink框架集群的所有节点。The exclusion result dataset is sent to all nodes of the Apache Flink framework cluster via Flink broadcast. 5.根据权利要求1所述的图网络结构的分析方法,其特征在于,所述图网络结构为多个逻辑图组成的社交网络图,所述图运算符包括Flink提供的关系操作符,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:5. The method for analyzing a graph network structure according to claim 1, wherein the graph network structure is a social network graph composed of a plurality of logical graphs, and the graph operators include relational operators provided by Flink, so Describe the steps of using the graph operator to analyze and calculate the specified data set corresponding to the graph network structure to obtain the analysis result, including: 识别用户选定的关系操作符是否为寻找所述社交网络图中逻辑图之间的最大公共子图;Identifying whether the relational operator selected by the user is to find the largest common subgraph between the logical graphs in the social network graph; 若是,则聚集所述社交网络图中每个所述逻辑图的顶点数量;If so, aggregate the number of vertices of each of the logical graphs in the social network graph; 根据每个所述逻辑图的顶点数量,选择高于最小顶点计数的指定逻辑图;according to the number of vertices in each of said logical graphs, selecting a specified logical graph higher than the minimum vertex count; 基于所述指定逻辑图,通过标签传播算法得到所述社交网络图中逻辑图之间的最大公共子图。Based on the specified logic graph, the maximum common subgraph between the logic graphs in the social network graph is obtained through a label propagation algorithm. 6.根据权利要求1所述的图网络结构的分析方法,其特征在于,所述图预算符至少包括两个,所述利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果的步骤,包括:6 . The method for analyzing a graph network structure according to claim 1 , wherein the graph budget operator comprises at least two, and the graph operator is used to perform an analysis on a specified data set corresponding to the graph network structure. 7 . The steps of analyzing and calculating to obtain the analysis result include: 获取多个图预算符对应的指定Flink程序;Get the specified Flink program corresponding to multiple graph budget symbols; 获取所述指定Flink程序当前运行节点,预测可并发执行运算的数据集;Obtain the current running node of the specified Flink program, and predict the data set that can perform operations concurrently; 将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器;Distribute the concurrently executed data sets to designated machines in the Apache Flink framework cluster; 接收各指定机器分别执行分发的数据集得到的计算结果;Receive the calculation results obtained by each designated machine executing the distributed data set respectively; 将各计算结果汇总得到多个图预算符的分析结果;Summarize the calculation results to obtain the analysis results of multiple graph budget symbols; 将所述多个图预算符的分析结果,通过Flink广播发送至Apache Flink框架集群的所有节点。The analysis results of the multiple graph budget identifiers are sent to all nodes of the Apache Flink framework cluster through Flink broadcast. 7.根据权利要求6所述的图网络结构的分析方法,其特征在于,所述指定Flink程序包括对特定逻辑图的聚合运算、变换预算和模式匹配预算,所述将并发执行运算的数据集分发至Apache Flink框架集群中的指定机器的步骤,包括:7. The method for analyzing a graph network structure according to claim 6, wherein the specified Flink program includes an aggregate operation, a transformation budget and a pattern matching budget for a specific logic graph, and the data set that will execute the operation concurrently The steps to distribute to specified machines in the Apache Flink framework cluster include: 获取所述特定逻辑图对应的数据集、聚合运算对应的聚合函数、变换运算对应的转换函数以及模式匹配运算对应的匹配函数;acquiring the data set corresponding to the specific logic diagram, the aggregation function corresponding to the aggregation operation, the conversion function corresponding to the transformation operation, and the matching function corresponding to the pattern matching operation; 将所述特定逻辑图对应的数据集和聚合运算对应的聚合函数分发至集群中的第一机器,将所述特定逻辑图对应的数据集和变换运算对应的转换函数分发至集群中的第二机器,将所述特定逻辑图对应的数据集和模式匹配运算对应的匹配函数分发至集群中的第三机器;Distribute the data set corresponding to the specific logic diagram and the aggregation function corresponding to the aggregation operation to the first machine in the cluster, and distribute the data set corresponding to the specific logic diagram and the transformation function corresponding to the transformation operation to the second machine in the cluster. a machine that distributes the data set corresponding to the specific logic diagram and the matching function corresponding to the pattern matching operation to the third machine in the cluster; 控制所述第一机器、所述第二机器和所述第三机器并发运行。The first machine, the second machine and the third machine are controlled to operate concurrently. 8.一种图网络结构的分析装置,其特征在于,包括:8. An analysis device of a graph network structure, characterized in that, comprising: 第一获取模块,用于根据图分析任务的执行脚本,获取待分析的图网络结构,其中,所述图网络结构包括顶点集和边集,所述顶点集中的各顶点分别携带该顶点对应的第一类型标签和第一属性,所述边集中的各条边由具有关联关系的两个顶点相连形成,各所述边分别携带边对应的第二类型标签和第二属性;The first acquisition module is used to acquire the graph network structure to be analyzed according to the execution script of the graph analysis task, wherein the graph network structure includes a vertex set and an edge set, and each vertex in the vertex set carries the corresponding A first type label and a first attribute, each edge in the edge set is formed by connecting two vertices with an associated relationship, and each of the edges respectively carries the second type label and the second attribute corresponding to the edge; 映射模块,用于将所述图网络结构中的各顶点以及各条边分别映射指定数据集;a mapping module, used to map each vertex and each edge in the graph network structure to the specified data set respectively; 第二获取模块,用于根据所述执行脚本,获取与所述图分析任务相对应的图运算符;a second obtaining module, configured to obtain a graph operator corresponding to the graph analysis task according to the execution script; 计算模块,用于利用所述图运算符对所述图网络结构对应的指定数据集进行分析计算,得到分析结果。The calculation module is configured to perform analysis and calculation on the specified data set corresponding to the graph network structure by using the graph operator to obtain an analysis result. 9.一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至7中任一项所述方法的步骤。9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when the processor executes the computer program . 10.一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至7中任一项所述的方法的步骤。10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
CN202010733106.8A 2020-07-27 2020-07-27 Analysis method, device and computer equipment for graph network structure Pending CN111814006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010733106.8A CN111814006A (en) 2020-07-27 2020-07-27 Analysis method, device and computer equipment for graph network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010733106.8A CN111814006A (en) 2020-07-27 2020-07-27 Analysis method, device and computer equipment for graph network structure

Publications (1)

Publication Number Publication Date
CN111814006A true CN111814006A (en) 2020-10-23

Family

ID=72862680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010733106.8A Pending CN111814006A (en) 2020-07-27 2020-07-27 Analysis method, device and computer equipment for graph network structure

Country Status (1)

Country Link
CN (1) CN111814006A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145771A1 (en) * 2007-03-15 2010-06-10 Ariel Fligler System and method for providing service or adding benefit to social networks
US20160188391A1 (en) * 2014-12-31 2016-06-30 International Business Machines Corporation Sophisticated run-time system for graph processing
US20170116764A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Dynamic interaction graphs with probabilistic edge decay
CN107402745A (en) * 2017-07-04 2017-11-28 清华大学 The mapping method and device of DFD
US20180189388A1 (en) * 2017-01-05 2018-07-05 International Business Machines Corporation Representation of a data analysis using a flow graph
CN108351941A (en) * 2015-11-02 2018-07-31 日本电信电话株式会社 Analytical equipment, analysis method and analysis program
US20190057138A1 (en) * 2017-08-18 2019-02-21 Vmware, Inc. Presenting a temporal topology graph of a computing environment at a graphical user interface
CN110110154A (en) * 2018-02-01 2019-08-09 腾讯科技(深圳)有限公司 A kind of processing method of map file, device and storage medium
CN111026544A (en) * 2019-11-06 2020-04-17 中国科学院深圳先进技术研究院 Node classification method and device of graph network model and terminal equipment
EP3664374A1 (en) * 2018-12-06 2020-06-10 Fujitsu Limited Data stream processing method and data stream processing system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145771A1 (en) * 2007-03-15 2010-06-10 Ariel Fligler System and method for providing service or adding benefit to social networks
US20160188391A1 (en) * 2014-12-31 2016-06-30 International Business Machines Corporation Sophisticated run-time system for graph processing
US20170116764A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Dynamic interaction graphs with probabilistic edge decay
CN108351941A (en) * 2015-11-02 2018-07-31 日本电信电话株式会社 Analytical equipment, analysis method and analysis program
US20180189388A1 (en) * 2017-01-05 2018-07-05 International Business Machines Corporation Representation of a data analysis using a flow graph
CN107402745A (en) * 2017-07-04 2017-11-28 清华大学 The mapping method and device of DFD
US20190057138A1 (en) * 2017-08-18 2019-02-21 Vmware, Inc. Presenting a temporal topology graph of a computing environment at a graphical user interface
CN110110154A (en) * 2018-02-01 2019-08-09 腾讯科技(深圳)有限公司 A kind of processing method of map file, device and storage medium
EP3664374A1 (en) * 2018-12-06 2020-06-10 Fujitsu Limited Data stream processing method and data stream processing system
CN111026544A (en) * 2019-11-06 2020-04-17 中国科学院深圳先进技术研究院 Node classification method and device of graph network model and terminal equipment

Similar Documents

Publication Publication Date Title
CN111061859B (en) Data processing methods, devices and computer equipment based on knowledge graphs
JP7344327B2 (en) System and method for metadata-driven external interface generation of application programming interfaces
US8655805B2 (en) Method for classification of objects in a graph data stream
WO2020088439A1 (en) Method for identifying isomerism graph and molecular spatial structural property, device, and computer apparatus
WO2020048058A1 (en) Fund knowledge inference method and system, computer device, and storage medium
US20150039611A1 (en) Discovery of related entities in a master data management system
CN104077723B (en) A kind of social networks commending system and method
US11204707B2 (en) Scalable binning for big data deduplication
CN104809244B (en) Data digging method and device under a kind of big data environment
Marchant et al. d-blink: Distributed end-to-end Bayesian entity resolution
CN106815307A (en) Public Culture knowledge mapping platform and its use method
CN110020176A (en) A kind of resource recommendation method, electronic equipment and computer readable storage medium
US20180121526A1 (en) Method, apparatus, and computer-readable medium for non-structured data profiling
Belcastro et al. ParSoDA: high-level parallel programming for social data mining
Braun et al. MapReduce-based complex big data analytics over uncertain and imprecise social networks
Carletti et al. Interpretable anomaly detection with diffi: Depth-based isolation forest feature importance
Zhan et al. Identification of top-K influential communities in big networks
CN114493853A (en) Credit rating evaluation method, device, electronic device and storage medium
Jafarzadegan et al. An agglomerative hierarchical clustering framework for improving the ensemble clustering process
Jovanovič et al. Big-data analytics: a critical review and some future directions
Wei et al. Graph learning and its advancements on large language models: A holistic survey
Venkatesan et al. CI/CD Pipelines for Model Training: Reducing Turnaround Time in Offline Model Training with Hive and Spark
CN117725220A (en) Method, server and storage medium for document characterization and document retrieval
Belcastro et al. A parallel library for social media analytics
CN110941662A (en) Graphical method, system, storage medium, and terminal for scientific research cooperation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240126