CN116166634A

CN116166634A - Data blood relationship graph construction method and device, storage medium and electronic equipment

Info

Publication number: CN116166634A
Application number: CN202310130931.2A
Authority: CN
Inventors: 陈定玮
Original assignee: Feisuanzhi Technology Shenzhen Co ltd
Current assignee: Feisuanzhi Technology Shenzhen Co ltd
Priority date: 2023-02-09
Filing date: 2023-02-09
Publication date: 2023-05-26

Abstract

The present disclosure relates to a method, device, storage medium, and electronic device for constructing a data lineage graph. The method includes: acquiring task component information, the task component information including the execution order of multiple task components and the configuration information of the multiple task components; based on the execution order, determining in turn, according to the configuration information of each task component, the field association relationship corresponding to that task component, where the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information; and determining a data lineage graph according to each field association relationship. A data lineage graph constructed in this way shows at a glance the correspondence between databases, between data tables, and/or between data fields, and thereby supports the later construction and maintenance of the system.

Description

Method, device, storage medium and electronic equipment for constructing data kinship graph

Technical Field

The present disclosure relates to the technical field of data processing and, in particular, to a method, device, storage medium, and electronic device for constructing a data lineage graph.

Background

In actual data development and governance, data must be extracted from different data sources, then processed and/or format-converted according to certain data processing rules to obtain a target data table or target file containing the target data, and finally loaded to form the table data the user requires.

As an enterprise's business expands, the number of data tables in the system keeps growing, and it becomes difficult for development and governance personnel to stay familiar with the relationships between databases, between data tables, and between data fields. This hinders the later construction and maintenance of the system.

Summary of the Invention

The purpose of the present disclosure is to provide a method, device, storage medium, and electronic device for constructing a data lineage graph, so as to solve the above technical problems.

To achieve the above purpose, a first aspect of the present disclosure provides a method for constructing a data lineage graph, the method comprising:

acquiring task component information, the task component information including the execution order of multiple task components and the configuration information of the multiple task components;

based on the execution order, determining in turn, according to the configuration information of each task component, the field association relationship corresponding to that task component, wherein the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information; and

determining a data lineage graph according to each field association relationship.

Optionally, determining the field association relationship corresponding to the task component according to the configuration information of each task component includes:

for the configuration information of each task component, performing the following steps:

determining, according to the configuration information, a source table field information identifier, a target table field information identifier, and the correspondence between the source table field information identifier and the target table field information identifier; and

binding the source table field information identifier and the target table field information identifier according to the correspondence to obtain the field association relationship.

Optionally, determining the source table field information identifier according to the configuration information includes:

when the configuration information includes source database information, determining source table field information and a source table identifier according to the source database information, and determining the source table field information identifier according to the source table field information and the source table identifier;

when the configuration information includes a structured query statement and does not include source database information, parsing the structured query statement to obtain source table field information and a source table name, determining a source table identifier according to the source table name, and determining the source table field information identifier according to the source table field information and the source table identifier.

Optionally, parsing the structured query statement to obtain the source table field information and the source table name includes:

performing statement parsing on the structured query statement to obtain a statement parsing result;

determining the statement type of the structured query statement according to the statement parsing result; and

determining a statement parsing strategy according to the statement type, and parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name.

Optionally, when the statement type is a select statement type, parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name includes:

determining an operation instruction according to the structured query statement;

when the operation instruction is a union operation instruction, splitting the structured query statement to obtain target structured query statements whose operation instruction is a select operation instruction, and determining the source table field information and the source table name according to the target structured query statements;

when the operation instruction is a select operation instruction, determining the source table field information and the source table name according to the structured query statement.
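The union branch above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function name `split_union` is hypothetical, and the regular expression is a deliberately naive stand-in for a real SQL parser (it assumes the `UNION` keyword never appears inside string literals or nested subqueries).

```python
import re


def split_union(sql):
    """Split a statement on UNION / UNION ALL into its SELECT parts.

    Naive illustration only: it does not understand quoting or subqueries.
    """
    parts = re.split(r"\bUNION(?:\s+ALL)?\b", sql, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


pieces = split_union("SELECT a FROM t1 UNION ALL SELECT b FROM t2")
single = split_union("SELECT a FROM t1")
print(pieces)
print(single)
```

A statement without a union comes back as a one-element list, so both branches can feed the same downstream step that reads fields and table names from a plain select statement.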

Optionally, when the statement type is an embedded statement type or a create statement type, parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name includes:

determining whether a select statement exists in the structured query statement;

when no select statement exists in the structured query statement, determining the source table field information and the source table name according to the structured query statement;

when a select statement exists in the structured query statement, determining an operation instruction according to the structured query statement;

Optionally, when the configuration information includes information indicating that the source table field information cannot be queried through a database connection protocol, determining the source table field information identifier according to the configuration information includes:

determining the task type of the task component according to the configuration information;

when the task type is a dictionary table loading task, determining source table field information and a source table identifier from a target database according to the configuration information, and determining the source table field information identifier according to the source table field information and the source table identifier;

when the task type is a conversion task, determining a structured query statement according to the configuration information, and parsing the structured query statement to obtain source table fields and a source table name;

determining a source table identifier according to the source table name, and determining the field type of each source table field from the already-parsed data according to the source table identifier and the source table field; and

determining the source table field information according to the source table field and the field type, and determining the source table field information identifier according to the source table field information and the source table identifier.

Optionally, determining the data lineage graph according to each field association relationship includes:

for each field association relationship, determining the corresponding source table field information identifier and target table field information identifier, determining a source table identifier according to the source table field information identifier, and determining a target table identifier according to the target table field information identifier;

constructing a first data table whose first data are the source table field information identifiers sharing the same source table identifier and whose first table name is that source table identifier, and constructing a second data table whose second data are the target table field information identifiers sharing the same target table identifier and whose second table name is that target table identifier; and

binding the first data in the first data table and the second data in the second data table according to the field association relationship to obtain the data lineage graph.
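The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function `build_lineage_graph` and the `table_of` lookup (a mapping from a field information identifier to its table identifier) are hypothetical names, and the graph is represented simply as two dictionaries of tables plus a list of edges.

```python
from collections import defaultdict


def build_lineage_graph(associations, table_of):
    """Group field identifiers by table and link them along each association.

    associations: iterable of (source_field_id, target_field_id) pairs.
    table_of: hypothetical lookup from a field identifier to its table identifier.
    """
    source_tables = defaultdict(list)  # first data tables, keyed by source table id
    target_tables = defaultdict(list)  # second data tables, keyed by target table id
    edges = []
    for src_field, dst_field in associations:
        src_table, dst_table = table_of[src_field], table_of[dst_field]
        if src_field not in source_tables[src_table]:
            source_tables[src_table].append(src_field)
        if dst_field not in target_tables[dst_table]:
            target_tables[dst_table].append(dst_field)
        edges.append((src_field, dst_field))  # the binding between the two tables
    return dict(source_tables), dict(target_tables), edges


associations = [("f_a", "f_x"), ("f_b", "f_x"), ("f_c", "f_y")]
table_of = {"f_a": "t_src1", "f_b": "t_src1", "f_c": "t_src2",
            "f_x": "t_dst", "f_y": "t_dst"}
source_tables, target_tables, edges = build_lineage_graph(associations, table_of)
print(source_tables, target_tables, edges)
```

Keeping tables and edges separate lets a renderer draw each table as a box listing its fields and each edge as a line between two fields.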

A second aspect of the present disclosure provides a device for constructing a data lineage graph, the device comprising:

an acquisition module, configured to acquire task component information, the task component information including the execution order of multiple task components and the configuration information of the multiple task components;

a first determination module, configured to determine in turn, based on the execution order and according to the configuration information of each task component, the field association relationship corresponding to that task component, wherein the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information; and

a second determination module, configured to determine a data lineage graph according to each field association relationship.

A third aspect of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of any one of the methods of the first aspect.

A fourth aspect of the present disclosure provides an electronic device, comprising:

a memory on which a computer program is stored; and

a processor, configured to execute the computer program in the memory so as to implement the steps of any one of the methods of the first aspect.

With the above technical solution, the field association relationships between source table field information and target table field information are extracted, and a data lineage graph is constructed from them. On the one hand, the graph shows at a glance the correspondence between databases, between data tables, and/or between data fields, which supports the later construction and maintenance of the system. On the other hand, the graph visualizes the databases and data tables involved in data development, so the origin of every data field in the graph is immediately apparent; there is no longer any need to consult the data developers concerned, which reduces the project's dependence on individual personnel.

Other features and advantages of the present disclosure are described in detail in the Detailed Description that follows.

Brief Description of the Drawings

The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:

Fig. 1 is a flowchart of a method for constructing a data lineage graph according to an exemplary embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a data lineage graph according to an exemplary embodiment of the present disclosure;

Fig. 3 is a structural block diagram of a device for constructing a data lineage graph according to an exemplary embodiment of the present disclosure;

Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.

As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that the modifiers "one" and "multiple" used in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".

First, the application scenario of the present disclosure is described. In data development and governance, data must be extracted from different data sources, then processed and/or format-converted according to certain data processing rules to obtain target data. The target data is then written or transmitted to a target data table or target file, yielding a target data table or target file that contains the target data. Finally, the table data the user requires is formed by loading the target data table or target file. However, as an enterprise's business expands, the number of data tables in the system keeps growing, and it becomes difficult for development and governance personnel to stay familiar with the relationships between databases, between data tables, and/or between data fields, which hinders the later construction and maintenance of the system.

In view of this, embodiments of the present disclosure provide a method, device, storage medium, and electronic device for constructing a data lineage graph. The field association relationships between source table field information and target table field information are extracted, and a data lineage graph is constructed from them, so that the correspondence between databases, between data tables, and/or between data fields can be seen at a glance, thereby supporting the later construction and maintenance of the system.

The embodiments of the present disclosure are further explained below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a method for constructing a data lineage graph according to an exemplary embodiment of the present disclosure. Referring to Fig. 1, the method may include the following steps:

S101: acquire task component information, the task component information including the execution order of multiple task components and the configuration information of the multiple task components.

It should be understood that a task component is the smallest unit of a task, and each task component completes one step or one function of that task. When a task includes multiple task components, the components can be sorted according to the order in which the task processes data, which gives the execution order of the task components. When there are multiple tasks, the execution order of the tasks can be determined first, and the execution order of each task's components is then determined task by task in that order.

It should also be understood that the configuration information of each task component can be configured according to the function of that component, and the embodiments of the present disclosure impose no restriction on this. In a possible implementation, the configuration information of each task component may include the source table field information the component needs to process, the location from which the source table field information is read, the location where the target table field information is stored, and so on. For example, the configuration information may include source database information (source database address, source database name, and/or source table name), a structured query statement, and/or target database information (target database address, target database name, and/or target table name).
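As a rough illustration of the task component information described in S101, the following Python sketch models tasks, components, and their configuration, and flattens them into one execution order. All class names, field names, and example addresses are hypothetical; the disclosure does not prescribe a data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentConfig:
    # Any of the three parts may be absent, mirroring the "and/or" in the text.
    source_db: Optional[dict] = None  # address, database name, table name
    sql: Optional[str] = None         # structured query statement
    target_db: Optional[dict] = None  # address, database name, table name


@dataclass
class TaskComponent:
    name: str
    config: ComponentConfig = field(default_factory=ComponentConfig)


@dataclass
class Task:
    name: str
    components: list  # listed in the order the task processes data


def execution_order(tasks):
    """Flatten tasks (given in execution order) into one component sequence."""
    return [component for task in tasks for component in task.components]


tasks = [
    Task("extract", [
        TaskComponent("read_orders", ComponentConfig(
            source_db={"address": "db1.example:3306", "database": "sales", "table": "orders"})),
        TaskComponent("clean_orders", ComponentConfig(sql="SELECT id, amount FROM orders")),
    ]),
    Task("load", [
        TaskComponent("write_orders", ComponentConfig(
            target_db={"address": "dw.example:3306", "database": "dw", "table": "fact_orders"})),
    ]),
]
ordered_names = [component.name for component in execution_order(tasks)]
print(ordered_names)
```

Here the order of the outer list stands for the execution order of the tasks, and the order inside each task stands for the order in which that task processes data.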

S102: based on the execution order, determine in turn, according to the configuration information of each task component, the field association relationship corresponding to that task component, where the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information.

In a possible implementation, determining the field association relationship corresponding to the task component according to the configuration information of each task component may include:

determining, according to the configuration information, a source table field information identifier, a target table field information identifier, and the correspondence between the two identifiers; and binding the source table field information identifier and the target table field information identifier according to the correspondence to obtain the field association relationship.

It should be understood that the source table field information processed by each task component generally differs, and each piece of source table field information may itself contain several items, for example the source table field and the source table field type. Correspondingly, the target table field information contains the matching items, namely the target table field and the target table field type. If the field association relationship were generated directly from the source table field information and the target table field information, a large amount of data would have to be stored. On the one hand, this would take up a great deal of the system's storage space; on the other hand, because the data structure of a single field association relationship would be fairly complex, it could easily be corrupted during later storage or reading. In view of this, this embodiment generates a corresponding identifier for each piece of source table field information and each piece of target table field information, that is, a source table field identifier and a target table field identifier, and binds the corresponding source table field identifier and target table field identifier to form the field association relationship. Compared with generating the field association relationship directly from the source table field information and the target table field information, the method of the embodiments of the present application greatly simplifies the data structure of the field association relationship and thus overcomes the above problems.

Illustratively, suppose the source table field information includes the source table field and the source table field type, and the target table field information includes the target table field and the target table field type. If the field association relationship is generated directly from the source table field information and the target table field information, it can be expressed as: (source table field + source table field type) & (target table field + target table field type). If it is generated from the source table field information identifier and the target table field information identifier, it can be expressed as: source table field information identifier & target table field information identifier.

It is worth noting that the above representation of the field association relationship is only illustrative and does not limit this solution. For example, in a possible implementation, the field association relationship may also be expressed as: source table field information identifier and target table field information identifier.
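To make the contrast between the two representations concrete, here is a minimal Python sketch. The `FieldAssociation` record, the `bind` helper, and the identifier strings are hypothetical illustrations and not names from the disclosure.

```python
from typing import NamedTuple


class FieldAssociation(NamedTuple):
    source_field_id: str  # identifier of the source table field information
    target_field_id: str  # identifier of the target table field information


def bind(correspondence):
    """Turn a {source id: target id} correspondence into association records."""
    return [FieldAssociation(src, dst) for src, dst in correspondence.items()]


# The verbose form would carry (field, type) on both sides of every record.
verbose = (("order_id", "BIGINT"), ("id", "BIGINT"))

# The identifier form carries two opaque strings instead.
associations = bind({"src-f1": "dst-f1", "src-f2": "dst-f2"})
print(associations[0])
```

The identifier form stays the same size no matter how many attributes a piece of field information carries, which is the simplification the passage above describes.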

In a possible implementation, determining the source table field information identifier according to the configuration information may include:

when the configuration information includes source database information, determining source table field information and a source table identifier according to the source database information, and determining the source table field information identifier according to the source table field information and the source table identifier; when the configuration information includes a structured query statement and does not include source database information, parsing the structured query statement to obtain source table field information and a source table name, determining a source table identifier according to the source table name, and determining the source table field information identifier according to the source table field information and the source table identifier.

It should be understood that the corresponding source database field information can be obtained from either the source database information or a structured query statement (a Structured Query Language, SQL, statement). The difference is that the source database information gives access to the complete source table data (both the source database field information that is needed and that which is not), whereas a structured query statement yields only the source database field information and source table name declared in that statement. Whether the configuration information carries source database information or a structured query statement is set according to the actual situation, and the embodiments of the present disclosure impose no restriction on this.

It should also be understood that different pieces of source table field information may come from the same source table or from different source tables. So that the correspondence between data tables can later be confirmed from the data lineage graph, after each piece of source table field information is determined, the source table field information may first be bound to the source table name, and the corresponding source table field information identifier is then generated from that binding. To simplify the binding between the source table field information and the source table name, this embodiment first generates a source table identifier, then binds the source table identifier to the source table field information, and finally generates the corresponding source table field information identifier. That is, the field association relationship can be expressed as: (source table identifier + source table field information identifier) & (target table identifier + target table field information identifier).

It is worth noting that the manner of generating the source table identifier may be set according to actual conditions, and the embodiments of the present disclosure impose no limitation on it. In possible implementations, the source table identifier may be obtained by encrypting the source database address, the source database name, and the source table name; it may be obtained by encrypting the source table name alone; or a corresponding source table identifier may be preset for the source table.

Illustratively, when the configuration information includes source database information, the source table is first accessed according to the source database information, and the corresponding source table field information is obtained from the source table according to the data that the task component needs to process; the source database information is then encrypted to obtain the source table identifier; finally, the source table identifier and the source table field information are bound and encrypted to obtain the source table field information identifier.

When the configuration information does not include source database information, the corresponding source table field information and source table name are first obtained by parsing the structured query statement; the source table name is then encrypted to obtain the source table identifier; finally, the source table identifier and the source table field information are bound and encrypted to obtain the source table field information identifier.
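
The identifier generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the text only says the values are "encrypted", and a truncated SHA-256 digest is swapped in here as one way to obtain a stable identifier. The function names `make_id`, `table_id`, `field_id`, and `field_association` are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def make_id(*parts: str) -> str:
    """Derive a stable identifier from the given parts.

    Assumption: a truncated SHA-256 digest stands in for the unspecified
    "encryption" step mentioned in the description.
    """
    joined = "\x1f".join(parts)  # separator keeps ("ab", "c") distinct from ("a", "bc")
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def table_id(table_name: str,
             db_address: Optional[str] = None,
             db_name: Optional[str] = None) -> str:
    """Table identifier from address + database name + table name, or from the table name alone."""
    if db_address is not None and db_name is not None:
        return make_id(db_address, db_name, table_name)
    return make_id(table_name)


def field_id(tid: str, field_info: str) -> str:
    """Field information identifier: the table identifier bound to the field information."""
    return make_id(tid, field_info)


def field_association(src_table: str, src_field: str,
                      dst_table: str, dst_field: str) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """(source table id + source field id) & (target table id + target field id)."""
    src_tid = table_id(src_table)
    dst_tid = table_id(dst_table)
    return ((src_tid, field_id(src_tid, src_field)),
            (dst_tid, field_id(dst_tid, dst_field)))
```

Because the field identifier is derived from the table identifier, two fields with the same name in different tables receive different identifiers in this sketch, which is what allows fields to be grouped by table later.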

In a possible implementation, parsing the structured query statement to obtain the source table field information and the source table name may include:

performing statement parsing on the structured query statement to obtain a statement parsing result; determining the statement type of the structured query statement according to the statement parsing result; and determining a statement parsing strategy according to the statement type, and parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name.

Performing statement parsing on the structured query statement to obtain the statement parsing result may be implemented with a structured query statement parsing plug-in from the related art, which is not described further in the embodiments of the present disclosure.

It should be understood that when the structured query statement is parsed by a structured query statement parsing plug-in, the resulting statement parsing result is the data structure of the structured query statement. Because this data structure is relatively complex, the source table field information and the source table name cannot be obtained from it directly. Therefore, the parsing result may be examined further to determine the statement type of the structured query statement, so that a different parsing strategy is applied to each statement type in order to obtain the source table field information and the source table name.

In a possible implementation, when the statement type is the select statement type, parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name may include:

determining an operation instruction according to the structured query statement; when the operation instruction is a union operation instruction, splitting the structured query statement to obtain target structured query statements whose operation instruction is a select operation instruction, and determining the source table field information and the source table name according to the target structured query statements; and when the operation instruction is a select operation instruction, determining the source table field information and the source table name according to the structured query statement.

In a possible implementation, when the statement type is the insert statement type or the create statement type, parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name may include:

determining whether a select statement exists in the structured query statement; when no select statement exists in the structured query statement, determining the source table field information and the source table name according to the structured query statement; when a select statement exists in the structured query statement, determining an operation instruction according to the structured query statement; when the operation instruction is a union operation instruction, splitting the structured query statement to obtain target structured query statements whose operation instruction is a select operation instruction, and determining the source table field information and the source table name according to the target structured query statements; and when the operation instruction is a select operation instruction, determining the source table field information and the source table name according to the structured query statement.
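
The type-dependent parsing strategy above can be illustrated with a deliberately small sketch. It is an assumption-laden toy: the disclosure relies on a parsing plug-in that produces a full data structure, whereas the regular expressions below only handle flat statements of the form `SELECT a, b FROM t`, optionally joined by `UNION` and optionally preceded by an insert or create clause. All function names are hypothetical.

```python
import re
from typing import List, Tuple

_SELECT_RE = re.compile(
    r"SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def statement_type(sql: str) -> str:
    """Classify a statement by its leading keyword."""
    tokens = sql.split()
    first = tokens[0].upper() if tokens else ""
    return {"SELECT": "select", "INSERT": "insert", "CREATE": "create"}.get(first, "other")


def split_union(select_sql: str) -> List[str]:
    """Split a union statement into its plain select statements."""
    parts = re.split(r"\bUNION(?:\s+ALL)?\b", select_sql, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def parse_plain_select(select_sql: str) -> List[Tuple[str, str]]:
    """Return (source table name, source field) pairs for one flat select."""
    match = _SELECT_RE.search(select_sql)
    if match is None:
        raise ValueError(f"unsupported select statement: {select_sql!r}")
    fields = [f.strip() for f in match.group("fields").split(",")]
    return [(match.group("table"), f) for f in fields]


def extract_sources(sql: str) -> List[Tuple[str, str]]:
    """Dispatch on statement type, then split unions and parse each select."""
    kind = statement_type(sql)
    if kind in ("insert", "create"):
        nested = re.search(r"\bSELECT\b", sql, flags=re.IGNORECASE)
        if nested is None:
            # No select statement: the description parses the statement itself;
            # this sketch leaves that branch out and reports no sources.
            return []
        sql = sql[nested.start():]
    elif kind != "select":
        raise ValueError(f"unsupported statement type: {kind}")
    pairs: List[Tuple[str, str]] = []
    for part in split_union(sql):
        pairs.extend(parse_plain_select(part))
    return pairs
```

For instance, `extract_sources("INSERT INTO tgt SELECT x FROM src")` yields the single pair `("src", "x")`. Joins, subqueries, aliases, and quoted identifiers are outside the scope of this sketch and would need a real SQL parser.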

In this embodiment, by dividing the structured query statement into layers and parsing it recursively layer by layer, the ownership relationship between the underlying physical tables and their fields can be obtained; by recording the correspondence between the fields of adjacent layers, the correspondence between the source table field information and the underlying physical tables can finally be obtained.
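
A minimal sketch of how recorded layer-to-layer field correspondences can be followed down to a physical table is given below. The data layout is an assumption made for illustration: each layer (for example a subquery alias) is represented by entries in a dictionary that map one of its columns to the column it was derived from in the next layer down. The names `Column`, `resolve_to_base`, and `layer_map` are hypothetical.

```python
from typing import Dict, Set, Tuple

# (table or subquery name, field name)
Column = Tuple[str, str]


def resolve_to_base(column: Column, layer_map: Dict[Column, Column]) -> Column:
    """Follow the recorded correspondences layer by layer until a column
    with no further mapping (a physical table column) is reached."""
    seen: Set[Column] = set()
    while column in layer_map:
        if column in seen:
            raise ValueError(f"cyclic field mapping at {column}")
        seen.add(column)
        column = layer_map[column]
    return column


# Hypothetical mappings for:
#   SELECT total FROM (SELECT amt AS total FROM (SELECT amount AS amt FROM orders) inner_q) outer_q
layer_map = {
    ("outer_q", "total"): ("inner_q", "amt"),
    ("inner_q", "amt"): ("orders", "amount"),
}
base = resolve_to_base(("outer_q", "total"), layer_map)  # ("orders", "amount")
```

The sketch assumes each derived column depends on exactly one lower-level column; an expression that combines several columns would need a mapping to a set of columns instead.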

It is also worth noting that databases come in many types. Some databases allow their data to be obtained directly through a database connection protocol, while others do not support this, for example FlinkSql and SparkSql, two open-source Apache frameworks for large-volume data computation. Therefore, to obtain the source table field information identifier from FlinkSql and/or SparkSql, the source table fields may be determined by parsing the structured query statement or from the target database (the data source of FlinkSql and/or SparkSql). That is, according to an embodiment of the present disclosure, when the configuration information includes information indicating that the source table field information does not support querying through a database connection protocol, determining the source table field information identifier according to the configuration information may include:

determining the task type of the task component according to the configuration information; when the task type is a dictionary table loading task, determining the source table field information and the source table identifier from the target database according to the configuration information, and determining the source table field information identifier according to the source table field information and the source table identifier; when the task type is a conversion task, determining a structured query statement according to the configuration information, and parsing the structured query statement to obtain the source table fields and the source table name; determining the source table identifier according to the source table name, and determining the field type of each source table field in the already parsed data according to the source table identifier and the source table field; and determining the source table field information according to the source table field and the field type, and determining the source table field information identifier according to the source table field information and the source table identifier.

S103: determine the data lineage graph according to each field association relationship.

In a possible implementation, determining the data lineage graph according to each field association relationship may include:

for each field association relationship, determining the corresponding source table field information identifier and target table field information identifier, determining the source table identifier according to the source table field information identifier, and determining the target table identifier according to the target table field information identifier; constructing a first data table with the source table field information identifiers that share the same source table identifier as first data and the corresponding source table identifier as the first table name, and constructing a second data table with the target table field information identifiers that share the same target table identifier as second data and the corresponding target table identifier as the second table name; and binding the first data in the first data table to the second data in the second data table according to the field association relationship to obtain the data lineage graph.

It should be understood that once all task components have been executed, multiple field association relationships will have been generated, and some of them represent correspondences between different data fields of the same source table and the same target table. For example, a first field association relationship represents the correspondence between the first source table field information in source table A and the first target table field information in target table A, and a second field association relationship represents the correspondence between the second source table field information in source table A and the second target table field information in target table A. Therefore, the relationships are grouped by source table identifier and target table identifier: the source table field information identifiers from the same source table are grouped to form the first data table, and the target table field information identifiers from the same target table are grouped to form the second data table. The source table field information identifiers in the first data table are then bound to the target table field information identifiers in the second data table according to the field association relationships, yielding a data lineage graph in which the correspondences between data tables and between data fields can be seen directly, as shown in FIG. 2.
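
The grouping and binding step can be sketched as follows. This is an illustrative sketch only: it assumes each field association already carries its table identifier alongside the field identifier (the description derives the former from the latter), and the `FieldAssociation` tuple, the `build_lineage_graph` function, and the dictionary layout of the result are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FieldAssociation(NamedTuple):
    source_table: str   # source table identifier
    source_field: str   # source table field information identifier
    target_table: str   # target table identifier
    target_field: str   # target table field information identifier


def build_lineage_graph(associations: Iterable[FieldAssociation]) -> Dict[str, object]:
    """Group field identifiers by table, then bind source fields to target fields."""
    source_tables: Dict[str, List[str]] = defaultdict(list)
    target_tables: Dict[str, List[str]] = defaultdict(list)
    edges: List[Tuple[Tuple[str, str], Tuple[str, str]]] = []
    for a in associations:
        # First data tables: fields that share a source table identifier.
        if a.source_field not in source_tables[a.source_table]:
            source_tables[a.source_table].append(a.source_field)
        # Second data tables: fields that share a target table identifier.
        if a.target_field not in target_tables[a.target_table]:
            target_tables[a.target_table].append(a.target_field)
        # Binding between one source field and one target field.
        edges.append(((a.source_table, a.source_field),
                      (a.target_table, a.target_field)))
    return {
        "source_tables": dict(source_tables),
        "target_tables": dict(target_tables),
        "edges": edges,
    }


graph = build_lineage_graph([
    FieldAssociation("src_A", "f1", "dst_A", "g1"),
    FieldAssociation("src_A", "f2", "dst_A", "g2"),
])
```

With the two associations in the usage example, both source fields end up in one source table node and both target fields in one target table node, connected by two field-level edges, which mirrors the source table A and target table A example in the text.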

In a possible implementation, to show the correspondences between data tables and/or between data fields more intuitively, when the data lineage graph is generated, the source table name, the source table field information, the target table name, and the target table field information may be determined from the source table identifier, the source table field information identifier, the target table identifier, and the target table field information identifier, respectively. The grouping is then done by source table name and target table name: the source table field information from the same source table is grouped into the first data table, and the target table field information from the same target table is grouped into the second data table. Finally, the source table field information in the first data table is bound to the target table field information in the second data table according to the field association relationships, yielding a data lineage graph in which the correspondences between data tables and between data fields can be seen directly.

In summary, with the above technical solution, the field association relationships between source table field information and target table field information are extracted, and the data lineage graph is constructed from them. On the one hand, the correspondences between databases, between data tables, and/or between data fields can be seen directly, which supports the later construction and maintenance of the system. On the other hand, the data lineage graph visualizes the databases and data tables involved in data development, so the origin of each piece of data field information in the graph can be seen at a glance without consulting the relevant data developers, which reduces the project's dependence on individual personnel.

Based on the same concept, an embodiment of the present disclosure further provides a data lineage graph construction apparatus. As shown in FIG. 3, the data lineage graph construction apparatus 300 may include:

an acquisition module 310, configured to acquire task component information, the task component information including the execution order of multiple task components and the configuration information of the multiple task components;

a first determining module 320, configured to determine, based on the execution order and in turn according to the configuration information of each task component, the field association relationship corresponding to that task component, the field association relationship representing the correspondence between source table field information and target table field information, the target table field information being obtained after the task component processes the source table field information; and

a second determining module 330, configured to determine the data lineage graph according to each field association relationship.

Optionally, the first determining module 320 may include:

a first determining submodule, configured to determine, according to the configuration information, the source table field information identifier, the target table field information identifier, and the correspondence between the source table field information identifier and the target table field information identifier; and

a first binding submodule, configured to bind the source table field information identifier and the target table field information identifier according to the correspondence to obtain the field association relationship.

Optionally, the first determining submodule may include:

a first determining unit, configured to, when the configuration information includes source database information, determine the source table field information and the source table identifier according to the source database information, and determine the source table field information identifier according to the source table field information and the source table identifier; and

a second determining unit, configured to, when the configuration information includes a structured query statement and does not include source database information, parse the structured query statement to obtain the source table field information and the source table name, determine the source table identifier according to the source table name, and determine the source table field information identifier according to the source table field information and the source table identifier.

Optionally, the second determining unit may include:

a first parsing subunit, configured to perform statement parsing on the structured query statement to obtain a statement parsing result;

a first determining subunit, configured to determine the statement type of the structured query statement according to the statement parsing result; and

a second parsing subunit, configured to determine a statement parsing strategy according to the statement type, and parse the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name.

Optionally, when the statement type is the select statement type, the second parsing subunit may include:

a first determining component, configured to determine an operation instruction according to the structured query statement;

a second determining component, configured to, when the operation instruction is a union operation instruction, split the structured query statement to obtain target structured query statements whose operation instruction is a select operation instruction, and determine the source table field information and the source table name according to the target structured query statements; and

a third determining component, configured to, when the operation instruction is a select operation instruction, determine the source table field information and the source table name according to the structured query statement.

Optionally, when the statement type is the insert statement type or the create statement type, the second parsing subunit may include:

a fourth determining component, configured to determine whether a select statement exists in the structured query statement;

a fifth determining component, configured to, when no select statement exists in the structured query statement, determine the source table field information and the source table name according to the structured query statement;

a sixth determining component, configured to, when a select statement exists in the structured query statement, determine an operation instruction according to the structured query statement;

a seventh determining component, configured to, when the operation instruction is a union operation instruction, split the structured query statement to obtain target structured query statements whose operation instruction is a select operation instruction, and determine the source table field information and the source table name according to the target structured query statements; and

an eighth determining component, configured to, when the operation instruction is a select operation instruction, determine the source table field information and the source table name according to the structured query statement.

Optionally, when the configuration information includes information indicating that the source table field information does not support querying through a database connection protocol, the first determining submodule may further include:

a third determining unit, configured to determine the task type of the task component according to the configuration information;

a fourth determining unit, configured to, when the task type is a dictionary table loading task, determine the source table field information and the source table identifier from the target database according to the configuration information, and determine the source table field information identifier according to the source table field information and the source table identifier;

a fifth determining unit, configured to, when the task type is a conversion task, determine a structured query statement according to the configuration information, and parse the structured query statement to obtain the source table fields and the source table name;

a sixth determining unit, configured to determine the source table identifier according to the source table name, and determine the field type of each source table field in the already parsed data according to the source table identifier and the source table field; and

a seventh determining unit, configured to determine the source table field information according to the source table field and the field type, and determine the source table field information identifier according to the source table field information and the source table identifier.

Optionally, the second determining module 330 may include:

a second determining submodule, configured to, for each field association relationship, determine the corresponding source table field information identifier and target table field information identifier, determine the source table identifier according to the source table field information identifier, and determine the target table identifier according to the target table field information identifier;

a construction submodule, configured to construct a first data table with the source table field information identifiers that share the same source table identifier as first data and the corresponding source table identifier as the first table name, and to construct a second data table with the target table field information identifiers that share the same target table identifier as second data and the corresponding target table identifier as the second table name; and

a second binding submodule, configured to bind the first data in the first data table to the second data in the second data table according to the field association relationship to obtain the data lineage graph.

With the above apparatus, the field association relationships between source table field information and target table field information are extracted, and the data lineage graph is constructed from them. On the one hand, the correspondences between databases, between data tables, and/or between data fields can be seen directly, which supports the later construction and maintenance of the system. On the other hand, the data lineage graph visualizes the databases and data tables involved in data development, so the origin of each piece of data field information in the graph can be seen at a glance without consulting the relevant data developers, which reduces the project's dependence on individual personnel.

Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.

FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment. As shown in FIG. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may further include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.

The processor 401 is configured to control the overall operation of the electronic device 400 so as to complete all or part of the steps of the above data lineage graph construction method. The memory 402 is configured to store various types of data to support operation of the electronic device 400; such data may include, for example, instructions of any application program or method operated on the electronic device 400, as well as application-related data. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 402 or sent through the communication component 405. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IOT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited here. Accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.

在一示例性实施例中，电子设备400可以被一个或多个应用专用集成电路(Application Specific Integrated Circuit，简称ASIC)、数字信号处理器(DigitalSignal Processor，简称DSP)、数字信号处理设备(Digital Signal Processing Device，简称DSPD)、可编程逻辑器件(Programmable Logic Device，简称PLD)、现场可编程门阵列(Field Programmable Gate Array，简称FPGA)、控制器、微控制器、微处理器或其他电子元件实现，用于执行上述的数据血缘关系图构建方法。In an exemplary embodiment, the electronic device 400 may be implemented by one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), digital signal processors (Digital Signal Processor, DSP for short), digital signal processing equipment (Digital Signal Processing Device, referred to as DSPD), programmable logic device (Programmable Logic Device, referred to as PLD), field programmable gate array (Field Programmable Gate Array, referred to as FPGA), controller, microcontroller, microprocessor or other electronic components to achieve , which is used to implement the above-mentioned method for constructing a data kinship relationship graph.

在另一示例性实施例中，还提供了一种包括程序指令的计算机可读存储介质，该程序指令被处理器执行时实现上述的数据血缘关系图构建方法的步骤。例如，该计算机可读存储介质可以为上述包括程序指令的存储器402，上述程序指令可由电子设备400的处理器401执行以完成上述的数据血缘关系图构建方法。In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, and when the program instructions are executed by a processor, the steps of the above-mentioned method for constructing a data blood relationship graph are implemented. For example, the computer-readable storage medium may be the above-mentioned memory 402 including program instructions, and the above-mentioned program instructions can be executed by the processor 401 of the electronic device 400 to complete the above-mentioned method for constructing a data kinship graph.

In another exemplary embodiment, a computer program product is also provided. The computer program product comprises a computer program executable by a programmable device, and the computer program has code portions for performing the above-described method for constructing a data blood relationship graph when executed by the programmable device.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to the technical solutions of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.

It should also be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.

In addition, the various embodiments of the present disclosure can also be combined arbitrarily. As long as such a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.

Claims

1. A method for constructing a data blood relationship graph, characterized in that the method comprises:

obtaining task component information, the task component information including the execution sequence of multiple task components and configuration information of the multiple task components;

based on the execution sequence, determining, in turn, the field association relationship corresponding to each task component according to the configuration information of that task component, wherein the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information;

determining a data blood relationship graph according to each of the field association relationships.
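The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `FieldAssociation` class, the helper names, and the `{"mappings": [...]}` configuration layout are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAssociation:
    """One source-field -> target-field edge produced by a task component."""
    source: str
    target: str


def associations_for(config: dict) -> list[FieldAssociation]:
    # Hypothetical config layout: {"mappings": [(source_field, target_field), ...]}
    return [FieldAssociation(s, t) for s, t in config.get("mappings", [])]


def build_lineage(task_info: dict) -> list[FieldAssociation]:
    """Visit components in execution sequence and collect their associations."""
    edges: list[FieldAssociation] = []
    for name in task_info["execution_sequence"]:
        edges.extend(associations_for(task_info["configs"][name]))
    return edges


# Hypothetical task component information with two components.
task_info = {
    "execution_sequence": ["extract", "transform"],
    "configs": {
        "extract": {"mappings": [("src.orders.id", "stg.orders.id")]},
        "transform": {"mappings": [("stg.orders.id", "dw.fact_orders.order_id")]},
    },
}
lineage = build_lineage(task_info)
```

Because the components are visited in execution sequence, the target fields of an earlier component appear before the later component that consumes them as source fields.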

2. The method according to claim 1, wherein the determining the field association relationship corresponding to the task component according to the configuration information of each task component comprises:

for the configuration information of each of the task components, performing the following steps:

determining a source table field information identifier, a target table field information identifier, and a correspondence between the source table field information identifier and the target table field information identifier according to the configuration information;

binding the source table field information identifier and the target table field information identifier according to the correspondence to obtain the field association relationship.
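As a rough sketch of the binding step in claim 2, the snippet below builds field identifiers and pairs them. The identifier format (`table_id.field_name`) and the configuration keys `source_table_id`, `target_table_id`, and `correspondence` are assumptions, since the claim does not fix a concrete representation.

```python
def field_identifier(table_id: str, field_name: str) -> str:
    # Assumed identifier format: the table identifier joined to the field name.
    return f"{table_id}.{field_name}"


def bind_fields(config: dict) -> list[tuple[str, str]]:
    """Return (source field identifier, target field identifier) pairs."""
    src_table = config["source_table_id"]
    tgt_table = config["target_table_id"]
    return [
        (field_identifier(src_table, src), field_identifier(tgt_table, tgt))
        for src, tgt in config["correspondence"]
    ]


# Hypothetical configuration for a single task component.
pairs = bind_fields({
    "source_table_id": "ods.users",
    "target_table_id": "dw.dim_user",
    "correspondence": [("id", "user_id"), ("name", "user_name")],
})
```

A list of pairs is used rather than a dictionary so that one source field can be bound to several target fields.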

3. The method according to claim 2, wherein the determining the source table field information identifier according to the configuration information comprises:

when the configuration information includes source database information, determining source table field information and a source table identifier according to the source database information, and determining the source table field information identifier according to the source table field information and the source table identifier;

when the configuration information includes a structured query statement and does not include source database information, parsing the structured query statement to obtain source table field information and a source table name, determining the source table identifier according to the source table name, and determining the source table field information identifier according to the source table field information and the source table identifier.
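The two branches of claim 3 amount to a dispatch on what the configuration contains. The sketch below assumes hypothetical keys `source_db` and `sql`, takes the SQL parser as a caller-supplied function, and treats the source table name as the source table identifier; none of these choices come from the disclosure.

```python
from typing import Callable

# A parser returns (source field names, source table name) for one statement.
SqlParser = Callable[[str], tuple[list[str], str]]


def source_field_identifiers(config: dict, parse_sql: SqlParser) -> list[str]:
    if "source_db" in config:
        # Branch 1: source database information is present, read it directly.
        table_id = config["source_db"]["table_id"]
        fields = config["source_db"]["fields"]
    elif "sql" in config:
        # Branch 2: only a structured query statement is available, so parse it.
        fields, table_name = parse_sql(config["sql"])
        table_id = table_name  # assumption: the table name doubles as identifier
    else:
        raise ValueError("no source database information or SQL in configuration")
    return [f"{table_id}.{field}" for field in fields]
```

Checking for source database information first mirrors the claim's condition that the statement is parsed only when that information is absent.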

4. The method according to claim 3, wherein the parsing the structured query statement to obtain source table field information and a source table name comprises:

performing statement parsing on the structured query statement to obtain a statement parsing result;

determining the statement type of the structured query statement according to the statement parsing result;

determining a statement parsing strategy according to the statement type, and parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name.

5. The method according to claim 4, wherein, when the statement type is a selection statement type, the parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name comprises:

determining an operation instruction according to the structured query statement;

when the operation instruction is a combined operation instruction, splitting the structured query statement to obtain target structured query statements whose operation instruction is a selection operation instruction, and determining the source table field information and the source table name according to the target structured query statements;

when the operation instruction is a selection operation instruction, determining the source table field information and the source table name according to the structured query statement.
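The split-then-parse flow of claim 5 can be sketched as follows, assuming that the "combined operation instruction" corresponds to SQL `UNION` / `UNION ALL` and the "selection operation instruction" to a plain `SELECT ... FROM table`. The regular expressions are deliberately naive: they do not handle subqueries, joins, aliases, or the word UNION inside string literals, and a real implementation would more likely use a proper SQL parser.

```python
import re

# Naive patterns; both are illustrative assumptions, not the patented parser.
_UNION = re.compile(r"\s+UNION(?:\s+ALL)?\s+", re.IGNORECASE)
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+(?P<table>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def split_combined(sql: str) -> list[str]:
    """Split a combined statement into its individual SELECT statements."""
    return [part.strip() for part in _UNION.split(sql.strip().rstrip(";"))]


def parse_select(sql: str) -> tuple[list[str], str]:
    """Return (source field names, source table name) for a simple SELECT."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError(f"not a simple SELECT statement: {sql!r}")
    fields = [f.strip() for f in match.group("fields").split(",")]
    return fields, match.group("table")


def source_fields(sql: str) -> list[tuple[list[str], str]]:
    # A statement without UNION yields a single part, so both cases share a path.
    return [parse_select(part) for part in split_combined(sql)]


result = source_fields("SELECT id, name FROM a UNION ALL SELECT id, name FROM b;")
```

Treating a plain selection as a combined statement with one part keeps the two branches of the claim on a single code path.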

6. The method according to claim 4, wherein, when the statement type is an embedded statement type or a created statement type, the parsing the structured query statement according to the statement parsing strategy to obtain the source table field information and the source table name comprises:

determining whether a selection statement exists in the structured query statement;

when the selection statement does not exist in the structured query statement, determining the source table field information and the source table name according to the structured query statement;

when the selection statement exists in the structured query statement, determining an operation instruction according to the structured query statement;

7. The method according to claim 2, wherein, when the configuration information includes information indicating that the source table field information cannot be queried through a database connection protocol, the determining the source table field information identifier according to the configuration information comprises:

determining the task type of the task component according to the configuration information;

when the task type is a dictionary table loading task, determining source table field information and a source table identifier from the target database according to the configuration information, and determining the source table field information identifier according to the source table field information and the source table identifier;

when the task type is a conversion task, determining a structured query statement according to the configuration information, and parsing the structured query statement to obtain a source table field and a source table name;

determining the source table identifier according to the source table name, and determining the field type of the source table field in the parsed data according to the source table identifier and the source table field;

determining the source table field information according to the source table field and the field type, and determining the source table field information identifier according to the source table field information and the source table identifier.

8. The method according to any one of claims 1-7, wherein the determining a data blood relationship graph according to each of the field association relationships comprises:

for each field association relationship, determining the corresponding source table field information identifier and target table field information identifier, determining a source table identifier according to the source table field information identifier, and determining a target table identifier according to the target table field information identifier;

taking the source table field information identifiers having the same source table identifier as first data and the corresponding source table identifier as a first table name to construct a first data table, and taking the target table field information identifiers having the same target table identifier as second data and the corresponding target table identifier as a second table name to construct a second data table;

binding the first data in the first data table and the second data in the second data table according to the field association relationship to obtain the data blood relationship graph.
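The grouping and binding in claim 8 can be sketched as below. The sketch assumes that a field identifier has the form `table_id.field_name`, so the table identifier is everything before the last dot; the function names and the dictionary-based graph structure are likewise illustrative choices only.

```python
from collections import defaultdict


def table_of(field_id: str) -> str:
    # Assumed identifier format "table_id.field_name"; strip the field part.
    return field_id.rsplit(".", 1)[0]


def build_graph(associations: list[tuple[str, str]]) -> dict:
    """Group field identifiers by table and keep the field-level edges."""
    source_tables: dict[str, list[str]] = defaultdict(list)
    target_tables: dict[str, list[str]] = defaultdict(list)
    for src, tgt in associations:
        if src not in source_tables[table_of(src)]:
            source_tables[table_of(src)].append(src)
        if tgt not in target_tables[table_of(tgt)]:
            target_tables[table_of(tgt)].append(tgt)
    return {
        "source_tables": dict(source_tables),  # the "first data tables"
        "target_tables": dict(target_tables),  # the "second data tables"
        "edges": list(associations),           # bindings between the two
    }


# Hypothetical field association relationships.
graph = build_graph([
    ("ods.orders.id", "dw.fact.order_id"),
    ("ods.orders.amt", "dw.fact.amount"),
    ("ods.users.id", "dw.fact.user_id"),
])
```

Keeping the edges alongside the grouped tables lets a renderer draw each table as a box of fields and each association as a line between fields.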

9. A data blood relationship graph construction device, characterized in that the device comprises:

an acquisition module, configured to acquire task component information, the task component information including the execution sequence of multiple task components and configuration information of the multiple task components;

a first determination module, configured to determine, in turn and based on the execution sequence, the field association relationship corresponding to each task component according to the configuration information of that task component, wherein the field association relationship represents the correspondence between source table field information and target table field information, and the target table field information is obtained after the task component processes the source table field information;

a second determination module, configured to determine a data blood relationship graph according to each of the field association relationships.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1-8 are implemented.

11. An electronic device, characterized in that it comprises:

a memory on which a computer program is stored;

a processor, configured to execute the computer program in the memory, so as to implement the steps of the method according to any one of claims 1-8.