CN114925688A - Data processing method and device based on artificial intelligence, electronic equipment and medium - Google Patents

Data processing method and device based on artificial intelligence, electronic equipment and medium

Info

Publication number
CN114925688A
Authority
CN
China
Prior art keywords
data
database
link
database script
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210551314.5A
Other languages
Chinese (zh)
Inventor
张广凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202210551314.5A
Publication of CN114925688A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the technical field of artificial intelligence and provides an artificial intelligence-based data processing method, apparatus, electronic device and medium. The method includes: acquiring a database script corresponding to the data to be processed; performing text recognition on the database script to determine a plurality of databases corresponding to the database script; acquiring a data table corresponding to each database; determining a data link corresponding to each database based on a pre-built data map and the data table; determining a common link in the data links corresponding to every two databases; and determining, based on the common link, a target data link corresponding to the database script, and performing data processing based on the target data link. By using a text recognition algorithm to find the links that share common dependencies in the data to be processed, the application avoids repeatedly processing the same data table, which reduces resource consumption and improves data processing efficiency.

Description

Data processing method, apparatus, electronic device and medium based on artificial intelligence

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to an artificial intelligence-based data processing method, apparatus, electronic device and medium.

Background

A data mart stores data in a multidimensional form to meet the needs of a specific department or group of users, and produces data cubes for decision analysis. A data mart contains many kinds of dimensions and indicators, all of which are processed through various data reports to serve upstream applications. Traditional data marts have many shortcomings in hierarchical division and data link sharing. In particular, they suffer from serious repeated processing: the same data table in the mart is processed again and again, which wastes a large amount of memory and leaves data flows and data dependencies poorly standardized.

A data processing method is therefore urgently needed to solve the waste of resources caused by repeatedly processing and reading the same data.

Summary of the invention

In view of the above, it is necessary to provide an artificial intelligence-based data processing method, apparatus, electronic device and medium that use a text recognition algorithm to find the links sharing common dependencies in the data to be processed, so that the same data table is not processed repeatedly, resource consumption is reduced, and data processing efficiency is improved.

In a first aspect, the present application provides an artificial intelligence-based data processing method, the method comprising:

acquiring a database script corresponding to the data to be processed;

performing text recognition on the database script to determine a plurality of databases corresponding to the database script;

acquiring a data table corresponding to each database;

determining a data link corresponding to each database based on a pre-built data map and the data table;

determining a common link in the data links corresponding to every two databases;

determining, based on the common link, a target data link corresponding to the database script, and performing data processing based on the target data link.

According to an optional embodiment of the present application, performing text recognition on the database script to determine the plurality of databases corresponding to the database script includes:

performing word segmentation on the database script to obtain a word segmentation result;

performing keyword detection on the word segmentation result, and identifying the plurality of databases corresponding to the database script from the word segmentation result.

According to an optional embodiment of the present application, performing keyword detection on the word segmentation result and identifying the plurality of databases corresponding to the database script from the word segmentation result includes:

determining the position of a target feature word in the word segmentation result by using a preset feature lexicon, where the feature lexicon includes a plurality of feature words, and a feature word is a characteristic word of the database operation statements used by the database script;

identifying, by keyword matching, a database name from the text that follows the position of the target feature word in the word segmentation result;

determining the plurality of databases corresponding to the database script according to the database name.

According to an optional embodiment of the present application, acquiring the data table corresponding to each database includes:

acquiring the database name of the database, and determining the database name as a first keyword;

determining, according to the first keyword, a second keyword included in the database script, where the second keyword is located after the first keyword;

generating a data table name according to the second keyword, and determining the data table corresponding to the database according to the data table name.

According to an optional embodiment of the present application, determining the target data link corresponding to the database script based on the common link includes:

determining a plurality of overlapping databases corresponding to the common link;

acquiring the data link corresponding to each of the overlapping databases;

merging, based on the common link, the data links corresponding to every two overlapping databases to obtain a first data link;

obtaining a second data link according to the databases in the database script that do not correspond to the common link;

determining the target data link corresponding to the database script according to the first data link and the second data link.
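The merging steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: each data link is modelled as a set of `(source_table, target_table)` edges, a common link is the set intersection of two data links, and the function name `target_data_link` and the sample database and table names are hypothetical.

```python
from itertools import combinations

Edge = tuple[str, str]  # (source_table, target_table); an assumed representation


def target_data_link(links: dict[str, set[Edge]]) -> tuple[dict[tuple[str, str], set[Edge]], set[Edge]]:
    """Find common links between every two databases and merge all links."""
    # Common link of each pair of databases: the edges both data links share.
    common: dict[tuple[str, str], set[Edge]] = {}
    for (db1, link1), (db2, link2) in combinations(links.items(), 2):
        shared = link1 & link2
        if shared:
            common[(db1, db2)] = shared

    # Overlapping databases are those that take part in at least one common link.
    overlapping = {db for pair in common for db in pair}

    # First data link: merged links of overlapping databases (shared edges kept once).
    first = set().union(*(links[db] for db in overlapping))
    # Second data link: links of the remaining databases.
    second = set().union(*(links[db] for db in links if db not in overlapping))
    return common, first | second


links = {
    "db1": {("A", "B"), ("B", "C")},
    "db2": {("A", "B"), ("B", "D")},
    "db3": {("X", "Y")},
}
common, target = target_data_link(links)
```

In this example the three data links contain five edges in total, while the merged target link contains four, because the shared edge from `A` to `B` is processed only once.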

According to an optional embodiment of the present application, before determining the data link corresponding to each database based on the pre-built data map and the data table, the method further includes:

acquiring historical database scripts;

performing text recognition on the historical database scripts, and determining data flow direction text in the historical database scripts;

determining the flow direction of the data tables according to the data flow direction text;

generating the data map based on the flow direction of the data tables.

According to an optional embodiment of the present application, performing text recognition on the historical database scripts and determining the data flow direction text in the historical database scripts includes:

calling a preset word segmentation toolkit to perform word segmentation on the historical database scripts to obtain a word segmentation result;

determining, from the word segmentation result, the position of a preset flow direction word in the historical database scripts;

extracting the data flow direction text from the historical database scripts according to the position of the preset flow direction word in the historical database scripts.

In a second aspect, the present application provides an artificial intelligence-based data processing apparatus, the apparatus comprising:

a script acquisition module, configured to acquire a database script corresponding to the data to be processed;

a text recognition module, configured to perform text recognition on the database script and determine a plurality of databases corresponding to the database script;

a data acquisition module, configured to acquire a data table corresponding to each database;

a link determination module, configured to determine a data link corresponding to each database based on a pre-built data map and the data table;

a link identification module, configured to determine a common link in the data links corresponding to every two databases;

a data processing module, configured to determine, based on the common link, a target data link corresponding to the database script, and to perform data processing based on the target data link.

In a third aspect, the present application provides an electronic device including a processor and a memory, where the processor implements the artificial intelligence-based data processing method when executing a computer program stored in the memory.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program implements the artificial intelligence-based data processing method when executed by a processor.

In summary, the artificial intelligence-based data processing method, apparatus, electronic device and medium described in this application first use a text recognition algorithm to recognize the database script corresponding to the data to be processed and determine the plurality of databases corresponding to the database script. They then determine the data link corresponding to each database based on the data tables of that database and a pre-built data map, and determine the common link in the data links corresponding to every two databases; a common link is a link that would otherwise be processed repeatedly. Finally, they determine the target data link corresponding to the database script based on the common link and perform data processing based on the target data link. Processing along the target data link avoids repeatedly processing the same data table, which reduces resource consumption and improves data processing efficiency.

Description of drawings

FIG. 1 is a flowchart of an artificial intelligence-based data processing method provided in Embodiment 1 of the present application.

FIG. 2 is a schematic diagram of a data table flow direction provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of a data link provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of a data link provided by an embodiment of the present application.

FIG. 5 is a schematic diagram of a data link provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of a target link provided by an embodiment of the present application.

FIG. 7 is a structural diagram of an artificial intelligence-based data processing apparatus provided in Embodiment 2 of the present application.

FIG. 8 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.

Detailed description

To make the above objects, features and advantages of the present application easier to understand, the present application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used in the specification of this application are only for the purpose of describing embodiments in an optional implementation and are not intended to limit the application.

The artificial intelligence-based data processing method provided by the embodiments of the present application is executed by an electronic device, and correspondingly, the artificial intelligence-based data processing apparatus runs in the electronic device. The electronic device may include a mobile phone, tablet computer, notebook computer, desktop computer, personal digital assistant, wearable device, and the like.

The embodiments of the present application can process data based on artificial intelligence technology to improve data processing efficiency. Artificial intelligence (AI) refers to the theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.

Embodiment 1

FIG. 1 is a flowchart of an artificial intelligence-based data processing method provided in Embodiment 1 of the present application. The method specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.

S11, acquiring a database script corresponding to the data to be processed.

The data to be processed corresponding to a data processing task is acquired, where the data to be processed may include multiple data tables from multiple databases. The database script corresponding to the data to be processed is then acquired. The database script may be a Hibernate Query Language (HQL) script or the like.

S12, performing text recognition on the database script to determine a plurality of databases corresponding to the database script.

The data in the database script comes from multiple databases, and each database has a corresponding name and/or code. The database script can be recognized with a text recognition algorithm to identify the database names that appear in it, and thereby determine the plurality of databases corresponding to the database script.

In an optional embodiment, performing text recognition on the database script to determine the plurality of databases corresponding to the database script includes:

performing word segmentation on the database script to obtain a word segmentation result;

performing keyword detection on the word segmentation result, and identifying the plurality of databases corresponding to the database script from the word segmentation result.

For example, a preset word segmentation toolkit (such as the jieba toolkit) may be called to perform word segmentation on the database script and obtain the word segmentation result. This can improve the efficiency and accuracy of keyword recognition.
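As a rough illustration of the word segmentation step, the sketch below splits a script into tokens. It deliberately swaps in a regular expression for a segmentation toolkit such as jieba so that the example has no third-party dependency; the function name `tokenize_script`, the token pattern and the sample statement are assumptions, not part of the application.

```python
import re

# Assumed token pattern: an identifier, a number, or any single non-space symbol.
# A real system might call a word segmentation toolkit here instead.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize_script(script: str) -> list[str]:
    """Split a database script into a flat list of tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(script)


# Hypothetical statement using database.table notation.
tokens = tokenize_script("select a.name from cx_XX_safe.text_table_b1 a")
```

Keeping punctuation such as `.` as separate tokens makes it easy for later steps to tell a database name from the table name that follows it.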

In an optional embodiment, performing keyword detection on the word segmentation result and identifying the plurality of databases corresponding to the database script from the word segmentation result includes:

determining the position of a target feature word in the word segmentation result by using a preset feature lexicon, where the feature lexicon includes a plurality of feature words, and a feature word is a characteristic word of the database operation statements used by the database script;

identifying, by keyword matching, a database name from the text that follows the position of the target feature word in the word segmentation result;

determining the plurality of databases corresponding to the database script according to the database name.

The feature lexicon may include a plurality of feature words. A feature word is a characteristic word of the database operation statements used by the database script, and its content is not limited here. For example, if the database operation statements are Structured Query Language (SQL) statements, the feature words may be where, from, join and certain symbols (such as $), and so on.
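A feature-word lookup of this kind can be sketched as below. The lexicon contents follow the examples given in the text (where, from, join, $); the constant `FEATURE_WORDS`, the function name `find_feature_positions` and the sample token list are hypothetical.

```python
# Assumed feature lexicon for SQL-style statements; the text names these
# words only as examples and does not fix the lexicon's content.
FEATURE_WORDS = {"where", "from", "join", "$"}


def find_feature_positions(tokens: list[str], lexicon: set[str] = FEATURE_WORDS) -> list[tuple[int, str]]:
    """Return (index, lower-cased word) for every token found in the lexicon."""
    return [(i, tok.lower()) for i, tok in enumerate(tokens) if tok.lower() in lexicon]


# Hypothetical word segmentation result.
tokens = ["Select", "a", ".", "name", "From", "cx_XX_safe", ".", "t1", "a",
          "Where", "a", ".", "id", ">", "0"]
positions = find_feature_positions(tokens)
```

Matching is case-insensitive here because SQL keywords may be written as `From`, `FROM` or `from`.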

To identify the database names included in the database operation text, word segmentation can be performed on the database script to obtain a word segmentation result. Specifically, a word segmentation toolkit such as jieba can be used to segment each SQL statement in the database script. Suppose that segmenting each SQL statement in the database script corresponding to the data processing task with jieba yields a result that includes the following content:

Select A.name

, B.type

, C.group

From cx_XX_safe:.text_table b1, .text_table c1, .text_table e1.

Select B.name

, B.type

, C.group

From cx_XXX_safe:text_table b1, .text_table c2, .text_table e2.

After the word segmentation result is obtained, keyword detection can be performed on it to identify the database names included in the database script. Specifically, a preset feature lexicon can be used to locate the target feature word in the word segmentation result, and keyword matching is then applied to the text that follows that position to identify the database name. For example, the preset feature lexicon can be used to locate From in the word segmentation result; keyword matching then identifies the database name cx_XX_safe from cx_XX_safe:text_table b1, .text_table c1, .text_table e1, and the database name cx_XXX_safe from cx_XXX_safe:text_table b1, .text_table c2, .text_table e2.

After the database names are identified, the plurality of databases corresponding to the database script can be determined according to those database names.
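The identification of database names after a feature word can be sketched as follows. Note the assumptions: tables are written in the common `database.table` notation rather than the separator shown in the sample above, only `from` and `join` are treated as feature words, and the function name `extract_database_names` and the sample script are hypothetical.

```python
import re

# Assumption: a database name is the identifier that directly follows
# FROM or JOIN and is itself followed by a dot.
DB_AFTER_FEATURE = re.compile(r"\b(?:from|join)\s+([A-Za-z_]\w*)\s*\.", re.IGNORECASE)


def extract_database_names(script: str) -> list[str]:
    """Return distinct database names in order of first appearance."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and removes duplicates
    for match in DB_AFTER_FEATURE.finditer(script):
        seen.setdefault(match.group(1), None)
    return list(seen)


script = """
select a.name, c.type from cx_XX_safe.text_table_b1 a
  join cx_XX_safe.text_table_c1 c on a.id = c.id;
select b.name from cx_XXX_safe.text_table_b1 b;
"""
databases = extract_database_names(script)
```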

S13, acquiring a data table corresponding to each database.

In the script, each database name is followed by the data tables that belong to that database. After a database name is identified, the table names that follow it can be extracted, and the data tables corresponding to the database can be determined from those table names.

In an optional embodiment, acquiring the data table corresponding to the database includes:

acquiring the database name of the database, and determining the database name as a first keyword;

determining, according to the first keyword, a second keyword included in the database script, where the second keyword is located after the first keyword;

generating a data table name according to the second keyword, and determining the data table corresponding to the database according to the data table name.

After keyword matching has identified a database name from the text that follows the target feature word in the word segmentation result, the database name is determined as the first keyword. According to a preset extraction rule, several tokens after the first keyword are extracted and determined as the second keyword. The data table name is determined from the content of the second keyword, which gives the data table corresponding to the database. For example, the processor may determine the database name cx_XX_safe as the first keyword and, according to a preset extraction rule (such as extracting text_table statements), determine the extracted text_table statements as the second keyword, for example text_table b1, .text_table c1, .text_table e1. The data table names (text_table b1, .text_table c1, .text_table e1) are then determined from the content of the second keyword, giving the data tables (b1, c1, e1) corresponding to the database. Likewise, the database name cx_XXX_safe may be determined as the first keyword, and the extracted text_table statements, for example text_table b1, .text_table c2, .text_table e2, may be determined as the second keyword. The data table names (text_table b1, .text_table c2, .text_table e2) are then determined from the content of the second keyword, giving the data tables (b1, c2, e2) corresponding to the database.
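The first-keyword/second-keyword extraction can be sketched as below. As an assumption, the "preset extraction rule" is taken to be "the identifier that follows the database name and a dot", using `database.table` notation instead of the notation in the example above; the function name `tables_by_database` and the sample script are hypothetical.

```python
import re


def tables_by_database(script: str, databases: list[str]) -> dict[str, list[str]]:
    """Map each database name (first keyword) to the table names found after it."""
    tables: dict[str, list[str]] = {}
    for db in databases:
        # Second keyword: the identifier right after "<database>." (assumed rule).
        pattern = re.compile(r"\b" + re.escape(db) + r"\s*\.\s*([A-Za-z_]\w*)")
        for match in pattern.finditer(script):
            names = tables.setdefault(db, [])
            if match.group(1) not in names:
                names.append(match.group(1))
    return tables


script = (
    "select a.name from cx_XX_safe.text_table_b1 a "
    "join cx_XX_safe.text_table_c1 c on a.id = c.id; "
    "select b.name from cx_XXX_safe.text_table_b1 b"
)
tables = tables_by_database(script, ["cx_XX_safe", "cx_XXX_safe"])
```

Tables with the same name in different databases stay separate because each list is keyed by its database name.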

S14, determining a data link corresponding to each database based on a pre-built data map and the data table.

A data flow direction may exist between two data tables; it indicates where the data in a data table comes from. For example, if data table A flows to data table B, then data table A is the source of data table B, and data table B is produced by processing data table A. The pre-built data map records the data flow directions of the data tables, so the data link corresponding to each database can be determined from the data map and the data tables. A data link is the sequence in which data tables are processed when the data to be processed is handled. From the data link, the number of data tables that must be processed for the data to be processed can be determined.
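One way to picture this step is to treat the data map as a list of `(source, target)` edges and to collect every upstream edge needed to produce a database's tables. This is only a sketch under that assumed representation; the function name `data_link` and the table names are hypothetical, and the application does not prescribe a traversal algorithm.

```python
from collections import deque

Edge = tuple[str, str]  # (source_table, target_table), e.g. A flows to B


def data_link(tables: list[str], data_map: list[Edge]) -> set[Edge]:
    """Collect all upstream edges in the data map that lead to the given tables."""
    # Index the data map by target table so sources can be looked up quickly.
    upstream: dict[str, list[str]] = {}
    for source, target in data_map:
        upstream.setdefault(target, []).append(source)

    link: set[Edge] = set()
    visited = set(tables)
    queue = deque(tables)
    while queue:  # breadth-first walk from the tables back to their sources
        table = queue.popleft()
        for source in upstream.get(table, []):
            link.add((source, table))
            if source not in visited:
                visited.add(source)
                queue.append(source)
    return link


data_map = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "D")]
link_c = data_link(["C"], data_map)
link_d = data_link(["D"], data_map)
```

Here both links contain the edge from `A` to `B`, which is the kind of shared segment that later steps treat as a common link.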

In an optional implementation, before the data link corresponding to each database is determined based on the pre-built data map and the data tables, the method further includes:

Obtaining a historical database script;

Performing text recognition on the historical database script to determine data flow direction text in the historical database script;

Determining the data table flow directions according to the data flow direction text;

Generating the data map based on the data table flow directions.

Database instructions use fixed instruction words. Text recognition is performed on the historical database script to identify the instruction words that correspond to data flow, which yields the data flow direction text; the data table flow directions are then determined from that text to obtain the data map.

In an optional implementation, performing text recognition on the historical database script and determining the data flow direction text in the historical database script includes:

Calling a preset word segmentation toolkit to segment the historical database script and obtain a word segmentation result;

Determining, from the word segmentation result, the positions of the preset flow direction words in the historical database script;

Extracting the data flow direction text from the historical database script according to the positions of the preset flow direction words.

The preset flow direction words can be determined according to the language of the database operation statements used in the historical database script; no restriction is placed on the preset flow direction words here. For example, if the language of the database operation statements is the object-oriented query language HQL (Hibernate Query Language), the preset flow direction words may be insert and select…from, and the statement at the position of each preset flow direction word in the historical database script is extracted as data flow direction text.

For example, if the data flow direction text is "insert overwrite table T2; select a, b from table T1", text recognition determines the data flow direction table T1->table T2 (table T1 flows to table T2). If the data of table T1 was in turn produced from table T0 by the same process, the flow table T0->table T1 (table T0 flows to table T1) is formed, which finally yields one data link: table T0->table T1->table T2 (table T0 flows to table T1, and table T1 flows to table T2).

Suppose there is also a table T3 which, as shown in FIG. 2, is obtained by joining T0 and T4.

In this case a data map can be obtained from table T1, table T2 and table T3. As shown in FIG. 3, the data map includes three links: table T0->table T1->table T2 (table T0 flows to table T1, and table T1 flows to table T2); table T0->table T3 (table T0 flows to table T3); table T4->table T3 (table T4 flows to table T3).
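The construction of the data map from historical scripts can be sketched roughly as follows. This is a minimal Python sketch, not the application's implementation: the regular expressions stand in for the word segmentation and flow-direction-word lookup, the function name flow_edges is hypothetical, and the join statement for T3 is an invented example, since the application only shows the T1->T2 statement.

```python
import re

# Assumed flow direction words: "insert ... table X" names the target,
# "from"/"join" name the sources.
TARGET = re.compile(r"\binsert\s+(?:overwrite|into)\s+table\s+(\w+)", re.I)
SOURCE = re.compile(r"\b(?:from|join)\s+(?:table\s+)?(\w+)", re.I)

def flow_edges(statement):
    """Return the (source, target) table pairs found in one statement."""
    target = TARGET.search(statement)
    if target is None:
        return set()
    return {(src, target.group(1)) for src in SOURCE.findall(statement)}

history = [
    "insert overwrite table T1; select a, b from table T0",
    "insert overwrite table T2; select a, b from table T1",
    # Hypothetical statement for T3, which joins T0 and T4.
    "insert overwrite table T3; select x from table T0 join table T4 on T0.k = T4.k",
]
data_map = set().union(*(flow_edges(s) for s in history))
print(sorted(data_map))
# [('T0', 'T1'), ('T0', 'T3'), ('T1', 'T2'), ('T4', 'T3')]
```

The four edges correspond to the three links listed above: T0->T1->T2, T0->T3 and T4->T3.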

S15: Determine the common links in the data links corresponding to every two databases.

A common link is a link that appears in the data links of both databases.

Exemplarily, based on the pre-built data map, the data link corresponding to database cx_XX_safe is determined as shown in FIG. 4, and the data link corresponding to database cx_XXX_safe is determined as shown in FIG. 5.

From the data link of database cx_XX_safe and the data link of database cx_XXX_safe, it is determined that the two databases share two common links: table a1->table b1 (table a1 flows to table b1); table d1->table b1 (table d1 flows to table b1).
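If each data link is modeled as a set of (source, target) edges, the common links of two databases are simply the intersection of their edge sets. The Python sketch below assumes that model. FIG. 4 and FIG. 5 are not reproduced here, so the edges other than a1->b1 and d1->b1 are hypothetical placeholders, as is the function name common_links.

```python
from itertools import combinations

# Hypothetical data links; only a1->b1 and d1->b1 come from the text.
links = {
    "cx_XX_safe": {("a1", "b1"), ("d1", "b1"), ("b1", "c1"), ("c1", "e1")},
    "cx_XXX_safe": {("a1", "b1"), ("d1", "b1"), ("b1", "c2"), ("c2", "e2")},
}

def common_links(links):
    """Map each pair of databases to the edges their data links share."""
    shared = {}
    for a, b in combinations(sorted(links), 2):
        overlap = links[a] & links[b]
        if overlap:
            shared[(a, b)] = overlap
    return shared

for pair, overlap in common_links(links).items():
    print(pair, sorted(overlap))
```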

S16: Based on the common links, determine the target data link corresponding to the database script, and perform data processing based on the target data link.

The data links of two databases that share a common link are merged to obtain the target data link corresponding to the database script, and data processing is then performed along the target data link. This avoids the waste caused by processing the same link repeatedly and reduces CPU resource consumption.

In an optional implementation, determining the target data link corresponding to the database script based on the common links includes:

Determining the multiple overlapping databases corresponding to the common link;

Obtaining the data link corresponding to each overlapping database;

Merging, based on the common link, the data links corresponding to every two overlapping databases to obtain a first data link;

Obtaining a second data link according to the databases in the database script that do not correspond to the common link;

Determining the target data link corresponding to the database script according to the first data link and the second data link.

"Multiple overlapping databases" means two or more overlapping databases; one common link may correspond to two or more databases. To distinguish them from other databases, the databases corresponding to a common link are called overlapping databases. The overlapping databases corresponding to each common link are determined, and the data links of every two of those overlapping databases are merged to obtain the first data link.

For example, it is determined that the common links correspond to two overlapping databases (cx_XX_safe, cx_XXX_safe), and the common links between the two databases appear in FIG. 4 and FIG. 5. Based on the common links, the data links of the two overlapping databases are merged to obtain the first data link, as shown in FIG. 6.

The second data link is obtained from the databases in the database script that contain no common link: the data link of each such database is determined as a second data link.
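The merging step might be sketched as below, again treating a data link as a set of (source, target) edges. This is a simplified sketch under stated assumptions, not the application's implementation: all overlapping databases are merged into a single first data link, the function name merge_links is hypothetical, and the edges (apart from a1->b1 and d1->b1) and the database cx_other are invented for the example.

```python
def merge_links(links):
    """Return (first, second, target) edge sets for a script's databases.

    links maps a database name to its set of (source, target) edges.
    """
    names = sorted(links)
    overlapping = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if links[a] & links[b]:  # the two databases share a common link
                overlapping.update((a, b))
    # First data link: merged links of the overlapping databases.
    first = set().union(*(links[n] for n in overlapping))
    # Second data link: links of databases without any common link.
    second = set().union(*(links[n] for n in names if n not in overlapping))
    return first, second, first | second

links = {
    "cx_XX_safe": {("a1", "b1"), ("d1", "b1"), ("b1", "c1"), ("c1", "e1")},
    "cx_XXX_safe": {("a1", "b1"), ("d1", "b1"), ("b1", "c2"), ("c2", "e2")},
    "cx_other": {("m1", "n1")},  # hypothetical database with no common link
}
first, second, target = merge_links(links)
print(len(first), len(second), len(target))  # 6 1 7
```

Processed separately, the three databases would require 4 + 4 + 1 = 9 edge-processing steps; the merged target data link needs 7, because the two common links are processed once.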

The artificial intelligence-based data processing method described in this application recognizes the database script corresponding to the to-be-processed data with a text recognition algorithm and determines the multiple databases corresponding to the database script; then determines the data link of each database based on the data tables of that database and the pre-built data map; then determines the common links in the data links of every two databases, a common link being a link that would otherwise be processed repeatedly; and finally determines the target data link corresponding to the database script based on the common links and performs data processing based on the target data link. Processing along the target data link prevents the same data table from being processed repeatedly, which reduces resource consumption and improves data processing efficiency.

Embodiment 2

FIG. 7 is a structural diagram of the artificial intelligence-based data processing apparatus provided in Embodiment 2 of this application.

In some embodiments, the artificial intelligence-based data processing apparatus 20 may include multiple functional modules composed of computer program segments. The computer program of each program segment in the artificial intelligence-based data processing apparatus 20 may be stored in the memory of an electronic device and executed by at least one processor to perform the functions of the artificial intelligence-based data processing method (see the description of FIG. 1 for details).

In this embodiment, the artificial intelligence-based data processing apparatus 20 may be divided into multiple functional modules according to the functions it performs. The functional modules may include: a script acquisition module 201, a text recognition module 202, a data acquisition module 203, a link determination module 204, a link identification module 205 and a data processing module 206. A module in this application refers to a series of computer program segments that are stored in a memory, can be executed by at least one processor and perform a fixed function. The functions of each module are described in detail in the following embodiments.

The script acquisition module 201 is configured to obtain the database script corresponding to the to-be-processed data.

The to-be-processed data corresponding to a data processing task is obtained; the to-be-processed data may include multiple data tables in multiple databases. The database script corresponding to the to-be-processed data is then obtained. The database script may be a Hibernate Query Language (HQL) script or the like.

The text recognition module 202 is configured to perform text recognition on the database script and determine the multiple databases corresponding to the database script.

The data in the database script comes from multiple databases, each of which has a corresponding name and/or code. The database script can be recognized with a text recognition algorithm to identify the database names it contains and thereby determine the multiple databases corresponding to the database script.

In an optional implementation, the text recognition module 202 performing text recognition on the database script and determining the multiple databases corresponding to the database script includes:

Performing word segmentation on the database script to obtain a word segmentation result;

Performing keyword detection on the word segmentation result to identify, from the word segmentation result, the multiple databases corresponding to the database script.

Exemplarily, a preset word segmentation toolkit (such as the jieba toolkit) may be called to segment the database script and obtain the word segmentation result. This can improve the efficiency and accuracy of keyword recognition.

In an optional implementation, the text recognition module 202 performing keyword detection on the word segmentation result and identifying, from the word segmentation result, the multiple databases corresponding to the database script includes:

Determining the position of a target feature word in the word segmentation result by using a preset feature lexicon, where the feature lexicon includes multiple feature words and a feature word is a characteristic word of the database operation statements used in the database script;

Identifying, by keyword matching, the database name from the text that follows the position of the target feature word in the word segmentation result;

Determining the multiple databases corresponding to the database script according to the database names.

The feature lexicon may include multiple feature words. A feature word is a characteristic word of the database operation statements used in the database script; the content of the feature words is not limited here. For example, if the database operation statements are Structured Query Language (SQL) statements, the feature words may be where, from, join and certain symbols (such as $).

To identify the database names contained in the database script, word segmentation can be performed on the database script to obtain a word segmentation result. Specifically, a word segmentation toolkit such as the jieba toolkit can be used to segment each SQL statement in the database script. Suppose that segmenting each SQL statement in the database script of the data processing task with the jieba toolkit gives a word segmentation result that includes the following content:

Select A.name
, B.type
, C.group
From cx_XX_safe:.text_table b1, .text_table c1, .text_table e1.

Select B.name
, B.type
, C.group
From cx_XXX_safe:text_table b1, .text_table c2, .text_table e2.

After the word segmentation result is obtained, keyword detection can be performed on it to identify the database names contained in the database script. Specifically, a preset feature lexicon is used to determine the position of the target feature word in the word segmentation result, and keyword matching then identifies the database name from the text that follows that position. For example, the preset feature lexicon is used to locate From in the word segmentation result, and keyword matching identifies the database name cx_XX_safe from "cx_XX_safe:.text_table b1, .text_table c1, .text_table e1" and the database name cx_XXX_safe from "cx_XXX_safe:text_table b1, .text_table c2, .text_table e2".

After the database names are identified, the multiple databases corresponding to the database script can be determined from them.
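The lookup of database names after a feature word can be illustrated with a short Python sketch. It is an approximation under stated assumptions: a plain regular-expression tokenizer stands in for the word segmentation toolkit, the feature lexicon is reduced to from and join, the token right after a feature word is taken as the database name, and the names tokenize and database_names are hypothetical.

```python
import re

FEATURE_WORDS = {"from", "join"}  # assumed subset of the feature lexicon

def tokenize(script):
    """Stand-in for the segmentation toolkit: split into identifier tokens."""
    return re.findall(r"\w+", script)

def database_names(script):
    """Return, in order, the distinct tokens that directly follow a feature word."""
    tokens = tokenize(script)
    names = []
    for i, token in enumerate(tokens[:-1]):
        if token.lower() in FEATURE_WORDS and tokens[i + 1] not in names:
            names.append(tokens[i + 1])
    return names

script = """Select A.name, B.type, C.group
From cx_XX_safe:.text_table b1, .text_table c1, .text_table e1
Select B.name, B.type, C.group
From cx_XXX_safe:text_table b1, .text_table c2, .text_table e2"""
print(database_names(script))  # ['cx_XX_safe', 'cx_XXX_safe']
```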

The data acquisition module 203 is configured to obtain the data tables corresponding to each database.

In the database script, each database name is followed by the data tables of that database. After a database name is identified, the table names that follow it can be extracted, and the data tables corresponding to the database are determined from the table names.

In an optional implementation, the data acquisition module 203 obtaining the data tables corresponding to the database includes:

Obtaining the database name of the database and determining the database name as a first keyword;

Determining, according to the first keyword, a second keyword contained in the database script, where the second keyword is located after the first keyword;

Generating data table names according to the second keyword, and determining the data tables corresponding to the database according to the data table names.

Keyword matching is performed on the target tokens. After the database name is identified from the text that follows the position of the target feature word in the word segmentation result, the database name is determined as the first keyword. According to a preset extraction rule, several tokens following the first keyword are extracted and determined as the second keyword. The data table names are determined from the content of the second keyword, thereby obtaining the data tables corresponding to the database. For example, the processor may determine the database name cx_XX_safe as the first keyword and, according to a preset extraction rule (such as extracting text table statements), determine the extracted text table statements as the second keyword, for example text_table b1, .text_table c1 and .text_table e1; the data table names are then determined from the content of the second keyword, giving the data tables (b1, c1, e1) corresponding to the database. Likewise, the processor may determine the database name cx_XXX_safe as the first keyword, determine text_table b1, .text_table c2 and .text_table e2 as the second keyword, and obtain the data tables (b1, c2, e2) corresponding to that database.

The link determination module 204 is configured to determine the data link corresponding to each database based on the pre-built data map and the data tables.

A data flow direction may exist between any two data tables; the data flow direction indicates where the data in a data table comes from. For example, if data table A flows to data table B, then data table A is the source of data table B, and data table B is derived by processing data table A. The pre-built data map records the data flow directions of the data tables, so the data link corresponding to each database can be determined from the data map and the data tables. The data link is the sequence in which data tables are processed when the to-be-processed data is processed. From the data link, the number of data tables that must be processed can be determined.

In an optional implementation, before determining the data link corresponding to each database based on the pre-built data map and the data tables, the link determination module 204 is further configured to:

Obtain a historical database script;

Perform text recognition on the historical database script to determine data flow direction text in the historical database script;

Determine the data table flow directions according to the data flow direction text;

Generate the data map based on the data table flow directions.

Database instructions use fixed instruction words. Text recognition is performed on the historical database script to identify the instruction words that correspond to data flow, which yields the data flow direction text; the data table flow directions are then determined from that text to obtain the data map.

In an optional implementation, the link determination module 204 performing text recognition on the historical database script and determining the data flow direction text in the historical database script includes:

Calling a preset word segmentation toolkit to segment the historical database script and obtain a word segmentation result;

Determining, from the word segmentation result, the positions of the preset flow direction words in the historical database script;

Extracting the data flow direction text from the historical database script according to the positions of the preset flow direction words.

The preset flow direction words can be determined according to the language of the database operation statements used in the historical database script; no restriction is placed on the preset flow direction words here. For example, if the language of the database operation statements is the object-oriented query language HQL (Hibernate Query Language), the preset flow direction words may be insert and select…from, and the statement at the position of each preset flow direction word in the historical database script is extracted as data flow direction text.

For example, if the data flow direction text is "insert overwrite table T2; select a, b from table T1", text recognition determines the data flow direction table T1->table T2 (table T1 flows to table T2). If the data of table T1 was in turn produced from table T0 by the same process, the flow table T0->table T1 (table T0 flows to table T1) is formed, which finally yields one data link: table T0->table T1->table T2 (table T0 flows to table T1, and table T1 flows to table T2).

Suppose there is also a table T3 which, as shown in FIG. 2, is obtained by joining T0 and T4.

In this case a data map can be obtained from table T1, table T2 and table T3. As shown in FIG. 3, the data map includes three links: table T0->table T1->table T2 (table T0 flows to table T1, and table T1 flows to table T2); table T0->table T3 (table T0 flows to table T3); table T4->table T3 (table T4 flows to table T3).

The link identification module 205 is configured to determine the common links in the data links corresponding to every two databases.

A common link is a link that appears in the data links of both databases.

Exemplarily, based on the pre-built data map, the data link corresponding to database cx_XX_safe is determined as shown in FIG. 4, and the data link corresponding to database cx_XXX_safe is determined as shown in FIG. 5.

From the data link of database cx_XX_safe and the data link of database cx_XXX_safe, it is determined that the two databases share two common links: table a0->table b1 (table a0 flows to table b1); table d1->table b1 (table d1 flows to table b1).

The data processing module 206 is configured to determine, based on the common links, the target data link corresponding to the database script, and to perform data processing based on the target data link.

The data links of two databases that share a common link are merged to obtain the target data link corresponding to the database script, and data processing is then performed along the target data link. This avoids the waste caused by processing the same link repeatedly and reduces CPU resource consumption.

In an optional implementation, the data processing module 206 determining the target data link corresponding to the database script based on the common links includes:

Determining the multiple overlapping databases corresponding to the common link;

Obtaining the data link corresponding to each overlapping database;

Merging, based on the common link, the data links corresponding to every two overlapping databases to obtain a first data link;

Obtaining a second data link according to the databases in the database script that do not correspond to the common link;

Determining the target data link corresponding to the database script according to the first data link and the second data link.

"Multiple overlapping databases" means two or more overlapping databases; one common link may correspond to two or more databases. To distinguish them from other databases, the databases corresponding to a common link are called overlapping databases. The overlapping databases corresponding to each common link are determined, and the data links of every two of those overlapping databases are merged to obtain the first data link.

For example, it is determined that the common links correspond to two overlapping databases (cx_XX_safe, cx_XXX_safe), and the common links between the two overlapping databases appear in FIG. 4 and FIG. 5. Based on the common links, the data links of the two overlapping databases are merged to obtain the first data link, as shown in FIG. 6.

The second data link is obtained from the databases in the database script that contain no common link: the data link of each such database is determined as a second data link.

The artificial intelligence-based data processing apparatus described in this application recognizes the database script corresponding to the to-be-processed data with a text recognition algorithm and determines the multiple databases corresponding to the database script; then determines the data link of each database based on the data tables of that database and the pre-built data map; then determines the common links in the data links of every two databases, a common link being a link that would otherwise be processed repeatedly; and finally determines the target data link corresponding to the database script based on the common links and performs data processing based on the target data link. Processing along the target data link prevents the same data table from being processed repeatedly, which reduces resource consumption and improves data processing efficiency.

实施例三Embodiment 3

This embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above embodiment of the artificial-intelligence-based data processing method, for example steps S11 to S16 shown in FIG. 1:

S11, obtaining a database script corresponding to the data to be processed;

S12, performing text recognition on the database script to determine the multiple databases corresponding to the database script;

S13, obtaining the data tables corresponding to each database;

S14, determining the data link corresponding to each database based on a pre-built data map and the data tables;

S15, determining the common link among the data links corresponding to every two databases;

S16, determining, based on the common link, the target data link corresponding to the database script, and performing data processing based on the target data link.
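Steps S12 and S13 can be sketched as follows. This is a simplified illustration, not the applicant's implementation: it assumes the script is SQL, that tables are written as qualified `database.table` names, and that the feature words are the SQL keywords listed in the hypothetical `FEATURE_WORDS` set. A regular expression stands in for the word segmentation step.

```python
import re
from collections import defaultdict

# Hypothetical feature-word lexicon: keywords that are followed by a table reference.
FEATURE_WORDS = {"from", "join", "into", "table", "update"}


def tokenize(script):
    """Split a SQL script into identifier-like tokens, keeping dots inside names."""
    return re.findall(r"[A-Za-z_][\w.]*", script)


def databases_and_tables(script):
    """Return {database name: set of table names} found right after feature words."""
    tokens = tokenize(script)
    found = defaultdict(set)
    for prev, tok in zip(tokens, tokens[1:]):
        # Only qualified names of the form database.table are considered here.
        if prev.lower() in FEATURE_WORDS and "." in tok:
            db, table = tok.split(".", 1)
            found[db].add(table)
    return dict(found)


script = """
INSERT INTO dw.sales_daily
SELECT o.id, c.name FROM ods.orders o JOIN crm.customers c ON o.cid = c.id;
"""

result = databases_and_tables(script)
```

Column references such as `o.id` are ignored because they do not follow a feature word; a production parser would also need to handle quoting, comments, and unqualified table names.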

Alternatively, when executed by the processor, the computer program implements the functions of the modules/units in the above apparatus embodiment, for example modules 201 to 206 in FIG. 7:

a script acquisition module 201, configured to obtain the database script corresponding to the data to be processed;

a text recognition module 202, configured to perform text recognition on the database script and determine the multiple databases corresponding to the database script;

a data acquisition module 203, configured to obtain the data tables corresponding to each database;

a link determination module 204, configured to determine the data link corresponding to each database based on a pre-built data map and the data tables;

a link identification module 205, configured to determine the common link among the data links corresponding to every two databases;

a data processing module 206, configured to determine, based on the common link, the target data link corresponding to the database script, and to perform data processing based on the target data link.
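The link determination module 204 relies on a pre-built data map. One possible reading, sketched below under stated assumptions, treats the data map as a directed graph of table-to-table flows extracted from historical scripts, and a data link as the tables reachable downstream from a starting table. The flow words (`INSERT INTO`/`INSERT OVERWRITE` for the target, `FROM`/`JOIN` for the sources), the function names, and the sample scripts are all hypothetical.

```python
import re
from collections import defaultdict


def table_flows(script):
    """Extract (source, target) table pairs from INSERT ... SELECT statements."""
    flows = []
    for statement in script.split(";"):
        target = re.search(
            r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", statement, re.I
        )
        if not target:
            continue
        sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", statement, re.I)
        flows.extend((src, target.group(1)) for src in sources)
    return flows


def build_data_map(history_scripts):
    """Build a graph {source table: set of target tables} from historical scripts."""
    data_map = defaultdict(set)
    for script in history_scripts:
        for src, dst in table_flows(script):
            data_map[src].add(dst)
    return data_map


def data_link(data_map, table):
    """List the tables reachable downstream of `table`, each visited once."""
    link, stack, seen = [], [table], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        link.append(current)
        # Push in reverse-sorted order so tables are visited alphabetically.
        stack.extend(sorted(data_map.get(current, ()), reverse=True))
    return link


history = [
    "INSERT INTO dwd.orders_clean SELECT * FROM ods.orders;",
    "INSERT INTO dws.sales_daily SELECT * FROM dwd.orders_clean; "
    "INSERT INTO dws.risk_score SELECT * FROM dwd.orders_clean",
]
data_map = build_data_map(history)
link = data_link(data_map, "ods.orders")
```

The visited set keeps the traversal finite even if historical scripts describe a cycle between tables.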

Embodiment 4

Referring to FIG. 8, a schematic structural diagram of the electronic device provided by Embodiment 3 of the present application is shown. In a preferred embodiment of the present application, the electronic device 3 includes a memory 31, at least one processor 32, a transceiver 33, and at least one communication bus 34.

Those skilled in the art should understand that the structure of the electronic device shown in FIG. 8 does not limit the embodiments of the present application. The structure may be a bus topology or a star topology, and the electronic device 3 may include more or fewer hardware or software components than shown, or a different arrangement of components.

In some embodiments, the electronic device 3 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, or voice-control device, for example a personal computer, tablet, smartphone, or digital camera.

It should be noted that the electronic device 3 is only an example. Other existing or future electronic products that are applicable to the present application should also fall within the scope of protection of the present application and are incorporated herein by reference.

In some embodiments, the memory 31 stores a computer program that, when executed by the at least one processor 32, implements all or some of the steps of the artificial-intelligence-based data processing method described above. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Further, the computer-readable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of a blockchain node, and the like.

The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
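The idea of data blocks linked by cryptographic methods can be shown with a minimal hash-chain sketch. It is a generic illustration rather than anything specified by the application: SHA-256 over a JSON serialization is an assumed choice, the block fields are hypothetical, and consensus and networking are omitted entirely.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents with SHA-256 over a canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Append a block that records the hash of its predecessor."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})
    return chain


def is_valid(chain):
    """Check that every block still matches the hash stored by its successor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["table_a processed"])
append_block(chain, ["table_b processed"])
valid_before = is_valid(chain)

# Tampering with an earlier block breaks the link stored in the next block.
chain[0]["transactions"] = ["forged"]
valid_after = is_valid(chain)
```

Because each block stores the hash of the previous one, altering any earlier block invalidates every later link, which is the anti-counterfeiting property the paragraph refers to.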

In some embodiments, the at least one processor 32 is the control unit of the electronic device 3. It connects the components of the electronic device 3 through various interfaces and lines, and performs the functions of the electronic device 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31. For example, when executing the computer program stored in the memory, the at least one processor 32 implements all or some of the steps of the artificial-intelligence-based data processing method described in the embodiments of the present application, or implements all or some of the functions of the artificial-intelligence-based data processing apparatus. The at least one processor 32 may consist of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.

In some embodiments, the at least one communication bus 34 is configured to provide connection and communication between the memory 31, the at least one processor 32, and other components.

Although not shown, the electronic device 3 may also include a power supply (such as a battery) that powers the components. Preferably, the power supply is logically connected to the at least one processor 32 through a power management device, so that charging, discharging, and power consumption are managed through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components. The electronic device 3 may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described further here.

An integrated unit implemented in the form of software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, an electronic device, a network device, or the like) or a processor to execute part of the methods described in the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the present application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present application. No reference sign in a claim should be construed as limiting that claim. Furthermore, the word "comprising" does not exclude other units, and the singular does not exclude the plural. Multiple units or apparatuses stated in the specification may also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.

Claims (10)

1. An artificial-intelligence-based data processing method, characterized in that the method comprises:
obtaining a database script corresponding to data to be processed;
performing text recognition on the database script to determine multiple databases corresponding to the database script;
obtaining the data tables corresponding to each database;
determining the data link corresponding to each database based on a pre-built data map and the data tables;
determining the common link among the data links corresponding to every two databases;
determining, based on the common link, a target data link corresponding to the database script, and performing data processing based on the target data link.

2. The artificial-intelligence-based data processing method according to claim 1, characterized in that performing text recognition on the database script to determine the multiple databases corresponding to the database script comprises:
performing word segmentation on the database script to obtain a word segmentation result;
performing keyword detection on the word segmentation result, and identifying from the word segmentation result the multiple databases corresponding to the database script.

3. The artificial-intelligence-based data processing method according to claim 2, characterized in that performing keyword detection on the word segmentation result and identifying from the word segmentation result the multiple databases corresponding to the database script comprises:
using a preset feature lexicon to determine the position of a target feature word in the word segmentation result, wherein the feature lexicon includes multiple feature words, and a feature word is a characteristic word of a database operation statement used by the database script;
performing keyword matching on the word segmentation result, and identifying database names from the text that follows the position of the target feature word in the word segmentation result;
determining the multiple databases corresponding to the database script according to the database names.

4. The artificial-intelligence-based data processing method according to claim 1, characterized in that obtaining the data tables corresponding to each database comprises:
obtaining the database name of the database, and taking the database name as a first keyword;
determining, according to the first keyword, a second keyword included in the database script, the second keyword being located after the first keyword;
generating a data table name according to the second keyword, and determining the data table corresponding to the database according to the data table name.

5. The artificial-intelligence-based data processing method according to claim 1, characterized in that determining, based on the common link, the target data link corresponding to the database script comprises:
determining multiple overlapping databases corresponding to the common link;
obtaining the data link corresponding to each overlapping database;
merging, based on the common link, the data links corresponding to every two overlapping databases to obtain a first data link;
obtaining a second data link according to the databases in the database script that do not correspond to the common link;
determining the target data link corresponding to the database script according to the first data link and the second data link.

6. The artificial-intelligence-based data processing method according to claim 1, characterized in that, before determining the data link corresponding to each database based on the pre-built data map and the data tables, the method further comprises:
obtaining historical database scripts;
performing text recognition on the historical database scripts, and determining data flow text in the historical database scripts;
determining the flow direction of data tables according to the data flow text;
generating the data map based on the flow direction of the data tables.

7. The artificial-intelligence-based data processing method according to claim 6, characterized in that performing text recognition on the historical database scripts and determining the data flow text in the historical database scripts comprises:
calling a preset word segmentation toolkit to perform word segmentation on the historical database script to obtain a word segmentation result;
determining, from the word segmentation result, the position of a preset flow-direction word in the historical database script;
extracting the data flow text from the historical database script according to the position of the preset flow-direction word in the historical database script.

8. An artificial-intelligence-based data processing apparatus, characterized in that the apparatus comprises:
a script acquisition module, configured to obtain a database script corresponding to data to be processed;
a text recognition module, configured to perform text recognition on the database script and determine multiple databases corresponding to the database script;
a data acquisition module, configured to obtain the data tables corresponding to each database;
a link determination module, configured to determine the data link corresponding to each database based on a pre-built data map and the data tables;
a link identification module, configured to determine the common link among the data links corresponding to every two databases;
a data processing module, configured to determine, based on the common link, a target data link corresponding to the database script, and to perform data processing based on the target data link.

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, and the processor, when executing a computer program stored in the memory, implements the artificial-intelligence-based data processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the artificial-intelligence-based data processing method according to any one of claims 1 to 7.
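The merging recited in claim 5 can be sketched as below. The sketch assumes, for illustration only, that a data link is an ordered list of table names, that two databases overlap when their links share at least one hop, and that the target data link is simply the merged hops of the overlapping databases followed by the hops of the remaining databases. The function names and sample data are hypothetical.

```python
def hops(link):
    """Return the ordered (source, target) hops of one data link."""
    return list(zip(link, link[1:]))


def target_link(links_by_db):
    """Merge overlapping links so shared hops appear once, then add the rest."""
    hop_sets = {db: set(hops(link)) for db, link in links_by_db.items()}
    overlapping = [
        db
        for db in links_by_db
        if any(hop_sets[db] & hop_sets[other] for other in links_by_db if other != db)
    ]
    # First data link: hops of overlapping databases, each shared hop kept once.
    first, seen = [], set()
    for db in overlapping:
        for hop in hops(links_by_db[db]):
            if hop not in seen:
                seen.add(hop)
                first.append(hop)
    # Second data link: hops of databases that share nothing with the others.
    second = [
        hop
        for db in links_by_db
        if db not in overlapping
        for hop in hops(links_by_db[db])
    ]
    return first + second


links = {
    "db_sales": ["ods.orders", "dwd.orders_clean", "dws.sales_daily"],
    "db_risk": ["ods.orders", "dwd.orders_clean", "dws.risk_score"],
    "db_hr": ["ods.staff", "dwd.staff_clean"],
}
plan = target_link(links)
```

In this example the hop from `ods.orders` to `dwd.orders_clean` appears once in `plan` although two databases depend on it, which is the saving the method aims for.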
CN202210551314.5A 2022-05-18 2022-05-18 Data processing method and device based on artificial intelligence, electronic equipment and medium Pending CN114925688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551314.5A CN114925688A (en) 2022-05-18 2022-05-18 Data processing method and device based on artificial intelligence, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210551314.5A CN114925688A (en) 2022-05-18 2022-05-18 Data processing method and device based on artificial intelligence, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN114925688A true CN114925688A (en) 2022-08-19

Family

ID=82810341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551314.5A Pending CN114925688A (en) 2022-05-18 2022-05-18 Data processing method and device based on artificial intelligence, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114925688A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040098284A1 (en) * 2002-09-18 2004-05-20 Petito Daniel A. Automated work-flow management system with dynamic interface
CN110909016A (en) * 2019-10-12 2020-03-24 中国平安财产保险股份有限公司 Database-based repeated association detection method, device, equipment and storage medium
CN113900956A (en) * 2021-10-29 2022-01-07 平安银行股份有限公司 Test case generation method, device, computer equipment and storage medium
CN114443619A (en) * 2022-01-27 2022-05-06 中国电信股份有限公司 Database capacity expansion method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination