CN116150185A

CN116150185A - Data standard extraction method, device, equipment and medium based on artificial intelligence

Info

Publication number: CN116150185A
Application number: CN202310152800.4A
Authority: CN
Inventors: 李健智; 贺春艳; 梁子敬; 秦魏
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2023-02-16
Filing date: 2023-02-16
Publication date: 2023-05-23

Abstract

The application provides a data standard extraction method and device based on artificial intelligence, electronic equipment and storage medium, wherein the data standard extraction method based on artificial intelligence comprises the following steps: acquiring service data in a service database to obtain a service basic data set; extracting code value class fields in the service basic data set to obtain an enumeration value list; generating a field vector based on the code value class field to obtain a plurality of classes of field sets; calculating the similarity between the enumerated value lists to construct a code value similarity matrix; constructing a connected graph based on the code value similarity matrix to obtain a plurality of field connected graphs; extracting the code value information of the code value type field, and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database. The method and the device can comprehensively consider the association relation between the code value information in the field annotation and the fields, and acquire the data standard by using the graph algorithm, so that the redundancy of the data standard is reduced, and the use efficiency of the database is improved.

Description

Data standard extraction method, device, equipment and medium based on artificial intelligence

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a data standard extraction method, device, electronic device, and storage medium based on artificial intelligence.

Background

Data standard extraction is an important part of a big data management system. Once a database reaches a certain scale, the number of tables, the number of fields in each table, and the number of participants in the database system all grow, and the data standards become varied and inconsistent, which raises the cost of using the database system. Extracting a unified data standard is therefore an important part of data governance: it can reduce the cost of using the database system and improve the business efficiency of the database, thereby reducing cost and increasing efficiency.

In conventional data standard extraction, each field in the database is usually treated as an independent field, and the association relationships between database fields are ignored. In fact, in a large database system, relationships within a table or even between tables are not uncommon. For example, the value of a field may be obtained from a joint query over multiple tables; such a mapping is called an association relationship. Fields that share the same mapping relationship have the same data standard, and ignoring such relationships often makes the data standard redundant, which reduces the efficiency of using the database.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a method, an apparatus, an electronic device and a storage medium for extracting a data standard based on artificial intelligence, so as to solve the technical problem of how to reduce the redundant portion of the data standard, thereby improving the use efficiency of the database.

The application provides a data standard extraction method based on artificial intelligence, which comprises the following steps:

acquiring service data in a service database to obtain a service basic data set;

extracting code value class fields in the service basic data set to obtain an enumeration value list, wherein the code value class fields are in one-to-one correspondence with the enumeration value list;

generating a field vector based on the code value class field, and grouping the code value class field based on the field vector to obtain field groups of various categories;

calculating the similarity between the enumerated value lists corresponding to the code value class fields in the field group to construct a code value similarity matrix;

constructing a connectivity graph among the code value class fields based on the code value similarity matrix to obtain a plurality of field connectivity graphs;

extracting the code value information of the code value type field, and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database.

In some embodiments, the acquiring the service data in the service database to obtain the service base data set comprises:

accessing a system table and a system view of the service database to collect metadata of the service database;

generating a service data query statement according to a preset service data sampling rate;

and acquiring service data in a service database based on the metadata and the service data query statement to obtain a service basic data set.

In some embodiments, the extracting the code value class field in the service basic data set to obtain an enumerated value list, where the code value class field corresponds to the enumerated value list one-to-one, includes:

removing the null field in the service basic data set to obtain a service field data set;

extracting repeated fields in the service field data set as code value class fields;

enumerating specific values of the code value type fields as enumeration values, and counting the occurrence frequency of the enumeration values in the service basic data set;

and constructing an enumeration value list corresponding to each code value class field based on the enumeration values and the frequency.

In some embodiments, the generating a field vector based on the code value class field and grouping the code value class field based on the field vector to obtain a plurality of types of field groups includes:

converting the code value class field into a field vector according to a word vector model;

calculating the similarity between the field vectors;

and dividing the code value type field into field groups of multiple categories based on the similarity and a preset similarity threshold.

In some embodiments, the enumerated value list is a key-value pair structure, and the calculating the similarity between the enumerated value lists corresponding to the code-value class fields in the field group to construct a code-value similarity matrix includes:

converting keys in an enumeration value list corresponding to each code value class field in the field group into key vectors;

calculating the similarity between the key vectors to construct a key similarity matrix;

taking the value in the enumeration value list corresponding to each code value type field in the field group as a value vector;

calculating the similarity between the value vectors to construct a value similarity matrix;

and carrying out weighted summation on the key similarity matrix and the value similarity matrix to obtain a code value similarity matrix.

In some embodiments, the constructing a connectivity graph between the code value class fields based on the code value similarity matrix to obtain a plurality of field connectivity graphs includes:

comparing each code value similarity in the code value similarity matrix with a preset code value similarity threshold to obtain a comparison result;

updating the code value similarity matrix based on the comparison result to obtain a graph accompanying matrix;

and constructing a connectivity graph among the code value class fields based on the graph accompanying matrix to obtain a plurality of field connectivity graphs.

In some embodiments, the extracting the code value information of the code value class field and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database includes:

extracting code value information from the annotation of the code value class field according to a preset language rule;

combining the code value class fields represented by the vertexes of each field connected graph into the same group to obtain a code value class field group;

combining the code value information of each code value field in the code value field group as the data standard of the code value field group;

and taking the data standard of all code value class field groups as the data standard of the service database.

The embodiment of the application also provides a data standard extraction device based on artificial intelligence, which comprises an acquisition module, an extraction module, a grouping module, a calculation module, a construction module and a fusion module:

the acquisition module is used for acquiring service data in the service database to obtain a service basic data set;

the extraction module is used for extracting code value class fields in the service basic data set to obtain an enumeration value list, and the code value class fields are in one-to-one correspondence with the enumeration value list;

the grouping module is used for generating a field vector based on the code value class field, and grouping the code value class field based on the field vector to obtain field groups of various types;

the computing module is used for computing the similarity between the enumerated value lists corresponding to the code value type fields in the field group so as to construct a code value similarity matrix;

the construction module is used for constructing a connectivity graph among the code value class fields based on the code value similarity matrix to obtain a plurality of field connectivity graphs;

and the fusion module is used for extracting the code value information of the code value type field and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database.

The embodiment of the application also provides electronic equipment, which comprises:

a memory storing at least one instruction;

and the processor executes the instructions stored in the memory to realize the data standard extraction method based on artificial intelligence.

Embodiments of the present application also provide a computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement the artificial intelligence-based data standard extraction method.

According to the method and the device, the code value similarity matrix is built by utilizing the collected service data to comprehensively consider the association relation between the code value information in the field annotation and the fields, and the complete data standard is obtained by combining the graph algorithm, so that the redundancy of the data standard is reduced, the use efficiency of a database is improved, and the operation cost of the database is reduced.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of an artificial intelligence based data standard extraction method in accordance with the present application.

FIG. 2 is a functional block diagram of a preferred embodiment of an artificial intelligence based data standard extraction device in accordance with the present application.

Fig. 3 is a schematic structural diagram of an electronic device in a preferred embodiment of the artificial intelligence based data standard extraction method according to the present application.

Fig. 4 is a diagram showing an example of the structure of a code value similarity matrix according to the present application.

Fig. 5 is a diagram showing an example of the structure of the graph accompanying matrix according to the present application.

Fig. 6 is a structural example diagram of a field connectivity diagram according to the present application.

Fig. 7 is a basic information example diagram of the code value class field to which the present application relates.

Fig. 8 is a structural example diagram of the extracted data standard according to the present application.

Detailed Description

In order that the objects, features and advantages of the present application may be more clearly understood, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with each other. Numerous specific details are set forth in the following description to provide a thorough understanding of the present application; the described embodiments are merely some, rather than all, of the embodiments of the present application.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The embodiment of the application provides a data standard extraction method based on artificial intelligence, which can be applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like.

The electronic device may be any electronic product capable of human-machine interaction with a user, such as a personal computer, a tablet, a smart phone, a personal digital assistant (PDA), a gaming machine, an Internet Protocol television (IPTV), a smart wearable device, etc.

The electronic device may also include a network device and/or a client device. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud based on cloud computing and composed of a large number of hosts or network servers.

The network in which the electronic device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.

FIG. 1 is a flow chart of a preferred embodiment of the artificial intelligence based data standard extraction method of the present application. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.

S10, collecting service data in a service database to obtain a service basic data set.

In an alternative embodiment, the acquiring the service data in the service database to obtain the service base data set includes:

In this alternative embodiment, metadata of the service database may be collected by logging in to the database and accessing the system tables and system views of the service database via system commands. The service database may be any database that supports SQL statements, such as MySQL, SQL Server, Oracle, or Sybase. The metadata describes attribute information of the service data, such as database table names, field annotations, data types, constraint conditions, and table relationships, and supports functions such as indicating storage locations, historical data, resource searching, and file recording.

In this alternative embodiment, a suitable service data sampling rate may be selected to generate a service data query statement, where the service data query statement collects a corresponding amount of service data in each database table from the storage location indicated by the metadata according to the service data sampling rate, so as to implement sampling collection of all service data in the service database.

Taking MySQL database as an example, the generated business data query statement is:

SELECT * FROM table_name ORDER BY RAND() LIMIT sample_size, where table_name is the name of the table being sampled, and sample_size, the number of rows sampled from each table, is equal to the product of the total amount of service data in that table and the service data sampling rate.

Thus, a certain amount of service data can be randomly collected from each table of the service database to form a service basic data set, and data support is provided for extracting data standards in the follow-up process.
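The query generation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `build_sampling_queries`, the `table_row_counts` mapping and the example table names are hypothetical, and the row counts are assumed to have been obtained from the collected metadata.

```python
def build_sampling_queries(table_row_counts, sampling_rate):
    """Return one MySQL sampling query per table.

    table_row_counts: hypothetical mapping {table_name: total_rows},
    assumed to come from the collected metadata.
    sampling_rate: preset service data sampling rate, a fraction in (0, 1].
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    queries = {}
    for table, total_rows in table_row_counts.items():
        # Sample size = total rows of the table x sampling rate (at least 1 row).
        sample_size = max(1, round(total_rows * sampling_rate))
        # Table names are assumed to come from trusted metadata, not user input.
        queries[table] = (
            f"SELECT * FROM `{table}` ORDER BY RAND() LIMIT {sample_size}"
        )
    return queries


queries = build_sampling_queries({"policy": 1000, "customer": 250}, 0.1)
for statement in queries.values():
    print(statement)
```

Note that ORDER BY RAND() sorts the whole table before applying LIMIT, so on very large tables a cheaper sampling strategy may be preferable.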

S11, extracting code value class fields in the service basic data set to obtain an enumeration value list, wherein the code value class fields are in one-to-one correspondence with the enumeration value list.

In an optional embodiment, the extracting the code value class field in the service basic data set to obtain an enumerated value list, where the code value class field corresponds to the enumerated value list one-to-one, includes:

In this alternative embodiment, null-value conditions (for example, IS NULL or IS NOT NULL) may be specified in the SQL statement to remove unnecessary null fields from the service basic data set, and the service basic data set without the null fields is used as the service field data set.

In this optional embodiment, the repeated fields in the service field data set may be extracted with an SQL statement of the form "SELECT * FROM table name WHERE condition" and used as code value class fields. The code value class fields are then enumerated, that is, the specific values of each code value class field are listed one by one as enumeration values, and the frequency with which each enumeration value occurs in the service basic data set is recorded, so as to construct an enumeration value list corresponding to each code value class field, where the enumeration value list is a key-value pair structure.

Illustratively, the enumerated value list for code value class field A is {"identity card": 917, "driver's license": 83}, which indicates that only two keys, identity card and driver's license, appear for code value class field A in the service basic data set: identity card appears 917 times and driver's license appears 83 times. "identity card": 917 is one key-value pair, and "driver's license": 83 is another.

Therefore, data redundancy can be reduced by removing the null fields, and, by constructing the enumeration value lists, the effective information of each code value class field can be extracted in the form of key-value pairs, which improves the efficiency of calculating the data standard from this information in the subsequent process.
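The construction of enumeration value lists can be sketched in memory as follows. This is only an illustration under stated assumptions: the sampled rows are assumed to be available as dictionaries, and the rule used here to decide that a field is a code value class field (its values repeat and it has at most `max_distinct` distinct values) is a hypothetical heuristic, since the text does not fix one.

```python
from collections import Counter


def build_enum_value_lists(rows, max_distinct=20):
    """Return {field: {enumeration value: frequency}} for code value class fields.

    rows: sampled records as dictionaries (hypothetical in-memory form).
    max_distinct: assumed upper bound on distinct values for a code value field.
    """
    columns = {}
    for row in rows:
        for field, value in row.items():
            # Skip null or empty values so that null fields drop out entirely.
            if value is None or value == "":
                continue
            columns.setdefault(field, []).append(value)

    enum_lists = {}
    for field, values in columns.items():
        counts = Counter(values)
        # Heuristic: values repeat and only a few distinct values exist.
        if len(counts) < len(values) and len(counts) <= max_distinct:
            enum_lists[field] = dict(counts)
    return enum_lists


rows = [
    {"name": "A", "id_type": "identity card", "remark": None},
    {"name": "B", "id_type": "identity card", "remark": None},
    {"name": "C", "id_type": "driver's license", "remark": None},
    {"name": "D", "id_type": "identity card", "remark": None},
]
enum_lists = build_enum_value_lists(rows)
print(enum_lists)
```

In this toy data set, "remark" is dropped as a null field and "name" is not treated as a code value class field because none of its values repeat.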

S12, generating a field vector based on the code value class field, and grouping the code value class field based on the field vector to obtain field groups of various categories.

In an optional embodiment, the generating a field vector based on the code value class field and grouping the code value class field based on the field vector to obtain a field group of multiple categories includes:

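converting the code value class field into a field vector according to a word vector model;

calculating the similarity between the field vectors;

and dividing the code value class field into field groups of multiple categories based on the similarity and a preset similarity threshold.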

In this alternative embodiment, the Chinese name of the code value class field may be converted into a field vector by a word vector model, where the word vector model may be any one of word vector models such as word2vec, GloVe, ELMo, or BERT, which is not specifically limited in this scheme.

In this optional embodiment, the association relationship between the code value fields can be measured by calculating the similarity between the field vectors through a cosine similarity algorithm, the value range of the similarity is [0,1], and the code value fields are divided into field groups of multiple categories through a preset similarity threshold. The similarity threshold may be 0.7, if the similarity between the two field vectors is not less than 0.7, the code value class fields corresponding to the two field vectors are classified into the field groups of the same class, and if the similarity between the two field vectors is less than 0.7, the code value class fields corresponding to the two field vectors need to be classified into the field groups of different classes respectively.

In this alternative embodiment, if a field group of a category includes a plurality of code value class fields, and the cosine similarity between a field vector of any one code value class field and a field vector of a code value class field to be divided is not less than 0.7, the code value class field to be divided may be classified into the field group of the category.

By way of example, "acceptor certificate type", "buyer certificate type", "employee certificate type" and "group certificate type" may be divided, based on the Chinese names of the code value class fields, into the same field group, which represents certificate types.

Therefore, the code value type fields with the association relation can be divided into the same field group by calculating the similarity among the code value type fields, so that the accurate data standard can be conveniently extracted from the field group of the same category in the subsequent process.
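The grouping rule above (a field joins a group if its cosine similarity to any member of that group is not less than the threshold) can be sketched as follows. The three-dimensional vectors are made-up stand-ins for the output of a word vector model, and `group_fields` is a hypothetical helper name; this is an illustration, not the applicant's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_fields(field_vectors, threshold=0.7):
    """Greedy single-pass grouping of fields by cosine similarity."""
    groups = []
    for name, vector in field_vectors.items():
        for group in groups:
            # Join the first group containing any sufficiently similar member.
            if any(cosine(vector, field_vectors[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy vectors; real field vectors would come from a word vector model.
field_vectors = {
    "acceptor certificate type": [0.9, 0.1, 0.0],
    "buyer certificate type": [0.8, 0.2, 0.1],
    "purchase channel": [0.0, 0.1, 0.9],
}
groups = group_fields(field_vectors)
print(groups)
```

Because the pass is greedy, a field that is similar to members of two existing groups joins only the first one; merging such groups would need an extra step that is not shown here.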

S13, calculating the similarity between the enumerated value lists corresponding to the code value type fields in the field group to construct a code value similarity matrix.

In an optional embodiment, the calculating the similarity between the enumerated value lists corresponding to the code value class fields in the field set to construct a code value similarity matrix includes:

In this alternative embodiment, one-hot encoding may be used to convert the keys in the enumerated value list corresponding to each code value class field in the field group into key vectors composed of numeric codes. A Jaccard similarity algorithm is then used to calculate the Jaccard similarity between the key vectors, and a key similarity matrix is constructed from these Jaccard similarities. In this scheme, the Jaccard similarity between the key vectors of two code value class fields is used as the first similarity between those fields, and any element C_ij of the key similarity matrix represents the first similarity between the i-th code value class field and the j-th code value class field in the field group.

One-hot encoding, also known as one-bit effective encoding, uses an N-bit status register to encode N states: each state has its own register bit, only one bit is set at a time, and the code is expressed in binary form. By way of example, to encode a student's gender [male, female], there are 2 states, i.e. N=2, so the states may be expressed as follows:

male → [1,0]; female → [0,1].

In this alternative embodiment, the Jaccard similarity algorithm is mainly used to calculate the similarity between sample sets. Since each enumerated value list contains a plurality of keys, the ratio of the number of elements in the intersection of the key sets of two enumerated value lists to the number of elements in their union can be calculated as the Jaccard coefficient, which reflects the Jaccard similarity between the two. The Jaccard coefficient takes values in [0,1], and the closer it is to 1, the higher the similarity between the two.
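The key similarity computation can be sketched as follows. As an assumption of this sketch, each field's key vector is the binary indicator vector obtained by combining the one-hot codes of its keys over the vocabulary of all keys in the field group; the Jaccard coefficient is then the number of positions set in both vectors divided by the number set in either. The helper names and frequencies are hypothetical.

```python
def key_vectors(enum_lists):
    """Encode each field's key set as a binary vector over the shared vocabulary."""
    vocabulary = sorted({key for enum in enum_lists.values() for key in enum})
    return {
        field: [1 if key in enum else 0 for key in vocabulary]
        for field, enum in enum_lists.items()
    }


def jaccard(u, v):
    """Jaccard coefficient of two binary vectors: |intersection| / |union|."""
    intersection = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return intersection / union if union else 0.0


def key_similarity_matrix(enum_lists):
    """Matrix C where C[i][j] is the first similarity between fields i and j."""
    vectors = key_vectors(enum_lists)
    fields = list(enum_lists)
    return [[jaccard(vectors[a], vectors[b]) for b in fields] for a in fields]


enum_lists = {
    "acceptor certificate type": {"identity card": 917, "driver's license": 83},
    "buyer certificate type": {
        "identity card": 640,
        "driver's license": 50,
        "passport": 10,
    },
}
C = key_similarity_matrix(enum_lists)
print(C)
```

Here the two fields share two keys out of three distinct keys overall, so their first similarity is 2/3.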

In this optional embodiment, because the values in the enumerated value lists are numeric, the values in the enumerated value list corresponding to each code value class field in the field group can be used directly as value vectors, and the cosine similarity between the value vectors is measured with a cosine similarity algorithm to construct a value similarity matrix. In this scheme, the similarity between the value vectors of two code value class fields is used as the second similarity between those fields, and any element R_ij of the value similarity matrix represents the second similarity between the i-th code value class field and the j-th code value class field in the field group.

In this alternative embodiment, different weights may be assigned to the key similarity matrix and the value similarity matrix, and the weighted sum of the corresponding elements of the two matrices, that is, of the first similarity and the second similarity, is taken as the code value similarity. The weights assigned to the key similarity matrix and the value similarity matrix in this scheme may be 0.6 and 0.4, respectively. That is, K_ij = 0.6C_ij + 0.4R_ij represents the code value similarity between the i-th and the j-th code value class fields in the field group.

Illustratively, as shown in FIG. 4, for a field group representing certificate types, fields 1 through 4 represent the code value class fields "acceptor certificate type", "buyer certificate type", "employee certificate type" and "group certificate type", respectively. The first similarities between field 1 and fields 2, 3 and 4 are 1, 0.9 and 0.4, and the second similarities are 0.5, 0.4 and 0.4, respectively. Substituting these into K_ij = 0.6C_ij + 0.4R_ij gives code value similarities between field 1 and fields 2, 3 and 4 of 0.8, 0.7 and 0.4, respectively. The code value similarities between the other fields can be calculated in the same way, forming the code value similarity matrix shown in fig. 4.

Therefore, more accurate code value similarity can be obtained by comprehensively calculating the key similarity and the value similarity between the enumerated value lists corresponding to the code value type fields, so that the accuracy of the data standard extracted in the subsequent process is improved.
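The weighted summation can be checked with a short sketch. It applies K_ij = 0.6C_ij + 0.4R_ij element by element, using only the row for field 1 from the example (first similarities 1, 0.9, 0.4; second similarities 0.5, 0.4, 0.4, the last value being the one consistent with the stated result of 0.4). The function name `combine` is hypothetical, and the full matrices of fig. 4 are not reproduced.

```python
def combine(key_sim, value_sim, w_key=0.6, w_value=0.4):
    """Element-wise weighted sum of two equally shaped similarity matrices."""
    return [
        [w_key * c + w_value * r for c, r in zip(c_row, r_row)]
        for c_row, r_row in zip(key_sim, value_sim)
    ]


# Row of field 1 against fields 2, 3 and 4 from the example in the text.
C_row = [[1.0, 0.9, 0.4]]
R_row = [[0.5, 0.4, 0.4]]
K_row = [[round(k, 2) for k in row] for row in combine(C_row, R_row)]
print(K_row)
```

The rounding is only there to hide floating-point noise when comparing against the values quoted in the example.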

S14, constructing a connectivity graph among the code value class fields based on the code value similarity matrix to obtain a plurality of field connectivity graphs.

In an optional embodiment, the constructing a connectivity graph between the code value class fields based on the code value similarity matrix to obtain a plurality of field connectivity graphs includes:

In this optional embodiment, an appropriate code value similarity threshold may be selected. If a code value similarity in the code value similarity matrix is not less than the threshold, the two corresponding code value class fields are highly similar. A comparison result is therefore obtained by comparing each code value similarity in the matrix with the preset code value similarity threshold, where the comparison result is either "less than the code value similarity threshold" or "not less than the code value similarity threshold".

In this alternative embodiment, as shown in fig. 5, if a code value similarity is less than the code value similarity threshold, the two corresponding code value class fields are considered unrelated and the element is set to 0; if the code value similarity is not less than the threshold, the element is set to 1. All element values in the code value similarity matrix are replaced in this way, and the updated matrix is used as the graph accompanying matrix, which plays the role of an adjacency matrix for the graph. The code value similarity threshold may be 0.85.

In this alternative embodiment, the two code value class fields corresponding to a position whose value is 1 in the graph accompanying matrix are taken as two connected vertices, and the two code value class fields corresponding to a position whose value is 0 have no connectivity relation, so as to construct the field connectivity graphs among the code value class fields.

Illustratively, in the graph accompanying matrix shown in fig. 5, the element in the second row and third column is 1, which indicates that code value class field 2 and code value class field 3 are connected; meanwhile, the element in the third row and fourth column is 0, which indicates that code value class field 3 and code value class field 4 have no connectivity relation, so that the field connectivity graph shown in fig. 6 is obtained.

Thus, field connectivity graphs among the code value class fields can be generated from the graph accompanying matrix, and the association relationships among the code value class fields are expressed clearly in graph form.
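The thresholding and graph construction can be sketched as follows. The 4x4 similarity matrix is hypothetical (the values of fig. 4 are not reproduced); it is chosen so that, as in the example, fields 2 and 3 are connected while fields 3 and 4 are not. Each field connectivity graph is recovered as a connected component by depth-first search, which is one common way to do this and is not claimed to be the applicant's graph algorithm.

```python
def to_accompanying_matrix(similarity, threshold=0.85):
    """Binarize the code value similarity matrix: 1 if >= threshold, else 0."""
    return [[1 if s >= threshold else 0 for s in row] for row in similarity]


def connected_components(matrix):
    """Return the connected components (lists of vertex indices) of the graph."""
    n = len(matrix)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in range(n):
                if matrix[node][neighbor] and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical code value similarity matrix for four fields (indices 0..3).
similarity = [
    [1.00, 0.90, 0.30, 0.20],
    [0.90, 1.00, 0.88, 0.10],
    [0.30, 0.88, 1.00, 0.40],
    [0.20, 0.10, 0.40, 1.00],
]
matrix = to_accompanying_matrix(similarity)
components = connected_components(matrix)
print(matrix)
print(components)
```

Fields 1 and 3 (indices 0 and 2) end up in the same component even though their direct similarity is low, because both are connected to field 2.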

S15, extracting the code value information of the code value type field, and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database.

In an optional embodiment, the extracting the code value information of the code value class field and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database includes:

In this alternative embodiment, the code value information may be extracted from the annotation of the code value class field by setting language rules. Illustratively, if the Chinese annotation of code value class field B is "purchase channel 0: external purchasing, 1: internal purchasing", the extractable code value information is "0: external purchasing, 1: internal purchasing". In general, the code value information in the annotation of a code value class field is linked by special symbols such as ":" and "-", so the code value information can be extracted by writing language rules in advance and generating a matching script in combination with regular expressions.

Illustratively, if the Chinese annotation of code value class field C is "certificate type 01: identity card, 02: passport", the extractable code value information is "01: identity card, 02: passport".
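A regular-expression rule of the kind described above can be sketched as follows. The pattern is an assumption made for illustration: it treats a short alphanumeric code followed by ":", a full-width colon or "-" as the start of a code value entry, and reads the meaning up to the next comma or semicolon. Real annotations would likely need additional rules.

```python
import re

# code, then a linking symbol, then the meaning up to the next separator.
CODE_VALUE_PATTERN = re.compile(r"([A-Za-z0-9]+)\s*[:：\-]\s*([^,，;；]+)")


def extract_code_values(annotation):
    """Return {code: meaning} extracted from a field annotation string."""
    return {
        code: meaning.strip()
        for code, meaning in CODE_VALUE_PATTERN.findall(annotation)
    }


info_b = extract_code_values(
    "purchase channel 0: external purchasing, 1: internal purchasing"
)
info_c = extract_code_values("certificate type 01: identity card, 02: passport")
print(info_b)
print(info_c)
```

The leading description ("purchase channel", "certificate type") is skipped because it is not followed by one of the linking symbols.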

In this alternative embodiment, for each field connection graph, the code value fields corresponding to the vertices connected on the field connection graph may be combined into the same packet, and the packet obtained after the combination may be used as the code value field group. And then merging the code value information of each code value field in the code value field group to be used as the data standard of the code value field group. In the scheme, the data standard of all code value field groups is finally used as the data standard common to the service database.

Illustratively, the code value class field shown in fig. 7 is finally combined by the code value information to obtain the data standard shown in fig. 8.

Thus, the complete code value information can be obtained through combining the code value information, so that a more complete data standard universal for the database is obtained.
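The fusion step can be sketched as follows, assuming each field connectivity graph is given as a list of field names and the extracted code value information is a dictionary per field. The names `merge_standards`, `components` and `field_code_values` are hypothetical, and the rule of keeping the first meaning seen for a code is an assumption, since the text does not say how conflicting meanings are resolved.

```python
def merge_standards(components, field_code_values):
    """Merge the code value information of the fields in each connectivity graph."""
    standards = []
    for component in components:
        merged = {}
        for field in component:
            for code, meaning in field_code_values.get(field, {}).items():
                # Assumption: on conflict, the first meaning seen is kept.
                merged.setdefault(code, meaning)
        standards.append({"fields": list(component), "code_values": merged})
    return standards


components = [
    ["acceptor certificate type", "buyer certificate type"],
    ["purchase channel"],
]
field_code_values = {
    "acceptor certificate type": {"01": "identity card"},
    "buyer certificate type": {"01": "identity card", "02": "passport"},
    "purchase channel": {"0": "external purchasing", "1": "internal purchasing"},
}
standards = merge_standards(components, field_code_values)
for standard in standards:
    print(standard)
```

In this toy case the certificate type group ends up with the union of both fields' code values, which is the more complete standard that the merging is meant to produce.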

Referring to fig. 2, fig. 2 is a functional block diagram of a preferred embodiment of the artificial intelligence based data standard extraction device of the present application. The artificial intelligence based data standard extraction device 11 comprises an acquisition module 110, an extraction module 111, a grouping module 112, a calculation module 113, a construction module 114 and a fusion module 115. A unit/module referred to herein is a series of computer readable instructions that is stored in the memory 12, can be executed by the processor 13, and performs a fixed function. The functions of the respective units/modules are described in detail in the following embodiments.

In an alternative embodiment, the acquisition module 110 is configured to collect service data in a service database to obtain a service base data set.

In an alternative embodiment, the extracting module 111 is configured to extract a code value class field in the service basic data set to obtain an enumerated value list, where the code value class field corresponds to the enumerated value list one-to-one.

In an alternative embodiment, the grouping module 112 is configured to generate a field vector based on the code value class field, and group the code value class field based on the field vector to obtain a field group of multiple categories.

calculating the similarity between the field vectors;

In an alternative embodiment, the calculating module 113 is configured to calculate the similarity between the enumerated value lists corresponding to the code value class fields in the field set to construct a code value similarity matrix.

man → [1,0]; woman → [0,1].

Illustratively, as shown in fig. 4, for a field group representing certificate types, fields 1 through 4 represent the code value class fields "acceptor certificate type", "buyer certificate type", "employee certificate type" and "group certificate type", respectively. The first similarities between field 1 and fields 2, 3 and 4 are 1, 0.9 and 0.4, and the second similarities are 0.5, 0.4 and 0.4, respectively. Substituting these values into $K_{ij} = 0.6C_{ij} + 0.4R_{ij}$ gives code value similarities between field 1 and fields 2, 3 and 4 of 0.8, 0.7 and 0.4, respectively. The code value similarities between the other fields can be calculated in the same way, forming the code value similarity matrix shown in fig. 4.
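The arithmetic of this example can be checked with a few lines of code. The sketch below applies the weights 0.6 and 0.4 from the formula to the first and second similarities given for field 1. The function name `code_value_similarity` and the rounding to two decimals are assumptions made for illustration.

```python
def code_value_similarity(first, second, w_first=0.6, w_second=0.4):
    """Combine first similarities C and second similarities R as K = 0.6*C + 0.4*R.

    Results are rounded to two decimals (an assumption for readability).
    """
    return [
        round(w_first * c + w_second * r, 2)
        for c, r in zip(first, second)
    ]


# Similarities between field 1 and fields 2, 3 and 4 from the example.
first_similarity = [1, 0.9, 0.4]
second_similarity = [0.5, 0.4, 0.4]
k = code_value_similarity(first_similarity, second_similarity)
print(k)  # [0.8, 0.7, 0.4]
```

For instance, the entry for fields 1 and 3 is 0.6 × 0.9 + 0.4 × 0.4 = 0.54 + 0.16 = 0.70, which matches the value stated in the example.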

In an alternative embodiment, the construction module 114 is configured to construct a connection graph between the fields of each code value class based on the code value similarity matrix to obtain a plurality of field connection graphs.
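One possible way to derive field connection graphs from a code value similarity matrix is sketched below. It assumes that two fields are joined by an edge when their code value similarity reaches a threshold, which is an assumption of this sketch rather than something stated in this passage, and it collects the connected components with a depth-first search. The function name `connected_fields`, the threshold 0.6, and every matrix entry other than the first row and column (taken from the fig. 4 example) are hypothetical.

```python
def connected_fields(similarity, threshold):
    """Return the connected components of the graph whose edges are the
    field pairs (i, j), i != j, with similarity[i][j] >= threshold."""
    n = len(similarity)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and j != i and similarity[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(component))
    return components


# Indices 0..3 stand for fields 1..4. The first row/column uses the values
# 0.8, 0.7 and 0.4 from the example; the remaining entries are made up.
matrix = [
    [1.0, 0.8, 0.7, 0.4],
    [0.8, 1.0, 0.6, 0.3],
    [0.7, 0.6, 1.0, 0.2],
    [0.4, 0.3, 0.2, 1.0],
]
components = connected_fields(matrix, threshold=0.6)
```

With this hypothetical threshold, fields 1 to 3 fall into one connection graph and field 4 remains on its own. A different threshold or edge rule would change the grouping.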

In an alternative embodiment, the fusion module 115 is configured to extract the code value information of the code value class field, and fuse the code value information based on the field connectivity graph to obtain the data standard of the service database.

According to the technical scheme, the association relation between the code value information in the field annotation and the fields can be comprehensively considered by constructing the code value similarity matrix by utilizing the collected service data, and the complete data standard is obtained by combining the graph algorithm, so that the redundancy of the data standard is reduced, the use efficiency of a database is improved, and the operation cost of the database is reduced.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1 comprises a memory 12 and a processor 13. The memory 12 is configured to store computer readable instructions and the processor 13 is configured to execute the computer readable instructions stored in the memory to implement the artificial intelligence based data standard extraction method according to any of the embodiments described above.

In an alternative embodiment, the electronic device 1 further comprises a bus and a computer program, for example an artificial intelligence based data standard extraction program, that is stored in the memory 12 and executable on the processor 13.

Fig. 3 shows only an electronic device 1 with a memory 12 and a processor 13. It will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, which may comprise fewer or more components than shown, combine certain components, or use a different arrangement of components.

In connection with fig. 1, the memory 12 in the electronic device 1 stores a plurality of computer readable instructions to implement an artificial intelligence based data standard extraction method, and the processor 13 can execute the plurality of instructions to implement:

Specifically, the specific implementation method of the above instructions by the processor 13 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.

It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of it. The electronic device 1 may have a bus type structure or a star type structure, and may further comprise more or fewer hardware or software components than illustrated, or a different arrangement of components; for example, the electronic device 1 may further comprise an input-output device, a network access device, etc.

It should be noted that the electronic device 1 is only an example; other electronic products, whether existing now or appearing in the future, are also included in the scope of the present application and are incorporated herein by reference.

The memory 12 includes at least one type of readable storage medium, which may be non-volatile or volatile. The readable storage medium includes flash memory, a removable hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. The memory 12 may in other embodiments be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 1. The memory 12 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the artificial intelligence based data standard extraction program, but also to temporarily store data that has been output or is to be output.

The processor 13 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 13 is the control unit (Control Unit) of the electronic device 1; it connects the components of the entire electronic device 1 using various interfaces and lines, runs or executes programs or modules stored in the memory 12 (for example, the artificial intelligence based data standard extraction program), and invokes data stored in the memory 12 to perform the various functions of the electronic device 1 and process data.

The processor 13 executes the operating system of the electronic device 1 and the various installed applications. The processor 13 executes the application program to implement the steps of the various embodiments of the artificial intelligence based data standard extraction method described above, such as the steps shown in fig. 1.

Illustratively, the computer program may be split into one or more units/modules, which are stored in the memory 12 and executed by the processor 13 to complete the present application. The one or more units/modules may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into an acquisition module 110, an extraction module 111, a grouping module 112, a calculation module 113, a construction module 114, a fusion module 115.

The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform portions of the artificial intelligence based data standard extraction methods described in the various embodiments of the present application.

The integrated units/modules of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand alone product. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by instructing the relevant hardware device by a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program may implement the steps of each method embodiment described above when executed by a processor.

Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory, other memories, and the like.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

The bus may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one arrow is shown in fig. 3, but this does not mean that there is only one bus or one type of bus. The bus is arranged to enable connection and communication between the memory 12 and the at least one processor 13, among other components.

The embodiment of the present application further provides a computer readable storage medium (not shown), where computer readable instructions are stored, where the computer readable instructions are executed by a processor in an electronic device to implement the data standard extraction method based on artificial intelligence according to any one of the embodiments above.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated in one processing unit, each module may exist alone physically, or two or more modules may be integrated in one unit. The integrated units can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Furthermore, it is evident that the word "comprising" does not exclude other modules or steps, and that the singular does not exclude a plurality. The various modules or means set forth in the specification may also be implemented by one module or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Finally, it should be noted that the above embodiments are merely for illustrating the technical solution of the present application and not for limiting, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application.

Claims

1. A method for extracting data standards based on artificial intelligence, the method comprising:

2. The artificial intelligence based data standard extraction method of claim 1, wherein the collecting the service data in the service database to obtain the service base data set comprises:

3. The method for extracting artificial intelligence based data standard according to claim 1, wherein the extracting the code value class field in the service base data set to obtain an enumerated value list, the code value class field being in one-to-one correspondence with the enumerated value list comprises:

4. The artificial intelligence based data standard extraction method of claim 1, wherein the generating a field vector based on the code value class field and grouping the code value class field based on the field vector to obtain a plurality of classes of field groups comprises:

calculating the similarity between the field vectors;

5. The method of claim 1, wherein the enumerated value list is a key-value pair structure, and the calculating the similarity between the enumerated value lists corresponding to the code-value class fields in the field group to construct the code-value similarity matrix comprises:

6. The artificial intelligence based data standard extraction method of claim 1, wherein the constructing a connected graph between code value class fields based on the code value similarity matrix to obtain a plurality of field connected graphs comprises:

7. The method for extracting data standard based on artificial intelligence according to claim 1, wherein the steps of extracting the code value information of the code value class field, and fusing the code value information based on the field connectivity graph to obtain the data standard of the service database include:

8. An artificial intelligence based data standard extraction device, characterized in that the device comprises an acquisition module, an extraction module, a grouping module, a calculation module, a construction module and a fusion module:

9. An electronic device, the electronic device comprising:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based data standard extraction method of any one of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the artificial intelligence based data standard extraction method of any of claims 1 to 7.