
CN112527816A - Data blood relationship analysis method, system, computer device and storage medium - Google Patents

Info

Publication number
CN112527816A
Authority
CN
China
Prior art keywords
statement
structured query
log
logs
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011408452.5A
Other languages
Chinese (zh)
Other versions
CN112527816B (en
Inventor
皮天远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011408452.5A priority Critical patent/CN112527816B/en
Publication of CN112527816A publication Critical patent/CN112527816A/en
Priority to PCT/CN2021/083128 priority patent/WO2022116425A1/en
Application granted granted Critical
Publication of CN112527816B publication Critical patent/CN112527816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/219Managing data history or versioning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • G06F16/24556Aggregation; Duplicate elimination
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application belong to the technical field of big data, are applicable to the field of intelligent medical treatment, and relate to a data blood relationship analysis method, system, computer device and storage medium. The method comprises: a plurality of pre-deployed servers synchronously receiving logs distributed by a log distribution end; acquiring a structured query script in a log, and encrypting each structured query statement in the script with a preset encryption algorithm to obtain statement values, wherein identical structured query statements yield identical statement values; acquiring historical statement values stored in a database, checking whether a historical statement value identical to the statement value exists, and if not, parsing the structured query script to obtain a blood relationship result; and storing the statement value in the database and storing the blood relationship result in a preset result table. The result table may also be stored in a blockchain. The application makes it possible to quickly determine whether a structured query statement has already been parsed.

Description

Data blood relationship analysis method, system, computer device and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data blood relationship analysis method, system, computer device, and storage medium.
Background
As data is generated, processed, merged, circulated and retired, relationships form between data items; these relationships are called the blood relationship of the data (commonly known as data lineage). As computer technology develops and data volumes grow, analyzing the blood relationship among data becomes increasingly important. Blood relationship analysis makes data fusion processing traceable: starting from a given data object, it finds all related metadata objects and the relationships among them.
At present, when analyzing the blood relationship of structured query statements, the statement to be judged must be compared one by one with the structured query statements that have already been parsed in order to determine whether it has been parsed. It is therefore difficult for the computer to determine quickly whether a structured query statement has already undergone blood relationship analysis, and data processing is slow.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data blood relationship analysis method, system, computer device, and storage medium that can quickly determine whether a structured query statement has already been parsed.
In order to solve the above technical problem, an embodiment of the present application provides a data blood relationship analysis method, which adopts the following technical solutions:
a data blood relationship analysis method comprises the following steps:
a plurality of pre-deployed servers synchronously receive logs distributed by a log distribution end;
the server acquires the structured query script in the log, and encrypts the structured query statements in the structured query script respectively through a preset encryption algorithm to obtain statement values, wherein identical structured query statements yield identical statement values through the encryption algorithm;
the server acquires a historical statement value stored in a database, compares whether the historical statement value identical to the current statement value exists, and if the historical statement value identical to the current statement value does not exist, analyzes a structured query statement corresponding to the current statement value to obtain a blood relationship result;
and the server stores the statement value into the database and stores the blood relationship result into a preset result table.
Further, before the step in which the plurality of pre-deployed servers synchronously receive the logs, the method further includes:
the log distribution end identifies the number of logs currently being processed by each of the plurality of pre-deployed servers;
and the log distribution end acquires a plurality of logs to be distributed and distributes them to different servers based on those numbers of logs.
Further, the step in which the log distribution end acquires a plurality of logs to be distributed and distributes them to different servers based on the numbers of logs comprises:
the log distribution end distributes the logs one by one to the server currently processing the fewest logs, until the log distribution is completed or every server is currently processing the same number of logs;
and the log distribution end identifies whether undistributed logs remain and, when they do, distributes them to different servers.
Further, after the step of storing the statement value in the database and storing the blood relationship result in a preset result table by the server, the method further includes:
the current server receives a first blood relationship analysis completion signal carrying a first log identifier;
the current server identifies whether the difference between the number of received first blood relationship analysis completion signals and the number of deployed servers is one;
when the difference is one, the current server acquires the log identifier carried by the log it is currently processing, and when the current log identifier differs from the first log identifier, the current server performs a full deduplication operation on the blood relationship results in the result table;
and when the difference is not one, the current server acquires the log identifier carried by the log it is currently processing, and when the current log identifier differs from the first log identifier, it sends a second blood relationship analysis completion signal carrying the first log identifier to all the servers.
Further, the step of respectively encrypting the structured query statements in the structured query script through a preset encryption algorithm to obtain statement values includes:
respectively encrypting the structured query statements in the structured query script by an MD5 encryption algorithm to obtain the statement values.
Further, the step of performing parsing operation on the structured query statement corresponding to the current statement value to obtain a blood relationship result includes:
performing a preliminary blood relationship parsing operation on the structured query statement to obtain an information syntax tree;
and extracting the preset targets and their source libraries, source tables and source fields from the information syntax tree, and associating each target with its source library, source table and source field to obtain the blood relationship result.
Further, after the step of the server comparing whether the statement value is the same as the historical statement value in the database, the method further includes:
when the historical statement value identical to the statement value exists, determining the recording time of the blood relationship which has a mapping relation with the historical statement value, and modifying the recording time into the current time.
In order to solve the above technical problem, an embodiment of the present application further provides a data blood relationship analysis system, which adopts the following technical solution:
a data blood relationship analysis system comprises a plurality of servers and a log distribution end, wherein each server comprises a receiving module, an encryption module, a comparison module and a storage module;
the receiving module is used for receiving the logs distributed by the log distribution end;
the encryption module is used for acquiring the structured query script in the log and respectively encrypting the structured query statements in the structured query script through a preset encryption algorithm to obtain statement values, wherein identical structured query statements yield identical statement values through the encryption algorithm;
the comparison module is used for acquiring historical statement values stored in the database, comparing whether the historical statement values identical to the current statement values exist or not, and if the historical statement values identical to the current statement values do not exist, analyzing the structured query statements corresponding to the current statement values to obtain a blood relationship result;
the storage module is used for storing the statement value into the database and storing the blood relationship result into a preset result table.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions, and the processor implements the steps of the data blood relationship analysis method described above when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the data blood relationship analysis method described above.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: a plurality of pre-deployed servers synchronously receive the logs distributed by the log distribution end, so that the logs are processed synchronously and do not accumulate. The server encrypts each structured query statement with a preset encryption algorithm to obtain a statement value; by comparing the statement value with the historical statement values, it quickly determines whether the structured query statement corresponding to the current statement value has already undergone blood relationship analysis. If no identical historical statement value exists, the structured query script is parsed to obtain a blood relationship result. Parsing the structured query statement directly yields the complete blood relationship. Comparing the statement value with the historical statement values quickly establishes the parsing status and avoids parsing the same structured query statement repeatedly.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a data blood relationship analysis method according to the present application;
FIG. 3 is a schematic block diagram of an embodiment of a data blood relationship analysis system according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200, computer device; 201, memory; 202, processor; 203, network interface; 300, server; 301, receiving module; 302, encryption module; 303, comparison module; 304, storage module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
The data blood relationship analysis method is applied to a data blood relationship analysis system. As shown in FIG. 1, the system architecture includes servers and a log distribution end. The servers are connected to one another and to the log distribution end through a network, which may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The log distribution end may be any of various electronic devices that have a display screen and support web browsing, including but not limited to a smartphone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like.
It should be understood that the numbers of servers and log distribution ends in FIG. 1 are merely illustrative. There may be any number of servers and log distribution ends, as the implementation requires.
With continued reference to FIG. 2, a flow diagram of one embodiment of a data blood relationship analysis method according to the present application is shown. The data blood relationship analysis method comprises the following steps:
S1: A plurality of pre-deployed servers synchronously receive the logs distributed by the log distribution end.
In this embodiment, having a plurality of servers receive and process the logs synchronously makes it possible to handle large batches of logs, which avoids log accumulation and allows the logs to be parsed quickly.
In this embodiment, the server may receive the logs distributed by the log distribution end over a wired or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
S2: The server acquires the structured query script in the log, and encrypts the structured query statements in the structured query script respectively through a preset encryption algorithm to obtain statement values.
In this embodiment, each structured query statement is encrypted by the encryption algorithm to obtain a corresponding statement value, so that subsequent comparison and analysis can be carried out on the statement values.
Specifically, in step S2, the step of encrypting the structured query statements in the structured query script respectively by a preset encryption algorithm to obtain the statement values includes:
encrypting the structured query statements in the structured query script respectively by the MD5 encryption algorithm to obtain the statement values.
In this embodiment, the MD5 encryption algorithm, i.e. the MD5 Message-Digest Algorithm, is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is used to verify the integrity of transmitted messages. The MD5 algorithm is simple to run, which keeps the computer's data processing efficient, and for a fixed input string the resulting MD5 value is always the same, which makes it suitable for the present application.
It should be noted that, in practical applications, other encryption algorithms may be selected to encrypt the structured query statements, provided that encrypting the same structured query statement always yields the same statement value.
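As an illustration of step S2, the following minimal sketch computes an MD5 statement value for each statement of a script using Python's standard hashlib module. The helper names split_statements and statement_value, the sample script, and the naive splitting on semicolons are assumptions made for this example only; the application does not specify how a script is divided into statements.

```python
import hashlib


def split_statements(script):
    """Naively split a script on ';' (assumption: no ';' inside string literals)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def statement_value(statement):
    """Return the MD5 hex digest of one statement (32 hex characters = 128 bits)."""
    return hashlib.md5(statement.encode("utf-8")).hexdigest()


# Hypothetical script containing two structured query statements.
script = "INSERT INTO dw.orders SELECT id FROM ods.orders; SELECT 1;"
values = [statement_value(s) for s in split_statements(script)]
```

Because the digest depends only on the bytes of the statement, two textually identical statements always map to the same value, while statements that differ only in whitespace or letter case would not; any normalization before hashing would be an additional design choice.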
S3: the server acquires historical statement values stored in a database, compares whether the historical statement values identical to the current statement values exist, and if the historical statement values identical to the current statement values do not exist, analyzes the structured query statements corresponding to the current statement values to obtain a blood relationship result.
In this embodiment, if there is no historical statement value that is the same as the current statement value, it can be quickly determined that the structured query statement corresponding to the statement value has not been analyzed, so as to perform an analysis operation on the structured query statement, and obtain a blood relationship result corresponding to the structured query statement.
Specifically, in step S3, the step of performing a parsing operation on the structured query statement corresponding to the current statement value to obtain a blood relationship result includes:
performing a preliminary blood relationship parsing operation on the structured query statement to obtain an information syntax tree;
and extracting the preset targets and their source libraries, source tables and source fields from the information syntax tree, and associating each target with its source library, source table and source field to obtain the blood relationship result.
In this embodiment, the information syntax tree is obtained by a preliminary blood relationship parsing operation on the structured query statement. The preset target, its source library, source table and source field, and the associations among them can be extracted directly from the information syntax tree, and the blood relationship result is generated from these associations.
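The kind of record that the extraction step produces can be sketched as follows. This is a deliberately simplified stand-in: it uses a regular expression that only handles statements of the form INSERT INTO target SELECT fields FROM library.table, whereas the application builds an information syntax tree with a parser that it does not name. The function name extract_lineage and the record keys are hypothetical.

```python
import re

# Toy pattern for "INSERT INTO <target> SELECT <fields> FROM <library>.<table>".
# Assumption: one source table, plain column names, no joins or subqueries.
PATTERN = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s+"
    r"SELECT\s+(?P<fields>.+?)\s+"
    r"FROM\s+(?P<src_db>\w+)\.(?P<src_table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_lineage(statement):
    """Return one record per source field, each associated with the target."""
    match = PATTERN.search(statement)
    if match is None:
        return []  # statement shape not covered by this toy pattern
    fields = [f.strip() for f in match.group("fields").split(",")]
    return [
        {
            "target": match.group("target"),
            "source_db": match.group("src_db"),
            "source_table": match.group("src_table"),
            "source_field": field,
        }
        for field in fields
    ]


rows = extract_lineage("INSERT INTO dw.orders SELECT id, amount FROM ods.orders_raw")
```

Each returned record associates the target with a source library, source table and source field, which mirrors the association described above; a real syntax tree would be needed for joins, subqueries and expressions.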
S4: The server stores the statement value in the database and stores the blood relationship result in a preset result table.
In this embodiment, the statement values are stored in the database, expanding the historical statement values. And the blood relationship result is stored in a result table, so that the result table can be directly searched when needed.
After step S3, that is, after the step of the server comparing whether the statement value is the same as the historical statement value in the database, the method further includes:
when the historical statement value identical to the statement value exists, determining the recording time of the blood relationship which has a mapping relation with the historical statement value, and modifying the recording time into the current time.
In this embodiment, when a historical statement value identical to the statement value exists, it is determined that the structured query statement corresponding to the statement value has already been parsed, so it does not need to be parsed again. However, the recording time of the blood relationship mapped to that historical statement value is changed to the current time, which facilitates subsequent lookup and use. This also prevents the blood relationship from being deleted by other cleanup operations because its recording time has become too old.
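The comparison in step S3, the storage in step S4 and the refresh of the recording time can be sketched together as follows, here against an in-memory SQLite database. The table names statement_values and lineage_results, the column layout, and the function process are assumptions for illustration; the application does not specify a schema or a database product.

```python
import hashlib
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table of historical statement values, one result table.
conn.execute("CREATE TABLE statement_values (value TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE lineage_results (value TEXT, result TEXT, record_time REAL)"
)


def process(statement, parse, now=None):
    """Parse a statement only if its value is new; return True if it was parsed."""
    now = time.time() if now is None else now
    value = hashlib.md5(statement.encode("utf-8")).hexdigest()
    seen = conn.execute(
        "SELECT 1 FROM statement_values WHERE value = ?", (value,)
    ).fetchone()
    if seen:
        # Already parsed: only refresh the recording time of the mapped result.
        conn.execute(
            "UPDATE lineage_results SET record_time = ? WHERE value = ?",
            (now, value),
        )
        return False
    # New statement: store its value and the parsed blood relationship result.
    conn.execute("INSERT INTO statement_values VALUES (?)", (value,))
    conn.execute(
        "INSERT INTO lineage_results VALUES (?, ?, ?)",
        (value, parse(statement), now),
    )
    return True
```

The lookup is a single indexed comparison on the statement value rather than a one-by-one comparison of full statement texts, which is the speed-up the application describes.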
After step S4, that is, after the step of the server storing the statement value in the database and storing the blood relationship result in a preset result table, the method further includes:
the current server receives a first blood relationship analysis completion signal carrying a first log identifier;
the current server identifies whether the difference between the number of received first blood relationship analysis completion signals and the number of deployed servers is one;
when the difference is one, the current server acquires the log identifier carried by the log it is currently processing, and when the current log identifier differs from the first log identifier, the current server performs a full deduplication operation on the blood relationship results in the result table;
and when the difference is not one, the current server acquires the log identifier carried by the log it is currently processing, and when the current log identifier differs from the first log identifier, it sends a second blood relationship analysis completion signal carrying the first log identifier to all the servers.
In this embodiment, the first blood relationship analysis completion signal carrying the first log identifier comes from another server. When another server finishes parsing the logs of the current scenario that were distributed to it, it transmits a first blood relationship analysis completion signal carrying the first log identifier to the current server. When the current log identifier is the same as the first log identifier, the current server determines that its own parsing task for the logs of the current scenario is not yet complete and continues with the next log. In the present application, after the plurality of servers have each acquired log data, each server processes its data in its own processing flow. Once a server has processed all of its logs for the current scenario, it sends a processing completion notice for that scenario, namely the first blood relationship analysis completion signal, to the other servers. Only the last server to finish its data processing performs the full deduplication operation on the blood relationships in the result table, which prevents the servers from repeating the deduplication. Specifically, a server determines whether the logs of the current scenario have been parsed by means of the log identifiers: all logs in the first scenario carry the first log identifier, and all logs in the second scenario carry the second log identifier. The servers synchronously receive and parse all logs in the first scenario. When a server finds that the identifier of the log it is currently parsing is not the first log identifier, this indicates that it has completed the blood relationship analysis of all of its logs in the first scenario, and it sends a second blood relationship analysis completion signal carrying the first log identifier to the other servers so that they know the progress of each server.
When the difference is one, it is determined that the other servers have finished the blood relationship analysis of the logs distributed to them in the first scenario. The current server then acquires the log identifier carried by the log it is currently processing; when the current log identifier differs from the first log identifier, the current server has also finished the blood relationship analysis of all of its logs in the first scenario, and it may perform the full deduplication operation on the blood relationship results in the result table.
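The decision each server makes can be sketched as a small pure function. The function name next_action and the returned labels are hypothetical, and the difference is taken here as the number of deployed servers minus the number of received completion signals, which is one reading of the text above rather than something the application states explicitly.

```python
def next_action(received_signals, deployed_servers, current_log_id, first_log_id):
    """Decide what the current server does after checking its current log.

    Assumption: difference = deployed_servers - received_signals, so a
    difference of one means every other server has reported completion.
    """
    if current_log_id == first_log_id:
        # Still working on logs of the first scenario: keep parsing.
        return "continue"
    if deployed_servers - received_signals == 1:
        # Everyone else is done and so is this server: it is the last one.
        return "deduplicate"
    # This server is done but others are not: tell them about it.
    return "broadcast_completion"
```

With three deployed servers, for example, a server that has received two completion signals and has moved on to a log with a different identifier is the last to finish and performs the full deduplication, so the operation runs once rather than once per server.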
Before step S1, that is, before the step in which the plurality of pre-deployed servers synchronously receive the logs, the method further includes:
the log distribution end identifies the number of logs currently being processed by each of the plurality of pre-deployed servers;
and the log distribution end acquires a plurality of logs to be distributed and distributes them to different servers based on those numbers of logs.
In this embodiment, the log distribution end distributes the logs according to the number of logs each server is currently processing, which keeps the processing progress of the different servers consistent.
Specifically, the step in which the log distribution end acquires a plurality of logs to be distributed and distributes them to different servers based on the numbers of logs comprises:
the log distribution end distributes the logs one by one to the server currently processing the fewest logs, until the log distribution is completed or every server is currently processing the same number of logs;
and the log distribution end identifies whether undistributed logs remain and, when they do, distributes them to different servers.
In this embodiment, a plurality of servers process the acquired log data synchronously. When distributing logs, the log distribution end allocates them based on the number of logs each server is currently processing. For example, suppose three servers are currently processing x, y and z logs respectively, where x ≥ y ≥ z. The log distribution end first distributes logs one by one to the server currently processing the fewest logs, until either all logs have been distributed or every server is processing the same number of logs, that is, x = y = z. The log distribution end then distributes any remaining logs across the different servers, so that the servers progress at a similar pace and the data processing speed is maintained.
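The distribution rule can be sketched with a min-heap keyed on each server's current load. The function name distribute, the server names and the log names are hypothetical. Always picking the least-loaded server first levels the loads and then, once they are equal, spreads the remaining logs across the servers, so a single loop covers both phases described above; ties are broken by server name in this sketch.

```python
import heapq


def distribute(logs, loads):
    """Assign each log to the server currently processing the fewest logs.

    loads maps a server name to the number of logs it is processing now.
    """
    heap = [(count, name) for name, count in loads.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in loads}
    for log in logs:
        count, name = heapq.heappop(heap)      # least-loaded server
        assignment[name].append(log)
        heapq.heappush(heap, (count + 1, name))
    return assignment


# Servers a, b, c currently process x=3, y=2, z=1 logs respectively.
result = distribute(["l1", "l2", "l3", "l4", "l5"], {"a": 3, "b": 2, "c": 1})
```

In this run the first three logs bring all servers to three logs each, and the last two are then spread over servers a and b.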
In the present application, a plurality of pre-deployed servers synchronously receive the logs distributed by the log distribution end, so that the logs are processed synchronously and do not accumulate. The server encrypts each structured query statement with a preset encryption algorithm to obtain a statement value; by comparing the statement value with the historical statement values, it quickly determines whether the structured query statement corresponding to the current statement value has already undergone blood relationship analysis. If no identical historical statement value exists, the structured query script is parsed to obtain a blood relationship result. Parsing the structured query statement directly yields the complete blood relationship. Comparing the statement value with the historical statement values quickly establishes the parsing status and avoids parsing the same structured query statement repeatedly.
It is emphasized that, in order to further ensure the privacy and security of the result table, the result table may also be stored in a node of a blockchain.
The blockchain referred to in the present application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The present application can be applied in the field of smart healthcare for blood relationship analysis of medical data, thereby promoting the construction of smart cities.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions instructing the relevant hardware. The computer readable instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data blood relationship analysis system. This system embodiment corresponds to the method embodiment shown in fig. 2, and the system may be applied to various electronic devices.
As shown in fig. 3, the data blood relationship analysis system of this embodiment includes a plurality of servers and a log distribution end, wherein the server 300 includes: a receiving module 301, an encryption module 302, a comparison module 303 and a storage module 304;
the receiving module 301 is configured to receive a log distributed by a log distributing end;
the encryption module 302 is configured to obtain the structured query scripts in the logs, and encrypt the structured query statements in the structured query scripts respectively through a preset encryption algorithm to obtain statement values, where identical structured query statements yield identical statement values under the encryption algorithm;
the comparison module 303 is configured to obtain a historical statement value stored in the database, compare whether a historical statement value identical to the current statement value exists, and if the historical statement value identical to the current statement value does not exist, perform an analysis operation on a structured query statement corresponding to the current statement value to obtain a blood relationship result;
the storage module 304 is configured to store the statement value in the database, and store the blood relationship result in a preset result table.
In this embodiment, the plurality of pre-deployed servers synchronously receive the logs distributed by the log distribution end, so that the logs are processed synchronously and do not accumulate. The server encrypts each structured query statement through a preset encryption algorithm to obtain a statement value; by checking whether the statement value is the same as a historical statement value, the server quickly determines whether the structured query statement corresponding to the current statement value has already undergone blood relationship analysis. If no historical statement value identical to the statement value exists, the structured query statement is parsed to obtain a blood relationship result. Parsing the structured query statement directly yields a complete blood relationship, and comparing the statement value with the historical statement values avoids parsing the same structured query statement repeatedly.
In some optional implementations of this embodiment, the encryption module 302 is further configured to encrypt the structured query statements in the structured query script respectively through an MD5 encryption algorithm to obtain the statement values.
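The statement-value comparison can be illustrated with a short sketch. It assumes, for the purposes of the example only, that the historical statement values are held in an in-memory set and that the parsing step is supplied by the caller as a function; `statement_value` and `process_statements` are hypothetical names. Note that MD5 is strictly a hash digest: identical statement text always produces the identical value, which is the property the comparison relies on.

```python
import hashlib


def statement_value(sql):
    """Return the MD5 hex digest of a structured query statement."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


def process_statements(statements, history, parse):
    """Parse only the statements whose value is not yet in `history`.

    statements: iterable of structured query statements (strings).
    history:    set of historical statement values (stands in for the database).
    parse:      caller-supplied function that turns a statement into a
                blood relationship result (an assumption of this sketch).
    Returns a dict mapping new statement values to their parse results.
    """
    results = {}
    for sql in statements:
        value = statement_value(sql)
        if value in history:
            # Already analysed earlier: skip the repeated parsing.
            continue
        results[value] = parse(sql)
        history.add(value)
    return results
```

Because the digest is computed on the raw text, two statements that differ only in whitespace or letter case would produce different values; whether to normalise the text first is a design choice the sketch leaves open.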
The comparison module 303 includes a parsing submodule and an extraction submodule. The parsing submodule is used for performing a preliminary blood relationship parsing operation on the structured query statement to obtain an information syntax tree; the extraction submodule is used for extracting the preset target, target source library, target source table and target source field from the information syntax tree and associating them to obtain the blood relationship result.
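The association of target, source library, source table and source field can be illustrated with a deliberately simplified sketch. It does not build a syntax tree: a regular expression stands in for the parser and handles only statements of the form `INSERT INTO <target> SELECT <fields> FROM <library>.<table>`. The names `LineageEdge` and `extract_lineage` are hypothetical, and a real implementation would rely on a proper SQL parser.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    target: str          # table written by the statement
    source_library: str  # database (library) the data is read from
    source_table: str    # table the data is read from
    source_field: str    # field read from the source table


# Toy pattern: one target, one source table, a comma-separated field list.
_PATTERN = re.compile(
    r"insert\s+into\s+(?P<target>[\w.]+)\s+"
    r"select\s+(?P<fields>.+?)\s+"
    r"from\s+(?P<library>\w+)\.(?P<table>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_lineage(sql):
    """Return one LineageEdge per selected field, or [] if the pattern fails."""
    match = _PATTERN.search(sql)
    if match is None:
        return []
    fields = [f.strip() for f in match.group("fields").split(",")]
    return [
        LineageEdge(
            target=match.group("target"),
            source_library=match.group("library"),
            source_table=match.group("table"),
            source_field=field,
        )
        for field in fields
    ]
```

For the statement `INSERT INTO dw.orders_clean SELECT id, amount FROM ods.orders`, the sketch produces two edges, one for `id` and one for `amount`, each linking `dw.orders_clean` to library `ods` and table `orders`.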
In some optional implementations of this embodiment, the server 300 further includes a time modification module, which is used for determining, when a historical statement value identical to the statement value exists, the recording time of the blood relationship result mapped to that historical statement value, and updating the recording time to the current time.
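A minimal sketch of the time modification step is given below, using Python's built-in `sqlite3` module. The table name `lineage`, its columns, and the function names are assumptions made for illustration; the present application does not specify a schema.

```python
import sqlite3
import time


def create_table(conn):
    """Create a hypothetical result table keyed by statement value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lineage ("
        "statement_value TEXT PRIMARY KEY, "
        "result TEXT, "
        "record_time REAL)"
    )
    conn.commit()


def refresh_record_time(conn, statement_value, now=None):
    """Set record_time to the current time if the statement value is known.

    Returns True when a historical statement value was found and refreshed,
    False when no such value exists (the caller would then parse the statement).
    """
    now = time.time() if now is None else now
    cursor = conn.execute(
        "UPDATE lineage SET record_time = ? WHERE statement_value = ?",
        (now, statement_value),
    )
    conn.commit()
    # rowcount is the number of rows changed by the UPDATE.
    return cursor.rowcount > 0
```

Refreshing the recording time on every hit keeps a record of when each blood relationship was last observed, without repeating the parsing.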
In some optional implementations of this embodiment, the server 300 further includes an identification module, which comprises a signal receiving submodule, an identification submodule, a deduplication submodule and a signal sending submodule. The signal receiving submodule is used for receiving a first blood relationship analysis completion signal carrying a first log identifier. The identification submodule is used for identifying whether the difference between the number of received first blood relationship analysis completion signals and the number of deployed servers is one. The deduplication submodule is used for acquiring, when the difference is one, the log identifier carried by the currently processed log and, when the current log identifier differs from the first log identifier, causing the current server to perform a full deduplication operation on the blood relationship results in the result table. The signal sending submodule is used for acquiring, when the difference is not one, the log identifier carried by the currently processed log and, when the current log identifier differs from the first log identifier, sending a second blood relationship analysis completion signal carrying the first log identifier to all servers.
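The decision made by the identification module can be sketched as a pure function. This is one possible reading of the paragraph above, not a definitive implementation: the names `Action`, `decide_action` and `deduplicate` are hypothetical, and the behaviour when the current log identifier equals the first log identifier is not stated in the text, so the sketch assumes that nothing is done in that case.

```python
from enum import Enum


class Action(Enum):
    DEDUPLICATE = "deduplicate"  # run full deduplication on the result table
    BROADCAST = "broadcast"      # send a second completion signal to all servers
    NONE = "none"                # assumption: no action when identifiers match


def decide_action(received_signals, server_count, first_log_id, current_log_id):
    """Choose what the current server does after receiving completion signals."""
    if current_log_id == first_log_id:
        # Not covered by the text; treated as "do nothing" in this sketch.
        return Action.NONE
    if server_count - received_signals == 1:
        # Every other server has reported completion for the first log.
        return Action.DEDUPLICATE
    return Action.BROADCAST


def deduplicate(rows):
    """Drop repeated blood relationship rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))
```

The deduplication helper requires hashable rows (for example tuples), since it relies on dictionary keys to detect repeats.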
The log distribution end is used for identifying the number of logs currently processed by a plurality of pre-deployed servers, acquiring a plurality of logs and distributing the logs to different servers based on the number of the logs.
The log distribution end comprises a distribution module and an even distribution module. The distribution module is used for distributing the logs one by one to the server currently processing the fewest logs, until the log distribution is completed or the number of logs currently processed by each server is equal; the even distribution module is used for identifying whether unallocated logs exist at the log distribution end and, when they exist, distributing the unallocated logs evenly among the different servers.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 comprises a memory 201, a processor 202 and a network interface 203 communicatively connected to each other via a system bus. It is noted that only the computer device 200 having components 201-203 is shown, but it is understood that not all of the illustrated components are required and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 200. Of course, the memory 201 may also include both internal and external storage devices of the computer device 200. In this embodiment, the memory 201 is generally used for storing an operating system installed in the computer device 200 and various types of application software, such as the computer readable instructions of the data blood relationship analysis method. Further, the memory 201 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 202 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute the computer readable instructions stored in the memory 201 or to process data, for example to execute the computer readable instructions of the data blood relationship analysis method.
The network interface 203 may comprise a wireless network interface or a wired network interface, and the network interface 203 is generally used for establishing communication connection between the computer device 200 and other electronic devices.
In this embodiment, the statement value is obtained through an encryption algorithm and compared directly with the historical statement values, so that whether a statement has already been parsed is determined quickly and repeated parsing of the structured query statement is avoided.
The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the data blood relationship analysis method as described above.
In this embodiment, the statement value is obtained through an encryption algorithm and compared directly with the historical statement values, so that whether a statement has already been parsed is determined quickly and repeated parsing of the structured query statement is avoided.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the accompanying drawings show preferred embodiments without limiting the scope of the application. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their features with equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A data blood relationship analysis method is characterized by comprising the following steps:
a plurality of pre-deployed servers synchronously receive logs distributed by a log distribution end;
the server acquires the structured query scripts in the log, and encrypts the structured query statements in the structured query scripts respectively through a preset encryption algorithm to obtain statement values, wherein identical structured query statements yield identical statement values under the encryption algorithm;
the server acquires a historical statement value stored in a database, compares whether the historical statement value identical to the current statement value exists, and if the historical statement value identical to the current statement value does not exist, analyzes a structured query statement corresponding to the current statement value to obtain a blood relationship result;
and the server stores the statement value into the database and stores the blood relationship result into a preset result table.
2. The data blood relationship analysis method according to claim 1, wherein before the step of the plurality of pre-deployed servers synchronously receiving the logs distributed by the log distribution end, the method further comprises:
the log distribution end identifies the number of logs currently processed by each of the plurality of pre-deployed servers;
and the log distribution end acquires a plurality of logs and distributes the logs to different servers based on the number of the logs.
3. The data blood relationship analysis method according to claim 2, wherein the step of the log distribution end acquiring a plurality of logs and distributing the logs to different servers based on the number of the logs comprises:
the log distribution end distributes the logs one by one to the server currently processing the fewest logs, until the log distribution is completed or the number of logs currently processed by each server is equal;
and the log distribution end identifies whether unallocated logs exist and, when they exist, distributes the unallocated logs evenly among the different servers.
4. The data blood relationship analysis method according to claim 1, wherein after the step of the server storing the statement value in the database and storing the blood relationship result in a preset result table, the method further comprises:
the current server receives a first blood relationship analysis completion signal carrying a first log identifier;
the current server identifies whether the difference between the number of received first blood relationship analysis completion signals and the number of deployed servers is one;
when the difference is one, the current server acquires the log identifier carried by the currently processed log, and when the current log identifier is different from the first log identifier, the current server performs a full deduplication operation on the blood relationship results in the result table;
and when the difference is not one, the current server acquires the log identifier carried by the currently processed log, and when the current log identifier is different from the first log identifier, sends a second blood relationship analysis completion signal carrying the first log identifier to all the servers.
5. The data blood relationship analysis method according to claim 1, wherein the step of respectively encrypting the structured query statements in the structured query script through a preset encryption algorithm to obtain statement values comprises:
respectively encrypting the structured query statements in the structured query script through an MD5 encryption algorithm to obtain the statement values.
6. The data blood relationship analysis method according to claim 1, wherein the step of parsing the structured query statement corresponding to the current statement value to obtain the blood relationship result comprises:
performing a preliminary blood relationship parsing operation on the structured query statement to obtain an information syntax tree;
and extracting the preset target, target source library, target source table and target source field from the information syntax tree, and associating the target, the target source library, the target source table and the target source field to obtain the blood relationship result.
7. The data blood relationship analysis method according to claim 1, wherein after the step of the server comparing whether a historical statement value identical to the current statement value exists, the method further comprises:
when a historical statement value identical to the statement value exists, determining the recording time of the blood relationship result mapped to the historical statement value, and updating the recording time to the current time.
8. A data blood relationship analysis system, characterized by comprising a plurality of servers and a log distribution end, wherein each server comprises: a receiving module, an encryption module, a comparison module and a storage module;
the receiving module is used for receiving the logs distributed by the log distributing end;
the encryption module is used for acquiring the structured query scripts in the logs and respectively encrypting the structured query statements in the structured query scripts through a preset encryption algorithm to obtain statement values, wherein identical structured query statements yield identical statement values under the encryption algorithm;
the comparison module is used for acquiring historical statement values stored in the database, comparing whether the historical statement values identical to the current statement values exist or not, and if the historical statement values identical to the current statement values do not exist, analyzing the structured query statements corresponding to the current statement values to obtain a blood relationship result;
the storage module is used for storing the statement value into the database and storing the blood relationship result into a preset result table.
9. A computer device comprising a memory and a processor, the memory having computer readable instructions stored therein, wherein the processor, when executing the computer readable instructions, implements the steps of the data blood relationship analysis method according to any one of claims 1 to 7.
10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the data blood relationship analysis method according to any one of claims 1 to 7.
CN202011408452.5A 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium Active CN112527816B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011408452.5A CN112527816B (en) 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium
PCT/CN2021/083128 WO2022116425A1 (en) 2020-12-03 2021-03-26 Method and system for data lineage analysis, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011408452.5A CN112527816B (en) 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112527816A true CN112527816A (en) 2021-03-19
CN112527816B CN112527816B (en) 2023-06-02

Family

ID=74997228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011408452.5A Active CN112527816B (en) 2020-12-03 2020-12-03 Data blood relationship analysis method, system, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112527816B (en)
WO (1) WO2022116425A1 (en)

Also Published As

Publication number Publication date
WO2022116425A1 (en) 2022-06-09
CN112527816B (en) 2023-06-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant