CN110826105B

CN110826105B - Distributed bank data desensitization method and system

Info

Publication number: CN110826105B
Application number: CN201911116450.6A
Authority: CN
Inventors: 吴昊; 王巍; 王景斌; 陈菲琪; 施志晖
Original assignee: Jiangsu Suning Bank Co Ltd
Current assignee: Jiangsu Sushang Bank Co Ltd
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2021-11-12
Anticipated expiration: 2039-11-15
Also published as: CN110826105A

Abstract

The invention discloses a distributed bank data desensitization method and system. The method comprises the following steps: defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool; backing up the data in the database production environment, the Hive production environment and the unstructured text production environment as data desensitization sources, and storing them in a database backup library, a Hive backup library and an unstructured backup library respectively; creating the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool in each backup library; calling the database user-defined desensitization function or the Java desensitization tool in the database backup library, calling the Hive user-defined desensitization function or the Java desensitization tool in the Hive backup library, and calling the Java desensitization tool in the unstructured backup library; and inputting desensitization rule parameters A and C or D whenever a backup library calls a function or the tool. The invention is applicable to different databases, Hive environments and unstructured text environments; the desensitized data can be traced back reversibly and are difficult to crack.

Description

Distributed bank data desensitization method and system

Technical Field

The invention relates to the field of human information processing, in particular to a distributed bank data desensitization method and system.

Background

With the development of bank information technology, data centers keep growing, banks store more and more sensitive data, and the data security risk rises as data circulate between different processes inside the bank. At present, for most banks, the personal information collected and stored when users register and open accounts through online banking or at offline branches, including names, mobile phone numbers, mailboxes, identity card numbers and addresses, is sensitive information that must be protected. Because of internal business requirements, this sensitive information may be used in development and testing, data analysis, data mining, big data reporting and other activities. The sensitive data therefore need to be desensitized at each of these stages, with all or only part of the sensitive information processed depending on the scenario, as dictated by the financial business requirements.

In practice, sensitive data take several forms, such as structured data, retained big data and unstructured data. However, the security products currently offered by vendors, such as systems that desensitize only databases or only text, usually cannot cover multiple scenarios with a single product, cannot preserve the association between data retained in emerging environments such as Hive and unstructured data, and cannot keep desensitization rules unified and associated across all of a bank's data.

Given the importance of bank data security and the shortcomings of the desensitization products and schemes currently on the market, such as incomplete coverage of application scenarios and incomplete support for target data types, there is an urgent need to research and design a universal distributed data desensitization system for the different scenarios within a bank.

Disclosure of Invention

The invention aims to solve the above problems and therefore provides a distributed bank data desensitization method and system.

To achieve the above object, in a first aspect, the present invention provides a distributed bank data desensitization method, including the steps of:

defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool;

backing up the data in the database production environment, the Hive production environment and the unstructured text production environment as data desensitization sources, and storing them in a database backup library, a Hive backup library and an unstructured backup library respectively;

creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool in the database backup library, the Hive backup library and the unstructured backup library respectively, according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool;

calling the database user-defined desensitization function or the Java desensitization tool in the database backup library, inputting desensitization rule parameters A and C or D into the database user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, and obtaining the desensitized data and storing them in the database backup library;

calling the Hive user-defined desensitization function or the Java desensitization tool in the Hive backup library, inputting desensitization rule parameters A and C or D into the Hive user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, and obtaining the desensitized data and storing them in the Hive backup library;

calling a Java desensitization tool in the unstructured backup library, inputting desensitization rule parameters A and C or D into the Java desensitization tool, updating desensitization source data, acquiring desensitized data and storing the desensitized data in the unstructured backup library;

wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.

Further, the values of the desensitization rule parameters A, C and D are input by a desensitization management platform.

Further, the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function.

Further, the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool are generated by one, two or three of the following: an alignment fixed value replacement algorithm, a non-logic bit remainder operation algorithm, and a logic bit alignment and invariant algorithm.

Furthermore, desensitization rule parameters A and C are input into a desensitization function generated by the non-logic bit remainder operation algorithm, and desensitization rule parameter D is input into a desensitization function generated by the logic bit alignment and invariant algorithm.

Further, the database user-defined desensitization function is called through an update statement, and the desensitization source data are updated through that statement; the Hive user-defined desensitization function is called through a MapReduce statement of Hadoop, and the desensitization source data are updated through that statement; the Java desensitization tool is invoked from a shell statement as java -jar pimt.jar A C D input_file_path > pimt.out, and the desensitization source data are updated through that statement.

Further, when the database user-defined desensitization function, the Hive user-defined desensitization function or the Java desensitization tool is called and cannot desensitize the data, all data to be desensitized that do not conform to the rules are replaced with incrementing numeric strings beginning with err.

In a second aspect, the invention also provides a distributed bank data desensitization system, the system comprising: a desensitization function definition unit, a database backup library, a Hive backup library, an unstructured backup library and a desensitization management platform;

the desensitization function definition unit is used for defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool;

the database backup library is used for backing up data of a database production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the created database user-defined desensitization function and a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;

the Hive backup library is used for backing up data of a Hive production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Hive custom desensitization function or a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;

the unstructured backup library is used for backing up data in an unstructured text production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Java desensitization tool, and updating desensitization source data to acquire desensitized data and store the desensitized data;

the desensitization management platform is used for inputting desensitization rule parameters A and C or D into the database desensitization function, the Java desensitization tool or the Hive desensitization function called by the database backup library, the Hive backup library and the unstructured backup library, wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.

Further, the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool are generated by one, two or three of the following: an alignment fixed value replacement algorithm, a logic bit alignment and invariant algorithm, and a non-logic bit remainder operation algorithm.

Further, the data of the database production environment, the Hive production environment and the unstructured text production environment comprise customer names, certificate numbers, telephone numbers, mailboxes, address information, customer numbers, passwords, account information and customer or employee income information.

The distributed bank data desensitization method provided by the invention is deployed in a distributed manner across multiple environments and is applicable to different databases, Hive environments and unstructured text environments. The desensitized data can be traced back reversibly, still pass the original validity checks, and keep their associations across different environments; the desensitized data can be changed flexibly as needed, are difficult to crack, and improve the security of data maintenance and sharing.

Drawings

Fig. 1 is a block diagram of a distributed bank data desensitization system according to an embodiment of the present invention;

Fig. 2 is a flowchart illustrating an exemplary embodiment of the non-logic bit remainder operation algorithm;

Fig. 3 is a flowchart of a distributed bank data desensitization method according to an embodiment of the present invention;

Fig. 4 is a flowchart of data desensitization in a database production environment according to an embodiment of the present invention;

Fig. 5 is a flowchart of data desensitization in a Hive production environment according to an embodiment of the present invention;

Fig. 6 is a flowchart of data desensitization in an unstructured text production environment according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be noted that the drawings are merely illustrative and are not drawn strictly to scale; parts may be enlarged or reduced for convenience of description, and some well-known structures may be omitted.

Fig. 1 is a block diagram of a distributed bank data desensitization system according to an embodiment of the present invention.

As shown in Fig. 1, a distributed bank data desensitization system provided in an embodiment of the present invention includes: a desensitization function definition unit 1, a database backup library 2, a Hive backup library 3, an unstructured backup library 4 and a desensitization management platform 5.

The desensitization function definition unit 1 is mainly responsible for defining a database custom desensitization function, a Hive custom desensitization function and a Java desensitization tool.

The database backup library 2 is used for backing up data of the database production environment as a data desensitization source; creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the created database user-defined desensitization function or the Java desensitization tool; and updating the desensitization source data to obtain and store the desensitized data.

The Hive backup library 3 is mainly responsible for backing up data of the Hive production environment as a data desensitization source; creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the Hive user-defined desensitization function or the Java desensitization tool; and updating the desensitization source data to obtain and store the desensitized data.

The unstructured backup library 4 is mainly responsible for backing up data of the unstructured text production environment as a data desensitization source; creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the Java desensitization tool; and updating the desensitization source data to obtain and store the desensitized data.

The desensitization management platform 5 is mainly responsible for inputting desensitization rule parameters A and C or D into the database desensitization function, the Java desensitization tool or the Hive desensitization function called by the database backup library, the Hive backup library and the unstructured backup library, wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.

Data in the database production environment, Hive production environment, and unstructured text production environment include, but are not limited to, customer name, certificate number, telephone number, Email, address information, customer number, password, customer or employee income information, etc., see table 1 below.

TABLE 1

The database user-defined desensitization function is suitable for different database types, including MySQL, Oracle, SQL Server, DB2 and the like. The Hive user-defined desensitization function is suitable for historical data stored and retained in Hive. The Java desensitization tool is suitable for unstructured data such as office documents, texts, XML, HTML and various reports. The database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function, a customer or employee income information function and the like. They are generated by one, two or three of the following: an alignment fixed value replacement algorithm, a logic bit alignment and invariant algorithm, and a non-logic bit remainder operation algorithm.

The alignment fixed value replacement algorithm is used for sensitive data such as customer names, mailboxes, address information, passwords and personal websites. Values in the same field are not correlated with one another and the input type is not checked, so the sensitive data can be replaced directly with specific data of fixed value and fixed length, which desensitizes the user information.

The logic bit alignment and invariant algorithm is used for certificate numbers such as identity card numbers. Because of the coding rules of the identity card number, a system performs a plausibility check when the number is entered (for example, the date portion of an identity card number can never be a sequence such as 19801563), so a random position-wise replacement cannot be used while still guaranteeing that values are not duplicated. Instead, a subtraction at aligned positions with an unchanged sum is used: for example, a specific value is subtracted from the 7th to 10th digits of the identity card number, and the result is written back to the same positions as the replacement value.

The non-logic bit remainder operation algorithm is used for those positions of mobile phone numbers, customer numbers, identity card numbers, account information and the like that are not constrained by a specific coding rule, and it uses a fixed position-wise algorithm.
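The two position-wise replacements described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the class and method names, the use of a single fill character that preserves the original length, the subtracted offset and the sample number are all assumptions, and the sketch does not recompute any check digit.

```java
/** Sketch of two position-wise ("alignment") replacements; all names here are hypothetical. */
final class AlignmentSketch {

    /** Fixed value replacement: every position receives the same fill character (assumed form). */
    static String fixedValueReplace(String value, char fill) {
        return String.valueOf(fill).repeat(value.length());
    }

    /**
     * Subtracts offset from the number held in positions from..to (1-based, inclusive)
     * and writes the result back to the same positions, zero-padded to the same width.
     */
    static String subtractAtPositions(String value, int from, int to, int offset) {
        int width = to - from + 1;
        int original = Integer.parseInt(value.substring(from - 1, to));
        int replaced = original - offset;
        if (replaced < 0) {
            throw new IllegalArgumentException("offset larger than the original value");
        }
        String padded = String.format(java.util.Locale.ROOT, "%0" + width + "d", replaced);
        return value.substring(0, from - 1) + padded + value.substring(to);
    }

    public static void main(String[] args) {
        // Made-up sample values, used only to show the shape of the output.
        System.out.println(fixedValueReplace("secret", '*'));                     // ******
        System.out.println(subtractAtPositions("321000199001011234", 7, 10, 30)); // 321000196001011234
    }
}
```

Only the 7th to 10th positions change in the second call, so the remaining digits stay aligned with the original number.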

Specifically, the customer name function, the mailbox function, the address information function, the password function and the personal website function are generated by the alignment fixed value replacement algorithm. The customer number function and the account information function are generated by the non-logic bit remainder operation algorithm. The telephone number function is generated by the alignment fixed value replacement algorithm (for landline telephones) and the non-logic bit remainder operation algorithm (for mobile phones). The certificate number function is generated by the alignment fixed value replacement algorithm (for corporate certificates: organization codes, business registration numbers, taxpayer identification numbers and the like) and by the non-logic bit remainder operation algorithm together with the logic bit alignment and invariant algorithm (for personal certificates: identity card numbers, passport numbers, Hong Kong and Macao travel permits, household registers, military officer certificates and the like).

Desensitization rule parameters A and C are input into a desensitization function generated by the non-logic bit remainder operation algorithm, and desensitization rule parameter D is input into a desensitization function generated by the logic bit alignment and invariant algorithm. Desensitization functions generated by the alignment fixed value replacement algorithm take no desensitization rule parameters. Fig. 2 and Table 2 illustrate the input of desensitization rule parameters A and C into a desensitization function generated by the non-logic bit remainder operation algorithm.

TABLE 2

As shown in Fig. 2 and Table 2, the original digits in the second column of Table 2 are 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, corresponding to the values a0 to a9 in Fig. 2, that is, a0 = 0, a1 = 1, a2 = 2, a3 = 3, a4 = 4, a5 = 5, a6 = 6, a7 = 7, a8 = 8 and a9 = 9. With the desensitization rule input parameters A = 5 and C = 0 or C = 1, applying the remainder operation of Fig. 2, $b' = \mathrm{MOD}(a \cdot 2^{A+3},\ 9)$, to a0 to a9 gives the digits 0, 4, 8, 3, 7, 2, 6, 1, 5, 0 in the fourth column of Table 2, that is, b'0 to b'9 in Fig. 2: b'0 = 0, b'1 = 4, b'2 = 8, b'3 = 3, b'4 = 7, b'5 = 2, b'6 = 6, b'7 = 1, b'8 = 5 and b'9 = 0. (For A = 5, $2^{A+3} = 256$ and MOD(256, 9) = 4, so each b' equals MOD(4a, 9).) Then b'0 to b'9 are compared with a0 to a9 to determine how many values are identical. If more than 3 are identical, A is incremented (A++) and the remainder operation is repeated until fewer than 3 of the resulting digits are identical to the original digits. If fewer than 3 are identical, as in this embodiment, where comparing the second column with the fourth column of Table 2 shows that only the first row is the same, the next step can be executed, namely determining the value of C, that is, whether to perform the variation.
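The remainder step can be reproduced with a short sketch. The class and method names are hypothetical; the formula is read as each original digit multiplied by 2 to the power A + 3, then reduced modulo 9, which is the reading that yields the fourth-column digits quoted above for A = 5.

```java
/** Sketch of the remainder step b' = MOD(a * 2^(A+3), 9); names are hypothetical. */
final class RemainderSketch {

    /** Returns 2^(A+3) mod 9. Powers of 2 repeat modulo 9 with period 6, so no overflow occurs. */
    static int multiplier(int paramA) {
        int steps = (int) (((long) paramA + 3) % 6);
        int m = 1;
        for (int i = 0; i < steps; i++) {
            m = (m * 2) % 9;
        }
        return m;
    }

    /** Maps the original digits 0..9 to b'0..b'9 for the given parameter A. */
    static int[] remainders(int paramA) {
        int m = multiplier(paramA);
        int[] b = new int[10];
        for (int digit = 0; digit <= 9; digit++) {
            b[digit] = (digit * m) % 9;
        }
        return b;
    }

    /** Counts the positions whose mapped digit equals the original digit. */
    static int unchangedCount(int[] b) {
        int count = 0;
        for (int digit = 0; digit < b.length; digit++) {
            if (b[digit] == digit) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(remainders(5))); // [0, 4, 8, 3, 7, 2, 6, 1, 5, 0]
    }
}
```

A caller could use unchangedCount to decide whether to increment A and repeat the step; the exact threshold is left to the caller here because the text gives it only as "more than 3" and "fewer than 3".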

Whether to perform the variation: during the non-logic bit remainder operation, the public function reserves an input C as a process input parameter. C indicates whether an offset replacement is applied to the intermediate result of the remainder operation. If C is 1, the variation is performed and b'0 is replaced with a9; if C is 0, the variation is not performed and b'9 is replaced with a9. This increases the randomness of the desensitization algorithm while guaranteeing unique traceability. In other words, the value of C ensures that the remainder values b'0 to b'9 contain no repeated value, so that each desensitized value corresponds to exactly one original value and the operation can be reversed (traced back).
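The effect of C on reversibility can be illustrated with a small sketch. It assumes that "replacing b'0 (or b'9) with a9" means assigning the value 9 to that entry of the substitution table; the class and method names, and the choice to leave non-digit characters untouched, are likewise assumptions.

```java
/** Sketch of the C variation step and of tracing a value back; names are hypothetical. */
final class VariantSketch {

    /** Builds the digit substitution table for parameters A and C. */
    static int[] substitutionTable(int paramA, int paramC) {
        if (paramC != 0 && paramC != 1) {
            throw new IllegalArgumentException("C must be 0 or 1");
        }
        int steps = (int) (((long) paramA + 3) % 6); // powers of 2 repeat modulo 9 with period 6
        int m = 1;
        for (int i = 0; i < steps; i++) {
            m = (m * 2) % 9;
        }
        int[] table = new int[10];
        for (int digit = 0; digit <= 9; digit++) {
            table[digit] = (digit * m) % 9;
        }
        // b'0 and b'9 are both 0 after the remainder step; C selects which one takes the value 9.
        if (paramC == 1) {
            table[0] = 9;
        } else {
            table[9] = 9;
        }
        return table;
    }

    /** Inverts a substitution table; fails if two digits share the same replacement. */
    static int[] invert(int[] table) {
        int[] inverse = new int[10];
        java.util.Arrays.fill(inverse, -1);
        for (int digit = 0; digit < 10; digit++) {
            if (inverse[table[digit]] != -1) {
                throw new IllegalStateException("table is not one-to-one");
            }
            inverse[table[digit]] = digit;
        }
        return inverse;
    }

    /** Replaces each ASCII digit through the table and leaves other characters unchanged. */
    static String apply(String text, int[] table) {
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            out.append(ch >= '0' && ch <= '9' ? (char) ('0' + table[ch - '0']) : ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] table = substitutionTable(5, 0);
        String masked = apply("0123456789", table);
        System.out.println(masked);                       // 0483726159
        System.out.println(apply(masked, invert(table))); // 0123456789
    }
}
```

Because the multiplier is always coprime to 9, the digits 0 to 8 never collide with one another, and the C step removes the one remaining collision between b'0 and b'9, so invert succeeds for either value of C.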

For the desensitization source data in the database backup library 2, the Hive backup library 3 and the unstructured backup library 4, the desensitized customer names, certificate numbers, telephone numbers, Email addresses, address information, customer numbers, passwords, customer or employee income information and the like produced by the database user-defined desensitization function, the Hive user-defined desensitization function or the Java desensitization tool are shown in Table 3 below.

TABLE 3

Fig. 3 is a flowchart of a distributed bank data desensitization method according to an embodiment of the present invention.

In step 301, a database custom desensitization function, Hive custom desensitization function, and Java desensitization tool are defined.

In step 302, the data in the database production environment, the Hive production environment and the unstructured text production environment are backed up as data desensitization sources and stored in the database backup library, the Hive backup library and the unstructured backup library, respectively.

In step 303, a database custom desensitization function, a Hive custom desensitization function, and a Java desensitization tool are created in the database backup library, the Hive backup library, and the unstructured backup library according to the defined database custom desensitization function, Hive custom desensitization function, and Java desensitization tool, respectively.

In step 304, a database user-defined desensitization function or a Java desensitization tool is called in the database backup library, desensitization rule parameters A, C and D are input into the database user-defined desensitization function or the Java desensitization tool, desensitization source data are updated, and desensitized data are obtained and stored in the database backup library.

In step 305, a Hive custom desensitization function or a Java desensitization tool is called in the Hive backup library, desensitization rule parameters A, C and D are input into the Hive custom desensitization function or the Java desensitization tool, desensitization source data are updated, and desensitized data are obtained and stored in the Hive backup library.

In step 306, a Java desensitization tool is called in the unstructured backup library, desensitization rule parameters a and C or D are input to the Java desensitization tool, desensitization source data is updated, and desensitized data is obtained and stored in the unstructured backup library.

The desensitization rule parameters A, C and D are issued uniformly by the desensitization management platform or obtained manually through its unified allocation. A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.
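The parameter ranges stated above can be checked before any desensitization function is called. The following is a minimal sketch; the class name is hypothetical, and treating the range of D as inclusive at both ends is an assumption.

```java
/** Hypothetical holder for the desensitization rule parameters A, C and D. */
final class RuleParameters {
    final int a; // initial random input parameter: any positive integer
    final int c; // process input parameter: 0 or 1
    final int d; // initial random input parameter: 2020 to 2120 (assumed inclusive)

    RuleParameters(int a, int c, int d) {
        if (!isValid(a, c, d)) {
            throw new IllegalArgumentException(
                    "invalid rule parameters: A=" + a + ", C=" + c + ", D=" + d);
        }
        this.a = a;
        this.c = c;
        this.d = d;
    }

    /** Returns true when all three values lie in the ranges given in the description. */
    static boolean isValid(int a, int c, int d) {
        return a > 0 && (c == 0 || c == 1) && d >= 2020 && d <= 2120;
    }

    public static void main(String[] args) {
        RuleParameters p = new RuleParameters(5, 0, 2020); // values used in the embodiment
        System.out.println("A=" + p.a + " C=" + p.c + " D=" + p.d);
    }
}
```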

It should be understood that, in the embodiment of the present invention, step 303 may be executed first, and then step 302 is executed, that is, a desensitization function is created in the backup library, and then data backup of a desensitization source is performed.

The database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool in the embodiment of the invention comprise a customer name function nameMark(), a certificate number function idMark(), a telephone number function telMark(), a mailbox function mailMark(), an address information function addMark(), a customer number function numberMark(), a password function passMark(), an account information function accountMark() and a customer or employee income information function incomaMark(). They are generated by one, two or three of the following: an alignment fixed value replacement algorithm, a non-logic bit remainder operation algorithm, and a logic bit alignment and invariant algorithm. Desensitization rule parameters A and C are input into a desensitization function generated by the non-logic bit remainder operation algorithm, and desensitization rule parameter D is input into a desensitization function generated by the logic bit alignment and invariant algorithm.

The database user-defined desensitization function is suitable for different database types, including MySQL, Oracle, SQL Server, DB2 and the like. It is installed in the database and executed as a user-defined function: the database user-defined desensitization function is created and then called through an update statement to desensitize the different data tables in the database backup library, after which the database data can be migrated for use by different business processes. Taking execution in a particular database as an example, the database user-defined desensitization function is called through update statements as follows:

update AA set AA.id_code = idMark(AA.id_code) -- identity card number

update AA set AA.mobile = telMark(AA.mobile) -- telephone number

update AA set AA.cut_name = nameMark(AA.cut_name) -- name information

update AA set AA.mail = mailMark(AA.mail) -- mailbox information

update AA set AA.passsd = passMark(AA.passsd) -- password information

For the different database types, including MySQL, Oracle, SQL Server, DB2 and the like, the database can also be exported to a text file, desensitized by text conversion with the Java desensitization tool, and then used by different business processes.

The Hive user-defined desensitization function is suitable for historical data stored and retained in the Hive environment. It is installed and executed as a Hive user-defined function: the function is created and then called through a MapReduce statement of Hadoop to desensitize the different data tables in the Hive backup library, for use by different business processes such as big data reporting. The historical data stored and retained in the Hive environment can also be desensitized using the Java desensitization tool.

The Java desensitization tool is suitable for different database types, for historical data stored and retained in the Hive environment, and for data in the unstructured text environment. The tool is written in Java; it is invoked from a shell statement as java -jar pimt.jar A C D input_file_path > pimt.out, and the desensitization source data are updated through that statement.
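When the tool has to be started from another Java program rather than from a shell, the same command line can be assembled with the standard ProcessBuilder API. This is only a sketch: the launcher class is hypothetical, it assumes pimt.jar sits in the working directory, and nothing is assumed about the internals of pimt.jar beyond the argument order given above.

```java
/** Hypothetical launcher mirroring: java -jar pimt.jar A C D input_file_path > pimt.out */
final class PimtLauncher {

    /** Prepares the command and redirects standard output to pimt.out; it does not start it. */
    static ProcessBuilder build(int a, int c, int d, String inputFilePath) {
        ProcessBuilder builder = new ProcessBuilder(
                "java", "-jar", "pimt.jar",
                Integer.toString(a), Integer.toString(c), Integer.toString(d),
                inputFilePath);
        builder.redirectOutput(new java.io.File("pimt.out"));
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        return builder;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PimtLauncher <input_file_path>");
            return;
        }
        // Parameter values taken from the embodiment: A = 5, C = 0, D = 2020.
        int exitCode = build(5, 0, 2020, args[0]).start().waitFor();
        System.out.println("pimt.jar exited with code " + exitCode);
    }
}
```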

In addition, data written in the production environment inevitably have quality problems; for example, the AA.id_code field may be filled with information that is not an identity card number. Because the desensitization algorithm assumes that each field holds data of the expected kind, the desensitization function may then fail to execute and exit with an error. Therefore, so that desensitization can proceed normally, in the embodiment of the present invention, when a database user-defined desensitization function, a Hive user-defined desensitization function or the Java desensitization tool is called and cannot desensitize the data, all data to be desensitized that do not conform to the rules are replaced with incrementing numeric strings beginning with err.
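The fallback for non-conforming data can be sketched as a wrapper around any masking function. The class and method names are hypothetical, and the exact format of the replacement (counting from 1, no zero padding) is an assumption, since the text only specifies an incrementing number after the prefix err.

```java
/** Sketch of the err-prefixed fallback for values a masking function rejects; names are hypothetical. */
final class ErrFallback {
    private long counter = 0;

    /** Returns "err" followed by the next number in an increasing sequence. */
    String next() {
        counter++;
        return "err" + counter;
    }

    /** Applies the masker; if it rejects the value, substitutes the next err-prefixed string. */
    String maskOrFallback(String value, java.util.function.UnaryOperator<String> masker) {
        try {
            return masker.apply(value);
        } catch (RuntimeException e) {
            return next();
        }
    }

    public static void main(String[] args) {
        ErrFallback fallback = new ErrFallback();
        // Stand-in masker: accepts only 18-character identity-card-shaped input, returns it unchanged.
        java.util.function.UnaryOperator<String> masker = v -> {
            if (!v.matches("\\d{17}[\\dX]")) {
                throw new IllegalArgumentException("not an identity card number");
            }
            return v;
        };
        System.out.println(fallback.maskOrFallback("not-an-id", masker)); // err1
        System.out.println(fallback.maskOrFallback("also bad", masker));  // err2
    }
}
```

Keeping the counter in the wrapper means a batch run never aborts on a malformed row, and each malformed row still receives a distinct replacement.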

Fig. 4 is a flow chart of data desensitization in a database production environment according to an embodiment of the present invention.

As shown in Fig. 4, taking a production Oracle database as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the production Oracle database need to be desensitized and extracted. The steps are as follows:

In step 401, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the production Oracle database are backed up and stored in the database backup library as the desensitization source.

In step 402, the database user-defined desensitization functions idMark() and telMark() are created in the database backup library.

In step 403, the idMark() and telMark() functions are called.

In step 404, the parameter values A = 5, C = 0 and D = 2020 are input.

In step 405, an update statement update AA set aa.id _ code idMark (aa.id _ code) is executed on the identification number to be desensitized 321xxxxxxxxxxxxxx, and an identification number after desensitization 321081192911301545 is acquired; and executing an update statement update AA set AA.mobile ═ telMark (AA.mobile) on the to-be-desensitized mobile phone number 188XXXXXXX, and acquiring the desensitized mobile phone number 18892401396.

In step 406, the acquired desensitized identification number 321YYYYYYYYYYYYYYY and the mobile phone number 188YYYYYYYY are stored in a database backup library.
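The two update statements of step 405 follow one pattern, which the following minimal sketch assembles for an arbitrary table, column and desensitization function. The helper class UpdateStatementSketch is hypothetical; the sketch only builds the statement text and does not connect to a database.

```java
// Illustrative only: builds statements of the form
//   update <table> set <table>.<column> = <function>(<table>.<column>)
final class UpdateStatementSketch {

    static String build(String table, String column, String function) {
        requireIdentifier(table);
        requireIdentifier(column);
        requireIdentifier(function);
        String qualified = table + "." + column;
        return "update " + table + " set " + qualified
            + " = " + function + "(" + qualified + ")";
    }

    // Table, column and function names cannot be bound as statement
    // parameters, so they are checked against a conservative pattern
    // before being concatenated into the statement text.
    private static void requireIdentifier(String name) {
        if (name == null || !name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid identifier: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("AA", "id_code", "idMark"));
        System.out.println(build("AA", "mobile", "telMark"));
    }
}
```

Generating the statements from a list of table and column names is one way to apply the same desensitization functions to sensitive fields that are spread over many tables.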

Fig. 5 is a flow chart of data desensitization in a Hive production environment according to an embodiment of the present invention.

As shown in fig. 5, taking the Hive production environment as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the Hive production environment need to be desensitized and extracted. The steps are as follows:

In step 501, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the Hive production environment are backed up and stored in the Hive backup library.

In step 502, the Hive custom desensitization functions idMark() and telMark() are created in the Hive backup library.

In step 503, the idMark() and telMark() functions are called.

In step 504, the parameter values A = 5, C = 0, and D = 2020 are input.

In step 505, the MapReduce statement MapReduce AA set AA.id_code = idMark(AA.id_code) is executed on the identity card number to be desensitized, 321XXXXXXXXXXXXXXX, and the desensitized identity card number 321YYYYYYYYYYYYYYY is obtained; the MapReduce statement MapReduce AA set AA.mobile = telMark(AA.mobile) is executed on the mobile phone number to be desensitized, 188XXXXXXXX, and the desensitized mobile phone number 188YYYYYYYY is obtained.

In step 506, the obtained desensitized identity card number 321YYYYYYYYYYYYYYY and mobile phone number 188YYYYYYYY are stored in the Hive backup library.

As shown in fig. 6, taking a txt file stored on FTP in the unstructured text production environment as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the txt file need to be desensitized and extracted. The steps are as follows:

In step 601, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the txt file on FTP in the unstructured text production environment are backed up and stored in the unstructured backup library.

In step 602, the Java desensitization tool datamask.java is created in the unstructured backup library; it includes various desensitization functions, such as the certificate number function idMark(), the telephone number function telMark(), and other functions.

In step 603, the Java desensitization tool datamask.java is called.

In step 604, the parameter values A = 5, C = 0, and D = 2020 are input.

In step 605, the Java program is executed on the identity card number to be desensitized, 321XXXXXXXXXXXXXXX, by running the shell statement java -jar pimt.jar 5 0 2020 input_file_path > pimt.out, and the desensitized identity card number 321YYYYYYYYYYYYYYY is obtained; the Java program is executed in the same way on the mobile phone number to be desensitized, 188XXXXXXXX, and the desensitized mobile phone number 188YYYYYYYY is obtained.

In step 606, the txt file containing the obtained desensitized identity card number 321YYYYYYYYYYYYYYY and mobile phone number 188YYYYYYYY is stored in the unstructured backup library.
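For unstructured text, the sensitive values must first be located inside each line before a desensitization function can be applied to them. The following minimal sketch shows one possible way of doing so. The regular expression (an 18-character identity card number, or an 11-digit mobile phone number beginning with 1, not embedded in a longer run of digits) and the class TextMaskSketch are assumptions of this sketch; the idMark and telMark arguments stand for whatever desensitization functions the tool provides and are passed in rather than implemented here.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: finds ID and mobile numbers in a line of text and
// replaces each one with the result of the supplied desensitization function.
final class TextMaskSketch {
    // Assumed shapes: 17 digits plus a digit or X (ID), or 1 plus 10 digits
    // (mobile), with no digit immediately before or after the match.
    private static final Pattern SENSITIVE = Pattern.compile(
        "(?<![0-9])(?:(?<id>[0-9]{17}[0-9Xx])|(?<tel>1[0-9]{10}))(?![0-9])");

    static String maskLine(String line,
                           UnaryOperator<String> idMark,
                           UnaryOperator<String> telMark) {
        Matcher m = SENSITIVE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String masked = (m.group("id") != null)
                ? idMark.apply(m.group("id"))
                : telMark.apply(m.group("tel"));
            // quoteReplacement keeps '$' and '\' in the result literal.
            m.appendReplacement(out, Matcher.quoteReplacement(masked));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder functions for demonstration; not the real algorithms.
        UnaryOperator<String> idMark = s -> "<id>";
        UnaryOperator<String> telMark = s -> "<tel>";
        System.out.println(maskLine("id 321081192911301545 tel 18892401396", idMark, telMark));
    }
}
```

A single pass over each line with one combined pattern avoids applying a second function to the output of the first.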

In summary, the distributed bank data desensitization method and system provided by the invention have the following advantages:

1. The method and system can be deployed across distributed environments and are suitable for different databases, Hive environments and unstructured text environments; the desensitized data can be traced back reversibly, the original validity checks on the data are not affected, data relevance is maintained across environments, the desensitized data can be changed flexibly as needed, the data is difficult to crack, and the security of data maintenance and sharing is improved.

2. The desensitization process is executed in a distributed manner using database custom functions, Hive custom functions and the Java desensitization tool; it can be applied to the different data storage scenarios in a bank, meets the different business requirements of the banking industry, and has strong overall applicability.

3. The desensitized data can be used in different scenarios and is difficult to crack and restore.

4. The method is highly general: as bank data grows ever richer, sensitive data may be scattered across thousands of tables and fields, and different business systems or processes can select the field information to be desensitized according to their own requirements, based on the globally defined desensitization objects.

5. The desensitized content can be traced back reversibly: because the desensitization algorithm is reversible, desensitized field information can be restored to the real field information when a business scenario requires it, such as fault investigation or business analysis, so that a real analysis result is obtained.

6. The original data relevance is maintained: because different systems and different business processes are desensitized with the same set of desensitization rules, the desensitized results still satisfy the data relevance characteristics of the business systems, the data call relationships within tables, between tables and in files remain unchanged, and the needs of joint debugging tests between the bank-wide core system and its peripheral supporting systems are well met.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A distributed bank data desensitization method is characterized by comprising the following steps:

defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool, wherein the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function;

2. The distributed bank data desensitization method according to claim 1, wherein the values of the desensitization rule parameters A, C and D are input through the desensitization management platform.

3. The distributed bank data desensitization method according to claim 1, wherein the database custom desensitization function, the Hive custom desensitization function, and the Java desensitization tool are generated using one, two or all three of a fixed value-of-alignment replacement algorithm, a non-logical bit remainder operation algorithm, and a logical bit alignment and invariant algorithm.

4. The distributed bank data desensitization method according to claim 3, wherein the desensitization rule parameters A and C are input into the desensitization function generated by the non-logical bit remainder operation algorithm, and the desensitization rule parameter D is input into the desensitization function generated by the logical bit alignment and invariant algorithm.

5. The distributed bank data desensitization method according to claim 1, wherein the database custom desensitization function is called through an update statement, and the desensitization source data is updated through that statement; the Hive custom desensitization function is called through a MapReduce statement of hadoop, and the desensitization source data is updated through that statement; the Java desensitization tool is called by executing the shell statement java -jar pimt.jar A C D input_file_path > pimt.out, and the desensitization source data is updated through that statement.

6. The distributed bank data desensitization method according to claim 1, wherein when the database custom desensitization function, the Hive custom desensitization function or the Java desensitization tool is called and data desensitization cannot be performed, all data to be desensitized that does not conform to the rule is replaced with a character string that begins with err followed by an incrementing number.

7. A distributed bank data desensitization system, characterized by comprising: a desensitization function definition unit, a database backup library, a Hive backup library, an unstructured backup library and a desensitization management platform;

the desensitization function definition unit is used for defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool, wherein the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function;

8. The distributed bank data desensitization system according to claim 7, wherein the database custom desensitization function, the Hive custom desensitization function, and the Java desensitization tool are generated using one, two or all three of a fixed value-of-alignment replacement algorithm, a non-logical bit remainder operation algorithm, and a logical bit alignment and invariant algorithm.

9. The distributed bank data desensitization system according to claim 7, wherein the data in the database production environment, the Hive production environment and the unstructured text production environment includes customer name, certificate number, telephone number, mailbox, address information, customer number, password, account information, and customer or employee income information.