CN110826105B - Distributed bank data desensitization method and system - Google Patents

Info

Publication number
CN110826105B
Authority
CN
China
Prior art keywords
desensitization
function
data
hive
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911116450.6A
Other languages
Chinese (zh)
Other versions
CN110826105A (en)
Inventor
吴昊
王巍
王景斌
陈菲琪
施志晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Sushang Bank Co ltd
Original Assignee
Jiangsu Suning Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Suning Bank Co Ltd
Priority to CN201911116450.6A
Publication of CN110826105A
Application granted
Publication of CN110826105B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed bank data desensitization method and system. The method comprises the following steps: defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool; backing up the data of the database production environment, the Hive production environment and the unstructured text production environment as data desensitization sources, and storing the data in a database backup library, a Hive backup library and an unstructured backup library respectively; creating the database user-defined desensitization function, the Hive user-defined desensitization function and the desensitization tool in the respective backup libraries; calling the database user-defined desensitization function or the Java desensitization tool in the database backup library, calling the Hive user-defined desensitization function or the Java desensitization tool in the Hive backup library, and calling the Java desensitization tool in the unstructured backup library; and inputting desensitization rule parameters A and C, or D, each time a backup library calls a function or the tool. The invention is applicable to different databases, Hive environments and unstructured text environments, keeps desensitized data reversibly traceable, and makes the data difficult to crack.

Description

Distributed bank data desensitization method and system
Technical Field
The invention relates to the field of personal information processing, and in particular to a distributed bank data desensitization method and system.
Background
With the development of bank information technology, data centers keep growing, banks store more and more sensitive data, and the data security risk rises as data circulates between different processes inside a bank. For most banks today, the personal information collected and stored when users register and open accounts through online banking or at offline branches, including names, mobile phone numbers, mailboxes, ID card numbers and addresses, is sensitive information that must be protected. Because of business requirements, this information may be used inside the bank for development and testing, data analysis, data mining, big data reporting and similar work, so sensitive data must be desensitized at each of these stages, with all or part of the sensitive information processed according to the scenario and driven by the needs of the financial business.
In practice, sensitive data takes several forms, such as structured data, retained big data and unstructured data. The security products currently on the market, such as database desensitization systems that only desensitize databases or systems that only desensitize text, usually cannot cover multiple scenarios with a single product, cannot handle the association between data retained in emerging environments such as Hive and unstructured data, and cannot keep desensitization rules unified and correlated across all data in a bank.
In view of the importance of bank data security and the shortcomings of current desensitization products and schemes, such as incomplete scenario coverage and incomplete support for target data types, there is an urgent need to design a universal distributed data desensitization system for the different scenarios within a bank.
Disclosure of Invention
The invention aims to solve the problems, and therefore provides a distributed bank data desensitization method and system.
To achieve the above object, in a first aspect, the present invention provides a distributed bank data desensitization method, including the steps of:
defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool;
backing up the data of the database production environment, the Hive production environment and the unstructured text production environment as data desensitization sources, and storing the data in a database backup library, a Hive backup library and an unstructured backup library respectively;
creating the database user-defined desensitization function, the Hive user-defined desensitization function and the desensitization tool in the database backup library, the Hive backup library and the unstructured backup library respectively, according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool;
calling the database user-defined desensitization function or the Java desensitization tool in the database backup library, inputting desensitization rule parameters A and C, or D, into the database user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, and acquiring the desensitized data and storing it in the database backup library;
calling the Hive user-defined desensitization function or the Java desensitization tool in the Hive backup library, inputting desensitization rule parameters A and C, or D, into the Hive user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, and acquiring the desensitized data and storing it in the Hive backup library;
calling the Java desensitization tool in the unstructured backup library, inputting desensitization rule parameters A and C, or D, into the Java desensitization tool, updating the desensitization source data, and acquiring the desensitized data and storing it in the unstructured backup library;
wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.
Further, the values of the desensitization rule parameters A, C and D are input by a desensitization management platform.
Further, the database customized desensitization function, Hive customized desensitization function and Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function.
Further, the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool are generated by one or two or three modes of a para-position fixed value replacement algorithm, a non-logic bit remainder operation algorithm and a logic bit para-position and invariant algorithm.
Furthermore, desensitization rule parameters A and C are input into a desensitization function generated by the non-logic bit remainder operation, and a desensitization rule parameter D is input into a desensitization function generated by the logic bit alignment and invariant algorithm.
Further, the database user-defined desensitization function is called through an update statement, and the desensitization source data is updated through that statement; the Hive user-defined desensitization function is called through a MapReduce statement of Hadoop, and the desensitization source data is updated through that statement; the Java desensitization tool is called by executing java -jar pimt.jar A C D input_file_path > pimt.out through a shell statement, and the desensitization source data is updated through that statement.
Further, when a call to the database user-defined desensitization function, the Hive user-defined desensitization function or the Java desensitization tool fails to desensitize the data, all data to be desensitized that does not conform to the rules is replaced with numerically incrementing character strings beginning with "err".
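The fallback described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name ErrFallbackSketch, the method names, and the exact placeholder format ("err" followed by a counter starting at 1) are hypothetical; the source only says the replacement is a numerically incrementing string beginning with "err".

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the "err" fallback; names and format are assumptions. */
final class ErrFallbackSketch {

    // Shared counter so that every failed value gets a distinct placeholder.
    private final AtomicLong counter = new AtomicLong();

    /** Returns the next placeholder: err1, err2, err3, and so on (assumed format). */
    String nextPlaceholder() {
        return "err" + counter.incrementAndGet();
    }

    /**
     * Applies a desensitization function to one value. If the function rejects
     * the value (throws) or yields nothing, the value is replaced by a placeholder.
     */
    String desensitizeOrFallback(String value, UnaryOperator<String> function) {
        try {
            String result = function.apply(value);
            return result != null ? result : nextPlaceholder();
        } catch (RuntimeException e) {
            return nextPlaceholder();
        }
    }
}
```

Because the counter only ever increases, two different non-conforming inputs never receive the same placeholder, which keeps the replaced rows distinguishable from one another.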
In a second aspect, the invention also provides a distributed bank data desensitization system, the system comprising: a desensitization function definition unit, a database backup library, a Hive backup library, an unstructured backup library and a desensitization management platform;
the desensitization function definition unit is used for defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool;
the database backup library is used for backing up data of a database production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the created database user-defined desensitization function and a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;
the Hive backup library is used for backing up data of a Hive production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Hive custom desensitization function or a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;
the unstructured backup library is used for backing up data in an unstructured text production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Java desensitization tool, and updating desensitization source data to acquire desensitized data and store the desensitized data;
the desensitization management platform is used for inputting desensitization rule parameters A and C, or D, into the database desensitization function, the Hive desensitization function or the Java desensitization tool called by the database backup library, the Hive backup library and the unstructured backup library, wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.
Further, the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool are generated by one, two or all three of a para-position fixed value replacement algorithm, a non-logic bit remainder operation algorithm and a logic bit para-position and invariant algorithm.
Further, the data of the database production environment, the Hive production environment and the unstructured text production environment comprise customer names, certificate numbers, telephone numbers, mailboxes, address information, customer numbers, passwords, account information and customer or employee income information.
The distributed bank data desensitization method provided by the invention is deployed across multiple environments in a distributed manner and is suitable for different databases, Hive environments and unstructured text environments. The desensitized data can be traced back reversibly, the original validity checks on the data are not affected, data relevance is preserved across environments, and the desensitized data can be changed flexibly as needed. The data is also difficult to crack, which improves the security of data maintenance and sharing.
Drawings
Fig. 1 is a block diagram of a distributed bank data desensitization system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a non-logic bit remainder operation algorithm according to an embodiment of the present invention;
Fig. 3 is a flowchart of a distributed bank data desensitization method according to an embodiment of the present invention;
Fig. 4 is a flowchart of data desensitization in a database production environment according to an embodiment of the present invention;
Fig. 5 is a flowchart of data desensitization in a Hive production environment according to an embodiment of the present invention;
Fig. 6 is a flowchart of data desensitization in an unstructured text production environment according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Note that the drawings are merely illustrative and are not drawn strictly to scale; parts may be enlarged or reduced for convenience of description, and some well-known structures may be omitted.
Fig. 1 is a block diagram of a distributed bank data desensitization system according to an embodiment of the present invention.
As shown in Fig. 1, a distributed bank data desensitization system provided in an embodiment of the present invention includes: the desensitization function definition unit 1, the database backup library 2, the Hive backup library 3, the unstructured backup library 4 and the desensitization management platform 5.
The desensitization function definition unit 1 is mainly responsible for defining a database custom desensitization function, a Hive custom desensitization function and a Java desensitization tool.
The database backup library 2 is used for backing up data of a database production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the created database user-defined desensitization function and a Java desensitization tool; and updating desensitization source data to acquire desensitized data and storing.
The Hive backup library 3 is mainly responsible for backing up data of the Hive production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Hive custom desensitization function or a Java desensitization tool; and updating desensitization source data to acquire desensitized data and storing.
The unstructured backup library 4 is mainly responsible for backing up data in an unstructured text production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; and calling a Java desensitization tool, and updating desensitization source data to acquire desensitized data and save.
The desensitization management platform 5 is mainly responsible for inputting desensitization rule parameters A and C, or D, into the database desensitization function, the Hive desensitization function or the Java desensitization tool called by the database backup library, the Hive backup library and the unstructured backup library, wherein A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.
Data in the database production environment, Hive production environment, and unstructured text production environment include, but are not limited to, customer name, certificate number, telephone number, Email, address information, customer number, password, customer or employee income information, etc., see table 1 below.
TABLE 1 (provided as images in the original publication; contents not reproduced here)
The database user-defined desensitization function is suitable for different database types, including MySQL, Oracle, SQL Server and DB2. The Hive user-defined desensitization function is suitable for historical data stored and retained in Hive. The Java desensitization tool is suitable for unstructured data such as office documents, texts, XML, HTML and various reports. The database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function, a customer or employee income information function and the like. They are generated by one, two or all three of a para-position fixed value replacement algorithm, a logic bit para-position and invariant algorithm and a non-logic bit remainder operation algorithm.
The para-position fixed value replacement algorithm is used for sensitive data such as customer names, mailboxes, address information, passwords and personal websites. Values in the same field have no correlation with one another and the input type is not validated, so the sensitive data can be replaced directly with specific data of fixed value and fixed length, which desensitizes the user information. The logic bit para-position and invariant algorithm is used for certificate numbers such as ID card numbers. Because of the coding rules of the ID card number, a system performs a plausibility check when the number is entered (for example, a digit run such as 19801563 cannot occur in an ID card number), so a random para-position replacement cannot be used while also guaranteeing that values do not repeat. Instead, an aligned subtraction whose sum remains unchanged is used: for example, a subtraction is performed between digits 7 to 10 of the ID card number and a specific value, and the result is written back into the same positions as the replacement value. The non-logic bit remainder operation algorithm is used for mobile phone numbers, customer numbers, account information and those positions of the ID card number that are not constrained by a specific coding rule; it applies a fixed para-position algorithm.
Specifically, the customer name function, the mailbox function, the address information function, the password function and the personal website function are generated by the para-position fixed value replacement algorithm. The customer number function and the account information function are generated by the non-logic bit remainder operation algorithm. The telephone number function is generated by the para-position fixed value replacement algorithm (for fixed-line telephones) and the non-logic bit remainder operation algorithm (for mobile phones). The certificate number function is generated by the para-position fixed value replacement algorithm (for corporate certificates: organization codes, business registration numbers, taxpayer identification numbers and the like) and by the non-logic bit remainder operation algorithm together with the logic bit para-position and invariant algorithm (for personal certificates: ID card numbers, passport numbers, Hong Kong and Macao travel permits, household registers, military officer certificates and the like).
Desensitization rule parameters A and C are input into a desensitization function generated by the non-logic bit remainder operation algorithm, and desensitization rule parameter D is input into a desensitization function generated by the logic bit para-position and invariant algorithm. Desensitization functions generated by the para-position fixed value replacement algorithm take no desensitization rule parameters. Fig. 2 and Table 2 illustrate the input of desensitization rule parameters A and C into a desensitization function generated by the non-logic bit remainder operation algorithm.
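The assignment of algorithms to functions described above can be written down as a small lookup table. The following Java sketch is only an illustration: the class, enum and constant names are hypothetical and do not come from the patent, and the comments on rule parameters restate which algorithm takes A and C, which takes D, and which takes none.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/** Hypothetical lookup table; all identifiers here are illustrative. */
final class AlgorithmTableSketch {

    enum Algorithm {
        FIXED_VALUE_REPLACEMENT, // para-position fixed value replacement; no rule parameters
        REMAINDER_OPERATION,     // non-logic bit remainder operation; takes A and C
        SUM_INVARIANT            // logic bit para-position and invariant; takes D
    }

    enum Field {
        CUSTOMER_NAME, MAILBOX, ADDRESS, PASSWORD, PERSONAL_WEBSITE,
        CUSTOMER_NUMBER, ACCOUNT, TELEPHONE, CERTIFICATE
    }

    /** Builds the field-to-algorithms table as stated in the description. */
    static Map<Field, EnumSet<Algorithm>> table() {
        Map<Field, EnumSet<Algorithm>> t = new EnumMap<>(Field.class);
        for (Field f : EnumSet.of(Field.CUSTOMER_NAME, Field.MAILBOX, Field.ADDRESS,
                Field.PASSWORD, Field.PERSONAL_WEBSITE)) {
            t.put(f, EnumSet.of(Algorithm.FIXED_VALUE_REPLACEMENT));
        }
        t.put(Field.CUSTOMER_NUMBER, EnumSet.of(Algorithm.REMAINDER_OPERATION));
        t.put(Field.ACCOUNT, EnumSet.of(Algorithm.REMAINDER_OPERATION));
        // Fixed-line numbers use fixed value replacement, mobile numbers the remainder operation.
        t.put(Field.TELEPHONE, EnumSet.of(Algorithm.FIXED_VALUE_REPLACEMENT,
                Algorithm.REMAINDER_OPERATION));
        // Certificate numbers may involve all three, depending on the certificate type.
        t.put(Field.CERTIFICATE, EnumSet.allOf(Algorithm.class));
        return t;
    }
}
```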
TABLE 2 (provided as an image in the original publication; contents not reproduced here)
As shown in Fig. 2 and Table 2, the second column of Table 2 holds the original digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, which correspond to a0 through a9 in Fig. 2: a0 = 0, a1 = 1, a2 = 2, a3 = 3, a4 = 4, a5 = 5, a6 = 6, a7 = 7, a8 = 8 and a9 = 9. With the desensitization rule input parameters A = 5 and C = 0 or C = 1, applying the remainder operation of Fig. 2, b = MOD((a * 2^(A+3)), 9), to a0 through a9 gives the fourth column of Table 2, namely 0, 4, 8, 3, 7, 2, 6, 1, 5, 0, which corresponds to b'0 through b'9 in Fig. 2: b'0 = 0, b'1 = 4, b'2 = 8, b'3 = 3, b'4 = 7, b'5 = 2, b'6 = 6, b'7 = 1, b'8 = 5 and b'9 = 0. Next, b'0 through b'9 are compared with a0 through a9 to determine how many values are identical. If more than 3 are identical, A++ is executed and the remainder operation is repeated until fewer than 3 of the resulting digits are identical to the original digits. If fewer than 3 are identical, as when only the first row is the same between the second and fourth columns of Table 2 in this embodiment, the next step is executed: the value of C is checked to decide whether to apply the variation.
Whether to apply the variation: in the non-logic bit remainder operation, the public function reserves an entry C as a process input parameter. C indicates whether an offset replacement is applied to the intermediate result of the operation. If C is 1, the variation is applied and b'0 is replaced with a9; if C is 0, the variation is not applied and b'9 is replaced with a9. This increases the randomness of the desensitization algorithm while guaranteeing unique traceability. In other words, the value of C ensures that the remainder values b'0 through b'9 contain no repeated value, so that each desensitized value corresponds uniquely to its original value and reversible (backtracking) operations can be performed.
For the desensitization source data in the database backup library 2, the Hive backup library 3 and the unstructured backup library 4, the desensitized customer names, certificate numbers, telephone numbers, Email addresses, address information, customer numbers, passwords, customer or employee income information and the like produced by the database user-defined desensitization function, the Hive user-defined desensitization function or the Java desensitization tool are shown in Table 3 below.
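The remainder operation and the variation step can be sketched in Java as follows. This is an illustration under stated assumptions rather than the patented code. The formula is read as b = (a * 2^(A+3)) mod 9, which is the reading that reproduces the digits 0, 4, 8, 3, 7, 2, 6, 1, 5, 0 listed for A = 5. The variation is read as writing the digit 9 (the value of a9) into b'0 when C is 1 and into b'9 when C is 0. The class and method names are hypothetical, and the A++ retry loop is left out.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Illustrative sketch only; class and method names are hypothetical. */
final class RemainderDigitMap {

    /** Raw remainder column: b'[a] = (a * 2^(A+3)) mod 9 for a = 0..9. */
    static int[] rawMap(long paramA) {
        BigInteger exponent = BigInteger.valueOf(paramA).add(BigInteger.valueOf(3));
        // 2^(A+3) mod 9, computed with modPow so that a large A cannot overflow.
        int m = BigInteger.valueOf(2).modPow(exponent, BigInteger.valueOf(9)).intValue();
        int[] b = new int[10];
        for (int a = 0; a < 10; a++) {
            b[a] = (a * m) % 9;
        }
        return b;
    }

    /** Applies the assumed reading of C: 1 writes a9 into b'0, 0 writes a9 into b'9. */
    static int[] buildMap(long paramA, int paramC) {
        int[] b = rawMap(paramA);
        if (paramC == 1) {
            b[0] = 9;
        } else {
            b[9] = 9;
        }
        return b;
    }

    /** Inverts a digit map that has no repeated values, for tracing back. */
    static int[] invert(int[] map) {
        int[] inverse = new int[10];
        for (int a = 0; a < 10; a++) {
            inverse[map[a]] = a;
        }
        return inverse;
    }

    /** Replaces every decimal digit of the text through the map; other characters are kept. */
    static String apply(String text, int[] map) {
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            out.append(ch >= '0' && ch <= '9' ? (char) ('0' + map[ch - '0']) : ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rawMap(5))); // [0, 4, 8, 3, 7, 2, 6, 1, 5, 0]
    }
}
```

In the raw column both 0 and 9 map to 0, so the column cannot be inverted. Under the assumed reading, either value of C removes that collision, and the resulting map is a permutation of the ten digits, which is what makes `invert` well defined.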
TABLE 3 (provided as images in the original publication; contents not reproduced here)
Fig. 3 is a flowchart of a distributed bank data desensitization method according to an embodiment of the present invention.
In step 301, a database custom desensitization function, Hive custom desensitization function, and Java desensitization tool are defined.
In step 302, data in the backup database production environment, Hive production environment, and unstructured text production environment are used as a data desensitization source and stored in the database backup library, Hive backup library, and unstructured backup library, respectively.
In step 303, a database custom desensitization function, a Hive custom desensitization function, and a Java desensitization tool are created in the database backup library, the Hive backup library, and the unstructured backup library according to the defined database custom desensitization function, Hive custom desensitization function, and Java desensitization tool, respectively.
In step 304, a database user-defined desensitization function or a Java desensitization tool is called in the database backup library, desensitization rule parameters A, C and D are input into the database user-defined desensitization function or the Java desensitization tool, desensitization source data are updated, and desensitized data are obtained and stored in the database backup library.
In step 305, a Hive custom desensitization function or a Java desensitization tool is called in the Hive backup library, desensitization rule parameters A, C and D are input into the Hive custom desensitization function or the Java desensitization tool, desensitization source data are updated, and desensitized data are obtained and stored in the Hive backup library.
In step 306, the Java desensitization tool is called in the unstructured backup library, desensitization rule parameters A and C, or D, are input into the Java desensitization tool, the desensitization source data is updated, and the desensitized data is obtained and stored in the unstructured backup library.
The desensitization rule parameters A, C and D are either issued uniformly by the desensitization management platform or extracted manually through uniform distribution. A is an initial random input parameter whose value is any positive integer; C is a process input parameter whose value is 0 or 1; and D is an initial random input parameter whose value ranges from 2020 to 2120.
It should be understood that, in the embodiment of the present invention, step 303 may be executed first, and then step 302 is executed, that is, a desensitization function is created in the backup library, and then data backup of a desensitization source is performed.
The database self-defined desensitization function, the Hive self-defined desensitization function and the Java desensitization tool in the embodiment of the invention comprise a client name function nameMark (), a certificate number function idMark (), a telephone number function telMark (), a mail box function mailMark (), an address information function addMark (), a client number function numberMark (), a password function passMark (), an account information function accountMark () and a client or employee income information function incomaMark (). The database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool are generated by adopting one or two or three modes of a para-position fixed value replacement algorithm, a non-logic bit remainder operation algorithm and a logic bit para-position and invariant algorithm. Desensitization rule parameters A and C are input into a desensitization function generated by non-logic bit remainder operation, and desensitization rule parameter D is input into a desensitization function generated by logic bit alignment and invariant algorithm.
The database user-defined desensitization function supports different database types, including MySQL, Oracle, SQL Server, DB2 and the like. It is loaded into the database and executed as a user-defined function: the function is created and then called through an update statement to desensitize the different data tables in the database backup library, after which the database data can be migrated for use by different business processes. Taking one database as an example, the database user-defined desensitization functions are called through update statements as follows:
update AA set aa.id_code = idMark(aa.id_code) -- identity card number
update AA set aa.mobile = telMark(aa.mobile) -- telephone number
update AA set aa.cut_name = nameMark(aa.cut_name) -- name information
update AA set aa.mail = mailMark(aa.mail) -- mailbox information
update AA set aa.passsd = passMark(aa.passsd) -- password information
For the different database types, including MySQL, Oracle, SQL Server, DB2 and the like, the database can alternatively be exported to a text file, desensitized as text by the Java desensitization tool, and then used by different business processes.
The Hive user-defined desensitization function is intended for historical data stored and retained in a Hive environment. It is loaded and executed as a Hive user-defined function: the function is created and then called through a MapReduce statement of Hadoop to desensitize the different data tables in the Hive backup library, and the result is used by different business processes such as big data report production. Historical data stored and retained in the Hive environment can also be desensitized using the Java desensitization tool.
The Java desensitization tool handles the different database types, the historical data stored and retained in a Hive environment, and data in an unstructured text environment. The tool is written in Java and is invoked from a shell statement of the form java -jar pimt.jar A C D input_file_path > pimt.out; the desensitization source data is updated through this statement.
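The shape of such a command-line entry point can be sketched as follows. This is a hedged outline rather than the actual pimt.jar: the class name PimtCliSketch and the helper maskLines are hypothetical, and the masking function is left as an identity placeholder because the patent does not disclose it. Only the argument order (A, C, D, input file path) and the use of standard output, which the shell redirects to pimt.out, follow the text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical outline of a tool invoked as:
//   java -jar pimt.jar A C D input_file_path > pimt.out
final class PimtCliSketch {

    /** Applies one masking function to every line; the input list is not modified. */
    static List<String> maskLines(List<String> lines, UnaryOperator<String> mask) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            out.add(mask.apply(line));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: java -jar pimt.jar A C D input_file_path > pimt.out");
            return;
        }
        // Positional parameters in the order given in the description.
        int a = Integer.parseInt(args[0]);
        int c = Integer.parseInt(args[1]);
        int d = Integer.parseInt(args[2]);
        List<String> lines = Files.readAllLines(Paths.get(args[3]));

        // Placeholder: a real tool would build its masking functions from a, c and d.
        UnaryOperator<String> mask = line -> line;

        // Results go to standard output so that the shell redirect captures them.
        for (String masked : maskLines(lines, mask)) {
            System.out.println(masked);
        }
    }
}
```

Keeping the per-line transformation separate from file handling means the same masking function could in principle be reused from a database or Hive wrapper, which matches the description's point that one set of rules is shared across environments.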
In addition, data written in a production environment inevitably has quality problems; for example, the aa.id_code field may be filled with information that is not an identity card number. Because each desensitization algorithm assumes a specific field format, such values can cause the desensitization function to fail and exit with an error. Therefore, so that desensitization can proceed normally, in the embodiment of the present invention, when a database user-defined desensitization function, a Hive user-defined desensitization function or the Java desensitization tool is called and a value cannot be desensitized, every value to be desensitized that does not conform to the rules is replaced with a string consisting of the prefix err followed by an incrementing number.
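A minimal Java sketch of this fallback is shown below. The class name ErrFallbackSketch, the method maskOrErr, the use of a regular expression as the conformance rule, and the exact output format (err1, err2, and so on) are all assumptions made for illustration; the description only states that non-conforming values are replaced by a string that starts with err and carries an incrementing number.

```java
import java.util.function.UnaryOperator;

// Illustrative fallback: values that do not match the expected format are
// replaced by "err" plus a running counter instead of aborting the run.
final class ErrFallbackSketch {
    private int errCounter = 0;

    /**
     * Masks value with the given function when it matches pattern;
     * otherwise returns the next placeholder in the sequence err1, err2, ...
     */
    String maskOrErr(String value, String pattern, UnaryOperator<String> mask) {
        if (value != null && value.matches(pattern)) {
            return mask.apply(value);
        }
        errCounter++;
        return "err" + errCounter;
    }
}
```

For example, with an 11-digit rule such as \d{11} for mobile numbers, a well-formed value is passed to the masking function, while a value such as free text or null yields the next err placeholder, so one bad row does not stop the whole batch.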
Fig. 4 is a flow chart of data desensitization in a database production environment according to an embodiment of the present invention.
As shown in fig. 4, taking a production Oracle database as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the production Oracle database need to be desensitized and extracted. The steps are as follows:
In step 401, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the production Oracle database are backed up and stored in the database backup library as the desensitization source.
In step 402, the database user-defined desensitization functions idMark() and telMark() are created in the database backup library.
In step 403, the idMark() and telMark() functions are called.
In step 404, the parameter values are set to A = 5, C = 0 and D = 2020.
In step 405, the update statement update AA set aa.id_code = idMark(aa.id_code) is executed on the identity card number to be desensitized, 321XXXXXXXXXXXXXXX, and the desensitized identity card number 321081192911301545 is obtained; the update statement update AA set aa.mobile = telMark(aa.mobile) is executed on the mobile phone number to be desensitized, 188XXXXXXXX, and the desensitized mobile phone number 18892401396 is obtained.
In step 406, the desensitized identity card number 321YYYYYYYYYYYYYYY and mobile phone number 188YYYYYYYY are stored in the database backup library.
FIG. 5 is a flow chart of data desensitization in Hive production environment according to an embodiment of the invention.
As shown in fig. 5, taking a Hive production environment as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the Hive production environment need to be desensitized and extracted. The steps are as follows:
In step 501, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the Hive production environment are backed up and stored in the Hive backup library.
In step 502, the Hive user-defined desensitization functions idMark() and telMark() are created in the Hive backup library.
In step 503, the idMark() and telMark() functions are called.
In step 504, the parameter values are set to A = 5, C = 0 and D = 2020.
In step 505, the MapReduce statement MapReduce AA set aa.id_code = idMark(aa.id_code) is executed on the identity card number to be desensitized, 321XXXXXXXXXXXXXXX, and the desensitized identity card number 321YYYYYYYYYYYYYYY is obtained; the MapReduce statement MapReduce AA set aa.mobile = telMark(aa.mobile) is executed on the mobile phone number to be desensitized, 188XXXXXXXX, and the desensitized mobile phone number 188YYYYYYYY is obtained.
In step 506, the desensitized identity card number 321YYYYYYYYYYYYYYY and mobile phone number 188YYYYYYYY are stored in the Hive backup library.
Fig. 6 is a flow chart of data desensitization in an unstructured text production environment according to an embodiment of the present invention.
As shown in fig. 6, taking a txt file stored on the FTP server of an unstructured text production environment as an example, assume that the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the txt file need to be desensitized and extracted. The steps are as follows:
In step 601, the identity card number 321XXXXXXXXXXXXXXX and the mobile phone number 188XXXXXXXX in the txt file on the FTP server of the unstructured text production environment are backed up and stored in the unstructured backup library.
In step 602, the Java desensitization tool datamask.java is created in the unstructured backup library; it includes the various desensitization functions, such as the certificate number function idMark(), the telephone number function telMark() and others.
In step 603, the java desensitization tool datamask.
In step 604, the parameter values are set to A = 5, C = 0 and D = 2020.
In step 605, the Java program is executed on the identity card number to be desensitized, 321XXXXXXXXXXXXXXX, through the shell statement java -jar pimt.jar 5 0 2020 input_file_path > pimt.out, and the desensitized identity card number 321YYYYYYYYYYYYYYY is obtained; the Java program is executed in the same way on the mobile phone number to be desensitized, 188XXXXXXXX, and the desensitized mobile phone number 188YYYYYYYY is obtained.
In step 606, the txt file containing the desensitized identity card number 321YYYYYYYYYYYYYYY and mobile phone number 188YYYYYYYY is stored in the unstructured backup library.
In summary, the distributed bank data desensitization method and system provided by the invention have the following advantages:
1. The distributed, multi-environment deployment suits different databases, Hive environments and unstructured text environments. Desensitized data can be traced back reversibly, the original validity checks on the data are unaffected, data relevance is preserved across environments, the desensitized data can be changed flexibly as required, and the data is difficult to crack, which improves the security of data maintenance and sharing.
2. The desensitization process is executed in a distributed manner using database user-defined functions, Hive user-defined functions and the Java desensitization tool, so it can be applied to the different data storage scenarios found in a bank, meets the differing business requirements of the banking industry, and is broadly applicable.
3. Desensitized data used in different scenarios is difficult to crack and restore.
4. The method is highly general. As bank data keeps growing, sensitive data may be scattered across thousands of tables and fields; based on the globally defined desensitization objects, each business system or process can select the fields that need desensitization according to its own requirements.
5. Desensitized content can be traced back reversibly. Because the desensitization algorithm is reversible, desensitized field information can be restored to the real field information when a business scenario such as fault investigation or business analysis requires it, so that the analysis reflects the real data.
6. The original data relevance is preserved. Because the same set of desensitization rules is used across different systems and business processes, the desensitized results still satisfy the data relevance characteristics of the business systems, and the data call relationships within tables, between tables and between files remain unchanged, which meets the needs of bank-wide joint debugging and testing between the core system and each peripheral supporting system.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A distributed bank data desensitization method is characterized by comprising the following steps:
defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool, wherein the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mail box function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function;
backing up the data in the database production environment, the Hive production environment and the unstructured text production environment as data desensitization sources, and storing the data in a database backup library, a Hive backup library and an unstructured backup library respectively;
respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool in the database backup library, the Hive backup library and the unstructured backup library according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool;
calling the database user-defined desensitization function or the Java desensitization tool in the database backup library, inputting desensitization rule parameters A and C, or D, into the database user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, acquiring desensitized data and storing the desensitized data in the database backup library;
calling the Hive user-defined desensitization function or the Java desensitization tool in the Hive backup library, inputting desensitization rule parameters A and C, or D, into the Hive user-defined desensitization function or the Java desensitization tool, updating the desensitization source data, acquiring desensitized data and storing the desensitized data in the Hive backup library;
calling the Java desensitization tool in the unstructured backup library, inputting desensitization rule parameters A and C, or D, into the Java desensitization tool, updating the desensitization source data, acquiring desensitized data and storing the desensitized data in the unstructured backup library;
wherein A is an initial random input parameter, and the value of A is any positive integer; c is a process input parameter, and the value of C can be selected to be 0 or 1; d is an initial random input parameter, and the value of D is 2020-2120.
2. The distributed bank data desensitization method according to claim 1, wherein the values of the desensitization rule parameters A, C and D are input by the desensitization management platform.
3. The distributed bank data desensitization method according to claim 1, wherein the database custom desensitization function, the Hive custom desensitization function and the Java desensitization tool are generated using one, two or all three of an aligned fixed-value replacement algorithm, a non-logical-bit remainder operation algorithm, and a logical-bit alignment and invariant algorithm.
4. The distributed bank data desensitization method according to claim 3, wherein desensitization rule parameters A and C are input into the desensitization function generated by the non-logical-bit remainder operation algorithm, and desensitization rule parameter D is input into the desensitization function generated by the logical-bit alignment and invariant algorithm.
5. The distributed bank data desensitization method according to claim 1, wherein the database custom desensitization function is called through an update statement, and the desensitization source data is updated through that statement; the Hive custom desensitization function is called through a MapReduce statement of Hadoop, and the desensitization source data is updated through that statement; the Java desensitization tool is called through a shell statement that executes java -jar pimt.jar A C D input_file_path > pimt.out, and the desensitization source data is updated through that statement.
6. The distributed bank data desensitization method according to claim 1, wherein, when the called database custom desensitization function, Hive custom desensitization function or Java desensitization tool cannot perform data desensitization, all data to be desensitized that does not conform to the rules is replaced with a string consisting of err followed by an incrementing number.
7. A distributed bank data desensitization system, comprising: the desensitization function management system comprises a desensitization function definition unit, a database backup library, a Hive backup library, an unstructured backup library and a desensitization management platform;
the desensitization function definition unit is used for defining a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool, wherein the database user-defined desensitization function, the Hive user-defined desensitization function and the Java desensitization tool comprise a customer name function, a certificate number function, a telephone number function, a mailbox function, an address information function, a customer number function, a password function, an account information function and a customer or employee income information function;
the database backup library is used for backing up data of a database production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling the created database user-defined desensitization function and a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;
the Hive backup library is used for backing up data of a Hive production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Hive custom desensitization function or a Java desensitization tool; updating desensitization source data to obtain desensitized data and storing the desensitized data;
the unstructured backup library is used for backing up data in an unstructured text production environment as a data desensitization source; respectively creating a database user-defined desensitization function, a Hive user-defined desensitization function and a Java desensitization tool according to the defined database user-defined desensitization function, Hive user-defined desensitization function and Java desensitization tool; calling a Java desensitization tool, and updating desensitization source data to acquire desensitized data and store the desensitized data;
the desensitization management platform is used for inputting desensitization rule parameters A and C or D into a database desensitization function or a Java desensitization tool or a Hive desensitization function called by the database backup library, the Hive backup library and the unstructured backup library, wherein A is an initial random input parameter, and the value of A is any positive integer; c is a process input parameter, and the value of C can be selected to be 0 or 1; d is an initial random input parameter, and the value of D is 2020-2120.
8. The distributed bank data desensitization system according to claim 7, wherein the database custom desensitization function, the Hive custom desensitization function and the Java desensitization tool are generated using one, two or all three of an aligned fixed-value replacement algorithm, a non-logical-bit remainder operation algorithm, and a logical-bit alignment and invariant algorithm.
9. A distributed bank data desensitization system according to claim 7, wherein said database production environment data, Hive production environment and unstructured text production environment data include customer name, certificate number, telephone number, mailbox, address information, customer number, password, account information, and customer or employee revenue information.
CN201911116450.6A 2019-11-15 2019-11-15 Distributed bank data desensitization method and system Active CN110826105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911116450.6A CN110826105B (en) 2019-11-15 2019-11-15 Distributed bank data desensitization method and system

Publications (2)

Publication Number    Publication Date
CN110826105A (en)    2020-02-21
CN110826105B (en)    2021-11-12

Family

ID=69555389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911116450.6A Active CN110826105B (en) 2019-11-15 2019-11-15 Distributed bank data desensitization method and system

Country Status (1)

Country Link
CN (1) CN110826105B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000

Patentee after: Jiangsu Sushang Bank Co.,Ltd.

Country or region after: China

Address before: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000

Patentee before: JIANGSU SUNING BANK Co.,Ltd.

Country or region before: China