WO2015085358A1 - A method and system for analysing test data to check for the presence of personally identifiable information - Google Patents

Info

Publication number
WO2015085358A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
distribution
identifiable information
personally identifiable
test data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/AU2014/050385
Other languages
French (fr)
Inventor
Niall CRAWFORD
Liam McCRORY
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ENOV8 DATA Pty Ltd
Original Assignee
ENOV8 DATA Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2013904792A
Application filed by ENOV8 DATA Pty Ltd
Publication of WO2015085358A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Systems and methods of analysing test data to check for the presence of personally identifiable information including the steps of: determining a typical distribution of values for at least one field of data in a production environment; calculating the actual distribution of values for at least one field of the test data; comparing the typical distribution with the actual distribution; and providing an indication of the likely presence of personally identifiable information based on the result of the comparison.

Description

A METHOD AND SYSTEM FOR ANALYSING TEST DATA TO CHECK FOR THE PRESENCE OF PERSONALLY IDENTIFIABLE INFORMATION
Technical Field
The present invention relates to methods and systems for analysing test data and particularly relates to methods and systems for checking test data for the presence of personally identifiable information.
Background to the Invention
Most businesses and other organisations use computer software systems which produce, store and manipulate data, known as production data, as part of the ongoing activities of the business. Over time, it typically becomes necessary to revise, upgrade or replace various hardware or software components of systems. Prior to implementing such changes it is customary to perform thorough testing to ensure correct operation of the proposed system modifications, thus avoiding malfunctions and disruption to the operation of the business. To this end, test data is prepared which mirrors the production data.
Preparation of "production-like" data is typically done in one of two ways: (1) via an Extract, Transform & Load (ETL) process, i.e. where the data is extracted from the production environment itself and is then manipulated (e.g. subset and privatised) before loading into the non-production environment (Development, Test & Training); and/or (2) fabricated/created from scratch. The former process (1) tends to be prevalent as it tends to deliver test data that looks and behaves more like the real production environment data.
The process of privatisation, during the ETL process, is necessary as a way of ensuring that Personally Identifiable Information (PII) data is obfuscated, which ultimately ensures that customer, business and employee details are not disseminated and thus remain protected.
Despite all care being taken, it can occur that production data makes its way into test systems. This is particularly the case in large organisations or in complex systems which are maintained by a large number of people. As a result, Personally Identifiable Information becomes vulnerable to being accessed by and/or disseminated to unauthorised persons.
It would therefore be desirable to be able to analyse test data to check for the presence of Personally Identifiable Information.
Summary of the Invention
In a first aspect the present invention provides a method of analysing test data to check for the presence of personally identifiable information including the steps of: determining a typical distribution of values for at least one field of data in a production environment; calculating the actual distribution of values for at least one field of the test data; comparing the typical distribution with the actual distribution; and providing an indication of the likely presence of personally identifiable information based on the result of the comparison.
The step of comparing may include calculating the correlation between the typical distribution and the actual distribution.
The method may further include the step of scanning the test data for personally identifiable information data types.
The data types include, but are not limited to, any one of First Name, Last Name, Email, Address, Telephone Number, Tax File/Social Security numbers, Driving License, Passport & Credit Cards.
In a second aspect the present invention provides a system for analysing test data to check for the presence of personally identifiable information including:
calculating means for calculating the actual distribution of values for at least one field of the test data; comparing means for comparing a typical distribution of values with the actual distribution; and display means for providing an indication of the likely presence of personally identifiable information based on the result of the comparison.
The comparing means may be arranged to calculate a value representing the correlation between the typical distribution and the actual distribution.
In a third aspect the invention provides a software program including instructions which, when carried out by a processor, cause a computing system to operate a method according to the first aspect of the invention or to embody a system according to the second aspect of the invention.
In a fourth aspect the present invention provides a computer readable medium which is populated with a software program according to the third aspect of the invention.
The invention is based on the realisation that the act of privatisation changes both data content (column values) and the distribution of these values. This allows automatic validation of whether data has been adequately transformed (privatised) or not.
Brief Description of the Drawings
An embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic diagram illustrating an embodiment of the invention; and
Figure 2 is an example screenshot showing the operation of the invention.
Detailed Description of the Preferred Embodiment
Referring to figure 1, an organisation has a number of production databases 5 and a number of non-production environments 10, i.e. development, test and training. To ensure that the non-production environments 10 have data that "behaves" like production, it is necessary for the data to be copied from production using an Extract, Transform & Load (ETL) mechanism 15. Furthermore, to ensure that the organisation is compliant with various national and regulatory bodies it is also necessary for any Personally Identifiable Information ("PII") to undergo some form of privatisation (e.g. masking, obfuscation) process.
The challenge with the simple ETL process described above is that organisational data is typically non-trivial. Data is often large, uses complex data structures and is housed in many locations and on various different technologies, e.g. Mainframe, Windows, UNIX, Relational Databases, Hierarchical Databases, Flat files etc. The complexity of the data inevitably means that mistakes/omissions are likely and PII information will be copied (and not privatised) into one or more non-production environments. Once the PII is in the secondary environment it is exposed to use for secondary purposes and potentially to theft.
To mitigate the risk of PII data finding its way into the non-production environments 10, a two-step automated profiling 20 and validation 25 process is carried out. The method according to the invention is typically initiated by a user accessing the system using their own computing device with a display screen. The computer processes initiated by the user may run on their local computing device, or on another computing device under instruction from the user's device. In some embodiments, the method of the invention may be configured to run at specified intervals, or in response to various triggers, such as the loading of new data into the test environment.
In the profiling 20 step, the data in the test environments is automatically scanned for PII data types. Common PII types of interest are First Name, Last Name, Email, Address, Telephone Number, Tax File/Social Security numbers, Driving License, Passport & Credit Cards. This scanning is done by searching for common keywords (e.g. Smith, Jones etc.) and through other value/string matching means (e.g. Regular Expressions and Checksum analysis). Once a column is identified to be of a certain type (e.g. last name), its details are passed to the validation subroutine 25. In the validation 25 step, the validation subroutine then analyses the column (data) for distribution patterns that would typically be found in a production system. A simple example would be the distribution of surnames 30 in an English speaking country. These distribution patterns have been previously determined and are stored in the system as pattern files 28.
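The profiling step can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the specification: the email regular expression, the use of the Luhn checksum for card numbers, the small surname keyword set, the function names, and the 80% match threshold.

```python
import re

# Hypothetical detection rules; a real deployment would use far richer ones.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
COMMON_SURNAMES = {"smith", "jones", "taylor", "young", "foster", "barnett"}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumed minimum card-number length
        return False
    total = 0
    # Double every second digit, counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_column(values, threshold=0.8):
    """Guess a PII type for a column from a sample of its values."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return None
    checks = {
        "email": lambda v: bool(EMAIL_RE.match(v)),
        "credit_card": luhn_valid,
        "last_name": lambda v: v.lower() in COMMON_SURNAMES,
    }
    for pii_type, check in checks.items():
        hits = sum(1 for v in values if check(v))
        if hits / len(values) >= threshold:
            return pii_type
    return None
```

A column classified this way (for example as `last_name`) would then be handed to the validation step, which looks at the distribution of its values and not at the individual values.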
Example:
Number of Smiths versus number of Taylors versus number of Fosters.
Production distribution pattern is approximately 10:3:1.
Taking the example of surnames above, the validation subroutine 25 takes a significant sample of data from the non-production database 10 and calculates the distribution of values in the surname column. The validation subroutine 25 then compares the calculated distribution against the known typical production distribution by calculating the correlation between the two. If it has a similar pattern (say 10:3:1) then it is likely that the data has NOT been properly privatised, and that the data in the non-production database is either production data or a production clone. The greater the discrepancy (lack of correlation) with the production pattern, the lower the likelihood that it is production data (indicating that masking or obfuscation has taken place).
The size of the sample taken by the validation subroutine is configurable; the greater the sample, the higher the likelihood of accuracy. A typical sample size would be 25,000 rows.
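As a rough illustration of the sampling part of the validation step, the sketch below draws a configurable sample from a column and counts how often each baseline value occurs. The function name `sample_distribution`, the use of `random.sample`, and the in-memory list standing in for a database column are assumptions made for this example.

```python
import random
from collections import Counter


def sample_distribution(column, baseline_values, sample_size=25_000, seed=None):
    """Count occurrences of each baseline value in a random sample of a column."""
    rng = random.Random(seed)
    rows = list(column)
    if len(rows) > sample_size:
        rows = rng.sample(rows, sample_size)
    counts = Counter(str(v).strip().upper() for v in rows)
    # Keep the baseline order so the result lines up with the pattern file.
    return [counts.get(name.upper(), 0) for name in baseline_values]


# Tiny stand-in for a surname column following the 10:3:1 pattern.
column = ["Smith"] * 10 + ["Taylor"] * 3 + ["Foster"] * 1 + ["Other"] * 2
observed = sample_distribution(column, ["SMITH", "TAYLOR", "FOSTER"])
```

The returned counts are in the same order as the baseline values, which is what a later correlation step needs.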
Referring to figure 2, a user interface of the system is shown. Once the profiling 20 and validation 25 steps are complete, the validation subroutine returns a percentage value indicating how closely the pattern found correlates with the typical production pattern. A score of 100% (1) indicates that the validation subroutine has found an exact distribution match. A score of 0% (0) indicates that absolutely no correlation was found. A negative score of -100% (-1) highlights a totally negative correlation.
A perfect score/correlation of 100% (1) would be obtained by the sequence (order of popularity) matching perfectly. In the case of the example distribution shown in figure 1, 100% would be achieved by: Smith "being greater than" Jones "being greater than" Taylor "being greater than" Young "being greater than" Foster "being greater than" Barnett. In the example in figure 2, both EMPLOYEE_FIRST_NAME and EMPLOYEE_LAST_NAME have very high correlations (97% and 92%), labelled A. One would infer that this indicates the presence of production-cloned (non-obfuscated) PII information.
To add further accuracy and confirmation of the result, the tool also provides a sample of data that has been found in each of the tables. This information can then be used to streamline further analysis and checks.
In addition to distribution of surnames, other distribution patterns can be used, such as first names or addresses. The patterns can be tailored to suit a particular set of data, or subset of data, such as by being relevant to a particular location. The following example demonstrates the use of patterns of street names in Sydney, Australia.
Distribution of Sydney Street Name (Prefix)
• Prefix: John: Rating: 18
• Prefix: Albert: Rating: 9
• Prefix: Young: Rating: 6
• Prefix: Nelson: Rating: 2
Rating is based on real occurrence. In this instance one would expect to find twice as many Johns as Alberts, etc.
Distribution of Sydney Street Name (Suffix)
An alternative to using a prefix is to analyse suffix.
For example, Sydney Street Suffix:
• Suffix: Street: Rating: 1166
• Suffix: Lane: Rating: 571
• Suffix: Avenue: Rating: 114
• Suffix: Way: Rating: 10
In this instance one would expect to find twice as many Streets as Lanes, etc.
Pattern matching can apply to virtually anything, not just alphanumeric strings. It can apply to numbers and/or sequences of numbers. For example, larger banks (issuers) have a greater number of credit cards in distribution. A credit card's prefix determines the issuer. As such, one would expect to find more credit cards attributed to a larger bank than to a smaller bank.
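The same counting idea extends to derived keys such as a street suffix or a card-number prefix. The sketch below is illustrative only: the function names, the choice of the last word as the street suffix, and the `prefix_len` default of six digits are assumptions and do not come from the specification.

```python
from collections import Counter


def street_suffix_counts(addresses):
    """Count the final word of each street address (e.g. STREET, LANE)."""
    suffixes = (a.split()[-1].upper() for a in addresses if a.split())
    return Counter(suffixes)


def card_prefix_counts(card_numbers, prefix_len=6):
    """Count leading digits of card numbers; prefix_len is an assumed setting."""
    counts = Counter()
    for number in card_numbers:
        digits = "".join(ch for ch in number if ch.isdigit())
        if len(digits) >= prefix_len:
            counts[digits[:prefix_len]] += 1
    return counts


suffixes = street_suffix_counts(["12 John Street", "3 Albert Street", "9 Young Lane", ""])
prefixes = card_prefix_counts(
    ["4111 1111 1111 1111", "4111-1122-3333-4444", "5500 0000 0000 0004", "12"]
)
```

Either set of counts can then be compared against a stored production pattern in the same way as the surname counts.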
The calculation of the correlation between production data and testing data is managed through a four-step process:
Step-1 Identify a baseline-set of values with a strong "distribution/popularity" spread. From production, identify a number of distinct values that have a distinctive spread (popularity count). By distinctive, that is to say the regularity of the value (popularity of the value) is clearly different from other items in the data set. Refer to the surname example below. Each value (surname) is twice as popular as the next.
Example SURNAME Baseline Set:
VALUE     COUNT    PROD POPULARITY RATIO*
SMITH     64000    0.504
JONES     32000    0.252
TAYLOR    16000    0.126
YOUNG      8000    0.063
FOSTER     4000    0.031
BARNETT    2000    0.016
* Popularity Ratio is the ratio of COUNT / SUM-OF-SELECTION-COUNT
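The Popularity Ratio defined in the footnote can be sketched in a few lines of Python. The function name `popularity_ratios` and the dictionary layout are assumptions for illustration. The ratios it computes use only the six counts listed, so they may not reproduce the tabulated figures exactly.

```python
def popularity_ratios(counts):
    """Return each count divided by the sum of all selected counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("selection contains no rows")
    return {value: count / total for value, count in counts.items()}


baseline_counts = {
    "SMITH": 64000, "JONES": 32000, "TAYLOR": 16000,
    "YOUNG": 8000, "FOSTER": 4000, "BARNETT": 2000,
}
ratios = popularity_ratios(baseline_counts)
```

Because every count is divided by the same total, the ratios preserve both the ordering and the relative spacing of the counts.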
Step-2 Build an equivalent comparison-set from your Target Data Source (e.g. Test DB)
Example SURNAME Comparison Set:
VALUE     COUNT    TEST POPULARITY RATIO*
SMITH        44    0.344
JONES        32    0.250
TAYLOR       28    0.219
YOUNG         8    0.063
FOSTER             0.063
BARNETT            0.039
* Popularity Ratio is the ratio of COUNT / SUM-OF-SELECTION-COUNT
Step-3 Apply a Correlation Formula to the two Data Sets
Compare the correlation of the two associated rows using a data correlation formula. For example: CORRELATE (P, T)
Where P = "Production Popularity Ratio" Data Set
Where T = "Test Popularity Ratio" Data Set
The "Count" Data Set can also be used and would provide the same results (score). In one embodiment, the correlation formula applied is of the type known as the Spearman Rank Correlation formula. However, any rank correlation or standard correlation (e.g. Pearson's) formula could be used.
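A minimal sketch of CORRELATE (P, T) follows, using the Spearman rank correlation computed as the Pearson correlation of the ranks, with tied values given their average rank. The helper names `average_ranks` and `pearson` and the tie-handling choice are assumptions for illustration. The specification does not say how ties are treated.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(vx * vy)


def correlate(p, t):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(p), average_ranks(t))


P = [0.504, 0.252, 0.126, 0.063, 0.031, 0.016]  # production popularity ratios
T = [0.344, 0.250, 0.219, 0.063, 0.063, 0.039]  # test popularity ratios
score = correlate(P, T)
```

For the two example sets the order of popularity is almost identical (only the tie between YOUNG and FOSTER in the test set differs), so the score is close to 1. Passing the raw counts in place of the ratios gives the same ranks and therefore the same score.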
Step-4 Display Results
Most correlation systems provide a score from -1 (inverse) through 0 (no correlation) to 1 (positive). To simplify the analyst's understanding, these scores are converted into bands.
The banding is configurable, and by default is set as follows:
HIGH: High to Very High Correlation (Prod Like)
MEDIUM: Medium Positive Correlation (Undetermined)
LOW: Low Negative Correlation (Non Prod Like)
The relationship between score and band will depend on the rank correlation formula used.
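The conversion from score to band might be sketched as follows. The numeric cut-offs (0.7 and 0.3) are hypothetical placeholders, since the text gives the band names but not their thresholds. The function names are likewise assumptions.

```python
def to_band(score, high=0.7, medium=0.3):
    """Map a correlation score in [-1, 1] to a band label.

    The cut-offs are hypothetical defaults, not values from the source.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie between -1 and 1")
    if score >= high:
        return "HIGH"    # production-like
    if score >= medium:
        return "MEDIUM"  # undetermined
    return "LOW"         # non-production-like


def as_percentage(score):
    """Express a score as the percentage shown to the analyst."""
    return f"{score * 100:.0f}%"
```

Keeping the thresholds as parameters reflects the statement that the banding is configurable.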
It can be seen that embodiments of the invention have at least one of the following advantages:
• Rapid identification of all E2E data that can be classified as of type PII.
• Automatic identification of whether this non-production hosted PII data contains production patterns, thus indicating the likelihood of the data actually being unmasked production data.
• A mechanism to further audit the risk through use of sample data.
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.
Finally, it is to be appreciated that various alterations or additions may be made to the parts previously described without departing from the spirit or ambit of the present invention.

Claims

CLAIMS:
1. A method of analysing test data to check for the presence of personally identifiable information including the steps of:
determining a typical distribution of values for at least one field of data in a production environment;
calculating the actual distribution of values for at least one field of the test data;
comparing the typical distribution with the actual distribution; and
providing an indication of the likely presence of personally identifiable information based on the result of the comparison.
2. A method according to claim 1 wherein the step of comparing includes calculating the correlation between the typical distribution and the actual distribution.
3. A method according to either of claim 1 or claim 2 further including the step of scanning the test data for personally identifiable information data types.
4. A method according to claim 3 wherein the data types include any one of First Name, Last Name, Email, Address, Telephone Number, Tax File/Social Security numbers, Driving License, Passport & Credit Cards.
5. A system for analysing test data to check for the presence of personally identifiable information including:
calculating means for calculating the actual distribution of values for at least one field of the test data;
comparing means for comparing a typical distribution of values with the actual distribution; and
display means for providing an indication of the likely presence of personally identifiable information based on the result of the comparison.
6. A system according to claim 5 wherein the comparing means is arranged to calculate a value representing the correlation between the typical distribution and the actual distribution.
7. A software program including instructions which, when carried out by a processor, cause a computing system to operate a method according to any one of claims 1 to 4 or to embody a system according to any one of claims 5 or 6.
8. A computer readable medium which is populated with a software program according to claim 7.
PCT/AU2014/050385 2013-12-10 2014-11-28 A method and system for analysing test data to check for the presence of personally identifiable information Ceased WO2015085358A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2013904792A AU2013904792A0 (en) 2013-12-10 A method and system for analysing test data to check for the presence of personally identifiable information
AU2013904792 2013-12-10

Publications (1)

Publication Number Publication Date
WO2015085358A1 (en) 2015-06-18

Family

ID=53370367

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2014/050385 Ceased WO2015085358A1 (en) 2013-12-10 2014-11-28 A method and system for analysing test data to check for the presence of personally identifiable information

Country Status (1)

Country Link
WO (1) WO2015085358A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040199781A1 (en) * 2001-08-30 2004-10-07 Erickson Lars Carl Data source privacy screening systems and methods
US7269578B2 (en) * 2001-04-10 2007-09-11 Latanya Sweeney Systems and methods for deidentifying entries in a data source
US8069053B2 (en) * 2008-08-13 2011-11-29 Hartford Fire Insurance Company Systems and methods for de-identification of personal data

Similar Documents

Publication Publication Date Title
US12045225B2 (en) Multi-table data validation tool
NL2012438B1 (en) Resolving similar entities from a database.
CA2748425C (en) Batch entity representation identification using field match templates
US8996524B2 (en) Automatically mining patterns for rule based data standardization systems
CN104516882B (en) The method and apparatus for determining the density of infection of SQL statement
US10572480B2 (en) Adaptive intersect query processing
US9336286B2 (en) Graphical record matching process replay for a data quality user interface
Kumar et al. Feature selection techniques to counter class imbalance problem for aging related bug prediction: aging related bug prediction
JP6419667B2 (en) Test DB data generation method and apparatus
CN106126736A (en) Software developer's personalized recommendation method that software-oriented safety bug repairs
KR101742041B1 (en) an apparatus for protecting private information, a method of protecting private information, and a storage medium for storing a program protecting private information
CN103180848B (en) A system and method for replicating data
Masood-Al-Farooq SQL Server 2014 Development Essentials
Hadler et al. An improved version of a tool mark comparison algorithm
WO2015085358A1 (en) A method and system for analysing test data to check for the presence of personally identifiable information
US11250127B2 (en) Binary software composition analysis
US20160104166A1 (en) Computerized account database access tool
CN116860311A (en) Script analysis method, script analysis device, computer equipment and storage medium
CA2748676C (en) Entity representation identification using entity representation level information
CN115658662A (en) A method based on database security quality inspection and automatic rectification
JP2017010376A (en) Mart-less verification support system and mart-less verification support method
US12326954B2 (en) Method and system for identifying data and managing access thereto across multiple data platforms
GB2475796A (en) Identifying an entity representation by constructing a comprehensive search criteria
Grocevs et al. Modern Algorithms to Identify Plagiarism
Davidson et al. Exam Ref 70-762 Developing SQL Databases

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14869514

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14869514

Country of ref document: EP

Kind code of ref document: A1