
CN114117487B - Plaintext similarity estimation method, device, equipment and medium for encrypted character strings - Google Patents


Info

Publication number
CN114117487B
Authority
CN
China
Prior art keywords
data set
estimated
plaintext
estimated distribution
distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111402823.3A
Other languages
Chinese (zh)
Other versions
CN114117487A (en)
Inventor
徐莉莎
陈远猷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Para Software Co ltd
Original Assignee
Shanghai Para Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Para Software Co ltd
Priority to CN202111402823.3A
Publication of CN114117487A
Application granted
Publication of CN114117487B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/08 Probabilistic or stochastic CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiments of the present invention disclose a method, device, equipment and medium for estimating the plaintext similarity of encrypted character strings. The method includes: obtaining a plaintext data set and encrypting it with a preset encryption algorithm to obtain a ciphertext data set; modeling the plaintext data set and the ciphertext data set with multinomial distributions to obtain an estimated distribution corresponding to each; based on a Bayesian statistical model, estimating the distribution corresponding to the decryption function from the estimated distributions of the plaintext and ciphertext data sets; and estimating the plaintext similarity between different target encrypted strings from the estimated distribution corresponding to the decryption function. With this technical solution, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so the correlation among multiple plaintext items can be estimated while the privacy of the plaintext data is protected.

Description

Plaintext similarity estimation method, device, equipment and medium for encrypted character strings
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a plaintext similarity estimation method, device, equipment and medium for encrypted character strings.
Background
With the development of information technology, demand for information security keeps growing, and in information security and data confidentiality applications, data encryption is a basic technique for protecting information.
In the prior art, many methods exist for judging the similarity of character strings. In the data-security setting, however, few methods take encrypted ciphertext data as input and output the similarity of the plaintext data before encryption.
Disclosure of Invention
The embodiments of the invention provide a plaintext similarity estimation method, device, equipment and medium for encrypted character strings, which improve on existing schemes for estimating plaintext data.
In a first aspect, an embodiment of the present invention provides a plaintext similarity estimation method for an encrypted string, including:
obtaining a plaintext data set, and encrypting the plaintext data set with a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set with multinomial distributions, respectively, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the distribution corresponding to the decryption function from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and
estimating the plaintext similarity between different target encrypted character strings from the estimated distribution corresponding to the decryption function, where the target encrypted character strings are encrypted with the preset encryption algorithm.
In a second aspect, an embodiment of the present invention provides a plaintext similarity estimation apparatus for an encrypted string, including:
an encryption operation module, configured to obtain a plaintext data set and encrypt it with a preset encryption algorithm to obtain a ciphertext data set;
an estimated distribution obtaining module, configured to model the plaintext data set and the ciphertext data set with multinomial distributions, respectively, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
an estimated distribution calculation module, configured to estimate, based on a Bayesian statistical model, the distribution corresponding to the decryption function from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and
a plaintext similarity estimation module, configured to estimate the plaintext similarity between different target encrypted character strings from the estimated distribution corresponding to the decryption function, where the target encrypted character strings are encrypted with the preset encryption algorithm.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the plaintext similarity estimation method for encrypted character strings according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the plaintext similarity estimation method for encrypted character strings according to the embodiments of the present invention.
In the plaintext similarity estimation scheme for encrypted character strings provided by the embodiments of the invention, a plaintext data set is first obtained and encrypted with a preset encryption algorithm to obtain a ciphertext data set. The plaintext data set and the ciphertext data set are then modeled with multinomial distributions to obtain their estimated distributions, and a Bayesian statistical model is used to estimate the distribution corresponding to the decryption function from these two estimated distributions. Finally, the plaintext similarity between different target encrypted character strings, which are encrypted with the same preset encryption algorithm, is estimated from the estimated distribution corresponding to the decryption function. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so the association among multiple plaintext items can be estimated while the privacy of the plaintext data is protected.
Drawings
Fig. 1 is a flowchart of a plaintext similarity estimation method for encrypted character strings according to an embodiment of the present invention;
Fig. 2 is a flowchart of another plaintext similarity estimation method for encrypted character strings according to an embodiment of the present invention;
Fig. 3 is a block diagram of a plaintext similarity estimation device for encrypted character strings according to an embodiment of the present invention;
Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example 1
Fig. 1 is a flowchart of a plaintext similarity estimation method for encrypted character strings according to an embodiment of the present invention. The method may be performed by a plaintext similarity estimation device for encrypted character strings, which may be implemented in software and/or hardware and is generally integrated in a computer device such as a server. As shown in Fig. 1, the method includes:
S110, acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set.
Plaintext data are unencrypted character strings. A string may vary in length and/or in its combination of characters, which include at least one of digits, letters, and symbols.
Accordingly, the plaintext data set consists of a preset number of strings of different lengths and/or character combinations. The preset number, for example 10000 or 20000, depends on the training samples the developer needs and is not limited here. The plaintext data set is encrypted to keep the data confidential during transmission.
Correspondingly, the ciphertext data set is obtained by encrypting the strings in the plaintext data set with the preset encryption algorithm. The preset encryption algorithm may be, for example, the Data Encryption Standard (DES), a symmetric algorithm; the International Data Encryption Algorithm (IDEA); or the Digital Signature Algorithm (DSA); it is not limited here.
Correspondingly, the encrypted ciphertext data set contains the same preset number of encrypted character strings.
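Step S110 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the alphabet is assumed to be the 94 printable ASCII characters from `!` to `~`, and the preset encryption algorithm is replaced by a hypothetical toy character-substitution cipher, because the Python standard library provides neither DES nor IDEA. The helper names `make_substitution_table`, `encrypt`, and `generate_plaintext_set` are illustrative only.

```python
import random

# Assumed alphabet: the 94 printable ASCII characters from '!' (33) to '~' (126).
ALPHABET = [chr(code) for code in range(33, 127)]


def make_substitution_table(seed):
    """Hypothetical toy cipher: a random one-to-one permutation of ALPHABET.

    Stand-in for the preset encryption algorithm; it is NOT DES or IDEA.
    """
    rng = random.Random(seed)
    shuffled = ALPHABET[:]
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def encrypt(plaintext, table):
    """Encrypt one string character by character with the substitution table."""
    return "".join(table[ch] for ch in plaintext)


def generate_plaintext_set(count, min_len, max_len, seed):
    """Randomly generate `count` strings of varying lengths and character mixes."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


# S110: plaintext data set x and ciphertext data set y = f(x).
plaintext_set = generate_plaintext_set(count=20000, min_len=4, max_len=16, seed=1)
table = make_substitution_table(seed=42)
ciphertext_set = [encrypt(s, table) for s in plaintext_set]
```

A character-wise substitution produces exactly one ciphertext character per plaintext character, which keeps the later character-level modeling easy to follow; a real block cipher such as DES does not have this property.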
S120, modeling the plaintext data set and the ciphertext data set with multinomial distributions, respectively, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set.
The multinomial distribution is a generalization of the binomial distribution: it is the joint distribution of $k \ge 2$ random variables $X_1, X_2, \ldots, X_k$, each taking finitely many values, which count how often each of $k$ mutually exclusive outcomes occurs in a fixed number of independent trials.
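For reference, the probability mass function of a multinomial distribution with $n = n_1 + \cdots + n_k$ trials and category probabilities $p_1, \ldots, p_k$ is $\frac{n!}{n_1! \cdots n_k!} p_1^{n_1} \cdots p_k^{n_k}$; with $k = 2$ it reduces to the binomial distribution. The short function below evaluates this formula; `multinomial_pmf` is a hypothetical helper name, not part of the described method.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) for a multinomial with n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! ... n_k!); each partial division is exact.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))
```

For two categories, `multinomial_pmf([3, 1], [0.5, 0.5])` equals the binomial probability of 3 successes in 4 fair trials.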
Since the plaintext data in the embodiments of the present invention consist of character strings typed in the conventional way from a finite set of $m$ distinct inputs (digits, letters, and symbols), modeling the plaintext data set with a multinomial distribution gives the following estimated distribution for the plaintext data set:
$$\mathrm{Multinomial}(n_1, n_2, \ldots, n_m, p_1, p_2, \ldots, p_m) \qquad (1)$$
where Multinomial denotes the multinomial distribution, $m$ is the total number of character types (the dimension of the distribution), $n_i$ is the number of occurrences of the $i$-th character type, and $p_i$ is the probability of the $i$-th character type.
Specifically, counting the inputs available on a conventional keyboard gives 94 distinct inputs, so the estimated distribution corresponding to the plaintext data set can be written as:
$$\mathrm{Multinomial}(n_1, n_2, \ldots, n_{94}, p_1, p_2, \ldots, p_{94}) \qquad (2)$$
The 94 input types therefore yield a 94-dimensional distribution with 94 counts $n_i$ and 94 probabilities $p_i$.
Correspondingly, because the ciphertext data set is obtained by encrypting the plaintext data set, modeling it with a multinomial distribution yields an estimated distribution of the same form as that of the plaintext data set.
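Fitting the multinomial model of formulas (1) and (2) amounts to estimating the probabilities $p_i$. A minimal sketch, assuming character counts are pooled over all strings of a data set and the maximum-likelihood estimate $p_i = n_i / N$ is used (the document does not name the estimator); the 94-character alphabet and the name `estimate_char_distribution` are assumptions.

```python
from collections import Counter

# Assumed alphabet: the 94 printable ASCII characters from '!' to '~'.
ALPHABET = [chr(code) for code in range(33, 127)]


def estimate_char_distribution(strings, alphabet=ALPHABET):
    """Maximum-likelihood multinomial estimate p_i = n_i / N over the alphabet."""
    counts = Counter(ch for s in strings for ch in s)
    total = sum(counts[ch] for ch in alphabet)
    if total == 0:
        raise ValueError("no characters from the alphabet were observed")
    return {ch: counts[ch] / total for ch in alphabet}
```

Applied to the plaintext set and the ciphertext set, it yields the two estimated distributions. Under a character-substitution cipher, the ciphertext distribution is a permutation of the plaintext one, which matches the statement that both have the same form.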
S130, based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.
The decryption function is a way of decrypting the encrypted ciphertext data set, and can be understood as an inverse function of the encryption algorithm.
When determining the estimated distribution corresponding to the decryption function, the $m$ distinct inputs from step S120 must be modeled. The $m$ inputs correspond to $m$ dimensions, and the estimated distribution corresponding to the decryption function may be determined using the law of large numbers and the multivariate normal distribution, as follows:
$$\mathcal{N}(\mu_m, \Sigma_m) \qquad (3)$$
where $\mu_m$ is the mean vector of the decryption function over all $m$ dimensions and $\Sigma_m$ is its covariance matrix over all $m$ dimensions.
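For reference, when $\Sigma_m$ is positive definite, the probability density of the $m$-dimensional normal distribution $\mathcal{N}(\mu_m, \Sigma_m)$ in formula (3) is the standard expression below; it is shown only to make the roles of $\mu_m$ and $\Sigma_m$ concrete.

$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{m/2}\,\lvert\Sigma_m\rvert^{1/2}}
\exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu_m)^{\top}\Sigma_m^{-1}(\mathbf{x}-\mu_m)\right),
\qquad \mathbf{x} \in \mathbb{R}^{m}
$$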
For the 94 distinct inputs obtained above, the estimated distribution corresponding to the decryption function can be written as:
$$\mathcal{N}(\mu_{94}, \Sigma_{94}) \qquad (4)$$
Further, the estimated distributions corresponding to the plaintext data set and the ciphertext data set are used as inputs to a Bayesian statistical model, which outputs the estimated distribution corresponding to the decryption function.
Given this form of the decryption function's distribution, the Bayesian statistical model is mainly used to estimate the distribution of the mean vector parameter $\mu$ and the distribution of the covariance matrix parameter $\Sigma$ for the current dimension.
S140, estimating the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function.
In the embodiment of the invention, a target encrypted character string is an encrypted string whose plaintext similarity is to be estimated; there may be two or more of them, and they are encrypted with the preset encryption algorithm.
After the estimated distribution corresponding to the decryption function is determined, each target encrypted string can be decrypted according to this distribution to obtain its estimated plaintext string. The similarity between any two estimated plaintext strings can then be calculated, which gives an estimate of the plaintext similarity between the corresponding target encrypted strings.
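Because there may be more than two target strings, the final part of S140 compares every pair of estimated plaintexts. A minimal sketch, assuming the estimated plaintexts are already available as Python strings and using the standard-library `difflib.SequenceMatcher.ratio()` as one possible string similarity (an assumption; Example 2 lists other measures). `pairwise_plaintext_similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_plaintext_similarity(estimated_plaintexts):
    """Return {(i, j): similarity in [0, 1]} for every pair i < j of estimated plaintexts."""
    return {
        (i, j): SequenceMatcher(None, a, b).ratio()
        for (i, a), (j, b) in combinations(enumerate(estimated_plaintexts), 2)
    }
```

For example, `pairwise_plaintext_similarity(["abc", "abc", "xyz"])` scores the identical pair `(0, 1)` as 1.0 and the disjoint pair `(0, 2)` as 0.0.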
The plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention first obtains a plaintext data set and encrypts it with a preset encryption algorithm to obtain a ciphertext data set; then models the plaintext data set and the ciphertext data set with multinomial distributions to obtain their estimated distributions; next, based on a Bayesian statistical model, estimates the distribution corresponding to the decryption function from these two estimated distributions; and finally estimates the plaintext similarity between different target encrypted character strings, which are encrypted with the preset encryption algorithm, from the estimated distribution corresponding to the decryption function. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so the association among multiple plaintext items can be estimated while the privacy of the plaintext data is protected.
Example 2
This embodiment further refines the above embodiment. The step of estimating, based on a Bayesian statistical model, the distribution corresponding to the decryption function from the estimated distributions of the plaintext and ciphertext data sets includes: converting the estimated distribution corresponding to the plaintext data set into the posterior distribution of the Bayesian statistical model; converting the estimated distribution corresponding to the ciphertext data set into the prior distribution of the Bayesian statistical model; and estimating the likelihood distribution of the Bayesian statistical model from the posterior and prior distributions, where this likelihood distribution is the estimated distribution corresponding to the decryption function. The advantage of this arrangement is that the decryption-function problem becomes a Bayesian statistical modeling problem, which is convenient to compute.
The step of estimating the plaintext similarity between different target encrypted strings from the estimated distribution corresponding to the decryption function is also refined. It includes: selecting a first target encrypted string and a second target encrypted string; inputting each of them into the estimated distribution corresponding to the decryption function to obtain first estimated plaintext data for the first string and second estimated plaintext data for the second string; and calculating the similarity between the first and second estimated plaintext data to obtain the plaintext similarity of the two target encrypted strings. The advantage is that, once the estimated distribution of the decryption function is available, estimated plaintext data for the encrypted strings can be obtained and the plaintext similarity predicted, while the data remain secure during transmission.
Referring to fig. 2, fig. 2 is a flowchart of another method for estimating plaintext similarity of an encrypted string according to an embodiment of the present invention, and the method specifically includes the following steps:
S210, acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set.
If the plaintext data set is denoted $x$, the encryption algorithm $f(\cdot)$, and the ciphertext data set $y$, then $y = f(x)$.
To estimate plaintext similarity from the ciphertext of encrypted strings as provided by the embodiment of the invention, a decryption function $n(\cdot)$ for the encryption algorithm $f(\cdot)$ must be found, where $n(\cdot)$ is a generalized inverse of $f(\cdot)$. In general, the decryption function found only needs to produce decryption results that do not distort the similarity estimate of the decrypted characters; it need not recover the exact plaintext.
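The toy illustration below, built on hypothetical substitution tables, shows why a generalized inverse can be enough: if $n(f(\cdot))$ relabels each character one-to-one instead of restoring it, a position-wise similarity between two strings is unchanged even though $n(f(x)) \ne x$. The tables, `apply_table`, and the `position_match_similarity` measure are assumptions chosen for illustration only.

```python
def position_match_similarity(a, b):
    """Share of aligned positions with equal characters (equal-length strings only)."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs strings of equal length")
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def apply_table(table, text):
    """Apply a character-to-character table to every character of text."""
    return "".join(table[ch] for ch in text)


# Hypothetical toy encryption f: a character substitution.
f_table = {"a": "q", "b": "w", "c": "e"}
# Hypothetical generalized inverse n: maps ciphertext to a different but one-to-one
# alphabet, so n(f(x)) != x while character identities are preserved.
n_table = {"q": "1", "w": "2", "e": "3"}

x1, x2 = "abca", "abcb"
d1 = apply_table(n_table, apply_table(f_table, x1))  # "1231"
d2 = apply_table(n_table, apply_table(f_table, x2))  # "1232"
```

Both the original pair and the decoded pair agree in 3 of 4 positions, so the similarity estimate survives the imperfect inversion.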
To obtain the plaintext data set $x$, a number of strings with different lengths and character combinations can be generated at random; for example, 20000 strings are selected as the plaintext data set.
Each plaintext string in the plaintext data set is then encrypted with the preset encryption algorithm $f(\cdot)$ to obtain the corresponding ciphertext data set $y$.
S220, modeling the plaintext data set and the ciphertext data set with multinomial distributions, respectively, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set.
Further, the plaintext data set is modeled with a multinomial distribution to obtain its estimated distribution, denoted $X$, and the ciphertext data set is likewise modeled with a multinomial distribution to obtain its estimated distribution, denoted $Y$; from S210, $Y = f(X)$.
S230, converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model.
The plaintext similarity estimation method for encrypted strings provided by the embodiment of the invention converts the problem of calculating the decryption function into a Bayesian statistical modeling problem. With $X$ denoting the estimated distribution corresponding to the plaintext data set and $Y$ that corresponding to the ciphertext data set, $X$ is treated as the posterior distribution of the Bayesian statistical model; a posterior distribution can be understood as the probability distribution of a cause inferred after the result is known. $Y$ is treated as the prior distribution of the Bayesian statistical model; a prior distribution can be understood as the probability distribution of a cause determined before the result is observed.
S240, estimating likelihood distribution of the Bayesian statistical model based on posterior distribution and prior distribution, wherein the likelihood distribution of the Bayesian statistical model is estimated distribution corresponding to the decryption function.
Further, the estimated distribution corresponding to the decryption function is denoted $Q$. The estimated distribution $X$ corresponding to the plaintext data set, the distribution $Y$ corresponding to the ciphertext data set, and the estimated distribution $Q$ corresponding to the decryption function satisfy:
$$X \propto Q\,Y \qquad (5)$$
I.e. the distribution X corresponding to the plaintext data set is proportional to the product of the distribution Y corresponding to the ciphertext data set and the estimated distribution Q corresponding to the decryption function.
In formula (5), the estimated distribution corresponding to the decryption function can be regarded as the likelihood distribution of the Bayesian statistical model; a likelihood distribution can be understood as the probability distribution of a result given a cause that has already been determined.
Further, before the likelihood distribution of the Bayesian statistical model is estimated from the posterior and prior distributions, the parameters of the estimated distribution corresponding to the decryption function are estimated using the law of large numbers and the multivariate normal distribution.
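For comparison, Bayes' rule for a parameter $\theta$ and observed data $D$ is shown below. Formula (5) has the same "posterior is proportional to likelihood times prior" shape; following the document's assignment, $X$ plays the role of the posterior $p(\theta \mid D)$, $Q$ the likelihood $p(D \mid \theta)$, and $Y$ the prior $p(\theta)$.

$$
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \;\propto\; p(D \mid \theta)\, p(\theta)
$$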
The estimated distribution corresponding to the decryption function depends on the input dimension of the plaintext data set. When the input dimension is 94, the estimated distribution corresponding to the decryption function, determined with the law of large numbers and the multivariate normal distribution, can be written as:
$$\mathcal{N}(\mu_{94}, \Sigma_{94})$$
where $\mu_{94}$ is the mean vector of the decryption function over all dimensions and $\Sigma_{94}$ is its covariance matrix over all dimensions.
Therefore, before the estimated distribution corresponding to the decryption function is determined, its parameters $\mu_{94}$ and $\Sigma_{94}$ must also be determined.
Specifically, a multivariate normal distribution is chosen as the estimated distribution corresponding to the decryption function, and its parameters $\mu_{94}$ and $\Sigma_{94}$ are estimated with the transformed Bayesian statistical model, that is, from $X$ and $Y$, so as to determine the estimated distribution $Q$ corresponding to the decryption function.
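By the law of large numbers, sample averages converge to their expectations, so the sample mean vector and sample covariance matrix are natural estimates of $\mu_{94}$ and $\Sigma_{94}$. The sketch below assumes each observation has already been turned into a numeric vector of length $m$ (for example, 94); the document does not say how strings are mapped to vectors, and the function names are hypothetical. It shows plain moment estimation only, not the document's Bayesian estimation from $X$ and $Y$.

```python
def sample_mean(vectors):
    """Sample mean vector; by the law of large numbers it converges to mu."""
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one observation")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]


def sample_covariance(vectors):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = sample_mean(vectors)
    dim = len(mu)
    centered = [[v[k] - mu[k] for k in range(dim)] for v in vectors]
    return [
        [sum(c[r] * c[s] for c in centered) / (n - 1) for s in range(dim)]
        for r in range(dim)
    ]
```

With many observations, these estimates approach the true mean vector and covariance matrix; the pure-Python loops favor readability over speed.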
Further, the parameters of the estimated distribution corresponding to the decryption function may be estimated using the law of large numbers and the multivariate normal distribution.
The law of large numbers (Law of Large Numbers) states that the arithmetic mean of a sequence of random variables converges to the arithmetic mean of their expected values. The multivariate normal distribution (Multivariate normal distribution) generalizes the one-dimensional normal distribution to multiple dimensions; for a normally distributed variable, once the mean and standard deviation are known, the proportion of values falling in any range can be calculated from a formula. The embodiment of the invention uses the law of large numbers and the multivariate normal distribution to solve for the parameters of the decryption function.
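Both facts can be checked numerically with the Python standard library. The sketch is illustrative only: it uses uniform draws for the law of large numbers and `statistics.NormalDist` (Python 3.8+) for the one-dimensional normal case; the variable names are hypothetical.

```python
import random
from statistics import NormalDist

# Law of large numbers: the mean of many i.i.d. draws approaches the expectation.
rng = random.Random(0)
draws = [rng.random() for _ in range(100_000)]  # Uniform(0, 1), expectation 0.5
running_mean = sum(draws) / len(draws)

# Normal distribution: with mean and standard deviation known, the share of
# values in any interval follows from the cumulative distribution function.
dist = NormalDist(mu=0.0, sigma=1.0)
share_within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)
```

Here `running_mean` lands very close to 0.5, and `share_within_one_sigma` is about 0.683, the familiar share of a normal population within one standard deviation of the mean.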
Accordingly, in step S240, the likelihood distribution of the Bayesian statistical model, which is the estimated distribution corresponding to the decryption function, may be estimated from the posterior distribution $X$ (the estimated distribution corresponding to the plaintext data set), the prior distribution $Y$ (the estimated distribution corresponding to the ciphertext data set), and the parameters $\mu_{94}$ and $\Sigma_{94}$ of the estimated distribution corresponding to the decryption function.
S250, selecting a first target encrypted character string and a second target encrypted character string.
The first and second target encrypted character strings are ciphertext strings produced by the encryption algorithm, and the plaintext similarity between them is to be estimated.
S260, inputting the first and second target encrypted character strings into the estimated distribution corresponding to the decryption function, respectively, to obtain first estimated plaintext data corresponding to the first target encrypted character string and second estimated plaintext data corresponding to the second target encrypted character string.
Using the estimated distribution Q corresponding to the decryption function, denote the first target encrypted string as y_new1 and the second target encrypted string as y_new2. Inputting y_new1 and y_new2 into Q yields the first estimated plaintext data corresponding to y_new1, denoted x_new1, and the second estimated plaintext data corresponding to y_new2, denoted x_new2.
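The document does not say how a ciphertext is "input into" the distribution Q. One possible reading, sketched below purely as an assumption, is to fit a joint multivariate normal over paired plaintext feature vectors x and ciphertext feature vectors y, then take the conditional mean E[x | y = y_new] = μ_x + Σ_xy Σ_yy⁻¹ (y_new - μ_y) as the estimated plaintext data x_new. How strings become numeric feature vectors is left open, and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _mean(rows: List[Vector]) -> Vector:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _cross_cov(a_rows: List[Vector], b_rows: List[Vector]) -> Matrix:
    """Sample cross-covariance Cov(a, b) with denominator n - 1."""
    n = len(a_rows)
    ma, mb = _mean(a_rows), _mean(b_rows)
    return [
        [sum((a[i] - ma[i]) * (b[j] - mb[j]) for a, b in zip(a_rows, b_rows)) / (n - 1)
         for j in range(len(mb))]
        for i in range(len(ma))
    ]


def _solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def conditional_mean(x_rows: List[Vector], y_rows: List[Vector], y_new: Vector) -> Vector:
    """Estimated plaintext features E[x | y = y_new] under a jointly normal (x, y) model."""
    mx, my = _mean(x_rows), _mean(y_rows)
    s_yy = _cross_cov(y_rows, y_rows)
    s_xy = _cross_cov(x_rows, y_rows)
    z = _solve(s_yy, [v - m for v, m in zip(y_new, my)])  # z = S_yy^{-1} (y_new - mu_y)
    return [mx[i] + sum(s_xy[i][j] * z[j] for j in range(len(z))) for i in range(len(mx))]
```

Applying `conditional_mean` to y_new1 and y_new2 would give the two vectors x_new1 and x_new2 that the next step compares. If plaintext features depend linearly on ciphertext features in the training pairs, this estimate recovers that linear map exactly.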
S270, similarity calculation is carried out on the first estimated plaintext data and the second estimated plaintext data, and plaintext similarity corresponding to the first target encrypted character string and the second target encrypted character string is obtained.
The similarity between the first estimated plaintext data and the second estimated plaintext data may be computed, for example, as the cosine similarity, the Euclidean distance, or the Mahalanobis distance; the specific calculation method is not limited herein.
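The three measures named above can be sketched in plain Python as follows. The helper names are illustrative, and the Mahalanobis version assumes the caller supplies a symmetric positive-definite covariance matrix, for instance an estimate of Σ_m. Note the opposite orientation: a higher cosine similarity means more alike, while a smaller distance means more alike.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 1.0 means the same direction."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance; 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _solve(A: Matrix, b: Vector) -> Vector:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def mahalanobis_distance(a: Vector, b: Vector, cov: Matrix) -> float:
    """sqrt(d^T cov^{-1} d) with d = a - b; cov must be symmetric positive definite."""
    d = [x - y for x, y in zip(a, b)]
    z = _solve(cov, d)  # z = cov^{-1} d, computed without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; a covariance estimate rescales each direction by how much the estimated plaintext data naturally varies along it.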
With this plaintext similarity estimation method for encrypted character strings, when the similarity of the underlying plaintexts is estimated from ciphertext data, the association between pieces of plaintext data is obtained while the private or plaintext data remains protected, which serves the field of data security. Moreover, the calculation method is a generalized decryption model: because the decryption problem is converted into a Bayesian statistical model, the method is in principle applicable to various encryption algorithms and does not require exact decryption under the encryption algorithm, which greatly reduces the decryption cost while maintaining a certain level of accuracy.
Example III
Fig. 3 is a block diagram of a plaintext similarity estimation apparatus for encrypted character strings according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, may generally be integrated in a computer device such as a server, and estimates the plaintext similarity of encrypted character strings by executing the plaintext similarity estimation method for encrypted character strings. As shown in Fig. 3, the apparatus includes an encryption operation module 31, an estimated distribution obtaining module 32, an estimated distribution calculation module 33, and a plaintext similarity estimation module 34, wherein:
The encryption operation module 31 is configured to obtain a plaintext data set, and perform encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
The estimated distribution obtaining module 32 is configured to model the plaintext data set and the ciphertext data set, respectively, based on multinomial distributions, so as to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
The estimated distribution calculation module 33 is configured to estimate, based on a bayesian statistical model, an estimated distribution corresponding to a decryption function according to an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
The plaintext similarity estimation module 34 is configured to estimate the plaintext similarity between different target encrypted strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encrypted strings is the preset encryption algorithm.
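To make the division of labor among modules 31 to 34 concrete, the sketch below wires them together as injected callables. The class name, the field names, and the choice to featurize ciphertexts with the same function as plaintexts are hypothetical; any concrete encryption, feature, decoding, or similarity function could be supplied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Decoder = Callable[[Vector], Vector]


@dataclass
class PlaintextSimilarityEstimator:
    """Hypothetical wiring of modules 31-34; every field is a caller-supplied stand-in."""

    encrypt: Callable[[str], str]                                 # module 31: preset encryption algorithm
    featurize: Callable[[str], Vector]                            # module 32: string -> model features
    fit_decoder: Callable[[List[Vector], List[Vector]], Decoder]  # module 33: (plain, cipher) -> decoder
    similarity: Callable[[Vector, Vector], float]                 # module 34: compare decoded vectors
    _decode: Optional[Decoder] = field(default=None, init=False, repr=False)

    def train(self, plaintexts: Sequence[str]) -> None:
        """Encrypt the plaintext data set, featurize both sides, and fit the decryption model."""
        ciphertexts = [self.encrypt(p) for p in plaintexts]
        x = [self.featurize(p) for p in plaintexts]
        y = [self.featurize(c) for c in ciphertexts]
        self._decode = self.fit_decoder(x, y)

    def compare(self, cipher_a: str, cipher_b: str) -> float:
        """Decode two target ciphertexts into estimated plaintext features and score them."""
        if self._decode is None:
            raise RuntimeError("call train() before compare()")
        x_a = self._decode(self.featurize(cipher_a))
        x_b = self._decode(self.featurize(cipher_b))
        return self.similarity(x_a, x_b)
```

For example, with string reversal as a toy stand-in for encryption (illustration only, not secure), string length as the single feature, an identity decoder, and negative absolute difference as the similarity, `compare("abc", "abcd")` returns -1.0.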
In the plaintext similarity estimation device for encrypted character strings provided by this embodiment, a plaintext data set is first obtained and encrypted with a preset encryption algorithm to obtain a ciphertext data set. The plaintext data set and the ciphertext data set are then each modeled based on multinomial distributions to obtain their respective estimated distributions. Based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set. Finally, the plaintext similarity between different target encrypted character strings, which are encrypted with the same preset encryption algorithm, is estimated according to the estimated distribution corresponding to the decryption function. With this technical solution, the similarity of the plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the association among multiple pieces of plaintext data can be estimated while the privacy of the plaintext data is protected.
Optionally, the plaintext data set includes a predetermined number of strings of different lengths and/or different character combinations, the combinations including at least one of digits, letters, and symbols.
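A plaintext data set of this kind could be generated, for example, as in the sketch below. The length range, the use of Python's `string.punctuation` as the symbol pool, and the fixed seed are illustrative assumptions; the predetermined number of strings is the `count` argument. The subsequent encryption with the preset algorithm is not shown.

```python
import random
import string
from typing import List

# Character pools for the three combination types named in the text.
CHARSETS = {
    "digit": string.digits,
    "letter": string.ascii_letters,
    "symbol": string.punctuation,
}


def build_plaintext_dataset(count: int, min_len: int = 4, max_len: int = 16, seed: int = 0) -> List[str]:
    """Generate `count` strings of varying length, each drawn from a random non-empty mix of character types."""
    rng = random.Random(seed)  # fixed seed -> reproducible data set
    kinds = list(CHARSETS)
    dataset = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)
        chosen = rng.sample(kinds, rng.randint(1, len(kinds)))
        alphabet = "".join(CHARSETS[kind] for kind in chosen)
        dataset.append("".join(rng.choice(alphabet) for _ in range(length)))
    return dataset
```

Varying both the length and the character mix gives the multinomial models in the next step a spread of character-type counts to fit.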
Optionally, the estimated distribution calculation module 33 includes an estimated distribution conversion unit and an estimated distribution calculation unit, wherein:
The estimated distribution conversion unit is used for converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to a Bayesian statistical model and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model;
The estimated distribution calculation unit is used for estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution calculation unit further comprises a parameter estimation subunit;
The parameter estimation subunit is used for estimating the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and the multivariate normal distribution.
Correspondingly, the estimated distribution calculation unit is further used for estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution, and the parameters of the estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:
$$\mathrm{Multinomial}(n_1, n_2, \ldots, n_m, p_1, p_2, \ldots, p_m)$$
where Multinomial denotes the multinomial distribution, m denotes the total number of dimensions (character types), n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
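A minimal sketch of multinomial modeling over character types is shown below. It uses the standard parameterization, in which n_i are the observed counts of each category and p_i are category probabilities estimated by maximum likelihood as p_i = n_i / N. The three character types (digits, letters, symbols) follow the combinations listed for the plaintext data set; treating every other character as a symbol is an assumption.

```python
import math
import string
from collections import Counter
from typing import List, Sequence

TYPES = ("digit", "letter", "symbol")  # m = 3 character types (an assumption)


def char_type(ch: str) -> str:
    """Map a character to a type; anything that is not an ASCII digit or letter counts as a symbol."""
    if ch in string.digits:
        return "digit"
    if ch in string.ascii_letters:
        return "letter"
    return "symbol"


def type_counts(s: str) -> List[int]:
    """Counts n_1..n_m of each character type in one string."""
    c = Counter(char_type(ch) for ch in s)
    return [c[t] for t in TYPES]


def fit_probabilities(strings: Sequence[str]) -> List[float]:
    """Maximum-likelihood estimate p_i = (total count of type i) / (total number of characters)."""
    totals = [0] * len(TYPES)
    for s in strings:
        for i, n in enumerate(type_counts(s)):
            totals[i] += n
    total = sum(totals)
    return [t / total for t in totals]


def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    """P(counts) = N! / (n_1! ... n_m!) * p_1^n_1 * ... * p_m^n_m."""
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)  # each partial quotient is an exact integer
    return coef * math.prod(p ** k for p, k in zip(probs, counts))
```

Fitting `fit_probabilities` separately on the plaintext strings and on the ciphertext strings would give one estimated distribution per data set.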
Optionally, the estimated distribution corresponding to the decryption function is represented by the following expression:
$$N(\mu_m, \Sigma_m)$$
where $\mu_m$ denotes the mean vector corresponding to the decryption function over the total dimension m, and $\Sigma_m$ denotes the corresponding covariance (variance) matrix over the total dimension.
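To make N(μ_m, Σ_m) concrete, the sketch below draws samples from a multivariate normal with a given mean vector and covariance matrix via Cholesky factorization (x = μ + L z with L Lᵀ = Σ and z standard normal). The function names are illustrative, and the example parameters are not taken from the patent.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky(sigma: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = sigma; sigma must be symmetric positive definite."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                diag = sigma[i][i] - s
                if diag <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(diag)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_mvn(mu: Vector, sigma: Matrix, n: int, rng: random.Random) -> List[Vector]:
    """Draw n samples from N(mu, sigma) as mu + L z with z ~ N(0, I)."""
    L = cholesky(sigma)
    d = len(mu)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        draws.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return draws
```

Because L is lower triangular, each output coordinate only needs the first i + 1 standard-normal draws, and the sample covariance of many draws approaches sigma.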
Optionally, the plaintext similarity estimation module 34 includes a target encrypted string selection unit, a target encrypted string input unit, and a plaintext similarity estimation unit, where:
The target encryption character string selection unit is used for selecting a first target encryption character string and a second target encryption character string;
The target encryption character string input unit is used for inputting the first target encryption character string and the second target encryption character string into the estimated distribution corresponding to the decryption function respectively to obtain first estimated plaintext data corresponding to the first target encryption character string and second estimated plaintext data corresponding to the second target encryption character string;
The plaintext similarity estimation unit is used for performing similarity calculation on the first estimated plaintext data and the second estimated plaintext data to obtain plaintext similarity corresponding to the first target encrypted character string and the second target encrypted character string.
The plaintext similarity estimating device for the encrypted character string provided by the embodiment of the invention can be used for executing the plaintext similarity estimating method for the encrypted character string provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
Example IV
The embodiment of the invention provides a computer device, and the plaintext similarity estimating device of the encrypted character string provided by the embodiment of the invention can be integrated in the computer device. Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 400 may include a memory 401, a processor 402 and a computer program stored in the memory 401 and executable by the processor, wherein the processor 402 implements the plaintext similarity estimation method for an encrypted string according to an embodiment of the present invention when executing the computer program.
The computer equipment provided by the embodiment of the invention can execute the plaintext similarity estimation method of the encrypted character string provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
Example V
The embodiment of the invention also provides a storage medium containing computer executable instructions, which when executed by a computer processor, are used for executing a plaintext similarity estimation method for an encrypted character string, the method comprising:
acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set, respectively, based on multinomial distributions to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and estimating the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include installation media such as CD-ROMs, floppy disks, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, and Rambus RAM; non-volatile memory such as flash memory and magnetic media (e.g., a hard disk) or optical storage; registers or other similar types of memory elements; and so on. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a second, different computer system connected to the first computer system through a network such as the Internet; in the latter case, the second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media residing in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the plaintext similarity estimation operation of the encrypted string described above, and may also perform the related operations in the plaintext similarity estimation method of the encrypted string provided in any embodiment of the present invention.
The plaintext similarity estimation device, computer device, and storage medium for encrypted character strings provided by the above embodiments can execute the plaintext similarity estimation method for encrypted character strings provided by any embodiment of the invention, and have the corresponding functional modules and beneficial effects. For technical details not described in detail in the above embodiments, reference may be made to the plaintext similarity estimation method for encrypted character strings provided by any embodiment of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (8)

1. A plaintext similarity estimation method for an encrypted character string is characterized by comprising the following steps:
acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set, respectively, based on multinomial distributions to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
Estimating plaintext similarity between different target encryption character strings according to estimated distribution corresponding to the decryption function, wherein an encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm;
The estimating, based on the bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set includes:
Converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model;
estimating parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and the multivariate normal distribution;
and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution, and the parameters of the estimated distribution corresponding to the decryption function, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.
2. The method of claim 1, wherein the plaintext data set comprises a predetermined number of strings of different lengths and/or different combinations, the combinations comprising at least one of numbers, letters, and symbols.
3. The method of claim 1, wherein the estimated distribution for the plaintext data set and the estimated distribution for the ciphertext data set are represented by the following expressions:
$$\mathrm{Multinomial}(n_1, n_2, \ldots, n_m, p_1, p_2, \ldots, p_m)$$
where Multinomial denotes the multinomial distribution, m denotes the total number of dimensions (character types), n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
4. The method of claim 1, wherein the estimated distribution corresponding to the decryption function is represented by the following expression:
$$N(\mu_m, \Sigma_m)$$
where $\mu_m$ denotes the mean vector corresponding to the decryption function over the total dimension m, and $\Sigma_m$ denotes the corresponding covariance (variance) matrix over the total dimension.
5. The method according to claim 1, wherein the estimating plaintext similarity between different target encrypted strings according to the estimated distribution corresponding to the decryption function comprises:
selecting a first target encryption character string and a second target encryption character string;
Inputting the first target encryption character string and the second target encryption character string into the estimated distribution corresponding to the decryption function respectively to obtain first estimated plaintext data corresponding to the first target encryption character string and second estimated plaintext data corresponding to the second target encryption character string;
And performing similarity calculation on the first estimated plaintext data and the second estimated plaintext data to obtain plaintext similarity corresponding to the first target encryption string and the second target encryption string.
6. A plaintext similarity estimation apparatus for an encrypted string, comprising:
The encryption operation module is used for acquiring a plaintext data set, and carrying out encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
The estimated distribution obtaining module is used for modeling the plaintext data set and the ciphertext data set, respectively, based on multinomial distributions to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
The estimated distribution calculation module is used for estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set based on a Bayesian statistical model;
The plaintext similarity estimation module is used for estimating plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encrypted character string is the preset encryption algorithm;
The estimated distribution calculation module comprises an estimated distribution conversion unit, a parameter estimation subunit and an estimated distribution calculation unit;
The estimated distribution conversion unit is used for converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to a Bayesian statistical model and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model;
a parameter estimation subunit, configured to estimate parameters of the estimated distribution corresponding to the decryption function using a law of large numbers and a multivariate normal distribution;
The estimated distribution calculation unit is used for estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution, and the parameters of the estimated distribution corresponding to the decryption function.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
CN202111402823.3A 2021-11-24 2021-11-24 Plaintext similarity estimation method, device, equipment and medium for encrypted character strings Active CN114117487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111402823.3A CN114117487B (en) 2021-11-24 2021-11-24 Plaintext similarity estimation method, device, equipment and medium for encrypted character strings

Publications (2)

Publication Number Publication Date
CN114117487A CN114117487A (en) 2022-03-01
CN114117487B true CN114117487B (en) 2025-06-10

Family

ID=80371774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111402823.3A Active CN114117487B (en) 2021-11-24 2021-11-24 Plaintext similarity estimation method, device, equipment and medium for encrypted character strings

Country Status (1)

Country Link
CN (1) CN114117487B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062299B (en) * 2022-07-26 2022-11-01 华控清交信息科技(北京)有限公司 Security detection method and device for data leakage and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108600573A (en) * 2018-03-13 2018-09-28 上海大学 Ciphertext jpeg image search method based on tree-like BoW models
CN110011784A (en) * 2019-04-04 2019-07-12 东北大学 KNN classification service system and method supporting privacy protection

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US7636651B2 (en) * 2003-11-28 2009-12-22 Microsoft Corporation Robust Bayesian mixture modeling
CN111464282B (en) * 2019-01-18 2024-04-26 百度在线网络技术(北京)有限公司 Homomorphic encryption-based data processing method and device
CN110032642B (en) * 2019-03-26 2022-02-11 广东工业大学 Modeling method of manifold topic model based on word embedding
CN110727951B (en) * 2019-10-14 2021-08-27 桂林电子科技大学 Lightweight outsourcing file multi-keyword retrieval method and system with privacy protection function
CN111447059B (en) * 2020-03-30 2023-04-28 南阳理工学院 Ciphertext equivalent test method, ciphertext equivalent test device, electronic equipment, storage medium and ciphertext equivalent test system
CN112016120B (en) * 2020-08-26 2024-03-26 支付宝(杭州)信息技术有限公司 Event prediction method and device based on user privacy protection
CN112887089B (en) * 2021-01-25 2022-08-12 华南农业大学 Ciphertext similarity calculation method, device, system and storage medium
CN113420307B (en) * 2021-06-28 2023-03-28 未鲲(上海)科技服务有限公司 Ciphertext data evaluation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114117487A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
US11816226B2 (en) Secure data processing transactions
US11277257B2 (en) Method and apparatus for performing operation using encrypted data
CN111291401B (en) Privacy protection-based business prediction model training method and device
Sarkar et al. Fast and scalable private genotype imputation using machine learning and partially homomorphic encryption
CN114696990B (en) Multi-party computing method, system and related equipment based on fully homomorphic encryption
US20210089887A1 (en) Variance-Based Learning Rate Control For Training Machine-Learning Models
US11030246B2 (en) Fast and accurate graphlet estimation
EP3863003B1 (en) Hidden sigmoid function calculation system, hidden logistic regression calculation system, hidden sigmoid function calculation device, hidden logistic regression calculation device, hidden sigmoid function calculation method, hidden logistic regression calculation method, and program
GB2594453A (en) Methods and systems for training a machine learning model
KR20240004830A (en) Blind rotation for use in fully homomorphic encryption
CN111126628B (en) Method, device and equipment for training GBDT model in trusted execution environment
US20240135024A1 (en) Method and system for data communication with differentially private set intersection
CN114117487B (en) Plaintext similarity estimation method, device, equipment and medium for encrypted character strings
US12368569B2 (en) Apparatus and method with homomorphic encryption
JP7736246B2 (en) Homomorphic encryption-based ciphertext processing method and device
US20220416995A1 (en) Accelerated division of homomorphically encrypted data
US20240176883A1 (en) Verifiable computing using computation fingerprint within fully homomorphic encryption (fhe)
Ziegeldorf et al. SHIELD: A framework for efficient and secure machine learning classification in constrained environments
Zhang et al. Efficient Homomorphic Approximation of Max Pooling for Privacy-Preserving Deep Learning
US20250192983A1 (en) Privacy-preserving and non-interactive training of regression trees
US20250227464A1 (en) Method for implementing private set intersection protocol using oblivious pseudo-random function based on minicrypt, and terminal device using same
CN120146158B (en) Quantum-resistant robust parameter aggregation federated learning method and device for genomics
CN119862419B (en) Negative sample generation method and device for knowledge graph link prediction task of recommendation system
US20230401036A1 (en) Methods and systems for quantum computing
Yang et al. Penalized LAD regression for single-index models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant