Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example 1
Fig. 1 is a flowchart of a plaintext similarity estimation method for encrypted character strings according to an embodiment of the present invention. The method may be performed by a plaintext similarity estimation apparatus for encrypted character strings; the apparatus may be implemented in software and/or hardware and may generally be integrated in a computer device such as a server. As shown in Fig. 1, the method includes:
S110, acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set.
Plaintext data is an unencrypted character string. The character strings may differ in length and/or in character combination, where a combination includes at least one of digits, letters, and symbols.
Accordingly, the plaintext data set consists of a preset number of character strings of different lengths and/or different character combinations. The preset number may be, for example, 10000 or 20000; it is determined by the number of training samples the developer requires and is not limited herein. The purpose of encrypting the plaintext data set is to ensure confidentiality of the data during transmission.
Correspondingly, the ciphertext data set is obtained by encrypting the character strings in the plaintext data set with the preset encryption algorithm. The preset encryption algorithm may be the Data Encryption Standard (DES, a symmetric algorithm), the International Data Encryption Algorithm (IDEA), the Digital Signature Algorithm (DSA), or the like, which is not limited herein.
Correspondingly, the ciphertext data set obtained by the encryption operation contains the same preset number of encrypted character strings.
S120, modeling the plaintext data set and the ciphertext data set respectively based on the multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set.
The multinomial distribution (Multinomial Distribution) is a generalization of the binomial distribution; it is the joint distribution of k random variables X_1, X_2, …, X_k (where k ≥ 2), each taking a finite number of possible values.
Since the plaintext data provided by the embodiment of the present invention consists of character strings, and a conventional input method offers a finite number m of different input characters (including digits, letters, and symbols), the estimated distribution corresponding to the plaintext data set, when modeled using the multinomial distribution, can be represented by the following expression:
Multinomial(n_1, n_2, …, n_m, p_1, p_2, …, p_m) (1)
where Multinomial denotes the multinomial distribution, m denotes the total dimension of the character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
Specifically, counting the characters available through a conventional keyboard gives 94 different inputs, so the estimated distribution corresponding to the plaintext data set can be expressed as:
Multinomial(n_1, n_2, …, n_94, p_1, p_2, …, p_94) (2)
From the 94 input characters, mean and covariance vectors covering a total of 94 dimensions are obtained.
Correspondingly, because the ciphertext data set is obtained by encrypting the plaintext data set, the expression for the estimated distribution obtained when the ciphertext data set is modeled based on the multinomial distribution is the same as the expression for the estimated distribution corresponding to the plaintext data set.
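The modeling in step S120 can be illustrated with a minimal sketch. It assumes the standard multinomial parameterization, in which n_i is read as the number of occurrences of the i-th character and p_i as its relative frequency; the function name fit_multinomial and the three sample strings are hypothetical and are not taken from the embodiment.

```python
from collections import Counter
import string

# The 94 printable, non-space ASCII characters (digits, letters, symbols), so m = 94.
ALPHABET = string.digits + string.ascii_letters + string.punctuation


def fit_multinomial(strings):
    """Return (n, p): per-character counts and relative frequencies.

    Assumption: n_i is the count of character i and p_i its empirical
    probability, i.e. the usual multinomial parameters.
    """
    counts = Counter(ch for s in strings for ch in s if ch in ALPHABET)
    total = sum(counts.values())
    n = [counts[ch] for ch in ALPHABET]
    p = [c / total if total else 0.0 for c in n]
    return n, p


# Hypothetical sample strings; the same function applies to a ciphertext data set.
n, p = fit_multinomial(["abc123", "a!b?c", "zz9"])
```

Applying the same function to the plaintext data set and to the ciphertext data set gives two parameter sets of identical shape, which matches the statement that both estimated distributions share one expression.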
S130, based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.
The decryption function is a way of decrypting the encrypted ciphertext data set, and can be understood as an inverse function of the encryption algorithm.
When determining the estimated distribution corresponding to the decryption function, the m different input characters obtained in step S120 need to be modeled, where the m characters correspond to m dimensions. The estimated distribution corresponding to the decryption function may be determined using the law of large numbers and the multivariate normal distribution, and may be represented by the following expression:
N(μ_m, Σ_m) (3)
where μ_m denotes the mean vector corresponding to the decryption function over the total dimension, and Σ_m denotes the variance matrix corresponding to the decryption function over the total dimension.
For the 94 different inputs obtained by the statistics above, the estimated distribution corresponding to the decryption function can be expressed as:
N(μ_94, Σ_94) (4)
Further, the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are used as inputs of the Bayesian statistical model, which outputs the estimated distribution corresponding to the decryption function.
According to the expression for the estimated distribution corresponding to the decryption function, what is mainly estimated based on the Bayesian statistical model is the estimated distribution of the mean vector parameter μ and the estimated distribution of the variance matrix parameter Σ for the current dimension.
S140, estimating the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function.
In the embodiment of the invention, a target encrypted character string is an encrypted character string whose plaintext similarity needs to be estimated; there may be two or more of them, and the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm.
After the estimated distribution corresponding to the decryption function is determined, each target encrypted character string can be decrypted according to that estimated distribution, so that the estimated plaintext character string corresponding to the encrypted character string is obtained. After the estimated plaintext character strings corresponding to the target encrypted character strings are obtained, the similarity between any two estimated plaintext character strings can be calculated, so that the plaintext similarity between different target encrypted character strings is estimated.
In the plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention, a plaintext data set is first obtained and encrypted with a preset encryption algorithm to obtain a ciphertext data set. The plaintext data set and the ciphertext data set are then each modeled based on the multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set. Next, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated from these two estimated distributions. Finally, the plaintext similarity between different target encrypted character strings is estimated according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relationship among multiple pieces of plaintext data can still be estimated.
Example 2
This embodiment builds on the above embodiment and further refines the step of estimating, based on the Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set. The refined step includes: converting the estimated distribution corresponding to the plaintext data set into a posterior distribution of the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into a prior distribution of the Bayesian statistical model; and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, where the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function. The advantage of this arrangement is that the problem of finding the decryption function is converted into a Bayesian statistical model problem, which is convenient for calculation.
The step of estimating the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function is also further refined and includes: selecting a first target encrypted character string and a second target encrypted character string; inputting the first target encrypted character string and the second target encrypted character string respectively into the estimated distribution corresponding to the decryption function to obtain first estimated plaintext data corresponding to the first target encrypted character string and second estimated plaintext data corresponding to the second target encrypted character string; and performing similarity calculation on the first estimated plaintext data and the second estimated plaintext data to obtain the plaintext similarity corresponding to the first target encrypted character string and the second target encrypted character string. The advantage is that, by obtaining the estimated distribution corresponding to the decryption function, estimated plaintext data is obtained for each encrypted character string, so that the plaintext similarity can be estimated while data security is maintained during data transmission.
Referring to Fig. 2, which is a flowchart of another plaintext similarity estimation method for encrypted character strings according to an embodiment of the present invention, the method specifically includes the following steps:
S210, acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set.
If the obtained plaintext data set is denoted as x, the encryption algorithm as f(·), and the ciphertext data set as y, then the relationship y = f(x) holds.
To implement the method, provided by the embodiment of the invention, for estimating plaintext similarity from the ciphertext of encrypted character strings, a decryption function n(·) of the encryption algorithm f(·) needs to be found, where n(·) is a generalized inverse function of f(·). In general, it is only required that the decryption result obtained with the found decryption function does not affect the similarity estimation of the decrypted character strings.
To obtain the plaintext data set x, a certain number of character strings of different lengths and different combinations can be randomly generated; by way of example, 20000 character strings are selected as the plaintext data set.
Then, the preset encryption algorithm f(·) is used to encrypt each plaintext character string in the plaintext data set, so as to obtain the corresponding ciphertext data set y.
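Step S210 can be sketched as follows. The sketch assumes the 94 printable non-space ASCII characters as the character set and uses a keyed substitution as a hypothetical stand-in for f(·); it is not DES or IDEA, offers no real security, and serves only to produce paired data sets. The names generate_plaintexts and make_toy_cipher, the length bounds, the seed, and the key are all illustrative assumptions.

```python
import random
import string

# The 94 printable, non-space ASCII characters (digits, letters, symbols).
ALPHABET = string.digits + string.ascii_letters + string.punctuation


def generate_plaintexts(count, min_len=4, max_len=16, seed=0):
    """Randomly generate `count` strings of varying length and composition."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


def make_toy_cipher(key=12345):
    """Hypothetical stand-in for f(.): a keyed substitution over ALPHABET.

    This is NOT one of the algorithms named in the text and is insecure.
    """
    shuffled = list(ALPHABET)
    random.Random(key).shuffle(shuffled)
    table = str.maketrans(ALPHABET, "".join(shuffled))
    return lambda text: text.translate(table)


x = generate_plaintexts(20000)  # plaintext data set x
f = make_toy_cipher()           # stand-in encryption algorithm f(.)
y = [f(s) for s in x]           # ciphertext data set, y = f(x)
```

In practice the stand-in cipher would be replaced by the preset encryption algorithm actually in use; the rest of the sketch only relies on x and y being paired lists of equal length.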
S220, modeling the plaintext data set and the ciphertext data set respectively based on the multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set.
Further, the plaintext data set is modeled based on the multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set, denoted as X; correspondingly, the ciphertext data set is modeled based on the multinomial distribution to obtain the estimated distribution corresponding to the ciphertext data set, denoted as Y. From the relationship in S210, the relationship Y = f(X) is obtained.
S230, converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model.
The plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention converts the problem of computing the decryption function into a Bayesian statistical model problem. The estimated distribution corresponding to the plaintext data set is denoted as X, and the estimated distribution corresponding to the ciphertext data set is denoted as Y. X is converted into the posterior distribution of the Bayesian statistical model; the posterior distribution can be understood as follows: the result is known first, and the probability distribution of the cause is then inferred from the result. Y is regarded as the prior distribution of the Bayesian statistical model; the prior distribution can be understood as the probability distribution of the cause determined before the result is observed.
S240, estimating likelihood distribution of the Bayesian statistical model based on posterior distribution and prior distribution, wherein the likelihood distribution of the Bayesian statistical model is estimated distribution corresponding to the decryption function.
Further, the estimated distribution corresponding to the decryption function may be denoted as Q. The estimated distribution X corresponding to the plaintext data set, the estimated distribution Y corresponding to the ciphertext data set, and the estimated distribution Q corresponding to the decryption function have the following relationship:
X ∝ QY (5)
That is, the estimated distribution X corresponding to the plaintext data set is proportional to the product of the estimated distribution Y corresponding to the ciphertext data set and the estimated distribution Q corresponding to the decryption function.
In formula (5), the estimated distribution corresponding to the decryption function may be regarded as the likelihood distribution of the Bayesian statistical model; the likelihood distribution can be understood as the probability distribution of the result, inferred from a cause determined beforehand.
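The proportionality in formula (5) follows the usual Bayesian pattern in which the posterior is proportional to the likelihood multiplied by the prior. The following numeric sketch shows that pattern only; it assumes discrete distributions over three hypothetical candidates and an elementwise product followed by normalization, neither of which is specified by the embodiment, and the numbers are invented for illustration.

```python
def posterior_from(prior, likelihood):
    """Normalize the elementwise product likelihood * prior."""
    unnormalized = [q * y for q, y in zip(likelihood, prior)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("product of likelihood and prior has zero mass")
    return [u / total for u in unnormalized]


# Hypothetical values, chosen only to illustrate X ∝ QY.
Y = [0.5, 0.3, 0.2]  # prior over three candidates
Q = [0.1, 0.6, 0.3]  # likelihood of one observation under each candidate
X = posterior_from(Y, Q)
```

The embodiment works in the opposite direction, treating X and Y as known and estimating Q; the sketch only makes explicit what "proportional to the product" means once a normalizing constant is applied.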
Further, before the likelihood distribution of the Bayesian statistical model is estimated based on the posterior distribution and the prior distribution, the method also includes: estimating the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and the multivariate normal distribution.
Because the estimated distribution corresponding to the decryption function depends on the input dimension of the plaintext data set, when the input dimension is 94, the estimated distribution corresponding to the decryption function determined using the law of large numbers and the multivariate normal distribution can be expressed as:
N(μ_94, Σ_94)
where μ_94 denotes the mean vector corresponding to the decryption function over the total dimension, and Σ_94 denotes the variance matrix corresponding to the decryption function over the total dimension.
Therefore, before the estimated distribution corresponding to the decryption function is determined, the parameters μ_94 and Σ_94 of that estimated distribution must also be determined.
Specifically, the multivariate normal distribution is selected as the estimated distribution corresponding to the decryption function, and the parameters μ_94 and Σ_94 of this estimated distribution are estimated with the Bayesian statistical model obtained after the problem conversion, that is, using X and Y, so as to determine the estimated distribution Q corresponding to the decryption function.
Further, the parameters of the estimated distribution corresponding to the decryption function may be estimated using the law of large numbers and the multivariate normal distribution.
The law of large numbers (Law of Large Numbers) states that the arithmetic mean of a sequence of random variables converges to the arithmetic mean of the mathematical expectations of those random variables. The multivariate normal distribution (Multivariate Normal Distribution) is the generalization of the one-dimensional normal distribution to multiple dimensions; for a variable obeying the normal distribution, once its mean and standard deviation are known, the proportion of values falling in any value range can be estimated by formula. The embodiment of the invention uses the law of large numbers and the multivariate normal distribution to solve for the parameters contained in the decryption function.
Accordingly, in step S240, the likelihood distribution of the Bayesian statistical model may be estimated based on the posterior distribution X (i.e., the estimated distribution corresponding to the plaintext data set) and the prior distribution Y (i.e., the estimated distribution corresponding to the ciphertext data set) of the Bayesian statistical model, together with the parameters μ_94 and Σ_94 of the estimated distribution corresponding to the decryption function, where the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.
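One simple way to obtain point estimates of a mean vector and a variance matrix, in keeping with the appeal to the law of large numbers, is to use sample moments. The sketch below is a swapped-in illustration and not the Bayesian estimation of the embodiment, which does not specify its estimator; the function names and the three two-dimensional samples are hypothetical, whereas the embodiment uses 94 dimensions.

```python
def mean_vector(samples):
    """Column-wise arithmetic mean of equally sized vectors."""
    k = len(samples)
    return [sum(col) / k for col in zip(*samples)]


def covariance_matrix(samples):
    """Unbiased sample covariance matrix (divides by k - 1)."""
    k = len(samples)
    mu = mean_vector(samples)
    centered = [[v - m for v, m in zip(row, mu)] for row in samples]
    dim = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (k - 1) for j in range(dim)]
        for i in range(dim)
    ]


# Three hypothetical 2-dimensional samples, for illustration only.
samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mu = mean_vector(samples)           # [3.0, 6.0]
sigma = covariance_matrix(samples)  # [[4.0, 8.0], [8.0, 16.0]]
```

As the number of samples grows, the sample mean approaches the expectation, which is the property the law of large numbers provides.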
S250, selecting a first target encryption character string and a second target encryption character string.
The first target encrypted character string and the second target encrypted character string are ciphertext character strings obtained by encryption with the preset encryption algorithm, and the plaintext similarity between them needs to be estimated.
S260, the first target encryption character string and the second target encryption character string are respectively input into the estimated distribution corresponding to the decryption function, and the first estimated plaintext data corresponding to the first target encryption character string and the second estimated plaintext data corresponding to the second target encryption character string are obtained.
The first target encrypted character string may be denoted as y_new1 and the second target encrypted character string as y_new2. y_new1 and y_new2 are respectively input into the estimated distribution Q corresponding to the decryption function; the first estimated plaintext data obtained for y_new1 is denoted as x_new1, and the second estimated plaintext data obtained for y_new2 is denoted as x_new2.
S270, similarity calculation is carried out on the first estimated plaintext data and the second estimated plaintext data, and plaintext similarity corresponding to the first target encrypted character string and the second target encrypted character string is obtained.
The similarity between the first estimated plaintext data and the second estimated plaintext data may be calculated, for example, as the cosine similarity, the Euclidean distance, or the Mahalanobis distance; the specific calculation method is not limited herein.
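Two of the similarity measures named above can be sketched as follows. The sketch assumes that the estimated plaintext data has already been represented as numeric vectors of equal length (for example, per-character counts); that representation and the sample values are assumptions and are not prescribed by the embodiment.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


# Hypothetical vector forms of the first and second estimated plaintext data.
x_new1 = [1.0, 0.0, 2.0]
x_new2 = [2.0, 0.0, 4.0]
similarity = cosine_similarity(x_new1, x_new2)
distance = euclidean_distance(x_new1, x_new2)
```

Cosine similarity ignores vector magnitude while Euclidean distance does not, so the two measures can rank the same pair differently; which one is appropriate depends on how the estimated plaintext data is represented.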
With the plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention, the similarity of the corresponding plaintext is estimated from ciphertext data, so that in the field of data security the association relationship between pieces of plaintext data can be obtained while the private data or plaintext data is protected. In addition, the calculation method is a generalized decryption model: the decryption problem is converted into a Bayesian statistical model, so the method is in theory applicable to various encryption algorithms, does not require exact decryption of the encryption algorithm, and greatly reduces the decryption cost while retaining a certain accuracy.
Example 3
Fig. 3 is a block diagram of a plaintext similarity estimation apparatus for encrypted character strings according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, may generally be integrated in a computer device such as a server, and can estimate the plaintext similarity of encrypted character strings by executing the plaintext similarity estimation method for encrypted character strings. As shown in Fig. 3, the apparatus includes an encryption operation module 31, an estimated distribution obtaining module 32, an estimated distribution calculation module 33, and a plaintext similarity estimation module 34, wherein:
The encryption operation module 31 is configured to obtain a plaintext data set, and perform an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
The estimated distribution obtaining module 32 is configured to model the plaintext data set and the ciphertext data set respectively based on the multinomial distribution, so as to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
The estimated distribution calculation module 33 is configured to estimate, based on a Bayesian statistical model, an estimated distribution corresponding to a decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
The plaintext similarity estimation module 34 is configured to estimate the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm.
The plaintext similarity estimation apparatus for encrypted character strings provided by the embodiment of the invention first obtains a plaintext data set and encrypts it with a preset encryption algorithm to obtain a ciphertext data set; then models the plaintext data set and the ciphertext data set respectively based on the multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; then estimates, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function from these two estimated distributions; and finally estimates the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relationship among multiple pieces of plaintext data can still be estimated.
Optionally, the plaintext data set includes a preset number of character strings of different lengths and/or different character combinations, each combination including at least one of digits, letters, and symbols.
Optionally, the estimated distribution calculation module 33 includes an estimated distribution conversion unit and an estimated distribution calculation unit, wherein:
The estimated distribution conversion unit is used for converting the estimated distribution corresponding to the plaintext data set into posterior distribution related to a Bayesian statistical model and converting the estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model;
and the estimated distribution calculation unit is used for estimating likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution calculation unit further comprises a parameter estimation subunit;
And the parameter estimation subunit is used for estimating the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and the multivariate normal distribution.
Correspondingly, the estimated distribution calculation unit is further used for estimating likelihood distribution of the Bayesian statistical model based on parameters of the posterior distribution, the prior distribution and the estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:
Multinomial(n_1, n_2, …, n_m, p_1, p_2, …, p_m)
where Multinomial denotes the multinomial distribution, m denotes the total dimension of the character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
Optionally, the estimated distribution corresponding to the decryption function is represented by the following expression:
N(μ_m, Σ_m)
where μ_m denotes the mean vector corresponding to the decryption function over the total dimension, and Σ_m denotes the variance matrix corresponding to the decryption function over the total dimension.
Optionally, the plaintext similarity estimation module 34 includes a target encrypted string selection unit, a target encrypted string input unit, and a plaintext similarity estimation unit, where:
The target encryption character string selection unit is used for selecting a first target encryption character string and a second target encryption character string;
The target encryption character string input unit is used for inputting the first target encryption character string and the second target encryption character string into the estimated distribution corresponding to the decryption function respectively to obtain first estimated plaintext data corresponding to the first target encryption character string and second estimated plaintext data corresponding to the second target encryption character string;
The plaintext similarity estimation unit is used for performing similarity calculation on the first estimated plaintext data and the second estimated plaintext data to obtain plaintext similarity corresponding to the first target encrypted character string and the second target encrypted character string.
The plaintext similarity estimating device for the encrypted character string provided by the embodiment of the invention can be used for executing the plaintext similarity estimating method for the encrypted character string provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
Example 4
The embodiment of the invention provides a computer device, and the plaintext similarity estimating device of the encrypted character string provided by the embodiment of the invention can be integrated in the computer device. Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 400 may include a memory 401, a processor 402 and a computer program stored in the memory 401 and executable by the processor, wherein the processor 402 implements the plaintext similarity estimation method for an encrypted string according to an embodiment of the present invention when executing the computer program.
The computer equipment provided by the embodiment of the invention can execute the plaintext similarity estimation method of the encrypted character string provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
Example 5
The embodiment of the invention also provides a storage medium containing computer executable instructions, which when executed by a computer processor, are used for executing a plaintext similarity estimation method for an encrypted character string, the method comprising:
acquiring a plaintext data set, and performing an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set respectively based on the multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and
estimating the plaintext similarity between different target encrypted character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encrypted character strings is the preset encryption algorithm.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media, such as a CD-ROM, floppy disk, or tape device; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory, such as flash memory, magnetic media (e.g., a hard disk), or optical storage; and registers or other similar types of memory elements. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a second, different computer system connected to the first computer system through a network such as the Internet. The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media that may reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the invention, the computer-executable instructions are not limited to the plaintext similarity estimation operations described above, and may also perform related operations in the plaintext similarity estimation method for encrypted character strings provided by any embodiment of the invention.
The plaintext similarity estimation apparatus, computer device, and storage medium for encrypted character strings provided by the above embodiments can execute the plaintext similarity estimation method for encrypted character strings provided by any embodiment of the invention, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, reference may be made to the plaintext similarity estimation method for encrypted character strings provided by any embodiment of the invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.