WO2025098109A1 - Ownership verification method and processing method for structured data set, device and medium - Google Patents
- Publication number
- WO2025098109A1 (application PCT/CN2024/125335)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data set
- data
- watermark
- structured data
- structured
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/16—Program or content traceability, e.g. by watermarking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Definitions
- The present application relates to the field of information security, and in particular to a method for verifying ownership of a structured data set, a method for processing a structured data set, a device and a medium.
- Watermarking is common in multimedia copyright technologies.
- Additional data used for copyright identification, such as the creator's identity information, is embedded in multimedia content files such as images, audio and video, either visibly or invisibly to the human eye. This establishes the copyright ownership of the content and safeguards the legitimate rights and interests of its creators.
- Embedded watermarking technology is widely used to confirm the ownership of unstructured data.
- Structured data is generally marked with column watermarks or row watermarks.
- A column watermark is an additional data field with no (or little) practical meaning, or simply a decorative mark added to the format of existing data.
- A row watermark is produced by synthesizing multiple groups of forged data from the original structured data and mixing them into the data set, so that the forged data serve as the watermark of the structured data.
- The defect of a column watermark is that the added field with no (or little) practical meaning is very easy to distinguish.
- After other people or organizations (hereinafter, the data set theft party) obtain the structured data, they can easily identify and strip the column watermark by automated means. Once the watermark is stripped, the ownership of the structured data is difficult to establish.
- The defect of a row watermark is that forged structured data usually differs significantly from business data in format or content and is hard to blend in fully. As a result, the data set theft party can also remove it relatively easily. Once the watermark is removed, the thief can exploit the structured data maliciously, and the legitimate rights and interests of the data set owner are difficult to protect.
- To address these defects, the present application provides a method for verifying ownership of a structured data set, a method for processing a structured data set, a device and a medium.
- One aspect of the present application provides a method for verifying ownership of a structured data set, comprising the following steps:
- wherein the structured data set includes a plurality of structured data items, each of which is either business data or watermark data, and the business data and the watermark data satisfy the same predetermined data format;
- calculating the proportion of watermark data in the structured data set, and determining the ownership relationship between the target object and the structured data set according to the ratio result and the ratio label.
- Identifying the watermark data from the structured data set according to the secret information and the specific mathematical property comprises:
- calculating a first check value from the secret information and the structured data according to the preset rule; and, if the first check value meets the preset mathematical characteristic, determining the structured data as watermark data.
- The preset rule includes a message authentication code algorithm or a deterministic digital signature algorithm.
- Determining the ownership relationship between the target object and the structured data set according to the ratio result and the ratio label includes:
- calculating a difference value between the ratio result and the ratio label, and, if the difference value is less than a second threshold, determining that the target object is the owner of the structured data set.
- Calculating the difference value between the ratio result and the ratio label includes:
- calculating the difference between the ratio result and the ratio label, and taking the absolute value of the difference as the difference value.
- Alternatively, calculating the difference value between the ratio result and the ratio label includes:
- calculating the difference between the ratio result and the ratio label, and taking the ratio of the absolute value of the difference to the ratio label as the difference value.
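The two difference-value definitions above can be sketched in Python as follows. This is an illustrative sketch only. The function names are hypothetical, and ratios are assumed to be plain fractions (0.05 for 5%).

```python
def absolute_difference(ratio_result: float, ratio_label: float) -> float:
    """First definition: |ratio result - ratio label|."""
    return abs(ratio_result - ratio_label)


def relative_difference(ratio_result: float, ratio_label: float) -> float:
    """Second definition: |ratio result - ratio label| / ratio label."""
    if ratio_label <= 0:
        raise ValueError("ratio label must be positive")
    return abs(ratio_result - ratio_label) / ratio_label


# Example: a claimed label of 5% against an observed ratio of 4.6%.
print(round(absolute_difference(0.046, 0.05), 6))  # 0.004
print(round(relative_difference(0.046, 0.05), 6))  # 0.08
```

The relative form does not depend on the scale of the ratio label. This may make a single second threshold easier to reuse across data sets with different labels. That is an observation about the sketch, not a requirement of the application.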
- Another aspect of the present application discloses a method for processing a structured data set, comprising the following steps:
- acquiring an original data set and ownership mark information, wherein:
  - the original data set is used to store structured data satisfying a predetermined data format;
  - the ownership mark information includes secret information, a ratio label and a specific mathematical property;
  - the specific mathematical property constrains the check value calculated from the secret information and watermark data according to a preset rule to meet a preset mathematical characteristic; and
  - according to the preset rule, the probability that the check value calculated from any data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic is less than a first threshold, the ratio label being greater than the first threshold;
- determining watermark data from data satisfying the predetermined data format according to the secret information and the specific mathematical property;
- determining a target amount of watermark data to be added to the original data set according to the amount of business data in the original data set and the ratio label; and
- adding the target amount of watermark data to the original data set to obtain a target data set.
- Obtaining the ownership mark information includes:
- obtaining association information corresponding to the original data set, the association information characterizing the ownership of the original data set; and
- generating the secret information based on the association information.
- Adding the target amount of watermark data to the original data set to obtain a target data set includes:
- processing the original data set with a random insertion algorithm, a group insertion algorithm, a time-series hybrid algorithm or a hybrid encryption algorithm to determine the target number of insertion positions; and
- adding each watermark data item at one of the insertion positions in the original data set to obtain the target data set.
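Below is a minimal sketch of the random insertion option. It assumes the data set is an in-memory list of records. `random_insert` is a hypothetical helper, and the group insertion, time-series hybrid and hybrid encryption algorithms are not shown. The seed parameter is there only to make runs reproducible.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_insert(original: Sequence[T], watermarks: Sequence[T],
                  seed: Optional[int] = None) -> List[T]:
    """Mix watermark records into the original records at random positions."""
    rng = random.Random(seed)
    total = len(original) + len(watermarks)
    # Choose which slots of the target data set will hold watermark records.
    wm_slots = set(rng.sample(range(total), len(watermarks)))
    wm_iter, biz_iter = iter(watermarks), iter(original)
    # Fill every slot in order; business records keep their relative order.
    return [next(wm_iter) if i in wm_slots else next(biz_iter) for i in range(total)]


print(random_insert(["b1", "b2", "b3"], ["w1", "w2"], seed=7))
```

Business records keep their relative order. Only the slots taken by watermark records are random.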
- a second acquisition unit configured to acquire, from the target object to be verified, the secret information corresponding to the structured data set, the ratio label of the watermark data and the specific mathematical property corresponding to the watermark data, wherein:
  - the specific mathematical property constrains the check value calculated from the secret information and the watermark data according to the preset rule to meet the preset mathematical characteristic; and
  - according to the preset rule, the probability that the check value calculated from any data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic is less than a first threshold, the ratio label being greater than the first threshold;
- a processing unit configured to identify the watermark data from the structured data set according to the secret information and the specific mathematical property; and
- a statistical unit configured to calculate the proportion of the watermark data in the structured data set, and determine the ownership relationship between the target object and the structured data set according to the ratio result and the ratio label.
- An information acquisition unit configured to acquire an original data set and ownership mark information, wherein:
  - the original data set is used to store structured data satisfying a predetermined data format;
  - the ownership mark information includes secret information, a ratio label and a specific mathematical property;
  - the specific mathematical property constrains the check value calculated from the secret information and watermark data according to a preset rule to meet a preset mathematical characteristic; and
  - according to the preset rule, the probability that the check value calculated from any data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic is less than a first threshold, the ratio label being greater than the first threshold;
- a first determining unit configured to determine watermark data from data satisfying the predetermined data format according to the secret information and the specific mathematical property;
- a second determining unit configured to determine a target amount of watermark data to be added to the original data set according to the amount of business data included in the original data set and the ratio label; and
- a data set acquisition unit configured to add the target amount of watermark data to the original data set to obtain a target data set.
- The present application further discloses an electronic device including a processor and a memory, wherein
- the memory is used to store a program; and
- the processor executes the program to implement the method for verifying ownership of a structured data set or the method for processing a structured data set.
- The present application further discloses a computer-readable storage medium storing a program which, when executed by a processor, implements the method for verifying ownership of a structured data set or the method for processing a structured data set.
- FIG. 1 is a schematic diagram of a watermark in the conventional technology;
- FIG. 2 is a schematic diagram of a watermark in the conventional technology;
- FIG. 3 is a flow chart of a method for verifying ownership of a structured data set provided in an embodiment of the present application;
- FIG. 4 is a schematic diagram of a process for determining watermark data from structured data provided in an embodiment of the present application;
- FIG. 5 is a flow chart of a method for processing a structured data set provided in an embodiment of the present application;
- FIG. 6 is a schematic diagram of a predetermined data format of business data provided in an embodiment of the present application;
- FIG. 7 is a schematic diagram of a process for generating secret information provided in an embodiment of the present application;
- FIG. 8 is a schematic diagram of inserting watermark data into an original data set provided in an embodiment of the present application;
- FIG. 9 is a schematic diagram of the structure of a device for verifying ownership of a structured data set provided in an embodiment of the present application;
- FIG. 10 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
- Watermarking technology for structured data sets is mainly divided into column watermarks and row watermarks.
- A column watermark is an additional data field with no (or little) practical meaning, or simply a decorative mark added to the format of existing data.
- For example, the mobile phone number 12300762185 is decorated to become ⁇ #12300762185# ⁇ to indicate "this is my data", where ⁇ # and # ⁇ are the column watermark.
- The drawback of a column watermark is that a field with no (or little) practical meaning is very easy to distinguish. After other people or organizations (hereinafter, the data set theft party) obtain the structured data, they can easily identify and remove the column watermark by automated means. Once the watermark is removed, the ownership of the structured data is difficult to establish.
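The sketch below shows why such decorations are fragile: it strips every non-digit character from a decorated value. The `@#...#@` decoration is a hypothetical stand-in, because the original marker characters cannot be recovered from the text.

```python
import re


def strip_column_watermark(value: str) -> str:
    """Remove any non-digit decoration around a phone number."""
    return re.sub(r"\D", "", value)


decorated = "@#12300762185#@"  # hypothetical decoration for illustration
print(strip_column_watermark(decorated))  # 12300762185
```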
- Row watermarks do not change data in the column dimension. Instead, multiple sets of forged data are generated from the original structured data and mixed into the data set. These forged data watermark the structured data, which amounts to inserting entire fake data records.
- For example, a group of fake users is inserted into a mobile phone number data set, using non-existent numbers such as 111xxxxxxxx to indicate "this is my data".
- The defect of row watermarks is that forged structured data usually differs significantly from business data in format or content. For example, industry insiders can easily recognize numbers such as 111xxxxxxxx as fake users. The forged records are therefore difficult to blend in fully, and the data set theft party can also remove row watermarks easily.
- The embodiments of the present application improve upon the existing watermark technology and propose a method for verifying the ownership of a structured data set, a processing method, a device and a medium.
- The data set owner refers to an entity that legally enjoys the rights and interests related to the structured data set.
- Normally, the data set owner's structured data set is not maliciously exploited by a data set thief, and the owner can use the business data in it for commercial activities and enjoy the related legal rights and interests. In the information age, however, illegal acquisition and malicious use of structured data sets are common. This seriously affects the owner's information security and hinders the owner from enjoying these rights and interests. The data watermarking technology of the embodiments of the present application is therefore needed to protect the information security of structured data sets.
- The data set verifier refers to an entity that wants to confirm the ownership of a structured data set.
- The embodiments of the present application propose a method for verifying the ownership of a structured data set, which can be applied to the data set verifier. Specifically, the method includes the following steps:
- Step 310: obtaining a structured data set,
- wherein the structured data set includes multiple structured data items, each of which is business data or watermark data, and business data and watermark data satisfy the same predetermined data format;
- Step 320: obtaining, from the target object to be verified, the secret information corresponding to the structured data set, the ratio label of the watermark data and the specific mathematical property corresponding to the watermark data, wherein:
  - the specific mathematical property constrains the check value calculated from the secret information and the watermark data according to the preset rule to meet the preset mathematical characteristic; and
  - according to the preset rule, the probability that the check value calculated from any data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic is less than a first threshold;
- Step 330: identifying watermark data from the structured data set based on the secret information and the specific mathematical property;
- Step 340: calculating the proportion of watermark data in the structured data set, and determining the ownership relationship between the target object and the structured data set based on the ratio result and the ratio label.
- The method for verifying ownership of a structured data set can be applied to a data set verifier. It first obtains the structured data set to be verified and determines the object to be verified. In the embodiments of the present application, this object is referred to as the target object.
- The target object may be an object that claims to hold the structured data set to be verified, or an object that provides the structured data set to the data set verifier; the embodiments of the present application do not limit this.
- The acquired structured data set includes multiple structured data items that satisfy the same predetermined data format. For example, every item has the same number of fields, and corresponding fields have the same data format.
- Each structured data item in the structured data set is either business data or watermark data.
- Business data refers to normal, real data and may include fields such as a mobile phone number and an ID card number. Watermark data refers to synthetic structured data that satisfies the specific mathematical property.
- Business data and watermark data satisfy the same predetermined data format, and there is no discernible difference between them in format or content.
- The watermark data is not marked in any intuitive way (for example, by giving a synthesized mobile phone number a non-existent prefix such as 111). The watermark data in the embodiments of the present application therefore gives no visible indication of being a watermark.
- Each structured data item is thus business data or watermark data, but without the secret information no one can distinguish the two.
- The secret information corresponding to the structured data set, the ratio label of the watermark data and the specific mathematical property of the watermark data can be obtained from the target object.
- For this purpose, the owner of a structured data set needs to select three things in advance for each structured data set whose ownership may need to be claimed: the secret information, the ratio label of the watermark data and the specific mathematical property of the watermark data. The secret information must be kept confidential and disclosed only to a data set verifier (such as a regulatory department or law enforcement agency) when necessary.
- The data set owner can generate watermark data with the specific mathematical property. A watermark data item is indistinguishable in appearance from a business data item. However, among all possible data satisfying the predetermined data format, only a small proportion of synthetic data meets the specific mathematical property preset by the owner and thus qualifies as watermark data. The data set owner can therefore control the proportion of watermark data inserted into the structured data set and record this proportion as the ratio label.
- The data set verifier can use the secret information to identify each watermark data item. It then determines whether the data set belongs to the target object by checking the proportion of watermark data in the structured data set.
- The specific mathematical property of the watermark data means that the check value calculated from the secret information and the watermark data according to the preset rule conforms to the preset mathematical characteristic. In other words, the check value exhibits a specific pattern in the mathematical sense.
- One example of the preset rule is a message authentication code (MAC).
- A message authentication code is a special class of algorithm in cryptography.
- Such an algorithm takes a specified key and data as input. The key is secretly selected in advance, and both inputs may be readable strings or arbitrary bit strings. The output is a data fingerprint (also called a data digest, or simply a message authentication code) of a standard length (such as 256 bits) that looks like a random number.
- The mathematical properties of message authentication codes also include:
  - (a) the data fingerprint is sensitive to both the key and the data, so a change in either input, even of a single bit, produces a completely different output;
  - (b) computing the data fingerprint from the key and the data is easy, but for a key of sufficient strength and data from a sufficiently large sample space, it is infeasible to infer either the key or the data from the fingerprint.
- Using the secret information as the key and a structured data item as the data, the algorithm computes the check value corresponding to each structured data item.
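Here is a sketch of the check-value computation. It assumes HMAC-SHA256 as the message authentication code, and that each structured data item is serialized as its fields joined by `|` in a fixed order. The serialization is an assumption. Owner and verifier only need to agree on one canonical encoding, since changing even one byte yields a different check value.

```python
import hashlib
import hmac
from typing import Sequence


def serialize(fields: Sequence[str]) -> bytes:
    """Canonical encoding of one structured data item (assumed: '|'-joined UTF-8)."""
    return "|".join(fields).encode("utf-8")


def check_value(secret: bytes, fields: Sequence[str]) -> bytes:
    """256-bit check value: HMAC-SHA256 keyed by the secret information."""
    return hmac.new(secret, serialize(fields), hashlib.sha256).digest()


record = ["+86", "123", "00762185"]
cv = check_value(b"owner-secret", record)
print(len(cv) * 8)  # 256
```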
- The check value calculated from the secret information and watermark data conforms to the preset mathematical characteristic.
- The specific form of the mathematical characteristic is not limited.
- For example, when the message authentication code is used as the check value, the preset mathematical characteristic can be any of the following:
  - in some embodiments, the check value of watermark data starts with 16 consecutive 1 bits, that is, with the two bytes ffff in hexadecimal;
  - in some embodiments, the check value of watermark data ends with 20 consecutive 0 bits;
  - in some embodiments, the second byte and the second-to-last byte of the check value of watermark data are both the hexadecimal value aa.
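The three example characteristics can be written as predicates on a 32-byte check value; the function names are hypothetical. Bit positions follow the natural reading order of the byte string, so the last 20 bits are the low 20 bits of its big-endian integer value.

```python
def starts_with_16_one_bits(cv: bytes) -> bool:
    """Check value begins with the two bytes ff ff (16 consecutive 1 bits)."""
    return cv[:2] == b"\xff\xff"


def ends_with_20_zero_bits(cv: bytes) -> bool:
    """Last 20 bits are 0, i.e. the big-endian integer is divisible by 2**20."""
    return int.from_bytes(cv, "big") % (1 << 20) == 0


def second_and_second_last_bytes_are_aa(cv: bytes) -> bool:
    """Second byte and second-to-last byte both equal 0xaa."""
    return cv[1] == 0xAA and cv[-2] == 0xAA
```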
- A message authentication code is, in form, a fixed-length random number, so each of the above examples is a low-probability event. The specific mathematical property therefore lets only a small part of synthetic data become watermark data. In other words, the probability that the check value calculated by the preset rule from the secret information and any data satisfying the predetermined data format meets the preset mathematical characteristic can be regarded as less than the first threshold.
- The first threshold can be determined according to the total amount of data in the predetermined data format and actual needs. In general, it should be as small as possible to reduce possible interference with normal business data; for example, it can be set to 2^-10. The embodiments of the present application do not limit its size.
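Treating the check value as a uniformly random 256-bit string (an idealization of the MAC output described above), the probabilities of the three example characteristics can be worked out as follows. Here byte 31 is the second-to-last byte of the 32-byte value. All three probabilities fall below the example first threshold of 2^-10.

```latex
\begin{aligned}
\Pr[\text{first 16 bits all 1}] &= 2^{-16} \approx 1.53 \times 10^{-5},\\
\Pr[\text{last 20 bits all 0}] &= 2^{-20} \approx 9.54 \times 10^{-7},\\
\Pr[\text{byte 2} = \mathtt{aa} \,\wedge\, \text{byte 31} = \mathtt{aa}] &= 2^{-8} \cdot 2^{-8} = 2^{-16},\\
\text{and } 2^{-20} < 2^{-16} &< 2^{-10} \approx 9.77 \times 10^{-4}.
\end{aligned}
```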
- Once the data set verifier obtains the secret information with the target object's authorization (and only then), it can check one by one whether each structured data item satisfies the specific mathematical property, thereby identifying whether it is watermark data. Accordingly, the legitimate data set owner can reveal to the verifier the watermark data that appear to be business data mixed into the structured data set. The owner then supports its ownership claim by the proportion of watermark data in the set.
- A key effect of this process is that only the data set owner and an authorized data set verifier can distinguish business data from watermark data in the structured data set, so only they can compute the proportion of watermark data. If the target object is not the legitimate owner, it cannot obtain the correct secret information. Two cases follow:
  - if the target object cannot provide secret information at all, it can be determined not to be the legitimate owner;
  - if it provides incorrect secret information, the check values calculated from that information and the structured data cannot correctly identify watermark data under the specific mathematical property, so the resulting ratio will be far from the claimed ratio label, and the target object can likewise be determined not to be the legitimate owner.
- The data set owner needs to add enough watermark data to the structured data set that its proportion far exceeds the probability that the check value computed from ordinary data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic. That is, the ratio label is much larger than the first threshold, which ensures that the structured data set can be clearly recognized as having been watermarked.
- There is no specific restriction on the size of the ratio label; for example, it can be set to 5%.
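A simple model shows why the ratio label r should be much larger than the first threshold. The model makes two idealizing assumptions that the application does not state. First, every watermark item passes the check under the correct secret, while any other item passes independently with probability p no greater than the first threshold. Second, for the wrong-secret line, MAC outputs under a wrong key behave like random strings. With the example values r = 5% and p = 2^-16:

```latex
\begin{aligned}
\mathbb{E}[\hat{r}_{\text{correct secret}}] &= r + (1 - r)\,p = 0.05 + 0.95 \cdot 2^{-16} \approx 0.05001,\\
\mathbb{E}[\hat{r}_{\text{wrong secret}}] &\approx p = 2^{-16} \approx 1.5 \times 10^{-5}.
\end{aligned}
```

The correct secret reproduces the label almost exactly. A wrong secret yields a ratio thousands of times smaller.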
- This application involves cryptographic technology but is not limited to any specific cryptographic algorithm.
- Any cryptographic primitive that can generate a check value for structured data from secret information (such as a message authentication code or a deterministic digital signature) can serve as the underlying algorithm of the preset rule in this application.
- This application does not extract information such as the producer's identity from the data for copyright identification such as content tracing. Instead, it uses the secret information to verify each structured data item in the set one by one to identify the watermark data. It then determines the ownership of the structured data set from the proportion of watermark data.
- The one-by-one verification checks whether the check value of each structured data item satisfies the specific mathematical property.
- Identifying watermark data from a structured data set based on the secret information and the specific mathematical property includes:
- Step 410: calculating a first check value from the secret information and the structured data according to the preset rule;
- Step 420: judging, according to the specific mathematical property, whether the first check value meets the preset mathematical characteristic;
- Step 430: if the first check value meets the preset mathematical characteristic, determining the structured data as watermark data.
- When determining watermark data from structured data, a check value, denoted the first check value, can be calculated from the secret information and the structured data according to the preset rule. Whether the first check value meets the preset mathematical characteristic is then judged according to the specific mathematical property. If it does, the structured data is determined to be watermark data; otherwise, it is determined to be business data.
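Steps 410 to 430, together with the ratio computation of Step 340, can be sketched as below. The sketch assumes HMAC-SHA256 as the preset rule and records already serialized to strings. `identify_watermarks`, `ratio_result` and the predicate argument are hypothetical names.

```python
import hashlib
import hmac
from typing import Callable, Iterable, List, Tuple

Predicate = Callable[[bytes], bool]


def identify_watermarks(records: Iterable[str], secret: bytes,
                        has_property: Predicate) -> Tuple[List[str], List[str]]:
    """Split records into (watermark data, business data) per Steps 410-430."""
    watermarks: List[str] = []
    business: List[str] = []
    for record in records:
        # Step 410: first check value from the secret and the record.
        cv = hmac.new(secret, record.encode("utf-8"), hashlib.sha256).digest()
        # Steps 420-430: a record whose check value has the property is watermark data.
        if has_property(cv):
            watermarks.append(record)
        else:
            business.append(record)
    return watermarks, business


def ratio_result(records: List[str], secret: bytes, has_property: Predicate) -> float:
    """Proportion of watermark data in the structured data set (Step 340)."""
    if not records:
        return 0.0
    watermarks, _ = identify_watermarks(records, secret, has_property)
    return len(watermarks) / len(records)
```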
- Determining the ownership relationship between the target object and the structured data set according to the ratio result and the ratio label includes:
- When making this determination, note that some business data may happen to conform to the specific mathematical property, and the data set theft party may have modified, added to or deleted from the structured data set. The ratio result and the ratio label therefore may not be exactly equal.
- Accordingly, a difference value between the ratio result and the ratio label can be calculated; its definition can be set flexibly as needed.
- In some embodiments, the difference between the ratio result and the ratio label is calculated and its absolute value is taken as the difference value. In other embodiments, the absolute value of the difference as a percentage of the ratio label (that is, the ratio of the absolute difference to the ratio label) is taken as the difference value.
- The larger the difference value, the less close the ratio result is to the ratio label and the less likely the target object is the owner of the structured data set. The smaller the difference value, the closer the two are and the more likely the target object is the owner.
- A threshold, denoted the second threshold, can be set.
- If the calculated difference value is small, that is, less than the second threshold, it can be determined that the target object is the owner of the structured data set. If it is large, that is, greater than or equal to the second threshold, it can be determined that the target object is not the owner.
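Here is a sketch of the decision rule, using the relative difference value by default. The function name and the example second threshold of 0.2 are hypothetical; the application does not fix a value.

```python
def is_owner(ratio_result: float, ratio_label: float, second_threshold: float,
             relative: bool = True) -> bool:
    """Owner iff the difference value is strictly below the second threshold."""
    if ratio_label <= 0:
        raise ValueError("ratio label must be positive")
    diff = abs(ratio_result - ratio_label)
    if relative:
        diff /= ratio_label  # difference as a fraction of the ratio label
    return diff < second_threshold


print(is_owner(0.046, 0.05, 0.2))     # True: relative difference 0.08
print(is_owner(0.000015, 0.05, 0.2))  # False: relative difference about 0.9997
```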
- The embodiments of the present application also provide a method for processing a structured data set. The data set owner can use it to generate a structured data set containing watermark data.
- The processing method includes:
- Step 510: obtaining an original data set and ownership mark information, wherein:
  - the original data set is used to store structured data satisfying a predetermined data format;
  - the ownership mark information includes secret information, a ratio label and a specific mathematical property;
  - the specific mathematical property constrains the check value calculated from the secret information and watermark data according to a preset rule to meet a preset mathematical characteristic; and
  - according to the preset rule, the probability that the check value calculated from any data satisfying the predetermined data format and the secret information meets the preset mathematical characteristic is less than a first threshold, the ratio label being greater than the first threshold;
- Step 520: determining watermark data from data satisfying the predetermined data format according to the secret information and the specific mathematical property;
- Step 530: determining the target amount of watermark data to be added to the original data set according to the amount of business data contained in the original data set and the ratio label;
- Step 540: adding the target amount of watermark data to the original data set to obtain the target data set.
- In this method, the data set owner first obtains the original data set and the ownership mark information. The original data set is a structured data set used to store the relevant structured data.
- These structured data satisfy the predetermined data format, which is later used to search for and screen out watermark data.
- For example, when the original data set stores mobile phone numbers, the predetermined data format is the data structure country code + domestic destination code + user number. Similarly, when the original data set stores user addresses, the predetermined data format is province + city + county/district + town/street + community.
- Understanding the predetermined data format helps produce watermark data resembling real business data. Without the specific secret information, no party can then tell whether a structured data item is business data or watermark data; that is, no one can distinguish the two.
- All the data in the target data set may also be watermark data.
- In that case, the obtained original data set may not contain any real business data.
- The ownership mark information realizes the ownership marking of the original data set and may include secret information, a ratio label and a specific mathematical property.
- There is usually no coupling between the three types of information in the ownership mark information, so they can be selected in any order.
- The secret information must be kept confidential and disclosed, when necessary, only to a specific data set verifier (such as a regulatory department or law enforcement agency); the other information needs to be announced to the data set verifier and may also be made public.
- A candidate data item can be selected from the data meeting the predetermined data format according to some strategy (random selection or traversal in a certain order) and processed according to the preset rule. If the result meets the selected specific mathematical property, the candidate is output as watermark data; otherwise it is discarded (it is not watermark data). In this way, watermark data can be determined from the data meeting the predetermined data format.
- The target number of watermark data items to be added to the original data set is determined from the number of business data items contained in the original data set and the ratio label.
- The watermark data can then be mixed with the business data so that the proportion of watermark data in the resulting data set equals the pre-selected ratio label, yielding the processed target data set.
- Watermark data cannot be distinguished from business data by appearance. Of all possible data, only a small proportion satisfies the specific mathematical property and can therefore become watermark data, so a large number of candidates must be evaluated one by one to find the relatively few that qualify.
- The search can use random selection or traversal of the data according to certain conditions; the present application does not limit this.
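The candidate search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: it assumes the preset rule is HMAC-SHA-256 keyed with the UTF-8-encoded secret information over the UTF-8-encoded candidate (the convention adopted in the embodiments), that the specific mathematical property is "the MAC starts with k consecutive 1 bits", and that candidates are 11-digit numbers traversed incrementally. The names `check_value`, `has_leading_ones`, `candidates` and `find_watermarks` are hypothetical.

```python
import hashlib
import hmac
from typing import Iterator, List


def check_value(secret: str, candidate: str) -> bytes:
    # Assumed preset rule: HMAC-SHA-256 keyed with the secret, over the candidate (both UTF-8)
    return hmac.new(secret.encode("utf-8"), candidate.encode("utf-8"), hashlib.sha256).digest()


def has_leading_ones(mac: bytes, k: int) -> bool:
    # Assumed property: the first k bits of the MAC are all 1
    value = int.from_bytes(mac, "big")
    return value >> (len(mac) * 8 - k) == (1 << k) - 1


def candidates(start: int) -> Iterator[str]:
    # Hypothetical strategy: traverse 11-digit numbers incrementally from `start`
    n = start
    while n <= 99_999_999_999:
        yield str(n)
        n += 1


def find_watermarks(secret: str, start: int, count: int, k: int) -> List[str]:
    found: List[str] = []
    if count <= 0:
        return found
    for cand in candidates(start):
        if has_leading_ones(check_value(secret, cand), k):
            found.append(cand)  # satisfies the property: output as watermark data
            if len(found) == count:
                break
        # any other candidate is simply ignored (it is not watermark data)
    return found


print(find_watermarks("demo secret", 12300000000, 3, 8))
```

With k = 8 roughly one candidate in 256 qualifies; with the k = 16 used in Embodiment 1, about 65,536 candidates must be evaluated per watermark on average, which is why the search processes a large amount of data one by one.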
- Obtaining the ownership mark information includes:
- Step 710: obtaining associated information corresponding to the original data set, where the associated information characterizes the ownership of the original data set;
- Step 720: generating the secret information from the associated information.
- In this way, information with natural semantics can be used to generate the secret information when processing the data set, which facilitates any subsequent ownership proof.
- The associated information characterizes the ownership of the original data set, for example "A Company XXX Exclusive". The secret information is then generated from it: the associated information can be used directly as the secret information or processed further into it, and the present application does not limit this.
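Step 720 leaves open whether the associated information is used as-is or processed first. The sketch below shows both options; the SHA-256 processing step and the function name `derive_secret` are assumptions for illustration, not something the application specifies.

```python
import hashlib


def derive_secret(associated_info: str, hashed: bool = False) -> str:
    """Generate secret information from ownership-related associated information.

    hashed=False: use the associated information directly as the secret.
    hashed=True:  assumed processing step, the hex SHA-256 digest of its UTF-8 encoding.
    """
    if not hashed:
        return associated_info
    return hashlib.sha256(associated_info.encode("utf-8")).hexdigest()


print(derive_secret("A Company XXX Exclusive"))
print(derive_secret("A Company XXX Exclusive", hashed=True))
```

Using the raw string keeps the ownership claim human-readable when the secret is later disclosed to a verifier, which is what the embodiments rely on; a processed secret gives up that readability.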
- Adding the target amount of watermark data to the original data set to obtain the target data set includes:
- adding each watermark data item at an insertion position in the original data set to obtain the target data set.
- The target number of watermark data items can be added to the original data set by mixing: for example, a random insertion algorithm, a group insertion algorithm, a time-series mixing algorithm or a hybrid encryption algorithm is applied to the original data set to determine the target number of insertion positions.
- Mixing, in the embodiments of the present application, means inserting the watermark data relatively evenly among the original business data, so that the position of a record in the data set (its row in the data table) reveals nothing about whether it is business data or watermark data.
- The target number of insertion positions can be determined in the original data set, and each watermark data item is then added at one insertion position to obtain the target data set.
- For example, suppose the original data set contains 9,500 business data items and the ratio label is 5%.
- The resulting structured data set then contains 10,000 records in total, 5% of which (500) are watermark data, yet it is impossible to tell where they appear because they follow no pattern. This improves the ownership security of the structured data set.
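The count and the random-insertion step can be sketched as below. The target count follows from requiring w / (n + w) = r for n business records and ratio label r, which gives w = r·n / (1 − r); for n = 9,500 and r = 5% this is 500. The function names are hypothetical, and this is only one possible random insertion algorithm. Python's `random.Random` is not cryptographically secure; `random.SystemRandom()` can be passed instead in practice.

```python
import random
from typing import List, Sequence


def target_watermark_count(business_count: int, ratio_label: float) -> int:
    # Solve w / (n + w) = r for w, i.e. w = r * n / (1 - r), rounded to an integer
    if not 0 < ratio_label < 1:
        raise ValueError("ratio label must be strictly between 0 and 1")
    return round(ratio_label * business_count / (1 - ratio_label))


def random_insert(business: Sequence[str], watermarks: Sequence[str],
                  rng: random.Random) -> List[str]:
    # Choose len(watermarks) distinct row indices of the target data set uniformly at random
    total = len(business) + len(watermarks)
    slots = set(rng.sample(range(total), len(watermarks)))
    biz, wm = iter(business), iter(watermarks)
    # Watermark rows go to the chosen slots; business rows keep their order elsewhere
    return [next(wm) if i in slots else next(biz) for i in range(total)]


business = [f"biz-{i}" for i in range(9500)]
w = target_watermark_count(len(business), 0.05)
watermarks = [f"wm-{i}" for i in range(w)]
target = random_insert(business, watermarks, random.Random(2025))
print(len(target), w / len(target))  # 10000 0.05
```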
- The embodiments of the present application apply both when the business data contains a single field and when it contains multiple fields.
- Each embodiment is described below.
- The examples are intended to illustrate the concepts and mathematical calculations involved in this application and do not reflect real-world data (for example, the embodiments use a hypothetical 123-prefix mobile phone number segment).
- The following technical conventions apply uniformly to the examples below:
- Strings are encoded according to the internationally accepted UTF-8 rules.
- For example, the string "数学" ("mathematics"), consisting of two Chinese characters, encodes to a 6-byte array whose hexadecimal representation is e695b0e5ada6.
- Watermark data verification uses the widely adopted message authentication code HMAC-SHA-256, whose output is a 32-byte array.
- Alternatively, the character encoding may follow the Chinese standard GB 18030.
- The message authentication code underlying the mathematical property of the watermark data may alternatively be the Chinese standard HMAC-SM3 or CMAC-SM4, or the international KMAC based on SHA-3.
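The encoding and MAC conventions can be checked with the Python standard library. The key `b"secret"` is a placeholder; the SM3/SM4 and KMAC alternatives need a library that provides those primitives and are not shown.

```python
import hashlib
import hmac

# UTF-8: two Chinese characters become 6 bytes (3 bytes each)
encoded = "数学".encode("utf-8")
print(len(encoded), encoded.hex())  # 6 e695b0e5ada6

# HMAC-SHA-256 always yields a 32-byte (256-bit) code
mac = hmac.new(b"secret", encoded, hashlib.sha256).digest()
print(len(mac))  # 32

# GB 18030 alternative: the same two characters take 2 bytes each
gb = "数学".encode("gb18030")
print(len(gb))  # 4
```

Because the byte encoding feeds directly into the MAC, the data set owner and the verifier must agree on the same character encoding; otherwise identical strings produce different check values.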
- Embodiment 1: providing ownership proof for a data set containing only a single field.
- The company, as the owner of the data set, can generate a data set consisting entirely of watermark data by traversing incrementally from 12300000000, which yields the watermark numbers 12300023180, 12300034919, 12300078978, 12300088650, 12300393151, 12300421487, 12300600146, 12300814686, 12300857998, 12301037953...
- The regulatory department encodes the secret information and each mobile phone number as UTF-8 strings and substitutes them into HMAC-SHA-256 to calculate the message authentication codes shown in Table 1 below (only 10 records are shown due to space limitations):
- The company also informed the regulatory authority that the specific mathematical property of the watermark data is that the message authentication code starts with 16 consecutive 1 bits. For any random data, the probability of satisfying this property is 2⁻¹⁶, so the probability that a piece of data becomes watermark data is only about 1.5 in 100,000.
- Based on Table 1, the regulatory authority verified that all the exposed mobile phone numbers are watermark data, so there is reason to believe these numbers are test numbers pre-selected by the company rather than numbers allocated to real individual users and then leaked in a security incident. If a data set had been leaked in a security incident, it would be infeasible to find, after the fact, secret information that makes all of its data satisfy the specific mathematical property (and thus be confirmed as watermark data).
- The secret information provided by the company, "China XXXX Co., Ltd. February 2024 Test Special", itself also states the ownership of the data set, and the two facts combined prove that this batch of structured data consists of pre-selected test numbers.
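The "infeasible after the fact" argument of Embodiment 1 can be quantified as follows, assuming (as an idealization) that the MAC bits behave like independent, uniformly random bits. Here N is the number of exposed records checked and S the number of secrets an impostor might try.

$$
p = \Pr[\text{MAC starts with 16 ones}] = 2^{-16} \approx 1.53 \times 10^{-5}
$$

$$
\Pr[\text{all } N \text{ records pass under one secret}] = p^{N} = 2^{-16N}, \qquad N = 10:\; 2^{-160} \approx 6.8 \times 10^{-49}
$$

$$
\mathbb{E}[\text{number of working secrets among } S \text{ tried}] = S \cdot 2^{-16N}, \qquad S = 2^{64},\; N = 10:\; 2^{-96}
$$

Even an enormous brute-force effort is therefore expected to find no secret that retroactively marks ten leaked numbers, whereas the genuine owner found them in advance by searching forward from the secret.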
- Embodiment 2: providing ownership proof for a data set containing multiple fields.
- Company A's customer data includes at least 3 fields (ID card …). Company A wants to provide ownership proof for its customer data set, so each quarter it selects a secret message and generates and inserts watermark data at a ratio of 5% into that quarter's transaction customer data set (1 watermark record per 19 business records, at a randomly selected position).
- Company B cannot identify these watermark records without the secret information and is not even aware of their existence.
- Table 2 is a sample of the watermark data inserted by Company A into its customer data set in the first quarter of 2025 (every field may be synthetic); the fields of each row can be concatenated and then encoded to calculate the message authentication code according to the aforementioned "simple method":
- Company A discloses to the law enforcement agency that the secret information for a data set is "China Party A Co., Ltd. 2025 first quarter test special" and that the specific mathematical property of the watermark data is that the message authentication code ends with 20 consecutive 0 bits. For any piece of data, the probability of satisfying this property is 2⁻²⁰, so the probability that a random record becomes watermark data is less than one in a million.
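A multi-field record can be checked against the Embodiment 2 property as sketched below. The "simple method" is not defined in this section, so the concatenation (optional separator, UTF-8, HMAC-SHA-256 keyed with the secret) is an assumption, and `record_mac`, `has_trailing_zeros` and `is_watermark_record` are hypothetical names.

```python
import hashlib
import hmac
from typing import Sequence


def record_mac(secret: str, fields: Sequence[str], sep: str = "") -> bytes:
    # Assumed "simple method": concatenate the fields, UTF-8 encode, then HMAC-SHA-256
    message = sep.join(fields).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).digest()


def has_trailing_zeros(mac: bytes, k: int) -> bool:
    # Property of Embodiment 2: the last k bits of the MAC are all 0
    return int.from_bytes(mac, "big") & ((1 << k) - 1) == 0


def is_watermark_record(secret: str, fields: Sequence[str], k: int = 20) -> bool:
    return has_trailing_zeros(record_mac(secret, fields), k)
```

With an empty separator, records such as ("ab", "c") and ("a", "bc") produce the same message; an owner adopting this scheme may prefer a separator character that cannot occur inside a field.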
- The watermark data is hidden among the business data so that the two cannot be distinguished by format, content or other characteristics; without the specific secret information, no one can determine whether a record is business data or watermark data.
- The present application confirms ownership of a structured data set on the basis of the proportion formed by the watermark data and the business data. Because it also introduces secret information, it is harder to crack than confirming ownership directly through watermark data and better protects the legitimate rights and interests of the data set owner.
- The ownership verification device for a structured data set proposed in an embodiment of the present application includes:
- a first acquisition unit 910, used to acquire a structured data set;
- the structured data set includes a plurality of structured data items, each of which is business data or watermark data, and the business data and the watermark data meet the same predetermined data format;
- a second acquisition unit 920, used to acquire, from the target object to be verified, the secret information corresponding to the structured data set, the proportion label of the watermark data and the specific mathematical property of the watermark data. The specific mathematical property constrains the check value, calculated from the secret information and the watermark data according to the preset rule, to meet the preset mathematical characteristic; under the preset rule, the probability that the check value calculated from the secret information and any data satisfying the predetermined data format meets the preset mathematical characteristic is less than a first threshold, and the proportion label is greater than the first threshold;
- a processing unit 930, configured to identify watermark data in the structured data set based on the secret information and the specific mathematical property;
- a statistical unit 940, used to count the proportion of watermark data in the structured data set and to determine the ownership relationship between the target object and the structured data set from the proportion result and the proportion label.
- The processing unit 930 is further used to calculate a first verification value from the secret information and a structured data item according to the preset rule, to determine, based on the specific mathematical property, whether the first verification value meets the preset mathematical characteristic, and, if it does, to determine that the structured data item is watermark data.
- The statistical unit 940 is further configured to calculate a difference value between the proportion result and the proportion label; if the difference value is less than a second threshold, the target object is determined to be the owner of the structured data set.
- In one variant, the statistical unit 940 calculates the difference between the proportion result and the proportion label and takes its absolute value as the difference value.
- In another variant, the statistical unit 940 calculates the difference between the proportion result and the proportion label and takes the ratio of its absolute value to the proportion label as the difference value.
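The verification flow of units 930 and 940 can be sketched as follows, reusing the assumed HMAC-SHA-256 rule and leading-ones property from Embodiment 1; `is_watermark` and `verify_ownership` are hypothetical names, and the `relative` flag selects between the absolute and the relative difference value described above. Because business data passes the check with probability below the first threshold (2⁻ᵏ here), a data set not marked with this secret shows a proportion near 2⁻ᵏ, far below the proportion label.

```python
import hashlib
import hmac
from typing import Sequence


def is_watermark(secret: str, record: str, k: int) -> bool:
    # Assumed preset rule: HMAC-SHA-256 keyed with the secret over the record (UTF-8);
    # assumed property: the first k bits of the 256-bit MAC are all 1
    mac = hmac.new(secret.encode("utf-8"), record.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") >> (256 - k) == (1 << k) - 1


def verify_ownership(records: Sequence[str], secret: str, k: int,
                     proportion_label: float, second_threshold: float,
                     relative: bool = False) -> bool:
    if not records:
        return False
    # Processing unit: identify watermark records via the first verification value
    hits = sum(is_watermark(secret, r, k) for r in records)
    # Statistical unit: proportion result of watermark data in the data set
    proportion = hits / len(records)
    # Difference value: |result - label|, or |result - label| / label in the relative variant
    diff = abs(proportion - proportion_label)
    if relative:
        diff /= proportion_label
    return diff < second_threshold
```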
- A structured data set processing device, comprising:
- an information acquisition unit, used to acquire an original data set and ownership mark information. The original data set is used to store structured data meeting a predetermined data format, and the ownership mark information includes secret information, a ratio label and a specific mathematical property. The specific mathematical property constrains the check value, calculated from the secret information and the watermark data using a preset rule, to meet a preset mathematical feature; under the preset rule, the probability that the check value calculated from the secret information and any data meeting the predetermined data format meets the preset mathematical feature is less than a first threshold, and the ratio label is greater than the first threshold;
- a first determining unit, used to determine watermark data from the data satisfying the predetermined data format according to the secret information and the specific mathematical property;
- a data set acquisition unit, used to add the target amount of watermark data to the original data set to obtain a target data set.
- The information acquisition unit is further used to acquire associated information corresponding to the original data set, the associated information characterizing the ownership of the original data set, and to generate the secret information from the associated information.
- The data set acquisition unit is further used to determine the target number of insertion positions in the original data set and to add each watermark data item at one insertion position to obtain the target data set.
- The data set acquisition unit is further used to process the original data set with a random insertion algorithm, a group insertion algorithm, a time-series mixing algorithm or a hybrid encryption algorithm to determine the target number of insertion positions.
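The group insertion algorithm is not defined in this section; one plausible reading, modeled on Embodiment 2 (one watermark per 19 business records at a random position), is sketched below. The function `group_insert` and its behavior are assumptions: the business data is split into consecutive groups of `group_size` records, and one watermark is inserted at a uniformly random offset within each group. Surplus watermarks beyond the number of groups are ignored in this sketch.

```python
import random
from typing import List, Sequence


def group_insert(business: Sequence[str], watermarks: Sequence[str], group_size: int,
                 rng: random.Random) -> List[str]:
    out: List[str] = []
    wm_iter = iter(watermarks)
    for start in range(0, len(business), group_size):
        group = list(business[start:start + group_size])
        wm = next(wm_iter, None)
        if wm is not None:
            # Random offset 0..len(group) inclusive, so the watermark may also end the group
            group.insert(rng.randint(0, len(group)), wm)
        out.extend(group)
    return out


biz = [f"biz{i}" for i in range(95)]
wm = [f"wm{i}" for i in range(5)]
out = group_insert(biz, wm, 19, random.Random(7))
print(len(out))  # 100
```

Compared with fully random insertion, this variant bounds the gap between watermarks, so any sufficiently long contiguous excerpt of the data set still carries roughly the labeled proportion.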
- An embodiment of the present application provides an electronic device, including:
- at least one processor 1010;
- at least one memory 1020, used to store at least one program;
- when the at least one program is executed by the at least one processor 1010, the at least one processor 1010 implements the method for verifying ownership of a structured data set or the method for processing a structured data set.
- The contents of the above method embodiments all apply to the electronic device embodiments.
- The functions implemented by the electronic device embodiments are the same as those of the above method embodiments, and so are the beneficial effects achieved.
- An embodiment of the present application also provides a computer-readable storage medium storing a program executable by the processor 1010.
- When the program is executed by the processor 1010, it performs the above method for verifying ownership of a structured data set or the method for processing a structured data set.
- The contents of the above method embodiments all apply to the computer-readable storage medium embodiments.
- The functions implemented by the computer-readable storage medium embodiments are the same as those of the above method embodiments, and so are the beneficial effects achieved.
- The functions/operations mentioned in the block diagrams may occur out of the order shown in the operational diagrams.
- For example, two boxes shown in succession may actually be executed substantially simultaneously, or sometimes in the reverse order.
- The embodiments presented and described in the flowcharts of the present application are provided by way of example to give a more comprehensive understanding of the technology. The disclosed method is not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of the operations is changed and in which sub-operations described as part of a larger operation are executed independently.
- If the functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- The computer software product is stored in a storage medium and includes instructions that cause a device (a personal computer, server, network device, etc.) to execute all or part of the steps of the embodiments of the present application.
- The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.
- The logic and/or steps represented in the flowcharts or otherwise described herein can, for example, be considered an ordered list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, device or apparatus (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, device or apparatus).
- A "computer-readable medium" can be any device that can contain, store, communicate, propagate or transmit a program for use by, or in conjunction with, an instruction execution system, device or apparatus.
- Computer-readable media include: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM).
- The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for example by optically scanning the paper or other medium, then edited, interpreted or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Editing Of Facsimile Originals (AREA)
- Storage Device Security (AREA)
- Image Processing (AREA)
Abstract
The present application relates to an ownership verification method and a processing method for a structured data set, a device and a medium. The ownership verification method comprises: acquiring a structured data set; acquiring, from a target object to be verified, secret information corresponding to the structured data set, a proportion label of watermark data, and specific mathematical properties corresponding to the watermark data, wherein, by means of a preset rule, the probability that a check value calculated from any data satisfying a predetermined data format and the secret information satisfies a preset mathematical feature is less than a first threshold, and the proportion label is greater than the first threshold; identifying the watermark data in the structured data set on the basis of the secret information and the specific mathematical properties; and collecting statistics on a proportion result of the watermark data in the structured data set, and determining the ownership relationship between the target object and the structured data set on the basis of the proportion result and the proportion label.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311467146.2 | 2023-11-06 | ||
| CN202311467146.2A CN117521038B (zh) | 2023-11-06 | 2023-11-06 | 结构化数据集的权属验证方法、处理方法、设备与介质 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025098109A1 true WO2025098109A1 (fr) | 2025-05-15 |
Family
ID=89744868
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/125335 Pending WO2025098109A1 (fr) | 2023-11-06 | 2024-10-16 | Procédé de vérification de propriété et procédé de traitement pour ensemble de données structurées, dispositif et support |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117521038B (fr) |
| WO (1) | WO2025098109A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120277641A (zh) * | 2025-06-06 | 2025-07-08 | 南京信息工程大学 | 一种基于互信息的联邦学习版权保护方法及系统 |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117521038B (zh) * | 2023-11-06 | 2025-09-05 | 中国电信股份有限公司 | 结构化数据集的权属验证方法、处理方法、设备与介质 |
| CN119150264A (zh) * | 2024-11-19 | 2024-12-17 | 杭州半云科技有限公司 | 一种无损数据内容的数据水印植入及识别方法 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050055554A1 (en) * | 2003-05-23 | 2005-03-10 | Radu Sion | Method and system for rights assessment over digital data through watermarking |
| CN107832626A (zh) * | 2017-11-30 | 2018-03-23 | 中国人民解放军国防科技大学 | 一种面向数据流通的结构化数据确权方法 |
| CN109740316A (zh) * | 2018-12-27 | 2019-05-10 | 北京三未信安科技发展有限公司 | 一种动态水印嵌入、验证方法及系统和动态水印处理系统 |
| CN117521038A (zh) * | 2023-11-06 | 2024-02-06 | 中国电信股份有限公司 | 结构化数据集的权属验证方法、处理方法、设备与介质 |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112948895A (zh) * | 2019-12-10 | 2021-06-11 | 航天信息股份有限公司 | 数据的水印嵌入方法、水印溯源方法及装置 |
| US11699209B2 (en) * | 2020-10-22 | 2023-07-11 | Huawei Cloud Computing Technologies Co., Ltd. | Method and apparatus for embedding and extracting digital watermarking for numerical data |
| CN116702103A (zh) * | 2023-06-19 | 2023-09-05 | 建信金融科技有限责任公司 | 数据库水印处理方法、数据库水印溯源方法及装置 |
- 2023-11-06: CN, CN202311467146.2A, patent CN117521038B (zh), Active
- 2024-10-16: WO, PCT/CN2024/125335, patent WO2025098109A1 (fr), Pending
Non-Patent Citations (1)
| Title |
|---|
| WANG ZHEN, LI JIAN-MIN, ZHOU NAN-RUN, LIN ZHEN-RONG: "Digital Watermarking Algorithm for Relational Database based on Sharing and Grouping", COMMUNICATIONS TECHNOLOGY, vol. 42, no. 4, 10 April 2009 (2009-04-10), CN, pages 132-134, 138, XP093314188, ISSN: 1002-0802 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117521038A (zh) | 2024-02-06 |
| CN117521038B (zh) | 2025-09-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24887722; Country of ref document: EP; Kind code of ref document: A1 |