JP2013152670A

JP2013152670A - Database disturbance parameter setting device, database disturbance system and method, and database disturbance device

Info

Publication number: JP2013152670A
Application number: JP2012013873A
Authority: JP
Inventors: Masaru Igarashi; 大五十嵐; Koji Senda; 浩司千田; Akira Kikuchi; 亮菊池; Hiroki Hamada; 浩気濱田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2012-01-26
Filing date: 2012-01-26
Publication date: 2013-08-08
Anticipated expiration: 2032-01-26
Also published as: JP5639094B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of achieving Pk-anonymity even if an attacker can view arbitrary M attributes among A attributes. SOLUTION: A database disturbance parameter setting device uses a database disturbance parameter determination part 11 to determine a parameter p_a used by a database disturbance part 21. The database disturbance part 21 disturbs a table by disturbing each attribute value v based on a function A_a(p_a)_{v,v'} determined by the prescribed parameter p_a, yielding an attribute value v' after disturbance. Here, |N| is the number of records, M is a prescribed natural number not more than A-1, α = ((k-1)/(N-1))^{1/M}, ess inf * is the essential infimum of *, v is the attribute value of each attribute a included in the table, V_a is the domain of the attribute values u, v before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance.

Description

The present invention relates to a technique for performing data mining while protecting privacy.

A database disturbance technique that satisfies so-called Pk-anonymity, and a parameter determination technique used with that database disturbance technique, have been proposed (see, for example, Non-Patent Document 1). Specifically, Non-Patent Document 1 describes a database disturbance technique and a parameter determination technique for realizing Pk-anonymity when an attacker can see all of the A attributes.

Pk-anonymity is the property that each record in the database cannot be linked to the individual corresponding to that record with a probability of 1/k or more.

[Non-Patent Document 1] Igarashi and two others, "Randomization method that satisfies k-anonymity in numerical attributes", CSS 2011, 2011

However, Non-Patent Document 1 does not describe a technique for realizing Pk-anonymity when, with M being an integer not more than A-1, an attacker can view any M of the A attributes.

An object of the present invention is to provide a database disturbance parameter setting device, a database disturbance system and method, and a database disturbance device for realizing Pk-anonymity when an attacker can view any M of the A attributes.

A database disturbance parameter setting device according to one aspect of the present invention is a database disturbance parameter determination device that determines a parameter p_a used by a database disturbance device, under the following definitions. A table includes a plurality of records; A is a predetermined integer of 2 or more; each record includes a record identifier and A attribute values; k is a security parameter; |N| is the number of records; M is a predetermined natural number not more than A-1; α = ((k-1)/(N-1))^{1/M}; ess inf · is the essential infimum of ·; and, for each attribute a included in the table, v is an attribute value of the attribute a, V_a is the domain of the attribute values v, u before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance. The database disturbance device disturbs the table by disturbing each attribute value v based on a function A_a(p_a)_{v,v'} determined by the predetermined parameter p_a, yielding the attribute value v' after disturbance. The database disturbance parameter determination device includes a parameter determination unit that determines a parameter p_a satisfying the following formula.

A database disturbance system according to one aspect of the present invention includes the above database disturbance parameter determination device and the above database disturbance device.

A database disturbance device according to one aspect of the present invention is defined as follows. A table includes a plurality of records; A is a predetermined integer of 2 or more; each record includes a record identifier and at least one attribute value; k is a security parameter; |N| is the number of records; M is a predetermined natural number not more than A-1; α = ((k-1)/(N-1))^{1/M}; ess inf · is the essential infimum of ·; and, for each of M or fewer attributes included in the table, v is an attribute value of the attribute a, V_a is the domain of the attribute values v, u before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance. The database disturbance device includes a disturbance unit that disturbs the table by disturbing each attribute value v based on a function A_a(p_a)_{v,v'} determined by the predetermined parameter p_a, yielding the attribute value v' after disturbance. The parameter p_a satisfies the relationship of the following formula.

Pk-anonymity can be realized even when an attacker can view any M of the A attributes.

FIG. 1 is a block diagram for explaining the database disturbance system of the first embodiment.
FIG. 2 is a flowchart for explaining the database disturbance system of the first embodiment.
FIG. 3 is a diagram for explaining an example of a database to be disturbed.
FIG. 4 is a diagram for explaining an example of a database to be disturbed.

Embodiments of the present invention will be described below with reference to the drawings.

[First embodiment]
As illustrated in FIG. 1, the database disturbance system of the first embodiment includes a data provider device 2, an anonymous data server 1, and an analysis user device 3.

The anonymous data server 1 is a server that stores anonymized data. The anonymous data server 1 includes, for example, a parameter determination unit 11 and a storage unit 12. The parameter determination unit 11 corresponds to the database disturbance parameter determination device in the claims.

The data provider device 2 is the entity that deposits data in the anonymous data server 1. The data provider device 2 is, for example, a computer such as a PC or a portable information terminal owned by a user who wants to deposit data in the anonymous data server 1. Two or more data provider devices 2 may exist. The data provider device 2 includes, for example, a disturbance unit 21. The disturbance unit 21 corresponds to the database disturbance device in the claims.

The analysis user device 3 is a device that performs aggregation processing based on anonymized data. Two or more analysis user devices 3 may also exist.

First, the database to be disturbed will be described. As illustrated in FIGS. 3 and 4, the database to be disturbed is composed of a plurality of records.

Each record is composed of a record identifier and at least one attribute value. The record identifier is an identifier that identifies an individual, that is, a so-called record ID. The record identifier is, for example, a name or an ID number corresponding to a name. In this embodiment, the record identifier is an identifier that identifies the data provider or the data provider device 2.

Each attribute value is a vector included in a subset V of the n-dimensional real vector space, that is, a so-called numerical attribute value. n is an integer of 1 or more. When n = 1 and the attribute is, for example, "midterm test score" or "final test score", the attribute value is an integer from 0 to 100.

Each attribute value may also be a so-called category attribute value. A category attribute value is an attribute value, such as gender, that, unlike a numerical attribute value, can take only a limited number of values.

FIG. 3 illustrates a database in which every attribute is a numerical attribute, whereas FIG. 4 illustrates a database that includes both numerical attributes and category attributes.

<Step S1>
First, the parameter determination unit 11 of the anonymous data server 1 determines a parameter p_a satisfying the following formula (step S1). The determined parameter p_a is transmitted to each data provider device 2. Let A be the number of attribute types; A is a predetermined integer of 2 or more. The parameter determination unit 11 determines the parameter p_a for each attribute a (a = 1, 2, ..., A). The parameter p_a is calculated by, for example, the so-called bisection method.
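The bisection method mentioned for step S1 can be sketched as follows. This is only an illustration in Python, not the procedure of the specification: `satisfied` is a hypothetical placeholder for the condition on p_a, and the sketch assumes that this condition is false at one end of the search interval, true at the other, and switches only once in between. The toy condition in the usage lines is likewise an assumption chosen only to make the example runnable.

```python
def bisect_threshold(satisfied, lo, hi, tol=1e-9):
    """Approximate the smallest p in [lo, hi] for which satisfied(p) is True.

    Assumes satisfied(lo) is False, satisfied(hi) is True, and the predicate
    switches from False to True exactly once on the interval.
    """
    if satisfied(lo) or not satisfied(hi):
        raise ValueError("expected satisfied(lo) == False and satisfied(hi) == True")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if satisfied(mid):
            hi = mid      # condition already holds: move the upper end down
        else:
            lo = mid      # condition fails: move the lower end up
    return hi


# Toy stand-in for the real condition (hypothetical): p * p >= 2.
p = bisect_threshold(lambda p: p * p >= 2.0, 0.0, 2.0)
print(p)  # close to the square root of 2
```

If the real condition holds for small p and fails for large p instead, the same loop applies with the two branches swapped.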

Here, k is a security parameter, |N| is the number of records, M is a predetermined natural number not more than A-1, and α = ((k-1)/(N-1))^{1/M}. In addition, ess inf · is the essential infimum of ·.
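As a small worked example, α can be computed directly from k, the number of records, and M. The function and argument names below are illustrative assumptions, and the N in the formula is taken to be the number of records.

```python
def alpha(k, n_records, m):
    """Return alpha = ((k - 1) / (N - 1)) ** (1 / M)."""
    if n_records < 2:
        raise ValueError("the number of records must be at least 2")
    if m < 1:
        raise ValueError("M must be a natural number")
    if k < 1:
        raise ValueError("k must be at least 1")
    return ((k - 1) / (n_records - 1)) ** (1.0 / m)


# k = 2, 101 records, attacker sees M = 2 attributes:
# (1 / 100) ** (1 / 2), i.e. approximately 0.1
print(alpha(k=2, n_records=101, m=2))
```

For fixed k and number of records with k smaller than the number of records, a larger M gives a value of α closer to 1.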

Let χ be the domain of a function f(x). The essential infimum ess inf f(x) of the function f(x) can be written concretely as follows. Let μ({f < b}) be the measure (for example, the area or volume) of the region where f(x) < b. R in the following formula denotes the set of real numbers.
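For reference, the standard definition of the essential infimum, written with the symbols introduced in this paragraph (χ, μ, b, R), is given below. It is the textbook form and is assumed to match the formula intended here.

```latex
\operatorname*{ess\,inf}_{x \in \chi} f(x)
  = \sup \left\{\, b \in \mathbb{R} \;\middle|\; \mu\bigl(\{\, x \in \chi : f(x) < b \,\}\bigr) = 0 \,\right\}
```

In words, it is the largest b such that f falls below b only on a set of measure zero.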

V_a is the domain of the attribute values v, u of the attribute a before disturbance, and V'_a is the domain of the attribute values v', u' of the attribute a after disturbance. A_a(p_a)_{v,v'} is a function determined by the predetermined parameter p_a and represents the probability that the attribute value v before disturbance becomes the attribute value v' after disturbance. The same applies to A_a(p_a)_{u,v'}, A_a(p_a)_{v,u'}, and A_a(p_a)_{u,u'}.

When the attribute a is a numerical attribute and the function A_a(p_a)_{v,v'} is a probability density function that is a Laplace distribution with variance 2p_a^2 defined by the following formula,

the parameter determination unit 11 specifically determines a parameter p_a satisfying the following formula. μ is, for example, 0. In the following formula, ||·||_1 is the L1 norm of ·, and the base of log is Napier's number e.

When the attribute a is a category attribute and the function A_a(p_a)_{v,v'} is a function that retains the attribute value v with a predetermined probability (p_a + (1-p_a)/|V_a|) and replaces it with an attribute value v' of the attribute a other than v with a predetermined probability (1-p_a)/|V_a|, the parameter determination unit 11 determines a parameter p_a satisfying the following formula. |V_a| is the number of elements of the set V_a.

<Step S2>
The disturbance unit 21 of each data provider device 2 disturbs the attribute value of each attribute to be disturbed, based on the function A_a(p_a)_{v,v'} determined by the parameter p_a (step S2). In this embodiment, the disturbance unit 21 of each data provider device 2 disturbs the attribute value v of each attribute a that it holds, based on the function A_a(p_a)_{v,v'} determined by the parameter p_a, to obtain the attribute value v' after disturbance. Each attribute value v' after disturbance is transmitted to the anonymous data server 1.

When the attribute a to be disturbed is a numerical attribute, the function A_a(p_a)_{v,v'} is a probability density function. For example, the function A_a(p_a)_{v,v'} is a probability density function that is a Laplace distribution with variance 2p_a^2 defined by the following formula. ||·||_1 is the L1 norm of ·. For example, μ = 0.
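For reference, a Laplace density consistent with this description (variance 2p_a^2, L1 norm, location μ) has the standard form below. This is a reconstruction rather than a quotation of the specification's formula, under the assumption that the n coordinates are independent and share the scale p_a, so that each coordinate has variance 2p_a^2.

```latex
A_a(p_a)_{v,v'} = \left( \frac{1}{2 p_a} \right)^{n}
  \exp\!\left( - \frac{\lVert v' - v - \mu \rVert_1}{p_a} \right),
  \qquad v, v' \in \mathbb{R}^n
```

For n = 1 this reduces to the usual one-dimensional Laplace density with scale p_a centered at v + μ.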

When the attribute a to be disturbed is a numerical attribute, disturbance based on the function A_a(p_a)_{v,v'} means adding a value drawn according to the probability density function A_a(p_a)_{v,v'} to the attribute value v of the attribute a before disturbance to obtain the attribute value v' after disturbance. That is, the attribute value v' after disturbance = the attribute value v before disturbance + a value drawn according to the probability density function A_a(p_a)_{v,v'}.

The following describes "a value according to the probability density function f" and "a value according to the Laplace distribution". To simplify the notation, the probability density function is written as f here; f may be regarded as the same as the probability density function A_a(p_a)_{v,v'} above.

1. "A value according to the probability density function f"
(1) When the domain of the probability density function f and the attribute value are one-dimensional
(i) Obtain the cumulative distribution function F(x) = ∫_{-∞}^{x} f(x') dx'.
(ii) Obtain the inverse function F^{-1} of the cumulative distribution function F(x).
(iii) Generate a uniform random number r on the interval [0, 1].
(iv) Output F^{-1}(r) as "a value according to the probability density function f".

When the cumulative distribution function F(x) and the inverse function F^{-1} are available in closed form, F^{-1}(r) may be calculated from those expressions; otherwise, F^{-1}(r) may be calculated numerically.
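Steps (i) to (iv) for the one-dimensional case, with F^{-1}(r) obtained numerically, might look like the following Python sketch. The function names, the search interval [lo, hi], and the exponential distribution used in the demonstration are assumptions made for illustration only; the inversion here uses bisection, which is one possible numerical method among others.

```python
import math
import random


def sample_by_inversion(cdf, lo, hi, rng, tol=1e-9):
    """Return one value distributed according to the CDF `cdf`.

    Assumes `cdf` is continuous and non-decreasing and that practically all
    of the probability mass lies in [lo, hi].
    """
    r = rng.random()              # step (iii): uniform random number r
    a, b = lo, hi
    while b - a > tol:            # steps (ii) and (iv): invert F numerically
        mid = (a + b) / 2.0
        if cdf(mid) < r:
            a = mid
        else:
            b = mid
    return (a + b) / 2.0


def exponential_cdf(x):
    """CDF of the unit-rate exponential distribution (demo only, step (i))."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


rng = random.Random(0)
samples = [sample_by_inversion(exponential_cdf, 0.0, 50.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to the true mean 1.0
```

When F^{-1} is known in closed form, the loop can be replaced by a direct evaluation of F^{-1}(r).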

(2) When the domain of the probability density function f and the attribute value are n-dimensional
1. Perform the following (i) and (ii) for each of i = 0, ..., n-1.

(i) Fix x_0 through x_{i-1}, integrate over x_{i+1} through x_{n-1}, and obtain the probability density function f_i in which only x_i remains as a variable.

(ii) Since the domain of the probability density function f_i is one-dimensional, calculate "a value according to the probability density function f_i" by the same method as in "(1) When the domain of the probability density function f and the attribute value are one-dimensional" above.

Calculating "a value according to the probability density function f_i" for each of i = 0, ..., n-1 yields n "values according to the probability density function f_i".

When the probability density function is a Laplace distribution, the procedure is as follows.

2. "A value according to the Laplace distribution"
(1) When the domain of the Laplace distribution and the attribute value are one-dimensional
(i) Generate a uniform random number r on the interval [0, 1] and a uniform random number b on the interval (0, 1).
(ii) Output (-1)^b σ log r + μ as "a value according to the Laplace distribution".

(2) When the domain of the Laplace distribution and the attribute value are n-dimensional
(i) Calculate n "values according to the Laplace distribution" x_0, x_1, ..., x_{n-1} by the same method as in "(1) When the domain of the Laplace distribution and the attribute value are one-dimensional" above.
(ii) Output these x_0, x_1, ..., x_{n-1} as "a value according to the Laplace distribution".
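The Laplace procedure above, together with the additive disturbance of a numerical attribute value, can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the specification: b is treated as a random bit taking the value 0 or 1 so that (-1)^b is a random sign, r is drawn from (0, 1] so that log r is defined, and the scale σ is set to p_a, which gives variance 2p_a^2. All function names are illustrative.

```python
import math
import random


def laplace_value(sigma, mu, rng):
    """One-dimensional case: return (-1)^b * sigma * log(r) + mu."""
    r = 1.0 - rng.random()        # uniform on (0, 1], avoids log(0)
    b = rng.randrange(2)          # random bit 0 or 1 (assumption of this sketch)
    return (-1) ** b * sigma * math.log(r) + mu


def laplace_vector(n, sigma, mu, rng):
    """n-dimensional case: n independent one-dimensional Laplace values."""
    return [laplace_value(sigma, mu, rng) for _ in range(n)]


def disturb_numeric(v, p_a, rng, mu=0.0):
    """Return v' = v + noise, with Laplace noise of scale p_a per coordinate."""
    noise = laplace_vector(len(v), p_a, mu, rng)
    return [x + e for x, e in zip(v, noise)]


rng = random.Random(1)
print(disturb_numeric([72.0, 65.0], p_a=3.0, rng=rng))
```

Because log r is never positive, the random sign decides on which side of μ the value falls, and -σ log r supplies an exponentially distributed magnitude.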

When the attribute a to be disturbed is a category attribute, the function A_a(p_a)_{v,v'} is a function that retains the attribute value v with a predetermined probability (p_a + (1-p_a)/|V_a|) and replaces it with an attribute value v' of the attribute a other than v with a predetermined probability (1-p_a)/|V_a|. Replacing the attribute value v with an attribute value v' other than v means, for example, that when the attribute a is gender and the attribute value v is "male", the attribute value "male" is replaced with the attribute value "female". For details of retention-replacement perturbation with retention probability ρ, see Reference 1.
[Reference 1] JP 2011-100116 A
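The retention-replacement disturbance of a category attribute can be sketched in Python as below. The sketch reads the stated probabilities as follows, which is an interpretation: with probability p_a the value is kept, and otherwise a value is drawn uniformly from the whole domain V_a, so the original value survives with total probability p_a + (1-p_a)/|V_a| and each other value appears with probability (1-p_a)/|V_a|. The function name and the example domain are illustrative.

```python
import random


def disturb_category(v, domain, p_a, rng):
    """Retention-replacement disturbance of one category attribute value.

    `domain` is the list of all possible values V_a and must contain v.
    """
    if rng.random() < p_a:
        return v                  # kept outright with probability p_a
    return rng.choice(domain)     # otherwise uniform over V_a, v included


rng = random.Random(2)
genders = ["male", "female"]
disturbed = [disturb_category("male", genders, 0.5, rng) for _ in range(20000)]
# Expected retention rate: 0.5 + (1 - 0.5) / 2 = 0.75
print(disturbed.count("male") / len(disturbed))
```

Drawing the replacement from the whole domain, rather than from the other values only, is what produces the (1-p_a)/|V_a| term in the retention probability.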

Disturbing the attribute values in this way realizes Pk-anonymity when an attacker can view any M of the A attributes. The proof is omitted here.

<Step S3>
The anonymous data server 1 stores each attribute value v' after disturbance received from each data provider device 2 in the storage unit 12 in association with that data provider device 2 (step S3). That is, a plurality of pairs, each consisting of the record identifier of a data provider device 2 and the attribute values v' after disturbance received from that data provider device 2, are stored in the storage unit 12 as an anonymized database.

<Step S4>
The analysis user device 3 transmits designation information, which is information about M attributes designated in advance, to the anonymous data server 1 (step S4). The M attributes may be designated in advance by the analysis user who operates the analysis user device 3, or by the analysis user itself.

<Step S5>
The anonymous data server 1 extracts, from the database stored in the storage unit 12, the columns of attribute values of the M attributes specified by the received designation information, and transmits them to the analysis user device 3 (step S5). For example, when the database is the one illustrated in FIG. 3, M = 2, and the two attributes "midterm test score" and "final test score" are designated, the column a1 composed of the attribute values of the attribute "midterm test score" and the column a2 composed of the attribute values of the attribute "final test score" are transmitted to the analysis user device 3.
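The column extraction of step S5 can be illustrated with a short Python sketch. The in-memory table layout, the attribute names, and the stored values are hypothetical and serve only to show the selection of M designated columns from the disturbed database.

```python
def select_columns(table, attributes):
    """Return, for each designated attribute, the column of its stored values."""
    return {name: [record[name] for record in table] for name in attributes}


# Hypothetical disturbed database with three attributes per record.
stored = [
    {"id": "u1", "midterm": 71.3, "final": 64.8, "homework": 9.1},
    {"id": "u2", "midterm": 58.0, "final": 90.2, "homework": 7.4},
]

# Designation information naming M = 2 attributes.
columns = select_columns(stored, ["midterm", "final"])
print(columns)
```

Only the designated columns are returned; the remaining attribute and the record identifiers stay on the server in this sketch.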

<Step S6>
The aggregation unit 31 of the analysis user device 3 performs aggregation processing using the received M columns of attribute values (step S6). The aggregation unit 31 estimates an aggregation result such as a cross tabulation using, for example, the iterative Bayesian method described in Reference 2.

[Reference 2]
Igarashi and two others, "Efficient privacy-preserving cross tabulation applicable to multi-valued attributes", Computer Security Symposium 2008

[Modifications, etc.]
The disturbance unit 21 may be provided in the anonymous data server 1. In this case, the disturbance unit 21 provided in the anonymous data server 1 disturbs the attribute values received from each data provider device 2 in the same manner as described above and stores them in the storage unit 12.

The parameter determination unit 11 may also be provided in each data provider device 2.

Further, the data provider device 2 and the anonymous data server 1 may be provided in the same device.

In addition, the present invention is not limited to the above-described embodiment. For example, the various processes described above may be executed not only in time series in the order described but also in parallel or individually, depending on the processing capability of the device that executes them or as needed.

When the above-described configuration is realized by a computer, the processing content of each unit that each device should have is described by a program. Each unit is then realized on the computer by executing this program on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory.

Needless to say, other modifications may be made as appropriate without departing from the spirit of the present invention.

DESCRIPTION OF SYMBOLS
1 Anonymous data server
11 Parameter determination unit
12 Storage unit
2 Data provider device
21 Disturbance unit
3 Analysis user device
31 Aggregation unit

Claims

A database disturbance parameter determination device that determines a parameter p_a used by a database disturbance device, wherein a table includes a plurality of records, A is a predetermined integer of 2 or more, each record includes a record identifier and A attribute values, k is a security parameter, |N| is the number of records, M is a predetermined natural number not more than A-1, α = ((k-1)/(N-1))^{1/M}, ess inf · is the essential infimum of ·, and, for each attribute a included in the table, v is an attribute value of the attribute a, V_a is the domain of the attribute values v, u before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance, the database disturbance device disturbing the table by performing disturbance based on a function A_a(p_a)_{v,v'} determined by the predetermined parameter p_a to obtain the attribute value v' after disturbance, the database disturbance parameter determination device comprising:
a parameter determination unit that determines a parameter p_a satisfying the following formula.

The database disturbance parameter determination device according to claim 1, wherein,
when the attribute a is a numerical attribute and the function A_a(p_a)_{v,v'} is a probability density function that is a Laplace distribution with variance 2p_a^2 defined by the following formula, with ||·||_1 being the L1 norm of ·,

the parameter determination unit determines a parameter p_a satisfying the following formula,

and, with |V_a| being the number of elements of the set V_a, when the attribute a is a category attribute and the function A_a(p_a)_{v,v'} is a function that retains the attribute value v with a predetermined probability (p_a + (1-p_a)/|V_a|) and replaces it with an attribute value of the attribute a other than v with a predetermined probability (1-p_a)/|V_a|, the parameter determination unit determines a parameter p_a satisfying the following formula.

A database disturbance system comprising:
the database disturbance parameter determination device according to claim 1 or 2; and
the database disturbance device.

A database disturbance device, wherein a table includes a plurality of records, A is a predetermined integer of 2 or more, each record includes a record identifier and at least one attribute value, k is a security parameter, |N| is the number of records, M is a predetermined natural number not more than A-1, α = ((k-1)/(N-1))^{1/M}, ess inf · is the essential infimum of ·, and, for each of M or fewer attributes included in the table, v is an attribute value of the attribute a, V_a is the domain of the attribute values v, u before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance, the database disturbance device comprising a disturbance unit that disturbs the table by performing disturbance based on a function A_a(p_a)_{v,v'} determined by a predetermined parameter p_a to obtain the attribute value v' after disturbance,
wherein the parameter p_a satisfies the relationship of the following formula.

A database disturbance method for disturbing a table by performing disturbance based on a function A_a(p_a)_{v,v'} determined by a predetermined parameter p_a to obtain an attribute value v' after disturbance, wherein the table includes a plurality of records, A is a predetermined integer of 2 or more, each record includes a record identifier and at least one attribute value, k is a security parameter, |N| is the number of records, M is a predetermined natural number not more than A-1, α = ((k-1)/(N-1))^{1/M}, ess inf · is the essential infimum of ·, and, for each attribute, v is an attribute value of the attribute a, V_a is the domain of the attribute values v, u before disturbance, and V'_a is the domain of the attribute values v', u' after disturbance, the method comprising:
a parameter determination step in which a parameter determination unit determines a parameter p_a satisfying the following formula; and

a disturbance step in which a disturbance unit performs, on the attribute value v of each of M or fewer attributes a included in the table, disturbance based on the function A_a(p_a)_{v,v'} determined by the predetermined parameter p_a to obtain the attribute value v' after disturbance.