JP2021163014A

JP2021163014A - Generation method, generation program, and generation device

Info

Publication number: JP2021163014A
Application number: JP2020062248A
Authority: JP
Inventors: 秀暢小栗; Hidenobu Oguri
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2021-10-11
Anticipated expiration: 2040-03-31
Also published as: JP7359063B2

Abstract

To provide a generation device capable of obtaining useful anonymous data. SOLUTION: A generation device 100 anonymizes personal data contained in a plurality of personal data using a first anonymization model 110 and, based on the result, identifies one or more pieces of personal data whose degree of anonymity does not meet a predetermined criterion. The generation device 100 trains a second anonymization model 120 based on the identified one or more pieces of personal data. Based on the identified one or more pieces of personal data, the generation device 100 generates one or more pieces of anonymous data, each having a higher degree of anonymity than the corresponding personal data. The generation device 100 outputs new anonymous data obtained by anonymizing each piece of the generated anonymous data using the trained second anonymization model 120. SELECTED DRAWING: Figure 1

Description

The present invention relates to a generation method, a generation program, and a generation device.

Conventionally, there are anonymization methods that anonymize personal data (PII: Personally Identifiable Information) before outputting it in order to protect privacy. For example, one such method processes personal data to generate anonymous data, performs a privacy test on the generated anonymous data, and outputs the generated anonymous data only when it determines that anonymity is ensured with respect to the original personal data.

As prior art, there is, for example, a technique that outputs a plurality of data having a predetermined relationship when, among the data included in a first data group, the number of data having the predetermined relationship is N or more.

Japanese Unexamined Patent Publication No. 2014-016675

However, in the prior art, the usefulness of the output anonymous data may be impaired. For example, when a privacy test is performed, anonymous data may be output only for some of the plurality of personal data. As a result, the output anonymous data may be unsuitable for statistical processing. Specifically, the histogram showing the feature distribution of the plurality of personal data and the histogram showing the feature distribution of the output anonymous data may no longer be similar.

In one aspect, the present invention aims to obtain useful anonymous data.

According to one embodiment, a generation method, a generation program, and a generation device are proposed that: identify, based on the result of anonymizing personal data included in a plurality of personal data using a first anonymization model that anonymizes information, one or more personal data among the plurality of personal data whose degree of anonymity does not satisfy a predetermined criterion; train, based on the identified one or more personal data, a second anonymization model that anonymizes information; generate, based on the identified one or more personal data, one or more anonymous data having a higher degree of anonymity than each of the one or more personal data; and output new anonymous data obtained by anonymizing each of the generated one or more anonymous data using the trained second anonymization model.

According to one aspect, it is possible to obtain useful anonymous data.

FIG. 1 is an explanatory diagram (No. 1) showing an example of the generation method according to the embodiment.
FIG. 2 is an explanatory diagram (No. 2) showing an example of the generation method according to the embodiment.
FIG. 3 is an explanatory diagram showing an example of the data utilization system 300.
FIG. 4 is a block diagram showing a hardware configuration example of the generation device 100.
FIG. 5 is an explanatory diagram showing an example of the stored contents of the data management table 500.
FIG. 6 is a block diagram showing a functional configuration example of the generation device 100.
FIG. 7 is an explanatory diagram showing a first operation example of the generation device 100.
FIG. 8 is an explanatory diagram showing a second operation example of the generation device 100.
FIG. 9 is an explanatory diagram showing a third operation example of the generation device 100.
FIG. 10 is an explanatory diagram showing an example of a membership inclusion attack.
FIG. 11 is an explanatory diagram (No. 1) showing a first example of the shapes of the histograms to be compared.
FIG. 12 is an explanatory diagram (No. 2) showing a first example of the shapes of the histograms to be compared.
FIG. 13 is an explanatory diagram (No. 3) showing a first example of the shapes of the histograms to be compared.
FIG. 14 is an explanatory diagram (No. 4) showing a first example of the shapes of the histograms to be compared.
FIG. 15 is an explanatory diagram (No. 1) showing a second example of the shapes of the histograms to be compared.
FIG. 16 is an explanatory diagram (No. 2) showing a second example of the shapes of the histograms to be compared.
FIG. 17 is an explanatory diagram (No. 3) showing a second example of the shapes of the histograms to be compared.
FIG. 18 is a flowchart showing an example of the preparatory processing procedure.
FIG. 19 is a flowchart showing an example of the test processing procedure.
FIG. 20 is a flowchart showing an example of the branch processing procedure.
FIG. 21 is a flowchart showing an example of the reuse processing procedure.

Hereinafter, embodiments of a generation method, a generation program, and a generation device according to the present invention will be described in detail with reference to the drawings.

(An example of a generation method according to an embodiment)
FIGS. 1 and 2 are explanatory diagrams showing an example of the generation method according to the embodiment. In FIG. 1, the generation device 100 is a computer that generates and outputs anonymous data based on personal data while ensuring the anonymity of the personal data.

Conventionally, when anonymous data is generated by processing personal data and then output, a privacy test may be performed in order to ensure the anonymity of the personal data. For example, the anonymous data generated from a piece of personal data is output only when the privacy test determines that anonymity is ensured for that personal data.

Specifically, there is a safety index called (k, γ) PD (Plausible Deniability), which sets criteria for the usefulness of anonymous data and the anonymity of personal data by performing a privacy test on the personal data. The description now turns to FIG. 2 to explain the flow of processing performed by conventional (k, γ) PD.

In FIG. 2, (k, γ) PD generates anonymous data y = M(d) by applying a probabilistic differential privacy algorithm M to personal data d selected at random from a personal data group D. (k, γ) PD then performs a privacy test on the generated anonymous data y. A usefulness requirement may also be set for the privacy test.

For example, if the personal data group D contains k or more (k > 1) other personal data d having the same attribute value as the personal data d, (k, γ) PD determines that the generated anonymous data y can be output and adds it to the release data set. On the other hand, if the personal data group D does not contain k or more (k > 1) such other personal data d, (k, γ) PD determines that the generated anonymous data y cannot be output and discards it.

Further, in order to prevent an attack that estimates the parameter k, (k, γ) PD may use a randomized parameter k + Lap(1/ε⁰) instead of the parameter k. Lap(·) is a random number generation mechanism based on the Laplace distribution.
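The privacy test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of (k, γ) PD: the test is reduced to an exact match on a single attribute, and the names `privacy_test`, `count_matching`, and `laplace`, as well as the sample records, are hypothetical. The Laplace noise is drawn as the difference of two exponential variates, which yields a Laplace distribution with the given scale.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def count_matching(records, d, key):
    """Count the other records in D that share d's value for attribute `key`."""
    return sum(1 for r in records if r is not d and r[key] == d[key])


def privacy_test(records, d, key, k, eps0=None, rng=None):
    """Return True ("OK") if at least k other records share d's attribute value.

    When eps0 is given, the threshold is randomized to k + Lap(1/eps0).
    """
    threshold = k
    if eps0 is not None:
        threshold = k + laplace(1.0 / eps0, rng or random.Random())
    return count_matching(records, d, key) >= threshold


# Hypothetical personal data group D with a single attribute.
D = [{"age": 30}, {"age": 30}, {"age": 30}, {"age": 71}]
print(privacy_test(D, D[0], "age", k=2))  # True: two other records share age 30
print(privacy_test(D, D[3], "age", k=2))  # False: no other record shares age 71
```

With a fixed threshold, the outcome for a given record is deterministic. Passing `eps0` makes the threshold vary from run to run, which is the randomization intended to hinder estimation of k.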

k is a parameter indicating how many other personal data d having the same attribute value as the personal data d should exist from the viewpoint of anonymity. The larger the value of k, the harder it is to identify the individual corresponding to the personal data d.

γ is a parameter of the probabilistic differential privacy algorithm M. It defines the noise value that is probabilistically applied to the personal data d. The smaller the value of γ, the larger the noise value that can be applied to the personal data d. The smaller the noise value, the higher the risk that the personal data d from which the anonymous data y was generated can be identified.

ε⁰ is a parameter that defines the random noise value applied to k. The smaller the value of ε⁰, the stronger the resistance to membership inclusion attacks. An example of a membership inclusion attack will be described later with reference to FIG. 10.

For (k, γ) PD, see, for example, Non-Patent Document 1 below. In addition to (k, γ) PD, there is also a safety index called (k, δ) PD. For (k, δ) PD, see, for example, Non-Patent Document 2 below.

Non-Patent Document 1: Bindschaedler, Vincent, Reza Shokri, and Carl A. Gunter. "Plausible deniability for privacy-preserving data synthesis." arXiv preprint arXiv:1708.07975 (2017).

Non-Patent Document 2: Bindschaedler, Vincent, and Reza Shokri. "Synthesizing plausible privacy-preserving location traces." 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016.

However, with the conventional approach, the usefulness of the anonymous data may be impaired. For example, when a privacy test is performed, anonymous data may be output only for some of the plurality of personal data. As a result, the output anonymous data may be unsuitable for statistical processing. Specifically, the histogram showing the feature distribution of the plurality of personal data and the histogram showing the feature distribution of the output anonymous data may no longer be similar. The histograms will be described later with reference to FIGS. 11 to 17.

More specifically, in (k, γ) PD, it is difficult to adjust the parameters k, γ, and ε⁰ so that the usefulness of the anonymous data and the anonymity of the personal data are well balanced. In (k, γ) PD, no matter how the parameters k, γ, and ε⁰ are adjusted, either the usefulness of the anonymous data or the anonymity of the personal data tends to be impaired.

In particular, if the value of the parameter k is increased in order to ensure the anonymity of the personal data, fewer personal data can serve as the source of anonymous data that may be output, and the output anonymous data becomes unsuitable for statistical processing. For example, the histogram showing the feature distribution of the plurality of personal data and the histogram showing the feature distribution of the output anonymous data cease to be similar.
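The effect of k on the histograms can be illustrated with a small Python sketch. It assumes an exact-match privacy test on a single attribute and omits the noise added by the anonymization model; the helper names and the sample ages are hypothetical. The L1 distance between the relative-frequency histograms is used here only as one convenient measure of dissimilarity.

```python
from collections import Counter


def histogram(values):
    """Relative frequency of each distinct value (empty dict for empty input)."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def l1_distance(h1, h2):
    """Sum of absolute differences between two relative-frequency histograms."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(v, 0.0) - h2.get(v, 0.0)) for v in keys)


def released_values(values, k):
    """Keep only values shared by at least k other records (exact-match test)."""
    counts = Counter(values)
    return [v for v in values if counts[v] - 1 >= k]


# Hypothetical attribute values: six records of 20, three of 30, one of 40.
ages = [20] * 6 + [30] * 3 + [40] * 1
for k in (1, 3, 6):
    kept = released_values(ages, k)
    distance = l1_distance(histogram(ages), histogram(kept))
    print(f"k={k}: {len(kept)} records pass, histogram distance {distance:.2f}")
```

As k grows, fewer records pass the test and the histogram of the released data drifts away from that of the original data, which is the loss of usefulness described above.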

In addition, a user who has the histogram in mind may run the privacy test in an arbitrary, self-serving manner in order to improve the usefulness of the anonymous data. For example, the user may repeat the privacy test on specific personal data until the parameter k + Lap(1/ε⁰) happens to take a large value, in an attempt to output the anonymous data generated from that specific personal data. In this case, the anonymity of the personal data may be impaired.

Furthermore, once the histogram showing the feature distribution of the plurality of personal data and the histogram showing the feature distribution of the output anonymous data cease to be similar, the anonymity of the personal data may also be impaired. Because a relatively large amount of anonymous data is generated from a relatively small number of personal data, the risk that an individual is identified from the anonymous data tends to increase.

Therefore, the present embodiment describes a generation method that can improve the usefulness of the output anonymous data.

Returning to the description of FIG. 1, the generation device 100 stores a plurality of personal data in a DB (database) 101. Each personal data includes, for example, a value indicating some characteristic of an individual. The generation device 100 stores a first anonymization model 110. The first anonymization model 110 is a model that anonymizes information. Anonymization here corresponds to processing the data. The first anonymization model 110 is, for example, a model that generates one or more anonymous data by adding a random noise value to a value included in the personal data. The first anonymization model 110 is, for example, a probabilistic differential privacy algorithm.

(1-1) Based on the result of anonymizing the personal data included in the plurality of personal data using the first anonymization model 110, the generation device 100 identifies one or more personal data, among the plurality of personal data, whose degree of anonymity does not satisfy a predetermined criterion. The predetermined criterion is, for example, the criterion of a privacy test. In the following description, satisfying the predetermined criterion may be written as "OK", and not satisfying it may be written as "NG". The one or more personal data whose degree of anonymity does not satisfy the predetermined criterion are stored in, for example, an NG-DB 102.

For example, the generation device 100 randomly selects personal data from the plurality of personal data. Each time it selects personal data, the generation device 100 generates one or more anonymous data from the selected personal data using the first anonymization model 110. The generation device 100 then performs a privacy test based on the one or more anonymous data and determines whether the degree of anonymity of the personal data from which they were generated satisfies the predetermined criterion. Based on the determination results, the generation device 100 identifies one or more personal data, among the plurality of personal data, whose degree of anonymity does not satisfy the predetermined criterion.
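Step (1-1) can be sketched as a loop that separates OK records from NG records. The following Python sketch rests on several assumptions that are not taken from the embodiment: each personal data is a single numeric value, the first anonymization model is reduced to additive Gaussian noise, and the privacy test is an exact-match count evaluated on the source record rather than on the generated anonymous data. The names `first_model`, `passes_test`, and `split_ok_ng` are hypothetical.

```python
import random


def first_model(value, scale, rng):
    """Hypothetical first anonymization model: add zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, scale)


def passes_test(records, index, k):
    """Simplified privacy test: at least k other records share the same value."""
    target = records[index]
    others = sum(1 for j, v in enumerate(records) if j != index and v == target)
    return others >= k


def split_ok_ng(records, k, scale, rng):
    """Return (release, ng): anonymous data for OK records and the raw NG records."""
    release, ng = [], []
    order = list(range(len(records)))
    rng.shuffle(order)  # personal data are selected in random order
    for i in order:
        y = first_model(records[i], scale, rng)
        if passes_test(records, i, k):
            release.append(y)       # corresponds to the release DB
        else:
            ng.append(records[i])   # corresponds to the NG-DB
    return release, ng


release, ng = split_ok_ng([30, 30, 30, 71], k=2, scale=1.0, rng=random.Random(0))
print(len(release), ng)
```

Under the conventional approach the NG records would simply be discarded. Here they are kept in a separate list so that later steps can reuse them.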

(1-2) The generation device 100 trains a second anonymization model 120 based on the identified one or more personal data. The second anonymization model 120 is a model that anonymizes information. The second anonymization model 120 is a model that generates one or more new anonymous data by adding a random noise value to a value included in anonymous data. The second anonymization model 120 is, for example, a probabilistic differential privacy algorithm, and is, for example, the same algorithm as the first anonymization model 110. The generation device 100 trains the second anonymization model 120 based on, for example, the variance and the mean of the values included in the identified one or more personal data.
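One way to picture training on the mean and variance is the following Python sketch. The text states only that the mean and variance of the NG data are used; how they parameterize the model is an assumption made here for illustration, namely that the noise standard deviation is set to the standard deviation of the NG values. The class name `SecondModel` and its methods are hypothetical.

```python
import random
import statistics


class SecondModel:
    """Hypothetical second anonymization model fitted to the NG personal data."""

    def fit(self, ng_values):
        # Summary statistics of the NG data, as mentioned in the description.
        self.mean = statistics.fmean(ng_values)
        self.variance = statistics.pvariance(ng_values)
        return self

    def anonymize(self, value, rng):
        # Assumption: the noise scale follows the spread of the NG data.
        return value + rng.gauss(0.0, self.variance ** 0.5)


model = SecondModel().fit([60, 70, 80])
print(model.mean, round(model.variance, 2))
print(model.anonymize(70.0, random.Random(0)))
```

Tying the noise scale to the NG data keeps the new anonymous data within a range comparable to the records that failed the test, which is in line with the stated goal of bringing the two histograms closer together.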

(1-3) Based on the identified one or more personal data, the generation device 100 generates one or more anonymous data having a higher degree of anonymity than each of the one or more personal data. The one or more anonymous data are stored in, for example, an MA-DB 103. MA stands for microaggregation. The generation device 100 generates the one or more anonymous data by, for example, performing microaggregation.

Specifically, the generation device 100 calculates a statistic of the values included in the identified one or more personal data. The statistic is, for example, the mean, maximum, minimum, median, or mode. The generation device 100 then generates the one or more anonymous data by replacing the value included in each of the identified one or more personal data with the calculated statistic.
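The replacement described above can be sketched in a few lines of Python. The sketch assumes one numeric value per personal data and aggregates all NG records as a single group; the function name `microaggregate` is hypothetical.

```python
import statistics


def microaggregate(values, statistic=statistics.mean):
    """Replace every value with one statistic computed over all the values."""
    if not values:
        return []
    representative = statistic(values)
    return [representative for _ in values]


ng_values = [60, 70, 80]
print(microaggregate(ng_values))                     # [70, 70, 70]
print(microaggregate(ng_values, statistics.median))  # [70, 70, 70]
print(microaggregate(ng_values, max))                # [80, 80, 80]
```

After the replacement the records are indistinguishable from one another, so each has a higher degree of anonymity than the record it was derived from, while the number of records is preserved.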

(1-4) The generation device 100 outputs new anonymous data obtained by anonymizing each of the generated one or more anonymous data using the trained second anonymization model 120. The output destination is, for example, a release DB 104. For example, the generation device 100 generates new anonymous data from each of the one or more anonymous data using the second anonymization model 120 and outputs it. At this time, the generation device 100 may also output the anonymous data generated by the first anonymization model 110 from the personal data, among the plurality of personal data, that satisfy the predetermined criterion.
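Steps (1-1) to (1-4) can be put together in a single Python sketch. It is an illustration under the same simplifying assumptions as before, none of which come from the embodiment: numeric single-attribute records, an exact-match privacy test, Gaussian noise for both models, the mean as the microaggregation statistic, and a second model whose noise scale equals the standard deviation of the NG data. The function name `generate_release` is hypothetical.

```python
import random
import statistics
from collections import Counter


def generate_release(records, k, noise_scale, rng):
    """Return anonymous data for OK records plus re-anonymized NG records."""
    counts = Counter(records)

    # (1-1) Split records by a simplified exact-match privacy test.
    ok = [v for v in records if counts[v] - 1 >= k]
    ng = [v for v in records if counts[v] - 1 < k]

    # First model: additive noise on the records that passed the test.
    release = [v + rng.gauss(0.0, noise_scale) for v in ok]

    if ng:
        # (1-2) "Train" the second model: derive its noise scale from the NG data.
        sigma = statistics.pstdev(ng)
        # (1-3) Microaggregation: replace each NG value with the NG mean.
        representative = statistics.fmean(ng)
        aggregated = [representative] * len(ng)
        # (1-4) Anonymize the aggregated data with the second model and output it.
        release += [a + rng.gauss(0.0, sigma) for a in aggregated]

    return release


records = [30, 30, 30, 71, 75]
out = generate_release(records, k=2, noise_scale=1.0, rng=random.Random(0))
print(len(records), len(out))
```

In this sketch the release set contains as many entries as there are input records, whereas discarding the NG records would leave only three. That difference is what allows the histogram of the output to stay closer to that of the input.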

As a result, the generation device 100 can obtain useful anonymous data and can achieve both the usefulness of the anonymous data and the anonymity of the personal data.

For example, the generation device 100 can generate new anonymous data based on personal data whose degree of anonymity was determined by the privacy test not to satisfy the predetermined criterion. The generation device 100 can therefore make the output anonymous data suitable for statistical processing. Specifically, the generation device 100 can bring the histogram showing the feature distribution of the plurality of personal data and the histogram showing the feature distribution of the output anonymous data closer to each other.

Further, the generation device 100 can, for example, convert personal data whose degree of anonymity was determined not to satisfy the predetermined criterion into anonymous data, a form with a higher degree of anonymity, and then generate new anonymous data using the second anonymization model 120. The generation device 100 can therefore make it easier to ensure the anonymity of the personal data.

For example, the generation device 100 can train the second anonymization model 120 based on the one or more personal data, among the plurality of personal data, whose degree of anonymity does not satisfy the predetermined criterion, and can generate new anonymous data using the second anonymization model 120. The generation device 100 can therefore make the output anonymous data suitable for statistical processing. Specifically, the generation device 100 can bring the histogram showing the feature distribution of the personal data whose degree of anonymity does not satisfy the predetermined criterion and the histogram showing the feature distribution of the new anonymous data generated by the second anonymization model 120 closer to each other.

The generation device 100 can generate and output anonymous data based on personal data that would otherwise be determined to be NG. The generation device 100 can therefore prevent a user from running the privacy test in an arbitrary, self-serving manner to improve the usefulness of the anonymous data, and can ensure the anonymity of the personal data.

As described above, the generation device 100 can achieve both the usefulness of the anonymous data and the anonymity of the personal data, and can generate a DB of anonymous data that is not expected to cause problems even if it is distributed externally and referenced by a third party. The generation device 100 can therefore be used in the fields of statistical analysis and machine learning.

In addition, the generation device 100 can improve methods that perform a privacy test. For example, the generation device 100 can improve methods such as (k, γ) PD or (k, δ) PD. The generation device 100 can also improve privacy-test methods other than (k, γ) PD and (k, δ) PD.

(Example of data utilization system 300)
Next, an example of the data utilization system 300 to which the generation device 100 shown in FIG. 1 is applied will be described with reference to FIG. 3.

FIG. 3 is an explanatory diagram showing an example of the data utilization system 300. In FIG. 3, the data utilization system 300 includes the generation device 100, a data provider device 301, and a data user device 302.

In the data utilization system 300, the generation device 100 and the data provider device 301 are connected via a wired or wireless network 310. The network 310 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. The generation device 100 and the data user device 302 are likewise connected via the wired or wireless network 310.

The generation device 100 collects personal data from the data provider device 301. The collected personal data is stored in, for example, the data management table 500 described later with reference to FIG. 5. Based on the collected personal data, the generation device 100 generates a plurality of anonymous data and transmits them to the data user device 302. Specific examples of generating the plurality of anonymous data will be described later with reference to, for example, FIGS. 7 to 9. The generation device 100 is, for example, a server or a PC (Personal Computer).

The data provider device 301 acquires personal data and transmits it to the generation device 100. The data provider device 301 acquires the personal data based on, for example, an operator's operation input. The data provider device 301 may also acquire personal data from, for example, a tablet terminal, a smartphone, a wearable terminal, or an IoT (Internet of Things) device and transmit it to the generation device 100. The data provider device 301 is, for example, a server or a PC.

The data user device 302 receives the plurality of anonymous data from the generation device 100 and performs a data utilization task based on them. The data utilization task is, for example, statistical analysis or machine learning. The data user device 302 is, for example, a server or a PC.

Although the case where the generation device 100 is a device separate from the data provider device 301 has been described here, the configuration is not limited to this. For example, the generation device 100 may be integrated with the data provider device 301 and also operate as the data provider device 301.

Although the case where the generation device 100 is a device separate from the data user device 302 has been described here, the configuration is not limited to this. For example, the generation device 100 may be integrated with the data user device 302 and also operate as the data user device 302.

Although the case where the data provider device 301 is a server, a PC, or the like has been described here, the configuration is not limited to this. For example, the data provider device 301 may be a tablet terminal, a smartphone, a wearable terminal, an IoT device, or the like.

(Hardware configuration example of the generation device 100)
Next, a hardware configuration example of the generation device 100 will be described with reference to FIG. 4.

FIG. 4 is a block diagram showing a hardware configuration example of the generation device 100. In FIG. 4, the generation device 100 includes a CPU (Central Processing Unit) 401, a memory 402, a network I/F (Interface) 403, a recording medium I/F 404, and a recording medium 405. These components are connected to one another by a bus 400.

Here, the CPU 401 controls the entire generation device 100. The memory 402 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), and a flash ROM. Specifically, for example, the flash ROM or the ROM stores various programs, and the RAM is used as a work area of the CPU 401. A program stored in the memory 402 is loaded into the CPU 401 and causes the CPU 401 to execute the coded processing.

The network I/F 403 is connected to the network 310 through a communication line and is connected to other computers via the network 310. The network I/F 403 serves as the interface between the network 310 and the inside of the device, and controls the input and output of data to and from other computers. The network I/F 403 is, for example, a modem or a LAN adapter.

The recording medium I/F 404 controls reading and writing of data on the recording medium 405 under the control of the CPU 401. The recording medium I/F 404 is, for example, a disk drive, an SSD (Solid State Drive), or a USB (Universal Serial Bus) port. The recording medium 405 is a non-volatile memory that stores data written under the control of the recording medium I/F 404. The recording medium 405 is, for example, a disk, a semiconductor memory, or a USB memory. The recording medium 405 may be detachable from the generation device 100.

In addition to the components described above, the generation device 100 may include, for example, a keyboard, a mouse, a display, a printer, a scanner, a microphone, and a speaker. The generation device 100 may have a plurality of recording medium I/Fs 404 and recording media 405. Alternatively, the generation device 100 need not have the recording medium I/F 404 or the recording medium 405.

(Stored contents of the data management table 500)
Next, an example of the stored contents of the data management table 500 will be described with reference to FIG. 5. The data management table 500 is realized, for example, by a storage area such as the memory 402 or the recording medium 405 of the generation device 100 shown in FIG. 4.

FIG. 5 is an explanatory diagram showing an example of the stored contents of the data management table 500. As shown in FIG. 5, the data management table 500 has fields for name, gender, age, and height. By setting information in each field for each individual, the data management table 500 stores personal data as a record 500-a, where a is an arbitrary integer.

In the name field, a name that identifies an individual is set. In the gender field, the gender of the individual is set as an attribute value of the individual. In the age field, the age of the individual is set as an attribute value of the individual. In the height field, the height of the individual is set as an attribute value of the individual. The personal data may omit any of the name, gender, age, and height of the individual. The personal data may also include attribute values other than the name, gender, age, and height of the individual.

(Hardware configuration example of the data providing side device 301)
Since the hardware configuration example of the data providing side device 301 is the same as the hardware configuration example of the generation device 100 shown in FIG. 4, the description thereof will be omitted.

(Hardware configuration example of the data utilization side device 302)
Since the hardware configuration example of the data utilization side device 302 is the same as the hardware configuration example of the generation device 100 shown in FIG. 4, the description thereof will be omitted.

(Functional configuration example of the generation device 100)
Next, a functional configuration example of the generation device 100 will be described with reference to FIG. 6.

FIG. 6 is a block diagram showing a functional configuration example of the generation device 100. The generation device 100 includes a storage unit 600, an acquisition unit 601, a first anonymization unit 602, a determination unit 603, an identification unit 604, a learning unit 605, a generation unit 606, a second anonymization unit 607, and an output unit 608.

The storage unit 600 is realized by, for example, a storage area such as the memory 402 or the recording medium 405 shown in FIG. 4. The following describes the case where the storage unit 600 is included in the generation device 100, but the present invention is not limited to this. For example, the storage unit 600 may be included in a device different from the generation device 100, with the stored contents of the storage unit 600 being referable from the generation device 100.

The acquisition unit 601 to the output unit 608 function as an example of a control unit. Specifically, the acquisition unit 601 to the output unit 608 realize their functions by, for example, causing the CPU 401 to execute a program stored in a storage area such as the memory 402 or the recording medium 405 shown in FIG. 4, or by the network I/F 403. The processing result of each functional unit is stored in, for example, a storage area such as the memory 402 or the recording medium 405 shown in FIG. 4.

The storage unit 600 stores various information that is referred to or updated in the processing of each functional unit. The storage unit 600 stores a plurality of personal data. The personal data includes, for example, a value indicating some characteristic of an individual. The value is, for example, an attribute value. The personal data is acquired by, for example, the acquisition unit 601. The storage unit 600 also stores a plurality of anonymous data. The anonymous data is generated by, for example, the first anonymization unit 602 or the second anonymization unit 607.

The storage unit 600 stores the first anonymization model. The first anonymization model is a model for anonymizing information. Anonymization corresponds to processing of the data. The first anonymization model is, for example, a model that generates one or more anonymous data obtained by adding a random noise value to a value included in personal data. The first anonymization model is, for example, a probabilistic differential privacy algorithm. The first anonymization model is generated by, for example, the generation unit 606.
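As a rough illustration of a model that adds a random noise value to a value in personal data, the following minimal sketch draws Laplace noise. The choice of the Laplace distribution, the `scale` parameter, and the helper names `laplace_noise` and `anonymize_value` are assumptions made for this example and are not specified in the description.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def anonymize_value(value: float, scale: float, rng: random.Random,
                    n_outputs: int = 1) -> list[float]:
    """Return n_outputs noisy copies of one attribute value (hypothetical helper)."""
    return [value + laplace_noise(scale, rng) for _ in range(n_outputs)]


rng = random.Random(0)  # fixed seed only so that the example is repeatable
noisy_heights = anonymize_value(165.0, scale=2.0, rng=rng, n_outputs=3)
print(noisy_heights)
```

A larger `scale` gives stronger perturbation at the cost of less faithful values. How the noise is actually calibrated is left open here.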

The storage unit 600 stores the second anonymization model. The second anonymization model is a model for anonymizing information. The second anonymization model is, for example, a model that generates one or more new anonymous data obtained by adding a random noise value to a value included in anonymous data. The second anonymization model is, for example, a probabilistic differential privacy algorithm. The second anonymization model uses, for example, the same algorithm as the first anonymization model.

The acquisition unit 601 acquires various information used for the processing of each functional unit. The acquisition unit 601 stores the acquired information in the storage unit 600 or outputs it to each functional unit. The acquisition unit 601 may also output information already stored in the storage unit 600 to each functional unit. The acquisition unit 601 acquires the information based on, for example, a user's operation input. The acquisition unit 601 may also receive the information from, for example, a device different from the generation device 100.

The acquisition unit 601 acquires a plurality of personal data. The acquisition unit 601 acquires the plurality of personal data by, for example, receiving them from the data providing side device 301. The acquisition unit 601 may also acquire the plurality of personal data based on, for example, a user's operation input. The acquisition unit 601 may acquire, for example, the first anonymization model.

The acquisition unit 601 may accept a start trigger for starting the processing of any of the functional units. The start trigger is, for example, a predetermined operation input by the user. The start trigger may be, for example, the receipt of predetermined information from another computer. The start trigger may be, for example, the output of predetermined information by any of the functional units. The acquisition unit 601 accepts, for example, the acquisition of a plurality of personal data as a start trigger for starting the processing of the first anonymization unit 602 to the second anonymization unit 607.

The first anonymization unit 602 anonymizes personal data included in the plurality of personal data by the first anonymization model and generates one or more anonymous data. The first anonymization unit 602, for example, randomly selects personal data from the plurality of personal data a plurality of times. Each time it selects personal data, the first anonymization unit 602 generates, for example, one or more anonymous data by the first anonymization model based on the selected personal data. As a result, the first anonymization unit 602 can generate anonymous data and improve the anonymity of the personal data.

The determination unit 603 determines whether or not the degree of anonymity of personal data included in the plurality of personal data satisfies a predetermined criterion, based on the result of anonymizing that personal data by the first anonymization model. The predetermined criterion is, for example, a privacy test criterion. For example, the determination unit 603 conducts a privacy test based on the one or more anonymous data and determines whether or not the degree of anonymity of the selected personal data, from which the one or more anonymous data were generated, satisfies the predetermined criterion.

Specifically, when personal data included in the plurality of personal data is anonymized by the first anonymization model, the determination unit 603 calculates the number of other personal data, among the plurality of personal data, that contain a value identical or similar to that of the anonymized personal data. Each personal data may contain a plurality of values. In this case, the determination unit 603 may calculate the number of other personal data whose value for a specific item is identical or similar to that of the personal data targeted for anonymization.

Here, if the calculated number is equal to or less than a predetermined number, the determination unit 603 determines that the predetermined criterion is not satisfied. The predetermined number is, for example, a fixed value. The predetermined number may instead be a variable value, for example k + Lap(1/ε₀). On the other hand, if the calculated number is larger than the predetermined number, the determination unit 603 determines that the predetermined criterion is satisfied. As a result, the determination unit 603 can determine whether or not the anonymity of the personal data is ensured.
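The counting rule above can be sketched as follows. This is a minimal illustration, not the claimed implementation: "similar" is assumed to mean "within a numeric `tolerance`", Lap is assumed to denote zero-mean Laplace noise with scale 1/ε₀, and all function names are hypothetical.

```python
import random


def count_similar(target: float, others: list[float], tolerance: float) -> int:
    """Count other records whose value lies within `tolerance` of the target."""
    return sum(1 for value in others if abs(value - target) <= tolerance)


def randomized_threshold(k: int, epsilon0: float, rng: random.Random) -> float:
    """Return k + Lap(1/epsilon0), with Laplace noise built from two exponentials."""
    # Exponentials with rate epsilon0 have mean 1/epsilon0, the assumed scale.
    return k + rng.expovariate(epsilon0) - rng.expovariate(epsilon0)


def meets_criterion(target: float, others: list[float],
                    tolerance: float, threshold: float) -> bool:
    """A count at or below the threshold fails; a larger count passes."""
    return count_similar(target, others, tolerance) > threshold


# A typical height has two close neighbours; the outlier has none.
typical_ok = meets_criterion(161.0, [160.0, 162.0, 175.0], tolerance=2.0, threshold=1)
outlier_ok = meets_criterion(175.0, [160.0, 161.0, 162.0], tolerance=2.0, threshold=1)
noisy_k = randomized_threshold(k=1, epsilon0=0.5, rng=random.Random(0))
print(typical_ok, outlier_ok, noisy_k)
```

With the fixed threshold of 1, the record with value 161.0 passes and the outlier 175.0 does not. Passing `noisy_k` as `threshold` gives the variable-threshold variant.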

Alternatively, when personal data included in the plurality of personal data is anonymized by the first anonymization model, the determination unit 603 determines that the predetermined criterion is not satisfied if a representative value of the added noise values is equal to or less than a predetermined threshold. The representative value is, specifically, a mean, maximum, minimum, median, or mode. As a result, the determination unit 603 can determine whether or not the anonymity of the personal data is ensured.

The determination unit 603 may also determine, with a predetermined probability, that the predetermined criterion is not satisfied regardless of the actual degree of anonymity of the selected personal data. The predetermined probability is set by, for example, the user. As a result, the determination unit 603 can increase the number of personal data that the learning unit 605 can refer to, making it easier for the learning unit 605 to generate the second anonymization model.
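The probabilistic override can be sketched in a few lines. The function name `judge` and the parameter `force_ng_probability` are hypothetical names introduced for this example.

```python
import random


def judge(meets_criterion: bool, force_ng_probability: float,
          rng: random.Random) -> bool:
    """Return False (NG) with the given probability, otherwise the real result."""
    if rng.random() < force_ng_probability:
        return False  # forced NG, regardless of the actual degree of anonymity
    return meets_criterion


rng = random.Random(0)
print(judge(True, 0.1, rng))
```

Raising `force_ng_probability` sends more records to the learning stage, at the cost of releasing fewer records from the first stage.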

The identification unit 604 identifies, among the plurality of personal data, one or more personal data whose degree of anonymity does not satisfy the predetermined criterion. The identification unit 604 identifies the one or more personal data based on, for example, the determination result. Specifically, the identification unit 604 identifies one or more personal data whose determination result is NG. As a result, the identification unit 604 can identify the one or more personal data that do not satisfy the predetermined criterion and from which the anonymous data discarded by the privacy test were generated.

The identification unit 604 may divide the identified one or more personal data into one or more clusters. The identification unit 604 divides the identified one or more personal data into one or more clusters based on, for example, the value included in each of the identified personal data.

Specifically, the identification unit 604 sorts the identified one or more personal data by the magnitude of the value included in each personal data. The identification unit 604 then divides the sorted personal data into one or more clusters so that, starting from the top of the sorted order, every k personal data belong to the same cluster. k may be a variable value. As a result, the identification unit 604 makes it easier for a histogram showing the feature distribution of the one or more anonymous data generated by the generation unit 606 to correspond to a histogram showing the feature distribution of the one or more personal data.
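The sort-then-chunk procedure can be sketched as below. Ascending sort order and a final cluster that may hold fewer than k records are assumptions of this sketch, since the description does not state how the remainder is handled. `split_into_clusters` is a hypothetical name.

```python
def split_into_clusters(values: list[float], k: int) -> list[list[float]]:
    """Sort the values and group every k consecutive values into one cluster."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values)
    # The last cluster may be smaller than k when len(values) is not a multiple of k.
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]


clusters = split_into_clusters([170.0, 150.0, 165.0, 155.0, 180.0], k=2)
print(clusters)  # [[150.0, 155.0], [165.0, 170.0], [180.0]]
```

Because neighbouring values end up in the same cluster, replacing each cluster by a single statistic later distorts the overall distribution less than aggregating everything at once.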

The learning unit 605 learns the second anonymization model based on the identified one or more personal data. For example, the learning unit 605 determines the range of noise values used in the second anonymization model based on the variance and mean of the values included in the identified personal data, and thereby learns the second anonymization model. As a result, the learning unit 605 makes it easier for a histogram showing the feature distribution of the one or more anonymous data generated by the second anonymization unit 607 to correspond to a histogram showing the feature distribution of the plurality of personal data.
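One possible reading of "determining the noise range from the variance and mean" is sketched below. The description does not say how the two statistics map to a range, so the rule used here (noise bounded by plus or minus `width_factor` standard deviations, with the mean kept as a reference centre) is purely an assumption, as are the names `NoiseModel` and `learn_noise_model`.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class NoiseModel:
    center: float      # mean of the values that failed the privacy test
    half_width: float  # noise is assumed to stay within +/- half_width


def learn_noise_model(values: list[float], width_factor: float = 1.0) -> NoiseModel:
    """Derive a noise range from the mean and population variance (assumed rule)."""
    mean = fmean(values)
    std = math.sqrt(pvariance(values, mu=mean))
    return NoiseModel(center=mean, half_width=width_factor * std)


model = learn_noise_model([150.0, 160.0, 170.0])
print(model)
```

Tying the noise range to the spread of the NG records keeps the second-stage output roughly as dispersed as the data it replaces.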

For each cluster, the learning unit 605 learns a second anonymization model corresponding to that cluster based on the personal data assigned to the cluster. As a result, the learning unit 605 makes it easier for a histogram showing the feature distribution of the one or more anonymous data generated by the second anonymization unit 607 to correspond to a histogram showing the feature distribution of the plurality of personal data.

Based on the identified one or more personal data, the generation unit 606 generates one or more anonymous data having a higher degree of anonymity than each of the identified personal data. The generation unit 606 generates the one or more anonymous data by, for example, performing microaggregation.

Specifically, the generation unit 606 calculates a statistical value for the values included in the identified one or more personal data. The statistical value is, specifically, a mean, maximum, minimum, median, or mode. The generation unit 606 replaces the value included in each of the identified personal data with the calculated statistical value, thereby generating one or more anonymous data. As a result, the generation unit 606 can improve the anonymity of the personal data.
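The replacement step can be sketched as follows for a single numeric attribute. The function name `microaggregate` and its `statistic` parameter are hypothetical. Any of the statistics listed above can be passed in as a callable.

```python
from statistics import fmean, median
from typing import Callable, Sequence


def microaggregate(values: Sequence[float],
                   statistic: Callable[[Sequence[float]], float] = fmean) -> list[float]:
    """Replace every value in the group with one shared statistic of the group."""
    if not values:
        return []
    representative = statistic(values)
    return [representative] * len(values)


by_mean = microaggregate([150.0, 160.0, 170.0])
by_median = microaggregate([150.0, 160.0, 170.0, 200.0], statistic=median)
print(by_mean)    # [160.0, 160.0, 160.0]
print(by_median)  # [165.0, 165.0, 165.0, 165.0]
```

The number of records is preserved while individual values become indistinguishable within the group.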

For each cluster, the generation unit 606 generates, based on the personal data assigned to the cluster, anonymous data corresponding to the cluster that have a higher degree of anonymity than each of those personal data. As a result, the generation unit 606 can generate, for each cluster, the anonymous data referred to by the second anonymization model.

The second anonymization unit 607 anonymizes each of the generated one or more anonymous data by the learned second anonymization model and generates new anonymous data. As a result, the second anonymization unit 607 can generate one or more new anonymous data such that a histogram showing their feature distribution corresponds to a histogram showing the feature distribution of the plurality of personal data.

For each cluster, the second anonymization unit 607 anonymizes the generated anonymous data corresponding to the cluster by the learned second anonymization model corresponding to the cluster, and generates new anonymous data. As a result, the second anonymization unit 607 can generate, for each cluster, one or more new anonymous data such that a histogram showing their feature distribution corresponds to a histogram showing the feature distribution of the plurality of personal data.

The output unit 608 outputs the processing result of any of the functional units. The output format is, for example, display on a display, print output to a printer, transmission to an external device by the network I/F 403, or storage in a storage area such as the memory 402 or the recording medium 405. As a result, the output unit 608 can notify the user of the processing result of any of the functional units and improve the convenience of the generation device 100.

The output unit 608 outputs the new anonymous data generated by the second anonymization unit 607. The output unit 608 further outputs the anonymous data generated by the first anonymization unit 602. For example, the output unit 608 outputs the anonymous data generated by the first anonymization unit 602 together with the new anonymous data generated by the second anonymization unit 607. As a result, the output unit 608 can make useful anonymous data available.

(First operation example of the generation device 100)
Next, a first operation example of the generation device 100 will be described with reference to FIG. 7.

FIG. 7 is an explanatory diagram showing a first operation example of the generation device 100. In FIG. 7, (7-1) the generation device 100 acquires, from the data management table 500, a personal data group 701 from which the name attribute values have been deleted.

(7-2) The generation device 100 clusters the personal data group 701 based on the values included in each personal data of the acquired personal data group 701. Clustering is performed on attributes for which a histogram is likely to be created. Such attributes are, for example, specified in advance by the user. For the clustering, for example, the clustering defined for K-anonymization processing is used.

In the example of FIG. 7, the generation device 100 divides the personal data group 701 into a cluster 702 containing the personal data whose gender value is "female" and a cluster 703 containing the personal data whose gender value is "male". Here, the gender value is treated as a quasi-identifier, and the values other than gender are treated as sensitive attributes. The generation device 100 stores the cluster 702, containing the personal data whose gender value is "female", in a DB 710. The generation device 100 stores the cluster 703, containing the personal data whose gender value is "male", in a DB 720.
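Steps (7-1) and (7-2) of this example can be sketched with plain dictionaries. The sample rows, the English field names, and the helper names `drop_field` and `split_by_attribute` are invented for illustration. Grouping on exact equality of the gender value stands in for the K-anonymization clustering, which the description does not detail.

```python
from collections import defaultdict


def drop_field(records: list[dict], field: str) -> list[dict]:
    """Return copies of the records without the given field (e.g. the name)."""
    return [{k: v for k, v in r.items() if k != field} for r in records]


def split_by_attribute(records: list[dict], attribute: str) -> dict:
    """Group records by the value of one quasi-identifier attribute."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record[attribute]].append(record)
    return dict(clusters)


# Invented rows standing in for the data management table 500.
table = [
    {"name": "A", "gender": "female", "age": 31, "height": 158.0},
    {"name": "B", "gender": "male", "age": 45, "height": 172.0},
    {"name": "C", "gender": "female", "age": 27, "height": 163.0},
]
group_701 = drop_field(table, "name")               # step (7-1)
clusters = split_by_attribute(group_701, "gender")  # step (7-2)
db_710, db_720 = clusters["female"], clusters["male"]
print(len(db_710), len(db_720))  # 2 1
```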

The following description covers the case where the generation device 100 processes the DB 710. The case where the generation device 100 processes the DB 720 is the same as the case of the DB 710, and its description is therefore omitted.

(7-3) The generation device 100 performs noise addition and a privacy test on the DB 710. The generation device 100 acquires, for example, a personal data group 711 stored in the DB 710. The generation device 100, for example, randomly selects personal data from the acquired personal data group 711 a predetermined number of times.

Each time it selects personal data, the generation device 100 generates, for example, one or more anonymous data by a generative model 730 based on the selected personal data. The generative model 730 is a probabilistic generative model. The generative model 730 is, for example, a generative model having a differential privacy mechanism. The generative model 730 may be generated based on, for example, the personal data group 711.

The generation device 100, for example, conducts a privacy test based on the generated one or more anonymous data and determines whether or not the degree of anonymity of the personal data from which they were generated satisfies the predetermined criterion. If the determination result is OK, the generation device 100 stores, for example, the generated one or more anonymous data in a release DB 740. OK means that the degree of anonymity of the personal data satisfies the predetermined criterion and the privacy test was passed. On the other hand, if the determination result is NG, the generation device 100 discards the generated one or more anonymous data and stores the personal data from which they were generated in an NG-DB 750. NG means that the degree of anonymity of the personal data does not satisfy the predetermined criterion and the privacy test was not passed.
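The routing in step (7-3) can be sketched as a loop that fills two lists standing in for the release DB 740 and the NG-DB 750. This is a simplified sketch under several assumptions: records are single numeric values, the noise is Laplace, and the toy privacy test looks only at how many other records are close to the selected one rather than at the generated anonymous data. All function and parameter names are hypothetical.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def passes_test(value: float, population: list[float], tolerance: float, k: int) -> bool:
    """Toy test: OK only if more than k other records are close to the value."""
    similar_others = sum(1 for v in population if abs(v - value) <= tolerance) - 1
    return similar_others > k


def noise_and_test(population: list[float], rounds: int, rng: random.Random,
                   scale: float = 1.0, tolerance: float = 2.0, k: int = 1):
    release_db, ng_db = [], []
    for _ in range(rounds):
        record = rng.choice(population)             # random selection
        anonymous = [record + laplace(scale, rng)]  # one noisy output per round
        if passes_test(record, population, tolerance, k):
            release_db.extend(anonymous)            # OK: keep the anonymous data
        else:
            ng_db.append(record)                    # NG: discard output, keep source
    return release_db, ng_db


release_db, ng_db = noise_and_test([160.0, 161.0, 162.0, 190.0],
                                   rounds=50, rng=random.Random(0))
print(len(release_db), len(ng_db))
```

In this sketch the same record can be selected more than once, so `ng_db` may contain duplicates. Whether the NG-DB 750 deduplicates is not stated in the description.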

(7-4) The generation device 100 performs microaggregation on the NG-DB 750. The generation device 100 acquires, for example, a personal data group 751 stored in the NG-DB 750. The generation device 100 performs, for example, microaggregation on the acquired personal data group 751. Microaggregation is a method of replacing the value included in each personal data of the personal data group 751 with a statistical value of the values included in the personal data of the personal data group 751. The statistical value is, specifically, a mean, maximum, minimum, median, or mode.

In the example of FIG. 7, the generation device 100 generates an anonymous data group 761 by replacing the value included in each personal data of the personal data group 751 with the mean of the values included in the personal data of the personal data group 751. The generation device 100 stores, for example, the anonymous data group 761 obtained from the personal data group 751 by microaggregation in an MA-DB 760.

Here, the case where the generation device 100 generates anonymous data by replacing the value included in each personal data of the personal data group 751 with the mean of those values has been described, but the present invention is not limited to this. With mean replacement, the anonymous data may be mistaken for data of an individual whose personal data contains a value relatively close to the mean. Therefore, for example, the generation device 100 may generate anonymous data by replacing the value included in each personal data of the personal data group 751 with a value that is at least a certain distance away from the values included in the personal data of the personal data group 751.

（７−５）生成装置１００は、ＮＧ−ＤＢ７５０に基づいて、生成モデル７７０を学習する。生成装置１００は、例えば、ＮＧ−ＤＢ７５０に記憶された個人データ群７５１を取得する。生成装置１００は、例えば、取得した個人データ群７５１のそれぞれの個人データを、学習データに用いて、生成モデル７７０を学習する。生成モデル７７０は、確率的な生成モデルである。生成モデル７７０は、例えば、差分プライバシーのメカニズムを有する生成モデルである。生成装置１００は、例えば、取得した個人データ群７５１のそれぞれの個人データに含まれる値に関する分散および平均に基づいて、生成モデル７７０を学習する。 (7-5) The generation device 100 learns the generation model 770 based on the NG-DB750. The generation device 100 acquires, for example, the personal data group 751 stored in the NG-DB750. The generation device 100 learns the generation model 770 by using, for example, each personal data of the acquired personal data group 751 as the training data. The generative model 770 is a probabilistic generative model. The generative model 770 is, for example, a generative model having a differential privacy mechanism. The generative device 100 learns the generative model 770, for example, based on the variance and average of the values contained in each personal data of the acquired personal data group 751.
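
A minimal sketch of learning a model from the mean and variance, as in step (7-5), is shown below. It assumes a one-dimensional Gaussian stands in for the probabilistic generative model 770; the class `GaussianModel` and the function `fit_gaussian` are hypothetical, and the differential privacy mechanism mentioned in the description is not modeled here.

```python
import random
from dataclasses import dataclass
from statistics import mean, pvariance

@dataclass
class GaussianModel:
    mu: float      # mean of the training values
    sigma: float   # standard deviation of the training values

    def sample(self, rng):
        """Draw one value from the fitted distribution."""
        return rng.gauss(self.mu, self.sigma)

def fit_gaussian(values):
    """Fit the model from the mean and (population) variance of the values."""
    return GaussianModel(mu=mean(values), sigma=pvariance(values) ** 0.5)

# Hypothetical values of the personal data judged NG.
model = fit_gaussian([52.0, 61.0, 58.0, 49.0])
draw = model.sample(random.Random(0))
```

A real implementation would need a model family suited to the data and an explicit privacy mechanism; the Gaussian is used only because it is fully determined by the two statistics the description names.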

（７−６）生成装置１００は、ＭＡ−ＤＢ７６０に対して、ノイズ付与を実施する。生成装置１００は、例えば、ＭＡ−ＤＢ７６０に記憶された匿名データ群７６１を取得する。生成装置１００は、例えば、取得した匿名データ群７６１に含まれる匿名データを、ランダムに所定回数選択する。生成装置１００は、取得した匿名データ群７６１のそれぞれの匿名データを選択してもよい。 (7-6) The generation device 100 applies noise to the MA-DB760. The generation device 100 acquires, for example, the anonymous data group 761 stored in the MA-DB760. The generation device 100 randomly selects, for example, the anonymous data included in the acquired anonymous data group 761 a predetermined number of times. The generation device 100 may select each anonymous data of the acquired anonymous data group 761.

生成装置１００は、例えば、匿名データを選択する都度、選択した匿名データに基づいて、学習した生成モデル７７０により、新たな匿名データを１以上生成する。生成装置１００は、例えば、生成した新たな匿名データを含む匿名データ群７７１を、リリースＤＢ７４０に保存する。生成装置１００は、新たな匿名データを１以上生成した際、プライバシーテストを実施してもよい。 For example, the generation device 100 generates one or more new anonymous data by the learned generation model 770 based on the selected anonymous data each time the anonymous data is selected. The generation device 100 stores, for example, the anonymous data group 771 including the generated new anonymous data in the release DB 740. The generation device 100 may perform a privacy test when one or more new anonymous data are generated.
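
The noise addition of step (7-6) can be sketched as repeated random selection followed by generation. The sketch assumes the learned model contributes only a standard deviation `sigma` and that a new value is the selected microaggregated value plus Gaussian noise; both are illustrative assumptions, and `add_noise` is a hypothetical name.

```python
import random

def add_noise(anonymous_group, sigma, draws, per_draw=1, rng=None):
    """Select a record at random `draws` times and generate `per_draw`
    new values from each selection."""
    rng = rng or random.Random()
    released = []
    for _ in range(draws):
        seed = rng.choice(anonymous_group)      # random selection
        for _ in range(per_draw):
            # New anonymous value: selected value perturbed by noise.
            released.append(seed + rng.gauss(0.0, sigma))
    return released

ma_group = [55.0, 55.0, 55.0, 55.0]   # hypothetical contents of the MA-DB
release = add_noise(ma_group, sigma=4.7, draws=3, per_draw=2,
                    rng=random.Random(1))
```

The optional privacy test on the newly generated data, which the description allows, would be applied to `release` before it is stored in the release DB.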

このように、生成装置１００が、ＤＢ７１０を処理対象とした場合、リリースＤＢ７４０が得られる。一方で、生成装置１００が、ＤＢ７２０を処理対象とした場合、リリースＤＢ７８１が得られたものとする。 In this way, when the generation device 100 targets the DB 710 as a processing target, the release DB 740 is obtained. On the other hand, when the generation device 100 targets the DB 720 as a processing target, it is assumed that the release DB 781 is obtained.

（７−７）生成装置１００は、ＤＢ７１０を処理対象として得られたリリースＤＢ７４０と、ＤＢ７２０を処理対象として得られたリリースＤＢ７８１とを結合し、ＤＢ７８０を生成する。これにより、生成装置１００は、有用な匿名データを得ることができ、匿名データの有用性と、個人データの匿名性とを両立することができる。 (7-7) The generation device 100 combines the release DB 740 obtained with the DB 710 as the processing target and the release DB 781 obtained with the DB 720 as the processing target to generate the DB 780. As a result, the generation device 100 can obtain useful anonymous data, and can achieve both the usefulness of the anonymous data and the anonymity of the personal data.

生成装置１００は、例えば、プライバシーテストにより、ＮＧと判定された個人データに基づいて、新たな匿名データを生成することができる。このため、生成装置１００は、例えば、出力される複数の匿名データを、統計処理において好ましいデータにすることができる。生成装置１００は、具体的には、複数の個人データの特徴分布を示すヒストグラムと、出力される複数の匿名データの特徴分布を示すヒストグラムとを近づけることができる。 The generation device 100 can generate new anonymous data based on personal data determined to be NG by, for example, a privacy test. Therefore, the generation device 100 can, for example, convert a plurality of output anonymous data into preferable data in statistical processing. Specifically, the generation device 100 can bring the histogram showing the feature distribution of a plurality of personal data and the histogram showing the feature distribution of a plurality of output anonymous data close to each other.

また、生成装置１００は、例えば、匿名度合いが所定の基準を満たさないと判定された個人データを、より匿名度合いが高い形式である匿名データに変換してから、生成モデル７７０により、新たな匿名データを生成することができる。このため、生成装置１００は、例えば、個人データの匿名性を確保し易くすることができる。 Further, for example, the generation device 100 can convert personal data whose degree of anonymity has been determined not to satisfy the predetermined criterion into anonymous data, which is a form with a higher degree of anonymity, and then generate new anonymous data with the generative model 770. Therefore, the generation device 100 can, for example, make it easier to ensure the anonymity of the personal data.

生成装置１００は、例えば、複数の個人データのうち、ＮＧと判定された１以上の個人データに基づいて、生成モデル７７０を学習し、学習した生成モデル７７０により、新たな匿名データを生成することができる。このため、生成装置１００は、例えば、出力される複数の匿名データを、統計処理において好ましいデータにすることができる。生成装置１００は、具体的には、例えば、ＮＧと判定された個人データの特徴分布を示すヒストグラムと、生成モデル７７０により生成される新たな匿名データの特徴分布を示すヒストグラムとを近づけることができる。 The generation device 100 can, for example, learn the generative model 770 based on the one or more pieces of personal data determined to be NG among the plurality of pieces of personal data, and generate new anonymous data with the learned generative model 770. Therefore, the generation device 100 can, for example, make the plurality of pieces of output anonymous data suitable for statistical processing. Specifically, the generation device 100 can, for example, bring the histogram showing the feature distribution of the personal data determined to be NG and the histogram showing the feature distribution of the new anonymous data generated by the generative model 770 close to each other.

以上により、生成装置１００は、匿名データの有用性と、個人データの匿名性とを両立し、外部に流通させて第３者に参照されても問題が発生しないと考えられる匿名データのＤＢ７８０を生成することができる。このため、生成装置１００は、統計分析、または、機械学習の分野において利用することができる。 As described above, the generation device 100 can generate the DB 780 of anonymous data that achieves both the usefulness of the anonymous data and the anonymity of the personal data, and that is considered unlikely to cause problems even if it is distributed externally and referred to by a third party. Therefore, the generation device 100 can be used in the fields of statistical analysis and machine learning.

ここでは、生成装置１００が、（７−３）において、単にプライバシーテストを実施する場合について説明したが、これに限らない。例えば、生成装置１００が、（７−３）において、プライバシーテストを実施するにあたり、生成した１以上の匿名データを生成する元となった個人データの匿名度合いによらず、一定確率で、ＮＧであると判定するように動作する場合があってもよい。 The case where the generation device 100 simply performs the privacy test in (7-3) has been described here, but the embodiment is not limited to this. For example, when performing the privacy test in (7-3), the generation device 100 may operate so as to return an NG determination with a certain probability, regardless of the degree of anonymity of the personal data from which the one or more pieces of anonymous data were generated.

これにより、生成装置１００は、ＮＧ−ＤＢ７５０に記憶された個人データの数が少ないために、個人データの匿名性が損なわれるおそれが生じるような状況を回避することができる。この場合における生成装置１００の動作例は、図８を用いて後述する第２の動作例に対応する。 As a result, the generation device 100 can avoid a situation in which the anonymity of the personal data may be compromised because the number of pieces of personal data stored in the NG-DB 750 is small. The operation of the generation device 100 in this case corresponds to the second operation example described later with reference to FIG. 8.

ここでは、生成装置１００が、（７−４）において、ＮＧ−ＤＢ７５０に記憶された個人データ群７５１全体に対して、ミクロアグリゲーションを実施する場合について説明したが、これに限らない。例えば、生成装置１００が、（７−４）において、ＮＧ−ＤＢ７５０に記憶された個人データ群７５１のうち、ｋ個の個人データごとに、ミクロアグリゲーションを実施する場合があってもよい。この場合、生成装置１００は、ｋ個の個人データごとに、生成モデル７７０を学習することになる。 Here, the case where the generator 100 performs microaggregation on the entire personal data group 751 stored in the NG-DB750 in (7-4) has been described, but the present invention is not limited to this. For example, in (7-4), the generation device 100 may perform microaggregation for each k personal data in the personal data group 751 stored in the NG-DB750. In this case, the generation device 100 learns the generation model 770 for each k personal data.

これにより、生成装置１００は、ＮＧと判定された個人データの特徴分布を示すヒストグラムと、生成モデル７７０により生成される新たな匿名データの特徴分布を示すヒストグラムとを、さらに近づけ易くすることができる。この場合における生成装置１００の動作例は、図９を用いて後述する第３の動作例に対応する。 As a result, the generation device 100 can make it easier to bring the histogram showing the feature distribution of the personal data determined to be NG and the histogram showing the feature distribution of the new anonymous data generated by the generative model 770 even closer to each other. The operation of the generation device 100 in this case corresponds to the third operation example described later with reference to FIG. 9.

（生成装置１００の第２の動作例） (Second operation example of the generation device 100)
次に、図８を用いて、生成装置１００の第２の動作例について説明する。 Next, a second operation example of the generation device 100 will be described with reference to FIG. 8.

図８は、生成装置１００の第２の動作例を示す説明図である。第２の動作例は、生成装置１００が、プライバシーテストを実施するにあたり、生成した１以上の匿名データを生成する元となった個人データの匿名度合いによらず、一定確率で、ＮＧであると判定するように動作する場合に対応する。 FIG. 8 is an explanatory diagram showing a second operation example of the generation device 100. The second operation example corresponds to the case where, when performing the privacy test, the generation device 100 operates so as to return an NG determination with a certain probability, regardless of the degree of anonymity of the personal data from which the one or more pieces of anonymous data were generated.

図８において、（８−１）生成装置１００は、（７−１）および（７−２）と同様の動作により、データ管理テーブル５００に基づいて、性別の値が「女性」の個人データを含む個人データ群８０１を、ＤＢ８００に保存している。 In FIG. 8, (8-1) the generation device 100 has stored, in the DB 800, the personal data group 801 containing the personal data whose gender value is "female", based on the data management table 500 and by the same operations as in (7-1) and (7-2).

以下の説明では、生成装置１００が、ＤＢ８００を処理対象とする場合について説明する。生成装置１００が、性別の値が「男性」の個人データを含む個人データ群を保存した他のＤＢなどを処理対象とする場合については、生成装置１００が、ＤＢ８００を処理対象とする場合と同様であるため、説明を省略する。 In the following description, a case where the generation device 100 targets the DB 800 as a processing target will be described. When the generation device 100 targets another DB or the like that stores a personal data group including personal data whose gender value is "male", it is the same as when the generation device 100 targets the DB 800. Therefore, the description thereof will be omitted.

（８−２）生成装置１００は、ＤＢ８００に対して、ノイズ付与とプライバシーテストとを実施する。生成装置１００は、例えば、ＤＢ８００に記憶された個人データ群８０１を取得する。生成装置１００は、例えば、取得した個人データ群８０１に含まれる個人データを、ランダムに所定回数選択する。 (8-2) The generation device 100 performs noise addition and a privacy test on the DB 800. The generation device 100 acquires, for example, the personal data group 801 stored in the DB 800. The generation device 100 randomly selects, for example, the personal data included in the acquired personal data group 801 a predetermined number of times.

生成装置１００は、例えば、個人データを選択する都度、選択した個人データに基づいて、生成モデル８１０により匿名データを１以上生成する。生成モデル８１０は、確率的な生成モデルである。生成モデル８１０は、例えば、差分プライバシーのメカニズムを有する生成モデルである。生成モデル８１０は、例えば、個人データ群８０１に基づいて生成されてもよい。 For example, the generation device 100 generates one or more anonymous data by the generation model 810 based on the selected personal data each time the personal data is selected. The generative model 810 is a probabilistic generative model. The generative model 810 is, for example, a generative model having a differential privacy mechanism. The generative model 810 may be generated based on, for example, the personal data group 801.

生成装置１００は、例えば、生成した１以上の匿名データに基づいて、プライバシーテストを実施し、生成した１以上の匿名データを生成する元となった個人データの匿名度合いが、所定の基準を満たすか否かを判定する。この際、生成装置１００は、一定確率で、生成した１以上の匿名データを生成する元となった個人データの匿名度合いによらず、所定の基準を満たさないと判定する。 The generation device 100, for example, conducts a privacy test based on one or more generated anonymous data, and the degree of anonymity of the personal data from which the generated one or more anonymous data is generated satisfies a predetermined criterion. Judge whether or not. At this time, the generation device 100 determines with a certain probability that the predetermined criteria are not satisfied regardless of the degree of anonymity of the personal data from which the generated one or more anonymous data is generated.

ここで、生成装置１００は、例えば、判定した結果がＯＫであれば、生成した１以上の匿名データを、リリースＤＢ８２０に保存する。一方で、生成装置１００は、判定した結果がＮＧであれば、生成した１以上の匿名データを破棄し、生成した１以上の匿名データを生成する元となった個人データを、ＮＧ−ＤＢ８３０に保存する。 Here, for example, if the determination result is OK, the generation device 100 stores the generated one or more pieces of anonymous data in the release DB 820. On the other hand, if the determination result is NG, the generation device 100 discards the generated one or more pieces of anonymous data and stores the personal data from which they were generated in the NG-DB 830.
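
The routing in step (8-2), including the forced NG determination, can be sketched as follows. The actual privacy test is abstracted into a boolean `passes_test`, and the databases are plain lists; the name `route_record` and the parameter `force_ng_prob` are hypothetical.

```python
import random

def route_record(record, synthetic, passes_test, force_ng_prob,
                 release_db, ng_db, rng):
    """Store synthetic data on OK; on NG discard it and keep the source record.

    With probability `force_ng_prob` the result is NG regardless of
    whether the privacy test itself would have passed.
    """
    forced_ng = rng.random() < force_ng_prob
    if passes_test and not forced_ng:
        release_db.extend(synthetic)   # OK: release the generated data
        return "OK"
    ng_db.append(record)               # NG: synthetic data is discarded
    return "NG"

release_db, ng_db = [], []
rng = random.Random(0)
first = route_record(50.0, [51.2, 49.4], True, 0.0, release_db, ng_db, rng)
second = route_record(62.0, [61.0], True, 1.0, release_db, ng_db, rng)
```

Setting `force_ng_prob` to 0 reduces this to the plain privacy test of the first operation example.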

（８−３）生成装置１００は、ＮＧ−ＤＢ８３０に対して、ミクロアグリゲーションを実施する。生成装置１００は、例えば、ＮＧ−ＤＢ８３０に記憶された個人データ群８３１を取得する。生成装置１００は、例えば、取得した個人データ群８３１に対して、ミクロアグリゲーションを実施する。 (8-3) The generator 100 performs microaggregation on the NG-DB830. The generation device 100 acquires, for example, the personal data group 831 stored in the NG-DB830. The generation device 100 performs microaggregation on the acquired personal data group 831, for example.

図８の例では、生成装置１００は、個人データ群８３１のそれぞれの個人データに含まれる値を、個人データ群８３１のそれぞれの個人データに含まれる値に関する平均値に置換することにより、匿名データ群８４１を生成する。生成装置１００は、例えば、ミクロアグリゲーションにより、個人データ群８３１から得られた匿名データ群８４１を、ＭＡ−ＤＢ８４０に保存する。 In the example of FIG. 8, the generation device 100 generates the anonymous data group 841 by replacing the value contained in each piece of personal data in the personal data group 831 with the average of those values across the personal data group 831. The generation device 100 stores, for example, the anonymous data group 841 obtained from the personal data group 831 by microaggregation in the MA-DB 840.

ここでは、生成装置１００が、個人データ群８３１のそれぞれの個人データに含まれる値を、個人データ群８３１のそれぞれの個人データに含まれる値に関する平均値に置換することにより、匿名データを生成する場合について説明したが、これに限らない。この場合、匿名データは、平均値に比較的近い値を含む個人データに対応する個人のものであると誤認されるおそれがある。このため、例えば、生成装置１００が、個人データ群８３１のそれぞれの個人データに含まれる値を、個人データ群８３１のそれぞれの個人データに含まれる値から一定以上離れた値に置換することにより、匿名データを生成する場合があってもよい。 The case where the generation device 100 generates anonymous data by replacing the value contained in each piece of personal data in the personal data group 831 with the average of those values has been described here, but the embodiment is not limited to this. With averaging, the anonymous data may be mistaken for that of the individual whose personal data contains a value relatively close to the average. Therefore, for example, the generation device 100 may instead generate anonymous data by replacing the value contained in each piece of personal data in the personal data group 831 with a value that is at least a certain distance away from the values contained in the personal data of the personal data group 831.

（８−４）生成装置１００は、ＮＧ−ＤＢ８３０に基づいて、生成モデル８５０を学習する。生成装置１００は、例えば、ＮＧ−ＤＢ８３０に記憶された個人データ群８３１を取得する。生成装置１００は、例えば、取得した個人データ群８３１のそれぞれの個人データを、学習データに用いて、生成モデル８５０を学習する。生成モデル８５０は、確率的な生成モデルである。生成モデル８５０は、例えば、差分プライバシーのメカニズムを有する生成モデルである。生成装置１００は、例えば、取得した個人データ群８３１のそれぞれの個人データに含まれる値に関する分散および平均に基づいて、生成モデル８５０を学習する。 (8-4) The generation device 100 learns the generation model 850 based on the NG-DB 830. The generation device 100 acquires, for example, the personal data group 831 stored in the NG-DB830. The generation device 100 learns the generation model 850 by using, for example, each personal data of the acquired personal data group 831 as training data. The generative model 850 is a probabilistic generative model. The generative model 850 is, for example, a generative model having a differential privacy mechanism. The generative device 100 learns the generative model 850, for example, based on the variance and average of the values contained in each personal data of the acquired personal data group 831.

（８−５）生成装置１００は、ＭＡ−ＤＢ８４０に対して、ノイズ付与を実施する。生成装置１００は、例えば、ＭＡ−ＤＢ８４０に記憶された匿名データ群８４１を取得する。生成装置１００は、例えば、取得した匿名データ群８４１に含まれる匿名データを、ランダムに所定回数選択する。生成装置１００は、取得した匿名データ群８４１のそれぞれの匿名データを選択してもよい。 (8-5) The generation device 100 applies noise to the MA-DB840. The generation device 100 acquires, for example, the anonymous data group 841 stored in the MA-DB840. The generation device 100 randomly selects, for example, the anonymous data included in the acquired anonymous data group 841 a predetermined number of times. The generation device 100 may select each anonymous data of the acquired anonymous data group 841.

生成装置１００は、例えば、匿名データを選択する都度、選択した匿名データに基づいて、学習した生成モデル８５０により、新たな匿名データを１以上生成する。生成装置１００は、例えば、取得した匿名データ群８４１のうち、いずれか一つの匿名データを選択し、選択した匿名データに基づいて、学習した生成モデル８５０により、新たな匿名データを、匿名データ群８４１の匿名データの数と同一の数だけ生成してもよい。生成装置１００は、例えば、生成した新たな匿名データを含む匿名データ群８５１を、リリースＤＢ８２０に保存する。生成装置１００は、新たな匿名データを１以上生成した際、プライバシーテストを実施してもよい。 For example, each time a piece of anonymous data is selected, the generation device 100 generates one or more pieces of new anonymous data with the learned generative model 850 based on the selected anonymous data. Alternatively, the generation device 100 may select any one piece of anonymous data from the acquired anonymous data group 841 and, based on the selected anonymous data, generate with the learned generative model 850 as many pieces of new anonymous data as there are pieces of anonymous data in the anonymous data group 841. The generation device 100 stores, for example, the anonymous data group 851 containing the generated new anonymous data in the release DB 820. The generation device 100 may perform a privacy test when it has generated one or more pieces of new anonymous data.

このように、生成装置１００が、ＤＢ８００を処理対象とした場合、リリースＤＢ８２０が得られる。一方で、生成装置１００が、性別の値が「男性」の個人データを含む個人データ群を保存した他のＤＢを処理対象とした場合、リリースＤＢ８６１が得られたものとする。 As described above, when the generation device 100 targets the DB 800 as a processing target, the release DB 820 is obtained. On the other hand, when the generation device 100 targets another DB that stores the personal data group including the personal data whose gender value is "male", it is assumed that the release DB 861 is obtained.

（８−６）生成装置１００は、リリースＤＢ８２０と、リリースＤＢ８６１とを結合し、ＤＢ８６０を生成する。これにより、生成装置１００は、有用な匿名データを得ることができ、匿名データの有用性と、個人データの匿名性とを両立することができる。 (8-6) The generation device 100 combines the release DB 820 and the release DB 861 to generate the DB 860. As a result, the generation device 100 can obtain useful anonymous data, and can achieve both the usefulness of the anonymous data and the anonymity of the personal data.

生成装置１００は、プライバシーテストを実施するにあたり、一定確率で、本来であればＯＫと判定され得る個人データを、ＮＧ−ＤＢ８３０に保存することができる。このため、生成装置１００は、ＮＧ−ＤＢ８３０に記憶された個人データの数が少ないために、統計的解析により、個人データの匿名性が損なわれるおそれが生じるような状況を回避することができる。 When performing the privacy test, the generation device 100 can, with a certain probability, store in the NG-DB 830 personal data that would otherwise be determined to be OK. Therefore, the generation device 100 can avoid a situation in which, because the number of pieces of personal data stored in the NG-DB 830 is small, the anonymity of the personal data may be compromised by statistical analysis.

また、生成装置１００は、プライバシーテストによりＯＫと判定された個人データの数に対して、所定の閾値を設定してもよい。所定の閾値は、例えば、ユーザによって予め設定される。そして、生成装置１００は、ＯＫと判定された個人データの数が、所定の閾値以下である間、一定確率で、本来であればＯＫと判定され得る個人データをＮＧと判定し、ＮＧ−ＤＢ８３０に保存するという動作を実施する。一方で、生成装置１００は、ＯＫと判定された個人データの数が、所定の閾値より大きくなった後、一定確率で、本来であればＯＫと判定され得る個人データをＮＧと判定し、ＮＧ−ＤＢ８３０に保存するという動作を停止する。これにより、生成装置１００は、ＯＫと判定される個人データの数と、ＮＧと判定される個人データの数との偏りを低減することができ、データの匿名性の向上を図り易くすることができる。 Further, the generation device 100 may set a predetermined threshold for the number of pieces of personal data determined to be OK by the privacy test. The predetermined threshold is set in advance by the user, for example. While the number of pieces of personal data determined to be OK is equal to or less than the predetermined threshold, the generation device 100 performs the operation of determining, with a certain probability, that personal data that would otherwise be determined to be OK is NG and storing it in the NG-DB 830. Once the number of pieces of personal data determined to be OK exceeds the predetermined threshold, the generation device 100 stops this operation. As a result, the generation device 100 can reduce the imbalance between the number of pieces of personal data determined to be OK and the number determined to be NG, which makes it easier to improve the anonymity of the data.

また、生成装置１００は、ＯＫと判定された個人データの数が、所定の閾値より大きくなった後、一定確率で、本来であればＯＫと判定され得る個人データをＮＧと判定し、ＮＧ−ＤＢ８３０に保存するという動作を開始するようにしてもよい。これにより、生成装置１００は、ＯＫと判定される個人データの数と、ＮＧと判定される個人データの数との偏りを低減することができ、データの匿名性の向上を図り易くすることができる。 Alternatively, the generation device 100 may start the operation of determining, with a certain probability, that personal data that would otherwise be determined to be OK is NG and storing it in the NG-DB 830 only after the number of pieces of personal data determined to be OK exceeds the predetermined threshold. As a result, the generation device 100 can reduce the imbalance between the number of pieces of personal data determined to be OK and the number determined to be NG, which makes it easier to improve the anonymity of the data.

また、生成装置１００は、ＮＧ−ＤＢ８３０に保存する個人データの数に対して、所定の閾値を設定してもよい。所定の閾値は、例えば、ユーザによって予め設定される。そして、生成装置１００は、ＮＧ−ＤＢ８３０に保存された個人データの数が、所定の閾値以下である間、一定確率で、本来であればＯＫと判定され得る個人データをＮＧと判定し、ＮＧ−ＤＢ８３０に保存するという動作を実施する。一方で、生成装置１００は、ＮＧ−ＤＢ８３０に保存された個人データの数が、所定の閾値より大きくなった後、一定確率で、本来であればＯＫと判定され得る個人データをＮＧと判定し、ＮＧ−ＤＢ８３０に保存するという動作を停止する。これにより、生成装置１００は、ＯＫと判定される個人データの数と、ＮＧと判定される個人データの数との偏りを低減することができ、データの匿名性の向上を図り易くすることができる。 Further, the generation device 100 may set a predetermined threshold for the number of pieces of personal data stored in the NG-DB 830. The predetermined threshold is set in advance by the user, for example. While the number of pieces of personal data stored in the NG-DB 830 is equal to or less than the predetermined threshold, the generation device 100 performs the operation of determining, with a certain probability, that personal data that would otherwise be determined to be OK is NG and storing it in the NG-DB 830. Once the number of pieces of personal data stored in the NG-DB 830 exceeds the predetermined threshold, the generation device 100 stops this operation. As a result, the generation device 100 can reduce the imbalance between the number of pieces of personal data determined to be OK and the number determined to be NG, which makes it easier to improve the anonymity of the data.
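
The threshold-controlled variants above differ only in which count is monitored and whether the forced NG determination is active below or above the threshold. A minimal sketch under that reading follows; `should_force_ng` and its parameters are hypothetical names.

```python
import random

def should_force_ng(count, threshold, prob, rng,
                    active_while_at_or_below=True):
    """Decide whether to force an NG determination for the next record.

    `count` is the monitored quantity (number of OK records, or number
    of records already in the NG-DB). The forced determination is active
    while count <= threshold, or, for the alternative variant, only once
    count > threshold.
    """
    if active_while_at_or_below:
        active = count <= threshold
    else:
        active = count > threshold
    return active and rng.random() < prob

rng = random.Random(0)
below = should_force_ng(count=3, threshold=5, prob=1.0, rng=rng)
above = should_force_ng(count=6, threshold=5, prob=1.0, rng=rng)
late_start = should_force_ng(6, 5, 1.0, rng, active_while_at_or_below=False)
```

The caller would pass the number of OK determinations or the current size of the NG-DB as `count`, depending on the variant being used.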

（生成装置１００の第３の動作例） (Third operation example of the generation device 100)
次に、図９を用いて、生成装置１００の第３の動作例について説明する。 Next, a third operation example of the generation device 100 will be described with reference to FIG. 9.

図９は、生成装置１００の第３の動作例を示す説明図である。第３の動作例は、生成装置１００が、プライバシーテストにより、ＮＧと判定された複数の個人データのうち、ｋ個の個人データごとに、ミクロアグリゲーションを実施する場合に対応する。 FIG. 9 is an explanatory diagram showing a third operation example of the generator 100. The third operation example corresponds to the case where the generation device 100 performs microaggregation for each of k personal data among the plurality of personal data determined to be NG by the privacy test.

図９において、（９−１）生成装置１００は、（７−１）および（７−２）と同様の動作により、データ管理テーブル５００に基づいて、性別の値が「女性」の個人データを含む個人データ群を、ＤＢ９００に保存している。 In FIG. 9, (9-1) the generation device 100 has stored, in the DB 900, a personal data group containing the personal data whose gender value is "female", based on the data management table 500 and by the same operations as in (7-1) and (7-2).

以下の説明では、生成装置１００が、ＤＢ９００を処理対象とする場合について説明する。生成装置１００が、性別の値が「男性」の個人データを含む個人データ群を保存した他のＤＢなどを処理対象とする場合については、生成装置１００が、ＤＢ９００を処理対象とする場合と同様であるため、説明を省略する。 In the following description, a case where the generation device 100 targets the DB 900 as a processing target will be described. When the generation device 100 targets another DB or the like that stores a personal data group including personal data whose gender value is "male", it is the same as when the generation device 100 targets the DB 900. Therefore, the description thereof will be omitted.

（９−２）生成装置１００は、ＤＢ９００に対して、ノイズ付与とプライバシーテストとを実施する。生成装置１００は、例えば、ＤＢ９００に記憶された個人データ群を取得する。生成装置１００は、例えば、取得した個人データ群に含まれる個人データを、ランダムに所定回数選択する。 (9-2) The generation device 100 performs noise addition and a privacy test on the DB 900. The generation device 100 acquires, for example, a personal data group stored in the DB 900. The generation device 100 randomly selects, for example, personal data included in the acquired personal data group a predetermined number of times.

生成装置１００は、例えば、個人データを選択する都度、選択した個人データに基づいて、生成モデル９１０により匿名データを１以上生成する。生成モデル９１０は、確率的な生成モデルである。生成モデル９１０は、例えば、差分プライバシーのメカニズムを有する生成モデルである。生成モデル９１０は、例えば、取得した個人データ群に基づいて生成されてもよい。 For example, the generation device 100 generates one or more anonymous data by the generation model 910 based on the selected personal data each time the personal data is selected. The generative model 910 is a probabilistic generative model. The generative model 910 is, for example, a generative model having a differential privacy mechanism. The generative model 910 may be generated based on, for example, the acquired personal data group.

生成装置１００は、例えば、生成した１以上の匿名データに基づいて、プライバシーテストを実施し、生成した１以上の匿名データを生成する元となった個人データの匿名度合いが、所定の基準を満たすか否かを判定する。ここで、生成装置１００は、例えば、判定した結果がＯＫであれば、生成した１以上の匿名データを、リリースＤＢ９２０に保存する。一方で、生成装置１００は、判定した結果がＮＧであれば、生成した１以上の匿名データを破棄し、生成した１以上の匿名データを生成する元となった個人データを、ＮＧ−ＤＢ９３０に保存する。 The generation device 100, for example, performs a privacy test based on the generated one or more pieces of anonymous data and determines whether the degree of anonymity of the personal data from which they were generated satisfies the predetermined criterion. Here, for example, if the determination result is OK, the generation device 100 stores the generated one or more pieces of anonymous data in the release DB 920. On the other hand, if the determination result is NG, the generation device 100 discards the generated one or more pieces of anonymous data and stores the personal data from which they were generated in the NG-DB 930.

（９−３）生成装置１００は、ＮＧ−ＤＢ９３０に対して、クラスタリングを実施する。生成装置１００は、例えば、ＮＧ−ＤＢ９３０に記憶された個人データ群９４０を取得する。生成装置１００は、例えば、個人データ群９４０のそれぞれの個人データに含まれる値が大きい順に、個人データ群９４０をソートする。 (9-3) The generation device 100 performs clustering on the NG-DB930. The generation device 100 acquires, for example, the personal data group 940 stored in the NG-DB 930. The generation device 100 sorts the personal data group 940 in descending order of the values included in the personal data of the personal data group 940, for example.

生成装置１００は、例えば、ソート後の個人データ群９４０のうち、上位からｋ個の個人データずつ選択し、同一のクラスタとして分割することにより、ソート後の個人データ群９４０に対して、クラスタリングを実施する。ｋは、例えば、変動値であってもよい。換言すれば、クラスタごとに、異なる数の個人データが含まれていてもよい。図９の例では、生成装置１００は、個人データ群９４０を、個人データ群９４１を同一のクラスタとして分割し、個人データ群９４２を同一のクラスタとして分割する。 For example, the generation device 100 performs clustering on the sorted personal data group 940 by selecting k pieces of personal data at a time from the top and assigning each selection to one cluster. k may be, for example, a variable value. In other words, clusters may contain different numbers of pieces of personal data. In the example of FIG. 9, the generation device 100 divides the personal data group 940 into the personal data group 941, which forms one cluster, and the personal data group 942, which forms another cluster.
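
The sort-and-split clustering of step (9-3) can be sketched as follows, assuming a fixed k and one numeric value per record. The function name `cluster_sorted` and the sample values are hypothetical.

```python
def cluster_sorted(values, k):
    """Sort in descending order, then cut into consecutive groups of k."""
    if k <= 0:
        raise ValueError("k must be positive")
    ordered = sorted(values, reverse=True)
    # Each slice of k consecutive values becomes one cluster.
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]

ng_values = [3.0, 9.0, 1.0, 7.0, 5.0]       # hypothetical NG-DB contents
clusters = cluster_sorted(ng_values, k=2)
```

With a fixed k the last cluster can be smaller than k when the number of records is not a multiple of k. Whether such a remainder should be merged into the previous cluster is a design decision the description leaves open, since it allows k to vary between clusters.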

（９−４）生成装置１００は、クラスタごとに、ミクロアグリゲーションを実施する。生成装置１００は、例えば、あるクラスタに分割された個人データ群９４１を取得する。生成装置１００は、例えば、取得した個人データ群９４１に対して、ミクロアグリゲーションを実施する。図９の例では、生成装置１００は、個人データ群９４１のそれぞれの個人データに含まれる値を、個人データ群９４１のそれぞれの個人データに含まれる値に関する平均値に置換することにより、匿名データ群９６１を生成する。 (9-4) The generation device 100 performs microaggregation for each cluster. The generation device 100 acquires, for example, the personal data group 941 assigned to one cluster. The generation device 100 performs microaggregation on, for example, the acquired personal data group 941. In the example of FIG. 9, the generation device 100 generates the anonymous data group 961 by replacing the value contained in each piece of personal data in the personal data group 941 with the average of those values across the personal data group 941.

生成装置１００は、例えば、あるクラスタに分割された個人データ群９４２を取得する。生成装置１００は、例えば、取得した個人データ群９４２に対して、ミクロアグリゲーションを実施する。図９の例では、生成装置１００は、個人データ群９４２のそれぞれの個人データに含まれる値を、個人データ群９４２のそれぞれの個人データに含まれる値に関する平均値に置換することにより、匿名データ群９６２を生成する。生成装置１００は、例えば、ミクロアグリゲーションにより、個人データ群９４１から得られた匿名データ群９６１と、個人データ群９４２から得られた匿名データ群９６２とを合わせた、匿名データ群９６０を、ＭＡ−ＤＢ９５０に保存する。 The generation device 100 acquires, for example, the personal data group 942 assigned to another cluster. The generation device 100 performs microaggregation on, for example, the acquired personal data group 942. In the example of FIG. 9, the generation device 100 generates the anonymous data group 962 by replacing the value contained in each piece of personal data in the personal data group 942 with the average of those values across the personal data group 942. The generation device 100 stores in the MA-DB 950, for example, the anonymous data group 960, which combines the anonymous data group 961 obtained from the personal data group 941 by microaggregation with the anonymous data group 962 obtained from the personal data group 942.
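
Per-cluster microaggregation as in step (9-4) can be sketched by averaging inside each cluster and then concatenating the results. The function name and the sample clusters are hypothetical; the mean is used as in FIG. 9.

```python
from statistics import mean

def microaggregate_clusters(clusters):
    """Replace each value with the mean of its own cluster, then combine."""
    combined = []
    for cluster in clusters:
        center = mean(cluster)
        combined.extend([center] * len(cluster))
    return combined

# Hypothetical clusters standing in for personal data groups 941 and 942.
clusters = [[9.0, 7.0], [5.0, 3.0]]
ma_db = microaggregate_clusters(clusters)   # stands in for data group 960
```

Because each value moves only to its own cluster mean rather than to the global mean, the combined data stays closer to the original distribution, which matches the stated aim of the third operation example.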

（９−５）生成装置１００は、クラスタごとに、生成モデル９７１，９７２を学習する。生成モデル９７１，９７２は、確率的な生成モデルである。生成モデル９７１，９７２は、例えば、差分プライバシーのメカニズムを有する生成モデルである。 (9-5) The generation device 100 learns the generation models 971 and 972 for each cluster. The generative models 971 and 972 are probabilistic generative models. The generative models 971 and 972 are, for example, generative models having a differential privacy mechanism.

生成装置１００は、例えば、あるクラスタに分割された個人データ群９４１を取得する。生成装置１００は、例えば、取得した個人データ群９４１のそれぞれの個人データを、学習データに用いて、生成モデル９７１を学習する。生成装置１００は、例えば、取得した個人データ群９４１のそれぞれの個人データに含まれる値に関する分散および平均に基づいて、生成モデル９７１を学習する。 The generation device 100 acquires, for example, a personal data group 941 divided into a certain cluster. The generation device 100 learns the generation model 971 by using, for example, each personal data of the acquired personal data group 941 as the training data. The generation device 100 learns the generation model 971 based on, for example, the variance and average of the values contained in each personal data of the acquired personal data group 941.

生成装置１００は、例えば、あるクラスタに分割された個人データ群９４２を取得する。生成装置１００は、例えば、取得した個人データ群９４２のそれぞれの個人データを、学習データに用いて、生成モデル９７２を学習する。生成装置１００は、例えば、取得した個人データ群９４２のそれぞれの個人データに含まれる値に関する分散および平均に基づいて、生成モデル９７２を学習する。 The generation device 100 acquires, for example, a personal data group 942 divided into a certain cluster. The generation device 100 learns the generation model 972 by using, for example, each personal data of the acquired personal data group 942 as the training data. The generation device 100 learns the generation model 972 based on, for example, the variance and average of the values contained in each personal data of the acquired personal data group 942.

（９−６）生成装置１００は、クラスタごとに、ノイズ付与を実施する。生成装置１００は、例えば、あるクラスタに分割された匿名データ群９６１を取得する。生成装置１００は、例えば、取得した匿名データ群９６１に含まれる匿名データを、ランダムに所定回数選択する。生成装置１００は、取得した匿名データ群９６１のそれぞれの匿名データを選択してもよい。 (9-6) The generation device 100 adds noise to each cluster. The generation device 100 acquires, for example, an anonymous data group 961 divided into a certain cluster. The generation device 100 randomly selects, for example, the anonymous data included in the acquired anonymous data group 961 a predetermined number of times. The generation device 100 may select each anonymous data of the acquired anonymous data group 961.

生成装置１００は、例えば、匿名データを選択する都度、選択した匿名データに基づいて、学習した生成モデル９７１により、新たな匿名データを１以上生成する。生成装置１００は、例えば、生成した新たな匿名データを含む匿名データ群を、リリースＤＢ９２０に保存する。生成装置１００は、新たな匿名データを１以上生成した際、プライバシーテストを実施してもよい。 For example, the generation device 100 generates one or more new anonymous data by the learned generation model 971 based on the selected anonymous data each time the anonymous data is selected. The generation device 100 stores, for example, an anonymous data group including the generated new anonymous data in the release DB 920. The generation device 100 may perform a privacy test when one or more new anonymous data are generated.

生成装置１００は、例えば、あるクラスタに分割された匿名データ群９６２を取得する。生成装置１００は、例えば、取得した匿名データ群９６２に含まれる匿名データを、ランダムに所定回数選択する。生成装置１００は、取得した匿名データ群９６２のそれぞれの匿名データを選択してもよい。 The generation device 100 acquires, for example, an anonymous data group 962 divided into a certain cluster. The generation device 100 randomly selects, for example, the anonymous data included in the acquired anonymous data group 962 a predetermined number of times. The generation device 100 may select each anonymous data of the acquired anonymous data group 962.

生成装置１００は、例えば、匿名データを選択する都度、選択した匿名データに基づいて、学習した生成モデル９７２により、新たな匿名データを１以上生成する。生成装置１００は、例えば、生成した新たな匿名データを含む匿名データ群を、リリースＤＢ９２０に保存する。生成装置１００は、新たな匿名データを１以上生成した際、プライバシーテストを実施してもよい。 For example, the generation device 100 generates one or more new anonymous data by the learned generation model 972 based on the selected anonymous data each time the anonymous data is selected. The generation device 100 stores, for example, an anonymous data group including the generated new anonymous data in the release DB 920. The generation device 100 may perform a privacy test when one or more new anonymous data are generated.
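
Steps (9-5) and (9-6) can be sketched together as fitting one model per cluster and generating new values from it. The sketch assumes a Gaussian per cluster, centered on the cluster mean (the microaggregated value) with the cluster's own standard deviation; this model choice and the name `release_per_cluster` are assumptions, and no differential privacy mechanism is modeled.

```python
import random
from statistics import mean, pstdev

def release_per_cluster(clusters, draws_per_cluster, rng):
    """Fit mean and standard deviation per cluster, then sample new values."""
    released = []
    for cluster in clusters:
        center = mean(cluster)     # microaggregated value of this cluster
        sigma = pstdev(cluster)    # spread learned from this cluster only
        for _ in range(draws_per_cluster):
            released.append(rng.gauss(center, sigma))
    return released

clusters = [[9.0, 7.0], [5.0, 3.0]]          # hypothetical clusters
release_db = release_per_cluster(clusters, draws_per_cluster=2,
                                 rng=random.Random(0))
```

Here every record in a cluster shares the same microaggregated value, so drawing a fixed number of samples per cluster plays the role of the repeated random selection in the description.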

In this way, the release DB 920 is obtained when the generation device 100 processes the DB 900. Meanwhile, assume that the release DB 981 is obtained when the generation device 100 processes another DB that stores a personal data group containing personal data whose gender value is "male".

(9-7) The generation device 100 combines the release DB 920 and the release DB 981 to generate the DB 980. The generation device 100 can thereby obtain useful anonymous data and achieve both the usefulness of the anonymous data and the anonymity of the personal data. The generation device 100 can also make it easier to bring the histogram showing the feature distribution of the personal data determined to be NG and the histogram showing the feature distribution of the new anonymous data generated by the generation models 971 and 972 even closer together.

In this way, for an anonymized-data generation algorithm that includes a privacy test such as (k, γ) PD, the generation device 100 can improve the usefulness of the output anonymous data and can achieve both the usefulness of the anonymous data and the anonymity of the personal data.

(Effects of the generation device 100)
Next, the effects of the generation device 100 are described with reference to FIGS. 10 to 17.

FIG. 10 is an explanatory diagram showing an example of a membership inclusion attack. Through operation examples 1 to 3, the generation device 100 can prevent the membership inclusion attack.

A membership inclusion attack becomes possible under the assumption that one or more individuals having the same attributes as the individual associated with personal data d have colluded with the attacker. For example, when the privacy test parameter k = 3, an attacker who knows two individuals having the same attributes as the individual associated with the personal data d can infer the personal data d of the remaining individual.

In the example of FIG. 10, there is Alice, and there are also Becky, Chlis, and Dazy, who have the same attributes "female" and "teens" as Alice. The personal data of each individual is stored in the data management table 500. Conventionally, a release data set 1001 containing data obtained by adding noise to the personal data of each individual would be output based on the data management table 500.

Here, an attacker who knows the background information of Alice, Becky, Chlis, and Dazy can, by observing the release data set 1001, infer that the release data set 1001 contains data derived from Alice's personal data. For example, an attacker who knows that Becky, Chlis, and Dazy exist and have the same attributes "female" and "teens" as Alice can obtain personal data close to Alice's true values.

In contrast, the generation device 100 outputs a release data set 1002 containing data obtained by adding noise to personal data selected at random from the data management table 500. The generation device 100 can therefore ensure the anonymity of each individual's personal data and prevent the membership inclusion attack.

The generation device 100 can generate and output anonymous data based on personal data that would otherwise be determined to be NG. The generation device 100 can therefore keep the user from running the privacy test in an arbitrary manner in order to improve the usefulness of the anonymous data, which makes the membership inclusion attack easier to prevent.

FIGS. 11 to 14 are explanatory diagrams showing a first example of the shapes of the histograms used for comparison. Through operation examples 1 to 3, the generation device 100 can bring the histogram showing the feature distribution of the plurality of pieces of personal data and the histogram showing the feature distribution of the plurality of pieces of output anonymous data close to each other, and can obtain useful anonymous data.

Here, as a point of comparison with the generation device 100, histograms showing the feature distribution of the anonymous data output by a conventional privacy test are described. Table 1100 in FIG. 11 shows, for a conventional privacy test with ε⁰ fixed at 100 and k varied from 1 to 1000, the number of pieces of anonymous data output based on personal data having each value from January to December. The description now moves to FIG. 12, which shows an example of Table 1100 in graph form.

Graph 1200 in FIG. 12 is an example of Table 1100 in graph form. As Table 1100 and Graph 1200 show, increasing k in order to improve the anonymity of the personal data skews the number of pieces of anonymous data that are output. For example, when k = 1000, unlike when k = 1, anonymous data based on personal data having any value from January to September is no longer output.

One conceivable countermeasure in the conventional privacy test is to reduce ε⁰ in order to reduce the skew. Table 1300 in FIG. 13 shows, for a conventional privacy test with ε⁰ fixed at 0.1 and k varied from 1 to 1000, the number of pieces of anonymous data output based on personal data having each value from January to December. The description now moves to FIG. 14, which shows an example of Table 1300 in graph form.

Graph 1400 in FIG. 14 is an example of Table 1300 in graph form. As Table 1300 and Graph 1400 show, the number of pieces of anonymous data that are output is still skewed even when ε⁰ is reduced. For example, when k = 1000, compared with when k = 1, the number of pieces of anonymous data based on personal data having the November value decreases by about 80%, whereas the number based on personal data having the April value decreases by about 50%.

In contrast, the generation device 100 can generate and output new anonymous data based on the personal data determined to be NG by the privacy test. The generation device 100 can therefore make the number of pieces of output anonymous data less prone to skew, and can bring the histogram showing the feature distribution of the plurality of pieces of personal data and the histogram showing the feature distribution of the plurality of pieces of output anonymous data close to each other. The generation device 100 can thus obtain useful anonymous data.

In addition, the generation device 100 can generate and output anonymous data based on personal data that would otherwise be determined to be NG. The generation device 100 can therefore keep the user from running the privacy test in an arbitrary manner in order to improve the usefulness of the anonymous data, and can improve the anonymity of the personal data.

FIGS. 15 to 17 are explanatory diagrams showing a second example of the shapes of the histograms used for comparison. Through operation examples 1 to 3, the generation device 100 can bring the histogram showing the feature distribution of the plurality of pieces of personal data and the histogram showing the feature distribution of the plurality of pieces of output anonymous data close to each other, and can obtain useful anonymous data.

Here, as a point of comparison with the generation device 100, histograms showing the feature distribution of the anonymous data output by a conventional privacy test are described. Graph 1500 in FIG. 15 is an example that plots the number of pieces of personal data having each value from January to December. The description now moves to FIG. 16.

Graphs 1601 to 1603 in FIG. 16 are examples that plot, for a conventional privacy test with k fixed at the average value k1 and ε⁰ varied from 0.001 to 100, the number of pieces of anonymous data output based on personal data having each value from January to December. Graph 1601 corresponds to ε⁰ = 0.001, Graph 1602 to ε⁰ = 1, and Graph 1603 to ε⁰ = 100.

As Graphs 1601 to 1603 show, increasing ε⁰ in order to improve the anonymity of the personal data skews the number of pieces of anonymous data that are output. For example, when ε⁰ = 100, anonymous data based on personal data having the value of January to April, June, or August is no longer output. The description now moves to FIG. 17.

Graphs 1701 to 1703 in FIG. 17 are examples that plot, for a conventional privacy test with k fixed at the minimum value k2 and ε⁰ varied from 0.001 to 100, the number of pieces of anonymous data output based on personal data having each value from January to December. Graph 1701 corresponds to ε⁰ = 0.001, Graph 1702 to ε⁰ = 1, and Graph 1703 to ε⁰ = 100.

As Graphs 1701 to 1703 show, increasing ε⁰ makes the value of k easier to infer from the number of pieces of anonymous data that are output. For example, when ε⁰ = 100, only the number of pieces of anonymous data based on personal data having the April value decreases, by about 50%, and that count makes the value of k easier to infer.

In contrast, the generation device 100 can generate and output new anonymous data based on the personal data determined to be NG by the privacy test. The generation device 100 can therefore make the number of pieces of output anonymous data less prone to skew, and can bring the histogram showing the feature distribution of the plurality of pieces of personal data and the histogram showing the feature distribution of the plurality of pieces of output anonymous data close to each other. The generation device 100 can thus obtain useful anonymous data. The generation device 100 can also make the parameters of the privacy test harder to infer.

As described above, it is difficult to tune the parameters k, γ, and ε⁰ of the conventional privacy test so that the usefulness of the anonymous data and the anonymity of the personal data are well balanced. In contrast, the generation device 100 can balance the usefulness of the anonymous data and the anonymity of the personal data even when the privacy test parameters k, γ, and ε⁰ have not been carefully tuned.

Even if the privacy test parameters k, γ, and ε⁰ increase the number of pieces of personal data determined to be NG, the generation device 100 can balance the usefulness of the anonymous data and the anonymity of the personal data. As a result, the generation device 100 can make the histogram showing the feature distribution of the plurality of pieces of personal data and the histogram showing the feature distribution of the plurality of pieces of output anonymous data similar in both quantity and ratio, while ensuring the anonymity of the personal data.
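
The notion of two histograms being similar in ratio can be made concrete with a small sketch. The counts below are hypothetical values chosen only to mirror the "about 50%" and "about 80%" decreases mentioned for April and November; they are not taken from the figures. The choice of total variation distance as the similarity measure, and the names `proportions` and `total_variation`, are likewise assumptions of this example.

```python
def proportions(counts):
    """Convert per-bin counts into per-bin ratios that sum to 1."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two ratio histograms (0 means identical)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Hypothetical counts of records per month (illustration only).
original = {"Apr": 50, "Nov": 50}
skewed = {"Apr": 25, "Nov": 10}     # April down 50%, November down 80%
balanced = {"Apr": 25, "Nov": 25}   # both months down 50%

skew_distance = total_variation(proportions(original), proportions(skewed))
balanced_distance = total_variation(proportions(original), proportions(balanced))
```

A uniform reduction leaves the ratios unchanged and gives a distance of zero, whereas a month-dependent reduction such as the one described for the conventional privacy test gives a positive distance.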

(Preparation processing procedure)
Next, an example of the preparation processing procedure executed by the generation device 100 is described with reference to FIG. 18. The preparation processing is realized by, for example, the CPU 401 shown in FIG. 4, a storage area such as the memory 402 or the recording medium 405, and the network I/F 403.

FIG. 18 is a flowchart showing an example of the preparation processing procedure. In FIG. 18, the generation device 100 first acquires the target attributes for which the usefulness of the histogram is to be kept constant (step S1801). The target attributes are, for example, set in advance by the user. For example, the user sets the priority of the attributes for which the usefulness of the histogram is to be kept constant in the order of gender, then height.

Next, the generation device 100 acquires the stored contents of the DB (step S1802). The generation device 100 then divides the stored contents of the DB on one of the attributes, following the priority (step S1803). For example, following the priority, the generation device 100 divides the stored contents of the DB by gender into female and male.

Next, the generation device 100 stores each of the divided portions as DBn (step S1804). The generation device 100 then determines whether an unprocessed attribute remains (step S1805).

If an unprocessed attribute remains (step S1805: Yes), the generation device 100 returns to step S1802. If no unprocessed attribute remains (step S1805: No), the generation device 100 ends the preparation processing.
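
Steps S1801 to S1805 can be sketched as follows under one possible reading, in which each attribute in priority order further subdivides the partitions produced so far. The function name `prepare`, the record layout, and the 10 cm height bins are assumptions made for the example; the description does not specify how a continuous attribute such as height is divided.

```python
from collections import defaultdict


def prepare(db, attributes):
    """Split db by each (name, key_function) pair in priority order.

    Returns the list of partitions, each of which plays the role of one DBn.
    """
    partitions = [db]
    for name, key_function in attributes:          # S1805 loop over attributes
        next_partitions = []
        for partition in partitions:
            groups = defaultdict(list)
            for record in partition:               # S1803: divide on this attribute
                groups[key_function(record[name])].append(record)
            next_partitions.extend(groups.values())  # S1804: keep each part
        partitions = next_partitions
    return partitions


db = [
    {"gender": "female", "height": 152},
    {"gender": "male", "height": 171},
    {"gender": "female", "height": 158},
    {"gender": "female", "height": 163},
]
# Priority: gender first, then height (binned to 10 cm, an assumption).
attributes = [
    ("gender", lambda value: value),
    ("height", lambda value: value // 10 * 10),
]
db_n = prepare(db, attributes)
```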

(Test processing procedure)
Next, an example of the test processing procedure executed by the generation device 100 is described with reference to FIG. 19. The test processing is realized by, for example, the CPU 401 shown in FIG. 4, a storage area such as the memory 402 or the recording medium 405, and the network I/F 403.

FIG. 19 is a flowchart showing an example of the test processing procedure. In FIG. 19, the generation device 100 acquires a safety criterion (step S1901). The safety criterion includes, for example, the privacy test parameters k, γ, and ε⁰.

Next, the generation device 100 acquires the stored contents of DBn (step S1902). The generation device 100 then generates a mechanism M for generating data, based on the stored contents of DBn (step S1903).

The mechanism M is an algorithm that generates data for which differential privacy is guaranteed. For example, the generation device 100 calculates the mean and variance of the values contained in the records r_i of the stored contents of DBn, and generates the mechanism M based on the calculated mean and variance and on the safety criterion.

Next, the generation device 100 extracts one unprocessed record r_i from the stored contents of DBn and sets it as the input to the mechanism M (step S1904). Using the mechanism M, the generation device 100 then generates a predetermined number of pieces of data M(r_i), each obtained by adding noise to the record r_i (step S1905).

Next, the generation device 100 randomizes k (step S1906). For example, the generation device 100 randomizes k using a mechanism M' for k. At this time, the generation device 100 may generate a plurality of randomized values k'. The generation device 100 then acquires the randomized k' (step S1907).

Next, the generation device 100 runs the privacy test on the record r_i and determines whether the result is OK (step S1908). At this time, the generation device 100 may determine the result to be NG with a fixed probability. If the result is NG (step S1908: No), the generation device 100 proceeds to step S1910. If the result is OK (step S1908: Yes), the generation device 100 proceeds to step S1909.

In step S1909, the generation device 100 stores the data M(r_i) in the RL-DB as part of the release data set (step S1909). The RL-DB is the release DB. The generation device 100 then proceeds to step S1911.

In step S1910, the generation device 100 stores the record r_i in the NG-DB (step S1910). The generation device 100 then proceeds to step S1911.

In step S1911, the generation device 100 determines whether a predetermined number of records r_i have been extracted (step S1911). If the predetermined number has not yet been extracted (step S1911: No), the generation device 100 returns to step S1904. If the predetermined number has been extracted (step S1911: Yes), the generation device 100 ends the test processing.
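
The flow of steps S1903 to S1911 can be sketched as follows for single-valued numeric records. This is a simplified illustration and makes no claim of satisfying differential privacy: the function name `run_test`, the noise scale `stdev / eps0`, the randomization of k by adding an integer in [-1, 1], and the tolerance-based similarity count used in place of the actual privacy test are all assumptions. The sketch also visits every record instead of stopping after a predetermined number.

```python
import random
import statistics


def run_test(db_n, k, tol, eps0, n_out, rng, p_forced_ng=0.0):
    """Return (rl_db, ng_db) after one pass over db_n, a list of numbers."""
    # S1903: derive the noise mechanism M from the spread of DBn (assumed form).
    scale = statistics.pstdev(db_n) / eps0
    rl_db, ng_db = [], []
    for i, r_i in enumerate(db_n):                                   # S1904
        m_r = [r_i + rng.gauss(0.0, scale) for _ in range(n_out)]    # S1905
        k_rand = max(1, k + rng.randint(-1, 1))                      # S1906, S1907
        # Stand-in privacy test: count the other records within tol of r_i.
        similar = sum(
            1 for j, r in enumerate(db_n) if j != i and abs(r - r_i) <= tol
        )
        # S1908: OK only if enough similar records exist; optionally force NG.
        is_ok = similar > k_rand and rng.random() >= p_forced_ng
        if is_ok:
            rl_db.extend(m_r)      # S1909: release the noisy data M(r_i)
        else:
            ng_db.append(r_i)      # S1910: keep the original record for reuse
    return rl_db, ng_db


db_n = [150.0, 151.0, 152.0, 153.0, 190.0]
rl_db, ng_db = run_test(db_n, k=1, tol=5.0, eps0=1.0, n_out=2, rng=random.Random(0))
```

In this example the outlying record 190.0 has no similar records, so it is routed to the NG-DB, while the noisy data for the other four records goes to the RL-DB.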

(Branch processing procedure)
Next, an example of the branch processing procedure executed by the generation device 100 is described with reference to FIG. 20. The branch processing is realized by, for example, the CPU 401 shown in FIG. 4, a storage area such as the memory 402 or the recording medium 405, and the network I/F 403.

FIG. 20 is a flowchart showing an example of the branch processing procedure. In FIG. 20, the generation device 100 acquires a safety criterion (step S2001). The safety criterion includes, for example, the privacy test parameters k, γ, and ε⁰.

Next, the generation device 100 acquires the stored contents of the RL-DB (step S2002). The generation device 100 then acquires the number of records in the stored contents of the RL-DB (step S2003).

Next, the generation device 100 determines whether the number of records is equal to or greater than an upper limit (step S2004). If the number of records is equal to or greater than the upper limit (step S2004: Yes), the generation device 100 proceeds to step S2005. If the number of records is less than the upper limit (step S2004: No), the generation device 100 proceeds to step S2007.

In step S2005, the generation device 100 changes PTOP, the option of the privacy test (step S2005). Next, the generation device 100 executes the test processing shown in FIG. 19 again (step S2006). The generation device 100 then ends the branch processing.

In step S2007, the generation device 100 determines whether the number of records is equal to or less than a lower limit (step S2007). If the number of records is equal to or less than the lower limit (step S2007: Yes), the generation device 100 proceeds to step S2008. If the number of records is greater than the lower limit (step S2007: No), the generation device 100 proceeds to step S2009.

In step S2008, the generation device 100 changes MAOP, the option of the microaggregation (step S2008). The generation device 100 then proceeds to step S2009.

In step S2009, the generation device 100 executes the reuse processing described later with reference to FIG. 21 (step S2009). The generation device 100 then ends the branch processing.
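
The decision structure of steps S2004 to S2009 can be summarized in a short sketch. The function name `branch` and the action labels are hypothetical; the sketch only returns which actions would be taken and does not model what changing PTOP or MAOP actually does, since the description leaves that open.

```python
def branch(n_released, upper, lower):
    """Return the ordered list of actions for a given RL-DB record count."""
    if n_released >= upper:                       # S2004: Yes
        return ["change_PTOP", "rerun_test"]      # S2005, S2006
    actions = []
    if n_released <= lower:                       # S2007: Yes
        actions.append("change_MAOP")             # S2008
    actions.append("run_reuse")                   # S2009
    return actions


too_many = branch(1200, upper=1000, lower=100)
too_few = branch(80, upper=1000, lower=100)
in_range = branch(500, upper=1000, lower=100)
```

Note that the reuse processing runs whenever the record count is below the upper limit; the lower-limit check only decides whether MAOP is changed beforehand.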

(Reuse processing procedure)
Next, an example of the reuse processing procedure executed by the generation device 100 is described with reference to FIG. 21. The reuse processing is realized by, for example, the CPU 401 shown in FIG. 4, a storage area such as the memory 402 or the recording medium 405, and the network I/F 403.

FIG. 21 is a flowchart showing an example of the reuse processing procedure. In FIG. 21, the generation device 100 acquires a safety criterion (step S2101). The safety criterion includes, for example, the privacy test parameters k, γ, and ε⁰.

Next, the generation device 100 acquires the stored contents of the NG-DB (step S2102). The generation device 100 then acquires MAOP and executes microaggregation processing on the stored contents of the NG-DB (step S2103).

Next, the generation device 100 extracts the top k pieces of data from the stored contents of the NG-DB (step S2104). Based on the extracted k pieces of data, the generation device 100 then generates a mechanism M_i for generating data (step S2105).

The mechanism M_i is an algorithm that generates data for which differential privacy is guaranteed. For example, the generation device 100 calculates the mean and variance of the values contained in the extracted k pieces of data, and generates the mechanism M_i based on the calculated mean and variance and on the safety criterion.

Next, the generation device 100 averages the k pieces of data and stores the result in the MA-DB as the i-th processing target (step S2106). The generation device 100 then determines whether all data in the stored contents of the NG-DB has been extracted (step S2107). If unextracted data remains (step S2107: No), the generation device 100 returns to step S2104. If all data has been extracted (step S2107: Yes), the generation device 100 proceeds to step S2108.

In step S2108, the generation device 100 extracts the i-th processing target record m_{i,k} from the stored contents of the MA-DB (step S2108). Next, the generation device 100 transforms the record m_{i,k} with the mechanism M_i to generate the record M_i(m_{i,k}) (step S2109). The generation device 100 then stores the generated record M_i(m_{i,k}) in the RL-DB (step S2110).

Next, the generation device 100 determines whether all records in the stored contents of the MA-DB have been extracted (step S2111). If unextracted records remain (step S2111: No), the generation device 100 returns to step S2108. If all records have been extracted (step S2111: Yes), the generation device 100 ends the reuse processing.
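
The reuse processing of steps S2103 to S2110 can be sketched for single-valued numeric records as follows. This is an illustration under stated assumptions: sorting stands in for the ordering that the microaggregation option MAOP would determine, the noise scale `stdev / eps0` is an assumed form of the mechanism M_i, and the function name `reuse` is hypothetical. The final group may hold fewer than k records in this sketch, which a real microaggregation scheme would normally avoid.

```python
import random
import statistics


def reuse(ng_db, k, eps0, rng):
    """Microaggregate NG records in groups of k and release one noisy mean each."""
    ordered = sorted(ng_db)                       # S2103: assumed ordering
    ma_db = []
    for start in range(0, len(ordered), k):       # S2104 to S2107
        group = ordered[start:start + k]
        m_i = statistics.fmean(group)             # S2106: average of the group
        scale = statistics.pstdev(group) / eps0   # S2105: mechanism M_i (assumed)
        ma_db.append((m_i, scale))
    # S2108 to S2110: transform each averaged record with its own mechanism.
    return [m_i + rng.gauss(0.0, scale) for m_i, scale in ma_db]


ng_db = [190.0, 142.0, 188.0, 141.0, 143.0, 189.0, 165.0]
released = reuse(ng_db, k=3, eps0=1.0, rng=random.Random(0))
```

Because each released value is derived from the mean of a group instead of from a single record, no output corresponds directly to one individual in the NG-DB.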

The generation device 100 may execute some of the steps in the flowcharts of FIGS. 18 to 21 in a different order. For example, the processing of steps S1904 and S1905 and the processing of steps S1906 and S1907 may be interchanged. The generation device 100 may also omit some of the steps in the flowcharts of FIGS. 18 to 21. For example, the processing of steps S1801 to S1805 may be omitted.

As described above, the generation device 100 can identify, among a plurality of pieces of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on the result of anonymizing the personal data contained in the plurality of pieces of personal data with a first anonymization model. The generation device 100 can train a second anonymization model based on the identified one or more pieces of personal data. The generation device 100 can generate, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of those pieces of personal data. The generation device 100 can output new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the trained second anonymization model. The generation device 100 can thereby obtain useful anonymous data and achieve both the usefulness of the anonymous data and the anonymity of the personal data.

The generation device 100 can determine whether the degree of anonymity of a piece of personal data contained in the plurality of pieces of personal data satisfies the predetermined criterion, based on the result of anonymizing that personal data with the first anonymization model. In making this determination, the generation device 100 can determine, with a predetermined probability, that the predetermined criterion is not satisfied. The generation device 100 can identify the one or more pieces of personal data based on the determination results. The generation device 100 can thereby avoid a situation in which so few pieces of personal data are determined not to satisfy the predetermined criterion that statistical analysis could compromise the anonymity of the personal data.

When a piece of personal data contained in the plurality of personal data is anonymized with the first anonymization model, the generation device 100 can calculate the number of other pieces of personal data, among the plurality of personal data, that contain values identical or similar to that personal data. If the calculated number is less than or equal to a predetermined number, the generation device 100 can determine that the predetermined criterion is not satisfied. As a result, the generation device 100 can determine whether the anonymity of the personal data is ensured.
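
A minimal sketch of this count-based determination is shown below. It assumes scalar numeric values and treats two values as "similar" when they differ by at most `tolerance`; that similarity measure and the function names are assumptions made for illustration.

```python
def count_similar_others(index, values, tolerance):
    """Count the OTHER records whose value is within `tolerance` of record `index`."""
    target = values[index]
    return sum(
        1
        for i, v in enumerate(values)
        if i != index and abs(v - target) <= tolerance
    )


def fails_criterion(index, values, k, tolerance):
    """True if the count is at most the predetermined number k."""
    return count_similar_others(index, values, tolerance) <= k
```

For example, with the values `[10.0, 10.5, 11.0, 50.0]`, a tolerance of 1.0, and k = 1, the record 50.0 has no similar neighbors and is judged not to satisfy the criterion, while the record 10.0 has two and passes.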

The generation device 100 can adopt a variable value as the predetermined number. As a result, the generation device 100 can make membership inclusion attacks easier to prevent.

The generation device 100 can adopt, as the first anonymization model, a model that generates one or more pieces of anonymous data obtained by adding a random noise value to a value contained in personal data. When a piece of personal data contained in the plurality of personal data is anonymized with the first anonymization model, the generation device 100 can determine that the predetermined criterion is not satisfied if a representative value of the added noise values is less than or equal to a predetermined threshold. As a result, the generation device 100 can determine whether the anonymity of the personal data is ensured.
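
The noise-based determination can be sketched as follows. The text does not state which representative value is used, so the mean absolute noise is assumed here; uniform noise and the names `anonymize_with_noise` and `fails_by_noise` are likewise assumptions.

```python
import random
import statistics


def anonymize_with_noise(value, n_outputs, scale, rng):
    """Generate `n_outputs` anonymous values by adding uniform noise.

    Returns both the anonymous values and the noise that was added, so the
    noise can be inspected afterwards.
    """
    noises = [rng.uniform(-scale, scale) for _ in range(n_outputs)]
    return [value + n for n in noises], noises


def fails_by_noise(noises, threshold):
    """True if the representative noise (assumed: mean absolute value)
    is at most the predetermined threshold."""
    representative = statistics.mean([abs(n) for n in noises])
    return representative <= threshold
```

The intuition is that a record perturbed only slightly remains close to its original value, so it is treated as insufficiently anonymized.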

The generation device 100 can divide the identified one or more pieces of personal data into one or more clusters. For each cluster, the generation device 100 can learn a second anonymization model corresponding to the cluster, based on the personal data assigned to that cluster. For each cluster, the generation device 100 can generate, based on the personal data assigned to the cluster, anonymous data corresponding to the cluster that has a higher degree of anonymity than each piece of that personal data. For each cluster, the generation device 100 can output new anonymous data obtained by anonymizing the generated anonymous data corresponding to the cluster with the learned second anonymization model corresponding to the cluster. As a result, the generation device 100 can make it easier for the histogram showing the feature distribution of the one or more pieces of new anonymous data obtained through the second anonymization model to resemble the histogram showing the feature distribution of the plurality of personal data.
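
The per-cluster processing can be sketched as below. This passage does not fix a clustering method, so fixed-width binning of scalar values is assumed purely for illustration; the per-cluster model (cluster mean plus uniform noise bounded by the cluster's standard deviation) and all names are also assumptions.

```python
import random
import statistics
from collections import defaultdict


def split_into_clusters(values, bin_width):
    """Assumed clustering: group scalar values into fixed-width bins."""
    clusters = defaultdict(list)
    for v in values:
        clusters[int(v // bin_width)].append(v)
    return dict(clusters)


def process_clusters(values, bin_width, rng):
    """Learn, generate, and output separately for each cluster."""
    released = {}
    for key, members in split_into_clusters(values, bin_width).items():
        # Learn the cluster's second model: noise bounded by the cluster spread.
        spread = statistics.pstdev(members)
        # Generate anonymous data for the cluster: the cluster mean.
        center = statistics.mean(members)
        # Output new anonymous data: one noisy value per cluster member.
        released[key] = [center + rng.uniform(-spread, spread) for _ in members]
    return released
```

Because each cluster keeps its own center and spread, the released values stay near the region their source records came from, which is what helps the overall histogram keep its shape.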

The generation device 100 can calculate a statistical value relating to the values contained in each of the identified one or more pieces of personal data. The generation device 100 can generate one or more pieces of anonymous data by replacing the value contained in each of the identified one or more pieces of personal data with the calculated statistical value. As a result, the generation device 100 can improve the anonymity of the personal data.
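
As a minimal sketch of this replacement, assume each record is a dictionary and the statistical value is the arithmetic mean of one numeric field. The text does not specify which statistic is used; the mean, the record layout, and the name `replace_with_statistic` are assumptions.

```python
import statistics


def replace_with_statistic(records, field):
    """Return copies of `records` in which `field` is replaced by the group mean."""
    stat = statistics.mean([r[field] for r in records])
    # Build new dictionaries so the original personal data is left untouched.
    return [{**r, field: stat} for r in records]
```

After replacement, every record in the group carries the same value for that field, so the field no longer distinguishes one individual from another within the group.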

The generation device 100 can further output anonymous data obtained by anonymizing, with the first anonymization model, the personal data among the plurality of personal data whose degree of anonymity satisfies the predetermined criterion. As a result, the generation device 100 can make useful anonymous data available to the user.

The generation device 100 can adopt, as the second anonymization model, a model that generates one or more pieces of new anonymous data obtained by adding a random noise value to a value contained in anonymous data. As a result, the generation device 100 can adopt a second anonymization model capable of improving the anonymity of the personal data.

The generation device 100 can learn the second anonymization model by determining the range of noise values used in the second anonymization model based on the variance and mean of the values contained in each of the identified one or more pieces of personal data. As a result, the generation device 100 can make it easier for the histogram showing the feature distribution of the one or more pieces of new anonymous data obtained through the second anonymization model to resemble the histogram showing the feature distribution of the plurality of personal data.
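
One way to picture how a noise range might be derived from the variance and mean is sketched below. The exact rule is not given in this passage, so the sketch assumes a symmetric range whose half-width is a multiple of the standard deviation; `width_factor`, the dictionary layout, and the function names are hypothetical.

```python
import math
import random
import statistics


def learn_second_model(values, width_factor=1.0):
    """Derive a noise range from the mean and variance of the identified values."""
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)
    half_width = width_factor * math.sqrt(variance)
    return {"mean": mean, "noise_low": -half_width, "noise_high": half_width}


def apply_second_model(model, anonymous_value, rng):
    """Add uniform noise drawn from the learned range to an anonymous value."""
    return anonymous_value + rng.uniform(model["noise_low"], model["noise_high"])
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the population variance is 4, so with `width_factor=1.0` the learned noise range is from -2 to +2.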

The generation method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a PC or a workstation. The generation program described in this embodiment is recorded on a computer-readable recording medium and is executed by being read from the recording medium by the computer. Examples of the recording medium include a hard disk, a flexible disk, a CD (Compact Disc)-ROM, an MO (Magneto-Optical disc), and a DVD (Digital Versatile Disc). The generation program described in this embodiment may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the above-described embodiment.

(Supplementary Note 1) A generation method characterized in that a computer executes a process comprising:
identifying, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learning, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generating, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
outputting new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.

(Supplementary Note 2) The generation method according to Supplementary Note 1, characterized in that the computer further executes a process of determining whether the degree of anonymity of personal data contained in the plurality of personal data satisfies the predetermined criterion, based on the result of anonymizing that personal data with the first anonymization model, wherein
the determining includes determining, with a predetermined probability, that the predetermined criterion is not satisfied, and
the identifying includes identifying the one or more pieces of personal data based on a result of the determining.

(Supplementary Note 3) The generation method according to Supplementary Note 2, characterized in that the determining includes determining that the predetermined criterion is not satisfied if, when personal data contained in the plurality of personal data is anonymized with the first anonymization model, the number of other pieces of personal data among the plurality of personal data that contain values identical or similar to that personal data is less than or equal to a predetermined number.

(Supplementary Note 4) The generation method according to Supplementary Note 3, characterized in that the predetermined number is a variable value.

(Supplementary Note 5) The generation method according to any one of Supplementary Notes 2 to 4, characterized in that
the first anonymization model is a model that generates one or more pieces of anonymous data obtained by adding a random noise value to a value contained in the personal data, and
the determining includes determining that the predetermined criterion is not satisfied if, when personal data contained in the plurality of personal data is anonymized with the first anonymization model, a representative value of the added noise values is less than or equal to a predetermined threshold.

(Supplementary Note 6) The generation method according to any one of Supplementary Notes 1 to 5, characterized in that the computer further executes a process of dividing the identified one or more pieces of personal data into one or more clusters, wherein
the learning includes learning, for each cluster, the second anonymization model corresponding to the cluster based on the personal data assigned to the cluster,
the generating includes generating, for each cluster, based on the personal data assigned to the cluster, anonymous data corresponding to the cluster that has a higher degree of anonymity than each piece of the personal data assigned to the cluster, and
the outputting includes outputting, for each cluster, new anonymous data obtained by anonymizing the generated anonymous data corresponding to the cluster with the learned second anonymization model corresponding to the cluster.

(Supplementary Note 7) The generation method according to any one of Supplementary Notes 1 to 6, characterized in that the computer further executes a process of calculating a statistical value relating to the values contained in each of the identified one or more pieces of personal data, wherein
the generating includes generating the one or more pieces of anonymous data by replacing the value contained in each of the identified one or more pieces of personal data with the calculated statistical value.

(Supplementary Note 8) The generation method according to any one of Supplementary Notes 1 to 7, characterized in that the outputting further includes outputting anonymous data obtained by anonymizing, with the first anonymization model, the personal data among the plurality of personal data whose degree of anonymity satisfies the predetermined criterion.

(Supplementary Note 9) The generation method according to any one of Supplementary Notes 1 to 8, characterized in that the second anonymization model is a model that generates one or more pieces of new anonymous data obtained by adding a random noise value to a value contained in the anonymous data.

(Supplementary Note 10) The generation method according to Supplementary Note 9, characterized in that the learning includes learning the second anonymization model by determining the range of noise values used in the second anonymization model based on the variance and mean of the values contained in each of the identified one or more pieces of personal data.

(Supplementary Note 11) A generation program characterized by causing a computer to execute a process comprising:
identifying, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learning, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generating, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
outputting new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.

(Supplementary Note 12) A generation device characterized by comprising a control unit configured to:
identify, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learn, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generate, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
output new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.

100 Generation device
102, 750, 830, 930 NG-DB
103, 760, 840, 950 MA-DB
104, 740, 781, 820, 861, 920, 981 Release DB
110 First anonymization model
120 Second anonymization model
300 Data utilization system
301 Data provider device
302 Data user device
310 Network
400 Bus
401 CPU
402 Memory
403 Network I/F
404 Recording medium I/F
405 Recording medium
500 Data management table
600 Storage unit
601 Acquisition unit
602 First anonymization unit
603 Determination unit
604 Identification unit
605 Learning unit
606 Generation unit
607 Second anonymization unit
608 Output unit
701, 711, 751, 801, 831, 842, 940 to 942 Personal data group
702, 703 Cluster
710, 720, 780, 800, 860, 900, 980 DB
730, 770, 810, 850, 910, 971, 972 Generative model
761, 771, 841, 851, 960 to 962 Anonymous data group
1001, 1002 Release data set
1100, 1300 Table
1200, 1400, 1500, 1601 to 1603, 1701 to 1703 Graph

Claims

A generation method characterized in that a computer executes a process comprising:
identifying, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learning, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generating, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
outputting new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.

The generation method according to claim 1, characterized in that the computer further executes a process of determining whether the degree of anonymity of personal data contained in the plurality of personal data satisfies the predetermined criterion, based on the result of anonymizing that personal data with the first anonymization model, wherein
the determining includes determining, with a predetermined probability, that the predetermined criterion is not satisfied, and
the identifying includes identifying the one or more pieces of personal data based on a result of the determining.

The generation method according to claim 2, characterized in that the determining includes determining that the predetermined criterion is not satisfied if, when personal data contained in the plurality of personal data is anonymized with the first anonymization model, the number of other pieces of personal data among the plurality of personal data that contain values identical or similar to that personal data is less than or equal to a predetermined number.

The generation method according to claim 2 or 3, characterized in that
the first anonymization model is a model that generates one or more pieces of anonymous data obtained by adding a random noise value to a value contained in the personal data, and
the determining includes determining that the predetermined criterion is not satisfied if, when personal data contained in the plurality of personal data is anonymized with the first anonymization model, a representative value of the added noise values is less than or equal to a predetermined threshold.

The generation method according to any one of claims 1 to 4, characterized in that the computer further executes a process of dividing the identified one or more pieces of personal data into one or more clusters, wherein
the learning includes learning, for each cluster, the second anonymization model corresponding to the cluster based on the personal data assigned to the cluster,
the generating includes generating, for each cluster, based on the personal data assigned to the cluster, anonymous data corresponding to the cluster that has a higher degree of anonymity than each piece of the personal data assigned to the cluster, and
the outputting includes outputting, for each cluster, new anonymous data obtained by anonymizing the generated anonymous data corresponding to the cluster with the learned second anonymization model corresponding to the cluster.

The generation method according to any one of claims 1 to 5, characterized in that the computer further executes a process of calculating a statistical value relating to the values contained in each of the identified one or more pieces of personal data, wherein
the generating includes generating the one or more pieces of anonymous data by replacing the value contained in each of the identified one or more pieces of personal data with the calculated statistical value.

A generation program characterized by causing a computer to execute a process comprising:
identifying, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learning, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generating, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
outputting new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.

A generation device characterized by comprising a control unit configured to:
identify, from among a plurality of personal data, one or more pieces of personal data whose degree of anonymity does not satisfy a predetermined criterion, based on a result of anonymizing personal data contained in the plurality of personal data with a first anonymization model that anonymizes information;
learn, based on the identified one or more pieces of personal data, a second anonymization model that anonymizes information;
generate, based on the identified one or more pieces of personal data, one or more pieces of anonymous data having a higher degree of anonymity than each of the one or more pieces of personal data; and
output new anonymous data obtained by anonymizing each of the generated one or more pieces of anonymous data with the learned second anonymization model.