JP2017215669A

JP2017215669A - Probability density function estimation device, continuous value prediction device, method, and program

Info

Publication number: JP2017215669A
Application number: JP2016107501A
Authority: JP
Inventors: Hideaki Kim (金 秀明); Hiroshi Sawada (澤田 宏)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2016-05-30
Filing date: 2016-05-30
Publication date: 2017-12-07
Anticipated expiration: 2036-05-30
Also published as: JP6517731B2

Abstract

[Problem] To estimate a correct probability density function representing the continuous value data generated by an individual. [Solution] A topic estimation unit 24 estimates, for each piece of continuous value data, the topic to which it belongs, and estimates a hyperparameter representing the distribution of each individual's weights over the topics and a hyperparameter representing the distribution of the shape parameters that characterize the histogram for each topic. An individual probability density function estimation unit 26 estimates the shape parameters for each topic and each individual's weight for each topic according to the posterior probability of the shape parameters and the posterior probability of the weights. For each individual, it outputs the probability density function of the continuous value data generated by that individual, expressed as a linear sum of the per-topic histograms based on the estimated shape parameters, weighted by the individual's estimated weight for each topic. [Selected Drawing] Figure 1

Description

The present invention relates to a probability density function estimation device, a continuous value prediction device, a method, and a program.

Consider, in general, a situation in which an individual that behaves stochastically generates continuous values and we observe those values as data. Examples are varied: a consumer (= individual) makes a purchase of a certain amount (= continuous value); a consumer (= individual) makes purchases at a certain time interval (= continuous value); a traveler (= individual) moves to the next place of stay after a certain time interval (= continuous value); a traveler (= individual) stays at certain position coordinates (= continuous value); a word in a document (= individual) appears at a certain time (= continuous value); a device (= individual) fails at a certain time (= continuous value).

If the probabilistic rule that an individual follows when generating continuous values, that is, its probability density function, can be estimated from data, the continuous values that the individual will generate in the future can be predicted. In the examples above, this means predicting a consumer's future purchase time or a traveler's future place of stay. Methods for estimating a probability density function fall broadly into two types: parametric density estimation and nonparametric density estimation. Parametric density estimation expresses the probability density function as a distribution function with parameters and determines the parameter values from data. Nonparametric density estimation, by contrast, estimates the probability density function from data without placing strong assumptions on its shape. Because it makes no strong assumptions, nonparametric density estimation needs more data to produce an accurate estimate. A representative nonparametric method is the histogram method, and the present invention relates to this method.

The histogram method divides the domain of a continuous-valued variable into a finite number of intervals, assumes that the probability density function takes a constant value on each interval, and estimates the probability density function by estimating the value on each interval from the observed data. A probability density function estimated by the histogram method is hereinafter referred to specifically as a histogram. Non-Patent Document 1 describes a device that represents the probability density function of an individual as a histogram.
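As a rough illustration of the histogram method described above, the following Python sketch estimates a piecewise-constant density with equal-width bins. It is a plain frequency-based estimate, not the stochastic-complexity method of Non-Patent Document 1, and the function name `histogram_density` and the sample data are hypothetical.

```python
from typing import List


def histogram_density(data: List[float], t0: float, t1: float, num_bins: int) -> List[float]:
    """Return the constant density value on each of num_bins equal-width bins of [t0, t1]."""
    width = (t1 - t0) / num_bins
    counts = [0] * num_bins
    for t in data:
        # Clamp the index so that t == t1 falls into the last bin.
        idx = min(int((t - t0) / width), num_bins - 1)
        counts[idx] += 1
    n = len(data)
    # Density on a bin = fraction of the data in the bin divided by the bin width.
    return [c / (n * width) for c in counts]


# Hypothetical observations on the domain [0, 1], split into two bins.
density = histogram_density([0.1, 0.2, 0.4, 0.9], t0=0.0, t1=1.0, num_bins=2)
```

Each returned value multiplied by the bin width gives the probability mass of that bin, so the estimated density integrates to one over the domain.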

[Non-Patent Document 1]: J. Rissanen, "Density estimation by stochastic complexity", IEEE Transactions on Information Theory, Vol. 38, pp. 315-323, 1992.

When enough observed data is available for an individual, the histogram method, which can reproduce a wider variety of shapes, is more suitable than parametric density estimation, in which the shape of the density function is restricted (low degree of freedom). In many cases, however, little data is available for each individual, so the histogram method cannot be applied and parametric density estimation is used instead. If the shape of the distribution function assumed in parametric density estimation differs from the true shape, the correct probability density function cannot be estimated, and prediction accuracy suffers as a result. Table 1 summarizes the advantages and disadvantages of the histogram method and parametric density estimation.

Because we cannot know the true probability density function inherent in an individual, the histogram method should be used whenever possible.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a probability density function estimation device, method, and program capable of estimating a correct probability density function representing the continuous value data generated by an individual.

Another object of the present invention is to provide a continuous value prediction device, method, and program capable of accurately predicting the continuous value data generated by an individual.

To achieve the above object, a probability density function estimation device according to a first invention includes a topic estimation unit and an individual probability density function estimation unit.

The topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals. For each piece of continuous value data in the set, it estimates the topic to which the data belongs according to the posterior probability of the topic, which is expressed using the number of divisions and the bin widths, defined for each topic, of the histogram for each topic representing the probability density of continuous values. Based on the topic estimation result for each piece of continuous value data, it then estimates a hyperparameter representing the distribution of each individual's weights over the topics and a hyperparameter representing the distribution of the shape parameters that characterize the histogram for each topic.

The individual probability density function estimation unit estimates the shape parameters for each topic and each individual's weight for each topic according to the posterior probability of the shape parameters and the posterior probability of the weights. These posterior probabilities are based on the hyperparameter representing the distribution of the weights and the hyperparameter representing the distribution of the shape parameters, both estimated by the topic estimation unit, and on the topic estimation result for each piece of continuous value data. For each individual, the unit outputs the probability density function of the continuous value data generated by that individual, expressed as a linear sum of the per-topic histograms based on the estimated shape parameters, weighted by the individual's estimated weight for each topic.

In a probability density function estimation method according to a second invention, a topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals. For each piece of continuous value data in the set, it estimates the topic to which the data belongs according to the posterior probability of the topic, which is expressed using the number of divisions and the bin widths, defined for each topic, of the histogram for each topic representing the probability density of continuous values. Based on the topic estimation result for each piece of continuous value data, it estimates a hyperparameter representing the distribution of each individual's weights over the topics and a hyperparameter representing the distribution of the shape parameters that characterize the histogram for each topic.

An individual probability density function estimation unit then estimates the shape parameters for each topic and each individual's weight for each topic according to the posterior probability of the shape parameters and the posterior probability of the weights, which are based on the two hyperparameters estimated by the topic estimation unit and on the topic estimation result for each piece of continuous value data. For each individual, it outputs the probability density function of the continuous value data generated by that individual, expressed as a linear sum of the per-topic histograms based on the estimated shape parameters, weighted by the individual's estimated weight for each topic.

A probability density function estimation device according to a third invention includes a division-number, bin-width, and topic estimation unit and an individual probability density function estimation unit.

The division-number, bin-width, and topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals, and performs the following estimations. For each piece of continuous value data in the set, it estimates the topic to which the data belongs according to the posterior probability of the topic, which is expressed using the number of divisions and the bin widths, for each topic, of the histogram for each topic representing the probability density of continuous values. For each topic, it estimates the number of divisions of the histogram for that topic according to the posterior probability of the number of divisions, which is expressed using the topic estimation result for each piece of continuous value data and the bin widths of the histogram for that topic. It estimates the bin widths of the histogram for each topic according to the posterior probability of the bin widths, which is expressed using the topic estimation result for each piece of continuous value data and the number of divisions of the histogram for each topic. Based on the topic estimation result for each piece of continuous value data, it estimates a hyperparameter representing the distribution of each individual's weights over the topics and a hyperparameter representing the distribution of the shape parameters that characterize the histogram for each topic.

The individual probability density function estimation unit estimates the shape parameters for each topic and each individual's weight for each topic according to the posterior probability of the shape parameters and the posterior probability of the weights, which are based on the two hyperparameters estimated by the division-number, bin-width, and topic estimation unit and on the topic estimation result for each piece of continuous value data. It outputs the probability density function of the continuous value data generated by each individual, expressed as a linear sum of the per-topic histograms based on the estimated shape parameters, numbers of divisions, and bin widths, weighted by the individual's estimated weight for each topic.

In a probability density function estimation method according to a fourth invention, a division-number, bin-width, and topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals, and performs the following estimations. For each piece of continuous value data in the set, it estimates the topic to which the data belongs according to the posterior probability of the topic, which is expressed using the number of divisions and the bin widths, for each topic, of the histogram for each topic representing the probability density of continuous values. For each topic, it estimates the number of divisions of the histogram for that topic according to the posterior probability of the number of divisions, which is expressed using the topic estimation result for each piece of continuous value data and the bin widths of the histogram for that topic. It estimates the bin widths of the histogram for each topic according to the posterior probability of the bin widths, which is expressed using the topic estimation result for each piece of continuous value data and the number of divisions of the histogram for each topic. Based on the topic estimation result for each piece of continuous value data, it estimates a hyperparameter representing the distribution of each individual's weights over the topics and a hyperparameter representing the distribution of the shape parameters that characterize the histogram for each topic.

An individual probability density function estimation unit then estimates the shape parameters for each topic and each individual's weight for each topic according to the posterior probability of the shape parameters and the posterior probability of the weights, which are based on the two hyperparameters estimated by the division-number, bin-width, and topic estimation unit and on the topic estimation result for each piece of continuous value data. It outputs the probability density function of the continuous value data generated by each individual, expressed as a linear sum of the per-topic histograms based on the estimated shape parameters, numbers of divisions, and bin widths, weighted by the individual's estimated weight for each topic.

A continuous value prediction device according to a fifth invention predicts the continuous value data generated by an individual to be predicted. It includes a continuous value prediction unit that predicts this data according to a probability density function, obtained in advance for the individual to be predicted, of the continuous value data generated by the individual. The probability density function is expressed as a linear sum, weighted by the individual's weight for each topic, of the per-topic histograms representing the probability density of continuous values, where each histogram is based on the shape parameters that characterize it, its number of divisions, and its bin widths.

A continuous value prediction method according to a sixth invention is a method used in a continuous value prediction device that predicts the continuous value data generated by an individual to be predicted. In this method, a continuous value prediction unit predicts the data according to a probability density function, obtained in advance for the individual to be predicted, of the continuous value data generated by the individual. The probability density function is expressed as a linear sum, weighted by the individual's weight for each topic, of the per-topic histograms representing the probability density of continuous values, where each histogram is based on the shape parameters that characterize it, its number of divisions, and its bin widths.
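The text here does not state which statistic of the density is used as the prediction. As one possible reading, the Python sketch below predicts the expected value of a mixture of variable bin width histograms. It assumes that each shape parameter is the probability mass of its bin and that bin widths are given as fractions of the domain that sum to one; the names `topic_mean` and `predict_mean` and all numeric values are hypothetical.

```python
from typing import Sequence


def topic_mean(masses: Sequence[float], unit_widths: Sequence[float],
               t0: float, t1: float) -> float:
    """Mean of one histogram: sum of (bin mass x bin midpoint)."""
    mean = 0.0
    left = t0
    for mass, width in zip(masses, unit_widths):
        right = left + (t1 - t0) * width
        mean += mass * 0.5 * (left + right)
        left = right
    return mean


def predict_mean(weights: Sequence[float],
                 topic_masses: Sequence[Sequence[float]],
                 topic_unit_widths: Sequence[Sequence[float]],
                 t0: float, t1: float) -> float:
    """Mean of the mixture: weighted sum of the per-topic means."""
    return sum(w * topic_mean(m, d, t0, t1)
               for w, m, d in zip(weights, topic_masses, topic_unit_widths))


# Two hypothetical topics on the domain [0, 10] and one individual's weights.
prediction = predict_mean(
    weights=[0.5, 0.5],
    topic_masses=[[0.5, 0.5], [1.0, 0.0]],
    topic_unit_widths=[[0.5, 0.5], [0.2, 0.8]],
    t0=0.0, t1=10.0,
)
```

Other choices, such as the mode of the density or a sampled value, would fit the same description equally well.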

A program according to a seventh invention is a program for causing a computer to function as each unit of the above probability density function estimation device or the above continuous value prediction device.

According to the probability density function estimation device, method, and program of the present invention, the shape parameters for each topic and each individual's weight for each topic are estimated, and, for each individual, the probability density function of the continuous value data generated by that individual is output as a linear sum of the per-topic histograms based on the estimated shape parameters, weighted by the individual's estimated weight for each topic. This makes it possible to estimate a correct probability density function representing the continuous value data generated by an individual.

According to the continuous value prediction device, method, and program of the present invention, the continuous value data generated by the individual to be predicted is predicted according to the probability density function of the continuous value data generated by the individual. This function is expressed as a linear sum, weighted by the individual's weight for each topic, of the per-topic histograms based on the shape parameters that characterize each histogram, its number of divisions, and its bin widths. This makes it possible to accurately predict the continuous value data generated by an individual.

A block diagram showing the configuration of the probability density function estimation device according to the first embodiment of the present invention.
A block diagram showing the configuration of the continuous value prediction device according to the first and second embodiments of the present invention.
A flowchart showing the probability density function estimation processing routine in the probability density function estimation device according to the first embodiment of the present invention.
A flowchart showing the continuous value prediction processing routine in the continuous value prediction device according to the first embodiment of the present invention.
A diagram showing an example of a histogram.
A block diagram showing the configuration of the probability density function estimation device according to the second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Outline of the Embodiments of the Present Invention>
First, an outline of the embodiments of the present invention will be described.

In many cases, little data is obtained from any one individual, but a small amount of data can be obtained from each of many individuals. On an e-commerce site, for example, most customers have made only a few purchases, yet the number of registered customers is extremely large. Each individual has its own probability density function, but there should be groups of individuals whose density functions resemble one another. We therefore conceived that two assumptions make it possible to retain the high degree of freedom that is the strength of the histogram method while avoiding data sparsity, which is its weakness: (i) the probability density functions of all individuals are expressed as linear sums of a small number of histograms, and (ii) the differences among the individuals' probability density functions are expressed as differences in the weights of the linear sum. A device that expresses the probability density function of an individual as a linear sum of histograms already exists (Non-Patent Document 1 above), but it handles a single individual and cannot avoid data sparsity, the weak point of the histogram method.
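Assumptions (i) and (ii) can be sketched in Python as follows: a small set of histograms is shared by all individuals, and each individual differs only in its weights. For brevity the sketch uses equal-width bins, and the names `topic_density`, `individual_density`, `customer_a`, and `customer_b` as well as all numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def topic_density(t: float, heights: Sequence[float], t0: float, t1: float) -> float:
    """Density of one shared histogram (equal-width bins) at point t."""
    width = (t1 - t0) / len(heights)
    idx = min(int((t - t0) / width), len(heights) - 1)
    return heights[idx]


def individual_density(t: float, weights: Sequence[float],
                       topics: Sequence[Sequence[float]],
                       t0: float, t1: float) -> float:
    """Density of one individual: weighted linear sum of the shared histograms."""
    return sum(w * topic_density(t, h, t0, t1) for w, h in zip(weights, topics))


# Two shared histograms on [0, 1]; each list holds the density on each bin.
topics: List[List[float]] = [[1.5, 0.5], [0.5, 1.5]]

# Individuals differ only in how they weight the shared histograms.
weights_by_individual: Dict[str, List[float]] = {
    "customer_a": [0.8, 0.2],
    "customer_b": [0.1, 0.9],
}

p_a = individual_density(0.25, weights_by_individual["customer_a"], topics, 0.0, 1.0)
p_b = individual_density(0.25, weights_by_individual["customer_b"], topics, 0.0, 1.0)
```

Because the histograms are estimated from the pooled data of all individuals, only the small weight vector has to be learned from each individual's own few observations.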

In addition to the above, we conceived of (iii) constructing variable bin width histograms. The width of an interval determined by the histogram method is called the bin width. When the bin widths are restricted to be equal, the following problems arise.

Because the division positions of the histogram, that is, the discontinuity points of the probability density function, cannot be set at arbitrary positions, a discrepancy may arise between the true probability density function and the estimated one.

When the probability density function to be estimated has both regions where its shape is localized and regions where it is widely spread, the bin width should be narrow in the former and wide in the latter. Because the bin width can take only a single value, however, a narrow bin width (or a compromise bin width) ends up being determined from the data for the entire domain, and the estimation accuracy of the probability density function may deteriorate as a result. Moreover, choosing a bin width narrower than necessary means dividing the domain into more intervals than necessary. This increases the number of parameters that the histogram method must estimate from the data and raises the computational cost of estimation and prediction.

Prior art exists for histogram methods that estimate non-constant bin widths from data (hereinafter, variable bin width histograms; see, for example, Non-Patent Document 2). However, no device yet exists that realizes variable bin width histograms while expressing the probability density functions of all individuals as linear sums of a small number of histograms and expressing the differences among the individuals' probability density functions as differences in the weights of the linear sum.

The embodiments of the present invention have been made in view of the above points. Their object is to provide a technique that estimates probability density functions with high accuracy and on a per-individual basis by (i) expressing the probability density functions of all individuals as linear sums of a small number of histograms, (ii) expressing the differences among the individuals' probability density functions as differences in the weights of the linear sum, and (iii) constructing variable bin width histograms.

[Non-Patent Document 2]: P. Kontkanen and P. Myllymaki, "MDL Histogram Density Estimation", International Conference on Artificial Intelligence and Statistics, pp. 219-226, 2007.

The embodiments describe, as an example, the case in which the present invention is applied to a probability density function estimation device, which estimates probability density functions from the set of continuous value data of each individual, and to a continuous value prediction device. Specifically, the probability density function estimation device first estimates the probability density function of each individual, expressed with histograms. The continuous value prediction device then predicts continuous values that will be observed in the future based on the estimated probability density function of the individual. The details of each device are described below.

[First Embodiment]
<Configuration of the Probability Density Function Estimation Device According to the First Embodiment of the Present Invention>
Next, the configuration of the probability density function estimation device according to the first embodiment of the present invention will be described. As shown in FIG. 1, the probability density function estimation device 100 according to the first embodiment can be configured as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the probability density function estimation processing routine described later. Functionally, as shown in FIG. 1, the probability density function estimation device 100 includes an input unit 10, a calculation unit 20, and an output unit 40.

The input unit 10 receives as input a set of continuous value data observed for a plurality of individuals.

The continuous value data is generated by the individuals under analysis. It includes the individual ID (hereinafter denoted u), the number of continuous value data points observed for individual u (hereinafter denoted N_u), the total number of continuous value data points observed for all individuals (hereinafter denoted N), the set of continuous value data observed for all individuals (denoted {t_j} ≡ (t_1, t_2, ..., t_N)), and the set of IDs of the individuals that generated each data point (denoted {u_j} ≡ (u_1, u_2, ..., u_N)).
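One way to hold this input in memory, shown as a Python sketch, is a pair of parallel lists for {t_j} and {u_j}, from which N and N_u follow directly. The variable names and sample values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List

# Parallel lists: t[j] is the j-th observed value, u[j] the ID of the individual that generated it.
t: List[float] = [12.5, 3.0, 7.25, 40.0, 5.5]
u: List[str] = ["u1", "u2", "u1", "u3", "u2"]

# N: total number of data points over all individuals.
N = len(t)

# N_u: number of data points observed for each individual.
N_u = Counter(u)

# Observations grouped by individual, convenient for per-individual estimation.
data_by_individual: DefaultDict[str, List[float]] = defaultdict(list)
for value, individual in zip(t, u):
    data_by_individual[individual].append(value)
```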

The input unit 10 also accepts as input the number K of histograms used to represent the probability density function of an individual, and the domain of the probability density function (the range of values that the continuous value can take), T ≡ [T_0, T_1]. Hereinafter, the K histograms are indexed by k = 1, 2, 3, …, K, and the k-th histogram is referred to as topic k.

The input unit 10 also accepts as input the number of divisions W_k of the domain for each topic, and the width (bin width) Δ_lk (T_1 − T_0) of each division interval of each topic (1 ≤ k ≤ K, 1 ≤ l ≤ W_k). Here, Δ_lk is the unitized bin width, so that for each topic k the sum of Δ_lk over l = 1, …, W_k equals 1. Hereinafter, a parameter written without an index denotes the set of the corresponding indexed parameters (for example, W denotes the set of all W_k).
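As an illustration of how the inputs described above fit together, the following minimal sketch converts the unitized bin widths Δ_lk of one topic into bin edges on the domain [T_0, T_1]. The function name `bin_edges` and its arguments are hypothetical and are not part of the specification; the only property taken from the text is that the actual width of bin l is Δ_lk (T_1 − T_0) and that the unitized widths of a topic sum to 1.

```python
from itertools import accumulate


def bin_edges(t0, t1, unit_widths, tol=1e-9):
    """Return the W_k + 1 bin edges of one topic's histogram on [t0, t1].

    unit_widths holds the unitized widths Delta_lk (l = 1..W_k), which must be
    positive and sum to 1; the actual width of bin l is Delta_lk * (t1 - t0).
    """
    if t1 <= t0:
        raise ValueError("domain must satisfy t0 < t1")
    if any(d <= 0 for d in unit_widths):
        raise ValueError("unitized bin widths must be positive")
    if abs(sum(unit_widths) - 1.0) > tol:
        raise ValueError("unitized bin widths must sum to 1")
    # Cumulative unitized widths give the relative position of each edge.
    cumulative = [0.0] + list(accumulate(unit_widths))
    edges = [t0 + c * (t1 - t0) for c in cumulative]
    edges[-1] = t1  # guard against floating-point drift at the upper end
    return edges
```

For example, unitized widths of 0.25, 0.25 and 0.5 on the domain [0, 10] give a variable-width histogram with edges at 0, 2.5, 5 and 10.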

Here, the principle of the generative model used in this embodiment will be described.

First, it is assumed that continuous value data is generated from each individual according to the following generative model. The individual u_j that generated the j-th data t_j generates t_j from a probability density function p(t | φ, W, Δ) expressed as a linear sum of K histograms.

... (1)

Here, z_j is a latent variable representing the topic (1 to K) to which the j-th data belongs, the coefficient of each term of the linear sum is the weight of individual u_j for topic k, and p(t_j | z_j = k, φ_·k) is the distribution of topic k, expressed by the histogram

... (2)

φ_lk is a parameter that determines the shape of the histogram, and w_j^k is the integer l (1 ≤ w_j^k ≤ W_k) that satisfies

... (3)

The probability density function expressed by a histogram is obtained by equally dividing the domain [T_0, T_1] into W_k intervals and representing the probability density in each interval l (1 to W_k) by a constant value φ_lk; w_j^k indicates which interval the continuous value data t_j falls in.
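The interval lookup and the linear sum of histograms can be sketched as follows. This is an illustration under stated assumptions, not the formulas of equations (1) to (3), which are not reproduced here: the bin edges are passed in explicitly, each interval is treated as closed on the left and open on the right (with T_1 assigned to the last interval), and φ_lk is assumed to be the probability mass of interval l, so the density is that mass divided by the interval width. All function names are hypothetical.

```python
from bisect import bisect_right


def bin_index(t, edges):
    """1-based index l of the interval containing t (w_j^k in the text)."""
    if not edges[0] <= t <= edges[-1]:
        raise ValueError("t lies outside the domain")
    # bisect_right counts the edges <= t; clamp so that t == T_1 maps to the last bin.
    return min(bisect_right(edges, t), len(edges) - 1)


def topic_density(t, edges, phi):
    """Density of one topic at t, assuming phi[l - 1] is the probability mass of bin l."""
    l = bin_index(t, edges)
    return phi[l - 1] / (edges[l] - edges[l - 1])


def mixture_density(t, weights, topics):
    """Linear sum over topics; topics is a list of (edges, phi) pairs."""
    return sum(w * topic_density(t, e, p) for w, (e, p) in zip(weights, topics))
```

If φ_lk is instead read literally as the density value of interval l, the division by the interval width in `topic_density` would be dropped.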

Next, for each of the two kinds of parameters appearing in the generative model of equations (1) to (3), namely the weights and φ_lk, a Dirichlet distribution, which is the conjugate prior, is assumed.

... (4)

Next, from equations (1) to (4), the joint probability of the topic assignments z ≡ (z_1, z_2, …, z_N) and the observed continuous value data t ≡ (t_1, t_2, …, t_N) is obtained.

... (5)

Here, N_ku is the number of data of individual u whose assigned topic is k, N_kl is the number of data, among all data whose assigned topic is k, that fall in the l-th interval of the histogram, and N_k is the total number of data whose assigned topic is k. To simplify the model, all elements of the Dirichlet parameter β are set equal (β_1 = β_2 = … = β).
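The three counts defined above can be computed directly from the topic assignments. The sketch below is a minimal illustration; the function name and the parallel-list layout of the inputs are assumptions made for the example.

```python
from collections import Counter


def sufficient_statistics(users, topics, bins):
    """Count N_ku, N_kl and N_k from parallel per-datum lists.

    users[j]  : individual ID u_j
    topics[j] : assigned topic z_j
    bins[j]   : interval index of t_j under the histogram of topic z_j
    """
    n_ku = Counter(zip(topics, users))  # keyed by (k, u)
    n_kl = Counter(zip(topics, bins))   # keyed by (k, l)
    n_k = Counter(topics)               # keyed by k
    return n_ku, n_kl, n_k
```

A `Counter` returns 0 for pairs that never occur, which matches the meaning of an empty count.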

The generative model of this embodiment can be computed with equations (1) to (5). This concludes the description of the principle of the generative model in this embodiment.

The calculation unit 20 includes a continuous value data storage unit 22, a topic estimation unit 24, and an individual probability density function estimation unit 26.

The continuous value data storage unit 22 stores the set of continuous value data accepted by the input unit 10.

For each continuous value data included in the set stored in the continuous value data storage unit 22, the topic estimation unit 24 estimates the topic k to which the continuous value data belongs, according to the posterior probability p(z | t, α, β, W) of the topic z to which the continuous value data belongs. This posterior is expressed using the number of divisions W_k and the bin widths Δ_lk defined for each topic k of the histogram for each topic. Based on the topic estimates for each continuous value data, the topic estimation unit 24 also estimates the hyperparameter α, which represents the distribution of the weight of each individual u for each topic k, and the hyperparameter β, which represents the distribution of the shape parameters φ_·k characterizing the histogram for each topic. The topic estimation unit 24 repeats the topic estimation and the hyperparameter estimation.

The specific calculation performed by the topic estimation unit 24 is described in detail below. First, by computing the posterior probability p(z | t, α, β, W) of the topic assignments z ≡ (z_1, z_2, …, z_N) of the continuous value data, all of the unknown parameters α, β, the weights, and φ can be estimated. Because the posterior probability is difficult to handle analytically, the topic estimation unit 24 estimates the topic assignments by using the formula for Gibbs sampling of the topic z_j to which each continuous value data t_j belongs, obtained from the joint probability of equation (5),

... (6)

to generate and hold P samples (z^(1), z^(2), …, z^(P)) of z ≡ (z_1, z_2, …, z_N). The symbol −j appearing in equation (6) denotes the subset obtained by removing the j-th data from the full set of N data. In practice, what is held is not the samples of z themselves but their sufficient statistics (the counts N_ku, N_kl, and N_k for each sample).
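One sweep of such a sampler might look like the sketch below. Equation (6) is not reproduced here, so the conditional probability used is an assumption: it is the standard collapsed form for a Dirichlet-multinomial topic model, with the histogram term divided by the bin width so that it is a density. The function name, the 0-based indexing, and the data layout are all hypothetical.

```python
import random
from collections import Counter


def gibbs_sweep(z, users, bin_of, widths, alpha, beta, rng):
    """One in-place sweep of collapsed Gibbs sampling over the topic assignments z.

    z[j]         : current topic of datum j (0-based, updated in place)
    users[j]     : individual ID of datum j
    bin_of[j][k] : 0-based bin index of t_j under the histogram of topic k
    widths[k][l] : width of bin l of topic k
    alpha[k]     : Dirichlet parameter of the weights; beta: symmetric parameter of phi
    """
    K = len(alpha)
    n_ku = Counter(zip(z, users))
    n_kl = Counter((k, bin_of[j][k]) for j, k in enumerate(z))
    n_k = Counter(z)
    for j, u in enumerate(users):
        # Remove datum j from the counts (the "-j" statistics).
        k_old = z[j]
        n_ku[(k_old, u)] -= 1
        n_kl[(k_old, bin_of[j][k_old])] -= 1
        n_k[k_old] -= 1
        # Unnormalized conditional probability of each topic for datum j (assumed form).
        scores = []
        for k in range(K):
            l = bin_of[j][k]
            mass = (n_kl[(k, l)] + beta) / (n_k[k] + beta * len(widths[k]))
            scores.append((n_ku[(k, u)] + alpha[k]) * mass / widths[k][l])
        k_new = rng.choices(range(K), weights=scores)[0]
        # Add datum j back under its newly sampled topic.
        z[j] = k_new
        n_ku[(k_new, u)] += 1
        n_kl[(k_new, bin_of[j][k_new])] += 1
        n_k[k_new] += 1
    return n_ku, n_kl, n_k
```

Returning the counts rather than z itself mirrors the remark above that only the sufficient statistics need to be held for each sample.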

While generating the P samples, the topic estimation unit 24 also updates the values of the unknown hyperparameters α and β P times according to equation (7) below, based on the stochastic EM method (see, for example, Non-Patent Document 3). The topic estimation unit 24 then takes the hyperparameter values α^(P) and β^(P) after the P updates as the respective estimates.

... (7)

The specific update formulas are given by equation (8) below.

... (8)

[Non-Patent Document 3]: C. Bishop, “Pattern Recognition and Machine Learning”, Springer, New York, 2006.

The individual probability density function estimation unit 26 estimates the shape parameter φ_lk for each topic k and the weight of each individual u for each topic k according to the posterior probability of the shape parameters and the posterior probability of the weights, both of which are based on the hyperparameters α and β estimated by the topic estimation unit 24 and on the topic estimation results (z^(1), z^(2), …, z^(P)) for each continuous value data.

Specifically, using α, β, and the sufficient statistics obtained by the topic estimation unit 24, the individual probability density function estimation unit 26 estimates the shape parameter φ_lk for each topic k and the weight of each individual u for each topic k according to equations (9) and (10) below.

In this embodiment, the mean of the respective posterior distribution is adopted as the estimate of the shape parameter φ_lk for each topic k and of the weight of each individual u for each topic k.

... (9)

... (10)
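As a hedged illustration of what a posterior mean under a Dirichlet prior looks like, the sketch below averages the standard Dirichlet-multinomial posterior mean over the P held samples. Equations (9) and (10) are not reproduced here, so the exact expressions, the function names, and the list-based layout are assumptions for the example only.

```python
def posterior_mean_shape(n_kl_samples, n_k_samples, beta, n_bins):
    """Average over samples of (N_kl + beta) / (N_k + beta * W_k) for one topic k.

    n_kl_samples[p][l] : count in bin l of the topic for sample p
    n_k_samples[p]     : total count of the topic for sample p
    """
    P = len(n_kl_samples)
    return [
        sum((n_kl_samples[p][l] + beta) / (n_k_samples[p] + beta * n_bins)
            for p in range(P)) / P
        for l in range(n_bins)
    ]


def posterior_mean_weights(n_ku_samples, alpha):
    """Average over samples of (N_ku + alpha_k) / (N_u + sum(alpha)) for one individual u.

    n_ku_samples[p][k] : number of data of the individual assigned to topic k in sample p
    """
    P = len(n_ku_samples)
    a0 = sum(alpha)
    return [
        sum((n_ku_samples[p][k] + alpha[k]) / (sum(n_ku_samples[p]) + a0)
            for p in range(P)) / P
        for k in range(len(alpha))
    ]
```

Both functions return values that sum to 1, so under this reading φ_·k acts as the per-interval probability mass of topic k and the weights form a distribution over topics for each individual.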

Using the results obtained from equations (9) and (10), the individual probability density function estimation unit 26 outputs from the output unit 40, for each individual u, the estimate of the probability density function of the continuous value data generated by individual u. This estimate is expressed as a linear sum of the histograms for the topics, each based on the estimated shape parameters φ_lk, weighted by the estimated weights of the individual for the topics.

... (11)

Here, the histogram is defined by equation (2).

Here, the probability density function of an individual is expressed by the estimate given in equation (11). The estimated weights (1 ≤ k ≤ K) are output for each individual u; the number of divisions W_k, the unitized bin widths Δ_lk, and φ_lk (1 ≤ l ≤ W_k) are output for each topic k; and T ≡ [T_0, T_1] is output as the common domain. An output example is shown in Table 2, which shows the case of K = 3 topics.

<Configuration of the continuous value prediction device according to the first embodiment of the present invention>
Next, the configuration of the continuous value prediction device according to the first embodiment of the present invention will be described. As shown in FIG. 2, the continuous value prediction device 200 according to the first embodiment can be implemented as a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the continuous value prediction processing routine described later. Functionally, as shown in FIG. 2, the continuous value prediction device 200 includes an input unit 50, a calculation unit 60, and an output unit 70. The continuous value prediction device 200 predicts the continuous value data that the individual to be predicted will generate in the future.

The input unit 50 accepts the shape parameters φ_lk characterizing the histogram for each topic k, the weight for each topic, the number of divisions W_k of the histogram for each topic, and the bin widths Δ_lk of the histogram for each topic, all estimated in advance by the probability density function estimation device 100 for the individual u to be predicted.

The calculation unit 60 includes a continuous value prediction unit 62.

Based on the shape parameters φ_lk characterizing the histogram for each topic k, the weight for each topic, the number of divisions W_k of the histogram for each topic, and the bin widths Δ_lk of the histogram for each topic accepted by the input unit 50, the continuous value prediction unit 62 predicts the continuous value data that the individual u to be predicted will generate in the future. The prediction follows the probability density function of the continuous value data generated by individual u shown in equation (11), which is expressed as a linear sum of the histograms for the topics weighted by the weights of individual u for the topics.

Specifically, the continuous value prediction unit 62 uses the probability density function of the individual u to be predicted, output by the probability density function estimation device 100, to calculate the expected value E[t^u] of the continuous value data to be generated in the future, and outputs it through the output unit 70 as the predicted value of the continuous value data.
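For a density that is a weighted sum of histograms, the expected value has a simple closed form: each interval contributes its probability mass times its midpoint. The sketch below illustrates this under the assumption that φ_lk is the probability mass of interval l; the function name and the representation of a topic as an (edges, phi) pair are hypothetical.

```python
def expected_value(weights, topics):
    """E[t] under a weighted sum of histograms.

    topics is a list of (edges, phi) pairs; phi[l] is assumed to be the
    probability mass of the bin between edges[l] and edges[l + 1].
    """
    total = 0.0
    for w, (edges, phi) in zip(weights, topics):
        # A piecewise-constant density has mean = sum of (mass * bin midpoint).
        total += w * sum(
            p * 0.5 * (edges[l] + edges[l + 1]) for l, p in enumerate(phi)
        )
    return total
```

With two equally weighted topics, one uniform on [0, 4] split into two bins and one uniform on [0, 10], the topic means are 2 and 5, so the predicted value is 3.5.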

<Operation of the probability density function estimation device according to the first embodiment of the present invention>
Next, the operation of the probability density function estimation device 100 according to the first embodiment of the present invention will be described. When the input unit 10 accepts a set of continuous value data tagged with individual IDs, the probability density function estimation device 100 stores the set in the continuous value data storage unit 22. When the input unit 10 accepts the number of topics K, the domain of the continuous value, and the number of divisions W_k and bin widths Δ_lk of the histogram of each topic k, the probability density function estimation device 100 executes the probability density function estimation processing routine shown in FIG. 3.

First, in step S100, the set of continuous value data, the number of topics K, the domain of the continuous value, and the number of divisions W_k and bin widths Δ_lk of the histogram of each topic k accepted by the input unit 10 are acquired.

Next, in step S102, for each continuous value data included in the set stored in the continuous value data storage unit 22, the topic estimation unit 24 estimates the topic k to which the continuous value data belongs according to equation (6). The estimation is based on the initial values of α and β, or the values of α and β estimated in the previous step S104, and on the sufficient statistics N_ku, N_kl, and N_k obtained from the initial topic assignments or from the previous estimation result for each continuous value data. The sufficient statistics N_ku, N_kl, N_k, and N_l are then calculated from the topic estimates for each continuous value data.

In step S104, based on the sufficient statistics N_ku, N_kl, N_k, and N_l obtained from the topic estimates for each continuous value data in step S102, the topic estimation unit 24 updates, according to equation (8), the hyperparameter α, which represents the distribution of the weight of each individual u for each topic k, and the hyperparameter β, which represents the distribution of the shape parameters φ_·k characterizing the histogram for each topic.

In step S106, it is determined whether the processing of steps S102 to S104 has been repeated the predetermined P times. If not, the routine returns to step S102 and repeats steps S102 to S104; if so, the routine proceeds to step S108.
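The control flow of steps S102 to S106 can be summarized by the skeleton below. It only fixes the order of operations; the two callables stand in for the topic sampling of step S102 and the hyperparameter update of step S104, and their names and signatures are hypothetical.

```python
def estimate(sample_topics, update_hyperparameters, state, alpha, beta, P):
    """Alternate topic sampling (S102) and hyperparameter updates (S104) P times (S106).

    sample_topics(state, alpha, beta) -> sufficient statistics of the new sample
    update_hyperparameters(stats, alpha, beta) -> (alpha, beta)
    """
    history = []
    for _ in range(P):
        stats = sample_topics(state, alpha, beta)                   # step S102
        alpha, beta = update_hyperparameters(stats, alpha, beta)    # step S104
        history.append(stats)  # per-iteration statistics, needed later in step S108
    return alpha, beta, history
```

Keeping the per-iteration statistics in `history` reflects the fact that the later estimation step uses the sufficient statistics of all P samples together with the final values of α and β.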

In step S108, the individual probability density function estimation unit 26 estimates the shape parameter φ_lk for each topic k and the weight of each individual u for each topic k. The estimates are the means of the posterior distributions of φ_lk and of the weights shown in equations (9) and (10), which are based on the hyperparameters α and β updated in step S104 and on the sufficient statistics corresponding to the topic results (z^(1), z^(2), …, z^(P)) calculated for each continuous value data in step S102 at each iteration.

Then, in step S110, the estimate of the probability density function of the continuous value data generated by each individual u, which uses the estimated shape parameter φ_lk for each topic k and the estimated weight of each individual u for each topic k, is output from the output unit 40, and the probability density function estimation processing routine ends.

<Operation of the continuous value prediction device according to the first embodiment of the present invention>
Next, the operation of the continuous value prediction device 200 according to the first embodiment of the present invention will be described. When the input unit 50 accepts the probability density function of the continuous value data generated by individual u, estimated by the probability density function estimation device 100 for the individual u to be predicted, the continuous value prediction device 200 executes the prediction processing routine shown in FIG. 4.

In step S200, the probability density function of the continuous value data generated by individual u is acquired.

In step S202, the continuous value prediction unit 62 calculates the expected value of the continuous value data from the probability density function of the continuous value data generated by individual u accepted by the input unit 50, and takes it as the predicted value.

Then, in step S204, the output unit 70 outputs the predicted value of the continuous value data obtained in step S202 as the result, and the prediction processing routine ends.

As described above, the probability density function estimation device according to the first embodiment estimates the shape parameter for each topic and the weight of each individual for each topic, and outputs, for each individual, the probability density function of the continuous value data generated by the individual, expressed as a linear sum of the histograms for the topics (each based on the estimated shape parameters) weighted by the estimated weights of the individual for the topics. This makes it possible to estimate a correct probability density function representing the continuous value data generated by an individual.

Further, according to the first embodiment, the probability density functions of all individuals are expressed as linear sums of a small number of histograms, the differences between the probability density functions of individuals are expressed by the weights of the linear sum, and variable bin width histograms are used. This makes it possible to estimate the probability density function with high accuracy and on a per-individual basis. For example, the probability density function can be estimated using a variable bin width histogram such as the one shown in FIG. 5.

Further, the continuous value prediction device according to the first embodiment predicts the continuous value data generated by the individual to be predicted according to the probability density function of the continuous value data generated by the individual. That function is expressed as a linear sum, weighted by the weights of the individual for the topics, of the histograms for the topics, each based on the shape parameters characterizing the histogram, the number of divisions of the histogram, and the bin widths of the histogram for that topic. This makes it possible to predict the continuous value data generated by an individual with high accuracy.

[Second Embodiment]
Next, a second embodiment will be described. Parts having the same configuration as in the first embodiment are denoted by the same reference numerals, and their description is omitted.

The second embodiment differs from the first embodiment in that the number of divisions and the bin widths of the histogram of each topic are estimated at the same time as the topic assignments.

<Configuration of the probability density function estimation device according to the second embodiment of the present invention>
As shown in FIG. 6, the probability density function estimation device 300 according to the second embodiment includes an input unit 10, a calculation unit 320, and an output unit 40.

The input unit 10 accepts as input a set of continuous value data observed for a plurality of individuals. The input unit 10 also accepts as input the number K of histograms used to represent the probability density function of an individual, and the domain of the probability density function (the range of values that the continuous value can take), T ≡ [T_0, T_1].

The calculation unit 320 includes a continuous value data storage unit 22, a division number bin width topic estimation unit 324, and an individual probability density function estimation unit 26.

Like the topic estimation unit 24 of the first embodiment, the division number bin width topic estimation unit 324 estimates, for each continuous value data included in the set stored in the continuous value data storage unit 22, the topic k to which the continuous value data belongs, according to the posterior probability p(z | t, α, β, W, Δ) of the topic z to which the continuous value data belongs. This posterior is expressed using the number of divisions W_k and the bin widths Δ_lk defined for each topic k of the histogram for each topic, which represents the probability density of the continuous value.

For each topic, the division number bin width topic estimation unit 324 also estimates the number of divisions of the histogram for the topic according to the posterior probability p(W | z, t, α, β, W, Δ) of the number of divisions, which is expressed using the topic estimates for each continuous value data and the bin widths of the histogram for the topic. It further estimates the bin widths of the histogram for each topic according to the posterior probability p(Δ | z, t, α, β, W, Δ) of the bin widths, which is expressed using the topic estimates for each continuous value data and the number of divisions of the histogram for each topic.

Like the topic estimation unit 24 of the first embodiment, the division number bin width topic estimation unit 324 also estimates, based on the topic estimates for each continuous value data, the hyperparameter α, which represents the distribution of the weight of each individual u for each topic k, and the hyperparameter β, which represents the distribution of the shape parameters φ_·k characterizing the histogram for each topic. The division number bin width topic estimation unit 324 repeats the topic estimation, the estimation of the number of divisions, the estimation of the bin widths, and the hyperparameter estimation.

The specific calculation performed by the division number bin width topic estimation unit 324 is described in detail below.

First, as with the topic estimation unit 24 of the first embodiment, Gibbs sampling of the topic assignments is executed once. Next, to sample the number of divisions W, a Dirichlet distribution

... (12)

is assumed as the prior, and the joint distribution of equation (5) is marginalized over the bin width Δ. Here, γ is the parameter of the Dirichlet distribution. The marginalization is carried out approximately, based on the Bayesian Information Criterion (Non-Patent Document 3 above).

... (13)

Here, Δ^* is the maximum a posteriori estimate of the bin width.

... (14)

Further, a uniform distribution

... (15)

is assumed as a non-informative prior on the number of divisions, which yields the Gibbs sampling formula for the number of divisions W

... (16)

The number of divisions W is sampled according to equation (16); at the same time, the bin width is given by equation (14). Empirically, the distribution of the number of divisions W is sharply peaked, so the number of divisions obtained in the last sampling is used as the estimate. The update formulas for the unknown hyperparameters α and β are given by equation (17) below, in the same manner as equation (8).

... (17)
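Sampling a discrete quantity such as the number of divisions from a sharply peaked conditional distribution is usually done in the log domain to avoid underflow. The sketch below shows only that generic step: given candidate values and their unnormalized log scores (for instance an approximate log marginal likelihood, with the uniform prior contributing a constant), it draws one candidate. It does not implement equations (13) to (16), and the function name and inputs are hypothetical.

```python
import math
import random


def sample_from_log_scores(candidates, log_scores, rng):
    """Draw one candidate with probability proportional to exp(log_score)."""
    m = max(log_scores)
    # Subtracting the maximum keeps exp() from overflowing and leaves the ratios unchanged.
    weights = [math.exp(s - m) for s in log_scores]
    return rng.choices(candidates, weights=weights)[0]
```

When one candidate dominates by a large margin, the other weights underflow to zero and the dominant candidate is returned on essentially every draw, which is consistent with the remark that the last sampled value can serve as the estimate.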

Using α, β, the sufficient statistics, W, and Δ obtained by the division number bin width topic estimation unit 324, the individual probability density function estimation unit 26 estimates the shape parameter φ_lk for each topic k and the weight of each individual u for each topic k according to equations (9) and (10) above.

The other configurations and operations of the probability density function estimation device 300 according to the second embodiment are the same as in the first embodiment, and their description is omitted.

<Configuration of the continuous value prediction device according to the second embodiment of the present invention>
As in the first embodiment, the continuous value prediction device according to the second embodiment calculates the expected value of the continuous value data from the probability density function of the continuous value data generated by individual u, estimated by the probability density function estimation device 300 for the individual u to be predicted, and takes it as the predicted value.

As described above, the probability density function estimation device according to the second embodiment estimates the number of divisions and the bin widths of the histogram for each topic, estimates the shape parameter for each topic and the weight of each individual for each topic, and outputs, for each individual, the probability density function of the continuous value data generated by the individual. That function is expressed as a linear sum, weighted by the estimated weights of the individual for the topics, of the histograms for the topics, each based on the estimated shape parameters, number of divisions, and bin widths for that topic. This makes it possible to estimate a correct probability density function representing the continuous value data generated by an individual.

なお、本発明は、上述した実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made without departing from the gist of the present invention.

例えば、上記実施の形態に係る確率密度関数推定装置１００、３００は、連続値データ記憶部２２を備えている場合について説明したが、例えば連続値データ記憶部２２が確率密度関数推定装置１００、３００の外部装置に設けられ、確率密度関数推定装置１００、３００は、外部装置と通信手段を用いて通信することにより、連続値データ記憶部２２を参照するようにしてもよい。 For example, the probability density function estimation apparatuses 100 and 300 according to the above embodiment have been described with respect to the case where the continuous value data storage unit 22 is provided. For example, the continuous value data storage unit 22 includes the probability density function estimation apparatuses 100 and 300. The probability density function estimation apparatuses 100 and 300 may be configured to refer to the continuous value data storage unit 22 by communicating with the external apparatus using a communication unit.

In the above embodiments, the probability density function estimation devices 100 and 300 and the continuous value prediction device 200 have been described as separate devices, but the probability density function estimation devices 100 and 300 and the continuous value prediction device 200 may be configured as a single device.

The probability density function estimation device and the continuous value prediction device described above each contain a computer system. Where a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).

Although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

10, 50 Input unit
20, 60, 320 Operation unit
22 Continuous value data storage unit
24 Topic estimation unit
26 Individual probability density function estimation unit
40, 70 Output unit
62 Continuous value prediction unit
100, 300 Probability density function estimation device
200 Continuous value prediction device
324 Division number bin width topic estimation unit

Claims

A probability density function estimation device comprising:
a topic estimation unit that receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals; estimates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability of that topic, the posterior probability being expressed using the number of divisions and the bin width predetermined for each topic of a histogram for each topic representing the probability density of continuous values; and estimates, based on the topic estimation results for the continuous value data, a hyperparameter representing the distribution of the weights of each individual for each topic and a hyperparameter representing the distribution of the shape parameters characterizing the histogram for each topic; and
an individual probability density function estimation unit that estimates the shape parameter for each topic and the weight of each individual for each topic according to the posterior probability of the shape parameter and the posterior probability of the weight, these posterior probabilities being based on the hyperparameter representing the distribution of the weights and the hyperparameter representing the distribution of the shape parameters estimated by the topic estimation unit and on the topic estimation results for the continuous value data; and outputs, for each individual, a probability density function of the continuous value data generated by the individual, expressed as a linear sum of the histograms for the respective topics, each based on the estimated shape parameter for that topic, using the estimated weights of the individual for the respective topics.

A probability density function estimation device comprising:
a division number bin width topic estimation unit that receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals; estimates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability of that topic, the posterior probability being expressed using the number of divisions and the bin width for each topic of a histogram for each topic representing the probability density of continuous values;
estimates, for each topic, the number of divisions of the histogram for the topic according to the posterior probability of that number of divisions, expressed using the topic estimation results for the continuous value data and the bin width of the histogram for the topic;
estimates, for each topic, the bin width of the histogram for the topic according to the posterior probability of that bin width, expressed using the topic estimation results for the continuous value data and the number of divisions of the histogram for the topic; and
estimates, based on the topic estimation results for the continuous value data, a hyperparameter representing the distribution of the weights of each individual for each topic and a hyperparameter representing the distribution of the shape parameters characterizing the histogram for each topic; and
an individual probability density function estimation unit that estimates the shape parameter for each topic and the weight of each individual for each topic according to the posterior probability of the shape parameter and the posterior probability of the weight, these posterior probabilities being based on the hyperparameter representing the distribution of the weights and the hyperparameter representing the distribution of the shape parameters estimated by the division number bin width topic estimation unit and on the topic estimation results for the continuous value data; and outputs, for each individual, a probability density function of the continuous value data generated by the individual, expressed as a linear sum of the histograms for the respective topics, each based on the estimated shape parameter, number of divisions, and bin width for that topic, using the estimated weights of the individual for the respective topics.

A continuous value prediction device for predicting continuous value data generated by an individual to be predicted, the device comprising:
a continuous value prediction unit that predicts the continuous value data generated by the individual to be predicted according to a probability density function, obtained in advance for that individual, of the continuous value data generated by the individual, the probability density function being expressed as a linear sum of histograms for the respective topics, each representing the probability density of continuous values and each based on a shape parameter characterizing the histogram for the topic, the number of divisions of the histogram for the topic, and the bin width of the histogram for the topic, using the weights of the individual for the respective topics.

A probability density function estimation method in which:
a topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals, and estimates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability of that topic, the posterior probability being expressed using the number of divisions and the bin width predetermined for each topic of a histogram for each topic representing the probability density of continuous values;
the topic estimation unit estimates, based on the topic estimation results for the continuous value data, a hyperparameter representing the distribution of the weights of each individual for each topic and a hyperparameter representing the distribution of the shape parameters characterizing the histogram for each topic; and
an individual probability density function estimation unit estimates the shape parameter for each topic and the weight of each individual for each topic according to the posterior probability of the shape parameter and the posterior probability of the weight, these posterior probabilities being based on the hyperparameter representing the distribution of the weights and the hyperparameter representing the distribution of the shape parameters estimated by the topic estimation unit and on the topic estimation results for the continuous value data, and outputs, for each individual, a probability density function of the continuous value data generated by the individual, expressed as a linear sum of the histograms for the respective topics, each based on the estimated shape parameter for that topic, using the estimated weights of the individual for the respective topics.

A probability density function estimation method in which:
a division number bin width topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals and generated by those individuals, and estimates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability of that topic, the posterior probability being expressed using the number of divisions and the bin width for each topic of a histogram for each topic representing the probability density of continuous values;
the division number bin width topic estimation unit estimates, for each topic, the number of divisions of the histogram for the topic according to the posterior probability of that number of divisions, expressed using the topic estimation results for the continuous value data and the bin width of the histogram for the topic;
the division number bin width topic estimation unit estimates, for each topic, the bin width of the histogram for the topic according to the posterior probability of that bin width, expressed using the topic estimation results for the continuous value data and the number of divisions of the histogram for the topic;
the division number bin width topic estimation unit estimates, based on the topic estimation results for the continuous value data, a hyperparameter representing the distribution of the weights of each individual for each topic and a hyperparameter representing the distribution of the shape parameters characterizing the histogram for each topic; and
an individual probability density function estimation unit estimates the shape parameter for each topic and the weight of each individual for each topic according to the posterior probability of the shape parameter and the posterior probability of the weight, these posterior probabilities being based on the hyperparameter representing the distribution of the weights and the hyperparameter representing the distribution of the shape parameters estimated by the division number bin width topic estimation unit and on the topic estimation results for the continuous value data, and outputs, for each individual, a probability density function of the continuous value data generated by the individual, expressed as a linear sum of the histograms for the respective topics, each based on the estimated shape parameter, number of divisions, and bin width for that topic, using the estimated weights of the individual for the respective topics.

A continuous value prediction method in a continuous value prediction device for predicting continuous value data generated by an individual to be predicted, in which:
a continuous value prediction unit predicts the continuous value data generated by the individual to be predicted according to a probability density function, obtained in advance for that individual, of the continuous value data generated by the individual, the probability density function being expressed as a linear sum of histograms for the respective topics, each representing the probability density of continuous values and each based on a shape parameter characterizing the histogram for the topic, the number of divisions of the histogram for the topic, and the bin width of the histogram for the topic, using the weights of the individual for the respective topics.

A program for causing a computer to function as each unit constituting the probability density function estimation device of Claim 1 or Claim 2, or the continuous value prediction device of Claim 3.