JP2017004493A - Data analysis method, data analysis apparatus, and program

Info

Publication number: JP2017004493A
Application number: JP2016020209A
Authority: JP
Inventors: 英生梅谷; Hideo Umetani; 郁大濱; Iku Ohama; 亮太藤村; Ryota Fujimura; 幸恵庄田; Yukie Shoda
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2015-06-05
Filing date: 2016-02-04
Publication date: 2017-01-05

Abstract

[Problem] To enable clustering that reflects diverse information. [Solution] A data analysis method decomposes a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clusters at least one of the first objects and the second objects. The data analysis method includes: an acquisition step of acquiring the basic matrix, in which a value indicating the degree of association has been entered for each element; a setting step of setting K, the number of clusters of the first objects, and L, the number of clusters of the second objects; a decomposition step of decomposing the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns, and a third matrix of L rows and M columns such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and an output step of outputting a clustering result by outputting at least one of the first matrix, the second matrix, and the third matrix. [Selected Figure] FIG. 6

Description

The present invention relates to a data analysis method, a data analysis apparatus, and a program.

In recent years, networking has advanced, and a wide variety of data is now collected and accumulated through a wide variety of devices. Such data includes website access information, customer purchase histories, program recording and viewing histories, and customer attributes such as age and sex. Recommendation services use purchase histories, recording histories, and similar data to cluster users by attributes such as preferences and then recommend products. Known clustering methods include NMF and Tri-NMF, a matrix decomposition method that extends NMF (see, for example, Non-Patent Document 1). In Non-Patent Document 1, the input matrix is decomposed so that it can be approximated by the product of three matrices, and clustering is performed using one of those three matrices.

Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering

Clustering that reflects more diverse information is required.

The present invention therefore provides a data analysis method, a data analysis apparatus, and a program capable of clustering that reflects diverse information.

A data analysis method according to one aspect of the present invention decomposes a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clusters at least one of the first objects and the second objects. The method includes: an acquisition step of acquiring the basic matrix, in which a value indicating the degree of association has been entered for each element; a setting step of setting K, the number of clusters of the first objects, and L, the number of clusters of the second objects; a decomposition step of decomposing the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of one specific row and one specific column holds a numerical value within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and an output step of outputting a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

These general or specific aspects may be implemented as a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.

The data analysis method, data analysis apparatus, and program of the present invention enable clustering that reflects diverse information.

FIG. 1 is a block diagram showing a schematic configuration of a data analysis system for executing the data analysis method according to Embodiment 1.
FIG. 2 is an explanatory diagram showing an example of the basic matrix according to Embodiment 1.
FIG. 3 is an explanatory diagram showing a modification of the basic matrix according to Embodiment 1.
FIG. 4 is a block diagram showing a schematic configuration of the data analysis apparatus according to Embodiment 1.
FIG. 5 is an explanatory diagram showing the concept of the basic matrix, the first matrix, the second matrix, and the third matrix according to Embodiment 1.
FIG. 6 is a flowchart showing the flow of the data analysis method according to Embodiment 1.
FIG. 7 is a flowchart showing the flow of the decomposition processing according to Embodiment 1.
FIG. 8 is an explanatory diagram showing an example of the basic matrix according to Embodiment 1.
FIG. 9 is an explanatory diagram showing an example of the first matrix, the second matrix, and the third matrix obtained by applying the data analysis method to the basic matrix of FIG. 8.
FIG. 10 is a flowchart showing the flow of the decomposition processing according to Embodiment 2.
FIG. 11 is a flowchart showing the flow of the deletion processing according to Embodiment 2.
FIG. 12 is a block diagram showing a modification of the data analysis apparatus.

(Findings underlying the present invention)
The inventors found that the method described in the "Background Art" section has the following problem.

When purchase histories, recording histories, and the like are collected, the only information that can be collected is that a person "purchased a product" or "recorded a program"; the information that a person "did not purchase a product" or "did not record a program" cannot be collected directly. For convenience, the following description uses recording-history data as the example. Because only "recorded" information is accumulated, "not recorded" is obtained indirectly, by inferring that "not accumulated" means "not recorded". "Recorded" can be interpreted as "the user recorded the program because the user likes it". "Not recorded", however, has two possible meanings: "the user did not record the program because the user dislikes it" and "the user did not know the program existed in the first place". The method of Non-Patent Document 1 does not take these two kinds of "not recorded" into account and performs clustering using only the information that "recorded" means "likes the program". In other words, the information that a user "dislikes" a program is not considered at all.

Accordingly, enabling clustering that takes "dislikes" information into account makes it possible to perform clustering that reflects diverse information.

To solve this problem, a data analysis method according to one aspect of the present invention decomposes a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clusters at least one of the first objects and the second objects. The method includes: an acquisition step of acquiring the basic matrix, in which a value indicating the degree of association has been entered for each element; a setting step of setting K, the number of clusters of the first objects, and L, the number of clusters of the second objects; a decomposition step of decomposing the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of one specific row and one specific column holds a numerical value within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and an output step of outputting a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

This makes it possible to perform clustering that takes into account the reasons why a user did not acquire an object, and therefore clustering that reflects diverse information.

For example, numerical values within the predetermined range may be stored in each element of both the one specific row and the one specific column.

Because each element of the specific row and of the specific column of the second matrix then holds a value within the predetermined range, clustering can reflect different kinds of information (how well known an object is, and how frequently a user acquires objects).

For example, the numerical value within the predetermined range may be a positive value that is substantially 0.

Because the value is a positive value that is substantially 0, the association with the clusters can be almost eliminated, and values specific to each kind of information can be obtained.

For example, the sum of the elements in each row of the first matrix may be substantially the same value for all rows.

Because the row sums of the first matrix are then substantially the same for all rows, the values in the columns of the first matrix can be compared easily.

For example, the sum of the elements in each column of the third matrix may be substantially the same value for all columns.

Because the column sums of the third matrix are then substantially the same for all columns, the values in the rows of the third matrix can be compared easily.

For example, the decomposition step may repeatedly update the first matrix, the second matrix, and the third matrix so that the difference between their product and the basic matrix becomes smaller.

Because the three matrices are updated so that the difference between their product and the basic matrix becomes smaller, the matrix decomposition can proceed smoothly.

For example, in the decomposition step, when every element of the k-th row of the second matrix, where the k-th row is a row other than the one specific row, is a numerical value within the predetermined range, the k-th row of the second matrix and the k-th column of the first matrix may be deleted, so that the matrices are updated to a first matrix of N rows and K-1 columns, a second matrix of K-1 rows and L columns, and a third matrix of L rows and M columns.

This speeds up processing and improves the accuracy of clustering.

For example, in the decomposition step, when every element of the l-th column of the second matrix, where the l-th column is a column other than the one specific column, is a numerical value within the predetermined range, the l-th column of the second matrix and the l-th row of the third matrix may be deleted, so that the matrices are updated to a first matrix of N rows and K columns, a second matrix of K rows and L-1 columns, and a third matrix of L-1 rows and M columns.
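The two deletion rules above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `prune_empty_clusters`, the `upper` threshold of 0.01, and the choice to remove all qualifying rows and columns in one pass are hypothetical and are not taken from the patent.

```python
def prune_empty_clusters(first, second, third, special_row, special_col, upper=0.01):
    """Drop clusters whose row or column in the second matrix is entirely near zero.

    first is N x K, second is K x L, third is L x M (lists of lists).
    special_row and special_col are the constrained row and column, which are kept.
    """
    num_rows, num_cols = len(second), len(second[0])
    # Keep a row if it is the special row or has at least one value above the range.
    keep_rows = [r for r in range(num_rows)
                 if r == special_row or any(v > upper for v in second[r])]
    # Keep a column if it is the special column or has at least one value above the range.
    keep_cols = [c for c in range(num_cols)
                 if c == special_col or any(second[r][c] > upper for r in range(num_rows))]
    new_first = [[row[r] for r in keep_rows] for row in first]        # delete columns of first
    new_second = [[second[r][c] for c in keep_cols] for r in keep_rows]
    new_third = [third[c] for c in keep_cols]                          # delete rows of third
    return new_first, new_second, new_third


first = [[0.7, 0.1, 0.2], [0.5, 0.2, 0.3]]
second = [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
third = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pruned_first, pruned_second, pruned_third = prune_empty_clusters(
    first, second, third, special_row=2, special_col=2)
```

In this toy input, row 1 and column 1 of the second matrix are all zero and are not the special row or column, so they are removed together with the matching column of the first matrix and row of the third matrix.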

For example, the first objects may be users, and the degree of association for each element of the basic matrix of N rows and M columns may indicate whether each of the N users is interested in each of the M second objects.

A data analysis apparatus according to one aspect of the present invention decomposes a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clusters at least one of the first objects and the second objects. The apparatus includes: an acquisition unit that acquires the basic matrix, in which a value indicating the degree of association has been entered for each element; a setting unit that sets K, the number of clusters of the first objects, and L, the number of clusters of the second objects; a decomposition unit that decomposes the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of a specific row and a specific column holds a numerical value within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and an output unit that outputs a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

A program according to one aspect of the present invention causes a computer to execute the data analysis method described above.

(Embodiment 1)
A data analysis method according to the embodiment is described in detail below with reference to the drawings. Each embodiment described below shows a general or specific example. The numerical values, shapes, materials, constituent elements, arrangement and connection of constituent elements, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. Among the constituent elements in the following embodiments, those not recited in the independent claims, which represent the broadest concept, are described as optional constituent elements.

[Overall system configuration]
FIG. 1 is a block diagram showing a schematic configuration of a data analysis system for executing the data analysis method according to Embodiment 1.

The data analysis system 1 executes a data analysis method that clusters users by decomposing into three matrices a basic matrix of N rows and M columns indicating whether each of N users is interested in each of M objects. An object is something a user may take an interest in; examples include products that users purchase or rent and programs, such as television and radio programs, that users view, listen to, or record. As for the presence or absence of interest, when the object is a product, having purchased it is treated as "interested" and not having purchased it as "not interested". When the object is a program, having viewed or recorded the program is treated as "interested", and not having viewed or recorded it as "not interested".

FIG. 2 is an explanatory diagram showing an example of the basic matrix.

The basic matrix shown in FIG. 2 indicates whether each of N users U1, U2, U3, U4, U5, ... is interested in each of M objects O1, O2, O3, O4, O5, O6, .... A value indicating the presence or absence of interest is entered for each element of the basic matrix. Specifically, "1" is assigned to elements where there is interest and "0" to elements where there is none. For example, when the objects are programs and whether a program was recorded is used as the indicator of interest, "1" is entered for programs the user has recorded and "0" for programs the user has not recorded. The data values and format are only an example and are not limiting; the value entered for each element need only be non-negative.
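As a minimal sketch of how such a binary basic matrix might be assembled from a recording history, the following Python example fills in "1" for each recorded (user, program) pair and leaves every other element at "0". The function name `build_base_matrix` and the sample data are hypothetical and serve only as an illustration.

```python
def build_base_matrix(users, programs, recordings):
    """Return an N x M list of lists: 1 where a recording exists, 0 otherwise."""
    user_index = {user: i for i, user in enumerate(users)}
    program_index = {program: j for j, program in enumerate(programs)}
    # Every element starts at 0, because only "recorded" events are observed.
    matrix = [[0] * len(programs) for _ in users]
    for user, program in recordings:
        matrix[user_index[user]][program_index[program]] = 1
    return matrix


users = ["U1", "U2", "U3"]
programs = ["O1", "O2", "O3", "O4"]
recordings = [("U1", "O1"), ("U1", "O3"), ("U2", "O2"), ("U3", "O1"), ("U3", "O4")]
base = build_base_matrix(users, programs, recordings)
```

Note that a "0" here only means that no recording was observed; it does not distinguish between a program the user dislikes and a program the user never knew about.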

FIG. 3 is an explanatory diagram showing a modification of the basic matrix.

The example of FIG. 3 shows interest in programs rated on a five-level scale. In this case too, the value of each element may be normalized so that the maximum value "5" becomes "1".
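The normalization described here amounts to dividing every element by the maximum rating. A small Python sketch, in which the function name `normalize_ratings` and the sample ratings are hypothetical:

```python
def normalize_ratings(matrix, max_rating=5):
    """Scale ratings so that max_rating maps to 1.0 and 0 stays 0."""
    return [[value / max_rating for value in row] for row in matrix]


ratings = [[5, 0, 3], [1, 4, 0]]
normalized = normalize_ratings(ratings)
```

The result remains non-negative, so it still satisfies the requirement that the elements of the basic matrix be non-negative values.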

The data analysis system 1 then clusters the users or the objects by decomposing this basic matrix into three matrices.

Specifically, as shown in FIG. 1, the data analysis system 1 includes an input device 200, a display device 300, and a data analysis apparatus 400, which are communicably connected to one another via a network 500.

The network 500 is a wired network such as Ethernet (registered trademark), a wireless network such as a wireless LAN, a public network, or a combination of these networks. A public network is a communication line that a telecommunications carrier provides for communication among an unspecified number of users; examples include ordinary telephone lines and ISDN.

The input device 200 is a device to which the basic matrix of N rows and M columns is input. It is, for example, a personal computer, smartphone, feature phone, or tablet terminal equipped with an input unit 210 such as a keyboard, touch panel, or pointing device. When the basic matrix is input, the input device 200 transmits it to the data analysis apparatus 400 via the network 500.

The display device 300 is a device that, when it receives at least one of the basic matrix and the three matrices from the data analysis apparatus 400, displays the received matrix or matrices. It is, for example, a personal computer, smartphone, feature phone, or tablet terminal equipped with a display unit 310 such as a display. By viewing the matrix or matrices shown on the display unit 310, an analyst can analyze the clustering result.

Although this embodiment illustrates the case where the input device 200 and the display device 300 are separate, independent terminals, they may be a single terminal.

[Data analysis apparatus]
The data analysis apparatus 400 is a processing apparatus that decomposes the basic matrix of N rows and M columns into three matrices. It is, for example, a server, personal computer, smartphone, feature phone, or tablet terminal.

FIG. 4 is a block diagram showing a schematic configuration of the data analysis apparatus 400.

As shown in FIG. 4, the data analysis apparatus 400 includes an acquisition unit 410, a processing unit 420, and an output unit 430.

The acquisition unit 410 acquires the basic matrix input from the input device 200 via the network 500 and outputs it to the processing unit 420.

The processing unit 420 decomposes the basic matrix received from the acquisition unit 410 into three matrices and includes a CPU, RAM, ROM, and the like. The processing unit 420 includes a storage unit 421, a setting unit 422, and a decomposition unit 423.

The storage unit 421 is a storage area that stores the basic matrix input from the acquisition unit 410 and is, for example, a nonvolatile or volatile memory.

The setting unit 422 stores the setting items used in the decomposition processing performed by the decomposition unit 423. The setting items include, for example, K and L, the values that determine the sizes of the three matrices, and the convergence determination conditions used during the decomposition processing. When the decomposition unit 423 performs the decomposition processing, the setting unit 422 sets the setting items by outputting them to the decomposition unit 423.

The setting items need not be stored in the setting unit 422 in advance; values input from the input device 200 may be used instead. In that case, the setting unit 422 stores the setting items received from the input device 200 via the acquisition unit 410.

The decomposition unit 423 decomposes the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of one specific row and one specific column holds a numerical value within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix.

FIG. 5 is an explanatory diagram showing the concept of the basic matrix, the first matrix, the second matrix, and the third matrix according to this embodiment.

For example, if the basic matrix of N rows and M columns is a matrix of 20 rows and 14 columns and K = 3 and L = 3, then the first matrix of N rows and K columns is a matrix of 20 rows and 3 columns, the second matrix of K rows and L columns is a matrix of 3 rows and 3 columns, and the third matrix of L rows and M columns is a matrix of 3 rows and 14 columns.
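The size relationship can be checked with a short Python sketch. It takes N = 20 and M = 14, which is what the stated sizes of the first and third matrices imply, and fills the factors with placeholder values of 1.0; the helper names `factor_shapes` and `matmul` are hypothetical.

```python
def factor_shapes(n, m, k, l):
    """Return the (rows, columns) of the three factors of an n x m basic matrix."""
    return {"first": (n, k), "second": (k, l), "third": (l, m)}


def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    inner = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


shapes = factor_shapes(n=20, m=14, k=3, l=3)
first = [[1.0] * 3 for _ in range(20)]    # N x K placeholder values
second = [[1.0] * 3 for _ in range(3)]    # K x L placeholder values
third = [[1.0] * 14 for _ in range(3)]    # L x M placeholder values
product = matmul(matmul(first, second), third)
```

The product of the three factors has the same size as the basic matrix, which is what allows it to be compared element by element during the decomposition.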

K is the number of user clusters, and L is the number of object clusters. Note, however, that the first matrix, the second matrix, and the third matrix each include one column or one row for an element (cluster) that is not directly related to the user clusters (clusters of the first objects) or the object clusters (clusters of the second objects).

In the case of FIG. 5, K, the number of clusters of the first objects, is "3", and L, the number of clusters of the second objects, is "3". In the description that follows, of the three clusters on each side, two are directly related to the clusters of the first objects or the second objects, respectively, and one is a cluster that is not directly related to them. If the analyst enters only the numbers of user clusters and object clusters that are directly related to the clusters to be classified, the setting unit 422 or the decomposition unit 423 may set K and L to those numbers plus 1 (that is, adding the element that is not directly related).
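The adjustment of the cluster counts can be written as a one-line helper. This is a hypothetical sketch; the function name `internal_cluster_counts` is not from the patent.

```python
def internal_cluster_counts(user_clusters, object_clusters):
    """Add the one extra, not directly related cluster to each analyst-supplied count."""
    return user_clusters + 1, object_clusters + 1


# The analyst asks for 2 user clusters and 2 object clusters, as in the FIG. 5 case.
K, L = internal_cluster_counts(2, 2)
```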

Here, a numerical value within the predetermined range is a positive value that is substantially 0; specifically, it is a value in the range from 0 to 0.1 inclusive, and preferably in the range from 0 to 0.01 inclusive. The elements of the one specific row and/or the one specific column of the second matrix need not all have the same value, provided that each is within the predetermined range. This embodiment describes, as an example, the case where every element of the K-th row and the L-th column of the second matrix is 0. Alternatively, only one of the K-th row and the L-th column may have its elements set to values within the predetermined range. The specific row may be a row other than the K-th row, and the specific column may be a column other than the L-th column.
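A minimal Python sketch of this constraint follows. It assumes the embodiment's choice of 0 for the constrained elements and uses the last row and last column as the specific row and column; the function names `within_range` and `constrain_second` and the sample values are hypothetical.

```python
def within_range(values, upper=0.1):
    """Check that every value lies in the predetermined range [0, upper]."""
    return all(0.0 <= v <= upper for v in values)


def constrain_second(second, row=None, col=None, value=0.0):
    """Return a copy of the second matrix with the given row and/or column set to value."""
    result = [list(r) for r in second]
    if row is not None:
        result[row] = [value] * len(result[row])
    if col is not None:
        for r in result:
            r[col] = value
    return result


second = [[0.8, 0.1, 0.3], [0.2, 0.9, 0.4], [0.5, 0.6, 0.7]]
# Constrain the K-th row and L-th column (index 2 with zero-based indexing).
constrained = constrain_second(second, row=2, col=2)
```

Passing only `row` or only `col` corresponds to the variant in which just one of the two is constrained.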

When each element of the one specific column of the second matrix is set to a value within the predetermined range, a condition is imposed on the third matrix that the sum of the elements in each column be the same for all columns. Likewise, when each element of the one specific row of the second matrix is set to a value within the predetermined range, a condition is imposed on the first matrix that the sum of the elements in each row be the same for all rows. The "same value" need not match exactly; values that are substantially the same within a small tolerance suffice. Any value may be used as the "same value", but "1", "100", or the like may be used so that the relationships among users, objects, and clusters are easy to recognize.
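The sum conditions can be illustrated by rescaling a matrix so that its row sums (or column sums) equal a chosen total. The sketch below only shows what the condition means; how the condition is enforced during the decomposition updates is not described in this section, and the function names are hypothetical.

```python
def normalize_rows(matrix, total=1.0):
    """Rescale each row so that its elements sum to total (all-zero rows are left as is)."""
    result = []
    for row in matrix:
        row_sum = sum(row)
        result.append([total * v / row_sum for v in row] if row_sum > 0 else list(row))
    return result


def normalize_columns(matrix, total=1.0):
    """Rescale each column so that its elements sum to total (all-zero columns are left as is)."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return [[total * v / s if s > 0 else v for v, s in zip(row, column_sums)]
            for row in matrix]


first = [[2.0, 1.0, 1.0], [0.0, 3.0, 1.0]]        # N x K
third = [[1.0, 2.0], [1.0, 0.0], [2.0, 2.0]]      # L x M
first_normalized = normalize_rows(first)
third_normalized = normalize_columns(third)
```

With a total of 1, each row of the first matrix can be read as how strongly a user belongs to each cluster relative to the others, which is what makes the values easy to compare.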

なお、第１行列、第２行列および第３行列のより具体的な説明については後述する。 The first matrix, the second matrix, and the third matrix are described more specifically later.

そして、分解部４２３は、第１行列、第２行列および第３行列の積と、基礎行列との誤差が小さくなるように第１行列、第２行列および第３行列を繰り返し更新するが、設定部４２２に記憶された収束判定条件を満たすと更新を終了する。例えば所定回数（１０００回など）以上の繰り返しが行われた場合に収束判定条件を満たしたと判定する方式、第１行列、第２行列および第３行列の積と、基礎行列との誤差が所定値（例えば１ｅ−６）以下となった場合に収束判定条件を満たしたと判定する方式、一回の更新の前後での第１行列、第２行列および第３行列の積の誤差が所定値（例えば１ｅ−６）以下となった場合に収束判定条件を満たしたと判定する方式などが挙げられる。各方式を一つだけ用いてもよいし、組み合わせて用いてもよい。また、誤差とは、各行列要素の引き算で得られた値の和であってもいいし、各要素の引き算で得られた値の二乗和であってもよい。 The decomposition unit 423 then repeatedly updates the first matrix, the second matrix, and the third matrix so that the error between their product and the basic matrix becomes small, and ends the updating when the convergence determination condition stored in the setting unit 422 is satisfied. Examples of the condition include: a method that determines convergence when the updates have been repeated a predetermined number of times (for example, 1000 times) or more; a method that determines convergence when the error between the product of the first matrix, the second matrix, and the third matrix and the basic matrix is equal to or less than a predetermined value (for example, 1e-6); and a method that determines convergence when the difference in the product of the three matrices before and after a single update is equal to or less than a predetermined value (for example, 1e-6). These methods may be used alone or in combination. The error may be the sum of the differences between corresponding matrix elements, or the sum of the squares of those differences.
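
The three convergence determination methods and the two error definitions can be sketched as follows. The function names, the argument layout, and the choice to combine the three methods with a logical OR are assumptions for illustration; the text only says that the methods may be used alone or in combination.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def error(X, approx, squared=True):
    """Sum of squared element-wise differences, or the plain sum of differences.

    Note that the plain sum lets positive and negative differences cancel.
    """
    diffs = [x - a for rx, ra in zip(X, approx) for x, a in zip(rx, ra)]
    return sum(d * d for d in diffs) if squared else sum(diffs)


def converged(i, err, prev_approx, approx, max_iter=1000, tol=1e-6):
    """Return True when any of the three example conditions holds."""
    if i >= max_iter:                       # repeated a predetermined number of times
        return True
    if err <= tol:                          # error against the basic matrix is small
        return True
    if prev_approx is not None and error(prev_approx, approx) <= tol:
        return True                         # product barely changed in one update
    return False


X = [[1, 0], [0, 1]]
approx = matmul(matmul([[1.0], [0.0]], [[1.0]]), [[1.0, 0.0]])  # F * S * G^T
err = error(X, approx)
```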

図４に示すように、出力部４３０は、基礎行列、第１行列、第２行列および第３行列の少なくとも一つを表示装置３００に出力する。出力部４３０は、基礎行列、第１行列、第２行列および第３行列を一括して出力してもよいし、これらを組み合わせて出力してもよい。また、出力部４３０は、最終的な、第１行列、第２行列および第３行列の積と、基礎行列との誤差も表示装置３００に出力してもよい。 As illustrated in FIG. 4, the output unit 430 outputs at least one of the basic matrix, the first matrix, the second matrix, and the third matrix to the display device 300. The output unit 430 may output the basic matrix, the first matrix, the second matrix, and the third matrix in a lump, or may output them in combination. The output unit 430 may also output an error between the final product of the first matrix, the second matrix, and the third matrix and the basic matrix to the display device 300.

［データ分析方法］
次に、本実施の形態に係るデータ分析方法について説明する。 [Data analysis method]
Next, a data analysis method according to the present embodiment will be described.

図６は、本実施の形態に係るデータ分析方法の流れを示すフローチャートである。 FIG. 6 is a flowchart showing the flow of the data analysis method according to the present embodiment.

入力装置２００では、基礎行列の各要素に、関心の有無を示す値が入力される。また、入力装置２００では、Ｋ、Ｌ、収束判定条件も入力される。これらの入力後においては、入力装置２００は、基礎行列、Ｋ、Ｌおよび収束判定条件をデータ分析装置４００に出力する。なお、設定項目がすでにデータ分析装置４００の設定部４２２に設定されていて、それが以降の処理に用いられる場合には、入力装置２００での設定項目の入力は不要である。 In the input device 200, a value indicating the presence or absence of interest is entered for each element of the basic matrix. K, L, and the convergence determination condition are also entered in the input device 200. After these entries, the input device 200 outputs the basic matrix, K, L, and the convergence determination condition to the data analysis device 400. Note that when the setting items have already been set in the setting unit 422 of the data analysis device 400 and are to be used in the subsequent processing, they need not be entered with the input device 200.

データ分析装置４００の取得部４１０は、入力装置２００からネットワークを介して入力された基礎行列、Ｋ、Ｌおよび収束判定条件を取得する（ステップＳ１）。取得後においては、取得部４１０は、基礎行列を格納部４２１に格納する。 The acquisition unit 410 of the data analysis device 400 acquires the basic matrix, K, L, and convergence determination condition input from the input device 200 via the network (step S1). After the acquisition, the acquisition unit 410 stores the basic matrix in the storage unit 421.

また、設定部４２２は、取得部４１０で取得したＫとＬと収束判定条件とを設定項目として記憶する（ステップＳ２，Ｓ３）。 Further, the setting unit 422 stores K and L acquired by the acquisition unit 410 and the convergence determination condition as setting items (steps S2 and S3).

データ分析装置４００の分解部４２３は、基礎行列と設定項目とに基づいて分解処理を実行する。 The decomposition unit 423 of the data analysis device 400 executes the decomposition process based on the basic matrix and the setting items.

図７は本実施の形態に係る分解処理の流れを示すフローチャートである。 FIG. 7 is a flowchart showing the flow of the decomposition process according to the present embodiment.

分解部４２３は、Ｎ、Ｍ、Ｋ、Ｌに基づいて第１行列、第２行列および第３行列を生成する。このとき、分解部４２３は、第１行列、第２行列、第３行列の各要素に対してランダムな値を代入し、初期化する（ステップＳ１１）。 The decomposition unit 423 generates the first matrix, the second matrix, and the third matrix based on N, M, K, and L. At this time, the decomposing unit 423 substitutes random values for each element of the first matrix, the second matrix, and the third matrix, and initializes them (step S11).

次いで、分解部４２３は、ｉを０とする（ステップＳ１２）。 Next, the decomposition unit 423 sets i to 0 (step S12).

次いで、分解部４２３は、ｉが収束判定条件の所定回数以上であるか否かを判定し、所定回数未満である場合にはステップＳ１４に移行し、所定回数以上である場合には分解処理を終了する。 Next, the decomposition unit 423 determines whether i is equal to or greater than the predetermined number of times given by the convergence determination condition. If i is less than the predetermined number, the process proceeds to step S14; if i is equal to or greater than the predetermined number, the decomposition process ends.

ステップＳ１４では、分解部４２３は、第１行列、第２行列および第３行列を更新する。 In step S14, the decomposition unit 423 updates the first matrix, the second matrix, and the third matrix.

更新時においては、分解部４２３は、初期化した第１行列、第２行列および第３行列を、所定の行列更新式を用いて、更新を行うことで、第１行列、第２行列および第３行列の積が基礎行列に近似する第１行列、第２行列および第３行列を求める。このとき、分解部４２３は、第２行列のＫ行目およびＬ列目の各要素の値が０となるように更新する。 At the time of updating, the decomposing unit 423 updates the initialized first matrix, second matrix, and third matrix using a predetermined matrix updating formula, so that the first matrix, the second matrix, and the second matrix are updated. A first matrix, a second matrix, and a third matrix are obtained in which the product of the three matrices approximates the basic matrix. At this time, the decomposing unit 423 updates the values of the elements in the Kth row and the Lth column of the second matrix to be 0.

以下に行列更新式の一例を示す。 An example of the matrix update formula is shown below.

なお、以下の行列更新式（１）では基礎行列がＸであり、第１行列がＦであり、第２行列がＳであり、第３行列がＧ^Ｔである。またα、β、γは定数である。Ｓ^*は、Ｓの特定の１行および特定の１列の少なくとも一方の各要素の値に所定の範囲に収まる値（本実施の形態では０）を入れた行列である。

In the following matrix update formula (1), the basic matrix is X, the first matrix is F, the second matrix is S, and the third matrix is G^T. α, β, and γ are constants. S^* is the matrix obtained from S by setting each element of at least one of the specific row and the specific column to a value that falls within the predetermined range (0 in the present embodiment).

分解部４２３はこの行列更新式（１）が最小となるように第１行列、第２行列および第３行列を更新する。 The decomposition unit 423 updates the first matrix, the second matrix, and the third matrix so that the matrix update formula (1) is minimized.

行列更新式（１）の詳細を示した数式の一例を下記に示す。

但し、Ｆ_ｗ,ｋは、Ｆのｗ行ｋ列の要素の値示す（１≦ｗ≦Ｎ、１≦ｋ≦Ｋ）。
Ｓ_ｋ,ｌは、Ｓのｋ行ｌ列の要素の値を示す（１≦ｋ≦Ｋ、１≦ｌ≦Ｌ）。
Ｇ_ｔ,ｌは、Ｇのｔ行ｌ列の要素の値の値を示す（１≦ｔ≦Ｍ、１≦ｌ≦Ｌ）。
Ｆ^Ｔは、Ｆの転置行列を示す。
Ｇ^Ｔは、Ｇの転置行列を示す。
Ｓ^Ｔは、Ｓの転置行列を示す。
Ｘ^Ｔは、Ｘの転置行列を示す。
Ａ^Ｔは、Ａの転置行列を示す。 An example of a mathematical expression showing details of the matrix update equation (1) is shown below.

Here, F_{w,k} denotes the value of the element in row w and column k of F (1 ≦ w ≦ N, 1 ≦ k ≦ K).
S_{k,l} denotes the value of the element in row k and column l of S (1 ≦ k ≦ K, 1 ≦ l ≦ L).
G_{t,l} denotes the value of the element in row t and column l of G (1 ≦ t ≦ M, 1 ≦ l ≦ L).
F^T denotes the transpose of F.
G^T denotes the transpose of G.
S^T denotes the transpose of S.
X^T denotes the transpose of X.
A^T denotes the transpose of A.

分解部４２３は、式（２）〜（４）を用いて第１行列、第２行列および第３行列を更新することで、誤差が小さくなるように更新することができる。 The decomposing unit 423 can update the first matrix, the second matrix, and the third matrix so as to reduce the error by using the equations (2) to (4).
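
Equations (2) to (4) are not reproduced in this text, so the following sketch does not implement them. As a stand-in, it uses the multiplicative update rules commonly applied to non-negative matrix tri-factorization, which reduce the squared error between X and F S G^T without any of the penalty terms weighted by α, β, and γ and without the constraint on S^*. All function names and the sample data are assumptions for illustration.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def scale(A, num, den, eps=1e-12):
    """Element-wise A * num / (den + eps); eps avoids division by zero."""
    return [[a * n / (d + eps) for a, n, d in zip(ra, rn, rd)]
            for ra, rn, rd in zip(A, num, den)]


def squared_error(X, F, S, Gt):
    approx = matmul(matmul(F, S), Gt)
    return sum((x - a) ** 2 for rx, ra in zip(X, approx) for x, a in zip(rx, ra))


def update_once(X, F, S, Gt):
    """One round of multiplicative updates for F, S, and G^T (X must be non-negative)."""
    B = matmul(S, Gt)                       # K x M
    Bt = transpose(B)
    F = scale(F, matmul(X, Bt), matmul(F, matmul(B, Bt)))
    Ft, G = transpose(F), transpose(Gt)
    S = scale(S, matmul(matmul(Ft, X), G),
              matmul(matmul(matmul(Ft, F), S), matmul(Gt, G)))
    A = matmul(F, S)                        # N x L
    At = transpose(A)
    Gt = scale(Gt, matmul(At, X), matmul(matmul(At, A), Gt))
    return F, S, Gt


def random_matrix(rows, cols, rng):
    return [[0.1 + rng.random() for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 1, 1]]
F, S, Gt = random_matrix(4, 2, rng), random_matrix(2, 2, rng), random_matrix(2, 4, rng)
errors = [squared_error(X, F, S, Gt)]
for _ in range(50):
    F, S, Gt = update_once(X, F, S, Gt)
    errors.append(squared_error(X, F, S, Gt))
```

Because these updates are multiplicative, an element that starts at 0 stays at 0. With this penalty-free objective, however, a zeroed row or column of S also drives the matching column of F or row of G^T to 0, which is one reason the sketch should not be read as the update of the embodiment.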

次いで、分解部４２３は、更新後の第１行列、第２行列、第３行列の積と、基礎行列との誤差を算出する（ステップＳ１５）。 Next, the decomposing unit 423 calculates an error between the product of the updated first matrix, second matrix, and third matrix and the base matrix (step S15).

次いで、分解部４２３は、ステップＳ１５で求めた誤差が所定値以下か否かを判定し、所定値よりも大きい場合にはステップＳ１７に移行し、所定値以下の場合には分解処理を終了する。 Next, the decomposition unit 423 determines whether or not the error obtained in step S15 is equal to or less than a predetermined value. If the error is greater than the predetermined value, the process proceeds to step S17; if the error is equal to or less than the predetermined value, the decomposition process ends.

ステップＳ１７では、分解部４２３は、ｉに１を加えてステップＳ１３に移行する。これにより、誤差が所定値以下となるまで、或いはｉが所定回数となるまで、第１行列、第２行列および第３行列の更新が繰り返されることになる。分解処理の終了時においては、第１行列、第２行列および第３行列の積と基礎行列との誤差がほとんどない第１行列、第２行列および第３行列が求められる。 In step S17, the disassembling unit 423 adds 1 to i and proceeds to step S13. Thereby, the update of the first matrix, the second matrix, and the third matrix is repeated until the error becomes equal to or smaller than the predetermined value or until i reaches the predetermined number of times. At the end of the decomposition process, the first matrix, the second matrix, and the third matrix with almost no error between the product of the first matrix, the second matrix, and the third matrix and the basic matrix are obtained.
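
The control flow of steps S11 to S17 can be sketched as below. The sketch follows the summary above, in which updating repeats until the error is equal to or less than the predetermined value or i reaches the predetermined number of times. The update rule itself is passed in as a hypothetical `update` callable because the matrix update formulas are not reproduced in this text; `decompose`, `no_op_update`, and the default parameter values are likewise assumptions.

```python
import random


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def squared_error(X, F, S, Gt):
    approx = matmul(matmul(F, S), Gt)
    return sum((x - a) ** 2 for rx, ra in zip(X, approx) for x, a in zip(rx, ra))


def decompose(X, K, L, update, max_iter=1000, tol=1e-6, seed=0):
    """Run the loop of steps S11 to S17 with a caller-supplied update rule."""
    rng = random.Random(seed)
    N, M = len(X), len(X[0])
    # S11: generate the three matrices and initialize them with random values.
    F = [[rng.random() for _ in range(K)] for _ in range(N)]
    S = [[rng.random() for _ in range(L)] for _ in range(K)]
    Gt = [[rng.random() for _ in range(M)] for _ in range(L)]
    i = 0                                         # S12
    err = squared_error(X, F, S, Gt)
    while i < max_iter:                           # S13: iteration count check
        F, S, Gt = update(X, F, S, Gt)            # S14: update the matrices
        # Keep the K-th row and L-th column of the second matrix at 0.
        S = [[0.0 if (k == K - 1 or l == L - 1) else v
              for l, v in enumerate(row)] for k, row in enumerate(S)]
        err = squared_error(X, F, S, Gt)          # S15: error after the update
        if err <= tol:                            # error check: small enough, stop
            break
        i += 1                                    # S17
    return F, S, Gt, err, i


def no_op_update(X, F, S, Gt):
    """Placeholder update that changes nothing; replace with a real rule."""
    return F, S, Gt


F, S, Gt, err, iterations = decompose([[1, 0], [0, 1]], 2, 2, no_op_update, max_iter=5)
```

With the placeholder update the loop simply runs until the iteration limit; a real update rule would normally end it earlier through the error check.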

分解処理が終了し、図５のステップＳ５に移行すると、出力部４３０は表示装置３００に対して基礎行列、第１行列、第２行列および第３行列を出力する。表示装置３００では、基礎行列、第１行列、第２行列および第３行列が表示されるので、解析者がこれらを閲覧することで、クラスタリングされた結果を解析することができる。 When the decomposition process ends and the process proceeds to step S5 in FIG. 5, the output unit 430 outputs the basic matrix, the first matrix, the second matrix, and the third matrix to the display device 300. The display device 300 displays these matrices, so an analyst can view them and analyze the clustering result.

［各行列の一例］
次に、データ分析方法で用いた基礎行列と、データ分析方法によって得られた第１行列、第２行列および第３行列の一例について説明する。 [An example of each matrix]
Next, an example of the basic matrix used in the data analysis method and the first matrix, the second matrix, and the third matrix obtained by the data analysis method will be described.

図８は、基礎行列の一例を示す説明図である。 FIG. 8 is an explanatory diagram illustrating an example of a basic matrix.

図８に示す基礎行列は、２０行１４列の行列である。２０人のユーザＵ１〜Ｕ２０が１４個の対象物としての番組Ｐ１〜Ｐ１４を録画したか否かを図８の基礎行列に示している。各ユーザが録画をした番組に対しては「１」が入力され、録画していない番組に対しては「０」が入力されている。この「１」および「０」が関心の有無を示す値である。 The basic matrix shown in FIG. 8 is a matrix with 20 rows and 14 columns. It shows whether or not each of the 20 users U1 to U20 has recorded each of the 14 programs P1 to P14, which are the objects. “1” is entered for a program that a user has recorded, and “0” is entered for a program that the user has not recorded. These “1” and “0” are the values indicating the presence or absence of interest.
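
A basic matrix of this kind can be assembled from a recording log as in the following minimal sketch. The function name `build_basic_matrix` and the sample log are hypothetical; the actual values of FIG. 8 are not reproduced here.

```python
def build_basic_matrix(users, programs, recordings):
    """Return a users x programs matrix with 1 where a recording exists, else 0."""
    recorded = set(recordings)
    return [[1 if (u, p) in recorded else 0 for p in programs] for u in users]


users = ["U1", "U2", "U3"]
programs = ["P1", "P2", "P3", "P4"]
# Hypothetical log of (user, program) pairs.
log = [("U1", "P1"), ("U1", "P2"), ("U2", "P4"), ("U3", "P2")]

X = build_basic_matrix(users, programs, log)
```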

図９は、図８の基礎行列を基にしてデータ分析方法を行い、得られた第１行列、第２行列、第３行列の一例を示す説明図である。なお、同じ基礎行列を用いたとしても、最終的な第１行列、第２行列、第３行列は、初期化時の各要素の値、収束判定条件、行列更新式によって異なる。 FIG. 9 is an explanatory diagram illustrating an example of the first matrix, the second matrix, and the third matrix obtained by performing the data analysis method based on the basic matrix of FIG. Even if the same basic matrix is used, the final first matrix, second matrix, and third matrix differ depending on the value of each element at the time of initialization, the convergence determination condition, and the matrix update expression.

図８の例では、Ｋを３とし、第２行列の３列（Ｋ列）目を特定の１列としている。つまり、第２行列の１列目、２列目はユーザクラスタＵＣ１，ＵＣ２に関する列となる。また、Ｌを３とし、第２行列の３行（Ｌ行）目を特定の１行としている。つまり、第２行列の１行目、２行目は番組クラスタＰＣ１，ＰＣ２に関する列となる。 In the example of FIG. 8, K is 3, and the third column (K column) of the second matrix is a specific column. That is, the first column and the second column of the second matrix are columns related to the user clusters UC1 and UC2. Further, L is 3, and the third row (L row) of the second matrix is a specific one row. That is, the first row and the second row of the second matrix are columns related to the program clusters PC1 and PC2.

第１行列では、２０人のユーザＵ１〜Ｕ２０と、ユーザクラスタＵＣ１，ＵＣ２および録画頻度とのそれぞれの関連度合いが各要素に格納されている。第２行列では、ユーザクラスタＵＣ１，ＵＣ２および録画頻度と、番組クラスタＰＣ１，ＰＣ２および周知度とのそれぞれの関連度合いが各要素に格納されている。第３行列では、番組クラスタＰＣ１，ＰＣ２および周知度と、１４個の番組Ｐ１〜Ｐ１４とのそれぞれの関連度合いが各要素に格納されている。 In the first matrix, the respective degrees of association between the 20 users U1 to U20, the user clusters UC1 and UC2, and the recording frequency are stored in each element. In the second matrix, the degree of association between the user clusters UC1 and UC2 and the recording frequency and the program clusters PC1 and PC2 and the degree of familiarity is stored in each element. In the third matrix, the program clusters PC1 and PC2, the degree of familiarity, and the degree of association between each of the 14 programs P1 to P14 are stored in each element.

ここで、本発明者は、第２行列の特定の１列に対して、ユーザクラスタＵＣ１，ＵＣ２との関連度が殆どないことを示す数値（所定範囲に収まる数値）を格納することで、結果的に番組（対象物）の周知度を示す行（図９に示す第３行列では３行目）が生成されることを見出した。図９の第３行列の場合、周知度を示す行（３行目）の各値が大きければ周知度の度合いは小さく、各値が小さければ周知度の度合いは大きいことを示している。周知度とは、その番組（対象物）がどれだけ知られているかを示す指標であり、その番組の人気度として用いてもよい。周知度が高い番組を録画しないユーザは、「その番組を知らないから録画していない」のではなく、「知っているのに録画しない」と推測することができる。「人気がある番組をあえて録画しない」とも考えられるので、「このユーザはこの番組を嫌い」であると推察することができる。この周知度を示す行が第３行列に生成されているために、分解処理では周知度が反映されて第１行列、第２行列および第３行列が更新される。したがって、「嫌い」という情報を考慮したクラスタリングが可能となる。 Here, the present inventor found that storing, in the specific column of the second matrix, numerical values indicating almost no association with the user clusters UC1 and UC2 (values within the predetermined range) results in the generation of a row indicating the degree of familiarity of each program (object) (the third row of the third matrix shown in FIG. 9). In the third matrix of FIG. 9, a large value in the familiarity row (third row) indicates a low degree of familiarity, and a small value indicates a high degree of familiarity. The degree of familiarity is an index of how well known the program (object) is, and may be used as the popularity of the program. For a user who does not record a highly familiar program, it can be inferred not that the user “did not record it because they do not know the program” but that the user “knows it yet does not record it”. Since this can also be read as “deliberately not recording a popular program”, it can be inferred that “this user dislikes this program”. Because the row indicating the degree of familiarity is generated in the third matrix, the decomposition process updates the first matrix, the second matrix, and the third matrix with the degree of familiarity reflected. Clustering that takes the “dislike” information into account is therefore possible.

さらに、本発明者は、第２行列の特定の１行に対して番組クラスタＰＣ１，ＰＣ２との関連度が殆どないことを示す数値（所定範囲に収まる数値）を格納することで、結果的にユーザの録画頻度を示す行（図９に示す第１行列では３列目）が生成されることを見出した。図９の第１行列の場合、録画頻度を示す行（３列目）の各値が大きければユーザの録画頻度が小さく、各値が小さければ録画頻度が大きいことを示している。録画頻度とは、ユーザが番組を録画する度合いを示す指標である。そして、本発明者は、録画頻度を示す列が第１行列に生成されると、分解処理後の第１行列のその他の列には、各ユーザＵ１〜Ｕ２０と、ユーザクラスタＵＣ１，ＵＣ２との関連度が、録画頻度の影響を極力除いた値として格納されることを見出した。これにより、録画頻度によらず、嗜好の似通ったユーザのクラスタリングが可能となる。 Furthermore, the present inventor found that storing, in the specific row of the second matrix, numerical values indicating almost no association with the program clusters PC1 and PC2 (values within the predetermined range) results in the generation of a column indicating each user's recording frequency (the third column of the first matrix shown in FIG. 9). In the first matrix of FIG. 9, a large value in the recording-frequency column (third column) indicates a low recording frequency, and a small value indicates a high recording frequency. The recording frequency is an index of how often a user records programs. The present inventor also found that, once the column indicating the recording frequency is generated in the first matrix, the other columns of the first matrix after the decomposition process store the degree of association between each of the users U1 to U20 and the user clusters UC1 and UC2 as values from which the influence of the recording frequency has been removed as far as possible. This makes it possible to cluster users with similar preferences regardless of recording frequency.
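
One way an analyst might read a first matrix of this form is sketched below: each user is assigned to the user cluster with the largest value among the ordinary columns, and the value in the recording-frequency column is reported alongside it. The assignment-by-maximum rule, the function name, and the sample values are assumptions for illustration; the values of FIG. 9 are not reproduced here.

```python
def assign_user_clusters(F, special_col):
    """Return (cluster index, special-column value) for each row of F."""
    result = []
    for row in F:
        # Consider only the ordinary cluster columns, not the special one.
        candidates = [(v, k) for k, v in enumerate(row) if k != special_col]
        _, best = max(candidates)
        result.append((best, row[special_col]))
    return result


# Hypothetical first matrix: columns 0 and 1 are UC1 and UC2, column 2 is the
# recording-frequency column (a larger value means a lower recording frequency).
F = [[0.60, 0.10, 0.30],
     [0.05, 0.15, 0.80]]

assignments = assign_user_clusters(F, special_col=2)
```

In this sample the second user is placed in UC2 even though both cluster values are small, because the large value in the special column absorbs that user's low recording frequency.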

なお、本実施の形態では、対象物が番組であるので、所定範囲に収まる数値を第２行列の特定の１行に格納することで、第１行列の１つの行に録画頻度が出現することになった。しかし対象物が商品である場合には、ユーザが商品を購入（或いはレンタル）する購入頻度が第１行列の１つの行に出現することになる。購入頻度、録画頻度ともにユーザが対象物を取得する度合いを示す指標であるため、これらをまとめて取得頻度と称してもよい。 In the present embodiment, since the objects are programs, storing values within the predetermined range in the specific row of the second matrix causes the recording frequency to appear in one column of the first matrix. When the objects are products, however, the frequency with which a user purchases (or rents) products appears in that column instead. Since both the purchase frequency and the recording frequency are indices of how often a user acquires objects, they may be collectively referred to as the acquisition frequency.

また、本実施の形態では、各ユーザＵ１〜Ｕ２０が第１行列における各行に対応し、各番組Ｐ１〜Ｐ１４が第３行列の各列に対応しているため、第２行列の特定の１列が周知度に対応し、特定の１行が録画頻度に対応している。逆に、各ユーザＵ１〜Ｕ２０が第３行列における各列に対応し、各番組Ｐ１〜Ｐ１４が第１行列の各行に対応している場合には、第２行列の特定の１行が周知度に対応し、特定の１列が録画頻度に対応する。つまり、クラスタリングしたい対象と、第１行列、第２行列および第３行列との関係性によって、周知度や録画頻度に対応する要素が第２行列の一つの行となったり、一つの列となったりする。周知度や録画頻度（取得頻度）の一方のみを考慮したクラスタリングを行うのであれば、上述の関係性を考慮して、第２行列の特定の１行および特定の１列の一方の各要素に対して、所定の範囲に収まる数値を格納すればよい。 In the present embodiment, each of the users U1 to U20 corresponds to a row of the first matrix and each of the programs P1 to P14 corresponds to a column of the third matrix, so the specific column of the second matrix corresponds to the degree of familiarity and the specific row corresponds to the recording frequency. Conversely, when each of the users U1 to U20 corresponds to a column of the third matrix and each of the programs P1 to P14 corresponds to a row of the first matrix, the specific row of the second matrix corresponds to the degree of familiarity and the specific column corresponds to the recording frequency. In other words, depending on the relationship between the targets to be clustered and the first matrix, the second matrix, and the third matrix, the elements corresponding to the degree of familiarity or the recording frequency form either a row or a column of the second matrix. To perform clustering that considers only one of the degree of familiarity and the recording frequency (acquisition frequency), values within the predetermined range need only be stored in the elements of either the specific row or the specific column of the second matrix, chosen according to the relationship above.

［効果等］
以上のように、本実施の形態によれば、Ｎ行Ｋ列の第１行列と、Ｋ行Ｌ列の行列であって、特定の１行および特定の１列の少なくとも一方の各要素に、所定範囲に収まる数値が格納されている第２行列と、Ｌ行Ｍ列の第３行列との３つの行列の積が、基礎行列に近似するように、第１行列、第２行列および第３行列が分解されている。これにより、ユーザが対象物（番組、商品）を取得しなかった要因を考慮したクラスタリングを行うことが可能となる。したがって、多様な情報を反映したクラスタリングが可能となる。 [Effects]
As described above, according to the present embodiment, the first matrix of N rows and K columns and the matrix of K rows and L columns, each element of at least one specific row and specific column, The first matrix, the second matrix, and the third matrix are such that the product of the three matrices of the second matrix storing numerical values falling within a predetermined range and the third matrix of L rows and M columns approximates the basic matrix. The matrix is decomposed. Thereby, it becomes possible to perform clustering in consideration of the factor that the user did not acquire the object (program, product). Therefore, clustering reflecting various information becomes possible.

そして、本実施の形態の場合では、ユーザの録画履歴から「嫌い」といった情報が推察できていたが、上述したデータ分析方法がなければ「嫌い」という情報も収集しなければならない。つまり、多様な情報を反映したクラスタリングを行えるのであれば、それだけ情報収集に関するエネルギー消費を低減することができる。 In the case of the present embodiment, information such as “dislike” can be inferred from the user's recording history. However, if there is no data analysis method described above, information “dislike” must also be collected. In other words, if clustering reflecting a variety of information can be performed, energy consumption related to information collection can be reduced accordingly.

また、第２行列の特定の１行および特定の１列の各要素に所定範囲に収まる数値が格納されているので、対象物の周知度と、ユーザの取得頻度とを反映させたクラスタリングが可能となる。 In addition, since numerical values that fall within a predetermined range are stored in each element of a specific row and a specific column of the second matrix, clustering that reflects the familiarity of the object and the user's acquisition frequency is possible. It becomes.

また、所定の範囲に収まる数値が実質的に０となる正の値であるので、ユーザクラスタや番組クラスタとの関連度をほとんどなくすことができ、周知度、取得頻度に特化した値を求めることができる。 In addition, since the numerical value falling within the predetermined range is a positive value that is substantially 0, the degree of association with the user cluster or the program cluster can be almost eliminated, and a value specialized for the degree of familiarity and acquisition frequency is obtained. be able to.

また、第１行列の各行における各要素の総和が全ての行で実質的に同じ値であるので、取得頻度を示す各列の値を前記同じ値を基準にして算出することができる。したがって、前記各列の値の比較を容易に行うことができる。 Further, since the sum of the elements in each row of the first matrix is substantially the same value in all rows, the value of each column indicating the acquisition frequency can be calculated based on the same value. Therefore, the values of the respective columns can be easily compared.

また、第３行列の各列における各要素の総和が全ての列で実質的に同じ値であるので、周知度を示す各行の値を前記同じ値を基準にして算出することができる。したがって、前記各行の値の比較を容易に行うことができる。 In addition, since the sum of the elements in each column of the third matrix is substantially the same value in all columns, the value of each row indicating the degree of familiarity can be calculated based on the same value. Therefore, the values of the respective rows can be easily compared.

また、第１行列、第２行列および第３行列の積と、基礎行列との差が小さくなるように、第１行列、第２行列および第３行列を更新しているので、行列分解をスムーズに行うことができる。 In addition, since the first matrix, the second matrix, and the third matrix are updated so that the difference between the product of the first matrix, the second matrix, and the third matrix and the basic matrix becomes small, the matrix decomposition is smoothly performed. Can be done.

（実施の形態２）
実施の形態１で例示したデータ分析方法では、ある程度大きなＫ、Ｌが設定された場合、更新時に特定の１行と同じような行が発生したり、特定の１列と同じような列が発生したりすることが想定される。こうなった場合、クラスタリングの正確性が低下するおそれがあるため、この実施の形態２では、データ分析方法の実行時に、特定の１行と同じような性質の行が第２行列に発生した場合又は特定の１列と同じような性質の列が第２行列に発生した場合には、同じ性質となった行又は列を削除する方法について説明する。 (Embodiment 2)
In the data analysis method exemplified in Embodiment 1, when fairly large values of K and L are set, a row similar to the specific row or a column similar to the specific column may arise during updating. Since this may reduce the accuracy of clustering, Embodiment 2 describes a method in which, when a row with properties similar to the specific row or a column with properties similar to the specific column arises in the second matrix during execution of the data analysis method, that row or column is deleted.

図１０は、実施の形態２に係る分解処理の流れを示すフローチャートである。 FIG. 10 is a flowchart showing the flow of the decomposition process according to Embodiment 2.

図１０に示すフローチャートは、実施の形態１に係る分解処理のステップＳ１４とステップＳ１５との間に、削除処理を行うステップＳ１８を追加している。このため、ここではステップＳ１８についてのみ説明し、他のステップについては説明を省略する。 The flowchart shown in FIG. 10 adds step S18, which performs a deletion process, between step S14 and step S15 of the decomposition process according to Embodiment 1. Therefore, only step S18 is described here, and description of the other steps is omitted.

ステップＳ１８の削除処理では、分解部４２３は、特定の１行と同じような性質の行が第２行列に発生した場合又は特定の１列と同じような性質の列が第２行列に発生した場合には、同じ性質となった行又は列を削除する。 In the deletion process of step S18, when a row with properties similar to the specific row or a column with properties similar to the specific column has arisen in the second matrix, the decomposition unit 423 deletes that row or column.

図１１は削除処理の流れを示すフローチャートである。 FIG. 11 is a flowchart showing the flow of the deletion process.

分解部４２３は、第２行列における特定の１列（Ｌ列目）の各値と、他の各列の各値との差を計算する（ステップＳ２１）。 The decomposition unit 423 calculates a difference between each value of a specific first column (Lth column) in the second matrix and each value of each other column (step S21).

次いで、分解部４２３は、全ての要素で差の絶対値が一定値以下となる列（ｌ列目）があるか否かを判定し（ステップＳ２２）、ｌ列目があった場合にはステップＳ２３に移行し、ｌ列目がない場合にはステップＳ２４に移行する。 Next, the decomposing unit 423 determines whether or not there is a column (the l-th column) in which the absolute value of the difference is less than or equal to a certain value for all elements (step S22). The process proceeds to S23, and if there is no l-th column, the process proceeds to Step S24.

ステップＳ２３では、分解部４２３は、第２行列のｌ列目を削除し、第３行列のｌ行目を削除することで、Ｎ行Ｋ列の第１行列と、Ｋ行Ｌ−１列の第２行列と、Ｌ−１行Ｍ列の第３行列とに更新する。 In step S23, the decomposition unit 423 deletes the l-th column of the second matrix and the l-th row of the third matrix, thereby updating the matrices to a first matrix of N rows and K columns, a second matrix of K rows and L-1 columns, and a third matrix of L-1 rows and M columns.

ステップＳ２４では、分解部４２３は、第２行列における特定の１行（Ｋ行目）の各値と、他の各行の各値との差を計算する。 In step S24, the decomposing unit 423 calculates the difference between each value of a specific first row (Kth row) in the second matrix and each value of each other row.

次いで、分解部４２３は、全ての要素で差の絶対値が一定値以下となる行（ｋ行目）があるか否かを判定し（ステップＳ２５）、ｋ行目があった場合にはステップＳ２６に移行し、ｋ行目がない場合には削除処理を終了する。 Next, the decomposing unit 423 determines whether or not there is a row (k-th row) in which the absolute value of the difference is less than or equal to a certain value for all elements (step S25). The process proceeds to S26, and if there is no k-th row, the deletion process is terminated.

ステップＳ２６では、分解部４２３は、第２行列のｋ行目を削除し、第１行列のｋ列目を削除することで、Ｎ行Ｋ−１列の第１行列と、Ｋ−１行Ｌ列の第２行列と、Ｌ行Ｍ列の第３行列とに更新し、削除処理を終了する。 In step S26, the decomposition unit 423 deletes the k-th row of the second matrix and the k-th column of the first matrix, so that the first matrix of N rows and K-1 columns and the K-1 row L Update to the second matrix of columns and the third matrix of L rows and M columns, and the deletion process ends.

なお、一定値は例えば０．１以下の値である。また、ステップＳ２１からステップＳ２３までの処理と、ステップＳ２４からステップＳ２６までの処理とが逆の順序でもよい。さらに、ステップＳ２２，Ｓ２５では差の絶対値を基に判定を行ったが、特定の１行や特定の１列の各要素が０に近い値である場合には、ある行やある列の各要素の総和が一定値以下であるか否かで判定を行ってもよい。 The constant value is, for example, a value of 0.1 or less. Moreover, the process from step S21 to step S23 and the process from step S24 to step S26 may be reversed. Furthermore, in steps S22 and S25, the determination is made based on the absolute value of the difference. If each element in a specific row or a specific column is close to 0, each row or column The determination may be made based on whether or not the sum of elements is equal to or less than a certain value.
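
The deletion process of steps S21 to S26 can be sketched as follows, assuming that the specific column is the last column and the specific row is the last row of the second matrix, and that at most one column and one row are deleted per call. The function name `prune_similar` and the sample values are assumptions for illustration.

```python
def prune_similar(F, S, Gt, threshold=0.1):
    """Delete one column and one row of S that resemble the specific ones."""
    K, L = len(S), len(S[0])
    # S21-S23: compare every other column with the specific (last) column.
    for l in range(L - 1):
        if all(abs(S[k][l] - S[k][L - 1]) <= threshold for k in range(K)):
            S = [row[:l] + row[l + 1:] for row in S]    # drop column l of S
            Gt = Gt[:l] + Gt[l + 1:]                    # drop row l of G^T
            break
    K, L = len(S), len(S[0])
    # S24-S26: compare every other row with the specific (last) row.
    for k in range(K - 1):
        if all(abs(S[k][l] - S[K - 1][l]) <= threshold for l in range(L)):
            S = S[:k] + S[k + 1:]                       # drop row k of S
            F = [row[:k] + row[k + 1:] for row in F]    # drop column k of F
            break
    return F, S, Gt


# Hypothetical matrices: column 1 of S is close to the specific column 2.
S = [[0.9, 0.05, 0.0],
     [0.4, 0.02, 0.0],
     [0.0, 0.00, 0.0]]
F = [[1.0, 2.0, 3.0]]
Gt = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

F2, S2, Gt2 = prune_similar(F, S, Gt)
```

Here only the column side triggers, so the second matrix shrinks from 3 x 3 to 3 x 2 and the third matrix loses its matching row, while the first matrix is left unchanged.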

［効果等］
以上のように、本実施の形態によれば、第２行列の特定の１行以外の行において、ｋ行目における各要素が所定範囲に収まる数値である場合には、第２行列におけるｋ行目を削除し、第１行列におけるｋ列目を削除することで、Ｎ行Ｋ−１列の第１行列と、Ｋ−１行Ｌ列の第２行列と、Ｌ行Ｍ列の第３行列とに更新する。これにより、分割処理の高速化、クラスタリングの正確性を高めることができる。 [Effects]
As described above, according to the present embodiment, when each element of a k-th row of the second matrix, other than the specific row, is a value within the predetermined range, the k-th row of the second matrix and the k-th column of the first matrix are deleted, so that the matrices are updated to a first matrix of N rows and K-1 columns, a second matrix of K-1 rows and L columns, and a third matrix of L rows and M columns. This speeds up the decomposition process and improves the accuracy of clustering.

また、第２行列の特定の１列以外の列において、ｌ列目における各要素が所定範囲に収まる数値である場合には、第２行列におけるｌ列目を削除し、第３行列におけるｌ行目を削除することで、Ｎ行Ｋ列の前記第１行列と、Ｋ行Ｌ−１列の前記第２行列と、Ｌ−１行Ｍ列の第３行列とに更新する。これにより、分割処理の高速化、クラスタリングの正確性を高めることができる。 Similarly, when each element of an l-th column of the second matrix, other than the specific column, is a value within the predetermined range, the l-th column of the second matrix and the l-th row of the third matrix are deleted, so that the matrices are updated to a first matrix of N rows and K columns, a second matrix of K rows and L-1 columns, and a third matrix of L-1 rows and M columns. This likewise speeds up the decomposition process and improves the accuracy of clustering.

（その他の実施の形態）
以上のように、本出願において開示する技術の例示として、実施の形態１，２を説明した。しかしながら、本実施の形態における技術は、これに限定されず、適宜、変更、置き換え、付加、省略などを行った実施の形態にも適用可能である。また、上記各実施の形態で説明した各構成要素を組み合わせて、新たな実施の形態とすることも可能である。 (Other embodiments)
As described above, Embodiments 1 and 2 have been described as examples of the technology disclosed in the present application. However, the technology in the present embodiment is not limited to these, and is also applicable to embodiments in which changes, replacements, additions, omissions, and the like are made as appropriate. The components described in the above embodiments may also be combined to form a new embodiment.

以下の説明において上記実施の形態と同一部分については同一の符号を付してその説明を省略する場合がある。 In the following description, the same parts as those in the above embodiment may be denoted by the same reference numerals and the description thereof may be omitted.

例えば、上記の実施の形態では、ネットワーク５００を介して基礎行列がデータ分析装置４００に入力される場合を例示して説明したが、データ分析装置に直接基礎行列が入力（作成）されてもよい。 For example, in the above embodiment, the case where the basic matrix is input to the data analysis device 400 via the network 500 has been described as an example. However, the basic matrix may be directly input (created) to the data analysis device. .

図１２はデータ分析装置の変形例を示すブロック図である。 FIG. 12 is a block diagram showing a modification of the data analysis apparatus.

図１２に示すように、データ分析装置４００Ａには、入力部４５０と、処理部４２０と、表示部４６０とが設けられている。入力部４５０はキーボード、タッチパネル、マウスなどの入力デバイスであり、解析者が入力部４５０を操作することにより基礎行列が入力（作成）される。つまり、入力部４５０が取得部である。また、表示部４６０は、ディスプレイであり、基礎行列、第１行列、第２行列および第３行列の少なくとも一つを表示することで出力する。つまり、表示部４６０が出力部である。さらに、データ分析装置４００Ａは、基礎行列、第１行列、第２行列および第３行列を蓄積するハードディスクやメモリなどの蓄積部を備えていてもよい。 As shown in FIG. 12, the data analysis apparatus 400A is provided with an input unit 450, a processing unit 420, and a display unit 460. The input unit 450 is an input device such as a keyboard, a touch panel, or a mouse, and a basic matrix is input (created) when an analyst operates the input unit 450. That is, the input unit 450 is an acquisition unit. The display unit 460 is a display, and displays and outputs at least one of a basic matrix, a first matrix, a second matrix, and a third matrix. That is, the display unit 460 is an output unit. Furthermore, the data analysis device 400A may include a storage unit such as a hard disk or a memory that stores the basic matrix, the first matrix, the second matrix, and the third matrix.

なお、上記各実施の形態において、各構成要素は、専用のハードウェアで構成されるか、各構成要素に適したソフトウェアプログラムを実行することによって実現されてもよい。各構成要素は、ＣＰＵまたはプロセッサなどのプログラム実行部が、ハードディスクまたは半導体メモリなどの記録媒体に記録されたソフトウェアプログラムを読み出して実行することによって実現されてもよい。ここで、上記各実施の形態の画像復号化装置などを実現するソフトウェアは、次のようなプログラムである。 In each of the above embodiments, each component may be configured by dedicated hardware or may be realized by executing a software program suitable for each component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. Here, the software that realizes the image decoding apparatus of each of the above embodiments is the following program.

すなわち、このプログラムは、コンピュータに、Ｎ個の第１対象物のそれぞれと、Ｍ個の第２対象物のそれぞれとの関連度を示すＮ行Ｍ列の基礎行列を、３つの行列に分解して第１対象物および前記第２対象物のうち少なくとも一つをクラスタリングするデータ分析方法であって、基礎行列の各要素に対して、関連度を示す値が入力された基礎行列を取得する取得ステップと、第１対象物のクラスタ数を示すＫと、前記第２対象物のクラスタ数を示すＬとを設定する設定ステップと、３つの行列を、Ｎ行Ｋ列の第１行列と、Ｋ行Ｌ列の行列であって、特定の１行および特定の１列の少なくとも一方の各要素に、所定範囲に収まる数値が格納されている第２行列と、Ｌ行Ｍ列の第３行列とし、第１行列、第２行列および第３行列の積が、基礎行列に近似するように、第１行列、第２行列および第３行列に分解する分解ステップと、第１行列、第２行列および第３行列の少なくとも一つを出力することで、前記第１対象物および前記第２対象物のうち少なくとも一つのクラスタリング結果を出力する出力ステップとを含むデータ分析方法を実行させる。 That is, this program causes a computer to execute a data analysis method that decomposes a basic matrix of N rows and M columns, indicating the degree of association between each of N first objects and each of M second objects, into three matrices and clusters at least one of the first objects and the second objects. The method includes: an acquisition step of acquiring the basic matrix in which a value indicating the degree of association has been entered for each element; a setting step of setting K, indicating the number of clusters of the first objects, and L, indicating the number of clusters of the second objects; a decomposition step of decomposing the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which numerical values within a predetermined range are stored in each element of at least one of a specific row and a specific column, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and an output step of outputting a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

In each of the above embodiments, processing executed by a specific processing unit may instead be executed by another processing unit. The order of multiple processes may be changed, and multiple processes may be executed in parallel.

The data analysis method according to one or more aspects has been described above based on the embodiments, but the present invention is not limited to these embodiments. Forms obtained by applying to the embodiments various modifications conceivable by those skilled in the art, and forms constructed by combining components of different embodiments, may also be included within the scope of the one or more aspects, provided they do not depart from the gist of the present invention.

The present invention is useful as a data analysis method, a data analysis apparatus, and a program used for clustering. That is, the present invention is applicable in various fields that require clustering, such as recommendation systems and text classification.

Description of symbols
1 Data analysis system
200 Input device
300 Display device
400 Data analysis apparatus
410 Acquisition unit
420 Processing unit
421 Storage unit
422 Setting unit
423 Decomposition unit
430 Output unit
500 Network

Claims

A data analysis method for decomposing a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clustering at least one of the first objects and the second objects, the method including:
an acquisition step of acquiring the basic matrix, in which a value indicating the degree of association has been entered for each element of the basic matrix;
a setting step of setting K, which indicates the number of clusters of the first objects, and L, which indicates the number of clusters of the second objects;
a decomposition step of decomposing the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of one specific row and one specific column holds a numerical value that falls within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and
an output step of outputting a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

The data analysis method according to claim 1, wherein a numerical value that falls within the predetermined range is stored in each element of the specific row and the specific column.

The data analysis method according to claim 1, wherein the numerical value falling within the predetermined range is a positive value that is substantially zero.

The data analysis method according to any one of claims 1 to 3, wherein the first matrix has a sum of elements in each row that is substantially the same in all rows.

The data analysis method according to any one of claims 1 to 4, wherein the third matrix has a sum of elements in each column that is substantially the same in all columns.

The data analysis method according to any one of claims 1 to 5, wherein, in the decomposition step, the first matrix, the second matrix, and the third matrix are updated repeatedly so that the difference between the basic matrix and the product of the first matrix, the second matrix, and the third matrix decreases.

The data analysis method according to any one of claims 1 to 6, wherein, in the decomposition step, when every element of a k-th row of the second matrix, the k-th row being a row other than the specific row, is a numerical value that falls within the predetermined range, the k-th row of the second matrix and the k-th column of the first matrix are deleted, so that the matrices are updated to the first matrix of N rows and K-1 columns, the second matrix of K-1 rows and L columns, and the third matrix of L rows and M columns.

The data analysis method according to any one of claims 1 to 7, wherein, in the decomposition step, when every element of an l-th column of the second matrix, the l-th column being a column other than the specific column, is a numerical value that falls within the predetermined range, the l-th column of the second matrix and the l-th row of the third matrix are deleted, so that the matrices are updated to the first matrix of N rows and K columns, the second matrix of K rows and L-1 columns, and the third matrix of L-1 rows and M columns.
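The row and column deletion recited in the two preceding claims can be sketched as follows. This is an illustrative sketch only, not the claimed implementation: the function name `prune_clusters`, the parameters `fixed_row`, `fixed_col`, and `tol`, and the reading of "falls within the predetermined range" as "at most `tol`" are all assumptions.

```python
import numpy as np

def prune_clusters(A, B, C, fixed_row=0, fixed_col=0, tol=1e-6):
    """Drop clusters whose row or column of the second matrix B is negligible.

    A is N x K, B is K x L, C is L x M. A row k of B (other than fixed_row)
    whose elements are all <= tol is deleted together with column k of A.
    A column l of B (other than fixed_col) whose elements are all <= tol is
    deleted together with row l of C.
    """
    K, L = B.shape
    # Both masks are computed on the original B, before anything is deleted.
    keep_rows = np.array([k == fixed_row or bool((B[k, :] > tol).any())
                          for k in range(K)])
    keep_cols = np.array([l == fixed_col or bool((B[:, l] > tol).any())
                          for l in range(L)])
    A_new = A[:, keep_rows]
    B_new = B[np.ix_(keep_rows, keep_cols)]
    C_new = C[keep_cols, :]
    # Note: if rows or columns before the fixed ones are deleted, the index of
    # the fixed row or column shifts; with index 0 (the default) it does not.
    return A_new, B_new, C_new
```

In this sketch each deleted row reduces K by one and each deleted column reduces L by one, so the number of clusters set in the setting step acts as an upper bound rather than a fixed value.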

The data analysis method according to any one of the preceding claims, wherein the first objects are users, and the degree of association in each element of the basic matrix indicates whether each of the N users is interested in each of the M second objects.

A data analysis apparatus for decomposing a basic matrix of N rows and M columns, which indicates the degree of association between each of N first objects and each of M second objects, into three matrices and clustering at least one of the first objects and the second objects, the apparatus including:
an acquisition unit that acquires the basic matrix, in which a value indicating the degree of association has been entered for each element of the basic matrix;
a setting unit that sets K, which indicates the number of clusters of the first objects, and L, which indicates the number of clusters of the second objects;
a decomposition unit that decomposes the basic matrix into a first matrix of N rows and K columns, a second matrix of K rows and L columns in which each element of at least one of one specific row and one specific column holds a numerical value that falls within a predetermined range, and a third matrix of L rows and M columns, such that the product of the first matrix, the second matrix, and the third matrix approximates the basic matrix; and
an output unit that outputs a clustering result for at least one of the first objects and the second objects by outputting at least one of the first matrix, the second matrix, and the third matrix.

A program for causing a computer to execute the data analysis method according to claim 1.