JP2020009078A - Data processing system and data processing method

Info

Publication number: JP2020009078A
Application number: JP2018128621A
Authority: JP
Inventors: Hiroyuki Osaki; Naoaki Yokoi
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-07-05
Filing date: 2018-07-05
Publication date: 2020-01-16
Anticipated expiration: 2038-07-05
Also published as: JP7068079B2

Abstract

PROBLEM TO BE SOLVED: To provide a data processing system that can search a database accurately while reducing the user's burden, even when the features of the data are highly varied. SOLUTION: A data processing system for retrieving similar data based on a feature value of user data includes a controller and a memory. By executing a data processing program recorded in the memory, the controller recognizes the data structure to which the user data belongs. The data structure includes a first node holding the user data and a second node in the vicinity of the first node, and the controller calculates the feature value based on information in the second node. [Selected drawing] FIG. 1

Description

The present invention relates to data processing for searching a database for data similar to user data.

In technical fields that handle various materials, such as semiconductors and pharmaceuticals, databases are used for material development and research. Database users store experimental data, simulation data, and other material-related data in a database, and frequently search it, for example to obtain information that is useful for the material they are developing.

As a system for searching a database, the one described in Patent Literature 1 is known, for example. Patent Literature 1 discloses a substance-action blending analysis data storage and extraction mechanism. When a new substance is produced by combining and blending substances that each have multi-element, multi-component properties, and a substance whose use is unknown is present, the mechanism can present the user with candidate substance information drawn from past simulations and experiments. It also stores past experiments and the cost and availability of each substance, provides a method for reflecting them in calculations, and thereby allows blended substances to be searched.

Patent Literature 2 discloses a database search device. When a search condition is input, a search-target database extraction means extracts the databases that contain data subject to narrowing by that condition. An integrated search means then integrates the data that the databases have individually attached to what is substantially the same record and, based on the integrated information, searches for records matching the condition. On receiving a systematic search command that designates a particular record, a systematic search means searches for other records that are systematically close to the designated record. A search result display means displays the results of the integrated search means and of the systematic search means on the screen of a display device.

Patent Literature 1: JP 2012-048615 A
Patent Literature 2: JP-A-1999-175552

When a material is searched for in a database, the features of the material are used as a measure of the similarity between the material being searched for and the materials registered in the database. Features include physically measurable quantities such as length, mass, and strength; computed information on molecules and atoms, such as electronic states (energy and the like); and quantities obtained by arithmetic processing such as statistical processing. Because the kinds of features are so varied, forcing the user to select or specify features when searching the database or registering a material not only increases the user's burden but may also reduce the accuracy of the search.

Accordingly, an object of the present invention is to provide a data processing system that can search a database accurately while reducing the user's burden, even when the features of the data are highly varied.

To achieve this object, the present invention is a data processing system for retrieving similar data based on a feature value of user data. The system includes a controller and a memory. By executing a data processing program recorded in the memory, the controller recognizes the data structure to which the user data belongs, the data structure including a first node holding the user data and a second node in the vicinity of the first node, and calculates the feature value based on information in the second node. To achieve the same object, the present invention is also a data processing method for retrieving similar data based on a feature value of user data.

According to the present invention, it is possible to provide a data processing system and a data processing method that can search a database accurately while reducing the user's burden, even when the features of the data are highly varied.

FIG. 1 is an example of a block diagram of the data processing system.
FIG. 2 is a block diagram showing the data structure of each of the databases DB1 and DB2.
FIG. 3 is a block diagram showing the relationship among the data structure of a database, the categories of user data, and the features of user data.
FIG. 4 is a block diagram of the business server.
FIG. 5 is a block diagram of the terminal computer.
FIG. 6 is an example of the similar data search screen of the terminal computer.
FIG. 7 is an example of the user data registration screen of the terminal computer.
FIG. 8 is an example of the category determination condition input screen of the terminal computer.
FIG. 9 is an example of the category correspondence table of the business server.
FIG. 10 is an example of the feature correspondence table of the business server.
FIG. 11 is another example of the feature correspondence table of the business server.
FIG. 12 is an example of the calculation formula list table of the business server.
FIG. 13 is a flowchart of the processing performed by the business server from receiving user data from the terminal computer until the feature value is calculated.
FIG. 14 is a block diagram of a data structure illustrating that there are multiple routes to neighboring nodes within the range of the relative path count.
FIG. 15 is a block diagram of a data structure that has another route to a neighboring node.
FIG. 16 is an example of the access history table of the business server.

Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is an example of a block diagram of the data processing system. The data processing system includes a business server 101 connected via a network (Net1) to a plurality of databases (database servers) 10-n, and a user-side terminal computer 102 that is connected to the business server 101 via a network (Net2) and requests the business server 101 to register data in the databases, search them, and so on.

The business server 101 provides a plurality of terminal computers 102 with a cloud service for searching and registering data across a plurality of databases. By using the cloud service, a terminal computer 102 can, for example, strengthen its solutions for material development.

The databases are not particularly limited and may be, for example, MongoDB, PostgreSQL, MySQL, or MariaDB (these are proper names of database products). The databases store open data and the user's internal data. A database may be managed and operated by the user. Via the business server 101, the terminal computer 102 can register experimental data and analysis data (such as three-dimensional simulations) produced during material development in a database, or search a database for similar data.

Conventionally, users have widely detected and extracted data similar to their own data from a database by focusing on the features that their data, such as experimental data or simulation data, possesses or that accompany it. A feature is, for example, an individual item of information in material data, and there are many kinds, such as tensile strength, density, and strength.

Depending on the type and classification of a material, such as its composition, shape, structure, and physical properties, the number of combinations of features that the material has and features that it lacks becomes enormous. Nevertheless, the user has conventionally had to select or decide on the features, calculate the feature values, and then search the database for similar data based on those values. This was a heavy burden on the user, and if the features were not chosen appropriately, the accuracy of the search results could also suffer.

The business server 101 therefore executes a data processing program in response to a user request from the terminal computer 102. Based on the data structure of the user data stored in the database, it determines the kinds, range, and so on of the features of the user data. It then uses the user data to calculate a weight for each feature, for example a feature value, and searches the database for similar data based on the calculated feature values. This relieves the user of the burden and improves the accuracy of the search.

It is not easy for the business server 101 to determine the features of user data from the user data itself, because the values and formats of experimental data and simulation data differ greatly from user to user and from data set to data set.

On the other hand, in a database where user data is recorded in a text-based format (for example, JSON or a star schema), the data structure contains, in the neighborhood (which may also be called the "periphery") of the user data node, nodes that record text-based metadata for the user data, such as indexes, flags, summaries, keywords, and general terms. Such a node is hereinafter called a "neighboring node". It may equally be called by another term, such as "peripheral node".

This metadata is useful for determining or identifying the type, classification, range, and so on of the user data (hereinafter collectively called the "category"). The business server 101 therefore determines the category of the user data from the information stored in the neighboring nodes, by means such as a correspondence table, machine learning, or computation. Once the category of the user data is fixed, the kinds of features related to it are also limited, so the business server 101 can readily obtain the values of that limited set of features from the user data. The mapping from category to features can likewise be realized by a table, machine learning, computation, or similar means.

The range of nodes neighboring the user data node may be specified by a relative path starting from the user data node. A path is a link between nodes. A neighboring node may be a node at a specific number of relative paths from the origin node, or any of several nodes within a predetermined number of paths from the origin node. The node does need to be nearby: a node far from the user data node, for example, is only weakly related to the user data and may not correctly reflect its category.

Strictly delimiting the range of neighboring nodes is not very meaningful. Any node holding metadata that reflects the category of the user data can be regarded as a neighboring node. As one example, a node with a relative path count of 10 or less may be regarded as a neighboring node. A relative path may be defined by a combination of the number of paths toward the parent side and the number of paths toward the child side. For example, with the relative path (Parent x <=2)/(child x <=2), the neighboring nodes are one or more of the nodes reached by going up to two paths toward the parent side and then up to two paths toward the child side.
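
The relative-path definition above can be sketched in code. The following is a minimal illustration, not the implementation described in this publication: the `Node` class, the `neighbor_nodes` function, and the sample tree are hypothetical, and the sketch assumes that "up to two paths toward the parent side, then up to two paths toward the child side" means climbing zero to `max_up` parent links and, from each ancestor reached, descending zero to `max_down` child links.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One node of a tree-shaped data structure (hypothetical model)."""
    name: str
    value: Optional[str] = None
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def neighbor_nodes(origin: Node, max_up: int, max_down: int) -> List[Node]:
    """Collect nodes reached by climbing at most max_up parent links from the
    origin and then descending at most max_down child links (origin excluded)."""
    found: List[Node] = []
    seen = {id(origin)}
    ancestor = origin
    for _ in range(max_up + 1):              # 0 links up is the origin itself
        frontier = [ancestor]
        for _ in range(max_down + 1):        # 0 links down is the ancestor itself
            next_frontier: List[Node] = []
            for node in frontier:
                if id(node) not in seen:
                    seen.add(id(node))
                    found.append(node)
                next_frontier.extend(node.children)
            frontier = next_frontier
        if ancestor.parent is None:
            break                            # reached the root of the tree
        ancestor = ancestor.parent
    return found


# Hypothetical tree loosely modeled on the node names used in this description.
root = Node("material")
sample = root.add(Node("sample"))
target = sample.add(Node("measurement", "calculation target data"))
root.add(Node("chemical composition", "Cu, Ni"))
sample.add(Node("material shape", "Cast Block"))

for node in neighbor_nodes(target, max_up=2, max_down=2):
    print(node.name, node.value)
```

With a narrower range such as `max_up=1, max_down=1`, the "chemical composition" node of this sample tree is no longer reached, which shows how the chosen range controls which metadata is available for category determination.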

FIG. 2 is a block diagram showing the data structure of each of the databases DB1 and DB2. Each data structure consists of a plurality of nodes connected in a tree. In the data structures of both DB1 and DB2, the node labeled "calculation target data" is the user data node. The business server 101 can recognize the data structure of a database and holds link information to the calculation target data node, so it can identify the user data node within the data structure.

Once the business server 101 has identified the range of neighboring nodes from the number of relative paths from the origin node, and there is more than one such node, it scans them in order and determines the category from the data in the nodes.

In DB1 of FIG. 2, the business server 101 determines the category to be "metal" based on information within the relative path range ((Parent x <=2)/(child x <=2)/*) from the origin node (calculation target data), namely the text "Cu, Ni", which is category-related information belonging to the "chemical composition" node (TN200), a neighboring node.

In DB2 as well, the business server 101 determines the category to be "metal" based on the text "Ni" in "Component Element Prop" (TN200), a neighboring node within the relative path range ((Parent x <=2)/(child x <=2)/*) from the origin node (calculation target data).

Furthermore, the business server 101 determines the category to be "casting" based on information belonging to "material shape" (TN202), a neighboring node within the relative path range ((Parent x <=2)/(child x <=2)/*) from the origin node (calculation target data), namely the text "Cast Block", which is category-related information.

In DB2 as well, the business server 101 determines the category to be "casting" based on information belonging to "tag" (TN202), a neighboring node within the relative path range ((Parent x <=2)/(child x <=2)/*) from the origin node (calculation target data), namely the text "Alloy cast iron", which is category-related information. Likewise, from the neighboring "material shape" node, the business server 101 determines the category to be "casting" based on the category-related text "Cast Block".

FIG. 3 is a block diagram showing the relationship among the data structure 301 of a database, the categories 302 of user data, and the features 303 of user data. From among a plurality of categories, the business server 101 specifies a category based on "Cu, Ni", the category-related information belonging to the "chemical composition" node, which is a neighboring node of the calculation target data node.

The business server 101 holds a plurality of categories (302). Related features 303 are assigned to each category in advance according to the kind of category. The business server 101 follows this assignment to identify the features 303 from the category 302. Each feature is associated with a formula for calculating its feature value, and the business server 101 calculates the feature value by applying the calculation target data to that formula. By going through the category, the business server 101 can select features suited to the calculation target data.

Next, the business server 101 will be described in detail. FIG. 4 is a block diagram of the business server 101. The business server 101 includes computer hardware resources, software resources, and data resources. The hardware resources include a controller (CPU) 402, a memory 403, and a communication module 404a. The software resources, each realized by a program in the memory 403, include a neighboring node extraction module 405, a category specifying module 406, a feature determination module 407, an access history collection module 412, a feature value calculation module 413, a similar data search module 414, and a user data registration module 416. These modules are realized by the controller 402 executing the programs in the memory.

The data resources include a category correspondence table 408, a feature correspondence table 409, a calculation formula list table 410, and an access history information table 411.

The business server 101 executes the neighboring node extraction module 405 to extract, based on the number of relative paths, the neighboring nodes of the user data node (the "chemical composition" node, the "material shape" node, and so on in FIG. 2). The business server 101 executes the category specifying module 406 to scan the neighboring nodes and extract category-related information ("Cu", "Ni", and "Cast" in FIG. 2), and then specifies the category by looking this information up in the category correspondence table 408 (FIG. 3: the links between 301 and 302).

The business server 101 further executes the feature determination module 407 to determine the features from the category, using the feature correspondence table 409 (FIG. 3: the links between 302 and 303). The business server 101 executes the feature value calculation module 413, which refers to the calculation formula list table 410, obtains the formula corresponding to each determined feature, and calculates the feature value by applying the user data to the formula.
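
The chain from category to feature to formula to feature value can be illustrated as two table lookups followed by a function call. This is only a sketch under stated assumptions: the table contents, the field names of the user data (`max_load_n`, `area_mm2`, `gauge_len_mm`, `final_len_mm`), and the function `compute_features` are hypothetical, and the two formulas are the common textbook definitions of tensile strength and elongation at break, not formulas taken from the calculation formula list table 410.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for the feature correspondence table:
# category name -> names of the features relevant to that category.
FEATURE_TABLE: Dict[str, List[str]] = {
    "metal": ["tensile strength", "elongation at break"],
}

# Hypothetical stand-in for the calculation formula list table:
# feature name -> function that computes the feature value from user data.
FORMULAS: Dict[str, Callable[[dict], float]] = {
    # Tensile strength [MPa] = maximum load [N] / original cross-section [mm^2]
    "tensile strength": lambda d: d["max_load_n"] / d["area_mm2"],
    # Elongation at break [%] = (final - original length) / original * 100
    "elongation at break": lambda d: (
        (d["final_len_mm"] - d["gauge_len_mm"]) / d["gauge_len_mm"] * 100.0
    ),
}


def compute_features(categories: Iterable[str], user_data: dict) -> Dict[str, float]:
    """Look up the features for each category, then apply each feature's formula."""
    values: Dict[str, float] = {}
    for category in categories:
        for feature in FEATURE_TABLE.get(category, []):
            values[feature] = FORMULAS[feature](user_data)
    return values


# Hypothetical tensile-test record used as user data.
user_data = {
    "max_load_n": 25000.0,
    "area_mm2": 100.0,
    "gauge_len_mm": 50.0,
    "final_len_mm": 50.1,
}
print(compute_features(["metal"], user_data))
```

Keeping the formulas in a table keyed by feature name means that supporting a new feature only requires a new table entry, which matches the table-driven arrangement described above.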

The business server 101 executes the similar data search module 414 to search the databases for similar data based on the feature values. Similar data may be data that, for the same or a similar feature, has a value within a range close to the feature value. The business server 101 executes the user data registration module 416 to store data uploaded from the terminal computer 102 in a node of the data structure of a given database.
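
One simple reading of "a value within a range close to the feature value" is a relative tolerance around the query value. The sketch below assumes that reading. The function `find_similar`, the 10% default tolerance, and the record "metal C" are hypothetical, while the 250 MPa and 235 MPa figures follow the example values shown for the similar data search screen of FIG. 6.

```python
from typing import List


def find_similar(records: List[dict], feature: str, value: float,
                 rel_tol: float = 0.1) -> List[dict]:
    """Return the records whose value for `feature` lies within +/- rel_tol
    (as a fraction of `value`) of the query value, closest first."""
    hits = [
        r for r in records
        if feature in r and abs(r[feature] - value) <= rel_tol * abs(value)
    ]
    return sorted(hits, key=lambda r: abs(r[feature] - value))


# In-memory stand-in for the database contents.
records = [
    {"name": "metal A", "tensile strength": 250.0},
    {"name": "metal B", "tensile strength": 235.0},
    {"name": "metal C", "tensile strength": 400.0},  # hypothetical non-match
]

for hit in find_similar(records, "tensile strength", 250.0):
    print(hit["name"], hit["tensile strength"])
```

In a real deployment the range test would more likely be pushed down into the database query rather than applied to records already loaded into memory.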

In addition, the access history collection module 412 collects the history of the business server 101's accesses to the databases and records it in the access history information table 411. The access history is useful when the neighboring node extraction module 405 extracts the neighboring nodes of the origin node. The communication module 404a handles communication with the terminal computer 102 and the databases (DBn).

FIG. 5 is a block diagram of the terminal computer 102. The terminal computer 102 includes a controller 417, a memory 415, and a communication module 404b. The memory 415 implements a GUI that assists the user in making requests to the business server 101. The GUI provides the user with an input screen 415A for searching the databases (DBn) for data similar to the user data, an input screen 415B for registering user data in a database (DBn), and an input screen (category determination condition input screen) 415C for registering the category correspondence table 408 (FIG. 4) in the business server 101. The communication module 404b handles communication with the business server 101.

FIG. 6 is an example of the similar data search screen 415A. Reference numeral 502B denotes an operation part with which the user uploads input data 502A of the terminal computer 102 ([C:\input.json]: the path to the input data), via the business server 101, into the data structure of a given database. Reference numeral 503 denotes an area that displays the categories specified by the category specifying module 406 of the business server 101 (here the two categories "metal" and "rod-shaped").

Reference numeral 504A denotes an area that displays the kinds of features determined by the feature determination module 407 of the business server 101 (here the two feature names "tensile strength" and "elongation at break"). If the user data is already registered in a given database, then instead of accepting input at 502A and 502B, the business server 101 may display on the similar data search screen 415A link information to the database in which the user data is recorded.

Reference numeral 504B denotes an area that displays the feature value calculated for each feature name by the feature value calculation module 413 of the business server 101. The feature value for "tensile strength" is "250 MPa", and the feature value for "elongation at break" is "0.2%". Reference numeral 505 denotes an area that displays the similar data found when the similar data search module 414 of the business server 101 searched the databases. Area 505 shows that, for the "tensile strength" feature value of 250 MPa, metal A (tensile strength 250 MPa) and metal B (tensile strength 235 MPa) were found.

FIG. 7 is an example of the user data registration screen 415B. Reference numeral 508A denotes an area in which categories are displayed. In response to input to area 508B, the business server 101 executes the neighboring node extraction module 405 and then the category specifying module 406 to specify the categories, and displays the specified categories (for example, "metal" and "rod-shaped") in area 508A.

Further, in response to input to area 509C, the business server 101 executes the feature determination module 407 and the feature value calculation module 413, displays the determined feature names (for example, "tensile strength" and "elongation at break") in area 509A, and displays the calculated feature values in area 509B. The user may also enter values directly in area 508A and in areas 509A and 509B. In that case, the similar data search module 414 of the business server 101 may search for similar data based on the entered feature names and feature values.

FIG. 8 is an example of the category determination condition input screen 415C. Area 511 is where the user enters the range of neighboring nodes (the relative path from the origin node). The business server 101 may display the data structure of the user data present in the database (FIG. 2) on the input screen to assist the user's input. In doing so, the business server indicates to the user which node in the data structure is the user data node, and the user specifies the range of neighboring nodes with reference to that indication.

"(Parent x <= 2)/(Child x <= 3)/*" expresses a relative path from the origin node (the user data node). It specifies, as the range of neighboring nodes, the nodes within two paths of the origin node toward the parent side and, from there, within three paths toward the child side. The neighboring node extraction module 405 of the business server 101 scans the neighboring nodes in order according to this input and searches them for category-related information. The user may also enter a specific node, as in "(Parent x = 2)/(Child x = 3)/*". The neighboring node extraction module 405 may also determine the range of neighboring nodes, or the neighboring nodes themselves, from characteristics or attributes of the user data, for example whether the data format of the user data is JSON or a star schema.
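
To make the notation concrete, the sketch below parses expressions of the form shown above into a list of steps. The grammar is inferred only from the examples in this description and is an assumption, as are the names `Step` and `parse_relative_path`. Each segment is taken to be `(Parent x <= N)`, `(Parent x = N)`, `(Child x <= N)`, or `(Child x = N)`, and the trailing `*` is treated as a wildcard for any node in the resulting range.

```python
import re
from typing import List, NamedTuple


class Step(NamedTuple):
    direction: str   # "parent" or "child"
    exact: bool      # True for "= N", False for "<= N"
    count: int       # number of paths (links) to follow


# One segment such as "(Parent x <= 2)" or "(Child x= 3)".
_SEGMENT = re.compile(
    r"\(\s*(parent|child)\s*x\s*(<=|=)\s*(\d+)\s*\)", re.IGNORECASE
)


def parse_relative_path(expr: str) -> List[Step]:
    """Parse a relative path expression into an ordered list of steps."""
    steps: List[Step] = []
    for part in expr.split("/"):
        part = part.strip()
        if part == "*":
            continue  # wildcard: any node within the range given so far
        match = _SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized path segment: {part!r}")
        direction, operator, count = match.groups()
        steps.append(Step(direction.lower(), operator == "=", int(count)))
    return steps


print(parse_relative_path("(Parent x <= 2)/(Child x <= 3)/*"))
print(parse_relative_path("(Parent x = 2)/(Child x = 3)/*"))
```

A parser like this would let the input of area 511 be validated before it is stored, with malformed segments rejected through the `ValueError`.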

Next, the user enters the category determination condition in fields 513A to 513E. In the illustrated example, the user specifies that the value (513A) of a neighboring node contains, or does not contain (513C), "Cu" (513B). Operating Confirm (513D) fixes the condition, and the condition can also be edited (513E). The user further enters the category to be assigned when the condition holds, here "metal" (514), selecting the intended category from among a plurality of categories. The business server 101 may check the compatibility between the entered condition (513B, 513C) and the category name (514) and display a warning to the user. For example, the category "polymer" is not compatible with Cu or Ni.

FIG. 9 shows an example of the category correspondence table 408. The category specifying module 406 records the input information (FIG. 5: 415C; FIG. 8) in the category correspondence table 408, updating it. The table associates with one another a record 602 holding the relative path from the origin node, a record 603 holding the condition on the category-related information, and a record 604 holding the category assigned when that condition is satisfied.
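As a rough illustration of how such a table could drive category determination, the following Python sketch models each row as a keyword, a contains/does-not-contain flag, and a category. The `Rule` type, the `identify_categories` function, and the mapping of "Bar" to "bar-shaped metal" are assumptions for illustration; only the "Cu" to "metal" pairing is taken from the example in the text. The relative path record is omitted because the neighboring node values are assumed to have been extracted already.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One hypothetical row of a category correspondence table."""
    keyword: str        # category-related information, e.g. "Cu"
    must_contain: bool  # True: "contains"; False: "does not contain"
    category: str       # category assigned when the condition holds


RULES = [
    Rule("Cu", True, "metal"),              # pairing taken from the text
    Rule("Bar", True, "bar-shaped metal"),  # assumed pairing, for illustration
]


def identify_categories(neighbor_values, rules):
    """Return the categories whose condition holds for the neighboring node values."""
    categories = []
    for rule in rules:
        # Plain case-sensitive substring match; a real system would likely tokenize.
        hit = any(rule.keyword in value for value in neighbor_values)
        if hit == rule.must_contain and rule.category not in categories:
            categories.append(rule.category)
    return categories


categories = identify_categories(["Cu alloy sheet", "Bar stock"], RULES)
```

A "does not contain" rule is treated here as satisfied only when no neighboring node value contains the keyword, which is one possible reading of the condition.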

FIG. 10 shows, as an example of the feature correspondence table 409, a table that associates categories with feature names. This correspondence table 409 associates category names 702 with feature names 703. The table may be created by the user and registered in the business server 101, or may be derived from, for example, big data analysis by the business server 101. The feature determination module 407 determines feature names based on the category specified by the category specifying module 406.

One category may be assigned multiple feature names, and multiple categories may share the same feature name. The feature correspondence table 409 (FIG. 10) further has a record 704 called "feature weight". The feature weight is the degree of influence that a feature name has within the category recorded under the category name. In FIG. 10, the weight of "tensile strength" for "bar-shaped metal" is 0.3, whereas its weight for "metal" is lower, at 0.2. That is, "tensile strength" is a factor that should be given more importance in the "bar-shaped metal" category than in the plain "metal" category. When calculating feature amounts for a category specified from the related information of neighboring nodes, the feature amount calculation module 413 reflects this weight in the feature amount, which can improve the accuracy of similar-data detection.

FIG. 11 shows a feature list table 801 as another example of the feature correspondence table 409. The feature determination module 407 reads the categories in the category correspondence table 408 in order from the top, refers to the correspondence table 409 (FIG. 10), and registers the feature name and feature weight corresponding to each category in the feature list table 801. Reference numeral 804 denotes a feature weight total table, which is also a form of the feature correspondence table 409. The feature determination module 407 refers to the feature list table 801, sums the feature weights of entries with the same feature name, and registers the totals in the feature weight total table 804. The feature weight total table 804 thus establishes which types of features the user data has and how strongly each feature influences the user data.
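The construction of the feature list and the weight totals can be sketched in Python as follows. The weights 0.3 and 0.2 for "tensile strength" come from the example in the text; the "elongation at break" row and the function names `build_feature_list` and `total_weights` are assumptions added for illustration.

```python
# Rows of a feature correspondence table: (category, feature name, feature weight).
FEATURE_TABLE = [
    ("bar-shaped metal", "tensile strength", 0.3),  # from the example in the text
    ("metal", "tensile strength", 0.2),             # from the example in the text
    ("metal", "elongation at break", 0.1),          # hypothetical row
]


def build_feature_list(categories, table):
    """Collect (feature name, weight) pairs for each identified category, in order."""
    return [
        (feature, weight)
        for category in categories
        for (row_category, feature, weight) in table
        if row_category == category
    ]


def total_weights(feature_list):
    """Sum the weights of entries that share the same feature name."""
    totals = {}
    for feature, weight in feature_list:
        totals[feature] = totals.get(feature, 0.0) + weight
    return totals


feature_list = build_feature_list(["metal", "bar-shaped metal"], FEATURE_TABLE)
totals = total_weights(feature_list)
```

With both "metal" and "bar-shaped metal" identified, "tensile strength" accumulates the weights from both categories, so a feature supported by several categories ends up with a larger total weight.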

FIG. 12 shows an example of the calculation formula list table 410. The calculation formula list table 410 includes a record 902 for the feature name, a record 903 for the computable condition, and a record 904 for the formula used to calculate the feature amount. This table may be defined in the business server 101 by the user. The feature amount calculation module 413 refers in order to the feature names in the feature weight total table 804 and, based on the calculation formula list table 410 (FIG. 12), determines the computable condition 903 and the feature amount calculation formula 904 corresponding to each feature name.

The feature amount calculation module 413 then determines, for each feature name recorded in the feature weight total table 804, whether the computable condition is met. If it is, the module applies the user data to the feature amount calculation formula and calculates the feature amount for that feature name. In accordance with the feature weight total table 804, the feature amount calculation module 413 multiplies the feature amount for each feature name by its total weight to obtain a corrected feature amount, and outputs the result to the search screen of FIG. 6 (FIG. 6: 504B). The similar data search module 414 searches the database based on the corrected value for each feature name and displays records with similar feature amounts as a list on the search screen (FIG. 6: 505).
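A minimal Python sketch of this calculation step follows. The text does not give the actual formulas, so the two formulas below (maximum force divided by cross-sectional area, and percentage change in gauge length) are textbook definitions used as stand-ins, and the user data field names, the `FORMULAS` table, and `corrected_features` are hypothetical. The 3D condition on "elongation at break" follows the example condition mentioned for the formula list table.

```python
# Hypothetical formula list: feature name -> (computable condition, formula).
FORMULAS = {
    "tensile strength": (
        lambda data: True,  # no condition assumed for this feature
        lambda data: data["max_force_n"] / data["area_mm2"],
    ),
    "elongation at break": (
        lambda data: data.get("format") == "3D",  # requires three-dimensional data
        lambda data: (data["final_length_mm"] - data["gauge_length_mm"])
        / data["gauge_length_mm"] * 100.0,
    ),
}


def corrected_features(user_data, total_weights, formulas):
    """Compute each computable feature amount and multiply it by its total weight."""
    result = {}
    for feature, weight in total_weights.items():
        if feature not in formulas:
            continue  # no formula registered for this feature
        is_computable, formula = formulas[feature]
        if not is_computable(user_data):
            continue  # computable condition not met; skip this feature
        result[feature] = formula(user_data) * weight
    return result


weights = {"tensile strength": 0.5, "elongation at break": 0.1}
sample = {
    "format": "2D",
    "max_force_n": 5000.0,
    "area_mm2": 10.0,
    "gauge_length_mm": 50.0,
    "final_length_mm": 60.0,
}
features_2d = corrected_features(sample, weights, FORMULAS)
features_3d = corrected_features({**sample, "format": "3D"}, weights, FORMULAS)
```

For the 2D sample only "tensile strength" is produced, because the computable condition for "elongation at break" is not met. How the corrected values are compared against database records is not specified in the text and is left out of the sketch.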

Next, the flow of processing from when the business server 101 receives user data input from the terminal computer 102 for a similar data search until it calculates the feature amounts is described again with reference to the flowchart of FIG. 13. First, on receiving a request for a similar data search from the terminal computer 102 (S1000), the business server 101 outputs to the terminal computer 102 a list of user data (a list showing which user data folders exist in which databases). On receiving the designation of user data from the terminal computer 102, the business server 101 executes the neighboring node extraction module 405 and extracts the neighboring nodes of the user data node according to the relative path number (FIG. 8: 511) (S1002).

Subsequently, the category specifying module 406 scans the neighboring nodes and refers to the category correspondence table 408 (FIG. 8; FIG. 9: 603). If a neighboring node contains category-related information from the table ("Cu" or "Ni" (FIG. 8: 513B; FIG. 9: 603), or "Cast" or "Bar" (FIG. 9: 603)), the module specifies the category corresponding to that information (FIG. 8: 514; FIG. 9: 604), causes the terminal computer 102 to display the specified category (FIG. 6: 503), and ends S1004.

Subsequently, the feature determination module 407 refers to the feature correspondence table 409 (FIG. 10), selects the feature names and feature weights for the specified category, and registers them in the feature list table 801 (S1006). The feature determination module 407 then calculates, in the feature correspondence table 409, the sum of the feature weights of the rows containing the same feature name and registers it in the feature weight total table 804 (S1008). In this step, a feature name may be registered in the feature weight total table 804 only when its total weight exceeds a predetermined threshold. This prevents feature names with little influence, that is, with a small total weight, from being registered and reduces the computational load of the feature amount calculation.

Next, the feature amount calculation module 413 refers to the calculation formula list table 410 (FIG. 12), selects the feature amount calculation formula 904 corresponding to the feature name 902, and calculates the feature amount by applying the user data and the total weight of the feature name (FIG. 11: 804) to it (S1100). The feature amount calculation module 413 may evaluate the condition in the calculation formula list table 410 (FIG. 12: 903) and calculate the feature amount only when, for example, the user data satisfies the predetermined condition. For example, a computable condition of "user data: format (3D)" for elongation at break (the feature name) indicates that the user data must be three-dimensional data in order to calculate elongation at break.

With the above, the business server 101 completes the calculation of feature amounts for the similar data search. The similar data search module 414 of the business server 101 searches the database for similar data based on these feature amounts, so that even when data characteristics vary widely, database searches can be performed accurately while reducing the burden on the user.

As described above, the range of neighboring nodes is determined by the relative path number, and the neighboring node extraction module 405 extracts the nodes within that range as neighboring nodes. The nodes within the range of the relative path number may, however, belong to multiple lineages. The neighboring node extraction module 405 can extract all of these nodes. It also includes means for classifying the nodes to be extracted, so that extraction can be restricted to the nodes of some classes. The access history collection module 411 described above is one example of such means. The access history is a record of the user's accesses made to register or update user data, and it includes the date and the route to the access destination.

First, the existence of multiple routes among the neighboring nodes within the range of the relative path number is explained. When the user data (the source data for calculating the feature amount) has multiple pieces of neighboring data (data of neighboring nodes), the neighboring data designated by a relative path cannot be uniquely identified. For example, as shown in FIG. 14, the user data (experiment X data combining material A1 and material A2) has two routes: neighboring nodes 1101, which include the parent node "Overview of material A1", and neighboring nodes 1102, which pass through the parent node "Overview of material A2".

Furthermore, as shown in FIG. 15, because the input data also includes experimental data, the user data has neighboring nodes 1103 that include the parent node "Overview of experiment X" and belong to a lineage entirely separate from those of FIG. 14.

Next, the access history is described. When access is made through the GUI of the terminal computer 102, each screen shown to the user displays one level of data. Accessing the web page "Overview of material A1" in hierarchy 1101 (FIG. 14) and then the web page "Experiment X data combining material A1 and material A2" therefore involves a screen transition, which produces a history entry linking the previously accessed page with the currently accessed page. Using this history, the neighboring node extraction module 405 can restrict the neighboring nodes within the range of the relative path number to "chemical formulation A1".

When user data is accessed through a query language such as SQL, a parent node and its child node can be accessed in a single query, and the order of the data names in the JOIN used to retrieve multiple nodes together serves as the access path history. For example, the query "SELECT * FROM 実験Xデータ JOIN 材料A1の概要 USING 材料ID" refers simultaneously to "Experiment X data combining material A1 and material A2" (実験Xデータ) and its parent "Overview of material A1" (材料A1の概要), joined on the material ID (材料ID). The data name to the right of JOIN, "Overview of material A1", is treated as the first access and the data name to the left, "Experiment X data", as the next access, so that the JOIN order gives the access path history.
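The rule for turning a JOIN into an access path can be sketched in Python as follows. This is a deliberately narrow illustration: the `access_path_from_query` function is hypothetical, it uses a regular expression rather than an SQL parser, and it handles only a single two-table JOIN with unquoted names that contain no spaces.

```python
import re

# Captures the data names on the left and right of a single JOIN.
JOIN_RE = re.compile(r"\bFROM\s+(\S+)\s+JOIN\s+(\S+)", re.IGNORECASE)


def access_path_from_query(query):
    """Return [first access, next access] derived from the JOIN order, or []."""
    match = JOIN_RE.search(query)
    if match is None:
        return []
    left, right = match.group(1), match.group(2)
    # The name to the right of JOIN is treated as the first access (the parent),
    # and the name to the left as the next access (the child).
    return [right, left]


# Query from the text: 実験Xデータ = experiment X data,
# 材料A1の概要 = overview of material A1, 材料ID = material ID.
query = "SELECT * FROM 実験Xデータ JOIN 材料A1の概要 USING 材料ID"
path = access_path_from_query(query)
```

The resulting list places the parent "材料A1の概要" before the child "実験Xデータ", matching the order in which a user navigating the GUI would have visited the two pages.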

The access history collection module 413 collects the user's access history and records it in the access history information table 411. FIG. 16 shows an example of the access history information table 411. The access path information includes a column 1205 for the unique session ID assigned to each access by the user and a column 1206 for the access path. Column 1206 records the access path taken by the user through a web browser, or the access path determined from a query.

The neighboring node extraction module 405 may extract neighboring nodes according to the access history of a predetermined session ID among the plurality of session IDs. In FIG. 16, for example, when a session ID corresponds to the most recent session, neighboring nodes may be extracted based on that session. Using the most recent session allows the feature amounts of the user data to be obtained in line with the latest data.

When the neighboring node extraction module 405 extracts neighboring nodes using the access history of session ID 1, it takes "材料A1+材料A2_実験X" (material A1 + material A2_experiment X) from the access path "材料一覧/材料A1/材料A1+材料A2_実験X" (material list/material A1/material A1 + material A2_experiment X). With that node as the origin and a relative path of "(Parent)", "材料A1" (material A1) is extracted as the neighboring node. Because the business server extracts neighboring nodes using the access history in this way, the range of neighboring nodes can be restricted and the feature amount calculation becomes more efficient.
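The history-based extraction above can be sketched in Python as follows. The dictionary standing in for the access history information table, the `neighbor_from_history` function, and its `levels_up` parameter are assumptions for illustration; the access path string and the expected neighbor are the ones given in the text.

```python
# Hypothetical stand-in for the access history table: session ID -> access path.
# 材料一覧 = material list, 材料A1 = material A1,
# 材料A1+材料A2_実験X = material A1 + material A2_experiment X.
ACCESS_HISTORY = {
    1: "材料一覧/材料A1/材料A1+材料A2_実験X",
}


def neighbor_from_history(history, session_id, origin, levels_up=1):
    """Follow the recorded access path upward from the origin node.

    Returns the node `levels_up` steps before the origin on the path, or None
    if the session, the origin, or the requested ancestor is not available.
    """
    path = history.get(session_id)
    if path is None:
        return None
    parts = path.split("/")
    if origin not in parts:
        return None
    index = parts.index(origin) - levels_up
    return parts[index] if index >= 0 else None


# Relative path "(Parent)" corresponds to levels_up=1.
neighbor = neighbor_from_history(ACCESS_HISTORY, 1, "材料A1+材料A2_実験X")
```

Because the lookup follows only the path actually recorded for the session, the parent reached through material A2 is never considered, which is how the history narrows the candidate neighboring nodes.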

The embodiment above describes the user data as experimental data and/or simulation data relating to material development, but the user data is not limited to these and may be, for example, big data for product development. Further, only total weights (FIG. 11: 804) equal to or greater than a predetermined threshold may be registered in table 804. This excludes types of feature amounts that contribute little to the search for data similar to the user data and reduces the computational load on the server.

The controller of the business server 101 may execute the data processing program to implement artificial intelligence, train it on combinations of category, feature name, and user data as learning data, and, based on the learning result, determine the features of the user data to be processed from scores calculated for that user data. Further, although the table of FIG. 12 associates one calculation formula with each feature, a predetermined formula selected from a plurality of formulas may instead be associated with a feature.

The present invention further encompasses the business server 101, the terminal computer 102 for causing the business server 101 to execute data processing, the data processing program of the business server 101, and a (non-transitory) recording medium on which the data processing program is recorded, such as a hard disk or flash memory. The embodiments described above are illustrative only and are not intended to limit the technical scope of the present invention; they may be modified as appropriate.

101: business server
102: terminal computer
DB1, DB2, ... DBn: database
402: controller

Claims

A data processing system for searching for similar data based on a feature amount of user data, comprising:
a controller; and
a memory,
wherein the controller, by executing a data processing program recorded in the memory,
recognizes a data structure to which the user data belongs, the data structure comprising a first node having the user data and a second node in the vicinity of the first node, and
calculates the feature amount based on information of the second node.

The data processing system according to claim 1, wherein the controller:
acquires, from the information of the second node, related information relating to a type of feature of the user data;
identifies a feature of the user data based on the related information; and
calculates the feature amount based on the identified feature.

The data processing system according to claim 2, wherein
the memory includes a table that associates the related information with the feature, and
the controller identifies the feature from the related information based on the table.

The data processing system according to claim 3, wherein
the memory includes a table that associates the identified feature with a calculation formula for the feature amount, and
the controller:
specifies the calculation formula based on the table; and
calculates the feature amount by applying the user data to the specified calculation formula.

The data processing system according to claim 1, wherein
the controller extracts the second node based on a relative path number with the first node as an origin node.

The data processing system according to claim 2, wherein
the first node is configured in a text-based format, and
the controller acquires the related information based on text information of the second node.

The data processing system according to claim 2, wherein the controller:
identifies a plurality of features of the user data;
sets, for each of the plurality of features, a weight according to its degree of influence on the user data; and
applies the weight to the calculation of the feature amount.

The data processing system according to claim 7, wherein the controller:
sums, among the plurality of features, the weights set for the same feature; and
applies the total of the weights to the calculation of the feature amount.

The data processing system according to claim 5, wherein the controller:
acquires an access history for the first node; and
extracts the second node based on the relative path number and the access history.

A data processing method in which a computer searches for similar data based on a feature amount of user data, wherein
the computer, by executing a data processing program,
recognizes a data structure to which the user data belongs, the data structure comprising a first node having the user data and a second node in the vicinity of the first node, and
calculates the feature amount based on information of the second node.