JP2018185372A - Information processor, information processing program and building

Info

Publication number: JP2018185372A
Application number: JP2017085533A
Authority: JP
Inventors: 境　克司; Katsushi Sakai; 克司境; 村瀬　有一; Yuichi Murase; 有一村瀬
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2018-11-22

Abstract

Problem: To extract the sound uttered by each user. Solution: An information processing apparatus 101 starts buffering information about the sound input to each of microphones M1 and M2 into the corresponding buffers B1 and B2 of a storage unit 110. Based on information transmitted from a terminal T in response to an operation on the terminal T, the information processing apparatus 101 identifies a period P during which the user of the terminal T spoke and a direction θ from the microphones M1 and M2 to the terminal T. From the per-microphone sound information stored in the storage unit 110, the information processing apparatus 101 takes the information for the period P and, based on that information and the direction θ, extracts information about the sound uttered by the user of the terminal T. The information processing apparatus 101 outputs the extracted information in association with identification information of the terminal T. Selected drawing: FIG. 1

Description

The present invention relates to an information processing apparatus, an information processing program, and a building.

In recent years, voice input has become familiar and practical thanks to advances in artificial intelligence technology and the accumulation of huge conversation databases. Far-field speech recognition technology has also reached a practical level, making speech recognition possible from a distance of several meters.

As related prior art, for example, one technique holds and records the correspondence between speakers and their spoken languages, and determines the language conversion direction according to the flow of the dialogue while switching the speaker pair currently carrying on the dialogue. In another technique, a portion where a designated section specified by an operator overlaps an utterance section detected from input speech is detected, and when the speaker is determined, based on the input speech, to be someone other than the operator, the utterance section containing the overlapping portion is determined as the processing section. In yet another technique, images are captured with each student in a classroom as a subject, and optical flow is used to detect a student who is to speak standing up from a chair or moving the mouth, thereby locating the speaker in the captured image and extracting image data of the speaker's face.

JP 2007-322523 A; JP 2007-264473 A; International Publication No. 2011/013605

However, with the conventional techniques, in an environment where multiple people speak and the speakers are located in different directions from the sound-collecting microphones, it can be difficult to extract each person's sound in a section specified by a user. For example, when multiple people speak at the same time, it is difficult to distinguish and extract the sound uttered by each person.

In one aspect, an object of the present invention is to extract the sound uttered by each user.

In one embodiment, an information processing apparatus is provided that includes: a plurality of microphones; a storage unit that stores, for each microphone included in the plurality of microphones, information about the sound input to that microphone; an identification unit that identifies, based on information transmitted from a terminal in response to an operation on the terminal, a period during which the user of the terminal spoke and a direction from the plurality of microphones to the terminal; and an extraction unit that extracts information about the sound uttered by the user based on the per-microphone sound information for the period stored in the storage unit and on the direction.

According to one aspect of the present invention, the sound uttered by each user can be extracted.

FIG. 1 is an explanatory diagram of an example of the information processing apparatus 101 according to the embodiment.
FIG. 2 is an explanatory diagram illustrating a system configuration example of the information processing system 100.
FIG. 3 is an explanatory diagram illustrating a hardware configuration example of the information processing apparatus 101.
FIG. 4 is a block diagram illustrating a hardware configuration example of the terminal Ti.
FIG. 5 is an explanatory diagram illustrating an example of the contents stored in the utterance section table 220.
FIG. 6 is a block diagram illustrating a functional configuration example of the information processing apparatus 101.
FIG. 7 is an explanatory diagram (part 1) illustrating an example of the positional relationship between the plurality of microphones M and users.
FIG. 8 is an explanatory diagram illustrating a specific example of the sound information stored in the buffers B1 and B2.
FIG. 9 is an explanatory diagram illustrating an example of sound source recognition in the height direction.
FIG. 10 is an explanatory diagram illustrating a second hardware configuration example of the information processing apparatus 101.
FIG. 11 is an explanatory diagram (part 2) illustrating an example of the positional relationship between the plurality of microphones M and users.
FIG. 12 is a flowchart illustrating an example of an information processing procedure of the information processing apparatus 101.

Embodiments of an information processing apparatus, an information processing program, and a building according to the present invention will be described in detail below with reference to the drawings.

(Embodiment)
FIG. 1 is an explanatory diagram of an example of the information processing apparatus 101 according to the embodiment. In FIG. 1, the information processing system 100 includes the information processing apparatus 101 and a terminal T. The information processing apparatus 101 is a computer that has a plurality of microphones M (for example, microphones M1 and M2) and a storage unit 110, and that extracts information about the sound uttered by the user of the terminal T.

The microphone M is a device that converts sound into an electrical signal. The plurality of microphones M are installed, for example, at substantially the same height. The distance between the microphones M is, for example, about several centimeters to several tens of centimeters. The number of microphones M only needs to be two or more, and may be three or four.

The storage unit 110 stores, for each microphone M included in the plurality of microphones M, information about the sound input to that microphone M. In more detail, the storage unit 110 has, for each microphone M, a buffer B (for example, buffers B1 and B2) that stores information about the sound input to that microphone M. Each buffer B is connected to its microphone M independently of the other buffers B. The sound information is, for example, time-series data (a digital signal) indicating the change over time in the sound pressure or frequency of the sound input to the microphone M.

The terminal T is a computer operated by a user. The terminal T is, for example, worn by the user or installed in a place where the user can easily operate it. Specifically, for example, the terminal T is a wearable terminal of a ring type, a wristband type, a pendant type, a badge type, or the like. The terminal T may also be a switch-type device provided on a wall. Furthermore, the terminal T may be a device such as a smartphone or a tablet.

Note that the information processing system 100 assumes an environment in which multiple people speak, but FIG. 1 shows only one user for convenience of explanation. When multiple people speak, a terminal T is provided for each user who speaks.

In an environment where multiple people speak, several of them may speak at the same time. For example, when people are conversing in groups of a few people each, people in different groups may speak simultaneously. In such a case, the conventional techniques have difficulty distinguishing and extracting each person's sound, which in turn lowers speech recognition accuracy.

In addition, if speech recognition runs continuously in a place where everyday conversations take place, conversations that the speaker did not intend to have recognized may be recognized, which can cause privacy and security problems. Furthermore, a speaker who is not speaking with speech recognition in mind tends to rely on unclear context or on new words that cannot be recognized, which makes it difficult to ensure speech recognition accuracy.

Therefore, the present embodiment describes an information processing method that can extract the sound uttered by each user even in an environment where multiple people speak. A processing example of the information processing apparatus 101 is described below.

(1) The information processing apparatus 101 starts buffering information about the sound input to each of the microphones M1 and M2 into the corresponding buffers B1 and B2 of the storage unit 110. How long a span of information each of the buffers B1 and B2 stores can be designed freely. For example, each of the buffers B1 and B2 stores the latest information covering about several tens of seconds.
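The buffering in step (1) can be pictured as one fixed-length ring buffer per microphone. The following is a minimal Python sketch, not the apparatus's actual implementation: the class name `MicBuffer`, the sample rate, and the choice to address samples by seconds elapsed since buffering started are all assumptions made for illustration.

```python
from collections import deque


class MicBuffer:
    """Hypothetical per-microphone buffer that keeps only the latest samples."""

    def __init__(self, sample_rate_hz: int, seconds: float) -> None:
        self.sample_rate_hz = sample_rate_hz
        # A deque with maxlen drops the oldest sample once it is full.
        self._samples = deque(maxlen=int(sample_rate_hz * seconds))
        self._total_written = 0  # samples written since buffering started

    def write(self, samples) -> None:
        for s in samples:
            self._samples.append(s)
            self._total_written += 1

    def read(self, t_start: float, t_end: float) -> list:
        """Return samples for [t_start, t_end), in seconds since buffering started.

        Samples that were already overwritten are simply not returned.
        """
        # Absolute index of the oldest sample still held in the buffer.
        oldest = self._total_written - len(self._samples)
        first = max(int(t_start * self.sample_rate_hz), oldest)
        last = min(int(t_end * self.sample_rate_hz), self._total_written)
        if first >= last:
            return []
        data = list(self._samples)
        return data[first - oldest:last - oldest]


# One independent buffer per microphone, as with B1 and B2.
b1 = MicBuffer(sample_rate_hz=16000, seconds=30.0)
b2 = MicBuffer(sample_rate_hz=16000, seconds=30.0)
```

Keeping the buffers independent means the samples for the same period can later be read from each microphone separately.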

(2) Based on information transmitted from the terminal T in response to an operation on the terminal T, the information processing apparatus 101 identifies the period P during which the user of the terminal T spoke and the direction θ from the microphones M1 and M2 to the terminal T. In the information processing system 100, the user operates the terminal T to specify his or her own utterance section.

For example, the user performs a first operation on the terminal T when starting to speak. In this case, first information indicating that the first operation has been performed is transmitted from the terminal T. The user then performs a second operation on the terminal T when finishing speaking. In this case, second information indicating that the second operation has been performed is transmitted from the terminal T. Based on the first and second information transmitted from the terminal T, the information processing apparatus 101 identifies the period from the time point t1 at which the first operation was performed to the time point t2 at which the second operation was performed as the period P during which the user spoke.

Information indicating the time points t1 and t2 at which the first and second operations were performed may, for example, be included in the first and second information, respectively. Alternatively, the information processing apparatus 101 may, for example, treat the time points at which it received the first and second information as the time points t1 and t2 at which the first and second operations were performed (so-called time stamps).
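The two ways of obtaining t1 and t2 described above can be sketched as follows. This is an illustrative Python sketch only; the names `OperationInfo`, `operation_time`, and `speech_period`, and the use of floating-point seconds, are assumptions and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OperationInfo:
    """Hypothetical model of the first or second information from a terminal."""
    terminal_id: str
    operation: str                       # "first" (start) or "second" (end)
    operated_at: Optional[float] = None  # time embedded by the terminal, if any


def operation_time(info: OperationInfo, received_at: float) -> float:
    # Prefer the time carried in the information itself; otherwise fall
    # back to the time of reception (the time stamp).
    return info.operated_at if info.operated_at is not None else received_at


def speech_period(first: OperationInfo, first_received_at: float,
                  second: OperationInfo, second_received_at: float
                  ) -> Tuple[float, float]:
    """Return the period P as (t1, t2)."""
    t1 = operation_time(first, first_received_at)
    t2 = operation_time(second, second_received_at)
    if t2 < t1:
        raise ValueError("second operation precedes first operation")
    return (t1, t2)
```

For example, if the first information carries its own time of 10.0 s and the second carries none and is received at 13.5 s, `speech_period` yields the period (10.0, 13.5).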

The direction θ is the direction of the terminal T as seen from the microphones M1 and M2. The direction θ is expressed, for example, in a single horizontal plane (a coordinate system viewing the space from above), as the angle between an axis passing through the midpoint between the microphones M1 and M2 and the vector from that midpoint toward the terminal T. In other words, the direction θ corresponds to the direction of the user of the terminal T as seen from the microphones M1 and M2.
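As a rough illustration of this definition, the sketch below computes θ from two-dimensional positions. The source does not say which axis through the midpoint is used as the reference, so the sketch assumes the axis perpendicular to the M1-M2 baseline, with positive angles toward the M2 side; the function name `direction_theta` is likewise hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def direction_theta(m1: Point, m2: Point, terminal: Point) -> float:
    """Angle in degrees from the assumed reference axis to the terminal.

    The reference axis passes through the midpoint of M1 and M2 and is
    perpendicular to the baseline (an assumption, not from the source).
    """
    mid = ((m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0)
    bx, by = m2[0] - m1[0], m2[1] - m1[1]
    length = math.hypot(bx, by)
    if length == 0.0:
        raise ValueError("M1 and M2 must be at different positions")
    bx, by = bx / length, by / length   # unit vector along the baseline
    nx, ny = -by, bx                    # unit vector along the reference axis
    vx, vy = terminal[0] - mid[0], terminal[1] - mid[1]
    along = vx * bx + vy * by           # component toward M2
    normal = vx * nx + vy * ny          # component along the reference axis
    return math.degrees(math.atan2(along, normal))
```

With M1 at (-0.1, 0) and M2 at (0.1, 0), a terminal straight ahead at (0, 2) gives 0 degrees, and a terminal at (2, 2) gives 45 degrees.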

For example, the information processing apparatus 101 estimates the position of the terminal T from the received signal strengths observed when a radio signal transmitted from the terminal T (for example, the first information) is received by a plurality of receivers installed around the terminal T (for example, the beacon receivers 201 illustrated in FIG. 2). The information processing apparatus 101 then identifies the direction θ based on the estimated position of the terminal T and the installation positions of the microphones M1 and M2.
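The source does not state how the position is computed from the received signal strengths. One simple possibility, shown here purely as an assumed example, is a weighted centroid of the receiver positions in which stronger readings pull the estimate toward their receiver. The function name `estimate_position` and the weighting by linear power are illustrative choices.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_position(readings: List[Tuple[Point, float]]) -> Point:
    """Weighted-centroid estimate from (receiver position, RSSI in dBm) pairs.

    This is a swapped-in illustrative technique, not the method of the source.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    # Convert dBm to linear power so that stronger signals weigh more.
    weights = [10.0 ** (rssi_dbm / 10.0) for _, rssi_dbm in readings]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(readings, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(readings, weights)) / total
    return (x, y)
```

Two receivers reporting the same strength place the estimate midway between them; a practical system would likely need calibration and smoothing, which this sketch omits.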

The information transmitted from the terminal T in response to an operation on the terminal T may include, for example, position information of the terminal T. In this case, the information processing apparatus 101 can, for example, estimate the position of the terminal T from the position information included in the first information, and identify the direction θ from the estimated position of the terminal T and the installation positions of the microphones M1 and M2.

(3) From the per-microphone sound information for the microphones M1 and M2 stored in the storage unit 110, the information processing apparatus 101 takes the information for the period P and, based on that information and the direction θ, extracts information about the sound uttered by the user of the terminal T.

Specifically, for example, the information processing apparatus 101 reads the sound information of the microphones M1 and M2 for the period P from the buffers B1 and B2 of the storage unit 110, respectively. The information processing apparatus 101 then extracts information about the sound uttered by the user by performing beamforming toward the direction θ on the sound information read for the period P.

Here, beamforming is processing that controls directivity using a plurality of microphones M, for example, processing that raises the sensitivity in a specific direction to emphasize signals arriving from that direction. With beamforming, the sound information of the microphones M1 and M2 stored in the buffers B1 and B2 can be used to emphasize the audio signal arriving from the direction θ and thereby extract information about the sound uttered by the user of the terminal T.
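To make the idea concrete, the following is a minimal sketch of delay-and-sum beamforming for two microphones, one of the simplest beamforming schemes. The source does not say which beamforming algorithm is used, so this is an assumed example. It further assumes a far-field source, θ measured from the axis perpendicular to the M1-M2 baseline with positive angles toward M2, integer-sample delays, and a speed of sound of 343 m/s; the function name `delay_and_sum` is hypothetical.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def delay_and_sum(x1: Sequence[float], x2: Sequence[float], theta_deg: float,
                  mic_distance_m: float, sample_rate_hz: int) -> List[float]:
    """Emphasize sound arriving from theta_deg using two microphone signals.

    x1 and x2 are the samples of M1 and M2 for the same period.
    """
    # For positive theta the source is nearer to M2, so M1 hears it later.
    tau = mic_distance_m * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND_M_S
    k = round(tau * sample_rate_hz)  # samples by which M1 lags M2
    d1 = max(-k, 0)                  # delay applied to M1
    d2 = max(k, 0)                   # delay applied to M2
    n = min(len(x1), len(x2))
    out = []
    for i in range(n):
        a = x1[i - d1] if i - d1 >= 0 else 0.0
        b = x2[i - d2] if i - d2 >= 0 else 0.0
        # Signals from the steered direction line up and add coherently.
        out.append(0.5 * (a + b))
    return out
```

When the steering angle matches the true direction, the two copies of the utterance align and keep their full amplitude, while sound from other directions adds out of step and is attenuated. Running the function once per terminal direction over the same buffered samples gives one emphasized signal per user.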

A detailed description of beamforming is omitted because it is a well-known technique. For the specific processing, see, for example, Osamu Hoshuyama et al., "Chishiki no Mori (Forest of Knowledge)", IEICE, 2012, Group 2, Part 6, Chapter 2.

(4) The information processing apparatus 101 outputs the extracted information about the sound uttered by the user in association with the identification information of the terminal T. The identification information of the terminal T is information that uniquely identifies the terminal T, and is included in the information transmitted from the terminal T in response to an operation on the terminal T.

Specifically, for example, the information processing apparatus 101 may transmit the information about the sound uttered by the user of the terminal T, in association with the identification information of the terminal T, to an external apparatus that executes speech recognition processing. As a result, the external apparatus executes speech recognition processing on the information about the sound uttered by the user of the terminal T.

In this way, the information processing apparatus 101 can extract the sound uttered by the user of the terminal T by emphasizing the audio signal arriving from the direction θ of the terminal T (the direction of the user of the terminal T as seen from the microphones M1 and M2). Specifically, for example, using the per-microphone sound information stored in the buffers B1 and B2, the information processing apparatus 101 can generate, for each direction θ, sound information in which the audio signal arriving from that direction is emphasized. In addition, the user of the terminal T can be identified from the identification information of the terminal T. Therefore, when the speakers are located in different directions from the sound-collecting microphones, each person's sound in the section specified by the user can be extracted. For example, even when multiple people speak at the same time, the sound uttered by each user can be extracted accurately. Moreover, because the user can specify the utterance section, conversations that the user does not intend to have recognized can be prevented from being recognized and leaked. Furthermore, by specifying the utterance section, the user can make clear the vocabulary that he or she wants recognized, which in turn helps prevent a drop in speech recognition accuracy.

(System configuration example of the information processing system 100)
Next, a system configuration example of the information processing system 100 will be described.

FIG. 2 is an explanatory diagram illustrating a system configuration example of the information processing system 100. In FIG. 2, the information processing system 100 includes the information processing apparatus 101, a plurality of beacon receivers 201, and terminals T1 to Tn. In the information processing system 100, the information processing apparatus 101 and the plurality of beacon receivers 201 are connected via a wired or wireless network 210. The network 210 includes, for example, a LAN, a WAN (Wide Area Network), and the Internet.

In the following description, an arbitrary terminal among the terminals T1 to Tn may be written as "terminal Ti" (i = 1, 2, ..., n). The terminal T illustrated in FIG. 1 corresponds to, for example, the terminal Ti.

The information processing apparatus 101 has an utterance section table 220 and extracts information about the sound uttered by the user of the terminal Ti. The contents stored in the utterance section table 220 will be described later with reference to FIG. 5. The terminal Ti is, for example, a ring-type wearable terminal used by a user of the information processing system 100. The terminal Ti may also be installed in a place where the user can easily operate it.

The beacon receiver 201 is a computer that receives a beacon signal transmitted from the terminal Ti. The plurality of beacon receivers 201 are installed at different positions in a space R. The space R is a space in which the users of the terminals Ti are present. For example, the space R is a building, a room, or a booth used when multiple people converse. The building has a floor and walls that rise from the floor and enclose the interior (the space R). The microphones M1 and M2 are provided on a wall of the building.

The beacon receiver 201 also measures the received signal strength of the beacon signal received from the terminal Ti, attaches the measured received signal strength to that beacon signal, and transmits it to the information processing apparatus 101. The received signal strength is an index value indicating the strength of the beacon signal received by the beacon receiver 201. As the received signal strength, for example, an RSSI (Received Signal Strength Indicator) value can be used. The unit of the RSSI value is, for example, dBm.
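An RSSI value in dBm is often turned into a rough distance with the log-distance path loss model. The source does not describe such a conversion, so the sketch below is only an assumed illustration; the reference strength at 1 m (-59 dBm) and the path loss exponent (2.0, roughly free space) are placeholder values that would need to be calibrated for a real room.

```python
def rssi_to_distance_m(rssi_dbm: float,
                       rssi_at_1m_dbm: float = -59.0,
                       path_loss_exponent: float = 2.0) -> float:
    """Rough distance estimate from RSSI using the log-distance model.

    Both default parameters are assumptions for illustration only.
    """
    return 10.0 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
```

With these placeholder values, a reading of -59 dBm maps to about 1 m and a reading of -79 dBm to about 10 m. Indoor reflections make such estimates noisy, which is one reason to combine readings from several receivers.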

(Hardware configuration example of the information processing apparatus 101)
FIG. 3 is an explanatory diagram illustrating a hardware configuration example of the information processing apparatus 101. In FIG. 3, the information processing apparatus 101 includes a CPU (Central Processing Unit) 301, a memory 302, an I/F (Interface) 303, a beacon receiving unit 304, a disk drive 305, a disk 306, and a sound collecting unit 307. These components are connected to one another by a bus 300.

Here, the CPU 301 governs overall control of the information processing apparatus 101. The memory 302 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), and a flash ROM. Specifically, for example, the flash ROM and the ROM store various programs, and the RAM is used as a work area for the CPU 301. A program stored in the memory 302 is loaded into the CPU 301 and causes the CPU 301 to execute the coded processing.

The I/F 303 is connected to the network 210 (see FIG. 2) through a communication line, and is connected to other computers (for example, the beacon receivers 201 illustrated in FIG. 2) via the network 210. The I/F 303 serves as the interface between the network 210 and the inside of the apparatus, and controls the input and output of data from other computers. For example, a modem or a NIC (Network Interface Card) can be adopted as the I/F 303.

The beacon receiving unit 304 receives beacon signals (radio signals). Specifically, for example, the beacon receiving unit 304 includes an antenna that receives a beacon signal, and a signal processing unit that converts the analog signal received by the antenna into a digital signal and outputs it to the bus 300.

The disk drive 305 controls reading and writing of data on the disk 306 under the control of the CPU 301. The disk 306 stores data written under the control of the disk drive 305. Examples of the disk 306 include a magnetic disk and an optical disk.

The sound collecting unit 307 includes the microphones M1 and M2 and the buffers B1 and B2. The microphones M1 and M2 are devices that convert sound into electrical signals. The microphones M1 and M2 are installed at positions of substantially the same height in the space R. The buffer B1 is connected to the microphone M1 and stores information about the sound input to the microphone M1. The buffer B2 is connected to the microphone M2 and stores information about the sound input to the microphone M2.

Of the components described above, the information processing apparatus 101 may omit, for example, the disk drive 305 and the disk 306. In addition to the components described above, the information processing apparatus 101 may include an input device (for example, a keyboard, a mouse, or an input pad), an output device (for example, a display or a speaker), an SSD (Solid State Drive), and the like.

(Hardware configuration example of the terminal Ti)
FIG. 4 is a block diagram illustrating a hardware configuration example of the terminal Ti. In FIG. 4, the terminal Ti includes a CPU 401, a memory 402, an operation button 403, an LED (Light Emitting Diode) lamp 404, and a beacon transmission unit 405. These components are connected to one another by a bus 400.

The CPU 401 governs overall control of the terminal Ti. The memory 402 includes, for example, a ROM and a RAM. Specifically, for example, the ROM stores various programs, and the RAM is used as a work area for the CPU 401. A program stored in the memory 402 is loaded into the CPU 401 and causes the CPU 401 to execute the coded processing.

The operation button 403 is an input device operated to specify the utterance section of the user of the terminal Ti. Specifically, for example, the user of the terminal Ti turns the operation button 403 ON when starting to speak and turns the operation button 403 OFF when finishing speaking.

In more detail, for example, the user of the terminal Ti presses the operation button 403 with a finger when starting to speak (ON operation) and keeps it pressed until the utterance ends. The user of the terminal Ti then releases the finger from the operation button 403 when finishing speaking (OFF operation).

ＬＥＤランプ４０４は、操作ボタン４０３の操作に応じて点灯するランプである。具体的には、例えば、ＬＥＤランプ４０４は、操作ボタン４０３がＯＮになると点灯し、操作ボタン４０３がＯＦＦになると消灯する。ＬＥＤランプ４０４によれば、端末Ｔｉのユーザは、操作ボタン４０３のＯＮ／ＯＦＦの状態を確認することができる。 The LED lamp 404 is a lamp that is turned on in response to the operation of the operation button 403. Specifically, for example, the LED lamp 404 is turned on when the operation button 403 is turned on and turned off when the operation button 403 is turned off. According to the LED lamp 404, the user of the terminal Ti can check the ON / OFF state of the operation button 403.

ビーコン送信部４０５は、操作ボタン４０３の操作に応じて、ビーコン信号（無線信号）を送信する。具体的には、例えば、ビーコン送信部４０５は、バス４００に出力されたデジタル信号をアナログ信号に変換してアンテナに出力する信号処理部と、信号処理部から出力された無線信号を送信するアンテナと、を有する。ビーコン送信部４０５は、例えば、ＢＬＥ（ＢｌｕｅｔｏｏｔｈＬｏｗＥｎｅｒｇｙ）通信により、ビーコン信号を送信する。Ｂｌｕｅｔｏｏｔｈは、登録商標である。 The beacon transmission unit 405 transmits a beacon signal (wireless signal) in response to the operation of the operation button 403. Specifically, for example, the beacon transmission unit 405 includes a signal processing unit that converts a digital signal output to the bus 400 into an analog signal and outputs it to an antenna, and the antenna, which transmits the radio signal output from the signal processing unit. The beacon transmission unit 405 transmits the beacon signal by, for example, BLE (Bluetooth Low Energy) communication. Bluetooth is a registered trademark.

より詳細に説明すると、例えば、ビーコン送信部４０５は、操作ボタン４０３がＯＮになると、端末Ｔｉの端末ＩＤ「Ｔｉ」と、操作種別「ＯＮ」と、を含むビーコン信号ｂｓを送信する。操作種別「ＯＮ」は、操作ボタン４０３をＯＮにする操作を示す。また、ビーコン送信部４０５は、操作ボタン４０３がＯＦＦになると、端末Ｔｉの端末ＩＤ「Ｔｉ」と、操作種別「ＯＦＦ」と、を含むビーコン信号ｂｓを送信する。操作種別「ＯＦＦ」は、操作ボタン４０３をＯＦＦにする操作を示す。ビーコン信号ｂｓには、例えば、操作ボタン４０３が操作された日時を示す情報が含まれていてもよい。 More specifically, for example, when the operation button 403 is turned on, the beacon transmission unit 405 transmits a beacon signal bs including the terminal ID “Ti” of the terminal Ti and the operation type “ON”. The operation type “ON” indicates an operation for turning on the operation button 403. Further, when the operation button 403 is turned off, the beacon transmission unit 405 transmits a beacon signal bs including the terminal ID “Ti” of the terminal Ti and the operation type “OFF”. The operation type “OFF” indicates an operation for turning off the operation button 403. The beacon signal bs may include information indicating the date and time when the operation button 403 is operated, for example.
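
The content of the beacon signal bs described above can be sketched as a small record. The specification only states that the signal carries the terminal ID, the operation type, and optionally the date and time of the operation; the field names, the JSON encoding, and the helper functions `encode` and `decode` below are illustrative assumptions, not the actual BLE payload format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class BeaconSignal:
    """Hypothetical payload of beacon signal bs (field names are assumptions)."""
    terminal_id: str                     # terminal ID, e.g. "T1"
    operation: str                       # operation type: "ON" or "OFF"
    operated_at: Optional[float] = None  # optional time of the button operation

    def __post_init__(self) -> None:
        if self.operation not in ("ON", "OFF"):
            raise ValueError(f"unknown operation type: {self.operation!r}")


def encode(bs: BeaconSignal) -> bytes:
    """Serialize the payload; JSON is used here only for readability."""
    return json.dumps(asdict(bs), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> BeaconSignal:
    """Rebuild the payload on the receiving side."""
    return BeaconSignal(**json.loads(raw.decode("utf-8")))
```

Pressing the button would correspond to sending `encode(BeaconSignal("T1", "ON"))`, and releasing it to the same call with `"OFF"`.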

なお、端末Ｔｉは、例えば、一次電池または二次電池により駆動する。また、端末Ｔｉは、上述した構成部のほかに、例えば、通信回線を通じてネットワーク２１０（図２参照）に接続される公衆網Ｉ／Ｆや、ＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ）ユニットなどを有することにしてもよい。端末ＴｉがＧＰＳユニットを有する場合、ビーコン送信部４０５は、ＧＰＳユニットにより出力される位置情報を含むビーコン信号ｂｓを送信することにしてもよい。 The terminal Ti is driven by, for example, a primary battery or a secondary battery. In addition to the components described above, the terminal Ti may include, for example, a public network I/F connected to the network 210 (see FIG. 2) through a communication line, a GPS (Global Positioning System) unit, and the like. When the terminal Ti has a GPS unit, the beacon transmission unit 405 may transmit a beacon signal bs that includes the position information output by the GPS unit.

（発話区間テーブル２２０の記憶内容）
(Stored contents of the utterance section table 220)
つぎに、情報処理装置１０１が有する発話区間テーブル２２０の記憶内容について説明する。発話区間テーブル２２０は、例えば、図３に示したメモリ３０２、ディスク３０６などの記憶装置により実現される。 Next, the contents stored in the utterance section table 220 of the information processing apparatus 101 will be described. The utterance section table 220 is realized by a storage device such as the memory 302 or the disk 306 shown in FIG. 3.

図５は、発話区間テーブル２２０の記憶内容の一例を示す説明図である。図５において、発話区間テーブル２２０は、端末ＩＤ、方向、ＯＮ時刻およびＯＦＦ時刻のフィールドを有し、各フィールドに情報を設定することで、発話区間情報（例えば、発話区間情報５００−１，５００−２）をレコードとして記憶する。 FIG. 5 is an explanatory diagram showing an example of the contents stored in the utterance section table 220. In FIG. 5, the utterance section table 220 has fields for terminal ID, direction, ON time, and OFF time. Setting information in each field stores utterance section information (for example, utterance section information 500-1 and 500-2) as records.

端末ＩＤは、端末Ｔｉを一意に識別する識別情報である。端末ＩＤとしては、例えば、端末ＴｉのＭＡＣ（ＭｅｄｉａＡｃｃｅｓｓＣｏｎｔｒｏｌ）アドレスを用いることができる。なお、端末ＩＤフィールドには、例えば、情報処理装置１０１とペアリングされた端末Ｔｉの端末ＩＤが予め設定されることにしてもよい。 The terminal ID is identification information that uniquely identifies the terminal Ti. As the terminal ID, for example, the MAC (Media Access Control) address of the terminal Ti can be used. In the terminal ID field, for example, the terminal ID of the terminal Ti paired with the information processing apparatus 101 may be set in advance.

方向は、マイクロフォンＭ１，Ｍ２から見た端末Ｔｉの方向である。方向は、例えば、同一水平面（空間Ｒを上方から見た座標系）において、マイクロフォンＭ１，Ｍ２間の中点を通る軸（例えば、マイクロフォンＭ１，Ｍ２間を結ぶ線分に直交する軸）と、マイクロフォンＭ１，Ｍ２間の中点から端末Ｔｉに向かうベクトルとのなす角度θによって表現される。 The direction is the direction of the terminal Ti viewed from the microphones M1 and M2. The direction is, for example, an axis passing through the midpoint between the microphones M1 and M2 (for example, an axis orthogonal to a line segment connecting the microphones M1 and M2) on the same horizontal plane (coordinate system when the space R is viewed from above), This is expressed by an angle θ formed by a vector from the midpoint between the microphones M1 and M2 toward the terminal Ti.

ＯＮ時刻は、端末Ｔｉの操作ボタン４０３（図４参照）をＯＮにする操作が行われた日時を示す。ＯＦＦ時刻は、端末Ｔｉの操作ボタン４０３をＯＦＦにする操作が行われた日時を示す。例えば、発話区間情報５００−１は、端末ＩＤ「Ｔ１」、方向「θ１」、ＯＮ時刻「ｔ₁₁」およびＯＦＦ時刻「ｔ₁₂」を示す。 The ON time indicates the date and time when the operation of turning on the operation button 403 (see FIG. 4) of the terminal Ti was performed. The OFF time indicates the date and time when the operation of turning off the operation button 403 of the terminal Ti was performed. For example, the utterance section information 500-1 indicates the terminal ID “T1”, the direction “θ1”, the ON time “t₁₁”, and the OFF time “t₁₂”.

（情報処理装置１０１の機能的構成例）
(Functional configuration example of the information processing apparatus 101)
図６は、情報処理装置１０１の機能的構成例を示すブロック図である。図６において、情報処理装置１０１は、取得部６０１と、特定部６０２と、抽出部６０３と、音声認識部６０４と、出力部６０５と、を含む構成である。取得部６０１〜出力部６０５は制御部となる機能であり、具体的には、例えば、図３に示したメモリ３０２、ディスク３０６などの記憶装置に記憶されたプログラムをＣＰＵ３０１に実行させることにより、または、Ｉ／Ｆ３０３、ビーコン受信部３０４により、その機能を実現する。各機能部の処理結果は、例えば、メモリ３０２、ディスク３０６の記憶装置に記憶される。 FIG. 6 is a block diagram illustrating a functional configuration example of the information processing apparatus 101. In FIG. 6, the information processing apparatus 101 includes an acquisition unit 601, a specifying unit 602, an extraction unit 603, a voice recognition unit 604, and an output unit 605. The acquisition unit 601 through the output unit 605 function as a control unit. Specifically, for example, their functions are realized by causing the CPU 301 to execute a program stored in a storage device such as the memory 302 or the disk 306 illustrated in FIG. 3, or by the I/F 303 and the beacon receiving unit 304. The processing result of each functional unit is stored in a storage device such as the memory 302 or the disk 306, for example.

取得部６０１は、端末Ｔｉへの操作に応じて端末Ｔｉから送信される情報を取得する。ここで、端末Ｔｉから送信される情報は、例えば、端末Ｔｉの操作ボタン４０３の操作に応じて端末Ｔｉのビーコン送信部４０５から送信されるビーコン信号ｂｓである。ビーコン信号ｂｓには、例えば、端末Ｔｉの端末ＩＤと、操作種別とが含まれる。操作種別は、ビーコン信号ｂｓの送信の契機となった操作の種別を示す情報である。操作種別としては、例えば、操作種別「ＯＮ」と操作種別「ＯＦＦ」とがある。操作種別「ＯＮ」は、操作ボタン４０３をＯＮにする操作を示す。操作種別「ＯＦＦ」は、操作ボタン４０３をＯＦＦにする操作を示す。 The acquisition unit 601 acquires information transmitted from the terminal Ti in response to an operation on the terminal Ti. Here, the information transmitted from the terminal Ti is, for example, the beacon signal bs transmitted from the beacon transmission unit 405 of the terminal Ti in response to the operation of the operation button 403 of the terminal Ti. The beacon signal bs includes, for example, the terminal ID of the terminal Ti and the operation type. The operation type is information indicating the type of operation that triggered the transmission of the beacon signal bs. As the operation type, for example, there are an operation type “ON” and an operation type “OFF”. The operation type “ON” indicates an operation for turning on the operation button 403. The operation type “OFF” indicates an operation for turning off the operation button 403.

具体的には、例えば、取得部６０１は、図３に示したビーコン受信部３０４により、端末Ｔｉから送信されるビーコン信号ｂｓを受信することにより、ビーコン信号ｂｓを取得することにしてもよい。また、取得部６０１は、図３に示したＩ／Ｆ３０３により、端末Ｔｉから送信されるビーコン信号ｂｓをビーコン受信機２０１から受信することにより、ビーコン信号ｂｓを取得することにしてもよい。ビーコン受信機２０１から受信されるビーコン信号ｂｓには、ビーコン受信機２０１において測定されたビーコン信号ｂｓの受信信号強度（例えば、ＲＳＳＩ値）が含まれる。 Specifically, for example, the acquisition unit 601 may acquire the beacon signal bs by receiving the beacon signal bs transmitted from the terminal Ti with the beacon receiving unit 304 illustrated in FIG. 3. Alternatively, the acquisition unit 601 may acquire the beacon signal bs by receiving, through the I/F 303 illustrated in FIG. 3, the beacon signal bs transmitted from the terminal Ti and forwarded by the beacon receiver 201. The beacon signal bs received from the beacon receiver 201 includes the received signal strength (for example, the RSSI value) of the beacon signal bs measured by the beacon receiver 201.

特定部６０２は、取得部６０１によって取得される端末Ｔｉへの操作に応じて端末Ｔｉから送信される情報に基づいて、端末Ｔｉのユーザが発話した期間Ｐ（以下、「発話区間Ｐ」と称する）を特定する。具体的には、例えば、特定部６０２は、操作種別「ＯＮ」を含むビーコン信号ｂｓが取得されたことに応じて、ビーコン信号ｂｓに含まれる端末ＩＤと対応付けて、ビーコン信号ｂｓが取得（受信）された時刻をＯＮ時刻として、発話区間テーブル２２０（図５参照）に登録する。 Based on the information acquired by the acquisition unit 601, that is, the information transmitted from the terminal Ti in response to an operation on the terminal Ti, the specifying unit 602 specifies the period P during which the user of the terminal Ti spoke (hereinafter referred to as the “utterance section P”). Specifically, for example, when a beacon signal bs including the operation type “ON” is acquired, the specifying unit 602 registers the time at which the beacon signal bs was acquired (received) in the utterance section table 220 (see FIG. 5) as the ON time, in association with the terminal ID included in the beacon signal bs.

例えば、端末ＩＤ「Ｔ１」と操作種別「ＯＮ」とを含むビーコン信号ｂｓが取得された場合、特定部６０２は、端末ＩＤ「Ｔ１」と対応付けて、ビーコン信号ｂｓが受信された時刻ｔ₁₁をＯＮ時刻として、発話区間テーブル２２０に登録する。これにより、発話区間情報５００−１が新たなレコードとして発話区間テーブル２２０に登録される。この時点では、発話区間情報５００−１の方向およびＯＦＦ時刻フィールドは「Ｎｕｌｌ」である。 For example, when a beacon signal bs including the terminal ID “T1” and the operation type “ON” is acquired, the specifying unit 602 registers the time t₁₁ at which the beacon signal bs was received in the utterance section table 220 as the ON time, in association with the terminal ID “T1”. As a result, the utterance section information 500-1 is registered in the utterance section table 220 as a new record. At this point, the direction field and the OFF time field of the utterance section information 500-1 are “Null”.

つぎに、特定部６０２は、操作種別「ＯＦＦ」を含むビーコン信号ｂｓが取得されたことに応じて、ビーコン信号ｂｓに含まれる端末ＩＤに対応する発話区間情報のＯＦＦ時刻として、ビーコン信号ｂｓが取得（受信）された時刻を設定する。例えば、端末ＩＤ「Ｔ１」と操作種別「ＯＦＦ」とを含むビーコン信号ｂｓが取得された場合、特定部６０２は、端末ＩＤ「Ｔ１」に対応する発話区間情報５００−１のＯＦＦ時刻フィールドに、ビーコン信号ｂｓが受信された時刻ｔ₁₂を設定する。 Next, when a beacon signal bs including the operation type “OFF” is acquired, the specifying unit 602 sets the time at which the beacon signal bs was acquired (received) as the OFF time of the utterance section information corresponding to the terminal ID included in the beacon signal bs. For example, when a beacon signal bs including the terminal ID “T1” and the operation type “OFF” is acquired, the specifying unit 602 sets the time t₁₂ at which the beacon signal bs was received in the OFF time field of the utterance section information 500-1 corresponding to the terminal ID “T1”.

これにより、特定部６０２は、発話区間情報５００−１のＯＮ時刻ｔ₁₁からＯＦＦ時刻ｔ₁₂までの期間を、端末Ｔ１のユーザが発話した発話区間Ｐとして特定することができる。 Accordingly, the specifying unit 602 can specify the period from the ON time t₁₁ to the OFF time t₁₂ of the utterance section information 500-1 as the utterance section P in which the user of the terminal T1 spoke.

なお、情報処理装置１０１は、ビーコン受信機２０１を経由してビーコン信号ｂｓを受信する場合、複数のビーコン受信機２０１からほぼ同時に、同一の端末ＩＤおよび操作種別を含むビーコン信号ｂｓを受信することになる。この場合、特定部６０２は、例えば、最初に取得されたビーコン信号ｂｓに応じて、発話区間テーブル２２０への登録を行うことにしてもよい。 When the information processing apparatus 101 receives the beacon signal bs via the beacon receivers 201, it receives beacon signals bs including the same terminal ID and operation type from the plurality of beacon receivers 201 at almost the same time. In this case, the specifying unit 602 may, for example, perform the registration in the utterance section table 220 in response to the beacon signal bs acquired first.
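
The registration steps above (ON time on the first “ON” beacon, OFF time on the first “OFF” beacon, later copies from other receivers ignored) can be sketched as follows. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, only one record per terminal is kept, and the 1-second duplicate window is an illustrative choice rather than a value given for this step.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class UtteranceSection:
    """One record of the utterance section table (field names are assumptions)."""
    terminal_id: str
    direction: Optional[float] = None  # theta; stays None ("Null") until identified
    on_time: Optional[float] = None
    off_time: Optional[float] = None


class UtteranceSectionTable:
    """Simplified table holding the latest record per terminal (an assumption)."""

    def __init__(self, dedup_window: float = 1.0) -> None:
        self.records: Dict[str, UtteranceSection] = {}
        self._last_seen: Dict[Tuple[str, str], float] = {}
        self.dedup_window = dedup_window  # seconds; illustrative value

    def on_beacon(self, terminal_id: str, operation: str, received_at: float) -> bool:
        """Register a beacon; return False when it is ignored."""
        if operation not in ("ON", "OFF"):
            raise ValueError(f"unknown operation type: {operation!r}")
        key = (terminal_id, operation)
        last = self._last_seen.get(key)
        if last is not None and received_at - last < self.dedup_window:
            return False  # same event relayed by another beacon receiver
        self._last_seen[key] = received_at
        if operation == "ON":
            # New record: direction and OFF time are still unset.
            self.records[terminal_id] = UtteranceSection(terminal_id, on_time=received_at)
            return True
        record = self.records.get(terminal_id)
        if record is None:
            return False  # OFF without a preceding ON is ignored in this sketch
        record.off_time = received_at
        return True
```

After an ON beacon at `t11` and an OFF beacon at `t12`, the pair `(record.on_time, record.off_time)` corresponds to the utterance section P.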

また、特定部６０２は、取得部６０１によって取得される端末Ｔｉへの操作に応じて端末Ｔｉから送信される情報に基づいて、マイクロフォンＭ１，Ｍ２から端末Ｔｉへの方向θを特定する。具体的には、例えば、特定部６０２は、複数のビーコン受信機２０１から受信される端末Ｔｉの端末ＩＤを含むビーコン信号ｂｓに基づいて、端末Ｔｉの位置を推定する。 Further, the specifying unit 602 specifies the direction θ from the microphones M1 and M2 to the terminal Ti based on information transmitted from the terminal Ti in response to an operation on the terminal Ti acquired by the acquiring unit 601. Specifically, for example, the specifying unit 602 estimates the position of the terminal Ti based on the beacon signal bs including the terminal ID of the terminal Ti received from the plurality of beacon receivers 201.

より具体的には、例えば、特定部６０２は、ビーコン受信機２０１からビーコン信号ｂｓが受信されると、当該ビーコン信号ｂｓと同一の端末ＩＤおよび操作種別を含むビーコン信号ｂｓを、他のビーコン受信機２０１から受信するのを一定時間待つ。一定時間は、例えば、１秒程度である。つぎに、特定部６０２は、各ビーコン受信機２０１から受信されたビーコン信号ｂｓのＲＳＳＩ値と、各ビーコン受信機２０１の設置位置とに基づいて、空間Ｒにおける端末Ｔｉの位置を推定する。各ビーコン受信機２０１の設置位置を示す情報は、例えば、メモリ３０２、ディスク３０６などの記憶装置に記憶されている。 More specifically, for example, when a beacon signal bs is received from a beacon receiver 201, the specifying unit 602 waits for a certain time to receive, from the other beacon receivers 201, beacon signals bs including the same terminal ID and operation type as that beacon signal bs. The certain time is, for example, about 1 second. Next, the specifying unit 602 estimates the position of the terminal Ti in the space R based on the RSSI value of the beacon signal bs received from each beacon receiver 201 and the installation position of each beacon receiver 201. Information indicating the installation position of each beacon receiver 201 is stored in a storage device such as the memory 302 or the disk 306, for example.

なお、端末Ｔｉのビーコン信号を各ビーコン受信機２０１が受信したときの受信信号強度（ＲＳＳＩ値）から端末Ｔｉの位置を推定する技術は周知技術のため、ここでは説明を省略する。また、端末Ｔｉの位置をビーコン信号の受信信号強度から推定する技術として、既存の如何なる技術を用いることにしてもよい。また、端末ＴｉがＧＰＳユニットを有する場合、情報処理装置１０１は、端末ＴｉからＧＰＳユニットにより出力される位置情報を受信することにしてもよい。この場合、特定部６０２は、端末Ｔｉから受信される位置情報をもとに端末Ｔｉの位置を推定することにしてもよい。 Techniques for estimating the position of the terminal Ti from the received signal strength (RSSI value) observed when each beacon receiver 201 receives the beacon signal of the terminal Ti are well known, so their description is omitted here. Any existing technique may be used to estimate the position of the terminal Ti from the received signal strength of the beacon signal. When the terminal Ti has a GPS unit, the information processing apparatus 101 may receive, from the terminal Ti, the position information output by the GPS unit. In this case, the specifying unit 602 may estimate the position of the terminal Ti based on the position information received from the terminal Ti.
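
Since the text leaves the RSSI-based estimation technique open, the following is only one possible sketch: a weighted centroid of the receiver positions, with weights derived from a log-distance path-loss model. The technique itself, the function names, the reference power of -59 dBm at 1 m, and the path-loss exponent of 2.0 are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def rssi_to_distance(rssi: float, tx_power: float = -59.0, n: float = 2.0) -> float:
    """Log-distance path-loss model: distance in metres for a given RSSI in dBm.

    tx_power (RSSI at 1 m) and n (path-loss exponent) are assumed values.
    """
    return 10.0 ** ((tx_power - rssi) / (10.0 * n))


def estimate_position(rssi_by_receiver: Dict[str, float],
                      receiver_positions: Dict[str, Point]) -> Point:
    """Weighted centroid of receiver positions, weighting nearer receivers more."""
    if not rssi_by_receiver:
        raise ValueError("at least one RSSI reading is required")
    total = x = y = 0.0
    for receiver_id, rssi in rssi_by_receiver.items():
        weight = 1.0 / rssi_to_distance(rssi)
        px, py = receiver_positions[receiver_id]
        x += weight * px
        y += weight * py
        total += weight
    return (x / total, y / total)
```

With four receivers at the corners of a room and equal readings, the estimate falls at the centre of the room; a stronger reading at one receiver pulls the estimate toward it.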

そして、特定部６０２は、推定した端末Ｔｉの位置と、マイクロフォンＭ１，Ｍ２（収音部３０７）の設置位置とに基づいて、マイクロフォンＭ１，Ｍ２から端末Ｔｉへの方向θを特定する。方向θは、例えば、同一水平面において、マイクロフォンＭ１，Ｍ２間を結ぶ線分の中点を通る当該線分に直交する軸と、マイクロフォンＭ１，Ｍ２間の中点から端末Ｔｉに向かうベクトルとのなす角度に相当する。すなわち、方向θは、マイクロフォンＭ１，Ｍ２から見た端末Ｔｉのユーザの方向に相当する。マイクロフォンＭ１，Ｍ２（収音部３０７）の設置位置を示す情報は、例えば、メモリ３０２、ディスク３０６などの記憶装置に記憶されている。 Then, the specifying unit 602 specifies the direction θ from the microphones M1 and M2 to the terminal Ti based on the estimated position of the terminal Ti and the installation positions of the microphones M1 and M2 (sound collection unit 307). The direction θ is, for example, an axis orthogonal to the line segment passing through the midpoint of the line segment connecting the microphones M1 and M2 on the same horizontal plane and a vector from the midpoint between the microphones M1 and M2 toward the terminal Ti. Corresponds to the angle. That is, the direction θ corresponds to the direction of the user of the terminal Ti viewed from the microphones M1 and M2. Information indicating the installation positions of the microphones M1 and M2 (sound collection unit 307) is stored in a storage device such as the memory 302 and the disk 306, for example.

特定された方向θは、ビーコン信号ｂｓに含まれる端末ＩＤと対応付けて、発話区間テーブル２２０（図５参照）に登録される。例えば、マイクロフォンＭ１，Ｍ２から端末Ｔ１への方向θ１が特定された場合、特定部６０２は、端末ＩＤ「Ｔ１」に対応する発話区間情報５００−１の方向フィールドに、方向θ１を設定する。 The identified direction θ is registered in the utterance section table 220 (see FIG. 5) in association with the terminal ID included in the beacon signal bs. For example, when the direction θ1 from the microphones M1 and M2 to the terminal T1 is specified, the specifying unit 602 sets the direction θ1 in the direction field of the speech section information 500-1 corresponding to the terminal ID “T1”.
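
The angle θ can be computed from the estimated terminal position and the two microphone positions in the horizontal plane. The sketch below assumes 2D coordinates, a reference axis obtained by rotating the M1-to-M2 vector 90 degrees counterclockwise, and a signed angle that is positive counterclockwise; the function name and these conventions are illustrative assumptions, since the text defines only the unsigned angle between the axis and the vector.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def direction_theta(m1: Point, m2: Point, terminal: Point) -> float:
    """Signed angle in degrees between the reference axis and the midpoint-to-terminal vector."""
    cx, cy = (m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0  # midpoint of M1 and M2
    dx, dy = m2[0] - m1[0], m2[1] - m1[1]
    if dx == 0.0 and dy == 0.0:
        raise ValueError("microphones must be at different positions")
    # Reference axis: the M1->M2 vector rotated 90 degrees counterclockwise (assumed orientation).
    nx, ny = -dy, dx
    vx, vy = terminal[0] - cx, terminal[1] - cy
    cross = nx * vy - ny * vx
    dot = nx * vx + ny * vy
    return math.degrees(math.atan2(cross, dot))
```

For microphones on the right-hand wall at `(0, -0.05)` and `(0, 0.05)`, the axis points into the room (negative x), and a terminal straight ahead gives an angle of 0.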

抽出部６０３は、バッファＢ１，Ｂ２に記憶されたマイクロフォンＭ１，Ｍ２ごとの音に関する情報のうちの発話区間ＰのマイクロフォンＭ１，Ｍ２ごとの音に関する情報と、方向θとに基づいて、端末Ｔｉのユーザが発話した音に関する情報を抽出する。ここで、音に関する情報は、例えば、音圧の時系列変化を示す時系列データ（デジタル信号）である。 Based on the information about the sound for each of the microphones M1, M2 in the utterance section P among the information about the sound for each of the microphones M1, M2 stored in the buffers B1, B2, the extraction unit 603 is based on the direction θ and the terminal Ti. Information on the sound uttered by the user is extracted. Here, the information regarding sound is, for example, time-series data (digital signal) indicating a time-series change in sound pressure.

具体的には、例えば、まず、抽出部６０３は、発話区間テーブル２２０を参照して、端末Ｔｉの発話区間Ｐを特定する。つぎに、抽出部６０３は、各バッファＢ１，Ｂ２から、特定した発話区間ＰのマイクロフォンＭ１，Ｍ２ごとの音に関する情報を読み出す。つぎに、抽出部６０３は、発話区間テーブル２２０を参照して、端末Ｔｉの方向θを特定する。そして、抽出部６０３は、読み出した発話区間Ｐの各マイクロフォンＭ１，Ｍ２の音に関する情報に基づいて、特定した方向θへのビームフォーム処理を行うことにより、端末Ｔｉのユーザが発話した音に関する情報を抽出（生成）する。 Specifically, for example, the extraction unit 603 first refers to the utterance section table 220 to identify the utterance section P of the terminal Ti. Next, the extraction unit 603 reads, from each of the buffers B1 and B2, the information about the sound of each of the microphones M1 and M2 in the identified utterance section P. Next, the extraction unit 603 refers to the utterance section table 220 to identify the direction θ of the terminal Ti. Then, based on the read information about the sound of each of the microphones M1 and M2 in the utterance section P, the extraction unit 603 performs beamforming processing toward the identified direction θ, thereby extracting (generating) the information about the sound uttered by the user of the terminal Ti.

一例として、図５に示した発話区間テーブル２２０内の発話区間情報５００−１を例に挙げると、まず、抽出部６０３は、発話区間情報５００−１を参照して、端末Ｔ１の発話区間Ｐ（ｔ₁₁〜ｔ₁₂）を特定する。つぎに、抽出部６０３は、各バッファＢ１，Ｂ２から、特定した発話区間ＰのマイクロフォンＭ１，Ｍ２ごとの音に関する情報を読み出す。つぎに、抽出部６０３は、発話区間テーブル２２０を参照して、端末Ｔ１の方向θ１を特定する。そして、抽出部６０３は、発話区間Ｐの各マイクロフォンＭ１，Ｍ２の音に関する情報に基づいて、方向θ１へのビームフォーム処理を行うことにより、端末Ｔ１のユーザが発話した音に関する情報を抽出する。 Taking the utterance section information 500-1 in the utterance section table 220 shown in FIG. 5 as an example, the extraction unit 603 first refers to the utterance section information 500-1 to identify the utterance section P (t₁₁ to t₁₂) of the terminal T1. Next, the extraction unit 603 reads, from each of the buffers B1 and B2, the information about the sound of each of the microphones M1 and M2 in the identified utterance section P. Next, the extraction unit 603 refers to the utterance section table 220 to identify the direction θ1 of the terminal T1. Then, based on the information about the sound of each of the microphones M1 and M2 in the utterance section P, the extraction unit 603 performs beamforming processing toward the direction θ1, thereby extracting the information about the sound uttered by the user of the terminal T1.
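
The text does not specify which beamforming algorithm is used. As one common option, a two-microphone delay-and-sum beamformer is sketched below. It assumes a far-field source, integer-sample delays, θ measured from the axis perpendicular to the microphone pair with positive θ meaning the source is closer to M2, and hypothetical function names; the microphone spacing and sample rate are parameters the caller must supply.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def steering_delay_samples(theta_deg: float, mic_spacing: float, sample_rate: float) -> int:
    """Samples by which the sound reaches M1 later than M2 for a source at theta."""
    tau = mic_spacing * math.sin(math.radians(theta_deg)) / SPEED_OF_SOUND
    return round(tau * sample_rate)


def shift(signal: Sequence[float], k: int) -> List[float]:
    """Delay a signal by k >= 0 samples, zero-padding the start and keeping the length."""
    keep = max(len(signal) - k, 0)
    return [0.0] * (len(signal) - keep) + list(signal[:keep])


def delay_and_sum(s1: Sequence[float], s2: Sequence[float], theta_deg: float,
                  mic_spacing: float, sample_rate: float) -> List[float]:
    """Align the two channels for direction theta and average them."""
    if len(s1) != len(s2):
        raise ValueError("both channels must cover the same utterance section")
    k = steering_delay_samples(theta_deg, mic_spacing, sample_rate)
    aligned1 = shift(s1, max(-k, 0))  # delay M1 when the source is closer to M1
    aligned2 = shift(s2, max(k, 0))   # delay M2 when the source is closer to M2
    return [(a + b) / 2.0 for a, b in zip(aligned1, aligned2)]
```

Sound arriving from the steered direction adds coherently, while sound from other directions is averaged out of phase and attenuated. In this sketch, `s1` and `s2` would be the samples read from buffers B1 and B2 for the utterance section P.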

なお、情報処理装置１０１は、抽出部６０３によって抽出された端末Ｔ１のユーザが発話した音に関する情報に対してノイズキャンセリング処理を施すことにしてもよい。ノイズキャンセリング処理は、例えば、環境ノイズや反射音の影響を除去する処理である。 Note that the information processing apparatus 101 may perform a noise canceling process on information regarding the sound uttered by the user of the terminal T1 extracted by the extraction unit 603. The noise canceling process is a process for removing the influence of environmental noise and reflected sound, for example.

出力部６０５は、抽出部６０３によって抽出された端末Ｔｉのユーザが発話した音に関する情報を、端末Ｔｉの端末ＩＤと対応付けて出力する。出力部６０５の出力形式としては、例えば、メモリ３０２、ディスク３０６などの記憶装置への記憶、Ｉ／Ｆ３０３による外部装置への送信、不図示のディスプレイへの表示などがある。 The output unit 605 outputs information on the sound uttered by the user of the terminal Ti extracted by the extraction unit 603 in association with the terminal ID of the terminal Ti. Examples of the output format of the output unit 605 include storage in a storage device such as the memory 302 and the disk 306, transmission to an external device by the I / F 303, display on a display (not shown), and the like.

具体的には、例えば、出力部６０５は、端末Ｔｉの端末ＩＤと対応付けて、端末Ｔｉのユーザが発話した音に関する情報を、音声認識処理を実行する外部装置に送信することにしてもよい。この結果、外部装置において、端末Ｔｉのユーザが発話した音に関する情報に対する音声認識処理が実行される。 Specifically, for example, the output unit 605 may transmit information related to the sound uttered by the user of the terminal Ti in association with the terminal ID of the terminal Ti to the external device that executes the voice recognition process. . As a result, in the external device, a speech recognition process is performed on information related to the sound uttered by the user of the terminal Ti.

音声認識部６０４は、抽出部６０３によって抽出された端末Ｔｉのユーザが発話した音に関する情報を音声認識する。ここで、音声認識とは、例えば、ユーザが発話した音に関する情報を文字列に変換する処理である。なお、音声認識技術として、既存の如何なる技術を用いることにしてもよい。 The voice recognition unit 604 performs voice recognition on the information related to the sound uttered by the user of the terminal Ti extracted by the extraction unit 603. Here, the speech recognition is, for example, a process of converting information about the sound uttered by the user into a character string. Note that any existing technology may be used as the speech recognition technology.

具体的には、例えば、音声認識部６０４は、音響モデル、言語モデル、単語辞書等を用いて、端末Ｔｉのユーザが発話した音に関する情報を音声認識する。音響モデルは、認識対象の音素がどのような周波数特性を持っているかを表したものである。言語モデルは、音素の並び方に関する制約を表したものである。 Specifically, for example, the speech recognition unit 604 recognizes information related to sound uttered by the user of the terminal Ti using an acoustic model, a language model, a word dictionary, and the like. The acoustic model represents what frequency characteristic the phoneme to be recognized has. The language model expresses restrictions on how phonemes are arranged.

また、出力部６０５は、音声認識部６０４によって音声認識された認識結果を、端末Ｔｉの端末ＩＤと対応付けて出力する。具体的には、例えば、出力部６０５は、端末Ｔｉの端末ＩＤと対応付けて、音声認識部６０４によって音声認識された認識結果をディスク３０６に蓄積することにしてもよい。また、出力部６０５は、音声認識部６０４によって音声認識された認識結果を、音声入力可能な電子機器に送信することにしてもよい。 The output unit 605 outputs the recognition result recognized by the voice recognition unit 604 in association with the terminal ID of the terminal Ti. Specifically, for example, the output unit 605 may store the recognition result recognized by the voice recognition unit 604 in the disk 306 in association with the terminal ID of the terminal Ti. Further, the output unit 605 may transmit the recognition result recognized by the voice recognition unit 604 to an electronic device capable of voice input.

なお、情報処理装置１０１は、音声認識部６０４を有さないことにしてもよい。また、端末Ｔｉは、操作ボタン４０３の操作に応じて、不図示の公衆網Ｉ／Ｆにより、端末Ｔｉの端末ＩＤと操作種別とを含むイベント情報を情報処理装置１０１に送信することにしてもよい。この場合、特定部６０２は、端末Ｔｉへの操作に応じて端末Ｔｉから送信されるイベント情報に基づいて、端末Ｔｉのユーザが発話した発話区間Ｐを特定することにしてもよい。また、情報処理装置１０１の機能部は、例えば、ハードウェアにより実現されてもよい。具体的には、例えば、機能部は、論理積回路であるＡＮＤ、否定論理回路であるＩＮＶＥＲＴＥＲ、論理和回路であるＯＲ、論理和否定回路であるＮＯＲや、ラッチ回路であるＦＦ（ＦｌｉｐＦｌｏｐ）などの素子によって形成されてもよい。 Note that the information processing apparatus 101 need not include the voice recognition unit 604. In addition, the terminal Ti may transmit event information including the terminal ID of the terminal Ti and the operation type to the information processing apparatus 101 through a public network I/F (not illustrated) in response to the operation of the operation button 403. In this case, the specifying unit 602 may specify the utterance section P in which the user of the terminal Ti spoke based on the event information transmitted from the terminal Ti in response to the operation on the terminal Ti. Further, the functional units of the information processing apparatus 101 may be realized by hardware, for example. Specifically, for example, the functional units may be formed by elements such as AND (a logical product circuit), INVERTER (a negation circuit), OR (a logical sum circuit), NOR (a negated logical sum circuit), and FF (Flip Flop, a latch circuit).

（端末Ｔｉのユーザが発話した音に関する情報の抽出例）
(Extraction example of information about sound uttered by user of terminal Ti)
つぎに、図７および図８を用いて、端末Ｔｉのユーザが発話した音に関する情報の抽出例について説明する。 Next, an example of extracting information related to the sound uttered by the user of the terminal Ti will be described with reference to FIGS. 7 and 8.

図７は、複数のマイクロフォンＭとユーザとの位置関係の一例を示す説明図（その１）である。図７において、空間Ｒ内に端末Ｔ１のユーザＡと、端末Ｔ２のユーザＢとが存在する。空間Ｒは、例えば、会議室やミーティングルームなどである。また、空間Ｒ内のそれぞれ異なる位置に４つのビーコン受信機２０１が設置され、空間Ｒの右側にマイクロフォンＭ１，Ｍ２が設置されている。 FIG. 7 is an explanatory diagram (part 1) illustrating an example of a positional relationship between a plurality of microphones M and a user. In FIG. 7, a user A of the terminal T1 and a user B of the terminal T2 exist in the space R. The space R is, for example, a conference room or a meeting room. Four beacon receivers 201 are installed at different positions in the space R, and microphones M1 and M2 are installed on the right side of the space R.

ここでは、端末Ｔ１のユーザＡと端末Ｔ２のユーザＢとが発話した場合を想定し、マイクロフォンＭ１，Ｍ２にそれぞれ入力された音に関する情報について説明する。また、マイクロフォンＭ１，Ｍ２から端末Ｔ１への方向θを「方向θ１」とし、マイクロフォンＭ１，Ｍ２から端末Ｔ２への方向θを「方向θ２」とする。すなわち、方向θ１は、マイクロフォンＭ１，Ｍ２から見た端末Ｔ１のユーザＡの方向に相当する。また、方向θ２は、マイクロフォンＭ１，Ｍ２から見た端末Ｔ２のユーザＢの方向に相当する。ただし、θ１は、基準軸（図７中、点線）から時計回りの角度を示し、θ２は、基準軸から反時計回りの角度を示す。 Here, assuming that the user A of the terminal T1 and the user B of the terminal T2 speak, information on sounds input to the microphones M1 and M2 will be described. Also, the direction θ from the microphones M1, M2 to the terminal T1 is “direction θ1,” and the direction θ from the microphones M1, M2 to the terminal T2 is “direction θ2.” That is, the direction θ1 corresponds to the direction of the user A of the terminal T1 viewed from the microphones M1 and M2. The direction θ2 corresponds to the direction of the user B of the terminal T2 as viewed from the microphones M1 and M2. However, θ1 indicates a clockwise angle from the reference axis (dotted line in FIG. 7), and θ2 indicates a counterclockwise angle from the reference axis.

図８は、バッファＢ１，Ｂ２に記憶された音に関する情報の具体例を示す説明図である。図８において、音情報８１０は、バッファＢ１に記憶された、マイクロフォンＭ１に入力された音に関する情報である。音情報８２０は、バッファＢ２に記憶された、マイクロフォンＭ２に入力された音に関する情報である。 FIG. 8 is an explanatory diagram showing a specific example of information related to sound stored in the buffers B1 and B2. In FIG. 8, sound information 810 is information related to the sound input to the microphone M1 stored in the buffer B1. The sound information 820 is information related to the sound input to the microphone M2 stored in the buffer B2.

ここでは、ユーザＡにより、時刻ｔ₁₁に端末Ｔ１の操作ボタン４０３をＯＮにする操作が行われ、時刻ｔ₁₂に端末Ｔ１の操作ボタン４０３をＯＦＦにする操作が行われた場合を想定する。また、ユーザＢにより、時刻ｔ₂₁に端末Ｔ２の操作ボタン４０３をＯＮにする操作が行われ、時刻ｔ₂₂に端末Ｔ２の操作ボタン４０３をＯＦＦにする操作が行われた場合を想定する。 Here, it is assumed that the user A turns on the operation button 403 of the terminal T1 at time t₁₁ and turns it off at time t₁₂. It is also assumed that the user B turns on the operation button 403 of the terminal T2 at time t₂₁ and turns it off at time t₂₂.

この場合、抽出部６０３は、各バッファＢ１，Ｂ２から、時刻ｔ₁₁から時刻ｔ₁₂までのマイクロフォンＭ１，Ｍ２ごとの音に関する情報を読み出す（図８中、点線枠８３０）。そして、抽出部６０３は、読み出した各マイクロフォンＭ１，Ｍ２の音に関する情報に基づいて、方向θ１（ユーザＡの方向）へのビームフォーム処理を行う。これにより、方向θ１から到来した信号を強調して、端末Ｔ１のユーザＡが発話した音に関する情報を抽出することができる。 In this case, the extraction unit 603 reads, from each of the buffers B1 and B2, the information about the sound of each of the microphones M1 and M2 from time t₁₁ to time t₁₂ (dotted frame 830 in FIG. 8). Then, the extraction unit 603 performs beamforming processing toward the direction θ1 (the direction of the user A) based on the read information about the sounds of the microphones M1 and M2. As a result, the signal arriving from the direction θ1 is emphasized, and the information about the sound uttered by the user A of the terminal T1 can be extracted.

また、抽出部６０３は、各バッファＢ１，Ｂ２から、時刻ｔ₂₁から時刻ｔ₂₂までのマイクロフォンＭ１，Ｍ２ごとの音に関する情報を読み出す（図８中、点線枠８４０）。そして、抽出部６０３は、読み出した各マイクロフォンＭ１，Ｍ２の音に関する情報に基づいて、方向θ２（ユーザＢの方向）へのビームフォーム処理を行う。これにより、方向θ２から到来した信号を強調して、端末Ｔ２のユーザＢが発話した音に関する情報を抽出することができる。 The extraction unit 603 also reads, from each of the buffers B1 and B2, the information about the sound of each of the microphones M1 and M2 from time t₂₁ to time t₂₂ (dotted frame 840 in FIG. 8). Then, the extraction unit 603 performs beamforming processing toward the direction θ2 (the direction of the user B) based on the read information about the sounds of the microphones M1 and M2. As a result, the signal arriving from the direction θ2 is emphasized, and the information about the sound uttered by the user B of the terminal T2 can be extracted.

ここでは、ユーザＡの発話区間Ｐ（時刻ｔ₁₁から時刻ｔ₁₂）とユーザＢの発話区間Ｐ（時刻ｔ₂₁から時刻ｔ₂₂）とが一部重なっている。すなわち、ユーザＡとユーザＢとが同時に会話した区間（点線枠８３０と点線枠８４０とが重なった部分）が存在する。このような場合であっても、情報処理装置１０１は、各バッファＢ１，Ｂ２に記憶された音に関する情報を用いて、各ユーザＡ，Ｂの方向θ１，θ２から到来した音声信号を強調した音情報をそれぞれ生成して、各ユーザＡ，Ｂが発話した音を抽出することができる。 Here, the utterance section P of user A (from time t ₁₁ to time t ₁₂ ) and the utterance section P of user B (from time t ₂₁ to time t ₂₂ ) partially overlap. That is, there is a section where the user A and the user B talk at the same time (a portion where the dotted frame 830 and the dotted frame 840 overlap). Even in such a case, the information processing apparatus 101 uses the information related to the sound stored in each of the buffers B1 and B2, and emphasizes the sound signal that has arrived from the directions θ1 and θ2 of the users A and B. Information can be generated to extract sounds uttered by the users A and B.

（高さ方向の音源認識）
(Sound source recognition in the height direction)
つぎに、図９を用いて、高さ方向の音源認識について説明する。音声認識に用いられる音響モデルには、大人用の音響モデルや子供用の音響モデルなど、音源となるユーザの成長度合いに応じたモデルが存在する。したがって、例えば、端末Ｔｉのユーザが大人であるか子供であるかを判別できれば、音声認識にどの音響モデルを用いるのがよいのかを判断することが可能となる。 Next, sound source recognition in the height direction will be described with reference to FIG. 9. The acoustic models used for speech recognition include models suited to the degree of growth of the user who is the sound source, such as an acoustic model for adults and an acoustic model for children. Therefore, for example, if it can be determined whether the user of the terminal Ti is an adult or a child, it becomes possible to determine which acoustic model should be used for speech recognition.

そこで、情報処理装置１０１は、上述した複数のマイクロフォンＭ（例えば、マイクロフォンＭ１，Ｍ２）とは別に、空間Ｒ内の高さが異なる位置に設置される複数のマイクロフォンＭ’を有することにしてもよい。複数のマイクロフォンＭ’それぞれに入力される音に関する情報は、例えば、メモリ３０２、ディスク３０６に記憶される。ただし、複数のマイクロフォンＭのうちのいずれかのマイクロフォンＭを、マイクロフォンＭ’の一つとして用いることにしてもよい。 Therefore, in addition to the plurality of microphones M described above (for example, the microphones M1 and M2), the information processing apparatus 101 may have a plurality of microphones M′ installed at positions of different heights in the space R. Information about the sound input to each of the plurality of microphones M′ is stored in, for example, the memory 302 or the disk 306. However, any one of the plurality of microphones M may also be used as one of the microphones M′.

図９は、高さ方向の音源認識の一例を示す説明図である。図９において、空間Ｒ内の壁９０１に、高さが異なるようにマイクロフォンＭ’ａ，Ｍ’ｂが設置されている。マイクロフォンＭ’ａ，Ｍ’ｂ間の距離は、例えば、数センチメートル〜数十センチメートル程度である。 FIG. 9 is an explanatory diagram illustrating an example of sound source recognition in the height direction. In FIG. 9, microphones M′a and M′b are installed on a wall 901 in the space R so as to have different heights. The distance between the microphones M′a and M′b is, for example, about several centimeters to several tens of centimeters.

ここでは、マイクロフォンＭ’ａ，Ｍ’ｂ間の中点が、床９０２から「１３０ｃｍ」の高さとなっている。ただし、高さ「１３０ｃｍ」は一例であり、任意に変更可能である。また、壁９０１における子供でも操作可能な高さ（例えば、１００ｃｍ程度）の位置に、端末Ｔｉが設置されている。なお、端末Ｔｉとは異なる他の端末Ｔについても、端末Ｔｉとほぼ同じ高さの位置に設置される。 Here, the midpoint between the microphones M′a and M′b is a height of “130 cm” from the floor 902. However, the height “130 cm” is an example and can be arbitrarily changed. In addition, the terminal Ti is installed at a height (for example, about 100 cm) on the wall 901 that can be operated by a child. Note that another terminal T different from the terminal Ti is also installed at a position substantially the same height as the terminal Ti.

この場合、音声認識部６０４は、マイクロフォンＭ’ａ，Ｍ’ｂそれぞれに入力される音に関する情報に基づいて、端末Ｔｉのユーザの身長が所定の高さＫ以上であるか否かを判断することにしてもよい。所定の高さＫは、マイクロフォンＭ’ａ，Ｍ’ｂが設置された高さに相当する。図９の例では、所定の高さＫは、例えば、「１３０ｃｍ」である。 In this case, the voice recognition unit 604 determines whether or not the height of the user of the terminal Ti is equal to or higher than a predetermined height K based on information about sound input to the microphones M′a and M′b. You may decide. The predetermined height K corresponds to the height at which the microphones M'a and M'b are installed. In the example of FIG. 9, the predetermined height K is, for example, “130 cm”.

ただし、高さ方向の音源を認識するにあたり、端末Ｔｉのユーザのみが発話することとする。この際、端末Ｔｉのユーザは、発話区間を指定するための操作ボタン４０３の操作を行う。この結果、端末Ｔｉのビーコン信号ｂｓが送信され、情報処理装置１０１は、ビーコン信号ｂｓから、端末Ｔｉを音源として特定することができる。 However, only the user of the terminal Ti speaks when recognizing the sound source in the height direction. At this time, the user of the terminal Ti operates the operation button 403 for designating the utterance section. As a result, the beacon signal bs of the terminal Ti is transmitted, and the information processing apparatus 101 can specify the terminal Ti as a sound source from the beacon signal bs.

Specifically, for example, the voice recognition unit 604 determines whether the sound source lies above or below the microphones M′a and M′b from the time difference between the sounds input to the microphones M′a and M′b (the difference in propagation delay from the sound source). When the sound source is above, the voice recognition unit 604 determines that the height of the user of the terminal Ti is equal to or greater than the predetermined height K. When the sound source is below, the voice recognition unit 604 determines that the height of the user of the terminal Ti is less than the predetermined height K.
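The above/below decision can be illustrated with a minimal sketch. It assumes the arrival-time difference is estimated by a brute-force cross-correlation over a small lag range; the source does not state how the time difference is measured. The function names `best_lag` and `is_at_least_height_k`, the `max_lag` parameter, and the treatment of a zero lag as "K or greater" are all hypothetical choices made for illustration.

```python
def best_lag(upper, lower, max_lag):
    """Return the lag (in samples) that maximizes sum(upper[n + lag] * lower[n]).

    A negative lag means the sound reached the upper microphone earlier
    than the lower one; a positive lag means it reached it later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, low in enumerate(lower):
            m = n + lag
            if 0 <= m < len(upper):
                score += upper[m] * low
        if score > best_score:
            best, best_score = lag, score
    return best


def is_at_least_height_k(upper, lower, max_lag=8):
    """Judge the source to be at or above the microphone pair (height >= K).

    Assumption: a lag of zero (source level with the midpoint) counts as >= K.
    """
    return best_lag(upper, lower, max_lag) <= 0
```

For example, if the same pulse appears two samples earlier in the signal of the upper microphone M′a than in that of the lower microphone M′b, `best_lag` returns -2 and the user would be treated as having a height of K or greater.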

Then, based on the determination result, the voice recognition unit 604 performs voice recognition on the information about the sound uttered by the user of the terminal Ti that was extracted by the extraction unit 603. Specifically, for example, when the height of the user of the terminal Ti is equal to or greater than the predetermined height K, the voice recognition unit 604 performs voice recognition using an acoustic model for adults ((9-1) in FIG. 9). When the height of the user of the terminal Ti is less than the predetermined height K, the voice recognition unit 604 performs voice recognition using an acoustic model for children ((9-2) in FIG. 9).

これにより、端末Ｔｉのユーザが大人であるか子供であるかを判別して、音声認識に用いる音響モデルを設定することが可能となり、音声認識精度を向上させることができる。また、図９に示すような位置関係で、マイクロフォンＭ’ａ，Ｍ’ｂと端末Ｔｉを壁９０１に設置することで、マイクロフォンＭ’ａ，Ｍ’ｂから音源（端末Ｔｉのユーザ）までの距離が固定され、音源方向の推定精度を向上させることができる。 Thereby, it is possible to determine whether the user of the terminal Ti is an adult or a child, and to set an acoustic model used for speech recognition, thereby improving speech recognition accuracy. In addition, by installing the microphones M′a, M′b and the terminal Ti on the wall 901 in the positional relationship as shown in FIG. 9, from the microphones M′a, M′b to the sound source (user of the terminal Ti). The distance is fixed and the estimation accuracy of the sound source direction can be improved.

(Locating the user of the terminal Ti)
Next, locating the user of the terminal Ti will be described with reference to FIGS. 10 and 11. First, a second hardware configuration example of the information processing apparatus 101 will be described with reference to FIG. 10. Description of components identical to those described with reference to FIG. 3 is omitted.

FIG. 10 is an explanatory diagram illustrating a second hardware configuration example of the information processing apparatus 101. In FIG. 10, the information processing apparatus 101 includes a CPU 301, a memory 302, an I/F 303, a beacon receiving unit 304, a disk drive 305, a disk 306, a sound collection unit 307, and a second sound collection unit 1001. The components are connected to one another by a bus 300.

第２の収音部１００１は、マイクロフォンＭ３，Ｍ４と、バッファＢ３，Ｂ４と、を含む。マイクロフォンＭ３，Ｍ４は、音声を電気信号に変換する装置である。マイクロフォンＭ３，Ｍ４は、空間Ｒ内の高さが略同一の位置であって、マイクロフォンＭ１，Ｍ２とは異なる位置に設置される。 The second sound collection unit 1001 includes microphones M3 and M4 and buffers B3 and B4. The microphones M3 and M4 are devices that convert sound into an electrical signal. The microphones M3 and M4 are installed at positions that have substantially the same height in the space R and are different from the microphones M1 and M2.

バッファＢ３は、マイクロフォンＭ３に接続され、マイクロフォンＭ３に入力される音に関する情報を記憶する。バッファＢ４は、マイクロフォンＭ４に接続され、マイクロフォンＭ４に入力される音に関する情報を記憶する。なお、第２の収音部１００１は、３以上のマイクロフォンＭと、３以上のマイクロフォンＭそれぞれに接続されるバッファＢとを含むことにしてもよい。 The buffer B3 is connected to the microphone M3 and stores information related to sound input to the microphone M3. The buffer B4 is connected to the microphone M4 and stores information related to the sound input to the microphone M4. The second sound collection unit 1001 may include three or more microphones M and a buffer B connected to each of the three or more microphones M.

図１１は、複数のマイクロフォンＭとユーザとの位置関係の一例を示す説明図（その２）である。図１１において、空間Ｒ内に端末Ｔ１のユーザＡが存在する場合を想定する。また、空間Ｒの左側にマイクロフォンＭ１，Ｍ２が設置され、空間Ｒの手前側にマイクロフォンＭ３，Ｍ４が設置されている。 FIG. 11 is an explanatory diagram (part 2) illustrating an example of the positional relationship between the plurality of microphones M and the user. In FIG. 11, it is assumed that the user A of the terminal T1 exists in the space R. Further, microphones M1 and M2 are installed on the left side of the space R, and microphones M3 and M4 are installed on the front side of the space R.

ここでは、Ｘ軸とＹ軸とからなる原点Ｏの座標系が設定されているとする。Ｘ軸とＹ軸は水平面に平行である。すなわち、Ｘ軸とＹ軸とからなる座標系は、空間Ｒを上方から見た座標系である。また、マイクロフォンＭ１，Ｍ２の設置位置を「（Ｘ，Ｙ）＝（０，Ｈ）」とし、マイクロフォンＭ３，Ｍ４の設置位置を「（Ｘ，Ｙ）＝（Ｗ，０）」とする。 Here, it is assumed that a coordinate system of the origin O composed of the X axis and the Y axis is set. The X axis and the Y axis are parallel to the horizontal plane. That is, the coordinate system composed of the X axis and the Y axis is a coordinate system when the space R is viewed from above. Further, the installation positions of the microphones M1 and M2 are “(X, Y) = (0, H)”, and the installation positions of the microphones M3 and M4 are “(X, Y) = (W, 0)”.

In this case, the specifying unit 602 specifies the direction θ_a from the microphones M1 and M2 to the terminal Ti based on information about the sound input to each of the microphones M1 and M2. When the position of the user of the terminal Ti is specified, it is assumed that only the user of the terminal Ti speaks. At that time, the user of the terminal Ti operates the operation button 403 for designating the utterance section. As a result, the beacon signal bs of the terminal Ti is transmitted, and the information processing apparatus 101 can identify the terminal Ti as the sound source from the beacon signal bs.

The specifying unit 602 also specifies the direction θ_b from the microphones M3 and M4 to the terminal Ti based on information about the sound input to each of the microphones M3 and M4. The extraction unit 603 then specifies the position of the terminal Ti based on the directions θ_a and θ_b specified by the specifying unit 602, the installation position of the microphones M1 and M2, and the installation position of the microphones M3 and M4.

具体的には、例えば、抽出部６０３は、マイクロフォンＭ１，Ｍ２とマイクロフォンＭ３，Ｍ４それぞれへの同時入力性（ほぼ同時に同じ発話が入力される）から、下記式（１）および（２）を用いて、端末Ｔｉの位置（ｘ，ｙ）を特定することができる。 Specifically, for example, the extraction unit 603 uses the following formulas (1) and (2) based on the simultaneous input characteristics to the microphones M1 and M2 and the microphones M3 and M4 (the same utterance is input almost simultaneously). Thus, the position (x, y) of the terminal Ti can be specified.

$$x = \frac{\tan\theta_b \cdot H + W}{1 - \tan\theta_a \cdot \tan\theta_b} \quad \cdots (1)$$

$$y = \frac{\tan\theta_a \, (\tan\theta_b \cdot H + W) + H}{1 - \tan\theta_a \cdot \tan\theta_b} \quad \cdots (2)$$

これにより、空間Ｒにおける端末Ｔｉのユーザの位置を特定することができる。 Thereby, the position of the user of the terminal Ti in the space R can be specified.

Based on the specified position of the terminal Ti, the extraction unit 603 may then decide whether to extract the information about the sound uttered by the user of the terminal Ti from the information stored in the sound collection unit 307 (buffers B1 and B2) or from the information stored in the second sound collection unit 1001 (buffers B3 and B4). Specifically, for example, the extraction unit 603 calculates a distance D1 from the installation position of the microphones M1 and M2 to the position of the terminal Ti. The extraction unit 603 also calculates a distance D2 from the installation position of the microphones M3 and M4 to the position of the terminal Ti. The distances D1 and D2 are distances in the coordinate system formed by the X axis and the Y axis.

When the distance D1 is shorter than the distance D2, the extraction unit 603 decides to extract the information about the sound uttered by the user of the terminal Ti from the information stored in the sound collection unit 307. The extraction unit 603 then extracts the information about the sound uttered by the user of the terminal Ti based on the direction θ_a and, of the per-microphone sound information for the microphones M1 and M2 stored in the buffers B1 and B2, the portion belonging to the utterance section P.

When the distance D2 is shorter than the distance D1, the extraction unit 603 decides to extract the information about the sound uttered by the user of the terminal Ti from the information stored in the second sound collection unit 1001. The extraction unit 603 then extracts the information about the sound uttered by the user of the terminal Ti based on the direction θ_b and, of the per-microphone sound information for the microphones M3 and M4 stored in the buffers B3 and B4, the portion belonging to the utterance section P.

これにより、端末Ｔｉのユーザから物理的に近い位置のマイクロフォンＭに入力された音に関する情報から、端末Ｔｉのユーザが発話した音に関する情報を抽出することができる。この結果、より音圧の高い情報を使って音声認識を行うことができ、音声認識精度を向上させることができる。 Thereby, the information regarding the sound uttered by the user of the terminal Ti can be extracted from the information regarding the sound input to the microphone M physically close to the user of the terminal Ti. As a result, voice recognition can be performed using information with higher sound pressure, and voice recognition accuracy can be improved.
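The choice between the two sound collection units can be sketched as a comparison of planar distances. The following is only an illustration: the function name `choose_sound_collection_unit`, the string labels in the return value, and the handling of the tie D1 = D2 (which the description does not cover) are assumptions.

```python
import math


def choose_sound_collection_unit(terminal_xy, mics12_xy, mics34_xy):
    """Pick the microphone pair nearer to the terminal in the X-Y plane.

    Returns a (label, distance) tuple. Assumption: when D1 == D2, which
    the description does not address, the first pair (M1, M2) is used.
    """
    x, y = terminal_xy
    d1 = math.hypot(x - mics12_xy[0], y - mics12_xy[1])  # distance D1
    d2 = math.hypot(x - mics34_xy[0], y - mics34_xy[1])  # distance D2
    if d1 <= d2:
        return "sound collection unit 307 (B1, B2)", d1
    return "second sound collection unit 1001 (B3, B4)", d2
```

With the microphones M1 and M2 at (0, H) and the microphones M3 and M4 at (W, 0), a terminal located close to the left wall would select the buffers B1 and B2, and the beamforming direction would then be θ_a.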

(Information processing procedure of the information processing apparatus 101)
Next, the information processing procedure of the information processing apparatus 101 will be described with reference to FIG. 12.

図１２は、情報処理装置１０１の情報処理手順の一例を示すフローチャートである。図１２のフローチャートにおいて、まず、情報処理装置１０１は、収音部３０７の各バッファＢ１，Ｂ２への各マイクロフォンＭ１，Ｍ２に入力される音に関する情報のバッファリングを開始する（ステップＳ１２０１）。 FIG. 12 is a flowchart illustrating an example of an information processing procedure of the information processing apparatus 101. In the flowchart of FIG. 12, first, the information processing apparatus 101 starts buffering information related to sound input to the microphones M1 and M2 to the buffers B1 and B2 of the sound collection unit 307 (step S1201).

つぎに、情報処理装置１０１は、端末Ｔｉから送信されるビーコン信号ｂｓを受信したか否かを判断する（ステップＳ１２０２）。ここで、情報処理装置１０１は、ビーコン信号ｂｓを受信するのを待つ（ステップＳ１２０２：Ｎｏ）。そして、情報処理装置１０１は、ビーコン信号ｂｓを受信した場合（ステップＳ１２０２：Ｙｅｓ）、ビーコン信号ｂｓに含まれる操作種別が「ＯＮ」であるか否かを判断する（ステップＳ１２０３）。 Next, the information processing apparatus 101 determines whether or not the beacon signal bs transmitted from the terminal Ti has been received (step S1202). Here, the information processing apparatus 101 waits for reception of the beacon signal bs (step S1202: No). When the information processing apparatus 101 receives the beacon signal bs (step S1202: Yes), the information processing apparatus 101 determines whether the operation type included in the beacon signal bs is “ON” (step S1203).

ここで、操作種別が「ＯＮ」の場合（ステップＳ１２０３：Ｙｅｓ）、情報処理装置１０１は、ビーコン信号ｂｓに含まれる端末ＩＤと対応付けて、ビーコン信号ｂｓが受信された時刻をＯＮ時刻として、発話区間テーブル２２０に登録する（ステップＳ１２０４）。これにより、新たな発話区間情報がレコードとして発話区間テーブル２２０に登録される。 When the operation type is “ON” (step S1203: Yes), the information processing apparatus 101 associates the terminal ID included in the beacon signal bs with the time when the beacon signal bs is received as the ON time. It registers in the utterance section table 220 (step S1204). Thereby, new utterance section information is registered in the utterance section table 220 as a record.

When the information processing apparatus 101 receives the beacon signal bs via the beacon receivers 201, it receives beacon signals bs containing the same terminal ID and operation type from a plurality of beacon receivers 201 at almost the same time. In this case, the information processing apparatus 101 performs the registration in the utterance section table 220 in response to, for example, the beacon signal bs that was received first.

つぎに、情報処理装置１０１は、受信したビーコン信号ｂｓに基づいて、マイクロフォンＭ１，Ｍ２から端末Ｔｉへの方向θを特定する（ステップＳ１２０５）。具体的には、例えば、情報処理装置１０１は、複数のビーコン受信機２０１から受信されるビーコン信号ｂｓ（端末Ｔｉの端末ＩＤを含む）のＲＳＳＩ値と、各ビーコン受信機２０１の設置位置とに基づいて、空間Ｒにおける端末Ｔｉの位置を推定する。そして、情報処理装置１０１は、推定した端末Ｔｉの位置と、マイクロフォンＭ１，Ｍ２の設置位置とに基づいて、マイクロフォンＭ１，Ｍ２から端末Ｔｉへの方向θを特定する。 Next, the information processing apparatus 101 specifies the direction θ from the microphones M1 and M2 to the terminal Ti based on the received beacon signal bs (step S1205). Specifically, for example, the information processing apparatus 101 determines the RSSI value of the beacon signal bs (including the terminal ID of the terminal Ti) received from the plurality of beacon receivers 201 and the installation position of each beacon receiver 201. Based on this, the position of the terminal Ti in the space R is estimated. Then, the information processing apparatus 101 specifies the direction θ from the microphones M1 and M2 to the terminal Ti based on the estimated position of the terminal Ti and the installation positions of the microphones M1 and M2.
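Step S1205 can be illustrated with a rough sketch. The description says only that the position is estimated from the RSSI values and the receiver positions; it does not name a method. The weighted centroid used here is a swapped-in stand-in, not the method of the embodiment. The function names, the conversion of RSSI in dBm to a linear weight, and the convention that θ is measured counterclockwise from the X axis are likewise assumptions.

```python
import math


def estimate_position(readings):
    """Estimate (x, y) as a centroid weighted by linear received power.

    readings: list of ((x, y), rssi_dbm), one entry per beacon receiver.
    Assumption: stronger RSSI means the terminal is nearer that receiver.
    """
    weights = [10 ** (rssi / 10.0) for _, rssi in readings]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(readings, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(readings, weights)) / total
    return x, y


def direction_to_terminal(mics_xy, terminal_xy):
    """Angle in radians from the microphone position to the terminal,
    measured counterclockwise from the X axis (an assumed convention)."""
    return math.atan2(terminal_xy[1] - mics_xy[1],
                      terminal_xy[0] - mics_xy[0])
```

A real deployment would more likely use a calibrated path-loss model or fingerprinting; the centroid is used here only because it keeps the example short.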

つぎに、情報処理装置１０１は、ビーコン信号ｂｓに含まれる端末ＩＤと対応付けて、特定した方向θを発話区間テーブル２２０に登録して（ステップＳ１２０６）、ステップＳ１２０２に戻る。すなわち、情報処理装置１０１は、ビーコン信号ｂｓに含まれる端末ＩＤに対応する発話区間情報の方向フィールドに、特定した方向θを設定する。 Next, the information processing apparatus 101 registers the specified direction θ in the utterance section table 220 in association with the terminal ID included in the beacon signal bs (step S1206), and returns to step S1202. That is, the information processing apparatus 101 sets the specified direction θ in the direction field of the utterance period information corresponding to the terminal ID included in the beacon signal bs.

When the operation type is "OFF" in step S1203 (step S1203: No), the information processing apparatus 101 registers the time at which the beacon signal bs was received in the utterance section table 220 as the OFF time, in association with the terminal ID included in the beacon signal bs (step S1207). That is, the information processing apparatus 101 sets the time at which the beacon signal bs was received in the OFF time field of the utterance section information corresponding to the terminal ID included in the beacon signal bs.

つぎに、情報処理装置１０１は、発話区間テーブル２２０を参照して、ビーコン信号ｂｓに含まれる端末ＩＤに対応する発話区間Ｐ（ＯＮ時刻〜ＯＦＦ時刻）および方向θを特定する（ステップＳ１２０８）。そして、情報処理装置１０１は、各バッファＢ１，Ｂ２から、特定した発話区間ＰのマイクロフォンＭ１，Ｍ２ごとの音に関する情報を読み出す（ステップＳ１２０９）。 Next, the information processing apparatus 101 refers to the utterance interval table 220 and specifies the utterance interval P (ON time to OFF time) and the direction θ corresponding to the terminal ID included in the beacon signal bs (step S1208). Then, the information processing apparatus 101 reads information related to the sound for each of the microphones M1 and M2 in the specified speech section P from each of the buffers B1 and B2 (step S1209).
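The ON/OFF bookkeeping of steps S1203 to S1208 can be sketched with a dictionary standing in for the utterance section table 220. The record layout, the function name `handle_beacon`, and the rule of ignoring a repeated ON or OFF for the same terminal (one way to honor "the beacon signal received first") are assumptions for illustration.

```python
def handle_beacon(table, terminal_id, operation, received_at, direction=None):
    """Update the utterance section table for one received beacon signal.

    Returns ((on_time, off_time), direction) when an OFF beacon closes an
    utterance section, otherwise None.
    """
    record = table.get(terminal_id)
    if operation == "ON":
        if record is not None and record["off"] is None:
            return None  # duplicate ON via another receiver: keep the first
        table[terminal_id] = {"on": received_at, "off": None,
                              "direction": direction}
        return None
    # operation == "OFF"
    if record is None or record["off"] is not None:
        return None  # no open section, or duplicate OFF: ignore
    record["off"] = received_at
    return (record["on"], record["off"]), record["direction"]
```

The returned pair corresponds to the utterance section P (ON time to OFF time) and the direction θ that are looked up in step S1208 before the buffers are read.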

つぎに、情報処理装置１０１は、読み出した発話区間Ｐの各マイクロフォンＭ１，Ｍ２の音に関する情報に基づいて、特定した方向θへのビームフォーム処理を行う（ステップＳ１２１０）。そして、情報処理装置１０１は、ビームフォーム処理により抽出された端末Ｔｉ（ビーコン信号ｂｓに含まれる端末ＩＤの端末Ｔｉ）のユーザが発話した音に関する情報を音声認識処理する（ステップＳ１２１１）。 Next, the information processing apparatus 101 performs beamform processing in the specified direction θ based on the information related to the sounds of the microphones M1 and M2 in the read utterance period P (step S1210). Then, the information processing apparatus 101 performs voice recognition processing on information regarding the sound uttered by the user of the terminal Ti (terminal Ti of the terminal ID included in the beacon signal bs) extracted by the beamform process (step S1211).
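The description does not say which beamforming method step S1210 uses. As one common possibility, a two-microphone delay-and-sum beamformer is sketched below. The function name, the parameters `spacing_m` and `sample_rate`, the rounding to whole-sample delays, and the convention that θ is measured from the array broadside, with positive θ meaning the sound reaches microphone 1 first, are all assumptions.

```python
import math


def delay_and_sum(sig1, sig2, theta, spacing_m, sample_rate,
                  speed_of_sound=343.0):
    """Steer a two-microphone array toward theta (radians from broadside).

    The inter-microphone delay is rounded to whole samples, the second
    signal is aligned to the first, and the two are averaged, so sound
    from direction theta adds coherently while other directions do not.
    """
    shift = round(spacing_m * math.sin(theta) / speed_of_sound * sample_rate)
    out = []
    for n in range(len(sig1)):
        m = n + shift  # sample of sig2 that carries the same wavefront
        s2 = sig2[m] if 0 <= m < len(sig2) else 0.0
        out.append(0.5 * (sig1[n] + s2))
    return out
```

Running the function once per registered direction θ over the same buffered section P would yield one emphasized signal per terminal, which is how overlapping utterances from several users could be separated under these assumptions.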

つぎに、情報処理装置１０１は、ビーコン信号ｂｓに含まれる端末ＩＤと対応付けて、音声認識結果を出力する（ステップＳ１２１２）。そして、情報処理装置１０１は、情報処理システム１００が終了したか否かを判断する（ステップＳ１２１３）。ここで、情報処理システム１００が終了していない場合（ステップＳ１２１３：Ｎｏ）、情報処理装置１０１は、ステップＳ１２０２に戻る。 Next, the information processing apparatus 101 outputs a voice recognition result in association with the terminal ID included in the beacon signal bs (step S1212). Then, the information processing apparatus 101 determines whether or not the information processing system 100 has ended (step S1213). If the information processing system 100 has not ended (step S1213: No), the information processing apparatus 101 returns to step S1202.

一方、情報処理システム１００が終了した場合（ステップＳ１２１３：Ｙｅｓ）、情報処理装置１０１は、本フローチャートによる一連の処理を終了する。これにより、端末Ｔｉのユーザごとに発話した音に関する情報を抽出することができる。 On the other hand, when the information processing system 100 ends (step S1213: Yes), the information processing apparatus 101 ends a series of processes according to this flowchart. Thereby, the information regarding the sound uttered for each user of the terminal Ti can be extracted.

As described above, the information processing apparatus 101 according to the embodiment can specify the utterance section P in which the user of the terminal Ti spoke and the direction θ from the microphones M1 and M2 to the terminal Ti, based on the beacon signal bs transmitted from the terminal Ti in response to an operation on the terminal Ti. The information processing apparatus 101 can then extract the information about the sound uttered by the user of the terminal Ti based on the direction θ and the per-microphone sound information for the microphones M1 and M2 in the utterance section P stored in the buffers B1 and B2. Specifically, for example, the information processing apparatus 101 can extract the information about the sound uttered by the user by performing beamform processing toward the direction θ on the per-microphone sound information for the microphones M1 and M2 in the utterance section P.

これにより、端末Ｔｉの方向θ（マイクロフォンＭ１，Ｍ２から見た端末Ｔｉのユーザの方向）から到来した音声信号を強調して、端末Ｔｉのユーザが発話した音を抽出することができる。また、各バッファＢ１，Ｂ２に記憶されたマイクロフォンＭ１，Ｍ２ごとの音に関する情報を用いて、各方向θから到来した音声信号を強調した音情報をそれぞれ生成することができる。このため、複数人が同じタイミングで発話した場合であっても、ユーザごとに発話した音を精度良く抽出することができる。 As a result, it is possible to extract the sound uttered by the user of the terminal Ti by emphasizing the voice signal that has arrived from the direction θ of the terminal Ti (the direction of the user of the terminal Ti viewed from the microphones M1 and M2). In addition, sound information in which the sound signal arriving from each direction θ is emphasized can be generated using the information regarding the sound for each of the microphones M1 and M2 stored in the buffers B1 and B2. For this reason, even when a plurality of people speak at the same timing, it is possible to accurately extract the sound spoken for each user.

The information processing apparatus 101 can also output the extracted information about the sound uttered by the user of the terminal Ti in association with the terminal ID of the terminal Ti. This makes it possible to provide the information about the sound uttered by a user to an external device or the like (for example, a voice recognition device) in a form in which the user can be identified.

The information processing apparatus 101 can also perform voice recognition on the extracted information about the sound uttered by the user of the terminal Ti and output the recognition result in association with the terminal ID of the terminal Ti. This makes it possible to provide the recognition result obtained from the sound uttered by a user to an external device or the like (for example, an electronic device that accepts voice input) in a form in which the user can be identified.

また、情報処理装置１０１によれば、高さが異なる位置に設置されるマイクロフォンＭ’ａ，Ｍ’ｂそれぞれに入力される音に関する情報に基づいて、端末Ｔｉのユーザの身長が所定の高さＫ以上であるか否かを判断することができる。そして、情報処理装置１０１によれば、判断した結果に基づいて、抽出した端末Ｔｉのユーザが発話した音に関する情報を音声認識することができる。これにより、例えば、端末Ｔｉのユーザが大人であるか子供であるかを判別して、音声認識に用いる音響モデルを設定することが可能となり、音声認識精度を向上させることができる。 Further, according to the information processing apparatus 101, the height of the user of the terminal Ti is set to a predetermined height based on information about sound input to the microphones M′a and M′b installed at different heights. It can be determined whether or not it is K or more. And according to the information processing apparatus 101, based on the determined result, the information regarding the sound uttered by the user of the extracted terminal Ti can be recognized. Thereby, for example, it is possible to determine whether the user of the terminal Ti is an adult or a child, and to set an acoustic model used for speech recognition, thereby improving speech recognition accuracy.

The information processing apparatus 101 can also specify the direction θ_a from the microphones M1 and M2 to the terminal Ti based on information about the sound input to each of the microphones M1 and M2. Likewise, the information processing apparatus 101 can specify the direction θ_b from the microphones M3 and M4 to the terminal Ti based on information about the sound input to each of the microphones M3 and M4, which are installed at a position different from that of the microphones M1 and M2. The information processing apparatus 101 can then specify the position of the terminal Ti based on the specified directions θ_a and θ_b, the installation position of the microphones M1 and M2, and the installation position of the microphones M3 and M4.

これにより、空間Ｒにおける端末Ｔｉのユーザの位置を特定することができる。ただし、端末Ｔｉのユーザの位置を特定するにあたり、端末Ｔｉのユーザのみが発話することとする。このため、複数のユーザが存在する場合は、例えば、複数のユーザそれぞれが順番に発話して、各ユーザの位置を特定することになる。 Thereby, the position of the user of the terminal Ti in the space R can be specified. However, only the user of the terminal Ti speaks in specifying the position of the user of the terminal Ti. For this reason, when there are a plurality of users, for example, each of the plurality of users speaks in order and specifies the position of each user.

また、情報処理装置１０１によれば、特定した端末Ｔｉの位置に基づいて、収音部３０７（バッファＢ１，Ｂ２）または第２の収音部１００１（バッファＢ３，Ｂ４）のいずれに記憶された情報をもとに端末Ｔｉのユーザが発話した音に関する情報を抽出するかを決定することができる。 Further, according to the information processing apparatus 101, the information is stored in either the sound collection unit 307 (buffers B1 and B2) or the second sound collection unit 1001 (buffers B3 and B4) based on the specified position of the terminal Ti. Based on the information, it can be determined whether or not to extract information related to the sound spoken by the user of the terminal Ti.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) An information processing apparatus comprising:
a plurality of microphones;
a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to the microphone;
a specifying unit that specifies, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from the plurality of microphones to the terminal; and
an extraction unit that extracts information about the sound uttered by the user based on the information about the sound for each microphone in the period stored in the storage unit and the direction.

(Supplementary Note 2) The information processing apparatus according to Supplementary Note 1, further comprising an output unit that outputs the information about the sound uttered by the user extracted by the extraction unit in association with identification information of the terminal.

(Supplementary Note 3) The information processing apparatus according to Supplementary Note 1 or 2, further comprising:
a voice recognition unit that performs voice recognition on the information about the sound uttered by the user extracted by the extraction unit; and
an output unit that outputs a recognition result obtained by the voice recognition unit in association with identification information of the terminal.

(Supplementary Note 4) The information processing apparatus according to any one of Supplementary Notes 1 to 3, wherein the extraction unit extracts the information about the sound uttered by the user by performing beamform processing toward the direction based on the information about the sound for each microphone in the period.

(Supplementary Note 5) The information processing apparatus according to Supplementary Note 3, wherein the voice recognition unit:
determines whether the height of the user is equal to or greater than a predetermined height based on information about sound input to each of a plurality of second microphones installed at positions of different heights; and
performs voice recognition on the information about the sound uttered by the user extracted by the extraction unit based on a result of the determination.

(Supplementary Note 6) The information processing apparatus according to any one of Supplementary Notes 1 to 5, further comprising:
a plurality of third microphones installed at a position different from that of the plurality of microphones; and
a second storage unit that stores, for each third microphone included in the plurality of third microphones, information about sound input to the third microphone,
wherein the specifying unit:
specifies a first direction from the plurality of microphones to the terminal based on information about sound input to each of the plurality of microphones; and
specifies a second direction from the plurality of third microphones to the terminal based on information about sound input to each of the plurality of third microphones, and
wherein the extraction unit:
specifies the position of the terminal based on the first and second directions, the installation position of the plurality of microphones, and the installation position of the plurality of third microphones; and
decides, based on the specified position of the terminal, whether to extract the information about the sound uttered by the user from the information stored in the storage unit or from the information stored in the second storage unit.

(Supplementary Note 7) An information processing program that causes a computer to execute a process comprising:
specifying, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from a plurality of microphones to the terminal; and
extracting information about the sound uttered by the user based on the direction and on the information about the sound for each microphone in the period, the information being stored in a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to the microphone.

(Supplementary Note 8) A building having a plurality of microphones on a wall, the building comprising an information processing apparatus that includes:
a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to the microphone;
a specifying unit that specifies, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from the plurality of microphones to the terminal; and
an extraction unit that extracts information about the sound uttered by the user based on the information about the sound for each microphone in the period stored in the storage unit and the direction.

DESCRIPTION OF SYMBOLS
100 Information processing system
101 Information processing apparatus
110 Storage unit
201 Beacon receiver
210 Network
220 Utterance period table
300, 400 Bus
301, 401 CPU
302, 402 Memory
303 I/F
304 Beacon reception unit
305 Disk drive
306 Disk
307, 1001 Sound collection unit
403 Operation button
404 LED lamp
405 Beacon transmission unit
601 Acquisition unit
602 Identification unit
603 Extraction unit
604 Voice recognition unit
605 Output unit

Claims

An information processing apparatus comprising:
a plurality of microphones;
a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to that microphone;
an identification unit that identifies, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from the plurality of microphones to the terminal; and
an extraction unit that extracts information about the sound uttered by the user, based on the direction and on the information about the sound for each microphone in the period stored in the storage unit.

The information processing apparatus according to claim 1, further comprising: an output unit that outputs information related to the sound uttered by the user extracted by the extraction unit in association with identification information of the terminal.

The information processing apparatus according to claim 1, further comprising:
a voice recognition unit that recognizes the information about the sound uttered by the user extracted by the extraction unit; and
an output unit that outputs a recognition result from the voice recognition unit in association with identification information of the terminal.

The information processing apparatus according to claim 3, wherein the voice recognition unit:
determines whether the height of the user is greater than or equal to a predetermined height, based on information about sound input to each of a plurality of second microphones installed at different heights; and
recognizes the information about the sound uttered by the user extracted by the extraction unit, based on the result of that determination.

The information processing apparatus according to claim 1, further comprising:
a plurality of third microphones installed at positions different from the plurality of microphones; and
a second storage unit that stores, for each third microphone included in the plurality of third microphones, information about sound input to that third microphone,
wherein the identification unit:
identifies a first direction from the plurality of microphones to the terminal, based on information about sound input to each of the plurality of microphones; and
identifies a second direction from the plurality of third microphones to the terminal, based on information about sound input to each of the plurality of third microphones,
and wherein the extraction unit:
identifies the position of the terminal based on the first and second directions, the installation positions of the plurality of microphones, and the installation positions of the plurality of third microphones; and
determines, based on the identified position of the terminal, whether to extract the information about the sound uttered by the user from the information stored in the storage unit or from the information stored in the second storage unit.

An information processing program that causes a computer to execute processing comprising:
identifying, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from a plurality of microphones to the terminal; and
extracting information about the sound uttered by the user, based on the direction and on the information about the sound for each microphone in the period, the information being stored in a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to that microphone.

A building having a plurality of microphones on a wall, the building comprising an information processing apparatus that includes:
a storage unit that stores, for each microphone included in the plurality of microphones, information about sound input to that microphone;
an identification unit that identifies, based on information transmitted from a terminal in response to an operation on the terminal, a period during which a user of the terminal spoke and a direction from the plurality of microphones to the terminal; and
an extraction unit that extracts information about the sound uttered by the user, based on the direction and on the information about the sound for each microphone in the period stored in the storage unit.