JP2010049086A

JP2010049086A - Object signal section estimating device, method and program, and recording medium

Info

Publication number: JP2010049086A
Application number: JP2008214097A
Authority: JP
Inventors: Kentaro Ishizuka; 健太郎石塚; Shoko Araki; 章子荒木; Tatsuya Kawahara; 達也河原
Original assignee: Nippon Telegraph and Telephone Corp; Kyoto University NUC
Current assignee: Kyoto University NUC; NTT Inc
Priority date: 2008-08-22
Filing date: 2008-08-22
Publication date: 2010-03-04
Anticipated expiration: 2028-08-22
Also published as: JP5147012B2

Abstract

[Problem] To estimate target signal sections accurately and with a small amount of computation in a noisy environment where the direction of arrival of the target signal cannot be determined accurately.
[Solution] The signals observed by a plurality of sensors are each cut into frames of a predetermined time interval, the signal of each frame for each sensor is converted into the frequency domain, and a frequency-domain signal is generated for each time-frequency bin for each sensor. In addition, the fundamental frequency is estimated, and, only for the grids near the fundamental frequency or its harmonics, the frequency-domain signals corresponding to the sensors other than a reference sensor are normalized with respect to the frequency-domain signal corresponding to the reference sensor, producing a normalized signal value for each time-frequency bin. An uneven distribution value indicating how unevenly the normalized signal values are distributed is then obtained for each grid. These values are used to calculate an uneven-distribution index value for each frame, and this index value is used to determine whether each frame corresponds to a target signal section.
[Selected drawing] FIG. 1

Description

The present invention relates to signal processing technology and, more particularly, to a technique for estimating, from an observed signal that contains noise, the sections in which a target signal is present.

Acoustic signal processing techniques that operate on target signals such as speech or music, including coding, noise suppression, dereverberation, and automatic speech recognition, need to estimate the sections in which the target signal is present from an input acoustic signal containing several kinds of signals. The accuracy of this target signal section estimation strongly affects the performance of the subsequent processing.

When multiple microphones are available for estimating target signal sections under environmental noise, the differences in the arrival times of the signals at the microphones can be used to estimate the sections of the target acoustic signal. Conventional approaches include a method that assumes the direction of arrival of the target signal is known and enhances the signal arriving from that direction (Non-Patent Document 1); a method that sets thresholds on acoustic features such as the zero-crossing count according to the reliability of the estimated direction of arrival of the target signal (Non-Patent Document 2); a method that decides whether speech is present from the presence or absence of a peak in the spatial spectrum (Non-Patent Document 3); and a method that treats sections in which the estimated direction of arrival stays constant over time as speech sections (Non-Patent Document 4). To achieve sufficient accuracy, however, these methods require that the direction of arrival of the target signal be known or that the surroundings be quiet.

Another method uses multiple microphones, estimates target signal sections separately for each microphone signal, and then compares the per-microphone results to obtain the final estimate (Non-Patent Document 5). However, this method does not make full use of the spatial information (information on the direction of arrival of the target signal) that multiple microphones provide.

On the other hand, for environments in which multiple acoustic signals arrive simultaneously from all directions and in all frequency bands (for example, everyday environments such as streets, stations, and airports), there is a method for achieving sufficient target signal section estimation accuracy from arrival time differences that uses the degree to which the arrival time differences estimated over a certain range of time-frequency sections are biased toward a particular value, i.e., their uneven distribution (Non-Patent Document 6).

Non-Patent Document 1: A. Alvarez, P. Gomez, V. Nieto, R. Martinez, and V. Rodellar, "Application of a first-order differential microphone for efficient voice activity detection in a car platform," Proceedings of Interspeech, pp. 2669-2672, 2005.
Non-Patent Document 2: Takamasa Tanaka, Yuki Denda, Masato Nakayama, and Takanobu Nishiura, "A study of hands-free utterance section detection based on the weighted CSP method and speech features," Proceedings of the 2006 Spring Meeting of the Acoustical Society of Japan, 1-P-3, pp. 149-150, Mar. 2006 (in Japanese).
Non-Patent Document 3: Kiyoshi Yamamoto, Futoshi Asano, Takashi Yoshimura, Yoichi Motomura, Hideki Asoh, Isao Hara, Naoyuki Ichimura, Jun Ogata, and Nobuhiko Kitawaki, "Evaluation of an utterance section detection and separation system integrating acoustic and image information," Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 3-6-10, pp. 121-122, 2003 (in Japanese).
Non-Patent Document 4: Masakiyo Fujimoto, Yasuo Ariki, and Shuji Doshita, "Person recognition in news video by multimodal interaction," Journal of the Acoustical Society of Japan, Vol. 62, No. 3, pp. 182-192, 2006 (in Japanese).
Non-Patent Document 5: Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, and Shoji Makino, "A meeting speech speaker identification system using voice activity detection and direction information, and its evaluation," Spring Meeting of the Acoustical Society of Japan, pp. 1-4, 2008 (in Japanese).
Non-Patent Document 6: Juan E. Rubio, Kentaro Ishizuka, Hiroshi Sawada, Shoko Araki, Tomohiro Nakatani, and Masakiyo Fujimoto, "Two-Microphone Voice Activity Detection Based on the Homogeneity of the Direction of Arrival Estimates," Proceedings of the 32nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 4, pp. 385-388, 2007.

However, the method of Non-Patent Document 6 requires a large amount of computation because it calculates the uneven distribution over all time-frequency sections, and when directional noise is present, it also detects that noise as a target signal.

The present invention has been made in view of these points, and its object is to provide a technique capable of estimating target signal sections accurately and with a small amount of computation in a noisy environment where the direction of arrival of the target signal cannot be determined accurately.

To solve the above problem, in the present invention, a signal cutout unit first cuts each of the signals observed by a plurality of sensors into frames, each frame being a predetermined time interval, and a frequency domain conversion unit converts the signal of each frame cut out by the signal cutout unit into the frequency domain, generating a frequency-domain signal for each time-frequency bin for each sensor. A fundamental frequency estimation unit estimates the fundamental frequency of the signal of each frame cut out by the signal cutout unit, and a time-frequency domain division unit identifies, for each frame, one or more grids, each grid being a finite time-frequency section that contains the fundamental frequency or one of its harmonics, and extracts the frequency-domain signals of the time-frequency bins belonging to each grid. A normalization unit then takes as its reference the frequency-domain signal, extracted by the time-frequency domain division unit, that corresponds to a specific reference sensor among the sensors, normalizes at least each frequency-domain signal corresponding to the sensors other than the reference sensor, and generates, for each time-frequency bin extracted by the time-frequency domain division unit, a normalized signal value corresponding to the direction of arrival of the signals observed by the sensors. Finally, an uneven-distribution index value calculation unit obtains, for each grid, an uneven distribution value indicating how unevenly the normalized signal values are distributed, and uses these per-grid values to calculate, for each frame, an uneven-distribution index value indicating the uneven distribution of the normalized signal values.

Here, the normalized signal value generated by the normalization unit of the present invention is a value corresponding to the direction of arrival of the signal. In general, environmental noise reaches the sensors from many directions, whereas a target signal reaches them from only one particular direction (property 1). Consequently, the normalized signal values of time-frequency bins that contain no target signal are widely spread (low uneven distribution), whereas those of bins that contain the target signal cluster around the value corresponding to the target signal's direction of arrival (high uneven distribution). In addition, the fundamental frequency of a given target signal and each of its harmonics (frequency components at integer multiples of the fundamental frequency) are each narrowly distributed in the time-frequency domain, whereas noise power is widely distributed over the time-frequency domain (property 2).
The present invention exploits these properties: it obtains an uneven distribution value for each grid, a grid being a finite time-frequency section that contains the fundamental frequency or one of its harmonics, and uses these per-grid values to calculate an uneven-distribution index value indicating the uneven distribution of the normalized signal values for each frame. This allows target signal sections to be estimated accurately. Moreover, because the normalized signal values are computed only for the time-frequency bins belonging to the grids, and the uneven distribution values only for the grids, the amount of computation is smaller than when these quantities are computed over all time-frequency sections. Furthermore, when the uneven distribution of the normalized signal values is used as the index, the direction of arrival of the target signal need not be known precisely. The present invention can therefore estimate target signal sections appropriately even when the exact direction of arrival of the target signal cannot be estimated.

Preferably, in the present invention, the uneven-distribution index value calculation unit includes: a histogram generation unit that quantizes the normalized signal values, counts the frequency of each quantized normalized signal value for each grid, and generates a histogram for each grid; an uneven distribution calculation unit that uses the histogram of each grid to calculate, for each grid, an uneven distribution value indicating how skewed the histogram's distribution is; and an averaging unit that averages the uneven distribution values of the grids corresponding to the same frame and outputs the average as the uneven-distribution index value of that frame.

Averaging the uneven distribution values of the grids of the same frame and using the average as that frame's uneven-distribution index value reduces the influence of noise components whose power and direction of arrival are widely distributed in the time-frequency domain, and thereby improves the accuracy of target signal section estimation.
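As a rough illustration of the histogram and averaging steps, the Python sketch below quantizes the normalized signal values of each grid into a fixed number of levels, scores each grid, and averages the scores over the grids of one frame. The quantization range, the number of levels, and the per-grid score (the share of values in the most populated level) are hypothetical choices made only for this example; the text itself prefers an entropy-based uneven distribution value.

```python
def quantize_histogram(values, num_bins, lo, hi):
    """Count how many normalized signal values of one grid fall in each of num_bins levels."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        q = int((v - lo) / width)
        q = min(max(q, 0), num_bins - 1)  # clamp values outside [lo, hi) to the edge levels
        counts[q] += 1
    return counts


def peak_fraction(counts):
    """Hypothetical per-grid uneven distribution value: share of the fullest level."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


def frame_index(grids, num_bins=8, lo=-1.0, hi=1.0):
    """Average the per-grid uneven distribution values of one frame."""
    scores = [peak_fraction(quantize_histogram(g, num_bins, lo, hi)) for g in grids]
    return sum(scores) / len(scores)


# Grid whose values cluster (directional source) vs. grid whose values spread (diffuse noise).
clustered = [0.30, 0.31, 0.29, 0.32, 0.30, 0.28]
spread = [-0.9, -0.5, -0.1, 0.2, 0.6, 0.95]
print(frame_index([clustered]), frame_index([spread]), frame_index([clustered, spread]))
```

The clustered grid scores 1.0, the spread grid 1/6, and a frame containing both averages the two, which is the smoothing effect the paragraph above describes.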

Preferably, in the present invention, the histogram generation unit weights the frequency of each quantized normalized signal value by a weighting factor and generates the histogram of each grid from the weighted frequencies. By setting this weighting factor appropriately, a target signal section estimation method optimized for each environment can be constructed.
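A weighted histogram differs from a plain one only in that each quantized value adds its weight instead of a unit count. In the sketch below the weights are supplied by the caller; using, for example, the power of each time-frequency bin is an assumption made for illustration, since the text leaves the weighting factor to be set per environment.

```python
def weighted_histogram(values, weights, num_bins, lo, hi):
    """Accumulate one weight per quantized normalized signal value.

    The weights are a hypothetical input (e.g. |X|^2 of each time-frequency
    bin); with all weights equal to 1 this reduces to an ordinary histogram.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    width = (hi - lo) / num_bins
    hist = [0.0] * num_bins
    for v, w in zip(values, weights):
        q = min(max(int((v - lo) / width), 0), num_bins - 1)  # quantize and clamp
        hist[q] += w
    return hist


values = [0.30, 0.31, -0.70]
weights = [4.0, 2.0, 0.5]  # assumed per-bin weights, e.g. bin powers
print(weighted_histogram(values, weights, 4, -1.0, 1.0))  # [0.5, 0.0, 6.0, 0.0]
```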

Preferably, in the present invention, the uneven distribution calculation unit includes: a probability density function generation unit that uses the histogram of each grid to obtain, for each grid, a probability density function whose random variable takes values corresponding to the quantized normalized signal values; and an uneven distribution value calculation unit that obtains, as the uneven distribution value, either a function value that increases monotonically with the entropy of the probability density function or one that decreases monotonically with that entropy. Obtaining the uneven distribution value in this way yields either an uneven-distribution index value that is small in sections where the target signal is present and large where it is absent, or, conversely, one that is large where the target signal is present and small where it is absent.
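The entropy-based uneven distribution value can be sketched as follows, assuming the probability density function is simply the grid histogram normalized to sum to 1. The decreasing mapping log(Q) - H used here, with Q the number of quantization levels, is one hypothetical choice among the monotonic functions the text allows; returning H itself would be the increasing variant.

```python
import math


def histogram_to_pdf(hist):
    """Normalize a (possibly weighted) histogram so that it sums to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]


def entropy(pdf):
    """Shannon entropy in nats; empty levels contribute 0 (0 * log 0 := 0)."""
    return -sum(p * math.log(p) for p in pdf if p > 0)


def uneven_value(hist):
    """Assumed decreasing function of entropy: log(Q) - H.

    0 for a flat histogram, log(Q) when all mass falls in a single level.
    """
    pdf = histogram_to_pdf(hist)
    return math.log(len(pdf)) - entropy(pdf)


print(uneven_value([6, 0, 0, 0]), uneven_value([2, 1, 2, 1]), uneven_value([1, 1, 1, 1]))
```

With this choice, larger values mean more concentrated, more target-like grids, so the frame average taken by the averaging unit is larger in target signal sections.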

Preferably, in the present invention, the normalization unit uses the phase and/or amplitude of the frequency-domain signal corresponding to the reference sensor as a reference, normalizes the phase and/or amplitude of at least each frequency-domain signal corresponding to the sensors other than the reference sensor, and generates normalized signal values that are these normalized values or a mapping of them. In this case, the normalized signal value is preferably one in which the frequency component has been normalized so that its frequency dependence is removed. If the frequency dependence is not removed, the normalized signal value of a time-frequency bin of the target signal depends on both the direction of arrival and the frequency of the signal; if it is removed, the value depends only on the direction of arrival. That is, even for normalized signal values corresponding to the same target signal, values without frequency dependence are more unevenly distributed than values that retain it. As a result, an uneven-distribution index value that reflects the uneven distribution caused by the target signal more clearly can be obtained, which improves the accuracy of target signal section estimation based on that index value.
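To see why removing the frequency dependence matters, the sketch below takes the phase of a sensor's spectrum relative to the reference sensor and divides it by 2πf, which maps every bin of a single-direction source to the same inter-sensor delay, while the raw relative phase changes with frequency. This particular mapping, its sign convention, and the no-wrap condition |2πfτ| < π are assumptions for illustration; the text does not fix the form of the normalized signal value, and amplitude normalization is omitted here.

```python
import cmath
import math


def normalized_delay(x_ref, x_s, freq_hz):
    """Relative phase of sensor s w.r.t. the reference, divided by 2*pi*f (f > 0).

    Assumes sensor s lags the reference by tau, i.e.
    x_s = x_ref * exp(-j*2*pi*f*tau), and that |2*pi*f*tau| < pi (no phase wrap).
    """
    return -cmath.phase(x_s / x_ref) / (2 * math.pi * freq_hz)


tau = 1.0e-4  # illustrative 0.1 ms inter-sensor delay
x_ref = cmath.rect(1.0, 0.3)  # arbitrary reference spectrum value
for f in (200.0, 400.0, 800.0):
    x_s = x_ref * cmath.exp(-2j * math.pi * f * tau)
    raw_phase = cmath.phase(x_s / x_ref)  # depends on f
    print(f, round(raw_phase, 4), normalized_delay(x_ref, x_s, f))  # last column stays ~tau
```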

In the present invention, the determination unit, for example, compares the uneven-distribution index value of each frame, or a mapping of it, with a predetermined threshold and determines whether each frame belongs to a target signal section. Alternatively, the determination unit may compute the ratio of the uneven-distribution index value of the frame under test to the uneven-distribution index value of frames in a non-target signal section, and determine that the frame under test belongs to a target signal section when this ratio, or a mapping of it, is greater than or equal to a predetermined threshold, or alternatively when it exceeds that threshold. As another example, the determination unit may determine whether the frame corresponding to an uneven-distribution index value calculated by the uneven-distribution index value calculation unit belongs to a target signal section by pattern recognition, using a previously learned relationship between frames' uneven-distribution index values and the results of determining whether those frames belong to target signal sections.
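A minimal sketch of the ratio-based rule, assuming the index value grows with concentration (as the log(Q) - H mapping does) and that index values of some known non-target frames are available, for example from a noise-only lead-in. Averaging those reference values, the example numbers, and the threshold of 2.0 are all hypothetical.

```python
def is_target_frame(index_value, noise_index_values, threshold):
    """Ratio test: frame index / mean index of non-target frames >= threshold.

    Assumes larger index values mean more concentrated (target-like) frames
    and that noise_index_values come from frames known to be non-target.
    """
    reference = sum(noise_index_values) / len(noise_index_values)
    if reference <= 0:
        raise ValueError("reference index must be positive for a ratio test")
    return index_value / reference >= threshold


noise = [0.10, 0.12, 0.08]      # hypothetical index values of noise-only frames
frames = [0.09, 0.35, 0.50, 0.11]
print([is_target_frame(v, noise, threshold=2.0) for v in frames])  # [False, True, True, False]
```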

As described above, the present invention makes it possible to estimate target signal sections accurately and with a small amount of computation in a noisy environment where the direction of arrival of the target signal cannot be determined accurately.

The best mode for carrying out the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram illustrating the overall configuration of the target signal section estimation device 10 of this embodiment. FIG. 2 is a block diagram illustrating the detailed configuration of the uneven-distribution index value calculation unit 16 of FIG. 1, and FIG. 3 is a block diagram illustrating the detailed configuration of the determination unit 17 of FIG. 1.

<Configuration>
As illustrated in FIG. 1, the target signal section estimation device 10 of this embodiment includes a signal cutout unit 11, a frequency domain conversion unit 12, a fundamental frequency estimation unit 13, a time-frequency domain division unit 14, a normalization unit 15, an uneven-distribution index value calculation unit 16, a determination unit 17, a control unit 18, and a storage unit 19. It receives the signals observed by S (S ≥ 2) sensors 20-1 to 20-S and sampled by a sampling unit 30, and outputs the result of the target signal section analysis. As illustrated in FIG. 2(a), the uneven-distribution index value calculation unit 16 of this example includes a histogram generation unit 16a, a probability density function calculation unit 16b, an entropy calculation unit 16c, and an averaging unit 16d. As illustrated in FIG. 2(b), the determination unit 17 of this example includes a relative value calculation unit 17a, a likelihood ratio calculation unit 17b, and a threshold determination unit 17c.

The target signal section estimation device 10 is implemented, for example, by having a known computer comprising a CPU (central processing unit), RAM (random access memory), ROM (read-only memory), and so on execute a predetermined program.

<Processing>
Next, the target signal section estimation method of this embodiment will be described.
In this method, the signals observed by the sensors 20-1 to 20-S (S ≥ 2) are subjected to time-frequency analysis, normalized signal values are computed relative to a specific reference sensor, and the presence or absence of the target signal is detected and output on the basis of the uneven distribution of the normalized signal values within grids, each grid being a predetermined time-frequency section. This embodiment illustrates the case in which microphones are used as the sensors 20-1 to 20-S and the acoustic signals they observe are used to detect the presence or absence of a target signal such as speech or music. Although not stated explicitly below, the target signal section estimation device 10 executes each operation under the control of the control unit 18, and the data obtained in each operation are stored successively in the storage unit 19 and used in the subsequent operations.

FIG. 4 is a flowchart explaining the target signal section estimation method of this embodiment. FIG. 5 is a flowchart detailing step S7, and FIG. 6 is a flowchart detailing step S8. The method of this embodiment is described below following these flowcharts.

First, the signals observed by the S (S ≥ 2) sensors 20-1 to 20-S are input to the sampling unit 30. Besides target signals such as speech or music, these signals also contain environmental noise. The sampling unit 30 samples each signal at a predetermined sampling frequency f_s (for example, 16,000 Hz), thereby obtaining the time-domain signals x(1,t), ..., x(S,t) corresponding to the sensors 20-1 to 20-S (step S1). Here, t denotes the t-th sampling point.

The time-domain signals x(1,t), ..., x(S,t) obtained by the sampling unit 30 are input to the signal cutout unit 11 of the target signal section estimation device 10. The signal cutout unit 11 cuts each input signal x(1,t), ..., x(S,t) into frames, each a predetermined time interval, and obtains the signals x'(1,i,n), ..., x'(S,i,n) of each frame i (i is the frame index) for the sensors 20-1 to 20-S (step S2). Here, n denotes the n-th sample point in frame i. Specifically, the signal cutout unit 11 multiplies each input signal x(1,t), ..., x(S,t) by a predetermined window function while shifting the window along the time axis by, for example, 16 ms, thereby cutting out signals x'(1,i,n), ..., x'(S,i,n) of, for example, 32 ms duration. More specifically, when the sampling frequency is 16,000 Hz, the signal cutout unit 11 multiplies each input signal by, for example, the Hanning window of Equation (1) while shifting it by 256 sample points (16,000 Hz × 16 ms), and cuts out, for each of the sensors 20-1 to 20-S, a discrete signal of 512 sample points (16,000 Hz × 32 ms) as one frame. Here, L denotes the number of sample points in one frame (the frame length; L = 512 in this example).
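The framing step can be sketched as below for one sensor; applying it to each of x(1,t), ..., x(S,t) gives the per-sensor frames. Because Equation (1) is not reproduced in this text, the symmetric Hann window w(n) = 0.5 - 0.5cos(2πn/(L-1)) is an assumption, as is dropping a final partial frame. The frame length of 512 and the hop of 256 samples follow the 16,000 Hz example.

```python
import math


def hanning(length):
    """Symmetric Hann window (assumed form; Equation (1) is not reproduced here)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def cut_frames(x, frame_len=512, hop=256):
    """Return windowed frames x'(i, n) = w(n) * x(i*hop + n) for one sensor signal x(t).

    Only complete frames are kept; trailing samples that cannot fill a frame
    are dropped (an assumption: the text does not say how the end is handled).
    """
    w = hanning(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(x):
        frames.append([w[n] * x[start + n] for n in range(frame_len)])
        start += hop
    return frames


fs = 16000
x = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]  # 1 s of a 200 Hz tone
frames = cut_frames(x)
print(len(frames), len(frames[0]))  # 61 512
```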

The signal cutout unit 11 outputs the signals x'(1,i,n), ..., x'(S,i,n) of each frame i for the sensors 20-1 to 20-S cut out as described above, and these are input to the frequency domain conversion unit 12.

The frequency domain conversion unit 12 converts the signals x'(1,i,n), ..., x'(S,i,n) of each frame i for the sensors 20-1 to 20-S into the frequency domain and generates, for each sensor, the frequency-domain signals (frequency spectra) X(1,i,k), ..., X(S,i,k) for each time-frequency bin (i,k) (step S3). When the conversion is performed by the discrete Fourier transform, the frequency domain conversion unit 12 calculates X(1,i,k), ..., X(S,i,k) as in Equation (2).

Here, j denotes the imaginary unit and s (s ∈ {1, ..., S}) denotes the number of each of the sensors 20-1 to 20-S. k (k = 0, ..., M-1) is the frequency index, representing the discrete points obtained by dividing the sampling frequency f_s into M equal parts. M is a natural number not less than the frame length L; for example, M = 512. The frequency domain conversion unit 12 outputs the frequency-domain signals (frequency spectra) X(1,i,k), ..., X(S,i,k) obtained by this conversion.
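Equation (2) is not reproduced in this text. Assuming it is the usual M-point DFT of the L-sample frame, X(s,i,k) = Σ_{n=0}^{L-1} x'(s,i,n) exp(-j2πnk/M), with the frame implicitly zero-padded when M > L, a direct (deliberately slow) Python evaluation looks like this. The helper bin_frequency follows the relation f = f_s·k/M that the text gives when defining the grids.

```python
import cmath
import math


def dft(frame, m):
    """M-point DFT of an L-sample frame (L <= M), i.e. implicitly zero-padded to M.

    Assumed form of Equation (2): X(k) = sum_{n=0}^{L-1} x'(n) exp(-j*2*pi*n*k/M).
    Direct O(L*M) evaluation for clarity; an FFT would be used in practice.
    """
    if len(frame) > m:
        raise ValueError("M must be at least the frame length L")
    return [sum(frame[n] * cmath.exp(-2j * math.pi * n * k / m) for n in range(len(frame)))
            for k in range(m)]


def bin_frequency(k, m, fs):
    """Frequency in Hz of bin k: f = fs * k / M."""
    return fs * k / m


fs, M = 16000, 128
frame = [math.sin(2 * math.pi * 500 * n / fs) for n in range(M)]  # exactly 4 cycles
X = dft(frame, M)
peak = max(range(M // 2 + 1), key=lambda k: abs(X[k]))
print(peak, bin_frequency(peak, M, fs))  # 4 500.0
```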

The signals x'(1,i,n), ..., x'(S,i,n) of each frame i for the sensors 20-1 to 20-S cut out by the signal cutout unit 11 are also input to the fundamental frequency estimation unit 13. Using these time-domain signals of each frame i, the fundamental frequency estimation unit 13 estimates the fundamental frequencies F₀(1,i), ..., F₀(S,i) for each sensor s and frame i (step S4). This estimation uses, for example, the autocorrelation method described below (see, e.g., T. F. Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice," Prentice-Hall, 2002, pp. 504-505). In this case, the fundamental frequency estimation unit 13 first obtains the autocorrelation coefficients c(s,i,n) for each sensor s and frame i for n = 1, ..., L as follows.

When the frequency domain conversion unit 12 calculates the frequency-domain signals X(1,i,k), ..., X(S,i,k) by the discrete Fourier transform, the fundamental frequency estimation unit 13 can also obtain the autocorrelation coefficients c(s,i,n) by squaring the absolute values of the frequency-domain signals X(1,i,k), ..., X(S,i,k) output by the frequency domain conversion unit 12 and applying the inverse Fourier transform.
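Both routes to c(s,i,n) can be compared in a short sketch. The direct form c(n) = Σ_t x(t)x(t+n) is an assumed form of the equation that is not reproduced here. One caveat worth checking: the inverse DFT of |X(k)|² yields the circular autocorrelation over M points, which coincides with the linear autocorrelation at lags 0 to L-1 only if M ≥ 2L-1; with M = L = 512, as in the example above, the lags wrap around. The sketch shows both cases on a tiny frame.

```python
import cmath
import math


def autocorr_direct(x):
    """Linear autocorrelation c(n) = sum_t x(t) x(t+n), n = 0..L-1 (assumed form)."""
    L = len(x)
    return [sum(x[t] * x[t + n] for t in range(L - n)) for n in range(L)]


def autocorr_via_dft(x, m):
    """Inverse M-point DFT of |X(k)|^2, returned for lags 0..L-1.

    Equals the linear autocorrelation only when m >= 2L - 1; otherwise the
    result is the circular autocorrelation and the lags wrap around.
    """
    L = len(x)
    X = [sum(x[n] * cmath.exp(-2j * math.pi * n * k / m) for n in range(L)) for k in range(m)]
    power = [abs(v) ** 2 for v in X]
    r = [sum(power[k] * cmath.exp(2j * math.pi * k * n / m) for k in range(m)).real / m
         for n in range(m)]
    return r[:L]


x = [1.0, 2.0, 3.0, 4.0]
print(autocorr_direct(x))                             # linear autocorrelation
print([round(v, 9) for v in autocorr_via_dft(x, 7)])  # M = 2L - 1: same as direct
print([round(v, 9) for v in autocorr_via_dft(x, 4)])  # M = L: circular, differs
```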

Next, for each sensor s and frame i, the fundamental frequency estimation unit 13 finds the lag n that maximizes the autocorrelation coefficient c(s,i,n) within a fixed search range, for example 32 ≤ n ≤ 320. With a sampling frequency of f_s = 16,000 Hz, this range corresponds to 50 Hz to 500 Hz. The resulting n is the period length of the most dominant periodic component of the input signals x'(1,i,n), ..., x'(S,i,n) within the search range. If an input signal is a single perfectly periodic signal (for example, a sine wave), n equals its period length. Unit 13 divides the sampling frequency f_s by the n obtained for each sensor s and frame i, which gives the fundamental frequencies F₀(1,i), ..., F₀(S,i), and outputs them. Other fundamental frequency estimation methods, such as the parallel processing method, the SIFT algorithm, or cepstrum analysis, may also be used (see, e.g., Sadaoki Furui, "Digital Speech Processing," Tokai University Press, ISBN 4-486-00896-0).
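The lag search above can be sketched as follows, again assuming the short-time autocorrelation c(n) = Σ_m x(m)·x(m+n). The defaults 32 ≤ n ≤ 320 and f_s = 16 kHz come from the text. The cap at L − 1 and the function name `estimate_f0` are illustrative additions.

```python
import math


def estimate_f0(frame, fs=16000, n_min=32, n_max=320):
    """Return fs / n* where n* maximizes the autocorrelation c(n) over n_min..n_max.

    c(n) = sum_m x(m) x(m+n) is an assumed form of the coefficients.
    """
    L = len(frame)
    n_max = min(n_max, L - 1)  # lags beyond the frame have no overlapping samples

    def c(n):
        return sum(frame[m] * frame[m + n] for m in range(L - n))

    best_n = max(range(n_min, n_max + 1), key=c)
    return fs / best_n


# Usage: a 200 Hz sine (period 80 samples at 16 kHz), frame length L = 640.
fs = 16000
frame = [math.sin(2 * math.pi * 200 * m / fs) for m in range(640)]
print(estimate_f0(frame, fs))  # 200.0
```

The peak occurs at lag 80, one full period, and 16000/80 = 200 Hz. Multiples of the period (160, 240) also give local maxima, but those are smaller because fewer samples overlap.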

Next, the fundamental frequencies F₀(1,i), ..., F₀(S,i) and the frequency domain signals X(1,i,k), ..., X(S,i,k) are input to the time-frequency domain dividing unit 14. For each sensor s and frame i, unit 14 identifies one or more grids. Each grid is a finite time-frequency region containing either the fundamental frequency F₀(s,i) or one of its harmonics (a frequency component at an integer multiple of the fundamental). Unit 14 then extracts and outputs the frequency domain signals XGRID₁(1,i,k), ..., XGRID_G(S,i,k) of the time-frequency bins belonging to each grid (step S5). Each grid of frame i of sensor s is a fixed time-frequency region near F₀(s,i) or one of its harmonics. For example, one grid is the region within a predetermined time-frequency range of the bin nearest to F₀(s,i), and the others are the corresponding regions around the bins nearest to each harmonic of F₀(s,i). Let k' denote the frequency bin nearest to F₀(s,i) or to one of its harmonics, that is, the bin for which f = f_s·k'/M is closest to that frequency. The frequency domain signals XGRID_g(s,i,k) (g = 1, ..., G) of the time-frequency bins in each grid of frame i of sensor s can then be written as

$$\mathrm{XGRID}_g(s,i,k) = \{\, X(s,\, i+P,\, k'+Q) \,\} \qquad (4)$$

Here, A is the width of the grid in the time direction and B is its width in the frequency direction; together they determine the ranges of the offsets P and Q in equation (4). {·} denotes the set whose elements are ·. For example, A = 9 and B = 5 may be used, giving a time width of 160 ms and a frequency width of 156.25 Hz. G is the total number of the fundamental frequency and its harmonic components, i.e., the number of grids per frame; G is, for example, a constant.
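Equation (4) can be illustrated by collecting grid cells from one sensor's spectrogram. The exact offset ranges for P and Q (equation (5)) are not reproduced in this section. The sketch therefore *assumes* odd A and B with offsets centered on frame i and bin k', i.e. P ∈ {−(A−1)/2, ..., (A−1)/2}, and Q likewise for B. It also assumes k' is obtained by rounding g·F₀·M/f_s. Cells that fall outside the spectrogram are skipped. The name `extract_grids` is hypothetical.

```python
def extract_grids(X, i, f0, fs, M, G, A=9, B=5):
    """Collect cells X[i+P][k'+Q] around the fundamental (g=1) and harmonics g=2..G.

    X is one sensor's spectrogram indexed X[frame][bin]. Centered offset
    ranges (odd A and B) are an assumption; out-of-range cells are skipped.
    """
    n_frames, n_bins = len(X), len(X[0])
    grids = []
    for g in range(1, G + 1):
        k_c = round(g * f0 * M / fs)  # bin k' nearest to the g-th harmonic
        cells = [
            X[i + p][k_c + q]
            for p in range(-(A // 2), A // 2 + 1)
            for q in range(-(B // 2), B // 2 + 1)
            if 0 <= i + p < n_frames and 0 <= k_c + q < n_bins
        ]
        grids.append(cells)
    return grids


# Usage: label every cell with its (frame, bin) index to see which cells are picked.
X = [[(t, k) for k in range(257)] for t in range(20)]
grids = extract_grids(X, i=10, f0=125.0, fs=16000, M=512, G=3)
print([len(cells) for cells in grids])  # [45, 45, 45]
print(grids[0][0], grids[0][-1])        # (6, 2) (14, 6)
```

With F₀ = 125 Hz, f_s = 16 kHz and M = 512, the centers are bins 4, 8 and 12. Each grid spans A·B = 45 cells.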

[Preferred grid width setting method]
As described above, the present invention obtains an uneven distribution value for each grid, indicating how unevenly the normalized signal values are distributed within that grid. It then uses these per-grid values to compute an uneven distribution index value for each frame, and decides from that index whether the frame belongs to a target signal section. If the time-frequency region of a grid is too wide, the distribution of the normalized signal values within the grid is flattened. It then becomes hard to tell target signal sections from non-target sections by their unevenness. Conversely, if the region is too narrow, each grid holds only a few samples, so the normalized signal values appear highly uneven in every grid. This again makes the two kinds of section hard to distinguish. The grid width must therefore be set within a range where neither problem arises. A preferred way of setting it is described below.

<< About A in Equation (5) >>
When the signal is a speech signal, A should correspond to a duration of about 50 to 300 ms, over which speech can be assumed to be stationary. That is, if the frame shift is SF ms, A may be any integer between 50/SF and 300/SF. If the speaker's speaking rate SR (syllables uttered per second) is known in advance, A may instead be set to an integer near (for example, nearest to) (1000/SR)/SF. For example, if SR = 7 syllables/s and SF = 16 ms, then (1000/SR)/SF = (1000/7)/16 ≈ 8.93, so A = 9. If the target signal is a music signal, A is preferably derived in the same way from the rhythm of the music, which plays the role that SR plays for speech.
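Both rules for A translate directly into arithmetic. The sketch below is a minimal illustration. The rounding choices follow the text (an integer within 50/SF to 300/SF, or the integer nearest to (1000/SR)/SF), and the function names are hypothetical.

```python
import math


def a_range(frame_shift_ms, lo_ms=50.0, hi_ms=300.0):
    """Smallest and largest integer A spanning roughly 50-300 ms of frame shifts."""
    return math.ceil(lo_ms / frame_shift_ms), math.floor(hi_ms / frame_shift_ms)


def a_from_speaking_rate(syllables_per_sec, frame_shift_ms):
    """Integer nearest to (1000 / SR) / SF, i.e. about one syllable's worth of frames."""
    return round((1000.0 / syllables_per_sec) / frame_shift_ms)


print(a_range(16))                  # (4, 18)
print(a_from_speaking_rate(7, 16))  # 9
```

This reproduces the worked example: SR = 7 and SF = 16 ms give 8.93, which rounds to A = 9.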

<< About B in Equation (5) >>
Basically, B should be a width derived from the main-lobe width of the window function w(n). For example, let W(k) be the discrete Fourier transform of w(n). Let cf be the largest frequency bin k in the range 1 < k < M/2 that satisfies 20 log₁₀(W(k)/W(0)) > −60 dB, and set B to an integer near (for example, nearest to) 2·cf + 1. This value depends on the sampling frequency f_s, the analysis frame length L, and the total number M of DFT frequency bins. For example, with a sampling frequency of 8 kHz, a window 256 samples wide, and M = 256, cf = 2 and therefore B = 5.
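The main-lobe rule can be sketched by computing |W(k)| directly and scanning 1 < k < M/2 for the last bin above −60 dB. Several choices here are assumptions: magnitudes are compared, so a negative real W(k) does not break the logarithm; bins with W(k) = 0 are treated as below the floor; and an error is raised when no bin qualifies. The test window is a Gaussian chosen because its spectrum decays smoothly. It is not the window used in the text's 8 kHz / 256-sample example, whose type is not stated here.

```python
import cmath
import math


def mainlobe_grid_height(window, M, floor_db=-60.0):
    """Return (cf, 2*cf + 1) where cf is the largest k in 1 < k < M/2 with
    20*log10(|W(k)| / |W(0)|) > floor_db."""
    W = [abs(sum(w * cmath.exp(-2j * math.pi * n * k / M)
                 for n, w in enumerate(window)))
         for k in range(M // 2)]
    above = [k for k in range(2, M // 2)
             if W[k] > 0 and 20 * math.log10(W[k] / W[0]) > floor_db]
    if not above:
        raise ValueError("no bin in 1 < k < M/2 lies above the floor")
    cf = max(above)
    return cf, 2 * cf + 1


# Usage: a Gaussian window (sigma = 8 samples), L = M = 64.
L = M = 64
window = [math.exp(-0.5 * ((n - (L - 1) / 2) / 8.0) ** 2) for n in range(L)]
print(mainlobe_grid_height(window, M))  # (4, 9)
```

For this window, bin 4 sits near −43 dB and bin 5 near −67 dB, so cf = 4 and B = 9.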

Alternatively, suppose the fundamental frequency estimated by the fundamental frequency estimation unit 13 is F₀(s,i) Hz. Then B may be set to, for example, B = 2·F₀(s,i)/(f_s/M) + 1, so that no grid contains two or more harmonic components of the speech signal. If this value exceeds the width obtained from the main-lobe width above, the main-lobe value is used instead. Take a sampling frequency of 8 kHz, a window 256 samples wide, and M = 256. For F₀(s,i) = 50 Hz, B = 2·50/(8000/256) + 1 = 4.2, so B = 4, for example. For F₀(s,i) = 200 Hz, B = 2·200/(8000/256) + 1 = 13.8. This exceeds the value B = 5 obtained from the main-lobe width, so B = 5 is used. The reason is that the arrival directions of the speech signal are concentrated only within the main-lobe width. The same applies when the target signal is a music signal (end of the description of the preferred grid width setting method).
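The F₀-dependent rule, capped by the main-lobe value, is a one-liner. The sketch assumes rounding down, because the text maps 4.2 to B = 4. The name `grid_height` is hypothetical.

```python
import math


def grid_height(f0, fs, M, b_mainlobe):
    """B from the harmonic spacing, capped by the main-lobe value.

    B_harm = 2 * F0 / (fs / M) + 1, rounded down (the text maps 4.2 to 4);
    the smaller of B_harm and b_mainlobe is returned.
    """
    b_harm = math.floor(2 * f0 / (fs / M) + 1)
    return min(b_harm, b_mainlobe)


print(grid_height(50, 8000, 256, 5))   # 4  (2*50/31.25 + 1 = 4.2)
print(grid_height(200, 8000, 256, 5))  # 5  (13.8 exceeds the main-lobe value 5)
```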

Next, the frequency domain signals XGRID₁(1,i,k), ..., XGRID_G(S,i,k) of the time-frequency bins in each grid, output by the time-frequency domain dividing unit 14, are input to the normalization unit 15. The normalization unit 15 takes as reference the signals XGRID₁(s_B,i,k), ..., XGRID_G(s_B,i,k) that unit 14 extracted for a specific reference sensor s_B ∈ {1, ..., S}. Against that reference, it normalizes the signals XGRID₁(s,i,k), ..., XGRID_G(s,i,k) extracted by unit 14 for at least every sensor s ≠ s_B. The result, for each time-frequency bin (i,k), is a set of normalized signal values ZGRID₁(i,k), ..., ZGRID_G(i,k) corresponding to the arrival direction of the observed signal (step S6). In time-frequency bins (i,k) where the target signal is present, these normalized signal values concentrate around the value corresponding to the arrival direction of the target signal. Examples of the normalized signal values ZGRID_g(i,k) (g = 1, ..., G) generated by the normalization unit 15 are given below.

[Examples of the normalized signal value ZGRID_g(i,k)]
As one example in this embodiment, let S = 2. The signal arrival direction is estimated from the frequency domain signal XGRID_g(1,i,k) of the reference sensor 20-1 and the signal XGRID_g(2,i,k) of the other sensor 20-2, and the estimate is used as the normalized signal value ZGRID_g(i,k) (Example 1). Here, the normalization unit 15 sets ZGRID_g(i,k) to the arrival direction θ_g(i,k) computed by equations (7) and (8). In these equations, ν is the speed of sound (about 340 m/s), d is the distance between the sensors (m), f = f_s·k/M is the discrete frequency of bin k, and arg(·) is the phase (argument) of ·. τ_g(i,k) is the difference between the times at which the signal from the source reaches sensors 20-1 and 20-2, and θ_g(i,k) is the estimated arrival direction. The angle θ_g(i,k) given by equation (8) is measured in radians from a reference direction defined as 0 rad: the line through the midpoint of the segment joining sensors 20-1 and 20-2, perpendicular to that segment. Because the frequency f is divided out, this normalized signal value ZGRID_g(i,k) does not depend on frequency.
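A two-sensor direction estimate for one bin might look like the sketch below. Equations (7) and (8) are not reproduced in this section, so the sketch assumes a far-field plane-wave model in which sensor 2 receives the wave τ seconds after sensor 1, i.e. X2 = X1·e^{−j2πfτ}. It then uses τ = −arg(X2/X1)/(2πf) and θ = arcsin(τν/d), with θ = 0 broadside as described above. The patent's own sign convention may differ. The estimate is unambiguous only while |2πfτ| < π, i.e. without spatial aliasing. The clipping step and the name `doa_two_sensors` are illustrative additions.

```python
import cmath
import math


def doa_two_sensors(x1, x2, f, d, nu=340.0):
    """Arrival-time difference tau (s) and direction theta (rad) for one bin.

    Assumed model: x2 = x1 * exp(-j*2*pi*f*tau), tau = d*sin(theta)/nu,
    theta = 0 broadside to the sensor pair.
    """
    tau = -cmath.phase(x2 / x1) / (2 * math.pi * f)
    s = max(-1.0, min(1.0, tau * nu / d))  # clip: noise can push |s| past 1
    return tau, math.asin(s)


# Usage: simulate a plane wave from theta = 0.3 rad, d = 4 cm, f = 1 kHz.
d, f, theta_true = 0.04, 1000.0, 0.3
tau_true = d * math.sin(theta_true) / 340.0
x1 = 0.7 + 0.2j
x2 = x1 * cmath.exp(-2j * math.pi * f * tau_true)
tau, theta = doa_two_sensors(x1, x2, f, d)
print(tau, theta)
```

Dividing by 2πf is what removes the frequency dependence noted in the text: every bin dominated by the same source maps to the same θ.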

Alternatively, the arrival time difference τ_g(i,k) computed by equation (7) may itself be used as the normalized signal value ZGRID_g(i,k) (Example 2). This value also has the frequency f divided out and does not depend on frequency.

The phase difference arg(XGRID_g(2,i,k)/XGRID_g(1,i,k)) between XGRID_g(2,i,k) and XGRID_g(1,i,k) may also be used as ZGRID_g(i,k) (Example 3), as may the difference of their individual phases, arg(XGRID_g(2,i,k)) − arg(XGRID_g(1,i,k)) (Example 4). Likewise, the amplitude ratio |XGRID_g(2,i,k)|/|XGRID_g(1,i,k)| (Example 5) or the power ratio |XGRID_g(2,i,k)|²/|XGRID_g(1,i,k)|² (Example 6) may be used as ZGRID_g(i,k). In every case, the value concentrates around the value corresponding to the arrival direction of the target signal only in time-frequency bins (i,k) where the target signal is present. Whether the target signal is present can therefore be judged by using the unevenness of the distribution of ZGRID_g(i,k) as an index.
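Examples 3 to 6 are simple per-bin functions of the two complex values, as the sketch below shows with sensor 1 as the reference. The usage highlights one practical difference between Examples 3 and 4. The phase of the ratio is always wrapped into (−π, π], whereas the difference of two separately wrapped phases can range over (−2π, 2π). The dictionary keys and function name are hypothetical.

```python
import cmath
import math


def pair_features(x1, x2):
    """Normalized values of Examples 3-6 for one bin (sensor 1 is the reference)."""
    return {
        "phase_of_ratio": cmath.phase(x2 / x1),                 # Example 3
        "phase_difference": cmath.phase(x2) - cmath.phase(x1),  # Example 4
        "amplitude_ratio": abs(x2) / abs(x1),                   # Example 5
        "power_ratio": abs(x2) ** 2 / abs(x1) ** 2,             # Example 6
    }


feats = pair_features(2 * cmath.exp(3.0j), cmath.exp(-3.0j))
# Example 3 wraps -6 rad to 2*pi - 6; Example 4 keeps the unwrapped -6.
print(feats["phase_of_ratio"], 2 * math.pi - 6.0, feats["phase_difference"])
```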

The examples above assume two sensors. With three or more sensors, the estimated azimuth θ_g(i,k) and estimated elevation φ_g(i,k) of the target signal's arrival direction may be obtained, for example as follows, and the pair used as the normalized signal value ZGRID_g(i,k) of bin (i,k) (Example 7).

First, let d_s = [x, y, z] be the coordinate vector of sensor 20-s (s = 1, ..., S) in space. Take the J-th sensor 20-J (J ∈ {1, ..., S}) as the reference sensor. The matrix D, whose rows are the displacement vectors from the reference sensor 20-J to each sensor 20-s, is defined by equation (9), where [·]^T denotes transposition.

$$D = [\,d_1 - d_J,\; d_2 - d_J,\; \ldots,\; d_S - d_J\,]^{T} \qquad (9)$$

Next, the arrival time difference τ_g(s,i,k) between the reference sensor 20-J and each sensor 20-s is obtained by equation (10). These differences are collected into the arrival time difference vector τ_g'(i,k) of equation (11).

$$\tau_g'(i,k) = [\,\tau_g(1,i,k),\; \tau_g(2,i,k),\; \ldots,\; \tau_g(S,i,k)\,]^{T} \qquad (11)$$

Equations (9) to (11) satisfy relation (12), from which the estimated azimuth θ_g(i,k) and elevation φ_g(i,k) of the target signal are obtained. In equation (12), D^{-1} denotes a generalized inverse such as the Moore-Penrose generalized inverse. The azimuth is the arrival direction of the target signal in the x-y plane, and the elevation is its arrival direction in the x-z plane. The y-axis direction corresponds to 0 rad.

$$\nu \, D^{-1} \, \tau_g'(i,k) = [\,\cos\theta_g(i,k)\cos\phi_g(i,k),\; \sin\theta_g(i,k)\sin\phi_g(i,k),\; \sin\phi_g(i,k)\,]^{T} \qquad (12)$$

Two or more of the normalized signal values of Examples 1 to 7 may also be combined, so that two or more normalized signal values ZGRID_g(i,k) are computed for each time-frequency bin (i,k) (Example 8). For example, the pair consisting of the phase difference arg(XGRID_g(2,i,k)/XGRID_g(1,i,k)) and the amplitude ratio |XGRID_g(2,i,k)|/|XGRID_g(1,i,k)| may serve as the normalized signal value ZGRID_g(i,k) of bin (i,k). Similarly, with S = 3, the pair consisting of the phase difference arg(XGRID_g(2,i,k)/XGRID_g(1,i,k)) and the amplitude ratio |XGRID_g(3,i,k)|/|XGRID_g(1,i,k)| may be used. A mapping of any of the values generated above may also be used as ZGRID_g(i,k) (end of the examples of the normalized signal value ZGRID_g(i,k)).
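Relation (12) of Example 7 can be sketched as a small least-squares problem. The sketch computes q = ν·D⁺·τ'(i,k) through the normal equations (DᵀD)q = ν·Dᵀτ'. This equals the Moore-Penrose solution only when D has full column rank, which in practice means at least four non-coplanar sensors. With three sensors, D has rank at most 2, and an SVD-based pseudo-inverse would be needed instead. Per equation (12), the third component of q is sin φ. The test geometry, the 0-based J, and the function names are hypothetical. Time differences are synthesized as τ' = D·q/ν, which is the relation (12) inverts.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-18:
            raise ValueError("D^T D is singular; use an SVD-based pseudo-inverse")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def direction_vector(positions, tau, J, nu=340.0):
    """Least-squares q = nu * pinv(D) @ tau' for equation (12).

    positions: S sensor coordinates [x, y, z]; tau: arrival-time differences
    relative to sensor J (0-based here). Normal equations need full column rank.
    """
    D = [[p[c] - positions[J][c] for c in range(3)] for p in positions]
    DtD = [[sum(row[r] * row[c] for row in D) for c in range(3)] for r in range(3)]
    Dtt = [nu * sum(row[r] * t for row, t in zip(D, tau)) for r in range(3)]
    return solve3(DtD, Dtt)


# Usage: synthesize tau' = D q / nu for a known unit vector q, then recover it.
positions = [[0.0, 0.0, 0.0], [0.05, 0.01, 0.0], [0.0, 0.04, 0.02],
             [0.01, 0.0, 0.05], [0.03, 0.03, 0.03]]
q_true = [0.6, 0.0, 0.8]
tau = [sum((p[c] - positions[0][c]) * q_true[c] for c in range(3)) / 340.0
       for p in positions]
q = direction_vector(positions, tau, J=0)
phi = math.asin(max(-1.0, min(1.0, q[2])))  # third component of (12) is sin(phi)
print(q, phi)
```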

As described above, in step S6 the normalization unit 15 generates and outputs the normalized signal values ZGRID₁(i,k), ..., ZGRID_G(i,k).

The normalized signal values ZGRID₁(i,k), ..., ZGRID_G(i,k) output by the normalization unit 15 are input to the uneven distribution index value calculation unit 16. For each grid, unit 16 obtains an uneven distribution value H₁(i,k), ..., H_G(i,k) indicating how unevenly the normalized signal values ZGRID₁(i,k), ..., ZGRID_G(i,k) are distributed within that grid. From these per-grid values, it calculates an uneven distribution index value H(i) indicating the unevenness of the normalized signal values in each frame i (step S7). An example of step S7 is described below.

[Example of step S7]
In this example, the histogram generation unit 16a (FIG. 2) of the uneven distribution index value calculation unit 16 first quantizes each input normalized signal value ZGRID₁(i,k), ..., ZGRID_G(i,k) into one of C values Z(c) (c = 1, ..., C). It then computes, for each grid, the frequencies bin₁(i,k,c), ..., bin_G(i,k,c) (c = 1, ..., C) of the quantized values Z(c), which gives a histogram for each grid (step S71). For example, if ZGRID_g(i,k) is the signal arrival direction θ_g(i,k) and C = 32, each ZGRID_g(i,k) is quantized into one of the following C values Z(c):

Z(1): −π/2 ≤ ZGRID_g(i,k) < −15π/32
Z(2): −15π/32 ≤ ZGRID_g(i,k) < −7π/16
...
Z(C): 15π/32 ≤ ZGRID_g(i,k) ≤ π/2

When the arrival time difference τ_g(i,k) computed by equation (7) is used as ZGRID_g(i,k), the histogram generation unit 16a instead quantizes ZGRID_g(i,k) into C values over the range |τ_g(i,k)| ≤ (d/ν)×α, for example, where α is a positive constant.

The histogram generation unit 16a then determines, for each time-frequency bin (i,k), which quantized value Z(c) the normalized signal value ZGRID_g(i,k) falls into. It counts these occurrences for each grid and generates a histogram for each grid. In doing so, unit 16a may weight the counts of the quantized values Z(c) by a weighting coefficient and build each grid's histogram from the weighted counts. For example, each occurrence may be weighted by the weighting coefficient W(i,k) of the corresponding time-frequency bin (i,k). More specifically, if the normalized signal value of time-frequency bin (1,2) is quantized to Z(5), W(1,2) is added to the count for Z(5). In other words, the frequency bin_g(i,k,c) (c = 1, ..., C) of the quantized value Z(c) for the normalized signal values ZGRID_g may be counted, for example, as in equation (13).

$$\mathrm{bin}_g(i,k,c) = \sum_{(P,Q)\,:\;\mathrm{ZGRID}_g(i+P,\,k+Q)\,\in\, Z(c)} W(i+P,\, k+Q) \qquad (13)$$
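Quantization and the weighted count of equation (13) can be sketched together for the arrival-direction case. The sketch assumes C uniform bins over [−π/2, π/2], with the upper edge π/2 assigned to Z(C) and out-of-range values clamped to the end bins. One grid's normalized values and their weights W are passed as parallel lists. The names `quantize` and `grid_histogram` are hypothetical.

```python
import math


def quantize(z, C, lo=-math.pi / 2, hi=math.pi / 2):
    """Index c in 1..C of the uniform bin of [lo, hi] containing z.

    z == hi falls in bin C; values outside [lo, hi] are clamped to the end bins.
    """
    c = int((z - lo) / (hi - lo) * C) + 1
    return min(max(c, 1), C)


def grid_histogram(values, weights, C, lo=-math.pi / 2, hi=math.pi / 2):
    """Weighted histogram bin_g(c), c = 1..C, of one grid's normalized values (eq. 13)."""
    hist = [0.0] * C
    for z, w in zip(values, weights):
        hist[quantize(z, C, lo, hi) - 1] += w
    return hist


hist = grid_histogram([0.01, 0.02, -1.5, 1.5], [1.0, 2.0, 0.5, 0.25], C=32)
print(hist[16], hist[0], hist[31])  # 3.0 0.5 0.25
```

The two near-broadside values fall into bin 17 (list index 16) and their weights add to 3.0. Setting every weight to 1 gives a plain count histogram.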

<< Examples of the weighting coefficient W(i,k) >>
Examples of the weighting coefficient W(i,k) follow. One example sums the powers of the frequency domain signals X(1,i,k), ..., X(S,i,k) of all sensors and normalizes that sum by the total power of X(1,i,k), ..., X(S,i,k) over all sensors and all frequencies, as in equation (14) (Example 1 of W(i,k)).

Alternatively, W(i,k) may be the sum of the amplitude magnitudes |X(1,i,k)|, ..., |X(S,i,k)| of all sensors, normalized by the total of these magnitudes over all sensors and all frequencies, as in equation (15) (Example 2 of W(i,k)).
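Examples 1 to 3 of W(i,k) can be sketched with one function. Equations (14) and (15) are not reproduced in this section, so the normalizer is *assumed* to be the total over all sensors and all bins of the same frame i. `X_frame[s][k]` holds X(s,i,k) for one frame. The function name and its parameters are hypothetical.

```python
def bin_weights(X_frame, p=2, normalize=True):
    """Weights W(i,k) for one frame i from the spectra of all sensors.

    X_frame[s][k] holds X(s,i,k). p=2 sums powers (Example 1), p=1 sums
    amplitudes (Example 2); normalize=False gives the unnormalized Example 3.
    The normalizer is assumed to be the total over all sensors and bins of frame i.
    """
    n_bins = len(X_frame[0])
    per_bin = [sum(abs(spec[k]) ** p for spec in X_frame) for k in range(n_bins)]
    if not normalize:
        return per_bin
    total = sum(per_bin)
    return [v / total for v in per_bin] if total > 0 else per_bin


X_frame = [[1.0, 2j, 0.0], [1.0, 0.0, 3.0]]  # S = 2 sensors, 3 bins
print(bin_weights(X_frame, p=2))  # powers 2, 4, 9 divided by 15
print(bin_weights(X_frame, p=1))  # amplitudes 2, 2, 3 divided by 7
```

Passing a single sensor's spectrum with `normalize=False`, e.g. `bin_weights([X_frame[J]], p=1, normalize=False)`, gives the single-sensor weights |X(J,i,k)| and |X(J,i,k)|² (with p=2) of Example 4.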

W(i,k) may also be obtained without the normalization of equations (14) and (15) (Example 3 of W(i,k)). Depending on the noise environment, the target signal section can still be estimated adequately in this case. For example, W(i,k) may be obtained as in equations (16) and (17).

Instead of summing the amplitude magnitudes or powers of X(1,i,k), ..., X(S,i,k) over all sensors, W(i,k) may be the sum over only some of the sensors. It may also be the amplitude magnitude or power of the frequency domain signal X(J,i,k) of a single sensor 20-J, as in equations (18) and (19) (Example 4 of W(i,k)). In this case, it is desirable to use the signal X(J,i,k) of the sensor 20-J that is as close as possible to the signal source, ideally the closest one. The closer a sensor is to the source, the less its signal is affected by delay and convolution, so a more appropriate W(i,k) can be calculated.

$$W(i,k) = |X(J,i,k)| \qquad (18)$$
$$W(i,k) = |X(J,i,k)|^2 \qquad (19)$$

W(i,k) may also be a fixed value such as 1. The system may further be configured to switch, depending on the noise environment and the condition of the target signal, between a fixed W(i,k) such as 1 and a W(i,k) computed sequentially as in Examples 1 to 4 (end of the examples of the weighting coefficient W(i,k)).

FIG. 11 shows examples of histograms generated as described above. The horizontal axis is the quantized normalized signal value (signal arrival direction) Z(c), and the vertical axis is the normalized weighted frequency bin_g(i,k,c). FIG. 11(a) is a histogram for a grid containing time-frequency bins in which the target signal is present. FIG. 11(b) is a histogram for a grid containing time-frequency bins with no target signal, in which only the noise signal is present. In these examples, the weighting coefficient W(i,k) is 1.

As the comparison of FIGS. 11(a) and 11(b) shows, the two cases differ clearly. The histogram of the grid containing the target signal (FIG. 11(a)) concentrates around a particular value of Z(c), showing high unevenness. The histogram of the grid containing only noise (FIG. 11(b)) spreads widely.

The histogram generation unit 16a outputs the values bin_g(i,k,c) (c = 1, ..., C) that specify each grid's histogram, and these are input to the probability density function calculation unit 16b.

Using bin_g(i,k,c), the probability density function calculation unit 16b treats each histogram as a probability density function P_g(i,k,c), as in equation (20). Its random variable is the index c = 1, ..., C of the quantized normalized signal values. Unit 16b computes and outputs P_g(i,k,c) (step S72).

The probability density function P_g(i,k,c) of each grid is input to the entropy calculation unit 16c. Unit 16c computes the entropy H_g(i,k) of each grid as in equation (21) and outputs it as that grid's uneven distribution value.
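Equations (20) and (21) are not reproduced in this section. The sketch assumes the usual forms: P_g(c) = bin_g(c) / Σ_c bin_g(c), and the Shannon entropy H = −Σ_c P(c) log P(c) with the natural logarithm. The base only rescales H and does not change which grids rank as more concentrated. Empty bins contribute nothing. The function names are hypothetical.

```python
import math


def histogram_to_pdf(hist):
    """P_g(c) = bin_g(c) / sum_c bin_g(c)."""
    total = sum(hist)
    return [h / total for h in hist]


def entropy(pdf):
    """H = -sum_c P(c) log P(c) (natural log); empty bins contribute nothing."""
    return sum(-p * math.log(p) for p in pdf if p > 0)


peaked = entropy(histogram_to_pdf([0.0, 0.0, 5.0, 0.0]))  # all mass in one bin
flat = entropy(histogram_to_pdf([1.0] * 32))              # uniform over C = 32
print(peaked, flat, math.log(32))
```

A histogram concentrated in one bin has zero entropy, and a uniform one reaches the maximum log C. This is the contrast between FIG. 11(a) and FIG. 11(b).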

The entropy H_g(i,k) computed in this way is low when the histogram of the normalized signal values Z(c) concentrates around a particular value and high when it spreads widely, so it reflects how skewed the histogram is. For a grid containing time-frequency bins in which the target signal is present, as in FIG. 11(a), the normalized signal values Z(c) concentrate around a particular value, so the entropy H_g(i,k) is small. To invert the sense of the entropy, the entropy calculation unit 16c may additionally perform the following calculation and output its result as each grid's uneven distribution value.

The result of this calculation inverts the magnitude of the entropy H_g(i,k): it takes a large value in grids where the target signal exists and a small value in the other grids, and so it also indicates the bias of the histogram distribution. Hereinafter, the uneven-distribution value of each grid output from the entropy calculation unit 16c, including the result of equation (21), is denoted H_g(i,k).

The uneven-distribution value H_g(i,k) of each grid output from the entropy calculation unit 16c is input to the averaging unit 16d. The averaging unit 16d averages the uneven-distribution values H_g(i,k) of the grids corresponding to the same frame i and takes the average as the uneven-distribution index value H(i) of frame i (step S74). That is, the averaging unit 16d sums the uneven-distribution values H_g(i,k) of the grids corresponding to frame i over g = 1, ..., G and divides the sum by G to obtain the uneven-distribution index value H(i) of frame i.
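A minimal Python sketch of the per-grid entropy computation and the frame averaging of step S74 follows. It assumes that the normalized signal values of a grid are quantized uniformly into C levels over a known range [z_min, z_max], that equation (20) normalizes the histogram counts to sum to one, and that equation (21) is the Shannon entropy with the natural logarithm. The function names `grid_entropy` and `frame_index` are hypothetical, and the optional inversion step is omitted.

```python
import math
from collections import Counter


def grid_entropy(z_values, num_levels, z_min, z_max):
    """Uneven-distribution value H_g of one grid (entropy of its histogram).

    Assumptions: values are quantized uniformly into num_levels bins over
    [z_min, z_max]; the histogram is normalized into probabilities (cf. eq. (20))
    and the Shannon entropy in nats is taken (assumed form of eq. (21)).
    """
    width = (z_max - z_min) / num_levels
    counts = Counter()
    for z in z_values:
        c = int((z - z_min) / width)
        counts[min(max(c, 0), num_levels - 1)] += 1  # clamp out-of-range values
    total = sum(counts.values())
    probs = [n / total for n in counts.values()]  # only non-empty bins appear
    return sum(-p * math.log(p) for p in probs)


def frame_index(grids, num_levels, z_min, z_max):
    """H(i): mean of the per-grid values H_g over the G grids of frame i."""
    values = [grid_entropy(g, num_levels, z_min, z_max) for g in grids]
    return sum(values) / len(values)


# Concentrated values (target-like) versus widely spread values (noise-like).
concentrated = [0.12, 0.13, 0.14, 0.12, 0.13]
spread = [-0.9, -0.5, 0.0, 0.5, 0.9]
print(grid_entropy(concentrated, 20, -1.0, 1.0))  # 0.0 (all values in one bin)
print(grid_entropy(spread, 20, -1.0, 1.0))        # about 1.609 (log 5)
```

The concentrated grid yields zero entropy and the spread grid the maximum for five samples, matching the behavior described for Fig. 11(a).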

Here, entropy is used as the measure of histogram bias and serves as the uneven-distribution index value H(i); however, any other measure of how unevenly the normalized signal values ZGRID_g(i,k) are distributed may be used as H(i). Other examples of the uneven-distribution index value H(i) are given below.

<Modifications of the uneven-distribution index value H(i)>
For example, the uneven-distribution index value calculation unit 16 of Fig. 7 may be used instead of that of Fig. 2 (Modification 1 of the uneven-distribution index value H(i)). In this example, the variance is used as the uneven-distribution index value H(i). First, the normalized signal values ZGRID_g(i,k) are input to the average value calculation unit 16e of the uneven-distribution index value calculation unit 16. As in equation (22), the average value calculation unit 16e weights each normalized signal value ZGRID_g(i,k) by a weighting coefficient W(i,k) for each time-frequency bin (i,k), and obtains and outputs the weighted average E_g(i,k) for each grid. Here, μ is the number of elements of the normalized signal values ZGRID_g(i,k) in each grid.

The average value E_g(i,k) and the normalized signal values ZGRID_g(i,k) are input to the variance calculation unit 16f of the uneven-distribution index value calculation unit 16, which calculates the variance H_g(i,k) as in equation (23) and outputs it as the uneven-distribution value H_g(i,k) of each grid.

The uneven-distribution value H_g(i,k) of each grid output from the variance calculation unit 16f is input to the averaging unit 16d, which averages the uneven-distribution values H_g(i,k) of the grids corresponding to the same frame i and takes the average as the uneven-distribution index value H(i) of frame i.
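The Python sketch below illustrates Modification 1 for a single grid. The exact forms of equations (22) and (23) are not given in this text, so it assumes E_g = (1/μ)·Σ W·Z, with μ the number of elements in the grid, and the unweighted population variance around E_g; the function names are hypothetical. With this index a smaller variance means a more concentrated distribution.

```python
def weighted_mean(z_values, weights):
    """E_g: weighted average of a grid's normalized signal values.

    Assumed form of eq. (22): (1/mu) * sum(W * Z), where mu is the number of
    elements in the grid (weights are assumed to be scaled accordingly).
    """
    if len(z_values) != len(weights):
        raise ValueError("z_values and weights must have the same length")
    mu = len(z_values)
    return sum(w * z for w, z in zip(weights, z_values)) / mu


def grid_variance(z_values, mean):
    """H_g as the variance around E_g (assumed form of eq. (23))."""
    mu = len(z_values)
    return sum((z - mean) ** 2 for z in z_values) / mu


z = [1.0, 2.0, 3.0, 4.0]
e = weighted_mean(z, [1.0, 1.0, 1.0, 1.0])
print(e, grid_variance(z, e))  # 2.5 1.25
```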

Alternatively, the uneven-distribution index value calculation unit 16 of Fig. 8 may be used instead of that of Fig. 2 (Modification 2 of the uneven-distribution index value H(i)). In this example, kurtosis is used as the uneven-distribution index value H(i).

In this case, the normalized signal values ZGRID_g(i,k) are first input to the average value calculation unit 16e of the uneven-distribution index value calculation unit 16, which, as in equation (22), weights each normalized signal value ZGRID_g(i,k) by the weighting coefficient W(i,k) for each time-frequency bin (i,k) and obtains and outputs the weighted average E_g(i,k). The average value E_g(i,k) and the normalized signal values ZGRID_g(i,k) are then input to the variance calculation unit 16g of the uneven-distribution index value calculation unit 16, which calculates and outputs the variance σ_g(i,k) in the same way as equation (23).

Further, the variance σ_g(i,k), the average value E_g(i,k) and the normalized signal values ZGRID_g(i,k) are input to the kurtosis calculation unit 16h, which obtains and outputs the kurtosis H_g(i,k), for example by equation (24).

The uneven-distribution value H_g(i,k) of each grid output from the kurtosis calculation unit 16h is input to the averaging unit 16d, which averages the uneven-distribution values H_g(i,k) of the grids corresponding to the same frame i and takes the average as the uneven-distribution index value H(i) of frame i.
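A Python sketch of the kurtosis step for one grid follows. Equation (24) is not given in this text, so the sketch assumes the common non-excess kurtosis built from population moments, that is, the fourth central moment divided by the squared variance; the function name is hypothetical, and the mean and variance are passed in as in the description above.

```python
def grid_kurtosis(z_values, mean, variance):
    """H_g as kurtosis: fourth central moment over squared variance.

    Assumed form of eq. (24) (non-excess kurtosis, population moments).
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    mu = len(z_values)
    m4 = sum((z - mean) ** 4 for z in z_values) / mu
    return m4 / variance ** 2


flat = [-1.0, 1.0, -1.0, 1.0]           # mean 0, variance 1
peaked = [0.0] * 8 + [1.0, -1.0]        # mean 0, variance 0.2
print(grid_kurtosis(flat, 0.0, 1.0))    # 1.0
print(grid_kurtosis(peaked, 0.0, 0.2))  # close to 5.0
```

The sharply peaked grid, where most values coincide, gives a much larger kurtosis than the two-valued spread grid.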

Other statistics that indicate how unevenly the normalized signal values ZGRID_g(i,k) are distributed, such as the standard deviation, may also be used as the uneven-distribution value H_g(i,k) of each grid and averaged over the grids of each frame i to give the uneven-distribution index value H(i).

Furthermore, when two or more kinds of normalized signal values ZGRID_g(i,k) (for example, a phase difference and an amplitude ratio) are generated for each time-frequency bin (i,k), two or more uneven-distribution index values H(i) may be calculated, one indicating the unevenness of each kind, or a single uneven-distribution index value H(i) may be calculated that indicates the unevenness of the vectors whose elements are these normalized signal values. The processing in the determination unit 17, described later, differs between these two cases. (End of <Modifications of the uneven-distribution index value H(i)> and of [Example of step S7].)

The uneven-distribution index value H(i) output from the uneven-distribution index value calculation unit 16 as described above is input to the determination unit 17, which uses H(i) as an index to determine whether each frame belongs to a target signal section (step S8).

In this embodiment, the determination unit 17 uses the division value, that is, the ratio of the uneven-distribution index value of the frame being judged to that of a frame in a non-target signal section, or a mapping of this division value, and judges the frame to be in a target signal section when that value is equal to or greater than a predetermined threshold (alternatively, when it exceeds the threshold).

[Details of step S8]
In the example shown in Figs. 3 and 6, the relative value calculation unit 17a (Fig. 3) of the determination unit 17 first takes, from the uneven-distribution index values calculated by equation (21) and output from the uneven-distribution index value calculation unit 16, the value H'(i) of the frame being judged and the value λ of a frame in a non-target signal section, where the target signal is presumed to be absent. It then calculates their ratio, the division value γ(i), as follows and outputs it (step S81). An example of a non-target signal section presumed to contain no target signal is an initial section such as i = 1, ..., 20.

γ(i) = H'(i) / λ ...(25)
Next, the division value γ(i) is input to the likelihood ratio calculation unit 17b, which calculates and outputs the likelihood ratio Λ(i) according to equation (26) (step S82). The logarithm in equation (26) is the natural logarithm. This likelihood ratio formula is disclosed, for example, in J. Sohn, N. S. Kim, and W. Sung, "A Statistical Model-Based Voice Activity Detection," IEEE Signal Processing Letters, Vol. 6, No. 1, pp. 1-3, 1999.
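A Python sketch of the division value of equation (25) follows. It assumes that λ is taken as the mean of the index values over the first 20 frames, presumed to contain no target signal, and that H increases when the target signal is present (as with the inverted entropy). The function name is hypothetical, and the likelihood ratio of equation (26) is not computed here.

```python
def division_values(h, noise_frames=20):
    """gamma(i) = H'(i) / lambda, as in eq. (25).

    Assumption: lambda is the mean of H over the first `noise_frames` frames,
    presumed to contain no target signal (cf. the initial section i = 1..20).
    """
    if len(h) < noise_frames:
        raise ValueError("need at least noise_frames frames")
    lam = sum(h[:noise_frames]) / noise_frames
    return [x / lam for x in h]


h = [2.0] * 20 + [4.0, 6.0]
print(division_values(h)[-2:])  # [2.0, 3.0]
```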

Next, the likelihood ratio Λ(i) is input to the threshold determination unit 17c, which compares Λ(i) with a predetermined threshold th, determines whether the corresponding frame i is in a target signal section, and outputs the result (step S83). Specifically, for example, the threshold determination unit 17c outputs 1, meaning that frame i contains the target signal, when Λ(i) is greater than th (or greater than or equal to th) (step S84), and outputs 0, meaning that frame i does not contain the target signal, when Λ(i) is smaller than th (or less than or equal to th) (step S85). The threshold th may be set from statistics of Λ(i), such as its time average (the average over several frames i) or its variance, or a fixed value such as th = 0.2 may be set in advance. As one example of setting th from such statistics, Λ(i) is computed for frames presumed to contain no target signal, and a value offset from their average by a predetermined margin is used as th.
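The thresholding of steps S83 to S85, with th derived from frames presumed to contain no target signal, can be sketched in Python as below. The choice of the first 20 frames as noise-only frames, the margin of 0.1 and the function names are assumptions for illustration; the scores stand in for Λ(i) values.

```python
def noise_threshold(scores, noise_frames=20, margin=0.1):
    """th = mean score over presumed noise-only frames plus a margin.

    The margin value and the choice of the first frames are assumptions.
    """
    return sum(scores[:noise_frames]) / noise_frames + margin


def decide(scores, th):
    """Steps S83-S85: 1 if the score exceeds th (target section), else 0."""
    return [1 if s > th else 0 for s in scores]


lam_ratio = [0.0] * 20 + [0.05, 0.5]  # illustrative Lambda(i) values
th = noise_threshold(lam_ratio)
print(decide(lam_ratio, th)[-2:])     # [0, 1]
```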

The method of determining target signal sections from the uneven-distribution index value H(i) is not limited to this. As described above, the magnitude of H(i) varies depending on whether frame i is in a target signal section, so any method may be used that evaluates the magnitude of H(i) and maps the result to a decision on whether each frame i is in a target signal section. Modifications of the target signal section determination method are given below.

[Modifications of the target signal section determination method]
For example, the determination unit 17 of Fig. 9 may be used instead of that of Fig. 3 (Modification 1 of the target signal section determination method). In this modification, the uneven-distribution index value H'(i) of the frame being judged and the value λ of a frame in a non-target signal section presumed to contain no target signal are input to the first value calculation unit 17a, which calculates and outputs their ratio, the division value γ(i), as in equation (25). The division value γ(i) is then input to the threshold determination unit 17d, which compares γ(i) with the threshold th for each frame i. If γ(i) is greater than th (or greater than or equal to th), the corresponding frame i is judged to be in a target signal section; otherwise it is judged to be in a non-target signal section, and the result (1 or 0) is output. Instead of the division value γ(i), the difference obtained by subtracting λ from H'(i) may be used, with the same thresholding applied to it, to estimate whether the frame is in a target signal section.

Alternatively, the determination unit 17 of Fig. 10(a) may be used instead of that of Fig. 3 (Modification 2 of the target signal section determination method). In this modification, the uneven-distribution index value H(i) calculated by equation (21) and output from the uneven-distribution index value calculation unit 16 is input to the threshold determination unit 17i of the determination unit 17. The threshold determination unit 17i compares H(i) with the threshold th for each frame i. If H(i) is greater than th (or greater than or equal to th), the corresponding frame i is judged to be in a target signal section; otherwise it is judged to be in a non-target signal section, and the result (1 or 0) is output. The threshold th is set dynamically, for example by the threshold calculation unit 17h, from statistics of the input index values H(i), such as their average over frames i. The threshold th may also be a fixed value.

The target signal section may also be determined as above using uneven-distribution index values H(i) other than those described. In that case, the direction of the threshold test depends on the characteristics of H(i). When using an index H(i) whose value increases as the distribution becomes more uneven, the frame is judged to be in a target signal section when H(i) or its mapping exceeds a predetermined threshold (or is equal to or greater than it), and not to be in a target signal section when it is below the threshold (or equal to or less than it). Conversely, when using an index H(i) whose value increases as the distribution becomes less uneven, the frame is judged not to be in a target signal section when H(i) or its mapping exceeds the threshold (or is equal to or greater than it), and to be in a target signal section when it is below the threshold (or equal to or less than it).
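The two polarities can be captured in one Python function that thresholds H(i) directly, as in Modification 2. Setting th to a scaled mean of H over the available frames is an assumption standing in for the threshold calculation unit, and the function name is hypothetical.

```python
def decide_by_index(h, larger_is_more_uneven=True, scale=1.0):
    """Threshold H(i) directly with a threshold derived from its statistics.

    Assumption: th = scale * mean(H) over the available frames. For indices
    that grow with unevenness (e.g. inverted entropy) a frame above th is a
    target frame; for indices that shrink with unevenness (e.g. entropy or
    variance) a frame below th is a target frame.
    """
    th = scale * sum(h) / len(h)
    if larger_is_more_uneven:
        return [1 if x > th else 0 for x in h]
    return [1 if x < th else 0 for x in h]


print(decide_by_index([1.0, 1.0, 4.0]))                                # [0, 0, 1]
print(decide_by_index([3.0, 3.0, 0.0], larger_is_more_uneven=False))  # [0, 0, 1]
```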

Even when two or more kinds of normalized signal values ZGRID_g(i,k) are generated for each time-frequency bin (i,k) and an uneven-distribution index value H(i), indicating the unevenness of the vectors whose elements are these values in each grid, is calculated for each frame i, the determination unit 17 can determine whether a frame is in a target signal section in the same way as above.

On the other hand, when two or more kinds of normalized signal values ZGRID_g(i,k) are generated for each time-frequency bin (i,k) and two or more uneven-distribution index values H(i), each indicating the unevenness of one kind of value in each grid, are calculated for each frame i, the determination unit 17, for example, weights the two or more index values H(i) of each frame i and uses the weighted result as the index for deciding whether frame i is in a target signal section. Specifically, for example, frame i is judged according to whether the weighted sum of its two or more uneven-distribution index values H(i) exceeds a predetermined threshold.
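A Python sketch of the weighted-sum decision for two or more index values per frame follows. The weights, the threshold and the function name are hypothetical placeholders; the text does not specify how they are chosen.

```python
def weighted_sum_decision(indices, weights, th):
    """Combine several per-frame index values H(i) by a weighted sum.

    indices: one list of H(i) per kind of normalized signal value (e.g.
    phase difference, amplitude ratio); weights and th are hypothetical.
    Returns 1 for frames whose weighted sum exceeds th, else 0.
    """
    n_frames = len(indices[0])
    out = []
    for i in range(n_frames):
        s = sum(w * h[i] for w, h in zip(weights, indices))
        out.append(1 if s > th else 0)
    return out


h_phase = [2.0, 0.5]
h_amp = [1.0, 0.5]
print(weighted_sum_decision([h_phase, h_amp], [0.5, 0.5], th=1.0))  # [1, 0]
```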

Instead of comparing the uneven-distribution index value H(i), or its mapping, with a predetermined threshold as above, whether the frame corresponding to an index value calculated by the uneven-distribution index value calculation unit is in a target signal section may be determined by pattern recognition, using a relationship learned in advance between the index values of frames and the decisions of whether those frames are in target signal sections. In this case, for example, as in the determination unit 17 of Fig. 10(b), learning samples, each pairing an acoustic feature of a frame (such as an uneven-distribution index value or γ(i)) with the decision of whether that frame is in a target signal section, are input to the parameter learning unit 17h, which performs pattern recognition learning to obtain model parameters. These parameters and the uneven-distribution index value H(i) to be judged are then input to the pattern recognition unit 17i, which determines by pattern recognition whether the frame i corresponding to H(i) is in a target signal section. Known pattern recognition techniques may be used, such as support vector machines (Koji Tsuda, "What is a support vector machine," Journal of the Institute of Electronics, Information and Communication Engineers, 2000, pp. 460-466) or hidden Markov models (Kenji Kita, Satoshi Nakamura, Masaaki Nagata, "Spoken Language Processing," Mori Publishing Co., Ltd., 1996, pp. 57-90).

Instead of outputting the decision of whether each frame is in a target signal section, the determination unit 17 may output the likelihood ratio Λ(i) itself, or a value converted to a probability, such as Λ(i)/(1 + Λ(i)).
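The conversion Λ/(1 + Λ) maps a non-negative likelihood ratio onto [0, 1). When the two hypotheses have equal prior probabilities, it equals the posterior probability of the target-signal hypothesis. The Python sketch below assumes Λ(i) is the ratio itself rather than its logarithm; the function name is hypothetical.

```python
def to_probability(likelihood_ratio):
    """Map a likelihood ratio in [0, inf) to [0, 1) via L / (1 + L)."""
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    return likelihood_ratio / (1.0 + likelihood_ratio)


print(to_probability(1.0), to_probability(3.0))  # 0.5 0.75
```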

<Experimental results>
The following experimental results demonstrate the effect of this embodiment. In this experiment, two microphones were used as sensors to observe an acoustic signal containing a mixture of speech and noise, and the signal was analyzed by the signal section estimation method of this embodiment to detect speech sections. The estimated direction of arrival of the signal was used as the normalized signal value Z(i,k), and the uneven-distribution index value calculated by equation (21) and output from the uneven-distribution index value calculation unit 16 was used as H(i) to estimate the target signal sections.

The acoustic data used were recordings of students presenting their own research with posters in a university laboratory, discretely sampled at 16 kHz with 16-bit quantization. Two microphones, placed on the same straight line 4 cm apart, were used for recording. Fig. 12(a) shows the recorded acoustic signal; its horizontal axis is time and its vertical axis is the signal amplitude. At the beginning, the signal contains the sound (directional noise) of the door of the laboratory where the poster presentation was taking place being opened and closed. The signal section estimation method of this embodiment was applied to this signal with a frame length of 32 ms (512 samples) and a frame shift of 16 ms (256 samples). Fig. 12(b) is a graph of the uneven-distribution index value H(i) (acoustic feature) estimated for each frame; its horizontal axis is time and its vertical axis is the magnitude of H(i). Fig. 12(c) shows the uneven-distribution index value H(i) obtained by the method described in Non-Patent Document 6.
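The framing used in this experiment (512-sample frames advanced by 256 samples at 16 kHz) can be sketched in Python as follows. Dropping trailing samples that do not fill a whole frame is an assumption, as is the function name; windowing and the DFT are left out.

```python
def frame_signal(x, frame_len=512, hop=256):
    """Split a sample sequence into overlapping frames.

    At fs = 16 kHz, 512 samples = 32 ms and a 256-sample hop = 16 ms, the
    settings of this experiment. Trailing samples that do not fill a whole
    frame are dropped (an assumption; padding is another option).
    """
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


fs = 16000
print(1000 * 512 / fs, 1000 * 256 / fs)  # 32.0 16.0 (milliseconds)
frames = frame_signal(list(range(fs)))    # one second of dummy samples
print(len(frames))                        # 61
```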

As shown, the uneven-distribution index value H(i) output by the target signal section estimation method of this embodiment takes high values in section B, where the speech signal is present, and small values elsewhere. Comparing Figs. 12(b) and 12(c), in section A of the data in Fig. 12(a), which contains only the door opening and closing sound, H(i) stays small with the method of this embodiment (Fig. 12(b)), whereas the method of Non-Patent Document 6 yields values as high as in the speech section (Fig. 12(c)). This shows that the uneven-distribution index value H(i) obtained by this embodiment is robust to directional noise without harmonic structure, such as the sound of a door opening and closing. In addition, because this embodiment uses the unevenness of the arrival time differences only in the vicinity of the fundamental frequency and its harmonics, H(i) can be computed faster than with the method of Non-Patent Document 6, which computes the unevenness of the arrival time differences over the entire time-frequency band. In this experiment, the method of this embodiment computed the acoustic feature in 9% of the computation time of the method of Non-Patent Document 6. The worst-case computational cost can be calculated as
(sampling frequency / lower limit of the fundamental frequency search range) / number of DFT points.
For the sampling frequency (16 kHz), number of DFT points (512) and lower limit of the fundamental frequency search range (50 Hz) used in this embodiment, this gives (16000/50)/512 = 0.625, that is, 62.5% of the computation. In practice, the estimated fundamental frequency is distributed between 50 and 500 Hz, so, as the above experiment shows, the computation takes less time than this worst case.
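For reference, the worst-case formula can be evaluated directly in Python. With the values of this embodiment (16 kHz, 50 Hz, 512 points) it yields 320/512 = 0.625. Substituting 500 Hz, the upper end of the observed fundamental-frequency range, for the lower limit gives 0.0625; this is only an illustration, under the assumption that the cost scales with the number of harmonics below the sampling frequency, of why typical speech needs less computation than the worst case. The function name is hypothetical.

```python
def worst_case_fraction(fs, f0_min, n_dft):
    """(fs / f0_min) / n_dft, the worst-case fraction from the formula above."""
    return (fs / f0_min) / n_dft


print(worst_case_fraction(16000, 50, 512))   # 0.625
print(worst_case_fraction(16000, 500, 512))  # 0.0625
```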

From the above, this embodiment makes it possible to detect target acoustic signal sections quickly while being less affected by directional noise. Compared with the method of Non-Patent Document 6, this embodiment additionally requires estimating the fundamental frequency of the observed signal, but because the fundamental frequency estimation method described in this embodiment runs fast, it does not affect the overall computational cost.

The present invention is not limited to the above embodiment. For example, in step S4 of this embodiment, the fundamental frequency estimation unit 13 estimates the fundamental frequency for every sensor and uses each estimate in the subsequent processing for the corresponding sensor. However, in step S4 the fundamental frequency estimation unit 13 may instead estimate the fundamental frequency for only some of the sensors (for example, one sensor) and use it in the subsequent processing for all sensors.

The various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the device that executes them, or as required. The signal section estimation device 10 may also include the sampling unit 30, or its functions may be distributed across multiple computers. Needless to say, other modifications are possible without departing from the spirit of the present invention.

When the above configuration is implemented by a computer, the processing content of the functions that each device should have is described by a program, and executing this program on the computer implements these processing functions on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium, which may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring or lending a portable recording medium, such as a DVD or CD-ROM, on which it is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to that program. Alternatively, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the server computer does not transfer the program to this computer and the processing functions are realized solely through execution instructions and result acquisition. Note that the program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the apparatus is configured by executing a predetermined program on a computer; however, at least part of the processing may instead be implemented in hardware.

Applications of the present invention include, for example, acoustic signal processing in environments where a target signal, such as speech or music, is observed together with noise: encoding of the target signal, noise suppression, dereverberation, and automatic speech recognition. The present invention may, of course, also be applied to signal processing other than acoustic signals.

FIG. 1 is a block diagram illustrating the overall configuration of the target signal section estimation device of the present embodiment.
FIG. 2 is a block diagram illustrating the detailed configuration of the uneven distribution index value calculation unit 16 of FIG. 1.
FIG. 3 is a block diagram illustrating the detailed configuration of the determination unit 17 of FIG. 1.
FIG. 4 is a flowchart for explaining the target signal section estimation method of the present embodiment.
FIG. 5 is a flowchart for explaining the details of step S7.
FIG. 6 is a flowchart for explaining the details of step S8.
FIG. 7 is a block diagram illustrating a configuration example of the uneven distribution index value calculation unit.
FIG. 8 is a block diagram illustrating a configuration example of the uneven distribution index value calculation unit.
FIG. 9 is a block diagram illustrating a configuration example of the determination unit.
FIGS. 10(a) and 10(b) are block diagrams illustrating configuration examples of the determination unit.
FIGS. 11(a) and 11(b) are graphs showing histograms obtained by the processing of the present embodiment.
FIG. 12(a) is a graph showing the acoustic signals recorded in the experiment; FIG. 12(b) is a graph showing the uneven distribution index value H(i) (acoustic feature) estimated for each frame in the experiment; and FIG. 12(c) is a graph showing the uneven distribution index value H(i) obtained by the method described in Non-Patent Document 6.

Explanation of symbols

10 Signal section estimation device

Claims

1. A target signal section estimation device for estimating a target signal section, comprising:
A signal extraction unit that extracts each signal observed by a plurality of sensors for each frame, which is a predetermined time interval;
A frequency domain conversion unit that converts the signal of each frame extracted by the signal extraction unit into the frequency domain and generates, for each sensor, a frequency domain signal for each time-frequency bin;
A fundamental frequency estimation unit that estimates the fundamental frequency of the signal of each frame extracted by the signal extraction unit;
A time-frequency domain dividing unit that identifies, for each frame, one or more grids, which are finite time-frequency sections each containing the fundamental frequency or one of its harmonic components, and extracts the frequency domain signal of each time-frequency bin belonging to each grid;
A normalization unit that, taking as a reference the frequency domain signal extracted by the time-frequency domain dividing unit for a specific reference sensor among the sensors, normalizes each frequency domain signal extracted by the time-frequency domain dividing unit for at least the sensors other than the reference sensor, and generates, for each time-frequency bin, a normalized signal value corresponding to the direction of arrival of the signal observed by the sensors; and
An uneven distribution index value calculation unit that obtains, for each grid, an uneven distribution value indicating the uneven distribution of the normalized signal values, and uses the uneven distribution values of the grids to calculate, for each frame, an uneven distribution index value indicating the uneven distribution of the normalized signal values in that frame.
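The per-frame processing of claim 1 can be sketched for one sensor pair as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the fundamental frequency is taken as already estimated (no estimator is specified in this section), a grid is assumed to be the bins within plus or minus `half_width` bins of each harmonic in a single frame, and the normalization is assumed to be the phase-based delay tau = -arg(X_other / X_ref) / (2*pi*f), which relates to the direction of arrival through tau = d*cos(theta)/c for two sensors spaced d apart. The helpers `dft`, `harmonic_grids` and `normalized_delays` are hypothetical names.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) DFT of a real frame; returns bins 0..N/2 only."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def harmonic_grids(f0_hz, fs, n_fft, n_harmonics, half_width):
    """Hypothetical grid layout: one grid per harmonic h*f0 (h = 1..n_harmonics),
    holding the bins within +/- half_width bins of that harmonic,
    clipped to 1..n_fft//2 so that DC is never used."""
    bin_hz = fs / n_fft
    grids = []
    for h in range(1, n_harmonics + 1):
        center = round(h * f0_hz / bin_hz)
        lo = max(center - half_width, 1)
        hi = min(center + half_width, n_fft // 2)
        if lo <= hi:
            grids.append(list(range(lo, hi + 1)))
    return grids


def normalized_delays(spec_ref, spec_other, grid, fs, n_fft, eps=1e-12):
    """Assumed normalization: per-bin delay (seconds) of the other sensor
    relative to the reference sensor, tau = -arg(X_other / X_ref) / (2*pi*f).
    Bins where the reference spectrum is (near) zero are skipped."""
    taus = []
    for k in grid:
        if abs(spec_ref[k]) < eps:
            continue
        f_k = k * fs / n_fft
        taus.append(-cmath.phase(spec_other[k] / spec_ref[k]) / (2 * math.pi * f_k))
    return taus


# Example: white noise and a copy circularly delayed by D samples.
random.seed(0)
N, FS, D = 256, 8000, 2
ref = [random.gauss(0.0, 1.0) for _ in range(N)]
other = [ref[(t - D) % N] for t in range(N)]
X_ref, X_other = dft(ref), dft(other)
grids = harmonic_grids(250.0, FS, N, n_harmonics=3, half_width=1)
for g in grids:
    print(g, normalized_delays(X_ref, X_other, g, FS, N))
```

Because the second channel is a pure circular delay of the first, every value in every grid equals D/FS = 0.25 ms, the kind of tightly clustered distribution that the uneven distribution index value is meant to reward. The phase-based delay is only unambiguous while |2*pi*f*tau| stays below pi, which is one reason to restrict processing to a limited set of grids.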

2. The target signal section estimation device according to claim 1, further comprising:
A determination unit that determines, using the uneven distribution index value as an index, whether each frame corresponds to the target signal section.

3. The target signal section estimation device according to claim 1 or 2, wherein the uneven distribution index value calculation unit comprises:
A histogram generating unit that quantizes the normalized signal values, counts the frequency of each quantized normalized signal value for each grid, and generates a histogram for each grid;
An uneven distribution calculation unit that uses the histogram of each grid to calculate, for each grid, an uneven distribution value indicating how unevenly that histogram is distributed; and
An averaging unit that averages the uneven distribution values of the grids corresponding to the same frame and takes the average as the uneven distribution index value of that frame.

4. The target signal section estimation device according to claim 3, wherein the uneven distribution calculation unit comprises:
A probability density function generating unit that uses the histogram of each grid to obtain, for each grid, a probability density function whose random variable is a value corresponding to each quantized normalized signal value; and
An uneven distribution value calculation unit that calculates, as the uneven distribution value, a function value that monotonically increases with the entropy of the probability density function or a function value that monotonically decreases with that entropy.
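Claims 3 and 4 together describe a histogram, a probability distribution, an entropy-based value per grid, and an average per frame. The sketch below is one possible reading, not the patent's code: it uses a probability mass function over histogram bins as the discrete stand-in for the probability density function, and it assumes exp(-H) as the function that decreases monotonically with the entropy H (claim 4 only requires some monotonic function, so -H or 1/(1 + H) would equally qualify). The quantization width `step` and the function names are hypothetical; when the normalized values are delays, `step` might be a fraction of a sample period.

```python
import math
from collections import Counter


def grid_unevenness(values, step):
    """Uneven distribution value of one grid (assumed form exp(-H)).

    The normalized values are quantized with bin width `step`, the histogram
    is normalized into a probability mass function p, and exp(-H(p)) is
    returned, where H is the Shannon entropy in nats. The result is 1.0 when
    all values fall into one bin and 1/M when they are spread evenly over M
    bins. Returns None for an empty grid."""
    if not values:
        return None
    counts = Counter(math.floor(v / step) for v in values)
    total = len(values)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(-entropy)


def frame_unevenness_index(grid_values, step):
    """Average the uneven distribution values of all non-empty grids of a frame."""
    scores = [s for s in (grid_unevenness(v, step) for v in grid_values) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


# One perfectly clustered grid, one grid spread over four bins, one empty grid.
print(frame_unevenness_index([[1.0] * 10, [0.25, 0.75, 1.25, 1.75], []], 0.5))
```

A frame dominated by a single directional source yields clustered values and an index near 1, while diffuse noise spreads the values and pulls the index toward 1/M.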

5. The target signal section estimation device according to any one of claims 2 to 4, wherein the determination unit compares the uneven distribution index value of each frame, or a mapping thereof, with a predetermined threshold to determine whether each frame is in the target signal section.
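The threshold rule of claim 5 reduces to a per-frame comparison. A minimal sketch follows; the optional `mapping` argument stands for the "mapping thereof" in the claim (for example a logarithm), and the function names and threshold values are hypothetical.

```python
import math
from typing import Callable, List, Optional


def is_target_frame(index_value: float, threshold: float,
                    mapping: Optional[Callable[[float], float]] = None) -> bool:
    """True if the (optionally mapped) uneven distribution index value
    of a frame is at least `threshold`."""
    value = mapping(index_value) if mapping is not None else index_value
    return value >= threshold


def label_frames(index_values: List[float], threshold: float,
                 mapping: Optional[Callable[[float], float]] = None) -> List[bool]:
    """Apply the threshold rule to every frame; one bool per frame."""
    return [is_target_frame(v, threshold, mapping) for v in index_values]


print(label_frames([0.2, 0.6, 0.9], 0.5))
print(is_target_frame(0.5, -1.0, mapping=math.log))  # log(0.5) ~ -0.69 >= -1.0
```

Whether the comparison should be `>=` or `>` is a design detail the claim leaves open; claim 6 names both variants explicitly for its ratio-based rule.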

6. The target signal section estimation device according to any one of claims 2 to 4, wherein the determination unit comprises:
A threshold determination unit that computes a division value, which is the ratio of the uneven distribution index value of the frame to be determined to the uneven distribution index value of a frame in a non-target signal section, and determines that the frame to be determined is in the target signal section when the division value or a mapping thereof is greater than or equal to a predetermined threshold, or alternatively when it exceeds the predetermined threshold.
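The ratio test of claim 6 can be sketched as below. How the non-target reference frame is chosen is not stated in this section; the sketch assumes, purely for illustration, that the first few frames contain no target signal and uses their mean index as the reference. The ratio only behaves sensibly for positive index values (for example an exp(-entropy) form). All names are hypothetical.

```python
from typing import Callable, List, Optional


def noise_reference(index_values: List[float], n_initial: int) -> float:
    """Hypothetical non-target reference: mean index of the first n_initial
    frames, assumed to contain no target signal."""
    head = index_values[:n_initial]
    return sum(head) / len(head)


def is_target_by_ratio(index_value: float, noise_index_value: float, threshold: float,
                       strict: bool = False,
                       mapping: Optional[Callable[[float], float]] = None) -> bool:
    """Divide the frame's index by the non-target index and compare the ratio,
    or its mapping, with `threshold`: '>=' by default, '>' when strict=True."""
    ratio = index_value / noise_index_value
    value = mapping(ratio) if mapping is not None else ratio
    return value > threshold if strict else value >= threshold


idx = [0.25, 0.25, 0.5, 0.75]
ref = noise_reference(idx, 2)
print([is_target_by_ratio(v, ref, 2.0) for v in idx])
```

Normalizing by a noise-only frame makes the threshold relative rather than absolute, so the same threshold can be reused when the overall noise level changes.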

7. The target signal section estimation device according to any one of claims 2 to 4, wherein the determination unit determines whether the frame corresponding to the uneven distribution index value calculated by the uneven distribution index value calculation unit is in the target signal section, by pattern recognition using a relationship, learned in advance, between the uneven distribution index values of frames and the results of determining whether those frames are in the target signal section.
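Claim 7 does not name the pattern recognizer. As a deliberately simple stand-in, the sketch below learns a one-dimensional decision stump (a single threshold) from labeled training frames and then classifies new frames with it; a richer classifier could be substituted without changing the overall flow. The function names and training data are hypothetical.

```python
from typing import List


def learn_threshold(values: List[float], labels: List[bool]) -> float:
    """Fit a 1-D decision stump on labeled frames.

    Candidates are the midpoints between consecutive sorted training values,
    plus one point below and one above the range. The candidate with the
    fewest training errors for the rule 'value >= threshold -> target' is
    returned (the earliest candidate on ties)."""
    pairs = sorted(zip(values, labels))
    candidates = [pairs[0][0] - 1.0]
    candidates += [(a + b) / 2 for (a, _), (b, _) in zip(pairs, pairs[1:])]
    candidates.append(pairs[-1][0] + 1.0)

    def errors(threshold: float) -> int:
        return sum((v >= threshold) != is_target for v, is_target in pairs)

    return min(candidates, key=errors)


def classify(value: float, threshold: float) -> bool:
    """Apply the learned rule to a new frame's index value."""
    return value >= threshold


train_values = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
train_labels = [False, True, False, True, False, True]
th = learn_threshold(train_values, train_labels)
print(th, classify(0.65, th), classify(0.35, th))
```

Learning the boundary from labeled data replaces a hand-tuned threshold, at the cost of needing frames whose target/non-target status is known in advance.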

8. A target signal section estimation method performed by a target signal section estimation device for estimating a target signal section, the method comprising:
A step in which a signal extraction unit extracts each signal observed by a plurality of sensors for each frame, which is a predetermined time interval;
A step in which a frequency domain conversion unit converts the signal of each frame extracted by the signal extraction unit into the frequency domain and generates, for each sensor, a frequency domain signal for each time-frequency bin;
A step of estimating the fundamental frequency of the signal of each frame extracted by the signal extraction unit;
A step in which a time-frequency domain dividing unit identifies, for each frame, one or more grids, which are finite time-frequency sections each containing the fundamental frequency or one of its harmonic components, and extracts the frequency domain signal of each time-frequency bin belonging to each grid;
A step of normalizing, taking as a reference the frequency domain signal extracted by the time-frequency domain dividing unit for a specific reference sensor among the sensors, each frequency domain signal extracted by the time-frequency domain dividing unit for at least the sensors other than the reference sensor, and generating, for each time-frequency bin, a normalized signal value corresponding to the direction of arrival of the signal observed by the sensors; and
A step in which an uneven distribution index value calculation unit obtains, for each grid, an uneven distribution value indicating the uneven distribution of the normalized signal values, and uses the uneven distribution values of the grids to calculate, for each frame, an uneven distribution index value indicating the uneven distribution of the normalized signal values in that frame.

9. A target signal section estimation program for causing a computer to function as the target signal section estimation device according to claim 1.

10. A computer-readable recording medium storing the target signal section estimation program according to claim 9.