JP2010245984A

JP2010245984A - Device for correcting sensitivity of microphone in microphone array, microphone array system including the same, and program

Info

Publication number: JP2010245984A
Application number: JP2009094577A
Authority: JP
Inventors: Kazunobu Kondo; 多伸近藤; Makoto Yamada; 誠山田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2009-04-09
Filing date: 2009-04-09
Publication date: 2010-10-28
Anticipated expiration: 2029-04-09
Also published as: JP5240026B2

Abstract

PROBLEM TO BE SOLVED: To enable sensitivity correction to be performed properly without requiring special attention to whether the conditions for properly correcting sensitivity variations among the microphones of a microphone array are met.

SOLUTION: A microphone sensitivity correction device generates a separation matrix for separating the sources in two observation signals, which are obtained when each of two microphones M1 and M2 picks up a mixture of two sounds emitted from different sound sources S (S1, S2). The matrix is generated so that its first row has a null (blind spot) in the normal direction of the array surface and its second row has a null in the direction along which the microphones are arranged. The arrival direction of the sound suppressed by the elements of the first row is then estimated, and when that direction does not deviate greatly from the normal of the array surface, the signal level of the output signal of one of the microphones is corrected according to the ratio of the absolute values of the first-row elements.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for correcting variations in sensitivity among the microphones that constitute a microphone array.

A microphone array system is one example of a sound collection system whose directivity pattern can be set so that only sound arriving from a specific direction is picked up. Such a system includes a microphone array in which a plurality of microphones are arranged in one or two dimensions. It applies filter processing, such as FIR (Finite Impulse Response) filtering, to the audio signal output from each microphone, then mixes the filtered audio signals and outputs the result. The directivity pattern is adjusted by adjusting the filter coefficients used in this filter processing.
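As a rough illustration of the filter-and-mix structure described above, the following sketch filters each microphone channel with its own FIR coefficients and sums the results. The function names, the sample signals, and the coefficient values are illustrative assumptions and are not taken from the patent.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(channels, filters):
    """Filter each microphone channel with its own FIR, then mix by summation."""
    filtered = [fir_filter(x, h) for x, h in zip(channels, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Hypothetical two-channel example: the same impulse reaches mic 2 one sample
# later than mic 1, so mic 1 is delayed by one sample before mixing.
mic1 = [1.0, 0.0, 0.0, 0.0]
mic2 = [0.0, 1.0, 0.0, 0.0]
mixed = filter_and_sum([mic1, mic2], [[0.0, 1.0], [1.0]])
```

Changing the per-channel coefficients changes which arrival direction adds coherently, which is the sense in which the filter coefficients set the directivity pattern.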

In this type of sound collection system, the sensitivities of the microphones must be matched, because variations in sensitivity can interfere with adjustment of the directivity pattern. However, a microphone is a mechanical component, so manufacturing variation is unavoidable, and sensitivity may vary by ±4 dB or more at the manufacturing stage. If the sensitivities of the microphones in an array vary by about ±4 dB, degradation of the directional performance is inevitable. Various techniques have therefore been proposed for correcting such variations (for example, Patent Documents 1 and 2). Patent Document 1 discloses a technique in which one of the microphones of the array is used as a reference microphone and the gain is adjusted so that the output signal level of each other microphone equals that of the reference microphone. Patent Document 2 discloses a technique in which, among the microphones of the array, a microphone that has received an acoustic signal of constant frequency and constant sound pressure for at least a predetermined time is used as the reference microphone for correcting the sensitivity of the other microphones.

Patent Document 1: JP 7-131886 A
Patent Document 2: JP 2007-24618 A

However, techniques that correct sensitivity variations by adjusting the output level of the other microphones against one reference microphone cannot correct sensitivity properly when the sound source does not directly face the microphone array, that is, when the source is not located in the direction that passes through the center of the array surface and is perpendicular to it (hereinafter, the normal direction of the array surface). Sound from a distant source propagates through space as a plane wave, so when the source does not face the array, the difference in distance between each microphone and the source causes the sound pressure observed at each microphone position to differ. Consequently, when microphone sensitivity is corrected by the technique of Patent Document 1 or a similar technique, careful attention must be paid to whether the conditions under which the correction works properly (for example, that the source faces the array) are met, which is burdensome. One conceivable solution is to estimate the arrival direction of the sound from the microphone output signals and take that direction into account in the correction. However, conventional direction-of-arrival estimation techniques, such as methods based on steering vectors (including MVDR and MUSIC), assume that the sensitivities of the microphones are already matched. Such techniques therefore cannot serve as the basis for correcting sensitivity variations.

The present invention has been made in view of the above problem. Its object is to provide a technique that enables sensitivity correction to be performed properly without requiring special attention to whether the conditions for properly correcting sensitivity variations among the microphones of a microphone array are met.

To solve the above problem, the present invention provides a sensitivity correction device for the microphones of a microphone array, comprising: a frequency analysis unit that performs frequency analysis on each of M observation signals, obtained when each of the M microphones of the array picks up a mixture of M kinds of sound (M being a natural number of 2 or more) emitted from different sound sources, and calculates, for each microphone, time-series observation data indicating the signal intensity at each of a plurality of frequencies; a separation matrix generation unit that selects at least one of the plurality of frequencies and generates, by independent component analysis of the observation data for that frequency component, a separation matrix, which is an M-by-M complex-valued matrix for performing sound source separation on that frequency component; a direction estimation unit that, for each row of the separation matrix, estimates from the difference between the arguments (phase angles) of the row's elements the arrival direction of the sound suppressed by that row; and a sensitivity correction unit that, when the separation matrix has a row whose estimated arrival direction does not deviate greatly from the normal direction of the microphone array, corrects the variation in the output signal levels of the microphones according to the ratio of the absolute values of that row's elements. The invention also provides a program that causes a computer to function as each of these units.
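The direction estimation step for M = 2 can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: a far-field plane wave, microphone 1 as the phase reference, a speed of sound of 343 m/s, and the function name `estimate_null_direction`. Under that model a row [w1, w2] cancels a wave whose inter-microphone phase equals the argument of -w2/w1, which is the difference of the two arguments shifted by π.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def estimate_null_direction(w1, w2, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Angle (radians from the array normal) suppressed by the row [w1, w2].

    Assumed model: a plane wave from angle theta reaches mic 2 with relative
    phase exp(-j * 2*pi*f*d*sin(theta) / c) with respect to mic 1.
    """
    phase = cmath.phase(-w2 / w1)  # argument difference of the row, offset by pi
    sin_theta = phase * c / (2.0 * math.pi * freq_hz * spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against rounding
    return math.asin(sin_theta)


# Hypothetical check: build a row that nulls 30 degrees and recover the angle.
freq_hz, spacing_m = 1000.0, 0.05
theta_true = math.radians(30.0)
phi = 2.0 * math.pi * freq_hz * spacing_m * math.sin(theta_true) / SPEED_OF_SOUND
theta_est = estimate_null_direction(1.0 + 0j, -cmath.exp(1j * phi), freq_hz, spacing_m)
# Scaling one element changes only the magnitude ratio, not the estimated angle.
theta_est_scaled = estimate_null_direction(1.0 + 0j, -0.7 * cmath.exp(1j * phi),
                                           freq_hz, spacing_m)
```

Because only the arguments enter the estimate, it is unaffected by a real-valued gain mismatch between the microphones, which is why this kind of estimate can precede the sensitivity correction.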

With this sensitivity correction device and program, an M-by-M separation matrix for separating the M kinds of sound is first calculated by independent component analysis of the M observation signals output from the M microphones of the array. For each row of this matrix, the arrival direction of the sound suppressed by that row is estimated from the difference between the arguments of its elements. If the matrix contains a row whose estimated arrival direction does not deviate greatly from the normal direction of the microphone array, the variation in the output signal levels of the microphones is corrected according to the ratio of the absolute values of that row's elements. As described in detail later, when M = 2, the ratio of the absolute values of the elements of the row that forms a null in the normal direction of the array surface (that is, the row that suppresses sound arriving from the normal direction) equals the ratio of the output signal levels of the two microphones, that is, the ratio of their sensitivities. According to the present invention, therefore, no special attention need be paid to whether the condition for properly correcting the sensitivity variations is satisfied. That condition is that one of the M rows of the separation matrix generated by independent component analysis suppresses sound arriving from the normal direction of the microphone array; in other words, that one of the sound sources lies in the normal direction of the array surface. Whenever the condition is satisfied, the sensitivity variations among the microphones are corrected automatically.
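The M = 2 relationship between the null row and the sensitivity ratio can be checked numerically. The sketch below assumes a simple model that is not spelled out in the text: a source on the array normal reaches both microphones at the same time, so the observations are g1·s and g2·s for real sensitivities g1 and g2, and a row that nulls that source satisfies w11·g1 + w12·g2 = 0. The sensitivity values and the function name are hypothetical.

```python
def correction_gain_from_null_row(w11, w12):
    """Gain for microphone 2 implied by a row that nulls the normal direction.

    Under the assumed model, |w12| / |w11| equals g1 / g2.
    """
    return abs(w12) / abs(w11)


# Hypothetical sensitivities: mic 2 is about 4 dB less sensitive than mic 1.
g1, g2 = 1.0, 0.63

# Any row that cancels a source on the array normal satisfies w11*g1 + w12*g2 = 0.
w11 = 0.8 + 0.6j
w12 = -w11 * g1 / g2

gain = correction_gain_from_null_row(w11, w12)
corrected_g2 = g2 * gain  # effective sensitivity of mic 2 after the amplifier
```

Which microphone receives the gain, and hence which way round the ratio is taken, is a choice made for this sketch; the first embodiment applies the correction to microphone M2 through amplifier G2.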

When M = 2, the separation matrix generation unit sets the initial separation matrix, which is the starting point of the independent component analysis, so that the elements of one row suppress sound arriving from the normal direction of the array surface and the elements of the other row suppress sound arriving from the direction in which the microphones are arranged on the array surface. The initial separation matrix is set this way because it is generally known that sequential learning starting from such a matrix readily yields a separation matrix with nulls in the normal direction of the array surface and in the microphone arrangement direction.

To solve the above problem, the present invention also provides a microphone array system comprising a microphone array composed of N microphones (N being a natural number of 2 or more) and N-1 of the above sensitivity correction devices for the case M = 2. One of the N microphones serves as a reference microphone, and each of the other N-1 microphones is a microphone to be corrected. Each of the N-1 sensitivity correction devices is connected to a different one of the N-1 microphones to be corrected and also to the reference microphone, and each device corrects the output signal level of its microphone to be corrected. In this aspect, each of the N-1 devices corrects the sensitivity of one of the other N-1 microphones with the output signal level of the reference microphone as the standard. The sensitivity variations among the N microphones of the array are thereby corrected.
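The pairing of one reference microphone with N-1 two-microphone correction devices might be organized as in the sketch below. Only the pairing structure is the point here. The per-pair estimator `rms_pair_gain` is a deliberately simple stand-in based on RMS level matching, not the separation-matrix-based method of this invention, and all names and sample values are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a real-valued signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def rms_pair_gain(reference, signal):
    """Placeholder pair estimator (NOT the patent's ICA-based method)."""
    return rms(reference) / rms(signal)


def correct_array(reference, others, estimate_pair_gain):
    """Run one two-microphone correction per non-reference microphone."""
    corrected = [list(reference)]
    for signal in others:
        gain = estimate_pair_gain(reference, signal)  # one device per pair
        corrected.append([gain * v for v in signal])
    return corrected


# Hypothetical N = 3 array: the reference plus two mismatched microphones.
reference = [1.0, -1.0, 1.0, -1.0]
others = [[0.5, -0.5, 0.5, -0.5], [2.0, -2.0, 2.0, -2.0]]
corrected = correct_array(reference, others, rms_pair_gain)
```

Passing the estimator as a parameter keeps the N-1 pairwise structure independent of how each pair's gain is actually obtained.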

A diagram showing a configuration example of a microphone array system 100A according to the first embodiment of the present invention.
A diagram for explaining the processing executed by the frequency analysis unit 22 of the sensitivity correction device 20 included in the system.
A diagram showing a configuration example of the separation matrix generation unit 40A of the sensitivity correction device 20.
A diagram showing a configuration example of the sensitivity correction control unit 28 of the sensitivity correction device 20.
A flowchart showing the flow of processing executed by the correction amount calculation unit 76 of the sensitivity correction control unit 28.
A diagram for explaining the sound mixing system and separation system in the embodiment.
A diagram showing a configuration example of a microphone array system 100B according to the second embodiment of the present invention.
A diagram showing a configuration example of a microphone array system 100 according to the third embodiment of the present invention.
A block diagram showing a configuration example of the signal processing unit 24 of the arithmetic device 12 included in the microphone array system 100.
A block diagram showing a configuration example of the separation matrix generation unit 40 of the arithmetic device 12.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

<A: First Embodiment>

FIG. 1 is a block diagram showing a configuration example of a microphone array system 100A according to the first embodiment of the present invention. The microphone array system 100A includes a microphone array 10A composed of n microphones (n being a natural number of 2 or more). This embodiment assumes the case n = 2, in which the microphone array 10A consists of two microphones M1 and M2, as shown in FIG. 1. The microphones M1 and M2 are arranged along a plane PL, spaced apart from each other with their sound collection axes parallel, so the array surface of the microphone array 10A is parallel to the plane PL. Around the microphones M1 and M2, n sound sources S (S1, S2) are located at different positions within a plane that contains the sound collection axes of the microphones and the normal of the array surface. The sound source S1 lies in the direction at angle θ1 to the normal Ln of the array surface, and the sound source S2 lies in the direction at angle θ2 (θ2 ≠ θ1) to the normal Ln.

The sound SV1 emitted from the sound source S1 and the sound SV2 emitted from the sound source S2 both reach the microphones M1 and M2. The microphone M1 generates an observation signal V1 representing the waveform of the mixture of SV1 and SV2, and the microphone M2 likewise generates an observation signal V2 representing the waveform of the mixture of SV1 and SV2. As shown in FIG. 1, the observation signal V2 from the microphone M2 is supplied to the signal processing unit 30 after its level is amplified by an amplifier G2, whereas the observation signal V1 from the microphone M1 is supplied to the signal processing unit 30 directly, without passing through an amplifier.

The signal processing unit 30 includes a filter unit that applies filter processing for directional sound pickup to the observation signals V1 and V2, and an adder that mixes the filtered observation signals and outputs the result (neither is shown). In the microphone array system 100A, the directivity pattern is set by adjusting the filter coefficients used in this filter processing. The signal output from the signal processing unit 30 is supplied to a sound emitting device (for example, a speaker or headphones) and reproduced as sound. The A/D converters that convert the observation signals V1 and V2 into digital signals and the D/A converter that converts the output signal of the signal processing unit 30 into an analog signal are not shown.

The sensitivity correction device 20 corrects the sensitivity variation between the microphones M1 and M2 by adjusting the signal level of the observation signal V2 with the signal level of the observation signal V1 as the reference. As described in detail later, the device calculates a sensitivity correction amount R from the observation signals V1 and V2 by the method characteristic of this embodiment and sets a gain corresponding to R in the amplifier G2. The signal levels of V1 and V2 thereby become substantially equal, and the sensitivity variation between M1 and M2 is corrected.

The sensitivity correction device 20 is a computer device such as a personal computer. Its CPU (Central Processing Unit; not shown) executes a program stored in the storage device 14 and thereby performs the sensitivity correction processing characteristic of this embodiment. The storage device 14 stores this program (hereinafter, the sensitivity correction support program) and various data. A known recording medium, such as a semiconductor recording medium or a magnetic recording medium, is used as the storage device 14.

By executing the sensitivity correction support program, the CPU of the sensitivity correction device 20 functions as the frequency analysis unit 22, the separation matrix generation unit 40A, and the sensitivity correction control unit 28 shown in FIG. 1. In this embodiment these units are implemented in software, but they may instead be implemented by an electronic circuit dedicated to signal processing, such as a DSP, or distributed across a plurality of integrated circuits.

The frequency analysis unit 22 calculates a frequency spectrum Q (the spectrum Q1 of the observation signal V1 and the spectrum Q2 of the observation signal V2) for each of a plurality of frames into which the observation signals V (V1, V2) are divided on the time axis. A short-time Fourier transform, for example, is used for this calculation. As shown in FIG. 2, the spectrum Q1 of the frame identified by number (time) t is calculated as the intensities x1(t, f1) to x1(t, fK) at K frequencies f1 to fK set on the frequency axis. Similarly, the spectrum Q2 is calculated as the intensities x2(t, f1) to x2(t, fK) at the K frequencies f1 to fK.
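The framing and per-frame spectrum calculation can be sketched with a plain discrete Fourier transform. The frame length, hop size, Hann window, and function name below are assumptions chosen for illustration; the text only says that a short-time Fourier transform may be used.

```python
import cmath
import math


def stft_frames(signal, frame_len, hop):
    """Return one spectrum per frame, each with K = frame_len // 2 + 1 bins."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]  # periodic Hann window (assumed)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = 0j
            for n, v in enumerate(frame):
                acc += v * cmath.exp(-2j * math.pi * k * n / frame_len)
            bins.append(acc)  # plays the role of x(t, f_k) for this frame
        spectra.append(bins)
    return spectra


# Hypothetical test tone that falls exactly on bin 4 of a 32-point frame.
tone = [math.cos(2.0 * math.pi * 4 * n / 32) for n in range(96)]
spectra = stft_frames(tone, frame_len=32, hop=16)
peak_bin = max(range(len(spectra[0])), key=lambda k: abs(spectra[0][k]))
```

A production implementation would use an FFT routine; the direct sum is used here only to keep the sketch free of external dependencies.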

The frequency analysis unit 22 generates observation vectors X(t, f1) to X(t, fK) for the K frequencies f1 to fK in each frame. As shown in FIG. 2, the observation vector X(t, fk) for the k-th frequency fk (k = 1 to K) has as its elements the intensity x1(t, fk) at frequency fk in the spectrum Q1 and the intensity x2(t, fk) at frequency fk in the spectrum Q2 of the same frame: X(t, fk) = [x1(t, fk)* x2(t, fk)*]^H, where the symbol * denotes the complex conjugate and the symbol H denotes the Hermitian (conjugate) transpose of a matrix. The observation vectors X(t, f1) to X(t, fK) generated for each frame are stored in the storage device 14. As shown in FIG. 2, the stored observation vectors are divided into observation data D(f1) to D(fK) for each unit interval TU consisting of a predetermined number of frames (for example, 50). The observation data D(fk) for frequency fk is the time series of the observation vectors X(t, fk) calculated for the frames in the unit interval TU.
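The bookkeeping described above (pairing the two spectra at one frequency and grouping frames into unit intervals) might look like the following sketch. The function names are hypothetical, and discarding an incomplete final interval is an assumption the text does not address.

```python
def observation_vectors(spectra1, spectra2, k):
    """X(t, f_k) = (x1(t, f_k), x2(t, f_k)) for every frame t."""
    return [(q1[k], q2[k]) for q1, q2 in zip(spectra1, spectra2)]


def split_into_unit_intervals(vectors, frames_per_unit=50):
    """Group consecutive frames into observation data D(f_k), one list per
    unit interval TU. A trailing partial interval is dropped (assumption)."""
    n_units = len(vectors) // frames_per_unit
    return [vectors[i * frames_per_unit:(i + 1) * frames_per_unit]
            for i in range(n_units)]


# Hypothetical spectra: 120 frames, 3 frequency bins per frame, two microphones.
spectra1 = [[complex(t, k) for k in range(3)] for t in range(120)]
spectra2 = [[complex(-t, k) for k in range(3)] for t in range(120)]
vectors = observation_vectors(spectra1, spectra2, k=1)
units = split_into_unit_intervals(vectors, frames_per_unit=50)
```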

The separation matrix generation unit 40A generates separation matrices W(f1) to W(fK) from the observation data D(fk) by independent component analysis. A separation matrix is, by its original purpose, a 2-by-2 (n-by-n) complex-valued matrix used in the signal processing that separates the sound SV1 or the sound SV2 (or both) from the observation signals V1 and V2. This embodiment is distinctive in that it uses the separation matrix to correct the sensitivity variation between the microphones M1 and M2.
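To make the role of a 2-by-2 separation matrix concrete, the sketch below mixes two source values at a single frequency and recovers them with an ideal separation matrix, taken here as the inverse of the mixing matrix. The plane-wave mixing model, the speed of sound, and all names are assumptions; a matrix learned by independent component analysis would in general match this only up to row scaling and ordering.

```python
import cmath
import math


def steering(theta, freq_hz, spacing_m, c=343.0):
    """Relative phase at mic 2 for a plane wave from angle theta (assumed model)."""
    return cmath.exp(-2j * math.pi * freq_hz * spacing_m * math.sin(theta) / c)


def separate(W, x):
    """u = W x for a 2x2 complex matrix W and an observation vector x."""
    return (W[0][0] * x[0] + W[0][1] * x[1],
            W[1][0] * x[0] + W[1][1] * x[1])


freq_hz, spacing_m = 1000.0, 0.05
a1 = steering(0.0, freq_hz, spacing_m)                 # source 1 on the normal
a2 = steering(math.radians(60.0), freq_hz, spacing_m)  # source 2 at 60 degrees

s = (1.0 + 2.0j, -0.5 + 0.3j)             # hypothetical source spectra at f_k
x = (s[0] + s[1], a1 * s[0] + a2 * s[1])  # what the two microphones observe

det = a2 - a1
W = [[a2 / det, -1.0 / det],    # row 1 cancels source 2
     [-a1 / det, 1.0 / det]]    # row 2 cancels source 1
u = separate(W, x)
```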

FIG. 3 is a block diagram of the separation matrix generation unit 40A.

As shown in FIG. 3, the separation matrix generation unit 40A includes an initial value generation unit 42, a frequency selection unit 54, and a learning processing unit 44. The initial value generation unit 42 generates initial separation matrices W₀(f1) to W₀(fK) for the K frequencies f1 to fK. The initial separation matrix W₀(fk) for frequency fk is generated for each unit interval TU using the observation data D(fk) stored in the storage device 14. Any known method may be used to generate the initial separation matrices W₀(f1) to W₀(fK). Various forms of initial separation matrix are conceivable, but this embodiment adopts a so-called null beamformer. More specifically, suppose the observation signals V1 and V2 are multiplied by the initial separation matrix at each of the frequencies f1 to fK. The initial separation matrix is set so that, in the signal obtained by multiplying the two observation signals by the elements of its first row (the (1,1) and (1,2) elements), sound arriving from the normal direction of the array surface of the microphone array 10A is suppressed (that direction becomes a null), and in the signal obtained by multiplying them by the elements of its second row (the (2,1) and (2,2) elements), sound arriving from the direction in which the microphones of the microphone array 10A are arranged is suppressed (that direction becomes a null). Because the initial separation matrix is set this way, the learning produces the separation matrix of a null beamformer, that is, a matrix that separates the sources by having each row suppress sound arriving from its null direction (in other words, emphasize sound arriving from directions other than the null).
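One way to build such a null-beamformer initial matrix for two microphones is sketched below. It assumes a far-field plane-wave model with microphone 1 as the phase reference and a speed of sound of 343 m/s; the function names are hypothetical, and the matrix here depends only on frequency and spacing, which is a simplification of whatever known method an implementation might adopt.

```python
import cmath
import math


def phase_lag(theta, freq_hz, spacing_m, c=343.0):
    """Inter-microphone phase for a plane wave arriving from angle theta."""
    return 2.0 * math.pi * freq_hz * spacing_m * math.sin(theta) / c


def null_row(theta, freq_hz, spacing_m):
    """Row [1, -exp(+j*phi)] that cancels a plane wave from angle theta."""
    return (1.0 + 0j, -cmath.exp(1j * phase_lag(theta, freq_hz, spacing_m)))


def initial_separation_matrix(freq_hz, spacing_m):
    """Row 1 nulls the array normal (0 deg); row 2 nulls the array axis (90 deg)."""
    return [null_row(0.0, freq_hz, spacing_m),
            null_row(math.pi / 2.0, freq_hz, spacing_m)]


def response(row, theta, freq_hz, spacing_m):
    """Output of one row for a unit-amplitude plane wave from angle theta."""
    return row[0] + row[1] * cmath.exp(-1j * phase_lag(theta, freq_hz, spacing_m))


freq_hz, spacing_m = 1000.0, 0.05
W0 = initial_separation_matrix(freq_hz, spacing_m)
broadside_leak = abs(response(W0[0], 0.0, freq_hz, spacing_m))
endfire_leak = abs(response(W0[1], math.pi / 2.0, freq_hz, spacing_m))
endfire_pass = abs(response(W0[0], math.pi / 2.0, freq_hz, spacing_m))
```

Each row passes sound from the other row's null direction while cancelling its own, which is the behavior the paragraph above attributes to the initial matrix.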

From the K frequencies f1 to fK, the frequency selection unit 54 selects one or more frequencies for which the separation matrix is to be learned by independent component analysis, according to the spacing between the microphones M1 and M2 in the microphone array 10A. More specifically, it selects one or more frequencies at which, given that spacing, the array gain is high and there is little aliasing noise. A frequency selected by the frequency selection unit 54 is hereinafter called a "selected frequency". In this embodiment, therefore, the separation matrix is learned by independent component analysis only for the selected frequencies among f1 to fK. The reason is as follows.

When a separation matrix is learned for the purpose of sound source separation, it is ideal to calculate the separation matrix for all of the K frequencies f1 to fK. The present embodiment, however, aims not at sound source separation but at estimating the arrival direction of sound and correcting the sensitivity of each microphone, so it suffices to calculate the separation matrix to the extent needed for that purpose. In the first embodiment, therefore, one or more frequencies with high array gain and no aliasing noise are selected from the K frequencies f1 to fK based on the spacing between the microphones M1 and M2 in the microphone array 10A, and sequential learning of the separation matrix W(fk) using the observation data D(fk) is performed only for those frequencies. This reduces the amount of computation required for sensitivity correction.
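The text does not state a concrete selection rule. As one possible sketch, the "no aliasing" condition can be taken as the usual half-wavelength limit f < c / (2d) for spacing d, and "high array gain" can be approximated by a lower bound expressed as a fraction of that limit. Both the lower-bound rule and the names `select_frequencies` and `low_fraction` are assumptions made for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def select_frequencies(freqs_hz, spacing_m, low_fraction=0.25):
    """Keep frequencies below the spatial-aliasing limit c / (2 d) and
    above an assumed lower bound where a closely spaced pair still
    provides useful array gain."""
    f_alias = SPEED_OF_SOUND / (2.0 * spacing_m)
    f_low = low_fraction * f_alias  # hypothetical array-gain criterion
    return [f for f in freqs_hz if f_low <= f < f_alias]
```

For a 5 cm spacing the aliasing limit is about 3.4 kHz, so of the candidates 500, 1000, 2000, 3000, and 4000 Hz this sketch keeps 1000, 2000, and 3000 Hz.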

The learning processing unit 44 generates a separation matrix W(fk) for each selected frequency fk selected by the frequency selection unit 54, by sequential learning that uses the initial separation matrix W₀(fk) as the initial value. The observation data D(fk) of the frequency fk stored in the storage device 14 is used for learning the separation matrix W(fk). For example, independent component analysis (for example, higher-order ICA) is suitably employed to generate W(fk): the update of W(fk) is repeated so that the separated signal U1 (the time series of the intensity u1(t, fk) defined by Equation 1) and the separated signal U2 (the time series of the intensity u2(t, fk) defined by Equation 2), which are obtained by multiplying the observation data D(fk) by W(fk), become statistically independent of each other. In Equations 1 and 2 below, wij(fk) is the element in row i, column j of the separation matrix W(fk).
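Equations 1 and 2 are not reproduced in this text. A reconstruction consistent with the surrounding definitions (each separated intensity is one row of W(fk) applied to the observed intensities x1(t, fk) and x2(t, fk) of the two microphones) would read as follows; the exact typography of the original equations is an assumption.

```latex
\begin{aligned}
u_1(t, f_k) &= w_{11}(f_k)\, x_1(t, f_k) + w_{12}(f_k)\, x_2(t, f_k) && \text{(Equation 1)} \\
u_2(t, f_k) &= w_{21}(f_k)\, x_1(t, f_k) + w_{22}(f_k)\, x_2(t, f_k) && \text{(Equation 2)}
\end{aligned}
```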

The above is the configuration of the separation matrix generation unit 40A.

Next, the configuration of the sensitivity correction control unit 28 will be described.
FIG. 4 is a block diagram showing the configuration of the sensitivity correction control unit 28. As shown in FIG. 4, the sensitivity correction control unit 28 includes a direction estimation unit 72 and a correction amount calculation unit 76.

The direction estimation unit 72 is supplied with data indicating the selected frequencies fk and with the separation matrices W(fk) learned by the learning processing unit 44. For each selected frequency fk, the direction estimation unit 72 estimates, from the learned separation matrix W(fk), the arrival direction of the sound suppressed by each row of W(fk) (specifically, the angle between the normal Ln of the array surface and the arrival direction of the sound). More specifically, the direction estimation unit 72 estimates the arrival direction θ1(fk) of the sound suppressed by the first-row matrix elements of the learned W(fk) from the difference between their arguments (that is, the difference between the argument of w11(fk) and the argument of w12(fk)), and estimates the arrival direction θ2(fk) of the sound suppressed by the second-row matrix elements from the difference between their arguments (that is, the difference between the argument of w21(fk) and the argument of w22(fk)). For estimating θ1(fk) and θ2(fk) from the matrix elements of W(fk), the method disclosed in H. Saruwatari, et al., "Blind Source Separation Combining Independent Component Analysis and Beamforming", EURASIP Journal on Applied Signal Processing, Vol. 2003, No. 11, pp. 1135-1146, 2003, or a similar method can be used. For example, if the difference between the argument of w11(fk) and the argument of w12(fk) is zero, the arrival direction θ1(fk) of the sound suppressed by the first-row matrix elements is estimated to be the normal direction of the array surface of the microphone array 10A.
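The idea of reading a null direction off the phase relation between two row elements can be sketched as follows. This is not the cited method itself but a simplified illustration under a far-field plane-wave model with M1 as the phase reference. The sign convention (using the ratio −w1/w2, so that the row (1, −1) maps to the array normal), the speed of sound, and the function name are assumptions.

```python
import cmath
import math


def estimate_null_direction(w1, w2, freq_hz, spacing_m, c=343.0):
    """Angle (rad, from the array normal) of the null formed by the row (w1, w2).

    Assumed model: a plane wave from angle theta reaches M2 with relative
    response a2 = exp(-2j*pi*f*d*sin(theta)/c) and M1 with a1 = 1. The null
    condition w1*a1 + w2*a2 = 0 then gives a2 = -w1 / w2.
    """
    phase = cmath.phase(-w1 / w2)
    sin_theta = -phase * c / (2.0 * math.pi * freq_hz * spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against rounding
    return math.asin(sin_theta)
```

The estimate is only unambiguous below the spatial-aliasing limit of the microphone pair, which is one reason the frequency selection matters.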

The correction amount calculation unit 76 calculates a sensitivity correction amount R for the microphone M2 from the separation matrices W(fk) learned by the learning processing unit 44, and sets a gain corresponding to R in the amplifier G2. FIG. 5 is a flowchart showing the flow of the processing executed by the correction amount calculation unit 76. As shown in FIG. 5, for each selected frequency fk the correction amount calculation unit 76 determines whether the arrival direction θ1(fk) estimated by the direction estimation unit 72 (that is, the arrival direction of the sound suppressed by the first row of W(fk)) deviates significantly from the normal direction of the array surface, and excludes from the selected frequencies any frequency fk for which it does (step SA100). For example, when the absolute value of the angle indicating the arrival direction (that is, θ1(fk) or θ2(fk)) exceeds a predetermined threshold, the correction amount calculation unit 76 determines that the arrival direction deviates significantly from the normal direction of the array surface. Such frequencies are excluded because performing the calculations from step SA120 onward on the corresponding separation matrices cannot be expected to improve the accuracy of the sensitivity correction.

Next, the correction amount calculation unit 76 determines whether all of the selected frequencies were excluded in step SA100 (step SA110), and executes the processing from step SA120 onward only when the determination result is “No” (that is, when at least one selected frequency fk was not excluded). In step SA120, for each frequency fk that was not excluded in step SA100 (that is, each frequency for which θ1(fk) was determined not to deviate significantly from the normal direction of the array surface), the correction amount calculation unit 76 calculates the sensitivity correction amount R(fk) for the microphone M2 from the first-row matrix elements of the separation matrix W(fk) (that is, w11(fk) and w12(fk)) according to Equation 3 below. In Equation 3, | | denotes the absolute value.

The reason why the sensitivity correction amount R(fk) for the microphone M2 can be calculated according to Equation 3 above is as follows. Suppose that the mixing system of the sound SV1 radiated from the sound source S1 and the sound SV2 radiated from the sound source S2 is represented as shown in FIG. 6, and that the sensitivities of the microphones M1 and M2 are not matched, so that the signal levels of the observation signals V1 and V2 differ as if a gain p were applied only on the microphone M1 side (see FIG. 6). In this case, the mixing matrix A, which generates the observation signals V1 and V2 from the sounds SV1 and SV2 according to Equation 4 below, is expressed by Equation 5 below. In Equation 5, aij is the transfer function of the sound propagation path from the sound source Sj to the microphone Mi.

In this case, one candidate for the separation matrix W that separates the sounds SV1 and SV2 from the observation signals V1 and V2 is the inverse matrix A⁻¹ of the mixing matrix A. The separation matrix W in this case is given by Equation 6 below. This separation matrix W suppresses the sound radiated from the sound source S2 by its first-row matrix elements, and suppresses the sound radiated from the sound source S1 by its second-row matrix elements.

When the first-row matrix elements of the separation matrix W form a blind spot in the normal direction of the array surface (that is, when the sound suppressed by the first row of W(fk) arrives from the normal direction of the array surface of the microphone array 10A, in other words when the sound source S2 directly faces the array surface), the distance from the sound source S2 to the microphone M1 equals the distance from the sound source S2 to the microphone M2, so that a12 = a22. Accordingly, in this case the ratio R of the first-row matrix elements w11 and w12 of the separation matrix W is given by Equation 7 below, and the magnitude of the value R calculated according to Equation 7 equals the sensitivity ratio p between the microphone M1 and the microphone M2.

Therefore, when the first row of the separation matrix W(fk) forms a blind spot in the normal direction of the array surface, the variation in sensitivity between the microphones M1 and M2 can be corrected by setting in the amplifier G2 a gain corresponding to R(fk) calculated according to Equation 3 above.
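Equations 3 to 7 are not reproduced in this text. The argument can be retraced with the following reconstruction, which follows from the stated model (gain p on the microphone M1 side only, W = A⁻¹, and a12 = a22 for a source on the array normal). Whether the original Equation 3 writes the ratio as |w12| / |w11| or in another equivalent form cannot be confirmed from the text; the orientation below is the one for which the result equals p.

```latex
\begin{aligned}
\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}
  &= A \begin{pmatrix} SV_1 \\ SV_2 \end{pmatrix},
  \qquad
  A = \begin{pmatrix} p\,a_{11} & p\,a_{12} \\ a_{21} & a_{22} \end{pmatrix} \\[4pt]
W = A^{-1}
  &= \frac{1}{p\,(a_{11}a_{22} - a_{12}a_{21})}
     \begin{pmatrix} a_{22} & -p\,a_{12} \\ -a_{21} & p\,a_{11} \end{pmatrix} \\[4pt]
w_{11} V_1 + w_{12} V_2
  &\propto a_{22}\,(p\,a_{11} SV_1 + p\,a_{12} SV_2) - p\,a_{12}\,(a_{21} SV_1 + a_{22} SV_2)
   = p\,(a_{11}a_{22} - a_{12}a_{21})\, SV_1 \\[4pt]
R = \left| \frac{w_{12}}{w_{11}} \right|
  &= \left| \frac{-p\,a_{12}}{a_{22}} \right| = p
  \qquad \text{when } a_{12} = a_{22}
\end{aligned}
```

The third line confirms that the first row removes SV2 entirely, and the last line shows that the magnitude ratio of its two elements exposes the sensitivity mismatch p.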

The correction amount calculation unit 76 then obtains a value R that represents the correction amounts R(fk) calculated according to Equation 3 for the frequencies fk not excluded in step SA100 (step SA130). When a plurality of selected frequencies fk remain, R is, for example, the arithmetic mean or the median of the R(fk) calculated for those frequencies; when only one selected frequency fk remains, R is the R(fk) calculated for that frequency. The correction amount calculation unit 76 then sets a gain corresponding to the R calculated in step SA130 in the amplifier G2 (step SA140), and completes the sensitivity correction.
The above is the flow of the processing executed by the correction amount calculation unit 76.
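Steps SA100 to SA130 can be sketched as below. This is an illustrative outline only: the 10-degree threshold, the input layout, the choice of the median as the representative value, and the ratio orientation |w12| / |w11| are assumptions, and `correction_amount` is a hypothetical name. Setting the amplifier gain (step SA140) is left to the caller.

```python
import math
import statistics


def correction_amount(rows, max_angle_rad=math.radians(10.0)):
    """rows: one (theta1, w11, w12) tuple per selected frequency.

    Returns a representative correction amount R, or None when every
    selected frequency was excluded (no correction is performed).
    """
    # SA100: drop frequencies whose null is far from the array normal.
    kept = [(w11, w12) for theta1, w11, w12 in rows
            if abs(theta1) <= max_angle_rad]
    # SA110: nothing left, so skip the remaining steps.
    if not kept:
        return None
    # SA120: per-frequency ratio of first-row magnitudes (assumed orientation).
    ratios = [abs(w12) / abs(w11) for w11, w12 in kept]
    # SA130: representative value; the median is one of the options named.
    return statistics.median(ratios)
```

A caller would apply a gain derived from the returned value to the amplifier G2 and leave the gain unchanged when the function returns `None`.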

As described above, in the microphone array system 100A, the sensitivity correction device 20 automatically detects that the conditions for properly correcting the sensitivity of the microphones constituting the microphone array 10A are met (namely, that a sound source is located in the normal direction of the array surface), and then executes the processing that corrects the variation in sensitivity between the microphones M1 and M2. As a result, the variation in sensitivity among the microphones is corrected automatically without any special attention being paid to those conditions.

Since the sensitivity of the microphones constituting the microphone array 10A needs to be corrected only once, at the time of factory shipment or immediately after the start of operation, the following arrangement may be used. A flag indicating whether sensitivity correction has been executed (0: not yet executed; 1: executed) is written to the storage device 14 with the initial value “0”. While the flag is 0, the CPU of the sensitivity correction device 20 periodically executes the sensitivity correction support program, and the CPU updates the flag to 1 when the processing of step SA140 is executed. In the present embodiment, the variation in sensitivity between the microphones M1 and M2 is corrected by adjusting the signal level of the output signal of the microphone M2 according to the ratio of the absolute values of the first-row matrix elements of the separation matrix W(fk) (the value R(fk) calculated according to Equation 3, or a value representing R(fk) over a plurality of selected frequencies fk). The variation may of course instead be corrected by adjusting the signal level of the output signal of the microphone M1 according to the reciprocal of R(fk) (or the reciprocal of the value representing R(fk)).

<B: Second Embodiment>
Next, a second embodiment of the present invention will be described. In the first embodiment, the microphone array system 100A is configured using the microphone array 10A consisting of two microphones M (M1, M2). In the second embodiment, by contrast, a microphone array system 100B is configured using a microphone array 10B consisting of three or more microphones M (M1, M2, ..., MN, where N is a natural number of 3 or more). FIG. 7 is a block diagram illustrating a configuration example of the microphone array system 100B. As shown in FIG. 7, in the microphone array system 100B, the (N−1) microphones Mk (k = 2 to N) other than the microphone M1 are each connected to the signal processing unit 30 via an amplifier Gk (k = 2 to N). The microphone M1 and each microphone Mk (k = 2 to N) are connected to a sensitivity correction device 20-k (k = 2 to N), which adjusts the gain of the amplifier Gk. Each of the sensitivity correction devices 20-k (k = 2 to N) has the same configuration as the sensitivity correction device 20 of FIG. 1.

That is, in the microphone array system 100B, the microphone M1 serves as the reference microphone, and the sensitivity of each of the other (N−1) microphones Mk (k = 2 to N) is corrected by the corresponding sensitivity correction device 20-k (k = 2 to N). Thus, whenever the conditions for correcting the sensitivity of a microphone Mk constituting the microphone array 10B are met, the sensitivity correction of that microphone is executed, and the microphones are corrected one after another. According to the present embodiment, therefore, even when the microphone array is composed of three or more microphones, the variation in sensitivity among the microphones constituting the microphone array 10B can be corrected automatically, without requiring the user of the microphone array system 100B to pay any special attention.

When the microphone array is composed of N microphones as shown in FIG. 7, it is also conceivable to correct the variation in sensitivity among the microphones by performing an N-channel independent component analysis. Specifically, a sensitivity correction device may of course be configured by combining the following units, and a microphone array system may be configured by connecting the N microphones and N−1 amplifiers to this sensitivity correction device:
a frequency analysis unit that performs frequency analysis on each of N observation signals, obtained by collecting with each of the N microphones constituting the microphone array a mixture of N sounds radiated from different sound sources, and calculates for each microphone time-series observation data indicating the signal intensity at each of a plurality of frequencies;
a separation matrix generation unit that selects at least one of the plurality of frequencies and generates, by independent component analysis of the observation data of that frequency component, a separation matrix that is an N-row, N-column complex-valued matrix for performing sound source separation on that frequency component;
a direction estimation unit that, for each row of the separation matrix generated by the separation matrix generation unit, estimates the arrival direction of the sound suppressed by the matrix elements of that row from the differences between the arguments of those matrix elements; and
a sensitivity correction unit that, when the separation matrix has a row for which the arrival direction estimated by the direction estimation unit does not deviate greatly from the normal direction of the microphone array, corrects the variation in signal level among the output signals of the microphones according to the ratios of the absolute values of the matrix elements of that row.

Whether to configure the microphone array system with the N-channel independent component analysis or, as in the present embodiment, with a combination of N−1 sensitivity correction devices that each perform a 2-channel independent component analysis may be decided according to which is preferred: a simpler system configuration or a smaller amount of computation for the separation matrix. With the N-channel independent component analysis, only one sensitivity correction device is needed, so the configuration of the microphone array system is simpler. In contrast, configuring the microphone array system by combining N−1 sensitivity correction devices that each perform a 2-channel independent component analysis, as in the present embodiment, requires less computation than the N-channel approach. This is because the amount of computation required for sequential learning of the separation matrix is proportional to N² in the N-channel independent component analysis, whereas it is proportional to 2² × (N−1) when N−1 two-channel sensitivity correction devices are combined.
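The two proportionality figures quoted above can be compared directly. The sketch below only evaluates those two expressions; the function names are hypothetical and constant factors are ignored. Since N² − 4(N−1) = (N−2)², the pairwise figure never exceeds the N-channel figure, and the two coincide only at N = 2.

```python
def learning_cost_n_channel(n):
    """Relative learning cost of one N-channel ICA (proportional to N^2)."""
    return n ** 2


def learning_cost_pairwise(n):
    """Relative learning cost of N-1 two-channel ICAs (2^2 per device)."""
    return 2 ** 2 * (n - 1)


if __name__ == "__main__":
    for n in (2, 3, 4, 8, 16):
        print(n, learning_cost_n_channel(n), learning_cost_pairwise(n))
```

For example, with N = 8 the figures are 64 and 28, so the gap widens as microphones are added.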

<C: Third Embodiment>
In the first and second embodiments described above, the separation matrix W(fk) generated by the separation matrix generation unit 40A is used to correct the variation in sensitivity among the microphones constituting the microphone array. Sound source separation may of course also be performed using the separation matrix W(fk). FIG. 8 is a block diagram illustrating a configuration example of a microphone array system 100 that applies filter processing (sound source separation) to the observation signals V1 and V2 to generate separated signals U1 and U2. The microphone array system 100 illustrated in FIG. 8 includes a microphone array consisting of the microphones M1 and M2, an arithmetic device 12 that performs the computation for generating the separated signals U1 and U2 from the observation signals V1 and V2, and a storage device 14. In FIG. 8, the same components as those in FIG. 1 are denoted by the same reference numerals. The following description focuses on the differences from the system shown in FIG. 1.

As shown in FIG. 8, the arithmetic device 12 includes a frequency analysis unit 22, a signal processing unit 24, a signal synthesis unit 26, and a separation matrix generation unit 40. Like the sensitivity correction device 20 in the first embodiment, the arithmetic device 12 is a computer device; by causing its CPU to execute a program stored in the storage device 14, it functions as the frequency analysis unit 22, the signal processing unit 24, the signal synthesis unit 26, and the separation matrix generation unit 40.

The signal processing unit 24 in FIG. 8 applies filter processing (sound source separation) to the intensities x1(t, fk) and x2(t, fk) calculated by the frequency analysis unit 22, thereby generating the intensities u1(t, fk) and u2(t, fk) sequentially for each frame. The signal synthesis unit 26 converts the intensities u1(t, f1) to u1(t, fK) generated by the signal processing unit 24 into a time-domain signal and concatenates it with the preceding and following frames to generate the separated signal U1. Similarly, the signal synthesis unit 26 converts the intensities u2(t, f1) to u2(t, fK) into a time-domain signal and concatenates it with the preceding and following frames to generate the separated signal U2.

FIG. 9 is a block diagram of the signal processing unit 24. As illustrated in FIG. 9, the signal processing unit 24 consists of K processing units P1 to PK, one for each of the K frequencies f1 to fK. The processing unit Pk corresponding to the frequency fk includes a filter 32 that generates the intensity u1(t, fk) from the intensities x1(t, fk) and x2(t, fk), and a filter 34 that generates the intensity u2(t, fk) from the intensities x1(t, fk) and x2(t, fk).

Delay-sum (DS) beamformers are used for the filters 32 and 34. That is, as defined by Equation 1 above, the filter 32 of the processing unit Pk includes a delay element 321 that applies a delay corresponding to the coefficient w11(fk) to the intensity x1(t, fk), a delay element 323 that applies a delay corresponding to the coefficient w12(fk) to the intensity x2(t, fk), and an adder 325 that adds the output of the delay element 321 and the output of the delay element 323 to generate the intensity u1(t, fk) of the separated signal U1. Similarly, as defined by Equation 2 above, the filter 34 includes a delay element 341 that applies a delay corresponding to the coefficient w21(fk) to the intensity x1(t, fk), a delay element 343 that applies a delay corresponding to the coefficient w22(fk) to the intensity x2(t, fk), and an adder 345 that adds the output of the delay element 341 and the output of the delay element 343 to generate the intensity u2(t, fk) of the separated signal U2.
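In the frequency domain, the delay-and-add structure of each processing unit Pk amounts to a complex multiply and a sum per frequency. A minimal sketch for one frame follows, assuming the matrices are held as nested tuples of complex numbers with one matrix per frequency; the data layout and the name `separate` are illustrative choices, not taken from the source.

```python
def separate(W_by_freq, x1, x2):
    """Apply each 2x2 matrix to one frame of observed intensities.

    W_by_freq: list of ((w11, w12), (w21, w22)), one entry per frequency.
    x1, x2:    lists of complex intensities for M1 and M2, same length.
    Returns the separated intensities (u1, u2) as two lists.
    """
    u1, u2 = [], []
    for W, a, b in zip(W_by_freq, x1, x2):
        (w11, w12), (w21, w22) = W
        u1.append(w11 * a + w12 * b)  # role of filter 32 (elements 321, 323, 325)
        u2.append(w21 * a + w22 * b)  # role of filter 34 (elements 341, 343, 345)
    return u1, u2
```

With the row (1, −1), for instance, two identical inputs cancel, which is the blind-spot behavior toward the array normal.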

FIG. 10 is a block diagram illustrating a configuration example of the separation matrix generation unit 40. Like the separation matrix generation unit 40A in the first embodiment, the separation matrix generation unit 40 generates separation matrices by performing independent component analysis on the observation data D(fk). As illustrated in FIG. 10, the separation matrix generation unit 40 includes an initial value generation unit 42, a learning processing unit 44, and a frequency selection unit 54. For each selected frequency fk, the separation matrix generation unit 40 sets the matrix elements of the separation matrix W(fk) generated by the learning processing of the learning processing unit 44 in the filter 32 and the filter 34 of the processing unit Pk of the signal processing unit 24.

In addition, as shown in FIG. 10, the separation matrix generation unit 40 includes a direction estimation unit 72 and a matrix supplementation unit 74. The direction estimation unit 72 estimates, for each selected frequency fk, the arrival directions θ1(fk) and θ2(fk) of the sounds separated by the rows of the separation matrix W(fk) generated by the learning processing unit 44. It then calculates a value θ1 representing the θ1(fk) (for example, their arithmetic mean or median) and a value θ2 representing the θ2(fk), and supplies data indicating θ1 and θ2 to the matrix supplementation unit 74. The matrix supplementation unit 74 in FIG. 10 generates separation matrices for those of the K frequencies f1 to fK that were not selected by the frequency selection unit 54 (hereinafter, “non-selected frequencies”) as follows, and supplies them to the signal processing unit 24. Following the same algorithm as the generation of the initial separation matrix by the initial value generation unit 42, the matrix supplementation unit 74 generates the separation matrix for each non-selected frequency so that its first row has a blind spot in the θ1 direction and its second row has a blind spot in the θ2 direction.

In conventional sound source separation using a separation matrix, it has been common, in order to reduce the amount of computation required to generate the separation matrix, to learn the separation matrix only for specific frequencies among the K frequencies f1 to fK (in the present embodiment, the selected frequencies fk) and to use the initial separation matrix generated by the initial value generation unit 42 unchanged for the other frequencies. In the frequency bands that use a learned separation matrix, the variation in sensitivity among the microphones constituting the microphone array is corrected through that separation matrix. In the frequency bands that use the initial separation matrix, however, the variation in sensitivity is not corrected, so the blind spots are not formed properly and the accuracy of sound source separation deteriorates. In contrast, the present embodiment uses, for the non-selected frequencies, separation matrices generated so as to form blind spots in the directions estimated from the learned separation matrices, which makes accurate sound source separation possible.

<D: Modifications>
Although embodiments of the present invention have been described above, the following modifications may of course be made to them.
(1) In each of the embodiments described above, the frequencies at which the separation matrix is learned are selected according to the spacing of the microphones on the array surface of the microphone array, but the frequencies may instead be selected on the basis of another measure. One example of such a measure is the significance of learning (the degree to which learning the separation matrix improves the accuracy of sound source separation compared with sound source separation using the initial separation matrix). As an index of the significance of learning, the determinant z1(fk) of the covariance matrix Rxx(fk) of the observation data D(fk) for each of the K frequencies f1 to fK is known to be suitable, for example. Specifically, a frequency fk whose determinant z1(fk) exceeds a predetermined threshold is selected as a learning target. The covariance matrix Rxx(fk) is defined by Equation 8 below. In Equations 8 and 9 below, the symbol E denotes an expected value (sum), and the symbol Σ_{t} denotes summation (averaging) over a plurality of frames (for example, 50) in the unit interval TU. That is, the covariance matrix Rxx(fk) is an n-by-n matrix obtained by summing, over the plurality of observation vectors X(t, fk) in the unit interval TU (in the observation data D(fk)), the product of the observation vector X(t, fk) and its transpose. In Equation 9 below, however, the sum of the observation vectors X(t, fk) over all frames in the unit interval TU is assumed to be the zero matrix (zero mean).
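The selection of learning-target frequencies by the determinant z1(fk) might be sketched as follows for n = 2 microphones. This is an illustrative sketch only: the function names are hypothetical, the zero-mean form of the covariance matrix is used, the frame sum is normalized to an average, and the conjugate transpose is used for the complex observation vectors, which is an assumption beyond the word "transpose" in the text.

```python
def covariance_determinant(frames):
    """Determinant of the 2x2 covariance matrix of the observation vectors in one unit interval.

    frames: list of (x1, x2) complex observation vectors for one frequency.
    Assumes zero-mean observations and averages X * X^H over the frames.
    """
    r11 = r12 = r21 = r22 = 0j
    for x1, x2 in frames:
        r11 += x1 * x1.conjugate()
        r12 += x1 * x2.conjugate()
        r21 += x2 * x1.conjugate()
        r22 += x2 * x2.conjugate()
    n = len(frames)
    r11, r12, r21, r22 = (r / n for r in (r11, r12, r21, r22))
    # The matrix is Hermitian, so its determinant is real.
    return (r11 * r22 - r12 * r21).real


def select_frequencies(observations, threshold):
    """Return the frequencies whose determinant exceeds the threshold, in ascending order.

    observations: dict mapping each frequency to its list of frames.
    """
    return sorted(
        f for f, frames in observations.items()
        if covariance_determinant(frames) > threshold
    )
```

When the two microphone signals are perfectly correlated at some frequency, the determinant is zero and that frequency is not selected. The threshold is left as a parameter because the text does not specify its value.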

(2) In each of the embodiments described above, the initial separation matrix W_0(fk) is that of a blind-spot beamformer in which the matrix elements of the first row form a blind spot in the normal direction of the array surface of the microphone array and the matrix elements of the second row form a blind spot in the arrangement direction of the microphones in the microphone array. However, a matrix in which the roles of the first-row and second-row elements are interchanged may be used instead. When such a blind-spot beamformer is used, in which the first-row elements form a blind spot in the arrangement direction of the microphones and the second-row elements form a blind spot in the normal direction of the array surface, it is determined whether the arrival direction of the sound suppressed by the second-row elements of the separation matrix W(fk) generated by sequential learning deviates greatly from the normal direction of the array surface. If it does not, sensitivity correction is performed by adjusting the gain of the output signal of the correction target microphone (microphone M2 in the first embodiment, microphones M2 to MN in the second embodiment) according to the ratio of the absolute values of the second-row elements (that is, |w22|/|w21|).
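The determination and gain adjustment described in modification (2) can be sketched as follows for one row of a 2-by-2 separation matrix. The sketch is not taken from the specification: the function name, the tolerance of 10 degrees for "not deviating greatly", the sound speed of 340 m/s, the steering vector [1, exp(-jφ)], and the choice to use the ratio |w22|/|w21| directly as the gain for microphone M2 are all assumptions.

```python
import cmath
import math


def correction_gain(row, freq, spacing,
                    tolerance=math.radians(10.0), sound_speed=340.0):
    """Return the gain for the correction target microphone, or None if no correction applies.

    row: (w_ref, w_target), for example (w21, w22), the elements of the row whose
    blind spot is expected to lie in the normal direction of the array surface.
    """
    w_ref, w_target = row
    # Direction suppressed by the row, from the argument difference of its elements.
    phase = -cmath.phase(-w_ref / w_target)
    s = phase * sound_speed / (2.0 * math.pi * freq * spacing)
    theta = math.asin(max(-1.0, min(1.0, s)))
    if abs(theta) > tolerance:
        return None  # blind spot deviates greatly from the normal: skip correction
    # Assumed mapping: the ratio of absolute values is applied as the gain.
    return abs(w_target) / abs(w_ref)
```

As a usage example, if microphone M1 has sensitivity 1.0 and M2 has sensitivity 0.8, a row (0.8, -1.0) cancels a sound arriving from the normal direction, and the function returns 1.25, which scales the output of M2 up to the level of M1.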

(3) In each of the embodiments described above, the sensitivity correction device, which embodies the distinctive features of the present invention, is built into the microphone array system in advance. However, the sensitivity correction device may be provided on its own, and its parts connected to the corresponding parts of a microphone array to obtain a configuration similar to that of the microphone array system 100A or 100B.

(4) In the embodiments described above, the program that causes the CPU to execute the microphone sensitivity correction characteristic of the present invention is stored in the storage device 14 in advance. However, the program may be distributed by writing it to a computer-readable recording medium such as a CD-ROM (Compact Disk-Read Only Memory), or by download via a telecommunication line such as the Internet.

100A, 100B, 100 ... microphone array system, 10A, 10B ... microphone array, M1, M2, MN ... microphone, 20, 20-2, 20-3 ... 20-N ... sensitivity correction device, 12 ... arithmetic device, 22 ... frequency analysis unit, 14 ... storage device, 40A, 40 ... separation matrix generation unit, 42 ... initial value generation unit, 44 ... learning processing unit, 54 ... frequency selection unit, 28 ... sensitivity correction control unit, 72 ... direction estimation unit, 74 ... matrix supplementation unit, 76 ... correction amount calculation unit.

Claims

A sensitivity correction device for the microphones constituting a microphone array, comprising: a frequency analysis unit that performs frequency analysis on each of M observation signals, obtained by picking up, with each of M microphones constituting the microphone array, a mixture of M sounds (M is a natural number of 2 or more) emitted from mutually different sound sources, and calculates, for each microphone, time-series observation data indicating the signal intensity at each of a plurality of frequencies;
a separation matrix generation unit that selects at least one of the plurality of frequencies and generates, by independent component analysis of the observation data of that frequency component, a separation matrix, which is a complex-valued matrix of M rows and M columns for performing sound source separation on that frequency component;
a direction estimation unit that, for each row of the separation matrix generated by the separation matrix generation unit, estimates the arrival direction of the sound suppressed by the matrix elements of that row from the difference between the arguments of the matrix elements of that row; and
a sensitivity correction unit that, when the separation matrix has a row for which the arrival direction of the sound estimated by the direction estimation unit does not deviate greatly from the normal direction of the microphone array, corrects variations in the signal level of the output signals of the microphones according to the ratio of the absolute values of the matrix elements of that row.

The sensitivity correction device according to claim 1, wherein M = 2, and the separation matrix generation unit
sets the initial separation matrix, which is the starting point of the independent component analysis, so that the matrix elements of one row have values that suppress sound arriving from the normal direction of the array surface of the microphone array and the matrix elements of the other row have values that suppress sound arriving from the arrangement direction of the microphones on the array surface.

A microphone array system comprising: a microphone array composed of N microphones (N is a natural number of 2 or more); and
N−1 sensitivity correction devices according to claim 1 in which M = 2,
wherein one of the N microphones serves as a reference microphone and each of the other N−1 microphones serves as a sensitivity correction target microphone, each of the N−1 sensitivity correction devices is connected to the reference microphone and to a respective one of the N−1 sensitivity correction target microphones, and each of the N−1 sensitivity correction devices corrects the signal level of the output signal of its respective sensitivity correction target microphone.

A program causing a computer to function as:
a frequency analysis unit that performs frequency analysis on each of M observation signals, obtained by picking up, with each of M microphones constituting a microphone array, a mixture of M sounds (M is a natural number of 2 or more) emitted from mutually different sound sources, and calculates, for each microphone, time-series observation data indicating the signal intensity at each of a plurality of frequencies;
a separation matrix generation unit that selects at least one of the plurality of frequencies and generates, by independent component analysis of the observation data of that frequency component, a separation matrix, which is a complex-valued matrix of M rows and M columns for performing sound source separation on that frequency component;
a direction estimation unit that, for each row of the separation matrix generated by the separation matrix generation unit, estimates the arrival direction of the sound suppressed by the matrix elements of that row from the difference between the arguments of the matrix elements of that row; and
a sensitivity correction unit that, when the separation matrix has a row for which the arrival direction of the sound estimated by the direction estimation unit does not deviate greatly from the normal direction of the microphone array, corrects variations in the signal level of the output signals of the microphones according to the ratio of the absolute values of the matrix elements of that row.