JP2012165189A

JP2012165189A - Zoom microphone device

Info

Publication number: JP2012165189A
Application number: JP2011024178A
Authority: JP
Inventors: Kenta Niwa; 健太丹羽; Sumitaka Sakauchi; 澄宇阪内; Kenichi Furuya; 賢一古家; Yoichi Haneda; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2011-02-07
Filing date: 2011-02-07
Publication date: 2012-08-30
Anticipated expiration: 2031-02-07
Also published as: JP5395822B2

Abstract

[Problem] To provide a zoom microphone device capable of realizing a narrow-directivity speech enhancement technique with sharper directivity than conventional techniques.
[Solution] The device comprises one support structure, 2N fixed reflectors (N is an integer of 1 or more), and 2N microphone arrays, each consisting of a plurality of microphones arranged in a straight line. One microphone array is attached to the surface of each fixed reflector so that the reflector surface is parallel to the microphone arrangement direction of the array. Two fixed reflectors are fixed facing each other so that the surfaces carrying the microphone arrays form an angle of 90 degrees and the microphone arrangement direction of the array on one reflector forms an angle of 90 degrees with that of the array on the other reflector. N such pairs of fixed reflectors are attached to the support structure.
[Selected drawing] Figure 18

Description

The present invention relates to a zoom microphone device that realizes a technique for enhancing sound in a narrow range containing a desired direction (a narrow-directivity speech enhancement technique).

Consider, for example, zoom-in shooting of a subject with a video recording device (a video camera or camcorder) equipped with a microphone. For video recording it is desirable that, in step with the zoom-in, only sound from the vicinity of the subject is enhanced. Techniques for enhancing sound in a narrow range containing a desired direction (the target direction), called narrow-directivity speech enhancement techniques, have long been researched and developed. The relationship between the direction around a microphone and the sensitivity of the microphone is called directivity. The sharper the directivity toward a given direction, the better sound in a narrow range containing that direction can be enhanced and sound outside that range suppressed. As prior art relating to narrow-directivity speech enhancement, a technique based on selectively picking up reflected sounds is described here as an example. In this specification, the term "speech" (or "sound") is not limited to the human voice. It refers to sound in general, including musical tones and environmental noise as well as the voices of people and animals.

<Narrow-directivity speech enhancement technique based on selectively picking up reflected sounds>
A typical example of this technique is the multi-beamforming method (see Non-Patent Document 1). The multi-beamforming method is a narrow-directivity speech enhancement technique that gathers the individual sounds, that is, the direct sound and the reflected sounds, and can thereby pick up sound from the target direction with a high SN ratio. It has been studied more extensively in the wireless field than in the audio field.

The processing of the multi-beamforming method in the frequency domain is described below. First, the symbols are defined. Let ω be the frequency index and k the frame index. Let X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T be the frequency-domain representation of the analog signals received by M microphones, let θ_s1 be the arrival direction of the direct sound from the sound source to be enhanced, which lies in direction θ_s, and let θ_s2, …, θ_sR be the arrival directions of the reflected sounds. T denotes transposition, and R-1 is the total number of reflected sounds. Let W^→(ω,θ_sr) be the filter that enhances the sound from direction θ_sr, where r is each integer satisfying 1≦r≦R.

The multi-beamforming method presupposes that the arrival directions and arrival times of the direct sound and the reflected sounds are known. In other words, the number of objects, such as walls, floors, and reflectors, from which sound reflections can clearly be expected equals R-1. The number of reflected sounds R-1 is often set to a relatively small value such as 3 or 4, because a high correlation is observed between the direct sound and low-order reflected sounds. Since the multi-beamforming method enhances each sound individually and then adds the results synchronously, the output signal Y(ω,k,θ_s) is given by Equation (1). H denotes the Hermitian transpose.
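Equation (1) is not reproduced in this text. A form consistent with the description above (each of the R arriving sounds is enhanced by its own filter and the results are summed), written only with the symbols already defined, would be the following. It is a reconstruction under that reading, not a copy of the original equation.

```latex
Y(\omega,k,\theta_s) \;=\; \sum_{r=1}^{R} \vec{W}^{H}(\omega,\theta_{sr})\,\vec{X}(\omega,k) \tag{1}
```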

The delay-and-sum method is described here as a design method for the filter W^→(ω,θ_sr). Assuming that the direct sound and the reflected sounds arrive as plane waves, the filter W^→(ω,θ_sr) is given by Equation (2). Here h^→(ω,θ_sr) = [h₁(ω,θ_sr), …, h_M(ω,θ_sr)]^T is the propagation vector of the sound arriving from direction θ_sr.

Assuming that plane waves arrive at a linear microphone array (a microphone array in which M microphones are arranged in a straight line), the element h_m(ω,θ_sr) of h^→(ω,θ_sr) is given by Equation (3). Here m is each integer satisfying 1≦m≦M, c is the speed of sound, u is the distance between adjacent microphones, and j is the imaginary unit. τ(θ_sr) is the time delay, relative to the direct sound, of the reflected sound arriving from direction θ_sr.
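Equations (2) and (3) are not reproduced in this text, so the following is only a minimal sketch of a delay-and-sum design under stated assumptions: the propagation vector uses a plane-wave phase referenced to the array center plus the extra delay τ, and the filter is the propagation vector normalized so that W^H h = 1. The function names, the sign convention, and the numerical values (M = 8, u = 4 cm, c = 340 m/s, 1 kHz) are illustrative choices, not taken from the patent.

```python
import numpy as np

def propagation_vector(omega, theta, M, u, c=340.0, tau=0.0):
    """Plane-wave propagation vector of a linear array (assumed form of Eq. (3))."""
    m = np.arange(1, M + 1)
    # Phase of each microphone relative to the array center, plus the
    # delay tau of this path relative to the direct sound.
    phase = omega * (u / c) * (m - (M + 1) / 2) * np.cos(theta) + omega * tau
    return np.exp(-1j * phase)

def delay_and_sum_filter(h):
    """Delay-and-sum filter normalized so that W^H h = 1 (assumed form of Eq. (2))."""
    return h / np.vdot(h, h).real

omega = 2 * np.pi * 1000.0                      # 1 kHz, hypothetical
h = propagation_vector(omega, np.deg2rad(60.0), M=8, u=0.04, tau=1e-3)
W = delay_and_sum_filter(h)
gain = np.vdot(W, h)                            # W^H h; np.vdot conjugates its first argument
```

With this normalization a plane wave from the steered direction passes with unit gain, which is what makes the later synchronous addition of the R outputs meaningful.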

Finally, converting the output signal Y(ω,k,θ_s) to the time domain yields a signal in which the sound of the source in the target direction θ_s is enhanced.

FIG. 1 shows the functional configuration of the narrow-directivity speech enhancement technique based on the multi-beamforming method.

Step 1
The AD converter 110 converts the analog signals output by the M microphones 100-1, …, 100-M into digital signals x^→(t) = [x₁(t), …, x_M(t)]^T, where t is the discrete-time index.

Step 2
The frequency-domain transform unit 120 transforms the digital signal of each channel into a frequency-domain signal by a technique such as the fast discrete Fourier transform. For example, for the m-th microphone (1≦m≦M), the N samples x_m((k-1)N+1), …, x_m(kN) are stored in a buffer. N is about 512 for 16 kHz sampling. Applying the fast discrete Fourier transform to the M-channel signals stored in the buffer yields the frequency-domain signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T.

Step 3
Each enhancement filtering unit 130-r (1≦r≦R) applies the filter W^→H(ω,θ_sr) for direction θ_sr to the frequency-domain signal X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T and outputs the signal Z_r(ω,k) in which the sound from direction θ_sr is enhanced. That is, each enhancement filtering unit 130-r (1≦r≦R) performs the processing expressed by Equation (4).

Step 4
The adder 140 receives the signals Z₁(ω,k), …, Z_R(ω,k) and outputs the sum signal Y(ω,k). The addition is expressed by Equation (5).

Step 5
The time-domain transform unit 150 converts the sum signal Y(ω,k) to the time domain and outputs the time-domain signal y(t) in which the sound from direction θ_s is enhanced.
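Steps 2 to 5 above can be sketched as follows, assuming the digital signals of Step 1 are already available as an array. The function name `multibeam_enhance`, the array layout, and the use of non-overlapping rectangular frames without overlap-add are simplifying assumptions for illustration. The filters are taken as given. Equations (4) and (5) are read as Z_r = W^H X and Y = Σ_r Z_r, which is how the text describes them.

```python
import numpy as np

def multibeam_enhance(x, filters, N=512):
    """x: (M, T) digital signals; filters: (R, N//2+1, M) complex W(omega, theta_sr)."""
    M, T = x.shape
    K = T // N                                    # number of complete frames
    y = np.zeros(K * N)
    for k in range(K):
        frame = x[:, k * N:(k + 1) * N]           # Step 2: buffer N samples per channel
        X = np.fft.rfft(frame, axis=1).T          # (N//2+1, M) frequency-domain signal
        # Step 3: Z_r(omega, k) = W^H(omega, theta_sr) X(omega, k) for every r
        Z = np.einsum('rfm,fm->rf', filters.conj(), X)
        Y = Z.sum(axis=0)                         # Step 4: add the R enhanced signals
        y[k * N:(k + 1) * N] = np.fft.irfft(Y, n=N)   # Step 5: back to the time domain
    return y

# Hypothetical check: identical signals on 4 channels, one averaging filter (R = 1)
rng = np.random.default_rng(0)
s = rng.standard_normal(2048)
x = np.tile(s, (4, 1))
filters = np.full((1, 257, 4), 0.25, dtype=complex)
y = multibeam_enhance(x, filters)
```

In this degenerate case the averaging filter returns the common signal unchanged. A practical implementation would normally use windowed, overlapping frames to avoid block-edge artifacts.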

(Non-Patent Document 1) J. L. Flanagan, A. C. Surendran, E. E. Jan, "Spatially selective sound capture for speech and audio processing," Speech Communication, Volume 13, Issue 1-2, pp. 207-222, October 1993.

The narrow-directivity speech enhancement technique described above can pick up sound from the target direction with a high SN ratio, so that it is not buried in sound from other directions, and can enhance sound from an arbitrary direction without requiring drive control means. It has difficulty, however, in achieving narrow directivity. In particular, the human voice contains many frequency components from about 100 Hz to about 2 kHz, and with the conventional technique it is difficult to achieve a sharp directivity of about ±5° to ±10° around the target direction in such a low frequency band. Consequently, no device has existed that is suited to realizing a narrow-directivity speech enhancement technique that picks up sound with a sufficient SN ratio, can follow sound from an arbitrary direction without physically moving the microphones, and still has sharper directivity toward the desired direction than conventional techniques.

An object of the present invention is therefore to provide a zoom microphone device capable of realizing a narrow-directivity speech enhancement technique with sharper directivity than conventional techniques.

The zoom microphone device of the present invention comprises one support structure, 2N fixed reflectors (N is an integer of 1 or more), and 2N microphone arrays, each consisting of a plurality of microphones arranged in a straight line. One microphone array is attached to the surface of each fixed reflector so that the reflector surface is parallel to the microphone arrangement direction of the array. Two fixed reflectors are fixed facing each other so that the surfaces carrying the microphone arrays form an angle of 90 degrees and the microphone arrangement direction of the array on one reflector forms an angle of 90 degrees with that of the array on the other reflector. N such pairs of fixed reflectors are attached to the support structure.

The zoom microphone device of the present invention makes it possible to realize a narrow-directivity speech enhancement technique with sharper directivity than conventional techniques.

FIG. 1: Functional configuration of a narrow-directivity speech enhancement technique based on the multi-beamforming method, as an example of the prior art.
FIG. 2: (a) Schematic diagram showing that narrow directivity cannot be sufficiently achieved when only the direct sound is considered. (b) Schematic diagram showing that narrow directivity can be sufficiently achieved when the direct sound and reflected sounds are considered.
FIG. 3: Direction dependence of the coherence for the prior art and for the principle of the present invention.
FIG. 4: Functional configuration of the narrow-directivity speech enhancement device according to the embodiment.
FIG. 5: Processing procedure of the narrow-directivity speech enhancement method according to the embodiment.
FIG. 6: Configuration of the first example.
FIG. 7: Experimental results of the first example.
FIG. 8: Experimental results of the first example.
FIG. 9: Directivity obtained with the filter W^→(ω,θ) in the first example.
FIG. 10: Configuration of the second example.
FIG. 11: Experimental results of the second example.
FIG. 12: Experimental results of the second example.
FIG. 13: Implementation configuration example of the present invention. (a) Plan view. (b) Front view. (c) Side view.
FIG. 14: (a) Side view of another implementation configuration example of the present invention. (b) Side view of another implementation configuration example of the present invention.
FIG. 15: Usage of the implementation configuration example shown in FIG. 14(b).
FIG. 16: Implementation configuration example of the present invention. (a) Plan view. (b) Front view. (c) Side view.
FIG. 17: Side view of an implementation configuration example of the present invention.
FIG. 18: Front view of the configuration of the third example.
FIG. 19: Side view of the configuration of the third example.
FIG. 20: Front view of the configuration of the first modification of the third example.
FIG. 21: Front view of the configuration of the second modification of the third example.
FIG. 22: Front view of the configuration of the third modification of the third example.
FIG. 23: Front view of the configuration of the fourth modification of the third example.
FIG. 24: Front view of the configuration of the fifth modification of the third example.

<<Principle>>
The principle of the present invention is described below. The invention builds on two foundations: the essence of microphone array technology, namely that sound from an arbitrary direction can be followed by signal processing, and the active use of reflected sounds to pick up sound with a high SN ratio. One of its features is that it combines these with signal processing that enables sharp directivity.

First, the symbols are defined again. Let ω be the index of the discrete frequency and k the frame index. Since the frequency f and the angular frequency ω are related by ω=2πf, the discrete-frequency index ω may be identified with the angular frequency ω, and for ω the "index of the discrete frequency" is also simply called the "frequency". Let X^→(ω,k) = [X₁(ω,k), …, X_M(ω,k)]^T be the frequency-domain representation of the k-th frame of the analog signals received by M microphones, and let W^→(ω,θ_s) be the filter that enhances, at frequency ω, the frequency-domain representation of the sound from the target direction θ_s as seen from the center of the microphone array. M is an integer of 2 or more, and T denotes transposition. The frequency-domain signal Y(ω,k,θ_s) in which the frequency-domain representation of the sound from the target direction θ_s is enhanced at frequency ω (hereinafter called the output signal) is then given by Equation (6). H denotes the Hermitian transpose.

The "center of the microphone array" can be chosen arbitrarily, but it is generally taken to be the geometric center of the arrangement of the M microphones. For a linear microphone array, for example, it is the midpoint between the microphones at the two ends. For a planar microphone array arranged as an m×m square matrix (m²=M), it is the point where the diagonals joining the microphones at the four corners intersect.

There are various design methods for the filter W^→(ω,θ_s). The case based on the minimum variance distortionless response method (MVDR method) is described here. In the MVDR method, the filter W^→(ω,θ_s) is designed, under the constraint of Equation (8) and using the spatial correlation matrix Q(ω), so that the power of sound from directions other than the target direction θ_s (hereinafter also called "noise") is minimized at frequency ω (see Equation (7)). Here a^→(ω,θ_s) = [a₁(ω,θ_s), …, a_M(ω,θ_s)]^T is the transfer characteristic at frequency ω between a sound source assumed to lie in direction θ_s and the M microphones. In other words, a^→(ω,θ_s) is the transfer characteristic at frequency ω of the sound from direction θ_s to each microphone of the microphone array.
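Equations (7) and (8) are not reproduced in this text. In the usual statement of the minimum variance distortionless response problem, which the paragraph above appears to follow, they would read as below. This is the textbook form written with the symbols defined above, offered as a reconstruction rather than as the original equations.

```latex
\begin{align}
\min_{\vec{W}(\omega,\theta_s)} \;& \vec{W}^{H}(\omega,\theta_s)\, Q(\omega)\, \vec{W}(\omega,\theta_s) \tag{7} \\
\text{subject to } \;& \vec{W}^{H}(\omega,\theta_s)\, \vec{a}(\omega,\theta_s) = 1 \tag{8}
\end{align}
```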

It is known that the filter W^→(ω,θ_s) that is the optimal solution of Equation (7) is given by Equation (9).
(Reference 1) Simon Haykin, "Adaptive Filter Theory", translated by Hiroshi Suzuki et al., first edition, Kagaku Gijutsu Shuppan (Science and Technology Publishing Co., Ltd.), 2001, pp. 66-73, 248-255.
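Equation (9) is not reproduced in this text. The well-known closed-form MVDR solution is W = Q⁻¹a / (a^H Q⁻¹ a), and the sketch below assumes that this is what Equation (9) states. The random matrix standing in for Q(ω) and the comparison filter `W_das` are hypothetical test data, not values from the patent.

```python
import numpy as np

def mvdr_filter(Q, a):
    """Closed-form MVDR filter W = Q^{-1} a / (a^H Q^{-1} a)."""
    Qinv_a = np.linalg.solve(Q, a)          # avoids forming Q^{-1} explicitly
    return Qinv_a / np.vdot(a, Qinv_a)      # np.vdot conjugates its first argument

rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, 6)) + 1j * rng.standard_normal((M, 6))
Q = A @ A.conj().T                          # Hermitian, positive definite stand-in for Q(omega)
a = A[:, 0]                                 # transfer characteristic of the target direction
W = mvdr_filter(Q, a)

W_das = a / np.vdot(a, a).real              # another filter that also satisfies W^H a = 1
power_mvdr = np.vdot(W, Q @ W).real         # W^H Q W
power_das = np.vdot(W_das, Q @ W_das).real
```

Both filters pass the target direction with unit gain, but the MVDR filter never yields a larger output power W^H Q W than the alternative, which is the sense in which it is optimal.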

As the presence of the inverse of the spatial correlation matrix Q(ω) in Equation (9) suggests, the structure of Q(ω) is important for achieving sharp directivity. Equation (7) also shows that the noise power depends on the structure of Q(ω).

Let {1,2,…,P-1} be the set to which the index p of a noise arrival direction belongs, and assume that the index s of the target direction θ_s does not belong to this set. Assuming that P-1 noises arrive from arbitrary directions, the spatial correlation matrix Q(ω) is given by Equation (10a). To obtain a filter that functions adequately even when many noise sources are present, P is preferably fairly large, and is taken to be an integer of about M. To explain the principle of the invention clearly, the target direction θ_s is described here as if it were one specific direction (which is why directions other than θ_s are treated as "noise" directions). As will become clear in the embodiments described later, however, the target direction θ_s is in practice any direction that may be subject to speech enhancement, and several directions are generally assumed as candidates for it. From this viewpoint the distinction between the target direction θ_s and the noise directions is largely subjective. It is more accurate to understand that P different directions are fixed in advance as the assumed arrival directions of sound, without distinguishing target sound from noise, that one direction selected from these P directions is the target direction, and that the remaining directions are the noise directions. Accordingly, letting Φ be the union of the set {1,2,…,P-1} and the set {s}, the spatial correlation matrix Q(ω) is the spatial correlation matrix expressed by the transfer characteristics a^→(ω,θ_φ) = [a₁(ω,θ_φ), …, a_M(ω,θ_φ)]^T (φ∈Φ) of the sound from each assumed direction θ_φ to each microphone, and is given by Equation (10b). Note that |Φ|=P, where |Φ| denotes the number of elements of the set Φ.
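Equations (10a) and (10b) are not reproduced in this text. A minimal sketch, assuming that (10b) is the unnormalized sum of outer products a a^H over the P assumed directions, is given below. The plane-wave transfer vectors, the four directions, and the array parameters are hypothetical stand-ins chosen only to have something concrete to sum.

```python
import numpy as np

def spatial_correlation(A):
    """Sum of the outer products a a^H over the columns of A (shape M x P)."""
    return A @ A.conj().T

def plane_wave(omega, theta, M, u, c=340.0):
    """Hypothetical direct-sound transfer vector of a linear array (center-referenced)."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * (u / c) * (m - (M + 1) / 2) * np.cos(theta))

M, u, omega = 4, 0.04, 2 * np.pi * 1000.0
thetas = np.deg2rad([30.0, 60.0, 90.0, 120.0])       # P = 4 assumed arrival directions
A = np.stack([plane_wave(omega, th, M, u) for th in thetas], axis=1)
Q = spatial_correlation(A)
```

Stacking the transfer vectors as columns makes the whole sum a single matrix product, and the result is Hermitian by construction.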

Now assume that the transfer characteristic a^→(ω,θ_s) of the sound from the target direction θ_s and the transfer characteristics a^→(ω,θ_p) = [a₁(ω,θ_p), …, a_M(ω,θ_p)]^T of the sound from the directions p∈{1,2,…,P-1} are mutually orthogonal. That is, assume that there exist P orthogonal basis vectors satisfying the condition expressed by Equation (11). The symbol ⊥ denotes orthogonality: if A^→⊥B^→, the inner product of the vectors A^→ and B^→ is zero. Here P≦M is assumed. If the condition of Equation (11) is relaxed and it can be assumed that there exist P basis vectors that can be regarded as approximately orthogonal, P is preferably about M, or a fairly large value of M or more.

In this case the spatial correlation matrix Q(ω) can be expanded as in Equation (12). Equation (12) means that Q(ω) can be decomposed using the matrix V(ω) = [a^→(ω,θ_s), a^→(ω,θ₁), …, a^→(ω,θ_P-1)]^T, which consists of P transfer characteristics satisfying orthogonality, and the identity matrix Λ(ω). ρ is the eigenvalue of the spatial correlation matrix Q(ω) associated with the transfer characteristics a^→(ω,θ_φ) satisfying Equation (11), and is a real number.

The inverse of the spatial correlation matrix Q(ω) is then given by Equation (13).

Substituting Equation (13) into Equation (7) shows that the noise power is minimized. When the noise power is minimized, directivity toward the target direction θ_s is achieved. Orthogonality between the transfer characteristics of different directions is therefore an important condition for achieving directivity toward the target direction θ_s.
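The role of orthogonality can be checked numerically. In the sketch below the columns of a DFT matrix stand in for P = M mutually orthogonal transfer characteristics. This is purely an illustrative assumption, since real transfer characteristics are not DFT vectors. The spatial correlation matrix is taken as the sum of outer products and the filter as the closed-form MVDR solution, both assumed forms, because Equations (9) to (13) are not reproduced in this text.

```python
import numpy as np

M = 8
n = np.arange(M)
# Columns of the DFT matrix: M mutually orthogonal vectors used as
# hypothetical transfer characteristics a(omega, theta_phi).
A = np.exp(-2j * np.pi * np.outer(n, n) / M)

Q = A @ A.conj().T                  # sum of a a^H over all P = M directions
a_s = A[:, 0]                       # direction selected as the target
q = np.linalg.solve(Q, a_s)
W = q / np.vdot(a_s, q)             # MVDR filter for the target direction

response = W.conj() @ A             # W^H a(omega, theta_phi) for every direction
```

Under these assumptions the filter passes the target direction with unit gain and places an exact null on every other assumed direction, which is the ideal situation the paragraph above describes.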

The reason why it is difficult for the prior art to achieve sharp directivity toward the target direction θ_s is considered below.

In the prior art, filters were designed on the assumption that the transfer characteristic consists of the direct sound only. In reality, sound emitted from the same source also reaches the microphones as reflected sound after being reflected by walls, the ceiling, and so on, but the reflected sound was regarded as a factor that degrades directivity and its existence was ignored. Letting h^→_d(ω,θ) = [h_d1(ω,θ), …, h_dM(ω,θ)]^T be the steering vector of the direct sound alone arriving from direction θ, the conventional transfer characteristic a^→_conv(ω,θ) = [a₁(ω,θ), …, a_M(ω,θ)]^T was set to a^→_conv(ω,θ) = h^→_d(ω,θ). A steering vector is a complex vector that lists, for a sound wave from direction θ as seen from the center of the microphone array, the phase response of each microphone at frequency ω relative to a reference point.

Assuming that sound arrives at a linear microphone array as a plane wave, the m-th element h_dm(ω,θ) of the direct-sound steering vector h^→_d(ω,θ) is given, for example, by Equation (14a). Here m is each integer satisfying 1≦m≦M, c is the speed of sound, u is the distance between adjacent microphones, and j is the imaginary unit. The reference point is the position at half the total length of the linear microphone array (the center of the linear microphone array). The direction θ is defined as the angle, seen from the center of the linear microphone array, between the arrival direction of the direct sound and the arrangement direction of the microphones in the array (see FIG. 6). A steering vector can be expressed in various ways. For example, if the reference point is the position of the microphone at one end of the linear microphone array, the m-th element h_dm(ω,θ) of the direct-sound steering vector h^→_d(ω,θ) is given, for example, by Equation (14b). In the following description, h_dm(ω,θ) is assumed to be given by Equation (14a).
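Equations (14a) and (14b) are not reproduced in this text. The sketch below assumes the common plane-wave forms in which the phase of microphone m is proportional to its distance from the reference point along the array. The sign convention, the function names, and the numerical values are assumptions made for illustration.

```python
import numpy as np

def steering_center(omega, theta, M, u, c=340.0):
    """Direct-sound steering vector referenced to the array center (assumed form of Eq. (14a))."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * (u / c) * (m - (M + 1) / 2) * np.cos(theta))

def steering_end(omega, theta, M, u, c=340.0):
    """Same vector referenced to the microphone at one end (assumed form of Eq. (14b))."""
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * (u / c) * (m - 1) * np.cos(theta))

omega, theta, M, u = 2 * np.pi * 1000.0, np.deg2rad(40.0), 8, 0.04
h_center = steering_center(omega, theta, M, u)
h_end = steering_end(omega, theta, M, u)
ratio = h_center / h_end            # the same complex constant for every microphone
```

The two references differ only by a common phase factor, so inner products between steering vectors of different directions have the same magnitude under either choice.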

The inner product γ_conv(ω,θ) of the transfer characteristic of direction θ and the transfer characteristic of the target direction θ_s is expressed by Equation (15), where θ≠θ_s.

Hereinafter γ_conv(ω,θ) is called the coherence. The directions θ at which the coherence γ_conv(ω,θ) becomes 0 are given by Equation (16), where q is any integer other than 0. Since 0<θ<π/2, the range of q is limited for each frequency band.
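Equations (15) and (16) are not reproduced in this text. Under the assumed center-referenced plane-wave steering vector, the inner product is a geometric sum that vanishes when cos θ − cos θ_s = 2πqc/(Mωu) for an integer q that is not a multiple of M. The sketch below checks this relation numerically. The normalization by M, the function names, and the parameter values (M = 8, u = 4 cm, c = 340 m/s, 2 kHz) are assumptions.

```python
import numpy as np

def steering(omega, theta, M, u, c=340.0):
    m = np.arange(1, M + 1)
    return np.exp(-1j * omega * (u / c) * (m - (M + 1) / 2) * np.cos(theta))

def coherence(omega, theta, theta_s, M, u, c=340.0):
    """Inner product of the two direct-sound transfer vectors, divided by M (assumed normalization)."""
    return np.vdot(steering(omega, theta_s, M, u, c), steering(omega, theta, M, u, c)) / M

def zero_direction(omega, theta_s, q, M, u, c=340.0):
    """Direction with zero coherence, from cos(theta) - cos(theta_s) = 2*pi*q*c / (M*omega*u)."""
    return np.arccos(2 * np.pi * q * c / (M * omega * u) + np.cos(theta_s))

M, u = 8, 0.04
omega = 2 * np.pi * 2000.0
theta_s = np.pi / 2
theta_0 = zero_direction(omega, theta_s, q=1, M=M, u=u)
gamma_0 = coherence(omega, theta_0, theta_s, M, u)
```

With these hypothetical values the nearest zero lies at cos θ = 0.53125, roughly 32° away from the target direction. At 500 Hz the right-hand side becomes 2.125, which exceeds 1, so no zero exists at all for this array. This illustrates why sharp directivity at low frequencies is hard to obtain from the direct sound alone.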

In Equation (16), the only adjustable parameters are those related to the size of the microphone array (M and u). Therefore, when the direction difference (angle difference) |θ-θ_s| is small, it is difficult to make the coherence γ_conv(ω,θ) small without changing these parameters. In that case the noise power does not become sufficiently small, and, as shown schematically in FIG. 2(a), the resulting directivity has a wide beam width around the target direction θ_s.

Based on these considerations, the present invention rests on the finding that, to design a filter with sharp directivity toward the target direction θ_s, it is important to be able to make the coherence sufficiently small even when the direction difference (angle difference) |θ-θ_s| is small. Unlike the prior art, it is therefore characterized by actively taking reflected sounds into account.

マイクロホンアレーの各マイクロホンには、音源からの直接音と、当該音源からの音が反射物３００で反射した反射音との二種類の平面波が混入することになる。反射音の数をΞとする。Ξは１以上の予め定められた整数である。このとき、伝達特性a^→(ω,θ)＝[a₁(ω,θ),…,a_M(ω,θ)]^Tは、音声強調の対象となりえる方向の音声がマイクロホンアレーに直接届く直接音の伝達特性と当該音声が反射物で反射してマイクロホンアレーに届く一つ以上の反射音の各伝達特性との和、具体的には、直接音とξ番目（1≦ξ≦Ξ）の反射音との到来時間差をτ_ξ(θ)とし、α_ξ（1≦ξ≦Ξ）を反射による音の減衰を考慮するための係数とすると、式（１７ａ）のように、直接音のステアリングベクトルと、反射による音の減衰および直接音に対する到来時間差が補正されたΞ個の反射音のステアリングベクトルの和で表現できる。h^→ _rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^Tは方向θの直接音に対応する反射音のステアリングベクトルを表す。α_ξ（1≦ξ≦Ξ）は、通常、α_ξ≦1（1≦ξ≦Ξ）である。各反射音について、音源からマイクロホンに到達するまでの反射回数が１回であるならば、α_ξ（1≦ξ≦Ξ）は、ξ番目の反射音が反射した物体の音の反射率を表していると考えて差し支えない。

Two types of plane waves reach each microphone of the microphone array: the direct sound from the sound source, and the reflected sound produced when sound from that source is reflected by the reflector 300. Let the number of reflected sounds be Ξ, where Ξ is a predetermined integer of 1 or more. The transfer characteristic a^→(ω,θ)=[a₁(ω,θ),…,a_M(ω,θ)]^T is then the sum of the transfer characteristic of the direct sound, by which speech from a direction that can be a target of speech enhancement reaches the microphone array directly, and the transfer characteristics of one or more reflected sounds, by which that speech reaches the microphone array after being reflected by a reflector. Specifically, let τ_ξ(θ) be the arrival time difference between the direct sound and the ξ-th (1≦ξ≦Ξ) reflected sound, and let α_ξ (1≦ξ≦Ξ) be a coefficient accounting for the attenuation of sound due to reflection. Then, as in equation (17a), the transfer characteristic can be expressed as the sum of the steering vector of the direct sound and the steering vectors of the Ξ reflected sounds, each corrected for the attenuation due to reflection and for the arrival time difference relative to the direct sound. h^→_rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^T denotes the steering vector of the reflected sound corresponding to the direct sound from direction θ. Usually α_ξ≦1 (1≦ξ≦Ξ). If each reflected sound is reflected only once on its way from the sound source to the microphone, α_ξ (1≦ξ≦Ξ) may be regarded as the acoustic reflectance of the object that reflected the ξ-th reflected sound.
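The sum described above can be sketched as follows. Equation (17a) is not reproduced in this text, so the form a = h_d + Σ_ξ α_ξ·exp(-jωτ_ξ(θ))·h_rξ is an assumption consistent with the prose (attenuation and arrival time difference applied to each reflected steering vector). The function names, the tuple layout of `reflections`, and c = 340 m/s are hypothetical.

```python
import cmath
import math


def steering(omega, theta, M, u, c=340.0):
    """Center-referenced plane-wave steering vector (assumed form)."""
    k = omega * u / c
    return [cmath.exp(-1j * k * (m - (M + 1) / 2) * math.cos(theta))
            for m in range(1, M + 1)]


def transfer_characteristic(omega, theta, M, u, reflections, c=340.0):
    """Direct-sound steering vector plus attenuated, delayed reflected ones.

    reflections: list of (alpha, tau, psi), where alpha is the attenuation
    coefficient, tau(theta) the arrival time difference in seconds, and
    psi(theta) the arrival direction of that reflected sound.
    """
    a = steering(omega, theta, M, u, c)
    for alpha, tau, psi in reflections:
        h_r = steering(omega, psi(theta), M, u, c)
        weight = alpha * cmath.exp(-1j * omega * tau(theta))
        a = [x + weight * y for x, y in zip(a, h_r)]
    return a


# One reflector normal to the array axis at L = 0.70 m (illustrative values).
L = 0.70
one_reflection = [(0.8,
                   lambda th: 2 * L * math.cos(th) / 340.0,
                   lambda th: math.pi - th)]
a = transfer_characteristic(2 * math.pi * 1000, math.pi / 4, 24, 0.04, one_reflection)
```

Passing an empty `reflections` list reduces the result to the direct-sound steering vector, which corresponds to the conventional model without reflected sound.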

Since it is desirable to provide one or more reflected sounds to a microphone array composed of M microphones, one or more reflectors are preferably present. From this point of view, assuming that a sound source is located in the target direction, each reflector is preferably arranged, relative to the sound source and the microphone array, so that sound from the source is reflected by at least one reflector and reaches the microphone array. Each reflector has a two-dimensional shape (for example, a flat plate) or a three-dimensional shape (for example, a parabolic shape). Each reflector is preferably as large as the microphone array or larger (about 1 to 2 times its size). To use the reflected sound effectively, the reflectance α_ξ (1≦ξ≦Ξ) of each reflector is at least greater than 0; more specifically, the amplitude of the reflected sound reaching the microphone array is desirably, for example, 0.2 times the amplitude of the direct sound or more. For example, each reflector is a rigid solid. A reflector may be a movable object (for example, a reflector plate) or an immovable object (a floor, wall, or ceiling).
If an immovable object is used as a reflector, the steering vectors of the reflected sounds must be changed whenever, for example, the installation position of the microphone array changes (see the functions Ψ(θ) and Ψ_ξ(θ) described later), and the filters must consequently be recalculated (reset). To be robust against environmental changes, each reflector is therefore preferably a subordinate of the microphone array (in this case, the Ξ assumed reflected sounds are considered to be produced by these reflectors). Here, a "subordinate of the microphone array" means "a tangible object that can follow changes in the position, orientation, and so on of the microphone array while maintaining its positional (geometric) relationship to the microphone array". A simple example is a configuration in which each reflector is fixed to the microphone array.

以下、本発明の利点を具体的に説明する観点から、Ξ=1とし、反射音の反射回数は１回であって、マイクロホンアレーの中心からLメートル離れた位置に一つの反射物が存在すると仮定する。反射物は厚みのある剛体とする。この場合、Ξ=1であるからこれを表す添え字を略することとして、式（１７ａ）は式（１７ｂ）のように表すことができる。

In the following, to explain the advantages of the present invention concretely, it is assumed that Ξ=1, that the reflected sound is reflected once, and that one reflector is located L meters from the center of the microphone array. The reflector is a thick rigid body. Because Ξ=1, the subscript ξ is omitted, and equation (17a) can be written as equation (17b).

反射音のステアリングベクトルh^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^Tのm番目の要素は、直接音のステアリングベクトルの表し方と同様に（式（１４ａ）参照）、式（１８ａ）で表される。関数Ψ(θ)は反射音の到来方向を出力する。なお、直接音のステアリングベクトルを式（１４ｂ）で表す場合には、反射音のステアリングベクトルh^→ _r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^Tのm番目の要素は式（１８ｂ）で表される。一般的に、ξ番目（1≦ξ≦Ξ）のステアリングベクトルh^→ _rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^Tのm番目の要素は、式（１８ｃ）や式（１８ｄ）で表される。関数Ψ_ξ(θ)はξ番目（1≦ξ≦Ξ）の反射音の到来方向を出力する。

As with the direct sound steering vector (see equation (14a)), the m-th element of the reflected sound steering vector h^→_r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^T is given by equation (18a). The function Ψ(θ) outputs the arrival direction of the reflected sound. When the direct sound steering vector is expressed by equation (14b), the m-th element of the reflected sound steering vector h^→_r(ω,θ)=[h_r1(ω,θ),…,h_rM(ω,θ)]^T is given by equation (18b). In general, the m-th element of the ξ-th (1≦ξ≦Ξ) steering vector h^→_rξ(ω,θ)=[h_r1ξ(ω,θ),…,h_rMξ(ω,θ)]^T is given by equation (18c) or equation (18d). The function Ψ_ξ(θ) outputs the arrival direction of the ξ-th (1≦ξ≦Ξ) reflected sound.

反射物の位置は適宜に設定可能であるから、反射音の到来方向は変更可能なパラメータとして扱うことができる。 Since the position of the reflector can be set as appropriate, the arrival direction of the reflected sound can be treated as a variable parameter.

平板状の反射物がマイクロホンアレーの近傍にある（距離Lがマイクロホンアレーのサイズに比して極端に大きくない）と仮定すると、コヒーレンスγ(ω,θ)は式（１９）で表される。なお、θ≠θ_sとする。

Assuming that the flat reflector is in the vicinity of the microphone array (distance L is not extremely large compared to the size of the microphone array), coherence γ (ω, θ) is expressed by equation (19). Note that θ ≠ θ _s .

Equation (19) shows that the coherence γ(ω,θ) of equation (19) can be smaller than the conventional coherence γ_conv(ω,θ) of equation (15). The second to fourth terms of equation (19) contain parameters (Ψ(θ) and L) that can be changed by how the reflector is placed, so these terms may be able to cancel the first term h^→_d^H(ω,θ)h^→_d(ω,θ).

例えば、線形マイクロホンアレーに対して、マイクロホンの配列方向が反射板の法線となるように平板の反射板を配置すると、関数Ψ(θ)についてΨ(θ)=π-θが成立し、直接音と反射音との到来時間差τ(θ)について式（２０）が成立するので、式（１９）を構成する要素に式（２１）（２２）の各条件が生成される。記号＊は複素共役を表す演算子である。

For example, if a flat reflector is placed relative to a linear microphone array so that the microphone arrangement direction is normal to the reflector, then Ψ(θ)=π-θ holds for the function Ψ(θ) and equation (20) holds for the arrival time difference τ(θ) between the direct sound and the reflected sound. The elements constituting equation (19) therefore satisfy the conditions of equations (21) and (22). The symbol * denotes the complex conjugate operator.
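A small sketch of this geometry follows. Ψ(θ)=π-θ is taken from the text; the arrival time difference τ(θ)=2L·cosθ/c is an assumed form obtained from the mirror-image construction (the image of the array center lies 2L away along the array axis), since equation (20) is not reproduced here. With the center-referenced plane-wave steering vector assumed below, the reflected steering vector then becomes the complex conjugate of the direct one, which is one plausible reading of the conjugate conditions mentioned above. All names and c = 340 m/s are illustrative.

```python
import cmath
import math


def steering(omega, theta, M, u, c=340.0):
    """Center-referenced plane-wave steering vector (assumed form)."""
    k = omega * u / c
    return [cmath.exp(-1j * k * (m - (M + 1) / 2) * math.cos(theta))
            for m in range(1, M + 1)]


def psi_normal(theta):
    """Arrival direction of the reflected sound for a reflector normal to the array axis."""
    return math.pi - theta


def tau_normal(theta, L, c=340.0):
    """Assumed arrival time difference: extra path 2 * L * cos(theta) divided by c."""
    return 2 * L * math.cos(theta) / c


omega, theta = 2 * math.pi * 1000, math.pi / 4
h_d = steering(omega, theta, 24, 0.04)
h_r = steering(omega, psi_normal(theta), 24, 0.04)
# Largest deviation between h_r and the conjugate of h_d (expected to be tiny).
max_dev = max(abs(r - d.conjugate()) for r, d in zip(h_r, h_d))
```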

h^→ _d ^H(ω,θ)h^→ _r(ω,θ)の絶対値はh^→ _d ^H(ω,θ)h^→ _d(ω,θ)よりも十分に小さいので、式（１９）の第２項、第３項を無視すると、コヒーレンスγ(ω,θ)は式（２３）のように近似できる。

Since the absolute value of h^→_d^H(ω,θ)h^→_r(ω,θ) is sufficiently smaller than h^→_d^H(ω,θ)h^→_d(ω,θ), the second and third terms of equation (19) can be ignored, and the coherence γ(ω,θ) can be approximated as in equation (23).

仮にh^→ _d ^H(ω,θ)h^→ _d(ω,θ)≠0であるとしても、近似コヒーレンスγ~(ω,θ)は式（２４）の極小解θを持つ。ｑは任意の正整数である。また、ｑの範囲は周波数帯域ごとに制限される。

Even if h^→_d^H(ω,θ)h^→_d(ω,θ)≠0, the approximate coherence γ~(ω,θ) has minima at the directions θ given by equation (24), where q is an arbitrary positive integer. The range of q is limited for each frequency band.

That is, coherence can be suppressed not only in the directions given by equation (16) but also in the directions given by equation (24). If coherence can be suppressed, the noise power can be made smaller, so sharp directivity can be realized, as schematically shown in FIG. 2(b).

FIG. 2 schematically shows the difference in directivity between the principle of the present invention and the prior art, whereas FIG. 3 shows concretely the difference between the θ given by equation (16) and the θ given by equation (24). Here ω=2π×1000 [rad/s], L=0.70 [m], and θ_s=π/4 [rad]. For comparison, FIG. 3 shows the direction dependence of the normalized coherence; the directions marked with ○ are the θ given by equation (16), and the directions marked with + are the θ given by equation (24). As is apparent from FIG. 3, with the prior art the only θ at which the coherence becomes zero for θ_s=π/4 [rad] are the directions marked with ○. With the principle of the present invention, the θ at which the coherence becomes zero for θ_s=π/4 [rad] also exist in the many directions marked with +. In particular, some of the directions marked with + are much closer to θ_s=π/4 [rad] than the directions marked with ○, so sharper directivity is realized than with the prior art.
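The comparison can be reproduced approximately with the sketch below, under several assumptions. The conventional zeros use cosθ = cosθ_s + 2qπc/(Mωu), and the reflector-induced minima use the condition that the phase ω·2L(cosθ-cosθ_s)/c is an odd multiple of π, which minimizes the factor |1+α²·exp(jφ)| of the approximate coherence. Both closed forms are derived here from the assumed plane-wave model and are not copied from equations (16) and (24). M=24 and u=0.04 m are borrowed from the experiment described later, and c = 340 m/s is assumed; the parameters behind FIG. 3 may differ.

```python
import cmath
import math


def conventional_nulls(omega, theta_s, M, u, c=340.0):
    """Directions in (0, pi/2) where the direct-sound coherence is zero."""
    step = 2 * math.pi * c / (M * omega * u)
    q_max = int(2 / step) + 1
    out = []
    for q in range(-q_max, q_max + 1):
        if q % M == 0:  # skip theta_s itself and grating lobes
            continue
        cos_theta = math.cos(theta_s) + q * step
        if 0 < cos_theta < 1:
            out.append(math.acos(cos_theta))
    return sorted(out)


def reflection_minima(omega, theta_s, L, c=340.0):
    """Directions in (0, pi/2) where the phase 2*omega*L*(cos(theta)-cos(theta_s))/c
    is an odd multiple of pi (assumed condition for the reflector-induced minima)."""
    step = math.pi * c / (2 * L * omega)
    n_max = int(2 / step) + 1
    out = []
    for n in range(-n_max, n_max + 1):
        if n % 2 == 0:  # keep odd multiples only
            continue
        cos_theta = math.cos(theta_s) + n * step
        if 0 < cos_theta < 1:
            out.append(math.acos(cos_theta))
    return sorted(out)


def reflection_factor(omega, theta, theta_s, L, alpha, c=340.0):
    """|1 + alpha^2 * exp(j * phi)| with phi from the assumed time difference."""
    phi = 2 * omega * L * (math.cos(theta) - math.cos(theta_s)) / c
    return abs(1 + alpha ** 2 * cmath.exp(1j * phi))


omega, theta_s = 2 * math.pi * 1000, math.pi / 4
nulls = conventional_nulls(omega, theta_s, M=24, u=0.04)
minima = reflection_minima(omega, theta_s, L=0.70)
closest_null = min(abs(t - theta_s) for t in nulls)
closest_minimum = min(abs(t - theta_s) for t in minima)
```

Under these assumptions the reflector contributes more suppressed directions than the array alone, and the nearest one lies closer to θ_s, in line with the qualitative description of FIG. 3.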

As is apparent from the above description, the essential feature of the present invention is that the transfer characteristic a^→(ω,θ)=[a₁(ω,θ),…,a_M(ω,θ)]^T is expressed as the sum of the steering vector of the direct sound and the steering vectors of the Ξ reflected sounds, for example as in equation (17a). Since this does not affect the filter design concept itself, the filter W^→(ω,θ_s) can also be designed by methods other than the minimum variance distortionless response method.

最小分散無歪応答法以外の手法として、SN比最大化規準によるフィルタ設計法とパワーインバージョン(Power Inversion)に基づくフィルタ設計法を説明する。SN比最大化規準によるフィルタ設計法とパワーインバージョンに基づくフィルタ設計法については参考文献２を参照のこと。
As methods other than the minimum variance distortionless response method, a filter design method based on the SN ratio maximization criterion and a filter design method based on power inversion are described below. For both methods, see Reference 2.
(Reference 2) Nobuyoshi Kikuma, "Adaptive Antenna Technology", 1st edition, Ohmsha, Ltd., 2003, pp. 35-90

<１>SN比最大化規準によるフィルタ設計法
SN比最大化規準によるフィルタ設計法では、目的方向θ_sでのSN比（SNR）を最大化する規準でフィルタW^→(ω,θ_s)を決定する。目的方向θ_sの音声の空間相関行列をR_ss(ω)、目的方向θ_s以外の方向の音声の空間相関行列をR_nn(ω)とする。このとき、SNRは式（２５）で表される。なお、R_ss(ω)は式（２６）、R_nn(ω)は式（２７）で表される。伝達特性a^→(ω,θ_s)＝[a₁(ω,θ_s),…,a_M(ω,θ_s)]^Tは式（１７ａ）で表される（正確には、式（１７ａ）のθをθ_sとしたものである）。

<1> Filter design method based on the SN ratio maximization criterion
In the filter design method based on the SN ratio maximization criterion, the filter W^→(ω,θ_s) is determined so as to maximize the SN ratio (SNR) in the target direction θ_s. Let R_ss(ω) be the spatial correlation matrix of speech from the target direction θ_s, and let R_nn(ω) be the spatial correlation matrix of speech from directions other than the target direction θ_s. The SNR is then expressed by equation (25), where R_ss(ω) is given by equation (26) and R_nn(ω) by equation (27). The transfer characteristic a^→(ω,θ_s)=[a₁(ω,θ_s),…,a_M(ω,θ_s)]^T is given by equation (17a) (more precisely, by equation (17a) with θ replaced by θ_s).

式（２５）のSNRを最大にするフィルタW^→(ω,θ_s)は、フィルタW^→(ω,θ_s)に関する勾配をゼロとすること、つまり式（２８）によって求めることができる。

The filter W ^→ (ω, θ _s ) that maximizes the SNR in Expression (25) can be obtained by setting the gradient related to the filter W ^→ (ω, θ _s ) to zero, that is, Expression (28).

これにより、式（２５）のSNRを最大にするフィルタW^→(ω,θ_s)は式（２９）で与えられる。

Thus, the filter W ^→ (ω, θ _s ) that maximizes the SNR in Expression (25) is given by Expression (29).

式（２９）には目的方向θ_s以外の方向の音声の空間相関行列R_nn(ω)の逆行列が含まれているが、R_nn(ω)の逆行列を、目的方向θ_sの音声と目的方向θ_s以外の方向の音声を含む入力全体の空間相関行列R_xx(ω)の逆行列に置換してもよいことが知られている。なお、R_xx(ω)=R_ss(ω)+R_nn(ω)=Q(ω)である（式（１０ａ）、式（２６）、式（２７）参照）。つまり、式（２５）のSNRを最大にするフィルタW^→(ω,θ_s)を式（３０）で求めてもよい。

Equation (29) contains the inverse of the spatial correlation matrix R_nn(ω) of speech from directions other than the target direction θ_s. It is known that this inverse may be replaced with the inverse of the spatial correlation matrix R_xx(ω) of the entire input, which includes both speech from the target direction θ_s and speech from other directions. Note that R_xx(ω)=R_ss(ω)+R_nn(ω)=Q(ω) (see equations (10a), (26), and (27)). That is, the filter W^→(ω,θ_s) that maximizes the SNR of equation (25) may also be obtained by equation (30).
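As a minimal sketch of this kind of filter, the code below computes W = R⁻¹·a by solving the linear system R·W = a, where R stands for either R_nn(ω) or R_xx(ω). This is the well-known maximum-SNR solution up to a scale factor; equations (29) and (30) are not reproduced in this text, so any normalization they include is not modeled. The helper `solve` and the function name `max_snr_filter` are hypothetical, and plain Gaussian elimination is used only to keep the example free of external libraries.

```python
def solve(A, b):
    """Solve A x = b for complex A by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[complex(v) for v in A[i]] + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def max_snr_filter(R, a):
    """Filter proportional to R^{-1} a (R is R_nn or R_xx, a is the transfer characteristic)."""
    return solve(R, a)


# Toy 2-microphone example with a Hermitian correlation matrix.
R = [[2, 1j], [-1j, 2]]
a = [1, 0]
W = max_snr_filter(R, a)
```

Solving the system rather than forming the inverse explicitly is a common design choice, since only the product R⁻¹·a is needed.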

<２>パワーインバージョンに基づくフィルタ設計法
パワーインバージョンに基づくフィルタ設計法では、一つのマイクロホンに対するフィルタ係数を一定値に固定した状態で出力のパワーを最小化する基準でフィルタW^→(ω,θ_s)を決定する。ここでは、一例として、M個のマイクロホンのうち1番目のマイクロホンに対するフィルタ係数を固定するとして説明する。この設計法では、フィルタW^→(ω,θ_s)は、式（３２）の拘束条件の下、空間相関行列R_xx(ω)を用いて全方向（音声の到来方向として想定される全ての方向）の音声のパワーが最小となるように設計される（式（３１）参照）。伝達特性a^→(ω,θ_s)＝[a₁(ω,θ_s),…,a_M(ω,θ_s)]^Tは式（１７ａ）で表される（正確には、式（１７ａ）のθをθ_sとしたものである）。なお、R_xx(ω)=Q(ω)である（式（１０ａ）、式（２６）、式（２７）参照）。

<2> Filter design method based on power inversion
In the filter design method based on power inversion, the filter W^→(ω,θ_s) is determined so as to minimize the output power while the filter coefficient for one microphone is fixed to a constant value. Here, as an example, the filter coefficient for the first of the M microphones is fixed. In this design method, the filter W^→(ω,θ_s) is designed, under the constraint of equation (32), so that the power of speech from all directions (all directions assumed as arrival directions of speech) is minimized using the spatial correlation matrix R_xx(ω) (see equation (31)). The transfer characteristic a^→(ω,θ_s)=[a₁(ω,θ_s),…,a_M(ω,θ_s)]^T is given by equation (17a) (more precisely, by equation (17a) with θ replaced by θ_s). Note that R_xx(ω)=Q(ω) (see equations (10a), (26), and (27)).

式（３１）の最適解であるフィルタW^→(ω,θ_s)は式（３３）で与えられることが知られている（参考文献２参照）。

It is known that the filter W ^→ (ω, θ _s ), which is the optimal solution of the equation (31), is given by the equation (33) (see Reference 2).

《実施形態》
本発明の実施形態の機能構成および処理フローを図４と図５に示す。この実施形態の狭指向音声強調装置１は、ＡＤ変換部２１０、フレーム生成部２２０、周波数領域変換部２３０、フィルタ適用部２４０、時間領域変換部２５０、フィルタ設計部２６０、記憶部２９０を含む。 <Embodiment>
The functional configuration and processing flow of an embodiment of the present invention are shown in FIGS. 4 and 5. The narrow-directional speech enhancement apparatus 1 according to this embodiment includes an AD conversion unit 210, a frame generation unit 220, a frequency domain conversion unit 230, a filter application unit 240, a time domain conversion unit 250, a filter design unit 260, and a storage unit 290.

ステップＳ１
予め、フィルタ設計部２６０が音声強調の対象となりえる離散的な方向ごとに、周波数ごとのフィルタW^→(ω,θ_i)を計算しておく。音声強調の対象となりえる離散的な方向の総数をI（Iは１以上の予め定められた整数であり、I≦Pを満たす）とすると、W^→(ω,θ₁)，…，W^→(ω,θ_i)，…，W^→(ω,θ_I)（1≦i≦I, ω∈Ω; iは整数、Ωは周波数ωの集合）を事前に計算しておくのである。このためには、伝達特性a^→(ω,θ_i)＝[a₁(ω,θ_i),…,a_M(ω,θ_i)]^T（1≦i≦I, ω∈Ω）を求める必要があるが、これは、マイクロホンアレーにおけるマイクロホンの配置、反射物である例えば反射板、床、壁、天井のマイクロホンアレーに対する位置関係、直接音とξ番目（1≦ξ≦Ξ）の反射音との到来時間差、反射物の音の反射率などの環境情報を基に式（１７ａ）によって具体的に計算できる（正確には、式（１７ａ）のθをθ_iとしたものである）。反射音の数Ξは１≦Ξを満たす整数に設定されるが、Ξの値として特に限定はなく計算能力などに応じて適宜に設定すればよい。一つの反射板をマイクロホンアレーの近傍に設置する場合には、伝達特性a^→(ω,θ_i)は式（１７ｂ）によって具体的に計算できる（正確には、式（１７ｂ）のθをθ_iとしたものである）。ステアリングベクトルの計算には、例えば式（１４ａ）、式（１４ｂ）、式（１８ａ）、式（１８ｂ）、式（１８ｃ）、式（１８ｄ）を用いることができる。なお、式（１７ａ）や式（１７ｂ）に拠らず、例えば実環境下における実測で得られた伝達特性を用いてもよい。そして、伝達特性a^→(ω,θ_i)を用いて、例えば式（９）、式（２９）、式（３０）、式（３３）のいずれかによってW^→(ω,θ_i)（1≦i≦I）を求める。なお、式（９）または式（３０）または式（３３）を用いる場合には空間相関行列Q(ω)（あるいはR_xx(ω)）は式（１０ｂ）で計算できる。式（２９）を用いる場合には空間相関行列R_nn(ω)は式（２７）で計算できる。I×|Ω|個のフィルタW^→(ω,θ_i)（1≦i≦I,ω∈Ω）は記憶部２９０に記憶される。|Ω|は集合Ωの要素数を表す。 Step S1
In advance, the filter design unit 260 calculates a filter W^→(ω,θ_i) for each frequency and for each discrete direction that can be a target of speech enhancement. Let I be the total number of discrete directions that can be targets of speech enhancement (I is a predetermined integer of 1 or more satisfying I≦P). Then W^→(ω,θ₁),…,W^→(ω,θ_i),…,W^→(ω,θ_I) (1≦i≦I, ω∈Ω; i is an integer, and Ω is the set of frequencies ω) are calculated in advance. This requires the transfer characteristics a^→(ω,θ_i)=[a₁(ω,θ_i),…,a_M(ω,θ_i)]^T (1≦i≦I, ω∈Ω). These can be calculated concretely by equation (17a) (more precisely, equation (17a) with θ replaced by θ_i) from environmental information such as the arrangement of the microphones in the microphone array, the positions relative to the microphone array of reflectors such as a reflector plate, floor, wall, or ceiling, the arrival time difference between the direct sound and the ξ-th (1≦ξ≦Ξ) reflected sound, and the acoustic reflectance of each reflector. The number of reflected sounds Ξ is set to an integer satisfying 1≦Ξ; its value is not otherwise limited and may be set as appropriate according to, for example, the available computing capacity. When one reflector plate is installed near the microphone array, the transfer characteristic a^→(ω,θ_i) can be calculated by equation (17b) (more precisely, equation (17b) with θ replaced by θ_i). The steering vectors can be calculated using, for example, equations (14a), (14b), (18a), (18b), (18c), and (18d). Instead of equations (17a) and (17b), transfer characteristics obtained, for example, by actual measurement in a real environment may be used.
Then, using the transfer characteristic a^→(ω,θ_i), W^→(ω,θ_i) (1≦i≦I) is obtained by, for example, any one of equations (9), (29), (30), and (33). When equation (9), (30), or (33) is used, the spatial correlation matrix Q(ω) (or R_xx(ω)) can be calculated by equation (10b). When equation (29) is used, the spatial correlation matrix R_nn(ω) can be calculated by equation (27). The I×|Ω| filters W^→(ω,θ_i) (1≦i≦I, ω∈Ω) are stored in the storage unit 290. |Ω| denotes the number of elements of the set Ω.

ステップＳ２
マイクロホンアレーを構成するM個のマイクロホン２００−１，…，２００−Ｍを用いて収音する。Mは２以上の整数である。 Step S2
Sound is picked up using M microphones 200-1,..., 200-M constituting the microphone array. M is an integer of 2 or more.

There is no restriction on how the M microphones are arranged. However, arranging the M microphones two-dimensionally or three-dimensionally has the advantage of removing ambiguity in the direction of speech enhancement. For example, when M microphones are arranged in a horizontal straight line, speech arriving from the front cannot be distinguished from speech arriving from directly above; arranging the microphones in a plane or in three dimensions prevents this problem. In addition, so that a wide range of directions can be set as the sound collection direction, each microphone should preferably have a directivity that allows it to pick up sound at a reasonable sound pressure from any direction that may become the target direction θ_s. Microphones with relatively gentle directivity, such as omnidirectional or unidirectional microphones, are therefore suitable.

ステップＳ３
ＡＤ変換部２１０が、M個のマイクロホン２００−１，…，２００−Ｍで収音されたアナログ信号（収音信号）をディジタル信号x^→(t)＝[x₁(t),…,x_M(t)]^Tへ変換する。ｔは離散時間のインデックスを表す。 Step S3
The AD conversion unit 210 converts the analog signals (collected sound signals) picked up by the M microphones 200-1,…,200-M into a digital signal x^→(t)=[x₁(t),…,x_M(t)]^T. Here t denotes the discrete time index.

ステップＳ４
フレーム生成部２２０は、ＡＤ変換部２１０が出力したディジタル信号x^→(t)＝[x₁(t),…,x_M(t)]^Tを入力とし、チャネルごとにNサンプルをバッファに貯めてフレーム単位のディジタル信号x^→(k)＝[x^→ ₁(k),…,x^→ _M(k)]^Tを出力する。kはフレーム番号のインデックスである。x^→ _m(k)=[x_m((k-1)N+1),…,x_m(kN)]（1≦m≦M）である。Nはサンプリング周波数にもよるが、16kHzサンプリングの場合には512点あたりが妥当である。 Step S4
The frame generation unit 220 receives the digital signal x^→(t)=[x₁(t),…,x_M(t)]^T output from the AD conversion unit 210, stores N samples in a buffer for each channel, and outputs a frame-by-frame digital signal x^→(k)=[x^→₁(k),…,x^→_M(k)]^T. Here k is the frame number index, and x^→_m(k)=[x_m((k-1)N+1),…,x_m(kN)] (1≦m≦M). The appropriate N depends on the sampling frequency; for 16 kHz sampling, about 512 points is reasonable.
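The framing step can be sketched as below. The function name and the nested-list data layout are assumptions for illustration; frames are non-overlapping blocks of N samples as described, indices are 0-based here (the text counts frames and samples from 1), and trailing samples that do not fill a whole frame are dropped.

```python
def make_frames(x, N):
    """Split M-channel samples into non-overlapping frames of N samples.

    x: list of M channels, each a list of samples.
    Returns a list of frames; frame k is a list of M lists, where channel m
    holds samples k*N .. (k+1)*N - 1 (0-based counterpart of the text's
    x_m((k-1)N+1), ..., x_m(kN)).
    """
    n_frames = min(len(channel) for channel in x) // N
    return [[channel[k * N:(k + 1) * N] for channel in x]
            for k in range(n_frames)]


# Two channels, five samples each, frame length 2: the fifth sample is dropped.
frames = make_frames([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], N=2)
```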

ステップＳ５
周波数領域変換部２３０は、各フレームのディジタル信号x^→(k)を周波数領域の信号X^→(ω,k)＝[X₁(ω,k),…,X_M(ω,k)]^Tに変換して出力する。ωは離散周波数のインデックスである。時間領域信号を周波数領域信号に変換する方法の一つに高速離散フーリエ変換があるが、これに限定されず、周波数領域信号に変換する他の方法を用いてもよい。周波数領域信号X^→(ω,k)は、各周波数ω、フレームkごとに出力される。 Step S5
The frequency domain transform unit 230 transforms the digital signal x^→(k) of each frame into a frequency domain signal X^→(ω,k)=[X₁(ω,k),…,X_M(ω,k)]^T and outputs it. Here ω is the discrete frequency index. One method of transforming a time domain signal into a frequency domain signal is the fast discrete Fourier transform, but the method is not limited to this, and other transforms to the frequency domain may be used. The frequency domain signal X^→(ω,k) is output for each frequency ω and each frame k.
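A minimal sketch of this transform for one frame follows. It uses a naive O(N²) discrete Fourier transform purely for readability; a fast Fourier transform gives the same values and would be used in practice. The function names and the output layout, a list indexed by frequency bin whose entries are the M-channel vectors, are illustrative assumptions.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of one channel of one frame."""
    N = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * math.pi * w * n / N) for n in range(N))
            for w in range(N)]


def to_frequency_domain(frame):
    """frame: list of M channels (N samples each).

    Returns X where X[w] = [X_1(w), ..., X_M(w)] for each frequency bin w.
    """
    spectra = [dft(channel) for channel in frame]
    n_bins = len(spectra[0])
    return [[spectrum[w] for spectrum in spectra] for w in range(n_bins)]


# Channel 1 is constant, channel 2 is a unit impulse.
X = to_frequency_domain([[1, 1, 1, 1], [1, 0, 0, 0]])
```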

ステップＳ６
フィルタ適用部２４０は、フレームkごとに、各周波数ω∈Ωについて、周波数領域信号X^→(ω,k)＝[X₁(ω,k),…,X_M(ω,k)]^Tに、強調したい目的方向θ_sに対応するフィルタW^→(ω,θ_s)を適用して、出力信号Y(ω,k,θ_s)を出力する（式（３４）参照）。目的方向θ_sのインデックスsは、s∈{1,…,I}であり、フィルタW^→(ω,θ_s)は記憶部２９０に記憶されているので、例えば、ステップＳ６の処理の都度、フィルタ適用部２４０は、強調したい目的方向θ_sに対応するフィルタW^→(ω,θ_s)を記憶部２９０から取得すればよい。目的方向θ_sのインデックスsが集合{1,…,I}に属さない場合、つまり、目的方向θ_sに対応するフィルタW^→(ω,θ_s)がステップＳ１の処理で計算されていない場合、臨時に目的方向θ_sに対応するフィルタW^→(ω,θ_s)をフィルタ設計部２６０に計算させてもよいし、あるいは目的方向θ_sに近い方向θ_s'に対応するフィルタW^→(ω,θ_s')を用いてよい。

Step S6
For each frame k and each frequency ω∈Ω, the filter application unit 240 applies the filter W^→(ω,θ_s) corresponding to the target direction θ_s to be enhanced to the frequency domain signal X^→(ω,k)=[X₁(ω,k),…,X_M(ω,k)]^T and outputs the output signal Y(ω,k,θ_s) (see equation (34)). The index s of the target direction θ_s satisfies s∈{1,…,I}, and the filters W^→(ω,θ_s) are stored in the storage unit 290; thus, for example, each time step S6 is performed, the filter application unit 240 may obtain the filter W^→(ω,θ_s) corresponding to the target direction θ_s from the storage unit 290. If the index s of the target direction θ_s does not belong to the set {1,…,I}, that is, if the filter W^→(ω,θ_s) corresponding to the target direction θ_s was not calculated in step S1, the filter design unit 260 may be made to calculate the filter W^→(ω,θ_s) on demand, or the filter W^→(ω,θ_s') corresponding to a direction θ_s' close to the target direction θ_s may be used.
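The filter application and the fallback to a nearby stored direction can be sketched as follows. Equation (34) is not reproduced in this text; the form Y = W^H·X (conjugate of each filter coefficient times the corresponding channel, summed over channels) is the usual beamformer output and is assumed here. The function names and the list-based storage of directions are hypothetical.

```python
def apply_filter(W, X):
    """Beamformer output for one frequency bin: Y = W^H X (assumed form of (34))."""
    return sum(w.conjugate() * x for w, x in zip(W, X))


def nearest_direction(theta_s, directions):
    """Index of the stored direction closest to theta_s, for use when no
    filter was precomputed for theta_s itself."""
    return min(range(len(directions)), key=lambda i: abs(directions[i] - theta_s))


# Two-microphone toy example.
Y = apply_filter([1 + 0j, 1j], [2 + 0j, 3 + 0j])
idx = nearest_direction(0.8, [0.0, 0.785, 1.57])
```

Using the nearest stored direction trades a small pointing error for avoiding a filter recalculation at run time.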

ステップＳ７
時間領域変換部２５０は、第kフレームの各周波数ω∈Ωの出力信号Y(ω,k,θ_s)を時間領域に変換して第kフレームのフレーム単位時間領域信号y(k)を得て、さらに、得られたフレーム単位時間領域信号y(k)をフレーム番号のインデックスの順番に連結して目的方向θ_sの音声が強調された時間領域信号y(t)を出力する。周波数領域信号を時間領域信号に変換する方法は、ステップＳ５の処理で用いた変換方法に対応する逆変換であり、例えば高速離散逆フーリエ変換である。 Step S7
The time domain transform unit 250 transforms the output signal Y (ω, k, θ _s ) of each frequency ω∈Ω of the kth frame into the time domain to obtain a frame unit time domain signal y (k) of the kth frame. Further, the obtained frame unit time domain signal y (k) is connected in the order of the frame number index to output the time domain signal y (t) in which the voice in the target direction θ _s is emphasized. The method for converting the frequency domain signal into the time domain signal is an inverse transformation corresponding to the transformation method used in the process of step S5, for example, a fast discrete inverse Fourier transform.
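A sketch of this step, assuming the forward transform was a plain discrete Fourier transform over all N bins and that frames do not overlap, so the per-frame signals can simply be concatenated. The naive inverse transform and the function names are for illustration only; the real part is taken on the assumption that the output spectrum corresponds to a real-valued signal.

```python
import cmath
import math


def idft(spectrum):
    """Naive inverse discrete Fourier transform returning real samples."""
    N = len(spectrum)
    return [sum(spectrum[w] * cmath.exp(2j * math.pi * w * n / N)
                for w in range(N)).real / N
            for n in range(N)]


def to_time_domain(Y_frames):
    """Y_frames[k][w] is the output Y for frame k and bin w.

    Each frame is inverse-transformed and the results are concatenated in
    frame order to form the enhanced time domain signal y(t).
    """
    y = []
    for Y in Y_frames:
        y.extend(idft(Y))
    return y


# First frame: constant signal of amplitude 1; second frame: silence.
y = to_time_domain([[4, 0, 0, 0], [0, 0, 0, 0]])
```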

ここでは、ステップＳ１の処理で予めフィルタW^→(ω,θ_i)を計算しておく実施形態を説明したが、狭指向音声強調装置１の計算処理能力などに応じて、目的方向θ_sが定まってからフィルタ設計部２６０が周波数ごとのフィルタW^→(ω,θ_s)を計算する実施形態を採用することもできる。 Here, the embodiment in which the filter W ^→ (ω, θ _i ) is calculated in advance in the process of step S1 has been described, but the target direction θ _s is determined according to the calculation processing capability of the narrow directivity speech enhancement apparatus 1 and the like. An embodiment in which the filter design unit 260 calculates the filter W ^→ (ω, θ _s ) for each frequency after it is determined can also be adopted.

Experimental results for an embodiment of the present invention (using the minimum variance distortionless response method) are described below. As shown in FIG. 6, 24 microphones were arranged in a straight line, and the reflector 300 was placed so that the arrangement direction of the microphones in this linear microphone array was normal to the reflector 300. The shape of the reflector 300 is not restricted, but a flat reflector with a planar reflecting surface, a size of 1.0 m × 1.0 m, and moderate thickness and rigidity was used. The spacing between adjacent microphones was 4 cm, and the reflectance α of the reflector 300 was 0.8. The target direction θ_s was set to 45 degrees. Assuming that speech arrives at the linear microphone array as a plane wave, the transfer characteristic was calculated by equation (17b) (see also equations (14a) and (18a)), and the directivity of the resulting filter was examined. Two conventional methods were used for comparison: the minimum variance distortionless response method without a reflector, and the delay-and-sum method with a reflector.

The experimental results are shown in FIGS. 7 and 8. Compared with the two conventional methods, the embodiment of the present invention achieves sharper directivity toward the target direction in every frequency band. The benefit of the present invention is particularly evident in the lower frequency bands. FIG. 9 shows the directivity of the filter W^→(ω,θ) generated according to the embodiment of the present invention. FIG. 9 shows that not only the direct sound but also the reflected sound is enhanced.

The same experiment as described above was also conducted for the case where, as shown in FIG. 10, the reflector 300 was placed so that the angle between the arrangement direction of the microphones in the linear microphone array and the plane of the reflector 300 was 45 degrees. The target direction θ_s was set to 22.5 degrees, and the other experimental conditions were the same as in the case where the reflector 300 was placed so that the arrangement direction of the microphones in the linear microphone array was normal to the reflector 300.

The experimental results are shown in FIGS. 11 and 12. Compared with the two conventional methods, the embodiment of the present invention achieves sharper directivity toward the target direction in every frequency band. The benefit of the present invention is particularly evident in the lower frequency bands.

Next, example configurations of the present invention are described with reference to FIGS. 13 to 17. In these examples, the microphone array is illustrated as a linear microphone array, but the configuration is not limited to a linear microphone array.

図１３に示す実施構成例では、線形マイクロホンアレーを構成するM個のマイクロホン２００−１，…，２００−Ｍは矩形平板状の支持部材４００に固定されており、この状態で各マイクロホンの収音孔は支持部材４００の或る一つの平面（以下、開口面と呼ぶ）に配置されている（図示の例ではM=13）。なお、各マイクロホン２００−１，…，２００−Ｍに接続される配線は図示していない。そして、各マイクロホン２００−１，…，２００−Ｍの配列方向が矩形平板状の反射板３００の法線となるように反射板３００が支持部材４００の端部に固定されている。支持部材４００の開口面は、反射板３００と９０度をなす面である。図１３に示す実施構成例では、反射板３００の好ましいとされる性状は既述の反射物の性状と同じであり、支持部材４００の性状については特に限定はなく各マイクロホン２００−１，…，２００−Ｍをしっかりと固定できる剛性を持っていれば十分である。 In the embodiment shown in FIG. 13, M microphones 200-1,..., 200-M constituting the linear microphone array are fixed to a rectangular flat plate-like support member 400, and in this state, sound collection of each microphone is performed. The holes are arranged on one plane (hereinafter referred to as an opening surface) of the support member 400 (M = 13 in the illustrated example). Note that wirings connected to the microphones 200-1,..., 200-M are not shown. The reflection plate 300 is fixed to the end of the support member 400 so that the arrangement direction of the microphones 200-1,..., 200-M is the normal line of the rectangular flat reflection plate 300. The opening surface of the support member 400 is a surface that forms 90 degrees with the reflector 300. In the embodiment configuration example shown in FIG. 13, the preferable property of the reflector 300 is the same as the property of the reflector described above, and the property of the support member 400 is not particularly limited, and each microphone 200-1,. It is sufficient to have a rigidity capable of firmly fixing 200-M.

In the example configuration shown in FIG. 14(a), a shaft 410 is fixed to the end of the support member 400, and the reflector 300 is rotatably attached to the shaft 410. This configuration makes it possible to change the geometric arrangement of the reflector 300 relative to the microphone array.

In the example configuration shown in FIG. 14(b), two further reflectors 310 and 320 are added to the example configuration shown in FIG. 13. The properties of the two added reflectors 310 and 320 may be the same as or different from those of the reflector 300, and the properties of the reflector 310 may be the same as or different from those of the reflector 320. Hereinafter, the reflector 300 is referred to as the fixed reflector 300. A shaft 510 is fixed to the end of the fixed reflector 300 opposite the end that is fixed to the support member 400, and the reflector 310 is rotatably attached to the shaft 510. Likewise, a shaft 520 is fixed to the end of the support member 400 opposite the end to which the fixed reflector 300 is fixed, and the reflector 320 is rotatably attached to the shaft 520. Hereinafter, the reflectors 310 and 320 are referred to as the movable reflectors 310 and 320.

With the example configuration shown in FIG. 14(b), if for example the movable reflector 310 is positioned so that its reflecting surface is flush with the reflecting surface of the fixed reflector 300, the combination of the fixed reflector 300 and the movable reflector 310 functions as a reflector with a larger reflecting surface than the fixed reflector 300 alone. In addition, by setting the movable reflectors 310 and 320 at appropriate positions, sound can be reflected many times within the space enclosed by the support member 400, the fixed reflector 300, and the movable reflectors 310 and 320, as shown for example in FIG. 15, so the number of reflected sounds Ξ can be controlled. In the configuration of FIG. 14(b) the support member 400 also acts as a reflecting object, so it preferably has the same properties as the reflecting object described earlier.

The example configuration shown in FIG. 16 differs from that of FIG. 13 in that the reflector 300 is also provided with a microphone array (a linear microphone array in the illustrated example). In FIG. 16, the arrangement direction of the M microphones fixed to the support member 400 and the arrangement direction of the M' microphones fixed to the reflector 300 lie in the same plane (M' = 13 in the illustrated example), but the configuration is not limited to this arrangement. For example, the M' microphones may be fixed to the reflector 300 so that their arrangement direction is orthogonal to that of the M microphones fixed to the support member 400. With the configuration of FIG. 16, the present invention can be practiced either by combining the microphone array on the support member 400 with the reflector 300 (using the reflector 300 as a reflecting object without using its microphone array), or by combining the support member 400 (used as a reflecting object without using its microphone array) with the microphone array on the reflector 300.

As an extension of the example configuration shown in FIG. 16, two reflectors 310 and 320 may be added to it in the same way as in FIG. 14(b) (see FIG. 17). Although not illustrated, a microphone array may also be provided on at least one of the movable reflectors 310 and 320. The sound collection holes of the microphones constituting the microphone array on the movable reflector 310 are arranged, for example, in the plane (opening surface) of the movable reflector 310 that can face the opening surface of the support member 400. The sound collection holes of the microphones constituting the microphone array on the movable reflector 320 are arranged, for example, in the plane (opening surface) of the movable reflector 320 that can be made coplanar with the opening surface of the support member 400. Such configurations can be used in the same ways as the configuration of FIG. 14(b). Furthermore, if for example the movable reflector 320 is positioned so that its opening surface is flush with the opening surface of the support member 400, the combination of the support member 400 and the movable reflector 320 functions as a microphone array larger than the one provided on the support member 400 alone.

Both the configuration of FIG. 17 and the configurations with a microphone array on at least one of the movable reflectors 310 and 320 can be used in the same way as the configuration of FIG. 15. In either case it is also possible, for example, to use the movable reflectors 310 and 320 as ordinary reflecting objects and to use the microphone array on the support member 400 and the microphone array on the fixed reflector 300 together as a single microphone array. This is equivalent to a configuration that uses a microphone array of (M + M') microphones and two reflecting objects.

When a microphone array is provided on the movable reflector 310, it may also be mounted so that the sound collection holes of its microphones lie in the plane (opening surface) on the side opposite to the plane of the movable reflector 310 that can face the opening surface of the support member 400. Similarly, when a microphone array is provided on the movable reflector 320, it may be mounted so that the sound collection holes of its microphones lie in the plane (opening surface) on the side opposite to the plane of the movable reflector 320 that can be made coplanar with the opening surface of the support member 400. Of course, for at least one of the movable reflectors 310 and 320, microphone arrays may be provided so that both of its faces are opening surfaces.

[A] Suppose a microphone array is provided on at least one of the movable reflectors 310 and 320, with the opening surface of the movable reflector 310 being the plane that can face the opening surface of the support member 400, or the opening surface of the movable reflector 320 being the plane that can be made coplanar with the opening surface of the support member 400. In the usage shown in FIG. 15, the movable reflector 310 and/or the movable reflector 320 is positioned so that its opening surface is not visible from the line-of-sight direction, so the apparent array size in the line-of-sight direction becomes smaller. Even so, by using the microphone array provided on the movable reflector 310 and/or the movable reflector 320, the same effect as enlarging the array size can be obtained.

[B] Suppose a microphone array is provided on at least one of the movable reflectors 310 and 320, with the opening surface of the movable reflector 310 being the plane on the side opposite to the plane that can face the opening surface of the support member 400, or the opening surface of the movable reflector 320 being the plane on the side opposite to the plane that can be made coplanar with the opening surface of the support member 400. In the usage shown in FIG. 15, the same effect as enlarging the array size can be obtained while the apparent array size in the line-of-sight direction is maintained.

When, for at least one of the movable reflectors 310 and 320, microphone arrays are provided so that both of its faces are opening surfaces, the effects of both [A] and [B] can be obtained.

As described above, placing a flat reflector perpendicular to the arrangement direction of a linear microphone array is one of the conditions under which a narrow-directivity beam can be generated. In the configurations shown in FIGS. 16 and 17 the microphones are laid out one-dimensionally, so if, for example, their arrangement direction is set horizontal, the direction can be controlled in the horizontal angle but not in elevation. An embodiment is therefore described below in which the example configuration of FIG. 17 is extended to three dimensions so that an arbitrary direction in three-dimensional space can be enhanced.

With reference to FIGS. 18 and 19, a zoom microphone device 600 according to the third embodiment is described, in which the fixed reflector 300 and the support member 400 of the FIG. 17 configuration are combined so as to form opposing pyramid faces of a regular octagonal pyramid. FIG. 18 is a front view of the zoom microphone device 600 of this embodiment, and FIG. 19 is a side view. In this embodiment, the fixed reflector 300 and the support member 400, which were distinguished in FIG. 17, are both referred to simply as fixed reflectors. The components shown as the movable reflectors 310 and 320 in FIG. 17 are again called movable reflectors.

The zoom microphone device 600 of this embodiment comprises one support structure 601, eight fixed reflectors 611 to 618, eight fixed microphone arrays 621 to 628, eight movable reflectors 631 to 638, eight movable microphone arrays 641 to 648, one central reflector 651, one central microphone array 661, eight hinges 671 to 678, eight large support metal plates 681 to 688, eight small support metal plates 691 to 698, and joining parts such as bolts and screws. The purpose of the support structure 601 is to hold and support the fixed reflectors 611 to 618 and the central reflector 651 in a predetermined position and orientation. It can be assembled from, for example, steel sections or square steel tubes. The fixed reflectors 611 to 618 are trapezoidal flat plates made of a highly reflective material; for example, wood about 1 cm thick or ABS resin can be used.

Each of the fixed microphone arrays 621 to 628 consists of a plurality of microphones arranged in a straight line. In this embodiment each array is built by placing the microphones in a straight line along the longitudinal direction of one of the elongated large support metal plates 681 to 688 and fixing each microphone to that plate with joining members such as bolts or screws. One fixed microphone array is attached to the face of each fixed reflector so that the face of the reflector and the microphone arrangement direction of the array are parallel. In this embodiment each fixed reflector has a plurality of small holes in a straight line, located on the perpendicular bisector of the upper or lower base of the trapezoid. With the sound receiving part of each microphone positioned to look out through one of these holes, the large support metal plates 681 to 688 are attached to the fixed reflectors 611 to 618 with long bolts or the like, thereby fixing the fixed microphone arrays 621 to 628 to the fixed reflectors.

The fixed reflectors 611 to 618 are assembled so as to form the eight pyramid faces of a regular octagonal frustum whose opposing faces are at right angles to each other, and their oblique sides are joined together. The central reflector 651, a regular octagonal flat plate, is placed at the upper base of the frustum, and the upper base of each trapezoidal fixed reflector is joined to one side of the central reflector 651. Calling the concave side of the frustum formed by the fixed reflectors 611 to 618 and the central reflector 651 the front surface and the convex side the back surface, the large support metal plates 681 to 688 are attached on the back (convex) side with long bolts or the like, and the sound receiving parts of the microphones of the fixed microphone arrays 621 to 628 are exposed through the small holes toward the front (concave) surface. The frustum formed by the fixed reflectors and the central reflector is attached to the support structure 601 at a predetermined position and in a predetermined orientation.

The hinges 671 to 678 are attached to the edges of the open end of the frustum, one per edge. In this embodiment, hinges that can hold any angle (for example, free-stop hinges) are used. Via these hinges, the movable reflectors 631 to 638 are attached one per edge so as to be rotatable about that edge. The movable reflectors 631 to 638 are trapezoidal flat plates made of the same material as the fixed reflectors 611 to 618. The lower bases of the fixed reflectors 611 to 618 are equal in length to the lower bases of the movable reflectors 631 to 638, and the lower bases are connected to each other via the hinges 671 to 678. Because the hinges hold any angle, the angle of each movable reflector can be changed by hand and is then maintained.

Like the fixed reflectors 611 to 618, the movable reflectors 631 to 638 have small holes in a straight line on the perpendicular bisector of the upper or lower base. The movable microphone arrays 641 to 648 are attached to the movable reflectors. Like the fixed microphone arrays, each movable microphone array is built by arranging a plurality of microphones in a straight line along the longitudinal direction of one of the elongated small support metal plates 691 to 698 and fixing them to it. As with the fixed arrays, each movable array is attached to its movable reflector with the small support metal plate and long bolts so that its sound receiving parts are exposed through the small holes toward the front (concave) surface. The hole positions in each reflector are chosen so that, when a movable reflector is rotated about its hinge axis until its concave-side face is coplanar with the concave-side face of the fixed reflector, the holes in the movable reflector and those in the fixed reflector connected to it lie on the same straight line.

The central microphone array 661 is attached to the central reflector 651. It consists of a support rod and a plurality of microphones. The support rod is a round bar with a plurality of round holes, each passing through it in the radial direction, arranged along its axis; the microphones are placed and fixed in these holes. The central microphone array 661 is mounted perpendicular to the front (concave) surface of the central reflector 651 so that its support rod lies on the straight line that passes through the apex of the regular octagonal pyramid whose pyramid faces are the faces of the fixed reflectors 611 to 618 and that is perpendicular to the base of that pyramid.
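The 90 degree relationship between opposing faces can be checked numerically. The sketch below assumes an ideal regular frustum with its axis along z and its opening toward +z; the function names and the coordinate frame are illustrative and do not appear in the patent. The 45 degree inclination of each face is not stated in the text; it is derived here as the value at which opposing face normals become perpendicular.

```python
import math

def face_normal(k, num_faces, tilt_deg=45.0):
    """Unit normal on the concave side of face k (frustum axis along z)."""
    phi = 2.0 * math.pi * k / num_faces   # azimuth of the face center
    beta = math.radians(tilt_deg)         # elevation of the normal
    return (-math.cos(phi) * math.cos(beta),
            -math.sin(phi) * math.cos(beta),
            math.sin(beta))

def array_direction(k, num_faces, tilt_deg=45.0):
    """Unit vector along the bisector of face k, from upper base to lower base."""
    phi = 2.0 * math.pi * k / num_faces
    beta = math.radians(tilt_deg)
    return (math.cos(phi) * math.sin(beta),
            math.sin(phi) * math.sin(beta),
            math.cos(beta))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

NUM_FACES = 8  # regular octagonal frustum of the third embodiment

for k in range(NUM_FACES):
    opposite = (k + NUM_FACES // 2) % NUM_FACES
    faces = dot(face_normal(k, NUM_FACES), face_normal(opposite, NUM_FACES))
    in_plane = dot(array_direction(k, NUM_FACES), face_normal(k, NUM_FACES))
    aligned = dot(array_direction(k, NUM_FACES), face_normal(opposite, NUM_FACES))
    print(f"face {k}: normals {faces:+.3f}, array in own face {in_plane:+.3f}, "
          f"array vs opposite normal {aligned:+.3f}")
```

In this idealized model each array lies in its own face and points along the normal of the opposite face, so the opposite fixed reflector plays the role that the reflector 300 plays for the linear array of FIG. 13. The same check works for any even `num_faces`, such as 4 or 6.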

As noted above, when the movable reflectors 631 to 638 are positioned so that their reflecting (front) surfaces lie in the same planes as the reflecting (front) surfaces of the fixed reflectors 611 to 618, each combination of a fixed reflector and a movable reflector functions as a reflector with a larger reflecting surface than the fixed reflector alone. The movable reflectors 631 to 638 can also block sound that wraps around from behind. Both effects help create conditions in which a narrow-directivity beam is easy to generate. In addition, by rotating the movable reflectors 631 to 638 to appropriate positions and holding them there, sound can be reflected many times within the space enclosed by the fixed reflectors 611 to 618 and the movable reflectors 631 to 638, as described for FIG. 15, so the number of reflected sounds Ξ can be controlled and sound can be picked up with high energy. This tends to produce a difference between the transfer characteristics of the target sound and those of noise arriving from an angle with no difference in arrival direction, which again helps create conditions in which a narrow-directivity beam is easy to generate. If, with this configuration, sound from the same direction as sound from a particular direction is to be picked up separately according to distance, the height and angle of the regular octagonal frustum may be made adjustable. In this embodiment the central microphone array 661 is mounted on the straight line that passes through the apex of the regular octagonal pyramid whose pyramid faces are the faces of the fixed reflectors 611 to 618 and that is perpendicular to the base of that pyramid. This exploits the strong directivity in the vertical direction that is characteristic of a straight-line microphone array, in order to make the directivity of the zoom microphone device 600 even sharper.

［変形例１〜５］
第３の実施例では正八角錘の向かい合う角錘面が垂直をなすように組み合せたズームマイク装置を示したが、これに限られない。上述したように、線形マイクロホンアレーの配列方向と垂直に平板型の反射板を設置してあり、水平角方向、仰角方向に方向制御が可能であれば固定反射板の形状は正八角錘台の角錘面を構成する配置でなくともよい。また、可動反射板６３１〜６３８により反射音の数Ξの制御に用いることができるが、これは必須ではない。中央反射板６５１、中央マイクロホンアレー６６１も適宜省略することができる。反射板は対であるほうがよいので、２の倍数個の角錘面を持つ正角錘台、もしくは正角錘形状をしていることが望ましい。例えば、正四角錘（台）、正六角錘（台）、正八角錘（台）などである。また、正十六角錘（台）、正三十二角錘（台）など、反射板の数を多くしすぎてしまうと反射音のエネルギーが小さくなる、一枚の反射板に設置できるマイクロホンの数が少なくなり、マイクロホン間隔が広くなってしまうため制御できる周波数帯域が狭くなってしまう問題が生じる。使用できるマイクロホン数が１６〜９６本程度であれば正四角錘（台）、正六角錘（台）、正八角錘（台）形状が丁度よい。以下、図２０〜２４を参照して第３の実施例の第１〜第５の変形例について以下に説明する。図２０は本実施例の第１の変形例の構成を示す正面図である。図２１は本実施例の第２の変形例の構成を示す正面図である。図２２は本実施例の第３の変形例の構成を示す正面図である。図２３は本実施例の第４の変形例の構成を示す正面図である。図２４は本実施例の第５の変形例の構成を示す正面図である。第１の変形例におけるズームマイク装置６００ａは、前述のズームマイク装置６００の固定反射板、可動反射板をそれぞれ４枚ずつとし、固定反射板が正四角錘台の角錘面を構成するように組み合わせたものである。ズームマイク装置６００と同様、相対する角錘面が垂直をなすように取り付けられている。この構成においても前述のズームマイク装置６００と同様、３次元空間における任意の方向を強調できる。第２の変形例におけるズームマイク装置６００ｂは、前述のズームマイク装置６００の固定反射板、可動反射板をそれぞれ６枚ずつとし、固定反射板が正六角錘台の角錘面を構成するように組み合わせたものである。ズームマイク装置６００と同様、相対する角錘面が垂直をなすように取り付けられている。この構成においても前述のズームマイク装置６００と同様、３次元空間における任意の方向を強調できる。また、前述したように可動反射板、中央反射板、中央マイクロホンアレーは必須の構成要素ではない。従って、第１の変形例におけるズームマイク装置６００ａから、可動反射板、中央反射板、中央マイクロホンアレーを略した構成をズームマイク装置６００ｃ、第２の変形例におけるズームマイク装置６００ｂから、可動反射板、中央反射板、中央マイクロホンアレーを略した構成をズームマイク装置６００ｄ、第３の実施例におけるズームマイク装置６００から、可動反射板、中央反射板、中央マイクロホンアレーを略した構成をズームマイク装置６００ｅとして示す。これらのズームマイク装置６００ｃ、６００ｄ、６００ｅにおいても従来よりも鋭い指向性を有する狭指向音声強調技術の実現できる。 [Modifications 1 to 5]
In the third embodiment, a zoom microphone device was shown in which the opposing pyramid surfaces of a regular octagonal pyramid are perpendicular to each other. However, the present invention is not limited to this. As described above, as long as a flat reflector is installed perpendicular to the arrangement direction of a linear microphone array and the direction can be controlled in both the horizontal angle direction and the elevation angle direction, the fixed reflectors need not be arranged as the pyramid surfaces of a regular octagonal pyramid. Further, the movable reflectors 631 to 638 allow the number of reflected sounds to be controlled, but they are not essential. The central reflector 651 and the central microphone array 661 can also be omitted as appropriate.

Since the reflectors are preferably used in pairs, the fixed reflectors desirably form a regular pyramid or a regular frustum having an even number of pyramid surfaces, for example a regular quadrangular pyramid (frustum), a regular hexagonal pyramid (frustum), or a regular octagonal pyramid (frustum). If the number of reflectors is too large, as in a regular 32-gonal pyramid (frustum), the energy of the reflected sound decreases. In addition, as the number of microphones decreases and the spacing between the microphones widens, the controllable frequency band narrows. If about 16 to 96 microphones are available, a regular quadrangular pyramid (frustum), a regular hexagonal pyramid (frustum), or a regular octagonal pyramid (frustum) is appropriate.

Hereinafter, first to fifth modifications of the third embodiment will be described with reference to FIGS. 20 to 24. FIG. 20 is a front view showing the configuration of the first modification of the present embodiment. FIG. 21 is a front view showing the configuration of the second modification of the present embodiment. FIG. 22 is a front view showing the configuration of the third modification of the present embodiment. FIG. 23 is a front view showing the configuration of the fourth modification of the present embodiment. FIG. 24 is a front view showing the configuration of the fifth modification of the present embodiment.

The zoom microphone device 600a according to the first modification has four each of the fixed reflectors and the movable reflectors of the zoom microphone device 600 described above, and the fixed reflectors are combined so as to form the pyramid surfaces of a regular quadrangular frustum. As in the zoom microphone device 600, the opposing pyramid surfaces are attached so as to be perpendicular to each other. With this configuration as well, an arbitrary direction in three-dimensional space can be enhanced, as with the zoom microphone device 600 described above.

The zoom microphone device 600b according to the second modification has six each of the fixed reflectors and the movable reflectors of the zoom microphone device 600 described above, and the fixed reflectors are combined so as to form the pyramid surfaces of a regular hexagonal frustum. As in the zoom microphone device 600, the opposing pyramid surfaces are attached so as to be perpendicular to each other. With this configuration as well, an arbitrary direction in three-dimensional space can be enhanced, as with the zoom microphone device 600 described above.

Further, as described above, the movable reflectors, the central reflector, and the central microphone array are not essential components. Accordingly, the zoom microphone device 600c, in which the movable reflectors, the central reflector, and the central microphone array are omitted from the zoom microphone device 600a of the first modification, the zoom microphone device 600d, in which the same components are omitted from the zoom microphone device 600b of the second modification, and the zoom microphone device 600e, in which the same components are omitted from the zoom microphone device 600 of the third embodiment, are shown as the third, fourth, and fifth modifications, respectively. These zoom microphone devices 600c, 600d, and 600e can also realize a narrow-directional speech enhancement technique having sharper directivity than before.
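
The requirement that opposing pyramid surfaces be perpendicular to each other can be checked numerically. The following is a minimal sketch, not part of the original disclosure. It assumes that each fixed reflector is a lateral face of a regular pyramid with an even number of faces, and that each microphone array runs along the line of steepest slope of its face. The function names and the parameter `beta_deg` (the elevation of the face normal above the base plane) are hypothetical and introduced only for illustration.

```python
import math


def face_normal(theta, beta):
    """Unit normal of the face at azimuth theta; beta is the normal's elevation above the base plane."""
    return (math.cos(theta) * math.cos(beta),
            math.sin(theta) * math.cos(beta),
            math.sin(beta))


def slope_direction(theta, beta):
    """Unit vector lying in the face along its line of steepest slope (assumed array direction)."""
    return (math.cos(theta) * math.sin(beta),
            math.sin(theta) * math.sin(beta),
            -math.cos(beta))


def angle_deg(a, b):
    """Angle in degrees between two unit vectors, clamped against rounding error."""
    d = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, d))))


def opposing_pair_angles(n_faces, index, beta_deg=45.0):
    """Return (angle between face normals, angle between array directions)
    for face `index` and the face directly opposite it."""
    if n_faces % 2:
        raise ValueError("opposing faces exist only for an even number of faces")
    beta = math.radians(beta_deg)
    theta = 2.0 * math.pi * index / n_faces
    opposite = theta + math.pi
    normals = angle_deg(face_normal(theta, beta), face_normal(opposite, beta))
    arrays = angle_deg(slope_direction(theta, beta), slope_direction(opposite, beta))
    return normals, arrays


# Quadrangular, hexagonal, and octagonal arrangements, as in the modifications.
for n_faces in (4, 6, 8):
    normals, arrays = opposing_pair_angles(n_faces, 0)
    print(n_faces, round(normals, 6), round(arrays, 6))
```

Under these assumptions, the dot product of opposing face normals is sin²β − cos²β, which vanishes only at β = 45 degrees. Opposing faces and their arrays are therefore at 90 degrees regardless of whether the pyramid has four, six, or eight faces, provided each face is inclined at 45 degrees to the pyramid axis.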

<Application examples>
Expressed by analogy with images, the narrow-directional speech enhancement technique corresponds to generating a sharp image from a blurred, out-of-focus image, and it helps to obtain more detailed information about the sound field. Examples of services in which the present invention is useful are described below.

A first example is content production combined with video. With an embodiment of the present invention, distant target speech can be clearly enhanced even in a noisy environment containing much noise (such as non-target speech). For example, audio can be added to match zoomed-in footage, shot from outside the field, of a soccer player dribbling.

A second example is application to a video conference system (an audio conference system is also possible). For a meeting in a small room, the conventional technology could enhance a speaker's voice reasonably well using several microphones. In a large conference room (for example, a space large enough that a speaker is located 5 m or more from the microphones), however, it was difficult to clearly enhance the voice of a distant speaker, so a microphone had to be installed in front of each speaker. With an embodiment of the present invention, distant sound can be clearly enhanced, which makes it possible to build a video conference system suited to a large conference room without installing a microphone in front of each speaker.

Claims

A zoom microphone device comprising one support structure, 2N fixed reflectors (N is an integer of 1 or more), and 2N microphone arrays each formed by arranging a plurality of microphones in a straight line, wherein
the microphone arrays are attached one to each fixed reflector such that the surface of the fixed reflector and the microphone arrangement direction of the microphone array are parallel to each other, and
N sets of fixed reflectors are attached to the support structure, each set consisting of two fixed reflectors fixed facing each other such that the surfaces of the two fixed reflectors to which the microphone arrays are attached form an angle of 90 degrees and the microphone arrangement direction of the microphone array attached to one fixed reflector forms an angle of 90 degrees with the microphone arrangement direction of the microphone array attached to the other fixed reflector.

The zoom microphone device according to claim 1, further comprising
a movable reflector provided at the open end of the N sets of fixed reflectors and movable about an axis that is the straight line formed by the intersection of a plane perpendicular to the microphone arrangement direction of the microphone array and a plane containing the plate surface of the fixed reflector, wherein
a microphone array formed by arranging a plurality of microphones in a straight line is attached to the movable reflector such that the surface of the movable reflector and the microphone arrangement direction of that microphone array are parallel to each other, and
when the plate surface of the movable reflector and the plate surface of the fixed reflector lie in the same plane,
the surface of the movable reflector to which its microphone array is attached faces the same direction as the surface of the fixed reflector to which its microphone array is attached, and
the microphone array attached to the movable reflector and the microphone array attached to the fixed reflector are arranged on the same straight line.

The zoom microphone device according to claim 1 or 2, wherein
the N sets of fixed reflectors are combined so as to form the opposing pyramid surfaces of a regular 2N-gonal pyramid.

The zoom microphone device according to claim 3, further comprising
a microphone array formed by arranging a plurality of microphones in a straight line on a straight line that passes through the apex of the regular 2N-gonal pyramid and is perpendicular to the base of the regular 2N-gonal pyramid.