JPH11331827A

JPH11331827A - Television camera

Info

Publication number: JPH11331827A
Application number: JP10128836A
Authority: JP
Inventors: Takafumi Enami; 隆文枝並
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-05-12
Filing date: 1998-05-12
Publication date: 1999-11-30

Abstract

(57) [Abstract]

[Problem] To provide a television camera device using a fisheye or super-wide-angle lens and a variable-directivity microphone that tracks people stably even when no one is speaking during a video conference, generates lively video-conference images, realizes a high-quality, realistic video conference with reduced noise and echo, and is small, lightweight, and free of moving parts.

[Solution] A television camera device 10 has a fisheye or super-wide-angle lens unit 11 and a CCD imaging unit 13 at its center and a plurality of omnidirectional microphones 12 arranged around its periphery. Phase control of the omnidirectional microphones makes them act as a variable-directivity microphone. The device determines the direction of the sound source (the speaker direction), tracks that direction, and cuts out the image in that direction (the image of the speaker) to generate a video signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a television camera device using a fisheye or super-wide-angle lens and a variable-directivity microphone. More particularly, it relates to a television camera device, used as a video-conference camera or a remote-surveillance monitor camera, that identifies moving images of people and their direction from the image signal, identifies the direction of the speaker from the audio signal, and cuts out and displays the image of the moving person or the speaker from the video captured through the fisheye or super-wide-angle lens.

[0002]

[Prior Art] A conventional television camera device captures the video and audio of conference participants with a fixed camera or a remote-controlled panning camera and a microphone, and sends the video and audio signals over a network to a monitor television device in a remote conference room.

[0003] With the fixed camera, the imaging area is constant and does not move, so the picture is monotonous, and a participant who moves out of the imaging area disappears from the monitor screen. A video-conference device based on a fixed camera is therefore unsuited to a smooth video conference. A remote-controlled panning camera, on the other hand, requires participants to operate the camera while they talk, which is cumbersome, and the panning response is slow, so operability and usability are poor.

[0004] Television camera systems have therefore been devised that automatically identify the participant who is speaking and either swivel the camera automatically to track and show that speaker or add an indication of the speaker to the displayed image (see, for example, JP-A-5-153582, JP-A-7-15710, JP-A-6-351015, and JP-A-5-207449).

[0005] FIG. 12 shows a conventional camera-swivel system for video conferencing. Part (A) shows the configuration of the system and part (B) shows how it is used. In the figure, 12-1 is a first microphone device in which n unidirectional microphones are arranged radially, 12-2 is a second microphone device in which m high-sensitivity unidirectional microphones are arranged radially, 12-3 and 12-6 are voice recognition units that detect the microphone receiving the loudest sound in the first and second microphone devices, respectively, 12-4 is a microphone control unit that controls the swivel of the second microphone device, 12-5 is a camera control unit that controls the swivel of the camera, and 12-7 is a television camera.

[0006] The n unidirectional microphones of the first microphone device 12-1 are fixed in place, each with its directivity pointing toward the center of one of the n equal arcs into which the circumference of the conference table is divided. As shown in part (B), when the first voice recognition unit 12-3 recognizes that the loudest sound comes from the microphone pointing toward a particular arc 12-8, it sends the position-direction information for arc 12-8 to the microphone control unit 12-4 and the camera control unit 12-5.

[0007] The microphone control unit 12-4 swivels the whole second microphone device 12-2 to point it toward the center of the arc 12-8 in which the loudest sound was recognized. The m high-sensitivity unidirectional microphones of the second microphone device 12-2 each point in one of the directions that divide the arc 12-8 into m further equal parts. The second voice recognition unit 12-6 recognizes, among these m position directions, the position direction 12-9 from which the loudest sound is received, and sends that position-direction information to the camera control unit 12-5.

[0008] Based on the position-direction information sent from the first and second voice recognition units, the camera control unit 12-5 judges the position direction 12-9 receiving the loudest sound to be the direction of the speaker and swivels the television camera 12-7 toward it.
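The two-stage search of paragraphs [0006] to [0008] can be illustrated with a short sketch. It assumes the sound level at each microphone is already available as a number, that arcs are numbered from 0 degrees in a fixed rotational sense, and that `fine_levels` holds the levels measured after the second microphone device has been turned toward the chosen arc. The function name and these conventions are illustrative and do not come from the cited systems.

```python
def two_stage_direction(coarse_levels, fine_levels):
    """Estimate the speaker bearing in degrees from coarse and fine microphone levels.

    coarse_levels: levels of the n fixed microphones, one per equal arc of the table.
    fine_levels:   levels of the m microphones that subdivide the chosen arc.
    """
    n, m = len(coarse_levels), len(fine_levels)
    arc_width = 360.0 / n
    # Stage 1: the loudest fixed microphone selects one arc (12-8 in the text).
    arc_index = max(range(n), key=lambda i: coarse_levels[i])
    # Stage 2: the loudest fine microphone selects one of m sub-arcs (12-9).
    sub_index = max(range(m), key=lambda j: fine_levels[j])
    sub_width = arc_width / m
    # Return the centre of the selected sub-arc.
    return arc_index * arc_width + (sub_index + 0.5) * sub_width
```

The angular resolution of such a scheme is 360/(n·m) degrees, which is why the prior-art system needs many microphones and a mechanical swivel to reach a fine resolution.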

[0009] FIG. 13 shows a conventional video-conference system that displays the speaker position. In the figure, 13-1 and 13-2 are first and second microphones, 13-3 is an audio circuit, 13-4 is a phase difference detection circuit, 13-5 is an arithmetic circuit, 13-6 is a position information circuit, 13-7 is a television camera, 13-8 is a video circuit, 13-9 is a video synthesis circuit, 13-10 is a video encoder, 13-11 is a network, 13-12 is a receiving circuit, 13-13 is a monitor television, and 13-14 is a loudspeaker.

[0010] The phase difference detection circuit 13-4 detects the phase difference between the sounds input to the first and second microphones 13-1 and 13-2, and the position of the speaker is calculated from that phase difference. The video synthesis circuit 13-9 combines a video signal that indicates the speaker position, derived from the position information, with the video shot by the television camera 13-7 (the output video signal of the video circuit 13-8). The combined video signal is encoded and displayed on the monitor television 13-13 via the network 13-11.
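How a phase difference between two microphones yields a direction can be sketched as follows. This is a generic far-field calculation, not the circuit of the cited system: it assumes a single-frequency plane wave, a known microphone spacing, and a speed of sound of 343 m/s. The function and constant names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def bearing_from_phase(phase_diff_rad, freq_hz, mic_spacing_m):
    """Far-field arrival angle (radians from broadside) for a two-microphone pair."""
    # Convert the phase difference at this frequency into a time difference.
    delay_s = phase_diff_rad / (2.0 * math.pi * freq_hz)
    # Path-length difference divided by the spacing equals sin(angle).
    s = SPEED_OF_SOUND * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.asin(s)
```

Because any sound reaching both microphones produces a phase difference, an estimate of this kind responds to noise and loudspeaker output as readily as to a voice.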

[0011]

[Problems to Be Solved by the Invention] Conventional video-conference systems require a preset operation in which the microphones and the installation position of the television camera are pointed at the participants in advance, and the microphone positions cannot be changed during the conference.

[0012] In addition, as many microphones are needed as there are participants in the video conference. The mechanical noise produced when the television camera swivels also interferes with conversation. Furthermore, when the speaker position is detected from the phase difference between the sounds received by two microphones, the detection reacts just as sensitively to noise and to the sound from the loudspeaker of the monitor television device, which makes it hard to use in a real video conference where noise and loudspeaker sound are present.

[0013] Moreover, because the tracking direction is determined from sound alone, it becomes unstable when the target person is silent. When several people are present in the conference room, the system cannot show several of them and can only ever track a single participant. The swivel noise of the television camera also makes participants feel tense or uncomfortable.

[0014] Furthermore, the television camera device is installed in a conference room together with a monitor television device that shows the remote participants and a loudspeaker that reproduces their voices. Sound from that loudspeaker leaks into the microphone of the television camera device and causes echo, so an expensive echo canceller or echo suppressor has had to be built in to cancel or suppress it.

[0015] An object of the present invention is to track people stably even when no sound is produced during a video conference and to generate natural, lively video-conference images. A further object is to realize a high-quality, realistic video conference by not only locating the speaker but also applying a different directivity to each sound source according to whether the sound entering the microphones is noise, a human voice, or the loudspeaker sound of the monitor television device. A still further object is to provide a small, lightweight television camera device with no moving parts that collects all the sound and images in a conference room and extracts from them the sound and images to be transmitted.

[0016]

[Means for Solving the Problems] The television camera device of the present invention is (1) a television camera device used for video conferencing or remote monitoring, comprising: a camera with a fisheye or super-wide-angle lens; a variable-directivity microphone formed from an array of omnidirectional microphones; means for determining the direction of a sound source with the variable-directivity microphone and tracking that direction; and means for cutting out the picture in the sound source direction and generating a video signal.

[0017] (2) A television camera device used for video conferencing or remote monitoring, comprising: a camera with a fisheye or super-wide-angle lens; a variable-directivity microphone formed from an array of omnidirectional microphones; means for determining the direction of a sound source with the variable-directivity microphone and tracking that direction; means for detecting and tracking a moving image from the inter-frame difference of the images captured by the camera; and means for selecting either the sound-source tracking means or the moving-image tracking means, cutting out the picture accordingly, and generating a video signal.

[0018] (3) The means for tracking the sound source direction has means for detecting voiced sound, and is configured to cut out the picture in the sound source direction in which voiced sound is detected and generate a video signal.

[0019] (4) The means for tracking the sound source direction is configured to combine the output sound signals of the omnidirectional microphones so as to lower the gain in sound source directions in which no voiced sound is detected.

[0020] (5) The means for tracking the sound source direction has means for detecting the loudspeaker sound signal of the monitor television device, and is configured to compute the correlation between the output sound signals of the omnidirectional microphones and that loudspeaker sound signal, determine the loudspeaker position, and combine the output sound signals of the omnidirectional microphones so as to suppress the loudspeaker sound they pick up.
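One plausible reading of the correlation step in item (5) is a cross-correlation between the signal fed to the loudspeaker and each microphone output, where the lag of the correlation peak gives the propagation delay to that microphone. The sketch below shows only that lag search. The function name, the restriction to non-negative lags, and the plain time-domain sum are assumptions made for illustration, not details given in this publication.

```python
def best_lag(reference, observed, max_lag):
    """Lag in samples at which `observed` best matches a delayed copy of `reference`."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Cross-correlation at this lag over the overlapping samples.
        score = sum(
            reference[n] * observed[n + lag]
            for n in range(len(reference))
            if n + lag < len(observed)
        )
        if score > best_score:
            best, best_score = lag, score
    return best
```

Comparing the lags found for the individual microphones would indicate from which side the loudspeaker sound arrives, which is the information needed to place a low-gain direction there.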

[0021] (6) In a television camera device having both the means for tracking the sound source direction and the means for detecting and tracking a moving image, the picture is cut out with priority given to the sound-source tracking means, and a video signal that emphasizes the speaker is generated.

[0022] (7) In a television camera device having both the means for tracking the sound source direction and the means for detecting and tracking a moving image, the picture is cut out with priority given to the moving-image tracking means, and a video signal that emphasizes the moving person is generated.

[0023] (8) The device has both the means for tracking the sound source direction and the means for detecting and tracking a moving image, and is configured so that either tracking means can be selected by key input.

[0024] (9) The camera with the fisheye or super-wide-angle lens and the plurality of omnidirectional microphones are integrated into one unit, with the lens at the center and the omnidirectional microphones arranged around the periphery.

[0025]

[Embodiments of the Invention] FIG. 1 shows the configuration and use of a television camera device according to an embodiment of the present invention. Part (A) is a plan view of the device, part (B) is a cross-section taken along a1-a2, and part (C) shows the device in use for a video conference.

[0026] In the figure, 10 is the television camera device as a whole, 11 is the fisheye lens unit, 12 is an omnidirectional microphone, 13 is the CCD imaging unit, 14 is a conference participant, 15 is the conference table, and 16 is the monitor television device.

[0027] The television camera device 10 has the fisheye lens unit 11 of the camera at its center and a plurality of omnidirectional microphones 12 arranged at equal intervals around it. The output audio signals of the omnidirectional microphones 12 are combined after their delays (phases) are varied according to the characteristics of the sound, so that the array as a whole functions as a variable-directivity microphone whose directivity differs according to the characteristics of the sound.
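Combining delayed microphone outputs to steer directivity is commonly done by delay-and-sum beamforming, and the sketch below illustrates that general technique for a circular array. It is not taken from this publication: it assumes a far-field source in the plane of the array, whole-sample delays, and a speed of sound of 343 m/s, and all names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_delays(num_mics, radius_m, azimuth_rad, sample_rate_hz):
    """Whole-sample delays that align a plane wave arriving from azimuth_rad."""
    delays = []
    for k in range(num_mics):
        mic_angle = 2.0 * math.pi * k / num_mics
        # A microphone facing the source hears the wavefront early by
        # r*cos(angle difference)/c, so it receives the largest delay.
        lead_s = radius_m * math.cos(azimuth_rad - mic_angle) / SPEED_OF_SOUND
        # The offset radius/c keeps every delay non-negative.
        delays.append(round((lead_s + radius_m / SPEED_OF_SOUND) * sample_rate_hz))
    return delays


def delay_and_sum(channels, delays):
    """Average the channels after delaying channel k by delays[k] samples."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, d in zip(channels, delays):
        for n in range(d, length):
            out[n] += samples[n - d]
    return [v / len(channels) for v in out]
```

Sound from the steered direction adds coherently while sound from other directions partly cancels, so choosing a different delay set gives a different directivity without moving any microphone.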

[0028] Because a fisheye lens has an angle of view of about 180 degrees, a single television camera device 10 placed facing upward near the center of the conference table 15 at which the participants are seated, as shown in FIG. 1(C), can capture all the participants. From the video shot through the fisheye lens unit 11, the image of interest, such as the speaker or a moving person, is cut out, its fisheye distortion is corrected, and it is output as a video signal. A super-wide-angle lens may be used instead of the fisheye lens.

[0029] Because the image to be cut out is determined and switched automatically by locating the speaker or the moving person, no movable mechanism such as a swivel base is used. There is therefore no swivel noise, and the participants can hold the conference in a natural atmosphere without being conscious of the camera.

[0030] Three or four omnidirectional microphones 12 on the television camera device 10 are normally sufficient, but more may be added to obtain sharper, more accurate directivity and to handle a larger number of participants.

[0031] FIG. 2 shows the configuration of a video-conference system using the television camera device of the embodiment. Part (A) shows the image processing section and part (B) shows the audio processing section. 2A-1 is a fisheye lens camera, 2A-2 is a person position detection unit, 2A-3 is a person position determination unit, 2A-4 is an image cut-out and distortion correction unit, 2A-5 is an image communication unit (codec), 2A-6 is a line connected to the remote party's video-conference system, and 2A-7 is a monitor television device that shows the video of the remote conference room.

[0032] 2B-1 is a microphone bank and audio signal processing unit, 2B-2 is a voiced sound detection unit, 2B-3 is a speaker position calculation unit, 2B-4 is a noise source position calculation unit, 2B-5 is a loudspeaker sound detection unit, 2B-6 is a loudspeaker position calculation unit, 2B-7 is a microphone directivity determination unit, 2B-8 is a directivity control unit, 2B-9 is an audio communication unit (codec), 2B-10 is a line connected to the remote party's video-conference system, 2B-11 is a sound quality improvement unit, and 2B-12 is a loudspeaker that reproduces the audio of the remote conference room.

[0033] The image signal captured by the fisheye lens camera 2A-1 is output to the person position detection unit 2A-2 and to the image cut-out and distortion correction unit 2A-4. The person position detection unit 2A-2 detects the image of a moving person and its position and sends the position information to the person position determination unit 2A-3.

[0034] Based on the person position information output from the person position detection unit 2A-2 and the speaker position information output from the speaker position calculation unit 2B-3 described later, the person position determination unit 2A-3 determines, according to a set priority order, the person position region to be cut out of the fisheye image and displayed.
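The priority-based choice between the two position sources can be sketched as below. The data representation (a position or `None` per source) and the names are assumptions for illustration. The publication states only that the region is chosen according to a set priority order.

```python
def choose_crop_target(speaker_pos, mover_pos, priority=("speaker", "mover")):
    """Return (source, position) for the first available candidate in priority order."""
    candidates = {"speaker": speaker_pos, "mover": mover_pos}
    for source in priority:
        if candidates[source] is not None:
            return source, candidates[source]
    # Neither the audio side nor the image side reported a position.
    return None, None
```

With the default order the speaker wins whenever a voice is located, and the moving person is used as a fallback when nobody is speaking. Reversing `priority` gives the opposite behavior.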

[0035] Based on the person position region information output from the person position determination unit 2A-3, the image cut-out and distortion correction unit 2A-4 cuts the portion containing that person out of the fisheye image, corrects the fisheye distortion of that portion by a mapping transformation, and sends the video signal of the resulting undistorted, normal image to the image communication unit 2A-5.

[0036] The image communication unit 2A-5 encodes the video signal from the image cut-out and distortion correction unit 2A-4 in a code format suited to the transmission scheme and transmits it over the line 2A-6 to the remote party's video-conference system. It also decodes the encoded video signal received from the remote party over the line 2A-6 and sends the decoded video signal to the monitor television device 2A-7, which displays the remote party's video.

[0037] FIG. 3 shows the configuration of the person position detection unit 2A-2 described above. In the figure, 3-1 is an A/D converter, 3-2 is a frame memory, 3-3 is an inter-frame difference detection unit, 3-4 is an intra-frame difference detection unit, 3-5 is a moving image contour extraction unit, and 3-6 is a moving person position calculation unit.

[0038] The video captured by the fisheye lens camera 2A-1 is converted to a digital signal by the A/D converter 3-1 and stored frame by frame in the frame memory 3-2. The inter-frame difference detection unit 3-3 detects the difference from the previous frame and thereby the parts of the image that have moved. The intra-frame difference detection unit 3-4 detects the difference between adjacent image signals within one frame and thereby the contours of the image.

[0039] The moving image contour extraction unit 3-5 combines the outputs of the inter-frame difference detection unit 3-3 and the intra-frame difference detection unit 3-4 to extract the contour of the moving image. The moving person position calculation unit 3-6 calculates the position of the contour output by the moving image contour extraction unit 3-5 and outputs that position information to the person position determination unit 2A-3 of FIG. 2.
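The combination of inter-frame and intra-frame differences can be sketched on small grayscale frames held as nested lists. This is a simplified illustration, not the circuit of FIG. 3: the thresholds, the use of only the horizontal neighbor for the intra-frame difference, the logical AND as the way of combining the two outputs, and the centroid as the reported position are all assumptions.

```python
def moving_outline_position(prev, curr, motion_thresh=20, edge_thresh=20):
    """Centroid (row, col) of pixels that both changed between frames and lie on an edge."""
    rows, cols = len(curr), len(curr[0])
    hits = []
    for r in range(rows):
        for c in range(cols - 1):
            # Inter-frame difference: the pixel changed since the previous frame.
            moved = abs(curr[r][c] - prev[r][c]) > motion_thresh
            # Intra-frame difference: the pixel differs from its right-hand neighbor.
            edge = abs(curr[r][c + 1] - curr[r][c]) > edge_thresh
            if moved and edge:
                hits.append((r, c))
    if not hits:
        return None  # nothing moved, so there is no moving contour to report
    return (sum(r for r, _ in hits) / len(hits),
            sum(c for _, c in hits) / len(hits))
```

Requiring both conditions discards static edges such as furniture as well as flat regions that merely change brightness, leaving the outline of whatever is moving.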

[0040] The person position determination unit 2A-3 of FIG. 2 determines the person position region to be displayed based on the position information of the moving person or the speaker position information described later, and sends the information on that region to the image cut-out and distortion correction unit 2A-4. The principle by which the image cut-out and distortion correction unit 2A-4 cuts the display image out of the fisheye image and corrects its distortion is explained next.

[0041] FIG. 4 illustrates the principle of cutting out a fisheye image and correcting its distortion. In the figure, 4-1 is a virtual image frame, 4-2 is a virtual hemisphere, and 4-3 is the fisheye imaging plane. The center of the fisheye lens is assumed to be at the origin O of the three-dimensional space defined by the x, y, and z axes, with the lens pointing along the z-axis.

[0042] Assuming the fisheye lens is an orthographic-projection lens with focal length f, the virtual hemisphere 4-2 is the surface of a virtual hemisphere of radius f whose flat face lies in the x-y plane and whose center is at the origin O.

[0043] The virtual image frame 4-1 is a frame of predetermined size on a virtual plane orthogonal to the direction-of-view vector DOV (Direction Of View), which points from the fisheye lens position (the origin) toward the object being imaged. As explained later, the image inside this frame is cut out of the fisheye image and shown on the screen of the monitor television device. The frame therefore has the same size as the image display frame of the monitor television device.

[0044] The virtual image frame 4-1 has a two-dimensional coordinate system (p, q) whose origin is the point where the direction-of-view vector DOV intersects the frame, and a point in the virtual image frame 4-1 is expressed by its coordinates (p, q).

[0045] Images formed by a fisheye lens (fisheye images) fall into several types according to the projection formula of the lens. The following describes a fisheye lens of the orthographic projection type.

[0046] Suppose that the scene visible from the fisheye lens position (origin O) through the virtual hemisphere 4-2 is pasted directly onto that hemisphere. The image of an orthographic fisheye lens is what results when the three-dimensional image pasted on the hemisphere 4-2 is flattened perpendicularly (along the z-axis toward the origin) onto the two-dimensional x-y plane of the figure.
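The flattening described above amounts to dropping the z coordinate of the point where a viewing ray meets the hemisphere, so a ray at zenith angle φ lands at distance f·sin φ from the image center. A minimal sketch follows, assuming the azimuth is measured from the x-axis and ignoring the image inversion of a real lens. The function name is illustrative.

```python
import math


def orthographic_fisheye_point(zenith_rad, azimuth_rad, focal_len):
    """Image-plane point of a ray at the given zenith/azimuth for an orthographic fisheye."""
    # Point where the ray meets the hemisphere of radius focal_len.
    x = focal_len * math.sin(zenith_rad) * math.cos(azimuth_rad)
    y = focal_len * math.sin(zenith_rad) * math.sin(azimuth_rad)
    # Dropping z (flattening along the z-axis) leaves the image coordinates.
    return x, y
```

Rays near the horizon (φ close to 90 degrees) are compressed toward the rim of the image circle, which is the distortion that the later correction step undoes.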

[0047] In practice a fisheye lens forms its image inverted vertically and horizontally, but this does not change the relative displacements caused by the distortion, so to simplify the explanation the image is assumed to form as described above.

[0048] The two-dimensional plane onto which the image on the hemisphere 4-2 is flattened is the fisheye imaging plane 4-3, and the image inside its circle is the image captured through the fisheye lens. The outer rectangular frame is the boundary of the whole imaging area obtained from the CCD imaging device.

【００４９】Because a fisheye image is distorted, displaying an image captured with a fisheye lens camera requires cutting out part of it and correcting that part into a normal, undistorted image. For this purpose, a frame corresponding to the frame to be displayed is assumed as the virtual image frame 4-1.

【００５０】Assume that the z-axis, along which the fisheye lens points, is directed vertically upward. Let φ be the zenith angle that the line-of-sight direction vector DOV makes with the z-axis, and θ the azimuth angle it makes with the reference axis (x-axis or y-axis) in the horizontal plane. Further, let f be the focal length (the radius of the fisheye image circle), ω the rotation angle of the image frame, and m the magnification. With D denoting the distance from the position of the fisheye lens (origin O) to the origin of the image frame 4-1, the magnification is m = D/f.

【００５１】The correspondence between a point (x, y) in the fisheye image plane 4-3 and a point (p, q) in the image frame 4-1 is given by the following relations, in which R is the radius of the fisheye image circle (the quantity denoted f above) and the coefficient D is distinct from the distance D used to define m. The relations follow the orthographic fisheye mapping of U.S. Pat. No. 5,185,667, in which the last numerator term depends on the azimuth as sin θ in x and as cos θ in y.

$$x = \frac{R\,(pA - qB + mR\sin\phi\sin\theta)}{\sqrt{p^{2} + q^{2} + m^{2}R^{2}}}$$

$$y = \frac{R\,(pC - qD - mR\sin\phi\cos\theta)}{\sqrt{p^{2} + q^{2} + m^{2}R^{2}}}$$

$$A = \cos\omega\cos\theta - \sin\omega\sin\theta\cos\phi$$

$$B = \sin\omega\cos\theta + \cos\omega\sin\theta\cos\phi$$

$$C = \cos\omega\sin\theta + \sin\omega\cos\theta\cos\phi$$

$$D = \sin\omega\sin\theta - \cos\omega\cos\theta\cos\phi$$
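The mapping of paragraph 0051 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `frame_to_fisheye` is hypothetical, angles are in radians, and the azimuth term of y uses cos θ, as in the cited U.S. Pat. No. 5,185,667 and as needed for the line-of-sight vector (sin φ sin θ, −sin φ cos θ, cos φ) to have unit length.

```python
import math

def frame_to_fisheye(p, q, phi, theta, omega, m, R):
    """Map a point (p, q) of the virtual frame to (x, y) in the fisheye image.

    phi: zenith angle, theta: azimuth, omega: frame rotation (radians),
    m: magnification, R: radius of the fisheye image circle.
    """
    A = math.cos(omega) * math.cos(theta) - math.sin(omega) * math.sin(theta) * math.cos(phi)
    B = math.sin(omega) * math.cos(theta) + math.cos(omega) * math.sin(theta) * math.cos(phi)
    C = math.cos(omega) * math.sin(theta) + math.sin(omega) * math.cos(theta) * math.cos(phi)
    D = math.sin(omega) * math.sin(theta) - math.cos(omega) * math.cos(theta) * math.cos(phi)
    # Distance from the lens to the frame point; m * R is the distance to the frame origin.
    norm = math.sqrt(p * p + q * q + m * m * R * R)
    x = R * (p * A - q * B + m * R * math.sin(phi) * math.sin(theta)) / norm
    y = R * (p * C - q * D - m * R * math.sin(phi) * math.cos(theta)) / norm
    return x, y
```

With the frame looking straight along the lens axis (φ = 0), the frame origin maps to the image center, and every frame point maps to a point inside the image circle of radius R.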

【００５２】Using these relations, the distortion of the fisheye image can be corrected and the original image restored and displayed on the monitor television device. First, the image frame 4-1 is positioned to define the region to be displayed on the monitor (the cut-out region). This positioning is set by supplying the zenith angle φ, azimuth angle θ, rotation angle ω, and magnification m of the image frame from the person position determination unit 2A-3 to the image cut-out/distortion correction unit 2A-4.

【００５３】For each point (p, q) in the image frame 4-1, the image cut-out/distortion correction unit 2A-4 uses the above relations to find the corresponding point (x, y) in the fisheye image plane 4-3, reads the color information signal of that point from the video frame memory (not shown) storing the fisheye image, and writes it into an output video memory (not shown) at the address corresponding to the point (p, q).

【００５４】This transfer of color information from the video frame memory storing the fisheye image to the output video memory is performed for every point (p, q) in the image frame 4-1. By then reading the color information signals sequentially from the output video memory and outputting them, the image cut-out/distortion correction unit 2A-4 generates an undistorted video signal of the real scene as viewed from the origin O through the virtual image frame 4-1 (see U.S. Pat. No. 5,185,667 or the specification of the earlier-filed Japanese Patent Application No. Hei 10-62531).
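The per-pixel transfer of paragraphs 0053 and 0054 amounts to an inverse-mapping loop: for each output pixel, compute the source location and copy its value. The sketch below is a minimal illustration, assuming a square fisheye image stored as a list of rows with the optical axis at index [R][R], nearest-neighbour lookup, and p and q growing with column and row; the function name `dewarp` and these conventions are assumptions, not taken from the patent. The mapping uses cos θ in the azimuth term of y, as in the cited U.S. Pat. No. 5,185,667.

```python
import math

def dewarp(fisheye, R, phi, theta, omega, m, out_w, out_h):
    """Cut an undistorted out_h x out_w view out of a square fisheye image.

    fisheye is a list of rows with the optical axis at fisheye[R][R];
    output pixels that map outside the stored image are left at 0.
    """
    co, so = math.cos(omega), math.sin(omega)
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    A = co * ct - so * st * cp
    B = so * ct + co * st * cp
    C = co * st + so * ct * cp
    D = so * st - co * ct * cp
    size = len(fisheye)
    out = [[0] * out_w for _ in range(out_h)]
    for row in range(out_h):
        q = row - (out_h - 1) / 2.0      # frame coordinate q (assumed to grow with row)
        for col in range(out_w):
            p = col - (out_w - 1) / 2.0  # frame coordinate p (assumed to grow with column)
            norm = math.sqrt(p * p + q * q + m * m * R * R)
            x = R * (p * A - q * B + m * R * sp * st) / norm
            y = R * (p * C - q * D - m * R * sp * ct) / norm
            src_col = int(round(R + x))  # nearest-neighbour source pixel
            src_row = int(round(R + y))
            if 0 <= src_row < size and 0 <= src_col < size:
                out[row][col] = fisheye[src_row][src_col]
    return out
```

Iterating over output pixels rather than source pixels guarantees that every address of the output memory is written exactly once, which matches the description of filling the output video memory for all points (p, q).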

【００５５】Next, person tracking by sound is described with reference to FIG. 2(B). The output signals of the microphone bank, in which a plurality of omnidirectional microphones 12 are arranged, are converted into digital signals by the audio signal processing unit 2B-1, and the voiced sound detection unit 2B-2 detects whether they contain voiced sound (vowels). For an audio signal containing voiced sound, the speaker position calculation unit 2B-3 calculates the speaker position from the phase differences between the microphone bank output signals and sends the speaker position information to the person position determination unit 2A-3 and the microphone directivity determination unit 2B-7.

【００５６】For an input sound signal that contains no voiced sound, the noise source position calculation unit 2B-4 likewise calculates the position of the noise source from the phase differences and sends the noise source position information to the microphone directivity determination unit 2B-7. Whether an input sound signal contains voiced sound can be detected by applying linear predictive coding (LPC) filter coefficients or cepstrum analysis.

【００５７】Sound transmitted from the remote party over the line 2B-10 is emitted by the loudspeaker 2B-12. To prevent that sound from entering the microphone bank and being sent back to the remote party as echo, the loudspeaker sound detection unit 2B-5 detects the correlation between the audio signal fed to the loudspeaker 2B-12 and the audio signals input from the microphone bank. From this correlation, the loudspeaker position calculation unit 2B-6 detects the direction of the loudspeaker 2B-12 and outputs that position information to the microphone directivity determination unit 2B-7.

【００５８】The microphone directivity determination unit 2B-7 determines the directivity of the microphone bank for each sound source on the basis of the speaker position information, the noise source position information, and the loudspeaker position information. In accordance with the directivity so determined, the directivity control unit 2B-8 gives, for each sound source, a different phase to the output signal of each omnidirectional microphone and combines the signals, thereby producing a different directivity for each sound source.
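One way to picture the per-microphone phases is as steering delays. The sketch below assumes, purely for illustration, that the omnidirectional microphones sit on a circle around the lens and that the source is distant enough for its wavefront to be treated as planar; the function name, the geometry, and the constant speed of sound are not taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def steering_delays(mic_angles_deg, radius_m, source_azimuth_deg):
    """Delays (seconds) that make a plane wave from the given azimuth add in phase.

    mic_angles_deg: angular positions of the microphones on a circle of radius_m.
    """
    az = math.radians(source_azimuth_deg)
    # A microphone facing the source hears the wavefront early by radius * cos(difference) / c.
    lead = [radius_m * math.cos(az - math.radians(a)) / SPEED_OF_SOUND
            for a in mic_angles_deg]
    # Delay each microphone by its lead over the last one reached, so every delay is >= 0.
    latest = min(lead)
    return [l - latest for l in lead]
```

Summing the delayed signals raises the gain toward the chosen azimuth; applying one such delay set per sound source gives each source its own directivity, as the paragraph describes.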

【００５９】As a result, a high-gain directivity is produced toward the speaker, whose sound contains voiced sound, and a low-gain directivity is produced toward noise sources that are continuously present and toward the loudspeaker of the monitor television device. The microphone bank is thus highly sensitive to speech and insensitive to noise and loudspeaker sound, so noise and loudspeaker sound are suppressed and echo is reduced.

【００６０】FIG. 5 illustrates a configuration that generates a noise-suppressing directivity according to the embodiment of the present invention. In the figure, M0 to M3 are omnidirectional microphones, 5-1 is an A/D converter, 5-2 is a cross-correlation calculation unit, 5-3 is a sound source position calculation unit, 5-4 is a speaker direction/characteristic storage unit, 5-5 is a filter and delay unit, 5-6 is an audio combining unit, 5-7 is a level adjustment/adaptive filter unit, and 5-8 is an adder.

【００６１】The output signals of the omnidirectional microphones M0 to M3 of the microphone bank are A/D-converted by the A/D converter 5-1 and input to the cross-correlation calculation unit 5-2. Taking the input signal from microphone M0 as the reference, the cross-correlation calculation unit 5-2 obtains the correlation and the phase difference of each of the input signals from the other omnidirectional microphones M1, M2, and M3, and sends them to the sound source position calculation unit 5-3.

【００６２】The sound source position calculation unit 5-3 calculates the sound source position from the correlations and phase differences. To give the direction of that sound source a negative directivity (suppression characteristic), it calculates the relative delay values (phase amounts) to be applied to the input signals from the omnidirectional microphones M0 to M3 and sends them to the filter and delay unit 5-5. The cross-correlation calculation unit 5-2 and the sound source position calculation unit 5-3 correspond to the noise source position calculation unit 2B-4 and the microphone directivity determination unit 2B-7 of FIG. 2.

【００６３】The filter and delay unit 5-5 delays the input signals from the omnidirectional microphones M0 to M3 according to the delay values sent from the sound source position calculation unit 5-3 so that their phases are aligned, and the audio combining unit 5-6 combines them. The level adjustment/adaptive filter unit 5-7 matches the level of the combined signal to that of the input signal from microphone M0. The adder 5-8 adds the inverted output of the level adjustment/adaptive filter unit 5-7 to the input signal from microphone M0 so that the two cancel, producing a negative directivity (suppression characteristic) toward the sound source.

【００６４】The signal output from the adder 5-8 is therefore one in which unwanted audio-band signals from noise sources and the like are suppressed. The filter and delay unit 5-5, audio combining unit 5-6, level adjustment/adaptive filter unit 5-7, and adder 5-8 of FIG. 5 correspond to the directivity control unit 2B-8 of FIG. 2.

【００６５】A more accurate noise suppression effect is obtained by using, in the level adjustment/adaptive filter unit 5-7, an adaptive filter that minimizes the level of the noise-suppressed output of the adder 5-8. When there are multiple noise sources, the same processing is applied to each in turn, starting with the loudest, so that multiple noise sources are suppressed simultaneously.
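The patent does not say which adaptive algorithm is used. As one common choice, the sketch below uses a normalized LMS (NLMS) filter that adapts its weights to minimize the adder output; the function name, tap count, and step size are assumptions for the example. Here `primary` plays the role of the M0 signal and `reference` the combined, aligned noise estimate.

```python
import random

def nlms_cancel(primary, reference, taps=2, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered reference from primary; return (residual, weights)."""
    w = [0.0] * taps
    residual = []
    for n in range(len(primary)):
        # Most recent reference samples, newest first.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - estimate            # adder output (noise-suppressed signal)
        power = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / power for wk, xk in zip(w, x)]  # step toward smaller |e|
        residual.append(e)
    return residual, w

# Demo: the noise in the primary channel is a filtered copy of the reference.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [0.8 * reference[n] - 0.3 * (reference[n - 1] if n > 0 else 0.0)
           for n in range(2000)]
residual, weights = nlms_cancel(primary, reference)
```

In the demo the weights settle near 0.8 and −0.3 and the residual decays toward zero, which is the behaviour the paragraph describes as minimizing the level of the noise-suppressed output.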

【００６６】FIG. 6 illustrates a configuration that generates suppression and enhancement directivities according to the embodiment of the present invention. It includes a voice detection unit and generates a suppression directivity for noise and an enhancement directivity for speech. In the figure, 6-1 is a voiced sound detection unit; the rest of the configuration is the same as in FIG. 5, so identical components carry the same reference numerals and their description is not repeated.

【００６７】The output signal from the omnidirectional microphone M0 is input to the voiced sound detection unit 6-1, which detects whether the input signal contains voiced sound and sends the result to the sound source position calculation unit 5-3. The voiced sound detection unit 6-1 corresponds to the voiced sound detection unit 2B-2 shown in FIG. 2.

【００６８】When voiced sound is present, the sound source position calculation unit 5-3 judges the sound to be speech uttered by a speaker and, on the basis of the phase differences and correlations among the omnidirectional microphones M0 to M3, sets the delays of the filter and delay unit 5-5 so as to enhance the directivity in that direction.

【００６９】It also calculates the position of the speaker (sound source position) from the phase differences and correlations among the omnidirectional microphones M0 to M3 and sends it to the speaker direction/characteristic storage unit 5-4. The speaker direction/characteristic storage unit 5-4 stores the direction of the speaker position and the characteristics of the sound source (the speaker's voice), and sends the speaker position information (sound source position information) to the person position determination unit 2A-3 of FIG. 2.

【００７０】When the voiced sound detection unit 6-1 detects no voiced sound, the sound is judged to be noise, and the sound source position calculation unit 5-3 sets the delay values (directivity parameters) of the filter and delay unit 5-5 so that the energy of the combined signal from that sound source is minimized. The setting method is the same as that described for the configuration of FIG. 5.

【００７１】FIG. 7 illustrates a configuration that generates a loudspeaker-sound suppression directivity according to the embodiment of the present invention. In the figure, 7-1 is a loudspeaker sound detection unit. The rest of the configuration is the same as in FIG. 5, so identical components carry the same reference numerals and their description is not repeated.

【００７２】The loudspeaker sound detection unit 7-1 corresponds to the loudspeaker sound detection unit 2B-5 shown in FIG. 2 and uses the input signal to the loudspeaker of the monitor television device as a reference signal. When the loudspeaker sound detection unit 7-1 detects a loudspeaker input signal at or above a predetermined level, the cross-correlation calculation unit 5-2 calculates the phase difference and correlation between each of the omnidirectional microphones M0 to M3 and the loudspeaker sound, and the sound source position calculation unit 5-3 calculates the suppression directivity parameters for the loudspeaker sound.

【００７３】Then, as in noise suppression control, delays are set in the filter and delay unit 5-5 so that the loudspeaker sound is suppressed, and the audio combining unit 5-6 combines the signals so that the loudspeaker sound components cancel one another.

【００７４】FIG. 8 illustrates a configuration that combines speaker voice, noise, and loudspeaker sound according to the embodiment of the present invention. In the figure, 8-1 and 8-2 are combining units, and 8-3 and 8-4 are filter units. As described for the configurations of FIGS. 5 to 7, the speaker voice, noise, and loudspeaker sound are distinguished as sound sources with different characteristics, and each is extracted after an enhancement or suppression directivity has been applied to it. Of the extracted sounds, the speaker voice is enhanced by the combining units 8-1 and 8-2, the noise is suppressed by the combining unit 8-1 after passing through the filter unit 8-3, and the loudspeaker sound is suppressed by the combining unit 8-2 after passing through the filter unit 8-4. The final combined sound is output from the combining unit 8-2.

【００７５】FIG. 9 shows the directivities for speaker voice, noise, and loudspeaker sound according to the embodiment of the present invention. 9-1 is a speaker voice source, 9-2 a noise source, 9-3 a loudspeaker sound source, 9-4 the directivity for the speaker voice source, 9-5 the directivity for the noise source, 9-6 the directivity for the loudspeaker sound source, and 9-7 the television camera device.

【００７６】The extracted sounds obtained with the configurations of FIGS. 5 to 7 are combined by the configuration of FIG. 8 to obtain the output sound. This output sound has strong directivity toward the speaker voice and weak directivity toward noise and loudspeaker sound. Noise is therefore reduced and a high-quality speaker voice is produced, while feedback of loudspeaker sound into the microphones is reduced, giving the same effect as providing an echo canceller or echo suppressor. Moreover, because this reduction of loudspeaker feedback shares the configuration used to generate directivity for the other sound sources, it achieves that effect with far less computation than a separately provided echo canceller or echo suppressor.

【００７７】FIG. 10 is a flowchart of the directivity control of the variable-directivity microphone according to the embodiment of the present invention. In step 10-1, each part of the audio processing section of the television camera device is initialized. In step 10-2, monitoring loop processing is performed. In step 10-3, it is determined whether loudspeaker sound is detected. If loudspeaker sound is detected, the cross-correlation between each microphone input sound and the loudspeaker sound is calculated in step 10-4, the correlation parameters (phase difference and correlation) are calculated in step 10-5, and in step 10-6 a delay and a coefficient are applied to the output of each microphone and the outputs are combined (added or subtracted) so that the directivity decreases. Processing then returns to step 10-3.

【００７８】If no loudspeaker sound is detected, step 10-7 determines whether the level of voiced sound is high. If it is, then in order to track the speaker direction, the cross-correlation among the microphone input sounds is calculated in step 10-8, the correlation parameters (phase difference and correlation) are calculated in step 10-9, and in step 10-6 a delay and a coefficient are applied to the output of each microphone and the outputs are combined (added or subtracted) so that the directivity increases. Processing then returns to step 10-3.

【００７９】If the level of voiced sound is low, then in order to cancel the noise, the cross-correlation among the microphone input sounds is calculated in step 10-10, the correlation parameters (phase difference and correlation) are calculated in step 10-11, and in step 10-6 a delay and a coefficient are applied to the output of each microphone and the outputs are combined (added or subtracted) so that the directivity decreases. Processing then returns to step 10-3.
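The three branches of the flowchart in FIG. 10 reduce to a small decision function. The sketch below is only an illustration of that branching; the function name, the argument names, and the use of a numeric threshold for "the level of voiced sound is high" are assumptions.

```python
def choose_directivity(loudspeaker_active, voiced_level, loud_threshold):
    """Return (sound source kind, directivity action) for one pass of the monitoring loop."""
    if loudspeaker_active:
        # Steps 10-4 and 10-5, then combine so that the directivity decreases.
        return ("loudspeaker", "suppress")
    if voiced_level >= loud_threshold:
        # Steps 10-8 and 10-9, then combine so that the directivity increases.
        return ("speaker", "enhance")
    # Steps 10-10 and 10-11, then combine so that the directivity decreases.
    return ("noise", "suppress")
```

Loudspeaker sound is checked first, so echo suppression takes precedence over speaker tracking whenever the remote party's audio is playing.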

【００８０】Next, integrated tracking by image and sound according to the present invention is described. As described above, combining the tracking of moving images in the fisheye image with the tracking of speakers from the phase differences of the sounds input to the plurality of omnidirectional microphones makes it possible to detect and track both the direction of a moving person and the direction of a speaker.

【００８１】The operation of cutting out a person from the full-sky image of the fisheye lens camera can be executed for several persons simultaneously in parallel. In that cut-out, the detection results for moving persons and for speakers can be used independently of each other, and an external setting can select, for example, moving-person tracking priority or speaker tracking priority.

【００８２】The person position determination unit 2A-3 shown in FIG. 2 tracks either the moving image or the sound source direction, according to the set priority order, on the basis of the moving person position information obtained from the person position detection unit 2A-2 or the speaker position information obtained from the speaker position calculation unit 2B-3. It can also be configured so that tracking of the moving image or tracking of the sound source direction is selected by key input.
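The priority rule of the person position determination unit can be sketched as a simple selection between two optional position estimates. The function name, the priority labels, and the representation of a position as a (zenith, azimuth) tuple are assumptions for the example; the patent only states that a set priority order or a key input decides which tracking result is used.

```python
def select_target(moving_pos, speaker_pos, priority="speaker"):
    """Pick the position to cut out, preferring the source named by priority.

    Each position is a (zenith, azimuth) tuple, or None when nothing was detected.
    """
    if priority == "speaker":
        order = (speaker_pos, moving_pos)
    else:
        order = (moving_pos, speaker_pos)
    for pos in order:
        if pos is not None:
            return pos
    return None
```

Falling back to the other detector when the preferred one has no result reflects the stated aim of keeping the tracking stable when no one is speaking or when no one is moving.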

【００８３】FIG. 11 shows examples of transmitted images according to the embodiment of the present invention. FIG. 11(A) shows a composite of four pictures obtained by executing the cut-out operation for four persons simultaneously in parallel. When the persons move about without speaking, the tracking function for each individual operates, and each person is always displayed in his or her own sub-picture.

【００８４】When a person keeps talking while remaining still, the improved positioning accuracy provided by sound tracking makes it possible to locate the person's face, and the view can be zoomed in, as in the upper-right picture of FIG. 11(B) or the lower-left picture of FIG. 11(C). Naturally, when the person talking changes, or when several persons are talking, the corresponding persons can be zoomed in on. By zooming in on speakers in this way, livelier video than before can be produced reliably and quickly without any operation by a person.

【００８５】[0085]

【発明の効果】As described above, according to the present invention, a camera with a fisheye or super-wide-angle lens and a variable-directivity microphone formed from an array of omnidirectional microphones capture all of the images and sound, and the image of a speaker, a moving person, or the like is cut out from them and displayed. This yields a compact, simple television camera device that has no moving parts at all and does not make conference participants feel ill at ease.

【００８６】In identifying the sound source direction with the plurality of omnidirectional microphones, voiced sound and correlation with the loudspeaker sound are detected and a different directivity is generated according to the nature of each sound source. This realizes a high-quality, realistic video conference with little noise or echo.

【００８７】Because the device provides both moving-person tracking by the fisheye or super-wide-angle lens camera and speaker tracking by the variable-directivity microphone, it can cut out the image of a person who speaks without moving and the image of a person who moves without making a sound, stably, at high speed, and with high accuracy, and can thus generate lively images.

【００８８】When the device is used as a surveillance television camera or the like, a person can still be cut out and tracked stably by detecting the direction from which sound comes, even when backlighting or insufficient illumination makes it difficult to cut out a moving image. Cut-out processing can thus be configured to suit the conditions of use.

[Brief description of the drawings]

【図１】FIG. 1 shows the configuration and a usage scene of the television camera device according to the embodiment of the present invention.

【図２】FIG. 2 shows the configuration of a video conference system using the television camera device according to the embodiment of the present invention.

【図３】FIG. 3 shows the configuration of the person position detection unit of the television camera device according to the embodiment of the present invention.

【図４】FIG. 4 illustrates the principle of cutting out a fisheye lens image and correcting its distortion.

【図５】FIG. 5 illustrates a configuration that generates a noise-suppressing directivity according to the embodiment of the present invention.

【図６】FIG. 6 illustrates a configuration that generates suppression and enhancement directivities according to the embodiment of the present invention.

【図７】FIG. 7 illustrates a configuration that generates a loudspeaker-sound suppression directivity according to the embodiment of the present invention.

【図８】FIG. 8 illustrates a configuration that combines speaker voice, noise, and loudspeaker sound according to the embodiment of the present invention.

【図９】FIG. 9 shows the directivities for speaker voice, noise, and loudspeaker sound according to the embodiment of the present invention.

【図１０】FIG. 10 is a flowchart of the directivity control of the variable-directivity microphone according to the embodiment of the present invention.

【図１１】FIG. 11 shows examples of transmitted images according to the embodiment of the present invention.

【図１２】FIG. 12 shows a conventional camera panning system for video conferencing.

【図１３】FIG. 13 shows a conventional video conference system that displays the speaker position.

[Explanation of symbols]

10: television camera device
11: fisheye lens unit
12: omnidirectional microphone
13: CCD imaging unit
14: conference participant
15: conference table
16: monitor television device

Continued from the front page (51) Int.Cl.⁶ Identification symbol FI H04N 5/268 H04N 7/15 7/15 G06F 15/62 380 15/64 340B 15/66 470A

Claims

[Claims]

1. A television camera device used for video conferencing or remote monitoring, comprising: a camera having a fisheye or super-wide-angle lens; a variable-directivity microphone in which a plurality of omnidirectional microphones are arranged; means for determining the direction of a sound source position and tracking that direction; and means for cutting out the picture in the direction of the sound source position and generating a video signal.

2. A television camera device used for video conferencing or remote monitoring, comprising: a camera having a fisheye or super-wide-angle lens; a variable-directivity microphone in which a plurality of omnidirectional microphones are arranged; means for determining the direction of a sound source position and tracking the direction of the sound source position; means for detecting and tracking a moving image based on the difference between frames of the image captured by the fisheye or super-wide-angle lens camera; and means for selecting either the means for tracking the direction of the sound source position or the means for detecting and tracking a moving image, cutting out a screen region, and generating a video signal.

3. The television camera device according to claim 1 or 2, wherein the means for tracking the direction of the sound source position includes means for detecting voiced sound, and is configured to cut out a screen region in the direction of the sound source position where voiced sound is detected and to generate a video signal.

4. The television camera device according to claim 1, wherein the means for tracking the direction of the sound source position synthesizes the output sound signals of the omnidirectional microphones so as to reduce the gain in the direction of a sound source position where no voiced sound is detected.

5. The television camera device according to claim 1, wherein the means for tracking the direction of the sound source position includes means for detecting a loudspeaker sound signal of a monitor television apparatus, calculates the correlation between the output sound signals of the omnidirectional microphones and the loudspeaker sound signal of the monitor television apparatus to determine the loudspeaker position, and synthesizes the output sound signals of the omnidirectional microphones so as to suppress the loudspeaker sound in the output of the omnidirectional microphones.

6. The television camera device according to any one of claims 2 to 5, comprising the means for tracking the direction of the sound source position and the means for detecting and tracking a moving image, and configured to generate a video signal for emphasized display.

7. The television camera device according to claim 2, comprising the means for tracking the direction of the sound source position and the means for detecting and tracking a moving image, and configured to generate a video signal that displays a person with highlighting.

8. The television camera device according to any one of claims 2 to 5, comprising the means for tracking the direction of the sound source position and the means for detecting and tracking a moving image, wherein one of the two tracking means is selected by key input.

9. The television camera device according to claim 1, wherein the fisheye or super-wide-angle lens camera and the plurality of omnidirectional microphones are integrated, the fisheye or super-wide-angle lens is arranged at a central portion, and the plurality of omnidirectional microphones are arranged at a peripheral portion.
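
As an illustration only, the determination of the sound source direction from a ring of omnidirectional microphones (claims 1 and 9) can be sketched with a delay-and-sum scan. This is a minimal sketch under stated assumptions, not the method of the embodiment: the eight-microphone ring, the 0.1 m radius, the 48 kHz sampling rate, the 30-degree scan step, the far-field plane-wave model, the circular shifts, and all function names are hypothetical choices made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_delays(angle_deg, mic_angles_deg, radius_m, fs_hz):
    """Integer sample delays of a far-field plane wave at each microphone,
    relative to the array centre (negative means the sound arrives earlier)."""
    delays = []
    for phi in mic_angles_deg:
        # A microphone on the side facing the source hears the wavefront first.
        tau = -radius_m / SPEED_OF_SOUND * math.cos(math.radians(angle_deg - phi))
        delays.append(round(tau * fs_hz))
    return delays


def shift(signal, d):
    """Circularly delay a signal by d samples (a simplification for this sketch)."""
    n = len(signal)
    return [signal[(i - d) % n] for i in range(n)]


def steered_power(channels, delays):
    """Undo the assumed delays, sum the channels, and return the output power."""
    aligned = [shift(ch, -d) for ch, d in zip(channels, delays)]
    return sum(sum(samples) ** 2 for samples in zip(*aligned))


def estimate_direction(channels, mic_angles_deg, radius_m, fs_hz, step_deg=30):
    """Scan candidate azimuths and return the one with the highest steered power."""
    return max(
        range(0, 360, step_deg),
        key=lambda a: steered_power(
            channels, mic_delays(a, mic_angles_deg, radius_m, fs_hz)
        ),
    )


def simulate(source_deg, mic_angles_deg, radius_m, fs_hz, n=256, seed=0):
    """Hypothetical test signal: one noise source seen by every microphone."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    delays = mic_delays(source_deg, mic_angles_deg, radius_m, fs_hz)
    return [shift(source, d) for d in delays]


MIC_ANGLES = [i * 45 for i in range(8)]  # assumed ring of eight microphones
channels = simulate(90, MIC_ANGLES, radius_m=0.1, fs_hz=48000)
estimated = estimate_direction(channels, MIC_ANGLES, radius_m=0.1, fs_hz=48000)
print(estimated)  # 90
```

The estimated azimuth could then select which part of the fisheye image is cut out. A practical array would use linear rather than circular shifts, fractional delays, and a finer angular scan.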